[whatwg] several messages about serialising HTML and related subjects

Ian Hickson ian at hixie.ch
Thu Feb 28 18:23:03 PST 2008

Executive summary: I did most of the changes suggested below.

On Wed, 15 Aug 2007, Simon Pieters wrote:
> The spec says:
>    Other node types (e.g. Attr) cannot occur as children of elements. If
>    they do, this algorithm must raise an INVALID_STATE_ERR exception.
> s/elements/elements or documents/ as the algorithm can be used for documents
> as well.
> What about PIs? They can occur as children of elements or documents. 


On Wed, 15 Aug 2007, Simon Pieters wrote:
> The serializing HTML fragments algorithm talks about "child node" to 
> refer to the current node being processed. This is a bit confusing, and 
> I think "current node" would be clearer.


On Thu, 16 Aug 2007, Lachlan Hunt wrote:
>   There is a possible issue with the serialising HTML fragments section [1]. The 
> algorithm seems fine for use with things like innerHTML, but there are 
> other issues that should be considered when serialising to a file, 
> database, network stream or something.
> Such serialisers should consider the character encoding.  Although a 
> Unicode encoding should ideally be used, some serialisers may need to 
> serialise to a different encoding at the request of the user or 
> limitations of the environment.  In such cases, the serialisation should 
> output appropriate character references for characters that can't be 
> represented.
> It should also handle outputting the appropriate <meta charset=""> 
> and/or BOM, especially in environments that can't declare it at the 
> transport level like HTTP can.
> Perhaps the spec should say something about this issue somewhere.
> [1] http://www.whatwg.org/specs/web-apps/current-work/#serialising

The section is specifically for serialising a subtree to a Unicode stream 
without mutation, not to a byte stream. What's the use case that isn't 
covered by "8.1 Writing HTML documents"?
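
For illustration only, the character-reference fallback described in the 
quoted message could be sketched in Python as below. This sits outside the 
fragment serialisation algorithm, which produces a Unicode stream. The 
function name encode_with_char_refs is hypothetical; "xmlcharrefreplace" 
is a standard Python codec error handler.

```python
def encode_with_char_refs(markup: str, encoding: str) -> bytes:
    """Encode already-serialised markup to a byte stream.

    Characters the target encoding cannot represent are written as
    decimal character references instead of raising an error.
    """
    return markup.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+00E9 exists in ISO-8859-1, U+263A does not.
    print(encode_with_char_refs("<p>caf\u00e9 \u263a</p>", "iso-8859-1"))
```

Note that this blanket replacement is only safe where character references 
are parsed; inside script and style elements they are not, so a real 
serialiser would need to treat those elements separately.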

On Mon, 27 Aug 2007, Simon Pieters wrote:
> IE7 and Firefox serialize U+00A0 characters in data and attribute values 
> as "&nbsp;" when getting innerHTML. Safari and Opera don't. Should the 
> spec be aligned with IE7 and Firefox here?
> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cscript%3Ewindow.onload%3Dfunction%28%29%7Bw%28document.body.innerHTML%29%7D%3C/script%3E%3Cp%20title%3D%22x%A0x%22%3Ex%A0x

I don't see any great benefit to doing so; do any pages require this?

On Tue, 28 Aug 2007, Alexey Proskuryakov wrote:
>   This has caused a compatibility issue for WebKit at least once. In 
> that case, we got away with evangelizing, but we still track this as a 
> bug that needs to be fixed eventually.
>   http://bugs.webkit.org/show_bug.cgi?id=11947

Ah. Ok then. Done.
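
A minimal sketch of what escaping with this change might look like, in 
Python. The function name and the exact set of replacements are assumptions 
made for illustration, not a quotation of the spec text; the point is only 
that U+00A0 is written out as "&nbsp;" in both text and attribute values.

```python
def escape_string(value: str, attribute_mode: bool = False) -> str:
    """Escape text or an attribute value for HTML fragment serialisation.

    Assumed rules (illustrative): "&" and U+00A0 are always escaped;
    attribute values additionally escape the double quote, while text
    escapes "<" and ">".
    """
    # "&" must be handled first so later replacements are not re-escaped.
    out = value.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        out = out.replace('"', "&quot;")
    else:
        out = out.replace("<", "&lt;").replace(">", "&gt;")
    return out


if __name__ == "__main__":
    print(escape_string("x\u00a0x"))                       # text content
    print(escape_string("x\u00a0x", attribute_mode=True))  # title="..."
```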

On Tue, 28 Aug 2007, Boris Zbarsky wrote:
> For what it's worth, the relevant Mozilla bugs are 
> https://bugzilla.mozilla.org/show_bug.cgi?id=165686 and 
> https://bugzilla.mozilla.org/show_bug.cgi?id=169590

Cool, thanks.

On Tue, 11 Sep 2007, Simon Pieters wrote:
> Consider the following document:
>    <h:p xmlns:h="http://www.w3.org/1999/xhtml"><x/></h:p>
> When getting innerHTML on the root element, should the serialization 
> declare the no namespace explicitly as in <x xmlns=""/>? (I think it 
> should because setting innerHTML will imply namespace declarations so it 
> might change meaning if you insert it somewhere else with innerHTML.)

I've added this:

| If any of the elements in the serialisation are in the null namespace,
| the default namespace in scope for those elements must be explicitly 
| declared as the empty string.

Is that ok?
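
To illustrate why the explicit declaration matters, here is a small Python 
sketch using the standard xml.etree.ElementTree parser. The helper name 
child_tag and the wrapping div element are assumptions chosen for the 
example; they stand in for setting innerHTML on an element whose default 
namespace is the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def child_tag(fragment: str) -> str:
    """Parse a fragment inside an element whose default namespace is XHTML.

    Returns the expanded name of the first child, in ElementTree's
    "{namespace}local" notation (just "local" for the null namespace).
    """
    root = ET.fromstring(f'<div xmlns="{XHTML}">{fragment}</div>')
    return root[0].tag


if __name__ == "__main__":
    # Without the declaration, the element picks up the XHTML namespace.
    print(child_tag("<x/>"))
    # With xmlns="", it stays in the null namespace wherever it is inserted.
    print(child_tag('<x xmlns=""/>'))
```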

> Also, the spec says:
>    In an XML context, the innerHTML DOM attribute on HTMLElements and
>    HTMLDocuments, on getting, must return a string in the form of an
>    internal general parsed entity [...]
> ...and then goes on to say that some DocumentType nodes must raise an 
> exception, however internal general parsed entities can't have doctypes 
> in the first place.

Oops. Fixed. Only elements should return internal general parsed entities; 
documents should return document entities. Empty documents now raise an 

> Finally, the spec lists the following as something that throws:
>    A Text node whose data contains characters that are not matched by the
>    XML Char production. [XML]
> But Text data is not the only case that might not match the Char 
> production in XML. The same applies to Comment data, CDATASection data, 
> ProcessingInstruction target, and, I think, Attr value.
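
As a sketch of the check being discussed, the XML 1.0 Char production 
(#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) 
can be tested with a regular expression. The function names and the idea of 
passing the pieces in as a name-to-string mapping are assumptions for 
illustration only.

```python
import re

# Zero or more characters allowed by the XML 1.0 Char production.
_XML_CHAR_RUN = re.compile(
    r"[\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)


def matches_xml_char(data: str) -> bool:
    """True if every character in data matches the XML Char production."""
    return _XML_CHAR_RUN.fullmatch(data) is not None


def unserialisable_parts(parts):
    """Return the names of the parts that contain a non-Char character.

    parts maps a label (e.g. "Comment data") to the string to check.
    """
    return [name for name, data in parts.items() if not matches_xml_char(data)]


if __name__ == "__main__":
    print(unserialisable_parts({
        "Text data": "fine",
        "Comment data": "form feed \x0c here",
        "Attr value": "also fine",
    }))
```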


Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
