[whatwg] [wa1] Status of tree construction section

Wed Jun 13 15:34:27 PDT 2007

On Wed, 12 Jul 2006, Stewart Brodie wrote:
> > > 
> > > In the main phase, section 'If the insertion mode is "in row"', the 
> > > last option for 'anything else' says "process ... as if ... in 
> > > table".  I think that should say "as if ... in table body" instead.  
> > > That case will re-throw the token out to "in table" in any case if 
> > > it doesn't handle it.
> > 
> > There'd be no difference. Any token that isn't handled by the "row" 
> > mode will not be handled by the "table body" mode.
> 
> Yes, I noticed that and agree, except that it just seemed to me that it 
> would be more natural to expect unhandled things to be thrown to the 
> next level of scope (table body) rather than bypassing it and going 
> directly to the table.

I don't really see why. Seems like an additional level of indirection.

> > > I've come to the conclusion that you need pictures to accompany the 
> > > "adoption agency algorithm".  However, I'm not an artist.  Indeed, 
> > > I'm so bad at drawing pictures, that in the past, users often sent 
> > > me replacement bitmap graphics for my programs because they found my 
> > > attempts so distressing :-)
> > 
> > Yeah, I completely agree. Diagrams and examples. If someone wants to 
> > do a diagram here I'd be most happy. Failing that, I'll probably get 
> > around to it in due course (e.g. once I'm convinced it actually 
> > works).
> 
> It is the most complex part of the tree construction.  Perhaps in lieu 
> of pictures in the short term, a short non-normative summary could be 
> added describing what the algorithm is doing, because reverse 
> engineering it from the 14-step plan is hard.

As I said, if someone wants to contribute examples, introduction 
materials, diagrams, or other helping material, I'm certainly open to 
adding it to the spec. I just don't want to do it myself until I'm 
confident the spec is stable.

> > > The "parsing quirks" box lists several issues that I think are 
> > > important. The <script> one in particular is so very common. 
> > > Unfortunately, I had to cave in eventually and support that because 
> > > it broke some customers' own sites.
> > 
> > Can you describe what exactly the quirk is? I have yet to see an 
> > algorithmic description of how to parse <script> blocks in quirks 
> > mode. In my research and the research that other people have done, it 
> > was found that every UA does it slightly differently. This is why I'd 
> > really rather not do this.  If you can tell me exactly what it is, I 
> > might be more convinced to do it.
> 
> Yes, it's hard to pin down.  In effect, it's a new value for the content 
> model flag which is like some sort of combination of RCDATA and 
> PLAINTEXT. I'm not sure it's just a quirk, to be honest.  I've tried the 
> following snippet in Firefox, Opera & IE6 and they behave the same way 
> regardless of the presence of a strict HTML4 doctype declaration before 
> the <html>:
> 
> <html><title>The <!-- comment with a </title> in the --> title</title><body
> onload="document.body.appendChild(document.createTextNode(document.title))">
> 
> In all cases, the window title and the text shown in the document body was:
>
>   The <!-- comment with a </title> in the --> title
> 
> The same behaviour appears to apply to TEXTAREA, SCRIPT, NOSCRIPT, 
> NOFRAMES, and NOEMBED.  STYLE works differently in Firefox (it thinks 
> that the content property's value terminates the style tag):
> 
>   <style> <!-- h1:after { content: '</style>'; color: red } --> </style>
> 
> The rule seems to be that whilst you are lexing the contents of one of 
> these magical elements, you have an additional flag, initialised to 
> false, that indicates that you are inside a pseudo-comment.  You 
> continue to accumulate character tokens, but if you see the sequence 
> <!-- and the flag is false, you set the flag to true.  If the flag is 
> true and you see the sequence -->, you set the flag to false.  Whilst 
> the flag is true, finding the < does not switch to the open tag state.  
> The character tokens are all accumulated into the content of the 
> element, regardless of whether they match the <!-- or --> markers.

It does indeed seem that CDATA and RCDATA have this behaviour in the 
tokeniser in IE. Fixed.
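
The flag-based rule quoted above can be sketched roughly as follows. This 
is only an illustration of the description in this thread, not the spec's 
tokeniser: the function name scan_raw_text and its return value are 
hypothetical, and end-tag detection is simplified to a case-insensitive 
match on "</" followed by the element name.

```python
from typing import Tuple


def scan_raw_text(source: str, tag_name: str) -> Tuple[str, int]:
    """Scan the text that follows a start tag such as <title>.

    Returns (content, end), where `end` is the index at which the closing
    tag starts, or len(source) if no closing tag is recognised.
    Hypothetical helper, written only to illustrate the quoted rule.
    """
    close_tag = "</" + tag_name.lower()
    in_pseudo_comment = False  # the additional flag, initialised to false
    i = 0
    while i < len(source):
        if not in_pseudo_comment and source.startswith("<!--", i):
            # "<!--" seen while the flag is false: set the flag.
            in_pseudo_comment = True
            i += 4
        elif in_pseudo_comment and source.startswith("-->", i):
            # "-->" seen while the flag is true: clear the flag.
            in_pseudo_comment = False
            i += 3
        elif (not in_pseudo_comment
              and source[i:i + len(close_tag)].lower() == close_tag):
            # A closing tag only counts outside a pseudo-comment.
            return source[:i], i
        else:
            # Every other character is simply accumulated.
            i += 1
    return source, len(source)


title_source = "The <!-- comment with a </title> in the --> title</title><body>"
title_content, title_end = scan_raw_text(title_source, "title")
print(title_content)
```

The markers themselves stay in the returned content, which matches the 
window title reported for the test snippet quoted above.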

> > > Finally (for now ;-), right at the beginning of the tree 
> > > construction section, it says that DOM Mutation events must not fire 
> > > for changes caused by the UA parsing the document.  I cannot decide 
> > > whether or not I agree with that statement.  My experimentation 
> > > appears to show that this is indeed what happens in Firefox, at 
> > > least. I put a script in the head of my document that attaches a 
> > > listener for DOMNodeInserted on the document.documentElement node 
> > > (i.e. the HTML element) and it never gets called due to nodes being 
> > > added by the parser.  Internally, for me, it's a PITA though, 
> > > because my node tree construction code and DOM implementation code 
> > > use the same internal APIs - and these automatically trigger the DOM 
> > > events, which, in turn, get dispatched to the various internal 
> > > default event handlers to deal with the special types of node that 
> > > require additional behaviour (like IMG, LINK, META etc.).
> > 
> > In Web browsers it's simply not an option. Having to fire mutation 
> > events for every mutation according to the complete DOM3 Events model 
> > is prohibitively expensive.
> 
> To be honest, I've not found it a burden even on the sorts of low-end 
> devices that our software runs (typically 300MHz CPUs, 8MB RAM, that 
> sort of thing).  Then again, I have a highly optimised event dispatcher 
> that takes steps to minimise the work, particularly when there are no 
> DOM listeners for the event being raised, which will almost always be 
> the case for the events concerned (DOMNodeInserted and 
> DOMNodeInsertedIntoDocument and the Removed counterparts).  The internal 
> default event handlers have similar filtering to eliminate any 
> unnecessary processing quickly.

Even minimal work is more than no work, and when you're dealing with 
thousands of elements, that's a big difference (in the order of 
milliseconds).
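
To make the trade-off concrete, here is a rough sketch of the kind of 
listener check described in the quoted text. It is an assumption about how 
such a dispatcher might look, not anyone's actual implementation: the 
class MutationDispatcher, its methods, and the events_built counter are 
all hypothetical names.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, object], None]


class MutationDispatcher:
    """Hypothetical dispatcher that skips work when nobody is listening."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = {}
        self.events_built = 0  # counts how often the slow path ran

    def add_listener(self, event_type: str, callback: Listener) -> None:
        self._listeners.setdefault(event_type, []).append(callback)

    def notify(self, event_type: str, node: object) -> None:
        listeners = self._listeners.get(event_type)
        if not listeners:
            # Fast path: one dictionary lookup and no event is built.
            # This is cheap, but it still runs once per inserted node.
            return
        self.events_built += 1
        for callback in list(listeners):
            callback(event_type, node)


dispatcher = MutationDispatcher()
dispatcher.notify("DOMNodeInserted", "p")  # no listeners: fast path only

seen = []
dispatcher.add_listener("DOMNodeInserted", lambda t, n: seen.append((t, n)))
dispatcher.notify("DOMNodeInserted", "img")  # now the listener is called
```

Even on the fast path, notify() is still called for every node the parser 
inserts, which is the residual cost being weighed here against not firing 
the events at all.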

> In the "in body" section, WBR doesn't really belong with a,b,big,em... 
> because it never had content.  It probably ought to go in with 
> area,basefont,bgsound... a bit further down, or in its own section.  
> There's no real point putting it in the list of active formatting 
> elements, since it would come off the stack again straight away.

Fixed.

Thanks,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'