[whatwg] [wa1] Status of tree construction section

Ian Hickson ian at hixie.ch
Tue Jul 11 18:46:16 PDT 2006

On Mon, 10 Jul 2006, Stewart Brodie wrote:
> In the main phase, section 'If the insertion mode is "in row"', the last 
> option for 'anything else' says "process ... as if ... in table".  I 
> think that should say "as if ... in table body" instead.  That case will 
> re-throw the token out to "in table" in any case if it doesn't handle 
> it.

There'd be no difference. Any token that isn't handled by the "row" mode 
will not be handled by the "table body" mode.

> The case immediately above that "An end tag whose tag name is one of: body,
> caption, col, colgroup, html, td, th, tr".  The /tr case is already handled
> by the second case.  Remove 'tr' from the list here.

Good catch. Fixed.

> In 'If the insertion mode is "in cell"', the absence of a case for an 
> end tag for CAPTION looks odd.  All the other table-related tags are 
> handled here explicitly, so why is CAPTION so different (that it should 
> be handled in the 'treat it as "in body"' way)?

It makes no difference, but ok, I've listed "caption" in the "in cell" 

> I've come to the conclusion that you need pictures to accompany the 
> "adoption agency algorithm".  However, I'm not an artist.  Indeed, I'm 
> so bad at drawing pictures, that in the past, users often sent me 
> replacement bitmap graphics for my programs because they found my 
> attempts so distressing :-)

Yeah, I completely agree. Diagrams and examples. If someone wants to do a 
diagram here I'd be most happy. Failing that, I'll probably get around to 
it in due course (e.g. once I'm convinced it actually works).

> With reference to that algorithm, I think that the text in point 1 
> should be re-organised somewhat after the second paragraph to make it a 
> little clearer.  I've re-organised it and I think it says exactly the 
> same now, but simpler and with less potential for misunderstanding:
>   "If there is a _formatting element_; proceed immediately to step 2
>   Otherwise, there is no _formatting element_.  If there is an element in
>   the _list of active formatting elements_ that:
>   o  [same three steps, but with ", and" appended to the top one]
>   then remove the last such element from the _list of active formatting
>   elements_.
>   In any case, abort these steps."

I think I radically rewrote that step since you last looked at it, because 
your comment above doesn't match the current text. A week or two ago I 
found and fixed a glaring bug in the algorithm, and the fix required a big 
rewrite of step 1, so that may be why.

> In the various places where a given operation has to be described 
> multiple times, you've macroed it (e.g. "insert an HTML element", "clear 
> the list of active formatting elements up to the last marker").  I 
> suggest adding another one that can be used during the Adoption 
> Agency algorithm (I'm sure that I found I needed to perform this search 
> in other places too - hence defining it separately - although I can't 
> quite recall exactly where for the time being, ho hum):
>   "The _list of active formatting elements_ is said to *have an element in
>    active formatting scope* when the following algorithm terminates in a
>    match state:
>   1. If the _list of active formatting elements_ is empty, terminate in a
>      failure state.
>   2. Initialise _entry_ to be the last (most recently added) entry in the
>      _list of active formatting elements_.
>   3. If _entry_ is a marker, terminate in a failure state.
>   4. If _entry_ is an element with a tag name matching the target element
>      name, terminate in a match state.
>   5. If there are further elements in the _list of active formatting
>      elements_, set _entry_ to the previous entry and return to step 3.
>   6. Terminate in a failure state (there are no more entries)"
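
The six steps proposed above can be sketched in a few lines of Python. This
is only an illustration of the quoted proposal, not text from the spec: the
function name, the MARKER sentinel, and the choice to represent each list
entry by its tag name are all assumptions made for the example.

```python
from typing import List, Optional

# Hypothetical sentinel standing in for a marker entry in the list.
MARKER = None


def has_element_in_active_formatting_scope(
    active_formatting: List[Optional[str]], target: str
) -> bool:
    """Return True if the search ends in a match state, False on failure.

    Entries are assumed to be tag names (str) or MARKER, with the most
    recently added entry last.
    """
    # Steps 1, 2, 5 and 6: walk from the last entry towards the first;
    # an empty or exhausted list falls through to the failure state.
    for entry in reversed(active_formatting):
        # Step 3: a marker ends the search in a failure state.
        if entry is MARKER:
            return False
        # Step 4: a matching tag name ends the search in a match state.
        if entry == target:
            return True
    return False
```

For instance, with the list ["b", MARKER, "i"] a search for "i" would match,
while a search for "b" would fail because the marker is reached first.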

What would this replace in the current text?

> Step 6 in the original 14-step algorithm: "relative position of the 
> formatting element".  Relative to what?


> The "parsing quirks" box lists several issues that I think are 
> important. The <script> one in particular is so very common.

Can you describe what exactly the quirk is? I have yet to see an 
algorithmic description of how to parse <script> blocks in quirks mode. In 
my research and the research that other people have done, it was found 
that every UA does it slightly differently. This is why I'd really rather 
not do this.

> Unfortunately, I had to cave in eventually and support that because it 
> broke some customers' own sites.

If you can tell me exactly what it is, I might be more convinced to do it.

> I have come across never-opened </br> and </p> too.

I'm currently doing a study to determine how common these are. Preliminary 
results suggest they are indeed far too common to be left out.

> I've never heard of <% ... %> before.  Sometimes, it's really quite 
> depressing the rubbish that people (and programs!) write out.


> I spent a long time trying to work out what I needed to store for each 
> entry on both the stack of open elements and the list of active 
> formatting elements.  I think it should be stated up front because this 
> is often an area of confusion, in my experience.  I frequently get upset 
> with co-workers over misuse of the terms "element", "tag" and "node", 
> for example :-)

What do you think you have to store? I'm not sure what answer would 
satisfy you here.

> Finally (for now ;-), right at the beginning of the tree construction 
> section, it says that DOM Mutation events must not fire for changes 
> caused by the UA parsing the document.  I cannot decide whether or not I 
> agree with that statement.  My experimentation appears to show that this 
> is indeed what happens in Firefox, at least. I put a script in the head 
> of my document that attaches a listener for DOMNodeInserted on the 
> document.documentElement node (i.e. the HTML element) and it never gets 
> called due to nodes being added by the parser.  Internally, for me, it's 
> a PITA though, because my node tree construction code and DOM 
> implementation code use the same internal APIs - and these automatically 
> trigger the DOM events, which, in turn, get dispatched to the various 
> internal default event handlers to deal with the special types of node 
> that require additional behaviour (like IMG, LINK, META etc.).

In Web browsers it's simply not an option. Having to fire mutation events 
for every mutation according to the complete DOM3 Events model is 
prohibitively expensive.
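
For an implementation whose parser and DOM share one internal insertion API,
one possible arrangement is to keep the UA's own per-node hooks separate from
script-visible mutation listeners, and to skip only the latter when the
parser is the caller. The sketch below is a hypothetical illustration: the
Node class, the from_parser flag, and both listener lists are invented for
the example, and event bubbling is deliberately left out.

```python
from typing import Callable, List


class Node:
    """Minimal stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []
        # Script-visible listeners, e.g. for DOMNodeInserted.
        self.mutation_listeners: List[Callable[[str, "Node"], None]] = []
        # UA-internal hooks, e.g. special handling for IMG, LINK, META.
        self.internal_hooks: List[Callable[["Node"], None]] = []

    def append_child(self, child: "Node", from_parser: bool = False) -> None:
        self.children.append(child)
        # Internal hooks always run, whoever performed the insertion.
        for hook in self.internal_hooks:
            hook(child)
        # Mutation events are dispatched only for non-parser changes.
        # Simplification: listeners on this node only, no bubbling.
        if not from_parser:
            for listener in self.mutation_listeners:
                listener("DOMNodeInserted", child)
```

Under this arrangement the tree construction code would call
append_child(node, from_parser=True), while script-driven DOM calls would
use the default.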

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
