[whatwg] [wa1] Status of tree construction section
Ian Hickson
ian at hixie.ch
Wed Jun 13 15:34:27 PDT 2007
On Wed, 12 Jul 2006, Stewart Brodie wrote:
> > >
> > > In the main phase, section 'If the insertion mode is "in row"', the
> > > last option for 'anything else' says "process ... as if ... in
> > > table". I think that should say "as if ... in table body" instead.
> > > That case will re-throw the token out to "in table" in any case if
> > > it doesn't handle it.
> >
> > There'd be no difference. Any token that isn't handled by the "row"
> > mode will not be handled by the "table body" mode.
>
> Yes, I noticed that and agree, except that it just seemed to me that it
> would be more natural to expect unhandled things to be thrown to the
> next level of scope (table body) rather than bypassing it and going
> directly to the table.
I don't really see why. Seems like an additional level of indirection.
> > > I've come to the conclusion that you need pictures to accompany the
> > > "adoption agency algorithm". However, I'm not an artist. Indeed,
> > > I'm so bad at drawing pictures, that in the past, users often sent
> > > me replacement bitmap graphics for my programs because they found my
> > > attempts so distressing :-)
> >
> > Yeah, I completely agree. Diagrams and examples. If someone wants to
> > do a diagram here I'd be most happy. Failing that, I'll probably get
> > around to it in due course (e.g. once I'm convinced it actually
> > works).
>
> It is the most complex part of the tree construction. Perhaps in lieu
> of pictures in the short term, a short non-normative summary could be
> added describing what the algorithm is doing, because reverse
> engineering it from the 14-step plan is hard.
As I said, if someone wants to contribute examples, introduction
materials, diagrams, or other helping material, I'm certainly open to
adding it to the spec. I just don't want to do it myself until I'm
confident the spec is stable.
> > > The "parsing quirks" box lists several issues that I think are
> > > important. The <script> one in particular is so very common.
> > > Unfortunately, I had to cave in eventually and support that because
> > > it broke some customers' own sites.
> >
> > Can you describe what exactly the quirk is? I have yet to see an
> > algorithmic description of how to parse <script> blocks in quirks
> > mode. In my research and the research that other people have done, it
> > was found that every UA does it slightly differently. This is why I'd
> > really rather not do this. If you can tell me exactly what it is, I
> > might be more convinced to do it.
>
> Yes, it's hard to pin down. In effect, it's a new value for the content
> model flag which is like some sort of combination of RCDATA and
> PLAINTEXT. I'm not sure it's just a quirk, to be honest. I've tried the
> following snippet in Firefox, Opera & IE6 and they behave the same way
> regardless of the presence of a strict HTML4 doctype declaration before
> the <html>
>
> <html><title>The <!-- comment with a </title> in the --> title</title><body
> onload="document.body.appendChild(document.createTextNode(document.title))">
>
> In all cases, the window title and the text shown in the document body was:
>
> The <!-- comment with a </title> in the --> title
>
> The same behaviour appears to apply to TEXTAREA, SCRIPT, NOSCRIPT,
> NOFRAMES, NOEMBED. STYLE works differently in Firefox (it thinks that
> the content property's value terminates the style tag):
>
> <style> <!-- h1:after { content: '</style>'; color: red } --> </style>
>
> The rule seems to be that whilst you are lexing the contents of one of
> these magical elements, you have an additional flag, initialised to
> false, that indicates that you are inside a pseudo-comment. You
> continue to accumulate character tokens, but if you see the sequence
> <!-- and the flag is false, you set the flag to true. If the flag is
> true and you see the sequence -->, you set the flag to false. Whilst
> the flag is true, finding the < does not switch to the open tag state.
> The character tokens are all accumulated into the content of the
> element, regardless of whether they match the <!-- or --> markers.
It does indeed seem that CDATA and RCDATA have this behaviour in the
tokeniser in IE. Fixed.
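
For readers trying to follow the rule Stewart describes above, here is a
minimal sketch of it in Python. It is not taken from the spec or from any
browser; the function name scan_escaped_text and its arguments are
hypothetical. It also makes two simplifying assumptions that the
description leaves open: the <!-- and --> markers are consumed whole (so
they cannot overlap), and an end tag is recognised by the "</name" prefix
alone, without checking the character that follows.

```python
def scan_escaped_text(source: str, start: int, tag_name: str) -> tuple[str, int]:
    """Accumulate the text content of a TITLE/TEXTAREA/SCRIPT-style element.

    `start` is the index just after the element's start tag. Returns the
    accumulated text and the index of the terminating end tag, or the rest
    of the input and len(source) if no end tag is found.
    """
    end_tag = "</" + tag_name.lower()
    in_pseudo_comment = False  # the additional flag, initialised to false
    chars: list[str] = []
    i = start
    while i < len(source):
        if not in_pseudo_comment:
            if source.startswith("<!--", i):
                # Entering a pseudo-comment; the marker itself is still text.
                in_pseudo_comment = True
                chars.append("<!--")
                i += 4
                continue
            if source[i:i + len(end_tag)].lower() == end_tag:
                # Only outside a pseudo-comment can "<" end the element.
                return "".join(chars), i
        elif source.startswith("-->", i):
            # Leaving the pseudo-comment; the marker is still text.
            in_pseudo_comment = False
            chars.append("-->")
            i += 3
            continue
        chars.append(source[i])
        i += 1
    return "".join(chars), len(source)
```

Run against the TITLE test case quoted above, starting just after the
<title> start tag, this yields the same text the three browsers showed,
and stops at the second </title> rather than the one inside the comment.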
> > > Finally (for now ;-), right at the beginning of the tree
> > > construction section, it says that DOM Mutation events must not fire
> > > for changes caused by the UA parsing the document. I cannot decide
> > > whether or not I agree with that statement. My experimentation
> > > appears to show that this is indeed what happens in Firefox, at
> > > least. I put a script in the head of my document that attaches a
> > > listener for DOMNodeInserted on the document.documentElement node
> > > (i.e. the HTML element) and it never gets called due to nodes being
> > > added by the parser. Internally, for me, it's a PITA though,
> > > because my node tree construction code and DOM implementation code
> > > use the same internal APIs - and these automatically trigger the DOM
> > > events, which, in turn, get dispatched to the various internal
> > > default event handlers to deal with the special types of node that
> > > require additional behaviour (like IMG, LINK, META etc.).
> >
> > In Web browsers it's simply not an option. Having to fire mutation
> > events for every mutation according to the complete DOM3 Events model
> > is prohibitively expensive.
>
> To be honest, I've not found it a burden even on the sorts of low-end
> devices that our software runs on (typically 300MHz CPUs, 8MB RAM, that
> sort of thing). Then again, I have a highly optimised event dispatcher
> that takes steps to minimise the work, particularly when there are no
> DOM listeners for the event being raised, which will almost always be
> the case for the events concerned (DOMNodeInserted and
> DOMNodeInsertedIntoDocument and the Removed counterparts). The internal
> default event handlers have similar filtering to eliminate any
> unnecessary processing quickly.
Even minimal work is more than no work, and when you're dealing with
thousands of elements, that's a big difference (in the order of
milliseconds).
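
To make the trade-off concrete, here is a minimal sketch of the kind of
"no listeners" fast path Stewart describes. It is an illustration only,
not his implementation: the class EventDispatcher, its methods, and the
events_built counter are all hypothetical names. The event object is built
lazily through a callback so that nothing is allocated when nobody is
listening.

```python
from collections import defaultdict
from typing import Any, Callable


class EventDispatcher:
    """Hypothetical dispatcher that skips all work when nobody listens."""

    def __init__(self) -> None:
        self._listeners: defaultdict[str, list[Callable[[Any], None]]] = defaultdict(list)
        self.events_built = 0  # counts how often an event object was created

    def add_listener(self, event_type: str, callback: Callable[[Any], None]) -> None:
        self._listeners[event_type].append(callback)

    def dispatch(self, event_type: str, make_event: Callable[[], Any]) -> bool:
        """Dispatch an event; return False if the fast path was taken."""
        listeners = self._listeners.get(event_type)
        if not listeners:
            # Fast path: no event object is built, no callbacks are run.
            return False
        self.events_built += 1
        event = make_event()
        for callback in list(listeners):  # copy: listeners may be added mid-dispatch
            callback(event)
        return True
```

Even in this sketch the fast path still costs a dictionary lookup and a
branch per mutation, which is the point being made above: it is cheap,
but it is not free.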
> In the "in body" section, WBR doesn't really belong with a,b,big,em...
> because it never had content. It probably ought to go in with
> area,basefont,bgsound... a bit further down, or in its own section.
> There's no real point bothering with putting it in the list of active
> formatting elements, as it's coming off the stack again straight away.
Fixed.
Thanks,
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'