[whatwg] Namespaces and tag names in the HTML parser

Ian Hickson ian at hixie.ch
Mon Jul 1 15:40:04 PDT 2013


There have been a number of high-impact normative changes and high-risk 
editorial changes to the HTML parser in the last few days.

This refactored the parser logic for foster parenting:
   http://html5.org/tools/web-apps-tracker?from=7997&to=7998

This defined ownerDocument for the parser, which was previously quite 
ambiguous:
   http://html5.org/tools/web-apps-tracker?from=7998&to=7999

This added support for the new <template> element that Rafael and Tony 
specced and that Firefox and Chrome implement:
   http://html5.org/tools/web-apps-tracker?from=7999&to=8000

These changed the spec to fix a number of namespace-related bugs:
   http://html5.org/tools/web-apps-tracker?from=8000&to=8004

I would be very grateful to anyone who is able to review these changes. 
They are quite risky.


More comments below.

On Wed, 29 May 2013, Peter Occil wrote:
> > >
> > > The spec for what should happen to that <td> is the first step of 
> > > http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-intr
> > > 
> > > This case clearly seems like a bug in Gecko: it's treating the <math 
> > > tr> as if it's an HTML <tr>. That is, it's comparing only the local 
> > > name (or "tag name" as the spec usually refers to it).
> > 
> > Right, that's wrong. The spec isn't ambiguous here, it explicitly says 
> > that the current node must be a <tr> or <html> element, not an element 
> > with a "tr" or "html" tag name, and <tr> and <html> elements are in 
> > the HTML namespace (they're even hyperlinked to their definitions).
> 
> For some people it might be ambiguous: "while the current node is not a 
> tr element or an html element" doesn't refer to a namespace (if one 
> takes only the words, not their hyperlinks), so some people may believe 
> that only the tag names of "tr" and "html" are important. (While the 
> HTML spec states that "except where otherwise stated, all elements 
> defined or mentioned in this specification are in the HTML namespace", 
> the chance is high that some people might gloss over that, as I 
> certainly did when I implemented my HTML parser.)

Short of explicitly putting "in the HTML namespace" at every occurrence of 
this, I don't know how to fix this. Putting "in the HTML namespace" 
everywhere is a non-starter: there are something like ten thousand 
occurrences of element names in the spec. (Literally. Ten thousand.)


> If you're correct in this case, the words should have been "while the 
> current node is neither a tr element in the HTML namespace nor an html 
> element in the HTML namespace".
> 
> To avoid confusion, it would be helpful, in each case, to state whether, 
> say, "a table element" refers to "a table element in the HTML namespace" 
> or "a table element in any namespace".

It _always_ refers to "in the HTML namespace" except where the contrary is 
explicitly stated (which is relatively rare).
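
To make the distinction concrete, here is a minimal sketch (not spec text) 
of the two comparisons, applied to the "while the current node is not a tr 
element or an html element" wording quoted above. The Element class and 
the helper names are hypothetical, made up for illustration; only the two 
namespace URIs are real. The stack is hand-written, not parser output.

```python
from dataclasses import dataclass

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


@dataclass
class Element:
    """Hypothetical stand-in for a node on the stack of open elements."""
    namespace: str
    local_name: str


def has_tag_name(node, name):
    # The buggy reading: compares only the local name ("tag name").
    return node.local_name == name


def is_html_element(node, name):
    # The intended reading: "a tr element" means a tr in the HTML namespace.
    return node.namespace == HTML_NS and node.local_name == name


def clear_back_to_table_row_context(stack, matches):
    # Pop until the current node (top of stack) is a tr or html element,
    # using whichever comparison `matches` implements.
    while stack and not (matches(stack[-1], "tr") or matches(stack[-1], "html")):
        stack.pop()
    return stack


html_root = Element(HTML_NS, "html")
html_tr = Element(HTML_NS, "tr")
math_tr = Element(MATHML_NS, "tr")

# Illustrative stack with a MathML <tr> sitting above an HTML <tr>.
by_name = clear_back_to_table_row_context(
    [html_root, html_tr, math_tr], has_tag_name)
by_namespace = clear_back_to_table_row_context(
    [html_root, html_tr, math_tr], is_html_element)
```

With the name-only comparison the loop stops at the MathML <tr>; with the 
namespace-aware comparison it pops that node and stops at the HTML <tr>.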


On Thu, 30 May 2013, Rafael Weinstein wrote:
> On Wed, May 29, 2013 at 3:19 PM, Ian Hickson <ian at hixie.ch> wrote:
> >
> > Well, as noted in the bug, I don't think we should check the namespace 
> > in _every_ case. The case in the bug is this:
> >
> >    <body><table><tr><td><svg><td><foreignObject></td>Foo<foo>
> >
> > This is clearly invalid; the question is, what <td> did the author 
> > mean to match, if any? It makes sense to me to match the most recent 
> > one. In
> 
> Not that I care very much to attempt to support DWIM in this way, 
> because I think allowing parser implementations to maintain a sane 
> invariant here is more important, but...
> 
> I think it's more likely the author was being lazy about closing all the 
> svg tags and simply wanted a quick way to say "I'm done with my table 
> cell"

I don't know how to do that. In this case:

    <body><svg><g><foreignObject></g>

...the author "more likely" wanted to close the <svg:g>. How do we tell 
the difference? I think it's simpler to just assume you're closing the 
nearest one in scope, rather than trying to DWIM our way into the right 
case. Having said that, there were definitely bogus cases in the spec. 
I've tried to fix them.
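
As a rough illustration of "closing the nearest one", here is a simplified 
sketch that walks the stack of open elements from the top down and picks 
the first element whose local name matches the end tag, regardless of 
namespace. This is an assumption-laden simplification, not the spec's 
algorithm: the function name is hypothetical, the `boundaries` parameter 
is only a stand-in for the spec's scope rules, and both stacks are 
hand-written guesses at what the two examples above would produce.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def nearest_open_element(stack, tag_name, boundaries=frozenset()):
    """Return the index of the nearest open element named `tag_name`.

    `stack` is a list of (namespace, local_name) pairs, bottom first.
    Returns None if no match is found before a boundary element or the
    bottom of the stack. `boundaries` is a hypothetical simplification
    of the spec's notion of scope.
    """
    for index in range(len(stack) - 1, -1, -1):
        namespace, local_name = stack[index]
        if local_name == tag_name:
            return index
        if (namespace, local_name) in boundaries:
            return None
    return None


# Assumed stack for: <body><svg><g><foreignObject></g>
svg_stack = [
    (HTML_NS, "html"), (HTML_NS, "body"),
    (SVG_NS, "svg"), (SVG_NS, "g"), (SVG_NS, "foreignObject"),
]

# Assumed stack for: <body><table><tr><td><svg><td><foreignObject></td>
td_stack = [
    (HTML_NS, "html"), (HTML_NS, "body"), (HTML_NS, "table"),
    (HTML_NS, "tbody"), (HTML_NS, "tr"), (HTML_NS, "td"),
    (SVG_NS, "svg"), (SVG_NS, "td"), (SVG_NS, "foreignObject"),
]

g_index = nearest_open_element(svg_stack, "g")
td_index = nearest_open_element(td_stack, "td")
```

Under these assumptions, </g> matches the <svg:g> and </td> matches the 
inner <svg:td> rather than the HTML table cell, which is the "nearest one" 
behaviour rather than the DWIM one.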

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

