[whatwg] Namespaces and tag names in the HTML parser
ian at hixie.ch
Wed May 29 15:19:35 PDT 2013
On Wed, 27 Feb 2013, Adam Klein wrote:
> Consider the following script:
> tr = document.createElement('tr')
> tr.innerHTML = '<math><tr><mo><td>';
> That is, the fragment is parsed with tr as the context element. What
> should the generated DOM be?
Up to the <td> it's unambiguous and uncontroversial, I hope; and should
At the "<td>", you clear the stack back to a table row context, which pops
all the nodes from the stack except the root one (the <html> one,
representing the original <tr> element on which innerHTML was invoked).
It thus results in:
> Note that <mo> is a "MathML text integration point", which causes the
> <td> to be processed not as foreign content but as a normal HTML token.
> This leads to the following DOM in WebKit:
> <math math>
> <math tr>
> <math mo>
> (the "math" prefixes denote that these are elements with the MathML
That is correct.
> In Gecko, I instead get:
> <math math>
> <math tr>
> <math mo>
That is not.
> The spec for what should happen to that <td> is the first step of
> This case clearly seems like a bug in Gecko: it's treating the <math tr>
> as if it's an HTML <tr>. That is, it's comparing only the local name (or
> "tag name" as the spec usually refers to it).
Right, that's wrong. The spec isn't ambiguous here; it explicitly says
that the current node must be a <tr> or <html> element, not an element
with a "tr" or "html" tag name, and <tr> and <html> elements are in the
HTML namespace (they're even hyperlinked to their definitions).
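The difference is easier to see in a small model. The following is a minimal JavaScript sketch, not taken from the spec or from any browser's source. It represents the stack of open elements as hypothetical { ns, name } records and runs "clear the stack back to a table row context" (the step the <td> start tag triggers in "in row") on the stack left by '<math><tr><mo>'. It runs the step once with element-identity comparison and once with name-only comparison. The helper names (el, isHtml, hasName, clearToTableRowContext, makeStack) are illustrative assumptions.

```javascript
// Hypothetical, simplified model of the stack of open elements: each entry
// records only a namespace and a local name. Real parsers store DOM nodes.
const HTML_NS = "http://www.w3.org/1999/xhtml";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

const el = (ns, name) => ({ ns, name });

// Element identity: local name AND namespace must match
// (what "a tr element" means in the spec).
const isHtml = (node, name) => node.ns === HTML_NS && node.name === name;

// Name-only comparison (the Gecko behavior described above).
const hasName = (node, name) => node.name === name;

// "Clear the stack back to a table row context": pop until the current node
// is a tr, template, or html element. `matches` selects the comparison used.
function clearToTableRowContext(stack, matches) {
  const stops = ["tr", "template", "html"];
  while (
    stack.length > 0 &&
    !stops.some((n) => matches(stack[stack.length - 1], n))
  ) {
    stack.pop();
  }
  return stack;
}

// Stack after parsing '<math><tr><mo>' with a <tr> context element; the root
// is an HTML <html> element standing in for the context element.
const makeStack = () => [
  el(HTML_NS, "html"),
  el(MATHML_NS, "math"),
  el(MATHML_NS, "tr"),
  el(MATHML_NS, "mo"),
];

const spec = clearToTableRowContext(makeStack(), isHtml);
const gecko = clearToTableRowContext(makeStack(), hasName);
console.log(spec.map((e) => e.name)); // [ 'html' ]
console.log(gecko.map((e) => e.name)); // [ 'html', 'math', 'tr' ]
```

With the namespace-aware check only the root survives, so the <td> is inserted directly under it, which matches the WebKit result. The name-only check stops at the MathML tr, which corresponds to the behavior reported for Gecko.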
> But this same ambiguity exists elsewhere in the spec. For example, the
> very next item under "in row" says "If the stack of open elements does
> not have an element in table scope with the same tag name as the token"
> (in this case, it's looking for a <tr>).
Yeah, that text is wrong, because some of the rules look for <*:tr> and
others assume that only <html:tr> was matched. In fact, it means that
tr.innerHTML = '<math><tr><mo></tr>' has no parse error and pops the root
<html> off the tree! That's clearly bogus.
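The following sketch traces that case under the same kind of hypothetical { ns, name } stack model. It is a simplified paraphrase of the "in row" handling of a "tr" end tag, not the spec text. The scope check matches by tag name only, while the clearing step and the final pop assume an HTML <tr>. The names inTableScopeByName and endTagTr are illustrative.

```javascript
// Hypothetical, simplified stack model: { ns, name } records only.
const HTML_NS = "http://www.w3.org/1999/xhtml";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";
const el = (ns, name) => ({ ns, name });
const isHtml = (node, name) => node.ns === HTML_NS && node.name === name;

// Table scope boundaries are HTML elements only.
const TABLE_SCOPE = ["html", "table", "template"];

// "Has an element in table scope with the same tag name as the token",
// read literally: the target is matched by local name alone.
function inTableScopeByName(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const node = stack[i];
    if (node.name === name) return true; // the MathML tr matches here
    if (TABLE_SCOPE.some((n) => isHtml(node, n))) return false;
  }
  return false;
}

// Simplified "</tr>" steps in the "in row" insertion mode.
function endTagTr(stack) {
  if (!inTableScopeByName(stack, "tr")) return null; // parse error; ignore
  // Clear the stack back to a table row context (HTML tr/template/html).
  while (
    stack.length > 0 &&
    !["tr", "template", "html"].some((n) => isHtml(stack[stack.length - 1], n))
  ) {
    stack.pop();
  }
  // Pop the current node, which these steps assume is an HTML <tr>.
  return stack.pop();
}

// Stack after '<math><tr><mo>' with a <tr> context element.
const stack = [
  el(HTML_NS, "html"),
  el(MATHML_NS, "math"),
  el(MATHML_NS, "tr"),
  el(MATHML_NS, "mo"),
];
const popped = endTagTr(stack);
console.log(popped.name, stack.length); // html 0
```

The scope check succeeds because of the MathML tr, but clearing skips past that element because it is not an HTML tr. The final pop then removes the root, leaving the stack empty.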
> I think the HTML parser ought to specify more precisely how to deal with
> namespaces in the stack of open elements, given that that stack can
> contain elements of varying namespaces.
It's not so much that it has to do it precisely (it does), it's that it
has to do it accurately...
There's a huge number of places in the spec that do tag name comparisons
rather than element identity (tag+namespace) comparisons, and it's not at
all clear to me that they should all change. Consider:
On Fri, 15 Mar 2013, Rafael Weinstein wrote:
> I just opened another similar bug:
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=21292 which has a similar
> root cause.
> I agree with Adam that it seems wrong that the stack of open elements
> can contain elements in disparate namespaces, but its operation (at
> times) only examines the local name (e.g. checking if an element is in a
> specific scope, popping elements from the stack of open elements until
> an element with the same tag name...)
Well, as noted in the bug, I don't think we should check the namespace in
_every_ case. The case in the bug is this:
This is clearly invalid; the question is, what <td> did the author mean to
match, if any? It makes sense to me to match the most recent one. In
particular, consider these variations:
The cases in the spec now that are bogus are the cases where I mix one and
the other. That actually means the opposite kind of change from the one being
proposed above: for example, it would mean changing the "table" end tag
steps from what they say now (popping an HTML <table> element) to popping
any "table" element regardless of namespace. This would make the algorithm
more consistent, and remove the bugs mentioned above.
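One possible reading of that proposal is sketched below under a hypothetical { ns, name } stack model; it is not spec text. Both the scope check and the popping compare local names only. The pop removes exactly the element the scope check found, which is the most recently opened element with that name. endTagByName and the other helpers are illustrative names.

```javascript
// Hypothetical, simplified stack model: { ns, name } records only.
const HTML_NS = "http://www.w3.org/1999/xhtml";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";
const el = (ns, name) => ({ ns, name });
const isHtml = (node, name) => node.ns === HTML_NS && node.name === name;
const TABLE_SCOPE = ["html", "table", "template"];

// End-tag handling where the scope check and the popping both match by local
// name, regardless of namespace, so they always agree on the target.
function endTagByName(stack, name) {
  // Find the most recently opened element with this name, stopping at
  // HTML table-scope boundaries (the target is checked first, as in the spec).
  let index = -1;
  for (let i = stack.length - 1; i >= 0; i--) {
    if (stack[i].name === name) {
      index = i;
      break;
    }
    if (TABLE_SCOPE.some((n) => isHtml(stack[i], n))) break;
  }
  if (index === -1) return null; // parse error; ignore the token
  // Pop everything up to and including the element that was found.
  return stack.splice(index);
}

const stack = [
  el(HTML_NS, "html"),
  el(MATHML_NS, "math"),
  el(MATHML_NS, "tr"),
  el(MATHML_NS, "mo"),
];
const popped = endTagByName(stack, "tr");
console.log(popped.map((e) => e.name)); // [ 'tr', 'mo' ]
console.log(stack.map((e) => e.name)); // [ 'html', 'math' ]
```

Popping until a same-named element has been popped always stops at the most recent match, so reusing the index from the scope check gives the same result. With that reuse, the two steps cannot disagree the way the current text does, and the root is left in place.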
Is this what people want to do? It's not what you (Adam) implemented, as I
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.