[whatwg] Namespaces and tag names in the HTML parser

Thu Aug 1 11:22:23 PDT 2013

Many of these cases occur in the normative portion of the tree construction
stage.  Most of them involve checking whether an element (as opposed to
a tag token) has a certain name, without saying which namespace is meant.

Accordingly, these cases are ambiguous:

* If foster parenting is enabled and target is a table, tbody, tfoot, thead,
  or tr element
* Let last template be the last template element in the stack of open
  elements, if any.
* Let last table be the last table element in the stack of open elements,
  if any.
* If the adjusted insertion location is inside a template element, let it
  instead be inside the template element's template contents [first
  instance only]
* When the steps below require the UA to generate implied end tags, then,
  while the current node is a dd element, a dt element, an li element, an
  option element, an optgroup element, a p element, an rp element, or an
  rt element, the UA must pop the current node off the stack of open
  elements.
* Create an html element whose ownerDocument is the Document object.
  [doesn't mention the namespace]
* If there is no template element on the stack of open elements, then this
  is a parse error; ignore the token.
* If the current node is not a template element, then this is a parse error.
* Pop elements from the stack of open elements until a template element has
  been popped from the stack.
* If there is a template element on the stack of open elements, ignore the
  token.
* If the second element on the stack of open elements is not a body element,
  [...] or if there is a template element on the stack of open elements,
  then ignore the token.

  And more.
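
To illustrate why a name-only check is ambiguous, here is a minimal Python
sketch of the "generate implied end tags" step quoted above.  The Element
class, the html_only flag, and the sample stack are hypothetical, made up
for illustration only; none of them come from the spec.

```python
from dataclasses import dataclass

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Names listed in the "generate implied end tags" step quoted above.
IMPLIED_END_TAG_NAMES = {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}


@dataclass
class Element:
    """Hypothetical stand-in for a node on the stack of open elements."""
    namespace: str
    local_name: str


def generate_implied_end_tags(stack, html_only):
    """Pop the current node while it matches the implied-end-tag list.

    html_only=False is the "any element with that name" reading;
    html_only=True is the "element in the HTML namespace" reading.
    """
    while stack:
        node = stack[-1]
        name_matches = node.local_name in IMPLIED_END_TAG_NAMES
        ns_matches = node.namespace == HTML_NS or not html_only
        if not (name_matches and ns_matches):
            break
        stack.pop()
    return stack


def sample_stack():
    # Assumed stack whose current node is an "option" element in the
    # SVG namespace.
    return [
        Element(HTML_NS, "html"),
        Element(SVG_NS, "svg"),
        Element(SVG_NS, "option"),
    ]


by_name = generate_implied_end_tags(sample_stack(), html_only=False)
by_namespace = generate_implied_end_tags(sample_stack(), html_only=True)
print(len(by_name), len(by_namespace))
```

Under the name-only reading the SVG "option" element is popped; under the
HTML-namespace reading it stays on the stack.  The two readings leave the
parser in different states, which is the ambiguity at issue.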

But these cases aren't ambiguous:

* [L]et adjusted insertion location be inside the first element in the stack
  of open elements (the html element) ... [explanatory only]
* [I]t's possible for elements, the table element in this case in
  particular, to have been moved by a script around in the DOM ...
  [appears in a note]
* [A]ssociate the newly created element with the form element pointed to by
  the form element pointer
* Set the head element pointer to the newly created head element.
* If the parser was originally created for the HTML fragment parsing
  algorithm, then mark the script element as "already started".
* Pop the current node (which will be the head element) off the stack of
  open elements. [Appears twice]
* Pop the current node (which will be a noscript element) from the stack of
  open elements; the new current node will be a head element. [Appears
  twice]
* [F]or each attribute on the token, check to see if the attribute is
  already present on the body element (the second element) on the stack of
  open elements

And more.

As you can see, it's really only a few dozen ambiguous cases, not thousands.
Plus they all seem to follow one of these patterns:

* If the node is a so-and-so element
* While the node is a so-and-so element, a such-and-such element, etc.
* The last so-and-so element on the stack of open elements
* If there is a so-and-so element on the stack of open elements
* Until a so-and-so element has been popped from the stack
* If the list of active formatting elements contains a so-and-so element
* Have a so-and-so element in button scope, table scope, etc.

(One exception is "Create an html element whose ownerDocument is the
Document object.")

Moreover, where needed, a shortcut is to use "an HTML so-and-so element"
rather than "a so-and-so element in the HTML namespace".  (This can apply
similarly to SVG and MathML.)
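
Because the stack-related patterns above are so few, each could be defined
once in terms of "an HTML so-and-so element".  Here is a rough Python
sketch of that idea; all the names in it (Element, is_html_element,
last_html_element, has_html_element, pop_until_html_element) are
hypothetical helpers of my own, not anything defined by the spec.

```python
from collections import namedtuple

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical stand-in for a node on the stack of open elements.
Element = namedtuple("Element", ["namespace", "local_name"])


def is_html_element(node, name):
    """'If the node is an HTML so-and-so element'."""
    return node.namespace == HTML_NS and node.local_name == name


def last_html_element(stack, name):
    """'The last HTML so-and-so element on the stack of open elements'."""
    for node in reversed(stack):
        if is_html_element(node, name):
            return node
    return None


def has_html_element(stack, name):
    """'If there is an HTML so-and-so element on the stack of open elements'."""
    return last_html_element(stack, name) is not None


def pop_until_html_element(stack, name):
    """'Until an HTML so-and-so element has been popped from the stack'."""
    while stack:
        if is_html_element(stack.pop(), name):
            break


# Assumed example: a "template" element in the SVG namespace sits above an
# HTML template element on the stack.
stack = [
    Element(HTML_NS, "html"),
    Element(HTML_NS, "template"),
    Element(SVG_NS, "svg"),
    Element(SVG_NS, "template"),
]
print(last_html_element(stack, "template").namespace == HTML_NS)
pop_until_html_element(stack, "template")
print([node.local_name for node in stack])
```

With the namespace check in one place, every sentence that follows one of
the patterns gets the same meaning without repeating "in the HTML
namespace" each time.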

--Peter

-----Original Message----- 
From: Ian Hickson
Sent: Thursday, August 01, 2013 1:31 PM
To: Peter Occil
Cc: WHATWG
Subject: Re: Namespaces and tag names in the HTML parser

On Wed, 10 Jul 2013, Peter Occil wrote:
> >
> > Short of explicitly putting "in the HTML namespace" at every
> > occurrence of this, I don't know how to fix this. Putting "in the HTML
> > namespace" everywhere is a non-starter, there's something like ten
> > thousand occurrences of element names in the spec. (Literally. Ten
> > thousand.)
>
> I don't mean in the entire HTML spec, I only mean within the tree
> construction section, and then only where it eliminates ambiguity, such
> as "while the current node is not a tr element or an html element", as I
> stated previously.  I agree it's silly to include the words "in the HTML
> namespace" everywhere in the spec.

I don't really understand why that case is ambiguous, but thousands of
others aren't. Can you elaborate on what the difference is?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'