[whatwg] Namespaces and tag names in the HTML parser
Peter Occil
poccil14 at gmail.com
Thu Aug 1 11:22:23 PDT 2013
Many of these cases occur in the normative portion of the tree construction
stage. Most of them involve checking whether an element (as opposed to
a tag token) has a certain name.
Accordingly, these cases are ambiguous:
* If foster parenting is enabled and target is a table, tbody, tfoot, thead,
or tr element
* Let last template be the last template element in the stack of open
elements, if any.
* Let last table be the last table element in the stack of open elements, if
any.
* If the adjusted insertion location is inside a template element, let it
instead be inside the template element's template contents [first instance
only]
* When the steps below require the UA to generate implied end tags, then,
while the current node is a dd element, a dt element, an li element, an
option element, an optgroup element, a p element, an rp element, or an rt
element, the UA must pop the current node off the stack of open elements.
* Create an html element whose ownerDocument is the Document object.
[doesn't mention the namespace]
* If there is no template element on the stack of open elements, then this
is a parse error; ignore the token.
* If the current node is not a template element, then this is a parse error.
* Pop elements from the stack of open elements until a template element has
been popped from the stack.
* If there is a template element on the stack of open elements, ignore the
token.
* If the second element on the stack of open elements is not a body element,
[...] or if there is a template element on the stack of open elements, then
ignore the token.
And more.
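
To illustrate why wording like "a template element on the stack of open
elements" can be read two ways, here is a minimal Python sketch. The Element
class, the two helper functions, and the sample stack are hypothetical names
made up for this illustration; none of them come from the spec. The sketch
assumes the stack holds an element in the SVG namespace whose local name
happens to be "template".

```python
from dataclasses import dataclass

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a node on the stack of open elements."""
    namespace: str
    local_name: str


def has_named_element(stack, name):
    """Name-only reading: an element in any namespace matches."""
    return any(el.local_name == name for el in stack)


def has_html_element(stack, name):
    """Namespace-qualified reading: only HTML-namespace elements match."""
    return any(el.namespace == HTML_NS and el.local_name == name
               for el in stack)


# Assumed stack: the only "template" on it is in the SVG namespace.
stack = [
    Element(HTML_NS, "html"),
    Element(HTML_NS, "body"),
    Element(SVG_NS, "svg"),
    Element(SVG_NS, "template"),
]

name_only = has_named_element(stack, "template")  # matches the SVG element
qualified = has_html_element(stack, "template")   # finds no HTML template
```

The two readings disagree on this stack, which is the kind of difference the
cases above leave open.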
But these cases aren't ambiguous:
* [L]et adjusted insertion location be inside the first element in the stack
of open elements (the html element) ... [explanatory only]
* [I]t's possible for elements, the table element in this case in particular,
to have been moved by a script around in the DOM ... [appears in a note]
* [A]ssociate the newly created element with the form element pointed to by
the form element pointer
* Set the head element pointer to the newly created head element.
* If the parser was originally created for the HTML fragment parsing
algorithm, then mark the script element as "already started".
* Pop the current node (which will be the head element) off the stack of
open elements. [Appears twice]
* Pop the current node (which will be a noscript element) from the stack of
open elements; the new current node will be a head element. [Appears twice]
* [F]or each attribute on the token, check to see if the attribute is
already present on the body element (the second element) on the stack of
open elements
And more.
As you can see, it's really only a few dozen ambiguous cases, not thousands.
Plus they all seem to follow one of these patterns:
* If the node is a so-and-so element
* While the node is a so-and-so element, a such-and-such element, etc.
* The last so-and-so element on the stack of open elements
* If there is a so-and-so element on the stack of open elements
* Until a so-and-so element has been popped from the stack
* If the list of active formatting elements contains a so-and-so element
* Have a so-and-so element in button scope, table scope, etc.
(One exception is "Create an html element whose ownerDocument is the
Document object.")
Moreover, where needed, a shortcut is to use "an HTML so-and-so element"
rather than "a so-and-so element in the HTML namespace". (This can apply
similarly to SVG and MathML.)
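
As a rough sketch of how a few of the patterns above might look once "an
HTML so-and-so element" is taken to mean "namespace and local name both
match", consider the following Python. Every name here (Element, is_html,
last_html_on_stack, pop_until_html, generate_implied_end_tags) is
hypothetical and chosen for this example only; the list of implied-end-tag
names is the one quoted in the ambiguous cases earlier in this message.

```python
from dataclasses import dataclass
from typing import List, Optional

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Names from the "generate implied end tags" step quoted earlier.
IMPLIED_END_TAG_NAMES = ("dd", "dt", "li", "option", "optgroup",
                         "p", "rp", "rt")


@dataclass
class Element:
    """Hypothetical stand-in for a node on the stack of open elements."""
    namespace: str
    local_name: str


def is_html(node: Element, *names: str) -> bool:
    """'If the node is an HTML so-and-so element (or such-and-such ...)'."""
    return node.namespace == HTML_NS and node.local_name in names


def last_html_on_stack(stack: List[Element], name: str) -> Optional[Element]:
    """'The last so-and-so element on the stack of open elements', if any."""
    for node in reversed(stack):
        if is_html(node, name):
            return node
    return None


def pop_until_html(stack: List[Element], name: str) -> None:
    """'Until a so-and-so element has been popped from the stack'."""
    while stack:
        if is_html(stack.pop(), name):
            return


def generate_implied_end_tags(stack: List[Element]) -> None:
    """'While the current node is a so-and-so element, ...' pop it."""
    while stack and is_html(stack[-1], *IMPLIED_END_TAG_NAMES):
        stack.pop()


# Assumed stack; the topmost "li" is deliberately in the MathML namespace.
stack = [
    Element(HTML_NS, "html"),
    Element(HTML_NS, "template"),
    Element(HTML_NS, "p"),
    Element(MATHML_NS, "li"),
]
generate_implied_end_tags(stack)  # pops nothing: current node is not HTML
```

With every check routed through one helper such as is_html, the namespace
needs to be stated only once, which is what the shorthand wording does in
prose.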
--Peter
-----Original Message-----
From: Ian Hickson
Sent: Thursday, August 01, 2013 1:31 PM
To: Peter Occil
Cc: WHATWG
Subject: Re: Namespaces and tag names in the HTML parser
On Wed, 10 Jul 2013, Peter Occil wrote:
> >
> > Short of explicitly putting "in the HTML namespace" at every
> > occurrence of this, I don't know how to fix this. Putting "in the HTML
> > namespace" everywhere is a non-starter, there's something like ten
> > thousand occurrences of element names in the spec. (Literally. Ten
> > thousand.)
>
> I don't mean in the entire HTML spec, I only mean within the tree
> construction section, and then only where it eliminates ambiguity, such
> as "while the current node is not a tr element or an html element", as I
> stated previously. I agree it's silly to include the words "in the HTML
> namespace" everywhere in the spec.
I don't really understand why that case is ambiguous, but thousands of
others aren't. Can you elaborate on what the difference is?
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'