[whatwg] On tag inference
hsivonen at iki.fi
Mon Aug 29 12:29:08 PDT 2005
What kind of approach to tag inference can HTML5 be expected to take?
For an SGML validator that is parsing HTML 4, the set of possible
element names is finite. However, a browser needs to deal with an
infinite set of potential element names. Therefore, it makes a
difference whether end tag inference is based on what is allowed as a
child of an element or on which elements are not allowed as children.
Is 'foo' an element that is not allowed as a child of 'p' and, therefore,
implicitly closes the 'p'? Or is 'foo' not on the list of elements that
close 'p' and, therefore, does not implicitly close it? Which way are
the inference rules going to be defined?
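To make the difference concrete, here is a minimal Python sketch of the two possible formulations of the rule that closes 'p'. The element sets and the function names closes_p_allow_list and closes_p_deny_list are hypothetical and abridged, not taken from HTML 4 or any HTML5 draft; the only point is how each formulation treats an unknown name such as 'foo'.

```python
# Hypothetical, deliberately abridged element sets; not taken from HTML 4
# or from any HTML5 draft.
ALLOWED_IN_P = {"a", "em", "strong", "span", "br", "img"}  # children 'p' may contain
CLOSES_P = {"p", "div", "ul", "ol", "table", "pre", "h1"}   # start tags that end 'p'


def closes_p_allow_list(name: str) -> bool:
    """Infer the end of 'p' for any start tag NOT known to be a permitted child."""
    return name not in ALLOWED_IN_P


def closes_p_deny_list(name: str) -> bool:
    """Infer the end of 'p' only for start tags explicitly listed as closing it."""
    return name in CLOSES_P


for tag in ("em", "div", "foo"):
    print(tag, closes_p_allow_list(tag), closes_p_deny_list(tag))
```

The two formulations agree on every known name, but with the allow-list formulation 'foo' closes the 'p', while with the deny-list formulation it nests inside it.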
As far as I can tell, there are four kinds of inference needed when
parsing *conforming* documents (ie. no second stack for residual style)*:
1) Element end causes the end of the element that is on the top of the
stack.
2) End of the data stream causes the end of the element that is on the
top of the stack.
3) Element start causes the end of the element that is on the top of
the stack.
4) Element start causes another element start before itself.
Is this list complete?
I am assuming that for *conforming* documents, the above-mentioned
inference decisions can be taken by observing the top of the stack and
the element name associated with the current end or start element
event. Correct? (I am assuming the rules may be applied repeatedly. Ie.
null on stack and start 'title' implies 'html' start. 'html' on stack
and start 'title' implies 'head' start. 'head' on stack and start
'title' implies nothing and the start 'title' goes through.)
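The repeated application described above can be sketched as a loop that consults only the top of the stack and the current start tag name. This is a minimal sketch assuming a hypothetical map IMPLIED_START that contains just the entries from the 'title' example, with None standing for an empty stack; rule #3 is left out for brevity.

```python
from typing import Optional

# Hypothetical inference map for rule #4: (top of stack, start tag) -> the
# start tag to infer first. None stands for an empty stack. Only the entries
# needed for the 'title' example are included.
IMPLIED_START: dict[tuple[Optional[str], str], str] = {
    (None, "title"): "html",
    ("html", "title"): "head",
}


def start_element(stack: list[str], name: str) -> list[str]:
    """Push 'name', applying rule #4 repeatedly first; return inferred tags."""
    inferred = []
    while True:
        top = stack[-1] if stack else None
        implied = IMPLIED_START.get((top, name))
        if implied is None:  # e.g. 'head' on top: nothing more to infer
            break
        stack.append(implied)
        inferred.append(implied)
    stack.append(name)
    return inferred


stack: list[str] = []
print(start_element(stack, "title"))  # ['html', 'head']
print(stack)                          # ['html', 'head', 'title']
```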
It seems to me that #3 is the tricky case in terms of interaction with
unknown element names. #1 and #2 require a list of elements whose end
tag is optional. #4 requires a map of top of stack plus current start
pairs to inferred start tags.
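For rules #1 and #2, a minimal sketch assuming a hypothetical, abridged set OPTIONAL_END_TAGS: an end tag or the end of the stream pops elements off the top of the stack, and in a conforming document every element popped without its own end tag must be one whose end tag is optional. The function names and the use of ValueError for non-conforming input are illustrative assumptions.

```python
# Hypothetical, abridged list of elements whose end tag may be omitted.
OPTIONAL_END_TAGS = {"p", "li", "td", "tr", "head", "body", "html"}


def end_element(stack: list[str], name: str) -> list[str]:
    """Rule #1: pop elements with optional end tags until 'name' is on top,
    then pop 'name' itself. Return the names whose ends were inferred."""
    inferred = []
    while stack and stack[-1] != name:
        if stack[-1] not in OPTIONAL_END_TAGS:
            raise ValueError(f"</{name}> while <{stack[-1]}> is still open")
        inferred.append(stack.pop())
    if not stack:
        raise ValueError(f"</{name}> with no matching open element")
    stack.pop()  # the explicit end tag closes 'name' itself
    return inferred


def end_of_stream(stack: list[str]) -> list[str]:
    """Rule #2: at the end of the stream, close everything still open, which
    in a conforming document must consist only of optional-end-tag elements."""
    inferred = []
    while stack:
        if stack[-1] not in OPTIONAL_END_TAGS:
            raise ValueError(f"<{stack[-1]}> not closed before end of stream")
        inferred.append(stack.pop())
    return inferred


stack = ["html", "body", "ul", "li", "p"]
print(end_element(stack, "ul"))  # ['p', 'li']
print(end_of_stream(stack))      # ['body', 'html']
```

Here the end tag of 'ul' implies the ends of the open 'p' and 'li', and the end of the stream implies the ends of 'body' and 'html'.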
* I am assuming an implementation maintains a stack of open elements or
works directly on a parse tree, in which case the path from the current
node to the root has the same role as the stack.
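The equivalence in the footnote can be illustrated with a tree whose nodes keep parent pointers: walking from the current node to the root and reversing the path yields the stack of open elements. The Node class and the open_elements function are hypothetical, written only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)

    def append(self, name: str) -> "Node":
        """Create a child element and return it as the new current node."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def open_elements(current: Optional[Node]) -> list[str]:
    """Walk from the current node to the root; reversed, this path is the
    stack of open elements (root first, current node on top)."""
    path = []
    while current is not None:
        path.append(current.name)
        current = current.parent
    return path[::-1]


html = Node("html")
body = html.append("body")
p = body.append("p")
print(open_elements(p))         # ['html', 'body', 'p']
# Ending an element corresponds to moving the current node to its parent.
print(open_elements(p.parent))  # ['html', 'body']
```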