[imps] Problem with the tree-construction test cases and implied body

Thomas Broyer t.broyer at gmail.com
Tue Sep 11 01:00:55 PDT 2007


2007/9/11, Thomas Broyer:
> 2007/9/11, Anne van Kesteren:
> > Given that <html>, </html>, <head>, </head>, <body> and </body> are all
> > optional in the language it didn't seem logical to make this a parse
> > error. I like to believe I'm correct in that interpretation.
> > (Incidentally, I also wrote the implementation. Incidentally, this was
> > tested against testcases written by Hixie himself.)
>
> Hmm, that's a pretty good point! ;-)
>
> ...so let's fix the spec (or rather, note it for when we'll solve the
> "big issue")

FYI, I've fixed it in Twintsam by testing for "head" in addition to
"body" in the EOF case of the main phase. The spec could read (changes
marked with <ins>):
<<<
An end-of-file token

    Generate implied end tags.

    If there are more than two nodes on the stack of open elements, or
    if there are two nodes but the second node is not <ins>a head node
    or</ins> a body node, this is a parse error.

    Otherwise, if the parser was originally created as part of the
    HTML fragment parsing algorithm, and there's more than one element in
    the stack of open elements, and the second node on the stack of open
    elements is not <ins>a head node or</ins> a body node, then this is a
    parse error. (fragment case)

    Stop parsing.
>>>
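
As a rough illustration only (this is not Twintsam's code, which is C#),
the proposed wording could be sketched in Python as follows. The function
name, the representation of the stack as a list of tag names, and the
allowed_second parameter are all assumptions made for this sketch; implied
end tags are assumed to have been generated already.

```python
def eof_is_parse_error(open_elements, fragment_case=False,
                       allowed_second=("head", "body")):
    """Return True if EOF in the main phase would be a parse error.

    open_elements: tag names on the stack of open elements, first
    element first, e.g. ["html", "body"] (hypothetical representation).
    allowed_second: what the second node may be; "head" is the proposed
    addition, pass ("body",) to get the current wording.
    """
    count = len(open_elements)

    # First clause: more than two nodes, or two nodes whose second
    # node is neither a head node nor a body node.
    if count > 2:
        return True
    if count == 2 and open_elements[1] not in allowed_second:
        return True

    # "Otherwise" clause (fragment case), kept separate to mirror the
    # structure of the spec text.
    if fragment_case and count > 1 and open_elements[1] not in allowed_second:
        return True

    return False
```

With the default allowed_second, a stack of ["html", "head"] at EOF is no
longer reported as a parse error, while passing allowed_second=("body",)
reproduces the current behaviour.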

Note that I've also changed the "fragment case", though I'm really not
sure it should be changed in the same way. At least it makes no
difference on the available test cases (or rather, Twintsam passes and
fails exactly the same tests either way; but Twintsam is far from
finished).


N.B.: If you're interested in how Twintsam handles EOF (and how it
ensures every produced document has a head and a body), look for
"ProcessEndOfFile" in
<http://twintsam.googlecode.com/svn/trunk/Twintsam/Html/HtmlReader.Parsing.cs>.
Keep in mind that the HtmlReader class is a System.Xml.XmlReader
subclass and that it "generates tokens" (its goal is to "fix" the
markup to produce well-formed XML). I'll soon add a tree-builder class
to complement the HtmlReader and handle the reparenting cases (title
goes into the head, things inside a table but not in a cell are moved
outside the table, etc.). I'm not yet sure it's even feasible, but
let's try.

-- 
Thomas Broyer


