[whatwg] Bug in "Before DOCTYPE name state"?

Thu Dec 21 23:38:48 PST 2006

2006/12/22, Ian Hickson:
> On Thu, 21 Dec 2006, Thomas Broyer wrote:
> >
> > Why is the DOCTYPE marked "in error" in the former case?
>
> Because otherwise this document:
>
>    <!DOCTYPEH
>
> ...would emit a DOCTYPE that is not in error (since the token would be
> emitted before the bit at the end of the DOCTYPE name state).

Doh! right.

> > In other words, why would <!DOCTYPE html> be "in error" while
> > <!DOCTYPE Html> wouldn't?
>
> Both would be not in error, because of the sentence at the end of the
> DOCTYPE name state.

OK, now understood (thank you, Simon, for having enlightened me).
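
For the record, here is a minimal sketch of how I now read the two points
above. It is a simplification under stated assumptions, not the spec's
algorithm: doctype_name is a hypothetical helper, it only looks at the
characters that follow "<!DOCTYPE", and it assumes the "HTML" check is
re-run after each character handled in the DOCTYPE name state.

```python
def doctype_name(chars):
    """Return (name, in_error) for the characters following '<!DOCTYPE'.

    Hypothetical helper; a simplified reading of the draft as discussed
    in this thread, not the tokenizer state machine itself.
    """
    chars = chars.lstrip()          # whitespace before the name is skipped
    name, in_error = "", True       # token is created already marked in error
    for i, ch in enumerate(chars):
        if ch == ">" or ch.isspace():
            break                   # name is finished, token gets emitted
        name += ch.upper()          # lowercase letters are uppercased
        if i > 0:
            # Assumed: the check at the end of the DOCTYPE name state only
            # runs for characters handled in that state, not for the first.
            in_error = name != "HTML"
    return name, in_error


results = {s: doctype_name(s) for s in ["H", " html>", " Html>", " HTMLX>"]}
```

With the initial "in error" mark, "<!DOCTYPEH" at end of input stays in
error, while "<!DOCTYPE html>" and "<!DOCTYPE Html>" both end up correct.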

> On Thu, 21 Dec 2006, Thomas Broyer wrote:
> >
> > But it also has this note, which is quite confusing: "Because lowercase
> > letters in the name are uppercased by the algorithm above, the "HTML"
> > letters are actually case-insensitive relative to the markup."
>
> How is it confusing? I would clarify it, but I don't know what is
> confusing.

Maybe there's no need to clarify it; it might just have been me…

> > It remains that the tokenization stage is a bit confusing…
>
> Yes. The tree construction stage is even worse. Just implement it exactly
> as written with no interpretation and you should be fine. ;-)

My "problem" is that I'm not implementing an "emitting" parser (à la
SAX) but a "pulling" parser: I stop as soon as I've found a token and
return true to say "hey, I've changed the TokenType, Name, Value, etc.
properties to reflect a new token".
...so I'm interpreting ;-)
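
To make the "pulling" shape concrete, here is a rough sketch. The class
name PullTokenizer, the read() method and the toy regular expression are
all assumptions made for illustration; the regex is nowhere near the
tokenizer states in the spec. Only the idea of returning true and
exposing the token through properties comes from what I described above.

```python
import re

# Toy grammar only: this is NOT the HTML5 tokenizer state machine.
_TOKEN = re.compile(
    r"<!DOCTYPE\s+(?P<doctype>[^>\s]+)[^>]*>"
    r"|</(?P<end>[A-Za-z][^>\s]*)[^>]*>"
    r"|<(?P<start>[A-Za-z][^>\s/]*)[^>]*>"
    r"|(?P<text>[^<]+|<)",          # fallback: plain text or a stray '<'
    re.IGNORECASE,
)


class PullTokenizer:
    """Hypothetical pull-style reader: the caller asks for the next token."""

    def __init__(self, markup):
        self._markup = markup
        self._pos = 0
        self.token_type = None
        self.name = None
        self.value = None

    def read(self):
        """Advance to the next token; return False at end of input."""
        if self._pos >= len(self._markup):
            return False
        match = _TOKEN.match(self._markup, self._pos)  # always matches
        self._pos = match.end()
        kind = match.lastgroup
        text = match.group(kind)
        self.name = self.value = None
        if kind == "doctype":
            self.token_type, self.name = "DOCTYPE", text.upper()
        elif kind == "start":
            self.token_type, self.name = "StartTag", text.lower()
        elif kind == "end":
            self.token_type, self.name = "EndTag", text.lower()
        else:
            self.token_type, self.value = "Character", text
        return True


reader = PullTokenizer("<!DOCTYPE html><p>Hi</p>")
tokens = []
while reader.read():
    tokens.append((reader.token_type, reader.name, reader.value))
```

The caller drives the loop and reads the properties after each
successful read(), instead of receiving callbacks as with SAX.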

Re tree construction, I'm about to implement it in two parts: in the
"pull parser" when possible (handling omitted tags and misnested
formatting elements) and in a "tree fixer" otherwise (moving <meta>
and <link> into <head>, etc.)

-- 
Thomas Broyer