[whatwg] Bug in "Before DOCTYPE name state"?

Thu Dec 21 09:09:43 PST 2006

2006/12/21, Anne van Kesteren:
> On Thu, 21 Dec 2006 11:08:51 +0100, Thomas Broyer wrote:
>
> > Before DOCTYPE name state:
> > http://www.whatwg.org/specs/web-apps/current-work/#before1
> > """
> > ↪ U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z
> >     Create a new DOCTYPE token. Set the token's name to the
> > uppercase version of the current input character (subtract 0x0020 from
> > the character's code point), and mark it as being in error. Switch to
> > the DOCTYPE name state.
> > """
> >
> > DOCTYPE name state
> > http://www.whatwg.org/specs/web-apps/current-work/#doctype1
> > """
> > ↪ U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z
> >     Append the uppercase version of the current input character
> > (subtract 0x0020 from the character's code point) to the current
> > DOCTYPE token's name. Stay in the DOCTYPE name state.
> > """
> >
> > Why is the DOCTYPE marked "in error" in the former case?
> >
> > In other words, why would <!DOCTYPE html> be "in error" while
> > <!DOCTYPE Html> wouldn't?
> >
> > My guess is that it's a bug in the "Before DOCTYPE name state".
>
> It's not. The "DOCTYPE name state" also has this paragraph: "Then, if the
> name of the DOCTYPE token is exactly the four letters "HTML", then mark
> the token as being correct. Otherwise, mark it as being in error."

But it also has this note, which is quite confusing: "Because
lowercase letters in the name are uppercased by the algorithm above,
the "HTML" letters are actually case-insensitive relative to the
markup."

However, section 8.1.1 says:
http://www.whatwg.org/specs/web-apps/current-work/#doctype
"""
In other words, <!DOCTYPE HTML>, case-insensitively.
"""

So I guess you're right.
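
For what it's worth, here is a rough Python sketch of how the two quoted
states seem to combine. It is only an illustration of my reading, not
the spec's algorithm: the function name tokenize_doctype_name and the
dict-shaped token are made up, and it handles only the name characters
(no whitespace, ">" or EOF handling).

```python
def tokenize_doctype_name(name_chars):
    """Hypothetical sketch of the two DOCTYPE name states (name characters only)."""
    token = None
    for ch in name_chars:
        # Both states uppercase a-z by subtracting 0x0020 from the code point.
        if "a" <= ch <= "z":
            ch = chr(ord(ch) - 0x0020)
        if token is None:
            # "Before DOCTYPE name state": create the token, marked in error
            # for now, then switch to the "DOCTYPE name state".
            token = {"name": ch, "error": True}
        else:
            # "DOCTYPE name state": append the character, then re-check
            # whether the name is exactly "HTML".
            token["name"] += ch
            token["error"] = token["name"] != "HTML"
    return token


for sample in ("html", "Html", "HTML", "htm", "html5"):
    print(sample, tokenize_doctype_name(sample))
```

With this reading, "html", "Html" and "HTML" all end up marked correct,
because the initial "in error" flag is overwritten once the name reaches
the four letters "HTML"; "htm" and "html5" stay (or become again) in
error.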

Still, the tokenization stage remains a bit confusing…

-- 
Thomas Broyer