[whatwg] Parsing: Tokenisation - DOCTYPE State

Sat Jan 28 22:59:33 PST 2006

Hi,
   I believe there are some mistakes in the DOCTYPE state section.

As far as I can tell, both of these DOCTYPEs are considered conformant,
but shouldn't the first, with no space between "DOCTYPE" and the name,
be an obvious parse error?

   <!DOCTYPEhtml>
   <!DOCTYPE html>
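For illustration, here is a minimal sketch of the kind of check I have in
mind. It is not from the spec. The function name is hypothetical, and the
whitespace set (tab, LF, VT, FF, space) is an assumption. It returns True
whenever the character right after the "DOCTYPE" keyword exists and is not
whitespace.

```python
# Hypothetical check, not spec text: flag a DOCTYPE whose name runs straight
# on from the "DOCTYPE" keyword with no intervening whitespace.
# Assumed whitespace set: tab, LF, VT, FF, space.
WHITESPACE = {"\t", "\n", "\x0b", "\x0c", " "}

def doctype_missing_space(markup: str) -> bool:
    """Return True if the character after '<!DOCTYPE' exists and is not whitespace."""
    prefix = "<!DOCTYPE"
    # The keyword is matched case-insensitively.
    if markup[:len(prefix)].upper() != prefix:
        raise ValueError("input does not start with <!DOCTYPE")
    rest = markup[len(prefix):]
    # An empty remainder is not reported here; only a non-whitespace
    # character directly after the keyword is.
    return bool(rest) and rest[0] not in WHITESPACE

print(doctype_missing_space("<!DOCTYPEhtml>"))   # True
print(doctype_missing_space("<!DOCTYPE html>"))  # False
```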

In the DOCTYPE state, it says:

U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z
     Create a new DOCTYPE token. Set the token's name name to the
     uppercase version of the current input character (*add 0x0020
     to the character's codepoint*), and mark it as being in error.
     Switch to the DOCTYPE name state.

* That should read "*subtract* 0x0020 from the character's codepoint".
   (This error is repeated in the DOCTYPE name state too.)
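   To make the arithmetic concrete, here is a small sketch (the helper name
   is my own). U+0061 through U+007A sit exactly 0x20 above U+0041 through
   U+005A, so subtracting gives the uppercase letter. Adding 0x20 instead
   lands in U+0081 through U+009A, which are C1 control characters.

```python
def to_upper_ascii(ch: str) -> str:
    """Uppercase one ASCII letter a-z by subtracting 0x20 from its code point."""
    cp = ord(ch)
    if not 0x61 <= cp <= 0x7A:  # U+0061 'a' .. U+007A 'z'
        raise ValueError("expected a character in U+0061..U+007A")
    return chr(cp - 0x20)

print("".join(to_upper_ascii(c) for c in "html"))  # HTML
# Adding 0x20, as the current text says, gives a C1 control, not 'H':
print(hex(ord("h") + 0x20))  # 0x88
```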

* Why is it marked as being in error at that stage?  It doesn't seem
   to be necessary, given the last step in the DOCTYPE name state, which
   says:
   "If the name of the DOCTYPE token is exactly the four letters "HTML",
    then mark the token as being correct. Otherwise, mark it as being in
    error."
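   A rough model of why the early mark looks redundant. The token class and
   its "error" field are hypothetical stand-ins for the spec's "in error"
   flag. The sketch assumes that every path that emits the token passes
   through that last step. If some exit path (e.g. EOF) skips it, the early
   mark could still matter there.

```python
from dataclasses import dataclass

@dataclass
class DoctypeToken:
    name: str = ""
    error: bool = True  # hypothetical stand-in for "marked as being in error"

def finish_doctype_name(token: DoctypeToken) -> DoctypeToken:
    # Last step of the DOCTYPE name state as quoted above: the verdict depends
    # only on the final name, so any earlier error mark is overwritten.
    token.error = token.name != "HTML"
    return token

# The initial mark makes no difference to the outcome:
for initial in (True, False):
    assert not finish_doctype_name(DoctypeToken("HTML", initial)).error
    assert finish_doctype_name(DoctypeToken("HTMX", initial)).error
```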

-- 
Lachlan Hunt
http://lachy.id.au/