[whatwg] Parsing: Tokenisation - DOCTYPE State
lachlan.hunt at lachy.id.au
Sat Jan 28 22:59:33 PST 2006
I believe there are some mistakes in the DOCTYPE state section.
As far as I can tell both of these DOCTYPEs are considered conformant,
but shouldn't the first be an easy parse error?
In the DOCTYPE state, it says:
U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z
Create a new DOCTYPE token. Set the token's name to the
uppercase version of the current input character (*add 0x0020
to the character's codepoint*), and mark it as being in error.
Switch to the DOCTYPE name state.
* That should read "subtract 0x0020 from the character's codepoint":
U+0061 minus 0x0020 is U+0041 LATIN CAPITAL LETTER A.
(This error is repeated in the DOCTYPE name state too.)
* Why is it marked as being in error at that stage? It doesn't seem to
be necessary, because the last step in the DOCTYPE name state says:
"If the name of the DOCTYPE token is exactly the four letters "HTML",
then mark the token as being correct. Otherwise, mark it as being in
error."
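
To make both points concrete, here is a rough Python sketch of how I read
the quoted text. It is only an illustration under my own assumptions, not
the spec's algorithm: the function names to_upper_ascii and
doctype_name_is_correct are hypothetical, and the real tokeniser works one
character at a time across several states.

```python
def to_upper_ascii(ch: str) -> str:
    """Uppercase one ASCII lowercase letter by SUBTRACTING 0x0020."""
    if "a" <= ch <= "z":  # U+0061 through U+007A
        return chr(ord(ch) - 0x0020)
    return ch


def doctype_name_is_correct(raw_name: str) -> bool:
    """Hypothetical helper: the final check quoted from the DOCTYPE name state.

    The token is correct only if the uppercased name is exactly "HTML".
    """
    name = "".join(to_upper_ascii(c) for c in raw_name)
    return name == "HTML"


if __name__ == "__main__":
    print(to_upper_ascii("a"))              # subtracting 0x0020 from U+0061
    print(doctype_name_is_correct("html"))  # name becomes "HTML"
    print(doctype_name_is_correct("htm"))   # name is not "HTML"
```

In this simplified reading, the final comparison alone decides whether the
token is correct or in error, which is why the earlier "mark it as being in
error" step looks unnecessary to me.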