[whatwg] [WebApps] Parsing: bogus DOCTYPE state
Ian Hickson
ian at hixie.ch
Tue Jul 18 17:20:44 PDT 2006
On Mon, 17 Jul 2006, J. King wrote:
>
> The bogus DOCTYPE state consumes all characters until it gets to EOF or a '>'
> character. I presume this means that the following DOCTYPE:
>
> <!DOCTYPE html blah "http://some<invalid>URI">
>
> ...would finish at the first > and emit character tokens for 'URI">'.
Correct. That's compatible with the rendering that that DOCTYPE causes in
Safari, Opera, and Mozilla. (In Mozilla the DOCTYPE actually ends at the
"<", so you have an <invalid> element in the DOM too. In Safari the
DOCTYPE can end at a "<" only if it is preceded by a space. The spec
doesn't have any "<" magic for DOCTYPEs.)
> Similarly, I imagine this sequence:
>
> <!DOCTYPE html blah <html lang="en"><head>
>
> ...would not produce a start-tag token for 'html'.
Correct, although in Mozilla and Safari it actually does. I doubt this is
a big deal since in IE there is, as you propose, somewhat more complex
DOCTYPE parsing at work, and so the DOCTYPEs end up containing the
entirety of your examples. (Of course, IE then treats them as comments,
not as DOCTYPEs, in the DOM.)
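
A minimal sketch of the rule discussed above, assuming only what the quoted text says: the bogus DOCTYPE state consumes everything up to and including the first '>' (or up to EOF), and whatever follows is tokenised separately. The function name is hypothetical, and this is not a full tokeniser; it only splits each example at the point where the DOCTYPE would end.

```python
from typing import Tuple


def split_at_bogus_doctype_end(markup: str) -> Tuple[str, str]:
    """Split markup at the first '>' (hypothetical helper, not spec text).

    Returns (consumed_by_doctype, remainder). With no '>' at all, the
    whole input is consumed, which models reaching EOF.
    """
    end = markup.find(">")
    if end == -1:
        return markup, ""
    return markup[: end + 1], markup[end + 1:]


first = '<!DOCTYPE html blah "http://some<invalid>URI">'
second = '<!DOCTYPE html blah <html lang="en"><head>'

for example in (first, second):
    doctype, rest = split_at_bogus_doctype_end(example)
    print(repr(doctype), "then", repr(rest))
```

Under this rule the first example leaves 'URI">' to be emitted as character tokens, and in the second the '<html lang="en">' text is swallowed by the DOCTYPE, so only '<head>' remains to be tokenised.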
> Is this what browsers do, or is this an oversight?
It's compatible with what some browsers do. It was intentional, at least.
I believe it's actually compatible with the SGML parsing rules, too,
though I may be mistaken about that and don't have a copy of Goldfarb
around to check.
> Even if it -is- what browsers do, this behaviour would lead conformance
> checkers to report the wrong kinds of errors; I would suggest a more
> complex parsing of DOCTYPEs is necessary.
Well, anything other than <!DOCTYPE HTML> is invalid, so there'll already
be at least one parse error -- the DOCTYPE being invalid. Conformance
checkers are, of course, allowed to go out of their way to make their
errors more understandable.
FWIW, my implementation, which has had very little work put into its
error handling, reported:
16: Parse error: unexpected character while tokenising end of DOCTYPE.
41: Parse error: errorneous document type declaration.
...on your first example, and:
16: Parse error: unexpected character while tokenising end of DOCTYPE.
36: Parse error: errorneous document type declaration.
...on your second (and no other errors). Those don't seem like the wrong
kinds of errors. :-)
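
The numbers in those messages look like 1-based character offsets into each example. That reading is an assumption, not something stated above, but it can be checked with a short sketch (the helper name is hypothetical): the first error would sit on the "b" of "blah", and the second on the '>' that ends the DOCTYPE.

```python
def first_gt_offset(markup: str) -> int:
    """Return the 1-based offset of the first '>' (0 if there is none)."""
    return markup.find(">") + 1


first = '<!DOCTYPE html blah "http://some<invalid>URI">'
second = '<!DOCTYPE html blah <html lang="en"><head>'

# Offset of the first character after "<!DOCTYPE html ", i.e. the "b" of "blah".
unexpected_char_offset = len("<!DOCTYPE html ") + 1

print(unexpected_char_offset)   # 16
print(first_gt_offset(first))   # 41
print(first_gt_offset(second))  # 36
```

If the assumption holds, the second error in each pair is reported exactly where the bogus DOCTYPE state stops consuming characters.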
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'