[imps] 24 June 2010 HTML 5 spec: bug when emitting tokenizer start tags
Ian Hickson
ian at hixie.ch
Mon Aug 9 17:03:20 PDT 2010
On Wed, 23 Jun 2010, Rob Jellinghaus wrote:
>
> The 24 June 2010 working draft of the HTML5 spec has, I believe, a bug
> with tokenizer state update when emitting start tags. The bug is an
> ordering problem between the tokenizer state update performed by the
> tokenizer itself, and the tokenizer state update sometimes performed by
> the tree construction stage.
>
> http://dev.w3.org/html5/spec/Overview.html currently links to
> http://www.w3.org/TR/2010/WD-html5-20100624/ as the latest version, but
> the latter link is broken at the moment. Looking at the former, for
> instance:
>
> Section 8.2.4.10 (Tag name state) says
>
> ↪U+003E GREATER-THAN SIGN (>)
> Emit the current tag token. Switch to the data state.
>
> The "Emit the current tag token" step is defined in section 8.2.4 as:
>
> When a token is emitted, it must immediately be handled by the
> tree construction stage. The tree construction stage can affect
> the state of the tokenization stage, and can insert additional
> characters into the stream.
>
> So let us consider the following HTML:
>
> <html>
> <head>
> <script><!-- window.alert(); --></script>
> </head>
> <body></body>
> </html>
>
> At the closing '>' of '<script>', the tokenizer is in tag name state.
> It emits the current tag token, which is a 'script' start tag.
>
> The tree construction stage, in section 8.2.5.7 ("in head" insertion
> mode), specifies:
>
> ↪A start tag whose tag name is "script"
> Run these steps:
> ...
> 5. Switch the tokenizer to the script data state.
>
> The tree construction stage therefore resets the tokenizer state
> immediately.
>
> After completing, the tree construction stage returns to the tokenizer.
> *And at that point, the tokenizer is specified to reset to the data
> state!* This state update overwrites the state update from the tree
> construction stage, and the script is not parsed as script.
>
> I encountered this bug in my own implementation. The identical bug
> exists in all the other states that can emit start tags which can
> contain content (8.2.4.34 through 8.2.4.37, and 8.2.4.42).
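
A toy model reproduces the ordering problem described above. This is a hypothetical illustration, not code from any real parser. The names `Tokenizer`, `TreeBuilder`, `on_greater_than`, `switch_before_emit` and `state_after_script_tag` are invented for this sketch, and it models only the hand-off at the closing `>` of a start tag. It assumes that the remedy is to switch to the data state *before* emitting the tag, so that any state change made by the tree construction stage survives. Later WHATWG drafts word the tag name state this way ("Switch to the data state. Emit the current tag token."), but this thread does not say exactly how the fix was made.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_NAME = auto()
    SCRIPT_DATA = auto()


class Tokenizer:
    """Hypothetical toy tokenizer: models only the state hand-off on '>'."""

    def __init__(self, tree_builder, switch_before_emit):
        self.state = State.TAG_NAME
        self.tree_builder = tree_builder
        self.switch_before_emit = switch_before_emit

    def emit(self, token):
        # An emitted token is handled immediately by the tree construction
        # stage, which may change self.state while handling it.
        self.tree_builder.process(token, self)

    def on_greater_than(self, tag_name):
        if self.switch_before_emit:
            # Assumed fix: switch first, then emit, so the tree builder's
            # state change (if any) is the one that sticks.
            self.state = State.DATA
            self.emit(("start_tag", tag_name))
        else:
            # Order as written in the 24 June 2010 draft: emit, then switch.
            self.emit(("start_tag", tag_name))
            self.state = State.DATA  # overwrites the switch made during emit


class TreeBuilder:
    """Hypothetical stand-in for the "in head" rule for script start tags."""

    def process(self, token, tokenizer):
        kind, name = token
        if kind == "start_tag" and name == "script":
            tokenizer.state = State.SCRIPT_DATA


def state_after_script_tag(switch_before_emit):
    tok = Tokenizer(TreeBuilder(), switch_before_emit)
    tok.on_greater_than("script")
    return tok.state


print(state_after_script_tag(switch_before_emit=False))  # State.DATA (the bug)
print(state_after_script_tag(switch_before_emit=True))   # State.SCRIPT_DATA
```

With the draft's ordering, the tokenizer ends up in the data state after `<script>`, so `<!-- window.alert(); -->` would be tokenized as an ordinary comment rather than as script data. Reordering the two steps lets the tree construction stage have the final say.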

For the record, this was fixed a few weeks ago. Let me know if anything is
still broken here.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'