[imps] 24 June 2010 HTML 5 spec: bug when emitting tokenizer start tags
ian at hixie.ch
Mon Aug 9 17:03:20 PDT 2010
On Wed, 23 Jun 2010, Rob Jellinghaus wrote:
> The 24 June 2010 working draft of the HTML5 spec has, I believe, a bug
> with tokenizer state update when emitting start tags. The bug is an
> ordering problem between the tokenizer state update performed by the
> tokenizer itself, and the tokenizer state update sometimes performed by
> the tree construction stage.
> http://dev.w3.org/html5/spec/Overview.html currently links to
> http://www.w3.org/TR/2010/WD-html5-20100624/ as the latest version, but
> the latter link is broken at the moment. Looking at the former,
> Section 184.108.40.206 (Tag name state) says:
> ↪U+003E GREATER-THAN SIGN (>)
> Emit the current tag token. Switch to the data state.
> The "Emit the current tag token" step is defined in section 8.2.4 as:
> When a token is emitted, it must immediately be handled by the
> tree construction stage. The tree construction stage can affect
> the state of the tokenization stage, and can insert additional
> characters into the stream.
> So let us consider the following HTML:
> <script><!-- window.alert(); --></script>
> At the closing '>' of '<script>', the tokenizer is in the tag name state.
> It emits the current tag token, which is a 'script' start tag.
> The tree construction stage, in section 220.127.116.11 ("in head" insertion
> mode), specifies:
> ↪A start tag whose tag name is "script"
> Run these steps:
> 5. Switch the tokenizer to the script data state.
> The tree construction stage therefore resets the tokenizer state to the
> script data state.
> After completing, the tree construction stage returns to the tokenizer.
> *And at that point, the tokenizer is specified to reset to the data
> state!* This state update overwrites the state update from the tree
> construction stage, and the script is not parsed as script.
> I encountered this bug in my own implementation. The identical bug
> exists in all the other states that can emit start tags which can
> contain content (18.104.22.168 through 22.214.171.124, and 126.96.36.199).
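The ordering problem described above can be sketched with a toy model.
This is only an illustration, not code from the spec or from any real
parser: the Tokenizer class, the switch_before_emit flag and the helper
state_after_start_tag are hypothetical names, and tree construction is
reduced to the single "script" rule quoted above. It compares "emit,
then switch to the data state" with the reverse order, which is one
ordering that avoids the overwrite.

```python
DATA = "data"
TAG_NAME = "tag name"
SCRIPT_DATA = "script data"


class Tokenizer:
    """Toy model of the tokenizer's state variable; not a real HTML tokenizer."""

    def __init__(self, switch_before_emit):
        # The tokenizer has just read "<script" and is in the tag name state.
        self.state = TAG_NAME
        # Hypothetical flag selecting which ordering to model.
        self.switch_before_emit = switch_before_emit

    def tree_construction(self, tag_name):
        # Stand-in for the "in head" rule: a "script" start tag switches
        # the tokenizer to the script data state.
        if tag_name == "script":
            self.state = SCRIPT_DATA

    def on_greater_than(self, tag_name):
        """Handle '>' in the tag name state and return the resulting state."""
        if self.switch_before_emit:
            # Switch first, then emit: tree construction has the last word.
            self.state = DATA
            self.tree_construction(tag_name)
        else:
            # Emit first, then switch: the tokenizer overwrites whatever
            # state tree construction selected.
            self.tree_construction(tag_name)
            self.state = DATA
        return self.state


def state_after_start_tag(tag_name, switch_before_emit):
    return Tokenizer(switch_before_emit).on_greater_than(tag_name)


if __name__ == "__main__":
    print(state_after_start_tag("script", switch_before_emit=False))
    print(state_after_start_tag("script", switch_before_emit=True))
```

With the emit-first ordering the model ends up in the data state after
'<script>', so the script body would be tokenized as ordinary markup.
With the switch-first ordering it stays in the script data state, and
tags that tree construction does not special-case still end up in the
data state.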
For the record, this was fixed a few weeks ago. Let me know if anything is
still broken here.
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'