From rjelling at microsoft.com  Wed Jun 23 11:54:34 2010
From: rjelling at microsoft.com (Rob Jellinghaus)
Date: Wed, 23 Jun 2010 18:54:34 +0000
Subject: [imps] 24 June 2010 HTML 5 spec: bug when emitting tokenizer start tags
Message-ID: <9701F5AA905BF549AEF8EF6EF9D5A31243A08188@TK5EX14MBXC134.redmond.corp.microsoft.com>

The 24 June 2010 working draft of the HTML5 spec has, I believe, a bug in how the tokenizer state is updated when a start tag is emitted. The bug is an ordering problem between the state update that the tokenizer performs itself and the state update that the tree construction stage sometimes performs.

http://dev.w3.org/html5/spec/Overview.html currently links to http://www.w3.org/TR/2010/WD-html5-20100624/ as the latest version, but the latter link is broken at the moment. Looking at the former, section 8.2.4.10 (Tag name state), for instance, says:

  U+003E GREATER-THAN SIGN (>)
    Emit the current tag token. Switch to the data state.

The "Emit the current tag token" step is defined in section 8.2.4 as:

  When a token is emitted, it must immediately be handled by the tree
  construction stage. The tree construction stage can affect the state
  of the tokenization stage, and can insert additional characters into
  the stream.

So let us consider the following HTML:
At the closing '>' of '
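To make the ordering problem concrete, here is a minimal Python sketch. It is not spec text: the class names, the `emit_first` flag and the use of a `script` start tag (for which the tree construction stage switches the tokenizer into the script data state) are illustrative assumptions. Because emitting a token runs the tree construction stage synchronously, any state change made there is overwritten if the tokenizer afterwards executes "Switch to the data state".

```python
# Illustrative sketch only; all class, flag and state names are hypothetical.
DATA = "data"
SCRIPT_DATA = "script data"


class TreeBuilder:
    """Stand-in for the tree construction stage."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def handle(self, token):
        # Tree construction can change the tokenizer's state; here a
        # script start tag switches it to the script data state.
        if token == ("start", "script"):
            self.tokenizer.state = SCRIPT_DATA


class Tokenizer:
    def __init__(self, emit_first):
        self.state = "tag name"
        self.emit_first = emit_first
        self.tree = TreeBuilder(self)

    def emit(self, token):
        # The emitted token is handled immediately by tree construction.
        self.tree.handle(token)

    def on_greater_than(self, tag_name):
        """Handle '>' in the tag name state and return the resulting state."""
        token = ("start", tag_name)
        if self.emit_first:
            # Draft ordering: emit, then switch to the data state,
            # which overwrites any change made by the tree builder.
            self.emit(token)
            self.state = DATA
        else:
            # Reordered: switch to the data state first, then emit, so
            # the tree builder's change survives.
            self.state = DATA
            self.emit(token)
        return self.state


print(Tokenizer(emit_first=True).on_greater_than("script"))
print(Tokenizer(emit_first=False).on_greater_than("script"))
```

With the draft ordering the first call prints `data`, so the tokenizer would go on to treat the script contents as ordinary markup; switching state before emitting, as in the second call, prints `script data`. Reordering the two steps is one possible fix under these assumptions, not necessarily the one the editors would choose.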