[whatwg] Tokenizor PseudoCode

Mon Jul 1 16:03:38 PDT 2013

On Fri, 15 Mar 2013, Mohammad Al Houssami (Alumni) wrote:
> 
> I just want to make sure that in places where no state change is called 
> it means we stay in the same state right? Take the RCDATA state below. 
> In the anything else branch we emit character token and then go consume 
> another character and check all the cases in this state. This is the 
> only thing that makes sense but I just want to make sure :)

On Sat, 16 Mar 2013, Bjoern Hoehrmann wrote:
> 
> You missed "When a token is emitted, it must immediately be handled by 
> the tree construction stage. The tree construction stage can affect the 
> state of the tokenization stage ..." but if that does not result in a 
> change of state either, then yes, as far as I am aware.

On Fri, 15 Mar 2013, Mohammad Al Houssami (Alumni) wrote:
>
> I'm trying to build an HTML5 Parser in Smalltalk and as a first step I'm 
> implementing the tokenizer and everything happens there. I think this is 
> the case only when we have scripts that add characters to the HTML 
> document which is out of the scope of the project I am working on at the 
> moment. Is this true or not?

On Sat, 16 Mar 2013, Bjoern Hoehrmann wrote:
> 
> No. Grepping for "PLAINTEXT" should make this clear.

There are a number of places in the tree construction stage that change 
the tokenizer state, in particular the parsing for these elements: title, 
noscript, noframes, style, xmp, iframe, noembed, script, plaintext, 
textarea.
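
To make that concrete, here is a rough Python sketch of the switch the 
tree construction stage asks for after seeing a start tag for one of 
those elements. The names (State, state_after_start_tag, scripting) are 
made up for illustration and are not from the spec, and the sketch 
ignores insertion modes and foreign content, which the real algorithm 
takes into account. A tokenizer that never receives this feedback would 
stay in the data state and mis-tokenize the contents of these elements.

```python
from enum import Enum, auto


class State(Enum):
    """Subset of tokenizer states that the tree builder can switch to."""
    DATA = auto()
    RCDATA = auto()
    RAWTEXT = auto()
    SCRIPT_DATA = auto()
    PLAINTEXT = auto()


# Hypothetical lookup sets; the grouping follows the spec's element handling.
RCDATA_ELEMENTS = {"title", "textarea"}
RAWTEXT_ELEMENTS = {"style", "xmp", "iframe", "noembed", "noframes"}


def state_after_start_tag(tag_name, current, scripting=False):
    """Return the tokenizer state to use after a start tag is handled.

    Simplification: insertion modes and foreign content are ignored.
    """
    if tag_name in RCDATA_ELEMENTS:
        return State.RCDATA
    if tag_name in RAWTEXT_ELEMENTS:
        return State.RAWTEXT
    if tag_name == "noscript" and scripting:
        # noscript only switches the tokenizer when scripting is enabled.
        return State.RAWTEXT
    if tag_name == "script":
        return State.SCRIPT_DATA
    if tag_name == "plaintext":
        return State.PLAINTEXT
    # No switch requested: the tokenizer stays in its current state.
    return current


if __name__ == "__main__":
    for name in ("p", "title", "style", "script", "plaintext"):
        print(name, "->", state_after_start_tag(name, State.DATA).name)
```

The tokenizer would call something like this right after emitting each 
start tag token and before consuming the next character, so none of this 
depends on scripts writing into the document.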

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'