[whatwg] Tokenizor PseudoCode
Ian Hickson
ian at hixie.ch
Mon Jul 1 16:03:38 PDT 2013
On Fri, 15 Mar 2013, Mohammad Al Houssami (Alumni) wrote:
>
> I just want to make sure that in places where no state change is called
> it means we stay in the same state right? Take the RCDATA state below.
> In the anything else branch we emit character token and then go consume
> another character and check all the cases in this state. This is the
> only thing that makes sense but I just want to make sure :)
On Sat, 16 Mar 2013, Bjoern Hoehrmann wrote:
>
> You missed "When a token is emitted, it must immediately be handled by
> the tree construction stage. The tree construction stage can affect the
> state of the tokenization stage ..." but if that does not result in a
> change of state either, then yes, as far as I am aware.
On Fri, 15 Mar 2013, Mohammad Al Houssami (Alumni) wrote:
>
> I'm trying to build an HTML5 Parser in Smalltalk and as a first step I'm
> implementing the tokenizer and everything happens there. I think this is
> the case only when we have scripts that add characters to the HTML
> document which is out of the scope of the project I am working on at the
> moment. Is this true or not?
On Sat, 16 Mar 2013, Bjoern Hoehrmann wrote:
>
> No. Grepping for "PLAINTEXT" should make this clear.
There are a number of places in the tree construction stage that change
the tokenizer state, in particular the parsing of these elements: title,
noscript, noframes, style, xmp, iframe, noembed, script, plaintext,
textarea.
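
To illustrate the point for someone implementing only the tokenizer, here
is a minimal sketch in Python. The names (State, ToyTokenizer,
tree_construction, STATE_AFTER_START_TAG) are hypothetical and not taken
from the spec, and the sketch ignores insertion modes and everything else
the real tree construction stage does. It assumes scripting is enabled,
since noscript only switches the tokenizer in that case. It shows just two
things: emitting a token hands it to tree construction immediately, and
the tokenizer state stays put unless that handler changes it.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    RCDATA = auto()
    RAWTEXT = auto()
    SCRIPT_DATA = auto()
    PLAINTEXT = auto()


# Tokenizer state that tree construction switches to after inserting a
# start tag for each element. Simplified: the real algorithm also depends
# on the current insertion mode. noscript assumes scripting is enabled.
STATE_AFTER_START_TAG = {
    "title": State.RCDATA,
    "textarea": State.RCDATA,
    "style": State.RAWTEXT,
    "xmp": State.RAWTEXT,
    "iframe": State.RAWTEXT,
    "noembed": State.RAWTEXT,
    "noframes": State.RAWTEXT,
    "noscript": State.RAWTEXT,
    "script": State.SCRIPT_DATA,
    "plaintext": State.PLAINTEXT,
}


class ToyTokenizer:
    """Hypothetical, heavily simplified tokenizer that only tracks state."""

    def __init__(self, on_token):
        self.state = State.DATA
        self.on_token = on_token

    def emit(self, token):
        # The token is handled immediately; the handler may change
        # self.state before the next character is consumed.
        self.on_token(self, token)


def tree_construction(tokenizer, token):
    """Toy stand-in for the tree construction stage."""
    kind, name = token
    if kind == "start_tag" and name in STATE_AFTER_START_TAG:
        tokenizer.state = STATE_AFTER_START_TAG[name]


tok = ToyTokenizer(tree_construction)

# A character token causes no state change, so the tokenizer stays put.
tok.emit(("character", "x"))
state_after_char = tok.state

# A <plaintext> start tag makes tree construction switch the tokenizer.
tok.emit(("start_tag", "plaintext"))
state_after_plaintext = tok.state
```

A tokenizer written without any tree construction stage still needs some
hook like on_token above, otherwise the contents of elements such as
textarea or plaintext get tokenized as markup.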
HTH,
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'