[whatwg] Potentially avoidable tokeniser/treebuilder dependency
ian at hixie.ch
Mon Oct 5 18:05:18 PDT 2009
On Wed, 23 Sep 2009, Øistein E. Andersen wrote:
> The major obstacle for an independent tokeniser seems to be that the
> content model flag is set to RCDATA, RAWTEXT or PLAINTEXT by the
> treebuilder and not by the tokeniser. In most cases, the new content
> model flag is entirely predictable from the start tag (and
> RCDATA/RAWTEXT element names are known to the tokeniser already). The
> only exceptions I have found so far concern start tags within <select>
> and <frameset>, which are dropped by the treebuilder and therefore do
> not cause the content model flag to change. Even these cases could
> perhaps have been handled by the tokeniser without too much trouble (and
> without changing the spec) if it were not for the "in select in table"
> insertion mode, where a missing </select> end tag may be inferred
> depending on the stack of open elements.
>
> It seems unfortunate to abandon the possibility of an independent
> tokeniser just to handle what appears to be a corner case of a corner
> case, viz. unclosed RCDATA/RAWTEXT elements inside an unclosed <select>
> element in a table. The easiest solution would be to switch the content
> model flag upon seeing an RCDATA/RAWTEXT/PLAINTEXT start tag
> irrespective of insertion mode, i.e., also within <select> and
> <frameset>, which would allow the tokeniser to take care of this without
> added complexity. Other solutions might be worth considering if this is
> found to be too incompatible with existing pages. (I could have a look
> at the http://www.dotnetdotcom.org/ dataset if that would be of any

I don't feel comfortable changing this without a _really_ good reason,
given the high risk of compatibility problems. Having the tokeniser be
separable was never a design goal; that it is possible to get even as
close as it is today is frankly quite surprising to me.
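Below is a rough Python sketch of the dependency and of the proposed change. Everything in it is a hypothetical model for illustration. It is not spec text and not any parser's API. The names `ContentModel`, `flag_for_tag`, `current_behaviour` and `proposed_behaviour` are invented, and the element sets are only an illustrative subset. The model also assumes that every start tag is dropped in the select and frameset insertion modes, which simplifies the spec's actual rules. It ignores the "in select in table" inference of a missing </select> as well, and that is exactly the case that ties the real flag to the stack of open elements.

```python
from enum import Enum
from typing import Optional


class ContentModel(Enum):
    PCDATA = "PCDATA"
    RCDATA = "RCDATA"
    RAWTEXT = "RAWTEXT"
    PLAINTEXT = "PLAINTEXT"


# Illustrative subset only (assumption); not the spec's exact element lists.
RCDATA_ELEMENTS = {"title", "textarea"}
RAWTEXT_ELEMENTS = {"style", "xmp", "iframe", "noembed"}

# Simplification (assumption): insertion modes in which the tree builder is
# modelled as dropping every start tag, so the flag never changes there.
DROPPING_MODES = {"in select", "in select in table", "in frameset"}


def flag_for_tag(tag_name: str) -> Optional[ContentModel]:
    """Content model flag implied by the start tag name alone, or None."""
    if tag_name in RCDATA_ELEMENTS:
        return ContentModel.RCDATA
    if tag_name in RAWTEXT_ELEMENTS:
        return ContentModel.RAWTEXT
    if tag_name == "plaintext":
        return ContentModel.PLAINTEXT
    return None


def proposed_behaviour(tag_name: str, current: ContentModel) -> ContentModel:
    """Tokeniser switches the flag from the tag name, whatever the insertion mode."""
    implied = flag_for_tag(tag_name)
    return current if implied is None else implied


def current_behaviour(tag_name: str, insertion_mode: str,
                      current: ContentModel) -> ContentModel:
    """Tree builder decides: a dropped start tag leaves the flag unchanged."""
    if insertion_mode in DROPPING_MODES:
        return current
    return proposed_behaviour(tag_name, current)
```

The difference shows up in the signatures: `proposed_behaviour` takes no insertion mode, so a tokeniser could apply it without consulting the tree builder. The cost is that a <title> start tag inside <select> would switch the flag to RCDATA, whereas under the current rules it is dropped and the flag stays unchanged.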
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'