[whatwg] Stability of tokenizing/dom algorithms

Ian Hickson ian at hixie.ch
Mon Dec 15 13:17:38 PST 2008

On Mon, 15 Dec 2008, Edward Z. Yang wrote:
> In theory, I could write separate sanitizers for HTML 4, XHTML 1.0, 
> XHTML 2.0, HTML 5, etc. In practice, I want to reuse as much code as 
> possible between these cases, since I'm a lazy developer. Perhaps 
> "extensibility" is not the right word here; it's more like "reusability" 
> of components.

Oh, well, that's just a matter of having pluggable modules for the 
different things you want to filter. You can support SVG and MathML in 
the same way. You just need the core processing to be independent of the 
filtering.
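
To make that concrete, here is a rough sketch in Python of a core tree walk 
that knows nothing about any particular vocabulary and only consults 
whatever modules it is handed. Everything in it (Node, FilterModule, 
HTML_CORE, SVG_BASIC, sanitize, and the tiny whitelists) is a hypothetical 
name made up for illustration, not part of any real sanitizer or of the 
spec, and it assumes the input has already been parsed into a tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a parsed element (hypothetical, not a real DOM)."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # items are Node or str


@dataclass
class FilterModule:
    """Hypothetical plug-in: maps element name -> set of allowed attributes."""
    elements: dict


# Illustrative whitelists only; a real module would be far more complete.
HTML_CORE = FilterModule({"p": set(), "em": set(), "a": {"href"}})
SVG_BASIC = FilterModule({"svg": {"width", "height"},
                          "circle": {"cx", "cy", "r"}})


def sanitize(node, modules):
    """Core processing: merge the modules' rules, then walk the tree."""
    allowed = {}
    for module in modules:
        allowed.update(module.elements)
    return _walk(node, allowed)


def _walk(node, allowed):
    """Return a list of filtered nodes that replaces `node`."""
    if isinstance(node, str):
        return [node]
    kids = []
    for child in node.children:
        kids.extend(_walk(child, allowed))
    if node.name not in allowed:
        # Policy choice for this sketch: drop the unknown element itself
        # but keep its already-filtered children.
        return kids
    attrs = {k: v for k, v in node.attrs.items() if k in allowed[node.name]}
    return [Node(node.name, attrs, kids)]
```

Calling sanitize(tree, [HTML_CORE]) strips anything SVG, while 
sanitize(tree, [HTML_CORE, SVG_BASIC]) lets the listed SVG elements 
through; the walk itself does not change. A MathML module would plug in 
the same way.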

> A side-note: something we've been looking into is bolting on extensions 
> to the HTML language. A user might write something in HTML 5, but the 
> website is in HTML 4, so the sanitizer converts the HTML 5 into a more 
> ugly but functional HTML 4 version, and returns that. The future, today!

I wouldn't really worry about "4" vs "5". What matters is what works in 
browsers, or in whatever tools your users are using. (This is one reason 
why HTML5 does away with the version number in the DOCTYPE.) I'd 
recommend just using the HTML5 DOCTYPE and then filtering the content 
down to whatever you want it to be.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
