[whatwg] Stability of tokenizing/dom algorithms

Edward Z. Yang edwardzyang at thewritingpot.com
Mon Dec 15 13:27:52 PST 2008


Ian Hickson wrote:
> Oh well that's just a matter of having pluggable modules for different 
> things to filter. You can equally support SVG and MathML in this way. You 
> just need the core processing to be made independent of the filtering.

I just realized I was wrong to think that I would need to modify the
parsing algorithm; that would only be the case if I tried to integrate
filtering with the core processing. If it's a two-stage process, the
core processing merely has special rules for certain elements embedded
in it, but otherwise acts normally. Performance *is* an issue (getting
things to be standards compliant is relatively CPU/memory intensive),
but getting things to work comes first.
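
To make the two-stage idea concrete, here is a minimal sketch in Python
(not HTML Purifier's own PHP, and not its API). Every name in it is
hypothetical, and Python's html.parser merely stands in for a real HTML5
tokenizer. Stage one only produces tokens; stage two runs them through
pluggable filter modules, so adding or removing a module never touches
the core processing. It is an illustration of the separation, not a safe
sanitizer.

```python
from html import escape
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Stage 1: core processing. Produces tokens; knows nothing about policy."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        self.tokens.append(("text", data, {}))


def tokenize(markup):
    parser = TokenCollector()
    parser.feed(markup)
    parser.close()
    return parser.tokens


# Stage 2: filter modules. Each takes a token and returns a token or None.
def make_tag_whitelist(allowed):
    """Hypothetical module: drop start/end tags whose name is not allowed."""
    def module(token):
        kind, name, _attrs = token
        if kind in ("start", "end") and name not in allowed:
            return None
        return token
    return module


def strip_event_handlers(token):
    """Hypothetical module: remove on* attributes from start tags."""
    kind, name, attrs = token
    if kind == "start":
        attrs = {k: v for k, v in attrs.items() if not k.startswith("on")}
    return (kind, name, attrs)


def run_filters(tokens, modules):
    out = []
    for token in tokens:
        for module in modules:
            token = module(token)
            if token is None:  # a module rejected this token
                break
        if token is not None:
            out.append(token)
    return out


def serialize(tokens):
    parts = []
    for kind, name, attrs in tokens:
        if kind == "text":
            parts.append(escape(name, quote=False))
        elif kind == "start":
            rendered = "".join(
                ' %s="%s"' % (k, escape(v or "")) for k, v in attrs.items()
            )
            parts.append("<%s%s>" % (name, rendered))
        else:
            parts.append("</%s>" % name)
    return "".join(parts)


modules = [make_tag_whitelist({"p", "b"}), strip_event_handlers]
dirty = '<p onclick="x()">Hi <blink>there</blink></p>'
clean = serialize(run_filters(tokenize(dirty), modules))
```

Supporting SVG or MathML in this arrangement would mean adding another
module to the list, with tokenize() left unchanged.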

> I wouldn't really worry about "4" vs "5". What matters is what works in 
> browsers, or whatever tools your users are using. (This is one reason in 
> HTML5 we do away with having the version number in the DOCTYPE.) I'd 
> recommend just using the HTML5 DOCTYPE and then filtering the content to 
> be whatever you want it to be.

HTML Purifier puts a high value on standards-compliance, and we've been
attacked on several occasions because of it. "Standards suck." To this I
have to say, standards compliance has helped defend against a number of
XSS attacks--enforcing it reduces the attack surface and makes behavior
much better defined. So I feel like it's a goal worth striving for, in and
of itself, especially since you can't enforce semantics with computers.

Cheers,
Edward


