[whatwg] Stability of tokenizing/dom algorithms
Edward Z. Yang
edwardzyang at thewritingpot.com
Mon Dec 15 13:27:52 PST 2008
Ian Hickson wrote:
> Oh well that's just a matter of having pluggable modules for different
> things to filter. You can equally support SVG and MathML in this way. You
> just need the core processing to be made independent of the filtering.
I just realized I was wrong to think I would need to modify the parsing
algorithm; that would only be the case if I tried to integrate filtering
with the core processing. In a two-stage process, the core processing
merely has special rules for certain elements embedded in it, but
otherwise acts normally. Performance *is* an issue (making things
standards-compliant is relatively CPU- and memory-intensive), but getting
things to work comes first.
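
To make the two-stage idea concrete, here is a rough sketch in Python. It is
not HTML Purifier's code and not the HTML5 parsing algorithm: the standard
library's html.parser stands in for the core processing, and the module
tables (TEXT_MODULE, LINK_MODULE) and function names are hypothetical, made
up for illustration. The point is only that the first stage knows nothing
about what is allowed, and the second stage is assembled from pluggable
modules.

```python
from html import escape
from html.parser import HTMLParser


# Stage 1: core processing. Turns markup into a flat token list and
# knows nothing about which elements or attributes are allowed.
class Tokenizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        self.tokens.append(("text", data, {}))


def tokenize(markup):
    parser = Tokenizer()
    parser.feed(markup)
    parser.close()
    return parser.tokens


# Stage 2: filtering. Each module (hypothetical) maps an element name
# to the set of attribute names it permits on that element.
TEXT_MODULE = {"p": set(), "em": set(), "strong": set()}
LINK_MODULE = {"a": {"href"}}


def filter_tokens(tokens, modules):
    # Merge the plugged-in modules into one whitelist.
    allowed = {}
    for module in modules:
        allowed.update(module)
    kept = []
    for kind, value, attrs in tokens:
        if kind == "text":
            kept.append((kind, value, attrs))
        elif value in allowed:
            safe = {k: v for k, v in attrs.items() if k in allowed[value]}
            kept.append((kind, value, safe))
        # Tags outside the whitelist are dropped; their text is kept.
    return kept


def serialize(tokens):
    out = []
    for kind, value, attrs in tokens:
        if kind == "text":
            out.append(escape(value, quote=False))
        elif kind == "start":
            parts = [value]
            for name, attr_value in attrs.items():
                parts.append('%s="%s"' % (name, escape(attr_value or "")))
            out.append("<" + " ".join(parts) + ">")
        else:
            out.append("</" + value + ">")
    return "".join(out)


markup = '<p onclick="x()">Hi <blink>there</blink> <a href="/x" style="y">link</a></p>'
print(serialize(filter_tokens(tokenize(markup), [TEXT_MODULE, LINK_MODULE])))
```

Supporting something like SVG or MathML would then mean passing another
module to filter_tokens, with no change to tokenize. This sketch does not
validate attribute values (a javascript: URL in href would pass through) and
does not fix up nesting, so it is nowhere near a real sanitizer.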
> I wouldn't really worry about "4" vs "5". What matters is what works in
> browsers, or whatever tools your users are using. (This is one reason in
> HTML5 we do away with having the version number in the DOCTYPE.) I'd
> recommend just using the HTML5 DOCTYPE and then filtering the content to
> be whatever you want it to be.
HTML Purifier puts a high value on standards compliance, and we've been
attacked on several occasions because of it: "Standards suck." To this I
have to say that standards compliance has helped defend against a number
of XSS attacks; enforcing it reduces the attack surface and makes behavior
much better defined. So I feel it's a goal worth striving for in and of
itself, especially since you can't enforce semantics with computers.