[whatwg] innerStaticHTML
Kornel Lesiński
kornel at geekhood.net
Mon May 11 17:02:05 PDT 2009
On 06.05.2009, at 17:31, Adam Barth wrote:
>
> WHY NOT toStaticHTML?
>
> toStaticHTML addresses the same use case by translating an untrusted
> string to another string that lacks active HTML content. This API has
> two issues:
>
> 1) The untrusted string -> static string -> HTML parser workflow
> requires the browser to parse the string twice, introducing a
> performance penalty and a security issue if the two parses aren't
> identical.
That argument rests on the assumptions that:
1. parsing is expensive enough to warrant an API optimized for this
particular case
2. browsers cannot optimize it otherwise
3. the returned markup will be ambiguous, i.e. parsed differently the
second time
In client-side scripts, untrusted content comes from the network, which
means that parsing time is going to be minuscule compared to the time
required to fetch the content (and to render it). My guess is that
parsing itself is not a bottleneck.
Second, it _is_ possible to avoid reparsing without a special API for
this. toStaticHTML() may return a subclass of String that contains a
reference to the parsed DOM. Roughly something like this:
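function toStaticHTML(html)
{
    // parse(), clean() and unparse() are placeholders for the browser's
    // internal parser, sanitizer and serializer.
    var cleanDOM = clean(parse(html));
    return {
        // Serialize lazily, only when the result is used as a string
        toString: function () { return unparse(cleanDOM); },
        // The already-parsed, sanitized tree, reusable without reparsing
        node: cleanDOM
    };
}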
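This should make the common case,
innerHTML = toStaticHTML(html), just as fast as innerStaticHTML = html,
because the browser can recognize the returned object and use its node
directly instead of reparsing the string.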
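A minimal runnable model of this idea, assuming a hypothetical browser fast path: when the value assigned to innerHTML carries a `node` property, the element adopts that tree instead of parsing the string again. `parse`, `clean`, `unparse`, `MockElement` and `parseCount` are illustrative stand-ins; the tokenizer and filter are toys, not a real HTML parser or a safe sanitizer.

```javascript
// Counts how often markup is "parsed", to show the fast path skips reparsing.
let parseCount = 0;

// Toy parser (hypothetical): splits markup into tag and text tokens.
function parse(html) {
  parseCount++;
  return html.split(/(<[^>]*>)/).filter((token) => token !== "");
}

// Toy filter (hypothetical): drops <script> elements with their contents
// and strips inline event-handler attributes such as onclick="...".
function clean(tokens) {
  const out = [];
  let inScript = false;
  for (const token of tokens) {
    if (/^<script\b/i.test(token)) { inScript = true; continue; }
    if (/^<\/script\s*>/i.test(token)) { inScript = false; continue; }
    if (inScript) continue;
    out.push(token.startsWith("<")
      ? token.replace(/\s+on\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "")
      : token);
  }
  return out;
}

// Serializes the token list back to markup.
function unparse(tokens) {
  return tokens.join("");
}

// The wrapper from the message: a string on demand, plus the parsed tree.
function toStaticHTML(html) {
  const cleanDOM = clean(parse(html));
  return {
    toString() { return unparse(cleanDOM); },
    node: cleanDOM,
  };
}

// Stand-in element whose innerHTML setter takes the hypothetical fast path.
class MockElement {
  constructor() { this.children = []; }
  set innerHTML(value) {
    this.children = value != null && Array.isArray(value.node)
      ? value.node              // already parsed and cleaned: reuse it
      : parse(String(value));   // ordinary string: parse it
  }
  get innerHTML() { return unparse(this.children); }
}

const el = new MockElement();
el.innerHTML = toStaticHTML('<b onclick="steal()">hi</b><script>evil()</script>');
console.log(el.innerHTML); // <b>hi</b>
console.log(parseCount);   // 1
```

The markup is parsed exactly once: toStaticHTML() parses and cleans it, and the assignment reuses the resulting tree.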
toStaticHTML() also enables other optimisations: e.g. filtered HTML can
be saved for future use (in local storage), or a string filtered once
can be used in multiple places.
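One way a page could exploit that, sketched with a hypothetical `memoizeSanitizer` helper: each distinct input is filtered once and the resulting string is reused. `toyStatic` is a stand-in for toStaticHTML() (it only removes <script> elements and is not a real sanitizer); in a browser, the cached string could equally be written to localStorage.

```javascript
// Hypothetical helper: wraps a sanitizer so each distinct input is
// filtered only once; later calls return the cached string.
function memoizeSanitizer(sanitize) {
  const cache = new Map();
  return function (html) {
    if (!cache.has(html)) {
      cache.set(html, String(sanitize(html)));
    }
    return cache.get(html);
  };
}

// Toy stand-in for toStaticHTML() (hypothetical): removes <script> elements.
let calls = 0;
function toyStatic(html) {
  calls++;
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "");
}

const cachedStatic = memoizeSanitizer(toyStatic);
const first = cachedStatic("<p>hi</p><script>x()</script>");
const second = cachedStatic("<p>hi</p><script>x()</script>");
console.log(first, calls); // <p>hi</p> 1
```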
Alternatively, there could be a toStaticDOM() method that returns a
DocumentFragment, avoiding the reparsing issue entirely.
> 2) The API is difficult to future-proof because future versions of
> HTML are likely to add new tags with active content (e.g., like the
> <video> tag's event handlers).
When support for a new tag is added to a browser, it would also be
added to that browser's toStaticHTML()/innerStaticHTML, so the evolution
of HTML shouldn't be a problem either way. A browser doesn't need to
worry about dangerous constructs it does not support.
Methods are easier to patch than properties in JavaScript, so if the
implementation of an existing toStaticHTML() turned out to be insecure,
the method could easily be replaced or patched on the client side, or
applications could post-process the output of toStaticHTML().
That's not as easy with a property.
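A sketch of that kind of client-side patch, assuming the method lives on an object the page can reassign (`platform` below is a hypothetical stand-in for wherever the browser exposes toStaticHTML()). Both the flawed original and the regex post-processing are toys for illustration; regular expressions are not a reliable way to sanitize real HTML.

```javascript
// Stand-in for a browser-provided toStaticHTML() (hypothetical) with a flaw:
// it removes <script> elements but lets inline event handlers through.
const platform = {
  toStaticHTML(html) {
    return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "");
  },
};

// Client-side patch: keep a reference to the original method and
// post-process its output before handing it to the application.
const originalToStaticHTML = platform.toStaticHTML;
platform.toStaticHTML = function (html) {
  const filtered = originalToStaticHTML.call(this, html);
  // Toy post-processing: strip on*="..." attributes the original missed.
  return filtered.replace(/\s+on\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "");
};

console.log(platform.toStaticHTML('<img src="x" onerror="steal()"><script>evil()</script>'));
// <img src="x">
```

Callers keep using the same method name; only its behaviour changes.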
I dislike APIs based on magic properties. Properties cannot take
arguments, so we'd have to create a new property for every combination
of arguments. If innerHTML were a method, then instead of creating a new
property we could extend it to innerHTML(html, static=true).
If more sophisticated filtering becomes necessary in the future, we
could have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}),
but it would be silly to create yet another
innerStaticHTMLwithSVGandRDFbutWithoutMarquee property.
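To make the options-object idea concrete, a minimal sketch of how one method could accept such arguments. `DEFAULT_ALLOWED`, `allowedElements` and the tag-stripping regex are hypothetical and purely illustrative (a real sanitizer would work on a parsed tree, not on regular expressions); the point is that one `options` parameter covers combinations that would otherwise each need their own property.

```javascript
// Hypothetical default allowlist of static elements (illustrative only).
const DEFAULT_ALLOWED = ["p", "b", "i", "em", "strong", "ul", "li", "marquee"];

// Builds the effective allowlist from an options object such as
// {preserve: ['svg', 'rdf'], remove: 'marquee'}; each option may be a
// single tag name or an array of names.
function allowedElements(options = {}) {
  const allowed = new Set(DEFAULT_ALLOWED);
  for (const tag of [].concat(options.preserve || [])) allowed.add(tag);
  for (const tag of [].concat(options.remove || [])) allowed.delete(tag);
  return allowed;
}

// Toy filter (hypothetical): keeps allowed tags with all attributes
// stripped, drops every other tag, and leaves text untouched.
function toStaticHTML(html, options = {}) {
  const allowed = allowedElements(options);
  return html.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g,
    (tag, slash, name) => {
      const lower = name.toLowerCase();
      return allowed.has(lower) ? `<${slash}${lower}>` : "";
    });
}

const markup = '<marquee>hi</marquee><svg onload="x()"></svg>';
console.log(toStaticHTML(markup));
// <marquee>hi</marquee>
console.log(toStaticHTML(markup, { preserve: ["svg", "rdf"], remove: "marquee" }));
// hi<svg></svg>
```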
--
regards, Kornel