[whatwg] innerStaticHTML

Mon May 11 17:02:05 PDT 2009

On 06.05.2009, at 17:31, Adam Barth wrote:
>
> WHY NOT toStaticHTML?
>
> toStaticHTML addresses the same use case by translating an untrusted
> string to another string that lacks active HTML content.  This API has
> two issues:
>
> 1) The untrusted string -> static string -> HTML parser workflow
> requires the browser to parse the string twice, introducing a
> performance penalty and a security issue if the two parses aren't
> identical.

That is based on the assumptions that:
1. parsing is expensive enough to warrant an API optimized for this
particular case
2. browsers cannot optimize it otherwise
3. the returned code will be ambiguous

In client-side scripts, untrusted content comes from the network, which
means that parsing time is going to be minuscule compared to the time
required to fetch the content (and to render it). My guess is that
parsing itself is not a bottleneck.

Second, it _is_ possible to avoid reparsing without a special API for
this. toStaticHTML() may return a subclass of String that contains a
reference to the parsed DOM. Roughly something like this:

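// parse(), clean() and unparse() stand for the browser's internal
// parser, filter and serializer.
function toStaticHTML(html)
{
     var cleanDOM = clean(parse(html));
     return {
         toString: function () { return unparse(cleanDOM); },
         node: cleanDOM
     };
}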

which should make the common case:

innerHTML = toStaticHTML(html) just as fast as innerStaticHTML = html;
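
To make the "no second parse" point concrete, here is a rough sketch of
how a setter could reuse the cached tree. Everything in it is a
stand-in I made up for illustration: parse(), clean() and unparse() are
toy functions, setInnerHTML() plays the role of the innerHTML setter,
and the `cleaned` flag is an assumed marker, not anything a browser
actually exposes.

```javascript
// Hypothetical stand-ins: a real browser would use its own parser and filter.
let parseCount = 0;

// Toy "parser": wraps the text in an object and counts how often it runs.
function parse(html) {
  parseCount += 1;
  return { source: String(html) };
}

// Toy "filter": marks the tree as cleaned; it does no real filtering.
function clean(dom) {
  return { source: dom.source, cleaned: true };
}

// Toy serializer: turns the toy tree back into a string.
function unparse(dom) {
  return dom.source;
}

function toStaticHTML(html) {
  const cleanDOM = clean(parse(html));
  return {
    toString() { return unparse(cleanDOM); },
    node: cleanDOM,
  };
}

// Stand-in for an innerHTML setter that recognises the wrapper object.
function setInnerHTML(element, value) {
  if (value !== null && typeof value === "object" && value.node && value.node.cleaned) {
    element.content = value.node;   // reuse the cached tree, no second parse
  } else {
    element.content = parse(value); // plain string: parse as usual
  }
}

const element = {};
setInnerHTML(element, toStaticHTML("<b>hi</b>"));
console.log(parseCount); // counts calls to the toy parser
```

In this toy version any script can forge the `cleaned` flag, so a real
implementation would presumably need a marker that page scripts cannot
set.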

toStaticHTML() enables other optimisations, e.g. filtered HTML can be
saved for future use (in local storage), or a string filtered once can
be used in multiple places.
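
A minimal sketch of the "filter once, use many times" idea, under some
assumptions: toStaticHTML() here is a stub that only counts calls, the
Map stands in for local storage, and filteredOnce() is a helper name I
invented for the example.

```javascript
// Hypothetical stand-in for the browser's toStaticHTML(); it only counts calls.
let filterCalls = 0;
function toStaticHTML(html) {
  filterCalls += 1;
  return String(html);
}

// Stand-in for persistent storage (localStorage in a browser).
const store = new Map();

// Filter the untrusted string on first use, then serve the stored result.
function filteredOnce(key, untrustedHtml) {
  if (!store.has(key)) {
    store.set(key, String(toStaticHTML(untrustedHtml)));
  }
  return store.get(key);
}

const first = filteredOnce("comment-1", "<p>hello</p>");
const second = filteredOnce("comment-1", "<p>hello</p>");
```

The second call never reaches the filter. The same stored string could
then be assigned to innerHTML in several places.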

Alternatively, there could be a toStaticDOM() method that returns a
DocumentFragment, avoiding the reparsing issue entirely.

> 2) The API is difficult to future-proof because future versions of
> HTML are likely to add new tags with active content (e.g., like the
> <video> tag's event handlers).

When support for a new tag is added to a browser, it would also be added
to its toStaticHTML()/innerStaticHTML, so the evolution of HTML
shouldn't be a problem either way. A browser doesn't need to worry about
dangerous constructs it does not support.

Methods are easier to patch than properties in JavaScript, so if the
implementation of an existing toStaticHTML() turned out to be insecure,
the method could easily be replaced/patched on the client side, or
applications could post-process the output of toStaticHTML().
It's not that easy with a property.
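
Patching a method could look roughly like this. It is only a sketch:
`host` stands in for the window object, its toStaticHTML() is a stub,
and postProcess() is a made-up extra step that is nowhere near a real
sanitizer.

```javascript
// Hypothetical host object standing in for `window`; the built-in is a stub.
const host = {
  toStaticHTML(html) {
    return String(html); // pretend this built-in misses something
  },
};

// Keep a reference to the original, then replace the method with a wrapper.
const originalToStaticHTML = host.toStaticHTML;
host.toStaticHTML = function (html) {
  const output = String(originalToStaticHTML.call(this, html));
  return postProcess(output);
};

// Illustrative extra step only: removes the literal text "javascript:".
// Not a real sanitizer.
function postProcess(html) {
  return html.split("javascript:").join("");
}

console.log(host.toStaticHTML('<a href="javascript:x">y</a>'));
```

Callers keep using toStaticHTML() unchanged and get the extra step for
free.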

I dislike APIs based on magic properties. Properties cannot take
arguments, and we'd have to create a new property for every combination
of arguments. If innerHTML were a method, instead of creating a new
property we could extend it to be innerHTML(html, static=true).

If more sophisticated filtering becomes needed in the future, we could  
have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}),  
but it would be silly to create another  
innerStaticHTMLwithSVGandRDFbutWithoutMarquee property.
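
As a sketch of how such an options argument might be interpreted (the
default tag list, toList() and allowedTags() are all assumptions of
mine, and no actual filtering is done here):

```javascript
// Hypothetical default allow-list; a real browser's list would be far longer.
const DEFAULT_ALLOWED = ["a", "b", "i", "p", "marquee"];

// Accept either a single tag name or an array of names.
function toList(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Compute the set of permitted tags for one call.
function allowedTags(options = {}) {
  const allowed = new Set(DEFAULT_ALLOWED);
  for (const tag of toList(options.preserve)) allowed.add(tag.toLowerCase());
  for (const tag of toList(options.remove)) allowed.delete(tag.toLowerCase());
  return allowed;
}

const tags = allowedTags({ preserve: ["svg", "rdf"], remove: "marquee" });
```

New filtering needs become new option keys rather than new property
names.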

-- 
regards, Kornel
