[whatwg] innerStaticHTML
Ian Hickson
ian at hixie.ch
Fri Jun 5 18:09:37 PDT 2009
I include below a summary of the feedback received on the topic of a
security-aware innerHTML.
Defining a spec-blessed whitelist of element, attributes, and attribute
values is and filtering at the parser level is a significant new feature.
While I see that it has value, I think on the short term it would be
better to wait for a future version of HTML before introducing this
feature; ideally once we have more implementation experience with
experimental versions of this idea.
I would encourage browser vendors to introduce APIs similar to that
discussed below, clearly marked as vendor-specific (e.g. for Firefox,
something like .mozStaticInnerHTML).
On Wed, 6 May 2009, Adam Barth wrote:
>
> Rob and I were discussing the use case for IE8's toStaticHTML API and
> thought it might make sense to standardize a more robust and
> future-proof API for the same use case.
>
> USE CASE
>
> I receive an untrusted string, for example a weather report or a Twitter
> status update, from postMessage or a cross-origin XMLHttpRequest, and I
> want to display its content to the user without getting XSSed.
>
> WORKAROUNDS
>
> If the content is purely text (e.g., no images, styles, or hyperlinks),
> then I can create a text node containing the string and insert it into
> my page's DOM. If the content is not purely text, I need to implement
> an XSS filter in JavaScript (which folks commonly screw up).
>
> PROPOSAL
>
> In addition to innerHTML, DOM elements should expose an innerStaticHTML
> property. When set, innerStaticHTML should behave the same as innerHTML
> except that scripts should not execute (even in event handlers) and
> plug-ins should not be created.
>
> EXAMPLE
>
> <script>
> fetchMostRecentTweetFor("whatwg", function (tweet) {
> document.getElementById("whatwg_tweet).innerStaticHTML = tweet;
> });
> </script>
>
> WHY NOT toStaticHTML?
>
> toStaticHTML addresses the same use cause by translating an untrusted
> string to another string that lacks active HTML content. This API has
> two issues:
>
> 1) The untrusted string -> static string -> HTML parser workflow
> requires the browser to parse the string twice, introducing a
> performance penalty and a security issue if the two parsing aren't
> identical.
>
> 2) The API is difficult to future-proof because future versions of HTML
> are likely to add new tags with active content (e.g., like the <video>
> tag's event handlers). Either we'll have to commit to a toStaticHTML
> algorithm that will be secure in all future versions of HTML, or we'll
> have to change the input-output behavior of toStaticHTML in future
> versions of HTML.
>
> innerStaticHTML addresses the same use case without these issues.
On Wed, 6 May 2009, João Eiras wrote:
>
> As part of a browser implementation team I can clearly say that the
> cases where scripts should, or should not run are very hard to implement
> in a cross browser compatible way. Marking those scripts or plugins are
> non-executable would make everything much more complex and bug prone.
> Also, it would be impossible to do that for a onevent attribute without
> all sorts of problems. The suggestion of marking content as
> non-executable doesn't solve anything, because after setting
> innerStaticHTML another script might serialize a piece of the affected
> DOM to string and back to a tree, and the code could then execute, which
> would not be wanted.
>
> The only viable solution, from my point of view, would be for the UA to
> parse the string, and remove all untrusted content from the result tree
> before appending to the document. That would mean removing all onevent
> attributes, all scripts elements, all plugins, etc. Basically, letting
> the UA implement all the filtering.
On Wed, 6 May 2009, Robert O'Callahan wrote:
>
> I think that's actually what Adam is proposing. At least, it's what I
> had in mind when we discussed it.
On Thu, 7 May 2009, Tab Atkins Jr. wrote:
>
> I'm in favor of this. Browser-specified sanitizing, woo!
>
> Obviously this doesn't replace the need for sandbox iframes (those are
> still necessary for building a page using external html without
> javascript), but it's a much easier solution for pretty much any
> js-based sandbox-iframe situation.
On Wed, 6 May 2009, Philip Taylor wrote:
>
> Could <iframe sandbox> work as a workaround?
>
> var iframe = document.createElement('iframe');
> iframe.sandbox = ''; // (um, I hope this is right? I'm guessing
> // any non-null/undefined value enables
> // sandboxing, or something)
> iframe.seamless = true;
> iframe.src = 'data:text/html,'+encodeURIComponent(tweet);
> document.getElementById('whatwg_tweet').appendChild(iframe);
On Wed, 6 May 2009, Robert O'Callahan wrote:
>
> Seamless sandboxed IFRAMEs are probably harder to implement, probably
> heavier-weight, and won't work in all situations, such as if you want to
> get safe inline content or want to safely manipulate the content before
> displaying it.
On Thu, 7 May 2009, Kristof Zelechovski wrote:
>
> If toStaticHTML prunes everything it is not sure of, the danger of a
> known language construct suddenly introducing active content is
> negligible. I am sure HTML5 specification editors bear that aspect in
> mind and so shall they in the future.
On Tue, 12 May 2009, Kornel LesiÅ~Dski wrote:
>
> That is based on assumptions that:
> 1. parsing is expensive enough to warrant API optimized for this
> particular case
> 2. browsers cannot optimize it otherwise
> 3. returned code will be ambiguous
>
> In client-side scripts untrusted content comes from the network, which
> means that parsing time is going to be miniscule compared to time
> required to fetch the content (and to render it). My guess is that
> parsing itself is not a bottleneck.
>
> Second, it _is_ possible to avoid reparsing without special API for
> this. toStaticHTML() may return subclass of String that contains
> reference to parsed DOM. Roughly something like this:
>
> function toStaticHTML(html) {
> var cleanDOM = clean(parse(html))
> return {
> toString:function(){return unparse(cleanDOM)},
> node:cleanDOM
> }
> }
>
> which should make common case:
>
> innerHTML = toStaticHTML(html) just as fast as innerStaticHTML = html;
>
> toStaticHTML() enables other optimisations, e.g. filtered HTML can be
> saved for future use (in local storage) or string filtered once used in
> multiple places.
>
> Alternatively there could be toStaticDOM() method that returns
> DOMDocumentFragment, avoiding reparsing issue entirely.
>
> > 2) The API is difficult to future-proof because future versions of
> > HTML are likely to add new tags with active content (e.g., like the
> > <video> tag's event handlers).
>
> When support for new tag is added to a browser, it would also be added
> to its toStaticHTML()/innerStaticHTML, so evolution of HTML shouldn't be
> a problem either way. Browser doesn't need to worry about dangerous
> constructs it does not support.
>
> Methods are easier to patch than properties in JavaScript, so if
> implementation of existing toStaticHTML() turned out to be insecure, the
> method could be easily replaced/patched on cilent-side, or applications
> could post-process output of toStaticHTML(). It's not that easy with a
> property.
>
> I dislike APIs based on magic properties. Properties cannot take
> arguments and we'd have to create new property for every combination of
> arguments. If innerHTML was a method, instead of creating new property
> we could extend it to be innerHTML(html, static=true).
>
> If more sophisticated filtering becomes needed in the future, we could
> have toStaticHTML(html, {preserve:['svg','rdf'], remove:'marquee'}), but
> it would be silly to create another
> innerStaticHTMLwithSVGandRDFbutWithoutMarquee property.
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
More information about the whatwg
mailing list