[whatwg] some thoughts on sandboxed IFRAMEs

Ian Hickson ian at hixie.ch
Sun Jan 24 02:52:24 PST 2010

On Fri, 11 Dec 2009, Michal Zalewski wrote:
> 1) IFRAME semantics make it exceedingly cumbersome to sandbox short 
> snippets of text, and this task is perhaps the most common and pressing 
> XSS-related challenge. Unless the document is constructed on client side 
> by JavaScript, sites would need to use opaque data: URLs, or put up with 
> a lot of additional HTTP roundtrips, to utilize sandboxed IFRAMEs for 
> this purpose. [ There is also the problem of formatting and positioning 
> IFRAME content, although the seamless attribute would fix this. ]

I've introduced srcdoc="" to largely handle this. There is an example in 
the spec showing how it can be used.

> The ability to sandbox SPANs or DIVs using a token-guarded approach
> (<span sandbox="random_token"></span sandbox="same_token">) is, on the
> other hand, considerably easier on the developer, and probably has a
> very similar implementation complexity.

This has been proposed before. The concern is that many authors would be 
likely to make mistakes in their selection of "random" tokens that would 
lead to significant flaws in the deployment of the feature.

srcdoc="" is less prone to errors. Only " and & characters need to be 
escaped. If the " character is not escaped, then a single " character in 
the input will cause the comment to break. This is likely to be caught 
early. If the & character is not escaped, correctness and fidelity will 
suffer, but it will not lead to security errors.
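
As a minimal sketch of the escaping rule just described (the helper names `escapeForSrcdoc` and `sandboxedComment` are made up for illustration and are not from the spec), a server-side template could do something like the following, assuming the attribute is always written with double quotes:

```javascript
// Escape untrusted HTML for use inside a double-quoted srcdoc="" attribute.
// Only & and " need escaping. & is handled first so that the entities
// produced for " are not escaped a second time.
function escapeForSrcdoc(html) {
  return html.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap one untrusted comment in a sandboxed iframe.
function sandboxedComment(untrustedHtml) {
  return '<iframe sandbox srcdoc="' + escapeForSrcdoc(untrustedHtml) + '"></iframe>';
}

console.log(sandboxedComment('<p>Say "hi" & <script>alert(1)</script></p>'));
```

If the quote replacement is forgotten, the first " in any comment ends the attribute early and the comment visibly breaks, which is the early failure mode mentioned above.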

> 2) Renderers suck dealing with IFRAMEs, and will probably continue to
> do so for the time being. This means that a typical, moderately complex
> application (say, as a discussion forum or a social site), where
> hundreds of user-controlled strings may need to be present to display
> user content - the mechanism would have an unacceptable load time and
> memory footprint. In fact, people are already coming up with
> lightweight alternatives with a significant functionality overlap (and
> different security controls). Microsoft has toStaticHTML(), while a
> standardized implementation is being discussed here right now in a
> separate thread.

I agree that we should investigate other options too (<iframe> boxes 
aren't suitable for everything), but I don't think that current 
implementation problems with <iframe> should necessarily prevent us from 
investigating sandboxed iframes too.

In certain contexts, e.g. reddit comments, it may be the case that instead 
of one sandboxed <iframe> per comment, the best way to do things is 
instead one sandboxed iframe for all the comments, with scripts disabled 
and allow-same-origin enabled, so that scripts in the embedding page can 
reach into the frame and set event handlers on all the relevant links.
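
A rough sketch of that single-frame arrangement, under the assumption that the comment list is rendered server-side into one srcdoc="" value (the helper `commentsFrame`, the `comment` class and the `c0`, `c1`, ... ids are illustrative only):

```javascript
// Escape a document for a double-quoted srcdoc="" attribute (& first, then ").
function escapeForSrcdoc(html) {
  return html.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: one frame for the whole comment list.
// sandbox="allow-same-origin" leaves scripts disabled inside the frame
// (allow-scripts is not listed) but keeps the frame in the embedding
// page's origin, so the embedding page can reach its document.
function commentsFrame(comments) {
  const doc = comments
    .map((html, i) => '<div class="comment" id="c' + i + '">' + html + "</div>")
    .join("\n");
  return '<iframe sandbox="allow-same-origin" srcdoc="' + escapeForSrcdoc(doc) + '"></iframe>';
}

console.log(commentsFrame(['<a href="/x">first</a>', "second & last"]));
```

In this sketch one comment can still disturb the layout of another (for example with a stray closing tag); the sandbox limits what the markup can do, not how it nests.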

> Isn't the benefit of keeping the design slightly simpler (and 
> realistically, limited to relatively few usage scenarios) negated by the 
> fact that alternative solutions to other narrow problems would need to 
> emerge elsewhere? The browser coming with several different script 
> sanitizers with completely different APIs and security controls does not 
> strike me as a desirable outcome (all the flavors of SOP are a testament 
> to this). If the answer is not a strong "no", maybe the token-guarded DIV 
> / SPAN approach is a better alternative?

I agree in principle that fewer features are better than more features, 
but we have to take into account that many of the people deploying these 
features know nothing about security. We have to ensure that the security 
aspects of features like this (like what to escape, what security tokens 
need to be generated) are aligned with the practical aspects of features 
like this (like what results in the page appearing to work, regardless of 
the state of security).

> Now, that aside - on a more pragmatic level, I have two extra comments:
> 1) The utility of the SOP sandboxing behavior outlined in the spec is
> diminished if we have no way to actually *enforce* that the IFRAMEd
> resource would only be rendered in such a context. If I am serving
> user-supplied, unsanitized HTML, it is obviously safe to do <iframe
> sandbox src="show.cgi?id=1234"></iframe> - but where do we prevent the
> attacker from calling http://my_site/show.cgi?id=1234 directly, and
> bypassing the filter?

I've introduced text/html-sandboxed for this purpose.
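
For illustration only, here is how a show.cgi-style endpoint might label its response, assuming a Node.js-style response object and a browser that implements text/html-sandboxed as described here. The function name `serveUserContent` and the `lookupContent` helper in the usage comment are hypothetical.

```javascript
// Hypothetical handler body for a URL that returns untrusted HTML.
// `res` is expected to behave like Node's http.ServerResponse.
function serveUserContent(res, untrustedHtml) {
  // Label the resource with the sandboxed MIME type instead of text/html,
  // so that navigating to the URL directly does not render it as an
  // ordinary same-origin page.
  res.writeHead(200, { "Content-Type": "text/html-sandboxed" });
  res.end(untrustedHtml);
}

// Possible wiring (lookupContent is an assumed application function):
//   require("node:http")
//     .createServer((req, res) => serveUserContent(res, lookupContent(req.url)))
//     .listen(8080);
```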

> 2.1) The ability to disable loading of external resources (images, 
> scripts, etc) in the sandboxed document. The common usage scenario is 
> when you do not want the displayed document to "phone home" for privacy 
> reasons, for example in a web mail system.

Good point. Should we make sandbox="" disable off-origin network requests?

> 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually 
> be approximated with the excommunicated <plaintext> tag, or with 
> Content-Type: text/plain / data:text/plain,. On token-guarded SPANs or 
> DIVs, however, it would be pretty damn useful for displaying text 
> content without the need to escape &, <, >, etc. "Pure" security benefit 
> is limited, but as a phishing prevention and display correctness 
> measure, it makes sense.

I don't really understand the use case here; could you elaborate?

On Sun, 13 Dec 2009, Michal Zalewski replied to Tab:
> >
> > I believe that the @doc attribute, discussed in the original threads 
> > about @sandbox, will be introduced to deal with that.  It'll take 
> > plain html as a string, avoiding the opaqueness and larger escaping 
> > requirements of a data:// url, as the only thing you'll have to escape 
> > is whichever quote you're using to surround the value.
> That doesn't strike me as a robust way to prevent XSS - the primary 
> reason why we need sandboxing to begin with is that people have a 
> difficulty properly parsing, serializing, or escaping HTML; so replacing 
> this with a mechanism that still requires escaping is perhaps 
> suboptimal.

There's a world of difference between "properly parsing, serializing, or 
escaping HTML" and "escaping quotes and ampersands".

> >  More importantly, though, it puts a significant burden on authors to 
> > generate unpredictable tokens.  Is this difficult?  No, of course not. 
> > But people *will* do it badly, copypasting a single token in all their 
> > <iframe>s or similar.
> People already need to do this well for XSRF defenses to work, and I'd 
> wager it's a much simpler and better-defined problem than real-world 
> HTML parsing and escaping could realistically be. It is also very easy 
> to delegate this task to existing functions in common web frameworks.

Do people get CSRF right more often than simply escaping characters? It 
seems implausible that authors get complex cryptographic properties right 
more often than a simple set of substitutions, but I suppose stranger 
things are true on the Web.

> Also, a single token on a returned page, as long as it's unpredictable 
> across user sessions, should not be a significant issue.

I'm just worried that some people would just use a constant string.
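
To make the concern concrete, here is a toy model of the token-guarded proposal, written for Node.js. The markup follows the syntax quoted earlier in this message; `freshToken`, `guardedSpan` and `closesGuardEarly` are invented names, and the matching rule is a simplification of whatever a real parser would do.

```javascript
const { randomBytes } = require("node:crypto");

// What the proposal needs: a fresh, unguessable value per response.
function freshToken() {
  return randomBytes(16).toString("hex"); // 128 bits from the system CSPRNG
}

// Markup in the proposed token-guarded syntax (hypothetical, for illustration).
function guardedSpan(untrustedHtml, token) {
  return '<span sandbox="' + token + '">' + untrustedHtml + '</span sandbox="' + token + '">';
}

// Toy matching rule: the guard ends at the first closing tag carrying the token.
function closesGuardEarly(untrustedHtml, token) {
  return untrustedHtml.includes('</span sandbox="' + token + '">');
}

// The worry above: a constant token can be read from any page and reused.
const CONSTANT_TOKEN = "abc123";
const attack = '</span sandbox="abc123"><script>alert(1)</script>';

console.log(closesGuardEarly(attack, CONSTANT_TOKEN)); // attacker escapes the guard
console.log(closesGuardEarly(attack, freshToken()));   // attacker cannot guess the token
```

Both variants produce pages that appear to work, which is why the constant-token mistake would tend to go unnoticed until someone exploits it.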

On Sun, 13 Dec 2009, Adam Barth wrote:
> I agree that we need something to help with content received by 
> cross-site XMLHttpRequest and postMessage.  For those use cases, we're 
> already running script, so a design like toStaticHTML seems better than 
> <jail>.

If the data is to be rendered into a block-level box, it seems that 
srcdoc="" might actually handle that case too.

On Sun, 13 Dec 2009, Michal Zalewski replied to Adam:
> >
> > The @sandbox seems like a better fit for the advertising use case.
> I am not contesting this, to be clear - I am aware of many cases where 
> it would be very useful - but gadgets are a fairly small part of the 
> Internet, and seems like a unified solution would be more desirable than 
> several very different APIs with different granularity.
> The toStaticHTML-alike will address other specific uses, but will 
> leave applications that can't rely on JS exclusively for their rendering 
> needs (which I'd wager is still a majority) out in the cold; which would 
> probably lead to yet another XSS prevention / HTML sandboxing approach 
> emerging later on.
> I haven't really seen a compelling argument why all these can't be 
> unified without a significant increase in code or spec complexity - 
> maybe one exists.

What would they be unified under? I don't think anyone has proposed 
anything that solves all the problems that CSP, sandbox="", srcdoc="", 
toStaticHTML(), httpOnly, text/html-sandboxed, and the various other 
"security" mechanisms introduced to the platform over the past few years 
would solve without introducing more complexity overall.

There are many problems to solve. It seems logical that we'd end up with 
many solutions.

On Sun, 13 Dec 2009, Michal Zalewski replied to Adam:
> >
> > That seems like a backwards way of proceeding.  Do you have a proposal 
> > for unification besides the <jail> tag?
> The only fundamental objection I have heard against it is the trouble 
> with XML representation.

Well, it also doesn't really solve all the problems. For example, it 
doesn't solve the "embedding external content safely" problem.

> The other option is to simply require a traditional CDATA-esque behavior 
> or a tag parameter - which would place the burden on the author to 
> filter out / escape a single exact string or a quote, but would be 
> similar otherwise.

That's similar to what srcdoc="" does when used with sandbox="".

> It's obviously less secure - because while the token-based approach
> actually requires the user to explicitly come up with a token, however
> poor it might be; whereas here, there is no way to enforce escaping.

The token-based approach could lead an author to pick a constant token, 
which is just as useless as not enforcing escaping, except that the author 
had to think about security in order to use it, and will thus have a false 
sense of security whose only likely failure mode is an actual attack. 
Compare this to srcdoc="", where the failure mode is 
the use of a quote mark, and is thus likely to happen much earlier than an 
attack. It's also easier to understand the failure mode. "The token has to 
be unguessable" is harder to explain than "quotes have to be escaped".

> From Tab's response, looks like it's being considered, too - @doc + 
> @seamless. What strikes me as a bit ironic is that this way, we're 
> overloading IFRAME to become something else entirely, and after 
> rejecting token-guards, settling for an option that is definitely not 
> perfect, and in practice, I think, is bound to be less secure.

I don't really follow the "something else entirely" bit. Also, why would 
it be less secure? What is the attack scenario?

On Sun, 13 Dec 2009, Michal Zalewski wrote:
> Huh? But that's not the point I am making... I am not arguing that 
> iframe sandbox should be abandoned as a bad idea - quite the opposite.
> I was merely suggesting that we *expand* the same logic, and the same 
> excellent security control granularity, to span and div; this seems like 
> it would not increase the implementation complexity in any significant 
> way.

I don't understand the proposal then. What is the problem it is solving, 
and how does it solve it?

> We could then allow these to be populated with secure contents in three 
> ways:
> 1) Guarded closing tag - this is simple and bullet-proof; but may 
> conflict with XML serializations, and hence require some hacks,

I strongly disagree with the characterisation of this idea as "simple and 
bullet-proof", at least for anyone who doesn't understand cryptography.

> 2) CDATA or @doc-like approaches. Less secure because it does not 
> enforce a security control, but less contentious, and already being 
> considered for IFRAMEs.

I don't understand what you mean by "does not enforce a security control", 
or how a guarded closing tag does "enforce a security control".

> 3) .innerHTML, which would be then safe by default, without the need for 
> .innerSafeHTML (and the associated ambiguities) or explicit 
> .toStaticHTML calls.

To run scripts in a safe environment, we need to have a separate global 
object, which is why we're using <iframe> for it. This supports the 
equivalent of ".innerHTML" as you describe (.srcdoc).

If you just want something that blocks scripts, plugins, forms, targeted 
links, etc, without a separate document, then it's not clear to me that 
that is something that is sanely achievable. It would require complex 
changes all over the place.

What is the use case this is targeted at?

On Sun, 13 Dec 2009, Adam Barth wrote:
> I'm very interested in a solution that works for the following use 
> cases:
> 1) A web page wants to display untrusted (i.e., restricted) HTML 
> received via cross-site XMLHttpRequest or postMessage.

Do you have a concrete use case for which <iframe> doesn't work?

> 2) A blog wishes to display many comments containing untrusted (i.e., 
> restricted) HTML.

It seems <iframe srcdoc> works well for this case. You can even safely 
enable scripts in the comments, so that people can upload little 
calculator-like things or games, not that I would recommend that!

On Sun, 13 Dec 2009, Michal Zalewski wrote:
> [...] this really strikes me as throwing random ideas at the wall, and 
> seeing which ones stick.

Welcome to Web standards development. :-)

> Furthermore, in this particular case, I am really concerned that the 
> spec is at odds with itself - you mention certain specific use cases, 
> but the spec seems to be after a broader goal: sandboxing user-supplied 
> content in general. In doing so, it gives some bad advice (again, the 
> user content example is exploitable, at least until the arrival of some 
> out-of-scope security mechanism to prevent it).

I've added a warning to the spec pointing out that the text/html-sandboxed 
MIME type has to be used in that case.

On Sun, 13 Dec 2009, Aryeh Gregor wrote:
> So instead, why not just use the standard escaping mechanisms we already 
> have?  Allow a sandbox attribute on all elements that can contain 
> phrasing or flow content.  Any such element with a sandbox attribute 
> will be required to contain no literal <>'" before the closing tag.  If 
> any of those four characters is encountered, the element is treated as 
> having no contents.  Otherwise, the browser unescapes all characters 
> with special meanings ("&lt;" -> "<", "&gt;" -> ">", "&amp;" -> "&", 
> etc.) and then treats the resulting string as the inner HTML of the 
> element, parsing it like regular HTML, but the contents are sandboxed.
> Examples:
> <span sandbox>This span will work normally, except for being 
> sandboxed.</span>
> <span sandbox>This span will be <em>empty</em> in the DOM, even though 
> it contains no evil content, because otherwise authors will forget to 
> escape the contents of the sandbox.</span>
> <span sandbox>&lt;span&gt;But this span will have another span as its 
> child, sandboxed.  The regular parser sees no entities here, only a 
> nested span!&lt;/span&gt;</span>
> <span sandbox>It would be safe to allow this to work, since it only 
> contains an apostrophe, but let's not, so that lack of escaping is 
> easier to catch.  This span is therefore also empty.</span>

What would the "sandbox" do, other than require one level of escaping? 
i.e. what is it protecting against?
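
For reference, the content rule in the proposal quoted above can be modelled roughly as follows. This is a sketch of the rule as stated there, not of any implemented parser; `sandboxContents` is a made-up name, and the entity table covers only a handful of entities where the proposal says "etc.".

```javascript
// Small, illustrative subset of character references to unescape.
const ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', "#39": "'" };

// Returns the string that would then be parsed as the element's inner HTML,
// or "" when the raw contents contain a literal < > ' or ".
function sandboxContents(raw) {
  if (/[<>'"]/.test(raw)) return "";
  // Single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
  return raw.replace(/&(lt|gt|amp|quot|#39);/g, (match, name) => ENTITIES[name]);
}

console.log(sandboxContents("This span will work normally."));
console.log(sandboxContents("This span will be <em>empty</em>."));
console.log(sandboxContents("&lt;span&gt;nested&lt;/span&gt;"));
```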

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
