[whatwg] some thoughts on sandboxed IFRAMEs

Mon Jan 25 11:51:33 PST 2010

> I've introduced srcdoc="" to largely handle this. There is an example in
> the spec showing how it can be used.

Yup, sounds good.

> This has been proposed before. The concern is that many authors would be
> likely to make mistakes in their selection of "random" tokens that would
> lead to significant flaws in the deployment of the feature.
>
> srcdoc="" is less prone to errors. Only " and & characters need to be
> escaped. If the " character is not escaped, then a single " character in
> the input will cause the comment to break.

My counterargument, as stated later in the thread, is quite simple:
the former *forces* you to implement a security mechanism, else the
functionality will break. You can still use a bad token, but you are
required to make the effort.

In that regard, the comparison to XSRF is probably not valid; a vast
majority of XSRF bugs occur not because people pick poor tokens (in
fact, that's really a minority), but because they don't use them at
all. From that perspective, srcdoc="..." is very similar to XSRF -
people will mess it up simply by not thinking about the correct
escaping.
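
For concreteness, the escaping step that srcdoc="..." depends on can be
sketched as follows. This is a minimal Python illustration, not code
from the spec or from this thread; the helper names srcdoc_attr and
sandboxed_iframe are hypothetical. It escapes only & and ", as the
quoted text describes.

```python
def srcdoc_attr(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute value.

    Hypothetical helper. '&' has to be replaced first; otherwise the
    '&' introduced by '&quot;' would itself be escaped a second time.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


def sandboxed_iframe(markup: str) -> str:
    # A bare sandbox attribute (no value) applies all restrictions.
    return '<iframe sandbox srcdoc="' + srcdoc_attr(markup) + '"></iframe>'


user_comment = 'He said "hi" & left'
print(sandboxed_iframe(user_comment))
```

Nothing on the server side complains if the call to srcdoc_attr is
left out; the output is simply broken or unsafe markup, which is the
failure mode discussed above.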

That said, I am not really arguing against srcdoc="..."; I think it's
an OK feature. My point is simply that I would love to see less
fragmentation when it comes to XSS defenses and the granularity of
security controls. The initial proposal of <iframe> sandboxes solved a
very narrow use case, and other, unrelated designs started to spring
up elsewhere. This wouldn't be bad by itself, but while the security
controls on <iframe> were pretty great (with some tweaks, such as
text/html-sandboxed), they would not be reflected in other APIs, which
I thought was unfortunate.

If we extend sandboxed iframes with srcdoc, seamless frames,
text/html-sandboxed, and <iframe> rendering performance improvements,
it actually becomes close to a comprehensive solution, and I am happy
with this (other than a vague feeling that we just repurposed <iframe>
to be some sort of a <span> ;-).

> I've introduced text/html-sandboxed for this purpose.

Yup, I noticed. Looks great. It does make me wonder about two things, though:

1) Some other security mechanisms (CORS, anti-clickjacking controls,
XSS filter controls) rely on separate HTTP headers instead. Is there a
compelling reason not to follow that lead - or better yet, to unify
all security headers to conserve space?

2) People may conceivably want to sandbox other document types (e.g.,
SVG, RSS, or other XML-based formats rendered natively, and offering
scripting capabilities). Do we want to create "-sandboxed" MIME types
for each? The header approach would fix this, too.

>> 2.1) The ability to disable loading of external resources (images,
>> scripts, etc) in the sandboxed document. The common usage scenario is
>> when you do not want the displayed document to "phone home" for privacy
>> reasons, for example in a web mail system.
>
> Good point. Should we make sandbox="" disable off-origin network requests?

That would be great. I think Adam proposed we have a separate
sandbox="..." toggle for this. Whether it's on or off by default
probably doesn't matter much.

>> 2.2) The ability to disable HTML parsing. On IFRAMEs, this can actually
>> be approximated with the excommunicated <plaintext> tag, or with
>> Content-Type: text/plain / data:text/plain,. On token-guarded SPANs or
>> DIVs, however, it would be pretty damn useful for displaying text
>> content without the need to escape &, <, >, etc. "Pure" security benefit
>> is limited, but as a phishing prevention and display correctness
>> measure, it makes sense.
>
> I don't really understand the use case here; could you elaborate?

One use case is a web forum or a web mail interface where you want to
display a message, but specifically don't want HTML formatting. Or,
performance permitting, the same could be used for any text-only entry
fields displayed on a page. These are common XSS vectors, and they are
not readily addressed by sandboxed <iframe> + srcdoc="...", because
this will not render as expected:

User's favorite smiley face is: <iframe srcdoc="<O>_<O>"></iframe>

Having a drop-in solution for this would be pretty nice, and very easy
to implement, too: just force text/plain, do not sniff.
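
With srcdoc as it stands, getting the smiley above to display as text
takes two layers of escaping: one so that the inner document treats the
string as text, and one more for the attribute value. A rough Python
sketch, with a function name made up for illustration:

```python
import html


def text_in_srcdoc(text: str) -> str:
    # Layer 1: make the string inert text inside the inner HTML document.
    inner = html.escape(text, quote=False)
    # Layer 2: escape the result for a double-quoted attribute value.
    # html.escape also rewrites <, > and ' here, which is more than the
    # attribute strictly needs but is harmless.
    attr = html.escape(inner, quote=True)
    return '<iframe sandbox srcdoc="' + attr + '"></iframe>'


print(text_in_srcdoc("<O>_<O>"))
# prints: <iframe sandbox srcdoc="&amp;lt;O&amp;gt;_&amp;lt;O&amp;gt;"></iframe>
```

Forgetting either layer still yields a page that loads, which is why a
"force text/plain" mode would be the less error-prone option for this
case.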

> Do people get CSRF right more often than simply escaping characters? It
> seems implausible that authors get complex cryptographic properties right
> more often than a simple set of substitutions, but I suppose stranger
> things are true on the Web.

Keep in mind that pretty much every web application already needs to
safely generate unique, unpredictable tokens - for session identifiers
that guard authenticated sessions. If they can't get it right, they
are hosed anyway - but problems here are not horribly common, in my
experience at least, and web app frameworks do a decent job of helping
developers by providing token-generating facilities.
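
As an illustration of how little effort such a facility demands of the
developer, here is a sketch using Python's standard secrets module. The
function name new_token and the 16-byte size are assumptions chosen for
the example, not a recommendation from this thread.

```python
import secrets


def new_token(nbytes: int = 16) -> str:
    # secrets draws from the operating system's CSPRNG; 16 random bytes
    # give 128 bits of unpredictability, encoded as URL-safe text.
    return secrets.token_urlsafe(nbytes)


session_id = new_token()
print(len(session_id))
```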

As noted earlier, the vast majority of issues with XSS and XSRF
defenses arise because you explicitly need to think about them, and a
failure to do so has no obvious side effects. From that perspective,
any solution that requires a security mechanism to be used is far
better.

>> Also, a single token on a returned page, as long as it's unpredictable
>> across user sessions, should not be a significant issue.
> I'm just worried that some people would just use a constant string.

Sure; but there's a difference between the near certainty of
forgetting to escape something, somewhere, in a complex web app (in
fact, I did so on several occasions in my code, despite making a
living by pointing these issues out to others on a daily basis) - and
willingly sabotaging your app. Browser vendors / standards bodies are
partly at fault for the former, but have no obligation to prevent the
latter.

>> I was merely suggesting that we *expand* the same logic, and the same
>> excellent security control granularity, to span and div; this seems like
>> it would not increase the implementation complexity in any significant
>> way.
>
> I don't understand the proposal then. What is the problem it is solving,
> and how does it solve it?

It makes it possible to conveniently sandbox small, inline snippets of
user input on HTML pages, without a significant performance penalty.
As noted, if <iframe> rendering speed is improved instead, and srcdoc
and seamless rendering are implemented, the two proposals become
roughly equivalent, though (save for the argument over srcdoc="..."
versus guard tokens). It does feel a bit odd to turn an <iframe> into
something that looks like <span>, swims like <span>, and quacks like
<span> - but that's mostly an aesthetic objection that shouldn't
matter much (a labored argument could be made that it will not be
obvious you can and should use <iframe> this way, so some people may
not fully utilize the security benefits it offers - but...)

> I don't understand what you mean by "does not enforce a security control",
> or how a guarded closing tag does "enforce a security control".

If you have the intent to display user-controlled input, you have three options:

<span>[server-sanitized string]</span>
<iframe srcdoc="[server-escaped string]"></iframe>
<span guard=[token]>[any string]</span guard=[token]>

The first two options will not immediately fail if you forget about or
mess up escaping or sanitization - and people do, all the time. The
last option just works, unless you purposefully sabotage your code by
using "1234", or can't generate a random number / session token on
server side (in which case, you are hosed already - session cookies
can be guessed, too). The last option strikes me as a bit less
error-prone.
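
The server side of the third option could look roughly like the Python
sketch below. The guard=[token] syntax is only the proposal discussed
in this thread, not something browsers are known to implement, and the
function name guarded_span is hypothetical.

```python
import secrets


def guarded_span(untrusted: str, token: str) -> str:
    """Wrap untrusted text in the proposed token-guarded span.

    No escaping is applied to the content; under the proposal, only a
    closing tag carrying the same token would end the guarded region.
    """
    if token in untrusted:
        # Should be practically impossible with a fresh random token.
        raise ValueError("guard token appears in content")
    return "<span guard=" + token + ">" + untrusted + "</span guard=" + token + ">"


# A fresh, unpredictable token per response (assumption: 128 bits).
token = secrets.token_hex(16)
print(guarded_span("</span><script>alert(1)</script>", token))
```

Note that there is no code path here that emits the content without a
token at all, which is the property the comparison above rests on.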

/mz