[whatwg] some thoughts on sandboxed IFRAMEs

Sun Dec 13 16:14:21 PST 2009

On Fri, Dec 11, 2009 at 11:18 PM, Michal Zalewski <lcamtuf at coredump.cx> wrote:
> The ability to sandbox SPANs or DIVs using a token-guarded approach
> (<span sandbox="random_token"></span sandbox="same_token">) is, on the
> other hand, considerably easier on the developer, and probably has a
> very similar implementation complexity.

Well, the problem this random token thing is trying to address is that
the untrusted content could just close the tag. (I fondly remember my
days on Geocities, when we would add <noscript><noscript> to the end
of our pages to try to get rid of the auto-injected ads.) But it's
kind of hacky and might be prone to failure, and the syntax is really
unpleasant (especially for XML compatibility, since XML end tags can't
carry attributes).
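To make the problem concrete, here is a minimal Python sketch that assumes a server building markup by plain string concatenation. `wrap_naively` and `wrap_with_token` are hypothetical helpers, and the `</span sandbox="...">` end-tag syntax is only the quoted proposal, not something any browser implements. With a fixed end tag, the untrusted content can close the element itself. The token variant only helps if the UA closes the element solely on an end tag carrying the matching token.

```python
import secrets

def wrap_naively(untrusted: str) -> str:
    # Fixed end tag: the untrusted content can supply "</span>" first
    # and then put anything it likes outside the sandbox.
    return "<span sandbox>" + untrusted + "</span>"

def wrap_with_token(untrusted: str) -> tuple[str, str]:
    # Hypothetical token-guarded wrapper: pick an unguessable per-response
    # token that doesn't occur in the content, so the content can't forge
    # the matching end tag.
    token = secrets.token_hex(16)
    while token in untrusted:
        token = secrets.token_hex(16)
    markup = (f'<span sandbox="{token}">' + untrusted
              + f'</span sandbox="{token}">')
    return markup, token
```

Note that the server has to generate and check a fresh token for every response, which is part of why this approach is awkward for authors.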

So instead, why not just use the standard escaping mechanisms we
already have?  Allow a sandbox attribute on all elements that can
contain phrasing or flow content.  Any such element with a sandbox
attribute must contain no literal <, >, ' or " before its closing
tag.  If any of those four characters is encountered, the element is
treated as having no contents.  Otherwise, the browser decodes all
character references ("&lt;" -> "<", "&gt;" -> ">", "&amp;" -> "&",
etc.), parses the resulting string as the inner HTML of the element
just like regular HTML, and sandboxes the resulting contents.
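As a rough illustration of the UA side, here is a minimal Python sketch. `sandboxed_inner_html` is a hypothetical name. It assumes the UA has already located the element's closing tag and is handed the raw source text between the start and end tags, a step the proposal doesn't spell out. Python's `html.unescape` stands in for the browser's character-reference decoding.

```python
import html

# The four characters that may not appear literally inside a sandboxed element.
FORBIDDEN = set("<>'\"")

def sandboxed_inner_html(raw: str) -> str:
    """Return the markup that would be parsed (sandboxed) as the element's contents.

    `raw` is the source text between the start tag and the closing tag.
    """
    if any(ch in FORBIDDEN for ch in raw):
        # Unescaped content: the element is treated as having no contents.
        return ""
    # Decode character references (&lt; &gt; &amp; &quot; &#39; ...); the result
    # would then go to the ordinary HTML parser, inside the sandbox.
    return html.unescape(raw)
```

Applied to the examples that follow: plain text passes through unchanged, `&lt;span&gt;...&lt;/span&gt;` becomes a nested span, and anything containing a literal tag or apostrophe comes back empty.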

Examples:

<span sandbox>This span will work normally, except for being sandboxed.</span>

<span sandbox>This span will be <em>empty</em> in the DOM, even though
it contains no evil content, because otherwise authors will forget to
escape the contents of the sandbox.</span>

<span sandbox>&lt;span&gt;But this span will have another span as its
child, sandboxed. The regular parser sees no tags here, only
entities; the sandboxed parser sees a nested span!&lt;/span&gt;</span>

<span sandbox>It would be safe to allow this to work, since it only
contains an apostrophe, but let's not, so that lack of escaping is
easier to catch. This span is therefore also empty.</span>

I think this is easier to use than having to generate a random token,
and also more secure.  If your code isn't escaping things right,
you'll quickly notice when your blog comments all vanish.
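On the author's side, "escaping things right" is a single standard call. Here is a minimal Python sketch, where `render_comment` is a hypothetical helper. It relies on the fact that `html.escape(..., quote=True)` escapes `&`, `<`, `>`, `"` and `'`, so none of the four forbidden characters can reach the sandboxed element.

```python
import html

def render_comment(comment: str) -> str:
    # html.escape with quote=True turns & < > " ' into character references,
    # so the element's contents contain none of the four forbidden characters.
    return f"<span sandbox>{html.escape(comment, quote=True)}</span>"
```

For example, `render_comment("<b>hi</b> it's")` yields `<span sandbox>&lt;b&gt;hi&lt;/b&gt; it&#x27;s</span>`. Dropping the `html.escape` call would make that comment render as an empty span instead.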

This is even backward-compatible, in a certain sense. <jail> would be
unsafe to serve with untrusted contents until all UAs reliably support
it. Escaped sandbox contents would be perfectly safe in all browsers;
they would just display poorly in old browsers if the content contains
any HTML markup, since a browser that ignores the sandbox attribute
shows the escaped markup as literal text.

What do people think of this syntax?