[whatwg] Sandboxing to accommodate user generated content.

Tue Jun 17 10:45:06 PDT 2008

> I've also been having side discussions with a few people regarding the
> ability for a website owner to mark sections as data rather than code
> (where everything lies now).
> Your <htmlarea> tag idea is a good one (maybe change the tag to <data>
> just a nitpick) however you don't address the use case of the
> following
>
> <data>
>
> <user supplied input>
>
> </data>

I have considered your idea (below) but found that it would not allow
efficient server-side caching, which is often needed, because a
per-session token makes the markup different for every user. Suppose
instead that all HTML inside <data></data> must be escaped, like this:

<data>

&lt;user supplied input&gt;

</data>

Then this will be secure in both HTML 4 and HTML 5 browsers. HTML 4
browsers will display the HTML source as text, while HTML 5 browsers
will display the correctly formatted content. A simple JavaScript
snippet like this (untested) would make the data tags readable in
HTML 4 browsers:

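// Untested: strip tags and turn newlines into <br> in every <data> element.
// An indexed loop is used because for-in also visits "length" and "item".
var els = document.getElementsByTagName("DATA");
for (var i = 0; i < els.length; i++) {
  els[i].innerHTML = els[i].innerHTML
    .replace(/<[^>]*>/g, "")
    .replace(/\n/g, "<br>");
}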

A problem with this approach is that developers might forget to escape
tags. I therefore think browsers should display a security warning
message if the character < or > is encountered inside a <data> tag.
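
To make the escaping rule and the proposed warning concrete, here is a
minimal sketch in plain JavaScript. The names escapeForData and
hasRawAngleBrackets are made up for illustration; nothing like them is
specified anywhere, and a real browser check would presumably live in
the parser rather than in script.

```javascript
// Hypothetical helper: escape user input before placing it inside <data>.
// "&" is replaced first so the "&" of the entities produced by the later
// replacements is not escaped a second time.
function escapeForData(input) {
  return String(input)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical check mirroring the proposed warning: returns true if the
// raw source of a <data> body still contains "<" or ">".
function hasRawAngleBrackets(rawBody) {
  return /[<>]/.test(rawBody);
}

var escaped = escapeForData("<script>alert(1)</script> & more");
// escaped is "&lt;script&gt;alert(1)&lt;/script&gt; &amp; more"
```

Since the escaped output does not depend on the user or session, it can
be cached on the server like any other static markup.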

> If the user injects </data> then game over.  A solution I discovered
> for this problem (others I'm sure as well that aren't speaking)
> borrows from the defenses of cross-site request forgery (CSRF) where a
> non guessable token is used. Take the following example
>
> <data id="GUID">
> </data>
> </data id="<GUID>">
>
> GUID would be a temporary GUID value such as
> 'F9968C5E-CEB2-4faa-B6BF-329BF39FA1E4' that would be tied to the user
> session. An attacker would be unable to break out of a <data> tag due
> to the fact that they couldn't guess the closing ID value. This is

*snip*
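
For reference, the token scheme quoted above could be sketched roughly
as follows. This is only an illustration under assumptions: it runs
under Node.js (for the crypto module), the function names are
hypothetical, and the closing form </data id="..."> is a proposal that
no browser parses.

```javascript
// Hypothetical sketch of the quoted token scheme (Node.js assumed).
var crypto = require("crypto");

// Generate an unguessable token: 16 random bytes as a 32-char hex string.
function newToken() {
  return crypto.randomBytes(16).toString("hex");
}

// Wrap untrusted content between delimiters that both carry the token.
function wrapUntrusted(content, token) {
  return '<data id="' + token + '">' + content + '</data id="' + token + '">';
}

// Recover the content: only a closing delimiter carrying the same token
// ends the block, so a plain </data> inside the content is ignored.
function unwrapUntrusted(markup, token) {
  var open = '<data id="' + token + '">';
  var close = '</data id="' + token + '">';
  var start = markup.indexOf(open);
  if (start === -1) return null;
  var end = markup.indexOf(close, start + open.length);
  if (end === -1) return null;
  return markup.slice(start + open.length, end);
}

var token = newToken();
var attack = "hi</data><script>alert(1)</script>";
var page = wrapUntrusted(attack, token);
var recovered = unwrapUntrusted(page, token);
```

Because the token changes with each session, the wrapped markup differs
between users, which is where the caching concern comes from.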

>>>  I believe the idea to deal with this is to add another attribute to <iframe>, besides sandbox="" and seamless="" we already have for sandboxing. This attribute, doc="", would take
>>> a string of markup where you would only need to escape the quotation character used (so either ' or "). The fallback for legacy user agents would be the src="" attribute.
>
> To take this a step further there may be situations where user content
> is reflected inside of HTML tags in the following manner such as
> '<a href="<user generated value>">foo</a>'. For situations like this an
> additional attribute (along the lines of what you propose) could be
> added to this tag (or any tag for that matter)
> to instruct the browser that no script/html can execute.
>
> <a sandbox="true" href="javascript:alert(document.cookie)">asd</a>
> <a sandbox="true" href="<injected value>">asd</a>  (injected value  "
> onload="javascript:alert('wooot')" foo="bar)

Yes, I like this better than a separate tag: <div sandbox="1"></div> or
<div content="untrusted"></div>.
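
To show what the injected value quoted above does, here is a small
sketch in plain JavaScript. Note that it demonstrates ordinary
server-side attribute escaping, not the proposed sandbox attribute, and
the helper name escapeAttr is hypothetical.

```javascript
// Hypothetical helper: escape a value for a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// The injected value from the quoted example.
var injected = '" onload="javascript:alert(\'wooot\')" foo="bar';

// Naive concatenation: the leading quote closes href, and the rest
// becomes extra attributes on the tag.
var naive = '<a href="' + injected + '">asd</a>';

// Escaped: the whole value stays inside the href attribute.
var safe = '<a href="' + escapeAttr(injected) + '">asd</a>';
```

A sandbox attribute would be a second line of defense for the case
where this escaping is forgotten.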