[whatwg] Sandboxing to accommodate user generated content.

Tue Jun 17 09:53:20 PDT 2008

Hello,

I'm new to the list and have joined in response to this discussion on
HTML security changes.

>>I have been reading up on past discussions on sandboxing content, and I feel that it is generally agreed on that there should be some mechanism for marking content as "user
>>generated". The discussion mainly appears to be focused on implementation. Please read my implementation notes at the end of this message on how we can include this
>>function safely for both HTML 4 and HTML 5 browsers, and still allow HTML 4 browsers to function properly.
>>
>>
>> In the discussions I find that backward compatibility is absolutely the most important issue. Second is that it must be easy for web developers to use the features.
>>
>> The suggested solution of using an attribute on an <iframe> element for storing the user generated content has several problems:
>>
>> 1: The use of src= as a fallback means that style information will be lost and stylesheets must be loaded again.
>>
>> 2: The use of src= yields problems with iframe heights (since the src URL must be hosted on another server, JavaScript cannot fix this), and HTML 4 browsers have no other method of
>> adjusting the iframe height according to the content.
>>
>> My solution:

>> If we add a new element <htmlarea></htmlarea>, old browsers will run scripts while new browsers will stop them, and this is a major problem.
>>
>> Instead, HTML 5 browsers would require everything between <htmlarea></htmlarea> to be HTML entity escaped, that is, < and > must be replaced with &lt; and &gt; respectively. If this is not
>> done, HTML 5 browsers will issue a severe warning and refuse to display the page. Developers will quickly learn.
>>
>> HTML 4 browsers will never run scripts (since they will only see plain text). HTML 5 browsers will display rich text. It would be completely secure for both HTML 4 and HTML 5
>> browsers.
>>
>> A simple JavaScript could clean up the HTML markup for HTML 4 browsers.

I've also been having side discussions with a few people regarding the
ability for a website owner to mark sections of a page as data rather
than code (today everything is treated as code).
Your <htmlarea> tag idea is a good one (maybe change the tag to <data>,
just a nitpick); however, it doesn't address the following use case:

<data>

<user supplied input>

</data>

If the user injects </data> then it's game over. A solution I found
for this problem (I'm sure others have as well but aren't speaking up)
borrows from the defenses against cross-site request forgery (CSRF),
where a non-guessable token is used. Take the following example:

<data id="<GUID>">
</data>  (injected by the user; does not close the block since it lacks the id)
</data id="<GUID>">

The GUID would be a temporary value such as
'F9968C5E-CEB2-4faa-B6BF-329BF39FA1E4' that would be tied to the user
session. An attacker would be unable to break out of a <data> tag
because they couldn't guess the closing ID value. This is
something that could be built into a web framework (JSP tag/PHP
function/ASP.NET component) that handles the token generation
to assist with adoption.
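
As a rough illustration of what such a framework helper might look like,
here is a minimal Python sketch. The <data id="..."> syntax is only the
proposal above, not anything browsers implement, and the function names
are hypothetical. It uses secrets.token_hex rather than a literal GUID,
which is my substitution for an unguessable value; storing the token in
the user session is left out.

```python
import secrets


def new_boundary_token() -> str:
    # 16 random bytes rendered as 32 hex characters; stands in for the GUID.
    return secrets.token_hex(16)


def wrap_user_content(content: str, token: str) -> str:
    # Refuse content that already contains the token (should never happen
    # when the token is freshly generated and secret).
    if token in content:
        raise ValueError("user content contains the boundary token")
    return f'<data id="{token}">{content}</data id="{token}">'


def extract_user_content(page: str, token: str) -> str:
    # Mimics what a consumer of the proposed syntax would do: only the
    # closing tag carrying the matching id ends the block.
    open_tag = f'<data id="{token}">'
    close_tag = f'</data id="{token}">'
    start = page.index(open_tag) + len(open_tag)
    end = page.index(close_tag, start)
    return page[start:end]
```

An injected plain </data> inside the content stays part of the data
block, since only the closing tag with the matching id is honoured.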

A few notes on this approach:

- <data> (or <htmlarea>, whatever you call it) cannot be nested.
- All content inside data tags would need to be treated as text or
handled as HTML entity encoded values before processing.
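
For the second note, a minimal sketch of entity encoding on the server
side, assuming Python's standard html module. The function names are
hypothetical, and this shows only the encoding step, not the proposed
browser behaviour.

```python
import html


def encode_for_data_block(user_input: str) -> str:
    # Replace &, < and > with entities so the content is plain text to
    # any browser that does not understand the new element.
    return html.escape(user_input, quote=False)


def decode_from_data_block(encoded: str) -> str:
    # What a consumer that understands the block would do to recover the
    # original text before deciding how to display it.
    return html.unescape(encoded)
```

With this, an input of '</data><script>' is emitted as
'&lt;/data&gt;&lt;script&gt;', so it can neither close the block nor
start a script element.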

>>  I believe the idea to deal with this is to add another attribute to <iframe>, besides sandbox="" and seamless="" we already have for sandboxing. This attribute, doc="", would take
>> a string of markup where you would only need to escape the quotation character used (so either ' or "). The fallback for legacy user agents would be the src="" attribute.

To take this a step further, there may be situations where user content
is reflected inside of HTML tags, such as
'<a href="<user generated value>">foo</a>'. For situations like this an
additional attribute (along the lines of what you propose) could be
added to this tag (or any tag for that matter)
to instruct the browser that no script/HTML can execute.

<a sandbox="true" href="javascript:alert(document.cookie)">asd</a>
<a sandbox="true" href="<injected value>">asd</a>  (injected value:  "
onload="javascript:alert('wooot')" foo="bar)

In this example the developer would allow user content to be inserted
into the href value as desired, while script injection and
breaking out of the HTML attribute would be disallowed by specifying
this attribute (i.e. everything inside each attribute is treated as HTML
entity data/text).
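
Until something like this exists in browsers, roughly the same effect
has to be approximated on the server. Below is a minimal Python sketch
under that assumption; render_link and ALLOWED_SCHEMES are hypothetical
names, the scheme allowlist is my own choice, and this is not a complete
URL sanitizer.

```python
import html
from urllib.parse import urlsplit

# Assumption: only these schemes (or a relative URL, scheme "") are allowed.
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}


def render_link(user_href: str, label: str) -> str:
    # Reject script-bearing schemes such as javascript: outright.
    try:
        scheme = urlsplit(user_href.strip()).scheme.lower()
    except ValueError:
        scheme = "invalid"
    if scheme not in ALLOWED_SCHEMES:
        user_href = "#"
    # Entity-encode quotes so the value cannot break out of the attribute.
    safe_href = html.escape(user_href, quote=True)
    return f'<a href="{safe_href}">{html.escape(label)}</a>'
```

With the two examples above, the javascript: link is replaced by "#",
and the injected onload value ends up as inert text inside href because
its quotes are emitted as &quot;.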

My 0.04.

Regards,
- Robert Auger
http://www.webappsec.org/