[whatwg] Sandboxing to accommodate user generated content.

Frode Børli frode at seria.no
Tue Jun 17 06:05:09 PDT 2008


I have been reading up on past discussions on sandboxing content, and
I feel that it is generally agreed on that there should be some
mechanism for marking content as "user generated". The discussion
mainly appears to be focused on implementation. Please read my
implementation notes at the end of this message on how we can include
this function safely for both HTML 4 and HTML 5 browsers, and still
allow HTML 4 browsers to function properly.


My main arguments for having this feature (in one form or another) in
the browser is:

- It is future proof. Changes to browsers (for example adding
expression support to css) will never again require old sanitizers to
be updated.
- It does not require much skill and effort from the web developer to
safely sanitize user content.
- Security bugs are fixed by browser vendors, and not by each web developer.


In the discussions I find that backward compatability is absolutely
the most important issue. Second is that it must be easy for web
developers to use the features.

The suggested solution of using an attribute on an <iframe> element
for storing the user generated content has several problems;

1: The use of src= as a fallback means that style information will be
lost and stylesheets must be loaded again.

2: The use of src= yields problems with iframe heights (since the
src-url must be hosted on another server javascript cannot fix this)
and HTML 4 browsers have no other method of adjusting the iframe
height according to the content.

3: If you have a page that lists 60 comments on a blog, then the user
agent would have to contact the server 60 times to fetch each comment.
This again means that perl/php scripts have to be invoked 60 times for
one page view - that is 61 separate database connections and session
initializations.

4: For the fallback method of using src= for HTML 4 browsers to
actually work, the fallback documents must be hosted on a separate
domain name. This again means that a website using HTTPS must purchase
and maintain two certificates.

I do not believe this solution will ever be used.


My solution:

If we add a new element <htmlarea></htmlarea>, old browsers will run
scripts, while new browsers will stop scripts and this is a major
problem.

If HTML 5 browsers require everything between <htmlarea></htmlarea> to
be html entity escaped, that is < and > must be replaced with &lt; and
&gt; respectively. If this is not done, HTML 5 browsers will issue a
severe warning and refuse to display the page. Developers will quickly
learn.

HTML 4 browsers will never run scripts (since it will only see plain
text). HTML 5 browsers will display rich text. It would be completely
secure for both HTML 4 and HTML 5 browsers.

A simple Javascript could clean up the HTML markup for HTML 4 browsers..


>  I believe the idea to deal with this is to add another attribute to <iframe>, besides sandbox="" and seamless="" we already have for sandboxing. This attribute, doc="", would take a string of markup where you would only need to escape the quotation character used (so either ' or "). The fallback for legacy user agents would be the src="" attribute.

-- 
Best regards / Med vennlig hilsen
Frode Børli
Seria.no

Mobile:
+47 406 16 637
Company:
+47 216 90 000
Fax:
+47 216 91 000


Think about the environment. Do not print this e-mail unless you really need to.

Tenk miljø. Ikke skriv ut denne e-posten dersom det ikke er nødvendig.


More information about the whatwg mailing list