[whatwg] Sandboxing to accommodate user generated content.

Frode Børli frode at seria.no
Mon Jun 16 21:09:55 PDT 2008


Hi! I am a new member of this mailing list, and I wish to contribute a
couple of specific requirements that I believe should be discussed and
perhaps implemented in the final specification. I am unsure whether this is
the correct place to post my ideas (or whether my ideas are even new), but
if it is not, then I am sure somebody will tell me. :) One person told me
that the specification was finished and no new features would be added from
now on - but hopefully that is not true.


The challenge:

More and more websites have features where users can contribute
user-generated content - often in the form of audio, video, images
or wiki articles. An older type of content contribution is plain text, such
as posts in a discussion forum, on a mailing list such as this one, or
comments on blog articles.

A major challenge for many web developers is validating "untrusted" content
such as the message body of a blog comment. Unless the developer has a
flawless and future-proof algorithm for ensuring that the message body does
not contain any script, they have to resort to text only - or to
bbCode-style markup languages - to let users post text content with
richer formatting. A developer who wants to enable rich formatting using
bbCode also needs fairly advanced methods of ensuring that no scripts
are executed. Consider this bbCode example:
[img]some_image.jpg'onmouseover=maliciousScript()[/img]. The bbCode parser
must ensure that there is absolutely no way of injecting scripts in user
posts - and that is very difficult when, at the same time, browsers have
parsing errors. The example could easily be validated by not
allowing apostrophes or quotation marks in URLs - but then there are multiple
entities that could be used instead, such as &#39; or &#x27;. To make
matters worse, some browsers parse &#39, which is an incomplete HTML entity,
and all these variations must be considered by the bbCode parser author.
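
To illustrate the kind of handling the [img] example calls for, here is a
minimal sketch in Python. The render_img helper is hypothetical and not taken
from any real bbCode library. It assumes the parser has already extracted the
text between [img] and [/img]. It first decodes any character references
(including the incomplete &#39 form) and then re-escapes the result before
placing it in a double-quoted attribute. It does not check the URL scheme, so
a real parser would need more than this.

```python
import html


def render_img(raw_url: str) -> str:
    """Hypothetical helper: turn the body of an [img] tag into an <img> element."""
    # Normalise first: decode &#39;, &#x27;, &apos; and even the
    # semicolon-less &#39 into the literal character they stand for.
    decoded = html.unescape(raw_url)
    # Then escape &, <, >, " and ' so the value cannot close the attribute.
    safe = html.escape(decoded, quote=True)
    return '<img src="' + safe + '">'


if __name__ == "__main__":
    print(render_img("some_image.jpg'onmouseover=maliciousScript()"))
    print(render_img("some_image.jpg&#39onmouseover=maliciousScript()"))
```

Both calls produce the same output, with the apostrophe rendered as &#x27;
inside the src value, so no new attribute is created.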

Another problem, which makes future-proofing this type of security
difficult, is that standards evolve. A few years ago you could safely allow
users to apply CSS styles to tags. For example, the bbCode tag
[color=blue]Blue text[/color] would be translated to
<span style='color: blue'>Blue text</span>. In this example
an exploit could be [color=expression(maliciousCode())]Text[/color]. When
the algorithm was written, it was considered secure, since no script could
ever be executed inside a style attribute. With the invention of expressions,
behaviours etc., the knowledge required of web developers is ever increasing,
and web developers have to review all old code whenever new technologies
emerge - because what once was secure suddenly is not secure anymore.
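
One common way to limit this kind of exposure is an allow-list: accept only
values known to be harmless rather than trying to reject every dangerous one.
The following Python sketch shows the idea for the [color] tag. The
render_color helper, the short list of colour names and the hex pattern are
all assumptions made for illustration. An allow-list still has to be
maintained, but new browser features such as expression() are rejected by
default.

```python
import html
import re

# Assumed allow-list: a few colour names plus #rgb / #rrggbb hex values.
NAMED_COLORS = {"black", "white", "red", "green", "blue", "yellow", "gray"}
HEX_COLOR = re.compile(r"#(?:[0-9a-f]{3}|[0-9a-f]{6})")


def render_color(value: str, text: str) -> str:
    """Hypothetical helper for [color=value]text[/color]."""
    candidate = value.strip().lower()
    escaped_text = html.escape(text)
    if candidate in NAMED_COLORS or HEX_COLOR.fullmatch(candidate):
        return "<span style='color: {}'>{}</span>".format(candidate, escaped_text)
    # Anything not on the allow-list: drop the styling, keep the text.
    return escaped_text


if __name__ == "__main__":
    print(render_color("blue", "Blue text"))
    print(render_color("expression(maliciousCode())", "Text"))
```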


One solution:

<htmlarea>User generated content</htmlarea>


No scripts would ever be allowed to execute inside this tag. Malicious
users could potentially submit "</htmlarea> unsafe content <htmlarea>" and
get around this. As I see it, there are two solutions to this:

User generated content inside the tag must be escaped using html entities
(but still rendered as html by the user agent), or the author must prevent
users from submitting the string "</htmlarea>" and all possible variations
of the tag.

If the first solution is used, then browsers should display a
strong security warning if unescaped content is seen between htmlarea tags
on a website (to educate web developers).
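
On the server side, the first solution could look roughly like the Python
sketch below. The wrap_untrusted helper is hypothetical, and <htmlarea> is
only the element proposed in this message, not something browsers implement.
The point is that an escaped "</htmlarea>" can no longer close the element.

```python
import html


def wrap_untrusted(content: str) -> str:
    """Hypothetical helper: entity-escape user content for the proposed <htmlarea>."""
    # After escaping, the only literal </htmlarea> in the output is ours.
    return "<htmlarea>" + html.escape(content, quote=True) + "</htmlarea>"


if __name__ == "__main__":
    print(wrap_untrusted("</htmlarea> unsafe content <htmlarea>"))
```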


A side note: the tag name I chose is based on the <textarea> tag, whose
content should also be entity-escaped to prevent users from inserting the
text </textarea>. Missing escaping currently breaks a lot of web pages - so
perhaps a strong security warning is in order if unescaped content is found
after the textarea start tag as well?


-- 
Best regards / Med vennlig hilsen
Frode Børli
Seria.no

Mobile:
+47 406 16 637
Company:
+47 216 90 000
Fax:
+47 216 91 000


Think about the environment. Do not print this e-mail unless you really need
to.

Tenk miljø. Ikke skriv ut denne e-posten dersom det ikke er nødvendig.


