[whatwg] Content Restrictions
Alexey Feldgendler
alexey at feldgendler.ru
Thu Mar 9 08:57:31 PST 2006
On Mon, 06 Mar 2006 16:48:08 +0600, Gervase Markham <gerv at mozilla.org>
wrote:
>> I never said that the website won't have to do HTML cleaning for
>> user-supplied content. But with HTML 5 reference parsing algorithm, such
>> cleaning is going to be much easier and straightforward: parse the text
>> into DOM (as if it was inside BODY, for example), remove or modify
>> forbidden elements, then serialize it. That way, </SANDBOX> will be
>> ignored as an easy parse error because it doesn't match an opening tag
>> within the user-supplied text. An unclosed comment will be ignored, too.
> Er, what defines "the user-supplied content"? Surely it's the <SANDBOX>
> tags? So how can you say "A </SANDBOX> inside the user-supplied content
> will be ignored", as you don't know whether a </SANDBOX> you encounter
> is the end of the sandbox or not?
>
> Or are you suggesting that only one sandbox per page is allowed, and the
> user agent should use the outermost </SANDBOX> tag?
It's my fault, I didn't make it clear enough. Here is the scenario I had
in mind.
Let's imagine a blogging website that allows anybody to create a blog
which is available as http://www.example.com/blogs/username/. Many such
sites allow various kinds of user customization, so imagine this site lets
the blog owner supply custom HTML to display at the top of the blog page.
This is primarily used by blog authors to design stylish navigation. To
make such navigation menus more attractive, the authors wish to use
JavaScript and Flash, but unrestricted JavaScript would make it possible
for the blog owner to steal visitors' session cookies.
The blog author logs in and opens some kind of customization screen:
HTML to display on top of your blog: [TEXTAREA]
[SUBMIT]
So, imagine the blog author enters into the textarea:
Welcome to my blog!</sandbox><a href="#"
onclick="alert(document.cookie)">Click here</a>
After submission, this code is fed to the HTML cleaner. At present, HTML
cleaners are usually complicated scripts which try to catch known quirks
of the user agents, and even so, security holes keep being found in them
one after another. See for example
http://cvs.livejournal.org/browse.cgi/livejournal/cgi-bin/cleanhtml.pl.
With the HTML 5 parsing spec, there will be one single algorithm for
parsing HTML code, with well-defined error recovery. So, the HTML cleaner
on the server side runs the HTML 5 parser on the user-supplied text, which
produces the following DOM:
* Welcome to my blog!
* A
    href="#"
    onclick="alert(document.cookie)"
    * Click here
The </sandbox> tag is ignored as an easy parse error because there is no
matching <sandbox> tag in the user-supplied text. After parsing, the HTML
cleaner iterates through the tree, renaming potentially unsafe elements
and attributes, producing the following:
* Welcome to my blog!
* A
    href="#"
    safe-onclick="alert(document.cookie)"
    * Click here
At the final stage, the HTML cleaner re-serializes the DOM into the
following code, which is saved into the database:
Welcome to my blog!<a href="#" safe-onclick="alert(document.cookie)">Click
here</a>
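
To make the three steps (parse, rename, serialize) more concrete, here is
a rough sketch in Python. It is only an approximation under stated
assumptions: it uses the standard library's html.parser, which is NOT an
implementation of the HTML 5 parsing algorithm, so its error recovery
merely imitates the behaviour described above. The names Cleaner, clean
and VOID are made up for this illustration, and the sketch only renames
on* attributes; it does not remove or rename forbidden elements such as
SCRIPT, which a real cleaner would also have to do.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not tracked as open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Cleaner(HTMLParser):
    """Hypothetical cleaner: drops stray end tags, renames on* attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the re-serialized output
        self.open = []   # stack of elements opened within the user's text

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):
                name = "safe-" + name          # onclick -> safe-onclick
            if value is None:
                parts.append(name)             # attribute without a value
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")
        if tag not in VOID:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # no matching start tag in the user's text: ignore it
        # Close everything up to and including the matching element.
        while self.open:
            closed = self.open.pop()
            self.out.append("</%s>" % closed)
            if closed == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean(fragment):
    """Parse the fragment, rename unsafe attributes, and serialize it again."""
    cleaner = Cleaner()
    cleaner.feed(fragment)
    cleaner.close()
    while cleaner.open:  # close whatever the user left open
        cleaner.out.append("</%s>" % cleaner.open.pop())
    return "".join(cleaner.out)


print(clean('Welcome to my blog!</sandbox><a href="#" '
            'onclick="alert(document.cookie)">Click here</a>'))
```

The point of the design is that the cleaner never tries to pattern-match
dangerous strings in the raw text. It only works on what the parser
reports, and everything that reaches the output is written by the
serializer, so a </sandbox> without a matching start tag inside the
user-supplied text simply has no way to get into the stored code.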
When the site renders the blog page, it puts the "HTML for page top"
inside a sandbox:
<body>
  <sandbox>
    Welcome to my blog!<a href="#" safe-onclick="alert(document.cookie)">Click
    here</a>
  </sandbox>
  ...
</body>
Each blog entry is probably also contained in its own sandbox. This is
even more important on the so-called friends pages, where entries by
different authors are displayed on the same page.
When the page is rendered in a modern user agent which supports
sandboxing, the safe-onclick attribute is interpreted exactly the same as
onclick. When the user clicks the link, the event handler is executed.
Because the code is inside the sandbox, it operates on a fake document
object, so it doesn't retrieve the cookies (I think document.cookie should
just return an empty string). The visitor's session cookies are safe.
When the page is rendered in an older user agent which doesn't support
sandboxing, the safe-onclick attribute is ignored because it is unknown.
When the user clicks the link, no event handler is executed, and the
cookies are safe again.
--
Opera M2 8.5 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station [ICQ: 115226275] <alexey at feldgendler.ru>