[whatwg] Sandboxing ideas
Alexey Feldgendler
alexey at feldgendler.ru
Tue May 8 17:05:19 PDT 2007
On Tue, 08 May 2007 05:50:38 +0200, Ian Hickson <ian at hixie.ch> wrote:
>> 1. The entire thing has to degrade SAFELY in existing browsers. With
>> your approach, any existing browser will just ignore the unknown
>> "sandbox" attribute, effectively allowing the script to do anything.
>> This is not acceptable.
> This probably depends on the use cases in question. For some use cases,
> the status quo is in fact the script running with full privileges, so
> while not being ideal, it is indeed acceptable; in other cases, you
> wouldn't want scripts to run at all if they weren't limited in some way.
A security feature, by definition, protects users from a certain class of
attacks. An attack needs to succeed in only one browser to do harm. For
example, a malicious advertising script that steals passwords entered by
users on the host page is dangerous enough even if the attacker manages to
steal the passwords of only a fraction of the users.
I can't really imagine a scenario in which sandbox restrictions could be
considered optional. Wherever there is a need for such restrictions, it's
unacceptable to run the script in a browser that doesn't implement them.
> This is unfortunately far too complicated. It basically duplicates most
> of the <iframe> security and DOM model, which itself has been a big
> source of bugs over the years.
Yes, that's the idea (about the duplication, not about the bugs).
> Actually the origin-checking in browsers is simpler than that. It only
> happens at certain very specific places, namely the Window interface
> entry points. If we want to add a security model here, it has to be at
> the Window level, which basically means a new browsing context.
I should probably have named the element <browsingcontext>.
The key differences from <iframe> are:
1. Doesn't require loading a separate document via a separate HTTP
request, and avoids the ugliness of data: URIs. If there were some
"inline" version of <iframe>, such as <iframe>content</iframe>, that would
be just fine.
2. Implements the security barrier even though the inner content doesn't
come from a different domain. <iframe> would require a separate domain for
that.
3. The security barrier is asymmetric, i.e. the outer scripts have access
to the inner content, but not the other way round.
>> Of course, there is a lot more to think and talk about. I suppose there
>> are going to be problems with particular buggy implementations of
>> sandboxing and exploits specifically targeted at holes in such
>> implementations. I suspect that web application authors and site
>> administrators will be hesitant to allow user scripting even in
>> sandboxes because of the possible browser bugs.
> Because of this, we really want to make sure we leverage as much of the
> existing infrastructure as possible. I'm worried that the DOMSandbox
> idea, with its "fake" documents, etc, introduces too much complexity.
You're drawing parallels between sandboxing and <iframe>. If the
shortcomings of <iframe> listed above can be alleviated, <iframe> would be
just fine.
>> I propose to define the notion of "side effect free script". All
>> browsers which allow scripts in declarations like CSS should only allow
>> side effect free scripts in such places.
>> 2. It can call any non-native function, but the same restrictions apply.
> So it can get hold of data that the rest of the page has created, or is
> storing in its temporary variables (e.g. it can get hold of your calendar
> data if you're looking at an online calendar application).
No, it's impossible to store any data permanently in a thread which is in
SEF mode. Only locals can be assigned, and they aren't going to last
longer than the thread anyway.
> With the above you could still do something like:
>
> <a style="display: expression(...)"
> href="http://evil.example.com?a">a</a>
> <a style="display: expression(...)"
> href="http://evil.example.com?b">b</a>
>
> ...where the first "..." script returns 'none' to convey one piece of
> information and 'block' to convey another, and the second is the reverse;
> the user who clicks on the link then exposes the bit of information the
> script was trying to steal. I'm sure there are more powerful attacks as
> well, e.g. using href=javascript: to return an HTML page with script.
Even easier: background: url(expression(...)).
I see your point.
> In short, the complexity is high, as is the risk that it isn't
> comprehensive. Also, it seems to me that most scripts want to do
> something more fancy. For example, a calendar widget will want to talk
> to its server, render new DOMs, interact with the user, etc. What's the
> use case for these scripts? Are they common enough to warrant their own
> security model?
It's not for most scripts. It's basically only for expression() in CSS,
which is generally a good thing, if only it can be made impossible to use
it for bad purposes. And this whole SEF idea is not really relevant to
sandboxing.
>> Frames are a terrible solution. The content is after all a part of the
>> page it's hosted in, but we want to sandbox it to make sure it can't do
>> any harm.
>>
>> Let's say we'd like to sandbox anonymous user-contributed comments on a
>> blog, but not comments from logged in users. That would require all
>> anonymous comments to be placed within an iframe. For 100 anonymous
>> comments, that's 100 iframes on a single web page. Don't tell me that's
>> an elegant solution.
> Why not? Or rather, why are 100 <sandbox> frames (or whatever) better?
1. Because it doesn't require 100 HTTP requests to load the page.
2. Because it doesn't require a separate domain to serve the iframe
content from.
These two are major, and there are also several minor issues (some sizing
problems with iframes, as pointed out by Charles; stylesheet propagation
into sandboxes; strict symmetry of restrictions on iframes).
> We can't do something like this:
>
> <body>
> <p>Hello, you said:
> <sandbox>Hello World</sandbox>
> </p>
> </body>
>
> ...because nothing stops the user from inserting "</sandbox>" into the
> string -- e.g. if the user tried to insert
> "</sandbox><script>alert(window.cookie)</script>" the result would be:
All attempts to treat user-submitted HTML as a string are doomed to have
such vulnerabilities. <sandbox> alone doesn't add much to this problem.
Just look at how complex the HTML sanitizer in LiveJournal is, and it only
has to allow some user-submitted markup, not all.
The only ultimate solution here is to parse the user-submitted HTML with
an HTML5 parser and reserialize it. The string
"</sandbox><script>alert(window.cookie)</script>" would parse into one
<script> element with a text node inside (stray </sandbox> at the start
gets ignored), and reserialize as "<script>alert(window.cookie)</script>".
That's the only reasonable way (apart from completely escaping all <>"&
characters) to include ANY user-submitted string in generated HTML, with
or without <sandbox>.
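
As a rough illustration of the two options just mentioned (parse and
reserialize, or escape everything), here is a minimal Python sketch. It is
built on assumptions: Python's standard html.parser is a lenient tokenizer,
NOT a conforming HTML5 parser, so it only approximates the behaviour
described above, and the names Reserializer, reserialize and escape_all are
made up for this example. A real deployment would presumably use a proper
HTML5 parser library instead.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: only these two raw-text elements matter for the sketch.
RAW_TEXT = {"script", "style"}
# Partial list of void elements (they never get an end tag).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class Reserializer(HTMLParser):
    """Rebuild markup from parser events, dropping unmatched end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the reserialized markup
        self.open_tags = []  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag such as </sandbox>: ignore it
        # Close everything opened after the matching start tag, then the tag.
        while self.open_tags:
            closed = self.open_tags.pop()
            self.out.append(f"</{closed}>")
            if closed == tag:
                break

    def handle_data(self, data):
        in_raw = bool(self.open_tags) and self.open_tags[-1] in RAW_TEXT
        # Script/style text is emitted verbatim; other text is re-escaped.
        self.out.append(data if in_raw else escape(data, quote=False))

    def result(self):
        self.close()  # flush any text the parser is still buffering
        # Close whatever the input left open.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def reserialize(fragment):
    """Parse a user-submitted fragment and write it back out well-formed."""
    parser = Reserializer()
    parser.feed(fragment)
    return parser.result()


def escape_all(text):
    """The blunt alternative: escape &, <, > and quotes, allowing no markup."""
    return escape(text, quote=True)


if __name__ == "__main__":
    attack = "</sandbox><script>alert(window.cookie)</script>"
    print(reserialize(attack))
    print(escape_all(attack))
```

Note that reserializing does not make the script harmless; it only
guarantees that the fragment is well-formed and cannot close the element it
is placed in, so whatever restrictions that element imposes still apply.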
> The sanest way I can see of limiting scripting is to give it its own
> browsing context (aka scripting context, or global scope). Anything short
> of this would make the security model overly complicated -- the security
> model is what we want to keep at its simplest, as I've said several times
> in this e-mail.
<sandbox> would indeed be one, just with the content supplied inline.
> This basically implies an <iframe>, again possibly with the data in a
> data: URI, and combined with a way to isolate the content in the <iframe>
> from the content of the parent browsing context:
>
> <iframe
> src="data:text/html;base64,PHA%2BVGhpcyBpcyBteSBzYW1wbGUgbWFya3VwITwvcD4%3D"
> isolate-scripts
> ></iframe>
data: URIs may be appropriate for a small list-bullet PNG, but not for
a blog entry or comment. They are ugly and impossible to read and write
without machine conversion. Any element that lets you write the HTML
content inside, be it <iframe> or <sandbox> or something else, would be OK.
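
To show what "machine conversion" means here, a small Python sketch of the
round trip between readable markup and a base64 data: URI like the one in
the quoted example. The function names to_data_uri and from_data_uri are
invented for this illustration, and the percent-encoding of "+" and "=" is
an assumption chosen to match the URI quoted above.

```python
import base64
from urllib.parse import quote, unquote


def to_data_uri(markup: str) -> str:
    """Encode an HTML fragment as a base64 text/html data: URI."""
    b64 = base64.b64encode(markup.encode("utf-8")).decode("ascii")
    # Percent-encode "+" and "=" so the payload is safe inside an attribute.
    return "data:text/html;base64," + quote(b64, safe="")


def from_data_uri(uri: str) -> str:
    """Recover the original markup from a URI built by to_data_uri."""
    _header, _, payload = uri.partition(",")
    return base64.b64decode(unquote(payload)).decode("utf-8")


if __name__ == "__main__":
    uri = to_data_uri("<p>This is my sample markup!</p>")
    print(uri)
    print(from_data_uri(uri))
```

Neither direction is something an author can do by hand, which is the
readability problem with putting a whole blog entry or comment into the src
attribute.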
> The names above are a bit long; here's a summary of what the four modes
> could be:
>
> seamless - if present, styles cascade through the browsing context
> boundary; ignored if the origin doesn't match the parent's.
>
> noscript - disables all scripts in the embedded page
>
> isolate - make the origin of the file not match the parent's,
> regardless of the real origins
>
> restrict - disable certain APIs in the browsing context
These make a nice list of toggle attributes for the <sandbox> element.
--
Alexey Feldgendler <alexey at feldgendler.ru>
[ICQ: 115226275] http://feldgendler.livejournal.com