[whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

Mon Nov 30 18:32:04 PST 2009

On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak <mjs at apple.com> wrote:
> 1) It seems like this API is harder to use than a sandboxed iframe. To use
> it correctly, you need to determine a whitelist of safe elements and
> attributes; providing an explicit whitelist at least of tags is mandatory.
> With a sandboxed iframe, as a Web developer you can just ask the browser to
> turn off unsafe things and not worry about designing a security policy.
> Besides ease of use, there is also the concern that a server-side filtering
> whitelist may be buggy, and if you apply the same whitelist on the client
> side as backup instead of doing something high level like "disable
> scripting" then you are less likely to benefit from defense in depth, since
> you may just replicate the bug.

I should follow up with folks in the Ruby on Rails community to see
how they view their sanitize API.  The one person I asked had a
positive opinion, but we should get a bigger sample size.

I think updateWithSanitizedHTML has different use cases than @sandbox.
I think the killer applications for @sandbox are advertisements and
gadgets.  In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins).  For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).

> 2) It seems like this API loses one of the big benefits of sanitizing HTML
> in the browser implementation. Specifically, in theory it's safe to say
> "allow everything except any construct that would result in script/code
> running". You can't do that on the server side - blacklisting is not sound
> because you can't predict the capabilities of all browsers. But the browser
> can predict its own capabilities. Sandboxed iframes do allow for this.

The benefit is that you know you're getting the right parsing.  You're
not going to be tripped up by <img/src=javascript: and friends.  Also,
this API is useful in cases where you don't have a server to help you
sanitize your input.  One example I saw recently was a Greasemonkey
script that wanted to add EXIF metadata to Flickr.  Basically, the
script grabbed the EXIF data from api.flickr.com and added it to the
current page.  Unfortunately, that meant I could use this Greasemonkey
script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
are other ways of solving the problem (I asked the developer to build
the DOM in memory and use innerText), but you want something simple
for these cases.
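
As a rough illustration of treating the EXIF data as text rather than
markup, here is a minimal sketch.  It swaps in plain string escaping
for the build-the-DOM-and-use-innerText approach mentioned above, only
because that runs without a browser.  The names escapeHtml,
exifComment, and safeMarkup are hypothetical and are not taken from
the script in question.

```typescript
// Hypothetical helper: escapes the characters that are significant in
// HTML text and attribute values, so the input is rendered as text.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Assumed example of attacker-controlled EXIF metadata.
const exifComment: string = '<img src=x onerror="alert(1)">';

// The metadata is embedded as inert text instead of live markup.
const safeMarkup: string = "<td>" + escapeHtml(exifComment) + "</td>";
// safeMarkup is '<td>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</td>'
```

This only covers the "plain text" case.  It does not help when some
formatting tags should survive, which is the case a sanitizing API is
meant for.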

> I think the benefits of filtering by tag/attribute/scheme for advanced
> experts are outweighed by these two disadvantages for basic use, compared to
> something simple like the original staticInnerHTML idea. Another possible
> alternative is to express how to sanitize at a higher level, using something
> similar to sandboxed iframe feature strings.

If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see the API already takes feature strings.  The
feature string Slashdot wants for its comments is ("a b strong i em",
"href"), but another message board might want something different.
For example, 4chan might want ("img", "src alt").  I don't think these
require particularly advanced experts to understand.
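
To make the tag/attribute feature strings concrete, here is a minimal
sketch of whitelist filtering.  It is not the proposed browser API: it
works on a toy, already-parsed node tree (ToyNode is an assumption of
this sketch), and sanitizeTree is a hypothetical name.  Dropping a
disallowed element together with its subtree is also just one possible
policy.

```typescript
// Toy node model, assumed for this sketch; a browser would use its real DOM.
type ToyNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: ToyNode[] };

// Hypothetical function: keeps only elements whose tag is in `tags` and
// only attributes whose name is in `attrs` (both space-separated lists).
// Disallowed elements are dropped together with their children.
function sanitizeTree(nodes: ToyNode[], tags: string, attrs: string): ToyNode[] {
  const allowedTags = tags.split(" ").filter((t) => t.length > 0);
  const allowedAttrs = attrs.split(" ").filter((a) => a.length > 0);
  const out: ToyNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      out.push(node); // text nodes are always kept
      continue;
    }
    if (allowedTags.indexOf(node.tag.toLowerCase()) === -1) {
      continue; // tag not whitelisted: drop the whole subtree
    }
    const kept: Record<string, string> = {};
    for (const key of Object.keys(node.attrs)) {
      const value = node.attrs[key];
      if (value !== undefined && allowedAttrs.indexOf(key.toLowerCase()) !== -1) {
        kept[key] = value;
      }
    }
    out.push({
      kind: "element",
      tag: node.tag,
      attrs: kept,
      children: sanitizeTree(node.children, tags, attrs),
    });
  }
  return out;
}

// Assumed example comment: a link with an event handler, plus a script element.
const comment: ToyNode[] = [
  {
    kind: "element",
    tag: "a",
    attrs: { href: "http://example.com/", onclick: "evil()" },
    children: [{ kind: "text", text: "link" }],
  },
  { kind: "element", tag: "script", attrs: {}, children: [{ kind: "text", text: "evil()" }] },
];

// The Slashdot-style feature strings from the text above.
const slashdotStyle: ToyNode[] = sanitizeTree(comment, "a b strong i em", "href");
```

The sketch does not inspect attribute values, so an href with a
javascript: URL would pass through.  Scheme filtering would have to be
layered on top of it.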

> Here's a problem that exists with both this API and also innerStaticHTML:
>
> 3) There is no secure and efficient way to append sanitized contents to an
> element that already has children. This may result in authors appending with
> innerHTML += (inefficient and insecure!) or insertAdjacentHTML() (efficient
> but still insecure!). I'm willing to concede that use cases other than
> "replace existing contents" and "append to existing contents" are fairly
> exotic.

Maybe we need insertAdjacentSanitizedHTML instead or in addition.  ;)

Adam