[whatwg] Content type sniffing
Boris Zbarsky
bzbarsky at MIT.EDU
Mon Jan 12 07:54:15 PST 2009
Adam Barth wrote:
> Extensions are bad news for content sniffing because they can often be
> chosen by the attacker. For example, suppose user-uploaded content
> can be downloaded at:
>
> http://example.com/download.php
>
> In most PHP configurations, the attacker can choose whatever file
> extension he likes by directing the user's browser to:
>
> http://example.com/download.php/whatever.foo
>
> And the PHP script will happily run.
Right, I understand that.
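
To make the quoted point concrete, here is a minimal sketch of why an extension taken from the URL is attacker-controlled. It assumes a sniffer that derives the "extension" from the last path segment; the helper name extension_from_url is hypothetical and is not taken from any browser's code.

```python
import posixpath
from urllib.parse import urlsplit


def extension_from_url(url: str) -> str:
    """Return the lowercased extension of the URL's path, e.g. '.php'.

    Hypothetical helper: it only looks at the text of the path, which is
    exactly what an attacker gets to choose when linking to the script.
    """
    path = urlsplit(url).path
    # splitext only considers the final path segment.
    return posixpath.splitext(path)[1].lower()


# Both URLs from the quoted example are served by the same script,
# yet the apparent extension differs.
print(extension_from_url("http://example.com/download.php"))               # .php
print(extension_from_url("http://example.com/download.php/whatever.foo"))  # .foo
```

Since the server runs the same script for both URLs, the extension says nothing reliable about the bytes that come back.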
> Yes. We do have lots of data from opt-in user metrics from Chrome.
> Here is a somewhat recent summary:
>
> https://crypto.stanford.edu/~abarth/research/html5/content-sniffing/
I'm not quite sure what to make of this, actually. Specifically, where
is the "22.19%" number for "HTML Tags" coming from? 22.19% of what?
The magic numbers stuff actually adds up to 100%, but of what?
> To address your particular concern, <body occurs 6899 times less often
> than <script on Web content that lacks a Content-Type (or has a bogus
> Content-Type like */*), assuming I did my arithmetic correctly.
OK, that's good to know.
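
For readers following along, tag-based sniffing of the kind being discussed can be sketched roughly as below. This is an illustration only: the tag list, the 512-byte window, and the name looks_like_html are assumptions, and neither Chrome's nor Gecko's actual list or matching rules are reproduced here.

```python
# Hypothetical tag list for illustration; real browsers use their own lists.
SNIFFED_TAGS = (b"<!doctype html", b"<html", b"<head", b"<script", b"<body")


def looks_like_html(data: bytes, tags=SNIFFED_TAGS) -> bool:
    """Guess whether a response body with no usable Content-Type is HTML.

    Skips leading whitespace, then checks whether the body starts with one
    of the known tags (case-insensitively) followed by a space or '>'.
    """
    # Assumed sniffing window of 512 bytes after leading whitespace.
    head = data.lstrip(b" \t\r\n\x0c")[:512].lower()
    for tag in tags:
        if head.startswith(tag):
            # Require a tag terminator so "<bodyguard" does not match "<body".
            following = head[len(tag):len(tag) + 1]
            if following in (b" ", b">"):
                return True
    return False


print(looks_like_html(b"  <BODY>hello"))                  # True
print(looks_like_html(b"<body>hi", tags=(b"<script",)))   # False
```

The second call shows the trade-off under discussion: a page that starts with <body is only recognized as HTML if <body is on the list.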
> I'm sympathetic to adding more HTML tags to the list, but I'm not sure
> how far down the tail we should go. In Chrome, we went for 99.999%
> compatibility, which might be a bit far down the tail.
Doesn't seem that way to me, given the number of web pages out there.
> http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?view=markup
Ah, ok. The relevant Gecko code is
<http://hg.mozilla.org/mozilla-central/annotate/9f82199fdb9c/netwerk/streamconv/converters/nsUnknownDecoder.cpp#l477>.
I'd probably be fine with trimming that list down a bit, but I'm not
quite sure what the downsides of having more tags in it are here.
-Boris