[whatwg] Video with MIME type application/octet-stream

Wed Sep 1 13:46:14 PDT 2010

On Tue, Aug 31, 2010 at 4:13 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> The issue would be someone linking to text or HTML or a binary blob that
> happens to have some bits at the beginning that look like an audio/video
> types and expecting them to be rendered respectivel as text or HTML or be
> downloaded.

Is this realistically possible unless the author deliberately crafts
the file?  We're talking quite a few bytes that have to be exactly
right, no?  If the author does deliberately craft the file, is there
any security risk in displaying it unexpectedly, given that media
isn't scriptable?

> The big danger with sniffing, as always, is that the server will think one
> thing will happen and suddenly the browser will do something totally
> different.

As long as what the browser is doing is almost certain to be closer to
the author's/user's/webmaster's intent, that's not a problem.
Sniffing is a problem if you risk false positives or security issues,
but I can't see how that's an issue in this specific case.  We have a
lot of experience with the perils of sniffing -- have any issues ever
been caused by this kind of sniffing problem?  The only sniffing
problems I know of are when

1) The sniffing is unreliable, so false identifications happen by
accident.  They're common with MIME types too, but at least with MIME
they're more predictable.  This will hold for pretty much any text
format, if only because you might want to serve the file as text/plain
to mean "let the user view the source code instead of executing it".
But with binary formats it doesn't have to be plausible, if the string
you're sniffing for is reasonably long.

2) The MIME type is safe (e.g., not scriptable), and the type it's
sniffed as is not safe (e.g., it's HTML or JAR).  Then even if false
identifications are overwhelmingly improbable by accident, they'll
happen when people upload malicious files posing as an image or
whatever to get code to execute from a domain they don't control.

Are there clear problems that have arisen in other cases?

On Tue, Aug 31, 2010 at 8:59 PM, Andrew Scherkus <scherkus at chromium.org> wrote:
> We use the incoming MIME type to determine whether we render the audio/video
> in the browser versus download.  We would never want to execute multimedia
> sniffing code in the trusted/browser process so implementing sniffing for a
> top level browser window would involve sending the bytes to a sandboxed
> process for inspection first.

Why can't you do media sniffing in the trusted process?  It must be a
lot simpler than parsing HTTP headers -- just a memcmp() or two per
format, if the format is designed so it can be sniffed well.

On Wed, Sep 1, 2010 at 12:27 AM, Gregory Maxwell <gmaxwell at gmail.com> wrote:
> Aggressive sniffing can and has resulted in some pretty nasty security bugs.
>
> E.g. an attacker crafts an input that a website identifies as video
> and permits the upload but which a browser sniffs out to be a java jar
> which can then access the source URL with the permissions of the user.

This is problem (2) above.  The solution is never to sniff for
scriptable content.  The problem can't plausibly arise with media
files -- if you can execute a vulnerability via getting the user to
view a media file, it's probably via arbitrary code execution.  In
that case you don't need to disguise yourself, just get the viewer to
go to your own website and do whatever you want, since there are no
same-domain restrictions.

> The sniffing rules, in some contexts and some browsers can also end up
> causing surprising failures... e.g. I've seen older versions of some
> sniffing heavy browsers automatically switch into UCS-2LE encoding at
> wrong and surprising times. Perhaps this is irrelevant in a video
> specific discussion of sniffing— but it is a hazard with sniffing in
> general.

Is this plausible in practice for common media formats?  I didn't find
info on sniffing media by quick Googling, but for instance, GIF starts
with "GIF87a" or "GIF89a", and PNG has an eight-byte signature.
Random binary data is going to hit these one time in 2^48 or 2^64,
about 10^14 and 10^19 respectively.  The actual figure is likely to be
even lower, because most binary formats don't have arbitrary data in
their first few bytes.  Is this really something we should worry
about, given how obviously hard it is to get MIME types right?

> Moreover, it'll never be consistent from implementation to
> implementation, which seems to me to be pretty antithetical to
> standardization in general.

The exact sniffing algorithm needs to be precisely specced.  In fact,
there's work undergoing to do that right now, for other types of
sniffing:

http://tools.ietf.org/html/draft-abarth-mime-sniff-05

There's no reason it can't be perfectly consistent.  The reason it's
historically been inconsistent is because specs have tried to claim
that no sniffing is allowed, so implementers had no spec to follow.
Which is what's in the HTML5 spec now, and it's a mistake.

On Wed, Sep 1, 2010 at 10:37 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> Yes, actually, if there's a filtering proxy trying to screen out video or
> image data that's trying to exploit known OS-level bugs, say.

It seems like such a proxy would be unreliable in any event, since you
could do all sorts of things to obfuscate it, not least of all just
using HTTPS.  Do any such proxies exist?  Given the demonstrable
difficulty to authors of getting MIME types right, I think
non-hypothetical reasons are needed to justify relying on them.

On Wed, Sep 1, 2010 at 12:29 PM, Eric Carlson <eric.carlson at apple.com> wrote:
>   Hard coding the type is only possible if the element uses a <source>
> element, @type isn't allowed on <audio> or <video>.

Probably because it was only introduced to let the UA choose between
multiple <source>s.  It's a nonissue, the spec can be changed if
there's good reason.

On Wed, Sep 1, 2010 at 3:54 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> Will sniff as binary so as not to render as text but will NOT, last I
> checked, render as an image or whatnot (for good security reasons, imho).

What reasons are these?