[whatwg] Video with MIME type application/octet-stream

Aryeh Gregor Simetrical+w3c at gmail.com
Tue Sep 7 12:29:00 PDT 2010

On Tue, Sep 7, 2010 at 5:51 AM, And Clover <and-py at doxdesk.com> wrote:
> Quite. It surprises and saddens me that anyone wants to argue for *more*
> sniffing, and even enshrining it in a web standard.

I'm not a fan of sniffing, but I'm also not a fan of blindly believing
clearly wrong MIME types and thereby forcing authors to do needless
configuration work, which they might not even be able to do.  I'm not
yet sure what the correct tradeoff is here, but I'm pretty sure it's
not "no sniffing at all under any conditions".

> Sniffing is a perpetual disaster that, after several security-sensitive
> problems, web browsers have been moving to deprecate/mitigate. If browsers
> want to guess types when no Content-Type is specified(*) then fine, but
> there is no good reason to ignore an explicitly-set type. I don't want my
> `application/octet-stream` file download service to be repurposeable as a
> video player for some other party!

If you don't want that, you should be using access control, not MIME types.

> For reasons already argued about here, you will never make the results of
> content-sniffing reliable, so why bother to standardise it? A standardised
> unreliable feature is no better than an unstandardised one.

Sure it is, because it's unreliable in the same way across all
browsers.  That means that in any given case, all browsers will work
the same.  This is particularly essential for security -- undocumented
sniffing behavior has caused more than one vulnerability in the past.

> The typing mechanism of the web (and more) is Content-Type, period. There
> should be no confusion of this with officially-endorsed sniffing.

We already have officially endorsed sniffing where web compat requires it:


The question is if we can avoid it for new content types like
video/audio.  If not, we should spec it in advance so we at least have
something that's as sane as possible under the circumstances.

> That it is
> 'hard' for web authors to ensure the correct Content-Types are set is:
> * not W3/WHATWG's problem. If web servers make adding Content-Type
> information hard, then web servers need to be updated to make it easier;

I don't know about the W3C, but reality is the WHATWG's problem.  We
can't let things be broken and just say it's someone else's fault.  We
need to institute workarounds at our level for failures on other
levels if that's what's necessary to get good security and a good
user/author experience.

> * not really true, at least for Apache which can allow AddType et al in the
> .htaccess files that low-end shared hosts use. This may not be widely-known
> or practised, but that doesn't really merit changing the standards for
> everyone else to cope with.

Creating a .htaccess file is a technical procedure that most users
will not know how to do, particularly since the problem will probably
just manifest itself as "the video doesn't work".  It's also not
possible on some hosts -- although it's certainly possible on the
large majority of cheap shared hosts, and of course on hosts where the
author has root access.
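
For reference, the .htaccess fix being discussed is short where the
host permits it.  A sketch, assuming Apache with mod_mime and a host
that allows FileInfo overrides in .htaccess; the file extensions are
illustrative, not something taken from this thread:

```apache
# Hypothetical .htaccess in the directory that holds the media files.
# AddType maps file extensions to the Content-Type Apache will send.
AddType video/ogg  .ogv
AddType audio/ogg  .oga .ogg
AddType video/webm .webm
```

The directives are simple; the hard part is that the author has to
know the file is needed in the first place.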

On Tue, Sep 7, 2010 at 6:52 AM, Philip Jägenstedt <philipj at opera.com> wrote:
> It hasn't been explicitly stated, but I assume that the only cases where
> sniffing for video formats would be employed would be for missing
> Content-Type, text/plain and application/octet-stream.

If those are the only common MIME types incorrectly served for unknown
file types, that seems reasonable.  (Some files might be actively
misidentified, like if I have an Ogg file saved as .jpeg, but
hopefully this will be very rare.)

On Tue, Sep 7, 2010 at 8:56 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> On 9/7/10 4:11 AM, Philip Jägenstedt wrote:
>> It's garbage in at least UTF-8, Big5 and GBK.
> Thanks.  I assume that applies to the OggS\0 sequence too, right?  I
> appreciate the data!
>> I'm not sure what infrastructure is in place, but perhaps one could
>> *not* sniff if Content-Type also indicates an encoding?
> As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
> (thanks, Apache!), that should be reasonable, I think.

So at least for Ogg and WebM, how about:

* Sniff only if Content-Type is typical of what popular web servers send
for unrecognized filetypes.  E.g., only for no Content-Type,
text/plain, or application/octet-stream, and only if the encoding is
either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
do here.
* Sniff the same both for video tags and top-level browsing contexts,
so "open video in new tab" doesn't mysteriously fail on some setups.
* If a file in a top-level browsing context is sniffed as video but
then some kind of error is returned before the video plays the first
frame, fall back to allowing the user to download it, or whatever the
usual action would be if no sniffing had occurred.
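
To make the first bullet concrete, here is a minimal sketch in Python
of the gating and signature check, not a proposed algorithm.  The
function and constant names are made up for illustration.  It assumes
the OggS\0 sequence mentioned above for Ogg and the four-byte EBML
header ID (1A 45 DF A3) for WebM; the EBML ID is shared with Matroska,
so a real check would have to look further into the header.  The
fallback in the third bullet is not modelled.

```python
from typing import Optional

# Assumption: the generic types servers send for unrecognized files.
SNIFFABLE_TYPES = {"text/plain", "application/octet-stream"}
# Charsets a server may attach by default, so they carry no information.
DEFAULT_CHARSETS = {"utf-8", "iso-8859-1"}

OGG_MAGIC = b"OggS\x00"           # Ogg capture pattern plus version byte
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # EBML header ID (WebM and Matroska)


def may_sniff(content_type: Optional[str]) -> bool:
    """Return True if the declared type is generic enough to sniff behind."""
    if content_type is None or not content_type.strip():
        return True  # no Content-Type at all
    parts = [p.strip() for p in content_type.split(";")]
    if parts[0].lower() not in SNIFFABLE_TYPES:
        return False  # an explicit, specific type is believed as-is
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            # A non-default charset suggests the author really meant text.
            return value.strip().strip('"').lower() in DEFAULT_CHARSETS
    return True


def sniff_video(content_type: Optional[str], head: bytes) -> Optional[str]:
    """Return a container name if sniffing is allowed and a signature matches."""
    if not may_sniff(content_type):
        return None
    if head.startswith(OGG_MAGIC):
        return "ogg"
    if head.startswith(EBML_MAGIC):
        return "webm"
    return None
```

With this, sniff_video("text/plain; charset=Big5", b"OggS\x00") returns
None because the charset is not one of the server defaults, while the
same bytes served as application/octet-stream are identified as Ogg.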

Within these constraints, false positives in the sniffing algorithm
become much less likely and should be pretty harmless --
if it fails in <video> it would fail anyway, and if it fails in a
top-level browsing context it will fall back quickly to the usual
behavior.  (Unless your non-video file so closely resembles a
video file that it will actually begin to play -- hopefully we can all
agree that's a negligible concern.)  Does this sound like a reasonable
direction to go in?

On Tue, Sep 7, 2010 at 12:12 PM, Maciej Stachowiak <mjs at apple.com> wrote:
> At least in the case of Safari, we initially added sniffing for the
> benefit of video types likely to be played with the QuickTime plugin -
> mainly .mov and various flavors of MPEG. It is common for these to be
> served with an incorrect MIME type. And we did not want to impose a
> high transition cost on content already being served via the QuickTime
> plugin. The QuickTime plugin may be a slightly less relevant
> consideration now than when we first thought about this, but at this
> point it is possible content has been migrated to <video> while still
> carrying broken MIME types.

What sorts of incorrect MIME types are usually present here, and how
do they come about?  I assume it's not a matter of web servers not
recognizing the file extension.
