[whatwg] Video with MIME type application/octet-stream

Wed Dec 8 17:19:58 PST 2010

Long story short: I haven't changed the spec where it talks about <video>, 
<source type>, Content-Type, and direct file inspection for type 
determination. My plan is to just wait and see what browsers do and update 
the spec accordingly in due course. This is mostly because we clearly have 
a wide range of opinions regarding what the right behaviour is, 
implementations are still changing, and implementors often disagree with 
their own implementations at this stage.

On Tue, 31 Aug 2010, Boris Zbarsky wrote:
> On 8/31/10 3:36 AM, Ian Hickson wrote:
> > > You might say "Hey, but aren't you content sniffing then to find the 
> > > codecs" and you'd be right. But in this case we're respecting the 
> > > MIME type sent by the server - it tells the browser to whatever 
> > > level of detail it wants (including codecs if needed) what type it 
> > > is sending. If the server sends 'text/plain' or 'video/x-matroska' I 
> > > wouldn't expect a browser to sniff it for Ogg content.
> > 
> > The Microsoft guys responded to my suggestion that they might want to 
> > implement something like this with "what's the benefit of doing 
> > that?".
> 
> One obvious benefit is that videos with the wrong type will not work, 
> and hence videos will be sent with the right type.
> 
> If the question is what the benefits of that are, one is that the "view 
> video in new window" context menu option actually works.
> Another benefit is that you can send someone the link to the video, 
> instead of the embedding page, and it will work.
> Another is that when you save the video to disk the browser will fix up 
> the extension correctly, if needed.

I think that they would argue that these should work either way, with the 
same sniffing being used to ensure it works in all of these places.

> > It seems that sniffing is context-sensitive.
> 
> Yes, but one issue is that we really do want resources to be usable 
> outside the context the page happens to want to put them in.
> 
> The ship has sailed on <img>, clearly, and is working on sailing on 
> <video>, but I feel that the behavior IE and Chrome are implementing 
> here is highly detrimental to the web.  Not that they care much.
>
> > Sadly, the boat has sailed for text/html and XML at this point, but 
> > for binary types, and for contexts where text/plain isn't a contender, 
> > why bother doing anything but sniff?
> 
> See above.  As long as some contexts are sniffing and some are not, we 
> have a problem.  If it were all-sniff (with the same algorithm across 
> the board!) or all-not-sniff, we might be ok.

I could go either way, but I think the road to all-sniff is less steep.

On Tue, 31 Aug 2010, Boris Zbarsky wrote:
> On 8/31/10 9:57 AM, Anne van Kesteren wrote:
> > > 
> > > If the question is what the benefits of that are, one is that the 
> > > "view video in new window" context menu option actually works.
> > 
> > If you sniff you can sniff there too.
> 
> Not really, since it's just rendering in a toplevel browser window.  Or 
> rather... one could, but sniffing or not depending on something other 
> than the state of the url bar and the server response in toplevel 
> browser windows is extremely poor UI.

I'm not sure I follow. It works fine for sniffing JPEGs sent with the 
wrong type; why wouldn't it work for videos too?

> > > Another benefit is that you can send someone the link to the video, 
> > > instead of the embedding page, and it will work.
> > 
> > If you sniff you can sniff there too. (Unless that user uses a 
> > competitor's browser, but that would be an incentive to encourage that 
> > user to use the sniffing browser.)
> 
> You can't sniff in a toplevel browser window.  Not the same way that 
> people are sniffing in <video>.  It would break the web.

How so?

On Tue, 31 Aug 2010, Aryeh Gregor wrote:
> 
> If you can't come up with any actual problems with what IE is doing, 
> then why is anything else even being considered?

Because a number of people I respect, such as Boris, who also happen to 
have more influence than I, since they are implementors, would rather we 
not determine types based on leading byte comparisons but on the MIME 
type.

> There's a very clear-cut problem with relying on MIME types: MIME types 
> are often wrong and hard for authors to configure, and this is not going 
> to change anytime soon.

Certainly it won't change if we have any sniffing going on. :-)

> > Sadly, the boat has sailed for text/html and XML at this point, but 
> > for binary types, and for contexts where text/plain isn't a contender, 
> > why bother doing anything but sniff?
> 
> If this is your position, why doesn't the spec match it?

The spec doesn't reflect my position. It would be quite different if it 
did. :-) It reflects what can be implemented interoperably within the 
constraints put forward by implementors.

On Tue, 31 Aug 2010, Boris Zbarsky wrote:
> On 8/31/10 3:59 PM, Aryeh Gregor wrote:
> > On Tue, Aug 31, 2010 at 10:35 AM, Boris Zbarsky <bzbarsky at mit.edu> 
> > wrote:
> > > You can't sniff in a toplevel browser window.  Not the same way that 
> > > people are sniffing in <video>.  It would break the web.
> > 
> > How so?  For the sake of argument, suppose you sniff only for known 
> > binary video/audio types, and fall back to existing behavior if the 
> > type isn't one of those (e.g., not video or audio).  Do people do 
> > things like link to MP3 files with incorrect MIME types and no 
> > Content-Disposition, and expect them to download?
> 
> The issue would be someone linking to text or HTML or a binary blob that 
> happens to have some bits at the beginning that look like an audio/video 
> type and expecting them to be rendered respectively as text or HTML or 
> be downloaded.

text/html wouldn't be sniffed as video, any more than it would be sniffed 
as an image if served at the top level. I agree that this can lead to 
differences between <video> and top-level, even with sniffing in both places.

> > I don't see how sniffing vs. using MIME type makes a compatibility 
> > difference here, since media support in browsers is so new -- surely 
> > whatever bad thing happens, sniffing will make it happen more often, 
> > at worst.
> 
> The big danger with sniffing, as always, is that the server will think 
> one thing will happen and suddenly the browser will do something totally 
> different.

Indeed, which is why we must specify any sniffing we do.

On Fri, 3 Sep 2010, Aryeh Gregor wrote:
> 
> But the spec never allowed sniffing, and two browsers do it anyway.  
> Ian has spoken to those browsers' implementers, and the browsers have 
> not changed, despite knowing that they aren't following the spec.  Do 
> you have any particular reason to believe that they'll change?

This is indeed a problem (but it's three browsers, not two). Eventually 
the spec will change to whatever is implemented. For now I'm waiting to 
see if compliance with the specs can still be achieved. (Note that the 
spec as it stands takes a compromise position: the content is only 
accepted if the Content-Type and type="" values are supported types (if 
present) and the content sniffs as a supported type, but nothing in the 
spec checks that all three values are the same.)
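
The compromise position above could be sketched roughly as follows. This is a hypothetical illustration, not spec text: SUPPORTED, label_ok and accept are invented names, the set of supported types is an assumption, and "sniffed" stands in for whatever type the UA's byte-level detection reported. Parameters such as codecs="" are ignored for simplicity.

```python
from typing import Optional

# Hypothetical set of types this user agent can play (an assumption).
SUPPORTED = {"video/ogg", "video/webm"}


def label_ok(label: Optional[str]) -> bool:
    """True if a type label (Content-Type or type="") does not rule the resource out."""
    if label is None:
        return True  # no label at all
    if label == "application/octet-stream":
        return True  # the bare default string is treated like no label
    # Compare only type/subtype; parameters are ignored in this sketch.
    return label.split(";", 1)[0].strip().lower() in SUPPORTED


def accept(content_type: Optional[str], type_attr: Optional[str],
           sniffed: Optional[str]) -> bool:
    """Every label present must be supported and the bytes must sniff as supported.

    Nothing here checks that the three values agree with each other.
    """
    return label_ok(content_type) and label_ok(type_attr) and sniffed in SUPPORTED
```

With this sketch, accept("video/webm", "video/ogg", "video/ogg") returns True: the header, the type="" attribute and the bytes disagree, yet each check passes on its own, which is the gap described above.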

On Sat, 4 Sep 2010, Roger Hågensen wrote:
> 
> I may be going slightly off topic with this, but in relation to sniffing 
> and the issue around that, there actually is a long term solution that 
> could be used. Any program would only need to sniff the first 265 bytes 
> of any file to know what format it is. I created a rough draft of an 
> idea I had that I called BINID: [...]

I would recommend approaching this kind of thing using the ideas described 
here:

http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F

i.e. approach implementors, get experimental code out there, see how much 
adoption you can get.

On Sun, 5 Sep 2010, Aryeh Gregor wrote:
> 
> Either context-independent, or specified to occur only in certain key 
> contexts like <video>/top-level browsing context.  No browser implements 
> my suggested behavior today, but I think we all agree it's 
> confusing/harmful to only sniff for <video> and not top-level browsing 
> contexts too, because it breaks all sorts of expected behavior (open in 
> new tab, copy video URL, etc.).

Indeed. That's partly the thinking behind the requirement in the spec today 
that the type be recognised as well as the binary data; it would mean we 
can do the same for videos as for images (where the types are all treated 
as synonyms for each other and enable sniffing).

On Tue, 7 Sep 2010, Boris Zbarsky wrote:

> On 9/7/10 9:16 AM, Philip Jägenstedt wrote:
> > UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do
> > real-world text documents include \0 bytes?
> 
> Yes.  Real-world text documents include all sorts of gunk.  Just rarely.
> 
> > > As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
> > > (thanks, Apache!), that should be reasonable, I think.
> > 
> > Are you saying that Apache has, at various times, set the default
> > character encoding to UTF-8 or ISO-8859-1?
> 
> Yes, precisely.  Though the UTF-8 stuff was Linux distros, I think, not Apache
> itself (in that Apache just sent the thing passed to AddDefaultCharset and
> they changed the value of that from ISO-8859-1 to UTF-8 in their distro
> packages).  Here's the relevant comment from the Gecko source where we do our
> text-or-binary sniffing for toplevel contexts:
> 
>  Make sure to do a case-sensitive exact match comparison here.  Apache
>  1.x just sends text/plain for "unknown", while Apache 2.x sends
>  text/plain with a ISO-8859-1 charset.  Debian's Apache version, just to
>  be different, sends text/plain with iso-8859-1 charset.  For extra fun,
>  FC7, RHEL4, and Ubuntu Feisty send charset=UTF-8.  Don't do general
>  case-insensitive comparison, since we really want to apply this crap as
>  rarely as we can.
> 
> > I was hoping that no encoding parameter at all would be sent :/
> 
> Heh.  I've long since given up all hope of reason on this stuff; I just try to
> keep it as sane and predictable and simple as possible.  :(
> 
> -Boris
> 
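
The exact-match rule in the Gecko comment, combined with a simple text-or-binary check on the first bytes of a response, might look something like the sketch below. The four header strings come from the comment above; the set of "binary" control bytes and the byte-order-mark shortcut are assumptions modelled on the WHATWG MIME Sniffing approach, and the function names are hypothetical.

```python
# Exact, case-sensitive Content-Type values from the Gecko comment above.
APACHE_DEFAULTS = {
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=UTF-8",
}

# Control bytes that rarely appear in real text (assumed, modelled on the
# WHATWG MIME Sniffing "binary data byte" definition).
BINARY_BYTES = (set(range(0x00, 0x09)) | {0x0B}
                | set(range(0x0E, 0x1B)) | set(range(0x1C, 0x20)))

# Byte-order marks for UTF-16BE, UTF-16LE and UTF-8.
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")


def should_check_for_binary(content_type: str) -> bool:
    """Deliberately no case folding or whitespace normalisation."""
    return content_type in APACHE_DEFAULTS


def looks_binary(head: bytes) -> bool:
    """Guess whether the first bytes of a response are binary rather than text."""
    if head.startswith(BOMS):
        return False  # a byte-order mark means text
    return any(byte in BINARY_BYTES for byte in head)
```

Because the comparison is case-sensitive, "text/plain; charset=utf-8" does not trigger the binary check at all, which keeps the heuristic applied as rarely as possible. As noted above, real text occasionally contains NUL bytes, so looks_binary can misfire on such documents.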

On Tue, 7 Sep 2010, And Clover wrote:
> 
> Sniffing is a perpetual disaster that, after several security-sensitive 
> problems, web browsers have been moving to deprecate/mitigate.

The disaster is uninteroperable sniffing. That's not what is being 
proposed here; we're discussing defining the exact byte sequences and 
algorithms to detect specific types of content.
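
As a concrete example of what "exact byte sequences" can mean, here is a minimal signature table for a few media containers. The signatures are simplified assumptions: the EBML header is shared by WebM and Matroska (telling them apart requires reading the DocType element), and the MP4 entry only looks for an "ftyp" box without checking the brand. SIGNATURES and sniff_media are hypothetical names.

```python
from typing import Optional

# (offset, magic bytes, reported type); simplified signatures for illustration.
SIGNATURES = [
    (0, b"OggS\x00", "application/ogg"),     # Ogg capture pattern + version 0
    (0, b"\x1a\x45\xdf\xa3", "video/webm"),  # EBML header (also used by Matroska)
    (4, b"ftyp", "video/mp4"),               # ISO BMFF 'ftyp' box; brand not checked
]


def sniff_media(head: bytes) -> Optional[str]:
    """Return the first type whose signature matches at its offset, else None."""
    for offset, magic, mime in SIGNATURES:
        if head[offset:offset + len(magic)] == magic:
            return mime
    return None
```

Because the table is fixed and ordered, two implementations that adopt it give identical answers for identical bytes, which is the point of specifying it.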

> For reasons already argued about here, you will never make the results 
> of content-sniffing reliable, so why bother to standardise it?

I disagree with the premise of that argument.

> The typing mechanism of the web (and more) is Content-Type, period.

Reality disagrees.

> That it is 'hard' for web authors to ensure the correct Content-Types 
> are set is:
> 
> * not W3/WHATWG's problem. If web servers make adding Content-Type 
> information hard, then web servers need to be updated to make it easier

I can't speak as to whether the W3C think it's the W3C's problem, but it 
_is_ the WHATWG's problem, in that the goal here is interoperability and 
the often incorrect use of Content-Type is leading to interoperability 
issues.

> * not really true, at least for Apache which can allow AddType et al in 
> the .htaccess files that low-end shared hosts use. This may not be 
> widely-known or practised, but that doesn't really merit changing the 
> standards for everyone else to cope with.

What matters is what is practiced, not what is theoretically possible.

On Tue, 7 Sep 2010, Philip Jägenstedt wrote:
> 
> IE9, Safari and Chrome ignore Content-Type in a <video> context and rely 
> on sniffing. If you want Content-Type to be respected, convince the 
> developers of those 3 browsers to change. If not, it's quite inevitable 
> that Opera and Firefox will eventually have to follow.

Indeed.

> It hasn't been explicitly stated, but I assume that the only cases where 
> sniffing for video formats would be employed would be for missing 
> Content-Type, text/plain and application/octet-stream.

Currently it's for no type, application/octet-stream with no parameters, 
and any supported video type.
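
The rule stated in the reply above can be written as a small predicate. This is a sketch only: SUPPORTED_VIDEO is an assumed set, the function name is hypothetical, and real media-type parsing is more involved than splitting on ";".

```python
from typing import Optional

# Hypothetical set of video types this user agent supports.
SUPPORTED_VIDEO = {"video/ogg", "video/webm"}


def sniffing_applies(content_type: Optional[str]) -> bool:
    """True if the resource goes on to content sniffing rather than being rejected."""
    if content_type is None:
        return True  # no Content-Type at all
    if content_type == "application/octet-stream":
        return True  # exact bare string only; any parameters disqualify it
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in SUPPORTED_VIDEO
```

Unlike the assumption in the quoted message, text/plain is not on this list, and application/octet-stream only qualifies as the exact bare string.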

On Tue, 7 Sep 2010, Boris Zbarsky wrote:
> On 9/7/10 9:03 AM, Philip Jägenstedt wrote:
> > On Tue, 07 Sep 2010 14:54:15 +0200, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> > > On 9/7/10 6:52 AM, Philip Jägenstedt wrote:
> > > 
> > > That's not what at least Aryeh is proposing, no. Also not what at 
> > > least some of the browsers implement.
> > 
> > Oops, I was talking about top-level contexts here. In a <video> 
> > context, always ignoring the Content-Type and always sniffing is the 
> > most sane solution (apart from always respecting Content-Type).
> 
> Yes, the suggestion Aryeh is making is that toplevel contexts should use 
> the same sniffing algorithm as the <video> context and should sniff 
> everything for video, completely ignoring the Content-Type header.

I think the most logical thing for top-level browsing contexts would be to 
do the same as for images: sniff when you've detected binary, or when 
you've been given a known video type.

On Tue, 7 Sep 2010, Boris Zbarsky wrote:
> On 9/7/10 3:29 PM, Aryeh Gregor wrote:
> > * Sniff only if Content-Type is typical of what popular browsers serve
> > for unrecognized filetypes.  E.g., only for no Content-Type,
> > text/plain, or application/octet-stream, and only if the encoding is
> > either not present or is UTF-8 or ISO-8859-1.  Or whatever web servers
> > do here.
> > * Sniff the same both for video tags and top-level browsing contexts,
> > so "open video in new tab" doesn't mysteriously fail on some setups.
> 
> I could probably live with those, actually.

That's consistent with the most I would propose too.
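
Aryeh's two conditions could be expressed as one predicate that both the <video> code path and the top-level browsing context call, so that the two cannot diverge. The parsing helper is deliberately simplistic (it does not handle quoted semicolons), and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Types servers commonly send for unrecognised files, per the proposal.
DEFAULT_TYPES = {"text/plain", "application/octet-stream"}
DEFAULT_CHARSETS = {"utf-8", "iso-8859-1"}


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; a=b' into a lowercased essence and a parameter dict."""
    essence, *params = value.split(";")
    parsed: Dict[str, str] = {}
    for param in params:
        name, sep, val = param.partition("=")
        if sep:
            parsed[name.strip().lower()] = val.strip().strip('"')
    return essence.strip().lower(), parsed


def proposal_should_sniff(content_type: Optional[str]) -> bool:
    """Sniff only for typical 'unknown file' defaults with a default charset."""
    if content_type is None:
        return True
    essence, params = parse_content_type(content_type)
    if essence not in DEFAULT_TYPES:
        return False
    charset = params.get("charset")
    return charset is None or charset.lower() in DEFAULT_CHARSETS
```

Because the same function decides in both contexts, "open video in new tab" would sniff exactly when the <video> element did.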

On Wed, 8 Sep 2010, David Singer wrote:
>
> what about "don't sniff if the HTML gave you a mime type" (i.e. a source 
> element with a type attribute), or at least "don't sniff for the 
> purposes of determining CanPlay, dispatch, if the HTML source gave you a 
> mime type"?

That's more or less what the spec has. If the type in the HTML (or the 
Content-Type, indeed) is one you don't support (modulo 
application/octet-stream for practical reasons), then you don't play it, 
even if it's a supported file. If the types in the HTML and HTTP are types 
that _are_ supported, then the spec says you sniff to find out what exactly 
it is and then render it if that's still supported.

On Thu, 9 Sep 2010, David Singer wrote:
>
> I can't think why always sniffing is simple, or cheap, or desirable.  
> I'd love to get to never-sniff, but am not sanguine.

If Safari/QuickTime switch to never sniffing, that increases the odds of 
us being able to never sniff.

On Thu, 9 Sep 2010, Andy Berkheimer wrote:
> 
> Given the past history of content sniffing and security warts, it is 
> useful - or at least comforting - to have a path for the careful server 
> to indicate "I know this file really is intended to be handled as this 
> type, please don't sniff it".  This is particularly true for a server 
> handling sanitized files from unknown sources, as no sanitizer will be 
> perfect.
> 
> Today we approximate this through accurate use of Content-Type and a 
> recent addition of X-Content-Type-Options: nosniff.

X-Content-Type-Options doesn't stop all sniffing in the browsers that 
support it, and isn't a stable equilibrium (since careless server 
operators, as you call them, will likely end up specifying it by mistake 
-- e.g. because an admin sets it globally on their entire site but an 
author doesn't realise this and uploads poorly labeled content).
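
For the careful-server case, one option consistent with the caveat above is to send X-Content-Type-Options: nosniff only for responses whose type the server actually knows, rather than site-wide. The sketch below is hypothetical: the extension table and media_headers are invented for illustration, and, as noted above, browsers that honour nosniff do not necessarily stop all sniffing.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical explicit mapping; the server labels only what it knows.
MEDIA_TYPES = {
    ".ogv": "video/ogg",
    ".webm": "video/webm",
    ".mp4": "video/mp4",
}


def media_headers(path: str) -> Dict[str, str]:
    """Build response headers for a file path, adding nosniff only when the type is known."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        # Unknown type: send the generic default and do not claim nosniff,
        # so a poorly labeled upload is not locked into a wrong type.
        return {"Content-Type": "application/octet-stream"}
    return {
        "Content-Type": MEDIA_TYPES[suffix],
        "X-Content-Type-Options": "nosniff",
    }
```

Tying nosniff to an explicit type decision avoids the failure mode where an administrator enables it globally and an author later uploads content the server cannot label.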

On Mon, 13 Sep 2010, Mikko Rantalainen wrote:
> 
> And for heavens sake, do not specify any sniffing as "official".

Why not?

> Instead, explicitly specify all sniffing as UA specific and possibly 
> suggest that UAs should inform the user that content is broken and the 
> current rendering is best effort if any sniffing is required.

That sniffing is UA-specific is why it's problematic.

On Wed, 8 Sep 2010, And Clover wrote:
> 
> In any case, any sniffing solution will always be inconsistent as 
> different browsers, OSes, installed codecs and options expose different 
> media filetypes to the net.

I don't think the latter need lead to the former any more than in the case 
of Content-Type.

> Never mind just browsers, or even browsers that simply pass the resource 
> to their underlying media frameworks for sniffing: there are far more 
> already-deployed media players with HTTP capability than there are 
> browsers with video/audio support. There is no chance we will ever be 
> able to standardise the implementation of sniffing amongst this wide 
> range of agents!

That's possible, but so what? They already sniff. We can't convince them 
not to sniff any more than we can convince them to sniff in a particular 
way. That game is already lost. Where we might be able to have convergence 
is with Web browsers; that seems like the best place to start. Once we 
have interop there, there is a much higher incentive for the other 
players to converge on the same rules.

> So there will always be non-compliant UAs. In the face of this, we might 
> as well standardise the 'good' solution - minimal sniffing - and hope to 
> drag a few modern browsers along with that, instead of mandating an 
> unreliable sniffing approach that *still* isn't implemented universally.

Your argument works equally well in the other direction:

So there will always be non-compliant UAs. In the face of this, we might 
as well standardise the 'good' solution - direct type determination - and 
hope to drag a few modern browsers along with that, instead of mandating 
an unreliable labeling approach that *still* isn't implemented universally.

> > This is particularly essential for security -- undocumented sniffing 
> > behavior has caused more than one vulnerability in the past.
> 
> Yes. Undocumented sniffing behaviour has caused many vulnerabilities, as 
> even well-known sniffing behaviour continues to do (see the current 
> publicised difficulties with CSS-inclusion attacks).

The CSS-inclusion attacks are due to undocumented sniffing (or rather, 
just assuming that the resource is CSS, ignoring both the file contents 
and the file type), not specced behaviour.

> Lack of sniffing behaviour, however, has never caused a vulnerability. 
> It fails safe.

That's clearly untrue -- all it takes is for a server to incorrectly label 
untrusted user-provided content as text/html for there to be a vulnerability.

On Wed, 1 Sep 2010, Boris Zbarsky wrote:
> 
> It can't possibly work for images.  If I send a file as text/html, and 
> you load it from an <img> then you will render it as an image (possibly 
> a broken one).  If you load it from a toplevel browsing context you will 
> render it as text/html, even if it's image data (where "you" possibly 
> excludes IE/Windows, which will do some sniffing in that situation).

True, if it's labeled as text/html currently you'll only sniff for HTML or 
Atom/RSS feeds, not images.

On Wed, 1 Sep 2010, Boris Zbarsky wrote:
> On 9/1/10 2:51 PM, Ian Hickson wrote:
> > (Currently, text/html won't ever sniff as binary IIRC, but text/plain, 
> > in certain cases, will.
> 
> Will sniff as binary so as not to render as text but will NOT, last I 
> checked, render as an image or whatnot (for good security reasons, 
> imho).

No, it can be detected as an image. Not sure why that would be a security 
problem. The security problem would be if a file got treated as text/html 
when it wasn't labeled as such (privilege escalation). In the case of a 
binary file labeled as text/plain and then being treated as an image, the 
only serious security problem I can think of would be if the image 
exploited a code execution vulnerability, but then the problem would 
presumably exist even if the image was labeled "correctly" as an image, 
and would likely work regardless of the origin, so refusing to load it 
merely because it was labeled as text/plain would hardly help.

On Wed, 1 Sep 2010, Gregory Maxwell wrote:
> 
> Aggressive sniffing can and has resulted in some pretty nasty security 
> bugs.
> 
> E.g. an attacker crafts an input that a website identifies as video and 
> permits the upload but which a browser sniffs out to be a java jar which 
> can then access the source URL with the permissions of the user.

Right, the problem is if people use _different_ sniffing. Hence the intent 
to standardise any sniffing we decide should be implemented.

> Moreover, it'll never be consistent from implementation to 
> implementation, which seems to me to be pretty antithetical to 
> standardization in general.

I disagree with the premise here. There's nothing about sniffing that 
makes it less susceptible to interoperability convergence with a 
specification than any other aspect of a browser.

On Tue, 31 Aug 2010, Adam Barth wrote:
> 
> Why will sniffing never be consistent?  We need only step up as a 
> community and spec things that implementors are willing to implement. 
> Interoperability suffers when we insist on specing things that implementors 
> refuse to implement.

Exactly.

On Wed, 1 Sep 2010, Philip Jägenstedt wrote:
> On Tue, 31 Aug 2010 09:36:00 +0200, Ian Hickson <ian at hixie.ch> wrote:
> > On Mon, 19 Jul 2010, Philip Jägenstedt wrote:
> > > 
> > > I've tested Firefox 3.6.4, Firefox 4.0b1 and Chrome 5.0.375.99 and 
> > > none return "maybe" for canPlayType("application/octet-stream"). I 
> > > couldn't get meaningful results from Safari on Windows (requires 
> > > restart to detect QuickTime, perhaps?).
> > > 
> > > It would appear that Opera is the only browser that supports 
> > > application/octet-stream. At the time I added this, it was simply 
> > > because it is true, maybe we can play it. However, I see no 
> > > practical benefit of this spec-wise or implementation-wise. Since no 
> > > other browsers have implemented it, I am going to remove it from 
> > > Opera and hope that the spec will be changed to match this.
> > 
> > Agreed. I've changed the spec to match.
> 
> I never did make that change, instead waiting for the outcome of this 
> discussion. Note that since Opera uses the same code path for checking 
> the argument to canPlayType and for the Content-Type header, the change 
> would also have meant that videos served as application/octet-stream 
> would stop working, in violation of the spec.

At this point I'm in a similar situation, waiting to see what the browsers 
converge on before changing the spec again. :-)

> > canPlayType() is now hardcoded as not supporting 
> > application/octet-stream even though that type is otherwise not 
> > considered unsupported (i.e. it is a type that sniffs).
> 
> I'm not very happy with special-casing application/octet-stream only for 
> canPlayType, especially as it only handles the exact string 
> "application/octet-stream", not e.g. "application/octet-stream;" which 
> would instead be put through the same code path as Content-Type and 
> return "maybe".
> 
> At this point the least complex solution seems to be to ignore the 
> Content-Type header and unless the teams behind Chrome, Safari and IE9 
> have a sudden change of heart it's the only realistic outcome. Perhaps 
> we should also encourage authors to not send the Content-Type header at 
> all, to remove any illusions of it having an effect.

Maybe.
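
The inconsistency Philip describes can be modelled as follows, assuming (as the message says of Opera) that canPlayType() arguments and Content-Type headers share one code path, with the spec's exact-string special case layered on top for canPlayType() only. All names and the supported-type set are hypothetical, and return values are simplified to "maybe" or the empty string.

```python
# Hypothetical set of types this user agent supports.
SUPPORTED = {"video/ogg", "video/webm"}


def parse(mime: str):
    """Split a MIME string into (type/subtype, non-empty parameter strings)."""
    essence, _, rest = mime.partition(";")
    params = [p for p in rest.split(";") if p.strip()]
    return essence.strip().lower(), params


def content_type_support(mime: str) -> str:
    """Shared path for Content-Type headers and canPlayType() arguments."""
    essence, params = parse(mime)
    if essence == "application/octet-stream" and not params:
        return "maybe"  # treated like a missing Content-Type: sniff it
    return "maybe" if essence in SUPPORTED else ""


def can_play_type(mime: str) -> str:
    """canPlayType() with the exact-string special case on top."""
    if mime == "application/octet-stream":
        return ""
    return content_type_support(mime)
```

With this model, can_play_type("application/octet-stream") returns "" while can_play_type("application/octet-stream;") returns "maybe", which is the discrepancy objected to in the message.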

On Tue, 31 Aug 2010, Julian Reschke wrote:
> On 31.08.2010 09:36, Ian Hickson wrote:
> > > From <http://greenbytes.de/tech/webdav/rfc2046.html#rfc.section.1>:
> > > 
> > > "Parameters are modifiers of the media subtype, and as such do not 
> > > fundamentally affect the nature of the content. The set of 
> > > meaningful parameters depends on the media type and subtype. Most 
> > > parameters are associated with a single specific subtype. However, a 
> > > given top-level media type may define parameters which are 
> > > applicable to any subtype of that type. Parameters may be required 
> > > by their defining media type or subtype or they may be optional. 
> > > MIME implementations must also ignore any parameters whose names 
> > > they do not recognize."
> > > 
> > > So, as "codecs" is not defined on application/octet-stream, the 
> > > parameter simply should be ignored, thus the advice [...]:
> > > 
> > > "The MIME type "application/octet-stream" with no parameters is 
> > > never a type that the user agent knows it cannot render. User agents 
> > > must treat that type as equivalent to the lack of any explicit 
> > > Content-Type metadata when it is used to label a potential media 
> > > resource.
> > > 
> > > Note: In the absence of a specification to the contrary, the MIME 
> > > type "application/octet-stream" when used with parameters, e.g. 
> > > "application/octet-stream;codecs=theora", is a type that the user 
> > > agent knows it cannot render."
> > > 
> > > is incorrect, because it requires handling 
> > > "application/octet-stream" and 
> > > "application/octet-stream;codecs=theora" differently.
> > 
> > That's not incorrect. The type with no parameters is a special case 
> > that corresponds to a common configuration default. The case with 
> > parameters is not that case, and represents likely intentional 
> > configuration and thus clearly not a video format the UA supports.
> 
> My point is that it's incorrect to make this distinction, and that it's 
> furthermore misleading to mention the "codecs" parameter in the context 
> of a type that doesn't define it.

"codecs" is mentioned here because people asked how it should be handled, 
so clearly it's relevant.

The point is that it's not "application/octet-stream;codecs=theora" that 
is handled "differently". It's handled the same as "bogus/bogus", as 
"application/octet-stream;bogus=bogus", etc. It's very specifically the 
exact magic string "application/octet-stream" that is handled specially, 
and that only for historical reasons.

> > > It's also not clear whether the note applies to all parameters or 
> > > just "codecs".
> > 
> > The normative text you quote doesn't mention any specific parameters.
> 
> In which case it would be a *bit* clearer if the note used a parameter 
> that doesn't suggest that "codecs" has any meaning on a/o.

The note in question explicitly says "that parameter is not defined for 
that type", so I don't think it's realistic to say that someone could be 
confused into thinking that the parameter is defined for that type.

> > Regarding codecs="" in particular, it's an implementation reality that 
> > user agents that support it are likely to support it regardless of the 
> > type, so there's really no point trying to maintain an artificial 
> > boundary of which types it has semantics for and which it doesn't.
> 
> David Singer pointed out in 
> <http://www.w3.org/Bugs/Public/show_bug.cgi?id=10202#c11> that this is 
> the wrong thing to do.
> 
> Do you have any evidence that UAs already use "codecs" on types on which 
> they aren't defined, *and*, if this is the case, they can't be changed 
> anymore?

I do not. This doesn't affect the note in question though.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'