[whatwg] Video with MIME type application/octet-stream

Boris Zbarsky bzbarsky at MIT.EDU
Mon Sep 6 18:56:54 PDT 2010


On 9/6/10 3:19 PM, Aryeh Gregor wrote:
> On Mon, Sep 6, 2010 at 4:14 AM, Philip Jägenstedt<philipj at opera.com>  wrote:
>> The Ogg page begins with the 4 bytes "OggS", which is what Opera (GStreamer)
>> checks for. For additional safety, one could also check for the trailing
>> version indicator, which ought to be a NULL byte for current Ogg. [1] [2]
>
> "OggS\0" as the first five bytes seems safe to check for.  It's rather
> short, I guess because it's repeated on every page, but five bytes is
> long enough that it should occur by random only negligibly often, in
> either text or binary files.

So if a text file starts with U+4F67 U+6753 (both CJK ideographs) and 
any ASCII character (can this happen in the real world?) you're OK with 
treating it as Ogg?  Same for files starting with U+674F U+5367 (both CJK 
ideographs) and any plane-0 character whose Unicode codepoint is 0 mod 
2^8 (plenty of CJK stuff like that)?  Is your CJK good enough that you 
know text files would never start like this, or are you just assuming 
that people who are silly enough to use UTF-16 for their text files and 
aren't in Europe don't matter?  Or that you don't care about people who 
happen to not use a BOM?
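
To make the collision concrete, here is a minimal Python sketch that decodes the proposed five-byte signature, plus one following byte, as BOM-less UTF-16 in each byte order. The trailing bytes ('A' and 0x4E) and the helper name code_points are arbitrary choices for illustration, not anything from the thread.

```python
# Proposed sniffing signature: "OggS" followed by the version byte 0.
OGG_MAGIC = b"OggS\x00"


def code_points(data: bytes, encoding: str) -> list:
    """Decode data with the given codec and return U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in data.decode(encoding)]


# Big-endian: the zero byte is the high byte of the third code unit,
# so any ASCII byte after it completes a valid character.
be = code_points(OGG_MAGIC + b"A", "utf-16-be")

# Little-endian: the zero byte is the low byte of the third code unit,
# so the next byte picks the high byte (0x4E gives U+4E00).
le = code_points(OGG_MAGIC + b"\x4e", "utf-16-le")

print(be)  # ['U+4F67', 'U+6753', 'U+0041']
print(le)  # ['U+674F', 'U+5367', 'U+4E00']
```

Both decodes succeed without error, so a sniffer that looks only at these bytes cannot tell such a text file from an Ogg stream.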

> It looks like you could check for 0x1a 0x45 0xdf 0xa3 as the first
> four bytes

U+1A45 is Tai Tham, looks like.  DFA3 is a surrogate, so you're ok there.

U+451A is CJK.  U+A3DF looks like a Yi syllable, so you're more or less 
ok there too.  I'm assuming you've already checked this byte sequence 
out in UTF-8 and some other common encodings?
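
As a rough way to run that check, the sketch below tries the four bytes against a few strict decoders and reports either the resulting code points or None on failure. The helper name try_decode and the list of encodings are assumptions for illustration; a real audit would cover many more legacy encodings.

```python
from typing import List, Optional

# The four-byte signature proposed in the quoted message.
EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def try_decode(data: bytes, encoding: str) -> Optional[List[str]]:
    """Return U+XXXX labels if data decodes strictly, else None."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return None
    return [f"U+{ord(ch):04X}" for ch in text]


results = {
    enc: try_decode(EBML_MAGIC, enc)
    for enc in ("utf-16-be", "utf-16-le", "utf-8")
}

for enc, points in results.items():
    print(enc, points)
# utf-16-be None
# utf-16-le ['U+451A', 'U+A3DF']
# utf-8 ['U+001A', 'U+0045', 'U+07E3']
```

The big-endian reading fails on the lone surrogate, the little-endian reading is valid, and the UTF-8 reading is also valid but begins with the control character U+001A, which plain text would rarely start with.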

-Boris

P.S.  Sniffing is harder than you seem to think.  It really is...


