[whatwg] MIME Sniffing spec - http://mimesniff.spec.whatwg.org/
bzbarsky at MIT.EDU
Sat Oct 22 07:53:11 PDT 2011
On 10/22/11 6:09 AM, Daniel Glazman wrote:
>>> text/plain; charset=iso-8859-1
>>> This is wrong. Nothing in the MIME or the HTTP specs says such a
>>> whitespace is mandatory. Whitespace is explicitly forbidden between
>>> type and subtype, between parameter-name and parameter-value, but that's
>>> all. AFAIC, |text/plain;charset=iso-8859-1| is perfectly valid and
>>> |text/plain ; charset=iso-8859-1| is perfectly valid too.
>> We do not want to sniff text/plain more than strictly necessary.
> Sorry, I don't understand that answer, what do you mean exactly ?
Normally, when a browser receives a header of the form "text/plain ..."
where "..." is anything, it should treat the page as text/plain.
However, there is a known bug in old Apache installations where Apache
defaulted to sending one of the following types (depending on the
installation) any time it didn't know what type of data the file was:
  text/plain
  text/plain; charset=iso-8859-1
  text/plain; charset=ISO-8859-1
  text/plain; charset=UTF8
Therefore, it is fairly common for random binary files to be served with
those four exact header values. Thus, if those _exact_ strings are
encountered, the UA needs to sniff to make sure the data is not actually
binary.
> If I read the document correctly, UAs are going to fall back to complex
> type detection with perf and time cost just because the content-type
> detection did not honour the potential presence of whitespace ???
> Really ?
You read it wrong. If the whitespace doesn't match the exact values in
the table, the UA will just treat the page as text/plain. It's only
when the header value is exactly one of the 4 in the table that the UA
will go into http://mimesniff.spec.whatwg.org/#text-or-binary
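
To make the exact-match rule concrete, here is a minimal sketch in Python. It is an illustration only, not text from the spec: the helper name needs_text_or_binary_check is made up, and the four strings are copied from this message, so check the table in the spec for the authoritative list.

```python
# The four exact header values listed in this message (old Apache defaults).
# Copied from the message; the spec table is the authoritative source.
SNIFF_TRIGGERS = frozenset({
    "text/plain",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=UTF8",
})


def needs_text_or_binary_check(content_type: str) -> bool:
    """Hypothetical helper: True only for a character-for-character match.

    No trimming, case folding, or whitespace normalization is done on
    purpose, so any other spelling is simply treated as text/plain.
    """
    return content_type in SNIFF_TRIGGERS


if __name__ == "__main__":
    for header in (
        "text/plain; charset=iso-8859-1",   # exact match: check for binary
        "text/plain;charset=iso-8859-1",    # no space: plain text/plain
        "text/plain ; charset=iso-8859-1",  # extra space: plain text/plain
    ):
        print(repr(header), needs_text_or_binary_check(header))
```

Both of Daniel's whitespace variants are valid headers, and precisely because they differ from the table entries they skip the text-or-binary step rather than trigger it.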