[whatwg] Content type sniffing
Boris Zbarsky
bzbarsky at MIT.EDU
Sun Jan 11 18:41:58 PST 2009
I just noticed that section 2.7.1 of HTML5 says:
    Extensions must not be used for determining resource types
    for resources fetched over HTTP.
While I understand the reasons for this, there are certainly cases where
this will break sites (basically those served over HTTP 0.9, or over later
HTTP versions without a Content-Type header). In particular, the HTML
sniffing in the algorithm is very limited and wouldn't sniff this document:
    <body>Some text</body>
as HTML.
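To make the failure concrete, here is a minimal sketch of a prefix-only sniffer. The signature list is an assumption: it contains only the three markers mentioned later in this message (doctype, <html>, <head>), and the actual table in the draft may differ. The function name sniff_strict and the 64-byte window are hypothetical as well.

```python
from typing import Optional

# Hypothetical signature list: only the doctype/<html>/<head> markers
# named in this message. The real table in the draft may differ.
STRICT_HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head")


def sniff_strict(body: bytes) -> Optional[str]:
    """Return "text/html" if the body starts with a known marker, else None."""
    # Skip leading whitespace, then compare case-insensitively against
    # the start of the resource only.
    start = body.lstrip(b" \t\r\n\x0c")[:64].lower()
    if start.startswith(STRICT_HTML_PREFIXES):
        return "text/html"
    return None


print(sniff_strict(b"<html><body>Some text</body></html>"))
print(sniff_strict(b"<body>Some text</body>"))
```

Under these assumptions the first call reports HTML, while the second (the document above) falls through with no type, since <body> is not among the markers checked.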
Now this use case (no content-type at all) was pretty common when the
unknown type sniffer in Gecko was written, but that was years ago. Do
we have any data on how common it is now?
-Boris
P.S. Of course at the moment the sniffer in Gecko is used for more than
just HTTP, and it looks like we'll need separate modes for things like
HTTP and things like file://. I can live with that, though. For the
file:// case, detection of HTML in documents with no
doctype/<html>/<head> is a must.