[whatwg] Charset sniffing from XML prolog

Kartikaya Gupta lists.whatwg at stakface.com
Wed Oct 7 18:29:17 PDT 2009


On Wed, 07 Oct 2009 20:23:35 -0400, Boris Zbarsky <bzbarsky at MIT.EDU> wrote:
> On 10/7/09 7:51 PM, Kartikaya Gupta wrote:
> > I tried it again in Chrome and if I paste the above in the address bar I get US-ASCII. But if I save it to a file and then load it I get UTF-8. I checked the headers being sent from Apache and they don't include any sneaky encoding hints, just Content-Type: text/html.
> 
> Can you attach the exact file you saved?  Does it have a BOM, perchance?
> 
> 

No BOM (I created the files using vim, and checked them with xxd).
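
For anyone who wants to repeat the BOM check without xxd, here is a minimal sketch in Python. The function name detect_bom and the BOMS table are hypothetical helpers introduced for illustration; only the BOM constants from the standard codecs module are real. It only inspects the leading bytes and says nothing about how any browser treats them.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32LE mark starts with the same two bytes as the UTF-16LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_bom.py somefile.html
    with open(sys.argv[1], "rb") as handle:
        print(detect_bom(handle.read(4)))
```

A result of None corresponds to what xxd shows for the files below: the first bytes are the prolog itself, not a BOM.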

Using document.inputEncoding:
http://stakface.com/pub/mango/fakexml.html
http://stakface.com/pub/mango/fakexml_iso.html

Using a degree symbol in UTF-8:
http://stakface.com/pub/mango/fakexml2.html
http://stakface.com/pub/mango/fakexml2_iso.html

In both cases the _iso version has a tweaked prolog, so that Firefox goes back to ISO-8859-1 for it. Chrome still detects fakexml_iso.html as UTF-8. I've now also tested in Firefox on Mac (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3), which also has a default encoding of ISO-8859-1 as per the preferences.
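
To make the behaviour under discussion concrete, here is a rough sketch of what "sniffing the charset from the XML prolog" could look like. It is an assumption-laden illustration, not what Firefox or Chrome actually implement: the function name sniff_prolog_encoding, the 1024-byte window, the requirement that the declaration start at byte 0, and the ISO-8859-1 fallback are all hypothetical choices made for this example.

```python
import re

# Matches an XML declaration at the very start of the byte stream and
# captures the value of its encoding pseudo-attribute, if it has one.
XML_DECL = re.compile(
    rb"^<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_prolog_encoding(data: bytes, default: str = "ISO-8859-1") -> str:
    """Return the encoding named in a leading XML declaration, else default.

    Only the first 1024 bytes are examined (an arbitrary limit chosen for
    this sketch), and the declaration must begin at byte 0.
    """
    match = XML_DECL.match(data[:1024])
    if match:
        return match.group(1).decode("ascii")
    return default
```

Under these assumptions, a sniffer this strict stops seeing the declaration after even a small change to the prolog (a leading space, say) and falls back to the default, while a more lenient sniffer would still report UTF-8.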

kats


