[whatwg] Charset sniffing from XML prolog

Wed Oct 7 17:04:16 PDT 2009

On Wednesday 2009-10-07 23:51 +0000, Kartikaya Gupta wrote:
> On Wed, 07 Oct 2009 19:34:18 -0400, Boris Zbarsky <bzbarsky at MIT.EDU> wrote:
> > On 10/7/09 7:12 PM, Kartikaya Gupta wrote:
> > > If a document is served as text/html but contains an XML prolog with an encoding attribute, it seems that Firefox, Opera, and Chrome all pick up the encoding from the prolog and use it when parsing the rest of the document (IE6 does not). The HTML5 spec doesn't seem to include XML-prolog checking in its encoding sniffing algorithm. Should it?
> > > 
> > > <?xml version="1.0" encoding="utf-8"?>
> > > <html>insert utf-8 content here, or alert(document.inputEncoding) for browsers that support it</html>
> > 
> > data:text/html,<?xml version="1.0" encoding="utf-8"?><html><script>alert(document.inputEncoding)</script></html>
> > 
> > Shows ISO-8859-1 for me in Firefox over here.
> > 
> 
> Strange. I got "UTF-8" when I pasted that into the address bar. For reference, the version of FF I'm using is:

Maybe you've configured UTF-8 as the fallback encoding?  It's a
preference (and its default value varies between localizations).

Tools -> Options -> Content -> Fonts & Colors -> Character Encoding
-> Default Character Encoding.  (On other platforms, replace Tools
-> Options with Edit -> Preferences on GNOME-based platforms, or with
Firefox -> Preferences on Mac.)
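
To illustrate why the same test page could report different encodings,
here is a minimal Python sketch.  It is not any browser's actual
algorithm; the function name sniff_encoding, the honor_xml_decl flag,
and the regular expression are all hypothetical, chosen only to show
that a UTF-8 fallback preference yields "UTF-8" even when the XML
prolog is never consulted.

```python
import re

# Hypothetical pattern for an XML declaration at the very start of the
# byte stream; an assumption for illustration, not taken from any spec.
XML_DECL_RE = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes,
                   fallback: str = "ISO-8859-1",
                   honor_xml_decl: bool = False) -> str:
    """Return the encoding a hypothetical text/html sniffer would pick.

    fallback stands in for the user's "Default Character Encoding"
    preference; honor_xml_decl models a sniffer that reads the prolog.
    """
    if honor_xml_decl:
        match = XML_DECL_RE.match(data)
        if match:
            return match.group(1).decode("ascii").upper()
    # No declared encoding was used, so the preference decides.
    return fallback


doc = b'<?xml version="1.0" encoding="utf-8"?><html></html>'

print(sniff_encoding(doc))                       # prolog ignored, default fallback
print(sniff_encoding(doc, fallback="UTF-8"))     # prolog ignored, UTF-8 fallback
print(sniff_encoding(doc, honor_xml_decl=True))  # prolog honored
```

The second and third calls return the same value, so the alert() output
alone cannot tell a UTF-8 fallback apart from real prolog sniffing.
Testing with a prolog that names an encoding different from the
configured fallback would separate the two cases.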

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/