[whatwg] UTF-16 encoding default

Kartikaya Gupta lists.whatwg at stakface.com
Tue Jun 23 18:42:29 PDT 2009

There's a page (http://www.microsoft.com/windowsmobile/mobile/en-us/totalaccess/software/software/eula-sw-netflix.mspx specifically) that is served with a Content-Type header of "text/html; charset=utf-16" and has no BOM. The references I've seen (RFC 2781, as well as http://unicode.org/faq/utf_bom.html#gen7) say that in this case the content should be assumed to be UTF-16BE. The page, however, is actually encoded as UTF-16LE.
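To make the mismatch concrete, here is a minimal Python sketch of what happens when BOM-less UTF-16LE bytes are decoded as UTF-16BE. The string "<html>" is only a hypothetical stand-in for the start of the page's markup; it is not taken from the actual page.

```python
# Hypothetical sample standing in for the start of the page's markup.
sample = "<html>"

# The explicit "-le" codec writes no BOM, like the page described above.
raw = sample.encode("utf-16-le")

# Decoding with the byte order the references call for (big-endian)
# pairs the bytes up the wrong way round and yields unrelated characters.
as_be = raw.decode("utf-16-be")

# Decoding with the byte order the page actually uses recovers the text.
as_le = raw.decode("utf-16-le")

print(repr(as_be))
print(repr(as_le))
```

Each ASCII character such as "<" (0x3C 0x00 in little-endian) is read as the single code unit 0x3C00 when treated as big-endian, which is why the result looks like garbage rather than failing outright.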

All browsers seem to do some sort of unspecified magic and figure out that the page is in LE. I was wondering if that magic could be described and added to the HTML5 spec so that it covers rendering the above page as expected. According to the draft spec as it stands, I believe that page should be rendered as garbage.
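What the browsers actually do is not known here. Purely as an illustration, one plausible heuristic is to look at where the zero bytes fall: in mostly-ASCII markup, UTF-16LE puts the zero byte second in each pair and UTF-16BE puts it first. The function name `guess_utf16_codec` and the `sniff_len` parameter below are hypothetical, and this sketch is an assumption, not a description of any browser's behaviour.

```python
import codecs


def guess_utf16_codec(raw: bytes, sniff_len: int = 1024) -> str:
    """Guess the byte order of data labeled 'utf-16' (illustrative only)."""
    # An explicit BOM always wins.
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # Assumed heuristic: count zero bytes at even and odd offsets
    # in the first sniff_len bytes.
    head = raw[:sniff_len]
    zeros_even = head[0::2].count(0)
    zeros_odd = head[1::2].count(0)

    # Mostly-ASCII little-endian text has its zero bytes at odd offsets.
    if zeros_odd > zeros_even:
        return "utf-16-le"

    # Otherwise fall back to big-endian, the default for unmarked text.
    return "utf-16-be"


# Hypothetical BOM-less little-endian input, similar in spirit to the page.
page_bytes = "<html><body>EULA</body></html>".encode("utf-16-le")
codec = guess_utf16_codec(page_bytes)
print(codec, page_bytes.decode(codec))
```

A heuristic like this would misfire on text that is not dominated by ASCII-range characters, which is one reason it would need to be spelled out carefully before going into a spec.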


PS - the page also has a meta tag that says the charset is iso-8859-1. *sigh*
