[whatwg] Character encoding of document.open()ed documents
hsivonen at iki.fi
Thu Apr 1 02:29:34 PDT 2010
On Mar 31, 2010, at 22:11, Boris Zbarsky wrote:
> On 3/31/10 10:37 AM, Henri Sivonen wrote:
>> Gecko sets the document's character encoding to UTF-8 and uses UTF-8 to decode the external resource.
> One more clarifying question.... Does Gecko use UTF-8, or the encoding of whatever document open() got called on?
Gecko uses the encoding of the document that open() got called on.
>> WebKit uses the encoding of the opener. IE8 (both with compat view button pressed and not pressed) sets the document's character encoding to "unicode" and uses UTF-8 to decode the external resource. Opera uses Windows-1252 to decode the external resource.
> Similar question for IE.
IE6 and IE8 set the encoding to "unicode" and use UTF-8 to decode the external resource even if the document that open() was called on had a different meta charset.
It seems that WebKit uses the encoding of the document that open() was called on *and* about:blank in an iframe inherits the encoding of its parent, which is why I previously thought WebKit used the encoding of the opener.
Furthermore, I was wrong when I thought Opera didn't support document.charset and document.characterSet. It does support them, but in document.open()ed documents the document's character encoding is set to the empty string, and for the purpose of external resources the empty string means the user's fallback encoding (Windows-1252 by default).
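For anyone who wants to repeat this kind of check, here is a minimal sketch of a probe. The helper name probeOpenedEncoding and the iso-8859-2 label are made up for illustration; `doc` is assumed to be any Document-like object, for example the contentDocument of an iframe. It only reads the two properties mentioned above after document.open(), so it shows what the browser reports, not what it uses to decode external resources.

```javascript
// Hypothetical helper, not part of any browser API.
// `doc` is assumed to be a Document-like object (e.g. iframe.contentDocument).
function probeOpenedEncoding(doc) {
  // Re-open the document and write a meta charset into it, so the result
  // shows whether the declared label has any visible effect.
  doc.open();
  doc.write('<meta charset="iso-8859-2"><title>probe</title>');
  doc.close();

  // Read both properties as-is; either may be undefined or the empty
  // string depending on the browser.
  return {
    characterSet: doc.characterSet,
    charset: doc.charset
  };
}
```

In a page, this could be called as probeOpenedEncoding(someIframe.contentDocument), where someIframe is whatever iframe element is being tested.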
From the evidence so far, assuming that IE is axiomatically sufficiently Web compatible here, it seems to me that making document.open() set the encoding to UTF-8 and ignoring meta charset in document.open()ed documents could work. I can also see why retaining the encoding of the document that open() was called on could be preferable, but so far I'm not persuaded that meta charset in document.open()ed documents should have an effect.
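The evidence referred to above can be collected in one place. The following is only a rough lookup of what this thread reports each engine doing when it decodes an external resource from a document.open()ed document; the function name, the engine keys, and the parameter names are invented for this sketch, and nothing here is a specification.

```javascript
// Observations as reported in this thread (April 2010), not a spec.
// All names here are hypothetical and exist only for illustration.
function externalResourceEncoding(engine, callerEncoding, userFallback = "windows-1252") {
  switch (engine) {
    case "gecko":
    case "webkit":
      // Both were reported to use the encoding of the document
      // that open() was called on.
      return callerEncoding;
    case "ie":
      // IE6 and IE8 report "unicode" and decode as UTF-8,
      // regardless of the calling document's meta charset.
      return "UTF-8";
    case "opera":
      // Opera reports the empty string, which means the user's
      // fallback encoding (Windows-1252 by default).
      return userFallback;
    default:
      throw new Error("no observation recorded for " + engine);
  }
}
```

For example, externalResourceEncoding("gecko", "ISO-8859-2") returns "ISO-8859-2", while externalResourceEncoding("ie", "ISO-8859-2") returns "UTF-8".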
I verified that CSS and JS are treated the same way:
On Apr 1, 2010, at 06:26, And Clover wrote:
> No browser will actually try to submit a form as UTF-16 for this reason, but it still causes problems. eg. Firefox misleadingly sets the `_charset_` hack field to 'UTF-16' even though the submission is UTF-8-encoded.
Is a bug on file? I didn't find a bug about this in Bugzilla.