[whatwg] [encoding] utf-16
Anne van Kesteren
annevk at opera.com
Wed Dec 28 07:13:56 PST 2011
On Wed, 28 Dec 2011 12:30:49 +0100, Leif Halvard Silli
<xn--mlform-iua at målform.no> wrote:
> I spotted a shortcoming in your testing:
>> I ran some utf-16 tests using 007A as input data, optionally preceded by
>> FFFE or FEFF, and with utf-16, utf-16le, and utf-16be declared in the
>> Content-Type header. For WebKit I tested both Safari 5.1.2 and Chrome
>> 17.0.963.12. Trident is Internet Explorer 9 on Windows 7. Presto is
>> 11.60. Gecko is Nightly 12.0a1 (2011-12-26).
>> HTTP      BOM  Trident  WebKit  Gecko  Presto
>> utf-16    -    7A00     7A00    007A   007A
>> utf-16le  -    7A00     7A00    7A00   7A00
>> utf-16be  -    007A     007A    007A   007A
> The above test row is not complete. You should also run a BOM-less test
> using the UTF-16 label but where the 007A is represented in the
> big-endian way - a bit like I did here:
> <http://malform.no/testing/utf/#html-table-7>. Then you get the result
> that Opera and Firefox do not take it as a given that files sent as
> 'utf-16' are big-endian:
> utf-16    -    gibb*    gibb*   007A   007A
> *gibb = gibberish/mojibake.
I get U+7A00 as I indicated above. I would not qualify that as gibberish
personally. (My table is somewhat confusing as input 007A was meant to
describe octets, but the table describes code points.)
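
To illustrate the octets-versus-code-points distinction, here is a minimal
Python sketch (not part of the original tests; the helper name code_points
is made up for illustration). It decodes the same two octets, 00 7A, under
each byte order:

```python
# The two octets used as test input: 0x00 followed by 0x7A.
octets = bytes([0x00, 0x7A])

# Read as little-endian, the single 16-bit code unit is 0x7A00.
as_le = octets.decode("utf-16-le")
# Read as big-endian, the code unit is 0x007A (the letter "z").
as_be = octets.decode("utf-16-be")

def code_points(text):
    """Hypothetical helper: format each character as U+XXXX."""
    return ["U+%04X" % ord(ch) for ch in text]

print(code_points(as_le))  # ['U+7A00']
print(code_points(as_be))  # ['U+007A']
```

So a "7A00" cell in the table means the octets were read as little-endian,
and "007A" means they were read as big-endian.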
Presto and Gecko do have some magic, but it would be better if they behaved
the same as Trident (and WebKit).
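
A rough Python sketch of the Trident/WebKit column of the table follows. It
assumes that a leading BOM overrides the declared label, which the BOM-less
rows above do not establish. decode_utf16 is a hypothetical helper, not
browser code:

```python
import codecs

def decode_utf16(octets, label):
    """Hypothetical helper: BOM-less 'utf-16' defaults to little-endian."""
    # Assumption: a BOM, when present, decides the byte order and is dropped.
    if octets.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return octets[2:].decode("utf-16-le")
    if octets.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return octets[2:].decode("utf-16-be")
    # No BOM: only an explicit utf-16be label selects big-endian.
    if label.lower() == "utf-16be":
        return octets.decode("utf-16-be")
    # Both 'utf-16' and 'utf-16le' are read as little-endian.
    return octets.decode("utf-16-le")

for label in ("utf-16", "utf-16le", "utf-16be"):
    text = decode_utf16(bytes([0x00, 0x7A]), label)
    print(label, "U+%04X" % ord(text))
```

Under these assumptions there is no sniffing of BOM-less content, so the
result depends only on the BOM and the label.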
> That the BOM is removed from the output for utf-16be labelled files,
> means that the 'utf-16be' labelled file nevertheless is treated as
> UTF-16 (per UTF-16's specification). (Otherwise, if it had not been
> removed, the BOM character should have caused quirks mode.)
> Taking what you did not test for into account, it would make sense if
> 'utf-16' continues to be treated as a label under which both big-endian
> and little-endian can be expected. And thus, that WebKit and IE start to
> detect when UTF-16 is big-endian, but without a BOM.
I am not sure what you are trying to say here.
Anne van Kesteren