[whatwg] [encoding] utf-16
Anne van Kesteren
annevk at opera.com
Wed Dec 28 01:05:48 PST 2011
On Wed, 28 Dec 2011 03:20:26 +0100, Leif Halvard Silli
<xn--mlform-iua at målform.no> wrote:
> By "default" you supposedly mean "default, before error
> handling/heuristic detection". Relevance: On the "real" Web, no browser
> fails to display utf-16 as often as WebKit - its defaulting behavior
> notwithstanding - it can't be a goal to replicate that, for instance.
Do you mean heuristics when it comes to the decoding layer? Or before
that? I do think any heuristics ought to be defined.
>> utf-16le becomes a label for utf-16.
> * Logically, utf-16be should become a label for utf-16 then, as well.
That's not logical.
> Is that what you suggest? Because, if the BOM can change the meaning of
> utf-16be, then it makes sense to treat the utf-16be label as well as
> the utf-16le label as synonymous with utf-16. (Thus, effectively
> utf-16le and utf-16be become defunct/unreliable on the Web.)
No, because utf-16be actually has different behavior in the absence of a BOM.
It does mean they can share some common algorithm(s), but they have to
stay different encodings.
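
A minimal sketch of that distinction, not taken from any specification text.
It assumes what is described above: a BOM, when present, decides the byte
order for any of the utf-16 labels, and without a BOM "utf-16" (with
"utf-16le" as a label for it) decodes as little-endian while "utf-16be"
decodes as big-endian. The function name decode_utf16 is hypothetical.

```python
import codecs


def decode_utf16(label: str, data: bytes) -> str:
    """Hypothetical helper: decode `data` for one of the utf-16 labels."""
    label = label.strip().lower()
    if label not in ("utf-16", "utf-16le", "utf-16be"):
        raise ValueError(f"not a utf-16 label: {label!r}")

    # Shared step (assumption): a BOM, if present, decides the byte order
    # regardless of the label, and is not part of the decoded text.
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le", errors="replace")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be", errors="replace")

    # No BOM: this is the only place where the two encodings differ.
    if label == "utf-16be":
        return data.decode("utf-16-be", errors="replace")
    # Assumption: "utf-16" and its label "utf-16le" default to little-endian.
    return data.decode("utf-16-le", errors="replace")
```

With a BOM the label makes no difference; without one, the same two bytes
decode differently under "utf-16be" than under "utf-16", which is why the
two cannot be collapsed into a single encoding.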
> SECONDLY: You effectively say that, for the UTF-16 BOM, then the BOM
> should override the HTTP level charset info. OK. But then you should go
> the full way, and give the BOM the same, overriding authority when it
> comes to the UTF-8 BOM. For instance, if the HTTP server's Content-Type
> header specifies ISO-8859-1 (or 'utf-8' or 'utf-16'), but the file
> itself contains a BOM (that contradicts the HTTP info), then the BOM
> "wins" - in IE and WebKit. (And, btw, w.r.t. IE, then the
> X-Content-Type: header has no effect w.r.t. treating the HTTP's charset
> info as authoritative - the BOM wins even then.)
No, I don't see why we have to go there at all. All this suggests is that
within the two utf-16 encodings the first four bytes have special meaning.
That does not at all suggest we should do the same for numerous other
encodings unrelated to utf-16.
Anne van Kesteren