[whatwg] StringEncoding open issues

Mon Sep 17 14:50:46 PDT 2012

On Mon, Sep 17, 2012 at 2:17 PM, Anne van Kesteren <annevk at annevk.nl> wrote:

> On Mon, Sep 17, 2012 at 11:13 PM, Joshua Bell <jsbell at chromium.org> wrote:
> > I've attempted to distill the above into the spec in an algorithmic way:
> > http://wiki.whatwg.org/wiki/StringEncoding#TextDecoder
> >
> > English version: If you specify "utf-16" you get endian-agnostic UTF-16
> > encoding support. Failing that, if your encoding matches your BOM it is
> > consumed. Failing *that*, you get whatever behavior falls out of the decode
> > algorithm (garbage, error, etc).
>
> Why would we want the API to work differently from how it works in
> markup (with <meta charset> etc.)? Granted it's not super logical, but
> I don't really see why we should make it inconsistent and more
> complicated.
>

That's how the spec started out, so a recap of this thread would give you
the back-and-forth that led here. To summarize:

Having a BOM in the content take priority over the encoding selected by the
developer was not seen as desirable (see earlier in the thread), and was
considered a potential source of errors. Selecting the encoding via BOM (in
general, or to emulate <meta charset>, etc.) was seen as something that could
be done in user code if desired, but unexpected otherwise.
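
As a rough illustration of the "user code" option, a BOM sniffer could look
something like the sketch below. It is written in Python rather than against
the proposed API, and the names sniff_bom and decode_with_bom_override are
hypothetical, chosen only for this example. It covers UTF-8 and UTF-16 BOMs
only.

```python
import codecs

# (BOM bytes, Python codec name) pairs. Hypothetical helper table for this
# sketch; only UTF-8 and UTF-16 are covered.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length), or (None, 0) if no known BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom_override(data: bytes, fallback: str) -> str:
    """Let a BOM, if present, override the caller's fallback encoding."""
    name, skip = sniff_bom(data)
    # The BOM-less codec names above do not strip the BOM, so skip it by hand.
    return data[skip:].decode(name or fallback)
```

The point is only that BOM-over-label behavior is a few lines for callers who
want it, so it does not have to be the default.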

Two desired behaviors remained: (1) developers need BOM-selected,
endian-agnostic UTF-16 decoding, similar to ICU's handling, which
distinguishes "utf-16" from "utf-16le"; and (2) matching BOMs should
be consumed and not appear in the decoded data.
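
To make the resulting rules concrete, here is a minimal sketch in Python (not
the proposed API, and not the spec text). The function name decode_sketch and
the label table are made up for this example, and the little-endian default
for BOM-less "utf-16" input is my assumption, not something settled in this
thread.

```python
import codecs

# Label -> (Python codec, BOM that counts as "matching"). Hypothetical table
# for this sketch only.
_LABELS = {
    "utf-8": ("utf-8", codecs.BOM_UTF8),
    "utf-16le": ("utf-16-le", codecs.BOM_UTF16_LE),
    "utf-16be": ("utf-16-be", codecs.BOM_UTF16_BE),
}


def decode_sketch(data: bytes, label: str) -> str:
    label = label.lower()
    if label == "utf-16":
        # Endian-agnostic case: the BOM selects the byte order and is consumed.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be", errors="replace")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le", errors="replace")
        # ASSUMPTION: with no BOM, fall back to little-endian.
        return data.decode("utf-16-le", errors="replace")
    if label not in _LABELS:
        raise ValueError("unsupported label in this sketch: " + label)
    codec, bom = _LABELS[label]
    if data.startswith(bom):
        # A BOM matching the requested encoding is consumed.
        data = data[len(bom):]
    # Otherwise decode as-is; a mismatched BOM is treated as ordinary data.
    return data.decode(codec, errors="replace")
```

For example, decode_sketch(b"\xfe\xff\x00h\x00i", "utf-16") gives "hi", while
the same bytes with label "utf-16le" decode to garbage, since the big-endian
BOM does not match and is treated as data.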