[whatwg] Encoding: API

Joshua Bell jsbell at chromium.org
Wed Oct 10 10:28:22 PDT 2012

On Wed, Oct 10, 2012 at 6:42 AM, Anne van Kesteren <annevk at annevk.nl> wrote:

> Hey, I was wondering whether it would make sense to define
> http://wiki.whatwg.org/wiki/StringEncoding as part of
> http://encoding.spec.whatwg.org/ Tying them together makes sense to me
> anyway and is similar to what we do with URL, HTML, etc.

No objection from me.

> As for the open issue, I think it would make sense if the encoding's
> name was returned. Label is just some case-insensitive keyword to get
> there.

I tend to agree, as the label gives you no information you don't already
have, and the name can at least serve as a diagnostic.
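
To illustrate the label/name distinction, here is a minimal sketch. It
assumes a TextDecoder constructor that accepts a label and exposes the
resolved encoding name through an `encoding` attribute, which is one way the
open issue could be resolved; `canonicalName` is a hypothetical helper, not
part of the proposal.

```typescript
// Hypothetical helper: resolve a label to the encoding name the decoder
// reports. Assumes the `encoding` attribute returns the name, not the label.
function canonicalName(label: string): string {
  return new TextDecoder(label).encoding;
}

// Different labels that are expected to resolve to the same encoding.
const labels: string[] = ["utf8", "UTF-8"];
const names: string[] = labels.map(canonicalName);

for (let i = 0; i < labels.length; i++) {
  console.log(labels[i] + " -> " + names[i]);
}
```

The caller already knows the label it passed in, so echoing it back adds
nothing, whereas the name shows which encoding was actually selected.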

> I also still think it's kinda yucky that this API has this gigantic
> hack around what the rest of the platform does with respect to the
> byte order mark. It seems really weird to not expose the same
> encode/decode that HTML/XML/CSS/etc. use.

IMHO the API needs to support two use cases: (1) code that wants to follow
the behavior of the web platform with respect to legacy content (i.e. the
desire to self-host), and (2) code that wants to parse files that are not
traditionally "web" data, i.e. fragments of binary files, which don't have
legacy behavior and where BOM taking priority would be surprising to
developers. For #2, following the behavior of APIs like ICU with respect to
BOMs is more sensible. I believe #2 is higher priority as long as it does
not preclude #1, and #1 can be achieved by code that inspects the stream
before handing it off to the decoder.
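
As a rough sketch of how #1 could be layered on top of a decoder that does
no BOM handling of its own: the caller inspects the leading bytes, lets a BOM
override the declared encoding, and only then hands the rest to the decoder.
The names `sniffBom` and `decodeLikeWebContent` are hypothetical, and the
`ignoreBOM` option is assumed here as the way to ask the decoder to leave
BOMs alone.

```typescript
interface BomMatch {
  encoding: string;  // encoding implied by the BOM
  bomLength: number; // number of leading bytes the BOM occupies
}

// Inspect the start of the stream for a UTF-8 or UTF-16 byte order mark.
function sniffBom(bytes: Uint8Array): BomMatch | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", bomLength: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", bomLength: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: "utf-16le", bomLength: 2 };
  }
  return null;
}

// Web-content style decoding: a BOM, if present, takes priority over the
// declared label and is removed before decoding.
function decodeLikeWebContent(bytes: Uint8Array, declaredLabel: string): string {
  const bom = sniffBom(bytes);
  const label = bom ? bom.encoding : declaredLabel;
  const body = bom ? bytes.subarray(bom.bomLength) : bytes;
  // ignoreBOM: true keeps the decoder from doing BOM handling of its own.
  return new TextDecoder(label, { ignoreBOM: true }).decode(body);
}

// UTF-8 BOM followed by "hi", with a deliberately conflicting declared label.
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);
console.log(decodeLikeWebContent(withBom, "utf-16le"));
```

Code in category #2 would skip `sniffBom` entirely and pass its bytes and
label straight to the decoder.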

Practically speaking, this would mean refactoring the combined spec so that
the current BOM handling is defined for parsing web content outside of the
API rather than requiring the API to hack around it.


While we're here, any feedback from implementers? Mozilla is apparently
quite far along. Any surprises or additional issues? Any initial feedback
from users?

I received feedback recently that the API is perhaps too terse right now
when dealing with streaming content, and that a more explicit set of methods
such as decode(), decodeStream(), and resetStream() might be more
intelligible. Thoughts?
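
To make the suggestion concrete, here is one possible shape for the more
explicit methods, written as a wrapper over a decoder that takes a
`{ stream: true }` option. The class name `ExplicitDecoder` and the exact
semantics of each method are assumptions for discussion, not something the
feedback specified.

```typescript
class ExplicitDecoder {
  private label: string;
  private streaming: TextDecoder;

  constructor(label: string = "utf-8") {
    this.label = label;
    this.streaming = new TextDecoder(label);
  }

  // One-shot decode of a complete buffer; does not touch streaming state.
  decode(bytes: Uint8Array): string {
    return new TextDecoder(this.label).decode(bytes);
  }

  // Decode one chunk, holding back any incomplete trailing byte sequence
  // until the next chunk arrives.
  decodeStream(chunk: Uint8Array): string {
    return this.streaming.decode(chunk, { stream: true });
  }

  // End the current stream: flush whatever is pending and return it, so
  // the next decodeStream() call starts from a clean state.
  resetStream(): string {
    return this.streaming.decode();
  }
}

// The euro sign (E2 82 AC in UTF-8) split across two chunks.
const dec = new ExplicitDecoder("utf-8");
const first = dec.decodeStream(new Uint8Array([0xe2, 0x82]));
const second = dec.decodeStream(new Uint8Array([0xac]));
const tail = dec.resetStream();
console.log(JSON.stringify([first, second, tail]));
```

Whether resetStream() should return the flushed text or discard it is an
open design question; returning it seemed the less lossy choice for a sketch.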
