[whatwg] Encoding: API
Anne van Kesteren
annevk at annevk.nl
Wed Oct 10 21:09:09 PDT 2012
On Wed, Oct 10, 2012 at 7:28 PM, Joshua Bell <jsbell at chromium.org> wrote:
> On Wed, Oct 10, 2012 at 6:42 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
>> I also still think it's kinda yucky that this API has this gigantic
>> hack around what the rest of the platform does with respect to the
>> byte order mark. It seems really weird to not expose the same
>> encode/decode that HTML/XML/CSS/etc. use.
> IMHO the API needs to support two use cases: (1) code that wants to follow the
> behavior of the web platform with respect to legacy content (i.e. the
> desire to self-host), and (2) code that wants to parse files that are not
> traditionally "web" data, i.e. fragments of binary files, which don't have
> legacy behavior and where BOM taking priority would be surprising to
> developers. For #2, following the behavior of APIs like ICU with respect to
> BOMs is more sensible. I believe #2 is higher priority as long as it does
> not preclude #1, and #1 can be achieved by code that inspects the stream
> before handing it off to the decoder.
> Practically speaking, this would mean refactoring the combined spec so that
> the current BOM handling is defined for parsing web content outside of the
> API rather than requiring the API to hack around it.
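
For illustration only, the "code that inspects the stream before handing it
off to the decoder" idea for use case #1 could look roughly like the sketch
below. The names sniffBom and BomMatch are hypothetical and not part of any
proposed API; the byte signatures are the standard UTF-8 and UTF-16 BOMs.

```typescript
// Hypothetical helper: report which BOM (if any) starts the buffer.
interface BomMatch {
  encoding: "utf-8" | "utf-16le" | "utf-16be";
  length: number; // number of BOM bytes to skip before decoding
}

function sniffBom(bytes: Uint8Array): BomMatch | null {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", length: 3 };
  }
  // UTF-16 big-endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", length: 2 };
  }
  // UTF-16 little-endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: "utf-16le", length: 2 };
  }
  return null; // no BOM: the caller keeps whatever encoding it already chose
}
```

A caller wanting web-platform behavior would let a non-null result override
its label and pass bytes.subarray(match.length) to the decoder; a caller
parsing binary fragments would simply not call the helper.
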
You would still get the hack because the API requires special
treatment for "utf-16". Given that per Unicode "utf-16le" and
"utf-16be" outlaw the BOM, maybe a good solution would be a flag to
disable BOM handling as seen by the decode algorithm? So the decoder
gets a disableBOM flag that defaults to false? That would only require
a special case for BOM handling on top of what there is today, which
seems a fair bit cleaner.
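
A minimal sketch of what such a flag might mean, assuming a hand-rolled
UTF-16 decoder rather than any real API. The function name decodeUtf16 and
its signature are hypothetical; only the disableBOM name and its default of
false come from the proposal above.

```typescript
type Utf16Label = "utf-16le" | "utf-16be";

// Hypothetical decoder: with disableBOM left at false, a leading BOM overrides
// the requested label and is stripped. With disableBOM set to true, the label
// is trusted and the leading bytes are decoded like any other code unit.
function decodeUtf16(bytes: Uint8Array, label: Utf16Label, disableBOM = false): string {
  let encoding: Utf16Label = label;
  let start = 0;

  if (!disableBOM && bytes.length >= 2) {
    if (bytes[0] === 0xff && bytes[1] === 0xfe) {
      encoding = "utf-16le";
      start = 2;
    } else if (bytes[0] === 0xfe && bytes[1] === 0xff) {
      encoding = "utf-16be";
      start = 2;
    }
  }

  let out = "";
  // A trailing odd byte is ignored to keep the sketch short.
  for (let i = start; i + 1 < bytes.length; i += 2) {
    const unit =
      encoding === "utf-16le"
        ? bytes[i] | (bytes[i + 1] << 8)
        : (bytes[i] << 8) | bytes[i + 1];
    out += String.fromCharCode(unit);
  }
  return out;
}
```

With the bytes FF FE 41 00, the default call yields "A" whichever label is
passed, while disableBOM set to true with "utf-16le" yields U+FEFF followed
by "A".
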
> I received feedback recently that the API is perhaps too terse right now
> when dealing with streaming content, and a more explicit decode(),
> decodeStream(), resetStream() might be more intelligible. Thoughts?
Either way works for me.
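
To make the more explicit shape concrete, here is a rough sketch of how
decode(), decodeStream() and resetStream() might fit together. Only the three
method names come from the feedback quoted above; the class name, the
UTF-16LE-only scope, and the exact semantics are assumptions made for
illustration.

```typescript
// Hypothetical streaming decoder for UTF-16LE only, holding at most one
// pending byte between chunks.
class StreamingUtf16LeDecoder {
  private pending: number | null = null;

  // One-shot decode: discards any stream state, decodes the whole buffer,
  // and emits U+FFFD if a byte is left dangling at the end.
  decode(bytes: Uint8Array): string {
    this.resetStream();
    const text = this.decodeStream(bytes);
    const tail = this.pending !== null ? "\uFFFD" : "";
    this.resetStream();
    return text + tail;
  }

  // Incremental decode: a code unit split across chunks is completed by
  // the next call.
  decodeStream(chunk: Uint8Array): string {
    let out = "";
    for (let i = 0; i < chunk.length; i++) {
      if (this.pending === null) {
        this.pending = chunk[i];
      } else {
        out += String.fromCharCode(this.pending | (chunk[i] << 8));
        this.pending = null;
      }
    }
    return out;
  }

  // Drop any partially received code unit.
  resetStream(): void {
    this.pending = null;
  }
}
```

Feeding the chunks [41] and then [00 42] to decodeStream() returns "" and
then "A", with 42 held back until the next chunk or a resetStream() call.
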