[whatwg] API for encoding/decoding ArrayBuffers into text

Jonas Sicking jonas at sicking.cc
Wed Mar 21 01:27:47 PDT 2012


On Tue, Mar 20, 2012 at 10:39 AM, Joshua Bell <jsbell at chromium.org> wrote:
> On Tue, Mar 20, 2012 at 7:26 AM, Glenn Maynard <glenn at zewt.org> wrote:
>
>> On Mon, Mar 19, 2012 at 11:52 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>>
>>> Why are encodings different than other parts of the API where you
>>> indeed have to know what works and what doesn't.
>>
>> Do you memorize lists of encodings?  I certainly don't.  I look them up as
>> needed.
>>
>>> UTF8 is stateful, so I disagree.
>>
>> No, UTF-8 doesn't require a stateful decoder to support streaming.  You
>> decode up to the last codepoint that you can decode completely.  The return
>> values are the output data, the number of bytes output, and the number of
>> bytes consumed; that's all you need to restart decoding later.  That's the
>> iconv(3) approach that we're probably all familiar with, which works with
>> almost all encodings.
>>
>> ISO-2022 encodings are stateful: you have to persistently remember the
>> character subsets activated by earlier escape sequences.  An iconv-like
>> streaming API is impossible; to support streamed decoding, you'd need to
>> have a decoder object that the user keeps around in order to store that
>> state.  http://en.wikipedia.org/wiki/ISO/IEC_2022#Code_structure
>>
>
> Which seems like it leaves us with these options:
>
> 1. Only support encodings with stateless coding (possibly down to a minimum
> of UTF-8)
> 2. Only provide an API supporting non-streaming coding (i.e. whole
> strings/whole buffers)
> 3. Expand the API to return encoder/decoder objects that capture state

I'm pretty sure there is consensus for supporting UTF-8. Decoding UTF-8
is stateful, though it can be made stateless by not consuming all of
the input and instead forcing the caller to keep the state (in the
form of unconsumed input).
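
To make the caller-managed approach concrete, here is a minimal sketch in
Python. The helper names (decode_utf8_prefix, decode_chunks) are made up for
illustration and are not part of any proposed API. The decoder keeps nothing
between calls; the caller has to carry the unconsumed bytes forward.

```python
def decode_utf8_prefix(data: bytes) -> tuple[str, int]:
    """Decode the longest prefix of data that ends on a complete code point.

    Hypothetical helper. Returns (text, bytes_consumed). Malformed input
    still raises UnicodeDecodeError from bytes.decode().
    """
    n = len(data)
    end = n
    i = n - 1
    # Walk back over at most 4 trailing bytes to find the last lead byte.
    while i >= 0 and n - i <= 4:
        b = data[i]
        if b & 0xC0 != 0x80:  # not a continuation byte: ASCII or a lead byte
            if b >= 0xF0:
                need = 4
            elif b >= 0xE0:
                need = 3
            elif b >= 0xC0:
                need = 2
            else:
                need = 1
            if n - i < need:
                end = i  # last sequence is truncated; leave it unconsumed
            break
        i -= 1
    return data[:end].decode("utf-8"), end


def decode_chunks(chunks: list[bytes]) -> str:
    """Caller-side state handling: keep leftover bytes between calls."""
    pending = b""
    out = []
    for chunk in chunks:
        buf = pending + chunk
        text, consumed = decode_utf8_prefix(buf)
        out.append(text)
        pending = buf[consumed:]  # this is the state the caller must keep
    if pending:
        raise ValueError("input ended in the middle of a code point")
    return "".join(out)


encoded = "naïve €5".encode("utf-8")
one_byte_chunks = [encoded[i:i + 1] for i in range(len(encoded))]
streamed = decode_chunks(one_byte_chunks)
```

The loop in decode_chunks is the kind of wrapper every consumer would end up
writing around such an API.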

So I would rephrase your 3 options above as:

1) Create an API which forces consumers to do state handling. Probably
leading to people creating wrappers which essentially implement option
3
2) Don't support streaming
3) Have encoder/decoder objects which hold state

I personally don't think 1 is a good option since it's basically the
same as 3 but just with libraries doing some of the work. We might as
well do that work so that libraries aren't needed.
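
For comparison, a decoder object that holds state could behave roughly like
Python's incremental decoders. This is only an illustration of the idea using
the standard codecs module, not a proposal for the shape of the web API;
decode_stream is a made-up helper name, and the split point of 5 bytes assumes
the encoder emits the usual 3-byte escape sequence before the first character.

```python
import codecs


def decode_stream(chunks: list[bytes], encoding: str) -> str:
    """Decode a sequence of byte chunks with one stateful decoder object."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder and raises if input ended mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# UTF-8: the object buffers the incomplete trailing bytes itself.
utf8 = "a€".encode("utf-8")
utf8_result = decode_stream([utf8[:2], utf8[2:]], "utf-8")

# ISO-2022-JP: the object also remembers which character set the earlier
# escape sequence selected. Assumed split: 3-byte escape + one 2-byte character.
jis = "日本".encode("iso2022_jp")
jis_chunks = [jis[:5], jis[5:]]
jis_result = decode_stream(jis_chunks, "iso2022_jp")

# Decoding the second chunk on its own loses the escape-sequence state,
# so its bytes are read as ASCII instead of as the second kanji.
naive_second = jis_chunks[1].decode("iso2022_jp")
```

The same object handles both cases, so the caller never has to know whether
the encoding is stateful.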

This leaves us with 2 or 3. So the question is whether we should support
streaming or not. I suspect doing so would be worth it.

/ Jonas


