[whatwg] API for encoding/decoding ArrayBuffers into text

Glenn Maynard glenn at zewt.org
Tue Mar 20 07:26:44 PDT 2012


On Mon, Mar 19, 2012 at 11:52 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> Why are encodings different than other parts of the API where you
> indeed have to know what works and what doesn't.

Do you memorize lists of encodings?  I certainly don't.  I look them up as
needed.

> UTF8 is stateful, so I disagree.

No, UTF-8 doesn't require a stateful decoder to support streaming.  You
decode up to the last codepoint that you can decode completely.  The return
values are the output data, the number of bytes output, and the number of
bytes consumed; that's all you need to restart decoding later.  That's the
iconv(3) approach that we're probably all familiar with, which works with
almost all encodings.
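
To make that concrete, here's a minimal Python sketch of the stateless
approach.  It isn't part of any proposed API; the names
decode_complete_prefix and decode_stream are made up for illustration, and
it assumes well-formed UTF-8 (malformed input simply raises
UnicodeDecodeError).  The decoder returns the decoded text and the number of
bytes consumed, and the caller resubmits the unconsumed bytes along with the
next chunk.

```python
from typing import Iterable, Tuple


def decode_complete_prefix(buf: bytes) -> Tuple[str, int]:
    """Decode the longest prefix of buf that ends on a complete code point.

    Returns (text, bytes_consumed).  No state is kept between calls.
    """
    end = len(buf)
    # A truncated sequence can only be in the last 3 bytes, so walk back
    # at most that far looking for the lead byte of the final sequence.
    for back in range(1, min(3, len(buf)) + 1):
        byte = buf[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte; keep looking for its lead byte
        if byte >= 0xF0:
            need = 4
        elif byte >= 0xE0:
            need = 3
        elif byte >= 0xC0:
            need = 2
        else:
            need = 1
        if need > back:
            # The final sequence is cut off; stop just before it.
            end = len(buf) - back
        break
    return buf[:end].decode("utf-8"), end


def decode_stream(chunks: Iterable[bytes]) -> Tuple[str, bytes]:
    """Caller-side loop: the only thing carried over is the leftover bytes."""
    pending = b""
    out = []
    for chunk in chunks:
        pending += chunk
        text, consumed = decode_complete_prefix(pending)
        out.append(text)
        pending = pending[consumed:]
    return "".join(out), pending
```

The only thing that survives between calls is the unconsumed tail, and the
caller owns it; the decoder itself remembers nothing.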

ISO-2022 encodings are stateful: you have to persistently remember the
character subsets activated by earlier escape sequences.  A stateless,
iconv-like streaming API is impossible for them; to support streamed
decoding, you'd need to have a decoder object that the user keeps around in
order to store that state.
http://en.wikipedia.org/wiki/ISO/IEC_2022#Code_structure
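
For contrast, here's a sketch of the decoder-object approach, using Python's
codecs.getincrementaldecoder and its iso2022_jp codec purely as an
illustration of the idea, not as a proposal for what the web API should look
like.  The sample string and split point are arbitrary, and the split
assumes the encoder emits a 3-byte escape sequence before the first
character.

```python
import codecs

original = "日本語"
encoded = original.encode("iso2022_jp")

# Split after the first character.  The escape sequence that selects the
# two-byte character set appears only in the first chunk (assuming the
# encoder starts with a 3-byte escape followed by 2 bytes per character).
first, second = encoded[:5], encoded[5:]

# The decoder object remembers which character set is active between calls.
decoder = codecs.getincrementaldecoder("iso2022_jp")()
streamed = decoder.decode(first) + decoder.decode(second, final=True)

# Decoding the second chunk on its own starts from the initial state, so
# the same bytes are interpreted differently.
stateless_tail = second.decode("iso2022_jp", errors="replace")
```

Here streamed matches the original string, while stateless_tail does not
match the last two characters, because the state set up by the earlier
escape sequence was lost.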

-- 
Glenn Maynard


