[whatwg] API for encoding/decoding ArrayBuffers into text

Jonas Sicking jonas at sicking.cc
Mon Mar 26 16:27:29 PDT 2012


On Mon, Mar 26, 2012 at 2:49 PM, Joshua Bell <jsbell at chromium.org> wrote:
> On Mon, Mar 26, 2012 at 2:42 PM, Anne van Kesteren <annevk at opera.com> wrote:
>
>> On Mon, 26 Mar 2012 17:56:41 +0100, Joshua Bell <jsbell at chromium.org>
>> wrote:
>>
>>> Bikeshed: The |continues| term doesn't completely thrill me; it's clear
>>> in context, but not necessarily what someone might go searching for.
>>> {eof:true} would be lovely except we want the default to be yes-EOF but a
>>> falsy JS value. |noEOF| ?
>>>
>>
>> Peter Beverloo suggests "stream" on IRC. I like it.
>
>
> +1
>
>
>> Opinions on one object type (Encoding) vs. two (Encoder, Decoder) ?
>>>
>>
>> Two seems cleaner.
>
>
> I've gone ahead and updated the wiki/draft:
> http://wiki.whatwg.org/wiki/StringEncoding
>
> This includes:
>
> * TextEncoder / TextDecoder objects, with |encode| and |decode| methods
> that take option dicts
> * A |stream| option, per the above
> * A |nullTerminator| option eliminates the need for a stringLength method
> (hasta la vista, baby!)
> * |encodedLength| method is dropped since you can't in-place encode anyway
> * decoding errors yield fallback code points by default, but setting a
> |fatal| option causes a DOMException to be thrown instead
> * specified exceptions as DOMException of type "EncodingError", as a
> placeholder
>
> New issues resulting from this refactor:
>
> * You can change the options (stream, nullTerminator, fatal) midway through
> decoding a stream. This would be silly to do, but as written I don't think
> this makes the implementation more difficult. Alternately, the non-stream
> options could be set on the TextDecoder object itself.
>
> * BOM handling needs to be resolved. The Encoding spec makes the encoding
> label secondary to the BOM. With this API it's unclear if that should be
> the case. Options include having a mismatching BOM throw, treating a
> mismatching BOM as a decoding error (i.e. fallback or throw, depending on
> options), or allowing the BOM to actually switch the decoder used for this
> "stream" - possibly if-and-only-if the default encoding was specified.
>
> I've also partially updated the JS "polyfill" proof-of-concept
> implementation, tests, and examples as well, but it does not implement
> streaming yet (i.e. a "stream" option is ignored, state is always lost); I
> need to do a tiny bit more refactoring first.
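A minimal TypeScript sketch of how the per-call |stream| and |fatal| options and one of the BOM policies listed above might fit together. It handles UTF-8 only. The class name SketchUtf8Decoder, the DecodeOptions interface, the plain Error standing in for a DOMException of type "EncodingError", and the choice to treat a UTF-16 BOM under a UTF-8 label as a single decoding error are all assumptions for illustration, not the draft's text.

```typescript
// Hypothetical sketch: a UTF-8-only decoder honouring per-call |stream| and
// |fatal| options. Names are illustrative, not the draft's API.

interface DecodeOptions {
  stream?: boolean; // more chunks follow: hold back an incomplete trailing sequence
  fatal?: boolean;  // throw on malformed input instead of emitting U+FFFD
}

class SketchUtf8Decoder {
  private pending: number[] = []; // bytes of a sequence split across chunks
  private atStart = true;         // a BOM is only looked for at stream start

  decode(chunk: Uint8Array, options: DecodeOptions = {}): string {
    const input = this.pending.concat(Array.from(chunk));
    this.pending = [];
    let out = "";
    let i = 0;
    // Stand-in for a DOMException of type "EncodingError".
    const error = (): void => {
      if (options.fatal) throw new Error("EncodingError");
      out += "\uFFFD";
    };

    if (this.atStart) {
      if (options.stream && input.length < 3) {
        this.pending = input; // too few bytes to rule out a BOM yet
        return "";
      }
      this.atStart = false;
      if (input[0] === 0xef && input[1] === 0xbb && input[2] === 0xbf) {
        i = 3; // BOM matches the UTF-8 label: strip it
      } else if ((input[0] === 0xfe && input[1] === 0xff) || (input[0] === 0xff && input[1] === 0xfe)) {
        error(); // UTF-16 BOM under a UTF-8 label: one decoding error (assumed policy)
        i = 2;
      }
    }

    while (i < input.length) {
      const b = input[i];
      // Number of continuation bytes this lead byte requires; -1 if invalid.
      const n = b < 0x80 ? 0 : b >= 0xc2 && b <= 0xdf ? 1 : b >= 0xe0 && b <= 0xef ? 2 : b >= 0xf0 && b <= 0xf4 ? 3 : -1;
      if (n < 0) { error(); i++; continue; }
      let cp = b & [0x7f, 0x1f, 0x0f, 0x07][n];
      let k = 1;
      for (; k <= n && i + k < input.length; k++) {
        const c = input[i + k];
        if ((c & 0xc0) !== 0x80) break; // not a continuation byte
        cp = (cp << 6) | (c & 0x3f);
      }
      if (k <= n && i + k >= input.length && options.stream) {
        this.pending = input.slice(i); // cut off by the chunk boundary: wait for more
        break;
      }
      const min = [0, 0x80, 0x800, 0x10000][n];
      if (k <= n || cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
        error(); // truncated, overlong, surrogate or out-of-range sequence
        i++;
        continue;
      }
      out += String.fromCodePoint(cp);
      i += n + 1;
    }

    if (!options.stream) this.atStart = true; // stream finished: ready for reuse
    return out;
  }
}
```

Because the options travel with every call, nothing in this shape stops a caller from flipping |fatal| halfway through a stream; moving the non-stream options onto the decoder object, as suggested above, would rule that out and leave only |stream| to be checked per call.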

This looks awesome!

A few comments:

* It appears that we lost the ability to measure how long a resulting
buffer was going to be and then encode into that buffer. I don't know
if this is an issue.
* It might be a performance problem to have to check for the
fatal/nullTerminator options on each call.
* We lost the ability to decode from an ArrayBuffer and see how many
bytes were consumed before a null terminator was hit. One not terribly
elegant solution would be to add a TextDecoder.decodeWithLength method
which returns a DOMString+length tuple.
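A rough TypeScript sketch of what the first and third points refer to. utf8EncodedLength measures how many bytes a string would occupy as UTF-8 (roughly what the dropped |encodedLength| offered), and decodeWithLength returns the decoded text together with the number of bytes read before the first 0x00 byte. All three functions are hypothetical; decodeWithLength is only the method name floated above, and its { text, bytesConsumed } return shape is an assumption. The simplified decoder replaces each byte that cannot start a valid sequence with one U+FFFD, which may differ from the draft's exact error handling.

```typescript
// Hypothetical helpers for illustration; not part of the draft API.

// Number of bytes `s` occupies as UTF-8. A lone surrogate counts as 3 bytes,
// the size of the U+FFFD an encoder would substitute for it.
function utf8EncodedLength(s: string): number {
  let len = 0;
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u < 0x80) len += 1;
    else if (u < 0x800) len += 2;
    else if (u >= 0xd800 && u <= 0xdbff && (s.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      len += 4; // surrogate pair: one supplementary code point
      i++;
    } else len += 3;
  }
  return len;
}

// Simplified UTF-8 decoder: a byte that cannot start a valid sequence
// becomes one U+FFFD and decoding resumes at the next byte.
function decodeUtf8(bytes: Uint8Array): string {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    const n = b < 0x80 ? 0 : b >= 0xc2 && b <= 0xdf ? 1 : b >= 0xe0 && b <= 0xef ? 2 : b >= 0xf0 && b <= 0xf4 ? 3 : -1;
    let cp = n < 0 ? 0 : b & [0x7f, 0x1f, 0x0f, 0x07][n];
    let ok = n >= 0 && i + n < bytes.length;
    for (let k = 1; ok && k <= n; k++) {
      const c = bytes[i + k];
      ok = (c & 0xc0) === 0x80; // must be a continuation byte
      cp = (cp << 6) | (c & 0x3f);
    }
    const min = [0, 0x80, 0x800, 0x10000][n < 0 ? 0 : n];
    if (ok && cp >= min && cp <= 0x10ffff && (cp < 0xd800 || cp > 0xdfff)) {
      out += String.fromCodePoint(cp);
      i += n + 1;
    } else {
      out += "\uFFFD";
      i += 1;
    }
  }
  return out;
}

// Decodes up to the first 0x00 byte and reports how many bytes preceded it
// (the whole buffer if there is no terminator). In UTF-8, 0x00 only ever
// encodes U+0000, so the search can run on raw bytes.
function decodeWithLength(bytes: Uint8Array): { text: string; bytesConsumed: number } {
  const nul = bytes.indexOf(0);
  const end = nul === -1 ? bytes.length : nul;
  return { text: decodeUtf8(bytes.subarray(0, end)), bytesConsumed: end };
}
```

A caller reading consecutive null-terminated strings from one buffer would resume at bytesConsumed + 1 whenever bytesConsumed < bytes.length, which is the information the combined DOMString+length result would preserve.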

/ Jonas


