[whatwg] API for encoding/decoding ArrayBuffers into text

Mon Mar 26 16:40:41 PDT 2012

On Mon, Mar 26, 2012 at 4:12 PM, Glenn Maynard <glenn at zewt.org> wrote:

> On Mon, Mar 26, 2012 at 4:49 PM, Joshua Bell <jsbell at chromium.org> wrote:
>
>> * A |stream| option, per the above
>>
>
> Does this make sense when you're using stream: false to flush the stream?
> It's still a streaming operation.  I guess it's "close enough".
>
>> * A |nullTerminator| option eliminates the need for a stringLength method
>> (hasta la vista, baby!)
>>
>
> I strongly disagree with this change.  It's much cleaner and more generic
> for the decoding algorithm to not know anything about null terminators, and
> to have separate general-purpose methods to determine the length of the
> string (memchr/wmemchr analogs, which we should have anyway).  We made this
> simplification a long time ago--why did you resurrect this?
>

Ah, I'd forgotten that there was consensus that doing this outside the API
was preferable. I'll remove the option when I touch the spec again.
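
For illustration, a minimal sketch of what "doing this outside the API"
could look like. The helper names byteStringLength and wideStringLength are
hypothetical memchr/wmemchr analogs, not part of the proposal, and the
TextDecoder used at the end is assumed to behave like the one available in
current browsers and Node.js.

```typescript
// Hypothetical memchr analog (not part of the proposed API): number of
// bytes from `start` up to, but not including, the first 0x00 byte. If
// there is no terminator, the rest of the buffer counts as the string.
function byteStringLength(bytes: Uint8Array, start: number = 0): number {
  const end = bytes.indexOf(0, start);
  return (end === -1 ? bytes.length : end) - start;
}

// Hypothetical wmemchr analog for 16-bit code units.
function wideStringLength(units: Uint16Array, start: number = 0): number {
  const end = units.indexOf(0, start);
  return (end === -1 ? units.length : end) - start;
}

// The caller measures first, then hands the decoder an exact view, so the
// decoder itself never needs to know about terminators.
const sampleBytes = new Uint8Array([0x68, 0x69, 0x00, 0x7a, 0x7a]); // "hi\0zz"
const sampleLength = byteStringLength(sampleBytes);
const sampleText = new TextDecoder("utf-8").decode(
  sampleBytes.subarray(0, sampleLength)
);
```

Because the length is computed by the caller, it stays available for
advancing through a larger buffer after the decode.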

>> * BOM handling needs to be resolved. The Encoding spec makes the encoding
>> label secondary to the BOM. With this API it's unclear if that should be
>> the case. Options include having a mismatching BOM throw, treating a
>> mismatching BOM as a decoding error (i.e. fallback or throw, depending on
>> options), or allow the BOM to actually switch the decoder used for this
>> "stream" - possibly if-and-only-if the default encoding was specified.
>>
>
> The path of fewest errors is probably to have a BOM override the specified
> UTF-16 endianness, so saying "UTF-16BE" just changes the default.
>

This would apply only if the previous call had {stream: false} (implicitly
or explicitly). Calling with {stream: false} would reset for the next call.
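
To make the stream/reset behavior concrete, here is a rough sketch. It
assumes a decoder shaped like the TextDecoder in current browsers and
Node.js, where decode(chunk, {stream: true}) carries partial sequences over
to the next call and a call without the stream option flushes and resets.
The helper name decodeChunks is made up for this example.

```typescript
// Hypothetical helper: decode a sequence of chunks as one logical stream.
function decodeChunks(chunks: Uint8Array[], label: string): string {
  const decoder = new TextDecoder(label);
  let out = "";
  for (let i = 0; i < chunks.length; i++) {
    // stream: true keeps any incomplete trailing sequence buffered.
    out += decoder.decode(chunks[i], { stream: true });
  }
  // No stream option means stream: false, which flushes whatever is
  // buffered and resets the decoder for the next stream.
  out += decoder.decode();
  return out;
}

// U+20AC (euro sign) is E2 82 AC in UTF-8; it is split across the chunks.
const euroSplitChunks = [
  new Uint8Array([0x68, 0xe2, 0x82]),
  new Uint8Array([0xac, 0x69]),
];
const euroJoined = decodeChunks(euroSplitChunks, "utf-8");
```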

Would it apply only to UTF-16 or UTF-8 as well? Should there be any special
behavior when not specifying an encoding in the constructor?
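
One possible shape for the "BOM overrides the label" option, sketched
outside the decoder for a single non-streaming call. This is only an
illustration of that one option, not a proposal for the spec text. The
names sniffBom and decodeWithBomOverride are hypothetical, and treating the
UTF-8 BOM the same way as the UTF-16 ones is an assumption, since that
question is still open above.

```typescript
// Hypothetical helper: return the encoding implied by a leading BOM, or
// null if the buffer does not start with one.
function sniffBom(bytes: Uint8Array): string | null {
  if (bytes.length >= 3 &&
      bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  return null;
}

// Hypothetical helper: the caller's label is only a default, and a BOM at
// the start of the data wins. TextDecoder strips a BOM that matches its
// own encoding by default, so the BOM does not appear in the output.
function decodeWithBomOverride(bytes: Uint8Array, defaultLabel: string): string {
  const label = sniffBom(bytes) || defaultLabel;
  return new TextDecoder(label).decode(bytes);
}

// Labelled big-endian, but the data carries a little-endian BOM.
const leBytes = new Uint8Array([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00]);
const leText = decodeWithBomOverride(leBytes, "utf-16be");
```

A streaming version would additionally have to hold back output until
enough bytes have arrived to rule a BOM in or out.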

On Mon, Mar 26, 2012 at 4:27 PM, Jonas Sicking <jonas at sicking.cc> wrote:

> A few comments:
>
> * It appears that we lost the ability to measure how long a resulting
> buffer was going to be and then decode into the buffer. I don't know
> if this is an issue.
>

True. On the plus side, the examples in the page (encode/decode
array-of-strings) didn't change size or IMHO readability at all.

> * It might be a performance problem to have to check for the
> fatal/nullTerminator options on each call.
>

No comment here. Moving the "fatal" and other options to the TextDecoder
object rather than the decode() call is a possibility. I'm not sure which I
prefer.
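
A small sketch of the constructor-level variant, assuming an interface like
the TextDecoder found in current browsers and Node.js, where "fatal" is
fixed when the object is created. The tryDecode wrapper is hypothetical and
exists only to make the two behaviors easy to compare.

```typescript
// The error mode is chosen once, at construction, so decode() has no
// per-call option to inspect.
const strictDecoder = new TextDecoder("utf-8", { fatal: true });
const lenientDecoder = new TextDecoder("utf-8");

// Hypothetical wrapper: returns the decoded text, or null if the decoder
// threw on malformed input.
function tryDecode(decoder: TextDecoder, bytes: Uint8Array): string | null {
  try {
    return decoder.decode(bytes);
  } catch (e) {
    return null;
  }
}

const malformedBytes = new Uint8Array([0x68, 0xff]); // 0xFF is never valid UTF-8
const strictResult = tryDecode(strictDecoder, malformedBytes);   // throws inside, so null
const lenientResult = tryDecode(lenientDecoder, malformedBytes); // "h" + U+FFFD
```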

> * We lost the ability to decode from an ArrayBuffer and see how many
> bytes were consumed before a null terminator was hit. One not terribly
> elegant solution would be to add a TextDecoder.decodeWithLength method
> which returns a DOMString+length tuple.

Agreed, but see above: there was consensus earlier in the thread that
searching for null terminators should be done outside the API, so the
caller will already have the length handy. Yes, this would be a big flaw,
since decoding a tightly packed data structure (e.g. an array of
null-terminated strings without lengths) would be impossible with just the
nullTerminator flag.
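
As a rough sketch of the packed case handled outside the API: the caller
searches for each terminator, decodes the exact slice, and advances by the
length it already knows. The function name decodePackedStrings is
hypothetical, the decoder is assumed to behave like the TextDecoder in
current browsers and Node.js, and the byte-wise search assumes an encoding
such as UTF-8 in which a 0x00 byte can only be a terminator.

```typescript
// Hypothetical helper: split a buffer of back-to-back null-terminated
// strings and decode each one.
function decodePackedStrings(bytes: Uint8Array, label: string): string[] {
  const decoder = new TextDecoder(label);
  const result: string[] = [];
  let offset = 0;
  while (offset < bytes.length) {
    // Find the terminator ourselves; a missing one means the final string
    // runs to the end of the buffer.
    let end = bytes.indexOf(0, offset);
    if (end === -1) {
      end = bytes.length;
    }
    result.push(decoder.decode(bytes.subarray(offset, end)));
    // The consumed length is known here without asking the decoder.
    offset = end + 1;
  }
  return result;
}

// "ab\0", an empty string "\0", then "c\0".
const packedBytes = new Uint8Array([0x61, 0x62, 0x00, 0x00, 0x63, 0x00]);
const packedStrings = decodePackedStrings(packedBytes, "utf-8");
```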