[whatwg] API for encoding/decoding ArrayBuffers into text

Kenneth Russell kbr at google.com
Tue Mar 27 17:12:57 PDT 2012

On Mon, Mar 26, 2012 at 10:28 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> On Mon, Mar 26, 2012 at 6:11 PM, Kenneth Russell <kbr at google.com> wrote:
>> On Mon, Mar 26, 2012 at 5:33 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>>> On Mon, Mar 26, 2012 at 4:40 PM, Joshua Bell <jsbell at chromium.org> wrote:
>>>>> * We lost the ability to decode from an ArrayBuffer and see how many
>>>>> bytes were consumed before a null terminator was hit. One not terribly
>>>>> elegant solution would be to add a TextDecoder.decodeWithLength method
>>>>> which returns a DOMString+length tuple.
>>>> Agreed, but of course see above - there was consensus earlier in the thread
>>>> that searching for null terminators should be done outside the API,
>>>> therefore the caller will have the length handy already. Yes, this would be
>>>> a big flaw since decoding a tightly packed data structure (e.g. array of
>>>> null terminated strings w/o length) would be impossible with just the
>>>> nullTerminator flag.
>>> Requiring callers to find the null character first, and then use that
>>> will require one additional pass over the encoded binary data though.
>>> Also, if we put the API for finding the null character on the Decoder
>>> object it doesn't seem like we're creating an API which is easier to
>>> use, just one that has moved some of the logic from the API to every
>>> caller.
>>> Though I guess the best solution would be to add methods to DataView
>>> which allow consuming an ArrayBuffer up to a null-terminated point
>>> and return the decoded string. Potentially such a method could take a
>>> Decoder object as argument.
>> The rationale for specifying the string encoding and decoding
>> functionality outside the typed array specification is to keep the
>> typed array spec small and easily implementable. The indexed property
>> getters and setters on the typed array views, and methods on DataView,
>> are designed to be implementable with a small amount of assembly code
>> in JavaScript engines. I'd strongly prefer to continue to design the
>> encoding/decoding functionality separately from the typed array views.
> Is there a reason you couldn't keep the current set of functions on
> DataView implemented using a small amount of assembly code, and let
> the new functions fall back to slower C++ functions?

That's possible.

Another motivation for keeping encoding/decoding functionality
separate is that it is likely that it will require a lot of spec text,
which would dramatically increase the size of the typed array spec.
Perhaps once all of the details have been hammered out on this thread
it will be more obvious whether these methods would be much clearer if
added directly to DataView.

A couple of comments on the current StringEncoding proposal:

  - I think it should reference DataView directly rather than
ArrayBufferView. The typed array spec was specifically designed with
two use cases in mind: in-memory assembly of data to be sent to the
graphics card or audio device, where the byte order must be that of
the host architecture; and assembly of data for network transmission,
where the byte order needs to be explicit. DataView covers the latter
use case.

  - It would be preferable if the encoding API had a way to avoid
memory allocation, for example to encode into a passed-in DataView.
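
To make the allocation point concrete, here is a minimal sketch of
encoding into caller-provided storage. It is not part of the
StringEncoding proposal under discussion: it assumes a runtime that has
the encodeInto method which later shipped on TextEncoder, and the helper
name encodeIntoView is hypothetical.

```javascript
// One encoder instance is reused, so repeated calls do not allocate a new
// encoder either.
const utf8Encoder = new TextEncoder();

// Hypothetical helper: encode `text` as UTF-8 into the bytes covered by an
// existing DataView, starting at `byteOffset` within that view.
function encodeIntoView(text, view, byteOffset = 0) {
  // Wrap the same underlying storage. This creates a small view object but
  // copies no bytes and allocates no new buffer for the output.
  const target = new Uint8Array(
    view.buffer,
    view.byteOffset + byteOffset,
    view.byteLength - byteOffset
  );
  // encodeInto returns { read, written }: UTF-16 code units consumed and
  // bytes produced. It stops early if the target runs out of room.
  return utf8Encoder.encodeInto(text, target);
}

const scratch = new DataView(new ArrayBuffer(16));
const result = encodeIntoView("h\u00e9llo", scratch, 4);
```

The returned counts let the caller detect truncation and advance its own
write position, which is the information a fixed-size target requires.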

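On the first comment, the difference between the two use cases can be
seen in a short sketch. It uses only the existing typed array and
DataView methods; the variable names are illustrative.

```javascript
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);

// Network-style assembly: the byte order is stated explicitly.
// `false` selects big-endian, so the layout is the same on every host.
view.setUint32(0, 0x0a0b0c0d, false);
const bytes = Array.from(new Uint8Array(buffer));

// Reading back through DataView with each explicit byte order.
const bigEndian = view.getUint32(0, false);
const littleEndian = view.getUint32(0, true);

// Device-style access: a typed array view always uses the host's byte
// order, so the value seen here depends on the machine running the code.
const hostOrder = new Uint32Array(buffer)[0];
const hostIsLittleEndian = hostOrder === littleEndian;
```

Bytes produced by an encoder have no byte order of their own for UTF-8,
but an API built on DataView keeps multi-byte encodings such as UTF-16
on the explicit-order path.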
