[whatwg] API for encoding/decoding ArrayBuffers into text

Wed Apr 4 18:05:44 PDT 2012

On Wed, Apr 4, 2012 at 11:09 AM, Joshua Bell <jsbell at chromium.org> wrote:

> Any further input on Kenneth's suggestions?
>

I largely disagree with those suggestions, because I don't believe they
align with the natural, intuitive usage of the API.

> Re: ArrayBufferView vs. DataView - I'm tempted to make the switch to just
> DataView. As discussed below, data parsing/serialization operations will
> tend to be associated with DataViews.

I disagree.  TypedArrays are much more natural for processing arrays, since
they can be accessed just like regular JavaScript arrays; code generally
doesn't have to care whether it's been given a JavaScript array or a
TypedArray.  With a DataView, every element access has to be rewritten as
a getter or setter call (getInt16(), setUint8(), and so on).
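To make the contrast concrete, here is a minimal TypeScript sketch (the function names are illustrative only): one generic loop accepts both a plain array and a TypedArray, while the DataView version needs its own loop with byte offsets, a typed getter and an endianness flag.

```typescript
// Works on plain arrays and every TypedArray alike: both are indexable
// by number and have a length.
function sum(values: ArrayLike<number>): number {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// The same loop over a DataView has to be rewritten with explicit byte
// offsets, a typed getter, and an endianness flag.
function sumInt16LE(view: DataView): number {
  let total = 0;
  for (let offset = 0; offset + 2 <= view.byteLength; offset += 2) {
    total += view.getInt16(offset, /* littleEndian */ true);
  }
  return total;
}

// Three little-endian int16 values (1, 2, 3) written through a DataView.
const view16 = new DataView(new ArrayBuffer(6));
[1, 2, 3].forEach((v, i) => view16.setInt16(i * 2, v, true));

console.log(sum([1, 2, 3]), sum(new Int16Array([1, 2, 3]))); // 6 6
console.log(sumInt16LE(view16)); // 6
```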

> As Glenn has mentioned elsewhere
> recently, it is possible to accidentally do a buffer copy when mis-using
> typed array constructors, while DataView avoids this.

That should be fixed, rather than used as an argument against TypedArray
classes where they make sense.

This can be fixed by adding a TypedArray(TypedArray, byteOffset, length)
constructor, which creates a new shallow view from an existing view; this
would be logically grouped with the similar TypedArray(ArrayBuffer,
byteOffset, length) constructor.  Unfortunately, the offset parameter would
have to be required, so that this overload can be distinguished from the
existing TypedArray(TypedArray) constructor.  (A cleaner design would have
been a separate copy() function to create an explicit copy, but it's most
likely too late to remove the TypedArray(TypedArray) ctor.)
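The pitfall can be sketched with standard TypedArray behavior alone: handing a view to the TypedArray(TypedArray) constructor copies its elements, while a shallow view must be built from the underlying buffer, with the source view's byteOffset carried along by hand. (subarray() also yields a shallow view, but only of the same element type.)

```typescript
const source = new Uint8Array([10, 20, 30, 40, 50]);
const middle = source.subarray(1, 4); // shallow view over 20, 30, 40

// TypedArray(TypedArray): allocates a new buffer and copies the elements.
const copied = new Uint8Array(middle);

// TypedArray(ArrayBuffer, byteOffset, length): a shallow view, but the
// caller must remember to pass the source view's byteOffset.
const aliased = new Uint8Array(middle.buffer, middle.byteOffset, middle.length);

source[2] = 99; // write through the original view
console.log(copied[1], aliased[1]); // 30 99
```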

As (another) aside, all of the TypedArray constructors should be available
on DataView, too, so they exist on all ArrayBufferView subtypes.

> DataViews are cheap to construct, and when I'm writing sample code for the
> proposed API I find I create throw-away DataViews anyway.

Array views are cheap to construct, too.

Having APIs return DataViews feels unnatural; it's a helper class that isn't
returned by anything else.  If encode() doesn't return a view of a specific,
contextually meaningful type (e.g. Int16LEArray for UTF-16LE), then returning
the ArrayBuffer itself seems preferable, as XHR2 does.  Let's not split APIs,
with some returning DataView and some ArrayBuffer.
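A sketch of why a bare ArrayBuffer is a convenient return type: the caller wraps it in whichever view fits the task. The Latin-1 encoder here is a hypothetical stand-in chosen for brevity, not the proposed encode().

```typescript
// Hypothetical encoder: one byte per code unit below 0x100 (Latin-1),
// returning the raw ArrayBuffer rather than any particular view.
function encodeLatin1(text: string): ArrayBuffer {
  const buffer = new ArrayBuffer(text.length);
  const bytes = new Uint8Array(buffer);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) throw new RangeError(`not Latin-1 at index ${i}`);
    bytes[i] = code;
  }
  return buffer;
}

// The caller picks the view.
const buf = encodeLatin1("Hi!");
const asBytes = new Uint8Array(buf); // indexable like an array
const asData = new DataView(buf);    // explicit getters
console.log(asBytes[0], asData.getUint8(2)); // 72 33
```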

> Also, there is the potential for
> confusion when using a non-Uint8Array buffer, e.g. are the elements being
> decoded using array[N] as the octets or using the underlying buffer? For
> Uint16Array/UTF-16 encodings, what are the endianness concerns?

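The underlying bytes are always decoded according to the specified
encoding, whatever the view's element type; for UTF-16, the encoding
(UTF-16LE or UTF-16BE) determines the byte order.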

It wouldn't make sense for decode() to take only a DataView.  If I have an
Int8Array, it's busywork to make me construct a DataView from it just so I
can pass it to decode().  Just take an ArrayBufferView, so decode() doesn't
care what the particular view type is.
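Accepting any ArrayBufferView costs the implementation one line: every view exposes buffer, byteOffset and byteLength, which is enough to see the bytes it covers. In the sketch below, bytesOf() and the ASCII-only decodeAscii() are hypothetical helpers for illustration. (As far as I know, the TextDecoder.decode() that was eventually specified in the Encoding Standard does accept an ArrayBuffer or any ArrayBufferView.)

```typescript
// Reinterpret any view (Int8Array, Uint16Array, DataView, ...) as the
// bytes it covers, without copying.
function bytesOf(view: ArrayBufferView): Uint8Array {
  return new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
}

// Hypothetical ASCII-only decode() that does not care about the view type.
function decodeAscii(view: ArrayBufferView): string {
  const bytes = bytesOf(view);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] > 0x7f) throw new RangeError(`non-ASCII byte at ${i}`);
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

const int8 = new Int8Array([72, 105]);
const dv = new DataView(int8.buffer);
console.log(decodeAscii(int8), decodeAscii(dv)); // Hi Hi
```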

> DataView APIs have an explicit endianness and no index getter, which
> alleviates this somewhat.
>

Ideally, endian-explicit TypedArrays should be created, e.g. Int16LEArray
and Int16BEArray.  I mentioned this in the other thread; the big-endian
types seem important to have anyway (regardless of the encoding API), and
the little-endian types would just let us pretend the "native endian"
issue isn't there.
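The native-endian problem can be seen in a few lines: a Uint16Array reads bytes in host order, while a DataView getter with an explicit flag does not. readUint16LE() below is a hypothetical stand-in for an endian-explicit view, not the proposed type; it reads unsigned units because UTF-16 code units are unsigned.

```typescript
// True when the host stores multi-byte values with the low byte first.
function hostIsLittleEndian(): boolean {
  return new Uint8Array(new Uint16Array([0x0102]).buffer)[0] === 0x02;
}

// Hypothetical stand-in for an endian-explicit 16-bit view: reads
// little-endian units regardless of the host's byte order.
function readUint16LE(bytes: Uint8Array): number[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const units: number[] = [];
  for (let off = 0; off + 2 <= view.byteLength; off += 2) {
    units.push(view.getUint16(off, /* littleEndian */ true));
  }
  return units;
}

// "Hi" encoded as UTF-16LE: 0x48 0x00 0x69 0x00.
const utf16le = new Uint8Array([0x48, 0x00, 0x69, 0x00]);
console.log(String.fromCharCode(...readUint16LE(utf16le))); // Hi

// A native-endian Uint16Array over the same bytes reads 0x0048 only on a
// little-endian host.
const native = new Uint16Array(utf16le.buffer);
console.log(hostIsLittleEndian(), native[0].toString(16));
```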

> Also, I am planning to move the "fatal" option from the encode/decode
> methods to the TextEncoder/TextDecoder constructors. Objections?

I don't have a strong feeling either way.  Can you think of any cases where
the encoder/decoder object would be handed off from one user to another,
who might want different behavior?  It seems unlikely.
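For comparison, a sketch of the constructor-option shape: the error mode is fixed when the object is created, so whoever receives the decoder gets that behavior. AsciiDecoder is hypothetical and ASCII-only for brevity; the TextDecoder that eventually shipped takes fatal in its constructor options, e.g. new TextDecoder("utf-8", { fatal: true }).

```typescript
// Hypothetical options bag mirroring a constructor-level "fatal" flag.
interface DecoderOptions {
  fatal?: boolean;
}

class AsciiDecoder {
  readonly fatal: boolean;

  constructor(options: DecoderOptions = {}) {
    this.fatal = options.fatal ?? false;
  }

  decode(bytes: Uint8Array): string {
    let out = "";
    for (let i = 0; i < bytes.length; i++) {
      const b = bytes[i];
      if (b <= 0x7f) {
        out += String.fromCharCode(b);
      } else if (this.fatal) {
        throw new TypeError(`invalid byte 0x${b.toString(16)} at ${i}`);
      } else {
        out += "\uFFFD"; // U+FFFD REPLACEMENT CHARACTER
      }
    }
    return out;
  }
}

const bad = new Uint8Array([0x4f, 0x4b, 0xff]);
console.log(new AsciiDecoder().decode(bad)); // "OK" followed by U+FFFD
```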

-- 
Glenn Maynard