[whatwg] Endianness of typed arrays

Wed Mar 28 14:53:49 PDT 2012

On Wed, Mar 28, 2012 at 2:27 PM, Brandon Jones <tojiro at gmail.com> wrote:
> I was initially on the "just make it little endian and don't make me worry
> about it" side of the fence, but on further careful consideration I've
> changed my mind: I think having typed arrays use the platform endianness is
> the right call.
>
> As Ken pointed out, if you are populating your arrays from JavaScript or a
> JSON file or something similar this is a non-issue. The problem only occurs
> when you are attempting to load a binary blob directly into a typed array.
> Unless that blob is entirely homogeneous (i.e. all Float32s or all Int16s,
> etc.) it's impossible to trivially swap endianness without being provided a
> detailed breakdown of the data patterns contained within the blob.
>
> Consider this example (using WebGL, but the same could apply elsewhere): I
> download a binary file containing tightly packed interleaved vertices that I
> want to pass directly to a WebGL buffer. The data contains little-endian
> vertex positions, texture coordinates, texture IDs and a 32-bit color per
> vertex, so the data looks something like this:
>
> struct {
>     Float32[3] pos,
>     Float32[4] uv,
>     Uint16 textureId,
>     Uint32 color
> };
>
> I will receive this data from XHR as an opaque ArrayBuffer, and if the
> platform is little-endian I can pass it directly to the GPU. But on
> big-endian systems, a translation needs to be done somewhere:
>
> xhr.responseType = "arraybuffer";
> xhr.onload = function() {
>     var vertBuffer = gl.createBuffer();
>     gl.bindBuffer(gl.ARRAY_BUFFER, vertBuffer);
>
>     // If bigEndian then... magic!
>
>     gl.bufferData(gl.ARRAY_BUFFER, this.response, gl.STATIC_DRAW);
> }
>
> So the question is: What exactly are we expecting that "magic" to be? We
> can't just swizzle every 4 bytes. Either the graphics driver must do the
> endian swap as it processes the buffer, which is possible but entirely out
> of the browser's control, or we would have to provide data packing
> information to the browser so that it could do the appropriate swap for us.
> And if I'm going to have to build up a data definition and pass that through
> to the browser anyway... well I've just destroyed the whole "don't make me
> care about endianness" ideal, haven't I? I might as well just do the swap in
> my own code via a DataView, or better yet cache a big endian version of the
> same file on the server side if I'm worried about performance.

I would suggest that you pass the schema of the data down to the
client application along with the raw binary file, and always walk
the file with a DataView, reading each individual value and storing
it into one of several typed array views of a new ArrayBuffer. Then
upload the new ArrayBuffer to WebGL. This way, if you get the code
working on one platform, you are guaranteed that it will work on all
platforms.
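
A minimal sketch of that approach, applied to the vertex layout quoted
above. The schema format, the names VERTEX_SCHEMA, TYPES and
decodeVertices, and the planar (one section per field) output layout
are all assumptions made for illustration; they are not part of any
spec or of the code under discussion.

```javascript
// Hypothetical schema describing one vertex of the quoted struct.
const VERTEX_SCHEMA = [
  { name: "pos",       type: "Float32", count: 3 },
  { name: "uv",        type: "Float32", count: 4 },
  { name: "textureId", type: "Uint16",  count: 1 },
  { name: "color",     type: "Uint32",  count: 1 },
];

// Element size, DataView getter name and typed array view for each type.
const TYPES = {
  Float32: { size: 4, get: "getFloat32", View: Float32Array },
  Uint16:  { size: 2, get: "getUint16",  View: Uint16Array },
  Uint32:  { size: 4, get: "getUint32",  View: Uint32Array },
};

// Reads tightly packed vertices from `source` (an ArrayBuffer whose byte
// order is given by `littleEndian`) and returns a new ArrayBuffer holding
// the same values in the platform's native byte order.
function decodeVertices(source, schema, littleEndian) {
  const stride = schema.reduce((n, f) => n + TYPES[f.type].size * f.count, 0);
  const vertexCount = Math.floor(source.byteLength / stride);
  const input = new DataView(source);
  const output = new ArrayBuffer(vertexCount * stride);

  // One section per field, widest element type first, so that every
  // typed array view starts at a properly aligned byte offset.
  const ordered = [...schema].sort(
    (a, b) => TYPES[b.type].size - TYPES[a.type].size);
  const views = {};
  let outOffset = 0;
  for (const field of ordered) {
    const t = TYPES[field.type];
    views[field.name] =
      new t.View(output, outOffset, vertexCount * field.count);
    outOffset += vertexCount * field.count * t.size;
  }

  // DataView reads are byte-order explicit and may be unaligned; typed
  // array writes always use the platform byte order.
  let inOffset = 0;
  for (let v = 0; v < vertexCount; v++) {
    for (const field of schema) {
      const t = TYPES[field.type];
      for (let i = 0; i < field.count; i++) {
        views[field.name][v * field.count + i] =
          input[t.get](inOffset, littleEndian);
        inOffset += t.size;
      }
    }
  }
  return { buffer: output, views, vertexCount };
}
```

The returned buffer could then be handed to gl.bufferData in place of
this.response. Because this sketch stores each field in its own
section rather than interleaved, the offsets and strides given to
vertexAttribPointer would have to describe that layout.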

As one simple concrete example, please look at
http://code.google.com/p/webglsamples/source/browse/hdr/hdr.js#235 .
This demo downloads high dynamic range textures as binary files
containing floating-point values. The data is copied from the XHR's
ArrayBuffer using a DataView, knowing that the source data is in
little-endian format, and stored into a Float32Array for upload to
WebGL. This code works identically on big-endian and little-endian
architectures.
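
For the homogeneous case, the pattern can be sketched roughly as
follows. This is not the code from hdr.js; the function name
readFloat32LE is made up for illustration, and the only assumption
about the input is that it holds packed little-endian 32-bit floats.

```javascript
// Copies little-endian float data out of an ArrayBuffer into a
// Float32Array, which stores values in the platform's native byte order.
function readFloat32LE(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const count = Math.floor(arrayBuffer.byteLength / 4);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    // The second argument `true` selects little-endian decoding,
    // regardless of the byte order of the machine running the code.
    out[i] = view.getFloat32(i * 4, true);
  }
  return out;
}
```

Inside an XHR onload handler this would be called as
readFloat32LE(this.response), and the resulting Float32Array passed on
to WebGL.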

> So yeah, it sucks that we have to plan for devices that are practically
> nonexistent and difficult to test for, but I don't really see a nicer
> (practical) solution.
>
> That said, one thing that DataView doesn't handle too nicely right now is
> arrays. You're basically stuck for-looping over your data, even if it's all
> the same type. I would fully support having new DataView methods available
> like:
>
> Int32Array getInt32Array(unsigned long byteOffset, unsigned long elements,
> optional boolean littleEndian);
>
> Which would be a nice, sensible optimization since I'm pretty sure the
> browser backend could do that faster than a JS loop.

Definitely agree that adding array readers and writers to DataView is
worth considering; it's even mentioned in the typed array spec at
http://www.khronos.org/registry/typedarray/specs/latest/#11 . I would
however like to work on optimizing DataView's single-element accessors
first so that we could do a good measurement of the potential speedup.
Right now DataView is completely unoptimized in WebKit's
implementation, but the typed array views have had the benefit of
months of optimization work in both the JavaScriptCore and V8 engines.
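
For reference, the behavior of the proposed reader can be approximated
today with a plain loop over the single-element accessor. This is only
a sketch: DataView has no getInt32Array method, and the standalone
helper name readInt32Array is hypothetical. A loop like this would
also be the baseline that any native array reader is measured against.

```javascript
// Reads `elements` 32-bit integers starting at `byteOffset` from a
// DataView. The default for `littleEndian` mirrors DataView's own
// default, which is big-endian.
function readInt32Array(view, byteOffset, elements, littleEndian = false) {
  const out = new Int32Array(elements);
  for (let i = 0; i < elements; i++) {
    out[i] = view.getInt32(byteOffset + i * 4, littleEndian);
  }
  return out;
}
```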

-Ken

> --Brandon
>
> On Wed, Mar 28, 2012 at 1:39 PM, Kenneth Russell <kbr at google.com> wrote:
>>
>> On Wed, Mar 28, 2012 at 12:34 PM, Benoit Jacob <bjacob at mozilla.com> wrote:
>> > Before I joined this mailing list, Boris Zbarsky wrote:
>> >> C)  Try to guess based on where the array buffer came from and have
>> >> different behavior for different array buffers.  With enough luck (or
>> >> good enough heuristics), would make at least some WebGL work, while also
>> >> making non-WebGL things loaded over XHR work.
>> >
>> > FWIW, here is a way to do this that will always work and won't rely on
>> > "luck". The key idea is that by the time one draws stuff, all the
>> > information about how vertex attributes use buffer data must be known.
>> >
>> > 1. In webgl.bufferData implementation, don't call glBufferData, instead
>> > just cache the buffer data.
>> >
>> > 2. In webgl.vertexAttribPointer, record the attributes structure (their
>> > types, how they use buffer data). Do not convert/upload buffers yet.
>> >
>> > 3. In the first WebGL draw call (like webgl.drawArrays) since the last
>> > bufferData/vertexAttribPointer call, do the conversion of buffers and the
>> > glBufferData calls. Use some heuristics to drop the buffer data cache, as
>> > most WebGL apps will not have a use for it anymore.
>>
>> It would never be possible to drop the CPU side buffer data cache. A
>> subsequent draw call may set up the vertex attribute pointers
>> differently for the same buffer object, which would necessitate going
>> back through the buffer's data and generating new, appropriately
>> byte-swapped data for the GPU.
>>
>> >> In practice, if forced to implement a UA on a big-endian system today, I
>> >> would likely pick option (C)....  I wouldn't classify that as a victory
>> >> for standardization, but I'm also not sure what we can do at this point
>> >> to fix the brokenness.
>> >
>> > I agree that seems to be the only way to support universal webgl content
>> > on big-endian UAs. It's not great due to the memory overhead, but at least
>> > it shouldn't incur a significant performance overhead, and it typically only
>> > incurs a temporary memory overhead as we should be able to drop the buffer
>> > data caches quickly in most cases. Also, buffers are typically 10x smaller
>> > than textures, so the memory overhead would typically be ~ 10% in corner
>> > cases where we couldn't drop the caches.
>>
>> Our emails certainly crossed, but please refer to my other email.
>> WebGL applications that assemble vertex data for the GPU using typed
>> arrays will already work correctly on big-endian architectures. This
>> was a key consideration when these APIs were being designed. The
>> problems occur when binary data is loaded via XHR and uploaded to
>> WebGL directly. DataView is supposed to be used in such cases to load
>> the binary data, because the endianness of the file format must
>> necessarily be known.
>>
>> The possibility of forcing little-endian semantics was considered when
>> typed arrays were originally being designed. I don't have absolute
>> performance numbers to quote you, but based on previous experience
>> with Java's NIO Buffer classes, I am positive that the performance
>> impact for WebGL applications on big-endian architectures would be
>> very large. It would prevent applications which manipulate vertices in
>> JavaScript from running acceptably on big-endian machines.
>>
>> -Ken
>>
>> > In conclusion: WebGL is not the worst here, there is a pretty reasonable
>> > avenue for big-endian UAs to implement it in a way that allows running the
>> > same unmodified content as little-endian UAs.
>> >
>> > Benoit
>
>