[whatwg] API for encoding/decoding ArrayBuffers into text
glenn at zewt.org
Thu Mar 15 17:20:26 PDT 2012
On Thu, Mar 15, 2012 at 6:51 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> What's the use-case for the "stringLength" function? You can't decode
> into an existing datastructure anyway, so you're ultimately forced to
> call "decode" at which point the "stringLength" function hasn't helped
stringLength doesn't return the length of the decoded string. It returns
the byte offset of the first \0 (or the length of the whole buffer, if
none), for decoding null-terminated strings. For multibyte encodings (e.g.
everything except UTF-16 and friends), it's just memchr(), so it's much
faster than actually decoding the string.
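To make the memchr() point concrete, here is a minimal sketch of what a stringLength() along these lines might do. The function name comes from the proposal. The signature (an ArrayBuffer plus an encoding label) and the UTF-16 handling are assumptions for illustration, not proposed spec text:

```javascript
// Hypothetical sketch of stringLength(): returns the byte offset of the first
// NUL terminator, or the full byte length if there is none. No decoding happens.
function stringLength(buffer, encoding) {
  const bytes = new Uint8Array(buffer);
  const enc = encoding.toLowerCase();
  if (enc === "utf-16le" || enc === "utf-16be") {
    // 16-bit encodings: the terminator is a zero code unit (two 0x00 bytes at
    // an even offset). A plain byte search could stop inside a character
    // such as U+0100, whose UTF-16LE bytes are 0x00 0x01.
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      if (bytes[i] === 0 && bytes[i + 1] === 0) return i;
    }
    return bytes.length;
  }
  // UTF-8 and other ASCII-compatible encodings: a 0x00 byte only ever encodes
  // U+0000, so this is just memchr().
  const idx = bytes.indexOf(0);
  return idx === -1 ? bytes.length : idx;
}
```

The caller would then decode only the first stringLength(buf, "utf-8") bytes, for example by passing `new Uint8Array(buf, 0, n)` to a decoder.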
> Currently the use-case of simply wanting to convert a string to a
> binary buffer is a bit cumbersome. You first have to call the
> "encodedLength" function, then allocate a buffer of the right size,
> then call the "encode" function.
I suggested, e.g.:
result = encode("string", "utf-8", null).output;
which would create an ArrayBuffer of the required size. Presumably the
null ArrayBufferView argument would be optional, so you could just say
> It doesn't seem possible to implement the 'encode' function without
> doing multiple scans over the string. The implementation seems
> required both to check that the data can be decoded using the
> specified encoding, as well as check that the data will fit in the
> passed in buffer. Only then can the implementation start decoding the
> data. This seems problematic.
Only if it guarantees that it doesn't write anything to the output buffer
unless the entire result will fit. I don't think we need to do that; just
guarantee that it'll be truncated on a whole codepoint.
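Truncating on a whole code point needs only a single pass when the caller supplies the buffer. The sketch below illustrates this for UTF-8 only. encodeUtf8, utf8Length and nextCodePoint are hypothetical names, and returning a Uint8Array with read/written counts is an assumption rather than the proposal's actual return type. It also replaces lone surrogates with U+FFFD instead of throwing, and allocates an exactly sized buffer when none is passed:

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one code point.
function utf8Length(cp) {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

// Hypothetical helper: read one code point at index i, mapping a lone
// surrogate to U+FFFD. Returns [codePoint, utf16UnitsConsumed].
function nextCodePoint(str, i) {
  const cp = str.codePointAt(i);
  if (cp > 0xffff) return [cp, 2];
  if (cp >= 0xd800 && cp <= 0xdfff) return [0xfffd, 1];
  return [cp, 1];
}

// Hypothetical sketch of encode(), UTF-8 only. Given an output buffer it makes
// one pass and stops before any code point that would not fit, so the result
// is always truncated on a whole code point. Given no buffer, it first sizes
// one (an extra scan, needed only in that case).
function encodeUtf8(str, output = null) {
  if (output === null) {
    let size = 0;
    for (let i = 0; i < str.length; ) {
      const [cp, units] = nextCodePoint(str, i);
      size += utf8Length(cp);
      i += units;
    }
    output = new Uint8Array(size);
  }
  let read = 0;    // UTF-16 code units consumed from str
  let written = 0; // bytes written to output
  while (read < str.length) {
    const [cp, units] = nextCodePoint(str, read);
    const n = utf8Length(cp);
    if (written + n > output.length) break; // never split a code point
    if (n === 1) {
      output[written] = cp;
    } else if (n === 2) {
      output[written] = 0xc0 | (cp >> 6);
      output[written + 1] = 0x80 | (cp & 0x3f);
    } else if (n === 3) {
      output[written] = 0xe0 | (cp >> 12);
      output[written + 1] = 0x80 | ((cp >> 6) & 0x3f);
      output[written + 2] = 0x80 | (cp & 0x3f);
    } else {
      output[written] = 0xf0 | (cp >> 18);
      output[written + 1] = 0x80 | ((cp >> 12) & 0x3f);
      output[written + 2] = 0x80 | ((cp >> 6) & 0x3f);
      output[written + 3] = 0x80 | (cp & 0x3f);
    }
    written += n;
    read += units;
  }
  return { output, read, written };
}
```

Called as encodeUtf8("string").output, this plays the role of the suggested encode("string", "utf-8", null).output, and the underlying ArrayBuffer is available as .output.buffer. TextEncoder.encodeInto() in the later WHATWG Encoding Standard has a similar contract: it writes as much as fits without splitting a code point and reports read and written counts.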
> I also don't think it's a good idea to throw an exception for encoding
> errors. Better to convert characters to the unicode replacement
> character. I believe we made a similar change to the WebSockets
> specification recently.
Was that change made? I filed
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16157, but it still seems to
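The two behaviors under discussion can be compared using TextDecoder from the WHATWG Encoding Standard. That API postdates this thread and is not the one proposed here. Its default mode substitutes U+FFFD for malformed input, and throwing is opt-in via the fatal option:

```javascript
// 0xFF can never appear in well-formed UTF-8.
const bytes = new Uint8Array([0x68, 0x69, 0xff]);

// Default (non-fatal) mode: the bad byte becomes U+FFFD REPLACEMENT CHARACTER.
const lenient = new TextDecoder("utf-8").decode(bytes);
console.log(lenient === "hi\ufffd"); // true

// fatal: true opts into throwing a TypeError instead.
let threw = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bytes);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```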