[whatwg] API for encoding/decoding ArrayBuffers into text
Glenn Maynard
glenn at zewt.org
Thu Mar 15 17:20:26 PDT 2012
On Thu, Mar 15, 2012 at 6:51 PM, Jonas Sicking <jonas at sicking.cc> wrote:
> What's the use-case for the "stringLength" function? You can't decode
> into an existing datastructure anyway, so you're ultimately forced to
> call "decode" at which point the "stringLength" function hasn't helped
> you.
>
stringLength doesn't return the length of the decoded string. It returns
the byte offset of the first \0 (or the length of the whole buffer, if
there is none), for decoding null-terminated strings. For multibyte encodings
(e.g. everything except UTF-16 and friends), it's just memchr(), so it's much
faster than actually decoding the string.
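As a rough illustration, here is a hypothetical `stringLength(view, encoding)` in JavaScript. It assumes the argument is an ArrayBufferView and that UTF-16 is the only case needing special handling. For byte-oriented encodings it is a single byte scan (the memchr() case) and nothing is decoded. UTF-16 needs a 2-byte-aligned pair of zero bytes instead, because a plain byte scan would stop inside the first ASCII character.

```javascript
// Hypothetical stringLength(): byte offset of the first NUL terminator,
// or the view's full byte length if there is none. No decoding happens.
function stringLength(view, encoding = "utf-8") {
  const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  const enc = encoding.toLowerCase();
  if (enc === "utf-16le" || enc === "utf-16be") {
    // UTF-16 terminator: two zero bytes on a 2-byte boundary.
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      if (bytes[i] === 0 && bytes[i + 1] === 0) return i;
    }
    return bytes.length;
  }
  // Byte-oriented encodings: a plain search for a single 0x00 byte.
  const nul = bytes.indexOf(0);
  return nul === -1 ? bytes.length : nul;
}

console.log(stringLength(new Uint8Array([0x68, 0x69, 0x00, 0x78]))); // 2

// "hi\0" in UTF-16LE.
const utf16 = new Uint8Array([0x68, 0x00, 0x69, 0x00, 0x00, 0x00]);
console.log(stringLength(utf16, "utf-16le")); // 4
console.log(stringLength(utf16));             // 1 (byte scan stops inside "h")
```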
> Currently the use-case of simply wanting to convert a string to a
> binary buffer is a bit cumbersome. You first have to call the
> "encodedLength" function, then allocate a buffer of the right size,
> then call the "encode" function.
I suggested, e.g.:
result = encode("string", "utf-8", null).output;
which would create an ArrayBuffer of the required size. Presumably the
null ArrayBufferView argument would be optional, so you could just say
encode("string", "utf-8").
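Here is a minimal sketch of the suggested call shape. It assumes a hypothetical `encode(string, encoding, output)` that returns an object with an `output` property. The extra `written` byte count is this sketch's own assumption. Only UTF-8 is supported, and the work is delegated to the standard `TextEncoder`. If `output` is null or omitted, the function allocates an exactly-sized ArrayBuffer. If a view is passed, it fills as much of that view as fits.

```javascript
// Hypothetical encode(string, encoding, output); UTF-8 only in this sketch.
function encode(string, encoding = "utf-8", output = null) {
  if (encoding.toLowerCase() !== "utf-8") {
    throw new RangeError("this sketch only supports utf-8");
  }
  const encoder = new TextEncoder();
  if (output === null || output === undefined) {
    // No destination: allocate. Copy into an exactly-sized ArrayBuffer in
    // case the returned view does not cover its whole backing buffer.
    const bytes = encoder.encode(string);
    const buffer = bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
    return { output: buffer, written: bytes.byteLength };
  }
  // Destination given: write as much as fits (encodeInto stops on a whole code point).
  const dest = new Uint8Array(output.buffer, output.byteOffset, output.byteLength);
  const { written } = encoder.encodeInto(string, dest);
  return { output, written };
}

const result = encode("string", "utf-8", null).output;
console.log(result.byteLength); // 6
console.log(encode("h\u00E9llo", "utf-8", new Uint8Array(2)).written); // 1
```

Allocating only when no destination is given keeps the common "just give me the bytes" case to a single call. Callers who want to reuse buffers can still pass one.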
> It doesn't seem possible to implement the 'encode' function without
> doing multiple scans over the string. The implementation seems
> required both to check that the data can be decoded using the
> specified encoding, as well as check that the data will fit in the
> passed in buffer. Only then can the implementation start decoding the
> data. This seems problematic.
>
Only if it guarantees that it doesn't write anything to the output buffer
unless the entire result will fit. I don't think we need to do that; just
guarantee that it'll be truncated on a whole codepoint.
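If the encoder only promises whole-code-point truncation rather than all-or-nothing output, a single pass is enough. This sketch assumes UTF-8 output into a caller-supplied `Uint8Array`. `encodeUtf8Into` is a hypothetical name and is not part of the proposal. The function computes each code point's encoded size just before writing it and stops at the first one that doesn't fit. It returns the number of UTF-16 code units consumed and bytes written, which is the same `{ read, written }` shape that the later `TextEncoder.encodeInto()` uses. As an assumption of this sketch, unpaired surrogates are replaced with U+FFFD rather than causing an error.

```javascript
// Hypothetical single-pass UTF-8 encoder with whole-code-point truncation.
function encodeUtf8Into(string, dest) {
  let read = 0;    // UTF-16 code units consumed
  let written = 0; // bytes written to dest
  while (read < string.length) {
    let cp = string.codePointAt(read);
    const units = cp > 0xffff ? 2 : 1;                 // surrogate pair or single unit
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;     // unpaired surrogate
    const size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (written + size > dest.length) break;           // next code point won't fit
    if (size === 1) {
      dest[written] = cp;
    } else if (size === 2) {
      dest[written] = 0xc0 | (cp >> 6);
      dest[written + 1] = 0x80 | (cp & 0x3f);
    } else if (size === 3) {
      dest[written] = 0xe0 | (cp >> 12);
      dest[written + 1] = 0x80 | ((cp >> 6) & 0x3f);
      dest[written + 2] = 0x80 | (cp & 0x3f);
    } else {
      dest[written] = 0xf0 | (cp >> 18);
      dest[written + 1] = 0x80 | ((cp >> 12) & 0x3f);
      dest[written + 2] = 0x80 | ((cp >> 6) & 0x3f);
      dest[written + 3] = 0x80 | (cp & 0x3f);
    }
    written += size;
    read += units;
  }
  return { read, written };
}

const dest = new Uint8Array(4);
console.log(encodeUtf8Into("a\u20ACb", dest)); // { read: 2, written: 4 }
```

Here "b" is dropped because only 4 bytes are available. The three-byte euro sign is either written in full or not at all, so the buffer never holds a partial sequence.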
> I also don't think it's a good idea to throw an exception for encoding
> errors. Better to convert characters to the unicode replacement
> character. I believe we made a similar change to the WebSockets
> specification recently.
>
Was that change made? I filed
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16157, but it still seems to
be undecided.
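The replace-versus-throw choice can be seen in the `TextEncoder` and `TextDecoder` interfaces from the WHATWG Encoding Standard. These are not the API proposed in this thread. They appear here only because they exist in current browsers and Node.js and happen to show both behaviors. `TextDecoder` replaces malformed input with U+FFFD by default and throws a `TypeError` only when it is constructed with `{ fatal: true }`. `TextEncoder`, which is UTF-8 only, always replaces unpaired surrogates.

```javascript
// Decoding: 0xFF never appears in well-formed UTF-8.
const bad = new Uint8Array([0x68, 0x69, 0xff]);

// Default (fatal: false): the malformed byte becomes U+FFFD.
const lenient = new TextDecoder("utf-8").decode(bad);
console.log(lenient === "hi\uFFFD"); // true

// fatal: true opts into the exception-throwing behavior instead.
let threw = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bad);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true

// Encoding: TextEncoder has no fatal mode; an unpaired surrogate is
// replaced with U+FFFD (bytes EF BF BD) rather than throwing.
const replaced = new TextEncoder().encode("a\uD800b");
console.log(Array.from(replaced, (b) => b.toString(16)).join(" ")); // 61 ef bf bd 62
```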
--
Glenn Maynard