[whatwg] API for encoding/decoding ArrayBuffers into text

Jonas Sicking jonas at sicking.cc
Tue Mar 27 23:44:13 PDT 2012


On Tue, Mar 27, 2012 at 4:45 PM, Glenn Maynard <glenn at zewt.org> wrote:
> On Tue, Mar 27, 2012 at 12:41 AM, Jonas Sicking <jonas at sicking.cc> wrote:
>>
>> The memchr is purely overhead, i.e. we are comparing memchr+decoding
>> to decoding. So I don't see what's backing up the "probably the
>> fastest thing" claim.
>
>
> If you don't do it as an initial pass, then you have to embed null checks
> into the inner loop of your decoding algorithm.  For example, an ASCII
> decoder may look like:
>
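> // unsigned char *input = input buffer
> // unsigned char *input_end = one past last byte of input buffer
> // wchar_t *output = output buffer
> // memchr returns NULL when there is no null byte; keep input_end then
> unsigned char *terminator = memchr(input, 0, input_end - input);
> if(terminator != NULL)
>     input_end = terminator;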
> while(input < input_end)
> {
>     if(*input >= 0x80)
>         *output++ = 0xFFFD;
>     else
>         *output++ = *input;
>     ++input;
> }
>
> If you don't do the initial search, then it becomes:
>
> while(input < input_end && *input != 0)
> {
>     if(*input >= 0x80)
>         *output++ = 0xFFFD;
>     else
>         *output++ = *input;
>     ++input;
> }
>
> which means that you have an additional branch each time through the loop to
> check for the null terminator.  That's likely to be slower than just doing
> another pass.
>
> But anyway, please either make a benchmark or two to show the differences
> we're talking about, or drop "performance" as an argument.  This is all just
> a distraction otherwise.  I don't think the speed of conversion is even a
> serious issue, much less the microseconds taken by memchr.

The extra null-check is basically free since you are going to be bound
on memory IO, i.e. the extra null-check will just happen in the bubble
in the CPU pipeline while waiting for data from memory.

Scanning over the buffer twice will cause a lot more memory IO and
will definitely be slower.
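
Since neither of us has numbers, here is a minimal sketch of a harness
that could be used to measure the two variants. It is an illustration
only: the names decode_two_pass, decode_one_pass and time_decoder are
made up for this example, uint16_t stands in for the output code unit
type, and the outcome will depend on compiler, optimization flags,
buffer size and hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Variant A: memchr finds the terminator first, then the decode loop
 * runs without a per-byte null check. Returns the number of code units
 * written to out. */
size_t decode_two_pass(const unsigned char *in, size_t len, uint16_t *out)
{
    const unsigned char *nul = memchr(in, 0, len);
    const unsigned char *end = nul ? nul : in + len; /* no terminator: decode all */
    uint16_t *start = out;
    while (in < end) {
        *out++ = (*in >= 0x80) ? 0xFFFD : *in;
        ++in;
    }
    return (size_t)(out - start);
}

/* Variant B: a single pass with the null check inside the loop. */
size_t decode_one_pass(const unsigned char *in, size_t len, uint16_t *out)
{
    const unsigned char *end = in + len;
    uint16_t *start = out;
    while (in < end && *in != 0) {
        *out++ = (*in >= 0x80) ? 0xFFFD : *in;
        ++in;
    }
    return (size_t)(out - start);
}

typedef size_t (*decoder_fn)(const unsigned char *, size_t, uint16_t *);

/* Runs fn reps times over the same input and returns the elapsed CPU
 * time in seconds. The length produced by the last run is stored in
 * *decoded so the caller can check that both variants agree. */
double time_decoder(decoder_fn fn, const unsigned char *in, size_t len,
                    uint16_t *out, int reps, size_t *decoded)
{
    assert(reps > 0);
    *decoded = 0;
    clock_t t0 = clock();
    for (int i = 0; i < reps; ++i)
        *decoded = fn(in, len, out);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

To compare, call time_decoder once with each variant on a buffer much
larger than the CPU cache and on one that fits in cache, since the
memory IO argument only applies to the former.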

>> > It doesn't seem materially harder (a little more code, yes, but that's
>> > not the same thing), and it's more general-purpose.
>>
>> I agree it doesn't seem materially harder. I also agree that I don't
>> have data to show that it's materially slower. But it sounds like
>> we're in agreement that keeping the logic outside is both harder and
>> slower which honestly doesn't speak strongly in its favor.
>
> Sorry, I'm confused--you're saying that it isn't harder, but we're in
> agreement that it's harder.  Please clarify what you mean.
>
> I don't believe it's meaningfully slower or harder.

I'm saying that having separate functions for
* finding the null terminator
* decoding a set number of bytes

is both harder and slower for the webpage than having a single
function which just decodes to the null terminator.

We can argue whether it's meaningfully slower or harder. But it seems
like we agree that it's slower and harder.

>> I don't understand the argument that the alternative is more
>> "general-purpose". The API is already generic in that you can use
>> whatever delimiter you want since you pass in a view. The only
>> functionality which is not available is finding a null-terminator in
>> an arraybuffer which you are arguing below shouldn't be part of the
>> decoder (which I agree with).
>
> I'm confused.  What are you arguing?  "The alternative"--taking the null
> terminator search out of the decoder--you seem to argue against (first
> sentence), then to agree with (last sentence).  Can you back up and restate
> what you're saying from scratch?

If you agree that creating separate functions for finding the null
terminator and then decoding to it is harder and slower than having a
single function which does both things, while still holding that
having separate functions is better, then clearly you must think that
having separate functions brings some other benefit.

I still don't understand what benefit you are seeing. You
hinted at some "more generic" argument, but I still don't understand
it. So far the only reason that has been brought up is that it
provides an API for simply finding null terminators which could be
useful if you are doing things other than decoding. Is that what you
are talking about when you are saying that it's "more generic"?
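
To make sure I am describing that use case correctly, here is a rough
sketch of what I understand it to be: a standalone terminator search
reused for something other than decoding. Both find_terminator and
count_fields are hypothetical names invented for this example, not
part of any proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone search: offset of the first 0 byte in
 * buf[0..len), or len if there is none. */
size_t find_terminator(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    const unsigned char *nul = memchr(buf, 0, len);
    return nul ? (size_t)(nul - buf) : len;
}

/* Counts the null-terminated fields packed back to back in buf without
 * decoding any of them. A trailing run with no terminator counts as a
 * field too. */
size_t count_fields(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    size_t pos = 0;
    while (pos < len) {
        size_t field_len = find_terminator(buf + pos, len - pos);
        ++count;
        pos += field_len + 1; /* skip the field and its terminator */
    }
    return count;
}
```

If that is the kind of reuse you mean, the question is whether it is
common enough to justify making the decoding case two calls instead of
one.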

/ Jonas


