[whatwg] WebSocket bufferedAmount includes overhead or not

Wed Mar 31 14:44:57 PDT 2010

On Tue, Mar 30, 2010 at 11:06 PM, Niklas Beischer <no at opera.com> wrote:
> On Tue, 30 Mar 2010 17:22:07 +0300, Jonas Sicking <jonas at sicking.cc> wrote:
>
>> On Tue, Mar 30, 2010 at 1:51 AM, Niklas Beischer <no at opera.com> wrote:
>>>
>>> On Tue, 30 Mar 2010 09:19:33 +0300, Jonas Sicking <jonas at sicking.cc>
>>> wrote:
>>>>
>>>> On Wed, Mar 24, 2010 at 2:33 PM, Ian Hickson <ian at hixie.ch> wrote:
>>>>>
>>>>> On Tue, 23 Mar 2010, Anne van Kesteren wrote:
>>>>>>
>>>>>> We (Opera) would prefer this too. I.e. to not impose details of the
>>>>>> protocol on the API.
>>>>>
>>>>> If we're exposing nothing from the protocol, does that mean we
>>>>> shouldn't be exposing that the string converts to UTF-8 either?
>>>>
>>>> While exposing the fact that strings are sent as UTF-8 does say
>>>> something about the protocol, I think it's still much more protocol
>>>> independent than including the message headers. The string has to be
>>>> serialized in some way, and it seems unlikely that any newly developed
>>>> protocol in the foreseeable future would use anything other than UTF-8
>>>> as serialization.
>>>
>>> True, but if bufferedAmount does not byte for byte (or character for
>>> character) match what is fed to the API, what do a few bytes
>>> representing the current overhead matter? IIRC ECMAScript uses UTF-16,
>>> which means that serialization will in most cases make the number of
>>> actually buffered bytes differ from the number of bytes in the original
>>> message buffer.
>>
>> ECMAScript doesn't do any serialization, so I'm not sure what you mean
>> here?
>
> I meant the serialization in the WebSocket. Unless the protocol
> implementation keeps track of exactly how its serialized buffer differs from
> the original buffer it will not be able to give a correct answer to how much
> of the original buffer is left to transfer.
>
>
>>> And just because we currently use UTF-8 for
>>> serialization doesn't mean that will always be the case. Thus API users
>>> cannot rely on performing their own conversion to UTF-8 just to find out
>>> exactly how many characters in their message have been sent.
>>
>> My point was that using anything but UTF-8 is unlikely. So the
>> opposite of what you're saying here.
>
> So you're saying binary is out of the question?

No. At this point I'm confused as to what your point is, unless you're
simply agreeing with the earlier emails in this thread that converting
to UTF-8 and counting the converted bytes exposes some protocol
details. Are you arguing anything other than that?

/ Jonas
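
For readers following the UTF-16 versus UTF-8 point in this thread, here is a
minimal sketch of why the two counts differ. It assumes the message is
serialized as UTF-8 with no framing overhead counted. The helper name
utf8ByteLength is hypothetical and is not part of the WebSocket API. It only
shows that a string's length in UTF-16 code units need not equal the number
of bytes produced by UTF-8 serialization.

```typescript
// Hypothetical helper (not part of any WebSocket API): counts the bytes a
// string would occupy after UTF-8 encoding, ignoring any protocol framing.
function utf8ByteLength(message: string): number {
  let bytes = 0;
  for (let i = 0; i < message.length; i++) {
    const unit = message.charCodeAt(i); // one UTF-16 code unit
    if (unit <= 0x7f) {
      bytes += 1; // ASCII range
    } else if (unit <= 0x7ff) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < message.length) {
      const next = message.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair: one code point outside the BMP
        i++; // skip the low surrogate already accounted for
      } else {
        bytes += 3; // assumption: a lone surrogate is replaced by U+FFFD
      }
    } else {
      bytes += 3; // rest of the BMP, including lone low surrogates as U+FFFD
    }
  }
  return bytes;
}

// Compare script-visible length (UTF-16 code units) with UTF-8 byte count.
const samples: string[] = ["hello", "h\u00e9llo", "\u20ac", "\ud83d\ude00"];
for (const s of samples) {
  console.log(JSON.stringify(s), s.length, utf8ByteLength(s));
}
```

Under this assumption, a script that wants to compare its own bookkeeping with
a byte-based bufferedAmount would need to track encoded bytes as above rather
than the string's length property.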