[whatwg] WebSocket bufferedAmount includes overhead or not

Thu Apr 1 00:33:27 PDT 2010

On Thu, 01 Apr 2010 00:44:57 +0300, Jonas Sicking <jonas at sicking.cc> wrote:

> On Tue, Mar 30, 2010 at 11:06 PM, Niklas Beischer <no at opera.com> wrote:
>> On Tue, 30 Mar 2010 17:22:07 +0300, Jonas Sicking <jonas at sicking.cc>  
>> wrote:
>>
>>> On Tue, Mar 30, 2010 at 1:51 AM, Niklas Beischer <no at opera.com> wrote:
>>>>
>>>> On Tue, 30 Mar 2010 09:19:33 +0300, Jonas Sicking <jonas at sicking.cc>
>>>> wrote:
>>>>>
>>>>> On Wed, Mar 24, 2010 at 2:33 PM, Ian Hickson <ian at hixie.ch> wrote:
>>>>>>
>>>>>> On Tue, 23 Mar 2010, Anne van Kesteren wrote:
>>>>>>>
>>>>>>> We (Opera) would prefer this too. I.e. to not impose details of the
>>>>>>> protocol on the API.
>>>>>>
>>>>>> If we're exposing nothing from the protocol, does that mean we
>>>>>> shouldn't be exposing that the string converts to UTF-8 either?
>>>>>
>>>>> While exposing the fact that strings are sent as UTF-8 does say
>>>>> something about the protocol, I think it's still much more protocol
>>>>> independent than including the message headers. The string has to be
>>>>> serialized in some way, and it seems unlikely that any newly developed
>>>>> protocol in the foreseeable future would use anything other than UTF-8
>>>>> as serialization.
>>>>
>>>> True, but if bufferedAmount does not byte for byte (or character for
>>>> character) match what is fed to the API, what does a few bytes
>>>> representing the current overhead matter? IIRC EcmaScript uses UTF-16,
>>>> which means that serialization will in most cases make the number of
>>>> actually buffered bytes differ from the number of bytes in the original
>>>> message buffer.
>>>
>>> EcmaScript doesn't do any serialization so I'm not sure what you mean
>>> here?
>>
>> I meant the serialization in the WebSocket. Unless the protocol
>> implementation keeps track of exactly how its serialized buffer differs
>> from the original buffer it will not be able to give a correct answer to
>> how much of the original buffer is left to transfer.
>>
>>
>>>> And just because we currently use UTF-8 for
>>>> serialization doesn't mean that will always be the case. Thus API users
>>>> cannot rely on performing their own conversion to UTF-8 just to find out
>>>> exactly how many characters in their message have been sent.
>>>
>>> My point was that using anything but UTF-8 is unlikely. So the
>>> opposite of what you're saying here.
>>
>> So you're saying binary is out of the question?
>
> No. At this point I'm confused as to what your point is. Unless you're
> simply agreeing with the earlier emails in this thread that the fact
> that we're converting to UTF-8 and use the converted bytes exposes
> some protocol details. Are you arguing anything other than that?

No, I'm not arguing anything else. I do agree that it exposes protocol
details. My point is that precisely this makes it more or less pointless
to hide other protocol specifics, especially ones that have only a minor
impact on the number bufferedAmount actually reports.

According to your suggestion we should expose the impact of serialization
but hide the framing. I don't see a reason to draw the line between the
two. In my opinion it will only complicate the implementation.
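
To make the comparison concrete, here is a rough sketch in Python of the
numbers being discussed for a single queued text message. It is purely
illustrative and not how any implementation computes bufferedAmount. The
function names are made up for this example, and FRAME_OVERHEAD = 2 assumes
the current draft's text framing (one 0x00 byte before and one 0xFF byte
after the UTF-8 payload).

```python
# Sketch only: three candidate sizes for one queued text message.
# FRAME_OVERHEAD is an assumption (0x00 ... 0xFF framing of the 2010 draft).
FRAME_OVERHEAD = 2


def utf16_units(message: str) -> int:
    """Length as the script sees it: number of UTF-16 code units."""
    return len(message.encode("utf-16-le")) // 2


def payload_bytes(message: str) -> int:
    """Size after serialization to UTF-8, without framing."""
    return len(message.encode("utf-8"))


def framed_bytes(message: str) -> int:
    """Size on the wire: UTF-8 payload plus the assumed framing bytes."""
    return payload_bytes(message) + FRAME_OVERHEAD


if __name__ == "__main__":
    for text in ("hello", "h\u00e9llo", "\u20ac", "\U0001F600"):
        print(repr(text), utf16_units(text), payload_bytes(text), framed_bytes(text))
```

Already the first two numbers disagree for anything outside ASCII, so a
script cannot map bufferedAmount back to characters without redoing the
UTF-8 conversion itself. The framing adds a small constant on top of that.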

BR,
  /niklas

-- 
Niklas Beischer
Software Developer
Opera Software