[whatwg] WebSocket bufferedAmount includes overhead or not

Thu Apr 1 02:20:46 PDT 2010

On Thu, Apr 1, 2010 at 12:33 AM, Niklas Beischer <no at opera.com> wrote:
> On Thu, 01 Apr 2010 00:44:57 +0300, Jonas Sicking <jonas at sicking.cc> wrote:
>
>> On Tue, Mar 30, 2010 at 11:06 PM, Niklas Beischer <no at opera.com> wrote:
>>>
>>> On Tue, 30 Mar 2010 17:22:07 +0300, Jonas Sicking <jonas at sicking.cc>
>>> wrote:
>>>
>>>> On Tue, Mar 30, 2010 at 1:51 AM, Niklas Beischer <no at opera.com> wrote:
>>>>>
>>>>> On Tue, 30 Mar 2010 09:19:33 +0300, Jonas Sicking <jonas at sicking.cc>
>>>>> wrote:
>>>>>>
>>>>>> On Wed, Mar 24, 2010 at 2:33 PM, Ian Hickson <ian at hixie.ch> wrote:
>>>>>>>
>>>>>>> On Tue, 23 Mar 2010, Anne van Kesteren wrote:
>>>>>>>>
>>>>>>>> We (Opera) would prefer this too. I.e. to not impose details of the
>>>>>>>> protocol on the API.
>>>>>>>
>>>>>>> If we're exposing nothing from the protocol, does that mean we
>>>>>>> shouldn't be exposing that the string converts to UTF-8 either?
>>>>>>
>>>>>> While exposing the fact that strings are sent as UTF-8 does say
>>>>>> something about the protocol, I think it's still much more protocol
>>>>>> independent than including the message headers. The string has to be
>>>>>> serialized in some way, and it seems unlikely that any newly developed
>>>>>> protocol in the foreseeable future would use anything other than UTF-8
>>>>>> as serialization.
>>>>>
>>>>> True, but if bufferedAmount does not byte for byte (or character for
>>>>> character) match what is fed to the API, what do a few bytes
>>>>> representing the current overhead matter? IIRC EcmaScript uses UTF-16,
>>>>> which means that serialization will in most cases make the number of
>>>>> actually buffered bytes differ from the number of bytes in the original
>>>>> message buffer.
>>>>
>>>> EcmaScript doesn't do any serialization so I'm not sure what you mean
>>>> here?
>>>
>>> I meant the serialization in the WebSocket. Unless the protocol
>>> implementation keeps track of exactly how its serialized buffer differs
>>> from the original buffer, it will not be able to give a correct answer
>>> to how much of the original buffer is left to transfer.
>>>
>>>>> And just because we currently use UTF-8 for serialization doesn't
>>>>> mean that will always be the case. Thus API users cannot rely on
>>>>> performing their own conversion to UTF-8 just to find out exactly
>>>>> how many characters in their message have been sent.
>>>>
>>>> My point was that using anything but UTF-8 is unlikely. So the
>>>> opposite of what you're saying here.
>>>
>>> So you're saying binary is out of the question?
>>
>> No. At this point I'm confused as to what your point is. Unless you're
>> simply agreeing with the earlier emails in this thread that the fact
>> that we're converting to UTF-8 and using the converted bytes exposes
>> some protocol details. Are you arguing anything other than that?
>
> No, I'm not arguing anything else. I do agree that it exposes protocol
> details. My point is that precisely that makes it more or less pointless to
> hide other protocol specifics. Specifics that have minor impact on the
> actual number bufferedAmount contains.
>
> According to your suggestion we should expose the impact of serialization
> but hide the framing. I don't see the reason for drawing the line between
> the two. It will, in my opinion, only complicate the implementation.
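To make the two positions concrete, here is a minimal Python sketch (not part of either proposal) that counts one message three ways: as UTF-16 code units (what an ECMAScript string's length reports), as UTF-8 payload bytes, and as UTF-8 bytes plus framing. The framing count assumes the draft-era text frame of one leading 0x00 byte and one trailing 0xFF byte; the helper names are hypothetical.

```python
# Hypothetical helpers for illustration only. framed_bytes assumes the
# draft-era text frame: a 0x00 byte, the UTF-8 payload, then a 0xFF byte.

def utf16_code_units(s: str) -> int:
    """Length as an ECMAScript string would report it (UTF-16 code units)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_payload_bytes(s: str) -> int:
    """Bytes after UTF-8 serialization, with no framing."""
    return len(s.encode("utf-8"))


def framed_bytes(s: str) -> int:
    """UTF-8 payload plus the assumed 0x00 start byte and 0xFF end byte."""
    return 1 + utf8_payload_bytes(s) + 1


samples = {
    "ascii": "hello",
    "latin": "h\u00e9llo",   # e-acute takes 2 bytes in UTF-8
    "euro": "\u20ac",        # 1 UTF-16 unit, 3 UTF-8 bytes
    "emoji": "\U0001F600",   # surrogate pair in UTF-16, 4 UTF-8 bytes
}

for name, msg in samples.items():
    print(f"{name:6} utf16={utf16_code_units(msg)} "
          f"utf8={utf8_payload_bytes(msg)} framed={framed_bytes(msg)}")
```

Under that framing assumption the frame adds a constant two bytes per message, while the serialization difference varies with the characters in the message.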

I agree it does complicate the implementation, but as far as I can see
not by much. So I prioritize utility to web pages over the laziness of
web browser authors ;)

The reason I think we should hide framing is that it is much more
likely to change in future versions of the protocol. For example, if we
add support for multiplexing (I don't know how likely this is), we'll
possibly have to add an extra byte to frames to indicate the channel.
This could be negotiated on connection so as to keep things backwards
compatible.
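A minimal sketch of the multiplexing scenario, assuming (hypothetically) that a negotiated channel byte raises per-message framing overhead from 2 bytes (the draft-era 0x00/0xFF text frame) to 3. `buffered_amount` here is a toy model of the send queue, not the real attribute: it shows the payload-only count staying the same under both framings while the framing-inclusive count shifts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Framing:
    """Per-message framing overhead for a (hypothetical) protocol version."""
    name: str
    overhead_per_message: int


# Assumed values: 2 bytes for the 0x00/0xFF text frame, and 3 if a
# hypothetical negotiated channel byte were added for multiplexing.
BASE = Framing("base", 2)
MUX = Framing("multiplexed", 3)


def buffered_amount(queue: list[str], framing: Framing, include_framing: bool) -> int:
    """Model of the bytes still queued; not the real WebSocket attribute."""
    total = sum(len(msg.encode("utf-8")) for msg in queue)
    if include_framing:
        total += framing.overhead_per_message * len(queue)
    return total


queue = ["hello", "h\u00e9llo", "\u20ac"]  # 5 + 6 + 3 = 14 UTF-8 bytes
for framing in (BASE, MUX):
    print(framing.name,
          "payload-only:", buffered_amount(queue, framing, include_framing=False),
          "with framing:", buffered_amount(queue, framing, include_framing=True))
```

In this model a page reading the payload-only number sees the same value whichever framing the connection negotiated, whereas the framing-inclusive number depends on that negotiation.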

However, I see it as much less likely that future versions of WebSocket
will change strings from being serialized as UTF-8 to being serialized
as something else.

/ Jonas