[whatwg] Flow Control on Websockets

Tue Dec 3 16:40:48 PST 2013

On Tue, Dec 3, 2013 at 4:16 PM, Ian Hickson <ian at hixie.ch> wrote:

> On Thu, 17 Oct 2013, Michael Meier wrote:
> >
> > Suppose I have a Websocket server and a WS client connected to it. The
> > client is a JS script in a browser offering the standard WS API. The
> > server produces data at a rate r1 and sends it to the client, which is
> > able to meaningfully process data at a rate of r2, where r1 > r2.
>
> What kind of data are we talking about here?
>
> I'm finding it hard to come up with data where it would make sense for the
> server to slow down production, but where WebSocket would still be the
> right protocol to use.
>
> If the server is generating a lot of data based on an algorithm, e.g.
> generating the digits of pi, without using data from the client, then HTTP
> seems like a better protocol -- just generate the data offline, and
> download it all later.
>
> If the server is generating data in response to some real-time query, e.g.
> the results of a search on Twitter or some such, then what is the server
> going to do with backpressure? Drop the data on the floor? Whether the
> client drops the data on the floor or the server drops the data on the
> floor doesn't much affect the user, though obviously if the client is
> dropping it then the bandwidth is being wasted.
>

Buffer the data. Read from the backend when the flow control window opens
up. I mean, that's what would happen if there were packet loss in the
network, leading to retransmissions and possibly to the TCP receive window
shrinking to 0. Irrespective of whether or not the browser WebSocket client
has some way to instruct the browser to stop reading from the socket,
there's always a possibility of the receive window shrinking to 0, and the
server *must* deal with it in order to be a compliant TCP implementation.
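
To make "buffer, and read from the backend when the window opens" concrete,
here is a minimal sketch. Everything in it is hypothetical and only for
illustration: the BoundedRelay class, the readFromBackend callback, and the
limit parameter are my own names, not part of any WebSocket or TCP API, and
chunks are never split to fit the window.

```javascript
// Hypothetical relay: holds at most `limit` units of backend data and
// pulls more only when the peer has drained some of it.
class BoundedRelay {
  constructor(readFromBackend, limit) {
    // readFromBackend() returns the next chunk (a string) or null when done.
    this.readFromBackend = readFromBackend;
    this.limit = limit;
    this.queue = [];
    this.buffered = 0;
  }

  // Read from the backend only while the buffer is below the limit.
  fill() {
    while (this.buffered < this.limit) {
      const chunk = this.readFromBackend();
      if (chunk === null) break;
      this.queue.push(chunk);
      this.buffered += chunk.length;
    }
  }

  // Called when the flow control window opens by `windowSize` units.
  // Hands over whole chunks that fit, then refills from the backend.
  onWindowOpen(windowSize) {
    const sent = [];
    let budget = windowSize;
    while (this.queue.length > 0 && this.queue[0].length <= budget) {
      const chunk = this.queue.shift();
      budget -= chunk.length;
      this.buffered -= chunk.length;
      sent.push(chunk);
    }
    this.fill();
    return sent;
  }
}
```

While the window stays at 0, onWindowOpen(0) returns nothing and fill() reads
nothing further, so the backend is throttled instead of the data being
dropped.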

>
> It would be helpful to have a better idea of what kind of data you were
> thinking about.
>
>
> > The JS script registers an onmessage handler and is called every time
> > the browser receives a message from the WS. Even if the JS script is
> > still busy processing the received message, say over a chain of
> > asynchronous calls, the browser might receive the next message and call
> > onmessage again. For the script, there are two options to proceed. The
> > first option is to drop data. This might not be possible in all
> > applications and is also a shame, since the data has already been
> > transported over the network. The second option is to buffer the data.
> > This is not a real option, though, since it will buffer an ever
> > increasing amount of data because r1 > r2.
>
> You could just use synchronous APIs in a Worker and not return; if the UA
> is clever, that will apply backpressure. But yeah, it's not ideal.
>
>
> > On the sending side of the browser, flow control seems to be possible by
> > using the bufferedAmount attribute to decide when to pause and resume
> > sending of data.
>
> That doesn't tell you what the backpressure's like, only how much the user
> agent hasn't yet sent.
>
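
For reference, the sending-side pattern Michael describes might look roughly
like the sketch below. createPacedSender and highWaterMark are hypothetical
names of mine; the only real API used is send() and the bufferedAmount
attribute of the socket. As Ian notes, this only bounds what the user agent
has queued locally and says nothing about end-to-end backpressure.

```javascript
// Hypothetical helper: queue messages and hand them to the socket only
// while socket.bufferedAmount is at or below highWaterMark.
function createPacedSender(socket, highWaterMark) {
  const pending = [];

  // Send as many queued messages as the threshold allows.
  // Returns the number of messages still waiting.
  function pump() {
    while (pending.length > 0 && socket.bufferedAmount <= highWaterMark) {
      socket.send(pending.shift());
    }
    return pending.length;
  }

  return {
    enqueue(message) {
      pending.push(message);
      return pump();
    },
    pump,
  };
}
```

The script has to call pump() again later, for example from a timer, to
resume once bufferedAmount has fallen.
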
>
> > Why is there such an asymmetry between sending and receiving? Is it
> > possible to have flow control on the receiving side without resorting to
> > application level handshakes?
>
> I've filed this bug to track this feature request. If any user agent
> implementors want to add this, please do comment on the bug:
>
>    https://www.w3.org/Bugs/Public/show_bug.cgi?id=23992
>
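
Until something like that exists, the application-level handshake Michael
mentions is usually some form of credit scheme. A minimal sketch, in which
CreditedSender, makeCreditedHandler, and the one-credit-per-message rule are
all assumptions of mine and not anything the WebSocket API provides:

```javascript
// Sender side (e.g. the server): transmit only while credits remain.
class CreditedSender {
  constructor(transmit, initialCredits) {
    this.transmit = transmit; // function that actually sends one message
    this.credits = initialCredits;
    this.backlog = [];
  }

  offer(message) {
    this.backlog.push(message);
    this.flush();
  }

  // Called when the receiver reports that it finished n messages.
  grant(n) {
    this.credits += n;
    this.flush();
  }

  flush() {
    while (this.credits > 0 && this.backlog.length > 0) {
      this.credits -= 1;
      this.transmit(this.backlog.shift());
    }
  }
}

// Receiver side (e.g. the browser script): return one credit per message,
// but only after processing of that message has really finished.
function makeCreditedHandler(process, sendCredit) {
  return function onmessage(data) {
    // process() may complete asynchronously; it calls done() when it has.
    process(data, () => sendCredit(1));
  };
}
```

The sender never has more than initialCredits messages outstanding, so the
receiver's buffer stays bounded even when r1 > r2, at the cost of an extra
control message per data message.
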
>
> On Thu, 17 Oct 2013, Nicholas Wilson wrote:
> >
> > If you're at all interested in the freshness of the data, you don't want
> > to use TCP as your sole flow control mechanism. It's fine for bulk file
> > transfers, but think how many megabytes of buffering there can be - the
> > sum of all the send buffers of all the intermediaries along the chain.
> > On a low-loss network, the TCP window size will become very large. You
> > quickly get to a point where the server has filled up all the buffers
> > along the way, which is fine for file transfer but can add seconds' worth
> > of latency.
>
> There are presumably use cases for not-quite-real-time data where it's still
> ok to drop some data. For example, I would expect a real-time search on
> Google+ to not show _every_ result, just a sample at each time interval.
> But in that example, I would expect the server to be sending data much,
> much slower than the user agent can handle it, so the topic of this thread
> doesn't really come up.
>
>
> > Your second question is whether it's possible to stop the browser
> > reading from the socket. Yes, just don't return from your onmessage
> > handler until you've actually finished handling the message. If you fire
> > up a new worker then tell the browser you're done, you're seeing the
> > inevitable result of that. You have to wait on the worker - or, if you
> > want to process say four messages in parallel, wait on the worker pool
> > until it's dropped below four active before returning.
>
> That's not actually possible in many cases. Some Web APIs (e.g. channel
> messaging) are exclusively async, even in workers.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>