[whatwg] Simplified WebSockets

Tue Sep 30 01:04:37 PDT 2008

It occurred to me the other day when musing on WebSockets that the 
handshake is more complicated than required to achieve its purpose and 
still allows potential exploits. I'm going to assume for now the purpose 
of the handshake is to:

* Prevent unsafe communication with a non-websocket service.
* Provide just enough HTTP compatibility to allow proxying and virtual 
hosting.

I think the case has been successfully put that DDOS or command 
injection are possible using IMG tags or existing javascript methods - 
however the counter-argument has been made that the presence of legacy 
issues is not an argument for creating new ones. So with that in mind we 
should implement WebSockets as robustly as we can.

Since we don't at first know what the service is we really need to 
assume that:

* Long strings or certain characters may crash the service.
* The service may not be line orientated.
* The service may use binary data for communications, rather than text.
* Characters outside the ASCII printable range may have special meaning 
(ie, 'bell' or control characters).
* No string is safe, since the service may use string commands and 
non-whitespace separators.

For the sake of argument we'll assume the existence of a service that 
accepts commands as follows (we'll also assume the service ignores bad 
commands and continues processing):

AUTHENTICATE(user,password);GRANT(user,ALL);DELETE(/some/record);LOGOUT;

To feed this command set to the service via WebSockets we use:

var ws = new 
WebSocket("http://server:1024/?;AUTHENTICATE(user,password);GRANT(user,ALL);DELETE(/some/record);LOGOUT;")

I have already verified that none of these characters require escaping 
in URLs. The current proposal is fairly strict about allowed URIs but in 
my opinion it is not strict enough. We really need to verify we are 
talking to a WebSocket service before we start sending anything under 
the control of a malicious author.

Now given the huge variety of non-HTTP sub-systems we'll be talking to I 
don't think a full URL or path is actually a useful part of the 
handshake. What does path mean to a mail server for instance?

Here is my proposal:

C = client
S = service

# First we talk to our proxy, if configured. We know we're talking to a 
proxy because it's set on the client.

C> CONNECT server.example.com:1024 HTTP/1.1
C> Host: server.example.com:1024
C> Proxy-Connection: Keep-Alive
C> Upgrade: WebSocket/1.0

# Without a proxy we send

C> HEAD server.example.com:1024 HTTP/1.1
C> Host: server.example.com:1024
C> Connection: Keep-Alive
C> Upgrade: WebSocket/1.0

# If all goes well the service will respond with:

S> HTTP/1.1 200 OK
S> Upgrade: WebSocket/1.0
or
S> Some other HTTP response (but no Upgrade header)
or
S> Other non-HTTP response
or
No response.

# If we get a 200 response with Upgrade: WebSocket we *know* we have a 
WebSocket. Any other response and the client can throw a 'Connection 
failed' or 'Timeout' exception.

The client and server can now exchange any authentication tokens, access 
conditions, cookies, etc according to service requirements. eg:

ws.Send( 'referrer=' + window.location + '\r\n' );
ws.Send( 'channel=' + 'customers' + '\r\n' );
ws.Send( CookiesToServerSyntax() );

The key advantages of this method are:

* Simplicity (less handshaking, less parsing, fewer requirements)
* Security (No page author control over initial handshake beyond the 
server name or IP. Removes the risk of command injection via URI.)
* Compatibility (HTTP compatible. Proxy and Virtual Hosting compatible. 
Allows a CGI script to emulate a WebSocket)

I'm not saying the current proposal doesn't provide some of these 
things, just that I believe this proposal does it better.

Shannon