WebSockets Servers 101

December 7, 2016

Let’s start listening…

listening for incoming socket connections. This is the very first thing every server should do, and to avoid potential problems with firewalls/proxies, let’s use the standard port 80.
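As a minimal sketch of that first step, here is how a plain TCP listener could be opened in Python. The function name open_listener and its defaults are my own choices for illustration, and keep in mind that binding to port 80 usually requires elevated privileges.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 80, backlog: int = 128) -> socket.socket:
    """Create a TCP socket that listens for incoming connections."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for old sockets to time out.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv
```

Calling srv.accept() on the returned socket then hands you one client connection at a time, which is where the handshake starts.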

So, what else should our WebSocket server be able to do?

  • respond to the handshake request (GET)

  • receive and send messages

    • extract frames

  • keep track of clients

  • respond to Pings and Pongs

  • close the connection

  • …and potentially support subprotocols and extensions

Responding to the handshake

If you read my previous post WebSockets Clients 101, you should already be familiar with the concept of the handshake.

Let’s just recap the actions to take on this simple client request:

GET /echo HTTP/1.1
Host: echo.rafalgolarz.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBbZSGub13jZQ==
Sec-WebSocket-Version: 13

To prove to the client that the handshake is received, the server must:

  • concatenate the client’s Sec-WebSocket-Key and “258EAFA5-E914-47DA-95CA-C5AB0DC85B11” together

  • take the SHA-1 hash of the result

  • return the base64 encoding of the hash as the value of Sec-WebSocket-Accept header
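The three steps above can be sketched in a few lines of Python. The function name accept_key is hypothetical; the GUID is the one quoted in the list.

```python
import base64
import hashlib

# The fixed GUID that every WebSocket server appends to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Return the value for the Sec-WebSocket-Accept response header."""
    # 1. concatenate the key and the GUID, 2. SHA-1 it, 3. base64 the digest
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

For a quick sanity check, the sample key from RFC 6455, dGhlIHNhbXBsZSBub25jZQ==, should give s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.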

Add the status line and two more required headers and you’re pretty much done with the handshake and ready to exchange data:

  • HTTP/1.1 101 Switching Protocols (the reason phrase after 101 is free text, so you can put something custom here)

  • Upgrade: websocket

  • Connection: Upgrade

We’ll skip the optional headers for the sake of simplicity.
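Put together, the whole response is just a handful of CRLF-terminated lines followed by an empty line. A minimal sketch, assuming the accept value has already been computed (handshake_response is a name I made up, and "Switching Protocols" is simply the conventional reason phrase):

```python
def handshake_response(accept_value: str) -> bytes:
    """Build the raw HTTP response that completes the handshake."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {accept_value}",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Once these bytes are written to the socket, everything that follows on the connection is WebSocket frames rather than HTTP.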

Receiving and sending messages

That’s the most important part of your server, and it really comes down to extracting information from frames.

  • check the payload length (to know when to stop reading the message :)

    • the length is represented by bits 9 - 15 of the frame header (counting from 0; bit 8 is the mask bit)
    • if the value is 125 or less, that’s the payload length
    • if the value is 126, the length is in the next 16 bits
    • if the value is 127, the length is in the next 64 bits

    These bits should be interpreted as an unsigned integer in network byte order.
    The payload length does not include the length of the masking key.

  • check if the data is masked (the mask bit is set)

    • if it’s not masked, drop the connection. A client must mask the data even when using a secure socket.
    • if it’s masked, read the frame-masking-key (4 octets) and unmask the payload with a simple XOR: byte i of the payload is XORed with byte (i modulo 4) of the key
    • do not mask the data when sending a frame back (a client must close a connection if it detects a masked frame)

  • read the message

    • the message (the payload data) can be sent as UTF-8 text (opcode: 0x1) or binary (opcode: 0x2)
    • the message can be fragmented (split into separate frames) and the FIN bit tells you whether a frame is the last one in the series (0 means that the server should keep listening for more parts of the message)
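To make the steps above more concrete, here is a rough sketch of a parser for a single client frame. Frame and parse_client_frame are names I invented for this example; it ignores the RSV bits and does not reassemble fragmented messages, it only reports the FIN bit and opcode so the caller can do that.

```python
import struct
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    fin: bool        # True if this is the last frame of the message
    opcode: int      # 0x1 text, 0x2 binary, 0x0 continuation, ...
    payload: bytes   # unmasked payload data
    consumed: int    # number of bytes of the buffer this frame used


def parse_client_frame(buf: bytes) -> Optional[Frame]:
    """Parse one client frame; return None if the buffer is incomplete."""
    if len(buf) < 2:
        return None
    fin = bool(buf[0] & 0x80)
    opcode = buf[0] & 0x0F
    if not buf[1] & 0x80:
        # Clients must always mask; the caller should drop the connection.
        raise ValueError("client frame is not masked")
    length = buf[1] & 0x7F
    pos = 2
    if length == 126:  # real length is in the next 16 bits
        if len(buf) < pos + 2:
            return None
        (length,) = struct.unpack("!H", buf[pos:pos + 2])
        pos += 2
    elif length == 127:  # real length is in the next 64 bits
        if len(buf) < pos + 8:
            return None
        (length,) = struct.unpack("!Q", buf[pos:pos + 8])
        pos += 8
    if len(buf) < pos + 4 + length:
        return None
    key = buf[pos:pos + 4]
    pos += 4
    # Unmask: XOR every payload byte with key[i mod 4].
    payload = bytes(b ^ key[i % 4] for i, b in enumerate(buf[pos:pos + length]))
    return Frame(fin, opcode, payload, pos + length)
```

Feeding it the masked "Hello" text frame from RFC 6455 (hex 81 85 37 fa 21 3d 7f 9f 4d 51 58) should give back a frame with the FIN bit set, opcode 0x1 and the payload b"Hello". For a fragmented message you would keep appending payloads until a frame arrives with fin set to True.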

Keeping track of clients

This is really generic server behaviour. You don’t want to shake hands multiple times with the same client if the handshake has already been completed, and you should protect yourself from too many connection attempts from the same IP.

As you already know, each handshake carries its own randomly generated Sec-WebSocket-Key, which can serve as an id for the connection.
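One possible way to keep such a list is a small in-memory registry keyed by IP. This is only a sketch under my own assumptions: ClientRegistry, max_per_ip and conn_id are hypothetical names, and conn_id can be whatever id you use for a connection (for example the Sec-WebSocket-Key).

```python
from collections import defaultdict


class ClientRegistry:
    """Track open connections per IP and cap how many each IP may hold."""

    def __init__(self, max_per_ip: int = 10):
        self.max_per_ip = max_per_ip
        self._by_ip = defaultdict(set)

    def add(self, ip: str, conn_id: str) -> bool:
        """Register a connection; return False if it should be refused."""
        conns = self._by_ip[ip]
        if conn_id in conns or len(conns) >= self.max_per_ip:
            return False  # already shook hands, or too many from this IP
        conns.add(conn_id)
        return True

    def remove(self, ip: str, conn_id: str) -> None:
        """Forget a connection once it has been closed."""
        conns = self._by_ip.get(ip)
        if conns is None:
            return
        conns.discard(conn_id)
        if not conns:
            del self._by_ip[ip]
```

The server would call add() right after a successful handshake and remove() when the connection closes.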

Playing Ping Pong

You know the rules:

  • when you get a ping, send back a pong

  • if you get a pong without ever sending a ping… ignore it (unless you notice way too many pongs :)

Just remember to send back a pong with the exact same Payload Data as the ping.
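A minimal sketch of the "send back a pong" rule, assuming you already have the unmasked payload of the ping. The function name pong_for is made up; the opcodes (0x9 for ping, 0xA for pong) and the 125-byte limit on control frame payloads come from RFC 6455.

```python
OP_PING = 0x9
OP_PONG = 0xA


def pong_for(ping_payload: bytes) -> bytes:
    """Build an unmasked pong frame that echoes the ping's payload."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    # First byte: FIN bit + opcode. Second byte: mask bit 0 + payload length.
    return bytes([0x80 | OP_PONG, len(ping_payload)]) + ping_payload
```

Note that the frame is not masked, because it travels from the server to the client.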

Closing the connection

The connection can be closed by either the client or the server by sending a Close control frame (opcode 0x8), optionally carrying a status code and a reason, to begin the closing handshake.

Once our server receives such a frame, it should send a Close frame back (unless it has already sent one), close the connection and not accept any further data from the client (obviously until a reconnection :)
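Here is a small sketch of building and reading Close frames on the server side. The helper names close_frame and parse_close_payload are mine; the layout (a 2-byte status code followed by an optional UTF-8 reason) and the code 1000 for a normal closure come from RFC 6455.

```python
import struct
from typing import Optional, Tuple

OP_CLOSE = 0x8


def close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Build an unmasked Close frame with a status code and a reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    return bytes([0x80 | OP_CLOSE, len(payload)]) + payload


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Split the (already unmasked) Close payload into code and reason."""
    if len(payload) < 2:
        return None, ""  # the status code is optional
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")
```

A server that receives a Close frame could answer with close_frame(code), using the code it just parsed, and then shut the socket down.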

Extensions and subprotocols

These two topics deserve a much longer post, but the good news is you don’t need to handle them unless you know you need them in your project or you’re building a very generic WebSocket server for a wider audience.

Extensions can be used to compress data, and they use the frame-rsv1, frame-rsv2, and frame-rsv3 bits of the frame header (see the picture above). A client requests extensions by including a Sec-WebSocket-Extensions header field.

Subprotocols - each subprotocol requires an individual implementation on the server side. A client requests them by including this header: Sec-WebSocket-Protocol.
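If you do decide to support subprotocols, the negotiation itself can be quite small. A sketch, assuming the client sends a comma-separated list in order of preference and the server keeps a list of the subprotocols it implements (choose_subprotocol is a hypothetical helper):

```python
from typing import Optional, Sequence


def choose_subprotocol(header_value: str, supported: Sequence[str]) -> Optional[str]:
    """Pick the first subprotocol offered by the client that we support."""
    offered = [p.strip() for p in header_value.split(",") if p.strip()]
    for proto in offered:
        if proto in supported:
            return proto
    return None  # no match: leave Sec-WebSocket-Protocol out of the response
```

When a match is found, the server echoes it back in its own Sec-WebSocket-Protocol response header.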