December 7, 2016
Let’s start listening…
…listening for incoming socket connections. This is the very first thing every server should do, and to avoid potential problems with firewalls and proxies, let’s use the standard port 80.
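As a minimal sketch of that first step, here is a plain TCP listener in Python. The function name `create_listener` and its defaults are my own choices for illustration, not part of any WebSocket library; note that on most Unix-like systems binding to port 80 requires elevated privileges, so you may want a higher port while experimenting.

```python
import socket

def create_listener(host: str = "0.0.0.0", port: int = 80, backlog: int = 128) -> socket.socket:
    """Create a TCP socket that listens for incoming connections (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts of the server on the same port.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(backlog)
    return server
```

Calling `server.accept()` on the returned socket then blocks until a client connects and hands back a `(connection, address)` pair.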
So, what else should our WebSocket server be able to do?
respond to the handshake request (GET)
receive and send messages
- extract frames
keep track of clients
respond to Pings and Pongs
close the connection
…and potentially support subprotocols and extensions
Responding to the handshake
If you read my previous post WebSockets Clients 101, you should already be familiar with the concept of the handshake.
Let’s just recall the list of actions to take on this simple client request:
GET /echo HTTP/1.1
Host: echo.rafalgolarz.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBbZSGub13jZQ==
Sec-WebSocket-Version: 13
To prove to the client that the handshake is received, the server must:
concatenate the client’s Sec-WebSocket-Key and “258EAFA5-E914-47DA-95CA-C5AB0DC85B11” together
take the SHA-1 hash of the result
return the base64 encoding of the hash as the value of Sec-WebSocket-Accept header
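The three steps above can be sketched in a few lines of Python using only the standard library. The function name `compute_accept` is hypothetical, chosen just for this example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 (the same string quoted in the steps above).
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    # 1. concatenate the key and the GUID, 2. SHA-1 hash it, 3. base64-encode the hash
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the sample key used in RFC 6455, `dGhlIHNhbXBsZSBub25jZQ==`, this returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.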
If you add the three other required lines (the 101 status line plus the Upgrade and Connection headers), you’re pretty much done with the handshake and ready to exchange data:
HTTP/1.1 101 Put Some custom text here…
We’ll skip the optional headers for the sake of simplicity.
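To make the whole exchange concrete, here is a rough sketch that takes the raw bytes of a handshake request and produces a minimal response. The helper names (`parse_headers`, `build_handshake_response`) are assumptions made for this example, the parsing is deliberately naive, and the reason phrase "Switching Protocols" is simply the conventional one for status 101.

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def parse_headers(raw_request: bytes) -> dict:
    """Naively parse HTTP header lines into a dict with lower-cased names."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line (GET ... HTTP/1.1)
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

def build_handshake_response(raw_request: bytes) -> bytes:
    """Build the minimal 101 response, without any optional headers."""
    key = parse_headers(raw_request)["sec-websocket-key"]
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    accept = base64.b64encode(digest).decode("ascii")
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        "\r\n"
    ).encode("ascii")
```

A real server would also validate the request (method, Upgrade and Connection values, version 13) before answering; that is left out here to keep the sketch short.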
Receiving and sending messages
This is the most important part of your server, and it really comes down to extracting information from frames:
check the payload length (to know when to stop reading the message :)
- the length is represented by bits 9 - 15 (the low 7 bits of the second byte)
- if the value is 125 or less, that’s the length
- if the value is 126, the next 16 bits hold the length
- if the value is 127, the next 64 bits hold the length
Each of these fields should be interpreted as an unsigned integer in network byte order.
The payload length does not include the length of the masking key.
check if the data is masked (the mask bit is set)
- if it’s not masked, drop the connection. A client must mask the data even when using a secure socket.
- if it’s masked, read the frame-masking-key (4 octets) and XOR each payload byte with the key byte at its index modulo 4 to recover the original data
- do not mask the data when sending a frame back (a client must close a connection if it detects a masked frame)
read the message
- the message (the payload data) can be sent as UTF-8 text (opcode: 0x1) or binary (opcode: 0x2)
- the message can be fragmented (split into separate frames) and the FIN bit tells you it’s the last frame in a series (0 means that the server should keep listening for more parts of the message)
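The steps above can be sketched as a small parser for client frames, together with a builder for the unmasked frames a server sends back. This is only an illustration: `Frame`, `parse_client_frame` and `build_server_frame` are names I made up, and the parser assumes the whole frame is already sitting in one buffer, which a real server reading from a socket cannot take for granted.

```python
import struct
from typing import NamedTuple

class Frame(NamedTuple):
    fin: bool       # True if this is the last frame of the message
    opcode: int     # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    payload: bytes  # unmasked payload data

def parse_client_frame(data: bytes) -> Frame:
    """Parse one complete client-to-server frame held in `data`."""
    first, second = data[0], data[1]
    fin = bool(first & 0x80)
    opcode = first & 0x0F
    masked = bool(second & 0x80)
    length = second & 0x7F            # the 7-bit length field
    offset = 2
    if length == 126:                 # real length is in the next 16 bits
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:               # real length is in the next 64 bits
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    if not masked:
        raise ValueError("client frames must be masked; drop the connection")
    key = data[offset:offset + 4]     # 4-octet masking key
    offset += 4
    masked_payload = data[offset:offset + length]
    payload = bytes(b ^ key[i % 4] for i, b in enumerate(masked_payload))
    return Frame(fin, opcode, payload)

def build_server_frame(payload: bytes, opcode: int = 0x1, fin: bool = True) -> bytes:
    """Build an unmasked server-to-client frame."""
    first = (0x80 if fin else 0x00) | opcode
    n = len(payload)
    if n <= 125:
        header = struct.pack("!BB", first, n)
    elif n <= 0xFFFF:
        header = struct.pack("!BBH", first, 126, n)
    else:
        header = struct.pack("!BBQ", first, 127, n)
    return header + payload
```

To reassemble a fragmented message you would keep appending payloads until a frame arrives with `fin` set to `True`.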
Keeping track of clients
This is really generic server behaviour. You don’t want to shake hands multiple times with a client that has already completed the handshake, and you should protect yourself from too many connection attempts from the same IP.
As you already know, each handshake carries its own randomly generated Sec-WebSocket-Key, which can serve as an id for the connection.
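One possible way to do this bookkeeping is sketched below. Everything here is an assumption made for illustration: the class name `ClientRegistry`, the limit of 5 attempts per 10 seconds, and the choice of a sliding window are not prescribed by the WebSocket protocol.

```python
import time
from collections import defaultdict, deque

class ClientRegistry:
    """Tracks connected clients and limits connection attempts per IP (hypothetical helper)."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 10.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clients = {}                    # client id -> IP address
        self.attempts = defaultdict(deque)   # IP address -> timestamps of recent attempts

    def allow_attempt(self, ip: str, now: float = None) -> bool:
        """Return False if this IP made too many attempts within the window."""
        now = time.monotonic() if now is None else now
        recent = self.attempts[ip]
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()                 # forget attempts older than the window
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True

    def register(self, client_id: str, ip: str) -> bool:
        """Remember a client after a successful handshake; False if already known."""
        if client_id in self.clients:
            return False
        self.clients[client_id] = ip
        return True

    def unregister(self, client_id: str) -> None:
        """Forget a client once its connection is closed."""
        self.clients.pop(client_id, None)
```

The idea is to call `allow_attempt` before answering a handshake, `register` once it succeeds, and `unregister` when the connection goes away.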
Playing Ping Pong
You know the rules:
when you get a ping, send back a pong
if you get a pong without ever having sent a ping, ignore it (unless you notice way too many pongs :)
Just remember to send back a pong with the exact same Payload Data as ping.
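As a small sketch, answering a ping means building a pong frame (opcode 0xA) that carries the ping’s payload unchanged. The function name `build_pong` is my own; the 125-byte limit is the one that applies to all control frames.

```python
OPCODE_PING = 0x9
OPCODE_PONG = 0xA

def build_pong(ping_payload: bytes) -> bytes:
    """Build an unmasked pong frame echoing the payload of a received ping."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    # FIN bit set (control frames cannot be fragmented) plus the pong opcode,
    # followed by the 7-bit length and the same payload the ping carried.
    return bytes([0x80 | OPCODE_PONG, len(ping_payload)]) + ping_payload
```

The payload passed in is the already unmasked data taken from the client’s ping frame.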
Closing the connection
The connection can be closed by either the client or the server by sending a control frame with the Close opcode (0x8) to begin the closing handshake.
Once our server receives such a frame, it should send a Close frame back (if it hasn’t already sent one), close the connection, and not accept any further data from the client (obviously until a reconnection :)
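Here is a minimal sketch of the server side of the closing handshake. A Close frame uses opcode 0x8, and its payload, if present, starts with a 2-byte status code (1000 means normal closure) optionally followed by a UTF-8 reason. The function names are hypothetical.

```python
import struct

def build_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Build an unmasked Close frame with a status code and optional reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return bytes([0x88, len(payload)]) + payload  # 0x88 = FIN bit + opcode 0x8

def parse_close_payload(payload: bytes):
    """Return (code, reason) from an unmasked Close payload; code is None if absent."""
    if len(payload) < 2:
        return None, ""
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")
```

On receiving a Close frame, a server could reply with `build_close_frame(code)` using the code it was given and then close the underlying socket.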
Extensions and subprotocols
These two topics deserve a much longer post, but the good news is you don’t need to handle them unless you know you need them in your project or you’re building a very generic WebSocket server for a wider audience.
Extensions can be used, for example, to compress data, and they use the frame-rsv1, frame-rsv2, and frame-rsv3 bits of the frame header (see the picture above). A client requests extensions by including a Sec-WebSocket-Extensions header field.
Subprotocols: each subprotocol requires its own implementation on the server side. A client requests subprotocols by including a Sec-WebSocket-Protocol header.
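To give a feel for subprotocol negotiation, here is a rough sketch: the client lists the subprotocols it speaks, comma-separated and in order of preference, and the server picks one it supports (or none). The function name `select_subprotocol` and the protocol names in the usage note are assumptions for illustration only.

```python
def select_subprotocol(header_value: str, supported: set):
    """Pick the first client-requested subprotocol the server supports, or None."""
    requested = [p.strip() for p in header_value.split(",") if p.strip()]
    for proto in requested:  # the client lists them in order of preference
        if proto in supported:
            return proto
    return None
```

For example, `select_subprotocol("chat, superchat", {"superchat"})` picks `superchat`; the server would then echo the chosen name back in its own Sec-WebSocket-Protocol response header, and omit the header entirely when nothing matches.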