WebSockets 101

December 4, 2016

Forget about frameworks for now…


Plenty of frameworks support WebSockets, but the purpose of every framework is to speed up building products by hiding the low-level details underneath, and I believe those details are important to know (and understand).

You probably know that WebSockets allow you to establish real-time communication between a web server and its clients. What’s cool about it is that the client can get a message from the server without having to request it explicitly. If your app primarily broadcasts information to its clients (like weather forecasts) and there’s not much interactivity, then there are better options out there, but if you need bidirectional communication that happens in both directions simultaneously, it’s the way to go.

Latency

You just need a single HTTP connection (by default on port 80, or 443 when using SSL) to open a WebSocket connection, which is then reused for exchanging messages. Because no new connection and no new set of HTTP headers is needed for each message, latency drops considerably.


The handshake uses HTTP to benefit from its popularity, but the design does not limit WebSocket to HTTP, so future implementations could use a different handshake without reinventing the protocol.

WebSocket is layered over TCP, but why can’t we just simply open a single TCP connection?
We could, but TCP alone deals with streams of bytes and has no inherent concept of messages. WebSocket adds message framing on top of it and exposes whole messages through the WebSocket API. With WebSocket, multi-byte messages arrive whole and in order, just like HTTP. Because message boundaries are built into the WebSocket Protocol, it is possible to send and receive separate messages and avoid common fragmentation mistakes.

Let’s shake hands

Both the client and the server have to shake hands. If the handshake is successful, the data transfer part starts and each side can send data at will, independently of the other.

The handshake from the client may look like this (must be GET and HTTP/1.1 or greater):

    GET /echo HTTP/1.1
    Host: echo.rafalgolarz.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: dGhlIHNhbXBbZSGub13jZQ==
    Sec-WebSocket-Version: 13

This request includes a special header: Upgrade. The Upgrade header indicates that the client would like to upgrade the connection to a different protocol (websocket in our case) and after a successful upgrade, the syntax of the connection switches over to the data-framing format used to represent WebSocket messages.

The handshake from the server (response) could be something like:

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: s2pPLMBjTxaQ1kIGzzhZRbK+xOo=

I focused only on the required headers; a few optional ones may be added to both handshakes. The reason phrase “Switching Protocols” after HTTP/1.1 101 can be replaced by any other description (only the 101 status code is required here).

The server has to prove to the client that it received the client’s WebSocket handshake. To do that, it takes two pieces of information and combines them to form a response. The first piece comes from the Sec-WebSocket-Key header field, which is a base64-encoded random value. The second piece is a constant key (a GUID) included in the protocol specification that every WebSocket server must know. The server concatenates the two, hashes the result with SHA-1 (not MD5), and returns the 20-byte digest, base64-encoded, as the Sec-WebSocket-Accept header.
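
To make the computation concrete, here is a minimal Node.js sketch using SHA-1, the hash RFC 6455 specifies. The function name computeAcceptKey is my own (it is not part of any library), and the key used at the end is the sample value from RFC 6455 rather than the one in the handshake above.

```javascript
// Node.js sketch: derive Sec-WebSocket-Accept from Sec-WebSocket-Key.
const crypto = require("crypto");

// The constant GUID defined by RFC 6455.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// computeAcceptKey is an illustrative name, not a library API.
function computeAcceptKey(secWebSocketKey) {
    return crypto
        .createHash("sha1")                       // 20-byte digest
        .update(secWebSocketKey + WEBSOCKET_GUID) // key and GUID concatenated
        .digest("base64");                        // base64 for the header value
}

// Sample key from RFC 6455
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

A client can run the same computation to verify that the server really understood the WebSocket handshake.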

2 more cents about headers sent by the client:

  • Sec-WebSocket-Version: the version for RFC 6455 (The WebSocket Protocol) is always 13. The server responds with this header if it does not support the version of the protocol requested by the client. In that case, the header sent by the server lists the versions it does support. This only happens if the client predates RFC 6455.

  • Origin (for example, Origin: http://rafalgolarz.com): browser clients always send this header, while other clients may omit it, so a connection attempt lacking this header field should not be interpreted as coming from a browser client. The values in these examples will obviously be different in your case :)
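
The checks a server might run on the opening handshake can be sketched roughly like this. validateHandshake and the shape of its return value are hypothetical, and I assume the headers arrive as an object with lower-cased names (the way Node’s http module exposes them). The 426 status is the code RFC 6455 suggests for an unsupported version.

```javascript
// Hypothetical helper: decide how to answer an opening handshake.
// headers is assumed to be an object keyed by lower-cased header names.
function validateHandshake(method, headers) {
    const upgrade = (headers["upgrade"] || "").toLowerCase();
    // Connection may list several tokens, e.g. "keep-alive, Upgrade"
    const connection = (headers["connection"] || "")
        .toLowerCase()
        .split(",")
        .map(function (token) { return token.trim(); });

    if (method !== "GET" || upgrade !== "websocket" ||
        !connection.includes("upgrade") || !headers["sec-websocket-key"]) {
        return { status: 400, headers: {} };  // not a valid WebSocket handshake
    }
    if (headers["sec-websocket-version"] !== "13") {
        // tell the client which version the server does support
        return { status: 426, headers: { "Sec-WebSocket-Version": "13" } };
    }
    return { status: 101, headers: {} };      // ready to switch protocols
}
```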

Messages

While a WebSocket connection is open, the client and server can send messages to each other at any time. A message is composed of one or more frames. A frame has an associated type. Each frame belonging to the same message contains the same type of data. Broadly speaking, there are types for textual data (UTF-8), binary data (whose interpretation is left up to the application), and control frames (protocol-level signaling, for example to signal that the connection should be closed).

There is typically only one frame in a message, but a message can be composed of any number of frames. Servers could use different numbers of frames to begin delivering data before the entirety of the data is available.
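
As a rough illustration of how one message maps onto several frames, the sketch below splits a text payload into frame descriptors (plain objects, not wire bytes). fragmentMessage and the object shape are made up for this post. The first frame carries the real type, the following ones use opcode 0 (continuation), and only the last one has the fin bit set.

```javascript
// Hypothetical helper: describe the frames for one text message.
// chunkSize must be a positive integer.
function fragmentMessage(text, chunkSize) {
    const bytes = Buffer.from(text, "utf8");
    if (bytes.length === 0) {
        // an empty message still needs one (final) frame
        return [{ opcode: 1, fin: true, payload: bytes }];
    }
    const frames = [];
    for (let offset = 0; offset < bytes.length; offset += chunkSize) {
        const end = Math.min(offset + chunkSize, bytes.length);
        frames.push({
            opcode: offset === 0 ? 1 : 0,   // 1 = text, 0 = continuation
            fin: end === bytes.length,      // true only on the last frame
            payload: bytes.subarray(offset, end)
        });
    }
    return frames;
}
```

The receiver simply concatenates the payloads in order until it sees a frame with fin set.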

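  • Opcodes specify the type of the frame payload. The field is 4 bits wide, so it can have up to 16 numerical values, but only six are defined by the official protocol: 0 marks a continuation frame of a fragmented message, and the other five are: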

    • 1 - text

    • 2 - binary

    • 8 - close (Close the connection. The body may contain UTF-8-encoded data with value /reason/ which is not necessarily human readable but may be useful for debugging or passing information relevant to the script that opened the connection.)

    • 9 - ping (may serve either as a keepalive or as a means to verify that the remote endpoint is still responsive)

    • 10 (hex 0xA) - pong (sent in response to a Ping frame and must have identical “Application data” as found in the message body of the Ping frame being replied to.)

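  • Length: the payload length is stored in 7 bits; the values 126 and 127 signal that the real length follows in the next 2 or 8 bytes.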

  • Decoding Text (UTF-8): the payload of a text message must be valid UTF-8 and is decoded after any unmasking.

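  • Masking is used to obfuscate the content of the message. Every frame sent from the client to the server is masked by XOR-ing the payload with a random 4-byte masking key carried in the frame header (frames from the server to the client are not masked). Every payload received by a WebSocket server is first unmasked before processing. After that, binary messages can be delivered directly, while text messages are UTF-8 decoded first.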

  • Multi-frame messages use the fin bit in the frame format for streaming of partially available messages, which may be fragmented or incomplete. To transmit an incomplete message, you can send a frame that has the fin bit set to zero (1 indicates that the message ends with that frame’s payload).
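
The frame fields listed above can be read out of the raw bytes with a few bit operations. This is a minimal Node.js sketch: decodeFrame is my own helper name, and it assumes the buffer holds exactly one complete frame (a real server also has to deal with partial reads). The sample bytes are the masked “Hello” text frame from the RFC 6455 examples.

```javascript
// Hypothetical helper: decode one complete WebSocket frame from a Buffer.
function decodeFrame(buf) {
    const fin = (buf[0] & 0x80) !== 0;     // top bit of the first byte
    const opcode = buf[0] & 0x0f;          // low 4 bits of the first byte
    const masked = (buf[1] & 0x80) !== 0;  // top bit of the second byte
    let length = buf[1] & 0x7f;            // 7-bit length or a marker
    let offset = 2;

    if (length === 126) {                  // real length in the next 2 bytes
        length = buf.readUInt16BE(offset);
        offset += 2;
    } else if (length === 127) {           // real length in the next 8 bytes
        length = Number(buf.readBigUInt64BE(offset));
        offset += 8;
    }

    let payload;
    if (masked) {
        const key = buf.subarray(offset, offset + 4); // 4-byte masking key
        offset += 4;
        payload = Buffer.alloc(length);
        for (let i = 0; i < length; i++) {
            payload[i] = buf[offset + i] ^ key[i % 4]; // unmask byte by byte
        }
    } else {
        payload = buf.subarray(offset, offset + length);
    }
    return { fin: fin, opcode: opcode, masked: masked, payload: payload };
}

// Masked "Hello" text frame from the RFC 6455 examples
const frame = decodeFrame(Buffer.from("818537fa213d7f9f4d5158", "hex"));
console.log(frame.opcode, frame.fin, frame.payload.toString("utf8"));
// prints: 1 true Hello
```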

After both sending and receiving a Close message, an endpoint considers the WebSocket connection closed and must close the underlying TCP connection. The server must close the underlying TCP connection immediately; the client should wait for the server to close the connection but may close the connection at any time after sending and receiving a Close message, e.g., if it has not received a TCP Close from the server in a reasonable time period.

Subprotocols

If you want to use simple protocols on top of WebSockets (like XML formats), you can add the Sec-WebSocket-Protocol header to the opening WebSocket handshake sent from the client to the server.

RFC 6455 refers to custom protocols as “subprotocols”, but they’re commonly called “protocols”.

Selecting a protocol does not change the syntax of the WebSocket Protocol itself. Instead, these protocols provide higher-level semantics for frameworks and applications. Standard protocols have been officially registered according to RFC 6455 and with the IANA (Internet Assigned Numbers Authority), the official governing body for registered protocols.
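
On the server side, picking a subprotocol boils down to comparing the names the client offered with the ones the server understands. A small sketch, where selectSubprotocol is a hypothetical helper and the protocol names in the usage line are just examples:

```javascript
// Hypothetical helper: choose one of the subprotocols offered by the client.
// headerValue is the raw Sec-WebSocket-Protocol request header (may be undefined).
function selectSubprotocol(headerValue, supported) {
    const offered = (headerValue || "")
        .split(",")
        .map(function (name) { return name.trim(); })
        .filter(function (name) { return name.length > 0; });
    // keep the client's order of preference; null means "no match"
    const match = offered.find(function (name) {
        return supported.includes(name);
    });
    return match || null;
}

console.log(selectSubprotocol("myProtocol, myProtocol2", ["myProtocol2"]));
// prints "myProtocol2"
```

The server echoes the chosen name back in its own Sec-WebSocket-Protocol response header, and the browser then exposes it as ws.protocol.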

WebSocket API

If you’re still with me, here’s where the fun begins.
I hope you now have a picture of what the WebSocket protocol is - it basically has two parts: a handshake and the data transfer.

The API is also straightforward and purely event driven (no need to poll the server for the most up-to-date status, the client simply listens for notifications).

If you’re not sure if your browser supports WebSockets (unlikely), copy and paste this fragment of JavaScript code into the console:

    if (window.WebSocket) {
        console.log("This browser supports WebSocket!");
    } else {
        console.log("This browser does not support WebSocket.");
    }


The Constructor

To establish a WebSocket connection to a server, you need to call the WebSocket constructor, which takes:

  • one required argument: the URL to which you want to connect

  • and one optional argument: protocols (either a single name or an array)


    //connect to the server
    var ws = new WebSocket("ws://www.rafalgolarz.com");

or

    //connect to the server using one protocol called protocolName
    var ws = new WebSocket("ws://www.rafalgolarz.com", "protocolName");

or

    //connect to the server with multiple protocol choices
    var ws = new WebSocket("ws://www.rafalgolarz.com", ["myProtocol", "myProtocol2"]);


The constructor returns a WebSocket object instance. You can listen for events on that object.

The Events

Before we use available methods for WebSocket objects, let’s find out how to interact with the WebSocket instance using four different events:

  • onopen

  • onmessage

  • onerror

  • onclose

To start listening for these events, you simply add callback functions to the WebSocket object or you can use the addEventListener() DOM method on your WebSocket objects. It’s important to implement these events before attempting to send a message.
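
The examples further down use the callback properties, so here is a short sketch of the addEventListener() variant. attachLogging is a hypothetical helper of mine. It accepts any EventTarget, which includes WebSocket instances, and a log function such as console.log.

```javascript
// Hypothetical helper: register one listener per WebSocket event.
function attachLogging(socket, log) {
    socket.addEventListener("open", function () {
        log("open");
    });
    socket.addEventListener("message", function (e) {
        log("message: " + e.data);
    });
    socket.addEventListener("error", function () {
        log("error");
    });
    socket.addEventListener("close", function (e) {
        log("close: " + e.code);
    });
}

// In a browser:
// attachLogging(new WebSocket("ws://www.rafalgolarz.com"), console.log);
```

Unlike the on* properties, addEventListener() lets you register more than one handler for the same event.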

onopen

To make sure that data is sent only once a connection is established, we should call the send() method (described further in the post) from inside the onopen handler:

    ws.onopen = function(e) {
        console.log("Connection open, we can try to send a message");
    };



onmessage

The server might send us messages at any time and whenever this happens the onmessage callback receives the message.

    ws.onmessage = function(e) {
        if (typeof e.data === "string") {
            console.log("String message received", e, e.data);
        } else {
            console.log("Other message received", e, e.data);
        }
    };

For binary messages, the default data format (the binaryType attribute) is ‘blob’, which is particularly useful when sending and receiving files.
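
If you would rather work with raw bytes, binaryType can be switched to "arraybuffer". The sketch below shows one way to tell the possible data types apart. describeMessage is an illustrative helper, and ws in the commented usage lines is the WebSocket object from the earlier examples.

```javascript
// Hypothetical helper: describe the data property of a message event.
function describeMessage(data) {
    if (typeof data === "string") {
        return "text, " + data.length + " characters";
    }
    if (data instanceof ArrayBuffer) {
        return "ArrayBuffer, " + data.byteLength + " bytes";
    }
    if (typeof Blob !== "undefined" && data instanceof Blob) {
        return "Blob, " + data.size + " bytes";
    }
    return "unknown";
}

// In a browser:
// ws.binaryType = "arraybuffer";   // switch from the default "blob"
// ws.onmessage = function (e) { console.log(describeMessage(e.data)); };
```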



onerror

Errors also cause WebSocket connections to close, so this event handler is a good place for your reconnection logic.

    ws.onerror = function(e) {
        console.log("WebSocket Error: " , e);
        yourFunctionToHandleErrors(e); // placeholder for your own handler
    };
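
One possible shape for the reconnection logic is an exponential backoff, sketched below. reconnectDelay and connectWithRetry are hypothetical names, and the 1 second base and 30 second cap are arbitrary values I picked. I schedule the retry from the close handler, because a close event follows the error anyway.

```javascript
// Hypothetical helper: delay before the next attempt, doubling each time.
function reconnectDelay(attempt, baseMs, maxMs) {
    // attempt 0 -> base, 1 -> 2 * base, 2 -> 4 * base ... capped at maxMs
    return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Browser-side sketch: reopen the connection whenever it closes.
// Note that it also reconnects after an intentional close().
function connectWithRetry(url, attempt) {
    attempt = attempt || 0;
    var ws = new WebSocket(url);
    ws.onopen = function () {
        attempt = 0;                          // start over after a success
    };
    ws.onclose = function () {
        setTimeout(function () {
            connectWithRetry(url, attempt + 1);
        }, reconnectDelay(attempt, 1000, 30000));
    };
    return ws;
}
```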



onclose

This event is fired when the WebSocket connection is closed.

    ws.onclose = function(e) {
        console.log("Connection closed", e);
    };

The event object has three useful properties you can use for error handling and recovery: wasClean, code, and reason.
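
Here is a small sketch of how the wasClean, code and reason properties of the close event could be turned into a log line. describeClose is a made-up helper, and only a handful of the status codes defined by RFC 6455 are covered.

```javascript
// Hypothetical helper: summarize a close event for logging.
function describeClose(e) {
    var kind = e.wasClean ? "clean" : "unclean";
    var meaning;
    switch (e.code) {
        case 1000: meaning = "normal closure"; break;
        case 1001: meaning = "endpoint going away"; break;
        case 1006: meaning = "abnormal closure, no Close frame received"; break;
        default: meaning = "code " + e.code;
    }
    return kind + " close (" + meaning + ")" + (e.reason ? ": " + e.reason : "");
}

// In a browser:
// ws.onclose = function (e) { console.log(describeClose(e)); };
```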

It’s time to call the actual methods

Once you establish a connection between your client and server using WebSocket, you can invoke two methods:

  • send

  • close


send

You use this method to send messages from your client to the server. After sending one or more messages, you can leave the connection open or call the close() method to terminate the connection.

    ws.onopen = function(e) {
        console.log("Connection open, we can try to send a message");
        ws.send("Finally sending something...");
    };

If you want to send the data only while the socket is open, you can check the readyState property:

        if (ws.readyState === WebSocket.OPEN) {
            ws.send(data);
        } else {
            //do something else
        }


The readyState attribute may have one of the four values:

  • CONNECTING (The connection is not yet open)

  • OPEN (The connection is open and ready to communicate)

  • CLOSING (The connection is in the process of closing)

  • CLOSED (The connection is closed or couldn’t be opened)

Knowing the current state can be very useful in troubleshooting your application.
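
Building on readyState, one option is to queue messages until the socket is open instead of dropping them. A sketch under my own assumptions: createQueuedSender is a hypothetical wrapper, and ws can be any object with readyState, send() and addEventListener(), such as a WebSocket.

```javascript
var OPEN = 1; // numeric value of WebSocket.OPEN

// Hypothetical wrapper: returns a send function that waits for the open state.
function createQueuedSender(ws) {
    var queue = [];

    function flush() {
        // send queued messages in order while the socket stays open
        while (queue.length > 0 && ws.readyState === OPEN) {
            ws.send(queue.shift());
        }
    }

    ws.addEventListener("open", flush);

    return function send(data) {
        queue.push(data);
        flush();
    };
}

// In a browser:
// var send = createQueuedSender(new WebSocket("ws://www.rafalgolarz.com"));
// send("queued until the connection opens");
```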


close

This method is used to close the WebSocket connection or to terminate an attempt to connect. If the connection is already closed, then the method does nothing.

    ws.close();

You can optionally pass the code (a numerical status code) and reason (a text string).

    ws.close(1000, "Closing the WebSocket connection normally");



Attributes

I already covered one of the attributes, readyState, and there is also “protocol” (the subprotocol selected by the server), but I want to focus on another one: bufferedAmount.

You can check the number of bytes that have been queued but not yet transmitted to the server. This is very useful if the client application transports large amounts of data to the server.

    var MAX_BUFFER = 8192;

    var ws = new WebSocket("ws://echo.rafalgolarz.com/updates");

    ws.onopen = function () {

        // let's try to send every second
        setInterval( function() {

            if (ws.bufferedAmount < MAX_BUFFER) {
                ws.send(checkTheStatusAndSendData());
            }
        }, 1000);
    };



In the next post I will show you how to handle WebSocket connections on the server side.


Sources:
[1] RFC 6455 - https://tools.ietf.org/html/rfc6455
[2] The Definitive Guide to HTML5 WebSocket, Apress 2013
[3] MDN - https://developer.mozilla.org/en-US/docs/Web/API/WebSocket