03. Code A TCP Server
Our first step is to get familiar with the socket API, so we will code a simple TCP server in this chapter.
3.1 TCP Quick Review
Layers of Protocols
Network protocols are divided into different layers, where a higher layer depends on the lower layers, and each layer provides different capabilities.
  top
   /\     | App |    message or whatever
   ||     | TCP |    byte stream
   ||     | IP  |    packets
   ||     | ... |
  bottom
The layer below TCP is the IP layer. Each IP packet is a message with 3 components:
- The sender’s address.
- The receiver’s address.
- The message data.
Communication with a packet-based scheme is not easy. There are lots of problems for applications to solve:
- What if the message data exceeds the capacity of a single packet?
- What if the packet is lost?
- Out-of-order packets?
To make things simple, the next layer is added on top of IP packets. TCP provides:
- Byte streams instead of packets.
- Reliable and ordered delivery.
A byte stream is simply an ordered sequence of bytes. A protocol, rather than the application, is used to make sense of these bytes. Protocols are like file formats, except that the total length is unknown and the data is read in one pass.
UDP is on the same layer as TCP, but is still packet-based like the lower layer. UDP just adds port numbers over IP packets.
TCP Byte Stream vs. UDP Packet
The key difference: boundaries.
- UDP: Each read from a socket corresponds to a single write from the peer.
- TCP: No such correspondence! Data is a continuous flow of bytes.
TCP simply has no mechanism for preserving boundaries.
- TCP send buffer: This is where data is stored before transmission. Multiple writes are indistinguishable from a single write.
- Data is encapsulated as one or more IP packets; packet boundaries have no relationship to the original write boundaries.
- TCP receive buffer: Data is available to applications as it arrives.
The No. 1 beginner trap in socket programming is “concatenating & splitting TCP packets” because there is no such thing as “TCP packets”. Protocols are required to interpret TCP data by imposing boundaries within the byte stream.
Byte Stream vs. Packet: DNS as an Example
To help you understand the implications of the byte stream, let’s use the DNS protocol (domain name to IP address lookup) as an example.
DNS runs on UDP, the client sends a single request message and the server responds with a single response message. A DNS message is encapsulated in a UDP packet.
| IP header | IP payload                   |
            \............................../
             | UDP header | UDP payload    |
                          \................/
                           | DNS message  |
Due to the drawbacks of packet-based protocols, e.g., the inability to use large messages, DNS is also designed to run on TCP. But TCP knows nothing about “messages”, so when sending DNS messages over TCP, a 2-byte length field is prepended to each DNS message so that the server or client can tell which part of the byte stream is which message. This 2-byte length field is the simplest example of an application protocol on top of TCP. This protocol allows for multiple application messages (DNS) in a single TCP byte stream.
| len1 | msg1 | len2 | msg2 | ...
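The framing scheme above can be sketched in TypeScript. Note that `encodeMsg` and `decodeMsg` are hypothetical helper names for illustration, not library functions:

```typescript
// Hypothetical helpers illustrating length-prefixed framing:
// each message is preceded by a 2-byte big-endian length,
// the same scheme DNS-over-TCP uses.
function encodeMsg(msg: Buffer): Buffer {
    const header = Buffer.alloc(2);
    header.writeUInt16BE(msg.length, 0);    // the 2-byte length field
    return Buffer.concat([header, msg]);
}

// Try to cut one complete message from the front of the buffered stream
// data; return null if the message has not fully arrived yet.
function decodeMsg(buf: Buffer): { msg: Buffer, rest: Buffer } | null {
    if (buf.length < 2) {
        return null;                        // incomplete length field
    }
    const len = buf.readUInt16BE(0);
    if (buf.length < 2 + len) {
        return null;                        // incomplete message body
    }
    return { msg: buf.subarray(2, 2 + len), rest: buf.subarray(2 + len) };
}

// two messages written back-to-back arrive as one undivided byte stream,
// yet the length prefixes let the receiver recover the boundaries
const stream = Buffer.concat([
    encodeMsg(Buffer.from('hello')),
    encodeMsg(Buffer.from('world')),
]);
const first = decodeMsg(stream)!;
const second = decodeMsg(first.rest)!;
console.log(first.msg.toString(), second.msg.toString());   // hello world
```

The receiver must buffer incoming bytes and retry decoding as more data arrives, which is exactly why `decodeMsg` returns null rather than throwing on partial input.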
TCP Starts with a Handshake
To establish a TCP connection, there should be a client and a server (ignoring the simultaneous case). The server waits for the client at a specific address (IP + port); this step is called bind & listen. Then the client can connect to that address. The “connect” operation involves a 3-step handshake (SYN, SYN-ACK, ACK), but this is not our concern because the OS does it transparently. After the OS completes the handshake, the connection can be accepted by the server.
TCP is Bidirectional & Full-Duplex
Once established, the TCP connection can be used as a bi-directional byte stream, with 2 channels, one for each direction. Many protocols are request-response like HTTP/1.1, where a peer is either sending a request/response or receiving a response/request. But TCP isn’t restricted to this mode of communication. Each peer can send and receive at the same time (e.g. WebSocket); this is called full-duplex communication.
TCP Ends with 2 Handshakes
A peer tells the other side that no more data will be sent with the FIN flag, then the other side ACKs the FIN. The remote application is notified of the termination when reading from the channel.
Each direction (channel) can be terminated independently, so the other side also performs the same handshake to fully close the connection.
3.2 Socket Primitives
The socket API comes in different shapes in different languages and libraries. You are likely to get confused if you jump into the API documentation without knowing the basics.
Applications Refer to Sockets by Opaque OS Handles
When you create a TCP connection, the connection is managed by your operating system, and you use the socket handle to refer to the connection in the socket API. In Linux, a socket handle is simply a file descriptor (fd). In Node.js, socket handles are wrapped into JS objects with methods on them.
Any OS handle must be closed by the application to terminate the underlying resource and recycle the handle.
Listening Socket & Connection Socket
A TCP server listens on a particular address (IP + port) and accepts client connections from that address. The listening address is also represented by a socket handle. And when you accept a new client connection, you get the socket handle of the TCP connection.
Now you know that there are 2 types of socket handles.
- Listening sockets. Obtained by listening on an address.
- Connection sockets. Obtained by accepting a client connection from a listening socket.
End of Transmission
Send and receive are also called read and write. For the write side, there are ways to tell the peer that no more data will be sent.
- Closing a socket terminates a connection and causes the TCP FIN to be sent. Closing a handle of any type also recycles the handle itself. (Once the handle is gone, you cannot do anything with it.)
- You can also shutdown your side of the transmission (also send FIN) while still being able to receive data from the peer; this is called a half-open connection, more on this later.
For the read side, there are ways to know when the peer has ended the transmission (received FIN). The end of transmission is often called the end of file (EOF).
List of Socket Primitives
In summary, there are several socket primitives that you need to know about.
- Listening socket:
- bind & listen
- accept
- close
- Connection socket:
- read
- write
- close
3.3 Socket API in Node.js
We will introduce the socket API with a small exercise: a TCP server that reads data from clients and writes the same data back. This is called an “echo server”.
Step 1: Create A Listening Socket
All the networking stuff is in the net module.

import * as net from "net";

Different types of sockets are represented as JS objects. The net.createServer() function creates a listening socket whose type is net.Server. net.Server has a listen() method to bind and listen on an address.

let server = net.createServer();
server.listen({host: '127.0.0.1', port: 1234});
Step 2: Accept New Connections
The next thing is the accept primitive for getting new connections. Unfortunately, there is no accept() function that simply returns a connection.
Here we need some background knowledge about IO in JS. There are 2 styles of handling IO in JS; the first style uses callbacks: you request something to be done and register a callback with the runtime, and when the thing is done, the callback is invoked.
function newConn(socket: net.Socket): void {
console.log('new connection', socket.remoteAddress, socket.remotePort);
// ...
}
let server = net.createServer();
server.on('connection', newConn);
server.listen({host: '127.0.0.1', port: 1234});
In the above code listing, server.on('connection', newConn) registers the callback function newConn. The runtime will automatically perform the accept operation and invoke the callback with the new connection as an argument of type net.Socket. This callback is registered once, but will be called for each new connection.
Step 3: Error Handling
The 'connection' argument is called an event, which is something you can register callbacks on. There are other events on a listening socket. For example, there is the 'error' event, which is invoked when an error occurs.

server.on('error', (err: Error) => { throw err; });
Here we simply throw the exception and terminate the program. You can test this by running 2 servers on the same address and port; the second server will fail.
As this book is not a manual, we will not list everything here. Read the Node.js documentation to find out other potentially useful events.
Step 4: Read and Write
Data received from the connection is also delivered via callbacks.
The relevant events for reading from a socket are the 'data' event and the 'end' event. The 'data' event is invoked whenever data arrives from the peer, and the 'end' event is invoked when the peer has ended the transmission.
socket.on('end', () => {
    // FIN received. The connection will be closed automatically.
    console.log('EOF.');
});
socket.on('data', (data: Buffer) => {
    console.log('data:', data);
    socket.write(data); // echo back the data.
});
The socket.write() method sends data back to the peer.
Step 5: Close The Connection
The socket.end() method ends the transmission and closes the socket. Here we call socket.end() when the data contains the letter “q” so we can easily test this scenario.
socket.on('data', (data: Buffer) => {
    console.log('data:', data);
    socket.write(data); // echo back the data.

    // actively close the connection if the data contains 'q'
    if (data.includes('q')) {
        console.log('closing.');
        socket.end(); // this will send FIN and close the connection.
    }
});
When the transmission is ended from either side, the socket is automatically closed by the runtime. There is also the 'error' event on net.Socket that reports IO errors. This event also causes the runtime to close the socket.
Step 6: Test It
Here is the complete code for our echo server.
import * as net from "net";

function newConn(socket: net.Socket): void {
    console.log('new connection', socket.remoteAddress, socket.remotePort);
    socket.on('end', () => {
        // FIN received. The connection will be closed automatically.
        console.log('EOF.');
    });
    socket.on('data', (data: Buffer) => {
        console.log('data:', data);
        socket.write(data); // echo back the data.

        // actively close the connection if the data contains 'q'
        if (data.includes('q')) {
            console.log('closing.');
            socket.end(); // this will send FIN and close the connection.
        }
    });
}

let server = net.createServer();
server.on('error', (err: Error) => { throw err; });
server.on('connection', newConn);
server.listen({host: '127.0.0.1', port: 1234});
Start the echo server by running node --enable-source-maps echo_server.js. And test it with the nc or socat command.
3.4 Discussion: Half-Open Connections
Each direction of a TCP connection is ended independently, and it is possible to make use of the state where one direction is closed and the other is still open; this unidirectional use of TCP is called TCP half-open. For example, if peer A half-closes the connection to peer B:
- A cannot send any more data, but can still receive from B.
- B gets EOF, but can still send to A.
Not many applications make use of this. Most applications treat EOF the same way as being fully closed by the peer, and will also close the socket immediately.
The socket primitive for this is called shutdown. Sockets in Node.js do not support half-open by default, and are automatically closed when either side sends or receives EOF. To support TCP half-open, an additional flag is required.
let server = net.createServer({allowHalfOpen: true});
When the allowHalfOpen flag is enabled, you are responsible for closing the connection, because socket.end() will no longer close the connection, but will only send EOF. Use socket.destroy() to close the socket manually.
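To make this concrete, here is a small self-contained sketch of a half-open exchange. The function name, port choice (ephemeral), and message strings are all made up for illustration: the client sends its FIN first, yet still receives a reply, because the server was created with allowHalfOpen.

```typescript
import * as net from "net";

// demo(): a throwaway illustration of TCP half-open. The server allows
// half-open, so after the client's FIN arrives, the server's direction
// of the connection is still usable for sending a reply.
function demo(): Promise<string> {
    return new Promise((resolve) => {
        const server = net.createServer({allowHalfOpen: true}, (socket: net.Socket) => {
            socket.on('data', () => {});    // drain whatever the client sent
            socket.on('end', () => {
                // the client's FIN arrived, but our direction is still open
                socket.write('reply after your EOF');
                socket.end();               // now send our own FIN
            });
        });
        server.listen(0, '127.0.0.1', () => {
            const port = (server.address() as net.AddressInfo).port;
            const client = net.connect({port: port, host: '127.0.0.1'});
            let got = '';
            client.on('connect', () => client.end('hello'));    // data + FIN
            client.on('data', (data: Buffer) => { got += data.toString(); });
            client.on('end', () => {
                server.close();
                resolve(got);   // data received after we half-closed
            });
        });
    });
}

demo().then((got) => console.log('received after half-close:', got));
```

Here the server calls socket.end() after writing, since both directions are then finished; socket.destroy() is for abandoning the socket without completing the handshake.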
3.5 Discussion: The Event Loop & Concurrency
JS Code Runs Within the Event Loop
As you can see, callbacks are needed to do anything in our echo server. This is how an event loop works. It’s a mechanism of the Node.js runtime that is invisible to the programmer. The runtime does something like this:
// pseudo code!
while (running) {
    let events = wait_for_events(); // blocking
    for (let e of events) {
        do_something(e); // may invoke callbacks
    }
}
The runtime polls for IO events from the OS, such as a new connection arriving, a socket becoming ready to read, or a timer expiring. Then the runtime reacts to the events and invokes the callbacks that the programmer registered earlier. This process repeats after all events have been handled, thus it’s called the event loop.
JS Code and Runtime Share a Single OS Thread
The event loop is single-threaded; execution is either on the runtime code or on the JS code (callbacks or the main program). This works because when a callback returns, or awaits, control is back to the runtime, so the runtime can emit events and schedule other tasks. This implies that any JS code is expected to finish in a short time because the event loop is halted when executing JS code.
Concurrency in Node.JS is Event-Based
To help you understand the implication of the event loop, let’s now consider concurrency. A server can have multiple connections simultaneously, and each connection can emit events.
While an event handler is running, the single-threaded runtime cannot do anything for the other connections until the handler returns. The longer you process an event, the longer everything else is delayed.
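A tiny experiment makes this delay visible (the durations are arbitrary): a 10 ms timer fires at roughly 100 ms instead, because a busy loop keeps the thread away from the event loop.

```typescript
const start = Date.now();
let firedAfter = 0;
setTimeout(() => {
    firedAfter = Date.now() - start;
    // fires at ~100 ms instead of 10 ms: the callback could not run
    // while the busy loop below was monopolizing the thread.
    console.log(`timer fired after ${firedAfter} ms`);
}, 10);

// simulate a long-running event handler with a ~100 ms busy wait
const deadline = Date.now() + 100;
while (Date.now() < deadline) {
    // spinning; the event loop is halted during this time
}
```

In a real server, the "busy loop" would be your own slow event handler, and the delayed timer would be every other connection's IO.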
3.6 Discussion: Asynchronous vs. Synchronous
Blocking & Non-Blocking IO
It’s vital to avoid staying in the event loop for too long. One way to cause such trouble is to run CPU-intensive code. This can be solved by …
- Yielding to the runtime voluntarily.
- Moving the CPU-intensive code out of the event loop via multi-threading or multi-processing.
These topics are beyond the scope of this book, and our primary concern is IO.
The OS provides both blocking mode and non-blocking mode for network IO.
- In blocking mode, the calling OS thread blocks until the result is ready.
- In non-blocking mode, the OS immediately returns if the result is not ready (or is ready), and there is a way to be notified of readiness (for event loops).
The Node.JS runtime uses non-blocking mode because blocking mode is incompatible with event-based concurrency. The only blocking operation in an event loop is polling the OS for more events when there is nothing to do.
IO in Node.js is Asynchronous
Most Node.js library functions related to IO are either callback-based or promise-based. Promises can be viewed as another way to manage callbacks. These are also described as asynchronous, meaning that the result is delivered via a callback. These APIs do not block the event loop because the JS code doesn’t wait for the result; instead, the JS code returns to the runtime, and when the result is ready, the runtime invokes the callback to continue your program.
The opposite is the synchronous API, which blocks the calling OS thread to wait for the result. For example, let’s take a look at the documentation of the fs module; file APIs are available in all 3 types.
// promise
filehandle.read([options]);
// callback
fs.read(fd[, options], callback);
// synchronous, do not use!
fs.readSync(fd, buffer[, options]);
Synchronous APIs are what you do NOT use in network applications since they block the event loop. They exist for some simple use cases (like scripting) that do not depend on the event loop at all.
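To see the three styles side by side, here is a small runnable sketch. The temp-file path and its contents are ours, used only for illustration:

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// set up a throwaway file to read back in all three styles
const file = path.join(os.tmpdir(), 'demo_3styles.txt');
fs.writeFileSync(file, 'hello');

// 1. synchronous: blocks the calling thread until the data is read.
//    acceptable in scripts, wrong in a network server.
const sync: string = fs.readFileSync(file, 'utf8');
console.log('sync:', sync);

// 2. callback-based: returns immediately; the result arrives later
//    via the callback, invoked by the event loop.
fs.readFile(file, 'utf8', (err, data) => {
    if (err) throw err;
    console.log('callback:', data);
});

// 3. promise-based: same non-blocking behavior, but awaitable.
(async () => {
    const data = await fs.promises.readFile(file, 'utf8');
    console.log('promise:', data);
})();
```

All three produce the same data; the difference is purely in how control flow returns to your program.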
Event-Based Programming Beyond Networking
IO is more than disk files and networking. In GUI systems, user input from the mouse and keyboard is also IO. And event loops are not unique to the Node.js runtime; Web browsers and all other GUI applications also use event loops under the hood. You can transfer your experience in GUI programming to network programming and vice versa.
3.7 Discussion: Promise-Based IO
As we mentioned before, there is another style of writing IO code. The alternative style uses Promises instead of callbacks. The advantage of promise-based APIs is that you can await on them and get the result, thus avoiding breaking your program into tiny callbacks scattered all over the place.
A hypothetical promise-based API for the accept primitive looks like this:
// pseudo code!
while (running) {
    let socket = await server.accept();
    newConn(socket); // no `await` on this
}
And the hypothetical API for the read and write primitive looks like this:
// pseudo code!
async function newConn(socket) {
    while (true) {
        let data = await socket.read();
        if (!data) {
            break; // EOF
        }
        await socket.write(data);
    }
}
The above pseudo code appears to be synchronous, but without blocking the event loop. The advantage may not be clear at this point, since our program is very simple.
Some Node.js APIs, but not all of them, are available in both callback-based and promise-based styles. However, with some effort, callback-based APIs can be converted to promise-based ones, as we will see in the next chapter.
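As a preview, here is one hypothetical way to wrap the callback-based accept into a promise. The name soAccept is made up; the wrapper actually used in this book is built in the next chapter:

```typescript
import * as net from "net";

// soAccept(): resolve with the next incoming connection, turning the
// 'connection' event into a promise. A production version would also
// remove the stale 'error' listener after resolving.
function soAccept(server: net.Server): Promise<net.Socket> {
    return new Promise((resolve, reject) => {
        server.once('connection', resolve);
        server.once('error', reject);
    });
}

// usage: accept a single connection, then shut everything down.
const server = net.createServer();
server.listen(0, '127.0.0.1', async () => {
    const port = (server.address() as net.AddressInfo).port;
    const client = net.connect({port: port, host: '127.0.0.1'});
    const socket = await soAccept(server);
    console.log('accepted from', socket.remoteAddress);
    socket.destroy();
    client.destroy();
    server.close();
});
```

The key idea: one event occurrence becomes one resolved promise, which is why the loop in the pseudo code calls the wrapper once per connection.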