Build Your Own Web Server From Scratch In JavaScript

05. Your First Network Protocol

Now that we have learned the basics of the socket API, the next step is to handle the HTTP protocol.

As mentioned in an earlier chapter, HTTP is a request-response protocol, and a TCP connection can be used for any number of requests and responses (ignoring the ancient HTTP/1.0 version). The first thing to consider when designing a network protocol is how to separate messages (requests or responses) from each other. Since TCP offers only a stream of bytes, and bytes themselves have no boundaries, splitting the stream into messages is usually the first step in parsing a protocol.

Parsing HTTP is a non-trivial amount of work, so let’s take a smaller step first. We will implement a much simpler protocol in this chapter to illustrate the techniques before tackling HTTP in the next chapter.

5.1 Message Echo Server

Our protocol is simple, consisting of messages separated by '\n' (the newline character). The server processes client messages and sends back responses in the same protocol. If the client message is 'quit', the server responds with 'Bye.' and closes the connection, otherwise, the server echoes the client message back with the prefix 'Echo: '.

client      server
------      ------

msg1\n  ==>
       <==  Echo: msg1\n
msg2\n  ==>
       <==  Echo: msg2\n
quit\n  ==>
       <==  Bye.\n

This is similar to the echo server in the last chapter, but with a protocol instead of blindly writing back bytes.

5.2 Protocol Implementation

Step 1: Dynamic-Sized Buffer

The Buffer type represents a chunk of binary data of fixed size. But dynamic-sized buffers are also very common, and we will use one later. Below is a common implementation of a dynamic-sized buffer.

// A dynamic-sized buffer
type DynBuf = {
    data: Buffer,
    length: number,

// append data to DynBuf
function bufPush(buf: DynBuf, data: Buffer): void {
    const newLen = buf.length + data.length;
    if ( < newLen) {
        // grow the capacity by the power of two
        let cap = Math.max(, 32);
        while (cap < newLen) {
            cap *= 2;
        }
        const grown = Buffer.alloc(cap);, 0, 0); = grown;
    }
    data.copy(, buf.length, 0);
    buf.length = newLen;

Basically, any dynamic-sized array looks like this.

Step 2: The Server Loop

At a high level, the server should be a loop.

  1. Parse and remove a complete message from the incoming byte stream.
  2. Handle the message.
  3. Send the response.

Let’s look at the first step in the loop: parse and remove a complete message from the incoming byte stream.

When you read from the socket, you get a piece of data. But that piece of data can be anything in the byte stream: half of a message, multiple messages, or multiple and a half messages. What we can do is collect the data into a buffer, and only handle it once the buffer contains a complete message.

Our server loop:

async function serveClient(socket: net.Socket): Promise<void> {
    const conn: TCPConn = soInit(socket);
    const buf: DynBuf = {data: Buffer.alloc(0), length: 0};
    while (true) {
        // try to get 1 message from the buffer
        const msg: null|Buffer = cutMessage(buf);
        if (!msg) {
            // need more data
            const data: Buffer = await soRead(conn);
            bufPush(buf, data);
            // EOF?
            if (data.length === 0) {
                // omitted ...
                return;
            }
            // got some data, try it again.
            continue;
        }

        // process the message and send the response
        // omitted ...
    } // loop for messages
}

The data is first concatenated into a buffer; then we try to parse a message from the buffer. If the buffer doesn't contain a complete message, we try again after we get more data.

Note that the buffer is not bound to a single request; if there is more data in the buffer after a message, the remaining data is carried over to the next iteration so that the next message can be parsed correctly.

Step 3: Split Messages

Let’s code the cutMessage function which removes one message from the buffer.

// parse & remove a message from the beginning of the buffer if possible
function cutMessage(buf: DynBuf): null|Buffer {
    // messages are separated by '\n'
    const idx ='\n');
    if (idx < 0 || idx >= buf.length) {
        return null;    // not complete
    }
    // make a copy of the message and move the remaining data to the front
    const msg = Buffer.from(, idx + 1));
    bufPop(buf, idx + 1);
    return msg;
}

// remove data from the front
function bufPop(buf: DynBuf, len: number): void {, 0, len, buf.length);
    buf.length -= len;
}

The current message is always at the beginning of the buffer. When a complete message is present, it is removed by moving the rest of the data to the front, so the buffer can be reused. However, this way of handling buffers is not optimal; more on this later.

From this function you can learn some Buffer methods: subarray() creates a view without copying, Buffer.from() copies data into a new buffer, and copy() moves bytes between buffer regions, even overlapping ones.

Step 4: Handle Requests

Requests are handled in the server loop and responses are sent.

    while (true) {
        // try to get 1 message from the buffer
        const msg = cutMessage(buf);
        if (!msg) {
            // omitted ...
            continue;
        }

        // process the message and send the response
        if (msg.equals(Buffer.from('quit\n'))) {
            await soWrite(conn, Buffer.from('Bye.\n'));
            socket.destroy();
            return;
        } else {
            const reply = Buffer.concat([Buffer.from('Echo: '), msg]);
            await soWrite(conn, reply);
        }
    } // loop for messages

Our message echo server is now complete. You can test it with the socat command.

5.3 Discussion: Pipelined Requests

Our server code assumes that there can be more than one message in the buffer. You may wonder how this can happen, since in a request-response scenario, the client waits for the response before sending more requests. It can happen because of an optimization.

Reduce Latency by Pipelining Requests

Consider a typical modern web page that involves lots of scripts, images, and other resources. It takes many additional HTTP requests to load the page, and each request increases the load time by at least one round-trip time (RTT). If we can send multiple requests at once, without waiting for the responses one by one, the load time can be greatly reduced. On the server side, nothing looks different, because a TCP connection is just a byte stream.

 client            server
 ------            ------

 |req1|  ==>
 |req2|  ==>
 |req3|  ==>  <==  |res1|
  ...         <==  |res2|

This is called pipelined requests. It’s a common way to reduce RTTs in request-response protocols, as long as the requests don’t depend on the previous response.

Support Pipelined Requests

While you can use pipelined requests with many well-implemented network servers, such as Redis and NGINX, some less common implementations are problematic. Web browsers do not use pipelined HTTP requests by default because of such buggy servers.

But as you can see from our code, supporting pipelined requests requires no additional work at all, as long as you treat the TCP connection as a proper byte stream and make no inappropriate assumptions about message boundaries.

You can test the pipelined scenario with the following command line.

echo -e 'asdf\n1234' | socat tcp: -

The server will probably receive both messages in a single 'data' event, and it handles them correctly.

Deadlock by Pipelining

A caveat about pipelining: too many pipelined requests can lead to a deadlock. Both the server and the client may be sending at the same time, and if both of their send buffers are full, each side is stuck sending while failing to drain the incoming data, so neither can make progress.

5.4 Discussion: Smarter Buffers

Dynamic Array and Big-O

When we repeatedly append data to a buffer of a fixed size, we have to allocate a larger one. If we allocate just enough space to hold the new data, we must allocate a new buffer and copy the old data on every append, leading to O(n²) behavior. That's why we created an appendable buffer with extra capacity and made it grow exponentially.

// O(n*n) complexity. Don't do this!
while (need_more_data()) {
    buf = Buffer.concat([buf, get_some_data()]);
}

Removing Data from the Front

However, there is still O(n²) behavior in our buffer code: whenever we remove a message from the buffer, we move the remaining data to the front. This can be fixed by deferring the data movement: keep the remaining data in place until the wasted space at the front reaches a threshold, such as half of the buffer capacity.

This fix requires us to keep track of the beginning of the data in the buffer. The coding is left as an exercise for the reader.

Using a Fixed-Sized Buffer Without Reallocations

Another optimization of the buffer code: we can parse messages with a fixed-sized buffer, as long as there is a hard limit on the message size. A message size limit is reasonable when designing practical network protocols. If the message size is limited, we only need a fixed-sized buffer large enough to hold one message. The advantages are that no dynamic memory management is required and memory usage is easily predictable. These upsides may not be obvious in Node.js, but they are desirable in environments with manual memory management or constrained hardware.

5.5 Discussion: Buffered Reader


5.6 Conclusion: Network Programming Basics

Although we have not implemented anything specific to HTTP, we have learned the basics of network programming. If we replace the simple protocol here with the HTTP protocol, we get an HTTP server, which we will do in the next chapter.

A summary of what we have learned so far: TCP is a byte stream with no message boundaries; incoming data is collected in a dynamic buffer; complete messages are cut from the front of that buffer; and pipelined requests work for free as long as the byte stream is handled correctly.

