Build Your Own Web Server From Scratch In JavaScript

🆕 This chapter is part of the WIP book:
Build Your Own Web Server From Scratch In JavaScript


02. HTTP Overview

2.1 Overview

As you may already know, the HTTP protocol sits above the TCP protocol. How TCP itself works in detail is not our concern; what we need to know about TCP is that it’s a bidirectional channel for transmitting raw bytes — a carrier for other application protocols such as HTTP or SSH.

Although each direction of a TCP connection can operate independently, many protocols follow the request-response model. The client sends a request, then the server sends a response, then the client might use the same connection for further requests and responses.

 client        server
 ------        ------

| req1 |  ==>
         <==  | res1 |
| req2 |  ==>
         <==  | res2 |

An HTTP request or response consists of a header followed by an optional payload. The header contains the URL of the request, or the response code, followed by a list of header fields.
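TCP delivers a stream of bytes, not discrete messages, so a program has to find where the header ends by itself. In HTTP/1.x, each header line ends with CRLF (`\r\n`), and the header ends at the first empty line, which is the byte sequence `\r\n\r\n`. The sketch below splits a message that is already fully in memory. `splitMessage` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: split a raw HTTP/1.x message (a Buffer) into its
// first line, its header field lines, and its payload.
function splitMessage(raw) {
  const sep = Buffer.from("\r\n\r\n"); // the empty line that ends the header
  const end = raw.indexOf(sep);
  if (end < 0) {
    return null; // header incomplete: more bytes are needed from the connection
  }
  // Header lines are separated by CRLF; the first is the request or status line.
  const lines = raw.subarray(0, end).toString("latin1").split("\r\n");
  return {
    firstLine: lines[0],
    fieldLines: lines.slice(1),
    payload: raw.subarray(end + sep.length), // may be empty, as for a GET request
  };
}

const msg = Buffer.from(
  "HTTP/1.0 200 OK\r\nContent-Length: 5\r\nConnection: close\r\n\r\nhello"
);
const parts = splitMessage(msg);
console.log(parts.firstLine); // HTTP/1.0 200 OK
console.log(parts.fieldLines); // [ 'Content-Length: 5', 'Connection: close' ]
console.log(parts.payload.toString()); // hello
```

A real server receives bytes in arbitrary chunks, so it would append each chunk to a buffer and call a function like this until it stops returning `null`.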

2.2 HTTP by Example

To get the hang of network protocols, let’s start by making an HTTP request from the command line.

Run the netcat command:

nc 80

The nc (netcat) command creates a TCP connection to the destination host and port, and then attaches the connection to stdin and stdout. We can now start typing in the terminal and the data will be transmitted over the connection:

GET / HTTP/1.0
(empty line)

(Note the empty line at the end! It marks the end of the request header.)
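The same session can be scripted in Node.js with the built-in `net` module. HTTP specifies CRLF (`\r\n`) line endings, so this sketch uses them; typing into nc usually sends a bare LF, which some servers also accept. `rawRequest` is a hypothetical helper, and the host and port are whatever server you want to talk to.

```javascript
const net = require("net");

// Hypothetical helper: a scripted version of the nc session. It connects,
// writes the raw request, and collects everything the server sends until the
// server closes the connection (an HTTP/1.0 server closes it after responding).
function rawRequest(host, port, text) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = net.createConnection({ host, port }, () => {
      socket.write(text); // like typing the request into the terminal
    });
    socket.on("data", (chunk) => chunks.push(chunk));
    socket.on("end", () => resolve(Buffer.concat(chunks).toString("latin1")));
    socket.on("error", reject);
  });
}

// The request from the session above, with CRLF line endings.
// The final "\r\n" is the empty line that ends the header.
const request = "GET / HTTP/1.0\r\n\r\n";
```

For example, `rawRequest(host, 80, request).then(console.log)` prints the raw response text, header and payload included.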

We will get the following response:

HTTP/1.0 200 OK
Age: 525410
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Thu, 20 Oct 2020 11:11:11 GMT
Etag: "1234567890+gzip+ident"
Expires: Thu, 20 Oct 2020 11:11:11 GMT
Last-Modified: Thu, 20 Oct 2019 11:11:11 GMT
Server: ECS (ddd/EEEE)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
Connection: close

<!doctype html>
<!-- omitted -->

You might be surprised at how easy it is to make a raw HTTP request from the command line if you have never seen this before. Let’s take a look at the data and try to figure out what it means.

The first line of the request — GET / HTTP/1.0 — contains the HTTP method GET, the URI /, and the HTTP version 1.0. This is easy to figure out.

And the first line of the response, HTTP/1.0 200 OK, contains the HTTP version, the status code 200, and the reason phrase OK.
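Both first lines can be split mechanically. Here is a minimal sketch; `parseRequestLine` and `parseStatusLine` are hypothetical helpers, and they assume single spaces between the parts, as in the example above.

```javascript
// Hypothetical helpers: split the first line of a request or a response.

// "GET / HTTP/1.0" -> method, URI, and version, separated by single spaces.
function parseRequestLine(line) {
  const parts = line.split(" ");
  if (parts.length !== 3) {
    throw new Error(`malformed request line: ${line}`);
  }
  const [method, uri, version] = parts;
  return { method, uri, version };
}

// "HTTP/1.0 200 OK" -> version, 3-digit status code, and reason phrase.
// The reason phrase may contain spaces ("Not Found") or be missing.
function parseStatusLine(line) {
  const m = /^(HTTP\/\d\.\d) (\d{3})(?: (.*))?$/.exec(line);
  if (!m) {
    throw new Error(`malformed status line: ${line}`);
  }
  return { version: m[1], code: Number(m[2]), reason: m[3] || "" };
}
```

The status code is what a program acts on; the reason phrase is only a human-readable hint.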

Following the first line is a list of header fields in the format of Key: value. The request has a single header field — Host — which contains the domain name.
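Each field line splits at its first colon, since values such as dates contain colons too. Field names are case-insensitive in HTTP, so this sketch lowercases them. `parseFieldLines` is a hypothetical helper. It keeps every value of a repeated field in order and leaves the special rules some fields need to later chapters.

```javascript
// Hypothetical helper: turn "Key: value" lines into a Map from lowercase
// field name to the list of values seen for that name.
function parseFieldLines(lines) {
  const fields = new Map();
  for (const line of lines) {
    const colon = line.indexOf(":"); // the first colon separates name and value
    if (colon <= 0) {
      throw new Error(`malformed header field: ${line}`);
    }
    const name = line.slice(0, colon).toLowerCase(); // "Host" and "host" are the same field
    const value = line.slice(colon + 1).trim(); // drop whitespace around the value
    const values = fields.get(name) || [];
    values.push(value);
    fields.set(name, values);
  }
  return fields;
}

const fields = parseFieldLines([
  "Content-Type: text/html; charset=UTF-8",
  "Date: Thu, 20 Oct 2020 11:11:11 GMT",
  "Content-Length: 1256",
]);
console.log(fields.get("date")); // [ 'Thu, 20 Oct 2020 11:11:11 GMT' ]
```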

The response contains many fields, and their functions are not as obvious as the Host field. Many HTTP header fields are optional, and some are even useless. We will learn more about this in later chapters.

The response header is followed by the payload, which in our case is an HTML document. The header and the payload are separated by an empty line. The GET request has no payload, so it ends right after the empty line.
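The empty line marks where the payload starts, but not where it ends. In the response above, `Content-Length: 1256` gives the payload size in bytes. Without that field, an HTTP/1.0 response's payload runs until the server closes the connection, and a request has no payload. The sketch below covers only these cases. It ignores others, such as HTTP/1.1 chunked transfer encoding. `payloadLength` is a hypothetical helper.

```javascript
// Hypothetical helper: decide how many payload bytes follow the header.
// `fields` maps lowercase field names to values.
// Returns a byte count, or -1 meaning "read until the connection closes".
function payloadLength(fields, isRequest) {
  const value = fields["content-length"];
  if (value !== undefined) {
    if (!/^\d+$/.test(value)) {
      throw new Error(`bad Content-Length: ${value}`);
    }
    return Number(value);
  }
  // No Content-Length (and chunked encoding is ignored in this sketch):
  // a request has no payload; a response's payload ends when the connection closes.
  return isRequest ? 0 : -1;
}

console.log(payloadLength({ "content-length": "1256", connection: "close" }, false)); // 1256
```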

This is just a simple example that you can play with from the command line. We will examine the HTTP protocol in more detail later.

2.3 The Evolution of HTTP

HTTP/1.0: The Prototype

The example above uses HTTP/1.0, which is an ancient version of HTTP. HTTP/1.0, as originally specified, doesn't support multiple requests over a single connection, so you need a new connection for every request. This is problematic because typical web pages depend on multiple extra resources such as images, scripts, or stylesheets, and paying the latency of a TCP handshake for each of them makes HTTP/1.0 very suboptimal.
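A back-of-the-envelope model shows the cost. This model is a simplifying assumption, not a measurement. Assume each TCP handshake costs one round trip (RTT) before a request can be sent, each request and response costs another round trip, requests are issued one after another, and transfer time is ignored.

```javascript
// Simplified latency model (assumption): 1 RTT per TCP handshake,
// 1 RTT per request/response, requests issued sequentially, no transfer time.
function sequentialLatencyMs(numRequests, rttMs, newConnectionPerRequest) {
  const handshakes = newConnectionPerRequest ? numRequests : 1;
  return (handshakes + numRequests) * rttMs;
}

// A hypothetical page: 1 HTML document plus 9 other resources, 50 ms RTT.
console.log(sequentialLatencyMs(10, 50, true)); // new connection each time: 1000
console.log(sequentialLatencyMs(10, 50, false)); // one reused connection: 550
```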

HTTP/1.1 fixed this by keeping connections open for reuse by default, and it became a decent protocol. You can try sending multiple requests over the same connection with the nc command by simply changing HTTP/1.0 to HTTP/1.1.
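Here is a sketch of the same experiment in Node.js: two HTTP/1.1 requests written on one TCP connection. HTTP/1.1 requires a `Host` header field. The second request carries `Connection: close`, so the server closes the connection after answering and the client knows when to stop reading. Sending the second request without waiting for the first response is called pipelining. HTTP/1.1 allows it, but not every server or proxy handles it well, so treat this as an experiment. `twoRequests` is a hypothetical helper.

```javascript
const net = require("net");

// Hypothetical helper: pipeline two HTTP/1.1 requests over one TCP connection
// and return everything the server sends before it closes the connection.
function twoRequests(host, port) {
  // The Host field includes the port when it is not the default (80).
  const hostField = port === 80 ? host : `${host}:${port}`;
  const request =
    `GET / HTTP/1.1\r\nHost: ${hostField}\r\n\r\n` +
    // The last request asks the server to close the connection afterwards.
    `GET / HTTP/1.1\r\nHost: ${hostField}\r\nConnection: close\r\n\r\n`;
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = net.createConnection({ host, port }, () => socket.write(request));
    socket.on("data", (chunk) => chunks.push(chunk));
    socket.on("end", () => resolve(Buffer.concat(chunks).toString("latin1")));
    socket.on("error", reject);
  });
}
```

If the server supports this, the returned text contains two complete responses, one after the other.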

HTTP/1.1: Production-Ready & Easy-to-Understand

This book will focus on HTTP/1.1, as it is still very popular and is used beyond web applications. Even software systems that have nothing to do with the web have adopted HTTP as the basis of their network protocol. For example, gRPC, a popular RPC framework, runs over HTTP/2, and many companies’ internal software services are accessed over HTTP.

Why is HTTP so popular? One possible reason is that it can be used as a generic request-response protocol, so applications can rely on HTTP instead of inventing their own protocols. This makes HTTP a good target if you want to learn how to build network protocols.

HTTP/2: New Capabilities

There have been further developments since HTTP/1.1. HTTP/2, which is based on SPDY, is the next iteration of HTTP. It brings not only incremental refinements such as compressed headers, but also two new capabilities. One is server push, which sends resources to the client before the client requests them. The other is multiplexing multiple requests over a single TCP connection, which is an attempt to address head-of-line blocking. With these new features, HTTP/2 is no longer a simple request-response protocol. That’s why we start with HTTP/1.1: it’s simple enough and easy to understand.

HTTP/3: More Ambition

HTTP/3 is a much bigger change than HTTP/2: it replaces TCP with QUIC, a transport protocol built on UDP. QUIC has to replicate most of TCP's functionality, such as reliable, ordered delivery and congestion control. The motivations behind QUIC include userspace congestion control, multiplexing, and avoiding head-of-line blocking.

You can learn a lot by reading about these new technologies, but you may be overwhelmed by these concepts and jargon in the process. So let’s start with something small and simple: coding an HTTP/1.1 server.

2.4 Command Line Tools

The netcat command line example above illustrates an important use case: debugging and testing. In later chapters, you will implement an HTTP server, and you can use the netcat command to quickly test your server in various scenarios. You might also want to store the request data in a file instead of typing it into the terminal:

nc 80 <request.txt
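Many editors save lines with LF (`\n`) endings, while HTTP specifies CRLF (`\r\n`). To be sure the file contains exactly the bytes you intend, you can generate it from a small script. This is a minimal Node.js sketch that reuses the `request.txt` name from the command above and the request from the earlier example.

```javascript
const fs = require("fs");

// Header lines of the request; add fields such as Host here if needed.
const headerLines = ["GET / HTTP/1.0"];
// Every line ends with CRLF, and one extra CRLF (the empty line) ends the header.
const requestText = headerLines.map((line) => line + "\r\n").join("") + "\r\n";
fs.writeFileSync("request.txt", requestText);
```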

In later chapters, you may encounter some quirks of the nc command, such as not sending EOF, or multiple versions of nc with incompatible flags. In that case, you can use socat, a more capable alternative, instead:

socat -

The telnet command is also popular in tutorials:

telnet 80

Also, instead of manually constructing the request data, you can use an existing HTTP client. Try the curl command:

curl -vvv
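Node.js also ships an HTTP client in its built-in `http` module, which is handy for scripted tests against your own server. Here is a sketch using `http.get`. `fetchText` is a hypothetical helper, and the URL is whatever server you are testing.

```javascript
const http = require("http");

// Hypothetical helper: GET a URL with Node's built-in client and collect
// the status code, response header fields, and body text.
function fetchText(url) {
  return new Promise((resolve, reject) => {
    http
      .get(url, (res) => {
        const chunks = [];
        res.on("data", (chunk) => chunks.push(chunk));
        res.on("end", () =>
          resolve({
            status: res.statusCode,
            headers: res.headers, // Node lowercases the field names
            body: Buffer.concat(chunks).toString("utf8"),
          })
        );
      })
      .on("error", reject);
  });
}
```

Unlike `curl -vvv`, this hides the raw bytes, so it suits automated checks better than debugging.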

Most sites on the Internet now support HTTPS alongside plaintext HTTP. HTTPS adds an extra protocol layer called “TLS” between HTTP and TCP. TLS is not a plaintext protocol like HTTP, so you cannot use the netcat command to test an HTTPS server. Fortunately, TLS only wraps the HTTP data, which is unchanged inside it, so you can use a TLS client instead of the netcat command to send an HTTPS request. Replace the netcat command with the openssl s_client command for HTTPS:

openssl s_client -verify_quiet -quiet -connect
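In the same spirit as `openssl s_client`, Node.js can send a raw request over TLS with its built-in `tls` module. In this sketch, `httpsRawRequest` is a hypothetical helper, the host is a placeholder you supply, and 443 is the default HTTPS port. Passing `servername` sends the host name via SNI, which many servers use to pick a certificate. The server's certificate is verified by default.

```javascript
const tls = require("tls");

// Hypothetical helper: send raw HTTP request text over a TLS connection and
// collect the decrypted response until the server closes the connection.
// `host` should be a DNS name (not an IP address), since it is also used for SNI.
function httpsRawRequest(host, text, port = 443) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = tls.connect({ host, port, servername: host }, () => {
      // Runs once the TLS handshake is done; from here on it is plain HTTP.
      socket.write(text);
    });
    socket.on("data", (chunk) => chunks.push(chunk));
    socket.on("end", () => resolve(Buffer.concat(chunks).toString("latin1")));
    socket.on("error", reject);
  });
}

// Usage, with a host name of your choice in place of HOST (placeholder):
//   httpsRawRequest(HOST, "GET / HTTP/1.0\r\nHost: " + HOST + "\r\n\r\n").then(console.log);
```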

In the next chapter, we will do some actual coding.
