Build Your Own Web Server From Scratch In JavaScript
02. HTTP Overview
2.1 Overview
As you may already know, the HTTP protocol sits above the TCP protocol. How TCP itself works in detail is not our concern; what we need to know about TCP is that it’s a bidirectional channel for transmitting raw bytes — a carrier for other application protocols such as HTTP or SSH.
Although each direction of a TCP connection can operate independently, many protocols follow the request-response model. The client sends a request, then the server sends a response, then the client might use the same connection for further requests and responses.
client      server
------      ------
| req1 |  ==>
       <==  | res1 |
| req2 |  ==>
       <==  | res2 |
      ...
An HTTP request or response consists of a header followed by an optional payload. The header contains the URL of the request, or the response code, followed by a list of header fields.
2.2 HTTP by Example
To get the hang of network protocols, let’s start by making an HTTP request from the command line.
Run the netcat command:
nc example.com 80
The nc (netcat) command creates a TCP connection to the destination host and port, and then attaches the connection to stdin and stdout. We can now start typing in the terminal and the data will be transmitted over the connection:
GET / HTTP/1.0
Host: example.com
(empty line)
(Note the extra line at the end!)
We will get the following response:
HTTP/1.0 200 OK
Age: 525410
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Thu, 20 Oct 2020 11:11:11 GMT
Etag: "1234567890+gzip+ident"
Expires: Thu, 20 Oct 2020 11:11:11 GMT
Last-Modified: Thu, 20 Oct 2019 11:11:11 GMT
Server: ECS (ddd/EEEE)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
Connection: close

<!doctype html>
<html>
<!-- omitted -->
</html>
You might be surprised at how easy it is to make a raw HTTP request from the command line if you have never seen this before. Let’s take a look at the data and try to figure out what it means.
The first line of the request, GET / HTTP/1.0, contains the HTTP method GET, the URI /, and the HTTP version 1.0. This is easy to figure out.
And the first line of the response, HTTP/1.0 200 OK, contains the HTTP version and the response code 200.
Following the first line is a list of header fields in the format Key: value. The request has a single header field, Host, which contains the domain name.
The response contains many fields, and their functions are not as obvious as that of the Host field. Many HTTP header fields are optional, and some are even useless. We will learn more about this in later chapters.
The response header is followed by the payload, which in our case is an HTML document. Payload and header are separated by an empty line. The GET request has no payload so it ends with an empty line.
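The structure described above (first line, header fields, empty line, payload) can be sketched in a few lines of JavaScript. This is only a rough illustration, not the parser built later in the book: splitMessage is a hypothetical helper name, the whole message is assumed to be available as a single string, and a repeated field name simply overwrites the earlier value. HTTP specifies CRLF line endings, but the sketch also tolerates the bare LF that a terminal usually sends.

```javascript
// Hypothetical helper, not from any library: split a raw HTTP/1.x message
// (already decoded to a string) into start line, header fields, and payload.
function splitMessage(raw) {
    // The header ends at the first empty line. HTTP specifies CRLF line
    // endings, but bare LF is tolerated here as well.
    const sep = /\r?\n\r?\n/.exec(raw);
    if (!sep) {
        throw new Error('no empty line found: the header is incomplete');
    }
    const lines = raw.slice(0, sep.index).split(/\r?\n/);
    const fields = new Map();
    for (const line of lines.slice(1)) {
        const colon = line.indexOf(':');
        if (colon < 0) {
            throw new Error('malformed header field: ' + line);
        }
        // Field names are case-insensitive, so normalize them to lowercase.
        // A repeated name overwrites the earlier value in this sketch.
        const name = line.slice(0, colon).trim().toLowerCase();
        fields.set(name, line.slice(colon + 1).trim());
    }
    return {
        startLine: lines[0],
        fields: fields,
        payload: raw.slice(sep.index + sep[0].length),
    };
}

const res = splitMessage(
    'HTTP/1.0 200 OK\r\n' +
    'Content-Type: text/html; charset=UTF-8\r\n' +
    'Connection: close\r\n' +
    '\r\n' +
    '<!doctype html>\n'
);
console.log(res.startLine.split(' '));        // version, response code, reason
console.log(res.fields.get('content-type'));  // value of one header field
console.log(res.payload);                     // everything after the empty line
```

The same function works for the request: it returns an empty payload because nothing follows the empty line.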
This is just a simple example that you can play with from the command line. We will examine the HTTP protocol in more detail later.
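For readers who prefer code to the terminal, the netcat session can be approximated with Node.js's built-in net module. The sketch below is an illustration under assumptions rather than the book's implementation: buildRequest and sendRaw are hypothetical helper names, and sendRaw relies on the server closing the connection after the response, which is what happens with HTTP/1.0.

```javascript
const net = require('net');

// Build the same request that was typed into nc. HTTP specifies CRLF line
// endings, and the final empty line marks the end of the header.
function buildRequest(host, path) {
    return 'GET ' + path + ' HTTP/1.0\r\n' +
           'Host: ' + host + '\r\n' +
           '\r\n';
}

// Hypothetical helper: send raw data over a new TCP connection and collect
// everything the server sends back until it closes the connection.
function sendRaw(host, port, data) {
    return new Promise((resolve, reject) => {
        const chunks = [];
        const socket = net.createConnection({ host: host, port: port }, () => {
            socket.write(data);     // runs once the connection is established
        });
        socket.on('data', (chunk) => chunks.push(chunk));
        socket.on('end', () => resolve(Buffer.concat(chunks).toString()));
        socket.on('error', reject);
    });
}

// Example call (needs network access):
// sendRaw('example.com', 80, buildRequest('example.com', '/')).then(console.log);
```

The call is left commented out so that loading the file has no side effects; uncommenting it should print a response similar to the one shown earlier.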
2.3 The Evolution of HTTP
HTTP/1.0: The Prototype
The example above uses HTTP/1.0, which is an ancient version of HTTP. HTTP/1.0 doesn’t support multiple requests over a single connection at all, so you need a new connection for every request. This is problematic because typical web pages depend on multiple extra resources such as images, scripts, or stylesheets; paying the latency of a TCP handshake for each of them makes HTTP/1.0 inefficient.
HTTP/1.1 fixed this and became a decent protocol. You can try to use the nc command to send multiple requests on the same connection by simply changing HTTP/1.0 to HTTP/1.1.
HTTP/1.1: Production-Ready & Easy-to-Understand
This book will focus on HTTP/1.1, as it is still very popular and is used beyond web applications. Even software systems that have nothing to do with the web have adopted HTTP as the basis of their network protocol. For example, gRPC, a popular RPC framework, can be used over HTTP, and many companies’ internal software services are accessed over HTTP.
Why is HTTP so popular? One possible reason is that it can be used as a generic request-response protocol, so applications can rely on HTTP instead of inventing their own protocols. This makes HTTP a good target if you want to learn how to build network protocols.
HTTP/2: New Capabilities
There have been further developments since HTTP/1.1. HTTP/2, related to SPDY, is the next iteration of HTTP. It brings not only incremental refinements such as compressed headers, but also two new capabilities. One is server push, which means sending resources to the client before the client requests them. The other is multiplexing multiple requests over a single TCP connection, which is an attempt to address head-of-line blocking. With these new features, HTTP/2 is no longer a simple request-response protocol. That’s why we start with HTTP/1.1: it’s simple enough and easy to understand.
HTTP/3: More Ambition
HTTP/3 is much larger than HTTP/2. It replaces TCP and uses UDP instead. Of course, it then has to replicate most of the functionality of TCP on top of UDP; this TCP alternative is called QUIC. The motivations behind QUIC are userspace congestion control, multiplexing, and avoiding head-of-line blocking.
You can learn a lot by reading about these new technologies, but you may be overwhelmed by these concepts and jargon in the process. So let’s start with something small and simple: coding an HTTP/1.1 server.
2.4 Command Line Tools
The netcat command line example above illustrates an important use case: debugging and testing. In later chapters, you will implement an HTTP server, and you can use the netcat command to quickly test your server for various scenarios. However, you might want to store the request data in a file instead of typing it into the terminal:
nc example.com 80 <request.txt
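One way to produce such a file is a small Node.js script, sketched below. The helper name writeRequestFile is hypothetical. Generating the file from code makes the line endings explicit: HTTP specifies CRLF, and the header must end with an empty line, both of which are easy to get wrong in a text editor.

```javascript
const fs = require('fs');

// Hypothetical helper: write a minimal GET request to a file so that it can
// be replayed with `nc example.com 80 <request.txt`.
function writeRequestFile(filename, host, path) {
    const lines = [
        'GET ' + path + ' HTTP/1.0',
        'Host: ' + host,
        '',     // the empty line that ends the header
        '',     // makes the joined string end with a final CRLF
    ];
    fs.writeFileSync(filename, lines.join('\r\n'));
}
```

Calling writeRequestFile('request.txt', 'example.com', '/') creates a file with the same request that was typed by hand in the earlier example.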
In later chapters, you may encounter some quirks of the nc command, such as not sending EOF, or multiple versions of nc with incompatible flags. You can use socat as a modern replacement for nc:
socat tcp:example.com:80 -
The telnet command is also popular in tutorials.
telnet example.com 80
Instead of manually constructing the request data, you can also use an existing HTTP client. Try the curl command:
curl -vvv http://example.com/
Most sites on the Internet now support HTTPS alongside plaintext HTTP. HTTPS adds an extra protocol layer called “TLS” between HTTP and TCP. Unlike HTTP, TLS is not a plaintext protocol, so you cannot use the netcat command to test an HTTPS server. Fortunately, TLS is just a wrapper around TCP, so you can use a TLS client instead of the netcat command to send an HTTPS request. Replace the netcat command with the openssl s_client command for HTTPS:
openssl s_client -verify_quiet -quiet -connect example.com:443
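The idea that TLS is just a wrapper around TCP carries over to Node.js, where the built-in tls module offers tls.connect as a counterpart to net.connect. The sketch below is an illustration under assumptions: pickTransport and openConnection are hypothetical helper names, and the target is assumed to be a hostname rather than an IP address.

```javascript
const net = require('net');
const tls = require('tls');

// Hypothetical helper: decide how to connect based on the URL scheme.
function pickTransport(urlString) {
    const url = new URL(urlString);
    const secure = url.protocol === 'https:';
    // url.port is an empty string when the URL uses the scheme's default port.
    const port = url.port ? Number(url.port) : (secure ? 443 : 80);
    return { secure: secure, host: url.hostname, port: port };
}

// Open either a plain TCP socket or a TLS socket. Both are duplex streams,
// so the code that writes the request and reads the response stays the same.
function openConnection(target) {
    if (target.secure) {
        // servername sets SNI, which many HTTPS servers need in order to
        // select the right certificate.
        return tls.connect({
            host: target.host,
            port: target.port,
            servername: target.host,
        });
    }
    return net.connect({ host: target.host, port: target.port });
}
```

For example, pickTransport('https://example.com/') selects TLS on port 443, while the same URL with http:// selects plain TCP on port 80.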
In the next chapter, we will do some actual coding.