02. HTTP Overview

2.1 Overview

As you may already know, the HTTP protocol sits above the TCP protocol. How TCP itself works in detail is not our concern; what we need to know about TCP is that it’s a bidirectional channel for transmitting raw bytes — a carrier for other application protocols such as HTTP or SSH.

Although each direction of a TCP connection can operate independently, many protocols follow the request-response model. The client sends a request, then the server sends a response, then the client might use the same connection for further requests and responses.

 client        server
 ------        ------

| req1 |  ==>
         <==  | res1 |
| req2 |  ==>
         <==  | res2 |
         ...
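To make the request-response model concrete, here is a minimal Python sketch that plays both roles in one process. It uses `socket.socketpair()` (two connected byte-stream sockets) as a stand-in for a real TCP connection. It also uses a made-up protocol in which each message is a single line. Both are assumptions for illustration only; this is not how HTTP frames its messages.

```python
import socket


def exchange(requests: list[bytes]) -> list[bytes]:
    """Send several requests over ONE connection, one response per request.

    Toy protocol (hypothetical, for illustration): each message is one line
    ending in a newline, and the server replies with "req" replaced by "res".
    """
    client, server = socket.socketpair()  # two connected byte-stream sockets
    client_in = client.makefile("rb")     # buffered reader, client side
    server_in = server.makefile("rb")     # buffered reader, server side
    responses = []
    try:
        for req in requests:
            client.sendall(req + b"\n")                      # client sends a request
            got = server_in.readline().rstrip(b"\n")         # server reads it
            server.sendall(got.replace(b"req", b"res") + b"\n")  # server replies
            responses.append(client_in.readline().rstrip(b"\n"))  # client reads reply
    finally:
        for f in (client_in, server_in, client, server):
            f.close()
    return responses


print(exchange([b"req1", b"req2"]))  # [b'res1', b'res2']
```

Each request is answered before the next one is sent, as in the diagram. The connection is reused rather than reopened.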

An HTTP request or response consists of a header followed by an optional payload. The header begins with a first line, which holds the method and URI of a request or the status code of a response. A list of header fields follows that first line.
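One way to picture this structure in code is a small record type. The `HTTPMessage` class below is a hypothetical sketch, not part of any library. It assumes CRLF (`\r\n`) line endings, which HTTP/1.x uses, and header text that fits in Latin-1.

```python
from dataclasses import dataclass, field


@dataclass
class HTTPMessage:
    # Request line ("GET / HTTP/1.0") or status line ("HTTP/1.0 200 OK").
    first_line: str
    # Header fields as (name, value) pairs; order and duplicates are kept.
    fields: list[tuple[str, str]] = field(default_factory=list)
    # Optional payload; empty for requests such as GET.
    body: bytes = b""

    def encode(self) -> bytes:
        """Serialize: first line, header fields, an empty line, then the payload."""
        lines = [self.first_line] + [f"{k}: {v}" for k, v in self.fields]
        header = "\r\n".join(lines) + "\r\n\r\n"  # the empty line ends the header
        return header.encode("latin-1") + self.body  # assumes Latin-1 header text


req = HTTPMessage("GET / HTTP/1.0", [("Host", "example.com")])
print(req.encode())  # b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n'
```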

2.2 HTTP by Example

To get the hang of network protocols, let’s start by making an HTTP request from the command line.

Run the netcat command:

nc example.com 80

The nc (netcat) command creates a TCP connection to the destination host and port, and then attaches the connection to stdin and stdout. We can now start typing in the terminal and the data will be transmitted:

GET / HTTP/1.0
Host: example.com
(empty line)

(Note the extra empty line at the end! It tells the server that the request header is complete.)

We will get the following response:

HTTP/1.0 200 OK
Age: 525410
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Tue, 20 Oct 2020 11:11:11 GMT
Etag: "1234567890+gzip+ident"
Last-Modified: Sun, 20 Oct 2019 11:11:11 GMT
Vary: Accept-Encoding
Content-Length: 1256
Connection: close

<!doctype html>
<!-- omitted -->

Making HTTP requests from the command line is very easy. Let’s take a look at the data and try to figure out what it means.

The first line of the request — GET / HTTP/1.0 — contains the HTTP method GET, the URI /, and the HTTP version 1.0. This is easy to figure out.

The first line of the response, HTTP/1.0 200 OK, contains the HTTP version, the status code 200, and the reason phrase OK.
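Splitting these first lines apart takes only a few lines of Python. This sketch assumes well-formed input. The function names are hypothetical, and malformed lines simply raise `ValueError`.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    # "GET / HTTP/1.0" -> (method, URI, version); expects exactly three parts.
    method, uri, version = line.split(" ")
    return method, uri, version


def parse_status_line(line: str) -> tuple[str, int, str]:
    # "HTTP/1.0 200 OK" -> (version, status code, reason phrase).
    # The reason phrase may contain spaces ("Not Found"), so split at most twice;
    # pad with "" in case a server omits the reason phrase entirely.
    version, code, reason = (line.split(" ", 2) + [""])[:3]
    return version, int(code), reason


print(parse_request_line("GET / HTTP/1.0"))  # ('GET', '/', 'HTTP/1.0')
print(parse_status_line("HTTP/1.0 200 OK"))  # ('HTTP/1.0', 200, 'OK')
```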

Following the first line is a list of header fields in the format of Key: value. The request has a single header field — Host — which contains the domain name.
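Parsing the `Key: value` lines is similarly short. The sketch below is a hypothetical helper, not the book's code. It makes three choices:

- It lowercases names because HTTP field names are case-insensitive.
- It splits at the first colon because values such as `localhost:8080` may contain colons.
- It keeps a list per name because a field may repeat.

```python
def parse_fields(lines: list[str]) -> dict[str, list[str]]:
    """Parse "Key: value" lines into {lowercase name: [values...]}."""
    fields: dict[str, list[str]] = {}
    for line in lines:
        name, sep, value = line.partition(":")  # split at the FIRST colon only
        # Reject lines without a colon, empty names, or whitespace around the name.
        if not sep or not name or name != name.strip():
            raise ValueError(f"bad header field: {line!r}")
        # Names are case-insensitive; surrounding whitespace in values is dropped.
        fields.setdefault(name.lower(), []).append(value.strip())
    return fields


print(parse_fields(["Host: localhost:8080", "Vary: Accept-Encoding", "vary: Cookie"]))
# {'host': ['localhost:8080'], 'vary': ['Accept-Encoding', 'Cookie']}
```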

The response contains many fields, and their functions are not as obvious as the Host field. Many HTTP header fields are optional, and some are even useless. We will learn more about this in later chapters.

The response header is followed by the payload, which in our case is an HTML document. Payload and header are separated by an empty line. The GET request has no payload so it ends with an empty line.
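Because TCP is a byte stream, the header may arrive split across several reads. A receiver therefore keeps buffering until it sees the empty line. Below is a sketch of that step with a hypothetical function name. It assumes CRLF line endings, so the header ends at the first `\r\n\r\n`. As a result, it would not recognize the bare LF that typing into nc usually sends, even though many real servers accept it.

```python
from typing import Iterable, Optional


def split_message(chunks: Iterable[bytes]) -> Optional[tuple[bytes, bytes]]:
    """Buffer incoming chunks until the header is complete.

    Returns (header, payload_so_far), where payload_so_far is whatever
    arrived after the empty line. Returns None if the data ends before
    the empty line. Assumes CRLF line endings.
    """
    buf = b""
    for chunk in chunks:               # each chunk models one recv() result
        buf += chunk
        end = buf.find(b"\r\n\r\n")    # CRLF of the last line + the empty line
        if end >= 0:
            return buf[:end], buf[end + 4:]
    return None                        # header never completed


parts = [b"HTTP/1.0 200 OK\r\nContent-Le", b"ngth: 5\r\n\r", b"\nhello"]
print(split_message(parts))
# (b'HTTP/1.0 200 OK\r\nContent-Length: 5', b'hello')
```

The payload returned here may still be incomplete. Knowing how many more bytes to expect is the job of header fields such as Content-Length.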

This is just a simple example that you can play with from the command line. We will examine the HTTP protocol in more detail later.

2.3 The Evolution of HTTP

HTTP/1.0: The Prototype

The example above uses HTTP/1.0, which is an ancient version of HTTP. HTTP/1.0 has no standard way to send multiple requests over a single connection, so you need a new connection for every request. This is problematic because typical web pages depend on many extra resources, such as images, scripts, or stylesheets. Paying the latency of a TCP handshake for each of them makes HTTP/1.0 very inefficient.

HTTP/1.1 fixed this by making connections persistent by default, and it became a practical protocol. You can try sending multiple requests over the same connection with the nc command by simply changing HTTP/1.0 to HTTP/1.1.
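Reusing a connection raises a new question: where does one response's payload end and the next response begin? With HTTP/1.0 and `Connection: close`, the closing of the connection marks the end of the payload. That no longer works on a reused connection. One common answer is the `Content-Length` field seen in the response earlier.

The sketch below, with a hypothetical function name, splits back-to-back responses. It assumes each response has CRLF line endings and a `Content-Length` field. It ignores other HTTP/1.1 framing, such as chunked transfer encoding.

```python
def split_responses(buf: bytes) -> list[tuple[bytes, bytes]]:
    """Split back-to-back responses into (header, payload) pairs.

    Assumes CRLF line endings and a Content-Length field in each response;
    a missing Content-Length is treated as an empty payload (a simplification).
    Any incomplete response at the end of buf is left unparsed.
    """
    out = []
    while True:
        end = buf.find(b"\r\n\r\n")
        if end < 0:
            break                               # header not complete yet
        header = buf[:end]
        length = 0
        for line in header.split(b"\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(b":")
            if name.lower() == b"content-length":
                length = int(value.strip())
        start = end + 4                         # payload starts after the empty line
        if len(buf) < start + length:
            break                               # payload not complete yet
        out.append((header, buf[start:start + length]))
        buf = buf[start + length:]              # continue with the next response
    return out


stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n")
print([payload for _, payload in split_responses(stream)])  # [b'hello', b'']
```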

HTTP/1.1: Production-Ready & Easy-to-Understand

This book will focus on HTTP/1.1, as it is still very popular and easy to understand. Even software systems that have nothing to do with the Web have adopted HTTP as the basis of their network protocol. When a backend developer talks about an “API”, they likely mean an HTTP-based one, even for internal software services.

Why is HTTP so popular? One possible reason is that it can be used as a generic request-response protocol; developers can rely on HTTP instead of inventing their own protocols. This makes HTTP a good target for learning how to build network protocols.

HTTP/2: New Capabilities

There have been further developments since HTTP/1.1. HTTP/2, which is based on SPDY, is the next iteration of HTTP. In addition to incremental refinements such as compressed headers, it has two new capabilities:

- Server push, which sends resources to the client before they are requested.
- Multiplexing of multiple concurrent requests over a single TCP connection.

With these new features, HTTP/2 is no longer a simple request-response protocol. That's why we start with HTTP/1.1: it's simple enough to be easy to understand.

HTTP/3: More Ambition

HTTP/3 is much more ambitious than HTTP/2. It replaces TCP with QUIC, a new transport protocol that runs on top of UDP and therefore has to replicate most of the functionality of TCP. The motivations behind QUIC include userspace congestion control, multiplexing, and avoiding TCP's head-of-line blocking.

You can learn a lot by reading about these new technologies, but you may be overwhelmed by these concepts and jargon. So let’s start with something small and simple: coding an HTTP/1.1 server.

2.4 Command Line Tools

The command line is introduced here because it allows for quick testing and debugging, which is handy when coding your own HTTP server. You may also want to store the request data in a file instead of typing it each time:

nc example.com 80 <request.txt

In practice, you may encounter some quirks of the nc command, such as not sending EOF, or multiple versions of nc with incompatible flags. You can use socat, a more capable alternative, instead:

socat tcp:example.com:80 -
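At the TCP level, "sending EOF" means shutting down the sending direction of the connection (a half-close). The peer's reads then report end-of-stream, while data can still flow back. A real HTTP server normally finds the end of a request from the empty line and Content-Length. A simple server that reads until EOF, however, would wait indefinitely if the client never sent it.

The sketch below shows a half-close with Python's `socket.shutdown(socket.SHUT_WR)`. The toy server, the function names, and the `socketpair()` stand-in for a real TCP connection are assumptions for illustration. It needs Python 3.8+ for `:=`.

```python
import socket
import threading


def send_and_half_close(sock: socket.socket, data: bytes) -> bytes:
    """Send data, signal EOF with a half-close, then read until the peer closes."""
    sock.sendall(data)
    sock.shutdown(socket.SHUT_WR)      # peer sees EOF; we can still receive
    chunks = []
    while chunk := sock.recv(4096):    # recv() returns b"" at end of stream
        chunks.append(chunk)
    return b"".join(chunks)


def toy_server(sock: socket.socket) -> None:
    """Hypothetical server: read the whole request until EOF, then reply."""
    request = b""
    while chunk := sock.recv(4096):
        request += chunk
    sock.sendall(b"got %d bytes" % len(request))
    sock.close()                       # closing ends the client's read loop


def demo(request: bytes) -> bytes:
    client, server = socket.socketpair()  # stand-in for a TCP connection
    t = threading.Thread(target=toy_server, args=(server,))
    t.start()
    try:
        return send_and_half_close(client, request)
    finally:
        t.join()
        client.close()


print(demo(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"))  # b'got 37 bytes'
```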

The telnet command is also popular in tutorials:

telnet example.com 80

You can also use an existing HTTP client instead of manually constructing the request data. Try the curl command:

curl -vvv http://example.com/

Most sites support HTTPS alongside plaintext HTTP. HTTPS adds an extra protocol layer called “TLS” between HTTP and TCP. Because TLS encrypts the data, you cannot use netcat to talk to an HTTPS server directly. But TLS still provides a byte stream like TCP, so you just need to replace netcat with a TLS client:

openssl s_client -verify_quiet -quiet -connect example.com:443

In the next chapter, we will do some actual coding.