Build Your Own Web Server From Scratch In Node.JS
build-your-own.org eBook·Paperback

06. HTTP Semantics and Syntax

HTTP is very human-readable, which means you can build a server by looking at examples instead of the specification. However, this approach results in buggy toy code, and you won’t learn much. So you need to consult the specification — a series of RFC documents.

6.1 High-Level Structures

Let’s review what you already know from the introductory chapter:

These things are mostly the same from HTTP/1.0 to HTTP/3.

HTTP/1.0 200 OK
Age: 525410
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Tue, 20 Oct 2020 11:11:11 GMT
Etag: "1234567890+gzip+ident"
Last-Modified: Sun, 20 Oct 2019 11:11:11 GMT
Vary: Accept-Encoding
Content-Length: 1256
Connection: close

<!doctype html>
<!-- omitted -->

6.2 Content-Length

HTTP semantics are mostly about interpreting header fields, which are described in RFC 9110. Try reading it yourself.

The most important header fields are Content-Length and Transfer-Encoding, because they determine the length of an HTTP message, and splitting a byte stream into messages is the most basic function of a protocol.

The Length of the HTTP Header

Both a request and a response consist of 2 parts: header + body. They are separated by an empty line. A line ends with '\r\n'. So the header ends with '\r\n\r\n' including the empty line. That’s how we determine the length of the header.
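
As a minimal sketch of this rule, the function below looks for the first '\r\n\r\n' in the data received so far. The name headerLength is made up for illustration, and the data is assumed to be a string for simplicity; a real server would work on raw bytes.

```typescript
// Returns the length of the header, including the final "\r\n\r\n",
// or -1 if the header has not been fully received yet.
// Hypothetical helper; strings stand in for byte buffers here.
function headerLength(data: string): number {
    const idx = data.indexOf("\r\n\r\n");
    return idx < 0 ? -1 : idx + 4;
}

const sample = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\nbody";
const sampleHeaderLen = headerLength(sample);   // 37
const sampleRest = sample.slice(sampleHeaderLen); // "body"
```

A return value of -1 means "wait for more data", which is the normal case when the header arrives in several TCP reads.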

The Length of the HTTP Body

The length of the body is complicated because there are 3 ways to determine it. The first way is to use Content-Length, which contains the length of the body.

Some ancient HTTP/1.0 software doesn’t use Content-Length, so the body is just the rest of the connection data: the parser reads the socket to EOF, and that’s the body. This is the second way to determine the body length. It is problematic because you cannot tell whether the connection ended prematurely.

6.3 Chunked Transfer Encoding

Generate and Send Data on the Fly

The third way is to use Transfer-Encoding: chunked instead of Content-Length. This is called chunked transfer encoding. It can mark the end of the payload without knowing its size in advance.

This allows the server to send the response while generating it on the fly. This use case is called streaming. An example is displaying real-time logs to the client without waiting for the process to finish.

Another Layer of Protocol

How does chunked encoding work? As the sender, we don’t know the total payload length, but we do know the portion of the payload we have. So we can send it in a mini-message format called a “chunk”. And a special chunk marks the end of the stream.

The receiver parses the byte stream into chunks and consumes the data, until the special chunk is received. Here is a concrete example:

4\r\nHTTP\r\n6\r\nserver\r\n0\r\n\r\n

It is parsed into 3 chunks:

  - 4\r\nHTTP\r\n
  - 6\r\nserver\r\n
  - 0\r\n\r\n

You can easily guess how this works. Each chunk starts with the size of its data (written in hexadecimal), and a 0-sized chunk marks the end of the stream.

Chunks Are Not Messages

Note that the chunk data boundaries are just side effects. These chunks are not represented to the application as individual messages; the application still sees the payload as a byte stream.
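
To make this concrete, here is a rough sketch of a decoder that turns a complete chunked body back into a plain payload. The function name decodeChunked is hypothetical, the input is a string rather than bytes, and chunk extensions and trailer fields are deliberately not supported.

```typescript
// Decodes a chunked body into its payload.
// Returns null if the input is truncated; throws on malformed input.
// Assumptions: no chunk extensions, no trailer fields, string input.
function decodeChunked(data: string): string | null {
    let pos = 0;
    let payload = "";
    while (true) {
        const lineEnd = data.indexOf("\r\n", pos);
        if (lineEnd < 0) return null; // the size line is incomplete
        const sizeText = data.slice(pos, lineEnd);
        if (!/^[0-9a-fA-F]+$/.test(sizeText)) throw new Error("bad chunk size");
        const size = parseInt(sizeText, 16); // the size is hexadecimal
        const start = lineEnd + 2;
        if (size === 0) {
            // The last chunk is followed by one more CRLF.
            if (data.length < start + 2) return null;
            if (data.slice(start, start + 2) !== "\r\n") throw new Error("trailers not supported");
            return payload;
        }
        const end = start + size;
        if (data.length < end + 2) return null; // chunk data is incomplete
        if (data.slice(end, end + 2) !== "\r\n") throw new Error("bad chunk terminator");
        payload += data.slice(start, end); // chunk boundaries disappear here
        pos = end + 2;
    }
}

const decoded = decodeChunked("4\r\nHTTP\r\n6\r\nserver\r\n0\r\n\r\n"); // "HTTPserver"
```

The caller only ever sees the concatenated payload; how the sender happened to split it into chunks is not observable.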

6.4 Ambiguities in HTTP

The Happy Cases of Body Length

In summary, this is how to determine the length of the payload body (if the HTTP method allows a payload body, e.g., POST or PUT).

  1. If Transfer-Encoding: chunked is present, parse chunks.
  2. If Content-Length: number is present and valid, the length is known.
  3. If neither field is present, a response uses the rest of the connection data as the payload, while a request has no payload.

There are also special cases, such as GET and HEAD requests and the 304 (Not Modified) status code, which make HTTP not easy to implement.
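
The happy cases could be sketched as a single decision function. Everything here is an assumption made for illustration: the names BodyLength and bodyLength, the header object with lowercased field names, and the strict choices to reject messages carrying both fields and to support only the chunked coding. Following RFC 9112, a request with neither field is treated as having no body. The special cases (HEAD, 304, and so on) are ignored.

```typescript
type BodyLength =
    | { kind: "chunked" }
    | { kind: "fixed"; length: number }
    | { kind: "none" }
    | { kind: "until-eof" };

// `headers` is a hypothetical object keyed by lowercased field names.
function bodyLength(
    headers: { [name: string]: string | undefined },
    isResponse: boolean,
): BodyLength {
    const te = headers["transfer-encoding"];
    const cl = headers["content-length"];
    if (te !== undefined && cl !== undefined) {
        // Strict choice for this sketch: refuse ambiguous messages.
        throw new Error("both Transfer-Encoding and Content-Length");
    }
    if (te !== undefined) {
        if (te.toLowerCase() !== "chunked") throw new Error("unsupported Transfer-Encoding");
        return { kind: "chunked" };
    }
    if (cl !== undefined) {
        if (!/^[0-9]+$/.test(cl)) throw new Error("bad Content-Length");
        return { kind: "fixed", length: parseInt(cl, 10) };
    }
    // Neither field: read a response to EOF; a request has no body.
    return isResponse ? { kind: "until-eof" } : { kind: "none" };
}
```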

Mind the Nasty Cases

You may wonder what happens if both header fields are present. RFC 9112 says that Transfer-Encoding overrides Content-Length and that such a message ought to be handled as an error, but not all implementations follow this. When two implementations on the same path (say, a proxy and a server) disagree about where a message ends, the result is a class of security exploits known as “HTTP request smuggling”.

Another ambiguity is the nonexistent payload body of a GET request. What if the request includes Content-Length? Should the server ignore the field or forbid it? What about Content-Length: 0?

Also, should the server or client even allow users to mess with the Content-Length and Transfer-Encoding fields at all? There are many discussions on the Internet, and although the RFC tried to enumerate the cases, different implementations handle them differently.

An exercise for the reader: If you are designing a new protocol, how do you avoid ambiguities like this?

6.5 HTTP Message Format

RFC 9112 describes exactly how HTTP/1.1 messages are encoded as bytes on the wire.

Read the BNF Language

The HTTP message format is described in a language called ABNF (Augmented BNF, defined in RFC 5234). Go to the “2. Message” section in RFC 9112 and you will see things like this:

HTTP-message   = start-line CRLF
                 *( field-line CRLF )
                 CRLF
                 [ message-body ]
start-line     = request-line / status-line

This says: An HTTP message is either a request message or a response message. A message starts with either a request line or a status line, followed by zero or more header fields, then an empty line, then the optional payload body. Each line is terminated by CRLF, which is the ASCII string '\r\n'. The BNF language is much more concise and less ambiguous than English.

HTTP Header Fields

field-line   = field-name ":" OWS field-value OWS

The header field name and value are separated by a colon, but the rules for field name and value are defined in RFC 9110 instead.

field-name     = token
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
               ; any VCHAR, except delimiters

OWS            = *( SP / HTAB )
               ; optional whitespace

field-value    = *field-content
field-content  = field-vchar
                 [ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar    = VCHAR / obs-text
obs-text       = %x80-FF

This is the general rule for field names and values. SP, HTAB, and VCHAR refer to space, tab, and visible (printable, non-space) ASCII characters, respectively. Some characters are forbidden in header fields, especially CR and LF.
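
The grammar translates fairly directly into regular expressions. The following is a sketch under a few assumptions: parseFieldLine is a made-up name, the line has already been split off without its CRLF, and the bytes were decoded so that each character is one byte (e.g., latin1).

```typescript
// token = 1*tchar
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;
// Characters allowed in a field value: SP, HTAB, VCHAR, obs-text.
const FIELD_VALUE = /^[\t\x20-\x7e\x80-\xff]*$/;

// Parses one field-line into [name, value]. Throws on invalid input.
function parseFieldLine(line: string): [string, string] {
    const colon = line.indexOf(":");
    if (colon < 0) throw new Error("missing colon");
    const fieldName = line.slice(0, colon);
    if (!TOKEN.test(fieldName)) throw new Error("bad field name");
    // Strip the optional whitespace (OWS) on both sides of the value.
    const fieldValue = line.slice(colon + 1).replace(/^[ \t]+|[ \t]+$/g, "");
    if (!FIELD_VALUE.test(fieldValue)) throw new Error("bad field value");
    return [fieldName, fieldValue];
}

const parsedField = parseFieldLine("Content-Type: text/html; charset=UTF-8 ");
// ["Content-Type", "text/html; charset=UTF-8"]
```

Note that no whitespace is allowed between the field name and the colon, which the token check enforces because space is not a tchar.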

Some header fields have additional rules for interpretation, such as comma-separated values or quoted strings. For now, we can just leave them as they are until we need them.

The HTTP specification is very large, and this chapter only covers the most important bits of implementing an HTTP server, which we will do in the next chapter.

6.6 Common Header Fields

Many header fields are either interpreted by applications or used by optional HTTP features, and are not immediately relevant to our implementation. You can become familiar with them by inspecting HTTP headers in browser dev tools.

Common header fields. (C means client and S means server.)

Header Field                   By   Description
Content-Length: 60             C/S  Discussed.
Transfer-Encoding: chunked     C/S  Discussed.
Accept: text/html              C    For negotiating content types.
Content-Type: text/html        C/S  Content type.
Accept-Encoding: gzip          C    For negotiating content compression.
Content-Encoding: gzip         S    Compressed response.
Vary: accept-encoding          S    Tell proxies about content negotiations.
Authorization: Basic dTpw      C    Authorization by username and password.
Cache-Control: no-cache        C/S  Affect caching behavior.
Age: 60                        S    How long is the item cached by the proxy?
Set-Cookie: k=v                S    HTTP cookie.
Cookie: k=v                    C    HTTP cookie.
Date                           C/S  Not very useful.
Expect: 100-continue           C    An obscure feature.
Host: example.com              C    The host name of the URL.
Last-Modified                  S    For cache validation and 304 Not Modified.
If-Modified-Since              C    Validate Last-Modified.
ETag: "abcd"                   S    For cache validation and 304 Not Modified.
If-None-Match: "abcd"          C    Validate ETag.
Range: bytes=10-               C    Range request. Get a portion of the response.
Content-Range: bytes 10-59/60  S    Range response.
Accept-Ranges: bytes           S    Indicate that range requests are allowed.
Referer: http://foo.com/       C    Where is the user from?
Transfer-Encoding: gzip        S    An alternative way to achieve compression.
TE: gzip                       C    For negotiating Transfer-Encoding.
Trailer: Foo                   C/S  Obscure feature: Header fields after payload.
User-Agent: Foo                C    Client software.
Server: Foo                    S    Server software.
Upgrade: websocket             C/S  Create WebSockets.
Access-Control-*               S    For cross-origin resource sharing (CORS).
Origin                         C    CORS.
Location: http://bar.com/      S    For 3xx redirections.

6.7 HTTP Methods

Read-Only Methods

The 2 most important HTTP methods are GET and POST. Why do we need different HTTP methods? Besides the obvious fact that a POST request can carry a payload where a GET cannot, it is also a good idea to separate read-only operations from write operations. You can use GET for read-only operations and POST for the rest.

A read-only method is called a “safe” method. There are 3 safe methods:

Cacheability

One reason for separating read-only operations from write operations is that read-only operations are generally cacheable. On the other hand, it makes no sense to cache write operations as they are state-changing.

However, the rules for cacheability are more complicated than just the HTTP method.

CRUD and Resources

You may have wondered why there are so many HTTP methods. Wouldn’t just GET and POST suffice? In fact, that’s what many applications do. More methods were added to HTTP because people imagined HTTP as a protocol for managing “resources”. For example, a forum user can manipulate his posts as resources:

These 4 verbs are often referred to as CRUD.

Idempotence

But why add CRUD as HTTP methods? A forum user may also move a post to another forum. Should HTTP also include a MOVE method? Mirroring arbitrary English verbs is not a good reason to define HTTP methods. One of the better reasons is to define the idempotence of operations.

An idempotent operation is one that can be repeated with the same effect. This means that you can safely retry the operation until it succeeds. For example, suppose you rm a file over SSH and the connection breaks before you see the result. The state of the file is now unknown to you, but you can always blindly rm it again (if it’s really the same file).

An idempotent operation over HTTP can still result in a different status code, just like the return code of rm.
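
A toy illustration of this point, using an in-memory store instead of a real server. The posts object and the deletePost handler are invented for this sketch, and 204 and 404 are just one plausible choice of status codes.

```typescript
// A hypothetical in-memory "forum" with one post.
const posts: { [id: string]: string } = { "1": "hello" };

// Hypothetical handler for DELETE /posts/:id; returns the status code.
function deletePost(id: string): number {
    if (!Object.prototype.hasOwnProperty.call(posts, id)) {
        return 404; // already gone; the server state does not change
    }
    delete posts[id];
    return 204;
}

const first = deletePost("1"); // 204: the post is deleted
const retry = deletePost("1"); // 404: different status, same final state
```

After either one call or two, the post is gone. That is what makes the retry safe, even though the client sees a different status code.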

Idempotence in HTTP:

Idempotence in browsers:

But this still doesn’t answer the puzzle of why there are so many verbs, because HTTP could just add 1 more method for idempotent writes instead of 3 (PATCH, PUT, DELETE). In fact, there may be no strong reason for apps to use them all.

Comparison of HTTP Methods

A summary of general-purpose HTTP methods.

Verb    Safe  Idempotent  Cacheable  <form>  CRUD    Req body  Res body
GET     Yes   Yes         Yes        Yes     read    No        Yes
HEAD    Yes   Yes         Yes        No      read    No        No
POST    No    No          No*        Yes     -       Yes       Yes
PATCH   No    No          No*        No      update  Yes       May
PUT     No    Yes         No         No      create  Yes       May
DELETE  No    Yes         No         No      delete  May       May

Note: Cacheable POST or PATCH is possible, but rarely supported.

6.8 Discussion: Text vs. Binary

HTTP is designed in a way that you can send requests from telnet, so you can learn it by poking around. However, textual protocols have downsides.

Text is Often Ambiguous

One downside is that human-readable formats are often less machine-readable, because they are more flexible than necessary. Consider the way HTTP payload length is determined:

HTTP is a simple protocol, where simple means it’s easy to look at. Writing code for it is not simple because there are too many rules for interpreting it, and the rules still leave you with ambiguities.

Text is More Work & Error-Prone

Another downside is that dealing with text is a lot more work. To properly handle text strings, you need to know their length first, which is often determined by delimiters. The extra work of looking for delimiters is the cost of human-readable formats.

It’s also error-prone; in C programming, null-terminated strings (0-delimited) have caused many security exploits.

HTTP/2 is binary and more complex than HTTP/1.1, but parsing the protocol is still easier because you don’t have to deal with elements of unknown length.

6.9 Discussion: Delimiters

Serialization Errors in Delimited Data

Delimiters are everywhere in textual protocols. For example, in HTTP …

One problem with delimiters is that the data cannot contain the delimiter itself. Failure to enforce this rule can lead to some injection exploits.

If a malicious client can trick a buggy server into emitting a header field value with CRLF in it, and the header field is the last field, then the payload body starts with the part of the field value that the attacker controls. This is called “HTTP response splitting”.
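
One way to guard against this is to validate every field at the point where it is serialized. The helper below is only a sketch; encodeHeaderField is a made-up name, and rejecting NUL in addition to CR and LF is a cautious choice for this example.

```typescript
// Serializes one header field, refusing anything that could break framing.
// Hypothetical helper, not part of any library.
function encodeHeaderField(fieldName: string, fieldValue: string): string {
    if (!/^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/.test(fieldName)) {
        throw new Error("invalid header field name");
    }
    if (/[\r\n\0]/.test(fieldValue)) {
        throw new Error("invalid header field value");
    }
    return fieldName + ": " + fieldValue + "\r\n";
}

const safeField = encodeHeaderField("Location", "/home"); // "Location: /home\r\n"

let rejected = false;
try {
    // An attacker-controlled value that tries to end the header early.
    encodeHeaderField("Location", "/home\r\n\r\n<script>alert(1)</script>");
} catch (e) {
    rejected = true; // the field is never written to the socket
}
```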

A proper HTTP server/client must forbid CRLF in header fields as there is no way to encode them. However, this is not true for many generic data formats. For example, JSON uses {}[],: to delimit elements, but a JSON string can contain arbitrary characters, so strings are quoted to avoid ambiguity with delimiters. But the quotes themselves are also delimiters, so escape sequences are needed to encode quotes.

This is why you need a JSON library to produce JSON instead of concatenating strings together. And HTTP is less well defined and more complicated than JSON, so pay attention to the specifications.
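
A small demonstration of the difference, using only the built-in JSON object. The variable names are arbitrary.

```typescript
// User input that happens to contain the string delimiter.
const comment = 'He said "hi"';

// Naive concatenation: the quotes inside the data end the string early.
const broken = '{"text":"' + comment + '"}';

// The JSON library escapes the quotes for us.
const correct = JSON.stringify({ text: comment });

// Returns true if the input is valid JSON.
function parses(s: string): boolean {
    try {
        JSON.parse(s);
        return true;
    } catch (e) {
        return false;
    }
}

const brokenOk = parses(broken);   // false
const correctOk = parses(correct); // true
```

The concatenated version is not just wrong for this input; with a carefully chosen input, it could also produce valid JSON with extra attacker-chosen keys.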

Length-Prefixed Data in Binary Protocols

Delimiters in text are used to separate elements. In binary protocols and formats, a better and simpler alternative is to use length-prefixed data, that is, to specify the length of the element before the element data. Some examples are:
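
As a sketch of the idea in code, here is a made-up frame format: a 4-byte big-endian length followed by the payload. The format and the names encodeFrame and decodeFrame are invented for illustration and do not correspond to any particular protocol.

```typescript
// Frame = 4-byte big-endian payload length + payload bytes.
function encodeFrame(payload: Uint8Array): Uint8Array {
    const frame = new Uint8Array(4 + payload.length);
    new DataView(frame.buffer).setUint32(0, payload.length, false);
    frame.set(payload, 4);
    return frame;
}

// Returns the payload and the number of bytes consumed,
// or null if the frame has not been fully received yet.
function decodeFrame(
    data: Uint8Array,
): { payload: Uint8Array; consumed: number } | null {
    if (data.length < 4) return null;
    const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
    const n = view.getUint32(0, false);
    if (data.length < 4 + n) return null;
    return { payload: data.subarray(4, 4 + n), consumed: 4 + n };
}

// The payload may contain any bytes, including CR and LF.
const frame = encodeFrame(new Uint8Array([13, 10, 13, 10]));
const unpacked = decodeFrame(frame);
```

The decoder never scans the payload, so there is nothing to escape and no byte value is forbidden. A real implementation would also cap the length to avoid buffering arbitrarily large frames.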
