Build Your Own Web Server From Scratch In JavaScript

06. HTTP Semantics & Message Format

The details of the HTTP protocol are described in a specification consisting of a series of RFC documents. These are a must read if you want to implement HTTP yourself.

HTTP is very human-readable, which means you can probably build an HTTP server by looking at example messages instead of specifications. However, you will probably end up with a very buggy server. That’s why you’ll be introduced to RFC documents.

6.1 High-Level Structures

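You already have some ideas about HTTP from the introductory chapter. Let’s review them again: a request message carries a method, a URI, a list of header fields, and an optional payload body; a response message carries a status code, a list of header fields, and an optional payload body.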

These things are mostly the same in all versions of HTTP from HTTP/1.0 to HTTP/3.

An example HTTP response message:

HTTP/1.0 200 OK
Age: 525410
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Thu, 20 Oct 2020 11:11:11 GMT
Etag: "1234567890+gzip+ident"
Expires: Thu, 20 Oct 2020 11:11:11 GMT
Last-Modified: Thu, 20 Oct 2019 11:11:11 GMT
Server: ECS (ddd/EEEE)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
Connection: close
(empty)
HTML payload ...

6.2 Content-Length

The header fields are what we need to look at in detail. RFC 9110 describes general concepts and semantics, including how header fields work. Try reading it after this section.

The most important header fields are Content-Length and Transfer-Encoding, because these two fields play a role in protocol parsing. As we mentioned earlier, the first thing to consider when designing a protocol is how messages are separated, i.e., how to tell how long a message is.

The Length of HTTP Header

An HTTP message consists of 2 parts: the header and the body payload. The header consists of lines of plain text, and the header and body are separated by an empty line. Thus, we can determine the length of the header by looking for the empty line. A line ends with '\r\n', so we are looking for '\r\n\r\n'.
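As a rough sketch of this idea (not necessarily how the server in the next chapter does it), the search for the empty line can be written with Node's Buffer.indexOf. The function name findHeaderEnd and the sample message are made up for illustration.

```javascript
// Returns the length of the header in bytes, including the terminating
// empty line, or -1 if the header has not been fully received yet.
function findHeaderEnd(buf) {
    const idx = buf.indexOf('\r\n\r\n');
    return idx < 0 ? -1 : idx + 4;
}

// Usage: split buffered data into the header and whatever follows it.
const data = Buffer.from('HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi');
const headerLen = findHeaderEnd(data);
// latin1 maps each byte to exactly one character, so no byte is lost.
const header = data.subarray(0, headerLen).toString('latin1');
const rest = data.subarray(headerLen);  // body bytes received so far
```

A real server would also cap how many bytes it buffers while waiting for the empty line, otherwise a client could send an endless header.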

The Length of HTTP Body

The length of the body is somewhat complicated, because there are 3 ways to indicate the body length. The first way is to use the Content-Length field to specify the length of the body directly.

Some ancient HTTP/1.0 software doesn’t send the Content-Length field. In that case, the payload body is just the rest of the connection: the parser reads the socket until EOF, and whatever it has read is the body. This is the second way to determine the body length. Not only does this design prevent the connection from being reused for multiple requests, but it also cannot deal with a premature connection close, because the receiver doesn’t know whether the payload was received in full.

6.3 Chunked Encoding

Ending A Data Stream without Its Length

The third way to determine the body length is to use chunked encoding. Chunked encoding is used to indicate the end of the body without knowing the body length beforehand. It is implied by the presence of the Transfer-Encoding: chunked field instead of the Content-Length field.

Chunked encoding allows applications to transmit the response while generating it on the fly. This use case is called streaming. An example is displaying real-time logs to the client without waiting for the process to finish.

Another Layer of Protocol

How does chunked encoding work? We could use a special “marker” to mark the end of the body. But the body is just a stream of arbitrary bytes; what if the marker itself appears in the payload? The solution is to put the marker at the beginning of the data instead of at the end. A marker says: here is a chunk of data of length N, and after N bytes of data comes the next marker. The last marker is a special one, so you know that the body has ended.

| len | data | len | data | ... | len | data | end |

Here is a concrete example of chunked encoding:

4\r\nHTTP\r\n6\r\nserver\r\n0\r\n\r\n

It is parsed into 3 chunks:

  1. 4\r\nHTTP\r\n
  2. 6\r\nserver\r\n
  3. 0\r\n\r\n

You may have guessed how this works. The first chunk carries 4 bytes of data, 'HTTP'; the second carries 6 bytes of data, 'server'; and the last is a special zero-length chunk indicating the end of the payload.
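To make the format concrete, here is a minimal decoder sketch. It assumes the whole body is already in one buffer and ignores chunk extensions and trailer fields, which RFC 9112 also allows. One detail the short example hides: RFC 9112 writes the chunk size in hexadecimal. The name decodeChunked is made up for illustration.

```javascript
// Decodes a chunked body held in `buf`.
// Returns the payload as a Buffer, or null if more data is needed.
// Assumptions: no chunk extensions, no trailer fields.
function decodeChunked(buf) {
    const parts = [];
    let pos = 0;
    while (true) {
        // 1. The marker: the chunk size in hexadecimal, ended by CRLF.
        const lineEnd = buf.indexOf('\r\n', pos);
        if (lineEnd < 0) return null;
        const sizeText = buf.subarray(pos, lineEnd).toString('latin1');
        if (!/^[0-9a-fA-F]+$/.test(sizeText)) throw new Error('bad chunk size');
        const size = parseInt(sizeText, 16);
        pos = lineEnd + 2;
        // 2. `size` bytes of data, followed by CRLF.
        if (buf.length < pos + size + 2) return null;
        if (buf[pos + size] !== 0x0d || buf[pos + size + 1] !== 0x0a) {
            throw new Error('missing CRLF after chunk data');
        }
        // 3. The zero-length chunk marks the end of the payload.
        if (size === 0) return Buffer.concat(parts);
        parts.push(buf.subarray(pos, pos + size));
        pos += size + 2;
    }
}

const wire = Buffer.from('4\r\nHTTP\r\n6\r\nserver\r\n0\r\n\r\n');
const payload = decodeChunked(wire);    // Buffer containing 'HTTPserver'
```

Since the sizes are explicit, the decoder never has to search the data itself for a delimiter, so the payload can contain any bytes, including CRLF.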

Chunks are not Messages

Note that the chunk boundaries are just a side effect of the chunked encoding. The chunks are not meant to be presented to the application; the application still sees the payload as a stream of bytes, much like TCP doesn’t preserve packet boundaries from the IP layer.

6.4 Ambiguities in HTTP

The Happy Cases of Body Length

In summary, this is how to determine the length of the payload body (if the HTTP method allows a payload body, e.g., POST or PUT).

  1. If Content-Length: number is present, the length is known.
  2. If Transfer-Encoding: chunked is present, decode the chunked data format until the end of the payload is indicated.
  3. If neither field is present, this is probably an HTTP/1.0 client; use the rest of the connection data as the payload.
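The three cases can be sketched as a small decision function. This is only an illustration: the function name, the return shape, and the assumption that headers arrive as a Map with lowercased field names are all made up here. Rejecting a message that carries both fields is one conservative choice for this sketch, not the only possible policy.

```javascript
// `headers` is assumed to be a Map of lowercased field names to values.
function determineBodyFraming(headers) {
    const cl = headers.get('content-length');
    const te = headers.get('transfer-encoding');
    if (cl !== undefined && te !== undefined) {
        // Ambiguous message: refuse it instead of guessing (assumption).
        throw new Error('both Content-Length and Transfer-Encoding present');
    }
    if (cl !== undefined) {
        // Case 1: the length is stated directly, as decimal digits only.
        if (!/^[0-9]+$/.test(cl)) throw new Error('bad Content-Length');
        const length = parseInt(cl, 10);
        if (!Number.isSafeInteger(length)) throw new Error('Content-Length too large');
        return { kind: 'length', length };
    }
    if (te !== undefined) {
        // Case 2: only plain `chunked` is supported in this sketch.
        if (te.trim().toLowerCase() !== 'chunked') {
            throw new Error('unsupported Transfer-Encoding');
        }
        return { kind: 'chunked' };
    }
    // Case 3: neither field; the body is the rest of the connection.
    return { kind: 'eof' };
}

const framing = determineBodyFraming(new Map([['content-length', '1256']]));
// framing.kind is 'length' and framing.length is 1256
```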

Mind the Nasty Cases

You may wonder what happens if both header fields are present, as there is no clear way to interpret this. This kind of ambiguity is a source of security exploits known as “HTTP request smuggling”. Google it to learn more.

Another ambiguity is the nonexistent payload body of a GET request: what if the request includes Content-Length: 0? Should the server ignore the field, forbid the field, or make a special case for the 0-length case? Also, should the server or client even allow users to mess with the Content-Length or Transfer-Encoding field at all? There are many discussions on the Internet, and different implementations handle them differently.

An exercise for the reader: If you are designing a new protocol, how do you avoid ambiguities like this?

6.5 HTTP Message Format

There is more to learn from the RFC 9110 HTTP Semantics document, but much of it is not immediately relevant to creating an HTTP server. Let’s now focus on another document: RFC 9112 — HTTP/1.1. This document describes exactly how bits are transmitted over the network. Make sure you have a copy of the RFC document before proceeding.

Read the BNF Language

The HTTP message format is described in a language called BNF (more precisely, the ABNF variant defined in RFC 5234). Go to the “2. Message” section in RFC 9112 and you will see things like this:

HTTP-message   = start-line CRLF
                 *( field-line CRLF )
                 CRLF
                 [ message-body ]
start-line     = request-line / status-line

This says: An HTTP message is either a request message or a response message. A message starts with either a request line or a status line, followed by multiple header fields, then an empty line, then the optional payload body. Lines are separated by CRLF, which is the ASCII string '\r\n'. The BNF language is much more concise and less ambiguous than English.

HTTP Header Fields

field-line   = field-name ":" OWS field-value OWS

The header field name and value are separated by a colon, but the rules for field name and value are defined in RFC 9110 instead.

field-name     = token
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
               ; any VCHAR, except delimiters
OWS            = *( SP / HTAB )
               ; optional whitespace
field-value    = *field-content
field-content  = field-vchar
                 [ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar    = VCHAR / obs-text
obs-text       = %x80-FF

This is the general rule for field name and value. SP, HTAB, and VCHAR refer to space, tab, and printable ASCII character, respectively. Some characters are forbidden in header fields, especially CR and LF.
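The grammar above translates fairly directly into regular expressions. The following is a sketch of one possible translation; parseFieldLine and the constant names are made up, and the input is assumed to be a single header line without its CRLF, decoded as latin1 so that each byte is one character.

```javascript
// field-name = token = 1*tchar
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;
// field-value: empty, or a run of VCHAR (0x21-0x7E) and obs-text
// (0x80-0xFF) where SP / HTAB may only appear between visible characters.
const FIELD_VALUE =
    /^(?:[\x21-\x7e\x80-\xff](?:[ \t\x21-\x7e\x80-\xff]*[\x21-\x7e\x80-\xff])?)?$/;

// field-line = field-name ":" OWS field-value OWS
function parseFieldLine(line) {
    const colon = line.indexOf(':');
    if (colon < 0) throw new Error('missing colon');
    const name = line.slice(0, colon);
    if (!TOKEN.test(name)) throw new Error('bad field name');
    // OWS = *( SP / HTAB ): strip only spaces and tabs, nothing else.
    const value = line.slice(colon + 1).replace(/^[ \t]+|[ \t]+$/g, '');
    if (!FIELD_VALUE.test(value)) throw new Error('bad field value');
    return [name, value];
}

const field = parseFieldLine('Content-Type: text/html; charset=UTF-8');
// field is ['Content-Type', 'text/html; charset=UTF-8']
```

Note that the whitespace stripping deliberately avoids String.prototype.trim, which removes many more characters than the two that OWS allows.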

Some header fields have additional rules for interpretation, such as comma-separated values or quoted strings. For now, we can just leave them as they are until we need them.

The HTTP specification is very large, and this chapter only covers the most important bits of implementing an HTTP server, which we will do in the next chapter.

6.6 Common Header Fields

TODO

6.7 Discussion: Text vs. Binary

Text is Easier to Poke Around

HTTP is designed in a way that you can send requests from telnet. This is considered somewhat of an advantage because you can learn how things work by poking around the server with telnet. You can also do some quick tests to debug the server. However, this design choice comes with downsides as well.

Text is Often Ambiguous

One downside is that human-readable designs are often less machine-readable, because being human-readable tends to make the protocol more flexible than necessary. For example, a header field value can be surrounded by optional whitespace. When implementing the protocol, it not only takes extra effort to remove the optional whitespace, but it also introduces a source of error: programmers may disagree on what counts as a “space”, resulting in different behavior in different software.

Text is More Work & Error-Prone

Another downside is that a textual protocol always requires more work to parse than the simplest binary alternative. We have talked multiple times about determining the length of things, such as the length of the HTTP header and the length of each line in the header. Whenever there is a textual string, there is a need to determine its length.

A fictional binary alternative to HTTP could start with a fixed-size structure containing fixed-width integers for the size of the method, the size of the URI, the size of the whole header, and the number of header fields. And each header field starts with its own size. This kind of binary protocol requires minimal parsing and is therefore less error-prone.
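To see how little parsing such a format needs, here is a sketch of one possible layout. Everything in it is hypothetical: the 9-byte fixed structure, the field widths, the big-endian byte order, and the function names were all invented for this example.

```javascript
// Hypothetical fixed part, 9 bytes, big-endian:
// u8 methodLen | u16 uriLen | u32 headerLen | u16 fieldCount
const FIXED = 9;

function encodeHeader(method, uri, fields) {
    const parts = [Buffer.from(method, 'latin1'), Buffer.from(uri, 'latin1')];
    for (const [name, value] of fields) {
        const n = Buffer.from(name, 'latin1');
        const v = Buffer.from(value, 'latin1');
        const sizes = Buffer.alloc(4);      // each field starts with its sizes
        sizes.writeUInt16BE(n.length, 0);
        sizes.writeUInt16BE(v.length, 2);
        parts.push(sizes, n, v);
    }
    const rest = Buffer.concat(parts);
    const fixed = Buffer.alloc(FIXED);
    fixed.writeUInt8(parts[0].length, 0);
    fixed.writeUInt16BE(parts[1].length, 1);
    fixed.writeUInt32BE(FIXED + rest.length, 3);
    fixed.writeUInt16BE(fields.length, 7);
    return Buffer.concat([fixed, rest]);
}

// Returns the decoded header, or null if more data is needed.
function decodeHeader(buf) {
    if (buf.length < FIXED) return null;
    const methodLen = buf.readUInt8(0);
    const uriLen = buf.readUInt16BE(1);
    const headerLen = buf.readUInt32BE(3);
    const fieldCount = buf.readUInt16BE(7);
    if (buf.length < headerLen) return null;
    let pos = FIXED;
    const take = (n) => {                   // read n bytes, bounds-checked
        if (pos + n > headerLen) throw new Error('size exceeds header');
        const s = buf.toString('latin1', pos, pos + n);
        pos += n;
        return s;
    };
    const method = take(methodLen);
    const uri = take(uriLen);
    const fields = [];
    for (let i = 0; i < fieldCount; i++) {
        if (pos + 4 > headerLen) throw new Error('size exceeds header');
        const nameLen = buf.readUInt16BE(pos);
        const valueLen = buf.readUInt16BE(pos + 2);
        pos += 4;
        fields.push([take(nameLen), take(valueLen)]);
    }
    return { method, uri, fields, headerLen };
}
```

There is no searching for delimiters and no whitespace to strip. The only checks left are that the stated sizes fit inside the header.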

Although this book only implements HTTP, you can learn some lessons about general networking as well. If you are designing a new protocol, you can avoid many of the complexities of HTTP.

6.8 Discussion: Delimiters

Delimiters are everywhere in textual protocols. In HTTP, each line is terminated by CRLF and the header is terminated by an empty line. One problem with delimiters is that the data cannot contain the delimiter itself. Failure to enforce this rule can lead to a class of exploits known as injection attacks.

If a malicious client can trick a buggy server into emitting a header field value with CRLF in it, and the header field is the last field, then the payload body starts with the part of the field value that the attacker controls. Google “HTTP response splitting” to learn more.

This brings us to another lesson — textual protocols are not only harder to parse, they are also harder to serialize. Null-terminated strings are often considered less than ideal in C, as are delimiters in protocols and formats. This also prompts us to pay attention to specifications. RFC 9110 specifies exactly what is allowed in header fields, and CRLF is not included.
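On the serializing side, the defense is to validate before writing. Here is a minimal sketch, with a made-up function name, that refuses to emit a field whose name is not a token or whose value contains CR, LF, or NUL:

```javascript
// Serializes one header field, refusing anything that could break the
// line-based framing. The name must be a token; the value must not
// contain CR, LF, or NUL.
function serializeField(name, value) {
    if (!/^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/.test(name)) {
        throw new Error('bad field name');
    }
    if (/[\r\n\0]/.test(value)) {
        throw new Error('CR, LF, or NUL in field value');
    }
    return name + ': ' + value + '\r\n';
}

const line = serializeField('Location', '/home');   // 'Location: /home\r\n'
// serializeField('Location', '/home\r\nSet-Cookie: x=1') throws an Error
// instead of letting the attacker smuggle in a second header line.
```

This check is deliberately looser than the full field-value grammar; it only blocks the characters that would break the message framing.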
