Build Your Own Web Server From Scratch In JavaScript
06. HTTP Semantics & Message Format
The details of the HTTP protocol are described in a specification consisting of a series of RFC documents. These are a must read if you want to implement HTTP yourself.
HTTP is very human-readable, which means you can probably build an HTTP server by looking at example messages instead of specifications. However, you will probably end up with a very buggy server. That’s why you’ll be introduced to RFC documents.
6.1 High-Level Structures
You already have some ideas about HTTP from the introductory chapter. Let’s review them again:
- An HTTP request message consists of:
  - The method, which is a verb such as GET or POST.
  - The URI.
  - A list of header fields, which is a list of key-value pairs.
  - A payload body may follow the request header. This is not true for all methods, for example, the GET method does not have a payload body while the POST method does.
- An HTTP response consists of:
  - A status code, mostly to indicate whether the request was successful.
  - A list of header fields like the request header.
  - An optional payload body.
These things are mostly the same in all versions of HTTP from HTTP/1.0 to HTTP/3.
An example HTTP response message:
HTTP/1.0 200 OK
Age: 525410
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Thu, 20 Oct 2020 11:11:11 GMT
Etag: "1234567890+gzip+ident"
Expires: Thu, 20 Oct 2020 11:11:11 GMT
Last-Modified: Thu, 20 Oct 2019 11:11:11 GMT
Server: ECS (ddd/EEEE)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
Connection: close
(empty) HTML payload ...
6.2 Content-Length
The header fields are what we need to look at in detail. RFC 9110 describes general concepts and semantics, including how header fields work. Try reading it after this section.
The most important header fields are Content-Length and Transfer-Encoding, because these 2 fields play a role in protocol parsing. As we mentioned earlier, the first thing to consider when designing a protocol is how messages are separated, i.e., how long a message is.
The Length of HTTP Header
An HTTP message consists of 2 parts: the header and the body payload. The header consists of lines of plain text, and the header and body are separated by an empty line. Thus, we can determine the length of the header by looking for the empty line. A line ends with '\r\n', so we are looking for '\r\n\r\n'.
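As a minimal sketch of this rule, the helper below scans the bytes received so far for the empty line. The name findHeaderEnd and the choice to return the offset of the first body byte are assumptions made for illustration.

```javascript
// Returns the offset of the first body byte,
// or -1 if the empty line has not arrived yet.
// `buf` is a Buffer holding the bytes received so far.
function findHeaderEnd(buf) {
    const idx = buf.indexOf('\r\n\r\n');
    if (idx < 0) {
        return -1;      // incomplete header, need more data
    }
    return idx + 4;     // skip the '\r\n\r\n' itself
}

const msg = Buffer.from('GET / HTTP/1.1\r\nHost: example.com\r\n\r\nrest');
const end = findHeaderEnd(msg);
console.log(msg.subarray(0, end).toString('latin1'));   // the header
console.log(msg.subarray(end).toString('latin1'));      // → rest
```

A real server would probably also cap the header size, so that a client that never sends the empty line cannot make the buffer grow without bound.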
The Length of HTTP Body
The length of the body is somewhat complicated, because there are 3 ways to indicate the body length. The first way is to use the Content-Length field to specify the length of the body directly.
Some ancient HTTP/1.0 software doesn’t send the Content-Length field. In that case, the payload body is just the rest of the connection: the parser reads the socket to EOF, and that’s the body. This is the second way to determine the body length. Not only does this design prevent the connection from being reused for multiple requests, but it also has trouble dealing with a premature connection close, as the receiver doesn’t know whether the payload was received in full.
6.3 Chunked Encoding
Ending A Data Stream without Its Length
The third way to determine the body length is to use chunked encoding. Chunked encoding is used to indicate the end of the body without knowing the body length beforehand. It is indicated by the presence of the Transfer-Encoding: chunked field instead of the Content-Length field.
Chunked encoding allows applications to transmit the response while generating it on the fly. This use case is called streaming. An example is displaying real-time logs to the client without waiting for the process to finish.
Another Layer of Protocol
How does chunked encoding work? We could use a special “marker” to mark the end of the body. But the body is just a stream of arbitrary bytes. What if the marker itself appears in the payload? The solution is to put the marker at the beginning of the data instead of at the end of the data. A marker says: Here is a chunk of data of length N, and after N bytes of data, there is the next marker. The last marker is a special one, so you know that the body has ended.
| len | data | len | data | ... | len | data | end |
Here is a concrete example of chunked encoding:
4\r\nHTTP\r\n6\r\nserver\r\n0\r\n\r\n
It is parsed into 3 chunks:
4\r\nHTTP\r\n
6\r\nserver\r\n
0\r\n\r\n
You may have guessed how this works. The first chunk carries 4 bytes of data, 'HTTP'; the second carries 6 bytes of data, 'server'; and the last is a special chunk indicating the end of the payload.
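The decoding steps can be sketched as follows. This is only an illustration under stated assumptions: the whole body is already in one Buffer, and chunk extensions and trailer fields are not handled. The name decodeChunked is made up. One detail from RFC 9112 that the example above does not show is that the chunk length is written in hexadecimal.

```javascript
// Decodes a chunked body held in `buf`.
// Returns the payload as a Buffer, or null if the data ends before the last chunk.
// Assumptions: no chunk extensions and no trailer fields.
function decodeChunked(buf) {
    const parts = [];
    let pos = 0;
    while (true) {
        // the chunk length is a hexadecimal number terminated by CRLF
        const eol = buf.indexOf('\r\n', pos);
        if (eol < 0) {
            return null;    // incomplete length line
        }
        const sizeText = buf.subarray(pos, eol).toString('latin1');
        if (!/^[0-9a-fA-F]+$/.test(sizeText)) {
            throw new Error('bad chunk length');
        }
        const size = parseInt(sizeText, 16);
        const start = eol + 2;
        const next = start + size + 2;  // the data plus its trailing CRLF
        if (next > buf.length) {
            return null;    // incomplete chunk
        }
        if (buf[start + size] !== 0x0d || buf[start + size + 1] !== 0x0a) {
            throw new Error('chunk data is not followed by CRLF');
        }
        if (size === 0) {
            return Buffer.concat(parts);    // the special last chunk
        }
        parts.push(buf.subarray(start, start + size));
        pos = next;
    }
}

const body = decodeChunked(Buffer.from('4\r\nHTTP\r\n6\r\nserver\r\n0\r\n\r\n'));
console.log(body.toString('latin1'));   // → HTTPserver
```

Because the decoder checks that each chunk is followed by CRLF, a wrong length in the input is detected instead of silently shifting the rest of the stream.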
Chunks are not Messages
Note that the chunk boundaries are just side effects of the chunked encoding. These chunks are not meant to be presented to the application, and the application still sees the payload as a stream of bytes, kinda like TCP doesn’t preserve packet boundaries from the IP layer.
6.4 Ambiguities in HTTP
The Happy Cases of Body Length
In summary, this is how to determine the length of the payload body (if the HTTP method allows a payload body, e.g., POST or PUT).
- If Content-Length: number is present, the length is known.
- If Transfer-Encoding: chunked is present, decode the chunked data format until the end of the payload is indicated.
- If neither field is present, this is probably an HTTP/1.0 client; use the rest of the connection data as the payload.
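The three cases can be sketched as a small decision function. The name bodyFraming, the returned object shape, and the Map of lowercased field names are all assumptions for illustration. The sketch only accepts a bare chunked value, and it simply rejects a message that carries both fields, which is one conservative choice among several.

```javascript
// Decides how the payload body is delimited.
// `headers` is assumed to be a Map from lowercased field names to values.
function bodyFraming(headers) {
    const contentLen = headers.get('content-length');
    const transferEnc = headers.get('transfer-encoding');
    if (contentLen !== undefined && transferEnc !== undefined) {
        // ambiguous message; this sketch chooses to reject it
        throw new Error('both Content-Length and Transfer-Encoding are present');
    }
    if (contentLen !== undefined) {
        // only decimal digits are accepted, no sign or whitespace
        if (!/^[0-9]+$/.test(contentLen)) {
            throw new Error('bad Content-Length');
        }
        return { kind: 'length', length: parseInt(contentLen, 10) };
    }
    if (transferEnc !== undefined) {
        // simplification: other codings such as 'gzip, chunked' are not supported
        if (transferEnc.trim().toLowerCase() !== 'chunked') {
            throw new Error('unsupported Transfer-Encoding');
        }
        return { kind: 'chunked' };
    }
    return { kind: 'eof' };     // the rest of the connection is the payload
}

console.log(bodyFraming(new Map([['content-length', '1256']])));
console.log(bodyFraming(new Map([['transfer-encoding', 'chunked']])));
console.log(bodyFraming(new Map()));
```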
Mind the Nasty Cases
You may wonder what happens if both header fields are present, as there is no clear way to interpret this. This kind of ambiguity is a source of security exploits known as “HTTP request smuggling”. Google it to learn more.
Another ambiguity is the nonexistent payload body of the GET request. What if the request includes Content-Length: 0? Should the server ignore the field, forbid the field, or make a special case for the 0-length case? Also, should the server or client even allow users to mess with the Content-Length or Transfer-Encoding field at all? There are many discussions on the Internet, and different implementations handle them differently.
An exercise for the reader: If you are designing a new protocol, how do you avoid ambiguities like this?
6.5 HTTP Message Format
There is more to learn from the RFC 9110 HTTP Semantics document, but much of it is not immediately relevant to creating an HTTP server. Let’s now focus on another document: RFC 9112 — HTTP/1.1. This document describes exactly how bits are transmitted over the network. Make sure you have a copy of the RFC document before proceeding.
Read the BNF Language
The HTTP message format is described in a language called BNF (more precisely, the ABNF variant defined in RFC 5234). Go to the “2. Message” section in RFC 9112 and you will see things like this:
HTTP-message   = start-line CRLF
                 *( field-line CRLF )
                 CRLF
                 [ message-body ]
start-line     = request-line / status-line
This says: An HTTP message is either a request message or a response message. A message starts with either a request line or a status line, followed by multiple header fields, then an empty line, then the optional payload body. Lines are separated by CRLF, which is the ASCII string '\r\n'. The BNF language is much more concise and less ambiguous than English.
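To see how the grammar maps to code, here is a rough sketch that splits a complete header into its start line and field lines. The name splitHeader and the returned object shape are made up for illustration, and the input is assumed to be a string that already ends with the empty line. A status line can be told apart from a request line because it starts with the HTTP version, while a method is a token and cannot contain '/'.

```javascript
// HTTP-message = start-line CRLF *( field-line CRLF ) CRLF [ message-body ]
// `header` is assumed to contain everything up to and including the empty line.
function splitHeader(header) {
    if (!header.endsWith('\r\n\r\n')) {
        throw new Error('the header must end with an empty line');
    }
    // drop the final CRLF CRLF, then split on the CRLF between lines
    const lines = header.slice(0, -4).split('\r\n');
    const startLine = lines[0];
    const fieldLines = lines.slice(1);
    if (startLine === '' || fieldLines.includes('')) {
        throw new Error('unexpected empty line inside the header');
    }
    // start-line = request-line / status-line
    const kind = startLine.startsWith('HTTP/') ? 'status-line' : 'request-line';
    return { kind, startLine, fieldLines };
}

console.log(splitHeader('HTTP/1.0 200 OK\r\nAge: 525410\r\nConnection: close\r\n\r\n'));
console.log(splitHeader('GET /index.html HTTP/1.1\r\n\r\n'));
```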
HTTP Header Fields
field-line = field-name ":" OWS field-value OWS
The header field name and value are separated by a colon, but the rules for field name and value are defined in RFC 9110 instead.
field-name     = token
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
               / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
               ; any VCHAR, except delimiters
OWS            = *( SP / HTAB )
               ; optional whitespace
field-value    = *field-content
field-content  = field-vchar
                 [ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar    = VCHAR / obs-text
obs-text       = %x80-FF
This is the general rule for field name and value. SP, HTAB, and VCHAR refer to space, tab, and printable ASCII character, respectively. Some characters are forbidden in header fields, especially CR and LF.
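These rules translate fairly directly into regular expressions. The sketch below parses a single field line; the names parseFieldLine, TOKEN, and FIELD_VALUE are made up, and the line is assumed to be a string decoded as latin1, so that each character is one byte. Lowercasing the name is a convenience chosen here, which is safe because field names are case-insensitive.

```javascript
// token = 1*tchar
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;
// VCHAR is %x21-7E and obs-text is %x80-FF; SP and HTAB may appear in between.
const FIELD_VALUE = /^[\t\x20-\x7e\x80-\xff]*$/;

// field-line = field-name ":" OWS field-value OWS
// `line` is one header line without its CRLF, decoded as latin1.
function parseFieldLine(line) {
    const colon = line.indexOf(':');
    if (colon < 0) {
        throw new Error('missing colon');
    }
    const name = line.slice(0, colon);
    if (!TOKEN.test(name)) {
        throw new Error('bad field name');
    }
    // strip OWS (only SP and HTAB) from both ends of the value
    const value = line.slice(colon + 1).replace(/^[ \t]+|[ \t]+$/g, '');
    if (!FIELD_VALUE.test(value)) {
        throw new Error('bad field value');
    }
    return { name: name.toLowerCase(), value };
}

console.log(parseFieldLine('Content-Type: text/html; charset=UTF-8'));
console.log(parseFieldLine('Content-Length:  1256 \t'));
```

Note that whitespace between the field name and the colon fails the token check, which matches the grammar: OWS is only allowed after the colon.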
Some header fields have additional rules for interpretation, such as comma-separated values or quoted strings. For now, we can just leave them as they are until we need them.
The HTTP specification is very large, and this chapter only covers the most important bits of implementing an HTTP server, which we will do in the next chapter.
6.6 Common Header Fields
TODO
6.7 Discussion: Text vs. Binary
Text is Easier to Poke Around
HTTP is designed in a way that you can send requests from telnet. This is considered somewhat of an advantage because you can learn how things work by poking around the server with telnet. You can also do some quick tests to debug the server. However, this design choice comes with downsides as well.
Text is Often Ambiguous
One downside is that human-readable designs are often less machine-readable, because being human-readable tends to make the protocol more flexible than necessary. For example, the header field value can be surrounded by optional whitespace. When implementing the protocol, it not only takes extra effort to remove the optional whitespace, but it also introduces a source of error: the programmer may misinterpret what counts as a “space”, resulting in different behavior in different software.
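A small illustration of this pitfall in JavaScript: the built-in String.prototype.trim() removes every character that the language considers whitespace, while HTTP only treats space and tab as optional whitespace. The helper name trimOWS is made up for this sketch.

```javascript
// OWS = *( SP / HTAB ): only these 2 characters count as optional whitespace.
function trimOWS(s) {
    return s.replace(/^[ \t]+|[ \t]+$/g, '');
}

// a value followed by a no-break space and a vertical tab
const raw = ' \tvalue\u00a0\v';
console.log(JSON.stringify(trimOWS(raw)));  // the trailing NBSP and VT are kept
console.log(JSON.stringify(raw.trim()));    // → "value"
```

Two implementations that differ only in this detail would disagree on the field value, which is the kind of mismatch the paragraph above warns about.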
Text is More Work & Error-Prone
Another downside is that a textual protocol always requires more work to parse than the simplest binary alternative. We have talked multiple times about determining the length of things, such as the length of the HTTP header and the length of each line in the header. Whenever there is a textual string, there is a need to determine its length.
A fictional binary alternative to HTTP could start with a fixed-size structure containing fixed-width integers for the size of the method, the size of the URI, the size of the whole header, and the number of header fields. And each header field starts with its own size. This kind of binary protocol requires minimal parsing and is therefore less error-prone.
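To make this concrete, here is a sketch of such a fictional format. Everything about the layout (the field widths, the byte order, the order of the sizes) is invented for illustration and does not correspond to any real protocol. Note that the decoder never searches for a delimiter; it only reads integers and slices.

```javascript
// A made-up binary request header, all integers big-endian:
// | u16 methodLen | u16 uriLen | u32 headerLen | u16 fieldCount |   (10 bytes)
// | method | uri | fieldCount x ( u16 nameLen | u16 valueLen | name | value ) |
const FIXED_SIZE = 10;

function encodeHeader(method, uri, fields) {
    const m = Buffer.from(method, 'latin1');
    const u = Buffer.from(uri, 'latin1');
    const parts = [Buffer.alloc(FIXED_SIZE), m, u];
    for (const [name, value] of fields) {
        const n = Buffer.from(name, 'latin1');
        const v = Buffer.from(value, 'latin1');
        const sizes = Buffer.alloc(4);
        sizes.writeUInt16BE(n.length, 0);
        sizes.writeUInt16BE(v.length, 2);
        parts.push(sizes, n, v);
    }
    const out = Buffer.concat(parts);
    out.writeUInt16BE(m.length, 0);
    out.writeUInt16BE(u.length, 2);
    out.writeUInt32BE(out.length, 4);   // the size of the whole header
    out.writeUInt16BE(fields.length, 8);
    return out;
}

// Returns the parsed header, or null if more data is needed.
function decodeHeader(buf) {
    if (buf.length < FIXED_SIZE) {
        return null;
    }
    const methodLen = buf.readUInt16BE(0);
    const uriLen = buf.readUInt16BE(2);
    const headerLen = buf.readUInt32BE(4);
    const fieldCount = buf.readUInt16BE(8);
    if (buf.length < headerLen) {
        return null;
    }
    let pos = FIXED_SIZE;
    const need = (n) => {
        if (pos + n > headerLen) {
            throw new Error('size exceeds the header');
        }
    };
    const take = (n) => {
        need(n);
        const s = buf.toString('latin1', pos, pos + n);
        pos += n;
        return s;
    };
    const method = take(methodLen);
    const uri = take(uriLen);
    const fields = [];
    for (let i = 0; i < fieldCount; i++) {
        need(4);
        const nameLen = buf.readUInt16BE(pos);
        const valueLen = buf.readUInt16BE(pos + 2);
        pos += 4;
        fields.push([take(nameLen), take(valueLen)]);
    }
    return { method, uri, fields, headerLen };
}

const encoded = encodeHeader('GET', '/index.html', [['host', 'example.com']]);
console.log(decodeHeader(encoded));
```

The only validation left is bounds checking of the sizes, which is much less work than tokenizing text.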
Although this book only implements HTTP, you can learn some lessons about general networking as well. If you are designing a new protocol, you can avoid many of the complexities of HTTP.
6.8 Discussion: Delimiters
Delimiters are everywhere in textual protocols. In HTTP, each line is terminated by CRLF and the header is terminated by an empty line. One problem with delimiters is that the data cannot contain the delimiter itself. Failure to enforce this rule can lead to a class of exploits known as injection attacks.
If a malicious client can trick a buggy server into emitting a header field value with CRLF in it, and the header field is the last field, then the payload body starts with the part of the field value that the attacker controls. Google “HTTP response splitting” to learn more.
This brings us to another lesson — textual protocols are not only harder to parse, they are also harder to serialize. Null-terminated strings are often considered less than ideal in C, as are delimiters in protocols and formats. This also prompts us to pay attention to specifications. RFC 9110 specifies exactly what is allowed in header fields, and CRLF is not included.
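On the serializing side, the defense is to validate before writing. The following sketch refuses to emit a field whose value could break the line structure; the function name encodeFieldLine is hypothetical. RFC 9110 calls out CR, LF, and NUL as invalid and dangerous in field values, so those are the characters checked here.

```javascript
// Serializes one header field as 'name: value\r\n'.
// Throws instead of emitting anything that could split the header.
function encodeFieldLine(name, value) {
    // the field name must be a token
    if (!/^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/.test(name)) {
        throw new Error('bad field name');
    }
    // CR, LF, and NUL must never appear in a field value
    if (/[\r\n\0]/.test(value)) {
        throw new Error('bad field value');
    }
    return `${name}: ${value}\r\n`;
}

console.log(JSON.stringify(encodeFieldLine('Content-Language', 'en')));
try {
    // a value that tries to end the header early and start the body
    encodeFieldLine('Content-Language', 'en\r\n\r\n<script>...');
} catch (err) {
    console.log('rejected:', err.message);  // → rejected: bad field value
}
```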