01. Why Compiler?

Compilers are a fascinating topic. You use them every day. They turn your textual source code into some binary magic executed by computers.

It is often said that learning C leads to a better understanding of computers. What’s even better than just learning C is learning compilers and assembly, because even after learning C, the machine and the compiler may still seem like black boxes to you.

Compiler construction is a subject included in computer science and software engineering education. However, many of today’s (2023+) coders do not have a formal CS education. Basic things such as compilers, databases, operating systems, etc. are often seen as magical black boxes. That’s why I started the “Build Your Own X” book series. To learn and teach basic things by the “from scratch” approach, through succinct & condensed books.

02. Build Your Own Compiler

The book roughly follows the following steps.

Step 1: Interpreter

The first step to my compiler is a simple interpreter, which is a Python script that executes an S-expression (sexpr). It has variables, control flows, and functions, which is sufficient to write programs in. The syntax looks like this:

(def foo (n)
    (if (le n 0)
        (then 0)
        (else (+ n (call foo (- n 1))))))

Interpreters like this are usually about 200 LoC or less in a scripting language, which isn’t much work, but it can be exciting on the first try.

Step 2: Bytecode Compiler

The next step is a bytecode compiler. For example, the function bar is compiled into bytecode:

(def (bar int) ((n int))
    (if (le n 0) (then 0) (else (call bar (- n 1)))))
    const 0 1
    binop le 0 1 1
    jmpf 1 L1
    const 0 1
    jmp L0
    const 1 1
    binop - 0 1 1
    call 1 1 2 2
    ret 1

It’s a virtual instruction set that can later be mapped to real machine instructions.

Step 3: Machine Code

To generate machine code, you need to learn some assembly, so a succinct chapter on x64 assembly language is included.

And the book doesn’t stop at the assembly language, you’ll also dive into the instruction encoding. Emitting machine code directly instead of using an external assembler will help you understand x64 assembly better. Did you know that mov is actually multiple instructions in the same name? Or why you cannot say mov [rax], [rcx] in assembly?

03. The Book

Like my other books, this book follows a step-by-step approach. You start with simple things, like a calculator, then a simple interpreter. Then you tackle the compiler bit-by-bit, compiling to bytecode, learning assembly, and finally generating native executables.

The final product is a small, statically typed language that resembles a subset of C, which compiles into x64 Linux ELF executables. And it has a machine-agnostic IR, so it’s not hard to target other platforms such as ARM, which is a challenge you can take up later.

The interpreter chapters are freely available online.

The whole book is available as paperback or ebook.