00. Introduction

Master fundamentals by building your own DB

What to learn?

Complex systems like databases are built on a few simple principles.

  1. Atomicity & durability. A DB is more than files!
    • Persist data with fsync.
    • Crash recovery.
  2. KV store based on B-tree.
    • Disk-based data structures.
    • Space management with a free list.
  3. Relational DB on top of KV.
    • How tables and indexes are mapped to low-level B-trees.
    • SQL-like query language; parser & interpreter.
  4. Concurrency control for transactions.

Code a database in 3000 LoC, incrementally

It’s amazing that an interesting and broad topic can be captured in 3000 LoC. You may have experience with larger projects, but not all experience is equal.

LoC Step
366 B+tree data structure.
601 Append-only KV.
731 Practical KV with a free list.
1107 Tables on KV.
1294 Range queries.
1438 Secondary indexes.
1461 Transactional interfaces.
1702 Concurrency control.
2795 SQL-like query language.

Learn by doing: principles instead of jargon

Database literature is full of confusing, overloaded jargon with no consistent meaning. It’s easy to get lost when reading about it. On the other hand, Feymann once said, “what I can’t build, I don’t understand”. Can you build a database by reading about databases? Test your understanding!

While there is a lot to learn, not all knowledge is equally important, it takes only a few principles to build a DB, so anyone can try.

Topic 1: durability and atomicity

More than a data format

Smartphones use SQLite (a file-based DB) heavily. Why store data in SQLite instead of some other format, say JSON? Because you risk data loss if it crashes during an update. The file can end up half-written, truncated, or even missing.

There are techniques to fix this, and they lead to databases.

Durability and atomicity with `fsync`

Atomicity means that data is either updated or not, not in between. Durability means that data is guaranteed to exist after a certain point. They are not separate concerns, because we must achieved both.

The first thing to learn is the fsync syscall. A file write doesn’t reach disk synchronously, there are multiple levels of buffering (OS page cache and on-device RAM). fsync flushes pending data and waits until it’s done. This makes writes durable, but what about atomicity?

Topic 2: indexing data structures

Control latency and cost with indexes

A DB turns a query into a result without the user knowing how. But the result is not the only concern, latency and cost (memory, IO, computation) are also relevant, hence the distinction between analytical (OLAP) and transactional (OLTP).

  • OLAP can involve large amounts of data, with aggregation or join operations. Indexing can be limited or non-existent.
  • OLTP touches small amounts of data using indexes. Low latency & cost.

The word “transactional” is not about DB transactions, it’s just a funny jargon.

In-memory data structures vs. on-disk data structures

There are extra challenges when putting an indexing data structure on disk. One of the problems is updating disk data in-place, because you have to deal with corrupted states after a crash. Disks are not just slower RAM. Reusing in-memory data structure code will not work.

Another problem is latency. Even for the best SSDs, latency is 3 orders of magnitude greater than RAM (50μs vs. 50ns). This limits the choice for data structures. Practical DBs use either B-trees or LSM-trees, not some binary trees or skip lists. Concurrent access to data structures is also a topic.

Topic 3: Relational DB on KV

Two layers of DB interfaces

SQL is almost a synonym for database. But SQL is just a user interface, it’s not fundamental to a DB. What’s important is the functionalities underneath.

Another much simpler interface is key-value (KV). You can get, set, and delete a single key, and most importantly, list a range of keys in sorted order. KV is simpler than SQL because it’s one layer lower. Relational DBs are built on top of KV-like interfaces called storage engines.

Query languages: parsers and interpreters

The last step is easy, despite the larger LoC. Both the parser and the interpreter are coded with nothing but recursion! The lesson can be applied to almost any computer language, or creating your own programming language or DSL (See my book “From Source Code To Machine Code” for more challenges).

Questions | getting help

You can email the author:

Build Your Own X book series

X includes Redis, web server and compiler. Read the web version on the website.

https://build-your-own.org