00. Introduction
Master fundamentals by building your own DB
What to learn?
Complex systems like databases are built on a few simple principles.
- Atomicity & durability. A DB is more than
files!
- Persist data with
fsync
. - Crash recovery.
- Persist data with
- KV store based on B-tree.
- Disk-based data structures.
- Space management with a free list.
- Relational DB on top of KV.
- How tables and indexes are mapped to low-level B-trees.
- SQL-like query language; parser & interpreter.
- Concurrency control for transactions.
Code a database in 3000 LoC, incrementally
It’s amazing that an interesting and broad topic can be captured in 3000 LoC. You may have experience with larger projects, but not all experience is equal.
LoC | Step |
---|---|
366 | B+tree data structure. |
601 | Append-only KV. |
731 | Practical KV with a free list. |
1107 | Tables on KV. |
1294 | Range queries. |
1438 | Secondary indexes. |
1461 | Transactional interfaces. |
1702 | Concurrency control. |
2795 | SQL-like query language. |
Learn by doing: principles instead of jargon
Database literature is full of confusing, overloaded jargon with no consistent meaning. It’s easy to get lost when reading about it. On the other hand, Feymann once said, “what I can’t build, I don’t understand”. Can you build a database by reading about databases? Test your understanding!
While there is a lot to learn, not all knowledge is equally important, it takes only a few principles to build a DB, so anyone can try.
Topic 1: durability and atomicity
More than a data format
Smartphones use SQLite (a file-based DB) heavily. Why store data in SQLite instead of some other format, say JSON? Because you risk data loss if it crashes during an update. The file can end up half-written, truncated, or even missing.
There are techniques to fix this, and they lead to databases.
Durability and atomicity with `fsync`
Atomicity means that data is either updated or not, not in between. Durability means that data is guaranteed to exist after a certain point. They are not separate concerns, because we must achieved both.
The first thing to learn is the fsync
syscall. A file
write doesn’t reach disk synchronously, there are multiple levels of
buffering (OS page cache and on-device RAM). fsync
flushes
pending data and waits until it’s done. This makes writes durable, but
what about atomicity?
Topic 2: indexing data structures
Control latency and cost with indexes
A DB turns a query into a result without the user knowing how. But the result is not the only concern, latency and cost (memory, IO, computation) are also relevant, hence the distinction between analytical (OLAP) and transactional (OLTP).
- OLAP can involve large amounts of data, with aggregation or join operations. Indexing can be limited or non-existent.
- OLTP touches small amounts of data using indexes. Low latency & cost.
The word “transactional” is not about DB transactions, it’s just a funny jargon.
In-memory data structures vs. on-disk data structures
There are extra challenges when putting an indexing data structure on disk. One of the problems is updating disk data in-place, because you have to deal with corrupted states after a crash. Disks are not just slower RAM. Reusing in-memory data structure code will not work.
Another problem is latency. Even for the best SSDs, latency is 3 orders of magnitude greater than RAM (50μs vs. 50ns). This limits the choice for data structures. Practical DBs use either B-trees or LSM-trees, not some binary trees or skip lists. Concurrent access to data structures is also a topic.
Topic 3: Relational DB on KV
Two layers of DB interfaces
SQL is almost a synonym for database. But SQL is just a user interface, it’s not fundamental to a DB. What’s important is the functionalities underneath.
Another much simpler interface is key-value (KV). You can get, set, and delete a single key, and most importantly, list a range of keys in sorted order. KV is simpler than SQL because it’s one layer lower. Relational DBs are built on top of KV-like interfaces called storage engines.
Query languages: parsers and interpreters
The last step is easy, despite the larger LoC. Both the parser and the interpreter are coded with nothing but recursion! The lesson can be applied to almost any computer language, or creating your own programming language or DSL (See my book “From Source Code To Machine Code” for more challenges).
Build Your Own X book series
X includes Redis, web server and compiler. Read the web version on the website.