Build Your Own Redis with C/C++

CHAPTER

13. Cache Expiration with TTL

13.1 Time-to-live

Cache expiration methods

In many apps, database access is not uniform; a large fraction of requests touch only a small fraction of data repeatedly. This small fraction of hot data can be served from a cache to improve latency and throughput. Redis is mostly used as a cache.

Hot data changes over time. To maintain a small cache of hot data, you need to constantly remove data to make room for fresh data. Redis offers 2 mechanisms:

Set a maximum memory limit, randomly evict keys when the limit is reached.
Set an expiration time (TTL) on keys, delete expired keys using timers.

We will implement TTL (time-to-live), which requires extending the timer code.

Timer quick review

As explained in the last chapter, we need a data structure that can find and remove the nearest timestamp repeatedly, that is, produce the timestamps in sort order.

This data structure can be a linked list iff the timeout value is fixed. TTLs are arbitrary, so we need a more generic sorting data structure, such as an AVL tree, but there is a better tree data structure called a heap. Do not confuse this with the unrelated stack/heap terminology.

13.2 The heap data structure

Binary search tree vs. heap

All binary trees mentioned so far are binary search trees (BST). They are used to order data from left to right. For each node in a BST, its left subtree is ordered before itself and its right subtree is ordered after itself.

       8
   ┌───┴─┐
   4   ┆ 9
 ┌─┴─┐ ┆ ┆
 2 ┆ 6 ┆ ┆
┌┴┬┴┬┴┬┴┬┴┐
│2│4│6│8│9│
└─┴─┴─┴─┴─┘

A heap is also a binary tree, but the data is ordered differently: For each node in a heap, both its left and right children are ordered after itself.

Example: 2 valid heaps with the same data (1, 2, 3, 4, 5, 6, 9).

      1                   1
  ┌───┴───┐           ┌───┴───┐
  4       2           2       5
┌─┴─┐   ┌─┴─┐       ┌─┴─┐   ┌─┴─┐
5   6   3   9       4   3   6   9

The range of the 2 subtrees can overlap. So, unlike BSTs, it cannot produce a fully sorted sequence by simply traversing the tree.

Minimum heap and maximum heap

While a heap doesn’t represent the sorted sequence, it does keep track of the minimum value; the root node contains the minimum value. This is called a minimum heap. A reverse-ordered heap is called a maximum heap.

We will use a minimum heap for our TTL timers. Finding the minimum value (nearest timer) is O(1). Removing or updating a node is O(log N), explained later.

An AVL tree can do everything a heap can do, it can find the minimum value in O(log N) by finding the leftmost node, this node can also be cached to make it accessible in O(1). But why use a heap instead of a BST? Because a heap is simpler, uses less space, and is more performant.

Array-encoded binary tree

A binary tree is some dynamically allocated nodes linked by pointers. However, some binary trees can be represented without pointers using an array:

                        ┌─┐
      1                 │1│
  ┌───┴───┐             ├─┼─┐               ┌─┐┌─┬─┐┌─┬─┬─┬─┐
  4       2    ────►    │4│2│      ────►    │1││4│2││5│6│3│9│ more...
┌─┴─┐   ┌─┴─┐           ├─┼─┼─┬─┐           └─┘└─┴─┘└─┴─┴─┴─┘
5   6   3   9           │5│6│3│9│     index  0  1 2  3 4 5 6
 ... more ...           └─┴─┴─┴─┘     level  0  1 1  2 2 2 2
                       ... more ...

In this representation, nodes are flattened into an array level by level. This requires that each level is fully filled; empty subtrees are not allowed except for the last level.

The number of nodes in each level is simply a power of 2 series: 1, 2, 4, 8, etc. The index of a child node can be calculated from the index of the parent node.

static size_t heap_left(size_t i)   { return i * 2 + 1; }
static size_t heap_right(size_t i)  { return i * 2 + 2; }

For example, node 4’s index is 1, its 2 children are at indexes 3 and 4 respectively.

We can also calculate the index of the parent for the indexes of its children.

static size_t heap_parent(size_t i) { return (i + 1) / 2 - 1; }

Array-encoded heap

The heap data structure is array-encoded. It’s a binary tree with 2 invariants:

A node’s value is less than both of its children.
Each level is fully filled except for the last level.

Heap is the preferred for minimum/maximum value problems. Being array-encoded means no dynamic allocations, so insert and delete are faster.

Update a heap value to be less

Changing the value of a node may break the 1st invariant that a node’s value is less than its children. This invariant is restored by moving nodes around. Let’s say the value is changed to be less than its parent, this can be fixed by swapping the bad node with the parent node.

  5              5                   3
┌─┴─┐   ───►   ┌─┴─┐        ───►   ┌─┴─┐
6   8*         6   3 (bad)         6   5

After that, the parent node will also become less, which may also break the invariant, requiring another swap with the grandparent.

struct HeapItem { uint64_t val; };

// when a node becomes greater than its parent
static void heap_up(HeapItem *a, size_t pos) {
    HeapItem t = a[pos];
    while (pos > 0 && a[heap_parent(pos)].val > t.val) {
        // swap with the parent
        a[pos] = a[heap_parent(pos)];
        pos = heap_parent(pos);
    }
    a[pos] = t;
}

Update a heap value to be greater

Similarly, if a value is updated to be greater, it may need to be swapped with one of its children to restore the invariant.

  2*             5 (bad)             3
┌─┴─┐   ───►   ┌─┴─┐        ───►   ┌─┴─┐
6   3          6   3               6   5

There may be 2 children, we must swap with the smallest one.

static void heap_down(HeapItem *a, size_t pos, size_t len) {
    HeapItem t = a[pos];
    while (true) {
        // find the smallest one among the parent and their kids
        size_t l = heap_left(pos);
        size_t r = heap_right(pos);
        size_t min_pos = pos;
        uint64_t min_val = t.val;
        if (l < len && a[l].val < min_val) {
            min_pos = l;
            min_val = a[l].val;
        }
        if (r < len && a[r].val < min_val) {
            min_pos = r;
        }
        if (min_pos == pos) {
            break;
        }
        // swap with the kid
        a[pos] = a[min_pos];
        pos = min_pos;
    }
    a[pos] = t;
}

Update a heap value

Updating a heap value causes it to move either up or down, depending on whether it’s increased or decreased. The time complexity is O(log N) because heaps are height-balanced trees.

void heap_update(HeapItem *a, size_t pos, size_t len) {
    if (pos > 0 && a[heap_parent(pos)].val > a[pos].val) {
        heap_up(a, pos);
    } else {
        heap_down(a, pos, len);
    }
}

Insert a heap item

The last level of the tree is at the end of the array, so the last level is allowed to be incomplete. Example:

                        ┌─┐
      1                 │1│
  ┌───┴───┐             ├─┼─┐               ┌─┐┌─┬─┐┌─┬─┬─┬─┐
  4       2    ────►    │4│2│      ────►    │1││4│2││5│ │ │ │
┌─┴─┐   ┌─┴─┐           ├─┼─┼─┬─┐           └─┘└─┴─┘└─┴─┴─┴─┘
5  nil nil nil          │5│ │ │ │
                        └─┴─┴─┴─┘

The last level is where new items are inserted, if the last level is full, the tree gets a new level with just 1 item, which is the new last level. This won’t break the 2nd invariant that each level is fully filled except for the last level.

New items are appended to the array, then the 1st invariant is fixed by heap_update().

std::vector<HeapItem> a;
a.push_back(...);
heap_update(a.data(), a.size() - 1, a.size());

Delete a heap item

There are 2 ways to delete an item from a dynamic array. The naive way is to move the items after it forward by 1. This is O(N) and will mess up the heap data structure.

Another way is to swap it with the last item and delete the last item instead. This is O(1) and compatible with the heap data structure. You just need to call heap_update() on the swapped item.

static void heap_delete(std::vector<HeapItem> &a, size_t pos) {
    // swap the erased item with the last item
    a[pos] = a.back();
    a.pop_back();
    // update the swapped item
    if (pos < a.size()) {
        heap_update(a.data(), pos, a.size());
    }
}

13.3 Implement TTL

Associate data with the heap items

A heap item also contains the reference to the key, so it can be deleted by timers.

struct HeapItem {
    uint64_t val;       // heap value, the expiration time
    struct Entry *ref;  // extra data associated with the value
};

A key with a TTL can also be deleted by the del command. To remove the associated heap item, the key should also reference the heap item via its array index.

struct Entry {
    struct HNode node;      // hashtable node
    std::string key;
    // for TTL
    size_t heap_idx = -1;   // array index to the heap item
    // ...
};

The heap data structure code must update Entry::heap_idx when items are moved. But using our experience with intrusive data structures, we can make the code generic by making HeapItem::ref point to the embedded array index instead.

struct HeapItem {
    uint64_t val;       // heap value, the expiration time
    size_t *ref;        // points to `Entry::heap_idx`
};

The data structure code can then update the embedded array index without knowing about struct Entry at all.

static void heap_up(HeapItem *a, size_t pos) {
    HeapItem t = a[pos];
    while (pos > 0 && a[heap_parent(pos)].val > t.val) {
        // swap with the parent
        a[pos] = a[heap_parent(pos)];
        *a[pos].ref = pos;              // added!
        pos = heap_parent(pos);
    }
    a[pos] = t;
    *a[pos].ref = pos;                  // added!
}
// NOTE: heap_down() is also modified.

Add, update, and remove timestamps in the heap

For the sake of laziness, I use negative TTLs to mean removing the TTL.

// set or remove the TTL
static void entry_set_ttl(Entry *ent, int64_t ttl_ms) {
    if (ttl_ms < 0 && ent->heap_idx != (size_t)-1) {
        // setting a negative TTL means removing the TTL
        heap_delete(g_data.heap, ent->heap_idx);
        ent->heap_idx = -1;
    } else if (ttl_ms >= 0) {
        // add or update the heap data structure
        uint64_t expire_at = get_monotonic_msec() + (uint64_t)ttl_ms;
        HeapItem item = {expire_at, &ent->heap_idx};
        heap_upsert(g_data.heap, ent->heap_idx, item);
    }
}

Entry::heap_idx is −1 if no TTL is set. It’s used to update or delete the heap item.

// update or append
static void heap_upsert(std::vector<HeapItem> &a, size_t pos, HeapItem t) {
    if (pos < a.size()) {
        a[pos] = t;         // update an existing item
    } else {
        pos = a.size();
        a.push_back(t);     // or add a new item
    }
    heap_update(a.data(), pos, a.size());
}

Find the nearest timer

next_timer_ms() returns the timeout value of the nearest timer. We can keep the linked list timer from the last chapter and use the smaller timestamp.

static uint32_t next_timer_ms() {
    uint64_t now_ms = get_monotonic_msec();
    uint64_t next_ms = (uint64_t)-1;    // maximum value
    // idle timers using a linked list
    if (!dlist_empty(&g_data.idle_list)) {
        Conn *conn = container_of(g_data.idle_list.next, Conn, idle_node);
        next_ms = conn->last_active_ms + k_idle_timeout_ms;
    }
    // TTL timers using a heap
    if (!g_data.heap.empty() && g_data.heap[0].val < next_ms) {
        next_ms = g_data.heap[0].val;
    }
    // timeout value
    if (next_ms == (uint64_t)-1) {
        return -1;  // no timers, no timeouts
    }
    if (next_ms <= now_ms) {
        return 0;   // missed?
    }
    return (int32_t)(next_ms - now_ms);
}

Handle expirations

Simply handle the heap-based timer in process_timers().

static void process_timers() {
    uint64_t now_ms = get_monotonic_msec();
    // idle timers using a linked list
    // ...
    // TTL timers using a heap
    const std::vector<HeapItem> &heap = g_data.heap;
    while (!heap.empty() && heap[0].val < now_ms) {
        Entry *ent = container_of(heap[0].ref, Entry, heap_idx);
        hm_delete(&g_data.db, &ent->node, &hnode_same);
        entry_del(ent); // delete the key
    }
}

The TTL should also be removed from the heap when deleting a key.

static void entry_del(Entry *ent) {
    // ...
    entry_set_ttl(ent, -1); // remove from the heap data structure
    delete ent;
}

Limit the amount of work per loop iteration

Event-based concurrency means that we cannot stop the event loop for too long, so we should watch out for potentially high latency tasks. If many keys are set to expire at the same time, the server will be busy deleting them for a while. So we set a limit for timer processing.

    const size_t k_max_works = 2000;
    size_t nworks = 0;
    while (!heap.empty() && heap[0].val < now_ms && nworks++ < k_max_works) {
        Entry *ent = container_of(heap[0].ref, Entry, heap_idx);
        hm_delete(&g_data.db, &ent->node, &hnode_same);
        entry_del(ent); // delete the key
    }

If the limit is exceeded, the program will enter the next event loop iteration with a timeout value of 0, causing poll() not to wait. poll() will still notify any IO readiness, so the program can interleave IO handling with a huge number of expiring timers.

Add TTL commands

EXPIRE sets a TTL, TTL returns the TTL value, PERSIST removes the TTL.

static void do_expire(std::vector<std::string> &cmd, Buffer &out) {
    // parse args
    int64_t ttl_ms = 0;
    if (!str2int(cmd[2], ttl_ms)) {
        return out_err(out, ERR_BAD_ARG, "expect int64");
    }
    // lookup the key
    LookupKey key;
    key.key.swap(cmd[1]);
    key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());
    HNode *node = hm_lookup(&g_data.db, &key.node, &entry_eq);
    // set TTL
    if (node) {
        Entry *ent = container_of(node, Entry, node);
        entry_set_ttl(ent, ttl_ms);
    }
    return out_int(out, node ? 1: 0);
}

Source code:

13_server.cpp

avl.cpp

avl.h

common.h

hashtable.cpp

hashtable.h

heap.cpp

heap.h

list.h

test_heap.cpp

zset.cpp

zset.h

(Error report | Ask questions) @ build-your-own.org

See also:
codecrafters.io offers “Build Your Own X” courses in many programming languages.
Including Redis, Git, SQLite, Docker, and more.

Check it out

Build Your Own Redis with C/C++

prev ◄─┘ Contents └─► next