13. Cache Expiration with TTL
13.1 Time-to-live
Cache expiration methods
In many apps, database access is not uniform; a large fraction of requests touch only a small fraction of data repeatedly. This small fraction of hot data can be served from a cache to improve latency and throughput. Redis is mostly used as a cache.
Hot data changes over time. To maintain a small cache of hot data, you need to constantly remove data to make room for fresh data. Redis offers 2 mechanisms:
- Set a maximum memory limit, randomly evict keys when the limit is reached.
- Set an expiration time (TTL) on keys, delete expired keys using timers.
We will implement TTL (time-to-live), which requires extending the timer code.
Timer quick review
As explained in the last chapter, we need a data structure that can find and remove the nearest timestamp repeatedly, that is, produce the timestamps in sort order.
This data structure can be a linked list iff the timeout value is fixed. TTLs are arbitrary, so we need a more generic sorting data structure, such as an AVL tree, but there is a better tree data structure called a heap. Do not confuse this with the unrelated stack/heap terminology.
13.2 The heap data structure
Binary search tree vs. heap
All binary trees mentioned so far are binary search trees (BST). They are used to order data from left to right. For each node in a BST, its left subtree is ordered before itself and its right subtree is ordered after itself.
8
┌───┴─┐
4 ┆ 9
┌─┴─┐ ┆ ┆
2 ┆ 6 ┆ ┆
┌┴┬┴┬┴┬┴┬┴┐
│2│4│6│8│9│
└─┴─┴─┴─┴─┘
A heap is also a binary tree, but the data is ordered differently: For each node in a heap, both its left and right children are ordered after itself.
Example: 2 valid heaps with the same data (1, 2, 3, 4, 5, 6, 9).
1 1
┌───┴───┐ ┌───┴───┐
4 2 2 5
┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
5 6 3 9 4 3 6 9
The range of the 2 subtrees can overlap. So, unlike BSTs, it cannot produce a fully sorted sequence by simply traversing the tree.
Minimum heap and maximum heap
While a heap doesn’t represent the sorted sequence, it does keep track of the minimum value; the root node contains the minimum value. This is called a minimum heap. A reverse-ordered heap is called a maximum heap.
We will use a minimum heap for our TTL timers. Finding the minimum value (nearest timer) is O(1). Removing or updating a node is O(log N), explained later.
An AVL tree can do everything a heap can do, it can find the minimum value in O(log N) by finding the leftmost node, this node can also be cached to make it accessible in O(1). But why use a heap instead of a BST? Because a heap is simpler, uses less space, and is more performant.
Array-encoded binary tree
A binary tree is some dynamically allocated nodes linked by pointers. However, some binary trees can be represented without pointers using an array:
┌─┐
1 │1│
┌───┴───┐ ├─┼─┐ ┌─┐┌─┬─┐┌─┬─┬─┬─┐
4 2 ────► │4│2│ ────► │1││4│2││5│6│3│9│ more...
┌─┴─┐ ┌─┴─┐ ├─┼─┼─┬─┐ └─┘└─┴─┘└─┴─┴─┴─┘
5 6 3 9 │5│6│3│9│ index 0 1 2 3 4 5 6
... more ... └─┴─┴─┴─┘ level 0 1 1 2 2 2 2
... more ...
In this representation, nodes are flattened into an array level by level. This requires that each level is fully filled; empty subtrees are not allowed except for the last level.
The number of nodes in each level is simply a power of 2 series: 1, 2, 4, 8, etc. The index of a child node can be calculated from the index of the parent node.
static size_t heap_left(size_t i) { return i * 2 + 1; }
static size_t heap_right(size_t i) { return i * 2 + 2; }
For example, node 4’s index is 1, its 2 children are at indexes 3 and 4 respectively.
We can also calculate the index of the parent for the indexes of its children.
static size_t heap_parent(size_t i) { return (i + 1) / 2 - 1; }
Array-encoded heap
The heap data structure is array-encoded. It’s a binary tree with 2 invariants:
- A node’s value is less than both of its children.
- Each level is fully filled except for the last level.
Heap is the preferred for minimum/maximum value problems. Being array-encoded means no dynamic allocations, so insert and delete are faster.
Update a heap value to be less
Changing the value of a node may break the 1st invariant that a node’s value is less than its children. This invariant is restored by moving nodes around. Let’s say the value is changed to be less than its parent, this can be fixed by swapping the bad node with the parent node.
5 5 3
┌─┴─┐ ───► ┌─┴─┐ ───► ┌─┴─┐
6 8* 6 3 (bad) 6 5
After that, the parent node will also become less, which may also break the invariant, requiring another swap with the grandparent.
struct HeapItem { uint64_t val; };
// when a node becomes greater than its parent
static void heap_up(HeapItem *a, size_t pos) {
= a[pos];
HeapItem t while (pos > 0 && a[heap_parent(pos)].val > t.val) {
// swap with the parent
[pos] = a[heap_parent(pos)];
a= heap_parent(pos);
pos }
[pos] = t;
a}
Update a heap value to be greater
Similarly, if a value is updated to be greater, it may need to be swapped with one of its children to restore the invariant.
2* 5 (bad) 3
┌─┴─┐ ───► ┌─┴─┐ ───► ┌─┴─┐
6 3 6 3 6 5
There may be 2 children, we must swap with the smallest one.
static void heap_down(HeapItem *a, size_t pos, size_t len) {
= a[pos];
HeapItem t while (true) {
// find the smallest one among the parent and their kids
size_t l = heap_left(pos);
size_t r = heap_right(pos);
size_t min_pos = pos;
uint64_t min_val = t.val;
if (l < len && a[l].val < min_val) {
= l;
min_pos = a[l].val;
min_val }
if (r < len && a[r].val < min_val) {
= r;
min_pos }
if (min_pos == pos) {
break;
}
// swap with the kid
[pos] = a[min_pos];
a= min_pos;
pos }
[pos] = t;
a}
Update a heap value
Updating a heap value causes it to move either up or down, depending on whether it’s increased or decreased. The time complexity is O(log N) because heaps are height-balanced trees.
void heap_update(HeapItem *a, size_t pos, size_t len) {
if (pos > 0 && a[heap_parent(pos)].val > a[pos].val) {
(a, pos);
heap_up} else {
(a, pos, len);
heap_down}
}
Insert a heap item
The last level of the tree is at the end of the array, so the last level is allowed to be incomplete. Example:
┌─┐
1 │1│
┌───┴───┐ ├─┼─┐ ┌─┐┌─┬─┐┌─┬─┬─┬─┐
4 2 ────► │4│2│ ────► │1││4│2││5│ │ │ │
┌─┴─┐ ┌─┴─┐ ├─┼─┼─┬─┐ └─┘└─┴─┘└─┴─┴─┴─┘
5 nil nil nil │5│ │ │ │
└─┴─┴─┴─┘
The last level is where new items are inserted, if the last level is full, the tree gets a new level with just 1 item, which is the new last level. This won’t break the 2nd invariant that each level is fully filled except for the last level.
New items are appended to the array, then the 1st invariant is fixed
by heap_update()
.
std::vector<HeapItem> a;
.push_back(...);
a(a.data(), a.size() - 1, a.size()); heap_update
Delete a heap item
There are 2 ways to delete an item from a dynamic array. The naive way is to move the items after it forward by 1. This is O(N) and will mess up the heap data structure.
Another way is to swap it with the last item and delete the last item
instead. This is O(1) and
compatible with the heap data structure. You just need to call
heap_update()
on the swapped item.
static void heap_delete(std::vector<HeapItem> &a, size_t pos) {
// swap the erased item with the last item
[pos] = a.back();
a.pop_back();
a// update the swapped item
if (pos < a.size()) {
(a.data(), pos, a.size());
heap_update}
}
13.3 Implement TTL
Associate data with the heap items
A heap item also contains the reference to the key, so it can be deleted by timers.
struct HeapItem {
uint64_t val; // heap value, the expiration time
struct Entry *ref; // extra data associated with the value
};
A key with a TTL can also be deleted by the del
command.
To remove the associated heap item, the key should also reference the
heap item via its array index.
struct Entry {
struct HNode node; // hashtable node
std::string key;
// for TTL
size_t heap_idx = -1; // array index to the heap item
// ...
};
The heap data structure code must update Entry::heap_idx
when items are moved. But using our experience with intrusive data
structures, we can make the code generic by making
HeapItem::ref
point to the embedded array index
instead.
struct HeapItem {
uint64_t val; // heap value, the expiration time
size_t *ref; // points to `Entry::heap_idx`
};
The data structure code can then update the embedded array index
without knowing about struct Entry
at all.
static void heap_up(HeapItem *a, size_t pos) {
= a[pos];
HeapItem t while (pos > 0 && a[heap_parent(pos)].val > t.val) {
// swap with the parent
[pos] = a[heap_parent(pos)];
a*a[pos].ref = pos; // added!
= heap_parent(pos);
pos }
[pos] = t;
a*a[pos].ref = pos; // added!
}
// NOTE: heap_down() is also modified.
Add, update, and remove timestamps in the heap
For the sake of laziness, I use negative TTLs to mean removing the TTL.
// set or remove the TTL
static void entry_set_ttl(Entry *ent, int64_t ttl_ms) {
if (ttl_ms < 0 && ent->heap_idx != (size_t)-1) {
// setting a negative TTL means removing the TTL
(g_data.heap, ent->heap_idx);
heap_delete->heap_idx = -1;
ent} else if (ttl_ms >= 0) {
// add or update the heap data structure
uint64_t expire_at = get_monotonic_msec() + (uint64_t)ttl_ms;
= {expire_at, &ent->heap_idx};
HeapItem item (g_data.heap, ent->heap_idx, item);
heap_upsert}
}
Entry::heap_idx
is −1
if no TTL is set. It’s used to update or delete the heap item.
// update or append
static void heap_upsert(std::vector<HeapItem> &a, size_t pos, HeapItem t) {
if (pos < a.size()) {
[pos] = t; // update an existing item
a} else {
= a.size();
pos .push_back(t); // or add a new item
a}
(a.data(), pos, a.size());
heap_update}
Find the nearest timer
next_timer_ms()
returns the timeout value of the nearest
timer. We can keep the linked list timer from the last chapter and use
the smaller timestamp.
static uint32_t next_timer_ms() {
uint64_t now_ms = get_monotonic_msec();
uint64_t next_ms = (uint64_t)-1; // maximum value
// idle timers using a linked list
if (!dlist_empty(&g_data.idle_list)) {
*conn = container_of(g_data.idle_list.next, Conn, idle_node);
Conn = conn->last_active_ms + k_idle_timeout_ms;
next_ms }
// TTL timers using a heap
if (!g_data.heap.empty() && g_data.heap[0].val < next_ms) {
= g_data.heap[0].val;
next_ms }
// timeout value
if (next_ms == (uint64_t)-1) {
return -1; // no timers, no timeouts
}
if (next_ms <= now_ms) {
return 0; // missed?
}
return (int32_t)(next_ms - now_ms);
}
Handle expirations
Simply handle the heap-based timer in
process_timers()
.
static void process_timers() {
uint64_t now_ms = get_monotonic_msec();
// idle timers using a linked list
// ...
// TTL timers using a heap
const std::vector<HeapItem> &heap = g_data.heap;
while (!heap.empty() && heap[0].val < now_ms) {
*ent = container_of(heap[0].ref, Entry, heap_idx);
Entry (&g_data.db, &ent->node, &hnode_same);
hm_delete(ent); // delete the key
entry_del}
}
The TTL should also be removed from the heap when deleting a key.
static void entry_del(Entry *ent) {
// ...
(ent, -1); // remove from the heap data structure
entry_set_ttldelete ent;
}
Limit the amount of work per loop iteration
Event-based concurrency means that we cannot stop the event loop for too long, so we should watch out for potentially high latency tasks. If many keys are set to expire at the same time, the server will be busy deleting them for a while. So we set a limit for timer processing.
const size_t k_max_works = 2000;
size_t nworks = 0;
while (!heap.empty() && heap[0].val < now_ms && nworks++ < k_max_works) {
*ent = container_of(heap[0].ref, Entry, heap_idx);
Entry (&g_data.db, &ent->node, &hnode_same);
hm_delete(ent); // delete the key
entry_del}
If the limit is exceeded, the program will enter the next event loop
iteration with a timeout value of 0, causing poll()
not to
wait. poll()
will still notify any IO readiness, so the
program can interleave IO handling with a huge number of expiring
timers.
Add TTL commands
EXPIRE
sets a TTL, TTL
returns the TTL
value, PERSIST
removes the TTL.
static void do_expire(std::vector<std::string> &cmd, Buffer &out) {
// parse args
int64_t ttl_ms = 0;
if (!str2int(cmd[2], ttl_ms)) {
return out_err(out, ERR_BAD_ARG, "expect int64");
}
// lookup the key
;
LookupKey key.key.swap(cmd[1]);
key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());
key*node = hm_lookup(&g_data.db, &key.node, &entry_eq);
HNode // set TTL
if (node) {
*ent = container_of(node, Entry, node);
Entry (ent, ttl_ms);
entry_set_ttl}
return out_int(out, node ? 1: 0);
}
Source code: