Build Your Own Redis with C/C++
build-your-own.org eBook·Paperback
⟵ prev Contents next ⟶

ℹ️ New Book: Build Your Own Database

08. Data Structure: Hashtables

Let’s replace the placeholder map from the previous chapter.

8.1 Hashtable Quick Review

Skip to the next section if you already know what a hashtable is.

Data Structures for KV

There are 2 classes of data structures for KV stores:

Sorting data structures maintains the sort order and uses comparisons to narrow down the search, so the best and average case for lookup is O(log(n)).

While hashtables rely on uniformly distributed hash values for O(1) operations.

How to choose between them? The primary concern is whether the keys are sorted. For top-level keys in Redis, there is no use case that requires ordering, so hashtables are chosen because of the O(1) operations.

However, for relational databases, sorting is a core functionality, so hash indexes are rarely used. And Redis also offers the sorted set type for sorting.

Hash Functions Map Keys to Slots

A hashtable is an array of a fixed number of slots. A key is hashed into an integer hash value, and the slot index is typically just hash(key) % size, then the key is associated with the slot.

The hash function is used to:

Open Addressing and Chaining for Resolving Collisions

Multiple keys may be hashed to the same slot. 2 ways to resolve collisions:

Finding an Empty Slot with Open Addressing

Many strategies with different probe length distributions:

Collection of Keys with Chaining

In chaining hashtables, the collection of colliding keys is often a linked list. Hence the name “chaining”. But linked lists are not the only option:

// an ad hoc chaining hashtable
std::vector<std::vector<T>> m;
m.resize(1024);
m[hash(key) % m.size()].push_back(pair);    // insert

The ad hoc chaining hashtable is handy in job interviews; it’s easier than linked lists, and there aren’t many ways to do (mess up) it, unlike open addressing.

Load Factor = Number_of_keys / Number_of_slots

Resizable Hashtables

Unlike trees, hashtables do not grow gradually. If the load factor is too high, a larger hashtable is created to replace the old one.

The size of the new hashtable grows exponentially so that the amortized cost is still O(1). This is similar to growing dynamic arrays like std::vector.

It’s common to use powers of two for the array size, because the slow modulo operator can be replaced with hash(key) & (size - 1);

Lazy Deletion With Open Addressing

With open addressing, a lookup fails on an empty slot. So erasing a slot will affect other colliding keys. This can be solved by re-inserting the affected keys.

Another solution is to simply mark the deleted slot as dead, so that lookups can continue searching while insertions can still reclaim the slot.

8.2 Considerations for Coding Hashtables

Worst Cases Matter Most

A single Redis instance can scale to hundreds of GB of memory. While most uses of Redis are not real-time applications, latency becomes a problem at this scale.

Hashtables are known for their O(1) amortized cost, but a single resize is still O(n) and can bring you down no matter how fast it is on average.

A partial solution is sharding: dividing the key space into smaller instances. This also scales throughput, but complicates things with new components.

The real solution is to progressively move keys to the new table, so that an insert does a bounded amount of work and a lookup can use both tables during resizing.

Besides resizing, collisions are also a source of worst cases. Balanced tree structures, although slower, are preferred for latency-sensitive apps.

CPU Cache Affects Performance

Many benchmarks show a big performance advantage for open addressing because they often use integer keys that fit in the slots, while a chaining hashtable uses heap-allocated linked lists.

The case for key lookup of a single probe:

The case for key lookup of multiple probes:

Open addressing is more CPU-cache-friendly, even if keys are also heap-allocated. But performance is not everything!

Chaining Hashtables Are Simple

A chaining hashtable is simply a combination of 2 collections:

Open addressing can also be simple when using simple strategies like linear or quadratic probing. But simple strategies have a poor search length distribution.

Even advanced open addressing strategies are worse than chaining in worst cases, because collisions lead to more collisions, whereas in chaining tables, colliding keys take up only 1 slot.

Most hashtable discussions are about open addressing, as there aren’t many tricks to chaining. You can skip the theories and just code a chaining table; it’s hard to mess up.

8.3 Considerations for Using Hashtables

Heap Allocations Per Key

Let’s count the number of heap allocations when inserting a key:

Chaining always seems to require 1 more allocation. However, the case of chaining with pointers to data can be avoided by embedding the list node into the key object, so that both chaining and open addressing require 1 allocation when keys are heap-allocated. Embedding the list node is called intrusive data structures, more on this later.

We’ve also differentiated the cases for integers and heap objects, which benchmarks often don’t do. So don’t make decisions based on random benchmarks.

Memory Overhead Per Key

Quick quiz: Estimate the memory usage of 1M keys of 10-byte size in Redis.

Answer: Measure it yourself. It’s higher than your guess.

In open addressing tables, the overhead comes from the slots:

In chaining tables using linked lists:

Comparison:

Shrinking a Hashtable?

While a hashtable will automatically grow, most hashtables won’t automatically shrink to save memory, because this is an example of premature optimization.

Hashtables in General-Purpose Libraries

Being general-purpose imposes some constraints. For example, std::unordered_map is guaranteed to allocate keys on the heap, so references to keys remain valid after resizing, and keys need not be movable. This is suboptimal for use cases without such requirements, because open addressing can be faster.

Also, general-purpose libraries are often not optimized for extreme cases (such as latency). If Redis were built with C++ STL, it would not be successful.

There is a myth of building quality software on top of high-quality, general-purpose libraries. But quality is not composable, you still need to learn things.

8.4 Intrusive Data Structures

Intrusive data structures are a way of making generic collections.

Generic Collections

Generic means that the code is usable for different types of data. For example, there are several ways to make a linked list node.

Each way has drawbacks:

Embedding Structures Within Data

All of the above methods are based on the idea of encapsulating data within structures, which people learned early on in programming. However, it’s not always the best idea: structures can be added (intruded) to data without encapsulation!

struct Node {
    Node *next;
};

struct MyData {
    int foo;
    Node node;  // embedded structure
    // whatever ...
};
struct MultiIndexedData {
    int bar;
    Node node1; // embedded structure
    Node node2; // another structure
};

In the example above, the structure (list node) contains no data, instead, the data includes the structure. How is this generic? Let’s count the list size.

size_t count_nodes(struct Node *node) {
    size_t cnt = 0;
    for (; node != NULL; node = node->next) {
        cnt++;
    }
    return cnt;
}

Unlike templated code, this code knows nothing about data types at all. It’s generic because the structure is independent of the data.

Using Intrusive Data Structures

With intrusive chaining hashtables, a lookup returns a list node instead of the data. The list node is part of the data, so the container (data) can be found by offsetting the list node pointer. This is commonly done with the container_of macro.

Node *my_node = some_lookup_function();
MyData *my_data = container_of(my_node, MyData, node);
//                             ^ptr     ^type   ^member

container_of is usually defined as:

#define container_of(ptr, T, member) ({                  \
    const typeof( ((T *)0)->member ) *__mptr = (ptr);    \
    (T *)( (char *)__mptr - offsetof(T, member) );})

Ignoring the type checking, it can be simplified as:

#define container_of(ptr, T, member) \
    (T *)( (char *)ptr - offsetof(T, member) )

Summary: Intrusive Data Structures

Advantages:

Drawbacks:

8.5 Code a Chaining Hashtable

Step 1: Open Addressing or Chaining?

2 choices:

A chaining hashtable is simple and hard to mess up, so that’s what I chose. It’s also used by the real Redis.

Step 2: Define List Node

// hashtable node, should be embedded into the payload
struct HNode {
    HNode *next = NULL;
    uint64_t hcode = 0; // cached hash value
};

The hash value is cached in the node, this is for …

Step 3: Create Fixed Sized Hashtables

Resizing will be progressive, so we need a fixed sized table first.

struct HTab {
    HNode **tab = NULL; // array of `HNode *`
    size_t mask = 0;    // 2^n - 1
    size_t size = 0;
};

// n must be a power of 2
static void h_init(HTab *htab, size_t n) {
    assert(n > 0 && ((n - 1) & n) == 0);
    htab->tab = (HNode **)calloc(sizeof(HNode *), n);
    htab->mask = n - 1;
    htab->size = 0;
}

We use powers of 2 for growth, so a bit mask is used instead of the slow modulo operator, since modulo by the power of 2 is the same as getting the lower bits.

Step 4: List Insertion

Just a standard linked list insertion. Note that we don’t care about the type of keys or how to allocate the node, that’s how to code intrusive data structures.

// hashtable insertion
static void h_insert(HTab *htab, HNode *node) {
    size_t pos = node->hcode & htab->mask;  // slot index
    HNode *next = htab->tab[pos];           // prepend the list
    node->next = next;
    htab->tab[pos] = node;
    htab->size++;
}

Step 5: Table Lookup

The eq callback is for checking equality. Note the fast path of comparing hash codes.

static HNode **h_lookup(HTab *htab, HNode *key, bool (*eq)(HNode *, HNode *)) {
    if (!htab->tab) {
        return NULL;
    }

    size_t pos = key->hcode & htab->mask;
    HNode **from = &htab->tab[pos];     // incoming pointer to the result
    for (HNode *cur; (cur = *from) != NULL; from = &cur->next) {
        if (cur->hcode == key->hcode && eq(cur, key)) {
            return from;
        }
    }
    return NULL;
}

The lookup is a standard linked list search. Except that it returns the address of the incoming pointer instead of the node pointer. This is for node deletions.

Step 5: Table Deletion

To delete a list node, the previous node is updated to point to the next node, except for the first node, there is no previous node! Most people will add a special case for deleting the first node:

But this is not necessary, consider that in both cases the update just sets a pointer to the next node, you don’t need to know if it is from the slot or a node, as long as you have the address (of the incoming pointer).

// remove a node from the chain
static HNode *h_detach(HTab *htab, HNode **from) {
    HNode *node = *from;
    *from = node->next;
    htab->size--;
    return node;
}

What you can learn from this: Special cases may not be special.

Step 6: Progressive Resizing

The real hashtable we’ll use contains 2 fixed sized tables for progressive resizing.

// the real hashtable interface.
struct HMap {
    HTab ht1;   // newer
    HTab ht2;   // older
    size_t resizing_pos = 0;
};

The insertion function will do this:

const size_t k_max_load_factor = 8;

void hm_insert(HMap *hmap, HNode *node) {
    if (!hmap->ht1.tab) {
        h_init(&hmap->ht1, 4);  // 1. Initialize the table if it is empty.
    }
    h_insert(&hmap->ht1, node); // 2. Insert the key into the newer table.

    if (!hmap->ht2.tab) {       // 3. Check the load factor
        size_t load_factor = hmap->ht1.size / (hmap->ht1.mask + 1);
        if (load_factor >= k_max_load_factor) {
            hm_start_resizing(hmap);    // create a larger table
        }
    }
    hm_help_resizing(hmap);     // 4. Move some keys into the newer table.
}

static void hm_start_resizing(HMap *hmap) {
    assert(hmap->ht2.tab == NULL);
    // create a bigger hashtable and swap them
    hmap->ht2 = hmap->ht1;
    h_init(&hmap->ht1, (hmap->ht1.mask + 1) * 2);
    hmap->resizing_pos = 0;
}

hm_help_resizing() moves some keys to the new table, it’s triggered from both lookups and updates.

const size_t k_resizing_work = 128; // constant work

static void hm_help_resizing(HMap *hmap) {
    size_t nwork = 0;
    while (nwork < k_resizing_work && hmap->ht2.size > 0) {
        // scan for nodes from ht2 and move them to ht1
        HNode **from = &hmap->ht2.tab[hmap->resizing_pos];
        if (!*from) {
            hmap->resizing_pos++;
            continue;
        }

        h_insert(&hmap->ht1, h_detach(&hmap->ht2, from));
        nwork++;
    }

    if (hmap->ht2.size == 0 && hmap->ht2.tab) {
        // done
        free(hmap->ht2.tab);
        hmap->ht2 = HTab{};
    }
}

The Lookup checks both tables:

HNode *hm_lookup(HMap *hmap, HNode *key, bool (*eq)(HNode *, HNode *)) {
    hm_help_resizing(hmap);
    HNode **from = h_lookup(&hmap->ht1, key, eq);
    from = from ? from : h_lookup(&hmap->ht2, key, eq);
    return from ? *from : NULL;
}

Deletion is omitted because it’s trivial.

8.6 Putting It Together

Let’s define the key object.

// the structure for the key
struct Entry {
    struct HNode node;
    std::string key;
    std::string val;
};

We now have 3 heap objects per key (Entry + strings), but this can be reduced to 1 if you embed the string data into the key object. We’ll skip this kind of effort.

// The data structure for the key space.
static struct {
    HMap db;
} g_data;

static uint32_t do_get(
    std::vector<std::string> &cmd, uint8_t *res, uint32_t *reslen)
{
    Entry key;
    key.key.swap(cmd[1]);
    key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());

    HNode *node = hm_lookup(&g_data.db, &key.node, &entry_eq);
    if (!node) {
        return RES_NX;
    }

    const std::string &val = container_of(node, Entry, node)->val;
    assert(val.size() <= k_max_msg);
    memcpy(res, val.data(), val.size());
    *reslen = (uint32_t)val.size();
    return RES_OK;
}

static bool entry_eq(HNode *lhs, HNode *rhs) {
    struct Entry *le = container_of(lhs, struct Entry, node);
    struct Entry *re = container_of(rhs, struct Entry, node);
    return le->key == re->key;
}

Since we’re using intrusive data structures, the do_set() and do_del() are where the actual key allocation and deallocation happens.

8.7 What We Learned

Source code:

( Report an Error | Ask a Question) @ build-your-own.org

See also:
codecrafters.io offers “Build Your Own X” courses in many programming languages.
Including Redis, Git, SQLite, Docker, and more.
Check it out

⟵ prev Contents next ⟶