08. Data Structure: Hashtables

Let’s replace the placeholder map from the previous chapter.

8.1 Hashtable Quick Review

Skip to the next section if you already know what a hashtable is.

Data Structures for KV

There are 2 classes of data structures for KV stores:

  • Sorting data structures. Some examples:
    • AVL tree (balanced binary tree).
    • Treap (randomized binary tree).
    • Trie (n-ary tree).
    • B-tree (balanced n-ary tree).
    • Log-structured merge tree (sorted arrays).
  • Hashing data structures. A.k.a. hashtables.
    • Open addressing.
    • Chaining.

Sorting data structures maintain the sort order and use comparisons to narrow down the search, so the best and average case for a lookup is O(log(n)).

Hashtables, in contrast, rely on uniformly distributed hash values for O(1) operations.

How to choose between them? The primary concern is whether the keys need to be ordered. For top-level keys in Redis, no use case requires ordering, so hashtables are chosen for their O(1) operations.

However, for relational databases, sorting is core functionality, so hash indexes are rarely used. Redis also offers the sorted set type for sorting.

Hash Functions Map Keys to Slots

A hashtable is an array with a fixed number of slots. A key is hashed into an integer hash value, the slot index is typically just hash(key) % size, and the key is associated with that slot.

The hash function is used to:

  • Turn non-integer keys into integers so that they can be used as slot indexes.
  • Ensure that slot indexes are uniformly distributed.
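As a concrete sketch, here is a possible str_hash() (an FNV-1a-style hash; the actual function used later in this chapter may differ) and how it maps a key to a slot:

#include <stddef.h>
#include <stdint.h>

// an FNV-1a-style string hash: mix each byte into a 64-bit state
static uint64_t str_hash(const uint8_t *data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ull;         // FNV offset basis
    for (size_t i = 0; i < len; i++) {
        h = (h ^ data[i]) * 0x100000001b3ull;   // FNV prime
    }
    return h;
}

// the slot index for a table of `size` slots:
// size_t pos = str_hash(key, len) % size;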

Open Addressing and Chaining for Resolving Collisions

Multiple keys may hash to the same slot. There are 2 ways to resolve collisions:

  • Open addressing:
    • Keys are stored in slots.
    • Find another slot if it’s already occupied.
  • Chaining: Each slot points to a collection of colliding keys.

Finding an Empty Slot with Open Addressing

There are many strategies, each with a different probe length distribution:

  • Linear probing: Keep probing the next neighboring slot.
  • Quadratic probing: The step to the next probe is increased quadratically.
  • Double hashing: The step is computed by another hash function.
  • Many more …
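A minimal sketch of the simplest strategy, assuming a power-of-2 table of string keys where an empty string marks a free slot:

#include <cstdint>
#include <string>
#include <vector>

// linear probing: return the slot holding `key`, or the free slot where
// it would be inserted (assumes the table is never completely full)
static size_t lp_probe(
    const std::vector<std::string> &tab, const std::string &key, uint64_t hcode)
{
    size_t mask = tab.size() - 1;       // table size is a power of 2
    size_t pos = hcode & mask;
    while (!tab[pos].empty() && tab[pos] != key) {
        pos = (pos + 1) & mask;         // probe the next neighboring slot
    }
    return pos;
}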

Collection of Keys with Chaining

In chaining hashtables, the collection of colliding keys is often a linked list. Hence the name “chaining”. But linked lists are not the only option:

// an ad hoc chaining hashtable: an array of arrays
typedef std::pair<std::string, std::string> KV;
std::vector<std::vector<KV>> m;
m.resize(1024);
size_t pos = std::hash<std::string>{}(key) % m.size();  // slot index
m[pos].push_back(KV{key, val});                         // insert

The ad hoc chaining hashtable is handy in job interviews; it’s easier than linked lists, and there aren’t many ways to mess it up, unlike open addressing.
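A lookup is just a scan of the one bucket the key hashes to:

// lookup: scan the bucket that `key` hashes to
for (const KV &p : m[std::hash<std::string>{}(key) % m.size()]) {
    if (p.first == key) {
        // found: the value is p.second
    }
}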

Load Factor = Number_of_keys / Number_of_slots

  • Open addressing:
    • The load factor is limited below 1.
    • Performance deteriorates rapidly near the limit.
  • Chaining:
    • No hard limit on load factor.
    • Average search length scales linearly with load factor.

Resizable Hashtables

Unlike trees, hashtables do not grow gradually. If the load factor is too high, a larger hashtable is created to replace the old one.

The size of the new hashtable grows exponentially so that the amortized cost is still O(1). This is similar to growing dynamic arrays like std::vector.

It’s common to use powers of 2 for the array size, because the slow modulo operation can be replaced with hash(key) & (size - 1).
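For example, with 64 slots:

// 64 is 2^6, so (64 - 1) is 0b111111: a mask keeping the low 6 bits
size_t pos = hash(key) & (64 - 1);  // same result as hash(key) % 64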

Lazy Deletion With Open Addressing

With open addressing, a lookup stops at an empty slot, so erasing a slot can break the probe path of other colliding keys. This can be solved by re-inserting the affected keys.

Another solution is to simply mark the deleted slot as dead, so that lookups can continue searching while insertions can still reclaim the slot.
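A minimal sketch of the second approach, assuming a slot layout with a tombstone flag (slot layouts vary between implementations):

struct Slot {
    std::string key;
    bool used = false;  // the slot holds a live key
    bool dead = false;  // tombstone: a key was deleted here
};

// lookup: keep probing past dead slots; stop only at a truly empty slot.
// insert: the first dead slot found along the probe path can be reclaimed.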

8.2 Considerations for Coding Hashtables

Worst Cases Matter Most

A single Redis instance can scale to hundreds of GB of memory. While most uses of Redis are not real-time applications, latency becomes a problem at this scale.

Hashtables are known for their O(1) amortized cost, but a single resize is still O(n) and can bring you down no matter how fast it is on average.

A partial solution is sharding: dividing the key space into smaller instances. This also scales throughput, but complicates things with new components.

The real solution is to progressively move keys to the new table, so that an insert does a bounded amount of work and a lookup can use both tables during resizing.

Besides resizing, collisions are also a source of worst cases. Balanced tree structures, although slower, are preferred for latency-sensitive apps.

CPU Cache Affects Performance

Many benchmarks show a big performance advantage for open addressing because they often use integer keys that fit in the slots, while a chaining hashtable uses heap-allocated linked lists.

For a key lookup that needs a single probe:

  • Open addressing: 1 memory read from the slot.
  • Chaining: 2 memory reads. The slot and the heap object.

For a key lookup that needs multiple probes:

  • Open addressing: Depending on the strategy, the key may be found near the initial slot.
  • Chaining: Following multiple list nodes.

Open addressing is more CPU-cache-friendly, even if keys are also heap-allocated. But performance is not everything!

Chaining Hashtables Are Simple

A chaining hashtable is simply a combination of 2 collections: an outer array of slots, plus an inner collection of colliding keys per slot. Common combinations:

  • Array of linked lists. Most commonly used.
  • Array of arrays. More CPU cache friendly.
  • Array of trees. Defense against worst cases.

Open addressing can also be simple when using simple strategies like linear or quadratic probing. But simple strategies have a poor search length distribution.

Even advanced open addressing strategies are worse than chaining in worst cases, because collisions lead to more collisions, whereas in chaining tables, colliding keys take up only 1 slot.

Most hashtable discussions are about open addressing, as there aren’t many tricks to chaining. You can skip the theories and just code a chaining table; it’s hard to mess up.

8.3 Considerations for Using Hashtables

Heap Allocations Per Key

Let’s count the number of heap allocations when inserting a key:

  • Open addressing with data stored in slots (such as integers): 0 allocations.
  • Open addressing with pointers to data (such as strings): 1 allocation.
  • Chaining with data stored in the list node: 1 allocation for the list node.
  • Chaining with pointers to data: 2 allocations.

Chaining always seems to require 1 more allocation. However, the case of chaining with pointers to data can be avoided by embedding the list node into the key object, so that both chaining and open addressing require 1 allocation when keys are heap-allocated. Embedding the list node is the idea behind intrusive data structures; more on this later.

We’ve also differentiated the cases for integers and heap objects, which benchmarks often don’t do. So don’t make decisions based on random benchmarks.

Memory Overhead Per Key

Quick quiz: Estimate the memory usage of 1M keys of 10-byte size in Redis.

Answer: Measure it yourself. It’s higher than your guess.

In open addressing tables, the overhead comes from the slots:

  • The slot contains auxiliary data such as the deletion bit.
  • The slot contains the key. Empty slots are proportional to used slots.

In chaining tables using linked lists:

  • Each slot is a pointer of constant size.
  • Each key needs a list node of constant size.

Comparison:

  • Open addressing with fixed-size keys: Proportional overhead.
  • Open addressing with pointers to keys: Fixed overhead per key.
  • Chaining using linked lists: Fixed overhead per key.
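As a rough back-of-the-envelope for the chaining case, assuming 64-bit pointers and the 16-byte list node defined later in this chapter:

    per-key overhead = list node + amortized slot pointer
                     = (8 B next + 8 B cached hash) + 8 B / load_factor
                     ≈ 17–18 B at a load factor of 4–8, before allocator overhead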

Shrinking a Hashtable?

While a hashtable grows automatically, most hashtables won’t automatically shrink to save memory, because automatic shrinking is often a premature optimization:

  • Shrinking is useless if the usage pattern is periodic.
  • The saved memory may not be returned to the OS due to fragmentation.

Hashtables in General-Purpose Libraries

Being general-purpose imposes some constraints. For example, std::unordered_map is guaranteed to allocate keys on the heap, so references to keys remain valid after resizing, and keys need not be movable. This is suboptimal for use cases without such requirements, because open addressing can be faster.

Also, general-purpose libraries are often not optimized for extreme cases (such as latency). If the real Redis were built with run-of-the-mill libraries such as the C++ STL, it would not be successful, because the hashtable resizing latency alone would kill many use cases.

There is a myth that you can build quality software on top of high-quality, general-purpose libraries. But the reality is that:

  • They do not exist beyond the simplest cases.
  • Even if they do exist, you may lack the knowledge to wield them.

So you still need to learn things to level up as a programmer.

8.4 Intrusive Data Structures

Intrusive data structures are a way of making generic collections.

Generic Collections

Generic means that the code is usable for different types of data. For example, there are several ways to make a linked list node.

  • C++ templates:

    template <class T>
    struct Node {
        T data;
        struct Node *next;
    };
  • void * in C:

    struct Node {
        void *data; // points to anything
        struct Node *next;
    };
  • Macro hacks in C:

    #define DEFINE_NODE(T) struct Node_ ## T { \
        T data; \
        struct Node_ ## T *next; \
    }
  • Fixed-size byte array (not generic at all):

    struct Node {
        char data[256];     // should be enough?
        struct Node *next;
    };
  • Code generators. Common in Go.

Each way has drawbacks:

  • Templates:
    • Header only. Slow compilation and bloated code.
    • Messy compiler error messages.
    • Not available in pure C.
  • void *:
    • Requires 1 extra heap allocation for the data (probably).
    • Tedious type casting.
  • Macro hacks:
    • Header only.
    • Messier than C++ templates.
    • Tedious to use.
  • Fixed-size byte array:
    • Not generic. Arbitrary limits.
    • Wastes space.
  • Code generators:
    • Extra build steps.

Embedding Structures Within Data

All of the above methods are based on the idea of encapsulating data within structures, which people learn early on in programming. However, it’s not always the best idea: structures can be added (intruded) into data without encapsulation!

struct Node {
    Node *next;
};

struct MyData {
    int foo;
    Node node;  // embedded structure
    // whatever ...
};
struct MultiIndexedData {
    int bar;
    Node node1; // embedded structure
    Node node2; // another structure
};

In the example above, the structure (list node) contains no data; instead, the data includes the structure. How is this generic? See this example:

size_t count_nodes(struct Node *node) {
    size_t cnt = 0;
    for (; node != NULL; node = node->next) {
        cnt++;
    }
    return cnt;
}

Unlike templated code, this code knows nothing about data types at all. The structure is independent of the data. But how do we use the data?

Using Intrusive Data Structures

With intrusive chaining hashtables, a lookup returns a list node instead of the data. The list node is part of the data, so the container (data) can be found by offsetting the list node pointer. This is commonly done with the container_of macro.

Node *my_node = some_lookup_function();
MyData *my_data = container_of(my_node, MyData, node);
//                             ^ptr     ^type   ^member

container_of is usually defined as:

#define container_of(ptr, T, member) ({                  \
    const typeof( ((T *)0)->member ) *__mptr = (ptr);    \
    (T *)( (char *)__mptr - offsetof(T, member) );})

  • The 1st line checks if ptr matches T::member with the GCC extension typeof.
  • The 2nd line offsets ptr from T::member to T.

Ignoring the type checking, it can be simplified as:

#define container_of(ptr, T, member) \
    (T *)( (char *)ptr - offsetof(T, member) )

Summary: Intrusive Data Structures

Advantages:

  • Generic, but not aware of data types.
    • Doesn’t require the code to be in the header.
    • Viable in plain C.
    • Can have different data types in the same collection.
  • No internal memory management.
    • Borrows space from data. No new allocations for heap-allocated data.
    • Flexible. Data allocations are entirely up to the user.
  • No assumptions about data ownership.
    • Share data between multiple collections without any indirection.

Drawbacks:

  • Needs type casting.
  • Not as easy to use as templated collections.

8.5 Code a Chaining Hashtable

Step 1: Open Addressing or Chaining?

2 choices:

  • Open addressing with pointers to keys.
  • Intrusive chaining hashtable.

A chaining hashtable is simple and hard to mess up, so that’s what I chose. It’s also used by the real Redis.

Step 2: Define List Node

// hashtable node, should be embedded into the payload
struct HNode {
    HNode *next = NULL;
    uint64_t hcode = 0; // cached hash value
};

The hash value is cached in the node for 2 reasons:

  • Reusing the hash value when resizing.
  • Fast path for checking equality.

Step 3: Create Fixed-Size Hashtables

Resizing will be progressive, so we need a fixed-size table first.

struct HTab {
    HNode **tab = NULL; // array of `HNode *`
    size_t mask = 0;    // 2^n - 1
    size_t size = 0;
};

// n must be a power of 2
static void h_init(HTab *htab, size_t n) {
    assert(n > 0 && ((n - 1) & n) == 0);
    htab->tab = (HNode **)calloc(n, sizeof(HNode *));
    htab->mask = n - 1;
    htab->size = 0;
}

We use powers of 2 for growth, so a bit mask is used instead of the slow modulo operator, since modulo by a power of 2 is the same as taking the lower bits.

Step 4: List Insertion

Just a standard linked list insertion. Note that we don’t care about the type of the keys or how the node is allocated; that’s the nature of intrusive data structures.

// hashtable insertion
static void h_insert(HTab *htab, HNode *node) {
    size_t pos = node->hcode & htab->mask;  // slot index
    HNode *next = htab->tab[pos];           // prepend the list
    node->next = next;
    htab->tab[pos] = node;
    htab->size++;
}

Step 5: Table Lookup

The eq callback is for checking equality. Note the fast path of comparing hash codes.

static HNode **h_lookup(HTab *htab, HNode *key, bool (*eq)(HNode *, HNode *)) {
    if (!htab->tab) {
        return NULL;
    }

    size_t pos = key->hcode & htab->mask;
    HNode **from = &htab->tab[pos];     // incoming pointer to the result
    for (HNode *cur; (cur = *from) != NULL; from = &cur->next) {
        if (cur->hcode == key->hcode && eq(cur, key)) {
            return from;
        }
    }
    return NULL;
}

The lookup is a standard linked list search, except that it returns the address of the incoming pointer instead of the node pointer. This will be used for node deletion.

Step 6: Table Deletion

To delete a list node, the previous node is updated to point to the next node. But for the first node, there is no previous node! Most people add a special case for deleting the first node:

  • Update the slot when deleting the first node.
  • Otherwise, update the previous node.

But this is not necessary. Consider that in both cases, the update just sets a pointer to the next node; you don’t need to know whether it is in the slot or in a node, as long as you have the address of the incoming pointer.

// remove a node from the chain
static HNode *h_detach(HTab *htab, HNode **from) {
    HNode *node = *from;
    *from = node->next;
    htab->size--;
    return node;
}

What you can learn from this: Special cases may not be special.

  • No special case for deleting list nodes.
  • No need for a separate function to search and delete.

Step 7: Progressive Resizing

The real hashtable we’ll use contains 2 fixed-size tables for progressive resizing.

// the real hashtable interface.
struct HMap {
    HTab ht1;   // newer
    HTab ht2;   // older
    size_t resizing_pos = 0;
};

The insertion function will do this:

const size_t k_max_load_factor = 8;

// helpers defined below
static void hm_start_resizing(HMap *hmap);
static void hm_help_resizing(HMap *hmap);

void hm_insert(HMap *hmap, HNode *node) {
    if (!hmap->ht1.tab) {
        h_init(&hmap->ht1, 4);  // 1. Initialize the table if it is empty.
    }
    h_insert(&hmap->ht1, node); // 2. Insert the key into the newer table.

    if (!hmap->ht2.tab) {       // 3. Check the load factor
        size_t load_factor = hmap->ht1.size / (hmap->ht1.mask + 1);
        if (load_factor >= k_max_load_factor) {
            hm_start_resizing(hmap);    // create a larger table
        }
    }
    hm_help_resizing(hmap);     // 4. Move some keys into the newer table.
}

static void hm_start_resizing(HMap *hmap) {
    assert(hmap->ht2.tab == NULL);
    // create a bigger hashtable and swap them
    hmap->ht2 = hmap->ht1;
    h_init(&hmap->ht1, (hmap->ht1.mask + 1) * 2);
    hmap->resizing_pos = 0;
}

hm_help_resizing() moves some keys to the newer table; it’s triggered by both lookups and updates.

const size_t k_resizing_work = 128; // constant work

static void hm_help_resizing(HMap *hmap) {
    size_t nwork = 0;
    while (nwork < k_resizing_work && hmap->ht2.size > 0) {
        // scan for nodes from ht2 and move them to ht1
        HNode **from = &hmap->ht2.tab[hmap->resizing_pos];
        if (!*from) {
            hmap->resizing_pos++;
            continue;
        }

        h_insert(&hmap->ht1, h_detach(&hmap->ht2, from));
        nwork++;
    }

    if (hmap->ht2.size == 0 && hmap->ht2.tab) {
        // done
        free(hmap->ht2.tab);
        hmap->ht2 = HTab{};
    }
}

The lookup checks both tables:

HNode *hm_lookup(HMap *hmap, HNode *key, bool (*eq)(HNode *, HNode *)) {
    hm_help_resizing(hmap);
    HNode **from = h_lookup(&hmap->ht1, key, eq);
    from = from ? from : h_lookup(&hmap->ht2, key, eq);
    return from ? *from : NULL;
}

Deletion is just as simple: h_lookup() finds the address of the incoming pointer, and h_detach() removes the node from either table.
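A minimal sketch, using a hypothetical helper named hm_pop (the name is my own; the real code may differ):

// look up a key in both tables and detach the matching node, if any
HNode *hm_pop(HMap *hmap, HNode *key, bool (*eq)(HNode *, HNode *)) {
    hm_help_resizing(hmap);
    if (HNode **from = h_lookup(&hmap->ht1, key, eq)) {
        return h_detach(&hmap->ht1, from);
    }
    if (HNode **from = h_lookup(&hmap->ht2, key, eq)) {
        return h_detach(&hmap->ht2, from);
    }
    return NULL;
}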

8.6 Putting It Together

Let’s define the key object.

// the structure for the key
struct Entry {
    struct HNode node;
    std::string key;
    std::string val;
};

We now have 3 heap objects per key (Entry + strings), but this can be reduced to 1 if you embed the string data into the key object. We’ll skip this kind of effort.

// The data structure for the key space.
static struct {
    HMap db;
} g_data;

static bool entry_eq(HNode *lhs, HNode *rhs);   // defined below

static uint32_t do_get(
    std::vector<std::string> &cmd, uint8_t *res, uint32_t *reslen)
{
    Entry key;
    key.key.swap(cmd[1]);
    key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());

    HNode *node = hm_lookup(&g_data.db, &key.node, &entry_eq);
    if (!node) {
        return RES_NX;
    }

    const std::string &val = container_of(node, Entry, node)->val;
    assert(val.size() <= k_max_msg);
    memcpy(res, val.data(), val.size());
    *reslen = (uint32_t)val.size();
    return RES_OK;
}

static bool entry_eq(HNode *lhs, HNode *rhs) {
    struct Entry *le = container_of(lhs, struct Entry, node);
    struct Entry *re = container_of(rhs, struct Entry, node);
    return le->key == re->key;
}

Since we’re using intrusive data structures, do_set() and do_del() are where the actual key allocation and deallocation happen.
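As an illustration, here is a hedged sketch of do_set(), reusing str_hash() and entry_eq() from do_get() (the RES_* codes follow the previous chapters; error handling is omitted):

static uint32_t do_set(
    std::vector<std::string> &cmd, uint8_t *res, uint32_t *reslen)
{
    (void)res;
    (void)reslen;

    Entry key;
    key.key.swap(cmd[1]);
    key.node.hcode = str_hash((uint8_t *)key.key.data(), key.key.size());

    HNode *node = hm_lookup(&g_data.db, &key.node, &entry_eq);
    if (node) {
        // the key exists: update the value in place
        container_of(node, Entry, node)->val.swap(cmd[2]);
    } else {
        // a new key: this is where the Entry is heap-allocated
        Entry *ent = new Entry();
        ent->key.swap(key.key);
        ent->node.hcode = key.node.hcode;
        ent->val.swap(cmd[2]);
        hm_insert(&g_data.db, &ent->node);
    }
    return RES_OK;
}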

8.7 What We Learned

  • 2 classes of hashtables: open addressing and chaining.
  • Intrusive data structures: generic and flexible.
  • Consider latency and worst cases when handling lots of data.
  • Avoid special cases via pointer indirection.

Source code: