05. B+Tree Deletion and Testing

5.1 High-level interfaces

We’ll add the interfaces to use the B+tree as a KV.

// insert a new key or update an existing key
func (tree *BTree) Insert(key []byte, val []byte) error
// delete a key and returns whether the key was there
func (tree *BTree) Delete(key []byte) (bool, error)

Most of the details are introduced with the tree insertion, so there’s not much more to learn from the deletion. Skip this chapter if you know the principle.

Handle the root node

There is some extra work in maintaining the root node for tree insertions.

  • Create the root node if the tree is empty.
  • Add a new root if the root node is split.
func (tree *BTree) Insert(key []byte, val []byte) error {
    // 1. check the length limit imposed by the node format
    if err := checkLimit(key, val); err != nil {
        return err // the only way for an update to fail
    }
    // 2. create the first node
    if tree.root == 0 {
        root := BNode(make([]byte, BTREE_PAGE_SIZE))
        // later...
        tree.root = tree.new(root)
        return nil
    }
    // 3. insert the key
    node := treeInsert(tree, tree.get(tree.root), key, val)
    // 4. grow the tree if the root is split
    nsplit, split := nodeSplit3(node)
    tree.del(tree.root)
    if nsplit > 1 {
        root := BNode(make([]byte, BTREE_PAGE_SIZE))
        // later ...
        tree.root = tree.new(root)
    } else {
        tree.root = tree.new(split[0])
    }
    return nil
}

Sentinel value

There is a trick when creating the first root: we inserted an empty key. This is called a sentinel value, it’s used to remove an edge case.

    if tree.root == 0 {
        root := BNode(make([]byte, BTREE_PAGE_SIZE))
        root.setHeader(BNODE_LEAF, 2)
        // a dummy key, this makes the tree cover the whole key space.
        // thus a lookup can always find a containing node.
        nodeAppendKV(root, 0, 0, nil, nil)
        nodeAppendKV(root, 1, 0, key, val)
        tree.root = tree.new(root)
        return nil
    }

Examine the lookup function nodeLookupLE, you’ll see that it won’t work if the key is out of the node’s range. This is fixed by inserting an empty key, which is the lowest possible key by sort order, so that a lookup will always find a position.

Grow the tree

The new root points to the split nodes of the old root. This won’t happen often for the tree height is O(log N).

    if nsplit > 1 { // the root was split, add a new level.
        root := BNode(make([]byte, BTREE_PAGE_SIZE))
        root.setHeader(BNODE_NODE, nsplit)
        for i, knode := range split[:nsplit] {
            ptr, key := tree.new(knode), knode.getKey(0)
            nodeAppendKV(root, uint16(i), ptr, key, nil)
        }
    }

5.2 Merge nodes

Node update functions

Since we’re copy-on-write, updating a node means copying data to a new node.

// remove a key from a leaf node
func leafDelete(new BNode, old BNode, idx uint16)
// merge 2 nodes into 1
func nodeMerge(new BNode, left BNode, right BNode)
// replace 2 adjacent links with 1
func nodeReplace2Kid(new BNode, old BNode, idx uint16, ptr uint64, key []byte)

Merge conditions

Deleting may result in empty nodes, which can be merged with a sibling if it has one. shouldMerge returns which sibling (left or right) to merge with.

// should the updated kid be merged with a sibling?
func shouldMerge(
    tree *BTree, node BNode, idx uint16, updated BNode,
) (int, BNode) {
    if updated.nbytes() > BTREE_PAGE_SIZE/4 {
        return 0, BNode{}
    }
    if idx > 0 {
        sibling := BNode(tree.get(node.getPtr(idx - 1)))
        merged := sibling.nbytes() + updated.nbytes() - HEADER
        if merged <= BTREE_PAGE_SIZE {
            return -1, sibling // left
        }
    }
    if idx+1 < node.nkeys() {
        sibling := BNode(tree.get(node.getPtr(idx + 1)))
        merged := sibling.nbytes() + updated.nbytes() - HEADER
        if merged <= BTREE_PAGE_SIZE {
            return +1, sibling // right
        }
    }
    return 0, BNode{}
}

Deleted keys mean unused space within nodes. In the worst case, a mostly empty tree can still retain a large number of nodes. We can improve this by triggering merges earlier — using 1/4 of a page as a threshold instead of the empty node, which is an arbitrary soft limit on the minimum node size.

5.3 B+tree deletion

This is similar to insertion, just replace splitting with merging.

// delete a key from the tree
func treeDelete(tree *BTree, node BNode, key []byte) BNode

// delete a key from an internal node; part of the treeDelete()
func nodeDelete(tree *BTree, node BNode, idx uint16, key []byte) BNode {
    // recurse into the kid
    kptr := node.getPtr(idx)
    updated := treeDelete(tree, tree.get(kptr), key)
    if len(updated) == 0 {
        return BNode{} // not found
    }
    tree.del(kptr)
    // check for merging
    new := BNode(make([]byte, BTREE_PAGE_SIZE))
    mergeDir, sibling := shouldMerge(tree, node, idx, updated)
    switch {
    case mergeDir < 0: // left
        merged := BNode(make([]byte, BTREE_PAGE_SIZE))
        nodeMerge(merged, sibling, updated)
        tree.del(node.getPtr(idx - 1))
        nodeReplace2Kid(new, node, idx-1, tree.new(merged), merged.getKey(0))
    case mergeDir > 0: // right
        merged := BNode(make([]byte, BTREE_PAGE_SIZE))
        nodeMerge(merged, updated, sibling)
        tree.del(node.getPtr(idx + 1))
        nodeReplace2Kid(new, node, idx, tree.new(merged), merged.getKey(0))
    case mergeDir == 0 && updated.nkeys() == 0:
        assert(node.nkeys() == 1 && idx == 0) // 1 empty child but no sibling
        new.setHeader(BNODE_NODE, 0)          // the parent becomes empty too
    case mergeDir == 0 && updated.nkeys() > 0: // no merge
        nodeReplaceKidN(tree, new, node, idx, updated)
    }
    return new
}

Even if a node becomes empty, it may not be merged if it has no siblings. In this case, the empty node is propagated to its parent and merged later.

5.4 Test the B+tree

The data structure only interacts with the rest of the DB via the 3 page management callbacks. To test the B+tree, we can simulate pages in memory.

type C struct {
    tree  BTree
    ref   map[string]string // the reference data
    pages map[uint64]BNode  // in-memory pages
}

func newC() *C {
    pages := map[uint64]BNode{}
    return &C{
        tree: BTree{
            get: func(ptr uint64) []byte {
                node, ok := pages[ptr]
                assert(ok)
                return node
            },
            new: func(node []byte) uint64 {
                assert(BNode(node).nbytes() <= BTREE_PAGE_SIZE)
                ptr := uint64(uintptr(unsafe.Pointer(&node[0])))
                assert(pages[ptr] == nil)
                pages[ptr] = node
                return ptr
            },
            del: func(ptr uint64) {
                assert(pages[ptr] != nil)
                delete(pages, ptr)
            },
        },
        ref:   map[string]string{},
        pages: pages,
    }
}

C.pages is a map of allocated pages. It’s used to validate pointers and read pages. The pointers are actually in-memory pointers, and the B+tree code doesn’t care.

To test the B+tree, we first need to update it under various scenarios and then verify the result. The verification is generic, there are 2 things to verify:

  1. The structure is valid.
    • Keys are sorted.
    • Node sizes are within limits.
  2. The data matches a reference. We used a map to capture each update.
func (c *C) add(key string, val string) {
    c.tree.Insert([]byte(key), []byte(val))
    c.ref[key] = val // reference data
}

The test cases are left as an exercise. The next thing is B+tree on disk.