05. Code a B+Tree (Part II)
5.1 High-level interfaces
We’ll add the interfaces to use the B+tree as a KV.
// insert a new key or update an existing key
func (tree *BTree) Insert(key []byte, val []byte)
// delete a key and returns whether the key was there
func (tree *BTree) Delete(key []byte) bool
Most of the details are introduced with the tree insertion, so there’s not much more to learn from the deletion. Skip this chapter if you know the principle.
Keep the root node
There is some extra work in maintaining the root node for tree insertions.
- Create the root node if the tree is empty.
- Add a new root if the root node is split.
func (tree *BTree) Insert(key []byte, val []byte) {
if tree.root == 0 {
// create the first node
:= BNode{data: make([]byte, BTREE_PAGE_SIZE)}
root .setHeader(BNODE_LEAF, 2)
root// a dummy key, this makes the tree cover the whole key space.
// thus a lookup can always find a containing node.
(root, 0, 0, nil, nil)
nodeAppendKV(root, 1, 0, key, val)
nodeAppendKV.root = tree.new(root)
treereturn
}
:= treeInsert(tree, tree.get(tree.root), key, val)
node , split := nodeSplit3(node)
nsplit.del(tree.root)
treeif nsplit > 1 {
// the root was split, add a new level.
:= BNode{data: make([]byte, BTREE_PAGE_SIZE)}
root .setHeader(BNODE_NODE, nsplit)
rootfor i, knode := range split[:nsplit] {
, key := tree.new(knode), knode.getKey(0)
ptr(root, uint16(i), ptr, key, nil)
nodeAppendKV}
.root = tree.new(root)
tree} else {
.root = tree.new(split[0])
tree}
}
Sentinel value
There is a trick when creating the first root: we inserted an empty key. This is called a sentinel value, it’s used to remove an edge case.
If you examine the lookup function nodeLookupLE
, you’ll
see that it won’t work if the key is out of the node range. This is
fixed by inserting an empty key into the tree, which is the lowest
possible key by sort order, so that nodeLookupLE
will
always find a position.
5.2 Merge nodes
Node update functions
We’ll need some new functions for the tree deletion.
// remove a key from a leaf node
func leafDelete(new BNode, old BNode, idx uint16)
// merge 2 nodes into 1
func nodeMerge(new BNode, left BNode, right BNode)
// replace 2 adjacent links with 1
func nodeReplace2Kid(
new BNode, old BNode, idx uint16, ptr uint64, key []byte,
)
Merge conditions
Deleting may result in empty nodes, which can be merged with a
sibling if it has one. shouldMerge
returns which sibling
(left or right) to merge with.
// should the updated kid be merged with a sibling?
func shouldMerge(
*BTree, node BNode,
tree uint16, updated BNode,
idx ) (int, BNode) {
if updated.nbytes() > BTREE_PAGE_SIZE/4 {
return 0, BNode{}
}
if idx > 0 {
:= tree.get(node.getPtr(idx - 1))
sibling := sibling.nbytes() + updated.nbytes() - HEADER
merged if merged <= BTREE_PAGE_SIZE {
return -1, sibling // left
}
}
if idx+1 < node.nkeys() {
:= tree.get(node.getPtr(idx + 1))
sibling := sibling.nbytes() + updated.nbytes() - HEADER
merged if merged <= BTREE_PAGE_SIZE {
return +1, sibling // right
}
}
return 0, BNode{}
}
Deleted keys mean unused space within nodes. In the worst case, a mostly empty tree can still retain a large number of nodes. We can improve this by triggering merges earlier — using 1/4 of a page as a threshold instead of the empty node, which is a soft limit on the minimum node size.
5.3 B+tree deletion
This is similar to insertion, just replace splitting with merging.
// delete a key from the tree
func treeDelete(tree *BTree, node BNode, key []byte) BNode
// delete a key from an internal node; part of the treeDelete()
func nodeDelete(tree *BTree, node BNode, idx uint16, key []byte) BNode {
// recurse into the kid
:= node.getPtr(idx)
kptr := treeDelete(tree, tree.get(kptr), key)
updated if len(updated.data) == 0 {
return BNode{} // not found
}
.del(kptr)
tree
new := BNode{data: make([]byte, BTREE_PAGE_SIZE)}
// check for merging
, sibling := shouldMerge(tree, node, idx, updated)
mergeDirswitch {
case mergeDir < 0: // left
:= BNode{data: make([]byte, BTREE_PAGE_SIZE)}
merged (merged, sibling, updated)
nodeMerge.del(node.getPtr(idx - 1))
tree(new, node, idx-1, tree.new(merged), merged.getKey(0))
nodeReplace2Kidcase mergeDir > 0: // right
:= BNode{data: make([]byte, BTREE_PAGE_SIZE)}
merged (merged, updated, sibling)
nodeMerge.del(node.getPtr(idx + 1))
tree(new, node, idx, tree.new(merged), merged.getKey(0))
nodeReplace2Kidcase mergeDir == 0 && updated.nkeys() == 0:
(node.nkeys() == 1 && idx == 0) // 1 empty child but no sibling
assertnew.setHeader(BNODE_NODE, 0) // the parent becomes empty too
case mergeDir == 0 && updated.nkeys() > 0: // no merge
(tree, new, node, idx, updated)
nodeReplaceKidN}
return new
}
Even if a node becomes empty, it may not be merged if it has no siblings. In this case, the empty node is propagated to its parent and merged later.
5.4 Test the B+tree
The data structure only interacts with the rest of the DB via the 3 page management callbacks. To test the B+tree, we can simulate pages in memory.
type C struct {
tree BTreemap[string]string // the reference data
ref map[uint64]BNode // in-memory pages
pages }
func newC() *C {
:= map[uint64]BNode{}
pages return &C{
: BTree{
tree: func(ptr uint64) BNode {
get, ok := pages[ptr]
node(ok)
assertreturn node
},
new: func(node BNode) uint64 {
(node.nbytes() <= BTREE_PAGE_SIZE)
assert:= uint64(uintptr(unsafe.Pointer(&node.data[0])))
ptr (pages[ptr].data == nil)
assert[ptr] = node
pagesreturn ptr
},
: func(ptr uint64) {
del(pages[ptr].data != nil)
assertdelete(pages, ptr)
},
},
: map[string]string{},
ref: pages,
pages}
}
C.pages
is a map of allocated pages. It’s used to
validate pointers and read pages. The pointers are actually in-memory
pointers, and the B+tree code doesn’t care.
To test the B+tree, we first need to update it under various scenarios and then verify the result. The verification is generic, there are 2 things to verify:
- The structure is valid.
- Keys are sorted.
- Node sizes are within limits.
- The data matches a reference. We used a map to capture each update.
func (c *C) add(key string, val string) {
.tree.Insert([]byte(key), []byte(val))
c.ref[key] = val // reference data
c}
The test cases are left as an exercise. The next thing is B+tree on disk.
( Report an Error | Ask a Question) @ build-your-own.org
codecrafters.io offers “Build Your Own X” courses in many programming languages.
Including Redis, Git, SQLite, Docker, and more.