Design and Analysis of Algorithms (B-Trees)
B-Trees
B-Trees are tree data structures that store sorted data. They can be seen as a generalization of binary search trees in which a node may hold more than one key/value and have more than two children. Like balanced BSTs, they support search, insertion, and deletion in logarithmic time.
1 Properties
A B-tree has a parameter called the minimum degree or branching factor. For the purposes of our discussion let the branching factor be B.
- For any non-leaf node, the number of children is one greater than the number of keys in that node.
- Every non-root node contains at least B − 1 keys. Consequently, all internal (non-leaf and non-root) nodes have at least B children.
- Every node contains at most 2B − 1 keys. Consequently, every node has at most 2B children.
- All the leaves are at the same depth.
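The four structural properties above can be checked mechanically. The sketch below assumes a hypothetical `Node` class holding a sorted list of integer keys and a list of children (empty for a leaf), and a hypothetical `check_btree` helper; it assumes B ≥ 2 and checks only the counts and leaf depths, not key ordering.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node layout: sorted keys plus child pointers (empty for a leaf).
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def check_btree(root: Node, B: int) -> bool:
    """Return True if the tree meets the structural B-tree properties for minimum degree B."""
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        # Every node holds at most 2B - 1 keys.
        if len(node.keys) > 2 * B - 1:
            return False
        # Every non-root node holds at least B - 1 keys.
        if not is_root and len(node.keys) < B - 1:
            return False
        if node.is_leaf:
            leaf_depths.add(depth)
            return True
        # A non-leaf node has exactly one more child than it has keys.
        if len(node.children) != len(node.keys) + 1:
            return False
        return all(visit(child, depth + 1, False) for child in node.children)

    # All leaves must sit at the same depth.
    return visit(root, 0, True) and len(leaf_depths) == 1
```

For example, with B = 2 a root holding `[3]` over leaves `[1, 2]` and `[4, 5]` passes, while a single node holding four keys fails the upper bound.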
The keys in a B-tree are ordered in a similar fashion to a BST. Consider a node x with n keys $k_1 < k_2 < \dots < k_n$, and hence n + 1 children. For ease of notation, define $k_0 = -\infty$ and $k_{n+1} = +\infty$. If a key K belongs to the i-th ($1 \le i \le n + 1$) subtree of x, then $k_{i-1} \le K \le k_i$.
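This ordering rule is exactly what search uses: inside a node, find the first key not less than K; if it equals K we are done, otherwise descend into the subtree whose key range contains K. A minimal sketch, assuming a hypothetical `Node` class with integer keys and a hypothetical `search` function; it uses binary search within each node and 0-indexed children, so Python child `i` is the (i + 1)-th subtree in the notation above.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node layout: sorted keys plus child pointers (empty for a leaf).
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def search(node: Node, key: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) of the node holding key, or None if key is absent."""
    while True:
        # First position i with keys[i] >= key, found by binary search inside the node.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if not node.children:
            # Reached a leaf without finding the key.
            return None
        # Now keys[i-1] < key < keys[i], so the key can only be in child i.
        node = node.children[i]
```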
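- Search takes $O(\log n)$ time: the tree has height $O(\log_B n)$, and a binary search within each node costs $O(\log B)$.
- Insert/Delete take $O(B \log_B n)$ time, which is $O(\log n)$ if $B = O(1)$.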
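Insertion is where the key-count bounds matter. The sketch below follows the common "split on the way down" strategy (one standard approach, not necessarily the one these notes intend): any full child (2B − 1 keys) is split around its median before we descend into it, so a split never has to propagate back up. The `BTree` class, its method names, and the restriction to integer keys are assumptions for illustration. Each split costs $O(B)$ and happens at most once per level, which is where the $O(B \log_B n)$ bound comes from.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node layout: sorted keys plus child pointers (empty for a leaf).
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


class BTree:
    def __init__(self, B: int = 2):
        assert B >= 2, "minimum degree must be at least 2"
        self.B = B
        self.root = Node()

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i] (2B - 1 keys) around its median key."""
        B = self.B
        full = parent.children[i]
        # Right half gets B - 1 keys and (if internal) B children.
        right = Node(keys=full.keys[B:], children=full.children[B:])
        median = full.keys[B - 1]
        # Left half keeps the first B - 1 keys and (if internal) B children.
        full.keys = full.keys[:B - 1]
        full.children = full.children[:B]
        # The median moves up into the parent, between the two halves.
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.B - 1:
            # Full root: put a new empty root above it and split the old root.
            # This is the only way the tree grows taller, so leaves stay level.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        node = self.root
        while node.children:
            i = bisect_left(node.keys, key)
            # Split a full child before entering it, so it has room for a promoted key.
            if len(node.children[i].keys) == 2 * self.B - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        # The leaf is guaranteed to be non-full here.
        insort(node.keys, key)

    def inorder(self) -> List[int]:
        """Return all keys in sorted order (in-order traversal)."""
        out: List[int] = []

        def walk(node: Node) -> None:
            if not node.children:
                out.extend(node.keys)
                return
            for child, k in zip(node.children, node.keys):
                walk(child)
                out.append(k)
            walk(node.children[-1])

        walk(self.root)
        return out
```

For instance, inserting 10, 20, 5, 6, 12, 30, 7, 17 into `BTree(2)` leaves the root holding `[10, 20]`, and `inorder()` returns the keys sorted.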
2 Why B-Trees
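- Caches and disks read whole blocks of data at a time, so we want the entire block to hold useful data
- Choose the parameter B so that one node fills one block (B on the order of the block size)
- Each Search, Insert, or Delete operation then needs only $O(\log_B n)$ block reads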
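To see how few block reads this means in practice, note that a B-tree of height h holds at least $2B^h - 1$ keys (a root with one key, then at least B children per internal node and B − 1 keys per non-root node). The sketch below, with hypothetical helper names, computes the largest height consistent with n keys using exact integer arithmetic, assuming one node per block and therefore one block read per level on a root-to-leaf path.

```python
def max_height(n: int, B: int) -> int:
    """Largest possible height of a B-tree with n keys and minimum degree B >= 2."""
    h = 0
    # A tree of height h + 1 needs at least 2 * B**(h + 1) - 1 keys.
    while 2 * B ** (h + 1) - 1 <= n:
        h += 1
    return h


def worst_case_block_reads(n: int, B: int) -> int:
    """Blocks read by a root-to-leaf search, assuming one node per block."""
    return max_height(n, B) + 1


print(worst_case_block_reads(10**9, 512))  # 4
```

With a billion keys and B = 512, a search touches at most 4 blocks; with B = 2 the same bound allows a height of 28, i.e. up to 29 block reads.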
B-Trees are used by most databases and filesystems:
- Databases: Sleepycat/BerkeleyDB, MySQL, SQLite
- Filesystems: MacOS HFS/HFS+, ReiserFS, Windows NTFS, Linux ext3, shmfs