B-Trees and B+ Trees: Database Index Structures

Understand B-trees and B+ trees for disk-based storage, database indexing, and efficient range queries with O(log n) operations.

published: April 7, 2026 reading time: 16 min read author: GeekWorkBench updated: May 17, 2026

Quick Summary

B-trees solve the disk I/O problem binary trees face by storing hundreds of keys per node, reducing tree height from ~30 to ~3 for billion-record datasets. B+ trees go further by storing all records in linked leaf nodes, with internal nodes only holding routing keys—enabling efficient sequential access for range queries. MySQL's InnoDB, PostgreSQL, Oracle, and SQL Server all use B+ trees as their primary index structure. Understanding page size, fill factor, and the difference between clustered and non-clustered indexes helps optimize real-world database performance.

B-Trees and B+ Trees: Database Index Structures

B-trees are the workhorse of database indexes and file systems. While balanced binary trees give O(log n) search, they perform terribly on disk—each node access might require a disk read. B-trees solve this by storing hundreds of keys per node, matching disk block sizes and reducing tree height from ~30 (for billion records) to ~3.

The B+ tree variant that’s even more common in databases adds a crucial optimization: all records are stored in leaf nodes, while internal nodes only store keys. The leaf nodes are linked sequentially, enabling efficient range queries by traversing the linked list rather than revisiting parent nodes. MySQL’s InnoDB, PostgreSQL, and most RDBMSes use B+ trees as their primary index structure.

Introduction

B-trees are the workhorse of database indexes and file systems, yet their design makes perfect sense once you understand the problem they solve: disk I/O is expensive (milliseconds versus nanoseconds for memory), and balanced binary trees with two children per node require too many disk reads to be practical for large datasets. A B-tree node typically holds hundreds of key-pointer pairs, matching disk block sizes so that each node access costs at most one disk read. With t=100 (200 keys per node), a billion records requires only ~3 levels instead of ~30 for a binary tree.

B+ trees are the dominant variant in relational databases. They optimize B-trees for read-heavy workloads by storing all data only in leaf nodes, while internal nodes store only keys for routing. The leaf nodes are linked sequentially, making range queries efficient by traversing the linked list rather than bouncing back to parent nodes repeatedly. MySQL’s InnoDB, PostgreSQL, Oracle, and SQL Server all use B+ trees as their primary index structure.

This guide covers B-tree and B+ tree structure and operations, node split and merge mechanics, and the critical differences that make B+ trees preferable for most database workloads. You’ll understand why page size and fill factor matter in practice, how clustered versus non-clustered indexes affect performance, and why sequential ID generation (auto-increment) causes less index bloat than random UUIDs.

When to Use

B-trees and B+ trees apply when:

Database indexing — Primary access path for most relational databases
Filesystem indexing — NTFS, ext4, and HFS+ use B+ tree variants
Large datasets on disk — Minimizing disk I/O matters more than memory efficiency
Range queries — B+ tree linked leaves enable sequential access
Ordered iteration — Keys can be retrieved in sorted order efficiently

When Not to Use

In-memory databases (red-black trees or skip lists may be faster due to cache locality)
Write-heavy workloads (B-tree rebalancing involves page splits that can be expensive)
Key-value stores only needing point queries (hash indexes are O(1))

Architecture: Multi-Way Balanced Trees

graph TD
    subgraph Internal["B+ Tree Internal Node"]
        I1["[20 | 40 | 60]"]
    end

    subgraph Leaf["B+ Tree Leaf Nodes (linked)"]
        L1["[10-15] -> [20-30] -> [40-50] -> [60-70]"]
    end

    I1 --> L1

Internal nodes store keys for routing; leaf nodes store actual data pointers and are linked for range access.

Implementation

B-Tree Node

class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t  # Minimum degree (defines order)
        self.leaf = leaf
        self.keys = []  # Key values
        self.children = []  # Child pointers
        self.n = 0  # Current number of keys


class BTree:
    """
    B-tree implementation for educational purposes.
    In practice, use database-provided implementations.
    """

    def __init__(self, t):
        self.t = t  # Minimum degree (order = 2t)
        self.root = BTreeNode(t, leaf=True)

    def search(self, k, node=None):
        """Search for key k, return node if found."""
        if node is None:
            node = self.root

        i = 0
        while i < node.n and k > node.keys[i]:
            i += 1

        if i < node.n and k == node.keys[i]:
            return node, i
        if node.leaf:
            return None
        return self.search(k, node.children[i])

    def insert(self, k):
        """Insert key k into B-tree."""
        root = self.root
        if len(root.keys) == 2 * self.t - 1:
            new_root = BTreeNode(self.t, leaf=False)
            new_root.children.append(root)
            self._split_child(new_root, 0, root)
            self.root = new_root
        self._insert_nonfull(self.root, k)

    def _split_child(self, parent, i, child):
        """Split a full child node."""
        t = self.t
        new_child = BTreeNode(t, leaf=child.leaf)
        new_child.n = t - 1

        # Copy keys to new child
        for j in range(t - 1):
            new_child.keys.append(child.keys[j + t])
        if not child.leaf:
            for j in range(t):
                new_child.children.append(child.children[j + t])

        child.n = t - 1
        child.keys = child.keys[:t - 1]
        child.children = child.children[:t]

        # Insert median key into parent
        parent.keys.insert(i, child.keys.pop(t - 1))
        parent.children.insert(i + 1, new_child)

    def _insert_nonfull(self, node, k):
        """Insert into non-full node."""
        i = len(node.keys) - 1
        if node.leaf:
            node.keys.append(None)
            while i >= 0 and k < node.keys[i]:
                node.keys[i + 1] = node.keys[i]
                i -= 1
            node.keys[i + 1] = k
        else:
            while i >= 0 and k < node.keys[i]:
                i -= 1
            i += 1
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i, node.children[i])
                if k > node.keys[i]:
                    i += 1
            self._insert_nonfull(node.children[i], k)

Common Pitfalls / Anti-Patterns

Confusing t (minimum degree) with order — B-tree of order m has max m children; t = ceil(m/2)
Not handling root split — Special case when root is full and needs to become internal
B+ tree leaf vs internal keys — Internal nodes store separator keys for routing only
Disk block alignment — Real implementations align to filesystem blocks (4KB typically)

Quick Recap Checklist

B-trees store up to 2t-1 keys per node, t-1 to 2t-1 children
All leaves at same depth (balanced)
B+ tree: only leaves store data, internal nodes store routing keys only
B+ tree leaves are linked for efficient range scans
Used by virtually all relational databases for index access

Trade-off Analysis

Aspect	B-Tree	B+ Tree	Notes
Search depth	O(log n) to leaf	O(log n) to leaf	Same asymptotic
Range queries	Must revisit parent	Sequential leaf links	B+ tree wins
Insert/Split	Split may affect all levels	Same	Similar complexity
Storage efficiency	All nodes store data	Only leaves store data	B+ tree wins
Pointer overhead	Keys + data in all nodes	Internal nodes only have keys	B+ tree wins
Point query speed	Can find at leaf earlier	Always to leaf	B-tree may be faster
Leaf scan speed	Non-sequential	Sequential via linked list	B+ tree wins

When to prefer B-tree: Write-heavy workloads, point queries where early termination helps, embedded databases with limited RAM.

When to prefer B+ tree: Read-heavy workloads with range scans, database indexes, filesystem metadata.

Real-world Failure Scenarios

PostgreSQL index bloat: Frequent updates leave empty space in pages, causing index size inflation. Periodic REINDEX or VACUUM needed.
MySQL InnoDB page splits: Random inserts cause many page splits at intermediate levels, degrading insert performance. Bulk loading or sorted inserts perform better.
SQLite B-tree depth growth: Small cache + large dataset = cache misses. With default page size, deep trees cause 10-20 I/Os per query on cold cache.
MongoDB index selectivity: Low-selectivity indexes (e.g., boolean field) cause full index scans that are slower than collection scans.
Hot spot nodes in distributed B+ trees: Cassandra/ScyllaDB partition splits create hot nodes. Partition strategy matters more than tree algorithm.

Interview Questions

1. Why are B-trees better for disk-based storage than binary trees?

Disk I/O is expensive (milliseconds vs nanoseconds for memory). A binary tree with 1 billion records has height ~30—worst case 30 disk reads. A B-tree with t=100 (200 keys per node) has height only 3—maximum 3 disk reads. B-trees maximize fanout to minimize tree height and disk accesses per lookup.

2. What's the difference between B-tree and B+ tree?

In a B-tree, all nodes (including leaves) can store data and keys. In a B+ tree, only leaf nodes store data; internal nodes store only routing keys. This means: B+ trees have higher fan-out (more keys per internal node), all leaves at same depth, and efficient range queries via linked leaves. B-trees can find a key without traversing to a leaf.

3. What happens during B-tree node split?

When a node is full (2t-1 keys) and needs to insert, it splits at the median key. The median key goes to the parent; the left half stays in original node, right half goes to new node. This may cascade—parent might also be full, requiring recursive split. Splits propagate toward root at most O(tree height), typically 3-4 levels for large trees.

4. How does B+ tree support efficient range queries?

B+ tree leaves are linked via next/prev pointers. After finding the starting leaf via the tree, range queries traverse the linked list without returning to parent nodes. This enables sequential leaf access with minimal disk seeks—one I/O per leaf page vs. O(log n) parent revisits for a standard B-tree.

5. What is the relationship between B-tree order (m) and minimum degree (t)?

A B-tree of order m has a maximum of m children per node. The minimum degree t defines the node capacity as: minimum children = t, maximum children = 2t. Relationship: m = 2t (or close to it depending on implementation). For example, with t=50, each node holds up to 99 keys and up to 100 children pointers.

6. Why do database indexes use B+ trees instead of red-black trees?

Red-black trees have height O(log n) but low fanout (~2 children per node), meaning ~30 levels for 1 billion records. Each level may be a disk I/O. B+ trees pack hundreds of keys per node (matching 4KB disk blocks), reducing height to 3-4 levels. Fewer levels = fewer disk seeks = orders of magnitude faster for storage-backed systems.

7. What is write amplification in B-trees and when does it matter?

Write amplification = actual disk writes per user insert. B-trees cause writes when: (1) updating leaf pages, (2) splitting pages (write parent and two children), (3) updating parent keys. In write-heavy workloads, page splits cause multiple disk writes per insert. LSM trees reduce write amplification by batching writes in memory before sequential SSTable writes, at the cost of read amplification.

8. How does key size affect B-tree performance?

Larger keys reduce fanout: fewer keys fit per node, increasing tree height and I/O per search. For a 4KB page with 8-byte pointers, a 16-byte key allows ~200 keys per node (height ~3 for 1B records), while a 64-byte key drops to ~50 keys (height ~5). Composite indexes (key1, key2) compound this. Covered indexes and prefix compression mitigate the problem for index-only scans.

9. How does B+ tree handle deletions and what is page reclamation?

Expected answer points:

Deletions merge sibling nodes when keys fall below minimum threshold (t-1)
Coalescence or redistribution handles underfull nodes
Page reclamation: deleted pages return to free list for reuse
PostgreSQL VACUUM reclaims dead tuples; MySQL InnoDB purge thread handles deleted records
Lazy deletion: pages marked deleted but not immediately reused until VACUUM/purge runs

10. What is the difference between clustered and non-clustered indexes in the context of B+ trees?

Expected answer points:

Clustered index: B+ tree leaf pages ARE the data pages; table rows stored in order of index key
Non-clustered index: B+ tree leaf pages contain indexed key + row pointer to data
Each table has one clustered index (primary key typically); multiple non-clustered indexes
Clustered: range queries faster (data already ordered), insert/delete more expensive
Non-clustered: extra I/O for data lookup via pointer chase

11. What is index bloat and how does it affect B+ tree performance?

Expected answer points:

Index bloat: accumulation of dead or reusable pages due to updates/deletes
Causes: random inserts/deletes leave gaps, update-in-place rewrites pages
Symptoms: index size larger than necessary, degraded scan performance, increased cache pressure
Solutions: REINDEX rebuilds index from scratch; VACUUM (PostgreSQL) reclaims dead space
MySQL InnoDB: OPTIMIZE TABLE reorganizes pages;ibuffg purge thread handles deletions

12. How does B+ tree height stay low even with billions of records?

Expected answer points:

High fanout: each internal node holds hundreds of children pointers
4KB page with 8-byte pointers allows ~500 children per internal node
Height calculation: log_500(1 billion) ≈ 3.4 — only 4 levels needed
Root node typically cached in memory; path from root to leaf is 2-3 disk I/Os
B-tree variants with larger pages (e.g., 64KB) can achieve height of 2 for same dataset

13. What is the fill factor in B+ tree indexes and why does it matter?

Expected answer points:

Fill factor: percentage of page space to reserve for future inserts (default 90%)
Higher fill factor: more storage efficient, fewer pages, but more page splits on insert
Lower fill factor: leaves room for in-page inserts, reduces splits but wastes space
Use low fill factor for heavily updated tables, high fill factor for read-mostly
PostgreSQL: CREATE INDEX WITH (fillfactor=70); MySQL InnoDB:innodb_fill_factor

14. How does B+ tree support unique indexes vs non-unique indexes?

Expected answer points:

Unique index: each key appears exactly once in tree; enforced at insert time
Non-unique index: duplicate keys allowed; typically handled by appending row IDs
B+ tree stores (key, row_pointer) tuples; for duplicates, uses (key, transaction_id) or list pages
Unique constraint check traverses tree O(log n) then verifies no match at leaf
Non-unique: may need to scan all matching entries for query execution

15. What is prefix compression in B+ tree leaf pages and how does it work?

Expected answer points:

Leaf pages store sorted keys; adjacent keys often share common prefixes
Prefix compression: store first key fully, subsequent keys store only differing suffix
Example: ["apple", "apricot", "banana"] becomes ["apple", "-ricot", "banana"]
Reduces leaf page space by 20-40% depending on key prefix similarity
Decompression needed on read; trade-off between CPU and I/O savings

16. Compare B+ trees with LSM trees (Log-Structured Merge Trees).

Expected answer points:

LSM trees: writes go to memory (MemTable), then flushed to SSTables on disk
B+ trees: in-place writes; page splits cause random I/O
LSM tree write amplification: 1 (data written once), B+ tree: 3-10x
B+ tree read amplification: O(log n), LSM tree: O(log n) + K (K = number of SSTables)
LSM trees: better for write-heavy workloads; B+ trees: better for read-heavy or mixed

17. What is a covering index and how does it eliminate table lookups in B+ trees?

Expected answer points:

Covering index: query can be satisfied entirely from index pages without touching table
All referenced columns included in index key or stored as included columns
Query executor does index lookup, reads leaf nodes only — no row pointer chase to heap
PostgreSQL: "Index Only Scan"; MySQL: "Using index" in EXPLAIN
Trade-offs: larger index, slower inserts/updates, but faster reads for covered queries

18. How does the buffer pool/cache affect B+ tree performance in databases?

Expected answer points:

Buffer pool (innodb_buffer_pool_size, shared_buffers) caches B+ tree pages in memory
Hot pages: root, intermediate nodes cached; leaf pages loaded on demand
Cache hit ratio determines effective I/O cost; 99% hit ratio = 1% disk reads
LRU or clock-sweep eviction policies manage buffer pool pages
Prefetching: sequential leaf scans pre-load adjacent pages into buffer pool

19. What are the advantages and disadvantages of using GUIDs vs sequential IDs as B+ tree keys?

Expected answer points:

Sequential IDs (auto-increment): inserts go to rightmost leaf page — minimal page splits
GUIDs (UUIDs): random insertion point causes page splits throughout tree
Random GUIDs: write amplification high, index bloat higher, cache efficiency lower
Solutions: UUID v7 (time-ordered), UUID v1 with node ID, COMB (sequential GUIDs)
PostgreSQL: gen_random_uuid(); MySQL: UUID() — both random by default

20. How does B+ tree behave under high concurrency with multiple readers and writers?

Expected answer points:

Latch coupling: B+ tree operations acquire locks on nodes being modified
Read-write locks: readers share, writers exclusive; prevents dirty reads
Lock escalation: fine-grained node locks scale better than table-level locks
Write-write: page split or merge requires parent lock during modification
PostgreSQL: Heap Only Tuple (HOT) updates avoid index maintenance when possible
MySQL InnoDB: gap locks and next-key locks for isolation; reduces deadlocks but adds contention

Conclusion

B-trees and B+ trees are the backbone of virtually every relational database you’ve ever used. The core idea is simple: keep trees shallow by cramming more keys into each node, so disk I/O stays low. B+ trees go a step further by chaining leaf nodes together, which makes range scans actually bearable. If you want to see where this leads next, look into composite indexes, covering indexes, and how query planners decide which index to use — Database Indexing Deep Dive is a good starting point.

B-Trees and B+ Trees: Database Index Structures

B-Trees and B+ Trees: Database Index Structures

Introduction

When to Use

When Not to Use

Architecture: Multi-Way Balanced Trees

Implementation

B-Tree Node

Common Pitfalls / Anti-Patterns

Quick Recap Checklist

Trade-off Analysis

Real-world Failure Scenarios

Interview Questions

Further Reading

Conclusion

B-Trees and B+ Trees: Database Index Structures

B-Trees and B+ Trees: Database Index Structures

Introduction

When to Use

When Not to Use

Architecture: Multi-Way Balanced Trees

Implementation

B-Tree Node

Common Pitfalls / Anti-Patterns

Quick Recap Checklist

Trade-off Analysis

Real-world Failure Scenarios

Interview Questions

Further Reading

Conclusion

Category

Tags

Related Posts

Arrays vs Linked Lists: Understanding the Trade-offs

Arrays: 1D, 2D, and Multi-dimensional Mastery

AVL Trees: Self-Balancing Binary Search Trees