Directory Implementation

How file systems implement directories through linear search, hash tables, and B-trees, enabling efficient path resolution and file lookup operations.

published: May 19, 2026 reading time: 28 min read author: GeekWorkBench

Directory Implementation

When you type ls /home/user/documents/report.pdf, something has to figure out where that file actually lives on disk. That something is the directory implementation. The directory is the mapping between human-readable names and the underlying inodes that point to file data. How that mapping is structured determines how fast you can find files, how many files a directory can hold, and how the file system behaves under heavy load.

Most people think directories are simple folders. Under the hood, they’re sophisticated data structures that make the difference between a file system that screams and one that crawls.

Introduction

When to Use / When Not to Use

Understanding directory implementation helps with system design and troubleshooting.

When directory implementation matters:

Systems with millions of files in single directories
High-throughput workloads that create/delete files frequently
Backup systems that must scan directory structures
Troubleshooting slow file system operations

When you can ignore it:

Normal desktop/server use with moderate file counts
Cloud storage abstracted behind network file systems
Applications using higher-level APIs (databases, etc.)

Architecture or Flow Diagram

graph TD
    A[Path: /home/user/docs/file.txt] --> B[Root Directory]
    B --> C[Look up "home"]
    C --> D[User Directory]
    D --> E[Look up "docs"]
    E --> F[Docs Directory]
    F --> G[Look up "file.txt"]
    G --> H[Return Inode Number]

    style A stroke:#ff00ff,stroke-width:2px
    style H stroke:#ff00ff,stroke-width:2px

Path resolution walks through each component. The efficiency of each lookup step determines overall performance.

Core Concepts

Directory Entry Structure

At the basic level, a directory contains entries. Each entry maps a name to an inode:

// Classic Unix directory entry (simplified)
struct old_dirent {
    unsigned long d_ino;      // Inode number
    unsigned short d_reclen; // Record length
    unsigned char d_type;    // File type
    char d_name[256];        // File name (variable length)
};

// Modern directory entry with type
struct linux_dirent64 {
    unsigned long d_ino;      // Inode number
    unsigned long d_off;      // Offset to next entry
    unsigned short d_reclen;  // Record length
    unsigned char d_type;    // File type
    char d_name[];            // File name (null-terminated)
};

The d_reclen field enables entries of variable length. Older systems used fixed-size entries (often 16 or 32 bytes), which limited file names and wasted space.

Linear Search (Simple List)

The most basic directory implementation: entries stored as a simple list, searched sequentially.

graph TD
    A[Directory Entry Array] --> B[.]
    A --> C[..]
    A --> D[file1.txt]
    A --> E[file2.txt]
    A --> F[file3.txt]
    A --> G[fileN.txt]

    H[Search for "file3.txt"] --> I[Compare with file1.txt: NO]
    I --> J[Compare with file2.txt: NO]
    J --> K[Compare with file3.txt: YES - Return inode]

    style H stroke:#00fff9

Characteristics:

O(n) lookup time where n is number of entries
No memory overhead for indexing
Simple implementation
Performance degrades linearly with directory size

def linear_search(directory, target_name):
    """Simulate linear directory search."""
    for entry in directory.entries:
        if entry.name == target_name:
            return entry.inode
    return None  # Not found

# Complexity: O(n) where n = number of entries
# For 1 million files in a directory: up to 1 million comparisons

Hash Table Directories

Hash tables provide O(1) average lookup by computing a hash of the filename:

graph TD
    A[Filename: "document.txt"] --> B[Hash Function]
    B --> C[Bucket Index: 7]
    C --> D[Bucket 7]
    D --> E[Entry: document.txt -> inode 1234]

    F[Filename: "report.pdf"] --> G[Hash Function]
    G --> H[Bucket Index: 3]
    H --> I[Bucket 3]
    I --> J[Entry: report.pdf -> inode 5678]

    style A stroke:#ff00ff
    style F stroke:#ff00ff

Advantages:

O(1) average lookup regardless of directory size
Handles collisions through chaining or open addressing
Good for large directories

Disadvantages:

Hash computation overhead
Collisions require additional handling
Memory overhead for hash table structure
Cannot efficiently list entries in sorted order

// Hash table directory implementation (conceptual)
typedef struct dir_entry {
    char *name;
    unsigned long inode;
    struct dir_entry *next;  // Collision chain
} DirEntry;

typedef struct {
    DirEntry *buckets[TABLE_SIZE];
    unsigned int (*hash_fn)(const char *name, int len);
} HashDirectory;

// Insert operation
int hash_insert(HashDirectory *dir, const char *name, unsigned long inode) {
    unsigned int index = dir->hash_fn(name, strlen(name));
    DirEntry *new_entry = malloc(sizeof(DirEntry));
    new_entry->name = strdup(name);
    new_entry->inode = inode;
    new_entry->next = dir->buckets[index];
    dir->buckets[index] = new_entry;
    return 0;
}

B-Tree and B+ Tree Directories

Modern file systems (ext4, NTFS, XFS) use B-trees for directories to maintain sorted order and handle millions of entries:

graph TD
    A[Root Node] --> B[Intermediate Node]
    A --> C[Intermediate Node]

    B --> D[Leaf: A-M]
    B --> E[Leaf: N-Z]
    C --> F[Leaf: A-G]
    C --> G[Leaf: H-N]

    style A stroke:#ff00ff,stroke-width:3px
    style D stroke:#00fff9
    style E stroke:#00fff9
    style F stroke:#00fff9
    style G stroke:#00fff9

B-Tree characteristics:

O(log n) lookup time
Maintains sorted order (range queries efficient)
Handles insertions/deletions without complete restructuring
Efficient for both small and large directories

ext4 HTree (Indexed Directory): ext4 uses a special B-tree called HTree (a hybrid tree) for directory indexing. It supports:

O(log n) lookups via two-level indexing
Linear directory entries as fallback for small directories
Automatic upgrade to HTree when directory grows large

graph TD
    A[Hash Index Root] --> B[Index Node 0]
    A --> C[Index Node 1]
    B --> D[Leaf: files 0-999]
    B --> E[Leaf: files 1000-1999]
    C --> F[Leaf: files 2000-2999]
    C --> G[Leaf: files 3000-3999]

    style A stroke:#ff00ff,stroke-width:3px

Path Resolution

Path resolution walks the directory tree, looking up each component:

// Simplified path resolution algorithm
inode_t resolve_path(const char *path) {
    if (path[0] == '/') {
        // Absolute path - start at root
        current = root_inode;
        path = path + 1;
    } else {
        // Relative path - start at current directory
        current = cwd_inode;
    }

    while (*path) {
        // Find next path component
        component = next_component(&path);

        // Skip empty components (e.g., "a//b")
        if (strlen(component) == 0) continue;

        // Look up component in current directory
        inode = lookup_entry(current, component);
        if (!inode) return -ENOENT;  // Component doesn't exist

        // Handle ".." and "."
        if (strcmp(component, "..") == 0) {
            current = get_parent(current);
        } else if (strcmp(component, ".") == 0) {
            // Stay at current
        } else {
            current = inode;
        }
    }
    return current;
}

Directory Caching

The kernel caches directory lookups to avoid repeated I/O:

# Check dentry cache statistics
cat /proc/sys/fs/dentry-state

# View cached entries
ls -la /proc/slabinfo | grep dentry

# Clear dentry cache (requires root)
echo 3 > /proc/sys/vm/drop_caches

Production Failure Scenarios

Scenario 1: Million-File Directory Meltdown

What happened: A continuous integration system stored build artifacts in a single directory that grew to over 3 million files. Operations like ls and find became unusable. Creating new files took 10+ seconds. The file system was ext3 with linear directory search.

Why it happened: ext3 uses linear directory search by default. With 3 million files, finding any file required scanning millions of entries. The system spent all its time searching directories instead of doing useful work.

Detection:

# Count files in a directory
ls -la /path/to/directory | wc -l

# Or faster
find /path/to/directory -maxdepth 1 | wc -l

# Check directory entry type
debugfs -R "dir /path/to/directory" /dev/sda1

Mitigation:

Upgrade to ext4 which enables HTree indexing automatically

Use e2fsck to convert directory to indexed format:

sudo tune2fs -O dir_index /dev/sda1
sudo e2fsck -fD /dev/sda1  # Convert directories

Restructure storage to spread files across multiple directories
Implement automatic directory sharding (e.g., using first letters)

Scenario 2: Directory Entry Corruption

What happened: A server crashed mid-write to a directory. On reboot, the directory contained entries pointing to non-existent inodes. Some file names were garbled. The kernel marked the file system dirty.

Why it happened: Without journaling (ext3), directory modifications weren’t atomic. A partial write left the directory in an inconsistent state.

Detection:

# Run file system check in read-only mode
sudo fsck.ext4 -n /dev/sda1

# Use debugfs to examine directory structure
sudo debugfs -w /dev/sda1
debugfs: dirstat /path/to/directory

Mitigation:

Use journaling file systems (ext4, XFS)
Convert ext3 to ext4: tune2fs -O extents,uninit_bg /dev/sda1
Run periodic consistency checks
Monitor kernel logs for file system errors

Scenario 3: Case-Insensitive Lookup Failure

What happened: A developer created a file named Config.php and pushed to a case-insensitive Windows file share mounted on Linux. Another developer on Linux couldn’t find config.php. The share appeared to lose the file.

Why it happened: Different file systems have different case sensitivity rules:

File System	Case Sensitive?
ext2/3/4	Yes
XFS	Yes
NTFS	No
FAT32	No
CIFS/SMB default	Often no

Detection:

# Check mount options for case sensitivity
mount | grep cifs

# Test case sensitivity
touch TESTFILE
ls testfile 2>/dev/null && echo "Case insensitive" || echo "Case sensitive"

Mitigation:

Use consistent naming conventions (all lowercase)

Specify case sensitivity mount option for CIFS:

mount -o case insensitive //server/share /mnt
# Or
mount -o case sensitive //server/share /mnt

Document case sensitivity rules for cross-platform projects

Trade-off Table

Method	Lookup Time	Insert Time	Memory	Use Case
Linear List	O(n)	O(1) at end	Minimal	Small directories (<1000 files)
Hash Table	O(1) avg	O(1) avg	Medium	Large directories, random access
B-Tree	O(log n)	O(log n)	Higher	Very large, sorted access needed
ext4 HTree	O(log n)	O(log n)	Medium	Modern Linux default

Implementation Snippet

Directory Entry Parser

#!/usr/bin/env python3
"""Parse Linux ext4 directory entries."""

import struct

def parse_ext4_directory(data, block_size=4096):
    """Parse directory entries from an ext4 directory block."""
    entries = []
    offset = 0

    while offset < block_size:
        # Directory entry header (8 bytes)
        header = struct.unpack('<IHHHHH', data[offset:offset+struct.calcsize('<IHHHHH')])

        inode = header[0]
        rec_len = header[1]
        name_len = header[3]

        if inode == 0:
            # Entry not in use
            offset += rec_len
            continue

        # Extract filename
        name_start = offset + struct.calcsize('<IHHHHH')
        name = data[name_start:name_start + name_len].decode('utf-8', errors='replace')

        # File type (stored in record length high bits in ext4)
        file_type = (rec_len >> 12) if rec_len > 255 else 1

        entries.append({
            'inode': inode,
            'name': name,
            'rec_len': rec_len & 0xFFF,  # Lower 12 bits
            'type': ['unknown', 'regular', 'directory', 'char', 'block',
                     'fifo', 'sock', 'symlink'][file_type]
        })

        offset += rec_len

    return entries

def list_directory(device, inode_num):
    """List entries in a directory given its inode."""
    import subprocess

    result = subprocess.run([
        'debugfs', '-R', f'rdump / {inode_num}',
        device
    ], capture_output=True, text=True)
    return result.stdout

Simulating Directory Search

#!/usr/bin/env python3
"""Compare different directory search strategies."""

import time
import random
import string

class DirectoryEntry:
    def __init__(self, name, inode):
        self.name = name
        self.inode = inode

class LinearDirectory:
    """Simple list-based directory."""
    def __init__(self):
        self.entries = []

    def insert(self, name, inode):
        self.entries.append(DirectoryEntry(name, inode))

    def lookup(self, name):
        for entry in self.entries:
            if entry.name == name:
                return entry.inode
        return None

    def list_all(self):
        return [e.name for e in self.entries]

class HashDirectory:
    """Hash table based directory."""
    def __init__(self, size=256):
        self.buckets = [[] for _ in range(size)]
        self.size = size

    def _hash(self, name):
        h = 0
        for c in name:
            h = (h * 31 + ord(c)) % self.size
        return h

    def insert(self, name, inode):
        bucket = self._hash(name)
        for entry in self.buckets[bucket]:
            if entry.name == name:
                entry.inode = inode
                return
        self.buckets[bucket].append(DirectoryEntry(name, inode))

    def lookup(self, name):
        bucket = self._hash(name)
        for entry in self.buckets[bucket]:
            if entry.name == name:
                return entry.inode
        return None

def benchmark(directory, names, iterations=1000):
    """Benchmark directory lookup performance."""
    # Insert all names
    for name in names:
        directory.insert(name, random.randint(1, 1000000))

    # Benchmark lookups
    start = time.time()
    for _ in range(iterations):
        for name in random.sample(names, min(100, len(names))):
            directory.lookup(name)
    elapsed = time.time() - start

    return elapsed

if __name__ == "__main__":
    # Generate test names
    names = [''.join(random.choices(string.ascii_lowercase, k=10))
             for _ in range(10000)]

    linear = LinearDirectory()
    hash_dir = HashDirectory()

    linear_time = benchmark(linear, names)
    hash_time = benchmark(hash_dir, names)

    print(f"Linear search: {linear_time:.4f}s")
    print(f"Hash table:    {hash_time:.4f}s")
    print(f"Speedup:       {linear_time/hash_time:.2f}x")

Observability Checklist

Monitoring directory performance:

ls -la /path: Check entry count and modification
getfattr -d file: Check extended attributes
debugfs -R “dirstat /path” /dev/sdX: Directory statistics on ext4
xfs_info /mount: Show XFS directory block configuration

# Monitor directory operation times
strace -c ls /path/to/directory

# Check for problematic directories
for dir in /var /home /tmp; do
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    if [ $count -gt 10000 ]; then
        echo "WARNING: $dir has $count files"
    fi
done

# Check ext4 directory indexing
sudo tune2fs -l /dev/sda1 | grep -E "Directory"
# Should show: "Directory ACLs", "Directory Indexed"

Common Pitfalls / Anti-Patterns

Directory Permissions

Directory permissions work differently than file permissions:

Read (r): List directory contents (use ls)
Write (w): Create, rename, or delete files within directory
Execute (x): Access directory contents (traverse, use cd)

# r--: Can see file names but not metadata or content
ls /directory      # Yes
ls -l /directory   # No
cat /file          # No

# -w-: Can create/delete files but can't read them
touch /directory/newfile  # Yes
rm /directory/file        # Yes
ls /directory             # No

# --x: Can access known files but not list directory
cd /directory     # Yes
ls /directory     # No
cat /file         # Yes (if you know exact path)

Sticky Bit on Shared Directories

# Prevents users from deleting others' files in shared directories
chmod +t /shared/directory

# View sticky bit
ls -la /shared
# drwxrwxrwt  2 root root 4096 May 19 10:00 /shared
#                                                         ^ sticky bit

Directory Access Control Lists (ACLs)

# Set default ACL for new files in directory
setfacl -d -m u:alice:rw /shared/directory

# Set ACL for specific user
setfacl -m u:bob:r /shared/directory

# View ACLs
getfacl /shared/directory

# Remove specific ACL entry
setfacl -x u:alice /shared/directory

Common Pitfalls / Anti-patterns

1. Millions of Files in One Directory

# BAD: Storing millions of files in single directory
# This kills performance on most file systems
find /logs -name "*.log" | head -1000000 > /tmp/all_logs

# GOOD: Use subdirectories sharded by date/hash
find /logs -type d | head
# /logs/2024/01/01
# /logs/2024/01/02
# ...

# GOOD: Use hierarchical sharding
for i in $(seq 0 99); do
    mkdir -p /data/$i
done
mv file_0001.dat /data/$(($RANDOM % 100))/

2. Long File Names

# Path length limits (older systems):
# - Maximum component: 255 bytes
# - Maximum total path: 4096 bytes

# BAD: Extremely long file names
touch "this_is_a_very_very_long_file_name_that_might_cause_problems.txt"

# GOOD: Use reasonable naming conventions
touch "config_v2_final_REVISED.txt"

3. Special Characters in File Names

# BAD: Special characters that break scripts
touch "file with spaces.txt"
touch "file;with;semicolons.txt"
touch "file*with*asterisks.txt"

# GOOD: Alphanumeric and underscores
touch "config_settings.txt"
touch "user_data_backup.txt"

Quick Recap Checklist

Directories map file names to inode numbers
Linear search is O(n), good for small directories (<1000 entries)
Hash tables provide O(1) average lookup but no sorted iteration
B-trees give O(log n) and maintain sorted order
ext4 uses HTree for large directories; convert with e2fsck -D
Path resolution walks through each component, looking up each name
Directory caching (dentry cache) dramatically speeds repeated lookups
Different file systems have different case sensitivity rules
Watch for inode exhaustion (not just space) when many small files exist

Interview Questions

1. How does Linux resolve a path like `/home/user/docs/file.txt`?

Path resolution is a multi-step process:

Parse: Split the path into components: /, home, user, docs, file.txt
Start: Begin at the root inode (for absolute paths starting with /) or current directory (for relative paths)
Lookup each component:
- Look up home in the root directory → get inode for /home
- Look up user in /home → get inode for /home/user
- Continue until file.txt is found
Validate permissions: At each step, verify execute permission on the directory
Return inode: The inode number for file.txt

The kernel uses the dentry cache to speed repeated lookups. Each path component lookup first checks the cache before doing I/O.

2. Why is having a million files in one directory slow on ext3 but fast on ext4?

ext3 uses linear directory search: To find any file, ext3 scans entries one by one from the beginning. With a million files, the average lookup requires scanning 500,000 entries. This is O(n) complexity.

ext4 uses HTree indexing: When a directory grows beyond ~100 entries, ext4 automatically converts it to use a two-level hash tree. Lookups become O(log n) — finding a file in a million-entry directory takes ~20 comparisons instead of 500,000.

Additional factors:

ext4 uses larger directory block sizes (4KB vs 1KB)
ext4 has delayed allocation (better block placement)
ext4 journal ensures directory modifications are atomic

You can manually convert ext3 directories to indexed format with e2fsck -D.

3. What is the dentry cache and why is it important?

The dentry cache (Directory Entry cache) caches recently accessed directory entries — the mappings between file names and inode numbers. It's a tree-structured cache that lives in kernel memory.

Importance:

Avoids repeated disk I/O for frequently accessed paths
Enables fast path resolution (check cache → hit vs miss)
Stores validity and aging information
Helps maintain consistency across mount points

When you access /home/user/docs/file.txt:

Check cache for / → get root inode
Check cache for /home → get inode
Check cache for /home/user → get inode
And so on...

Without the dentry cache, every path component lookup would require disk I/O.

4. Explain the difference between directory permissions and file permissions.

Directory permissions control what you can do with the directory itself:

Read (r): List the contents — you can see what files exist (via ls)
Write (w): Create, rename, or delete files within the directory (even files you don't own, if directory is writable)
Execute (x): Enter the directory and access files within it — you need this to cd into the directory or access any file by path

File permissions control what you can do with the file itself:

Read (r): View file contents
Write (w): Modify file contents
Execute (x): Run the file as a program

Key insight: you need execute permission on a directory to access any file within it, regardless of the file's own permissions. To read /home/user/secret.txt, you need x on /, /home, and /home/user, plus r on secret.txt.

5. What happens when you delete a file that's currently open by another process?

On Unix/Linux systems, deleting an open file works due to how the kernel handles file references:

Directory entry removed: The file name is removed from its parent directory immediately
Inode reference held: The file's inode remains allocated as long as any process has it open
Data blocks intact: The actual file data stays on disk until the last reference (file descriptor or hard link) is released
Process continues: The process with the open file descriptor can still read/write normally
Cleanup on close: When the last process closes the file, the kernel frees the inode and data blocks

This behavior is used by package managers and software updates: they create a new version of a file, atomically replace the old one, and the old version persists until processes using it exit.

You can verify this with: lsof +L1 — shows files that are deleted but still open.

6. How do B-trees maintain sorted order and why does this matter for directories?

B-trees store keys in sorted order at each level. For directory lookups, this means:

Range queries are efficient: Listing all files starting with "a" can stop early
Prefix matching: Finding all files in a range is a tree traversal, not a full scan
Insert/delete balanced: Tree stays balanced, guaranteeing O(log n) even after many modifications

ext4's HTree is a hash-indexed B-tree: the hash of the filename determines which branch to follow. This maintains O(log n) lookup while distributing entries evenly across branches (avoiding the "hot spot" problem of sorted B-trees with sequential inserts).

7. What is the difference between dcache and inode cache in Linux?

The dentry cache (dcache) stores directory entry structures—mappings from filename component to inode. Each dentry also holds references to its parent dentry and children, forming the directory tree in memory.

The inode cache stores the actual inode structures loaded from disk. When VFS needs to read a file's metadata or data blocks, it uses the inode cache to avoid disk I/O.

The relationship: dentries point to inodes. When you resolve /home/user/file.txt, the dentry for "user" points to the inode for /home/user. Both caches are separately managed but work together—dentries provide fast path resolution, inodes provide fast metadata access.

8. Why does `ls` on a directory with millions of files become slow?

On ext3 (linear search), listing a directory with millions of files requires reading every entry sequentially. The O(n) complexity means average lookup is n/2 comparisons—catastrophic for large directories.

On ext4 with HTree, `ls` should be O(log n) for each entry lookup during readdir(), but there are still bottlenecks:

readdir() must return entries in sorted order—HTree lookups still traverse the tree
Each directory entry requires a dentry allocation in kernel memory
Stale dentry invalidation if files are being created/deleted concurrently
Directory block reads are sequential but still I/O-bound

The real cost is often not the lookup but the memory allocation and userspace processing for millions of entries.

9. How does directory permission interact with file permission during access?

When accessing /home/user/documents/report.txt:

You need execute (x) permission on `/` to reach any subdirectory
You need execute (x) on `/home` to enter it
You need execute (x) on `/home/user` to enter it
You need execute (x) on `/home/user/documents` to enter it
Finally, you need the appropriate permission on `report.txt` itself (r for read, w for write, etc.)

The directory permissions don't control access to files inside—they control access to the directory itself. Even if you have full permission on a file, you can't access it if you lack execute permission on any parent directory.

10. What is a negative dentry and why does it exist?

A negative dentry is a cached lookup result indicating "this name does not exist." When you look up /nonexistent once, the kernel caches the negative result, preventing repeated lookup attempts.

Benefits:

Avoids repeated failed lookups for missing files
Particularly important for shared libraries (/lib/libfoo.so not found)—avoids filesystem hits on every program load
TTL-based invalidation prevents stale negatives

Negative dentries can cause issues when a file is created after a negative cache: the cache must be invalidated. This happens through directory notifications (inotify) or timeout-based expiry.

11. How do hard links and symbolic links differ at the directory implementation level?

Hard links: Multiple directory entries (directory entries are names in a directory) point to the same inode. The inode maintains a link count — how many directory entries reference it. Deleting a hard link decrements the count; when it reaches zero, the inode is freed. Hard links cannot span file systems (same inode numbers are local to each filesystem), and cannot reference directories (would create cycles).

Symbolic links: A special file type containing a path string. The path can point anywhere (even to nonexistent targets). Resolving a symbolic link requires following the path — if the target is removed, the symlink becomes "dangling." Symlinks can span file systems and reference directories. Directory entries for symlinks store the link's target path, not an inode.

At implementation: hard links share inode references directly; symlinks store path strings that must be recursively resolved.

12. How does the VFS layer abstract directory operations across different file systems?

VFS (Virtual File System) provides a unified interface that all file systems implement. For directory operations:

struct inode_operations: Contains function pointers for lookup, create, mkdir, rename
struct dentry_operations: Contains d_hash, d_compare, d_delete
struct file_operations: Contains readdir function for iterating directory entries

When you call readdir(), VFS routes to the concrete filesystem's iterate() or readdir() implementation. ext4 uses HTree for indexed lookup; NFS may send lookups to remote server; tmpfs stores entries in RAM. VFS handles caching, locking, and pathname translation — filesystem code only deals with its own structures.

13. What is directory entry caching and how does it interact with the page cache?

Page cache stores disk blocks (file data and metadata) in memory. Dentry cache stores parsed pathname components. They interact:

When resolving /home/user/file.txt, dentry cache holds "/" dentry, "home" dentry, "user" dentry
Each dentry points to an inode — inodes are cached in the inode cache
When a file is opened, its inode's pages (block mapping + data blocks) are in page cache
Modifying a file: dentry → inode → page cache → mark page dirty → writeback

The relationship: dentries are parsed from directory file data (which lives in page cache). When directory contents change, d_invalidate() marks relevant dentries invalid — they will be re-parsed on next lookup, pulling fresh data from page cache.

14. How does journaling ensure directory consistency after a crash?

Journaling records directory operations in a journal (separate log area) before applying them to the actual directory:

Journal transaction: Record intent (transaction begin, operation details)
Commit: Write operation to journal on disk (durable)
Checkpoint: Apply operation to actual directory structure
Journal cleanup: Remove committed transactions from journal

On crash recovery: filesystem replays journal entries, completing any committed transactions. Directory structure is never left in partial-write state. ext4 uses journaling for metadata (configurable: ordered mode journals only metadata, writeback allows data journal). XFS journals both data and metadata in different modes.

15. What happens when you rename a file across different directories on the same filesystem?

Local rename (same filesystem, different directory): It's a meta-operation — the inode doesn't change, only the parent directory entry. The kernel:

Lock both source and destination parent directories (inode mutex)
Verify destination doesn't already exist (if it does, fail with EEXIST)
Remove source filename entry, decrement link count of source inode
Create destination filename entry pointing to same inode, increment link count
Unlock directories

This is atomic in the sense that no intermediate state is visible: either the old name or the new name exists, never both simultaneously. The rename is cheap because no data moves — only directory entries are updated.

16. How does inotify interact with directory monitoring at the kernel level?

inotify (inode notification) is a Linux kernel subsystem for filesystem event monitoring. When you inotify_init(), you get a file descriptor; inotify_add_watch() registers interest in specific files/directories.

Events are stored in an inotify_event structure and can be read via read(). For directories, events include: IN_CREATE, IN_DELETE, IN_MOVED_FROM, IN_MOVED_TO, IN_MODIFY. The kernel hooks into VFS: when VFS modifies a directory entry, it calls inotify_d_instantiate() which queues events.

Performance: inotify watches consume kernel memory. Large number of watches or rapid events can overwhelm buffers — events may be coalesced or lost (read size determines how many events you retrieve per read).

17. What is the relationship between file descriptor limits and directory entry lookups?

File descriptor limits (ulimit -n, /proc/sys/fs/file-max) control how many open file descriptors a process can have. Each open file, directory traversal, and memory-mapped file consumes a file descriptor.

During directory traversal (e.g., readdir() loop), each directory entry may trigger:

stat() calls for metadata — these use fd indirectly via dentry cache
Opening files — each open consumes a file descriptor
Processing large directories without closing descriptors — causes leaks

When FD limit is reached: EMFILE (Too many open files) — new file opens fail, but existing operations may continue if they have open descriptors. Always close directory handles with closedir() after use.

18. How does fsck recover corrupted directory structures and what are its limitations?

fsck recovery process:

Phase 1: Check inodes — verify inode structure, link counts, block allocation bitmap consistency
Phase 2: Check directory structure — verify directory entries point to valid inodes, parent/child relationships
Phase 3: Reconnect orphaned inodes — inodes with zero directory links are reattached to lost+found
Phase 4: Check reference counts — verify link counts match actual directory references
Phase 5: Check cylinder groups — verify superblock and group descriptors

Limitations: fsck cannot recover data content (only structure), cannot repair hardware errors, cannot restore deleted files from backups. Orphaned files in lost+found have numeric names — content must be manually identified. Running fsck on mounted filesystem is dangerous — use recovery mode.

19. How does case-insensitive lookup work in filesystems like NTFS and FAT32?

Case insensitivity implementation: When creating or looking up a filename, the filesystem normalizes the name (converts to uppercase in NTFS, or stores comparison flags). Lookup uses case-folded comparison instead of byte comparison.

NTFS specifics: Stores names in Unicode (UTF-16), uses RtlUpcaseUnicodeString() for comparisons. The volume mount option case sensitive can enable case-preserving-but-sensitive behavior for WSL/developers.

FAT32 specifics: 8.3 filename format (max 8 chars + 3 extension), all uppercase internally. Long filename (VFAT) extension stores names as UTF-16 but normalizes for lookup.

Cross-platform issues: A file created as "README.TXT" on Windows may appear as "readme.txt" on Linux — the filesystem stores the original case but lookup is case-insensitive. Applications expecting case-sensitive behavior may fail.

20. What is directory quota enforcement and how does it interact with directory entry allocation?

Directory quotas limit the number of entries or total size within a directory tree. Implementation options:

Project quotas (XFS): Hierarchical quota tracking, each directory tree has a project ID with enforced limits
Directory-based limits: Some filesystems (ReiserFS, ext4 with patches) support per-directory entry limits
External quota systems: Quota daemon (quotacheck, quotaon) scans filesystem and enforces limits via inode count

Interaction with entry allocation:

User attempts to create file in directory with quota
Kernel checks current entry count vs soft/hard limits
If hard limit reached, return EDQUOT (disc quota exceeded)
Quota is checked before actual allocation — prevents partial operations

Quota enforcement for directories is more complex than for files because directories can contain many files — the quota applies to the tree, not individual entries.

Conclusion

Directory implementations transform flat name spaces into the hierarchical structures we navigate daily. The evolution from linear search through hash tables to B-trees reflects the industry learning to handle millions of files per directory without sacrificing performance. ext4’s HTree index makes million-file directories practical — a task that would bring ext3 to its knees.

Path resolution leverages the dentry cache to avoid repeated disk I/O for frequently accessed paths. Each lookup checks the cache before triggering disk reads, making repeated file access dramatically faster than initial access. This caching strategy applies not just to file content but to the directory structure itself.

For deeper study, explore how distributed file systems like Ceph and GlusterFS implement directory metadata at scale, how case-insensitive file systems handle Unicode normalization, and how the trade-offs between directory implementation strategies affect benchmark results. The principles learned here apply broadly to any system that needs to map keys to values at scale.

Directory Implementation

Introduction

When to Use / When Not to Use

Architecture or Flow Diagram

Core Concepts

Directory Entry Structure

Linear Search (Simple List)

Hash Table Directories

B-Tree and B+ Tree Directories

Path Resolution

Directory Caching

Production Failure Scenarios

Scenario 1: Million-File Directory Meltdown

Scenario 2: Directory Entry Corruption

Scenario 3: Case-Insensitive Lookup Failure

Trade-off Table

Implementation Snippet

Directory Entry Parser

Simulating Directory Search

Observability Checklist

Common Pitfalls / Anti-Patterns

Directory Permissions

Sticky Bit on Shared Directories

Directory Access Control Lists (ACLs)

Common Pitfalls / Anti-patterns

1. Millions of Files in One Directory

2. Long File Names

3. Special Characters in File Names

Quick Recap Checklist

Interview Questions

Further Reading

Conclusion

Category

Tags

Related Posts

ASLR & Stack Protection

Assembly Language Basics: Writing Code the CPU Understands

Boolean Logic & Gates