Virtual File Systems (VFS)

Understanding how Linux abstracts multiple file systems through a common interface, enabling transparent access to ext4, NTFS, FAT, and network file systems.

published: May 19, 2026 reading time: 38 min read author: GeekWorkBench

Every time you access a file on Linux, whether it lives on an ext4 partition, a USB drive formatted with FAT32, an NFS share, or even /proc, the same interface handles it. That interface is the Virtual File System (VFS) layer. Without VFS, every application would need to know how to talk to each specific file system type. With VFS, applications speak one universal language while the kernel translates to whatever file system actually stores the data.

VFS is one of the most elegant abstractions in operating systems. It enables the illusion of a unified file tree while simultaneously supporting dozens of radically different file system implementations. Understanding VFS helps you troubleshoot mounting issues, optimize file system performance, and understand how Linux achieves its legendary flexibility.

Introduction

Every time you read a file, list a directory, or mount a drive on Linux, you are going through the VFS layer, usually without realizing it.

What VFS is: VFS is an abstraction layer in the Linux kernel that provides a unified interface to different file system implementations. Without it, applications would need separate code to communicate with ext4, XFS, NTFS, NFS, CIFS, and all the other file system types. With VFS, applications speak a single universal language while the kernel translates operations to whatever file system actually stores the data.

Why it matters: Understanding VFS is essential for system administrators, DevOps engineers, and developers working with containers. It explains why mounting works the way it does, how containers get their filesystem isolation, and why some operations behave differently across file system types. You’ll be able to troubleshoot issues that would otherwise seem mysterious and make better decisions about storage in production environments.

What you’ll learn: This article walks through the VFS architecture, its four core data structures (superblock, inode, dentry, and file), how file systems register and mount, path resolution mechanics, the dentry cache, file system types, and real-world failure scenarios with mitigations.

When to Use / When Not to Use

Understanding VFS helps with system administration and troubleshooting.

When VFS knowledge is essential:

Mounting and configuring various file systems
Troubleshooting “mount succeeded but files not accessible” issues
Working with network file systems (NFS, CIFS, FUSE)
Understanding why some operations are slower on certain file systems
Container storage and volume mounting

When you can rely on defaults:

Standard server configuration with single file system type
Desktop usage with built-in file system support
Simple container workloads with default storage

Architecture or Flow Diagram

graph TD
    A[Application] --> B[POSIX System Calls]
    B --> C[VFS Layer]

    C --> D[ext4 Driver]
    C --> E[NTFS Driver]
    C --> F[FAT/VFAT Driver]
    C --> G[NFS Client]
    C --> H[CIFS/SMB Client]
    C --> I[procfs Driver]
    C --> J[tmpfs Driver]

    D --> K[Block Device Layer]
    E --> K
    F --> K
    G --> L[Network]
    H --> L
    K --> M[Storage Device]
    L --> N[NFS Server]
    L --> O[SMB Server]

    style A stroke:#ff00ff,stroke-width:2px
    style C stroke:#ff00ff,stroke-width:3px

The VFS layer sits between applications and the actual file system implementations. Each file system type implements the VFS interface, making them interchangeable from the application’s perspective.

Core Concepts

VFS Data Structures

The VFS layer is built on four key data structures that every file system must implement:

// Superblock - file system level metadata
struct super_block {
    unsigned long s_blocksize;         // Block size in bytes
    const struct super_operations *s_op; // Superblock operations
    struct dentry *s_root;             // Root directory entry
    struct list_head s_inodes;         // All in-memory inodes of this file system
    void *s_fs_info;                   // File system specific info
    // ... many more fields
};

// Inode - represents a file (similar to on-disk inode)
struct inode {
    unsigned long i_ino;               // Inode number
    umode_t i_mode;                    // File type and permissions
    const struct inode_operations *i_op; // Inode operations
    const struct file_operations *i_fop; // Default file operations
    struct super_block *i_sb;          // Superblock reference
    // ... many more fields
};

// Dentry - directory entry (name to inode mapping)
struct dentry {
    struct qstr d_name;                // Name component (string, length, hash)
    struct inode *d_inode;             // Associated inode (NULL for a negative dentry)
    struct dentry *d_parent;           // Parent directory
    struct list_head d_subdirs;        // Child entries (d_children since Linux 6.8)
    // ... many more fields
};

// File - open file instance
struct file {
    struct path f_path;                // Path to file
    const struct file_operations *f_op; // File operations
    loff_t f_pos;                      // Current position
    unsigned int f_flags;              // Open flags
    // ... many more fields
};

The key insight: these are generic structures. Specific file systems fill them in with their own implementations of standard operations.

VFS Operations

Each data structure has an associated operations table:

// Superblock operations - file system level
struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *);
    void (*destroy_inode)(struct inode *);
    void (*dirty_inode)(struct inode *, int flags);
    int (*write_inode)(struct inode *, struct writeback_control *wbc);
    void (*evict_inode)(struct inode *);   // put_inode no longer exists in current kernels
    void (*put_super)(struct super_block *);
    // ... more
};

// Inode operations - file/directory specific
// (simplified: recent kernels pass an additional idmapping argument
// to create, mkdir, and similar calls)
struct inode_operations {
    int (*create)(struct inode *, struct dentry *, umode_t, bool);
    struct dentry *(*lookup)(struct inode *, struct dentry *, unsigned int);
    int (*link)(struct dentry *, struct inode *, struct dentry *);
    int (*unlink)(struct inode *, struct dentry *);
    int (*mkdir)(struct inode *, struct dentry *, umode_t);
    int (*rmdir)(struct inode *, struct dentry *);
    // ... more
};

// File operations - file access
struct file_operations {
    loff_t (*llseek)(struct file *, loff_t, int);
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    int (*open)(struct inode *, struct file *);
    int (*release)(struct inode *, struct file *);
    // ... more
};

Each file system (ext4, XFS, NTFS, etc.) implements these operations for its own data structures and semantics.
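
The function-pointer dispatch described above can be modeled in user space. The sketch below is a toy, not kernel code: `FileOperations`, `File`, `ramfs_ops`, `upperfs_ops`, and `vfs_read` are hypothetical names chosen for illustration, and the "file systems" are just two functions with different behavior behind the same call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileOperations:
    """Toy stand-in for struct file_operations: a table of callables."""
    read: Callable[["File", int], bytes]


@dataclass
class File:
    """Toy stand-in for struct file: data, an ops table, and a position."""
    data: bytes
    f_op: FileOperations
    f_pos: int = 0


def plain_read(f: File, size: int) -> bytes:
    """Return up to `size` bytes from the current position and advance it."""
    chunk = f.data[f.f_pos:f.f_pos + size]
    f.f_pos += len(chunk)
    return chunk


def upper_read(f: File, size: int) -> bytes:
    """A second 'file system' with different semantics behind the same call."""
    return plain_read(f, size).upper()


ramfs_ops = FileOperations(read=plain_read)
upperfs_ops = FileOperations(read=upper_read)


def vfs_read(f: File, size: int) -> bytes:
    """The generic layer never checks the file system type; it only
    follows the pointer stored in the file's operations table."""
    return f.f_op.read(f, size)


a = File(b"hello world", ramfs_ops)
b = File(b"hello world", upperfs_ops)
print(vfs_read(a, 5), vfs_read(b, 5))
```

The caller of `vfs_read` is the same in both cases; only the table attached to the `File` object differs, which is the property that lets the kernel add a new file system without touching the system call layer.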

File System Registration

File system drivers register with VFS during kernel initialization (for built-in drivers) or when their module is loaded:

// Register a file system type
register_filesystem(&ext4_fs_type);
register_filesystem(&xfs_fs_type);
register_filesystem(&vfat_fs_type);
register_filesystem(&nfs_fs_type);

// File system type structure (simplified; newer drivers supply
// init_fs_context instead of the mount callback shown here)
struct file_system_type {
    const char *name;           // "ext4", "xfs", "ntfs"
    int fs_flags;               // FS_REQUIRES_DEV, FS_BINARY_MOUNTDATA, etc.
    struct dentry *(*mount)(struct file_system_type *, int,
                            const char *, void *);
    void (*kill_sb)(struct super_block *);
    struct module *owner;
    // ...
};

This registration makes the file system available for mounting.
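
The set of registered types is visible from user space in /proc/filesystems, where a leading `nodev` marks types that do not need a block device (the FS_REQUIRES_DEV flag is not set). A minimal sketch of reading that format follows; `parse_proc_filesystems` and the sample text are illustrative assumptions, and on a real system you would pass in the contents of /proc/filesystems.

```python
def parse_proc_filesystems(text: str) -> dict[str, bool]:
    """Map each registered file system name to True if it needs a block device."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<flag>\t<name>", where flag is "nodev" or empty.
        flag, _, name = line.partition("\t")
        result[name.strip()] = flag.strip() != "nodev"
    return result


# Hypothetical excerpt in the same layout the kernel prints.
SAMPLE = "nodev\tsysfs\nnodev\tproc\n\text4\n\tvfat\nnodev\tnfs4\n"

for name, needs_device in parse_proc_filesystems(SAMPLE).items():
    print(f"{name:8} requires block device: {needs_device}")
```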

Mount Chain

When you mount a device, the chain looks like:

graph TD
    A["mount -t ext4 /dev/sda1 /mnt"] --> B[VFS receives mount request]
    B --> C[Find ext4 in registered file systems]
    C --> D[Call ext4 mount function]
    D --> E[Read superblock from device]
    E --> F[Create super_block structure]
    F --> G[Create root dentry and inode]
    G --> H[Link /mnt to VFS mount tree]

    style A stroke:#ff00ff,stroke-width:2px
    style H stroke:#00fff9

The mount creates the VFS structures that represent the mounted file system in the unified namespace.
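
One consequence of the mount tree is that every path belongs to exactly one mounted file system: the deepest mount point that contains it. The sketch below illustrates that rule with string prefixes. This is a simplification for illustration only; the kernel crosses mount points during the path walk rather than comparing strings, and `find_mount` and the `MOUNTS` table are hypothetical.

```python
import posixpath


def find_mount(path: str, mounts: dict[str, str]) -> tuple[str, str]:
    """Return (mount_point, fs_type) for the deepest mount containing path.

    `mounts` maps mount points to file system types and must contain "/".
    """
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best):
            best = mount_point
    return best, mounts[best]


# Hypothetical mount table.
MOUNTS = {"/": "ext4", "/home": "xfs", "/mnt": "ext4", "/mnt/nfs": "nfs4"}

for p in ("/home/user/file.txt", "/mnt/nfs/report.csv", "/homework/notes"):
    print(p, "->", find_mount(p, MOUNTS))
```

Note that /homework/notes resolves to the root mount, not /home: the comparison is per path component, not per character.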

Path Resolution in VFS

When an application accesses /home/user/file.txt:

sequenceDiagram
    participant App as Application
    participant VFS as VFS Layer
    participant Cache as Dentry Cache
    participant FS as ext4 Driver
    participant Disk as Disk

    App->>VFS: open("/home/user/file.txt")
    VFS->>Cache: lookup dentry for "/"
    Cache-->>VFS: root inode
    VFS->>Cache: lookup dentry for "home"
    Cache-->>VFS: cached or inode
    VFS->>FS: read dir, find "user"
    FS->>Disk: read directory blocks
    Disk-->>FS: directory entries
    FS-->>VFS: inode for "user"
    VFS->>Cache: cache dentry
    VFS->>FS: lookup "file.txt"
    FS-->>VFS: inode for file.txt
    VFS-->>App: file descriptor

The dentry cache dramatically speeds repeated path lookups.
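
To make the effect of the cache concrete, here is a toy path walker that counts how often it has to ask the "driver" for a directory entry. It is a sketch under simplifying assumptions: the tree is a nested dict, `ToyFS` and `PathWalker` are hypothetical names, and permissions, symlinks, and mount crossings are ignored. Like the real dcache, it also remembers failed lookups (negative entries).

```python
class ToyFS:
    """Directory tree as nested dicts; counts how often the driver is asked."""

    def __init__(self, tree: dict):
        self.tree = tree
        self.lookups = 0

    def lookup(self, directory: dict, name: str):
        self.lookups += 1  # stands in for reading directory blocks from disk
        return directory.get(name)


class PathWalker:
    """Resolves paths component by component with a (parent, name) cache."""

    def __init__(self, fs: ToyFS):
        self.fs = fs
        self.dcache = {}  # (id(parent), name) -> child, or None if absent

    def resolve(self, path: str):
        node = self.fs.tree
        for name in filter(None, path.split("/")):
            if not isinstance(node, dict):
                raise NotADirectoryError(path)
            key = (id(node), name)
            if key not in self.dcache:
                self.dcache[key] = self.fs.lookup(node, name)
            node = self.dcache[key]
            if node is None:
                raise FileNotFoundError(path)
        return node


fs = ToyFS({"home": {"user": {"file.txt": "contents"}}})
walker = PathWalker(fs)
walker.resolve("/home/user/file.txt")
print("driver lookups after first walk:", fs.lookups)
walker.resolve("/home/user/file.txt")
print("driver lookups after second walk:", fs.lookups)
```

The second walk is served entirely from the cache, so the lookup counter does not move.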

Core Concepts: File System Types

Disk-Based File Systems

These work with block devices and form the backbone of local storage on Linux systems. Each brings different trade-offs around performance, reliability, and features:

File System	Best For	Key Limitation
ext4	General-purpose Linux workloads	Not ideal for very large filesystems (>50TB)
XFS	High-throughput, large files (media, databases)	Slower metadata operations than ext4
Btrfs	When you need snapshots, checksums, or pooling	Less mature, slower than ext4 in some workloads
NTFS	Dual-boot with Windows	Needs the in-kernel ntfs3 driver (Linux 5.15+) or the FUSE-based ntfs-3g; limited Linux integration
FAT32/exFAT	USB drives, cross-platform compatibility	No journaling, 4GB file size limit (FAT32)

Choosing between ext4 and XFS: ext4 is the default for many distributions because it recovers quickly from crashes and handles small files efficiently (Red Hat Enterprise Linux is a notable exception and defaults to XFS). XFS excels at large sequential I/O (video processing, scientific data) and scales better on massive storage arrays. For a file server holding 100TB of video archives, XFS typically outperforms ext4. For a web server with millions of small files, ext4’s metadata performance is usually the safer bet, though the gap depends on kernel version and workload.

Btrfs trade-offs: Btrfs’s copy-on-write design enables snapshots without the overhead of LVM layering, and its checksumming of data blocks catches silent corruption in file contents that ext4 and XFS, which checksum only metadata, do not detect. However, Btrfs write amplification can accelerate SSD wear in high-write workloads, and its memory usage grows with metadata complexity.

Network File Systems

These access remote servers:

NFS (Network File System): Unix/Linux standard
CIFS/SMB: Windows interoperability
SSHFS: File system over SSH
FTPFS: FTP-backed file system

# Mount NFS
sudo mount -t nfs4 server:/share /mnt/nfs

# Mount CIFS
sudo mount -t cifs //server/share /mnt/cifs -o username=user

# Mount SSHFS
sshfs user@server:/path /mnt/sshfs

Virtual/Proc File Systems

These don’t store data on disk:

# proc - process information
ls /proc
# 1/  1234/  self/

# sys - system information
ls /sys
# block/  bus/  class/  devices/

# tmpfs - RAM-based file system
mount -t tmpfs tmpfs /tmp

# devpts - terminal devices
ls /dev/pts

Union/Mount Namespace File Systems

Union file systems combine multiple directory trees into a single unified view. They’re foundational to container image layering and development workflows:

How union mounts work: When you overlay /upper on top of /lower, files in /upper shadow files with the same name in /lower. If /lower/file.txt exists and /upper/file.txt does not, you see /lower/file.txt through the mount. If /upper/file.txt exists, you see that instead. Deletions in the upper layer create whiteout files that mask the lower layer.

overlay vs bind mounts: A bind mount (mount --bind /src /dst) makes the same directory appear in two locations — they’re identical views of the same data, not a union. An overlay mount (overlay with lowerdir + upperdir) creates a union where the upper layer is writable and shadows the read-only lower layers. Containers use overlay to layer multiple read-only image directories plus one writable container layer.

Key use cases:

Containers: Docker’s overlay2 driver stacks image layers as lower directories and the container’s writable layer as upper
Development: Overlay your source directory over a base image to test changes without modifying the original
Live patching: Mount patched files over original files to apply fixes without redeployment

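The workdir parameter is required whenever an upperdir is given, and it must be an empty directory on the same file system as upperdir. The kernel uses it as scratch space: files are prepared there (for example during copy-up) and then moved into the upper layer with an atomic rename.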

# Three-layer overlay example (common in containers)
# lowerdir is listed top-first: /app shadows /tools, which shadows /baseOS
mount -t overlay overlay \
  -o lowerdir=/app:/tools:/baseOS,upperdir=/container_changes,workdir=/overlay_work \
  /merged
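
The shadowing and whiteout rules described in this section can be sketched in a few lines. This is an illustrative model only: layers are plain dicts mapping names to content, `WHITEOUT` is a hypothetical marker standing in for the special whiteout entry the kernel creates in the upper directory, and directories, copy-up, and opaque directories are not modeled.

```python
WHITEOUT = object()  # marker for "deleted in this layer"


def overlay_lookup(name: str, upper: dict, lowers: list):
    """Return the content visible in the merged view, or None if hidden or absent.

    `lowers` is ordered topmost-first, like the lowerdir= mount option.
    """
    for layer in [upper, *lowers]:
        if name in layer:
            entry = layer[name]
            # The first layer that knows the name decides; a whiteout hides it.
            return None if entry is WHITEOUT else entry
    return None


def overlay_listing(upper: dict, lowers: list) -> list:
    """List the names visible in the merged directory."""
    names = set(upper).union(*lowers)
    return sorted(n for n in names if overlay_lookup(n, upper, lowers) is not None)


base = {"a.conf": "base-a", "b.conf": "base-b"}
app = {"b.conf": "app-b", "c.conf": "app-c"}
upper = {"a.conf": WHITEOUT, "c.conf": "edited-c"}

print(overlay_listing(upper, [app, base]))
```

Here a.conf disappears from the merged view even though it still exists in the base layer, b.conf comes from the app layer because it sits above base, and c.conf comes from the writable upper layer.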

Advanced VFS Topics

Now that you understand the core VFS machinery—the four data structures, how file systems register, and how paths get resolved—the kernel’s other systems kick in to make everything actually fast and safe in production. The page cache sits under every file system driver. Container runtimes build their isolation on top of VFS mount namespaces. Memory eviction algorithms keep the caches from consuming everything. And FUSE lets people write file systems without touching kernel code at all. Each one is worth knowing if you’re debugging a real system.

Page Cache Deep Dive

The page cache sits between file system drivers and the block layer. (Older kernels kept a separate buffer cache for block data; it has long since been unified with the page cache.) When you call read(), data is returned from the page cache if it’s already in memory, and no disk I/O occurs. When you call write(), data goes into the page cache and is marked dirty; background writeback threads (pdflush in old kernels, per-device flusher threads since 2.6.32) eventually write it to disk.

Key structures involved:

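address_space: Each inode has one of these, managing the mapping between file offsets and memory pages in the page cache. Its address_space_operations callbacks, such as readpage() and writepage() (read_folio() and writepages() in recent kernels), connect the cache to the file system’s block-level I/O.
Writeback mechanics: Dirty pages accumulate in the page cache. The flusher threads wake every vm.dirty_writeback_centisecs and write out pages that have been dirty for longer than vm.dirty_expire_centisecs. vm.dirty_background_ratio sets the share of memory that may be dirty before background writeback starts, and vm.dirty_ratio sets the share at which writing processes are throttled and forced to write back themselves.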
Interaction with mmap: When you memory-map a file with mmap(), reads and writes to the mapped region flow through the page cache. Page faults bring data in from disk if not yet cached.
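
The read-hit and write-back behavior described above can be illustrated with a small model. It is a sketch, not a description of the kernel's implementation: `PageCache` is a hypothetical class, the "disk" is a dict from page index to bytes, and writeback is triggered by hand instead of by timers and dirty thresholds.

```python
class PageCache:
    """Toy write-back cache keyed by page index."""

    def __init__(self, disk: dict):
        self.disk = disk
        self.pages = {}      # page index -> cached bytes
        self.dirty = set()   # indices modified since the last writeback
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, index: int) -> bytes:
        if index not in self.pages:       # cache miss: go to disk once
            self.disk_reads += 1
            self.pages[index] = self.disk.get(index, b"")
        return self.pages[index]

    def write(self, index: int, data: bytes) -> None:
        self.pages[index] = data          # only memory is touched here
        self.dirty.add(index)

    def writeback(self) -> None:
        """What a flusher thread does: push dirty pages to disk, mark them clean."""
        for index in sorted(self.dirty):
            self.disk[index] = self.pages[index]
            self.disk_writes += 1
        self.dirty.clear()


disk = {0: b"old"}
cache = PageCache(disk)
cache.read(0)
cache.read(0)
cache.write(0, b"new")
print("on disk before writeback:", disk[0], "reads:", cache.disk_reads)
cache.writeback()
print("on disk after writeback:", disk[0], "writes:", cache.disk_writes)
```

The window between `write` and `writeback` is exactly the window in which a power loss drops data, which is why applications that need durability call fsync().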

Container Storage Drivers

Container runtimes use VFS abstractions to provide filesystem isolation:

overlay2: The preferred driver for Docker on Linux. It builds on VFS’s overlay mount, using multiple lower layers (typically the image layers) and one upper layer (the container’s writable layer). The lowerdir can be a colon-separated list of read-only image layers; upperdir is the container’s changes; workdir is required for atomic renames.
devicemapper (thinp): Uses the device mapper target (dm-thin) to create thin provisioned volumes. Each container gets a snapshot of a base image volume. Copy-on-write shares blocks between the base and derived volumes.
Btrfs: Some container runtimes use Btrfs subvolumes and snapshots. Btrfs’s native copy-on-write means derived images share blocks until written — very efficient for layered images.

Why overlay2 over overlay: Docker’s original overlay driver could stack only two directories, so it represented multi-layer images by hard-linking files between layer directories, which was known to consume an excessive number of inodes. overlay2 relies on the kernel’s native support for multiple lower directories (Linux 4.0 and later, up to 128 layers in Docker), so each image layer stays a separate directory and no hard-link copies are needed.
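
A detail that is easy to get wrong when assembling such a mount by hand is the order of the lowerdir list: the kernel treats the leftmost directory as the topmost layer, while image layers are usually recorded base-first. The helper below is a hypothetical sketch of that conversion; `overlay_options` and the layer paths are made up for illustration and are not how any particular runtime lays out its storage.

```python
def overlay_options(image_layers: list, upperdir: str, workdir: str) -> str:
    """Build the -o string for an overlay mount.

    `image_layers` is ordered base-first (the order the layers were built).
    lowerdir= expects the topmost layer first, so the list is reversed.
    """
    if not image_layers:
        raise ValueError("at least one lower layer is required")
    lowerdir = ":".join(reversed(image_layers))
    return f"lowerdir={lowerdir},upperdir={upperdir},workdir={workdir}"


# Hypothetical layer directories, base image first.
layers = ["/layers/base", "/layers/tools", "/layers/app"]
print(overlay_options(layers, "/containers/c1/upper", "/containers/c1/work"))
```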

Linux Page Cache Eviction

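The kernel keeps reclaimable pages on two LRU (Least Recently Used) lists, maintained per zone in older kernels and per NUMA node and memory cgroup in current ones, with separate pairs for file-backed and anonymous pages: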

Active list: Hot pages that are frequently accessed — kept in memory as long as possible.
Inactive list: Pages that have been used less recently — first candidates for eviction.

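The eviction process flows through shrink_page_list() (shrink_folio_list() in recent kernels):

The kswapd thread starts reclaiming when free memory falls below the low watermark, which is derived from vm.min_free_kbytes; allocations that hit the min watermark have to reclaim memory themselves.
shrink_inactive_list() scans the tail of the inactive list: referenced pages get another chance and may move to the active list, clean pages are evicted immediately, and dirty pages are queued for writeback first.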
The vm.vfs_cache_pressure knob controls how aggressively the kernel reclaims dentry and inode cache memory versus page cache. At 100 (default), the kernel balances equally. At 50, it retains more dentry/inode structures at the cost of dropping more file data pages. At 200, it drops dentry/inode structures more readily to preserve page cache.
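
The point of keeping two lists is that a page must be referenced twice before it counts as hot, so a single large scan cannot flush the working set. The following sketch models only that idea. `TwoListLRU` is a hypothetical class, and the real reclaim code additionally tracks reference bits, list size ratios, and dirty state.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy active/inactive LRU: first touch is inactive, second touch promotes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.active = OrderedDict()    # page -> None, coldest first
        self.inactive = OrderedDict()

    def access(self, page) -> list:
        """Touch a page and return the list of pages evicted as a result."""
        if page in self.active:
            self.active.move_to_end(page)
        elif page in self.inactive:
            del self.inactive[page]    # second reference: promote
            self.active[page] = None
        else:
            self.inactive[page] = None
        return self._reclaim()

    def _reclaim(self) -> list:
        evicted = []
        while len(self.active) + len(self.inactive) > self.capacity:
            if not self.inactive:
                # Nothing cold left: demote the coldest active page first.
                page, _ = self.active.popitem(last=False)
                self.inactive[page] = None
                continue
            page, _ = self.inactive.popitem(last=False)
            evicted.append(page)
        return evicted


lru = TwoListLRU(capacity=3)
for page in ["a", "a", "b", "c", "d", "e"]:
    print(page, "evicts", lru.access(page))
print("still cached:", list(lru.active), list(lru.inactive))
```

Page "a" was touched twice, so it sits on the active list and survives the later one-off accesses that push "b" and "c" out.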

Mount Namespaces

Mount namespaces (created with the CLONE_NEWNS flag to clone() or unshare()) give each set of processes an independent view of the mount table. This is what containers use to give each workload its own filesystem root.

Key concepts:

Private mounts (the kernel default, although systemd remounts / as shared at boot): Mount and unmount events under the mount point don’t propagate out, and external events don’t propagate in.
Shared mounts: Bidirectional propagation between the mounts of a peer group. A mount or unmount under one peer appears under every other peer, including peers in other namespaces.
Slave mounts: Propagation goes one way, from master to slave, but not vice versa.
Mount propagation and Docker: When you bind-mount a host directory into a container (docker run -v /host/data:/container/data), the container sees that directory because of the bind mount itself. Propagation only decides whether mounts made later underneath it on one side show up on the other. Docker’s default for bind mounts is rprivate; shared or slave propagation has to be requested explicitly. The /proc/self/mountinfo file shows propagation in the optional fields of each entry: shared:N for a shared mount, master:N for a slave, and no tag for a private mount.
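
Reading the propagation type out of a mountinfo entry can be sketched as follows. The optional fields sit between the sixth field and the lone "-" separator. `parse_mountinfo_line` is a hypothetical helper and the sample lines are made-up entries in the documented layout, not output from a real system.

```python
def parse_mountinfo_line(line: str) -> dict:
    """Extract mount point, file system type, and propagation tags from one entry."""
    fields = line.split()
    sep = fields.index("-")      # separator between optional fields and fs info
    kinds = []
    for tag in fields[6:sep]:    # zero or more optional fields
        if tag.startswith("shared:"):
            kinds.append("shared")
        elif tag.startswith("master:"):
            kinds.append("slave")
        elif tag == "unbindable":
            kinds.append("unbindable")
    return {
        "mount_point": fields[4],
        "fs_type": fields[sep + 1],
        "propagation": kinds or ["private"],  # no tag means private
    }


SAMPLE_LINES = [
    "25 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw",
    "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue",
    "40 25 0:35 / /tmp rw,nosuid - tmpfs tmpfs rw",
]

for entry in SAMPLE_LINES:
    print(parse_mountinfo_line(entry))
```

A mount can carry both shared:N and master:N at once (it receives events from its master and forwards them to its own peers), which is why the helper returns a list.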

FUSE in Userspace

VFS supports user-space file systems through FUSE (Filesystem in Userspace), which allows file system implementations without kernel code. The kernel module (fuse.ko) handles the VFS interface and communicates with a userspace daemon via /dev/fuse.

How it works:

FUSE driver registers with VFS like any other file system.
When a VFS operation arrives (e.g., read(), lookup()), the kernel sends the request to the userspace daemon via /dev/fuse.
The daemon processes the request using ordinary userspace code and returns the result.
The kernel completes the VFS operation and returns to the caller.

Common FUSE implementations include sshfs (mount remote files over SSH), gocryptfs (file-level encryption), the mount command of BorgBackup (browse deduplicated backup archives as a file system), and mergerfs (pooling multiple drives into a single logical volume).

Production Failure Scenarios

Scenario 1: File System Not Registered

What happened: An administrator tried to mount an ext4 partition but got “unknown file system type ‘ext4’.” The system had kernel support for ext4 as a module, but the module wasn’t loaded.

Detection:

# Check loaded file system modules
lsmod | grep -E "ext4|xfs|btrfs"

# Check available file systems
cat /proc/filesystems

# Try loading the module
sudo modprobe ext4

Mitigation:

Ensure file system modules are built into kernel or loaded
For embedded systems, include necessary FS support in kernel config
Use modprobe for a one-off load, or list the module in a file under /etc/modules-load.d/ (or in /etc/modules on Debian-based systems) for persistent loading

Scenario 2: VFS Cache Pressure Causing Memory Issues

What happened: A system with 64GB RAM showed 58GB in buff/cache, and applications began swapping even though most of that memory was reclaimable cache.

Detection:

# Check memory usage breakdown
free -h

# Check page cache and slab (dentry/inode) statistics
grep -E "^(Cached|Dirty|Writeback|Slab|SReclaimable):" /proc/meminfo

# Drop clean caches (diagnostic only; performance suffers until they refill)
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
free -h

Mitigation:

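Adjust vm.vfs_cache_pressure if SReclaimable (the dentry and inode caches) is what holds the memory:

# Default is 100. Values above 100 reclaim dentry/inode caches more
# aggressively; values below 100 retain them longer.
sysctl -w vm.vfs_cache_pressure=200

# Or make persistent in /etc/sysctl.conf
vm.vfs_cache_pressure = 200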

Use drop_caches for immediate relief during maintenance
Monitor and alert on cache vs application memory balance
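
Before tuning anything, it helps to see which kind of cache actually holds the memory. The sketch below computes that breakdown from /proc/meminfo text. `parse_meminfo`, `reclaimable_summary`, and the sample values are illustrative assumptions; on a real system you would pass in the contents of /proc/meminfo.

```python
def parse_meminfo(text: str) -> dict:
    """Return /proc/meminfo values keyed by field name (sizes are in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def reclaimable_summary(info: dict) -> dict:
    """Express page cache, reclaimable slab, and available memory as % of RAM."""
    total = info["MemTotal"]
    return {
        "page_cache_pct": round(100 * info["Cached"] / total, 1),
        "slab_reclaimable_pct": round(100 * info["SReclaimable"] / total, 1),
        "available_pct": round(100 * info["MemAvailable"] / total, 1),
    }


# Hypothetical excerpt resembling a cache-heavy 64GB machine.
SAMPLE = """MemTotal:       65536000 kB
MemFree:         1024000 kB
MemAvailable:   52428800 kB
Cached:         49152000 kB
SReclaimable:    3276800 kB
"""

print(reclaimable_summary(parse_meminfo(SAMPLE)))
```

In this made-up sample most memory is page cache and MemAvailable is high, so the cache is not the problem in itself; vm.vfs_cache_pressure would only matter if the slab share were the large one.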

Scenario 3: Overlay Mount Inconsistency

What happened: A container runtime used overlay file system. Applications inside containers saw stale files, files that existed in the lower layer weren’t visible, and some files showed old content despite being updated in the base image.

Why it happened: Overlay file systems have specific requirements for showing/hiding files. Incorrect lowerdir/upperdir configuration or copying files instead of using the union semantics caused visibility issues.

Detection:

# Check overlay mount options
mount | grep overlay

# View overlay layers
cat /proc/mounts | grep overlay

# Check which layer a file comes from: a whiteout in the upper directory
# shows up as a character device with device number 0,0
ls -la /upper/file

Mitigation:

Ensure proper overlay mount options:

mount -t overlay overlay \
  -o lowerdir=/lower1:/lower2,upperdir=/upper,workdir=/work \
  /merged

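Understand whiteout files (special entries in the upper layer that hide files deleted from lower layers)
Do not modify lower layers while the overlay is mounted; the kernel documents the result as undefined, which explains stale content after a base image update
Use chattr -i for immutable flag handling in overlay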

Scenario 4: Lost Connection to Network File System

What happened: An NFS server became unreachable. Client systems with NFS mounts hung—any command accessing /mnt/nfs would block indefinitely. The mount point couldn’t be unmounted.

Detection:

# Check NFS mount status
mount | grep nfs
cat /proc/mounts | grep nfs

# Check NFS daemon status (run this on the server, not the client)
systemctl status nfs-server

# Check for an established connection to the NFS port
ss -tn | grep ':2049'

Mitigation:

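Choose between hard and soft mount options:

# Hard mount: retry indefinitely (can hang)
mount -t nfs server:/share /mnt -o hard

# Soft mount: give up after the retries and return an error
# (timeo is in tenths of a second, so 50 means 5 seconds)
mount -t nfs server:/share /mnt -o soft,timeo=50

Do not rely on the intr option: kernels since 2.6.25 accept but ignore it, and only SIGKILL can interrupt a process blocked on a hard mount. Soft mounts can silently corrupt data when a write times out, so reserve them for read-mostly data:

mount -t nfs server:/share /mnt -o ro,soft,timeo=50,retrans=3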

Use autofs for on-demand mounting
Set up monitoring for NFS connectivity

Unmount with lazy option when hung:

sudo umount -l /mnt/nfs  # lazy unmount
sudo umount -f /mnt/nfs  # forced unmount
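
A monitoring check for hung mounts has to avoid getting stuck itself. One way to sketch that is to perform the stat in a child process and give up after a timeout. `path_responds` is a hypothetical helper, not part of any NFS tooling, and it assumes the blocked child can be killed; a process stuck in truly uninterruptible sleep may outlive the timeout.

```python
import subprocess
import sys


def path_responds(path: str, timeout: float = 2.0) -> bool:
    """Stat `path` in a child process so a hung mount cannot block the caller.

    Returns False if the path does not exist or the child does not finish
    within `timeout` seconds.
    """
    probe = [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path]
    try:
        result = subprocess.run(
            probe,
            timeout=timeout,  # on expiry the child is killed and TimeoutExpired raised
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for mount_point in sys.argv[1:]:
        state = "ok" if path_responds(mount_point) else "NOT RESPONDING"
        print(f"{mount_point}: {state}")
```

Running the probe out of process is the design choice that matters here: a stat() call made directly from the monitoring script would block the script on exactly the condition it is trying to detect.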

Trade-off Table

File System	VFS Support	Performance	Features	Complexity
ext4	Native	Good	Journal, extents	Low
XFS	Native	Excellent	Journal, quota	Medium
Btrfs	Native	Good	COW, snapshots	High
NTFS	ntfs3 (kernel 5.15+) or ntfs-3g (FUSE)	Moderate	Windows compat	Medium
NFSv4	Native	Network limited	Stateful	Medium
CIFS/SMB	Native	Network limited	Windows compat	Low
tmpfs	Native	Excellent (RAM)	Dynamic sizing	Low
overlay	Native	Good	Union mount	Medium

Implementation Snippet

Implementing a Simple FUSE File System

#!/usr/bin/env python3
"""Simple in-memory FUSE file system using fusepy (pip install fusepy)."""

import argparse
import errno
import os
import stat
import time

from fuse import FUSE, FuseOSError, Operations


class SimpleFS(Operations):
    """A simple in-memory file system demonstrating VFS concepts."""

    def __init__(self):
        # In-memory storage: path -> {'type', 'content', 'ctime', 'mtime'}
        self.files = {}
        self._add('/', 'directory')

    def _add(self, path, kind):
        """Create an empty entry and record its creation time."""
        now = time.time()
        self.files[path] = {'type': kind, 'content': b'', 'ctime': now, 'mtime': now}

    def _entry(self, path):
        """Return the entry for path or raise ENOENT."""
        if path not in self.files:
            raise FuseOSError(errno.ENOENT)
        return self.files[path]

    def getattr(self, path, fh=None):
        entry = self._entry(path)
        is_dir = entry['type'] == 'directory'
        return {
            'st_mode': (stat.S_IFDIR | 0o755) if is_dir else (stat.S_IFREG | 0o644),
            'st_nlink': 2 if is_dir else 1,
            'st_size': len(entry['content']),
            'st_ctime': entry['ctime'],
            'st_mtime': entry['mtime'],
            'st_atime': entry['mtime'],
        }

    def readdir(self, path, fh):
        self._entry(path)
        # FUSE passes "/" for the root and no trailing slash otherwise,
        # so dirname() of a child equals the directory path exactly.
        children = [os.path.basename(name) for name in self.files
                    if name != '/' and os.path.dirname(name) == path]
        return ['.', '..'] + children

    def read(self, path, size, offset, fh):
        data = self._entry(path)['content']
        return data[offset:offset + size]

    def write(self, path, data, offset, fh):
        entry = self._entry(path)
        current = entry['content'].ljust(offset, b'\0')  # zero-fill any gap
        # Keep bytes beyond the written range instead of truncating them
        entry['content'] = current[:offset] + data + current[offset + len(data):]
        entry['mtime'] = time.time()
        return len(data)

    def truncate(self, path, length, fh=None):
        # Needed for shell redirection such as `echo hi > file`
        entry = self._entry(path)
        entry['content'] = entry['content'][:length].ljust(length, b'\0')

    def create(self, path, mode, fi=None):
        self._add(path, 'file')
        return 0  # file handle (unused by this example)

    def mkdir(self, path, mode):
        self._add(path, 'directory')

    def unlink(self, path):
        self._entry(path)
        del self.files[path]

    def rmdir(self, path):
        if self._entry(path)['type'] != 'directory':
            raise FuseOSError(errno.ENOTDIR)
        if any(os.path.dirname(name) == path for name in self.files if name != path):
            raise FuseOSError(errno.ENOTEMPTY)
        del self.files[path]


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('mount_point', help='Where to mount')
    parser.add_argument('-f', '--foreground', action='store_true')
    args = parser.parse_args()

    FUSE(SimpleFS(), args.mount_point, foreground=args.foreground)

Checking VFS Statistics

#!/bin/bash
# vfs_stats.sh - Display VFS statistics

echo "=== VFS Statistics ==="
echo ""

echo "--- File Systems Registered ---"
cat /proc/filesystems

echo ""
echo "--- Mount Points ==="
mount | column -t

echo ""
echo "--- Dentry Cache Stats ==="
cat /proc/sys/fs/dentry-state

echo ""
echo "--- Inode Stats ==="
cat /proc/sys/fs/inode-state

echo ""
echo "--- File Handle Limits ==="
echo "System max: $(cat /proc/sys/fs/file-max)"
echo "Current used: $(cat /proc/sys/fs/file-nr | awk '{print $1}')"
echo "Per-process limit: $(ulimit -n)"

echo ""
echo "--- VFS Cache Pressure ==="
cat /proc/sys/vm/vfs_cache_pressure

echo ""
echo "--- Dentry Cache Size ==="
grep -E "^(dentry|inode_cache) " /proc/slabinfo 2>/dev/null || echo "Info not available (reading /proc/slabinfo requires root)"
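
The numbers in /proc/sys/fs/dentry-state are easier to act on once they are named. The sketch below follows the field order given in the kernel's sysctl documentation (nr_dentry, nr_unused, age_limit, want_pages, nr_negative, and a final unused field). `parse_dentry_state` and the sample string are illustrative assumptions; on kernels older than 5.0 the fifth field is a placeholder rather than a negative-dentry count.

```python
from typing import NamedTuple


class DentryState(NamedTuple):
    nr_dentry: int    # dentries currently allocated
    nr_unused: int    # allocated but not in use, so reclaimable
    age_limit: int    # seconds before a dentry may be reclaimed under pressure
    want_pages: int   # pages the system has asked the cache to give back
    nr_negative: int  # unused dentries that record "no such file"

    @property
    def in_use(self) -> int:
        return self.nr_dentry - self.nr_unused


def parse_dentry_state(text: str) -> DentryState:
    """Parse the whitespace-separated counters from dentry-state."""
    numbers = [int(x) for x in text.split()]
    numbers += [0] * (5 - len(numbers))  # tolerate short input
    return DentryState(*numbers[:5])


# Hypothetical sample; on a real system read /proc/sys/fs/dentry-state.
state = parse_dentry_state("187450\t160133\t45\t0\t52310\t0\n")
print(f"in use: {state.in_use}, reclaimable: {state.nr_unused}, "
      f"negative: {state.nr_negative}")
```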

Observability Checklist

Monitoring VFS and file system health:

mount: Show all mounted file systems with options
cat /proc/filesystems: List supported file system types
cat /proc/mounts: Detailed mount information including bind mounts
df -h: Show space usage for all mounted file systems
du -sh /path: Check space usage of specific directories

#!/bin/bash
# Comprehensive VFS monitoring script

echo "=== VFS Health Report ==="
echo "Generated: $(date)"
echo ""

echo "--- Active Mounts with FS Type ---"
mount | grep -v "tmpfs\|proc\|sys\|devpts\|cgroup" | awk '{print $3, $5}' | sort

echo ""
echo "--- Mount Options Security Check ---"
for mount_point in $(mount | awk '{print $3}'); do
    # Skip virtual mounts
    [[ "$mount_point" =~ ^(proc|sys|dev|/sys|/proc|/dev) ]] && continue

    opts=$(mount | grep " $mount_point " | awk '{print $6}' | tr -d '()')
    if [[ "$opts" == *"noexec"* ]]; then
        echo "$mount_point: noexec set (good for security)"
    fi
    if [[ "$opts" == *"nosuid"* ]]; then
        echo "$mount_point: nosuid set (good for security)"
    fi
done

echo ""
echo "--- NFS Mounts Status ---"
mount | grep -E "nfs|cifs" | while read line; do
    echo "$line"
    # Check for hung mounts
    mount_point=$(echo "$line" | awk '{print $3}')
    timeout 1 ls "$mount_point" >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "  WARNING: $mount_point not responding!"
    fi
done

echo ""
echo "--- File Descriptor Usage ---"
current=$(cat /proc/sys/fs/file-nr | awk '{print $1}')
max=$(cat /proc/sys/fs/file-max)
pct=$((current * 100 / max))
echo "Used: $current / $max ($pct%)"
if [ $pct -gt 80 ]; then
    echo "WARNING: File descriptor usage above 80%"
fi

Common Pitfalls / Anti-Patterns

1. Ignoring Bind Mount Flags

# BAD: Bind mount without considering security
mount --bind /home /mnt/shared

# GOOD: Use appropriate flags
mount --bind /home /mnt/shared
mount -o remount,bind,nosuid,nodev,ro /mnt/shared

2. Network File System Without Timeout

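# BAD: Hard mount with no monitoring and no on-demand unmounting
mount -t nfs server:/share /mnt -o hard

# ACCEPTABLE for read-only data: soft mount with timeout
# (soft mounts can corrupt data on write timeouts; intr is ignored by modern kernels)
mount -t nfs server:/share /mnt -o ro,soft,timeo=50,retrans=3

# BEST for critical systems: autofs
# The master map names a mount directory and a map file; the map file lists the shares.
echo "/mnt /etc/auto.nfs --timeout=60" >> /etc/auto.master
echo "nfs -fstype=nfs4,ro server:/share" >> /etc/auto.nfs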

3. Union Mount Misconfiguration

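# BAD: Lower layers listed in the wrong order
mount -t overlay overlay \
  -o lowerdir=/base:/lower,upperdir=/upper,workdir=/work /merged
# The leftmost lowerdir is the topmost layer, so /base shadows /lower here.
# (The order of the lowerdir=, upperdir=, and workdir= options themselves does not matter.)

# GOOD: Correct order, /lower on top of /base
mount -t overlay overlay \
  -o lowerdir=/lower:/base,upperdir=/upper,workdir=/work /merged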

4. Assuming VFS Caches Are Always Safe

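# BAD: Removing a device or powering off without unmounting
# Dirty pages still in the page cache never reach the disk

# GOOD: Unmount first (umount flushes dirty data; an explicit sync is harmless)
sync
umount /mnt

# If the mount is busy, find out who is using it before reaching for -l
fuser -vm /mnt
# Lazy unmount only detaches the mount point; the file system stays
# active until the last open file is closed
umount -l /mnt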

5. Insecure Mount Options for Sensitive Partitions

# security mount options for various scenarios

# /var (data partition)
# - noexec: prevents binary execution from this partition
# - nosuid: ignores setuid bit
# - nodev: no device files
UUID=xxx /var ext4 defaults,noexec,nosuid,nodev 0 2

# /tmp (temporary files)
# Consider tmpfs with size limit
tmpfs /tmp tmpfs defaults,noexec,nosuid,nodev,size=2G 0 0

# Network mounts
# Prevent execution of remote binaries
mount -t nfs server:/share /mnt -o noexec,nosuid,hard,intr
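
Whether the intended options are actually in effect can be checked against the live mount table. The sketch below compares /proc/mounts text with a per-mount-point policy. `audit_mounts`, the policy dict, and the sample text are hypothetical; on a real system you would read /proc/mounts and define the policy that fits your layout.

```python
def audit_mounts(proc_mounts: str, policy: dict) -> dict:
    """Return {mount_point: missing_options} for mount points named in policy.

    `policy` maps a mount point to the set of options it must carry.
    """
    findings = {}
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        mount_point, options = fields[1], set(fields[3].split(","))
        missing = policy.get(mount_point, set()) - options
        if missing:
            findings[mount_point] = missing
    return findings


HARDENED = {"nosuid", "nodev", "noexec"}
POLICY = {"/var": HARDENED, "/tmp": HARDENED}

SAMPLE_MOUNTS = (
    "/dev/sda2 /var ext4 rw,nosuid,nodev,relatime 0 0\n"
    "tmpfs /tmp tmpfs rw,nosuid,nodev,noexec,size=2097152k 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)

for mount_point, missing in audit_mounts(SAMPLE_MOUNTS, POLICY).items():
    print(f"{mount_point}: missing {', '.join(sorted(missing))}")
```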

6. Disabling atime Updates Without Understanding the Trade-offs

# No atime update (good for SSDs, reduces writes)
mount -o noatime /dev/sda1 /mnt

# Read-only mounting
mount -o remount,ro /mnt

# Prevent setuid execution
mount -o nosuid /mnt

# No device files
mount -o nodev /mnt

# No binary execution
mount -o noexec /mnt

While these options improve performance and security, always understand what functionality you’re disabling. For example, noatime breaks some applications that rely on atime for mail delivery notification or backup rotation logic.

Quick Recap Checklist

VFS provides the common interface all Linux file systems implement
Key structures: super_block, inode, dentry, file
Each file system implements operations through function pointers
Dentry cache dramatically speeds repeated path lookups
Network file systems add latency but enable sharing
Virtual file systems (proc, sys, tmpfs) provide kernel interfaces
Mount options control security and performance
VFS is why you can cat /proc/cpuinfo and mount -t nfs server:/share with the same API

Interview Questions

1. What is the VFS layer and why was it created?

The Virtual File System (VFS) is an abstraction layer in the Linux kernel that provides a unified interface to different file system implementations. Before VFS, applications would need to know how to communicate with each specific file system type.

VFS was created to solve the problem of file system heterogeneity. When you have ext4, XFS, Btrfs, NTFS, NFS, CIFS, and dozens of other file systems, applications shouldn't need separate code paths for each one.

The key insight: all file systems present the same API through VFS. Applications call open(), read(), write(), and close(). VFS translates these to whatever the underlying file system understands. The application has no idea—and doesn't care—what's underneath.

2. Explain the relationship between VFS, the dentry cache, and the inode cache.

The dentry cache (Directory Entry cache) and inode cache work together to speed file system operations:

Dentry cache stores the mapping from a (parent directory, name) pair to a dentry, which in turn points to the in-memory inode. When you access /home/user/file.txt, the dentry cache remembers:

"/" maps to the root inode
"home" maps to inode for /home
"user" maps to inode for /home/user

Inode cache stores the actual inode structures (metadata about files) including permissions, timestamps, and pointers to data blocks.

The relationship: dentries point to inodes. Resolving a path walks the dentry cache one component at a time, and each cached dentry already holds a pointer to its inode, so a fully cached lookup needs no disk I/O. A dentry also keeps its inode pinned in the inode cache for as long as the dentry exists.

Without these caches, every file access would require disk I/O to read directory entries and inodes.

3. What happens when you mount a device? Walk through the VFS layer involved.

When you execute mount -t ext4 /dev/sda1 /mnt, the process involves:

Parse mount options: VFS extracts file system type (ext4) and target (/mnt)
Locate file system driver: Looks up "ext4" in registered file systems
Call mount function: Invokes ext4's mount() function
Read superblock: ext4 driver reads the file system's superblock from the device
Create VFS structures: Allocates super_block, inode for root directory
Link to mount tree: Adds the mount to the VFS mount namespace
Return success: Now /mnt represents the root of ext4 filesystem

After mounting, any file operation in /mnt goes through the ext4 driver's VFS operations to the underlying blocks on /dev/sda1.

4. How does the kernel support multiple simultaneous file system types?

The kernel supports multiple file system types through registration and operation vectors:

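Each file system driver registers with VFS using register_filesystem(), passing a struct file_system_type that provides:

Its name (e.g., "ext4", "xfs", "nfs")
Its mount function
Its cleanup function for unmount (kill_sb)

The operation vectors (super_operations, inode_operations, file_operations) are not handed over at registration. The driver attaches them to each super_block, inode, and file it sets up once a mount exists.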

When a mount is requested, VFS looks up the file system by name and calls the registered mount function. Each file system implements the same interface but with its own logic.

At runtime, you can have ext4 on /, XFS on /home, tmpfs on /tmp, and NFS on /mnt/nfs simultaneously. Applications see all as part of the unified namespace, but VFS routes each operation to the appropriate driver.
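
The registration-and-dispatch idea can be modelled in a few lines. This is a toy user-space sketch, not kernel code: FileSystemType, mount, and vfs_read are hypothetical names that only mirror the roles of struct file_system_type, the mount path, and a VFS entry point.

```python
class FileSystemType:
    """Toy stand-in for struct file_system_type: a name plus a mount callback."""

    def __init__(self, name, mount):
        self.name = name
        self.mount = mount


_registered = {}   # name -> FileSystemType
_mounts = {}       # mount point -> toy super block (a dict of callbacks)


def register_filesystem(fs_type):
    if fs_type.name in _registered:
        raise ValueError(f"{fs_type.name} is already registered")
    _registered[fs_type.name] = fs_type


def mount(fs_name, source, target):
    fs_type = _registered.get(fs_name)
    if fs_type is None:
        raise LookupError(f"unknown filesystem type {fs_name!r}")
    _mounts[target] = fs_type.mount(source)


def vfs_read(path):
    # Longest-prefix match selects the mount that owns the path.
    owners = [t for t in _mounts
              if path == t or path.startswith(t.rstrip("/") + "/")]
    return _mounts[max(owners, key=len)]["read"](path)


register_filesystem(FileSystemType(
    "ext4", lambda src: {"read": lambda p: f"ext4 blocks on {src} for {p}"}))
register_filesystem(FileSystemType(
    "tmpfs", lambda src: {"read": lambda p: f"tmpfs pages for {p}"}))

mount("ext4", "/dev/sda1", "/")
mount("tmpfs", "tmpfs", "/tmp")

print(vfs_read("/etc/hosts"))    # routed to the ext4 callbacks
print(vfs_read("/tmp/scratch"))  # routed to the tmpfs callbacks
```

The caller of vfs_read never mentions a driver; the mount table decides which callbacks run, which is the same shape as the kernel's function-pointer dispatch.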

5. What is the difference between a bind mount and a symbolic link, from the VFS perspective?

Symbolic link is a file type (stored in directory entries, has its own inode, contains a path string). When you access a symlink, VFS performs path resolution on the target path, which may cross mount points.

Bind mount is a VFS concept where the same directory entry (same dentry/inode) appears in multiple places in the mount tree. The underlying data is identical—they share the same VFS structures.

Key differences:

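Both can expose data that lives on another file system. A symlink does it by storing a path that is re-resolved on every access and can dangle; a bind mount attaches the directory tree itself
Bind mounts show the actual data, so there is no stored path that can be edited or go stale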
Deleting through a bind mount affects the original (they're the same inode)
Symlinks have their own inode; bind mounts share the same inode

In container contexts, bind mounts are used to expose host directories into containers. The container sees the same data as the host because it's the same VFS entry, just accessed from a different mount point.
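
The symlink half of this comparison can be checked without privileges (creating a bind mount normally requires root, so it is not shown). The sketch below assumes a file system that supports symlinks; it shows that the link has its own inode, stores a path string, and dangles once the target is removed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target.txt")
    link = os.path.join(d, "link.txt")
    with open(target, "w") as f:
        f.write("x")
    os.symlink(target, link)

    link_ino = os.lstat(link).st_ino      # the symlink's own inode
    followed_ino = os.stat(link).st_ino   # inode reached after path resolution
    target_ino = os.stat(target).st_ino
    stored_path = os.readlink(link)       # the path string kept in the symlink

    os.unlink(target)
    # The link still exists as a directory entry, but resolution now fails.
    dangling = os.path.lexists(link) and not os.path.exists(link)

print(link_ino != target_ino, followed_ino == target_ino, dangling)
```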

6. How does the kernel handle a mount operation at the VFS layer?

When you execute mount -t ext4 /dev/sda1 /mnt:

sys_mount() system call: Triggers VFS mount logic
Parse mount options: VFS extracts file system type and flags
Locate file system driver: Looks up "ext4" in the registered file system list
Call mount function: Invokes ext4's mount callback
Create super_block: The driver allocates the in-memory super_block and fills it from the superblock on disk
Create root dentry and inode: Represents / of the new filesystem
Link to mount tree: Adds the mount to the calling process's mount namespace
Return: Now /mnt paths route through ext4 driver

Every process belongs to exactly one mount namespace, and many processes can share it. Processes in different namespaces (for example, in different containers) may see different mounts.

7. What is the purpose of the super_operations structure in VFS?

struct super_operations is a function pointer table that defines callbacks for file system-level operations. Each file system implements these to provide its specific behavior:

alloc_inode / destroy_inode: Create/free inode structures
dirty_inode: Called when inode is modified
write_inode: Persist inode to disk
put_super: Clean up during unmount
remount_fs: Handle mount option changes

This is the VFS polymorphism pattern: VFS calls these functions without knowing if it's ext4, XFS, or NTFS. Each driver fills in its own implementations, and VFS calls through the function pointers.
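
One place where this dispatch is reachable from user space is statvfs(): the kernel answers it through the mounted file system's statfs callback, another member of super_operations that is not in the list above. A minimal sketch, with fs_usage as a hypothetical helper name:

```python
import os


def fs_usage(path):
    """Return (total_bytes, available_bytes) for the file system holding path."""
    st = os.statvfs(path)
    total = st.f_frsize * st.f_blocks
    available = st.f_frsize * st.f_bavail  # space available to unprivileged users
    return total, available


total, available = fs_usage("/")
print(total, available)
```

The same call works whether / is ext4, XFS, or an overlay; only the driver-side implementation differs.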

8. What is an example of when VFS abstraction "leaks" in practice?

VFS abstraction leaks when the unified interface doesn't fully mask differences:

Extended attributes: ext4 supports ACLs via xattrs; FAT32 doesn't. Copying files between them loses permissions.
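Case sensitivity: ext4 is case-sensitive by default, while FAT and NTFS are normally case-insensitive. Two files whose names differ only in case (Makefile and makefile) can coexist on ext4 but collide when copied to a FAT volume.
Symbolic links on FAT: FAT has no symlink support, so symlink() on a FAT mount fails, typically with EPERM. Copies or archive extractions that contain links then fail or have to skip them.
Special files: /proc and /sys are generated by the kernel on demand rather than stored on disk. Many of their files report a size of 0 from stat() yet return data when read, which confuses tools that trust the reported size.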

Understanding these leaks helps diagnose cross-filesystem issues like "my permissions don't work on NAS."

9. How do containers use VFS mount namespaces for isolation?

Containers use mount namespaces (CLONE_NEWNS) to create isolated mount views:

Clone with new namespace: clone(CLONE_NEWNS) creates process with copy of parent's mount namespace
Private mount: The container's root is initially a copy of the host's
Bind mounts: mount --bind /host/path /container/path exposes host directories at container paths
Overlay mount: Upperdir/lowerdir layers implement copy-on-write for container changes
Pivot_root or chroot: Changes the container's view of "/" to the container's rootfs

The container sees only its mounts—a process in the container cannot see or affect host mounts (unless explicitly shared). This isolation is entirely a VFS concept.

10. What is the difference between page cache and dentry cache in VFS?

Page cache: Stores actual file data content. When you read() a file, data goes into the page cache. When you write(), data is written to page cache first and flushed to disk later.

Dentry cache: Stores directory entry metadata—filename to inode mappings. When you resolve a path, you traverse dentry cache (cached lookups) to find the inode number. Dentries also cache child dentries for fast subtree traversal.

Key differences:

Page cache stores data; dentry cache stores structure
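Page cache is page-granularity (4KB typically); dentries are small slab-allocated kernel objects
Neither is swapped out: under memory pressure, clean page-cache pages are dropped, dirty ones are written back to their files, and unused dentries are reclaimed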
Dentries implement directory tree structure; page cache is linear file content

Both are critical for performance—dentries speed path resolution, page cache speeds file content access.

11. What happens when you access a file on a network file system like NFS?

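For NFS, any VFS operation that cannot be served from the client's caches triggers network I/O. The calls below use NFSv4 names; NFSv3 is stateless and has no OPEN or CLOSE, so open() there typically maps to LOOKUP, ACCESS, and GETATTR requests: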

open(): NFS client sends OPEN call to NFS server, receives file handle
read(): Client sends READ request with file handle, offset, count; server responds with data
write(): Client sends WRITE request with data; server acknowledges
close(): Client sends CLOSE; server releases file state

NFS client caches aggressively:

Attribute cache: Stores inode metadata locally for a limited time
Data cache: Pages cached locally with weak consistency
Dentry cache: Path component lookups cached

The trade-off: network latency (milliseconds) vs local disk (microseconds). NFS performance depends on cache hit rates.

12. How does path resolution work in VFS when traversing a path like /home/user/file.txt?

Path resolution in VFS follows a systematic traversal:

Starting at root: VFS starts with the root dentry (always cached)
Component lookup: For each path component ("home", "user", "file.txt"), VFS calls lookup() on the parent directory's inode
Dentry cache check: Before calling the file system's lookup(), VFS checks the dentry cache. If the dentry is already cached, return it immediately
FS-specific lookup: If not cached, call the file system's inode->i_op->lookup() function which reads directory entries from disk
Cache the result: The newly found dentry is cached for future lookups
Repeat: Continue until the final component is resolved

Each cached dentry also caches the dentries of its children, so deep path traversal after the first access is mostly cache hits. The d_lookup() function handles the hash-table lookup in the dentry cache.
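
The walk-and-cache loop described above can be modelled with a dictionary. This is a toy sketch under stated assumptions: ToyFS, resolve, and the inode numbers are invented for illustration, and a real dentry cache is a hash table of struct dentry objects rather than a Python dict.

```python
class ToyFS:
    """Hypothetical in-memory 'disk': maps (parent_ino, name) -> child_ino."""

    def __init__(self, entries):
        self.entries = entries
        self.disk_lookups = 0

    def lookup(self, parent_ino, name):
        self.disk_lookups += 1  # stands in for reading a directory block
        return self.entries[(parent_ino, name)]


ROOT_INO = 2  # root inode number used by the ext family; illustrative only


def resolve(fs, dcache, path):
    """Walk path one component at a time, consulting the cache first."""
    ino = ROOT_INO
    for name in (c for c in path.split("/") if c):
        key = (ino, name)
        if key not in dcache:
            dcache[key] = fs.lookup(ino, name)  # cache miss: ask the file system
        ino = dcache[key]
    return ino


fs = ToyFS({(2, "home"): 11, (11, "user"): 12, (12, "file.txt"): 13})
dcache = {}

first = resolve(fs, dcache, "/home/user/file.txt")
cold_lookups = fs.disk_lookups
second = resolve(fs, dcache, "/home/user/file.txt")
warm_lookups = fs.disk_lookups - cold_lookups

print(first, cold_lookups, warm_lookups)
```

The first resolution costs one file system lookup per component; the second is served entirely from the cache.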

13. What is the difference between a file struct and an inode in VFS?

Inode (struct inode): Represents a file on disk. There is exactly one inode per file (identified by inode number). It contains metadata (permissions, timestamps, size, block pointers) and points to the file's data blocks. Inodes are persistent—they exist on disk and are loaded into memory when needed.

File struct (struct file): Represents an open file handle. It exists only in memory for as long as the file is open. It contains the current file position (f_pos), open flags (f_flags), and points to the inode. Multiple processes can have the same file open, each with their own struct file but sharing the same inode.

Key difference: One inode per file on disk; one file struct per open() call. If two processes open the same file, you have 2 file structs but 1 inode. If one process opens the same file twice, you have 2 file structs but 1 inode. Descriptors duplicated with dup() or inherited across fork() share a single file struct, and therefore a single file position.
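
The file position makes this distinction observable. In the sketch below, two separate open() calls get independent offsets, while a descriptor created with dup() shares the offset of the descriptor it was copied from.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefgh")
path = tmp.name

fd1 = os.open(path, os.O_RDONLY)   # first open file description
fd2 = os.open(path, os.O_RDONLY)   # second, independent open file description
fd3 = os.dup(fd1)                  # new descriptor, same open file description as fd1

os.read(fd1, 3)  # advances the position shared by fd1 and fd3

pos1 = os.lseek(fd1, 0, os.SEEK_CUR)
pos2 = os.lseek(fd2, 0, os.SEEK_CUR)
pos3 = os.lseek(fd3, 0, os.SEEK_CUR)
same_inode = os.fstat(fd1).st_ino == os.fstat(fd2).st_ino

for fd in (fd1, fd2, fd3):
    os.close(fd)
os.unlink(path)

print(pos1, pos2, pos3, same_inode)
```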

14. What happens when you unmount a file system in Linux?

Unmounting involves several steps:

Sync: sync() flushes all dirty data and metadata to disk
Reference count check: VFS checks that no files are open and no processes have chdir'd into the mount point
Call put_super(): The file system's put_super() is called to release the super_block
Free inodes: All inodes associated with the mount are freed (or marked for destruction)
Remove from mount tree: The mount entry is removed from the VFS mount namespace
Release resources: Filesystem-specific cleanup (close block device, free private data)

If files are still open or processes are using the mount, umount fails with "Device or resource busy" (unless umount -l lazy unmount is used, which detaches immediately and cleans up later).

15. How does VFS handle rename operations across different file systems?

Rename within the same file system is straightforward: VFS calls inode->i_op->rename(), which updates directory entries to point to the same inode under a new name.

Rename across file systems is not permitted at the VFS level. The rename() path performs two steps:

Check source and target: VFS verifies both paths resolve within the same mount
Fail if different mounts: Cross-mount renames (e.g., /mnt/drive1/file to /mnt/drive2/file) return EXDEV ("Invalid cross-device link"), even when both mounts are bind mounts of the same file system

This is a fundamental VFS constraint. Applications must implement cross-device rename as copy + delete: read source, write to destination, then unlink source. Unlike rename(), this sequence is not atomic, and it loses metadata like timestamps and permissions unless they are explicitly preserved.
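
A minimal sketch of that fallback for regular files follows. The function name move is chosen for this example; Python's standard shutil.move applies the same idea and also handles directories.

```python
import errno
import os
import shutil
import tempfile


def move(src, dst):
    """Rename src to dst, falling back to copy + unlink on EXDEV."""
    try:
        os.rename(src, dst)          # atomic when both paths share a mount
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise                    # any other failure is a real error
    shutil.copy2(src, dst)           # copies data plus mode bits and timestamps
    os.unlink(src)
    return "copied"


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "a")
    dst = os.path.join(d, "b")
    with open(src, "w") as f:
        f.write("payload")
    result = move(src, dst)
    moved_ok = os.path.exists(dst) and not os.path.exists(src)

print(result, moved_ok)
```

Within a single directory the rename path is taken; the copy path runs only when the kernel reports EXDEV. Ownership and extended attributes are not preserved by this sketch.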

16. What is the role of the inode_operations structure in VFS?

struct inode_operations defines callbacks for file and directory operations that act on inodes. Each file system implements these for its specific semantics:

create: Create a regular file in a directory (e.g., open(filename, O_CREAT))
lookup: Find a directory entry by name, returning its inode
link: Create a hard link (same inode, new directory entry)
unlink: Remove a directory entry pointing to an inode
mkdir: Create a subdirectory
rmdir: Remove an empty subdirectory
rename: Change a file's name and, optionally, move it to a different directory on the same file system
setattr: Change inode attributes (permissions, timestamps)

The VFS layer calls these function pointers without knowing the underlying file system. ext4, XFS, and NTFS each have their own implementations with different algorithms and on-disk structures.

17. What is the relationship between tmpfs and the VFS layer?

tmpfs is a file system that lives entirely in memory; it has no disk backing. It stores files in the page cache (RAM) and can push them to swap space when RAM is low.

tmpfs registers with VFS just like disk-based file systems: it calls register_filesystem() with its file_system_type (named shmem_fs_type in mm/shmem.c). It implements the same VFS operations: inode_operations, file_operations, super_operations.

Key characteristics:

No on-disk structure—files vanish on reboot
Dynamic sizing: uses available RAM/swap up to a configured limit
Fast: no disk I/O for reads/writes
Commonly used for /dev/shm (shared memory), /tmp, and container mounts

From VFS perspective, tmpfs looks like any other file system. Applications access it via the same open(), read(), write() calls. The difference is purely in the implementation—the tmpfs driver never touches a block device.

18. How does VFS interact with the page cache during read and write operations?

Reads flow through VFS to the page cache:

Application calls read(fd, buf, size)
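The file system's read handler, commonly the generic generic_file_read_iter() (generic_file_read() in older kernels), checks the page cache first
If the page is cached, copy data from page cache to userspace buffer
If not cached, allocate a page, call the file system's readpage() (read_folio() in recent kernels), then copy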

Writes use write-back caching by default:

Application calls write(fd, buf, size)
VFS writes to the page cache (marking pages as dirty)
Returns immediately to application (fast)
Kernel writeback threads (per-device flusher workers, which replaced pdflush in 2.6.32) periodically write dirty pages to disk

The page cache is unified—ext4, XFS, and all other file systems share it. When ext4 writes a block, it goes into the same page cache that XFS uses. This maximizes cache utilization across file systems.
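
Because write() returns once the data is in the page cache, an application that needs durability has to ask for it. The sketch below pairs write() with fsync(); durable_write is a hypothetical helper name, and a complete implementation would usually also fsync the containing directory after creating a new file.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and wait until the kernel has flushed it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # lands in the page cache; pages become dirty
            view = view[written:]
        os.fsync(fd)  # block until the dirty pages are written to the device
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "journal.bin")
    durable_write(target, b"committed record\n")
    with open(target, "rb") as f:
        content = f.read()

print(content)
```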

19. What is the purpose of the file_operations structure and how does it differ from inode_operations?

struct file_operations defines callbacks for file I/O operations—things you do on an open file handle:

llseek: Change file position
read / write: Data I/O
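iterate_shared (readdir in older kernels): Iterate directory entries (backs the getdents() system call that readdir() uses)
mmap: Memory-map the file
fsync: Force dirty pages to disk
lock / flock: File locking (fcntl() record locks and flock() locks, respectively)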

struct inode_operations defines operations on the inode itself—metadata and name-level operations:

create: Create a new file
lookup: Find a file in a directory
mkdir, rmdir: Directory operations
rename: Change file name
link, unlink: Hard link operations

File operations act on an open file (struct file); inode operations act on the file itself (struct inode). Multiple opens of the same file share the inode and each get their own struct file, whose f_op pointer normally refers to the same file_operations table.

20. How does the VFS layer handle file system mount propagation in mount namespaces?

Mount namespaces (CLONE_NEWNS) give each process or container group an independent view of the mount table. Mount propagation determines how mounts in one namespace affect others:

Mount types:

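Private (the kernel default): Mounts and unmounts do not propagate to or from other namespaces. systemd remounts / as shared at boot, so most modern distributions start out shared
Shared: Mount events propagate in both directions between the members of a peer group
Slave: Mount events propagate from the master to the slave, but not back
Unbindable: Private, and additionally cannot be used as the source of a bind mount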

When a container is created with its own mount namespace:

Initial mounts are copied from the parent namespace, and each copy keeps its propagation type: private mounts stay private, and a copied shared mount joins the same peer group as the original
Bind mounts (like mounting host directories into containers) can be marked shared so changes are visible to the host, or private for isolation
Unmounting inside the container (e.g., /proc) does not affect the host's mount table

The /proc/self/mountinfo file shows the propagation type and peer relationships for each mount. Container runtimes carefully configure propagation (e.g., Docker bind mounts default to rprivate and accept shared or slave variants on request) to enable the desired isolation.
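
The propagation type sits in the optional fields of each mountinfo line, between the mount options and the "-" separator. The parser below is a minimal sketch based on the field layout documented in proc(5); propagation is a hypothetical helper name, and the sample lines are illustrative rather than taken from a real system.

```python
def propagation(line):
    """Extract mount point, file system type, and propagation from a mountinfo line."""
    fields = line.split()
    sep = fields.index("-")                      # ends the optional fields
    tags = [f.split(":")[0] for f in fields[6:sep]]
    kinds = []
    if "shared" in tags:
        kinds.append("shared")
    if "master" in tags:
        kinds.append("slave")                    # has a master, so it is a slave
    if "unbindable" in tags:
        kinds.append("unbindable")
    return {
        "mount_point": fields[4],
        "fs_type": fields[sep + 1],
        "propagation": kinds or ["private"],     # no optional fields means private
    }


samples = [
    "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue",
    "22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw",
    "40 22 0:35 / /run/private rw - tmpfs tmpfs rw",
]
parsed = [propagation(s) for s in samples]
for entry in parsed:
    print(entry)
```

On a live system the same function can be applied to each line of /proc/self/mountinfo. A mount can be both shared and slave, which is why the result is a list.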

Conclusion

The Virtual File System layer is what makes Linux’s unified namespace possible. By defining a standard set of operations (super_operations, inode_operations, file_operations) that each file system implements, VFS allows applications to interact with ext4, XFS, NFS, CIFS, and even virtual file systems like procfs through the same POSIX API. The dentry cache and inode cache are key to performance, dramatically reducing disk I/O for repeated path resolutions.

When working with file systems, understanding VFS helps you troubleshoot mount issues, choose appropriate mount options for security and performance, and design storage for containers and networked environments. The mount namespace isolation that containers rely on is fundamentally a VFS concept. Remember that network file systems (NFS, CIFS) add latency and failure modes that local file systems do not have—design for timeouts and retry logic in production.
