Cache Eviction Policies: LRU, LFU, FIFO, and Beyond

A comprehensive guide to cache eviction policies — LRU, LFU, FIFO, Random — with implementation trade-offs, decision frameworks, and real-world case studies.

published: reading time: 48 min read author: GeekWorkBench

Introduction

This guide covers the most widely-used cache eviction policies, from simple strategies like FIFO and Random to sophisticated ones like LRU and LFU, with implementation trade-offs, decision frameworks, and real-world case studies.

Core Concepts

LRU (Least Recently Used)

LRU evicts the item that hasn’t been accessed for the longest time. The idea is that recently used items are more likely to be used again soon.

graph LR
    A[Cache Full] --> B[New item arrives]
    B --> C{Is item in cache?}
    C -->|No| D[Evict LRU item]
    D --> E[Add new item]
    C -->|Yes| F[Update position]

Implementation with LinkedHashMap (Java):

import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LRUCache(int capacity) {
        super(capacity, 0.75f, true); // accessOrder = true for LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}

Redis LRU Implementation:

Redis doesn’t have a true LRU implementation because that would require tracking access order for every item, which is expensive. Instead, it samples a few random keys and evicts the least recently used among them.

# Configure Redis to use LRU eviction
maxmemory 100mb
maxmemory-policy allkeys-lru

# Or for volatile-lru (only evict keys with TTL)
maxmemory-policy volatile-lru

Good for:

  • Access patterns where recent items get hit again soon
  • Stable popular items
  • Working sets that fit comfortably in cache

Struggles with:

  • One-time scans (every item touched once, never again)
  • Large working sets that overflow memory
  • Write-heavy loads with big objects

LFU (Least Frequently Used)

LFU evicts the item with the lowest access count. If an item was accessed 100 times and another only 5 times, LFU keeps the 100-time item.

graph LR
    A[Cache Full] --> B[New item arrives]
    B --> C{Item exists?}
    C -->|No| D[Evict LFU item]
    D --> E[Add new item, count=1]
    C -->|Yes| F[Increment count]
    F --> E

Implementation:

from collections import Counter
import heapq

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}  # key -> (value, freq, timestamp)
        self.freq_heap = []  # (freq, timestamp, key)
        self.counter = Counter()
        self.timestamp = 0

    def get(self, key):
        if key not in self.cache:
            return None

        value, freq, ts = self.cache[key]
        self.counter[key] += 1
        self.cache[key] = (value, freq + 1, self.timestamp)
        heapq.heappush(self.freq_heap,
                      (freq + 1, self.timestamp, key))
        self.timestamp += 1
        return value

    def put(self, key, value):
        if self.capacity == 0:
            return

        if key in self.cache:
            self.cache[key] = (value, self.counter[key] + 1, self.timestamp)
        else:
            if len(self.cache) >= self.capacity:
                # Evict least frequently used
                while self.freq_heap:
                    freq, ts, k = heapq.heappop(self.freq_heap)
                    if k in self.cache and self.cache[k][1] == freq:
                        del self.cache[k]
                        del self.counter[k]
                        break

            self.cache[key] = (value, 1, self.timestamp)
            self.counter[key] = 1

        heapq.heappush(self.freq_heap,
                      (self.cache[key][1], self.timestamp, key))
        self.timestamp += 1

Redis LFU Implementation:

# Enable LFU
maxmemory-policy allkeys-lfu

# Or volatile-lfu (only keys with TTL)
maxmemory-policy volatile-lfu

# Configure LFU decay time (how often freq counters decrease)
lfu-decay-time 1

Good for:

  • Items with genuinely different popularity levels
  • Stable access counts over time
  • Rewarding consistently popular items

Struggles with:

  • New items that need time to build up access count
  • Old items that stay even after they stop being popular
  • Frequency counter overflow if not tuned

FIFO (First In, First Out)

FIFO is the simplest policy: whatever entered the cache first leaves first. No access tracking, no frequency counting. Just a queue.

graph LR
    A[Cache Full] --> B[New item arrives]
    B --> C[Evict oldest item]
    C --> D[Add new item]

Implementation:

from collections import deque

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}
        self.queue = deque()

    def put(self, key, value):
        if key in self.cache:
            return

        if len(self.cache) >= self.capacity:
            oldest = self.queue.popleft()
            if oldest in self.cache:
                del self.cache[oldest]

        self.cache[key] = value
        self.queue.append(key)

    def get(self, key):
        return self.cache.get(key)

Good for:

  • Uniform access patterns
  • Situations where simplicity beats optimal hit rate
  • Debugging (predictable behavior)

Struggles with:

  • Most real-world access patterns
  • Frequently accessed items getting evicted if they happened to arrive first
  • Performance compared to LRU

TTL (Time To Live)

TTL isn’t really an eviction policy in the traditional sense. Instead, items automatically expire after a set time. The cache doesn’t need to decide what to evict; expired items are either ignored on access or cleaned up by a background process.

graph LR
    A[Item added] --> B[Timer starts]
    B --> C{TTL expired?}
    C -->|Yes| D[Item invalid]
    C -->|No| E[Item valid]

Redis TTL:

# Set a 5-minute TTL on a key
SET user:123 "data"
EXPIRE user:123 300

# Or use SETEX for atomic set + expire
SETEX user:123 300 "data"

# Check remaining TTL
TTL user:123

Lazy expiration vs active cleanup:

class TTLCache:
    def __init__(self, default_ttl=3600):
        self.default_ttl = default_ttl
        self.cache = {}  # key -> (value, expiry_time)

    def get(self, key):
        if key not in self.cache:
            return None

        value, expiry = self.cache[key]
        if time.time() > expiry:
            del self.cache[key]
            return None

        return value

    def put(self, key, value, ttl=None):
        ttl = ttl or self.default_ttl
        expiry = time.time() + ttl
        self.cache[key] = (value, expiry)

Good for:

  • Data that naturally expires (sessions, temporary data)
  • Cache consistency without manual invalidation
  • Data that might change on the backend

Struggles with:

  • Finding the right TTL value (too short = cache churn, too long = stale data)
  • Access patterns (ignores them entirely)
  • Memory waste from expired items still sitting around

Random Replacement

Random replacement picks a random item to evict when the cache is full. No tracking, no history, no complexity.

graph LR
    A[Cache Full] --> B[New item arrives]
    B --> C[Pick random item]
    C --> D[Evict it]
    D --> E[Add new item]

Implementation:

import random

class RandomCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}

    def put(self, key, value):
        if key in self.cache:
            self.cache[key] = value
            return

        if len(self.cache) >= self.capacity:
            # Pick random key to evict
            evict_key = random.choice(list(self.cache.keys()))
            del self.cache[evict_key]

        self.cache[key] = value

    def get(self, key):
        return self.cache.get(key)

Redis random eviction:

# Evict random keys
maxmemory-policy allkeys-random

# Or only random among keys with TTL
maxmemory-policy volatile-random

Good for:

  • Uniform or unpredictable access patterns
  • Extreme simplicity requirements
  • Baseline comparison for other policies
  • Certain distributed caching scenarios

Struggles with:

  • Non-uniform access patterns
  • Need for predictable performance
  • Most real-world use cases

Topic-Specific Deep Dives

Comparing the Policies

PolicyComplexityHit RateMemory OverheadBest For
LRUMediumGoodMediumTemporal locality
LFUHighBestHighStable popularity
FIFOLowPoorLowUniform access
TTLLowVariableLowFreshness required
RandomVery LowVariableNoneUniform access, simplicity
flowchart TD
    A[Access Pattern Analysis] --> B{Is access uniform?}
    B -->|Yes| C{Random item acceptable?}
    B -->|No| D{Do popular items stay popular?}
    D -->|Yes| E[LFU]
    D -->|No| F{Is recent data more popular?}
    F -->|Yes| G[LRU]
    F -->|No| H[Consider TTL or Random]
    C -->|Yes| I[Random or FIFO]
    C -->|No| J[LRU]

Trade-off Analysis (H1)

PolicyHit RateMemory OverheadCPU CostScan ResistanceTunabilityBest For
LRUGood (temporal locality)Very LowVery LowLowNoneGeneral-purpose, web serving, APIs
LFUBetter (frequency reward)Low (counter per entry)MediumMediumNoneRead-heavy, ranking systems, trending content
FIFOPoorVery LowVery LowNoneNoneBaselines, memory budget constraints
RandomPoorVery LowVery LowHighNoneBaselines, memory budget constraints
TTLDepends on TTLNoneLowN/ATTL valueTime-sensitive data, session expiry

Key Takeaways:

  • Default: LRU for most use cases — simple, effective, low overhead
  • Read-heavy/frequency-based: LFU — rewards consistent access patterns
  • Memory-constrained: LRU over FIFO/Random — both have poor hit rates
  • Variable workloads: Consider ARC or 2Q for adaptive behavior
  • Redis: Use volatile-lru or volatile-lfu to only evict keys with TTL set

Real-World Considerations

Redis-Specific Notes

Redis 4.0+ introduced LFU and improved LRU. The maxmemory-policy setting controls what happens when memory limit is reached:

# Volatile policies (only evict keys with an expiry)
maxmemory-policy volatile-lru    # Remove LRU keys with TTL
maxmemory-policy volatile-lfu    # Remove LFU keys with TTL
maxmemory-policy volatile-ttl    # Remove keys with shortest TTL
maxmemory-policy volatile-random # Random removal of keys with TTL

# Allkeys policies (evict any key)
maxmemory-policy allkeys-lru     # LRU on all keys
maxmemory-policy allkeys-lfu     # LFU on all keys
maxmemory-policy allkeys-random  # Random on all keys

Approximate LRU in Redis

True LRU requires maintaining access order for all items, which is expensive. Redis samples N keys and picks the best among them:

# Number of samples to check (default 5)
maxmemory-samples 10

Higher values approach true LRU but use more CPU.

Memory vs Speed

There’s a fundamental trade-off in eviction policy design:

  • More tracking = better decisions but more CPU/memory overhead
  • Less tracking = faster operations but potentially worse hit rates

For most applications, LRU with a small samples count (3-5) strikes a good balance. The hit rate difference between true LRU and 5-sample LRU is usually negligible.


Cache Stampede Prevention

Here’s a fun way to take down your origin: let a popular cache entry expire, then watch as every concurrent request for that entry piles up at your database at the same time. That’s the thundering herd, and it can saturate your database before your cache even gets a chance to help.

graph TD
    A[Hot entry expires] --> B[Request 1 misses cache]
    A --> C[Request 2 misses cache]
    A --> D[Request N misses cache]
    B --> E[All hit origin simultaneously]
    C --> E
    D --> E
    E --> F[Origin overload / failure]

Probabilistic Early Expiration (XFetch)

Instead of waiting for TTL to expire exactly, probabilistically extend some entries early to spread the load:

import hashlib
import random
import time

def get_with_probabilistic_early_expiry(cache, key, ttl, delta=0.1):
    """
    Implements XFetch: probabilistically refreshes entries
    before they actually expire to prevent thundering herds.
    """
    value, expiry = cache.get(key)
    if value is None:
        return None

    # If within delta window, probabilistically refresh
    time_left = expiry - time.time()
    if time_left < ttl * delta:
        # Higher time_left means lower probability of refresh
        # This spreads the refresh load over the delta window
        prob_refresh = 1 - (time_left / (ttl * delta))
        if random.random() < prob_refresh:
            # This thread will refresh - others may get stale value
            return value, "stale", expiry  # Return stale, background refresh

    return value, "fresh", expiry

def background_refresh(cache, key, compute_fn, ttl):
    """Background refresh worker - non-blocking"""
    try:
        fresh_value = compute_fn()
        cache.set(key, fresh_value, ttl)
    except Exception:
        pass  # Log and continue, don't block requests

Lease / Callback Pattern

Instead of letting all requests hit the origin, assign a “lease” to one requester while others wait:

import threading

class LeaseCache:
    def __init__(self):
        self.cache = {}
        self.leases = {}  # key -> holder thread ID
        self.lock = threading.Lock()

    def get(self, key, compute_fn, ttl=3600):
        value = self.cache.get(key)
        if value is not None:
            return value

        with self.lock:
            if key in self.leases:
                # Another thread is computing - wait or return stale
                holder = self.leases[key]
                # Wait briefly then return stale or recompute
                return self.cache.get(key)  # May be None

            # This thread gets the lease
            self.leases[key] = threading.get_ident()

        try:
            value = compute_fn()
            self.cache[key] = value
            return value
        finally:
            with self.lock:
                del self.leases[key]

    def get_with_stale(self, key, compute_fn, ttl=3600, max_staleness=60):
        """
        Return stale value if available, refresh in background.
        Best for non-critical data where slight staleness is acceptable.
        """
        entry = self.cache.get(key)
        if entry is not None:
            value, timestamp = entry
            if time.time() - timestamp < max_staleness:
                # Serve stale immediately, refresh async
                thread = threading.Thread(target=self._refresh, args=(key, compute_fn, ttl))
                thread.start()
                return value, "stale"

        # No stale available - must block for fresh
        return self.get(key, compute_fn, ttl), "fresh"

    def _refresh(self, key, compute_fn, ttl):
        try:
            value = compute_fn()
            self.cache[key] = (value, time.time())
        except Exception:
            pass

Redis-based Stampede Prevention

# Use SETNX-based lock for single-flight (only one request recomputes)
SETNX lock:user:123 "{timestamp}"
EXPIRE lock:user:123 10

# If lock acquired: recompute, set cache
# If lock failed: wait, then retry cache read

# Better: Redlock for distributed stampede prevention
# Use probabilistic early expiry with jitter
TTL = base_ttl + random.uniform(0, base_ttl * 0.1)

When to Use Each Approach

ApproachBest ForTrade-off
Probabilistic early expiryRead-heavy, tolerate slight stalenessSome requests get stale data
Lease/callbackStrict consistency requirementsFirst request pays full latency
Distributed lock (Redlock)Multi-node deploymentsAdded complexity, lock overhead
Stale-while-revalidateNon-critical data, high read throughputStale data exposure window

Cache Warming Strategies

After a restart, a deploy, or an accidental flush, your cache starts empty. Every request becomes a cache miss and hits your origin at once. Cache warming is how you bridge that gap — getting popular data back in before production traffic becomes a problem.

Proactive Warming on Startup

import redis
import json

def warm_cache_from_persistence(redis_client, warm_keys_path):
    """
    Load popular keys from a warm_keys.json file on startup.
    This file should be generated from production access logs.
    """
    with open(warm_keys_path, 'r') as f:
        warm_entries = json.load(f)

    warmed = 0
    for entry in warm_entries:
        key = entry['key']
        value = entry['value']
        ttl = entry.get('ttl', 3600)

        # Only warm if key doesn't already exist
        if not redis_client.exists(key):
            redis_client.setex(key, ttl, value)
            warmed += 1

    return warmed

# Example warm_keys.json
# [
#   {"key": "product:featured:2026", "value": "...", "ttl": 300},
#   {"key": "user:session:12345", "value": "...", "ttl": 1800}
# ]

Background Warming with Priority Queue

import heapq
import threading
from collections import deque

class BackgroundWarmingCache:
    def __init__(self, redis_client, max_warm_per_second=100):
        self.redis = redis_client
        self.max_warm_per_second = max_warm_per_second
        self.warm_queue = []  # (priority, key, compute_fn)
        self.warmed_count = 0
        self.lock = threading.Lock()
        threading.Thread(target=self._warming_worker, daemon=True).start()

    def enqueue_warm(self, key, compute_fn, priority=1):
        """Priority 1 = highest (warm first), higher = lower priority"""
        with self.lock:
            heapq.heappush(self.warm_queue, (priority, key, compute_fn))

    def _warming_worker(self):
        while True:
            with self.lock:
                if not self.warm_queue:
                    time.sleep(0.1)
                    continue

                priority, key, compute_fn = heapq.heappop(self.warm_queue)

            # Only warm if not already cached
            if not self.redis.exists(key):
                try:
                    value = compute_fn()
                    self.redis.set(key, value)
                    with self.lock:
                        self.warmed_count += 1
                except Exception:
                    pass  # Skip failed warm, log in production

            # Rate limiting
            time.sleep(1.0 / self.max_warm_per_second)

On-Demand Warming (First-Request Triggered)

def get_with_on_demand_warm(cache, key, compute_fn, ttl=3600):
    """
    On cache miss, return a synthetic stale value immediately
    if one is available, then trigger background warm.
    """
    stale = cache.get_stale(key)

    if stale is not None:
        # Return stale immediately, warm in background
        threading.Thread(target=cache.set, args=(key, compute_fn(), ttl)).start()
        return stale, "warm_stale"

    # No stale available - must compute
    value = compute_fn()
    cache.set(key, value, ttl)
    return value, "fresh"

# Usage in endpoint
@app.route('/api/product/<id>')
def get_product(id):
    value, source = get_with_on_demand_warm(
        cache=redis,
        key=f"product:{id}",
        compute_fn=lambda: db.fetch_product(id)
    )
    return {"data": value, "cache_source": source}

Warming from Database Query Logs

import psycopg2

def generate_warm_keys_from_query_logs(db_conn, top_n=1000, hours=24):
    """
    Analyze slow query log to identify frequently accessed keys.
    Useful for generating warm_keys.json automatically.
    """
    cursor = db_conn.cursor()
    cursor.execute("""
        SELECT cache_key, COUNT(*) as hits
        FROM query_log
        WHERE timestamp > NOW() - INTERVAL '%s hours'
        GROUP BY cache_key
        ORDER BY hits DESC
        LIMIT %s
    """, (hours, top_n))

    return [{"key": row[0], "priority": i} for i, row in enumerate(cursor.fetchall())]

Cache Warming Decision Matrix

StrategyLatency ImpactComplexityBest Scenario
Proactive (file load)None (background)LowKnown popular keys, restart scenarios
Priority queueMinimal (rate-limited)MediumLarge warm sets, production traffic
On-demand (stale-first)None (stale served)MediumReal-time, any key access pattern
Query log analysisNone (offline analysis)HighContinuous optimization, A/B testing

Multi-Level Cache Hierarchy

Real systems don’t use a single cache. They stack caches at multiple levels, each with different capacity, latency, and eviction characteristics. Understanding these hierarchies is critical for designing systems that scale.

graph TD
    subgraph "L1 - CPU Core (ns)"
        A[L1 Cache<br/>32-64KB<br/>~1ns]
    end
    subgraph "L2 - CPU Core (ns)"
        B[L2 Cache<br/>256KB-1MB<br/>~4ns]
    end
    subgraph "L3 - Shared (ns)"
        C[L3 Cache<br/>8-64MB<br/>~10-20ns]
    end
    subgraph "Memory (μs)"
        D[RAM<br/>GB scale<br/>~100ns]
    end
    subgraph "SSD (μs)"
        E[NVMe SSD<br/>~100μs]
    end
    subgraph "Disk (ms)"
        F[HDD/Cloud Storage<br/>~10ms]
    end

    A --> B --> C --> D --> E --> F

Inclusive vs Exclusive Caches

Inclusive cache: Lower level contains all entries from upper level. Simple, wastes some space.

Exclusive cache: Levels don’t duplicate. More capacity per footprint, more complex eviction coordination.

graph LR
    subgraph "Inclusive Cache Hierarchy"
        I1[L1] --> I2[L2]
        I2 --> I3[L3]
    end

    subgraph "Exclusive Cache Hierarchy"
        E1[L1] --> E2[L2]
        E2 --> E3[L3]
    end
PropertyInclusiveExclusive
Capacity efficiencyLower (duplication)Higher (no duplication)
Eviction complexityLowerHigher (must check all levels)
Hit validationSimple (check L1, done)Must check all levels
Used byMost modern CPUsSome HPC architectures
Cache coherencySimplerMore complex

Write Policy in Multi-Level Caches

graph TD
    A[CPU Write] --> B{L1 Hit?}
    B -->|Yes| C[Write to L1]
    B -->|No| D{L2 Hit?}
    D -->|Yes| E[Write to L2, fetch from L1]
    D -->|No| F{L3 Hit?}
    F -->|Yes| G[Write to L3, propagate up]
    F -->|No| H[Write to Memory]
    H --> I[Invalidate caches]

Write-through: Every write updates all levels immediately. Simple coherency, higher write traffic.

Write-back: Write only to highest level, propagate to lower levels on eviction. Lower write traffic, complex coherency.

Cache Line Eviction Coordination

When L1 evicts an entry that exists in L2/L3, the entry moves down rather than being discarded:

class MultiLevelCache:
    def __init__(self, l1, l2, l3):
        self.l1 = l1  # Small, fast
        self.l2 = l2  # Medium
        self.l3 = l3  # Large, slow

    def evict_from_l1(self, key, value):
        """L1 eviction -> promoted to L2"""
        if self.l2.has_capacity():
            self.l2.set(key, value)
        else:
            evicted_l2_key, evicted_l2_val = self.l2.evict_lru()
            self.l3.set(evicted_l2_key, evicted_l2_val)  # L2 evicted -> L3
            self.l2.set(key, value)

    def get(self, key):
        if value := self.l1.get(key):
            return value, "L1"
        if value := self.l2.get(key):
            self.l1.set(key, value)  # Promote to L1
            return value, "L2"
        if value := self.l3.get(key):
            self.l2.set(key, value)  # Promote to L2
            self.l1.set(key, value)  # Also promote to L1
            return value, "L3"
        return None, "miss"

Software-Level Cache Hierarchies (Redis tiering)

# Redis Tiered Cache (Redis 7.0+ with DRAM + NVMe)
# Configure Redis with memory + storage engine
activerehashing yes
# Use as tiered: hot data in memory, warm data on disk

# For application-level tiering
tiered-cache:
  l1:  # In-memory Redis
    maxmemory: 64gb
    policy: allkeys-lru
  l2:  # SSD-backed Redis
    maxmemory: 512gb
    storage-engine: devours
    eviction-policy: allkeys-lru

Hit Rate in Multi-Level Cache

def calculate_multilevel_hit_rate(l1_hits, l2_hits, l3_hits, total):
    """
    Calculate effective hit rate across cache hierarchy.
    Total effective hit = L1 + L2*(1-L1_rate) + L3*(1-L1_rate)*(1-L2_rate)
    """
    l1_rate = l1_hits / total
    l2_rate = l2_hits / total
    l3_rate = l3_hits / total

    effective_hit = l1_rate + l2_rate * (1 - l1_rate) + l3_rate * (1 - l1_rate) * (1 - l2_rate)
    return effective_hit

# Example:
# L1 hit rate: 80%, L2 hit rate: 15%, L3 hit rate: 4%
# effective_hit = 0.80 + 0.15*(0.20) + 0.04*(0.20)*(0.85)
# effective_hit = 0.80 + 0.03 + 0.0068 = 0.8368 = 83.7%

Write Policy Deep Dive

Caching isn’t only about reads. Write patterns determine cache freshness and storage efficiency. The three primary write policies serve different consistency and performance needs.

graph LR
    subgraph Write-Through
        A[Write] --> B[Update Cache]
        B --> C[Update Storage]
        C --> D[Return]
    end
    subgraph Write-Back
        A2[Write] --> B2[Update Cache]
        B2 --> D2[Return fast]
        D2 -.-> C2[Update Storage later]
    end
    subgraph Write-Around
        A3[Write] --> C3[Update Storage]
        C3 --> D3[Return]
        D3 -.-> B3[Cache may get entry]
    end

Write-Through

Every write updates both cache and storage. Simple consistency — cache always reflects stored data.

def write_through(cache, db, key, value):
    """
    Write-through: synchronous write to both cache and DB.
    Strong consistency, higher write latency.
    """
    cache.set(key, value)      # Update cache first (or after, configurable)
    db.update(key, value)     # Synchronous DB write
    return value              # Only return after both succeed

# Use when:
# - Data must not be lost (financial transactions)
# - Cache and DB must always be consistent
# - Write amplification is acceptable

Trade-off table:

FactorWrite-ThroughWrite-BackWrite-Around
Write latencyHigh (sync storage)Low (cache only)Medium (storage only)
ConsistencyStrongWeak (power loss risk)Storage is truth
Storage hitsEvery writeOn evictionOnly on read
Data durabilityImmediateOn evictionImmediate
ComplexityLowHigh (eviction logic)Medium

Write-Back (Write-Behind)

Write to cache only, defer storage writes until eviction or checkpoint. Very fast for writes, risky on crash.

class WriteBackCache:
    def __init__(self, cache, db, flush_interval=30):
        self.cache = cache
        self.db = db
        self.dirty = {}  # key -> value (written but not persisted)
        self.flush_interval = flush_interval

        # Background flush thread
        threading.Thread(target=self._periodic_flush, daemon=True).start()

    def write(self, key, value):
        self.cache.set(key, value)
        self.dirty[key] = value  # Mark as dirty

    def _periodic_flush(self):
        while True:
            time.sleep(self.flush_interval)
            self.flush_all()

    def flush_all(self):
        for key, value in self.dirty.items():
            self.db.update(key, value)
        self.dirty.clear()

    def __del__(self):
        self.flush_all()  # Ensure dirty writes on shutdown

Write-Around

Write goes directly to storage, bypassing cache. Cache stays uncluttered with one-time-write data.

def write_around(cache, db, key, value):
    """
    Write-around: write to storage, cache on next read.
    Keeps cache clean of one-time-write data.
    """
    db.update(key, value)  # Always write to storage
    cache.delete(key)      # Invalidate any existing cache entry
    # Cache will be populated on next read

Use cases:

PolicyBest For
Write-throughCritical data, financial, inventory where loss = loss
Write-backHigh-write-throughput (logs, metrics), acceptable loss
Write-aroundOne-time writes (logs, events), read-heavy workloads

Redis Write Patterns

# Write-through in Redis (synchronous SET + DB write)
SET user:123 "data"
HSET users:123 name "Alice" email "alice@example.com"

# For true write-through, wrap in MULTI/EXEC transaction
MULTI
HSET user:123 name "Alice"
LPUSH user:123:actions "login"
EXEC  # Atomic, but not durable

# For durability: use Redis RDB/AOF + application-level write-through
# Application writes to Redis, Redis persists asynchronously
# On crash, restore from AOF + reconcile with DB

# Write-behind pattern with Redis
# 1. Write to Redis (fast)
SET session:abc "{data}"
# 2. Background job flushes to DB periodically
# Use Redis Stream or just a cron job

Read-Through vs Write-Through

PatternDescriptionWhen to Use
Read-throughCache automatically loads from storage on missAlways (caching library handles)
Write-throughCache and storage updated simultaneouslyStrong consistency requirements
Write-behindCache updated, storage asyncHigh write throughput

Capacity Estimation

Different eviction policies have different memory overheads for tracking access. This matters when you’re trying to size your cache.

Memory overhead per million keys (approximate Redis):

PolicyMetadata Overhead per KeyPer 1M Keys
FIFO~8 bytes (queue pointer)~8MB
Random0 bytes (no tracking)0MB
LRU (approximate, 5 samples)~16 bytes (access timestamp)~16MB
LFU~24 bytes (counter + decay timestamp)~24MB
TTL-only0 bytes (Redis native)0MB

Working set size estimation:

working_set_bytes = unique_keys_per_second × avg_key_size × avg_ttl_seconds
cache_hit_probability = min(1, cache_size / working_set_bytes)

For a session cache with 100,000 unique sessions per second, average session size 2KB, average TTL 30 minutes (1800 seconds): working set = 100,000 × 2KB × 1800 = 360GB. A 64GB cache gives hit probability of 64/360 = 17.8% — too low, session cache is not fitting in cache. A 512GB cache gives 512/360 = 142% — working set fits with headroom.

Redis LRU vs LFU memory: LFU needs two values per key: the access counter and a decay timestamp. Counter overflow is a real concern — Redis LFU counters are 8-bit (0-255). After 255 increments, the counter stops growing unless decay kicks in. The lfu-decay-time controls how often counters are halved: lfu-decay-time 10 means every 10 minutes, all counters are multiplied by 0.5. This prevents counter freeze but means occasional access keeps items alive longer than expected.

Memcached slab class sizing: Memcached organizes memory into slab classes of 1MB each, divided into fixed-size chunks. For LRU with eviction, choose chunk size matching your average value size. If your values average 200 bytes but your slab class uses 128-byte chunks, you waste memory. If you use 1MB chunks for 200-byte values, you waste even more. The slabs reassign command can rebalance slab classes in production.


When to Use / When Not to Use

PolicyWhen to UseWhen Not to Use
LRUAccess has temporal locality (recent items accessed again); stable working set that fits in memoryOne-time scans that touch every item; large working sets that exceed memory
LFUItems have stable popularity over time; you want to reward consistently hot itemsRapidly changing access patterns; new items need time to build frequency
FIFODebugging or testing; uniform access where simplicity is paramountProduction systems with real access patterns; any case where hit rate matters
TTLData that naturally expires (sessions, temporary tokens); freshness-critical dataStatic reference data; data without natural expiration
RandomUniform access patterns; extreme simplicity requirements; baseline comparisonNon-uniform access patterns; any case where specific items must be retained

Decision Guide

graph TD
    A[Access Pattern] --> B{Is access temporal?}
    B -->|Yes| C{Working set fits memory?}
    C -->|Yes| K[LRU]
    C -->|No| F[LRU with smaller cache or LFU]
    K --> D{Is frequency stable?}
    F --> D
    D -->|Yes| G[LFU]
    D -->|No| H{Need freshness?}
    H -->|Yes| I[TTL]
    H -->|No| J[Random or FIFO]

Production Failure Scenarios

FailureImpactMitigation
LRU with full memoryOld items hog cache, new popular items evicted immediatelyMonitor eviction rates; use volatile-lru to only evict items with TTL
LFU frequency counter overflowItems stop being properly ranked, hit rate dropsTune lfu-decay-time; use maxmemory-samples for better approximation
TTL items not expiringMemory fills with dead entries; cache becomes ineffectiveEnable active expiration (lazyfree_lazy_expire in Redis); monitor expires stat
Random evicts hot itemsUsers experience random slow requests; inconsistent performanceSwitch to LRU or LFU; track what’s actually being evicted
Cache restart with no persistenceAll items lost, cold start causes database stampedeImplement cache warming; use Redis RDB persistence or replica for warm restart

Common Pitfalls / Anti-Patterns

1. Assuming LRU Means “Oldest Recently Used”

LRU evicts the least recently used - the item whose last access was longest ago. Not the item with the oldest timestamp in absolute terms.

# WRONG understanding: LRU keeps oldest items
# The item added 1 year ago but accessed 1 minute ago is NOT
# the LRU item. The item added 1 second ago but not accessed
# since is the LRU item.

# RIGHT understanding:
# LRU = Least Recently Used = longest time since last access

2. Mixing Volatile and Non-Volatile Keys

Using both volatile-lru and non-volatile (no TTL) keys can cause memory issues.

# PROBLEM: Cache fills with non-volatile keys
# volatile-lru can only evict keys WITH TTL
# If keys without TTL fill memory, volatile-lru has nothing to evict

# SOLUTION: Either use only volatile keys with TTL, or use allkeys-lru
# to evict from entire cache
maxmemory-policy allkeys-lru  # Evicts anything
# OR
maxmemory-policy volatile-lru  # Only evicts keys with TTL
# Never mix strategies - pick one approach

3. Setting TTL Too Long for Volatile Policies

If using volatile-lru or volatile-lfu, keys without TTL never get evicted and can fill memory.

# PROBLEM: Some keys have no TTL, some have very long TTL
redis.setex("user:123", 86400, data)      # 24 hour TTL
redis.set("config", json.dumps(config))   # NO TTL - never expires

# After cache fills with NO-TTL keys, volatile-lru can't free space
# because it only touches keys with TTL

# SOLUTION: Always use SETEX/SET with EX option; never use SET without TTL
redis.setex("config", 86400, json.dumps(config))  # Always with TTL

4. Ignoring Eviction Rate in Production

Low eviction count in dev, high eviction count in production means memory miscalculation.

# Development: Small dataset, everything fits
# Production: Large dataset, cache too small

# Monitor in production:
# redis-cli INFO stats | grep evicted_keys
# If evicted_keys > 0 frequently, cache is too small or policy wrong

5. Not Accounting for LRU Approximation in Redis

Redis doesn’t do true LRU. It samples keys. Small samples = inaccurate LRU.

# Default 5 samples - may miss true LRU candidate
maxmemory-samples 5

# For more accurate LRU (more CPU, better decisions)
maxmemory-samples 10

# For very large caches with non-uniform access
maxmemory-samples 16

Quick Recap

Key Bullets

  • LRU is the default choice for most use cases because temporal locality is common
  • LFU rewards consistency - items accessed frequently stay cached longer
  • TTL is not an eviction policy per se - it’s a freshness mechanism
  • FIFO and Random are baselines, not production choices for hit-rate-critical workloads
  • Redis samples ~5 keys for LRU approximation (configurable with maxmemory-samples)
  • volatile-* policies only evict keys that have a TTL set

Copy/Paste Checklist

# Redis eviction policy configuration
# For general caching (evict any key):
maxmemory 100mb
maxmemory-policy allkeys-lru

# For cache with some persistent keys:
maxmemory 100mb
maxmemory-policy allkeys-lru  # OR allkeys-lfu

# For cache where items MUST have TTL:
maxmemory 100mb
maxmemory-policy volatile-lru  # Only evict keys WITH TTL
maxmemory-policy volatile-lfu

# For pure TTL-based expiration (no eviction):
maxmemory-policy volatile-ttl  # Evicts keys closest to expiration
maxmemory 100mb

# For debugging or very specific cases:
maxmemory-policy allkeys-random
maxmemory-policy volatile-random

# Monitoring commands
INFO stats | grep -E "keyspace|evicted|expired"
OBJECT freq user:123  # Check access frequency of a key
DEBUG OBJECT LRU user:123  # Check LRU position of a key

Security Checklist

  • Configure memory limits - Prevent cache from consuming all system memory
  • Disable dangerous commands - FLUSHDB, FLUSHALL can wipe entire cache
  • Use ACLs - Restrict which commands each user/role can execute
  • Bind to internal interfaces - Never expose Redis/Memcached to public internet
  • Monitor for abnormal eviction patterns - Could indicate attack or misconfiguration
  • Validate key inputs - Prevent injection of extremely long or special character keys
  • Set up replication authentication - Replicas should require auth to sync from primary
# Redis security configuration
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command DEBUG ""
maxmemory 100mb
maxmemory-policy allkeys-lru

Observability Checklist

Metrics to Track

  • evicted_keys - Number of items evicted per second; indicates memory pressure
  • expired_keys - Number of items that expired via TTL
  • stat_keyspace_hits/misses - Cache hit ratio calculation
  • used_memory - Current memory usage
  • maxmemory - Configured memory limit
# Redis INFO stats to monitor
INFO stats | grep -E "keyspace|evicted|expired"
# Keyspace hits:123456789
# Keyspace misses:12345678
# Expiredkeys:1234567
# Evictedkeys:12345

Logs to Capture

import structlog
logger = structlog.get_logger()

# Monitor eviction events
def check_eviction_risk(redis_client, threshold=0.7):
    info = redis_client.info('memory')
    used = info['used_memory']
    maxmem = info['maxmemory']

    if maxmem > 0:
        usage_ratio = used / maxmem
        if usage_ratio > threshold:
            logger.warning("cache_memory_pressure",
                usage_percent=f"{usage_ratio * 100:.1f}%",
                evicted_recently=info.get('evicted_keys', 0))
    return usage_ratio

Alert Rules

- alert: HighEvictionRate
  expr: rate(redis_evicted_keys_total[5m]) > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High cache eviction rate - memory pressure"

- alert: LowHitRate
  expr: redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) < 0.7
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "Cache hit rate below 70%"

Real-World Case Studies

Real-World Case Study: Netflix

Netflix streams video to 250+ million subscribers. Their caching strategy is among the most sophisticated in the industry.

Netflix uses a tiered caching approach. The bottom tier is origin storage (Amazon S3). Above that, regional caches handle requests for entire geographic regions. At the top, edge caches serve individual streams. The eviction policy at each tier differs based on content lifecycle.

For the “continue watching” feature — which shows a user’s in-progress videos — Netflix uses LFU eviction. The logic: videos that 10,000 people are actively watching should stay cached over videos that 10 people are watching. LFU keeps the most-requested content. The counter decay is tuned aggressively (lfu-decay-time 1) because a video’s popularity can drop dramatically after it leaves the “trending” window.

For the “top 10” recommendations list, they use LRU. The list changes daily, and yesterday’s list becomes irrelevant quickly. LRU naturally evicts yesterday’s recommendations as today’s take over.

The lesson: one eviction policy does not fit all use cases. Netflix runs multiple cache clusters for different features, each tuned to its specific access pattern. Auditing which content lives in which cache tier and whether the policy matches the content’s access pattern is ongoing operational work.


Interview Questions

1. Your Redis cache shows high memory usage but low hit rate. What is likely happening?

The working set does not fit in the available cache. With LRU, if your most frequently accessed data exceeds available memory, those hot items get evicted by newer items that are accessed once and never again (the "one-time access" problem). Diagnose with redis-cli --latency-histogram to see response time distribution, and OBJECT freq to check access frequency of keys. The fix: either increase cache memory, reduce working set size (trim keys or reduce TTL), or move to LFU which rewards consistently popular items over one-time accesses.

2. You are designing a cache for API responses. The API has 1 million unique endpoints but your cache can only hold 100,000 entries. Which eviction policy would you choose?

With 1M unique endpoints and 100K capacity, 90% of cache entries will never be revisited (assuming uniform distribution). LRU would constantly evict to make room for new entries that arrive once and are never requested again. Better approaches: use API response fingerprinting to identify which endpoints are actually popular (cache only top N), or use LFU to identify truly popular endpoints. TTL alone is not sufficient — without LRU/LFU, the popular endpoints still get evicted when one-time-access items fill the cache. The practical answer is to investigate actual access distribution first: if 20% of endpoints account for 80% of requests, filter your cache to those first.

3. What happens to LFU frequency counters over time for rarely-accessed items?

LFU counters decay via lfu-decay-time. Every N minutes (default 1), Redis multiplies all LFU counters by 0.5, effectively halving them. A key accessed once (counter = 1) becomes 0.5 → 0 → never considered "least frequently used" for eviction purposes. An item accessed 100 times (counter = 100) becomes 50, then 25, then 12.5. This means LFU with decay correctly ages out old popularity, but if decay is too aggressive, genuinely popular items get evicted because their counters dropped too far. The counter is 8-bit (0-255 max), so items with very high counts can saturate. If decay time is too long, counters never shrink and old popularity data poisons the cache. Tuning lfu-decay-time requires understanding your content lifecycle — video streaming with week-long popularity windows needs different decay than news with hour-long windows.

4. You are using volatile-lru but some keys have no TTL. The cache fills up and evictions stop working. Why?

volatile-lru only considers keys with a TTL for eviction. If keys without TTL (no expiration) fill 80% of your cache, volatile-lru can only evict from the 20% that have TTL. When the volatile subset fills, volatile-lru runs out of candidates and maxmemory cannot be respected. The solution: either use allkeys-lru to evict from the entire cache regardless of TTL, or ensure ALL keys in the cache have a TTL set. Mixing persistent keys (no TTL) with volatile keys in the same Redis instance when using volatile-lru is an anti-pattern — those persistent keys will eventually fill memory and cause eviction failure.

5. Your cache hit rate drops dramatically after a deploy but recovers within an hour. What caused this and how would you diagnose it?

The pattern of recovery suggests a cold cache problem triggered by the deploy. Common causes: (1) Application restarted and cache was flushed or inaccessible, (2) New deployment changed cache key format making old keys incompatible, (3) Traffic shift exposed a larger working set that wasn't warmed. Diagnosis steps: check `redis-cli INFO stats` for `keyspace_hits/misses` ratio, compare `evicted_keys` count before/after deploy, review application logs for cache connection errors, and verify key naming scheme didn't change. The fix: implement cache warming from a warm key file on startup, use rolling deploys that keep some instances serving with warm caches while others restart, or use blue-green cache warmup where new instances warm up before receiving traffic.

6. You are running a Redis cluster with 6 nodes and need to implement distributed rate limiting at 10,000 requests per second across the cluster. How do you achieve this with a cache?

Use a sliding window rate limiter with Redis MULT/EXEC across cluster nodes. Each request increments a counter key scoped to the user+window (e.g., `ratelimit:user123:window_1700000000`) using `INCR` with `EXPIRE` to auto-clean windows. Since Redis Cluster partitions by key hash, the rate limit key must be single-key for atomicity — use a hash slot calculator or proxy to route all rate limit operations through a single node, or use Redisson's RRateLimiter which handles distributed coordination. For 10K RPS, a single Redis node can handle ~50K ops/sec with pipelining, so co-locating rate limiting on one node is acceptable if you partition by tenant (e.g., `ratelimit:tenant_a:window`). Alternative: use a dedicated rate-limit Redis node outside the cluster to avoid impacting primary workloads.

7. Explain the difference between cache-aside (lazy loading), read-through, and write-through caching patterns. When would you choose each?

Cache-aside (lazy loading): Application checks cache first, on miss reads from DB and populates cache. Write goes to DB first, then invalidates cache. Pros: only popular data fills cache, no wasted writes. Cons: first request is always slow (cache miss), potential stale data on write failures. Best for: read-heavy workloads where most data is never accessed.

Read-through: Cache automatically loads data from storage on miss — application only talks to cache. Pros: simpler application code, cache handles loading transparently. Cons: cache must understand storage format, more complexity in cache layer. Best for: when you control the cache implementation and want transparent caching.

Write-through: Every write goes to cache AND storage simultaneously. Pros: cache always consistent with storage, no data loss on crash. Cons: higher write latency (synchronous), cache can be flooded with one-time writes. Best for: write-heavy workloads requiring strong consistency (financial data, inventory).

8. A user reports intermittent slow responses from your API. You have a Redis cache. How do you determine if the cache is the source of the latency spikes?

Three diagnostic approaches: First, use `redis-cli --latency-history` to capture latency percentiles over time — if p99 exceeds 10ms for non-BLPOP commands, cache is the culprit. Second, check `redis-cli INFO latency_stats` which records command-level latencies — look for commands with high `latency_us` values. Third, run `redis-cli --intrinsic-latency 100` to measure the baseline hardware latency (should be under 1ms). If intrinsic latency is fine but commands are slow, it's contention or big keys. Also monitor `instantaneous_ops_per_sec` — if ops drop during your spike windows while CPU is high, you have command queueing. Check `MEMORY USAGE` for key size bloat that could cause slow serialization. For production, instrument your application with cache-level tracing: log cache hit/miss/latency for every request and correlate slow requests with cache misses.

9. You have a 512MB Redis cache and need to cache 10 million user sessions. Average session size is 500 bytes. Can this fit? What else do you need to consider?

Raw calculation: 10M sessions × 500 bytes = 5GB. Your 512MB cache can only hold ~1M sessions. At 10% cache coverage, hit rate would be ~10% assuming uniform access — which is likely the realistic range for LRU. But there are critical factors: (1) Key overhead — each Redis key name adds ~40-60 bytes, so 10M keys × 60 bytes = 600MB just for key names, leaving essentially 0 bytes for values. (2) Redis memory allocation is ~1.5x the raw data size due to overhead, dict entries, and sdsalloc. (3) If access is Zipfian (20% of users = 80% of traffic), caching the top 2M popular sessions would give you 80% hit rate on 20% of traffic. Solution: use session TTL to bound working set, or add more memory (16GB+ for 10M sessions at 500 bytes each with overhead). Always use `redis-cli --bigkeys` to identify oversized keys before production.

10. How does Redis handle cache eviction when memory limit is reached with `allkeys-lru`? Walk through the exact sequence of operations.

When `maxmemory` is reached, Redis triggers the eviction sequence: (1) freeMemoryIfNeeded() is called before any new write operation. (2) If used_memory > maxmemory, eviction begins. (3) Redis samples maxmemory-samples keys (default 5) using OBJECT idleTime(key) to get LRU/LFU idle. (4) It selects the best candidate (highest idle time for LRU, lowest frequency for LFU). (5) If volatile-* policy, only keys with TTL are sampled. (6) The eviction candidate is deleted with dbDelete(). (7) If memory is still over limit, repeat up to maxmemory-eviction-tenacity attempts (default 10). (8) If all attempts fail, Redis returns an out-of-memory error on the write command. The evicted_keys metric in INFO stats tracks total evictions. Important: LRU in Redis is approximate — it doesn't scan all keys, just samples. With small maxmemory-samples, you may evict a recently-used key if it happened to be in the sampled set.

11. What is cache stampede (thundering herd), and what are the primary strategies to prevent it in production?

Cache stampede occurs when a popular cache entry expires simultaneously, causing all concurrent requests to miss and hit the origin at the same time. This can saturate or crash the backing database. Primary prevention strategies include: (1) Probabilistic early expiration (XFetch) — extend some entries before actual TTL to spread the refresh load over a time window. (2) Lease/callback pattern — assign a lease to the first requester; others wait or receive stale data. (3) Distributed locking (Redlock) — only one requester recomputes, others wait. (4) Stale-while-revalidate — serve stale data immediately while refreshing in background. The best choice depends on consistency requirements: strict consistency needs locks, high-throughput systems with tolerable staleness benefit from probabilistic approaches. In Redis, use SETNX-based locks or implement probabilistic TTL jitter.

12. Explain the ARC (Adaptive Replacement Cache) algorithm. How does it combine LRU and LFU, and why is it considered self-tuning?

ARC is a self-tuning cache algorithm introduced by IBM that combines LRU and LFU by maintaining four lists: T1 (recent LRU), T2 (frequent LFU), B1 (evicted from T1), and B2 (evicted from T2). The key innovation is that ARC dynamically adjusts the size of T1 vs T2 based on workload. If a miss occurs on a recently seen item (in B1), ARC concludes LRU is too aggressive and grows T1. If a miss occurs on a frequently seen item (in B2), ARC grows T2. This feedback loop adapts to workload changes without manual tuning. ARC achieves better hit rates than pure LRU or LFU in workloads with varying locality patterns. Redis does not natively support ARC, but it can be implemented at the application level or used in specialized caches like `cachegrand`.

13. Your application experiences a cold start problem after deployments. Every time you deploy, cache miss rate spikes for 10-15 minutes. How do you implement effective cache warming?

Cold start after deploy is a classic cache warming problem. Strategies: (1) Proactive warming from persistence — on startup, load a pre-generated list of warm keys (top N from production access logs) from JSON file. Generate this list offline from access logs. (2) Priority queue background warming — use a heap to warm keys in priority order (highest priority = most popular first), rate-limited to avoid overwhelming the origin during warmup. (3) On-demand stale-first warming — serve stale data immediately on cache miss while triggering background recomputation; handles any key access pattern. (4) Rolling deploys — restart instances in batches so some instances maintain warm caches while others restart. (5) Blue-green cache warmup — new instances warm up fully before receiving production traffic. For Redis, save top keys with DUMP and restore on startup. Monitor `evicted_keys` and `keyspace_hits/misses` to validate warmup effectiveness.

14. What are the trade-offs between write-through, write-back, and write-around caching policies? When would you choose each?

Write-through: synchronously writes to both cache and storage. Strong consistency, higher write latency (waits for storage), but cache always reflects stored data. Best for: financial transactions, inventory systems where data loss is unacceptable.

Write-back (write-behind): writes to cache only, defers storage writes until eviction or checkpoint. Very fast writes, risky on crash (data loss if power loss), complex coherency. Best for: high-write-throughput workloads like logs, metrics, where acceptable data loss is defined in the SLA.

Write-around: writes directly to storage, bypassing cache. Keeps cache clean of one-time-write data. Best for: write-heavy workloads with read-heavy access patterns (e.g., logging events that are rarely reread). Cache gets populated only on subsequent reads.

Choice depends on: (1) Write-to-read ratio — write-heavy = write-behind, read-heavy = write-through. (2) Consistency requirements — financial = write-through. (3) Data durability needs — write-back needs battery-backed cache or persistency to avoid loss.

15. Describe the multi-level cache hierarchy from L1 to L3 in modern CPUs. How do inclusive and exclusive caches differ, and why does it matter for eviction?

Modern CPUs use a tiered cache hierarchy: L1 (32-64KB, ~1ns), L2 (256KB-1MB, ~4ns), L3 (8-64MB, ~10-20ns), all much faster than RAM (~100ns) and storage (~100μs to ms). Multi-level caching exploits spatial and temporal locality.

Inclusive cache: Lower level contains all entries from upper level. Simple coherency, wastes some capacity due to duplication. Used by most modern CPUs.

Exclusive cache: Levels don't duplicate entries. Higher capacity per footprint, more complex eviction coordination (must check all levels on miss). Used in some HPC architectures.

For software caching: inclusive = simpler validation (check L1, if miss check L2, you're done). Exclusive = more capacity but must check all levels on miss. In multi-level software caches (Redis tiering), evicted entries from L1 should be promoted to L2 before discarding — this coordination prevents thrashing when hot entries accidentally get evicted from all levels.

16. How does Memcached's slab allocator work, and why does choosing the wrong slab class waste memory even with sufficient total free memory?

Memcached divides memory into slab classes (default 1MB each), each containing fixed-size chunks. Slab class 1 might have 64-byte chunks, class 2 with 128-byte chunks, up to class 42 with 1MB chunks. When storing an item, Memcached finds the smallest slab class that fits the value size.

Memory waste occurs in two ways: (1) Internal fragmentation: if your average value is 200 bytes but slab class uses 128-byte chunks, items don't fit and Memcached uses the next class up (256 bytes), wasting 56 bytes per item. At scale, this adds up. (2) External fragmentation: total free memory exists but in wrong slab classes — slab 10 (512 bytes) is full while slab 15 (10KB) is mostly empty.

The `slabs reassign` command can rebalance slab classes in production but requires careful planning. For capacity planning, profile your actual value size distribution and configure `max_chunk_size` and growth factor accordingly. Redis uses a similar approach with its memory allocator (jemalloc) but abstracts this less visibly.

17. What is the difference between cache-aside (lazy loading), read-through, and write-through patterns? When would you choose each?

Cache-aside (lazy loading): Application checks cache first, on miss reads from DB and populates cache. Write goes to DB first, then invalidates cache. Pros: only popular data fills cache, no wasted writes on one-time data. Cons: first request always slow (cold miss), potential stale data if write fails before cache invalidation. Best for: read-heavy workloads where most data is never accessed.

Read-through: Cache automatically loads data from storage on miss — application only talks to cache, cache handles loading transparently. Pros: simpler application code. Cons: cache must understand storage format, adds complexity in cache layer. Best for: when you control both cache and storage and want transparent caching.

Write-through: Every write updates cache AND storage simultaneously. Pros: cache always consistent with storage, no data loss on crash. Cons: higher write latency (synchronous), cache fills with one-time writes. Best for: write-heavy workloads requiring strong consistency (financial, inventory).

Cache-aside is the most common pattern in web applications (e.g., Django cache framework, Spring Cache). Read-through is common in content delivery networks and OS page caches.

18. LFU counter overflow is a real problem in production Redis. How does it manifest, and how do you tune `lfu-decay-time` to prevent it?

Redis LFU counters are 8-bit, ranging 0-255. When an item is accessed very frequently (e.g., a top-100 hot key with millions of hits), its counter saturates at 255 and stops growing. Meanwhile, a key with counter 254 will be evicted over one with 255, even if the 254 key deserves to stay more. This breaks LFU's ranking accuracy.

`lfu-decay-time` controls how often counters are halved. Default is 1 minute. Setting it too short (e.g., 1) means occasional accesses keep items alive too long (a key accessed once stays at 0.5 → 0.25 → ... and gets re-accessed before hitting 0). Setting it too long (e.g., 30) means genuinely popular items lose their ranking too quickly after a lull.

For content with week-long popularity windows (e.g., video streaming, news), use decay times in the 5-15 minute range. For rapidly changing trending content (social media, live events), use aggressive decay (1-3 minutes). Monitor `OBJECT freq` on known hot keys to verify counters are decaying as expected. Consider using LRU for workloads where item popularity changes rapidly rather than relying on frequency counting.

19. You have a 1GB Redis cache and need to cache 5 million short-lived tokens. Each token is ~100 bytes with a 15-minute TTL. Average unique tokens per second is 5,000. Should this fit in cache? What hit rate would you expect?

Working set calculation: 5,000 unique tokens/sec × 100 bytes × 900 seconds (15 min TTL) = 450MB working set. A 1GB cache should theoretically hold this comfortably.

But with LRU, hit rate depends on access distribution. If access is uniform: probability of a request being for a token in cache = cache_size / working_set = 1024MB / 450MB > 1 = 100% (everything fits). However, in practice: (1) LRU with maxmemory-samples 5 may evict slightly-recently-used tokens. (2) Key name overhead (~50 bytes per key in Redis) means 5M tokens × 50 bytes = 250MB just for key names. (3) With 100-byte values, effective capacity is ~750MB for values, leaving some headroom.

Expected hit rate with LRU: if access is uniform and working set fits, ~95%+ hit rate. If access is Zipfian (some tokens hit much more frequently), LRU will naturally keep popular tokens. If using LFU, tokens accessed frequently will be rewarded and maintain high hit rates. With TTL of 15 minutes and 5,000 TPS, token churn is manageable.

20. How would you implement a distributed rate limiter using Redis that supports 50,000 requests per second across a 6-node Redis Cluster?

For distributed rate limiting on Redis Cluster with 50K RPS: (1) Sliding window counter: each request increments a key `ratelimit:{tenant}:{window_timestamp}` using INCR with EXPIRE on the window. Cluster partitions by key hash, so all operations for a tenant must hit the same node — use hashtag `{tenant}` in the key.

(2) Token bucket with Redis MULTI/EXEC: atomically decrement tokens and check remaining. Requires single-key access for atomicity.

(3) Architecture: since 50K RPS exceeds single Redis node capacity (~30-50K ops/sec depending on payload), partition by tenant across multiple nodes. Use a proxy layer (e.g., Redis Cluster proxy, or application-level routing) to route rate limit keys for each tenant to the same node.

(4) Alternative: dedicated rate-limit Redis: co-locate rate limiting on a single powerful node (e.g., Redis on a high-CPU instance) separate from your cluster. 50K RPS is well within single-node capacity with pipelining.

Key design: ensure atomicity by keeping all rate-limit operations for a tenant on one node. For very high throughput, consider local in-memory rate limiting with periodic Redis sync for distributed enforcement.


Further Reading

Official Documentation

Papers and Technical References

Tools and Utilities

Books

  • Designing Data-Intensive Applications by Martin Kleppmann — Chapter on caching, databases, and trade-offs
  • Redis in Action by Josiah Carlson — Practical Redis including advanced eviction patterns
  • Building Microservices by Sam Newman — Caching patterns in distributed systems

Conclusion

Pick your eviction policy based on your access patterns. LRU is the default for most use cases because temporal locality is common. LFU rewards consistency. FIFO and random are mostly for specialized cases or as baselines.

The biggest mistake people make is treating eviction policy as a one-time decision. Monitor your hit rate. If it drops, your access patterns might have changed and a different policy might serve you better.

Category

Related Posts

Bloom Filters: Space-Efficient Probabilistic Data Structure

How Bloom filters provide memory-efficient membership testing with configurable false positive rates for caches, databases, and distributed systems.

#distributed-systems #data-structures #storage

Cache Patterns: Thundering Herd, Stampede Prevention, and Cache Warming

A comprehensive guide to advanced cache patterns — thundering herd, cache stampede prevention with distributed locking and probabilistic early expiration, and cache warming strategies.

#system-design #caching #distributed-systems

Distributed Caching: Scaling Cache Across Multiple Nodes

A comprehensive guide to distributed caching — consistent hashing, cache sharding, replica consistency, cache clustering, and handling the unique challenges of multi-node cache environments.

#system-design #caching #distributed-systems