Caching Strategies: A Practical Guide

Learn the main caching patterns — cache-aside, write-through, write-behind, and refresh-ahead — plus how to pick TTLs, invalidate stale data, and distribute caches across nodes.

published: August 15, 2024 reading time: 43 min read author: GeekWorkBench updated: December 1, 2024

Introduction

Most applications have data that rarely changes but gets hit constantly. User profiles, product listings, config values, session data. Without caching, every request for this data pounds the database, even when nothing’s changed since last Tuesday.

The numbers tell the story:

Approach	Typical Latency	Requests per Second (per node)
Database query	5-50ms	1,000-10,000
Cache hit	0.1-1ms	100,000-1,000,000
Cache miss (with cache)	5-51ms	Same as database

A cache that serves stale data where freshness matters is worse than no cache. And a cache that needs constant babysitting to stay valid is just overhead you do not need.

Core Concepts

These patterns describe how data flows between cache and application.

Read Caching Patterns

Read caching patterns describe how data flows from cache to application on read operations.

Invalidate-on-Read (Stale-While-Revalidate)

On each read, the cache checks whether the cached data is still fresh, typically by comparing a version number, ETag, or timestamp against the origin. In the strict form, a stale entry is invalidated and fresh data is fetched from the origin before the response is returned. The stale-while-revalidate variant relaxes this: it serves the cached data immediately and fetches an update asynchronously once the entry is past its freshness threshold.

This approach combines fast reads (cache hit returns immediately) with automatic background refresh for data that has changed. It works well when data changes unpredictably and you want to avoid write-time invalidation overhead.
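The stale-while-revalidate variant can be sketched in memory to show the three cases: fresh hit, stale hit with a background refresh, and a miss. This is an illustration under stated assumptions, not a Redis recipe. The names `loader`, `fresh_for`, and `stale_for` are hypothetical. The `clock` and `spawn` parameters exist only so the timing can be controlled in a test.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


def _spawn_thread(fn: Callable[[], Any]) -> None:
    threading.Thread(target=fn, daemon=True).start()


class StaleWhileRevalidateCache:
    """In-memory sketch of stale-while-revalidate.

    Fresh entries are returned directly. Stale entries (inside the grace
    window) are returned immediately while a background task reloads them.
    Missing or too-old entries are loaded synchronously, like a cache miss.
    """

    def __init__(self, loader: Callable[[str], Any], fresh_for: float,
                 stale_for: float, clock: Callable[[], float] = time.monotonic,
                 spawn: Callable[[Callable[[], Any]], None] = _spawn_thread):
        self.loader = loader        # hypothetical origin fetch (e.g. a DB query)
        self.fresh_for = fresh_for  # seconds an entry counts as fresh
        self.stale_for = stale_for  # extra seconds a stale entry may still be served
        self.clock = clock
        self.spawn = spawn          # how background refreshes are started
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def _reload(self, key: str) -> Any:
        try:
            value = self.loader(key)
            with self._lock:
                self._data[key] = (value, self.clock())
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._data.get(key)
        if entry is None:
            return self._reload(key)          # cold miss: the caller waits
        value, stored_at = entry
        age = self.clock() - stored_at
        if age <= self.fresh_for:
            return value                      # fresh hit
        if age <= self.fresh_for + self.stale_for:
            with self._lock:
                start = key not in self._refreshing
                self._refreshing.add(key)
            if start:                         # at most one refresh per key
                self.spawn(lambda: self._reload(key))
            return value                      # stale hit, served immediately
        return self._reload(key)              # too old to serve: treat as a miss
```

The `_refreshing` set keeps a burst of stale reads from launching one refresh each. The `finally` block clears the flag even when the loader fails, so a later read can retry.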

Cache-Aside (Lazy Loading)

This is what most people mean when they say “caching.” Your application checks the cache first, loads from the database on a miss, then populates the cache for next time.

sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: GET user:123
    Cache-->>Client: Cache miss

    Client->>Database: SELECT * FROM users WHERE id = 123
    Database-->>Client: User data

    Client->>Cache: SET user:123 (ttl=3600)
    Cache-->>Client: OK

    Client->>Cache: GET user:123
    Cache-->>Client: User data (cached)

Implementation:

import json

# `redis` is a redis-py client and `db` a database handle
def get_user(user_id):
    # Try cache first
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)

    if cached is not None:
        return json.loads(cached)

    # Cache miss - load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Populate cache with TTL
    redis.setex(cache_key, 3600, json.dumps(user))

    return user

Write operation:

def update_user(user_id, data):
    # Update database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Invalidate cache
    redis.delete(f"user:{user_id}")

Pros:

Simple to implement
Cache only contains data that’s actually requested
No pre-warming step needed (a cold cache just means more misses at first)
Easy to reason about

Cons:

Every cache miss pays the full database latency, and concurrent misses on a hot key all hit the database at once (cache stampede)
Cache and database can temporarily diverge (eventual consistency)
Three network round-trips on cache miss (check, read, write)

Use this when: reads dominate your workload and you can live with brief inconsistency.
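One weakness of plain cache-aside is that a missing popular key sends every concurrent request to the database at once. A common mitigation is to let one caller per key do the load while the others wait. The sketch below rests on several assumptions. The cache is an in-process dict with no TTL. `load_from_db` is a hypothetical function. The per-key lock table grows without bound. Read it as an illustration of the idea, not production code.

```python
import threading
from typing import Any, Callable, Dict

_MISSING = object()  # sentinel so cached None values still count as hits


class CoalescingCacheAside:
    """Cache-aside over an in-process dict with one lock per key, so
    concurrent misses for the same key trigger a single database load."""

    def __init__(self, load_from_db: Callable[[str], Any]):
        self.load_from_db = load_from_db  # hypothetical database accessor
        self.cache: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        value = self.cache.get(key, _MISSING)
        if value is not _MISSING:
            return value                          # hit: no locking needed
        with self._lock_for(key):
            value = self.cache.get(key, _MISSING)
            if value is not _MISSING:
                return value                      # another thread loaded it while we waited
            value = self.load_from_db(key)        # only one loader per key at a time
            self.cache[key] = value
            return value

    def invalidate(self, key: str) -> None:
        self.cache.pop(key, None)                 # next read reloads from the database
```

With Redis as the shared cache, the same idea needs a distributed lock or a short-lived "loading" marker key. A per-process lock only coalesces misses within one server.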

Read-Through (Cache Enrichment)

Same idea as cache-aside, but the cache library handles the miss logic for you. You just ask the cache for data; it fetches from the database automatically if needed.

sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: GET user:123
    Cache->>Cache: Check in-memory store

    alt Cache miss
        Cache->>Database: SELECT * FROM users WHERE id = 123
        Database-->>Cache: User data
        Cache->>Cache: Store in memory
    end

    Cache-->>Client: User data

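Core Redis has no built-in loader, so read-through on top of Redis usually means a thin wrapper that owns the database lookup:

import json

class ReadThroughCache:
    """Wraps Redis so callers never talk to the database directly."""

    def __init__(self, redis, loader, ttl=3600):
        self.redis = redis
        self.loader = loader  # called with the key on a miss
        self.ttl = ttl

    def get(self, key):
        cached = self.redis.get(key)
        if cached is not None:
            return json.loads(cached)

        # Miss: the cache layer, not the caller, loads and stores the value
        value = self.loader(key)
        self.redis.setex(key, self.ttl, json.dumps(value))
        return value

# Application code only asks the cache
users = ReadThroughCache(redis, lambda key: db.query(
    "SELECT * FROM users WHERE id = ?", key.split(":")[1]))
user = users.get("user:123")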

Most caching libraries (like Spring Cache, Django’s cache framework, Go’s groupcache) implement read-through natively:

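// Using groupcache (read-through): the getter runs only on a miss. There is no TTL
// or delete, so it suits immutable data. loadUserJSON is a hypothetical helper.
var users = groupcache.NewGroup("users", 64<<20, groupcache.GetterFunc(
    func(ctx context.Context, key string, dest groupcache.Sink) error {
        b, err := loadUserJSON(ctx, key)
        if err != nil {
            return err
        }
        return dest.SetBytes(b)
    }))
func getUser(ctx context.Context, userID int64) (*User, error) {
    var data []byte
    key := fmt.Sprintf("user:%d", userID)
    if err := users.Get(ctx, key, groupcache.AllocatingByteSliceSink(&data)); err != nil {
        return nil, err
    }
    var user User
    err := json.Unmarshal(data, &user)
    return &user, err
}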

Pros:

Cleaner application code
Concurrent misses for the same key can be collapsed into one database load (groupcache does this)
Cache handles the fetch-and-store atomically

Cons:

Less control over cache logic
Requires a cache layer or library that supports loader functions
Can mask cache behavior from developers

Use this when: you want caching to be infrastructure, not application logic.

Write Caching Patterns

Write caching patterns describe how data flows from application to cache on write operations.

Write-Through

Every write goes to cache and database together. The operation doesn’t return until both succeed.

sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: SET user:123
    Cache->>Database: UPDATE users SET ...
    Database-->>Cache: OK

    Cache-->>Client: OK

Implementation:

import json

def update_user(user_id, data):
    cache_key = f"user:{user_id}"

    # Write to the database first; if this fails, the cache is not touched
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Then write through to the cache so the next read is a hit
    redis.setex(cache_key, 3600, json.dumps(data))

    return data

Pros:

Cache and database stay in sync on the normal write path (a failure between the two writes can still leave them apart)
Cache is always warm with latest data
No cache invalidation logic needed

Cons:

Write latency increases (two writes instead of one)
Write-heavy workloads churn the cache with constant updates
Cache might be populated with data that’s never read

Use this when: consistency matters more than write speed and your writes are infrequent relative to reads.
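The ordering of the two writes is what keeps write-through safe. Below is a minimal sketch that uses plain dicts as stand-ins for the database and Redis; that substitution is an assumption made so the example runs on its own. It shows that a failed database write never leaves a value in the cache.

```python
from typing import Any, MutableMapping


class WriteThroughStore:
    """Write-through over two dict-like stores (stand-ins for a database and
    a cache). The database write happens first; the cache is updated only
    after it succeeds, so a failed write never leaves a value in the cache
    that the database does not have."""

    def __init__(self, db: MutableMapping[str, Any], cache: MutableMapping[str, Any]):
        self.db = db
        self.cache = cache

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value     # if this raises, the cache is left untouched
        self.cache[key] = value  # mirror the committed value into the cache

    def read(self, key: str) -> Any:
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]     # e.g. data written before the cache was deployed
        self.cache[key] = value
        return value
```

The reverse failure can still happen: the database commit succeeds and the cache write fails, leaving the old value cached. This is why a TTL is still worth setting on write-through entries.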

Write-Behind (Write-Back)

You write to the cache and it batches the database writes to happen later, in the background.

sequenceDiagram
    participant Client
    participant Cache
    participant Database
    participant WriteBuffer

    Client->>Cache: SET user:123
    Cache->>WriteBuffer: Queue write
    Cache-->>Client: OK (fast)

    Note over WriteBuffer: Background worker

    WriteBuffer->>Database: Batch UPDATE
    Database-->>WriteBuffer: OK

Implementation:

import asyncio
import json

class WriteBehindCache:
    def __init__(self, redis, db, batch_size=100, flush_interval=1.0):
        self.redis = redis
        self.db = db
        # Latest pending value per key: repeated writes to a key coalesce
        self.pending = {}
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self._task = None

    def start(self):
        # create_task needs a running event loop, so call this from async code
        self._task = asyncio.create_task(self._flush_loop())

    async def set(self, key, value):
        self.redis.setex(key, 3600, json.dumps(value))
        self.pending[key] = value

        if len(self.pending) >= self.batch_size:
            await self._flush()

    async def _flush(self):
        if not self.pending:
            return

        batch = list(self.pending.items())[:self.batch_size]
        for key, _ in batch:
            del self.pending[key]

        # Write the batch to the database (one statement per row here)
        try:
            for key, value in batch:
                self.db.execute(
                    "UPDATE users SET ... WHERE id = ?",
                    value['id'],
                    value
                )
        except Exception:
            # Requeue the batch unless a newer value for the key arrived meanwhile
            for key, value in batch:
                self.pending.setdefault(key, value)
            raise

    async def _flush_loop(self):
        while True:
            await asyncio.sleep(self.flush_interval)
            try:
                await self._flush()
            except Exception:
                pass  # Failed rows were requeued; retry on the next tick

Pros:

Very low write latency
Batching reduces database load
Cache handles burst writes gracefully

Cons:

Risk of data loss if cache fails before flush
Complexity in handling partial failures
Cache and database can significantly diverge
Harder to debug (writes happen asynchronously)

Use this when: you’re collecting metrics or events and losing a few writes won’t ruin your day.

Refresh-Ahead (Proactive Caching)

The cache automatically refreshes entries before they expire. Popular data stays warm, so requests for those keys rarely hit a cache miss.

sequenceDiagram
    participant Cache
    participant Database
    participant Refresher

    Note over Cache: Entry TTL = 300s

    Refresher->>Cache: Check TTL
    Refresher->>Database: SELECT (background)
    Refresher->>Cache: SET (reset TTL)

    loop Every 10 seconds
        Refresher->>Cache: Check popular entries
        Refresher->>Database: Refresh if TTL < 60s
    end

Implementation:

import json
import time
from threading import Thread

class RefreshAheadCache:
    def __init__(self, redis, db, ttl=300, refresh_threshold=0.8):
        self.redis = redis
        self.db = db
        self.ttl = ttl
        # Refresh once this fraction of the TTL has elapsed (0.8 * 300s = 240s)
        self.refresh_threshold = refresh_threshold
        self.popular_keys = set()

        # Background refresher thread; daemon so it won't block interpreter exit
        self.running = True
        self.thread = Thread(target=self._refresh_loop, daemon=True)
        self.thread.start()

    def track_access(self, key):
        """Track frequently accessed keys"""
        self.popular_keys.add(key)

    def _load(self, key):
        data = self.db.query(
            "SELECT * FROM users WHERE id = ?",
            key.split(':')[1]
        )
        self.redis.setex(key, self.ttl, json.dumps(data))
        return data

    def get(self, key):
        self.track_access(key)
        value = self.redis.get(key)
        if value is not None:
            return json.loads(value)
        return self._load(key)  # Cold miss: load synchronously

    def _should_refresh(self, key):
        """True once less than (1 - threshold) of the TTL remains"""
        ttl = self.redis.ttl(key)  # -2 = missing, -1 = no expiry
        return 0 < ttl < self.ttl * (1 - self.refresh_threshold)

    def _refresh_loop(self):
        while self.running:
            for key in list(self.popular_keys):
                if self._should_refresh(key):
                    self._load(key)  # Reload before the entry expires
            time.sleep(10)  # Check every 10 seconds

Pros:

Eliminates cache miss latency for popular items
Users never wait for cache to repopulate
Smoother performance under varying loads

Cons:

Wasted resources refreshing items not actually needed
Complexity in tracking truly popular keys
A refresh that reads the database just before an update can write the old value back into the cache after the update
Additional logic to determine refresh threshold

Use this when: you have a known set of hot data and read latency matters more than wasted cycles.

Topic-Specific Deep Dives

These sections dig into specific aspects of caching implementation and operations.

Memcached vs Redis: Making the Choice

Both Memcached and Redis serve as distributed caching layers, but they target different use cases. Understanding the trade-offs helps you pick the right tool.

Feature	Memcached	Redis
Data structures	Key-value only	Strings, hashes, lists, sets, sorted sets, streams
Persistence	None (memory-only)	Optional RDB snapshots + AOF
Replication	Not built-in	Master-replica replication
Clustering	Client-side (usually consistent hashing)	Native cluster mode (hash slots)
Eviction policies	LRU only (segmented LRU in recent versions)	Configurable via maxmemory-policy: LRU, LFU, random, TTL-based, or no eviction
Memory efficiency	Simple slab allocator	More overhead per key
Use when	Simple key-value caching (common in PHP stacks)	Complex data, pub/sub, sorted sets, need persistence

Memcached excels at: simple key-value caching where you just need to store serialized objects. Its memory efficiency and simplicity make it ideal for basic cache-aside patterns. Many PHP applications and frameworks default to Memcached for this reason.

Redis excels at: caching that requires data structures (like leaderboards with sorted sets, pub/sub for cache invalidation, or stream-based event queuing). Its built-in clustering and replication mean less infrastructure to build and run yourself.

# Memcached: simple get/set
memcached.set(key, value, expire=3600)
value = memcached.get(key)

# Redis: richer operations
redis.set(key, value, ex=3600)
redis.zadd("leaderboard", {user_id: score})  # Sorted set for rankings
redis.publish("invalidate", key)  # Pub/sub for coordinated invalidation

For most caching scenarios, Redis wins because it reduces the number of systems you need to operate. But if you have a pure read-heavy workload with simple key-value requirements and memory efficiency is critical, Memcached is a valid choice.

When NOT to Cache

Caching is not always the answer. Here are scenarios where the complexity outweighs the benefits.

Do not cache when:

Data changes on every request (no repeat reads to benefit from)
Cache would consume more memory than the database itself (full table caching)
Consistency requirements preclude staleness (financial transactions)
Your database already handles your load comfortably (add complexity only when needed)
Data is unique per request and never repeated (results of one-off queries whose parameters never recur)

Signs your cache is not helping:

Hit rate below 50% despite tuning
Cache memory pressure causes constant evictions
You spend more time managing cache invalidation than writing application code
Cache failures cause more production incidents than database failures

Treat caching as a performance optimization, not an architectural necessity. If your database handles your load without caching, keep it simple.

CDN Caching for Static Assets

CDNs sit at the edge of your infrastructure, caching content close to users. Unlike application caches that handle dynamic data, CDNs typically handle static assets: images, CSS, JavaScript, fonts, videos.

CDN caching strategies differ from application caching:

Aspect	Application Cache (Redis/Memcached)	CDN
Content type	Dynamic data, API responses	Static files (images, JS, CSS)
TTL range	Seconds to hours	Minutes to years
Invalidation	Event-driven or TTL	Purge API or TTL expiry
Cache key	Data-specific (user:123:profile)	URL-based (/assets/logo.png)
Geographic distribution	Limited to cache cluster location	Global PoPs near users

Cache-Control directives every developer should know:

Immutable, versioned assets (cache forever, change the URL on deploy, e.g. /app.abc123.js)

Cache-Control: public, max-age=31536000, immutable

No caching (sensitive content)

Cache-Control: no-store, private

Stale-while-revalidate (serve stale, update in background)

Cache-Control: public, max-age=3600, stale-while-revalidate=86400
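Versioned names like /app.abc123.js are normally generated at build time by hashing the file contents. When the bytes change, the name changes, and the old cached copies simply stop being requested. Below is a minimal sketch of that idea. It assumes SHA-256 truncated to 8 hex characters, an arbitrary choice that no CDN requires; `versioned_name` is a hypothetical helper, not a build-tool API.

```python
import hashlib
from pathlib import Path


def versioned_name(path: Path, digest_len: int = 8) -> str:
    """Return a content-addressed file name, e.g. app.js -> app.<hash>.js.

    The hash changes whenever the file's bytes change, so the new URL
    bypasses every cached copy of the old one without a CDN purge.
    """
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_len]
    return f"{path.stem}.{digest}{path.suffix}"
```

Files named this way can safely get `max-age=31536000, immutable`. The HTML that references them should use a short TTL, because it has to change on every deploy.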


CDN invalidation pitfalls:

Purge is not always instant. Propagation time varies by provider, from seconds to several minutes.
Tag-based purging (cache tags or surrogate keys) helps, but not every CDN supports it.
Versioned URLs (e.g., /app.abc123.js) beat cache invalidation for JavaScript/CSS updates.

```html
<!-- Versioned asset: cache forever, change URL on deploy -->
<script src="/app.abc123.js"></script>

<!-- vs -->

<!-- Unversioned: requires CDN purge on every deploy -->
<script src="/app.js"></script>
```

When CDN alone is not enough: CDNs excel at caching static assets with long TTLs. But for dynamic content that changes frequently (like a news homepage), CDNs need help. A common pattern is to cache at the edge and purge from the application when origin data changes, using surrogate keys or cache tags so that one purge call clears every page depending on the changed record. Edge logic such as Cloudflare Workers or Fastly VCL can further customize how individual requests are cached.

Choosing the Right Strategy

Strategy	Read Performance	Write Performance	Consistency	Complexity
Cache-Aside	Good (after miss)	Same as DB (write + delete)	Eventual	Low
Read-Through	Good	Same as DB	Eventual	Low
Write-Through	Good	Slower (two writes)	Strong on the write path	Medium
Write-Behind	Good	Best	Eventual	High
Refresh-Ahead	Best	Same as DB	Eventual (bounded by refresh interval)	High

How to decide

Which latency matters more, reads or writes?

Reads: cache-aside, read-through, or refresh-ahead
Writes: write-behind (write-through adds write latency rather than removing it)

How synced do cache and database need to be?

Tight consistency: write-through
Eventual is fine: cache-aside or write-behind

What happens if the cache goes down before flushing?

Can’t lose writes: write-through
A few lost writes are okay: write-behind

Is access predictable?

Unpredictable: cache-aside
Known hot set: refresh-ahead

Cache Invalidation and TTL

Cache invalidation is the hardest part of caching, and TTL choice is the other half of the problem: it sets the trade-off between freshness and efficiency.

Time-Based Invalidation (TTL)

The simplest approach — entries expire after a fixed duration.

# TTL-based invalidation
redis.setex(cache_key, 3600, value)  # 1 hour TTL

When to use: Data that naturally becomes stale over time (user profiles, product prices, news articles).

Limitation: You must choose a TTL that balances freshness against load. Too short = cache thrashing. Too long = stale data.

Event-Based Invalidation (Cache Eviction on Write)

When data changes in the database, explicitly remove or update the corresponding cache entry.

def update_user(user_id, data):
    # Update database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Invalidate cache entry
    redis.delete(f"user:{user_id}")

    # Optionally, immediately repopulate with fresh data
    fresh_user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(fresh_user))

When to use: When readers must see the new value on their next read. Deleting alone is the cache-aside write path; repopulating immediately makes it behave like write-through.

Limitation: Requires your application to remember to invalidate on every write path. Miss one path, and you have stale data.

Event-Driven Invalidation (Pub/Sub)

Use a message queue or pub/sub system to propagate invalidation events across all cache nodes.

# Publisher: when data changes
def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Publish invalidation event
    redis.publish("cache:invalidate", f"user:{user_id}")

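# Subscriber: on each application server, evict from the in-process cache
def subscribe_invalidation(local_cache):
    pubsub = redis.pubsub()
    pubsub.subscribe("cache:invalidate")

    for message in pubsub.listen():
        if message["type"] == "message":
            cache_key = message["data"]
            if isinstance(cache_key, bytes):  # redis-py returns bytes by default
                cache_key = cache_key.decode()
            local_cache.pop(cache_key, None)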

When to use: Multi-server deployments where cache lives on application servers (local caches) and you need coordinated invalidation across all nodes.

Limitation: Event delivery is not guaranteed. Subscribers might miss messages during restarts. Always combine with TTL as a safety net.

Hybrid Approach: TTL + Event Invalidation

The most robust strategy combines TTL (safety net) with event invalidation (specificity).

def get_user(user_id):
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Set TTL as safety net (e.g., 1 hour)
    redis.setex(cache_key, 3600, json.dumps(user))

    return user

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Delete the shared entry, then tell every server to drop local copies
    redis.delete(f"user:{user_id}")
    redis.publish("cache:invalidate", f"user:{user_id}")

    # TTL still acts as safety net if event is missed

This approach handles: (1) missed invalidation events via TTL, (2) immediate consistency when events fire, and (3) cache recovery after failures.

Cache Invalidation in Distributed Systems

In distributed setups, invalidation gets harder: one logical entity can be spread across many keys on different nodes, and per-server local caches may hold copies of their own.

# Using SCAN to find and delete every key matching a pattern
def invalidate_keys_matching(key_pattern):
    """
    Delete all keys matching a pattern on the Redis node behind `redis`.
    In Redis Cluster, run this against each primary node.
    Use with caution: SCAN walks the entire keyspace.
    """
    for key in redis.scan_iter(match=key_pattern, count=100):
        # One key per call avoids CROSSSLOT errors in Redis Cluster;
        # UNLINK frees memory in the background
        redis.unlink(key)

# Example: invalidate all user session keys for a specific user
invalidate_keys_matching(f"session:*:user:{user_id}")

If you already know the exact keys, delete them directly instead of scanning. Memcached clients route each key with consistent hashing and Redis Cluster maps each key to a hash slot, so the client can send the delete straight to the node that owns it.

TTL Selection Guide

Choosing the right TTL is a balancing act between data freshness, cache efficiency, and database load.

TTL Selection Framework

Ask these questions to determine appropriate TTLs:

1. How stale can this data be?

Data Type	Staleness Tolerance	Suggested TTL
Real-time prices	Seconds	30-60 seconds
Social media posts	Minutes	5-15 minutes
User profiles	Minutes to hours	15-60 minutes
Product catalog	Hours	1-24 hours
Static config	Hours to days	1-24 hours
Session data	Duration of session	24 hours

2. What is the cost of a cache miss vs stale data?

Miss-costly, stale-tolerant: longer TTLs work fine (config, user preferences)
Miss-costly, stale-intolerant: use shorter TTLs + event invalidation (prices, inventory)
Miss-cheap, stale-tolerant: shorter TTLs are fine (view counts, trending topics)

3. How does access pattern decay?

Data that spikes in popularity then drops off (social posts, news) needs shorter TTLs than evergreen content (documentation, product specs).

TTL Jitter (Preventing Thundering Herds)

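If many entries are written at the same moment with the same TTL (after a deploy or a cache warm-up, for example), they all expire together. The resulting burst of misses hits the database at once, a thundering herd. Add random jitter to TTLs: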

import random

def set_with_jitter(key, value, base_ttl):
    """
    Set cache with randomized TTL to prevent synchronized expiration.
    Jitter is +/- 10% of base TTL.
    """
    jitter = base_ttl * 0.1
    actual_ttl = base_ttl + random.uniform(-jitter, jitter)
    redis.setex(key, int(actual_ttl), value)

# Usage
set_with_jitter("user:123", user_data, base_ttl=3600)  # 3240-3960 seconds

TTL Tiering

For the same data, consider storing two copies at different TTLs: a short-TTL copy for normal reads and a long-TTL copy to fall back on when the database is unavailable:

import json

def cache_user_profile(user_id):
    cache_key = f"user:{user_id}"

    # Fresh copy: short TTL, served whenever it exists
    fresh = redis.get(f"{cache_key}:fresh")
    if fresh is not None:
        return json.loads(fresh)

    try:
        user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    except Exception:
        # Database unavailable: fall back to the long-lived stale copy
        stale = redis.get(f"{cache_key}:stale")
        if stale is not None:
            return json.loads(stale)
        raise

    # Refresh both copies; the stale copy always holds the last good value
    payload = json.dumps(user)
    redis.setex(f"{cache_key}:fresh", 300, payload)    # 5 min
    redis.setex(f"{cache_key}:stale", 86400, payload)  # 24 hours
    return user

Dynamic TTL Based on Data Characteristics

Some data has variable freshness based on its nature. Use dynamic TTLs:

def get_dynamic_ttl(data_type, data_age_hours=None):
    """
    Return appropriate TTL based on data type and age.
    data_age_hours is optional; None means the age is unknown.
    """
    base_ttls = {
        "breaking_news": 30,      # 30 seconds
        "sports_scores": 60,      # 1 minute
        "product_price": 300,     # 5 minutes
        "blog_post": 1800,        # 30 minutes
        "documentation": 86400,   # 24 hours
    }

    base = base_ttls.get(data_type, 3600)

    # Content under an hour old tends to change quickly (edits, corrections)
    if data_age_hours is not None and data_age_hours < 1:
        return base // 2  # Halve TTL for fresh content
    return base

Distributed Cache Patterns

When a single cache instance cannot handle your load, distribute the cache across multiple nodes.

Consistent Hashing

Consistent hashing maps keys to cache nodes based on key hash values, minimizing remapping when nodes are added or removed.

import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # Virtual nodes per server for better distribution
        self.ring = {}
        self.sorted_keys = []

        for node in nodes:
            self._add_node(node)

    @staticmethod
    def _hash(value):
        return hashlib.md5(value.encode()).hexdigest()

    def _add_node(self, node):
        for i in range(self.vnodes):
            key = self._hash(f"{node}:{i}")
            self.ring[key] = node
            bisect.insort(self.sorted_keys, key)

    def get_node(self, key):
        # Binary search for the first ring position >= the key's hash,
        # wrapping around to the start of the ring
        idx = bisect.bisect_left(self.sorted_keys, self._hash(key))
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(["cache-1", "cache-2", "cache-3"])
node = ch.get_node("user:123")  # Always returns same node for same key

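When you add or remove a node, only about K/n keys remap (where K is total keys, n is nodes), compared with most keys under naive hash % n placement. This limits the burst of misses, and the risk of a cache stampede, during scaling events.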
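The K/n claim is easy to check empirically. The sketch below re-creates the ring in self-contained form; `Ring`, `remapped_fraction`, and `modulo_remapped_fraction` are illustrative names, not a library API. It adds a fourth node to a three-node ring and compares the share of keys that move against naive `hash % n` placement.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List, Tuple


def _h(value: str) -> str:
    return hashlib.md5(value.encode()).hexdigest()


class Ring:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.owner: Dict[str, str] = {}
        self.points: List[str] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _h(f"{node}:{i}")
            self.owner[point] = node
            bisect.insort(self.points, point)

    def get_node(self, key: str) -> str:
        idx = bisect.bisect_left(self.points, _h(key))
        return self.owner[self.points[idx % len(self.points)]]


def remapped_fraction(nodes: List[str], new_node: str,
                      n_keys: int = 10_000) -> Tuple[float, bool]:
    """Fraction of keys that change owner when new_node joins, and whether
    every moved key went to the new node."""
    ring = Ring(nodes)
    keys = [f"user:{i}" for i in range(n_keys)]
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node(new_node)
    moved = [k for k in keys if ring.get_node(k) != before[k]]
    only_to_new = all(ring.get_node(k) == new_node for k in moved)
    return len(moved) / n_keys, only_to_new


def modulo_remapped_fraction(old_n: int, new_n: int, n_keys: int = 10_000) -> float:
    """Same measurement for naive hash(key) % n placement."""
    moved = 0
    for i in range(n_keys):
        h = int(_h(f"user:{i}"), 16)
        moved += (h % old_n) != (h % new_n)
    return moved / n_keys
```

Going from 3 to 4 nodes, roughly a quarter of the keys should move under consistent hashing, all of them to the new node. With modulo placement, about three quarters move. Exact numbers depend on the hash function and the number of virtual nodes.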

Cache Sharding by Entity

Instead of hashing each full key, hash only the entity (type plus ID) so that all keys for one entity land on the same shard.

import hashlib

NUM_SHARDS = 4       # example value
# redis_shards: dict mapping "cache-shard-N" to a Redis client (assumed to exist)

def get_shard(cache_key):
    """
    Shard by entity so related keys stay together:
    "user:123:profile" and "user:123:prefs" both hash "user:123".
    """
    entity = ":".join(cache_key.split(":")[:2])

    # Hash the entity for distribution across shards
    entity_hash = hashlib.md5(entity.encode()).hexdigest()

    # Map to shard
    shard_index = int(entity_hash, 16) % NUM_SHARDS
    return f"cache-shard-{shard_index}"

# Route requests to appropriate shard
def cache_get(cache_key):
    shard = get_shard(cache_key)
    return redis_shards[shard].get(cache_key)

All data for a single user (profile, preferences, history) lives in the same shard, which makes multi-key operations and pipelining straightforward, while different users still spread across shards. Because the mapping uses modulo, changing NUM_SHARDS remaps most entities; use consistent hashing on the entity instead if the shard count will change.

Replication with Read Replicas

For read-heavy workloads, add replica nodes that handle read traffic while the primary handles writes.

import random

# Write to the primary; Redis replication copies the write to the
# replicas asynchronously (replicas are read-only by default)
def cache_set(key, value):
    redis_primary.setex(key, 3600, value)

# Read from replica (randomly selected)
def cache_get(key):
    replica = random.choice(redis_replicas)
    return replica.get(key)

The tradeoff: replicas might lag the primary, serving slightly stale data. For most caching scenarios this is fine.

Multi-Tier Caching

Deploy a local (L1) in-memory cache in front of a distributed (L2) cache.

from threading import Lock

class TwoTierCache:
    def __init__(self, local_cache, redis_cache):
        # e.g. cachetools.TTLCache(maxsize=10_000, ttl=30); a short L1 TTL bounds
        # how long other servers' L1 copies stay stale after an invalidation
        self.local = local_cache
        self.redis = redis_cache
        self.local_lock = Lock()  # cachetools caches are not thread-safe

    def get(self, key):
        # Try L1 first (local, ultra-fast)
        with self.local_lock:
            value = self.local.get(key)
        if value is not None:
            return value

        # L1 miss: try L2 (distributed)
        value = self.redis.get(key)
        if value is not None:
            # Populate L1 for next request
            with self.local_lock:
                self.local[key] = value
            return value

        return None

    def set(self, key, value, ttl=3600):
        # Write to both tiers
        self.redis.setex(key, ttl, value)
        with self.local_lock:
            self.local[key] = value

    def invalidate(self, key):
        # Clears L2 and this server's L1 only; other servers' L1 copies expire
        # via their TTL or a pub/sub invalidation message
        self.redis.delete(key)
        with self.local_lock:
            self.local.pop(key, None)

YouTube’s serving stack is often described as following this pattern: an L1 per-machine cache handles the ultra-hot set, an L2 distributed cache handles warm data, and the database handles cold data.

Case Study: YouTube’s Cache Hierarchy

YouTube’s caching infrastructure is one of the most studied in the industry. Their approach uses multiple cache layers: L1 (in-memory, per-machine), L2 (distributed cache), and CDN at the edge.

YouTube’s L1 cache is a small in-memory cache on each application server. It handles the most frequently accessed items, such as popular videos and trending content. L1 hit rate alone is often 50-60% because requests routed to the same server frequently ask for the same popular content.

The L2 distributed cache (originally Memcached, later moved to custom infrastructure) handles cache misses from L1. L2 is sharded across many machines to provide petabyte-scale capacity. Cache misses from L2 go to storage (BigTable).

The CDN handles the edge, serving popular content from points of presence close to users. YouTube’s CDN cache hit rate is over 90% for video streaming — once a video becomes popular, it propagates to CDN PoPs and subsequent requests rarely hit origin.

The lesson: YouTube does not rely on a single cache tier. They use L1 to handle the ultra-hot set with extremely low latency, L2 for the warm cache, and CDN for the long tail of popular-but-not-ultra-popular content. Most companies should design for two tiers (local cache + distributed cache) before adding a CDN.

Case Study: Twitter’s Cache Warming Strategy

Twitter has a unique caching challenge: events (tweets, likes, follows) have a short window of high read traffic, then traffic drops off a cliff. A tweet from a celebrity gets millions of reads in the first hour, then readership drops to hundreds per day.

Twitter’s solution is aggressive cache warming: when a tweet is published, Twitter pushes it into the timelines of active followers’ caches rather than waiting for cache misses. This is the fanout-on-write pattern — write to caches at publish time rather than computing at read time.

The tradeoff is write amplification. Every tweet from a celebrity with 10 million followers would require 10 million cache writes. Twitter manages this by fanning out only to active users, whose timelines are kept in cache; inactive users’ timelines are computed on read from the authors’ tweet stores. Accounts with very large follower counts are handled with a hybrid push/pull model: their tweets are not fanned out, but merged into followers’ timelines at read time.
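Here is a toy model of fanout-on-write, limited to active followers. It is an illustration of the pattern, not Twitter's code. It rests on several assumptions. Post IDs increase over time, so sorting by ID gives recency. All data lives in memory. `TimelineCache` and its methods are hypothetical names. The large-account pull path is left out.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set


class TimelineCache:
    """Toy fanout-on-write: a new post is pushed into the cached timelines
    of active followers only; other followers build their timeline on read."""

    def __init__(self, followers: Dict[str, List[str]], active: Set[str],
                 max_len: int = 800):
        self.followers = followers                          # author -> follower ids
        self.following: Dict[str, List[str]] = defaultdict(list)  # follower -> authors
        for author, fs in followers.items():
            for f in fs:
                self.following[f].append(author)
        self.active = active
        self.posts: Dict[str, List[int]] = defaultdict(list)  # author -> post ids (store of record)
        self.timelines: Dict[str, Deque[int]] = defaultdict(lambda: deque(maxlen=max_len))

    def publish(self, author: str, post_id: int) -> int:
        """Store the post and return how many cache writes the fanout cost."""
        self.posts[author].append(post_id)
        writes = 0
        for f in self.followers.get(author, []):
            if f in self.active:
                self.timelines[f].appendleft(post_id)  # newest first
                writes += 1
        return writes

    def read(self, user: str, limit: int = 20) -> List[int]:
        if user in self.active:
            return list(self.timelines[user])[:limit]  # precomputed at write time
        # Inactive user: compute on read from the authors' post stores
        merged = [p for a in self.following[user] for p in self.posts[a]]
        return sorted(merged, reverse=True)[:limit]
```

The return value of `publish` makes the write amplification visible: it grows with the number of active followers. The `maxlen` on each cached timeline caps memory per user.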

The operational lesson: cache warming trades write amplification for read latency. For content with rapid decay in read traffic (news, social posts, live events), warming the cache at write time reduces read latency at the cost of higher write overhead. For evergreen content, cache-aside with long TTLs is simpler and more efficient.

Capacity Estimation: Cache Size vs Hit Rate

The relationship between cache size and hit rate is not linear. Adding more cache memory gives diminishing returns beyond a certain point.

The working set model: your hit rate depends on how much of your frequently accessed data fits in cache. Suppose 80% of your requests hit 20% of your data and that 20% fits in cache. Those keys alone give you roughly an 80% hit rate, and each additional slice of the colder data adds less. If access is uniformly distributed, the hit rate is roughly the fraction of the data that fits, so even a large cache provides modest hit rates.

The formula for estimating required cache size: working_set_bytes = unique_keys_per_second × avg_value_size × avg_ttl_seconds. Suppose 10,000 new unique keys are cached per second, the average value is 1 KB, and you want a 5-minute TTL window. Your working set is then 10,000 × 1,000 bytes × 300 s = 3 GB minimum for a fully utilized cache before evictions. In practice, you need 1.5-2x that because LRU/LFU policies do not perfectly track the working set.
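The same arithmetic works as a small helper, assuming the inputs are averages over the TTL window. The function name and the default `overhead_factor` of 1.5 (the low end of the 1.5-2x range) are illustrative choices. Per-key memory overhead inside Redis or Memcached is not modeled.

```python
def estimate_cache_bytes(new_keys_per_sec: float, avg_value_bytes: float,
                         ttl_seconds: float, overhead_factor: float = 1.5) -> dict:
    """Working-set estimate plus headroom for imperfect eviction."""
    working_set = new_keys_per_sec * avg_value_bytes * ttl_seconds
    return {
        "working_set_bytes": working_set,
        "provisioned_bytes": working_set * overhead_factor,
    }
```

For example, `estimate_cache_bytes(10_000, 1_000, 300)` reproduces the 3 GB working set and suggests provisioning 4.5 GB.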

The hit rate curve: start at 0% hit rate with no cache, rapid climb as cache grows to cover the hot working set, then diminishing returns as cache size exceeds working set. Plot your hit rate against cache size to find the knee of the curve — the point where adding more cache stops helping significantly. This is your target cache size.
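One way to find the knee without production experiments is to replay a request trace through an LRU cache at several sizes. The sketch below makes a few assumptions. It uses a synthetic Zipf-like trace as a stand-in for real access logs; replay your own logs for real sizing. It counts capacity in entries rather than bytes. It relies on a property of LRU: for a fixed trace, the hit rate never decreases as capacity grows.

```python
import random
from collections import OrderedDict
from typing import Dict, List


def lru_hit_rate(trace: List[int], capacity: int) -> float:
    """Replay a key trace through an LRU cache holding `capacity` entries."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(trace)


def skewed_trace(n_keys: int, n_requests: int, skew: float = 1.0, seed: int = 42) -> List[int]:
    """Zipf-like trace: key i is requested with weight 1 / (i + 1) ** skew."""
    rng = random.Random(seed)
    weights = [1 / (i + 1) ** skew for i in range(n_keys)]
    return rng.choices(range(n_keys), weights=weights, k=n_requests)


def hit_rate_curve(trace: List[int], capacities: List[int]) -> Dict[int, float]:
    return {c: lru_hit_rate(trace, c) for c in capacities}
```

Printing `hit_rate_curve(skewed_trace(10_000, 50_000), [10, 100, 1_000, 5_000])` shows the shape described above: large gains at small sizes and smaller gains per added entry as the cache grows.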

For cache-aside specifically, the miss penalty matters more than raw hit rate. A cache miss does a full database round-trip. If your database latency is 10ms and cache latency is 0.5ms, each miss costs 9.5ms extra. At 99% hit rate, only 1% of requests pay the miss penalty. At 95% hit rate, 5% pay it — a 5x increase in slow queries.
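This paragraph's model charges a hit the cache latency and a miss the database latency, which gives a simple average-latency formula. Modeling a miss as a cache check plus a database query would add `cache_ms` to every miss and shift the numbers slightly. The function name is illustrative.

```python
def expected_latency_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency when hits cost cache_ms and misses cost db_ms."""
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms
```

With the numbers above, a 99% hit rate averages about 0.6 ms and a 95% hit rate about 1.0 ms. The average rises by roughly 64%, while the count of slow requests rises 5x.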

Monitoring and Operations

Observability Checklist

Monitor these metrics and set up alerts for production cache health.

Metrics to Track

Hit Rate: hits / (hits + misses) - should stay above 80-90% for well-tuned caches
Memory Usage: used_memory / maxmemory - alert at 70%, critical at 80%
Eviction Count: evicted_keys - indicates memory pressure
Connection Count: connected_clients - sudden drops indicate connection issues
Command Latency: P50, P95, P99 for GET/SET operations
Replication Lag: For replicated setups, lag should stay below 100ms
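The hit-rate and memory ratios can be computed directly from Redis INFO output. redis-py's `info("stats")` and `info("memory")` return dicts that include `keyspace_hits`, `keyspace_misses`, `used_memory`, and `maxmemory`. The helpers below are hypothetical names that take such dicts, so they can be tested without a server.

```python
from typing import Mapping, Optional


def hit_rate(stats: Mapping[str, int]) -> Optional[float]:
    """Hit rate from INFO stats. The counters are cumulative since startup
    (or CONFIG RESETSTAT), so this is a lifetime average."""
    total = stats["keyspace_hits"] + stats["keyspace_misses"]
    return stats["keyspace_hits"] / total if total else None


def hit_rate_between(before: Mapping[str, int], after: Mapping[str, int]) -> Optional[float]:
    """Hit rate over an interval, from two INFO stats samples."""
    hits = after["keyspace_hits"] - before["keyspace_hits"]
    misses = after["keyspace_misses"] - before["keyspace_misses"]
    return hits / (hits + misses) if hits + misses else None


def memory_usage(memory: Mapping[str, int]) -> Optional[float]:
    """used_memory / maxmemory; None when maxmemory is 0 (no limit set)."""
    return memory["used_memory"] / memory["maxmemory"] if memory["maxmemory"] else None
```

Usage with a live server would look like `hit_rate(r.info("stats"))` and `memory_usage(r.info("memory"))`, where `r` is a `redis.Redis` client. For alerting, prefer the interval version, because a lifetime average hides recent drops.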

Logs to Capture

# Log cache operations for debugging
import json
import time

import structlog
logger = structlog.get_logger()

def get_user(user_id):
    cache_key = f"user:{user_id}"
    start = time.perf_counter()

    cached = redis.get(cache_key)
    if cached:
        logger.info("cache_hit", key=cache_key, latency_ms=(time.perf_counter() - start) * 1000)
        return json.loads(cached)

    # Misses are normal; log them at info level and alert on the hit-rate metric
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    logger.info("cache_miss", key=cache_key, latency_ms=(time.perf_counter() - start) * 1000)

    return user

Alert Rules

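# Prometheus alerting rules for Redis (metric names as exported by redis_exporter)
groups:
  - name: redis-cache
    rules:
      - alert: CacheHitRateLow
        # Hit/miss counters are cumulative, so compare rates over a window
        expr: |
          rate(redis_keyspace_hits_total[5m])
            / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 80%"

      - alert: CacheMemoryHigh
        expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) > 0.7
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache memory above 70% capacity"

      - alert: CacheMemoryExhausted
        # "> 0" drops instances without maxmemory instead of dividing by zero
        expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Cache memory above 80% capacity"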

Security Checklist

Cache security is often overlooked until a breach happens.

Never expose Redis/Memcached directly to the internet - Bind to localhost or private network only
Use authentication - Redis requirepass or Memcached SASL authentication
Enable TLS - For connections crossing network boundaries
Validate key namespaces - Use prefixes like app:env:table: to prevent key collisions
Sanitize cache keys - User input should never become cache keys without validation
Implement rate limiting - Prevent cache exhaustion attacks
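Audit cache access - Log who accessed what, especially for any sensitive data you must cache (such as session data)
Keep sensitive data out of the cache - Passwords, payment info, and raw PII should never enter the cache; if you cache session data, keep TTLs short and restrict access to that key namespace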

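# Redis secure configuration (redis.conf; port 0 disables plaintext, cert paths are placeholders)
bind 127.0.0.1 -::1
requirepass your-strong-password-here
port 0
tls-port 6379
tls-cert-file /etc/redis/tls/redis.crt
tls-key-file /etc/redis/tls/redis.key
tls-ca-cert-file /etc/redis/tls/ca.crt
tls-replication yes
tls-auth-clients no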

Trade-off Analysis

Understanding the trade-offs between caching strategies helps you make informed decisions for your specific use case.

Production Failure Scenarios

Understanding what fails and how to recover is critical for production caching systems.

Cache Node Failure

When a cache node goes down, every request for the keys it held falls through to the database. With a single cache node that is all traffic, which can cascade into a database failure.

Mitigation:

Use connection pooling with automatic retry
Implement circuit breaker pattern
Use replica nodes for read failover
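The circuit breaker item can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions: `CircuitBreaker`, `get_with_fallback`, and the thresholds are hypothetical, and the cache and database are passed in as callables so the sketch runs without Redis. In production you would narrow the `except` clause to your client's connection and timeout errors (redis-py raises `redis.exceptions.ConnectionError` and `redis.exceptions.TimeoutError`).

```python
import time


class CircuitBreaker:
    """After `failure_threshold` consecutive failures the circuit opens and callers skip
    the cache for `reset_timeout` seconds. Once the cool-down has elapsed, calls are let
    through again: a failure re-opens the circuit immediately, a success closes it."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def get_with_fallback(key, cache_get, db_load, breaker):
    """Try the cache while the breaker allows it; otherwise go straight to the database."""
    if breaker.allow():
        try:
            value = cache_get(key)
            breaker.record_success()
            if value is not None:
                return value
        except Exception:  # narrow to your cache client's connection/timeout errors
            breaker.record_failure()
    return db_load(key)
```

Failing fast matters more than the exact thresholds: while the circuit is open, requests skip the cache timeout entirely instead of paying it on every call.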

Memory Exhaustion

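When cache memory is exhausted, the outcome depends on the eviction policy. With an evicting policy such as allkeys-lru, Redis evicts aggressively and the hit rate falls as hot keys are pushed out; with the default noeviction policy, writes fail with OOM errors.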

Mitigation:

Monitor memory usage and set appropriate maxmemory limits
Alert at 70% threshold
Implement proper TTL policies

Network Partition

When network connectivity between application and cache fails, requests hang or time out.

Mitigation:

Set reasonable socket timeouts (100-500ms)
Configure fail-fast behavior to fall back to database
Use connection pooling with health checks

Thundering Herd on Restart

When a cache restarts empty, all clients hit the database simultaneously.

Mitigation:

Pre-warm cache on restart
Use staggered TTLs with jitter
Implement request coalescing (semaphores or locks)
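Request coalescing can be sketched in-process with a lock and one event per key, in the spirit of Go's `singleflight` package. `SingleFlight` is a hypothetical helper; it only coalesces within one process, so across many application servers you would still need a distributed lock (for example Redis `SET` with `NX` and `EX`).

```python
import threading


class SingleFlight:
    """Coalesce concurrent loads of the same key: the first caller (the leader) runs
    `loader`; later callers for that key wait and reuse its result or its exception."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}                        # key -> (done event, result holder)

    def do(self, key, loader):
        with self._lock:
            call = self._calls.get(key)
            is_leader = call is None
            if is_leader:
                call = (threading.Event(), {})
                self._calls[key] = call
        done, result = call
        if is_leader:
            try:
                result["value"] = loader()
            except BaseException as exc:        # hand the failure to the waiters too
                result["error"] = exc
            finally:
                with self._lock:
                    del self._calls[key]        # the next miss starts a fresh load
                done.set()
        else:
            done.wait()                         # block until the leader finishes
        if "error" in result:
            raise result["error"]
        return result["value"]
```

On a cache miss you would call something like `flight.do(cache_key, lambda: load_and_populate(user_id))`, where `load_and_populate` is your own database-load-then-`setex` function, so ten simultaneous misses produce one database query.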

Cache Credential Rotation

During credential rotation, brief outages or authentication failures can occur.

Mitigation:

Use connection pooling with lazy reconnection
Rotate credentials during low-traffic windows
Keep two valid credentials during the rotation window (Redis 6+ ACL users can hold more than one password), then remove the old one

Consistency vs Performance

Strategy	Consistency	Performance	Complexity
Cache-Aside	Eventual	High (after warm-up)	Low
Write-Through	Strong	Read-optimized	Medium
Write-Behind	Eventual	Write-optimized	High
Refresh-Ahead	Near-strong	Best for hot data	High

Memory vs Hit Rate

Larger caches achieve higher hit rates, but the relationship is not linear:

Working set fits in cache: 95%+ hit rate achievable
Working set exceeds cache: Hit rate falls roughly in proportion to the fraction that fits when access is uniform; skewed access degrades more gently
Diminishing returns: After the “knee” of the curve, adding memory yields minimal improvement

Latency vs Durability

Approach	Latency	Durability	Risk
Write-through	Higher (waits for DB)	Best (dual-write)	Low
Write-behind	Lowest (async)	Risk of loss	Higher
Cache-aside	Variable	Database-only	Medium

Implementation Complexity vs Operational Burden

Pattern	Code Complexity	Operational Complexity
Cache-Aside	Low	Low
Read-Through	Low	Medium
Write-Through	Medium	Medium
Write-Behind	High	High
Refresh-Ahead	High	High

Quick Recap + Interview Questions

Key Bullets

Cache-aside is the default strategy for most read-heavy workloads
Write-through ensures strong consistency but increases write latency
Write-behind batches writes for performance but risks data loss
Refresh-ahead eliminates misses for popular items but adds complexity
Always implement stampede protection when cache misses could cascade
Monitor hit rate, memory usage, and eviction counts continuously

Copy/Paste Checklist

# Cache-Aside Implementation Checklist
- [ ] Check cache first (redis.get)
- [ ] On miss, query database
- [ ] Populate cache with TTL (redis.setex)
- [ ] On write, invalidate cache (redis.delete), don't update
- [ ] Implement stampede protection with locks
- [ ] Cache null values with short TTL to prevent penetration
- [ ] Monitor hit rate - should be >80%
- [ ] Set appropriate TTLs based on data freshness requirements
- [ ] Log cache hits and misses for observability
- [ ] Use circuit breaker for cache failures

# TTL Selection Guide
- [ ] User profiles: 15-60 minutes
- [ ] Session data: 24 hours
- [ ] API responses: 5-30 minutes
- [ ] Static config: 1-24 hours
- [ ] Product catalog: 1-24 hours
- [ ] Real-time data: No caching or very short TTL (30-60 seconds)

Best Practices Summary

Architecture Principles

Treat caching as an optimization, not a requirement. If your database handles load fine, you may not need caching. Add caching when you have measurable latency or throughput problems.
Design for cache failure. Your application should degrade gracefully when the cache is unavailable — fall back to the database directly.
Keep the cache stateless. Cache nodes should not hold state that cannot be recovered. If a cache node restarts, every key it held must be rebuildable from the database on the next miss.
Instrument everything. Cache hit rates, eviction counts, memory usage — you cannot tune what you cannot measure.

Operational Guidelines

Start with cache-aside. It is the simplest strategy with the best debuggability. Add complexity only when measurements tell you to.
Use TTLs on everything. No key should live forever. TTLs prevent unbounded memory growth and ensure eventual consistency.
Namespace your keys. Use prefixes like app:env:entity:id to prevent collisions in shared cache infrastructure.
Monitor the 80% threshold. Cache hit rate should be above 80-90% for well-tuned caches. If it is lower, either your working set does not fit or your access patterns are too uniform.
Test failure modes. Periodically kill cache nodes and verify your application handles it gracefully.

Code Quality

Never use cache as primary store. The database is always the source of truth.
Invalidate on write, never update. Delete cache entries when data changes rather than trying to keep cache and database in sync.
Handle the null case. Cache null values to prevent cache penetration attacks.
Protect against stampedes. Use locks or probabilistic early expiration when cache misses are expensive.

Interview Questions

1. Explain the difference between cache-aside and write-through caching strategies. When would you choose one over the other?

Cache-aside (lazy loading): the app checks the cache first, loads from the database on a miss, then populates the cache. Writes go directly to the database, and the cache is invalidated afterward.

Write-through: every write goes to both the cache and database together. The operation does not return until both succeed.

Cache-aside wins for read-heavy workloads where brief inconsistency is acceptable. Write-through makes more sense when consistency matters more than write speed and writes are infrequent relative to reads.

2. What is a cache stampede and how do you prevent it?

A cache stampede (thundering herd) happens when a popular entry expires and multiple concurrent requests all try to rebuild it at the same time, overwhelming the database.

Prevention strategies:

Lock-based protection: only one request rebuilds the cache; others wait and retry.
Probabilistic early expiration: randomly refresh entries before they expire based on a probability function.
Mutex + early expiration combined: refresh early but coordinate with locks so only one request does the work.
Background refreshing: a separate thread or process keeps popular entries warm before they expire.
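Probabilistic early expiration can be sketched with the XFetch rule from the paper "Optimal Probabilistic Cache Stampede Prevention": a caller refreshes early when now - delta * beta * ln(rand) >= expiry, where delta is how long the last recomputation took. Because each caller draws its own random margin, only a few callers refresh ahead of expiry instead of all of them at once. `get_with_xfetch` is a hypothetical helper and the `store` dict stands in for the real cache.

```python
import math
import random
import time


def should_refresh_early(expiry, recompute_seconds, beta=1.0, now=None, rand=random.random):
    """Returns True once the value has expired, and before that with a probability
    that grows as expiry approaches and as recomputation gets slower.

    expiry            -- absolute expiry time of the cached value (seconds since epoch)
    recompute_seconds -- how long the last recomputation took (delta)
    beta              -- values > 1 favor earlier refreshes; 1.0 is the usual default
    """
    now = time.time() if now is None else now
    r = rand()
    if r <= 0.0:                     # random.random() may return 0.0; log(0) is undefined
        r = 1e-12
    # -log(r) is exponentially distributed, so each caller gets its own early margin
    return now - recompute_seconds * beta * math.log(r) >= expiry


def get_with_xfetch(key, compute, ttl, store):
    """Read `key` from `store`, recomputing on a miss or when the early-refresh test fires."""
    entry = store.get(key)           # (value, recompute_seconds, expiry)
    if entry is None or should_refresh_early(entry[2], entry[1]):
        start = time.time()
        value = compute()
        delta = time.time() - start
        store[key] = (value, delta, time.time() + ttl)
        return value
    return entry[0]
```

Storing delta next to the value is the key design choice: expensive entries start refreshing earlier, cheap ones almost never refresh before they expire.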

3. How does consistent hashing help with distributed caching?

Consistent hashing maps keys to cache nodes based on hash values. When nodes are added or removed, only about K/n keys remap on average (where K is total keys, n is nodes), whereas naive hash(key) mod n reassigns nearly every key. That keeps cache misses low during scaling.

Key benefits:

Less cache invalidation needed during scaling events
Even load distribution across nodes, provided each physical node owns many virtual nodes
Easier horizontal scaling for cache clusters
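A minimal ring with virtual nodes illustrates the K/n property. MD5 serves here only as a stable, well-spread hash (Python's built-in `hash()` is salted per process), not for security; the 100 virtual nodes per server and the `HashRing` name are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each node owns `vnodes` points to smooth out load."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []               # sorted ring positions
        self._owners = {}               # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            del self._owners[point]
            self._points.remove(point)

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first virtual node at or after the key's position."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

Adding a fourth node to a three-node ring moves roughly a quarter of the keys, and every moved key moves to the new node; with `hash(key) % n` about three quarters would move.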

4. What is cache penetration and how do you mitigate it?

Cache penetration occurs when requests repeatedly query for keys that do not exist in the cache or database. Each request bypasses the cache and hits the database, negating the cache's purpose.

Mitigations:

Cache null values: store a marker (like "NULL") for non-existent keys with a short TTL to prevent repeated lookups.
Bloom filters: keep a Bloom filter of every key that exists; when the filter says a key is definitely absent, reject the request before it reaches the cache or database.
Input validation: sanitize cache keys to reject obviously invalid requests early.
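A Bloom filter guard can be sketched with the standard library. This assumes the filter is populated with every existing key (at startup and on each insert); a plain Bloom filter cannot delete entries, and its false-positive rate depends on the bit count and number of hashes. `BloomFilter` and `lookup_user` are hypothetical names, with the cache and database passed in as callables.

```python
import hashlib


class BloomFilter:
    """Set membership test where "no" is always correct and "maybe" can be a false positive."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def lookup_user(user_id, bloom, cache_get, db_load):
    """Only IDs the filter has seen reach the cache and the database."""
    key = f"user:{user_id}"
    if not bloom.might_contain(key):
        return None                    # definitely absent: no cache or database round-trip
    value = cache_get(key)
    return value if value is not None else db_load(user_id)
```

A false positive only costs one ordinary miss, which is why Bloom filters pair well with caching null values for the few keys that slip through.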

5. When would you choose write-behind (write-back) over write-through?

Write-behind batches database writes in the background, returning immediately after the cache is updated. Write-through waits for both cache and database to succeed before returning.

Choose write-behind when:

Write latency matters more than immediate durability
You are collecting metrics, events, or analytics where losing a few writes is acceptable
You want to reduce database load from burst writes
Data loss risk is acceptable (your application can tolerate retransmission or recomputation)

Skip write-behind when data consistency is critical or you cannot tolerate any data loss.
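The batching behavior can be sketched with an in-memory buffer. Under stated assumptions: `WriteBehindBuffer` is hypothetical, a dict stands in for the cache client, and `persist_batch` stands in for a bulk database write such as a multi-row upsert. Repeated writes to one key coalesce so only the latest value is persisted, and anything still buffered is lost if the process crashes.

```python
import threading


class WriteBehindBuffer:
    """Writes update the cache immediately and are queued for the database in batches."""

    def __init__(self, cache: dict, persist_batch, max_batch: int = 100):
        self.cache = cache                    # stand-in for the cache client
        self.persist_batch = persist_batch    # callable taking {key: value}
        self.max_batch = max_batch
        self._pending = {}
        self._lock = threading.Lock()

    def write(self, key, value) -> None:
        self.cache[key] = value               # the caller returns once the cache is updated
        with self._lock:
            self._pending[key] = value        # later writes overwrite earlier ones
            full = len(self._pending) >= self.max_batch
        if full:
            self.flush()

    def flush(self) -> None:
        """Call on a timer and on shutdown as well as when the batch fills."""
        with self._lock:
            batch, self._pending = self._pending, {}
        if batch:
            try:
                self.persist_batch(batch)
            except Exception:
                with self._lock:              # re-queue unless a newer value arrived meanwhile
                    for key, value in batch.items():
                        self._pending.setdefault(key, value)
                raise
```

Coalescing is where most of the database savings come from for counters and metrics: a thousand increments to one key become one write per flush.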

6. How do you choose appropriate TTL values for cached data?

TTL selection depends on three factors:

Staleness tolerance: real-time data (prices, inventory) needs short TTLs (seconds to minutes). Static data (config, documentation) can use hours or days.
Miss penalty: high miss penalty (expensive database queries) suggests longer TTLs to maximize hit rate.
Access decay pattern: content that spikes in popularity then drops (social posts) needs shorter TTLs than evergreen content.

Best practice: add jitter (+/- 10%) to TTLs to prevent synchronized expiration of related keys.
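Jitter is a one-liner around the base TTL. A minimal sketch, assuming a +/- 10% spread; `jittered_ttl` is a hypothetical helper and the `rng` parameter exists only to make it testable.

```python
import random


def jittered_ttl(base_seconds: int, jitter: float = 0.10, rng=random) -> int:
    """Return base_seconds +/- jitter (10% by default) so keys written together expire apart."""
    spread = base_seconds * jitter
    return max(1, round(base_seconds + rng.uniform(-spread, spread)))
```

With a redis-py client you would pass the result straight to `setex(key, jittered_ttl(3600), payload)`, spreading a batch of one-hour keys across a 12-minute window instead of a single instant.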

7. What is the difference between cache invalidation via TTL versus event-driven invalidation?

TTL invalidation: entries automatically expire after a fixed duration. Simple, requires no application logic, but cannot provide immediate consistency when data changes.

Event-driven invalidation: when data changes in the database, a message is published (pub/sub) and all cache nodes delete the corresponding entry. Provides immediate consistency but requires more infrastructure and can miss events during failures.

The hybrid approach uses event invalidation for immediate consistency with TTL as a safety net for missed events. This is the most robust pattern for production systems.

8. What is a two-tier (L1/L2) cache and why would you use it?

A two-tier cache places a small, fast local cache (L1) in front of a larger, distributed cache (L2). L1 is typically an in-memory cache on each application server. L2 is a shared cache like Redis or Memcached.

Benefits:

L1 hit rates of 50-60% are possible for widely shared popular content (many requests on the same server asking for the same keys)
Ultra-low latency for L1 hits (microseconds vs milliseconds for L2)
L2 provides capacity for the warm cache beyond what fits in local memory
Reduces cross-network traffic to L2 cache

YouTube's architecture uses exactly this pattern with per-machine L1, distributed L2, and CDN at the edge.
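A two-tier lookup can be sketched as a small per-process LRU with a short TTL in front of a shared cache. Assumptions: `TwoTierCache` is hypothetical, L2 is passed in as get/set callables (a Redis client in practice), and L1 is never explicitly invalidated, so its short TTL bounds how long one server can serve a value that another server has already replaced in L2. The sketch is not thread-safe.

```python
import time
from collections import OrderedDict


class TwoTierCache:
    """L1: per-process LRU with a short TTL. L2: shared cache reached through callables."""

    def __init__(self, l2_get, l2_set, l1_capacity=1_000, l1_ttl=5.0, clock=time.monotonic):
        self.l2_get, self.l2_set = l2_get, l2_set
        self.l1_capacity, self.l1_ttl, self.clock = l1_capacity, l1_ttl, clock
        self._l1 = OrderedDict()                  # key -> (value, expires_at)

    def get(self, key, load):
        entry = self._l1.get(key)
        if entry is not None and entry[1] > self.clock():
            self._l1.move_to_end(key)
            return entry[0]                       # L1 hit: no network round-trip
        value = self.l2_get(key)
        if value is None:
            value = load(key)                     # miss in both tiers: go to the database
            self.l2_set(key, value)
        self._put_l1(key, value)
        return value

    def _put_l1(self, key, value):
        self._l1[key] = (value, self.clock() + self.l1_ttl)
        self._l1.move_to_end(key)
        if len(self._l1) > self.l1_capacity:
            self._l1.popitem(last=False)          # evict the least recently used entry
```

The L1 TTL is the staleness budget: a few seconds is often enough to absorb bursts on hot keys while keeping cross-server inconsistency short.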

9. How does refresh-ahead caching differ from cache-aside, and what are its trade-offs?

Cache-aside: cache is populated on read misses. Users occasionally experience cache miss latency.

Refresh-ahead: cache entries are proactively refreshed before they expire, based on predicted access patterns. Popular entries stay perpetually warm.

Trade-offs:

Pro: eliminates cache miss latency for tracked popular items; smoother performance under varying loads
Con: wasted resources refreshing items not actually needed; complexity in tracking truly popular keys; risk of refreshing stale data

Best for: known hot data sets where read latency matters more than wasted refresh cycles.

10. What metrics would you monitor to detect cache problems in production?

Primary metrics:

Hit rate: hits / (hits + misses). Should stay above 80-90%. Drop indicates working set does not fit or access pattern changed.
Memory usage: used_memory / maxmemory. Alert at 70%, critical at 80%.
Eviction count: Rate of evicted_keys. High rate indicates memory pressure.
Command latency: P50, P95, P99 for GET/SET operations.

Secondary metrics:

Connection count (sudden drops indicate connection issues)
Replication lag for replicated setups (should stay below 100ms)
Error rate (connection errors, timeout errors)

11. How would you design a cache warming strategy for a system that experiences cold starts?

Cold start problems occur when a cache restarts or when new data becomes hot without warning. Design for both scenarios:

Pre-warming on restart: After a cache node restarts, run a background job that populates the cache with the most frequently accessed keys before serving traffic.
Predictive warming: Track access patterns and pre-populate cache for data that is likely to become hot (scheduled events, expected traffic spikes).
Staggered key population: Avoid repopulating everything at once by staggering cache population based on key popularity.
Request coalescing: During cold start, allow only one request to rebuild a missing key while others wait. Prevents multiple requests from hitting the database simultaneously.

For Twitter-style workloads where content popularity spikes and then decays, warming at write time (fanout-on-write) trades write amplification for consistent read latency.

12. Explain the difference between LRU, LFU, and TTL eviction policies. When would you choose each?

LRU (Least Recently Used): Evicts the least recently accessed item. Good for temporal access patterns where recently accessed items are likely to be accessed again.

LFU (Least Frequently Used): Evicts the least frequently accessed item. Better for sustained hot data where popularity is stable over time.

TTL-based: Entries expire after a fixed time regardless of access frequency. Best for data that naturally becomes stale.

When to choose:

LRU: General purpose, works well when access patterns have temporal locality. Memcached defaults to LRU.
LFU: When you have stable hot sets and want to protect frequently accessed items from being evicted by one-time accesses. Redis 4.0+ supports LFU through the allkeys-lfu and volatile-lfu maxmemory policies (the default policy is noeviction).
TTL: When data freshness matters more than access frequency. Always use TTL as a safety net even with LRU/LFU.

Most production systems use LRU with TTL as a complementary eviction mechanism rather than relying on a single policy.
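The difference shows up most clearly under a one-time scan. The sketch below uses deliberately naive implementations (hypothetical `LRUCache` and `LFUCache` classes; the LFU evicts in O(n) and never decays counts, unlike Redis's approximated LFU, which does decay them over time). Two hot keys are read repeatedly, then ten cold keys are read once each.

```python
from collections import Counter, OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert and evict the least recently used key."""
        if key in self.data:
            self.data.move_to_end(key)
            return True
        self.data[key] = True
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)
        return False


class LFUCache:
    """Naive LFU: evicts the resident key with the lowest access count."""

    def __init__(self, capacity):
        self.capacity, self.data, self.counts = capacity, set(), Counter()

    def access(self, key):
        self.counts[key] += 1                  # counts survive eviction in this sketch
        if key in self.data:
            return True
        if len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.counts[k])
            self.data.remove(victim)
        self.data.add(key)
        return False


if __name__ == "__main__":
    for cache in (LRUCache(3), LFUCache(3)):
        for _ in range(10):
            cache.access("hot:1")
            cache.access("hot:2")
        for i in range(10):
            cache.access(f"scan:{i}")          # one-time accesses
        print(type(cache).__name__, sorted(cache.data))
```

After the scan the LRU cache holds only scan keys, while the LFU cache still holds both hot keys, which is the scan resistance described above.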

13. How does cache sharding differ from consistent hashing, and when would you use each?

Cache sharding: Partition data by entity type or key prefix. All data for a user stays in the same shard, enabling multi-key operations and pipelining within a shard.

Consistent hashing: Map keys to nodes based on hash values. Minimizes data movement when nodes are added or removed, because only about K/n keys remap.

When to use sharding:

You have entity types with different access patterns and sizes
You need atomic multi-key operations within an entity
You want simpler debugging (data for entity X is always on shard Y)

When to use consistent hashing:

Uniform distribution of keys across nodes is critical
You frequently add/remove cache nodes
You want to minimize cache invalidation during scaling

Many systems combine both: consistent hashing within shards to handle node failure and rebalancing within each shard's node group.

14. What is cache poisoning and how do you prevent it?

Cache poisoning occurs when an attacker injects malicious data into your cache that is then served to many users. It is usually achieved by exploiting cache key collisions or by writing malicious values into a shared cache.

Prevention strategies:

Key validation: Sanitize cache keys to reject special characters, extremely long keys, or malformed input that could become injection vectors.
Key namespacing: Use prefixes like `app:env:entity:id` to prevent collision between different applications sharing cache infrastructure.
Input validation before caching: Validate data before storing in cache. Do not cache unchecked user input.
Cache access controls: Implement authentication for cache access and audit who accesses what.
Integrity checks: Sign cached values and verify signature before serving. Prevents tampering with cached data.

If your cache is shared across multiple applications, a compromised app can poison data that affects other applications. Namespacing and access controls are critical in multi-tenant cache deployments.
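Two of these controls, key validation and integrity checks, can be sketched with the standard library alone. The key pattern, the 200-character limit, and the function names are hypothetical policy choices, and the secret would come from your secret manager. An HMAC tag detects tampering but does not hide the value; encrypt as well if confidentiality matters.

```python
import hashlib
import hmac
import re

# Hypothetical key policy: app:env:entity:id style segments, no whitespace or control characters
KEY_PATTERN = re.compile(r"[a-z0-9_-]+(?::[A-Za-z0-9_.-]+){1,5}")
MAX_KEY_LENGTH = 200
TAG_BYTES = 32                                  # length of an HMAC-SHA256 digest


def make_key(*parts: str) -> str:
    """Build a namespaced key and reject anything outside the policy."""
    key = ":".join(parts)
    if len(key) > MAX_KEY_LENGTH or not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"rejected cache key: {key!r}")
    return key


def sign(value: bytes, secret: bytes) -> bytes:
    """Prefix the value with an HMAC-SHA256 tag before storing it."""
    return hmac.new(secret, value, hashlib.sha256).digest() + value


def verify(blob: bytes, secret: bytes):
    """Return the original value, or None if the tag does not match (treat as a miss)."""
    tag, value = blob[:TAG_BYTES], blob[TAG_BYTES:]
    expected = hmac.new(secret, value, hashlib.sha256).digest()
    return value if hmac.compare_digest(tag, expected) else None
```

`fullmatch` matters here: a `$`-anchored `match` would accept a key with a trailing newline. Treating a failed signature as a cache miss means a poisoned entry is simply reloaded from the database.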

15. How would you handle cache consistency in a microservices architecture where multiple services cache the same underlying data?

In microservices, the same data (e.g., user profile) may be cached by multiple services independently. This creates consistency challenges:

Pattern: Single source of truth with pub/sub invalidation

The service owning the data publishes invalidation events when data changes.
All other services subscribe and invalidate their local caches.
TTL acts as a safety net if events are missed.

Pattern: Cache-aside with external invalidation

Central cache layer (Redis) holds canonical cached data.
Services read from central cache instead of maintaining their own caches.
Simpler consistency model but central cache becomes a dependency.

Key consideration: You cannot have strong consistency across independent caches. Design for eventual consistency and use write-through or event invalidation to minimize the inconsistency window. If strict consistency is required, bypass caches on reads and use write-through on writes.

16. What is the relationship between cache hit rate, latency, and throughput? How do you calculate the impact of cache performance on system capacity?

Latency impact: Cache hit latency is typically 0.1-1ms vs 5-50ms for database queries. Each miss adds roughly 5-50ms per request.

Throughput impact: Database queries limit concurrent operations due to connection pool constraints. Cache hits free database connections for other requests.

Capacity calculation:

If the application serves 10,000 requests/second and each database query takes 50ms, a 99% hit rate sends only 100 queries/second to the database.
This means 1% of requests are slow (50ms) and 99% are fast (0.5ms). Average latency = 0.99 * 0.5ms + 0.01 * 50ms = 0.995ms.
At 95% hit rate: 0.95 * 0.5ms + 0.05 * 50ms = 2.975ms average, with 500 queries/second reaching the database. A 4-point drop in hit rate (99% to 95%) roughly triples average latency.

Rule of thumb: 99% hit rate gives ~1ms average latency. 95% gives ~3ms. 90% gives ~5.5ms. The miss penalty dominates once hit rate drops below 95%. Cache tuning efforts should target 95%+ hit rate on the hot working set.
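The same arithmetic as a small calculator, using the 0.5ms hit and 50ms miss figures above as defaults. Like those figures, it ignores the cache lookup that precedes each miss; the function names are illustrative.

```python
def average_latency_ms(hit_rate: float, hit_ms: float = 0.5, miss_ms: float = 50.0) -> float:
    """Expected per-request latency when hits cost hit_ms and misses cost miss_ms."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms


def database_qps(request_qps: float, hit_rate: float) -> float:
    """Queries per second that fall through to the database."""
    return request_qps * (1 - hit_rate)


for h in (0.99, 0.95, 0.90):
    print(f"hit rate {h:.0%}: {average_latency_ms(h):.3f} ms average, "
          f"{database_qps(10_000, h):,.0f} DB queries/s")
```

Swapping in your own measured hit and miss latencies shows quickly whether tuning the hit rate or speeding up the database query buys more.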

17. How would you implement a distributed rate limiter using cache?

Rate limiting using a distributed cache like Redis uses atomic increment operations with expiry:

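# Fixed window rate limiter (assumes `redis` is a redis-py client)
import time
import uuid

def rate_limit(key, limit, window_seconds):
    current = redis.incr(key)
    if current == 1:
        # The first request in a window starts the expiry clock. If the process dies
        # before EXPIRE runs, the key never expires; a Lua script closes that gap.
        redis.expire(key, window_seconds)
    return current <= limit

# Sliding window log (more accurate): one sorted-set member per request
def sliding_window_rate_limit(key, limit, window_seconds):
    now = time.time()
    window_start = now - window_seconds
    pipe = redis.pipeline()  # transactional (MULTI/EXEC) by default
    pipe.zremrangebyscore(key, 0, window_start)  # drop requests older than the window
    pipe.zcard(key)
    _, current = pipe.execute()
    if current < limit:
        redis.zadd(key, {str(uuid.uuid4()): now})  # redis-py 3.0+: zadd(name, {member: score})
        redis.expire(key, window_seconds + 1)
    # Count-then-add spans two round trips; wrap it in a Lua script for strict limits
    return current < limit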

Considerations:

Use atomic operations to prevent race conditions
Lua scripts for Redis ensure read-check-increment is atomic
Sliding window is more accurate but costs more operations
Fixed window is simpler but allows burst at window boundaries

18. What are the trade-offs between local (in-process) caches and distributed (networked) caches?

Local cache (e.g., Caffeine, LRUCache):

Ultra-low latency (microseconds) - no network round-trip
No serialization/deserialization overhead
Cannot share across application instances
Lost on application restart
Memory limited to application process size

Distributed cache (e.g., Redis, Memcached):

Shared across all application instances
Survive application restarts (though not necessarily a restart of the cache itself)
Network latency (0.5-2ms per operation)
Serialization overhead
Single point of failure (mitigated with replication)

Best practice: Use a two-tier cache: local as L1 for the ultra-hot set, distributed as L2 for the warm cache. This gives you microsecond latency for L1 hits while sharing data across instances via L2.

19. How do you diagnose and troubleshoot cache-related production issues?

Step 1: Identify the pattern

Sudden latency spike: likely cache node failure or network partition
Gradual performance degradation: likely memory pressure, increasing evictions
Intermittent issues: likely connection pool exhaustion or periodic garbage collection

Step 2: Check metrics

Hit rate drop: working set grew beyond cache size or access pattern changed
Memory usage spike: likely key accumulation, TTL misconfiguration, or memory leak
High eviction rate: cache undersized for working set

Step 3: Check logs

Connection timeouts: network issues or cache overload
OOM errors: maxmemory misconfigured or eviction policy not working

Step 4: Test assumptions

Bypass cache and hit database directly to isolate whether cache is the problem
Use cache introspection commands (Redis INFO, Memcached stats) to dump internal state

Step 5: Fix and verify

Add capacity or tune eviction policy
Implement circuit breaker to degrade gracefully
Monitor to confirm fix worked

20. Describe a scenario where caching would hurt application performance instead of helping it.

Scenario: Write-heavy workload with strong consistency requirements

Imagine a real-time bidding system where each ad impression generates a write, and every read must reflect the most recent bid state (no staleness allowed).

Why caching hurts:

Cache-aside introduces eventual consistency - reads might return stale bid data, causing incorrect pricing
Write-through doubles write latency (cache + database) for every impression
Cache invalidation logic adds complexity and potential for bugs in hot path
Cache might be populated with data that is never read again (each bid is unique)

Better approach:

Use the database as the primary store with proper indexing
Consider database read replicas if read latency is the concern
Only add caching when measurements prove it helps

The key lesson: caching trades consistency for performance. When consistency is more important than performance (financial systems, real-time bidding), caching can actively harm your system by introducing bugs that are hard to reproduce (race conditions between cache and database) while adding complexity.

Conclusion

There is no single best caching strategy. Cache-aside is the default because it covers the most cases with the least complexity. But you’ll encounter situations where write-through or refresh-ahead fits better.

Start simple. Measure your hit rate. Add complexity only when the data tells you to.
