Caching Strategies: A Practical Guide
Learn the main caching patterns — cache-aside, write-through, write-behind, and refresh-ahead — plus how to pick TTLs, invalidate stale data, and distribute caches across nodes.
Introduction
Most applications have data that rarely changes but gets hit constantly. User profiles, product listings, config values, session data. Without caching, every request for this data pounds the database, even when nothing’s changed since last Tuesday.
The numbers tell the story:
| Approach | Typical Latency | Requests per Second (per node) |
|---|---|---|
| Database query | 5-50ms | 1,000-10,000 |
| Cache hit | 0.1-1ms | 100,000-1,000,000 |
| Cache miss (with cache) | 5-51ms | Same as database |
A cache that serves stale data is worse than no cache. And a cache that needs constant babysitting to stay valid is just overhead you do not need.
Core Concepts
These patterns describe how data flows between cache and application.
Read Caching Patterns
Read caching patterns describe how data flows from cache to application on read operations.
Invalidate-on-Read (Stale-While-Revalidate)
On each read, the cache checks whether the cached entry is still fresh, typically by comparing a version number, ETag, or timestamp against the origin. In the strict form of this pattern, a stale entry is invalidated and the read blocks until fresh data arrives from the origin. Stale-while-revalidate is the non-blocking variant: the cache returns the cached data immediately and, if the entry is past its freshness threshold, fetches an update asynchronously.
The stale-while-revalidate variant combines fast reads (a cache hit returns immediately) with automatic background refresh for data that has changed. It works well when data changes unpredictably, you can tolerate one stale response per refresh, and you want to avoid write-time invalidation overhead.
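A minimal in-process sketch of the stale-while-revalidate behaviour described above. It is an illustration under stated assumptions, not a production cache: `SWRCache`, `loader`, and `fresh_for` are hypothetical names, the store is a plain dict rather than Redis, and the `clock` parameter exists only so the behaviour can be exercised without waiting.

```python
import threading
import time


class SWRCache:
    """In-memory stale-while-revalidate cache (illustrative only)."""

    def __init__(self, loader, fresh_for=60.0, clock=time.monotonic):
        self.loader = loader          # hypothetical callable: key -> value
        self.fresh_for = fresh_for    # seconds before an entry counts as stale
        self.clock = clock
        self._data = {}               # key -> (value, stored_at)
        self._refreshing = set()      # keys with a refresh in flight
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
        if entry is None:
            # Nothing cached yet: load synchronously.
            return self._store(key)
        value, stored_at = entry
        if self.clock() - stored_at > self.fresh_for:
            self._refresh_in_background(key)
        # Fresh or stale, the cached value is returned immediately.
        return value

    def _store(self, key):
        value = self.loader(key)
        with self._lock:
            self._data[key] = (value, self.clock())
        return value

    def _refresh_in_background(self, key):
        with self._lock:
            if key in self._refreshing:
                return  # a refresh for this key is already running
            self._refreshing.add(key)

        def work():
            try:
                self._store(key)
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        threading.Thread(target=work, daemon=True).start()
```

The `_refreshing` set keeps a burst of reads on a stale key from launching one refresh per read; only the first stale read triggers the reload.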
Cache-Aside (Lazy Loading)
This is what most people mean when they say “caching.” Your application checks the cache first, loads from the database on a miss, then populates the cache for next time.
sequenceDiagram
participant Client
participant Cache
participant Database
Client->>Cache: GET user:123
Cache-->>Client: Cache miss
Client->>Database: SELECT * FROM users WHERE id = 123
Database-->>Client: User data
Client->>Cache: SET user:123 (ttl=3600)
Cache-->>Client: OK
Client->>Cache: GET user:123
Cache-->>Client: User data (cached)
Implementation:
import json

def get_user(user_id):
    # Try the cache first
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: load from the database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # Populate the cache with a 1-hour TTL
    redis.setex(cache_key, 3600, json.dumps(user))
    return user
Write operation:
def update_user(user_id, data):
    # Update the database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Then invalidate the cached copy so the next read reloads it
    redis.delete(f"user:{user_id}")
Pros:
- Simple to implement
- Cache only contains data that’s actually requested
- No warm-up step required (a cold cache fills on demand)
- Easy to reason about
Cons:
- First request after cache miss is always slow
- Cache and database can temporarily diverge (eventual consistency)
- Three network round-trips on cache miss (check, read, write)
Use this when: reads dominate your workload and you can live with brief inconsistency.
Read-Through (Cache Enrichment)
Same idea as cache-aside, but the cache library handles the miss logic for you. You just ask the cache for data; it fetches from the database automatically if needed.
sequenceDiagram
participant Client
participant Cache
participant Database
Client->>Cache: GET user:123
Cache->>Cache: Check in-memory store
alt Cache miss
Cache->>Database: SELECT * FROM users WHERE id = 123
Database-->>Cache: User data
Cache->>Cache: Store in memory
end
Cache-->>Client: User data
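Implementation with Redis (Redis does not load from a database on its own, so a thin wrapper plays the role of the cache layer):
def get_user_cached(user_id, loader):
    cache_key = f"user:{user_id}"
    user = redis.get(cache_key)
    if user is not None:
        return json.loads(user)
    # Miss: the wrapper, not the caller, fetches and stores the value.
    # loader is any callable that reads the user from the database.
    user = loader(user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    return user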
Most caching libraries (like Spring Cache, Django’s cache framework, Go’s groupcache) implement read-through natively:
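// Using groupcache: the getter function runs only on a cache miss
var userGroup = groupcache.NewGroup("users", 64<<20, groupcache.GetterFunc(
    func(ctx context.Context, key string, dest groupcache.Sink) error {
        data, err := db.LoadUserJSON(ctx, key) // db.LoadUserJSON is a placeholder
        if err != nil {
            return err
        }
        return dest.SetBytes(data)
    }))
func getUser(ctx context.Context, userID int64) (*User, error) {
    var data []byte
    key := fmt.Sprintf("user:%d", userID)
    if err := userGroup.Get(ctx, key, groupcache.AllocatingByteSliceSink(&data)); err != nil {
        return nil, err
    }
    var user User
    return &user, json.Unmarshal(data, &user)
}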
Pros:
- Cleaner application code
- Concurrent misses for the same key can be coalesced into a single database fetch, when the cache layer supports it
- Cache handles the fetch-and-store atomically
Cons:
- Less control over cache logic
- All caches must implement the same pattern
- Can mask cache behavior from developers
Use this when: you want caching to be infrastructure, not application logic.
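To make the difference from cache-aside concrete, here is a minimal sketch of a read-through wrapper in which the caller never touches the database. `ReadThroughCache`, `load_user`, and the in-memory dict store are assumptions made for illustration; a real deployment would delegate storage to Redis, Memcached, or a caching library.

```python
import time


class ReadThroughCache:
    """Cache that loads missing or expired entries itself."""

    def __init__(self, loader, ttl=3600.0, clock=time.monotonic):
        self.loader = loader      # hypothetical callable: key -> value
        self.ttl = ttl            # seconds an entry stays valid
        self.clock = clock
        self._store = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        # Miss or expired: the cache, not the caller, runs the loader.
        value = self.loader(key)
        self._store[key] = (value, self.clock() + self.ttl)
        return value


def load_user(key):
    # Stand-in for a database query keyed by "user:<id>".
    user_id = int(key.split(":")[1])
    return {"id": user_id, "name": f"user-{user_id}"}


users = ReadThroughCache(load_user, ttl=3600)
print(users.get("user:123"))
```

Application code only ever calls `users.get(...)`; swapping the loader or the TTL does not change any call site, which is the main appeal of the pattern.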
Write Caching Patterns
Write caching patterns describe how data flows from application to cache on write operations.
Write-Through
Every write goes to cache and database together. The operation doesn’t return until both succeed.
sequenceDiagram
participant Client
participant Cache
participant Database
Client->>Cache: SET user:123
Cache->>Database: UPDATE users SET ...
Database-->>Cache: OK
Cache-->>Client: OK
Implementation:
def update_user(user_id, data):
    cache_key = f"user:{user_id}"
    # Write to the database first; if this raises, the cache is left untouched
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Then write the same value through to the cache
    redis.setex(cache_key, 3600, json.dumps(data))
    return data
Pros:
- Strong consistency between cache and database
- Cache is always warm with latest data
- No cache invalidation logic needed
Cons:
- Write latency increases (two writes instead of one)
- Write-heavy workloads churn the cache and can evict entries that are actually being read
- Cache might be populated with data that’s never read
Use this when: consistency matters more than write speed and your writes are infrequent relative to reads.
Write-Behind (Write-Back)
You write to the cache and it batches the database writes to happen later, in the background.
sequenceDiagram
participant Client
participant Cache
participant Database
participant WriteBuffer
Client->>Cache: SET user:123
Cache->>WriteBuffer: Queue write
Cache-->>Client: OK (fast)
Note over WriteBuffer: Background worker
WriteBuffer->>Database: Batch UPDATE
Database-->>WriteBuffer: OK
Implementation:
import asyncio
import json
from collections import deque

class WriteBehindCache:
    def __init__(self, redis, db, batch_size=100, flush_interval=1.0):
        self.redis = redis
        self.db = db
        self.write_queue = deque()
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        # Must be constructed inside a running event loop; keep a
        # reference so the task is not garbage-collected
        self._flush_task = asyncio.create_task(self._flush_loop())

    async def set(self, key, value):
        # Write to the cache immediately, queue the database write
        self.redis.setex(key, 3600, json.dumps(value))
        self.write_queue.append((key, value))
        if len(self.write_queue) >= self.batch_size:
            await self._flush()

    async def _flush(self):
        if not self.write_queue:
            return
        batch = []
        while self.write_queue and len(batch) < self.batch_size:
            batch.append(self.write_queue.popleft())
        # Write the batch to the database (one statement per entry here)
        for key, value in batch:
            self.db.execute(
                "UPDATE users SET ... WHERE id = ?",
                value['id'],
                value
            )

    async def _flush_loop(self):
        while True:
            await asyncio.sleep(self.flush_interval)
            await self._flush()
Pros:
- Very low write latency
- Batching reduces database load
- Cache handles burst writes gracefully
Cons:
- Risk of data loss if cache fails before flush
- Complexity in handling partial failures
- Cache and database can significantly diverge
- Harder to debug (writes happen asynchronously)
Use this when: you’re collecting metrics or events and losing a few writes won’t ruin your day.
Refresh-Ahead (Proactive Caching)
The cache automatically refreshes entries before they expire. Popular data stays perpetually warm, so users never hit a cache miss.
sequenceDiagram
participant Cache
participant Database
participant Refresher
Note over Cache: Entry TTL = 300s
Refresher->>Cache: Check TTL
Refresher->>Database: SELECT (background)
Refresher->>Cache: SET (reset TTL)
loop Every 60 seconds
Refresher->>Cache: Check popular entries
Refresher->>Database: Refresh if TTL < 60s
end
Implementation:
import json
import time
from threading import Thread

class RefreshAheadCache:
    def __init__(self, redis, db, ttl=300, refresh_threshold=0.8):
        self.redis = redis
        self.db = db
        self.ttl = ttl
        self.refresh_threshold = refresh_threshold
        self.popular_keys = set()
        # Background refresher thread (daemon so it does not block shutdown)
        self.running = True
        self.thread = Thread(target=self._refresh_loop, daemon=True)
        self.thread.start()

    def track_access(self, key):
        """Track frequently accessed keys"""
        self.popular_keys.add(key)

    def get(self, key):
        value = self.redis.get(key)
        if value:
            self.track_access(key)
            return json.loads(value)
        return None

    def _should_refresh(self, key):
        """Refresh once refresh_threshold of the TTL has elapsed"""
        remaining = self.redis.ttl(key)
        # With ttl=300 and threshold 0.8, refresh when under 60s remain
        return 0 < remaining < self.ttl * (1 - self.refresh_threshold)

    def _refresh_loop(self):
        while self.running:
            for key in list(self.popular_keys):
                if self._should_refresh(key):
                    # Reload from the database and reset the TTL
                    data = self.db.query(
                        "SELECT * FROM users WHERE id = ?",
                        key.split(':')[1]
                    )
                    self.redis.setex(key, self.ttl, json.dumps(data))
            time.sleep(10)  # Check every 10 seconds
Pros:
- Eliminates cache miss latency for popular items
- Users never wait for cache to repopulate
- Smoother performance under varying loads
Cons:
- Wasted resources refreshing items not actually needed
- Complexity in tracking truly popular keys
- Risk of refreshing stale data
- Additional logic to determine refresh threshold
Use this when: you have a known set of hot data and read latency matters more than wasted cycles.
Topic-Specific Deep Dives
These sections dig into specific aspects of caching implementation and operations.
Memcached vs Redis: Making the Choice
Both Memcached and Redis serve as distributed caching layers, but they target different use cases. Understanding the trade-offs helps you pick the right tool.
| Feature | Memcached | Redis |
|---|---|---|
| Data structures | Key-value only | Strings, hashes, lists, sets, sorted sets, streams |
| Persistence | None (memory-only) | Optional RDB snapshots + AOF |
| Replication | Not built-in | Master-replica replication |
| Clustering | Consistent hashing client-side | Native cluster mode built-in |
| Eviction policies | LRU | LRU, LFU, random, volatile-TTL (configurable) |
| Memory efficiency | Simple slab allocator | More overhead per key |
| Use when | Simple caching, PHP applications | Complex data, pub/sub, sorted sets, need persistence |
Memcached excels at: simple key-value caching where you just need to store serialized objects. Its memory efficiency and simplicity make it ideal for basic cache-aside patterns. Many PHP applications and frameworks default to Memcached for this reason.
Redis excels at: caching that requires data structures (like leaderboards with sorted sets, pub/sub for cache invalidation, or stream-based event queuing). Its native clustering and replication simplify operational complexity.
# Memcached: simple get/set
memcached.set(key, value, expire=3600)
value = memcached.get(key)
# Redis: richer operations
redis.set(key, value, ex=3600)
redis.zadd("leaderboard", {"user_id": score}) # Sorted set for rankings
redis.publish("invalidate", key) # Pub/sub for coordinated invalidation
For most caching scenarios, Redis wins because it reduces the number of systems you need to operate. But if you have a pure read-heavy workload with simple key-value requirements and memory efficiency is critical, Memcached is a valid choice.
When NOT to Cache
Caching is not always the answer. Here are scenarios where the complexity outweighs the benefits.
Do not cache when:
- Data changes on every request (no repeat reads to benefit from)
- Cache would consume more memory than the database itself (full table caching)
- Consistency requirements preclude staleness (financial transactions)
- Your database already handles your load comfortably (add complexity only when needed)
- Data is unique per request and never repeated (one-off report output, single-use tokens)
Signs your cache is not helping:
- Hit rate below 50% despite tuning
- Cache memory pressure causes constant evictions
- You spend more time managing cache invalidation than writing application code
- Cache failures cause more production incidents than database failures
Treat caching as a performance optimization, not an architectural necessity. If your database handles your load without caching, keep it simple.
CDN Caching for Static Assets
CDNs sit at the edge of your infrastructure, caching content close to users. Unlike application caches that handle dynamic data, CDNs typically handle static assets: images, CSS, JavaScript, fonts, videos.
CDN caching strategies differ from application caching:
| Aspect | Application Cache (Redis/Memcached) | CDN |
|---|---|---|
| Content type | Dynamic data, API responses | Static files (images, JS, CSS) |
| TTL range | Seconds to hours | Minutes to years |
| Invalidation | Event-driven or TTL | Purge API or TTL expiry |
| Cache key | Data-specific (user:123:profile) | URL-based (/assets/logo.png) |
| Geographic distribution | Limited to cache cluster location | Global PoPs near users |
Cache-Control directives every developer should know:
# Versioned, immutable assets (e.g., /app.abc123.js): cache for a year, change the URL on deploy
Cache-Control: public, max-age=31536000, immutable
# No caching (sensitive content)
Cache-Control: no-store, private
# Stale-while-revalidate (serve stale, update in background)
Cache-Control: public, max-age=3600, stale-while-revalidate=86400
**CDN invalidation pitfalls:**
- Purge is not instant. Most CDNs take 30 seconds to 5 minutes to propagate purges globally.
- Cache tags or content-type purging helps but is not universally supported.
- Versioned URLs (e.g., `/app.abc123.js`) beat cache invalidation for JavaScript/CSS updates.
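One common way to produce versioned URLs is to embed a hash of the file's content in its name, so the URL changes exactly when the content does. The sketch below assumes that approach; `versioned_name` and its `digest_len` parameter are hypothetical names, and real build tools typically handle this step for you.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_len=8):
    """Return the path with a content digest inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    # "/assets/app.js" becomes "/assets/app.<digest>.js"
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("/assets/app.js", b"console.log('v1');")
new = versioned_name("/assets/app.js", b"console.log('v2');")
print(old)
print(new)
```

Because the two builds yield different URLs, the old file can stay cached indefinitely while browsers and CDN nodes fetch the new one on first reference, with no purge involved.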
```html
<!-- Versioned asset: cache forever, change URL on deploy -->
<script src="/app.abc123.js"></script>
<!-- vs -->
<!-- Unversioned: requires CDN purge on every deploy -->
<script src="/app.js"></script>
```
When CDN alone is not enough: CDNs excel at caching static assets with long TTLs. But for dynamic content that changes frequently (like a news homepage), CDNs need help. Pattern: CDN edge caching with application-level cache invalidation via surrogate keys or tag-based purging. Cloudflare Workers or Fastly VCL can intercept requests and conditionally purge cache when the origin data changes.
Choosing the Right Strategy
| Strategy | Read Performance | Write Performance | Consistency | Complexity |
|---|---|---|---|---|
| Cache-Aside | Good (after miss) | Best | Eventual | Low |
| Read-Through | Good | Same as DB | Eventual | Low |
| Write-Through | Good | Good | Strong | Medium |
| Write-Behind | Good | Best | Eventual | High |
| Refresh-Ahead | Best | Same as DB | Near-strong | High |
How to decide
Which latency matters more, reads or writes?
- Reads: cache-aside, read-through, or refresh-ahead
- Writes: write-behind or write-through
How synced do cache and database need to be?
- Tight consistency: write-through
- Eventual is fine: cache-aside or write-behind
What happens if the cache goes down before flushing?
- Can’t lose writes: write-through
- A few lost writes are okay: write-behind
Is access predictable?
- Unpredictable: cache-aside
- Known hot set: refresh-ahead
Cache Invalidation and TTL
Cache invalidation is the hardest part of caching. The right TTL strategy is equally critical for maintaining freshness vs. efficiency.
Time-Based Invalidation (TTL)
The simplest approach — entries expire after a fixed duration.
# TTL-based invalidation
redis.setex(cache_key, 3600, value) # 1 hour TTL
When to use: Data that naturally becomes stale over time (user profiles, product prices, news articles).
Limitation: You must choose a TTL that balances freshness against load. Too short = cache thrashing. Too long = stale data.
Event-Based Invalidation (Cache Eviction on Write)
When data changes in the database, explicitly remove or update the corresponding cache entry.
def update_user(user_id, data):
    # Update database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Invalidate cache entry
    redis.delete(f"user:{user_id}")
    # Optionally, immediately repopulate with fresh data
    fresh_user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(fresh_user))
When to use: When you need immediate consistency on writes (write-through scenario).
Limitation: Requires your application to remember to invalidate on every write path. Miss one path, and you have stale data.
Event-Driven Invalidation (Pub/Sub)
Use a message queue or pub/sub system to propagate invalidation events across all cache nodes.
# Publisher: when data changes
def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Publish invalidation event
    redis.publish("cache:invalidate", f"user:{user_id}")

# Subscriber: runs on each application server and evicts from that
# server's local in-process cache (local_cache is a dict-like store)
def subscribe_invalidation():
    pubsub = redis.pubsub()
    pubsub.subscribe("cache:invalidate")
    for message in pubsub.listen():
        if message["type"] == "message":
            cache_key = message["data"]
            if isinstance(cache_key, bytes):
                cache_key = cache_key.decode()
            local_cache.pop(cache_key, None)
When to use: Multi-server deployments where cache lives on application servers (local caches) and you need coordinated invalidation across all nodes.
Limitation: Event delivery is not guaranteed. Subscribers might miss messages during restarts. Always combine with TTL as a safety net.
Hybrid Approach: TTL + Event Invalidation
The most robust strategy combines TTL (safety net) with event invalidation (specificity).
def get_user(user_id):
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # Set TTL as safety net (e.g., 1 hour)
    redis.setex(cache_key, 3600, json.dumps(user))
    return user

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Drop the shared entry, then notify other nodes (specific, immediate)
    redis.delete(f"user:{user_id}")
    redis.publish("cache:invalidate", f"user:{user_id}")
    # TTL still acts as safety net if the event is missed
This approach handles: (1) missed invalidation events via TTL, (2) immediate consistency when events fire, and (3) cache recovery after failures.
Cache Invalidation in Distributed Systems
In distributed cache setups, invalidation becomes more complex because a key may be cached in several places (replicas, or local caches in front of the shared cache) and related keys may live on different nodes.
# Using Redis SCAN to find and delete all keys matching a pattern.
# SCAN only covers the node it runs against; in a cluster, repeat it on every primary.
def invalidate_key_across_cluster(key_pattern):
    """
    Invalidate all keys matching a pattern.
    Use with caution: this walks the whole keyspace and is expensive.
    """
    cursor = 0
    while True:
        cursor, keys = redis.scan(cursor, match=key_pattern, count=100)
        if keys:
            redis.delete(*keys)
        if cursor == 0:
            break

# Example: invalidate all user session keys for a specific user
invalidate_key_across_cluster(f"session:*:user:{user_id}")
For distributed caches, compute which node owns a key instead of scanning: Memcached clients typically use consistent hashing, and Redis Cluster maps each key to one of 16,384 hash slots, so a delete can be routed directly to the owning node.
TTL Selection Guide
Choosing the right TTL is a balancing act between data freshness, cache efficiency, and database load.
TTL Selection Framework
Ask these questions to determine appropriate TTLs:
1. How stale can this data be?
| Data Type | Staleness Tolerance | Suggested TTL |
|---|---|---|
| Real-time prices | Seconds | 30-60 seconds |
| Social media posts | Minutes | 5-15 minutes |
| User profiles | Minutes to hours | 15-60 minutes |
| Product catalog | Hours | 1-24 hours |
| Static config | Hours to days | 1-24 hours |
| Session data | Duration of session | 24 hours |
2. What is the cost of a cache miss vs stale data?
- Miss-costly, stale-tolerant: longer TTLs work fine (config, user preferences)
- Miss-costly, stale-intolerant: use shorter TTLs + event invalidation (prices, inventory)
- Miss-cheap, stale-tolerant: shorter TTLs are fine (view counts, trending topics)
3. How does access pattern decay?
Data that spikes in popularity then drops off (social posts, news) needs shorter TTLs than evergreen content (documentation, product specs).
TTL Jitter (Preventing Thundering Herds)
If many cache entries are written with the same TTL at the same moment, they also expire together, and the resulting burst of misses hits the database as a thundering herd. Add random jitter to TTLs:
import random

def set_with_jitter(key, value, base_ttl):
    """
    Set cache with randomized TTL to prevent synchronized expiration.
    Jitter is +/- 10% of base TTL.
    """
    jitter = base_ttl * 0.1
    actual_ttl = base_ttl + random.uniform(-jitter, jitter)
    redis.setex(key, int(actual_ttl), value)

# Usage
set_with_jitter("user:123", user_data, base_ttl=3600)  # 3240-3960 seconds
TTL Tiering
For the same data, consider storing multiple copies at different TTLs for different freshness requirements:
def cache_user_profile(user_id):
    cache_key = f"user:{user_id}"
    # Fresh copy: short TTL
    fresh = redis.get(f"{cache_key}:fresh")
    if fresh:
        return json.loads(fresh)
    try:
        user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    except Exception:
        # Database unavailable: fall back to the long-lived stale copy
        stale = redis.get(f"{cache_key}:stale")
        if stale:
            return json.loads(stale)
        raise
    payload = json.dumps(user)
    redis.setex(f"{cache_key}:fresh", 300, payload)    # 5 min
    redis.setex(f"{cache_key}:stale", 86400, payload)  # 24 hours (fallback)
    return user
Dynamic TTL Based on Data Characteristics
Some data has variable freshness based on its nature. Use dynamic TTLs:
def get_dynamic_ttl(data_type, data_age_hours=0):
    """
    Return appropriate TTL based on data type and age.
    """
    base_ttls = {
        "breaking_news": 30,     # 30 seconds
        "sports_scores": 60,     # 1 minute
        "product_price": 300,    # 5 minutes
        "blog_post": 1800,       # 30 minutes
        "documentation": 86400,  # 24 hours
    }
    base = base_ttls.get(data_type, 3600)
    # Content less than an hour old is still likely to change: halve its TTL
    if data_age_hours < 1:
        return base // 2
    return base
Distributed Cache Patterns
When a single cache instance cannot handle your load, distribute the cache across multiple nodes.
Consistent Hashing
Consistent hashing maps keys to cache nodes based on key hash values, minimizing remapping when nodes are added or removed.
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes):
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            self._add_node(node)

    def _add_node(self, node):
        for i in range(100):  # Virtual nodes for better distribution
            key = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
            self.ring[key] = node
            self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def get_node(self, key):
        key_hash = hashlib.md5(key.encode()).hexdigest()
        # First ring position at or after the key's hash, wrapping around
        idx = bisect.bisect_left(self.sorted_keys, key_hash)
        return self.ring[self.sorted_keys[idx % len(self.sorted_keys)]]

# Usage
ch = ConsistentHash(["cache-1", "cache-2", "cache-3"])
node = ch.get_node("user:123")  # Always returns same node for same key
When you add or remove a node, only about K/n keys remap (where K is the total number of keys and n is the number of nodes). Most entries stay where they are, which limits the wave of cache misses during scaling events.
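The remapping claim can be checked empirically. The sketch below builds a small ring independently of the class above (the helper names `build_ring` and `lookup`, the node names, and the 10,000 synthetic keys are all assumptions for illustration) and counts how many keys change owner when a fourth node joins.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(nodes, vnodes=100):
    """Return a sorted list of (hash, node) points on the ring."""
    return sorted(
        (_hash(f"{node}:{i}"), node) for node in nodes for i in range(vnodes)
    )


def lookup(ring, key):
    """Find the first ring point at or after the key's hash, wrapping around."""
    idx = bisect.bisect_left(ring, (_hash(key), ""))
    return ring[idx % len(ring)][1]


keys = [f"user:{i}" for i in range(10_000)]
before = build_ring(["cache-1", "cache-2", "cache-3"])
after = build_ring(["cache-1", "cache-2", "cache-3", "cache-4"])

moved = [k for k in keys if lookup(before, k) != lookup(after, k)]
print(f"{len(moved) / len(keys):.1%} of keys moved")
print("all moved keys went to the new node:",
      all(lookup(after, k) == "cache-4" for k in moved))
```

Going from three to four nodes, the expected share of moved keys is about a quarter, and every moved key lands on the new node. With naive `hash(key) % n` placement, most keys would change owner instead.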
Cache Sharding by Entity
Instead of distributing keys at random, shard by entity so related data stays together.
import hashlib

NUM_SHARDS = 4  # example value

def get_shard(cache_key):
    """
    Shard by entity (type + id) so all keys for one entity share a shard.
    Expects keys like "user:123:profile".
    """
    # Extract "user:123" from "user:123:profile"
    entity = ":".join(cache_key.split(":")[:2])
    # Hashing only the type ("user") would send every user to one shard
    entity_hash = hashlib.md5(entity.encode()).hexdigest()
    # Map to shard
    shard_index = int(entity_hash, 16) % NUM_SHARDS
    return f"cache-shard-{shard_index}"

# Route requests to appropriate shard
def cache_get(cache_key):
    shard = get_shard(cache_key)
    return redis_shards[shard].get(cache_key)
All data for a single user (profile, preferences, history) lives in the same shard, which makes multi-key operations and pipelining straightforward.
Replication with Read Replicas
For read-heavy workloads, add replica nodes that handle read traffic while the primary handles writes.
import random

# Write to the primary; Redis replicates to its replicas asynchronously
def cache_set(key, value):
    redis_primary.setex(key, 3600, value)

# Read from a randomly selected replica
def cache_get(key):
    replica = random.choice(redis_replicas)
    return replica.get(key)
The tradeoff: replicas might lag the primary, serving slightly stale data. For most caching scenarios this is fine.
Multi-Tier Caching
Deploy a local (L1) in-memory cache in front of a distributed (L2) cache.
from threading import Lock

class TwoTierCache:
    def __init__(self, local_cache, redis_cache):
        self.local = local_cache  # e.g., LRUCache from cachetools
        self.redis = redis_cache
        self.local_lock = Lock()

    def get(self, key):
        # Try L1 first (local, ultra-fast)
        value = self.local.get(key)
        if value is not None:
            return value
        # L1 miss: try L2 (distributed)
        value = self.redis.get(key)
        if value is not None:
            # Populate L1 for next request
            with self.local_lock:
                self.local[key] = value
            return value
        return None

    def set(self, key, value, ttl=3600):
        # Write to both tiers
        self.redis.setex(key, ttl, value)
        with self.local_lock:
            self.local[key] = value

    def invalidate(self, key):
        self.redis.delete(key)
        with self.local_lock:
            self.local.pop(key, None)
YouTube’s architecture uses exactly this pattern: L1 per-machine cache handles the ultra-hot set, L2 distributed cache handles warm data, and the database handles cold data.
Case Study: YouTube’s Cache Hierarchy
YouTube’s caching infrastructure is one of the most studied in the industry. Their approach uses multiple cache layers: L1 (in-memory, per-machine), L2 (distributed cache), and CDN at the edge.
YouTube’s L1 cache is a small in-memory cache on each application server. It handles the most frequently accessed items: popular videos and trending content. L1 hit rate alone is often 50-60% because many requests routed to the same machine ask for the same popular content.
The L2 distributed cache (originally Memcached, later moved to custom infrastructure) handles cache misses from L1. L2 is sharded across many machines to provide petabyte-scale capacity. Cache misses from L2 go to storage (BigTable).
The CDN handles the edge, serving popular content from points of presence close to users. YouTube’s CDN cache hit rate is over 90% for video streaming — once a video becomes popular, it propagates to CDN PoPs and subsequent requests rarely hit origin.
The lesson: YouTube does not rely on a single cache tier. They use L1 to handle the ultra-hot set with extremely low latency, L2 for the warm cache, and CDN for the long tail of popular-but-not-ultra-popular content. Most companies should design for two tiers (local cache + distributed cache) before adding a CDN.
Case Study: Twitter’s Cache Warming Strategy
Twitter has a unique caching challenge: events (tweets, likes, follows) have a short window of high read traffic, then traffic drops off a cliff. A tweet from a celebrity gets millions of reads in the first hour, then readership drops to hundreds per day.
Twitter’s solution is aggressive cache warming: when a tweet is published, Twitter pushes it into the timelines of active followers’ caches rather than waiting for cache misses. This is the fanout-on-write pattern — write to caches at publish time rather than computing at read time.
The tradeoff is write amplification. Every tweet from a celebrity with 10 million followers requires 10 million cache writes. Twitter manages this by limiting fanout to active users only and using hybrid push/pull for lower-activity accounts. Inactive users’ timelines are computed on read from the tweet author’s tweet store.
The operational lesson: cache warming trades write amplification for read latency. For content with rapid decay in read traffic (news, social posts, live events), warming the cache at write time reduces read latency at the cost of higher write overhead. For evergreen content, cache-aside with long TTLs is simpler and more efficient.
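The fanout-on-write trade-off described above can be sketched in a few lines. This is an illustration of the general pattern only, not Twitter's implementation: `TimelineService`, `TIMELINE_LIMIT`, and the in-memory structures are hypothetical, and timestamp ordering on the pull path is omitted for brevity.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on cached timeline length


class TimelineService:
    def __init__(self, followers, active_users):
        self.followers = followers        # author -> set of follower ids
        self.active_users = active_users  # users whose timelines are kept warm
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))
        self.tweets_by_author = defaultdict(list)

    def publish(self, author, tweet):
        """Fanout-on-write: push the tweet into active followers' caches."""
        self.tweets_by_author[author].append(tweet)
        pushed = 0
        for follower in self.followers.get(author, ()):
            if follower in self.active_users:
                self.timelines[follower].appendleft(tweet)
                pushed += 1
        return pushed  # number of cache writes caused by one tweet

    def read_timeline(self, user, following):
        """Active users read from cache; others are computed on read."""
        if user in self.active_users:
            return list(self.timelines[user])
        merged = []
        for author in following:
            merged.extend(self.tweets_by_author.get(author, []))
        return merged


svc = TimelineService({"celeb": {"alice", "bob", "carol"}},
                      active_users={"alice", "bob"})
print(svc.publish("celeb", "hello"))           # cache writes for one tweet
print(svc.read_timeline("alice", ["celeb"]))   # served from the warm cache
print(svc.read_timeline("carol", ["celeb"]))   # computed on read
```

The return value of `publish` is the write amplification for that tweet: it grows with the number of active followers, while the read path for those followers becomes a single cache lookup.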
Capacity Estimation: Cache Size vs Hit Rate
The relationship between cache size and hit rate is not linear. Adding more cache memory gives diminishing returns beyond a certain point.
The working set model: your hit rate depends on how much of your frequently-accessed data fits in cache. If 80% of your requests hit 20% of your data, and that 20% fits in cache, you can achieve 95%+ hit rate with relatively small cache. If access is uniformly distributed, even a large cache provides modest hit rates.
The formula for estimating required cache size: working_set_bytes = unique_keys_per_second * avg_value_size * avg_ttl_seconds. If you have 10,000 requests per second (treating each as a distinct key, which gives an upper bound), the average value is 1 KB, and you want a 5-minute TTL window, your working set is 10,000 × 1,000 × 300 = 3 × 10^9 bytes, or about 3 GB minimum for a fully utilized cache before evictions. In practice, you need 1.5-2x that because LRU/LFU policies do not perfectly track the working set.
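The sizing arithmetic above is easy to script so it can be rerun with your own traffic numbers. The function names and the `headroom` parameter below are illustrative assumptions; the formula and the 1.5-2x factor come from the text.

```python
def working_set_bytes(unique_keys_per_second, avg_value_size, ttl_seconds):
    """Bytes written into the cache during one TTL window."""
    return unique_keys_per_second * avg_value_size * ttl_seconds


def recommended_cache_bytes(working_set, headroom=1.5):
    """Add headroom because eviction policies track the working set imperfectly."""
    return int(working_set * headroom)


ws = working_set_bytes(10_000, 1_000, 300)
print(f"working set: {ws / 1e9:.1f} GB")
print(f"with 1.5x headroom: {recommended_cache_bytes(ws, 1.5) / 1e9:.1f} GB")
print(f"with 2x headroom: {recommended_cache_bytes(ws, 2.0) / 1e9:.1f} GB")
```

For the example figures this gives a 3 GB working set and a provisioning target of roughly 4.5 to 6 GB.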
The hit rate curve: start at 0% hit rate with no cache, rapid climb as cache grows to cover the hot working set, then diminishing returns as cache size exceeds working set. Plot your hit rate against cache size to find the knee of the curve — the point where adding more cache stops helping significantly. This is your target cache size.
For cache-aside specifically, the miss penalty matters more than raw hit rate. A cache miss does a full database round-trip. If your database latency is 10ms and cache latency is 0.5ms, each miss costs 9.5ms extra. At 99% hit rate, only 1% of requests pay the miss penalty. At 95% hit rate, 5% pay it — a 5x increase in slow queries.
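The effect of hit rate on average latency can be worked through with a short calculation. This sketch uses the figures from the paragraph above (0.5 ms cache, 10 ms database) and assumes, as the text does, that a miss costs one database round-trip; `average_latency_ms` is a hypothetical helper name.

```python
def average_latency_ms(hit_rate, cache_ms=0.5, db_ms=10.0):
    """Mean read latency when a miss costs one database round-trip."""
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms


for hit_rate in (0.99, 0.95, 0.80):
    print(f"hit rate {hit_rate:.0%}: {average_latency_ms(hit_rate):.3f} ms average")
```

Dropping from 99% to 95% moves the average from about 0.6 ms to about 1.0 ms, and sends five times as many requests to the database, which is usually the more important number for capacity planning.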
Monitoring and Operations
Observability Checklist
Monitor these metrics and set up alerts for production cache health.
Metrics to Track
- Hit Rate: `hits / (hits + misses)` - should stay above 80-90% for well-tuned caches
- Memory Usage: `used_memory / maxmemory` - alert at 70%, critical at 80%
- Eviction Count: `evicted_keys` - indicates memory pressure
- Connection Count: `connected_clients` - sudden drops indicate connection issues
- Command Latency: P50, P95, P99 for GET/SET operations
- Replication Lag: For replicated setups, lag should stay below 100ms
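If your cache server does not expose hit and miss counters, or you want the hit rate per code path, it can be tracked in the application. The sketch below is one minimal way to do that; `HitRateCounter` and `instrumented_get` are hypothetical names, and the cache is assumed to be any object with a `get` method that returns `None` on a miss.

```python
class HitRateCounter:
    """Counts cache hits and misses and derives the hit rate."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        # Report 0.0 rather than dividing by zero before any traffic.
        return self.hits / total if total else 0.0


def instrumented_get(cache, key, counter):
    """Look up a key and record whether it was a hit."""
    value = cache.get(key)
    counter.record(value is not None)
    return value


counter = HitRateCounter()
cache = {"user:1": "alice"}  # a dict stands in for the real cache client
for key in ("user:1", "user:1", "user:2", "user:1"):
    instrumented_get(cache, key, counter)
print(f"hit rate: {counter.hit_rate:.0%}")
```

In production the two counters would normally be exported to your metrics system rather than read from the object directly.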
Logs to Capture
# Log cache operations for debugging
import json
import time
import structlog

logger = structlog.get_logger()

def get_user(user_id):
    cache_key = f"user:{user_id}"
    start = time.time()
    cached = redis.get(cache_key)
    if cached:
        logger.info("cache_hit", key=cache_key, latency_ms=(time.time() - start) * 1000)
        return json.loads(cached)
    # Cache miss - this should be rare in production
    logger.warning("cache_miss", key=cache_key)
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    return user
Alert Rules
# Prometheus alert rules for Redis
# rate() measures the recent hit ratio; the raw counters only give the ratio since startup
- alert: CacheHitRateLow
  expr: |
    rate(redis_keyspace_hits_total[5m])
      / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Cache hit rate below 80%"
- alert: CacheMemoryExhausted
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Cache memory above 80% capacity"
Security Checklist
Cache security is often overlooked until a breach happens.
- Never expose Redis/Memcached directly to the internet - Bind to localhost or private network only
- Use authentication - Redis `requirepass` or Memcached SASL authentication
- Enable TLS - For connections crossing network boundaries
- Validate key namespaces - Use prefixes like `app:env:table:` to prevent key collisions
- Sanitize cache keys - User input should never become cache keys without validation
- Implement rate limiting - Prevent cache exhaustion attacks
- Audit cache access - Log who accessed what, especially for sensitive data
- Never cache sensitive data - PII, passwords, tokens, payment info should never enter the cache
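The namespace and key-sanitization items above can be combined in one helper. The sketch below is one possible approach under stated assumptions: `build_cache_key`, the allowed character set, and the 64-character limit are illustrative choices, not a standard.

```python
import re

# Allow only characters that cannot be confused with the ":" separator
# or with glob patterns such as "*" used by key-scanning commands.
_SAFE_PART = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def build_cache_key(app, env, table, identifier):
    """Build an app:env:table:id key, rejecting unsafe components."""
    parts = [app, env, table, str(identifier)]
    for part in parts:
        if not _SAFE_PART.fullmatch(part):
            raise ValueError(f"unsafe cache key component: {part!r}")
    return ":".join(parts)


print(build_cache_key("shop", "prod", "user", 123))

try:
    build_cache_key("shop", "prod", "user", "123:admin")
except ValueError as exc:
    print(exc)
```

Rejecting the separator in user-supplied components prevents a crafted identifier from pointing at another namespace's key.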
# Redis secure configuration
bind 127.0.0.1 -::1
requirepass your-strong-password-here
tls-replication yes
tls-auth-clients no
Trade-off Analysis
Understanding the trade-offs between caching strategies helps you make informed decisions for your specific use case.
Production Failure Scenarios
Understanding what fails and how to recover is critical for production caching systems.
Cache Node Failure
When a cache node goes down, every request that would have been served by that node hits the database directly, potentially causing a cascading failure.
Mitigation:
- Use connection pooling with automatic retry
- Implement circuit breaker pattern
- Use replica nodes for read failover
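The circuit breaker mitigation above can be sketched as follows. This is a minimal illustration rather than a production implementation: `CircuitBreaker`, `get_with_fallback`, and the `cache_get` / `db_get` callables are hypothetical names, and the thresholds are arbitrary.

```python
import time


class CircuitBreaker:
    """Stops calling a failing cache for `reset_after` seconds (hypothetical helper)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        # Closed circuit: calls go through.
        if self.opened_at is None:
            return True
        # Open circuit: allow a trial call once the cool-down has passed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def get_with_fallback(key, cache_get, db_get, breaker):
    """Try the cache unless the breaker is open; fall back to the database."""
    if breaker.allow():
        try:
            value = cache_get(key)
            breaker.record_success()
            if value is not None:
                return value
        except (ConnectionError, TimeoutError):
            breaker.record_failure()
    return db_get(key)
```

While the breaker is open, requests skip the cache entirely instead of waiting on a dead node, so the database sees load but not added timeout latency.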
Memory Exhaustion
When cache memory is exhausted, eviction kicks in aggressively and the hit rate can drop sharply, because entries are evicted before they are read again.
Mitigation:
- Monitor memory usage and set appropriate `maxmemory` limits
- Alert at 70% threshold
- Implement proper TTL policies
Network Partition
When network connectivity between application and cache fails, requests hang or time out.
Mitigation:
- Set reasonable socket timeouts (100-500ms)
- Configure fail-fast behavior to fall back to database
- Use connection pooling with health checks
Thundering Herd on Restart
When a cache restarts empty, all clients miss at once and hit the database simultaneously.
Mitigation:
- Pre-warm cache on restart
- Use staggered TTLs with jitter
- Implement request coalescing (semaphores or locks)
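Request coalescing from the list above can be sketched in-process with a lock and an event. `SingleFlight` is a hypothetical helper name, and this version only coalesces callers inside one process; coordinating across application servers would need a distributed lock instead.

```python
import threading


class SingleFlight:
    """Coalesces concurrent loads of the same key into one call (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> (done_event, result_holder)

    def do(self, key, loader):
        with self._lock:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._in_flight[key] = entry
        done, holder = entry
        if leader:
            try:
                holder["value"] = loader(key)
            except Exception as exc:
                holder["error"] = exc  # waiters see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                done.set()
        else:
            done.wait()  # another caller is already loading this key
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```

The first caller for a key becomes the leader and runs the loader; everyone who arrives while it is running waits and receives the same result.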
Cache Credential Rotation
During credential rotation, brief outages or authentication failures can occur.
Mitigation:
- Use connection pooling with lazy reconnection
- Rotate credentials during low-traffic windows
- Implement connection string rotation
Consistency vs Performance
| Strategy | Consistency | Performance | Complexity |
|---|---|---|---|
| Cache-Aside | Eventual | High (after warm-up) | Low |
| Write-Through | Strong | Read-optimized | Medium |
| Write-Behind | Eventual | Write-optimized | High |
| Refresh-Ahead | Near-strong | Best for hot data | High |
Memory vs Hit Rate
Larger caches achieve higher hit rates, but the relationship is not linear:
- Working set fits in cache: 95%+ hit rate achievable
- Working set exceeds cache: Hit rate drops proportionally
- Diminishing returns: After the “knee” of the curve, adding memory yields minimal improvement
Latency vs Durability
| Approach | Latency | Durability | Risk |
|---|---|---|---|
| Write-through | Higher (waits for DB) | Best (dual-write) | Low |
| Write-behind | Lowest (async) | Risk of loss | Higher |
| Cache-aside | Variable | Database-only | Medium |
Implementation Complexity vs Operational Burden
| Pattern | Code Complexity | Operational Complexity |
|---|---|---|
| Cache-Aside | Low | Low |
| Read-Through | Low | Medium |
| Write-Through | Medium | Medium |
| Write-Behind | High | High |
| Refresh-Ahead | High | High |
Quick Recap + Interview Questions
Key Bullets
- Cache-aside is the default strategy for most read-heavy workloads
- Write-through ensures strong consistency but increases write latency
- Write-behind batches writes for performance but risks data loss
- Refresh-ahead eliminates misses for popular items but adds complexity
- Always implement stampede protection when cache misses could cascade
- Monitor hit rate, memory usage, and eviction counts continuously
Copy/Paste Checklist
# Cache-Aside Implementation Checklist
- [ ] Check cache first (redis.get)
- [ ] On miss, query database
- [ ] Populate cache with TTL (redis.setex)
- [ ] On write, invalidate cache (redis.delete), don't update
- [ ] Implement stampede protection with locks
- [ ] Cache null values with short TTL to prevent penetration
- [ ] Monitor hit rate - should be >80%
- [ ] Set appropriate TTLs based on data freshness requirements
- [ ] Log cache hits and misses for observability
- [ ] Use circuit breaker for cache failures
# TTL Selection Guide
- [ ] User profiles: 15-60 minutes
- [ ] Session data: 24 hours
- [ ] API responses: 5-30 minutes
- [ ] Static config: 1-24 hours
- [ ] Product catalog: 1-24 hours
- [ ] Real-time data: No caching or very short TTL (30-60 seconds)
Best Practices Summary
Architecture Principles
- Treat caching as an optimization, not a requirement. If your database handles load fine, you may not need caching. Add caching when you have measurable latency or throughput problems.
- Design for cache failure. Your application should degrade gracefully when the cache is unavailable — fall back to the database directly.
- Treat cache contents as disposable. Cache nodes should not hold state that cannot be recovered from the source of truth. If a cache node restarts, its keys should be rebuildable on demand.
- Instrument everything. Cache hit rates, eviction counts, memory usage — you cannot tune what you cannot measure.
Operational Guidelines
- Start with cache-aside. It is the simplest strategy with the best debuggability. Add complexity only when measurements tell you to.
- Use TTLs on everything. No key should live forever. TTLs prevent unbounded memory growth and ensure eventual consistency.
- Namespace your keys. Use prefixes like `app:env:entity:id` to prevent collisions in shared cache infrastructure.
- Monitor the 80% threshold. Cache hit rate should be above 80-90% for well-tuned caches. If it is lower, either your working set does not fit or your access patterns are too uniform.
- Test failure modes. Periodically kill cache nodes and verify your application handles it gracefully.
Code Quality
- Never use cache as primary store. The database is always the source of truth.
- Invalidate on write, never update. Delete cache entries when data changes rather than trying to keep cache and database in sync.
- Handle the null case. Cache null values to prevent cache penetration attacks.
- Protect against stampedes. Use locks or probabilistic early expiration when cache misses are expensive.
Interview Questions
What is the difference between cache-aside and write-through caching?
Cache-aside (lazy loading): the app checks the cache first, loads from the database on a miss, then populates the cache. Writes go directly to the database, and the cache is invalidated afterward.
Write-through: every write goes to both the cache and database together. The operation does not return until both succeed.
Cache-aside wins for read-heavy workloads where brief inconsistency is acceptable. Write-through makes more sense when consistency matters more than write speed and writes are infrequent relative to reads.
What is a cache stampede, and how do you prevent it?
A cache stampede (thundering herd) happens when a popular entry expires and multiple concurrent requests all try to rebuild it at the same time, overwhelming the database.
Prevention strategies:
- Lock-based protection: only one request rebuilds the cache; others wait and retry.
- Probabilistic early expiration: randomly refresh entries before they expire based on a probability function.
- Mutex + early expiration combined: refresh early but coordinate with locks so only one request does the work.
- Background refreshing: a separate thread or process keeps popular entries warm before they expire.
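Probabilistic early expiration is often implemented with the rule sometimes called "XFetch": each request independently decides whether to refresh, with a probability that rises as expiry approaches and as the rebuild gets slower. The sketch below assumes the caller tracks `expires_at` and the last measured `recompute_seconds` alongside the cached value; the function name and parameters are illustrative.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_seconds, beta=1.0, now=None, rng=random.random):
    """Return True when this request should rebuild the entry before it expires.

    beta > 1 makes early refreshes more likely, beta < 1 less likely.
    """
    now = time.time() if now is None else now
    # 1 - rng() lies in (0, 1], so log() never receives zero.
    jump = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + jump >= expires_at
```

Because each request draws its own random value, refreshes are spread out in time instead of all firing at the moment of expiry.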
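How does consistent hashing help in a distributed cache?
Consistent hashing maps keys to cache nodes based on hash values. When nodes are added or removed, only about K/n keys remap on average (where K is the total number of keys and n is the number of nodes), minimizing cache misses during scaling.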
Key benefits:
- Less cache invalidation needed during scaling events
- Better load distribution across nodes
- Easier horizontal scaling for cache clusters
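A minimal hash ring makes the remapping behavior concrete. This is a sketch, not a tuned implementation: `HashRing`, the node names, and the choice of MD5 with 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (sketch; names are illustrative)."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node is placed at many positions to even out the distribution.
        for i in range(self._vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node is added, the only keys that change owner are the ones the new node takes over; no key moves between the existing nodes.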
What is cache penetration, and how do you mitigate it?
Cache penetration occurs when requests repeatedly query for keys that do not exist in the cache or database. Each request bypasses the cache and hits the database, negating the cache's purpose.
Mitigations:
- Cache null values: store a marker (like "NULL") for non-existent keys with a short TTL to prevent repeated lookups.
- Bloom filters: use a bloom filter to quickly determine whether a key might exist before querying the cache or database.
- Input validation: sanitize cache keys to reject obviously invalid requests early.
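The Bloom filter mitigation can be sketched with the standard library alone. The sizes below are arbitrary, and `BloomFilter` and `lookup` are hypothetical names; a real deployment would size the filter from the expected key count and target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def lookup(key, bloom, db):
    """Skip the database entirely for keys the filter says cannot exist."""
    if not bloom.might_contain(key):
        return None
    return db.get(key)
```

The filter has to be updated whenever a new key is created, otherwise valid keys would be rejected as non-existent.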
When would you choose write-behind over write-through?
Write-behind batches database writes in the background, returning immediately after the cache is updated. Write-through waits for both cache and database to succeed before returning.
Choose write-behind when:
- Write latency matters more than immediate durability
- You are collecting metrics, events, or analytics where losing a few writes is acceptable
- You want to reduce database load from burst writes
- Data loss risk is acceptable (your application can tolerate retransmission or recomputation)
Skip write-behind when data consistency is critical or you cannot tolerate any data loss.
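The durability trade-off is easier to see in code. The sketch below assumes a dict stands in for the database and that flushing is triggered by batch size; `WriteBehindCache` is a hypothetical name, and a real system would also flush on a timer and handle write failures.

```python
class WriteBehindCache:
    """Write-behind sketch: writes land in the cache and a pending buffer,
    then reach the backing store in batches when flush() runs."""

    def __init__(self, store, batch_size=100):
        self.store = store          # dict standing in for the database
        self.cache = {}
        self.pending = {}           # key -> latest unflushed value
        self.batch_size = batch_size

    def put(self, key, value):
        self.cache[key] = value
        self.pending[key] = value   # repeated writes to one key coalesce
        if len(self.pending) >= self.batch_size:
            self.flush()

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def flush(self):
        # Anything still in `pending` is lost if the process dies before this.
        batch, self.pending = self.pending, {}
        self.store.update(batch)
        return len(batch)
```

Two writes to the same key before a flush produce a single database write, which is where the load reduction comes from, and the unflushed buffer is exactly the data at risk.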
How do you choose a TTL for cached data?
TTL selection depends on three factors:
- Staleness tolerance: real-time data (prices, inventory) needs short TTLs (seconds to minutes). Static data (config, documentation) can use hours or days.
- Miss penalty: high miss penalty (expensive database queries) suggests longer TTLs to maximize hit rate.
- Access decay pattern: content that spikes in popularity then drops (social posts) needs shorter TTLs than evergreen content.
Best practice: add jitter (+/- 10%) to TTLs to prevent synchronized expiration of related keys.
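Adding jitter takes only a few lines. The helper name `jittered_ttl` and the injectable `rng` parameter are illustrative choices, not part of any library.

```python
import random


def jittered_ttl(base_seconds, jitter_fraction=0.10, rng=random.uniform):
    """Return base_seconds adjusted by a random amount within +/- jitter_fraction."""
    spread = base_seconds * jitter_fraction
    return max(1, round(base_seconds + rng(-spread, spread)))
```

A call such as `jittered_ttl(3600)` yields a value between 3240 and 3960 seconds, so keys written together do not all expire in the same instant.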
How do TTL-based and event-driven invalidation compare?
TTL invalidation: entries automatically expire after a fixed duration. Simple, requires no application logic, but cannot provide immediate consistency when data changes.
Event-driven invalidation: when data changes in the database, a message is published (pub/sub) and all cache nodes delete the corresponding entry. Provides immediate consistency but requires more infrastructure and can miss events during failures.
The hybrid approach uses event invalidation for immediate consistency with TTL as a safety net for missed events. This is the most robust pattern for production systems.
What is a two-tier (L1/L2) cache, and what are its benefits?
A two-tier cache places a small, fast local cache (L1) in front of a larger, distributed cache (L2). L1 is typically an in-memory cache on each application server. L2 is a shared cache like Redis or Memcached.
Benefits:
- L1 hit rate of 50-60% for shared popular content (users on same machine accessing same data)
- Ultra-low latency for L1 hits (microseconds vs milliseconds for L2)
- L2 provides capacity for the warm cache beyond what fits in local memory
- Reduces cross-network traffic to L2 cache
YouTube's architecture uses exactly this pattern with per-machine L1, distributed L2, and CDN at the edge.
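The L1/L2 read path can be sketched as below. A shared dict stands in for the distributed L2 cache, and `TwoTierCache` is a hypothetical name. The sketch omits TTLs and L1 invalidation, both of which a real deployment needs to keep local copies from going stale.

```python
from collections import OrderedDict


class TwoTierCache:
    """L1: small per-process LRU. L2: shared cache (a dict stands in for Redis here)."""

    def __init__(self, l2, l1_capacity=1000):
        self.l1 = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2 = l2

    def _l1_put(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)   # evict the least recently used entry

    def get(self, key, loader):
        if key in self.l1:                # L1 hit: no network hop
            self.l1.move_to_end(key)
            return self.l1[key]
        value = self.l2.get(key)          # L2 hit: one network hop in practice
        if value is None:
            value = loader(key)           # miss in both tiers: go to the origin
            self.l2[key] = value
        self._l1_put(key, value)
        return value
```

Two application instances sharing the same L2 load a key from the origin only once, while each keeps its own small hot set locally.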
How does refresh-ahead differ from cache-aside?
Cache-aside: cache is populated on read misses. Users occasionally experience cache miss latency.
Refresh-ahead: cache entries are proactively refreshed before they expire, based on predicted access patterns. Popular entries stay perpetually warm.
Trade-offs:
- Pro: eliminates cache miss latency for tracked popular items; smoother performance under varying loads
- Con: wasted resources refreshing items not actually needed; complexity in tracking truly popular keys; risk of refreshing stale data
Best for: known hot data sets where read latency matters more than wasted refresh cycles.
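One common way to approximate refresh-ahead is to trigger a background reload when a read finds an entry that has used up most of its TTL. The sketch below follows that approach; `RefreshAheadCache`, the 80% threshold, and the `submit` hook are assumptions for illustration, and it refreshes on access rather than predicting access patterns.

```python
import threading
import time


class RefreshAheadCache:
    """Serves the cached value and reloads it in the background near expiry."""

    def __init__(self, loader, ttl=60.0, refresh_ratio=0.8, clock=time.monotonic, submit=None):
        self.loader = loader
        self.ttl = ttl
        self.refresh_ratio = refresh_ratio
        self.clock = clock
        # `submit` runs a zero-argument task; the default uses a daemon thread.
        self.submit = submit or (lambda task: threading.Thread(target=task, daemon=True).start())
        self._data = {}           # key -> (value, stored_at)
        self._refreshing = set()  # keys with a reload already scheduled

    def _store(self, key):
        value = self.loader(key)
        self._data[key] = (value, self.clock())
        return value

    def _refresh(self, key):
        try:
            self._store(key)
        finally:
            self._refreshing.discard(key)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return self._store(key)        # cold miss: blocking load
        value, stored_at = entry
        age = self.clock() - stored_at
        if age >= self.ttl:
            return self._store(key)        # expired: blocking reload
        if age >= self.ttl * self.refresh_ratio and key not in self._refreshing:
            self._refreshing.add(key)
            self.submit(lambda: self._refresh(key))  # reload without blocking the reader
        return value
```

Readers of a popular key keep getting the current value while the reload runs, so they never pay the miss latency as long as the key is read at least once in the refresh window.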
Which metrics would you monitor for a cache?
Primary metrics:
- Hit rate: hits / (hits + misses). Should stay above 80-90%. Drop indicates working set does not fit or access pattern changed.
- Memory usage: used_memory / maxmemory. Alert at 70%, critical at 80%.
- Eviction count: Rate of evicted_keys. High rate indicates memory pressure.
- Command latency: P50, P95, P99 for GET/SET operations.
Secondary metrics:
- Connection count (sudden drops indicate connection issues)
- Replication lag for replicated setups (should stay below 100ms)
- Error rate (connection errors, timeout errors)
How do you handle cache cold starts?
Cold start problems occur when a cache restarts or when new data becomes hot without warning. Design for both scenarios:
- Pre-warming on restart: After a cache node restarts, run a background job that populates the cache with the most frequently accessed keys before serving traffic.
- Predictive warming: Track access patterns and pre-populate cache for data that is likely to become hot (scheduled events, expected traffic spikes).
- Staggered key population: Avoid repopulating everything at once by staggering cache population based on key popularity.
- Request coalescing: During cold start, allow only one request to rebuild a missing key while others wait. Prevents multiple requests from hitting the database simultaneously.
For Twitter-style workloads where content popularity spikes and then decays, warming at write time (fanout-on-write) trades write amplification for consistent read latency.
How do LRU, LFU, and TTL-based eviction differ?
LRU (Least Recently Used): Evicts the least recently accessed item. Good for temporal access patterns where recently accessed items are likely to be accessed again.
LFU (Least Frequently Used): Evicts the least frequently accessed item. Better for sustained hot data where popularity is stable over time.
TTL-based: Entries expire after a fixed time regardless of access frequency. Best for data that naturally becomes stale.
When to choose:
- LRU: General purpose, works well when access patterns have temporal locality. Memcached defaults to LRU.
- LFU: When you have stable hot sets and want to protect frequently-accessed items from being evicted by one-time accesses. Redis supports LFU through the `allkeys-lfu` and `volatile-lfu` eviction policies.
- TTL: When data freshness matters more than access frequency. Always use TTL as a safety net even with LRU/LFU.
Most production systems use LRU with TTL as a complementary eviction mechanism rather than relying on a single policy.
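The difference between LRU and LFU shows up when a one-time scan passes through the cache. The two classes below are deliberately small sketches; in particular, `LFUCache` picks its victim with a linear scan, which real implementations avoid.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Evicts the entry that was used least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)


class LFUCache:
    """Evicts the entry with the fewest accesses (linear scan, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._hits = Counter()

    def get(self, key):
        if key not in self._data:
            return None
        self._hits[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            victim = min(self._data, key=lambda k: self._hits[k])
            del self._data[victim]
            del self._hits[victim]
        self._data[key] = value
        self._hits[key] += 1
```

With a capacity of two, a frequently read key followed by three one-off inserts is evicted under LRU but survives under LFU, because its access count stays higher than that of the newcomers.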
What is the difference between cache sharding and consistent hashing?
Cache sharding: Partition data by entity type or key prefix. All data for a user stays in the same shard, enabling multi-key operations and pipelining within a shard.
Consistent hashing: Map keys to nodes based on hash values. Provides better load distribution when nodes are added or removed because only K/n keys remap.
When to use sharding:
- You have entity types with different access patterns and sizes
- You need atomic multi-key operations within an entity
- You want simpler debugging (data for entity X is always on shard Y)
When to use consistent hashing:
- Uniform distribution of keys across nodes is critical
- You frequently add/remove cache nodes
- You want to minimize cache invalidation during scaling
Many systems combine both: data is partitioned into shards, and consistent hashing handles node failure and rebalancing within each shard's node group.
What is cache poisoning, and how do you prevent it?
Cache poisoning occurs when an attacker injects malicious data into your cache that is then served to many users. This is usually achieved by exploiting cache key collisions or by polluting a shared cache with malicious values.
Prevention strategies:
- Key validation: Sanitize cache keys to reject special characters, extremely long keys, or malformed input that could become injection vectors.
- Key namespacing: Use prefixes like `app:env:entity:id` to prevent collision between different applications sharing cache infrastructure.
- Input validation before caching: Validate data before storing in cache. Do not cache unchecked user input.
- Cache access controls: Implement authentication for cache access and audit who accesses what.
- Integrity checks: Sign cached values and verify signature before serving. Prevents tampering with cached data.
If your cache is shared across multiple applications, a compromised app can poison data that affects other applications. Namespacing and access controls are critical in multi-tenant cache deployments.
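The integrity-check strategy can be sketched with an HMAC over the serialized value. `seal`, `unseal`, and the hard-coded `SECRET_KEY` are illustrative assumptions; in practice the key would come from a secret store and would not be available to other tenants of the cache.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: loaded from a secret store in practice


def seal(value):
    """Serialize a value and prefix it with an HMAC-SHA256 tag."""
    payload = json.dumps(value, sort_keys=True)
    tag = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return tag + ":" + payload


def unseal(blob):
    """Return the original value, or None if the tag does not match."""
    tag, _, payload = blob.partition(":")
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(payload)
```

A reader that gets `None` from `unseal` should treat the entry as a miss and reload from the source of truth.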
How do you keep caches consistent across microservices?
In microservices, the same data (e.g., user profile) may be cached by multiple services independently. This creates consistency challenges:
Pattern: Single source of truth with pub/sub invalidation
- The service owning the data publishes invalidation events when data changes.
- All other services subscribe and invalidate their local caches.
- TTL acts as a safety net if events are missed.
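The pub/sub invalidation flow can be sketched with an in-process bus. `InvalidationBus` stands in for a real channel such as Redis pub/sub, and the service and channel names are hypothetical. The TTL safety net is left out to keep the sketch short.

```python
from collections import defaultdict


class InvalidationBus:
    """In-process stand-in for a pub/sub channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)


class ServiceCache:
    """Local cache of one service; drops entries when the owner announces a change."""

    def __init__(self, bus, channel="user-invalidations"):
        self.entries = {}
        bus.subscribe(channel, self._on_invalidate)

    def _on_invalidate(self, key):
        self.entries.pop(key, None)


def update_profile(user_id, profile, db, bus, channel="user-invalidations"):
    db[user_id] = profile                      # 1. write to the source of truth
    bus.publish(channel, f"user:{user_id}")    # 2. tell subscribers to drop their copy
```

Publishing after the database write matters: a subscriber that reloads immediately after the event then sees the new value.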
Pattern: Cache-aside with external invalidation
- Central cache layer (Redis) holds canonical cached data.
- Services read from central cache instead of maintaining their own caches.
- Simpler consistency model but central cache becomes a dependency.
Key consideration: You cannot have strong consistency across independent caches. Design for eventual consistency and use write-through or event invalidation to minimize the inconsistency window. If strict consistency is required, bypass caches on reads and use write-through on writes.
How does cache hit rate affect latency and database load?
Latency impact: Cache hit latency is typically 0.1-1ms vs 5-50ms for database queries, so each miss costs roughly one full database round-trip.
Throughput impact: Database queries limit concurrent operations due to connection pool constraints. Cache hits free database connections for other requests.
Capacity calculation:
- If the application serves 10,000 requests/second, a 99% hit rate means only 100 queries/second reach the database.
- Assuming a 0.5ms hit and a 50ms miss, 1% of requests are slow and 99% are fast. Average latency = 0.99 * 0.5ms + 0.01 * 50ms = 0.995ms.
- At 95% hit rate: 0.95 * 0.5ms + 0.05 * 50ms = 2.975ms average. A 4-point drop in hit rate causes a ~3x increase in average latency.
Rule of thumb: 99% hit rate gives ~1ms average latency. 95% gives ~3ms. 90% gives ~5.5ms. The miss penalty dominates once hit rate drops below 95%. Cache tuning efforts should target 95%+ hit rate on the hot working set.
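The arithmetic behind these figures is a weighted average, and it can be checked directly. The 0.5ms hit and 50ms miss defaults mirror the numbers used in this answer; the function names are illustrative.

```python
def average_latency_ms(hit_rate, hit_ms=0.5, miss_ms=50.0):
    """Expected latency per request for a given cache hit rate."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def database_qps(total_qps, hit_rate):
    """Queries per second that fall through to the database."""
    return total_qps * (1.0 - hit_rate)


for rate in (0.99, 0.95, 0.90):
    print(f"{rate:.0%} hit rate -> {average_latency_ms(rate):.3f} ms average")
# 99% hit rate -> 0.995 ms average
# 95% hit rate -> 2.975 ms average
# 90% hit rate -> 5.450 ms average
```

Plugging in your own measured hit and miss latencies shows how much a given hit-rate improvement is actually worth.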
How would you implement rate limiting with a distributed cache?
Rate limiting using a distributed cache like Redis uses atomic increment operations with expiry:
# Fixed window rate limiter
# Assumes `redis` is a redis-py client configured elsewhere.
import time
import uuid

def rate_limit(key, limit, window_seconds):
    current = redis.incr(key)
    if current == 1:
        # First request in this window starts the expiry clock
        redis.expire(key, window_seconds)
    return current <= limit

# Sliding window log (more accurate)
def sliding_window_rate_limit(key, limit, window_seconds):
    now = time.time()
    window_start = now - window_seconds
    # Drop entries that have fallen out of the window
    redis.zremrangebyscore(key, 0, window_start)
    current = redis.zcard(key)
    if current < limit:
        # redis-py 3.0+ takes a {member: score} mapping
        redis.zadd(key, {str(uuid.uuid4()): now})
        redis.expire(key, window_seconds + 1)
    return current < limit
Considerations:
- Use atomic operations to prevent race conditions
- Lua scripts for Redis ensure read-check-increment is atomic
- Sliding window is more accurate but costs more operations
- Fixed window is simpler but allows burst at window boundaries
What are the trade-offs between a local cache and a distributed cache?
Local cache (e.g., Caffeine, LRUCache):
- Ultra-low latency (microseconds) - no network round-trip
- No serialization/deserialization overhead
- Cannot share across application instances
- Lost on application restart
- Memory limited to application process size
Distributed cache (e.g., Redis, Memcached):
- Shared across all application instances
- Persists across application restarts
- Network latency (0.5-2ms per operation)
- Serialization overhead
- Single point of failure (mitigated with replication)
Best practice: Use a two-tier cache: local as L1 for the ultra-hot set, distributed as L2 for the warm cache. This gives you microsecond latency for L1 hits while sharing data across instances via L2.
How would you debug a cache performance problem?
Step 1: Identify the pattern
- Sudden latency spike: likely cache node failure or network partition
- Gradual performance degradation: likely memory pressure, increasing evictions
- Intermittent issues: likely connection pool exhaustion or periodic garbage collection
Step 2: Check metrics
- Hit rate drop: working set grew beyond cache size or access pattern changed
- Memory usage spike: likely key accumulation, TTL misconfiguration, or memory leak
- High eviction rate: cache undersized for working set
Step 3: Check logs
- Connection timeouts: network issues or cache overload
- OOM errors: maxmemory misconfigured or eviction policy not working
Step 4: Test assumptions
- Bypass cache and hit database directly to isolate whether cache is the problem
- Use cache introspection commands (Redis INFO, Memcached stats) to dump internal state
Step 5: Fix and verify
- Add capacity or tune eviction policy
- Implement circuit breaker to degrade gracefully
- Monitor to confirm fix worked
When can caching hurt rather than help?
Scenario: Write-heavy workload with strong consistency requirements
Imagine a real-time bidding system where each ad impression generates a write, and every read must reflect the most recent bid state (no staleness allowed).
Why caching hurts:
- Cache-aside introduces eventual consistency - reads might return stale bid data, causing incorrect pricing
- Write-through doubles write latency (cache + database) for every impression
- Cache invalidation logic adds complexity and potential for bugs in hot path
- Cache might be populated with data that is never read again (each bid is unique)
Better approach:
- Use the database as the primary store with proper indexing
- Consider database read replicas if read latency is the concern
- Only add caching when measurements prove it helps
The key lesson: caching trades consistency for performance. When consistency is more important than performance (financial systems, real-time bidding), caching can actively harm your system by introducing bugs that are hard to reproduce (race conditions between cache and database) while adding complexity.
Conclusion
There is no single best caching strategy. Cache-aside is the default because it covers the most cases with the least complexity. But you’ll encounter situations where write-through or refresh-ahead fits better.
Start simple. Measure your hit rate. Add complexity only when the data tells you to.