Distributed Caching: Scaling Cache Across Multiple Nodes

A comprehensive guide to distributed caching — consistent hashing, cache sharding, replica consistency, cache clustering, and handling the unique challenges of multi-node cache environments.

published: February 15, 2026 reading time: 52 min read author: Geek Workbench updated: June 17, 2026

Quick Summary

Distributed caching scales cache across multiple nodes using consistent hashing to map keys to nodes, minimizing reorganization when adding or removing nodes compared to naive modulo hashing. Client-side sharding puts routing logic in the client for low latency with no proxy SPOF, while proxy-based routing hides cluster topology but adds latency and a potential failure point. The hard problems are cache invalidation across replicas, handling the thundering herd on cache miss, and keeping replicas consistent without killing read throughput. After reading, you will understand how to shard a cache, choose between client-side and proxy routing, and handle the failure modes unique to distributed caches.

Introduction

Single-node caching works until it doesn’t.

Limitation	What Happens
Memory	Your 16GB cache node fills up. You need 64GB.
Requests	Your cache serves 100k requests/second. You need 500k.
Availability	Your cache node crashes. Every request hits the database.

Distributed caching solves all three. But it introduces coordination problems.

Core Concepts

Client-Side Sharding

The client decides which cache node to use based on the key. No proxy needed.

graph LR
    A[Client] --> B{Which node?}
    B -->|key1| C[Node 1]
    B -->|key2| D[Node 2]
    B -->|key3| E[Node 3]

import hashlib
from bisect import bisect

class ShardedCache:
    def __init__(self, nodes):
        self.nodes = nodes
        self.ring = {}
        self.sorted_keys = []
        self._build_ring()

    def _build_ring(self):
        for node in self.nodes:
            for i in range(150):
                key = f"{node}:{i}"
                hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
                self.ring[hash_key] = node
                self.sorted_keys.append(hash_key)
        self.sorted_keys.sort()

    def _get_node(self, key):
        if not self.ring:
            return None
        hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect(self.sorted_keys, hash_key)
        if idx >= len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

    def get(self, key):
        node = self._get_node(key)
        return node.cache.get(key)

    def set(self, key, value):
        node = self._get_node(key)
        node.cache.set(key, value)

Pros: No proxy layer, low latency, simple for small clusters. Cons: Client manages sharding, adding nodes requires moving data.

Proxy-Based Routing

Proxy-based routing puts a layer between clients and cache nodes. The client sends requests to the proxy, the proxy looks up which node owns the key, and forwards the request there. The client never knows or cares which node actually stores the data. Managed services like Redis Enterprise, ElastiCache, and Twemproxy use this model, and it is common when you want to hide cluster topology from application code entirely.

The big advantage is that cluster changes do not require application changes. If you add a node, remove a node, or replace a node, the proxy configuration updates and clients keep sending to the same endpoint. This matters in environments where deploying application changes is expensive or slow, or where you want the flexibility to swap cache infrastructure without a code release.

The downside is an extra network hop. The client-to-proxy leg and proxy-to-node leg add roughly 0.5-1ms of latency on top of the cache operation itself. For most applications this is fine. For workloads doing millions of cache operations per second where every microsecond counts, that overhead compounds. The proxy is also a single point of failure unless you run multiple proxies behind a load balancer, which reintroduces operational complexity you were trying to avoid.

A single proxy can also become a bottleneck at high request volumes. You have to size the proxy tier for your peak load and watch its CPU and connection counts. Managed services autoscale the proxy layer for you. If you are running Twemproxy or a similar tool yourself, plan for proxy capacity separately from cache node capacity.

graph LR
    A[Client] --> B[Proxy]
    B --> C[Node 1]
    B --> D[Node 2]
    B --> E[Node 3]

Use proxy-based routing when: you run a managed cache service that exposes this model natively, your application stack uses multiple languages where implementing consistent hashing in each client library would be duplicated effort, or you want to be able to swap cache cluster topology without touching application code.

Avoid it when: every microsecond matters in your hot path, you already have solid client-side sharding in place, or the operational cost of an additional tier is not justified by the simplicity gains.

Pros: Client code stays simple, cluster changes do not require redeploys, easy to change cache technology behind the same interface. Cons: Extra hop adds latency, proxy is a single point of failure without HA setup, proxy can bottleneck under load, adds cost if using a managed service.

Redis Cluster

Redis Cluster is Redis’s native distributed mode. It handles sharding, replication, and failover automatically.

# Minimum Redis Cluster setup (6 nodes = 3 primaries + 3 replicas)
redis-cli --cluster create \
  192.168.1.1:7000 \
  192.168.1.2:7000 \
  192.168.1.3:7000 \
  192.168.1.4:7000 \
  192.168.1.5:7000 \
  192.168.1.6:7000 \
  --cluster-replicas 1

import redis

# Redis Cluster client handles topology automatically
r = redis.RedisCluster(
    host='192.168.1.1',
    port=7000,
    skip_full_coverage_check=True
)

# Works the same as single Redis
r.set('key', 'value')
r.get('key')

Redis Cluster splits keys across 16,384 slots. Each primary node owns a subset of slots. If a primary fails, its replica promotes automatically.

Topic-Specific Deep Dives

Cache Coherence

When you have multiple cache nodes, coherence becomes a problem. If you update data in Node 1, how do you invalidate that same data in Node 2?

The Problem

graph TD
    A[Write to DB] --> B[Update Node 1]
    B -.->|Async| C[Node 2 has stale data]
    D[Read from Node 2] --> E[Stale data returned]

Solutions

1. Write-Through All Nodes

On write, update all cache nodes.

def write_to_all_nodes(key, value, nodes):
    for node in nodes:
        node.set(key, value)

# Problem: More write latency, potential inconsistency if one fails

This works but is slow and fragile. One slow node drags everything down.

2. Invalidation Broadcasting

On write, broadcast invalidation to all nodes.

def invalidate_across_cluster(key, nodes, pubsub):
    # Publish invalidation event
    pubsub.publish('cache:invalidate', key)

    # Subscribe on each node
    for node in nodes:
        node.delete(key)

# Subscribe to invalidation
pubsub.subscribe('cache:invalidate')

Redis pub/sub is useful here. When data changes, publish an invalidation message. All cache instances subscribe and delete the key.

# On write
redis_cluster.publish('invalidation', json.dumps({'key': 'user:123'}))

# On each cache node
pubsub = redis_cluster.pubsub()
pubsub.subscribe('invalidation')

for message in pubsub.listen():
    if message['type'] == 'message':
        data = json.loads(message['data'])
        cache.delete(data['key'])

3. Consistent Hashing with Virtual Nodes

Use consistent hashing so keys map to the same nodes reliably. Adding a node only moves some keys.

class VirtualNodeSharding:
    def __init__(self, nodes, replicas=150):
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []

        for node in nodes:
            for i in range(replicas):
                # Virtual node for better distribution
                key = f"{node}:vnodes:{i}"
                hash_key = self._hash(key)
                self.ring[hash_key] = node
                self.sorted_keys.append(hash_key)

        self.sorted_keys.sort()

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        hash_key = self._hash(key)
        idx = bisect(self.sorted_keys, hash_key)
        if idx >= len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

Session Store Patterns

Sessions are a common distributed caching use case. User sessions need to be available across all application instances.

Redis Session Store

import redis
import json
import secrets

class RedisSessionStore:
    def __init__(self, redis_cluster, ttl=86400):
        self.redis = redis_cluster
        self.ttl = ttl

    def create_session(self, user_id, data=None):
        session_id = secrets.token_urlsafe(32)
        session_key = f"session:{session_id}"

        session_data = {
            'user_id': user_id,
            'created_at': time.time(),
            'data': data or {}
        }

        self.redis.setex(
            session_key,
            self.ttl,
            json.dumps(session_data)
        )

        return session_id

    def get_session(self, session_id):
        session_key = f"session:{session_id}"
        data = self.redis.get(session_key)

        if not data:
            return None

        return json.loads(data)

    def update_session(self, session_id, data):
        session_key = f"session:{session_id}"
        session = self.get_session(session_id)

        if not session:
            return False

        session['data'].update(data)
        session['updated_at'] = time.time()

        self.redis.setex(
            session_key,
            self.ttl,
            json.dumps(session)
        )

        return True

    def destroy_session(self, session_id):
        session_key = f"session:{session_id}"
        self.redis.delete(session_key)

Session affinity vs distributed sessions

If your application runs on multiple instances, sessions need to be shared. Two options:

Sticky sessions set a cookie (or inspect client IP) at the load balancer to route each user to the same application instance. The session lives entirely in that instance’s memory. No network hop needed for reads or writes. Most load balancers (NGINX, HAProxy, AWS ALB) enable this with a single config flag. The downside: if that instance crashes, the session disappears. Users get logged out, lose cart data, or hit whatever else was in that session blob. Rolling deployments also hurt. Old instances drain connections slowly while new instances sit idle, leaving users stranded on dying nodes.

Distributed sessions store session data in a shared cache (Redis) that every instance reads and writes. Any instance can serve any request. The RedisSessionStore class above shows the pattern: serialize to JSON, store with a TTL, all instances share the same data. No single instance holds session state exclusively.

The trade-off goes beyond resilience versus speed. Sticky sessions can create uneven load: one user’s heavy requests pile onto a single instance while others sit underutilized. Distributed sessions spread load evenly but add a network hop (500µs-2ms) to every request that touches session data. For user-facing applications where a dropped session means a frustrated customer, distributed sessions are worth the extra latency. For internal dashboards or batch processing tools where session loss is a minor inconvenience, sticky sessions keep operations simple.

Hybrid approaches use sticky sessions as the default with a distributed session store as backup. If the sticky instance fails, the load balancer reroutes to another instance, which pulls the session from Redis. The common case stays fast (in-memory), and failures fall back safely (Redis).

High Availability Patterns

Replication

Add replica nodes that copy data from the primary. Reads can go to replicas, spreading load.

# Redis replication config (on replica)
replicaof 192.168.1.1 7000
replica-read-only yes

# Read from replica, write to primary
def get_from_replica(key):
    return replica_redis.get(key)

def set_to_primary(key, value):
    return primary_redis.setex(key, ttl, value)

def get_cached(key):
    # Try primary first (has freshest data)
    val = primary_redis.get(key)
    if val:
        return val

    # Fall back to replica
    return replica_redis.get(key)

Automatic Failover

Redis Sentinel handles failover automatically. When a primary fails, Sentinel promotes a replica and updates clients.

# Sentinel configuration
sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000

from redis.sentinel import Sentinel

sentinel = Sentinel([
    ('192.168.1.1', 26379),
    ('192.168.1.2', 26379),
    ('192.168.1.3', 26379)
], socket_timeout=0.1)

# Get master and replica connections
master = sentinel.master_for('mymaster')
replica = sentinel.slave_for('mymaster')

# Automatic failover handling
master.set('key', 'value')  # Writes always go to master
replica.get('key')           # Reads can go to replica

Trade-off Analysis

This section consolidates the key architectural decisions and their implications across distributed caching implementations.

Consistency vs Availability

In a distributed cache, every write faces a fork: wait for all replicas to confirm before returning to the client, or return immediately and let replicas sync later. Redis Cluster picks the second path by default: writes return as soon as the primary acknowledges, and replicas catch up in the background. That’s faster and keeps latency low, but if the primary crashes before the next replica sync, those writes are gone. Redis Sentinel with synchronous writes to a single master and its replicas takes the first path, which means stronger consistency but your write latency now depends on the network between nodes.

The table below maps each coherence strategy to its CAP positioning. “Strong” consistency means the read always reflects the latest write accepted by the cluster; “eventual” means reads might return data from before the most recent write, within some bounded staleness window.

Approach	Consistency	Availability	Typical Use Case
Redis Cluster (async)	Eventual	High	Web applications, content caching
Redis Sentinel + writes	Strong (sync write)	Moderate (failover)	Financial data, inventory
Write-through all nodes	Strong	Low (any node down)	Small clusters, critical consistency
Pub/Sub invalidation	Eventual	High	Multi-region, social media

Latency vs Consistency Trade-offs

Read and write latency interact with consistency guarantees in ways that affect your choice of strategy. The table below compares how each approach handles the latency-versus-freshness spectrum, helping you decide which trade-off fits your workload’s tolerance for stale reads versus demand for low-latency responses.

Strategy	Read Latency	Write Latency	Staleness Window
Primary for all ops	~0.5-1ms	~0.5-1ms	None (always fresh)
Read from replica	~0.3-0.5ms	~0.5-1ms	Replication lag (typically <100ms)
Probabilistic early exp	~0.5-1ms	Same as write	TTL × probabilistic window
Stale-while-revalidate	~0.3ms (serve stale)	Background refresh	Configurable, can be seconds

Memory vs Performance

Every replica you add doubles the memory footprint of the data it mirrors. A 16GB master with two replicas consumes 48GB total across the cluster, but only 16GB is actually usable — the replicas exist for availability and read scaling, not for storing additional unique data. That shapes your cost per usable gigabyte in ways that matter at scale. Hot keys make this worse: if one key gets 10% of your traffic and you replicate it across all nodes for load distribution, you’ve paid the replication cost many times over for a single key.

Node failures compound the problem. When a primary fails and a replica promotes, the new primary handles all reads and writes with no replica protection until a new replica is provisioned. During that window, your effective cluster memory drops by half while request load spikes as the failover node tries to backfill. Knowing how your configuration handles that transition tells you more than any steady-state benchmark.

Configuration	Memory Efficiency	Performance	Failure Mode
No replication	100% usable	Highest throughput	Single node failure = data loss
1 replica per master	50% usable	~70% of no-replica	1 node fail = promoted replica serves
2 replicas per master	33% usable	~50% of no-replica	More resilience, higher cost
Hot key replication	Lower (replicated keys)	Even load distribution	Complexity in invalidation

Client-Side Sharding vs Proxy-Based Routing

Client-side sharding pushes the routing logic into your application code. The client computes a hash of the key, maps it to a node position on the consistent hash ring, and sends the request directly to that node. No intermediary means fewer network hops and fewer potential failure points. The catch is that every client library in every language your stack uses needs to implement and maintain the sharding logic. When you add a node, every client must be updated with the new ring configuration, unless you use a stable hash ring with virtual nodes to minimize remapping.

Proxy-based routing centralizes that complexity on a layer like Twemproxy, Envoy, or a managed service sidecar. Clients send all cache requests to the proxy, which fans them out to the appropriate node based on its own internal routing table. Clients stay thin — they only need to know about the proxy endpoint, not the cluster topology. The proxy can also handle health checking, retry logic, and rate limiting across the cluster. The cost is an extra hop and a new single point of failure if you do not run multiple proxies behind a load balancer.

Factor	Client-Side Sharding	Proxy-Based Routing
Latency	Lowest (direct)	+1 hop (proxy)
Operations complexity	Higher (each client)	Lower (centralized)
Scalability	Limited by client	Proxy must scale
Failure domain	Client failures affect cache access	Proxy is SPOF unless HA
Topology management	Client must track	Proxy handles all

TTL Selection Trade-offs

Setting cache expiration is one of the most consequential decisions in cache design. Too short and you lose cache benefit; too long and you serve stale data. This table maps TTL ranges to their effects on hit rate, freshness, and backend load so you can pick a starting point for your workload type.

TTL Setting	Cache Efficiency	Data Freshness	Server Load
Very short (<60s)	Low hit rate	Near real-time	High DB load
Short (1-5 min)	Moderate hit rate	Fresh enough for most	Moderate DB load
Long (1-24 hours)	High hit rate	Stale data risk	Low DB load
Adaptive TTL	Variable	Better freshness	Complexity in implementation

When to Use / When Not to Use

Architecture	When to Use	When Not to Use
Client-Side Sharding	Small to medium clusters (<10 nodes); low latency critical; team comfortable with client logic	Large clusters; multi-language environments; teams want simplicity
Proxy-Based Routing	Multi-client support; topology hidden; managed service environments	Extra hop latency unacceptable; proxy is single point of failure
Redis Cluster	Large datasets (>10GB); need automatic failover; want native sharding	Small datasets; need strict ordering; transaction-heavy workloads
Write-Through All Nodes	Small clusters; strong consistency required; write volume manageable	Large clusters; high write volume; eventual consistency acceptable
Pub/Sub Invalidation	Multi-region; eventual consistency acceptable; need real-time invalidation	Strict consistency required; high invalidation frequency
Consistent Hashing	Frequent node additions/removals; want minimal key remapping	Fixed cluster size; need strict key distribution
Sticky Sessions	Simple deployments; session loss acceptable on failure; cost-sensitive	High availability required; sessions must survive instance failures
Distributed Sessions	HA critical; multi-instance deployments; sessions must persist	Simple single-instance apps; session loss tolerable

Decision Guide

Use this flowchart when you’re evaluating distributed caching for a new project or moving away from a single-node setup. Start at the top and follow each decision point down.

First question — Memory > Single Node? If your working set fits in one node’s RAM (50GB or less for Redis), you may not need distribution yet. Single-node caching with replication for HA is simpler to operate and avoids coordination across nodes. Move to distributed caching when you need more memory than one node provides, or when you need to spread request load across multiple nodes for read throughput.

Second question — Need HA? If availability matters for your application, you need replication. The next decision is whether you need automatic failover or can handle replica promotion manually during incidents.

Third question — Client Simplicity? Client-side sharding puts routing logic into your application code. That means every client library in every language your stack uses needs to implement consistent hashing. If your team wants to avoid that burden, proxy-based routing centralizes the logic on a sidecar like Twemproxy or Envoy. Managed services like ElastiCache and Redis Enterprise use proxy-based routing under the hood.

Fourth question — Need Automatic Failover? Sentinel monitors a single master-replica set and promotes replicas automatically when the primary fails. Redis Cluster handles failover natively as part of its sharding model — each shard is an independent master-replica pair. If you need HA across multiple shards and want automatic failover without external tooling, Redis Cluster is the answer. If you’re running a simpler setup with one master and a few replicas, Sentinel is lighter weight.

graph TD
    A[Scale Need] --> B{Memory > Single Node?}
    B -->|Yes| C{Need HA?}
    B -->|No| D[Single Node Cache]
    C -->|Yes| E[Redis Cluster or Replicated Setup]
    C -->|No| F{Client Simplicity?}
    F -->|Yes| G[Proxy-Based Routing]
    F -->|No| H[Client-Side Sharding]
    E --> I{Need Automatic Failover?}
    I -->|Yes| J[Sentinel or Redis Cluster]
    I -->|No| K[Manual Replica Promotion]

Capacity Estimation

Planning a distributed cache cluster requires understanding how data and load distribute across nodes.

Cluster memory sizing formula:

total_cache_memory = sum(memory_per_node × number_of_nodes)
effective_capacity = total_cache_memory × (1 - replication_overhead) × (1 - fragmentation_factor)
usable_capacity = effective_capacity × 0.7  # Keep 30% headroom for spikes

For a Redis Cluster with 3 masters × 16GB each and 1 replica per master (total 6 nodes):

Total raw memory: 48GB
Replication overhead: 33% (1 replica per master): 48GB × 0.67 = 32GB effective
Fragmentation factor: typically 1.1-1.5× depending on allocator: 32GB / 1.2 = 26.7GB
Usable at 70%: ~18.7GB for cached data

Slot distribution in Redis Cluster: 16,384 slots divided across N masters. With 3 masters: each owns ~5,461 slots. When adding nodes, Redis Cluster migrates slots in units — one slot at a time. The cluster slots command shows current ownership. Plan slot assignments in advance using redis-cli --cluster import or manual cluster setslot commands during maintenance windows.

Hot key detection: Even with consistent hashing, hot keys cause uneven load. The formula for identifying hot keys:

hot_key_probability = (requests_per_key / total_requests) × number_of_nodes

If 1% of keys receive 50% of traffic on a 10-node cluster, those keys are 5× hotter than average. Mitigation: replicate hot keys to all nodes and route randomly, use key spreading (split user:123 into user:123:shard0 … user:123:shard9), or move hot key handling to a dedicated single-node Redis with replica read scaling.

Memcached cluster write capacity: With N nodes in a consistent hash ring, each write goes to one node. Write throughput = single_node_throughput × N (assuming even distribution). If one Memcached node handles 200k ops/sec, a 10-node cluster handles ~2M ops/sec total, but any single key’s write rate is still bound by one node.

Production Failure Scenarios

Failure	Impact	Mitigation
Single cache node crashes	Keys on that node missed; potential database overload	Replica promotion; circuit breaker; replica reads during failover
Network partition	Cluster split-brain or unavailable	Redis Cluster handles automatically; Sentinel monitors
Proxy failure (Twemproxy)	All cache requests fail	Run multiple proxies with load balancer; keep-alive health checks
Shard rebalancing	Temporary key remapping; some misses during migration	Use stable cluster configuration; avoid frequent resharding
Invalidation broadcast lost	Stale data on some nodes	Implement periodic reconciliation; use version vectors
Replica lag	Stale reads from replicas	Monitor lag; route consistency-critical reads to primary
Sentinel election failure	No automatic failover possible	Run 3+ Sentinels; ensure proper quorum configuration

Common Pitfalls / Anti-Patterns

1. Assuming Hashing Is Evenly Distributed

Poor hash functions or small keyspaces cause hot spots.

# BAD: Modulo-based sharding creates hot spots
def get_node(key, num_nodes):
    hash_val = hash(key)
    return hash_val % num_nodes  # Uneven for small keyspaces

# GOOD: Use consistent hashing with virtual nodes
class ConsistentHashSharding:
    def __init__(self, nodes, replicas=150):
        self.ring = {}
        for node in nodes:
            for i in range(replicas):
                key = f"{node}:{i}"
                self.ring[md5(key)] = node

2. Not Handling Node Failure Gracefully

Assuming all nodes always available leads to cascading failures.

# BAD: No failure handling
def get(key):
    return nodes[key % len(nodes)].get(key)  # Crashes if node down

# GOOD: Retry on different node, circuit breaker
def get_with_fallback(key, max_retries=3):
    for attempt in range(max_retries):
        try:
            node = get_node_for_key(key)
            return node.get(key)
        except ConnectionError:
            logger.warning("node_connection_failed", node=node, attempt=attempt)
            # Circuit breaker would prevent repeated attempts
    return get_from_database_fallback(key)

3. Pub/Sub Invalidation Without Guarantees

Pub/Sub is fire-and-forget; messages can be lost.

# BAD: Assuming pub/sub invalidation always works
def on_data_update(key, value):
    db.update(key, value)
    redis_cluster.publish('invalidation', key)  # Fire and forget

# GOOD: Verify invalidation, periodic reconciliation
def on_data_update(key, value):
    db.update(key, value)

    # Publish invalidation
    redis_cluster.publish('invalidation', key)

    # Also delete locally to handle the common case
    redis_cluster.delete(key)

    # For strict consistency: check all nodes periodically
    schedule_reconciliation(key)

4. Ignoring Slot Migration Complexity

Adding nodes to Redis Cluster requires migrating slots.

graph LR
    A[Before Migration] --> B[During Migration<br/>Slot 123 split]
    B --> C[Node1 owns 0-5460]
    B --> D[Node2 owns 5461-10922<br/>+ Slot 123 migrating]
    C --> E[After Migration<br/>Node1 owns 0-5460]
    D --> E

# During slot migration:
# 1. Cluster must be in stable state (no other migrations)
# 2. Target node must accept IMPORTING slot
# 3. Source node must accept MIGRATING slot
# 4. Keys must be moved one by one
# 5. Slot ownership transferred after all keys moved

# Don't add nodes during high traffic - migration causes temporary unavailability

5. Single Sentinel for HA

Sentinel needs quorum to elect a new master.

# BAD: Single Sentinel
# If Sentinel crashes, no failover possible

# GOOD: 3+ Sentinels with proper quorum
sentinel monitor mymaster 192.168.1.1 7000 2  # Quorum of 2
sentinel monitor mymaster 192.168.1.2 7000 2  # On different machines
sentinel monitor mymaster 192.168.1.3 7000 2  # Quorum = 2 of 3

Cache Stampede Prevention

Here’s a scenario I’ve seen play out in production: a popular cache key expires, and suddenly 50 requests hit the database at once. That’s the thundering herd — cache expiration creates traffic spikes that can flatten your database before you know what happened.

Probabilistic Early Expiration

Instead of waiting for TTL to expire, refresh the cache probabilistically before it expires:

import random
import time

class ProbabilisticEarlyExpiration:
    def __init__(self, cache_client, beta=0.3):
        self.cache = cache_client
        self.beta = beta  # Higher = more aggressive early refresh

    def get(self, key, fetch_func):
        cached = self.cache.get(key)
        if cached is not None:
            # Check if we should early-refresh
            metadata = self.cache.get(f"{key}:meta")
            if metadata:
                age, original_ttl = float(metadata['age']), float(metadata['ttl'])
                # Probabilistic early expiration formula from XFetch
                probability = self.beta * (age / original_ttl) ** 2
                if random.random() < probability:
                    # Async refresh in background (or sync for simplicity)
                    try:
                        new_value = fetch_func(key)
                        self.cache.set(key, new_value, ttl=original_ttl)
                    except Exception:
                        pass  # Don't fail if refresh fails
            return cached

        # Cache miss - fetch and store
        value = fetch_func(key)
        ttl = self._estimate_ttl(value)
        self.cache.set(key, value, ttl=ttl)
        self.cache.set(f"{key}:meta", {'age': 0, 'ttl': ttl})
        return value

Lock-Based Cache Fill

Use distributed locks so only one request fetches and fills the cache:

import redis
import threading

class LockBasedCacheFill:
    def __init__(self, redis_client, lock_ttl=5):
        self.redis = redis_client
        self.lock_ttl = lock_ttl

    def get_or_fill(self, key, fetch_func, ttl=300):
        cached = self.redis.get(key)
        if cached is not None:
            return cached

        lock_key = f"lock:{key}"
        # Try to acquire lock
        acquired = self.redis.set(lock_key, "1", nx=True, ex=self.lock_ttl)

        if acquired:
            try:
                value = fetch_func(key)
                self.redis.setex(key, ttl, value)
                return value
            finally:
                self.redis.delete(lock_key)
        else:
            # Another request is filling - wait and retry
            time.sleep(0.1)
            cached = self.redis.get(key)
            if cached:
                return cached
            # Fallback after timeout - fetch directly
            return fetch_func(key)

Background Refresh with Lease

Give requests a lease on the cache entry so they know when to refresh:

def get_with_lease(self, key, fetch_func, ttl=300):
    cached, lease_id = self.redis.get_with_lease(key)

    if cached:
        remaining_ttl = self.redis.ttl(key)
        # If less than 20% of original TTL remains, trigger background refresh
        if remaining_ttl < ttl * 0.2:
            # Fire-and-forget background refresh
            thread = threading.Thread(target=self._background_fill, args=(key, fetch_func))
            thread.start()
        return cached

    return self._fill_cache(key, fetch_func, ttl)

TTL Selection Deep Dive

TTL is one of those settings that seems trivial but will bite you in production. Set it too short and you lose most of your cache benefit. Too long and you’re serving stale data without realizing it.

TTL Selection Framework

Data Type	Typical TTL Range	Rationale
User session	15 min - 24 hours	Session must outlast user activity, but stale sessions hurt
Product catalog	1 hour - 24 hours	Updates are infrequent, stale data has business cost
Social media feed	30 sec - 5 min	Freshness critical, staleness very visible
Analytics aggregates	5 min - 1 hour	Source data changes slowly, aggregate is cheap to refresh
Configuration flags	30 sec - 5 min	Feature flags need fast propagation
Search index	1 hour - 24 hours	Index rebuild is expensive, updates are batched
Thumbnail / media URL	24 hours - 7 days	URLs change rarely but CDN caching matters

TTL Refresh Strategy

def adaptive_ttl(key, value, access_pattern):
    """
    Dynamically set TTL based on access patterns and data type.
    """
    base_ttl = {
        'session': 86400,       # 24 hours
        'catalog': 3600,        # 1 hour
        'feed': 60,            # 1 minute
        'config': 300,         # 5 minutes
        'static': 86400 * 7,   # 7 days
    }

    ttl = base_ttl.get(access_pattern, 3600)

    # Reduce TTL for frequently changing data
    if value.get('updated_at'):
        age = time.time() - value['updated_at']
        if age > ttl * 0.8:
            ttl = int(ttl * 0.5)  # Halve TTL for aging data

    # Increase TTL for read-heavy data that rarely changes
    read_count = value.get('read_count', 0)
    if read_count > 10000:
        ttl = int(ttl * 1.5)

    return ttl

Handling TTL vs Freshness Trade-offs

The max-age header and Redis TTL serve different purposes:

# Redis TTL controls when cache expires internally
# HTTP max-age controls what downstream clients see

# For a product catalog cached at CDN edge:
response.set_header('Cache-Control', 'public, max-age=300')  # CDN caches 5 min
redis.setex('product:123', 3600, data)  # Redis expires in 1 hour

# Invalidation: when product updates, publish to pub/sub
redis.publish('invalidation', 'product:123')
# Application deletes the key: redis.delete('product:123')
# Next request fetches fresh data, repopulates cache with new TTL

Cache Warming Strategies

Cold cache is one of those things that feels fine in testing and absolutely destroys you at 3am on a Monday. When a cache restarts or scales, you need a strategy to repopulate it without hammering your database.

Proactive Warming

On cache startup or scale-out, pre-populate with estimated hot keys:

class CacheWarmer:
    def __init__(self, cache_client, db_client):
        self.cache = cache_client
        self.db = db_client

    def warm_from_access_log(self, access_log_path, top_n=1000):
        """
        Warm cache from recent access patterns.
        access_log_path: path to access log with key names
        """
        from collections import Counter

        key_counts = Counter()
        with open(access_log_path) as f:
            for line in f:
                key = self._extract_key_from_log(line)
                if key:
                    key_counts[key] += 1

        # Get top N most accessed keys
        hot_keys = [k for k, _ in key_counts.most_common(top_n)]

        # Batch fetch from database and populate cache
        self._batch_fill(hot_keys, batch_size=100)

    def _batch_fill(self, keys, batch_size=100):
        for i in range(0, len(keys), batch_size):
            batch = keys[i:i + batch_size]
            # Fetch all from DB in one query
            rows = self.db.fetch_many("SELECT * FROM items WHERE id IN (%s)", batch)
            for row in rows:
                self.cache.setex(f"item:{row['id']}", 3600, json.dumps(row))

    def warm_from_key_pattern(self, patterns):
        """
        Warm from known key patterns (e.g., all products in category).
        """
        for pattern in patterns:
            # e.g., "product:*" - scan matching keys
            cursor = 0
            while True:
                cursor, keys = self.cache.scan(cursor, match=pattern, count=100)
                for key in keys:
                    # Re-populate with fresh data
                    value = self.db.fetch_one(key.replace('cache:', ''))
                    if value:
                        self.cache.setex(key, 3600, json.dumps(value))
                if cursor == 0:
                    break

Warm Cache on Node Addition

When adding a new cache node (cluster scale-out), pre-warm it:

graph TD
    A[New Node Added] --> B{Key ownership?}
    B --> C[Scan hot keys from other nodes]
    C --> D[Batch fetch from database]
    D --> E[Populate new node]
    E --> F[Monitor hit rate on new node]
    F --> G{Hit rate > threshold?}
    G -->|Yes| H[Node ready]
    G -->|No| I[Increase warm priority]
    I --> C

def warm_new_node(new_node, cluster, hot_keys):
    """
    Pre-populate a new cluster node with hot keys.
    """
    for key_batch in chunked(hot_keys, 50):
        # Get current value from existing nodes
        values = cluster.mget(key_batch)
        # Set on new node with same TTL
        for key, value in zip(key_batch, values):
            if value:
                ttl = cluster.ttl(key)
                new_node.setex(key, ttl, value)

Lazy vs Eager Warming

Strategy	Pros	Cons	Best For
Lazy	No upfront cost, only heat what gets used	Initial requests hit DB	Large caches, infrequent cold starts
Eager	Instant hot cache, no DB spikes	Wastes resources on unused keys	Predictable traffic, scheduled maintenance
Hybrid	Prioritizes hot keys, fills on-demand	Complexity	Production systems with known hot set

Multi-Tier Caching

Most production systems don’t get far with a single cache layer. At scale, you end up stacking L1 (in-process memory), L2 (Redis or Memcached), and L3 (CDN) — each with different latency, capacity, and cost characteristics.

Common Cache Hierarchy

graph TB
    A[Request] --> B[L1: In-Process<br/>Process memory<br/>~1MB, sub-ms]
    B -->|miss| C[L2: Distributed Cache<br/>Redis/Memcached<br/>~10GB, <1ms]
    C -->|miss| D[L3: CDN Edge<br/>Geographic PoPs<br/>~100GB, 5-20ms]
    D -->|miss| E[Database<br/>ms latency]

L1 + L2 Implementation

import threading
import time

class TwoTierCache:
    def __init__(self, l1_size=1000, l2_client=None):
        # L1: Simple in-memory dict with LRU
        self.l1 = {}  # key -> (value, expiry)
        self.l1_access = {}  # key -> last access time for LRU
        self.l1_lock = threading.Lock()
        self.l1_size = l1_size

        # L2: Redis
        self.l2 = l2_client

    def get(self, key):
        # Try L1 first
        with self.l1_lock:
            if key in self.l1:
                value, expiry = self.l1[key]
                if time.time() < expiry:
                    self.l1_access[key] = time.time()
                    return value
                del self.l1[key]
                del self.l1_access[key]

        # Try L2
        if self.l2:
            value = self.l2.get(key)
            if value:
                # Populate L1
                self._l1_set(key, value, ttl=300)
                return value

        return None

    def set(self, key, value, ttl=300):
        # Write to both tiers
        self._l1_set(key, value, ttl)
        if self.l2:
            self.l2.setex(key, ttl, value)

    def _l1_set(self, key, value, ttl):
        with self.l1_lock:
            # Evict LRU if at capacity
            if len(self.l1) >= self.l1_size and key not in self.l1:
                lru_key = min(self.l1_access, key=self.l1_access.get)
                del self.l1[lru_key]
                del self.l1_access[lru_key]

            self.l1[key] = (value, time.time() + ttl)
            self.l1_access[key] = time.time()

    def invalidate(self, key):
        with self.l1_lock:
            self.l1.pop(key, None)
            self.l1_access.pop(key, None)
        if self.l2:
            self.l2.delete(key)

CDN as L3: Cache Aside with Tier-Aware Client

class TierAwareCacheClient:
    def __init__(self, cdn_client, redis_client, db_client):
        self.cdn = cdn_client  # L3: CDN (e.g., Cloudflare API)
        self.redis = redis_client  # L2: Redis
        self.db = db_client  # Origin

    def get(self, key):
        # Try CDN (L3) - lowest latency for distant users
        value = self.cdn.get(key)
        if value:
            return value

        # Try Redis (L2)
        value = self.redis.get(key)
        if value:
            # Populate CDN for next request
            self.cdn.set(key, value, ttl=300)
            return value

        # Cache miss - fetch from DB (L1 effectively)
        value = self.db.fetch(key)
        if value:
            # Write to both L2 and L3
            self.redis.setex(key, 3600, value)
            self.cdn.set(key, value, ttl=300)
        return value

Quick Recap Checklist

Copy/Paste Checklist

Common commands and configurations you can copy directly into your deployment scripts.

Redis Cluster minimum setup (6 nodes):

redis-cli --cluster create \
  192.168.1.1:7000 \
  192.168.1.2:7000 \
  192.168.1.3:7000 \
  192.168.1.4:7000 \
  192.168.1.5:7000 \
  192.168.1.6:7000 \
  --cluster-replicas 1

Sentinel configuration (3 nodes):

sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000

Pub/Sub invalidation subscription:

redis-cli SUBSCRIBE cache:invalidate
# Message format: {"key": "user:123", "action": "delete"}

Cluster health check:

redis-cli cluster info
redis-cli cluster nodes | grep master

Python Redis Cluster client:

r = redis.RedisCluster(
    host='192.168.1.1',
    port=7000,
    skip_full_coverage_check=True,
    read_from_replicas=True  # Read from replicas for scale
)

Deployment checklist:

Minimum 3 masters + replicas for HA
Sentinel quorum = (sentinels / 2) + 1
Monitor replication lag on all replicas
Test failover manually before going to production
Implement application-level retry logic for cluster operations
Use slot-aware client for efficient routing
Set up alerts for cluster node failures

Observability Checklist

You need visibility into your cache cluster in production. You cannot debug what you cannot observe. This section covers the minimum signals to track at both cluster and node levels.

Cluster-Level:

cluster_nodes - Number of nodes, their states, and roles
cluster_slots_fail - Number of slots not covered by reachable nodes
cluster_known_nodes - Total known nodes in cluster

Node-Level:

connected_slaves - Number of replicas connected to each master
master_repl_offset - Replication position
slave_repl_offset - Replica lag calculation

# Cluster health check
def check_cluster_health(cluster):
    health = {
        'healthy': True,
        'nodes': {},
        'issues': []
    }

    for node in cluster.nodes:
        info = node.info()
        health['nodes'][node.name] = {
            'role': info.get('role'),
            'connected': info.get('connected') == 'true',
            'memory': info.get('used_memory'),
        }

        if info.get('role') == 'master' and int(info.get('connected_slaves', 0)) == 0:
            health['issues'].append(f"Master {node.name} has no replicas")

    health['healthy'] = len(health['issues']) == 0
    return health

# Monitor replication lag
def check_replication_lag(primary, replica):
    primary_offset = primary.info()['master_repl_offset']
    replica_info = replica.info()
    replica_offset = replica_info.get('master_repl_offset', 0)
    lag_bytes = primary_offset - replica_offset

    # Rough estimate: 1MB = ~1 second at typical network speeds
    lag_ms = (lag_bytes / 1024 / 1024) * 1000

    if lag_ms > 100:
        logger.warning("replication_lag_high",
            lag_ms=lag_ms,
            lag_bytes=lag_bytes)

Metrics to Track

Cluster-Level:

cluster_nodes - Number of nodes, their states, and roles
cluster_slots_fail - Number of slots not covered by reachable nodes
cluster_known_nodes - Total known nodes in cluster

Node-Level:

connected_slaves - Number of replicas connected to each master
master_repl_offset - Replication position
slave_repl_offset - Replica lag calculation

# Cluster health check
def check_cluster_health(cluster):
    health = {
        'healthy': True,
        'nodes': {},
        'issues': []
    }

    for node in cluster.nodes:
        info = node.info()
        health['nodes'][node.name] = {
            'role': info.get('role'),
            'connected': info.get('connected') == 'true',
            'memory': info.get('used_memory'),
        }

        if info.get('role') == 'master' and int(info.get('connected_slaves', 0)) == 0:
            health['issues'].append(f"Master {node.name} has no replicas")

    health['healthy'] = len(health['issues']) == 0
    return health

# Monitor replication lag
def check_replication_lag(primary, replica):
    primary_offset = primary.info()['master_repl_offset']
    replica_info = replica.info()
    replica_offset = replica_info.get('master_repl_offset', 0)
    lag_bytes = primary_offset - replica_offset

    # Rough estimate: 1MB = ~1 second at typical network speeds
    lag_ms = (lag_bytes / 1024 / 1024) * 1000

    if lag_ms > 100:
        logger.warning("replication_lag_high",
            lag_ms=lag_ms,
            lag_bytes=lag_bytes)

Logs to Capture

logger = structlog.get_logger()

# Log cluster topology changes
def log_cluster_event(event_type, node_id, old_state, new_state):
    logger.warning("cluster_topology_change",
        event_type=event_type,
        node_id=node_id,
        old_state=old_state,
        new_state=new_state,
        timestamp=time.time())

# Log failover events
def log_failover(primary_id, new_primary_id):
    logger.critical("cache_failover_completed",
        old_primary=primary_id,
        new_primary=new_primary_id,
        timestamp=time.time())

Alert Rules

- alert: CacheClusterNodeDown
  expr: cache_cluster_connected_nodes < expected_nodes
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Cache cluster node(s) unavailable"

- alert: CacheReplicationLag
  expr: cache_replication_lag_bytes > 10485760 # 10MB
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Cache replication lag exceeds 10MB"

- alert: CacheClusterSlotFail
  expr: cache_cluster_slots_fail > 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Cache cluster has unreachable slots"

Security Checklist

Node authentication - Use Redis ACLs or cluster authentication
Inter-node encryption - Encrypt traffic between cluster nodes
Network isolation - Cluster nodes should be on private network
Key prefix namespacing - Prevent collisions in shared cluster
Monitor cluster events - Watch for unauthorized topology changes
Secure Sentinel - Sentinel instances should require authentication
Backup cluster state - Regularly backup cluster configuration
Validate node identity - Verify node certificates if using TLS

# Redis Cluster secure configuration
# On each node:
requirepass your-cluster-auth
cluster-preferred-endpoint-type ipv4
tls-cluster yes

# In redis.conf:
# Bind to internal network only
bind 10.0.1.1 -::1

# ACL for application user
user appuser on >apppassword ~app:* on >appdb ~* +get +set +del +exists -flushall -flushdb

# Redis Cluster minimum setup (6 nodes)
redis-cli --cluster create \
  192.168.1.1:7000 \
  192.168.1.2:7000 \
  192.168.1.3:7000 \
  192.168.1.4:7000 \
  192.168.1.5:7000 \
  192.168.1.6:7000 \
  --cluster-replicas 1

# Sentinel configuration (3 nodes)
sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000

# Pub/Sub invalidation subscription
redis-cli SUBSCRIBE cache:invalidate
# Message format: {"key": "user:123", "action": "delete"}

# Cluster health check
redis-cli cluster info
redis-cli cluster nodes | grep master

# Python Redis Cluster
r = redis.RedisCluster(
    host='192.168.1.1',
    port=7000,
    skip_full_coverage_check=True,
    read_from_replicas=True  # Read from replicas for scale
)

# Deployment checklist:
# [ ] Minimum 3 masters + replicas for HA
# [ ] Sentinel quorum = (sentinels / 2) + 1
# [ ] Monitor replication lag on all replicas
# [ ] Test failover manually before going to production
# [ ] Implement application-level retry logic for cluster operations
# [ ] Use slot-aware client for efficient routing
# [ ] Set up alerts for cluster node failures

Real-World Case Studies

Redis Cluster Failure at Scale

A major e-commerce platform ran a 24-node Redis Cluster (12 masters, 12 replicas) for session storage and cart data. During a peak traffic event, a network switch failed causing a brief partition between 4 nodes and the rest of the cluster.

The cluster’s quorum-based failover correctly detected that the 4 isolated nodes could not form a majority and did not promote themselves — correct behavior. But the remaining 20 nodes experienced a cascade: the primary for one of the hot key slots was among the isolated nodes. The replica for that slot was in the majority partition. Failover promoted the replica to primary.

The problem: the application was using a slot-aware client that cached slot-to-node mapping. When failover changed slot ownership, the client’s cached mapping became stale. Some application instances still routed requests for the affected slot to the demoted node (now a replica). The replica rejected writes, and those application instances had no mechanism to refresh the slot map — they kept retrying the same failed node.

The recovery took 8 minutes. The fix: application code was updated to handle MOVED errors by refreshing the slot map and retrying, and the slot refresh timeout was reduced from 60 seconds to 5 seconds. Monitoring was added for MOVED error rates per application instance.

The lesson: cluster failover works, but your application’s cluster client must handle MOVED redirects correctly. A slot-aware client that does not honor MOVED creates a partial failure that bypasses cluster resilience. Test failover explicitly — trigger manual failover and observe whether your application recovers cleanly.

Monitoring Distributed Cache

You need visibility into your distributed cache. Key metrics:

# Check cluster health
def check_cluster_health(cluster):
    health = {
        'nodes': [],
        'total_keys': 0,
        'total_memory': 0,
        'hit_rate': 0
    }

    for node in cluster.get_nodes():
        info = node.info()
        health['nodes'].append({
            'host': node.host,
            'role': node.role,
            'connected': node.connected
        })
        health['total_keys'] += info.get('keys', 0)
        health['total_memory'] += info.get('used_memory', 0)

    return health

# Alert thresholds
ALERT_THRESHOLDS = {
    'memory_usage_percent': 80,  # Alert if >80% memory used
    'hit_rate_percent': 80,     # Alert if hit rate <80%
    'connected_clients': 10000, # Alert if >10k clients
    'replication_lag_ms': 100  # Alert if replica lag >100ms
}

Interview Questions

1. How does consistent hashing reduce the impact of adding or removing nodes in a distributed cache?

Expected answer points:

Consistent hashing maps keys to nodes using a hash ring with virtual nodes (150+ per physical node)
When a node is added or removed, only K/N keys are remapped (where K is virtual nodes per node, N is total nodes) — far less than traditional modulo hashing which remaps all keys
Virtual nodes ensure even key distribution even when physical nodes have different capacities
Trade-off: still requires some data movement during rebalancing, but impact is bounded and predictable

2. Describe the differences between cache-aside, read-through, and write-through caching patterns. When would you use each?

Expected answer points:

Cache-aside: application checks cache first, falls back to database on miss, then populates cache — most common pattern, application controls everything
Read-through: cache automatically fetches from database on miss — simpler for application but less control over serialization
Write-through: application writes to cache, cache synchronously writes to database — simpler reads but slower writes
Use cache-aside when: you need control over serialization, multiple data sources, or asymmetric read/write loads
Use write-through when: reads far outnumber writes and you want simplified read path

3. How would you handle the thundering herd problem when a popular cache key expires simultaneously across many requests?

Expected answer points:

Probabilistic early expiration (XFetch): before TTL expires, randomly refresh based on key age and a beta parameter — reduces simultaneous expiration
Lock-based cache fill: first request acquires a distributed lock and fills; others wait and retry — guarantees single fill but adds lock latency
Background refresh with lease: serve stale data while refreshing asynchronously — best for latency-sensitive workloads
Other approaches: random jitter on TTL, staggered TTLs for grouped keys, proactive warming of critical keys

4. What are the trade-offs between Redis Sentinel and Redis Cluster for high availability?

Expected answer points:

Redis Sentinel: monitoring, notification, and automatic failover for a single master — requires external proxy or client-side logic for routing
Redis Cluster: native sharding, replication, and failover within the Redis server — no external components needed but shards are independently available
Sentinel is simpler operationally but requires careful client configuration for failover
Cluster provides better write scalability (multiple masters) but requires slot-aware clients and has limitations on transactions across slots
Cluster cannot guarantee ordering across slots; Sentinel can if you use a single master-replica pair

5. How do you ensure cache coherence when data is updated in one cache node but must be invalidated across all nodes?

Expected answer points:

Pub/Sub broadcast: when data updates, publish invalidation event to all nodes — simple but unreliable (messages can be lost)
Write-through all nodes: synchronously update all cache replicas on write — strong consistency but slow and fragile if nodes fail mid-write
Version vectors or vector clocks: track which nodes have which version of data — enables conflict resolution but adds complexity
Periodic reconciliation: periodically scan and compare cache state across nodes — catches missed invalidations but is not real-time
Hybrids: publish invalidation + also delete locally (handles common case) + periodic reconciliation (catches missed cases)

6. How would you design a multi-tier caching system with L1 (in-process), L2 (Redis), and L3 (CDN)?

Expected answer points:

L1 (in-process): process-local memory, smallest capacity (~1MB), fastest latency — store most frequently accessed hot keys
L2 (Redis): distributed cache, larger capacity (~10GB), sub-millisecond latency — primary cache for most data
L3 (CDN): geographically distributed, largest capacity, higher latency (5-20ms) — cache static content, media, responses for distant users
On read: check L1 → L2 → L3 → database; on write: write to all tiers with appropriate TTLs
Invalidation must cascade: delete L1 locally + delete L2 in Redis + purge CDN
Trade-offs: complexity of tier management, consistency across tiers, cost of CDN for dynamic content

7. How do you estimate the memory capacity needed for a distributed Redis cluster?

Expected answer points:

Formula: usable_capacity = (sum of node memory) × (1 - replication_overhead) × (1 - fragmentation_factor) × 0.7
Example: 3 masters × 16GB = 48GB raw; with 1 replica per master, 33% overhead → 32GB effective; fragmentation 1.2× → 26.7GB; 30% headroom → ~18.7GB usable
Account for replication overhead: each replica mirrors its master (2× raw memory for master+replica pair)
Fragmentation factor accounts for memory allocator overhead (typically 1.1-1.5×) — depends on key size distribution and allocator settings
Hot key analysis: identify keys receiving disproportionate traffic; replicate hot keys across nodes or split them

8. What metrics would you monitor to detect cache cluster problems before they become failures?

Expected answer points:

Cluster-level: cluster_nodes (node count), cluster_slots_fail (unreachable slots), cluster_known_nodes (total known nodes)
Node-level: connected_slaves (replica count per master), master_repl_offset vs slave_repl_offset (replication lag in bytes)
Memory usage: used_memory_human vs maxmemory — alert if >80% utilized
Hit rate: keyspace_hits / (keyspace_hits + keyspace_misses) — alert if <80%
Replica lag: alert if lag_ms > 100ms — stale reads affect correctness
Client connections: connected_clients — alert if >10k or suddenly spikes
Slot migration in progress: if cluster is resharding, temporary unavailability and misses expected

9. How would you handle slot migration when adding a new node to a Redis Cluster?

Expected answer points:

Redis Cluster has 16,384 slots; when adding nodes, slots are migrated one-by-one from existing nodes to new nodes
Migration process: target node accepts IMPORTING, source node accepts MIGRATING, keys are moved one by one, then slot ownership transfers
During migration, some keys may be temporarily unavailable — requests for migrating slot keys may get MOVED error
Slot-aware clients handle MOVED by refreshing slot map and retrying on correct node
Best practices: schedule during low-traffic windows, avoid concurrent migrations, test migration process in staging
After migration, application clients must handle MOVED errors correctly — if client caches slot map without refreshing, partial failure occurs

10. What's the difference between sticky sessions and distributed sessions? When would you choose one over the other?

Expected answer points:

Sticky sessions: load balancer routes each user to the same application instance — sessions live in that instance's memory
Distributed sessions: sessions stored in shared cache (Redis), accessible from any application instance
Sticky sessions: simpler, faster (no network hop), but instance failure loses all sessions for that user
Distributed sessions: more resilient, enables horizontal scaling (any instance can serve any user), slightly slower due to cache access
Choose sticky sessions when: deploying to a single instance or small number of stable instances, session loss on failure is acceptable, cost-sensitive
Choose distributed sessions when: multi-instance deployment, high availability required, sessions must survive instance failures, horizontal scaling needed

11. How does Redis Cluster handle slot migration, and what are the operational considerations during a cluster resharding process?

Expected answer points:

Redis Cluster has 16,384 slots distributed across master nodes; migration moves slots one at a time from source to target node
Migration process: target node enters IMPORTING state, source enters MIGRATING state, keys are scanned and moved one by one via MIGRATE command
During migration, keys for the migrating slot may be on either node — cluster ensures atomic operations during transition
Clients receive MOVED errors when requesting keys on wrong node; slot-aware clients refresh slot map and retry on correct node
Operational best practices: schedule during low-traffic windows, ensure cluster is stable (no other migrations), monitor slot fail count
Avoid adding nodes during peak traffic — resharding causes temporary unavailability for keys being migrated

12. What is the difference between Redis Sentinel and Redis Cluster, and how do you choose between them for a given use case?

Expected answer points:

Redis Sentinel: provides monitoring, notification, and automatic failover for a single master + replicas — does NOT shard data
Redis Cluster: provides sharding (data distributed across multiple masters), replication per shard, and automatic failover — handles both scaling and HA
Choose Sentinel when: dataset fits on single node (~50GB), you need HA for reads/writes to one master, can tolerate external proxy or client-side redirect logic
Choose Cluster when: dataset exceeds single node capacity, need write scaling (multiple masters), want native sharding with slot-aware clients
Sentinel limitations: no data sharding, all writes go to single master, cluster-wide operations require external tooling
Cluster limitations: cannot guarantee ordering across slots, multi-key operations limited to same slot, transactions across slots not supported

13. How would you implement a cache warming strategy after a complete cluster failure or restart? What are the trade-offs between eager and lazy warming?

Expected answer points:

Eager warming: pre-populate cache from database before serving traffic — no cold-start latency for users but delays cluster startup
Lazy warming: serve traffic immediately, populate cache on demand — users experience cache misses initially but cluster starts faster
Proactive warming approach: use access logs to identify hot keys (top N by request count), batch-fetch from DB, populate cache before going live
Node addition warming: scan existing nodes for hot keys, fetch values, populate new node with same TTL — monitor hit rate to validate
Trade-offs: eager wastes resources on keys that may never be accessed; lazy causes initial latency spike and DB overload risk
Hybrid approach: warm top 1000-5000 hot keys eagerly (covers 80% of traffic), lazy-warm everything else

14. Explain how probabilistic early expiration (XFetch) works and when you would prefer it over lock-based cache fill for thundering herd prevention.

Expected answer points:

XFetch formula: probability = beta × (age / original_ttl)^2 — as key ages toward expiration, probability of early refresh increases
When probability triggers, one requestor synchronously refreshes cache; others continue using stale value until refresh completes
Advantage: no coordination overhead (no locks), keeps cache warm naturally, only triggers refresh when approaching expiration
Preference for XFetch when: latency tolerance exists (stale OK briefly), lock infrastructure unavailable, want simplicity over strict guarantees
Lock-based fill advantage: guarantees exactly one DB request per expired key, no stale reads during regeneration
Preference for locks when: staleness completely unacceptable, can tolerate lock acquisition latency, need strict per-key guarantees
Hybrid: use probabilistic early refresh as first line of defense, fall back to lock-based fill if cache miss occurs

15. How do you design a cache invalidation strategy that works across multiple cache tiers (L1 in-process, L2 Redis, L3 CDN)?

Expected answer points:

Invalidation must cascade: delete from L1 locally + delete from L2 in Redis + purge from CDN
For L1 (in-process): local deletion is immediate; no cross-node coordination needed
For L2 (Redis): delete key + publish invalidation event via pub/sub; all instances subscribed to invalidation channel delete the key
For L3 (CDN): use CDN purge API (e.g., Cloudflare API, Akamai purge) after Redis invalidation completes
Ordering matters: invalidate L1/L2 first (fast), then CDN purge (slower) — ensures subsequent requests don't repopulate from stale CDN
For strict consistency: implement version numbers or ETags; include version in cache key; changing data bumps version and naturally makes old keys stale
Alternative: use short TTL on CDN responses (5-10 min) so stale content naturally expires while invalidation cascade runs

16. What are the operational considerations for running Redis Cluster in a multi-region or cross-data-center deployment?

Expected answer points:

Network latency: cross-region RTT can be 100-200ms+ — replication lag increases, client timeouts more frequent
Cluster topology: Redis Cluster does not natively support multi-region — all nodes must be in low-latency network; cross-region requires application-level sharding
Replication: async by default — writes in one region may take time to replicate to another; potential for data loss during partition
Failover complexity: network partition between regions may cause split-brain or prolonged unavailability
Alternative approaches: separate Redis clusters per region with application-level replication; use Redis Enterprise Global (multi-master); use CRDT-based solutions
Monitoring: set up alerts for replication lag per region, track cross-region network latency, monitor cluster_slots_fail for network issues
Cost trade-offs: multi-region provides disaster recovery but increases operational complexity and cost significantly

17. How would you detect and mitigate hot keys in a Redis Cluster where 1% of keys receive 50% of traffic?

Expected answer points:

Detection: use Redis MONITOR or slow log to identify keys with high access counts; analyze with redis-cli --bigkeys or MEMORY USAGE
Hot key identification formula: hot_key_probability = (requests_per_key / total_requests) × number_of_nodes
Mitigation strategy 1 - Key replication: split hot key into multiple keys (user:123:shard0, user:123:shard1...) and randomize replica selection
Mitigation strategy 2 - Replica splitting: replicate hot key to all nodes; route requests to random replica to spread load
Mitigation strategy 3 - Dedicated hot key cluster: move hot keys to separate single-node Redis with replica read scaling
Application-level: use consistent hashing with hot key awareness — manually pin hot keys to nodes with lower utilization
Monitoring: track hit rate per node, alert on node-level imbalance, use Redis INFO stats for keyspace_hits per database

18. Describe the security considerations for a production Redis cluster and how you would implement defense-in-depth.

Expected answer points:

Authentication: use Redis ACLs (not just requirepass) — create users with minimal permissions (e.g., app user can only GET/SET/DEL on app:* keys)
TLS: enable encrypted connections between clients and cluster nodes; also encrypt inter-node replication traffic
Network isolation: bind cluster nodes to private network interfaces only (bind 10.0.1.1); never expose cluster on public IPs
Key namespacing: use prefixes (app:user:, app:product:) to prevent collisions when multiple applications share cluster
Sentinel security: Sentinel instances require authentication; they control master election so compromised Sentinel = cluster compromise
Monitoring: watch for unauthorized topology changes (CLUSTER SETSLOT, CLUSTER meet), monitor failed authentication attempts
Backup: regular cluster state backup (redis-cli --cluster save or rdbcopy); test restoration procedure
Compliance: for PCI/DSS or HIPAA, enable AOF persistence, TLS, ACLs, and audit logging of all commands

19. How does the choice between asynchronous and synchronous replication affect consistency guarantees in a distributed cache?

Expected answer points:

Async replication: write completes on primary after local write, replica updated later — fast writes but potential data loss if primary fails
Sync replication: write completes only after replica acknowledges — no data loss but higher latency, any replica failure blocks writes
Redis Cluster uses async replication by default — replicas may lag up to a few hundred milliseconds during normal operation
Consistency implications: async = eventual consistency, reads from replicas may return stale data; writes may be lost if primary fails before replica update
Mitigation: track replication lag (master_repl_offset vs slave_repl_offset) and alert when lag exceeds threshold (e.g., 100ms)
For stronger guarantees: use WAIT command to wait for replica acknowledgment (synchronous behavior) — blocks until N replicas acknowledge
Trade-off: WAIT improves durability but adds latency; unsuitable for latency-sensitive workloads; still eventual consistency if majority fails

20. How would you approach capacity planning for a distributed cache cluster that needs to handle 100k read ops/sec and 10GB of data?

Expected answer points:

Data capacity: 10GB working set + 30% headroom = ~13GB raw capacity needed per replica; with 1 replica per master, total 20GB across master+replica
Read throughput: Redis handles ~100k ops/sec per node comfortably; 3-node cluster (1 master + 1 replica each) handles 100k reads if read from replicas
Write throughput: writes go to master; single master handles all writes — partition writes across multiple keys to scale write throughput
Cluster configuration: 3 masters × 16GB each = 48GB raw, 33% replication overhead, 1.2x fragmentation = ~26GB effective, 70% usable = ~18GB — exceeds 10GB requirement
Memory estimation formula: usable_capacity = (sum of node memory) × (1 - replication_overhead) × (1 - fragmentation_factor) × 0.7
Hot keys consideration: if hot keys exist, replicate them across nodes or use key spreading; monitor per-node load distribution
Growth buffer: plan for 6-12 months growth; if 10GB is current, plan for 30GB nodes to accommodate growth without immediate resharding

Conclusion

Distributed caching solves scale and availability problems but introduces complexity. Start with a single node, add replication for read scaling, move to clustering when you need more memory or write throughput.

The coherence problem is real. Pick a strategy that matches your consistency requirements. Eventual consistency via pub/sub invalidation works for most use cases. Strong consistency is expensive.

Monitor everything. In distributed systems, you cannot eyeball whether the cache is healthy.

Best Practices Summary

These are the things I see teams trip over most often in production. Skim them now so you’re not debugging cache issues at midnight.

Architecture: Prefer Redis Cluster for clusters with more than 10 nodes or datasets larger than 50GB — it handles sharding and failover natively so you don’t have to. Use consistent hashing with virtual nodes (150+ replicas per node) so that adding or removing nodes only remaps a fraction of your keys. Never run a single Sentinel — deploy at least 3 on separate machines with quorum = (sentinels / 2) + 1. And test that your slot-aware client actually handles MOVED redirects correctly; I’ve seen clusters fail gracefully while applications crumbled because clients cached stale slot maps.

Coherence: Pub/Sub invalidation is eventually consistent — messages can be lost, full stop. Pair it with local deletion on write so the common case works without waiting for pub/sub delivery. Use periodic reconciliation to catch the cases pub/sub misses. Write-through all nodes is tempting if you need strong consistency but it falls apart fast as you scale — one slow node poisons everything.

Performance: Probabilistic early expiration (XFetch) keeps cache warm without locks or coordination overhead. Lock-based cache fill prevents concurrent database hits but adds latency for whoever wins the race. Read from replicas for read-heavy workloads; keep the primary for writes. And set alerts for replication lag — if replicas fall more than 100ms behind, you’re serving stale data without knowing it.

Operations: Keep 30% memory headroom at all times. Cache eviction under pressure causes unpredictable latency spikes that are hard to debug. Slot migration is disruptive — schedule during low-traffic windows, never during peak. Hot keys deserve attention: if 1% of keys receive 50% of your traffic, replicate those keys or split them across nodes.

Security: Use Redis ACLs, not just requirepass. TLS between cluster nodes keeps inter-node traffic private. Bind cluster nodes to private networks only. And namespace your keys with prefixes to avoid collisions when multiple applications share a cluster.

Introduction

Core Concepts

Client-Side Sharding

Proxy-Based Routing

Redis Cluster

Topic-Specific Deep Dives

Cache Coherence

The Problem

Solutions

1. Write-Through All Nodes

2. Invalidation Broadcasting

3. Consistent Hashing with Virtual Nodes

Session Store Patterns

Redis Session Store

Session affinity vs distributed sessions

High Availability Patterns

Replication

Automatic Failover

Trade-off Analysis

Consistency vs Availability

Latency vs Consistency Trade-offs

Memory vs Performance

Client-Side Sharding vs Proxy-Based Routing

TTL Selection Trade-offs

When to Use / When Not to Use

Decision Guide

Capacity Estimation

Production Failure Scenarios

Common Pitfalls / Anti-Patterns

1. Assuming Hashing Is Evenly Distributed

2. Not Handling Node Failure Gracefully

3. Pub/Sub Invalidation Without Guarantees

4. Ignoring Slot Migration Complexity

5. Single Sentinel for HA

Cache Stampede Prevention

Probabilistic Early Expiration

Lock-Based Cache Fill

Background Refresh with Lease

TTL Selection Deep Dive

TTL Selection Framework

TTL Refresh Strategy

Handling TTL vs Freshness Trade-offs

Cache Warming Strategies

Proactive Warming

Warm Cache on Node Addition

Lazy vs Eager Warming

Multi-Tier Caching

Common Cache Hierarchy

L1 + L2 Implementation

CDN as L3: Cache Aside with Tier-Aware Client

Quick Recap Checklist

Copy/Paste Checklist

Observability Checklist

Metrics to Track

Logs to Capture

Alert Rules

Security Checklist

Real-World Case Studies

Redis Cluster Failure at Scale

Monitoring Distributed Cache

Interview Questions

Further Reading

Conclusion

Best Practices Summary

Category

Tags

Related Posts

Cache Patterns: Thundering Herd, Stampede Prevention, and Cache Warming

Caching Strategies: A Practical Guide

Cache Stampede Prevention: Protecting Your Cache