Distributed Caching: Scaling Cache Across Multiple Nodes

A comprehensive guide to distributed caching — consistent hashing, cache sharding, replica consistency, cache clustering, and handling the unique challenges of multi-node cache environments.

published: reading time: 43 min read author: Geek Workbench

Introduction

Single-node caching works until it doesn’t.

LimitationWhat Happens
MemoryYour 16GB cache node fills up. You need 64GB.
RequestsYour cache serves 100k requests/second. You need 500k.
AvailabilityYour cache node crashes. Every request hits the database.

Distributed caching solves all three. But it introduces coordination problems.


Core Concepts

Client-Side Sharding

The client decides which cache node to use based on the key. No proxy needed.

graph LR
    A[Client] --> B{Which node?}
    B -->|key1| C[Node 1]
    B -->|key2| D[Node 2]
    B -->|key3| E[Node 3]
import hashlib
from bisect import bisect

class ShardedCache:
    def __init__(self, nodes):
        self.nodes = nodes
        self.ring = {}
        self.sorted_keys = []
        self._build_ring()

    def _build_ring(self):
        for node in self.nodes:
            for i in range(150):
                key = f"{node}:{i}"
                hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
                self.ring[hash_key] = node
                self.sorted_keys.append(hash_key)
        self.sorted_keys.sort()

    def _get_node(self, key):
        if not self.ring:
            return None
        hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect(self.sorted_keys, hash_key)
        if idx >= len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

    def get(self, key):
        node = self._get_node(key)
        return node.cache.get(key)

    def set(self, key, value):
        node = self._get_node(key)
        node.cache.set(key, value)

Pros: No proxy layer, low latency, simple for small clusters. Cons: Client manages sharding, adding nodes requires moving data.

Proxy-Based Routing

A proxy (like Twemproxy or Redis Cluster client support) routes requests. Clients don’t know about the cluster topology.

graph LR
    A[Client] --> B[Proxy]
    B --> C[Node 1]
    B --> D[Node 2]
    B --> E[Node 3]

Clients send all requests to the proxy. The proxy handles sharding logic. This is how managed services like Redis Enterprise and ElastiCache work.

Pros: Client simplicity, topology hidden from clients. Cons: Extra hop (proxy), proxy is a SPOF unless HA, more latency.

Redis Cluster

Redis Cluster is Redis’s native distributed mode. It handles sharding, replication, and failover automatically.

# Minimum Redis Cluster setup (6 nodes = 3 primaries + 3 replicas)
redis-cli --cluster create \
  192.168.1.1:7000 \
  192.168.1.2:7000 \
  192.168.1.3:7000 \
  192.168.1.4:7000 \
  192.168.1.5:7000 \
  192.168.1.6:7000 \
  --cluster-replicas 1
import redis

# Redis Cluster client handles topology automatically
r = redis.RedisCluster(
    host='192.168.1.1',
    port=7000,
    skip_full_coverage_check=True
)

# Works the same as single Redis
r.set('key', 'value')
r.get('key')

Redis Cluster splits keys across 16,384 slots. Each primary node owns a subset of slots. If a primary fails, its replica promotes automatically.


Topic-Specific Deep Dives

Cache Coherence

When you have multiple cache nodes, coherence becomes a problem. If you update data in Node 1, how do you invalidate that same data in Node 2?

The Problem

graph TD
    A[Write to DB] --> B[Update Node 1]
    B -.->|Async| C[Node 2 has stale data]
    D[Read from Node 2] --> E[Stale data returned]

Solutions

1. Write-Through All Nodes

On write, update all cache nodes.

def write_to_all_nodes(key, value, nodes):
    for node in nodes:
        node.set(key, value)

# Problem: More write latency, potential inconsistency if one fails

This works but is slow and fragile. One slow node drags everything down.

2. Invalidation Broadcasting

On write, broadcast invalidation to all nodes.

def invalidate_across_cluster(key, nodes, pubsub):
    # Publish invalidation event
    pubsub.publish('cache:invalidate', key)

    # Subscribe on each node
    for node in nodes:
        node.delete(key)

# Subscribe to invalidation
pubsub.subscribe('cache:invalidate')

Redis pub/sub is useful here. When data changes, publish an invalidation message. All cache instances subscribe and delete the key.

# On write
redis_cluster.publish('invalidation', json.dumps({'key': 'user:123'}))

# On each cache node
pubsub = redis_cluster.pubsub()
pubsub.subscribe('invalidation')

for message in pubsub.listen():
    if message['type'] == 'message':
        data = json.loads(message['data'])
        cache.delete(data['key'])

3. Consistent Hashing with Virtual Nodes

Use consistent hashing so keys map to the same nodes reliably. Adding a node only moves some keys.

class VirtualNodeSharding:
    def __init__(self, nodes, replicas=150):
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []

        for node in nodes:
            for i in range(replicas):
                # Virtual node for better distribution
                key = f"{node}:vnodes:{i}"
                hash_key = self._hash(key)
                self.ring[hash_key] = node
                self.sorted_keys.append(hash_key)

        self.sorted_keys.sort()

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        hash_key = self._hash(key)
        idx = bisect(self.sorted_keys, hash_key)
        if idx >= len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

Session Store Patterns

Sessions are a common distributed caching use case. User sessions need to be available across all application instances.

Redis Session Store

import redis
import json
import secrets

class RedisSessionStore:
    def __init__(self, redis_cluster, ttl=86400):
        self.redis = redis_cluster
        self.ttl = ttl

    def create_session(self, user_id, data=None):
        session_id = secrets.token_urlsafe(32)
        session_key = f"session:{session_id}"

        session_data = {
            'user_id': user_id,
            'created_at': time.time(),
            'data': data or {}
        }

        self.redis.setex(
            session_key,
            self.ttl,
            json.dumps(session_data)
        )

        return session_id

    def get_session(self, session_id):
        session_key = f"session:{session_id}"
        data = self.redis.get(session_key)

        if not data:
            return None

        return json.loads(data)

    def update_session(self, session_id, data):
        session_key = f"session:{session_id}"
        session = self.get_session(session_id)

        if not session:
            return False

        session['data'].update(data)
        session['updated_at'] = time.time()

        self.redis.setex(
            session_key,
            self.ttl,
            json.dumps(session)
        )

        return True

    def destroy_session(self, session_id):
        session_key = f"session:{session_id}"
        self.redis.delete(session_key)

Session affinity vs distributed sessions

If your application runs on multiple instances, sessions need to be shared. Two options:

  1. Sticky sessions: Route each user to the same instance. Simpler, but instance failure loses sessions.

  2. Distributed sessions: Sessions stored in Redis, available to all instances. More resilient, slightly slower.

For anything critical, use distributed sessions. Sticky sessions will ruin your day when an instance goes down.


High Availability Patterns

Replication

Add replica nodes that copy data from the primary. Reads can go to replicas, spreading load.

# Redis replication config (on replica)
replicaof 192.168.1.1 7000
replica-read-only yes
# Read from replica, write to primary
def get_from_replica(key):
    return replica_redis.get(key)

def set_to_primary(key, value):
    return primary_redis.setex(key, ttl, value)

def get_cached(key):
    # Try primary first (has freshest data)
    val = primary_redis.get(key)
    if val:
        return val

    # Fall back to replica
    return replica_redis.get(key)
Automatic Failover

Redis Sentinel handles failover automatically. When a primary fails, Sentinel promotes a replica and updates clients.

# Sentinel configuration
sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
from redis.sentinel import Sentinel

sentinel = Sentinel([
    ('192.168.1.1', 26379),
    ('192.168.1.2', 26379),
    ('192.168.1.3', 26379)
], socket_timeout=0.1)

# Get master and replica connections
master = sentinel.master_for('mymaster')
replica = sentinel.slave_for('mymaster')

# Automatic failover handling
master.set('key', 'value')  # Writes always go to master
replica.get('key')           # Reads can go to replica

Trade-off Analysis

This section consolidates the key architectural decisions and their implications across distributed caching implementations.

Consistency vs Availability

The classic CAP theorem trade-off manifests directly in cache cluster design:

ApproachConsistencyAvailabilityTypical Use Case
Redis Cluster (async)EventualHighWeb applications, content caching
Redis Sentinel + writesStrong (sync write)Moderate (failover)Financial data, inventory
Write-through all nodesStrongLow (any node down)Small clusters, critical consistency
Pub/Sub invalidationEventualHighMulti-region, social media

Latency vs Consistency Trade-offs

StrategyRead LatencyWrite LatencyStaleness Window
Primary for all ops~0.5-1ms~0.5-1msNone (always fresh)
Read from replica~0.3-0.5ms~0.5-1msReplication lag (typically <100ms)
Probabilistic early exp~0.5-1msSame as writeTTL × probabilistic window
Stale-while-revalidate~0.3ms (serve stale)Background refreshConfigurable, can be seconds

Memory vs Performance

ConfigurationMemory EfficiencyPerformanceFailure Mode
No replication100% usableHighest throughputSingle node failure = data loss
1 replica per master50% usable~70% of no-replica1 node fail = promoted replica serves
2 replicas per master33% usable~50% of no-replicaMore resilience, higher cost
Hot key replicationLower (replicated keys)Even load distributionComplexity in invalidation

Client-Side Sharding vs Proxy-Based Routing

FactorClient-Side ShardingProxy-Based Routing
LatencyLowest (direct)+1 hop (proxy)
Operations complexityHigher (each client)Lower (centralized)
ScalabilityLimited by clientProxy must scale
Failure domainClient failures affect cache accessProxy is SPOF unless HA
Topology managementClient must trackProxy handles all

TTL Selection Trade-offs

TTL SettingCache EfficiencyData FreshnessServer Load
Very short (<60s)Low hit rateNear real-timeHigh DB load
Short (1-5 min)Moderate hit rateFresh enough for mostModerate DB load
Long (1-24 hours)High hit rateStale data riskLow DB load
Adaptive TTLVariableBetter freshnessComplexity in implementation

When to Use / When Not to Use

ArchitectureWhen to UseWhen Not to Use
Client-Side ShardingSmall to medium clusters (<10 nodes); low latency critical; team comfortable with client logicLarge clusters; multi-language environments; teams want simplicity
Proxy-Based RoutingMulti-client support; topology hidden; managed service environmentsExtra hop latency unacceptable; proxy is single point of failure
Redis ClusterLarge datasets (>10GB); need automatic failover; want native shardingSmall datasets; need strict ordering; transaction-heavy workloads
Write-Through All NodesSmall clusters; strong consistency required; write volume manageableLarge clusters; high write volume; eventual consistency acceptable
Pub/Sub InvalidationMulti-region; eventual consistency acceptable; need real-time invalidationStrict consistency required; high invalidation frequency
Consistent HashingFrequent node additions/removals; want minimal key remappingFixed cluster size; need strict key distribution
Sticky SessionsSimple deployments; session loss acceptable on failure; cost-sensitiveHigh availability required; sessions must survive instance failures
Distributed SessionsHA critical; multi-instance deployments; sessions must persistSimple single-instance apps; session loss tolerable

Decision Guide

graph TD
    A[Scale Need] --> B{Memory > Single Node?}
    B -->|Yes| C{Need HA?}
    B -->|No| D[Single Node Cache]
    C -->|Yes| E[Redis Cluster or Replicated Setup]
    C -->|No| F{Client Simplicity?}
    F -->|Yes| G[Proxy-Based Routing]
    F -->|No| H[Client-Side Sharding]
    E --> I{Need Automatic Failover?}
    I -->|Yes| J[Sentinel or Redis Cluster]
    I -->|No| K[Manual Replica Promotion]

Capacity Estimation

Planning a distributed cache cluster requires understanding how data and load distribute across nodes.

Cluster memory sizing formula:

total_cache_memory = sum(memory_per_node × number_of_nodes)
effective_capacity = total_cache_memory × (1 - replication_overhead) × (1 - fragmentation_factor)
usable_capacity = effective_capacity × 0.7  # Keep 30% headroom for spikes

For a Redis Cluster with 3 masters × 16GB each and 1 replica per master (total 6 nodes):

  • Total raw memory: 48GB
  • Replication overhead: 33% (1 replica per master): 48GB × 0.67 = 32GB effective
  • Fragmentation factor: typically 1.1-1.5× depending on allocator: 32GB / 1.2 = 26.7GB
  • Usable at 70%: ~18.7GB for cached data

Slot distribution in Redis Cluster: 16,384 slots divided across N masters. With 3 masters: each owns ~5,461 slots. When adding nodes, Redis Cluster migrates slots in units — one slot at a time. The cluster slots command shows current ownership. Plan slot assignments in advance using redis-cli --cluster import or manual cluster setslot commands during maintenance windows.

Hot key detection: Even with consistent hashing, hot keys cause uneven load. The formula for identifying hot keys:

hot_key_probability = (requests_per_key / total_requests) × number_of_nodes

If 1% of keys receive 50% of traffic on a 10-node cluster, those keys are 5× hotter than average. Mitigation: replicate hot keys to all nodes and route randomly, use key spreading (split user:123 into user:123:shard0user:123:shard9), or move hot key handling to a dedicated single-node Redis with replica read scaling.

Memcached cluster write capacity: With N nodes in a consistent hash ring, each write goes to one node. Write throughput = single_node_throughput × N (assuming even distribution). If one Memcached node handles 200k ops/sec, a 10-node cluster handles ~2M ops/sec total, but any single key’s write rate is still bound by one node.


Production Failure Scenarios

FailureImpactMitigation
Single cache node crashesKeys on that node missed; potential database overloadReplica promotion; circuit breaker; replica reads during failover
Network partitionCluster split-brain or unavailableRedis Cluster handles automatically; Sentinel monitors
Proxy failure (Twemproxy)All cache requests failRun multiple proxies with load balancer; keep-alive health checks
Shard rebalancingTemporary key remapping; some misses during migrationUse stable cluster configuration; avoid frequent resharding
Invalidation broadcast lostStale data on some nodesImplement periodic reconciliation; use version vectors
Replica lagStale reads from replicasMonitor lag; route consistency-critical reads to primary
Sentinel election failureNo automatic failover possibleRun 3+ Sentinels; ensure proper quorum configuration

Common Pitfalls / Anti-Patterns

1. Assuming Hashing Is Evenly Distributed

Poor hash functions or small keyspaces cause hot spots.

# BAD: Modulo-based sharding creates hot spots
def get_node(key, num_nodes):
    hash_val = hash(key)
    return hash_val % num_nodes  # Uneven for small keyspaces

# GOOD: Use consistent hashing with virtual nodes
class ConsistentHashSharding:
    def __init__(self, nodes, replicas=150):
        self.ring = {}
        for node in nodes:
            for i in range(replicas):
                key = f"{node}:{i}"
                self.ring[md5(key)] = node

2. Not Handling Node Failure Gracefully

Assuming all nodes always available leads to cascading failures.

# BAD: No failure handling
def get(key):
    return nodes[key % len(nodes)].get(key)  # Crashes if node down

# GOOD: Retry on different node, circuit breaker
def get_with_fallback(key, max_retries=3):
    for attempt in range(max_retries):
        try:
            node = get_node_for_key(key)
            return node.get(key)
        except ConnectionError:
            logger.warning("node_connection_failed", node=node, attempt=attempt)
            # Circuit breaker would prevent repeated attempts
    return get_from_database_fallback(key)

3. Pub/Sub Invalidation Without Guarantees

Pub/Sub is fire-and-forget; messages can be lost.

# BAD: Assuming pub/sub invalidation always works
def on_data_update(key, value):
    db.update(key, value)
    redis_cluster.publish('invalidation', key)  # Fire and forget

# GOOD: Verify invalidation, periodic reconciliation
def on_data_update(key, value):
    db.update(key, value)

    # Publish invalidation
    redis_cluster.publish('invalidation', key)

    # Also delete locally to handle the common case
    redis_cluster.delete(key)

    # For strict consistency: check all nodes periodically
    schedule_reconciliation(key)

4. Ignoring Slot Migration Complexity

Adding nodes to Redis Cluster requires migrating slots.

graph LR
    A[Before Migration] --> B[During Migration<br/>Slot 123 split]
    B --> C[Node1 owns 0-5460]
    B --> D[Node2 owns 5461-10922<br/>+ Slot 123 migrating]
    C --> E[After Migration<br/>Node1 owns 0-5460]
    D --> E
# During slot migration:
# 1. Cluster must be in stable state (no other migrations)
# 2. Target node must accept IMPORTING slot
# 3. Source node must accept MIGRATING slot
# 4. Keys must be moved one by one
# 5. Slot ownership transferred after all keys moved

# Don't add nodes during high traffic - migration causes temporary unavailability

5. Single Sentinel for HA

Sentinel needs quorum to elect a new master.

# BAD: Single Sentinel
# If Sentinel crashes, no failover possible

# GOOD: 3+ Sentinels with proper quorum
sentinel monitor mymaster 192.168.1.1 7000 2  # Quorum of 2
sentinel monitor mymaster 192.168.1.2 7000 2  # On different machines
sentinel monitor mymaster 192.168.1.3 7000 2  # Quorum = 2 of 3

Cache Stampede Prevention

Here’s a scenario I’ve seen play out in production: a popular cache key expires, and suddenly 50 requests hit the database at once. That’s the thundering herd — cache expiration creates traffic spikes that can flatten your database before you know what happened.

Probabilistic Early Expiration

Instead of waiting for TTL to expire, refresh the cache probabilistically before it expires:

import random
import time

class ProbabilisticEarlyExpiration:
    def __init__(self, cache_client, beta=0.3):
        self.cache = cache_client
        self.beta = beta  # Higher = more aggressive early refresh

    def get(self, key, fetch_func):
        cached = self.cache.get(key)
        if cached is not None:
            # Check if we should early-refresh
            metadata = self.cache.get(f"{key}:meta")
            if metadata:
                age, original_ttl = float(metadata['age']), float(metadata['ttl'])
                # Probabilistic early expiration formula from XFetch
                probability = self.beta * (age / original_ttl) ** 2
                if random.random() < probability:
                    # Async refresh in background (or sync for simplicity)
                    try:
                        new_value = fetch_func(key)
                        self.cache.set(key, new_value, ttl=original_ttl)
                    except Exception:
                        pass  # Don't fail if refresh fails
            return cached

        # Cache miss - fetch and store
        value = fetch_func(key)
        ttl = self._estimate_ttl(value)
        self.cache.set(key, value, ttl=ttl)
        self.cache.set(f"{key}:meta", {'age': 0, 'ttl': ttl})
        return value

Lock-Based Cache Fill

Use distributed locks so only one request fetches and fills the cache:

import redis
import threading

class LockBasedCacheFill:
    def __init__(self, redis_client, lock_ttl=5):
        self.redis = redis_client
        self.lock_ttl = lock_ttl

    def get_or_fill(self, key, fetch_func, ttl=300):
        cached = self.redis.get(key)
        if cached is not None:
            return cached

        lock_key = f"lock:{key}"
        # Try to acquire lock
        acquired = self.redis.set(lock_key, "1", nx=True, ex=self.lock_ttl)

        if acquired:
            try:
                value = fetch_func(key)
                self.redis.setex(key, ttl, value)
                return value
            finally:
                self.redis.delete(lock_key)
        else:
            # Another request is filling - wait and retry
            time.sleep(0.1)
            cached = self.redis.get(key)
            if cached:
                return cached
            # Fallback after timeout - fetch directly
            return fetch_func(key)

Background Refresh with Lease

Give requests a lease on the cache entry so they know when to refresh:

def get_with_lease(self, key, fetch_func, ttl=300):
    cached, lease_id = self.redis.get_with_lease(key)

    if cached:
        remaining_ttl = self.redis.ttl(key)
        # If less than 20% of original TTL remains, trigger background refresh
        if remaining_ttl < ttl * 0.2:
            # Fire-and-forget background refresh
            thread = threading.Thread(target=self._background_fill, args=(key, fetch_func))
            thread.start()
        return cached

    return self._fill_cache(key, fetch_func, ttl)

TTL Selection Deep Dive

TTL is one of those settings that seems trivial but will bite you in production. Set it too short and you lose most of your cache benefit. Too long and you’re serving stale data without realizing it.

TTL Selection Framework

Data TypeTypical TTL RangeRationale
User session15 min - 24 hoursSession must outlast user activity, but stale sessions hurt
Product catalog1 hour - 24 hoursUpdates are infrequent, stale data has business cost
Social media feed30 sec - 5 minFreshness critical, staleness very visible
Analytics aggregates5 min - 1 hourSource data changes slowly, aggregate is cheap to refresh
Configuration flags30 sec - 5 minFeature flags need fast propagation
Search index1 hour - 24 hoursIndex rebuild is expensive, updates are batched
Thumbnail / media URL24 hours - 7 daysURLs change rarely but CDN caching matters

TTL Refresh Strategy

def adaptive_ttl(key, value, access_pattern):
    """
    Dynamically set TTL based on access patterns and data type.
    """
    base_ttl = {
        'session': 86400,       # 24 hours
        'catalog': 3600,        # 1 hour
        'feed': 60,            # 1 minute
        'config': 300,         # 5 minutes
        'static': 86400 * 7,   # 7 days
    }

    ttl = base_ttl.get(access_pattern, 3600)

    # Reduce TTL for frequently changing data
    if value.get('updated_at'):
        age = time.time() - value['updated_at']
        if age > ttl * 0.8:
            ttl = int(ttl * 0.5)  # Halve TTL for aging data

    # Increase TTL for read-heavy data that rarely changes
    read_count = value.get('read_count', 0)
    if read_count > 10000:
        ttl = int(ttl * 1.5)

    return ttl

Handling TTL vs Freshness Trade-offs

The max-age header and Redis TTL serve different purposes:

# Redis TTL controls when cache expires internally
# HTTP max-age controls what downstream clients see

# For a product catalog cached at CDN edge:
response.set_header('Cache-Control', 'public, max-age=300')  # CDN caches 5 min
redis.setex('product:123', 3600, data)  # Redis expires in 1 hour

# Invalidation: when product updates, publish to pub/sub
redis.publish('invalidation', 'product:123')
# Application deletes the key: redis.delete('product:123')
# Next request fetches fresh data, repopulates cache with new TTL

Cache Warming Strategies

Cold cache is one of those things that feels fine in testing and absolutely destroys you at 3am on a Monday. When a cache restarts or scales, you need a strategy to repopulate it without hammering your database.

Proactive Warming

On cache startup or scale-out, pre-populate with estimated hot keys:

class CacheWarmer:
    def __init__(self, cache_client, db_client):
        self.cache = cache_client
        self.db = db_client

    def warm_from_access_log(self, access_log_path, top_n=1000):
        """
        Warm cache from recent access patterns.
        access_log_path: path to access log with key names
        """
        from collections import Counter

        key_counts = Counter()
        with open(access_log_path) as f:
            for line in f:
                key = self._extract_key_from_log(line)
                if key:
                    key_counts[key] += 1

        # Get top N most accessed keys
        hot_keys = [k for k, _ in key_counts.most_common(top_n)]

        # Batch fetch from database and populate cache
        self._batch_fill(hot_keys, batch_size=100)

    def _batch_fill(self, keys, batch_size=100):
        for i in range(0, len(keys), batch_size):
            batch = keys[i:i + batch_size]
            # Fetch all from DB in one query
            rows = self.db.fetch_many("SELECT * FROM items WHERE id IN (%s)", batch)
            for row in rows:
                self.cache.setex(f"item:{row['id']}", 3600, json.dumps(row))

    def warm_from_key_pattern(self, patterns):
        """
        Warm from known key patterns (e.g., all products in category).
        """
        for pattern in patterns:
            # e.g., "product:*" - scan matching keys
            cursor = 0
            while True:
                cursor, keys = self.cache.scan(cursor, match=pattern, count=100)
                for key in keys:
                    # Re-populate with fresh data
                    value = self.db.fetch_one(key.replace('cache:', ''))
                    if value:
                        self.cache.setex(key, 3600, json.dumps(value))
                if cursor == 0:
                    break

Warm Cache on Node Addition

When adding a new cache node (cluster scale-out), pre-warm it:

graph TD
    A[New Node Added] --> B{Key ownership?}
    B --> C[Scan hot keys from other nodes]
    C --> D[Batch fetch from database]
    D --> E[Populate new node]
    E --> F[Monitor hit rate on new node]
    F --> G{Hit rate > threshold?}
    G -->|Yes| H[Node ready]
    G -->|No| I[Increase warm priority]
    I --> C
def warm_new_node(new_node, cluster, hot_keys):
    """
    Pre-populate a new cluster node with hot keys.
    """
    for key_batch in chunked(hot_keys, 50):
        # Get current value from existing nodes
        values = cluster.mget(key_batch)
        # Set on new node with same TTL
        for key, value in zip(key_batch, values):
            if value:
                ttl = cluster.ttl(key)
                new_node.setex(key, ttl, value)

Lazy vs Eager Warming

StrategyProsConsBest For
LazyNo upfront cost, only heat what gets usedInitial requests hit DBLarge caches, infrequent cold starts
EagerInstant hot cache, no DB spikesWastes resources on unused keysPredictable traffic, scheduled maintenance
HybridPrioritizes hot keys, fills on-demandComplexityProduction systems with known hot set

Multi-Tier Caching

Most production systems don’t get far with a single cache layer. At scale, you end up stacking L1 (in-process memory), L2 (Redis or Memcached), and L3 (CDN) — each with different latency, capacity, and cost characteristics.

Common Cache Hierarchy

graph TB
    A[Request] --> B[L1: In-Process<br/>Process memory<br/>~1MB, sub-ms]
    B -->|miss| C[L2: Distributed Cache<br/>Redis/Memcached<br/>~10GB, <1ms]
    C -->|miss| D[L3: CDN Edge<br/>Geographic PoPs<br/>~100GB, 5-20ms]
    D -->|miss| E[Database<br/>ms latency]

L1 + L2 Implementation

import threading
import time

class TwoTierCache:
    def __init__(self, l1_size=1000, l2_client=None):
        # L1: Simple in-memory dict with LRU
        self.l1 = {}  # key -> (value, expiry)
        self.l1_access = {}  # key -> last access time for LRU
        self.l1_lock = threading.Lock()
        self.l1_size = l1_size

        # L2: Redis
        self.l2 = l2_client

    def get(self, key):
        # Try L1 first
        with self.l1_lock:
            if key in self.l1:
                value, expiry = self.l1[key]
                if time.time() < expiry:
                    self.l1_access[key] = time.time()
                    return value
                del self.l1[key]
                del self.l1_access[key]

        # Try L2
        if self.l2:
            value = self.l2.get(key)
            if value:
                # Populate L1
                self._l1_set(key, value, ttl=300)
                return value

        return None

    def set(self, key, value, ttl=300):
        # Write to both tiers
        self._l1_set(key, value, ttl)
        if self.l2:
            self.l2.setex(key, ttl, value)

    def _l1_set(self, key, value, ttl):
        with self.l1_lock:
            # Evict LRU if at capacity
            if len(self.l1) >= self.l1_size and key not in self.l1:
                lru_key = min(self.l1_access, key=self.l1_access.get)
                del self.l1[lru_key]
                del self.l1_access[lru_key]

            self.l1[key] = (value, time.time() + ttl)
            self.l1_access[key] = time.time()

    def invalidate(self, key):
        with self.l1_lock:
            self.l1.pop(key, None)
            self.l1_access.pop(key, None)
        if self.l2:
            self.l2.delete(key)

CDN as L3: Cache Aside with Tier-Aware Client

class TierAwareCacheClient:
    def __init__(self, cdn_client, redis_client, db_client):
        self.cdn = cdn_client  # L3: CDN (e.g., Cloudflare API)
        self.redis = redis_client  # L2: Redis
        self.db = db_client  # Origin

    def get(self, key):
        # Try CDN (L3) - lowest latency for distant users
        value = self.cdn.get(key)
        if value:
            return value

        # Try Redis (L2)
        value = self.redis.get(key)
        if value:
            # Populate CDN for next request
            self.cdn.set(key, value, ttl=300)
            return value

        # Cache miss - fetch from DB (L1 effectively)
        value = self.db.fetch(key)
        if value:
            # Write to both L2 and L3
            self.redis.setex(key, 3600, value)
            self.cdn.set(key, value, ttl=300)
        return value

Quick Recap Checklist

  • Distributed caching solves memory, throughput, and availability limits of single node
  • Client-side sharding gives lowest latency but requires client logic
  • Proxy-based routing simplifies clients but adds hop and potential SPOF
  • Redis Cluster provides automatic sharding, replication, and failover
  • Cache coherence across nodes requires explicit strategy (pub/sub, write-all, etc.)
  • Session stores are a common distributed cache use case — distributed sessions beat sticky sessions
  • Sentinel handles failover for Redis masters; Redis Cluster handles it natively
  • Consistent hashing with virtual nodes (150+ replicas) minimizes remapping on node changes
  • Probabilistic early expiration (XFetch) prevents cache stampede without locks
  • Lock-based cache fill guarantees single DB hit but adds coordination overhead
  • Multi-tier caching (L1/L2/L3) balances latency, capacity, and cost
  • Cache invalidation must cascade across all tiers — L1 local, L2 pub/sub, L3 CDN purge
  • Monitor replication lag and alert if >100ms to catch stale read issues
  • Slot-aware clients must handle MOVED redirects correctly during failover
  • Keep 30% memory headroom to avoid eviction under pressure causing latency spikes

Copy/Paste Checklist

Observability Checklist

Metrics to Track

Cluster-Level:

  • cluster_nodes - Number of nodes, their states, and roles
  • cluster_slots_fail - Number of slots not covered by reachable nodes
  • cluster_known_nodes - Total known nodes in cluster

Node-Level:

  • connected_slaves - Number of replicas connected to each master
  • master_repl_offset - Replication position
  • slave_repl_offset - Replica lag calculation
# Cluster health check
def check_cluster_health(cluster):
    health = {
        'healthy': True,
        'nodes': {},
        'issues': []
    }

    for node in cluster.nodes:
        info = node.info()
        health['nodes'][node.name] = {
            'role': info.get('role'),
            'connected': info.get('connected') == 'true',
            'memory': info.get('used_memory'),
        }

        if info.get('role') == 'master' and int(info.get('connected_slaves', 0)) == 0:
            health['issues'].append(f"Master {node.name} has no replicas")

    health['healthy'] = len(health['issues']) == 0
    return health

# Monitor replication lag
def check_replication_lag(primary, replica):
    primary_offset = primary.info()['master_repl_offset']
    replica_info = replica.info()
    replica_offset = replica_info.get('master_repl_offset', 0)
    lag_bytes = primary_offset - replica_offset

    # Rough estimate: 1MB = ~1 second at typical network speeds
    lag_ms = (lag_bytes / 1024 / 1024) * 1000

    if lag_ms > 100:
        logger.warning("replication_lag_high",
            lag_ms=lag_ms,
            lag_bytes=lag_bytes)

Logs to Capture

logger = structlog.get_logger()

# Log cluster topology changes
def log_cluster_event(event_type, node_id, old_state, new_state):
    logger.warning("cluster_topology_change",
        event_type=event_type,
        node_id=node_id,
        old_state=old_state,
        new_state=new_state,
        timestamp=time.time())

# Log failover events
def log_failover(primary_id, new_primary_id):
    logger.critical("cache_failover_completed",
        old_primary=primary_id,
        new_primary=new_primary_id,
        timestamp=time.time())

Alert Rules

- alert: CacheClusterNodeDown
  expr: cache_cluster_connected_nodes < expected_nodes
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Cache cluster node(s) unavailable"

- alert: CacheReplicationLag
  expr: cache_replication_lag_bytes > 10485760 # 10MB
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Cache replication lag exceeds 10MB"

- alert: CacheClusterSlotFail
  expr: cache_cluster_slots_fail > 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "Cache cluster has unreachable slots"

Security Checklist

  • Node authentication - Use Redis ACLs or cluster authentication
  • Inter-node encryption - Encrypt traffic between cluster nodes
  • Network isolation - Cluster nodes should be on private network
  • Key prefix namespacing - Prevent collisions in shared cluster
  • Monitor cluster events - Watch for unauthorized topology changes
  • Secure Sentinel - Sentinel instances should require authentication
  • Backup cluster state - Regularly backup cluster configuration
  • Validate node identity - Verify node certificates if using TLS
# Redis Cluster secure configuration
# On each node:
requirepass your-cluster-auth
cluster-preferred-endpoint-type ipv4
tls-cluster yes

# In redis.conf:
# Bind to internal network only
bind 10.0.1.1 -::1

# ACL for application user
user appuser on >apppassword ~app:* on >appdb ~* +get +set +del +exists -flushall -flushdb

# Redis Cluster minimum setup (6 nodes)
redis-cli --cluster create \
  192.168.1.1:7000 \
  192.168.1.2:7000 \
  192.168.1.3:7000 \
  192.168.1.4:7000 \
  192.168.1.5:7000 \
  192.168.1.6:7000 \
  --cluster-replicas 1

# Sentinel configuration (3 nodes)
sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000

# Pub/Sub invalidation subscription
redis-cli SUBSCRIBE cache:invalidate
# Message format: {"key": "user:123", "action": "delete"}

# Cluster health check
redis-cli cluster info
redis-cli cluster nodes | grep master

# Python Redis Cluster
r = redis.RedisCluster(
    host='192.168.1.1',
    port=7000,
    skip_full_coverage_check=True,
    read_from_replicas=True  # Read from replicas for scale
)

# Deployment checklist:
# [ ] Minimum 3 masters + replicas for HA
# [ ] Sentinel quorum = (sentinels / 2) + 1
# [ ] Monitor replication lag on all replicas
# [ ] Test failover manually before going to production
# [ ] Implement application-level retry logic for cluster operations
# [ ] Use slot-aware client for efficient routing
# [ ] Set up alerts for cluster node failures

Real-World Case Studies

Redis Cluster Failure at Scale

A major e-commerce platform ran a 24-node Redis Cluster (12 masters, 12 replicas) for session storage and cart data. During a peak traffic event, a network switch failed causing a brief partition between 4 nodes and the rest of the cluster.

The cluster’s quorum-based failover correctly detected that the 4 isolated nodes could not form a majority and did not promote themselves — correct behavior. But the remaining 20 nodes experienced a cascade: the primary for one of the hot key slots was among the isolated nodes. The replica for that slot was in the majority partition. Failover promoted the replica to primary.

The problem: the application was using a slot-aware client that cached slot-to-node mapping. When failover changed slot ownership, the client’s cached mapping became stale. Some application instances still routed requests for the affected slot to the demoted node (now a replica). The replica rejected writes, and those application instances had no mechanism to refresh the slot map — they kept retrying the same failed node.

The recovery took 8 minutes. The fix: application code was updated to handle MOVED errors by refreshing the slot map and retrying, and the slot refresh timeout was reduced from 60 seconds to 5 seconds. Monitoring was added for MOVED error rates per application instance.

The lesson: cluster failover works, but your application’s cluster client must handle MOVED redirects correctly. A slot-aware client that does not honor MOVED creates a partial failure that bypasses cluster resilience. Test failover explicitly — trigger manual failover and observe whether your application recovers cleanly.


Monitoring Distributed Cache

You need visibility into your distributed cache. Key metrics:

# Check cluster health
def check_cluster_health(cluster):
    health = {
        'nodes': [],
        'total_keys': 0,
        'total_memory': 0,
        'hit_rate': 0
    }

    for node in cluster.get_nodes():
        info = node.info()
        health['nodes'].append({
            'host': node.host,
            'role': node.role,
            'connected': node.connected
        })
        health['total_keys'] += info.get('keys', 0)
        health['total_memory'] += info.get('used_memory', 0)

    return health

# Alert thresholds
ALERT_THRESHOLDS = {
    'memory_usage_percent': 80,  # Alert if >80% memory used
    'hit_rate_percent': 80,     # Alert if hit rate <80%
    'connected_clients': 10000, # Alert if >10k clients
    'replication_lag_ms': 100  # Alert if replica lag >100ms
}

Interview Questions

1. How does consistent hashing reduce the impact of adding or removing nodes in a distributed cache?

Expected answer points:

  • Consistent hashing maps keys to nodes using a hash ring with virtual nodes (150+ per physical node)
  • When a node is added or removed, only K/N keys are remapped (where K is virtual nodes per node, N is total nodes) — far less than traditional modulo hashing which remaps all keys
  • Virtual nodes ensure even key distribution even when physical nodes have different capacities
  • Trade-off: still requires some data movement during rebalancing, but impact is bounded and predictable
2. Describe the differences between cache-aside, read-through, and write-through caching patterns. When would you use each?

Expected answer points:

  • Cache-aside: application checks cache first, falls back to database on miss, then populates cache — most common pattern, application controls everything
  • Read-through: cache automatically fetches from database on miss — simpler for application but less control over serialization
  • Write-through: application writes to cache, cache synchronously writes to database — simpler reads but slower writes
  • Use cache-aside when: you need control over serialization, multiple data sources, or asymmetric read/write loads
  • Use write-through when: reads far outnumber writes and you want simplified read path
3. How would you handle the thundering herd problem when a popular cache key expires simultaneously across many requests?

Expected answer points:

  • Probabilistic early expiration (XFetch): before TTL expires, randomly refresh based on key age and a beta parameter — reduces simultaneous expiration
  • Lock-based cache fill: first request acquires a distributed lock and fills; others wait and retry — guarantees single fill but adds lock latency
  • Background refresh with lease: serve stale data while refreshing asynchronously — best for latency-sensitive workloads
  • Other approaches: random jitter on TTL, staggered TTLs for grouped keys, proactive warming of critical keys
4. What are the trade-offs between Redis Sentinel and Redis Cluster for high availability?

Expected answer points:

  • Redis Sentinel: monitoring, notification, and automatic failover for a single master — requires external proxy or client-side logic for routing
  • Redis Cluster: native sharding, replication, and failover within the Redis server — no external components needed but shards are independently available
  • Sentinel is simpler operationally but requires careful client configuration for failover
  • Cluster provides better write scalability (multiple masters) but requires slot-aware clients and has limitations on transactions across slots
  • Cluster cannot guarantee ordering across slots; Sentinel can if you use a single master-replica pair
5. How do you ensure cache coherence when data is updated in one cache node but must be invalidated across all nodes?

Expected answer points:

  • Pub/Sub broadcast: when data updates, publish invalidation event to all nodes — simple but unreliable (messages can be lost)
  • Write-through all nodes: synchronously update all cache replicas on write — strong consistency but slow and fragile if nodes fail mid-write
  • Version vectors or vector clocks: track which nodes have which version of data — enables conflict resolution but adds complexity
  • Periodic reconciliation: periodically scan and compare cache state across nodes — catches missed invalidations but is not real-time
  • Hybrids: publish invalidation + also delete locally (handles common case) + periodic reconciliation (catches missed cases)
6. How would you design a multi-tier caching system with L1 (in-process), L2 (Redis), and L3 (CDN)?

Expected answer points:

  • L1 (in-process): process-local memory, smallest capacity (~1MB), fastest latency — store most frequently accessed hot keys
  • L2 (Redis): distributed cache, larger capacity (~10GB), sub-millisecond latency — primary cache for most data
  • L3 (CDN): geographically distributed, largest capacity, higher latency (5-20ms) — cache static content, media, responses for distant users
  • On read: check L1 → L2 → L3 → database; on write: write to all tiers with appropriate TTLs
  • Invalidation must cascade: delete L1 locally + delete L2 in Redis + purge CDN
  • Trade-offs: complexity of tier management, consistency across tiers, cost of CDN for dynamic content
7. How do you estimate the memory capacity needed for a distributed Redis cluster?

Expected answer points:

  • Formula: usable_capacity = (sum of node memory) × (1 - replication_overhead) × (1 - fragmentation_factor) × 0.7
  • Example: 3 masters × 16GB = 48GB raw; with 1 replica per master, 33% overhead → 32GB effective; fragmentation 1.2× → 26.7GB; 30% headroom → ~18.7GB usable
  • Account for replication overhead: each replica mirrors its master (2× raw memory for master+replica pair)
  • Fragmentation factor accounts for memory allocator overhead (typically 1.1-1.5×) — depends on key size distribution and allocator settings
  • Hot key analysis: identify keys receiving disproportionate traffic; replicate hot keys across nodes or split them
8. What metrics would you monitor to detect cache cluster problems before they become failures?

Expected answer points:

  • Cluster-level: cluster_nodes (node count), cluster_slots_fail (unreachable slots), cluster_known_nodes (total known nodes)
  • Node-level: connected_slaves (replica count per master), master_repl_offset vs slave_repl_offset (replication lag in bytes)
  • Memory usage: used_memory_human vs maxmemory — alert if >80% utilized
  • Hit rate: keyspace_hits / (keyspace_hits + keyspace_misses) — alert if <80%
  • Replica lag: alert if lag_ms > 100ms — stale reads affect correctness
  • Client connections: connected_clients — alert if >10k or suddenly spikes
  • Slot migration in progress: if cluster is resharding, temporary unavailability and misses expected
9. How would you handle slot migration when adding a new node to a Redis Cluster?

Expected answer points:

  • Redis Cluster has 16,384 slots; when adding nodes, slots are migrated one-by-one from existing nodes to new nodes
  • Migration process: target node accepts IMPORTING, source node accepts MIGRATING, keys are moved one by one, then slot ownership transfers
  • During migration, some keys may be temporarily unavailable — requests for migrating slot keys may get MOVED error
  • Slot-aware clients handle MOVED by refreshing slot map and retrying on correct node
  • Best practices: schedule during low-traffic windows, avoid concurrent migrations, test migration process in staging
  • After migration, application clients must handle MOVED errors correctly — if client caches slot map without refreshing, partial failure occurs
10. What's the difference between sticky sessions and distributed sessions? When would you choose one over the other?

Expected answer points:

  • Sticky sessions: load balancer routes each user to the same application instance — sessions live in that instance's memory
  • Distributed sessions: sessions stored in shared cache (Redis), accessible from any application instance
  • Sticky sessions: simpler, faster (no network hop), but instance failure loses all sessions for that user
  • Distributed sessions: more resilient, enables horizontal scaling (any instance can serve any user), slightly slower due to cache access
  • Choose sticky sessions when: deploying to a single instance or small number of stable instances, session loss on failure is acceptable, cost-sensitive
  • Choose distributed sessions when: multi-instance deployment, high availability required, sessions must survive instance failures, horizontal scaling needed
11. How does Redis Cluster handle slot migration, and what are the operational considerations during a cluster resharding process?

Expected answer points:

  • Redis Cluster has 16,384 slots distributed across master nodes; migration moves slots one at a time from source to target node
  • Migration process: target node enters IMPORTING state, source enters MIGRATING state, keys are scanned and moved one by one via MIGRATE command
  • During migration, keys for the migrating slot may be on either node — cluster ensures atomic operations during transition
  • Clients receive MOVED errors when requesting keys on wrong node; slot-aware clients refresh slot map and retry on correct node
  • Operational best practices: schedule during low-traffic windows, ensure cluster is stable (no other migrations), monitor slot fail count
  • Avoid adding nodes during peak traffic — resharding causes temporary unavailability for keys being migrated
12. What is the difference between Redis Sentinel and Redis Cluster, and how do you choose between them for a given use case?

Expected answer points:

  • Redis Sentinel: provides monitoring, notification, and automatic failover for a single master + replicas — does NOT shard data
  • Redis Cluster: provides sharding (data distributed across multiple masters), replication per shard, and automatic failover — handles both scaling and HA
  • Choose Sentinel when: dataset fits on single node (~50GB), you need HA for reads/writes to one master, can tolerate external proxy or client-side redirect logic
  • Choose Cluster when: dataset exceeds single node capacity, need write scaling (multiple masters), want native sharding with slot-aware clients
  • Sentinel limitations: no data sharding, all writes go to single master, cluster-wide operations require external tooling
  • Cluster limitations: cannot guarantee ordering across slots, multi-key operations limited to same slot, transactions across slots not supported
13. How would you implement a cache warming strategy after a complete cluster failure or restart? What are the trade-offs between eager and lazy warming?

Expected answer points:

  • Eager warming: pre-populate cache from database before serving traffic — no cold-start latency for users but delays cluster startup
  • Lazy warming: serve traffic immediately, populate cache on demand — users experience cache misses initially but cluster starts faster
  • Proactive warming approach: use access logs to identify hot keys (top N by request count), batch-fetch from DB, populate cache before going live
  • Node addition warming: scan existing nodes for hot keys, fetch values, populate new node with same TTL — monitor hit rate to validate
  • Trade-offs: eager wastes resources on keys that may never be accessed; lazy causes initial latency spike and DB overload risk
  • Hybrid approach: warm top 1000-5000 hot keys eagerly (covers 80% of traffic), lazy-warm everything else
14. Explain how probabilistic early expiration (XFetch) works and when you would prefer it over lock-based cache fill for thundering herd prevention.

Expected answer points:

  • XFetch formula: probability = beta × (age / original_ttl)^2 — as key ages toward expiration, probability of early refresh increases
  • When probability triggers, one requestor synchronously refreshes cache; others continue using stale value until refresh completes
  • Advantage: no coordination overhead (no locks), keeps cache warm naturally, only triggers refresh when approaching expiration
  • Preference for XFetch when: latency tolerance exists (stale OK briefly), lock infrastructure unavailable, want simplicity over strict guarantees
  • Lock-based fill advantage: guarantees exactly one DB request per expired key, no stale reads during regeneration
  • Preference for locks when: staleness completely unacceptable, can tolerate lock acquisition latency, need strict per-key guarantees
  • Hybrid: use probabilistic early refresh as first line of defense, fall back to lock-based fill if cache miss occurs
15. How do you design a cache invalidation strategy that works across multiple cache tiers (L1 in-process, L2 Redis, L3 CDN)?

Expected answer points:

  • Invalidation must cascade: delete from L1 locally + delete from L2 in Redis + purge from CDN
  • For L1 (in-process): local deletion is immediate; no cross-node coordination needed
  • For L2 (Redis): delete key + publish invalidation event via pub/sub; all instances subscribed to invalidation channel delete the key
  • For L3 (CDN): use CDN purge API (e.g., Cloudflare API, Akamai purge) after Redis invalidation completes
  • Ordering matters: invalidate L1/L2 first (fast), then CDN purge (slower) — ensures subsequent requests don't repopulate from stale CDN
  • For strict consistency: implement version numbers or ETags; include version in cache key; changing data bumps version and naturally makes old keys stale
  • Alternative: use short TTL on CDN responses (5-10 min) so stale content naturally expires while invalidation cascade runs
16. What are the operational considerations for running Redis Cluster in a multi-region or cross-data-center deployment?

Expected answer points:

  • Network latency: cross-region RTT can be 100-200ms+ — replication lag increases, client timeouts more frequent
  • Cluster topology: Redis Cluster does not natively support multi-region — all nodes must be in low-latency network; cross-region requires application-level sharding
  • Replication: async by default — writes in one region may take time to replicate to another; potential for data loss during partition
  • Failover complexity: network partition between regions may cause split-brain or prolonged unavailability
  • Alternative approaches: separate Redis clusters per region with application-level replication; use Redis Enterprise Global (multi-master); use CRDT-based solutions
  • Monitoring: set up alerts for replication lag per region, track cross-region network latency, monitor cluster_slots_fail for network issues
  • Cost trade-offs: multi-region provides disaster recovery but increases operational complexity and cost significantly
17. How would you detect and mitigate hot keys in a Redis Cluster where 1% of keys receive 50% of traffic?

Expected answer points:

  • Detection: use Redis MONITOR or slow log to identify keys with high access counts; analyze with redis-cli --bigkeys or MEMORY USAGE
  • Hot key identification formula: hot_key_probability = (requests_per_key / total_requests) × number_of_nodes
  • Mitigation strategy 1 - Key replication: split hot key into multiple keys (user:123:shard0, user:123:shard1...) and randomize replica selection
  • Mitigation strategy 2 - Replica splitting: replicate hot key to all nodes; route requests to random replica to spread load
  • Mitigation strategy 3 - Dedicated hot key cluster: move hot keys to separate single-node Redis with replica read scaling
  • Application-level: use consistent hashing with hot key awareness — manually pin hot keys to nodes with lower utilization
  • Monitoring: track hit rate per node, alert on node-level imbalance, use Redis INFO stats for keyspace_hits per database
18. Describe the security considerations for a production Redis cluster and how you would implement defense-in-depth.

Expected answer points:

  • Authentication: use Redis ACLs (not just requirepass) — create users with minimal permissions (e.g., app user can only GET/SET/DEL on app:* keys)
  • TLS: enable encrypted connections between clients and cluster nodes; also encrypt inter-node replication traffic
  • Network isolation: bind cluster nodes to private network interfaces only (bind 10.0.1.1); never expose cluster on public IPs
  • Key namespacing: use prefixes (app:user:, app:product:) to prevent collisions when multiple applications share cluster
  • Sentinel security: Sentinel instances require authentication; they control master election so compromised Sentinel = cluster compromise
  • Monitoring: watch for unauthorized topology changes (CLUSTER SETSLOT, CLUSTER meet), monitor failed authentication attempts
  • Backup: regular cluster state backup (redis-cli --cluster save or rdbcopy); test restoration procedure
  • Compliance: for PCI/DSS or HIPAA, enable AOF persistence, TLS, ACLs, and audit logging of all commands
19. How does the choice between asynchronous and synchronous replication affect consistency guarantees in a distributed cache?

Expected answer points:

  • Async replication: write completes on primary after local write, replica updated later — fast writes but potential data loss if primary fails
  • Sync replication: write completes only after replica acknowledges — no data loss but higher latency, any replica failure blocks writes
  • Redis Cluster uses async replication by default — replicas may lag up to a few hundred milliseconds during normal operation
  • Consistency implications: async = eventual consistency, reads from replicas may return stale data; writes may be lost if primary fails before replica update
  • Mitigation: track replication lag (master_repl_offset vs slave_repl_offset) and alert when lag exceeds threshold (e.g., 100ms)
  • For stronger guarantees: use WAIT command to wait for replica acknowledgment (synchronous behavior) — blocks until N replicas acknowledge
  • Trade-off: WAIT improves durability but adds latency; unsuitable for latency-sensitive workloads; still eventual consistency if majority fails
20. How would you approach capacity planning for a distributed cache cluster that needs to handle 100k read ops/sec and 10GB of data?

Expected answer points:

  • Data capacity: 10GB working set + 30% headroom = ~13GB raw capacity needed per replica; with 1 replica per master, total 20GB across master+replica
  • Read throughput: Redis handles ~100k ops/sec per node comfortably; 3-node cluster (1 master + 1 replica each) handles 100k reads if read from replicas
  • Write throughput: writes go to master; single master handles all writes — partition writes across multiple keys to scale write throughput
  • Cluster configuration: 3 masters × 16GB each = 48GB raw, 33% replication overhead, 1.2x fragmentation = ~26GB effective, 70% usable = ~18GB — exceeds 10GB requirement
  • Memory estimation formula: usable_capacity = (sum of node memory) × (1 - replication_overhead) × (1 - fragmentation_factor) × 0.7
  • Hot keys consideration: if hot keys exist, replicate them across nodes or use key spreading; monitor per-node load distribution
  • Growth buffer: plan for 6-12 months growth; if 10GB is current, plan for 30GB nodes to accommodate growth without immediate resharding

Further Reading


Conclusion

Distributed caching solves scale and availability problems but introduces complexity. Start with a single node, add replication for read scaling, move to clustering when you need more memory or write throughput.

The coherence problem is real. Pick a strategy that matches your consistency requirements. Eventual consistency via pub/sub invalidation works for most use cases. Strong consistency is expensive.

Monitor everything. In distributed systems, you cannot eyeball whether the cache is healthy.

Best Practices Summary

These are the things I see teams trip over most often in production. Skim them now so you’re not debugging cache issues at midnight.

Architecture: Prefer Redis Cluster for clusters with more than 10 nodes or datasets larger than 50GB — it handles sharding and failover natively so you don’t have to. Use consistent hashing with virtual nodes (150+ replicas per node) so that adding or removing nodes only remaps a fraction of your keys. Never run a single Sentinel — deploy at least 3 on separate machines with quorum = (sentinels / 2) + 1. And test that your slot-aware client actually handles MOVED redirects correctly; I’ve seen clusters fail gracefully while applications crumbled because clients cached stale slot maps.

Coherence: Pub/Sub invalidation is eventually consistent — messages can be lost, full stop. Pair it with local deletion on write so the common case works without waiting for pub/sub delivery. Use periodic reconciliation to catch the cases pub/sub misses. Write-through all nodes is tempting if you need strong consistency but it falls apart fast as you scale — one slow node poisons everything.

Performance: Probabilistic early expiration (XFetch) keeps cache warm without locks or coordination overhead. Lock-based cache fill prevents concurrent database hits but adds latency for whoever wins the race. Read from replicas for read-heavy workloads; keep the primary for writes. And set alerts for replication lag — if replicas fall more than 100ms behind, you’re serving stale data without knowing it.

Operations: Keep 30% memory headroom at all times. Cache eviction under pressure causes unpredictable latency spikes that are hard to debug. Slot migration is disruptive — schedule during low-traffic windows, never during peak. Hot keys deserve attention: if 1% of keys receive 50% of your traffic, replicate those keys or split them across nodes.

Security: Use Redis ACLs, not just requirepass. TLS between cluster nodes keeps inter-node traffic private. Bind cluster nodes to private networks only. And namespace your keys with prefixes to avoid collisions when multiple applications share a cluster.


Category

Related Posts

Cache Patterns: Thundering Herd, Stampede Prevention, and Cache Warming

A comprehensive guide to advanced cache patterns — thundering herd, cache stampede prevention with distributed locking and probabilistic early expiration, and cache warming strategies.

#system-design #caching #distributed-systems

Caching Strategies: A Practical Guide

Learn the main caching patterns — cache-aside, write-through, write-behind, and refresh-ahead — plus how to pick TTLs, invalidate stale data, and distribute caches across nodes.

#caching #redis #distributed-systems

Cache Stampede Prevention: Protecting Your Cache

Learn how single-flight, request coalescing, and probabilistic early expiration prevent cache stampedes that can overwhelm your database.

#cache #cache-stampede #performance