Redis vs Memcached: Choosing an In-Memory Data Store

A comprehensive comparison of Redis and Memcached — data structures, persistence, clustering, Lua scripting, pub/sub, and guidance on when to choose each.

published: February 10, 2026 reading time: 41 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Memcached is a simple key-value store for strings only; Redis is a data structure server supporting strings, lists, hashes, sets, sorted sets, bitmaps, and streams. If you just need to cache JSON blobs or session data, Memcached's simplicity wins. If you need rate limiting with sorted sets, pub/sub messaging, geospatial queries, or atomic operations on complex data, Redis wins. Redis supports persistence (RDB snapshots, AOF logs), clustering, and Lua scripting for atomic multi-step operations. Neither uses a single global true LRU: Redis approximates LRU by sampling keys, and Memcached keeps a separate LRU per slab class. The choice shapes debugging at 2am: Memcached bugs are usually simple key-miss issues; Redis bugs can involve data structure inconsistencies, persistence corruption, or cluster split-brain. After reading this you'll choose the right tool based on actual requirements, not habit.


Introduction

Both sit in front of your database and cache frequently accessed data in memory. Developers often use them interchangeably without understanding the differences. The differences matter: Redis is a data structure server that also happens to support caching. Memcached is a caching engine with a simpler data model. That distinction shapes what you can build with them and how you debug them at 2am.

This is not a “both are good” comparison. I will tell you when each one makes sense.

Core Concepts

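Memcached stores strings and nothing but strings. You give it a key, you get back a value. That is essentially the whole data model; the protocol adds only a few extras such as incr/decr, append/prepend, and compare-and-swap (CAS).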

Redis supports strings, lists, hashes, sets, sorted sets, bitmaps, hyperloglogs, geospatial indexes, and streams. It can act as a cache, a session store, a message broker, a rate limiter, and a real-time analytics engine. When Redis evicts under memory pressure, its volatile-lru and allkeys-lru policies use an approximated (sampled) LRU, not true LRU. Redis samples a few keys (maxmemory-samples, default 5) and evicts the least recently used key among them. This saves the memory that a full LRU linked list would cost. volatile-lru samples only keys that have a TTL, while allkeys-lru samples all keys.

import json

# Memcached: everything is a string
memcached.set("user:123", json.dumps(user_data))
user_data = json.loads(memcached.get("user:123"))

# Redis: native data structures (hash fields must be flat strings or numbers)
redis.hset("user:123", mapping=user_data)
user_data = redis.hgetall("user:123")  # Values come back as bytes unless decode_responses=True

Performance depends on what you are doing with them.

Redis Data Structures

Strings

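Both handle simple string values. Memcached's protocol covers set, get, append, prepend, incr, and decr. Redis adds byte-range reads and writes, plus many more string commands.

# Memcached (text protocol: set <key> <flags> <exptime> <bytes>, data on the next line)
set key 0 0 5
value
get key

# Redis (redis-cli)
set key "value"
get key

# Redis string commands
append key " more"     # Append to existing value (Memcached: append)
incr count             # Atomic increment (Memcached: incr count 1, key must already exist)
decr count             # Atomic decrement (Memcached: decr count 1)
setrange key 0 "re"    # Overwrite bytes starting at offset 0 (Redis only)
getrange key 0 3       # Bytes 0 through 3, inclusive (Redis only)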

Lists

Memcached does not have lists. Redis does.

# Redis lists: ordered, push/pop from either end
redis.lpush("queue:jobs", "job1", "job2", "job3")
redis.rpop("queue:jobs")  # Returns "job1" (oldest)
redis.lrange("queue:jobs", 0, -1)  # Get all

# Common use: recently viewed items, job queues, activity logs
redis.lpush("user:123:views", product_id)
redis.ltrim("user:123:views", 0, 19)  # Keep last 20

Sets and Sorted Sets

Memcached has no sets. Redis has both.

# Redis sets: unique, unordered
redis.sadd("user:123:likes", "product1", "product2", "product3")
redis.smembers("user:123:likes")
redis.sismember("user:123:likes", "product1")  # O(1) check

# Redis sorted sets: scored sets for leaderboards, priorities
redis.zadd("leaderboard", {"player1": 100, "player2": 200, "player3": 150})
redis.zrevrange("leaderboard", 0, 9, withscores=True)  # Top 10
redis.zrevrank("leaderboard", "player2")  # 0 = top of the board (zrank counts from the lowest score)

Hashes

Memcached has no hashes. Redis does.

# Redis hashes: objects without serialization overhead
redis.hset("user:123", mapping={"name": "Alice", "email": "alice@example.com"})
redis.hget("user:123", "name")  # b"Alice" (or "Alice" with decode_responses=True)
redis.hgetall("user:123")  # All fields

# vs Memcached requiring JSON serialization
memcached.set("user:123", json.dumps({"name": "Alice", "email": "..."}))

Eviction Policies

Their eviction options differ more than they first appear. Memcached always evicts by LRU, tracked separately for each slab class (a segmented hot/warm/cold LRU in modern versions). Its main switch is whether to evict at all. Redis lets you choose among several maxmemory policies.

# Memcached eviction
# LRU within each slab class; no policy choice
# -m: memory limit in megabytes
# -M: return an error when memory is full instead of evicting

memcached -m 1024       # 1 GB, evict least recently used items when full
memcached -m 1024 -M    # 1 GB, reject new items with an error when full

# Redis maxmemory policies
# allkeys-lru, allkeys-lfu, allkeys-random
# volatile-lru, volatile-lfu, volatile-random, volatile-ttl
# noeviction

maxmemory 100mb
maxmemory-policy allkeys-lru

Redis is far more flexible. It can restrict eviction to keys with a TTL (volatile-*), evict by remaining TTL, evict at random, or evict by access frequency with LFU (Least Frequently Used, added in Redis 4.0). Memcached has none of these knobs.
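Redis's sampled LRU, described under Core Concepts, is easy to model. The sketch below is a toy in-memory cache, not Redis code. It assumes a logical clock for access times and a `samples` parameter standing in for maxmemory-samples. Redis 3.0 and later also keep a small pool of eviction candidates between evictions, which this sketch omits.

```python
import random


class SampledLRUCache:
    """Toy cache that evicts like Redis allkeys-lru: sample a few keys and evict
    the least recently used one in the sample (not the globally oldest key)."""

    def __init__(self, capacity, samples=5, seed=None):
        self.capacity = capacity
        self.samples = samples          # plays the role of maxmemory-samples
        self.data = {}
        self.last_access = {}           # key -> logical time of last read or write
        self.clock = 0
        self.rng = random.Random(seed)

    def _touch(self, key):
        self.clock += 1
        self.last_access[key] = self.clock

    def get(self, key):
        if key not in self.data:
            return None
        self._touch(key)
        return self.data[key]

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict()
        self.data[key] = value
        self._touch(key)

    def _evict(self):
        # Copying all keys is O(n); Redis samples its hash table directly instead
        candidates = self.rng.sample(list(self.data), min(self.samples, len(self.data)))
        victim = min(candidates, key=lambda k: self.last_access[k])
        del self.data[victim]
        del self.last_access[victim]
```

If `samples` is at least the number of keys, the toy becomes exact LRU. With `samples=1` it becomes random eviction. Redis sits between the two, so raising maxmemory-samples spends more CPU per eviction in exchange for choices closer to true LRU.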

Cache Invalidation Strategies

“Cache aside” (lazy loading) is the most common pattern, but there are several strategies with different trade-offs.

Write-Through Cache

Data is written to both the database and the cache synchronously, in the same request. Reads usually hit the cache.

def set_user(user_id, data):
    # Write the database (source of truth) first, then the cache,
    # so a failed database write never leaves uncommitted data in the cache
    db.users.update(user_id, data)
    redis.set(f"user:{user_id}", json.dumps(data))
    return data

def get_user(user_id):
    # Every write refreshes the cache, so hits are normally current
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Fallback only on a cache miss
    data = db.users.get(user_id)
    redis.set(f"user:{user_id}", json.dumps(data))
    return data

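Pros: The cache tracks the database closely, because every write updates both. Simple read path: read from the cache and fall back to the database on a miss.

Cons: Higher write latency (two writes per update). No transaction spans Redis and the database. If the database write succeeds but the cache write fails, the cache keeps serving the old value. Set a TTL as a backstop, or delete the key when the cache write fails. Much of what you write may never be read.

Best for: Read-heavy data that must be current immediately after a write, such as configuration data and reference data.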

Write-Behind Cache (Write-Back)

Data is written to cache only. Database is updated asynchronously.

def set_user(user_id, data):
    # Write to cache only — fast
    redis.set(f"user:{user_id}", json.dumps(data))
    # Async write to database via queue
    queue.enqueue("db:users:upsert", {"user_id": user_id, "data": data})
    return data

Pros: Very fast writes. Reduces database load during write spikes.

Cons: Risk of data loss if the cache fails before the database is updated. Requires additional infrastructure (write queue, retry logic). The database lags behind the cache until the queue drains, so anything that reads the database directly sees stale data.

Best for: Write-heavy workloads where occasional data loss is acceptable (metrics, analytics, leaderboards), session data.
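The write-behind example above leaves the queue and retry logic implicit. Here is a minimal in-process sketch. It assumes a dict stands in for Redis and `db_write` is a hypothetical callable that persists a batch. A real deployment needs a durable queue (a Redis list or stream, for example) so pending writes survive a process crash. The sketch does two things. It coalesces repeated writes to the same key, and it re-marks a failed batch as pending without overwriting newer values.

```python
import threading


class WriteBehindCache:
    """Write-behind sketch: writes hit the cache immediately and reach the
    database later, in coalesced batches, when flush() runs."""

    def __init__(self, db_write):
        self._cache = {}           # stand-in for Redis
        self._dirty = {}           # key -> newest value not yet persisted
        self._lock = threading.Lock()
        self._db_write = db_write  # hypothetical: persists a {key: value} batch, raises on failure

    def set(self, key, value):
        with self._lock:
            self._cache[key] = value
            self._dirty[key] = value  # repeated writes to one key collapse into one

    def get(self, key):
        with self._lock:
            return self._cache.get(key)

    def flush(self):
        """Persist all pending writes in one batch; returns how many keys were written."""
        with self._lock:
            batch, self._dirty = self._dirty, {}
        if not batch:
            return 0
        try:
            self._db_write(batch)
        except Exception:
            with self._lock:
                for key, value in batch.items():
                    # Re-queue for the next flush unless a newer write arrived meanwhile
                    self._dirty.setdefault(key, value)
            raise
        return len(batch)
```

In practice you would call `flush()` on a timer from a background thread. Because of coalescing, a key updated 100 times between flushes costs one database write.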

Cache-Aside (Lazy Loading)

The application manages cache explicitly — reads populate cache on miss, writes update database and invalidate cache.

def get_user(user_id):
    # Read from cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Cache miss — load from database
    data = db.users.get(user_id)
    # Populate cache for next time
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    return data

def set_user(user_id, data):
    # Write to database
    db.users.update(user_id, data)
    # Invalidate cache — do not update it
    redis.delete(f"user:{user_id}")
    return data

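Pros: Simple. The key is invalidated on every update, so stale entries are short-lived. One race remains: a reader that loaded the old row just before the update can still re-cache it until the TTL expires. Read-heavy workloads naturally populate the cache. No write amplification.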

Cons: Cache miss penalty — first read after invalidation or startup is slow. “Thundering herd” problem on popular keys.

Best for: Read-heavy workloads, most general-purpose caching, when database is source of truth.

TTL and Invalidation Details

TTL Selection Criteria

Choosing TTL is a trade-off between staleness and cache efficiency.

Data Type	Recommended TTL	Rationale
User sessions	24-48 hours	Sessions have natural expiry; 48h covers timezone gaps
Configuration data	5-15 minutes	Changes need to propagate; short enough to recover
API responses (public)	1-5 minutes	Fresh data important; short TTL limits staleness
Analytics / aggregates	15-60 minutes	Tolerates some staleness; longer = better hit rate
Product catalog	1-24 hours	Updates are infrequent; long TTL = better hit rate
Leaderboards	30-300 seconds	Needs near-real-time accuracy; short TTL required

Quick heuristic: Pick TTL based on how stale your data is allowed to be. If you cannot answer “how stale can this be?”, set it to 60 seconds or less.

Invalidation Patterns

Delete vs Expire:

# Delete: immediate removal
redis.delete("user:123")

# Expire: time-based removal
redis.setex("temp:data", 300, value)  # Auto-removes in 5 min

# Use expire for: temporary data, cached computations
# Use delete for: data that changed, explicit updates

Invalidation on update vs refresh on read:

# Option A: Invalidate on write (cache-aside)
def update_user(user_id, data):
    db.users.update(user_id, data)
    redis.delete(f"user:{user_id}")  # Next read fetches fresh

# Option B: Refresh on write (write-through variant)
def update_user(user_id, data):
    db.users.update(user_id, data)
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))  # Write new value

# Option A (invalidate) is preferred because:
# - Avoids write amplification (only invalidates, does not rewrite)
# - Simpler: no need to serialize and store on every write
# - Avoids ordering races: with Option B, two concurrent updates can reach the
#   cache in the opposite order from the database and leave the older value cached

Avoiding the Thundering Herd

When a popular cache key expires, many requests simultaneously hit the database.

# BAD: Many requests see cache miss, all hit database
def get_product(product_id):
    cached = redis.get(f"product:{product_id}")
    if not cached:
        data = db.products.get(product_id)  # ALL requests hit DB
        redis.setex(f"product:{product_id}", 300, json.dumps(data))
    return json.loads(cached) if cached else data

# GOOD: Single request refreshes, others wait
import json
import time
import uuid

# Release the lock only if we still own it (atomic compare-and-delete in Lua)
RELEASE_LOCK = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""

def get_product_safe(product_id, lock_ttl=10, max_wait_retries=50):
    cache_key = f"product:{product_id}"
    lock_key = f"lock:product:{product_id}"
    for _ in range(max_wait_retries):
        cached = redis.get(cache_key)
        if cached:
            return json.loads(cached)
        token = str(uuid.uuid4())
        # SET NX EX: only one caller gets the lock, and it expires if that caller dies
        if redis.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                data = db.products.get(product_id)
                redis.setex(cache_key, 300, json.dumps(data))
                return data
            finally:
                redis.eval(RELEASE_LOCK, 1, lock_key, token)
        # Another request is refreshing: wait briefly, then re-check the cache
        time.sleep(0.1)
    # Waited about 5 seconds; read the database directly instead of failing
    return db.products.get(product_id)

Alternative: Probabilistic early expiration (XFetch):

import math
import random

def get_with_xfetch(key, delta, beta=1.0):
    """XFetch: probabilistic early expiration to prevent thundering herd.

    delta: roughly how many seconds it takes to recompute the value.
    beta:  > 1 refreshes earlier, < 1 later.
    """
    value = redis.get(key)
    if value is None:
        return None, True  # Miss: caller must recompute
    ttl = redis.ttl(key)  # -1 = no expiry, -2 = key vanished
    if ttl > 0:
        # Same test as XFetch's delta * beta * -ln(rand) >= ttl:
        # the chance of an early refresh rises sharply as ttl approaches 0
        if random.random() < math.exp(-ttl / (delta * beta)):
            # Serve this value but refresh it (in production, use a separate thread/queue)
            return value, True
    return value, False

# Usage: serve stale data while refreshing in background
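The Redis lock above coordinates refreshes across processes. Inside a single process, you can apply the same "one caller loads, the rest wait" idea without a round trip to Redis. This also suits the local L1 cache suggested in Case 2 below. What follows is a minimal sketch under that assumption. `SingleFlight` is a hypothetical helper name, modeled loosely on Go's singleflight package.

```python
import threading


class SingleFlight:
    """Collapse concurrent loads of the same key into one call, per process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> state of the in-flight load

    def do(self, key, loader):
        with self._lock:
            call = self._calls.get(key)
            is_leader = call is None
            if is_leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._calls[key] = call
        if not is_leader:
            call["done"].wait()  # Followers block until the leader finishes
            if call["error"] is not None:
                raise call["error"]
            return call["value"]
        try:
            call["value"] = loader()  # Only the leader hits the database
        except Exception as exc:
            call["error"] = exc
            raise
        finally:
            with self._lock:
                del self._calls[key]  # The next miss after this starts a fresh load
            call["done"].set()
        return call["value"]
```

A call looks like `flight.do(f"product:{product_id}", lambda: db.products.get(product_id))`. The design choice here is that the in-flight entry is removed as soon as the load finishes. SingleFlight therefore deduplicates only concurrent misses; caching the result is still the cache's job.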

Cache Warming and Cold Starts

When the Cache Starts Cold

When a cache starts empty (restart, deployment, failure), every request hits the database.

# BAD: Cold cache causes database overload at startup
# All 10,000 users hit DB simultaneously after Redis restart

# GOOD: Pre-warm cache before taking traffic
def warm_cache():
    """Run at startup before accepting traffic"""
    popular_keys = db.products.get_top_100()  # Identify hot data
    for product in popular_keys:
        redis.setex(f"product:{product.id}", 3600, json.dumps(product))
    # Now safe to take traffic

# Better: Progressive warming with rate limiting
def warm_cache_progressive():
    keys_to_warm = get_keys_by_priority()  # Sort by access frequency
    for i, key in enumerate(keys_to_warm):
        data = db.fetch(key)
        redis.setex(f"cache:{key}", get_ttl_for(key), data)
        # Rate limit: 1000 keys per second to avoid overwhelming DB
        if i % 1000 == 0:
            time.sleep(1)
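Warming has a side effect. Keys written in the same pass with the same TTL also expire together, which recreates the thundering herd an hour later. A common countermeasure is to add random jitter to each TTL. Here is a minimal sketch; `jitter_fraction` is an assumed tuning knob, not a library parameter.

```python
import random


def jittered_ttl(base_ttl, jitter_fraction=0.1, rng=random):
    """Return base_ttl shifted by up to +/- jitter_fraction, so keys warmed
    together expire at different times. Never returns less than 1 second."""
    spread = int(base_ttl * jitter_fraction)
    return max(1, base_ttl + rng.randint(-spread, spread))
```

For example, `redis.setex(f"product:{product.id}", jittered_ttl(3600), json.dumps(product))` spreads a batch of one-hour keys across 54 to 66 minutes instead of expiring them all in the same second.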

Keeping Frequently-Used Data Hot

Monitor cache temperature — how often each key is accessed.

# Track key access frequency in a sorted set (score = access count)
def access_key(key):
    # Atomically increment this key's score
    redis.zincrby("key:access", 1, key)
    return redis.get(f"cache:{key}")

# Analyze access patterns weekly
def analyze_access():
    # Top 1000 accessed keys: priority for staying cached
    # (keys with low scores are low priority for warming)
    hot_keys = redis.zrevrange("key:access", 0, 999, withscores=True)

    # Give hot keys a longer TTL
    for key, score in hot_keys:
        key = key.decode() if isinstance(key, bytes) else key
        current_ttl = redis.ttl(f"cache:{key}")
        if 0 < current_ttl < 3600:  # Has a TTL under 1 hour (-1 = no TTL, -2 = missing)
            redis.expire(f"cache:{key}", 86400)  # Extend to 24 hours
    # This sorted set grows with every distinct key; trim or reset it periodically

Warming Patterns in Practice

# Pattern 1: Scheduled pre-warming before high-traffic events
def warm_for_black_friday():
    """Run 30 minutes before expected traffic spike"""
    # Pre-compute and cache popular product pages
    top_products = db.get_products_by_category("popular", limit=500)
    for product in top_products:
        redis.setex(f"product:{product.id}", 7200, compute_product_page(product))
    # Pre-warm user sessions that will be active
    active_user_ids = db.get_users_logged_in_recently(limit=10000)
    for user_id in active_user_ids:
        redis.setex(f"session:{user_id}", 172800, load_user_session(user_id))

# Pattern 2: Proactive caching on database write
def write_with_proactive_cache(user_id, data):
    # Write to database
    db.users.update(user_id, data)
    # Proactively cache the result
    redis.setex(f"user:{user_id}", 86400, json.dumps(data))
    # Also cache related data
    redis.setex(f"user:{user_id}:profile", 86400, json.dumps(data["profile"]))

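# Pattern 3: Background refresh for critical keys
import logging
import threading
import time

def start_background_refresh(key, compute_fn, ttl=300):
    """Refresh key in background before expiry"""
    def refresh():
        while True:
            try:
                redis.setex(key, ttl, compute_fn())
            except Exception:
                # Keep the loop alive; the previous value serves until its TTL runs out
                logging.exception("background refresh failed for %s", key)
            time.sleep(ttl * 0.8)  # Refresh at 80% of TTL
    thread = threading.Thread(target=refresh, daemon=True)
    thread.start()
    return thread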

Persistence

Memcached: Pure Memory

Memcached is pure memory. It never touches disk. When it restarts, everything is gone.

# Memcached has no persistence options
# Restart = empty cache

This sounds like a drawback, but for pure caching it is fine. Your source of truth is the database anyway.

When a Memcached node restarts, every cached key is gone. There is no reload sequence, no replay of write-ahead logs, no restoration from an RDB snapshot. The cache repopulates itself through normal traffic. Your application has to handle this gracefully: fall back to the database directly, pre-warm the cache before taking a node out of rotation, or accept a brief spike in database load. Instagram wrote about this when they migrated away from Memcached — their solution was application-level cache warming plus consistent hashing across many virtual nodes so that individual node failures did not concentrate load on any one backend.

The trade-off is deliberate. Memcached has no persistence to configure, no fork overhead, no AOF rewrite storms, and no durability guarantees to maintain. For pure caching where the database is always the source of truth, this simplicity is a feature. You skip the memory and CPU cost of maintaining an on-disk copy of data that will be invalidated anyway. If your cache repopulates from the database within minutes anyway, Redis persistence is mostly wasted effort for caching workloads. Memcached skips it entirely and gives you a cache that is fast, simple, and predictable, as long as your application code is designed to survive starting empty.

Redis: Optional Persistence

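Redis can persist to disk, so a restart does not have to mean an empty cache. You can still lose recent writes: up to about one second with appendfsync everysec, or everything since the last snapshot with RDB alone.

# RDB snapshots: point-in-time dumps
save 900 1    # Snapshot after 900 s if at least 1 key changed
save 300 10   # Snapshot after 300 s if at least 10 keys changed
save 60 10000 # Snapshot after 60 s if at least 10000 keys changed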

# AOF (Append Only File): every write logged
appendonly yes
appendfsync everysec  # fsync every second (balance of speed/safety)

# Or no persistence at all (pure cache mode)
save ""

Redis persistence is configurable. You can turn it off for pure caching or enable it for durability.

Performance and Clustering

Performance

Raw performance depends on your workload. Here is a general comparison:

Operation	Memcached	Redis
GET/SET (simple)	Very fast	Fast
MGET/MSET (batch)	Multi-get; no multi-set command (clients pipeline sets)	MGET and MSET, plus pipelining
INCR (atomic counter)	Fast	Very fast
Sets/Lists/Hashes	Not supported	Depends on operation
Memory efficiency	Better (simple values)	Depends on data structures

For simple string caching, Memcached often uses less memory per key. For complex data structures, Redis’s overhead is usually worth it.

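Redis executes commands on a single main thread, one command at a time across all connections. Redis 6 and later can offload socket I/O to io-threads, but command execution stays single-threaded. Memcached is multi-threaded (-t, 4 worker threads by default). For simple GET/SET traffic, a single Redis instance is often limited by network bandwidth before CPU. To use more cores with Redis, though, you run more instances or shards, while one Memcached process scales across cores on its own.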

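# Redis pipelining: batch commands to reduce round trips
pipe = redis.pipeline(transaction=False)  # Plain batching; the default wraps commands in MULTI/EXEC
for key in keys:
    pipe.get(key)
results = pipe.execute()  # One round trip for all, results in command order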

Clustering and Distribution

Clustering and distribution are where Redis and Memcached diverge most dramatically in operational complexity. In production, you rarely run a single instance — you need multiple machines for capacity, fault tolerance, or geographic distribution. How each system handles this shapes everything from your deployment scripts to your 3am pager duty.

Redis ships with a built-in clustering layer. Redis Cluster handles partitioning, replication, and failover. Keys map to 16,384 hash slots spread across master nodes, and replicas take over when a master fails. A cluster-aware client library learns which node owns which slot and follows MOVED and ASK redirects. Application code can therefore treat the cluster as one logical keyspace, with one restriction: multi-key commands need all their keys in one slot. Memcached has no such facility. You build distribution yourself, typically with consistent hashing on the client side. Your application code then owns the sharding logic, and topology changes require coordinated deployments.

This difference is not academic. Redis Cluster adds operational machinery: cluster-node-timeout, a gossip protocol between nodes, and manual resharding with redis-cli --cluster reshard. In exchange, it moves routing logic into well-tested client libraries. Memcached keeps operations simple at the server level (no cluster state to manage) but pushes complexity to the client, where every language library may implement consistent hashing differently. The sections below break down what each approach actually looks like in practice.

Memcached

Memcached has no native clustering. You shard across instances manually using consistent hashing.

import bisect
import hashlib

class ConsistentHash:
    def __init__(self, servers, vnodes=150):
        self.servers = servers
        self.ring = {}          # ring position -> server
        self.sorted_keys = []   # ring positions in ascending order

        for server in servers:
            for i in range(vnodes):  # Virtual nodes even out the distribution
                hash_key = self._hash(f"{server}:{i}")
                self.ring[hash_key] = server
                self.sorted_keys.append(hash_key)

        self.sorted_keys.sort()

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_server(self, key):
        # Binary search for the first ring position >= the key's hash; wrap to the start
        index = bisect.bisect_left(self.sorted_keys, self._hash(key))
        if index == len(self.sorted_keys):
            index = 0
        return self.ring[self.sorted_keys[index]]

It works. You are just managing the sharding yourself.

Redis

Redis has built-in clustering with automatic sharding.

# Redis Cluster configuration
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000

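# Keys map to 16,384 slots spread across masters; replicas take over on failure
# Cluster-aware clients route each command to the node that owns its slot

Redis Cluster assigns each key to a slot automatically. Slots are assigned to nodes when the cluster is created and move only when you reshard. Replicas provide failover and, with the READONLY command, read scaling.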

Capacity Estimation: Memory-per-Key and Cluster Slot Planning

Memory per key differs significantly between Redis and Memcached.

Redis memory breakdown per key (string value):

Component	Approximate size (64-bit)
Keyspace entry and short key (dictEntry, SDS string, allocator rounding)	~56 bytes
Value storage	Actual value size
Redis object overhead	~16 bytes
Dictionary entry (if in hash)	~32 bytes
Total minimum per key	~72 bytes + value

Memcached memory breakdown per key (string value):

Component	Size
Key	Key length
Value	Actual value size
Client flags	Up to 4 bytes
CAS token (optional)	8 bytes
Expiry time	4 bytes
Overhead per item	~25 bytes
Total minimum per item	~25 bytes + key + value

For a cache with 1 million keys, each storing a 200-byte value:

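Redis string: ~72M overhead + 200M data = ~272M total
Memcached: ~25M overhead + key_space + 200M data = ~225M + key_space

Memcached wins on simple string workloads by roughly 10-20% memory efficiency. That figure is before slab-class rounding: each item goes into the smallest slab chunk that fits it, which gives back some of the advantage. Redis pays the overhead for richer data structures.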

Redis Cluster slot planning: 16,384 slots divided across N master nodes. For a 6-node cluster (3 masters + 3 replicas), each master owns ~5,461 slots (16,384 / 3). Slot ownership determines which node stores which keys. The formula: slot = CRC16(key) mod 16384. When planning capacity, ensure each master has headroom. Suppose one master owns 5,461 slots, each holding about 1,000 keys averaging 1KB. That node stores about 5.5 million keys, roughly 5.5GB of values before per-key overhead. Plan for 2x headroom.
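The slot formula is small enough to compute client-side, which is how cluster-aware clients route commands. Here is a Python sketch with illustrative function names. It assumes the CRC16 variant that Redis Cluster documents (XMODEM: polynomial 0x1021, initial value 0). It also applies the hash-tag rule: if a key contains a non-empty {...} section, only that part is hashed. That rule is how related keys are forced into the same slot.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM (polynomial 0x1021, initial value 0x0000)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_tag(key: bytes) -> bytes:
    """Part of the key that gets hashed: the first non-empty {...} section, else the whole key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is not empty
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """slot = CRC16(key) mod 16384, applied to the hash tag if there is one."""
    return crc16_xmodem(hash_tag(key.encode())) % 16384
```

Here `key_slot("{user:123}:profile")` equals `key_slot("{user:123}:views")`, so both keys can appear in one MULTI block or Lua script. You can cross-check results against `CLUSTER KEYSLOT` in redis-cli.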

Memcached cluster sizing: No slots; consistent hashing distributes keys. Target 150-200 virtual nodes per physical node for even distribution. Each node's share of the ring is the sum of V roughly independent arcs. The coefficient of variation (CV) of per-node load is therefore approximately 1/√V, largely independent of the node count N. With V = 150, CV ≈ 0.08, so a typical node's share is within about 8% of the average. Fewer virtual nodes mean higher variance: with V = 10, CV ≈ 0.3.
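Rather than trust a formula, you can measure the spread directly. The sketch below builds an MD5 ring with the same `server:i` virtual-node naming as the ConsistentHash class above. It computes the fraction of the ring each server owns, which under uniform hashing is its share of keys, and reports the coefficient of variation. Server names are placeholders.

```python
import hashlib
import statistics

RING_SIZE = 2 ** 128  # MD5 digests are 128-bit integers


def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def ring_shares(servers, vnodes):
    """Fraction of the hash ring owned by each server.

    A key maps to the first ring point at or after its hash, so each point owns
    the arc between the previous point and itself.
    """
    points = sorted((ring_hash(f"{server}:{i}"), server)
                    for server in servers for i in range(vnodes))
    owned = dict.fromkeys(servers, 0)
    previous = points[-1][0] - RING_SIZE  # the first point's arc wraps around from the last
    for point, server in points:
        owned[server] += point - previous
        previous = point
    return {server: arc / RING_SIZE for server, arc in owned.items()}


def load_cv(servers, vnodes):
    """Coefficient of variation (std dev / mean) of the per-server key share."""
    shares = list(ring_shares(servers, vnodes).values())
    return statistics.pstdev(shares) / statistics.mean(shares)


if __name__ == "__main__":
    servers = [f"cache-{n}" for n in range(12)]
    for vnodes in (1, 10, 150):
        # Measured CV next to 1/sqrt(vnodes) for comparison
        print(vnodes, round(load_cv(servers, vnodes), 3), round(1 / vnodes ** 0.5, 3))
```

Computing arc lengths avoids hashing millions of sample keys and gives each server's exact share for a given ring.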

Comparative Analysis Tables

Cache Invalidation Strategy Comparison

Strategy	Write Latency	Read Consistency	Data Loss Risk	Complexity	Best For
Write-Through	High (sync)	Fresh unless a cache write fails	None	Low	Read-heavy, consistency-critical
Write-Behind	Low (async)	Eventually consistent	High	High	Write spikes, high-throughput
Cache-Aside	Low (1 write)	Fresh after invalidation (small race window)	Low	Medium	Read-heavy, general purpose

Redis vs Memcached: When to Use Which

Factor	Redis	Memcached	Winner
Data structures	Strings, Lists, Sets, Hashes, Sorted Sets	Strings only	Redis
Memory efficiency	Higher per-key overhead (~72 bytes + value)	Lower per-key overhead (~25 bytes + value)	Memcached
Persistence	RDB snapshots, AOF logs	None (pure memory)	Redis
Clustering	Built-in cluster with slots	Client-side consistent hashing	Redis
Threading model	Single-threaded command execution (no locks)	Multi-threaded (fine-grained item locks)	Memcached (multi-core throughput), Redis (simple atomicity)
Atomic operations	INCR, SETNX, MULTI/EXEC, Lua scripts	incr/decr, add, CAS	Redis
Pub/Sub	Native support	Not supported	Redis
Latency	Sub-millisecond, predictable	Sub-millisecond, predictable	Tie
Operational complexity	Higher (config, persistence)	Lower (stateless)	Memcached
Production maturity	Very mature at scale	Mature	Tie

Eviction Policy Comparison

Policy	Redis Support	Memcached Support	Behavior
LRU (Least Recently Used)	`allkeys-lru`, `volatile-lru` (sampled)	Always on: LRU per slab class	Evict least recently accessed
LFU (Least Frequently Used)	`allkeys-lfu`, `volatile-lfu`	Not supported	Evict least frequently accessed
TTL	`volatile-ttl`	Not supported	Evict shortest TTL first
Random	`allkeys-random`, `volatile-random`	Not supported	Random eviction
No eviction	`noeviction`	`-M` flag	Return error on OOM

Redis LFU advantage: LFU tracks access frequency, not just recency. A burst of one-off reads, such as a scan or a crawler, therefore does not push out keys that are accessed steadily. Redis decays the counters over time (lfu-decay-time), so keys that were hot and then go cold eventually become eviction candidates. Memcached does not have this capability.
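How can a small per-key counter track frequency? Redis uses a probabilistic logarithmic counter. Each access increments it with a probability that falls as the counter grows, and the counter decays while the key sits idle. The sketch below mirrors that scheme as described in the Redis LFU documentation. It assumes the defaults: lfu-log-factor 10, lfu-decay-time 1 minute, and an initial counter of 5 for new keys. It is an illustration, not Redis's exact code.

```python
import random

LFU_INIT_VAL = 5     # counter value given to newly created keys
COUNTER_MAX = 255    # the counter fits in 8 bits


def lfu_increment(counter, log_factor=10, rng=random):
    """Probabilistic increment: the higher the counter, the less likely it grows."""
    if counter >= COUNTER_MAX:
        return COUNTER_MAX
    base = max(counter - LFU_INIT_VAL, 0)
    probability = 1.0 / (base * log_factor + 1)
    return counter + 1 if rng.random() < probability else counter


def lfu_decay(counter, idle_minutes, decay_time=1):
    """Subtract one for every decay_time minutes the key was idle (floor at 0)."""
    if decay_time == 0:  # 0 disables decay
        return counter
    periods = idle_minutes // decay_time
    return max(counter - periods, 0)
```

Feeding a key 100 hits leaves its counter in the low teens at most. 100,000 hits push it far higher but still below 255. The counter therefore ranks keys by order of magnitude of access, which is all eviction needs.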

Connection Management Trade-offs

Aspect	Single Connection	Connection Pool	Presized Pool
Setup cost	High (connect latency)	Medium (pool creation)	Low
Concurrent requests	Poor (blocks)	Good	Best
Resource usage	Low (1 socket)	Medium (N sockets)	Medium
Complexity	Simple	Moderate	Simple
Best for	Scripts, short-lived	Web applications	High-throughput

When to Use / When Not to Use

Memcached makes sense for simple string caching — HTML fragments, API responses, session data — where maximum memory efficiency matters and you do not need complex data structures. It scales horizontally via consistent hashing, and the operational surface is small. Use it for database query results that fit naturally in key-value form, and for things that do not change often and benefit from sub-millisecond access.

Redis makes sense when you need lists, sets, sorted sets, or hashes; when you want optional persistence; when you need atomic counters for rate limiting or distributed locks; when you need pub/sub for real-time features or chat; when you want built-in clustering; or when you are building leaderboards, job queues, or caching with complex data access patterns. Lua scripting adds atomic multi-step operations without race conditions.

A Practical Decision Framework

Do you need anything beyond simple string key-value?
  YES -> Redis
  NO  -> Does memory efficiency matter more than features?
          YES -> Memcached
          NO  -> Redis (for built-in clustering and room to grow)

If you are not sure, start with Redis. The extra memory usage is usually affordable. If you later find memory is tight and profiling shows Memcached is meaningfully better, switch.

Production Failure Scenarios and Trade-off Analysis

Failure	Impact	Mitigation
Redis/Memcached OOM	Cache returns errors; application falls back to database	Monitor `used_memory/maxmemory` ratio; set alerts at 70% threshold
Redis fork for RDB save	Fork can pause the main thread briefly on large datasets; copy-on-write can push memory toward 2x under heavy writes	Schedule RDB saves during low traffic; leave memory headroom (AOF rewrites fork too)
Memcached restart	All data lost immediately (no persistence)	Design for cold cache; implement application-level cache warming
Redis replica lag	Reads from replica may return stale data	Compare each replica's offset with master_repl_offset in INFO replication; read from primary for consistency-critical data
Connection pool exhaustion	Requests timeout waiting for connection	Size connection pool appropriately; implement request queuing with timeout
Single-threaded Redis blocking	Long commands block all other commands	Avoid KEYS and SMEMBERS on large sets (use SCAN/SSCAN); split large batches into smaller pipelines
Memcached multi-thread contention	High CPU under heavy load	Scale horizontally with consistent hashing; consider Redis for complex workloads

Detailed Failure Scenarios

Case 1: Redis OOM During Peak Traffic

What happened: A Redis instance reached maxmemory during a flash sale. Eviction policy was noeviction. Redis started returning errors instead of serving requests.

Root cause: The maxmemory-policy was set to noeviction (return error on OOM) instead of allkeys-lru. Additionally, the application was not handling cache errors gracefully — it failed fast instead of falling back to the database.

Impact: 12% of requests failed during a 45-minute window. The database was under-provisioned for the fallback load and also started timing out.

Lesson learned: For cache workloads, use an eviction policy that lets Redis keep serving, such as allkeys-lru. Reserve noeviction for data you cannot afford to lose. Implement circuit breakers so the application falls back to the database gracefully when cache errors spike. Monitor evicted_keys and used_memory metrics.

Case 2: Memcached Restart Storm

What happened: A Memcached node was restarted for a configuration update. Within 90 seconds, the database was overwhelmed by cold-cache requests from all application servers simultaneously.

Root cause: No cache warming strategy. All 50 application instances found the restarted node empty and requested the same popular keys from the database at the same moment. Nothing coalesced those duplicate requests, so the database absorbed every one.

Impact: Average response time jumped from 15ms to 8,400ms. Database CPU hit 100%. The restart took 3 minutes longer than expected because the database was too overloaded to respond to health checks.

Lesson learned: Warm a replacement node before putting it back into rotation. Use consistent hashing with virtual nodes so a restart only empties that node's share of the keyspace instead of reshuffling everything. Coalesce concurrent misses for the same key (with a lock or single-flight), and consider a local L1 cache (in-memory LRU) in front of Memcached to absorb cold-start load.

Case 3: Keyspace Scan During Peak Traffic

What happened: A developer ran redis-cli --bigkeys on a production Redis instance during peak hours to find memory-heavy keys.

Root cause: --bigkeys walks the entire keyspace with SCAN and sends a size command (STRLEN, LLEN, SCARD, and so on) for every key it finds. Each call is cheap. On a 50GB instance with millions of keys, though, the run added a steady stream of commands that competed with application traffic for Redis's single command thread.

Impact: P99 latency spiked from 5ms to 12,000ms. The application saw 200+ connection timeouts. The on-call engineer spent 45 minutes diagnosing why Redis was suddenly unresponsive.

Lesson learned: Do not run keyspace-wide introspection (--bigkeys, --memkeys, MEMORY USAGE across many keys, KEYS *) against a busy production primary. If you must, run it against a replica or in a maintenance window, and throttle it: redis-cli --bigkeys -i 0.1 sleeps 0.1 seconds every 100 SCAN calls. For memory analysis, use Redis MEMORY STATS and INFO memory instead.

Common Pitfalls and Anti-Patterns

1. Using KEYS Command in Production

The KEYS command scans all keys and blocks Redis. Never use it in production.

# BAD: KEYS blocks Redis for seconds
all_keys = redis.keys("user:*")

# GOOD: Use SCAN for production
cursor = 0
while True:
    cursor, keys = redis.scan(cursor, match="user:*", count=100)
    process(keys)
    if cursor == 0:
        break

2. Storing Large Values Without Compression

Large values cost memory, network bandwidth, and time on Redis's single command thread whenever they are read or written.

# BAD: Storing large uncompressed data
redis.set("page:123", large_html_content)  # 500KB+ per page

# GOOD: Compress large values
import zlib
compressed = zlib.compress(large_html_content.encode())
redis.setex("page:123", 3600, compressed)
html = zlib.decompress(redis.get("page:123")).decode()  # Needs a client without decode_responses=True
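Compressing every value wastes CPU on small ones, and readers need to know whether to decompress. One common approach is to compress only above a size threshold and tag the value; Memcached clients often use the item flags field for the same purpose. This is a minimal sketch; the one-byte header format and the 1 KB threshold are assumptions, not a Redis convention.

```python
import zlib

RAW = b"\x00"   # header byte: value stored as-is
ZLIB = b"\x01"  # header byte: value is zlib-compressed


def encode_value(data: bytes, threshold: int = 1024) -> bytes:
    """Compress values at or above threshold bytes, but only if that actually saves space."""
    if len(data) >= threshold:
        compressed = zlib.compress(data)
        if len(compressed) < len(data):
            return ZLIB + compressed
    return RAW + data


def decode_value(blob: bytes) -> bytes:
    """Strip the header byte and decompress if needed."""
    header, body = blob[:1], blob[1:]
    if header == ZLIB:
        return zlib.decompress(body)
    if header == RAW:
        return body
    raise ValueError(f"unknown value header {header!r}")
```

Usage: `redis.setex("page:123", 3600, encode_value(large_html_content.encode()))`, then `decode_value(redis.get("page:123")).decode()` on read.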

3. Not Using Connection Pooling

Each operation creating a new connection adds overhead.

# BAD: New connection each time
def get_user(user_id):
    r = redis.Redis(host='localhost', port=6379)  # Connection every call
    return r.get(f"user:{user_id}")

# GOOD: Create one pool at startup and reuse it
import redis

pool = redis.ConnectionPool(host='localhost', port=6379, max_connections=50)

def get_user(user_id):
    r = redis.Redis(connection_pool=pool)
    return r.get(f"user:{user_id}")

4. Ignoring Memcached Persistence Limitations

Memcached has no persistence. Data is lost on restart.

# BAD: Assuming Memcached persists data
memcached.set("session:123", session_data)
# ... server restarts ...
session = memcached.get("session:123")  # None - data gone

# GOOD: Design for cold start
session = memcached.get("session:123")
if not session:
    session = load_from_database()  # Always have fallback
    memcached.set("session:123", session, time=3600)

5. Using Redis Single Instance for Everything

Redis single-threaded nature means CPU-bound operations block everything.

# BAD: CPU-heavy operation in Redis
# This blocks all other commands
redis.sort("large-list")  # O(N log N) - blocks Redis

# GOOD: Move CPU work to application
data = redis.lrange("large-list", 0, -1)  # Still O(N); page through very large lists in chunks
sorted_data = sorted(data)  # Application handles sorting

Quick Recap

Redis offers data structures (lists, sets, hashes, sorted sets) that Memcached cannot match. Memcached is more memory-efficient for simple strings. Redis persistence (RDB/AOF) survives restarts; Memcached does not. Redis Cluster provides built-in sharding; Memcached requires client-side sharding. Memcached only evicts by LRU, while Redis adds LFU, TTL-based, and random policies. Redis's single-threaded execution makes each command atomic, which removes a class of race conditions, but CPU-heavy commands block everything.

Best Practices Summary

Redis: reuse a shared connection pool (never a new connection per request); set maxmemory and an eviction policy (allkeys-lru suits most caching workloads); never run KEYS, SMEMBERS, or SORT on large collections (use SCAN and SSCAN instead); use pipelining or MGET/MSET for batch operations; watch the slow log (slowlog-log-slower-than 10000, i.e. 10ms, is the default threshold); use hashes for objects whose fields you read or update individually; set reasonable TTLs; disable dangerous commands in production with ACLs or rename-command FLUSHDB ""; monitor memory fragmentation (a mem_fragmentation_ratio persistently above 1.5 indicates problems); use Lua scripts for atomic multi-step operations.

Memcached: shard with a consistent-hashing client using 150-200 virtual nodes per physical node; use consistent key naming with service prefixes (users:123, sessions:abc); serialize compactly with MessagePack or Protocol Buffers instead of JSON (often 20-30% smaller payloads, depending on the data); tune the slab growth factor (-f, default 1.25) if your value sizes waste space in the default chunk classes (the 1MB default is the slab page size and maximum item size, -I, not the chunk size); monitor evictions, since high rates indicate the cache is too small or TTLs are misconfigured; keep UDP disabled (the default since 1.5.6, after UDP was abused for amplification attacks) and use multi-get over TCP for batch reads.

Observability Checklist

Security Checklist

Enable authentication — Redis 6+ supports ACLs. Memcached supports SASL authentication. Never run without auth in production.
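Bind to internal interfaces only: Redis's bind directive takes interface addresses such as bind 127.0.0.1 10.0.0.5, not CIDR ranges like 10.0.0.0/8 (use firewall rules for subnet-level restrictions); Memcached uses -l for the same purpose. No public IP exposure.
Encrypt in transit: use TLS whenever traffic crosses network boundaries. Redis 6+ has native TLS support; Memcached supports TLS when compiled with it enabled.
Limit commands: disable or rename dangerous commands in redis.conf, one directive per line (rename-command FLUSHDB "", rename-command CONFIG "", rename-command KEYS ""). On Redis 6+, ACL rules that deny these commands are the cleaner option.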
Set maxmemory — Prevent cache from consuming all available RAM and causing system instability.
Use firewall rules — Restrict access to cache ports (6379 for Redis, 11211 for Memcached) to application servers only.

Metrics to Track

Redis:

# Core metrics via Redis INFO
INFO memory  # used_memory, maxmemory, mem_fragmentation_ratio
INFO stats   # total_commands_processed, keyspace_hits, keyspace_misses, evicted_keys, expired_keys
INFO replication  # role, master_link_status, master_repl_offset, per-replica offset and lag (slaveN lines)
INFO clients  # connected_clients, blocked_clients

# Calculate hit rate
# hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses)

Memcached:

# Stats command
stats
# Items: curr_items, total_items, evictions
# Memory: bytes, limit_maxbytes
# Hit rate: get_hits, get_misses

# Calculate hit rate
# hit_rate = get_hits / (get_hits + get_misses)
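Both formulas are the same calculation over different field names. Here is a small Python sketch with a zero-traffic guard. It assumes the Redis stats come from redis-py's `client.info("stats")` dict. It also assumes the Memcached stats arrive as a mapping with string keys; some clients return bytes keys, which you would convert first. The helper names are mine.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there has been no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0


def redis_hit_rate(stats: dict) -> float:
    # Expects the dict redis-py returns from client.info("stats")
    return hit_rate(int(stats["keyspace_hits"]), int(stats["keyspace_misses"]))


def memcached_hit_rate(stats: dict) -> float:
    # Expects Memcached "stats" fields keyed by str; values may arrive as strings
    return hit_rate(int(stats["get_hits"]), int(stats["get_misses"]))
```

These counters are cumulative since the server started. For a current hit rate, sample twice and pass the deltas, for example `hit_rate(h2 - h1, m2 - m1)`.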

Logs to Capture

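import time

import structlog

logger = structlog.get_logger()

class CacheMetrics:
    def __init__(self, redis_client):
        self.redis = redis_client

    def timed_get(self, key):
        # Measure client-observed latency around the actual cache call
        start = time.perf_counter()
        value = self.redis.get(key)
        latency_ms = (time.perf_counter() - start) * 1000
        self.track_operation("get", key, hit=value is not None, latency_ms=latency_ms)
        return value

    def track_operation(self, operation, key, hit, latency_ms):
        logger.info("cache_operation",
            operation=operation,
            key=key,
            cache_hit=hit,
            latency_ms=round(latency_ms, 3)
        )

    def log_memory_pressure(self, threshold=0.8):
        info = self.redis.info('memory')
        used = info['used_memory']
        maxmem = info['maxmemory']  # 0 means no maxmemory limit is configured

        if maxmem > 0 and used / maxmem > threshold:
            logger.warning("cache_memory_critical",
                used_mb=used / 1024 / 1024,
                max_mb=maxmem / 1024 / 1024,
                fragmentation=info.get('mem_fragmentation_ratio', 1))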

Interview Questions

1. Redis uses more memory per key than Memcached for simple strings. How would you optimize a Redis deployment for a memory-constrained environment?

Memcached wins on raw memory efficiency for simple strings because its per-item overhead is smaller. For Redis in memory-constrained environments, group small objects into hashes: HSET user:123 name Alice email alice@example.com stores all fields under one key, avoiding the per-key overhead you would pay with one top-level key per field. Keep those hashes within the compact-encoding thresholds (hash-max-listpack-entries and hash-max-listpack-value in Redis 7, hash-max-ziplist-* in older versions) so they stay in the memory-efficient listpack/ziplist encoding. Set maxmemory conservatively with maxmemory-policy allkeys-lru, and use MEMORY USAGE (or redis-cli --bigkeys) to find large keys. For pure string caching where memory is critical, Memcached remains the pragmatic choice.

2. Your Redis instance shows high CPU usage despite moderate request rates. What is likely happening?

Redis executes commands on a single thread, so one long-running command blocks everything behind it. SLOWLOG GET 10 returns the ten most recent commands that exceeded slowlog-log-slower-than (10ms by default). Common culprits: SORT on large collections, KEYS pattern scans (never use in production), SMEMBERS on large sets, ZRANGEBYSCORE on large sorted sets without LIMIT, heavy Lua scripts, or FLUSHDB during peak traffic (FLUSHDB ASYNC frees memory in the background). For complex operations on large datasets, move the work to the application: fetch the raw data incrementally and process it there. Also check fork latency if you use RDB snapshots or AOF rewrites. fork() has to copy the process's page tables, so it takes longer as the dataset grows and stalls the main thread while it runs; INFO stats reports the last one as latest_fork_usec.

3. What are the trade-offs between Redis RDB snapshots and AOF persistence for a caching workload?

RDB snapshots are point-in-time dumps: compact and fast to restore, but you lose the writes made since the last snapshot if the instance crashes. AOF logs every write operation: better durability and a configurable fsync policy, but larger files, more disk I/O, and usually slower restarts because the log has to be replayed. For a pure cache where the database is the source of truth, RDB is usually sufficient, or persistence can be disabled entirely: if Redis restarts with an empty cache, the application repopulates from the database. Enable AOF only when losing the writes since the last snapshot is unacceptable for the cached data. The appendfsync everysec setting is a good balance: in the worst case you lose about one second of writes, and it is much faster than appendfsync always.

4. Memcached is multi-threaded but you observe high CPU and low throughput. What is happening?

Modern Memcached releases do not serialize every operation behind one global cache lock. They use fine-grained item locks striped over the hash table plus per-LRU locks. High CPU with low throughput therefore usually points to one of three causes: contention on a handful of very hot keys, more worker threads (-t) than available cores, or per-request network and syscall overhead dominating when clients send many tiny gets and sets one at a time. Workarounds: batch reads with multi-get to cut round trips, match -t to the cores you actually have, spread or locally cache hot keys, and partition keys across more Memcached instances to reduce per-instance contention. Switching to Redis avoids lock contention because commands execute on one thread, but that same design caps each instance at what a single core can process. Profile with the stats command alongside OS-level tools such as perf.

5. How would you design a rate limiter using Redis? What are the trade-offs of different approaches (token bucket vs sliding window vs fixed window)?

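The simplest Redis rate limiter is a fixed window counter. INCR a key such as rate_limit:{client}:{window}, where window is the current timestamp rounded down to the interval, and EXPIRE it after one window; if the count exceeds the limit, reject the request. A sliding window log stores request timestamps as scores in a sorted set: ZREMRANGEBYSCORE drops entries older than the window, ZCARD counts the rest, and ZADD records the new request. A token bucket stores the current token count and the last refill time, refills tokens in proportion to elapsed time, and spends one per request. Trade-offs: fixed window is cheapest but allows up to twice the limit in a burst straddling a window boundary; a sliding window log is the most accurate but costs memory and work proportional to the requests in the window; a token bucket permits controlled bursts up to the bucket size while enforcing a steady average rate. For distributed rate limiting, Redis atomic operations are essential: wrap each check-and-update in a Lua script so no other client can interleave between the read and the write.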
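A token bucket differs from an INCR/EXPIRE counter because it tracks two pieces of state: tokens remaining and the time of the last refill. Here is a minimal in-process Python sketch of that logic, with an injectable clock so it can be tested. The class name and parameters are my own. In Redis, the same two fields would typically live in a hash per client and be updated inside a Lua script so that refill, check, and spend happen atomically; that wiring is omitted here.

```python
import time


class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.clock = clock
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = max(0.0, now - self.last)  # guard against a clock stepping backwards
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a distributed setup, take `now` from one source (for example, Redis's TIME command inside the script) rather than from each application server, so clock skew between hosts does not mint extra tokens.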

6. A Redis replica falls behind the master by 30 seconds during peak traffic. What are the risks and how do you mitigate them?

A 30-second replica lag means any read from that replica can return data up to 30 seconds stale. For many use cases this is acceptable; for leaderboards, like counts, or session data it causes visible inconsistency. Users might see outdated counts, missing likes, or stale leaderboard positions, or fail to see their own writes. Mitigation: on the primary, compare master_repl_offset with each replica's offset in INFO replication (the slaveN lines); on replicas, watch master_link_status and master_link_down_since_seconds. If lag is caused by network saturation or primary load, fix the root cause first. For read-heavy workloads that can tolerate some staleness, apply a lag threshold in your application: read from the primary if lag exceeds your SLA. For consistency-critical reads (like financial data), always read from the primary. If a replica cannot keep up, it is better to fail or reroute the read than to serve stale data; with replica-serve-stale-data no, a replica rejects most commands while its link to the primary is down.
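INFO replication measures lag as a byte offset, not in seconds. On the primary, master_repl_offset is how far the replication stream has advanced, and each slaveN entry reports the offset that replica has acknowledged. The `lag` field in that entry counts seconds since the replica's last acknowledgement, which is not the same thing as data staleness. Below is a small sketch of the "read from the primary if lag exceeds your SLA" rule. It assumes the dict shape redis-py produces when parsing INFO replication on the primary; the function names and the byte threshold are illustrative.

```python
def replica_offset_lag(replication_info: dict, replica: str = "slave0") -> int:
    """Bytes of replication stream the replica has not yet acknowledged, or -1 if unusable.

    Assumed shape: {"master_repl_offset": 5000,
                    "slave0": {"state": "online", "offset": 4200, "lag": 0, ...}}
    """
    entry = replication_info.get(replica)
    if not entry or entry.get("state") != "online":
        return -1  # replica missing or not streaming
    return int(replication_info["master_repl_offset"]) - int(entry["offset"])


def choose_read_target(replication_info: dict, max_lag_bytes: int,
                       replica: str = "slave0") -> str:
    """Route reads to the primary when the replica is unusable or too far behind."""
    lag = replica_offset_lag(replication_info, replica)
    if lag < 0 or lag > max_lag_bytes:
        return "primary"
    return "replica"
```

Poll this periodically rather than per request; INFO is cheap, but not free.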

7. What is the thundering herd problem and how does it affect both Redis and Memcached? How would you prevent it?

The thundering herd problem occurs when a popular cache key expires or a cache node comes back empty, and many requests miss at once and hit the database to rebuild the same value. Both Redis and Memcached suffer from this because they are typically used as shared caches with no built-in protection. Prevention strategies: (1) probabilistic early expiration (XFetch): each reader may refresh the value before it expires, with a probability that rises as expiry approaches and as recomputation gets more expensive, so typically one request refreshes early; (2) distributed locks: only the request holding the lock refreshes the cache, while others wait and retry or serve the stale value; (3) cache warming: proactively populate the cache before expected traffic spikes; (4) request coalescing: if multiple requests arrive for the same key, batch them into one database query and return the result to all. For Memcached specifically, a local in-process L1 cache in front of it absorbs much of the herd because hot keys stay in process memory.
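The XFetch rule fits in one line. Refresh early if `now - delta * beta * ln(u) >= expiry`, where `delta` is how long the last recomputation took, `beta` tunes eagerness (default 1), and `u` is uniform on (0, 1]. Because `ln(u)` is negative, the left side is "now, pushed forward by a random amount that is usually small". Here is a Python sketch under two assumptions: you store the expiry timestamp and the recompute time alongside the cached value, and `rand` is injectable for testing. The function name is mine.

```python
import math
import random
import time


def should_refresh_early(expiry_ts, recompute_secs, beta=1.0, now=None, rand=random.random):
    """XFetch-style probabilistic early expiration check.

    expiry_ts:      absolute time the cached value expires (stored next to the value)
    recompute_secs: how long the last recomputation took (delta)
    beta:           > 1 refreshes earlier, < 1 later
    """
    now = time.time() if now is None else now
    u = 1.0 - rand()  # map [0, 1) to (0, 1] so log() is defined
    # -log(u) >= 0, so this shifts "now" forward by a random, usually small, amount
    return now - recompute_secs * beta * math.log(u) >= expiry_ts
```

Each reader calls this on a cache hit. Far from expiry it almost always returns False. Close to expiry, and more so for values that are slow to rebuild, one reader tends to win the draw and refresh before the key actually expires.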

8. Compare Redis Cluster hashing with Memcached consistent hashing. What are the trade-offs?

Redis Cluster uses hash slots: 16,384 slots, with a key's slot calculated as CRC16(key) % 16384. When a key contains a {hash tag}, only the tag is hashed, so related keys can share a slot. Each primary owns a subset of slots. When you add a node, you move slots to it (redis-cli --cluster reshard or rebalance); going from N to N+1 primaries moves roughly 1/(N+1) of the keys. Migration happens online. Clients receive ASK redirects for keys in a slot being migrated, and multi-key commands on that slot can fail with TRYAGAIN until it finishes. Failover is automatic; slot rebalancing is operator-triggered. Memcached servers know nothing about each other: clients shard with consistent hashing over virtual nodes (typically 150-200 per physical node). When you add or remove nodes, only about K/N keys are remapped, where K is total keys and N is nodes. That is a similar migration cost to Redis Cluster's, except the remapped keys simply become cache misses rather than migrated data. Key differences: Redis Cluster requires at least 3 primaries and cluster-aware clients, while Memcached sharding is just hash ring membership in the client. For Redis: use for complex workloads needing replication, multiple data types, and built-in HA. For Memcached: use when you need simple, stateless sharding and can manage failover at the application layer.
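To make the slot calculation concrete, here is a small Python sketch of the mapping described in the Redis Cluster specification: CRC16 in its XMODEM variant (polynomial 0x1021, initial value 0), taken modulo 16,384, with hash-tag handling. The function names are mine. Against a live cluster, `CLUSTER KEYSLOT <key>` asks the server directly.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the bytes between the braces
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384
```

Hash tags are how multi-key commands stay legal in a cluster. `{user1000}.following` and `{user1000}.followers` land in the same slot, so MGET or a Lua script can touch both.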

9. How do you choose between Redis Strings and Redis Hashes for storing object data? When does each perform better?

Strings store serialized objects (JSON, msgpack): a single string key holds the entire object. Hashes store field-value pairs under one key. The fields live inside that key's own structure (a compact listpack for small hashes, a hash table for large ones), not as separate top-level keys. Choose Strings when the entire object is always read or written as a whole, when you store pre-serialized data from external systems, or when the value is a single counter or blob you manipulate with INCR or APPEND. Choose Hashes when you frequently read or write individual fields (HGET, HSET, HINCRBY), when you want to avoid re-serializing the whole object for a one-field update, or when you need per-field expiration (HEXPIRE, Redis 7.4+). Memory: small hashes (within hash-max-listpack-entries and hash-max-listpack-value) use the compact listpack encoding and are very memory-efficient. Once a hash outgrows those limits, Redis converts it to a hash table where every field pays per-entry overhead, so a large object with many fields can take more memory as a hash than as JSON in a string. Benchmark your specific access patterns. Rule of thumb: if you access < 50% of fields at a time, hashes usually win.

10. Your application uses both Redis and Memcached. When would you use each? Design a caching architecture that uses both effectively.

A common pattern is an L1 local in-memory cache in each process, then a shared cache tier, then the database, with Memcached and Redis side by side in the shared tier, each holding the data it suits best. Memcached handles simple string caching for page fragments, rendered HTML, and API responses that benefit from its memory efficiency. Redis handles complex data (sorted sets for leaderboards, lists for queues, hashes for user objects), session storage with TTL, pub/sub for real-time features, and rate limiting with atomic operations. Concretely: use Memcached for cached results that are simple key-value at the page level (e.g., product:123 → HTML fragment). Use Redis for anything requiring data structures (like sets for "users who liked this post"), session data with TTL, distributed locks (SET with NX and PX), rate limiting counters, and pub/sub channels. The architectural principle: Memcached is a dumb, fast cache for immutable or rarely changed data. Redis is a data store that happens to cache well. Start with Redis for everything; add Memcached only when memory is demonstrably constrained and profiling shows Memcached is meaningfully more efficient for specific workloads.

11. When would you choose Memcached over Redis?

Choose Memcached when you need pure string key-value caching and memory efficiency is critical. Memcached wins if three things hold: your data fits naturally as key-value pairs; you need nothing beyond simple incr/decr counters, so no data structures, server-side scripting, or persistence; and your team prefers operational simplicity. It is also the right choice when horizontal scaling via client-side consistent hashing is acceptable and you do not need built-in clustering. For everything beyond simple string caching (sorted sets, hashes, pub/sub, Lua scripting, or persistence), use Redis.

12. How does Redis handle cache stampede prevention?

Redis has no built-in stampede protection, but its primitives make the standard patterns straightforward. A distributed lock via SET key token NX PX 10000 lets only one request refresh a hot key while others wait or serve the stale value; release it with a Lua script that deletes the key only if it still holds your token. Probabilistic early expiration (XFetch) refreshes keys shortly before they expire, with a probability that rises as expiry approaches, preventing mass expiration events. Application-level patterns like merging concurrent requests for the same key into a single database query also help. Memcached has no built-in stampede prevention either. Its atomic add command can serve as a simple lock, and a local in-process L1 cache in front of Memcached absorbs hot-key access spikes.

13. What are the trade-offs between write-through and write-behind caching?

Write-through writes synchronously to both cache and database — strong consistency, simple reads, but higher write latency and potential write amplification. Write-behind writes to cache only and async flushes to database — fast writes, handles spikes, but risks data loss if cache fails before database write and requires additional infrastructure (write queue, retry logic). Cache-aside (lazy loading) is the most common pattern: writes go directly to database, cache is invalidated on write; reads populate cache on miss. Best for read-heavy workloads where the database is the source of truth.
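The three write paths differ mainly in which store is written first and when. Here is a deliberately small Python sketch in which plain dicts stand in for the cache (Redis or Memcached) and the database, so only the ordering is visible. The class names are mine. WriteBehind's in-memory list stands in for the durable queue a real implementation would need.

```python
class CacheAside:
    """Reads fill the cache on a miss; writes go to the database and invalidate."""

    def __init__(self, cache, db):
        self.cache, self.db = cache, db

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # populate on miss
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # invalidate; the next read repopulates


class WriteThrough(CacheAside):
    def write(self, key, value):
        self.db[key] = value     # source of truth first
        self.cache[key] = value  # then update the cache synchronously


class WriteBehind(CacheAside):
    def __init__(self, cache, db):
        super().__init__(cache, db)
        self.pending = []  # stands in for a durable write queue

    def write(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))  # lost if the process dies before flush()

    def flush(self):
        while self.pending:
            key, value = self.pending.pop(0)
            self.db[key] = value
```

The gap between `write()` and `flush()` in WriteBehind is exactly the data-loss window described above. In CacheAside, a concurrent read between the database write and the invalidation can still briefly serve the old value.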

14. How do you estimate cache capacity for a given workload?

Estimate by measuring your working set size: the number of unique keys accessed within a typical traffic window multiplied by the average key-plus-value size, plus per-key overhead. For Redis, budget roughly 72 bytes of per-key overhead (more for keys with a TTL, and it varies by encoding and version). For Memcached, budget roughly 50 bytes of item header plus slab-class rounding. Verify with MEMORY USAGE on sample keys in Redis; for Memcached, stats shows bytes, curr_items, and evictions. Target a cache size where your working set fits with 20-30% headroom for traffic spikes: size for peak + 20%, not average load. Monitor eviction rates; evictions > 1% of requests indicate the cache is too small or TTLs are misconfigured.
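The estimate is simple arithmetic, so here it is as a Python sketch. The per-key overhead is a parameter because the figures above are rough and version-dependent; the example plugs in the Redis figure from the text. Slab rounding in Memcached is not modeled, and the helper names are mine.

```python
def estimate_cache_bytes(unique_keys, avg_key_bytes, avg_value_bytes,
                         per_key_overhead_bytes, headroom=0.25):
    """Working-set size plus headroom for spikes."""
    per_key = avg_key_bytes + avg_value_bytes + per_key_overhead_bytes
    return int(unique_keys * per_key * (1 + headroom))


def nodes_needed(total_bytes, node_maxmemory_bytes, minimum=1):
    """Ceiling division without floats; use minimum=3 for a Redis Cluster."""
    return max(minimum, -(-total_bytes // node_maxmemory_bytes))


# 10M keys, 30-byte keys, 500-byte values, ~72 bytes Redis overhead, 25% headroom
total = estimate_cache_bytes(10_000_000, 30, 500, 72, headroom=0.25)
per_node = 4 * 1024**3  # assumed 4 GiB maxmemory per node
```

That works out to about 7.5 GB. Two 4 GiB nodes would hold it, but a Redis Cluster still needs three primaries.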

15. What monitoring metrics matter most for a Redis or Memcached deployment?

For Redis: used_memory/maxmemory ratio (alert at 70%), keyspace_hits/keyspace_misses for hit rate, evicted_keys and expired_keys for eviction pressure, the gap between master_repl_offset and each replica's acknowledged offset for replica lag, mem_fragmentation_ratio for memory fragmentation, slowlog for commands > 10ms, and connected_clients for connection pool pressure. For Memcached: get_hits/get_misses for hit rate, curr_items and evictions for cache pressure, bytes/limit_maxbytes for memory usage, and OS-level CPU profiling when you suspect lock contention.

16. How does consistent hashing help with cache sharding?

Consistent hashing distributes keys across cache nodes so that adding or removing a node remaps only about K/N keys (where K is total keys and N is nodes), minimizing cache misses during topology changes. Memcached clients use consistent hashing with virtual nodes (150-200 per physical node) for this purpose. Redis Cluster instead uses hash slots (16,384 total, calculated as CRC16(key) % 16384) and moves whole slots when nodes are added or removed. Both approaches limit the blast radius of node additions or failures. Redis Cluster tracks slot ownership and handles failover on the server side, while Memcached relies on every client implementing the same consistent hashing ring.
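Here is a minimal Python sketch of a hash ring with virtual nodes. It places 160 points per server, within the 150-200 range mentioned above, using a prefix of MD5 for spread. This is a ketama-like illustration, not the exact scheme of any particular Memcached client; the class and node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=160):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # position -> physical node
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of MD5 as an integer; MD5 is used for spread, not security
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self._owners[point]
            self._points.remove(point)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around the ring
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

Going from three nodes to four moves roughly a quarter of the keys (K/N), and every key that moves goes to the new node; nothing shuffles between the existing nodes. Removing the node restores the original mapping.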

17. What are the failure modes of a distributed cache and how do you mitigate them?

Key failure modes:

- OOM (with noeviction, or when eviction cannot free enough memory, writes fail and the application falls back to the DB): mitigate with maxmemory, maxmemory-policy allkeys-lru, and 70% memory alerts.
- Cold start (cache restarts empty, DB overwhelmed): mitigate by warming the cache before putting the node back into rotation.
- Connection pool exhaustion (requests time out): size the pool appropriately and implement request queuing with timeouts.
- Replica lag (stale reads): monitor the gap between master_repl_offset and each replica's offset, and read from the primary for consistency-critical data.
- Redis fork latency (BGSAVE and AOF rewrites fork the process, and the fork stalls the main thread for longer as memory grows): monitor latest_fork_usec, schedule snapshots during low-traffic windows, keep per-instance datasets smaller, or disable persistence for a pure cache.
- Single-threaded blocking (long commands block all others): avoid KEYS, and SMEMBERS on large sets, in production.
- Memcached contention (high CPU, low throughput under load): partition across more instances, batch with multi-get, and match worker threads to cores.

18. How does Redis LFU eviction policy work differently from LRU, and what are the specific use cases where LFU outperforms LRU?

LRU (Least Recently Used) evicts the key that has gone longest without an access. LFU (Least Frequently Used) evicts the key with the lowest access-frequency counter. The difference matters when recency is a poor predictor of reuse. Under LRU, a one-off scan or batch job that touches many cold keys makes them all look recent and can push genuinely hot keys out. Under LFU, a key must be accessed repeatedly to rank high, so one-time accesses do not displace popular data. Redis also decays LFU counters over time, so a key accessed 1,000 times last week but idle since will eventually rank below a key accessed 10 times this week. Use LFU when: your working set changes gradually (popular items stay popular), you want to preserve frequently accessed data during traffic spikes, or you need to prevent cold data from being retained by one-time access events. Redis tunes LFU with lfu-log-factor (how quickly its logarithmic 8-bit counter saturates) and lfu-decay-time (minutes per counter decrement while a key is idle). New keys start at an internal initial value, LFU_INIT_VAL, of 5. Memcached does not support LFU; it uses a segmented LRU.
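Redis's LFU counter is a single byte, so it cannot count hits directly. It increments probabilistically, with the chance of an increment shrinking as the counter grows, and it decays while the key sits idle. Here is a Python model of those two rules as I understand them from Redis's eviction source (LFULogIncr and the decay step). It is a simplified illustration with my own function names, not a drop-in reimplementation.

```python
import random

LFU_INIT_VAL = 5  # Redis starts new keys at this counter value


def lfu_log_incr(counter: int, lfu_log_factor: int = 10, rand=random.random) -> int:
    """Probabilistic logarithmic increment of an 8-bit frequency counter."""
    if counter == 255:
        return 255  # counter saturates
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * lfu_log_factor + 1)  # higher counters are harder to bump
    return counter + 1 if rand() < p else counter


def lfu_decay(counter: int, minutes_idle: int, lfu_decay_time: int = 1) -> int:
    """Subtract one per elapsed decay period; lfu_decay_time == 0 disables decay."""
    if lfu_decay_time == 0:
        return counter
    periods = minutes_idle // lfu_decay_time
    return max(counter - periods, 0)
```

The decay rule is what lets a formerly hot key lose its place. With the default lfu-decay-time of 1, a week of idleness subtracts far more than the counter can hold, so the key drops to zero.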

19. Describe the trade-offs of using Redis Pipeline versus MGET/MSET versus Lua scripts for batch operations.

Redis Pipeline batches multiple commands into a single network round trip — the client sends N commands, Redis processes them sequentially, client receives N responses. No atomicity guarantee (other commands from other clients can interleave). Best for: improving throughput on bulk reads/writes when each command is independent. MGET/MSET are native batch commands — MGET key1 key2 key3 retrieves multiple keys in one command, which is more efficient than pipelining individual GETs because Redis processes it internally as a single operation. Lua scripts are atomic — Redis executes the entire script without interleaving other commands, making them safe for read-check-write patterns. Lua scripts have startup overhead (script compilation) and cannot use blocking commands inside them. Trade-off summary: pipeline for throughput on independent ops, MGET/MSET for native batch efficiency, Lua for atomic multi-step operations.

20. Your team is considering moving from Memcached to Redis. What is your decision framework and what risks do you identify during migration?

Decision framework: start with Redis for new projects. If the team has operational experience with Memcached and the workload is purely simple string key-value, stay with Memcached. If you need data structures (sets, sorted sets, hashes, lists), atomic operations beyond simple incr/decr, pub/sub, persistence, or built-in clustering, move to Redis.

Migration risks:

- Stale or inconsistent reads if both caches run simultaneously and invalidations reach only one of them.
- A cold Redis cache at cutover that sends a burst of misses to the database.
- Increased operational complexity: Redis requires monitoring for fork times, AOF/RDB trade-offs, and memory fragmentation.
- Different concurrency behavior: Memcached's multi-threaded model scales across cores, while a Redis instance executes commands on one thread, which changes capacity planning and connection pool sizing.
- Application code changes: even moving from memcached get/set to Redis GET/SET means new client libraries, serialization, and TTL arguments, and adopting redis.hgetall or redis.lrange is a larger rewrite, not a drop-in replacement.

Mitigation: run both in parallel during the transition and invalidate both on writes, warm Redis before cutover, use feature flags to route traffic, implement thorough testing before cutting over, and plan for 2x operational monitoring during the transition period.

Conclusion

Memcached is simpler and more memory-efficient for pure string caching. Redis is more capable. For basic caching, they are comparable. But Redis’s data structures unlock patterns that would be painful or impossible with Memcached.

I default to Redis for new projects. The operational simplicity of having one system for caching, sessions, pub/sub, and rate limiting usually beats the memory efficiency gains of Memcached.

That said, if you are caching primarily string data and memory is tight, Memcached still earns its place.
