System Design: Netflix Architecture for Global Streaming
Deep dive into Netflix architecture. Learn about content delivery, CDN design, microservices, recommendation systems, and streaming protocols.
Netflix streams to over 250 million subscribers across 190 countries. The engineering challenges at that scale are nothing like what most people picture — it’s not just about pushing bytes, but about content delivery, real-time encoding, recommendation systems that drive 80% of what people watch, and a microservices platform that has to stay up when half the world is binging a new season.
This case study goes into how Netflix’s architecture actually works.
Requirements Analysis
Functional Requirements
Users need to:
- Browse and search a catalog of movies and TV shows
- Stream video content on various devices
- Create and manage profiles
- Continue watching across devices
- Rate and review content
Non-Functional Requirements
The platform must:
- Stream in 4K HDR with surround sound
- Start playback in under 2 seconds
- Handle 15+ million concurrent streams
- Maintain 99.99% availability
- Work on 1000+ device types
Capacity Estimation
| Metric | Value |
|---|---|
| Subscribers | 250 million |
| Peak concurrent streams | 15+ million |
| Content library | 50,000+ titles |
| Average stream bitrate | 8 Mbps |
| Peak bandwidth | 120 Tbps |
| CDN edge locations | 100+ |
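The peak bandwidth row follows directly from the concurrent-stream and bitrate rows. A quick check, using only the figures in the table (the function name is just for illustration):

```python
# Back-of-the-envelope check of the capacity table.
PEAK_CONCURRENT_STREAMS = 15_000_000   # from the table
AVG_STREAM_BITRATE_MBPS = 8            # from the table


def peak_bandwidth_tbps(streams: int, bitrate_mbps: float) -> float:
    """Aggregate egress in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return streams * bitrate_mbps / 1_000_000


print(peak_bandwidth_tbps(PEAK_CONCURRENT_STREAMS, AVG_STREAM_BITRATE_MBPS))  # 120.0
```

The estimate assumes every concurrent stream runs at the average bitrate, so it is a planning ceiling rather than a measurement.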
Content Delivery Architecture
Netflix’s content delivery is where most of the engineering work actually happens. Instead of relying on third-party CDNs (Akamai, Cloudflare, and the like), Netflix built their own called Open Connect — specifically designed to handle video traffic at their enormous scale.
Open Connect CDN
graph TB
A[Netflix Origin] --> B[ISP Interconnects]
B --> C[Open Connect Appliances]
C --> D[Residential Routers]
D --> E[Devices]
subgraph "ISP Network"
B
C
end
subgraph "Customer Premise"
D
end
Open Connect appliances are purpose-built servers Netflix deploys directly inside ISP data centers around the world. They cache popular content as close to users as possible, which dramatically cuts down Netflix’s bandwidth costs and keeps playback latency low.
Request Flow
sequenceDiagram
participant D as Device
participant OCA as Open Connect Appliance
participant CS as Control Plane
participant LS as License Server
D->>CS: GET /manifest.m3u8
CS->>D: Return manifest with URL to nearest OCA
D->>OCA: GET /video/segment1.ts
OCA-->>D: Video segment
D->>LS: GET /license for DRM
LS-->>D: License key
D->>OCA: GET /video/segment2.ts
OCA-->>D: Video segment
Video Encoding Pipeline
graph LR
A[Source Video] --> B[Transcode Cluster]
B --> C{HD or 4K?}
C -->|4K| D[4K Encode Farm]
C -->|HD| E[HD Encode Farm]
D --> F[Output Profiles]
E --> F
F --> G[CDN Origins]
G --> H[Edge Cache]
Netflix encodes each title in multiple resolutions and bitrates — the output of their encoding pipeline feeds directly into adaptive bitrate streaming, so the player can switch quality on the fly based on the viewer’s network conditions.
| Profile | Resolution | Bitrate | Codec |
|---|---|---|---|
| 4K HDR | 3840x2160 | 16 Mbps | H.265/VP9 |
| 1080p | 1920x1080 | 5 Mbps | H.264/H.265 |
| 720p | 1280x720 | 2.5 Mbps | H.264 |
| 480p | 854x480 | 1 Mbps | H.264 |
| Audio | Stereo/5.1/Atmos | 192-768 kbps | AAC (stereo), Dolby Digital Plus (5.1/Atmos) |
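To make the bitrate column concrete, the conversion below turns each video bitrate into data transferred per hour of playback. It is a rough sketch that treats the table's bitrates as constant and ignores audio and container overhead.

```python
# Video bitrates from the table above, in megabits per second.
LADDER_MBPS = {"4K HDR": 16, "1080p": 5, "720p": 2.5, "480p": 1}


def gb_per_hour(bitrate_mbps: float) -> float:
    """Convert a constant bitrate to decimal gigabytes transferred per hour."""
    megabits = bitrate_mbps * 3600   # one hour of playback
    return megabits / 8 / 1000       # megabits -> megabytes -> gigabytes


for profile, mbps in LADDER_MBPS.items():
    print(f"{profile}: {gb_per_hour(mbps):.2f} GB/hour")
```

At the table's 16 Mbps, an hour of 4K HDR video comes to about 7.2 GB, which is why caching popular titles close to viewers matters so much for cost.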
Streaming Protocol
Netflix serves video using adaptive bitrate streaming, adjusting quality based on network conditions.
HLS/DASH Manifest
sequenceDiagram
participant D as Device
participant CDN as CDN
participant L as License Server
Note over D: Initial playback request
D->>CDN: GET /title/1234/manifest.m3u8
CDN-->>D: M3U8 with quality levels
Note over D: Parse quality levels
D->>CDN: GET /title/1234/video_4k.m3u8
CDN-->>D: Segment list
D->>CDN: GET /title/1234/video_4k/segment1.ts
CDN-->>D: Video segment
D->>L: GET /widevine/license
L-->>D: DRM license
Note over D: Decode and display
D->>CDN: GET /title/1234/video_4k/segment2.ts
CDN-->>D: Video segment
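The "parse quality levels" step in the flow above can be sketched as follows. The playlist content is made up for illustration, and the parser handles only the `EXT-X-STREAM-INF` attributes it needs; a production player would rely on a full HLS or DASH library.

```python
import re

# A tiny, made-up master playlist; real manifests carry more attributes.
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=16000000,RESOLUTION=3840x2160,CODECS="hvc1.2.4.L150,mp4a.40.2"
video_4k.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
video_1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480,CODECS="avc1.4d401e,mp4a.40.2"
video_480p.m3u8
"""

# KEY=value pairs, where the value is either quoted or runs to the next comma.
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_variants(playlist: str) -> list:
    """Return one dict per variant stream, highest bandwidth first."""
    lines = [ln.strip() for ln in playlist.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:") and i + 1 < len(lines):
            raw_attrs = line.split(":", 1)[1]
            attrs = {k: v.strip('"') for k, v in ATTR.findall(raw_attrs)}
            variants.append({
                "uri": lines[i + 1],               # the URI follows the tag line
                "bandwidth": int(attrs["BANDWIDTH"]),
                "resolution": attrs.get("RESOLUTION"),
            })
    return sorted(variants, key=lambda v: v["bandwidth"], reverse=True)


for variant in parse_variants(MASTER):
    print(variant["resolution"], variant["bandwidth"], variant["uri"])
```

The quoted-value branch in the regular expression matters because `CODECS` values contain commas, which a naive `split(",")` would break apart.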
Adaptive Bitrate Logic
class AdaptiveBitrateController:
    # Ordered from highest to lowest quality
    QUALITY_LEVELS = ["4k", "1080p", "720p", "480p", "360p"]
    # Approximate bitrate per level in Mbps; the 360p value is an assumption
    REQUIRED_MBPS = {"4k": 16, "1080p": 5, "720p": 2.5, "480p": 1, "360p": 0.5}

    def __init__(self, bandwidth_calculator: "BandwidthCalculator",
                 initial_quality: str = "480p"):
        self.bandwidth_calculator = bandwidth_calculator
        # Start from a concrete level: "auto" is a UI setting, not a rung on the ladder
        self.current_quality = initial_quality

    def select_quality(self, buffer_level: float, throughput: float) -> str:
        # Rules-based adaptation; buffer_level in seconds, throughput in Mbps
        if buffer_level < 10:  # Buffer running low
            self.current_quality = self._downgrade()
        elif buffer_level > 60:  # Buffer healthy
            self.current_quality = self._upgrade(throughput)
        return self.current_quality

    def _downgrade(self) -> str:
        # Step one rung down, stopping at the lowest level
        current_idx = self.QUALITY_LEVELS.index(self.current_quality)
        return self.QUALITY_LEVELS[min(current_idx + 1, len(self.QUALITY_LEVELS) - 1)]

    def _upgrade(self, throughput: float) -> str:
        # Choose the highest quality that fits within 80% of measured throughput
        for quality in self.QUALITY_LEVELS:
            if self.REQUIRED_MBPS[quality] < throughput * 0.8:
                return quality
        return self.QUALITY_LEVELS[-1]
QUIC and HTTP/3 Deep Dive
Netflix’s streaming infrastructure increasingly relies on QUIC (Quick UDP Internet Connections), the transport protocol behind HTTP/3. Understanding QUIC matters for streaming architecture because it directly shapes how buffers behave, how connections survive network changes, and how users experience playback on spotty connections.
Why QUIC Replaces TCP for Streaming
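TCP delivers a single reliable, ordered byte stream, so one lost packet delays everything queued behind it, including data that belongs to unrelated requests. QUIC runs over UDP and still delivers every stream reliably, but it tracks ordering per stream, so loss on one stream does not hold up the others. It also folds the TLS 1.3 handshake into connection setup, which shortens the time before the first request can be sent.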
graph TB
subgraph "TCP Limitations"
A[TCP Handshake] --> B[TLS Handshake]
B --> C[3-RTT Total]
C --> D[Head-of-line blocking]
D --> E[Connection migration impossible]
end
subgraph "QUIC Advantages"
F[QUIC Handshake] --> G[Combined crypto]
G --> H[1-RTT or 0-RTT]
H --> I[Stream multiplexing]
I --> J[Connection migration works]
end
Key QUIC benefits for streaming:
| Feature | TCP | QUIC |
|---|---|---|
| Connection establishment | 1-3 RTTs | 0-1 RTT |
| Head-of-line blocking | Yes | No (stream-level) |
| Connection migration | Breaks | Preserves |
| Packet loss impact | All streams | Single stream |
| TLS handshake | Separate | Combined |
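The connection-establishment row translates directly into startup delay. The sketch below multiplies handshake round trips by an assumed round-trip time; the 80 ms figure is illustrative, not a measurement.

```python
def time_to_first_request_ms(rtt_ms: float, handshake_rtts: int) -> float:
    """Delay before the first HTTP request can be sent, ignoring server processing."""
    return rtt_ms * handshake_rtts


RTT_MS = 80  # assumed mobile round-trip time, purely illustrative

SCENARIOS = {
    "TCP + TLS 1.2": 3,            # 1 RTT for TCP, 2 for TLS
    "TCP + TLS 1.3": 2,            # 1 RTT for TCP, 1 for TLS
    "QUIC (new connection)": 1,    # transport and crypto handshakes combined
    "QUIC (0-RTT resumption)": 0,  # request travels with the first flight
}

for name, rtts in SCENARIOS.items():
    print(f"{name}: {time_to_first_request_ms(RTT_MS, rtts):.0f} ms")
```

On high-latency links the difference between three round trips and one is a noticeable share of a two-second startup budget.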
Stream Multiplexing in QUIC
TCP’s head-of-line blocking problem means if packet N is lost, all HTTP/2 streams stall waiting for retransmission. QUIC fixes this at the transport layer:
class QUICStreamMultiplexing:
"""Streams are independent in QUIC"""
def __init__(self, connection: QUICConnection):
self.connection = connection
self.streams = {} # Stream ID -> Stream state
def send_on_stream(self, stream_id: int, data: bytes):
# Packet loss on stream 5 does NOT block stream 7
self.connection.send_stream_data(stream_id, data)
def handle_packet_loss(self, stream_id: int):
# Only this stream recovers, others continue
self.streams[stream_id].retransmit()
For Netflix, this means audio and video streams recover independently. A lost video packet does not delay audio data that has already arrived, so the player keeps more usable data in its buffer than it would if both streams stalled behind a single retransmission.
Connection Migration
Users switch between WiFi and cellular constantly. With TCP, this breaks connections and forces reconnection. QUIC’s connection migration preserves the logical session across network changes:
class ConnectionMigration:
"""QUIC connection survives network changes"""
def on_network_change(self, new_endpoint: tuple):
# Same connection ID, new network path
self.connection.migrate(new_endpoint)
# All stream state preserved
# Playback continues without rebuffering
assert self.streams_all_alive()
Netflix leverages this for mobile users moving between networks. The playback session continues seamlessly rather than stalling during reconnection.
0-RTT Resumption
QUIC’s 0-RTT mode lets a client that has connected to a server before reuse saved session parameters and send application data in its first flight, without waiting for the handshake to complete. For streaming:
class ZeroRTTStreaming:
"""Send first video request immediately"""
async def start_playback(self, title_id: str):
# 0-RTT: send immediately, handshake in background
# First segments start downloading while TLS completes
self.transport.send_stream_data(
stream_id=1,
data=f"GET /video/{title_id}/segment1",
flush=True
)
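This cuts startup latency on repeat connections, because the first request goes out before the handshake finishes. 0-RTT data can be replayed by an attacker, so it should carry only idempotent requests such as segment fetches.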
QUIC Loss Detection
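QUIC implements its own loss detection (RFC 9002), combining a packet-reordering threshold with a time threshold derived from the measured round-trip time:
import time
from typing import Optional

class QUICLossDetection:
    """Simplified sketch of RFC 9002 loss detection, not a full implementation"""
    PACKET_THRESHOLD = 3    # kPacketThreshold: reordering tolerance in packets
    TIME_THRESHOLD = 9 / 8  # kTimeThreshold: multiplier applied to the RTT

    def __init__(self, smoothed_rtt: float, latest_rtt: float):
        self.smoothed_rtt = smoothed_rtt  # seconds
        self.latest_rtt = latest_rtt      # seconds

    def is_lost(self, packet_number: int, sent_time: float,
                largest_acked: int, now: Optional[float] = None) -> bool:
        # Only unacknowledged packets sent before the largest acked packet qualify
        if packet_number >= largest_acked:
            return False
        now = time.monotonic() if now is None else now
        loss_delay = self.TIME_THRESHOLD * max(self.smoothed_rtt, self.latest_rtt)
        # Lost if enough later packets were acked OR it has been outstanding too long
        return (largest_acked - packet_number >= self.PACKET_THRESHOLD
                or now - sent_time >= loss_delay)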
Microservices Architecture
Netflix decomposed their monolith into hundreds of microservices.
Service Decomposition
graph TB
subgraph "Edge Layer"
GW[API Gateway]
GS[Gateway Service]
end
subgraph "Backend"
M[Metadata Service]
R[Recommendation Engine]
P[Playback Service]
S[Search Service]
A[Auth Service]
end
subgraph "Data"
EV[EVCache]
DB[(Cassandra)]
ES[Elasticsearch]
end
GW --> GS
GS --> M
GS --> R
GS --> P
GS --> S
GS --> A
M --> DB
M --> EV
R --> EV
S --> ES
Key Microservices
| Service | Responsibility | Data Store |
|---|---|---|
| API Gateway | Request routing, aggregation | None |
| Metadata | Titles, episodes, images | Cassandra |
| Playback | Streaming session management | EVCache |
| Recommendations | Personalized suggestions | EVCache |
| Search | Full-text search | Elasticsearch |
| User Profile | Account, profiles, settings | Cassandra |
| Billing | Subscriptions, payments | PostgreSQL |
API Gateway
public class ApiGatewayApplication {
public static void main(String[] args) {
// Illustrative pseudocode: the helper methods and ZuulRouteBuilder below are placeholders, not Zuul's actual API
addRequestThreadFilters();
addResponseFilters();
// Route definitions
configureRoutes(new ZuulRouteBuilder()
.route("/api/v1/metadata/**", "metadata-service")
.route("/api/v1/playback/**", "playback-service")
.route("/api/v1/recommendations/**", "recommendation-service")
.route("/api/v1/search/**", "search-service")
);
}
}
API Design
Streaming Endpoints
| Endpoint | Description |
|---|---|
| GET /api/v1/browse | Get personalized rows |
| GET /api/v1/titles/{id}/metadata | Get title details |
| GET /api/v1/titles/{id}/similars | Similar titles |
| POST /api/v1/playback/session | Initialize playback |
| POST /api/v1/playback/position | Update position |
| GET /api/v1/search?q={query} | Search titles |
Response Example
{
"data": {
"id": "81239481",
"title": "Stranger Things",
"type": "show",
"poster_url": "https://cdn.netflix.com/poster.jpg",
"backdrop_url": "https://cdn.netflix.com/backdrop.jpg",
"rating": "TV-14",
"year": 2016,
"duration": "4 seasons",
"synopsis": "When a young boy vanishes...",
"genres": ["Drama", "Horror", "Sci-Fi"],
"seasons": [
{
"season_num": 1,
"episodes": [
{
"num": 1,
"title": "The Vanishing",
"duration_secs": 3600,
"thumbnail": "https://cdn.netflix.com/s1e1.jpg"
}
]
}
]
},
"meta": {
"request_id": "abc123",
"version": "2.1"
}
}
Recommendation System
Netflix’s recommendation engine drives about 80% of what people watch. If recommendations were poor, members would discover far less of the existing catalog, and Netflix would likely need to commission considerably more content to hold the same level of engagement.
Recommendation Pipeline
graph LR
A[User Events] --> B[Event Pipeline]
B --> C[Feature Store]
C --> D{Ranking Models}
D --> E[Personalized Ranking]
D --> F[Similar Titles]
D --> G[Top Picks]
E --> H[API Response]
F --> H
G --> H
Ranking Model Features
class RankingFeatures:
def __init__(self, user_id: int, title_id: int):
self.user_features = self._get_user_features(user_id)
self.title_features = self._get_title_features(title_id)
self.context_features = self._get_context_features()
def compute_features(self) -> Dict[str, float]:
return {
# User attributes
"user_age_days": self.user_features.age_days,
"user_avg_watch_time": self.user_features.avg_watch_time,
"user_rating_avg": self.user_features.avg_rating,
"user_genre_preferences": self.user_features.genre_scores,
# Title attributes
"title_popularity_score": self.title_features.popularity,
"title_recency": self.title_features.release_days_ago,
"title_rating": self.title_features.avg_rating,
"title_match_score": self._genre_match(),
# Context
"time_of_day": self.context_features.hour,
"day_of_week": self.context_features.day,
"device_type": self.context_features.device
}
Ranking Service
class RecommendationService:
def __init__(self, model: RankingModel, cache: RedisCache):
self.model = model
self.cache = cache
async def get_ranked_list(
self,
user_id: int,
row_count: int = 20,
evidence_count: int = 5
) -> List[RankedTitle]:
# Check cache
cache_key = f"recs:{user_id}:{row_count}"
cached = await self.cache.get(cache_key)
if cached:
return self._deserialize(cached)
# Get candidate titles
candidates = await self._get_candidates(user_id, 500)
# Compute features for each
ranked = []
for title in candidates:
features = self._compute_features(user_id, title)
score = await self.model.predict(features)
ranked.append((score, title))
# Sort and return top N
ranked.sort(key=lambda x: x[0], reverse=True)
results = [title for _, title in ranked[:row_count]]
# Cache for 5 minutes
await self.cache.setex(cache_key, 300, self._serialize(results))
return results
Capacity Planning Deep Dive
Netflix’s capacity planning for global streaming involves predicting demand, provisioning infrastructure, and handling traffic spikes during major releases. Getting this wrong means either money wasted on idle resources or users experiencing buffering.
Traffic Modeling
Netflix builds traffic models based on historical viewing patterns, marketing signals, and regional demographics:
class TrafficModel:
def predict_demand(self, title_id: str, release_datetime: datetime) -> DemandForecast:
# Base demand from similar title releases
base_demand = self._get_baseline_demand(title_id)
# Marketing multiplier
marketing_score = self._get_marketing_score(title_id)
# Time-of-day adjustment
hour_factor = self._get_hour_distribution(release_datetime)
# Geographic distribution
regional_demand = self._get_regional_forecast(title_id)
return DemandForecast(
peak_concurrent_streams=base_demand * marketing_score * hour_factor,
total_viewers=sum(regional_demand.values()),
bandwidth_tbps=self._estimate_bandwidth(regional_demand),
regional_demand=regional_demand  # read by ReleaseProvisioning below
)
Provisioning for Major Releases
For highly anticipated releases, Netflix pre-positions infrastructure:
class ReleaseProvisioning:
def provision_for_release(self, title_id: str, release_date: datetime):
forecast = self.traffic_model.predict_demand(title_id, release_date)
# Pre-encode at all quality levels
self.encoding_queue.prioritize(title_id, priority="critical")
# Pre-position to CDN edges
for region, expected_streams in forecast.regional_demand.items():
self.cdn.pre_position(
title_id=title_id,
region=region,
expected_streams=expected_streams
)
# Scale recommendation service
self.recommendation_service.scale(
min_instances=forecast.peak_concurrent_streams / 1000
)
# License server capacity
self.license_service.scale(
target_capacity=forecast.peak_concurrent_streams * 1.5
)
Load Shedding Strategy
When demand exceeds capacity, Netflix degrades gracefully rather than letting failures cascade:
class LoadShedding:
PRIORITY_LEVELS = {
"premium_subscriber": 1,
"standard_subscriber": 2,
"basic_subscriber": 3,
"anonymous": 4
}
def should_accept_request(self, request: Request, current_load: float) -> bool:
if current_load < self.capacity_threshold:
return True
# Reject lower-priority users under load
request_priority = self.PRIORITY_LEVELS[request.subscription_tier]
threshold_priority = self._load_to_priority_threshold(current_load)
return request_priority <= threshold_priority  # lower number = higher priority
def _load_to_priority_threshold(self, load: float) -> int:
# As load increases, only accept higher-priority requests
if load > 0.95:
return 1 # Only premium
elif load > 0.85:
return 2 # Standard+
elif load > 0.70:
return 3 # Basic+
return 4 # Accept all
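Reading the thresholds the way their comments describe them (premium only above 95% load, standard and up above 85%, and so on), the admission policy reduces to a small lookup. This standalone sketch uses the same tier names and cut-offs; the function name is hypothetical.

```python
# Tiers ordered from highest to lowest priority, as in PRIORITY_LEVELS.
TIERS_BY_PRIORITY = [
    "premium_subscriber",
    "standard_subscriber",
    "basic_subscriber",
    "anonymous",
]


def admitted_tiers(load: float) -> list:
    """Tiers still served at a given load (0.0 to 1.0)."""
    if load > 0.95:
        keep = 1   # only premium
    elif load > 0.85:
        keep = 2   # standard and above
    elif load > 0.70:
        keep = 3   # basic and above
    else:
        keep = 4   # accept all
    return TIERS_BY_PRIORITY[:keep]


for load in (0.50, 0.75, 0.90, 0.97):
    print(f"{load:.0%}: {admitted_tiers(load)}")
```

Writing the policy as a table like this makes the boundary cases easy to test, which is where off-by-one comparisons tend to hide.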
Data Storage
Netflix stores different types of data differently. Metadata (titles, episodes, users) lives in Cassandra. Playback state goes to EVCache for low-latency reads. The key is matching the storage engine to how you actually access the data.
Cassandra for Metadata
CREATE TABLE titles (
title_id UUID PRIMARY KEY,
title_type TEXT, -- 'movie' or 'show'
title_name TEXT,
synopsis TEXT,
release_year INT,
duration_secs INT,
rating TEXT,
genres LIST<TEXT>,
-- Denormalized for query performance
genres_sorted SET<TEXT>,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
CREATE TABLE episodes (
show_id UUID,
season_num INT,
episode_num INT,
episode_id UUID,
title_name TEXT,
duration_secs INT,
synopsis TEXT,
PRIMARY KEY ((show_id), season_num, episode_num)
);
CREATE TABLE title_by_genre (
genre TEXT,
release_year INT,
popularity_score DOUBLE,
title_id UUID,
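-- title_id in the key keeps titles with equal score and year from overwriting each other
PRIMARY KEY ((genre), popularity_score, release_year, title_id)
) WITH CLUSTERING ORDER BY (popularity_score DESC, release_year DESC, title_id ASC);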
EVCache for Playback State
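import asyncio
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Playback state cached in EVCache.
# `evcache` and `db` are assumed async client objects defined elsewhere.
PLAYBACK_STATE_TTL = 7200  # 2 hours

@dataclass
class PlaybackState:  # assumed shape, inferred from the fields used below
    user_id: int
    title_id: int
    position: int

_pending_writes: set = set()  # holds references so background tasks are not garbage-collected

async def get_playback_state(user_id: int, title_id: int) -> Optional[PlaybackState]:
    key = f"playback:{user_id}:{title_id}"
    cached = await evcache.get(key)
    if cached:
        return PlaybackState(**json.loads(cached))
    # Cache miss: fetch from the database and repopulate the cache
    state = await db.fetch_playback_state(user_id, title_id)
    if state:
        await evcache.setex(key, PLAYBACK_STATE_TTL, json.dumps(asdict(state)))
    return state

async def save_playback_position(user_id: int, title_id: int, position: int):
    # Write to EVCache immediately
    key = f"playback:{user_id}:{title_id}"
    state = PlaybackState(user_id=user_id, title_id=title_id, position=position)
    await evcache.setex(key, PLAYBACK_STATE_TTL, json.dumps(asdict(state)))
    # Persist to Cassandra in the background
    task = asyncio.create_task(db.save_playback_state(user_id, title_id, position))
    _pending_writes.add(task)
    task.add_done_callback(_pending_writes.discard)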
Global Architecture
Multi-Region Setup
graph TB
subgraph "US-East (Primary)"
ZU[Zuul Gateway]
SVCS_E[Services]
DB_E[(Cassandra)]
end
subgraph "EU-West (Replica)"
ZU2[Zuul Gateway]
SVCS_W[Services]
DB_W[(Cassandra)]
end
subgraph "Asia-Pacific"
ZU3[Zuul Gateway]
SVCS_A[Services]
DB_A[(Cassandra)]
end
SVCS_E <--> DB_E
SVCS_W <--> DB_W
SVCS_A <--> DB_A
ZU --> SVCS_E
ZU2 --> SVCS_W
ZU3 --> SVCS_A
Traffic Routing
Netflix uses latency-based routing to direct users to the nearest region:
class LatencyRouter:
def route_request(self, user_id: int, service: str) -> str:
# Get user's last known region
user_region = self._get_user_region(user_id)
# Check if region is healthy
if self._is_region_healthy(user_region):
return f"{service}.{user_region}.netflix.com"
# Fallback to lowest latency
latencies = self._measure_all_regions(service)
best_region = min(latencies, key=latencies.get)
return f"{service}.{best_region}.netflix.com"
def _measure_all_regions(self, service: str) -> Dict[str, float]:
return {
"us-east-1": self._ping(f"{service}.us-east-1.netflix.com"),
"eu-west-1": self._ping(f"{service}.eu-west-1.netflix.com"),
"ap-northeast-1": self._ping(f"{service}.ap-northeast-1.netflix.com")
}
DRM and Entitlement Systems
Digital Rights Management (DRM) protects content from unauthorized copying. Netflix uses multiple DRM schemes for different platforms.
DRM Architecture
graph TB
D[Device] -->|1. Initialize| L[License Server]
L -->|2. Device ID + Content Key Request| E[Entitlement Service]
E -->|3. Check subscription| DB[(User DB)]
DB -->|4. Entitlement OK| E
E -->|5. Grant license| L
L -->|6. Encrypted License| D
D -->|7. Decrypt with device key| K[Content Key]
K -->|8. Play| V[Video Decrypt]
Multi-DRM Support
class DRMManager:
"""Handle multiple DRM schemes per platform"""
DRM_SCHEMES = {
"widevine": "com.widevine.alpha", # Android, Chrome, many devices
"playready": "com.microsoft.playready", # Windows, Xbox
"fairplay": "com.apple.fairplay", # iOS, Safari
"clearkey": "org.w3.clearkey" # Web fallback
}
def get_supported_drm(self, device_type: str) -> List[str]:
"""Return DRM schemes supported by device"""
capabilities = {
"android": ["widevine"],
"ios": ["fairplay"],
"web_safari": ["fairplay", "clearkey"],
"web_chrome": ["widevine", "clearkey"],
"windows": ["playready", "widevine", "clearkey"],
"smarttv": ["widevine", "playready"]
}
return capabilities.get(device_type, ["widevine"])
Entitlement Checking
class EntitlementService:
"""Verify user can access specific content"""
async def check_entitlement(
self,
user_id: int,
title_id: str,
device_type: str
) -> EntitlementResult:
# Get user's subscription tier
subscription = await self.user_service.get_subscription(user_id)
# Get title's required tier
title = await self.metadata_service.get_title(title_id)
if subscription.tier < title.required_tier:
return EntitlementResult(
allowed=False,
reason="subscription_tier_too_low",
upgrade_to=title.required_tier
)
# Check concurrent stream limit
active_streams = await self.playback_service.count_active_streams(user_id)
if active_streams >= subscription.max_streams:
return EntitlementResult(
allowed=False,
reason="max_streams_exceeded",
active_streams=active_streams
)
return EntitlementResult(allowed=True)
Resilience Patterns
Circuit Breaker
class CircuitBreaker:
def __init__(self, failure_threshold: int = 5, timeout: int = 60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failure_count = 0
self.last_failure_time = None
self.state = "closed"
async def call(self, func: Callable, *args, **kwargs):
if self.state == "open":
if time.time() - self.last_failure_time > self.timeout:
self.state = "half-open"
else:
raise CircuitOpenException()
try:
result = await func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure()
raise e
def _on_success(self):
self.failure_count = 0
self.state = "closed"
def _on_failure(self):
self.failure_count += 1
self.last_failure_time = time.time()
if self.failure_count >= self.failure_threshold:
self.state = "open"
Bulkheads
Isolate service dependencies to prevent cascading failures:
class BulkheadExecutor:
def __init__(self, max_concurrent: int = 100):
self.semaphore = asyncio.Semaphore(max_concurrent)
async def execute(self, func: Callable, *args, **kwargs):
async with self.semaphore:
return await func(*args, **kwargs)
# Different bulkheads for different service calls
metadata_bulkhead = BulkheadExecutor(max_concurrent=50)
recommendation_bulkhead = BulkheadExecutor(max_concurrent=20)
playback_bulkhead = BulkheadExecutor(max_concurrent=30)
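The effect of a bulkhead is easiest to see by counting how many calls run at once. The sketch below uses a plain `asyncio.Semaphore`, the same primitive as `BulkheadExecutor`, and records the peak concurrency; the sleep stands in for a downstream request.

```python
import asyncio


async def demo(limit: int = 2, calls: int = 6) -> int:
    """Run `calls` tasks through a semaphore of size `limit`; return peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded_call() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)   # stand-in for a downstream request
            in_flight -= 1

    await asyncio.gather(*(guarded_call() for _ in range(calls)))
    return peak


peak = asyncio.run(demo())
print(peak)  # never exceeds the limit of 2
```

Because each dependency gets its own semaphore, a slow recommendation backend can exhaust only its own slots and leaves metadata and playback calls unaffected.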
Adaptive Bitrate Streaming Deep Dive
ABR Algorithm Details
from typing import List

class AdaptiveBitrateController:
    """Illustrative adaptive bitrate selection (a simplified sketch, not Netflix's production algorithm)"""

    def __init__(self):
        # Quality levels, lowest to highest; bandwidth in Mbps
        self.levels = [
            {"name": "360p", "min_bandwidth": 0.7},
            {"name": "480p", "min_bandwidth": 1.5},
            {"name": "720p", "min_bandwidth": 3},
            {"name": "1080p", "min_bandwidth": 5},
            {"name": "4K", "min_bandwidth": 10},
        ]
        self.current_level = 0  # Index into self.levels

    def calculate_throughput(self, segments: List["Segment"]) -> float:
        """Weighted throughput in Mbps over the last four segments"""
        recent = segments[-4:]
        if not recent:
            return 0.0
        # Weight recent samples higher (oldest to newest); trim when fewer than 4 samples
        weights = [0.1, 0.15, 0.25, 0.5][-len(recent):]
        weighted_sum = sum(
            (s.size_mb * 8 / s.download_time) * w  # megabits divided by seconds
            for s, w in zip(recent, weights)
        )
        return weighted_sum / sum(weights)

    def select_quality(self, throughput: float, buffer_level: float) -> str:
        """Select a quality level from throughput (Mbps) and buffer level (seconds)"""
        state = self._classify_state(buffer_level, throughput)
        if state == "startup":
            # No measurements yet: start low so playback begins quickly
            self.current_level = 0
        elif state == "buffer_depleted":
            # Buffer running low: step down one level
            self.current_level = max(self.current_level - 1, 0)
        elif state == "healthy":
            # Buffer is full enough to try the best level throughput allows
            self.current_level = self._level_for_throughput(throughput)
        # In the "steady" state the current level is kept
        return self.levels[self.current_level]["name"]

    def _level_for_throughput(self, throughput: float) -> int:
        fitting = [i for i, lvl in enumerate(self.levels)
                   if lvl["min_bandwidth"] <= throughput]
        return fitting[-1] if fitting else 0

    def _classify_state(self, buffer_level: float, throughput: float) -> str:
        if throughput <= 0:
            return "startup"
        if buffer_level < 10:
            return "buffer_depleted"
        if buffer_level < 60:
            return "steady"
        return "healthy"
CDN Cache Invalidation
graph TB
A[Content Update] --> B{New encode or metadata change?}
B -->|Metadata| C[Update metadata service]
B -->|New encode| D[Push to origin]
D --> E[Invalidate edge caches]
E --> F[CDN propagates in < 30s]
subgraph "Cache Invalidation Strategy"
C
E
end
class CDNInvalidationService:
"""Handle content cache invalidation across CDN"""
async def invalidate_title(self, title_id: str, reason: str):
"""Invalidate all cached data for a title"""
# Invalidate manifest files
await self.cdn.invalidate(f"/title/{title_id}/*.m3u8")
# Invalidate metadata
await self.cdn.invalidate(f"/api/metadata/{title_id}")
# Invalidate thumbnails
await self.cdn.invalidate(f"/images/{title_id}/*")
# Log invalidation for audit
await self.audit_log.record({
"event": "cache_invalidation",
"title_id": title_id,
"reason": reason,
"timestamp": datetime.utcnow()
})
Multi-Device Session Management
Users watch Netflix on multiple devices. Sessions must sync playback position and handle concurrent playback limits.
Session State
class PlaybackSession:
"""Represent an active playback session"""
def __init__(
self,
session_id: str,
user_id: int,
title_id: str,
device_type: str,
position_seconds: int,
quality: str
):
self.session_id = session_id
self.user_id = user_id
self.title_id = title_id
self.device_type = device_type
self.position_seconds = position_seconds
self.quality = quality
self.started_at = datetime.utcnow()
self.last_heartbeat = datetime.utcnow()
async def update_position(self, position_seconds: int):
"""Update playback position"""
self.position_seconds = position_seconds
self.last_heartbeat = datetime.utcnow()
# Persist to storage
await self.playback_store.save_position(
self.session_id,
position_seconds
)
# Invalidate device position cache
await self.cache.delete(f"position:{self.user_id}:{self.title_id}")
Continue Watching Sync
class ContinueWatchingService:
"""Sync playback position across devices"""
async def get_resume_position(
self,
user_id: int,
title_id: str,
requesting_device: str
) -> ResumePosition:
# Check if another device has more recent position
active_sessions = await self.session_manager.get_active_sessions(
user_id,
exclude_device=requesting_device
)
# Find session with this title
for session in active_sessions:
if session.title_id == title_id:
return ResumePosition(
position_seconds=session.position_seconds,
device=session.device_type,
updated_at=session.last_heartbeat
)
# Fallback to database
return await self.playback_store.get_position(user_id, title_id)
async def handle_playback_start(
self,
user_id: int,
title_id: str,
device_type: str
) -> PlaybackSession:
# Create new session
session = PlaybackSession(
session_id=str(uuid4()),
user_id=user_id,
title_id=title_id,
device_type=device_type,
position_seconds=0,
quality="auto"
)
# Register session
await self.session_manager.register(session)
# Enforce concurrent stream limit
await self._enforce_stream_limit(user_id)
return session
async def _enforce_stream_limit(self, user_id: int):
"""Ensure user hasn't exceeded stream limit"""
subscription = await self.user_service.get_subscription(user_id)
active = await self.session_manager.count_active(user_id)
if active > subscription.max_streams:
# Force oldest session to stop
oldest = await self.session_manager.get_oldest_session(user_id)
await self.session_manager.terminate(oldest.session_id)
Trade-off Analysis
| Factor | Custom CDN (Open Connect) | Third-Party CDN | Hybrid Approach |
|---|---|---|---|
| Cost at scale | Lower (owned hardware) | Higher (per TB) | Medium |
| Control | Full | Limited | Partial |
| Time to deploy | Long (build your own) | Fast (existing) | Medium |
| Optimization | Streaming-specific | General purpose | Mixed |
| Operational burden | High (you manage) | Low (managed) | Medium |
Production Failure Scenarios
| Failure Scenario | Impact | Mitigation |
|---|---|---|
| CDN origin failure | Video segments unavailable | Multi-CDN; fallback to direct streaming |
| License server down | No new streams can start | Cache licenses; graceful degradation |
| Recommendation service slow | Homepage takes longer to load | Cache recommendations; show stale content |
| Encoding pipeline backlog | New content delayed | Priority encoding; capacity headroom |
| Account exceeds concurrent stream limit | New streams rejected | Clear error message; upgrade prompt |
Scenario Drills
Scenario 1: CDN Origin Server Failure
Situation: The CDN origin serving video segments goes down during peak viewing hours.
Analysis:
- Devices requesting segments get errors
- Playback stalls, rebuffering begins
- Millions of concurrent streams affected
Solution: Multi-CDN deployment with automatic failover. If one CDN has issues, traffic routes to another. Open Connect has multiple origin clusters geographically distributed. Devices can switch CDN transparently if segment requests fail.
Scenario 2: New Show Releases Simultaneously Worldwide
Situation: A highly anticipated show releases globally at midnight UTC. 10 million users try to start playback simultaneously.
Analysis:
- Encoding pipeline must complete all quality levels before release
- CDN edge caches start cold for a new title
- License servers receive burst of requests
Solution: Pre-encode content days before release. Pre-position popular titles at ISP locations. License server scales horizontally; licenses are cached for their validity period to reduce server load.
Scenario 3: ABR Algorithm Causes Quality Flutter
Situation: Users report constantly changing video quality, creating a jarring viewing experience.
Analysis:
- ABR switches quality too frequently
- Bandwidth measurements fluctuate (wireless networks)
- Buffer thresholds trigger rapid up/down switching
Solution: Implement stability windows. Once you switch to a quality level, stay there for at least 30 seconds. Use weighted throughput averages that emphasize recent samples. Build in safety margins (use 70-80% of measured throughput for decisions).
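A stability window can be sketched in a few lines. The version below is a minimal illustration, not a production algorithm: the ladder reuses the bitrates from the encoding table, the class and parameter names are hypothetical, and a low buffer is allowed to override the hold so a struggling stream can still step down.

```python
# (name, required Mbps), lowest to highest; bitrates from the encoding table
LADDER = [("480p", 1.0), ("720p", 2.5), ("1080p", 5.0), ("4K", 16.0)]


class StableSelector:
    def __init__(self, hold_seconds: float = 30.0, safety: float = 0.7):
        self.hold_seconds = hold_seconds
        self.safety = safety                 # fraction of measured throughput to trust
        self.current = 0                     # index into LADDER, start at the lowest rung
        self.last_switch = float("-inf")     # no switch has happened yet

    def _target(self, throughput_mbps: float) -> int:
        budget = throughput_mbps * self.safety
        fitting = [i for i, (_, mbps) in enumerate(LADDER) if mbps <= budget]
        return fitting[-1] if fitting else 0

    def select(self, throughput_mbps: float, buffer_s: float, now: float) -> str:
        target = self._target(throughput_mbps)
        # A nearly empty buffer may always trigger a downswitch
        emergency = buffer_s < 10 and target < self.current
        window_open = now - self.last_switch >= self.hold_seconds
        if target != self.current and (emergency or window_open):
            self.current = target
            self.last_switch = now
        return LADDER[self.current][0]


selector = StableSelector()
print(selector.select(10.0, buffer_s=30, now=0))   # first decision, window is open
print(selector.select(3.0, buffer_s=30, now=5))    # throughput dipped, but hold applies
print(selector.select(3.0, buffer_s=5, now=10))    # buffer is low, downswitch allowed
```

Passing `now` in explicitly keeps the selector deterministic, which makes the hold behaviour straightforward to unit test.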
Failure Flow Diagrams
Stream Playback Initialization
flowchart TD
A[User Clicks Play] --> B[Get Manifest from CDN]
B --> C[Parse Quality Levels]
C --> D[Select Initial Quality]
D --> E[Request Video Segment]
E --> F{Segment Available?}
F -->|No| G[Try Alternative CDN]
G --> E2[Request Video Segment]
E2 --> F2{Segment Available?}
F2 -->|Yes| H[Download Segment]
F2 -->|No| G
F -->|Yes| H
H --> I[Request DRM License]
I --> J[License Server]
J --> K{Entitled?}
K -->|No| L[Return Error]
K -->|Yes| M[Return License Key]
M --> N[Decrypt Segment]
N --> O[Decode and Display]
O --> P[Buffer Next Segment]
P --> E
ABR Quality Selection
flowchart TD
A[Monitor Buffer Level] --> B{Buffer < 10s?}
B -->|Yes| C[Select Lower Quality]
B -->|No| D{Buffer > 60s?}
D -->|Yes| E[Select Higher Quality]
D -->|No| F[Maintain Current Quality]
C --> G[Download at New Quality]
E --> H[Download at New Quality]
F --> I[Continue Current Quality]
G --> J[Deliver to Player]
H --> J
I --> J
J --> A
Multi-CDN Failover
flowchart TD
A[Request Segment] --> B[Primary CDN]
B --> C{Response OK?}
C -->|Yes| D[Deliver to Device]
C -->|No| E[Try Secondary CDN]
E --> F{Response OK?}
F -->|Yes| G[Deliver to Device]
F -->|No| H[Fallback to Origin Direct]
H --> I{Origin Available?}
I -->|Yes| J[Deliver to Device]
I -->|No| K[Show Error]
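The failover chain in the diagram amounts to trying an ordered list of sources until one responds. The sketch below keeps the transport abstract: `fetch` is a caller-supplied function (an assumption for illustration), so the same logic works with any HTTP client, and the source names are placeholders.

```python
from typing import Callable, Optional, Sequence, Tuple


def fetch_with_failover(
    path: str,
    sources: Sequence[str],
    fetch: Callable[[str, str], Optional[bytes]],
) -> Tuple[str, bytes]:
    """Try each source in order; return (source, body) from the first that responds."""
    errors = []
    for source in sources:
        try:
            body = fetch(source, path)
        except OSError as exc:              # network-level failure
            errors.append(f"{source}: {exc}")
            continue
        if body is not None:
            return source, body
        errors.append(f"{source}: no content")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def fake_fetch(source: str, path: str) -> Optional[bytes]:
    """Stand-in transport: primary times out, secondary has nothing, origin works."""
    if source == "primary":
        raise ConnectionError("timed out")
    if source == "secondary":
        return None
    return b"segment-bytes"


result = fetch_with_failover(
    "/video/segment1.ts", ["primary", "secondary", "origin"], fake_fetch
)
print(result)  # ('origin', b'segment-bytes')
```

Collecting the per-source errors before raising gives the player something useful to log when it finally has to show an error.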
Common Pitfalls / Anti-Patterns
Pitfall 1: Aggressive Quality Switching
Problem: Switching quality too frequently creates a “flutter” effect that’s visually jarring.
Solution: Implement quality stability windows. Once you switch up or down, stay at that level for at least 30 seconds.
Pitfall 2: Ignoring Network Variability
Problem: Using average throughput misses spikes and drops.
Solution: Use weighted average that emphasizes recent samples. Build in safety margins (use 70-80% of measured throughput for decisions).
Pitfall 3: Not Testing on Real Networks
Problem: Lab testing does not capture real-world variability (WiFi interference, cellular handoffs).
Solution: A/B test ABR algorithms on real users. Monitor quality distributions and rebuffer ratios in production.
Real-world Failure Scenarios
Scenario 1: Netflix Open Connect Appliance Failure
What happened: During peak viewing hours, multiple Netflix Open Connect Appliances (OCAs) deployed at an ISP failed simultaneously due to a firmware bug, causing video buffering for thousands of subscribers in that region.
Root cause: A firmware update pushed to OCAs contained a memory leak that was only triggered under sustained high-throughput conditions. The bug was not caught in staging because test environments never replicated the sustained load of peak hours.
Impact: Subscribers experienced persistent buffering and playback failures. Support tickets flooded in from the affected region, and Netflix’s NOC team took over 3 hours to identify the root cause as a firmware issue rather than a network connectivity problem.
Lesson learned: Test firmware updates under production-representative sustained load conditions. Implement progressive rollout with automatic rollback if error rates spike. Build observability into the OCA management plane to detect anomalies faster.
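A progressive rollout with automatic rollback can be sketched as a loop over widening stages. Everything here is a hypothetical illustration: `progressive_rollout`, the stage fractions, and the `error_rate_at` callback are assumptions. For a bug like the one in this scenario, the callback would need to measure errors after a soak period under sustained load, not right after deployment.

```python
from typing import Callable, Sequence, Tuple


def progressive_rollout(
    stages: Sequence[float],
    error_rate_at: Callable[[float], float],
    max_error_rate: float = 0.01,
) -> Tuple[str, float]:
    """Widen a rollout stage by stage and stop as soon as errors spike.

    Returns ("completed", last_stage) or ("rolled_back", failing_stage).
    """
    reached = 0.0
    for fraction in stages:
        # Deploy to `fraction` of the fleet, then observe the error rate.
        observed = error_rate_at(fraction)
        if observed > max_error_rate:
            return "rolled_back", fraction
        reached = fraction
    return "completed", reached
```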
Scenario 2: Cassandra Batch Statement Timeout Cascade
What happened: A change in Netflix’s Cassandra batch statement handling caused cascading timeouts across the playback history service, resulting in incomplete viewing history for users and data loss during the incident window.
Root cause: The migration from individual statements to batch statements unintentionally increased query execution time. Under peak load, query latency exceeded the application timeout threshold, triggering retries that compounded the load on the already-stressed cluster.
Impact: Users who watched content during the incident window had gaps in their viewing history. The issue affected the “Continue Watching” feature, which relies on accurate playback history, causing frustration for impacted subscribers.
Lesson learned: Always benchmark query performance changes at production scale before deploying schema or query pattern changes. Set conservative timeout values that account for cluster stress conditions. Implement circuit breakers for database calls to prevent cascade failures.
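The circuit breaker mentioned in this lesson can be sketched as follows. This is a generic, simplified version of the pattern, not the implementation Netflix uses; the class name, thresholds, and injectable `clock` parameter are assumptions for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after   # cool-down in seconds
        self.clock = clock               # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast so retries do not pile onto a stressed cluster.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each database call in `breaker.call(...)` means that once timeouts accumulate, callers get an immediate error instead of adding retry load to the cluster.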
Scenario 3: Zuul Filter Regression Causing Misrouted Requests
What happened: A regression introduced in a Zuul filter update caused incorrect routing for a subset of API requests, resulting in users receiving content intended for other accounts.
Root cause: The filter’s route lookup logic had a subtle regression that only manifested when the number of available backend instances crossed a specific threshold. The condition was not covered by existing integration tests.
Impact: A small number of users briefly saw content belonging to other accounts — a serious data confidentiality incident. Netflix’s incident response team immediately rolled back the filter and initiated a security investigation.
Lesson learned: Implement integration tests that cover edge cases in routing logic, including boundary conditions. Deploy routing filters with canary analysis that monitors error rates and data integrity metrics. Ensure all routing changes undergo security review.
Quick Recap
- Netflix’s Open Connect CDN places servers at ISP locations worldwide.
- Adaptive bitrate streaming selects quality based on bandwidth and buffer health.
- DRM (Widevine, PlayReady, FairPlay) protects content on each platform.
- Entitlement service enforces subscription tier and concurrent stream limits.
- Session sync allows “continue watching” across devices.
Copy/Paste Checklist
- [ ] Implement ABR with buffer-aware quality selection
- [ ] Use multi-CDN with failover
- [ ] Cache CDN responses aggressively
- [ ] Enforce concurrent stream limits per subscription
- [ ] Implement session sync for continue watching
- [ ] Monitor stream startup time and rebuffer ratio
- [ ] Test ABR on real networks, not just labs
Observability Checklist
Metrics to Capture
- stream_startup_time_seconds (histogram) - Time to first frame
- bitrate_selected (histogram) - Quality distribution
- rebuffer_ratio (gauge) - Time spent rebuffering vs playing
- cdn_cache_hit_ratio (gauge) - Cache efficiency
- license_request_latency_ms (histogram) - DRM overhead
- concurrent_streams (gauge) - Active stream count
Alerts to Configure
| Alert | Threshold | Severity |
|---|---|---|
| Startup time P99 > 3s | 3000ms | Warning |
| Rebuffer ratio > 5% | 5% | Warning |
| CDN cache hit < 90% | 90% | Warning |
| License latency P99 > 200ms | 200ms | Critical |
| Active streams < expected | < 50% baseline | Warning |
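The thresholds in this table can be checked against a metrics snapshot with a few comparisons. The sketch below is illustrative: the snapshot field names, the alert identifiers, and the nearest-rank percentile method are assumptions, and the rebuffer ratio is computed here as stall time divided by total session time. In practice these rules would usually live in a monitoring system rather than application code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample (0 < pct <= 100)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_alerts(snapshot: Dict) -> List[Tuple[str, str]]:
    """Return (alert_name, severity) pairs for every threshold that is breached."""
    fired: List[Tuple[str, str]] = []
    if percentile(snapshot["startup_ms"], 99) > 3000:
        fired.append(("startup_time_p99", "warning"))
    total = snapshot["stall_seconds"] + snapshot["play_seconds"]
    if total and snapshot["stall_seconds"] / total > 0.05:
        fired.append(("rebuffer_ratio", "warning"))
    requests = snapshot["cache_requests"]
    if requests and snapshot["cache_hits"] / requests < 0.90:
        fired.append(("cdn_cache_hit", "warning"))
    if percentile(snapshot["license_ms"], 99) > 200:
        fired.append(("license_latency_p99", "critical"))
    if snapshot["active_streams"] < 0.5 * snapshot["baseline_streams"]:
        fired.append(("active_streams_low", "warning"))
    return fired
```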
Security Checklist
- DRM encryption for all premium content
- Device attestation before issuing licenses
- HDCP enforcement for high-definition outputs
- Screen capture detection and blocking
- Concurrent stream enforcement
- Geographic restrictions per title
- Secure token exchange for session management
- Content signing to prevent tampering
Interview Questions
Netflix streams billions of hours monthly. At that scale, commercial CDN costs become prohibitive. Open Connect appliances are custom-built for Netflix's workload (video streaming, not general web content). They are deployed at ISP facilities worldwide, placing content close to users while reducing Netflix's backbone costs. The economics only work at Netflix's scale.
Netflix encodes each title in multiple quality levels (4K, 1080p, 720p, etc.). The device downloads an HLS/DASH manifest listing available quality levels. The client measures download speed and buffer fullness. If buffer runs low, it switches to lower quality. If buffer is healthy and bandwidth is high, it switches up. This happens every few seconds during playback.
The entitlement service tracks active playback sessions per user. When a user starts playback, it counts existing sessions. If the count exceeds the subscription tier limit (e.g., 4 streams for premium), the oldest session is terminated. The device receives an error with an upgrade prompt.
These are DRM schemes for different platforms. Widevine (Google) runs on Android, Chrome, most smart TVs. PlayReady (Microsoft) runs on Windows, Xbox, some smart TVs. FairPlay (Apple) runs on iOS, Safari, Apple TV. Netflix negotiates with the device to select the highest security level the device supports. Content is encrypted once and licenses are delivered via the device's preferred DRM.
You're watching a show on your laptop, switch to your phone, and it resumes from roughly where you left off. This seems simple but requires coordination.
Netflix creates a playback session when you start watching. As you watch, position updates go to EVCache for low-latency access and asynchronously persist to Cassandra. When you open Netflix on another device, the system looks for your active sessions across all devices and resumes from whichever has the most recent position.
The subtlety: this means your phone might be 30 seconds ahead of where you actually left off on the laptop, but that's usually fine. Users expect continuity, not precision.
EVCache is Netflix's memcached wrapper — a distributed in-memory cache tuned for their workload. It sits in front of Cassandra for reads, giving sub-millisecond latency for hot data. The write path goes to Cassandra directly; EVCache gets populated on read.
What's interesting is what they cache: playback sessions, recommendations for active users, and anything else that's read frequently but doesn't change constantly. The cache TTLs are short by design — freshness matters more than cache hit rates for this data.
Netflix doesn't stream one video file — they encode each title in dozens of quality variants (different resolutions, bitrates, codecs). A new release might get encoded in 4K HDR, 1080p, 720p, 480p, and lower, with both H.264 and H.265 codecs. That's dozens of files per title.
The pipeline prioritizes based on predicted popularity. A show expected to be popular gets encoded faster and pre-positioned to CDN edges before release. This is why Netflix can release a new season globally at midnight and have it streaming immediately — the encoding work is done in advance.
Netflix built Open Connect because at their scale, commercial CDN pricing would have been astronomical. Owning the hardware means lower per-TB costs at massive volume. They can also optimize specifically for video streaming — the hardware, the software, the ISP partnerships, all tuned for their workload.
The cost is operational complexity. Netflix has to build, deploy, and maintain servers in ISP facilities worldwide. That's a huge engineering investment. For most companies, a third-party CDN like Cloudflare or Fastly is the right call — you get global reach without building your own infrastructure.
A hybrid approach (some self-hosted capacity, some third-party) is common among companies operating at a similar scale, but it is not the route Netflix chose.
New users have no viewing history, so collaborative filtering can't work yet. Netflix falls back to contextual signals: what device you're on, what time of day, your geographic region. They also use demographic-based recommendations — "users in your age group and region also watched X."
The personalization ramps up as you watch more. The first few recommendations are generic; by the time you've watched a dozen titles, Netflix has enough signal to make meaningful personalized suggestions.
What's worth noting: Netflix's recommendations are famously driven by their ML models, but the cold-start problem means new users see more editorial and popularity-based picks until the models have enough data.
At millions of queries per second across global regions, a traditional sharded MySQL setup hits limits: cross-shard queries become expensive, adding capacity means resharding, and failover is complicated. Cassandra avoids these problems by being designed for horizontal scale from the start.
The data model maps naturally to Cassandra's partition keys. "Give me all episodes for show X" is a partition lookup, not a scan. "Give me titles sorted by popularity within drama" uses Cassandra's clustering columns. The query patterns align with how the data is stored.
Cassandra's tunable consistency also matters. Netflix can choose eventual consistency for reads where freshness doesn't matter (view counts, recommendations) and stronger consistency where it does (billing, entitlements).
Quality flutter happens when the ABR algorithm switches quality too frequently, creating a jarring viewing experience. Netflix addresses this through several mechanisms:
- Stability windows: Once you switch to a quality level, stay there for at least 30 seconds before considering another switch.
- Weighted throughput averaging: Recent samples are weighted higher than older ones to respond quickly to actual changes while avoiding noise from temporary fluctuations.
- Safety margins: Use 70-80% of measured throughput for quality decisions to account for variability in network conditions.
- State-based decisions: Different logic for startup vs. steady-state vs. buffer-depleted scenarios.
Key components for multi-CDN failover:
- Health monitoring: Each CDN reports latency, error rates, and throughput continuously.
- Failover triggers: Automatic switch when primary CDN returns errors, latency exceeds threshold, or success rate drops below SLA.
- Client-side logic: Devices retry with exponential backoff, then switch to next CDN in priority order.
- Consistent hashing: Ensure segments are requested from the same CDN to avoid requesting duplicate data from multiple sources.
- Origin fallback: Last resort is direct origin streaming if all CDNs fail.
Preparation starts days before release:
- Priority encoding queue: Predict popularity based on marketing signals, pre-encode high-priority titles first.
- All quality levels up front: Encode 4K, 1080p, 720p, 480p, etc., with both H.264 and H.265 codecs before release.
- CDN pre-positioning: Push encoded content to ISP-level Open Connect appliances before midnight UTC.
- Capacity headroom: Encoding farms scale horizontally, allocate extra capacity for expected burst.
- License server scaling: Pre-scale license server fleet to handle 10x normal request volume.
Geographic restrictions are enforced at multiple layers:
- Entitlement service: Before issuing a DRM license, checks if the user's account has rights for the requested geographic region.
- Content metadata: Each title has geographic availability windows tied to licensing agreements.
- Device IP routing: Latency-based routing considers region; requests for restricted content return error before segment delivery.
- CDN-level blocking: Some CDNs can enforce geographic restrictions at edge nodes before requests hit origin.
Netflix uses multi-armed bandit approaches and A/B testing:
- Exploitation: Show titles similar to what you've watched (high confidence predictions).
- Exploration: Occasionally show new or unpredictable titles to gather data on user preferences.
- Epsilon-greedy: Most of the time use best predictions, but occasionally random selection.
- Thompson sampling: Bayesian approach that balances exploration more intelligently.
- Contextual bandits: Consider time of day, device type, and other context when deciding what to explore.
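Of the strategies listed, epsilon-greedy is the simplest to show in code. This is a textbook sketch, not Netflix's recommender: the class name, the arm labels, and the reward signal (for example, 1.0 when a suggested title gets played) are assumptions.

```python
import random


class EpsilonGreedy:
    def __init__(self, arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm

    def select(self):
        """Explore with probability epsilon, otherwise exploit the best arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, arm, reward):
        """Fold an observed reward into the arm's running mean."""
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n
```

Thompson sampling and contextual bandits replace the fixed `epsilon` with exploration that adapts to uncertainty and context, at the cost of more state per arm.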
The session sync system requires:
- Active session tracking: When playback starts, create a session in EVCache with device ID, timestamp, and position.
- Cross-device position queries: When opening the app, query all active sessions for this user, find most recent position across any device.
- Async Cassandra persistence: Position updates go to Cassandra in background for durability.
- Conflict resolution: If two devices report positions within seconds of each other, use the later timestamp.
- Session expiration: Sessions expire after heartbeat timeout (e.g., 30 minutes of no activity).
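The lookup and conflict-resolution rules above can be sketched with an in-memory list standing in for the cache. The `PlaybackSession` fields and the `resume_position` function are hypothetical names; the 30-minute expiry and the "later timestamp wins" rule come from the list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlaybackSession:
    device_id: str
    title_id: str
    position_seconds: float
    updated_at: float  # epoch seconds of the last heartbeat


def resume_position(
    sessions: Iterable[PlaybackSession],
    title_id: str,
    now: float,
    ttl_seconds: float = 30 * 60,
) -> Optional[float]:
    """Return the position from the most recently updated live session."""
    live = [
        s for s in sessions
        if s.title_id == title_id and now - s.updated_at <= ttl_seconds
    ]
    if not live:
        return None  # no live session; a caller might fall back to durable storage
    # Conflict resolution: the later timestamp wins.
    return max(live, key=lambda s: s.updated_at).position_seconds
```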
The enforcement process:
- Stream counting: Before starting a new session, count active sessions for this user.
- Limit checking: Compare count against subscription tier limit (Basic=1, Standard=2, Premium=4).
- Oldest termination: If at limit, find the session with the earliest start time (or oldest heartbeat).
- Graceful notification: Send termination message to the oldest device showing an error with upgrade prompt.
- State cleanup: Remove the terminated session from both EVCache and Cassandra.
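The enforcement steps above can be sketched as a single function over an in-memory session list. The function name and the session dictionary shape are assumptions; a real service would run this against shared state and would need to handle concurrent session starts.

```python
from typing import Dict, List, Tuple

Session = Dict[str, object]  # expects at least "device_id" and "started_at"


def admit_session(
    active: List[Session], new_session: Session, limit: int
) -> Tuple[List[Session], List[Session]]:
    """Admit a new session, evicting the oldest ones if the limit is reached.

    Returns (sessions_after_admission, evicted_sessions).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    sessions = list(active)
    evicted: List[Session] = []
    # Stream counting and limit checking: make room for the new session.
    while len(sessions) >= limit:
        oldest = min(sessions, key=lambda s: s["started_at"])
        sessions.remove(oldest)   # state cleanup
        evicted.append(oldest)    # caller notifies these devices
    sessions.append(new_session)
    return sessions, evicted
```

The caller would send the termination message and upgrade prompt to each device in the evicted list.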
Key streaming health metrics:
- Stream startup time: Time from click to first frame. Target P99 < 3 seconds.
- Rebuffer ratio: Time spent waiting vs. time playing. Target < 5%.
- Bitrate distribution: Histogram of quality levels selected. Higher is better but must not cause rebuffering.
- CDN cache hit ratio: Target > 90% at edge caches.
- License request latency: DRM overhead. Target P99 < 200ms.
- Error rates: By error type (network, DRM, device).
- Concurrent streams gauge: Actual vs. expected for capacity planning.
Detection mechanisms:
- Location tracking: If two sessions have IP geolocations that are physically impossible to reach from one user (e.g., different continents within minutes), flag the account.
- Device fingerprinting: Track device characteristics to identify if multiple devices are being used by different households.
- Behavioral patterns: Different viewing patterns (different times of day, different genres) might indicate shared accounts.
- Explicit enforcement: When stream limit is hit, require email/SMS verification to confirm the primary user.
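The location check in the first bullet is often called impossible-travel detection. A minimal sketch follows, assuming each session carries a geolocated latitude, longitude, and timestamp; the 900 km/h speed ceiling and the 50 km noise floor are illustrative values, not thresholds from the source.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_impossible_travel(a, b, max_speed_kmh=900.0, min_distance_km=50.0):
    """a and b are (lat, lon, epoch_seconds) tuples for two sessions."""
    distance = haversine_km(a[0], a[1], b[0], b[1])
    if distance < min_distance_km:
        return False  # within IP-geolocation noise; not suspicious
    hours = abs(b[2] - a[2]) / 3600
    if hours == 0:
        return True   # far apart at the same instant
    return distance / hours > max_speed_kmh
```

IP geolocation is imprecise and VPNs distort it, so a result like this is better treated as one signal for flagging than as proof of sharing.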
Open Connect is Netflix's custom CDN hardware:
- Purpose-built hardware: Servers optimized for video streaming workloads, not general web content.
- ISP co-location: Appliances live in ISP data centers, directly connected to ISP networks.
- Benefits: Reduces Netflix's backbone traffic (users get content from local ISP), lowers latency for users, reduces costs at massive scale.
- Content placement: Popular content is pre-positioned based on viewing patterns. New releases are pushed before release date.
- ISP benefits: Reduced upstream bandwidth costs, better quality for their subscribers.
Further Reading
- CDN Deep Dive — Content delivery network design
- NoSQL Databases — Cassandra and distributed data stores
- Distributed Caching — EVCache patterns
- Microservices Architecture — Service decomposition patterns
- Circuit Breaker Pattern — Resilience patterns
For more on CDN design, see our CDN Deep Dive guide. For database strategies, see NoSQL Databases. For caching patterns, see Distributed Caching.
Conclusion
Netflix’s architecture handles global streaming at scale:
- Open Connect CDN places content at ISP facilities worldwide
- Adaptive bitrate streaming optimizes quality for each connection
- Microservices enable independent scaling and deployment
- Recommendation algorithms drive content discovery
- Multi-region deployment for global reach