System Design: Netflix Architecture for Global Streaming

Deep dive into Netflix architecture. Learn about content delivery, CDN design, microservices, recommendation systems, and streaming protocols.

published: March 22, 2026 reading time: 41 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Netflix streams to 250M+ subscribers using a custom CDN that puts servers inside ISP networks, cutting backbone costs while keeping playback fast. Adaptive bitrate streaming shifts quality based on network conditions and buffer levels, with QUIC replacing TCP for faster handshakes and seamless network transitions on mobile. The backend runs on microservices for independent scaling, Cassandra for metadata, EVCache for playback state, and recommendation algorithms that drive roughly 80% of what viewers watch.

Netflix streams to over 250 million subscribers across 190 countries. The engineering challenges at that scale are nothing like what most people picture — it’s not just about pushing bytes, but about content delivery, real-time encoding, recommendation systems that drive 80% of what people watch, and a microservices platform that has to stay up when half the world is binging a new season.

This case study goes into how Netflix’s architecture actually works.

Requirements Analysis

Functional Requirements

Users need to:

Browse and search a catalog of movies and TV shows
Stream video content on various devices
Create and manage profiles
Continue watching across devices
Rate and review content

Non-Functional Requirements

The platform must:

Stream in 4K HDR with surround sound
Start playback in under 2 seconds
Handle 15+ million concurrent streams
Maintain 99.99% availability
Work on 1000+ device types

Capacity Estimation

Metric	Value
Subscribers	250 million
Peak concurrent streams	15+ million
Content library	50,000+ titles
Average stream bitrate	8 Mbps
Peak bandwidth	120 Tbps
CDN edge locations	100+
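
The peak bandwidth row follows directly from the two rows above it. A quick check, assuming every concurrent stream runs at the average bitrate (a simplification; `peak_bandwidth_tbps` is a name chosen here for illustration):

```python
def peak_bandwidth_tbps(concurrent_streams: int, avg_bitrate_mbps: float) -> float:
    """Aggregate egress in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return concurrent_streams * avg_bitrate_mbps / 1_000_000


print(peak_bandwidth_tbps(15_000_000, 8))  # 120.0
```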

Content Delivery Architecture

Netflix’s content delivery is where most of the engineering work actually happens. Instead of relying on third-party CDNs (Akamai, Cloudflare, and the like), Netflix built their own called Open Connect — specifically designed to handle video traffic at their enormous scale.

Open Connect CDN

graph TB
    A[Netflix Origin] --> B[ISP Interconnects]
    B --> C[Open Connect Appliances]
    C --> D[Residential Routers]
    D --> E[Devices]

    subgraph "ISP Network"
        B
        C
    end

    subgraph "Customer Premise"
        D
        E
    end

Open Connect appliances are purpose-built servers Netflix deploys directly inside ISP data centers around the world. They cache popular content as close to users as possible, which dramatically cuts down Netflix’s bandwidth costs and keeps playback latency low.
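
To make the "as close to users as possible" idea concrete, here is a minimal sketch of how a control plane might pick an appliance for a request. It is not Netflix's steering logic: the `Appliance` fields, the `pick_appliance` helper, and the latency numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    name: str
    rtt_ms: float                      # measured or estimated client latency
    healthy: bool = True
    cached_titles: set = field(default_factory=set)


def pick_appliance(title_id: str, appliances: list) -> "Appliance | None":
    """Return the lowest-latency healthy appliance that already holds the title."""
    candidates = [a for a in appliances if a.healthy and title_id in a.cached_titles]
    return min(candidates, key=lambda a: a.rtt_ms, default=None)


fleet = [
    Appliance("isp-a-1", rtt_ms=4, cached_titles={"t1", "t2"}),
    Appliance("isp-a-2", rtt_ms=3, healthy=False, cached_titles={"t1"}),
    Appliance("ix-1", rtt_ms=18, cached_titles={"t1", "t3"}),
]
print(pick_appliance("t3", fleet).name)  # ix-1
```

A `None` result stands for a cache miss everywhere, in which case the request would have to fall back to an upstream tier.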

Request Flow

sequenceDiagram
    participant D as Device
    participant OCA as Open Connect Appliance
    participant CS as Control Plane
    participant LS as License Server

    D->>CS: GET /manifest.m3u8
    CS->>D: Return manifest with URL to nearest OCA
    D->>OCA: GET /video/segment1.ts
    OCA-->>D: Video segment
    D->>LS: GET /license for DRM
    LS-->>D: License key
    D->>OCA: GET /video/segment2.ts
    OCA-->>D: Video segment

Video Encoding Pipeline

graph LR
    A[Source Video] --> B[Transcode Cluster]
    B --> C{HD or 4K?}
    C -->|4K| D[4K Encode Farm]
    C -->|HD| E[HD Encode Farm]
    D --> F[Output Profiles]
    E --> F
    F --> G[CDN Origins]
    G --> H[Edge Cache]

Netflix encodes each title in multiple resolutions and bitrates — the output of their encoding pipeline feeds directly into adaptive bitrate streaming, so the player can switch quality on the fly based on the viewer’s network conditions.

Profile	Resolution	Bitrate	Codec
4K HDR	3840x2160	16 Mbps	H.265/VP9
1080p	1920x1080	5 Mbps	H.264/H.265
720p	1280x720	2.5 Mbps	H.264
480p	854x480	1 Mbps	H.264
Audio	Stereo/5.1/Atmos	192-768 kbps	AAC / Dolby Digital Plus
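
The video rows of this ladder map naturally onto an HLS master playlist, where each profile becomes one variant stream. The sketch below assumes HLS packaging and invents the variant file names (`video_2160p.m3u8` and so on); only the resolutions and bitrates come from the table.

```python
LADDER = [
    # (name, width, height, video bitrate in Mbps) taken from the table above
    ("4K HDR", 3840, 2160, 16.0),
    ("1080p", 1920, 1080, 5.0),
    ("720p", 1280, 720, 2.5),
    ("480p", 854, 480, 1.0),
]


def master_playlist(ladder) -> str:
    """Render one EXT-X-STREAM-INF entry per rung of the ladder."""
    lines = ["#EXTM3U"]
    for name, width, height, mbps in ladder:
        bandwidth = int(mbps * 1_000_000)  # HLS expects bits per second
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"video_{height}p.m3u8")  # hypothetical variant URI
    return "\n".join(lines)


print(master_playlist(LADDER))
```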

Streaming Protocol

Netflix serves video using adaptive bitrate streaming, adjusting quality based on network conditions.

HLS/DASH Manifest

sequenceDiagram
    participant D as Device
    participant CDN as CDN
    participant L as License Server

    Note over D: Initial playback request

    D->>CDN: GET /title/1234/manifest.m3u8
    CDN-->>D: M3U8 with quality levels
    Note over D: Parse quality levels

    D->>CDN: GET /title/1234/video_4k.m3u8
    CDN-->>D: Segment list

    D->>CDN: GET /title/1234/video_4k/segment1.ts
    CDN-->>D: Video segment
    D->>L: GET /widevine/license
    L-->>D: DRM license

    Note over D: Decode and display

    D->>CDN: GET /title/1234/video_4k/segment2.ts
    CDN-->>D: Video segment
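
The "parse quality levels" step in the flow above can be sketched as follows. This is a deliberately small parser that only reads the `BANDWIDTH` attribute; the sample playlist and the `pick_variant` policy are assumptions for illustration, not what any real player ships.

```python
import re

# Matches BANDWIDTH but not AVERAGE-BANDWIDTH, wherever it sits in the attribute list
STREAM_INF = re.compile(r"#EXT-X-STREAM-INF:(?:.*,)?BANDWIDTH=(\d+)")


def parse_variants(master: str) -> list:
    """Return (bandwidth_bps, uri) pairs from an HLS master playlist."""
    variants = []
    lines = [ln.strip() for ln in master.splitlines() if ln.strip()]
    for i, line in enumerate(lines[:-1]):
        match = STREAM_INF.match(line)
        if match:
            variants.append((int(match.group(1)), lines[i + 1]))  # URI follows the tag
    return variants


def pick_variant(variants: list, throughput_bps: float) -> str:
    """Highest-bandwidth variant that fits, else the lowest one available."""
    fitting = [v for v in variants if v[0] <= throughput_bps]
    return max(fitting)[1] if fitting else min(variants)[1]


SAMPLE = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=16000000,RESOLUTION=3840x2160
video_4k.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
video_1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
video_480p.m3u8
"""
print(pick_variant(parse_variants(SAMPLE), 6_000_000))  # video_1080p.m3u8
```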

Adaptive Bitrate Logic

class AdaptiveBitrateController:
    # Illustrative bitrate ladder in Mbps, ordered highest to lowest quality
    REQUIRED_MBPS = {"4k": 16.0, "1080p": 5.0, "720p": 2.5, "480p": 1.0, "360p": 0.5}

    def __init__(self, start_quality: str = "480p"):
        self.quality_levels = list(self.REQUIRED_MBPS)
        # Must be a real ladder entry so index() lookups cannot fail
        self.current_quality = start_quality

    def select_quality(self, buffer_level: float, throughput: float) -> str:
        # Rules-based adaptation; buffer_level in seconds, throughput in Mbps
        if buffer_level < 10:  # Buffer running low
            self.current_quality = self._downgrade()
        elif buffer_level > 60:  # Buffer healthy
            self.current_quality = self._upgrade(throughput)
        return self.current_quality

    def _downgrade(self) -> str:
        # Step one rung down the ladder, stopping at the lowest quality
        current_idx = self.quality_levels.index(self.current_quality)
        return self.quality_levels[min(current_idx + 1, len(self.quality_levels) - 1)]

    def _upgrade(self, throughput: float) -> str:
        # Choose highest quality that fits within 80% of measured throughput
        for quality in self.quality_levels:
            if self.REQUIRED_MBPS[quality] < throughput * 0.8:
                return quality
        return self.quality_levels[-1]

QUIC and HTTP/3 Deep Dive

Netflix’s streaming infrastructure increasingly relies on QUIC (Quick UDP Internet Connections), the transport protocol behind HTTP/3. Understanding QUIC matters for streaming architecture because it directly shapes how buffers behave, how connections survive network changes, and how users experience playback on spotty connections.

Why QUIC Replaces TCP for Streaming

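TCP delivers a single reliable, ordered byte stream, so one lost packet stalls everything queued behind it, and the connection is tied to the client’s IP address and port. QUIC runs over UDP and is still reliable, but it tracks ordering per stream, folds the TLS 1.3 handshake into connection setup, and identifies connections by ID rather than by address. For segment-based video, that translates into faster startup and fewer stalls on lossy or changing networks.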

graph TB
    subgraph "TCP Limitations"
        A[TCP Handshake] --> B[TLS Handshake]
        B --> C[3-RTT Total]
        C --> D[Head-of-line blocking]
        D --> E[Connection migration impossible]
    end

    subgraph "QUIC Advantages"
        F[QUIC Handshake] --> G[Combined crypto]
        G --> H[1-RTT or 0-RTT]
        H --> I[Stream multiplexing]
        I --> J[Connection migration works]
    end

Key QUIC benefits for streaming:

Feature	TCP	QUIC
Connection establishment	2-3 RTTs (TCP + TLS)	0-1 RTT
Head-of-line blocking	Yes	No (stream-level)
Connection migration	Breaks	Preserves
Packet loss impact	All streams	Single stream
TLS handshake	Separate	Combined
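
The connection-establishment row is easiest to appreciate as arithmetic. The sketch below counts round trips before the first request can leave the client and adds one more for the response; it ignores server processing time and packet loss, and the 80 ms RTT is just an example value.

```python
# Round trips spent before the first HTTP request can be sent.
SETUP_RTTS = {
    "tcp+tls1.2": 3,   # 1 (TCP) + 2 (TLS 1.2)
    "tcp+tls1.3": 2,   # 1 (TCP) + 1 (TLS 1.3)
    "quic": 1,         # transport and TLS 1.3 handshakes combined
    "quic-0rtt": 0,    # resumed session, request rides in the first flight
}


def time_to_first_byte_ms(stack: str, rtt_ms: float) -> float:
    """Setup round trips plus one more for the request/response itself."""
    return (SETUP_RTTS[stack] + 1) * rtt_ms


for stack in SETUP_RTTS:
    print(f"{stack:12s} {time_to_first_byte_ms(stack, 80):.0f} ms")
```

On a high-latency mobile link the gap between three setup round trips and zero is a noticeable share of a two-second startup budget.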

Stream Multiplexing in QUIC

TCP’s head-of-line blocking problem means if packet N is lost, all HTTP/2 streams stall waiting for retransmission. QUIC fixes this at the transport layer:

class QUICStreamMultiplexing:
    """Streams are independent in QUIC"""

    def __init__(self, connection: QUICConnection):
        self.connection = connection
        self.streams = {}  # Stream ID -> Stream state

    def send_on_stream(self, stream_id: int, data: bytes):
        # Packet loss on stream 5 does NOT block stream 7
        self.connection.send_stream_data(stream_id, data)

    def handle_packet_loss(self, stream_id: int):
        # Only this stream recovers, others continue
        self.streams[stream_id].retransmit()

For Netflix, this means audio and video can recover independently. If video packets drop, audio keeps playing smoothly while video rebuffers — far better than both freezing.
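
A toy model shows the difference. It assumes audio and video travel on separate streams over one connection, and treats a missing sequence number as a lost packet; `deliverable` and the tuple layout are invented for this illustration.

```python
def deliverable(arrivals, per_stream_ordering: bool):
    """Return the packets the application can read, given what has arrived.

    arrivals: list of (stream_id, stream_seq, connection_seq) tuples that
    reached the receiver. A missing connection_seq models a lost packet.
    """
    if per_stream_ordering:
        # QUIC-style: each stream only waits for its own gaps
        out = []
        for stream in sorted({s for s, _, _ in arrivals}):
            expected = 0
            for q in sorted(q for s, q, _ in arrivals if s == stream):
                if q != expected:
                    break
                out.append((stream, q))
                expected += 1
        return out
    # TCP-style: one byte stream, so delivery stops at the first global gap
    out = []
    expected = 0
    for s, q, c in sorted(arrivals, key=lambda p: p[2]):
        if c != expected:
            break
        out.append((s, q))
        expected += 1
    return out


# Connection packet 2 (the second video packet) was lost in transit
arrivals = [("video", 0, 0), ("audio", 0, 1), ("audio", 1, 3),
            ("video", 2, 4), ("audio", 2, 5)]
print(deliverable(arrivals, per_stream_ordering=False))
print(deliverable(arrivals, per_stream_ordering=True))
```

With a single ordered stream, nothing after the gap is readable. With per-stream ordering, all three audio packets get through and only video waits for the retransmission.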

Connection Migration

Users switch between WiFi and cellular constantly. With TCP, this breaks connections and forces reconnection. QUIC’s connection migration preserves the logical session across network changes:

class ConnectionMigration:
    """QUIC connection survives network changes"""

    def on_network_change(self, new_endpoint: tuple):
        # Same connection ID, new network path
        self.connection.migrate(new_endpoint)

        # All stream state preserved
        # Playback continues without rebuffering
        assert self.streams_all_alive()

Netflix leverages this for mobile users moving between networks. The playback session continues seamlessly rather than stalling during reconnection.
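
The mechanism behind this is that a QUIC endpoint looks up the session by connection ID instead of by source address. A minimal sketch of that lookup, with a made-up `SessionTable` class and session fields:

```python
class SessionTable:
    """Looks up sessions by connection ID, the way a QUIC endpoint does."""

    def __init__(self):
        self._by_cid = {}

    def open(self, cid: str, addr: tuple) -> None:
        self._by_cid[cid] = {"addr": addr, "next_segment": 1}

    def on_packet(self, cid: str, addr: tuple) -> dict:
        session = self._by_cid[cid]      # lookup ignores the source address
        if session["addr"] != addr:
            # A real endpoint validates the new path before trusting it
            # (PATH_CHALLENGE / PATH_RESPONSE); this sketch skips that step.
            session["addr"] = addr
        session["next_segment"] += 1
        return session


table = SessionTable()
table.open("c1", ("10.0.0.5", 50000))           # on WiFi
table.on_packet("c1", ("10.0.0.5", 50000))
state = table.on_packet("c1", ("172.16.9.2", 41000))  # now on cellular
print(state)
```

A TCP server keyed on the address and port tuple would have treated the second address as an unknown connection.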

0-RTT Resumption

QUIC’s 0-RTT mode lets a client that has connected to the server before send application data in its first flight, encrypted with keys derived from the earlier session, instead of waiting for the handshake to complete. For streaming:

class ZeroRTTStreaming:
    """Send first video request immediately"""

    async def start_playback(self, title_id: str):
        # 0-RTT: send immediately, handshake in background
        # First segments start downloading while TLS completes
        self.transport.send_stream_data(
            stream_id=1,
            data=f"GET /video/{title_id}/segment1".encode(),  # Transports send bytes
            flush=True
        )

This removes a full round trip from startup on repeat connections: the first segment request is already in flight while the handshake completes.
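
One caveat worth encoding in the client: 0-RTT data can be captured and replayed by an attacker, so it should only carry requests that are safe to repeat. The sketch below gates early data on the HTTP method; the helper names and the example paths are assumptions for illustration.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def can_send_as_early_data(method: str, has_session_ticket: bool) -> bool:
    """0-RTT needs a ticket from an earlier connection and a replay-safe request."""
    return has_session_ticket and method.upper() in SAFE_METHODS


def plan_requests(requests, has_session_ticket: bool):
    """Split requests into those sent in the first flight and those that wait."""
    early, after_handshake = [], []
    for method, path in requests:
        if can_send_as_early_data(method, has_session_ticket):
            early.append((method, path))
        else:
            after_handshake.append((method, path))
    return early, after_handshake


queue = [("GET", "/video/1234/segment1"), ("POST", "/api/v1/playback/position")]
print(plan_requests(queue, has_session_ticket=True))
```

Segment fetches go out immediately, while the position update waits for the handshake because replaying it could overwrite newer state.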

QUIC Loss Detection

QUIC defines its own loss detection (RFC 9002), combining a packet-reordering threshold with an RTT-based time threshold:

class QUICLossDetection:
    PACKET_THRESHOLD = 3     # kPacketThreshold in RFC 9002
    TIME_THRESHOLD = 9 / 8   # kTimeThreshold, a multiple of the RTT
    GRANULARITY = 0.001      # kGranularity: 1 ms timer floor

    def is_lost(self, packet_number: int, sent_time: float,
                largest_acked: int, now: float,
                smoothed_rtt: float, latest_rtt: float) -> bool:
        # Only unacknowledged packets sent before an acknowledged one can be declared lost
        if packet_number >= largest_acked:
            return False
        # Packet threshold: enough later packets have been acknowledged
        if largest_acked - packet_number >= self.PACKET_THRESHOLD:
            return True
        # Time threshold: the packet was sent long enough ago relative to the RTT
        loss_delay = max(self.TIME_THRESHOLD * max(smoothed_rtt, latest_rtt),
                         self.GRANULARITY)
        return now - sent_time >= loss_delay

Microservices Architecture

Netflix decomposed their monolith into hundreds of microservices.

Service Decomposition

graph TB
    subgraph "Edge Layer"
        GW[API Gateway]
        GS[Gateway Service]
    end

    subgraph "Backend"
        M[Metadata Service]
        R[Recommendation Engine]
        P[Playback Service]
        S[Search Service]
        A[Auth Service]
    end

    subgraph "Data"
        EV[EVCache]
        DB[(Cassandra)]
        ES[Elasticsearch]
    end

    GW --> GS
    GS --> M
    GS --> R
    GS --> P
    GS --> S
    GS --> A
    M --> DB
    M --> EV
    R --> EV
    S --> ES

Key Microservices

Service	Responsibility	Data Store
API Gateway	Request routing, aggregation	None
Metadata	Titles, episodes, images	Cassandra
Playback	Streaming session management	EVCache
Recommendations	Personalized suggestions	EVCache
Search	Full-text search	Elasticsearch
User Profile	Account, profiles, settings	Cassandra
Billing	Subscriptions, payments	PostgreSQL

API Gateway

public class ApiGatewayApplication {
    public static void main(String[] args) {
        // Zuul routes configuration
        addRequestThreadFilters();
        addResponseFilters();

        // Route definitions
        configureRoutes(new ZuulRouteBuilder()
            .route("/api/v1/metadata/**", "metadata-service")
            .route("/api/v1/playback/**", "playback-service")
            .route("/api/v1/recommendations/**", "recommendation-service")
            .route("/api/v1/search/**", "search-service")
        );
    }
}

API Design

Streaming Endpoints

Endpoint	Description
GET /api/v1/browse	Get personalized rows
GET /api/v1/titles/{id}/metadata	Get title details
GET /api/v1/titles/{id}/similars	Similar titles
POST /api/v1/playback/session	Initialize playback
POST /api/v1/playback/position	Update position
GET /api/v1/search?q={query}	Search titles

Response Example

{
  "data": {
    "id": "81239481",
    "title": "Stranger Things",
    "type": "show",
    "poster_url": "https://cdn.netflix.com/poster.jpg",
    "backdrop_url": "https://cdn.netflix.com/backdrop.jpg",
    "rating": "TV-14",
    "year": 2024,
    "duration": "4 seasons",
    "synopsis": "When a young boy vanishes...",
    "genres": ["Drama", "Horror", "Sci-Fi"],
    "seasons": [
      {
        "season_num": 1,
        "episodes": [
          {
            "num": 1,
            "title": "The Vanishing",
            "duration_secs": 3600,
            "thumbnail": "https://cdn.netflix.com/s1e1.jpg"
          }
        ]
      }
    ]
  },
  "meta": {
    "request_id": "abc123",
    "version": "2.1"
  }
}

Recommendation System

Netflix’s recommendation engine drives about 80% of what people watch. At that share, recommendation quality largely determines how much of the catalog viewers ever discover, which makes it as central to engagement as the content itself.

Recommendation Pipeline

graph LR
    A[User Events] --> B[Event Pipeline]
    B --> C[Feature Store]
    C --> D{Ranking Models}
    D --> E[Personalized Ranking]
    D --> F[Similar Titles]
    D --> G[Top Picks]
    E --> H[API Response]
    F --> H
    G --> H

Ranking Model Features

class RankingFeatures:
    def __init__(self, user_id: int, title_id: int):
        self.user_features = self._get_user_features(user_id)
        self.title_features = self._get_title_features(title_id)
        self.context_features = self._get_context_features()

    def compute_features(self) -> dict[str, object]:  # Values mix floats, score maps, and strings
        return {
            # User attributes
            "user_age_days": self.user_features.age_days,
            "user_avg_watch_time": self.user_features.avg_watch_time,
            "user_rating_avg": self.user_features.avg_rating,
            "user_genre_preferences": self.user_features.genre_scores,

            # Title attributes
            "title_popularity_score": self.title_features.popularity,
            "title_recency": self.title_features.release_days_ago,
            "title_rating": self.title_features.avg_rating,
            "title_match_score": self._genre_match(),

            # Context
            "time_of_day": self.context_features.hour,
            "day_of_week": self.context_features.day,
            "device_type": self.context_features.device
        }

Ranking Service

class RecommendationService:
    def __init__(self, model: RankingModel, cache: RedisCache):
        self.model = model
        self.cache = cache

    async def get_ranked_list(
        self,
        user_id: int,
        row_count: int = 20,
        evidence_count: int = 5
    ) -> List[RankedTitle]:
        # Check cache
        cache_key = f"recs:{user_id}:{row_count}"
        cached = await self.cache.get(cache_key)
        if cached:
            return self._deserialize(cached)

        # Get candidate titles
        candidates = await self._get_candidates(user_id, 500)

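        # Compute features, then score all candidates concurrently
        # instead of awaiting one prediction at a time (requires `import asyncio`)
        features = [self._compute_features(user_id, title) for title in candidates]
        scores = await asyncio.gather(*(self.model.predict(f) for f in features))
        ranked = list(zip(scores, candidates))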

        # Sort and return top N
        ranked.sort(key=lambda x: x[0], reverse=True)
        results = [title for _, title in ranked[:row_count]]

        # Cache for 5 minutes
        await self.cache.setex(cache_key, 300, self._serialize(results))

        return results

Capacity Planning Deep Dive

Capacity planning for Netflix’s recommendation system operates under constraints that most web infrastructure teams never encounter. The recommendation engine fires on every homepage load, every search query, and every “More Like This” row on the detail page: hundreds of millions of calls per day, each requiring ML model inference that costs far more than a simple database lookup. A caching layer helps, but hit rates collapse during major release events, when traffic patterns shift and cached predictions go stale. The structural challenge is that the workload is both latency-sensitive (users expect the homepage to render fast) and compute-intensive (ranking models combine matrix factorization and deep learning inference). Over-provision and you burn money on idle GPU clusters. Under-provision and users stare at loading spinners when they should be browsing.

The other dimension that sets this apart from standard capacity planning is the release-driven traffic spike. When a highly anticipated show drops worldwide at a single release time, the recommendation pipeline can see demand jump far above normal within minutes. Unlike e-commerce sites, where Black Friday traffic builds over hours, Netflix’s audience arrives almost at once. Marketing-driven spikes compound this: a viral trailer can push demand beyond any historical baseline. Traditional auto-scaling reacts too slowly for these events; by the time new instances spin up, the spike may have already peaked and begun to subside.

Netflix addresses this with a three-layer framework that moves as much work as possible out of the critical path:

Layer	What it does	When it runs
Traffic Modeling	Forecasts demand from viewing history, marketing signals, regional demographics	Days to weeks ahead
Provisioning	Pre-scales infrastructure, pre-encodes content, pre-positions CDN files	Hours to days ahead
Load Shedding	Graceful degradation when actual demand exceeds even best forecasts	Real-time

The first two layers are predictive — they try to get ahead of demand before it arrives. The third layer is reactive — it is the safety net that prevents cascading failures when predictions fall short. Each layer addresses a different failure mode: bad forecasts, slow provisioning, and unforeseeable demand spikes respectively. The subsections that follow drill into each layer’s mechanisms and the trade-offs involved.
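
The effect of a collapsing cache hit rate on inference capacity can be estimated with simple arithmetic. The request rate, hit rates, and per-instance throughput below are invented for illustration and are not Netflix figures; `inference_instances` is a hypothetical helper.

```python
import math


def inference_instances(
    requests_per_sec: float,
    cache_hit_rate: float,
    per_instance_qps: float,
    headroom: float = 0.25,
) -> int:
    """Instances needed to serve cache misses, with spare headroom."""
    miss_qps = requests_per_sec * (1.0 - cache_hit_rate)
    return math.ceil(miss_qps * (1.0 + headroom) / per_instance_qps)


normal = inference_instances(50_000, cache_hit_rate=0.90, per_instance_qps=200)
release = inference_instances(50_000, cache_hit_rate=0.50, per_instance_qps=200)
print(normal, release)
```

Even with the request rate held constant, dropping from a 90% to a 50% hit rate multiplies the inference fleet roughly fivefold, which is why provisioning has to happen before the release rather than in reaction to it.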

Traffic Modeling

Netflix builds traffic models based on historical viewing patterns, marketing signals, and regional demographics:

class TrafficModel:
    def predict_demand(self, title_id: str, release_datetime: datetime) -> DemandForecast:
        # Base demand from similar title releases
        base_demand = self._get_baseline_demand(title_id)

        # Marketing multiplier
        marketing_score = self._get_marketing_score(title_id)

        # Time-of-day adjustment
        hour_factor = self._get_hour_distribution(release_datetime)

        # Geographic distribution
        regional_demand = self._get_regional_forecast(title_id)

        return DemandForecast(
            peak_concurrent_streams=base_demand * marketing_score * hour_factor,
            total_viewers=sum(regional_demand.values()),
            bandwidth_tbps=self._estimate_bandwidth(regional_demand),
            # Kept on the forecast so provisioning can iterate per region
            regional_demand=regional_demand
        )

Provisioning for Major Releases

For highly anticipated releases, Netflix pre-positions infrastructure:

class ReleaseProvisioning:
    def provision_for_release(self, title_id: str, release_date: datetime):
        forecast = self.traffic_model.predict_demand(title_id, release_date)

        # Pre-encode at all quality levels
        self.encoding_queue.prioritize(title_id, priority="critical")

        # Pre-position to CDN edges
        for region, expected_streams in forecast.regional_demand.items():
            self.cdn.pre_position(
                title_id=title_id,
                region=region,
                expected_streams=expected_streams
            )

        # Scale recommendation service (instance counts must be whole numbers;
        # requires `import math`)
        self.recommendation_service.scale(
            min_instances=math.ceil(forecast.peak_concurrent_streams / 1000)
        )

        # License server capacity
        self.license_service.scale(
            target_capacity=forecast.peak_concurrent_streams * 1.5
        )

Load Shedding Strategy

When demand exceeds capacity, Netflix degrades gracefully instead of letting failures cascade:

class LoadShedding:
    PRIORITY_LEVELS = {
        "premium_subscriber": 1,
        "standard_subscriber": 2,
        "basic_subscriber": 3,
        "anonymous": 4
    }

    def should_accept_request(self, request: Request, current_load: float) -> bool:
        if current_load < self.capacity_threshold:
            return True

        # Reject lower-priority users under load (lower number = higher priority)
        request_priority = self.PRIORITY_LEVELS[request.subscription_tier]
        threshold_priority = self._load_to_priority_threshold(current_load)

        # <= so that the tier named by the threshold is still admitted
        return request_priority <= threshold_priority

    def _load_to_priority_threshold(self, load: float) -> int:
        # As load increases, only accept higher-priority requests
        if load > 0.95:
            return 1  # Only premium
        elif load > 0.85:
            return 2  # Standard+
        elif load > 0.70:
            return 3  # Basic+
        return 4  # Accept all

Data Storage

Netflix stores different types of data differently. Metadata (titles, episodes, users) lives in Cassandra. Playback state goes to EVCache for low-latency reads. The key is matching the storage engine to how you actually access the data.

Cassandra for Metadata

CREATE TABLE titles (
    title_id UUID PRIMARY KEY,
    title_type TEXT,  -- 'movie' or 'show'
    title_name TEXT,
    synopsis TEXT,
    release_year INT,
    duration_secs INT,
    rating TEXT,
    genres LIST<TEXT>,
    -- Denormalized for query performance
    genres_sorted SET<TEXT>,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE episodes (
    show_id UUID,
    season_num INT,
    episode_num INT,
    episode_id UUID,
    title_name TEXT,
    duration_secs INT,
    synopsis TEXT,
    PRIMARY KEY ((show_id), season_num, episode_num)
);

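CREATE TABLE title_by_genre (
    genre TEXT,
    release_year INT,
    popularity_score DOUBLE,
    title_id UUID,
    -- title_id in the key keeps titles with equal score and year from overwriting each other
    PRIMARY KEY ((genre), popularity_score, release_year, title_id)
) WITH CLUSTERING ORDER BY (popularity_score DESC, release_year DESC, title_id ASC);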

EVCache for Playback State

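# Playback state cached in EVCache
# evcache and db are assumed to be async clients configured elsewhere
import asyncio
import json
from dataclasses import asdict, dataclass
from typing import Optional

PLAYBACK_STATE_TTL = 7200  # 2 hours

@dataclass
class PlaybackState:
    user_id: int
    title_id: int
    position: int

_pending_writes: set = set()  # Keeps background tasks referenced until they finish

async def get_playback_state(user_id: int, title_id: int) -> Optional[PlaybackState]:
    key = f"playback:{user_id}:{title_id}"
    cached = await evcache.get(key)

    if cached:
        return PlaybackState(**json.loads(cached))

    # Cache miss: fetch from database and repopulate the cache
    state = await db.fetch_playback_state(user_id, title_id)

    if state:
        # asdict() turns the dataclass into something json.dumps can serialize
        await evcache.setex(key, PLAYBACK_STATE_TTL, json.dumps(asdict(state)))

    return state

async def save_playback_position(user_id: int, title_id: int, position: int):
    # Write to EVCache immediately
    key = f"playback:{user_id}:{title_id}"
    state = PlaybackState(user_id=user_id, title_id=title_id, position=position)

    await evcache.setex(key, PLAYBACK_STATE_TTL, json.dumps(asdict(state)))

    # Persist to Cassandra asynchronously; hold a reference so the task is not garbage-collected
    task = asyncio.create_task(db.save_playback_state(user_id, title_id, position))
    _pending_writes.add(task)
    task.add_done_callback(_pending_writes.discard)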

Global Architecture

Multi-Region Setup

graph TB
    subgraph "US-East (Primary)"
        ZU[Zuul Gateway]
        SVCS_E[Services]
        DB_E[(Cassandra)]
    end

    subgraph "EU-West (Replica)"
        ZU2[Zuul Gateway]
        SVCS_W[Services]
        DB_W[(Cassandra)]
    end

    subgraph "Asia-Pacific"
        ZU3[Zuul Gateway]
        SVCS_A[Services]
        DB_A[(Cassandra)]
    end

    SVCS_E <--> DB_E
    SVCS_W <--> DB_W
    SVCS_A <--> DB_A

    ZU --> SVCS_E
    ZU2 --> SVCS_W
    ZU3 --> SVCS_A

Traffic Routing

Netflix uses latency-based routing to direct users to the nearest region:

class LatencyRouter:
    def route_request(self, user_id: int, service: str) -> str:
        # Get user's last known region
        user_region = self._get_user_region(user_id)

        # Check if region is healthy
        if self._is_region_healthy(user_region):
            return f"{service}.{user_region}.netflix.com"

        # Fallback to the lowest-latency region, returned as a hostname like the path above
        latencies = self._measure_all_regions(service)
        best_region = min(latencies, key=latencies.get)
        return f"{service}.{best_region}.netflix.com"

    def _measure_all_regions(self, service: str) -> Dict[str, float]:
        return {
            "us-east-1": self._ping(f"{service}.us-east-1.netflix.com"),
            "eu-west-1": self._ping(f"{service}.eu-west-1.netflix.com"),
            "ap-northeast-1": self._ping(f"{service}.ap-northeast-1.netflix.com")
        }

DRM and Entitlement Systems

Digital Rights Management (DRM) protects content from unauthorized copying. Netflix uses multiple DRM schemes for different platforms.

DRM Architecture

graph TB
    D[Device] -->|1. Initialize| L[License Server]
    L -->|2. Device ID + Content Key Request| E[Entitlement Service]
    E -->|3. Check subscription| DB[(User DB)]
    DB -->|4. Entitlement OK| E
    E -->|5. Grant license| L
    L -->|6. Encrypted License| D
    D -->|7. Decrypt with device key| K[Content Key]
    K -->|8. Play| V[Video Decrypt]

Multi-DRM Support

class DRMManager:
    """Handle multiple DRM schemes per platform"""

    DRM_SCHEMES = {
        "widevine": "com.widevine.alpha",      # Android, Chrome, many devices
        "playready": "com.microsoft.playready", # Windows, Xbox
        "fairplay": "com.apple.fps",           # iOS, Safari (FairPlay Streaming key system)
        "clearkey": "org.w3.clearkey"           # Unprotected keys; suitable for testing only
    }

    def get_supported_drm(self, device_type: str) -> List[str]:
        """Return DRM schemes supported by device"""
        capabilities = {
            "android": ["widevine"],
            "ios": ["fairplay"],
            "web_safari": ["fairplay", "clearkey"],
            "web_chrome": ["widevine", "clearkey"],
            "windows": ["playready", "widevine", "clearkey"],
            "smarttv": ["widevine", "playready"]
        }
        return capabilities.get(device_type, ["widevine"])

Entitlement Checking

class EntitlementService:
    """Verify user can access specific content"""

    async def check_entitlement(
        self,
        user_id: int,
        title_id: str,
        device_type: str
    ) -> EntitlementResult:
        # Get user's subscription tier
        subscription = await self.user_service.get_subscription(user_id)

        # Get title's required tier
        title = await self.metadata_service.get_title(title_id)

        if subscription.tier < title.required_tier:
            return EntitlementResult(
                allowed=False,
                reason="subscription_tier_too_low",
                upgrade_to=title.required_tier
            )

        # Check concurrent stream limit
        active_streams = await self.playback_service.count_active_streams(user_id)

        if active_streams >= subscription.max_streams:
            return EntitlementResult(
                allowed=False,
                reason="max_streams_exceeded",
                active_streams=active_streams
            )

        return EntitlementResult(allowed=True)

Resilience Patterns

Circuit Breaker

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = "closed"

    async def call(self, func: Callable, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure_time > self.timeout:
                self.state = "half-open"
            else:
                raise CircuitOpenException()

        try:
            result = await func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise  # Bare raise preserves the original traceback

    def _on_success(self):
        self.failure_count = 0
        self.state = "closed"

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = "open"

Bulkheads

Isolate service dependencies to prevent cascading failures:

class BulkheadExecutor:
    def __init__(self, max_concurrent: int = 100):
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def execute(self, func: Callable, *args, **kwargs):
        async with self.semaphore:
            return await func(*args, **kwargs)

# Different bulkheads for different service calls
metadata_bulkhead = BulkheadExecutor(max_concurrent=50)
recommendation_bulkhead = BulkheadExecutor(max_concurrent=20)
playback_bulkhead = BulkheadExecutor(max_concurrent=30)
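
To see the isolation in action, the sketch below pushes ten slow calls through a limit of three and records the peak number in flight. It uses a plain `asyncio.Semaphore` directly rather than the class above so that it runs on its own; the sleep stands in for a slow downstream dependency.

```python
import asyncio


async def demo(limit: int, calls: int) -> int:
    """Run `calls` slow tasks through a semaphore and report peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def slow_dependency():
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)   # stand-in for a slow downstream call
            in_flight -= 1

    await asyncio.gather(*(slow_dependency() for _ in range(calls)))
    return peak


print(asyncio.run(demo(limit=3, calls=10)))  # 3
```

However slow the dependency gets, it can never hold more than its allotted share of concurrent calls, so the other bulkheads keep serving.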

Adaptive Bitrate Streaming Deep Dive

ABR Algorithm Details

from typing import List

class AdaptiveBitrateController:
    """Simplified sketch of a buffer- and throughput-based ABR controller"""

    def __init__(self):
        # Quality levels (lowest to highest); bandwidth in Mbps
        self.levels = [
            {"name": "360p", "min_bandwidth": 0.7, "max_bandwidth": 1.5},
            {"name": "480p", "min_bandwidth": 1.5, "max_bandwidth": 3},
            {"name": "720p", "min_bandwidth": 3, "max_bandwidth": 5},
            {"name": "1080p", "min_bandwidth": 5, "max_bandwidth": 10},
            {"name": "4K", "min_bandwidth": 10, "max_bandwidth": float('inf')}
        ]

        # State
        self.current_level = 0  # Index into self.levels; start at the lowest rung
        self.buffer_levels = []  # Ring buffer of recent buffer levels
        self.throughput_samples = []  # Ring buffer of recent throughput

    def calculate_throughput(self, segments: List["Segment"]) -> float:
        """Calculate effective throughput (Mbps) from recent segments"""
        # Segment is assumed to expose size_mb (megabytes) and download_time (seconds)
        recent = segments[-4:]
        if not recent:
            return 0.0

        # Weight recent samples higher
        weights = [0.1, 0.15, 0.25, 0.5][-len(recent):]  # Oldest to newest
        weighted_sum = sum(
            (s.size_mb * 8 / s.download_time) * w  # Megabytes to megabits per second
            for s, w in zip(recent, weights)
        )
        return weighted_sum / sum(weights)  # Normalize when fewer than 4 samples

    def select_quality(self, throughput: float, buffer_level: float) -> str:
        """Select quality based on buffer state and measured throughput"""
        state = self._classify_state(buffer_level)

        if state == "buffer_depleted":
            # Buffer running low - step down one level to avoid a stall
            self.current_level = max(self.current_level - 1, 0)

        elif state == "steady":
            # Pick the best level the measured throughput supports
            self.current_level = self._level_for_throughput(throughput)

        elif state == "surplus":
            # Large buffer - safe to probe one level above the current one
            self.current_level = min(self.current_level + 1, len(self.levels) - 1)

        return self.levels[self.current_level]["name"]

    def _level_for_throughput(self, throughput: float) -> int:
        # Highest level whose minimum bandwidth fits the measured throughput
        best = 0
        for idx, level in enumerate(self.levels):
            if level["min_bandwidth"] <= throughput:
                best = idx
        return best

    def _classify_state(self, buffer_level: float) -> str:
        if buffer_level < 10:
            return "buffer_depleted"
        elif buffer_level < 60:
            return "steady"
        else:
            return "surplus"  # Large buffer, can experiment

CDN Cache Invalidation

graph TB
    A[Content Update] --> B{New encode or metadata change?}
    B -->|Metadata| C[Update metadata service]
    B -->|New encode| D[Push to origin]
    D --> E[Invalidate edge caches]
    E --> F[CDN propagates in < 30s]

    subgraph "Cache Invalidation Strategy"
        C
        E
    end

class CDNInvalidationService:
    """Handle content cache invalidation across CDN"""

    async def invalidate_title(self, title_id: str, reason: str):
        """Invalidate all cached data for a title"""
        # Invalidate manifest files
        await self.cdn.invalidate(f"/title/{title_id}/*.m3u8")

        # Invalidate metadata
        await self.cdn.invalidate(f"/api/metadata/{title_id}")

        # Invalidate thumbnails
        await self.cdn.invalidate(f"/images/{title_id}/*")

        # Log invalidation for audit
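        await self.audit_log.record({
            "event": "cache_invalidation",
            "title_id": title_id,
            "reason": reason,
            # Timezone-aware timestamp; utcnow() is deprecated (needs `from datetime import timezone`)
            "timestamp": datetime.now(timezone.utc)
        })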

Multi-Device Session Management

Users watch Netflix on multiple devices. Sessions must sync playback position and handle concurrent playback limits.

Session State

from datetime import datetime, timezone


class PlaybackSession:
    """Represent an active playback session"""

    def __init__(
        self,
        session_id: str,
        user_id: int,
        title_id: str,
        device_type: str,
        position_seconds: int,
        quality: str,
        playback_store=None,  # Durable position store (optional)
        cache=None            # Low-latency position cache (optional)
    ):
        self.session_id = session_id
        self.user_id = user_id
        self.title_id = title_id
        self.device_type = device_type
        self.position_seconds = position_seconds
        self.quality = quality
        self.started_at = datetime.now(timezone.utc)
        self.last_heartbeat = self.started_at
        self.playback_store = playback_store
        self.cache = cache

    async def update_position(self, position_seconds: int):
        """Update playback position and refresh the heartbeat"""
        self.position_seconds = position_seconds
        self.last_heartbeat = datetime.now(timezone.utc)

        # Persist to storage when a store is attached
        if self.playback_store is not None:
            await self.playback_store.save_position(
                self.session_id,
                position_seconds
            )

        # Invalidate the cached position so other devices read the fresh value
        if self.cache is not None:
            await self.cache.delete(f"position:{self.user_id}:{self.title_id}")

Continue Watching Sync

from uuid import uuid4


class ContinueWatchingService:
    """Sync playback position across devices"""

    async def get_resume_position(
        self,
        user_id: int,
        title_id: str,
        requesting_device: str
    ) -> ResumePosition:
        # Check whether another device has a more recent position
        active_sessions = await self.session_manager.get_active_sessions(
            user_id,
            exclude_device=requesting_device
        )

        # Use the most recently updated session for this title
        matching = [s for s in active_sessions if s.title_id == title_id]
        if matching:
            session = max(matching, key=lambda s: s.last_heartbeat)
            return ResumePosition(
                position_seconds=session.position_seconds,
                device=session.device_type,
                updated_at=session.last_heartbeat
            )

        # Fallback to database
        return await self.playback_store.get_position(user_id, title_id)

    async def handle_playback_start(
        self,
        user_id: int,
        title_id: str,
        device_type: str
    ) -> PlaybackSession:
        # Create new session
        session = PlaybackSession(
            session_id=str(uuid4()),  # session_id is typed as str
            user_id=user_id,
            title_id=title_id,
            device_type=device_type,
            position_seconds=0,
            quality="auto"
        )

        # Register session
        await self.session_manager.register(session)

        # Enforce concurrent stream limit
        await self._enforce_stream_limit(user_id)

        return session

    async def _enforce_stream_limit(self, user_id: int):
        """Ensure user hasn't exceeded stream limit"""
        subscription = await self.user_service.get_subscription(user_id)
        active = await self.session_manager.count_active(user_id)

        if active > subscription.max_streams:
            # Force oldest session to stop
            oldest = await self.session_manager.get_oldest_session(user_id)
            await self.session_manager.terminate(oldest.session_id)

Trade-off Analysis

Factor	Custom CDN (Open Connect)	Third-Party CDN	Hybrid Approach
Cost at scale	Lower (owned hardware)	Higher (per TB)	Medium
Control	Full	Limited	Partial
Time to deploy	Long (build your own)	Fast (existing)	Medium
Optimization	Streaming-specific	General purpose	Mixed
Operational burden	High (you manage)	Low (managed)	Medium

Production Failure Scenarios

Failure Scenario	Impact	Mitigation
CDN origin failure	Video segments unavailable	Multi-CDN; fallback to direct streaming
License server down	No new streams can start	Cache licenses; graceful degradation
Recommendation service slow	Homepage takes longer to load	Cache recommendations; show stale content
Encoding pipeline backlog	New content delayed	Priority encoding; capacity headroom
Too many concurrent streams	New streams rejected	Clear error message; upgrade prompt

Scenario Drills

Scenario 1: CDN Origin Server Failure

Situation: The CDN origin serving video segments goes down during peak viewing hours.

Analysis:

Devices requesting segments get errors
Playback stalls, rebuffering begins
Millions of concurrent streams affected

Solution: Multi-CDN deployment with automatic failover. If one CDN has issues, traffic routes to another. Open Connect has multiple origin clusters geographically distributed. Devices can switch CDN transparently if segment requests fail.

Scenario 2: New Show Releases Simultaneously Worldwide

Situation: A highly anticipated show releases globally at midnight UTC. 10 million users try to start playback simultaneously.

Analysis:

Encoding pipeline must complete all quality levels before release
CDN edge caches start cold for a new title
License servers receive burst of requests

Solution: Pre-encode content days before release. Pre-position popular titles at ISP locations. License server scales horizontally; licenses are cached for their validity period to reduce server load.

Scenario 3: ABR Algorithm Causes Quality Flutter

Situation: Users report constantly changing video quality, creating a jarring viewing experience.

Analysis:

ABR switches quality too frequently
Bandwidth measurements fluctuate (wireless networks)
Buffer thresholds trigger rapid up/down switching

Solution: Implement stability windows. Once you switch to a quality level, stay there for at least 30 seconds. Use weighted throughput averages that emphasize recent samples. Build in safety margins (use 70-80% of measured throughput for decisions).

Failure Flow Diagrams

Stream Playback Initialization

flowchart TD
    A[User Clicks Play] --> B[Get Manifest from CDN]
    B --> C[Parse Quality Levels]
    C --> D[Select Initial Quality]
    D --> E[Request Video Segment]
    E --> F{Segment Available?}
    F -->|No| G[Try Alternative CDN]
    G --> E2[Request Video Segment]
    E2 --> F2{Segment Available?}
    F2 -->|Yes| H[Download Segment]
    F2 -->|No| G
    F -->|Yes| H
    H --> I[Request DRM License]
    I --> J[License Server]
    J --> K{Entitled?}
    K -->|No| L[Return Error]
    K -->|Yes| M[Return License Key]
    M --> N[Decrypt Segment]
    N --> O[Decode and Display]
    O --> P[Buffer Next Segment]
    P --> E

ABR Quality Selection

flowchart TD
    A[Monitor Buffer Level] --> B{Buffer < 10s?}
    B -->|Yes| C[Select Lower Quality]
    B -->|No| D{Buffer > 60s?}
    D -->|Yes| E[Select Higher Quality]
    D -->|No| F[Maintain Current Quality]
    C --> G[Download at New Quality]
    E --> H[Download at New Quality]
    F --> I[Continue Current Quality]
    G --> J[Deliver to Player]
    H --> J
    I --> J
    J --> A

Multi-CDN Failover

flowchart TD
    A[Request Segment] --> B[Primary CDN]
    B --> C{Response OK?}
    C -->|Yes| D[Deliver to Device]
    C -->|No| E[Try Secondary CDN]
    E --> F{Response OK?}
    F -->|Yes| G[Deliver to Device]
    F -->|No| H[Fallback to Origin Direct]
    H --> I{Origin Available?}
    I -->|Yes| J[Deliver to Device]
    I -->|No| K[Show Error]

Common Pitfalls / Anti-Patterns

Pitfall 1: Aggressive Quality Switching

Problem: Switching quality too frequently creates a “flutter” effect that’s visually jarring.

Solution: Implement quality stability windows. Once you switch up or down, stay at that level for at least 30 seconds.

Quality flutter is one of the most common ABR complaints in production. The underlying issue is that naive throughput-based algorithms respond to every measurement fluctuation. A temporary network hiccup causes an immediate step down in quality; a brief burst of good throughput triggers a step back up. The result is a quality level that keeps oscillating, which users describe as “the video keeps changing.”

The fix sounds almost too simple: add a cooldown timer before allowing another quality switch. Netflix’s production ABR enforces a minimum dwell time at each quality level. The exact duration varies by implementation, but 30 seconds is a common starting point. During this window, the algorithm keeps measuring throughput and buffer levels but does not act on them.
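A cooldown like this can be sketched in a few lines. The class and function names here are hypothetical, the 30-second default is the starting point mentioned above rather than a known production value, and the clock is injectable only so the behavior can be tested.

```python
import time


class DwellTimer:
    """Block quality switches until a minimum dwell time has elapsed."""

    def __init__(self, min_dwell_seconds: float = 30.0, clock=time.monotonic):
        self.min_dwell_seconds = min_dwell_seconds
        self._clock = clock        # injectable so tests can control time
        self._last_switch = None   # None until the first switch happens

    def can_switch(self) -> bool:
        if self._last_switch is None:
            return True
        return self._clock() - self._last_switch >= self.min_dwell_seconds

    def record_switch(self) -> None:
        self._last_switch = self._clock()


def next_level(current: str, proposed: str, timer: DwellTimer) -> str:
    """Accept the proposed level only once the dwell window has passed."""
    if proposed != current and timer.can_switch():
        timer.record_switch()
        return proposed
    return current
```

A real player would probably let an emergency downswitch bypass the timer when the buffer is about to run dry; that exception is left out of this sketch.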

Weighted throughput averaging helps just as much. Rather than reacting to the last measurement alone, ABR algorithms weight recent samples more heavily while still smoothing out single-sample anomalies. A typical scheme might weight the last four samples as [0.1, 0.15, 0.25, 0.5], oldest to newest, so the most recent sample dominates. A single bad reading is damped rather than taken at face value, which avoids many unnecessary downgrades.
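As a minimal sketch of that weighting, assuming samples arrive oldest first and using the illustrative weights from the paragraph above (the function name is hypothetical):

```python
def weighted_throughput(samples_mbps, weights=(0.1, 0.15, 0.25, 0.5)):
    """Weighted average of the most recent samples; the newest sample is last."""
    recent = list(samples_mbps)[-len(weights):]
    if not recent:
        raise ValueError("need at least one throughput sample")
    used = weights[-len(recent):]  # align weights with the newest samples
    # Normalize so the result stays a true average with fewer than four samples
    return sum(w * s for w, s in zip(used, recent)) / sum(used)
```

With samples of 10, 2, 10 and 10 Mbps, the one old dip pulls the estimate down to 8.8 Mbps instead of dominating the decision.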

Safety margins are the last piece. ABR decisions should use 70-80% of measured throughput rather than the full amount, accounting for the fact that networks are rarely stable. A connection that delivers 10 Mbps right now may drop to 7 Mbps a few seconds later. Building in headroom prevents the aggressive quality climbs that inevitably lead to rebuffering.
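To make the headroom concrete, here is a sketch that discounts the measured throughput before picking a rung. The bitrate ladder is invented for illustration (real ladders vary per title and codec), and the 0.75 default simply sits inside the 70-80% range above.

```python
# Hypothetical bitrate ladder in Mbps; not Netflix's actual encoding ladder.
LADDER_MBPS = {"480p": 1.5, "720p": 3.0, "1080p": 5.0, "4K": 16.0}


def pick_quality(measured_mbps: float, safety_factor: float = 0.75) -> str:
    """Pick the highest rung whose bitrate fits the discounted throughput."""
    budget = measured_mbps * safety_factor
    affordable = [name for name, rate in LADDER_MBPS.items() if rate <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung rather than stalling
        return min(LADDER_MBPS, key=LADDER_MBPS.get)
    return max(affordable, key=LADDER_MBPS.get)
```

At a measured 6 Mbps the discounted budget is 4.5 Mbps, so the sketch picks 720p even though 1080p would nominally fit the raw measurement.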

Pitfall 2: Ignoring Network Variability

Problem: Using average throughput misses spikes and drops.

Solution: Use weighted average that emphasizes recent samples. Build in safety margins (use 70-80% of measured throughput for decisions).

Wireless networks exhibit high variability that simple averaging fails to capture. A user on a WiFi network sharing bandwidth with other household devices, or a mobile user transitioning between cell towers, experiences conditions that change dramatically over seconds. A naive average over the last N samples treats a spike and a drop with equal weight, producing quality decisions that are either too aggressive or too conservative.

Netflix’s ABR implementation uses an exponentially weighted moving average (EWMA) for throughput estimation. The key parameter is the smoothing factor, often called alpha. A higher alpha gives more weight to recent measurements; a lower alpha produces smoother estimates but responds more slowly to genuine changes. The right value depends on the network type: mobile networks typically need higher alpha values to react quickly to handoffs, while stable WiFi can use lower values for smoother quality decisions.
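The EWMA update itself is one line. The sketch below is a generic textbook version with a hypothetical class name, not Netflix's code, and the alpha values in the usage note are only illustrative.

```python
class EwmaThroughput:
    """Exponentially weighted moving average of throughput samples."""

    def __init__(self, alpha: float):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimate = None  # no estimate until the first sample arrives

    def update(self, sample_mbps: float) -> float:
        if self.estimate is None:
            self.estimate = sample_mbps
        else:
            # New estimate = alpha * newest sample + (1 - alpha) * old estimate
            self.estimate = (
                self.alpha * sample_mbps + (1 - self.alpha) * self.estimate
            )
        return self.estimate
```

After samples of 10 and then 2 Mbps, an alpha of 0.5 moves the estimate to 6.0 while an alpha of 0.1 only moves it to 9.2, which is the responsiveness versus smoothness trade-off described above.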

Variance itself is a useful signal. If the standard deviation of your throughput measurements is high relative to the mean, the network is unstable and you should be more conservative with quality selection. Some production systems track a separate confidence score combining both average and variance to make more informed quality decisions.
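One way to turn variance into a decision input is the coefficient of variation (standard deviation divided by mean). The mapping from that value to a safety factor below is an assumption made for illustration, as are the `base` and `floor` defaults.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples_mbps) -> float:
    """Standard deviation relative to the mean; higher means a noisier link."""
    avg = mean(samples_mbps)
    if avg <= 0:
        raise ValueError("mean throughput must be positive")
    return pstdev(samples_mbps) / avg


def safety_factor(samples_mbps, base: float = 0.8, floor: float = 0.5) -> float:
    """Shrink the usable share of throughput as measurements get noisier."""
    cv = coefficient_of_variation(samples_mbps)
    return max(floor, base * (1.0 - min(cv, 1.0)))
```

A perfectly stable link keeps the full base factor of 0.8, while a link alternating between 2 and 18 Mbps is clamped to the 0.5 floor.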

Packet loss is another signal that often gets ignored. Retransmitted packets consume bandwidth without delivering new data, so throughput measured as bytes on the wire overstates goodput, the rate at which useful video bytes reach the player. An ABR algorithm that looks only at raw bytes per second, without accounting for retransmissions, will overestimate available bandwidth after packet loss and select qualities the network cannot sustain.

Pitfall 3: Not Testing on Real Networks

Problem: Lab testing does not capture real-world variability (WiFi interference, cellular handoffs).

Solution: A/B test ABR algorithms on real users. Monitor quality distributions and rebuffer ratios in production.

Lab testing with controlled network links captures some variability through configurable jitter and packet loss profiles, but cannot replicate the full range of conditions real users experience. The gap is real: a residential WiFi network dealing with a neighbor’s streaming load, a crowded conference WiFi with hundreds of devices competing for bandwidth, or a mobile user on a train passing through dead zones all behave in ways that lab equipment rarely simulates without deliberate effort.

Netflix tests ABR changes through canary deployments and A/B experiments. A small percentage of users receive the new algorithm while the majority stay on the baseline. The key metrics are the quality distribution, the rebuffer ratio (time spent waiting versus time watching), and stream startup time. A winning algorithm maintains or improves quality distribution while reducing rebuffer ratio.

Segment-level instrumentation makes this possible. Every segment download records the requested quality, actual throughput achieved, buffer level before and after the segment, and whether rebuffering occurred during playback. This data feeds into experiment evaluation. Without this granularity, you cannot meaningfully compare algorithm variants.
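A per-segment record along these lines might look like the sketch below. The field names are hypothetical, and the ratio is defined here as stall time over total session time (stall plus played), which is one of several definitions in use.

```python
from dataclasses import dataclass


@dataclass
class SegmentRecord:
    quality: str             # requested quality level, e.g. "1080p"
    throughput_mbps: float   # throughput achieved for this download
    buffer_before_s: float   # buffer level when the request started
    buffer_after_s: float    # buffer level when the download finished
    duration_s: float        # seconds of video contained in the segment
    stall_s: float = 0.0     # time the player spent rebuffering


def rebuffer_ratio(records) -> float:
    """Stall time divided by total session time (stall + played)."""
    stalled = sum(r.stall_s for r in records)
    played = sum(r.duration_s for r in records)
    total = stalled + played
    return stalled / total if total else 0.0
```

Comparing this ratio between the control and treatment groups of an experiment is then a matter of aggregating records per variant.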

Real-world testing also surfaces edge cases that simulation misses. Users who pause playback and resume hours later with a cold buffer behave differently from continuous viewers. VPN users routing traffic through distant exit points experience latency spikes that would never appear in a lab. Enterprise network users face rate limiting and traffic shaping that artificially constrains throughput. Each of these scenarios requires specific handling in the ABR logic.

Real-world Failure Scenarios

Scenario 1: Netflix Open Connect Appliance Failure

What happened: During peak viewing hours, multiple Netflix Open Connect Appliances (OCAs) deployed at an ISP failed simultaneously due to a firmware bug, causing video buffering for thousands of subscribers in that region.

Root cause: A firmware update pushed to OCAs contained a memory leak that was only triggered under sustained high-throughput conditions. The bug was not caught in staging because test environments never replicated the sustained load of peak hours.

Impact: Subscribers experienced persistent buffering and playback failures. Support tickets flooded in from the affected region, and Netflix’s NOC team took over 3 hours to identify the root cause as a firmware issue rather than a network connectivity problem.

Lesson learned: Test firmware updates under production-representative sustained load conditions. Implement progressive rollout with automatic rollback if error rates spike. Build observability into the OCA management plane to detect anomalies faster.

Scenario 2: Cassandra Batch Statement Timeout Cascade

What happened: A change in Netflix’s Cassandra batch statement handling caused cascading timeouts across the playback history service, resulting in incomplete viewing history for users and data loss during the incident window.

Root cause: The migration from individual statements to batch statements unintentionally increased query execution time. Under peak load, query latency exceeded the application timeout threshold, triggering retries that compounded the load on the already-stressed cluster.

Impact: Users who watched content during the incident window had gaps in their viewing history. The issue affected the “Continue Watching” feature, which relies on accurate playback history, causing frustration for impacted subscribers.

Lesson learned: Always benchmark query performance changes at production scale before deploying schema or query pattern changes. Set conservative timeout values that account for cluster stress conditions. Implement circuit breakers for database calls to prevent cascade failures.
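The circuit breaker mentioned in that lesson can be sketched as follows. This is a generic, synchronous version with hypothetical names and thresholds; a production client would typically be async and add metrics and per-host state.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call (half-open)
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on too many failures, or re-open if the trial call failed
            if (self._failures >= self.failure_threshold
                    or self._opened_at is not None):
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling retries onto a cluster that is already struggling, which is the cascade described in this scenario.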

Scenario 3: Zuul Filter Regression Causing Misrouted Responses

What happened: A regression introduced in a Zuul filter update caused incorrect routing for a subset of API requests, resulting in users receiving content intended for other accounts.

Root cause: The filter’s route lookup logic had a subtle regression that only manifested when the number of available backend instances crossed a specific threshold. The condition was not covered by existing integration tests.

Impact: A small number of users briefly saw content belonging to other accounts — a serious data confidentiality incident. Netflix’s incident response team immediately rolled back the filter and initiated a security investigation.

Lesson learned: Implement integration tests that cover edge cases in routing logic, including boundary conditions. Deploy routing filters with canary analysis that monitors error rates and data integrity metrics. Ensure all routing changes undergo security review.

Quick Recap

Netflix’s Open Connect CDN places servers at ISP locations worldwide.
Adaptive bitrate streaming selects quality based on bandwidth and buffer health.
DRM (Widevine, PlayReady, FairPlay) protects content on each platform.
Entitlement service enforces subscription tier and concurrent stream limits.
Session sync allows “continue watching” across devices.

Copy/Paste Checklist

- [ ] Implement ABR with buffer-aware quality selection
- [ ] Use multi-CDN with failover
- [ ] Cache CDN responses aggressively
- [ ] Enforce concurrent stream limits per subscription
- [ ] Implement session sync for continue watching
- [ ] Monitor stream startup time and rebuffer ratio
- [ ] Test ABR on real networks, not just labs

Observability Checklist

Metrics to Capture

stream_startup_time_seconds (histogram) - Time to first frame
bitrate_selected (histogram) - Quality distribution
rebuffer_ratio (gauge) - Time spent rebuffering vs playing
cdn_cache_hit_ratio (gauge) - Cache efficiency
license_request_latency_ms (histogram) - DRM overhead
concurrent_streams (gauge) - Active stream count

Alerts to Configure

Alert	Threshold	Severity
Startup time P99 > 3s	3000ms	Warning
Rebuffer ratio > 5%	5%	Warning
CDN cache hit < 90%	90%	Warning
License latency P99 > 200ms	200ms	Critical
Active streams < expected	< 50% baseline	Warning
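As an illustration of how the startup-time alert could be evaluated, here is a sketch that uses a nearest-rank percentile over raw samples. The function names are hypothetical, and a real monitoring stack would normally compute P99 from histogram buckets instead.

```python
import math


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def startup_alert(startup_times_ms, threshold_ms: float = 3000.0) -> bool:
    """Fire when P99 startup time exceeds the threshold from the table above."""
    return percentile(startup_times_ms, 99) > threshold_ms
```

With 100 samples, a single slow start does not trip the alert, but two slow starts do, because the 99th-ranked sample then exceeds the threshold.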

Security Checklist

DRM encryption for all premium content
Device attestation before issuing licenses
HDCP enforcement for high-definition outputs
Screen capture detection and blocking
Concurrent stream enforcement
Geographic restrictions per title
Secure token exchange for session management
Content signing to prevent tampering

Interview Questions

1. Why does Netflix build its own CDN (Open Connect) instead of using existing CDNs?

Netflix streams billions of hours monthly. At that scale, commercial CDN costs become prohibitive. Open Connect appliances are custom-built for Netflix's workload (video streaming, not general web content). They are deployed at ISP facilities worldwide, placing content close to users while reducing Netflix's backbone costs. The economics only work at Netflix's scale.

2. How does adaptive bitrate streaming work?

Netflix encodes each title in multiple quality levels (4K, 1080p, 720p, etc.). The device downloads an HLS/DASH manifest listing available quality levels. The client measures download speed and buffer fullness. If buffer runs low, it switches to lower quality. If buffer is healthy and bandwidth is high, it switches up. This happens every few seconds during playback.

3. How does Netflix enforce concurrent stream limits?

The entitlement service tracks active playback sessions per user. When a user starts playback, it counts existing sessions. If the count exceeds the subscription tier limit (e.g., 4 streams for premium), the oldest session is terminated, and the device that was cut off receives an error with an upgrade prompt.

4. What is the difference between Widevine, PlayReady, and FairPlay?

These are DRM schemes for different platforms. Widevine (Google) runs on Android, Chrome, most smart TVs. PlayReady (Microsoft) runs on Windows, Xbox, some smart TVs. FairPlay (Apple) runs on iOS, Safari, Apple TV. Netflix negotiates with the device to select the highest security level the device supports. Content is encrypted once and licenses are delivered via the device's preferred DRM.

5. How does Netflix handle the "continue watching" feature across devices?

You're watching a show on your laptop, switch to your phone, and it resumes from roughly where you left off. This seems simple but requires coordination.

Netflix creates a playback session when you start watching. As you watch, position updates go to EVCache for low-latency access and asynchronously persist to Cassandra. When you open Netflix on another device, the system looks for your active sessions across all devices and resumes from whichever has the most recent position.

The subtlety: position updates are sent periodically, so your phone might resume up to 30 seconds before where you actually left off on the laptop, but that's usually fine. Users expect continuity, not precision.

6. What is the role of EVCache in Netflix's architecture?

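EVCache is Netflix's distributed in-memory cache, built on memcached and tuned for their workload. For read-heavy data it sits in front of Cassandra, giving sub-millisecond latency for hot data: writes go to Cassandra and EVCache is populated on read. Latency-sensitive state such as playback position is the exception; it is written to EVCache first and persisted to Cassandra asynchronously.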

What's interesting is what they cache: playback sessions, recommendations for active users, and anything else that's read frequently but doesn't change constantly. The cache TTLs are short by design — freshness matters more than cache hit rates for this data.

7. How does Netflix's encoding pipeline work and why is it important?

Netflix doesn't stream one video file — they encode each title in dozens of quality variants (different resolutions, bitrates, codecs). A new release might get encoded in 4K HDR, 1080p, 720p, 480p, and lower, with both H.264 and H.265 codecs. That's dozens of files per title.

The pipeline prioritizes based on predicted popularity. A show expected to be popular gets encoded faster and pre-positioned to CDN edges before release. This is why Netflix can release a new season globally at midnight and have it streaming immediately — the encoding work is done in advance.

8. What are the trade-offs between using a custom CDN versus a third-party CDN?

Netflix built Open Connect because at their scale, commercial CDN pricing would have been astronomical. Owning the hardware means lower per-TB costs at massive volume. They can also optimize specifically for video streaming — the hardware, the software, the ISP partnerships, all tuned for their workload.

The cost is operational complexity. Netflix has to build, deploy, and maintain servers in ISP facilities worldwide. That's a huge engineering investment. For most companies, a third-party CDN like Cloudflare or Fastly is the right call — you get global reach without building your own infrastructure.

A hybrid approach (some self-hosted capacity, some third-party CDN) is common for companies at Netflix's scale, but Netflix itself does not use it for video delivery.

9. How does Netflix's recommendation system handle cold start for new users?

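New users have no viewing history, so collaborative filtering can't work yet. Netflix falls back to contextual signals: what device you're on, what time of day, your geographic region. During signup it also asks new members to pick a few titles they like, which seeds the first recommendations; members who skip that step start with a diverse, popular set of titles. Netflix states that it does not use demographic information such as age or gender in these decisions.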

The personalization ramps up as you watch more. The first few recommendations are generic; by the time you've watched a dozen titles, Netflix has enough signal to make meaningful personalized suggestions.

What's worth noting: Netflix's recommendations are famously driven by their ML models, but the cold-start problem means new users see more editorial and popularity-based picks until the models have enough data.

10. Why does Netflix use Cassandra instead of a traditional RDBMS for metadata storage?

At millions of queries per second across global regions, a traditional sharded MySQL setup hits limits: cross-shard queries become expensive, adding capacity means resharding, and failover is complicated. Cassandra avoids these problems by being designed for horizontal scale from the start.

The data model maps naturally to Cassandra's partition keys. "Give me all episodes for show X" is a partition lookup, not a scan. "Give me titles sorted by popularity within drama" uses Cassandra's clustering columns. The query patterns align with how the data is stored.

Cassandra's tunable consistency also matters. Netflix can choose eventual consistency for reads where freshness doesn't matter (view counts, recommendations) and stronger consistency where it does (billing, entitlements).
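The arithmetic behind tunable consistency is short: a read is guaranteed to overlap the latest write when the number of replicas written plus the number read exceeds the replication factor. The helper names below are hypothetical; the rule itself is the standard quorum condition.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a QUORUM read or write."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap the most recent write set."""
    return write_replicas + read_replicas > replication_factor
```

With a replication factor of 3, QUORUM writes and QUORUM reads (2 + 2 > 3) give the stronger guarantee suited to entitlements, while ONE and ONE (1 + 1) does not, which is acceptable for view counts.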

11. How does the ABR algorithm prevent quality "flutter" and maintain stable playback?

Quality flutter happens when the ABR algorithm switches quality too frequently, creating a jarring viewing experience. Netflix addresses this through several mechanisms:

Stability windows: Once you switch to a quality level, stay there for at least 30 seconds before considering another switch.
Weighted throughput averaging: Recent samples are weighted higher than older ones to respond quickly to actual changes while avoiding noise from temporary fluctuations.
Safety margins: Use 70-80% of measured throughput for quality decisions to account for variability in network conditions.
State-based decisions: Different logic for startup vs. steady-state vs. buffer-depleted scenarios.

12. Design the multi-CDN failover system for video streaming.

Key components for multi-CDN failover:

Health monitoring: Each CDN reports latency, error rates, and throughput continuously.
Failover triggers: Automatic switch when primary CDN returns errors, latency exceeds threshold, or success rate drops below SLA.
Client-side logic: Devices retry with exponential backoff, then switch to next CDN in priority order.
Consistent hashing: Map each segment to the same CDN and cache node on every request, so repeat requests hit a warm cache instead of pulling duplicate data from multiple sources.
Origin fallback: Last resort is direct origin streaming if all CDNs fail.
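The client-side part of this design can be sketched as a loop over CDNs in priority order with exponential backoff between retries. The `fetch` callable, the retry count, and the delays are all assumptions for illustration; origin fallback would simply be the last entry in the list.

```python
import time


def fetch_with_failover(path, cdns, fetch, retries_per_cdn=2,
                        base_delay_s=0.2, sleep=time.sleep):
    """Try each CDN in priority order; back off exponentially between retries."""
    last_error = None
    for cdn in cdns:
        for attempt in range(retries_per_cdn):
            try:
                return cdn, fetch(cdn, path)
            except OSError as exc:  # network-level failure
                last_error = exc
                if attempt < retries_per_cdn - 1:
                    sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError(f"all CDNs failed for {path}") from last_error
```

The function returns which CDN served the segment, so the player can keep using it for subsequent requests instead of probing the failed one again.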

13. How would you design the encoding pipeline to handle a new highly-anticipated show release?

Preparation starts days before release:

Priority encoding queue: Predict popularity based on marketing signals, pre-encode high-priority titles first.
All quality levels up front: Encode 4K, 1080p, 720p, 480p, etc., with both H.264 and H.265 codecs before release.
CDN pre-positioning: Push encoded content to ISP-level Open Connect appliances before midnight UTC.
Capacity headroom: Encoding farms scale horizontally, allocate extra capacity for expected burst.
License server scaling: Pre-scale license server fleet to handle 10x normal request volume.

14. Explain how Netflix handles geographic content restrictions and licensing rights.

Geographic restrictions are enforced at multiple layers:

Entitlement service: Before issuing a DRM license, checks if the user's account has rights for the requested geographic region.
Content metadata: Each title has geographic availability windows tied to licensing agreements.
IP geolocation: The client's IP address determines its region; requests for titles not licensed there return an error before any segment is delivered.
CDN-level blocking: Some CDNs can enforce geographic restrictions at edge nodes before requests hit origin.
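The metadata check in this list can be sketched as a lookup against per-country availability windows. The data shape and names are hypothetical, and the country is assumed to have been resolved from the client's IP address already.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AvailabilityWindow:
    country: str  # ISO 3166-1 alpha-2 code, e.g. "US"
    start: date   # first day of the licensing window
    end: date     # last day of the licensing window (inclusive)


def is_available(windows, country: str, on: date) -> bool:
    """True when some window covers this country on the given date."""
    return any(
        w.country == country and w.start <= on <= w.end for w in windows
    )
```

The entitlement service would run a check like this before issuing a DRM license, so a restricted title fails early rather than partway through playback.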

15. How does the recommendation system balance exploration vs. exploitation?

Netflix uses multi-armed bandit approaches and A/B testing:

Exploitation: Show titles similar to what you've watched (high confidence predictions).
Exploration: Occasionally show new or unpredictable titles to gather data on user preferences.
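Epsilon-greedy: Use the best predictions most of the time, and pick at random with a small probability epsilon.
Thompson sampling: Bayesian approach that samples from each title's estimated success distribution, so uncertain titles are explored in proportion to their chance of being the best.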
Contextual bandits: Consider time of day, device type, and other context when deciding what to explore.
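Two of these strategies fit in a few lines each. The sketch below uses textbook formulations with hypothetical function names and inputs (predicted scores, and play/skip counts per title); it is not a description of Netflix's actual recommender.

```python
import random


def epsilon_greedy(predicted_scores: dict, epsilon: float = 0.1, rng=random) -> str:
    """Return the top-scored title most of the time, a random one otherwise."""
    if not predicted_scores:
        raise ValueError("no candidate titles")
    if rng.random() < epsilon:
        return rng.choice(sorted(predicted_scores))  # explore
    return max(predicted_scores, key=predicted_scores.get)  # exploit


def thompson_pick(stats: dict, rng=random) -> str:
    """stats maps title -> (plays, skips); sample a Beta posterior per title."""
    draws = {
        title: rng.betavariate(plays + 1, skips + 1)
        for title, (plays, skips) in stats.items()
    }
    return max(draws, key=draws.get)
```

In the Thompson version, a title with few observations has a wide posterior, so it occasionally draws a high value and gets shown; as evidence accumulates, its draws settle near its true play rate.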

16. Design the session sync mechanism for "continue watching" across devices.

The session sync system requires:

Active session tracking: When playback starts, create a session in EVCache with device ID, timestamp, and position.
Cross-device position queries: When opening the app, query all active sessions for this user, find most recent position across any device.
Async Cassandra persistence: Position updates go to Cassandra in background for durability.
Conflict resolution: If two devices report positions within seconds of each other, use the later timestamp.
Session expiration: Sessions expire after heartbeat timeout (e.g., 30 minutes of no activity).
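The expiration and conflict-resolution rules above can be sketched together: drop sessions whose heartbeat is older than the timeout, then let the latest timestamp win. The snapshot type is hypothetical, and timestamps are plain Unix seconds to keep the example self-contained.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_S = 30 * 60  # the 30-minute example timeout from the list


@dataclass
class SessionSnapshot:
    device_id: str
    position_seconds: int
    last_heartbeat: float  # Unix timestamp of the latest position update


def resume_position(sessions, now: float, default: int = 0) -> int:
    """Pick the position from the most recently updated live session."""
    live = [
        s for s in sessions if now - s.last_heartbeat <= HEARTBEAT_TIMEOUT_S
    ]
    if not live:
        return default  # caller would fall back to the durable store here
    return max(live, key=lambda s: s.last_heartbeat).position_seconds
```

Last-write-wins is simple but relies on reasonably synchronized clocks; using server-side receive times instead of device clocks is one common way to avoid skew.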

17. How does Netflix handle the case where a user's playback session needs to be terminated due to stream limit?

The enforcement process:

Stream counting: Before starting a new session, count active sessions for this user.
Limit checking: Compare count against subscription tier limit (Basic=1, Standard=2, Premium=4).
Oldest termination: If at limit, find the session with the earliest start time (or oldest heartbeat).
Graceful notification: Send termination message to the oldest device showing an error with upgrade prompt.
State cleanup: Remove the terminated session from both EVCache and Cassandra.

18. What metrics would you track to ensure a streaming service is healthy?

Key streaming health metrics:

Stream startup time: Time from click to first frame. Target P99 < 3 seconds.
Rebuffer ratio: Time spent waiting vs. time playing. Target < 5%.
Bitrate distribution: Histogram of quality levels selected. Higher is better but must not cause rebuffering.
CDN cache hit ratio: Target > 90% at edge caches.
License request latency: DRM overhead. Target P99 < 200ms.
Error rates: By error type (network, DRM, device).
Concurrent streams gauge: Actual vs. expected for capacity planning.

19. How would you design a system to detect and prevent account sharing beyond allowed streams?

Detection mechanisms:

Location tracking: If two sessions come from IP geolocations too far apart for one person to travel between in the elapsed time (e.g., different continents within minutes), flag the account.
Device fingerprinting: Track device characteristics to identify if multiple devices are being used by different households.
Behavioral patterns: Different viewing patterns (different times of day, different genres) might indicate shared accounts.
Explicit enforcement: When stream limit is hit, require email/SMS verification to confirm the primary user.
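The location check is often called an impossible-travel test. Below is a sketch using the haversine great-circle distance; the 900 km/h speed cap (roughly airliner speed) and the tuple layout are assumptions, and IP geolocation error would need extra tolerance in practice.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(a, b, max_speed_kmh: float = 900.0) -> bool:
    """a and b are (lat, lon, unix_seconds) tuples for two sessions."""
    distance = haversine_km(a[0], a[1], b[0], b[1])
    hours = abs(b[2] - a[2]) / 3600.0
    if hours == 0:
        return distance > 0
    return distance / hours > max_speed_kmh
```

A flag from this check is a signal rather than proof, since VPNs and mobile carrier routing can both make one person appear to be in two places.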

20. Explain how Netflix's Open Connect Appliances work and why they're placed at ISPs.

Open Connect is Netflix's custom CDN hardware:

Purpose-built hardware: Servers optimized for video streaming workloads, not general web content.
ISP co-location: Appliances live in ISP data centers, directly connected to ISP networks.
Benefits: Reduces Netflix's backbone traffic (users get content from local ISP), lowers latency for users, reduces costs at massive scale.
Content placement: Popular content is pre-positioned based on viewing patterns. New releases are pushed before release date.
ISP benefits: Reduced upstream bandwidth costs, better quality for their subscribers.

Conclusion

Netflix’s architecture handles global streaming at scale:

Open Connect CDN places content at ISP facilities worldwide
Adaptive bitrate streaming optimizes quality for each connection
Microservices enable independent scaling and deployment
Recommendation algorithms drive content discovery
Multi-region deployment for global reach