System Design: Twitter Feed Architecture and Scalability

Deep dive into Twitter system design. Learn about feed generation, fan-out, timeline computation, search, notifications, and scaling challenges.

published: March 22, 2026 reading time: 38 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Twitter's timeline uses a hybrid push/pull approach: regular users' tweets fan out to cached timelines on write, while celebrity tweets get pulled on read to avoid fan-out storms for high-follower accounts. Tweet IDs encode timestamps via Snowflake, giving chronological order without clock synchronization across machines. The system handles 500M users through multi-layer caching (L1 Memcached, L2 Redis), async fan-out via Kafka, Elasticsearch for search, and notifications split into real-time (mentions) and batched (likes, follows) paths.

System Design: Twitter Feed Architecture and Scalability

Twitter handles millions of users posting billions of tweets. The core challenge is not just storing tweets but delivering them to followers in real-time. This case study examines how to design a Twitter-like social media platform.

This is an advanced system design topic. If you are new to distributed systems, start with our Database Scaling and Caching Strategies guides.

Introduction

The Twitter timeline architecture shows what it takes to build a system that’s simultaneously read-heavy, write-heavy, and real-time at scale. This case study covers data modeling, production hardening, and the tradeoffs in between.

Requirements Analysis

Functional Requirements

Users should be able to:

Post tweets (text, images, videos)
Follow/unfollow other users
View a timeline of tweets from followed users
Search for tweets and users
Like, retweet, and reply to tweets
Receive notifications for interactions

Non-Functional Requirements

The system needs:

Timeline latency under 200ms for 99th percentile
Support for 500 million users
Handle 200 million daily active users
Process 500 million tweets per day (6,000 tweets per second peak)
99.99% availability

Data Models

Core Entities

-- Users table
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(15) NOT NULL UNIQUE,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    display_name VARCHAR(50),
    bio TEXT,
    avatar_url VARCHAR(500),
    follower_count BIGINT DEFAULT 0,
    following_count BIGINT DEFAULT 0,
    verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Tweets table
CREATE TABLE tweets (
    id BIGSERIAL PRIMARY KEY,
    author_id BIGINT NOT NULL REFERENCES users(id),
    parent_id BIGINT REFERENCES tweets(id),  -- For replies
    content TEXT NOT NULL,
    media_urls JSONB DEFAULT '[]',
    retweet_of BIGINT REFERENCES tweets(id),
    like_count BIGINT DEFAULT 0,
    retweet_count BIGINT DEFAULT 0,
    reply_count BIGINT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    deleted BOOLEAN DEFAULT FALSE,

    INDEX idx_tweets_author (author_id),
    INDEX idx_tweets_created (created_at DESC)
);

-- Follows table
CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)
);

-- Likes table
CREATE TABLE likes (
    user_id BIGINT NOT NULL REFERENCES users(id),
    tweet_id BIGINT NOT NULL REFERENCES tweets(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    PRIMARY KEY (user_id, tweet_id)
);

NoSQL Alternative

For extreme write throughput, use a wide-column store:

{
  "table": "tweets",
  "key": "tweet_id",
  "columns": {
    "author_id": "bigint",
    "content": "text",
    "created_at": "timestamp",
    "media": ["list of urls"]
  },
  "clustering": ["created_at DESC"]
}

Timeline Architecture

The timeline is where Twitter’s architecture gets interesting. There are two main approaches: push (fan-out on write) and pull (fan-out on read).

Push Model (Fan-out on Write)

When a user posts a tweet, push it to all followers’ timelines immediately:

sequenceDiagram
    participant U as User Posts Tweet
    participant API as API Server
    participant FF as Fan-out Service
    participant Q as Message Queue
    participant TM as Timeline Cache

    U->>API: POST /tweets {content: "Hello"}
    API->>DB: Store tweet
    DB-->>API: Tweet created
    API->>FF: Fan-out tweet
    FF->>DB: Get followers (10K)
    FF->>Q: Enqueue fan-out jobs
    Q->>TM: Push to each timeline cache

This approach provides fast reads but expensive writes. For celebrities with millions of followers, fan-out becomes problematic.

Pull Model (Fan-out on Read)

When a user loads their timeline, aggregate tweets from all followed users:

sequenceDiagram
    participant U as User Views Timeline
    participant API as API Server
    participant TM as Timeline Cache
    participant DB as Tweet DB
    participant MR as Merge & Rank

    U->>API: GET /timeline
    API->>TM: Get cached timeline IDs
    TM-->>API: Timeline IDs (200 tweets)
    API->>MR: Request tweet details
    MR->>DB: Batch fetch tweets
    DB-->>MR: Tweet objects
    MR-->>API: Enriched tweets
    API-->>U: Timeline response

Hybrid approach combines both: pull for celebrities, push for regular users.

Feed Generation

Timeline Consistency Model

The hybrid push/pull architecture introduces a fundamental consistency challenge: how do you merge pre-computed tweets (pushed) with on-demand tweets (pulled) while maintaining chronological order?

The Problem

When a user loads their timeline, they receive:

Pre-computed tweets from regular users (cached in timeline store)
On-demand tweets from celebrities (fetched at read time)

These two sources arrive separately and must be merged. The merge must preserve reverse-chronological order without comparing actual timestamps.

The Solution: Snowflake ID Sorting

Snowflake IDs provide a self-ordering property. Each ID encodes:

41 bits: timestamp (milliseconds since Twitter epoch)
5 bits: datacenter ID
5 bits: worker ID
12 bits: sequence number

Since datacenter clocks are synchronized and IDs increment within each datacenter, ID ordering correlates directly with time ordering. The merge algorithm:

def merge_timeline(pushed_tweets, pulled_tweets, limit=200):
    # Both lists pre-sorted by Snowflake ID descending
    merged = []
    p_idx, pu_idx = 0, 0

    while len(merged) < limit:
        p_tweet = pushed_tweets[p_idx] if p_idx < len(pushed_tweets) else None
        pu_tweet = pulled_tweets[pu_idx] if pu_idx < len(pulled_tweets) else None

        if p_tweet is None:
            merged.append(pulled_tweets[pu_idx])
            pu_idx += 1
        elif pu_tweet is None:
            merged.append(pushed_tweets[p_idx])
            p_idx += 1
        elif p_tweet.id > pu_tweet.id:  # ID encodes time
            merged.append(p_tweet)
            p_idx += 1
        else:
            merged.append(pu_tweet)
            pu_idx += 1

    return merged

Consistency Guarantees:

Eventual consistency for pushed tweets: Due to async fan-out, there’s a small window where a new tweet exists but hasn’t been pushed to all timelines
Strong consistency for pulled tweets: Celebrity tweets are fetched directly from DB at read time
No duplicate tweets: The timeline service deduplicates by tweet ID during merge
No ordering anomalies: Snowflake IDs eliminate clock synchronization issues

Cache Invalidation Strategy:

When a user unfollows someone, their cached timeline retains old tweets from that user until TTL expiry. This is acceptable because:

The user explicitly chose to unfollow
The stale data naturally ages out
Forced cache invalidation would be expensive

For scenarios requiring stronger guarantees (like blocking), use write-through invalidation: when you block a user, explicitly remove their tweets from your timeline cache.

Timeline Service

class TimelineService:
    def __init__(self, cache: RedisCluster, db: Database):
        self.cache = cache
        self.db = db

    async def get_timeline(
        self,
        user_id: int,
        limit: int = 200,
        cursor: str = None
    ) -> TimelineResponse:
        # Try cache first
        cached = await self.cache.get(f"timeline:{user_id}")
        if cached and not cursor:
            return self._parse_cached_timeline(cached)

        # Fetch from multiple sources
        following = await self._get_following(user_id)
        tweets = await self._fetch_tweets(user_id, following, limit, cursor)

        return TimelineResponse(
            tweets=tweets,
            cursor=self._generate_cursor(tweets)
        )

    async def _fetch_tweets(
        self,
        user_id: int,
        following: List[int],
        limit: int,
        cursor: str
    ) -> List[Tweet]:
        # Split into celebrities and regular users
        celebrities, regular = await self._categorize_users(following)

        # Fetch in parallel
        celebrity_tweets = await self._fetch_celebrity_tweets(celebrities, limit)
        regular_tweets = await self._fetch_regular_tweets(
            user_id, regular, limit - len(celebrity_tweets), cursor
        )

        # Merge and rank
        merged = self._merge_tweets(celebrity_tweets, regular_tweets)
        return self._rank_tweets(merged)[:limit]

Caching Timeline

Cache timelines with user-specific keys:

CACHE_CONFIG = {
    "timeline_ttl": 3600,        # 1 hour
    "max_timeline_size": 800,    # Store 800 tweets, return 200
    "min_followers_for_fanout": 10000  # Celebrity threshold
}

async def cache_timeline(user_id: int, tweets: List[Tweet]):
    key = f"timeline:{user_id}"
    serialized = serialize_tweets(tweets)

    await self.cache.setex(
        key,
        CACHE_CONFIG["timeline_ttl"],
        serialized
    )

Fan-out Service

The fan-out service distributes tweets to follower timelines:

High-Follower Handling

class FanoutService:
    async def fanout_tweet(self, tweet: Tweet):
        author_id = tweet.author_id
        follower_count = await self.db.fetch_val(
            "SELECT follower_count FROM users WHERE id = $1",
            author_id
        )

        if follower_count > CACHE_CONFIG["min_followers_for_fanout"]:
            # Celebrity: skip fan-out, users pull on read
            await self._mark_tweet__celebrity(tweet)
        else:
            # Regular user: fan-out to all followers
            await self._fanout_to_followers(tweet)

    async def _fanout_to_followers(self, tweet: Tweet):
        async with self.db.pool.acquire() as conn:
            async for row in conn.cursor(
                "SELECT follower_id FROM follows WHERE following_id = $1",
                tweet.author_id
            ):
                follower_id = row["follower_id"]
                await self._push_to_timeline(follower_id, tweet)

    async def _push_to_timeline(self, user_id: int, tweet: Tweet):
        key = f"timeline:{user_id}"
        tweet_preview = {
            "id": tweet.id,
            "author_id": tweet.author_id,
            "created_at": tweet.created_at.isoformat()
        }

        # Add to front of list
        await self.cache.lpush(key, serialize_tweet_preview(tweet_preview))
        # Trim to max size
        await self.cache.ltrim(key, 0, CACHE_CONFIG["max_timeline_size"] - 1)

Search Architecture

Twitter search requires full-text search across billions of tweets.

Elasticsearch Mapping

{
  "mappings": {
    "properties": {
      "tweet_id": { "type": "long" },
      "author_id": { "type": "long" },
      "content": {
        "type": "text",
        "analyzer": "twitter_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "hashtags": { "type": "keyword" },
      "mentions": { "type": "keyword" },
      "created_at": { "type": "date" },
      "engagement_score": { "type": "integer" }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "twitter_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  }
}

Search Query

async def search_tweets(
    query: str,
    limit: int = 20,
    since_id: str = None,
    max_id: str = None
) -> SearchResponse:
    must_clauses = [
        {"match": {"content": query}}
    ]

    filter_clauses = [
        {"range": {"created_at": {"gte": "2020-01-01"}}}
    ]

    if since_id:
        filter_clauses.append({"range": {"tweet_id": {"gt": int(since_id)}}})

    result = await es.search(
        index="tweets",
        body={
            "query": {
                "bool": {
                    "must": must_clauses,
                    "filter": filter_clauses
                }
            },
            "sort": [
                {"_score": "desc"},
                {"tweet_id": "desc"}
            ],
            "size": limit
        }
    )

    return SearchResponse(
        tweets=[Tweet(**hit["_source"]) for hit in result["hits"]["hits"]],
        count=result["hits"]["total"]["value"]
    )

Notification System

Notification Types

Type	Trigger	Delivery
Mention	@username in tweet	Immediate
Reply	Reply to user’s tweet	Immediate
Like	Someone likes your tweet	Batch (hourly)
Follow	Someone follows you	Batch (hourly)
Retweet	Someone retweets your tweet	Batch (hourly)

Notification Pipeline

graph LR
    A[Tweet Event] --> B[Event Router]
    B --> C[Real-time Queue]
    B --> D[Batch Queue]
    C --> E[Push Service]
    C --> F[WebSocket]
    D --> G[Batch Processor]
    G --> H[Email Service]
    G --> I[Push Notifications]

Notification Service

class NotificationService:
    def __init__(self, queue: KafkaProducer, db: Database):
        self.queue = queue
        self.db = db

    async def notify_mention(self, tweet: Tweet, mentioned_users: List[int]):
        for user_id in mentioned_users:
            await self.queue.send("notifications", {
                "type": "mention",
                "user_id": user_id,
                "tweet_id": tweet.id,
                "author_id": tweet.author_id,
                "created_at": datetime.utcnow().isoformat()
            })

    async def notify_followers(self, follower_ids: List[int], event: Dict):
        batch = {
            "type": "follow",
            "user_ids": follower_ids,
            "actor_id": event["actor_id"],
            "created_at": datetime.utcnow().isoformat()
        }
        await self.queue.send_batch("notifications-batch", [batch])

Caching Patterns

Multi-Layer Caching

CACHE_LAYERS = [
    {"name": "tweet", "ttl": 86400, "size": "16MB"},
    {"name": "timeline", "ttl": 3600, "size": "100MB"},
    {"name": "user", "ttl": 900, "size": "16MB"},
    {"name": "social-graph", "ttl": 300, "size": "50MB"}
]

Cache-Aside for Tweets

async def get_tweet(tweet_id: int) -> Optional[Tweet]:
    # L1: Memcached
    cached = await mc.get(f"tweet:{tweet_id}")
    if cached:
        return Tweet(**json.loads(cached))

    # L2: Redis cluster
    cached = await redis.get(f"tweet:{tweet_id}")
    if cached:
        tweet = Tweet(**json.loads(cached))
        await mc.set(f"tweet:{tweet_id}", json.dumps(tweet.dict()))
        return tweet

    # Database
    tweet = await db.fetch_one(
        "SELECT * FROM tweets WHERE id = $1", tweet_id
    )

    if tweet:
        await redis.setex(f"tweet:{tweet_id}", 3600, json.dumps(tweet))
        await mc.set(f"tweet:{tweet_id}", json.dumps(tweet.dict()))

    return tweet

Write Path Optimization

Write-Behind Cache

async def post_tweet(user_id: int, content: str) -> Tweet:
    # Write to database
    tweet = await db.create(
        "INSERT INTO tweets (author_id, content) VALUES ($1, $2) RETURNING *",
        user_id, content
    )

    # Update timeline cache asynchronously
    asyncio.create_task(self._update_timeline_caches(user_id, tweet))

    # Index for search asynchronously
    asyncio.create_task(self._index_tweet(tweet))

    return tweet

Tweet ID Generation

Use snowflake IDs for tweet ordering:

class SnowflakeID:
    def __init__(self, datacenter_id: int = 0, worker_id: int = 0):
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.timestamp = 0
        self.sequence = 0

    def generate(self) -> int:
        timestamp = int(time.time() * 1000) - 1609459200000  # Epoch offset

        if timestamp == self.timestamp:
            self.sequence = (self.sequence + 1) & 4095
        else:
            self.sequence = 0

        self.timestamp = timestamp

        return (
            (timestamp << 22) |
            (self.datacenter_id << 17) |
            (self.worker_id << 12) |
            self.sequence
        )

API Design

Core Endpoints

Method	Endpoint	Description
POST	/api/v2/tweets	Create tweet
DELETE	/api/v2/tweets/:id	Delete tweet
GET	/api/v2/users/:id/timeline	Get user timeline
GET	/api/v2/timelines/home	Get home timeline
GET	/api/v2/tweets/search	Search tweets
POST	/api/v2/users/:id/follow	Follow user
DELETE	/api/v2/users/:id/follow	Unfollow user

Response Format

{
  "data": {
    "id": "1234567890",
    "text": "Hello, Twitter!",
    "author_id": "9876543210",
    "created_at": "2026-03-22T10:30:00Z",
    "like_count": 1523,
    "retweet_count": 342,
    "reply_count": 89
  },
  "meta": {
    "result_count": 1,
    "next_token": "b26v89c19zqg8o3fosdk7n9l"
  }
}

Scalability Challenges

The Celebrity Problem

Users like @elonmusk have millions of followers. A single tweet from a celebrity with 10M followers generates 10M timeline cache write operations if you fan out. At 6,000 tweets per second peak, a celebrity posting during a busy event can instantly overwhelm the fan-out queue and cascade into timeline delays for everyone.

The hybrid push/pull model solves this at the decision point. When a tweet arrives, the fan-out service checks the author’s follower count. Above a configurable threshold (usually 10,000 followers), the tweet skips fan-out entirely. Followers see it when they load their timeline and the system pulls recent tweets from accounts they follow. The threshold itself is tunable. Set it too low and celebrities still cause bursts. Set it too high and regular users with mid-sized followings get slow timeline updates.

Rate limiting adds another layer of protection. Celebrity accounts get stricter per-window rate limits on tweet publication. This prevents coordinated posting bursts from high-profile accounts during events like product launches or breaking news. The limit is a sliding window counter stored in Redis, keyed by author_id with a tier-based cap.

The CDN angle matters for media content. When a celebrity posts a video or image, the media is uploaded to a CDN edge cache before the tweet is published. Followers pulling the tweet at read time fetch media from the nearest CDN node instead of hitting Twitter’s media servers. This offloads the bandwidth spike that would otherwise land on origin infrastructure.

Hot Keys

Popular tweets create hot spots in the database. When a tweet goes viral, every like, retweet, and reply hits the same database row or partition. If 100,000 people like a tweet within a minute, that partition handles about 1,700 writes per second for a single row. That partition becomes a bottleneck while others sit idle.

Sharding by hash(tweet_id) spreads hot tweets across partitions. The hash function maps each tweet_id to a random partition. A celebrity tweet ends up on a different partition than the author’s other tweets. This prevents one user’s popularity from overloading a single partition. The tradeoff is that range queries, like fetching all tweets from a specific user, become scatter-gather operations across multiple partitions.

Eventual consistency for like and retweet counts removes the write contention. Instead of updating the exact count on every action, the system batches increments in a counter service and flushes to the database periodically. Users see a slightly stale count but the database avoids lock contention. This is the same approach Twitter uses for follower counts. The actual count is computed from the likes table for accuracy when needed, while the denormalized counter handles the fast path.

Client-side rate limiting prevents abuse amplification. If a single script sends 10,000 likes per second for a tweet, the server limits that client before the hot key problem worsens. Rate limiting is applied per user_id and per IP, with stricter limits on actions that trigger notifications or timeline updates.

Write Amplification

Fan-out multiplies write load. Each tweet from a regular user with 1,000 followers generates 1,000 timeline cache writes. With 500 million tweets per day and millions of active users with followings in the hundreds or thousands, the write amplification factor can reach 100x or more. The system writes one tweet to the database and writes hundreds of cache entries for follower timelines.

Async fan-out via message queues decouples the write path from the fan-out path. The API server writes the tweet to the database and publishes a fan-out event to Kafka. A pool of fan-out workers consumes these events and writes to timeline caches. If a worker fails, the Kafka offset stays uncommitted and another worker retries. The queue itself acts as a buffer. During traffic spikes, the queue depth grows but no data is lost.

The hybrid push/pull model directly reduces amplification. Celebrity tweets skip fan-out entirely, removing millions of cache writes per tweet. Regular users with small followings get full fan-out. Between those extremes, the system uses a graduated approach. Users with 5,000 followers might get fan-out to active followers only. Users with 10K followers get pull-only treatment. The exact thresholds are configurable per deployment.

Periodic batch updates help with the long tail of inactive users. Instead of writing to every follower’s timeline cache immediately, the system writes to caches of recently active users first. Inactive users, those with no login in 30 days, get their timeline rebuilt on next login instead of kept fresh in cache. This cuts the fan-out volume significantly since most users are inactive on any given day.

Trade-off Analysis

Factor	Push (Fan-out on Write)	Pull (Fan-out on Read)	Hybrid Approach
Read latency	Low (pre-computed)	High (compute on read)	Medium
Write latency	High (fan-out to all)	Low (single write)	Medium
Memory usage	High (all timelines)	Low (compute on need)	Medium
Celebrity posts	Problematic (fan-out storm)	Efficient (pull)	Efficient
Consistency	Eventual (async fan-out)	Strong (read own)	Tunable
Implementation	Complex (queues, workers)	Simpler	Moderate

Production Failure Scenarios

Failure Scenario	Impact	Mitigation
Fan-out queue backlog	Timeline updates delayed for hours	Auto-scale fan-out workers; prioritize active users
Celebrity tweet viral	Massive fan-out burst crashes servers	Rate limit celebrity tweets; pre-compute popular timelines
Timeline cache miss	Cold start: timeline loads from DB, high latency	Pre-warm caches for trending users; use background refresh
Search index lag	New tweets not appearing in search	Accept search lag; prioritize by engagement
Hot tweet partition	Single partition overloaded by likes/retweets	Shard by tweet_id; use eventual consistency for counts
Tweet DB primary failure	Writes fail; no new tweets	Use multi-primary replication; promote read replica

Common Pitfalls / Anti-Patterns

Pitfall 1: Fanning Out to All Followers Synchronously

Problem: Synchronous fan-out on every tweet creates massive latency spikes for users with many followers.

The issue is straightforward math. A user with 1,000 followers posting once triggers 1,000 timeline cache writes before the API can respond. Those writes are sequential and blocking — each one has to complete before the next starts and before the user gets their success response. At 10ms per write, that’s 10 seconds of latency for a single tweet from a mid-sized account. Scale that to a user with 100,000 followers and you are looking at 1,000 seconds, or about 17 minutes, of blocked time. The API would time out before finishing.

The synchronous approach also saturates database connections. Each fan-out write holds a connection open. With thousands of concurrent users and multiple followers each, connection pool exhaustion becomes a real risk. The database might start rejecting new writes from other operations entirely, taking down not just timeline updates but tweet posting and user authentication too.

Solution: Use async fan-out via message queues. Return success to user immediately, fan-out in background.

The fix is to decouple the write path from the fan-out path. When a tweet arrives, the API writes it to the database and publishes a single event to a message queue. That event contains the tweet and the author ID. The API returns success to the user right away. A pool of background workers consumes queue events and performs the actual fan-out writes. If a worker crashes, Kafka redelivers the message — no data lost, no user-facing error. The queue acts as a buffer during traffic spikes. During a celebrity event, queue depth grows but the API stays responsive.

Pitfall 2: Not Handling the Celebrity Problem

Problem: A user with 10M followers causes fan-out storms that overwhelm the system.

The danger is that celebrity accounts sneak up on you. Most feature work targets the happy path: regular users posting, fans following, timelines updating. Nobody writes a ticket for “handle when someone with more followers than our entire old architecture can handle posts during a live event.” But at scale, this is the scenario that actually breaks production.

The failure mode is subtle. A celebrity posting does not immediately crash anything. The fan-out queue depth starts climbing. Workers fall behind. Timelines for everyone, not just followers of the celebrity, start getting stale because queue workers are saturated writing to celebrity timelines instead of processing regular users. The symptom is broad degradation, not a single account error.

Solution: Implement hybrid model. Skip fan-out for users above a follower threshold; their tweets are pulled on read.

Pitfall 3: Storing Full Tweets in Timeline Cache

Problem: Caching 800 tweets × 5KB each = 4MB per user timeline in cache.

The memory math is where it falls apart. A tweet object in memory is not just the 280 characters of text. It includes the author ID, timestamps, engagement counters (likes, retweets, replies), media URLs, and client-side hydration data. A realistic tweet object averages 3-5KB. Twitter stores 800 tweet IDs per timeline to serve 200 tweets on request (with buffer for dedup and pagination). That is 4MB per user timeline in cache.

With 200 million daily active users, even if only 50 million have cached timelines at any given time, you are looking at 200 terabytes of cache memory just for timelines. Add replication factor (Twitter typically runs 3x for redundancy) and you are at 600TB. At $50 per terabyte per month for Redis cluster hosting, that is $30,000 per month in timeline cache costs alone — before you account for tweet cache, user cache, and social graph cache.

The irony is that most of that data is already cached elsewhere. Tweet objects for popular content sit in a separate tweet cache layer. When a user loads their timeline, the system needs the full tweet objects anyway for rendering. Storing full tweets in the timeline cache does not save a database round-trip — it just duplicates data that already exists in a dedicated cache tier.

Solution: Store only tweet IDs and timestamps in timeline cache; fetch full tweet objects separately with multi-get.

The compact representation uses 16 bytes per tweet (8-byte tweet ID, 8-byte timestamp). A timeline of 800 tweets takes 12.8KB instead of 4MB — a 300x reduction. The timeline cache returns tweet IDs only, and the API fetches the full tweet objects with a single multi-get operation against the tweet cache layer. Tweet cache hits are fast (memory). Tweet cache misses fall through to the database. This pattern works because tweet objects for active content are almost always in cache — the working set of recent tweets is exactly what the tweet cache is sized for.

Pitfall 4: Ignoring Engagement Calculation Cost

Problem: Calculating “trending” or “ranked” timeline requires scanning engagement data on every read.

A ranked timeline is not just reverse-chronological. Twitter mixes in tweets based on engagement signals — likes, retweets, replies, and click-through rate. Computing this on every read means that every timeline request requires scanning recent tweets from every user you follow, scoring each one by engagement, and sorting. If you follow 500 people and each posts 10 tweets in the last hour, that is 5,000 tweets to score on every page load. At 35,000 timeline reads per second peak, you are looking at 175 million scoring operations per second.

The database is the bottleneck here. A scan of recent tweets requires hitting the database, not cache — engagement data lives in separate tables (likes, retweets) that would need to be joined at read time. That join is expensive and does not scale. Even with proper indices, scanning 5,000 rows per request at 35,000 QPS is roughly 175 million row scans per second against the database. That is enough to saturate even a well-tuned PostgreSQL cluster.

Solution: Pre-compute engagement scores; update asynchronously; use Redis sorted sets for real-time ranking.

The fix is to move the scoring work offline. A background job computes an engagement score for each tweet periodically (every 5-15 minutes) and stores it in a sorted set keyed by tweet_id with the score as the score. Redis sorted sets support efficient range queries — you can fetch the top K tweets by score in O(log N) time. When a timeline loads, the system scores tweets by their pre-computed engagement rather than computing on the fly.

The update frequency is tunable. Trending tweets change score quickly, so they need frequent updates (every 1-2 minutes). Most tweets see gradual score changes, so recomputing every 15 minutes is sufficient. The tradeoff is staleness — a tweet that goes viral between updates will be under-ranked temporarily. For most use cases, this is acceptable. If not, you can boost the score of tweets with recent engagement acceleration (a delta check) to catch viral content faster.

Real-world Failure Scenarios

Scenario 1: Twitter API Rate Limit Storm (2020)

What happened: During a major global event, Twitter’s API experienced a massive surge in requests that overwhelmed rate limiting infrastructure, causing cascading failures across third-party applications and internal services.

Root cause: Rate limiting tokens were stored in a single Redis cluster without geographic distribution. When the cluster experienced elevated latency, all token validation requests queued up, causing timeouts across the board.

Impact: Third-party Twitter clients became unusable for several hours. Internal services that relied on the Twitter API for real-time data also experienced degradation, affecting timeline generation and search functionality.

Lesson learned: Distribute rate limiting state across multiple Redis clusters with geographic proximity to API consumers. Implement circuit breakers that gracefully degrade rather than cascade when backends are slow.

Scenario 2: Tweet Storage Corruption

What happened: A bug in Twitter’s distributed tweet storage system caused partial data corruption for tweets containing certain Unicode characters, rendering them un retrievable and causing gaps in user timelines.

Root cause: The tweet compaction process did not properly handle multi-byte UTF-8 sequences in edge cases. Corrupted records were compacted and replaced the original, making the data permanently unavailable.

Impact: Affected users saw gaps in their timelines with no explanation. The bug persisted for several days before detection, and the corrupted data was unrecoverable.

Lesson learned: Implement checksums for all stored data. Never overwrite original records during compaction — write the new compacted record to a separate location and atomically update the pointer. Regular integrity checks on stored data are essential.

Scenario 3: Search Index Desynchronization

What happened: Twitter’s real-time search index fell out of sync with the primary tweet database after a network partition, causing newly posted tweets to be invisible in search results for up to 30 minutes.

Root cause: The search indexing pipeline used eventual consistency semantics. When the network partition occurred, the indexing queue accumulated a backlog that took significant time to drain after connectivity was restored.

Impact: Users searching for trending topics during the partition window found incomplete results, leading to complaints about “missing tweets” in search. The desynchronization was not visible to users who only browsed their timeline.

Lesson learned: For real-time search use cases, use synchronous indexing with acknowledgment before confirming the write to the user. Implement index health monitoring and alerts for queue depth and indexing lag metrics.

Quick Recap

Timeline delivery: Hybrid push/pull
Celebrity handling: Pull on read (skip fan-out)
Tweet ordering: Snowflake IDs (time-sortable)
Cache strategy: Multi-layer (L1 Memcached, L2 Redis)
Write path: Write-behind caching
Fan-out: Async via message queues
Search: Elasticsearch with custom analyzer
Notifications: Real-time (mentions) + batch (likes/follows)

Copy/Paste Checklist

- [ ] Implement hybrid push/pull for celebrity accounts
- [ ] Cache timeline IDs, not full tweet objects
- [ ] Use async fan-out via message queue
- [ ] Partition by tweet_id for hot tweet handling
- [ ] Monitor fan-out queue depth and auto-scale workers
- [ ] Pre-warm caches for trending users
- [ ] Implement rate limiting on tweet publication
- [ ] Use snowflake IDs for ordering

Observability Checklist

Metrics to Capture

tweets_published_total (counter) - By author tier (celebrity vs regular)
timeline_load_duration_seconds (histogram) - P50, P95, P99
timeline_cache_hit_ratio (gauge) - Hit vs miss rate
fanout_queue_depth (gauge) - Messages pending fan-out
search_indexing_lag_seconds (gauge) - Time from tweet to searchable
engagement_events_total (counter) - Likes, retweets, replies by tier

Logs to Emit

{
  "timestamp": "2026-03-22T10:30:00Z",
  "event": "tweet_published",
  "author_id": "123456",
  "author_tier": "celebrity",
  "follower_count": 15000000,
  "fanout_queued": true,
  "fanout_queue_depth": 50000
}

Alerts to Configure

Alert	Threshold	Severity
Timeline P99 > 500ms	500ms for 5 min	Warning
Fan-out queue > 1M	1000000 pending	Critical
Cache hit ratio < 80%	80% for 10 min	Warning
Search lag > 5 min	300s	Warning

Security Checklist

Rate limiting on tweet publication (prevent spam)
Tweet content moderation (pre-scan for abuse)
Anti-scraping measures (limit bulk reads)
DM encryption (end-to-end for private messages)
OAuth 2.0 for all API access
IP-based access controls for internal services
Audit logging of admin operations

Capacity Estimation

QPS Calculations

Daily active users: 200 million
Tweets per user per day: 2 (average)
Peak QPS: 200M × 2 / 86400 ≈ 4,600 tweets/second (average)
Peak spike factor: 10x → 46,000 tweets/second

Timeline reads:
- 50% users check timeline 3x/day
- Average timeline: 200 tweets
- Read QPS: 200M × 0.5 × 3 / 86400 ≈ 3,500 reads/second
- Peak: ~35,000 reads/second

Storage Estimation

Tweets stored: 500 million tweets/day × 365 days × 5 years
= 500M × 365 × 5 = 912.5 billion tweets

Per tweet (avg): 500 bytes content + 200 bytes metadata = 700 bytes
Total: 912.5B × 700 bytes ≈ 639 TB

With 3x replication and indices: ~2 PB storage needed

Interview Questions

1. How does Twitter handle celebrity users with millions of followers?

Twitter uses a hybrid approach. For regular users (under 10,000 followers), tweets are fanned out to follower timelines on write. For celebrities and high-follower accounts, fan-out is skipped. Their tweets are pulled on read when users load their timeline. This prevents fan-out storms that would overwhelm the system.

2. Why does Twitter store only tweet IDs in timeline cache, not full tweet objects?

Memory is the constraint. Twitter caches 800 tweet IDs per user timeline (to serve 200 tweets on request). Each tweet object is around 5KB with metadata. If you cached the full objects, that's 4MB per user timeline. With hundreds of millions of users, you'd need hundreds of petabytes of cache.

Storing only IDs keeps the cache compact. The IDs are small, and you can fetch the actual tweet content with a multi-get operation — one request that pulls 200 tweet objects from cache or database. This is fast because tweet objects for popular content are themselves cached in a separate layer.

3. How does Twitter ensure message ordering in timelines?

Tweet IDs are Snowflake IDs, which encode a timestamp. Each ID is 64 bits: 41 bits for timestamp (milliseconds since a custom epoch), 5 bits for datacenter, 5 bits for worker, and 12 bits for sequence number. The key property is that IDs are monotonically increasing within a datacenter.

When merging tweets from celebrities (pulled at read time) with tweets from regular users (pushed at write time), the client sorts by tweet ID. Since IDs encode time, this gives you chronological order without comparing actual timestamps.

This sidesteps clock synchronization issues. If you used wall-clock timestamps for ordering, you'd need all servers synchronized via NTP, and even small clock skew could cause ordering violations.

4. What happens when you like a tweet?

The like action is lightweight — it's not synchronous on the hot path. The client sends the like request, the server records it in the database with eventual consistency (not immediately consistent), and returns success. If the database write fails, the user sees an error; if it succeeds, the like count might take a moment to update everywhere.

Notifications follow from that: if the tweet author has immediate notifications enabled, they get a push right away. If they've chosen batched notifications, they get a digest later. The distinction matters because a celebrity with millions of followers can't afford immediate fan-out of every like notification.

5. What is the trade-off between push (fan-out on write) and pull (fan-out on read) timeline models?

Push (fan-out on write) means timelines are pre-computed. When you post, the system fans out your tweet to everyone's cached timelines. Reads are fast — the timeline is ready. But writes are expensive, especially for users with many followers.

Pull (fan-out on read) means you write once (to the tweet database) and everyone reads your tweet when they load their timeline. Writes are cheap. Reads are expensive — every timeline load requires merging tweets from everyone you follow.

The hybrid model splits the difference: regular users get pushed, celebrities get pulled. You get fast reads for most content without the fan-out cost for high-follower accounts.

6. How does Snowflake ID generation work and why is it important for Twitter?

A Snowflake ID is 64 bits split into: timestamp (41 bits, custom epoch), datacenter (5 bits), worker (5 bits), and sequence (12 bits). Each datacenter-worker pair generates IDs independently, so there's no coordination needed between machines.

Why it matters for Twitter: you get globally unique IDs that sort by creation time without a central coordinator. No database sequence, no Zookeeper, no consensus protocol. The tradeoff is that you need enough datacenters and workers — with 5 bits each, you get 32 datacenters and 32 workers per datacenter, plenty for any practical deployment.

7. Design the search functionality for Twitter. How would you handle full-text search across billions of tweets?

Elasticsearch is the standard approach for this scale. You index tweets with a custom analyzer that handles Twitter-specific patterns: hashtags, mentions, URLs, and slang. The index is partitioned by time period (monthly or weekly) so old tweets can be kept in cheaper storage.

On the query side, you need to balance relevance with recency. Twitter uses a hybrid scoring: text match score weighted against engagement metrics (likes, retweets) and recency. Trending topics get a boost via a separate signal.

For performance, results are cached for popular queries. The index itself is eventually consistent — there will be a small lag (seconds to minutes) between when a tweet is posted and when it becomes searchable.

8. How would you design Twitter's notification system to handle billions of notifications?

Notifications get split into two paths based on urgency. Immediate notifications (mentions, replies) go through a real-time queue and get pushed via WebSocket or push notification immediately. Batched notifications (likes, follows, retweets) accumulate in a batch queue and get processed hourly.

The notification service itself is stateless — it reads from Kafka and writes to a notification store (a distributed database with user-specific partitions). The store indexes notifications by user_id and created_at for efficient retrieval.

For delivery confirmation, you track read status. Unread counts are cached separately and invalidated when new notifications arrive. The main tradeoff is between immediacy and system load — Twitter chose to batch non-critical notifications to keep the system stable.

9. What is the hot key problem in Twitter's database and how do you solve it?

Popular tweets (think viral content or a celebrity post) create hot spots — a single partition receives way more traffic than others. Reads for that tweet's data overwhelm the database.

Solutions include: sharding by hash(tweet_id) so hot tweets spread across partitions, using eventual consistency for like/retweet counts (not requiring strong consistency), and adding a caching layer specifically for popular tweets. You can also rate-limit clients to prevent a single tweet from causing a cascade.

10. How does write-behind caching work and when would you use it?

Write-behind (also called write-back) caches write to the database first, then asynchronously update the cache. The user gets a fast response, and the cache population happens in the background.

Twitter uses this for tweet posting: the write goes to the database (source of truth), and then background tasks update timeline caches and search indexes. This keeps the write path fast and resilient. If the background task fails, you either retry or accept a stale cache until TTL expiry.

The tradeoff is that if the database write succeeds but the cache update fails, you have inconsistency. You need monitoring to detect and recover from these cases.

11. Explain the timeline merge problem. How do you correctly merge tweets from push and pull sources?

The merge problem: your timeline has pre-computed tweets from regular users (pushed) and needs to pull celebrity tweets on read. The challenge is sorting them together correctly.

Twitter's solution uses Snowflake IDs as the sort key. Both pushed and pulled tweets have IDs that encode their creation time. When merging, you sort by ID descending. Since IDs are monotonically increasing within a datacenter, this gives you chronological order without comparing actual timestamps.

The practical challenge is ensuring you get enough tweets. If you're pulling celebrity tweets at read time, you need to know which celebrities a user follows to fetch the right tweets. That follow-graph lookup adds latency, which is why there's a social graph cache.

12. How would you design the follower/following data model at scale?

The follows table is a many-to-many relationship: user_id → many → user_id. The primary access patterns are "who does user X follow?" and "who follows user Y?" Both queries need to be fast.

Sharding strategy: partition follows by follower_id for "who does user X follow" queries, and by following_id for "who follows user Y" queries. Or use a hybrid approach where some data is replicated for both access patterns.

The counts (follower_count, following_count) are denormalized for fast access. They're updated asynchronously because they don't need strong consistency — you can accept temporary inaccuracies while the system catches up.

13. What caching layers does Twitter use and why?

Twitter uses a multi-layer cache hierarchy. L1 is an in-process cache (like Memcached) for the fastest hits. L2 is Redis cluster for shared cache across servers. Then the database.

Different data has different cache strategies: tweets cache for high-read, timeline cache for user-specific timelines, user cache for profile data, and social graph cache for the follower/following relationships. Each layer has its own TTL based on how often the data changes.

The social graph cache is particularly important because the follow relationships don't change as often as tweet content, but they're queried on every timeline load.

14. How does Twitter handle rate limiting and why is it necessary?

Rate limiting prevents abuse and keeps the system stable under load. Twitter limits both reads (how many timelines you can fetch) and writes (how many tweets you can post) per user, per window.

Implementation: a token bucket or sliding window counter per user_id, stored in a fast cache (Redis). When a request comes in, you check if the user has tokens left. If not, you reject with a 429 response.

Rate limiting is especially important for celebrity tweets — without it, a single viral tweet could cause a fan-out burst that overwhelms the system. Limiting how often a single user can post also helps reduce spam and coordinate attacks.

15. Design the data storage strategy for tweets over a 5-year period.

Over 5 years, Twitter stores roughly 900 billion tweets. At 700 bytes per tweet (content + metadata), that's about 630PB before replication, or ~2PB with 3x replication and indices.

Storage is tiered by age. Recent tweets (last 30-90 days) are kept in fast storage (SSD) for quick access. Older tweets move to cheaper storage (HDD or cold storage). This is called tiered storage or data aging.

Data is partitioned by time period (monthly or weekly partitions). This makes it easy to drop old data or move it to cheaper storage. Queries for recent tweets hit recent partitions; queries for old tweets span more partitions but are less frequent.

16. How would you handle a scenario where the fan-out queue backs up?

Queue backup means timeline updates are delayed — users might not see new tweets for hours. The impact is visibility, not data loss.

Mitigations: auto-scaling fan-out workers based on queue depth, prioritizing active users (people who login frequently) over inactive ones, and using backpressure to slow down new tweet ingestion when the queue is overwhelmed.

For the user experience, you might show a "timeline may be outdated" warning or load tweets directly from the database instead of the cache. The key is that the timeline remains functional even if it's stale — you don't fail completely, you degrade gracefully.

17. What is the role of a message queue in Twitter's architecture?

Kafka is the backbone for async processing. Tweet ingestion goes through Kafka: the write API publishes to a topic, and fan-out workers consume from that topic to update timeline caches. Notification processing similarly uses queues — the write path is decoupled from the fan-out and notification paths.

Queues provide three benefits: buffering (absorb spikes without overwhelming downstream services), decoupling (write path doesn't wait for fan-out to complete), and fault tolerance (failed messages can be retried). If a fan-out worker crashes, the message stays in the queue and another worker picks it up.

18. How does Twitter compute trending topics?

Trending is computed from engagement velocity — how fast a topic is gaining likes, retweets, and replies. A simple approach counts mentions in the last N minutes and ranks by velocity. More sophisticated systems weight by follower count and engagement quality.

The pipeline: tweets are streamed through Kafka, a storm or flink job computes rolling mention counts for hashtags and search queries, results are stored in a sorted set (like Redis sorted sets) with time-based expiry, and the top K trending items are served to clients.

The main tradeoff is between sensitivity (detecting trends quickly) and noise (avoiding spikes from scheduled events or coordinated campaigns). Geographic segmentation also matters — what's trending globally might differ from what's trending in Tokyo.

19. Design the API endpoint for fetching a user's timeline.

The primary endpoint is GET /api/v2/timelines/home. Query parameters include max_id (for pagination, exclusive upper bound), since_id (exclusive lower bound), and limit (default 200, max 800).

Response format is JSON with a data array of tweet objects and a meta object containing result_count and next_token for cursor-based pagination.

The implementation checks the timeline cache first (Redis). If there's a cache hit and no cursor is provided, return cached timeline IDs and batch-fetch tweet objects. On cache miss, fetch from the database and rebuild the cache asynchronously.

20. What happens when a Twitter user with 10 million followers posts a tweet?

The write goes to the database and returns success immediately. The fan-out service checks the follower count and sees it's above the celebrity threshold (around 10K), so it skips fan-out entirely.

When followers load their timeline, the system fetches the celebrity's recent tweets separately (pull-based fan-out on read). This merge happens in the timeline service: the cached timeline for the user includes pre-computed tweets from regular users, and at read time the system pulls the latest tweets from celebrities they follow.

The tradeoff: some read latency for celebrity tweets (they're not pre-computed in the timeline cache), but massive savings on write side. A single celebrity tweet would otherwise generate millions of cache updates. At 10M followers, that fan-out could take hours and overwhelm the system.

Conclusion

Twitter’s architecture balances read and write optimization through a hybrid push/pull model. The key decisions:

Push for regular users, pull for celebrities
Multi-layer caching throughout
Eventual consistency for non-critical data
Async processing for expensive operations

System Design: Twitter Feed Architecture and Scalability

Introduction

Requirements Analysis

Functional Requirements

Non-Functional Requirements

Data Models

Core Entities

NoSQL Alternative

Timeline Architecture

Push Model (Fan-out on Write)

Pull Model (Fan-out on Read)

Feed Generation

Timeline Consistency Model

The Problem

The Solution: Snowflake ID Sorting

Timeline Service

Caching Timeline

Fan-out Service

High-Follower Handling

Search Architecture

Elasticsearch Mapping

Search Query

Notification System

Notification Types

Notification Pipeline

Notification Service

Caching Patterns

Multi-Layer Caching

Cache-Aside for Tweets

Write Path Optimization

Write-Behind Cache

Tweet ID Generation

API Design

Core Endpoints

Response Format

Scalability Challenges

The Celebrity Problem

Hot Keys

Write Amplification

Trade-off Analysis

Production Failure Scenarios

Common Pitfalls / Anti-Patterns

Pitfall 1: Fanning Out to All Followers Synchronously

Pitfall 2: Not Handling the Celebrity Problem

Pitfall 3: Storing Full Tweets in Timeline Cache

Pitfall 4: Ignoring Engagement Calculation Cost

Real-world Failure Scenarios

Scenario 1: Twitter API Rate Limit Storm (2020)

Scenario 2: Tweet Storage Corruption

Scenario 3: Search Index Desynchronization

Quick Recap

Copy/Paste Checklist

Observability Checklist

Metrics to Capture

Logs to Emit

Alerts to Configure

Security Checklist

Capacity Estimation

QPS Calculations

Storage Estimation

Interview Questions

Further Reading

Conclusion

Category

Tags

Related Posts

System Design: Netflix Architecture for Global Streaming

System Design: URL Shortener from Scratch

Amazon Architecture: Lessons from the Pioneer of Microservices