CQRS and Event Sourcing: Distributed Data Management

Learn about Command Query Responsibility Segregation and Event Sourcing patterns for managing distributed data in microservices architectures.

published: March 24, 2026 reading time: 45 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

CQRS splits read and write into separate models so each can be optimized independently. Event sourcing stores every state change as an immutable record instead of overwriting current state, which gives you audit trails, temporal queries, and the ability to replay history. These patterns pair well with sagas for distributed transactions, but you'll need to get comfortable with eventual consistency between the write side and read models built asynchronously from event streams.

CQRS and Event Sourcing: Patterns for Distributed Data

Most database patterns assume the same model works for reading and writing. CRUD against a single table works fine when your system is simple. That assumption breaks down as you scale. Read-heavy workloads collide with write-heavy workloads. Reporting queries slow down transactional writes. Security gets tangled when the same model serves both purposes.

CQRS and Event Sourcing tackle these problems head-on. These are not new ideas, but they have become more relevant as teams move toward microservices and distributed systems where a monolithic database is no longer the right tool.

This post explains both patterns, shows how they work together, and helps you decide when the tradeoffs make sense.

What is CQRS

CQRS stands for Command Query Responsibility Segregation. The core idea is straightforward: use different models for writing data versus reading data.

In a typical application, one data model handles both operations:

-- Writing
INSERT INTO orders (customer_id, total, status) VALUES (1, 99.99, 'pending');

-- Reading
SELECT * FROM orders WHERE customer_id = 1;

With CQRS, you split this into two distinct models:

Command model: Handles writes. Optimized for validating and persisting state changes.
Query model: Handles reads. Optimized for answering specific questions efficiently.

The command model might store data in a normalized relational structure. The query model might use a denormalized format optimized for specific access patterns, like a separate read model for each screen in your application.

Why Separate Reads and Writes

Reads and writes have fundamentally different characteristics:

The conflict becomes concrete when you look at a real scenario. Consider an e-commerce product catalog. The write side needs referential integrity: foreign keys, unique constraints, normalized storage. The read side needs flexible search: full-text search across product names and descriptions, faceted filtering by attributes, denormalized joins for autocomplete. These two access patterns fight each other when they share the same database. You end up with queries that do 12-table joins for read operations that should take milliseconds, because the normalized schema that protects data integrity is hostile to read performance.

The staleness column in the comparison table reveals another mismatch. Write operations typically need immediate consistency. If a user updates their password, they expect that change to be visible before the next request. Read operations can often tolerate staleness. A product recommendation that is 5 seconds old is still useful. A user’s profile update that takes 5 seconds to propagate is jarring. By separating the models, you can accept the tradeoff that makes sense for each side. The write model enforces immediate consistency where it matters. The read model accepts whatever staleness the async projection introduces, which you can tune based on user experience requirements.

A common symptom of the combined model problem is the “read vs write scaling mismatch.” Your application might have 100 read requests per second but only 10 writes per second. Yet you are stuck scaling both together because the same database handles both. With CQRS, you scale the read store horizontally without touching the write store. You might run 5 read replicas during peak traffic while the write database stays at a single primary. The asymmetry is not a problem to solve; it is the freedom that separation provides.

Aspect	Writes	Reads
Pattern	Infrequent, discrete operations	Frequent, potentially complex queries
Data volume	Single record or small batch	Large result sets, aggregations
Optimization	Normalization, constraints	Denormalization, indexes
Timing	Immediate consistency needed	Can tolerate staleness

When you try to optimize both on the same model, you end up compromising both. CQRS lets you optimize each side independently.

Basic CQRS Flow

graph LR
    Command[Command] --> CommandHandler[Command Handler]
    CommandHandler --> WriteStore[(Write Store)]
    WriteStore -->|async projection| ReadStore[(Read Store)]

    Query[Query] --> QueryHandler[Query Handler]
    QueryHandler --> ReadStore

Commands flow to the command handler, which validates and persists changes to the write store. Read models are updated asynchronously from the write store through an event-driven projection mechanism.

The async projection lag is the critical detail to understand. When a command succeeds, the caller receives confirmation immediately. The event is durable. But the read model update happens in the background, typically via a separate consumer process. Under normal conditions, this lag is sub-second. Under load, it stretches to seconds or longer. During that window, a user who submits an order and immediately queries their order list might not see their new order. The command returned success. The read model has not caught up yet.

This matters for error handling. If the command fails validation, you return an error immediately and nothing is written. But if the command succeeds and the event store write succeeds, you have committed the event. The projection failure after that point should not roll back the command; it should trigger a retry or alert. Your application must distinguish between command rejection (user error, return immediately) and projection failure (system error, retry or alert). Conflating these two failure modes leads to confusing user experiences where commands succeed but nothing appears to happen.

Event Sourcing Fundamentals

Event sourcing changes how you think about persistence. Instead of storing current state, you store the sequence of events that led to that state.

Storing Events Instead of State

Traditional state storage overwrites values:

UPDATE accounts SET balance = 500 WHERE id = 1;

The old state is lost.

Event sourcing appends instead:

AccountCredited { account_id: 1, amount: 200, balance: 500, timestamp: 2026-03-24T10:00:00Z }
AccountCredited { account_id: 1, amount: 300, balance: 500, timestamp: 2026-03-24T11:00:00Z }

Every change becomes an immutable record. You derive the current balance by replaying events.

The immutability is what makes audit trails reliable. In a traditional database, you can update a record and lose the previous value. An auditor cannot tell when the change occurred or what the previous value was. With event sourcing, the event is append-only and timestamped. The AccountCredited event records not just the new balance but the amount credited, the timestamp, and often metadata like the initiating user or correlation ID. You can reconstruct exactly what happened and in what order. This matters for financial systems, healthcare records, compliance-heavy domains, and any system where you need to answer “what was the state at time X” with confidence.

When you combine immutable events with cryptographic integrity, you get tamper-proof audit logs. Each event can carry a hash of the previous event, creating a chain similar to a blockchain without the consensus overhead. If someone modifies an event in the database, the hash chain breaks and you detect the tampering. This is useful for regulatory compliance where you need to prove that records have not been altered after the fact.

An event store is a specialized database designed for event sourcing. Core operations:

Append events: Add new events to a stream
Read stream: Retrieve all events for an aggregate in order
Subscribe: Get notified when new events are added

The event store is the source of truth. Everything else is derived.

A message queue looks similar to an event store on the surface: producers send messages, consumers receive them. But the semantics differ in critical ways. A message queue typically deletes messages after consumption. An event store retains events indefinitely and allows replay from any point. A message queue delivers events to one consumer or distributes them across consumers for load balancing. An event store maintains stream semantics: events belong to an aggregate and must be read in order. A message queue might deliver events across different partitions with no ordering guarantee. An event store within a single aggregate stream guarantees ordering because events append to a single logical log.

Kafka is the most common example of a system adapted for event sourcing. It provides durable log storage and replay, which covers some event store needs. But Kafka partitions by topic, not by aggregate. If you have an Order-123 stream spread across partitions, you need to ensure all events for that order land in the same partition to maintain ordering. EventStoreDB handles this natively: streams are aggregate-centric by design, and the API enforces stream semantics. When choosing infrastructure, the question is whether you want an event store that happens to support streaming, or a streaming platform that you adapt for event sourcing.

graph LR
E1[AccountOpened] --> Store[(Event Store)]
E2[Deposited] --> Store
E3[Withdrawn] --> Store

    Store -->|replay| Snapshot[Current State]

Aggregates and Event Streams

In domain-driven design, an aggregate is a cluster of related objects treated as a single unit for data changes. Each aggregate has its own event stream.

For an order management system:

OrderAggregate stream contains: OrderCreated, ItemAdded, ItemRemoved, OrderConfirmed, OrderShipped, OrderCancelled
InventoryAggregate stream contains: InventoryReserved, InventoryReleased, InventoryDamaged

Each stream is append-only. Events are immutable once written.

Stream boundaries determine your consistency scope. All events for a single aggregate instance live in the same stream and process sequentially. This is not a limitation; it is the mechanism that enforces consistency. The aggregate root applies business rules on every event and rejects invalid transitions. If OrderCancelled cannot follow OrderConfirmed in the same stream, the aggregate enforces that rule. No concurrent modification can bypass it because the stream is a serial log.

Cross-aggregate operations have no such guarantee. If you need to reserve inventory and create an order in a single logical operation, you must use a saga to coordinate across aggregate boundaries. Each aggregate processes its own stream independently. The saga observes events from both streams and emits compensating commands when something goes wrong. This is eventual consistency across aggregates. The boundary is not a bug; it is the fundamental tradeoff that lets aggregates scale independently.

Event sourcing solves several problems that are difficult to handle with state storage:

Complete audit trail. You have a log of every state change. Regulatory requirements, debugging, and understanding user behavior all become easier with the full history.

Temporal queries. “What was the account balance on March 1st at 3pm?” With event sourcing, you replay events up to that timestamp. With state storage, you typically do not have this data readily available.

Replay capability. If your read model has a bug, you fix it and rebuild from the event stream. No data is lost because the events are the source of truth.

Debugging and tracing. You can replay production events locally to reproduce bugs. The event log becomes a precise record of what happened.

Parallel projections. Multiple read models can be built from the same event stream independently. You can add new views of your data without modifying the write side.

The temporal query example is worth examining closely. A bank regulator asks: “What was the account balance for account 4567 on March 1st, 2026 at 3pm EST?” With state storage, you do not have this data unless you took explicit snapshots at that moment. With event sourcing, you replay the event stream up to the timestamp of interest. The AccountOpened event sets initial balance to zero. The Deposited events add amounts. The Withdrawn events subtract amounts. Running the balance forward to March 1st 3pm gives you the exact balance at that moment. This is not a special feature; it is the natural consequence of storing the full history. The same mechanism lets you debug by replaying events locally, audit by reviewing the full sequence, and rebuild read models from scratch when your projection logic changes.

Event sourcing is not without friction:

Event schema evolution. Event schemas change over time. Old events must still deserialize correctly. You need a strategy for handling changes: upcasters, version numbers, or both.

Projection rebuild times. Replaying millions of events takes time. Large aggregates can become slow to reconstruct. Periodic snapshots help mitigate this.

Eventual consistency. Read models built from events lag behind writes. Your application must handle stale data.

Increased complexity. The mental model differs from traditional CRUD. Teams need time to adapt.

Schema evolution deserves a concrete example. Suppose you launch with OrderCreated { order_id, customer_name, total }. Six months later, you split customer_name into customer_first_name and customer_last_name. Your event store still has thousands of OrderCreated_v1 events with the old schema. When your projection replays these events, it must handle both v1 and v2 formats. You write an upcaster that transforms v1 to v2: parse customer_name, split on space, populate the two new fields. Without this, your projection either crashes on unknown fields or produces incorrect data. The problem compounds when you have multiple schema changes over years of history. You need to maintain a chain of upcasters: v1 to v2, v2 to v3, and so on. Miss one link and historical replay breaks. This is the operational complexity that event schema evolution introduces.

The way you process events in a CQRS/event sourcing system shapes performance, reliability, and consistency characteristics.

At-Least-Once vs Exactly-Once Delivery

Event consumers typically deliver messages with at-least-once semantics. Your projection handlers must handle duplicate events gracefully.

At-least-once delivery means an event may be delivered multiple times during retries or rebalances. Without idempotency checks, duplicates corrupt your read model.

Exactly-once delivery requires distributed consensus protocols and is expensive to guarantee. Most systems settle for at-least-once with idempotent handlers.

Design your projections assuming events will arrive more than once. The idempotent handler pattern handles this:

def handle_transfer_event(event):
    # Check if already processed
    if event.event_id in processed_event_log:
        return

    # Process event
    update_read_model(event)

    # Record processed ID
    processed_event_log.add(event.event_id)

Event Ordering Guarantees

Events within a single aggregate stream maintain ordering. Events across aggregates do not.

Within a stream: Events are appended in order. A Withdraw event after a Deposit in the same account stream respects the correct balance.

Across streams: A PaymentProcessed event and a FulfillmentScheduled event may arrive in different orders to different consumers.

For aggregate-centric operations, ordering within the stream matters. For cross-aggregate workflows, you need additional coordination.

Consider an account aggregate processing a sequence of events: AccountOpened with balance 0, then Deposited with amount 500, then Withdrawn with amount 200. If these events are replayed in order, the account balance correctly reflects 300. If the Withdrawn event were somehow processed before the Deposited event, the balance would go negative or reject entirely. This ordering dependency is why aggregate boundaries are critical in event sourcing. The aggregate root enforces invariants that depend on events arriving in sequence. When designing aggregates, ensure that all events which affect the same invariant live in the same stream. Splitting related events across streams creates race conditions that no amount of consumer-side logic can fully correct.

Cross-stream ordering requires explicit coordination mechanisms such as sagas, process managers, or correlation IDs that allow consumers to detect and handle out-of-order events. Without this, you accept that certain operations cannot be atomic across service boundaries.

Backpressure Handling

Fast event producers can overwhelm slow projection consumers. Without backpressure, lag grows until your system breaks.

Consumer-side backpressure: Kafka consumers control their own fetch rate. If projections process slowly, consumer lag grows.

Producer-side backpressure: Commands wait if the event store is overwhelmed. This backpressure propagates to callers.

Mitigation approaches:

Scale horizontally: Add projection worker instances
Prioritize streams: Process critical streams first during high load
Batch processing: Accumulate events and process in batches for throughput
Circuit breakers: Stop consuming when downstream systems fail

Kafka provides specific configuration options to control consumption rate and manage backpressure. fetch.min.bytes specifies the minimum amount of data the broker returns per fetch request; setting it higher (for example, 1MB instead of the default 1 byte) reduces the number of small fetches when processing is slow, allowing the consumer to accumulate more events per poll. max.poll.records controls how many records return per poll; reducing this limit (for example, to 100 instead of the default 500) decreases the memory footprint and processing time per poll cycle when projections are complex. max.poll.interval.ms defines the maximum time between polls before the consumer is considered dead; if your projection processing takes significant time per batch, increase this value. session.timeout.ms and heartbeat.interval.ms control consumer group membership and rebalance behavior; too short a session timeout causes unnecessary rebalances under GC pauses or slow processing. Tuning these parameters requires understanding your projection complexity, your desired lag SLA, and your consumer group’s processing capacity.

When backpressure builds despite configuration tuning, circuit breakers prevent cascading failures. If a downstream read model database becomes unavailable, continue consuming and buffering events in memory risks OOM errors. A circuit breaker opens after N consecutive failures, stops fetching new events, and allows the projection to recover gracefully. Some teams implement a dead-letter queue for events that exceed a retry threshold, ensuring these events are not lost while the main projection continues processing newer events.

Projections vs Transformations

Projections build read models from events. Transformations convert events between schemas.

Projections: OrderCreated → OrderSummary for the orders read model Transformations: OrderCreated_v1 → OrderCreated_v2 (upcaster)

Keep transformations focused. A transformation should only adapt schema, not change semantic meaning. Business logic belongs in projections.

An upcaster demonstrates the distinction clearly. Suppose the original OrderCreated_v1 event contains:

{
  "event_type": "OrderCreated",
  "version": 1,
  "order_id": "abc123",
  "customer_name": "Acme Corp",
  "total": 99.99
}

After a schema change, OrderCreated_v2 adds a customer_id field and renames total to order_total. The upcaster transforms the v1 event when replaying historical events:

def upcast_order_created_v1_to_v2(event_data):
    return {
        "event_type": "OrderCreated",
        "version": 2,
        "order_id": event_data["order_id"],
        "customer_id": None,  # Unknown for legacy events
        "customer_name": event_data["customer_name"],
        "order_total": event_data["total"]
    }

The upcaster handles schema migration without introducing business rules. It maps fields, fills defaults for new fields, and preserves the original semantic data. If you find yourself adding conditional logic or derived calculations in an upcaster, that logic belongs in a projection handler instead. Projections apply business logic to build read models; upcasters only bridge schema versions.

Cold Start Considerations

New projection consumers start from the beginning of the event stream. For mature systems, this takes time.

Full replay: Read from offset 0. Process all historical events. Takes time proportional to event history length.

From timestamp: Some event stores let you replay from a specific time. Useful for disaster recovery.

Parallel replay: Partition the event stream and process partitions in parallel. Requires consumer support for parallel processing.

Plan for cold start times when designing your recovery procedures. A system with 50 million events might take hours to fully replay.

Cold start time scales roughly linearly with event count when processing is single-threaded. A system with 10 million events might take 30 to 60 minutes to replay on a well-tuned consumer. With 50 million events, expect 2 to 4 hours or more depending on event size, projection complexity, and network latency to the read model database. This is not theoretical: teams have been surprised by 8-hour replay windows in production systems that had grown beyond initial estimates. Without snapshots, aggregate reconstruction time grows with event count per aggregate, which can make single-aggregate queries slow even when overall projection lag appears healthy.

Parallel replay mitigates this through horizontal scaling. Kafka supports partitioning by aggregate ID, and multiple consumer instances can each process different partitions. With max.poll.records tuned appropriately and sufficient consumer instances (one per partition), you can reduce wall-clock replay time proportionally to partition count. EventStoreDB supports similar parallelism through its persistent subscriptions with competing consumers. The trade-off is that parallelism adds complexity: you must partition without breaking ordering guarantees, handle partial partition failures, and coordinate snapshot creation across parallel consumers. For most systems, a well-configured snapshot strategy reduces cold start times enough that parallel replay is rarely needed outside of disaster recovery scenarios.

Saga Pattern Integration

CQRS and event sourcing pair naturally with the saga pattern for distributed transactions across service boundaries.

Sagas as Coordinators

A saga coordinates multiple services to complete a business workflow. Each step of the saga emits events that trigger the next step.

A hotel booking saga:

1. ReserveCar event → CarService reserves car
2. CarReserved event → HotelService reserves room
3. RoomReserved event → FlightService confirms flight
4. All confirmed → BookingComplete event

If any step fails:

1. ReserveCar event → CarService reserves car
2. CarReserved event → HotelService reports: HotelFull
3. HotelUnavailable event → CarService cancels car reservation
4. BookingCancelled event → Notify user

Event Sourcing for Sagas

Each saga instance can be an aggregate in your event store. The saga state becomes explicit.

class BookingSaga:
    def __init__(self, saga_id):
        self.saga_id = saga_id
        self.state = "pending"
        self.steps = []

    def handle(self, event):
        if isinstance(event, CarReserved):
            self.transition_to("hotel_reservation")
        elif isinstance(event, HotelUnavailable):
            self.transition_to("compensating")
            self.emit(CancelCarReservation(self.saga_id))

The event log captures the full saga execution, including compensations. Debugging becomes straightforward.

Command and Query Separation in Sagas

Sagas issue commands and receive events. The saga itself is write-focused. Query side belongs in the services participating in the saga.

A saga coordinator does not directly read from read models. It receives events, makes decisions, and emits commands. The separation keeps the design clean.

The coordinator should not read from read models for two reasons. First, read models reflect eventual consistency state, which lags behind the write side. If the saga coordinator reads a read model to decide its next action, it may make decisions based on stale data. In a hotel-booking saga, the coordinator might believe a room is available because the inventory read model has not yet processed a recent reservation, leading to double-booking. Second, coupling forms: the saga coordinator becomes dependent on the query side’s schema and availability. If the read model schema changes, the saga coordinator must change. If the read model goes down, the saga stalls even though the write side is healthy. Keeping the saga focused on commands and events maintains the separation of concerns that CQRS provides. The participating services own their query models; the saga only knows about events and commands.

Event Stores

Standard relational databases are not well-suited for event sourcing without significant effort. You need specialized infrastructure designed for append-only event storage.

Kafka

Apache Kafka is frequently used as an event store for event sourcing. It provides durable, ordered, partitioned event streams with consumer group semantics.

Kafka excels at:

High-throughput event streaming
Event replay from any offset
Multi-subscriber event consumption
Partitioning for horizontal scaling

Kafka is not a true event store in the domain-driven design sense because it lacks aggregate-centric operations. It is an event streaming platform adapted for event sourcing patterns.

EventStoreDB

EventStoreDB is purpose-built for event sourcing. It understands aggregates, streams, and event versioning natively.

EventStoreDB provides:

Aggregate-centric APIs
Built-in projections
Event versioning and upcasting support
Subscription mechanisms for building read models

EventStoreDB is a better fit when event sourcing is the primary pattern and you want infrastructure designed specifically for it.

Other Options

Postgres with append-only tables: Simple approach for low-volume scenarios
Axon Server: Commercial event store with built-in CQRS support
Marten: Event sourcing library for .NET using Postgres

The choice depends on your scale requirements, team expertise, and whether you need the broader Kafka ecosystem for other use cases.

Building Read Models with Projections

Projections build read models from events. A projection subscribes to event streams and updates a read model accordingly.

How Projections Work

graph LR
    EventStore[(Event Store)] --> Projection[Projection]
    Projection --> ReadModel[(Read Model)]

    subgraph Projection Logic
        P1[Handle AccountOpened]
        P2[Handle Deposit]
        P3[Handle Withdrawal]
    end

    Projection --> P1
    Projection --> P2
    Projection --> P3

When an event is written, the projection handler for that event type executes. It reads the event, updates the read model, and commits.

Projection Examples

Given these events:

OrderCreated { order_id: 123, customer_id: 456, items: [...] }
ItemAdded { order_id: 123, item_id: 789, quantity: 2 }
OrderConfirmed { order_id: 123 }

A “orders by customer” read model projection:

def handle_order_created(event):
    read_model.insert({
        'order_id': event.order_id,
        'customer_id': event.customer_id,
        'status': 'created',
        'total_items': len(event.items)
    })

def handle_item_added(event):
    read_model.update(
        {'order_id': event.order_id},
        {'$inc': {'total_items': event.quantity}}
    )

def handle_order_confirmed(event):
    read_model.update(
        {'order_id': event.order_id},
        {'$set': {'status': 'confirmed'}}
    )

The projection logic can be anything you can express in code. This flexibility is what makes projections powerful.

Command Handler Implementation

class CommandHandler:
    """Handles commands and emits events."""

    def __init__(self, event_store):
        self.event_store = event_store

    def handle_create_order(self, command: CreateOrderCommand) -> OrderCreatedEvent:
        # Validate business rules
        if not command.items:
            raise InvalidOrderError("Order must have at least one item")

        # Create and persist event
        event = OrderCreatedEvent(
            order_id=str(uuid.uuid4()),
            customer_id=command.customer_id,
            items=command.items,
            total=sum(item.price * item.quantity for item in command.items),
            created_at=datetime.utcnow()
        )
        self.event_store.append(event)
        return event

    def handle_confirm_order(self, command: ConfirmOrderCommand) -> OrderConfirmedEvent:
        # Load current state from event stream
        events = self.event_store.get_stream(command.order_id)
        order = OrderAggregate.reconstruct(events)

        # Validate state transition
        if order.status != "pending":
            raise InvalidStateTransitionError(f"Cannot confirm order in status: {order.status}")

        # Emit event
        event = OrderConfirmedEvent(
            order_id=command.order_id,
            confirmed_at=datetime.utcnow()
        )
        self.event_store.append(event)
        return event

Aggregate Reconstruction with Snapshots

class OrderAggregate:
    """Reconstructs aggregate state from events with snapshot support."""

    SNAPSHOT_INTERVAL = 100  # Create snapshot every 100 events

    @classmethod
    def reconstruct(cls, events: list[Event], snapshot: Snapshot = None) -> "OrderAggregate":
        aggregate = cls()

        if snapshot:
            # Start from snapshot
            aggregate.order_id = snapshot.order_id
            aggregate.customer_id = snapshot.customer_id
            aggregate.status = snapshot.status
            aggregate.items = snapshot.items
            aggregate.total = snapshot.total
            aggregate.version = snapshot.version
        else:
            # Replay from beginning
            aggregate.version = 0

        # Apply remaining events
        start_version = snapshot.version if snapshot else 0
        for event in events[start_version:]:
            aggregate._apply(event)
            aggregate.version += 1

        return aggregate

    def _apply(self, event: Event):
        if isinstance(event, OrderCreated):
            self.order_id = event.order_id
            self.customer_id = event.customer_id
            self.items = event.items
            self.total = event.total
            self.status = "created"
        elif isinstance(event, ItemAdded):
            self.items.append(event.item)
            self.total += event.item.price * event.item.quantity
        elif isinstance(event, OrderConfirmed):
            self.status = "confirmed"

    def should_snapshot(self) -> bool:
        return self.version % self.SNAPSHOT_INTERVAL == 0

    def create_snapshot(self) -> Snapshot:
        return Snapshot(
            order_id=self.order_id,
            customer_id=self.customer_id,
            status=self.status,
            items=self.items,
            total=self.total,
            version=self.version
        )

Idempotent Projection Handler

class IdempotentProjection:
    """Projection that handles duplicate events safely."""

    def __init__(self, read_model_db):
        self.read_model = read_model_db
        self.processed_events = set()  # Track processed event IDs

    def handle(self, event: Event):
        # Idempotency check
        if event.event_id in self.processed_events:
            return  # Already processed, skip

        # Process event
        if isinstance(event, OrderCreated):
            self._handle_order_created(event)
        elif isinstance(event, OrderConfirmed):
            self._handle_order_confirmed(event)

        # Mark as processed
        self.processed_events.add(event.event_id)

        # Prune old event IDs to prevent memory growth
        if len(self.processed_events) > 100000:
            self._prune_processed_events()

    def _handle_order_created(self, event):
        # Upsert instead of insert
        self.read_model.upsert(
            {'order_id': event.order_id},
            {
                '$set': {
                    'customer_id': event.customer_id,
                    'status': 'created',
                    'created_at': event.created_at
                },
                '$setOnInsert': {'_id': event.order_id}
            }
        )

Because projections are derived from events, they can be rebuilt from scratch at any time. This is invaluable when:

You find a bug in projection logic
You need a new read model from historical data
You want to change the read model schema

Rebuilding involves:

Clear the existing read model
Reset the projection checkpoint to the beginning
Replay all events through the projection handler

The event store provides the replay capability.

Eventual Consistency Implications

CQRS and event sourcing introduce eventual consistency between the write side and read models. This has practical implications for how you build applications.

What Eventual Consistency Means

When you write data, the command completes successfully. The event is persisted. But the read model update happens asynchronously. For a brief window, reads return stale data.

The duration of this window varies:

Local event processing: milliseconds
Kafka consumer lag: milliseconds to seconds
Cross-datacenter replication: seconds to minutes

You must design your application assuming reads can be stale.

These numbers reflect practical observations rather than guarantees. Local event processing within the same process typically completes in 1 to 5 milliseconds under normal load, limited by the speed of the event bus or in-memory queue. When events pass through Kafka, consumer lag depends on fetch configuration and processing time per event; with default settings and lightweight projections, you see 50 to 500 milliseconds. Heavy projections with database writes can push this to several seconds. Cross-datacenter replication introduces network latency between datacenters (typically 10 to 100 milliseconds for regional replication) plus the same Kafka consumer lag on the receiving end, which can total seconds to minutes if the inter-datacenter link is congested or if the receiving datacenter’s projection workers are underprovisioned.

Understanding these latency ranges matters for error handling. If you set aggressive timeout values assuming local-speed consistency, commands will fail unnecessarily. If you assume cross-datacenter latency is acceptable for a user-facing read, you may display stale data longer than users tolerate. Match your consistency assumptions to the actual infrastructure.

User Experience Considerations

Users notice eventual consistency in specific scenarios:

They submit an order and immediately check their order list
They update their profile and refresh the page
They delete an item and see it briefly before the page updates

Mitigation strategies:

Optimistic UI: Show the expected state immediately, reconcile if needed
Read-your-writes consistency: Route reads to a model that includes your own writes
Refresh mechanisms: Provide manual refresh or auto-refresh for stale-sensitive views

Optimistic UI is the most common mitigation. The client applies the expected change locally before the server confirms it. If the server rejects the command, the client rolls back. This approach works well for order placement: show the order as “confirmed” immediately, and if the projection has not yet caught up, a later refresh reveals the confirmed state anyway. The risk is minor inconsistency during the window; the benefit is perceived zero latency.

Read-your-writes consistency routes the user’s own writes into their read path directly. Rather than reading from the shared read model, the user’s session includes a small in-memory buffer of their recent commands and their expected results. When the user queries their own data, the handler merges the buffered writes with the read model. This gives the user the illusion of synchronous consistency for their own actions while preserving the performance benefits of async projection for other users. Implementation options include sticky sessions that route reads to a projection worker that recently processed the user’s events, or a client-side cache that injects known writes into the response before the server reply arrives.

Eventual Consistency Flow

graph LR
    Command[Command] --> Validate[Validate Command]
    Validate --> Persist[Persist to Event Store]
    Persist --> Emit[Emit Event]
    Emit --> Async[Async Projection]
    Async --> Update[Update Read Model]
    Update --> Query[Query Read Model]
    Query --> User[Return to User]
    Persist --> Query[Read immediately<br/>after write]

Production Failure Scenarios

Failure	Impact	Mitigation
Projection lag too long	Users see stale data for extended period	Monitor projection lag; scale projection workers; alert on SLA breach
Event schema changes break projections	Read models stop updating	Implement upcasters; version event schemas; test schema evolution
Projection worker crashes mid-update	Partial update to read model	Idempotent projections; event replay capability; transactional outbox
Event store becomes unavailable	Commands cannot be processed	Use replicated event store; partition for availability
Snapshot too old	Long aggregate reconstruction time	Define snapshot frequency; monitor snapshot age
Concurrent projections conflict	Duplicate or lost updates in read model	Use optimistic concurrency; idempotent operations; event ordering guarantees
Kafka consumer group rebalance	Brief read model unavailability	Plan for rebalance delays; use consumer lag monitoring
Read model database unavailable	Reads fail but writes succeed	Read replicas; circuit breakers; graceful degradation

Consistency Guarantees

CQRS does not mean you abandon consistency entirely:

Commands validate against current state before accepting changes
Events are written atomically within an aggregate
Read models eventually catch up

The aggregate boundary defines your consistency scope. Within an aggregate, you have strong consistency. Across aggregates, you have eventual consistency.

Strong consistency within an aggregate means that all events for a single aggregate instance are written sequentially and reflect the aggregate’s true state at the time of the command. The aggregate root enforces invariants that depend on the full history. For example, an overdraft protection aggregate might have a rule that credits cannot bring the balance above the credit limit, enforced by the aggregate on each event. This enforcement is strong because it happens at command time using the authoritative event sequence. No other process can concurrently modify the same aggregate without going through the same event sequence, so race conditions within an aggregate are structurally impossible.

Across aggregates, you have eventual consistency because operations on different aggregates proceed independently. If aggregate A publishes an event that triggers a projection that eventually updates a read model that aggregate B reads, aggregate B sees a version of reality that lags by the projection lag duration. Cross-aggregate operations also have no transactional boundary. If you need to reserve inventory and create an order as a single unit of work, you must use a saga because no transaction spans aggregates. The saga itself is eventually consistent: it observes events and emits compensating commands, but there is no moment at which both aggregates have definitively committed the same logical transaction.

When to Use CQRS and Event Sourcing

These patterns solve specific problems. They introduce complexity that must be justified.

Trade-off Comparison

Criteria	Traditional CRUD	CQRS Only	CQRS + Event Sourcing
Complexity	Low	Medium	High
Read/Write Optimization	Limited (same model)	Independent optimization	Independent optimization
Audit Trail	Limited (last state only)	Limited (last state only)	Complete history
Temporal Queries	Not supported	Not supported	Full support
Projection Rebuild	N/A	Clear and rebuild	Replay events
Event Schema Evolution	N/A	N/A	Required
Infrastructure	Standard database	Write + read stores	Event store + projections
Team Learning Curve	Low	Medium	High
Debugging	Standard	Standard	Event replay

Good Fits

CQRS and event sourcing work well when:

Audit requirements are strict. Financial systems, healthcare records, compliance-heavy domains. The immutable event log satisfies audit requirements naturally.

Temporal queries are frequent. “Show me all state changes for account X in Q3 2025.” With events, this is straightforward. With state storage, you need special infrastructure.

Multiple read models exist. Different consumers need different views of the same data. Reporting, analytics, operational dashboards, user-facing views. Each can have its own optimized read model.

Complex domain with rich behavior. When the write side involves significant business logic, a separate command model lets you express that logic clearly without mixing in read concerns.

Distributed systems with event-driven integration. Microservices that communicate through events naturally benefit from event sourcing. The events become the integration contract between services.

A financial ledger is a textbook good fit. Every transaction is an immutable credit or debit event. The audit trail is the core business requirement, not an afterthought. Reconstructing account balance at any point in time is routine, not a special query. When regulators ask to see all transactions for account 12345 between January 1st and March 31st, you replay the events. No special infrastructure, no snapshots of state at specific times. The event stream is the ledger.

A content management system is a poor fit. Users create and edit articles. The business value is in the current state of articles, not the history of edits. Nobody asks “what was the article title on March 3rd at 2pm.” There is no audit requirement that demands every keystroke preserved. CRUD against a single database handles this fine. Adding event sourcing adds operational complexity with no corresponding business value.

These patterns are likely overkill when:

Simple CRUD dominates. If your application is mostly create, read, update, delete with straightforward queries, CQRS adds complexity without benefit.

Strong consistency is required everywhere. If the business cannot tolerate stale reads in any scenario, eventual consistency creates friction.

Small team with limited experience. The operational overhead is real. You need expertise in event store administration, projection development, and debugging distributed systems.

Low latency requirements are strict. The additional async steps in CQRS introduce latency. If sub-millisecond response times are critical, the added hops hurt.

A real-time gaming leaderboard is a poor fit. When a player scores points, the leaderboard must update immediately. If a player sits at rank 5 and their score updates, they expect to see the new rank within milliseconds. Eventual consistency with its async projection creates a window where the leaderboard shows stale data. Players will notice and complain. The added complexity of CQRS does not buy you anything here because the read model update must be near-synchronous anyway.

A small team of three engineers building an MVP is a poor fit. The operational overhead is real even at small scale. You need to understand event store administration, projection development, idempotency handling, schema evolution. With three engineers and a launch deadline, this overhead slows you down. The patterns make sense when you have the team size and the problem complexity that justifies them.

Before adopting CQRS and event sourcing:

Do you have a genuine need for event replay or temporal queries?
Do you need multiple, differently-optimized read models?
Is the team prepared for the operational complexity?
Is eventual consistency acceptable for your use cases?

If you answered yes to the first two questions and are prepared for the complexity, these patterns will likely pay off.

CQRS and event sourcing connect to other patterns in distributed systems.

Event sourcing pairs naturally with event-driven architecture. Events are the communication mechanism, and event sourcing provides the persistence strategy for those events.

The saga pattern handles distributed transactions across microservices. Sagas work well with event sourcing because each saga step can emit events that trigger the next step.

For distributed transactions across services, see distributed transactions which covers two-phase commit and related approaches.

The database per service pattern supports CQRS because each side can use a database optimized for its workload. The write side might use a relational database for transactional integrity. The read side might use a document store or columnar database for query performance.

Quick Recap

Key Points

CQRS separates read and write models for independent optimization
Event sourcing stores events instead of state, enabling replay and audit
Event stores like Kafka and EventStoreDB provide the persistence layer
Projections build read models from events asynchronously
Eventual consistency is a fundamental tradeoff
These patterns suit audit-heavy, temporal-query-rich, and multi-read-model domains
They add complexity and require careful consideration before adoption

Pre-Deployment Checklist

- [ ] Event store selected and operational (Kafka, EventStoreDB, or other)
- [ ] Aggregate boundaries defined and documented
- [ ] Event schema versioning strategy implemented (upcasters or versioned types)
- [ ] Projection infrastructure in place for read model updates
- [ ] Snapshot strategy defined for long-running aggregates
- [ ] Monitoring for projection lag and event consumption delays
- [ ] Read model rebuild procedure documented and tested
- [ ] Eventual consistency handling documented for UI teams
- [ ] Security model defined for event store access
- [ ] Schema evolution testing implemented

Interview Questions

1. What is the fundamental difference between CQRS and traditional CRUD architectures?

Expected answer points:

CRUD uses a single model for both reads and writes; CQRS separates them into distinct command and query models
Traditional CRUD compromises both read and write optimization on the same model
CQRS allows independent optimization of each side—writes can use normalized storage while reads use denormalized formats
The command model handles validation and state changes; the query model handles efficient data retrieval

2. Explain the relationship between CQRS and event sourcing. Can you have one without the other?

Expected answer points:

CQRS and event sourcing are complementary but independent patterns
CQRS alone gives you separate read/write models but still typically stores current state
Event sourcing stores the sequence of events rather than state, enabling replay and audit
Yes, you can have CQRS without event sourcing (using regular state storage for the write model)
You can also have event sourcing without CQRS (storing events but using same model for reads)
The combination is powerful: CQRS + event sourcing gives optimized reads/writes plus complete audit trail and temporal queries

3. What is an event store and how does it differ from a traditional database?

Expected answer points:

An event store is a specialized database designed for append-only event storage
Core operations: append events, read stream by aggregate ID, subscribe to event streams
Events are immutable once written—unlike traditional databases where records can be updated/deleted
Stores the sequence of events that led to current state, not the current state itself
Supports event replay to reconstruct aggregate state at any point in time
Popular implementations: EventStoreDB (DDD-native), Apache Kafka (streaming platform adapted for event sourcing)

4. What is eventual consistency in CQRS and why does it occur?

Expected answer points:

Eventual consistency means read models may lag behind writes temporarily
Occurs because read models are updated asynchronously from the write store through projections
When a command completes, the event is persisted immediately but read model updates happen in the background
Lag duration varies: local processing (milliseconds), Kafka consumer lag (ms to seconds), cross-datacenter (seconds to minutes)
Applications must be designed to handle stale reads during this window
Mitigation strategies: optimistic UI, read-your-writes consistency, manual/auto-refresh mechanisms

5. How do you handle event schema evolution in event sourcing?

Expected answer points:

Event schemas change over time as domain understanding evolves
Old events must still deserialize correctly despite schema changes
Common approaches: version numbers in events, upcasters (transform old event formats to new), or both
Upcasters are functions that transform events from one schema version to another
Snapshot strategy helps: periodic snapshots reduce reconstruction time for long event histories
Testing schema evolution is critical—verify old events can be replayed with new code

6. What is the aggregate reconstruction process and how do snapshots help?

Expected answer points:

Aggregate reconstruction replays events from the event stream to build current state
For each event, the aggregate's apply method updates state accordingly
Without snapshots, reconstruction time grows linearly with event count
Snapshots store periodic state checkpoints—reconstruction starts from the nearest snapshot then replays remaining events
Trade-off: simpler reconstruction vs storage overhead for snapshots
Common pattern: snapshot every N events (e.g., every 100 events)

7. What makes idempotent projection handlers important in CQRS/event sourcing?

Expected answer points:

Projections may process the same event multiple times due to retries, consumer group rebalances, or crashes mid-update
Idempotent handlers ensure duplicate processing produces the same result as single processing
Implementation: track processed event IDs in a set; skip events already processed
Uses upsert operations instead of blind inserts (can fail on duplicates)
Prevents duplicate or corrupted data in read models from retry scenarios
Pruning mechanism needed to prevent unbounded memory growth in the processed events set

8. When would you choose Kafka over EventStoreDB for event sourcing?

Expected answer points:

Kafka is an event streaming platform, not a true event store in DDD sense—lacks aggregate-centric operations
Choose Kafka when: already using Kafka for other use cases, need high-throughput streaming, need multi-subscriber consumption patterns, need the broader Kafka ecosystem
Choose EventStoreDB when: event sourcing is the primary pattern, need native aggregate support, built-in projections, native event versioning and upcasting, simpler operational model
Kafka excels at horizontal scaling and durability through partitioning
EventStoreDB better fits pure event sourcing workloads where aggregate boundaries are clear

9. What are the main challenges teams face when adopting CQRS and event sourcing?

Expected answer points:

Mental model shift: different from traditional CRUD, requires learning new patterns
Eventual consistency handling: applications must be designed for stale reads
Projection rebuild times: large aggregates can be slow to reconstruct without snapshots
Event schema evolution complexity: upcasters and versioning add operational overhead
Increased infrastructure complexity: separate read/write stores, event store, projection workers
Team learning curve: requires expertise in event store administration, projection development, debugging distributed systems
Debugging distributed systems is inherently harder than monolithic debugging

10. Design a decision framework for when to adopt CQRS and event sourcing.

Expected answer points:

Do you have genuine need for event replay or temporal queries? If yes, strong candidate for event sourcing
Do you need multiple, differently-optimized read models for different consumers? Supports CQRS
Is the team prepared for operational complexity? Requires dedicated expertise
Is eventual consistency acceptable for your use cases? If strong consistency required everywhere, friction will outweigh benefits
Audit requirements: regulatory/compliance needs favor event sourcing's immutable log
Domain complexity: rich behavior domains benefit from explicit command models
Start incrementally: CQRS alone before adding event sourcing; prove value at each step

11. How would you implement snapshotting to optimize aggregate reconstruction performance?

Expected answer points:

Snapshots store periodic state checkpoints to avoid replaying entire event history
Define snapshot interval based on event count (e.g., every 100 events)
Reconstruction process: load nearest snapshot, then replay events after snapshot timestamp
Snapshot contains all aggregate state fields plus version number
Trade-off: faster reconstruction vs storage overhead for snapshot objects
Monitor snapshot age and create new snapshots when version gap grows too large
Consider snapshot frequency based on aggregate event rate and acceptable reconstruction time

12. Explain how upcasters work and why they are necessary for event schema evolution.

Expected answer points:

Upcasters transform events from old schema versions to new schema versions during replay
When event schemas change, old events stored in the event store may not deserialize correctly with new code
Upcaster functions apply transformations to convert old event format to current format
Typically chained: v1 → v2 → v3 upcasters applied sequentially for older events
Alternative approach: embed version number in event and handle version branching in apply logic
Upcasters allow adding new fields with defaults without breaking old events
Test upcasters by replaying historical events through the full upcaster chain

13. What is the relationship between aggregates and event streams in DDD?

Expected answer points:

An aggregate is a cluster of related objects treated as a single unit for data changes
Each aggregate instance has its own event stream—events belong to one aggregate's history
Aggregate boundary defines consistency scope: within aggregate, strong consistency; cross-aggregate, eventual consistency
Command handlers load aggregate by replaying its event stream, apply command, emit new events
Events reference aggregate ID, aggregate version, and contain the domain data that changed
Split aggregates at natural bounce points: Order vs Inventory are separate aggregates with separate streams
Stream naming convention: `aggregateType-aggregateId` (e.g., `Order-abc123`)

14. How do you handle cross-aggregate transactions in an event-sourced system?

Expected answer points:

Cross-aggregate transactions cannot use strong consistency—aggregates only handle their own state

Saga pattern coordinates multi-aggregate workflows through events and compensating actions

When aggregate A needs aggregate B to change: emit event from A, saga subscribes and issues command to B

Compensating transactions reverse steps if a later step fails

Order matters: booking flow reserves flight, then hotel, then car—if hotel fails, undo flight reservation

Eventual consistency means temporary inconsistency during saga execution—users see intermediate states

Design sagas to be eventually consistent and compensate gracefully

15. Compare synchronous vs asynchronous projection strategies and their trade-offs.

Expected answer points:

Synchronous projection: command handler updates read model before returning success. Low latency for reads, but command latency increases
Asynchronous projection: event is persisted, projection happens in background. Command returns immediately, reads may be stale
Hybrid approach: synchronous update for critical read path, async for rest
Async projection scales better under high write load—projection workers scale independently
Async projection complicates read-your-writes consistency—user may not see their own write immediately
Consider: for which read models is immediate consistency required? Only those get synchronous updates

16. How would you design a read model for a dashboard showing real-time metrics?

Expected answer points:

Real-time dashboard needs minimal lag between write and visible update
Use synchronous projection or very low-latency async (local event processing)
Pre-aggregate metrics in projection: maintain running totals, counts, averages in read model
Denormalize for query pattern: dashboard widget data stored in format ready for display
Consider separate read model per widget rather than one model serving all dashboard needs
Include projection lag monitoring—alert if read model falls behind event stream
For high-cardinality metrics (per-user), consider downsampling or aggregating at write time

17. What are the security implications of event sourcing and how do you address them?

Expected answer points:

Event store contains full history including deleted data—cannot just delete records
PII in events requires encryption at rest and access controls
Commands validate authorization; events should not leak sensitive data to unauthorized consumers
Use separate streams per tenant in multi-tenant systems for isolation
Implement event masking: transform sensitive fields when replaying to unauthorized consumers
Audit log of event store access complements the event log itself
Consider event immutability vs right to be forgotten regulations (GDPR)—requires special handling

18. How do you test event-sourced aggregates and projection handlers?

Expected answer points:

Test aggregates by replaying a sequence of events and asserting final state
Given events [OrderCreated, ItemAdded], when Apply command ConfirmOrder, then emit OrderConfirmed
Test invalid state transitions: given OrderCancelled, when ConfirmOrder, then reject with InvalidStateTransition
Projection tests: given events, when projection processes them, then read model matches expected state
Test idempotency by calling projection handler multiple times with same event and verifying same result
Integration tests: real event store, real read model, verify rebuild from scratch produces correct state
Property-based testing: generate random event sequences and verify aggregate invariants hold

19. Describe a strategy for migrating an existing CRUD system to CQRS without rewriting everything.

Expected answer points:

Introduce CQRS incrementally: start with one bounded context or one read model
Phase 1: Add read model alongside existing CRUD; sync via change data capture or triggers
Phase 2: New features use command model; old features remain CRUD
Phase 3: Migrate read-heavy features to use read model; eventually migrate write side
Keep existing database as write store initially; add denormalized read model for new queries
Use strangler fig pattern: new CQRS features wrap around legacy system, gradually replace
Event sourcing added last after CQRS settles—it adds complexity beyond just separation

20. How does event sourcing affect database backup and disaster recovery strategies?

Expected answer points:

Event store backup is simpler: append-only logs, no complex transaction logs, no deleted records
Restore is replay: load event store backup, rebuild all read models from events
Backup frequency matters less than with state storage—no data lost between backups if events are immutable
Snapshots reduce recovery time objective (RTO): rebuild from snapshot + recent events vs full replay
Cross-datacenter replication of event stream provides disaster recovery without complex failover
Read models are disposable: if read model database fails, rebuild from event store
Test disaster recovery: periodically restore to fresh environment and verify read model rebuild
Backup encryption and access controls still required for event store containing sensitive history

Conclusion

CQRS and event sourcing solve real problems in distributed systems. Separating read and write concerns, keeping the full history of changes, replaying events when things go wrong—these aren’t academic ideas. They come from dealing with messy production systems.

The tradeoffs are genuine. You take on operational complexity, eventual consistency headaches, and a learning curve for the whole team. I’ve seen teams struggle with this for months before it clicked.

For most applications, keep it simple. A monolith with a well-designed database handles a lot. But when audit trails matter, when you need to query historical states, when different consumers want wildly different views of the same data—these patterns earn their keep.

CQRS and Event Sourcing: Patterns for Distributed Data

What is CQRS

Why Separate Reads and Writes

Basic CQRS Flow

Event Sourcing Fundamentals

Storing Events Instead of State

Aggregates and Event Streams

At-Least-Once vs Exactly-Once Delivery

Event Ordering Guarantees

Backpressure Handling

Projections vs Transformations

Cold Start Considerations

Saga Pattern Integration

Sagas as Coordinators

Event Sourcing for Sagas

Command and Query Separation in Sagas

Event Stores

Kafka

EventStoreDB

Other Options

Building Read Models with Projections

How Projections Work

Projection Examples

Command Handler Implementation

Aggregate Reconstruction with Snapshots

Idempotent Projection Handler

Eventual Consistency Implications

What Eventual Consistency Means

User Experience Considerations

Eventual Consistency Flow

Production Failure Scenarios

Consistency Guarantees

When to Use CQRS and Event Sourcing

Trade-off Comparison

Good Fits

Related Patterns

Quick Recap

Key Points

Pre-Deployment Checklist

Interview Questions

Further Reading

Conclusion

Category

Tags

Related Posts

CQRS Pattern

Event Sourcing

Amazon Architecture: Lessons from the Pioneer of Microservices