CQRS and Event Sourcing: Distributed Data Management

Learn about Command Query Responsibility Segregation and Event Sourcing patterns for managing distributed data in microservices architectures.

published: reading time: 30 min read author: GeekWorkBench

CQRS and Event Sourcing: Patterns for Distributed Data

Most database patterns assume the same model works for reading and writing. CRUD against a single table works fine when your system is simple. That assumption breaks down as you scale. Read-heavy workloads collide with write-heavy workloads. Reporting queries slow down transactional writes. Security gets tangled when the same model serves both purposes.

CQRS and Event Sourcing tackle these problems head-on. These are not new ideas, but they have become more relevant as teams move toward microservices and distributed systems where a monolithic database is no longer the right tool.

This post explains both patterns, shows how they work together, and helps you decide when the tradeoffs make sense.

What is CQRS

CQRS stands for Command Query Responsibility Segregation. The core idea is straightforward: use different models for writing data versus reading data.

In a typical application, one data model handles both operations:

-- Writing
INSERT INTO orders (customer_id, total, status) VALUES (1, 99.99, 'pending');

-- Reading
SELECT * FROM orders WHERE customer_id = 1;

With CQRS, you split this into two distinct models:

  • Command model: Handles writes. Optimized for validating and persisting state changes.
  • Query model: Handles reads. Optimized for answering specific questions efficiently.

The command model might store data in a normalized relational structure. The query model might use a denormalized format optimized for specific access patterns, like a separate read model for each screen in your application.

Why Separate Reads and Writes

Reads and writes have fundamentally different characteristics:

AspectWritesReads
PatternInfrequent, discrete operationsFrequent, potentially complex queries
Data volumeSingle record or small batchLarge result sets, aggregations
OptimizationNormalization, constraintsDenormalization, indexes
TimingImmediate consistency neededCan tolerate staleness

When you try to optimize both on the same model, you end up compromising both. CQRS lets you optimize each side independently.

Basic CQRS Flow

graph LR
    Command[Command] --> CommandHandler[Command Handler]
    CommandHandler --> WriteStore[(Write Store)]
    WriteStore -->|async projection| ReadStore[(Read Store)]

    Query[Query] --> QueryHandler[Query Handler]
    QueryHandler --> ReadStore

Commands flow to the command handler, which validates and persists changes to the write store. Read models are updated asynchronously from the write store through an event-driven projection mechanism.

Event Sourcing Fundamentals

Event sourcing changes how you think about persistence. Instead of storing current state, you store the sequence of events that led to that state.

Storing Events Instead of State

Traditional state storage overwrites values:

UPDATE accounts SET balance = 500 WHERE id = 1;

The old state is lost.

Event sourcing appends instead:

AccountCredited { account_id: 1, amount: 200, balance: 500, timestamp: 2026-03-24T10:00:00Z }
AccountCredited { account_id: 1, amount: 300, balance: 500, timestamp: 2026-03-24T11:00:00Z }

Every change becomes an immutable record. You derive the current balance by replaying events.

The Event Store

An event store is a specialized database designed for event sourcing. Core operations:

  • Append events: Add new events to a stream
  • Read stream: Retrieve all events for an aggregate in order
  • Subscribe: Get notified when new events are added

The event store is the source of truth. Everything else is derived.

graph LR
    E1[AccountOpened] --> Store[(Event Store)]
    E2[Deposited] --> Store
    E3[Withdrawn] --> Store

    Store -->|replay| Snapshot[Current State]

Aggregates and Event Streams

In domain-driven design, an aggregate is a cluster of related objects treated as a single unit for data changes. Each aggregate has its own event stream.

For an order management system:

  • OrderAggregate stream contains: OrderCreated, ItemAdded, ItemRemoved, OrderConfirmed, OrderShipped, OrderCancelled
  • InventoryAggregate stream contains: InventoryReserved, InventoryReleased, InventoryDamaged

Each stream is append-only. Events are immutable once written.

Benefits of Event Sourcing

Event sourcing solves several problems that are difficult to handle with state storage:

Complete audit trail. You have a log of every state change. Regulatory requirements, debugging, and understanding user behavior all become easier with the full history.

Temporal queries. “What was the account balance on March 1st at 3pm?” With event sourcing, you replay events up to that timestamp. With state storage, you typically do not have this data readily available.

Replay capability. If your read model has a bug, you fix it and rebuild from the event stream. No data is lost because the events are the source of truth.

Debugging and tracing. You can replay production events locally to reproduce bugs. The event log becomes a precise record of what happened.

Parallel projections. Multiple read models can be built from the same event stream independently. You can add new views of your data without modifying the write side.

Challenges of Event Sourcing

Event sourcing is not without friction:

Event schema evolution. Event schemas change over time. Old events must still deserialize correctly. You need a strategy for handling changes: upcasters, version numbers, or both.

Projection rebuild times. Replaying millions of events takes time. Large aggregates can become slow to reconstruct. Periodic snapshots help mitigate this.

Eventual consistency. Read models built from events lag behind writes. Your application must handle stale data.

Increased complexity. The mental model differs from traditional CRUD. Teams need time to adapt.

Event Processing Strategies

The way you process events in a CQRS/event sourcing system shapes performance, reliability, and consistency characteristics.

At-Least-Once vs Exactly-Once Delivery

Event consumers typically deliver messages with at-least-once semantics. Your projection handlers must handle duplicate events gracefully.

At-least-once delivery means an event may be delivered multiple times during retries or rebalances. Without idempotency checks, duplicates corrupt your read model.

Exactly-once delivery requires distributed consensus protocols and is expensive to guarantee. Most systems settle for at-least-once with idempotent handlers.

Design your projections assuming events will arrive more than once. The idempotent handler pattern handles this:

def handle_transfer_event(event):
    # Check if already processed
    if event.event_id in processed_event_log:
        return

    # Process event
    update_read_model(event)

    # Record processed ID
    processed_event_log.add(event.event_id)

Event Ordering Guarantees

Events within a single aggregate stream maintain ordering. Events across aggregates do not.

Within a stream: Events are appended in order. A Withdraw event after a Deposit in the same account stream respects the correct balance.

Across streams: A PaymentProcessed event and a FulfillmentScheduled event may arrive in different orders to different consumers.

For aggregate-centric operations, ordering within the stream matters. For cross-aggregate workflows, you need additional coordination.

Backpressure Handling

Fast event producers can overwhelm slow projection consumers. Without backpressure, lag grows until your system breaks.

Consumer-side backpressure: Kafka consumers control their own fetch rate. If projections process slowly, consumer lag grows.

Producer-side backpressure: Commands wait if the event store is overwhelmed. This backpressure propagates to callers.

Mitigation approaches:

  • Scale horizontally: Add projection worker instances
  • Prioritize streams: Process critical streams first during high load
  • Batch processing: Accumulate events and process in batches for throughput
  • Circuit breakers: Stop consuming when downstream systems fail

Projections vs Transformations

Projections build read models from events. Transformations convert events between schemas.

Projections: OrderCreatedOrderSummary for the orders read model Transformations: OrderCreated_v1OrderCreated_v2 (upcaster)

Keep transformations focused. A transformation should only adapt schema, not change semantic meaning. Business logic belongs in projections.

Cold Start Considerations

New projection consumers start from the beginning of the event stream. For mature systems, this takes time.

Full replay: Read from offset 0. Process all historical events. Takes time proportional to event history length.

From timestamp: Some event stores let you replay from a specific time. Useful for disaster recovery.

Parallel replay: Partition the event stream and process partitions in parallel. Requires consumer support for parallel processing.

Plan for cold start times when designing your recovery procedures. A system with 50 million events might take hours to fully replay.

Saga Pattern Integration

CQRS and event sourcing pair naturally with the saga pattern for distributed transactions across service boundaries.

Sagas as Coordinators

A saga coordinates multiple services to complete a business workflow. Each step of the saga emits events that trigger the next step.

A hotel booking saga:

1. ReserveCar event → CarService reserves car
2. CarReserved event → HotelService reserves room
3. RoomReserved event → FlightService confirms flight
4. All confirmed → BookingComplete event

If any step fails:

1. ReserveCar event → CarService reserves car
2. CarReserved event → HotelService reports: HotelFull
3. HotelUnavailable event → CarService cancels car reservation
4. BookingCancelled event → Notify user

Event Sourcing for Sagas

Each saga instance can be an aggregate in your event store. The saga state becomes explicit.

class BookingSaga:
    def __init__(self, saga_id):
        self.saga_id = saga_id
        self.state = "pending"
        self.steps = []

    def handle(self, event):
        if isinstance(event, CarReserved):
            self.transition_to("hotel_reservation")
        elif isinstance(event, HotelUnavailable):
            self.transition_to("compensating")
            self.emit(CancelCarReservation(self.saga_id))

The event log captures the full saga execution, including compensations. Debugging becomes straightforward.

Command and Query Separation in Sagas

Sagas issue commands and receive events. The saga itself is write-focused. Query side belongs in the services participating in the saga.

A saga coordinator does not directly read from read models. It receives events, makes decisions, and emits commands. The separation keeps the design clean.

Event Stores

Standard relational databases are not well-suited for event sourcing without significant effort. You need specialized infrastructure designed for append-only event storage.

Kafka

Apache Kafka is frequently used as an event store for event sourcing. It provides durable, ordered, partitioned event streams with consumer group semantics.

Kafka excels at:

  • High-throughput event streaming
  • Event replay from any offset
  • Multi-subscriber event consumption
  • Partitioning for horizontal scaling

Kafka is not a true event store in the domain-driven design sense because it lacks aggregate-centric operations. It is an event streaming platform adapted for event sourcing patterns.

EventStoreDB

EventStoreDB is purpose-built for event sourcing. It understands aggregates, streams, and event versioning natively.

EventStoreDB provides:

  • Aggregate-centric APIs
  • Built-in projections
  • Event versioning and upcasting support
  • Subscription mechanisms for building read models

EventStoreDB is a better fit when event sourcing is the primary pattern and you want infrastructure designed specifically for it.

Other Options

  • Postgres with append-only tables: Simple approach for low-volume scenarios
  • Axon Server: Commercial event store with built-in CQRS support
  • Marten: Event sourcing library for .NET using Postgres

The choice depends on your scale requirements, team expertise, and whether you need the broader Kafka ecosystem for other use cases.

Building Read Models with Projections

Projections build read models from events. A projection subscribes to event streams and updates a read model accordingly.

How Projections Work

graph LR
    EventStore[(Event Store)] --> Projection[Projection]
    Projection --> ReadModel[(Read Model)]

    subgraph Projection Logic
        P1[Handle AccountOpened]
        P2[Handle Deposit]
        P3[Handle Withdrawal]
    end

    Projection --> P1
    Projection --> P2
    Projection --> P3

When an event is written, the projection handler for that event type executes. It reads the event, updates the read model, and commits.

Projection Examples

Given these events:

OrderCreated { order_id: 123, customer_id: 456, items: [...] }
ItemAdded { order_id: 123, item_id: 789, quantity: 2 }
OrderConfirmed { order_id: 123 }

A “orders by customer” read model projection:

def handle_order_created(event):
    read_model.insert({
        'order_id': event.order_id,
        'customer_id': event.customer_id,
        'status': 'created',
        'total_items': len(event.items)
    })

def handle_item_added(event):
    read_model.update(
        {'order_id': event.order_id},
        {'$inc': {'total_items': event.quantity}}
    )

def handle_order_confirmed(event):
    read_model.update(
        {'order_id': event.order_id},
        {'$set': {'status': 'confirmed'}}
    )

The projection logic can be anything you can express in code. This flexibility is what makes projections powerful.

Command Handler Implementation

class CommandHandler:
    """Handles commands and emits events."""

    def __init__(self, event_store):
        self.event_store = event_store

    def handle_create_order(self, command: CreateOrderCommand) -> OrderCreatedEvent:
        # Validate business rules
        if not command.items:
            raise InvalidOrderError("Order must have at least one item")

        # Create and persist event
        event = OrderCreatedEvent(
            order_id=str(uuid.uuid4()),
            customer_id=command.customer_id,
            items=command.items,
            total=sum(item.price * item.quantity for item in command.items),
            created_at=datetime.utcnow()
        )
        self.event_store.append(event)
        return event

    def handle_confirm_order(self, command: ConfirmOrderCommand) -> OrderConfirmedEvent:
        # Load current state from event stream
        events = self.event_store.get_stream(command.order_id)
        order = OrderAggregate.reconstruct(events)

        # Validate state transition
        if order.status != "pending":
            raise InvalidStateTransitionError(f"Cannot confirm order in status: {order.status}")

        # Emit event
        event = OrderConfirmedEvent(
            order_id=command.order_id,
            confirmed_at=datetime.utcnow()
        )
        self.event_store.append(event)
        return event

Aggregate Reconstruction with Snapshots

class OrderAggregate:
    """Reconstructs aggregate state from events with snapshot support."""

    SNAPSHOT_INTERVAL = 100  # Create snapshot every 100 events

    @classmethod
    def reconstruct(cls, events: list[Event], snapshot: Snapshot = None) -> "OrderAggregate":
        aggregate = cls()

        if snapshot:
            # Start from snapshot
            aggregate.order_id = snapshot.order_id
            aggregate.customer_id = snapshot.customer_id
            aggregate.status = snapshot.status
            aggregate.items = snapshot.items
            aggregate.total = snapshot.total
            aggregate.version = snapshot.version
        else:
            # Replay from beginning
            aggregate.version = 0

        # Apply remaining events
        start_version = snapshot.version if snapshot else 0
        for event in events[start_version:]:
            aggregate._apply(event)
            aggregate.version += 1

        return aggregate

    def _apply(self, event: Event):
        if isinstance(event, OrderCreated):
            self.order_id = event.order_id
            self.customer_id = event.customer_id
            self.items = event.items
            self.total = event.total
            self.status = "created"
        elif isinstance(event, ItemAdded):
            self.items.append(event.item)
            self.total += event.item.price * event.item.quantity
        elif isinstance(event, OrderConfirmed):
            self.status = "confirmed"

    def should_snapshot(self) -> bool:
        return self.version % self.SNAPSHOT_INTERVAL == 0

    def create_snapshot(self) -> Snapshot:
        return Snapshot(
            order_id=self.order_id,
            customer_id=self.customer_id,
            status=self.status,
            items=self.items,
            total=self.total,
            version=self.version
        )

Idempotent Projection Handler

class IdempotentProjection:
    """Projection that handles duplicate events safely."""

    def __init__(self, read_model_db):
        self.read_model = read_model_db
        self.processed_events = set()  # Track processed event IDs

    def handle(self, event: Event):
        # Idempotency check
        if event.event_id in self.processed_events:
            return  # Already processed, skip

        # Process event
        if isinstance(event, OrderCreated):
            self._handle_order_created(event)
        elif isinstance(event, OrderConfirmed):
            self._handle_order_confirmed(event)

        # Mark as processed
        self.processed_events.add(event.event_id)

        # Prune old event IDs to prevent memory growth
        if len(self.processed_events) > 100000:
            self._prune_processed_events()

    def _handle_order_created(self, event):
        # Upsert instead of insert
        self.read_model.upsert(
            {'order_id': event.order_id},
            {
                '$set': {
                    'customer_id': event.customer_id,
                    'status': 'created',
                    'created_at': event.created_at
                },
                '$setOnInsert': {'_id': event.order_id}
            }
        )

Because projections are derived from events, they can be rebuilt from scratch at any time. This is invaluable when:

  • You find a bug in projection logic
  • You need a new read model from historical data
  • You want to change the read model schema

Rebuilding involves:

  1. Clear the existing read model
  2. Reset the projection checkpoint to the beginning
  3. Replay all events through the projection handler

The event store provides the replay capability.

Eventual Consistency Implications

CQRS and event sourcing introduce eventual consistency between the write side and read models. This has practical implications for how you build applications.

What Eventual Consistency Means

When you write data, the command completes successfully. The event is persisted. But the read model update happens asynchronously. For a brief window, reads return stale data.

The duration of this window varies:

  • Local event processing: milliseconds
  • Kafka consumer lag: milliseconds to seconds
  • Cross-datacenter replication: seconds to minutes

You must design your application assuming reads can be stale.

User Experience Considerations

Users notice eventual consistency in specific scenarios:

  • They submit an order and immediately check their order list
  • They update their profile and refresh the page
  • They delete an item and see it briefly before the page updates

Mitigation strategies:

  • Optimistic UI: Show the expected state immediately, reconcile if needed
  • Read-your-writes consistency: Route reads to a model that includes your own writes
  • Refresh mechanisms: Provide manual refresh or auto-refresh for stale-sensitive views

Eventual Consistency Flow

graph LR
    Command[Command] --> Validate[Validate Command]
    Validate --> Persist[Persist to Event Store]
    Persist --> Emit[Emit Event]
    Emit --> Async[Async Projection]
    Async --> Update[Update Read Model]
    Update --> Query[Query Read Model]
    Query --> User[Return to User]
    Persist --> Query[Read immediately<br/>after write]

Production Failure Scenarios

FailureImpactMitigation
Projection lag too longUsers see stale data for extended periodMonitor projection lag; scale projection workers; alert on SLA breach
Event schema changes break projectionsRead models stop updatingImplement upcasters; version event schemas; test schema evolution
Projection worker crashes mid-updatePartial update to read modelIdempotent projections; event replay capability; transactional outbox
Event store becomes unavailableCommands cannot be processedUse replicated event store; partition for availability
Snapshot too oldLong aggregate reconstruction timeDefine snapshot frequency; monitor snapshot age
Concurrent projections conflictDuplicate or lost updates in read modelUse optimistic concurrency; idempotent operations; event ordering guarantees
Kafka consumer group rebalanceBrief read model unavailabilityPlan for rebalance delays; use consumer lag monitoring
Read model database unavailableReads fail but writes succeedRead replicas; circuit breakers; graceful degradation

Consistency Guarantees

CQRS does not mean you abandon consistency entirely:

  • Commands validate against current state before accepting changes
  • Events are written atomically within an aggregate
  • Read models eventually catch up

The aggregate boundary defines your consistency scope. Within an aggregate, you have strong consistency. Across aggregates, you have eventual consistency.

When to Use CQRS and Event Sourcing

These patterns solve specific problems. They introduce complexity that must be justified.

Trade-off Comparison

CriteriaTraditional CRUDCQRS OnlyCQRS + Event Sourcing
ComplexityLowMediumHigh
Read/Write OptimizationLimited (same model)Independent optimizationIndependent optimization
Audit TrailLimited (last state only)Limited (last state only)Complete history
Temporal QueriesNot supportedNot supportedFull support
Projection RebuildN/AClear and rebuildReplay events
Event Schema EvolutionN/AN/ARequired
InfrastructureStandard databaseWrite + read storesEvent store + projections
Team Learning CurveLowMediumHigh
DebuggingStandardStandardEvent replay

Good Fits

CQRS and event sourcing work well when:

Audit requirements are strict. Financial systems, healthcare records, compliance-heavy domains. The immutable event log satisfies audit requirements naturally.

Temporal queries are frequent. “Show me all state changes for account X in Q3 2025.” With events, this is straightforward. With state storage, you need special infrastructure.

Multiple read models exist. Different consumers need different views of the same data. Reporting, analytics, operational dashboards, user-facing views. Each can have its own optimized read model.

Complex domain with rich behavior. When the write side involves significant business logic, a separate command model lets you express that logic clearly without mixing in read concerns.

Distributed systems with event-driven integration. Microservices that communicate through events naturally benefit from event sourcing. The events become the integration contract between services.

Poor Fits

These patterns are likely overkill when:

Simple CRUD dominates. If your application is mostly create, read, update, delete with straightforward queries, CQRS adds complexity without benefit.

Strong consistency is required everywhere. If the business cannot tolerate stale reads in any scenario, eventual consistency creates friction.

Small team with limited experience. The operational overhead is real. You need expertise in event store administration, projection development, and debugging distributed systems.

Low latency requirements are strict. The additional async steps in CQRS introduce latency. If sub-millisecond response times are critical, the added hops hurt.

Decision Framework

Before adopting CQRS and event sourcing:

  1. Do you have a genuine need for event replay or temporal queries?
  2. Do you need multiple, differently-optimized read models?
  3. Is the team prepared for the operational complexity?
  4. Is eventual consistency acceptable for your use cases?

If you answered yes to the first two questions and are prepared for the complexity, these patterns will likely pay off.

CQRS and event sourcing connect to other patterns in distributed systems.

Event sourcing pairs naturally with event-driven architecture. Events are the communication mechanism, and event sourcing provides the persistence strategy for those events.

The saga pattern handles distributed transactions across microservices. Sagas work well with event sourcing because each saga step can emit events that trigger the next step.

For distributed transactions across services, see distributed transactions which covers two-phase commit and related approaches.

The database per service pattern supports CQRS because each side can use a database optimized for its workload. The write side might use a relational database for transactional integrity. The read side might use a document store or columnar database for query performance.

Quick Recap

Key Points

  • CQRS separates read and write models for independent optimization
  • Event sourcing stores events instead of state, enabling replay and audit
  • Event stores like Kafka and EventStoreDB provide the persistence layer
  • Projections build read models from events asynchronously
  • Eventual consistency is a fundamental tradeoff
  • These patterns suit audit-heavy, temporal-query-rich, and multi-read-model domains
  • They add complexity and require careful consideration before adoption

Pre-Deployment Checklist

- [ ] Event store selected and operational (Kafka, EventStoreDB, or other)
- [ ] Aggregate boundaries defined and documented
- [ ] Event schema versioning strategy implemented (upcasters or versioned types)
- [ ] Projection infrastructure in place for read model updates
- [ ] Snapshot strategy defined for long-running aggregates
- [ ] Monitoring for projection lag and event consumption delays
- [ ] Read model rebuild procedure documented and tested
- [ ] Eventual consistency handling documented for UI teams
- [ ] Security model defined for event store access
- [ ] Schema evolution testing implemented

Interview Questions

1. What is the fundamental difference between CQRS and traditional CRUD architectures?

Expected answer points:

  • CRUD uses a single model for both reads and writes; CQRS separates them into distinct command and query models
  • Traditional CRUD compromises both read and write optimization on the same model
  • CQRS allows independent optimization of each side—writes can use normalized storage while reads use denormalized formats
  • The command model handles validation and state changes; the query model handles efficient data retrieval
2. Explain the relationship between CQRS and event sourcing. Can you have one without the other?

Expected answer points:

  • CQRS and event sourcing are complementary but independent patterns
  • CQRS alone gives you separate read/write models but still typically stores current state
  • Event sourcing stores the sequence of events rather than state, enabling replay and audit
  • Yes, you can have CQRS without event sourcing (using regular state storage for the write model)
  • You can also have event sourcing without CQRS (storing events but using same model for reads)
  • The combination is powerful: CQRS + event sourcing gives optimized reads/writes plus complete audit trail and temporal queries
3. What is an event store and how does it differ from a traditional database?

Expected answer points:

  • An event store is a specialized database designed for append-only event storage
  • Core operations: append events, read stream by aggregate ID, subscribe to event streams
  • Events are immutable once written—unlike traditional databases where records can be updated/deleted
  • Stores the sequence of events that led to current state, not the current state itself
  • Supports event replay to reconstruct aggregate state at any point in time
  • Popular implementations: EventStoreDB (DDD-native), Apache Kafka (streaming platform adapted for event sourcing)
4. What is eventual consistency in CQRS and why does it occur?

Expected answer points:

  • Eventual consistency means read models may lag behind writes temporarily
  • Occurs because read models are updated asynchronously from the write store through projections
  • When a command completes, the event is persisted immediately but read model updates happen in the background
  • Lag duration varies: local processing (milliseconds), Kafka consumer lag (ms to seconds), cross-datacenter (seconds to minutes)
  • Applications must be designed to handle stale reads during this window
  • Mitigation strategies: optimistic UI, read-your-writes consistency, manual/auto-refresh mechanisms
5. How do you handle event schema evolution in event sourcing?

Expected answer points:

  • Event schemas change over time as domain understanding evolves
  • Old events must still deserialize correctly despite schema changes
  • Common approaches: version numbers in events, upcasters (transform old event formats to new), or both
  • Upcasters are functions that transform events from one schema version to another
  • Snapshot strategy helps: periodic snapshots reduce reconstruction time for long event histories
  • Testing schema evolution is critical—verify old events can be replayed with new code
6. What is the aggregate reconstruction process and how do snapshots help?

Expected answer points:

  • Aggregate reconstruction replays events from the event stream to build current state
  • For each event, the aggregate's apply method updates state accordingly
  • Without snapshots, reconstruction time grows linearly with event count
  • Snapshots store periodic state checkpoints—reconstruction starts from the nearest snapshot then replays remaining events
  • Trade-off: simpler reconstruction vs storage overhead for snapshots
  • Common pattern: snapshot every N events (e.g., every 100 events)
7. What makes idempotent projection handlers important in CQRS/event sourcing?

Expected answer points:

  • Projections may process the same event multiple times due to retries, consumer group rebalances, or crashes mid-update
  • Idempotent handlers ensure duplicate processing produces the same result as single processing
  • Implementation: track processed event IDs in a set; skip events already processed
  • Uses upsert operations instead of blind inserts (can fail on duplicates)
  • Prevents duplicate or corrupted data in read models from retry scenarios
  • Pruning mechanism needed to prevent unbounded memory growth in the processed events set
8. When would you choose Kafka over EventStoreDB for event sourcing?

Expected answer points:

  • Kafka is an event streaming platform, not a true event store in DDD sense—lacks aggregate-centric operations
  • Choose Kafka when: already using Kafka for other use cases, need high-throughput streaming, need multi-subscriber consumption patterns, need the broader Kafka ecosystem
  • Choose EventStoreDB when: event sourcing is the primary pattern, need native aggregate support, built-in projections, native event versioning and upcasting, simpler operational model
  • Kafka excels at horizontal scaling and durability through partitioning
  • EventStoreDB better fits pure event sourcing workloads where aggregate boundaries are clear
9. What are the main challenges teams face when adopting CQRS and event sourcing?

Expected answer points:

  • Mental model shift: different from traditional CRUD, requires learning new patterns
  • Eventual consistency handling: applications must be designed for stale reads
  • Projection rebuild times: large aggregates can be slow to reconstruct without snapshots
  • Event schema evolution complexity: upcasters and versioning add operational overhead
  • Increased infrastructure complexity: separate read/write stores, event store, projection workers
  • Team learning curve: requires expertise in event store administration, projection development, debugging distributed systems
  • Debugging distributed systems is inherently harder than monolithic debugging
10. Design a decision framework for when to adopt CQRS and event sourcing.

Expected answer points:

  • Do you have genuine need for event replay or temporal queries? If yes, strong candidate for event sourcing
  • Do you need multiple, differently-optimized read models for different consumers? Supports CQRS
  • Is the team prepared for operational complexity? Requires dedicated expertise
  • Is eventual consistency acceptable for your use cases? If strong consistency required everywhere, friction will outweigh benefits
  • Audit requirements: regulatory/compliance needs favor event sourcing's immutable log
  • Domain complexity: rich behavior domains benefit from explicit command models
  • Start incrementally: CQRS alone before adding event sourcing; prove value at each step
11. How would you implement snapshotting to optimize aggregate reconstruction performance?

Expected answer points:

  • Snapshots store periodic state checkpoints to avoid replaying entire event history
  • Define snapshot interval based on event count (e.g., every 100 events)
  • Reconstruction process: load nearest snapshot, then replay events after snapshot timestamp
  • Snapshot contains all aggregate state fields plus version number
  • Trade-off: faster reconstruction vs storage overhead for snapshot objects
  • Monitor snapshot age and create new snapshots when version gap grows too large
  • Consider snapshot frequency based on aggregate event rate and acceptable reconstruction time
12. Explain how upcasters work and why they are necessary for event schema evolution.

Expected answer points:

  • Upcasters transform events from old schema versions to new schema versions during replay
  • When event schemas change, old events stored in the event store may not deserialize correctly with new code
  • Upcaster functions apply transformations to convert old event format to current format
  • Typically chained: v1 → v2 → v3 upcasters applied sequentially for older events
  • Alternative approach: embed version number in event and handle version branching in apply logic
  • Upcasters allow adding new fields with defaults without breaking old events
  • Test upcasters by replaying historical events through the full upcaster chain
13. What is the relationship between aggregates and event streams in DDD?

Expected answer points:

  • An aggregate is a cluster of related objects treated as a single unit for data changes
  • Each aggregate instance has its own event stream—events belong to one aggregate's history
  • Aggregate boundary defines consistency scope: within aggregate, strong consistency; cross-aggregate, eventual consistency
  • Command handlers load aggregate by replaying its event stream, apply command, emit new events
  • Events reference aggregate ID, aggregate version, and contain the domain data that changed
  • Split aggregates at natural bounce points: Order vs Inventory are separate aggregates with separate streams
  • Stream naming convention: `aggregateType-aggregateId` (e.g., `Order-abc123`)
14. How do you handle cross-aggregate transactions in an event-sourced system?

Expected answer points:

  • Cross-aggregate transactions cannot use strong consistency—aggregates only handle their own state
  • Saga pattern coordinates multi-aggregate workflows through events and compensating actions
  • When aggregate A needs aggregate B to change: emit event from A, saga subscribes and issues command to B
  • Compensating transactions reverse steps if a later step fails
  • Order matters: booking flow reserves flight, then hotel, then car—if hotel fails, undo flight reservation
  • Eventual consistency means temporary inconsistency during saga execution—users see intermediate states
  • Design sagas to be eventually consistent and compensate gracefully
  • 15. Compare synchronous vs asynchronous projection strategies and their trade-offs.

    Expected answer points:

    • Synchronous projection: command handler updates read model before returning success. Low latency for reads, but command latency increases
    • Asynchronous projection: event is persisted, projection happens in background. Command returns immediately, reads may be stale
    • Hybrid approach: synchronous update for critical read path, async for rest
    • Async projection scales better under high write load—projection workers scale independently
    • Async projection complicates read-your-writes consistency—user may not see their own write immediately
    • Consider: for which read models is immediate consistency required? Only those get synchronous updates
    16. How would you design a read model for a dashboard showing real-time metrics?

    Expected answer points:

    • Real-time dashboard needs minimal lag between write and visible update
    • Use synchronous projection or very low-latency async (local event processing)
    • Pre-aggregate metrics in projection: maintain running totals, counts, averages in read model
    • Denormalize for query pattern: dashboard widget data stored in format ready for display
    • Consider separate read model per widget rather than one model serving all dashboard needs
    • Include projection lag monitoring—alert if read model falls behind event stream
    • For high-cardinality metrics (per-user), consider downsampling or aggregating at write time
    17. What are the security implications of event sourcing and how do you address them?

    Expected answer points:

    • Event store contains full history including deleted data—cannot just delete records
    • PII in events requires encryption at rest and access controls
    • Commands validate authorization; events should not leak sensitive data to unauthorized consumers
    • Use separate streams per tenant in multi-tenant systems for isolation
    • Implement event masking: transform sensitive fields when replaying to unauthorized consumers
    • Audit log of event store access complements the event log itself
    • Consider event immutability vs right to be forgotten regulations (GDPR)—requires special handling
    18. How do you test event-sourced aggregates and projection handlers?

    Expected answer points:

    • Test aggregates by replaying a sequence of events and asserting final state
    • Given events [OrderCreated, ItemAdded], when Apply command ConfirmOrder, then emit OrderConfirmed
    • Test invalid state transitions: given OrderCancelled, when ConfirmOrder, then reject with InvalidStateTransition
    • Projection tests: given events, when projection processes them, then read model matches expected state
    • Test idempotency by calling projection handler multiple times with same event and verifying same result
    • Integration tests: real event store, real read model, verify rebuild from scratch produces correct state
    • Property-based testing: generate random event sequences and verify aggregate invariants hold
    19. Describe a strategy for migrating an existing CRUD system to CQRS without rewriting everything.

    Expected answer points:

    • Introduce CQRS incrementally: start with one bounded context or one read model
    • Phase 1: Add read model alongside existing CRUD; sync via change data capture or triggers
    • Phase 2: New features use command model; old features remain CRUD
    • Phase 3: Migrate read-heavy features to use read model; eventually migrate write side
    • Keep existing database as write store initially; add denormalized read model for new queries
    • Use strangler fig pattern: new CQRS features wrap around legacy system, gradually replace
    • Event sourcing added last after CQRS settles—it adds complexity beyond just separation
    20. How does event sourcing affect database backup and disaster recovery strategies?

    Expected answer points:

    • Event store backup is simpler: append-only logs, no complex transaction logs, no deleted records
    • Restore is replay: load event store backup, rebuild all read models from events
    • Backup frequency matters less than with state storage—no data lost between backups if events are immutable
    • Snapshots reduce recovery time objective (RTO): rebuild from snapshot + recent events vs full replay
    • Cross-datacenter replication of event stream provides disaster recovery without complex failover
    • Read models are disposable: if read model database fails, rebuild from event store
    • Test disaster recovery: periodically restore to fresh environment and verify read model rebuild
    • Backup encryption and access controls still required for event store containing sensitive history

    Further Reading

    Conclusion

    CQRS and event sourcing solve real problems in distributed systems. Separating read and write concerns, keeping the full history of changes, replaying events when things go wrong—these aren’t academic ideas. They come from dealing with messy production systems.

    The tradeoffs are genuine. You take on operational complexity, eventual consistency headaches, and a learning curve for the whole team. I’ve seen teams struggle with this for months before it clicked.

    For most applications, keep it simple. A monolith with a well-designed database handles a lot. But when audit trails matter, when you need to query historical states, when different consumers want wildly different views of the same data—these patterns earn their keep.

    Category

    Related Posts

    CQRS Pattern

    Separate read and write models. Command vs query models, eventual consistency implications, event sourcing integration, and when CQRS makes sense.

    #database #cqrs #event-sourcing

    Event Sourcing

    Storing state changes as immutable events. Event store implementation, event replay, schema evolution, and comparison with traditional CRUD approaches.

    #database #event-sourcing #cqrs

    Amazon Architecture: Lessons from the Pioneer of Microservices

    Learn how Amazon pioneered service-oriented architecture, the famous 'two-pizza team' rule, and how they built the foundation for AWS.

    #microservices #amazon #architecture