Event-Driven Architecture: Events, Commands, and Patterns

Learn event-driven architecture fundamentals: event sourcing, CQRS, event correlation, choreography vs orchestration, and implementation patterns.

published: March 22, 2026 reading time: 30 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Event-driven architecture flips the script on microservices communication: instead of services calling each other directly, they publish events and let interested consumers react. This decouples senders from receivers in both time and space. This guide cuts through the jargon: events vs commands, event sourcing trade-offs, CQRS implementation patterns, correlation IDs for distributed tracing, and the choreography vs orchestration debate. By the end you'll know when EDA actually helps versus when it just adds complexity.

Event-Driven Architecture: Events, Commands, and Patterns

Introduction

In request-response, Service A calls Service B directly and waits. In event-driven architecture, Service A publishes an event and moves on. Service B (and C, D, and whoever else is listening) picks it up and reacts. The sender doesn’t know or care who’s listening.

This decoupling in time and space is what makes EDA powerful. Service A doesn’t wait for Service B to finish processing. Service B can be down, busy, or not running at all — the event sits in a queue until Service B is ready. That resilience and flexibility comes with complexity you need to understand before adopting it.

This post covers the core concepts of EDA: events vs commands, event sourcing, CQRS, correlation, and the choreography vs orchestration debate.

Topic-Specific Deep Dives

Event Sourcing

Event sourcing is a pattern where you store the full history of state changes as events, rather than just the current state. The current state is derived by replaying events.

graph LR
    E1[UserRegistered] --> Store[(Event Store)]
    E2[EmailChanged] --> Store
    E3[NameUpdated] --> Store

    Store -->|replay| State[Current State]

Instead of:

UPDATE users SET name = 'Alice' WHERE id = 1;

You append:

UserNameUpdated { user_id: 1, old_name: 'Bob', new_name: 'Alice', timestamp: ... }

Benefits of Event Sourcing

Complete audit trail: Every state change is stored, immutable and auditable
Temporal queries: Ask “what was the customer’s address on January 15th?”
Replay: If your read model is wrong, rebuild it from events
Decoupled write and read: Events are the source of truth, projections are derived

Challenges of Event Sourcing

Event schema evolution: Old events must still deserialize correctly as your schema changes
Projections can be slow: Replaying millions of events takes time
Eventual consistency: Read models lag behind writes as projections catch up
Complexity: More moving parts than simple CRUD

Handling Schema Evolution

Events are immutable, but your schema will change. When UserCreated gets a new field two years from now, old events still need to deserialize correctly. Several approaches handle this: upcasters (transformers that convert old versions to new), version numbers in event type names like UserCreatedV2, and snapshots that cap replay depth.

CQRS: Command Query Responsibility Segregation

CQRS separates read and write models. Writes go through one model (optimized for commands). Reads go through a different model (optimized for queries).

Writes: API -> Command Handler -> Event Store
Reads:  Query -> Read Model (materialized view)

The event store is the write side. Read models are projections built from events.

graph LR
    Command[Command] --> CommandHandler[Command Handler]
    CommandHandler --> EventStore[(Event Store)]

    EventStore -->|project| ReadModel1[Read Model: Orders by User]
    EventStore -->|project| ReadModel2[Read Model: Orders by Date]
    EventStore -->|project| ReadModel3[Read Model: Revenue Dashboard]

When to Use CQRS

CQRS shines when read and write workloads differ significantly, you need multiple optimized read representations, teams can work independently on each side, or you’re already using event sourcing. If your domain is simple CRUD with basic queries, CQRS adds complexity you don’t need.

Event Correlation

In distributed systems, a single business transaction spans multiple services. Tracking such transactions requires correlation IDs.

A correlation ID is a unique identifier attached to all events resulting from the same originating action:

UserSignsUp (correlation_id: abc-123)
  -> AccountCreated (correlation_id: abc-123)
     -> WelcomeEmailSent (correlation_id: abc-123)
     -> AnalyticsEvent (correlation_id: abc-123)

With correlation IDs, you can trace a complete transaction across services and logs.

Implementation

Pass correlation ID through the entire call chain:

def handle_signup(user_data):
    correlation_id = str(uuid.uuid4())
    event = {
        'type': 'UserSignedUp',
        'correlation_id': correlation_id,
        'user_id': user_data['id'],
        'email': user_data['email']
    }
    event_bus.publish(event)

Each service logs its correlation ID, enabling distributed tracing.

Choreography vs Orchestration

When a business transaction spans multiple services, who coordinates it? Two approaches:

Choreography

Services react to events and emit their own events. No central coordinator.

OrderService: OrderPlaced -> InventoryService: ReserveInventory -> PaymentService: ChargePayment
                                        -> ShippingService: ScheduleShipping

Each service knows only its own part. The overall flow emerges from the event chain.

Pros:

Services are truly decoupled
No single point of failure
Easy to add new consumers

Cons:

Behavior is scattered across services
Hard to see the overall transaction
Difficult to implement transactions that must succeed or fail together

Orchestration

A central orchestrator coordinates the transaction:

graph LR
    Orch[Order Orchestrator] -->|Reserve| Inventory[Inventory Service]
    Orch -->|Charge| Payment[Payment Service]
    Orch -->|Schedule| Shipping[Shipping Service]

The orchestrator knows the complete workflow. It decides what to do next based on responses.

Pros:

Behavior is centralized and visible
Easier to implement complex workflows with branches and compensation
Transaction boundaries are explicit

Cons:

The orchestrator becomes a central point of coupling
Risk of “smart middleware” anti-pattern (business logic in the orchestrator)
Single point of failure (mitigated by workflow engines with persistence)

Which to Use

Choreography works well for simple, linear workflows where services are truly independent. Orchestration works better for complex workflows with complex compensation logic.

In practice, many systems use a hybrid: orchestration for the core business workflow, choreography for peripheral side effects.

Trade-off Analysis

Factor	EDA	Request-Response	Notes
Coupling	Loose - producers don’t know consumers	Tight - client knows service	EDA enables independent evolution
Scalability	Horizontal - event broker handles load	Vertical - service handles load	EDA scales better for high write volume
Consistency	Eventual (eventual consistency)	Strong (immediate)	EDA requires handling lag
Latency	Low for publish, variable for consume	Predictable	EDA has higher average latency
Debugging	Harder - distributed transactions	Easier - synchronous calls	EDA needs correlation IDs and tracing
Event Schema	Must be versioned and stable	API contracts only	EDA requires schema governance
Infrastructure	Event broker needed (Kafka, RabbitMQ)	Simple HTTP/REST	EDA adds infrastructure complexity
Data Recovery	Natural - replay events	Point-in-time snapshots	EDA provides better recovery
Learning Curve	Steep - new patterns	Gentle - familiar REST	EDA requires paradigm shift

When to Use / When Not to Use

When to Use Event-Driven Architecture

Microservices with independent scaling: When services should be decoupled and scale independently
High write volume with audit requirements: When you need complete audit trails of all state changes
Multiple read models: When different consumers need different representations of the same data (CQRS)
Real-time event processing: When you need immediate reaction to events (notifications, analytics, monitoring)
System integration: When integrating multiple systems that evolve independently
Temporal queries: When you need to query historical state (“what was the balance on date X?”)

When Not to Use Event-Driven Architecture

Simple request-response logic: When your use case is straightforward CRUD operations
Strong consistency requirements: When you need immediate consistency across all services
Small teams with limited expertise: When operational complexity exceeds team capacity
Low-latency synchronous requirements: When sub-millisecond response times are critical
Simple workflows: When workflow steps are few and do not benefit from decoupling
Batch-oriented processing: When your use case is bulk data processing rather than real-time events

Message Ordering and Delivery Guarantees

Distributed systems cannot guarantee both ordering and availability. You must choose between ordering guarantees and availability.

At-Most-Once Delivery

The message is delivered at most once - it may be lost but never duplicated. The consumer does not acknowledge receipt, so the broker can discard it after sending.

Producer -> Broker -> Consumer (no ack)
           [message lost if consumer offline]

Use case: Monitoring metrics where occasional loss is acceptable.

At-Least-Once Delivery

The message is guaranteed to arrive, but may be delivered multiple times. The consumer acknowledges after processing, and the broker redelivers if no ack is received.

Producer -> Broker -> Consumer (ack after process)
           [redelivered if no ack received]
           [duplicate possible]

Use case: Payment processing, where missing a transaction is worse than a duplicate.

Exactly-Once Delivery

The message is delivered exactly once, neither lost nor duplicated. Requires coordination between producer confirmation and consumer offset management.

Kafka: Producer tx + Consumer offset commit
       [most expensive but safest]

Use case: Financial transactions where duplicates or losses are both unacceptable.

Ordering Guarantees

Kafka provides per-partition ordering. All events for the same partition key are processed in order. To guarantee cross-partition ordering, use a single partition or explicit sequencing.

# Ensures ordering within user_id
producer.send('events', key=b'user-123', value=event)

For systems requiring global ordering, consider using a single partition topic or implementing sequence numbers in event payloads.

Saga Pattern

When a business transaction spans multiple services, you need a way to handle failures gracefully. Sagas manage distributed transactions through a sequence of local transactions, each with a compensating transaction for rollback.

Why Sagas Instead of Distributed Transactions

Traditional ACID transactions do not scale across services. Two-phase commit (2PC) introduces tight coupling and availability loss. Sagas provide a looser coupling model with compensating transactions.

Types of Sagas

Choreography-based Saga: Each service publishes events that trigger the next service’s local transaction. No central coordinator.

graph LR
    O[Order Service] -->|OrderCreated| P[Payment Service]
    P -->|PaymentCaptured| I[Inventory Service]
    I -->|InventoryReserved| S[Shipping Service]
    S -->|ShippingScheduled| O

Orchestration-based Saga: A central saga orchestrator manages the workflow, deciding what to do next and triggering compensating transactions on failure.

graph LR
    Orch[Order Saga Orchestrator] -->|CreateOrder| OrderSvc[Order Service]
    Orch -->|CapturePayment| PaySvc[Payment Service]
    Orch -->|ReserveInventory| InvSvc[Inventory Service]
    Orch -->|ScheduleShipping| ShipSvc[Shipping Service]

    PaySvc -.->|PaymentFailed| Orch
    Orch -.->|CancelOrder| OrderSvc
    Orch -.->|RefundPayment| PaySvc

Implementing Compensation

Each saga step must define its compensation logic:

class OrderSaga:
    def create_order(self, order_data):
        order = self.order_service.create(order_data)
        return SagaStep(order_id=order.id, forward=self.capture_payment, backward=self.cancel_order)

    def capture_payment(self, step):
        payment = self.payment_service.capture(step.order_id)
        return SagaStep(payment_id=payment.id, forward=self.reserve_inventory, backward=self.refund_payment)

    def cancel_order(self, step):
        self.order_service.cancel(step.order_id)

    def refund_payment(self, step):
        self.payment_service.refund(step.payment_id)

Saga vs Event Sourcing

Sagas focus on workflow coordination. Event sourcing focuses on state history. They complement each other:

Saga without Event Sourcing: Orchestrates steps but does not persist the saga state
Saga with Event Sourcing: Each saga step emits events, creating an audit trail of workflow decisions

Event Schema Design

Events are the contract between producer and consumer. A well-designed event schema evolves gracefully and remains comprehensible across services.

Event Envelope Pattern

Every event should have a standard envelope:

{
  "event_id": "uuid-v4",
  "event_type": "OrderPlaced",
  "event_version": "1.0",
  "timestamp": "2026-03-22T10:30:00Z",
  "correlation_id": "uuid-for-tracing",
  "causation_id": "uuid-of-triggering-event",
  "payload": {
    "order_id": "ORD-12345",
    "customer_id": "CUST-67890",
    "total_amount": 99.99,
    "currency": "USD"
  }
}

This envelope provides observability, traceability, and versioning at the schema level.

Schema Registry

Centralize event schema definitions using a schema registry:

Confluent Schema Registry: Stores Avro or JSON Schema for Kafka events
AWS Glue Schema Registry: AWS-managed schema registry for Kafka and Kinesis
Custom registry: Simple API with versioning, validation, and compatibility checks

Schema registry benefits:

Backward compatibility: New schema versions can read old events
Forward compatibility: Old schema versions can read new events (ignoring new fields)
Schema evolution rules: Define which changes are allowed (add fields, remove fields, etc.)

Event Design Principles

Use flat structures: Nested objects complicate schema evolution
Include timestamps: Essential for ordering and debugging
Add context fields: correlation_id, causation_id enable distributed tracing
Version explicitly: Include version number to handle schema evolution
Use primitive types: Avoid complex types in early versions

Event Processing Patterns

Beyond simple publish-subscribe, event processing involves patterns for handling high-volume, low-latency, and fault-tolerant scenarios.

Event Streaming vs Event Carousel

Event Streaming: Continuous flow of events processed in real-time. Kafka excels at this.

Producer1 -> Topic -> Consumer1
            Topic -> Consumer2
            Topic -> Consumer3

Event Carousel: Repeated processing of the same event batch until consumed. Useful for batch-oriented workflows.

Filtering and Transformation

Filter events at the broker level to reduce network traffic:

# Kafka Streams example: filter low-value events
stream.filter((event) => event.amount > 100)

Transform events before consumption:

# Add enrichment before storing
stream.map((event) => ({
  ...event,
  enriched_at: timestamp,
  region: lookup_region(event.customer_id)
}))

State Stores

Maintain stateful event processing using state stores:

# Kafka Streams: materialized view
orders_by_user = stream.groupByKey()
    .aggregate(
        initializer=[],
        aggregator=(key, event, existing) => existing.append(event)
    )

State stores enable:

joins: Combining events from multiple topics
aggregations: Running totals and windowed computations
tables: Latest value per key

Outbox Pattern

The outbox pattern solves a fundamental reliability problem in event-driven systems: the dual-write issue. When your application needs to update a database and publish an event, doing both non-atomically creates inconsistency risk. If the database commits but the event publish fails, your system enters an invalid state.

Without the outbox pattern:

# Problem: These two operations are not atomic
def handle_order(order_data):
    order = db.create_order(order_data)  # Commits
    event_bus.publish(OrderCreated(order))  # Could fail after commit

With the outbox pattern:

def handle_order(order_data):
    order = db.create_order(order_data)
    db.save_outbox_event(OutboxEvent(type='OrderCreated', payload=order))  # Same transaction
    # Commit both or neither

# Separate process polls outbox and publishes
def outbox_relay():
    events = db.get_pending_outbox_events()
    for event in events:
        event_bus.publish(event)
        db.mark_outbox_processed(event.id)

The outbox table lives in the same database as your business data. A dedicated relay process polls pending events and publishes them to the broker. This guarantees at-least-once delivery because the event is persisted before the publish acknowledgment returns.

Production consideration: Use a single-threaded relay or partition by aggregate ID to prevent out-of-order publishing when multiple instances run.

Event Notification vs Event-Carried State Transfer

Two competing philosophies govern how much data to include in an event:

Event Notification: The event contains only enough information to identify what happened. Consumers must query back to the producer for full details.

{ "event_type": "OrderPlaced", "order_id": "ORD-123", "timestamp": "..." }

Consumers call GET /orders/ORD-123 to get the complete order details.

Event-Carried State Transfer: The event contains all data consumers need without callback queries.

{ "event_type": "OrderPlaced", "order_id": "ORD-123", "customer_id": "...", "items": [...], "total": 99.99 }

Trade-offs:

Aspect	Event Notification	Event-Carried State Transfer
Consumer complexity	Higher (needs callback)	Lower (self-contained)
Producer load	Lower	Lower (no callbacks)
Network calls	Additional per event	None
Data size	Small	Larger
Consumer independence	Lower (coupled to producer API)	Higher
Schema evolution	Easier (consumers re-fetch)	Harder (all consumers must update)

Event notification works well when producers need to protect their data model or when consumers frequently need related data. Event-carried state transfer works well when consumers are truly independent and event sizes remain manageable.

Idempotency in Event Processing

At-least-once delivery means consumers must handle redelivered events gracefully. Idempotency ensures processing an event multiple times produces the same result as processing it once.

Deduplication by event ID:

processed_events = redis.set('processed_events')

def handle_event(event):
    if event.event_id in processed_events:
        return  # Already handled
    process_event(event)
    processed_events.add(event.event_id)

Natural idempotency: Some operations are naturally idempotent:

# Idempotent: Setting a value to the same thing twice
account.update_balance(new_balance=100)

# Non-idempotent: Incrementing
account.increment_balance(amount=10)  # Called twice = +20 instead of +10

Design event handlers to use upserts instead of increments where possible. When increments are necessary, track the event ID to detect duplicates.

Consumer-side acknowledgment:

def process_with_ack(event):
    try:
        result = handle_event(event)
        acknowledge(event.offset)  # Commit offset only after success
        return result
    except:
        negative_acknowledge(event.offset)  # Trigger redelivery
        raise

Backpressure Handling

Backpressure occurs when producers publish events faster than consumers can process them. Left unhandled, this leads to queue accumulation, memory exhaustion, and eventually system failure.

Monitoring consumer lag:

# Kafka: track consumer lag per partition
lag = end_offset - current_offset
if lag > threshold:
    alert("Consumer falling behind")

Strategies for handling backpressure:

Horizontal scaling: Add consumer instances to parallelize processing
Throttling: Implement rate limiting on the producer side
Buffering with limits: Cap buffer size and route overflow to dead letter queues
Adaptive processing: Slow down when queue depth exceeds thresholds

Dead letter queue pattern:

def process_with_dlq(event):
    try:
        handle_event(event)
    except PermanentFailure:
        dlq.publish(event)  # No point retrying
        acknowledge(event.offset)
    except TransientFailure:
        raise  # Retry via redelivery

Design systems to gracefully degrade under load rather than cascade failures. When in doubt, prefer dropping events with alerting over unbounded queue growth.

Production Failure Scenarios

Failure	Impact	Mitigation
Event broker failure	Events not published or consumed	Use broker clustering with replication; implement outbox pattern
Consumer crash mid-processing	Events may be lost or reprocessed	Use acknowledgment after successful processing; idempotent consumers
Out-of-order event delivery	State becomes inconsistent	Use sequence numbers or timestamps; implement conflict resolution
Event schema version mismatch	Consumers cannot deserialize events	Implement schema versioning; use upcasters or schema registry
Poison event blocking consumer	Consumer stops processing	Configure dead letter queues; implement retry limits
Cascading failures	One service failure triggers failures in others	Implement circuit breakers; design for partial availability
Event replay overload	New consumer overwhelms system with replays	Throttle replay rate; process in batches with backpressure
Correlation ID loss	Cannot trace transactions across services	Propagate correlation IDs through all events; log correlation

Common Pitfalls / Anti-Patterns

Pitfall 1: Using Events for Queries

Events are for notifications, not queries. If you find yourself requesting data through events and waiting for responses, use a direct API call instead.

Here is the pattern to watch for:

# Anti-pattern: using events as disguised queries
def handle_user_created(event):
    # Need user details, so publish a request event...
    request_event = {'type': 'RequestUserDetails', 'user_id': event.user_id}
    event_bus.publish(request_event)
    # ...and wait for a response event
    response = wait_for_event('UserDetailsProvided', timeout=5)

This creates what is called event coupling — the consumer now depends on a specific response event that another service must produce. The producer of UserCreated becomes an implicit server for UserDetailsProvided. If that response never arrives (service down, event lost, schema mismatch), your consumer hangs or times out. What started as a decoupled notification turns into a synchronous dependency with extra failure modes.

The fix: If you need data synchronously, call an API directly:

# Better: direct API call
def handle_user_created(event):
    user_details = user_service_api.get_user(event.user_id)
    process_user(user_details)

If you need the data asynchronously, use event-carried state transfer — include the data the consumer needs in the original event payload. The consumer gets everything it needs in one shot, no callbacks needed. This keeps events as true notifications rather than disguised request-response channels.

The rule is simple: events announce that something happened. They do not ask questions.

Pitfall 2: Mixing Commands and Events

Commands expect one handler; events expect zero or more handlers. Sending a command as an event creates confusion about who should handle it and whether responses are expected.

Pitfall 3: Not Planning for Event Schema Evolution

Events are immutable once published. If you change the schema, old events must still deserialize correctly. Plan versioning from the start using upcasters or versioned event types.

Pitfall 4: Over-Engineering with Choreography

Choreography is simple until it is not. Complex workflows with compensation logic are easier to manage with orchestration. Do not default to choreography for everything.

Pitfall 5: Ignoring Eventual Consistency

Read models lag behind writes in EDA. If your business logic assumes immediate consistency, you will have bugs. Design UIs and expectations around eventual consistency.

Pitfall 6: Creating Too Many Event Types

If every database write creates a unique event type, you have an event catalog nightmare. Use a smaller set of event types with sufficient payload to handle multiple use cases.

Consider a real scenario. Your user service tracks profile changes — name, email, address, phone, preferences. A naive approach creates one event type per field:

UserNameUpdated, UserEmailUpdated, UserAddressChanged,
UserPhoneChanged, UserPreferencesUpdated, UserAvatarChanged ...

That catalog grows with every new feature. New service owners dig through 50-plus event types to find the ones they need. Schema registry maintenance balloons.

A better approach: group related changes into fewer, richer event types:

{
  "event_type": "UserProfileUpdated",
  "version": "1.0",
  "changed_fields": ["name", "email"],
  "previous_values": { "name": "Bob", "email": "bob@old.com" },
  "new_values": { "name": "Alice", "email": "alice@new.com" }
}

The event type stays stable (just UserProfileUpdated) while the payload describes what changed. Consumers that care about any profile change subscribe to one event and filter by changed_fields. Consumers that need specific transitions (like email verification) check the field list.

When to split vs. when to group:

Scenario	Approach	Reason
Each field has completely different consumers	Split into separate events	Avoids broadcasting irrelevant data
Fields are processed together (e.g., address update triggers shipping recalculation)	Group into one event type	Reduces event catalog complexity
Regulatory audit requires per-field visibility	Split with field-level event types	Clear audit trail per data point
General profile maintenance	Group into a single `Updated` event	Consumers filter by `changed_fields`

Start with broader event types and split only when you have real consumer-specific requirements. An event type that proves too coarse is easy to split later (the split event becomes a new type). An event type that is too fine-grained creates permanent noise in your system that is hard to undo.

Interview Questions

1. What is the fundamental difference between an event and a command in event-driven architecture?

Expected answer points:

Commands are directed requests for a specific action ("do this"), expecting one handler
Events are broadcast facts that something happened ("this happened"), with zero or more potential listeners
Commands use imperative naming (verb-noun like CreateOrder), while events use past tense naming (noun-verb like OrderPlaced)

2. Explain event sourcing and describe how you would rebuild current state from events.

Expected answer points:

Event sourcing stores the complete history of state changes as a sequence of events rather than persisting current state
To rebuild state, you replay all events from the beginning (or from a snapshot) in order
Each event represents a state mutation; applying them sequentially reconstructs the aggregate
Snapshots can optimize by periodically capturing state to reduce replay depth

3. What is CQRS and when would you choose to use it?

Expected answer points:

CQRS (Command Query Responsibility Segregation) separates read and write models into distinct paths
Write side: Commands go through a command handler to an event store
Read side: Queries hit materialized views (projections) built from events
Best suited when read/write workloads differ significantly, you need multiple optimized read representations, or you are already using event sourcing
Not suitable for simple CRUD domains where the overhead outweighs benefits

4. Describe the choreography vs orchestration patterns for distributed transactions.

Expected answer points:

Choreography: Services react to events and emit their own events without a central coordinator; the overall workflow emerges from the event chain
Orchestration: A central orchestrator coordinates the entire transaction, deciding next steps based on responses
Choreography benefits: True decoupling, no single point of failure, easy to add consumers
Orchestration benefits: Centralized behavior, explicit transaction boundaries, easier compensation logic
Hybrid approach is common in practice: orchestration for core business workflows, choreography for peripheral side effects

5. What are the three types of delivery guarantees in messaging systems?

Expected answer points:

At-most-once: Message may be lost but never duplicated; no consumer acknowledgment
At-least-once: Message is guaranteed to arrive but may be delivered multiple times; consumer acknowledges after processing
Exactly-once: Message arrives exactly once, neither lost nor duplicated; requires coordination between producer and consumer offset management

6. How does the Saga pattern handle distributed transaction failures?

Expected answer points:

Saga manages distributed transactions through a sequence of local transactions, each with a compensating transaction for rollback
Each saga step performs a forward action and defines a backward compensation action
If any step fails, all previously completed steps execute their compensation transactions in reverse order
Choreography-based saga: services publish events triggering next steps
Orchestration-based saga: a central orchestrator manages workflow and compensation

7. What is a correlation ID and why is it important in event-driven systems?

Expected answer points:

A correlation ID is a unique identifier attached to all events resulting from the same originating action
It enables tracing a complete distributed transaction across service boundaries and logs
Passed through the entire call chain: originating action generates the ID, all subsequent events inherit it
Essential for debugging distributed systems where a single business transaction spans multiple services

8. How do you handle event schema evolution when events are immutable?

Expected answer points:

Upcasting: A transformer converts old event versions to new versions before processing
Event versioning: Include version number in event type (UserCreatedV2) with transformation logic
Snapshots: Periodically capture state to limit replay depth for long-lived entities
Schema registry: Centralize schema definitions with backward/forward compatibility rules
Key principle: Old events must still deserialize correctly as your schema changes

9. What are the trade-offs between eventual consistency and strong consistency in EDA?

Expected answer points:

Eventual consistency: Read models lag behind writes; writes return quickly but reads may see stale data
Strong consistency: All services immediately reflect the same state; higher latency but no stale reads
EDA typically embraces eventual consistency for better availability and scalability
Design implications: UIs and business logic must account for lag between write and read model update
Strong consistency required only when absolute accuracy is critical (e.g., financial transactions with regulatory requirements)

10. Describe the key components of a well-designed event envelope schema.

Expected answer points:

event_id: Unique identifier (UUID v4) for deduplication and tracing
event_type: The event name (e.g., OrderPlaced) for routing and handling
event_version: Schema version number for compatibility handling
timestamp: When the event occurred for ordering and debugging
correlation_id: Links related events across the distributed transaction
causation_id: References the event that triggered this one
payload: The actual event data, kept flat for easier schema evolution

11. What is the outbox pattern and why is it important for reliable event publishing?

Expected answer points:

The outbox pattern solves the dual-write problem where failing to write to the database and publish an event atomically leads to inconsistencies
Instead of publishing events directly, you write events to an outbox table in the same transaction as your business data
A separate process (outbox relay) polls the outbox table and publishes events to the broker
This ensures at-least-once delivery because the event is persisted before acknowledgment
Prevents scenarios where the database commits but the event publish fails

12. Explain the difference between event notification and event-carried state transfer patterns.

Expected answer points:

Event notification: The event contains minimal data (just enough to identify what happened); consumers must call back to the producer for full details
Event-carried state transfer: The event contains the complete state needed by consumers, eliminating need for callback queries
Event notification reduces producer complexity but increases consumer complexity and load on producer
Event-carried state transfer simplifies consumers but creates larger events and potential data consistency issues
Choose based on consumer independence needs vs network and storage trade-offs

13. How do you ensure idempotency when processing events in a distributed system?

Expected answer points:

Idempotency ensures that processing the same event multiple times produces the same result as processing it once
Use event_id deduplication: Store processed event IDs and check before processing
Design handlers as naturally idempotent: Use upserts instead of inserts, check-before-update patterns
Implement idempotency keys at the consumer level with acknowledgment based on successful processing
For at-least-once delivery systems, idempotency is critical to handle redelivery without data corruption

14. What are the key differences between Apache Kafka, RabbitMQ, and Amazon SNS/SQS for event-driven architecture?

Expected answer points:

Kafka: Durable log-based messaging with per-partition ordering, replay capability, and consumer group semantics; best for high-throughput event streaming
RabbitMQ: Traditional message broker with exchanges, queues, and bindings; supports complex routing but no native replay
SNS/SQS: AWS managed services with push-based pub/sub (SNS) and pull-based queues (SQS); simplifies operations but creates AWS vendor dependency
Kafka excels at event sourcing and audit trails; RabbitMQ better for request-response integration; SNS/SQS for cloud-native AWS workflows

15. How does the event curtain concept help in event schema governance?

Expected answer points:

The event curtain is an interface that hides the internal event structure from consumers
Producers publish events through the curtain; consumers receive events through the curtain
This allows internal schema changes without breaking consumers who have already processed events
The curtain acts as a version adapter, translating between producer and consumer versions
Critical for maintaining backward compatibility in long-lived event-driven systems

16. What is backpressure handling and why is it important in event processing systems?

Expected answer points:

Backpressure occurs when consumers cannot process events as fast as producers publish them
Without handling, this leads to message accumulation, memory exhaustion, or dropped messages
Strategies: Consumer scaling (add more instances), throttling (slow producer), buffering with limits, dead letter queues for overflow
Kafka offers consumer lag monitoring as a backpressure indicator
Design systems to gracefully degrade under load rather than cascade failures

17. Describe the event sourcing projection rebuild process and how you would handle long projection times.

Expected answer points:

Projection rebuild replays all events from the event store to rebuild a read model from scratch
For large event stores, this can take hours or days, leaving the system unavailable
Snapshotting mitigates this by periodically persisting the projected state, reducing replay to events since the last snapshot
Blue-green projection builds a new projection alongside the existing one, switching when complete
Eventual consistency means read models lag during rebuild; design UIs to handle temporary staleness

18. What testing strategies are appropriate for event-driven microservices?

Expected answer points:

Unit testing: Test individual aggregate and event handlers in isolation with mock dependencies
Contract testing: Verify event schemas between producer and consumer services
Integration testing: Test event flow through the broker with test containers or embedded brokers
Circuit breaker testing: Verify system behavior when downstream services fail
Chaos testing: Inject failures like broker unavailability, message duplication, and out-of-order delivery

19. How does the saga pattern differ from traditional distributed transactions like two-phase commit?

Expected answer points:

Two-phase commit (2PC) provides atomic commitment across services but introduces tight coupling and blocks resources during the prepare phase
Sagas use sequential local transactions with compensating transactions for rollback, avoiding blocking
2PC loses availability during the commit phase; sagas remain available with proper compensation design
Sagas trade atomicity for availability and scalability, accepting eventual consistency
Choreography-based sagas scale better; orchestration-based sagas provide visibility but introduce centralization

20. What are the main considerations for choosing between event-driven and request-driven architecture for a given domain?

Expected answer points:

Use event-driven when: services are truly independent, you need audit trails, multiple read models, or genuine decoupling in time and space
Use request-driven when: you need strong consistency, low-latency synchronous responses, or simple CRUD operations
Hybrid approach is common: core business workflows use request-response, while peripheral side effects use events
Consider team expertise: EDA has higher operational complexity and learning curve
Start with simpler architectures and evolve to event-driven only when benefits are clear and justified

Conclusion

Events vs Commands

Understanding the distinction between events and commands is foundational to event-driven architecture.

Commands are directed requests for a specific action, expecting one handler:

Imperative naming: verb-noun (CreateUser, SubmitOrder, ProcessPayment)
The sender expects a single response
Like a phone call — one person answers

Events are broadcast facts that something happened, with zero or more potential listeners:

Past tense naming: noun-verb (UserCreated, OrderSubmitted, PaymentProcessed)
The sender does not know who is listening
Like a radio broadcast — anyone tuned in receives it

Using the wrong naming convention creates confusion. If you name something CreateUser but treat it as an event (broadcasting to multiple consumers), other developers will expect a single handler response that never comes.

For example:

Command: CreateUser → expects one handler to process and respond
Event: UserCreated → broadcasts to all interested consumers

This distinction shapes how you design message contracts throughout an EDA system.

Key Points

Events are facts that something happened; commands are requests for specific actions
Event sourcing stores state changes as events for replay and audit
CQRS separates read and write models for independent optimization
Correlation IDs enable distributed tracing across service boundaries
Choreography lets services react to events without central coordination
Orchestration uses a central coordinator for complex workflows
Event schema evolution requires backward-compatible versioning strategies
Design for eventual consistency; read models lag behind writes

Pre-Deployment Checklist

- [ ] Event schema versioning strategy defined (upcasters or versioned types)
- [ ] Event schema registry deployed for validation
- [ ] Correlation ID propagation implemented across all services
- [ ] Dead letter queue configured for failed event processing
- [ ] Idempotent event consumers implemented
- [ ] Circuit breakers configured for downstream service calls
- [ ] Event broker clustering and replication configured
- [ ] Monitoring for event consumption lag configured
- [ ] Alert thresholds set for dead letter queue depth
- [ ] Event retention policy defined (for event sourcing replay)
- [ ] Read model rebuild procedure documented
- [ ] Schema evolution testing implemented (old events deserialize correctly)
- [ ] Security controls (authentication, authorization, encryption) configured
- [ ] Workflow compensation logic tested (for orchestration)

Event-Driven Architecture: Events, Commands, and Patterns

Introduction

Topic-Specific Deep Dives

Event Sourcing

Benefits of Event Sourcing

Challenges of Event Sourcing

Handling Schema Evolution

CQRS: Command Query Responsibility Segregation

When to Use CQRS

Event Correlation

Implementation

Choreography vs Orchestration

Choreography

Orchestration

Which to Use

Trade-off Analysis

When to Use / When Not to Use

When to Use Event-Driven Architecture

When Not to Use Event-Driven Architecture

Message Ordering and Delivery Guarantees

At-Most-Once Delivery

At-Least-Once Delivery

Exactly-Once Delivery

Ordering Guarantees

Saga Pattern

Why Sagas Instead of Distributed Transactions

Types of Sagas

Implementing Compensation

Saga vs Event Sourcing

Event Schema Design

Event Envelope Pattern

Schema Registry

Event Design Principles

Event Processing Patterns

Event Streaming vs Event Carousel

Filtering and Transformation

State Stores

Outbox Pattern

Event Notification vs Event-Carried State Transfer

Idempotency in Event Processing

Backpressure Handling

Production Failure Scenarios

Common Pitfalls / Anti-Patterns

Pitfall 1: Using Events for Queries

Pitfall 2: Mixing Commands and Events

Pitfall 3: Not Planning for Event Schema Evolution

Pitfall 4: Over-Engineering with Choreography

Pitfall 5: Ignoring Eventual Consistency

Pitfall 6: Creating Too Many Event Types

Interview Questions

Further Reading

Conclusion

Events vs Commands

Key Points

Pre-Deployment Checklist

Category

Tags

Related Posts

CQRS and Event Sourcing: Distributed Data Management

CQRS Pattern

Event Sourcing