Event-Driven Architecture: Events, Commands, and Patterns

Learn event-driven architecture fundamentals: event sourcing, CQRS, event correlation, choreography vs orchestration, and implementation patterns.

published: reading time: 27 min read author: GeekWorkBench

Introduction

In request-response, Service A calls Service B directly and waits. In event-driven architecture, Service A publishes an event and moves on. Service B (and C, D, and whoever else is listening) picks it up and reacts. The sender doesn’t know or care who’s listening.

This decoupling in time and space is what makes EDA powerful. Service A doesn’t wait for Service B to finish processing. Service B can be down, busy, or not running at all; the event sits in a queue until Service B is ready. That resilience and flexibility come with complexity you need to understand before adopting it.

This post covers the core concepts of EDA: events vs commands, event sourcing, CQRS, correlation, and the choreography vs orchestration debate.

Topic-Specific Deep Dives

Event Sourcing

Event sourcing is a pattern where you store the full history of state changes as events, rather than just the current state. The current state is derived by replaying events.

graph LR
    E1[UserRegistered] --> Store[(Event Store)]
    E2[EmailChanged] --> Store
    E3[NameUpdated] --> Store

    Store -->|replay| State[Current State]

Instead of:

UPDATE users SET name = 'Alice' WHERE id = 1;

You append:

UserNameUpdated { user_id: 1, old_name: 'Bob', new_name: 'Alice', timestamp: ... }
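
As a minimal sketch of replay, assume events are plain dictionaries with a `type` field and an ISO-8601 UTC `timestamp`. The `apply` and `replay` functions and the sample log are illustrative, not taken from any library. Passing `as_of` stops the replay at a point in time, which is how temporal queries fall out of the same mechanism.

```python
from functools import reduce

# Hypothetical event log for one user, oldest first.
EVENTS = [
    {"type": "UserRegistered", "user_id": 1, "name": "Bob",
     "email": "bob@example.com", "timestamp": "2026-01-01T09:00:00Z"},
    {"type": "EmailChanged", "user_id": 1, "new_email": "bob@work.example",
     "timestamp": "2026-01-10T12:00:00Z"},
    {"type": "UserNameUpdated", "user_id": 1, "old_name": "Bob",
     "new_name": "Alice", "timestamp": "2026-02-01T08:30:00Z"},
]

def apply(state, event):
    """Return the next state without mutating the previous one."""
    kind = event["type"]
    if kind == "UserRegistered":
        return {"user_id": event["user_id"], "name": event["name"],
                "email": event["email"]}
    if kind == "EmailChanged":
        return {**state, "email": event["new_email"]}
    if kind == "UserNameUpdated":
        return {**state, "name": event["new_name"]}
    return state  # Unknown event types are ignored

def replay(events, as_of=None):
    """Fold events into state, optionally stopping at a timestamp."""
    # ISO-8601 UTC strings in one format compare correctly as text.
    selected = [e for e in events if as_of is None or e["timestamp"] <= as_of]
    return reduce(apply, selected, {})

current = replay(EVENTS)
january = replay(EVENTS, as_of="2026-01-15T00:00:00Z")
```

Here `current` reflects all three events, while `january` shows the user as they were before the name change.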

Benefits of Event Sourcing

  • Complete audit trail: Every state change is stored, immutable and auditable
  • Temporal queries: Ask “what was the customer’s address on January 15th?”
  • Replay: If your read model is wrong, rebuild it from events
  • Decoupled write and read: Events are the source of truth, projections are derived

Challenges of Event Sourcing

  • Event schema evolution: Old events must still deserialize correctly as your schema changes
  • Projections can be slow: Replaying millions of events takes time
  • Eventual consistency: Read models lag behind writes as projections catch up
  • Complexity: More moving parts than simple CRUD

Handling Schema Evolution

Events are immutable, but your schema will change. When UserCreated gets a new field two years from now, old events still need to deserialize correctly. Two approaches handle this directly: upcasters (transformers that convert old versions to new) and version numbers in event type names like UserCreatedV2. Snapshots help indirectly: they do not change old events, but they cap replay depth, so fewer old-format events have to be read.
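
An upcaster can be sketched as a small function per version step, chained until the event reaches the latest shape. The schemas here are assumptions for illustration: v2 of UserCreated is imagined to add a `locale` field, and the `version` key, `UPCASTERS` table, and default value are hypothetical.

```python
def upcast_user_created_v1(event):
    """Lift a v1 UserCreated event to the (hypothetical) v2 shape."""
    # Assumption: v2 added `locale`; old events get a default.
    payload = {**event["payload"], "locale": "en-US"}
    return {**event, "version": 2, "payload": payload}

# One upcaster per (type, from_version); each lifts an event by one version.
UPCASTERS = {("UserCreated", 1): upcast_user_created_v1}
LATEST = {"UserCreated": 2}

def upcast(event):
    """Apply upcasters in sequence until the event is at the latest version."""
    while event["version"] < LATEST.get(event["type"], event["version"]):
        event = UPCASTERS[(event["type"], event["version"])](event)
    return event

stored = {"type": "UserCreated", "version": 1,
          "payload": {"user_id": 1, "email": "a@example.com"}}
upgraded = upcast(stored)
```

The stored event is never rewritten; upcasting happens on read, so handlers only ever see the latest version.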

CQRS: Command Query Responsibility Segregation

CQRS separates read and write models. Writes go through one model (optimized for commands). Reads go through a different model (optimized for queries).

Writes: API -> Command Handler -> Event Store
Reads:  Query -> Read Model (materialized view)

The event store is the write side. Read models are projections built from events.

graph LR
    Command[Command] --> CommandHandler[Command Handler]
    CommandHandler --> EventStore[(Event Store)]

    EventStore -->|project| ReadModel1[Read Model: Orders by User]
    EventStore -->|project| ReadModel2[Read Model: Orders by Date]
    EventStore -->|project| ReadModel3[Read Model: Revenue Dashboard]
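
The projection side of the diagram can be sketched as follows, with an in-memory list standing in for the event store. The class names, event fields, and the use of integer cents are assumptions made for the example.

```python
from collections import defaultdict

class OrdersByUser:
    """Read model answering: which orders does this user have?"""
    def __init__(self):
        self.orders = defaultdict(list)

    def handle(self, event):
        if event["type"] == "OrderPlaced":
            self.orders[event["user_id"]].append(event["order_id"])

class RevenueDashboard:
    """Read model keeping a running revenue total."""
    def __init__(self):
        self.total_cents = 0

    def handle(self, event):
        if event["type"] == "OrderPlaced":
            self.total_cents += event["amount_cents"]
        elif event["type"] == "OrderRefunded":
            self.total_cents -= event["amount_cents"]

# Stand-in for the event store: an ordered list of events.
event_store = [
    {"type": "OrderPlaced", "order_id": "ORD-1", "user_id": "u1", "amount_cents": 2500},
    {"type": "OrderPlaced", "order_id": "ORD-2", "user_id": "u2", "amount_cents": 1000},
    {"type": "OrderPlaced", "order_id": "ORD-3", "user_id": "u1", "amount_cents": 4000},
    {"type": "OrderRefunded", "order_id": "ORD-2", "user_id": "u2", "amount_cents": 1000},
]

by_user, revenue = OrdersByUser(), RevenueDashboard()
for event in event_store:
    for projection in (by_user, revenue):
        projection.handle(event)  # Every projection sees every event
```

Each read model keeps only what its queries need, and any of them can be discarded and rebuilt by running the loop again.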

When to Use CQRS

CQRS shines when read and write workloads differ significantly, you need multiple optimized read representations, teams can work independently on each side, or you’re already using event sourcing. If your domain is simple CRUD with basic queries, CQRS adds complexity you don’t need.

Event Correlation

In distributed systems, a single business transaction spans multiple services. Tracking such transactions requires correlation IDs.

A correlation ID is a unique identifier attached to all events resulting from the same originating action:

UserSignsUp (correlation_id: abc-123)
  -> AccountCreated (correlation_id: abc-123)
     -> WelcomeEmailSent (correlation_id: abc-123)
     -> AnalyticsEvent (correlation_id: abc-123)

With correlation IDs, you can trace a complete transaction across services and logs.

Implementation

Pass correlation ID through the entire call chain:

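import uuid

def handle_signup(user_data):
    # Generate the ID once, where the action originates; everything downstream reuses it
    correlation_id = str(uuid.uuid4())
    event = {
        'type': 'UserSignedUp',
        'correlation_id': correlation_id,
        'user_id': user_data['id'],
        'email': user_data['email']
    }
    event_bus.publish(event)  # event_bus: the application's broker client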

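Each downstream service copies the incoming correlation ID onto the events it emits and includes it in its log lines, so a single search on the ID reconstructs the whole transaction.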
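
On the consuming side, one way to do this in Python is the standard library's `logging.LoggerAdapter`, which attaches extra fields to every record. The handler name, logger name, and event fields below are assumptions for the sketch; the output is captured in a string buffer only so the example is self-contained.

```python
import io
import logging

# Log format that puts the correlation ID on every line.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(correlation_id)s %(message)s"))

base = logging.getLogger("account-service")
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

def handle_user_signed_up(event, publish):
    # Bind the incoming ID to every log line written for this event.
    log = logging.LoggerAdapter(base, {"correlation_id": event["correlation_id"]})
    log.info("creating account for user %s", event["user_id"])
    # Follow-up events inherit the ID; this service never generates a new one.
    publish({
        "type": "AccountCreated",
        "correlation_id": event["correlation_id"],
        "user_id": event["user_id"],
    })

published = []
handle_user_signed_up(
    {"type": "UserSignedUp", "correlation_id": "abc-123", "user_id": 42},
    published.append,
)
```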

Choreography vs Orchestration

When a business transaction spans multiple services, who coordinates it? Two approaches:

Choreography

Services react to events and emit their own events. No central coordinator.

OrderService: OrderPlaced -> InventoryService: ReserveInventory -> PaymentService: ChargePayment
                                        -> ShippingService: ScheduleShipping

Each service knows only its own part. The overall flow emerges from the event chain.
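
A choreographed flow can be sketched with an in-memory bus standing in for the broker. The `EventBus` class is illustrative, and the events use past-tense names (InventoryReserved rather than ReserveInventory) because each service is announcing what it did, not telling the next one what to do.

```python
from collections import defaultdict, deque

class EventBus:
    """In-memory stand-in for a broker; events are handled in publish order."""
    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, data):
        self.queue.append((event_type, data))

    def run(self):
        while self.queue:
            event_type, data = self.queue.popleft()
            self.log.append(event_type)
            for handler in self.handlers[event_type]:
                handler(data)

bus = EventBus()
# Each service knows only the event it reacts to and the event it emits.
bus.subscribe("OrderPlaced", lambda d: bus.publish("InventoryReserved", d))
bus.subscribe("InventoryReserved", lambda d: bus.publish("PaymentCharged", d))
bus.subscribe("InventoryReserved", lambda d: bus.publish("ShippingScheduled", d))

bus.publish("OrderPlaced", {"order_id": "ORD-1"})
bus.run()
```

No component holds the whole workflow; it exists only as the sequence recorded in `bus.log`.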

Pros:

  • Services are truly decoupled
  • No single point of failure
  • Easy to add new consumers

Cons:

  • Behavior is scattered across services
  • Hard to see the overall transaction
  • Difficult to implement transactions that must succeed or fail together

Orchestration

A central orchestrator coordinates the transaction:

graph LR
    Orch[Order Orchestrator] -->|Reserve| Inventory[Inventory Service]
    Orch -->|Charge| Payment[Payment Service]
    Orch -->|Schedule| Shipping[Shipping Service]

The orchestrator knows the complete workflow. It decides what to do next based on responses.

Pros:

  • Behavior is centralized and visible
  • Easier to implement complex workflows with branches and compensation
  • Transaction boundaries are explicit

Cons:

  • The orchestrator becomes a central point of coupling
  • Risk of “smart middleware” anti-pattern (business logic in the orchestrator)
  • Single point of failure (mitigated by workflow engines with persistence)

Which to Use

Choreography works well for simple, linear workflows where services are truly independent. Orchestration works better for workflows with branching, many steps, or non-trivial compensation logic.

In practice, many systems use a hybrid: orchestration for the core business workflow, choreography for peripheral side effects.

Trade-off Analysis

| Factor | EDA | Request-Response | Notes |
| --- | --- | --- | --- |
| Coupling | Loose - producers don’t know consumers | Tight - client knows service | EDA enables independent evolution |
| Scalability | Horizontal - event broker handles load | Vertical - service handles load | EDA scales better for high write volume |
| Consistency | Eventual (eventual consistency) | Strong (immediate) | EDA requires handling lag |
| Latency | Low for publish, variable for consume | Predictable | EDA has higher average latency |
| Debugging | Harder - distributed transactions | Easier - synchronous calls | EDA needs correlation IDs and tracing |
| Event Schema | Must be versioned and stable | API contracts only | EDA requires schema governance |
| Infrastructure | Event broker needed (Kafka, RabbitMQ) | Simple HTTP/REST | EDA adds infrastructure complexity |
| Data Recovery | Natural - replay events | Point-in-time snapshots | EDA provides better recovery |
| Learning Curve | Steep - new patterns | Gentle - familiar REST | EDA requires paradigm shift |

When to Use / When Not to Use

When to Use Event-Driven Architecture

  • Microservices with independent scaling: When services should be decoupled and scale independently
  • High write volume with audit requirements: When you need complete audit trails of all state changes
  • Multiple read models: When different consumers need different representations of the same data (CQRS)
  • Real-time event processing: When you need immediate reaction to events (notifications, analytics, monitoring)
  • System integration: When integrating multiple systems that evolve independently
  • Temporal queries: When you need to query historical state (“what was the balance on date X?”)

When Not to Use Event-Driven Architecture

  • Simple request-response logic: When your use case is straightforward CRUD operations
  • Strong consistency requirements: When you need immediate consistency across all services
  • Small teams with limited expertise: When operational complexity exceeds team capacity
  • Low-latency synchronous requirements: When sub-millisecond response times are critical
  • Simple workflows: When workflow steps are few and do not benefit from decoupling
  • Batch-oriented processing: When your use case is bulk data processing rather than real-time events

Message Ordering and Delivery Guarantees

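Ordering and delivery guarantees both trade against throughput and availability. Strict global ordering forces every event through a single sequence, which limits parallelism, so most brokers guarantee order only within a partition or queue. Delivery guarantees are a separate choice with three levels.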

At-Most-Once Delivery

The message is delivered at most once - it may be lost but never duplicated. The consumer does not acknowledge receipt, so the broker can discard it after sending.

Producer -> Broker -> Consumer (no ack)
           [message lost if consumer offline]

Use case: Monitoring metrics where occasional loss is acceptable.

At-Least-Once Delivery

The message is guaranteed to arrive, but may be delivered multiple times. The consumer acknowledges after processing, and the broker redelivers if no ack is received.

Producer -> Broker -> Consumer (ack after process)
           [redelivered if no ack received]
           [duplicate possible]

Use case: Payment processing, where missing a transaction is worse than a duplicate.

Exactly-Once Delivery

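The message takes effect exactly once, neither lost nor duplicated. A broker cannot achieve this on its own; in practice it means at-least-once delivery combined with idempotent or transactional processing, which requires coordination between producer confirmation and consumer offset management.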

Kafka: Producer tx + Consumer offset commit
       [most expensive but safest]

Use case: Financial transactions where duplicates or losses are both unacceptable.

Ordering Guarantees

Kafka provides per-partition ordering. All events for the same partition key are processed in order. To guarantee cross-partition ordering, use a single partition or explicit sequencing.

# Ensures ordering within user_id
producer.send('events', key=b'user-123', value=event)

For systems requiring global ordering, consider using a single partition topic or implementing sequence numbers in event payloads.
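
Sequence numbers in the payload can be enforced on the consumer side along these lines. This is a sketch under assumptions: each event carries a `key` and a producer-assigned `seq` that starts at 1 per key, and the `SequencedConsumer` class is hypothetical.

```python
class SequencedConsumer:
    """Applies events per key in `seq` order, buffering any that arrive early."""
    def __init__(self):
        self.next_seq = {}  # key -> next expected sequence number
        self.pending = {}   # key -> {seq: event} held until the gap closes
        self.applied = []

    def receive(self, event):
        key, seq = event["key"], event["seq"]
        expected = self.next_seq.get(key, 1)
        if seq < expected:
            return  # Duplicate or stale: already applied
        self.pending.setdefault(key, {})[seq] = event
        # Drain everything that is now contiguous.
        while expected in self.pending[key]:
            self.applied.append(self.pending[key].pop(expected))
            expected += 1
        self.next_seq[key] = expected

consumer = SequencedConsumer()
for seq in (2, 1, 1, 3):  # Out of order, with one duplicate
    consumer.receive({"key": "user-123", "seq": seq})
applied_order = [e["seq"] for e in consumer.applied]
```

The buffer of early arrivals is unbounded in this sketch; a production version would need a limit and a timeout for gaps that never close.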

Saga Pattern

When a business transaction spans multiple services, you need a way to handle failures gracefully. Sagas manage distributed transactions through a sequence of local transactions, each with a compensating transaction for rollback.

Why Sagas Instead of Distributed Transactions

Traditional ACID transactions do not scale across services. Two-phase commit (2PC) introduces tight coupling and availability loss. Sagas provide a looser coupling model with compensating transactions.

Types of Sagas

Choreography-based Saga: Each service publishes events that trigger the next service’s local transaction. No central coordinator.

graph LR
    O[Order Service] -->|OrderCreated| P[Payment Service]
    P -->|PaymentCaptured| I[Inventory Service]
    I -->|InventoryReserved| S[Shipping Service]
    S -->|ShippingScheduled| O

Orchestration-based Saga: A central saga orchestrator manages the workflow, deciding what to do next and triggering compensating transactions on failure.

graph LR
    Orch[Order Saga Orchestrator] -->|CreateOrder| OrderSvc[Order Service]
    Orch -->|CapturePayment| PaySvc[Payment Service]
    Orch -->|ReserveInventory| InvSvc[Inventory Service]
    Orch -->|ScheduleShipping| ShipSvc[Shipping Service]

    PaySvc -.->|PaymentFailed| Orch
    Orch -.->|CancelOrder| OrderSvc
    Orch -.->|RefundPayment| PaySvc

Implementing Compensation

Each saga step must define its compensation logic:

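from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SagaStep:
    forward: Optional[Callable]       # Next step to run on success
    backward: Callable                # Compensation if a later step fails
    order_id: Optional[str] = None    # IDs are carried along for later steps
    payment_id: Optional[str] = None

class OrderSaga:
    def create_order(self, order_data):
        order = self.order_service.create(order_data)
        return SagaStep(order_id=order.id, forward=self.capture_payment, backward=self.cancel_order)

    def capture_payment(self, step):
        payment = self.payment_service.capture(step.order_id)
        return SagaStep(order_id=step.order_id, payment_id=payment.id, forward=self.reserve_inventory, backward=self.refund_payment)

    def reserve_inventory(self, step):
        raise NotImplementedError  # Later steps follow the same shape

    def cancel_order(self, step):
        self.order_service.cancel(step.order_id)

    def refund_payment(self, step):
        self.payment_service.refund(step.payment_id)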
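
What drives such steps is a runner that remembers which compensations it owes. A minimal sketch, assuming each step is an (action, compensation) pair of callables; `run_saga` and the step names are illustrative.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    On failure, undo the completed steps in reverse order and return False.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True

log = []

def reserve_inventory():
    raise RuntimeError("out of stock")  # Simulated failure in step three

succeeded = run_saga([
    (lambda: log.append("create_order"), lambda: log.append("cancel_order")),
    (lambda: log.append("capture_payment"), lambda: log.append("refund_payment")),
    (reserve_inventory, lambda: log.append("release_inventory")),
])
```

The failed step's own compensation never runs, because the step never completed. This sketch keeps its state in memory and assumes compensations succeed; a real orchestrator would persist progress and retry compensations that fail.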

Saga vs Event Sourcing

Sagas focus on workflow coordination. Event sourcing focuses on state history. They complement each other:

  • Saga without Event Sourcing: Orchestrates steps and keeps, at most, the saga's current state, with no history of how it got there
  • Saga with Event Sourcing: Each saga step emits events, creating an audit trail of workflow decisions

Event Schema Design

Events are the contract between producer and consumer. A well-designed event schema evolves gracefully and remains comprehensible across services.

Event Envelope Pattern

Every event should have a standard envelope:

{
  "event_id": "uuid-v4",
  "event_type": "OrderPlaced",
  "event_version": "1.0",
  "timestamp": "2026-03-22T10:30:00Z",
  "correlation_id": "uuid-for-tracing",
  "causation_id": "uuid-of-triggering-event",
  "payload": {
    "order_id": "ORD-12345",
    "customer_id": "CUST-67890",
    "total_amount": 99.99,
    "currency": "USD"
  }
}

This envelope provides observability, traceability, and versioning at the schema level.
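
One way to keep the envelope consistent is to build it in a single place. The `Envelope` dataclass and its `caused` helper below are hypothetical; the field names follow the JSON above.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

def _utc_now():
    return datetime.now(timezone.utc).isoformat()

@dataclass(frozen=True)
class Envelope:
    event_type: str
    payload: dict
    correlation_id: str
    causation_id: Optional[str] = None
    event_version: str = "1.0"
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=_utc_now)

    def caused(self, event_type, payload):
        """Build a follow-up event: same correlation_id, causation_id = this event."""
        return Envelope(event_type, payload, self.correlation_id,
                        causation_id=self.event_id)

placed = Envelope("OrderPlaced", {"order_id": "ORD-12345"},
                  correlation_id=str(uuid.uuid4()))
captured = placed.caused("PaymentCaptured", {"order_id": "ORD-12345"})
wire = json.dumps(asdict(captured))  # What would go onto the broker
```

The correlation ID stays constant across the whole transaction, while the causation ID changes at every hop and points at the immediate trigger.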

Schema Registry

Centralize event schema definitions using a schema registry:

  • Confluent Schema Registry: Stores Avro or JSON Schema for Kafka events
  • AWS Glue Schema Registry: AWS-managed schema registry for Kafka and Kinesis
  • Custom registry: Simple API with versioning, validation, and compatibility checks

Schema registry benefits:

  • Backward compatibility: New schema versions can read old events
  • Forward compatibility: Old schema versions can read new events (ignoring new fields)
  • Schema evolution rules: Define which changes are allowed (add fields, remove fields, etc.)
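
Both directions of compatibility can be illustrated with a tolerant reader, independent of any particular registry. The schema format here (a mapping from field name to default) and the `read_event` function are assumptions made for the sketch.

```python
REQUIRED = object()  # Sentinel: the field has no default

def read_event(raw, fields):
    """Keep known fields, fill in defaults, and ignore everything else."""
    out = {}
    for name, default in fields.items():
        if name in raw:
            out[name] = raw[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {name}")
        else:
            out[name] = default
    return out

ORDER_V1 = {"order_id": REQUIRED, "total_cents": REQUIRED}
ORDER_V2 = {**ORDER_V1, "currency": "USD"}  # v2 adds a field with a default

old_event = {"order_id": "ORD-1", "total_cents": 1000}
new_event = {"order_id": "ORD-2", "total_cents": 2500, "currency": "EUR"}

backward = read_event(old_event, ORDER_V2)  # New reader, old data
forward = read_event(new_event, ORDER_V1)   # Old reader, new data
```

Adding a field with a default keeps both directions working. Adding a required field, or removing one that old readers require, breaks one of them, and that is the kind of change a registry's compatibility rules reject.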

Event Design Principles

  1. Use flat structures: Nested objects complicate schema evolution
  2. Include timestamps: Essential for ordering and debugging
  3. Add context fields: correlation_id, causation_id enable distributed tracing
  4. Version explicitly: Include version number to handle schema evolution
  5. Use primitive types: Avoid complex types in early versions

Event Processing Patterns

Beyond simple publish-subscribe, event processing involves patterns for handling high-volume, low-latency, and fault-tolerant scenarios.

Event Streaming: Continuous flow of events processed in real-time. Kafka excels at this.

Producer1 -> Topic -> Consumer1
            Topic -> Consumer2
            Topic -> Consumer3

Event Carousel: Repeated processing of the same event batch until consumed. Useful for batch-oriented workflows.

Filtering and Transformation

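Filter events in the stream processor, before they reach downstream consumers, to reduce network traffic:

// Kafka Streams (Java DSL): drop low-value events
// Assumes stream is a KStream<String, Order>; Order and getAmount() are illustrative
KStream<String, Order> highValue = stream.filter((key, event) -> event.getAmount() > 100);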

Transform events before consumption:

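// Kafka Streams (Java DSL): enrich each event before storing it
// EnrichedOrder and lookupRegion() are illustrative helpers, not library APIs
KStream<String, EnrichedOrder> enriched = stream.mapValues(event ->
    new EnrichedOrder(event, Instant.now(), lookupRegion(event.getCustomerId())));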

State Stores

Maintain stateful event processing using state stores:

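// Kafka Streams (Java DSL): materialized view of orders per user
// orderListSerde is an assumed custom Serde<List<Order>>
KTable<String, List<Order>> ordersByUser = stream
    .groupByKey()
    .aggregate(
        ArrayList::new,                // Initializer: empty list per key
        (key, event, existing) -> {    // Aggregator: must return the updated value
            existing.add(event);
            return existing;
        },
        Materialized.with(Serdes.String(), orderListSerde));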

State stores enable:

  • joins: Combining events from multiple topics
  • aggregations: Running totals and windowed computations
  • tables: Latest value per key

Outbox Pattern

The outbox pattern solves a fundamental reliability problem in event-driven systems: the dual-write issue. When your application needs to update a database and publish an event, doing both non-atomically creates inconsistency risk. If the database commits but the event publish fails, your system enters an invalid state.

Without the outbox pattern:

# Problem: These two operations are not atomic
def handle_order(order_data):
    order = db.create_order(order_data)  # Commits
    event_bus.publish(OrderCreated(order))  # Could fail after commit

With the outbox pattern:

def handle_order(order_data):
    # One local transaction: the order row and the outbox row commit together or not at all
    with db.transaction():
        order = db.create_order(order_data)
        db.save_outbox_event(OutboxEvent(type='OrderCreated', payload=order))

# Separate process polls outbox and publishes
def outbox_relay():
    events = db.get_pending_outbox_events()
    for event in events:
        event_bus.publish(event)
        # A crash here leaves the event pending, so the next poll publishes it again
        db.mark_outbox_processed(event.id)

The outbox table lives in the same database as your business data. A dedicated relay process polls pending events and publishes them to the broker. This gives at-least-once delivery: an event stays pending until the relay has published it and marked it processed, so a crash between those two steps results in a duplicate publish rather than a lost event.

Production consideration: Use a single-threaded relay or partition by aggregate ID to prevent out-of-order publishing when multiple instances run.
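
For a version that actually runs, here is the same idea on SQLite from the Python standard library. The table layout, function signatures, and the `publish` callback are assumptions for the sketch; a production relay would talk to a real broker and usually a different database.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        type TEXT, payload TEXT, processed INTEGER DEFAULT 0
    );
""")

def handle_order(order_id, total_cents):
    # `with db` wraps both inserts in one transaction: commit both or neither.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        db.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("OrderCreated",
             json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )

def outbox_relay(publish):
    rows = db.execute(
        "SELECT id, type, payload FROM outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # If this raises, the row stays pending
        with db:
            db.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (row_id,))

sent = []
handle_order("ORD-1", 9999)
outbox_relay(lambda event_type, payload: sent.append((event_type, payload)))
outbox_relay(lambda event_type, payload: sent.append((event_type, payload)))  # Nothing pending
```

Polling is the simplest relay; reading the database's change log instead is a common alternative when polling latency or load becomes a concern.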

Event Notification vs Event-Carried State Transfer

Two competing philosophies govern how much data to include in an event:

Event Notification: The event contains only enough information to identify what happened. Consumers must query back to the producer for full details.

{ "event_type": "OrderPlaced", "order_id": "ORD-123", "timestamp": "..." }

Consumers call GET /orders/ORD-123 to get the complete order details.

Event-Carried State Transfer: The event contains all data consumers need without callback queries.

{ "event_type": "OrderPlaced", "order_id": "ORD-123", "customer_id": "...", "items": [...], "total": 99.99 }

Trade-offs:

| Aspect | Event Notification | Event-Carried State Transfer |
| --- | --- | --- |
| Consumer complexity | Higher (needs callback) | Lower (self-contained) |
| Producer load | Higher (serves callbacks) | Lower (no callbacks) |
| Network calls | Additional per event | None |
| Data size | Small | Larger |
| Consumer independence | Lower (coupled to producer API) | Higher |
| Schema evolution | Easier (consumers re-fetch) | Harder (all consumers must update) |

Event notification works well when producers need to protect their data model or when consumers frequently need related data. Event-carried state transfer works well when consumers are truly independent and event sizes remain manageable.

Idempotency in Event Processing

At-least-once delivery means consumers must handle redelivered events gracefully. Idempotency ensures processing an event multiple times produces the same result as processing it once.

Deduplication by event ID:

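import redis

r = redis.Redis()  # Connection settings are deployment-specific
PROCESSED_KEY = 'processed_events'

def handle_event(event):
    # Check-then-add is not atomic; concurrent consumers of the same event need a lock or SET NX
    if r.sismember(PROCESSED_KEY, event.event_id):
        return  # Already handled
    process_event(event)
    # Record the ID only after processing succeeds
    r.sadd(PROCESSED_KEY, event.event_id)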

Natural idempotency: Some operations are naturally idempotent:

# Idempotent: Setting a value to the same thing twice
account.update_balance(new_balance=100)

# Non-idempotent: Incrementing
account.increment_balance(amount=10)  # Called twice = +20 instead of +10

Design event handlers to use upserts instead of increments where possible. When increments are necessary, track the event ID to detect duplicates.
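
When an increment is unavoidable, tracking event IDs next to the state makes it safe to receive twice. A minimal in-memory sketch; the `Account` class and the event fields are illustrative.

```python
class Account:
    """Applies each deposit at most once, keyed by event ID."""
    def __init__(self):
        self.balance = 0
        self.applied_event_ids = set()

    def apply_deposit(self, event):
        if event["event_id"] in self.applied_event_ids:
            return False  # Duplicate delivery: balance unchanged
        # In a real store, these two writes would share one transaction.
        self.balance += event["amount"]
        self.applied_event_ids.add(event["event_id"])
        return True

account = Account()
deposit = {"event_id": "evt-1", "amount": 10}
first = account.apply_deposit(deposit)
second = account.apply_deposit(deposit)  # Redelivery of the same event
```

Keeping the ID set in the same store as the balance is what makes this reliable: if they lived in different systems, a crash between the two writes would reintroduce the dual-write problem.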

Consumer-side acknowledgment:

def process_with_ack(event):
    try:
        result = handle_event(event)
        acknowledge(event.offset)  # Commit offset only after success
        return result
    except Exception:  # A bare except would also swallow KeyboardInterrupt and SystemExit
        negative_acknowledge(event.offset)  # Trigger redelivery
        raise

Backpressure Handling

Backpressure occurs when producers publish events faster than consumers can process them. Left unhandled, this leads to queue accumulation, memory exhaustion, and eventually system failure.

Monitoring consumer lag:

# Kafka: track consumer lag per partition
lag = end_offset - current_offset
if lag > threshold:
    alert("Consumer falling behind")

Strategies for handling backpressure:

  1. Horizontal scaling: Add consumer instances to parallelize processing
  2. Throttling: Implement rate limiting on the producer side
  3. Buffering with limits: Cap buffer size and route overflow to dead letter queues
  4. Adaptive processing: Slow down when queue depth exceeds thresholds
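
Buffering with limits can be sketched with the standard library's bounded queue. The buffer size and the list standing in for a dead letter queue are arbitrary choices for the example.

```python
import queue

buffer = queue.Queue(maxsize=3)  # Hard cap on buffered events
dead_letters = []                # Stand-in for a dead letter queue

def enqueue(event):
    """Accept the event if there is room; otherwise divert it instead of blocking."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        dead_letters.append(event)
        return False

accepted = [enqueue({"id": i}) for i in range(5)]
```

The producer learns immediately whether an event was accepted, so it can slow down, and memory use stays bounded no matter how far the consumer falls behind.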

Dead letter queue pattern:

def process_with_dlq(event):
    try:
        handle_event(event)
    except PermanentFailure:
        dlq.publish(event)  # No point retrying
    except TransientFailure:
        raise  # Leave unacknowledged so the broker redelivers
    acknowledge(event.offset)  # Reached on success or after routing to the DLQ

Design systems to gracefully degrade under load rather than cascade failures. When in doubt, prefer dropping events with alerting over unbounded queue growth.

Production Failure Scenarios

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Event broker failure | Events not published or consumed | Use broker clustering with replication; implement outbox pattern |
| Consumer crash mid-processing | Events may be lost or reprocessed | Use acknowledgment after successful processing; idempotent consumers |
| Out-of-order event delivery | State becomes inconsistent | Use sequence numbers or timestamps; implement conflict resolution |
| Event schema version mismatch | Consumers cannot deserialize events | Implement schema versioning; use upcasters or schema registry |
| Poison event blocking consumer | Consumer stops processing | Configure dead letter queues; implement retry limits |
| Cascading failures | One service failure triggers failures in others | Implement circuit breakers; design for partial availability |
| Event replay overload | New consumer overwhelms system with replays | Throttle replay rate; process in batches with backpressure |
| Correlation ID loss | Cannot trace transactions across services | Propagate correlation IDs through all events; log correlation |

Common Pitfalls / Anti-Patterns

Pitfall 1: Using Events for Queries

Events are for notifications, not queries. If you find yourself requesting data through events and waiting for responses, use a direct API call instead.

Pitfall 2: Mixing Commands and Events

Commands expect one handler; events expect zero or more handlers. Sending a command as an event creates confusion about who should handle it and whether responses are expected.
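
The difference in handler cardinality can be made explicit in code. In this sketch, `CommandBus` and `EventBus` are hypothetical in-memory classes: the first insists on exactly one handler and returns its result, the second accepts any number of subscribers and returns nothing.

```python
class CommandBus:
    """Exactly one handler per command type; the sender may use the result."""
    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        if command_type in self._handlers:
            raise ValueError(f"{command_type} already has a handler")
        self._handlers[command_type] = handler

    def send(self, command_type, data):
        if command_type not in self._handlers:
            raise LookupError(f"no handler for {command_type}")
        return self._handlers[command_type](data)

class EventBus:
    """Zero or more subscribers; the publisher gets nothing back."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, handler):
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, data):
        for handler in self._subscribers.get(event_type, []):
            handler(data)

commands, events = CommandBus(), EventBus()
commands.register("CreateOrder", lambda data: {"order_id": "ORD-1", **data})

emails, analytics = [], []
events.subscribe("OrderPlaced", emails.append)
events.subscribe("OrderPlaced", analytics.append)

order = commands.send("CreateOrder", {"sku": "A1"})
events.publish("OrderPlaced", order)
events.publish("OrderCancelled", order)  # Nobody listens, and that is fine
```

A command with no handler is an error, while an event with no subscribers is normal. Routing one through the other's channel loses exactly that distinction.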

Pitfall 3: Not Planning for Event Schema Evolution

Events are immutable once published. If you change the schema, old events must still deserialize correctly. Plan versioning from the start using upcasters or versioned event types.

Pitfall 4: Over-Engineering with Choreography

Choreography is simple until it is not. Complex workflows with compensation logic are easier to manage with orchestration. Do not default to choreography for everything.

Pitfall 5: Ignoring Eventual Consistency

Read models lag behind writes in EDA. If your business logic assumes immediate consistency, you will have bugs. Design UIs and expectations around eventual consistency.

Pitfall 6: Creating Too Many Event Types

If every database write creates a unique event type, you have an event catalog nightmare. Use a smaller set of event types with sufficient payload to handle multiple use cases.

Interview Questions

1. What is the fundamental difference between an event and a command in event-driven architecture?

Expected answer points:

  • Commands are directed requests for a specific action ("do this"), expecting one handler
  • Events are broadcast facts that something happened ("this happened"), with zero or more potential listeners
  • Commands use imperative naming (verb-noun like CreateOrder), while events use past tense naming (noun-verb like OrderPlaced)

2. Explain event sourcing and describe how you would rebuild current state from events.

Expected answer points:

  • Event sourcing stores the complete history of state changes as a sequence of events rather than persisting current state
  • To rebuild state, you replay all events from the beginning (or from a snapshot) in order
  • Each event represents a state mutation; applying them sequentially reconstructs the aggregate
  • Snapshots can optimize by periodically capturing state to reduce replay depth

3. What is CQRS and when would you choose to use it?

Expected answer points:

  • CQRS (Command Query Responsibility Segregation) separates read and write models into distinct paths
  • Write side: Commands go through a command handler to an event store
  • Read side: Queries hit materialized views (projections) built from events
  • Best suited when read/write workloads differ significantly, you need multiple optimized read representations, or you are already using event sourcing
  • Not suitable for simple CRUD domains where the overhead outweighs benefits

4. Describe the choreography vs orchestration patterns for distributed transactions.

Expected answer points:

  • Choreography: Services react to events and emit their own events without a central coordinator; the overall workflow emerges from the event chain
  • Orchestration: A central orchestrator coordinates the entire transaction, deciding next steps based on responses
  • Choreography benefits: True decoupling, no single point of failure, easy to add consumers
  • Orchestration benefits: Centralized behavior, explicit transaction boundaries, easier compensation logic
  • Hybrid approach is common in practice: orchestration for core business workflows, choreography for peripheral side effects

5. What are the three types of delivery guarantees in messaging systems?

Expected answer points:

  • At-most-once: Message may be lost but never duplicated; no consumer acknowledgment
  • At-least-once: Message is guaranteed to arrive but may be delivered multiple times; consumer acknowledges after processing
  • Exactly-once: Message arrives exactly once, neither lost nor duplicated; requires coordination between producer and consumer offset management

6. How does the Saga pattern handle distributed transaction failures?

Expected answer points:

  • Saga manages distributed transactions through a sequence of local transactions, each with a compensating transaction for rollback
  • Each saga step performs a forward action and defines a backward compensation action
  • If any step fails, all previously completed steps execute their compensation transactions in reverse order
  • Choreography-based saga: services publish events triggering next steps
  • Orchestration-based saga: a central orchestrator manages workflow and compensation

7. What is a correlation ID and why is it important in event-driven systems?

Expected answer points:

  • A correlation ID is a unique identifier attached to all events resulting from the same originating action
  • It enables tracing a complete distributed transaction across service boundaries and logs
  • Passed through the entire call chain: originating action generates the ID, all subsequent events inherit it
  • Essential for debugging distributed systems where a single business transaction spans multiple services

8. How do you handle event schema evolution when events are immutable?

Expected answer points:

  • Upcasting: A transformer converts old event versions to new versions before processing
  • Event versioning: Include version number in event type (UserCreatedV2) with transformation logic
  • Snapshots: Periodically capture state to limit replay depth for long-lived entities
  • Schema registry: Centralize schema definitions with backward/forward compatibility rules
  • Key principle: Old events must still deserialize correctly as your schema changes

9. What are the trade-offs between eventual consistency and strong consistency in EDA?

Expected answer points:

  • Eventual consistency: Read models lag behind writes; writes return quickly but reads may see stale data
  • Strong consistency: All services immediately reflect the same state; higher latency but no stale reads
  • EDA typically embraces eventual consistency for better availability and scalability
  • Design implications: UIs and business logic must account for lag between write and read model update
  • Strong consistency required only when absolute accuracy is critical (e.g., financial transactions with regulatory requirements)

10. Describe the key components of a well-designed event envelope schema.

Expected answer points:

  • event_id: Unique identifier (UUID v4) for deduplication and tracing
  • event_type: The event name (e.g., OrderPlaced) for routing and handling
  • event_version: Schema version number for compatibility handling
  • timestamp: When the event occurred for ordering and debugging
  • correlation_id: Links related events across the distributed transaction
  • causation_id: References the event that triggered this one
  • payload: The actual event data, kept flat for easier schema evolution

11. What is the outbox pattern and why is it important for reliable event publishing?

Expected answer points:

  • The outbox pattern solves the dual-write problem where failing to write to the database and publish an event atomically leads to inconsistencies
  • Instead of publishing events directly, you write events to an outbox table in the same transaction as your business data
  • A separate process (outbox relay) polls the outbox table and publishes events to the broker
  • This ensures at-least-once delivery because the event is persisted before acknowledgment
  • Prevents scenarios where the database commits but the event publish fails

12. Explain the difference between event notification and event-carried state transfer patterns.

Expected answer points:

  • Event notification: The event contains minimal data (just enough to identify what happened); consumers must call back to the producer for full details
  • Event-carried state transfer: The event contains the complete state needed by consumers, eliminating need for callback queries
  • Event notification reduces producer complexity but increases consumer complexity and load on producer
  • Event-carried state transfer simplifies consumers but creates larger events and potential data consistency issues
  • Choose based on consumer independence needs vs network and storage trade-offs

13. How do you ensure idempotency when processing events in a distributed system?

Expected answer points:

  • Idempotency ensures that processing the same event multiple times produces the same result as processing it once
  • Use event_id deduplication: Store processed event IDs and check before processing
  • Design handlers as naturally idempotent: Use upserts instead of inserts, check-before-update patterns
  • Implement idempotency keys at the consumer level with acknowledgment based on successful processing
  • For at-least-once delivery systems, idempotency is critical to handle redelivery without data corruption

14. What are the key differences between Apache Kafka, RabbitMQ, and Amazon SNS/SQS for event-driven architecture?

Expected answer points:

  • Kafka: Durable log-based messaging with per-partition ordering, replay capability, and consumer group semantics; best for high-throughput event streaming
  • RabbitMQ: Traditional message broker with exchanges, queues, and bindings; supports complex routing but no native replay
  • SNS/SQS: AWS managed services with push-based pub/sub (SNS) and pull-based queues (SQS); simplifies operations but creates AWS vendor dependency
  • Kafka excels at event sourcing and audit trails; RabbitMQ better for request-response integration; SNS/SQS for cloud-native AWS workflows

15. How does the event curtain concept help in event schema governance?

Expected answer points:

  • The event curtain is an interface that hides the internal event structure from consumers
  • Producers publish events through the curtain; consumers receive events through the curtain
  • This allows internal schema changes without breaking consumers who have already processed events
  • The curtain acts as a version adapter, translating between producer and consumer versions
  • Critical for maintaining backward compatibility in long-lived event-driven systems

16. What is backpressure handling and why is it important in event processing systems?

Expected answer points:

  • Backpressure occurs when consumers cannot process events as fast as producers publish them
  • Without handling, this leads to message accumulation, memory exhaustion, or dropped messages
  • Strategies: Consumer scaling (add more instances), throttling (slow producer), buffering with limits, dead letter queues for overflow
  • In Kafka, consumer lag (the gap between a partition's latest offset and the consumer group's committed offset) is a common backpressure indicator
  • Design systems to gracefully degrade under load rather than cascade failures
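Two of the ideas above, buffering with limits and lag as an indicator, can be illustrated with the standard library alone. This is a toy sketch under assumptions: the buffer size, the `publish` and `consumer_lag` helpers, and the overflow list standing in for a dead letter queue are hypothetical, and no real broker is involved.

```python
import queue

# Bounded buffer: once it is full, the producer gets immediate feedback
# instead of the process growing memory without limit.
buffer = queue.Queue(maxsize=3)
dead_letters = []  # stand-in for a dead letter queue


def publish(event) -> bool:
    """Try to enqueue; route overflow to dead_letters and report it."""
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        dead_letters.append(event)
        return False


def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Events published but not yet processed, never negative."""
    return max(log_end_offset - committed_offset, 0)


accepted = [publish(f"evt-{i}") for i in range(5)]
lag = consumer_lag(log_end_offset=120, committed_offset=100)
```

A producer that sees `False` can slow down or retry later, which is the throttling strategy from the list; a rising lag value is the signal to add consumer instances.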
17. Describe the event sourcing projection rebuild process and how you would handle long projection times.

Expected answer points:

  • Projection rebuild replays all events from the event store to rebuild a read model from scratch
  • For large event stores, this can take hours or days, during which the affected read model is stale or unavailable
  • Snapshotting mitigates this by periodically persisting the projected state, reducing replay to events since the last snapshot
  • Blue-green projection builds a new projection alongside the existing one, switching when complete
  • Eventual consistency means read models lag during rebuild; design UIs to handle temporary staleness
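The snapshotting point above is easier to see with a small example. The sketch below assumes events carry a monotonically increasing `sequence` number and that the projection is a simple per-account balance; `Snapshot`, `apply_event`, and `rebuild` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    last_sequence: int  # sequence number of the last event folded in
    state: dict


def apply_event(state: dict, event: dict) -> dict:
    """Pure projection step: return a new state with the event applied."""
    new_state = dict(state)
    account = event["account_id"]
    new_state[account] = new_state.get(account, 0) + event["amount"]
    return new_state


def rebuild(events: List[dict], snapshot: Optional[Snapshot] = None) -> Snapshot:
    """Replay events, skipping those already covered by the snapshot."""
    state = dict(snapshot.state) if snapshot else {}
    start = snapshot.last_sequence if snapshot else 0
    last = start
    for event in events:
        if event["sequence"] <= start:
            continue
        state = apply_event(state, event)
        last = event["sequence"]
    return Snapshot(last_sequence=last, state=state)


events = [
    {"sequence": 1, "account_id": "a", "amount": 50},
    {"sequence": 2, "account_id": "b", "amount": 20},
    {"sequence": 3, "account_id": "a", "amount": -10},
    {"sequence": 4, "account_id": "b", "amount": 5},
]
full = rebuild(events)                      # replay from scratch
checkpoint = rebuild(events[:2])            # snapshot taken after event 2
incremental = rebuild(events, checkpoint)   # replays only events 3 and 4
```

Both paths reach the same state; the snapshot only changes how many events must be replayed to get there.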
18. What testing strategies are appropriate for event-driven microservices?

Expected answer points:

  • Unit testing: Test individual aggregates and event handlers in isolation with mocked dependencies
  • Contract testing: Verify event schemas between producer and consumer services
  • Integration testing: Test event flow through the broker with test containers or embedded brokers
  • Circuit breaker testing: Verify system behavior when downstream services fail
  • Chaos testing: Inject failures like broker unavailability, message duplication, and out-of-order delivery
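As a rough illustration of the contract-testing point above, a consumer can state the fields and types it relies on and check sample producer events against them. The field names and the `violations` helper are hypothetical; real projects often use a schema registry or a dedicated contract-testing tool instead of a hand-written check like this one.

```python
# Assumption: what a hypothetical consumer of UserCreated relies on.
REQUIRED_FIELDS = {"event_id": str, "user_id": int, "email": str}


def violations(event: dict) -> list:
    """Return human-readable contract violations; empty means compatible."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems


# Extra fields are tolerated, so producers can add data without breaking us.
good = {"event_id": "e1", "user_id": 1, "email": "a@example.com", "plan": "pro"}
bad = {"event_id": "e2", "user_id": "1"}

good_result = violations(good)
bad_result = violations(bad)
```

Running a check like this in the producer's build catches a breaking schema change before it reaches the broker.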
19. How does the saga pattern differ from traditional distributed transactions like two-phase commit?

Expected answer points:

  • Two-phase commit (2PC) provides atomic commitment across services but introduces tight coupling and blocks resources during the prepare phase
  • Sagas use sequential local transactions with compensating transactions for rollback, avoiding blocking
  • In 2PC, participants that have voted to commit must hold their locks until the coordinator decides, so a coordinator failure can stall them; sagas remain available with proper compensation design
  • Sagas trade atomicity for availability and scalability, accepting eventual consistency
  • Choreography-based sagas scale better; orchestration-based sagas provide visibility but introduce centralization
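The compensation mechanism described above can be sketched as a list of (action, compensation) pairs run in order. This is a simplified, in-process illustration of an orchestration-style saga; `run_saga`, the step names, and the simulated payment failure are hypothetical, and a real saga would persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run each local transaction; on failure, undo completed ones in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []


def fail_payment():
    raise RuntimeError("card declined")  # simulated downstream failure


steps = [
    (lambda: log.append("inventory reserved"),
     lambda: log.append("inventory released")),
    (fail_payment,
     lambda: log.append("payment refunded")),
    (lambda: log.append("order shipped"),
     lambda: log.append("shipment cancelled")),
]
ok = run_saga(steps)
```

Only the steps that actually completed are compensated: the failed payment is never refunded, and the shipping step never runs.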
20. What are the main considerations for choosing between event-driven and request-driven architecture for a given domain?

Expected answer points:

  • Use event-driven when: services are truly independent, you need audit trails, multiple read models, or genuine decoupling in time and space
  • Use request-driven when: you need strong consistency, low-latency synchronous responses, or simple CRUD operations
  • A hybrid approach is common: core business workflows use request-response, while peripheral side effects use events
  • Consider team expertise: EDA has higher operational complexity and learning curve
  • Start with simpler architectures and evolve to event-driven only when benefits are clear and justified

Further Reading

  • Apache Kafka — a common backbone for event-driven systems; its durable log, replay capability, and consumer group model map well to EDA patterns
  • Pub/Sub Patterns — pub/sub patterns that complement event-driven architecture
  • Message Queue Types — messaging infrastructure that supports event-driven systems

Conclusion

Events vs Commands

Understanding the distinction between events and commands is foundational to event-driven architecture.

Commands are directed requests for a specific action, expecting one handler:

  • Imperative naming: verb-noun (CreateUser, SubmitOrder, ProcessPayment)
  • The sender expects a single response
  • Like a phone call — one person answers

Events are broadcast facts that something happened, with zero or more potential listeners:

  • Past tense naming: noun-verb (UserCreated, OrderSubmitted, PaymentProcessed)
  • The sender does not know who is listening
  • Like a radio broadcast — anyone tuned in receives it

Using the wrong naming convention creates confusion. If you name something CreateUser but treat it as an event (broadcasting to multiple consumers), other developers will expect a single handler response that never comes.

For example:

  • Command: CreateUser → expects one handler to process and respond
  • Event: UserCreated → broadcasts to all interested consumers

This distinction shapes how you design message contracts throughout an EDA system.
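The difference in delivery semantics can be made concrete with two tiny in-process buses. This is a sketch under assumptions: `CommandBus` and `EventBus` are hypothetical classes written for this example and do not correspond to any particular library.

```python
from collections import defaultdict


class CommandBus:
    """Routes each command to exactly one handler and returns its result."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        if name in self._handlers:
            raise ValueError(f"{name} already has a handler")
        self._handlers[name] = handler

    def send(self, name, payload):
        return self._handlers[name](payload)


class EventBus:
    """Broadcasts each event to zero or more subscribers; returns nothing."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, name, handler):
        self._subscribers[name].append(handler)

    def publish(self, name, payload):
        for handler in list(self._subscribers.get(name, [])):
            handler(payload)


events = EventBus()
commands = CommandBus()
received = []

events.subscribe("UserCreated", lambda e: received.append(("email", e["user_id"])))
events.subscribe("UserCreated", lambda e: received.append(("analytics", e["user_id"])))


def create_user(cmd):
    user = {"user_id": 1, "name": cmd["name"]}
    events.publish("UserCreated", user)  # fact: it already happened
    return user                          # the command's single response


commands.register("CreateUser", create_user)
result = commands.send("CreateUser", {"name": "Alice"})
events.publish("UserDeleted", {"user_id": 1})  # no listeners, still valid
```

The command bus rejects a second handler for `CreateUser`, while the event bus accepts any number of subscribers, including none.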

Key Points

  • Events are facts that something happened; commands are requests for specific actions
  • Event sourcing stores state changes as events for replay and audit
  • CQRS separates read and write models for independent optimization
  • Correlation IDs enable distributed tracing across service boundaries
  • Choreography lets services react to events without central coordination
  • Orchestration uses a central coordinator for complex workflows
  • Event schema evolution requires backward-compatible versioning strategies
  • Design for eventual consistency; read models lag behind writes

Pre-Deployment Checklist

- [ ] Event schema versioning strategy defined (upcasters or versioned types)
- [ ] Event schema registry deployed for validation
- [ ] Correlation ID propagation implemented across all services
- [ ] Dead letter queue configured for failed event processing
- [ ] Idempotent event consumers implemented
- [ ] Circuit breakers configured for downstream service calls
- [ ] Event broker clustering and replication configured
- [ ] Monitoring for event consumption lag configured
- [ ] Alert thresholds set for dead letter queue depth
- [ ] Event retention policy defined (for event sourcing replay)
- [ ] Read model rebuild procedure documented
- [ ] Schema evolution testing implemented (old events deserialize correctly)
- [ ] Security controls (authentication, authorization, encryption) configured
- [ ] Workflow compensation logic tested (for orchestration)
