Event-Driven Architecture: Events, Commands, and Patterns
Learn event-driven architecture fundamentals: event sourcing, CQRS, event correlation, choreography vs orchestration, and implementation patterns.
Event-Driven Architecture: Events, Commands, and Patterns
Introduction
In request-response, Service A calls Service B directly and waits. In event-driven architecture, Service A publishes an event and moves on. Service B (and C, D, and whoever else is listening) picks it up and reacts. The sender doesn’t know or care who’s listening.
This decoupling in time and space is what makes EDA powerful. Service A doesn’t wait for Service B to finish processing. Service B can be down, busy, or not running at all — the event sits in a queue until Service B is ready. That resilience and flexibility comes with complexity you need to understand before adopting it.
This post covers the core concepts of EDA: events vs commands, event sourcing, CQRS, correlation, and the choreography vs orchestration debate.
Topic-Specific Deep Dives
Event Sourcing
Event sourcing is a pattern where you store the full history of state changes as events, rather than just the current state. The current state is derived by replaying events.
graph LR
E1[UserRegistered] --> Store[(Event Store)]
E2[EmailChanged] --> Store
E3[NameUpdated] --> Store
Store -->|replay| State[Current State]
Instead of:
UPDATE users SET name = 'Alice' WHERE id = 1;
You append:
UserNameUpdated { user_id: 1, old_name: 'Bob', new_name: 'Alice', timestamp: ... }
Benefits of Event Sourcing
- Complete audit trail: Every state change is stored, immutable and auditable
- Temporal queries: Ask “what was the customer’s address on January 15th?”
- Replay: If your read model is wrong, rebuild it from events
- Decoupled write and read: Events are the source of truth, projections are derived
Challenges of Event Sourcing
- Event schema evolution: Old events must still deserialize correctly as your schema changes
- Projections can be slow: Replaying millions of events takes time
- Eventual consistency: Read models lag behind writes as projections catch up
- Complexity: More moving parts than simple CRUD
Handling Schema Evolution
Events are immutable, but your schema will change. When UserCreated gets a new field two years from now, old events still need to deserialize correctly. Several approaches handle this: upcasters (transformers that convert old versions to new), version numbers in event type names like UserCreatedV2, and snapshots that cap replay depth.
CQRS: Command Query Responsibility Segregation
CQRS separates read and write models. Writes go through one model (optimized for commands). Reads go through a different model (optimized for queries).
Writes: API -> Command Handler -> Event Store
Reads: Query -> Read Model (materialized view)
The event store is the write side. Read models are projections built from events.
graph LR
Command[Command] --> CommandHandler[Command Handler]
CommandHandler --> EventStore[(Event Store)]
EventStore -->|project| ReadModel1[Read Model: Orders by User]
EventStore -->|project| ReadModel2[Read Model: Orders by Date]
EventStore -->|project| ReadModel3[Read Model: Revenue Dashboard]
When to Use CQRS
CQRS shines when read and write workloads differ significantly, you need multiple optimized read representations, teams can work independently on each side, or you’re already using event sourcing. If your domain is simple CRUD with basic queries, CQRS adds complexity you don’t need.
Event Correlation
In distributed systems, a single business transaction spans multiple services. Tracking such transactions requires correlation IDs.
A correlation ID is a unique identifier attached to all events resulting from the same originating action:
UserSignsUp (correlation_id: abc-123)
-> AccountCreated (correlation_id: abc-123)
-> WelcomeEmailSent (correlation_id: abc-123)
-> AnalyticsEvent (correlation_id: abc-123)
With correlation IDs, you can trace a complete transaction across services and logs.
Implementation
Pass correlation ID through the entire call chain:
def handle_signup(user_data):
correlation_id = str(uuid.uuid4())
event = {
'type': 'UserSignedUp',
'correlation_id': correlation_id,
'user_id': user_data['id'],
'email': user_data['email']
}
event_bus.publish(event)
Each service logs its correlation ID, enabling distributed tracing.
Choreography vs Orchestration
When a business transaction spans multiple services, who coordinates it? Two approaches:
Choreography
Services react to events and emit their own events. No central coordinator.
OrderService: OrderPlaced -> InventoryService: ReserveInventory -> PaymentService: ChargePayment
-> ShippingService: ScheduleShipping
Each service knows only its own part. The overall flow emerges from the event chain.
Pros:
- Services are truly decoupled
- No single point of failure
- Easy to add new consumers
Cons:
- Behavior is scattered across services
- Hard to see the overall transaction
- Difficult to implement transactions that must succeed or fail together
Orchestration
A central orchestrator coordinates the transaction:
graph LR
Orch[Order Orchestrator] -->|Reserve| Inventory[Inventory Service]
Orch -->|Charge| Payment[Payment Service]
Orch -->|Schedule| Shipping[Shipping Service]
The orchestrator knows the complete workflow. It decides what to do next based on responses.
Pros:
- Behavior is centralized and visible
- Easier to implement complex workflows with branches and compensation
- Transaction boundaries are explicit
Cons:
- The orchestrator becomes a central point of coupling
- Risk of “smart middleware” anti-pattern (business logic in the orchestrator)
- Single point of failure (mitigated by workflow engines with persistence)
Which to Use
Choreography works well for simple, linear workflows where services are truly independent. Orchestration works better for complex workflows with complex compensation logic.
In practice, many systems use a hybrid: orchestration for the core business workflow, choreography for peripheral side effects.
Trade-off Analysis
| Factor | EDA | Request-Response | Notes |
|---|---|---|---|
| Coupling | Loose - producers don’t know consumers | Tight - client knows service | EDA enables independent evolution |
| Scalability | Horizontal - event broker handles load | Vertical - service handles load | EDA scales better for high write volume |
| Consistency | Eventual (eventual consistency) | Strong (immediate) | EDA requires handling lag |
| Latency | Low for publish, variable for consume | Predictable | EDA has higher average latency |
| Debugging | Harder - distributed transactions | Easier - synchronous calls | EDA needs correlation IDs and tracing |
| Event Schema | Must be versioned and stable | API contracts only | EDA requires schema governance |
| Infrastructure | Event broker needed (Kafka, RabbitMQ) | Simple HTTP/REST | EDA adds infrastructure complexity |
| Data Recovery | Natural - replay events | Point-in-time snapshots | EDA provides better recovery |
| Learning Curve | Steep - new patterns | Gentle - familiar REST | EDA requires paradigm shift |
When to Use / When Not to Use
When to Use Event-Driven Architecture
- Microservices with independent scaling: When services should be decoupled and scale independently
- High write volume with audit requirements: When you need complete audit trails of all state changes
- Multiple read models: When different consumers need different representations of the same data (CQRS)
- Real-time event processing: When you need immediate reaction to events (notifications, analytics, monitoring)
- System integration: When integrating multiple systems that evolve independently
- Temporal queries: When you need to query historical state (“what was the balance on date X?”)
When Not to Use Event-Driven Architecture
- Simple request-response logic: When your use case is straightforward CRUD operations
- Strong consistency requirements: When you need immediate consistency across all services
- Small teams with limited expertise: When operational complexity exceeds team capacity
- Low-latency synchronous requirements: When sub-millisecond response times are critical
- Simple workflows: When workflow steps are few and do not benefit from decoupling
- Batch-oriented processing: When your use case is bulk data processing rather than real-time events
Message Ordering and Delivery Guarantees
Distributed systems cannot guarantee both ordering and availability. You must choose between ordering guarantees and availability.
At-Most-Once Delivery
The message is delivered at most once - it may be lost but never duplicated. The consumer does not acknowledge receipt, so the broker can discard it after sending.
Producer -> Broker -> Consumer (no ack)
[message lost if consumer offline]
Use case: Monitoring metrics where occasional loss is acceptable.
At-Least-Once Delivery
The message is guaranteed to arrive, but may be delivered multiple times. The consumer acknowledges after processing, and the broker redelivers if no ack is received.
Producer -> Broker -> Consumer (ack after process)
[redelivered if no ack received]
[duplicate possible]
Use case: Payment processing, where missing a transaction is worse than a duplicate.
Exactly-Once Delivery
The message is delivered exactly once, neither lost nor duplicated. Requires coordination between producer confirmation and consumer offset management.
Kafka: Producer tx + Consumer offset commit
[most expensive but safest]
Use case: Financial transactions where duplicates or losses are both unacceptable.
Ordering Guarantees
Kafka provides per-partition ordering. All events for the same partition key are processed in order. To guarantee cross-partition ordering, use a single partition or explicit sequencing.
# Ensures ordering within user_id
producer.send('events', key=b'user-123', value=event)
For systems requiring global ordering, consider using a single partition topic or implementing sequence numbers in event payloads.
Saga Pattern
When a business transaction spans multiple services, you need a way to handle failures gracefully. Sagas manage distributed transactions through a sequence of local transactions, each with a compensating transaction for rollback.
Why Sagas Instead of Distributed Transactions
Traditional ACID transactions do not scale across services. Two-phase commit (2PC) introduces tight coupling and availability loss. Sagas provide a looser coupling model with compensating transactions.
Types of Sagas
Choreography-based Saga: Each service publishes events that trigger the next service’s local transaction. No central coordinator.
graph LR
O[Order Service] -->|OrderCreated| P[Payment Service]
P -->|PaymentCaptured| I[Inventory Service]
I -->|InventoryReserved| S[Shipping Service]
S -->|ShippingScheduled| O
Orchestration-based Saga: A central saga orchestrator manages the workflow, deciding what to do next and triggering compensating transactions on failure.
graph LR
Orch[Order Saga Orchestrator] -->|CreateOrder| OrderSvc[Order Service]
Orch -->|CapturePayment| PaySvc[Payment Service]
Orch -->|ReserveInventory| InvSvc[Inventory Service]
Orch -->|ScheduleShipping| ShipSvc[Shipping Service]
PaySvc -.->|PaymentFailed| Orch
Orch -.->|CancelOrder| OrderSvc
Orch -.->|RefundPayment| PaySvc
Implementing Compensation
Each saga step must define its compensation logic:
class OrderSaga:
def create_order(self, order_data):
order = self.order_service.create(order_data)
return SagaStep(order_id=order.id, forward=self.capture_payment, backward=self.cancel_order)
def capture_payment(self, step):
payment = self.payment_service.capture(step.order_id)
return SagaStep(payment_id=payment.id, forward=self.reserve_inventory, backward=self.refund_payment)
def cancel_order(self, step):
self.order_service.cancel(step.order_id)
def refund_payment(self, step):
self.payment_service.refund(step.payment_id)
Saga vs Event Sourcing
Sagas focus on workflow coordination. Event sourcing focuses on state history. They complement each other:
- Saga without Event Sourcing: Orchestrates steps but does not persist the saga state
- Saga with Event Sourcing: Each saga step emits events, creating an audit trail of workflow decisions
Event Schema Design
Events are the contract between producer and consumer. A well-designed event schema evolves gracefully and remains comprehensible across services.
Event Envelope Pattern
Every event should have a standard envelope:
{
"event_id": "uuid-v4",
"event_type": "OrderPlaced",
"event_version": "1.0",
"timestamp": "2026-03-22T10:30:00Z",
"correlation_id": "uuid-for-tracing",
"causation_id": "uuid-of-triggering-event",
"payload": {
"order_id": "ORD-12345",
"customer_id": "CUST-67890",
"total_amount": 99.99,
"currency": "USD"
}
}
This envelope provides observability, traceability, and versioning at the schema level.
Schema Registry
Centralize event schema definitions using a schema registry:
- Confluent Schema Registry: Stores Avro or JSON Schema for Kafka events
- AWS Glue Schema Registry: AWS-managed schema registry for Kafka and Kinesis
- Custom registry: Simple API with versioning, validation, and compatibility checks
Schema registry benefits:
- Backward compatibility: New schema versions can read old events
- Forward compatibility: Old schema versions can read new events (ignoring new fields)
- Schema evolution rules: Define which changes are allowed (add fields, remove fields, etc.)
Event Design Principles
- Use flat structures: Nested objects complicate schema evolution
- Include timestamps: Essential for ordering and debugging
- Add context fields: correlation_id, causation_id enable distributed tracing
- Version explicitly: Include version number to handle schema evolution
- Use primitive types: Avoid complex types in early versions
Event Processing Patterns
Beyond simple publish-subscribe, event processing involves patterns for handling high-volume, low-latency, and fault-tolerant scenarios.
Event Streaming vs Event Carousel
Event Streaming: Continuous flow of events processed in real-time. Kafka excels at this.
Producer1 -> Topic -> Consumer1
Topic -> Consumer2
Topic -> Consumer3
Event Carousel: Repeated processing of the same event batch until consumed. Useful for batch-oriented workflows.
Filtering and Transformation
Filter events at the broker level to reduce network traffic:
# Kafka Streams example: filter low-value events
stream.filter((event) => event.amount > 100)
Transform events before consumption:
# Add enrichment before storing
stream.map((event) => ({
...event,
enriched_at: timestamp,
region: lookup_region(event.customer_id)
}))
State Stores
Maintain stateful event processing using state stores:
# Kafka Streams: materialized view
orders_by_user = stream.groupByKey()
.aggregate(
initializer=[],
aggregator=(key, event, existing) => existing.append(event)
)
State stores enable:
- joins: Combining events from multiple topics
- aggregations: Running totals and windowed computations
- tables: Latest value per key
Outbox Pattern
The outbox pattern solves a fundamental reliability problem in event-driven systems: the dual-write issue. When your application needs to update a database and publish an event, doing both non-atomically creates inconsistency risk. If the database commits but the event publish fails, your system enters an invalid state.
Without the outbox pattern:
# Problem: These two operations are not atomic
def handle_order(order_data):
order = db.create_order(order_data) # Commits
event_bus.publish(OrderCreated(order)) # Could fail after commit
With the outbox pattern:
def handle_order(order_data):
order = db.create_order(order_data)
db.save_outbox_event(OutboxEvent(type='OrderCreated', payload=order)) # Same transaction
# Commit both or neither
# Separate process polls outbox and publishes
def outbox_relay():
events = db.get_pending_outbox_events()
for event in events:
event_bus.publish(event)
db.mark_outbox_processed(event.id)
The outbox table lives in the same database as your business data. A dedicated relay process polls pending events and publishes them to the broker. This guarantees at-least-once delivery because the event is persisted before the publish acknowledgment returns.
Production consideration: Use a single-threaded relay or partition by aggregate ID to prevent out-of-order publishing when multiple instances run.
Event Notification vs Event-Carried State Transfer
Two competing philosophies govern how much data to include in an event:
Event Notification: The event contains only enough information to identify what happened. Consumers must query back to the producer for full details.
{ "event_type": "OrderPlaced", "order_id": "ORD-123", "timestamp": "..." }
Consumers call GET /orders/ORD-123 to get the complete order details.
Event-Carried State Transfer: The event contains all data consumers need without callback queries.
{ "event_type": "OrderPlaced", "order_id": "ORD-123", "customer_id": "...", "items": [...], "total": 99.99 }
Trade-offs:
| Aspect | Event Notification | Event-Carried State Transfer |
|---|---|---|
| Consumer complexity | Higher (needs callback) | Lower (self-contained) |
| Producer load | Lower | Lower (no callbacks) |
| Network calls | Additional per event | None |
| Data size | Small | Larger |
| Consumer independence | Lower (coupled to producer API) | Higher |
| Schema evolution | Easier (consumers re-fetch) | Harder (all consumers must update) |
Event notification works well when producers need to protect their data model or when consumers frequently need related data. Event-carried state transfer works well when consumers are truly independent and event sizes remain manageable.
Idempotency in Event Processing
At-least-once delivery means consumers must handle redelivered events gracefully. Idempotency ensures processing an event multiple times produces the same result as processing it once.
Deduplication by event ID:
processed_events = redis.set('processed_events')
def handle_event(event):
if event.event_id in processed_events:
return # Already handled
process_event(event)
processed_events.add(event.event_id)
Natural idempotency: Some operations are naturally idempotent:
# Idempotent: Setting a value to the same thing twice
account.update_balance(new_balance=100)
# Non-idempotent: Incrementing
account.increment_balance(amount=10) # Called twice = +20 instead of +10
Design event handlers to use upserts instead of increments where possible. When increments are necessary, track the event ID to detect duplicates.
Consumer-side acknowledgment:
def process_with_ack(event):
try:
result = handle_event(event)
acknowledge(event.offset) # Commit offset only after success
return result
except:
negative_acknowledge(event.offset) # Trigger redelivery
raise
Backpressure Handling
Backpressure occurs when producers publish events faster than consumers can process them. Left unhandled, this leads to queue accumulation, memory exhaustion, and eventually system failure.
Monitoring consumer lag:
# Kafka: track consumer lag per partition
lag = end_offset - current_offset
if lag > threshold:
alert("Consumer falling behind")
Strategies for handling backpressure:
- Horizontal scaling: Add consumer instances to parallelize processing
- Throttling: Implement rate limiting on the producer side
- Buffering with limits: Cap buffer size and route overflow to dead letter queues
- Adaptive processing: Slow down when queue depth exceeds thresholds
Dead letter queue pattern:
def process_with_dlq(event):
try:
handle_event(event)
except PermanentFailure:
dlq.publish(event) # No point retrying
acknowledge(event.offset)
except TransientFailure:
raise # Retry via redelivery
Design systems to gracefully degrade under load rather than cascade failures. When in doubt, prefer dropping events with alerting over unbounded queue growth.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Event broker failure | Events not published or consumed | Use broker clustering with replication; implement outbox pattern |
| Consumer crash mid-processing | Events may be lost or reprocessed | Use acknowledgment after successful processing; idempotent consumers |
| Out-of-order event delivery | State becomes inconsistent | Use sequence numbers or timestamps; implement conflict resolution |
| Event schema version mismatch | Consumers cannot deserialize events | Implement schema versioning; use upcasters or schema registry |
| Poison event blocking consumer | Consumer stops processing | Configure dead letter queues; implement retry limits |
| Cascading failures | One service failure triggers failures in others | Implement circuit breakers; design for partial availability |
| Event replay overload | New consumer overwhelms system with replays | Throttle replay rate; process in batches with backpressure |
| Correlation ID loss | Cannot trace transactions across services | Propagate correlation IDs through all events; log correlation |
Common Pitfalls / Anti-Patterns
Pitfall 1: Using Events for Queries
Events are for notifications, not queries. If you find yourself requesting data through events and waiting for responses, use a direct API call instead.
Pitfall 2: Mixing Commands and Events
Commands expect one handler; events expect zero or more handlers. Sending a command as an event creates confusion about who should handle it and whether responses are expected.
Pitfall 3: Not Planning for Event Schema Evolution
Events are immutable once published. If you change the schema, old events must still deserialize correctly. Plan versioning from the start using upcasters or versioned event types.
Pitfall 4: Over-Engineering with Choreography
Choreography is simple until it is not. Complex workflows with compensation logic are easier to manage with orchestration. Do not default to choreography for everything.
Pitfall 5: Ignoring Eventual Consistency
Read models lag behind writes in EDA. If your business logic assumes immediate consistency, you will have bugs. Design UIs and expectations around eventual consistency.
Pitfall 6: Creating Too Many Event Types
If every database write creates a unique event type, you have an event catalog nightmare. Use a smaller set of event types with sufficient payload to handle multiple use cases.
Interview Questions
Expected answer points:
- Commands are directed requests for a specific action ("do this"), expecting one handler
- Events are broadcast facts that something happened ("this happened"), with zero or more potential listeners
- Commands use imperative naming (verb-noun like CreateOrder), while events use past tense naming (noun-verb like OrderPlaced)
Expected answer points:
- Event sourcing stores the complete history of state changes as a sequence of events rather than persisting current state
- To rebuild state, you replay all events from the beginning (or from a snapshot) in order
- Each event represents a state mutation; applying them sequentially reconstructs the aggregate
- Snapshots can optimize by periodically capturing state to reduce replay depth
Expected answer points:
- CQRS (Command Query Responsibility Segregation) separates read and write models into distinct paths
- Write side: Commands go through a command handler to an event store
- Read side: Queries hit materialized views (projections) built from events
- Best suited when read/write workloads differ significantly, you need multiple optimized read representations, or you are already using event sourcing
- Not suitable for simple CRUD domains where the overhead outweighs benefits
Expected answer points:
- Choreography: Services react to events and emit their own events without a central coordinator; the overall workflow emerges from the event chain
- Orchestration: A central orchestrator coordinates the entire transaction, deciding next steps based on responses
- Choreography benefits: True decoupling, no single point of failure, easy to add consumers
- Orchestration benefits: Centralized behavior, explicit transaction boundaries, easier compensation logic
- Hybrid approach is common in practice: orchestration for core business workflows, choreography for peripheral side effects
Expected answer points:
- At-most-once: Message may be lost but never duplicated; no consumer acknowledgment
- At-least-once: Message is guaranteed to arrive but may be delivered multiple times; consumer acknowledges after processing
- Exactly-once: Message arrives exactly once, neither lost nor duplicated; requires coordination between producer and consumer offset management
Expected answer points:
- Saga manages distributed transactions through a sequence of local transactions, each with a compensating transaction for rollback
- Each saga step performs a forward action and defines a backward compensation action
- If any step fails, all previously completed steps execute their compensation transactions in reverse order
- Choreography-based saga: services publish events triggering next steps
- Orchestration-based saga: a central orchestrator manages workflow and compensation
Expected answer points:
- A correlation ID is a unique identifier attached to all events resulting from the same originating action
- It enables tracing a complete distributed transaction across service boundaries and logs
- Passed through the entire call chain: originating action generates the ID, all subsequent events inherit it
- Essential for debugging distributed systems where a single business transaction spans multiple services
Expected answer points:
- Upcasting: A transformer converts old event versions to new versions before processing
- Event versioning: Include version number in event type (UserCreatedV2) with transformation logic
- Snapshots: Periodically capture state to limit replay depth for long-lived entities
- Schema registry: Centralize schema definitions with backward/forward compatibility rules
- Key principle: Old events must still deserialize correctly as your schema changes
Expected answer points:
- Eventual consistency: Read models lag behind writes; writes return quickly but reads may see stale data
- Strong consistency: All services immediately reflect the same state; higher latency but no stale reads
- EDA typically embraces eventual consistency for better availability and scalability
- Design implications: UIs and business logic must account for lag between write and read model update
- Strong consistency required only when absolute accuracy is critical (e.g., financial transactions with regulatory requirements)
Expected answer points:
- event_id: Unique identifier (UUID v4) for deduplication and tracing
- event_type: The event name (e.g., OrderPlaced) for routing and handling
- event_version: Schema version number for compatibility handling
- timestamp: When the event occurred for ordering and debugging
- correlation_id: Links related events across the distributed transaction
- causation_id: References the event that triggered this one
- payload: The actual event data, kept flat for easier schema evolution
Expected answer points:
- The outbox pattern solves the dual-write problem where failing to write to the database and publish an event atomically leads to inconsistencies
- Instead of publishing events directly, you write events to an outbox table in the same transaction as your business data
- A separate process (outbox relay) polls the outbox table and publishes events to the broker
- This ensures at-least-once delivery because the event is persisted before acknowledgment
- Prevents scenarios where the database commits but the event publish fails
Expected answer points:
- Event notification: The event contains minimal data (just enough to identify what happened); consumers must call back to the producer for full details
- Event-carried state transfer: The event contains the complete state needed by consumers, eliminating need for callback queries
- Event notification reduces producer complexity but increases consumer complexity and load on producer
- Event-carried state transfer simplifies consumers but creates larger events and potential data consistency issues
- Choose based on consumer independence needs vs network and storage trade-offs
Expected answer points:
- Idempotency ensures that processing the same event multiple times produces the same result as processing it once
- Use event_id deduplication: Store processed event IDs and check before processing
- Design handlers as naturally idempotent: Use upserts instead of inserts, check-before-update patterns
- Implement idempotency keys at the consumer level with acknowledgment based on successful processing
- For at-least-once delivery systems, idempotency is critical to handle redelivery without data corruption
Expected answer points:
- Kafka: Durable log-based messaging with per-partition ordering, replay capability, and consumer group semantics; best for high-throughput event streaming
- RabbitMQ: Traditional message broker with exchanges, queues, and bindings; supports complex routing but no native replay
- SNS/SQS: AWS managed services with push-based pub/sub (SNS) and pull-based queues (SQS); simplifies operations but creates AWS vendor dependency
- Kafka excels at event sourcing and audit trails; RabbitMQ better for request-response integration; SNS/SQS for cloud-native AWS workflows
Expected answer points:
- The event curtain is an interface that hides the internal event structure from consumers
- Producers publish events through the curtain; consumers receive events through the curtain
- This allows internal schema changes without breaking consumers who have already processed events
- The curtain acts as a version adapter, translating between producer and consumer versions
- Critical for maintaining backward compatibility in long-lived event-driven systems
Expected answer points:
- Backpressure occurs when consumers cannot process events as fast as producers publish them
- Without handling, this leads to message accumulation, memory exhaustion, or dropped messages
- Strategies: Consumer scaling (add more instances), throttling (slow producer), buffering with limits, dead letter queues for overflow
- Kafka offers consumer lag monitoring as a backpressure indicator
- Design systems to gracefully degrade under load rather than cascade failures
Expected answer points:
- Projection rebuild replays all events from the event store to rebuild a read model from scratch
- For large event stores, this can take hours or days, leaving the system unavailable
- Snapshotting mitigates this by periodically persisting the projected state, reducing replay to events since the last snapshot
- Blue-green projection builds a new projection alongside the existing one, switching when complete
- Eventual consistency means read models lag during rebuild; design UIs to handle temporary staleness
Expected answer points:
- Unit testing: Test individual aggregate and event handlers in isolation with mock dependencies
- Contract testing: Verify event schemas between producer and consumer services
- Integration testing: Test event flow through the broker with test containers or embedded brokers
- Circuit breaker testing: Verify system behavior when downstream services fail
- Chaos testing: Inject failures like broker unavailability, message duplication, and out-of-order delivery
Expected answer points:
- Two-phase commit (2PC) provides atomic commitment across services but introduces tight coupling and blocks resources during the prepare phase
- Sagas use sequential local transactions with compensating transactions for rollback, avoiding blocking
- 2PC loses availability during the commit phase; sagas remain available with proper compensation design
- Sagas trade atomicity for availability and scalability, accepting eventual consistency
- Choreography-based sagas scale better; orchestration-based sagas provide visibility but introduce centralization
Expected answer points:
- Use event-driven when: services are truly independent, you need audit trails, multiple read models, or genuine decoupling in time and space
- Use request-driven when: you need strong consistency, low-latency synchronous responses, or simple CRUD operations
- Hybrid approach is common: core business workflows use request-response, while peripheral side effects use events
- Consider team expertise: EDA has higher operational complexity and learning curve
- Start with simpler architectures and evolve to event-driven only when benefits are clear and justified
Further Reading
- Apache Kafka — a common backbone for event-driven systems; its durable log, replay capability, and consumer group model map well to EDA patterns
- Pub/Sub Patterns — pub/sub patterns that complement event-driven architecture
- Message Queue Types — messaging infrastructure that supports event-driven systems
Conclusion
Events vs Commands
Understanding the distinction between events and commands is foundational to event-driven architecture.
Commands are directed requests for a specific action, expecting one handler:
- Imperative naming: verb-noun (CreateUser, SubmitOrder, ProcessPayment)
- The sender expects a single response
- Like a phone call — one person answers
Events are broadcast facts that something happened, with zero or more potential listeners:
- Past tense naming: noun-verb (UserCreated, OrderSubmitted, PaymentProcessed)
- The sender does not know who is listening
- Like a radio broadcast — anyone tuned in receives it
Using the wrong naming convention creates confusion. If you name something CreateUser but treat it as an event (broadcasting to multiple consumers), other developers will expect a single handler response that never comes.
For example:
- Command:
CreateUser→ expects one handler to process and respond - Event:
UserCreated→ broadcasts to all interested consumers
This distinction shapes how you design message contracts throughout an EDA system.
Key Points
- Events are facts that something happened; commands are requests for specific actions
- Event sourcing stores state changes as events for replay and audit
- CQRS separates read and write models for independent optimization
- Correlation IDs enable distributed tracing across service boundaries
- Choreography lets services react to events without central coordination
- Orchestration uses a central coordinator for complex workflows
- Event schema evolution requires backward-compatible versioning strategies
- Design for eventual consistency; read models lag behind writes
Pre-Deployment Checklist
- [ ] Event schema versioning strategy defined (upcasters or versioned types)
- [ ] Event schema registry deployed for validation
- [ ] Correlation ID propagation implemented across all services
- [ ] Dead letter queue configured for failed event processing
- [ ] Idempotent event consumers implemented
- [ ] Circuit breakers configured for downstream service calls
- [ ] Event broker clustering and replication configured
- [ ] Monitoring for event consumption lag configured
- [ ] Alert thresholds set for dead letter queue depth
- [ ] Event retention policy defined (for event sourcing replay)
- [ ] Read model rebuild procedure documented
- [ ] Schema evolution testing implemented (old events deserialize correctly)
- [ ] Security controls (authentication, authorization, encryption) configured
- [ ] Workflow compensation logic tested (for orchestration) Category
Related Posts
CQRS and Event Sourcing: Distributed Data Management
Learn about Command Query Responsibility Segregation and Event Sourcing patterns for managing distributed data in microservices architectures.
CQRS Pattern
Separate read and write models. Command vs query models, eventual consistency implications, event sourcing integration, and when CQRS makes sense.
Event Sourcing
Storing state changes as immutable events. Event store implementation, event replay, schema evolution, and comparison with traditional CRUD approaches.