Publish/Subscribe Patterns: Topics, Subscriptions, Filtering
Learn publish-subscribe messaging patterns: topic hierarchies, subscription management, message filtering, fan-out, and dead letter queues.
Publish/Subscribe Patterns: Topics, Subscriptions, and Filtering
Publish/subscribe is a messaging pattern where senders (publishers) broadcast messages to topics instead of sending them directly to specific receivers. Subscribers receive messages from the topics they care about. This decoupling is practical but comes with tradeoffs around topic design, filtering, and delivery semantics.
This post walks through the practical aspects of building pub/sub systems: topic hierarchies, subscription management, message filtering, and fan-out patterns.
Introduction
Publish/subscribe is a messaging pattern where senders — called publishers — do not direct messages to specific receivers. Instead, they broadcast messages to named topics. Subscribers listen to the topics they care about and receive all messages published to those topics. This decouples producers from consumers in both time and space: publishers do not need to know who (if anyone) is listening, and subscribers do not need to be online when messages are published.
A single order.placed event can simultaneously trigger inventory updates in the warehouse service, receipt generation in the billing service, fraud detection in the risk service, and analytics in the data pipeline — without the order service knowing anything about those downstream consumers.
This post covers the practical building blocks: how to design topics, manage subscriptions, filter messages selectively, handle fan-out to many subscribers, and deal with failures through dead letter queues. It also covers the trade-offs that determine when pub/sub fits and when a queue or direct RPC is simpler.
Topic-Specific Deep Dives
Topic Design
Topics are the core abstraction in pub/sub. Getting them right makes everything else easier.
Flat Topics
The simplest approach: one topic per message type.
user.created
order.placed
payment.processed
Each subscriber chooses which topics to listen to. Simple, but limited. What if you want all user events? You subscribe to multiple topics.
Hierarchical Topics
Hierarchies organize topics into trees, enabling broader subscriptions:
users/
users.created
users.updated
users.deleted
orders/
orders.placed
orders.updated
orders.cancelled
payments/
payments.processed
payments.failed
Subscribing to orders/ gives you all order events. Subscribing to users.* gives you all user lifecycle events.
graph TD
Publisher -->|publish| Topic[orders.placed]
subgraph Subscribers
Analytics[Analytics Service]
Notifications[Notification Service]
Audit[Audit Service]
end
Topic -->|orders.placed| Analytics
Topic -->|orders.placed| Notifications
Topic -->|orders.placed| Audit
Naming conventions matter. Without some discipline, hierarchies turn into a mess pretty quickly.
Subject-Based vs Content-Based
Most pub/sub systems are subject-based: topics are predefined strings that publishers use. Subscribers choose topics to listen to.
Content-based systems route based on message content. A subscriber might say “give me all messages where amount > 1000.” This is more flexible but harder to implement efficiently. It also raises security concerns—what if a subscriber crafts queries to sniff data they should not see?
Most teams end up using subject-based routing. It is simpler to implement and performs well at scale.
Subscription Management
Subscriptions track what each subscriber wants. Managing their lifecycles is trickier than it sounds.
Durable Subscriptions
A durable subscription persists even when the subscriber is offline. When it reconnects, it gets the messages that arrived while it was disconnected.
// Pseudo-code for durable subscription
subscription = client.subscribe("orders.*", durable=true)
client.disconnect()
client.reconnect()
subscription.resume() // missed messages delivered
This matters for mobile clients and services that restart. Without durability, offline periods mean permanent message loss.
Shared Subscriptions
When multiple instances of a service run (say, three notification service instances), you want them to share the work. Shared subscriptions distribute messages across instances.
orders/ -> [notification-1, notification-2, notification-3]
each instance gets roughly 1/3 of messages
This is essential for scaling consumers horizontally. Without sharing, all instances receive all messages, which defeats the purpose of running multiple workers.
Subscription Types by Delivery
- Exclusive: Only one consumer receives messages (no sharing)
- Shared: Messages distributed across multiple consumers
- Failover: One consumer receives all messages, others standby
Message Filtering
Sometimes subscribers only need a subset of what a topic publishes. Filtering lets them narrow what they receive.
Topic-Level Filtering
The simplest form: use narrow topics. A subscriber to orders.high-value gets only high-value orders, not everything else.
This requires publishers to classify messages correctly, which couples them to subscriber knowledge.
Header-Based Filtering
Messages have headers (key-value pairs) that you can use for filtering:
{
"topic": "orders",
"headers": {
"priority": "high",
"region": "us-west",
"source": "web"
},
"body": { "order_id": "12345", "amount": 5000 }
}
A subscriber can filter: headers.region = 'us-west' AND headers.priority = 'high'.
This separates classification (publisher job) from filtering (subscriber job).
SQL-Based Filtering
Some systems let subscribers express filters as SQL-like conditions:
SELECT * FROM orders WHERE amount > 1000 AND region = 'us-west'
Amazon SNS supports SQL-based filtering:
{
"FilterPolicy": {
"amount": [{ "numeric": [">", 1000] }],
"region": ["us-west"]
}
}
Fan-Out Patterns
Fan-out describes how one message reaches multiple subscribers. The pattern changes depending on what you need.
Broadcast Fan-Out
Every subscriber receives every message. The simplest model.
graph LR
Pub[Publisher] -->|1 message| Topic
Topic -->|copy 1| Sub1[Subscriber 1]
Topic -->|copy 2| Sub2[Subscriber 2]
Topic -->|copy 3| Sub3[Subscriber 3]
The broker makes copies. Cost scales with subscriber count.
Selective Fan-Out
With filtering, only matching subscribers receive the message. More efficient but requires the broker to evaluate filters.
Dead Letter Queues
When a subscriber cannot process a message (crashes, returns error, times out), what happens? Dead letter queues capture failed messages so you can inspect them later.
graph LR
Pub[Publisher] --> Topic
Topic -->|normal| Sub[Subscriber]
Topic -->|failed after retries| DLQ[Dead Letter Queue]
Without DLQs, poison messages block the queue or get dropped silently. DLQs let you see what failed and why.
Implementation Considerations
Message Ordering
Most pub/sub systems do not guarantee ordering across topics. Within a single ordered stream (like Kafka partitions), ordering holds. Across multiple publishers or topics, it does not.
If you need global ordering, you need a totally ordered broadcast protocol, which is expensive. More commonly, you accept per-partition ordering and use correlation IDs to reconstruct order client-side.
Backpressure Handling
When subscribers cannot keep up with message throughput, backpressure prevents the system from collapsing. Without it, consumers run out of memory or broker queues grow without bound.
Buffer-based approaches accumulate messages locally up to a limit:
local_buffer = RingBuffer(capacity=1000)
while local_buffer.has_capacity():
message = broker.fetch()
local_buffer.push(message)
process_batch(local_buffer.drain())
When the buffer fills, the subscriber must either drop messages, pause consumption, or signal the broker to slow down.
Flow control signals let subscribers push back on publishers:
subscriber --> [Window Full] --> broker --> publisher slows production
RabbitMQ uses basic.qos with prefetch to limit unacknowledged messages. Kafka has consumer group rebalance hooks for similar behavior.
Exponential backoff on consumption failures keeps repeated processing failures from creating message pileup:
retry_delay = 1_000 # ms
for attempt in range(max_retries):
try:
process(message)
ack(message)
break
except TransientError:
sleep(retry_delay)
retry_delay *= 2
retry_delay = min(retry_delay, max_delay)
The key takeaway: backpressure should propagate upward, not accumulate silently. If your subscribers cannot keep up, the system should throttle publishers or shed load, not buffer forever.
Message Deduplication
At-least-once delivery can deliver the same message twice. Idempotent consumers handle this by tracking processed message IDs.
if message.id in processed_ids:
skip
else:
process(message)
add message.id to processed_ids
Design for at-least-once and make processing idempotent. Exactly-once is usually not worth the cost.
Delivery Guarantees Deep Dive
Pub/sub systems offer different delivery guarantees. The right choice depends on your processing model.
| Guarantee | Description | Idempotent Required | Use Case |
|---|---|---|---|
| At-most-once | Message delivered once or not at all | No | Metrics, fire-and-forget events |
| At-least-once | Message delivered, may be duplicated | Yes | Processing jobs, state updates |
| Exactly-once | Message delivered exactly once | Yes (and coord.) | Payments, critical business events |
Most broker systems (RabbitMQ, Amazon SNS/SQS, Google Cloud Pub/Sub) provide at-least-once by default. Exactly-once requires distributed consensus and adds significant overhead.
When designing consumers, assume at-least-once arrival. Make every processing step idempotent using message IDs stored in a deduplication table or bloom filter.
Pub/sub connects closely to event-driven architecture, where topics carry domain events between services. It also relates to message queue types for understanding where pub/sub fits in the broader messaging landscape.
Trade-off Analysis
| Design Choice | Pros | Cons | When to Use |
|---|---|---|---|
| Flat Topics | Simple, easy to understand | No wildcard subscriptions, subscribers must list each topic | Small number of independent topics |
| Hierarchical Topics | Wildcard subscriptions, natural organization | Naming discipline required, can become messy | Large topic spaces with natural groupings |
| Subject-Based Routing | Efficient, simple broker logic | Limited flexibility | Most production systems |
| Content-Based Routing | Maximum flexibility for subscribers | Complex broker logic, security concerns | Specialized filtering requirements |
| Durable Subscriptions | No message loss during offline periods | Higher broker memory/storage | Mobile clients, fault-tolerant services |
| Shared Subscriptions | Horizontal scaling, efficient resource use | Message ordering not guaranteed per instance | Stateless workers, scaled consumers |
| Exclusive Subscriptions | Strict ordering guarantee | No horizontal scaling | Ordered processing, single consumer |
| Broadcast Fan-Out | Simple, all subscribers get all messages | Scales with subscriber count | Notifications, event broadcasting |
| Selective Fan-Out | Efficient, subscribers get only relevant messages | Broker must evaluate filters | Targeted deliveries, filtered streams |
| Delivery Guarantee | Message Loss Risk | Duplicate Risk | Implementation Complexity |
|---|---|---|---|
| At-most-once | Possible (message may not arrive) | None | Low |
| At-least-once | None | Possible (duplicates) | Medium (idempotency needed) |
| Exactly-once | None | None | High (distributed consensus) |
When to Use / When Not to Use
When to Use Publish-Subscribe
Use pub/sub when one event should trigger actions in multiple services simultaneously — analytics, notifications, audit, and others. When services need to react to shared events without calling each other directly. When pushing updates to web clients, mobile apps, or IoT devices. When the same data needs to live in multiple systems — databases, caches, search indexes.
When Not to Use Publish-Subscribe
Don’t reach for pub/sub when you need a reply from a specific service — use direct RPC or a queue with a reply-to header. Not when messages must be processed in strict order with a single consumer. Not when you’re just distributing work evenly across workers (a plain queue handles that better). And not when all consumers must finish processing before the operation completes — pub/sub is eventually consistent by nature.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Broker goes down | All publishing and subscribing stops | Deploy broker clusters with replication; use multiple broker endpoints |
| Subscriber crash during processing | Message may be lost or reprocessed | Use manual acknowledgment; configure prefetch limits |
| Message explosion (too many subscribers) | Network saturation, increased latency | Limit subscription count; use content filtering at broker |
| Topic explosion (too many topics) | Memory pressure on broker | Implement topic naming policies; use topic hierarchies instead of flat topics |
| Subscription drift | Consumers receiving unintended messages | Use filter policies; audit subscriptions regularly |
| Ordering violations | Messages arrive out of order across topics | Use correlation IDs to reorder; use single-partition streams for ordering |
| Poison message blocking | Failed messages block subscriber | Configure dead letter queues; set retry limits with exponential backoff |
Common Pitfalls / Anti-Patterns
Common Pitfalls
Pitfall 1: Topic Proliferation
Creating too many topics (one per event type) leads to management overhead and broker resource exhaustion. Use hierarchical topics to group related events and apply naming policies.
Pitfall 2: Treating Pub/Sub as Synchronous Communication
Pub/sub is inherently asynchronous. If you need synchronous request/response, do not use topics. Clients must be designed to handle out-of-order, asynchronous responses.
Pitfall 3: Forgetting That Subscribers Can Go Offline
Without durable subscriptions, offline subscribers miss messages permanently. For critical subscribers, make sure durability is configured.
Pitfall 4: Not Designing for Message Replay
Subscribers that crash and restart need to reprocess messages they missed. Design consumers to handle gaps in the message stream gracefully.
Pitfall 5: Over-Filtering at the Publisher
When publishers classify messages for subscribers, they become coupled to subscriber requirements. Push filtering to subscribers and keep publishers unaware of subscriber specifics.
Pitfall 6: Ignoring Dead Letter Queue Accumulation
If DLQs are accumulating, something is wrong with message processing. Monitor DLQ depth and investigate root causes promptly.
Interview Questions
Expected answer points:
- Subject-based routing uses predefined topic strings; subscribers choose which topics to listen to
- Content-based routing evaluates message body attributes; subscribers define filter conditions on message content
- Content-based routing is more flexible but harder to implement efficiently and raises security concerns
Expected answer points:
- Durable subscriptions persist state server-side so messages arriving while subscriber is offline are queued
- On reconnect, the subscriber receives missed messages from the persisted state
- Critical for mobile clients and services that restart frequently
- Without durability, offline periods cause permanent message loss
Expected answer points:
- DLQs capture messages that fail processing after all retry attempts
- Prevents poison messages from blocking the queue or being dropped silently
- Enables debugging of processing failures by inspecting failed messages
- DLQ depth should be monitored and alerts configured for threshold violations
Expected answer points:
- Broadcast fan-out delivers every message to every subscriber; broker makes copies
- Selective fan-out uses filtering so only matching subscribers receive the message
- Selective fan-out is more efficient but requires broker to evaluate filters
- Cost scales with subscriber count in broadcast; less so in selective
Expected answer points:
- Exclusive: Only one consumer receives messages from the subscription
- Shared: Messages are distributed across multiple consumers in the group
- Failover: One consumer receives all messages; others remain standby for high availability
Expected answer points:
- Backpressure prevents system collapse when subscribers cannot keep up with message throughput
- Without it, consumers run out of memory or broker queues grow without bound
- Buffer-based approaches accumulate messages locally up to a limit
- Flow control signals let subscribers push back on publishers to slow things down
- Should propagate upward, not accumulate silently
Expected answer points:
- At-most-once: message delivered once or not at all; no idempotency required
- At-least-once: message guaranteed to be delivered but may be duplicated; idempotent processing required
- Exactly-once: message delivered exactly once; requires distributed consensus and significant overhead
- Most brokers provide at-least-once by default
Expected answer points:
- Shared subscriptions distribute messages across multiple consumer instances for horizontal scaling
- Exclusive subscriptions ensure only one consumer receives all messages
- Without shared subscriptions, all instances get all messages, which defeats horizontal scaling
- Shared works well for scaling consumers; exclusive is for single-threaded processing requirements
Expected answer points:
- Topic message rate (messages published per second per topic)
- Subscription delivery rate (messages delivered per second per subscription)
- Subscriber lag (time between publish and delivery)
- Dead letter queue depth (failed messages waiting to be inspected)
- Filter match rate (percentage of messages matching subscriber filters)
- Connection count (active publishers and subscribers per topic)
Expected answer points:
- Broker failure: Deploy broker clusters with replication; use multiple broker endpoints
- Subscriber crash during processing: Use manual acknowledgment; configure prefetch limits
- Message explosion (too many subscribers): Limit subscription count; use content filtering
- Topic explosion (too many topics): Implement naming policies; use topic hierarchies
- Ordering violations: Use correlation IDs to reorder; use single-partition streams
- Poison message blocking: Configure dead letter queues; set retry limits with exponential backoff
Expected answer points:
- Header-based filtering uses key-value pairs attached to messages; subscribers filter by header attributes
- SQL-based filtering lets subscribers express complex conditions with operators like >, <, =, AND, OR
- Header filtering is simpler and faster but limited to predefined header fields
- SQL filtering is more expressive but requires the broker to parse and evaluate SQL-like expressions
- Amazon SNS supports SQL-based filtering via FilterPolicy with numeric and string matching
Expected answer points:
- Pub/sub is often the messaging backbone for event-driven architecture
- In EDA, topics carry domain events between services that react to those events
- Pub/sub provides the decoupled, asynchronous communication that EDA requires
- Services do not call each other directly; they subscribe to topics and react to events
- This separation allows services to evolve independently without tight coupling
Expected answer points:
- Flat topics are simple but require explicit subscriptions for each topic
- Hierarchical topics enable broad subscriptions using wildcard patterns (e.g., orders/*)
- Hierarchies reduce coupling between publishers and subscribers
- The tradeoff is naming discipline—without conventions, hierarchies become messy
- Hierarchies work well when there is a natural domain tree; flat works when topics are independent
Expected answer points:
- At-least-once delivery can duplicate messages; idempotent processing handles duplicates
- Idempotent processing means processing a message multiple times has the same effect as once
- Implementation: track processed message IDs in a deduplication table or bloom filter
- Check if message.id exists before processing; skip if already processed
- Design for at-least-once and make every processing step idempotent
Expected answer points:
- Pub/sub does not guarantee ordering across topics or publishers
- Correlation IDs allow clients to reconstruct order on their side
- Messages related to the same logical operation share a correlation ID
- Subscribers sort messages by correlation ID to restore proper sequence
- For strict ordering, use single-partition ordered streams and correlation IDs together
Expected answer points:
- Buffer-based approaches accumulate messages locally in a ring buffer up to capacity
- When buffer fills, the subscriber must drop messages, pause consumption, or signal the broker
- Flow control signals let subscribers push back on publishers to slow production
- RabbitMQ uses basic.qos with prefetch to limit unacknowledged messages
- Kafka has consumer group rebalance hooks for similar backpressure behavior
- Backpressure should propagate upward, not accumulate silently
Expected answer points:
- Authentication: Require credentials for publishers and subscribers; use mutual TLS
- Authorization: Implement topic-level access controls; restrict who can publish to which topics
- Encryption in transit: Enable TLS for all broker connections
- Encryption at rest: Enable disk encryption if broker stores messages
- Message-level security: Validate message schemas; sanitize content to prevent injection
- Subscription security: Prevent unauthorized subscription creation
Expected answer points:
- Exclusive subscriptions deliver all messages to one consumer; others receive nothing
- Shared subscriptions distribute messages across multiple consumers in a group
- Without sharing, all instances get all messages—defeating horizontal scaling
- Shared works well for scaling out stateless workers; exclusive is for single-threaded requirements
- Failover subscriptions designate one active consumer with others on standby
Expected answer points:
- Exponential backoff increases retry delays progressively after transient failures
- Starting delay doubles or triples after each failed attempt up to a maximum
- Prevents message pileup from repeated processing failures creating backlog
- Example: retry_delay starts at 1000ms, doubles each attempt, caps at max_delay
- Keeps the system responsive while failed messages are retried at sustainable rate
Expected answer points:
- Topic message rate: Messages published per second per topic
- Subscription delivery rate: Messages delivered per second per subscription
- Subscriber lag: Time between message publish and delivery to subscribers
- Dead letter queue depth: Failed messages awaiting inspection
- Filter match rate: Percentage of published messages matching subscriber filters
- Connection count: Active publishers and subscribers per topic
- Alerts should trigger when DLQ depth exceeds threshold or delivery failures exceed rate
Further Reading
- Event-Driven Architecture - Companion pattern for building event-driven systems
- Message Queue Types - Understanding where pub/sub fits in the broader messaging landscape
- Amazon SNS Documentation - AWS’s pub/sub service deep dive
- RabbitMQ Routing Topics - RabbitMQ’s official tutorial on topic routing patterns
Conclusion
Key Points
- Pub/sub broadcasts messages to all subscribers; each subscriber gets a copy
- Topic hierarchies organize messages by domain and enable broad subscriptions
- Durable subscriptions persist messages for offline subscribers
- Shared subscriptions distribute load across multiple consumer instances
- Dead letter queues capture poison messages for debugging
- Design for at-least-once delivery with idempotent processing
- Ordering is not guaranteed across topics; use correlation IDs when needed
Pre-Deployment Checklist
- [ ] Topic naming convention documented and enforced
- [ ] Durable subscriptions enabled for critical subscribers
- [ ] Dead letter queues configured with monitoring
- [ ] Filter policies documented per subscription
- [ ] Manual acknowledgment implemented in consumers
- [ ] Idempotent message processing implemented
- [ ] Retry limits set with exponential backoff
- [ ] TLS/encryption enabled for all connections
- [ ] Topic-level access controls configured
- [ ] Alert thresholds set for DLQ depth and delivery failures
- [ ] Consumer group scaling strategy defined
- [ ] Correlation ID propagation implemented for distributed tracing
Observability
Metrics to Monitor
- Topic message rate: Messages published per second per topic
- Subscription delivery rate: Messages delivered per second per subscription
- Subscriber lag: Time between message publish and delivery to subscribers
- Dead letter queue depth: Failed messages awaiting inspection
- Filter match rate: Percentage of published messages matching subscriber filters
- Connection count: Active publishers and subscribers per topic
Logs to Capture
- Topic creation and deletion events
- Subscription creation, modification, and deletion
- Message publish events with topic, routing keys, and headers
- Message delivery events per subscription
- Dead letter queue arrivals with original topic and failure reason
- Filter policy evaluations (when a message is filtered out)
Alerts to Configure
- Dead letter queue depth exceeds threshold
- Subscriber delivery failures exceed rate threshold
- Topic message rate anomalies (spike or drop)
- Subscription lag exceeds SLA threshold
- Broker memory or connection limits approaching
Security Checklist
- Authentication: Require credentials for publishers and subscribers; use mutual TLS for authentication
- Authorization: Implement topic-level access controls; restrict who can publish to which topics
- Encryption in transit: Enable TLS for all broker connections
- Encryption at rest: Enable disk encryption if broker stores messages
- Message-level security: Validate message schemas; sanitize content to prevent injection attacks
- Subscription security: Prevent unauthorized subscription creation; validate subscriber identity
- Audit trails: Log all topic and subscription administrative actions
- Data classification: Do not publish sensitive data to shared topics without encryption
Category
Related Posts
CQRS Pattern
Separate read and write models. Command vs query models, eventual consistency implications, event sourcing integration, and when CQRS makes sense.
Event Sourcing
Storing state changes as immutable events. Event store implementation, event replay, schema evolution, and comparison with traditional CRUD approaches.
Message Queue Types: Point-to-Point vs Publish-Subscribe
Understand the two fundamental messaging patterns - point-to-point and publish-subscribe - and when to use each, including JMS, AMQP, and MQTT protocols.