Apache Kafka: Distributed Streaming Platform

Learn how Apache Kafka handles distributed streaming with partitions, consumer groups, exactly-once semantics, and event-driven architecture patterns.

published: March 22, 2026 reading time: 53 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Kafka organizes events into immutable topics split into partitions, and consumer groups let different services each process the full stream independently, which makes it the backbone of event-driven architectures at companies like LinkedIn, Netflix, and Uber. The offset-based consumer model means clients control their position in the log, so they can replay events at any time rather than just processing once. Exactly-once delivery uses Kafka transactions to atomically commit offsets alongside output writes to Kafka topics, closing the gap where crashes between output writes and offset commits cause duplicates. Understanding the trade-offs between at-most-once, at-least-once, and exactly-once semantics, along with broker replication and consumer group rebalancing, is what separates production-ready Kafka designs from toy examples.

Introduction

Apache Kafka is a distributed streaming platform built for high-throughput, fault-tolerant event streaming at scale. Originally developed at LinkedIn and later open-sourced through the Apache Software Foundation, Kafka has become the backbone of event-driven architectures across industries. It powers real-time analytics pipelines, event sourcing systems, microservices communication, and log aggregation at companies like Netflix, Uber, and Airbnb.

Unlike traditional message queues that delete messages after consumption, Kafka persists messages in immutable logs. Producers write events to topics; consumers read from those topics independently. This durability and the ability to replay events make Kafka well-suited for systems where late-arriving consumers or audit trails matter.

This post covers the core concepts that make Kafka work: topics and partitions, consumer groups, offset management, exactly-once semantics, broker replication, dead letter queues, and backpressure handling. By the end, you’ll understand how Kafka achieves its legendary throughput, how to design consumer groups for parallelism, and how to avoid the common pitfalls in production.

Core Concepts

Topics and partitions

Kafka organizes data into topics. Unlike a queue where messages are consumed and deleted, Kafka topics are logs. Messages are appended and kept for a configurable retention period: hours, days, or indefinitely.

Topic: order-events
Partition 0: [msg1, msg2, msg3, msg5, msg8]
Partition 1: [msg4, msg6, msg7, msg9]
Partition 2: [msg10, msg11, msg12]

Each topic splits into partitions for parallelism. Partitions distribute across brokers. Within a partition, messages have a monotonically increasing offset that uniquely identifies each one.

Message keying

Producers can specify a key when publishing:

producer.send(new ProducerRecord<>("order-events", orderId, orderJson));

Kafka's default partitioner hashes the key to determine the partition. Messages with the same key go to the same partition as long as the topic's partition count stays the same, which gives you ordering per key. All events for the same order arrive in the same partition, in order.
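A minimal sketch of key-based partitioning follows. It assumes a stand-in hash (CRC32 from Python's standard library) rather than the murmur2 hash used by Kafka's default partitioner, so the partition numbers will not match a real cluster. The helper name partition_for is hypothetical.

```python
import zlib

def partition_for(key, num_partitions):
    """Map a key to a partition: hash the key bytes, then take the modulo.

    Assumption: CRC32 stands in for Kafka's murmur2 hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Route a few keyed events into three partitions.
events = [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]
placement = {}
for key, event in events:
    placement.setdefault(partition_for(key, 3), []).append((key, event))
```

Both events for order-1 land in the same list, in the order they were sent, which is the per-key ordering property described above.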

Partition assignment

Partitions are assigned to brokers when the topic is created, and can be reassigned later:

Broker 1: Partition 0, Partition 3
Broker 2: Partition 1, Partition 4
Broker 3: Partition 2, Partition 5

The leader broker for each partition handles reads and writes. Followers replicate for fault tolerance.
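The layout above can be reproduced with a simplified round-robin placement. This is a sketch under stated assumptions: real Kafka randomizes the starting broker and can take racks into account, and place_replicas is a hypothetical helper, not a Kafka API.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Round-robin placement; the first broker in each list acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    placement = {}
    for p in range(num_partitions):
        start = p % len(brokers)
        placement[p] = [
            brokers[(start + i) % len(brokers)] for i in range(replication_factor)
        ]
    return placement

# Six partitions, three brokers, two replicas each.
layout = place_replicas(6, [1, 2, 3], replication_factor=2)
leaders = {p: replicas[0] for p, replicas in layout.items()}
```

With these inputs, broker 1 leads partitions 0 and 3, broker 2 leads 1 and 4, and broker 3 leads 2 and 5, matching the listing above.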

Consumer groups

Kafka consumers belong to consumer groups. Each message in a topic is delivered to one consumer within each group.

graph LR
    Producer -->|publish| Topic[Topic with 3 Partitions]
    Topic -->|P0| CG1[Consumer Group A]
    Topic -->|P1| CG1
    Topic -->|P2| CG1
    Topic -->|P0| CG2[Consumer Group B]
    Topic -->|P1| CG2
    Topic -->|P2| CG2

Group A might have one consumer processing all partitions. Group B might have three consumers, each owning one partition. Both groups receive all messages independently.
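The two groups can be modeled with a small assignment function. This is an illustrative sketch, assuming plain round-robin assignment; the consumer names and the assign_round_robin helper are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2]
group_a = assign_round_robin(partitions, ["a-1"])                # one consumer
group_b = assign_round_robin(partitions, ["b-1", "b-2", "b-3"])  # three consumers
```

Each group covers every partition on its own, so both groups see the full stream while dividing the work differently.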

Rebalancing

When a consumer joins or leaves a group, Kafka triggers a rebalance. The group coordinator redistributes partition assignments across the current members. Under the eager protocol, no consumer in the group processes messages while this is happening: every consumer stops fetching, hands back its partitions, and waits for its new assignment.

How long a rebalance takes depends on partition and consumer count, and which protocol the group uses:

Protocol	What happens	Impact on throughput
Eager rebalance (default)	All consumers stop, revoke every partition, then get reassigned from scratch	Full stop-the-world pause. Can last seconds for groups with hundreds of partitions
Cooperative rebalance	Partitions move incrementally; consumers keep working on partitions they’re not surrendering	Only the transferred partitions pause. Disruption measured in milliseconds

The cooperative protocol requires the CooperativeStickyAssignor, covered below. If you haven’t set partition.assignment.strategy explicitly, you’re using the eager protocol.
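A toy model of the two protocols in the table above, assuming the old and new assignments are already known. The function name and the consumer ids are hypothetical; the sketch only counts which partitions pause, not how long the pause lasts.

```python
def paused_partitions(old, new, protocol):
    """Return the partitions that stop being consumed during a rebalance.

    old and new map consumer id -> list of owned partitions.
    """
    all_partitions = {p for parts in old.values() for p in parts}
    if protocol == "eager":
        # Everything is revoked before anything is reassigned.
        return all_partitions
    if protocol == "cooperative":
        owner_before = {p: c for c, parts in old.items() for p in parts}
        owner_after = {p: c for c, parts in new.items() for p in parts}
        # Only partitions that change owner are interrupted.
        return {p for p in all_partitions if owner_before[p] != owner_after.get(p)}
    raise ValueError(f"unknown protocol: {protocol}")

old = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
new = {"c1": [0, 1], "c2": [3, 4], "c3": [2, 5]}  # c3 joins the group
eager = paused_partitions(old, new, "eager")
cooperative = paused_partitions(old, new, "cooperative")
```

When c3 joins, the eager model pauses all six partitions while the cooperative model pauses only the two that move.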

What causes unwanted rebalances

Most unwanted rebalances start with a simple mismatch: the consumer takes longer to process a batch than the configured timeout allows.

config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000); // 5 minutes, default
config.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 45000);    // 45 seconds, default since Kafka 3.0

If consumer.poll() isn’t called within max.poll.interval.ms because the previous batch took too long, the consumer is considered failed: it leaves the group and its partitions are reassigned in a rebalance. The typical fix is to lower max.poll.records so each batch fits inside your processing window, or raise max.poll.interval.ms to match reality.
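The batch-size arithmetic can be sketched as follows. The 50% headroom factor and the function name are assumptions chosen for illustration, not Kafka settings; measure your real per-record processing time before relying on the result.

```python
import math

def safe_max_poll_records(per_record_ms, max_poll_interval_ms=300_000, headroom=0.5):
    """Largest batch that fits in a fraction of max.poll.interval.ms."""
    if per_record_ms <= 0:
        raise ValueError("per_record_ms must be positive")
    budget_ms = max_poll_interval_ms * headroom
    return max(1, math.floor(budget_ms / per_record_ms))

fast = safe_max_poll_records(per_record_ms=200)    # 200 ms of work per record
slow = safe_max_poll_records(per_record_ms=2000)   # 2 s of work per record
```

At 2 seconds per record the sketch suggests 75 records per poll; a batch of 500 such records would need about 1,000 seconds, far past the 5 minute interval.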

Other common triggers:

Garbage collection pauses: a stop-the-world GC that lasts longer than session.timeout.ms looks like a crashed consumer to the broker
Network blips: missed heartbeats during transient network issues cause false-positive failures
Slow consumer.commitSync(): if offset commits block, or processing waits on a downstream system, the poll interval timer keeps ticking

Static group membership

Introduced in Kafka 2.3, static group membership lets a consumer survive restarts without triggering a rebalance. A consumer configured with a group.instance.id holds its partition assignment for session.timeout.ms while it’s offline, then reconnects and resumes from where it left off:

config.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "consumer-1");

This avoids rebalances during rolling restarts, as long as each consumer comes back within session.timeout.ms. That is handy when your containers restart during a deploy.

Spotting problematic rebalances

A healthy group rebalances only when consumers are intentionally added or removed. If you see rebalances at odd hours or during low traffic, look at broker and consumer logs for:

Consumer crashes or OOM kills
Member [...] has left group without a clean leave request — suggests an unplanned disconnect
Rebalancing events with no corresponding consumer join — a consumer timed out
Repeated rebalance cycles without stable intervals — the group can never settle, often called a “rebalance storm”

Graph rebalance events alongside consumer heap usage and network latency — the cause usually jumps out faster than reading configs.

Offset management

Kafka tracks consumption progress via offsets. Consumers commit offsets to mark what they have processed:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }
    consumer.commitSync();
}

If a consumer crashes after processing but before committing, it receives the same messages on restart. This is at-least-once delivery. With idempotent processing, duplicates are harmless.

Exactly-Once Delivery

Overview

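Exactly-once semantics guarantee that the effect of each message is applied exactly one time across a read-process-write pipeline whose inputs and outputs are Kafka topics. No duplicates. No gaps. This holds even when producers retry after network errors, brokers fail over, or consumers crash mid-processing. Writes to external systems such as a database sit outside that guarantee unless they are idempotent or the offsets are stored in the same database transaction.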

Kafka delivers exactly-once through three coordinated mechanisms:

Transactional producer: a producer configured with a transactional.id groups multiple sends into an atomic unit. If the producer crashes mid-transaction and restarts, the transaction coordinator detects the in-flight transaction and aborts it, preventing partial writes from surviving.
Transaction coordinator: a broker-level component that manages the two-phase commit protocol. It writes prepare and commit markers to an internal __transaction_state topic and tracks every transaction’s lifecycle.
Read-committed isolation: consumers set isolation.level=read_committed to only see messages from committed transactions. Records from aborted or still-open transactions are never handed to the application.

The atomic commit flow works like this:

1. Producer calls beginTransaction() to start a transaction; the coordinator registers each partition as the producer first writes to it
2. Producer sends records to one or more partitions
3. Producer calls commitTransaction(), and the coordinator writes a prepare-commit entry to the transaction log
4. Coordinator writes commit markers to each involved partition, at which point the records become visible to read_committed consumers
5. If the producer crashes before step 3, the coordinator aborts the transaction after transaction.timeout.ms, or sooner if a new producer instance with the same transactional.id calls initTransactions(); consumers never see partial data
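The visibility rules in this flow can be modeled with a toy in-memory log. It is a sketch, not Kafka's implementation: the TransactionalLog class and its method names are hypothetical, and it tracks transaction state in a dictionary instead of writing markers to partitions.

```python
class TransactionalLog:
    """Toy single-partition log with transaction-aware reads."""

    def __init__(self):
        self.records = []  # (txn_id, value) in append order
        self.state = {}    # txn_id -> "ongoing" | "committed" | "aborted"

    def begin(self, txn_id):
        self.state[txn_id] = "ongoing"

    def send(self, txn_id, value):
        if self.state.get(txn_id) != "ongoing":
            raise RuntimeError("no open transaction")
        self.records.append((txn_id, value))

    def commit(self, txn_id):
        self.state[txn_id] = "committed"

    def abort(self, txn_id):
        self.state[txn_id] = "aborted"

    def read_uncommitted(self):
        return [value for _, value in self.records]

    def read_committed(self):
        visible = []
        for txn_id, value in self.records:
            if self.state[txn_id] == "ongoing":
                break  # nothing past the first open transaction is served
            if self.state[txn_id] == "committed":
                visible.append(value)  # aborted records are skipped
        return visible

log = TransactionalLog()
log.begin("t1"); log.send("t1", "a"); log.send("t1", "b"); log.commit("t1")
log.begin("t2"); log.send("t2", "c"); log.abort("t2")
log.begin("t3"); log.send("t3", "d")  # still open
before_commit = log.read_committed()
log.commit("t3")
after_commit = log.read_committed()
```

A read_committed reader sees only "a" and "b" while t3 is open, and "d" appears once t3 commits; the aborted "c" never shows up.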

This atomicity closes the window between “wrote output” and “committed offset” that creates duplicates in at-least-once setups, provided the output is written to Kafka inside the same transaction. Without transactions, a consumer crash in that narrow gap causes duplicate output when the consumer replays the same messages on restart.

The problem

Exactly-once is hard because processing spans multiple systems, each tracking success differently:

Kafka -> Consumer -> Database

The gap sits between “wrote to the database” and “committed the offset.” If the consumer crashes after the database write succeeds but before the offset commit arrives, Kafka has no record that the message was processed. On restart, the consumer replays the same offset and the database write runs again.

Whether this matters depends on what the consumer does. A read-only analytics query produces no visible side effect — the duplicate is harmless. But when the consumer updates an inventory count, charges a card, or deducts from a balance, the duplicate ripples into the real world. The system believes the operation ran once; the downstream service processed it twice.

Idempotent processing closes this gap from the consumer side. If the database write uses an upsert with the message offset as a deduplication key, the second write is silently ignored. Not every output system supports idempotent writes though. Some legacy databases enforce unique constraints that reject duplicates outright. Some operations are inherently non-idempotent — sending an email, triggering a payment, incrementing a counter.
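A minimal sketch of the deduplication-key idea, using SQLite from Python's standard library as a stand-in for the output database. The table layout, column names, and the apply_once helper are assumptions for illustration; the key is the topic, partition, and offset of the source message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE payments (
           topic TEXT, part INTEGER, offset_ INTEGER,
           order_id TEXT, amount_cents INTEGER,
           PRIMARY KEY (topic, part, offset_))"""
)

def apply_once(conn, topic, part, offset, order_id, amount_cents):
    """Insert the row unless this (topic, partition, offset) was already applied."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO payments VALUES (?, ?, ?, ?, ?)",
            (topic, part, offset, order_id, amount_cents),
        )
    return cur.rowcount == 1

first = apply_once(conn, "order-events", 0, 42, "order-1", 1999)
replay = apply_once(conn, "order-events", 0, 42, "order-1", 1999)  # after a crash
total = conn.execute("SELECT SUM(amount_cents) FROM payments").fetchone()[0]
```

The replayed message is ignored, so the stored total reflects a single charge even though the consumer processed the record twice.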

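Transactions close the gap from the Kafka side instead, bundling the offset commit and the output write into one atomic unit when the output goes to a Kafka topic. When that atomicity holds, the duplicate window disappears and the processing step does not need to be idempotent. A database or other external sink fed from that output topic still needs idempotent writes, or must store its offsets together with the data.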

Kafka transactions

Kafka transactions solve this by atomically committing offsets and output:

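producer.initTransactions();  // once, at startup
producer.beginTransaction();
try {
    producer.send(producerRecord);
    // offsets: Map<TopicPartition, OffsetAndMetadata>, last processed offset + 1 per partition
    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();  // fatal errors such as ProducerFencedException require close() instead
}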

If the transaction commits, the output records and the consumed offsets become visible together. If it aborts, neither does, and the consumer reads the input messages again after restarting or seeking back to the last committed offset.

When you need exactly-once

Most use cases do not need exactly-once. At-least-once with idempotent processing is simpler and performs better. Use exactly-once only when duplicate processing has real consequences that you cannot design around.

Financial transactions come first. A payment pipeline that charges a card twice because a consumer restarted mid-transaction is a direct financial loss. The same applies to stock trades, account balance transfers, and billing operations where the operation itself is the record of truth — undoing a duplicate charge through compensating transactions is messy and often disputed.

Inventory management is the second case. When a consumer decrements stock on a purchase message, a duplicate means the inventory count drops twice for a single order. If your fulfillment system ships based on that count, you oversell. This gets worse when inventory counts feed into reorder triggers — a depressed count can trigger a large spurious reorder that your procurement team has to cancel.

Unique constraint systems need their own mention. Some databases enforce uniqueness on user ID, order ID, or transaction ID. If your consumer inserts into such a table without an upsert pattern, a duplicate message throws a constraint violation and the consumer crashes, potentially landing the message in the DLQ. Kafka transactions do not cover that insert, so if you cannot switch to idempotent upserts, catch the violation and treat it as already processed.

Non-idempotent downstream calls are the last case. Sending an email, pushing a notification, firing a webhook, triggering an external API: these have real-world side effects that Kafka cannot undo. If your consumer fires a webhook on each message and restarts after the webhook fired but before the offset committed, the webhook fires again on restart. Exactly-once processing inside Kafka narrows the problem to the final call, but the call itself still needs an idempotency key or deduplication on the receiving side.

For most event pipelines — analytics ingestion, search index updates, cache warming, audit logging — at-least-once with idempotency is the right call. It is simpler, has lower latency, and your idempotent consumers handle duplicates gracefully. Exactly-once is only worth the cost when the downstream risk is real and nontrivial to design around.

End-to-end exactly-once flow

The exactly-once flow in practice, for a consume-transform-produce application whose output is another Kafka topic:

sequenceDiagram
    participant Producer
    participant Kafka as Kafka Cluster
    participant Consumer as Consumer App
    participant Coordinator as Transaction Coordinator
    participant DB as Output DB

    Producer->>Kafka: send() with transactional ID
    Kafka->>Producer: acknowledged

    Consumer->>Kafka: poll() receives record (read_committed)
    Consumer->>Kafka: send() result to output topic inside a transaction
    Consumer->>Coordinator: sendOffsetsToTransaction()
    Consumer->>Coordinator: commitTransaction()
    Coordinator->>Kafka: commit output records + offsets atomically
    Coordinator->>Consumer: commit confirmed
    Kafka->>DB: sink applies committed records with idempotent writes

    Note over Consumer,Coordinator: If crash before commit: transaction aborts, replay from last committed offset

The transaction coordinator bundles the output-topic write and the offset commit together. They either both commit or both abort. When the application restarts, it resumes from the last committed offset, and any output from the aborted attempt stays invisible to read_committed consumers. The database is outside the transaction, so the sink that feeds it relies on idempotent writes.

Without transactions, there is a window between “wrote output” and “committed offset.” Crash in that window and you get duplicates. Transactions close that window for Kafka-to-Kafka processing.

Broker Replication and Fault Tolerance

Replication and ISR

Kafka replicates partitions across brokers for fault tolerance. Each partition has a leader and multiple followers. Followers replicate messages from the leader, staying in sync.

graph LR
    subgraph Broker-1
        L1[Partition 0 Leader]
    end
    subgraph Broker-2
        F1[Partition 0 Follower - ISR]
    end
    subgraph Broker-3
        F2[Partition 0 Follower - ISR]
    end
    Producer -->|writes| L1
    L1 -->|replicate| F1
    L1 -->|replicate| F2
    L1 -->|consume| Consumer

In-Sync Replicas (ISR) are the replicas, leader included, that have kept up with the leader within replica.lag.time.max.ms. By default only ISR members can become leaders after a failure. The replication.factor setting controls how many replicas exist, and min.insync.replicas defines the minimum ISR size required for a write with acks=all to succeed.

For example, with replication.factor=3 and min.insync.replicas=2, a partition has 3 replicas. With acks=all, a write is acknowledged once every replica currently in the ISR has it, and it is rejected if fewer than 2 replicas (the leader plus 1 follower) are in sync.
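The interaction between acks and min.insync.replicas can be expressed as a small decision function. This is a simplified model, assuming the leader is alive and counted in the ISR; the function names are hypothetical.

```python
def write_accepted(isr_size, acks, min_insync_replicas):
    """Would the leader accept a produce request with this ISR size?"""
    if acks in ("0", "1"):
        return isr_size >= 1  # only the leader itself has to be available
    if acks == "all":
        return isr_size >= min_insync_replicas
    raise ValueError(f"unsupported acks value: {acks}")

def tolerated_failures(replication_factor, min_insync_replicas):
    """Brokers that can be lost while acks=all writes keep succeeding."""
    return replication_factor - min_insync_replicas

healthy = write_accepted(isr_size=3, acks="all", min_insync_replicas=2)
one_down = write_accepted(isr_size=2, acks="all", min_insync_replicas=2)
two_down = write_accepted(isr_size=1, acks="all", min_insync_replicas=2)
```

With replication.factor=3 and min.insync.replicas=2, writes survive one broker failure and are rejected after a second, which is the usual reason for choosing that pairing.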

Key configuration defaults

Parameter	Default	Description	Production recommendation
`replication.factor`	1	Number of replicas per partition	3 for critical topics
`min.insync.replicas`	1	Minimum ISR for acknowledge	2 (requires `acks=all`)
`retention.ms`	7 days	Message retention period	Based on replay requirements
`acks`	all (1 before Kafka 3.0)	Acknowledgment required	`all` for critical data
`compression.type`	producer (topic level)	Compression codec	`lz4` or `zstd`
`max.in.flight.requests.per.connection`	5	Unacknowledged requests	Up to 5 with idempotence enabled; 1 only if idempotence is off and ordering matters

Dead Letter Queues

When processing fails and you cannot retry, messages need somewhere to go. Dead letter queues (DLQs) catch messages that consumer processing repeatedly fails on.

@Bean
public DefaultErrorHandler errorHandler(
        KafkaTemplate<String, String> template,
        ConcurrentKafkaListenerContainerFactory<String, String> factory) {

    // After retries are exhausted, publish the failed record to <topic>.DLQ
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
        template,
        (record, exception) -> new TopicPartition(
            record.topic() + ".DLQ",  // convention: topic.DLQ
            record.partition()
        )
    );

    // 3 retries, 1 second apart, then hand the record to the recoverer
    DefaultErrorHandler errorHandler = new DefaultErrorHandler(
        recoverer, new FixedBackOff(1000L, 3));
    factory.setCommonErrorHandler(errorHandler);
    return errorHandler;
}

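This sends failed messages to order-events.DLQ after 3 retry attempts. The recoverer keeps the original key and value and records the original topic, partition, offset, and exception in headers, so you can investigate without losing context. Because the resolver reuses the source partition number, the DLQ topic needs at least as many partitions as the source topic.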
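The retry-then-park behavior can be sketched without Spring. This is an illustrative model only: consume_with_dlq and the sample handler are hypothetical, the backoff delay is omitted, and the dead letters are kept in a list rather than published to a topic.

```python
def consume_with_dlq(records, handler, max_retries=3):
    """Process records; after max_retries failed retries, park the record."""
    processed, dead_letters = [], []
    for record in records:
        attempts = 0
        while True:
            try:
                handler(record)
                processed.append(record)
                break
            except Exception as exc:
                attempts += 1
                if attempts > max_retries:  # initial attempt + retries exhausted
                    dead_letters.append(
                        {"record": record, "error": repr(exc), "attempts": attempts}
                    )
                    break
    return processed, dead_letters

def handler(record):
    if record == "bad-json":
        raise ValueError("cannot parse")

processed, dead = consume_with_dlq(["a", "bad-json", "b"], handler)
```

The poison record is attempted four times (one delivery plus three retries) and then set aside, so the records behind it are not blocked.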

DLQ design considerations

Monitoring: Alert when DLQ depth exceeds zero — messages piling up means something is wrong upstream
Retention: DLQ retention is often shorter than main topic; set a TTL or explicit cleanup job
Reprocessing: DLQ messages can be reprocessed by a debugging consumer or manually re-published to the original topic after fixing the issue
Causality tracking: Include the original exception stack trace or error code in the message headers (or the value) for debugging

Backpressure Handling

Kafka producers can send messages faster than consumers can process them. Backpressure management prevents unbounded lag growth.

Consumer-side backpressure

max.poll.records limits how many messages a consumer fetches per poll:

config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);  // process 100, then poll again
config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);  // 5 minute max poll interval

fetch.min.bytes and fetch.max.wait.ms control how much data the consumer waits for:

config.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024 * 1024);  // wait for at least 1MB
config.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);  // or 500ms, whichever comes first

Producer-side backpressure

Producer buffer memory (buffer.memory) and batch.size interact to create natural backpressure. If the broker is slow and the send buffer fills, send() blocks for up to max.block.ms and then throws a TimeoutException:

config.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);  // 32 MB buffer
config.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);  // 16 KB batches
config.put(ProducerConfig.LINGER_MS_CONFIG, 5);  // wait up to 5ms to batch
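The blocking behavior of a full send buffer can be imitated with a bounded queue. This is a rough analogy, not the producer's implementation: the queue counts records instead of bytes, the timeout plays the role of max.block.ms, and the send helper is hypothetical.

```python
import queue

buffer = queue.Queue(maxsize=3)  # stands in for buffer.memory

def send(record, max_block_s=0.01):
    """Enqueue the record, or give up once the buffer stays full too long."""
    try:
        buffer.put(record, timeout=max_block_s)
        return True
    except queue.Full:
        return False

# Nothing drains the buffer here, as if the broker had stalled.
accepted = [send(i) for i in range(5)]
```

The first three sends succeed and the rest fail after waiting, which is the signal the application should treat as backpressure.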

Backpressure signals to watch

Signal	Meaning	Action
Consumer lag growing	Producers outpacing consumers	Add consumers or optimize processing
Producer `send()` blocking	Broker throughput saturated	Add brokers or reduce producer load
Under-replicated partitions > 0	Followers falling behind leader	Check disk I/O, network, or broker load
Request timeout exceptions	Brokers too slow to respond	Increase `request.timeout.ms` or scale
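Consumer lag, the first signal in the table, is the gap between each partition's end offset and the group's committed offset. The sketch below assumes the offsets have already been fetched into dictionaries; the function names and sample numbers are hypothetical.

```python
def total_lag(end_offsets, committed_offsets):
    """Sum of (log end offset - committed offset) over all partitions."""
    return sum(
        end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets
    )

def seconds_to_drain(lag, produce_rate, consume_rate):
    """Time to catch up, or None if consumers never outpace producers."""
    if consume_rate <= produce_rate:
        return None
    return lag / (consume_rate - produce_rate)

lag = total_lag({0: 1000, 1: 1500, 2: 800}, {0: 900, 1: 1000, 2: 800})
catch_up = seconds_to_drain(lag, produce_rate=100, consume_rate=150)
stuck = seconds_to_drain(lag, produce_rate=100, consume_rate=100)
```

A None result means the lag can only grow, which is the point where adding consumers or speeding up processing becomes necessary.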

Delivery Guarantees

Kafka provides three delivery guarantee levels. Understanding when each applies matters for system design.

At-most-once delivery

Messages may be lost but are never duplicated. This happens when consumers commit offsets before processing:

consumer.commitSync();  // commit before processing
process(record);  // if crash here, message is lost

Use at-most-once when:

Duplicate messages cost more than missed messages (sensor data aggregation, metrics)
You need lowest possible latency and can tolerate gaps

At-least-once delivery (default)

Messages are never lost but may be duplicated. Consumer commits after processing:

process(record);  // do work first
consumer.commitSync();  // then commit offset

If the consumer crashes after processing but before committing, the same messages reprocess on restart. With idempotent operations (writes with unique keys), duplicates are safe.

Use at-least-once when:

Missing messages are worse than duplicates (inventory updates, payment processing)
Your consumers are idempotent

Exactly-once delivery

Each message processes exactly once, no duplicates, no loss. Requires Kafka transactions as shown earlier in this post.

Use exactly-once when:

Duplicate processing has serious consequences
Your output system cannot handle idempotent writes
The performance cost is acceptable

Guarantee	Message Loss	Duplicates	Latency	Best Use Case
At-most-once	Yes	No	Lowest	Telemetry, metrics, gaps acceptable
At-least-once	No	Yes	Medium	Most use cases; idempotent consumers
Exactly-once	No	No	Highest	Financial, inventory, idempotency-impossible
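The difference between the first two rows comes down to the order of commit and process. The simulation below is a sketch with a single crash injected between the two steps; the run function and its parameters are hypothetical.

```python
def run(records, commit_first, crash_at):
    """Simulate one crash between the two steps, then a restart.

    Returns the records that were actually processed, in order.
    """
    processed = []
    committed = 0  # offset the consumer resumes from after a restart
    crashed = False
    offset = 0
    while offset < len(records):
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                offset = committed  # restart skips the unprocessed record
                continue
            processed.append(records[offset])
        else:
            processed.append(records[offset])
            if offset == crash_at and not crashed:
                crashed = True
                offset = committed  # restart replays the processed record
                continue
            committed = offset + 1
        offset += 1
    return processed

at_most_once = run(["a", "b", "c"], commit_first=True, crash_at=1)
at_least_once = run(["a", "b", "c"], commit_first=False, crash_at=1)
```

Committing first loses "b"; processing first handles "b" twice, which is why at-least-once is normally paired with idempotent handlers.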

Consumer Group Coordination

Consumer group behavior depends on partition assignment strategy and coordination needs.

Sticky assignor

The sticky assignor minimizes partition movement during rebalances. It is not the default (the default strategy list starts with RangeAssignor), so you have to opt in. When a consumer leaves, the remaining consumers keep the partitions they already own and only the departed consumer's partitions are handed out.

config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    StickyAssignor.class.getName());

Benefit: fewer partitions change owners during a rebalance, so less local state and fewer caches need to be rebuilt.
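What minimal movement means in practice can be shown with a simplified reassignment. This is a sketch, not the StickyAssignor algorithm: it assumes orphaned partitions go to whichever remaining consumer currently owns the fewest, and the sticky_reassign helper is hypothetical.

```python
def sticky_reassign(assignment, departed):
    """Move only the departed consumer's partitions; others keep theirs."""
    remaining = {c: list(p) for c, p in assignment.items() if c != departed}
    if not remaining:
        raise ValueError("no consumers left in the group")
    for partition in assignment[departed]:
        # Assumption: hand each orphan to the least-loaded consumer.
        target = min(remaining, key=lambda c: (len(remaining[c]), c))
        remaining[target].append(partition)
    return remaining

before = {"c1": [0, 1], "c2": [2, 3], "c3": [4, 5]}
after = sticky_reassign(before, "c3")
```

Partitions 0 through 3 never change owner; only the two partitions that c3 held are moved.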

Cooperative sticky assignor

For minimal disruption during rebalances, use cooperative sticky assignor. It allows incremental rebalancing without stopping consumption entirely:

config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    CooperativeStickyAssignor.class.getName());

The consumer continues processing while partition ownership shifts incrementally.

Standalone consumers

Sometimes you need a single consumer not part of a group:

consumer.subscribe(List.of("topic"));  // group-based
// vs
consumer.assign(List.of(new TopicPartition("topic", 0)));  // standalone, manual assignment

Standalone consumers manually assign partitions and manage their own offsets. Useful for:

One-off administrative tasks (consuming from beginning to rebuild state)
Specialized processing pipelines that should not interfere with normal consumers
Debugging and testing

Maximum parallelism calculation

For a given topic, maximum consumer parallelism equals partition count:

Topic with 12 partitions
  → Up to 12 consumers in the group (each gets 1 partition)
  → Adding more consumers does nothing (no extra partitions)

If you need 20 consumers for parallel processing:
  → Topic must have at least 20 partitions

Partition count can be increased later but never decreased, and adding partitions changes which partition each key maps to. Plan accordingly.
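The parallelism limit is simple arithmetic, sketched below. The group_utilization helper is hypothetical and assumes partitions are spread as evenly as possible across the active consumers.

```python
def group_utilization(num_partitions, num_consumers):
    """How many consumers do work, how many idle, and the heaviest load."""
    active = min(num_partitions, num_consumers)
    idle = max(0, num_consumers - num_partitions)
    # Ceiling division: the busiest consumer's partition count.
    max_load = -(-num_partitions // active) if active else 0
    return {"active": active, "idle": idle, "max_partitions_per_consumer": max_load}

exact = group_utilization(12, 12)
oversized = group_utilization(12, 20)
undersized = group_utilization(12, 5)
```

With 12 partitions and 20 consumers, 8 consumers receive nothing; with 5 consumers, at least one of them handles 3 partitions.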

Kafka Streams example

Kafka Streams is a client library for building real-time stream processing applications. The word count example is the canonical demonstration. It counts occurrences of words across an infinite stream of text events:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Arrays;
import java.util.Properties;

public class WordCountApplication {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Source: read from input topic
        KStream<String, String> textLines = builder.stream("text-lines-topic");

        // Process: split into words, group, count
        textLines
            .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count(Materialized.as("word-counts-store"))
            .toStream()
            // Counts are Long values, so override the default String value serde
            .to("word-counts-output-topic", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        // Close the topology cleanly on shutdown so state is flushed
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}

How it works:

flatMapValues splits each input line into lowercase words
groupBy groups by word, discarding the message key
count maintains a running count per word in a state store
toStream.to writes results to the output topic

Scaling behavior: groupBy changes the key, so Kafka Streams writes the words to an internal repartition topic before counting. All occurrences of the word “kafka” end up in the same partition of that topic regardless of which input partition they came from, so each count is already global. Parallelism comes from different words being counted by different tasks.

Fault tolerance: Kafka Streams backs each state store with a changelog topic in Kafka. If a stream processor crashes, another instance restores the store from the changelog and resumes from the last committed offsets.

Topic-Specific Deep Dives

Advanced Partition Sizing

Partition count is the most consequential design decision for a Kafka topic. It determines consumer parallelism and producer throughput, as well as how evenly load spreads across your brokers. Since partitions can be added but never removed, and adding them changes the key-to-partition mapping for keyed topics, getting it right the first time saves a migration later.

The right number balances four factors:

Consumer parallelism: each partition serves at most one consumer in a group. Need 20 consumers running in parallel? Plan for at least 20 partitions.
Producer throughput: as a rule of thumb, budget about 10 MB/s of write traffic per partition and confirm it on your own hardware. Divide your target throughput by 10 to get a floor for partition count.
Broker capacity: as a rough guideline, Kafka clusters start showing strain past about 4,000 partitions per broker. Controller elections, metadata sync, and rebalance times all grow with partition count.
Key distribution: high-cardinality keys like order IDs or user IDs spread evenly across partitions. Low-cardinality keys concentrate writes on a few partitions, creating hot spots that limit overall throughput.

A safe starting range is 6-12 partitions for most topics. Monitor consumer lag after deployment: if lag grows and you have no consumers left to add, you need more partitions. If partitions sit idle, reduce consumers instead. For high-throughput topics, use the formula under Partition count sizing below, then round up to leave room for growth.

Partition count sizing

Factors that drive partition count:

Factor	Impact	Consideration
Desired consumer parallelism	More partitions means more concurrent consumers	One partition per consumer in a group
Producer throughput target	Each partition handles about 10 MB/s	Partitions should be >= target_MB_s / 10
Maximum broker scale	Kafka performance degrades past roughly 4000 partitions per broker	Consider broker count when sizing
Key cardinality	High-cardinality keys distribute evenly	Avoid partitions that become hot spots

Sizing formula:

import math

def calculate_partition_count(
    target_throughput_mbps: float,
    max_consumer_parallelism: int,
    num_brokers: int,
    replication_factor: int = 3
) -> dict:
    """
    Calculate recommended partition count for a topic.
    """
    # Throughput-based: each partition handles ~10 MB/s reliably
    partitions_for_throughput = math.ceil(target_throughput_mbps / 10)

    # Consumer-based: need at least as many partitions as consumers
    partitions_for_consumers = max_consumer_parallelism

    # Broker-based: avoid too many partitions per broker
    # Guideline: < 4000 partitions per broker for good performance
    max_partitions_per_broker = 4000
    partitions_for_broker_capacity = num_brokers * max_partitions_per_broker

    recommended = max(partitions_for_throughput, partitions_for_consumers)
    recommended = min(recommended, partitions_for_broker_capacity)

    # Account for replication: total partitions including replicas
    total_partitions_in_cluster = recommended * replication_factor

    return {
        'recommended_partitions': recommended,
        'based_on': 'throughput' if partitions_for_throughput >= partitions_for_consumers else 'consumer_parallelism',
        'throughput_achievable_mbps': recommended * 10,
        'max_consumer_parallelism': recommended,
        'total_partition_slots_used': total_partitions_in_cluster,
        'partitions_per_broker': recommended / num_brokers
    }

# Example: 100 MB/s target, 30 consumers, 6 brokers
result = calculate_partition_count(100, 30, 6)
# Returns: recommended=30 (based on consumer count),
#          throughput_achievable=300 MB/s,
#          partitions_per_broker=5

Practical guidelines:

Start conservative: 6-12 partitions for most topics
Increase when consumer lag appears and more consumers cannot be added
Monitor per-partition throughput to detect hot spots
When a keyed topic needs repartitioning, create a new topic and migrate data with a consumer that reads from the old topic and republishes to the new one

Producer and consumer implications:

More partitions means more producer connections and higher client-side memory
More partitions means longer leader election time after broker failure
Consumers with many partitions need more heap for offset tracking

Kafka use cases

Kafka excels at specific workloads:

Event streaming

Capture events from many sources and distribute to multiple consumers:

Clickstream -> Kafka -> Analytics
                     -> Personalization
                     -> Fraud Detection
                     -> Audit Log

Message bus replacement

Replace traditional message queues with Kafka for better throughput and replay capability.

Data integration

Connect different systems without point-to-point coupling:

Database CDC -> Kafka -> Search Index
                     -> Cache
                     -> Data Warehouse
                     -> ML Pipeline

Change Data Capture from databases fits this pattern well. Any system can consume the stream without touching the source database.

Kafka vs traditional queues

Aspect	Kafka	Traditional Queue
Retention	Days/weeks/forever	Until consumed
Replay	Yes	No (usually)
Ordering	Per partition	Per queue
Throughput	Very high	Moderate
Consumer groups	Independent per group	Shared or exclusive

Kafka’s retention and replay set it apart. You can reprocess historical data if your processing logic changes. Most traditional queues cannot do this because messages are removed once they are consumed.

For broader event-driven patterns, see our post on event-driven architecture. For pub/sub patterns that overlap with Kafka’s topic model, see pub/sub patterns.

Trade-off Analysis

When designing Kafka-based systems, understanding trade-offs helps you make informed decisions.

Throughput vs Durability

Choosing an acknowledgment strategy is the single biggest durability knob you have as a producer. The trade-off is direct: wait for fewer replicas and you write faster, but a broker crash before other replicas persist means lost messages. Wait for all replicas and you write slower, but the message survives a broker failure.

acks=1 (leader only) is the fastest path. The leader writes to its local log and responds immediately. If the leader crashes before followers have copied the message, the data is gone. This works for internally routed events where a missed message is acceptable collateral — think telemetry or metrics where a gap is recoverable.

acks=all with min.insync.replicas=2 sits in the middle. The leader waits for every replica currently in the ISR to acknowledge, and rejects the write if fewer than two replicas are in sync. If the leader fails, you have at least one other broker with a copy. The latency hit comes from the round-trip to the followers; for most workloads this is measured in single-digit milliseconds on a healthy LAN.

acks=all with min.insync.replicas=3 is the strictest configuration for critical data. With replication.factor=3, all three replicas must acknowledge before the producer considers the write successful, which also means the partition stops accepting writes whenever any one broker is down. The latency compounds when a broker is under load or the network hiccups, and write throughput drops because every record waits on the slowest replica.

Exactly-once mode stacks transaction overhead on top of acks=all. The producer and transaction coordinator exchange additional requests to bundle records atomically, and the coordinator writes commit markers to every partition involved. This is the highest-latency option. The throughput cost depends heavily on how often you commit, batch size, and network conditions: frequent small transactions can cost tens of percent, while large, infrequent commits cost far less, so benchmark with your own workload.

Approach	Throughput	Durability	Latency	Complexity
`acks=1` (leader only)	Highest	Low	Lowest	Lowest
`acks=all` + min.isr=2	Medium	High	Medium	Medium
`acks=all` + min.isr=3	Lower	Highest	Higher	Medium
Exactly-once mode	Lowest	Highest	Highest	Highest
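The tiers in the table map onto a handful of producer settings. The sketch below collects them in plain java.util.Properties so it runs without the Kafka client on the classpath; the class name, the tier names, the bootstrap address, and the transactional.id value are illustrative assumptions. Note that min.insync.replicas is a topic or broker setting, so it does not appear on the producer side.

```java
import java.util.Properties;

/** Hypothetical helper: maps the durability tiers from the table to producer settings. */
final class ProducerDurabilityProfiles {

    enum Tier { LEADER_ONLY, ALL_ISR, EXACTLY_ONCE }

    static Properties forTier(Tier tier) {
        Properties props = new Properties();
        // Assumed address; replace with your own cluster.
        props.setProperty("bootstrap.servers", "localhost:9092");
        switch (tier) {
            case LEADER_ONLY:
                props.setProperty("acks", "1");
                // Idempotence requires acks=all, so it is switched off explicitly here.
                props.setProperty("enable.idempotence", "false");
                break;
            case ALL_ISR:
                props.setProperty("acks", "all");
                props.setProperty("enable.idempotence", "true");
                break;
            case EXACTLY_ONCE:
                props.setProperty("acks", "all");
                props.setProperty("enable.idempotence", "true");
                // Hypothetical id; it must be stable and unique per producer instance.
                props.setProperty("transactional.id", "orders-writer-1");
                break;
        }
        return props;
    }

    public static void main(String[] args) {
        for (Tier tier : Tier.values()) {
            System.out.println(tier + " -> " + forTier(tier));
        }
    }
}
```

After adding key and value serializers, the resulting Properties can be passed to the KafkaProducer constructor.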

Partition Count vs Overhead

Every partition you create adds direct overhead to every broker that holds a replica. This is not theoretical — clusters that accumulate thousands of partitions without planning hit real operational problems that are difficult to unwind.

File descriptors are the first constraint. Kafka keeps a log file (.log), an offset index (.index), and a time index (.timeindex) for every log segment of every partition replica a broker hosts. On many Linux systems the default soft limit on open files is 1024 per process, which sounds generous until you do the math: a broker hosting 4,000 partition replicas needs at least 12,000 file handles with just one segment each, and several times that once partitions accumulate multiple segments, plus one descriptor per client connection. Raise the limit (ulimit -n, or LimitNOFILE under systemd) well above that estimate and monitor it in production, because a broker that hits the limit starts failing with “Too many open files” at the worst possible moment.
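The file-handle arithmetic can be sketched as a small helper. This is a rough estimate under stated assumptions: three files per log segment (.log, .index, .timeindex), every segment counted as open, and no allowance for network sockets. The class and method names are hypothetical.

```java
/** Rough estimate of file handles a broker needs for its log segments. */
final class OpenFileEstimate {

    // Each log segment has a .log, an .index and a .timeindex file.
    static final int FILES_PER_SEGMENT = 3;

    /**
     * @param partitionReplicasOnBroker partition replicas hosted by this broker
     * @param avgSegmentsPerPartition   average number of segments per replica
     */
    static long estimate(long partitionReplicasOnBroker, long avgSegmentsPerPartition) {
        return partitionReplicasOnBroker * avgSegmentsPerPartition * FILES_PER_SEGMENT;
    }

    public static void main(String[] args) {
        // 4,000 replicas on one broker, one segment each.
        System.out.println(estimate(4_000, 1));
        // The same broker once each partition holds ten segments.
        System.out.println(estimate(4_000, 10));
    }
}
```

Compare the result with the limit reported by ulimit -n for the broker process and leave generous headroom for client connections.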

Memory overhead grows differently. Each broker maintains in-memory structures for leader election state, ISR tracking, and per-partition metadata. Past about 4,000 partitions per broker, this metadata alone can consume gigabytes of heap. That shrinks the page cache your consumers depend on for fast reads, which paradoxically slows down the very consumers you are trying to scale.

Leader election time is the most operationally visible symptom. When a broker fails, the controller elects a new leader for every partition that broker was leading. With 6-12 partitions per broker, this takes under 100 milliseconds and no one notices. With 500+ partitions, a single broker failure triggers hundreds of elections, and on a ZooKeeper-based cluster the total can stretch into seconds. On a stressed cluster the controller itself can time out during the storm, causing cascading failures. KRaft-based clusters handle this much faster, but the cost still grows with partition count.

A commonly cited upper bound is around 4,000 partitions per broker. Treat it as a ceiling, not a target: elections and controlled shutdowns are already slow well before that point. If you need more throughput than 12,000 partitions (4,000 on each of three brokers) can deliver at roughly 10 MB/s each, add brokers rather than packing more partitions onto the existing ones. Partition count can be increased later but never decreased, and increasing it changes which partition each key maps to, which breaks per-key ordering for existing keys. Over-partitioning early to “leave room for growth” is therefore a trap: you pay the overhead forever and cannot reduce it without recreating the topic.

Partitions	Max Consumers	Memory Overhead	Leader Election Time
6-12	6-12	Low	Fast (< 100ms)
50-100	50-100	Medium	Medium (1-2s)
500+	500+	High	Slow (10+s)
4000+/broker	Depends	Very high	Very slow

Retention vs Storage Cost

Retention settings drive storage costs in a way that is not always obvious until you do the math on a high-throughput topic. A topic ingesting 1 GB per minute accumulates about 1.44 TB per day, so 7 days of retention is roughly 10 TB per replica. Bump retention to 30 days and you are storing about 43 TB per replica. At Kafka’s typical replication factor of 3, that becomes roughly 130 TB of raw storage before you account for index files, free-space headroom, and the fact that disk vendors sell capacity in powers of 1,000 not 1,024. Producer-side compression reduces all of these figures by whatever ratio your payloads achieve.

Storage scales essentially linearly with retention time. A topic with 1-hour retention and 1 GB/minute throughput uses 60 GB per replica; scale that to 7 days (168 hours) and you get roughly 10 TB per replica. The offset and time index files stored alongside each log segment add only a small percentage on top. The one non-linear effect is at the short end: Kafka deletes whole segments and never the active one, so a topic with very short retention can hold noticeably more than its nominal window until a segment rolls.

Storage cost is only half the picture. Longer retention also changes broker restart behavior. On startup a broker has to open every log segment it hosts, and after an unclean shutdown it must also re-scan the segments that were not flushed in order to rebuild their indexes. With 7 days of retention across hundreds of segments, this can take minutes. With 30 days or more, it takes longer still, and if you rely on rolling restarts for zero-downtime deploys, longer restart times extend your window of reduced availability.

For most applications, 7 days is the sweet spot. It covers the common failure mode where a consumer is down for a few hours and needs to catch up on arrival. It supports the replay scenario where a bug in processing logic is fixed and historical data reprocessed. It does not impose the storage costs of longer retentions, and it keeps broker restart times manageable.

Regulatory requirements sometimes mandate 30 days or more. Financial services, healthcare, and other industries with compliance retention rules need the longer window. In these cases, budget for the storage explicitly and consider tiered storage (available as early access from Kafka 3.6) to move older segments to cheaper object storage while keeping them readable by consumers.

Indefinite retention is really only appropriate for event sourcing use cases where the log is the source of truth and you need the full history for rebuilds. If your event sourcing application is the only consumer, consider whether a compacted topic (which retains only the latest value per key) meets your needs at a fraction of the storage cost.

Retention	Storage per Replica (relative to 1 hour)	Replay Window	Best For
1 hour	1x baseline	Limited	Real-time only
7 days	~168x baseline	One week	Most applications
30 days	~720x baseline	One month	Regulatory compliance
Indefinite	Grows without bound (unless compacted)	Full history	Event sourcing, audit logs
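Retention sizing is plain multiplication, and it is worth running with your own numbers before committing to a retention window. The sketch below assumes a steady ingest rate, uncompressed data, decimal units (1 TB = 1,000 GB), and ignores index files; the class and method names are hypothetical.

```java
/** Back-of-the-envelope storage estimate for a topic with time-based retention. */
final class RetentionStorage {

    /**
     * @param ingestGbPerMinute sustained ingest rate in GB per minute
     * @param retentionDays     retention window in days
     * @param replicationFactor number of replicas of each partition
     * @return raw storage in decimal terabytes
     */
    static double terabytes(double ingestGbPerMinute, double retentionDays, int replicationFactor) {
        double minutesRetained = retentionDays * 24 * 60;
        double gigabytes = ingestGbPerMinute * minutesRetained * replicationFactor;
        return gigabytes / 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("7 days, one replica:    %.2f TB%n", terabytes(1, 7, 1));
        System.out.printf("30 days, one replica:   %.2f TB%n", terabytes(1, 30, 1));
        System.out.printf("30 days, three replicas: %.2f TB%n", terabytes(1, 30, 3));
    }
}
```

If producers compress their batches, divide the result by the compression ratio you actually measure on your payloads.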

Consumer Scaling Constraints

Consumer lag grows when producers write faster than consumers can process. Before adding resources, diagnose which constraint is actually binding. Applying the wrong fix wastes effort and can make things worse.

The first check is partition count versus consumer count. If your topic has 12 partitions and you have 4 consumers, each consumer owns 3 partitions and you have room for 8 more consumers. Lag growing in this situation means you have headroom: add consumers up to the partition count and see whether lag stabilizes. This is the cheapest fix because Kafka redistributes partitions automatically and no topic changes are needed.

If you have as many consumers as partitions and lag is still growing, you have hit the parallelism ceiling. Adding consumers no longer helps because Kafka assigns each partition to at most one consumer in a group. Your options are to optimize the processing logic (batch writes, async handlers, faster serialization) or to increase the partition count. You can add partitions to an existing topic with kafka-topics --alter, but keyed messages will then hash to different partitions, so per-key ordering breaks across the change, and the count can never be reduced again. If ordering matters, the alternative is migrating to a new topic, which is disruptive, so treat either as a last resort.

Sometimes lag grows even though consumers are already spread across every partition they can use. This points to a processing bottleneck inside the consumer itself: a slow database write, a rate-limited downstream API, or too much work per message. Profile the consumer code directly: instrument processing time per message batch, check database connection pool utilization, and look for anything that blocks or serializes where parallelism could help.

Frequent rebalances cause a distinct pattern. Throughput drops in cycles rather than lag growing gradually. Each rebalance pauses consumption while partitions are revoked and reassigned. If your consumers restart often (rolling deploys, OOM kills, GC pauses that exceed the session timeout), the group never settles into steady-state throughput. Switching to the cooperative sticky assignor limits each rebalance to the partitions actually being transferred, rather than stopping the world.

Scenario	Limiting Factor	Solution
Lag growing, spare partitions	Consumer count < partitions	Add consumers up to partition count
Lag growing, no spare partitions	Partition count limits parallelism	Optimize processing first; add partitions if key ordering allows, otherwise migrate to a new topic
Lag growing, many consumers but still behind	Processing bottleneck	Optimize logic; then scale out with more partitions and consumers
Rebalances causing throughput drops	Frequent consumer restarts	Use sticky/cooperative sticky assignor
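The table can be read as a decision procedure. The sketch below encodes it as a simplified triage function; the enum names and the rebalance threshold of six per hour are assumptions chosen for illustration, not values Kafka defines.

```java
/** Simplified triage for a lagging consumer group, following the table above. */
final class LagDiagnosis {

    enum Advice { HEALTHY, STABILIZE_GROUP, ADD_CONSUMERS, OPTIMIZE_OR_ADD_PARTITIONS }

    // Hypothetical threshold: more rebalances than this per hour counts as "frequent".
    static final int FREQUENT_REBALANCES_PER_HOUR = 6;

    static Advice diagnose(int partitions, int consumers, boolean lagGrowing, int rebalancesLastHour) {
        if (rebalancesLastHour > FREQUENT_REBALANCES_PER_HOUR) {
            // Fix group stability first; other measurements are unreliable while it churns.
            return Advice.STABILIZE_GROUP;
        }
        if (!lagGrowing) {
            return Advice.HEALTHY;
        }
        if (consumers < partitions) {
            // There is still parallelism available without touching the topic.
            return Advice.ADD_CONSUMERS;
        }
        // One consumer per partition already: extra consumers would sit idle.
        return Advice.OPTIMIZE_OR_ADD_PARTITIONS;
    }

    public static void main(String[] args) {
        System.out.println(diagnose(12, 4, true, 0));
        System.out.println(diagnose(12, 12, true, 0));
        System.out.println(diagnose(12, 12, true, 20));
    }
}
```

A real diagnosis also needs per-message processing time, since a slow downstream call can cause lag even when consumers are below the partition count.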

Exactly-once vs At-least-once Decision Matrix

Factor	Use At-least-once	Use Exactly-once
Duplicate processing cost	Low (metrics, analytics)	High (financial, inventory)
Output system support	Any	Kafka topics, or a sink that supports idempotent or transactional writes
Throughput requirement	High	Moderate to high
Operational complexity	Lower	Higher
End-to-end latency tolerance	Low	Higher

Production Failure Scenarios

Understanding how Kafka fails in production helps you design more resilient systems.

Scenario 1: Broker network partition

When a broker loses network connectivity but doesn’t crash:

Broker-2 becomes unreachable
  → Partition 0 leader (on Broker-2) stops responding
  → Controller detects that Broker-2's session has expired
  → Kafka controller triggers leader election
  → ISR shrinks to exclude unreachable broker
  → min.insync.replicas check: if remaining < min.insync.replicas, writes fail
  → Producer receives NotEnoughReplicasException

Impact: Temporary unavailability of the partition. If unclean.leader.election.enable=true and no in-sync replica is available, an out-of-sync replica can become leader and messages can be lost.

Scenario 2: Zombie consumer problem

Consumer crashes but doesn’t leave group gracefully:

Consumer-1 processes messages and writes to database
Consumer-1 crashes AFTER database write but BEFORE offset commit
  → Consumer-1's partition stays assigned to the dead consumer
  → No consumer processes those messages (lag grows)
  → After session.timeout.ms expires, the group coordinator triggers a rebalance
  → New consumer picks up partition, replays same messages
  → Without idempotent processing: duplicate writes occur

The fix: idempotent consumers handle this gracefully. If you cannot make processing idempotent, use exactly-once semantics.
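A minimal sketch of an idempotent consumer, assuming every event carries a unique ID: the handler records which IDs it has applied and ignores redeliveries. The in-memory set and map stand in for a database table with a unique constraint, and all names are hypothetical. In a real system the ID insert and the business write must share one database transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for a database that deduplicates by event ID. */
final class IdempotentLedger {

    private final Set<String> appliedEventIds = new HashSet<>();
    private final Map<String, Long> balances = new HashMap<>();

    /** Returns true if the event was applied, false if it was a duplicate delivery. */
    boolean apply(String eventId, String account, long delta) {
        if (!appliedEventIds.add(eventId)) {
            // Already applied: this is a replay after a crash or rebalance.
            return false;
        }
        balances.merge(account, delta, Long::sum);
        return true;
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        IdempotentLedger ledger = new IdempotentLedger();
        ledger.apply("evt-1", "acct-A", 100);
        // The same event is redelivered after Consumer-1 crashed before committing its offset.
        ledger.apply("evt-1", "acct-A", 100);
        System.out.println(ledger.balance("acct-A"));
    }
}
```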

Scenario 3: Schema Registry mismatch

Producer and consumer evolve schemas independently:

Producer v1: { "userId": "string", "action": "string" }
Producer v2: { "userId": "string", "action": "string", "metadata": { "source": "string" } }
Consumer still expects v1 schema
  → Deserialization fails (for example, a strict deserializer rejects the unknown field)
  → Message goes to DLQ (if configured) or consumer crashes
  → Backlog of unprocessed messages accumulates

The fix: use Schema Registry with compatibility checking. Never evolve schemas in incompatible ways.
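To make "compatibility checking" concrete, here is a heavily simplified version of the Avro-style backward-compatibility rule: a new schema can read data written with the old one as long as every field it adds declares a default. Schemas are modeled as a map from field name to "has a default"; real checks also cover type changes, aliases, and nested records. The class and method names are hypothetical.

```java
import java.util.Map;

/** Toy backward-compatibility check: schema = field name -> "declares a default". */
final class BackwardCompatibility {

    /** True if a reader using newSchema can decode records written with oldSchema. */
    static boolean canReadOldData(Map<String, Boolean> oldSchema, Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean addedField = !oldSchema.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (addedField && !hasDefault) {
                // Old records have no value for this field and there is nothing to fall back to.
                return false;
            }
        }
        // Fields removed in newSchema are simply ignored by the reader.
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("userId", false, "action", false);
        Map<String, Boolean> v2Required = Map.of("userId", false, "action", false, "metadata", false);
        Map<String, Boolean> v2Optional = Map.of("userId", false, "action", false, "metadata", true);
        System.out.println(canReadOldData(v1, v2Required));
        System.out.println(canReadOldData(v1, v2Optional));
    }
}
```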

Scenario 4: Clock skew in clustered deployment

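Brokers on different machines have clock skew, and the topic uses broker-assigned timestamps (message.timestamp.type=LogAppendTime):

Broker-1 clock: 1000
Broker-2 clock: 990 (10 seconds behind)
Broker-3 clock: 1010 (10 seconds ahead)

Leader on Broker-3 appends records stamped 1010, 1011, ...
  → Leadership moves to Broker-2 after a failure
  → New records are stamped from 990: timestamps jump backward in the log
  → Timestamp-based lookups (offsetsForTimes) return misleading offsets
  → Time-based retention deletes segments earlier or later than intended
  → Windowed stream processing treats in-order records as late

Replication itself is driven by offsets, not wall clocks, so skew does not corrupt replicas or shrink the ISR. The damage is in everything that interprets timestamps.

The fix: use NTP synchronization across all brokers. Monitor clock drift with automated alerts.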

Scenario 5: Partition reassignment during peak load

Operations team rebalances partitions while producers are at full throughput:

Original: Broker-1 has partitions 0,1,2; Broker-2 has 3,4,5
Reassignment triggered:
  → New replicas start copying from leaders (network spike)
  → ISR temporarily expands (old + new replicas receiving)
  → Controller overwhelmed by metadata updates
  → Request latency P99 spikes to 10+ seconds
  → Producer buffers fill, send() blocks
  → Consumer lag grows as brokers struggle to keep up

The fix: schedule reassignments during low-traffic windows. Use Cruise Control for automated, throttled reassignment.

Failure	Impact	Mitigation
Broker goes down	Partition leader election; temporary unavailability	Configure replication factor of 3; set min.insync.replicas=2
Controller failure	Cluster-wide coordination pause	Run multiple brokers; use ZooKeeper/KRaft for controller election
Network partition	Followers fall out of ISR; potential data loss	Monitor ISR size; alert when replicas fall behind
Producer retry storm	Duplicate messages after transient failures	Enable idempotent producer; design idempotent consumers
Consumer rebalance storm	Throughput drops during rebalancing	Use sticky partition assignment; avoid frequent consumer restarts
Offset commit failure	Messages reprocessed or skipped	Use transactional producers with exactly-once semantics when needed
Partition imbalance	Some brokers overloaded while others idle	Monitor partition distribution; use Cruise Control for rebalancing
Data loss on leader change	Under-replicated partitions lose messages	Ensure min.insync.replicas >= 2; acks=all on producers

Common Pitfalls / Anti-Patterns

Pitfall 1: too many partitions

Each partition increases Kafka’s overhead (file handles, memory, leader elections). Creating thousands of partitions when you only need dozens causes unnecessary complexity. Start with fewer partitions and increase only when needed.

File descriptors are where it hits first. Kafka keeps a log file, an offset index, and a time index open for every segment of every partition replica. A broker hosting thousands of partitions will slam into the common Linux default soft limit of 1024 open files per process, and when that happens the broker starts failing with “Too many open files.” The fix is raising the limit at the OS level and, longer term, cutting back partition count. On a stressed broker at peak traffic, this is the last thing you need.

Memory overhead scales with partition count. Each broker maintains in-memory structures for leader election, ISR tracking, and per-partition metadata. Past about 4,000 partitions per broker, that metadata can take a meaningful share of heap, and a bigger heap leaves less RAM for the OS page cache your consumers rely on for fast reads. The JVM also spends more time in GC with a larger object graph to manage. The controller feels this most: when any broker goes down, the controller has to run leader elections for every partition that broker was leading, all at once.

Election time climbs with partition count too. With 500+ partitions led by a single broker, a failure triggers an election cascade that can take 10+ seconds on a ZooKeeper-based cluster, leaving those partitions unavailable for writes in the meantime. Clusters running tens of thousands of partitions have seen the controller itself time out during these storms, cascading failures across the cluster.

For a 3-broker cluster, a commonly cited ceiling is around 4,000 partitions per broker. Treat that as an upper bound rather than a target, since elections are already slow well below it. Past that, the controller starts lagging and becomes a bottleneck. If you need more throughput than 12,000 partitions can deliver, scale out by adding brokers rather than piling on more partitions.

Not planning for partition count

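Partition count can be raised after topic creation but never lowered. Raising it also changes which partition each key hashes to, so per-key ordering breaks across the change; if you rely on keyed ordering, needing more partitions later effectively means migrating to a new topic. Plan partition count based on expected throughput and consumer parallelism requirements.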

Pitfall 2: ignoring consumer lag

Consumer lag is the gap between the latest offset in a partition and the offset your consumer has committed. A small amount of lag is normal during traffic spikes, but lag that grows over time signals a real mismatch between how fast producers write and how fast consumers can process.

Lag hides in plain sight because Kafka keeps working while it grows. Producers keep writing, consumers keep reading older messages, and nobody notices until the backlog reaches a point where consumers can never fully catch up without downtime.

To stay ahead of it:

Check lag via kafka-consumer-groups — run kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --describe to see per-partition lag
Alert on persistent growth — lag that rises across three consecutive measurement intervals is worse than a single high spike
Add consumers first — if your topic has spare partitions, adding consumers is the quickest fix
Optimize processing if consumers are maxed out — profile consumer code, batch database writes, or move slow operations to async handlers
Increase partitions as a last resort: the count can never be reduced afterwards, and adding partitions remaps keys, so keyed topics usually need a migration to a new topic

Before scaling, check whether the lag is evenly distributed. If one partition’s lag is orders of magnitude higher than the rest, you have a hot key concentrating writes on a single partition. That is a data modeling problem, not a capacity problem.
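One way to automate that check is to compare each partition's lag with the median across the topic. The sketch below is a simple heuristic, not a Kafka feature; the factor of 100 used in the example and all names are assumptions.

```java
import java.util.Arrays;

/** Flags a partition whose lag is far above the median lag of its topic. */
final class HotPartitionCheck {

    /** Returns the index of the first partition with lag above factor x median, or -1. */
    static int findHotPartition(long[] lagPerPartition, double factor) {
        if (lagPerPartition.length == 0) {
            return -1;
        }
        long[] sorted = lagPerPartition.clone();
        Arrays.sort(sorted);
        // Upper median; clamp to 1 so an idle topic does not flag every partition.
        long median = Math.max(sorted[sorted.length / 2], 1);
        for (int partition = 0; partition < lagPerPartition.length; partition++) {
            if (lagPerPartition[partition] > factor * median) {
                return partition;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] lag = {120, 95, 110, 250_000, 130, 105};
        System.out.println(findHotPartition(lag, 100));
    }
}
```

The per-partition lag values would come from kafka-consumer-groups --describe or from your metrics system.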

Pitfall 3: not using compression

Text-based formats like JSON have a lot of redundancy. Without compression, Kafka producers send raw payloads across the network, consuming more bandwidth and filling broker disks faster than needed.

Compression codecs for Kafka trade off speed, ratio, and CPU cost:

ZSTD — best compression ratio (5-10x on JSON text), moderate CPU. Good for storage-sensitive topics or when producers have CPU to spare
LZ4 — very fast compression and decompression with a decent ratio (3-5x). A safe default for most workloads
Snappy — fast but lower ratio (2-3x). Historically common but ZSTD and LZ4 usually beat it
GZIP — high ratio, slow. Useful where storage cost dominates compute

Enable compression at the producer level with one config line:

config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

Consumers decompress transparently, and by default brokers store the batches exactly as the producer compressed them, so no changes are needed on the read side. The CPU cost of compression on producers is almost always worth the bandwidth and disk savings, especially for JSON-heavy payloads or high-throughput topics.
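To see how much redundancy a JSON batch carries, you can compress a synthetic one with the JDK alone. The sketch below uses GZIP from java.util.zip as a stand-in, because LZ4 and ZSTD are not part of the standard library; the record shape and class name are made up for illustration, and the ratio on your real payloads will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Compresses a synthetic batch of JSON events to show how redundant such payloads are. */
final class JsonCompressionDemo {

    /** Builds newline-delimited JSON records with repeated field names. */
    static byte[] sampleBatch(int records) {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < records; i++) {
            batch.append("{\"userId\":\"user-").append(i)
                 .append("\",\"action\":\"page_view\",\"source\":\"web\"}\n");
        }
        return batch.toString().getBytes(StandardCharsets.UTF_8);
    }

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleBatch(500);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```

Kafka compresses whole record batches, which is why larger batches (via batch.size and linger.ms) tend to compress better than single records.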

Pitfall 4: auto.offset.reset = earliest without understanding consequences

auto.offset.reset controls what happens when a consumer group has no committed offset — either because the group is brand new or its offset expired from the __consumer_offsets topic. The three options are:

earliest — start from the beginning of the log
latest — start from the newest message (skip all existing data)
none — fail immediately if no offset is found

Setting earliest as the default sounds safe (“I won’t miss anything”) but in production it can cause a stampede. A consumer group restarting from the beginning with 30 days of retention will try to process all 30 days of messages at once. If your topic processes millions of events per day and each message triggers a database write, your consumers will fall behind instantly, lag will spike, and downstream systems may buckle under the load.

The safer pattern:

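Default to latest for production consumer groups that only care about current data: new consumers start on current data and ignore historical messages. The catch is that if the group's committed offsets ever expire, latest silently skips whatever arrived in the meantime, so pair it with lag and offset-expiry monitoring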
Use earliest explicitly when you need to backfill or rebuild state, and scope the operation to a separate consumer group with its own offset tracking
Set none during development to catch offset issues early — the consumer fails fast instead of silently reprocessing old data
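The pattern above amounts to two consumer configurations that differ in group.id and auto.offset.reset. The sketch below builds both as plain java.util.Properties so it runs without the Kafka client library; the group names and bootstrap address are hypothetical.

```java
import java.util.Properties;
import java.util.Set;

/** Consumer settings for live traffic versus a one-off backfill, kept in separate groups. */
final class OffsetResetProfiles {

    private static final Set<String> VALID_POLICIES = Set.of("earliest", "latest", "none");

    static Properties consumerProps(String groupId, String offsetReset) {
        if (!VALID_POLICIES.contains(offsetReset)) {
            throw new IllegalArgumentException("Unsupported auto.offset.reset: " + offsetReset);
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        props.setProperty("group.id", groupId);
        props.setProperty("auto.offset.reset", offsetReset);
        // Commit manually after processing so offsets reflect completed work.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    /** Live service: starts at the newest data when it has no committed offset. */
    static Properties liveTraffic() {
        return consumerProps("orders-service", "latest");
    }

    /** Backfill: its own group, so replaying history never moves the live group's offsets. */
    static Properties backfill() {
        return consumerProps("orders-service-backfill", "earliest");
    }

    public static void main(String[] args) {
        System.out.println(liveTraffic());
        System.out.println(backfill());
    }
}
```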

Pitfall 5: sending sensitive data unencrypted

Kafka sends data in plaintext by default — no encryption in transit, no encryption at rest. If your messages contain PII, credentials, financial data, or anything with compliance requirements, the default configuration is a liability.

Production Kafka security has several layers:

Encryption in transit — enable SSL/TLS on brokers and configure clients with ssl.truststore.location and ssl.keystore.location. This prevents eavesdropping on the wire between producers, brokers, and consumers
Authentication — SASL/SCRAM or mTLS verifies that clients are who they claim to be before they can connect
Authorization — ACLs on topics restrict which clients can read or write. Without ACLs, any authenticated client can access any topic
Encryption at rest: Kafka has no built-in encryption for log segments, so use disk or volume encryption on the brokers, or encrypt sensitive fields in the producer before sending

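A minimal TLS setup on the producer. The truststore lets the client verify the broker; the keystore lines are only needed when the broker requires client certificates (mTLS), and the password should come from a secret store rather than a literal: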

config.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
config.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/path/to/truststore.jks");
config.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/path/to/keystore.jks");
config.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "password");

If compliance or data governance policies prevent sensitive data from touching Kafka at all, consider a sidecar or pre-processing service that scrubs sensitive fields before publishing.

Interview Questions

1. What is the fundamental difference between Kafka and a traditional message queue?

Expected answer points:

Kafka is a distributed streaming platform built around the concept of a durable log, not a queue
Messages in Kafka topics are retained for a configurable period (hours, days, indefinitely) rather than being deleted upon consumption
This retention enables replay capability: consumers can re-read historical data from any point in the log
Traditional queues typically delete messages after consumption and do not support replay
Kafka's topic model supports multiple independent consumer groups, each reading the same data independently

2. How does Kafka guarantee ordering within a partition?

Expected answer points:

Within a partition, messages have monotonically increasing offsets that define total order
Producers can specify a message key; Kafka hashes the key to determine partition assignment
All messages with the same key go to the same partition, guaranteeing order per key
Consumers read messages in offset order from their assigned partitions
Cross-partition ordering is not guaranteed; only per-partition ordering is maintained
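The key-to-partition mapping can be sketched in a few lines. This version uses String.hashCode() as a stand-in hash so it runs on the JDK alone; Kafka's default partitioner uses murmur2 over the serialized key bytes, so the actual partition numbers will differ. The class name is hypothetical.

```java
/** Illustrates hash-based partition selection for keyed messages. */
final class KeyPartitioning {

    // Stand-in hash: the real default partitioner applies murmur2 to the serialized key.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition for a given partition count.
        System.out.println(partitionFor("g", 6));
        System.out.println(partitionFor("g", 6));
        // Changing the partition count can move the key to a different partition.
        System.out.println(partitionFor("g", 12));
    }
}
```

Because the partition count is part of the formula, adding partitions to a topic can send an existing key to a new partition, which is why per-key ordering only holds while the count stays fixed.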

3. Explain the difference between at-least-once, at-most-once, and exactly-once delivery semantics in Kafka.

Expected answer points:

At-most-once: consumer commits offsets before processing; messages may be lost but never duplicated (lowest latency)
At-least-once (default): consumer processes messages then commits offsets; messages may be duplicated but never lost
Exactly-once: uses Kafka transactions to atomically commit offsets and output writes; no duplicates, no loss (highest latency)
The choice depends on use case: at-least-once with idempotent consumers handles most scenarios well
Exactly-once should only be used when duplicate processing has serious consequences and the cost is acceptable

4. What is a Consumer Group and how does it affect message consumption?

Expected answer points:

A consumer group is a set of consumers cooperating to consume messages from a topic
Each partition is assigned to exactly one consumer within a group, which gives parallelism across partitions while preserving order within each one
Different consumer groups each receive all messages independently
When a consumer joins or leaves, Kafka rebalances partition assignments across the group
Rebalancing temporarily pauses consumption; frequent rebalances hurt throughput

5. What is ISR (In-Sync Replicas) and why does it matter for durability?

Expected answer points:

The ISR is the set of replicas, leader included, that have kept up with the leader within replica.lag.time.max.ms
Only ISR members can become leader after a failure, unless unclean leader election is enabled
The min.insync.replicas setting is the minimum ISR size required to accept a write sent with acks=all
With replication.factor=3 and min.insync.replicas=2, the leader waits for every current ISR member to replicate the message and rejects the write if fewer than 2 replicas are in sync
If the ISR shrinks below min.insync.replicas, acks=all producers get NotEnoughReplicasException until replicas catch up
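The interaction between acks, ISR size, and min.insync.replicas can be modeled as a small decision function. This is a simplified model of the broker's behavior, not broker code; the enum and method names are hypothetical.

```java
/** Simplified model of whether a partition leader accepts a produce request. */
final class IsrWriteCheck {

    enum Outcome { ACCEPTED, NOT_ENOUGH_REPLICAS, NO_LEADER }

    /**
     * @param acks              producer acks setting: "0", "1" or "all"
     * @param isrSize           current in-sync replica count, leader included
     * @param minInsyncReplicas the topic's min.insync.replicas
     */
    static Outcome produce(String acks, int isrSize, int minInsyncReplicas) {
        if (isrSize < 1) {
            // No in-sync replica is left to act as leader.
            return Outcome.NO_LEADER;
        }
        if ("all".equals(acks) && isrSize < minInsyncReplicas) {
            // min.insync.replicas is only enforced for acks=all.
            return Outcome.NOT_ENOUGH_REPLICAS;
        }
        return Outcome.ACCEPTED;
    }

    public static void main(String[] args) {
        System.out.println(produce("all", 3, 2));
        System.out.println(produce("all", 1, 2));
        System.out.println(produce("1", 1, 2));
    }
}
```

The third case is the reason min.insync.replicas gives no protection unless producers also use acks=all.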

6. How do you handle messages that fail processing repeatedly in Kafka?

Expected answer points:

Dead Letter Queues (DLQs) catch messages that exhaust retries
In Spring Kafka, configure a DefaultErrorHandler with FixedBackOff for retry behavior (e.g., 3 retries, 1 second apart) and a DeadLetterPublishingRecoverer to publish exhausted records to the DLQ topic
DLQ preserves original topic, partition, and key for debugging
Monitor DLQ depth — messages piling up indicate upstream issues
DLQ messages can be reprocessed after fixing the underlying issue
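Independent of any framework, the retry-then-dead-letter flow looks roughly like the sketch below. An in-memory list stands in for the DLQ topic, backoff is omitted, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Retries a failing record a fixed number of times, then parks it in a dead letter list. */
final class RetryThenDeadLetter {

    // Stand-in for a DLQ topic.
    final List<String> deadLetters = new ArrayList<>();

    /** Tries the handler up to 1 + maxRetries times; returns false if the record was dead-lettered. */
    boolean handle(String record, Consumer<String> handler, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                handler.accept(record);
                return true;
            } catch (RuntimeException e) {
                // A real handler would wait here (for example 1 second) before the next attempt.
            }
        }
        deadLetters.add(record);
        return false;
    }

    public static void main(String[] args) {
        RetryThenDeadLetter pipeline = new RetryThenDeadLetter();
        Consumer<String> handler = rec -> {
            if (rec.startsWith("bad")) {
                throw new IllegalStateException("cannot process " + rec);
            }
        };
        pipeline.handle("good-1", handler, 3);
        pipeline.handle("bad-1", handler, 3);
        System.out.println(pipeline.deadLetters);
    }
}
```

A production version would also attach the original topic, partition, offset, and the exception message as headers on the dead-lettered record.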

7. What are the key configuration parameters for ensuring durability in Kafka?

Expected answer points:

replication.factor = 3 (or higher) for critical topics to ensure redundancy
min.insync.replicas = 2 (requires acks=all) so writes persist to multiple replicas
acks = all (wait for all ISR to acknowledge before confirming write)
retention.ms configured based on replay requirements (hours, days, or indefinite)
Enable producer idempotency to prevent duplicates during retries

8. How does the partition count affect Kafka performance and scalability?

Expected answer points:

Partition count determines maximum consumer parallelism (one consumer per partition per group)
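A common rule of thumb is around 10 MB/s per partition, but the real figure depends on message size, hardware, and consumer speed; measure before sizing
Kafka performance degrades past ~4000 partitions per broker
Partition count can be increased but never decreased, and increasing it remaps keys; plan ahead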
More partitions means higher overhead (file handles, memory, leader election time)

9. What is the difference between ZooKeeper and KRaft modes in Kafka?

Expected answer points:

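ZooKeeper: traditional metadata management (broker registration, controller election, partition leadership, ISR, topic configs, ACLs); consumer offsets live in Kafka's own __consumer_offsets topic, not in ZooKeeper
KRaft: Kafka's built-in Raft-based consensus protocol (production-ready since Kafka 3.3) removes the ZooKeeper dependency; Kafka 4.0 drops ZooKeeper mode entirely
KRaft scales better: ZooKeeper-based controllers struggle as partition counts reach the hundreds of thousands, because every partition's state lives in znodes that a new controller must reload
KRaft enables simpler cluster setup and faster controller failover
Migration from ZooKeeper to KRaft on a running cluster is supported and became production-ready in Kafka 3.6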

10. How would you design a Kafka-based system to handle backpressure?

Expected answer points:

Consumer-side: limit max.poll.records and configure fetch.min.bytes/fetch.max.wait.ms
Producer-side: buffer.memory bounds the send buffer; when it fills because the broker is slow, send() blocks for up to max.block.ms and then throws, which is natural backpressure
Monitor consumer lag — growing lag signals producers outpacing consumers
Watch for under-replicated partitions, producer send() blocking, and request timeout exceptions
Scale consumers horizontally (if partitions allow) or optimize processing logic
Scale brokers if broker throughput is the bottleneck
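The producer-side mechanism is a bounded buffer with a blocking timeout. The sketch below imitates it with an ArrayBlockingQueue: capacity plays the role of buffer.memory and the timeout plays the role of max.block.ms. It is an analogy with hypothetical names, not the producer's actual implementation, which throws an exception instead of returning false.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Bounded send buffer that pushes back on callers when the downstream is slow. */
final class BoundedSendBuffer {

    private final BlockingQueue<String> buffer;
    private final long maxBlockMs;

    BoundedSendBuffer(int capacity, long maxBlockMs) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.maxBlockMs = maxBlockMs;
    }

    /** Blocks up to maxBlockMs for free space; returns false if the buffer stayed full. */
    boolean send(String record) {
        try {
            return buffer.offer(record, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Simulates the sender thread delivering one buffered record to the broker. */
    String drainOne() {
        return buffer.poll();
    }

    public static void main(String[] args) {
        BoundedSendBuffer producer = new BoundedSendBuffer(2, 10);
        System.out.println(producer.send("a"));
        System.out.println(producer.send("b"));
        // The buffer is full and nothing is draining, so this call times out.
        System.out.println(producer.send("c"));
    }
}
```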

11. What is log compaction in Kafka and when would you use it?

Expected answer points:

Log compaction retains the latest message for each key within a partition, discarding older messages with the same key
Unlike time-based retention, which deletes messages after a period, compaction keeps the most recent value for each key until a tombstone (a record with a null value) is written for that key
Use cases: maintaining a lookup table or changelog where only the latest state matters (e.g., customer profile updates)
Enables Kafka as a key-value store or database for event sourcing where you need the current state, not full history
Compaction runs in the background and does not block normal writes

12. How does Kafka handle schema evolution with Schema Registry?

Expected answer points:

Schema Registry stores and validates message schemas (Avro, JSON Schema, or Protobuf) separately from Kafka
Producers and consumers register and retrieve schemas by subject name, enabling schema validation at publish time
Schema compatibility modes control evolution: BACKWARD, FORWARD, FULL, NONE
BACKWARD compatibility (the default) means consumers using the new schema can still read data written with the previous schema; FORWARD is the reverse
Without Schema Registry, incompatible schema changes cause deserialization errors or silent data corruption
Schema Registry also keeps messages small: each message carries a short schema ID instead of the full schema

13. What is Kafka Connect and what are the key components?

Expected answer points:

Kafka Connect is a framework for scalably and reliably streaming data between Kafka and external systems
Connectors are plugins that define how to interact with source systems (databases, S3, JDBC) or sink systems
Workers are the processes that execute connectors; they can run in standalone or distributed mode
Converters handle serialization/deserialization of record keys and values (e.g., Avro, JSON, Protobuf, String)
Transforms (Single Message Transforms, SMTs) are optional lightweight modifications to records (e.g., filtering, adding fields)
Offset storage: Connect manages offset tracking internally, typically in a Kafka topic

14. Explain the role of the Transaction Coordinator in Kafka exactly-once semantics.

Expected answer points:

The Transaction Coordinator is a Kafka broker component that manages the two-phase commit protocol for transactions
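Phase 1 (prepare): the producer calls commitTransaction(); the coordinator records the commit decision (PREPARE_COMMIT) in its internal transaction log, the __transaction_state topic
Phase 2 (commit): the coordinator writes a commit marker to every partition involved, then records the transaction as complete
If the producer crashes before committing, the coordinator aborts the transaction when transaction.timeout.ms expires or when a new producer instance with the same transactional.id initializes
Consumers with isolation.level=read_committed use the markers to skip records from aborted transactions and hold back records from open ones
The transactional.id lets the coordinator fence off zombie producer instances, preserving exactly-once semantics across producer restarts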

15. How does the consumer partition assignment strategy affect performance?

Expected answer points:

Range assignor: assigns partitions contiguously per topic; can cause imbalance if topic partition counts differ
RoundRobin assignor: distributes partitions evenly across consumers regardless of topic; better balance
Sticky assignor: minimizes partition movement during rebalances, reducing reprocessing overhead
Cooperative sticky assignor: allows incremental rebalancing without full stop-the-world pauses
Choice affects rebalance duration, message ordering during rebalance, and consumer CPU utilization
For stateful consumers, sticky assignment prevents costly state migration

16. What are the trade-offs between increasing partitions versus adding consumers?

Expected answer points:

Adding consumers only helps if there are spare partitions; max parallelism equals partition count
Increasing partitions increases parallelism but cannot be undone and remaps keys; it also costs more file handles, memory overhead, and longer leader election
More partitions also increases end-to-end latency (more replication, more producer batching decisions)
Rebalancing with many partitions takes longer, temporarily impacting throughput
Rule of thumb: assume roughly 10 MB/s per partition until you have measured your own workload, and start conservative (6-12 partitions)
If lag is growing, there are no spare partitions, and processing is already optimized, add partitions (or migrate to a new topic if per-key ordering must be preserved)

17. How do you monitor Kafka consumer lag and what are acceptable thresholds?

Expected answer points:

Consumer lag is the difference between the latest offset and the consumer's committed offset
Kafka exposes lag per partition via KafkaConsumer.metrics() or tools like kafka-consumer-groups
JMX metrics: records-lag and records-lag-max under kafka.consumer:type=consumer-fetch-manager-metrics
Monitoring tools: Confluent Control Center, Kafka Manager, Prometheus with JMX exporter, Datadog
Acceptable threshold depends on SLA: for 5-minute SLA, lag should stay under 5 minutes
Growing lag signals producers outpacing consumers; investigate processing bottlenecks or scale consumers
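An SLA is expressed in time while lag is measured in offsets, so the two need converting. The sketch below shows the arithmetic under the assumption of steady produce and consume rates; all names are hypothetical.

```java
/** Converts offset lag into time-based estimates, assuming steady rates. */
final class LagEstimate {

    /** Offset lag for one partition; never negative. */
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0, logEndOffset - committedOffset);
    }

    /** Roughly how many seconds of produced data the consumer is behind. */
    static double secondsBehind(long lag, double produceRatePerSec) {
        return lag / produceRatePerSec;
    }

    /** Seconds until the consumer catches up, or -1 if it is not faster than the producer. */
    static double secondsToCatchUp(long lag, double consumeRatePerSec, double produceRatePerSec) {
        double netRate = consumeRatePerSec - produceRatePerSec;
        return netRate <= 0 ? -1 : lag / netRate;
    }

    public static void main(String[] args) {
        long lag = lag(1_000_000, 400_000);
        System.out.println("lag (messages):      " + lag);
        System.out.println("behind (seconds):    " + secondsBehind(lag, 3_000));
        System.out.println("catch-up (seconds):  " + secondsToCatchUp(lag, 5_000, 3_000));
    }
}
```

A result of -1 from secondsToCatchUp is the signal that the group can never catch up at current rates, no matter how long it runs.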

18. What is the purpose of Kafka's retention policy and how do you choose the right setting?

Expected answer points:

Retention policy determines how long messages are kept before being eligible for deletion
Configured per topic via retention.ms (time-based) or retention.bytes (size-based, applied per partition)
Choose based on use case: event sourcing needs long retention for replay, real-time analytics may need shorter
Consider downstream consumers needing to reprocess historical data if processing logic changes
Longer retention increases storage costs; balance between replay window and cost
For compliance or audit requirements, retention may need to be months or years

19. How does Kafka achieve fault tolerance at the broker level?

Expected answer points:

Partition replication: each partition has a leader and ISR followers on different brokers
If a broker fails, Kafka automatically elects a new leader from ISR members
min.insync.replicas ensures writes persist to multiple replicas before acknowledgment
Controller broker manages partition leadership and cluster coordination (backed by ZooKeeper or KRaft)
unclean.leader.election.enable controls behavior when no ISR member is available: if set to true, an out-of-sync replica can become leader and messages can be lost
The broker.rack setting enables rack-aware replica placement, so replicas of a partition land in different racks or availability zones

20. What are the key differences between Kafka and other streaming platforms like Apache Pulsar or RabbitMQ?

Expected answer points:

Kafka uses partition-centric model with consumer groups; Pulsar uses subscription types (exclusive, failover, shared, key-shared)
Pulsar separates serving from storage: stateless brokers handle traffic, Apache BookKeeper stores recent data, and optional tiered storage offloads older data to object stores such as S3
RabbitMQ is a traditional broker with exchanges and queues, not a log-based system; classic queues have no native replay (RabbitMQ Streams, added in 3.9, are the exception)
Pulsar supports geo-replication out-of-the-box; Kafka requires MirrorMaker
Kafka has a larger ecosystem and community; Pulsar offers better multi-tenancy and geo-replication
For pure message queuing with acknowledgments, RabbitMQ excels; for high-throughput event streaming with replay, Kafka excels

Conclusion

Key points

Kafka is a distributed log, not a queue; messages are retained and can be replayed
Topics divide into partitions for parallelism; same key goes to same partition
Consumer groups enable independent consumption by different services
At-least-once delivery is the default; exactly-once requires Kafka transactions
Rebalancing happens when consumers join or leave; frequent rebalances hurt throughput
ZooKeeper (or KRaft in newer versions) manages cluster metadata

Configuration Essentials

Core Setup

Partition count planned for target throughput and consumer parallelism
Replication factor set to 3 for critical topics
min.insync.replicas configured to 2 (requires acks=all)
Retention policy set based on replay requirements

Producer Configuration

Idempotent producer enabled
Compression enabled (LZ4 or ZSTD recommended)
acks=all for critical data
retries configured with appropriate backoff

Consumer Configuration

Consumer group offset reset policy defined
max.poll.records tuned for processing capacity
session.timeout.ms appropriate for your use case
Partition assignment strategy selected (sticky or cooperative sticky for production)
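
The producer and consumer items above map onto concrete client properties. The sketch below is a hypothetical audit helper, not part of any Kafka client: the property names are standard Kafka settings, but the functions, the dict-based config format, and the finding messages are assumptions made for illustration.

```python
from typing import Dict, List


def audit_producer(config: Dict[str, object]) -> List[str]:
    """Return checklist violations for a producer config dict."""
    findings = []
    if str(config.get("acks")) not in ("all", "-1"):
        findings.append("acks should be 'all' for critical data")
    if config.get("enable.idempotence") is not True:
        findings.append("enable.idempotence should be true")
    if config.get("compression.type", "none") not in ("lz4", "zstd"):
        findings.append("compression.type should be lz4 or zstd")
    return findings


def audit_consumer(config: Dict[str, object]) -> List[str]:
    """Return checklist violations for a consumer config dict."""
    findings = []
    if config.get("auto.offset.reset") not in ("earliest", "latest", "none"):
        findings.append("auto.offset.reset should be set explicitly")
    if "max.poll.records" not in config:
        findings.append("max.poll.records should be tuned, not left implicit")
    strategy = str(config.get("partition.assignment.strategy", "")).lower()
    if "sticky" not in strategy:  # matches StickyAssignor and CooperativeStickyAssignor
        findings.append("use a sticky or cooperative sticky assignor")
    return findings


producer = {"acks": "all", "enable.idempotence": True, "compression.type": "gzip"}
consumer = {
    "auto.offset.reset": "earliest",
    "max.poll.records": 200,
    "session.timeout.ms": 45_000,
    "partition.assignment.strategy":
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
}

print(audit_producer(producer))  # ['compression.type should be lz4 or zstd']
print(audit_consumer(consumer))  # []
```

A check like this fits naturally into CI for services that keep their Kafka client settings in version control.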

Operations and Monitoring

Observability checklist

Metrics to monitor

Consumer lag: difference between latest offset and consumer position (critical for SLAs)
Under-replicated partitions: partitions without full replication
ISR size: In-Sync Replicas count per partition
Message throughput: messages and bytes per second per topic
Request latency: P99 producer and consumer request latencies
Disk usage: broker disk utilization and growth rate
Consumer group status: active members and partition assignments
Controller status: leader elections and controller changes

Logs to capture

Broker startup and shutdown events
Partition leader election events
Consumer group rebalancing events
Producer acknowledgment failures and retries
Controller changes and election events
Under-replicated partition events
Disk space warnings

Alerts to configure

Consumer lag exceeds SLA threshold (for example, more than 5 minutes behind)
Under-replicated partitions greater than 0
Broker disk usage above 80%
Producer error rate above 1%
Consumer group has no active members
Controller is unavailable
Messages per second exceeds capacity threshold
Request latency P99 exceeds threshold
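
Several of these alerts reduce to simple threshold checks on a metrics snapshot. The sketch below covers six of the eight; the snapshot keys and the `evaluate_alerts` helper are hypothetical, and in practice the values would come from your metrics pipeline (for example a JMX exporter) rather than a hand-built dict.

```python
from typing import Dict, List

# Thresholds taken from the alert list above; tune them to your own SLAs.
THRESHOLDS = {
    "max_lag_seconds": 300,           # more than 5 minutes behind
    "max_disk_usage": 0.80,           # 80% of broker disk
    "max_producer_error_rate": 0.01,  # 1% of produce requests
}


def evaluate_alerts(snapshot: Dict[str, float]) -> List[str]:
    """Return the names of alerts that should fire for this snapshot."""
    alerts = []
    if snapshot["consumer_lag_seconds"] > THRESHOLDS["max_lag_seconds"]:
        alerts.append("consumer lag exceeds SLA")
    if snapshot["under_replicated_partitions"] > 0:
        alerts.append("under-replicated partitions")
    if snapshot["disk_usage"] > THRESHOLDS["max_disk_usage"]:
        alerts.append("broker disk usage high")
    if snapshot["producer_error_rate"] > THRESHOLDS["max_producer_error_rate"]:
        alerts.append("producer error rate high")
    if snapshot["active_group_members"] == 0:
        alerts.append("consumer group has no active members")
    if snapshot["active_controllers"] != 1:
        # A healthy cluster has exactly one active controller.
        alerts.append("controller unavailable or duplicated")
    return alerts


snapshot = {
    "consumer_lag_seconds": 420,
    "under_replicated_partitions": 0,
    "disk_usage": 0.85,
    "producer_error_rate": 0.002,
    "active_group_members": 3,
    "active_controllers": 1,
}
print(evaluate_alerts(snapshot))
# ['consumer lag exceeds SLA', 'broker disk usage high']
```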

ZooKeeper and KRaft storage requirements

Kafka needs somewhere to store cluster metadata — ZooKeeper traditionally, KRaft in Kafka 3.3+.

What is stored:

Partition leadership and ISR membership
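Consumer group offsets and membership (legacy ZooKeeper-based consumers only; since Kafka 0.9, offsets live in the internal __consumer_offsets topic and membership is handled by the group coordinator broker)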
Access control lists (ACLs)
Topic configurations
Delegation token information

ZooKeeper/KRaft storage ≈
    partitions × (leadership + ISR state)
  + consumer_groups × (member offsets + metadata)
  + topics × (configs + ACLs)
  + delegation_tokens

Typical storage needs:

Cluster Size	Topics	Partitions	Consumer Groups	ZooKeeper/KRaft Storage
Small (3 brokers)	50	200	30	50-200 MB
Medium (6 brokers)	200	1,000	150	200-500 MB
Large (12 brokers)	500	5,000	500	1-3 GB
Very large (24+ brokers)	1,000+	20,000+	2,000+	5-10 GB

Storage is not the real bottleneck. Znode count is. ZooKeeper degrades when it has to hold millions of znodes, which happens with very large numbers of partition replicas (or, with legacy ZooKeeper-based consumers, many group members). KRaft mode removes the ZooKeeper dependency and scales to far more partitions.

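If you are still on ZooKeeper, migrate to KRaft. ZooKeeper-to-KRaft migration arrived as early access in Kafka 3.4 and became production-ready in 3.6; it runs as a rolling procedure without cluster downtime. Kafka 4.0 removes ZooKeeper mode entirely, so the migration has to happen before upgrading to 4.x.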

Deployment Readiness

Pre-deployment checklist

- [ ] Replication factor set to 3 for critical topics
- [ ] min.insync.replicas configured appropriately
- [ ] Consumer lag monitoring configured and alerts set
- [ ] Producer retries and idempotency configured
- [ ] Compression enabled on producers
- [ ] Schema Registry deployed for schema validation
- [ ] ACLs configured for topic access control
- [ ] TLS/SSL encryption enabled for all connections
- [ ] Partition count planned based on throughput requirements
- [ ] Retention policy configured (hours, days, weeks)
- [ ] Dead letter topic configured for failed messages
- [ ] Consumer group offset reset policy defined
- [ ] Backup and disaster recovery plan documented

Security

SASL/SCRAM or mTLS configured for authentication
ACLs configured for topic access control
TLS/SSL encryption enabled for all connections
Sensitive data encryption configured

Security checklist

Authentication: use SASL/PLAIN or SCRAM for client authentication; mTLS for certificate-based auth
Authorization: implement ACLs to restrict topic access; principle of least privilege
Encryption in transit: enable SSL/TLS for all broker and client connections
Encryption at rest: Kafka has no built-in at-rest encryption, so use disk or volume encryption, plus client-side (application-level) encryption for sensitive fields
Schema validation: validate message schemas with Confluent Schema Registry
Data sanitization: sanitize message keys and values to prevent injection
Audit logging: enable authorizer logging (the kafka.authorizer.logger logger) to record allowed and denied operations, or use your distribution's audit log feature
Network segmentation: place brokers in private networks; restrict inter-broker communication
