Three-Phase Commit: Non-Blocking Distributed Transactions
Learn how Three-Phase Commit (3PC) extends 2PC with a pre-commit phase, its assumptions, limitations, and when to use it.
Three-Phase Commit: Non-Blocking Distributed Transactions
Two-Phase Commit works when everyone cooperates. The coordinator stays up, participants respond, and the network behaves. But distributed systems do not always cooperate. When the coordinator crashes mid-transaction, participants can wait forever. This blocking problem is what Three-Phase Commit tries to solve.
I ran into this during a database migration. We had a distributed transaction spanning three data centers, and the coordinator crashed at exactly the wrong moment. Two participants thought the transaction was pending. One thought it had aborted. We spent hours untangling the state. 2PC is simple but fragile. 3PC is smarter but comes with its own costs.
Introduction
Here’s the issue with 2PC. After voting yes in Phase 1, participants enter the prepared state. They hold locks and wait. And wait. And wait for the coordinator’s decision. If the coordinator crashes at this point, those participants are stuck. They cannot commit (maybe the coordinator decided to abort). They cannot abort (maybe the coordinator decided to commit). They just block.
graph TD
subgraph "2PC Coordinator Crash Scenario"
C[Coordinator]
P1[Participant 1 - PREPARED]
P2[Participant 2 - PREPARED]
C -->|Phase 1| P1
C -->|Phase 1| P2
P1 -->|YES| C
P2 -->|YES| C
C -.- X[CRASH]
X -.->|stuck| B1[BLOCKED]
X -.->|stuck| B2[BLOCKED]
end
This blocking is not just a performance issue. Locks sit held. Resources stay consumed. In worst cases, someone has to manually untangle things.
How 3PC Extends 2PC
3PC adds an extra phase between voting and committing. The idea is that 3PC is designed to be non-blocking under failure assumptions that are more realistic than 2PC’s assumptions.
Phase 1: CanCommit
The coordinator asks all participants if they can commit a transaction. This is identical to 2PC’s prepare phase.
sequenceDiagram
participant C as Coordinator
participant P1 as Participant 1
participant P2 as Participant 2
C->>P1: CanCommit?
C->>P2: CanCommit?
P1-->>C: Yes
P2-->>C: Yes
If any participant votes No or times out, the coordinator sends Abort. The transaction ends. No blocking at this stage.
Phase 2: PreCommit
If all participants vote Yes, the coordinator sends PreCommit to all participants. This is the new phase in 3PC.
sequenceDiagram
participant C as Coordinator
participant P1 as Participant 1
participant P2 as Participant 2
Note over C,P2: Phase 2: PreCommit
C->>P1: PreCommit
C->>P2: PreCommit
P1-->>C: ACK
P2-->>C: ACK
Once a participant receives PreCommit, it knows something: all participants voted Yes, and the coordinator is still alive (it managed to send PreCommit messages). This knowledge changes the failure semantics.
Phase 3: DoCommit
After receiving ACK from all participants, the coordinator sends DoCommit. Participants then finalize the transaction.
sequenceDiagram
participant C as Coordinator
participant P1 as Participant 1
participant P2 as Participant 2
Note over C,P2: Phase 3: DoCommit
C->>P1: DoCommit
C->>P2: DoCommit
P1-->>C: Committed
P2-->>C: Committed
Why 3PC Is Non-Blocking
Here’s what happens when the coordinator crashes during Phase 2.
With 2PC, a participant in the prepared state cannot decide if the coordinator dies. With 3PC, when a participant receives PreCommit, it knows every participant voted Yes. If the coordinator crashes after sending PreCommit, participants can safely complete the commit. They have enough information to decide.
graph TD
subgraph "3PC Coordinator Crash After PreCommit"
C[Coordinator]
P1[Participant 1]
P2[Participant 2]
C -->|PreCommit| P1
C -->|PreCommit| P2
P1 -->|ACK| C
P2 -->|ACK| C
C -.- X[CRASH]
P1 -->|I can commit| D1[DoCommit]
P2 -->|I can commit| D2[DoCommit]
end
A participant that receives PreCommit and times out waiting for DoCommit can safely commit. It knows all participants voted Yes and the coordinator was alive long enough to send PreCommit.
Critical Assumptions
3PC requires assumptions that are often violated in practice:
Network Synchrony Assumption
3PC assumes the network is eventually synchronous. This means messages will eventually be delivered, even if delayed. In a truly asynchronous network where messages can be lost indefinitely, 3PC cannot guarantee non-blocking behavior.
This is the same assumption that makes FLP impossibility result relevant. If the network can partition forever, no protocol can be both safe and live in all executions.
Bounded Node Failure
3PC assumes nodes do not fail forever. If a node crashes and never recovers, 3PC cannot complete that transaction. The protocol handles transient coordinator failures but not permanent participant failures.
No Partition After PreCommit
3PC guarantees non-blocking behavior when the coordinator crashes after PreCommit AND the network does not partition. If a network partition occurs at exactly the wrong moment, participants could diverge.
When 3PC Might Be Considered
3PC is rarely used in production, but here are scenarios where it could make sense:
-
Short-duration transactions on reliable networks: If your network is mostly reliable and transactions complete quickly, the extra phase overhead might be acceptable.
-
Systems requiring strict liveness: If blocking is unacceptable and your network assumptions match 3PC’s requirements, the protocol provides better liveness guarantees than 2PC.
-
Research and educational contexts: Understanding 3PC helps understand the trade-offs in distributed transaction protocols.
Trade-off Analysis
| Aspect | 2PC | 3PC | Saga |
|---|---|---|---|
| Blocking | Yes, coordinator crash in prepared state | No (under assumptions) | No |
| Phases | 2 | 3 | Many (one per step) |
| Coordinator crash during prepared | Blocks participants | Participants can recover | No effect |
| Network assumptions | None (works async) | Eventual synchrony | None |
| Rollback on failure | Atomic | Atomic | Compensating transactions |
| Performance overhead | 2 round trips | 3 round trips | N round trips |
| Complexity | Low | Medium | High |
| Use case | Tight consistency | Tight consistency | Eventual consistency |
| Example systems | PostgreSQL, MySQL | Rarely used | AWS Step Functions, Temporal |
Production Failure Scenarios
Here is where theory meets reality. 3PC handles some failures elegantly, but others expose cracks in its design.
Coordinator Crash After PreCommit but Before DoCommit
This is the scenario 3PC was built for. Coordinator crashes after sending PreCommit but before DoCommit? With 2PC, participants would block forever. With 3PC, participants that received PreCommit can commit independently. The logic is straightforward: every participant voted Yes, and the coordinator made it to Phase 2. No participant will suddenly vote No at this point.
But there is a wrinkle. If the coordinator crashed after sending PreCommit but before getting all the ACKs back, some participants may have sent ACK while others have not. A participant that never got PreCommit in the first place cannot safely commit. Recovery becomes asymmetric, and asymmetry breeds bugs.
Network Partition During PreCommit Phase
Partition happens right when the coordinator sends PreCommit. Some participants get it, others do not. The ones that got PreCommit eventually commit after timeout. The ones that did not get it eventually abort. The system diverges.
This is the killer. 3PC’s non-blocking promise only holds if the network does not partition after PreCommit. But partitions can happen whenever they want. The theoretical guarantee sounds nice until you remember what networks actually do.
Coordinator Crash Before PreCommit
Coordinator goes down before sending any PreCommit. Participants are stuck in CanCommit, having voted Yes (or been presumed to have voted Yes by not saying No). With 3PC, they can abort after timeout. No worse than 2PC, no blocking.
Simultaneous Participant and Coordinator Failure
Both crash together. The survivors must wait for one or the other to come back. 3PC has no magic for this. No protocol can work around nodes that stay dead forever.
Common Pitfalls
Despite its theoretical advantages, 3PC is rarely used in production:
-
The assumptions are hard to meet. Network synchrony is not guaranteed in real systems. Wide-area networks especially can experience prolonged partitions.
-
The improvement is marginal. 3PC eliminates blocking only under specific failure scenarios. Most systems just use timeouts and manual intervention instead of adding 3PC’s complexity.
-
Saga pattern is often better. For long-running transactions, compensating transactions are more practical than trying to maintain locks across distributed participants.
-
The performance cost matters. The extra round trip hurts high-throughput systems. For most use cases, the blocking probability with 2PC is low enough that 3PC’s extra latency is hard to justify.
Implementing a Simple 3PC
Here is a simplified view of how 3PC coordinator logic works:
class ThreePhaseCommitCoordinator:
def __init__(self, participants):
self.participants = participants
self.state = "INIT"
def execute(self, transaction):
# Phase 1: CanCommit
votes = []
for p in self.participants:
vote = p.can_commit()
votes.append(vote)
if all(v == "YES" for v in votes):
# Phase 2: PreCommit
self.state = "PRECOMMIT"
for p in self.participants:
p.pre_commit()
# Phase 3: DoCommit
self.state = "COMMIT"
for p in self.participants:
p.do_commit()
else:
# Abort
self.state = "ABORT"
for p in self.participants:
p.abort()
The participant side follows a similar pattern with timeouts at each phase that enable recovery decisions.
Consistency vs Availability Trade-offs in 3PC
People sometimes treat 3PC as if it solves the consistency-availability tradeoff. It does not. Here is why.
The CAP Theorem Context
3PC aims for consistency while handling coordinator crashes. But during a network partition after PreCommit, things break down. Participants that got PreCommit commit. The ones that did not get it abort. Consistency goes out the window.
Why 3PC Does Not Solve CAP
The misconception is widespread. People hear “non-blocking” and assume that means 3PC is safe under all conditions. It is not. The difference from 2PC is that 3PC avoids blocking. It does not avoid inconsistency when partitions occur.
Non-blocking and consistency are not the same property. 3PC stays non-blocking under certain failure assumptions. It does not stay consistent under all failure scenarios.
Practical Implications
If you need strict consistency, both 2PC and 3PC fail during partitions. The real choice is blocking (2PC) or potential inconsistency (3PC). Neither gives you availability in the CAP sense during a partition.
If availability is your goal, look at Saga or eventual consistency. These handle partitions without the lock-holding problems of 2PC or the partition-vulnerability of 3PC.
Quick Recap
- 3PC adds a PreCommit phase between 2PC’s voting and commit phases
- The PreCommit phase lets participants recover when the coordinator crashes
- 3PC is non-blocking under assumptions of eventual network synchrony and bounded failures
- In practice, 3PC is rarely used because its assumptions are hard to meet
- Saga pattern is often preferred for long-running distributed transactions
- 2PC remains the most common protocol for short distributed transactions requiring atomicity
For more on distributed transactions, see Two-Phase Commit for the protocol that 3PC builds upon. To understand the broader consistency landscape, read Consistency Models. For handling failures without blocking, see the Saga Pattern and Outbox Pattern.
Interview Questions
- After Phase 1 (voting), participants enter the prepared state and hold locks
- If the coordinator crashes in this state, participants cannot decide whether to commit or abort
- They cannot commit (maybe coordinator decided to abort) and cannot abort (maybe coordinator decided to commit)
- The result is indefinite blocking, holding resources and locks
- 3PC adds a PreCommit phase between voting and final commit
- When a participant receives PreCommit, it knows all participants voted Yes and the coordinator is alive
- If the coordinator crashes after PreCommit, participants can safely commit without waiting
- The extra phase provides the information needed for recovery decisions
- Eventual network synchrony: messages will eventually be delivered, even if delayed
- Bounded node failures: nodes do not fail forever and eventually recover
- No network partition after PreCommit: partition during Phase 2 can still cause inconsistency
- If these assumptions are violated, 3PC loses its non-blocking advantage
- Participants that received PreCommit will eventually commit after timeout
- Participants that did not receive PreCommit will abort after timeout
- This results in inconsistent outcomes across participant groups
- 3PC's non-blocking guarantee does not hold during network partitions
- 2PC requires 2 phases with 2 round trips (CanCommit/Abort, then DoCommit/Abort)
- 3PC requires 3 phases with 3 round trips (CanCommit, PreCommit, DoCommit)
- The extra round trip adds latency, especially problematic for high-throughput systems
- Message complexity is higher in 3PC due to the additional PreCommit phase and ACK messages
- Network synchrony assumptions are hard to meet in real systems, especially WANs
- The improvement over 2PC is marginal: blocking only occurs under specific failure scenarios
- Saga pattern is more practical for long-running transactions
- Most systems use timeouts and manual intervention for the rare blocking cases
- Network partition during the PreCommit phase
- Different participants receive PreCommit at different times due to message delays
- Participants that received PreCommit commit, others abort after timeout
- This violates atomicity and consistency guarantees
- Saga uses compensating transactions for rollback instead of atomic rollback
- Saga is non-blocking by design and works well for long-running transactions
- Saga provides eventual consistency rather than strict atomicity
- 3PC attempts strict atomicity but can still produce inconsistencies during partitions
- The participant knows all participants voted Yes and the coordinator was alive
- After a timeout waiting for DoCommit, the participant can safely commit
- This is the key non-blocking property of 3PC
- The participant has sufficient information to make a safe decision independently
- FLP proves that in a fully asynchronous system, no consensus protocol can be both safe and live
- 3PC's non-blocking property relies on eventual synchrony assumptions (not fully asynchronous)
- If the network can partition forever, 3PC cannot guarantee non-blocking behavior
- 3PC moves the assumption from "no failures" to "eventual message delivery"
- Timeouts trigger recovery decisions when expected messages do not arrive
- In Phase 1, timeout without Yes vote results in Abort
- In Phase 2, timeout after PreCommit results in DoCommit
- In Phase 3, timeout waiting for ACK results in assuming commit succeeded
- Timeout values must be set based on expected network delay characteristics
- No, 3PC cannot guarantee availability during network partitions
- During partitions after PreCommit, some participants commit while others abort
- This violates consistency, so the system cannot be both consistent and available (CAP)
- 3PC is designed for consistency with non-blocking under specific failure assumptions
- Coordinator sends Prepare to all participants and receives Yes votes from all
- Coordinator crashes before sending the commit decision
- In 2PC: participants are in prepared state and block indefinitely
- In 3PC: participants received PreCommit, so they can commit after timeout
- The key difference is the information available to participants after PreCommit
- If the crashed participant never received PreCommit, other participants can commit after timeout
- If the crashed participant received PreCommit but crashed before ACK, recovery becomes ambiguous
- The coordinator cannot distinguish between "crashed before ACK" and "ACK lost in transit"
- This requires manual intervention or additional protocol machinery to resolve
- If any participant votes No, the coordinator immediately sends Abort to all participants
- No blocking occurs at this stage since the decision is to abort
- Participants that already prepared can release their locks upon receiving Abort
- This is identical to 2PC's handling of a No vote
- 2PC blocks when coordinator crashes in prepared state because participants cannot proceed without the coordinator's decision
- 3PC non-blocking emerges because the PreCommit phase transfers sufficient state to participants
- With PreCommit, participants know all voted Yes and coordinator was alive, enabling autonomous decisions
- The distinction is information distribution: 3PC participants have enough context to decide without the coordinator
- Add a quorom-based commit rule requiring majority acknowledgment before DoCommit
- Implement partition detection with minority side automatically aborting
- Use a witness or observer node pattern to track global commit state
- Consider integrating with Paxos or Raft for coordinator recovery during partitions
- Timeout too short: participants abort prematurely when network is just slow
- Timeout too long: blocking duration becomes unacceptable when coordinator truly fails
- Phase-dependent timeouts: PreCommit timeout should be shorter than CanCommit timeout
- Adaptive timeouts based on historical network latency patterns improve reliability
- Both aim to recover from coordinator/leader failures without blocking
- 3PC relies on participants autonomously deciding based on PreCommit state
- Raft elects a new leader who then determines continuation based on log state
- 3PC can commit autonomously while Raft requires new leader to drive progress
- Raft provides stronger guarantees but requires more infrastructure
- When network assumptions do not guarantee eventual synchrony (unreliable networks)
- When simplicity matters more than theoretical liveness improvements
- When the probability of coordinator crash during prepared state is acceptably low
- When extra round trip latency is unacceptable for high-throughput systems
- When manual intervention for rare blocking scenarios is acceptable
Further Reading
Three-Phase Commit solves 2PC’s blocking problem in theory. In practice, the assumptions required for 3PC to work are harder to guarantee than just dealing with 2PC’s rare blocking scenarios. Understanding the trade-offs helps you choose the right protocol for your specific requirements.
For further exploration of distributed systems concepts, consider these resources:
-
Distributed Systems: Concepts and Design by George Coulouris - Comprehensive textbook covering consensus protocols and distributed transactions
-
Introduction to Reliable and Secure Distributed Systems by Christian Bettstetter - Deep dive into protocol correctness and failure models
-
The FLP Impossibility Result - Original paper by Fischer, Lynch, and Patterson on the impossibility of consensus in asynchronous systems
Conclusion
Category
Related Posts
Apache ZooKeeper: Consensus and Coordination
Explore ZooKeeper's Zab consensus protocol, hierarchical znodes, watches, leader election, and practical use cases for distributed coordination.
Distributed Systems Primer: Key Concepts for Modern Architecture
A practical introduction to distributed systems fundamentals. Learn about failure modes, replication strategies, consensus algorithms, and the core challenges of building distributed software.
etcd: Distributed Key-Value Store for Configuration
Deep dive into etcd architecture using Raft consensus, watches for reactive configuration, leader election patterns, and Kubernetes integration.