Three-Phase Commit: Non-Blocking Distributed Transactions

Learn how Three-Phase Commit (3PC) extends 2PC with a pre-commit phase, its assumptions, limitations, and when to use it.

Published: March 24, 2026 | Updated: June 17, 2026 | Author: GeekWorkBench | Reading time: 24 min

Quick Summary

Three-Phase Commit adds a PreCommit phase between voting and committing to solve 2PC's blocking problem. When the coordinator crashes after PreCommit, participants know enough to commit safely instead of waiting forever. The catch: 3PC only achieves non-blocking behavior under assumptions that are hard to meet in practice - eventual network synchrony, bounded node failures, and no partitions after PreCommit. In production, most teams find that 2PC's rare blocking scenarios are easier to handle than 3PC's extra round trip and delicate assumptions. For long-running transactions, the Saga pattern with compensating transactions is usually the better choice.

Two-Phase Commit works when everyone cooperates. The coordinator stays up, participants respond, and the network behaves. But distributed systems do not always cooperate. When the coordinator crashes mid-transaction, participants can wait forever. This blocking problem is what Three-Phase Commit tries to solve.

I ran into this during a database migration. We had a distributed transaction spanning three data centers, and the coordinator crashed at exactly the wrong moment. Two participants thought the transaction was pending. One thought it had aborted. We spent hours untangling the state. 2PC is simple but fragile. 3PC is smarter but comes with its own costs.

Introduction

Here’s the issue with 2PC. After voting yes in Phase 1, participants enter the prepared state. They hold locks and wait. And wait. And wait for the coordinator’s decision. If the coordinator crashes at this point, those participants are stuck. They cannot commit (maybe the coordinator decided to abort). They cannot abort (maybe the coordinator decided to commit). They just block.

graph TD
    subgraph "2PC Coordinator Crash Scenario"
        C[Coordinator]
        P1[Participant 1 - PREPARED]
        P2[Participant 2 - PREPARED]

        C -->|Phase 1| P1
        C -->|Phase 1| P2
        P1 -->|YES| C
        P2 -->|YES| C
        C -.- X[CRASH]
        X -.->|stuck| B1[BLOCKED]
        X -.->|stuck| B2[BLOCKED]

    end

This blocking is not just a performance issue. Locks sit held. Resources stay consumed. In worst cases, someone has to manually untangle things.

How 3PC Extends 2PC

3PC adds an extra phase between voting and committing. The extra phase is meant to make the protocol non-blocking when the coordinator crashes, but only under failure assumptions that are stricter than anything 2PC needs for safety.

Phase 1: CanCommit

The coordinator asks all participants if they can commit a transaction. This is identical to 2PC’s prepare phase.

sequenceDiagram
    participant C as Coordinator
    participant P1 as Participant 1
    participant P2 as Participant 2

    C->>P1: CanCommit?
    C->>P2: CanCommit?
    P1-->>C: Yes
    P2-->>C: Yes

If any participant votes No or times out, the coordinator sends Abort. The transaction ends. No blocking at this stage.
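
The Phase 1 rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: it assumes each participant exposes a blocking `can_commit()` method that returns "YES" or "NO", and the names `collect_votes` and `timeout_s` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def collect_votes(participants, timeout_s):
    """Return True only if every participant answers "YES" within timeout_s."""
    with ThreadPoolExecutor(max_workers=max(1, len(participants))) as pool:
        futures = [pool.submit(p.can_commit) for p in participants]
        deadline = time.monotonic() + timeout_s
        for future in futures:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                if future.result(timeout=remaining) != "YES":
                    return False  # an explicit No vote
            except FutureTimeout:
                return False  # silence is treated as a No vote
            except Exception:
                return False  # a failing participant is treated as a No vote
    return True
```

A coordinator would send Abort whenever this returns False. Note that leaving the `with` block waits for slow calls to finish, so a real implementation would need cancellable network calls instead of threads.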

Phase 2: PreCommit

If all participants vote Yes, the coordinator sends PreCommit to all participants. This is the new phase in 3PC.

sequenceDiagram
    participant C as Coordinator
    participant P1 as Participant 1
    participant P2 as Participant 2

    Note over C,P2: Phase 2: PreCommit
    C->>P1: PreCommit
    C->>P2: PreCommit
    P1-->>C: ACK
    P2-->>C: ACK

Once a participant receives PreCommit, it knows something: all participants voted Yes, and the coordinator was still alive when it sent the PreCommit messages. This knowledge changes the failure semantics.

Phase 3: DoCommit

After receiving ACK from all participants, the coordinator sends DoCommit. Participants then finalize the transaction.

sequenceDiagram
    participant C as Coordinator
    participant P1 as Participant 1
    participant P2 as Participant 2

    Note over C,P2: Phase 3: DoCommit
    C->>P1: DoCommit
    C->>P2: DoCommit
    P1-->>C: Committed
    P2-->>C: Committed

Why 3PC Is Non-Blocking

Here’s what happens when the coordinator crashes during Phase 2.

With 2PC, a participant in the prepared state cannot decide on its own once the coordinator dies. With 3PC, when a participant receives PreCommit, it knows every participant voted Yes. If the coordinator crashes after sending PreCommit, participants can safely complete the commit. They have enough information to decide.

graph TD
    subgraph "3PC Coordinator Crash After PreCommit"
        C[Coordinator]
        P1[Participant 1]
        P2[Participant 2]

        C -->|PreCommit| P1
        C -->|PreCommit| P2
        P1 -->|ACK| C
        P2 -->|ACK| C
        C -.- X[CRASH]

        P1 -->|I can commit| D1[DoCommit]
        P2 -->|I can commit| D2[DoCommit]

    end

A participant that receives PreCommit and times out waiting for DoCommit can safely commit. It knows all participants voted Yes and the coordinator was alive long enough to send PreCommit.
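
The contrast with 2PC can be written down as two small decision functions. This is a sketch of the timeout rules as this article describes them; the state names ("INIT", "PREPARED", "READY", "PRECOMMITTED") and function names are assumptions chosen for illustration.

```python
def on_timeout_2pc(state: str) -> str:
    """What a 2PC participant can decide alone when the coordinator goes silent."""
    if state == "INIT":
        return "ABORT"  # has not voted yet, so aborting is always safe
    if state == "PREPARED":
        return "BLOCK"  # voted Yes; the coordinator may have decided either way
    raise ValueError(f"unexpected 2PC state: {state}")


def on_timeout_3pc(state: str) -> str:
    """What a 3PC participant can decide alone when the coordinator goes silent."""
    if state in ("INIT", "READY"):
        return "ABORT"  # no PreCommit seen, so nobody can have committed yet
    if state == "PRECOMMITTED":
        return "COMMIT"  # PreCommit implies every participant voted Yes
    raise ValueError(f"unexpected 3PC state: {state}")


print(on_timeout_2pc("PREPARED"))      # BLOCK
print(on_timeout_3pc("READY"))         # ABORT
print(on_timeout_3pc("PRECOMMITTED"))  # COMMIT
```

The only difference is that 3PC never returns "BLOCK". That is exactly the property that depends on the assumptions discussed next.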

Critical Assumptions

3PC requires assumptions that are often violated in practice:

Network Synchrony Assumption

3PC assumes the network is eventually synchronous. This means messages will eventually be delivered, even if delayed. In a truly asynchronous network where messages can be lost indefinitely, 3PC cannot guarantee non-blocking behavior.

This is the assumption the FLP impossibility result turns on. If the network can delay messages or stay partitioned forever, no protocol can be both safe and live in all executions.

The distinction between synchronous, partially synchronous, and asynchronous networks matters here. A synchronous network guarantees message delivery within a fixed time bound. An asynchronous network makes no such guarantee — messages can take arbitrarily long or be lost entirely. Real networks fall somewhere in between: they usually deliver messages but can experience prolonged delays during congestion or hardware faults.

3PC requires eventual synchrony, which means the network must eventually deliver messages even if it takes a long time. Your timeout values have to be set with this in mind. Set them too short and you get premature aborts on slow-but-healthy links. Set them too long and you extend the blocking window when the coordinator actually fails. In wide-area deployments across multiple data centers, these trade-offs become acute because latency spikes are common and partitions can last minutes.
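
One common way to pick a starting value is to derive it from measured round-trip times. The sketch below is an assumption of this rewrite rather than part of 3PC: it uses the mean plus `k` standard deviations of recent samples, and `suggest_timeout_ms`, `k`, and `floor_ms` are hypothetical names.

```python
from statistics import mean, pstdev


def suggest_timeout_ms(rtt_samples_ms, k=4.0, floor_ms=1.0):
    """Suggest a timeout as mean RTT plus k standard deviations, never below floor_ms."""
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    return max(floor_ms, mean(rtt_samples_ms) + k * pstdev(rtt_samples_ms))


# A steady link gets a tight timeout; a jittery link gets a looser one.
print(suggest_timeout_ms([10, 10, 10]))
print(suggest_timeout_ms([10, 20]))
```

A larger `k` trades fewer premature aborts for a longer wait when the coordinator has really failed, which is the same trade-off described above.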

Bounded Node Failure

3PC assumes nodes do not fail forever. If a node crashes and never recovers, 3PC cannot complete that transaction. The protocol handles transient coordinator failures but not permanent participant failures.

“Bounded” here means failures are temporary and recoverable. A node crashes, stops participating, but eventually comes back online and resumes its role. 3PC builds on this assumption because its recovery logic depends on participants eventually being present to commit or abort.

The problem emerges when a participant fails permanently. Suppose a participant votes Yes in Phase 1, receives PreCommit in Phase 2, and then crashes before sending ACK. The coordinator is left waiting for an ACK that never arrives, so it cannot cleanly move to Phase 3 (DoCommit). Meanwhile the remaining participants received PreCommit and will commit once their timeouts fire, while the crashed participant’s outcome stays unresolved. Its data may be left in a half-committed state. Some implementations handle this with a recovery log: participants write their intent to durable storage before sending ACK, so a crashed node can rejoin and determine its fate by reading that log. Most 3PC descriptions skip this because it adds complexity. The core protocol assumes nodes come back. It does not say what to do when they do not.

No Partition After PreCommit

3PC guarantees non-blocking behavior when the coordinator crashes after PreCommit AND the network does not partition. If a network partition occurs at exactly the wrong moment, participants could diverge.

The non-blocking guarantee rests on a specific sequence: the coordinator sends PreCommit to everyone, all participants acknowledge, and then the coordinator sends DoCommit. If the coordinator crashes after this sequence starts, participants who received PreCommit know enough to commit. But that guarantee holds only if the network stays connected.

Consider what happens when a partition splits the participant group right after some nodes receive PreCommit. The nodes on the side that got PreCommit will commit after their timeout fires. The nodes on the other side will abort after their timeout fires. You now have a transaction that committed on one side and aborted on the other. The atomicity guarantee is broken. This is not a rare edge case; network partitions are exactly the kind of failure distributed systems must handle. The assumption that partitions will not occur after PreCommit is essentially requiring the window between PreCommit and DoCommit to be partition-free. If that assumption does not hold, you lose the property 3PC was designed to provide.

When 3PC Might Be Considered

3PC is rarely used in production, but here are scenarios where it could make sense:

Short-duration transactions on reliable networks: If your network is mostly reliable and transactions complete quickly, the extra phase overhead might be acceptable.
Systems requiring strict liveness: If blocking is unacceptable and your network assumptions match 3PC’s requirements, the protocol provides better liveness guarantees than 2PC.
Research and educational contexts: Understanding 3PC helps understand the trade-offs in distributed transaction protocols.

Trade-off Analysis

Aspect	2PC	3PC	Saga
Blocking	Yes, coordinator crash in prepared state	No (under assumptions)	No
Phases	2	3	Many (one per step)
Coordinator crash during prepared	Blocks participants	Participants can recover	No effect
Network assumptions	None (works async)	Eventual synchrony	None
Rollback on failure	Atomic	Atomic	Compensating transactions
Performance overhead	2 round trips	3 round trips	N round trips
Complexity	Low	Medium	High
Use case	Tight consistency	Tight consistency	Eventual consistency
Example systems	PostgreSQL, MySQL	Rarely used	AWS Step Functions, Temporal
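
The round-trip row translates directly into latency. As a rough worked example, the sketch below multiplies the round-trip count by a network RTT. It deliberately ignores disk writes and processing time, and the 80 ms cross-region RTT is an assumed figure, not a measurement.

```python
ROUND_TRIPS = {"2PC": 2, "3PC": 3}


def commit_latency_ms(protocol, rtt_ms):
    """Lower bound on commit latency: round trips times network RTT."""
    return ROUND_TRIPS[protocol] * rtt_ms


rtt_ms = 80  # assumed cross-region round-trip time
print(commit_latency_ms("2PC", rtt_ms))  # 160
print(commit_latency_ms("3PC", rtt_ms))  # 240
```

Under these assumptions the extra phase adds one full RTT to every transaction, whether or not a failure ever happens.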

Production Failure Scenarios

Here is where theory meets reality. 3PC handles some failures elegantly, but others expose cracks in its design.

Coordinator Crash After PreCommit but Before DoCommit

This is the scenario 3PC was designed for. When the coordinator crashes after sending PreCommit but before DoCommit, 2PC participants would block forever waiting for a decision that never comes. 3PC avoids this by giving participants enough information to make a recovery decision autonomously.

The mechanism is tied to what participants know at each phase. A participant that received PreCommit knows three things: every participant voted Yes in Phase 1, the coordinator was alive long enough to reach Phase 2, and no participant will suddenly change its vote. These facts together form the basis for safe autonomous commit. After a timeout waiting for DoCommit, the participant can proceed knowing that every participant, not just a majority, agreed to the transaction.

The wrinkle is asymmetric PreCommit delivery. The coordinator may crash after sending PreCommit to some participants but before sending it to the rest. Participants that received PreCommit can commit after their timeout. Participants that never received PreCommit cannot safely commit and must abort. This asymmetry creates a recovery gap. The coordinator dying mid-Phase-2 means some participants have commit authority while others do not.

Consider a three-participant transaction. The coordinator sends PreCommit to P1 and P2, and both send ACK. The coordinator crashes before its PreCommit reaches P3. P1 and P2 will commit after timeout. P3, having never received PreCommit, will abort. The transaction is now committed on P1 and P2 but aborted on P3. This is not a theoretical edge case; it is the natural consequence of coordinator failure during Phase 2. Recovery requires reconciling this split outcome, which usually means manual intervention or a reconciliation process outside the protocol.

This is why some 3PC implementations add a recovery log. Participants write their intent to durable storage before sending ACK. If a participant crashes and restarts, it reads the log to determine whether it should commit or abort. The protocol itself does not specify this; it is an engineering addition to handle the asymmetry that coordinator failure introduces.
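
A recovery log of this kind can be sketched as an append-only file that is flushed to disk before the ACK goes out. Everything here is an assumption for illustration: the JSON-lines format, the state names, and the `log_state` and `recover_state` helpers are not defined by the protocol.

```python
import json
import os


def log_state(path, txn_id, state):
    """Append the participant's state for txn_id and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"txn": txn_id, "state": state}) + "\n")
        f.flush()
        os.fsync(f.fileno())  # only send the ACK after this returns


def recover_state(path, txn_id):
    """Return the last logged state for txn_id, or None if nothing was logged."""
    last = None
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record["txn"] == txn_id:
                    last = record["state"]
    except FileNotFoundError:
        return None
    return last
```

On restart, a participant would call `recover_state` and apply the same rule it uses on timeout: a logged "PRECOMMITTED" means it may commit, anything earlier means it aborts. A participant that cannot trust that rule, for example after a suspected partition, would have to ask its peers instead, which is outside this sketch.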

Network Partition During PreCommit Phase

A network partition during the PreCommit phase is the scenario that breaks 3PC’s central promise. The coordinator sends PreCommit to all participants, but the network splits before some receive it. The result is divergent behavior: participants on one side of the partition commit while participants on the other abort. Atomicity collapses.

The sequence plays out like this. The coordinator reaches Phase 2 and sends PreCommit to P1, P2, and P3. The message reaches P1 and P2 but the network partitions before P3 gets it. P1 and P2 wait for DoCommit. P3 waits for something that will never come. After their timeouts, P1 and P2 commit because they received PreCommit. P3 aborts because it never received PreCommit and timed out waiting for the next phase. The transaction is now in a state that the protocol cannot represent: committed on P1 and P2, aborted on P3.
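
The same sequence can be replayed as a tiny simulation. This is an illustrative model only: it applies the timeout rule described in this article (commit if PreCommit was received, abort otherwise), and the helper names are hypothetical.

```python
def outcomes_after_timeout(participants, received_precommit):
    """Apply the 3PC timeout rule to each participant independently."""
    return {
        name: "COMMIT" if name in received_precommit else "ABORT"
        for name in participants
    }


def is_atomic(outcomes):
    """A transaction is atomic only if every participant reached the same outcome."""
    return len(set(outcomes.values())) <= 1


# The partition cuts off P3 before PreCommit arrives.
result = outcomes_after_timeout(["P1", "P2", "P3"], {"P1", "P2"})
print(result)             # {'P1': 'COMMIT', 'P2': 'COMMIT', 'P3': 'ABORT'}
print(is_atomic(result))  # False
```

Each participant follows the rules correctly on its own, and the combined result is still non-atomic. Nothing in the per-participant logic can detect that.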

This is not a rare failure mode. Network partitions happen in distributed systems. They happen during hardware faults, switch misconfigurations, datacenter connectivity issues, and routine maintenance windows. The window between PreCommit being sent and DoCommit being received is not instantaneous. It involves network round-trips, processing time, and disk writes. A partition can occur at any point during that window.

The consequences are severe. You now have a transaction that partially committed. P1 and P2 have applied the changes. P3 has not. Your data is inconsistent across participants. Reconciling this requires application-level logic that 3PC does not provide. You might need compensating transactions, manual repair, or a full rollback. The protocol gave you non-blocking behavior in exchange for accepting this failure mode.

Real-world deployments make this worse, not better. In a single-datacenter deployment, partitions are rare but still possible. In a multi-datacenter deployment spanning geographically distributed nodes, partitions are common. A WAN link going down for five minutes is not a hypothetical; it is an operational reality for many systems. Setting timeout values long enough to cover these events makes the blocking window unacceptably long. Setting them short enough to be responsive introduces premature aborts on slow networks. There is no safe middle ground that works across both local and wide-area networks.

The assumption that partitions will not occur after PreCommit is essentially equivalent to assuming your network is reliable. If you have a reliable network that never partitions, 3PC works as advertised. But on such a network the only thing 3PC buys you is protection against a coordinator crash in the prepared state, and that is rare enough that most teams handle it with timeouts and manual intervention. 3PC’s assumptions undermine its own value proposition.

Coordinator Crash Before PreCommit

The coordinator goes down before sending any PreCommit. Participants have voted Yes and are waiting for the next message. With 3PC, they can abort after a timeout, so nothing blocks.

This scenario is actually the simplest recovery case in 3PC. Participants voted Yes in Phase 1, meaning they are willing to commit. But the coordinator never got to Phase 2, so no participant has received PreCommit. Without PreCommit, no participant has the information needed to commit unilaterally. The safe move is to abort after a timeout.

The timeout value matters here. Set it too short and a slow coordinator triggers premature aborts. Set it too long and participants hold resources while waiting for a coordinator that is not coming back. You would set this timeout based on expected round-trip times for your network, with some margin for unusual congestion. Some implementations use an adaptive timeout that grows if they see repeated coordinator unavailability, but that adds complexity.
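
An adaptive timeout of the kind mentioned above could be as simple as exponential growth with a cap. This is a sketch under assumptions: the class name, the doubling factor, and the 30-second cap are all illustrative choices, not recommendations.

```python
class AdaptiveTimeout:
    """Grow the timeout while the coordinator stays silent, reset when it answers."""

    def __init__(self, base_s=1.0, factor=2.0, cap_s=30.0):
        self.base_s = base_s
        self.factor = factor
        self.cap_s = cap_s
        self.current_s = base_s

    def on_coordinator_silent(self):
        """Call after a timeout fires; returns the next, longer timeout."""
        self.current_s = min(self.cap_s, self.current_s * self.factor)
        return self.current_s

    def on_coordinator_heard(self):
        """Call after any message from the coordinator; returns the base timeout."""
        self.current_s = self.base_s
        return self.current_s
```

The cap matters: without it, a long outage would leave participants holding resources for longer and longer on each new transaction.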

What makes this scenario tractable is that no participant has acted on incomplete information. Everyone is waiting, everyone voted Yes, and everyone can independently conclude that the transaction should be rolled back. There is no partial commit to clean up, no divergent state to reconcile. The recovery path is clean.

Simultaneous Participant and Coordinator Failure

Both crash together. The survivors must wait for one or the other to come back. 3PC has no magic for this. No protocol can work around nodes that stay dead forever.

Common Pitfalls

Despite its theoretical advantages, 3PC is rarely used in production:

The assumptions are hard to meet. Network synchrony is not guaranteed in real systems. Wide-area networks especially can experience prolonged partitions.
The improvement is marginal. 3PC eliminates blocking only under specific failure scenarios. Most systems just use timeouts and manual intervention instead of adding 3PC’s complexity.
Saga pattern is often better. For long-running transactions, compensating transactions are more practical than trying to maintain locks across distributed participants.
The performance cost matters. The extra round trip hurts high-throughput systems. For most use cases, the blocking probability with 2PC is low enough that 3PC’s extra latency is hard to justify.

Implementing a Simple 3PC

Here is a simplified view of how 3PC coordinator logic works:

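class ThreePhaseCommitCoordinator:
    """Simplified 3PC coordinator: no timeouts, retries, or durable logging."""

    def __init__(self, participants):
        self.participants = participants
        self.state = "INIT"

    def execute(self, transaction):
        # Phase 1: CanCommit. Ask every participant for its vote.
        votes = [p.can_commit() for p in self.participants]

        if all(v == "YES" for v in votes):
            # Phase 2: PreCommit. A real coordinator waits for every ACK
            # (with a timeout) before moving on; this sketch does not.
            self.state = "PRECOMMIT"
            for p in self.participants:
                p.pre_commit()

            # Phase 3: DoCommit. Participants finalize the transaction.
            self.state = "COMMIT"
            for p in self.participants:
                p.do_commit()
        else:
            # At least one participant voted No, so abort everywhere.
            self.state = "ABORT"
            for p in self.participants:
                p.abort()

        return self.state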

The participant side follows a similar pattern with timeouts at each phase that enable recovery decisions.
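
A matching participant might look like the sketch below. It is a hedged illustration, not a reference implementation: the state names, the `can_apply` flag standing in for a real local check, the "ACK" return value, and the `on_timeout` hook are all assumptions. The method names mirror the calls the coordinator makes.

```python
class ThreePhaseCommitParticipant:
    """Simplified 3PC participant: in-memory state only, no durable log."""

    def __init__(self, name, can_apply=True):
        self.name = name
        self.can_apply = can_apply  # stand-in for a real local validation step
        self.state = "INIT"

    def _require(self, expected):
        if self.state != expected:
            raise RuntimeError(f"{self.name}: expected {expected}, got {self.state}")

    def can_commit(self):
        """Phase 1: vote Yes or No."""
        self._require("INIT")
        if self.can_apply:
            self.state = "READY"
            return "YES"
        self.state = "ABORTED"
        return "NO"

    def pre_commit(self):
        """Phase 2: record that everyone voted Yes, then acknowledge."""
        self._require("READY")
        self.state = "PRECOMMITTED"
        return "ACK"

    def do_commit(self):
        """Phase 3: finalize the transaction."""
        self._require("PRECOMMITTED")
        self.state = "COMMITTED"
        return "COMMITTED"

    def abort(self):
        """Abort is idempotent, but never allowed after a commit."""
        if self.state == "COMMITTED":
            raise RuntimeError(f"{self.name}: cannot abort after commit")
        self.state = "ABORTED"

    def on_timeout(self):
        """Called by a timer when the coordinator goes silent."""
        if self.state == "PRECOMMITTED":
            self.state = "COMMITTED"  # PreCommit seen, so commit
        elif self.state in ("INIT", "READY"):
            self.state = "ABORTED"  # no PreCommit seen, so abort
        return self.state
```

The `on_timeout` method is where the non-blocking behavior lives, and also where the partition problem comes from: it decides using local state only.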

Consistency vs Availability Trade-offs in 3PC

People sometimes treat 3PC as if it solves the consistency-availability tradeoff. It does not. Here is why.

The CAP Theorem Context

The CAP theorem says a distributed system can give you at most two of three things: consistency, availability, and partition tolerance. Since partitions are unavoidable in practice, you are really choosing between consistency and availability.

3PC tries to stay consistent without the blocking problem 2PC has. Under some failure assumptions, it works. But when a partition hits during PreCommit, the whole thing breaks. Participants who got PreCommit commit after timeout. Those who missed it abort. The system diverges into two inconsistent groups.

This is not a bug in 3PC. It is just what happens when you hit a partition. 3PC cannot keep everyone available and consistent at the same time. In this scenario it keeps making progress and gives up consistency, which is a trade CAP allows. But 3PC does not dodge the trade-off. It just shifts where the conflict shows up compared to 2PC.

Why 3PC Does Not Solve CAP

The misconception is widespread. People hear “non-blocking” and assume that means 3PC is safe under all conditions. It is not. The difference from 2PC is that 3PC avoids blocking. It does not avoid inconsistency when partitions occur.

Non-blocking and consistency are not the same property. 3PC stays non-blocking under certain failure assumptions. It does not stay consistent under all failure scenarios.

Here is the concrete failure mode. During the window between the coordinator sending PreCommit and participants receiving it, a partition can split the participant group. Those on the side that received PreCommit will eventually commit. Those on the other side will abort after timeout. The transaction is now in an inconsistent state: some participants committed, others rolled back. CAP does not say this cannot happen; it says you cannot have both consistency and availability during a partition. 3PC lets participants who received PreCommit proceed, which keeps them from blocking but costs consistency. It does not prevent the split from happening, and it does not give you availability in the CAP sense during that split.

The CAP theorem does not care what your protocol claims to guarantee. It cares about actual system behavior during partitions. 3PC’s non-blocking property is a liveness improvement over 2PC under certain assumptions, but those assumptions break down during network partitions. The protocol shifts the blocking problem to an inconsistency problem, which may or may not be preferable depending on your application’s needs. For systems that need strict consistency, neither 2PC nor 3PC provides availability during a partition. For systems that prefer availability, neither protocol is appropriate.

Practical Implications

If you need strict consistency, both 2PC and 3PC fail during partitions. The real choice is blocking (2PC) or potential inconsistency (3PC). Neither gives you availability in the CAP sense during a partition.

If availability is your goal, look at Saga or eventual consistency. These handle partitions without the lock-holding problems of 2PC or the partition-vulnerability of 3PC.

Quick Recap

3PC adds a PreCommit phase between 2PC’s voting and commit phases
The PreCommit phase lets participants recover when the coordinator crashes
3PC is non-blocking under assumptions of eventual network synchrony and bounded failures
In practice, 3PC is rarely used because its assumptions are hard to meet
Saga pattern is often preferred for long-running distributed transactions
2PC remains the most common protocol for short distributed transactions requiring atomicity

For more on distributed transactions, see Two-Phase Commit for the protocol that 3PC builds upon. To understand the broader consistency landscape, read Consistency Models. For handling failures without blocking, see the Saga Pattern and Outbox Pattern.

Interview Questions

1. Explain the blocking problem in Two-Phase Commit and why it occurs.

After Phase 1 (voting), participants enter the prepared state and hold locks
If the coordinator crashes in this state, participants cannot decide whether to commit or abort
They cannot commit (maybe coordinator decided to abort) and cannot abort (maybe coordinator decided to commit)
The result is indefinite blocking, holding resources and locks

2. How does Three-Phase Commit address the blocking problem?

3PC adds a PreCommit phase between voting and final commit
When a participant receives PreCommit, it knows all participants voted Yes and the coordinator was alive when it sent PreCommit
If the coordinator crashes after PreCommit, participants can safely commit without waiting
The extra phase provides the information needed for recovery decisions

3. What are the critical assumptions that 3PC relies on for its non-blocking property?

Eventual network synchrony: messages will eventually be delivered, even if delayed
Bounded node failures: nodes do not fail forever and eventually recover
No network partition after PreCommit: partition during Phase 2 can still cause inconsistency
If these assumptions are violated, 3PC loses its non-blocking advantage

4. What happens if a network partition occurs during the PreCommit phase in 3PC?

Participants that received PreCommit will eventually commit after timeout
Participants that did not receive PreCommit will abort after timeout
This results in inconsistent outcomes across participant groups
3PC's non-blocking guarantee does not hold during network partitions

5. Compare 2PC and 3PC in terms of message complexity and round trips.

2PC requires 2 phases with 2 round trips (Prepare, then Commit/Abort)
3PC requires 3 phases with 3 round trips (CanCommit, PreCommit, DoCommit)
The extra round trip adds latency, especially problematic for high-throughput systems
Message complexity is higher in 3PC due to the additional PreCommit phase and ACK messages

6. Why is 3PC rarely used in production systems despite its theoretical advantages?

Network synchrony assumptions are hard to meet in real systems, especially WANs
The improvement over 2PC is marginal: blocking only occurs under specific failure scenarios
Saga pattern is more practical for long-running transactions
Most systems use timeouts and manual intervention for the rare blocking cases

7. Under what conditions can 3PC produce inconsistent outcomes?

Network partition during the PreCommit phase
Different participants receive PreCommit at different times due to message delays
Participants that received PreCommit commit, others abort after timeout
This violates atomicity and consistency guarantees

8. How does the Saga pattern differ from 3PC in handling distributed transactions?

Saga uses compensating transactions for rollback instead of atomic rollback
Saga is non-blocking by design and works well for long-running transactions
Saga provides eventual consistency rather than strict atomicity
3PC attempts strict atomicity but can still produce inconsistencies during partitions

9. What happens when a participant receives PreCommit but the coordinator crashes before DoCommit?

The participant knows all participants voted Yes and the coordinator was alive
After a timeout waiting for DoCommit, the participant can safely commit
This is the key non-blocking property of 3PC
The participant has sufficient information to make a safe decision independently

10. Explain the relationship between 3PC and the FLP impossibility result.

FLP proves that in a fully asynchronous system where even one process may crash, no deterministic consensus protocol can guarantee both safety and termination
3PC's non-blocking property relies on eventual synchrony assumptions (not fully asynchronous)
If the network can partition forever, 3PC cannot guarantee non-blocking behavior
3PC moves the assumption from "no failures" to "eventual message delivery"

11. What is the role of timeouts in 3PC recovery decisions?

Timeouts trigger recovery decisions when expected messages do not arrive
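In Phase 1, the coordinator treats a missing vote as No and aborts; a participant that voted Yes and never sees PreCommit also aborts
In Phase 2, a participant that received PreCommit and times out waiting for DoCommit commits
In Phase 3, a coordinator that times out waiting for a Committed reply assumes the participant will finish the commit on its own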
Timeout values must be set based on expected network delay characteristics

12. Can 3PC guarantee availability under network partitions? Explain why or why not.

No, 3PC cannot guarantee availability during network partitions
During partitions after PreCommit, some participants commit while others abort
This violates consistency, so the system cannot be both consistent and available (CAP)
3PC is designed for consistency with non-blocking under specific failure assumptions

13. Describe a scenario where 2PC blocks but 3PC does not.

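Coordinator sends Prepare (CanCommit in 3PC) to all participants and receives Yes votes from all
In 2PC: the coordinator crashes before sending the commit decision, so participants sit in the prepared state and block indefinitely
In 3PC: the coordinator crashes after sending PreCommit but before DoCommit, so participants can commit after timeout
If the 3PC coordinator crashes before sending PreCommit instead, participants abort after timeout, so they still do not block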
The key difference is the information available to participants after PreCommit

14. What happens if a participant crashes during the PreCommit phase?

If the crashed participant never received PreCommit, other participants can commit after timeout
If the crashed participant received PreCommit but crashed before ACK, recovery becomes ambiguous
The coordinator cannot distinguish between "crashed before ACK" and "ACK lost in transit"
This requires manual intervention or additional protocol machinery to resolve

15. How does 3PC handle the scenario where a participant votes No in Phase 1?

If any participant votes No, the coordinator immediately sends Abort to all participants
No blocking occurs at this stage since the decision is to abort
Participants that already prepared can release their locks upon receiving Abort
This is identical to 2PC's handling of a No vote

16. What is the difference between 3PC's non-blocking property and 2PC's blocking behavior?

2PC blocks when coordinator crashes in prepared state because participants cannot proceed without the coordinator's decision
3PC non-blocking emerges because the PreCommit phase transfers sufficient state to participants
With PreCommit, participants know all voted Yes and coordinator was alive, enabling autonomous decisions
The distinction is information distribution: 3PC participants have enough context to decide without the coordinator

17. How would you modify 3PC to handle network partitions more gracefully?

Add a quorum-based commit rule requiring majority acknowledgment before DoCommit
Implement partition detection with minority side automatically aborting
Use a witness or observer node pattern to track global commit state
Consider integrating with Paxos or Raft for coordinator recovery during partitions

18. What are the timeout value considerations when implementing 3PC?

Timeout too short: participants abort prematurely when network is just slow
Timeout too long: blocking duration becomes unacceptable when coordinator truly fails
Phase-dependent timeouts: PreCommit timeout should be shorter than CanCommit timeout
Adaptive timeouts based on historical network latency patterns improve reliability

19. Compare 3PC's approach to coordinator recovery with Raft's leader election.

Both aim to recover from coordinator/leader failures without blocking
3PC relies on participants autonomously deciding based on PreCommit state
Raft elects a new leader who then determines continuation based on log state
3PC can commit autonomously while Raft requires new leader to drive progress
Raft provides stronger guarantees but requires more infrastructure

20. In what scenarios would you choose 2PC over 3PC despite 3PC's non-blocking advantage?

When network assumptions do not guarantee eventual synchrony (unreliable networks)
When simplicity matters more than theoretical liveness improvements
When the probability of coordinator crash during prepared state is acceptably low
When extra round trip latency is unacceptable for high-throughput systems
When manual intervention for rare blocking scenarios is acceptable
