Synchronous Replication: Data Consistency Across Nodes

Learn about synchronous replication patterns in distributed databases, including Quorum-based replication and write-ahead log shipping for zero RPO.

published: March 24, 2026 reading time: 20 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Synchronous replication ensures data durability across distributed database nodes by blocking writes until replicas confirm receipt. The primary waits for a quorum of replicas to acknowledge each write before returning success to the client, which gives zero RPO but adds latency. Quorum-based systems like Google Spanner and CockroachDB require majority confirmation and let you lose floor(N/2) nodes and still maintain quorum. This approach works well for financial transactions and inventory systems where data loss is unacceptable, but cross-datacenter deployments face real latency constraints.

Introduction

Synchronous replication is a replication strategy where the primary (leader) node waits for confirmation that writes have been successfully applied to a quorum of replicas before acknowledging the write to the client. This ensures that data is durable across multiple nodes before the write is considered complete, making it ideal for workloads where zero data loss is non-negotiable.

Unlike asynchronous replication, which returns immediately after the local write, synchronous replication introduces latency proportional to the round-trip time between the primary and its replicas. In exchange, it provides strong consistency guarantees and eliminates scenarios where a successful write is silently lost due to a node failure.

This document covers how synchronous replication works under the hood, the different protocols and configurations available (including quorum-based replication and semi-synchronous replication), and the practical trade-offs you need to understand before deploying it in production.

Core Concepts

Synchronous replication blocks the commit operation until replicas confirm they have written the data. The primary sends the transaction to replicas, waits for acknowledgment, then returns success to the client.

sequenceDiagram
    participant Client
    participant Primary
    participant Replica1
    participant Replica2

    Client->>Primary: BEGIN; UPDATE...; COMMIT
    Primary->>Replica1: Send WAL entry
    Primary->>Replica2: Send WAL entry
    Replica1-->>Primary: ACK received
    Replica2-->>Primary: ACK received
    Primary-->>Client: COMMIT confirmed

The protocol varies. Some systems use write-ahead log (WAL) shipping. PostgreSQL uses WAL; MySQL Group Replication uses certification.

The Cost: Latency

Every synchronous write waits for a network round trip to at least one replica. In a single-datacenter setup, this latency might be 1-5ms. Cross-datacenter latency jumps to 20-100ms or more depending on geography.

# Synchronous replication adds latency
start = time.time()
result = db.execute("UPDATE accounts SET balance = ? WHERE id = ?", new_balance, account_id)
# This blocks until replica confirms
end = time.time()
print(f"Write latency: {end - start}ms")  # Typically 5-50ms with sync repl

This latency is the reason most systems default to async. For critical data, though, the durability guarantee is worth the cost.

Quorum-Based Replication

Modern synchronous replication often uses quorum-based approaches. With a quorum, you write to multiple nodes but only wait for a majority to confirm.

graph TD
    subgraph Quorum Write
        Client1[Client] --> Write[Write to N nodes]
        Write --> Q{Quorum?}
        Q -->|Write confirmed| Success[Success to Client]
    end

    subgraph "3 Node Cluster"
        N1[(Node 1)]
        N2[(Node 2)]
        N3[(Node 3)]
    end

    Write --> N1
    Write --> N2
    Write --> N3
    N1 --> Q
    N2 --> Q
    N3 --> Q

With 3 nodes, you need 2 confirmations (majority). With 5 nodes, you need 3. The quorum provides fault tolerance: you can lose floor(N/2) nodes and still have a majority to confirm writes.

# Quorum calculation
def required_quorum(total_nodes):
    return (total_nodes // 2) + 1

# 3 nodes require 2 acks
print(required_quorum(3))  # 2

# 5 nodes require 3 acks
print(required_quorum(5))  # 3

# 7 nodes require 4 acks
print(required_quorum(7))  # 4

Google Spanner uses this approach. So does CockroachDB. Both provide strong consistency across globally distributed nodes.

Write-Your-Write Consistency

Synchronous replication gives you write-your-write consistency by default. If a client writes data and then immediately reads it, the read will see that write. No special handling needed.

-- With synchronous replication, this always works
BEGIN;
UPDATE users SET email = 'new@example.com' WHERE id = 1;
COMMIT;

-- Immediate read returns the new email
SELECT email FROM users WHERE id = 1;
-- Returns: 'new@example.com'

With async replication, that immediate read might hit a replica that has not yet received the update. You would see the old email.

Single-Leader vs Multi-Leader

Synchronous replication works with both single-leader and multi-leader topologies.

Single-leader is simpler. All writes go to one primary. The primary coordinates synchronous replication to followers. PostgreSQL streaming replication in synchronous mode uses this topology.

Multi-leader allows writes to any node. Each leader synchronously replicates to other leaders. This is more complex because concurrent writes to the same row can conflict.

graph TD
    subgraph Single Leader
        L1[Leader] --> F1[Follower 1]
        L1 --> F2[Follower 2]
    end

    subgraph Multi Leader
        L3[Leader A] <--> L4[Leader B]
        L4 <--> L5[Leader C]
        L5 <--> L3
    end

Most production systems use single-leader synchronous replication. Multi-leader conflicts are genuinely hard to handle correctly.

Configuring Synchronous Replication

PostgreSQL

PostgreSQL uses Write-Ahead Log (WAL) shipping for synchronous replication. When a transaction commits on the primary, the WAL records stream immediately to the configured standby servers. The primary blocks until at least one standby confirms it has received and written the WAL records to disk. The commit does not complete until that confirmation arrives.

The synchronous_commit parameter controls the durability level. Setting it to on (the default) forces the primary to wait for acknowledgments from the specified standbys before telling the client the write succeeded. The synchronous_standby_names parameter lists the standby servers that participate in synchronous replication. You can use priority-based naming to prefer certain standbys over others, which is useful when some replicas are geographically closer.

The query against pg_stat_replication shows the health of each replication connection. The sent_lsn, write_lsn, flush_lsn, and replay_lsn columns track WAL as it moves through the pipeline — sent to the replica, written to its WAL file, flushed to disk, and finally replayed. Subtracting replay_lsn from sent_lsn gives you the replication lag in bytes.

-- Enable synchronous replication
ALTER SYSTEM SET synchronous_commit = on;
ALTER SYSTEM SET synchronous_standby_names = 'replica1,replica2';

-- Verify status
SELECT client_addr, state, sent_lsn, write_lsn,
       flush_lsn, replay_lsn,
       sent_lsn - replay_lsn AS replication_lag
FROM pg_stat_replication;

MySQL Group Replication

MySQL Group Replication takes a different path. It uses a certification-based protocol instead of WAL shipping. When a transaction commits on one member, it broadcasts the changes to the group. Other members run a certification process that checks whether concurrent transactions modified the same row. If there is no conflict, the transaction is certified and applied. This approach handles multi-leader topologies naturally because conflicts surface at commit time rather than being prevented at write time.

The configuration shown switches MySQL into multi-primary mode with update-everywhere checks enabled. In single-primary mode (the default), all writes go to one node and replicate synchronously to secondaries. In multi-primary mode, any member can accept writes, and the certification layer keeps things consistent. The group_replication_enforce_update_everywhere_checks setting turns on conflict detection for multi-primary mode, which adds overhead but lets you distribute writes across all nodes.

MySQL Group Replication requires the group_replication plugin to be installed and configured. The performance_schema.replication_group_members view shows the current state of each member, which helps when you need to figure out whether all nodes are reachable.

-- Configure group replication as synchronous
SET GLOBAL group_replication_single_primary_mode = OFF;
SET GLOBAL group_replication_enforce_update_everywhere_checks = ON;

START GROUP_REPLICATION;

CockroachDB

CockroachDB builds synchronous replication into its core using the Raft consensus algorithm. PostgreSQL and MySQL treat sync replication as a configurable add-on. CockroachDB treats it as a fundamental feature. Data splits into ranges (typically 64MB each), and each range has its own Raft group. Writes must achieve consensus within that Raft group before being acknowledged.

The default setup writes data to 3 range replicas and waits for 2 confirmations. That is the quorum. If a node fails, the remaining two replicas keep serving reads and can elect a new leader. Because each range runs independently, CockroachDB can keep going even when some ranges have lost quorum, as long as other ranges are fine.

The SHOW EXPERIMENTAL_RANGES command shows how data spreads across your cluster. Each range lists its current replicas and leader. In production, you would distribute range replicas across failure domains (different racks, zones, or datacenters) so that losing one rack does not take out a majority. CockroachDB ships with replication always on, always consistent. You do not configure synchronous replication explicitly. Latency is handled through placement-aware replica selection.

-- CockroachDB uses Range replication by default
-- 3-way synchronous replication is built in
SHOW EXPERIMENTAL_RANGES FROM TABLE users;

Semi-Synchronous Replication

Pure synchronous replication can block forever if a replica goes down. Semi-synchronous replication offers a middle ground: wait for at least one replica, but do not block if none respond.

graph TD
    P[Primary] --> R1[Replica 1]
    P --> R2[Replica 2]

    R1 -->|confirms| P
    P -->|after 1 ack| Client[Client notified]

    R2 -.->|timeout| P

PostgreSQL supports this via the synchronous_commit = remote_apply setting. MySQL has similar functionality with semi-sync replication plugins.

When Synchronous Replication Makes Sense

Synchronous replication is not always the answer. Use it when:

Financial transactions: Ledgers, payments, anything where data loss is unacceptable
Inventory systems: You cannot oversell because a replica was behind
Regulatory requirements: Some compliance frameworks mandate zero data loss (RPO = 0)
Critical metadata: User accounts, permissions, access control lists

# Example: Synchronous replication for financial transactions
def transfer_funds(from_account, to_account, amount):
    with db.transaction():
        # These writes confirm synchronously
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                   amount, from_account)
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                   amount, to_account)
        # Transaction only commits when replicas confirm

Synchronous Replication Failure Flow

Understanding what happens when components fail is critical for designing resilient systems.

flowchart TD
    Start[Write Request] --> Primary[Primary receives write]
    Primary --> Send1{Send to Replica 1}
    Primary --> Send2{Send to Replica 2}

    Send1 -->|ACK received| Wait2{Wait for Replica 2}
    Send2 -->|ACK received| Wait1{Wait for Replica 1}

    Send1 -->|timeout| Retry1{Retry timeout?}
    Send2 -->|timeout| Retry2{Retry timeout?}

    Retry1 -->|exceeded| Block[Primary blocks write]
    Retry2 -->|exceeded| Block

    Wait2 -->|both acks| Success[Write confirmed to client]
    Wait1 -->|both acks| Success

    Block -->|Replica recovers| Resume[Resume waiting]
    Block -->|replica down long| FailWrite[Write fails - transaction rejected]

Failure scenarios:

Scenario	Behavior	Impact
One replica times out	Primary blocks until timeout exceeds threshold	Write latency increases
Both replicas timeout	Write fails, transaction rejected	Client must retry
Primary fails before commit	No data loss (no commit until replicas confirm)	Automatic failover to replica
Network partition	Primary cannot reach quorum	Primary blocks or fails

Common Pitfalls

Configuring sync with only one replica: If you have synchronous replication configured but only one replica, losing that replica blocks all writes. Always have at least two replicas for synchronous mode.
Mixing sync and async modes: Some databases let you configure which replicas use sync vs async. Mixing modes creates confusing behavior where some data is durable and some is not.
Ignoring latency during peak load: During high write throughput, synchronous replication latency compounds. Test under load, not just idle conditions.
Not testing failover: If you have never tested synchronous failover, you do not know if it works correctly. Schedule regular failover drills.
Geographic distribution: Synchronous replication across continents is usually impractical. If you need strong consistency with geographic distribution, consider synchronizing to a nearby replica and async-replicating to distant ones.

Split-Brain Prevention

Synchronous replication helps prevent split-brain scenarios. If the primary fails and a replica promotes, that replica has all committed data because writes only confirmed after replicas received them.

But you still need quorum-based failover logic. Multiple replicas might think the primary is down simultaneously. Use consensus-based leader election to ensure only one replica promotes.

graph TD
    M[Monitor] --> P1{Primary alive?}
    P1 -->|no| E[Election]
    E --> Q{Quorum agrees?}
    Q -->|yes| P2[Promote replica]
    Q -->|no| E
    P2 --> NM[New Primary]

Quick Recap

Synchronous replication blocks until replicas confirm writes
Latency cost: 1-50ms per write depending on topology
Quorum-based replication provides fault tolerance with majority confirmation
Best for: financial data, inventory, anything requiring zero data loss
Monitor replication lag, connection counts, and disk usage
Test failover procedures before you need them

For more on replication strategies, see Database Replication for a broader overview. To understand the consistency trade-offs, read Consistency Models. If you are evaluating replication against other availability patterns, see Availability Patterns.

Trade-Off Comparison: Sync vs Semi-Sync vs Async

Dimension	Synchronous	Semi-Synchronous	Asynchronous
Durability Guarantee	Full (all replicas confirm)	Partial (at least one)	Minimal (primary only)
Write Latency	Highest (waits for all)	Medium (waits for one)	Lowest (no wait)
Blocking Risk	High (any replica down)	Low (only primary down)	None
Data Loss Risk	None	Minimal	Possible
Availability	Lower	Medium	Highest
Typical RPO	Zero	Near-zero	Lag-dependent
Cross-DC Support	Poor (high latency)	Moderate	Excellent

Monitoring Synchronous Replication

Watch these metrics:

-- PostgreSQL: Check replication health
SELECT client_addr, state, sent_lsn, write_lsn,
       flush_lsn, replay_lsn,
       replay_lag,
       flush_lag,
       write_lag
FROM pg_stat_replication;

-- MySQL: Group Replication status
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

Key metrics to track:

Metric	Warning	Critical
Replication lag	> 1s	> 10s
Replica state	—	off “streaming”
Disk usage	> 70%	> 85%
Connection count	> 80% max	> 95% max

Interview Questions

1. What is synchronous replication and how does it differ from asynchronous replication?

Expected answer points:

Synchronous replication blocks the commit operation until all replicas confirm they have written the data
Async replication returns success to the client immediately without waiting for replica confirmations
The key trade-off is latency (sync) vs potential data loss (async)
Sync provides zero RPO; async RPO depends on replication lag

2. Why does synchronous replication add latency to write operations?

Expected answer points:

Each write must be sent to replicas and the primary must wait for acknowledgments before confirming to the client
Network round-trip time to at least one replica is added to every synchronous write
Single-datacenter latency: 1-5ms typical
Cross-datacenter latency: 20-100ms or more depending on geography
Latency compounds during high write throughput periods

3. How does quorum-based replication work in synchronous systems?

Expected answer points:

Writes are sent to all nodes but only a majority (quorum) must confirm
Quorum size = floor(N/2) + 1 where N is total nodes
With 3 nodes, 2 confirmations needed; with 5 nodes, 3 needed
Provides fault tolerance: can lose floor(N/2) nodes and still have a majority
Examples: Google Spanner, CockroachDB use quorum-based sync replication

4. What is write-your-write consistency and how does synchronous replication guarantee it?

Expected answer points:

Write-your-write consistency means if a client writes data and immediately reads it, the read will see that write
Synchronous replication confirms writes only after replicas acknowledge receipt
No special client-side handling required
With async replication, immediate reads might hit a replica that has not yet received the update

5. What are the advantages and disadvantages of single-leader vs multi-leader topologies?

Expected answer points:

Single-leader: simpler topology, all writes go to one primary which coordinates replication to followers
Multi-leader: writes can go to any node, each leader replicates to other leaders
Multi-leader complexity: concurrent writes to the same row can conflict
Most production systems use single-leader synchronous replication
PostgreSQL streaming replication in synchronous mode uses single-leader topology

6. How do you configure synchronous replication in PostgreSQL?

Expected answer points:

Enable with: ALTER SYSTEM SET synchronous_commit = on;
Specify standby servers: ALTER SYSTEM SET synchronous_standby_names = 'replica1,replica2';
Monitor using pg_stat_replication view checking client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
Calculate replication lag: sent_lsn - replay_lsn

7. What is semi-synchronous replication and when would you use it?

Expected answer points:

Semi-synchronous waits for at least one replica to confirm, but does not block if no response
Addresses the problem of pure sync blocking forever if a replica goes down
PostgreSQL supports via synchronous_commit = remote_apply setting
MySQL has similar functionality with semi-sync replication plugins
Trade-off: slightly lower durability guarantee in exchange for better availability

8. What happens when a replica times out during synchronous replication?

Expected answer points:

Primary blocks the write until timeout threshold is exceeded
If timeout exceeds threshold, the primary retries or fails the write
Write latency increases during replica timeout
If both replicas timeout, the transaction is rejected and client must retry
When replica recovers, normal operation resumes

9. How does synchronous replication help prevent split-brain scenarios?

Expected answer points:

Writes only confirm after replicas receive them
If primary fails and a replica promotes, that replica has all committed data
However, quorum-based failover logic is still needed
Multiple replicas might think primary is down simultaneously
Consensus-based leader election ensures only one replica promotes

10. What are the key metrics to monitor in synchronous replication?

Expected answer points:

Replication lag: warning > 1s, critical > 10s
Replica state: should be "streaming"
Disk usage on replicas: warning > 70%, critical > 85%
Connection count: warning > 80% max, critical > 95% max
LSN positions: sent_lsn, write_lsn, flush_lsn, replay_lsn in PostgreSQL

11. What common pitfalls should you avoid when implementing synchronous replication?

Expected answer points:

Configuring sync with only one replica: losing that replica blocks all writes
Mixing sync and async modes: creates confusing behavior where some data is durable and some is not
Ignoring latency during peak load: sync replication latency compounds under load
Not testing failover: regular failover drills are essential
Geographic distribution: sync across continents is usually impractical

12. How does CockroachDB implement synchronous replication?

Expected answer points:

CockroachDB uses the Raft consensus algorithm for synchronous replication
Default configuration writes to 3 nodes and waits for 2 confirmations
Uses Range replication (data is split into ranges)
Each range has its own Raft group for consistency
Provides strong consistency across globally distributed nodes

13. What is the typical RPO (Recovery Point Objective) for synchronous replication?

Expected answer points:

Synchronous replication provides zero RPO
No data loss because writes only commit after replicas confirm
Semi-synchronous provides near-zero RPO
Async RPO is lag-dependent (could be seconds to hours)
RPO = 0 is required for financial transactions, inventory systems, and some compliance frameworks

14. What happens to writes during a network partition with synchronous replication?

Expected answer points:

If primary cannot reach quorum, it blocks or fails writes
The system becomes unavailable for writes until partition heals
This is the availability trade-off for strong consistency
During a partition, clients may receive timeouts or errors
Read operations may still work depending on configuration

15. When is synchronous replication not the right choice?

Expected answer points:

When write latency tolerance is very low
When data loss is acceptable (non-critical data)
When geographic distribution spans continents (latency impractical)
When highest availability is required (sync blocking is unacceptable)
When write throughput is extremely high and cost/speed trade-off does not justify sync

16. How does MySQL Group Replication differ from PostgreSQL synchronous replication?

Expected answer points:

MySQL Group Replication uses certification-based approach, not WAL shipping
MySQL can run in single-primary or multi-primary mode
Configure multi-primary: SET GLOBAL group_replication_single_primary_mode = OFF;
Enable update everywhere: SET GLOBAL group_replication_enforce_update_everywhere_checks = ON;
PostgreSQL uses streaming replication with WAL shipping

17. What is the relationship between synchronous replication and CAP theorem?

Expected answer points:

Synchronous replication favors Consistency over Availability in the CAP theorem
During network partitions, sync replication systems become unavailable rather than inconsistent
Async replication favors Availability over Consistency
This is why sync replication is chosen for critical data where data loss is unacceptable

18. How do you calculate the required quorum for a given number of nodes?

Expected answer points:

Required quorum formula: (total_nodes // 2) + 1
3 nodes require 2 confirmations
5 nodes require 3 confirmations
7 nodes require 4 confirmations
Majority requirement ensures at most one partition can have quorum

19. What happens if the primary fails before a commit is confirmed in synchronous replication?

Expected answer points:

No data loss occurs because the commit did not complete
Transaction is rejected since replicas did not confirm
Client receives failure and must retry
Automatic failover to a replica that has all committed data occurs
This is a key advantage over async replication

20. What use cases specifically benefit from synchronous replication?

Expected answer points:

Financial transactions: ledgers, payments, anything where data loss is unacceptable
Inventory systems: cannot oversell because a replica was behind
Regulatory requirements: compliance frameworks that mandate zero data loss (RPO = 0)
Critical metadata: user accounts, permissions, access control lists
Any scenario where the cost of data loss outweighs the latency cost of synchronous writes

Conclusion

Synchronous replication is the right choice when consistency cannot be compromised and your network is fast and reliable. It guarantees that every write is durable on a quorum of replicas before acknowledging the client, eliminating ghost reads and split-brain scenarios at the cost of increased write latency. In practice, most production systems use semi-synchronous replication as a pragmatic middle ground—guaranteeing at least one replica is in sync while keeping latency manageable.

Use synchronous replication for financial transactions, inventory systems, and any workload where stale reads would cause data corruption or business loss. Avoid it for geo-distributed deployments with high network latency, write-heavy workloads that need global performance, or systems where availability must be maintained even during network partitions.

Monitoring synchronous replication requires tracking replica lag (even in sync mode), acknowledgment round-trip times, quorum health, and failover timing. When a synchronous replica falls behind or fails, the system must either pause writes or fall back to asynchronous mode—design your alerting and runbooks accordingly.

Introduction

Core Concepts

The Cost: Latency

Quorum-Based Replication

Write-Your-Write Consistency

Single-Leader vs Multi-Leader

Configuring Synchronous Replication

PostgreSQL

MySQL Group Replication

CockroachDB

Semi-Synchronous Replication

When Synchronous Replication Makes Sense

Synchronous Replication Failure Flow

Common Pitfalls

Split-Brain Prevention

Quick Recap

Trade-Off Comparison: Sync vs Semi-Sync vs Async

Monitoring Synchronous Replication

Interview Questions

Further Reading

Conclusion

Category

Tags

Related Posts

Asynchronous Replication: Speed and Availability at Scale

Google Spanner: Globally Distributed SQL at Scale

Consistency Models in Distributed Systems: Complete Guide