Asynchronous Replication: Speed and Availability at Scale
Learn how asynchronous replication works in distributed databases, including eventual consistency implications, lag monitoring, and practical use cases where speed outweighs strict consistency.
Asynchronous Replication: Speed and Availability at Scale
Introduction
Asynchronous replication is the workhorse of distributed databases. Writes confirm immediately on the primary. Replicas receive the changes eventually, usually milliseconds later but sometimes much longer. This gives you speed and availability, but it introduces a window where data can be lost if the primary fails.
Most production databases run async by default. The performance benefit is real. The trade-off is accepting a small probability of losing recent writes during failover. For many applications, this is acceptable. For others, it is not.
How Asynchronous Replication Works
The primary processes a write, confirms it to the client, and sends the change to replicas in the background. The client does not wait for replica confirmation.
sequenceDiagram
participant Client
participant Primary
participant Replica
Client->>Primary: BEGIN; UPDATE...; COMMIT
Primary->>Primary: Write to local WAL
Primary-->>Client: COMMIT confirmed
Note over Primary,Replica: (Background replication)
rect rgb(200, 200, 200)
Primary->>Replica: Send WAL entry
Replica-->>Primary: ACK
end
This non-blocking behavior is why async replication is fast. Write latency is essentially local disk latency plus a small amount of processing. There is no waiting for remote replica confirmation.
The Replication Lag Problem
With async replication, replicas fall behind the primary. This lag can be seconds or minutes depending on load, network conditions, and replica performance.
# This sequence can fail with async replication
def update_and_read(user_id, new_email):
db.execute("UPDATE users SET email = ? WHERE id = ?", new_email, user_id)
# Write confirmed immediately
# But replica might not have the update yet
user = db_replica.execute("SELECT email FROM users WHERE id = ?", user_id)
# Might return old email if replica is lagging
return user['email']
This is the fundamental tension. Async replication is fast, but reads from replicas can return stale data.
Statement-Based vs WAL-Based Replication
Different databases use different mechanisms.
Statement-based replication sends the SQL statements to replicas. MySQL’s original replication used this approach.
-- Primary executes:
UPDATE users SET email = 'new@example.com' WHERE id = 1;
-- Replica executes the same statement:
UPDATE users SET email = 'new@example.com' WHERE id = 1;
The problem: nondeterministic functions like NOW() or RAND() produce different results on different nodes.
WAL-based replication sends the binary WAL entries. PostgreSQL uses this. WAL entries are deterministic because they contain the actual bytes that changed.
# WAL entry contains the actual bytes
wal_entry = {
'type': 'UPDATE',
'table': 'users',
'key': 'id=1',
'old_data': {'email': 'old@example.com'},
'new_data': {'email': 'new@example.com'},
'timestamp': 1711234567
}
Row-based replication sends the actual row changes. MySQL’s row-based replication combines the benefits of both.
Eventual Consistency
Async replication provides eventual consistency. Given enough time without new writes, all replicas will converge to the same state.
flowchart LR
P[Primary] -->|immediate| R1[Replica 1]
P -->|immediate| R2[Replica 2]
R1 -.->|eventually consistent| Same[Same State]
R2 -.->|eventually consistent| Same
The “eventually” part is important. During the lag window, reads from different replicas can return different results. This is observable inconsistency, not just a theoretical concern.
# User experience with eventual consistency
def get_user_email(user_id):
# Route to random replica
replica = random.choice([replica1, replica2, replica3])
return replica.execute("SELECT email FROM users WHERE id = ?", user_id)
# Two consecutive calls might return different results
email1 = get_user_email(123) # 'old@example.com'
email2 = get_user_email(123) # 'new@example.com' if replica2 has newer data
Use Cases Where Async Makes Sense
Async replication is the right choice when:
- Web sessions and preferences: User profile updates can lag a few seconds without causing real problems
- Analytics and reporting: Reads for dashboards do not need millisecond freshness
- Social media feeds: Timeline updates do not need to be immediately consistent
- Logging and metrics: Time-series data tolerates small gaps
- Geographic distribution: Replicating across continents with sync would be prohibitively slow
# Good async use case: user preferences
def update_user_preference(user_id, key, value):
# Write immediately, replicate async
db.execute("UPDATE preferences SET value = ? WHERE user_id = ? AND key = ?",
value, user_id, key)
# User sees update instantly on primary
# Other replicas catch up shortly
# Bad async use case: account balance
def transfer_funds(from_id, to_id, amount):
# This needs synchronous replication for correctness
db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
amount, from_id)
db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
amount, to_id)
Monitoring Replication Lag
Lag monitoring is critical with async replication. You need to know how far behind your replicas are.
-- PostgreSQL: Check replication lag
SELECT client_addr, state,
sent_lsn, replay_lsn,
sent_lsn - replay_lsn AS lag_bytes
FROM pg_stat_replication;
-- MySQL: Check replica lag
SHOW REPLICA STATUS\G
# Look at: Seconds_Behind_Master, Relay_Log_Pos
# Application-level lag monitoring
def check_replica_health(replica):
last_replicated_ts = replica.get_last_replicated_timestamp()
current_ts = time.time()
lag_seconds = current_ts - last_replicated_ts
if lag_seconds > 30:
alert(f"Replica {replica.name} lag: {lag_seconds}s")
# Route reads away from this replica
remove_from_read_pool(replica)
Key metrics to track:
| Metric | Normal | Warning | Critical |
|---|---|---|---|
| Replication lag | < 1s | 1-10s | > 10s |
| Replica IO thread | Running | — | Stopped |
| Replica SQL thread | Running | — | Stopped |
| WAL backlog | < 100MB | 100MB-1GB | > 1GB |
The CAP Theorem and Async Replication
Async replication is the availability (A) choice in the CAP theorem trade-off. During a network partition, an async primary can continue accepting writes. Those writes might not survive if the primary fails before replicas receive them.
flowchart TD
subgraph Partition
P[Primary] --> R1[(Replica 1)]
P -.- X[(Partition)]
R2[(Replica 2)] -.- X
end
P -->|writes continue| Client1[Client]
R2 -.->|disconnected| P
If you need consistency during partitions, you need synchronous replication. If you need availability during partitions, async is your choice.
For more on this trade-off, see CAP Theorem and Consistency Models.
Reducing Lag in Async Setups
If lag is a problem, there are techniques to reduce it.
Upgrade replica hardware: Replicas that cannot keep up with the primary’s write rate will always lag. Give them faster disks and more CPU.
Reduce write workload: If the primary writes 10,000 transactions per second and replicas can only handle 8,000, lag will grow indefinitely.
Use parallel replication: Modern MySQL and PostgreSQL support parallel replica apply.
-- MySQL: Enable parallel replication
STOP REPLICA;
SET GLOBAL slave_parallel_workers = 8;
START REPLICA;
-- PostgreSQL: Increase wal_sender_timeout for long-running queries
ALTER SYSTEM SET wal_sender_timeout = '60s';
Cascade replication: Chain replicas so the primary does not feed every replica directly. This reduces primary load.
flowchart TD
P[Primary] --> R1[Replica 1]
R1 --> R2[Replica 2]
R1 --> R3[Replica 3]
R2 --> R4[Replica 4]
Read Routing with Async Replicas
Application code should handle stale reads gracefully when using async replicas.
# Read from replica with fallback to primary
def read_user(user_id):
try:
# Try replica first
replica = get_read_replica()
user = replica.execute("SELECT * FROM users WHERE id = ?", user_id)
return user
except ReplicaTooLagError:
# Fall back to primary if replica is too far behind
return primary.execute("SELECT * FROM users WHERE id = ?", user_id)
# Read-after-write with async replication
def update_and_read(user_id, new_email):
# Always read from primary after writes
db.execute("UPDATE users SET email = ? WHERE id = ?", new_email, user_id)
return primary.execute("SELECT email FROM users WHERE id = ?", user_id)
When Async Is Not Enough
Some data genuinely cannot tolerate async replication:
- Financial transactions: If you write a payment and the primary fails before replicas receive it, money disappears
- Inventory management: Overselling happens when replicas do not have the latest stock counts
- Locking operations: If you acquire a lock and the primary fails before it replicates, you have a zombie lock
For these cases, see Synchronous Replication for strong consistency options.
Real-world Failure Scenarios
Scenario 1: The Overwhelmed Replica
A replica running on undersized hardware falls behind during peak traffic. The primary continues accepting writes at 5,000 TPS while the replica can only apply 3,000. Over hours, the WAL backlog grows to 50GB. When the primary finally fails, the replica is promoted but 50GB of committed transactions are lost. The engineering team had set alert thresholds but the on-call engineer dismissed the warning, believing it was a transient load spike.
Scenario 2: The Partition That Wasn’t
During a network partition, an application continued writing to the primary. Without realizing replicas were unreachable, the team confirmed to users that orders were “saved.” When the partition healed, they discovered the primary had accumulated 2 hours of writes that now conflicted with the recovered replica’s state. Manual reconciliation took three days.
Scenario 3: The Untested Failover
A database administrator manually promoted a replica during what they thought was a primary hardware issue. The replica had 4 hours of lag due to a background analytics query consuming replica resources. Four hours of customer orders, profile updates, and payment confirmations were permanently lost. The RTO was 15 minutes, but the RPO was 4 hours - far exceeding the company’s 1-hour RPO commitment.
Scenario 4: The Replication Slot Catastrophe
A replica lost network connectivity for 6 hours. During that time, the primary’s replication slot was dropped (due to a misconfigured cleanup policy). When the replica reconnected, there was no way to resume replication - the WAL segments it needed had been purged. The team had to rebuild the entire replica from a new base backup, causing 8 hours of reduced read capacity.
Common Pitfalls / Anti-Patterns
-
Assuming replica data is current: After a write, do not assume replicas have the update immediately. Always read from primary for critical data after writes.
-
Not monitoring lag: Lag accumulates silently. By the time users complain, you might be hours behind. Monitor proactively.
-
Promoting a lagging replica: If a replica is 2 hours behind and you promote it, you just lost 2 hours of data. Always check lag before promoting.
-
Ignoring replication slot retention: If a replica falls behind and the replication slot is lost, you cannot recover without a full resync.
-
Testing in ideal conditions: Replication lag grows under load. Test with production-level write rates.
Quick Recap
- Async replication confirms writes immediately without waiting for replicas
- Replication lag means replicas can return stale data
- Best for: non-critical reads, analytics, geographically distributed systems
- Monitor lag actively and alert on threshold breaches
- Use read-after-write consistency when needed by reading from primary after writes
- For critical data requiring zero RPO, use synchronous replication
Read-Your-Writes Consistency Checklist
When using async replicas, ensuring read-after-write consistency requires explicit handling:
# Strategy 1: Read from primary after writes
def read_after_write(session, user_id, new_email):
session.execute(
"UPDATE users SET email = ? WHERE id = ?",
new_email, user_id
)
# Must read from primary to see our own write
return primary.execute(
"SELECT email FROM users WHERE id = ?",
user_id
)
# Strategy 2: Track LSN and wait for replication
def read_after_wait(session, user_id, new_email):
lsn = session.execute(
"UPDATE users SET email = ? WHERE id = ?",
new_email, user_id
)
# Wait for replica to catch up to our LSN
replica.wait_for_lsn(lsn, timeout=30)
return replica.execute(
"SELECT email FROM users WHERE id = ?",
user_id
)
# Strategy 3: Session pinning (sticky to primary)
def read_with_session_pin(session, user_id):
# Session always routes to primary for this user
session.set_pin(user_id, to_node='primary')
return session.execute(
"SELECT email FROM users WHERE id = ?",
user_id
)
| Strategy | Pros | Cons | Best For |
|---|---|---|---|
| Read from primary | Simple, always correct | Adds primary load | Write-heavy workloads |
| Wait for LSN | Works with any replica | Adds latency | Occasional consistency needs |
| Session pinning | Consistent experience | Reduces flexibility | User-specific data |
Trade-off Analysis
| Aspect | Asynchronous Replication | Synchronous Replication |
|---|---|---|
| Write Latency | Low (local disk + processing) | High (waits for replica confirmation) |
| Data Safety | Risk of losing writes during failover | No data loss if replica confirms |
| Availability | High - continues during partitions | Lower - blocked if replica unavailable |
| Replica Requirements | Can be geographically distant | Must be low-latency for performance |
| Use Case Fit | Read-heavy, eventual consistency OK | Critical data, zero RPO required |
| Failover Complexity | Higher - may lose recent writes | Lower - replica is up-to-date |
| Cross-DC Deployment | Feasible across continents | Impractical due to latency |
Interview Questions
Expected answer points:
- Synchronous replication waits for replica confirmation before acknowledging the write to the client
- Asynchronous replication confirms the write immediately after the primary records it locally, then sends changes to replicas in the background
- This difference makes async replication faster but introduces the possibility of data loss during failover
Expected answer points:
- Replication lag is the time difference between when a write is committed on the primary and when it is applied on a replica
- Reads from lagging replicas can return stale or outdated data
- The lag can range from milliseconds to minutes depending on network conditions, load, and replica performance
- Applications must handle potentially inconsistent reads during the lag window
Expected answer points:
- Statement-based: Sends SQL statements to replicas. Simple but problematic for nondeterministic functions like NOW() or RAND() that produce different results on different nodes
- WAL-based: Sends binary Write-Ahead Log entries containing actual changed bytes. Deterministic and compact. Used by PostgreSQL
- Row-based: Sends actual row changes. Combines benefits of both - deterministic and complete. Used by MySQL's row-based replication
Expected answer points:
- Eventual consistency guarantees that if no new writes are made, all replicas will eventually converge to the same state
- The "eventually" window is critical - during lag, different replicas can return different results for the same query
- This observable inconsistency is a real-world concern, not just theoretical
- Suitable for applications where strict immediate consistency is not required
Expected answer points:
- CAP theorem states you can only guarantee two of three: Consistency, Availability, and Partition tolerance
- Async replication chooses Availability (A) over Consistency (C) during network partitions
- The primary can continue accepting writes even if replicas are disconnected
- Risk: writes may be lost if the primary fails before replicas receive them
Expected answer points:
- Read from primary after writes: Route reads to the primary immediately following a write. Simple but adds primary load
- Track LSN and wait: After a write, track the Log Sequence Number and wait for the replica to catch up to that LSN before reading. Adds latency but works with any replica
- Session pinning: Pin a user's session to the primary for consistent reads. Reduces flexibility but provides consistent experience per user
Expected answer points:
- Replication lag (bytes/time): Difference between sent and replayed LSN. Warning: 1-10s, Critical: >10s
- Replica IO thread status: Must be running. Stopped thread means replication has stopped
- Replica SQL thread status: Must be running. Stopped thread means events are not being applied
- WAL backlog size: Warning: 100MB-1GB, Critical: >1GB
Expected answer points:
- If a replica is hours behind and gets promoted, all writes in that gap are permanently lost
- The new primary will have incomplete data compared to what was committed
- Always check lag metrics before promoting a replica
- Establish a maximum acceptable lag threshold (e.g., RPO) before allowing automatic failover
Expected answer points:
- Cascade replication chains replicas: Primary → Replica 1 → Replica 2 → Replica 3
- Reduces load on the primary, which no longer needs to feed every replica directly
- Trade-off: adds latency as changes propagate through the chain
- Useful for read-heavy deployments with many replicas across geographic regions
Expected answer points:
- If the primary fails after write confirmation but before replicas receive the change, that data is lost
- In financial systems, losing committed transactions (even small amounts) is unacceptable
- Money disappearing due to replication lag can cause regulatory and customer trust issues
- Financial systems typically require synchronous replication or consensus-based approaches
Expected answer points:
- Upgrade replica hardware: Faster disks and more CPU so replicas can keep up with the primary's write rate
- Reduce write workload: If primary writes 10,000 TPS and replicas handle only 8,000, lag grows indefinitely
- Enable parallel replication: Modern MySQL and PostgreSQL support applying events in parallel on replicas
- Cascade replication: Chain replicas to reduce primary load
Expected answer points:
- Statement-based: Replicas execute the same SQL statements as the primary. Can produce different results with nondeterministic functions
- Row-based: Replicas receive the actual row changes (before and after images). Deterministic and complete
- Row-based replication produces larger binary logs but avoids subtle replication bugs
- MySQL allows switching between modes based on workload
Expected answer points:
- Replication slots prevent the primary from removing WAL segments that replicas haven't yet received
- If a replica falls behind significantly and the replication slot is lost, the gap cannot be recovered
- Result: requires a full resync from the primary instead of resuming from where replication stopped
- Monitor replication slot status and ensure replicas stay connected
Expected answer points:
- Web sessions and preferences: User profile updates can lag seconds without real problems
- Analytics and reporting: Dashboard reads do not need millisecond freshness
- Social media feeds: Timeline updates tolerate eventual consistency
- Logging and metrics: Time-series data tolerates small gaps
- Geographic distribution: Cross-continent replication would be prohibitively slow with sync
Expected answer points:
- If a primary fails while replicas are lagging, promoted replicas will lose the uncommitted transactions
- RPO (Recovery Point Objective) is violated - data written but not yet replicated is lost
- Before promoting any replica, check its lag metrics against your acceptable RPO threshold
- If no replica is within acceptable lag, prefer manual reconciliation over automatic failover
Expected answer points:
- Replication slots are PostgreSQL features that prevent WAL segment removal until replicas confirm receipt
- If a replica is disconnected and the slot is dropped (due to misconfigured retention), WAL segments needed for recovery are purged
- Result: the replica cannot resume replication and must be rebuilt from a fresh base backup
- This can cause hours of reduced read capacity and service degradation
Expected answer points:
- In cascade replication, the primary feeds only one or two replicas directly, and those replicas feed others
- The primary no longer needs to maintain concurrent connections for every replica
- Network bandwidth on the primary is significantly reduced in large replica deployments
- Trade-off: adds propagation latency as changes flow through intermediate replicas
Expected answer points:
- WAL advantages: Deterministic (contains actual changed bytes), compact, works for any SQL operation
- WAL disadvantages: Binary format tied to specific database version, harder to inspect and debug
- Statement advantages: Human-readable, portable across MySQL versions
- Statement disadvantages: Nondeterministic functions cause divergence between primary and replica
Expected answer points:
- During the replication lag window, replicas are at different points in the transaction stream
- If queries are load-balanced across replicas randomly, consecutive reads can return different results
- This observable inconsistency is the price of async replication's speed
- Applications must either accept staleness or implement read-after-write consistency by routing critical reads to the primary
Further Reading
- Database Replication - The broader replication landscape covering sync, async, and semi-sync approaches
- Synchronous Replication - Strong consistency options when async is not sufficient
- Consistency Models - Understanding strong, eventual, and causal consistency
- Distributed Caching - Read-scaling patterns that complement replication
- Event-Driven Architecture - Alternative consistency models using event sourcing
- CAP Theorem - The theoretical foundation behind replication trade-offs
Conclusion
Async replication is the right choice for most read-heavy workloads. The key is understanding which data needs strong consistency and which can tolerate eventual consistency, then routing reads accordingly.
Category
Related Posts
CRDTs: Conflict-Free Replicated Data Types
Learn how CRDTs enable strongly consistent eventual consistency in distributed databases. Explore G-Counters, PN-Counters, LWW-Registers, OR-Sets, and practical applications.
Synchronous Replication: Data Consistency Across Nodes
Learn about synchronous replication patterns in distributed databases, including Quorum-based replication and write-ahead log shipping for zero RPO.
Database Replication: Master-Slave and Failover Patterns
Database replication explained: master-slave, multi-master, synchronous vs asynchronous strategies, failover patterns, and consistency.