Database Replication: Master-Slave and Failover Patterns
Database replication explained: master-slave, multi-master, synchronous vs asynchronous strategies, failover patterns, and consistency.
Database Replication: Master-Slave and Failover Patterns
Introduction
Database replication is the practice of maintaining multiple copies of the same data across different database servers. This approach delivers two primary benefits: high availability (if one server fails, others can take over) and read scaling (distributing queries across replicas reduces load on the primary). The choice of replication topology, synchronous versus asynchronous propagation, and failover strategy fundamentally shapes how an application behaves under failure conditions.
This guide covers the major replication models—master-slave and multi-master—then examines the consistency/durability trade-offs between synchronous and asynchronous approaches. You will also find configuration examples for PostgreSQL and MySQL, production failure scenarios, and an observability and security checklist to validate your setup.
When to Use / When Not to Use
Use replication when you need:
- Read scaling: Distribute SELECT queries across replicas to reduce primary load. Effective for read-heavy workloads where stale reads are acceptable.
- High availability: Automatic failover ensures continuity when the primary fails. Critical for applications that cannot tolerate extended downtime.
- Geographic distribution: Place replicas in different regions to reduce read latency for distributed users.
- Reporting and analytics: Offload read-intensive analytical queries to dedicated replicas, keeping the primary focused on OLTP workloads.
- Zero-downtime maintenance: Perform upgrades or maintenance on the primary by promoting a replica, then switching traffic back.
Avoid replication when:
- Write throughput is the bottleneck: Replication does not scale writes—only the primary accepts writes. If writes are your constraint, consider sharding or upgrading primary hardware.
- Strong consistency is required for every read: Async replicas lag. If your application cannot tolerate stale reads after writes, you need either synchronous replication or read-from-primary-after-write patterns.
- Your dataset exceeds a single server: Replication copies data, it does not partition it. For datasets too large for one server, you need sharding.
- You need full-text search or complex joins at scale: Replicas are copies of the same relational schema. For search-heavy workloads, a separate index like Elasticsearch is more appropriate.
- Your application is write-heavy with conflict-prone workloads: Multi-master creates write conflicts that require conflict resolution logic. The complexity is rarely worth it unless write availability absolutely demands it.
Bottom line: Replication solves read scale and HA. It does not solve write scale. If your bottleneck is writes, go back to the drawing board and evaluate sharding.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Primary failure without automatic failover | Writes fail, extended downtime | Implement automated failover, test failover regularly, have runbooks |
| Replication break due to divergent data | Replica stops replicating, falls behind | Use pt-table-checksum to detect drift, implement resync procedure |
| Network partition causing split-brain | Two primaries accept writes independently | Use quorum-based failover, implement fencing, test network partition scenarios |
| Replication lag buildup | Read replicas return stale data | Monitor lag, provision stronger replicas, switch to synchronous replication |
| Long-running query holding replication slots | Replication cannot advance, WAL accumulates | Kill long-running queries, tune max_slot_wal_keep_size |
| Cascading replication chain failure | Leaf replicas fall far behind | Keep replication chain short, monitor all replicas |
| Accidental replica promotion | Application writes go to wrong server | Protect against manual promotion, require multiple approvals |
| WAL corruption on primary | Replicas replicate corrupted data | Enable checksums, verify replica integrity regularly |
Trade-off Analysis
| Dimension | Asynchronous Replication | Synchronous Replication |
|---|---|---|
| Latency | Low — primary does not wait | High — primary waits for ack |
| Durability | Risk of data loss on failover | Zero data loss (when working) |
| Availability | Higher — writes never blocked | Lower — writes blocked if replica down |
| Replica load | Lower | Higher |
| Cross-region support | Practical at distance | Impractical at distance (latency) |
| Use case | Read scaling, non-critical data | Financial, compliance-critical data |
| Dimension | Master-Slave | Multi-Master (Active-Active) |
|---|---|---|
| Write scalability | None — single primary | Scales writes across nodes |
| Conflict resolution | None — single writer | Required — same row can be written |
| Complexity | Low | High |
| Failure handling | Simple failover | Split-brain risk |
| Use case | Most workloads | Geo-distributed write availability |
Common Pitfalls / Anti-Patterns
-
Assuming replicas are consistent immediately after writes: With async replication, replicas lag. Do not read from replicas for critical data immediately after writes.
-
Not monitoring replication lag: Lag accumulates silently until users complain. Monitor proactively and alert on threshold breaches.
-
Promoting replica without verifying data: An incompletely replicated replica promoted to primary causes data loss. Verify replication is current before promotion.
-
Using replicas for writes without understanding conflicts: Writes to replicas are typically replicated back. This can cause conflicts if not carefully designed.
-
Ignoring replication slot retention: Slots prevent WAL removal. If replicas fall behind, WAL accumulates and can fill disk.
-
Configuring weak authentication for replicas: Replicas with weak auth become attack vectors. Treat replica authentication as seriously as primary.
-
Not testing failover procedures: If you have never tested failover, you will discover problems during actual outages. Test regularly.
-
Mixing synchronous and asynchronous replication: This causes confusing behavior. Choose one replication mode per replica.
Quick Recap Checklist
- Replication solves read scale and HA — not write scale
- Async replication: low latency, risk of data loss on failover
- Sync replication: zero data loss, higher write latency
- Semi-synchronous: at least one replica acknowledged before commit
- Monitor replication lag on every replica; alert on anomalies
- Verify replication is current before promoting a replica
- Test failover procedures before production failures occur
- Replication slots prevent WAL recycling — monitor disk usage
- Strong auth and TLS encryption for replication connections
- Multi-master requires conflict resolution — avoid unless necessary
Interview Questions
The weaker replica cannot replay WAL entries as fast as the stronger replicas. Replication uses a single SQL thread for applying changes — if that thread cannot keep up with the primary's write rate, the weak replica falls behind. Meanwhile, stronger replicas stay current. The practical impact: if your application routes reads round-robin across replicas without checking lag, some users get stale data while others see current data. The fix is to either upgrade the weak replica's resources or exclude it from read distribution and use it only for specific low-throughput tasks.
The read sees eventual consistency at best. There is no guarantee that the replica has received the write. If replication lag is zero, the read sees the write. If lag is 5 seconds, the read sees data that is 5 seconds old. For read-after-write consistency, you must either read from the primary after writes or use semi-synchronous replication to guarantee at least one replica has the write before the primary acknowledges the transaction.
Synchronous replication waits for all configured replicas to acknowledge writes before committing the transaction on the primary. If any replica does not acknowledge within a timeout, the write fails. Semi-synchronous replication waits for at least one replica to acknowledge, then commits on the primary. If no replica acknowledges within the timeout, it falls back to asynchronous replication and commits anyway. PostgreSQL calls this synchronous_commit = on for synchronous and synchronous_commit = remote_apply or remote_write for the various semi-synchronous modes.
Use synchronous replication to at least one replica in the same datacenter. This guarantees that when the primary commits a transaction, at least one replica has already written it. On failover, the promoted replica is current with no data loss. The tradeoff is that every write now waits for a network round-trip to the replica. For financial applications where correctness matters more than write latency, this is the right trade. You also want to use semi-synchronous replication as a fallback if the synchronous replica becomes unavailable, and you want automated failover that promotes only if the replica is confirmed current.
The quorum formula R + W > N ensures that read and write quorums overlap. When you write to W replicas and read from R replicas, the condition guarantees at least one replica in the read set also has the latest write. This prevents stale reads because the overlapping replica(s) definitely have the most recent write. If R + W ≤ N, the read and write sets may be disjoint — a write could be committed to replicas A, B, C while a subsequent read contacts only D and E, returning stale data. For example, with N=5, R=2, W=3: writes go to {A,B,C} but reads go to {D,E} with no overlap, allowing stale reads.
Cascading replication chains replicas so that a replica follows another replica instead of the primary directly. Data flows: Primary → Replica1 → Replica2 → Replica3. This reduces load on the primary when you have many replicas, because the primary only needs to send WAL entries to one first-tier replica instead of N replicas. The tradeoff is increased replication lag — data takes longer to reach leaf replicas since it must flow through intermediate nodes. Cascading makes sense for read-heavy workloads where staleness is acceptable and you need many replicas. It does not make sense when lag must be minimized or when you need synchronous replication to all replicas.
Automatic failover works by having a monitor process detect primary failure through health checks, reaching consensus among monitors that the primary is down, selecting the most current replica based on WAL position, promoting that replica to primary, and redirecting traffic through a proxy. The main risks include: (1) Split-brain where two servers think they are primary due to network partition — mitigated by quorum-based consensus; (2) Data loss if the promoted replica has not received the latest writes from the async primary — mitigated by semi-synchronous replication; (3) Write burst overwhelming the new primary before it stabilizes; (4) A weak replica being promoted when a stronger one was available. Testing failover regularly and having operator verification for critical systems helps manage these risks.
Read-after-write consistency (also called read-your-writes) guarantees that after a client writes data, subsequent reads by the same client see that write. In an async replicated system, this requires reading from the primary after writes, because replicas may not have received the write yet. Eventual consistency only guarantees that if no new writes are made, all replicas will eventually converge to the same value — but a read immediately after a write may return stale data. Read-after-write is a stronger guarantee. Techniques to achieve it include: routing writes to primary then subsequent reads to primary for a session window; using synchronous replication so replicas are current; or tracking the minimum LSN that a client has seen and routing reads to replicas that have replayed past that LSN.
Replication slots prevent WAL (Write-Ahead Log) from being removed on the primary as long as a replica has not acknowledged receiving those WAL entries. This is designed to prevent data loss — if a replica falls behind and then reconnects, it can resume from where it left off without the WAL having been recycled. The danger: if a replica falls behind significantly (due to being offline, a slow network, or a long-running query holding the slot) or if a slot is created but the replica never actually connects, WAL accumulates on the primary's disk. If left unchecked, this can fill the entire disk, causing the primary to stop accepting writes entirely. PostgreSQL's max_slot_wal_keep_size parameter limits how much WAL a slot can retain, and wal_keep_size provides a simpler floor for all replicas.
Traditional replication replicates SQL statements or WAL entries from primary to replicas. Aurora separates compute and storage — the primary and replicas all share the same 6-node storage volume across 3 AZs. When the primary writes a page, it is acknowledged after 4 of 6 storage nodes commit. Replicas read from the same shared storage, so there is no traditional replication lag — a replica sees the same data as the primary as soon as it is written. This gives single-digit millisecond replica lag under normal operation. The tradeoff is write throughput is limited by storage layer capacity, and cross-region replication is asynchronous with higher lag since the shared storage is region-local. Aurora's model works well for most applications but can bottleneck for write-heavy workloads.
A direct replica connects straight to the primary and receives WAL entries directly. A relay master (also called a cascading replica or intermediate master) receives WAL from the primary and then forwards it to other replicas downstream. In GitHub's architecture, the primary replicated to a relay master in each datacenter, which then replicated to read replicas. This reduced primary replication load but created compounding lag problems — a replica attached to a lagging relay master could be far behind the primary even when its own replication thread was healthy. Relay masters are useful for reducing primary load and for geo-distributed setups where you want to minimize cross-region traffic from the primary, but they add complexity and compound lag at each tier.
GTID assigns a unique identifier (UUID + sequence number) to every transaction committed on the primary. When a replica receives a transaction, it records the GTID rather than a file name and byte offset. This makes replication self-describing — a replica always knows exactly which transaction it has executed. The advantages over file-position replication: (1) Failover is simpler because you specify the GTID to resume from rather than a file+offset; (2) It is easier to detect whether a replica has skipped a transaction; (3) When promoting a replica, you do not need to calculate the correct position — you just enable replication with the GTID set; (4) Consistency checking with MASTER_AUTO_POSITION = 1 automatically aligns the replica to the correct position. MySQL's GTID-based replication eliminates the need to manually track binlog file positions.
Semi-synchronous replication is a middle ground: the primary waits for at least one replica to acknowledge receipt of a write before committing, but does not wait for all replicas. If no replica acknowledges within a timeout, the primary falls back to asynchronous mode and commits anyway. This gives stronger durability than async (at least one replica has the write) while avoiding the latency of waiting for all replicas. PostgreSQL implements this via synchronous_commit = remote_apply which waits for the replica to replay the transaction, or remote_write which only waits for the replica to write to disk. You would choose semi-synchronous when: you want at least one durable copy but cannot tolerate the latency of waiting for all replicas; you need protection against total primary failure without the full performance cost of synchronous replication; or you are in a hybrid environment where some replicas must be synchronous and others can be async.
Common causes of periodic lag spikes include: (1) A long-running query on the replica holding back the SQL replay thread — diagnose with SHOW PROCESSLIST on the replica or pg_stat_activity in PostgreSQL; (2) A bulk data load or batch job on the primary generating WAL faster than replicas can consume — monitor WAL generation rate vs replay rate; (3) Network saturation between primary and replica — check network throughput metrics; (4) Replica resource exhaustion (CPU, disk IOPS) — monitor replica host metrics; (5) Replication slot holding back WAL when a replica falls behind — check pg_replication_slots; (6) Increasing table bloat on the primary causing larger WAL records. Fixes include: provisioning stronger replicas, adding parallel apply for the replica SQL thread (MySQL), tuning max_standby_streaming_delay in PostgreSQL, offloading analytics queries from replicas used for OLTP, and using cascading replication to reduce primary load.
Several approaches reduce stale reads without routing everything to the primary. (1) Latency-aware routing: route reads to the nearest replica to reduce lag and network delay. (2) Monotonic reads: ensure a client always reads from the same replica or one that has seen all previous writes from that client. (3) Read-after-write targeting primary: read from primary for a session window after writes, then fall back to replicas. (4) LSN-based routing: track the replication position after writes and route reads only to replicas that have replayed past that LSN. (5) Hinted handoffs: in Cassandra-style systems, a replica that receives a read request but is behind forwards the request to a more current replica, reads from it, and returns the fresher data. For PostgreSQL, you can use pg_stat_replication to check replay_lsn and only route reads to replicas whose lag is below your tolerance threshold.
Leader-based replication (primary-replica) designates one node as the primary that accepts all writes. Replicas apply changes from the primary. Reads can go to any replica. Leaderless replication (used by DynamoDB, Cassandra, Riak) has no designated primary — any node can accept writes and reads. With leaderless replication, you typically use quorum reads and writes (R + W > N) to guarantee consistency. The trade-offs: leader-based is simpler to reason about and avoids write conflicts but the primary is a bottleneck and a failure point requiring failover. Leaderless replication distributes write load across nodes and continues operating during node failures, but requires conflict resolution for writes and careful quorum tuning. Leaderless systems also tend to have higher write amplification since writes may need to be retried or repaired.
MySQL supports three binary log formats. Statement-based replication (SBR) logs the SQL statements themselves — simpler, less data logged, but can produce nondeterministic results for functions like NOW() or UUID(). Beyond nondeterministic functions, SBR has two additional practical drawbacks: triggers can produce different outcomes on replicas if the trigger logic depends on state that differs between primary and replica, and SBR cannot work correctly with multi-statement transactions that have no snapshot isolation (e.g., a transaction doing two consecutive reads where the second read depends on the first read's result — on a replica, the non-deterministic ordering can cause different outcomes). Row-based replication (RBR) logs the actual row changes — guarantees consistency for nondeterministic statements but can produce very large binlogs for bulk updates. Hybrid replication (MIXED) uses statement-based by default but switches to row-based for statements known to be nondeterministic. In practice, row-based is now the default in MySQL 8.0 because it is safer. Statement-based can be faster for bulk operations on tables with many rows, but the inconsistency risks rarely make it worth the disk savings.
PostgreSQL streaming replication (physical replication) copies the entire database byte-for-byte via WAL records. It is block-level and database-wide — you cannot replicate a single table or database. pglogical is a logical replication extension that replicates at the row and DDL level. With pglogical, you can replicate specific tables, specific databases, or subsets of data using publication/subscription. Logical replication also lets you do zero-downtime major version upgrades by having the new version subscribe to the old version's changes. The tradeoff is that logical replication has higher overhead per transaction (more CPU) because it must decode WAL into individual row changes, whereas streaming replication just streams raw WAL bytes. Logical replication also requires the schema to be mostly compatible between publisher and subscriber.
Dual-primary (also called active-active) means two database servers are both configured as primaries, each accepting writes and replicating to the other. Both are simultaneously writable. The appeal is write load distribution and geographic proximity to users. The problem is conflict resolution — if the same row is updated on both primaries within the replication window, you get a write conflict. Depending on your conflict resolution strategy (last-write-wins, manual resolution, application-level handling), you risk data divergence, lost updates, or application errors. The window of conflict is every millisecond two servers are accepting writes independently. Most databases with dual-primary configurations end up requiring application-level logic to serialize writes (e.g., shard by key range so each primary owns different rows), which defeats the purpose. For most use cases, a single primary with read replicas and a proper failover mechanism handles the job better.
CDC is a pattern where you capture changes (inserts, updates, deletes) from a database's transaction log and publish them as an event stream. Tools like Debezium, AWS DMS, or Postgres' logical replication slots do this. CDC differs from traditional replication in that the output is an event stream rather than a direct replica — downstream consumers can be data warehouses, search indexes, other databases, or microservices. You would use CDC over traditional replication when: you need to feed multiple downstream systems (not just one replica set), you want loose coupling between source and sink (events can be replayed, filtered, transformed), you are doing a migration and need to keep systems in sync during the transition, or you need to build audit logs or time-travel queries. The tradeoff is operational complexity — CDC pipelines require monitoring, schema evolution handling, and event ordering guarantees.
Further Reading
Official Documentation:
- PostgreSQL High Availability and Replication — Official docs on streaming replication, logical replication, and HA configurations
- MySQL Replication Documentation — GTID replication, replica configuration, and failover
- Amazon Aurora Documentation — Aurora’s shared storage replication model
Books and Papers:
- Designing Data-Intensive Applications by Martin Kleppmann — Chapters on replication (Ch. 5) and distributed data (Ch. 6, 8) provide foundational theory
- Database Internals by Alex Petrov — Deep dive into storage engines, WAL mechanisms, and replication internals
Tools and Utilities:
- pgpool-II: Connection pooling and replication proxy for PostgreSQL
- PgBouncer: Lightweight connection pooler for PostgreSQL
- Vitess: MySQL sharding proxy that also handles replication routing
- patroni: PostgreSQL HA template using consensus-based failover
- orchestrator: MySQL replication topology management and failover tool
- percona-toolkit: Includes
pt-table-checksumfor detecting replication drift
Monitoring Queries:
-- PostgreSQL: replication lag monitoring
SELECT
client_addr,
state,
sent_lsn - replay_lsn AS replication_lag_bytes
FROM pg_stat_replication;
-- MySQL: replica lag monitoring
SHOW REPLICA STATUS\G
-- Key fields: Seconds_Behind_Master, Relay_Log_Pos, Exec_Source_Log_Pos
Conclusion
- Replication provides high availability and read scale; it does not scale writes
- Asynchronous replication is faster but allows stale reads and potential data loss
- Synchronous replication guarantees durability at the cost of latency
- Monitor replication lag on every replica and alert on anomalies
- Test failover procedures before you need them in production
- Multi-master creates conflict resolution complexity; use only when write availability demands it
Copy/Paste Checklist:
-- PostgreSQL: Set up synchronous replication
ALTER SYSTEM SET synchronous_commit = on;
ALTER SYSTEM SET synchronous_standby_names = 'replica1,replica2';
-- MySQL: Configure replica with GTID
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = 'primary-host',
SOURCE_USER = 'repl_user',
SOURCE_PASSWORD='***',
SOURCE_AUTO_POSITION = 1;
START REPLICA;
-- Monitor replication health
SELECT * FROM pg_stat_replication;
SHOW REPLICA STATUS;
Observability Checklist
Metrics to Monitor:
- Replication lag in seconds (primary vs each replica)
- Replica IO and SQL thread status
- Replica heartbeat/lag metrics
- WAL generation rate vs replay rate
- Replication slot age (if using slots)
- Disk usage on replicas vs primary
- Connection count per replica
- Error count on replication threads
Logs to Capture:
- Replication state changes (connect, disconnect, error)
- Replication conflict events (duplicate key, deadlocks)
- Long-running queries on replicas blocking replication
- Checkpoint and WAL activity
- Failover events and decisions
Alerts to Set:
- Replication lag > 30 seconds (adjust to your SLA)
- Replica IO/SQL thread stopped
- Replication slot age > threshold
- Disk usage on replica > 80%
- High number of replication conflicts
- Replica connection failures
-- PostgreSQL: Monitor replication status
SELECT client_addr, state, sent_lsn, write_lsn,
flush_lsn, replay_lsn,
sent_lsn - replay_lsn AS replication_lag
FROM pg_stat_replication;
-- MySQL: Check replica status
SHOW REPLICA STATUS\G
Security Checklist
- Use strong authentication for replication connections (scram-sha-256, certificates)
- Encrypt replication traffic with TLS
- Restrict replication user to replica connections only
- Implement network segmentation for replication traffic
- Use separate credentials for replication (not application credentials)
- Audit replication configuration changes
- Rotate replication credentials regularly
- Monitor authentication failures on replication ports
- Verify replica SSL certificates
Category
Related Posts
Asynchronous Replication: Speed and Availability at Scale
Learn how asynchronous replication works in distributed databases, including eventual consistency implications, lag monitoring, and practical use cases where speed outweighs strict consistency.
Synchronous Replication: Data Consistency Across Nodes
Learn about synchronous replication patterns in distributed databases, including Quorum-based replication and write-ahead log shipping for zero RPO.
Amazon DynamoDB: Scalable NoSQL with Predictable Performance
Deep dive into Amazon DynamoDB architecture, partitioned tables, eventual consistency, on-demand capacity, and the single-digit millisecond SLA.