Database Scaling: Vertical, Horizontal, and Read Replicas

Learn strategies for scaling databases beyond a single instance: vertical scaling, read replicas, write scaling, and when to choose each approach.

published: reading time: 32 min read author: GeekWorkBench

Introduction

Vertical scaling means bigger servers. More CPU cores, more memory, faster disks. This is the simplest approach. You take your existing database and move it to a more powerful machine.

The advantage is simplicity. No application changes required. Your single-server transactions stay ACID-compliant. Your queries work exactly as before.

The disadvantage is limits. Server sizes have caps. A database that needs 64 cores is not going to fit on a server with 32. And costs scale faster than capacity. A server with twice the resources rarely costs twice as much.

Vertical scaling works best early in a system’s life. When traffic is modest and growth is predictable, throwing bigger iron at the problem is often the cheapest solution in engineering time.

Scaling Formulas & Metrics

Capacity Planning

Vertical Scaling Formulas

Estimate when you will need to scale vertically based on current resource utilization and growth rate:

import math

def estimate_vertical_scale_time(
    current_cpu_percent: float,
    current_memory_percent: float,
    monthly_growth_rate: float,  # e.g., 0.15 for 15% monthly growth
    days_until_limit: int = 90
) -> dict:
    """
    Estimate when vertical scaling will be needed.
    Returns the limiting resource and estimated days until scale required.
    """
    # Days until CPU limit at current growth
    cpu_days = math.log(days_until_limit / 30) / math.log(1 + monthly_growth_rate) * 30

    # Days until memory limit (typically slower growth than CPU)
    memory_days = math.log(days_until_limit / 30) / math.log(1 + monthly_growth_rate * 0.7) * 30

    limiting_resource = 'CPU' if cpu_days < memory_days else 'Memory'
    days_until_scale = min(cpu_days, memory_days)

    return {
        'limiting_resource': limiting_resource,
        'estimated_days_until_scale': int(days_until_scale),
        'monthly_growth_assumption': f"{monthly_growth_rate * 100:.0f}%"
    }

# Example: 60% CPU, 70% memory, 20% monthly growth
result = estimate_vertical_scale_time(60, 70, 0.20)
# Returns: limiting_resource='CPU', estimated_days_until_scale=~73 days

Connection pool sizing formula:

def recommended_pool_size(
    max_connections_per_query: int,
    expected_concurrent_requests: int,
    database_cpu_count: int
) -> int:
    """
    Calculate recommended connection pool size.
    Rule of thumb: (core_count * 2) + effective_spindle_count
    For SSDs: effective_spindle_count ~= 30
    """
    # General formula from PostgreSQL documentation
    baseline = database_cpu_count * 2

    # Add for SSD storage (rough approximation of IO capacity)
    effective_spindle_count = 30  # for SSD-backed storage

    recommended = baseline + effective_spindle_count

    # Cap at reasonable fraction of max_connections
    max_pool_size = max_connections_per_query * expected_concurrent_requests * 0.5

    return min(recommended, max_pool_size)

# Example: 8-core DB, 100 concurrent requests, each query uses 1 connection
pool_size = recommended_pool_size(1, 100, 8)  # Returns ~46

Read Replica Lag Estimation

Replication lag depends on write volume, network bandwidth, and replica hardware:

def estimated_replication_lag_seconds(
    write_throughput_mbps: float,
    network_latency_ms: float,
    replica_apply_latency_ms: float,
    wal_record_size_kb: float = 4.0
) -> float:
    """
    Estimate typical replication lag in seconds.
    """
    # Time to transfer WAL records for each write batch
    transfer_time = (wal_record_size_kb / write_throughput_mbps) * 1000  # ms

    # Round trip + apply time
    total_latency = network_latency_ms + transfer_time + replica_apply_latency_ms

    return total_latency / 1000  # convert to seconds

# Example: 10 MB/s write throughput, 1ms LAN latency, 2ms apply
lag = estimated_replication_lag_seconds(10, 1.0, 2.0)
# Returns: ~0.005 seconds (5ms) for small records

Sharding Capacity Formula

def estimate_sharding_requirements(
    total_data_gb: int,
    write_tps: int,
    read_tps: int,
    target_shard_data_gb: int = 100  # GB per shard target
) -> dict:
    """
    Estimate number of shards needed and their characteristics.
    """
    # Data-based shard count
    data_shards = math.ceil(total_data_gb / target_shard_data_gb)

    # Write scaling: each shard can handle ~5000 TPS for simple writes
    writes_per_shard = write_tps / data_shards
    write_shards = math.ceil(write_tps / 5000) if writes_per_shard > 5000 else data_shards

    # Read scaling: each shard can handle ~10000 TPS with caching
    reads_per_shard = read_tps / data_shards
    read_shards = math.ceil(read_tps / 10000) if reads_per_shard > 10000 else data_shards

    total_shards = max(data_shards, write_shards, read_shards)

    return {
        'estimated_shards': total_shards,
        'data_gb_per_shard': total_data_gb / total_shards,
        'writes_per_shard_per_second': write_tps / total_shards,
        'reads_per_shard_per_second': read_tps / total_shards,
        'recommendation': 'shard' if total_shards > 1 else 'single_server'
    }

# Example: 500 GB data, 20000 write TPS, 100000 read TPS
result = estimate_sharding_requirements(500, 20000, 100000)
# Returns: ~5 shards, 100GB/shard, 4000 write TPS/shard, 20000 read TPS/shard

Cost & Decision Framework

Cost Comparison: Vertical vs Horizontal Scaling

DimensionVertical ScalingHorizontal Scaling (Replicas + Sharding)
Hardware costHigh (large server prices non-linearly)Lower (commodity servers scale linearly)
Licensing costOften per-socket or per-corePer-server (can use more smaller licenses)
Operational complexityLow (single server)High (more servers, replication, failover)
Scaling elasticityDiscrete jumps (next server size)Smooth (add servers incrementally)
Maximum scaleLimited by largest available serverNearly unlimited (sharding)
Failure domainSingle point of failureCan survive single node failures
Read throughputLimited to one server’s capacityMultiplied by replica count
Write throughputLimited to one server’s capacityRequires sharding (complex)
Monthly cost trajectoryStep function (big jumps)Linear growth with usage

Break-even comparison:

ScenarioVertical (r6i.4xlarge)Horizontal (3x r6i.xlarge)
Specifications16 vCPU, 128 GB RAM3 × (4 vCPU, 32 GB RAM)
Monthly AWS cost (us-east-1)~$1,200~$1,350
Write throughput15,000 TPS~20,000 TPS (with sharding)
Read throughput (cached)50,000 QPS150,000 QPS
Failure handlingManual failoverAutomatic with read replicas
Best forWrite-heavy, simple opsRead-heavy, high availability

Scaling Timeline: When to Choose What

Use this as a rough guide for when to consider each scaling approach based on your workload:

Scale StageUsers / QPSPrimary BottleneckRecommended Approach
Startup< 1K users, < 100 QPSSimple queriesSingle server, basic vertical
Growing1K-10K users, 100-1K QPSConnection limits, some slow queriesVertical + read replicas
Scaling10K-100K users, 1K-10K QPSRead latency, replication lagAdd caching, more replicas
Large100K-1M users, 10K-50K QPSWrite throughput, data sizeSharding, connection pooling
Enterprise> 1M users, > 50K QPSMulti-region, complianceDistributed database, read-heavy CQRS

Decision triggers to watch:

def should_scale_vertically(
    cpu_percent_avg: float,
    memory_percent_avg: float,
    connections_used_percent: float
) -> dict:
    """Return scaling recommendations based on resource utilization."""
    triggers = []

    if cpu_percent_avg > 70:
        triggers.append("CPU > 70%: consider vertical scale or read replicas")
    if memory_percent_avg > 80:
        triggers.append("Memory > 80%: vertical scale or caching")
    if connections_used_percent > 70:
        triggers.append("Connections > 70%: add connection pooling or replicas")

    return {
        'should_scale': len(triggers) > 0,
        'triggers': triggers,
        'recommendation': 'vertical' if cpu_percent_avg > 70 else 'replicas'
    }

def should_add_caching(
    cache_hit_rate: float,
    repeated_query_percent: float,
    db_cpu_percent: float
) -> dict:
    """Determine if caching would help."""
    if repeated_query_percent > 30 and cache_hit_rate < 0.80:
        return {
            'recommend_caching': True,
            'reason': f"{repeated_query_percent:.0f}% repeated queries, "
                      f"only {cache_hit_rate:.0%} cache hit rate"
        }
    return {'recommend_caching': False}

def should_shard(
    data_size_gb: int,
    write_tps: int,
    single_server_max_tps: int,
    single_server_max_gb: int
) -> dict:
    """Determine if sharding is needed."""
    reasons = []
    if data_size_gb > single_server_max_gb * 0.7:
        reasons.append(f"Data ({data_size_gb} GB) approaching "
                       f"single server limit ({single_server_max_gb} GB)")
    if write_tps > single_server_max_tps * 0.7:
        reasons.append(f"Write TPS ({write_tps}) approaching "
                       f"single server limit ({single_server_max_tps})")

    return {
        'should_shard': len(reasons) > 0,
        'reasons': reasons
    }

Start with the simplest approach (vertical). Only add complexity when measurements show it is needed.

Consistency Considerations

Scaling introduces consistency trade-offs. Read replicas are typically asynchronously replicated. Writes to primary take time to propagate to replicas.

# This might fail if replica lags behind primary
user = replica_db.query("SELECT * FROM users WHERE id = ?", user_id)
if user['email'] != new_email:
    # User might see old email briefly
    primary_db.execute("UPDATE users SET email = ?", new_email)

Read-after-write consistency is not guaranteed with replicas. A user might write data and immediately read it from a replica that has not yet received the write. Solutions include:

  1. Read from primary after writes
  2. Synchronous replication (higher latency, lower availability)
  3. Session-based routing (read from primary for a window after writes)

For most applications, eventual consistency is acceptable. Users rarely notice milliseconds of staleness. Choose stronger guarantees only when your requirements demand them.

Query Performance Optimization

Query Optimization Deep Dive

Before scaling infrastructure, optimize the queries that run against it. A slow query on a powerful server becomes a crisis at scale.

Reading EXPLAIN Plans

Every major database provides execution plan visualization:

-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Key things to look for:
-- - Seq Scan: full table scan (often indicates missing index)
-- - Hash Join vs Nested Loop: Hash is better for large sets
-- - Rows estimate vs actual rows: large mismatch means stale statistics)

For MySQL:

EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

Critical patterns to identify in EXPLAIN output:

PatternIndicatesAction
Seq Scan on large tableMissing index on WHERE/JOIN columnAdd appropriate index
Hash Join with large rowsMay exceed memory for hash tableIncrease work_mem or use partial indexes
Nested Loop with large innerO(n²) behaviorAdd index, rewrite to use Hash Join
Rows estimate ≫ actualStale statisticsRun ANALYZE or VACUUM ANALYZE
Index Only ScanOptimal — data from index aloneEnsure covering indexes for frequent queries

Real-world Failure Scenarios

Understanding how scaling decisions fail in production helps you avoid the same mistakes. These scenarios are drawn from documented incidents at high-traffic services.

Scenario 1: The Cache Cascade

A popular e-commerce site deployed a new recommendation engine that cleared cache entries on each page view. Within minutes, every request hit the database directly. Database connections spiked to 100% of max capacity. The application began timing out, which caused retries from clients, compounding the problem. Within 10 minutes, the site was unavailable.

Root cause: Cache invalidation on read operations. Every page view invalidated the cache entry it had just read, creating a circular pattern where the cache was never populated.

Resolution: Changed cache invalidation strategy from immediate invalidation to TTL-based expiration. Added circuit breakers to prevent cascading failures when cache operations fail.

Scenario 2: The Shard Imbalance

A gaming company sharded their player database by player ID (modulo of shard count). Players who joined early accumulated more data (achievements, inventory, social graphs) than new players. Early shard nodes ran out of storage while newer shards had 80% free space. Re-sharding required 72 hours of downtime and manual data migration.

Root cause: Shard key selection did not account for data growth patterns. Player age (join time) correlated with data volume, not just access frequency.

Resolution: Migrated to consistent hashing with virtual nodes. Data is now rebalanced automatically as shards are added or removed. Implemented monitoring for shard storage skew.

Scenario 3: The Read Replica Storm

A reporting dashboard polled a read replica every second. During business hours, 500 concurrent users generated 500 queries per second against a replica that could handle 1,000 TPS. At peak, replication lag spiked to 30+ seconds. The dashboard showed stale data, and users refreshed repeatedly, creating a feedback loop of increasing load.

Root cause: No query rate limiting or caching at the application layer. Identical queries were executed every second regardless of data freshness requirements.

Resolution: Added query result caching with 10-second TTL. Implemented query coalescing (multiple requests share one database query). Set up separate analytics database for reporting workloads.

Scenario 4: The Connection Pool Exhaustion

An API service scaled horizontally to 20 instances, each maintaining its own connection pool to the database. With 50 connections per pool, the total was 1,000 connections. The database was configured with max_connections=800. Under load, connection requests queued, latency spiked, and the service became unresponsive.

Root cause: Connection pools were not coordinated across instances. Each instance sized its pool independently without accounting for total database connection capacity.

Resolution: Centralized connection pool management with PgBouncer. Application instances connect through the pooler, which maintains a shared pool sized for total database capacity. Implemented connection health checks and automatic recovery.

Scenario 5: The Synchronous Replication Deadlock

A financial service configured synchronous replication for audit compliance. During a network partition between primary and replica, writes blocked indefinitely waiting for replica confirmation. The application hung, pending transactions did not complete, and the queue of uncommitted transactions grew to consume all available disk space.

Root cause: Synchronous replication with no timeout or fallback. The system had no graceful degradation when the replica was unavailable.

Resolution: Configured synchronous_commit with timeout and automatic fallback to asynchronous. Implemented partition detection and automatic promotion of asynchronous replication when synchronous target is unreachable.

Trade-off Analysis

Database scaling involves fundamental trade-offs. Understanding these helps you make decisions that match your workload requirements.

Consistency vs. Availability

ApproachConsistencyAvailabilityUse Case
Synchronous replicationStrongLower (replica required)Financial transactions, inventory
Asynchronous replicationEventualHighRead-heavy apps, social feeds
Multi-leaderEventualHighestGeographic distribution
Leaderless (quorum)ConfigurableVery highDistributed systems, Cassandra

Recommendation: Start with asynchronous replication. Add synchronous replication only when consistency requirements demand it, and implement proper timeouts and fallback behavior.

Write Scaling vs. Complexity

ApproachWrite ScaleOperational ComplexityData Integrity
Single primaryLimitedLowStrong
Multi-leaderMediumHighConflict-dependent
ShardingHighVery highApplication-managed

Recommendation: Exhaust vertical scaling and read replicas before sharding. Sharding complexity is often 10x higher than single-server operation.

Index Overhead vs. Query Performance

Index StrategyWrite OverheadRead PerformanceStorage
No indexesLowestFull table scansLowest
Minimal (primary lookups)LowGood for point queriesLow
Covering indexesMediumExcellent for specific queriesMedium
Many indexesHighFast for varied queriesHigh

Recommendation: Add indexes based on query patterns measured in production. Remove unused indexes. Balance read performance gains against write overhead.

Connection Pool Sizing

Pool SizeThroughputLatencyRisk
Too smallLowHigh (queuing)Underutilization
OptimalHighLowBalanced
Too largeMediumHigh (contention)Memory pressure

Recommendation: Start with the formula (core_count * 2) + effective_spindle_count. Monitor connection wait times and adjust based on observed behavior.

Common Pitfalls / Anti-Patterns

  1. Scaling too late: Monitoring gaps cause sudden performance crises. Monitor proactively and scale before hitting limits.

  2. Premature sharding: Sharding introduces significant complexity. Vertical scaling plus caching handles most workloads longer than expected.

  3. Ignoring read-after-write consistency: With replicas, reads might return stale data immediately after writes. Implement appropriate consistency levels.

  4. Cache invalidation complexity: Caching stale data is worse than no caching. Ensure invalidation is correct before caching frequently-changing data.

  5. Connection pool misconfiguration: Too few connections starve the application. Too many exhaust database resources. Size appropriately.

  6. Not testing failover: If failover is broken, you only find out during an outage. Test regularly in staging.

  7. Sharding without clear access patterns: Without understanding how data is accessed, shard keys will be wrong. Design access patterns first.

  8. Caching without measuring: Cache adds complexity. Measure cache hit rates and only cache when it provides meaningful benefit.

Real-World Case Studies

Case Study 1: Stack Overflow — Redis Caching at Scale

Stack Overflow handles millions of requests per day with a primarily read-heavy workload. Their caching strategy: hot question pages are cached in Redis with a 1-minute TTL. On cache miss, the query hits PostgreSQL with aggressive index-only scans. This pattern handles 10,000+ requests per second with sub-10ms p95 latency.

Booking.com shards hotel search by geographic region (hotel location). Each shard contains hotels in a specific region, allowing fast searches within a region. Cross-region searches scatter across multiple shards and aggregate results. This architecture supports billions of hotel room queries per day.

Case Study 3: GitHub — MySQL Sharding by Repository ID

GitHub shards repository data by repository ID using a modified consistent hashing scheme. Each shard handles ~10,000 repositories. Shard routing is done at the application layer via a lookup service. This allows GitHub to scale to millions of repositories without hitting single-node limits.

Interview Questions

1. Your database CPU is at 90% but memory utilization is only 40%. Adding more RAM does not help. What is the bottleneck and what do you do?

The bottleneck is CPU, not memory. More RAM would help if the bottleneck was cache misses — if you were hitting disk because data did not fit in memory, adding RAM would reduce disk I/O and CPU time spent waiting. But with CPU at 90% and memory at 40%, your working set fits in memory and the CPU is doing compute work. Options: upgrade CPU (vertical scaling), add read replicas to distribute read queries across multiple servers, optimize slow queries to reduce CPU per transaction, or implement caching for hot queries to reduce database load.

2. Your application is read-heavy (90% reads, 10% writes). You have a single database server. What is the first scaling step?

The first step is adding read replicas. With 90% reads, distributing reads across replicas gives you nearly 10x read capacity without modifying application logic. You add one or more read replicas, configure the application to route writes to the primary and reads to replicas, and monitor replication lag to ensure reads are not too stale. Vertical scaling is also an option, but for a 90/10 read-heavy workload, replicas give more capacity per dollar than upgrading to a larger server.

3. You are at 80% CPU on your primary database and adding replicas does not help because writes are the bottleneck. What do you do?

If writes are the bottleneck and replicas do not help (because replicas still funnel writes through the primary), you need to reduce write volume or distribute writes. Options: optimize write-heavy queries — maybe an UPDATE runs more often than necessary or an index is causing excessive write overhead; implement caching for data that does not need to be written to the database immediately; shard the database to distribute writes across multiple primaries; or move to a write-optimized database for specific write-heavy tables while keeping the rest on the primary.

4. At what point do you decide to shard versus continue vertical scaling?

The decision should be data-driven, not time-based. Shard when you have exhausted vertical scaling (cannot get a bigger server or the cost exceeds what sharding would cost), when you have measured that writes — not reads — are the bottleneck (replicas do not help writes), and when your data size exceeds what a single server can reasonably store. If you have a 2TB database and your largest available server has 4TB of storage, you have headroom but not for growth. If that server is already at 80% CPU and 90% storage, you need to shard or upgrade. The key metric is whether you can achieve your SLA for write latency and availability with your current architecture under projected load.

5. Explain the difference between ACID and BASE consistency models. When would you choose each?

ACID (Atomicity, Consistency, Isolation, Durability) provides strong consistency guarantees with transaction isolation. It ensures that writes are immediately visible to all subsequent reads and that transactions are all-or-nothing. BASE (Basically Available, Soft State, Eventually Consistent) sacrifices strong consistency for availability and partition tolerance. ACID is appropriate for financial transactions, inventory systems, or any system where reading stale data causes real harm. BASE is appropriate for social feeds, activity logs, user sessions, or read-heavy workloads where milliseconds of staleness is acceptable. The CAP theorem shows you cannot have all three (Consistency, Availability, Partition Tolerance) simultaneously — BASE prioritizes availability while ACID prioritizes consistency.

6. What is replication lag and how do you monitor and mitigate it?

Replication lag is the time delay between a write being committed on the primary and that write being applied on a replica. It occurs because most replication is asynchronous — the primary does not wait for replicas to confirm writes before returning success to the client. Monitoring: PostgreSQL's `pg_stat_replication` shows `sent_lsn`, `write_lsn`, `flush_lsn`, and `replay_lag`; MySQL's `SHOW REPLICA STATUS` shows `Seconds_Behind_Master`. Mitigation strategies: use synchronous replication (higher latency but zero lag), route critical reads to the primary after writes (read-after-write consistency), session-based routing (sticky reads to primary for a window after writes), or accept eventual consistency for non-critical reads. High replication lag causes stale reads where users see outdated data after updates.

7. How do you choose a shard key? What makes a good vs bad shard key?

A good shard key distributes writes evenly across shards (avoids hotspots), is frequently used in query filters (enables efficient routing), and has high cardinality (many distinct values). User ID or customer ID often works well because writes naturally distribute by user and queries often filter by user. A bad shard key causes hotspots: timestamp-based keys concentrate writes on the latest shard; geographic regions may cause uneven distribution if some regions have more activity. Avoid keys with low cardinality (e.g., boolean status field creates only two shards). Cross-shard queries (joining data across shards) are expensive — design keys to co-locate frequently joined data on the same shard. Test key distribution under realistic write patterns before production deployment.

8. What is cache stampede and how do you prevent it?

Cache stampede (thundering herd) occurs when a popular cache entry expires and multiple concurrent requests all miss the cache, overwhelming the database with identical queries. Prevention strategies: probabilistic early expiration (randomize TTL so entries don't all expire at once), cache locking (only one request fetches from DB, others wait for the cached result), jittered TTLs (add random delay to TTL to spread out expiration), proactive warming (refresh popular entries before they expire), and background refresh (use a background job to refresh cache before expiration). The goal is to ensure only one request hits the database when a cache miss occurs.

9. What are the trade-offs between query optimization versus scaling infrastructure?

Query optimization (adding indexes, rewriting queries, denormalizing) is often 10-100x cheaper than scaling infrastructure. A missing index can cause a full table scan that takes seconds; adding an index makes the same query milliseconds. Before scaling, exhaust optimization opportunities: add covering indexes for frequent queries, remove unused indexes to reduce write overhead, batch operations to reduce round trips, use parameterized queries for plan caching, and analyze EXPLAIN plans to find bottlenecks. Scaling (vertical or horizontal) should come when optimization is insufficient — not as a substitute for it. Premature scaling adds cost and complexity; premature optimization on the wrong queries wastes effort. Measure first to identify actual bottlenecks.

10. How does connection pooling work and why is it important for database scaling?

Connection pooling maintains a cache of database connections that can be reused across requests. Without pooling, each request opens a new connection (expensive handshake, authentication, session setup) and closes it afterward. With pooling, requests borrow an existing connection from the pool, use it, and return it — avoiding connection overhead. Pooling is important because: database connections are finite resources (each consumes memory, file descriptors, and server-side processes); establishing connections is slow (TCP handshake + TLS + authentication); and connection overhead limits throughput. Tools: PgBouncer for PostgreSQL, ProxySQL for MySQL, built-in pools in ORMs. Pool sizing: too small starves requests, too large wastes resources. Rule of thumb: (core_count * 2) + effective_spindle_count for PostgreSQL, adjusted based on observed wait times.

11. Compare single-leader, multi-leader, and leaderless replication topologies. When would you choose each?

Single-leader (one primary receives writes, multiple replicas) is simple, well-understood, and provides strong write ordering — but write capacity is limited to the primary. Multi-leader (multiple primaries accept writes, each replicates to others) scales writes and provides geographic distribution but requires conflict resolution for concurrent writes and is operationally complex. Leaderless (writes go to any replica, quorum-based reads) eliminates single point of failure and scales write capacity but requires complex conflict resolution and quorum tuning. Use single-leader for most workloads — the simplicity outweighs limitations. Multi-leader for geographic distribution with write-heavy workloads. Leaderless for maximum availability and fault tolerance (Cassandra, DynamoDB). Chain replication (write flows through a chain) provides simple ordering but latency increases with chain length.

12. How do B-tree and hash indexes differ, and when would you use each?

B-tree indexes maintain sorted data and support both equality and range queries — they are the default for most databases because they handle `WHERE column > value` and `BETWEEN` efficiently. Hash indexes support only equality lookups (`WHERE column = value`) but are faster for simple key lookups. Use B-tree for: columns used in range conditions, ORDER BY, GROUP BY, and any query that filters by a prefix of the index. Use hash for: lookup tables, session storage, and columns with high cardinality where only equality checks occur. Hash indexes cannot handle range scans — if you ever need `created_at > '2024-01-01'`, a hash index will not help. Index maintenance overhead matters: every write to an indexed column updates the index, so too many indexes slow writes. Balance read performance needs against write overhead.

13. What are covering indexes and partial indexes? When would you use each?

A covering index (index-only scan) includes all columns needed by a query in the index itself, so the database never accesses the table — only the index. Use when you frequently query specific columns together and want to avoid table lookups entirely. A partial index indexes only rows matching a condition — for example, only pending orders — making the index smaller, faster to maintain, and used only for relevant queries. Use partial indexes when: a subset of rows is queried frequently (like active users vs archived), the filtered column has high cardinality but queries target a specific value, or you want to reduce index size for write-heavy tables. Both reduce I/O and improve performance, but covering indexes serve queries that need specific columns while partial indexes serve queries that filter on a specific condition.

14. Explain how PgBouncer and ProxySQL work for connection pooling. What are the trade-offs?

PgBouncer (PostgreSQL) and ProxySQL (MySQL) sit between the application and database, maintaining a pool of database connections that are reused across requests. Without pooling, each request opens a new connection (TCP handshake, authentication, session setup), which is slow and resource-intensive. With pooling, requests borrow an existing connection, execute their query, and return the connection — eliminating connection overhead. Pool modes: session mode (connection returned after transaction), transaction mode (connection returned after each transaction — most common), and statement mode (connection returned after each statement). Trade-offs: connection pooling adds latency for connection reuse logic but reduces database resource usage; transaction mode pooling can break session-specific state (like `SET SESSION` variables); too-small pools starve requests, too-large waste resources; pools do not help if the bottleneck is query performance rather than connection overhead.

15. What is read-after-write consistency and how do you achieve it with read replicas?

Read-after-write consistency means that after a user writes data, subsequent reads from the same user return the new data — not stale data from a replica that has not yet received the write. With asynchronous replication, writes propagate to replicas with a delay, so a user might write data and immediately read it from a replica that has not yet applied the write, seeing old data. Achieving read-after-write consistency: read from primary after writes (guaranteed but defeats replica benefits); synchronous replication (replicas confirm writes before client returns — higher latency, lower availability); session-based routing (sticky reads to primary for a window after writes, e.g., 30 seconds); or use causal consistency protocols that track causal dependencies. Most applications use session-based routing because it balances consistency with performance — critical reads go to primary for a short window after writes, other reads use replicas normally.

16. How do you prevent cache stampede (thundering herd) when a popular cache entry expires?

Cache stampede occurs when a popular entry expires and multiple concurrent requests all miss the cache, hitting the database simultaneously with identical queries. Prevention strategies: probabilistic early expiration (refresh cache before expiration with some probability, so entries do not all expire simultaneously); cache locking (only one request fetches from the database while others wait for the cached result — prevents duplicate DB queries); jittered TTLs (add random delay to TTL values so expiration times are spread out); proactive warming (background job refreshes popular entries before they expire); and background refresh (dedicated process updates cache independently of user requests). The goal is ensuring only one request hits the database when a cache miss occurs. Cache stampede is especially problematic for expensive queries on hot data — the thundering herd can overwhelm the database at the worst time (when the cache becomes empty).

17. What are the trade-offs between sharding at the application level versus using database-native sharding?

Application-level sharding (custom shard router in code) gives you full control over shard key selection, routing logic, and cross-shard operations — you can implement any strategy that fits your access patterns. However, you must manage shard topology, handle resharding (when you add shards), and the application becomes coupled to the sharding scheme (tests and migrations are harder). Database-native sharding (MongoDB sharding, CockroachDB, Vitess for MySQL) automates shard management, rebalancing, and failover — the database handles distribution transparently. The trade-off: native sharding is easier to operate but less flexible; application-level is more flexible but adds operational complexity. Consider native sharding when your database supports it well and your access patterns fit its model. Consider application-level when you have unusual requirements (specific shard key distribution, complex cross-shard transactions) that native sharding cannot handle.

18. How does WAL (Write-Ahead Log) streaming replication work in PostgreSQL?

PostgreSQL uses physical streaming replication. When the primary receives a write, it records the change in the WAL (Write-Ahead Log) before applying it to the data files — this ensures durability (writes survive crashes). The primary streams WAL records to replicas via the streaming replication protocol, and replicas continuously apply them to stay in sync. Replicas are called hot standbys and can serve reads while they apply WAL. The replication slot mechanism ensures replicas retain WAL they have not yet applied, preventing the primary from removing WAL that replicas still need. Monitoring: `pg_stat_replication` shows state, sent LSN, write LSN, flush LSN, and replay lag. By default, replication is asynchronous (writes commit before replicas confirm) — synchronous replication with `synchronous_commit = on` ensures writes are confirmed by replicas before returning to the client, at the cost of higher latency. Logical replication (table-level) is separate and used for migrations and selective replication.

19. Describe the trade-offs in choosing composite index column order. How do you decide?

Column order in a composite index determines which query patterns can use the index. An index on (A, B, C) can serve queries that filter on A, on A and B, or on A, B, and C — but cannot efficiently serve queries that filter only on B or C (without A). Rule: put equality conditions first, range conditions last. If you frequently query `WHERE region = 'US' AND date > '2024-01-01'`, index on (region, date) works. If you frequently query `WHERE date > '2024-01-01'` alone, the region-first index cannot use the date range efficiently. Also consider: cardinality (high-cardinality columns first filters more rows), query frequency (put columns used in more frequent queries first), and write overhead (columns updated frequently add index maintenance cost). Test with EXPLAIN ANALYZE — if a query shows Seq Scan instead of Index Scan, the index order may not match query patterns. Avoid indexing low-cardinality columns first (like boolean status) because it creates large intermediate result sets.

20. When selecting between PostgreSQL, MySQL, and MongoDB for a new project, what factors would guide the decision?

Choose PostgreSQL when: you need ACID compliance (financial, inventory, any system where consistency matters), you have complex queries (joins, subqueries, window functions), you need strong JSON support (JSONB with GIN indexes), or you need advanced features (CTEs,full-text search,Geospatial with PostGIS). Choose MySQL when: you need high write throughput (MySQL's InnoDB handles write concurrency well), you prioritize read performance with replication (MySQL's replication is mature and fast), or you are building a LAMP/LEMP stack where MySQL is well-integrated. Choose MongoDB when: your data is document-structured (JSON-like, nested, variable schema), you need horizontal scaling out of the box (MongoDB sharding is mature), or your application benefits from flexible schema (rapid iteration, polymorphic data). Key factors: consistency requirements (PostgreSQL > MySQL > MongoDB for strong guarantees), scaling needs (MongoDB for automatic sharding, PostgreSQL with extensions for application-level sharding), query complexity (relational joins favor PostgreSQL/MySQL), and team expertise (familiarity matters for operational reliability).

Further Reading

Database Fundamentals

Scaling Deep Dives

Operational Excellence

Performance Tuning

Conclusion

Key Bullets:

  • Scale vertically first, then add read replicas, then cache, then shard (in that order)
  • Read replicas solve read scale; sharding solves write scale
  • Caching provides the best ROI for reducing database load when access patterns allow
  • Monitor CPU, memory, connection counts, and replication lag
  • Test failover procedures before you need them in production
  • Choose scaling approach based on actual bottleneck (measure first)

Copy/Paste Checklist:

# Read replica routing with connection pooling
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Primary for writes
primary_engine = create_engine(
    "postgresql://user:***@primary:5432/mydb",
    pool_size=5, max_overflow=10
)

# Replica for reads
replica_engine = create_engine(
    "postgresql://user:***@replica:5432/mydb",
    pool_size=20, max_overflow=20
)

# Use primary for writes, replica for reads
def write_operation(sql, params):
    with primary_engine.connect() as conn:
        return conn.execute(sql, params)

def read_operation(sql, params):
    with replica_engine.connect() as conn:
        return conn.execute(sql, params)

Observability Checklist

Metrics to Monitor:

  • Query latency by type (read, write, analytical)
  • Database CPU, memory, disk I/O utilization
  • Connection pool utilization and wait times
  • Replication lag (seconds behind primary)
  • Cache hit ratio and memory utilization
  • Shard distribution and hotspot detection
  • Throughput (queries per second, transactions per second)
  • Queue depths and backpressure indicators

Logs to Capture:

  • Slow query logs (queries exceeding thresholds)
  • Connection events (opens, closes, failures)
  • Replication status changes and errors
  • Failover events and decision rationale
  • Cache evictions and expiration patterns
  • Application errors related to database access

Alerts to Set:

  • CPU or memory utilization > 80%
  • Replication lag > 30 seconds
  • Connection pool > 80% utilized
  • Cache hit ratio < threshold (e.g., < 80%)
  • Query latency p99 exceeding SLA
  • Shard skew exceeding threshold (e.g., > 2x average)
  • Disk usage > 85%
# Database scaling monitoring example
def check_db_health(db_config):
    metrics = {
        'cpu_percent': get_db_cpu(),
        'memory_percent': get_db_memory(),
        'connections': get_connection_count(),
        'replication_lag': get_replication_lag(),
        'cache_hit_ratio': get_cache_hit_ratio()
    }

    alerts = []
    if metrics['cpu_percent'] > 80:
        alerts.append(f"CPU critical: {metrics['cpu_percent']}%")
    if metrics['replication_lag'] > 30:
        alerts.append(f"Replication lag: {metrics['replication_lag']}s")

    return metrics, alerts

Security Checklist

  • Use strong authentication for all database connections
  • Implement TLS encryption for all database connections
  • Apply principle of least privilege for application database users
  • Rotate credentials regularly and use secret management
  • Audit database access logs for security review
  • Use network segmentation (databases not directly internet-accessible)
  • Encrypt data at rest (filesystem or application-level encryption)
  • Implement query whitelisting or restrictions for application users
  • Monitor for unauthorized access attempts
  • Test database access controls with penetration testing
  • Secure backup data with encryption
  • Implement read-only users separate from read-write users

Category

Related Posts

Database Capacity Planning: A Practical Guide

Plan for growth before you hit walls. This guide covers growth forecasting, compute and storage sizing, IOPS requirements, and cloud vs on-prem decisions.

#database #capacity-planning #infrastructure

Connection Pooling: HikariCP, pgBouncer, and ProxySQL

Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.

#database #connection-pooling #performance

Read/Write Splitting

Master-slave replication for read scaling. Routing strategies, consistency lag considerations, and when this pattern helps vs hurts your architecture.

#database #scaling #replication