Relational Databases: ACID Transactions and Data Foundation

Learn how relational databases work, from ACID properties and transactions to joins, indexing, and normalization in PostgreSQL and MySQL.

published: reading time: 40 min read author: GeekWorkBench

Relational Databases: ACID Transactions and the Foundation of Data

Introduction

A relational database stores data in tables with rows and columns. Each row represents a record, and each column represents an attribute. Tables connect through keys, which establish relationships between records.

The most important relationship type is the foreign key. A foreign key in one table points to the primary key of another table. This creates links. You do not store the user’s name in every order record. Instead, you store a user_id that points to the users table.

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total DECIMAL(10, 2) NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

This structure is what gives relational databases their power. You can ask complex questions by combining tables, filtering on multiple conditions, and aggregating results. The query optimizer figures out the most efficient way to answer your question given the available indexes.

Core Concepts

Relational databases provide guarantees around how transactions work. These guarantees have a memorable acronym: ACID.

Atomicity

Atomicity means a transaction happens completely or not at all. If you transfer money from account A to account B, either both the debit and credit happen, or neither does. You never end up with money deducted from A but not credited to B.

Database systems implement this through transaction logs. The database writes what it plans to do before doing it. If a crash happens mid-transaction, the log allows the system to either complete the operation or roll it back.

Consistency

Consistency ensures that every transaction moves the database from one valid state to another. Any constraints you define, like foreign keys or unique indexes, must hold after every transaction.

If you have a constraint that order.user_id must reference a valid users.id, the database rejects any transaction that tries to create an order with an invalid user_id. Consistency is your safety net for data integrity.

Isolation

Isolation determines how concurrent transactions interact. Multiple people querying the database at the same time should not see each other’s half-finished work.

If you transfer money from A to B while someone else reads your balance, isolation determines whether they see the old balance, the new balance, or something in between. Higher isolation levels prevent anomalies but reduce concurrency.

Durability

Durability means once a transaction commits, it survives system crashes. If you receive confirmation that a transaction succeeded, the data persists even if the database server loses power immediately after.

Most databases achieve this by writing to disk before reporting success. The write-ahead log technique ensures the database can replay incomplete transactions after a crash.

Transactions in Practice

Transactions group multiple operations into a single unit. PostgreSQL and MySQL both use the same basic syntax:

BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;

If the second UPDATE fails, the first one rolls back. Your balance stays at its original value.

The default isolation level in most databases is READ COMMITTED, which means you only see data that has been committed by other transactions. You can set stricter isolation:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SERIALIZABLE prevents most concurrency anomalies but can cause more lock contention and rolled-back transactions under heavy load.

Understanding Joins

Joins combine data from multiple tables. Understanding how they work is fundamental to writing efficient queries.

Inner Join

An inner join returns only rows that have matches in both tables. If you join orders to users, you only see orders that have a valid user_id.

SELECT users.email, orders.total, orders.created_at
FROM users
INNER JOIN orders ON users.id = orders.user_id
WHERE orders.status = 'completed';
Left Join

A left join returns all rows from the left table, even if they have no matching rows in the right table. The right side columns show NULL for non-matching rows.

SELECT users.email, orders.total
FROM users
LEFT JOIN orders ON users.id = orders.user_id;

This query shows every user, even those who have never placed an order. Their order columns will be NULL.

Right Join and Full Outer Join

Right joins work the opposite way, returning all rows from the right table. Full outer joins return all rows from both tables, filling in NULLs where there is no match. These are less commonly used but important to understand.

Join Performance

Joins can get expensive. A naive join algorithm scans both tables and compares every row. With proper indexing and query planning, the database can do much better.

The query optimizer decides how to execute your join based on table statistics, available indexes, and estimated row counts. Sometimes the optimizer makes poor choices, which is when you need to understand EXPLAIN output.

EXPLAIN SELECT users.email, orders.total
FROM users
INNER JOIN orders ON users.id = orders.user_id;
Query Optimization Guide

The EXPLAIN family of commands is the main diagnostic tool for slow queries. Here is how to use it effectively.

-- Basic plan
EXPLAIN SELECT * FROM orders WHERE user_id = 42;

-- With actual runtime stats (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT * FROM orders WHERE user_id = 42;

-- Costs and actual rows
EXPLAIN (ANALYZE, COSTS, VERBOSE) SELECT * FROM orders WHERE user_id = 42;

Common plan nodes to recognize:

Node TypeWhat It Means
Seq ScanFull table scan — usually a red flag on large tables
Index ScanIndex used to find rows
Index Only ScanData comes entirely from index, no heap fetch
Nested LoopJoins by probing inner relation per outer row
Hash JoinBuilds a hash table for joining large sets
Merge JoinSorts both inputs then merges — efficient on pre-sorted data
Bitmap Heap ScanRetrieves rows via bitmap from index

Diagnosing slow queries:

  1. Seq Scan on a large table returning few rows — You need an index. A query returning 100 rows from a 10M-row table doing a Seq Scan is a clear signal.

  2. Estimated rows far from actual rows — Statistics are stale. Run ANALYZE table_name; to update them.

  3. Hash Join spilling to disk — The query is running out of work_mem. Either increase it or restructure the query.

  4. Filter applied after join vs during join — For INNER JOINs, predicates in the ON clause filter earlier. Move filters to the right place.

-- Tables with more seq scans than index scans (missing indexes)
SELECT schemaname, tablename, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan * 10
ORDER BY seq_scan DESC;

-- Tables with high dead tuple percentage
SELECT relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / (n_dead_tup + n_live_tup + 1) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Top queries by total time
SELECT query, calls, total_time, mean_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

Index patterns worth knowing:

-- Partial index for a common filter pattern
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';

-- Expression index when you filter on a function
CREATE INDEX idx_users_email_lower ON users(lower(email));

-- Include columns to enable index-only scans (PostgreSQL 11+)
CREATE INDEX idx_orders_user_include ON orders(user_id) INCLUDE (total, created_at);
-- This covers: SELECT total, created_at FROM orders WHERE user_id = ?

Query rewriting patterns:

-- Subquery in IN clause — rewrite as a JOIN
SELECT * FROM orders WHERE id IN (SELECT order_id FROM items WHERE product_id = 123);
-- Becomes:
SELECT o.* FROM orders o INNER JOIN items i ON o.id = i.order_id WHERE i.product_id = 123;

-- Multiple queries fetching one row each (N+1 pattern) — batch them
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- Becomes:
SELECT * FROM orders WHERE user_id IN (1, 2) ORDER BY user_id;

-- Aggregation by category — let PostgreSQL do it in one scan
SELECT status, count(*) FROM orders GROUP BY status;
Indexing for Performance

Indexes speed up data retrieval. Without indexes, the database must scan every row to find what you are looking for. With the right index, it jumps directly to the relevant pages.

How Indexes Work

Most database indexes use a structure called B-tree, which keeps data sorted and allows efficient range queries. When you create an index on a column, the database maintains a separate sorted structure pointing to the actual data rows.

CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created ON orders(created_at);

These indexes let the database quickly find orders by user_id, filter by status, or range-query by date.

When Indexes Hurt

Indexes are not free. Every INSERT, UPDATE, or DELETE must update all relevant indexes. A table with many indexes writes slower than one with few indexes.

For columns with low cardinality (few unique values), indexes provide minimal benefit. An index on a boolean column rarely helps since half the rows match any query.

Composite Indexes

A composite index covers multiple columns. The order matters. A composite index on (user_id, created_at) helps queries that filter by user_id alone, or user_id plus created_at. It does not help queries that only filter by created_at.

CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

Normalization and Its Trade-offs

Database normalization structures tables to reduce redundancy. Normal forms are stages of normalization, from 1NF (first normal form) to 3NF (third normal form) and beyond.

First Normal Form

1NF requires atomic values and no repeating groups. Each cell contains a single value, not a list or array. Each row is unique.

Second Normal Form

2NF removes partial dependencies. Non-key columns must depend on the entire primary key, not just part of it. This matters for composite keys.

Third Normal Form

3NF removes transitive dependencies. Non-key columns should not depend on other non-key columns. If column A determines column B, and B determines column C, you have a transitive dependency.

-- Not normalized: total depends on quantity and price
CREATE TABLE order_items_bad (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    price DECIMAL(10, 2),
    total DECIMAL(10, 2)  -- This depends on quantity * price
);

-- Normalized: total is calculated, not stored
CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    price DECIMAL(10, 2)
);

Normalization reduces data duplication. Denormalization (deliberately introducing redundancy) can improve read performance. Both have their place.

PostgreSQL vs MySQL

Both PostgreSQL and MySQL are excellent relational databases. They share most SQL syntax but differ in implementation details.

PostgreSQL offers more advanced features: better support for JSON operations within a relational context, full-text search, window functions, and more sophisticated indexing options. It follows SQL standards more strictly.

MySQL has historically been faster for simple read-heavy workloads. Its default storage engine, InnoDB, provides ACID transactions and row-level locking. MySQL syntax sometimes diverges from SQL standards.

Both handle replication, backups, and high availability. PostgreSQL has better support for complex data types and operations. MySQL has a larger ecosystem of managed services and tooling.

For most applications, the difference matters less than you might think. Pick one and learn it well. Switching between them is rarely necessary and comes with significant migration costs.

Sharding and Replication Deep Dive

Horizontal scaling for relational databases takes two forms: sharding and replication. They solve different problems.

Read Replicas and Replication Topologies

Replication streams data from a primary to one or more replicas. Reads go to replicas, reducing load on the primary. Writes always go to the primary, which pushes them to replicas after the fact.

-- PostgreSQL streaming replication setup (primary)
ALTER DATABASE mydb SET wal_level = replica;
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;

-- Create replication slot on primary
SELECT pg_create_physical_replication_slot('replica1_slot');

-- On replica, create recovery.conf (or modern standby.signal)
-- primary_conninfo = 'host=primary_ip port=5432 application_name=replica1'
-- slot_name = 'replica1_slot'
-- recovery_target_timeline = 'latest'

Replication topologies:

TopologyProsConsBest For
Single primary + async replicasSimple, read scalingReplica lagRead-heavy apps, analytics
Synchronous replicationStrong consistency, automatic failoverExtra latency per writeFinancial transactions
Multi-primaryNo single write bottleneckConflict resolution complexityGeo-distributed writes
Cascading replicationTakes load off primary for many replicasLag compounds downstreamLarge replica counts
Sharding Patterns

Sharding partitions data across multiple databases. Each shard holds a subset, selected by a shard key.

-- Application-level sharding by user_id hash
-- Shard key = hash(user_id) % num_shards

CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER,
    shard_key INTEGER GENERATED ALWAYS AS (user_id % 4) STORED,
    total DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Shard 0, 1, 2, 3 are separate physical databases
-- Query router directs traffic based on shard_key

Sharding strategies:

StrategyHow It WorksChallenges
Hash-basedshard = hash(key) % num_shardsResharding means remapping everything
Range-basedshard = key / range_sizeSequential keys concentrate on one shard
Directory-basedLookup table maps keys to shardsLookup table becomes a bottleneck
Geo-basedRegion determines shardUneven distribution by geography

When to shard:

  • Data no longer fits on one server
  • Write throughput outpaces a single primary
  • Compliance demands specific geographic data residency

When NOT to shard:

  • A single server handles your data fine
  • You are optimizing prematurely
  • Better indexing or caching would solve the problem first
Distributed Consistency Trade-offs

Relational databases trade horizontal write scalability for strong consistency. NoSQL systems make the opposite trade. Most applications never hit the ceiling on a well-tuned relational setup — but engineers often jump to NoSQL before exhausting what relational databases can do.

Topic-Specific Deep Dives

ACID vs BASE Trade-offs

ACID and BASE are two different philosophies on how to handle consistency in distributed systems. ACID is what relational databases do. BASE showed up with NoSQL.

The ACID Model

ACID gives you four guarantees:

  • Atomicity: Transaction completes entirely or not at all
  • Consistency: All constraints hold after every transaction
  • Isolation: Concurrent transactions stay out of each other’s way
  • Durability: Committed data survives crashes

These guarantees make relational databases predictable. You know exactly what state your data is in. That predictability has a cost: keeping all nodes in sync requires coordination, and coordination takes time.

The BASE Model

BASE makes different trade-offs:

  • Basically Available: The system stays up, even if some nodes are stale
  • Soft state: Data might be inconsistent between nodes at any moment
  • Eventually consistent: Given enough time, all nodes will agree
graph LR
    A[Write to Node A] --> B{Quorum?<br/>W + R > N}
    B -->|Yes| C[Strong Consistency]
    B -->|No| D[Eventual Consistency]
    D --> E[Read from Node B<br/>Possible stale data]
    C --> F[All nodes converge]
ACID vs BASE Comparison
FactorACID (Relational)BASE (NoSQL)
ConsistencyStrongEventual
TransactionsFull ACIDLimited or none
LatencyHigherLower
AvailabilityLower (CP systems)Higher (AP systems)
Write scalingHardNatural
Data integrityEnforced by databaseApplication logic
ExamplesPostgreSQL, MySQLCassandra, DynamoDB
When to Pick Each

Go with ACID/relational when data integrity matters more than anything else — financial records, inventory, medical data. When you need complex transactions across multiple tables, or when your queries involve joins that would be a nightmare in application code.

Go with BASE/NoSQL when write throughput is the hard problem and you can tolerate reads that might be slightly behind. Document-oriented or key-value workloads that do not map naturally to tables fit here. So do use cases that need automatic sharding across geographic regions.

Hybrid Systems

A few modern databases try to split the difference:

  • Google Spanner: TrueTime hardware (specialized GPS + atomic clocks) enables strongly consistent distributed transactions with geographic replication. The hardware is the key.
  • CockroachDB: Distributed SQL without special hardware — uses Hybrid Logical Clocks instead. Operationally simpler than Spanner, slightly more latency on distributed transactions.
  • YugabyteDB: PostgreSQL-compatible, uses Raft consensus for distributed consistency.
  • Aurora: MySQL/PostgreSQL-compatible with a distributed storage layer that replicates across availability zones. Not truly distributed writes — still a single primary — but the storage layer handles replication and durability.

These systems give you relational semantics with distributed scale. The cost is operational complexity — running a distributed SQL cluster is harder than running a single PostgreSQL instance.

Trade-off Analysis

Use this table to guide architectural decisions based on workload characteristics:

FactorScenarioPostgreSQLMySQL (InnoDB)Trade-off Consideration
High write throughputMVCC reduces writer contentionRow-level locking, faster single-threadPostgreSQL better for mixed workloadsPostgreSQL wins for write-heavy concurrent workloads
Complex analytical queriesWindow functions, CTEs, excellent optimizerLimited window function supportPostgreSQL wins for analyticsPostgreSQL superior for complex analytical queries
Simple key-value lookupsB-tree indexed, reliableVery fast for primary key lookupsMySQL slightly faster for point queriesMySQL slightly faster; PostgreSQL for concurrent multi-key access
Full-text searchNative tsvector/tsquery, built-inMySQL Full-Text indexes availablePostgreSQL superior out of the boxPostgreSQL for production FTS; MySQL acceptable for simple cases
JSON data handlingJSONB with GIN indexes, full JSON supportJSON functions limitedPostgreSQL for JSON-centric workloadsPostgreSQL for JSON-heavy workloads; MySQL for basic JSON only
Horizontal shardingCitus extension, or manual shardingVitess, manual shardingPostgreSQL has better sharding supportPostgreSQL via Citus preferred; MySQL requires Vitess complexity
Managed cloud offeringsAurora, Cloud SQL, RDS, self-managedAurora, Cloud SQL, RDS, self-managedBoth well-supported in cloudBoth equivalent in managed cloud; PostgreSQL has richer tooling
ReplicationStreaming, logical decoding, extensiveBinlog-based, semi-sync availablePostgreSQL more flexible replicationPostgreSQL for flexible replication; MySQL for simplicity
Locking granularityRow-level, predicate locksRow-level, gap locks in RR modePostgreSQL prevents phantoms at lower costPostgreSQL for better concurrent write performance

When to Use / When Not to Use

When to Use Relational Databases:

  • Your data has clear entity relationships that map naturally to tables
  • You need ACID transactions for financial data, inventory, or orders
  • Your queries involve complex joins across multiple tables
  • You need powerful ad-hoc querying with SQL for analytics
  • Your workload is balanced between reads and writes
  • You need mature tooling, backups, and operational procedures

When Not to Use Relational Databases:

  • Your data structure varies dramatically between records (document storage fits better)
  • You need to store and query massive semi-structured data like JSON logs
  • Your primary access pattern is single-key lookups at extreme scale
  • Your writes massively outnumber reads and you need horizontal write scaling
  • You are building a graph-based application with complex relationship traversals

Production Failure Scenarios

FailureImpactMitigation
Primary disk failureDatabase unavailable, potential data lossRegular backups, write-ahead logs, replication to replicas
Lock contention from long transactionsQueries timeout, application hangsKeep transactions short, use appropriate isolation levels
Index corruptionQueries return incorrect resultsUse REINDEX periodically, enable checksums, verify with pg_checksums
Connection pool exhaustionNew queries fail, “too many connections”Right-size max_connections, implement connection pooling (PgBouncer)
Accidental DROP TABLE or DELETEData lossRestrict permissions, use DROP TABLE ... IF EXISTS, point-in-time recovery
Replication lag spikesRead replicas serve stale dataMonitor lag, route writes to primary after updates, use synchronous replication
Autovacuum not keeping upTable bloat, degraded performanceTune autovacuum workers, manually VACUUM when needed, monitor dead tuples
Query planner chooses wrong indexSlow queries, high CPUAnalyze with EXPLAIN, update statistics, use index hints if needed

Common Pitfalls / Anti-Patterns

  1. Missing indexes on foreign keys: Always index foreign key columns. Without indexes, parent-child joins cause full table scans.

  2. N+1 query problems: Fetching related data in a loop instead of using joins. A single query returning 100 users with their orders is better than 101 queries.

  3. Over-indexing: Creating indexes “just in case” hurts write performance. Every index slows down INSERT/UPDATE/DELETE.

  4. Using triggers for business logic: Triggers hide side effects and make debugging harder. Handle logic in application code.

  5. Ignoring NULL semantics: NULL is not “empty string” or “zero”. Queries behave differently with NULLs, and mixed NULL handling causes subtle bugs.

  6. Storing serialized objects or arrays in text columns: While convenient, this bypasses indexing and query capabilities. Use proper data types or a document store.

  7. Ignoring connection pooling: Each connection consumes memory. Use PgBouncer or similar to pool connections and avoid connection exhaustion.

  8. Not using prepared statements: Parsing overhead adds up. Use prepared statements for repeated queries.

  9. Mismatched types in joins: Joining VARCHAR to INTEGER causes implicit casts and prevents index usage. Always match types.

Quick Recap

Key Takeaways

  • Relational databases provide ACID guarantees that keep your data consistent and durable
  • Indexes are essential for read performance but slow down writes
  • Normalization reduces redundancy; denormalization improves read performance
  • Transactions group operations into atomic units that commit or rollback together
  • Choose PostgreSQL for advanced features and standards compliance, MySQL for simple read-heavy workloads
  • Monitor query latency, connection counts, and replication lag in production

Copy/Paste Checklist:

-- Essential performance monitoring queries
SELECT * FROM pg_stat_activity WHERE state != 'idle';
SELECT * FROM pg_stat_replication;
SELECT relname, n_dead_tup, last_vacuum FROM pg_stat_user_tables;
EXPLAIN (ANALYZE, BUFFERS) your_query_here;

-- Essential security settings
ALTER DATABASE mydb SET session_preload_libraries = 'pg_stat_statements';
ALTER ROLE readonly NOLOGIN;
GRANT CONNECT ON DATABASE mydb TO app_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;

-- Essential backup verification
SELECT pg_start_backup('baseline');
SELECT pg_stop_backup();

Observability Checklist

Metrics to Monitor:

  • Query latency (p50, p95, p99)
  • Active connections and connection saturation
  • Transaction throughput (commits/rollbacks per second)
  • Lock wait time and lock contention rates
  • Cache hit ratio (shared_buffers)
  • Disk I/O utilization and queue depth
  • Replication lag in seconds
  • Autovacuum activity and table bloat percentage
  • Index usage and missing indexes

Logs to Capture:

  • Slow query log (queries exceeding threshold, e.g., 100ms)
  • Error logs with full context including query parameters
  • Connection logs for authentication failures
  • Lock deadlocks and lock timeout events
  • Checkpoint and bgwriter activity
  • Autovacuum execution details

Alerts to Set:

  • Connection count > 80% of max_connections
  • Replication lag > 30 seconds
  • Disk usage > 85% on data volume
  • Long-running queries (> 60 seconds)
  • Cache hit ratio < 90%
  • Lock wait time spike
  • Autovacuum failing repeatedly
  • Replication slot age exceeding retention
-- Quick visibility queries
SELECT * FROM pg_stat_activity WHERE state != 'idle';
SELECT * FROM pg_stat_replication;
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum FROM pg_stat_user_tables;
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;

Security Checklist

  • Use strong authentication (scram-sha-256 or certificate-based, never md5)
  • Implement least-privilege access (users get only needed permissions)
  • Encrypt connections with SSL/TLS (set ssl = on and enforce in pg_hba.conf)
  • Encrypt data at rest (use filesystem encryption or database-level encryption)
  • Enable and review audit logging for sensitive operations
  • Use ALTER DATABASE SET session_preload_libraries = 'pg_stat_statements' for query monitoring without overexposure
  • Restrict pg_hba.conf to specific IP ranges, deny public access
  • Regularly apply security patches and update minor versions
  • Use network segmentation (database not directly internet-accessible)
  • Implement row-level security for multi-tenant scenarios
  • Backup encryption keys separately from encrypted backups
  • Test restore procedures in isolation from production

Real-World Case Studies

Netflix: From Oracle to PostgreSQL at Massive Scale

Netflix ran on Oracle for years — billions of rows, thousands of tables, serious hardware. In 2012 they migrated to PostgreSQL running on AWS RDS. The migration took two years of preparation, and the main motivation was cost: Oracle licensing was pricing them out at their scale.

The migration was incremental. They did not do a big-bang switch. They ran dual-write for months — writing to both Oracle and PostgreSQL simultaneously while comparing results. They built custom validation tools to catch any divergence. When they cut over, the decision was purely operational: flip a routing switch and watch the metrics.

The lesson: database migrations at scale are infrastructure problems more than database problems. The schema transfer was straightforward. The hard part was the two years of tooling, validation, and operational runbook development that made the switch boring.

Amazon: Why They Left Oracle for PostgreSQL and MySQL

Amazon’s consumer-facing databases migrated away from Oracle around 2013-2017. The stated reasons were licensing cost and control. At Amazon’s scale, Oracle’s per-core licensing model was a serious line item.

Amazon’s internal Postgres-compatible database, Aurora, launched in 2014. The architecture is not a vanilla Postgres — it separates the storage layer and replicates across three availability zones. This gives MySQL/PostgreSQL compatibility with durability characteristics no traditional RDS could match.

The takeaway: the database you pick shapes your operational constraints for years. Amazon built their own storage engine because off-the-shelf could not give them the durability they needed at their write volume. Most companies will not hit that ceiling, but it is worth knowing where the ceiling is.

Interview Questions

1. What is the difference between a clustered and a non-clustered index, and how does PostgreSQL handle indexing differently from MySQL?

MySQL InnoDB uses clustered indexes where the primary key is the physical row storage order. Secondary indexes store primary key values, not row pointers. SQL Server follows the same pattern. PostgreSQL has no concept of clustered indexes — tables are heaps and indexes point to heap locations (block number and offset). PostgreSQL does offer CLUSTER to reorder a table physically, but subsequent inserts are not maintained in order. This means PostgreSQL's index-only scans depend on visibility map freshness, while MySQL's covering indexes are more predictable. For read-heavy workloads on MySQL, choose your primary key carefully since it determines physical storage order.

2. A query that worked yesterday is suddenly slow. List the steps you would take to diagnose it.

First, run EXPLAIN (ANALYZE, BUFFERS) on the query and compare the plan to a known-good plan — the issue is often a stale statistics change causing the planner to pick a sequential scan instead of an index scan. Check whether the table has recently grown past an autovacuum threshold (high dead tuple count causes the planner to misestimate). Check whether an index was dropped or disabled. Verify that the connection is not hitting a different replica with stale statistics. Check for lock contention — if the query is waiting on a lock, execution time includes wait time, not just execution time.

3. Explain ACID properties and which one is most often relaxed in distributed databases.

Atomicity means a transaction commits entirely or rolls back entirely — no partial state. Consistency means the database moves from one valid state to another — all constraints and invariants hold. Isolation means concurrent transactions do not interfere with each other — the details of which isolation level determines exactly how much interference is visible. Durability means once a transaction commits, it survives crashes. Distributed databases most often relax isolation. Serializability is expensive — it requires coordination across nodes. Many distributed databases default to eventually consistent or read-committed isolation to avoid the latency penalty of synchronous coordination. Durability is also frequently traded against latency in write paths (write-ahead logs vs synchronous fsync).

4. How does a B-tree index work, and why is it the default index type in most relational databases?

A B-tree (balanced tree) keeps data sorted in pages, typically 8KB-16KB each. Root and intermediate pages contain pointers to child pages; leaf pages contain the actual index entries sorted by key. This means lookups, range scans, and sequential access are all efficient — O(log n) for point queries, O(log n + k) for range queries where k is results. B-trees stay balanced automatically through page splits on insert. They minimize disk I/O because tree depth is shallow (3-4 levels for millions of rows) and pages are large. Hash indexes are faster for equality on exact matches but cannot handle range queries, which are common in relational workloads, making B-trees the default.

5. What is a race condition in the context of database transactions, and how do isolation levels prevent them?

A race condition occurs when two transactions operate on the same data simultaneously and the outcome depends on execution order. READ UNCOMMITTED allows dirty reads — seeing uncommitted changes. READ COMMITTED (PostgreSQL/MySQL default) prevents dirty reads but allows non-repeatable reads (re-reading the same row yields different values). REPEATABLE READ (MySQL default) prevents dirty and non-repeatable reads but allows phantom reads (new rows matching a range query). SERIALIZABLE prevents all anomalies by executing transactions as if sequentially, typically via two-phase locking or snapshot isolation. PostgreSQL additionally implements SELECT FOR UPDATE and FOR SHARE to explicitly lock rows, giving developers control at the cost of potential deadlocks.

6. Explain the N+1 query problem and how you would identify and fix it.

The N+1 problem occurs when fetching a list of records (1 query) then making N additional queries — one per record — to load related data. Example: fetching 100 users (1 query) then loading each user's orders (100 queries). You identify it via query logging: a single page load generates hundreds of similar queries, or pg_stat_statements shows many calls with the same pattern. Fix approaches: use JOINs to fetch related data in one query, use batch loading (IN clause with limited batches), use an ORM's eager loading features (N+1 prevention), or use a DataLoader pattern to batch and cache individual lookups. The right fix depends on whether you need all related data or can paginate it.

7. What is the difference between DELETE, TRUNCATE, and DROP in SQL?

DELETE is a DML operation that removes rows one by one, triggers DELETE triggers, respects transaction semantics (can be rolled back), and logging depends on the recovery model — it is logged in full recovery mode. TRUNCATE is a DDL operation that deallocates data pages, making it much faster for large tables; it resets identity columns, does not fire triggers, and is not fully logged (it logs page deallocations, not rows). DROP removes the entire table structure including indexes and constraints — it is a DDL operation, cannot be rolled back in most databases, and removes the table entirely. For large tables, TRUNCATE TABLE (table_name) is orders of magnitude faster than DELETE FROM (table_name). DROP is irreversible without backups.

8. How would you design a database schema for a many-to-many relationship, and what join table considerations matter at scale?

A many-to-many relationship requires a join table with foreign keys to both tables. For example, students and courses: a student can take many courses and a course has many students. The join table (enrollments) has (student_id, course_id) as a composite primary key. At scale, add a surrogate key for the join table if you need to reference specific enrollments (for grades, dates, etc.). Always index both foreign key columns — a composite index on (student_id, course_id) handles queries for all courses a student takes and all students in a course. If queries are predominantly one-directional, consider separate indexes. For high-write scenarios, the join table becomes a hotspot; partitioning by one of the foreign keys can help distribute load.

9. What are materialized views and when would you use them instead of regular views?

A regular view is a saved query — it runs the underlying query each time you SELECT from it. A materialized view pre-computes and stores the result as a physical table, refreshing on a schedule or on-demand. Use materialized views when you have expensive aggregations or joins computed frequently (daily sales summaries, reporting dashboards) and near-real-time accuracy is not required. PostgreSQL supports materialized views with REFRESH MATERIALIZED VIEW. MySQL does not have native materialized view support — you implement the pattern manually with summary tables and scheduled updates. Trade-offs: materialized views improve read performance but add storage overhead, require refresh strategy, and may serve stale data.

10. Explain the difference between optimistic and pessimistic locking. When would you use each?

Pessimistic locking assumes conflicts will happen and prevents them by acquiring locks before a transaction: SELECT FOR UPDATE in PostgreSQL or SELECT ... LOCK IN SHARE MODE in MySQL. The lock is held until the transaction commits. This prevents conflicts but can cause lock contention and deadlocks under high concurrency. Optimistic locking assumes conflicts are rare: you read a row with a version number or timestamp, and on update you check that the version has not changed (WHERE id = ? AND version = ?). If it changed, you retry or abort. This allows higher concurrency but means you must handle retry logic in application code. Use pessimistic locking when conflicts are frequent or expensive to resolve (financial transactions, inventory decrements). Use optimistic locking when conflicts are rare and concurrency is important (user profile updates, comment likes).

11. What is a stored procedure, and what are the trade-offs of using them versus application-layer queries?

A stored procedure is executable SQL logic stored in the database itself. Advantages: execution plan caching and reuse, reduced network round-trips for multi-statement operations, encapsulation of business logic that must apply regardless of which application accesses the database. Disadvantages: vendor lock-in (procedures are database-specific), harder to version control and test (requires database deployment), limited debugging tools, and they shift business logic away from application code where modern testing and CI/CD practices are well-established. For most applications, keep business logic in the application layer and use stored procedures sparingly — primarily for performance-critical operations where network latency is the bottleneck, or for operations that must be atomic regardless of which client executes them.

12. How does database normalization reduce redundancy, and what are the practical limits of highly normalized schemas?

Normalization eliminates redundant data by splitting tables and establishing relationships. In an unnormalized order entry, customer name appears on every order row; in 3NF, customer name exists only in the customers table. This eliminates update anomalies — changing a customer's name requires updating one row, not many. 1NF enforces atomicity (no repeating groups), 2NF removes partial dependencies on composite keys, 3NF removes transitive dependencies. Beyond 3NF, BC NF and 4NF handle edge cases. The practical limit is usually 3NF — higher forms matter mainly for specialized domains. Highly normalized schemas can require many joins for common queries, which hurts read performance. Denormalization (deliberately reintroducing controlled redundancy) is a deliberate trade-off for read-heavy workloads, not a failure of design.

13. What is the difference between synchronous and asynchronous replication, and what are the durability implications of each?

In synchronous replication, the primary waits for at least one replica to confirm the write before reporting success to the client. If the replica does not confirm within a timeout, the primary fails the transaction. This guarantees zero data loss if the primary fails — the replica has all committed data. However, write latency increases by at least the network round-trip to the replica, which matters at scale. Asynchronous replication propagates writes to replicas after the primary confirms success to the client. Writes are faster but there is a window of data loss if the primary fails — typically measured as replication lag in seconds. Semi-synchronous replication (MySQL) waits for at least one replica to receive the write but not necessarily apply it. PostgreSQL's synchronous replication requires confirmation before commit; streaming replication can be configured for various guarantee levels.

14. What is a deadlock, how does PostgreSQL detect and resolve them, and what strategies prevent them in application code?

A deadlock occurs when two or more transactions each hold locks the other needs and neither can proceed. PostgreSQL detects deadlocks via a wait-for graph — when a transaction detects it is waiting on a lock held by another transaction that is itself waiting, it checks for a cycle. If found, it aborts one transaction (the victim) with error 40P01. Prevention strategies: acquire locks in a consistent order across all transactions (if all code acquires table locks in the same order, deadlocks are impossible), keep transactions short (less lock hold time means less window for cycles), use lock timeout to bound wait time instead of waiting indefinitely, avoid user interaction inside transactions (batch operations instead).

15. Explain the difference between a unique index and a primary key constraint, and when the distinction matters in PostgreSQL vs MySQL.

Both enforce uniqueness, but primary key adds NOT NULL and is the clustered index in MySQL (InnoDB). PostgreSQL does not cluster by default — the primary key is just a unique constraint with a NOT NULL added. In MySQL, the primary key determines physical row order, so choose it carefully for write patterns. In PostgreSQL, you can have one PK that is NOT NULL and another unique index that allows NULLs (for composite keys where one column might be absent). Foreign keys can reference either. The distinction matters for replication: MySQL's statement-based replication replicates primary key usage for binlog formatting.

16. How would you migrate a large table (billions of rows) from one schema to another with minimal downtime?

Big table migrations require a multi-phase approach. Option 1 (online migration): add new columns as nullable, backfill in batches using chunked UPDATE with sleep between batches to reduce load, add triggers to keep old and new columns in sync during transition, backfill complete, add NOT NULL constraints, drop old columns. Option 2 (shadow table): create new table with target schema, backfill from old table in batches, symlink cutover (rename old to old_backup, rename new to active in single transaction), verify, drop old_backup. The key constraint: you must be able to roll back at every step. For very large tables, tools like `pg_repack` or `ALTER TABLE ... SET TABLESPACE` help move physical storage without table rewrite locks.

17. What is table bloat and what causes it? How do you detect and mitigate it in PostgreSQL?

Table bloat is dead tuple accumulation — rows that were updated or deleted but not reclaimed. Causes: high UPDATE/DELETE rate without sufficient autovacuum, long-running transactions holding snapshots (preventing vacuum from seeing dead tuples), abrupt connection drops without proper cleanup, aggressive indexing on frequently updated columns. Detection: `pg_stat_user_tables.n_dead_tup` growing, `pgstattuple` extension for precise bloat estimation, comparing `pg_class.relpages` (physical pages) against expected from `pg_class.reltuples` (row count). Mitigation: tune autovacuum to run more aggressively on high-write tables, manually VACUUM when needed, increase `autovacuum_vacuum_scale_factor` or lower threshold, consider `VACUUM FULL` (requires exclusive lock) or `pg_repack` for online cleanup. Prevention: right-size `work_mem` to reduce sorts spilling to disk, avoid long transactions.

18. What is the difference between a bitmap index scan and a regular index scan, and when does PostgreSQL choose each?

A regular index scan fetches each matching row individually from the heap — one disk I/O per row. A bitmap index scan first collects all matching index entries into a bitmap (in memory), then sorts the bitmap by heap page location, then fetches each page once with all rows from that page. Bitmap scans win when many rows match and are scattered across many pages — fetching each row individually would cause random I/O. PostgreSQL chooses bitmap scans when the index selectivity is moderate (not too selective, not too unselective). For highly selective queries returning few rows, regular index scan is faster because you stop after finding them. You can force bitmap scan with `enable_bitmapscan = off` to test whether the planner is making the right choice.

19. How does PostgreSQL's advisory locking work and when would you use it over regular row locks?

Advisory locks are application-defined locks identified by bigint keys. They are not tied to data rows — you explicitly acquire and release them. Functions: `pg_advisory_lock(key)`, `pg_advisory_xact_lock(key)` (auto-release at transaction end), `pg_try_advisory_lock(key)` (non-blocking). Use cases: implement application-level resource serialization without adding database rows (e.g., ensure only one job of a certain type runs at a time), coordinate between multiple application instances without database rows as coordination points, implement distributed counters or rate limiting. Row locks block on the locked row until the transaction completes; advisory locks give you coarser-grained control. The tradeoff: row locks are automatically released on commit/rollback; advisory locks require explicit release or transaction scope.

20. What are the trade-offs between using a natural key versus a surrogate key (sequence/UUID) as a primary key?

Natural keys come from your domain data (e.g., ISBN for books, country code for countries). Surrogate keys are system-generated (auto-increment integers, UUIDs). Surrogate keys advantages: never change, no possibility of conflicts, compact storage (4 bytes for bigint vs variable for strings), no business logic leakage into foreign keys. Natural key advantages: self-documenting (a join on country_code is readable, a join on country_id is not), enables referential integrity checks at application level, sometimes eliminates the need for a join. For most OLTP workloads, surrogate keys are preferred — they are immutable and the natural key can live as a unique constraint instead of the PK. For reference tables with stable, meaningful codes (ISO country codes, currency codes), natural keys make sense. UUIDs as primary keys have write amplification issues in B-tree indexes (random insertion patterns cause page splits and index bloat) — use `gen_random_uuid()` only when truly needed, or consider `uuid_generate_v4()` with `pgcrypto` extension for sequential UUID generation.

Further Reading

Database Engine Internals Deep Dive

Understanding the internals helps you reason about performance, concurrency, and failure modes.

MVCC and Read Consistency

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to allow readers and writers to not block each other. When you UPDATE a row, PostgreSQL creates a new version while keeping the old version until no transaction needs it. Each transaction sees a snapshot based on its start time — this is why READ COMMITTED works as it does.

-- See the current transaction snapshot
SELECT txid_current_snapshot();

-- Check which rows are visible to current transaction
SELECT * FROM my_table WHERE ctid = '(0,1)';

-- Monitor tuple visibility and age
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

The visibility map tracks which pages contain only dead tuples. When the visibility map shows a page is all-dead, future index-only scans can skip heap fetches. This is why autovacuum freshness matters for read performance.

WAL and Crash Recovery

The Write-Ahead Log (WAL) is the foundation of PostgreSQL durability. Every change is written to the WAL before being applied to data pages. On crash, PostgreSQL replays the WAL to reach a consistent state.

-- Check WAL activity and size
SELECT pg_current_wal_lsn(), pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');

-- Force a checkpoint (admin task, use carefully)
SELECT pg_checkpoint_sync();

-- Check wal usage per database
SELECT datname, total_wal_bytes FROM pg_stat_database;

wal_level = replica enables logical decoding for change data capture and replication. For production, logical level provides enough information for point-in-time recovery plus replication.

Autovacuum Tuning

Autovacuum prevents table bloat by reclaiming dead tuples. Default settings work for moderate workloads but need tuning for high-write tables:

-- Table-specific autovacuum tuning for high-write tables
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,    -- Trigger at 1% dead tuples vs default 20%
    autovacuum_analyze_scale_factor = 0.005, -- Analyze at 0.5% changes vs default 10%
    autovacuum_vacuum_cost_delay = 2         -- Lower cost delay for faster cleanup
);

-- Monitor autovacuum activity
SELECT relname, last_autovacuum, autovacuum_count,
       COALESCE(ROUND(n_dead_tup::numeric / (n_live_tup + n_dead_tup + 1) * 100, 2), 0) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

-- Manual vacuum for emergency cleanup
VACUUM verbose orders;

Connection Pooling with PgBouncer

PgBouncer reduces connection overhead by pooling connections to PostgreSQL. It sits between your application and the database:

; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3

With pool_mode = transaction, PgBouncer disconnects the backend connection when the client transaction ends. Your application never holds a backend connection between transactions, allowing you to support many more concurrent application connections than PostgreSQL can handle.

Conclusion

Relational databases remain relevant because they solve real problems. ACID transactions give you confidence your data stays consistent. Joins let you ask complex questions across related data. Indexes make queries fast. Normalization keeps your data from becoming a mess.

PostgreSQL and MySQL both handle these concerns well. The features overlap significantly. My preference leans toward PostgreSQL for its feature set and standards compliance, but MySQL is equally capable for most workloads.

For related reading, see my posts on NoSQL Databases to understand when alternative data models make sense, and Database Scaling to learn how to handle growth beyond a single database instance.

Category

Related Posts

Connection Pooling: HikariCP, pgBouncer, and ProxySQL

Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.

#database #connection-pooling #performance

Table Partitioning: Range, Hash, and List Strategies

Learn how table partitioning improves query performance and maintenance, including range partitioning, hash partitioning, and partition pruning.

#databases #partitioning #performance

dbt: The SQL-First Transformation Tool for Data Teams

Discover how dbt brings software engineering practices—version control, testing, documentation—to SQL-based data transformations.

#data-engineering #dbt #sql