Physical Clocks in Distributed Systems: NTP and Synchronization
Learn how physical clocks work in distributed systems, including NTP synchronization, clock sources, and the limitations of wall-clock time for ordering events.
Introduction
Hardware Clocks
Modern computers contain a crystal oscillator that drives a counter incrementing millions of times per second. The operating system reads this counter to track elapsed time since some epoch.
On Linux, two clocks matter:
# System clock: maintained by the kernel, used for gettimeofday()
# Hardware clock: battery-backed CMOS clock, persists across reboots
# Check current time sources
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# Output might be: tsc, hpet, or acpi_pm
# View clock precision
adjtimex --print | grep precision
The hardware clock drifts. Crystal oscillators are accurate to maybe 20-50 parts per million. That sounds tiny, but at that rate an uncorrected clock gains or loses roughly 2-4 seconds per day. Over a month, it could drift by a minute or two.
The System Clock
The kernel maintains a software clock derived from the hardware clock. This software clock is what your applications actually use:
// JavaScript: uses the system clock
const now = Date.now(); // Milliseconds since epoch
// Go: uses the system clock
import "time"
t := time.Now().UnixNano() // Nanoseconds since epoch
// C: uses gettimeofday
struct timeval tv;
gettimeofday(&tv, NULL);
int64_t microseconds = (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
The epoch is typically January 1, 1970 UTC (the Unix epoch), though some systems use different starting points.
Linux Kernel Timekeeping Architecture
The Linux kernel maintains a sophisticated timekeeping subsystem that abstracts hardware differences and provides stable time to applications.
The timekeeper Structure
At the heart of Linux timekeeping is the timekeeper structure, maintained by the kernel:
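struct timekeeper { // Simplified; exact fields vary by kernel version
    struct tk_read_base tkr_mono; // Readout base for CLOCK_MONOTONIC (holds the current clocksource)
    struct tk_read_base tkr_raw; // Readout base for CLOCK_MONOTONIC_RAW
    u64 xtime_sec; // CLOCK_REALTIME seconds since epoch
    struct timespec64 wall_to_monotonic; // Offset from wall time to monotonic time
    ktime_t offs_real; // Offset from monotonic to CLOCK_REALTIME
    ktime_t offs_boot; // Offset from monotonic to CLOCK_BOOTTIME
    unsigned int clock_was_set_seq; // Bumped whenever the clock is set
    u8 cs_was_changed_seq; // Bumped whenever the clocksource changes
    // ... NTP state and other fields
};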
The kernel updates xtime_sec from the current clocksource on each tick. Applications read this via clock_gettime().
Clocksource Management
Linux manages multiple clock sources through an abstraction layer:
# List available clocksources
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
# Output: tsc hpet acpi_pm
# Current active clocksource
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# Output: tsc
Each clocksource implements the clocksource struct:
struct clocksource { // Simplified; see include/linux/clocksource.h
    u64 (*read)(struct clocksource *cs); // Read the raw cycle counter
    u64 mask; // Bitmask for the counter width
    u32 mult; // Cycle to nanosecond multiplier
    u32 shift; // Cycle to nanosecond shift
    u32 maxadj; // Maximum allowed adjustment to mult
    const char *name;
    struct list_head list;
    int rating; // Higher-rated clocksources are preferred
    // ...
};
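The mult and shift values convert cycle counts to nanoseconds using fixed-point arithmetic: ns = (cycles * mult) >> shift. For example, mult=1000000 with shift=20 maps each cycle to 1000000 / 2^20 ≈ 0.954 ns, which corresponds to a counter running at about 1.05 GHz.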
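The fixed-point conversion can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names `cycles_to_ns` and `compute_mult` and the 1 GHz counter are assumptions for the example, not the kernel's own helpers.

```python
def cycles_to_ns(cycles: int, mult: int, shift: int) -> int:
    """Fixed-point conversion: ns = (cycles * mult) >> shift."""
    return (cycles * mult) >> shift


def compute_mult(freq_hz: int, shift: int) -> int:
    """Choose mult so that one cycle maps to 1e9 / freq_hz nanoseconds."""
    return (1_000_000_000 << shift) // freq_hz


if __name__ == "__main__":
    shift = 20
    mult = compute_mult(1_000_000_000, shift)  # hypothetical 1 GHz counter
    print(mult)                                # 1048576, exactly 1 ns per cycle
    print(cycles_to_ns(5_000, mult, shift))    # 5000
    # The mult=1000000, shift=20 pair: 2^20 cycles map to 1,000,000 ns
    print(cycles_to_ns(1_048_576, 1_000_000, 20))  # 1000000
```

Integer multiply-and-shift avoids a division on every clock read, which is why the parameters are precomputed once per clocksource.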
VDSO and Time Acceleration
For performance, Linux maps time-related functions into user space via VDSO (Virtual Dynamic Shared Object):
# Check VDSO for clock_gettime
ldd /bin/date | grep vdso
# Output: linux-vdso.so.1
# VDSO accelerates:
# - clock_gettime(CLOCK_REALTIME)
# - clock_gettime(CLOCK_MONOTONIC)
# - gettimeofday()
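VDSO eliminates kernel transitions for most time queries. The kernel keeps the timekeeping parameters (base time, mult, shift) up to date in a shared data page, and user code combines them with a direct read of the clocksource counter. If the active clocksource cannot be read from user space, the call falls back to a real system call.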
adjtimex and NTP Control
The adjtimex() system call provides low-level access to the kernel’s NTP state:
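#include <stdio.h>
#include <string.h>
#include <sys/timex.h>

struct timex tx;
memset(&tx, 0, sizeof(tx));
tx.modes = ADJ_OFFSET; // Request a phase adjustment (requires CAP_SYS_TIME)
tx.offset = 500; // 500 microseconds offset
if (adjtimex(&tx) == -1)
    perror("adjtimex");
// Check kernel NTP status (modes = 0 is a read-only query)
memset(&tx, 0, sizeof(tx));
if (adjtimex(&tx) == -1)
    perror("adjtimex");
printf("status: %d, offset: %ld, freq: %ld\n",
       tx.status, tx.offset, tx.freq);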
Key fields:
- offset: Current time adjustment in microseconds (nanoseconds when STA_NANO is set)
- freq: Frequency offset in scaled ppm (ppm with a 16-bit binary fraction)
- status: Clock status flags (STA_PLL, STA_UNSYNC, etc.)
- maxerror: Maximum error estimate in microseconds
- esterror: Estimated error in microseconds
time Namespace
Linux namespaces allow containers to have independent time offsets:
# Create a time namespace with offset
unshare --time --monotonic=3600 --boottime=7200
# Inside the namespace:
# Monotonic time starts at 3600s
# Boottime starts at 7200s
# Useful for testing time-dependent applications
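The time namespace feature lets containers test time-dependent behavior without waiting for real time to pass. Only CLOCK_MONOTONIC and CLOCK_BOOTTIME can be offset this way; CLOCK_REALTIME is still shared with the host.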
timerfd and POSIX Timers
Linux provides high-resolution timers through timerfd:
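#include <stdint.h>
#include <sys/timerfd.h>
#include <unistd.h>
// Create a timer on the monotonic clock so wall-clock steps do not disturb the interval
int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
// Set it to fire every 100ms
struct itimerspec spec;
spec.it_interval.tv_sec = 0;
spec.it_interval.tv_nsec = 100000000; // 100ms
spec.it_value.tv_sec = 0;
spec.it_value.tv_nsec = 100000000;
timerfd_settime(tfd, 0, &spec, NULL);
// Blocking read: returns the number of expirations since the last read
// (with TFD_NONBLOCK, read() fails with EAGAIN until the timer has fired)
uint64_t expirations;
read(tfd, &expirations, sizeof(expirations));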
timerfd integrates with select(), poll(), and epoll, making it suitable for event-driven applications needing precise timing.
NTP Synchronization
The Network Time Protocol synchronizes system clocks to external time sources. It is the backbone of timekeeping on the internet.
How NTP Works
NTP uses a hierarchical system of time sources:
graph TD
A[Stratum 0: Atomic Clocks] --> B[Stratum 1: Time Servers]
B --> C[Stratum 2: University Servers]
C --> D[Stratum 3: Public Servers]
D --> E[Your Servers]
Stratum 0 devices are atomic clocks and GPS receivers. Stratum 1 servers sync directly to these. Each layer down adds some uncertainty.
NTP Algorithm
NTP works by exchanging UDP packets.
// Simplified NTP exchange
// 1. Client sends timestamp T1
// 2. Server receives at T2
// 3. Server sends response at T3
// 4. Client receives at T4
// Round-trip delay:
delay = T4 - T1 - (T3 - T2);
// Clock offset:
offset = (T2 - T1 + (T3 - T4)) / 2;
NTP takes multiple samples and uses selection algorithms to filter out bad measurements. The ntpd daemon runs continuously, adjusting the clock in small increments.
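As a worked example of the two formulas, here is a minimal Python sketch of a single exchange. The function name `ntp_offset_and_delay` and the millisecond figures are made up for illustration; a real client combines many such samples.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, delay) from one NTP exchange.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    """
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, delay


if __name__ == "__main__":
    # All values in milliseconds. The client clock is 50 ms behind the server.
    # Symmetric path: 20 ms each way, 1 ms of server processing.
    print(ntp_offset_and_delay(t1=0, t2=70, t3=71, t4=41))  # (50.0, 40)

    # Asymmetric path: 10 ms out, 30 ms back, same true offset of 50 ms.
    # The estimate is off by half the asymmetry (10 ms).
    print(ntp_offset_and_delay(t1=0, t2=60, t3=61, t4=41))  # (40.0, 40)
```

The second case shows why the offset formula is only exact when the outbound and return paths take the same time: the measured delay is identical in both runs, so the client cannot tell them apart.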
Configuring NTP
Most Linux distributions now ship chrony, which typically synchronizes faster than the older ntpd and copes better with network fluctuations:
# Install and start chrony
apt install chrony
# Check synchronization status
chronyc tracking
# Sample output:
# Reference ID : A1B2C3D4 (time.example.com)
# Stratum : 3
# Ref time (UTC) : Mon Mar 24 10:15:30 2026
# System time : 0.000012345 seconds slow of NTP time
# Last offset : -0.000023456 seconds
# RMS offset : 0.000034567 seconds
Clock Accuracy and Drift
Measuring Clock Drift
Real clocks do not keep perfect time. Drift rates vary based on temperature, age, and hardware quality:
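# Measure drift rate over 24 hours
# Before: stop the NTP daemon (service name varies by distribution),
# then step the clock once
systemctl stop chronyd
ntpdate -b pool.ntp.org
# After 24 hours:
# Query the offset without changing the clock
ntpdate -q pool.ntp.org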
Drift rates typically range from 1 to 50 parts per million (ppm). A 1 ppm drift means your clock drifts about 0.086 seconds per day.
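The conversion from ppm to accumulated error is simple multiplication. A minimal sketch, assuming a constant drift rate (real oscillators wander with temperature and age); `drift_seconds` is a name chosen for this example:

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    """Accumulated clock error for a constant drift rate given in ppm."""
    return ppm * 1e-6 * elapsed_seconds


if __name__ == "__main__":
    print(f"{drift_seconds(1, SECONDS_PER_DAY):.3f}")        # 0.086 s/day at 1 ppm
    print(f"{drift_seconds(50, SECONDS_PER_DAY):.3f}")       # 4.320 s/day at 50 ppm
    print(f"{drift_seconds(50, 30 * SECONDS_PER_DAY):.1f}")  # 129.6 s over 30 days
```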
Temperature Effects
Crystal oscillators are temperature-sensitive. Large swings in temperature cause measurable drift. Data centers with stable environments have an advantage here.
Temperature vs Drift (typical values):
- 20°C: 0 ppm (baseline)
- 25°C: +2 ppm
- 30°C: +5 ppm
- 35°C: +10 ppm
Rapid temperature changes cause immediate frequency shifts.
This is why outdoor servers or edge devices struggle with clock accuracy.
Hardware Clock Types
Different clock sources have different characteristics:
| Clock Source | Accuracy | Stability | Cost |
|---|---|---|---|
| TSC (Time Stamp Counter) | High | Poor under load | Free |
| HPET | Medium | Good | Motherboard |
| ACPI PM Timer | Low | Good | Free |
| GPS Receiver | Very High | Excellent | $50-200 |
| Atomic (Cesium) | Extremely high | Excellent | $10k+ |
The TSC is the fastest clock source to read. On older CPUs its rate varied with CPU frequency scaling; on modern systems with a constant TSC frequency, it is usually reliable for short intervals.
Limitations of Physical Clocks
Physical clocks have fundamental problems for distributed systems.
Clock Skew vs Clock Drift
Drift is the rate at which a clock runs fast or slow. Skew is the difference between two clocks at a point in time. Even if all your clocks drift at the same rate, skew persists because they started from different times; when drift rates differ, skew also grows over time.
sequenceDiagram
participant A as Server A
participant B as Server B
participant NTP as NTP Server
Note over A: Clock: 10:00:00.000
Note over B: Clock: 10:00:00.150
A->>NTP: Sync request
B->>NTP: Sync request
Note over A: After sync: 10:00:00.050
Note over B: After sync: 10:00:00.100
Note over A,B: Still 50ms apart due to residual sync error
Non-Monotonic Clocks
System clocks can jump forward or backward. NTP steps the clock when the offset exceeds a threshold and slews it gradually otherwise. Slewing only changes the clock's rate, but a backward step (or a manual change) means your application might see time go backward:
// Time going backward is possible with NTP sync
const t1 = Date.now(); // 1000
await doSomething();
const t2 = Date.now(); // 998 - time went backward!
Monotonic clocks solve this for relative time measurements, but they are not globally synchronized.
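For measuring elapsed time inside one process, a monotonic clock is the usual choice. A minimal Python sketch follows; the helper name `timed` is an assumption for illustration, and the resulting values are only meaningful on the machine that produced them.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds) using a monotonic clock.

    time.monotonic() never goes backward, so the elapsed value cannot be
    negative even if the wall clock is stepped while fn() runs.
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    result, elapsed = timed(lambda: sum(range(1_000_000)))
    print(result, f"{elapsed:.6f}s")
```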
Leap Seconds
Leap seconds are added to UTC irregularly: 27 were inserted between 1972 and 2016, always at the end of June or December. Each one causes clocks to repeat a second or pause. Most systems handle this poorly:
# Leap second inserted: June 30, 2012 23:59:60 UTC
# Many Linux servers spun at 100% CPU because of a kernel timer bug
# Sites including Reddit and LinkedIn reported outages
Linux now handles leap seconds better, but they remain a source of unpredictability.
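One reason leap seconds are awkward is that Unix time has no slot for them. The sketch below uses the leap second inserted at the end of December 31, 2016 to show that POSIX timestamps count one second across a span in which two real seconds elapsed; the helper name `posix_seconds_between` is an assumption for this example.

```python
import calendar


def posix_seconds_between(start_utc, end_utc):
    """Difference between two UTC calendar times, measured in POSIX seconds."""
    return calendar.timegm(end_utc) - calendar.timegm(start_utc)


if __name__ == "__main__":
    # (year, month, day, hour, minute, second)
    before = (2016, 12, 31, 23, 59, 59)
    after = (2017, 1, 1, 0, 0, 0)
    # UTC inserted 23:59:60 between these two instants, so two SI seconds
    # elapsed, but POSIX time only advances by one.
    print(posix_seconds_between(before, after))  # 1
```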
Using Physical Clocks in Distributed Systems
Despite their limitations, physical clocks are widely used. Knowing when to trust them matters.
When Physical Clocks Work
For non-critical timestamps and logging, wall-clock time is fine:
// Logging: wall-clock time is appropriate
logger.info({
event: "user_login",
timestamp: new Date().toISOString(), // Wall clock OK for logs
userId: user.id,
});
// Audit trails benefit from wall clock
// Even if slightly off, audit timestamps are for human readability
When Physical Clocks Fail
For ordering events or conflict resolution, physical clocks are dangerous:
// BAD: Using wall clock for conflict resolution
function resolveConflict(local, remote) {
// Assumes remote timestamp > local timestamp means remote is newer
// FAILS when clocks are skewed
if (remote.updatedAt > local.updatedAt) {
return remote;
}
return local;
}
The PACELC theorem post discusses why timestamp-based conflict resolution breaks down in distributed systems.
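To make the failure concrete, here is a small Python simulation of last-write-wins under skew. The `Write` type, the `stamp` helper, and the 150 ms skew are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    updated_at: float  # wall-clock timestamp taken from the writer's own clock


def last_write_wins(local: Write, remote: Write) -> Write:
    """Same rule as resolveConflict: the larger timestamp wins."""
    return remote if remote.updated_at > local.updated_at else local


def stamp(true_time: float, clock_skew: float) -> float:
    """Timestamp a machine would record if its clock is off by clock_skew."""
    return true_time + clock_skew


if __name__ == "__main__":
    # Server A's clock runs 150 ms fast; server B's clock is accurate.
    first = Write("from A (older)", stamp(100.000, +0.150))
    second = Write("from B (newer)", stamp(100.100, 0.0))
    # B's write really happened 100 ms later, but A's timestamp is larger.
    winner = last_write_wins(local=second, remote=first)
    print(winner.value)  # from A (older)
```

The newer write is silently discarded, and nothing in the data reveals that it happened.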
Millisecond vs Microsecond vs Nanosecond
Precision vs Accuracy
Precision is how finely you can measure time. Accuracy is how close that measurement is to truth. You can be precise but inaccurate.
Clock Source | Precision | Accuracy
----------------------|-----------|----------
System clock (NTP) | 1 ms | 10-100 ms
System clock (local) | 1 us | 1-50 ms (drift)
RDTSC (modern CPUs) | 1 ns | 1-50 ms (drift)
PTP (IEEE 1588) | 1 ns | 100 ns
GPS | 1 ns | 50 ns
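You can inspect the precision side of this on your own machine. The following sketch uses Python's `time.get_clock_info` to report the advertised resolution of a few clocks; the wrapper name `clock_report` is an assumption. It says nothing about accuracy, which depends on how well the clock is synchronized.

```python
import time


def clock_report(names=("time", "monotonic", "perf_counter")):
    """Return resolution and properties for each named Python clock."""
    report = {}
    for name in names:
        info = time.get_clock_info(name)
        report[name] = {
            "resolution": info.resolution,  # smallest representable step, seconds
            "adjustable": info.adjustable,  # can be changed by NTP or an admin
            "monotonic": info.monotonic,    # guaranteed never to go backward
        }
    return report


if __name__ == "__main__":
    for name, props in clock_report().items():
        print(name, props)
```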
When You Need Better
High-frequency trading systems need nanosecond precision. Telecom systems need microsecond synchronization. Most applications do not, but understanding the options matters:
- PTP (Precision Time Protocol): Hardware-level synchronization, achievable with specialized network equipment
- GPS: Extremely accurate time source, requires hardware and clear sky view for GPS
- Atomic clocks: Only for the most demanding applications (telecom, scientific)
IEEE 1588 PTP Deep Dive
IEEE 1588, also known as Precision Time Protocol (PTP), achieves nanosecond-level synchronization by using hardware timestamps and a master-slave architecture. It is the gold standard for time synchronization in financial trading, telecom, and industrial control systems.
How PTP Works
PTP synchronizes clocks in a network by exchanging precision timestamps. Unlike NTP, which typically relies on software timestamps, PTP uses hardware timestamps when available and synchronizes at the network interface card level.
The synchronization process works as follows:
- Best Master Clock Algorithm (BMCA): All clocks run BMCA to elect the grandmaster clock, the most accurate time source. The grandmaster is typically a GPS-disciplined oscillator or atomic clock.
- Sync messages: The master sends Sync messages with the send timestamp (hardware-generated). The slave records the receive timestamp (hardware-generated).
- Follow-up messages: Because the send timestamp for Sync may not be known precisely at transmission time, the master sends a Follow-Up message with the corrected timestamp.
- Delay Request and Response: The slave sends a Delay Request to the master, which responds with a Delay Response. This measures the path delay.
Master                                          Slave
  |                                               |
  |------------- Sync --------------------------->|
  |  (t1 = master send timestamp, hardware)       |
  |  (t2 = slave receive timestamp, hardware)     |
  |                                               |
  |------------- Follow-Up (carries t1) --------->|
  |                                               |
  |<------------ Delay Request -------------------|
  |  (t3 = slave send timestamp)                  |
  |  (t4 = master receive timestamp)              |
  |                                               |
  |------------- Delay Response (carries t4) ---->|
  |                                               |
Calculated values (assuming a symmetric path):
- Offset from master: ((t2 - t1) - (t4 - t3)) / 2
- Path delay: ((t2 - t1) + (t4 - t3)) / 2
The critical insight is that PTP uses hardware timestamps at both ends, eliminating software delays that plague NTP.
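The offset and delay arithmetic can be checked with a short Python sketch. It uses the conventional IEEE 1588 labelling, where t1 is the master's Sync send time, t2 the slave's Sync receive time, t3 the slave's Delay Request send time, and t4 the master's Delay Request receive time. The function name and the nanosecond values are illustrative assumptions.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset_from_master, mean_path_delay) for one PTP exchange.

    t1, t4 are read from the master clock; t2, t3 from the slave clock.
    Assumes the path delay is the same in both directions.
    """
    master_to_slave = t2 - t1  # path delay + offset
    slave_to_master = t4 - t3  # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


if __name__ == "__main__":
    # Nanoseconds. The slave clock is 500 ns ahead; one-way delay is 200 ns.
    t1 = 1_000              # master sends Sync
    t2 = t1 + 200 + 500     # slave receives it, read on the slave clock
    t3 = 2_000              # slave sends Delay Request
    t4 = t3 - 500 + 200     # master receives it, read on the master clock
    print(ptp_offset_and_delay(t1, t2, t3, t4))  # (500.0, 200.0)
```

The slave would then subtract the computed offset from its clock. As with NTP, any asymmetry between the two directions shows up directly as offset error.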
PTP vs NTP: The Key Differences
| Feature | PTP (IEEE 1588) | NTP |
|---|---|---|
| Timestamp level | Hardware (NIC-level) | Software (OS kernel) |
| Best accuracy | Sub-microsecond (100ns achievable) | Milliseconds (10-100ms typical) |
| Network requirements | Dedicated or VLAN with QoS | Any IP network |
| Architecture | Master-slave with BMCA | Hierarchical with NTP pools |
| Hardware dependency | Requires PTP-capable NIC/switch | Works on any hardware |
| Cost | Expensive (switches, NICs) | Free (software) |
| Convergence time | Seconds to minutes | Minutes |
| Typical use cases | HFT, telecom, power grids | General servers, applications |
Hardware Requirements for PTP
PTP requires specialized network infrastructure:
PTP-Capable NICs: Network interface cards with hardware support for IEEE 1588. These NICs capture timestamps at the hardware level, bypassing OS latency.
PTP-Aware Switches: Standard switches introduce variable latency as packets queue. PTP-aware switches use boundary clocks or transparent clocks to compensate for switch delay. They measure and correct for the time packets spend traversing the switch.
GPS Grandmaster Clocks: For the most accurate time source, GPS-disciplined oscillators provide nanosecond accuracy to UTC. They serve as the authoritative time source for PTP domains.
# Check if your NIC supports hardware timestamping
ethtool -T eth0
# Output on a PTP-capable NIC includes lines such as:
# Capabilities:
#   hardware-transmit
#   hardware-receive
#   hardware-raw-clock
# PTP Hardware Clock: 0
# Run ptp4l in the foreground on eth0
ptp4l -i eth0 -l 6 -m
# -l 6 = log level info (7 = debug)
# -m = print messages to stdout
PTP Profiles
IEEE 1588 allows customization through profiles. Three are particularly important:
Default PTP Profile: General-purpose profile for enterprise networks. Allows up to 10ms path delay.
Power Profile (IEEE C37.238): Used in electrical substations for protection and control. Requires sub-microsecond accuracy for synchronizing phasor measurement units (PMUs).
Telecom Profile (G.8265.1): For telecom applications requiring frequency synchronization (not phase). Uses only Announce and Sync messages.
# Example: ptp4l configuration for default profile
# /etc/ptp4l.conf
[global]
domainNumber 0
priority1 128
priority2 128
time_stamping hardware
# E2E (End to End) delay mechanism
delay_mechanism E2E
[eth0]
Deployment Considerations
PTP requires careful network design:
- Network latency asymmetry: PTP assumes symmetric path delay. If forward and reverse paths have different latency (common in wireless or routed networks), synchronization degrades. Use symmetric network paths or boundary clocks.
- VLAN and QoS configuration: PTP messages must get priority queuing. Configure switches to give PTP traffic highest priority:
# Cisco switch example for PTP VLAN
vlan 100
name ptp-vlan
!
interface GigabitEthernet1/0/1
switchport mode trunk
switchport trunk allowed vlan 100
priority-mode dscp
!
# QoS configuration for PTP
mls qos map cos-dscp 46 34 26 18 0 0 0 0
# 46 = EF (Expedited Forwarding) for PTP
- Boundary Clock vs Transparent Clock: Boundary clocks terminate PTP at each switch and act as masters for downstream devices. Transparent clocks pass PTP through while correcting for switch delay. Boundary clocks are easier to deploy; transparent clocks provide better accuracy.
- Multi-domain PTP: Different PTP domains can run independently. Useful when you need separate time references for different subsystems.
When PTP Is Worth the Cost
PTP adds significant complexity and expense. Only deploy it when your requirements demand it:
| Requirement | NTP Sufficient | Consider PTP |
|---|---|---|
| Required accuracy | 1 ms or coarser | Finer than 1 ms |
| Financial trading (HFT) | No | Yes (nanoseconds matter) |
| Telecom (4G/5G base stations) | No | Yes (phase synchronization) |
| Power grid synchronization | No | Yes (PMU timing) |
| Industrial automation | Sometimes | Yes (motion control) |
| Video/audio sync (broadcast) | Sometimes | Yes (lip sync) |
| General server timekeeping | Yes | No |
| Distributed databases | Usually | Rarely |
Most distributed databases do not need PTP. CockroachDB uses hybrid logical clocks, and Spanner uses TrueTime (GPS receivers plus atomic clocks) for distributed transaction ordering. For everything else, NTP with proper monitoring is sufficient.
Production Considerations
Monitoring Clock Synchronization
# Check for clock synchronization issues
# Watch for large offset values
chronyc sources -v
# The currently selected source is marked ^*; other usable sources are marked ^+
chronyc sourcestats
# A high "Std Dev" value indicates unstable sync
Alerts to Set
| Alert | Threshold | Severity |
|---|---|---|
| Clock offset exceeds 100ms | 100 ms | Warning |
| Clock offset exceeds 500ms | 500 ms | Critical |
| NTP sync lost | Any | Warning |
| Leap second event | Any | Warning |
Common Issues
- NTP daemon not running: Clock drifts freely until next manual sync
- Firewall blocking NTP: Cannot sync, clock drifts
- Virtual machines: Clocks run slower or faster than physical time (VMs have CPU time slicing issues)
- Cloud instances: Shared resources cause clock instability
For virtualized and cloud environments, use hypervisor-level synchronization when possible. Most cloud providers offer time sync services that are more reliable than public NTP.
Cloud Provider Time Sync Services
Major cloud providers provide specialized time synchronization services optimized for their infrastructure:
AWS Time Sync Service
Amazon Web Services provides a time sync service accessible via NTP at 169.254.169.123:
# AWS EC2 instance time sync
# The AWS Time Sync Service is available at this link-local address
# No NTP package installation needed on Amazon Linux 2023
cat /etc/chrony/chrony.conf
# server 169.254.169.123 prefer iburst
# For Ubuntu/Debian with systemd-timesyncd
# Edit /etc/systemd/timesyncd.conf
[Time]
NTP=169.254.169.123
FallbackNTP=169.254.169.123
# Verify sync status
timedatectl
The Amazon Time Sync Service is backed by satellite-connected and atomic reference clocks in each region, and it is operated and monitored by AWS.
Google Cloud NTP
Google Cloud Platform provides NTP through their metadata server and a dedicated time service:
# GCP NTP configuration
# For Debian/Ubuntu with systemd-timesyncd
cat /etc/systemd/timesyncd.conf
[Time]
NTP=metadata.google.internal
FallbackNTP=time.google.com
# For CentOS/RHEL with chrony
echo "server metadata.google.internal iburst" >> /etc/chrony.conf
systemctl restart chronyd
# Verify
chronyc tracking
GCP also offers the Google Public NTP service at time.google.com for non-GCP infrastructure.
Microsoft Azure Time Sync
Azure provides time sync through the host that runs each VM, and an external NTP service can be configured as well:
# Azure time sync status
# Azure VMs automatically sync to the host server time
# For explicit NTP configuration:
# Ubuntu/Debian
cat /etc/systemd/timesyncd.conf
[Time]
NTP=time.windows.com
# CentOS/RHEL with chrony
echo "server time.windows.com iburst" >> /etc/chrony.conf
# Verify Azure VM time sync
systemctl status systemd-timesyncd
Cloud Provider Comparison
| Provider | NTP Endpoint | Source | Accuracy | Special Notes |
|---|---|---|---|---|
| AWS | 169.254.169.123 | GPS + Atomic | ~1-5 ms | Link-local, no network hops |
| GCP | metadata.google.internal | Google atomic clocks | ~1-5 ms | Via metadata server |
| Azure | time.windows.com | Microsoft time servers | ~5-20 ms | May require firewall rules |
| Oracle Cloud | 169.254.169.254 | Oracle stratum 1 | ~5-10 ms | Built into Oracle Cloud infrastructure |
Virtualization Clock Issues
Virtual machines face unique clock challenges that physical machines do not:
CPU Time Slicing: VMs share physical CPU cores. When a VM is scheduled off the CPU, its clock stops advancing. When rescheduled, it may appear to jump forward.
Live Migration: When VMs migrate between hosts, they may pause briefly. This causes clock discontinuities that NTP must compensate for.
Resource Contention: Under heavy load, VMs may not receive full CPU time, causing clock drift even with NTP running.
Hypervisor Solutions:
# VMware: Enable time synchronization with host
# On VMware, ensure these settings are configured:
# VM Tools > Options > Time synchronization > Synchronize time with host
# Verify VMware tools status
vmware-toolbox-cmd timesync status
# Hyper-V: Enable time synchronization
# On Hyper-V host (PowerShell):
# Enable-VMIntegrationService -VMName "YourVM" -Name "Time Synchronization"
# Linux guest with Hyper-V integration services:
# The hv_utils driver provides time sync
lsmod | grep hv_utils
Best Practices for VMs:
- Use the cloud provider’s time sync service (169.254.169.123 for AWS)
- Enable hypervisor-level time sync where available
- Avoid using bare metal time sources inside VMs
- Monitor clock drift and set alerts for large offsets
- Consider using a monotonic clock for short-interval timing
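One way to watch for the discontinuities described above is to compare how far the wall clock and the monotonic clock advance between two checks. The sketch below is a minimal illustration, not a replacement for NTP monitoring: the class name `ClockStepDetector` and the 0.5 s threshold are assumptions, and it detects steps of the wall clock rather than gradual slewing.

```python
import time


class ClockStepDetector:
    """Detect wall-clock steps by comparing wall and monotonic progress."""

    def __init__(self, threshold_seconds=0.5, wall=time.time, mono=time.monotonic):
        self.threshold = threshold_seconds
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self) -> float:
        """Return how much further the wall clock moved than the monotonic clock."""
        now_wall = self._wall()
        now_mono = self._mono()
        step = (now_wall - self._last_wall) - (now_mono - self._last_mono)
        self._last_wall, self._last_mono = now_wall, now_mono
        return step

    def is_step(self, step: float) -> bool:
        """True if the divergence is larger than the configured threshold."""
        return abs(step) > self.threshold


if __name__ == "__main__":
    detector = ClockStepDetector()
    time.sleep(0.1)
    step = detector.check()
    print(f"wall-vs-monotonic divergence: {step:+.6f}s, stepped={detector.is_step(step)}")
```

The clock functions are injectable so the detector can be exercised with fake clocks in tests instead of waiting for a real jump.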
Production Monitoring Implementation
Monitoring clock synchronization is critical for systems that depend on event ordering:
Prometheus Metrics for Clock Sync
# prometheus.yml - scrape configuration for node exporter
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
# Key node_timex metrics to monitor:
# node_timex_offset_seconds - current clock offset from NTP
# node_timex_maxerror_seconds - maximum estimated error
# node_timex_loop_time_constant - phase-locked loop time constant
# node_timex_sync_status - whether sync is active (1 = synced)
Alerting Rules
# alerts/clock-sync.yml
groups:
  - name: clock_alerts
    interval: 30s
    rules:
      - alert: ClockOffsetHigh
        expr: abs(node_timex_offset_seconds) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Clock offset exceeds 100ms"
          description: "Clock on {{ $labels.instance }} is {{ $value }} seconds offset"
      - alert: ClockOffsetCritical
        expr: abs(node_timex_offset_seconds) > 0.5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Clock offset exceeds 500ms"
          description: "Clock on {{ $labels.instance }} is critically offset"
      - alert: ClockSyncLost
        expr: node_timex_sync_status == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "NTP synchronization lost"
          description: "Clock on {{ $labels.instance }} is not synchronized"
Clock Selection Decision Matrix
Use this matrix to select the appropriate time source for your use case:
| Use Case | Time Source | Accuracy Needed | Recommendation |
|---|---|---|---|
| User-facing timestamps | Wall clock | 1 second | System clock, human-readable |
| Application logging | Wall clock | 1 millisecond | System clock with NTP |
| Session management | Monotonic | 1 millisecond | CLOCK_MONOTONIC |
| Distributed event ordering | Logical clocks | Causality | Lamport timestamps |
| Conflict resolution | Logical/Vector | Causality | Vector clocks |
| Database replication | Logical clocks | Causality | Hybrid Logical Clocks |
| Financial transactions | PTP/GPS | Nanoseconds | IEEE 1588 or GPS |
| Telecom synchronization | PTP/GPS | Microseconds | IEEE 1588 with hardware |
| Scientific experiments | Atomic | Nanoseconds | Dedicated time server |
| CDN edge nodes | NTP | 1 millisecond | Local stratum 1 |
When to Use / When Not to Use Physical Clocks
| Scenario | Recommendation |
|---|---|
| User-facing timestamps | Use wall-clock with timezone handling |
| Application logging | Use wall-clock, useful for human review |
| Debugging and tracing | Use wall-clock, correlates with real events |
| Event ordering across machines | Do not use wall-clock |
| Conflict resolution | Do not use wall-clock |
| Distributed consensus | Do not use wall-clock alone |
| High-frequency trading | Use PTP or GPS, not NTP |
When TO Use Physical Clocks
- Displaying times to users
- Logging events for human review
- Auditing and compliance (with caution)
- Non-critical scheduling where small errors are tolerable
When NOT to Use Physical Clocks
- Determining causal ordering of distributed events
- Conflict resolution in distributed databases
- Implementing distributed protocols (consensus, coordination)
- Any situation where “which happened first” matters
Production Failure Scenarios
Understanding how clock failures manifest in production helps with debugging and prevention.
The 2012 and 2016 Leap Second Incidents
On June 30, 2012, many Linux servers worldwide experienced runaway CPU usage when a leap second was inserted. The issue stemmed from a bug in the Linux kernel's high-resolution timer (hrtimer) code: the timer subsystem was not told that the clock had been set, so timers expired immediately and applications that relied on them spun in tight loops. Reddit, LinkedIn, and Mozilla were among the sites affected.
Cloudflare's DNS service hit a different failure during the leap second at the end of 2016, experiencing intermittent failures because its DNS software assumed time could not go backward. The fix involved handling the case where the clock does not move forward.
AWS EC2 Clock Drift Incident
In 2014, Amazon EC2 instances experienced significant clock drift when the hypervisor’s clock source became unreliable. Instances running applications sensitive to timing showed:
- SSL/TLS handshake failures due to certificate time validation
- Kerberos authentication errors as tickets were considered expired
- Cassandra batch statements timing out due to incorrect timestamp headers
The resolution involved AWS improving their hypervisor-level time synchronization.
High-Frequency Trading Clock Skew
A quantitative trading firm discovered their arbitrage system was losing money due to 2ms clock skew between their primary and backup servers. Despite both running NTP synchronization, network latency asymmetry caused one server’s clock to consistently run 2ms fast.
The fix required moving to GPS-disciplined oscillators with hardware timestamps, eliminating network-based synchronization entirely.
Distributed Database Timestamp Conflict
A social media company experienced data loss in their distributed database during a network partition. The “last write wins” conflict resolution was using wall-clock timestamps, and clock skew between data centers meant updates from one datacenter consistently overwrote the other.
The incident led them to adopt hybrid logical clocks, which provide causality guarantees without requiring tightly synchronized physical clocks.
Kubernetes Insecure Token Validation
In early Kubernetes versions, service account tokens were validated using the node’s system clock. If the node’s clock was significantly skewed, tokens that were legitimately expired could be accepted as valid, or fresh tokens rejected. This was especially problematic in development environments with unreliable NTP sync.
Trade-off Analysis
| Factor | Physical Clocks + NTP | PTP (IEEE 1588) | Atomic/GPS Clocks |
|---|---|---|---|
| Accuracy | 10-100 ms | 100 ns - 1 μs | < 50 ns |
| Cost | Free (software only) | $1,000-10,000+ per location | $10,000-100,000+ |
| Hardware required | None | PTP-capable NICs + switches | Atomic clock or GPS receiver |
| Network requirements | Any IP network | Dedicated VLAN or QoS | Antenna/signal path |
| Convergence time | Minutes | Seconds to minutes | Instant (self-contained) |
| Maintenance | Low | High | Very high |
| Typical use cases | Web servers, logging, general | HFT, telecom, industrial | Telecom backbone, research |
When NTP Is Sufficient
Choose NTP when:
- Millisecond accuracy meets your requirements
- You need broad compatibility and simplicity
- Your infrastructure is in cloud environments with built-in time sync
- You cannot afford specialized hardware or network configuration
When to Consider PTP
Choose PTP when:
- Sub-millisecond accuracy is required
- You operate in financial trading, telecom, or industrial control
- You have dedicated network infrastructure for time traffic
- Regulatory requirements mandate specific accuracy levels
When You Need Hardware Clocks
Choose dedicated hardware clocks when:
- Nanosecond precision is required
- Network path asymmetry cannot be controlled
- Regulatory compliance requires traceable time to UTC
- You operate in environments without reliable network connectivity
Quick Recap Checklist
Before relying on physical clocks in production, make sure you understand:
- Physical clocks drift at 1-50 ppm due to crystal oscillator imperfections
- NTP synchronization reduces but cannot eliminate clock skew
- Clock skew makes wall-clock timestamps unreliable for event ordering across machines
- System clocks can jump forward or backward (non-monotonic)
- Leap seconds cause repeats or pauses that many systems handle poorly
- Virtual machines have additional clock challenges (time slicing, live migration)
- Use physical clocks for human-readable logs, not for distributed event ordering
- For event ordering, use Lamport timestamps or vector clocks instead
- PTP (IEEE 1588) provides sub-microsecond accuracy but requires specialized hardware
- Cloud providers offer time sync services optimized for their infrastructure
Interview Questions
Expected answer points:
- Drift is the rate at which a single clock runs faster or slower than ideal time (e.g., ppm drift rate)
- Skew is the difference in time reading between two clocks at a specific moment
- Even clocks with identical drift rates can have skew because they started from different times
- Drift accumulates over time; skew is the observable difference at any point
Expected answer points:
- NTP uses 4 timestamps: T1 (client send), T2 (server receive), T3 (server send), T4 (client receive)
- Round-trip delay = (T4 - T1) - (T3 - T2)
- Clock offset = ((T2 - T1) + (T3 - T4)) / 2
- NTP takes multiple samples and uses filtering algorithms to reject bad measurements
Expected answer points:
- Crystal oscillator drift: typical accuracy 1-50 ppm, causing ~0.086-4.3 seconds/day drift
- Temperature variations: crystal frequency shifts with temperature changes
- Hardware aging: oscillators degrade over time, changing their drift rate
- NTP network latency variability: asymmetric paths add uncertainty
Expected answer points:
- Clock skew means different machines may have different views of "now"
- A timestamp from one machine may appear newer on another due to clock differences, not actual causality
- Last-write-wins conflict resolution fails when clocks are skewed
- The PACELC theorem explains this: in case of network partition, you choose between consistency and availability/performance
Expected answer points:
- Precision is how finely you can measure time (granularity of the measurement)
- Accuracy is how close that measurement is to the true time
- You can be precise but inaccurate (e.g., a clock that increments in nanoseconds but drifts 50ms)
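- NTP timestamps have sub-millisecond resolution, but accuracy over the public internet is typically within tens of milliseconds because of network uncertainty (often under 1 ms on a quiet LAN)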
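Precision is the part you can inspect locally. A small sketch using Python's standard `time.get_clock_info`, which reports a clock's resolution but says nothing about how close that clock is to true time:

```python
import time


def describe_clock(name: str) -> dict:
    """Report the advertised resolution (precision) of a Python clock.

    Accuracy cannot be read from the local machine; it has to be measured
    against an external reference such as an NTP server.
    """
    info = time.get_clock_info(name)
    return {
        "clock": name,
        "resolution_s": info.resolution,
        "adjustable": info.adjustable,  # True if the clock can be changed, e.g. by NTP
        "monotonic": info.monotonic,
    }


wall = describe_clock("time")
mono = describe_clock("monotonic")
```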
Expected answer points:
- PTP uses hardware timestamps at the NIC level, not software timestamps in the OS kernel
- Hardware timestamps eliminate software delays (interrupt handling, context switching, scheduling)
- PTP uses a master-slave architecture with Best Master Clock Algorithm (BMCA)
- PTP achieves sub-microsecond accuracy (100ns possible) vs. NTP's milliseconds
Expected answer points:
- Leap seconds are occasionally added to UTC to keep it synchronized with Earth's rotation
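- UTC labels the inserted second 23:59:60; Unix time has no such label, so system clocks typically repeat a second or pause briefly
- Many systems handle this poorly, leading to kernel panics, infinite loops, or data corruption
- The 2012 leap second caused outages at Reddit and other sites, and the leap second at the end of 2016 caused a DNS outage at Cloudflare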
Expected answer points:
- Stratum 0: Atomic clocks and GPS receivers (the ultimate time sources)
- Stratum 1: Time servers that sync directly to Stratum 0
- Stratum 2-3: Servers that sync to Stratum 1, adding more uncertainty
- Each hop introduces additional delay variance and reduces accuracy
- Your servers typically sync to Stratum 2 or Stratum 3 servers
Expected answer points:
- CPU time slicing: VMs stop counting when scheduled off the CPU, causing jumps when rescheduled
- Live migration: VMs moving between hosts pause briefly, causing clock discontinuities
- Resource contention: under load, VMs may not receive full CPU time, affecting clock rate
- Solutions: use hypervisor-level sync, cloud provider time services, or monotonic clocks for short intervals
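For the last point, measuring a short interval with a monotonic clock might look like the sketch below. The `timed` helper is hypothetical; `time.monotonic()` is the standard-library clock that cannot go backwards when the wall clock is stepped.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds).

    Uses time.monotonic(), so a wall-clock step during the call
    cannot produce a negative or wildly wrong duration.
    """
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = timed(sum, [1, 2, 3])
```

The monotonic clock is only meaningful for differences on one machine; its absolute value cannot be compared across hosts.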
Expected answer points:
- TSC (Time Stamp Counter): fastest, built into CPU, but varies with CPU frequency scaling on older systems
- HPET (High Precision Event Timer): motherboard-level, good stability, medium precision
- ACPI PM (ACPI Power Management Timer): lowest precision, but always available and stable
- Modern constant TSC frequency CPUs make TSC reliable for most uses
Expected answer points:
- When you need sub-millisecond accuracy (NTP over the public internet typically delivers accuracy in the tens of milliseconds)
- High-frequency trading systems where nanoseconds matter
- Telecom applications requiring microsecond phase synchronization for 4G/5G base stations
- Power grid synchronization for PMU (Phasor Measurement Units)
- Industrial automation and motion control systems
Expected answer points:
- Both can step or slew the clock; chrony gives finer control over when stepping is allowed (`makestep`) and can slew much faster, so it handles larger offsets better
- chrony is faster to converge and handles intermittent network connectivity better
- chrony works better with virtual machines and in cloud environments
- chrony has a simpler algorithm with less overhead
Expected answer points:
- Authentication failures: Kerberos rejects requests when client and server clocks differ by more than the allowed skew (5 minutes by default); certificates may appear expired or not yet valid
- Cache coherency issues: distributed caches may serve stale data when timestamps disagree
- Audit log misalignment: logs from different machines cannot be accurately ordered
- Scheduled job conflicts: cron jobs firing at wrong times across cluster nodes
- Payment processing failures: retry mechanisms may trigger incorrectly based on timestamps
Expected answer points:
- NTP selects from multiple servers and uses intersection algorithm to find correct time
- It identifies false tickers (servers with wrong time) by comparing multiple sources
- Outlier detection rejects measurements that differ significantly from the cluster
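- Authentication verifies server identity: NTPv4 supports symmetric keys and Autokey, but Autokey has known weaknesses and Network Time Security (NTS, RFC 8915) is the current recommendation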
- Best practice: use at least 4 diverse NTP servers to enable fault detection
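The intersection idea can be sketched with the classic Marzullo algorithm, from which NTP's selection step is derived. This is not the exact RFC 5905 procedure: each source is reduced to a hypothetical interval (offset minus error, offset plus error), and the function finds the range that the largest number of sources agree on.

```python
def marzullo(intervals: list) -> tuple:
    """Return (count, low, high): the range contained in the most intervals."""
    edges = []
    for low, high in intervals:
        edges.append((low, -1))   # interval opens
        edges.append((high, +1))  # interval closes
    edges.sort()  # at equal values, opens (-1) sort before closes (+1)

    best = count = 0
    best_low = best_high = None
    for i, (value, kind) in enumerate(edges):
        count -= kind  # opening raises the overlap count, closing lowers it
        if count > best:
            best = count
            best_low, best_high = value, edges[i + 1][0]
    return best, best_low, best_high


def false_tickers(intervals: list) -> list:
    """Sources whose interval does not overlap the agreed range."""
    _, low, high = marzullo(intervals)
    return [iv for iv in intervals if iv[1] < low or iv[0] > high]


# Hypothetical offset intervals in seconds from four servers; the last one is wrong.
sources = [(0.008, 0.012), (0.009, 0.013), (0.010, 0.014), (0.250, 0.260)]
agreement = marzullo(sources)
rejected = false_tickers(sources)
```

With only two sources that disagree there is no majority to side with, which is why several diverse servers are recommended.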
Expected answer points:
- Clock slewing adjusts the clock frequency slightly to converge gradually (over minutes/hours)
- Clock stepping jumps the clock immediately when offset exceeds threshold
- Slewing is preferred for running applications (no time jumps)
- Stepping is needed when offset is too large for slewing to correct quickly
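- chrony uses both (`makestep` controls when stepping is allowed); ntpd slews small offsets and by default steps when the offset exceeds 128 ms, a threshold that `tinker step` changes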
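A rough way to reason about the trade-off is to compute how long slewing would take. The sketch below assumes ntpd's documented defaults (128 ms step threshold, 500 ppm maximum slew rate); the function names are hypothetical and chrony's limits differ.

```python
STEP_THRESHOLD_S = 0.128  # assumed: ntpd's default step threshold
MAX_SLEW_PPM = 500        # assumed: ntpd's maximum slew rate


def slew_duration(offset_s: float, slew_ppm: float = MAX_SLEW_PPM) -> float:
    """Seconds needed to remove `offset_s` by running the clock `slew_ppm` fast or slow."""
    return abs(offset_s) / (slew_ppm * 1e-6)


def plan_correction(offset_s: float, step_threshold_s: float = STEP_THRESHOLD_S) -> str:
    """Choose 'step' for offsets beyond the threshold, otherwise 'slew'."""
    return "step" if abs(offset_s) > step_threshold_s else "slew"


small = slew_duration(0.128)  # about 256 seconds
large = slew_duration(1.0)    # about 2000 seconds, over half an hour
```

At 500 ppm, each second of offset costs about 33 minutes of slewing, which is why large offsets are stepped instead.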
Expected answer points:
- Network asymmetry: forward and reverse paths have different latency
- Server selection: different servers within the pool may have different offsets
- Load variation: CPU contention affects timestamp capture timing
- Hardware differences: different clock sources (TSC vs HPET) have different precision
- Virtualization effects: VMs may be subject to hypervisor clock issues
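The effect of network asymmetry can be quantified with the standard offset formula. In this sketch the helper and the delay values are hypothetical; it simulates one exchange and shows that the measured offset is wrong by half the difference between the forward and reverse delays.

```python
def measured_offset(true_offset: float, forward_delay: float,
                    reverse_delay: float, processing: float = 0.0) -> float:
    """Offset an NTP-style exchange would report for the given path delays.

    true_offset is server time minus client time, in seconds.
    """
    t1 = 0.0                                                # client send (client clock)
    t2 = t1 + forward_delay + true_offset                   # server receive (server clock)
    t3 = t2 + processing                                    # server send (server clock)
    t4 = t1 + forward_delay + processing + reverse_delay    # client receive (client clock)
    return ((t2 - t1) + (t3 - t4)) / 2


symmetric_error = measured_offset(0.0, 0.020, 0.020)   # 0: symmetric paths cancel out
asymmetric_error = measured_offset(0.0, 0.030, 0.010)  # 0.010: half of the 20 ms asymmetry
```

Because the protocol cannot observe one-way delays separately, this error is invisible to the client; only the round-trip delay bounds it.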
Expected answer points:
- PACELC states: during a partition (P) you must choose between availability (A) and consistency (C); else (E), in normal operation, between latency (L) and consistency (C)
- Timestamp-based conflict resolution assumes causal ordering via wall-clock timestamps
- When clocks are skewed, this assumption breaks, violating consistency guarantees
- Distributed databases that use "last write wins" with timestamps may lose data during partitions
- Better approaches: vector clocks, quorum-based replication, or hybrid logical clocks
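As one example of the alternatives listed, a hybrid logical clock pairs the largest physical time seen so far with a counter, so timestamps respect causality even when a receiver's clock is behind the sender's. This is a minimal sketch of the commonly described update rules; the class name and the injected `physical_clock` callable are assumptions made for testability.

```python
class HybridLogicalClock:
    """Timestamps are (l, c): l tracks the largest physical time seen, c breaks ties."""

    def __init__(self, physical_clock):
        self._now = physical_clock  # callable returning integer time, e.g. milliseconds
        self.l = 0
        self.c = 0

    def tick(self) -> tuple:
        """Timestamp a local or send event."""
        pt = self._now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, msg_l: int, msg_c: int) -> tuple:
        """Timestamp a receive event, merging the sender's timestamp."""
        pt = self._now()
        new_l = max(self.l, msg_l, pt)
        if new_l == self.l and new_l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)


# Hypothetical nodes: the sender's clock is 100 ms ahead of the receiver's.
fast = HybridLogicalClock(lambda: 1100)
slow = HybridLogicalClock(lambda: 1000)

sent = fast.tick()               # (1100, 0)
received = slow.update(*sent)    # (1100, 1): ordered after the send despite the slow clock
followup = slow.tick()           # (1100, 2)
```

Comparing the tuples orders the receive after the send, which a raw wall-clock timestamp from the slow node (1000) would not.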
Expected answer points:
- Crystal oscillators have temperature-dependent frequency (the frequency error is typically smallest near room temperature)
- Illustrative drift figures: +2 ppm at 25°C, +5 ppm at 30°C, +10 ppm at 35°C relative to baseline; actual curves depend on the crystal cut and the individual part
- Outdoor or edge devices experience large temperature swings, causing significant drift
- Data centers with stable HVAC maintain ~20-25°C, minimizing temperature-induced drift
- OCXO (Oven-Controlled XO) oscillators use heating to maintain constant temperature
Expected answer points:
- NTP amplification attacks: attackers spoof requests to NTP servers, reflected traffic overwhelms victim
- Clock manipulation: if attacker controls NTP server, they can manipulate your clock for attacks
- Certificate validation failures: manipulated clocks can cause TLS certificates to appear invalid or expired
- Replay attacks: old valid timestamps may be replayed if clock is manipulated backward
- Mitigation: use authenticated time sync (NTS where available, since Autokey has known weaknesses), restrict NTP to trusted servers, monitor offset changes
Expected answer points:
- Clock offset: `node_timex_offset_seconds` — deviation from NTP source
- Maximum error estimate: `node_timex_maxerror_seconds` — accumulated uncertainty
- Sync status: `node_timex_sync_status` — 1 means synced, 0 means lost
- Loop time constant: `node_timex_loop_time_constant` — PLL configuration
- Stratum level: indicates NTP server hierarchy distance from primary source
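A check over these metrics could look like the sketch below. It parses unlabeled `node_timex_*` lines from Prometheus text exposition output; the sample text and both thresholds are made up for illustration and should be tuned to your own tolerance.

```python
def parse_timex(text: str) -> dict:
    """Extract unlabeled node_timex_* samples from Prometheus text format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name.startswith("node_timex_"):
            metrics[name] = float(value)
    return metrics


def clock_alerts(metrics: dict, max_offset_s: float = 0.05,
                 max_error_s: float = 0.5) -> list:
    """Return human-readable alerts; thresholds are hypothetical defaults."""
    alerts = []
    if metrics.get("node_timex_sync_status") != 1:
        alerts.append("clock not synchronized")
    if abs(metrics.get("node_timex_offset_seconds", 0.0)) > max_offset_s:
        alerts.append("offset above threshold")
    if metrics.get("node_timex_maxerror_seconds", 0.0) > max_error_s:
        alerts.append("max error above threshold")
    return alerts


# Made-up sample scrape output
sample = """
# TYPE node_timex_offset_seconds gauge
node_timex_offset_seconds 0.12
node_timex_maxerror_seconds 0.3
node_timex_sync_status 1
"""
alerts = clock_alerts(parse_timex(sample))
```

In production the same conditions would more commonly live in Prometheus alerting rules; the Python form is only meant to show the logic.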
Further Reading
- NTP Documentation
- chrony - Modern NTP implementation
- IEEE 1588 - Precision Time Protocol
- PTP (Precision Time Protocol) Overview
- AWS Time Sync Service
- Google Cloud NTP
- Linux Kernel Clocksource Documentation
Conclusion
Physical clocks are imperfect but practical. NTP synchronization keeps them reasonable for most uses. Wall-clock time works fine for logging, user-facing timestamps, and non-critical scheduling.
The problems arise when you need to order events across machines. Clock skew makes “what happened first” non-trivial to answer with physical clocks alone. This is why distributed systems turn to logical clocks and vector clocks, which the next posts in this series cover.
Key takeaways:
- Hardware clocks drift, and drift rates vary between machines
- NTP synchronization reduces skew but cannot eliminate it
- Clock skew makes wall-clock timestamps unreliable for event ordering
- Use physical clocks for human-readable timestamps and logging
- Use logical clocks for distributed event ordering
The Logical Clocks post covers Lamport timestamps, which provide a way to order events without synchronized physical clocks.