Health Checks: Liveness, Readiness, and Service Availability
Master health check implementation for microservices including liveness probes, readiness probes, and graceful degradation patterns.
In distributed systems, your services do not exist in isolation. They call each other, depend on databases and caches, and serve traffic through load balancers. When a service starts failing, the rest of the system needs to know quickly. Health checks provide that visibility.
A properly implemented health check system tells Kubernetes when to route traffic to your pod, tells your load balancer which instances are ready, and gives your monitoring system early warning before problems cascade. Without health checks, you get cascading failures, traffic sent to dead instances, and problems that compound silently until they take down your entire application.
This article covers the three probe types Kubernetes provides, how to implement health endpoints in your services, how to handle deep health checks for dependencies, and the patterns that keep your system resilient when individual services fail.
The Three Probe Types
Kubernetes offers three probe types, each answering a different question about a container. The answers determine how Kubernetes manages the container's lifecycle and whether the pod receives traffic.
```mermaid
graph TD
    A[Pod Starting] --> B{Startup Probe}
    B -->|Not Ready| C[Initializing]
    C --> B
    B -->|Ready| D{Liveness Probe}
    D -->|Failing| E[Restarting]
    D -->|Healthy| F{Readiness Probe}
    F -->|Failing| G[Remove from Traffic]
    F -->|Passing| H[Receive Traffic]
    E --> B
    G --> F
```
Liveness Probe: Is the Process Alive?
The liveness probe answers a simple question: is the process running and responsive? If the liveness probe fails, Kubernetes restarts the container. This handles situations where the process is alive but stuck in a deadlock or unresponsive state.
A basic liveness probe configuration looks like this:
```yaml
livenessProbe:
  httpGet:
    path: /live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
```
The liveness probe waits 10 seconds after startup before the first check. Then it checks every 15 seconds. If the check takes more than 5 seconds, it counts as a failure. After 3 consecutive failures, Kubernetes restarts the container.
Keep liveness probes simple. A liveness probe that checks dependencies will restart your service whenever your database is temporarily unavailable, which makes outages worse, not better.
Readiness Probe: Can the Service Accept Traffic?
The readiness probe answers: can this instance handle requests right now? A service might be running but not ready if it is warming up, loading configuration, or recovering from a dependency outage.
When the readiness probe fails, Kubernetes marks the pod as not ready and flags its endpoint as not ready in the Service's EndpointSlices, so traffic stops being routed to that instance. The pod keeps running and the probe keeps checking. When the probe passes again, traffic resumes.
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 2
```
Use readiness probes for checks that verify dependencies. If your service needs a database connection and a cache to serve requests properly, the readiness probe should verify both. Keep the probe fast to avoid removing instances unnecessarily during brief slowdowns.
Startup Probe: The Initialization Grace Period
The startup probe handles applications that need significant time to initialize. If your service takes 30 seconds to start, a liveness probe that starts checking after 10 seconds can accumulate enough failures to restart the container before it is ready.
The startup probe delays all other probes until it succeeds:
```yaml
startupProbe:
  httpGet:
    path: /started
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 12
```
With 5-second intervals and 12 failures allowed, the startup probe gives your service up to 60 seconds to initialize. Once the startup probe passes, Kubernetes switches to the liveness and readiness probes.
Startup probes suit applications that load large models, warm up JIT compilers, or perform initial data loads at startup.
Implementing Health Endpoints
Your service needs to expose endpoints that Kubernetes can query. Plan for three endpoints: a liveness endpoint for basic aliveness, a readiness endpoint that checks dependencies, and optionally a startup endpoint for initialization.
Basic Health Endpoint
The liveness endpoint should be trivially simple. It checks nothing except whether the HTTP server can respond:
```python
from fastapi import FastAPI  # assuming FastAPI

app = FastAPI()

@app.get("/live")
def liveness():
    return {"status": "alive"}
```
This endpoint must not check dependencies. If your database is down, this endpoint still returns healthy, so Kubernetes leaves the container running instead of restarting it into the same outage. If the endpoint itself stops responding because the process is deadlocked, the liveness probe fails and Kubernetes restarts the container, which is the desired behavior.
Readiness Endpoint with Dependency Checks
The readiness endpoint verifies your service can handle traffic:
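```python
from fastapi import Request
from fastapi.responses import JSONResponse

class HealthCheckFailed(Exception):
    """Raised by a health check; rendered as HTTP 503 so the probe fails."""

@app.exception_handler(HealthCheckFailed)
async def health_check_failed_handler(request: Request, exc: HealthCheckFailed):
    return JSONResponse(status_code=503, content={"status": "unavailable"})

# db, cache, and dependent_services are application objects defined elsewhere
@app.get("/ready")
def readiness():
    # Check database connectivity
    try:
        db.execute("SELECT 1")
    except Exception as exc:
        raise HealthCheckFailed("Database unavailable") from exc
    # Check cache connectivity
    try:
        cache.ping()
    except Exception as exc:
        raise HealthCheckFailed("Cache unavailable") from exc
    # Check downstream services
    for service in dependent_services:
        if not service.is_healthy():
            raise HealthCheckFailed(f"{service.name} unavailable")
    return {"status": "ready"}
```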
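Keep readiness checks fast. With a 5-second timeout, every probe against a hung dependency takes the full 5 seconds to fail, and the instance keeps receiving traffic until failureThreshold consecutive probes have failed. Give each dependency check its own short timeout, keep the total well under the probe's timeoutSeconds, and fail fast.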
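One way to keep the total fast is to run the dependency checks concurrently, each with its own timeout. The endpoint's worst-case latency is then roughly one per-check budget rather than the sum of all checks. This is a minimal sketch that assumes async check functions. The `run_checks` helper and the check names used with it are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple


async def run_checks(
    checks: Dict[str, Callable[[], Awaitable[None]]],
    per_check_timeout_s: float = 2.0,
) -> Dict[str, str]:
    """Run dependency checks concurrently; each one gets its own timeout."""

    async def run_one(name: str, check: Callable[[], Awaitable[None]]) -> Tuple[str, str]:
        try:
            await asyncio.wait_for(check(), timeout=per_check_timeout_s)
            return name, "ok"
        except asyncio.TimeoutError:
            return name, "timeout"  # a hung dependency fails fast instead of stalling the probe
        except Exception as exc:
            return name, f"error: {exc}"

    # Total latency is bounded by the slowest single check, not the sum of all checks
    results = await asyncio.gather(*(run_one(n, c) for n, c in checks.items()))
    return dict(results)
```

A readiness handler built on this would return 503 whenever any value is not `"ok"`. With a 2-second per-check budget, the whole endpoint fits inside a 3-second probe timeoutSeconds.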
Startup Endpoint
The startup endpoint reports whether one-time initialization has finished. Kubernetes calls it only until the startup probe first succeeds:
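```python
import threading

# Set by the application's initialization code once warm-up is done
initialization_complete = threading.Event()

@app.get("/started")
def startup():
    if not initialization_complete.is_set():
        raise HealthCheckFailed("Still initializing")
    return {"status": "started"}
```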
Kubernetes stops calling the startup probe once it succeeds; it runs again only if the container restarts. After initialization, the endpoint can simply keep returning healthy.
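The `/started` endpoint only reads `initialization_complete`, so something else has to set it. This is a minimal sketch that assumes the one-time work runs in a background thread, which lets the HTTP server answer probes while initialization is in progress. `load_model`, `initialize`, and `start_background_init` are hypothetical stand-ins.

```python
import threading
import time

# Same role as initialization_complete in the /started endpoint
initialization_complete = threading.Event()


def load_model() -> None:
    """Hypothetical stand-in for slow one-time work (loading a model, warming caches)."""
    time.sleep(0.1)


def initialize() -> None:
    load_model()
    # Only reached if every step succeeded; /started reports success from here on
    initialization_complete.set()


def start_background_init() -> threading.Thread:
    """Run initialization off the main thread so the server can answer probes meanwhile."""
    worker = threading.Thread(target=initialize, name="app-init", daemon=True)
    worker.start()
    return worker
```

If `initialize()` raises, the event is never set and `/started` keeps failing. The kubelet then restarts the container once the startup probe's failureThreshold is used up.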
Deep Health Checks
Simple endpoints that just return “healthy” catch process crashes but miss dependency failures. Deep health checks verify your dependencies are actually working.
Database Connectivity
Do not just check if the database process is running. Check if your application can execute queries:
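```python
from psycopg2 import OperationalError  # assuming PostgreSQL via psycopg2

def check_database():
    # `db` is the application's connection pool, defined elsewhere
    try:
        with db.connection() as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT 1")
            result = cursor.fetchone()
    except OperationalError as exc:
        raise HealthCheckFailed("Database connection failed") from exc
    # fetchone() returns None if no row came back
    if result is None or result[0] != 1:
        raise HealthCheckFailed("Database query failed")
```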
SELECT 1 works for both PostgreSQL and MySQL. For MongoDB with PyMongo, run client.admin.command('ping') against the admin database.
Cache Verification
Cache problems are easy to miss, because applications usually treat a cache error as a miss and carry on. Verify that your cache is actually storing and retrieving data, not just accepting connections:
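```python
import time
import uuid

def check_cache():
    test_key = f"health_check:{uuid.uuid4()}"
    test_value = str(time.time())
    try:
        # `cache` is assumed to be a redis-py client
        cache.set(test_key, test_value, ex=10)  # expires even if delete never runs
        retrieved = cache.get(test_key)
        cache.delete(test_key)
    except Exception as exc:
        raise HealthCheckFailed(f"Cache check failed: {exc}") from exc
    # redis-py returns bytes unless the client uses decode_responses=True
    if isinstance(retrieved, bytes):
        retrieved = retrieved.decode()
    if retrieved != test_value:
        raise HealthCheckFailed("Cache read/write mismatch")
```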
Use a unique key per check to avoid collisions in shared cache environments.
Service Mesh Health Checks
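When running behind a service mesh like Istio, the sidecar sits in the probe path. By default, Istio rewrites HTTP probes so that the kubelet calls the sidecar agent, which forwards each request to your application's health endpoint.

Readiness gates are a separate Kubernetes mechanism. An external controller sets a custom condition on the pod, and the pod is not Ready until that condition is True:

```yaml
spec:
  readinessGates:
    - conditionType: "envoy.kubernetes.io/ready"  # example condition; a controller must set it on the pod
```

Your application still needs its own health endpoints. Rewritten probes are forwarded to them, and orchestration systems and load balancers outside the mesh call them directly.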
Kubernetes Configuration
Probe Configuration Options
Each probe type supports the same configuration parameters.
| Parameter | Purpose | Kubernetes Default | Typical Value |
|---|---|---|---|
| initialDelaySeconds | Wait before the first check | 0 | Liveness: 10-30s, Readiness: 5-10s |
| periodSeconds | How often to check | 10 | Liveness: 10-15s, Readiness: 5-10s |
| timeoutSeconds | How long to wait for a response before counting the check as failed | 1 | 3-5s |
| failureThreshold | Consecutive failures before taking action | 3 | Liveness: 3, Readiness: 2 |
| successThreshold | Consecutive successes needed to count as recovered | 1 | 1 (must be 1 for liveness and startup probes) |
Common Mistakes in Probe Configuration
Setting initialDelaySeconds too low causes premature failures. Your application needs time to start before Kubernetes starts checking. Set this based on your observed startup time, not your desired startup time.
Setting periodSeconds too short causes excessive load from health check requests. Setting it too long delays detection of failures. 10-15 seconds balances quick detection with minimal overhead.
Setting failureThreshold too low causes unnecessary restarts from transient issues. Setting it too high delays failure detection. For liveness probes, 3 failures over 45 seconds is reasonable. For readiness probes, 2 failures over 20 seconds balances sensitivity with stability.
Verifying Probe Configuration
Use kubectl to inspect probe configuration and test probes manually:
```shell
# Describe pod probe configuration
kubectl describe pod my-pod | grep -A 10 "Liveness"
kubectl describe pod my-pod | grep -A 10 "Readiness"

# Port-forward to test health endpoints
# (port-forward runs in the foreground; run curl from a second terminal)
kubectl port-forward my-pod 8080:8080
curl http://localhost:8080/live
curl http://localhost:8080/ready

# Check pod status
kubectl get pod my-pod -o jsonpath='{.status.conditions[*]}'
```
Health Check Best Practices
Timeouts and Retries
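Health check timeouts should be much shorter than your request timeout. Suppose requests time out at 30 seconds and health checks are allowed 10 seconds. Every probe against a hung instance then burns 10 seconds before it even counts as a failure, and the instance keeps receiving traffic while those failures accumulate.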
For readiness probes checking dependencies, set timeouts at 2-3 seconds. Most dependency checks should complete in milliseconds. A 3-second timeout catches genuine problems without false positives from brief slowdowns.
Do not implement retry logic in health checks. Kubernetes handles retries at the probe level. If a health check fails, Kubernetes retries based on failureThreshold. Adding your own retry logic inside the health check endpoint adds latency and complexity without benefit.
Fallbacks and Graceful Degradation
When health checks fail, have a plan for degraded operation. If your recommendation service cannot reach its ML model, return popularity-based recommendations instead of errors. If your search service cannot reach Elasticsearch, fall back to database-backed search.
```python
@app.get("/ready")
def readiness():
    try:
        check_database()
    except HealthCheckFailed:
        # Can we serve read-only traffic? allow_read_only_mode() is a hypothetical app hook
        if not allow_read_only_mode():
            raise
        return {"status": "ready", "mode": "read_only"}
    try:
        check_cache()
    except HealthCheckFailed:
        # Cache is optional: stay in rotation but report degraded
        return {"status": "ready", "cache": "degraded"}
    return {"status": "ready"}
```
What Not to Check in Health Endpoints
Keep liveness probes minimal. The liveness probe exists to detect deadlocks and crashes, not dependency outages. If your liveness probe fails whenever your database is unavailable, you restart into the same situation repeatedly.
Do not implement business logic in health checks. Health checks should verify infrastructure and dependencies, not application state. If you need to check application state, use separate monitoring endpoints with their own alerting.
Do not block health checks on long operations. A health check that takes 30 seconds to complete defeats its purpose. Set aggressive timeouts and fail fast.
Monitoring and Alerting
Health checks generate valuable signals for monitoring. Track health check latency and failure rates alongside application metrics.
```python
import time

# Track health check duration
# `metrics` is a hypothetical metrics client with histogram() and increment() methods
def measure_health_check(name, check_func):
    labels = {"check": name}
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
    try:
        check_func()
    except Exception:
        metrics.increment("health_check_failure_total", labels=labels)
        raise
    else:
        metrics.increment("health_check_success_total", labels=labels)
    finally:
        metrics.histogram("health_check_duration_seconds", time.monotonic() - start, labels=labels)
```
Set alerts on health check failures, not just application error rates. A failing health check can precede customer-visible errors, which gives you time to act before users notice.
When to Use / When Not to Use
When to Use Health Checks
Health checks are essential in these scenarios:
- Container orchestration (Kubernetes, Docker Swarm) where orchestrators need to know when to restart or route traffic to your service
- Load balancer integration where load balancers need to know which instances can receive traffic
- Auto-scaling systems where scaling decisions depend on service health
- Microservices with dependencies where you need to detect when downstream services are unavailable
- Multi-instance deployments where you need to ensure all instances are healthy before serving traffic
When Not to Use Health Checks
Health checks may add unnecessary complexity in these cases:
- Single-instance applications with no orchestration and no load balancing
- Stateless batch jobs that run to completion and exit (though startup/shutdown hooks may still be useful)
- Very short-lived tasks where the overhead of health check implementation outweighs the benefit
- Services where failure is acceptable, such as non-critical background workers that can fail without impact
Probe Selection Guide
| Scenario | Startup Probe | Liveness Probe | Readiness Probe |
|---|---|---|---|
| Slow-starting application | Required | Not needed until startup completes | Not needed until startup completes |
| Depends on external services | Not needed | Minimal only (dependency checks cause restarts on transient outages) | Required (blocks traffic during dependency issues) |
| Serves cached data when deps fail | Not needed | Minimal only (no dependency checks) | Optional (can return healthy with degraded status) |
| Stateless computation | Required if startup time is non-trivial | Optional (process crash = container restart) | Optional |
| Database-backed API | Required | Minimal only (no dependency checks) | Required |
Decision Flow
```mermaid
graph TD
    A[Implementing Health Checks] --> B{Application Slow to Start?}
    B -->|Yes| C[Add Startup Probe]
    B -->|No| D{Service Has Dependencies?}
    D -->|Yes| E{Need to Block Traffic When Deps Unavailable?}
    E -->|Yes| F[Add Readiness Probe]
    E -->|No| G[Add Liveness Probe]
    D -->|No| H{Can Crash Indicate Problem?}
    H -->|Yes| G
    H -->|No| I[No Probes Needed]
    C --> F
    C --> G
```
Topic Deep Dive: Kubernetes Probe Configuration and Failure Threshold Tuning
Getting probe configuration right is critical for reliability. Too sensitive and you restart healthy services. Too lenient and you route traffic to failing ones.
Initial Delay Calculation
Set initialDelaySeconds based on observed startup time, not desired startup time:
```yaml
# Check actual startup time first:
#   kubectl run my-app --image=my-app && kubectl logs -f my-app
# Observe how long until the app is ready to serve
livenessProbe:
  httpGet:
    path: /live
    port: 8080
  initialDelaySeconds: 45  # Allow 45s for startup
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```
A common mistake is setting initialDelaySeconds too low based on optimistic estimates. If your service takes 30 seconds to warm up its database connection pool and load configuration, set initialDelaySeconds to at least 35 seconds.
Period and Failure Threshold Tuning
The right balance depends on your service characteristics:
| Service Type | Suggested Period | Failure Threshold | Detection Time |
|---|---|---|---|
| Fast stateless API | 5-10s | 2-3 | 10-30s |
| Database-backed service | 10s | 3 | 30-45s |
| Slow initializing service | 15s | 3 | 45s |
| ML model serving | 20s | 3 | 60s |
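Detection time ≈ periodSeconds × failureThreshold, plus up to timeoutSeconds when the failing probes time out instead of returning an error. Aim for 30-60 seconds: long enough to ride out transient issues, short enough to catch real problems quickly.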
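Multiplying periodSeconds by failureThreshold gives roughly the worst case. The sketch below assumes probes fire on a fixed period and ignores kubelet jitter. A failure can begin just before a probe fires (best case) or just after one (worst case), and a probe that hangs adds its timeout before it counts as failed. `detection_window` is a hypothetical helper.

```python
from typing import Tuple


def detection_window(period_s: float, failure_threshold: int, timeout_s: float = 0.0) -> Tuple[float, float]:
    """Return (best, worst) seconds from the start of a failure until the kubelet acts.

    Approximation: probes fire every period_s, and each failing probe takes
    timeout_s to complete (0 if it fails fast with an error status).
    """
    if period_s <= 0 or failure_threshold < 1:
        raise ValueError("period_s must be positive and failure_threshold at least 1")
    # Best case: the failure starts right before a probe, so the first failing probe
    # fires immediately and the last one fires (threshold - 1) periods later.
    best = (failure_threshold - 1) * period_s + timeout_s
    # Worst case: the failure starts right after a passing probe, adding one full period.
    worst = failure_threshold * period_s + timeout_s
    return best, worst


# Liveness probe from earlier: periodSeconds 15, failureThreshold 3, timeoutSeconds 5
print(detection_window(15, 3, 5))  # (35, 50)
# Readiness probe from earlier: periodSeconds 10, failureThreshold 2, timeoutSeconds 3
print(detection_window(10, 2, 3))  # (13, 23)
```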
Kubernetes Probe Configuration for Different Scenarios
```yaml
# Fast-starting stateless service
livenessProbe:
  httpGet:
    path: /live
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2
```

```yaml
# Slow-starting service with heavy initialization
startupProbe:
  httpGet:
    path: /started
    port: 8080
  failureThreshold: 12
  periodSeconds: 10  # 12 x 10s = 120s max startup time
# Liveness and readiness probes do not run until the startup probe succeeds,
# so they need no initialDelaySeconds to cover startup
livenessProbe:
  httpGet:
    path: /live
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 2
```
gRPC Health Checks
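For gRPC services, Kubernetes has built-in gRPC probes, enabled by default since 1.24 and generally available since 1.27. The server must implement the gRPC Health Checking Protocol (grpc.health.v1.Health):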
```yaml
readinessProbe:
  grpc:
    port: 50051
    service: ""
  initialDelaySeconds: 5
  periodSeconds: 10
```
The empty service field checks overall server health. Specify a service name to check specific service availability.
HTTP/TCP Comparison for Different Service Types
| Service Type | Recommended Probe | Why |
|---|---|---|
| HTTP REST API | HTTP GET /health | Verifies entire stack |
| gRPC service | gRPC probe | Native gRPC support |
| Database | TCP check on port | Fast, verifies network reachability |
| Cache (Redis) | TCP check or Redis PING | Simple connectivity check |
| Message queue | HTTP if exposed, else TCP | Depends on exposure |
Real-world Failure Scenarios
| Scenario | What Happens | Root Cause | Mitigation |
|---|---|---|---|
| initialDelaySeconds too low | Container killed before startup completes | Optimistic configuration | Measure actual startup time |
| Period too short | Excessive load from probe requests | Probe fatigue | Use 10-15s for liveness, 5-10s for readiness |
| failureThreshold too low | False positives from transient issues | Oversensitive configuration | Require 3+ failures before action |
| Readiness probe checks external deps | Unhealthy when downstream is slow | Tight coupling | Make readiness probe fast, rely on circuit breakers |
| Liveness probe checks database | Continuous restart loop | Liveness depends on external | Liveness should check only local health |
Trade-off Comparison
| Strategy | Pros | Cons | Best For |
|---|---|---|---|
| HTTP /health endpoint | Easy to implement, comprehensive | Requires application support | Most REST services |
| TCP socket check | Simple, no app changes needed | Cannot verify application health | Legacy services, non-HTTP protocols |
| Exec probe | Can run custom scripts | Slower, more overhead | Complex health verification |
| gRPC probe | Native for gRPC services | Requires Kubernetes 1.24+ | gRPC microservices |
| Sidecar health check | Decoupled from app | Additional complexity | Service mesh deployments |
For more on building resilient systems, see Resilience Patterns, Circuit Breaker Pattern, and Kubernetes.
Quick Recap Checklist
Health checks let Kubernetes and your load balancers make intelligent routing decisions. Use liveness probes to detect crashed or deadlocked processes. Use readiness probes to control traffic routing based on dependency health. Use startup probes to give slow-starting applications time to initialize.
Keep liveness probes simple. Keep readiness probes fast and thorough. Set timeouts short and failure thresholds reasonable. Monitor your health checks and alert on failures.
Interview Questions
Liveness probes determine if a container should be restarted. If the liveness probe fails, Kubernetes restarts the container. Use liveness probes to detect deadlock situations where the process is running but unresponsive.
Readiness probes determine if a container can receive traffic. If the readiness probe fails, Kubernetes removes the container from the service endpoint slice, stopping traffic to it. Use readiness probes to control traffic routing based on whether the service is ready to serve requests.
Startup probes delay all other probes until the container is ready. Use startup probes for applications that take significant time to initialize, preventing liveness probes from killing the container before it is ready.
If a liveness probe checks external dependencies like databases or caches, a temporary outage of that dependency causes the liveness probe to fail, restarting the container. The container starts up, the liveness probe checks the dependency again (still down), fails again, and the cycle repeats.
This makes the outage worse instead of better. The service restarts continuously, consuming resources and potentially making the dependency problem worse under the load of restart attempts.
Liveness probes should only check whether the process is running and the HTTP server can respond. Dependency health should be checked by readiness probes, which block traffic without restarting the container.
Measure actual startup time for initialDelaySeconds. Run the application, observe how long until it is ready, and set initialDelaySeconds slightly above that measured time.
Choose periodSeconds to balance quick detection against probe overhead. 10-15 seconds for liveness and 5-10 seconds for readiness works for most services. Shorter periods add load; longer periods delay detection.
Set timeoutSeconds short enough to fail fast but long enough for legitimate slow responses. 3-5 seconds is typical for HTTP probes.
The failureThreshold is how many consecutive failures trigger action. Multiply periodSeconds by failureThreshold to get detection time. For liveness, 3 failures over 45 seconds is reasonable. For readiness, 2 failures over 20 seconds balances sensitivity with stability.
Startup probes hold off liveness and readiness probes until the startup probe passes. Until then, the kubelet does not run the other probes at all, so a slow-starting application cannot be restarted by its liveness probe in the middle of initialization.
If you instead rely on initialDelaySeconds on the liveness probe, the delay has to cover the slowest startup you expect. If initialization can take 60 seconds, you need initialDelaySeconds: 60. Every container start, including the fast ones, then runs without liveness protection for that full minute.
Startup probes solve this. Set periodSeconds and failureThreshold so that the total budget (periodSeconds * failureThreshold) covers the maximum startup time. The startup probe passes as soon as the application is ready. An instance that starts in 30 seconds hands over to the liveness probe after about 30 seconds rather than 60, while an instance that never finishes starting is restarted once the budget is used up.
Active health checks run periodically from the load balancer or proxy, independent of production traffic. They query health endpoints or test TCP connections and mark instances unhealthy based on responses.
Passive health checks analyze actual request responses. If an instance returns errors or timeouts above a threshold, the load balancer marks it unhealthy without sending additional probe requests. Passive checks catch issues that affect real traffic but might not trigger active probe failures.
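Most production systems use both. Active checks catch instances that are down even when they receive no traffic; passive checks catch failures that only show up under real traffic. AWS ALB, for example, relies on active health checks against a configured path. Envoy's outlier detection and NGINX's max_fails/fail_timeout server parameters are passive mechanisms.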
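To make the passive side concrete, here is a minimal sketch of consecutive-failure ejection, similar in spirit to the passive checks some proxies implement. The class and its parameters are hypothetical, not any load balancer's API. An ejected instance receives no traffic and so produces no new outcomes, which is why recovery here is time-based.

```python
import time
from typing import Callable, Optional


class PassiveHealthTracker:
    """Passive check: eject an instance after N consecutive failed requests."""

    def __init__(self, consecutive_failures: int = 5, ejection_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.consecutive_failures = consecutive_failures
        self.ejection_s = ejection_s
        self.clock = clock
        self.failure_streak = 0
        self.ejected_until: Optional[float] = None

    def record(self, failed: bool) -> None:
        """Record the outcome (error or timeout = failed) of a real request to this instance."""
        if not failed:
            self.failure_streak = 0
            return
        self.failure_streak += 1
        if self.failure_streak >= self.consecutive_failures:
            # Stop routing to this instance for a while; no probe request is sent
            self.ejected_until = self.clock() + self.ejection_s
            self.failure_streak = 0

    def is_healthy(self) -> bool:
        if self.ejected_until is None:
            return True
        if self.clock() >= self.ejected_until:
            # Ejection expired: readmit and let real traffic judge the instance again
            self.ejected_until = None
            return True
        return False
```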
Implement a readiness endpoint that verifies connectivity to dependencies. Check database connectivity by executing a simple query like SELECT 1. Check cache connectivity by writing and reading a test value with a unique key to avoid collisions.
Keep checks fast, with per-check timeouts of 2-3 seconds. That is short enough to fail fast when a dependency hangs, and long enough that a dependency that is merely slow does not trip the check.
Consider implementing fallback behavior where the service can operate in degraded mode. If the database is unavailable but cached data suffices, the readiness probe can pass with a "degraded" status and the service continues serving from cache.
If timeouts are too high, failing health checks take too long to detect real problems. A 30-second timeout on a health check that should complete in 100ms means each probe waits 30 seconds before it can report that the service is unresponsive, and Kubernetes needs failureThreshold such probes before it acts.
If timeouts are too low, you get false positives from legitimate slow responses. Under load, a normally responsive service might take 2 seconds for a health check, but a 1-second timeout would mark it unhealthy even though it is working fine.
Set timeouts based on observed response times under normal load, plus a small buffer. Most health checks should complete in under 1 second; 3-5 seconds is a reasonable timeout range.
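The sketch below turns that rule into a number, assuming you have collected health-check latencies (in seconds) under normal load. It takes the 99th percentile, applies a safety factor, and rounds up, because timeoutSeconds is an integer. The factor of 2 and the 1-second floor are illustrative choices, not Kubernetes guidance, and `suggest_timeout_seconds` is a hypothetical helper.

```python
import math
import statistics
from typing import Sequence


def suggest_timeout_seconds(latencies_s: Sequence[float], safety_factor: float = 2.0, floor_s: int = 1) -> int:
    """Suggest an integer timeoutSeconds from health-check latencies observed under normal load."""
    if len(latencies_s) < 2:
        raise ValueError("need at least two latency samples")
    # 99th percentile; "inclusive" keeps the estimate within the observed range
    p99 = statistics.quantiles(latencies_s, n=100, method="inclusive")[98]
    # timeoutSeconds is a whole number of seconds, so round the buffered value up
    return max(floor_s, math.ceil(p99 * safety_factor))
```

Measure under realistic traffic before trusting the result, since the same check is often slower under load.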
Horizontal Pod Autoscaler (HPA) scales based on CPU utilization, memory usage, or custom metrics. Unhealthy pods affect these metrics differently depending on whether the unhealthiness is detected by readiness probes.
If a pod is failing readiness probes, it is removed from service endpoints and stops receiving traffic. This reduces its resource utilization, which might cause the HPA to scale down when it should actually scale up to handle the issue.
The HPA's behavior field does not exclude unhealthy pods; it controls scaling policies and stabilization windows. Set behavior.scaleDown.stabilizationWindowSeconds to avoid aggressive scale-down during recovery. Some teams implement separate scaling based on health check failure rates rather than resource utilization.
When Kubernetes sends SIGTERM to a container for termination, the container should stop accepting new connections but finish existing requests. The readiness probe should start failing immediately so the pod is removed from endpoints before it processes new requests.
The terminationGracePeriodSeconds setting controls how long Kubernetes waits for the container to shut down, and the preStop hook's runtime counts against it. Use a preStop hook (for example, a short sleep) to delay SIGTERM, giving the load balancer time to update its routing and stop sending new traffic.
If the container does not exit within the grace period, Kubernetes sends SIGKILL. Ensure your application handles SIGTERM by stopping listeners, draining connections, and exiting cleanly.
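Here is a minimal sketch of the application side. It assumes a threaded server whose request handlers call `request_started()`/`request_finished()` and whose readiness endpoint returns 503 when `is_ready()` is False. `ShutdownCoordinator` is a hypothetical helper, not a framework API. Many application servers already install their own SIGTERM handling, so check your server's behavior before installing another handler.

```python
import signal
import threading


class ShutdownCoordinator:
    """Hypothetical helper: fail readiness on SIGTERM, then wait for in-flight requests."""

    def __init__(self) -> None:
        self._shutting_down = threading.Event()
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0

    def install(self) -> None:
        # signal.signal() must be called from the main thread
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: just flip the flag the readiness endpoint reads
        self._shutting_down.set()

    def is_ready(self) -> bool:
        return not self._shutting_down.is_set()

    def request_started(self) -> bool:
        """Return False if new work should be refused because shutdown has begun."""
        with self._lock:
            if self._shutting_down.is_set():
                return False
            self._in_flight += 1
            return True

    def request_finished(self) -> None:
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def wait_for_drain(self, timeout_s: float) -> bool:
        """Block until no requests are in flight; False if the timeout expired first."""
        with self._lock:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)
```

After SIGTERM, the main thread would call `wait_for_drain()` with a timeout comfortably below terminationGracePeriodSeconds, then exit.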
With Istio, the Envoy sidecar sits in the probe path by default. Istio rewrites the pod's HTTP probes so that the kubelet calls the sidecar agent, which forwards each probe to the application. Readiness gates are a separate Kubernetes feature, in which an external controller sets a custom pod condition that must be True before the pod counts as Ready.
Your application still needs to expose its health endpoints, because the rewritten probes are forwarded to them and systems outside the mesh call them directly.
For multi-container pods where the main container and sidecar have different health characteristics, configure separate probes for each. The sidecar might be ready when the main application is still initializing.
HTTP probes send an HTTP GET request to the health endpoint. They are the most common for web services because they verify the entire application stack including HTTP server functionality. Kubernetes marks the probe as successful for response codes 200-399.
TCP probes attempt to open a TCP socket. Use these for services that do not expose HTTP endpoints, such as databases, mail servers, or legacy protocols. They only verify network connectivity, not application health.
Exec probes run a shell command inside the container. The probe succeeds if the command exits with zero. These are flexible but slower due to process overhead. Use when you need custom health logic that cannot be expressed as an HTTP endpoint.
Choose HTTP probes for most web services. Use TCP for non-HTTP services or when you only care about port availability. Use Exec only when you need custom verification logic that cannot be expressed via HTTP or TCP.
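When testing endpoints outside the cluster, it can help to reproduce the HTTP and TCP success rules described above. This is a rough approximation using only the Python standard library. It is not the kubelet's implementation, and details such as redirect handling and request headers differ. Exec probes reduce to checking a command's exit status. `http_probe` and `tcp_probe` are hypothetical helpers.

```python
import socket
import urllib.error
import urllib.request

# Empty ProxyHandler: ignore proxy environment variables and call the endpoint directly
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url: str, timeout_s: float = 1.0) -> bool:
    """Approximate an HTTP probe: success is a status in 200-399 within the timeout."""
    try:
        with _opener.open(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx arrive as exceptions; the status is on the error
    except OSError:
        return False  # connection refused, DNS failure, or timeout
    return 200 <= status < 400


def tcp_probe(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Approximate a TCP probe: success means the connection was accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```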
During a rolling update, Kubernetes gradually replaces old pods with new ones. Readiness probes determine when a new pod is ready to receive traffic. If the new version fails readiness checks, it stays in the rollout without receiving traffic, allowing you to catch issues before full deployment.
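A new pod does not count as available until its readiness probe passes (and minReadySeconds, if set, has elapsed). The Deployment only scales down old pods within the limits set by maxUnavailable, which preserves availability during the rollout. Configure `maxUnavailable` and `maxSurge` in your rolling update strategy to control deployment speed versus availability.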
Liveness probes do not directly affect rollouts. If a new pod's liveness probe fails, Kubernetes restarts it rather than rolling back. This is why readiness probes should be thorough and liveness probes should be minimal.
When a readiness probe returns a status code outside 200-399 (or times out), Kubernetes marks the pod condition as `Ready: False`. The pod is removed from the service endpoint slice, stopping new traffic. Existing connections are not terminated, but no new traffic routes to that pod.
The pod continues running and the probe continues checking. When the probe passes again, Kubernetes marks `Ready: True` and adds the pod back to the endpoint slice. Traffic resumes automatically.
For a 503 specifically, your application is explicitly reporting it cannot handle requests. This is different from a timeout (which indicates the application may be stuck) or connection refused (which indicates the application is not listening).
Multi-tenant services often have per-tenant dependencies. A tenant's database might be down while others are fine. Consider implementing tenant-aware readiness checks that verify connectivity to all tenant resources.
One approach is an aggregated health endpoint that checks every tenant: if any tenant's critical dependency is down, the readiness probe marks the pod as not ready. Use a separate monitoring endpoint to expose per-tenant details without affecting traffic routing.
Alternatively, use separate readiness gates per tenant or implement tenant isolation at the ingress layer so unhealthy tenants do not affect others.
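Here is a minimal sketch of the aggregated approach, assuming each tenant's critical dependency can be checked by a zero-argument callable. `tenant_readiness` and the tenant names in the example are hypothetical. It carries the trade-off described above: one tenant's outage takes the whole pod out of rotation for everyone.

```python
from typing import Callable, Dict, Mapping, Tuple


def tenant_readiness(tenant_checks: Mapping[str, Callable[[], bool]]) -> Tuple[bool, Dict[str, str]]:
    """Check each tenant's critical dependency; ready only if every tenant passes.

    Returns (ready, details). The readiness endpoint should use only `ready`;
    `details` belongs on a separate, internal monitoring endpoint.
    """
    details: Dict[str, str] = {}
    for tenant, check in tenant_checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that raises counts as unavailable
        details[tenant] = "ok" if ok else "unavailable"
    ready = all(status == "ok" for status in details.values())
    return ready, details
```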
Health checks and circuit breakers serve different but complementary purposes. Health checks detect failure so orchestration platforms can react. Circuit breakers detect failure patterns and prevent your service from calling downstream services that are known to be failing.
A circuit breaker monitors the success rate of downstream calls. When the failure rate exceeds a threshold, the circuit opens and subsequent calls fail immediately without making the actual request. This protects both your service and the downstream one from cascading failure.
Use readiness probes to report your service's overall health to Kubernetes. Use circuit breakers to protect your service from downstream failures. These patterns work together: the readiness probe might report degraded status while the circuit breaker prevents further damage.
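This minimal circuit breaker sketch shows how the pieces fit. For brevity, it trips on consecutive failures rather than a failure rate and is not thread-safe. `CircuitBreaker` and `CircuitOpenError` are hypothetical names. A readiness endpoint could report "degraded" for a dependency whose breaker `state` is not `"closed"` instead of failing outright.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    """Consecutive-failure circuit breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # let a trial call through
        return "open"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        state = self.state
        if state == "open":
            raise CircuitOpenError("downstream is failing; call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```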
Test health checks under realistic failure conditions. Artificially inject failures for each dependency (database down, cache unavailable, downstream service timeout) and verify the probe behaves as expected.
Measure actual probe latency under load. Your health check might complete in 50ms normally but take 2 seconds under load. Set timeouts high enough to avoid false positives during traffic spikes.
Test startup time measurements. Force restart your application and measure how long until readiness probe passes. Set initialDelaySeconds to 10-20% above this measured time.
Use canary deployments to test new probe configurations with a small percentage of traffic before full rollout.
Health endpoints should not expose sensitive information. A health check that returns database connection strings, internal IP addresses, or configuration details creates an information disclosure vulnerability.
Keep health responses minimal: just a status indicator. If you need detailed diagnostics for monitoring, use a separate endpoint that is not exposed to the internet and is protected by network policies.
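Here is a minimal sketch of the two response shapes, assuming check results are plain dictionaries. `public_health_body`, `internal_diagnostics_body`, and the `SENSITIVE_KEYS` set are hypothetical. Redacting inside the internal endpoint is an extra safeguard beyond the network policy, in case that endpoint is ever exposed by mistake.

```python
import json

# Keys that must never appear in any health response (assumed list; extend it).
SENSITIVE_KEYS = {"dsn", "password", "host", "ip", "connection_string"}


def public_health_body(checks: dict[str, dict]) -> bytes:
    """Internet-reachable probe response: a status word and nothing else."""
    ok = all(c.get("ok", False) for c in checks.values())
    return json.dumps({"status": "ok" if ok else "unavailable"}).encode()


def internal_diagnostics_body(checks: dict[str, dict]) -> bytes:
    """Detail for a network-policy-protected endpoint, with secrets redacted."""
    redacted = {
        name: {k: ("[redacted]" if k in SENSITIVE_KEYS else v) for k, v in c.items()}
        for name, c in checks.items()
    }
    return json.dumps(redacted, sort_keys=True).encode()
```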
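Consider rate limiting health endpoints to prevent abuse. An exposed `/live` endpoint with no rate limiting could be used for amplification attacks or reconnaissance. Apply the limit to external traffic (for example, at the ingress) rather than to the kubelet's own probes. A liveness probe that gets throttled counts as a failure and can trigger restarts.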
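A token bucket is one common way to implement the limit. The sketch below is in-process and per-instance, and it assumes a single-threaded caller (it uses no locking). `TokenBucket` is a hypothetical name, and the clock is injectable for testing. Place it only in front of traffic that arrives from outside the cluster.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `burst` requests at once, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should answer 429 Too Many Requests
```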
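Applications that need warm-up time (JIT compilation, model loading, connection pool initialization) should use a startup probe. The kubelet allows roughly failureThreshold × periodSeconds for startup before it restarts the container. Set failureThreshold high enough that this product covers the maximum warm-up time.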
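The sizing arithmetic is easy to encode. This sketch returns a startup probe as a plain dictionary that mirrors the Kubernetes field names. The `/live` path, port 8080, and the 20% safety margin are assumptions; replace them with your own values.

```python
import math


def startup_probe(max_warmup_seconds: float, period_seconds: int = 10,
                  safety_margin: float = 0.2) -> dict:
    """Size a startup probe so failureThreshold * periodSeconds covers warm-up.

    The kubelet gives the container roughly failureThreshold * periodSeconds
    (after any initialDelaySeconds) to pass once before restarting it.
    """
    budget = max_warmup_seconds * (1 + safety_margin)
    return {
        "httpGet": {"path": "/live", "port": 8080},  # hypothetical path and port
        "periodSeconds": period_seconds,
        "failureThreshold": math.ceil(budget / period_seconds),
    }


# A model that takes up to 4 minutes to load, probed every 10 seconds:
# 240s * 1.2 = 288s of budget, so failureThreshold comes out as 29.
probe = startup_probe(240)
```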
During warm-up, readiness probes should fail. After warm-up completes, readiness probes should pass. This ensures the application does not receive traffic until it is ready.
For cool-down (graceful shutdown), use preStop hooks and SIGTERM handling. The readiness probe should start failing immediately on SIGTERM so the pod is removed from endpoints before shutdown begins.
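The warm-up and shutdown rules form a small state machine. The pod is not ready while warming, becomes ready once warm, and stays not ready permanently after SIGTERM. Here is a minimal sketch using Python's `signal` module. `Lifecycle` and `install_sigterm_handler` are hypothetical names. The server is assumed to keep finishing in-flight requests for a drain period after the flag flips. Liveness deliberately keeps passing during the drain, so the kubelet does not restart a pod that is shutting down on purpose.

```python
import signal


class Lifecycle:
    """Readiness state: warming -> ready -> draining (draining is final)."""

    def __init__(self) -> None:
        self.warm = False      # set once caches, pools, or models are loaded
        self.draining = False  # set on SIGTERM and never cleared

    def mark_warm(self) -> None:
        self.warm = True

    def on_sigterm(self, signum, frame) -> None:
        # Plain attribute writes keep the handler simple: no locks to deadlock on.
        self.draining = True

    def readiness_status(self) -> int:
        return 200 if self.warm and not self.draining else 503

    def liveness_status(self) -> int:
        # Stay live while draining; the process exits on its own after the drain.
        return 200


def install_sigterm_handler(lifecycle: Lifecycle) -> None:
    """Must be called from the main thread; Python runs signal handlers there."""
    signal.signal(signal.SIGTERM, lifecycle.on_sigterm)
```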
Each health check probe consumes resources: CPU for the check, network bandwidth for the request, and memory for the connection. High-frequency probes compound across all pods in your cluster.
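For a cluster with 1000 pods, each running one probe every 10 seconds, you have 100 probe requests per second hitting your services. Adding a readiness probe alongside the liveness probe on every pod doubles that. If each probe costs 10ms of CPU, 100 probes per second consume 1 second of CPU per second, a full core dedicated to health checking.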
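The same arithmetic as a small helper can be useful when comparing candidate probe periods. The function name and parameters are illustrative.

```python
def probe_overhead(pods: int, period_seconds: float, cpu_ms_per_probe: float,
                   probes_per_pod: int = 1) -> tuple[float, float]:
    """Return (probe requests per second, CPU cores spent on probes)."""
    rps = pods * probes_per_pod / period_seconds
    cores = rps * cpu_ms_per_probe / 1000.0  # ms of CPU per second -> cores
    return rps, cores


# The example above: 1000 pods, one probe every 10 s, 10 ms of CPU each.
rps, cores = probe_overhead(pods=1000, period_seconds=10, cpu_ms_per_probe=10)
```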
Balance detection speed against overhead. 10-15 second periods for liveness probes work for most services. Readiness probes can be more frequent (5-10 seconds) because they only affect traffic routing, not restarts.
Track health check latency and failure rates as application metrics. If health check duration increases over time, your application may be degrading. If health check failures spike, investigate immediately.
Set up alerts for health check probe duration exceeding thresholds. A health check that normally takes 5ms taking 500ms indicates resource contention or connection pool exhaustion.
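Here is a sketch of the alerting logic, assuming probe durations are fed in as they are measured. In most setups this would be a latency histogram in your metrics system with an alert rule on top; this in-process version only shows the comparison. The 10x-over-median factor and the window size are assumed defaults, and `ProbeLatencyMonitor` is a hypothetical name. Comparing against a rolling median rather than a fixed number lets the same code work for services with different normal probe latencies.

```python
import statistics
from collections import deque


class ProbeLatencyMonitor:
    """Flags health-check durations far above their own recent baseline."""

    def __init__(self, window: int = 100, factor: float = 10.0,
                 min_samples: int = 20) -> None:
        self.samples = deque(maxlen=window)
        self.factor = factor            # assumed alert multiplier; tune per service
        self.min_samples = min_samples  # no alerts until a baseline exists

    def observe(self, duration_ms: float) -> bool:
        """Record one probe duration; return True if it should raise an alert."""
        alert = (
            len(self.samples) >= self.min_samples
            and duration_ms > self.factor * statistics.median(self.samples)
        )
        self.samples.append(duration_ms)
        return alert
```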
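Monitor probe configuration changes. Detection time is roughly failureThreshold × periodSeconds, so if someone changes either value (for example, raising periodSeconds to cut probe load), failures take a different amount of time to detect. Alert on configuration drift from baseline.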
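One way to catch drift is to diff each deployed probe against a recorded baseline and report the effect on detection time. The sketch approximates worst-case detection as failureThreshold × periodSeconds plus one timeout. `ProbeConfig` and `drift` are hypothetical, and fetching the live configuration (for example, from rendered manifests) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeConfig:
    period_seconds: int
    failure_threshold: int
    timeout_seconds: int = 1  # Kubernetes default timeoutSeconds

    def worst_case_detection_seconds(self) -> int:
        # Roughly: consecutive failures needed x interval, plus the last probe's timeout.
        return self.failure_threshold * self.period_seconds + self.timeout_seconds


def drift(baseline: ProbeConfig, current: ProbeConfig) -> list[str]:
    """Describe changes relative to the baseline and their effect on detection time."""
    findings = []
    for name in ("period_seconds", "failure_threshold", "timeout_seconds"):
        before, after = getattr(baseline, name), getattr(current, name)
        if before != after:
            findings.append(f"{name}: {before} -> {after}")
    b = baseline.worst_case_detection_seconds()
    c = current.worst_case_detection_seconds()
    if b != c:
        findings.append(f"detection time: ~{b}s -> ~{c}s")
    return findings
```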
Further Reading
- Resilience Patterns — Circuit breakers, retries, and bulkheads that work alongside health checks
- Circuit Breaker Pattern — Preventing cascading failures in distributed systems
- Kubernetes — Container orchestration and service networking fundamentals
- Kubernetes Probe Configuration — Official documentation on configuring liveness, readiness, and startup probes
- Istio Health Checking — Configuring health checks in service mesh environments
Conclusion
Health checks form the foundation of reliable service discovery and availability detection in distributed systems. Liveness probes identify crashed or dead services that need restarting. Readiness probes determine whether a service can handle traffic after startup, deployments, or temporary degradation. Startup probes accommodate slow-starting applications without forcing you to loosen liveness probe settings for the container's entire lifetime.
Probe configuration requires balancing detection speed against false positive risk. Set timeouts based on normal response times under load. Configure failure thresholds to tolerate brief issues without waiting too long to act. Use separate probes for different concerns when your application warrants it.
Health checks integrate deeply with orchestration platforms—Kubernetes uses them to manage pod lifecycle, autoscaling, and traffic routing. Service meshes layer additional health checking through sidecar proxies. External load balancers perform their own checks against your services.
Building reliable health checks means testing them under failure conditions. Verify that your services correctly report unhealthy states. Confirm that orchestration platforms respond appropriately. Measure detection and recovery times under various failure scenarios.
The right health check strategy depends on your application’s characteristics and your tolerance for different failure modes. Start with simple checks and add depth as operational needs demand.