Secrets Management: Vault, Kubernetes Secrets, and Env Vars
Learn how to securely manage secrets, API keys, and credentials across microservices using HashiCorp Vault, Kubernetes Secrets, and best practices.
Every microservice needs credentials. Database passwords, API keys, TLS certificates, encryption keys. The list adds up fast. In a monolith, you could probably get away with a config file or some environment variables on each server. That stops working when you have dozens of services scaling dynamically across multiple clouds.
This post covers what secrets actually are in microservices, the difference between static and dynamic approaches, how Kubernetes handles them, and why HashiCorp Vault has become essential infrastructure for production deployments.
Introduction
A secret is any credential or sensitive data that authenticates or authorizes access to something. API keys, database passwords, OAuth tokens, SSH keys, TLS certificates.
The tricky part in microservices is that services communicate with each other and with external systems. Each connection typically requires authentication. A payment service needs database credentials. An API gateway needs to validate JWT tokens. Service-to-service calls need mTLS certificates.
In a monolith, you might have put these in a config.yaml on each server. With hundreds of services scaling dynamically across multiple clouds, that breaks down.
Core Concepts
Not all secrets are the same. The distinction between static and dynamic secrets affects both security and operational complexity.
Static Secrets
Static secrets are long-lived credentials that do not change unless someone manually rotates them. Database passwords, API keys, static TLS certificates fit here. These secrets persist for weeks, months, or even years.
The rotation problem is real. If a database password never changes and someone obtains it through a breach, they have permanent access until you manually update it. Most organizations update static secrets infrequently because rotation requires coordinating changes across multiple services and environments, a painful process that often introduces downtime risk.
Dynamic Secrets
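Dynamic secrets are generated on demand with short lifespans, typically minutes to hours. A service requests temporary credentials, and they expire before an attacker can do much with them. Vault's secret engines generate them natively. AWS Secrets Manager and GCP Secret Manager focus on storing and rotating secrets, while the cloud providers issue short-lived credentials through services such as AWS STS.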
Kubernetes Secrets
Kubernetes has a built-in Secrets resource, but it has limitations that catch many teams off guard.
How Kubernetes Secrets Work
Kubernetes Secrets use base64 encoding. Creating a secret looks like this:
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: cG9zdGdyZXM=
  password: c2VjcmV0cGFzc3dvcmQ=
The values are base64 encoded, not encrypted. Anyone who can read Secrets in the namespace (developers, CI/CD pipelines, or anyone able to create pods that mount them) can decode those values trivially. Run echo "cG9zdGdyZXM=" | base64 -d and you get the plaintext username.
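To make the point concrete, here is a minimal Python sketch that decodes the two values from the manifest above. The helper name decode_secret is made up for illustration; the only library call is the standard base64 module, and no key or passphrase is needed at any point.

```python
import base64

# Values copied from the `data` block of the Secret manifest above.
secret_data = {
    "username": "cG9zdGdyZXM=",
    "password": "c2VjcmV0cGFzc3dvcmQ=",
}


def decode_secret(data: dict) -> dict:
    """Decode every base64 value. No key or passphrase is involved."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


decoded = decode_secret(secret_data)
print(decoded)
```

Anything that can fetch the Secret object can do the same, which is why access to the object itself is the only real protection here.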
By default, Kubernetes stores secrets in etcd unencrypted. Encryption at rest requires explicit configuration on the API server. Many managed Kubernetes services enable this by default. Self-hosted clusters often do not.
The Real Limitation
Kubernetes Secrets are not really secrets. They are configmaps with base64 encoding. Anyone with read access to Secrets in a namespace can read them. RBAC can scope access to a namespace or to individually named Secrets, but nothing finer than that. No built-in secret rotation either.
Teams use Kubernetes Secrets because they are built-in and simple, then hit problems when they need audit logs, automatic rotation, or integration with external secret stores.
For small deployments or early-stage projects, Kubernetes Secrets might be enough. For production systems handling sensitive data, you will probably need something more robust.
HashiCorp Vault
Vault is a dedicated secrets management tool that addresses the gaps in Kubernetes Secrets. It gives you centralized secret storage, dynamic secrets, encryption as a service, and detailed audit logs.
Vault Architecture
Vault uses an architecture that separates concerns cleanly:
graph TD
    subgraph Clients
        S1[Service A]
        S2[Service B]
        S3[Service C]
    end
    subgraph Vault
        SA[(Storage<br/>etcd/Consul/S3)]
        LS[Logical System]
        PL[Plugin Layer]
    end
    S1 -->|Request secrets| LS
    S2 -->|Request secrets| LS
    S3 -->|Request secrets| LS
    LS -->|Persist| SA
    LS -->|Execute| PL
The storage backend holds encrypted data. The logical system handles the API and secret engines. The plugin layer extends functionality for things like PKI certificates or cloud provider integrations.
Secret Engines
Vault organizes secrets into logical paths called secret engines. Each engine handles a specific type of secret:
- Key-Value (KV): Static secrets stored at a path. Simple key-value storage for passwords, API keys, or any credential.
- Database: Generates dynamic database credentials on demand. Supports PostgreSQL, MySQL, MongoDB, and many others.
- PKI: Generates TLS certificates automatically. Handles certificate lifecycle including issuance, renewal, and revocation.
- AWS: Generates temporary IAM credentials for AWS resources.
- Kubernetes: Generates Kubernetes service account tokens, and can optionally create the service accounts, roles, and role bindings behind them.
Authentication Methods
Services do not just retrieve secrets - they authenticate to Vault first. Vault supports multiple authentication methods:
- Kubernetes Service Account: Uses Kubernetes service account tokens to authenticate. The most common approach for services running in Kubernetes.
- AppRole: A role-based authentication method for machines or applications.
- JWT/OIDC: For services using JWT-based authentication.
- TLS Certificates: For services with valid client certificates.
The typical flow for a Kubernetes workload looks like this:
- Pod starts and gets its service account token mounted
- Pod authenticates to Vault using its Kubernetes service account
- Vault validates the token with Kubernetes
- Vault returns a Vault token scoped to the role's policies
- Pod uses that token to read the secrets it needs
- Pod uses secrets to connect to databases or other services
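The flow above can be sketched with Vault's HTTP API and nothing but the Python standard library. This is a rough outline under a few assumptions: the Kubernetes auth method is mounted at its default path (auth/kubernetes), and the helper names are hypothetical. Only fetch_json touches the network, so the request builders can be inspected without a running Vault.

```python
import json
import urllib.request

# Path where Kubernetes mounts the pod's service account token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build the POST for the Kubernetes auth login endpoint (default mount assumed)."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_client_token(login_response: dict) -> str:
    """Pull the Vault token out of a decoded login response body."""
    return login_response["auth"]["client_token"]


def build_read_request(vault_addr: str, path: str, vault_token: str) -> urllib.request.Request:
    """Build the GET for a secret path, authenticated with the Vault token."""
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/{path.lstrip('/')}",
        headers={"X-Vault-Token": vault_token},
        method="GET",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (needs a reachable Vault)."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

Inside a pod, the application would read the JWT from TOKEN_PATH, send the login request, extract the client token, and then issue read requests with it. In practice most teams let Vault Agent or a client library do this rather than hand-rolling HTTP calls.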
Dynamic Secrets in Action
The database secret engine demonstrates Vault’s power. Instead of sharing a static database password across all services, each service gets its own temporary credentials:
# A service requests database credentials
vault read database/creds/myapp-role
# Vault returns temporary credentials
Key                Value
---                -----
lease_id           database/creds/myapp-role/xyz123
lease_duration     1h
username           v-token-myapp-role-xyz123
password           A1a-xxxxxxxxxxxxx
The service uses these credentials to connect to the database. After one hour, Vault revokes the credentials automatically. The next time the service needs database access, it requests new credentials from Vault.
This approach means no service ever knows the master database password. Compromise of one service only exposes temporary credentials that expire quickly.
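For a service consuming these credentials programmatically, the useful pieces are the username, the password, and the moment the lease runs out. The sketch below parses a response shaped like the JSON form of the output above (as returned by vault read -format=json, where lease_duration is given in seconds). The sample payload and the parse_lease helper are illustrative, not taken from a real deployment.

```python
import json
from datetime import datetime, timedelta, timezone

# Illustrative payload, shaped like `vault read -format=json database/creds/myapp-role`.
SAMPLE_RESPONSE = """
{
  "lease_id": "database/creds/myapp-role/xyz123",
  "lease_duration": 3600,
  "renewable": true,
  "data": {"username": "v-token-myapp-role-xyz123", "password": "A1a-xxxxxxxxxxxxx"}
}
"""


def parse_lease(raw: str, issued_at: datetime) -> dict:
    """Extract the credentials and compute when the lease expires."""
    response = json.loads(raw)
    return {
        "lease_id": response["lease_id"],
        "username": response["data"]["username"],
        "password": response["data"]["password"],
        # lease_duration is in seconds in the JSON output.
        "expires_at": issued_at + timedelta(seconds=response["lease_duration"]),
    }


lease = parse_lease(SAMPLE_RESPONSE, issued_at=datetime(2026, 3, 24, 10, 0, tzinfo=timezone.utc))
print(lease["username"], "valid until", lease["expires_at"].isoformat())
```

Tracking expires_at explicitly is what lets a service request replacement credentials before the old ones are revoked.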
Service Accounts and Workload Identity
Modern microservices should not rely on shared static credentials. Workload identity provides cryptographically verifiable identity for services running in cloud environments or Kubernetes.
SPIFFE (Secure Production Identity Framework for Everyone) defines a standard for workload identity. SPIFFE IDs are URIs that uniquely identify workloads:
spiffe://example.com/ns/payment/sa/payment-service
This SPIFFE ID can be embedded in X.509 certificates or JWTs, letting services prove their identity without shared secrets. Service mesh solutions like Istio and Linkerd automatically issue SPIFFE certificates to workloads.
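As a small illustration of how a SPIFFE ID breaks down, the sketch below splits one into its trust domain and workload path using the standard URL parser. The second helper assumes the ns/<namespace>/sa/<service-account> path layout used in the example ID above. That layout is a convention, not something the SPIFFE standard requires, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple:
    """Split a SPIFFE ID into (trust_domain, workload_path)."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"not a valid SPIFFE ID: {spiffe_id!r}")
    return parts.netloc, parts.path


def kubernetes_identity(workload_path: str) -> tuple:
    """Extract (namespace, service_account) from an ns/<ns>/sa/<sa> path (assumed layout)."""
    segments = workload_path.strip("/").split("/")
    if len(segments) == 4 and segments[0] == "ns" and segments[2] == "sa":
        return segments[1], segments[3]
    raise ValueError(f"path does not follow the ns/<ns>/sa/<sa> convention: {workload_path!r}")


trust_domain, path = parse_spiffe_id("spiffe://example.com/ns/payment/sa/payment-service")
print(trust_domain, kubernetes_identity(path))
```

An authorization check would compare these parsed fields against policy only after the certificate or JWT carrying the ID has been cryptographically verified.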
Workload identity shifts the security model from “protect the secrets” to “verify the identity”. Instead of obsessing about database passwords being leaked, you verify that the service presenting credentials is actually the payment service.
Vault can authenticate SPIFFE workloads through its TLS certificate and JWT/OIDC auth methods, which accept X.509 and JWT SVIDs respectively. Services present their SPIFFE identity and receive Vault secrets based on it.
Secret Rotation Strategies
Rotation is where many secret management strategies fall apart. A good rotation strategy should be automated, have minimal blast radius, and not require downtime.
Rotation Patterns
Time-based rotation: Secrets expire after a fixed period. Services get new credentials before expiration. Simple approach but requires services to handle credential refresh gracefully.
Event-based rotation: Rotation triggers on specific events - a suspected compromise, an employee leaving, a compliance requirement. This requires coordination across services but ensures secrets do not persist indefinitely after a security incident.
Usage-based rotation: With dynamic secrets from Vault, every request issues a fresh credential tied to a lease, so rotation happens as a side effect of normal use. Services always have fresh credentials without manual intervention.
Implementing Rotation
The key to successful rotation is designing services to handle credential refresh:
- Cache credentials locally
- Before using a credential, check if it is about to expire
- If expiring soon, request new credentials proactively
- Handle authentication failures gracefully and retry with fresh credentials
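The steps above can be sketched as a small cache that refreshes proactively. Everything here is illustrative: Credential, CredentialCache, and connect_with_retry are made-up names, the fetch callable stands in for whatever talks to Vault, and PermissionError stands in for your database driver's authentication error.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credential:
    username: str
    password: str
    expires_at: float  # Unix timestamp


class CredentialCache:
    """Cache one credential and replace it shortly before it expires."""

    def __init__(self, fetch: Callable[[], Credential], refresh_margin: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # hypothetical callable that asks Vault for credentials
        self._margin = refresh_margin  # seconds before expiry at which we refresh
        self._clock = clock
        self._current: Optional[Credential] = None

    def get(self) -> Credential:
        """Return the cached credential, fetching a new one if it is missing or expiring soon."""
        if self._current is None or self._current.expires_at - self._clock() <= self._margin:
            self._current = self._fetch()
        return self._current

    def invalidate(self) -> None:
        """Drop the cached credential, for example after an authentication failure."""
        self._current = None


def connect_with_retry(cache: CredentialCache, connect: Callable[[Credential], object]):
    """Try once with the cached credential, then once more with a fresh one."""
    try:
        return connect(cache.get())
    except PermissionError:  # stand-in for the driver's auth error
        cache.invalidate()
        return connect(cache.get())
```

The clock is injectable so the refresh behavior can be tested without waiting for real leases to expire.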
Many teams use a sidecar container that handles credential refresh. The sidecar communicates with Vault, stores credentials in a shared volume, and the main container reads from that volume. This separation lets you update the sidecar logic without touching your application.
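On the application side of that shared volume, the main container only needs to notice when the file changes. Here is a minimal polling sketch that assumes the sidecar writes the secret to a single file. SecretFileWatcher is a hypothetical name, and how often to call poll is left to the application.

```python
import hashlib
from pathlib import Path
from typing import Optional


class SecretFileWatcher:
    """Detect changes to a secret file written by a sidecar."""

    def __init__(self, path: Path):
        self._path = path
        self._digest: Optional[str] = None  # hash of the last value we handed out

    def poll(self) -> Optional[str]:
        """Return the secret if it changed since the last poll, otherwise None."""
        content = self._path.read_text(encoding="utf-8").strip()
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest == self._digest:
            return None
        self._digest = digest
        return content
```

When poll returns a value, the application would drain its existing connections and reconnect with the new credential. Comparing a hash of the content rather than the modification time avoids spurious reloads when the sidecar rewrites an unchanged value.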
External Secrets Operators
The External Secrets Operator (ESO) integrates external secret stores like Vault, AWS Secrets Manager, or GCP Secret Manager with Kubernetes. Instead of manually copying secrets into Kubernetes, you reference external secrets in your Kubernetes manifests.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: secret/myapp/db-password
        property: password
When you apply this manifest, the operator syncs the secret from Vault into a Kubernetes Secret. It re-reads the value on every refresh interval, so a secret rotated in Vault reaches the cluster within that window.
This gives you the best of both worlds: secrets live in your centralized secrets store with full audit logs and access control, while your applications keep using standard Kubernetes Secrets.
The External Secrets Operator supports multiple backends: Vault, AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, and more.
CI/CD and GitOps Integration
Secrets management does not stop at runtime. CI/CD pipelines need access to secrets to deploy applications. GitOps workflows need secrets to synchronize configuration.
Injecting Secrets at Deploy Time
The pattern that works well: secrets never live in git, but CI/CD pipelines retrieve them at deploy time. Your deployment process looks like this:
- CI/CD pipeline triggers on a git commit
- Pipeline authenticates to Vault or your cloud secrets manager
- Pipeline retrieves the required secrets
- Secrets are injected into the pod at deploy time (via dynamic secrets or ESO)
- Application runs with short-lived credentials
Many teams use Kubernetes service account JWTs for this. The CI/CD runner has a service account with permissions to request specific secrets from Vault.
GitOps Considerations
In a GitOps model, your desired state lives in git. The problem is reconciling the fact that secrets cannot live in git.
The solutions:
- External Secrets Operator: Reference external secrets in your git manifests. The operator syncs actual values from your secrets store.
- Sealed Secrets: Encrypt secrets before committing to git. Only the cluster can decrypt them.
- Vault Agent: Run a Vault Agent sidecar that handles authentication and secret retrieval.
The External Secrets Operator approach has become popular because it keeps git manifests clean and relies on established external secrets stores for the actual sensitive values.
When to Use / When Not to Use
| Scenario | Use This Approach | Notes |
|---|---|---|
| Production microservices with dynamic scaling | Vault Dynamic Secrets | Short-lived credentials work well with scaling |
| Small project or early-stage startup | Kubernetes Secrets | Simpler to start, but plan migration to Vault |
| CI/CD pipeline credentials | Vault or Cloud Secrets Manager | Centralized audit trail for pipeline access |
| Multi-cloud or hybrid environment | Vault | Cloud-agnostic, works across AWS/GCP/Azure |
| Database credentials for PostgreSQL/MySQL | Vault Database Secrets Engine | Dynamic credentials per service |
| TLS certificates | Vault PKI or cert-manager | cert-manager is simpler for Kubernetes-native; Vault for unified secrets |
| External third-party API keys | Vault KV with rotation | Static but with automated rotation |
| Kubernetes-native only, simple needs | Kubernetes Secrets + ESO | External Secrets Operator syncs from external stores |
| Legacy VMs outside Kubernetes | Vault Agent | Runs as a daemon on the VM; Vault works outside K8s too |
Trade-off Analysis
| Aspect | Kubernetes Secrets | HashiCorp Vault |
|---|---|---|
| Encryption at rest | Requires explicit config (etcd encryption) | Encrypted by default |
| Dynamic secrets | No native support | Database, AWS, GCP, PKI engines |
| Secret rotation | Manual or via external tools | Native rotation engines |
| Access control | RBAC only (namespace, SA) | Fine-grained policies per path |
| Audit logging | API server audit log, if configured | Comprehensive audit log |
| High availability | Built into K8s control plane | Requires HA configuration |
| Learning curve | Low | Higher |
| Cost | Included with K8s | Open source free; Enterprise for advanced features |
When Not to Use Plain Kubernetes Secrets
- Secrets containing sensitive data: base64 encoding is not encryption; anyone with cluster access can decode
- Production systems with compliance requirements: No record of who read which secret unless API server audit logging is configured
- Multi-team clusters where separation is needed: RBAC is coarse-grained
- When you need dynamic credentials: K8s Secrets are static by nature
Best Practices
Managing secrets well requires consistent discipline across your organization. These practices matter most.
Never Commit Secrets to Version Control
This should go without saying, but it keeps happening. API keys, database passwords, and private keys should never appear in git repositories. Not even in private repositories.
Use tools like git-secrets, Talisman, or pre-commit hooks to scan commits for accidentally committed secrets. Treat any secret that appears in version control as compromised. Rotate it immediately.
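To show the idea behind such scanners, here is a deliberately small sketch that flags two well-known patterns: AWS access key IDs (which start with AKIA followed by 16 uppercase letters or digits) and PEM private key headers. The pattern set and the scan_text helper are illustrative only. This is not how git-secrets or Talisman is implemented, and real tools cover far more cases.

```python
import re

# Illustrative patterns only; real scanners ship much larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

A pre-commit hook would run something like this over the staged diff and exit non-zero when findings is not empty, which blocks the commit before the secret reaches history.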
A .gitignore file should exclude any file that might contain secrets. Be particularly careful with .env files, config maps, and any file with “secret” or “credential” in the name.
Least Privilege Access
Services should only have access to the secrets they need. A logging service does not need database credentials. An API service does not need access to your payment processing secrets.
In Vault, define policies that restrict which paths each service can read. In Kubernetes, use RBAC to limit who can read Secrets in each namespace.
Audit access regularly. Who is actually reading secrets in production? Are there services that request secrets they no longer use? Periodic reviews catch accumulation of unnecessary access.
Audit Everything
You need to know who accessed what, when, and from where. Vault provides comprehensive audit logging including:
- Every authentication attempt (success and failure)
- Every secret read
- Every secret created or updated
- Client identity and source IP for each request
Cloud providers offer similar logging for their secrets managers. Enable those logs and ship them to your SIEM or log aggregation system.
Audit logs serve multiple purposes: detecting unauthorized access, troubleshooting operational issues, and meeting compliance requirements.
Use Short-Lived Credentials
Dynamic secrets with short lifespans reduce risk significantly. If a credential expires in 15 minutes, an attacker has a narrow window to use it. Static credentials that last a year give attackers plenty of time.
Where dynamic secrets are not possible, use the shortest acceptable lifetime for static secrets. Review password policies and certificate expiration periods. Do you really need that API key to last forever?
Production Failure Scenarios
Failure Scenarios and Mitigations
Scenario: Vault Unavailable
Symptoms: Services cannot retrieve secrets. Applications fail to start or start with stale cached credentials. Dynamic credentials stop working.
Diagnosis:
# Check Vault pods
kubectl get pods -n vault -l app=vault
# Check Vault status
kubectl exec -n vault vault-0 -- vault status
# Check for seal status
kubectl exec -n vault vault-0 -- vault status | grep Sealed
# Test Vault API
kubectl exec -n vault vault-0 -- curl -s http://localhost:8200/v1/sys/health
Mitigation:
- If Vault is sealed, unseal it using stored unseal keys (auto-unseal is preferred for production)
- If Vault pods are down, check resource constraints (OOMKilled) or scheduling issues
- Services with cached secrets may continue working if cache is still valid
- If Vault is overloaded, add performance standby nodes to serve reads (a Vault Enterprise feature); in open source Vault, standby nodes forward requests to the active node
Prevention:
- Use auto-unseal with cloud KMS (AWS KMS, GCP Cloud KMS, Azure Key Vault)
- Run Vault in HA mode with multiple pods
- Configure pod anti-affinity for Vault servers
- Monitor Vault health and set alerts for sealed/unavailable status
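The sys/health endpoint used in the diagnosis step reports Vault's state through its HTTP status code, which makes it easy to wire into a monitoring probe. The mapping below follows the default codes described in Vault's documentation for that endpoint. The needs_page policy is an assumption for illustration, not a recommendation from Vault.

```python
# Default status codes returned by GET /v1/sys/health.
VAULT_HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Translate a sys/health status code into a readable state."""
    return VAULT_HEALTH_STATES.get(status_code, f"unexpected status {status_code}")


def needs_page(status_code: int) -> bool:
    """Alert on anything other than an active or standby node (a policy assumption)."""
    return status_code not in (200, 429, 473)


for code in (200, 429, 503):
    print(code, describe_health(code), "page" if needs_page(code) else "ok")
```

Note that a healthy standby answers with 429, so a probe that treats every non-200 response as a failure will flag standbys as down.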
Scenario: Kubernetes Service Account Token Not Validating Against Vault
Symptoms: Services cannot authenticate to Vault using Kubernetes auth method. Logs show “permission denied” or “invalid token” errors.
Diagnosis:
# Check if service account token is mounted
kubectl exec -it <pod> -- cat /var/run/secrets/kubernetes.io/serviceaccount/token
# Check Vault auth method status
kubectl exec -n vault vault-0 -- vault auth list
# Check the role exists
kubectl exec -n vault vault-0 -- vault read auth/kubernetes/role/<role-name>
# Test token validation locally
kubectl exec -n vault vault-0 -- vault write auth/kubernetes/login \
role=<role-name> jwt=<token-from-pod>
Mitigation:
- Verify the service account exists and has the correct name
- Check if the Vault role is configured with correct service account name and namespace
- Verify the cluster’s CA cert is configured in Vault’s Kubernetes auth method
- If token was regenerated, the old token is invalid; restart the pod to get a fresh token
Prevention:
- Read the projected service account token from disk at each login rather than caching it, since the kubelet refreshes it periodically
- Monitor Vault auth method configuration for unexpected changes
- Test Kubernetes auth method monthly
Scenario: ESO Sync Failing
Symptoms: ExternalSecret resource exists but Kubernetes Secret is not created. ESO operator logs show errors.
Diagnosis:
# Check ESO operator logs
kubectl logs -n external-secrets deployment/external-secrets-operator
# Check ExternalSecret status
kubectl get externalsecret -n <namespace> -o yaml
# Check secret store connectivity
kubectl get secretstore -n <namespace>
# Verify ESO can reach Vault
kubectl exec -n external-secrets deployment/external-secrets-operator -- curl -s https://vault.example.com/v1/sys/health
Mitigation:
- Verify SecretStore CRD is correctly configured with Vault address and auth method
- Check ESO operator has ClusterRole permissions to read ExternalSecret and SecretStore resources
- If ESO pod restarted recently, wait for reconciliation
- Delete and recreate ExternalSecret to force re-sync
Prevention:
- Monitor ExternalSecret status with Prometheus metrics
- Set alerts for ESO reconciliation errors
- Use ESO with Vault’s Kubernetes auth method for automatic credential management
Scenario: Database Credentials Not Rotating
Symptoms: Vault shows one set of credentials issued but database shows different user active. Dynamic credentials not being renewed.
Diagnosis:
# Issue a fresh credential to confirm the role works (this creates a new lease)
vault read database/creds/myapp-role
# Check current database users
# For PostgreSQL: SELECT * FROM pg_user;
# List active leases for the role (requires list access to sys/leases/lookup)
vault list sys/leases/lookup/database/creds/myapp-role
# Check Vault logs for renewal errors
kubectl logs -n vault vault-0 | grep renew
Mitigation:
- If lease expired, Vault automatically revokes credentials; new credentials will be issued on next request
- Services must request new credentials when lease expires; check service implementation
- If using ESO, verify the refresh interval is configured correctly
- Database might have reached max connections due to accumulated stale credentials
Prevention:
- Implement credential refresh logic in services before lease expires
- Monitor lease expiration metrics
- Set database max connections with auto-cleanup for abandoned connections
- Test renewal process in staging
Observability Hooks
Metrics to Capture
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| vault_secret_access_total | Secret read rate by path | Unexpected paths accessed |
| vault_lease_renewal_success_total | Dynamic credential renewal success | <99.5% |
| vault_lease_expiration_seconds | Time until dynamic credentials expire | <300s warning, <60s critical |
| vault_auth_method_failure_total | Auth failures by method (k8s, AppRole) | >1% failure rate |
| eso_sync_success_total | External Secrets Operator sync success | <99% |
| eso_sync_error_total | ESO sync errors by type | Any increase |
Logs to Collect
From Vault (structured logging):
{
  "event": "secret_accessed",
  "client_namespace": "default",
  "client_service_account": "payment-service",
  "secret_path": "secret/myapp/db-password",
  "access_result": "allowed|denied",
  "remote_addr": "10.0.0.5",
  "timestamp": "2026-03-24T10:30:00Z"
}
{
  "event": "lease_renewed",
  "lease_id": "database/creds/myapp-role/xyz123",
  "ttl_seconds": 3600,
  "renewed_by": "spire-agent",
  "timestamp": "2026-03-24T10:00:00Z"
}
{
  "event": "auth_failure",
  "auth_method": "kubernetes",
  "client_namespace": "default",
  "client_service_account": "unknown-sa",
  "failure_reason": "invalid_token|role_not_found|token_expired",
  "timestamp": "2026-03-24T10:30:00Z"
}
Key log fields: client identity (namespace, SA), secret path, access result, failure reason, source IP.
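Once logs with these fields reach your aggregation system, even a simple tally is useful. The sketch below counts auth failures per service account and reason. It assumes one JSON object per line and the field names shown in the examples above, and count_auth_failures is a hypothetical helper rather than part of any Vault tooling.

```python
import json
from collections import Counter


def count_auth_failures(log_lines) -> Counter:
    """Tally auth_failure events by (service account, failure reason)."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if record.get("event") == "auth_failure":
            key = (record.get("client_service_account"), record.get("failure_reason"))
            counts[key] += 1
    return counts
```

A sudden spike for a single service account usually points at a misconfigured role, while failures spread across many accounts suggest a problem on the Vault or Kubernetes side.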
Traces to Capture
Enable Vault telemetry with Prometheus metrics. Key metrics to trace:
- vault.secret.gen.success: Dynamic secret generation success
- vault.expire.lease.expiration: Leases approaching expiration
- vault.auth.kubernetes.fail: Kubernetes auth failures
Dashboards to Build
- Vault Health: Active leases, auth success rate, storage utilization, seal status
- Secret Access Patterns: Read rate by path, top accessed secrets, denied access attempts
- Dynamic Credential Lifecycle: Active credentials by type, expiration heatmap, renewal success rate
- ESO Operations: Sync success rate, error breakdown, reconciliation latency
Alerting Rules
# Vault unavailable
- alert: VaultUnavailable
  expr: up{job="vault"} == 0
  labels:
    severity: critical
  annotations:
    summary: "Vault is unavailable"
# Vault sealed
- alert: VaultSealed
  expr: vault_is_sealed == 1
  labels:
    severity: critical
  annotations:
    summary: "Vault is sealed"
# Auth failures (rate() yields failures per second, not a percentage)
- alert: VaultAuthFailureRate
  expr: rate(vault_auth_method_failure_total[5m]) > 0.01
  labels:
    severity: warning
  annotations:
    summary: "Vault auth failures above 0.01 per second"
# Lease expiring soon
- alert: DynamicCredentialExpiring
  expr: vault_lease_expiration_seconds < 300
  labels:
    severity: warning
  annotations:
    summary: "Dynamic credential expiring in {{ $value }} seconds"
# ESO sync failures
- alert: ESOSyncFailure
  expr: rate(eso_sync_error_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "External Secrets Operator sync failures detected"
Interview Questions
Treat this as a critical security incident. First, revoke the key immediately using gcloud iam service-accounts keys delete (for GCP) or the equivalent for AWS/Azure. Do not wait: the key is compromised the moment it is public. Second, check Cloud Audit Logs (or CloudTrail on AWS) for all usage of that key since the commit timestamp to determine whether it was actually exploited. Third, revoke all other existing keys for that service account and create a new one. Fourth, move any CI/CD pipelines that used the old key to Workload Identity instead. Finally, run a retrospective: why was the key committed? Was a pre-commit hook missing, was .gitignore misconfigured, or was it an IDE accident?
ESO syncs secrets from external secret stores (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) into Kubernetes Secrets objects. Your application reads Kubernetes Secrets as env vars or volume mounts — no application code changes needed. Vault's Kubernetes auth method lets pods authenticate to Vault directly using their Kubernetes service account and retrieve secrets at runtime, without persisting secrets into Kubernetes Secrets. ESO is simpler for existing applications that expect Kubernetes Secrets. Vault's direct auth is better for security (secrets never exist as Kubernetes objects) and for short-lived workloads where dynamic secret generation per pod is needed. Both are better than storing static credentials in Kubernetes Secrets.
Use a dual-update approach: update the secret in Vault or your secrets manager, then update the Kubernetes secret using ESO or kubectl patch. The application needs to support credential rotation without restart — implement graceful reload: when the application detects the secret has changed (via a Kubernetes watch on the secret, a SIGUSR1 signal, or a periodic file check), it closes existing connections and reconnects with the new credentials. For applications that do not support dynamic reload, use connection pooling via a proxy like PgBouncer where the proxy holds the database credentials and the application connects through the proxy. This way you rotate the database password at the proxy level without touching application configs.
Kubernetes Secrets are base64-encoded by default, not encrypted — anyone with API access or etcd access can read them. They are also not auditable, have no secret versioning, no dynamic secret generation, and no automatic rotation built in. Kubernetes Secrets are fine for non-sensitive, low-risk credentials like a staging environment's public API key. For production with real credentials, service account keys, database passwords, and TLS certificates, a dedicated secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager) provides encryption at rest, audit logging, access policies, secret versioning, automatic rotation, and short-lived credentials. The right answer depends on the sensitivity of what you are protecting.
Static secrets are long-lived credentials that persist for weeks, months, or years — database passwords, API keys, static TLS certificates. They require manual rotation and if compromised give attackers permanent access. Dynamic secrets are generated on-demand with short lifespans (minutes to hours) — Vault database credentials, temporary IAM roles, short-lived certificates. Dynamic secrets eliminate the rotation problem since credentials constantly refresh, and even if intercepted they expire before meaningful damage. Use static secrets for things that genuinely need to persist (like an external API key you cannot control rotation for). Use dynamic secrets for anything internal where you control the system — database access, cloud credentials, service-to-service authentication.
The flow: (1) Pod starts with a Kubernetes service account token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. (2) Pod's application calls Vault's Kubernetes auth endpoint with its service account token and the role name it wants. (3) Vault's Kubernetes auth method validates the token against the Kubernetes TokenReview API — verifying the token is valid, not expired, and bound to the correct service account and namespace. (4) Vault returns a Vault token with policies scoped to the requested role. (5) Pod uses the Vault token to read secrets from specific paths. (6) When the pod terminates, its Kubernetes token is automatically revoked, and any Vault leases it held are eventually expired. This ties identity at the Kubernetes layer directly to Vault access, with no static credentials to manage.
With shared static credentials (e.g., one database password used by 50 microservices), compromise of any single service exposes the credential that every other service also uses, giving the attacker direct access to the shared resource (the database). The blast radius equals all services using that credential. Dynamic secrets solve this by giving each service its own unique credentials. If one service is compromised, only those specific credentials are exposed: the attacker can act with that one service's database privileges, and only until the lease expires or is revoked. Vault's database secret engine creates per-service database usernames like v-token-myapp-role-xyz123, so even if one service is compromised, the attacker gets a short-lived credential for only that one service account, not the master database password.
Phased approach: (1) Deploy Vault and configure the Kubernetes auth method alongside existing Kubernetes Secrets — do not change anything yet. (2) Deploy External Secrets Operator as a bridge — ESO syncs Vault secrets into Kubernetes Secrets, so applications keep using K8s Secrets without code changes. (3) Migrate one team or one non-critical service at a time to read directly from Vault via the Kubernetes auth method. (4) Once confident, deprecate ESO syncing for migrated services and remove static credentials from Kubernetes Secrets. (5) Finally, audit that no static credentials remain in etcd or git. Key principle: do not try to migrate everything at once — maintain backward compatibility during the transition.
SPIFFE (Secure Production Identity Framework for Everyone) defines a standard for workload identity: cryptographic identity for services in dynamic environments. SPIFFE IDs are URIs like spiffe://example.com/ns/payment/sa/payment-service that uniquely identify a workload. These IDs are embedded in X.509 certificates or JWTs, allowing services to prove their identity without shared secrets. SPIFFE shifts security from "protect the secrets" to "verify the identity": instead of obsessing about database passwords being leaked, you verify the service presenting credentials is actually the payment service. Service meshes like Istio and Linkerd automatically issue SPIFFE certificates. Vault can authenticate SPIFFE workloads through its TLS certificate and JWT/OIDC auth methods, so services present their X.509 or JWT SVIDs and receive Vault secrets scoped to their identity. This enables passwordless authentication between services.
Common causes: (1) SecretStore misconfiguration — wrong Vault address, expired Kubernetes auth token, or incorrect role mapping. (2) ESO operator lacks ClusterRole permissions to read ExternalSecret and SecretStore CRDs. (3) Vault lease expired and ESO cannot renew — ESO needs its own Vault token with appropriate policies. (4) Network policy blocking ESO pod from reaching Vault. Diagnosis steps: check ESO operator logs (kubectl logs -n external-secrets deployment/external-secrets-operator), verify ExternalSecret status (kubectl get externalsecret -n namespace -o yaml), check SecretStore connectivity (kubectl get secretstore -n namespace), and test Vault reachability from ESO pod (kubectl exec -n external-secrets deployment/external-secrets-operator -- curl -s https://vault.example.com/v1/sys/health). Prevention: use ESO with Vault's Kubernetes auth method so ESO gets automatic credential rotation, and monitor ExternalSecret status with Prometheus metrics.
The key is to decouple the rotation cycle from the connection lifecycle. Implement a grace period approach: rotate the secret in Vault but keep the old credential valid for a transition window (e.g., 24-48 hours). Your service gets a new credential on next restart or next credential check, while existing connections continue using the old credential until they naturally terminate. For services that run long-lived connections, use a connection proxy (like PgBouncer for PostgreSQL) — the proxy holds the database credentials and your service connects through it. Rotate the password at the proxy level without touching application configs. Another pattern: implement a credential refresh signal using a SIGHUP or a periodic file check, where the service detects the secret changed, gracefully drains existing connections, and reconnects with new credentials. The worst approach is forcing credential rotation on a running service without graceful reload — you'll get connection failures and downtime.
Neither Kubernetes Secrets nor environment variables are a good place for production secrets on their own. Kubernetes Secrets are base64-encoded, not encrypted by default, and anyone with sufficient API access or etcd access can decode them. Env vars are worse because they leak more easily: they can end up in logs and crash dumps, are readable through ps e or /proc/<pid>/environ by anyone with access to the process, are inherited by child processes, and can be accidentally printed in error messages. A container running as non-root can still read /proc/self/environ to get its own env vars. The only advantage of env vars is simplicity, and that's not a security advantage. For production: enable encryption at rest for Secrets in etcd (ideally backed by a KMS provider), use RBAC so only authorized service accounts can read secrets, and prefer direct Vault integration (pods authenticate to Vault and receive secrets at runtime) over persisting secrets in Kubernetes Secrets at all.
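To see why base64 offers no protection, consider this small sketch. It decodes the data field of a Secret the way anyone with read access could; the sample value and the function name are made up for illustration.

```python
import base64
from typing import Dict


def decode_secret_data(data: Dict[str, str]) -> Dict[str, str]:
    """Decode the base64 values found in a Kubernetes Secret's data field."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# Simulate what the API server stores for a Secret (hypothetical value).
stored = {"password": base64.b64encode(b"s3cr3t-password").decode("ascii")}
print(decode_secret_data(stored)["password"])
```

No key is involved at any point, which is the difference between encoding and encryption.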
KV v1 is a simple key-value store: write a secret, read it back, delete it. No versioning, no metadata. KV v2 adds versioning, metadata, and check-and-set (CAS) operations for atomic updates. The practical difference: with KV v1, if two clients update a secret at the same time, one of those updates is silently lost. With KV v2, you can pass the cas parameter with the version you last read; if the version has changed since then, the write fails. KV v2 also lets you recover from accidental deletes (old versions are preserved until they are destroyed or exceed the configured max_versions) and attach custom metadata to secrets. The other key difference is paths: the KV v2 API inserts data/ after the mount path, so a secret at secret/myapp/db-creds is read from secret/data/myapp/db-creds. The vault kv CLI adds this segment for you, but policies must use the full API path. Most production deployments use KV v2. If you are still on KV v1, migrate by re-creating secrets under a KV v2 mount or by upgrading the mount in place with the vault kv enable-versioning command.
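The check-and-set behaviour is easier to reason about with a toy model. The sketch below is an in-memory simulation, not a Vault client; VersionedSecret, CasMismatch, and kv2_data_path are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional


class CasMismatch(Exception):
    """Raised when the expected version does not match the current one."""


class VersionedSecret:
    """In-memory model of KV v2 style versioning with check-and-set."""

    def __init__(self) -> None:
        self.versions: List[Dict[str, str]] = []

    @property
    def current_version(self) -> int:
        return len(self.versions)  # 0 means the secret does not exist yet

    def write(self, data: Dict[str, str], cas: Optional[int] = None) -> int:
        # With cas set, the write succeeds only if nobody wrote in between.
        if cas is not None and cas != self.current_version:
            raise CasMismatch(f"expected version {cas}, found {self.current_version}")
        self.versions.append(dict(data))
        return self.current_version


def kv2_data_path(mount: str, path: str) -> str:
    """Build the API path for reading or writing a KV v2 secret."""
    return f"{mount.strip('/')}/data/{path.strip('/')}"
```

Two writers that both read version 1 and both write with cas=1 cannot both succeed: the second one gets CasMismatch and has to re-read before retrying.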
Vault attaches a lease to every dynamic secret and a TTL to every service token. A database credential might have a 1-hour lease, a Vault token might have a 24-hour TTL. When a lease expires, Vault revokes the secret: the database credential is removed or invalidated, and an expired token stops working. Your application is responsible for renewing leases before they expire (vault lease renew for secrets, vault token renew for tokens) or for requesting new credentials once the old ones are revoked. Renewal only extends a lease up to its maximum TTL, after which the application must request a fresh credential. This is the key difference from static secrets: if you put a static password in a KV store and never touch it, it stays valid forever. With leases, Vault enforces expiration even if you forget about a secret. This is powerful for security, but it means your application needs a renewal loop, or Vault Agent to run one on its behalf. If you fail to renew and the lease expires, your service loses access to the credential. Vault reports lease activity through telemetry (for example, the vault.expire.lease_expiration metric counts expirations), so you can monitor and alert on it.
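The timing decision inside a renewal loop can be sketched on its own. This is a minimal illustration assuming the application records when a lease was issued and its duration in seconds; the function names and the two-thirds default are assumptions, not Vault behaviour.

```python
def renewal_deadline(issued_at: float, lease_duration: float,
                     renew_fraction: float = 2 / 3) -> float:
    """Return the timestamp at which the lease should be renewed."""
    if not 0 < renew_fraction < 1:
        raise ValueError("renew_fraction must be between 0 and 1")
    return issued_at + lease_duration * renew_fraction


def should_renew(issued_at: float, lease_duration: float, now: float,
                 renew_fraction: float = 2 / 3) -> bool:
    """True once enough of the lease has elapsed that renewal is due."""
    return now >= renewal_deadline(issued_at, lease_duration, renew_fraction)
```

Renewing well before expiry leaves room for retries if Vault is briefly unreachable. A real loop would call the renew operation when should_renew returns True and fall back to requesting a new credential if renewal is refused.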
Least privilege means each service gets exactly the permissions it needs and nothing more. A payment service reading database credentials should not also be able to read the credentials for the logging service or the email service. In Vault, policies are attached to tokens and define which paths can be read, created, updated, deleted, or listed. A good policy structure: create a policy per service or per team and grant read access only to the specific secret paths that service needs. Vault denies anything a policy does not explicitly grant, so everything else is off limits by default. Example: payment-service-policy might allow read on secret/data/myapp/db-creds but deny secret/data/myapp/admin-creds (the explicit deny only matters if a broader rule would otherwise grant access). Test policies before applying them in production: use vault policy read to inspect a policy and vault token capabilities to verify what a token can actually do on a given path. Review policies quarterly, because permissions that accumulate over time are how services end up with more access than they need.
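The default-deny idea can be modelled in a few lines. The sketch below is a deliberately simplified stand-in for Vault's policy evaluation: it supports only exact paths and a trailing * wildcard, and the policy contents and function names are hypothetical.

```python
from typing import Dict, Set


def capabilities_for(policy: Dict[str, Set[str]], path: str) -> Set[str]:
    """Return the capabilities of the most specific rule matching the path."""
    if path in policy:
        return policy[path]
    best_pattern = ""
    for pattern in policy:
        # Only a trailing '*' wildcard is modelled; the longest prefix wins.
        if pattern.endswith("*") and path.startswith(pattern[:-1]):
            if len(pattern) > len(best_pattern):
                best_pattern = pattern
    return policy.get(best_pattern, set())


def is_allowed(policy: Dict[str, Set[str]], path: str, capability: str) -> bool:
    """Default deny: allow only if the capability is granted and not denied."""
    caps = capabilities_for(policy, path)
    return "deny" not in caps and capability in caps


payment_policy = {
    "secret/data/myapp/db-creds": {"read"},
    "secret/data/myapp/admin-creds": {"deny"},
}
print(is_allowed(payment_policy, "secret/data/myapp/db-creds", "read"))
print(is_allowed(payment_policy, "secret/data/logging/api-key", "read"))
```

The second call is refused without any rule mentioning the logging path, which is what default deny means in practice.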
Handling secrets in a GitOps workflow runs into a fundamental problem: your desired state lives in git, but secrets cannot live in git. Three practical solutions, in order of preference: (1) External Secrets Operator: reference external secrets in your git manifests (kind: ExternalSecret), and ESO syncs the actual values from your secrets store at runtime. Your git repo has the reference, not the secret. (2) Sealed Secrets: encrypt secrets using a public key whose private key only the cluster holds, and commit the encrypted blobs to git. The cluster's sealed-secrets controller decrypts them. (3) Vault Agent Sidecar: run a Vault Agent in each pod that handles authentication to Vault and writes secrets to a shared volume. Your git manifests reference paths in Vault, not actual values. The Vault Agent sidecar approach is the most secure but also the most complex to set up. ESO is the most practical for most teams: it works with many external secrets stores (Vault, AWS, GCP, Azure) and keeps git manifests clean.
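To show what "the reference, not the secret" looks like, the sketch below builds an ExternalSecret manifest as a Python dictionary and prints it as JSON, which kubectl also accepts. The resource names and the Vault key are hypothetical, and the apiVersion is an assumption that depends on the ESO version installed in your cluster.

```python
import json
from typing import Any, Dict


def external_secret_manifest(name: str, namespace: str, store_name: str,
                             vault_key: str, property_name: str) -> Dict[str, Any]:
    """Build an ExternalSecret that points at a Vault key without containing its value."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",  # assumption: varies by ESO version
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"name": store_name, "kind": "SecretStore"},
            "target": {"name": name, "creationPolicy": "Owner"},
            "data": [
                {
                    "secretKey": property_name,
                    "remoteRef": {"key": vault_key, "property": property_name},
                }
            ],
        },
    }


manifest = external_secret_manifest(
    "payment-db", "payment", "vault-backend", "myapp/db-creds", "password"
)
print(json.dumps(manifest, indent=2))
```

Everything in this manifest is safe to commit: it names where the password lives but never carries the password.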
When a pod is deleted, its containers' environment variables and memory are gone, and mounted Secret volumes (which are backed by tmpfs) are removed with the pod. However, Kubernetes Secret objects themselves persist in etcd until explicitly deleted. If a Kubernetes Secret was created via ESO sync, that Secret remains in the cluster after the pod is gone and can be mounted by any other pod in the namespace that references it. The cleanup strategy depends on how the secret was created: (1) ESO-managed secrets: use creationPolicy: Owner, which sets an owner reference so the Kubernetes garbage collector deletes the Secret when the ExternalSecret resource is deleted. (2) Manually created secrets: you must delete them explicitly. (3) Vault dynamic credentials: the lease is tied to the Vault token that requested the credential, not to the pod itself. Deleting the pod does not revoke anything on its own; the credential stops working when its lease expires or when the token is revoked, since revoking a token also revokes the leases created with it. Keep TTLs short, or revoke the token on shutdown, so credentials do not outlive the pod for long. On revocation Vault removes or disables the credential in the database, so a copy that leaked from the pod is useless.
Vault and ESO expose Prometheus metrics, so enable telemetry and ship them to your monitoring system. The metric and alert names in this paragraph are illustrative labels for what to track; check the names your Vault and ESO versions actually export (for example, Vault's Prometheus output includes vault_core_unsealed and vault_expire_num_leases, and ESO exports externalsecret_sync_calls_total and externalsecret_sync_calls_error). Key signals: vault_secret_access_total (who is reading what), vault_lease_renewal_success_total (are dynamic credentials being renewed), vault_lease_expiration_seconds (leases approaching expiration), vault_auth_method_failure_total (auth failures by method), eso_sync_success_total and eso_sync_error_total (ESO sync health). Alerts you must have: vault_sealed (critical, because Vault cannot serve any secrets), vault_unavailable (critical), dynamic_credential_expiring_soon (warning when a lease expires within 5 minutes, critical within 1 minute), auth_failure_rate_above_1_percent (possible attack or misconfiguration), eso_sync_errors_increasing (ESO not keeping secrets up to date). Dashboards to build: Vault health (seal status, active leases, storage), secret access patterns (top accessed secrets, denied access attempts), dynamic credential lifecycle (expiration heatmap, renewal success rate), ESO operations (sync success rate, error breakdown). Send Vault audit logs to your SIEM: each secret read, auth success or failure, and lease renewal should be logged with client identity, source IP, and timestamp.
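The lease expiry thresholds above translate directly into a small classification rule. This is a sketch with a hypothetical function name; in practice the same logic would usually live in an alerting rule instead of application code.

```python
def lease_alert_level(seconds_remaining: float) -> str:
    """Classify a lease by time left: critical within 1 minute, warning within 5."""
    if seconds_remaining <= 60:
        return "critical"
    if seconds_remaining <= 300:
        return "warning"
    return "ok"


for remaining in (1800, 240, 30):
    print(remaining, "->", lease_alert_level(remaining))
```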
Vault Kubernetes auth: pods authenticate directly to Vault using their Kubernetes service account token. Vault validates the token with Kubernetes' TokenReview API, then returns a Vault token scoped to a specific role. The pod reads secrets directly from Vault using that token. Secrets never persist as Kubernetes objects. Best for: short-lived workloads, services that need dynamic secrets (database credentials per pod), security-critical applications where you want secrets ephemeral. ESO with Vault KV: ESO syncs secrets from Vault's KV store into Kubernetes Secrets objects on a schedule. Your application reads Kubernetes Secrets as env vars or volume mounts. Best for: existing applications that expect Kubernetes Secrets, gradual migrations from Kubernetes Secrets to Vault, applications you cannot modify to add Vault SDK. The trade-offs: Kubernetes auth is more secure (secrets never hit etcd, no persistent Kubernetes Secret objects) but requires application code changes or a sidecar. ESO is easier to adopt (no code changes) but secrets persist in Kubernetes Secrets and etcd.
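The Kubernetes auth flow described above comes down to one HTTP call. The sketch below builds that login request with the standard library and, in a separate function, sends it. The Vault address, role name, and function names are assumptions for illustration; the mount path defaults to kubernetes but may differ in your setup.

```python
import json
import urllib.request

# Default location of the projected service account token inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str,
                        mount: str = "kubernetes") -> urllib.request.Request:
    """Build the POST request for Vault's Kubernetes auth login endpoint."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(vault_addr: str, role: str, token_path: str = SA_TOKEN_PATH) -> str:
    """Exchange the pod's service account token for a Vault token."""
    with open(token_path, "r", encoding="utf-8") as fh:
        jwt = fh.read().strip()
    request = build_login_request(vault_addr, role, jwt)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["auth"]["client_token"]
```

Inside a pod, login("https://vault.example.com", "payment-service") would return a Vault token scoped to that role. Most teams let Vault Agent or a client library do this, but the underlying exchange is no more than what is shown here.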
Shared service accounts break the security model completely. If all microservices use the same service account, then Vault policies cannot differentiate between them: the payment-service and the logging-service get Vault tokens with the same permissions. Compromise one service and you get access to everything. Beyond Vault: with a shared SA, a breach of any single service gives an attacker the ability to impersonate every other service in the cluster. Kubernetes RBAC cannot isolate workloads. Audit logs cannot distinguish which service accessed what. Service mesh mTLS still encrypts traffic, but identity-based authorization stops working because all services present the same identity. The blast radius of any single vulnerability becomes the entire system. The correct model: one service account per service (or per logical group of related services). This is how you get fine-grained security: the payment-service Vault role can only read the payment-service secrets, and the logging-service Vault role can only read the logging-service secrets. If the logging service is compromised, the attacker gets nothing beyond logging. This is not over-engineering; it is the foundation of zero-trust architecture in microservices.
Further Reading
- HashiCorp Vault Documentation — Official Vault docs covering all secret engines, auth methods, and operational guides
- Kubernetes Secrets Documentation — K8s native secrets, encryption configuration, and security best practices
- External Secrets Operator — ESO project documentation for syncing secrets from external stores into Kubernetes
- SPIFFE Project — Official SPIFFE specification and implementation guides for workload identity
- OWASP Secrets Management Cheat Sheet — Industry guidance on secrets management pitfalls and controls
- Vault High Availability Guide — Deploying Vault in HA mode for production resilience
- Kubernetes RBAC Documentation — Fine-grained access control for Kubernetes resources including Secrets
- Cloud KMS Integration with Vault — Auto-unseal using cloud key management services
Conclusion
Secrets management connects to several other patterns worth understanding.
mTLS and Service Mesh handles encryption and authentication of service-to-service communication. Service meshes often leverage workload identity to issue short-lived certificates automatically.
Kubernetes provides the container orchestration layer where many secrets management solutions run. Understanding Kubernetes RBAC and service accounts is foundational.
GitOps changes how you think about configuration management. Secrets become part of the reconciliation loop without living in git.
Service Identity and SPIFFE provides the cryptographic identity layer that enables passwordless authentication and short-lived credentials.
Secrets management in microservices is a solved problem, but it requires intentional design. Start with Kubernetes Secrets for simple use cases, but plan to evolve toward a dedicated secrets manager as your system grows.
HashiCorp Vault has become the de facto standard for secrets management in Kubernetes environments. Its dynamic secrets, fine-grained policies, and comprehensive audit logging address the gaps in native Kubernetes Secrets.
The shift from static shared credentials to short-lived dynamic credentials improves your security posture. Even if an attacker obtains a credential, it expires quickly, which narrows the window in which it can be used.
Invest the time to implement proper secrets management early. Retrofitting it into an existing system with dozens of services is painful. Building it in from the start is straightforward and pays dividends in reduced risk and easier compliance audits.