Kubernetes: Container Orchestration for Microservices
Learn Kubernetes fundamentals: pods, services, deployments, ingress controllers, Helm charts, autoscaling, and microservices architecture patterns.
Kubernetes handles the hard stuff: scheduling, scaling, networking, failure recovery. You focus on what runs, Kubernetes handles where and when.
This post covers the core concepts: pods, services, deployments, ingress, Helm, and autoscaling. By the end, you will have a mental model for how these pieces fit together.
Kubernetes (K8s) is an open-source container orchestration platform. It manages where containers run across a cluster of machines, keeps them healthy, scales them based on load, and handles networking between them.
You define the desired state in configuration files. Kubernetes continuously works to make reality match your desired state. If a container crashes, Kubernetes restarts it. If a node fails, Kubernetes reschedules its containers elsewhere.
graph TD
subgraph Cluster
subgraph Node1
P1[Pod] --> P2[Pod]
end
subgraph Node2
P3[Pod] --> P4[Pod]
end
subgraph Node3
P5[Pod]
end
end
K8s[Kubernetes Control Plane] --> Node1
K8s --> Node2
K8s --> Node3
The control plane makes scheduling decisions, manages node membership, and exposes the API you interact with. Nodes run the actual workloads.
Core Concepts
Pods
A pod is the smallest thing you can deploy in Kubernetes. It represents one running process — one instance of something. A pod may contain one or more containers that share network and storage.
Most of the time, you run one container per pod. The sidecar pattern puts helper containers in the same pod for logging, proxying, or synchronization — useful but not the common case.
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
    - name: api
      image: my-api:v1
      ports:
        - containerPort: 8080
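The sidecar pattern mentioned above can be sketched as a two-container pod. This is a minimal illustration rather than a recommended logging setup: the `log-tailer` container, the `busybox:1.36` image, the `/var/log/app` path, and the `app.log` file name are all assumptions, and it presumes the `my-api:v1` image writes its log to that file.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server-with-sidecar
  labels:
    app: api-server
spec:
  volumes:
    # Scratch volume shared by both containers; it is deleted with the pod.
    - name: app-logs
      emptyDir: {}
  containers:
    - name: api
      image: my-api:v1
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    # Sidecar (assumed): streams the application's log file to stdout.
    - name: log-tailer
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /var/log/app/app.log"]
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
```

Both containers also share the pod's network namespace, so the sidecar could reach the API on localhost:8080 if it needed to.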
Services
A service gives pods a stable network address. Pod IPs change when they restart — services abstract that away so your code does not need to track changing IPs.
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
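The four service types:
- ClusterIP: Internal cluster IP (default). Only reachable from within the cluster.
- NodePort: Exposes the service on each node’s IP at a static port.
- LoadBalancer: Creates an external load balancer (in cloud environments).
- ExternalName: Maps the service to an external DNS name by returning a CNAME record. No proxying is involved.
A headless service is not a separate type. It is a ClusterIP service with clusterIP set to None, so it gets no cluster IP and DNS returns the pod IPs directly.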
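A headless service is declared by setting `clusterIP: None` on an otherwise ordinary Service. A minimal sketch, where the name `api-headless` is hypothetical and the selector reuses the `app: api-server` label from this post:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-headless     # hypothetical name
spec:
  clusterIP: None        # this is what makes the service headless
  selector:
    app: api-server
  ports:
    - port: 8080
      targetPort: 8080
```

A DNS lookup of `api-headless` from inside the namespace then returns one A record per ready pod instead of a single virtual IP.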
Deployments
A deployment manages replicas of a pod. It handles rolling updates and rollbacks. You declare how many replicas you want, and Kubernetes maintains that count.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: my-api:v1
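If you update the image version, the deployment rolls out the change gradually: it starts pods from the new version and retires old ones in small batches (by default, at most 25% of the desired replica count above or below target at any time) to maintain availability.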
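The pace of a rollout is tunable through the Deployment's `strategy` field. The sketch below is one conservative setting, not a universal recommendation; the `v2` tag is a hypothetical next version of the `my-api` image used in this post.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: my-api:v2   # changing the pod template triggers the rollout
```

With these values Kubernetes starts one new pod, waits for it to become ready, removes one old pod, and repeats until all three replicas run the new image.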
ReplicaSets
ReplicaSets keep the right number of pods running. Deployments manage ReplicaSets — you usually work with Deployments, not directly with ReplicaSets.
Networking
Kubernetes networking has a few important rules:
- Every pod gets a unique IP across the cluster
- Containers within a pod share that IP
- Pods can communicate with all other pods without NAT
- Services get a stable virtual IP that load-balances to pods
graph LR
PodA[Pod A] --> Svc[Service]
Svc --> PodB[Pod B]
Svc --> PodC[Pod C]
The service selector matches pod labels. When you call the service IP, Kubernetes load-balances across all matching pods that are ready.
Ingress
Ingress manages external HTTP/HTTPS access to services within the cluster. It provides routing based on host and path, SSL termination, and name-based virtual hosting.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
          - path: /products
            pathType: Prefix
            backend:
              service:
                name: product-service
                port:
                  number: 80
An ingress controller (like nginx-ingress or Istio gateway) implements the Ingress resource. Without a controller, Ingress resources do nothing.
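Which controller handles a given Ingress is selected with `spec.ingressClassName`, and TLS termination is configured under `spec.tls`. The following sketch assumes an IngressClass named `nginx` is installed and that a TLS Secret named `api-example-tls` already exists in the same namespace; both names are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress-tls
spec:
  ingressClassName: nginx          # assumed IngressClass
  tls:
    - hosts:
        - api.example.com
      secretName: api-example-tls  # assumed Secret of type kubernetes.io/tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
```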
Helm: Kubernetes Package Manager
Helm templatizes Kubernetes manifests. Instead of repeating YAML for each environment, you define templates with placeholders. A values file fills in the placeholders for dev, staging, and production.
# Install a chart
helm install my-release bitnami/wordpress
# Upgrade
helm upgrade my-release bitnami/wordpress --set resources.limits.memory=2Gi
# Rollback
helm rollback my-release 1
Helm charts bundle manifests, templates, defaults, and metadata into something you can install, upgrade, and roll back as a unit.
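To make the template-plus-values idea concrete, here is a sketch of one template from a hypothetical chart. The chart layout, the value keys (`replicaCount`, `image.repository`, `image.tag`), and the file names are illustrative assumptions, not taken from any published chart.

```yaml
# values.yaml (chart defaults):
#   replicaCount: 2
#   image:
#     repository: my-api
#     tag: v1
# values-prod.yaml (production override):
#   replicaCount: 5
#
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-api
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-api
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-api
    spec:
      containers:
        - name: api
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

Running `helm upgrade --install my-release ./chart -f values-prod.yaml` against such a chart would render the same template with five replicas instead of two.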
Scaling
Kubernetes scales workloads in two directions: horizontally (more pod replicas) and vertically (more resources per pod).
Horizontal Pod Autoscaler (HPA)
The HPA scales the number of pod replicas based on CPU utilization, memory, or custom metrics. The standard HorizontalPodAutoscaler in autoscaling/v2 reads CPU and memory from the resource metrics API (metrics.k8s.io), which is typically served by metrics-server. It can also scale on custom and external metrics through the custom.metrics.k8s.io and external.metrics.k8s.io APIs, but Kubernetes does not ship an implementation of those: you install an adapter, such as Prometheus Adapter, to expose Prometheus metrics to the HPA. For more advanced event-driven scaling scenarios, KEDA (Kubernetes Event-Driven Autoscaling) extends HPA with a richer set of scalers that can trigger scaling based on external event sources such as message queue depth, request rates, or custom metrics from nearly any source.
In short:
- Native HPA: scales on CPU and memory via the resource metrics API, or on custom metrics (for example from Prometheus) once a metrics adapter is installed
- KEDA: a separate higher-level autoscaling framework that runs as an operator and extends HPA with event-driven scalers for queues, databases, cloud services, and more
For most workloads, native HPA on CPU, memory, or adapter-provided Prometheus metrics is sufficient. KEDA is the right choice when you want to scale on external signals without operating your own metrics adapter, such as Apache Kafka consumer lag or the depth of an Azure Queue.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
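Utilization is measured relative to each pod's CPU request, so the target containers must declare CPU requests for this to work. When average utilization across replicas exceeds 70%, Kubernetes adds replicas up to the maximum. When it drops, it removes replicas down to the minimum.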
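Scaling on a per-pod custom metric uses the same resource with a `Pods` metric source. This sketch assumes a metrics adapter (for example Prometheus Adapter) is installed and exposes a metric named `http_requests_per_second` for the pods; the metric name, the target value, and the HPA name are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-rps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed, served via custom.metrics.k8s.io
        target:
          type: AverageValue
          averageValue: "100"              # aim for about 100 req/s per pod
```

The controller computes the desired count as ceil(currentReplicas × currentValue / targetValue), so 4 replicas averaging 150 req/s against a target of 100 would be scaled to 6.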
Cluster Autoscaler
The cluster autoscaler adjusts the number of nodes in a cluster. When pods cannot be scheduled because resources are tight, it adds nodes. When nodes are sitting idle, it removes them.
ConfigMaps and Secrets
Configuration data lives in ConfigMaps. Sensitive data lives in Secrets. Both inject into containers as environment variables or files.
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  DATABASE_HOST: "db.example.com"
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
stringData:
  DATABASE_PASSWORD: "supersecret"
Inject both into pods via environment variables or mounted volumes. Secret values are only base64-encoded, which is an encoding and not encryption; unless encryption at rest is enabled, they are stored unencrypted in etcd. For production secrets, integrate with a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager).
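Both injection styles can be combined in one pod. A minimal sketch that loads the `api-config` ConfigMap as environment variables and mounts the `api-secrets` Secret as files; the mount path `/etc/api/secrets` is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
    - name: api
      image: my-api:v1
      envFrom:
        - configMapRef:
            name: api-config          # exposes DATABASE_HOST and LOG_LEVEL as env vars
      volumeMounts:
        - name: secrets
          mountPath: /etc/api/secrets # assumed path
          readOnly: true
  volumes:
    - name: secrets
      secret:
        secretName: api-secrets       # each key becomes a file, e.g. DATABASE_PASSWORD
```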
Namespaces
Namespaces split a cluster into virtual clusters. They scope names, enforce resource quotas, and let you apply RBAC per team.
kubectl get namespaces
kubectl create namespace my-app
kubectl config set-context --current --namespace=my-app
Common namespaces: default, kube-system (cluster components), kube-public (publicly readable resources).
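Quotas are attached to a namespace with a ResourceQuota object. The figures below are placeholders chosen for illustration, applied to the `my-app` namespace created above.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-app-quota
  namespace: my-app
spec:
  hard:
    requests.cpu: "4"        # sum of CPU requests across all pods
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"               # maximum number of pods in the namespace
```

Once a quota covers CPU or memory, pods that omit the corresponding requests or limits are rejected unless a LimitRange supplies defaults.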
Resource Management
Pods consume CPU and memory. You set resource requests (the minimum guaranteed) and limits (the maximum allowed).
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
If a container exceeds its memory limit, it gets OOM-killed. If it tries to use more CPU than its limit, it gets throttled rather than killed.
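A LimitRange can fill in these values for containers that do not set them. A sketch that reuses the numbers above as namespace-wide defaults; the object name and the `my-app` namespace are assumptions.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: my-app
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no requests
        cpu: 250m
        memory: 128Mi
      default:               # applied when a container sets no limits
        cpu: 500m
        memory: 256Mi
```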
RBAC Security
Role-Based Access Control restricts what users and service accounts can do in the cluster. Kubernetes has two role types: Role (namespace-scoped) and ClusterRole (cluster-wide). Bindings associate roles with subjects.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
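ClusterRoles work the same way but are not namespaced: bound with a ClusterRoleBinding, they apply across all namespaces and can also cover cluster-scoped resources such as nodes. Use least-privilege: grant only the permissions actually needed. Avoid cluster-admin bindings unless genuinely required. Run kubectl auth can-i --list to check what the current user can actually do, or add --as=system:serviceaccount:production:my-app to check a service account.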
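For comparison with the namespaced Role above, a cluster-wide grant might look like the following sketch. The `node-reader` names are hypothetical; the subject is the same `my-app` service account.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader            # no namespace: ClusterRoles are cluster-scoped
rules:
  - apiGroups: [""]
    resources: ["nodes"]       # nodes are a cluster-scoped resource
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-reader-binding
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: production
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
```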
Network Policies
By default, all pods in a cluster can reach each other. NetworkPolicies restrict traffic based on label selectors, namespaces, and port specifications, provided the cluster's network plugin enforces them (Calico and Cilium do, for example).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - port: 5432
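A default-deny policy is the safest starting point. Allow only the traffic each pod actually needs. Once egress is restricted, as in the example above, DNS lookups are blocked too unless you also allow egress to the cluster DNS service on port 53. Without network policies, a compromised pod can reach every other workload in the cluster.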
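A default-deny baseline for a namespace can be sketched as two policies: one that blocks everything, and one that re-opens DNS so name resolution keeps working. The second policy assumes the cluster DNS pods run in `kube-system` and that namespaces carry the automatic `kubernetes.io/metadata.name` label.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}            # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # assumed DNS location
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Policies are additive, so per-application rules such as `api-network-policy` can be layered on top of this baseline.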
Storage Orchestration
Persistent data in Kubernetes lives in PersistentVolumes backed by storage classes. StatefulSets manage pods with stable storage identities.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
Storage classes define provisioners (GCE PD, AWS EBS, etc.) and parameters like replication type. Dynamic provisioning creates PersistentVolumes automatically when a PVC is claimed. For stateful workloads, StatefulSets give pods stable network identities and ordered deployment, which simple Deployments do not provide.
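A StatefulSet requests storage per replica through `volumeClaimTemplates`. In this sketch the image `my-db:v1`, the mount path, and the headless Service name `db-headless` are hypothetical, and the `standard` storage class is assumed to exist as in the PVC above.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
  namespace: production
spec:
  serviceName: db-headless     # headless Service, created separately
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: db
          image: my-db:v1      # hypothetical image
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/data   # assumed data directory
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
```

The pods are named db-0, db-1, and db-2, and each keeps its own claim (data-db-0, and so on) across restarts and rescheduling.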
Helm Comparison
Helm is the standard package manager for Kubernetes. Alternatives include Kustomize (patch-based overlays without templating) and raw YAML with tools like yq for manipulation.
| Feature | Helm | Kustomize |
|---|---|---|
| Templating | Go templates with values.yaml | No templating; pure YAML overlays |
| Learning curve | Steeper (templates, hooks, chart repos) | Gentler (kustomization.yaml) |
| Package ecosystem | chart repos (Bitnami, etc.) | No central repo; DIY packaging |
| DRY principle | Fully templated | Patch-based (patch files per env) |
| Best for | Reusable charts across many envs | Env-specific configs with shared bases |
| Rollback | Native (helm rollback) | Manual (kubectl rollout undo) |
Helm wins when you need reusable, versioned packages. Kustomize wins when your configs are mostly similar except for a few env-specific values. Many teams use both: Helm for third-party software, Kustomize for in-house services.
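For contrast with Helm templates, a Kustomize overlay is plain YAML that points at a shared base and states what differs. The directory layout, the image tag, and the replica count here are illustrative assumptions.

```yaml
# overlays/production/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base               # shared Deployment, Service, and so on
images:
  - name: my-api
    newTag: v2               # pin the production image tag
replicas:
  - name: api-server
    count: 5                 # override the base replica count
```

It is applied with `kubectl apply -k overlays/production`, with no templating step in between.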
Microservices on Kubernetes
Kubernetes works well for microservices because each service runs in its own deployment with independent scaling. Microservices reach each other over the cluster network through Service objects and their DNS names. Deployments handle rolling updates without downtime. Namespaces isolate teams and environments. RBAC controls who can change what through the Kubernetes API, while NetworkPolicies control which services can talk to each other.
A service mesh like Istio adds mTLS, traffic routing, and observability on top. See Service Mesh and Istio and Envoy for how they combine with Kubernetes.
When to Use / When Not to Use
Use Kubernetes when:
- You are running multiple services that need independent scaling and deployment
- You need automated failure recovery (self-healing for containerized workloads)
- You require traffic management (ingress, load balancing, canary deployments)
- Your team has or is building Kubernetes expertise
- You need consistent behavior across development, staging, and production environments
- You want infrastructure-as-code with declarative configuration
Probably not the right choice when:
- You have a small number of services (fewer than 5) with simple workloads
- Your team lacks DevOps capacity to manage cluster operations
- Your workload is stateless and simple (consider managed container services instead)
- You need to move fast with minimal infrastructure overhead
- Cost of running a cluster outweighs benefits for your use case
Trade-off Table
| Factor | With Kubernetes | Without Kubernetes |
|---|---|---|
| Latency | Baseline (no added hops) | Baseline |
| Consistency | Declarative config, same behavior everywhere | Configuration varies by environment |
| Cost | Control plane + worker nodes overhead | Lower (fewer components) |
| Complexity | Steeper learning curve; cluster management | Simpler direct deployment |
| Operability | Centralized cluster management; unified monitoring | Per-server management |
| Scalability | Auto-scaling built-in; handles thousands of pods | Manual scaling |
| Reliability | Self-healing, automatic restarts, load balancing | Requires external tools |
| Flexibility | Runs anywhere (cloud, on-prem, hybrid) | Tied to specific infrastructure |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Node goes down | Pods on that node become unavailable | Run multiple replicas spread across nodes (pod anti-affinity or topology spread constraints); cluster autoscaler provisions replacement nodes |
| Pod OOM killed (memory limit exceeded) | Application crashes and restarts | Set appropriate memory requests/limits; monitor memory usage; investigate memory leaks |
| Pod throttled (CPU limit exceeded) | Application latency increases | Set appropriate CPU requests/limits; profile application CPU usage |
| Image pull failure | Pods stuck in ImagePullBackOff state | Use private registries with credentials configured; cache images on nodes; use image pull secrets |
| Volume mount failure | Pod cannot start or crashes | Validate PersistentVolumeClaims; check storage class availability; monitor volume capacity |
| Deployment rollback required | Bad deployment causes failures | Use RollingUpdate strategy with maxSurge/maxUnavailable; test rollbacks in staging; keep previous working ReplicaSet |
| Namespace deletion accident | All resources in namespace deleted | Restrict namespace delete permissions through RBAC; implement namespace protection; regular backups of cluster state |
| etcd data loss | Cluster state corrupted; may require full rebuild | Use etcd backups; run etcd in HA mode; monitor disk I/O on etcd nodes |
Common Pitfalls / Anti-Patterns
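Misconfigured resource requests and limits: Setting requests too low lets the scheduler overpack nodes, which leads to CPU contention and evictions under memory pressure; setting CPU limits too low causes throttling, and setting memory limits too low causes OOM kills. Profile your application under realistic load and set appropriate values with some headroom.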
Ignoring pod disruption budgets: Without PDBs, a node drain can take down too many replicas simultaneously. Always set PDBs for stateful or high-availability workloads.
Using latest tag for images: latest means different things at different times. Always pin image tags to specific versions to ensure reproducible deployments.
Not planning for capacity: Running nodes at high utilization leaves no headroom for spikes. Cluster autoscaler helps but is not instantaneous. Plan for ~70% average utilization.
Overly permissive RBAC: Giving developers cluster-admin “because it is easier” creates security and stability risks. Use namespace-scoped roles and RoleBindings.
Not using labels and selectors consistently: Labels are the primary way to select pods for services, deployments, and network policies. Inconsistent labeling breaks routing and isolation.
Ignoring Kubernetes events: Events are often the first indicator of problems (ImagePullBackOff, FailedScheduling, etc.). Monitor and alert on events, not just metrics.
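As an illustration of the pod disruption budget pitfall above, a PDB is a small standalone object that selects pods by label. The name is hypothetical; the selector matches the `api-server` Deployment from earlier.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2            # with 3 replicas, a drain may evict one pod at a time
  selector:
    matchLabels:
      app: api-server
```

A PDB only limits voluntary disruptions such as node drains; it does not protect against a node crashing.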
Observability Checklist
Metrics
- Node CPU and memory utilization
- Pod CPU and memory requests vs actual usage
- Pod restart count and reason
- Deployment rollout progress
- HPA status (current replicas vs desired)
- Persistent volume capacity and usage
- Network policies in effect
- API server request latency and error rate
Logs
- Container logs captured (stdout/stderr)
- Kubernetes events logged (pod scheduling, volume mounts, image pulls)
- Node-level logs for kubelet and container runtime
- Audit logs for API server access (who did what when)
- Include labels and selectors in log context for filtering
Alerts
- Alert when node memory/CPU exceeds 85% utilization
- Alert when pod restart count exceeds threshold in short window
- Alert when deployment fails to make progress
- Alert when HPA is at max replicas (may need to adjust)
- Alert on persistent volume capacity approaching limit
- Alert when etcd disk I/O is high (could indicate problems)
- Alert on unauthorized API server access attempts
Security Checklist
- RBAC configured with least-privilege principle (avoid cluster-admin where possible)
- Service accounts use bound service account tokens; avoid default tokens
- NetworkPolicy restricts traffic between namespaces (default deny)
- Pod Security Standards enforced through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- Secrets not stored in etcd in plain text (use encryption at rest)
- Container images scanned for vulnerabilities (use image policy)
- RunAsNonRoot and readOnlyRootFilesystem enforced where possible
- Kubernetes API server not exposed publicly; use authentication
- Regular Kubernetes version updates for security patches
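Several of these items translate directly into manifest fields. The sketch below labels a namespace so that Pod Security Admission enforces the restricted profile, and shows a pod that satisfies it. It assumes the `my-api:v1` image is built to run as a non-root user and does not need to write to its root filesystem.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true               # refuse to start containers running as UID 0
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: api
      image: my-api:v1
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```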
Production Checklist
These are the non-negotiable items for production-ready Kubernetes deployments:
Core Workloads
- Pods use appropriate resource requests and limits (CPU and memory)
- Each Deployment/StatefulSet has minimum 2 replicas for HA
- PodDisruptionBudgets configured for stateful and critical services
- Liveness and readiness probes defined for all application containers
- Image tags pinned to specific versions (no latest)
Network Connectivity
- NetworkPolicy with default-deny applied to all namespaces
- Ingress controller deployed and Ingress resources configured
- Services use correct selectors matching pod labels
- Headless Services used only where direct pod DNS is required
Security
- RBAC roles use least-privilege (no cluster-admin unless required)
- Service accounts use bound tokens; default token not used by workloads
- Secrets encrypted at rest in etcd
- PodSecurityStandards enforced (restricted policy)
- Containers run as non-root where possible
Scaling and Availability
- HPA configured for services with variable load
- Cluster autoscaler enabled for node elasticity
- ResourceQuota applied per namespace to prevent resource starvation
- LimitRange sets default container limits automatically
Storage
- PersistentVolumeClaims use appropriate storage class
- StatefulSets use stable storage (not emptyDir) for persistent data
- Storage capacity monitored and alerts configured
Operations
- Kubernetes version current; cluster upgrades tested in staging
- etcd backed up regularly (test restore periodically)
- Audit logging enabled on API server
- Monitoring and alerting active for node/pod metrics and events
- kubectl config context verified before running production commands
Quick Debug Checklist
# 1. Get current pod state
kubectl get pod myapp-xxx -n production -o wide
# 2. Check events on the pod
kubectl describe pod myapp-xxx -n production
# 3. View application logs
kubectl logs myapp-xxx -n production --tail=100
# 4. If the container is crash-looping, view logs from the previous instance
kubectl logs myapp-xxx -n production --previous
# 5. Check resource usage of the pod and the state of its node
kubectl top pod myapp-xxx -n production
kubectl describe node $(kubectl get pod myapp-xxx -n production -o jsonpath='{.spec.nodeName}')
# 6. Check for finalizer issues (stuck in Terminating)
kubectl get pod myapp-xxx -n production -o jsonpath='{.metadata.finalizers}'
Interview Questions
A pod is stuck in Pending state. How do you diagnose it?
Check kubectl describe pod for events. Common causes: insufficient cluster resources (CPU/memory), no matching node selectors or affinity rules, PVC not bound, or image pull failures. Check node capacity with kubectl describe nodes, PVC status with kubectl get pvc, and events with kubectl get events --sort-by='.lastTimestamp'.
Start at the Service level: verify the selector matches pod labels (kubectl get svc -o wide, kubectl get pods --show-labels). Check endpoints exist (kubectl get endpoints <svc>). If no endpoints, the selector mismatch is the culprit. If endpoints exist, check whether pods are actually running and listening on the target port. Use kubectl exec into a pod and curl the target port directly. Check network policies that might be blocking traffic.
For managed clusters (GKE, EKS, AKS), use the managed upgrade path which handles node draining and replacement automatically. For self-managed: use kubectl drain --ignore-daemonsets --delete-emptydir-data to safely evict pods, then upgrade the node. For StatefulSets, use PodDisruptionBudgets to ensure minimum availability during upgrades. Always test in a non-production environment first. Use blue-green node pool strategies where you spin up new nodes, migrate workloads, then terminate old nodes.
Deployments manage stateless applications with interchangeable pods — Kubernetes freely schedules, scales, and replaces pods. StatefulSets manage stateful applications requiring stable identity, stable storage, and ordered deployment/scaling — pods have persistent identifiers and ordered graceful deployment. Use Deployments for web servers, APIs, and most application workloads. Use StatefulSets for databases, Kafka, ZooKeeper, and any workload that needs persistent identity.
The container's memory usage likely exceeded the limit within the cgroup. Check actual usage with kubectl top pod, keeping in mind that it reports sampled values and can miss short spikes. The value you are looking at in the pod spec might be a request rather than a limit, or the application might have a memory leak. Check whether the node is under memory pressure with kubectl describe node. Remember that child processes forked by the application stay in the container's cgroup, so their memory counts toward the limit too. Also verify the OOMKilled reason in kubectl describe pod: there is a difference between container-level OOM and node-level OOM.
Never store secrets in etcd as plaintext — enable encryption at rest. Use a secrets management tool: HashiCorp Vault with ESO (External Secrets Operator) syncs secrets from Vault into Kubernetes secrets, or AWS Secrets Manager with the Secrets Store CSI driver. Avoid mounting secrets as environment variables when possible — env vars persist in process memory and appear in logs more easily. Use short-lived tokens and workload identity where possible. Audit access to secrets with Kubernetes audit logging.
A PDB ensures that a minimum number of pods remain available during voluntary disruptions like node drains. It prevents Kubernetes from evicting too many replicas at once, maintaining availability during cluster upgrades or maintenance. Without PDBs, a single node drain could take down your entire service. A PDB is its own object: set minAvailable or maxUnavailable on the PodDisruptionBudget and point its label selector at the pods of the Deployment or StatefulSet you want to protect.
livenessProbe, readinessProbe, and startupProbe.
livenessProbe determines if a container is alive and should be restarted. readinessProbe determines if a container can receive traffic: pods that fail their readiness probe are removed from Service endpoints until they pass again. startupProbe holds off the liveness and readiness probes until the application finishes starting, which is useful for slow-initializing apps. Use the right probe type: liveness for restart decisions, readiness for traffic routing, startup to protect slow starters.
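The three probes can be declared side by side on one container. In this sketch the `/healthz` and `/ready` endpoints and all timing values are assumptions about the application, not defaults.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  containers:
    - name: api
      image: my-api:v1
      ports:
        - containerPort: 8080
      startupProbe:            # allows up to 30 x 5s = 150s for startup
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 30
      livenessProbe:           # restart the container after 3 consecutive failures
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:          # keep the pod out of Service endpoints while failing
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```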
The scheduler watches for unscheduled pods, filters out nodes that cannot run the pod (insufficient resources, unmatched affinity rules, untolerated taints), scores the remaining nodes, and binds the pod to the highest-scoring node. Influencing factors: nodeSelector, node affinity/anti-affinity, pod affinity/anti-affinity, taints and tolerations, resource requests, priority classes, and topology spread constraints for HA distribution.
ConfigMaps hold non-sensitive configuration data; Secrets hold sensitive data (tokens, passwords, certificates). Both are key-value pairs consumed as environment variables or mounted files. Secrets are base64-encoded but not encrypted by default — enable encryption at rest for etcd. Consume as env vars with envFrom.configMapRef / envFrom.secretRef, or as volumes with volumes.configMap / volumes.secret. Prefer files over env vars for secrets since env vars are harder to rotate.
ServiceAccounts are identities for workloads (pods, applications) to authenticate to the Kubernetes API. User accounts are for human operators. ServiceAccount tokens are mounted into pods as projected volumes, and the API server validates them against the RBAC policy for that ServiceAccount. Use dedicated ServiceAccounts per application and bind only the permissions the app actually needs — avoid the default ServiceAccount.
A Deployment's RollingUpdate strategy controls the process. Kubernetes creates new ReplicaSet pods incrementally, waiting for each new pod to become ready before scaling up further. It also terminates old ReplicaSet pods incrementally, respecting maxUnavailable and maxSurge settings. During the rollout, the Deployment's .spec.template changes but the selector stays constant. The Service continues routing to ready pods only (via readinessProbe). If a problem occurs, kubectl rollout undo reverts to the previous ReplicaSet.
ResourceQuota limits total resource consumption per namespace (CPU, memory, counts of objects). LimitRange sets default requests/limits per container automatically. Apply a ResourceQuota to cap total CPU and memory, then a LimitRange so every pod gets sensible defaults even if not explicitly set. Without quotas, one team can starve others. Quotas also prevent accidental runaway resource consumption that could destabilize the cluster.
Use namespace-level isolation as the primary boundary. Apply ResourceQuota per namespace to prevent tenant A from consuming resources needed by tenant B. Use NetworkPolicy with default-deny rules, allowing only explicitly permitted cross-namespace communication. RBAC restricts what each tenant can do within their namespace. PodSecurityStandards enforce security constraints (restricted policy prevents privileged containers). For sensitive workloads, use node isolation with taints/tolerations and dedicated node pools so specific tenants get dedicated infrastructure. Secrets should be encrypted at rest, and access to etcd should be restricted.
The control plane consists of: etcd (consistent key-value store holding cluster state), kube-apiserver (REST API server all clients talk to, validates requests, persists to etcd), kube-controller-manager (runs controller loops: node controller, replication controller, endpoints controller, service account controller), kube-scheduler (watches for unscheduled pods, selects nodes based on resource requirements and constraints), and cloud-controller-manager (interfaces with cloud provider for load balancers and nodes). The API server is the central hub — all other components interact through it, never directly.
DaemonSet ensures one pod runs on every node (or subset matching node selectors). Use for log collectors, monitoring agents, storage daemons — things you need running on each node. Job creates pods that run to completion. Use for batch jobs, one-time tasks, migrations. CronJob schedules jobs to run repeatedly at specific times. Use for periodic maintenance, backups, report generation. Deployment manages long-running services that should never stop. StatefulSet manages stateful workloads requiring stable identity and ordered deployment.
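As one example of these workload types, a CronJob wraps a Job template in a schedule. The image `my-backup:v1` and its arguments are hypothetical.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  concurrencyPolicy: Forbid        # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 2              # retry a failed Job at most twice
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: my-backup:v1                    # hypothetical image
              args: ["--target", "db.example.com"]   # hypothetical arguments
```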
Store sensitive data in Secrets, not ConfigMaps. Enable encryption at rest for etcd so secrets are not plaintext. Never commit secrets to version control — use external secrets management: HashiCorp Vault with External Secrets Operator, AWS Secrets Manager with Secrets Store CSI driver, or GCP Secret Manager. Prefer file-based mounting over environment variables since env vars are harder to rotate and more likely to leak in logs. Use short-lived service account tokens with token projection. Regularly rotate secrets and audit access via Kubernetes audit logs.
An Ingress controller (nginx-ingress, Traefik, Istio gateway) watches for Ingress resources and implements the routing rules. Without a controller, Ingress resources have no effect. For TLS termination, create a Secret with the certificate and private key, then reference it in the Ingress spec under tls[].secretName, listing the hostnames it covers under tls[].hosts. The controller terminates TLS at the edge and forwards plain HTTP to backend services. For automated certificate management, use cert-manager with Let's Encrypt: it watches for Ingress resources, solves the ACME challenge, and provisions certificates automatically.
The default scheduler uses predicates and priorities to select the best node. Override with: nodeSelector (simple label matching), node affinity/anti-affinity (more expressive rules like preferred vs required), pod affinity/anti-affinity (co-locate or spread pods relative to other pods), taints and tolerations (nodes repel pods unless they tolerate the taint), priority classes (higher priority pods can preempt lower ones), and topology spread constraints (distribute pods evenly across zones or regions). For special workloads, use the descheduler to evict pods and let the scheduler re-place them based on current cluster state.
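Several of these controls can appear together in one pod spec. The sketch assumes nodes labeled `disktype=ssd` and tainted with `dedicated=api:NoSchedule`; both are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  nodeSelector:
    disktype: ssd                  # assumed node label
  tolerations:
    - key: dedicated               # assumed taint: dedicated=api:NoSchedule
      operator: Equal
      value: api
      effect: NoSchedule
  topologySpreadConstraints:
    - maxSkew: 1                   # zones may differ by at most one matching pod
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: api-server
  containers:
    - name: api
      image: my-api:v1
```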
Kubernetes provides service discovery via DNS. When you create a Service, CoreDNS (or the older kube-dns) starts serving a DNS record for it. Pods can reach other pods via the Service name: DNS resolves to the Service IP, which kube-proxy routes to backing pods. Environment variables also provide service discovery: kubelet injects {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT variables into each pod, but only for Services in the pod's namespace that already existed when the pod started. DNS is preferred since it has no such ordering constraint and works across namespaces (svcname.namespace.svc.cluster.local). For headless Services, DNS returns individual pod IPs directly, useful for stateful applications that need direct pod addressing.
Further Reading
- Kubernetes Documentation - Official docs for all core concepts
- kubectl Cheat Sheet - Common commands reference
- Helm Documentation - Package manager docs
- Kubernetes Blog - Release notes and deep dives
- Lens - Open-source Kubernetes IDE for cluster management
- K9s - Terminal-based UI for managing Kubernetes clusters
Operator Pattern Deep Dive
The operator pattern extends Kubernetes with custom controllers that manage applications or external systems through custom resources. Where a built-in controller (like ReplicaSet) manages pods, an operator manages applications and their lifecycle.
graph LR
subgraph Custom Controller
Reconcile[Reconcile Loop]
Watch[Watch API Server]
end
Watch --> Reconcile
Reconcile --> CustomResource[Custom Resource Definition]
CustomResource --> ExternalSystem[External System<br/>e.g. Vault, Prometheus]
ExternalSystem --> Watch
Key components:
- CRD (Custom Resource Definition): Extends the Kubernetes API with custom types. You define the schema in YAML.
- Controller: A control loop that watches your custom resources and takes action to reach the desired state.
- Operator: Packages the CRD and controller together. The controller encodes operational knowledge (how to manage the application).
Popular operators:
- cert-manager: Automatically provisions TLS certificates from Let’s Encrypt
- External Secrets Operator: Syncs secrets from Vault, AWS Secrets Manager, GCP Secret Manager into Kubernetes Secrets
- Prometheus Operator: Manages Prometheus and Alertmanager instances
- Strimzi: Manages Kafka on Kubernetes
Operators turn Kubernetes into a platform for building platforms — you encode operational knowledge into code that runs forever.
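To show what defining a custom type looks like, here is a sketch of a CRD and one resource that uses it. The `example.com` group, the `Backup` kind, and its fields are invented for illustration; a real operator would pair this with a controller that acts on `Backup` objects.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["schedule"]
              properties:
                schedule:
                  type: string
                retentionDays:
                  type: integer
---
apiVersion: example.com/v1
kind: Backup
metadata:
  name: nightly
spec:
  schedule: "0 2 * * *"
  retentionDays: 7
```

Once the CRD is installed, `kubectl get backups` works like any built-in resource, and the API server rejects a Backup whose spec lacks `schedule`.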
Conclusion
When Kubernetes Complexity Pays Off
Kubernetes adds operational overhead. Managing the control plane, understanding YAML abstractions, handling upgrades and failures — all of this costs time and expertise. The question is whether the benefits justify the investment for your situation.
Kubernetes pays off when:
- You run more than a handful of microservices that need to scale and recover independently
- Your team spends significant time on deployment, rollbacks, and incident response without Kubernetes
- You need to run the same workloads across multiple environments (dev, staging, production, multi-cloud)
- Regulatory or contractual requirements demand specific isolation or auditing that containers alone cannot provide
- Your traffic patterns vary significantly and you need automated scaling to optimize costs
Plain container orchestration or managed services may be better when:
- You run a small number of services (fewer than 5-10) with stable, predictable traffic
- Your team is small and does not have time to learn and maintain Kubernetes
- Your requirements are simple enough that a single cloud-based Platform-as-a-Service handles them
- You are early in a project and moving fast matters more than operational sophistication
For small teams or simple workloads, the complexity of Kubernetes may not pay off. For production microservices at scale, Kubernetes is usually worth the investment.
The key is honest assessment: if you find yourself fighting Kubernetes more than it helps, simplify. The best infrastructure is the one your team can operate reliably.