
Advanced Kubernetes: Controllers, Operators, RBAC, and Production Patterns

Kubernetes has become the de facto standard for container orchestration. If you have been running clusters for a while, you have likely encountered scenarios that basic Kubernetes resources do not handle well. Custom controllers, operators, and advanced security patterns solve these problems.

This guide assumes you already know Kubernetes basics. If you are just starting, our Docker Fundamentals guide covers containers first, which is essential groundwork before tackling Kubernetes.

Introduction

Before diving into advanced topics, let us review how Kubernetes control plane components work together.

graph TB
    subgraph "Control Plane"
        A[API Server] --> B[etcd]
        A --> C[Controller Manager]
        A --> D[Scheduler]
        C --> E[Controllers]
        E --> A
    end
    subgraph "Worker Nodes"
        F[Kubelet] --> A
        G[Container Runtime] --> F
        H[Kube Proxy] --> F
    end

The API server is the gateway to everything. All cluster operations go through it, and it validates configurations before persisting to etcd. Controllers watch the API server for changes and reconcile actual state toward desired state.

Custom Resource Definitions

CRDs extend the Kubernetes API to define new resource types. They let you create domain-specific objects that Kubernetes can manage like native resources.

Defining a CRD

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    shortNames:
      - db
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: [postgresql, mysql, mongodb]
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                storage:
                  type: object
                  properties:
                    size:
                      type: string
                    storageClass:
                      type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                endpoint:
                  type: string

After applying this CRD, you can create Database objects just like built-in resources:

kubectl apply -f database-crd.yaml
kubectl get databases
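
For example, a hypothetical Database object that satisfies the schema above (the name and values are illustrative):

apiVersion: example.com/v1
kind: Database
metadata:
  name: orders-db
  namespace: production
spec:
  engine: postgresql
  version: "15"
  replicas: 3
  storage:
    size: 100Gi
    storageClass: fast-storage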

CRD Versioning

Kubernetes supports multiple versions of a CRD simultaneously. The storage flag indicates which version persists to etcd. This enables zero-downtime migrations when you need to change your schema.

versions:
  - name: v1
    served: true
    storage: true
  - name: v1beta1
    served: true
    storage: false

Clients request a specific version through the API path (for example, /apis/example.com/v1/databases), and the API server converts between served versions and the storage version. This gives you flexibility during rolling upgrades.

Custom Controllers

Controllers are control loops that watch resources and take action to achieve desired state. The Kubernetes ecosystem is built on this pattern, and you can extend it with custom controllers.

The Controller Pattern

A controller follows a reconcile loop:

  1. Watch for changes to resources
  2. Fetch current state
  3. Compare current state with desired state
  4. Take action to reconcile the difference
  5. Update status
  6. Repeat

Writing a Basic Controller

The controller-runtime library simplifies controller development:

package controller

import (
    "context"
    "fmt"

    "k8s.io/apimachinery/pkg/runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    examplev1 "github.com/example/database-operator/api/v1"
)

type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    logger := log.FromContext(ctx)
    logger.Info("reconciling Database", "namespace", req.Namespace, "name", req.Name)

    // Fetch the Database instance; a not-found error means it was deleted
    db := &examplev1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // Create or update the StatefulSet (buildStatefulSet and createOrUpdate
    // are helpers defined elsewhere in the package)
    statefulSet := r.buildStatefulSet(db)
    if err := r.createOrUpdate(ctx, statefulSet); err != nil {
        return reconcile.Result{}, err
    }

    // Update status to reflect what was actually observed
    db.Status.Phase = "Running"
    db.Status.Endpoint = fmt.Sprintf("%s.%s.svc.cluster.local", db.Name, db.Namespace)
    if err := r.Status().Update(ctx, db); err != nil {
        return reconcile.Result{}, err
    }

    return reconcile.Result{}, nil
}

Controllers run as part of a manager that handles caching, client connections, and leader election. This makes them robust in production environments with multiple replicas.
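
A minimal sketch of that wiring, assuming the hypothetical module path github.com/example/database-operator used later in this guide and an internal/controller package holding the reconciler:

package main

import (
    "os"

    ctrl "sigs.k8s.io/controller-runtime"

    examplev1 "github.com/example/database-operator/api/v1"
    "github.com/example/database-operator/internal/controller"
)

func main() {
    // The manager wires up caching, shared clients, and leader election
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        LeaderElection:   true,
        LeaderElectionID: "database-operator.example.com", // hypothetical lock name
    })
    if err != nil {
        os.Exit(1)
    }

    // Make the manager's scheme aware of the Database types
    if err := examplev1.AddToScheme(mgr.GetScheme()); err != nil {
        os.Exit(1)
    }

    // Register the reconciler: watch Database objects and call Reconcile
    if err := ctrl.NewControllerManagedBy(mgr).
        For(&examplev1.Database{}).
        Complete(&controller.DatabaseReconciler{
            Client: mgr.GetClient(),
            Scheme: mgr.GetScheme(),
        }); err != nil {
        os.Exit(1)
    }

    // Start blocks until the process receives a termination signal
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        os.Exit(1)
    }
}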

Operators: Domain-Specific Automation

Operators are custom controllers with domain-specific knowledge baked in. They encode operational expertise into software that handles complex, stateful applications.

The key difference from generic controllers is that operators understand the application they manage. They know how to handle backups, upgrades, failover, and other operational tasks.

Building an Operator with Operator SDK

Operator SDK provides scaffolding and best practices for building operators:

# Install operator-sdk
brew install operator-sdk

# Create a new operator
operator-sdk init --domain example.com --repo github.com/example/database-operator

# Create the API and controller
operator-sdk create api --group database --version v1 --kind Database --resource --controller

Defining the Operator API

// api/v1/database_types.go
package v1

type DatabaseSpec struct {
    Engine       string            `json:"engine,omitempty"`
    Version      string            `json:"version,omitempty"`
    Replicas     int32             `json:"replicas,omitempty"`
    Storage      StorageSpec       `json:"storage,omitempty"`
    BackupConfig *BackupConfigSpec `json:"backupConfig,omitempty"`
}

type StorageSpec struct {
    Size         string `json:"size"`
    StorageClass string `json:"storageClass,omitempty"`
}

type BackupConfigSpec struct {
    Schedule string `json:"schedule"`
    Bucket   string `json:"bucket"`
}

type DatabaseStatus struct {
    Phase    string `json:"phase,omitempty"`
    Endpoint string `json:"endpoint,omitempty"`
}

Implementing Reconcile Logic

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)
    logger.Info("reconciling database", "database", req.NamespacedName)

    // Fetch the Database instance
    db := &databasev1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Create or update StatefulSet
    statefulSet, err := r.desiredStatefulSet(db)
    if err != nil {
        return ctrl.Result{}, err
    }
    if err := r.createOrUpdate(ctx, statefulSet); err != nil {
        return ctrl.Result{}, err
    }

    // Handle backups if configured
    if db.Spec.BackupConfig != nil {
        if result, err := r.reconcileBackups(ctx, db); err != nil {
            return result, err
        }
    }

    // Update status
    db.Status.Phase = "Running"
    db.Status.Endpoint = fmt.Sprintf("%s.%s.svc.cluster.local", db.Name, db.Namespace)
    if err := r.Status().Update(ctx, db); err != nil {
        return ctrl.Result{}, err
    }

    // Periodic requeue so drift is corrected even without watch events
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
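
One way the backup reconciliation might look, as a sketch: translate the BackupConfigSpec into a CronJob (the backup image and helper names are hypothetical; batchv1, corev1, and metav1 refer to the standard k8s.io/api and apimachinery packages):

// Sketch: derive a CronJob from the Database's backup configuration.
func (r *DatabaseReconciler) reconcileBackups(ctx context.Context, db *databasev1.Database) (ctrl.Result, error) {
    cronJob := &batchv1.CronJob{
        ObjectMeta: metav1.ObjectMeta{
            Name:      db.Name + "-backup",
            Namespace: db.Namespace,
        },
        Spec: batchv1.CronJobSpec{
            Schedule: db.Spec.BackupConfig.Schedule,
            JobTemplate: batchv1.JobTemplateSpec{
                Spec: batchv1.JobSpec{
                    Template: corev1.PodTemplateSpec{
                        Spec: corev1.PodSpec{
                            RestartPolicy: corev1.RestartPolicyOnFailure,
                            Containers: []corev1.Container{{
                                Name:  "backup",
                                Image: "example/db-backup:latest", // hypothetical backup image
                                Args:  []string{"--bucket", db.Spec.BackupConfig.Bucket},
                            }},
                        },
                    },
                },
            },
        },
    }
    // Reuse the same create-or-update helper as for the StatefulSet
    return ctrl.Result{}, r.createOrUpdate(ctx, cronJob)
}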

Practical Operator Examples

Operators shine for managing stateful applications:

  • Prometheus Operator manages Prometheus deployments and monitoring configurations
  • Velero Operator handles backup and restore of Kubernetes resources and volumes
  • Cert Manager automates certificate management with Let’s Encrypt

When building your own operator, ask yourself whether the application has complex lifecycle requirements that generic Kubernetes resources cannot handle.

Role-Based Access Control

RBAC restricts who can perform operations in the cluster. It uses four key concepts: subjects (who), verbs (what actions), resources (what objects), and namespaces (where).

Roles and RoleBindings

Role and RoleBinding are namespace-scoped:

# Role definition
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: deployment-manager
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
# RoleBinding - grants Role to subjects
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-manager-binding
  namespace: production
subjects:
  - kind: User
    name: alice@example.com
    apiGroup: rbac.authorization.k8s.io
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
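
You can verify the effective permissions without waiting for a user to hit a forbidden error; kubectl auth can-i checks access as any subject:

# Verify what alice can do in production after applying the binding
kubectl auth can-i update deployments --as=alice@example.com -n production
kubectl auth can-i delete deployments --as=alice@example.com -n production   # expect "no"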

ClusterRoles and ClusterRoleBindings

ClusterRoles and ClusterRoleBindings work cluster-wide:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-viewer
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-viewer-binding
subjects:
  - kind: ServiceAccount
    name: metrics-collector
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: node-viewer
  apiGroup: rbac.authorization.k8s.io

ServiceAccount Usage

Pods use ServiceAccounts to authenticate to the API server:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: production
spec:
  serviceAccountName: my-app-sa
  containers:
    - name: app
      image: my-app:latest

Your application retrieves the token mounted at /var/run/secrets/kubernetes.io/serviceaccount/ and uses it to authenticate API calls.
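
For example, a sketch of calling the API from inside the container with the mounted credentials (these are the standard mount paths; listing pods assumes the ServiceAccount has been granted list on pods):

# Inside the container: authenticate with the mounted ServiceAccount token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
NS=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)

curl --cacert "$CACERT" -H "Authorization: Bearer $TOKEN" \
  "https://kubernetes.default.svc/api/v1/namespaces/$NS/pods"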

Network Policies

Network policies restrict traffic between pods. By default, all pods can reach all other pods and services in a cluster. Network policies let you implement defense in depth.

Basic Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-isolation
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53

This policy restricts API pods to receiving traffic only from frontend pods and the monitoring namespace, and allows egress only to the database and to DNS.

DNS Egress

Almost every pod needs DNS resolution. Make sure your egress policies include port 53 on both TCP and UDP, or your applications will fail to resolve service names.
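
A minimal sketch: a default-deny policy paired with a namespace-wide DNS allowance, so nothing silently loses name resolution. The empty podSelector matches every pod in the namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-with-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53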

Storage Classes and Persistent Volumes

Dynamic provisioning of persistent storage requires StorageClasses. They define how storage is provisioned when a PersistentVolumeClaim requests it.

Defining a StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

The WaitForFirstConsumer binding mode delays volume binding until a pod actually uses the claim. This allows the scheduler to co-locate volumes with pods in the same zone.

Using PersistentVolumes in Pods

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-storage
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: database
  namespace: production
spec:
  containers:
    - name: db
      image: postgres:15-alpine
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-storage

Resource Quotas and Limits

Namespaces let you partition the cluster, but ResourceQuotas enforce resource limits within namespaces.

Setting Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 100Gi
    limits.cpu: "40"
    limits.memory: 200Gi
    pods: "50"
    services: "10"
    persistentvolumeclaims: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
    - max:
        cpu: "8"
        memory: 32Gi
      min:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 1Gi
      defaultRequest:
        cpu: 200m
        memory: 256Mi
      type: Container

The LimitRange sets default requests and limits for containers that do not specify them, while ResourceQuota caps total resource usage per namespace.
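
You can compare current consumption against the quota at any time:

# Show used vs. hard limits for the namespace
kubectl describe resourcequota production-quota -n production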

Pod Disruption Budgets

When performing cluster maintenance, PDBs ensure minimum availability for your applications.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: frontend

The first PDB ensures at least 2 API pods are running during disruptions. The second allows up to 25% of frontend pods to be unavailable simultaneously.
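
Check how much disruption headroom a namespace currently has; the ALLOWED DISRUPTIONS column shows how many pods the eviction API may remove right now:

kubectl get pdb -n production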

Advanced Scheduling

Node Affinity and Anti-Affinity

apiVersion: v1
kind: Pod
metadata:
  name: database
  namespace: production
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - database
            topologyKey: topology.kubernetes.io/zone

This schedules database pods on nodes with SSD storage and tries to spread them across availability zones.

Taints and Tolerations

Taints repel pods from nodes unless the pods have matching tolerations:

# Taint a node to repel non-critical workloads
kubectl taint nodes node1 dedicated=ml-workloads:NoSchedule

# Pod that tolerates the taint
apiVersion: v1
kind: Pod
metadata:
  name: ml-job
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: ml-workloads
      effect: NoSchedule
  containers:
    - name: ml
      image: ml-training:latest

This pattern is useful for reserving nodes for specific workloads like ML training or stateful services.

Cluster Autoscaling

The cluster autoscaler adjusts the number of nodes based on pending pods and resource utilization. It talks to the cloud provider to add or remove nodes as needed.

Configuring the Cluster Autoscaler

The example below uses the OpenShift ClusterAutoscaler resource; on vanilla Kubernetes clusters the autoscaler is typically installed via Helm and tuned through equivalent command-line flags.

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  resourceLimits:
    maxNodesTotal: 50
    cores:
      min: 8
      max: 128
    memory:
      min: 16Gi
      max: 256Gi
  scaleDown:
    enabled: true
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 3m
    unneededTime: 10m

Node Pool Management

apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-pool
  namespace: openshift-machine-api
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-pool   # the referenced MachineSet defines the instance type (e.g. m5.xlarge)

Vertical Pod Autoscaler

VPA recommends resource requests based on actual usage patterns:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi

VPA can also automatically update resource requests by evicting and recreating pods. Combine it with the cluster autoscaler to handle both pod and node-level scaling.
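
Whatever the update mode, you can inspect the recommendations the VPA has computed before trusting it to act:

# Show current target and bounds for each container
kubectl describe vpa api-vpa -n production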

Service Mesh Patterns

Service meshes like Istio, Linkerd, and Cilium Service Mesh extend Kubernetes networking with mTLS, traffic management, and observability without modifying application code.

Installing Istio

# Install Istio with the default profile
istioctl install --set profile=default

# Enable automatic sidecar injection for a namespace
kubectl label namespace production istio-injection=enabled

mTLS Between Services

Istio enforces mutual TLS automatically:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

Traffic Splitting

Weight-based traffic splitting for canary deployments:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-canary
  namespace: production
spec:
  hosts:
    - api
  http:
    - route:
        - destination:
            host: api
            subset: stable
          weight: 90
        - destination:
            host: api
            subset: canary
          weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api
  namespace: production
spec:
  host: api
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary

Circuit Breaking

Prevent cascading failures with circuit breakers:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api
  namespace: production
spec:
  host: api
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

Helm vs Kustomize

Both tools manage Kubernetes configurations, but they take different approaches.

Helm

Helm uses a templating model with a chart structure:

# Create a new chart
helm create my-app

# Package and install
helm package my-app
helm install my-app ./my-app-0.1.0.tgz

# Render templates without installing
helm template my-release my-chart --set replicaCount=3

Helm charts have values.yaml with defaults, overridden by user values:

# values.yaml
replicaCount: 2
image:
  repository: my-app
  tag: latest
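
Users override those defaults at install or upgrade time, either with an environment-specific values file (values-production.yaml here is hypothetical) or inline flags:

# Override chart defaults with a values file plus an inline flag
helm upgrade --install my-app ./my-app \
  -f values-production.yaml \
  --set replicaCount=5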

Kustomize

Kustomize uses a declarative overlay model:

# kustomization.yaml (base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
commonLabels:
  app: my-app

# kustomization.yaml (production overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
namePrefix: prod-
commonLabels:
  env: production
replicas:
  - name: my-app
    count: 5
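
Assuming the overlay lives at overlays/production, kubectl can render or apply it directly:

# Render the overlay without applying
kubectl kustomize overlays/production

# Build and apply in one step
kubectl apply -k overlays/production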

When to Use Each

| Scenario | Tool | Reason |
| --- | --- | --- |
| Third-party software (Prometheus, nginx) | Helm | Charts available, templating fits package model |
| Custom app with environment differences | Kustomize | Overlay approach handles variants cleanly |
| Complex multi-tenant configs | Helm + Kustomize | Kustomize for base, Helm for releases |
| GitOps with ArgoCD | Either | Both integrate well with GitOps workflows |

StatefulSets

StatefulSets manage stateful applications requiring stable network identities and persistent storage.

Defining a StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: production
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.4.0
          ports:
            - containerPort: 9093
              name: kafka
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-storage
        resources:
          requests:
            storage: 50Gi
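
Note that serviceName: kafka assumes a headless Service exists to give each pod its stable DNS identity (kafka-0.kafka.production.svc.cluster.local, and so on); a minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: production
spec:
  clusterIP: None   # headless: per-pod DNS records instead of a single virtual IP
  selector:
    app: kafka
  ports:
    - port: 9093
      name: kafka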

Scaling Considerations

StatefulSet scaling requires careful coordination. Kafka brokers must be scaled one at a time, and each new broker must fully sync before the next is added. The StatefulSet controller handles ordering, but operational procedures must respect broker sync times.

Init Containers for Bootstrap

spec:
  initContainers:
    - name: init-config
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          echo "Waiting for storage to be provisioned..."
          while [ ! -d /var/lib/kafka/data ]; do sleep 5; done
      volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data

When to Use / When Not to Use

Understanding when these advanced patterns apply helps you avoid over-engineering.

Custom Controllers and Operators

Use when:

  • You manage stateful applications with complex lifecycle requirements
  • You need to encode domain-specific operational knowledge into automated workflows
  • You want to reduce manual intervention for recurring operational tasks
  • You are building a platform that other teams will consume

When not to use:

  • Your application is stateless and scales horizontally without special handling
  • You only need basic Kubernetes primitives like Deployments and Services
  • The operational complexity of building an operator exceeds the manual effort it would save
  • You are in early stages and requirements are still changing rapidly

Decision Tree: Controllers vs Operators vs Native Resources

Use this flowchart to determine which approach fits your use case:

flowchart TD
    A[What are you trying to manage?] --> B{Is it a built-in K8s resource?}
    B -->|Yes| C[Use native resource<br/>Deployment, StatefulSet, Service, etc.]
    B -->|No| D{Does the app have complex lifecycle?}
    D -->|No - stateless, simple scale| E[Use native resources + Helm/Kustomize]
    D -->|Yes - backups, upgrades, failover| F{Is it a well-known off-the-shelf app?}
    F -->|Yes - Prometheus, CertManager, Velero| G[Install existing Operator<br/>via Helm or OperatorHub]
    F -->|No - custom domain app| H{Can existing controllers handle it?}
    H -->|Yes - CRD + standard reconciliation| I[Write Custom Controller<br/>with controller-runtime]
    H -->|No - app-specific domain logic| J[Build an Operator<br/>with Operator SDK]
    I --> K[Does it need Helm-style packaging?]
    J --> K
    K -->|Yes| L[Package as Operator with OLM]
    K -->|No| M[Deploy controller directly<br/>via YAML]

Quick reference:

| Approach | Complexity | Best For |
| --- | --- | --- |
| Native resources | Lowest | Deployments, Services, ConfigMaps, vanilla stateful apps |
| Helm/Kustomize | Low | Package and configure standard apps, no custom logic |
| Custom Controller | Medium | CRDs with standard reconcile loops, no app-specific domain logic |
| Existing Operator | Low-Medium | Prometheus, cert-manager, Velero, databases, message queues |
| Custom Operator | Highest | Complex domain logic, specialized stateful apps, internal platforms |

RBAC and Network Policies

Use when:

  • Multiple teams share the same cluster
  • You need to enforce least-privilege access
  • Security compliance requires network segmentation
  • You want defense in depth beyond pod-level security

When not to use:

  • Single-tenant clusters with trusted users
  • Development or test environments without sensitive workloads
  • Network policies are handled by a higher-level service mesh

Storage Classes and Persistent Volumes

Use when:

  • Stateful workloads require persistent storage
  • You need dynamic provisioning based on application needs
  • You want to separate storage tiers (SSD vs HDD)

When not to use:

  • Stateless applications that store no persistent data
  • Caches or temporary data that can be lost without consequences

Production Failure Scenarios

Understanding real failure modes helps you prepare better.

| Failure | Impact | Mitigation |
| --- | --- | --- |
| etcd quorum loss | Cluster becomes read-only or unavailable | Maintain at least 3 etcd nodes, regular backups, separate etcd disks |
| API server overload | All cluster operations fail | Implement proper rate limiting, optimize client code, scale API server |
| Kubelet failure | All pods on the node become unhealthy | Use pod disruption budgets, set pod priority classes, monitor node health |
| StorageClass deletion with active PVCs | New pods cannot start, data loss potential | Never delete storage classes that active claims reference |
| RBAC misconfiguration | Users cannot perform needed operations | Use kubectl auth can-i for verification, audit role bindings regularly |
| Network policy misconfiguration | Application pods cannot communicate | Test in staging first, remember policies are additive, always allow DNS egress |
| Controller reconciliation loops | High API server load, degraded cluster performance | Implement proper reconciliation with exponential backoff |
| PodDisruptionBudget too restrictive | Cluster upgrades blocked | Set realistic minAvailable values, test disruption scenarios |

Common Pitfalls / Anti-Patterns

Controller Pitfalls

Reconciliation without backoff: A controller that reconciles continuously without exponential backoff will overwhelm the API server and cause cascading failures. Always implement retry logic with increasing delays.

Ignoring status updates: Controllers that fail to update status leave users blind to their resource's state. Status conditions should reflect actual observed state.

Not handling deletion: Controllers must watch for deletion events and clean up resources properly. Orphaned resources cause ghost deployments and confusion.

RBAC Pitfalls

Using the default ServiceAccount: Workloads should always use dedicated ServiceAccounts with specific permissions. The default ServiceAccount is shared by every pod in the namespace and its token is automounted by default, so any permission granted to it leaks to all workloads.

Granting cluster-admin broadly: Reserve cluster-admin for break-glass scenarios. Use namespace-scoped roles for daily operations.

Forgetting to audit: RBAC configurations drift over time. Regular audits catch permission creep.

Network Policy Pitfalls

Forgetting DNS: DNS uses port 53 on both TCP and UDP. Without DNS egress, applications cannot resolve service names and all external calls fail.

Too-permissive policies: Using podSelector: {} matches all pods in the namespace. Be specific about source and destination pods.

Assuming policy ordering: NetworkPolicies are additive, not ordered; if any policy selects a pod and allows a flow, that flow is allowed. Start from a default deny and add narrowly scoped allow rules rather than relying on rule precedence.

Storage Pitfalls

Deleting a StorageClass accidentally: Never delete a StorageClass that active claims reference. The deletion does not block; existing bound volumes keep working, but new or pending claims against the class can never be provisioned.

Not monitoring volume quotas: Running out of PV capacity blocks new PVC claims. Monitor available capacity and plan expansion.

Using ReadWriteMany incorrectly: Not all volume plugins support ReadWriteMany. Using it with unsupported backends causes mount failures.

Quick Recap

Before shipping your Kubernetes configuration to production, run through these operational checklists:

Controllers and Operators

  • Custom controller implements exponential backoff on reconciliation failures
  • Status conditions reflect actual observed state
  • Deletion handling cleans up orphaned resources
  • CRD schema validation covers required fields
  • Operator handles backup and restore scenarios

RBAC and Security

  • No workloads use default ServiceAccount
  • ServiceAccount permissions follow least privilege
  • ClusterRoleBindings reviewed quarterly
  • ServiceAccount tokens rotated regularly
  • etcd encryption at rest enabled

Network Policies

  • Default deny-all policy applied in each namespace
  • DNS egress (port 53 TCP/UDP) allowed in all policies
  • Policies reviewed for unintended allows (policies are additive, not ordered)
  • podSelector is specific, not {}

Storage

  • StorageClass has allowVolumeExpansion: false initially
  • No active PVCs before deleting a StorageClass
  • Volume capacity monitored and expanded proactively
  • Correct access mode (ReadWriteOnce vs ReadWriteMany) used

Scheduling

  • PodDisruptionBudgets set for critical deployments
  • Resource requests and limits defined for all containers
  • Taints and tolerations documented
  • Node affinity and anti-affinity configured for workload distribution

Observability

  • Control plane metrics collected (API latency, etcd WAL fsync, scheduler latency)
  • Audit logs enabled and forwarded to central storage
  • Alerts configured for API server unavailability, etcd leadership changes, node not ready
  • Application metrics (CPU, memory, replica count) monitored

Summary

Key Takeaways

  • Custom controllers and operators encode operational expertise but add complexity only justified for stateful, domain-specific applications
  • RBAC follows the principle of least privilege: grant only the permissions actually needed
  • Network policies implement defense in depth; always include DNS egress rules
  • StorageClasses enable dynamic provisioning but require careful capacity planning
  • Pod disruption budgets protect availability during voluntary disruptions
  • Taints and tolerations control pod placement across node types
  • Observability across metrics, logs, and alerts is essential for production reliability

Production Readiness Checklist

# RBAC
kubectl get rolebindings,clusterrolebindings -A | grep -v system:
kubectl auth can-i --list --as=system:serviceaccount:production:my-app-sa

# Network Policies
kubectl get networkpolicies -A
kubectl describe networkpolicy <name> -n production

# Storage
kubectl get pvc -A | grep -v Bound
kubectl get storageclass

# Pod Security
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext.runAsNonRoot}{"\n"}{end}' -n production

# Controllers
kubectl get events --sort-by='.lastTimestamp' -n production | tail -50

Trade-off Summary

| Pattern | Best For | Complexity | Operational Burden |
| --- | --- | --- | --- |
| Built-in controllers | Standard workloads | Low | Minimal |
| Custom controllers | Domain-specific automation | Medium | Medium |
| Operators (kubebuilder) | Complex lifecycle management | High | High |
| Operators (operator-sdk) | Existing Go projects | High | High |
| RBAC only | Simple permissions | Low | Minimal |
| OPA Gatekeeper | Policy enforcement | Medium | Medium |
| Kyverno | Policy as YAML | Low | Low |

Observability Checklist

Comprehensive monitoring helps you catch issues before they become outages.

Metrics to Collect

graph LR
    A[Control Plane Metrics] --> B[API Server]
    A --> C[etcd]
    A --> D[Controller Manager]
    A --> E[Scheduler]
    F[Node Metrics] --> G[Kubelet]
    F --> H[Container Runtime]
    F --> I[Kube Proxy]
    J[Workload Metrics] --> K[Pod CPU Memory]
    J --> L[Deployment Replicas]
    J --> M[PV Usage]

Control plane metrics:

  • API server request latency and error rates
  • etcd disk I/O and WAL fsync latency
  • Controller reconciliation duration and error counts
  • Scheduler pod placement latency

Prometheus queries for control plane health:

# API server request error rate (5xx errors)
sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[5m])) / sum(rate(apiserver_request_total{job="apiserver"}[5m]))

# etcd WAL fsync latency (p99)
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job="etcd"}[5m]))

# Controller reconciliation duration (p99)
histogram_quantile(0.99, rate(workqueue_work_duration_seconds_bucket{job="kube-controller-manager"}[5m]))

# Scheduler pod placement latency (p99)
histogram_quantile(0.99, rate(scheduler_pod_scheduling_duration_seconds_bucket{job="kube-scheduler"}[5m]))

# API server request latency by verb (p99)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver"}[5m])) by (verb, le))

# etcd compaction duration
rate(etcd_debugging_compaction_duration_seconds_sum{job="etcd"}[5m])

# Leader election rate (high rate means instability)
rate(etcd_server_leader_changes_seen_total{job="etcd"}[5m])

Node metrics:

  • Kubelet working set and eviction thresholds
  • Container runtime CPU and memory usage
  • Network bytes sent/received per pod

Application metrics:

  • Pod CPU and memory actual usage vs requests
  • Deployment replica count vs desired
  • Persistent volume usage percentage
  • Custom resource status conditions

Logs to Capture

graph TD
    A[Application Logs] --> D[Stdout stderr]
    B[System Logs] --> E[Kubelet]
    B --> F[Container Runtime]
    C[Kubernetes Logs] --> G[API Server]
    C --> H[Controller Manager]
    C --> I[Scheduler]
    J[Audit Logs] --> K[API Requests by User]
    J --> L[Policy Violations]

  • Aggregate all logs to a central location (Loki, ELK, CloudWatch)
  • Include Kubernetes metadata: namespace, pod name, container name
  • Capture Kubernetes events for resource lifecycle changes
  • Store audit logs for compliance and security investigations

Alerts to Configure

Critical (immediate response required):

  • API server unavailable for more than 1 minute
  • etcd high latency or leadership elections
  • Node not ready for more than 2 minutes
  • Pod evictions occurring due to resource pressure

Warning (investigate soon):

  • Pod restart loop (CrashLoopBackOff)
  • Deployment replica count below desired
  • Persistent volume usage above 80%
  • Certificate expiration within 30 days

Security Checklist

RBAC Security

  • Review all ClusterRoleBindings and RoleBindings quarterly
  • Use ServiceAccounts instead of user credentials for workloads
  • Implement least-privilege: only grant required permissions
  • Use kubectl auth can-i --list to audit effective permissions
  • Rotate ServiceAccount tokens regularly

Network Security

  • Apply default deny NetworkPolicies in each namespace
  • Explicitly allow only required traffic paths
  • Always include DNS egress in network policies
  • Use Kubernetes DNS for service discovery (not hardcoded IPs)
  • Consider service mesh for mTLS between services

Pod Security

graph LR
    A[Pod Security] --> B[Run as non-root]
    A --> C[ReadOnly root filesystem]
    A --> D[Drop all capabilities]
    A --> E[No privileged containers]
    A --> F[Resource limits set]

  • Run containers as non-root user (securityContext.runAsNonRoot: true)
  • Use read-only root filesystem when possible (securityContext.readOnlyRootFilesystem: true)
  • Drop all capabilities and add only required ones (securityContext.capabilities.drop)
  • Set resource requests and limits to prevent resource starvation
  • Disable host PID and network namespace sharing (spec.hostPID: false, spec.hostNetwork: false; these are pod-level fields, not securityContext)
  • Use Pod Security Standards or OPA Gatekeeper for policy enforcement (a combined example follows below)
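
Pulled together, a hardened pod spec implementing this checklist might look like the following sketch (names and values illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  namespace: production
spec:
  hostPID: false
  hostNetwork: false
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi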

Secret Management

  • Never put secret values in ConfigMaps or plain-text manifests; Kubernetes Secrets are only base64-encoded, not encrypted, unless etcd encryption at rest is enabled
  • Use external secrets solutions such as External Secrets Operator or HashiCorp Vault (see the sketch below)
  • Enable encryption at rest for etcd
  • Rotate secrets regularly and have a revocation plan
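
As one sketch of the external-secrets approach, an ExternalSecret syncs a Vault entry into a regular Kubernetes Secret; the store name and secret paths here are hypothetical:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: db-credentials       # Kubernetes Secret created/updated by the operator
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/production/db
        property: password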

Interview Questions

1. How does a custom controller in Kubernetes differ from a built-in controller like the Deployment controller?

Expected answer points:

  • Built-in controllers (Deployment, StatefulSet, etc.) handle Kubernetes native resources and are part of the kube-controller-manager
  • Custom controllers extend Kubernetes by watching CRDs or native resources and reconciling toward a desired state you define
  • Custom controllers require you to implement the reconcile loop logic, while built-in controllers are pre-built
  • Custom controllers run as pods in the cluster, typically managed by a Deployment for high availability
  • The controller-runtime library simplifies building custom controllers with caching, client connections, and leader election built in

2. What is the difference between a Custom Resource Definition (CRD) and a built-in Kubernetes resource?

Expected answer points:

  • CRDs extend the Kubernetes API to define new resource types without modifying the core Kubernetes binary
  • Built-in resources (Pod, Deployment, Service) are compiled into the Kubernetes API server
  • CRDs are stored in etcd just like native resources, but validation is defined via OpenAPIV3Schema
  • CRDs support versioning to enable zero-downtime schema migrations (storage vs served versions)
  • Controllers can watch CRDs and reconcile them the same way they watch native resources

3. Explain the reconcile loop pattern in custom controllers.

Expected answer points:

  • The reconcile loop is a control loop that continuously works toward the desired state
  • Steps: Watch for changes → Fetch current state → Compare with desired → Take action → Update status → Repeat
  • Should implement exponential backoff to avoid overwhelming the API server during failures
  • Must handle deletion properly by cleaning up finalizers and orphaned resources
  • The controller-runtime library handles caching and provides a structured Reconcile method

4. When would you build a custom operator instead of using a generic controller?

Expected answer points:

  • Operators encode domain-specific operational knowledge into the reconciliation logic
  • Use operators for stateful applications with complex lifecycles (backups, upgrades, failover)
  • Use custom controllers for standard reconcile patterns without app-specific logic
  • Operators are typically built with Operator SDK or kubebuilder (or as Ansible/Helm-based operators), which provide scaffolding, testing, and OLM packaging
  • If the application needs specialized domain logic beyond standard create/update/delete, build an operator

5. How does RBAC work in Kubernetes, and what are the four key concepts?

Expected answer points:

  • The four key concepts are: subjects (who), verbs (what actions), resources (what objects), and namespaces (where)
  • Role and RoleBinding are namespace-scoped; ClusterRole and ClusterRoleBinding are cluster-wide
  • Subjects can be Users, Groups, or ServiceAccounts
  • Common verbs: get, list, watch, create, update, patch, delete
  • Use kubectl auth can-i to test permissions before applying RBAC changes

6. What are Network Policies in Kubernetes, and why are they important?

Expected answer points:

  • Network policies restrict traffic between pods; by default, all pods can reach all other pods
  • They implement defense in depth by controlling both ingress and egress traffic
  • Use podSelector to target specific pods and namespaceSelector to allow traffic from specific namespaces
  • Always include DNS egress (port 53 TCP/UDP) or applications cannot resolve service names
  • NetworkPolicies are additive, so start with a default deny-all policy and explicitly allow required traffic

7. What is the difference between PodDisruptionBudget minAvailable and maxUnavailable?

Expected answer points:

  • minAvailable specifies the minimum number of pods that must remain available during disruptions
  • maxUnavailable specifies the maximum number of pods that can be unavailable simultaneously
  • minAvailable is absolute number or percentage; maxUnavailable is typically percentage
  • Use minAvailable when you need guaranteed capacity (like API servers)
  • Use maxUnavailable when you want to allow some flexibility during cluster maintenance

8. How do Taints and Tolerations work together to control pod placement?

Expected answer points:

  • Taints are applied to nodes to repel pods that do not have matching tolerations
  • Tolerations are applied to pods to allow them to be scheduled on tainted nodes
  • Taint effects: NoSchedule (hard reject), PreferNoSchedule (soft reject), NoExecute (evicts already-running pods that lack a matching toleration)
  • A pod without a matching toleration cannot be scheduled on a tainted node
  • Use case: reserving nodes for ML workloads by tainting GPU nodes and tolerating only ML training pods

9. What are the key differences between Helm and Kustomize for Kubernetes configuration management?

Expected answer points:

  • Helm uses a templating model with values.yaml; Kustomize uses an overlay model with kustomization.yaml
  • Helm charts are versioned packages; Kustomize is patch-based configuration management
  • Helm is better for third-party software (Prometheus, nginx) where charts already exist
  • Kustomize is better for custom applications with environment-specific variants (dev, staging, prod)
  • Both integrate well with GitOps tools like ArgoCD

10. How does a StatefulSet differ from a Deployment, and when would you use each?

Expected answer points:

  • StatefulSets provide stable network identity (persistent pod names) and stable storage
  • Deployments are for stateless applications with interchangeable replicas
  • StatefulSets create pods with predictable, stable hostnames in order (e.g., kafka-0, kafka-1, kafka-2)
  • StatefulSets require VolumeClaimTemplates for persistent storage per pod instance
  • Use StatefulSets for databases, message queues, and other stateful applications needing stable identity

11. What are the key components of the Kubernetes control plane and how do they interact?

Expected answer points:

  • API Server is the gateway for all cluster operations, validating and persisting configurations to etcd
  • etcd stores the cluster state as a distributed key-value store
  • Controller Manager runs control loops that reconcile actual state toward desired state
  • Scheduler places pods on nodes based on resource requirements and constraints
  • Controllers watch the API server for changes and take corrective action to achieve desired state

12. How does the controller-runtime library simplify custom controller development?

Expected answer points:

  • controller-runtime provides caching, client connections, and leader election out of the box
  • It offers a structured Reconcile method with context for handling reconciliation requests
  • The library handles watch mechanisms for resources, reducing boilerplate code
  • It integrates with the manager pattern for running multiple controllers together
  • Error handling and exponential backoff can be implemented within the Reconcile loop

13. What is the difference between a ClusterRole and a Role in Kubernetes RBAC?

Expected answer points:

  • Role and RoleBinding are namespace-scoped, limited to permissions within a specific namespace
  • ClusterRole and ClusterRoleBinding are cluster-wide, with access to nodes, persistent volumes, and cluster-scoped resources
  • ClusterRoles can also grant access to non-resource URLs like /metrics
  • Use Roles for namespace-limited operations, ClusterRoles for cluster-wide or node-level access
  • RoleBindings can reference either a Role or a ClusterRole (scoping its permissions to one namespace); ClusterRoleBindings can only reference ClusterRoles

14. How do you handle CRD versioning for zero-downtime migrations?

Expected answer points:

  • Define multiple versions in the CRD with the storage version flag indicating what persists to etcd
  • Set served: true for versions you want the API server to handle
  • Clients request a specific version through the API path (for example, /apis/example.com/v1) during rolling upgrades
  • Implement conversion webhooks if schema changes between versions require data transformation
  • Keep the old version served while transitioning traffic, then remove after migration

15. What are the benefits and trade-offs of using a service mesh like Istio?

Expected answer points:

  • Benefits: Automatic mTLS between services, traffic management (canary deployments, circuit breakers), observability without app changes
  • Trade-offs: Adds overhead (sidecar proxies consume memory and latency), increases operational complexity
  • Requires careful resource allocation for sidecar proxies
  • Debugging can be harder with the additional network hop layer
  • Consider lighter alternatives like Cilium for simpler networking needs

16. How does Vertical Pod Autoscaler (VPA) work and when would you use it?

Expected answer points:

  • VPA analyzes actual resource usage patterns and recommends or applies updated resource requests
  • It can operate in Off mode (recommendations only), Auto mode (applies by evicting and recreating pods), or Initial mode (at pod creation only)
  • VPA helps right-size resources based on real usage rather than estimates
  • Combine VPA with cluster autoscaler for both pod and node-level scaling
  • Be aware that VPA cannot be used simultaneously with HPA on the same metrics

17. What is the purpose of PodDisruptionBudgets and how do they protect availability?

Expected answer points:

  • PDBs ensure minimum availability during voluntary disruptions like cluster upgrades or node maintenance
  • minAvailable specifies the minimum number of pods that must remain available
  • maxUnavailable specifies the maximum number of pods that can be unavailable (typically percentage)
  • Kubernetes will block node drain operations that violate PDBs
  • Set realistic values based on application capacity requirements

18. How do StorageClasses enable dynamic volume provisioning?

Expected answer points:

  • StorageClasses define storage provisioners (like kubernetes.io/gce-pd or cloud-provider-specific ones)
  • They parameterize storage types (SSD vs HDD, regional vs zonal) for different performance tiers
  • WaitForFirstConsumer delays volume binding until a pod actually uses the claim, allowing zone co-location
  • allowVolumeExpansion: true allows PVCs to be expanded without recreating the volume
  • When a PVC requests storage, the provisioner creates the volume dynamically based on the StorageClass

19. What operational considerations are unique to StatefulSets compared to Deployments?

Expected answer points:

  • StatefulSets scale one pod at a time, maintaining ordering guarantees
  • Each pod gets a stable, predictable hostname (e.g., kafka-0, kafka-1)
  • VolumeClaimTemplates create unique PVCs per pod instance that persist across rescheduling
  • Stateful applications like databases require careful scaling procedures (e.g., Kafka broker sync times)
  • Init containers can handle bootstrap sequences that must complete before the next pod is created

Further Reading

Books

  • Kubernetes in Action by Marko Lukša — Comprehensive coverage from basics to advanced topics including controllers and operators
  • Cloud Native Infrastructure by Justin Garrison and Kris Nova — Practical guide to running Kubernetes in production

Articles & Guides

Blog Posts

Conclusion

Advanced Kubernetes topics build on the fundamentals of containers and orchestration. Custom controllers and operators let you encode domain knowledge into automated workflows. RBAC and network policies enforce security boundaries. Storage classes and resource quotas ensure predictable cluster operation.

These patterns emerge from real production experience. Start with the basics of your applications, understand the failure modes, and apply the patterns that solve your specific problems.

For packaging your Kubernetes applications, the Helm Charts guide covers templating and release management. If you are building observability into your cluster, check out our Distributed Tracing and Prometheus & Grafana guides for monitoring setup.
