Helm Versioning and Rollback: Managing Application Releases

Master Helm release management—revision history, automated rollbacks, rollback strategies, and handling failed releases gracefully.

published: reading time: 20 min read author: GeekWorkBench

Introduction

Every Helm upgrade creates a new numbered revision. If the upgrade breaks something, you roll back to a previous revision with a single command. This revision history is stored as Kubernetes secrets in the release namespace and survives across cluster restarts, making rollback a reliable recovery mechanism when deployments go wrong.

Rollback matters in production because not every failed deployment is caught by pre-deployment testing. A config change that seemed safe in staging may interact unexpectedly with production data. A new image tag may have a subtle bug that only manifests under production load. When this happens, you need to restore the previous working state quickly, and Helm rollback lets you do that without manually reverting changes in Git or applying old manifests.

This guide covers Helm release revision history, manual rollback procedures, automated rollback in CI/CD pipelines, and hooks for pre and post-upgrade tasks. You will learn when rollback is the right tool and when database migrations or forward fixes are required instead. By the end, you will be able to inspect release history, execute rollbacks safely, implement automated rollback with the atomic flag, and design upgrade processes that survive real-world failures.

When to Use / When Not to Use

When Helm rollback makes sense

Helm rollback is the right tool when you have an intact release history and the previous state is known and safe. Config changes that broke behavior, a bad image tag, and canary deployments that revealed problems are all good rollback candidates. The --atomic flag makes automated rollback in CI/CD practical for these cases.

Helm rollback also works well when your application is stateless or when storage resources are not affected by the change. A bad Deployment image tag or a misconfigured ConfigMap rolls back cleanly.

When rollback is not enough

If your upgrade included database migrations that modified schema, rollback does not undo those changes. PostgreSQL migrations that added columns with non-null constraints, MongoDB schema changes, and any irreversible data transformation cannot be fixed by rolling back the Kubernetes resources.

In these cases, you need forward-fix migrations, not rollback. Design your upgrade process to handle this before deploying.

Rollback Decision Flow

flowchart TD
    A[Deployment fails<br/>or degraded] --> B{Upgrade or<br/>Config change?}
    B -->|Config| C{Rollback<br/>fixes it?}
    B -->|Migration| D[Forward-fix<br/>required]
    C -->|Yes| E[helm rollback<br/>to previous revision]
    C -->|No| D
    E --> F{Stateful<br/>resources affected?}
    F -->|Yes| G[Backup data<br/>then rollback]
    F -->|No| H[Rollback safe<br/>proceed]

Release Revision History

Helm stores release history in Kubernetes secrets within the release namespace. Each time you install, upgrade, or roll back, Helm creates a new revision.

# List releases
helm list

# Show the revision history of a release
helm history myapp

REVISION  UPDATED              STATUS      CHART        DESCRIPTION
1         2026-03-20 10:30:00  superseded  myapp-1.0.0  Install complete
2         2026-03-21 14:15:00  superseded  myapp-1.1.0  Upgrade complete
3         2026-03-22 09:00:00  deployed    myapp-1.2.0  Upgrade complete

# Get detailed release status
helm status myapp

# Show release values at any revision
helm get values myapp --revision 2

# Show everything stored for a revision: values, hooks, manifest, and notes
helm get all myapp --revision 3

History is stored as Kubernetes secrets:

kubectl get secrets -n mynamespace -l "owner=helm"

NAME                          TYPE                 DATA   AGE
sh.helm.release.v1.myapp.v1   helm.sh/release.v1   1      5d
sh.helm.release.v1.myapp.v2   helm.sh/release.v1   1      4d
sh.helm.release.v1.myapp.v3   helm.sh/release.v1   1      3d
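
To look inside one of these secrets, the payload has to be unwrapped first. The sketch below assumes the Helm 3 storage format, where the release record is gzipped JSON that Helm base64-encodes before Kubernetes base64-encodes it again as Secret data. The function name decode_helm_release is made up for this example, and the secret name and namespace in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Decode the payload of a Helm 3 release Secret into JSON.
# Reads the value of .data.release on stdin.
decode_helm_release() {
  # First pass undoes the Kubernetes Secret encoding, second pass undoes
  # Helm's own base64 layer, and gunzip unpacks the JSON release record.
  base64 -d | base64 -d | gunzip -c
}

# Usage (secret name and namespace are placeholders):
#   kubectl get secret sh.helm.release.v1.myapp.v3 -n mynamespace \
#     -o jsonpath='{.data.release}' | decode_helm_release | head -c 300
```

The decoded record is large because it holds the rendered manifest along with the chart and values, so piping it through head or a JSON tool is usually more practical than printing it whole.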

Manual Rollback Procedures

Rolling back to a previous revision takes a single command:

# Rollback to previous revision
helm rollback myapp

# Rollback to specific revision
helm rollback myapp 2

# Rollback with timeout
helm rollback myapp 1 --timeout 5m

# Dry-run rollback
helm rollback myapp 1 --dry-run

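Helm performs the rollback by re-applying the manifests stored for that revision and recording the result as a new revision, so rolling back from revision 3 to revision 1 produces revision 4. This is not a reverse diff but a replay of the stored state.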

# Wait until the rolled-back resources are ready
helm rollback myapp 1 --wait

# Rollback with cleanup on failure
helm rollback myapp 1 --cleanup-on-fail

The --cleanup-on-fail flag allows Helm to delete new resources that were created during the rollback if the rollback itself fails.
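
A rollback command that exits cleanly is worth confirming against the release status. The helper below is a minimal sketch, not a standard Helm feature: the function name rollback_and_verify and the 5m timeout are assumptions, and it relies on the "STATUS:" line that helm status prints in its default output.

```shell
#!/usr/bin/env bash
# Roll a release back and confirm Helm reports it as deployed.
# Args: release, namespace, [revision]. Revision 0 (the default here)
# tells helm rollback to use the previous revision.
rollback_and_verify() {
  local release="$1"
  local namespace="$2"
  local revision="${3:-0}"

  # --wait blocks until the restored workloads are ready or the timeout hits.
  helm rollback "$release" "$revision" -n "$namespace" --wait --timeout 5m || return 1

  # Succeed only if the release ended up in the "deployed" state.
  helm status "$release" -n "$namespace" | grep -q '^STATUS: deployed$'
}

# Usage (release and namespace are placeholders):
#   rollback_and_verify myapp production 2 || echo "rollback needs attention"
```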

Automated Rollback in CI/CD

In automated pipelines, you need criteria for when to trigger a rollback:

# GitHub Actions workflow excerpt
- name: Deploy to production
  run: |
    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values ./config/production.yaml \
      --wait --timeout 10m \
      --atomic

  env:
    KUBECONFIG: /tmp/kubeconfig

- name: Verify deployment
  run: |
    # Check rollout status
    kubectl rollout status deployment/myapp -n production

    # Run smoke tests
    curl -f https://myapp.example.com/health || exit 1

    # Check metrics (the Prometheus URL is a placeholder for your own server)
    promtool query instant http://prometheus:9090 \
      "app:http_requests_total{app='myapp'}" > /dev/null

The --atomic flag automatically rolls back on failure:

helm upgrade --install myapp ./charts/myapp \
  --atomic \
  --timeout 5m

Setting --atomic also enables --wait. If the upgrade fails or pods do not become ready within the timeout, Helm automatically rolls back to the previous successful revision.

Prometheus-based rollback:

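# deploy-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deploy-myapp
  namespace: cicd
spec:
  # Do not retry the whole deployment if the pod exits non-zero
  backoffLimit: 0
  template:
    spec:
      # Jobs must set restartPolicy to Never or OnFailure
      restartPolicy: Never
      containers:
        - name: deploy
          # Illustrative image name; the image must provide helm, curl, jq, and awk
          image: helm/helm:3.14
          command:
            - /bin/bash
            - -c
            - |
              # Deploy
              helm upgrade --install myapp ./charts/myapp \
                --wait --timeout 10m || exit 1

              # Wait for metrics
              sleep 60

              # Check error rate (prometheus:9090 is a placeholder address)
              ERROR_RATE=$(curl -sf http://prometheus:9090/api/v1/query \
                --data-urlencode "query=sum(rate(http_errors_total{app='myapp'}[5m]))" \
                | jq -r '.data.result[0].value[1] // "0"')

              # [ ] cannot compare decimals, so awk does the comparison
              if awk -v r="$ERROR_RATE" 'BEGIN { exit !((r + 0) > 0.01) }'; then
                echo "Error rate too high: $ERROR_RATE"
                helm rollback myapp --wait
                exit 1
              fi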
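
Threshold checks on error rates involve decimal values, which the shell's test builtin cannot compare: inside [ ], a bare > is parsed as an output redirection, and -gt accepts integers only. One way to handle this, sketched here with a hypothetical helper name, is to delegate the comparison to awk.

```shell
#!/usr/bin/env bash
# Return 0 when value is strictly greater than threshold, 1 otherwise.
# awk performs the numeric comparison because [ ] only handles integers.
exceeds_threshold() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !((v + 0) > (t + 0)) }'
}

# Usage:
#   if exceeds_threshold "$ERROR_RATE" 0.01; then
#     helm rollback myapp
#   fi
```

Adding zero to each variable forces awk to treat the inputs as numbers, so an empty or non-numeric value is read as 0 rather than compared as a string.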

Hooks for Pre/Post Upgrade Tasks

Helm hooks run arbitrary Kubernetes jobs at specific points in the release lifecycle:

# templates/backup-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-pre-upgrade-backup"
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: backup
          image: postgres-client:15
          command:
            - /bin/bash
            - -c
            - |
              pg_dump -h postgres -U app "$DATABASE" > /backups/pre-upgrade.sql

# templates/migration-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-migration"
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migrator:1.5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-db-credentials
                  key: connection-string
      restartPolicy: OnFailure

Hook weights control execution order. Helm sorts hooks of the same type by weight in ascending order, so lower weights run first for both pre-upgrade and post-upgrade hooks. Weights can be negative.

Delete policies:

  • hook-succeeded: Delete job after successful completion
  • hook-failed: Delete job after failed execution
  • before-hook-creation: Delete the existing job before creating a new one (the default when no policy is set)

Rollback Failure Scenarios

Some situations prevent straightforward rollbacks:

Resources removed in a later revision:

If the target revision contains resources that a later revision deleted, rollback recreates them from the stored manifest. The objects come back, but any data or state they held does not.

Storage resource changes:

Storage classes and persistent volumes may not be rollback-able depending on their provisioner.

API version deprecations:

If the manifests stored in the target revision use an API version that the cluster no longer serves, for example after a Kubernetes upgrade, the API server rejects them and the rollback cannot proceed.

Manual edits:

If someone manually edited Kubernetes resources managed by Helm, rollback may conflict with those changes.

# Force rollback (may leave resources in unexpected state)
helm rollback myapp 1 --force

# Check what will happen before rolling back
# (helm template has no --revision flag; helm get manifest reads the stored revision)
helm get manifest myapp --revision 1 > /tmp/rollback.yaml
kubectl diff -f /tmp/rollback.yaml

Best Practices for Production

Always use --wait and --timeout:

helm upgrade myapp ./charts/myapp \
  --wait \
  --timeout 10m \
  --cleanup-on-fail

Keep reasonable history:

# Limit the number of revisions Helm keeps per release (the default is 10)
helm upgrade myapp ./charts/myapp --history-max 10

Test rollbacks in staging:

# In your staging pipeline
- name: Rollback test
  run: |
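    # Deploy current version and wait until it is ready
    helm upgrade --install myapp ./charts/myapp --namespace staging --wait --timeout 5m

    # Rollback immediately (needs at least one earlier revision in staging)
    helm rollback myapp --namespace staging --wait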

    # Verify rollback completed
    helm history myapp --namespace staging
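
Printing the history shows the result to a human but does not fail the pipeline if the rollback never happened. A stricter check can be sketched as follows; the function name is hypothetical, and it assumes the default table output of helm history, where revisions are listed oldest first and a rollback is described as "Rollback to N".

```shell
#!/usr/bin/env bash
# Succeed when the newest revision of a release was created by a rollback.
# Args: release, namespace.
latest_revision_is_rollback() {
  # The last line of the default table output is the most recent revision.
  helm history "$1" --namespace "$2" | tail -n 1 | grep -q 'Rollback to [0-9]'
}

# Usage in a pipeline step (release and namespace are placeholders):
#   latest_revision_is_rollback myapp staging || exit 1
```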

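Monitor rollback events. Helm does not export Prometheus metrics itself, so helm_release_rollback_total below assumes an exporter or a pipeline step that publishes it: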

# Alerting rule
- alert: HelmRollback
  expr: |
    increase(helm_release_rollback_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "Helm rollback detected"
    description: "Release {{ $labels.name }} rolled back in namespace {{ $labels.namespace }}"

Rollback Trade-offs

Helm rollback is not always the right tool. Here is how it compares to other recovery strategies.

| Scenario | Helm Rollback | Forward Fix | Blue-Green Deploy |
| --- | --- | --- | --- |
| Config change broke things | Fast (single command) | Takes longer | Fast (switch traffic) |
| Bad deployment image tag | Fast | Rebuild and redeploy | Fast switch back |
| Database migration failure | Dangerous (schema already changed) | Correct approach | Roll traffic, fix migration |
| CRD changes | May not work | Often required | Depends on change type |
| Data loss risk | High if storage affected | Lower with proper backup | Low |

The key question: does the previous revision actually represent a safe state? If your upgrade modified persistent data, rolling back may not undo those changes.

Observability Hooks

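Track rollback health and deployment reliability with these monitoring practices. The helm_* metric names in this section are illustrative: Helm ships no metrics endpoint, so they must come from an exporter or from your CI/CD pipeline.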

Key metrics to track:

# Rollback frequency by release
sum(rate(helm_release_rollback_total[5m])) by (name, namespace)

# Failed upgrades that triggered atomic rollback
sum(rate(helm_upgrade_failed_total[5m])) by (name, namespace)

# Release revision count (high count = unstable releases)
count(helm_release_info{owner="helm"}) by (name, namespace)

# Time since last successful deployment
time() - helm_release_last_deployed_timestamp

Alert rules for Helm releases:

# Alert when a rollback occurs
- alert: HelmRollbackExecuted
  expr: increase(helm_release_rollback_total[5m]) > 0
  labels:
    severity: warning
  annotations:
    summary: "Helm rollback executed for {{ $labels.name }}"
    description: "Release {{ $labels.name }} in {{ $labels.namespace }} was rolled back. Investigate the cause."

# Alert when release is in failed state
- alert: HelmReleaseFailed
  expr: helm_release_info{status="failed"} == 1
  labels:
    severity: critical
  annotations:
    summary: "Helm release {{ $labels.name }} is in failed state"
    description: "Release has been failing. Manual intervention may be required."

# Alert on high revision count (unstable release)
- alert: HelmReleaseUnstable
  expr: count(helm_release_info) by (name, namespace) > 10
  labels:
    severity: warning
  annotations:
    summary: "Release {{ $labels.name }} has {{ $value }} revisions"
    description: "High revision count indicates frequent changes or rollbacks. Review release stability."

Debugging commands:

# Get release history (--max caps the number of revisions shown)
helm history myapp --max 50

# See what changed between revisions (requires the helm-diff plugin)
helm diff revision myapp 2 3

# Check current release status
helm status myapp

# Get all values at a specific revision
helm get values myapp --revision 3

# View the manifest stored for revision 1 (what a rollback to it would apply)
helm get manifest myapp --revision 1 > /tmp/rev1.yaml
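
Where installing a plugin is not an option, two revisions can be compared using only helm get manifest and diff. This is a rough sketch with a hypothetical function name; it compares rendered manifests only, not values or hooks.

```shell
#!/usr/bin/env bash
# Show a unified diff between the stored manifests of two revisions.
# Args: release, from-revision, to-revision.
# Exit status follows diff: 0 = identical, 1 = different, 2 = error.
diff_revisions() {
  local release="$1"
  local from="$2"
  local to="$3"
  local tmp rc

  tmp=$(mktemp -d) || return 2
  helm get manifest "$release" --revision "$from" > "$tmp/from.yaml" || { rm -rf "$tmp"; return 2; }
  helm get manifest "$release" --revision "$to" > "$tmp/to.yaml" || { rm -rf "$tmp"; return 2; }

  diff -u "$tmp/from.yaml" "$tmp/to.yaml"
  rc=$?

  # Remove the temporary copies before reporting diff's exit status.
  rm -rf "$tmp"
  return "$rc"
}

# Usage:
#   diff_revisions myapp 2 3
```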

Common Pitfalls / Anti-Patterns

Relying on rollback for database migrations

Rollback does not undo database migrations. If your upgrade runs SQL migrations and then the application fails to start, rolling back the Kubernetes Deployment does not roll back the database schema. This leaves your app in a broken state against a migrated schema.

Always design database migrations to be reversible, or design your upgrade process so that a failed migration aborts the Helm upgrade before the new application code is deployed.

Not using --atomic in automated deployments

Without --atomic, a failed upgrade leaves the release in a pending-upgrade or failed state, and the next deployment attempt may be rejected or behave unexpectedly. With --atomic, Helm rolls back automatically, so you normally end up with a working release: either the new one or the rolled-back previous one.
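
A pipeline can also check for a stuck release before it starts a new upgrade. The guard below is a sketch under the assumption that helm status prints a "STATUS:" line in its default output; both function names are made up for this example.

```shell
#!/usr/bin/env bash
# Print the state Helm reports for a release (deployed, failed,
# pending-upgrade, ...), or "absent" when helm status fails.
release_state() {
  local release="$1"
  local namespace="$2"
  local out

  out=$(helm status "$release" -n "$namespace" 2>/dev/null) || { echo absent; return 0; }
  printf '%s\n' "$out" | sed -n 's/^STATUS: //p'
}

# Return 0 when no earlier operation is still marked as pending.
safe_to_deploy() {
  case "$(release_state "$1" "$2")" in
    pending-*) return 1 ;;  # an earlier install, upgrade, or rollback never finished
    *) return 0 ;;
  esac
}

# Usage:
#   safe_to_deploy myapp production || { echo "release is stuck"; exit 1; }
```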

Manual edits between deploy and rollback

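If someone uses kubectl edit to modify a resource managed by Helm after an upgrade, the outcome of the next helm upgrade is hard to predict. Helm 3 applies a three-way merge between the previous manifest, the live object, and the new manifest, so a manual edit is overwritten where it conflicts with the chart and silently kept where it does not. It also complicates rollback since the cluster state no longer matches the stored revision.

Make such changes through chart values instead. The helm.sh/resource-policy: keep annotation does not help here: it only stops Helm from deleting a resource and does not protect manual edits from being overwritten.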

Not limiting release history

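Helm stores every revision as a Kubernetes secret. Helm 3 keeps at most 10 revisions per release by default through --history-max, but pipelines that raise the limit or set it to 0 (unlimited) let hundreds of revisions accumulate, bloating the namespace and slowing down helm list and helm history. Set a reasonable history limit and let Helm prune old revisions.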
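
To see how many revisions a release currently holds, count its secrets. This sketch assumes the owner=helm and name labels that Helm 3 sets on release secrets; the function name count_revisions is hypothetical.

```shell
#!/usr/bin/env bash
# Count the revision Secrets Helm currently stores for a release.
# Args: release, namespace.
count_revisions() {
  local release="$1"
  local namespace="$2"

  # -o name prints one "secret/<name>" line per match; grep -c counts them.
  kubectl get secrets -n "$namespace" \
    -l "owner=helm,name=$release" \
    -o name | grep -c '^secret/'
}

# Usage:
#   count_revisions myapp production
```

If the count keeps growing past what you expect, check whether the pipeline passes --history-max 0 or a large value to helm upgrade.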

Forgetting to test rollback in staging

A rollback procedure that has never been tested is unreliable. If your first rollback attempt happens during a production incident at 3am, you do not want to discover that your rollback hook has a bug.

Test rollback in staging on every significant release.

Interview Questions

1. How does Helm store release history and what resources does it create?

Expected answer points:

  • Helm stores every release as a numbered revision in Kubernetes secrets within the release namespace
  • Secrets have name format `sh.helm.release.v1.<release-name>.v<revision>` and type `helm.sh/release.v1`
  • Each install, upgrade, or rollback creates a new revision secret
  • You can view history with `helm history` and inspect specific revisions with `helm get values --revision N`
2. What is the `--atomic` flag and when should you use it?

Expected answer points:

  • `--atomic` automatically rolls back to the previous successful revision if the upgrade fails or times out
  • Ensures you always end up with a working release—either the new one or the rolled-back previous one
  • Essential for CI/CD pipelines where manual intervention during failures is impractical
  • If the upgrade fails and `--atomic` is set, Helm attempts the rollback before returning the error
3. Why is rollback dangerous for database migrations?

Expected answer points:

  • Helm rollback only re-applies Kubernetes manifests—it cannot undo database schema changes
  • If your upgrade runs SQL migrations that modify schema (e.g., adding columns with non-null constraints), those changes persist after rollback
  • Application code may fail against the migrated schema if it was designed for the old schema
  • Design migrations to be reversible, or build forward-fix procedures instead of relying on rollback
4. What are Helm hooks and how do they work with rollback?

Expected answer points:

  • Hooks run Kubernetes jobs at specific points in release lifecycle: pre-install, post-install, pre-upgrade, post-upgrade, pre-rollback, post-rollback
  • Hook weight controls execution order (lower weight runs first, for pre-hooks and post-hooks alike)
  • Delete policies: `hook-succeeded`, `hook-failed`, `before-hook-creation`
  • Pre-upgrade hooks with backup logic run before migration, enabling data preservation before risky changes
5. What happens when you rollback a release that modified storage resources?

Expected answer points:

  • Storage classes and persistent volumes may not be rollback-able depending on their provisioner
  • PVCs that were deleted in a later version are recreated as new claims on rollback; the data they held is not restored automatically
  • Some storage provisioners allow volume expansion but not contraction
  • Always check storage state before assuming rollback fully restores previous state
6. How do you automate rollback based on Prometheus metrics in CI/CD?

Expected answer points:

  • Deploy using `helm upgrade --install` with `--wait --timeout`
  • After deployment, wait for metrics to stabilize (sleep N seconds)
  • Query Prometheus for error rate or latency thresholds
  • If thresholds exceeded, run `helm rollback` and exit with failure
  • Without `--atomic`, you must explicitly check and rollback—otherwise release stays in failed state
7. What is the difference between `helm rollback` and `helm template` for rollback verification?

Expected answer points:

  • `helm rollback` actually applies changes to the cluster
  • `helm template` renders the chart locally without contacting the cluster, which shows what the current chart would produce
  • `helm diff revision A B` (helm-diff plugin) shows what changed between two revisions
  • Use `helm get manifest myapp --revision N` to preview the stored state a rollback would apply; `helm template` has no `--revision` flag
8. What are the rollback failure scenarios that require manual intervention?

Expected answer points:

  • API version deprecation: if the target revision's manifests use an API version the cluster no longer serves, rollback cannot proceed
  • Resources manually edited after upgrade—rollback conflicts with those changes
  • Stateful workload storage changes that persist despite rollback
  • `--force` flag can override but may leave resources in unexpected state
9. How should you test rollback procedures before relying on them in production?

Expected answer points:

  • Add rollback test step in staging pipeline: deploy, immediately rollback, verify rollback completed
  • Test with every significant release, not just once during initial setup
  • Verify hooks run correctly (backup jobs succeed, migration jobs complete)
  • A rollback procedure that has never been tested is unreliable when you need it at 3am
10. What metrics should you monitor for Helm release health?

Expected answer points:

  • Rollback frequency by release (increase in helm_release_rollback_total)
  • Failed upgrades that triggered atomic rollback (helm_upgrade_failed_total)
  • Release revision count (high count indicates unstable releases)
  • Time since last successful deployment
  • Alert when rollback occurs, when release is in failed state, or when revision count exceeds threshold
11. How do you handle Helm rollback when the previous revision has been deleted from the repository?

Expected answer points:

  • Helm rollback does not fetch charts from the repository — it uses the manifests stored in release secrets
  • Release secrets (sh.helm.release.v1.*) in Kubernetes contain the full rendered manifests
  • Even if the chart version is no longer in the repository, rollback can apply the stored manifests
  • Problems occur when the release secrets themselves are gone: deleted manually, or pruned because the revision fell outside `--history-max`
  • The stored release record also includes the chart and its hooks, so rollback hooks run without access to the chart repository
  • Best practice: keep chart versions available in the repository anyway, for cases where you need to re-render or redeploy rather than roll back
12. What is the difference between helm.sh/resource-policy: keep and helm.sh/hook-delete-policy?

Expected answer points:

  • `helm.sh/resource-policy: keep` tells Helm to skip deleting a resource when an uninstall, upgrade, or rollback would otherwise remove it
  • Use it for resources that must outlive the release, such as a PVC or a Secret holding data; once kept, the resource is orphaned and Helm no longer manages it
  • `helm.sh/hook-delete-policy` controls when hook resources (jobs, pods) are deleted after hook execution
  • Hook delete policies: hook-succeeded (delete if hook completed successfully), hook-failed (delete if failed), before-hook-creation (delete existing before creating new)
  • Resource policy keep is for persistent resources; hook delete policy is for temporary hook workloads
  • Both are annotations on resources — they serve different purposes and can be used together
13. How do you design Helm charts to support zero-downtime deployments with database migrations?

Expected answer points:

  • Design upgrade strategy: migrate first, then deploy new application version
  • Pre-upgrade hook runs database migration with retry logic before new pods start
  • If migration fails: hook fails, Helm upgrade stops, previous release remains deployed
  • Add init container to application that waits for migration completion before starting app
  • Blue-green deployment: deploy new version alongside old, run migration, switch traffic, delete old
  • Backward-compatible database migrations: add columns with defaults, never remove columns in same release as code change
14. What is the purpose of Helm tests and how do you write effective ones?

Expected answer points:

  • Helm tests are hook resources, typically Pods or Jobs, that verify a release is working after it has been installed or upgraded
  • Test resource must have the `helm.sh/hook: test` annotation (`test-success` is the older Helm 2 name, still accepted)
  • Tests should verify: service is reachable, health endpoint returns 200, application can read/write data
  • Run tests with `helm test myapp`; add `--logs` to print the test pod logs
  • Tests do not run automatically during install or upgrade; invoke `helm test` as a separate step once the release is ready
  • Write meaningful tests: smoke tests that catch real failures, not just "pod is running"
15. How do you handle secret rotation in Helm deployments without causing downtime?

Expected answer points:

  • Use `helm upgrade --reuse-values` with `--set` to update secret values without changing other config
  • For External Secrets Operator: secrets auto-refresh from secret store — no Helm upgrade needed
  • For sealed secrets: deploy new sealed secret, then delete old secret after verification
  • Strategy: deploy new secret alongside old, verify new secret works, remove old secret
  • Avoid `helm.sh/resource-policy: keep` on secrets — old secret references may break if recreated
  • Test rollback procedure for secret rotation — verify you can roll back to previous secret if needed
16. What are the best practices for managing Helm release naming and namespaces in production?

Expected answer points:

  • Use release name that indicates purpose: `myapp-prod`, not `myapp`
  • Install each application to its own namespace — don't mix unrelated applications
  • One release per application per namespace: multiple releases of the same chart in one namespace conflict when resource names are not derived from the release name
  • Use `--generate-name` only for temporary test installs, never in production
  • Track releases in documentation — which release is in which namespace and what it does
  • Use labels and annotations on releases to track ownership, contact, and purpose
17. How does Helm handle concurrent upgrades to the same release?

Expected answer points:

  • Helm acquires a lock on the release during upgrade — concurrent upgrades are rejected
  • Second upgrade attempt receives error: "another operation (install/upgrade/rollback) is in progress"
  • `--wait` only makes a single operation block until its resources are ready; it does not queue concurrent operations
  • In CI/CD: implement locking mechanism (flock, database mutex) to prevent concurrent deploys
  • If a deploy dies mid-operation and leaves the release in pending-upgrade state: run `helm rollback` to the last deployed revision before upgrading again
18. How do you implement canary deployments using Helm with weighted releases?

Expected answer points:

  • Helm native support for canary is limited — use Helmfile or additional tools for complex strategies
  • Simple canary: `helm upgrade --set replicaCount=1` for canary, then gradually increase
  • Use Argo Rollouts or Flagger for production-grade canary with traffic splitting
  • Argo Rollouts defines rollout strategy in CRD — separate from Helm values
  • Promote canary: update Helm release with canary values; rollback if metrics degrade
  • Monitor error rate and latency during canary period before full promotion
19. What is the difference between `helm upgrade --install` and `helm install`?

Expected answer points:

  • `helm install` creates a new release; fails if release with same name already exists
  • `helm upgrade --install` creates release if it doesn't exist, or upgrades if it does — idempotent
  • Use `--install` in CI/CD for repeatable deployments — works for both initial install and updates
  • `helm install` is useful for testing with temporary random names via `--generate-name`
  • `--install` combined with `--atomic` ensures clean state: if upgrade fails, rollback to previous release
20. How do you handle Helm release failures that leave resources in an inconsistent state?

Expected answer points:

  • Release is left in `pending-upgrade` or `failed` state if upgrade fails without `--atomic`
  • Recovery from a `failed` state: rerun `helm upgrade myapp mychart --atomic`; if the new attempt also fails, Helm rolls back to the last successful revision
  • If rollback also fails: use `helm rollback myapp --force`, which forces resource updates through delete and recreate where needed
  • Check what went wrong: `helm status myapp`, `kubectl get events`, application logs
  • For persistent failures: use `helm template` to render manifests, apply manually with `kubectl` for debugging
  • Delete release in failed state: `helm uninstall myapp --keep-history` then manually clean up orphaned resources

Conclusion

Key Takeaways

  • Helm stores every release as a numbered revision; rollback replays the stored state of that revision
  • Use --atomic for automated pipelines to guarantee clean state on failure
  • Design migrations to be reversible or build forward-fix procedures instead of relying on rollback
  • Test rollback in staging before relying on it in production
  • Monitor rollback events with Prometheus alerts to catch problems early

Rollback Checklist

# Check release history
helm history myapp

# See what values were deployed at each revision
helm get values myapp --revision 2

# Rollback to previous working revision
helm rollback myapp --wait --timeout 5m

# Rollback to specific revision
helm rollback myapp 1 --wait

# Check rollback succeeded
helm status myapp
helm history myapp

Helm revision history makes rollback straightforward in most cases. Use --atomic for automatic rollback in CI/CD, implement proper hooks for data migration, and test rollback procedures regularly in non-production environments. For more on Helm, see our Helm Charts overview, and for deployment strategies that reduce rollback need, see our CI/CD Pipelines guide. For GitOps patterns with ArgoCD, see our GitOps with ArgoCD and Flux post.
