Removing Sensitive Data from Git History

Using git filter-repo, BFG Repo-Cleaner, and git filter-branch to scrub secrets, passwords, and credentials from Git history. Step-by-step remediation guide.

Reading time: 11 min read | Updated: March 31, 2026

Introduction

Accidentally committing a secret to Git is a rite of passage for developers. API keys, database passwords, private keys, and tokens end up in commit history, where they persist even after you delete them in a subsequent commit. Git never modifies existing commits, so “deleting” a file only creates a new commit without it; the old commit that contains the secret stays reachable by anyone with access to the history.
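To make the point above concrete, here is a minimal sketch that commits a fake secret in a throwaway repository, "deletes" it, and then reads it back out of history. The directory, file name, and key value are made up for illustration.

```shell
#!/usr/bin/env bash
# Throwaway demo: a "deleted" secret is still retrievable from history.
set -euo pipefail

demo_repo="$(mktemp -d)"            # scratch location, not a real project
cd "$demo_repo"
git init -q .
git config user.name "Demo User"    # local identity so commits work anywhere
git config user.email "demo@example.com"

# Commit 1: the mistake.
echo 'API_KEY=FAKE-KEY-12345' > .env
git add .env
git commit -q -m "Add config"

# Commit 2: the "fix", which only removes the file from the latest snapshot.
git rm -q .env
git commit -q -m "Remove config"

# The working tree is clean, but the pickaxe search still finds the value
# in both the commit that added it and the commit that removed it.
hits="$(git log --all --oneline -S 'FAKE-KEY-12345')"
echo "Commits that add or remove the secret:"
echo "$hits"

# The original file can be read straight out of the first commit.
recovered="$(git show HEAD~1:.env)"
echo "Recovered from history: $recovered"
```

Anyone who can clone the repository can run the same two lookups, which is why the rest of this guide treats a committed secret as exposed.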

The only way to truly remove sensitive data from Git history is to rewrite the repository’s history, creating new commits that never contained the secret. This process changes every commit hash downstream, requiring all collaborators to re-clone the repository.

This guide covers the three main tools for history rewriting, when to use each, and the complete remediation workflow including secret rotation — because removing from history is only half the solution.

When to Use / When Not to Use

When to rewrite history:

  • Secrets committed to any branch
  • Large files accidentally committed (before migrating to LFS)
  • Compliance requirements for data removal
  • Repository cleanup before open-sourcing

When not to rewrite history:

  • For shared branches without team coordination
  • When secret rotation is sufficient (and faster)
  • If you don’t control all clones of the repository
  • As a substitute for proper secret management

Core Concepts

History rewriting creates entirely new commits:


graph TD
    OLD["Original History\nC1 → C2(secret) → C3 → C4"] -->|rewrite| NEW["New History\nC1' → C2'(clean) → C3' → C4'"]

    NEW -->|force push| REMOTE["Remote Repository\n(updated)"]
    REMOTE -->|requires| RECLONE["All collaborators\nmust re-clone"]

    SECRET["Secret in C2"] -->|rotate first| ROTATE["Rotate Secret\ninvalidate old value"]
    ROTATE -->|then| REWRITE["Rewrite History"]

Every commit hash changes because each commit includes its parent’s hash. Rewriting C2 changes C3 and C4 too.
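The hash chaining described above can be observed directly. The following sketch builds three commits in a scratch repository, recreates C2 with a different message using the plumbing command git commit-tree (standing in for any edit to C2), and then recreates C3 with an identical tree and message on top of it. The repository and file names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Fixed timestamps, so the parent link is the only thing that differs in C3.
export GIT_AUTHOR_DATE="2024-01-01 00:00:00 +0000"
export GIT_COMMITTER_DATE="2024-01-01 00:00:00 +0000"

for n in 1 2 3; do
  echo "change $n" > "file$n.txt"
  git add .
  git commit -q -m "C$n"
done
c1="$(git rev-parse HEAD~2)"
c2="$(git rev-parse HEAD~1)"
c3="$(git rev-parse HEAD)"

# Each commit object names its parent by hash.
git cat-file -p "$c3" | grep '^parent'

# Recreate C2 with a different message...
c2_new="$(git commit-tree "$c2^{tree}" -p "$c1" -m "C2 (rewritten)")"
# ...then recreate C3 with the same tree and message but the new parent.
c3_new="$(git commit-tree "$c3^{tree}" -p "$c2_new" -m "C3")"

echo "C3 before: $c3"
echo "C3 after:  $c3_new"
```

The two C3 commits point at the same tree, yet their hashes differ because the parent hash is part of the commit object.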

Architecture or Flow Diagram


flowchart TD
    DISCOVER["Discover Secret in History"] -->|immediately| ROTATE["1. Rotate the Secret\ninvalidate old value"]

    ROTATE --> CHOOSE["2. Choose Tool"]
    CHOOSE -->|recommended| FILTER_REPO["git filter-repo\nfast, modern, safe"]
    CHOOSE -->|large repos| BFG["BFG Repo-Cleaner\nfastest for large repos"]
    CHOOSE -->|last resort| FILTER_BRANCH["git filter-branch\nslow, deprecated"]

    FILTER_REPO --> REWRITE["3. Rewrite History\nremove secret from all commits"]
    BFG --> REWRITE
    FILTER_BRANCH --> REWRITE

    REWRITE --> VERIFY["4. Verify Removal\ngit log -p, secret scan"]
    VERIFY -->|confirmed| FORCE["5. Force Push\ngit push --force-with-lease"]
    VERIFY -->|still found| REWRITE

    FORCE --> NOTIFY["6. Notify Team\nre-clone required"]
    NOTIFY --> PREVENT["7. Prevent Recurrence\npre-commit hooks"]

Step-by-Step Guide / Deep Dive

Step 1: Rotate the Secret First

Before rewriting history, invalidate the compromised credential:


# The secret is already exposed — assume it's compromised
# Rotate it in the service immediately
# Then proceed with history cleanup

Step 2: Using git filter-repo

git filter-repo is the modern, recommended tool:


# Install
pip install git-filter-repo
# Or: brew install git-filter-repo

# Remove a specific file from all history
git filter-repo --path path/to/secret-file --invert-paths

# Remove files matching a pattern
git filter-repo --path-glob '*.env' --invert-paths

# Replace text in all commits
git filter-repo --replace-text <(echo 'OLD_SECRET==>REDACTED')

# Remove blobs larger than a size threshold
git filter-repo --strip-blobs-bigger-than 10M
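When more than one value has leaked, --replace-text is easier to manage with an expressions file than with process substitution. The sketch below assumes the format described in the git filter-repo manual: one rule per line, literal text by default, optional regex: or glob: prefixes, and an optional ==> followed by the replacement (a rule without ==> is replaced with ***REMOVED***). All secrets, file names, and the scratch repository are made up; the rewrite itself runs only if git-filter-repo is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# One rule per line. Every value here is fictional.
expressions_file="$(mktemp)"
cat > "$expressions_file" <<'EOF'
FAKE-KEY-12345==>REDACTED
regex:password\s*=\s*\S+==>password=REDACTED
old-internal-token
EOF

# Throwaway repository containing the fake secrets.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
printf 'API_KEY=FAKE-KEY-12345\npassword = hunter2\n' > "$repo/app.conf"
git -C "$repo" add app.conf
git -C "$repo" commit -q -m "Add config"

if command -v git-filter-repo >/dev/null 2>&1; then
  # --force is acceptable here only because this repository is disposable.
  (cd "$repo" && git filter-repo --force --replace-text "$expressions_file")
  echo "File content after the rewrite:"
  cat "$repo/app.conf"
else
  echo "git-filter-repo is not installed; rules that would be applied:"
  cat "$expressions_file"
fi
```

Keep the expressions file out of the repository being cleaned, since it contains the very values you are trying to remove.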

Step 3: Using BFG Repo-Cleaner

BFG is optimized for speed on large repositories:


# Install
brew install bfg

# Remove files by name
bfg --delete-files .env

# Remove files by size
bfg --strip-blobs-bigger-than 100M

# Replace text in all blobs
bfg --replace-text passwords.txt

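# Full run on a fresh mirror clone (the setup BFG's documentation recommends)
# By default BFG leaves the HEAD commit untouched, so delete the file in a normal commit first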
git clone --mirror https://github.com/user/repo.git
cd repo.git
bfg --delete-files .env
git reflog expire --expire=now --all
git gc --prune=now --aggressive
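Because BFG by default does not change the contents of the latest commit, it helps to make HEAD clean with an ordinary commit before taking the mirror clone. The following is a sketch of that preparation step in a scratch repository; the file names and values are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"      # stands in for your normal working clone
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'DB_PASSWORD=fake-password' > .env
echo 'print("app")' > app.py
git add .env app.py
git commit -q -m "Initial commit"

# Stop tracking the file (the local copy stays) and ignore it from now on.
git rm -q --cached .env
echo '.env' >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking .env"

# HEAD is now clean; older commits still contain the file until BFG runs.
head_files="$(git ls-tree -r --name-only HEAD)"
history_hits="$(git log --all --oneline -- .env)"
echo "Files in HEAD:"
echo "$head_files"
echo "Commits that still touch .env:"
echo "$history_hits"
```

After pushing that commit, the mirror clone and the bfg commands shown above operate on every earlier commit.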

Step 4: Verify Removal


# Search entire history for the secret
git log -p --all -S 'SECRET_VALUE'

# Search with gitleaks
gitleaks detect --log-opts="--all"

# Check that the file is truly gone
git log --all --full-history -- path/to/secret-file

# Verify no secrets remain
git rev-list --all | xargs git grep -l 'SECRET_VALUE'
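The commands above only search commits that are reachable from a ref. Old objects can linger in the object database until the reflog is expired and garbage collection runs, so a stricter check scans every blob. This is a sketch under that assumption; blobs_containing is a hypothetical helper name, and the token is fake.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the IDs of all blobs in the object database (reachable or not)
# whose content includes the given fixed string.
blobs_containing() {
  local repo="$1" needle="$2" oid type
  git -C "$repo" cat-file --batch-all-objects \
      --batch-check='%(objectname) %(objecttype)' |
    while read -r oid type; do
      # grep reads the whole blob (no -q) so cat-file never sees a closed pipe.
      if [ "$type" = "blob" ] &&
         git -C "$repo" cat-file -p "$oid" | grep -F -- "$needle" >/dev/null; then
        echo "$oid"
      fi
    done
}

# Scratch repository: commit a fake token on a side branch, then delete it.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
echo 'hello' > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "Initial commit"
base="$(git -C "$repo" symbolic-ref --short HEAD)"

git -C "$repo" checkout -q -b leak
echo 'TOKEN=FAKE-TOKEN-abc123' > "$repo/token.txt"
git -C "$repo" add token.txt
git -C "$repo" commit -q -m "Add token"
git -C "$repo" checkout -q "$base"
git -C "$repo" branch -q -D leak

# No ref reaches the token any more, but the blob is still on disk.
reachable_hits="$(git -C "$repo" log --all --oneline -S 'FAKE-TOKEN-abc123')"
before_gc="$(blobs_containing "$repo" 'FAKE-TOKEN-abc123')"

# Expire the reflog and prune unreachable objects, then scan again.
git -C "$repo" reflog expire --expire=now --all
git -C "$repo" gc -q --prune=now
after_gc="$(blobs_containing "$repo" 'FAKE-TOKEN-abc123')"

echo "Reachable hits: ${reachable_hits:-none}"
echo "Blobs before gc: ${before_gc:-none}"
echo "Blobs after gc:  ${after_gc:-none}"
```

Scanning every object is slow on large repositories, so treat it as a final check rather than a replacement for the history searches.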

Step 5: Force Push


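# git filter-repo usually removes the 'origin' remote after a rewrite.
# If `git remote -v` prints nothing, re-add it first:
#   git remote add origin https://github.com/user/repo.git

# Force push with lease (safer than --force). The lease is checked against
# your remote-tracking ref, so it needs one to exist (e.g. origin/main).
git push origin --force-with-lease

# If the branch is protected, temporarily disable protection,
# then re-enable it after the push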
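To see what the lease protects against, the sketch below sets up a local bare repository and two clones with the hypothetical names alice and bob. Bob pushes new work, Alice rewrites her history without fetching, and her --force-with-lease push is rejected because the remote no longer matches her remote-tracking ref.

```shell
#!/usr/bin/env bash
set -euo pipefail

root="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare "$root/remote.git"

# Alice creates the first commit and publishes it.
git clone -q "$root/remote.git" "$root/alice" 2>/dev/null
branch="$(git -C "$root/alice" symbolic-ref --short HEAD)"
echo one > "$root/alice/file.txt"
git -C "$root/alice" add file.txt
git -C "$root/alice" commit -q -m "Initial"
git -C "$root/alice" push -q origin "$branch"

# Bob clones, adds a commit, and pushes it.
git clone -q "$root/remote.git" "$root/bob"
echo two >> "$root/bob/file.txt"
git -C "$root/bob" commit -q -am "Bob's work"
git -C "$root/bob" push -q origin "$branch"
bob_commit="$(git -C "$root/bob" rev-parse HEAD)"

# Alice rewrites her commit without fetching Bob's push.
git -C "$root/alice" commit -q --amend -m "Initial (rewritten)"

# Her origin/<branch> still points at the old commit, so the lease fails.
lease_status=0
git -C "$root/alice" push --force-with-lease origin "$branch" 2>/dev/null ||
  lease_status=$?

remote_head="$(git -C "$root/remote.git" rev-parse "refs/heads/$branch")"
echo "force-with-lease exit status: $lease_status"
echo "remote still at Bob's commit: $remote_head"
```

A plain --force at the same point would have replaced Bob's commit without warning. Note that once Alice fetches, the lease is satisfied and the push goes through, so the flag guards against surprises, not against deliberate overwrites.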

Step 6: Notify Team


# All collaborators must:
# 1. Delete their local clone
# 2. Re-clone from the cleaned repository
# 3. NOT merge old history back

# Warning message to team:
echo "URGENT: Repository history has been rewritten due to secret exposure.
Please delete your local clone and re-clone:
  rm -rf repo && git clone https://github.com/user/repo.git
Do NOT pull or merge from your old clone."
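Collaborators can also check whether a clone still holds the old history before deleting it. The sketch below assumes you recorded the pre-rewrite hash of a commit that contained the secret (git filter-repo records old and new hashes in .git/filter-repo/commit-map). clone_state is a hypothetical helper, and both repositories are scratch stand-ins.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report whether the repository at $1 still has commit $2 in its object store.
clone_state() {
  if git -C "$1" cat-file -e "$2^{commit}" 2>/dev/null; then
    echo "contaminated"
  else
    echo "clean"
  fi
}

# Create a one-commit scratch repository and print its path.
make_repo() {
  local dir
  dir="$(mktemp -d)"
  git -C "$dir" init -q
  git -C "$dir" config user.name "Demo User"
  git -C "$dir" config user.email "demo@example.com"
  echo "$1" > "$dir/file.txt"
  git -C "$dir" add file.txt
  git -C "$dir" commit -q -m "$1"
  echo "$dir"
}

old_clone="$(make_repo "history with leaked secret")"
fresh_clone="$(make_repo "cleaned history")"
leaked_commit="$(git -C "$old_clone" rev-parse HEAD)"

for clone in "$old_clone" "$fresh_clone"; do
  echo "$clone: $(clone_state "$clone" "$leaked_commit")"
done
```

A clone reported as contaminated should be deleted and re-cloned, never pulled or merged.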

Production Failure Scenarios + Mitigations

| Scenario | Symptoms | Mitigation |
| --- | --- | --- |
| Secret in fork | Fork still contains secret | Contact fork owner; GitHub can remove if reported |
| Collaborator pushes old history | Secret reappears | All collaborators must delete old clones |
| CI/CD cached old commits | Pipeline uses old history | Clear CI caches; re-trigger pipelines |
| filter-repo refuses on non-fresh clone | "does not look like a fresh clone" error | Clone fresh (for example with git clone --mirror), then filter |
| Protected branch blocks force push | "protected branch" error | Temporarily disable protection; re-enable after |

Trade-offs

| Tool | Speed | Safety | Ease of Use |
| --- | --- | --- | --- |
| git filter-repo | Fast | High (built-in safety checks) | Moderate |
| BFG Repo-Cleaner | Fastest | High (focused on common cases) | Easy |
| git filter-branch | Slowest | Low (easy to misuse) | Hard |

Implementation Snippets


# Complete remediation workflow
# 1. Clone fresh
git clone --mirror https://github.com/user/repo.git
cd repo.git

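# 2. Remove secret files
git filter-repo --path .env --path config/secrets.yml --invert-paths

# 3. Verify (prints nothing once both paths are gone from every commit)
git log --all --oneline -- .env config/secrets.yml

# 4. Push cleaned history (URL given explicitly in case filter-repo removed 'origin')
git push --force --mirror https://github.com/user/repo.git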

# 5. Clean up
cd ..
rm -rf repo.git

# Alternative: BFG for large repos
git clone --mirror https://github.com/user/repo.git
cd repo.git
bfg --delete-files '{.env,*.pem,*.key}'
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force --mirror

Observability Checklist

  • Monitor: Secret scanning alerts after cleanup
  • Verify: Full history scan with gitleaks
  • Track: Team re-clone completion
  • Audit: Protected branch settings after force push
  • Alert: Any recurrence of the secret in new commits

Security/Compliance Notes

  • Rotating the secret is mandatory — history removal alone is insufficient
  • Assume the secret was exposed from the moment it was committed
  • Check if the secret was pushed to any public repositories or forks
  • Document the incident for compliance records
  • See Git Secrets Management for prevention

Common Pitfalls / Anti-Patterns

  • Not rotating the secret first — it may already be compromised
  • Using git filter-branch — it’s deprecated and slow
  • Not forcing collaborators to re-clone — old clones can reintroduce secrets
  • Skipping verification — always confirm the secret is truly gone
  • Not adding pre-commit hooks — prevent recurrence
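As a rough illustration of the last point, the sketch below installs a plain Git pre-commit hook (not the pre-commit framework) in a scratch repository and shows it blocking a commit. The two patterns are illustrative and far from exhaustive, and the key is the example value from AWS documentation; a dedicated scanner such as gitleaks covers far more cases.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Install a minimal hook that inspects lines added in the staged diff.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'HOOK'
#!/bin/sh
# Abort the commit if an added line looks like a credential.
patterns='AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----'
if git diff --cached -U0 | grep -E '^\+' | grep -Eq -e "$patterns"; then
  echo "pre-commit: possible secret in staged changes; commit aborted." >&2
  exit 1
fi
HOOK
chmod +x .git/hooks/pre-commit

# Attempt 1: a file containing a fake access key is rejected.
echo 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE' > creds.ini
git add creds.ini
blocked_status=0
git commit -q -m "Add creds" 2>/dev/null || blocked_status=$?

# Attempt 2: unstage it and commit something harmless instead.
git rm -q --cached creds.ini
echo 'debug = true' > settings.ini
git add settings.ini
git commit -q -m "Add settings"
commit_count="$(git rev-list --count HEAD)"

echo "blocked commit exit status: $blocked_status"
echo "commits in repository: $commit_count"
```

Client-side hooks can be bypassed with git commit --no-verify, so they reduce accidents but do not replace server-side scanning.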

Quick Recap Checklist

  • Rotate the compromised secret immediately
  • Use git filter-repo or BFG (not filter-branch)
  • Verify removal with full history scan
  • Force push with --force-with-lease
  • Notify all collaborators to re-clone
  • Install pre-commit hooks to prevent recurrence
  • Document the incident

Interview Q&A

Why is `git filter-branch` deprecated?

It is slow (a shell script that processes commits one at a time), easy to misuse (common patterns silently produce incorrect results), and it leaves backup refs under refs/original/ that keep the old objects reachable. Git's own documentation warns against using it and points to git filter-repo instead, which is typically orders of magnitude faster and has built-in safety checks.

What's the difference between `--force` and `--force-with-lease`?

--force unconditionally overwrites the remote branch. --force-with-lease only overwrites if the remote branch hasn't changed since your last fetch — it protects against accidentally overwriting someone else's pushes. Always prefer --force-with-lease.

Can you remove a secret from just one commit without rewriting all subsequent commits?

No. Because each commit's hash includes its parent's hash, changing any commit changes all of its descendants. This follows from Git's hash-linked commit graph (a Merkle DAG). The entire history from the modified commit forward must be rewritten, producing new hashes for every one of those commits.

What if the secret was in a public fork?

You can't force-push to someone else's fork. Rotate the secret immediately — that's the only reliable fix. You can request GitHub to remove the secret from their caches, and contact the fork owner. But assume the secret is permanently exposed and treat it as compromised.

History Rewrite Process (Clean Architecture)


graph TD
    OLD["Original History"] -->|filter-repo| CLEAN["Cleaned History"]
    OLD -->|BFG| CLEAN
    OLD -->|filter-branch| CLEAN_SLOW["Cleaned (slow)"]

    CLEAN -->|force push| REMOTE["Remote Updated"]
    CLEAN_SLOW -->|force push| REMOTE

    REMOTE -->|notify| TEAM["Team Re-clones"]

    OLD -.->|still exists in| CLONES["Cached Clones\nstill have secrets"]
    CLONES -.->|must be| DELETED["Deleted or Re-cloned"]

Production Failure: Incomplete Cleanup

Scenario: Cached clones still containing secrets


# What happened:
# 1. History rewritten and force-pushed
# 2. Team member pulls instead of re-cloning
# 3. Old objects still in their local .git/objects/
# 4. They push — secret reappears in remote!

# Symptoms
$ git push
# Team member's old clone pushes old commits back
$ gitleaks detect --log-opts="--all"
# Secret found again!

# Root cause: Not all clones were cleaned; old objects persist

# Recovery steps:

# 1. Identify the source of re-contamination
git log --all --oneline --source -S 'SECRET_VALUE'
# Lists commits that add or remove the value, with the ref each was reached from

# 2. Force push again with clean history
git push origin --force-with-lease

# 3. Enforce re-clone on ALL machines:
#    - Developer laptops
#    - CI/CD runners (clear workspace cache)
#    - Staging servers
#    - Backup systems

# 4. Verify no cached clones exist:
#    Check CI/CD workspace directories
#    Check developer machines (ask team)
#    Check any automated systems that clone the repo

# 5. Prevent re-contamination:
#    Add branch protection rules
#    Enable server-side secret scanning
#    Set up pre-receive hook to reject secret-containing pushes

# === CI/CD Cache Cleanup ===
# GitHub Actions
# Add to workflow:
# - name: Clean workspace
#   run: |
#     # Empty the directory rather than deleting it; the step runs inside it
#     find "$GITHUB_WORKSPACE" -mindepth 1 -delete
#     git clone "${{ github.server_url }}/${{ github.repository }}" .

# Jenkins
# Use "Wipe out repository & force clone" in pipeline

# GitLab CI
# variables:
#   GIT_STRATEGY: clone  # Not "fetch"
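The recovery steps above mention a pre-receive hook that rejects pushes containing secrets. A minimal sketch of such a hook follows, assuming a self-hosted bare repository where you control the hooks directory (hosted platforms generally expose their own push protection instead). The patterns, paths, and key are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

root="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare "$root/server.git"
mkdir -p "$root/server.git/hooks"

cat > "$root/server.git/hooks/pre-receive" <<'HOOK'
#!/bin/sh
# Runs before any ref is updated. Input: one "<old> <new> <ref>" line per ref.
zero=0000000000000000000000000000000000000000   # assumes a SHA-1 repository
patterns='AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----'
status=0
while read -r old new ref; do
  [ "$new" = "$zero" ] && continue   # ref deletion: nothing to scan
  # Scan only the commits this push would introduce to the repository.
  if git log -p --format= "$new" --not --all |
       grep -E '^\+' | grep -Eq -e "$patterns"; then
    echo "pre-receive: possible secret in push to $ref; rejected." >&2
    status=1
  fi
done
exit "$status"
HOOK
chmod +x "$root/server.git/hooks/pre-receive"

# A developer clone tries to push a commit containing a fake access key.
git clone -q "$root/server.git" "$root/dev" 2>/dev/null
branch="$(git -C "$root/dev" symbolic-ref --short HEAD)"
echo 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE' > "$root/dev/creds.ini"
git -C "$root/dev" add creds.ini
git -C "$root/dev" commit -q -m "Add creds"

push_status=0
git -C "$root/dev" push -q origin "$branch" 2>/dev/null || push_status=$?
server_refs="$(git -C "$root/server.git" for-each-ref)"

echo "push exit status: $push_status"
echo "refs on server: ${server_refs:-none}"
```

Unlike a client-side hook, this check cannot be skipped by the person pushing, which also stops an old clone from pushing the leaked commits back.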

Trade-offs: BFG vs Git Filter-repo

| Aspect | BFG Repo-Cleaner | Git Filter-repo |
| --- | --- | --- |
| Speed | Fastest (Java, optimized) | Fast (Python front end to git fast-export/fast-import) |
| Safety | High (focused use cases) | Highest (built-in safety checks) |
| Ease of use | Easy (simple commands) | Moderate (more options) |
| Maintenance | Rarely updated in recent years | Actively maintained |
| Installation | Java required | Python required |
| Text replacement | --replace-text (file-based) | --replace-text (file-based; literal, regex:, and glob: rules) |
| File removal | --delete-files | --path --invert-paths |
| Size filtering | --strip-blobs-bigger-than | --strip-blobs-bigger-than |
| Official recommendation | No | Yes (recommended in Git's filter-branch documentation) |
| Best for | Very large repos (>1GB) | General use, most scenarios |

Recommendation: Use git filter-repo for most cases. Use BFG only for very large repositories where filter-repo is too slow.

Quick Recap: Post-Cleanup Actions


# === 1. Force push (done) ===
git push origin --force-with-lease

# === 2. Notify team ===
# Send this message to all collaborators:
# "URGENT: Repository history rewritten. Delete your local clone and re-clone:
#  rm -rf repo && git clone <url>
#  Do NOT pull from your old clone."

# === 3. Rotate ALL secrets ===
# Even after cleanup, assume secrets were exposed:
# - API keys: regenerate in provider console
# - Database passwords: change and update connection strings
# - SSH keys: generate new key pairs
# - Certificates: reissue from CA

# === 4. Clear CI/CD caches ===
# GitHub Actions: Settings → Actions → Clear cache
# GitLab CI: CI/CD → Clear Runner Caches
# Jenkins: Delete workspace directories

# === 5. Check forks and mirrors ===
# - Contact fork owners to re-clone
# - Update any mirror repositories
# - Check if secrets were pushed to public repos

# === 6. Enable prevention ===
# Install pre-commit hooks (needs the pre-commit framework and a .pre-commit-config.yaml):
pre-commit install

# Enable platform secret scanning:
# GitHub: Settings → Code Security → Secret Scanning
# GitLab: Settings → Security → Secret Detection

# === 7. Document the incident ===
# Record: what was leaked, when, how it was fixed
# Update: runbooks, team training, prevention measures
# Review: why the secret was committed in the first place

# === 8. Verify complete cleanup ===
gitleaks detect --log-opts="--all"
# Should return: "No leaks found"

git log --all -p | grep -i "secret_value"
# Should return: nothing
