Removing Sensitive Data from Git History

Using git filter-repo, BFG Repo-Cleaner, and git filter-branch to scrub secrets, passwords, and credentials from Git history. Step-by-step remediation guide.

Reading time: 17 min · Author: Geek Workbench · Updated: March 31, 2026

Introduction

Accidentally committing a secret to Git is a rite of passage for developers. API keys, database passwords, private keys, and tokens end up in commit history, where they persist even after you delete them in a subsequent commit. Git stores history as immutable, content-addressed objects, so “deleting” a file only creates a new commit without it; the old commit with the secret stays reachable to anyone who has, or can fetch, the repository.

The only way to truly remove sensitive data from Git history is to rewrite the repository’s history, creating new commits that never contained the secret. This process changes every commit hash downstream, requiring all collaborators to re-clone the repository.

This guide covers the three main tools for history rewriting, when to use each, and the complete remediation workflow including secret rotation — because removing from history is only half the solution.

When to Use / When Not to Use

When to rewrite history:

  • Secrets committed to any branch
  • Large files accidentally committed (before migrating to LFS)
  • Compliance requirements for data removal
  • Repository cleanup before open-sourcing

When not to rewrite history:

  • For shared branches without team coordination
  • When secret rotation is sufficient (and faster)
  • If you don’t control all clones of the repository
  • As a substitute for proper secret management

Core Concepts

History rewriting creates entirely new commits:


graph TD
    OLD["Original History\nC1 → C2(secret) → C3 → C4"] -->|rewrite| NEW["New History\nC1' → C2'(clean) → C3' → C4'"]

    NEW -->|force push| REMOTE["Remote Repository\n(updated)"]
    REMOTE -->|requires| RECLONE["All collaborators\nmust re-clone"]

    SECRET["Secret in C2"] -->|rotate first| ROTATE["Rotate Secret\ninvalidate old value"]
    ROTATE -->|then| REWRITE["Rewrite History"]

Every commit hash changes because each commit includes its parent’s hash. Rewriting C2 changes C3 and C4 too.
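
The chaining can be illustrated with a toy model. The sketch below is not Git's real object format: it assumes a commit ID is simply the SHA-1 of the parent ID plus the commit's content, and that `sha1sum` (GNU coreutils) is available. The function name `chain_ids` is made up for this illustration.

```shell
#!/bin/sh
# Toy model of commit IDs (NOT Git's real object format): each ID is a hash of
# the parent's ID plus this commit's content. Assumes sha1sum is installed.
chain_ids() {
  parent=""
  for content in "$@"; do
    # The parent ID is part of the hashed input, so it feeds into every later ID
    parent=$(printf '%s\n%s\n' "$parent" "$content" | sha1sum | cut -c1-12)
    echo "$parent"
  done
}

# Same four commits; only the second one differs between the two histories
original=$(chain_ids "add app" "add config with SECRET" "add tests" "fix bug")
rewritten=$(chain_ids "add app" "add config (clean)" "add tests" "fix bug")

echo "original : $(echo "$original" | tr '\n' ' ')"
echo "rewritten: $(echo "$rewritten" | tr '\n' ' ')"
```

The first ID is identical in both lines, while the second and every later ID differs, which is why C3 and C4 get new hashes even though their own content is unchanged.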

Architecture or Flow Diagram


flowchart TD
    DISCOVER["Discover Secret in History"] -->|immediately| ROTATE["1. Rotate the Secret\ninvalidate old value"]

    ROTATE --> CHOOSE["2. Choose Tool"]
    CHOOSE -->|recommended| FILTER_REPO["git filter-repo\nfast, modern, safe"]
    CHOOSE -->|large repos| BFG["BFG Repo-Cleaner\nfastest for large repos"]
    CHOOSE -->|last resort| FILTER_BRANCH["git filter-branch\nslow, deprecated"]

    FILTER_REPO --> REWRITE["3. Rewrite History\nremove secret from all commits"]
    BFG --> REWRITE
    FILTER_BRANCH --> REWRITE

    REWRITE --> VERIFY["4. Verify Removal\ngit log -p, secret scan"]
    VERIFY -->|confirmed| FORCE["5. Force Push\ngit push --force-with-lease"]
    VERIFY -->|still found| REWRITE

    FORCE --> NOTIFY["6. Notify Team\nre-clone required"]
    NOTIFY --> PREVENT["7. Prevent Recurrence\npre-commit hooks"]

Step-by-Step Guide / Deep Dive

Step 1: Rotate the Secret First

Before rewriting history, invalidate the compromised credential:


# The secret is already exposed — assume it's compromised
# Rotate it in the service immediately
# Then proceed with history cleanup

Step 2: Using git filter-repo

git filter-repo is the modern, recommended tool:


# Install
pip install git-filter-repo
# Or: brew install git-filter-repo

# Remove a specific file from all history
git filter-repo --path path/to/secret-file --invert-paths

# Remove files matching a pattern
git filter-repo --path-glob '*.env' --invert-paths

# Replace text in all commits
git filter-repo --replace-text <(echo 'OLD_SECRET==>REDACTED')

# Remove all blobs larger than a size threshold
git filter-repo --strip-blobs-bigger-than 10M
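
When several secrets need scrubbing, a rules file is easier to review than an inline `echo`. The sketch below writes one `literal==>replacement` rule per line, which is the format `--replace-text` reads. The helper name `write_rules` and the sample secrets are hypothetical, and the final command is only printed, not executed.

```shell
#!/bin/sh
# Write one "literal==>replacement" rule per secret into a rules file.
# write_rules is a hypothetical helper; the secrets below are fake examples.
write_rules() {
  rules_file=$1
  shift
  : > "$rules_file"
  for secret in "$@"; do
    printf '%s==>REDACTED\n' "$secret" >> "$rules_file"
  done
}

rules=$(mktemp)
write_rules "$rules" 'AKIAEXAMPLEKEY000000' 'hunter2-db-password'
cat "$rules"

# Inside a fresh clone you would then run the command printed here:
echo "git filter-repo --replace-text $rules"

# The rules file contains the secrets in plain text, so remove it when done
rm -f "$rules"
```

One caveat with this simple approach: a secret that itself contains `==>`, or that starts with `regex:`, `glob:`, or `literal:`, would be parsed as something other than a plain literal and needs to be written by hand.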

Step 3: Using BFG Repo-Cleaner

BFG is optimized for speed on large repositories:


# Install
brew install bfg

# Remove files by name
bfg --delete-files .env

# Remove files by size
bfg --strip-blobs-bigger-than 100M

# Replace text in all blobs
bfg --replace-text passwords.txt

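# Typical run, on a fresh mirror clone.
# Note: by default BFG leaves the contents of the latest (HEAD) commit untouched,
# so remove the secret there with a normal commit first.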
git clone --mirror https://github.com/user/repo.git
cd repo.git
bfg --delete-files .env
git reflog expire --expire=now --all
git gc --prune=now --aggressive

Step 4: Verify Removal


# Search entire history for the secret
git log -p --all -S 'SECRET_VALUE'

# Search with gitleaks
gitleaks detect --log-opts="--all"

# Check that the file is truly gone
git log --all --full-history -- path/to/secret-file

# Verify no secrets remain
git rev-list --all | xargs git grep -l 'SECRET_VALUE'
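
For scripted checks it helps to turn the search into an exit status. The following is a minimal sketch, assuming `git` is installed; `secret_in_history` is a hypothetical helper name, and the throwaway repository, file name, and `SECRET_VALUE` exist only for the demonstration.

```shell
#!/bin/sh
# Succeeds (exit 0) if the literal string was added or removed by any commit
# reachable from any ref of the repository in the current directory.
secret_in_history() {
  # -S (pickaxe) lists commits that change how often the string occurs
  hits=$(git log --all --oneline -S "$1")
  [ -n "$hits" ]
}

# Throwaway repository: commit a secret, then "delete" it in a later commit
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}
echo 'API_KEY=SECRET_VALUE' > app.conf
git add app.conf && commit 'add config'
git rm -q app.conf && commit 'remove config'

if secret_in_history 'SECRET_VALUE'; then
  echo "secret still reachable in history"
else
  echo "no trace found"
fi
```

The demo reports the secret as still reachable even though the file was deleted in the second commit, which is exactly the situation a history rewrite has to fix. After a successful rewrite the same check should take the other branch.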

Step 5: Force Push


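# git filter-repo removes the "origin" remote as a safety measure; add it back
# (not needed after BFG)
git remote add origin https://github.com/user/repo.git

# Force push with lease (safer than --force). A freshly rewritten clone has no
# remote-tracking refs, so the lease may be rejected as stale; if so, push
# from the verified fresh clone with --force.
git push origin --force-with-lease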

# If protected branch, temporarily disable protection
# Then re-enable after push

Step 6: Notify Team


# All collaborators must:
# 1. Delete their local clone
# 2. Re-clone from the cleaned repository
# 3. NOT merge old history back

# Warning message to team:
echo "URGENT: Repository history has been rewritten due to secret exposure.
Please delete your local clone and re-clone:
  rm -rf repo && git clone https://github.com/user/repo.git
Do NOT pull or merge from your old clone."

Production Failure Scenarios

| Scenario | Symptoms | Mitigation |
| --- | --- | --- |
| Secret in fork | Fork still contains secret | Contact fork owner; GitHub can remove if reported |
| Collaborator pushes old history | Secret reappears | All collaborators must delete old clones |
| CI/CD cached old commits | Pipeline uses old history | Clear CI caches; re-trigger pipelines |
| filter-repo refuses on non-fresh clone | "does not look like a fresh clone" error | Clone fresh (git clone --mirror), then filter; --force overrides the check if you have a backup |
| Protected branch blocks force push | "protected branch" error | Temporarily disable protection; re-enable after |

Trade-off Analysis

| Tool | Speed | Safety | Ease of Use |
| --- | --- | --- | --- |
| git filter-repo | Fast | High (built-in safety checks) | Moderate |
| BFG Repo-Cleaner | Fastest | High (focused on common cases) | Easy |
| git filter-branch | Slowest | Low (easy to misuse) | Hard |

Implementation Snippets


# Complete remediation workflow
# 1. Clone fresh
git clone --mirror https://github.com/user/repo.git
cd repo.git

# 2. Remove secret file
git filter-repo --path .env --path config/secrets.yml --invert-paths

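# 3. Verify: prints nothing if both files are gone from every ref
git log --all --oneline -- .env config/secrets.yml

# 4. Push cleaned history (filter-repo removed the origin remote, so add it back)
git remote add origin https://github.com/user/repo.git
git push --force --mirror origin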

# 5. Clean up
cd ..
rm -rf repo.git

# Alternative: BFG for large repos
git clone --mirror https://github.com/user/repo.git
cd repo.git
bfg --delete-files '{.env,*.pem,*.key}'
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force --mirror

Observability Checklist

  • Monitor: Secret scanning alerts after cleanup
  • Verify: Full history scan with gitleaks
  • Track: Team re-clone completion
  • Audit: Protected branch settings after force push
  • Alert: Any recurrence of the secret in new commits
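
Tracking re-clone completion is easier if each collaborator can test whether a clone still holds a pre-rewrite commit. The sketch below assumes `git` is installed and that you recorded the old commit ID before rewriting (git filter-repo also writes an old-to-new mapping to `.git/filter-repo/commit-map`). The helper name `clone_has_commit` is hypothetical, and the demo uses a throwaway repository.

```shell
#!/bin/sh
# Succeeds if the given commit ID exists in the local object database,
# whether or not any branch still points at it.
clone_has_commit() {
  git cat-file -e "$1^{commit}" 2>/dev/null
}

# Demonstration in a throwaway repository (all values are placeholders)
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo 'hello' > README
git add README
git -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q -m 'initial'
old_id=$(git rev-parse HEAD)

# A commit that exists locally: this clone would need to be replaced
clone_has_commit "$old_id" && echo "old commit present: delete this clone and re-clone"

# A commit ID that does not exist locally
clone_has_commit 0000000000000000000000000000000000000001 || echo "commit absent"
```

In a real incident, `old_id` would be the pre-rewrite ID of the commit that introduced the secret, and a clone that reports it as present should be deleted rather than pulled.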

Security & Compliance Considerations

  • Rotating the secret is mandatory — history removal alone is insufficient
  • Assume the secret was exposed from the moment it was committed
  • Check if the secret was pushed to any public repositories or forks
  • Document the incident for compliance records
  • See Git Secrets Management for prevention

Common Pitfalls / Anti-Patterns

  • Not rotating the secret first — it may already be compromised
  • Using git filter-branch — it’s deprecated and slow
  • Not forcing collaborators to re-clone — old clones can reintroduce secrets
  • Skipping verification — always confirm the secret is truly gone
  • Not adding pre-commit hooks — prevent recurrence
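
To make the last pitfall concrete, here is a minimal sketch of the check a pre-commit hook performs. It is not a replacement for gitleaks or git-secrets: the three patterns are illustrative assumptions, and `diff_adds_secret` is a hypothetical helper name. The demo feeds it a hand-written diff fragment instead of real staged changes.

```shell
#!/bin/sh
# Illustrative patterns only (assumptions, far from exhaustive):
# AWS-style access key IDs, PEM private key headers, and key=value credentials.
PATTERNS='AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----|(password|secret|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]+'

# Reads a unified diff on stdin; succeeds if any added line matches a pattern.
diff_adds_secret() {
  # Keep added lines, drop the "+++ b/file" headers, then match the patterns
  grep '^+' | grep -v '^+++' | grep -Eiq -e "$PATTERNS"
}

# Demonstration on a hand-written diff fragment
sample='+++ b/.env
+API_TOKEN=abc123
-old line'
if printf '%s\n' "$sample" | diff_adds_secret; then
  echo "commit blocked: possible secret in staged changes"
fi
```

As an actual hook, the function would live in `.git/hooks/pre-commit`, read from `git diff --cached -U0`, and `exit 1` on a match so the commit is aborted.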

Quick Recap Checklist

  • Rotate the compromised secret immediately
  • Use git filter-repo or BFG (not filter-branch)
  • Verify removal with full history scan
  • Force push with --force-with-lease
  • Notify all collaborators to re-clone
  • Install pre-commit hooks to prevent recurrence
  • Document the incident

History Rewrite Process (Clean Architecture)


graph TD
    OLD["Original History"] -->|filter-repo| CLEAN["Cleaned History"]
    OLD -->|BFG| CLEAN
    OLD -->|filter-branch| CLEAN_SLOW["Cleaned (slow)"]

    CLEAN -->|force push| REMOTE["Remote Updated"]
    CLEAN_SLOW -->|force push| REMOTE

    REMOTE -->|notify| TEAM["Team Re-clones"]

    OLD -.->|still exists in| CLONES["Cached Clones\nstill have secrets"]
    CLONES -.->|must be| DELETED["Deleted or Re-cloned"]

Production Failure: Incomplete Cleanup

Scenario: Cached clones still containing secrets


# What happened:
# 1. History rewritten and force-pushed
# 2. Team member pulls instead of re-cloning
# 3. Old objects still in their local .git/objects/
# 4. They push — secret reappears in remote!

# Symptoms
$ git push
# Team member's old clone pushes old commits back
$ gitleaks detect --log-opts="--all"
# Secret found again!

# Root cause: Not all clones were cleaned; old objects persist

# Recovery steps:

# 1. Identify the source of re-contamination
git log --all --oneline --source -S 'SECRET_VALUE'
# --source shows which ref each matching commit was reached from,
# which points to the branch that reintroduced the secret

# 2. Force push again with clean history
git push origin --force-with-lease

# 3. Enforce re-clone on ALL machines:
#    - Developer laptops
#    - CI/CD runners (clear workspace cache)
#    - Staging servers
#    - Backup systems

# 4. Verify no cached clones exist:
#    Check CI/CD workspace directories
#    Check developer machines (ask team)
#    Check any automated systems that clone the repo

# 5. Prevent re-contamination:
#    Add branch protection rules
#    Enable server-side secret scanning
#    Set up pre-receive hook to reject secret-containing pushes

# === CI/CD Cache Cleanup ===
# GitHub Actions
# Add to workflow:
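# - name: Clean workspace
#   run: |
#     # Empty the workspace but keep the directory (it is the step's working directory)
#     find "$GITHUB_WORKSPACE" -mindepth 1 -delete
#     git clone ${{ github.server_url }}/${{ github.repository }} .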

# Jenkins
# Use "Wipe out repository & force clone" in pipeline

# GitLab CI
# variables:
#   GIT_STRATEGY: clone  # Not "fetch"

Trade-offs: BFG vs Git Filter-repo

| Aspect | BFG Repo-Cleaner | git filter-repo |
| --- | --- | --- |
| Speed | Fastest (runs on the JVM, optimized) | Fast (Python driving git fast-export/fast-import) |
| Safety | High (focused use cases) | Highest (built-in safety checks) |
| Ease of use | Easy (simple commands) | Moderate (more options) |
| Maintenance | Infrequently updated | Actively maintained |
| Installation | Java required | Python required |
| Text replacement | --replace-text (rules file) | --replace-text (rules file; literal, glob, or regex) |
| File removal | --delete-files | --path --invert-paths |
| Size filtering | --strip-blobs-bigger-than | --strip-blobs-bigger-than |
| Official recommendation | No | Yes (Git community) |
| Best for | Very large repos (>1GB) | General use, most scenarios |

Recommendation: Use git filter-repo for most cases. Use BFG only for very large repositories where filter-repo is too slow.

Quick Recap: Post-Cleanup Actions


# === 1. Force push (done) ===
git push origin --force-with-lease

# === 2. Notify team ===
# Send this message to all collaborators:
# "URGENT: Repository history rewritten. Delete your local clone and re-clone:
#  rm -rf repo && git clone <url>
#  Do NOT pull from your old clone."

# === 3. Confirm ALL secrets are rotated ===
# Rotation should already have happened before the rewrite; double-check each one:
# - API keys: regenerate in provider console
# - Database passwords: change and update connection strings
# - SSH keys: generate new key pairs
# - Certificates: reissue from CA

# === 4. Clear CI/CD caches ===
# GitHub Actions: Settings → Actions → Clear cache
# GitLab CI: CI/CD → Clear Runner Caches
# Jenkins: Delete workspace directories

# === 5. Check forks and mirrors ===
# - Contact fork owners to re-clone
# - Update any mirror repositories
# - Check if secrets were pushed to public repos

# === 6. Enable prevention ===
# Install pre-commit hooks:
pre-commit install

# Enable platform secret scanning:
# GitHub: Settings → Code Security → Secret Scanning
# GitLab: Settings → Security → Secret Detection

# === 7. Document the incident ===
# Record: what was leaked, when, how it was fixed
# Update: runbooks, team training, prevention measures
# Review: why the secret was committed in the first place

# === 8. Verify complete cleanup ===
gitleaks detect --log-opts="--all"
# Should return: "No leaks found"

git log --all -p | grep -i "secret_value"
# Should return: nothing

Interview Questions

1. Why is `git filter-branch` deprecated?

It's extremely slow (shell-based, processes commits one at a time), easy to misuse (common patterns produce incorrect results), and leaves backup refs that keep old objects reachable. git filter-repo is 10-100x faster, has built-in safety checks, and is the officially recommended replacement.

2. What's the difference between `--force` and `--force-with-lease`?

--force unconditionally overwrites the remote branch. --force-with-lease only overwrites if the remote branch hasn't changed since your last fetch — it protects against accidentally overwriting someone else's pushes. Always prefer --force-with-lease.

3. Can you remove a secret from just one commit without rewriting all subsequent commits?

No. Because each commit's hash includes its parent's hash, changing any commit changes all descendants. This is fundamental to Git's Merkle tree structure. The entire history from the modified commit forward must be rewritten, producing new hashes for every commit.

4. What if the secret was in a public fork?

You can't force-push to someone else's fork. Rotate the secret immediately — that's the only reliable fix. You can request GitHub to remove the secret from their caches, and contact the fork owner. But assume the secret is permanently exposed and treat it as compromised.

5. What is the first step when you discover a secret in Git history?

Rotate the secret immediately. The moment you discover a secret in history, assume it is already compromised — assume it has been scraped, used, or shared. Revoke or change the credential in its respective service (API key, password, token). Only after rotation should you proceed with history rewriting to remove it from Git.

6. How does git filter-repo compare to BFG Repo-Cleaner?

Both are far faster than git filter-branch. filter-repo has built-in safety checks, is actively maintained, and is the tool the Git documentation recommends in place of filter-branch. BFG is simpler for common cases (deleting files, stripping large blobs, replacing text) and can be quicker on very large repos (>1GB), but it sees few updates and requires a Java runtime. filter-repo supports literal, glob, and regex rules with --replace-text, path and glob filtering, and stripping blobs by size, which makes it the better default.

7. Why must collaborators re-clone after history rewriting?

Because every commit hash changes after the rewrite, and the old history still exists in their local clones. If they push from their old clone, they reintroduce the old objects (including the secret). They must delete their local clone entirely and re-clone from the cleaned repository. Pulling or merging from the old clone reinfects the repository.

8. How do you verify that a secret has been completely removed from history?

Use multiple verification methods: (1) git log -p --all -S 'SECRET_VALUE' to search the entire object graph, (2) gitleaks detect --log-opts="--all" for secret scanning, (3) git rev-list --all | xargs git grep -l 'SECRET_VALUE' to search all reachable objects. If any method returns results, the secret is not fully removed.

9. What does the --invert-paths flag do in git filter-repo?

--invert-paths inverts the path selection. On its own, --path keeps only the matching paths and drops everything else; adding --invert-paths drops the matching paths and keeps everything else. For example, git filter-repo --path secrets/ --invert-paths removes the secrets directory from every commit and leaves the rest of the history in place, while git filter-repo --path secrets/ without the flag keeps only the secrets directory.

10. How do you remove a secret that was committed multiple times across many commits?

Use git filter-repo --replace-text <(echo 'SECRET_VALUE==>REDACTED'). This replaces all occurrences of the string across all commits. The --replace-text option handles multiple patterns by using a file with one pattern per line. After replacement, force push with --force-with-lease and notify collaborators to re-clone.

11. What happens to CI/CD caches after history rewriting?

CI/CD runners that cached the old Git objects may still have the secret in their workspace cache. Clear all CI/CD caches after rewriting: GitHub Actions (Settings → Actions → Clear caches), GitLab CI (CI/CD → Clear Runner caches), Jenkins (delete workspace directories). Any system that cloned the repository before the rewrite must be cleared or re-cloned.

12. Can you use git filter-repo on a non-fresh clone?

Not by default. filter-repo checks that it is running in a fresh clone and otherwise aborts with a message that the repository "does not look like a fresh clone". Clone fresh first: git clone --mirror https://github.com/user/repo.git && cd repo.git, then run git filter-repo with your filtering options. The check exists so that a mistake cannot destroy the only copy of unpushed work. The --force flag overrides it, but only use that when you have a backup.

13. How do you remove large files from Git history to reduce repository size?

Use git filter-repo --strip-blobs-bigger-than 10M to remove all blobs over a certain size. For BFG: bfg --strip-blobs-bigger-than 100M. After rewriting, run git reflog expire --expire=now --all and git gc --prune=now --aggressive to garbage collect old objects and reduce repository size.

14. What is the role of git reflog after history rewriting?

The reflog records where HEAD and each branch pointed before the rewrite, so the old commits stay recoverable locally until those entries expire (by default 90 days, or 30 days for entries no longer reachable from the current tip). Use git reflog expire --expire=now --all followed by git gc --prune=now to drop the entries and delete the old objects. git filter-repo runs both steps for you at the end of a rewrite; with BFG or filter-branch you run them yourself.

15. How do you prevent secrets from being committed in the future?

Layer several controls: (1) pre-commit hooks (pre-commit install with hooks like gitleaks or git-secrets) to catch secrets before they are committed, (2) platform secret scanning (GitHub Settings → Code Security → Secret Scanning) as a server-side backstop, (3) .gitignore entries for common secret file patterns (.env, *.pem), (4) signed commits to establish authorship accountability, which helps with follow-up but does not itself block secrets.

16. What is the difference between removing a file and removing its content from history?

Removing a file with git filter-repo --path file --invert-paths deletes that file from every commit that contained it and rewrites those commits and all their descendants; commits left empty by the removal are pruned. Removing content with --replace-text keeps the file but replaces the sensitive string in every version of it. File removal is for files that should never have been committed; content removal is for files that legitimately belong in the repository (such as a config file) but contained a secret value.

17. Why is --force-with-lease safer than --force for pushing cleaned history?

--force-with-lease checks that the remote ref matches what you last fetched before overwriting. If someone else pushed to the remote since your last fetch (perhaps an old collaborator's pre-cleanup clone), the push fails. This prevents accidentally overwriting work from others. --force would unconditionally overwrite, potentially losing other people's work.

18. How do you handle secrets in Git submodules?

Submodules store references to commits in external repositories. To fully remove a secret from a submodule's history, you must independently clean the submodule repository using the same methods, then update the parent repository's submodule reference. The parent repository only stores the submodule's commit hash, not its internal history.

19. What compliance documentation should be created after a secret exposure incident?

Document: (1) what secret was exposed and when, (2) which repositories and branches were affected, (3) when rotation occurred, (4) cleanup method used (filter-repo/BFG), (5) which collaborators were notified, (6) when re-clone was confirmed complete, (7) what prevention measures were implemented. Keep this record for compliance audits — many frameworks require documented incident response.

20. How does Git's object compression affect history rewriting performance?

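Git stores objects zlib-compressed, either loose in .git/objects/ or delta-compressed in packfiles. A rewrite must read every commit and write a new one with updated parent and tree references; blobs only need to be decompressed and rewritten when the filter inspects or changes their content, as with text replacement. For repositories with many large blobs, this is I/O intensive. BFG is fast on large repos because it processes each unique file and folder once rather than once per commit, and spreads the work across CPU cores. filter-repo streams history through git fast-export and git fast-import, which is usually fast enough and considerably more flexible.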

Conclusion

Once sensitive data reaches a remote, removing it requires rewriting history with tools like git filter-repo or BFG Repo-Cleaner. The process is surgical and destructive, changing every descendant commit’s hash, so prevention via pre-commit scanning is far better than cleanup after the fact.

Related Posts

Git Secrets Management and Pre-commit Hooks

Preventing secrets from entering repositories using pre-commit hooks, secret scanning tools, and automated detection. Protect API keys, tokens, and credentials from accidental commits.

#git #version-control #secrets

Signed Commits (GPG/SSH)

Complete guide to Git commit signing with GPG and SSH keys. Setup, verification, trust chains, and why signed commits matter for supply chain security.

#git #version-control #gpg

Centralized vs Distributed VCS: Architecture, Trade-offs, and When to Use Each

Compare centralized (SVN, CVS) vs distributed (Git, Mercurial) version control systems — their architectures, trade-offs, and when to use each approach.

#git #version-control #svn