Git Garbage Collection and Maintenance

Master git gc, git prune, git fsck, and automated repository maintenance. Learn how Git manages object storage, cleans unreachable data, and keeps repositories healthy.

Reading time: 18 min read | Author: Geek Workbench | Updated: March 31, 2026

Introduction

Git repositories grow over time. Every commit, branch deletion, and force push leaves behind objects that may become unreachable. Without cleanup, your .git directory accumulates loose objects, stale reflogs, and orphaned packs — slowing down operations and consuming disk space.

Git’s garbage collection system handles this automatically, but understanding how it works lets you optimize for large repositories, recover lost data before it’s pruned, and troubleshoot storage issues. The difference between a snappy 500MB repo and a sluggish 5GB repo often comes down to maintenance practices.

This article covers every aspect of Git’s garbage collection: the automatic triggers, manual commands, pruning policies, and maintenance routines that keep repositories healthy across their entire lifecycle.

When to Use / When Not to Use

When to manage Git garbage collection:

  • Repository size is growing unexpectedly
  • CI/CD clone times are increasing
  • After history rewriting (rebase, filter-branch)
  • Before archiving or backing up repositories
  • Troubleshooting “bad object” errors

When not to intervene:

  • Small, active repositories — auto-gc handles them
  • During active development — let Git manage itself
  • When unsure about pruning — unreachable objects may be needed

Core Concepts

Git’s garbage collection has three phases:


graph TD
    REPO["Repository"] -->|accumulates| UNREACH["Unreachable Objects\n(commits, blobs, trees)"]
    REPO -->|accumulates| LOOSE["Loose Objects\nindividual files"]
    REPO -->|accumulates| STALE["Stale Reflogs\nexpired entries"]

    AUTO["Auto GC\n(triggered by operations)"] -->|checks| THRESHOLD["Threshold Check\nloose objects > gc.auto"]
    THRESHOLD -->|exceeded| PACK["Repack Objects\ninto pack files"]
    THRESHOLD -->|not exceeded| SKIP["Skip GC\ncontinue normally"]

    MANUAL["Manual GC\ngit gc"] -->|runs| PRUNE["Prune Unreachable\nobjects older than gc.pruneExpire"]
    MANUAL -->|runs| REFLOG["Expire Reflogs\nolder than gc.reflogExpire"]
    MANUAL -->|runs| PACK

Auto GC runs silently during normal operations when the loose object count exceeds gc.auto (default: 6700) or the number of pack files exceeds gc.autoPackLimit (default: 50). Manual GC gives you full control over packing, pruning, and reflog expiration.

Architecture or Flow Diagram


flowchart LR
    COMMITS["New Commits"] -->|create| OBJECTS["New Objects\nloose files"]
    OBJECTS -->|count exceeds| TRIGGER["gc.auto threshold\n(default: 6700)"]
    TRIGGER -->|triggers| AUTO_GC["Auto GC\n(git gc --auto)"]

    AUTO_GC -->|packs| PACKS["Pack Files\ncompressed objects"]
    AUTO_GC -->|removes| LOOSE_DEL["Deleted Loose Objects"]

    REBASE["git rebase"] -->|creates| ORPHAN["Orphaned Objects\nunreachable commits"]
    ORPHAN -->|after| EXPIRY["gc.pruneExpire\n(default: 14 days)"]
    EXPIRY -->|pruned| GONE["Permanently Deleted"]

    FSCK["git fsck"] -->|finds| DANGLING["Dangling Objects\nnot referenced by any ref, reflog, or object"]
    DANGLING -->|older than gc.pruneExpire| PRUNED["Pruned by gc"]

Step-by-Step Guide / Deep Dive

Automatic Garbage Collection

Git runs git gc --auto after certain operations (commit, merge, fetch) when loose object count exceeds the threshold:


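# Check current gc settings
git config --get gc.auto
# Prints nothing unless you have set it; the built-in default is 6700

git config --get gc.autopacklimit
# Prints nothing unless you have set it; the built-in default is 50

# Inspect the counts that auto-gc compares against those thresholds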
git count-objects -v
# Output:
# count: 1234        (loose objects)
# in-pack: 56789     (packed objects)
# packs: 3           (number of packs)

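Auto-gc packs loose objects and applies the same grace period as a manual run: it prunes only unreachable objects older than gc.pruneExpire (two weeks by default). Removing them sooner requires a manual git gc --prune=<date> or git prune.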
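
The threshold check described above can be approximated with a small helper. This is a rough sketch, not how Git decides internally: Git estimates the loose count by sampling a single object directory, while the hypothetical would_auto_gc function below reads the exact numbers from git count-objects. The throwaway repository and the gc.auto value of 1 are only there to make the effect visible.

```shell
#!/bin/sh
# Hypothetical helper: approximate the "gc --auto" decision for the repo in $1.
would_auto_gc() {
    repo=$1
    # Fall back to Git's built-in defaults when the keys are unset.
    limit=$(git -C "$repo" config --get gc.auto || echo 6700)
    pack_limit=$(git -C "$repo" config --get gc.autoPackLimit || echo 50)
    loose=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
    packs=$(git -C "$repo" count-objects -v | awk '/^packs:/ {print $2}')

    # gc.auto <= 0 switches auto-gc off entirely.
    if [ "$limit" -le 0 ]; then
        echo "disabled"
        return
    fi
    if [ "$loose" -gt "$limit" ]; then
        echo "likely"
    elif [ "$pack_limit" -gt 0 ] && [ "$packs" -gt "$pack_limit" ]; then
        echo "likely"
    else
        echo "unlikely"
    fi
}

# Demo in a throwaway repository with two loose blobs.
demo=$(mktemp -d)
git init -q "$demo"
echo hello | git -C "$demo" hash-object -w --stdin >/dev/null
echo world | git -C "$demo" hash-object -w --stdin >/dev/null

before=$(would_auto_gc "$demo")   # two loose objects against the default threshold
git -C "$demo" config gc.auto 1   # artificially low threshold for the demo
after=$(would_auto_gc "$demo")

echo "default threshold: $before"
echo "gc.auto=1:         $after"
```

Treat the result as an indicator only; the authoritative answer is whether git gc --auto actually does any work.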

Manual Garbage Collection


# Standard garbage collection
git gc
# Output:
# Enumerating objects: 12345, done.
# Counting objects: 100% (12345/12345), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (5678/5678), done.
# Writing objects: 100% (12345/12345), done.
# Total 12345 (delta 8901), reused 12345 (delta 8901)

# Aggressive garbage collection (better compression, slower)
git gc --aggressive

# Prune immediately (dangerous — removes all unreachable objects)
git gc --prune=now

# Disable pruning entirely
git gc --prune=never
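
To see what a standard run changes, the sketch below builds a throwaway repository, records the loose object count, runs git gc, and records the counts again. The repository path, file name, and identity values are placeholders chosen for the demo.

```shell
#!/bin/sh
# Read one field (e.g. "count" or "packs") from git count-objects -v.
count_field() {
    git count-objects -v | awk -v key="$1:" '$1 == key {print $2}'
}

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.invalid"

# Three commits, each producing a new blob, tree, and commit object.
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "revision $i"
done

loose_before=$(count_field count)
git gc -q
loose_after=$(count_field count)
packs_after=$(count_field packs)

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "pack files after gc:     $packs_after"
```

Every object here is reachable, so gc moves all of them into a single pack and nothing is pruned.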

Pruning Unreachable Objects

Unreachable objects are commits, trees, or blobs not referenced by any branch, tag, or reflog:


# Find unreachable objects
git fsck --unreachable
# Output:
# unreachable commit abc123...
# unreachable blob def456...

# List dangling objects (unreachable objects that no other object points to; this is fsck's default)
git fsck --dangling

# Prune objects older than 7 days
git prune --expire=7.days.ago

# Expire reflog entries older than 30 days
git reflog expire --expire=30.days.ago --all
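
The role of the reflog in reachability is easier to see in a small experiment. The sketch below amends a commit in a throwaway repository and then asks git fsck about the replaced commit twice: once with reflogs counted as references (the default) and once with --no-reflogs. Paths and identity values are placeholders.

```shell
#!/bin/sh
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.invalid"

echo one > f
git add f
git commit -q -m "first"
old=$(git rev-parse HEAD)

# Amending replaces the commit; the old one is now referenced only by the reflog.
echo two > f
git commit -q -a --amend -m "first (amended)"

# Default: reflog entries count as references, so the old commit is not reported.
with_reflog=$(git fsck --unreachable 2>/dev/null | grep -c "$old" || true)

# Ignoring reflogs shows what would become prunable once the reflog entry expires.
without_reflog=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$old" || true)

echo "reported with reflogs honoured: $with_reflog"
echo "reported with --no-reflogs:     $without_reflog"
```

As long as the reflog entry exists, git prune and git gc leave the old commit alone regardless of the expiry date passed to them.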

Repository Health Checks


# Full integrity check
git fsck --full
# Output:
# Checking object directories: 100% (256/256)
# Checking objects: 100% (12345/12345)
# dangling commit abc123...

# Check for missing objects
git fsck --no-dangling --no-reflogs

# Verify pack files
git verify-pack -v .git/objects/pack/*.idx

# Check repository size
git count-objects -vH
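
For scripted checks, the exit status of git fsck is usually more convenient than its text output. The sketch below wraps it in a hypothetical check_repo helper and then deliberately damages a loose object in a throwaway repository to show the failure path. Never run the corruption step on a real repository.

```shell
#!/bin/sh
# Hypothetical helper: report "healthy" or "damaged" based on fsck's exit status.
check_repo() {
    if git -C "$1" fsck --full --no-dangling >/dev/null 2>&1; then
        echo "healthy"
    else
        echo "damaged"
    fi
}

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "example@example.invalid"
echo content > "$repo/f"
git -C "$repo" add f
git -C "$repo" commit -q -m "initial"

status_before=$(check_repo "$repo")

# Simulate disk corruption: overwrite the loose blob for f with junk.
blob=$(git -C "$repo" rev-parse HEAD:f)
dir=$(echo "$blob" | cut -c1-2)
file=$(echo "$blob" | cut -c3-)
chmod u+w "$repo/.git/objects/$dir/$file"
printf 'not a valid object' > "$repo/.git/objects/$dir/$file"

status_after=$(check_repo "$repo")

echo "before corruption: $status_before"
echo "after corruption:  $status_after"
```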

Git Maintenance Command

Git 2.30+ introduced git maintenance for automated, scheduled maintenance:


# Register for automatic maintenance
git maintenance register

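# Run enabled tasks, but only those whose thresholds are met
git maintenance run --auto

# Run specific tasks
git maintenance run --task=gc
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack

# Start background maintenance (registers the repo and installs the schedule)
git maintenance start

# Halt the background schedule (the repository stays registered)
git maintenance stop

# Remove this repository from the list of maintained repositories
git maintenance unregister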
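
The incremental nature of these tasks shows up when the loose-objects task is run by hand. As documented for git maintenance, the task works in two steps: it first deletes loose objects that already exist in a pack, then writes a new pack from a batch of the remaining loose objects. The sketch below assumes a Git version that ships git maintenance and uses a throwaway repository; it runs the task twice so both steps become visible.

```shell
#!/bin/sh
# Read one field (e.g. "count" or "packs") from git count-objects -v.
count_field() {
    git count-objects -v | awk -v key="$1:" '$1 == key {print $2}'
}

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.invalid"

for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "revision $i"
done
loose_start=$(count_field count)

# First pass: the loose objects are copied into a new pack.
git maintenance run --quiet --task=loose-objects
packs_mid=$(count_field packs)

# Second pass: loose copies that now exist in a pack are deleted.
git maintenance run --quiet --task=loose-objects
loose_end=$(count_field count)

echo "loose at start:        $loose_start"
echo "packs after 1st pass:  $packs_mid"
echo "loose after 2nd pass:  $loose_end"
```

Splitting the work this way keeps each run short, which is the point of scheduling it in the background rather than waiting for a full gc.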

Production Failure Scenarios

| Scenario | Symptoms | Mitigation |
| --- | --- | --- |
| Accidental prune | Lost commits after git gc --prune=now | Check the reflog first; otherwise recover from a remote, another clone, or a backup |
| GC during CI | Slow CI runs due to gc trigger | Disable auto-gc in CI: git config gc.auto 0 |
| Pack corruption | "bad packed object" errors | Move the corrupt pack aside, then re-fetch from remote |
| Disk space exhaustion | Repository fills disk | Run git gc --aggressive once there is enough free space for the new pack; migrate large files to LFS |
| Reflog bloat | .git/logs/ consumes GBs | git reflog expire --expire=7.days.ago --all |

Trade-off Analysis

| Aspect | Advantage | Disadvantage |
| --- | --- | --- |
| Auto GC | Zero-config maintenance | May trigger at inconvenient times |
| Aggressive GC | Maximum compression | Very slow on large repos |
| Immediate prune | Frees disk space | Loses recovery safety net |
| 14-day default prune | Recovery window | Delays space reclamation |
| Git maintenance | Scheduled, incremental | Requires Git 2.30+ |

Implementation Snippets


# Safe cleanup workflow
git reflog expire --expire=30.days.ago --all
git gc --prune=30.days.ago

# Aggressive cleanup (after confirming no needed unreachable objects)
git reflog expire --expire=7.days.ago --all
git gc --aggressive --prune=7.days.ago

# CI/CD optimization: disable auto-gc
git config gc.auto 0
git config gc.autoDetach false

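# List everything unreachable from refs, ignoring reflogs (a superset of what gc would prune)
git fsck --unreachable --no-reflogs
# True dry run: show what git prune would delete right now
git prune --dry-run --verbose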

# Monitor repository health
git count-objects -vH
git fsck --full 2>&1 | grep -E "error|warning"

# Set up scheduled maintenance (Git 2.30+)
git maintenance start
# The gc task uses the regular gc.auto threshold; there is no separate maintenance.gc.auto key
git config gc.auto 6700

Observability Checklist

  • Monitor: Loose object count (git count-objects -v)
  • Track: Repository size growth over time
  • Alert: Pack file count exceeding gc.autopacklimit
  • Verify: Integrity with periodic git fsck --full
  • Audit: Reflog size per branch

Security & Compliance Considerations

  • Pruned objects may still be recoverable from disk forensics
  • For true data removal, use secure deletion tools after pruning
  • GC doesn’t remove objects from remote repositories
  • See Removing Sensitive Data from History for secret removal

Common Pitfalls / Anti-Patterns

  • Running git gc --prune=now without checking reflog — loses recovery options
  • Disabling auto-gc permanently — leads to excessive loose objects
  • Not pruning after history rewrite: old objects linger until their reflog entries expire and a later gc prunes them
  • Assuming git gc removes secrets — it only removes unreachable objects, not specific content

Quick Recap Checklist

  • Auto GC triggers when loose objects exceed gc.auto (default: 6700)
  • Manual git gc packs objects and optionally prunes unreachable ones
  • Pruning removes objects not referenced by any ref or reflog
  • Default prune expiry is 14 days — provides recovery window
  • git fsck verifies repository integrity
  • git maintenance provides scheduled, incremental maintenance
  • Always check reflog before pruning

GC Process Flow (Clean Architecture)


graph TD
    REPO["Repository Operations"] -->|generate| LOOSE["Loose Objects"]
    REPO -->|generate| REFLOG["Reflog Entries"]
    REPO -->|generate| UNREACH["Unreachable Objects"]

    AUTO_GC["Auto GC\n(triggered at threshold)"] -->|checks| COUNT["Loose object count\n> gc.auto (6700)?"]
    COUNT -->|yes| PACK["Repack objects\ninto pack files"]
    COUNT -->|no| SKIP["Skip — no action"]

    MANUAL_GC["Manual GC\ngit gc"] --> PACK
    MANUAL_GC --> EXPIRE_REFLOG["Expire old reflogs\ngc.reflogExpire"]
    EXPIRE_REFLOG --> PRUNE["Prune unreachable objects\ngc.pruneExpire (14 days)"]
    PACK --> CLEAN["Clean repository\noptimized storage"]
    PRUNE --> CLEAN

Production Failure: Aggressive GC Data Loss

Scenario: Reflog data loss after aggressive pruning


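# What happened:
$ git reflog expire --expire=now --all  # Drops every reflog entry
$ git gc --prune=now  # Removes ALL unreachable objects immediately
$ # Later: need to recover a commit from 3 days ago
$ git reflog
# Empty: the entries were expired, so nothing protected the old commits from --prune=now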

# Symptoms
$ git log --oneline
# Missing commits that were on deleted branches
$ git fsck --unreachable
# No unreachable objects — they're all gone

# Recovery (limited options):

# 1. Check if remote still has the commits
git fetch origin
git log origin/old-branch --oneline

# 2. Check other clones (colleagues' machines)
# Ask teammates if they have the commits locally

# 3. Check CI/CD logs for commit SHAs
# Some CI systems log commit hashes before building

# 4. If truly lost — the commits are unrecoverable
# This is why --prune=now is dangerous

# Prevention:
# NEVER use --prune=now unless you're certain
# Always use a time window:
git gc --prune=30.days.ago

# Safe cleanup workflow:
# 1. List everything unreachable from refs (ignoring reflogs)
git fsck --unreachable --no-reflogs
# 2. Review reflog for valuable commits
git reflog --all
# 3. Create branches for anything important
git branch save-important <sha>
# 4. Then gc with a safety window
git gc --prune=90.days.ago
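
The value of step 3 can be checked in a throwaway repository. In the sketch below, two branches are deleted, one of their tips is rescued with git branch, and then the reflog is expired and everything unreachable is pruned on purpose. The branch names keep-me, drop-me, and save-important are placeholders; the destructive commands are safe only because the repository is disposable.

```shell
#!/bin/sh
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.invalid"

echo base > f
git add f
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

# Two topic branches, each with one commit of its own.
git checkout -q -b keep-me
echo keep > f
git commit -q -a -m "work worth keeping"
keep=$(git rev-parse HEAD)
git checkout -q "$main"

git checkout -q -b drop-me
echo drop > f
git commit -q -a -m "work nobody rescued"
drop=$(git rev-parse HEAD)
git checkout -q "$main"

# Both branches are deleted; only one tip gets a rescue branch.
git branch -q -D keep-me drop-me
git branch save-important "$keep"

# Worst case: no reflog entries left and no grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

git cat-file -e "$keep" 2>/dev/null && keep_state=present || keep_state=gone
git cat-file -e "$drop" 2>/dev/null && drop_state=present || drop_state=gone

echo "rescued commit:   $keep_state"
echo "unrescued commit: $drop_state"
```

A branch (or tag) is the only protection that survives both reflog expiry and an immediate prune, which is why creating one is the cheapest insurance before a cleanup.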

Trade-offs: Auto GC vs Manual GC

| Aspect | Auto GC | Manual GC |
| --- | --- | --- |
| Trigger | Automatic (loose objects > gc.auto or packs > gc.autoPackLimit) | Explicit command |
| Safety | Conservative: prunes only unreachable objects older than gc.pruneExpire | Can prune immediately (--prune=now) |
| Performance | Runs during operations (may slow commit/fetch) | Runs on your schedule |
| Disk recovery | Packs objects; unreachable ones stay until the grace period passes | Can free maximum space |
| Configuration | gc.auto, gc.autopacklimit | Full control over all parameters |
| CI/CD impact | Can trigger unexpectedly | Can be disabled or scheduled |
| Best for | Daily development | Maintenance windows, large repos |
| Risk level | Low | Medium to high (depends on flags) |

Recommendation: Keep auto-gc enabled for daily work. Run manual gc monthly or after major operations (rebase, filter-repo) with a safe prune window.

Implementation: GC Configuration Tuning


# === Threshold tuning ===
# When auto-gc triggers (default: 6700 loose objects)
git config gc.auto 10000

# Maximum number of pack files before consolidating (default: 50)
git config gc.autopacklimit 10

# === Pruning windows ===
# How long unreachable objects are kept (default: 14 days)
git config gc.pruneExpire 30.days.ago

# Reflog expiry for entries reachable from the ref's current tip (default: 90 days)
git config gc.reflogExpire 180.days.ago

# Reflog expiry for entries no longer reachable from the current tip (default: 30 days)
git config gc.reflogExpireUnreachable 90.days.ago

# === Pack optimization ===
# Delta search window size (default: 10)
git config pack.window 50

# Maximum delta chain depth (default: 50)
git config pack.depth 250

# Memory limit for delta search
git config pack.windowMemory 512m

# Compression level (0-9, default: -1 = zlib default)
git config pack.compression 6

# === Pack bitmaps (for large repos) ===
# Speeds up git rev-list operations
git config repack.writeBitmaps true

# === CI/CD optimization ===
# Disable auto-gc in CI environments
git config gc.auto 0
git config gc.autodetach false

# === Verify current configuration ===
git config --get-regexp 'gc\.'
git config --get-regexp 'pack\.'
git config --get-regexp 'repack\.'

# === Scheduled maintenance (Git 2.30+) ===
# Register for automatic background maintenance
git maintenance register

# Run enabled tasks, but only those whose thresholds are met
git maintenance run --auto

# Available tasks include: gc, commit-graph, prefetch, loose-objects, incremental-repack, pack-refs
git maintenance run --task=gc
git maintenance run --task=commit-graph

Interview Questions

1. What's the difference between `git gc` and `git gc --aggressive`?

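Standard git gc repacks objects with the default delta window and depth (pack.window 10, pack.depth 50). --aggressive widens the delta window to gc.aggressiveWindow (default 250) and discards existing deltas so they are recomputed from scratch, producing smaller packs but taking significantly longer. In current Git the aggressive delta depth (gc.aggressiveDepth) also defaults to 50, so the gain comes from the wider window rather than deeper chains. Use it occasionally, not regularly.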

2. How long does Git keep unreachable objects before pruning them?

By default, 14 days (gc.pruneExpire = 2.weeks.ago). Unreachable objects that are still referenced in a reflog are kept until that reflog entry expires (90 days for entries reachable from the ref's current tip, 30 days for entries that are not). Only objects that are unreachable from every ref and reflog and older than the prune expiry are deleted.

3. Why should you disable auto-gc in CI/CD pipelines?

Auto-gc can trigger unpredictably during clone or fetch, adding minutes to CI runs. CI environments typically use shallow clones that are discarded after the run, so gc provides no benefit. Disable it with git config gc.auto 0 in CI scripts to ensure consistent build times.

4. Can `git gc` remove commits that were force-pushed away?

Yes, but only after the reflog expires. When you force-push, the old commits become unreachable from any branch, but they remain in your local reflog. Because those entries are no longer reachable from the branch tip, they expire after gc.reflogExpireUnreachable (default 30 days). Once they are gone and the objects are older than the prune expiry (default 14 days), git gc permanently removes them.

5. What is the difference between loose objects and pack files in Git?

Loose objects are individual zlib-compressed files, one per object, stored under .git/objects/. Pack files are compressed archives that bundle multiple objects together, using delta compression to store differences between similar objects. Git creates pack files during gc to save space. Auto-gc triggers when the loose object count exceeds gc.auto (default 6700) and consolidates the loose objects into packs.

6. What does `git count-objects -vH` tell you about repository health?

It shows a human-readable summary of object counts and storage usage: count (loose objects), size (disk space used by loose objects), in-pack (objects in pack files), packs (number of pack files), and size-pack (disk space used by packs). A growing count with few packs suggests auto-gc is not keeping up. A high packs count approaching gc.autopacklimit signals it's time to run manual gc.

7. What happens when `gc.autopacklimit` is reached?

When the number of pack files exceeds gc.autopacklimit (default 50), Git automatically repacks all objects into a single new pack file during gc. This consolidates fragmented packs and reduces file descriptor usage. If you have many small packs, running git gc manually triggers this consolidation.

8. How does `git maintenance` differ from `git gc --auto`?

git gc --auto runs only when the loose object count or the pack count exceeds its threshold, and then performs a full gc pass: packing, reflog expiry, and pruning of objects past the grace period. git maintenance (Git 2.30+) runs scheduled background tasks incrementally: loose object packing, commit-graph updates, and incremental repack. It runs via cron or another system scheduler, doing small amounts of work per run, which avoids most of the latency spikes that a full gc can produce on large repos.

9. What is a dangling commit and how does Git handle it?

A dangling commit is an unreachable commit that nothing points to: no branch, tag, reflog entry, or child commit. It typically appears after rebasing, amending, or resetting, once the reflog entries for the old commits have expired. git fsck reports dangling objects by default (--dangling). While a reflog entry still references a commit, that commit counts as reachable and is safe from pruning. Once the entry expires and the object is older than the prune expiry, gc removes it. It is recoverable until that point.

10. Why does `git gc --prune=now` pose a data loss risk?

It removes ALL unreachable objects immediately, bypassing the default 14-day recovery window. If your reflog entries are also expired or if you ran git reflog expire first, there is no safety net. Any commits from deleted branches, rebased history, or reset operations that are not on another branch or tag are permanently deleted. Use a time-based prune window like --prune=30.days.ago instead.

11. What is the purpose of `git fsck --full` and when should you run it?

It performs a full integrity check on all objects in the repository, verifying SHA-1 checksums, directory structure validity, and referential integrity. It detects corrupted or missing objects, broken refs, and dangling objects. Run it periodically (monthly or after any filesystem issue) and always check the output for errors or warnings.

12. How do `gc.reflogExpire` and `gc.reflogExpireUnreachable` differ?

gc.reflogExpire controls expiry for reflog entries that are still reachable from the ref's current tip (default 90 days). gc.reflogExpireUnreachable controls expiry for entries that are no longer reachable from the current tip, such as commits left behind by a rebase, amend, or reset (default 30 days). Setting these appropriately balances the recovery window (longer values) against storage cleanup (shorter values).

13. What does `pack.depth` configuration control in Git gc?

It sets the maximum delta chain depth when repacking objects (default 50). A deeper chain can produce smaller packs for runs of similar objects (like successive versions of a file with small changes), but makes objects slower to read, because Git must apply every delta in the chain. --aggressive mainly widens the delta window (gc.aggressiveWindow, default 250); its depth setting, gc.aggressiveDepth, also defaults to 50. For most repos, the default is fine; only very large repos with deep histories benefit from tuning it.

14. When should you use `git verify-pack` instead of `git fsck`?

Use git verify-pack when you suspect pack file corruption specifically. It verifies the .idx index file matches the .pack file and lists all objects in the pack. git fsck checks object integrity across both loose objects and packs, resolving refs and checking object connectivity. Run git verify-pack -v .git/objects/pack/*.idx to see all objects in all packs and spot-check for issues.

15. What is the commit-graph file and how does gc maintain it?

The commit-graph is a binary index that accelerates commit lookups, especially git log and reachability queries. In recent Git versions, git gc rewrites it when gc.writeCommitGraph is true (the default), and git maintenance run --task=commit-graph updates it incrementally between gc runs. Without it, repos with many commits see significantly slower log operations.

16. What happens to the reflog when a branch is deleted and then recreated?

When you delete a branch, Git deletes that branch's reflog along with it; only the HEAD reflog still records the commits you had checked out, subject to the usual reflog expiry. If you recreate the branch name, a new reflog starts fresh. The old commits stay in the repo until no reflog references them and the prune expiry passes. git reflog for the recreated branch shows only the new entries, so use git reflog show HEAD or git fsck --unreachable --no-reflogs to find the orphaned commits.
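
A quick experiment makes the branch-versus-HEAD reflog distinction concrete. The sketch below assumes the default files-based ref storage and a throwaway repository; the branch name topic is a placeholder. It checks whether the branch reflog exists before and after deletion, then looks for the deleted tip in the HEAD reflog.

```shell
#!/bin/sh
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "example@example.invalid"

echo base > f
git add f
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
echo work > f
git commit -q -a -m "topic work"
topic_tip=$(git rev-parse HEAD)

git reflog exists refs/heads/topic && before=yes || before=no

git checkout -q "$main"
git branch -q -D topic

git reflog exists refs/heads/topic && after=yes || after=no

# The HEAD reflog still lists the commit because it was checked out here.
in_head_reflog=$(git log -g --format=%H HEAD | grep -c "$topic_tip" || true)

echo "branch reflog before delete: $before"
echo "branch reflog after delete:  $after"
echo "HEAD reflog entries for tip: $in_head_reflog"
```

A commit that was only ever fetched, never checked out, has no HEAD reflog entry to fall back on, so it loses this protection as soon as its branch is deleted.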

17. How does Git LFS interact with garbage collection?

Git LFS stores large files externally and stores pointer files in the repo. The actual LFS objects are stored in .git/lfs/objects and managed by LFS itself, not Git gc. Git gc only affects the pointer files (small text files) and regular Git objects. This means repos using LFS can have large LFS storage even when Git gc has cleaned everything. Run git lfs prune separately to clean up old LFS objects.

18. What is the difference between `git reflog expire` and `git gc --prune`?

git reflog expire removes reflog entries themselves (the history of where HEAD and each branch pointed). git gc --prune removes the actual repository objects (commits, trees, blobs) that are no longer referenced by any ref or remaining reflog entry. You need reflog entries to survive long enough to let you recover, but once all reflog entries referencing an object are gone, gc can prune that object. Running reflog expire before gc is the usual cleanup sequence; how safe it is depends on the expiry window you choose.

19. Why might running `git gc` on a very large repository cause problems?

A full gc is CPU and memory intensive, re-computing deltas and rewriting pack files. On repos over 10GB, it can take hours and consume tens of gigabytes of temporary disk space. It also holds a gc.pid lock for the duration, so a second gc cannot run concurrently. Mitigation: use git maintenance for incremental maintenance, disable auto-gc in CI, run gc during maintenance windows, and use git gc --aggressive sparingly (never as a scheduled task).

20. What are the best practices for scheduling and automating Git repository maintenance in a production environment?

Schedule git maintenance run --auto as a background cron job during low-activity hours to keep the repository healthy without impacting developer workflows. Alternatively, use git maintenance start (Git 2.30+), which sets up hourly, daily, and weekly maintenance tasks automatically, including commit-graph updates, loose object packing, and incremental repack. For large monorepos, set git config gc.auto 0 and run git gc manually during scheduled maintenance windows instead of relying on auto-gc triggers. Monitor repository health monthly with git count-objects -vH and git fsck --full to catch issues early. In CI/CD pipelines, disable auto-gc in the same way to prevent unpredictable slowdowns during builds.

Conclusion

Git’s garbage collector is the housekeeping routine that keeps your repo healthy: it prunes unreachable objects, packs loose files, and optimizes storage. Regular maintenance prevents repository bloat and keeps Git operations fast over years of development.
