Git Garbage Collection and Maintenance
Master git gc, git prune, git fsck, and automated repository maintenance. Learn how Git manages object storage, cleans unreachable data, and keeps repositories healthy.
Introduction
Git repositories grow over time. Every commit, branch deletion, and force push leaves behind objects that may become unreachable. Without cleanup, your .git directory accumulates loose objects, stale reflogs, and orphaned packs — slowing down operations and consuming disk space.
Git’s garbage collection system handles this automatically, but understanding how it works lets you optimize for large repositories, recover lost data before it’s pruned, and troubleshoot storage issues. The difference between a snappy 500MB repo and a sluggish 5GB repo often comes down to maintenance practices.
This article covers every aspect of Git’s garbage collection: the automatic triggers, manual commands, pruning policies, and maintenance routines that keep repositories healthy across their entire lifecycle.
When to Use / When Not to Use
When to manage Git garbage collection:
- Repository size is growing unexpectedly
- CI/CD clone times are increasing
- After history rewriting (rebase, filter-branch)
- Before archiving or backing up repositories
- Troubleshooting “bad object” errors
When not to intervene:
- Small, active repositories — auto-gc handles them
- During active development — let Git manage itself
- When unsure about pruning — unreachable objects may be needed
Core Concepts
Git’s garbage collection has three phases:
graph TD
REPO["Repository"] -->|accumulates| UNREACH["Unreachable Objects\n(commits, blobs, trees)"]
REPO -->|accumulates| LOOSE["Loose Objects\nindividual files"]
REPO -->|accumulates| STALE["Stale Reflogs\nexpired entries"]
AUTO["Auto GC\n(triggered by operations)"] -->|checks| THRESHOLD["Threshold Check\nloose objects > gc.auto"]
THRESHOLD -->|exceeded| PACK["Repack Objects\ninto pack files"]
THRESHOLD -->|not exceeded| SKIP["Skip GC\ncontinue normally"]
MANUAL["Manual GC\ngit gc"] -->|runs| PRUNE["Prune Unreachable\nobjects older than gc.pruneExpire"]
MANUAL -->|runs| REFLOG["Expire Reflogs\nolder than gc.reflogExpire"]
MANUAL -->|runs| PACK
Auto GC runs silently during normal operations when the loose object count exceeds gc.auto (default: 6700) or the number of packs exceeds gc.autoPackLimit (default: 50). Manual GC gives you full control over packing, pruning, and reflog expiration.
Architecture or Flow Diagram
flowchart LR
COMMITS["New Commits"] -->|create| OBJECTS["New Objects\nloose files"]
OBJECTS -->|count exceeds| TRIGGER["gc.auto threshold\n(default: 6700)"]
TRIGGER -->|triggers| AUTO_GC["Auto GC\n(git gc --auto)"]
AUTO_GC -->|packs| PACKS["Pack Files\ncompressed objects"]
AUTO_GC -->|removes| LOOSE_DEL["Deleted Loose Objects"]
REBASE["git rebase"] -->|creates| ORPHAN["Orphaned Objects\nunreachable commits"]
ORPHAN -->|after| EXPIRY["gc.pruneExpire\n(default: 14 days)"]
EXPIRY -->|pruned| GONE["Permanently Deleted"]
FSCK["git fsck"] -->|finds| DANGLING["Dangling Objects\nreferenced by nothing"]
DANGLING -->|gc.pruneExpire passes| PRUNED["Pruned by gc"]
Step-by-Step Guide / Deep Dive
Automatic Garbage Collection
Git runs git gc --auto after certain operations (such as commit, merge, and fetch); the command does real work only when the loose object count or the pack count exceeds its threshold:
# Check current gc settings
git config --get gc.auto
# Prints nothing if unset; the built-in default is 6700
git config --get gc.autopacklimit
# Prints nothing if unset; the built-in default is 50
# Check if auto-gc will trigger
git count-objects -v
# Output:
# count: 1234 (loose objects)
# in-pack: 56789 (packed objects)
# packs: 3 (number of packs)
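Auto-gc packs loose objects and, like manual git gc, prunes unreachable objects only once they are older than gc.pruneExpire (two weeks by default). Pruning sooner than that requires a manual git gc --prune=<date>.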
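The threshold check can be approximated by hand. The sketch below compares the loose object count from git count-objects with gc.auto, falling back to the default of 6700 when the key is unset. The helper name would_auto_gc is hypothetical, and Git itself uses a cheaper estimate rather than an exact count, so treat the result as an approximation.

```shell
#!/bin/sh
# would_auto_gc (hypothetical helper): report whether the loose object count
# in the current repository exceeds the gc.auto threshold.
would_auto_gc() {
    # Fall back to the default when gc.auto is not configured anywhere.
    threshold=$(git config --get gc.auto || echo 6700)
    # "count:" is the number of loose objects reported by count-objects.
    loose=$(git count-objects -v | awk '/^count:/ {print $2}')
    # A threshold of 0 disables auto-gc entirely.
    if [ "$threshold" -gt 0 ] && [ "$loose" -gt "$threshold" ]; then
        echo "auto-gc likely: $loose loose objects > $threshold"
    else
        echo "auto-gc unlikely: $loose loose objects, threshold $threshold"
    fi
}
```

Run would_auto_gc from inside a working tree; it only reads configuration and object counts and changes nothing.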
Manual Garbage Collection
# Standard garbage collection
git gc
# Output:
# Enumerating objects: 12345, done.
# Counting objects: 100% (12345/12345), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (5678/5678), done.
# Writing objects: 100% (12345/12345), done.
# Total 12345 (delta 8901), reused 12345 (delta 8901)
# Aggressive garbage collection (better compression, slower)
git gc --aggressive
# Prune immediately (dangerous — removes all unreachable objects)
git gc --prune=now
# Disable pruning entirely
git gc --prune=never
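To see the packing step in isolation, the following sketch builds a throwaway repository, records the loose object count, runs git gc, and records it again. The temporary directory, file name, and demo identity are placeholders chosen for illustration.

```shell
#!/bin/sh
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Each commit adds a loose blob, tree, and commit object.
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after (packs: $packs_after)"
```

All reachable loose objects end up in a pack, which is why the second count drops to zero here.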
Pruning Unreachable Objects
Unreachable objects are commits, trees, or blobs not referenced by any branch, tag, or reflog:
# Find unreachable objects
git fsck --unreachable
# Output:
# unreachable commit abc123...
# unreachable blob def456...
# Find dangling objects (unreachable objects that no other object points to)
git fsck --dangling
# Prune objects older than 7 days
git prune --expire=7.days.ago
# Expire reflog entries older than 30 days
git reflog expire --expire=30.days.ago --all
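The interaction between the reflog and pruning is easiest to see in a scratch repository. This sketch orphans a commit with --amend, then checks whether the object still exists after an immediate prune, and again after the reflog has been expired as well. The helper g and the file name are illustrative only.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# g: run git with a throwaway identity so commits work on any machine.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo one > file.txt; git add file.txt; g commit -q -m "first draft"
orphan=$(git rev-parse HEAD)

# Amending replaces the commit; the original is now referenced only by the reflog.
echo two > file.txt; git add file.txt; g commit -q --amend -m "second draft"

# An immediate prune leaves it alone, because reflog entries count as references.
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then survived_gc=yes; else survived_gc=no; fi

# Once the reflog is expired too, the commit is truly unreachable and gets pruned.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then survived_expiry=yes; else survived_expiry=no; fi

echo "object present after gc --prune=now:     $survived_gc"
echo "object present after reflog expire + gc: $survived_expiry"
```

The second half is destructive by design, so try it only in a scratch directory like the one created here.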
Repository Health Checks
# Full integrity check
git fsck --full
# Output:
# Checking object directories: 100% (256/256)
# Checking objects: 100% (12345/12345)
# dangling commit abc123...
# Check for missing objects
git fsck --no-dangling --no-reflogs
# Verify pack files
git verify-pack -v .git/objects/pack/*.idx
# Check repository size
git count-objects -vH
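For routine checks across several repositories, the fsck exit status is often enough. A minimal sketch, assuming a hypothetical helper named repo_health that takes a repository path:

```shell
#!/bin/sh
# repo_health (hypothetical helper): print a one-line verdict for a repository path.
repo_health() {
    dir=$1
    # fsck exits non-zero when it finds corruption or missing objects;
    # --no-dangling hides the harmless "dangling ..." notices.
    if git -C "$dir" fsck --full --no-dangling >/dev/null 2>&1; then
        echo "ok: $dir"
    else
        echo "PROBLEMS: $dir (run 'git fsck --full' there for details)"
        return 1
    fi
}
```

Usage: repo_health /path/to/repo. The non-zero return on failure lets a cron job or CI step fail loudly.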
Git Maintenance Command
The git maintenance command arrived in Git 2.29; Git 2.30 added the register, start, and stop subcommands for automated, scheduled maintenance:
# Register this repository for scheduled maintenance
git maintenance register
# Run only the tasks whose thresholds are currently exceeded
git maintenance run --auto
# Run specific tasks
git maintenance run --task=gc
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack
# Start the background maintenance schedule (also registers the repository)
git maintenance start
# Remove this repository from scheduled maintenance
git maintenance unregister
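Scripts that run on machines with mixed Git versions may want to fall back to git gc when git maintenance is unavailable. The sketch below compares only the major and minor version numbers; version_at_least and maintain_repo are hypothetical helper names, and the 2.30 cut-off follows the version requirement stated above.

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeed when dotted version HAVE >= WANT,
# comparing the major and minor components only.
version_at_least() {
    have=$1
    want=$2
    old_ifs=$IFS
    IFS=.
    set -- $have; h1=${1:-0}; h2=${2:-0}
    set -- $want; w1=${1:-0}; w2=${2:-0}
    IFS=$old_ifs
    [ "$h1" -gt "$w1" ] || { [ "$h1" -eq "$w1" ] && [ "$h2" -ge "$w2" ]; }
}

# maintain_repo: prefer git maintenance where available, else plain auto-gc.
maintain_repo() {
    current=$(git version | awk '{print $3}')   # e.g. "2.39.2"
    if version_at_least "$current" 2.30; then
        git maintenance run --auto
    else
        git gc --auto
    fi
}
```

Both branches are threshold-driven, so calling maintain_repo frequently is cheap when there is nothing to do.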
Production Failure Scenarios + Mitigations
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Accidental prune | Lost commits after git gc --prune=now | Recover from remote clone or backup; check reflog first |
| GC during CI | Slow CI runs due to gc trigger | Disable auto-gc in CI: git config gc.auto 0 |
| Pack corruption | “bad packed object” errors | Move the corrupt pack aside, then re-fetch from the remote or re-clone |
| Disk space exhaustion | Repository fills disk | Run git gc (repacking needs temporary free space); migrate large files to LFS |
| Reflog bloat | .git/logs/ consumes GBs | git reflog expire --expire=7.days.ago --all |
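The "GC during CI" mitigation in the table can be applied conditionally, so the same script behaves normally on developer machines. This is a sketch only: it assumes the CI system exports CI=true, which many do but not all, and configure_ci_gc is a hypothetical helper name.

```shell
#!/bin/sh
# configure_ci_gc (hypothetical helper): disable auto-gc for the current
# repository, but only when running under CI.
configure_ci_gc() {
    # CI=true is an assumption about the environment; adjust for your system.
    if [ "${CI:-false}" = "true" ]; then
        git config gc.auto 0            # never trigger auto-gc
        git config gc.autoDetach false  # keep any explicit gc in the foreground
    fi
}
```

Call configure_ci_gc right after the checkout step, before any fetch or commit that might otherwise trigger gc.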
Trade-offs
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Auto GC | Zero-config maintenance | May trigger at inconvenient times |
| Aggressive GC | Maximum compression | Very slow on large repos |
| Immediate prune | Frees disk space | Loses recovery safety net |
| 14-day default prune | Recovery window | Delays space reclamation |
| Git maintenance | Scheduled, incremental | Requires Git 2.30+ |
Implementation Snippets
# Safe cleanup workflow
git reflog expire --expire=30.days.ago --all
git gc --prune=30.days.ago
# Aggressive cleanup (after confirming no needed unreachable objects)
git reflog expire --expire=7.days.ago --all
git gc --aggressive --prune=7.days.ago
# CI/CD optimization: disable auto-gc
git config gc.auto 0
git config gc.autoDetach false
# Preview unreachable objects, including those only the reflog still protects
git fsck --unreachable --no-reflogs
# Monitor repository health
git count-objects -vH
git fsck --full 2>&1 | grep -E "error|warning"
# Set up scheduled maintenance (registers the repository and installs the schedule)
git maintenance start
# The gc task honors the regular gc.auto threshold
git config gc.auto 6700
Observability Checklist
- Monitor: Loose object count (git count-objects -v)
- Track: Repository size growth over time
- Alert: Pack file count exceeding gc.autopacklimit
- Verify: Integrity with periodic git fsck --full
- Audit: Reflog size per branch
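The first three checklist items can be fed to a metrics collector with a small wrapper around git count-objects. The sketch below is one way to do it; the helper name gc_metrics and the key names on the left of each = are made up for illustration.

```shell
#!/bin/sh
# gc_metrics (hypothetical helper): emit key=value pairs for the current
# repository, derived from "git count-objects -v".
gc_metrics() {
    git count-objects -v | awk -F': ' '
        $1 == "count"     { print "loose_objects=" $2 }
        $1 == "packs"     { print "pack_files=" $2 }
        $1 == "size-pack" { print "pack_size_kib=" $2 }
    '
}
```

count-objects reports sizes in KiB unless -H is given, which is why the last key carries that unit.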
Security/Compliance Notes
- Pruned objects may still be recoverable from disk forensics
- For true data removal, use secure deletion tools after pruning
- GC doesn’t remove objects from remote repositories
- See Removing Sensitive Data from History for secret removal
Common Pitfalls / Anti-Patterns
- Running git gc --prune=now without checking reflog: loses recovery options
- Disabling auto-gc permanently: leads to excessive loose objects
- Not pruning after history rewrite: old objects linger until their reflog entries expire
- Assuming git gc removes secrets: it only removes unreachable objects, not specific content
Quick Recap Checklist
- Auto GC triggers when loose objects exceed gc.auto (default: 6700)
- Manual git gc packs objects and optionally prunes unreachable ones
- Pruning removes objects not referenced by any ref or reflog
- Default prune expiry is 14 days, which provides a recovery window
- git fsck verifies repository integrity
- git maintenance provides scheduled, incremental maintenance
- Always check reflog before pruning
Interview Q&A
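Q: What is the difference between git gc and git gc --aggressive?
Standard git gc repacks objects with the default delta window and depth, reusing existing deltas where it can. --aggressive uses a much larger delta window (gc.aggressiveWindow, default 250, versus 10 for pack.window) and discards existing deltas to recompute them from scratch, producing smaller packs but taking significantly longer. The delta depth limit (gc.aggressiveDepth) defaults to 50, the same as pack.depth, so the gain comes from the wider search rather than deeper chains. Use it occasionally, not regularly.
Q: How long does Git keep unreachable objects before pruning them?
By default, two weeks (gc.pruneExpire = 2.weeks.ago). Objects that are still referenced from the reflog count as reachable and are kept until the reflog entry expires: 90 days for entries reachable from the branch's current tip (gc.reflogExpire), and 30 days for entries that are not, such as rebased or amended commits (gc.reflogExpireUnreachable). Only objects that are both unreachable AND past the prune expiry are deleted.
Q: Why would you disable auto-gc in CI/CD?
Auto-gc can trigger unpredictably during clone or fetch, adding minutes to CI runs. CI environments typically use shallow clones that are discarded after the run, so gc provides no benefit. Disable it with git config gc.auto 0 in CI scripts to ensure consistent build times.
Q: Can commits discarded by a force push eventually be deleted by garbage collection?
Yes, but only after the reflog expires. When you force-push, the old commits become unreachable from any branch, but they remain in your local reflog. Once those reflog entries expire (default 30 days for commits no longer reachable from the branch tip) AND the prune expiry passes (default 14 days), git gc will permanently remove them.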
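While the reflog entry still exists, a discarded commit can be pinned to a new branch before garbage collection ever sees it. The sketch below simulates a rewrite with --amend in a scratch repository and recovers the original commit through HEAD@{1}; the branch name rescued and the helper g are illustrative.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# g: run git with a throwaway identity so commits work on any machine.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo v1 > notes.txt; git add notes.txt; g commit -q -m "keep me"
lost=$(git rev-parse HEAD)

# Rewrite history: the original commit drops off the branch.
echo v2 > notes.txt; git add notes.txt; g commit -q --amend -m "rewritten"

# HEAD@{1} is the reflog entry for where HEAD pointed before the rewrite.
previous=$(git rev-parse 'HEAD@{1}')

# A branch makes the commit reachable again, so pruning can no longer remove it.
git branch rescued "$previous"
git log --oneline -1 rescued
```

In a real repository, inspect git reflog first to find the right entry; HEAD@{1} is only correct when the rewrite was the most recent operation.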
GC Process Flow (Clean Architecture)
graph TD
REPO["Repository Operations"] -->|generate| LOOSE["Loose Objects"]
REPO -->|generate| REFLOG["Reflog Entries"]
REPO -->|generate| UNREACH["Unreachable Objects"]
AUTO_GC["Auto GC\n(triggered at threshold)"] -->|checks| COUNT["Loose object count\n> gc.auto (6700)?"]
COUNT -->|yes| PACK["Repack objects\ninto pack files"]
COUNT -->|no| SKIP["Skip — no action"]
MANUAL_GC["Manual GC\ngit gc"] --> PACK
MANUAL_GC --> EXPIRE_REFLOG["Expire old reflogs\ngc.reflogExpire"]
EXPIRE_REFLOG --> PRUNE["Prune unreachable objects\ngc.pruneExpire (14 days)"]
PACK --> CLEAN["Clean repository\noptimized storage"]
PRUNE --> CLEAN
Production Failure: Aggressive GC Data Loss
Scenario: Reflog data loss after aggressive pruning
# What happened:
$ git reflog expire --expire=now --all # Wipes every reflog entry
$ git gc --prune=now # Removes ALL unreachable objects immediately
$ # Later: need to recover a commit from 3 days ago
$ git reflog
# Empty: the entries were expired, so nothing protected the old commits
# Symptoms
$ git log --oneline
# Missing commits that were on deleted branches
$ git fsck --unreachable
# No unreachable objects — they're all gone
# Recovery (limited options):
# 1. Check if remote still has the commits
git fetch origin
git log origin/old-branch --oneline
# 2. Check other clones (colleagues' machines)
# Ask teammates if they have the commits locally
# 3. Check CI/CD logs for commit SHAs
# Some CI systems log commit hashes before building
# 4. If truly lost — the commits are unrecoverable
# This is why --prune=now is dangerous
# Prevention:
# NEVER use --prune=now unless you're certain
# Always use a time window:
git gc --prune=30.days.ago
# Safe cleanup workflow:
# 1. Check what would be pruned
git fsck --unreachable --no-reflogs
# 2. Review reflog for valuable commits
git reflog --all
# 3. Create branches for anything important
git branch save-important <sha>
# 4. Then gc with a safety window
git gc --prune=90.days.ago
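Step 3 of the workflow above can be automated when there are too many candidates to review by hand. The following sketch pins every commit that no branch or tag references under a refs/rescue/ namespace, so a later gc cannot prune it. Both the helper name rescue_unreachable and the refs/rescue/ namespace are assumptions made for this example, not Git conventions.

```shell
#!/bin/sh
# rescue_unreachable (hypothetical helper): create a ref under refs/rescue/
# for every commit that no branch or tag references.
rescue_unreachable() {
    # --no-reflogs also lists commits that only the reflog still references.
    git fsck --unreachable --no-reflogs 2>/dev/null |
    awk '$1 == "unreachable" && $2 == "commit" {print $3}' |
    while read -r sha; do
        git update-ref "refs/rescue/$sha" "$sha"
        echo "pinned $sha"
    done
}
```

After reviewing the pinned commits, delete the refs you do not need with git update-ref -d and run gc again to reclaim the space.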
Trade-offs: Auto GC vs Manual GC
| Aspect | Auto GC | Manual GC |
|---|---|---|
| Trigger | Automatic (loose objects > gc.auto) | Explicit command |
| Safety | Conservative: prunes only objects older than gc.pruneExpire | Can prune immediately (--prune=now) |
| Performance | Runs during operations (may slow commit/fetch) | Runs on your schedule |
| Disk recovery | Packs objects; unreachable ones wait out the grace period | Can free maximum space |
| Configuration | gc.auto, gc.autopacklimit | Full control over all parameters |
| CI/CD impact | Can trigger unexpectedly | Can be disabled or scheduled |
| Best for | Daily development | Maintenance windows, large repos |
| Risk level | Low | Medium to high (depends on flags) |
Recommendation: Keep auto-gc enabled for daily work. Run manual gc monthly or after major operations (rebase, filter-repo) with a safe prune window.
Implementation: GC Configuration Tuning
# === Threshold tuning ===
# When auto-gc triggers (default: 6700 loose objects)
git config gc.auto 10000
# Maximum number of pack files before consolidating (default: 50)
git config gc.autopacklimit 10
# === Pruning windows ===
# How long unreachable objects are kept (default: 14 days)
git config gc.pruneExpire 30.days.ago
# Reflog expiry for entries reachable from the branch tip (default: 90 days)
git config gc.reflogExpire 180.days.ago
# Reflog expiry for entries no longer reachable from the branch tip (default: 30 days)
git config gc.reflogExpireUnreachable 90.days.ago
# === Pack optimization ===
# Delta search window size (default: 10)
git config pack.window 50
# Maximum delta chain depth (default: 50)
git config pack.depth 250
# Memory limit for delta search
git config pack.windowMemory 512m
# Compression level (0-9, default: -1 = zlib default)
git config pack.compression 6
# === Pack bitmaps (for large repos) ===
# Speeds up git rev-list operations
git config repack.writeBitmaps true
# === CI/CD optimization ===
# Disable auto-gc in CI environments
git config gc.auto 0
git config gc.autodetach false
# === Verify current configuration ===
git config --get-regexp 'gc\.'
git config --get-regexp 'pack\.'
git config --get-regexp 'repack\.'
# === Scheduled maintenance (Git 2.30+) ===
# Register for automatic background maintenance
git maintenance register
# Run only the tasks whose thresholds are currently exceeded
git maintenance run --auto
# Tasks include gc, commit-graph, prefetch, loose-objects, incremental-repack, and pack-refs
git maintenance run --task=gc
git maintenance run --task=commit-graph
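Because git config --get prints nothing for unset keys, it can be hard to tell which values are actually in effect. The sketch below prints each setting with its default substituted when nothing is configured. The helper name show_gc_settings is hypothetical, and the defaults listed are the ones cited in this article and in the Git documentation.

```shell
#!/bin/sh
# show_gc_settings (hypothetical helper): print effective gc settings,
# substituting the default for keys that are not configured anywhere.
show_gc_settings() {
    while read -r key default; do
        # git config exits non-zero for unset keys, so the default is echoed instead.
        value=$(git config --get "$key" || echo "$default")
        printf '%-28s %s\n' "$key" "$value"
    done <<'EOF'
gc.auto 6700
gc.autoPackLimit 50
gc.pruneExpire 2 weeks ago
gc.reflogExpire 90 days
gc.reflogExpireUnreachable 30 days
EOF
}
```

Run it before and after tuning to confirm that a change landed at the scope you intended (local, global, or system).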