Git Garbage Collection and Maintenance

Master git gc, git prune, git fsck, and automated repository maintenance. Learn how Git manages object storage, cleans unreachable data, and keeps repositories healthy.

Reading time: 11 min · Updated: March 31, 2026

Introduction

Git repositories grow over time. Every commit, branch deletion, and force push leaves behind objects that may become unreachable. Without cleanup, your .git directory accumulates loose objects, stale reflogs, and orphaned packs — slowing down operations and consuming disk space.

Git’s garbage collection system handles this automatically, but understanding how it works lets you optimize for large repositories, recover lost data before it’s pruned, and troubleshoot storage issues. The difference between a snappy 500MB repo and a sluggish 5GB repo often comes down to maintenance practices.

This article covers every aspect of Git’s garbage collection: the automatic triggers, manual commands, pruning policies, and maintenance routines that keep repositories healthy across their entire lifecycle.

When to Use / When Not to Use

When to manage Git garbage collection:

  • Repository size is growing unexpectedly
  • CI/CD clone times are increasing
  • After history rewriting (rebase, filter-branch)
  • Before archiving or backing up repositories
  • Troubleshooting “bad object” errors

When not to intervene:

  • Small, active repositories — auto-gc handles them
  • During active development — let Git manage itself
  • When unsure about pruning — unreachable objects may be needed

Core Concepts

Git’s garbage collection has three phases:


graph TD
    REPO["Repository"] -->|accumulates| UNREACH["Unreachable Objects\n(commits, blobs, trees)"]
    REPO -->|accumulates| LOOSE["Loose Objects\nindividual files"]
    REPO -->|accumulates| STALE["Stale Reflogs\nexpired entries"]

    AUTO["Auto GC\n(triggered by operations)"] -->|checks| THRESHOLD["Threshold Check\nloose objects > gc.auto"]
    THRESHOLD -->|exceeded| PACK["Repack Objects\ninto pack files"]
    THRESHOLD -->|not exceeded| SKIP["Skip GC\ncontinue normally"]

    MANUAL["Manual GC\ngit gc"] -->|runs| PRUNE["Prune Unreachable\nobjects older than gc.pruneExpire"]
    MANUAL -->|runs| REFLOG["Expire Reflogs\nolder than gc.reflogExpire"]
    MANUAL -->|runs| PACK

Auto GC runs silently during normal operations when loose object count exceeds gc.auto (default: 6700). Manual GC gives you full control over packing, pruning, and reflog expiration.

Architecture or Flow Diagram


flowchart LR
    COMMITS["New Commits"] -->|create| OBJECTS["New Objects\nloose files"]
    OBJECTS -->|count exceeds| TRIGGER["gc.auto threshold\n(default: 6700)"]
    TRIGGER -->|triggers| AUTO_GC["Auto GC\n(git gc --auto)"]

    AUTO_GC -->|packs| PACKS["Pack Files\ncompressed objects"]
    AUTO_GC -->|removes| LOOSE_DEL["Deleted Loose Objects"]

    REBASE["git rebase"] -->|creates| ORPHAN["Orphaned Objects\nunreachable commits"]
    ORPHAN -->|after| EXPIRY["gc.pruneExpire\n(default: 14 days)"]
    EXPIRY -->|pruned| GONE["Permanently Deleted"]

    FSCK["git fsck"] -->|finds| DANGLING["Dangling Objects\nnot referenced by any ref, reflog, or object"]
    DANGLING -->|older than gc.pruneExpire| PRUNED["Pruned by gc"]

Step-by-Step Guide / Deep Dive

Automatic Garbage Collection

Git runs git gc --auto after certain operations (commit, merge, fetch). It does real work only when the loose object count exceeds gc.auto or the number of packs exceeds gc.autoPackLimit:


# Check current gc settings
git config --get gc.auto
# Output: 6700

git config --get gc.autopacklimit
# Output: 50

# Check if auto-gc will trigger
git count-objects -v
# Output:
# count: 1234        (loose objects)
# in-pack: 56789     (packed objects)
# packs: 3           (number of packs)

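Auto-gc packs loose objects and prunes unreachable objects only once they are older than gc.pruneExpire (default: two weeks). Pruning anything younger than that requires a manual git gc --prune=<date> or git prune.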
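
The threshold check can be approximated from the git count-objects -v output shown above. The sketch below is a hypothetical helper (would_auto_gc is not a Git command) that takes the two limits as arguments and falls back to the defaults of 6700 and 50. Git itself estimates the loose count rather than counting every file, so treat the result as a hint, not a guarantee.

```shell
# Sketch: read `git count-objects -v` output on stdin and report whether
# the loose-object or pack count exceeds the given limits.
# would_auto_gc is a hypothetical helper, not part of Git.
would_auto_gc() {
    loose_limit="${1:-6700}"   # mirrors the gc.auto default
    pack_limit="${2:-50}"      # mirrors the gc.autoPackLimit default
    awk -v loose_limit="$loose_limit" -v pack_limit="$pack_limit" '
        $1 == "count:" { loose = $2 }
        $1 == "packs:" { packs = $2 }
        END {
            if (loose_limit > 0 && loose > loose_limit)
                print "gc-likely: loose objects"
            else if (pack_limit > 0 && packs > pack_limit)
                print "gc-likely: pack count"
            else
                print "below thresholds"
        }'
}
```

One way to call it: git count-objects -v | would_auto_gc "$(git config --get gc.auto || echo 6700)"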

Manual Garbage Collection


# Standard garbage collection
git gc
# Output:
# Enumerating objects: 12345, done.
# Counting objects: 100% (12345/12345), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (5678/5678), done.
# Writing objects: 100% (12345/12345), done.
# Total 12345 (delta 8901), reused 12345 (delta 8901)

# Aggressive garbage collection (better compression, slower)
git gc --aggressive

# Prune immediately (dangerous — removes all unreachable objects)
git gc --prune=now

# Disable pruning entirely
git gc --prune=never
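
To see what a manual run changes, it can help to compare the loose-object count before and after. The following is a minimal sketch, assuming the repository path is passed as the first argument; loose_count and gc_and_report are hypothetical helper names.

```shell
# Sketch: run git gc in the repository at $1 and print the loose-object
# count before and after. Both functions are hypothetical helpers.
loose_count() {
    git -C "$1" count-objects -v | awk '$1 == "count:" { print $2 }'
}

gc_and_report() {
    repo="$1"
    before=$(loose_count "$repo")
    git -C "$repo" gc --quiet || return 1
    after=$(loose_count "$repo")
    echo "loose objects: $before -> $after"
}
```

Unreachable objects that are still inside the prune window can remain loose after the run, so a non-zero "after" count is not necessarily a problem.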

Pruning Unreachable Objects

Unreachable objects are commits, trees, or blobs not referenced by any branch, tag, or reflog:


# Find unreachable objects
git fsck --unreachable
# Output:
# unreachable commit abc123...
# unreachable blob def456...

# Find dangling objects: unreachable objects that nothing else references (fsck reports these by default)
git fsck --dangling

# Prune objects older than 7 days
git prune --expire=7.days.ago

# Expire reflog entries older than 30 days
git reflog expire --expire=30.days.ago --all
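
The interaction between the reflog and pruning is easier to trust once you have watched it happen. The sketch below builds a throwaway repository in a temporary directory, discards a commit with git reset, and checks whether the object still exists after each step. demo_reflog_protection is a hypothetical name, and the identity passed with -c is a placeholder.

```shell
# Sketch: show that the reflog keeps a discarded commit alive through
# `git gc --prune=now`, and that the commit is deleted once its reflog
# entries are gone. Works entirely inside a throwaway repository.
demo_reflog_protection() {
    repo=$(mktemp -d) || return 1
    g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

    g init -q
    g commit -q --allow-empty -m "keep"
    g commit -q --allow-empty -m "discard me"
    lost=$(g rev-parse HEAD)
    g reset -q --hard HEAD~1      # "discard me" is now referenced only by the reflog

    g gc --quiet --prune=now
    g cat-file -e "$lost" && echo "after gc: still present"

    g reflog expire --expire=now --all
    g gc --quiet --prune=now
    g cat-file -e "$lost" 2>/dev/null || echo "after reflog expire + gc: gone"

    rm -rf "$repo"
}
```

The second stage is the destructive combination: expiring the reflog removes the last reference, and only then does the prune delete the commit.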

Repository Health Checks


# Full integrity check
git fsck --full
# Output:
# Checking object directories: 100% (256/256)
# Checking objects: 100% (12345/12345)
# dangling commit abc123...

# Check for missing objects
git fsck --no-dangling --no-reflogs

# Verify pack files
git verify-pack -v .git/objects/pack/*.idx

# Check repository size
git count-objects -vH
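
On a large repository the fsck report can run to thousands of lines. A small tally makes it easier to spot trends. This is a sketch only: summarize_fsck is a hypothetical helper that reads fsck output on stdin and counts findings by kind and object type.

```shell
# Sketch: tally `git fsck` findings by kind (dangling, unreachable,
# missing) and object type. Lines that do not match are ignored.
summarize_fsck() {
    awk '
        $1 == "dangling" || $1 == "unreachable" || $1 == "missing" {
            tally[$1 " " $2]++
        }
        END {
            for (key in tally) print key ": " tally[key]
        }' | sort
}
```

Typical use would be git fsck --full | summarize_fsck. Any "missing" line deserves attention, while dangling objects are normal in an active repository.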

Git Maintenance Command

Git 2.30+ introduced git maintenance for automated, scheduled maintenance:


# Register for automatic maintenance
git maintenance register

# Run enabled tasks only when their thresholds are met
git maintenance run --auto

# Run specific tasks
git maintenance run --task=gc
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack

# Start background maintenance (registers the repo and installs the scheduler job)
git maintenance start

# Remove this repository from the background maintenance list
git maintenance unregister
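
Which tasks run on which schedule is controlled by the maintenance.<task>.enabled and maintenance.<task>.schedule settings. The sketch below writes one possible combination into a repository's local config. The pairing of tasks and frequencies is an illustrative assumption, not a recommendation, and configure_maintenance is a hypothetical helper.

```shell
# Sketch: pick which maintenance tasks run and how often, using
# repository-local config. The task/frequency pairing is illustrative.
configure_maintenance() {
    repo="$1"
    git -C "$repo" config maintenance.commit-graph.enabled true
    git -C "$repo" config maintenance.commit-graph.schedule hourly
    git -C "$repo" config maintenance.loose-objects.enabled true
    git -C "$repo" config maintenance.loose-objects.schedule daily
    git -C "$repo" config maintenance.gc.enabled true
    git -C "$repo" config maintenance.gc.schedule weekly
}
```

These values only matter for scheduled runs, so they take effect once background maintenance has been started for the repository.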

Production Failure Scenarios + Mitigations

| Scenario | Symptoms | Mitigation |
| --- | --- | --- |
| Accidental prune | Lost commits after git gc --prune=now | Recover from remote clone or backup; check reflog first |
| GC during CI | Slow CI runs due to gc trigger | Disable auto-gc in CI: git config gc.auto 0 |
| Pack corruption | “bad packed object” errors | Delete pack, re-fetch from remote |
| Disk space exhaustion | Repository fills disk | Run git gc --aggressive; migrate large files to LFS |
| Reflog bloat | .git/logs/ consumes GBs | git reflog expire --expire=7.days.ago --all |

Trade-offs

| Aspect | Advantage | Disadvantage |
| --- | --- | --- |
| Auto GC | Zero-config maintenance | May trigger at inconvenient times |
| Aggressive GC | Maximum compression | Very slow on large repos |
| Immediate prune | Frees disk space | Loses recovery safety net |
| 14-day default prune | Recovery window | Delays space reclamation |
| Git maintenance | Scheduled, incremental | Requires Git 2.30+ |

Implementation Snippets


# Safe cleanup workflow
git reflog expire --expire=30.days.ago --all
git gc --prune=30.days.ago

# Aggressive cleanup (after confirming no needed unreachable objects)
git reflog expire --expire=7.days.ago --all
git gc --aggressive --prune=7.days.ago

# CI/CD optimization: disable auto-gc
git config gc.auto 0
git config gc.autoDetach false

# List unreachable objects (objects only the reflog references count as unreachable here)
git fsck --unreachable --no-reflogs

# Monitor repository health
git count-objects -vH
git fsck --full 2>&1 | grep -E "error|warning"

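# Set up scheduled maintenance (registers the repo and installs a scheduler job)
git maintenance start
# Keep foreground auto-maintenance after commands as well (register/start turn it off)
git config maintenance.auto true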

Observability Checklist

  • Monitor: Loose object count (git count-objects -v)
  • Track: Repository size growth over time
  • Alert: Pack file count exceeding gc.autopacklimit
  • Verify: Integrity with periodic git fsck --full
  • Audit: Reflog size per branch
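
For tracking over time, the counters above can be flattened into one log line per sample. A minimal sketch, assuming the output of git count-objects -v (without -H, so sizes are in KiB) is piped in; repo_metrics and the key names are hypothetical.

```shell
# Sketch: turn `git count-objects -v` output into a single key=value
# line that can be appended to a log and graphed later.
repo_metrics() {
    awk '
        $1 == "count:"     { loose = $2 }
        $1 == "in-pack:"   { packed = $2 }
        $1 == "packs:"     { packs = $2 }
        $1 == "size-pack:" { size_pack = $2 }
        END {
            printf "loose=%d packed=%d packs=%d size_pack_kib=%d\n",
                   loose, packed, packs, size_pack
        }'
}
```

For example, git count-objects -v | repo_metrics >> repo-metrics.log gives a simple series to alert on when the pack count approaches gc.autopacklimit.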

Security/Compliance Notes

  • Pruned objects may still be recoverable from disk forensics
  • For true data removal, use secure deletion tools after pruning
  • GC doesn’t remove objects from remote repositories
  • See Removing Sensitive Data from History for secret removal

Common Pitfalls / Anti-Patterns

  • Running git gc --prune=now without checking reflog — loses recovery options
  • Disabling auto-gc permanently — leads to excessive loose objects
  • Not pruning after history rewrite: old objects linger until their reflog entries expire and a later gc prunes them
  • Assuming git gc removes secrets — it only removes unreachable objects, not specific content

Quick Recap Checklist

  • Auto GC triggers when loose objects exceed gc.auto (default: 6700)
  • Manual git gc packs objects and optionally prunes unreachable ones
  • Pruning removes objects not referenced by any ref or reflog
  • Default prune expiry is 14 days — provides recovery window
  • git fsck verifies repository integrity
  • git maintenance provides scheduled, incremental maintenance
  • Always check reflog before pruning

Interview Q&A

What's the difference between `git gc` and `git gc --aggressive`?

Standard git gc repacks objects with the default delta window (pack.window, 10) and depth (pack.depth, 50). --aggressive widens the delta window to 250 (gc.aggressiveWindow) and discards existing deltas, recomputing them from scratch, which produces smaller packs but takes significantly longer. The delta depth stays at 50 by default (gc.aggressiveDepth). Use it occasionally, not regularly.

How long does Git keep unreachable objects before pruning them?

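By default, two weeks (gc.pruneExpire = 2.weeks.ago). Objects still referenced from the reflog count as reachable, so they are kept until the reflog entry expires: 90 days for entries reachable from the branch's current tip (gc.reflogExpire) and 30 days for entries that are not, such as rebased or amended commits (gc.reflogExpireUnreachable). Only objects that are both unreachable AND past the prune expiry are deleted.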

Why should you disable auto-gc in CI/CD pipelines?

Auto-gc can trigger unpredictably during clone or fetch, adding minutes to CI runs. CI environments typically use shallow clones that are discarded after the run, so gc provides no benefit. Disable it with git config gc.auto 0 in CI scripts to ensure consistent build times.

Can `git gc` remove commits that were force-pushed away?

Yes, but only after the reflog expires. When you force-push a rewritten branch, the old commits become unreachable from any branch, but they remain in your local reflog. Once those reflog entries expire (default 30 days, because the commits are no longer reachable from the branch tip) AND the prune expiry passes (default 14 days), git gc will permanently remove them.

GC Process Flow (Clean Architecture)


graph TD
    REPO["Repository Operations"] -->|generate| LOOSE["Loose Objects"]
    REPO -->|generate| REFLOG["Reflog Entries"]
    REPO -->|generate| UNREACH["Unreachable Objects"]

    AUTO_GC["Auto GC\n(triggered at threshold)"] -->|checks| COUNT["Loose object count\n> gc.auto (6700)?"]
    COUNT -->|yes| PACK["Repack objects\ninto pack files"]
    COUNT -->|no| SKIP["Skip — no action"]

    MANUAL_GC["Manual GC\ngit gc"] --> PACK
    MANUAL_GC --> EXPIRE_REFLOG["Expire old reflogs\ngc.reflogExpire"]
    EXPIRE_REFLOG --> PRUNE["Prune unreachable objects\ngc.pruneExpire (14 days)"]
    PACK --> CLEAN["Clean repository\noptimized storage"]
    PRUNE --> CLEAN

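Production Failure: Data Loss After Immediate Pruning

Scenario: Commits lost after expiring the reflog and pruning immediately


# What happened:
$ git reflog expire --expire=now --all  # Drops every reflog entry
$ git gc --prune=now  # Removes ALL unreachable objects immediately
$ # (git gc --prune=now alone keeps commits the reflog still references)
$ # Later: need to recover a commit from 3 days ago
$ git reflog
# Empty: the entries were expired before the prune ran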

# Symptoms
$ git log --oneline
# Missing commits that were on deleted branches
$ git fsck --unreachable
# No unreachable objects — they're all gone

# Recovery (limited options):

# 1. Check if remote still has the commits
git fetch origin
git log origin/old-branch --oneline

# 2. Check other clones (colleagues' machines)
# Ask teammates if they have the commits locally

# 3. Check CI/CD logs for commit SHAs
# Some CI systems log commit hashes before building

# 4. If truly lost — the commits are unrecoverable
# This is why --prune=now is dangerous

# Prevention:
# NEVER use --prune=now unless you're certain
# Always use a time window:
git gc --prune=30.days.ago

# Safe cleanup workflow:
# 1. List unreachable objects, including those only the reflog still references
git fsck --unreachable --no-reflogs
# 2. Review reflog for valuable commits
git reflog --all
# 3. Create branches for anything important
git branch save-important <sha>
# 4. Then gc with a safety window
git gc --prune=90.days.ago
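
Step 3 of the workflow above can be scripted when there are many candidates. The sketch below pins every dangling commit (the tip of each chain that no branch or tag reaches) to its own branch before any pruning happens. rescue_unreachable and the rescue/ branch prefix are hypothetical choices, and the repository path is assumed to be the first argument.

```shell
# Sketch: before pruning, give every dangling commit a branch named
# rescue/<short-sha> so gc can no longer delete it. --no-reflogs makes
# fsck also report commits that only the reflog still remembers.
rescue_unreachable() {
    repo="$1"
    git -C "$repo" fsck --no-reflogs --no-progress |
        awk '$1 == "dangling" && $2 == "commit" { print $3 }' |
        while read -r sha; do
            short=$(git -C "$repo" rev-parse --short "$sha")
            git -C "$repo" branch "rescue/$short" "$sha" &&
                echo "saved $sha as rescue/$short"
        done
}
```

Pinning only the dangling tips is enough, because each branch also keeps the commit's ancestors reachable. Delete the rescue/ branches you do not need once you have reviewed them.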

Trade-offs: Auto GC vs Manual GC

| Aspect | Auto GC | Manual GC |
| --- | --- | --- |
| Trigger | Automatic (loose objects > gc.auto) | Explicit command |
| Safety | Conservative: prunes only unreachable objects older than gc.pruneExpire | Can prune immediately (--prune=now) |
| Performance | Runs during operations (may slow commit/fetch) | Runs on your schedule |
| Disk recovery | Packs objects; keeps unreachable ones inside the prune window | Can free maximum space |
| Configuration | gc.auto, gc.autopacklimit | Full control over all parameters |
| CI/CD impact | Can trigger unexpectedly | Can be disabled or scheduled |
| Best for | Daily development | Maintenance windows, large repos |
| Risk level | Low | Medium to high (depends on flags) |

Recommendation: Keep auto-gc enabled for daily work. Run manual gc monthly or after major operations (rebase, filter-repo) with a safe prune window.

Implementation: GC Configuration Tuning


# === Threshold tuning ===
# When auto-gc triggers (default: 6700 loose objects)
git config gc.auto 10000

# Maximum number of pack files before consolidating (default: 50)
git config gc.autopacklimit 10

# === Pruning windows ===
# How long unreachable objects are kept (default: 14 days)
git config gc.pruneExpire 30.days.ago

# Reflog expiry for entries reachable from the branch tip (default: 90 days)
git config gc.reflogExpire 180.days.ago

# Reflog expiry for entries not reachable from the branch tip (default: 30 days)
git config gc.reflogExpireUnreachable 90.days.ago

# === Pack optimization ===
# Delta search window size (default: 10)
git config pack.window 50

# Maximum delta chain depth (default: 50)
git config pack.depth 250

# Memory limit for delta search
git config pack.windowMemory 512m

# Compression level (0-9, default: -1 = zlib default)
git config pack.compression 6

# === Pack bitmaps (for large repos) ===
# Speeds up git rev-list operations
git config repack.writeBitmaps true

# === CI/CD optimization ===
# Disable auto-gc in CI environments
git config gc.auto 0
git config gc.autodetach false

# === Verify current configuration ===
git config --get-regexp 'gc\.'
git config --get-regexp 'pack\.'
git config --get-regexp 'repack\.'

# === Scheduled maintenance (Git 2.30+) ===
# Register for automatic background maintenance
git maintenance register

# Run maintenance tasks
git maintenance run --auto

# Available tasks: gc, commit-graph, prefetch, loose-objects, incremental-repack, pack-refs
git maintenance run --task=gc
git maintenance run --task=commit-graph
