The .git Directory Structure
Exploring the .git folder: HEAD, config, objects, refs, hooks, and index. Understand how Git stores everything internally for better debugging and recovery.
Introduction
Every Git repository you’ve ever cloned, initialized, or worked with hides a single directory that contains the entire universe of that project’s history: the .git directory. While most developers interact with Git through high-level commands like git commit, git push, and git merge, understanding what lives inside .git transforms you from a Git user into a Git operator.
The .git directory is not a black box — it’s a carefully organized database of files, references, and metadata that together implement a distributed version control system. When things go wrong (and they will), knowing this structure is the difference between panicking and confidently recovering your work.
This deep dive examines every component of the .git directory, explaining what each file and folder does, how they interconnect, and why Git’s design choices matter for real-world development workflows.
When to Use / When Not to Use
When to understand .git internals:
- Recovering from corrupted repositories or lost commits
- Debugging mysterious Git behavior (detached HEAD, missing branches)
- Writing custom Git hooks or tooling
- Optimizing large repositories
- Understanding how Git achieves its guarantees
When not to dig into .git:
- Daily development workflows — use normal Git commands
- When you’re unsure: modifying .git files directly can corrupt your repo
- For simple tasks like committing or branching: the CLI is sufficient
Core Concepts
The .git directory is Git’s entire knowledge base. Everything outside of it is your working tree — the files you edit. Everything inside is Git’s internal representation of your project’s history, configuration, and state.
graph TD
A[".git Directory"] --> B["HEAD"]
A --> C["config"]
A --> D["objects/"]
A --> E["refs/"]
A --> F["hooks/"]
A --> G["index"]
A --> H["logs/"]
A --> I["description"]
A --> J["packed-refs"]
A --> K["COMMIT_EDITMSG"]
D --> D1["info/"]
D --> D2["pack/"]
D --> D3["<first-2-hex-chars>/"]
E --> E1["heads/"]
E --> E2["tags/"]
E --> E3["remotes/"]
Git’s design follows a simple principle: everything is a file. This means you can inspect, backup, and even manually repair a repository using standard file operations. The directory structure is stable across Git versions, making it a reliable foundation for tooling.
Architecture or Flow Diagram
flowchart LR
WT["Working Tree\n(your files)"] -->|git add| IDX["index\n(staging area)"]
IDX -->|git commit| OBJ["objects/\n(blobs, trees, commits)"]
OBJ -->|referenced by| REF["refs/\n(branches, tags)"]
REF -->|pointed to by| HD["HEAD\n(current ref)"]
HD -->|determines| WT
CFG["config"] -.->|settings| IDX
CFG -.->|settings| OBJ
HKS["hooks/"] -.->|triggers| IDX
HKS -.->|triggers| OBJ
The flow shows how data moves from your working tree through the staging area into the object database, with references and HEAD tracking the current state. Configuration and hooks influence every transition.
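To make the flow concrete, here is a minimal sketch that follows one file through each stage in a throwaway repository. The file name, identity, and branch name are placeholders chosen for illustration, and the object counts assume a fresh repository with nothing packed yet.

```shell
# Throwaway repository; identity and file names are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main    # make the branch name predictable

echo 'print("hi")' > main.py

# Working tree -> index: `git add` writes a blob and records it in the index.
git add main.py
blob=$(git ls-files --stage main.py | cut -d' ' -f2)
after_add=$(git count-objects | cut -d' ' -f1)

# Index -> objects: `git commit` adds a tree object and a commit object.
git commit -q -m "Add main.py"
after_commit=$(git count-objects | cut -d' ' -f1)

# refs -> HEAD: HEAD names the branch, and the branch names the commit.
head_target=$(git symbolic-ref HEAD)
commit_id=$(git rev-parse "$head_target")
tree_id=$(git rev-parse "$commit_id^{tree}")

echo "loose objects after add: $after_add, after commit: $after_commit"
for id in "$blob" "$tree_id" "$commit_id"; do
    printf '%s %s\n' "$(git cat-file -t "$id")" "$id"
done
```

The loop prints one blob, one tree, and one commit, which is the chain HEAD ultimately resolves to.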
Step-by-Step Guide / Deep Dive
The HEAD File
HEAD is a single file containing a reference to the current branch or commit SHA:
$ cat .git/HEAD
ref: refs/heads/main
When you’re in detached HEAD state, it contains a raw SHA-1 hash instead of a symbolic reference. This file is Git’s answer to “where am I right now?”
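A script can tell the two states apart without parsing the file itself. The sketch below uses a throwaway repository; the helper name describe_head is hypothetical and exists only for this example.

```shell
# Throwaway repository so your own repo is not touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"

describe_head() {
    # `git symbolic-ref -q HEAD` prints the ref and exits 0 when HEAD is
    # symbolic; it exits non-zero without an error message when detached.
    if ref=$(git symbolic-ref -q HEAD); then
        echo "on branch: $ref"
    else
        echo "detached at: $(git rev-parse HEAD)"
    fi
}

attached=$(describe_head)
git checkout -q --detach          # detach HEAD at the current commit
detached=$(describe_head)

echo "$attached"
echo "$detached"
cat .git/HEAD                     # now a raw hash instead of "ref: ..."
```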
The config File
Your repository’s local configuration, overriding global (~/.gitconfig) and system-level settings:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = https://github.com/user/repo.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
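To see which file actually supplies a setting, you can ask Git for the origin of a value. This is a small sketch in a throwaway repository; the e-mail address is a placeholder, and the example assumes no environment-level overrides are in effect.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Write a repository-level value; it lands in .git/config.
git config --local user.email "repo-only@example.com"

# --show-origin reports the file that supplied the effective value.
origin=$(git config --show-origin --get user.email)
effective=$(git config --get user.email)

echo "$origin"
cat .git/config
```

Because the local file is read last, the value set here wins over anything in ~/.gitconfig or the system-level file.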
The objects/ Directory
Git’s content-addressable storage. Every blob, tree, commit, and tag lives here, named by their SHA-1 (or SHA-256) hash:
$ ls .git/objects/
02/ 0a/ 1b/ 2c/ 3d/ ... pack/ info/
The first two characters of the hash form a subdirectory; the remaining characters (38 for SHA-1, 62 for SHA-256) form the filename. Objects are zlib-compressed and stored as loose files until git gc packs them.
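The mapping from hash to path can be checked by hand. The sketch below assumes a default SHA-1 repository and stores the string "test content" as a blob; the optional sha1sum step recomputes the hash from the "blob <size>" header plus content and is skipped if that tool is not installed.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Store a blob and capture its hash.
sha=$(echo 'test content' | git hash-object -w --stdin)
echo "$sha"    # -> d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Split the hash into directory (first 2 chars) and filename (the rest).
dir=$(printf '%s' "$sha" | cut -c1-2)
file=$(printf '%s' "$sha" | cut -c3-)
path=".git/objects/$dir/$file"
ls -l "$path"

# The file on disk is zlib-compressed; let Git decompress it.
git cat-file -p "$sha"

# Optional: the hash covers "blob <size>", a NUL byte, then the content.
if command -v sha1sum >/dev/null 2>&1; then
    manual=$(printf 'blob 13\0test content\n' | sha1sum | cut -d' ' -f1)
    echo "recomputed: $manual"
fi
```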
The refs/ Directory
References are named pointers to commits:
refs/
├── heads/ # Local branches
│ ├── main
│ └── feature/auth
├── tags/ # Tags
│ └── v1.0.0
└── remotes/ # Remote-tracking branches
└── origin/
├── main
└── develop
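Each loose ref file contains a 40-character SHA-1 hash (64 characters in a SHA-256 repository). Reading refs/heads/main tells you exactly which commit main points to. If the file is missing, the ref may have been moved into packed-refs; git rev-parse refs/heads/main resolves it in either case.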
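A short experiment shows both places a ref can live. This sketch uses a throwaway repository with the default files-based ref storage; the branch name and identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"

# A fresh branch is a loose ref: a small file holding the commit hash.
loose=$(cat .git/refs/heads/main)
resolved=$(git rev-parse refs/heads/main)

# Packing moves refs into the single packed-refs file.
git pack-refs --all
[ -f .git/refs/heads/main ] || echo "refs/heads/main is no longer a loose file"
grep 'refs/heads/main' .git/packed-refs

# Git resolves the ref the same way regardless of where it is stored.
after=$(git rev-parse refs/heads/main)
echo "$after"
```

This is why tooling should prefer git rev-parse or git for-each-ref over reading files under refs/ directly.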
The index File
The index (or staging area) is a binary file that records every tracked path together with the blob that will go into the next commit. It is not human-readable, but you can inspect it:
$ git ls-files --stage
100644 abc123... 0 src/main.py
100644 def456... 0 src/utils.py
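Although the index is binary, its 12-byte header is easy to peek at: a 4-byte signature, a 4-byte version, and a 4-byte entry count. The following sketch stages two placeholder files in a throwaway repository and reads the signature and count with standard tools; it assumes fewer than 256 entries, so only the last count byte is non-zero.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir src
echo 'print("main")' > src/main.py
echo 'print("utils")' > src/utils.py
git add src

# Bytes 0-3: the signature "DIRC".
signature=$(head -c 4 .git/index)

# Bytes 8-11: number of entries as a big-endian 32-bit integer.
# Print the four bytes as decimals and take the last one.
entries=$(od -An -tu1 -j8 -N4 .git/index | awk '{print $4}')

# Cross-check against what Git itself reports.
listed=$(git ls-files --stage | wc -l | tr -d ' ')

echo "signature=$signature entries=$entries listed=$listed"
```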
The hooks/ Directory
Executable scripts that run at specific points in Git’s workflow:
hooks/
├── pre-commit.sample
├── pre-push.sample
├── commit-msg.sample
└── ...
Remove the .sample suffix and make the file executable to activate a hook. Hooks are not versioned; each developer manages their own.
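The exit code of a hook is what gives it teeth. As a rough illustration, the sketch below installs a deliberately strict pre-commit hook (a hypothetical policy that rejects everything) in a throwaway repository and shows that the commit is aborted unless the hook is bypassed.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir -p .git/hooks

# Hypothetical policy for illustration: reject every commit.
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/sh
echo "pre-commit: rejecting commit" >&2
exit 1
EOF
chmod +x .git/hooks/pre-commit

# A non-zero exit from pre-commit aborts the commit.
if git commit -q --allow-empty -m "blocked" 2>/dev/null; then
    result="committed"
else
    result="aborted by hook"
fi
echo "$result"

# --no-verify skips the pre-commit and commit-msg hooks.
git commit -q --allow-empty --no-verify -m "bypassed"
git --no-pager log --oneline
```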
The logs/ Directory
Reflogs track reference movements, enabling recovery of “lost” commits:
$ cat .git/logs/HEAD
0000000... abc123... User <user@email.com> 1711900000 +0000 commit: Initial commit
abc123... def456... User <user@email.com> 1711900100 +0000 commit: Add feature
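Here is a minimal recovery sketch built on the reflog, run in a throwaway repository. The branch name recovery and the commit messages are placeholders; the approach assumes the reflog is enabled (the default for non-bare repositories) and that the commit has not been pruned.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m "Initial commit"
git commit -q --allow-empty -m "Add feature"
lost=$(git rev-parse HEAD)

# "Lose" the second commit.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before the reset.
git --no-pager reflog
found=$(git rev-parse 'HEAD@{1}')

# Give the commit a name again so it is reachable.
git branch recovery "$found"
git --no-pager log --oneline recovery
```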
Other Files
- description: Repository description (used by GitWeb)
- packed-refs: Packed references for performance (see Git Object Database and Pack Files)
- COMMIT_EDITMSG: Last commit message (used by
git commit --amend) - MERGE_HEAD, REBASE_HEAD: Temporary files during merge/rebase operations
Production Failure Scenarios + Mitigations
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Corrupted HEAD | “fatal: bad HEAD” | Point HEAD at a branch listed in refs/heads/ with git symbolic-ref HEAD refs/heads/main; if Git no longer recognizes the repository, write ref: refs/heads/main into .git/HEAD by hand |
| Missing or corrupt objects | “fatal: loose object corrupt” | Run git fsck --full, fetch from remote, or restore from backup |
| Broken refs | Branch points to non-existent commit | Check reflog, reset to valid commit |
| Lock file stuck | “fatal: Unable to create .git/index.lock” | Remove stale .git/index.lock after verifying no other Git process runs |
| Hook failure | Commit/push silently fails | Check hook exit codes, run hooks manually with bash .git/hooks/pre-commit |
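The corrupted HEAD row can be rehearsed safely in a throwaway repository. This sketch overwrites HEAD with junk, confirms that Git stops treating the directory as a repository, and repairs it by hand; the branch name main is an assumption, so list refs/heads/ to see what actually exists in your case.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"
good=$(git rev-parse HEAD)

# Simulate corruption: HEAD holds neither "ref: ..." nor a hash.
printf 'garbage\n' > .git/HEAD
if git --git-dir=.git rev-parse HEAD >/dev/null 2>&1; then
    state="still readable"
else
    state="unreadable"
fi
echo "$state"

# Repair by hand: point HEAD at a branch that exists under refs/heads/.
ls .git/refs/heads/
printf 'ref: refs/heads/main\n' > .git/HEAD
repaired=$(git rev-parse HEAD)
echo "$repaired"
```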
Trade-offs
| Aspect | Advantage | Disadvantage |
|---|---|---|
| File-based storage | Human-inspectable, easy to backup | Not optimized for large repos without packing |
| SHA-1 hashing | Fast content addressing | Collision risk (mitigated by SHA-256 transition) |
| Local hooks | Per-developer customization | Not shared across team, easy to forget |
| Reflog | Recovery safety net | Consumes disk space over time |
Implementation Snippets
# Initialize a new repository
git init
ls -la .git/
# Inspect HEAD
cat .git/HEAD
# Summarize object counts and disk usage
git count-objects -vH
# View packed refs (the file exists only after refs have been packed)
cat .git/packed-refs
# Inspect the index
git ls-files --stage
# List hooks
ls -la .git/hooks/
# View reflog
git reflog
cat .git/logs/HEAD
# Create a custom pre-commit hook
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
echo "Running pre-commit checks..."
npm run lint
EOF
chmod +x .git/hooks/pre-commit
Observability Checklist
- Monitor: Repository size growth with git count-objects -vH
- Log: Hook execution results (hooks should log to stderr)
- Alert: Reflog size exceeding thresholds (prune with git reflog expire)
- Verify: Run git fsck --full periodically on critical repositories
- Track: Number of loose objects vs packed objects
Security/Compliance Notes
- The .git directory contains all history; never expose it on a web server
- Hooks run with the same permissions as the user; validate hook sources
- Configuration may contain credentials (for example, in a remote URL); never share or publish .git/config with secrets in it
- Consider using git config core.hooksPath for shared, versioned hooks
Common Pitfalls / Anti-Patterns
- Editing .git files directly: always use Git commands unless you know exactly what you’re doing
- Deleting .git to “reset”: you lose all history; use git reset --hard instead
- Ignoring hook exit codes: a failing hook should abort the operation
- Not backing up .git: it IS your repository; the working tree is disposable
- Sharing hooks via .git/hooks/: hooks aren’t versioned; use core.hooksPath or a tool like Husky
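One way to share hooks is to keep them in a tracked directory and point core.hooksPath at it. The sketch below assumes a directory named .githooks and a made-up rule that rejects commit messages starting with "WIP"; both are illustrative choices, not a convention Git imposes.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Hooks live in a normal directory that can be committed with the code.
mkdir .githooks
cat > .githooks/commit-msg << 'EOF'
#!/bin/sh
# $1 is the path of the file holding the proposed commit message.
if grep -qi '^wip' "$1"; then
    echo "commit-msg: WIP commits are not allowed" >&2
    exit 1
fi
EOF
chmod +x .githooks/commit-msg

# Tell Git to look here instead of .git/hooks.
git config core.hooksPath .githooks

if git commit -q --allow-empty -m "WIP: half done" 2>/dev/null; then
    wip="accepted"
else
    wip="rejected"
fi
if git commit -q --allow-empty -m "Add parser" 2>/dev/null; then
    ok="accepted"
else
    ok="rejected"
fi
echo "WIP message: $wip, normal message: $ok"
```

Note that core.hooksPath is itself a local setting, so each clone still needs to run the git config line once, typically from a setup script.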
Quick Recap Checklist
- .git contains everything Git needs; the working tree is disposable
- HEAD points to current branch or commit
- objects/ stores all content as SHA-addressed blobs, trees, commits, tags
- refs/ contains branch and tag pointers
- index is the staging area (binary format)
- hooks/ runs scripts at key Git lifecycle events
- logs/ enables recovery through reflogs
- Never expose .git on production web servers
Interview Q&A
What is the difference between HEAD and a branch reference?
HEAD is normally a symbolic reference that points to the current branch (e.g., ref: refs/heads/main). A branch reference is a file in .git/refs/heads/ that points to a specific commit SHA. HEAD moves when you check out different branches; branch refs move when you make new commits.
Why does Git store loose objects in subdirectories named after the first two characters of the hash?
Filesystem performance degrades with too many files in a single directory. By using the first two hex characters of the SHA-1 hash as a subdirectory name, Git spreads loose objects across at most 256 subdirectories (16²), so each one holds only about 1/256 of the loose objects. For larger object counts, git gc consolidates loose objects into pack files.
How do you recover a commit that was lost after git reset --hard?
Use git reflog to find the SHA of the lost commit. The reflog in .git/logs/HEAD records every HEAD movement, including resets. Once you have the SHA, run git checkout <sha> to inspect it or git branch recovery <sha> to restore it on a branch. This works as long as git gc hasn't pruned the unreachable objects.
What happens inside .git when you run git add?
Git creates blob objects in .git/objects/ for each file's content (if not already present), then updates the .git/index binary file to record the blob SHA, file path, and metadata. Each staged file is hashed; if a blob with that hash already exists it is reused, which is how identical content is deduplicated.
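The deduplication in the last answer is easy to observe. This sketch stages two placeholder files with identical content in a throwaway SHA-1 repository and counts loose objects before any commit exists.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo 'hello' > greeting.txt
before=$(git count-objects | cut -d' ' -f1)

# Staging writes the blob immediately; no commit is needed.
git add greeting.txt
after=$(git count-objects | cut -d' ' -f1)
blob=$(git ls-files --stage greeting.txt | cut -d' ' -f2)
echo "$blob"    # -> ce013625030ba8dba906f756967f9e9ca394464a

# A second file with the same content maps to the same blob.
cp greeting.txt copy.txt
git add copy.txt
dedup=$(git count-objects | cut -d' ' -f1)
copy_blob=$(git ls-files --stage copy.txt | cut -d' ' -f2)

echo "objects: before=$before after=$after after-copy=$dedup"
```

Two index entries end up pointing at a single object, because the blob name depends only on content and not on the path.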
.git Directory Tree (Clean Architecture)
graph TD
ROOT[".git/"] --> HEAD["HEAD"]
ROOT --> CONFIG["config"]
ROOT --> OBJECTS["objects/"]
ROOT --> REFS["refs/"]
ROOT --> HOOKS["hooks/"]
ROOT --> INDEX["index"]
ROOT --> LOGS["logs/"]
ROOT --> PACKED["packed-refs"]
ROOT --> DESC["description"]
ROOT --> EDITMSG["COMMIT_EDITMSG"]
OBJECTS --> OBJ_INFO["info/"]
OBJECTS --> OBJ_PACK["pack/"]
OBJECTS --> OBJ_LOOSE["<first-2-chars>/<remaining-38>"]
REFS --> REFS_HEADS["heads/ (branches)"]
REFS --> REFS_TAGS["tags/"]
REFS --> REFS_REMOTES["remotes/ (tracking)"]
REFS_HEADS --> RH_MAIN["main"]
REFS_HEADS --> RH_FEAT["feature/auth"]
REFS_TAGS --> RT_V1["v1.0.0"]
REFS_REMOTES --> RR_ORIGIN["origin/"]
RR_ORIGIN --> RR_MAIN["main"]
RR_ORIGIN --> RR_DEV["develop"]
Production Failure: Corrupted Repository Recovery
Scenario: Corrupted index file preventing all operations
# Symptoms
$ git status
fatal: .git/index: index file smaller than expected
$ git add .
fatal: Unable to create '/path/to/repo/.git/index.lock': File exists.
# Root cause: Binary index file was truncated or corrupted
# (disk failure, interrupted operation, filesystem corruption)
# Recovery steps:
# 1. Remove stale lock file (verify no git process is running first)
rm -f .git/index.lock
# 2. Delete the damaged index, then rebuild it from HEAD
rm -f .git/index
git reset
# 3. Alternatively, repopulate the index from HEAD with plumbing
git read-tree HEAD
# 4. Verify working tree matches
git status
# If objects are missing:
$ git cat-file -t abc123
fatal: git cat-file: could not get object info
# Recovery:
git fsck --full
# Fetch missing objects from remote
git fetch origin
# Or restore .git/objects/ from backup
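The index scenario above can be rehearsed end to end in a throwaway repository. The sketch below truncates the index on purpose, checks that Git refuses to work, and then rebuilds the index from HEAD; it assumes the damaged file is deleted before git reset runs, and the file name is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main

echo 'data' > file.txt
git add file.txt
git commit -q -m "Add file"

# Simulate a truncated index: only the 4-byte signature survives.
printf 'DIRC' > .git/index
if git status >/dev/null 2>&1; then
    before="ok"
else
    before="broken"
fi
echo "status with truncated index: $before"

# Recovery: discard the damaged index (and any stale lock), rebuild from HEAD.
# Only remove index.lock after confirming no Git process is running.
rm -f .git/index .git/index.lock
git reset -q

# An empty porcelain status means index and working tree match HEAD again.
after=$(git status --porcelain)
git ls-files
```

Nothing in objects/ or refs/ is touched by this procedure, which is why a damaged index is one of the cheaper failures to recover from.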
Manual Object Inspection with Plumbing Commands
# Create a blob and capture its hash
sha=$(echo "Hello, Git internals!" | git hash-object -w --stdin)
echo "$sha"
# Output: the 40-character SHA-1 hash of the new blob
# Verify the object exists
git cat-file -t "$sha"
# Output: blob
# Read the blob content
git cat-file -p "$sha"
# Output: Hello, Git internals!
# Check object size
git cat-file -s "$sha"
# Output: 22
# Hash a file without storing it
git hash-object src/main.py
# Output: the SHA-1 only; without -w nothing is written to .git/objects
# Inspect a tree object
git cat-file -p HEAD^{tree}
# Output:
# 100644 blob abc123... README.md
# 040000 tree def456... src
# 100644 blob 789fed... package.json
# Inspect a commit object
git cat-file -p HEAD
# Output:
# tree 4b825d...
# parent a1b2c3...
# author Name <email> timestamp timezone
# committer Name <email> timestamp timezone
#
# Commit message here
Security: .git Directory Exposure on Web Servers
Exposing .git/ on a production web server is one of the most common and dangerous misconfigurations. Attackers can download your entire source code history, including:
- All source code — every version of every file ever committed
- Committed secrets — API keys, passwords, tokens in old commits
- Developer information — names, emails from commit metadata
- Infrastructure details — deployment configs, internal URLs
# Check if your site exposes .git
curl -I https://yoursite.com/.git/HEAD
# If you get 200 OK instead of 403/404, you are vulnerable
# Common attack tools that exploit this:
# - git-dumper (downloads entire repo)
# - diggit.py (reconstructs repo from exposed objects)
Mitigation:
- Nginx: location ~ /\.git { deny all; return 404; }
- Apache: RedirectMatch 404 /\.git
- Never deploy .git/; use build artifacts, not raw repos
- Scan with gitleaks detect --log-opts="--all" before deployment