The .git Directory Structure

Exploring the .git folder: HEAD, config, objects, refs, hooks, and index. Understand how Git stores everything internally for better debugging and recovery.

11 min read · Updated: March 31, 2026

Introduction

Every Git repository you’ve ever cloned, initialized, or worked with hides a single directory that contains the entire universe of that project’s history: the .git directory. While most developers interact with Git through high-level commands like git commit, git push, and git merge, understanding what lives inside .git transforms you from a Git user into a Git operator.

The .git directory is not a black box — it’s a carefully organized database of files, references, and metadata that together implement a distributed version control system. When things go wrong (and they will), knowing this structure is the difference between panicking and confidently recovering your work.

This deep dive walks through the main components of the .git directory, explaining what each file and folder does, how they interconnect, and why Git’s design choices matter for real-world development workflows.

When to Use / When Not to Use

When to understand .git internals:

  • Recovering from corrupted repositories or lost commits
  • Debugging mysterious Git behavior (detached HEAD, missing branches)
  • Writing custom Git hooks or tooling
  • Optimizing large repositories
  • Understanding how Git achieves its guarantees

When not to dig into .git:

  • Daily development workflows — use normal Git commands
  • When you’re unsure — modifying .git files directly can corrupt your repo
  • For simple tasks like committing or branching — the CLI is sufficient

Core Concepts

The .git directory is Git’s entire knowledge base. Everything outside of it is your working tree — the files you edit. Everything inside is Git’s internal representation of your project’s history, configuration, and state.


graph TD
    A[".git Directory"] --> B["HEAD"]
    A --> C["config"]
    A --> D["objects/"]
    A --> E["refs/"]
    A --> F["hooks/"]
    A --> G["index"]
    A --> H["logs/"]
    A --> I["description"]
    A --> J["packed-refs"]
    A --> K["COMMIT_EDITMSG"]

    D --> D1["info/"]
    D --> D2["pack/"]
    D --> D3["<first-2-hex-chars>/"]

    E --> E1["heads/"]
    E --> E2["tags/"]
    E --> E3["remotes/"]

Git’s on-disk format is built from ordinary files and directories, so you can inspect, back up, and even manually repair a repository using standard file operations. The layout has stayed largely stable across Git versions, which makes it a reliable foundation for tooling, although recent releases also offer an alternative reftable format for storing refs.

Architecture or Flow Diagram


flowchart LR
    WT["Working Tree\n(your files)"] -->|git add| IDX["index\n(staging area)"]
    IDX -->|git commit| OBJ["objects/\n(blobs, trees, commits)"]
    OBJ -->|referenced by| REF["refs/\n(branches, tags)"]
    REF -->|pointed to by| HD["HEAD\n(current ref)"]
    HD -->|determines| WT

    CFG["config"] -.->|settings| IDX
    CFG -.->|settings| OBJ
    HKS["hooks/"] -.->|triggers| IDX
    HKS -.->|triggers| OBJ

The flow shows how data moves from your working tree through the staging area into the object database, with references and HEAD tracking the current state. Configuration and hooks influence every transition.

Step-by-Step Guide / Deep Dive

The HEAD File

HEAD is a single file containing a reference to the current branch or commit SHA:


$ cat .git/HEAD
ref: refs/heads/main

When you’re in detached HEAD state, it contains a raw SHA-1 hash instead of a symbolic reference. This file is Git’s answer to “where am I right now?”
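
You can tell the two forms apart by reading the file directly. The Bash sketch below is illustrative only. `head_state` is a hypothetical helper name, and the sketch assumes the classic file-based ref storage. In scripts, `git symbolic-ref -q HEAD` and `git rev-parse HEAD` are the supported ways to ask the same questions.

```bash
# head_state (hypothetical helper): report whether .git/HEAD is a
# symbolic ref or a detached commit hash, by reading the file directly.
# Assumes the classic "files" ref backend; hashes are not validated.
head_state() {
  local gitdir=${1:-.git} content
  # Command substitution drops the trailing newline.
  content=$(cat "$gitdir/HEAD") || return 1
  case $content in
    "ref: "*)
      # Symbolic form, e.g. "ref: refs/heads/main"
      printf 'symbolic %s\n' "${content#ref: }"
      ;;
    *)
      # Detached form: the file holds a commit hash directly
      printf 'detached %s\n' "$content"
      ;;
  esac
}
```

In a fresh clone, `head_state .git` would print something like `symbolic refs/heads/main`. After `git checkout <sha>` it reports `detached <sha>` instead.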

The config File

Your repository’s local configuration, overriding global (~/.gitconfig) and system-level settings:


[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/user/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main

The objects/ Directory

Git’s content-addressable storage. Every blob, tree, commit, and tag lives here, named by their SHA-1 (or SHA-256) hash:


$ ls .git/objects/
02/  0a/  1b/  2c/  3d/  ...  pack/  info/

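The first two hex characters of the hash form a subdirectory; the remaining 38 characters (for SHA-1) form the filename. Objects created locally are stored as zlib-compressed loose files until git gc or git repack packs them; objects received through clone or fetch usually arrive already packed.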
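
You can reproduce the naming scheme with standard tools. A blob's ID is the SHA-1 of a header (`blob <size>` plus a NUL byte) followed by the raw content. The sketch below computes that ID and the loose-object path it would occupy. The helper names are hypothetical, and it assumes GNU coreutils `sha1sum` (on macOS, `shasum` is the usual substitute). It covers SHA-1 repositories only.

```bash
# blob_sha1 (hypothetical): compute the object ID that git hash-object
# would assign to a file's bytes in a SHA-1 repository.
blob_sha1() {
  local file=$1 size
  # $(( )) normalizes wc output, which some platforms pad with spaces.
  size=$(( $(wc -c < "$file") ))
  # Header "blob <size>\0" followed by the raw bytes, hashed together.
  { printf 'blob %d\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# loose_object_path (hypothetical): split an ID into the 2-character
# directory and 38-character filename used under .git/objects/.
loose_object_path() {
  local id=$1
  printf '.git/objects/%s/%s\n' "${id:0:2}" "${id:2}"
}
```

For files that Git does not transform on the way in (no line-ending conversion or clean filters), `blob_sha1 path` should match `git hash-object path`. The file actually stored on disk is the zlib-compressed form of the same header-plus-content bytes.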

The refs/ Directory

References are named pointers to commits:


refs/
├── heads/        # Local branches
│   ├── main
│   └── feature/auth
├── tags/         # Tags
│   └── v1.0.0
└── remotes/      # Remote-tracking branches
    └── origin/
        ├── main
        └── develop

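Each loose ref file contains a 40-character SHA-1 hash (64 characters in a SHA-256 repository) followed by a newline. Reading refs/heads/main usually tells you exactly which commit main points to. If the file is missing, the ref may have been moved into .git/packed-refs.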
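
A ref can live either in its own file or in .git/packed-refs, so a lookup has to check both places. This sketch uses a hypothetical `resolve_ref` helper and assumes the classic files backend. It checks the loose file first, because a loose ref takes precedence over a packed one. It does not follow symbolic refs such as refs/remotes/origin/HEAD; `git rev-parse` handles all of these cases properly.

```bash
# resolve_ref (hypothetical): print the hash a full ref name points to,
# checking the loose ref file first and then .git/packed-refs.
# Comment lines ("# ...") and peeled-tag lines ("^...") are skipped;
# symbolic refs ("ref: ...") are not followed.
resolve_ref() {
  local gitdir=$1 ref=$2 hash name
  if [ -f "$gitdir/$ref" ]; then
    cat "$gitdir/$ref"   # a loose ref wins over a packed one
    return 0
  fi
  if [ -f "$gitdir/packed-refs" ]; then
    while read -r hash name; do
      case $hash in '#'*|'^'*) continue ;; esac
      if [ "$name" = "$ref" ]; then
        printf '%s\n' "$hash"
        return 0
      fi
    done < "$gitdir/packed-refs"
  fi
  return 1               # ref not found in either place
}
```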

The index File

The index (or staging area) is a binary file that tracks which files are staged for the next commit. It isn't human-readable, but Git can list its contents (mode, blob hash, stage number, path):


$ git ls-files --stage
100644 abc123... 0 src/main.py
100644 def456... 0 src/utils.py
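
The index is binary, but its 12-byte header is simple. It holds the 4-byte signature DIRC, a 4-byte version number (2, 3, or 4), and a 4-byte entry count, all big-endian. The sketch below reads only that header, which is enough to spot a truncated or foreign file. The helper names are hypothetical, and it assumes POSIX `od` and `awk`. It does not parse entries or extensions.

```bash
# be32 (hypothetical): read a 4-byte big-endian unsigned integer
# from file $1 at byte offset $2.
be32() {
  od -An -tu1 -j "$2" -N 4 "$1" |
    awk '{ print $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 }'
}

# index_header (hypothetical): print the version and entry count from
# the index header; fail if the "DIRC" signature is missing.
index_header() {
  local index=${1:-.git/index}
  [ "$(head -c 4 "$index")" = "DIRC" ] || {
    echo "not a Git index (or truncated): $index" >&2
    return 1
  }
  printf 'version=%s entries=%s\n' "$(be32 "$index" 4)" "$(be32 "$index" 8)"
}
```

In an ordinary checkout, the entry count reported by `index_header .git/index` should match the number of lines printed by `git ls-files --stage`.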

The hooks/ Directory

Executable scripts that run at specific points in Git’s workflow:


hooks/
├── pre-commit.sample
├── pre-push.sample
├── commit-msg.sample
└── ...

To activate a hook, drop the .sample suffix (or create a file with the hook’s exact name) and make it executable. The hooks/ directory is not versioned, so each developer manages their own hooks.
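
Besides pre-commit, commit-msg is a common hook to activate. Git passes it the path of a file holding the proposed message, and a non-zero exit aborts the commit. The rule enforced below is an assumed team convention, not something Git requires: the subject must be non-empty and at most 72 characters. The helper name `check_subject` is hypothetical.

```bash
#!/usr/bin/env bash
# Example .git/hooks/commit-msg: reject an empty or overlong subject line.
# The 72-character limit is an assumed team policy, not a Git rule.
max_subject=72

check_subject() {
  local msg_file=$1 subject
  # First line that is neither a '#' comment nor blank. Comment lines may
  # still be present when the hook runs (default comment char assumed).
  subject=$(grep -v -e '^#' -e '^[[:space:]]*$' "$msg_file" | head -n 1)
  if [ -z "$subject" ]; then
    echo "commit-msg: subject line is empty" >&2
    return 1
  fi
  if [ "${#subject}" -gt "$max_subject" ]; then
    echo "commit-msg: subject is ${#subject} chars (max $max_subject)" >&2
    return 1
  fi
}

# Git calls the hook with one argument: the path to the message file.
# A non-zero exit status aborts the commit.
if [ "$#" -gt 0 ]; then
  check_subject "$1"
  exit $?
fi
```

Save the script as .git/hooks/commit-msg and run `chmod +x` on it. Git will then run it on every `git commit` that does not pass `--no-verify`.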

The logs/ Directory

Reflogs track reference movements, enabling recovery of “lost” commits:


$ cat .git/logs/HEAD
0000000... abc123... User <user@email.com> 1711900000 +0000 commit (initial): Initial commit
abc123... def456... User <user@email.com> 1711900100 +0000 commit: Add feature
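
Each reflog line records four things. These are the old hash, the new hash, and the identity and Unix timestamp of whoever moved the ref, followed by a tab and a short description of the operation. The hypothetical helper below pulls those pieces apart for a single line. `git reflog` and `git log -g` remain the supported ways to read reflogs.

```bash
# parse_reflog_line (hypothetical): summarize one line of .git/logs/HEAD.
# Layout: <old> <new> <name> <<email>> <epoch> <tz><TAB><message>
parse_reflog_line() {
  local line=$1 meta msg new epoch
  meta=${line%%$'\t'*}   # hashes, identity and timestamp, before the tab
  msg=${line#*$'\t'}     # free-form message, after the tab
  read -r _ new _ <<< "$meta"
  # The epoch is the second-to-last field; names may contain spaces.
  epoch=$(awk '{ print $(NF-1) }' <<< "$meta")
  printf 'new=%s time=%s msg=%s\n' "$new" "$epoch" "$msg"
}
```

For example, `parse_reflog_line "$(tail -n 1 .git/logs/HEAD)"` summarizes the most recent HEAD movement.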

Other Files

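  • description: Repository description (used by GitWeb)
  • packed-refs: Many refs stored in a single file for performance, written by git pack-refs and git gc (see Git Object Database and Pack Files)
  • COMMIT_EDITMSG: The file your editor opens when you write a commit message; it keeps the most recent message until the next commit
  • MERGE_HEAD, REBASE_HEAD, CHERRY_PICK_HEAD: Temporary files written during merge, rebase, and cherry-pick operations; ORIG_HEAD records where HEAD was before operations such as reset, merge, and rebase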

Production Failure Scenarios + Mitigations

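| Scenario | Symptoms | Mitigation |
|---|---|---|
| Corrupted HEAD | "fatal: bad HEAD", or Git no longer recognizes the directory as a repository | Run git symbolic-ref HEAD refs/heads/main; if Git refuses to run, rewrite the file with echo 'ref: refs/heads/main' > .git/HEAD |
| Missing or corrupt objects | "fatal: loose object ... is corrupt" | Run git fsck --full, fetch from the remote, or restore from backup |
| Broken refs | Branch points to a non-existent commit | Check the reflog and reset the branch to a valid commit |
| Stuck lock file | "fatal: Unable to create '.git/index.lock': File exists." | Remove the stale .git/index.lock after verifying that no other Git process is running |
| Hook failure | Commit or push aborts, or a non-executable hook is skipped | Check the hook's exit code and executable bit; run it directly (e.g. .git/hooks/pre-commit) to see its output |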
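
For the lock-file row, a cautious first step is to list lock files that have not been touched recently rather than deleting them blindly. The sketch below uses a hypothetical helper name and assumes a `find` that supports `-mmin` (GNU and BSD versions do). The 10-minute default is an arbitrary threshold chosen for illustration.

```bash
# stale_locks (hypothetical): list *.lock files under a .git directory
# whose modification time is older than N minutes (default 10).
# It only reports; deleting is left to a human.
stale_locks() {
  local gitdir=${1:-.git} minutes=${2:-10}
  find "$gitdir" -type f -name '*.lock' -mmin +"$minutes" -print
}
```

A listed file is only a candidate for removal. Before deleting it, confirm with `ps` (or your platform's process viewer) that no Git command is still running.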

Trade-offs

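| Aspect | Advantage | Disadvantage |
|---|---|---|
| File-based storage | Human-inspectable, easy to back up | Not optimized for large repos without packing |
| SHA-1 hashing | Fast content addressing | Practical SHA-1 collisions exist; Git mitigates them with collision-detecting SHA-1 and supports SHA-256 repositories |
| Local hooks | Per-developer customization | Not shared across the team, easy to forget |
| Reflog | Recovery safety net | Keeps otherwise-unreachable objects on disk until entries expire |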

Implementation Snippets


# Initialize a new repository
git init
ls -la .git/

# Inspect HEAD
cat .git/HEAD

# Count loose and packed objects and their disk usage
git count-objects -vH

# View packed refs (the file may not exist in a freshly initialized repo)
cat .git/packed-refs

# Inspect the index
git ls-files --stage

# List hooks
ls -la .git/hooks/

# View reflog
git reflog
cat .git/logs/HEAD

# Create a custom pre-commit hook
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
echo "Running pre-commit checks..."
npm run lint
EOF
chmod +x .git/hooks/pre-commit

Observability Checklist

  • Monitor: Repository size growth with git count-objects -vH
  • Log: Hook execution results (hooks should log to stderr)
  • Alert: Reflog size exceeding thresholds (prune with git reflog expire)
  • Verify: Run git fsck --full periodically on critical repositories
  • Track: Number of loose objects vs packed objects
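
The loose-vs-packed check can be scripted on top of `git count-objects -v`, whose output includes `count:` (loose objects) and `in-pack:` lines. Below is a minimal sketch with a hypothetical helper name. The default threshold of 6700 is borrowed from the default of Git's `gc.auto` setting, above which `git gc --auto` repacks.

```bash
# loose_object_check (hypothetical): read `git count-objects -v` output
# on stdin, print loose and packed counts, and exit non-zero when loose
# objects exceed the limit (default 6700, the gc.auto default).
loose_object_check() {
  local limit=${1:-6700}
  awk -v limit="$limit" '
    $1 == "count:"   { loose = $2 }
    $1 == "in-pack:" { packed = $2 }
    END {
      printf "loose=%d packed=%d\n", loose, packed
      if (loose > limit) { print "WARN: consider running git gc"; exit 1 }
    }'
}
```

Running `git count-objects -v | loose_object_check` prints both counts and fails when the threshold is exceeded, which makes it easy to wire into a cron job or CI step.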

Security/Compliance Notes

  • The .git directory contains all history — never expose it on a web server
  • Hooks run with the same permissions as the user — validate hook sources
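  • Remote URLs in .git/config can embed credentials (e.g. https://user:token@host/repo.git); use a credential helper instead, and never copy .git/config into shared or public locations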
  • Consider using git config core.hooksPath for shared, versioned hooks

Common Pitfalls / Anti-Patterns

  • Editing .git files directly — always use Git commands unless you know exactly what you’re doing
  • Deleting .git to “reset” — you lose all history; use git reset --hard instead
  • Ignoring hook exit codes — a failing hook should abort the operation
  • Not backing up .git: it IS your repository; everything except uncommitted changes can be recreated from it
  • Sharing hooks via .git/hooks/ — hooks aren’t versioned; use core.hooksPath or a tool like Husky

Quick Recap Checklist

  • .git contains everything Git needs; the working tree can be rebuilt from it, apart from uncommitted changes
  • HEAD points to current branch or commit
  • objects/ stores all content as SHA-addressed blobs, trees, commits, tags
  • refs/ contains branch and tag pointers
  • index is the staging area (binary format)
  • hooks/ runs scripts at key Git lifecycle events
  • logs/ enables recovery through reflogs
  • Never expose .git on production web servers

Interview Q&A

What is the difference between HEAD and a branch reference in Git?

HEAD is a symbolic reference that points to the current branch (e.g., ref: refs/heads/main). A branch reference is a file in .git/refs/heads/ that points to a specific commit SHA. HEAD moves when you check out different branches; branch refs move when you make new commits.

Why does Git use a two-level directory structure for objects (e.g., .git/objects/ab/cdef...)?

Filesystem performance degrades when a single directory holds too many files. Using the first two hex characters of the hash as a subdirectory name spreads loose objects across at most 256 subdirectories (16²), so each holds roughly 1/256 of them. Git also packs loose objects once they accumulate, so every directory stays small enough for fast lookups.

How can you recover a commit that was lost after a hard reset?

Use git reflog to find the SHA of the lost commit. The reflog in .git/logs/HEAD records every HEAD movement, including resets. Once you have the SHA, run git branch recovery <sha> to give it a name again, or git checkout <sha> to inspect it in detached HEAD state. This works as long as the reflog entry hasn't expired (entries for unreachable commits expire after 30 days by default) and git gc hasn't pruned the objects.

What happens when you run `git add` in terms of the .git directory?

Git creates blob objects in .git/objects/ for each file's content (if not already present), then updates the .git/index binary file to record the blob SHA, file path, and metadata. The working tree files are hashed and compared to existing objects — unchanged files reuse existing blobs, enabling deduplication.

.git Directory Tree (Detailed View)


graph TD
    ROOT[".git/"] --> HEAD["HEAD"]
    ROOT --> CONFIG["config"]
    ROOT --> OBJECTS["objects/"]
    ROOT --> REFS["refs/"]
    ROOT --> HOOKS["hooks/"]
    ROOT --> INDEX["index"]
    ROOT --> LOGS["logs/"]
    ROOT --> PACKED["packed-refs"]
    ROOT --> DESC["description"]
    ROOT --> EDITMSG["COMMIT_EDITMSG"]

    OBJECTS --> OBJ_INFO["info/"]
    OBJECTS --> OBJ_PACK["pack/"]
    OBJECTS --> OBJ_LOOSE["<first-2-chars>/<remaining-38-chars> (loose objects)"]

    REFS --> REFS_HEADS["heads/ (branches)"]
    REFS --> REFS_TAGS["tags/"]
    REFS --> REFS_REMOTES["remotes/ (tracking)"]

    REFS_HEADS --> RH_MAIN["main"]
    REFS_HEADS --> RH_FEAT["feature/auth"]

    REFS_TAGS --> RT_V1["v1.0.0"]

    REFS_REMOTES --> RR_ORIGIN["origin/"]
    RR_ORIGIN --> RR_MAIN["main"]
    RR_ORIGIN --> RR_DEV["develop"]

Production Failure: Corrupted Repository Recovery

Scenario: Corrupted index file preventing all operations


# Symptoms
$ git status
fatal: .git/index: index file smaller than expected

$ git add .
fatal: Unable to create '/path/to/repo/.git/index.lock': File exists.

# Root cause: Binary index file was truncated or corrupted
# (disk failure, interrupted operation, filesystem corruption)

# Recovery steps:

# 1. Remove stale lock file (verify no git process is running first)
rm -f .git/index.lock

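# 2. Move the corrupt index aside, then rebuild it from HEAD
#    (a mixed reset rewrites the index but leaves working-tree files alone)
mv .git/index .git/index.corrupt
git reset

# 3. Plumbing alternative to step 2: load HEAD's tree into the index
git read-tree HEAD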

# 4. Verify working tree matches
git status

# If objects are missing:
$ git cat-file -t abc123
fatal: git cat-file: could not get object info

# Recovery:
git fsck --full
# Fetching may restore missing objects if the remote still has them;
# otherwise copy them from a fresh clone
git fetch origin
# Or restore .git/objects/ from backup
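
The recovery steps above modify files inside .git, so it is worth taking a snapshot before starting. This is a minimal sketch. The helper name and the archive location are assumptions, and for a large repository a plain `cp -a .git` to another disk works just as well.

```bash
# snapshot_gitdir (hypothetical): archive a repository's .git directory
# so a failed manual repair can be rolled back.
# $1 = repository root, $2 = path of the .tar.gz to create.
snapshot_gitdir() {
  local repo=$1 out=$2
  tar -czf "$out" -C "$repo" .git && printf '%s\n' "$out"
}
```

To roll back, delete the damaged .git and extract the archive into the repository root with `tar -xzf`.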

Manual Object Inspection with Plumbing Commands


# Create a blob and capture its ID (-w writes it to .git/objects/)
sha=$(echo "Hello, Git internals!" | git hash-object -w --stdin)
echo "$sha"
# Output: a 40-character SHA-1 in a default repository; it depends only on the content

# Verify the object exists
git cat-file -t "$sha"
# Output: blob

# Read the blob content
git cat-file -p "$sha"
# Output: Hello, Git internals!

# Check object size (21 characters plus the newline added by echo)
git cat-file -s "$sha"
# Output: 22

# Hash a file without storing it
git hash-object src/main.py
# Prints the ID; without -w, nothing is written to .git/objects/

# Inspect a tree object
git cat-file -p HEAD^{tree}
# Output:
# 100644 blob abc123...    README.md
# 100644 blob 789abc...    package.json
# 040000 tree def456...    src

# Inspect a commit object
git cat-file -p HEAD
# Output:
# tree 4b825d...
# parent a1b2c3...
# author Name <email> timestamp timezone
# committer Name <email> timestamp timezone
#
# Commit message here

Security: .git Directory Exposure on Web Servers

Exposing .git/ on a production web server is one of the most common and dangerous misconfigurations. Attackers can download your entire source code history, including:

  • All source code — every version of every file ever committed
  • Committed secrets — API keys, passwords, tokens in old commits
  • Developer information — names, emails from commit metadata
  • Infrastructure details — deployment configs, internal URLs

# Check if your site exposes .git
curl -I https://yoursite.com/.git/HEAD
# If you get 200 OK instead of 403/404, you are vulnerable

# Common attack tools that exploit this:
# - git-dumper (downloads entire repo)
# - diggit.py (reconstructs repo from exposed objects)

Mitigation:

  • Nginx: location ~ /\.git { deny all; return 404; }
  • Apache: RedirectMatch 404 /\.git
  • Never deploy .git/ — use build artifacts, not raw repos
  • Scan with: gitleaks detect --log-opts="--all" before deployment
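
The "never deploy .git/" rule is easy to enforce mechanically. Below is a sketch of a CI guard with a hypothetical helper name. It assumes the deployable files are collected in one build directory, whose path you supply. It also catches .git *files*, which appear in worktrees and submodules.

```bash
# assert_no_gitdir (hypothetical): fail if anything named .git exists
# under the given build output directory.
assert_no_gitdir() {
  local dir=$1 found
  found=$(find "$dir" -name .git -print)
  if [ -n "$found" ]; then
    printf 'refusing to deploy, found:\n%s\n' "$found" >&2
    return 1
  fi
}
```

Calling `assert_no_gitdir ./dist` as a step just before the upload makes the pipeline stop whenever repository metadata slips into the artifact.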
