Git Objects: Blobs, Trees, Commits, Tags
Understanding Git's four object types — blobs, trees, commits, and annotated tags — how they relate through content-addressable storage, and how to inspect them with plumbing commands.
Introduction
Git is fundamentally a content-addressable filesystem with a VCS user interface. At its core lies a simple but powerful abstraction: four object types that together represent every snapshot of your project’s history. Understanding these objects — blobs, trees, commits, and tags — is the key to demystifying Git’s internals.
Unlike traditional version control systems that store deltas (differences between versions), Git stores complete snapshots. Each snapshot is decomposed into these four object types, linked together by SHA-1 hashes into a directed acyclic graph (DAG). This design gives Git its speed, integrity guarantees, and distributed nature.
This article examines each object type in depth, shows how they interconnect, and teaches you to inspect them using Git’s plumbing commands. By the end, you’ll be able to manually reconstruct a Git repository from raw objects if needed.
When to Use / When Not to Use
When to understand Git objects:
- Debugging corruption or missing data in repositories
- Building Git tooling or integrations
- Understanding how Git achieves integrity verification
- Optimizing repository size and performance
- Recovering lost data from the object database
When not to manipulate objects directly:
- Daily development: use porcelain commands (git add, git commit)
- When unsure: direct object manipulation can corrupt repositories
- For simple history inspection: git log and git show are sufficient
Core Concepts
Git stores everything as objects in .git/objects/, each identified by the SHA-1 hash of its type, size, and content. There are exactly four types:
graph TD
TAG["Annotated Tag\n(type: tag)"] -->|points to| COMMIT["Commit\n(type: commit)"]
COMMIT -->|tree: points to| TREE["Tree\n(type: tree)"]
COMMIT -->|parent: points to| COMMIT2["Parent Commit"]
TREE -->|contains| TREE2["Subdirectory Tree"]
TREE -->|contains| BLOB1["Blob\n(file content)"]
TREE -->|contains| BLOB2["Blob\n(file content)"]
TREE2 -->|contains| BLOB3["Blob\n(file content)"]
The relationship is hierarchical: tags point to commits, commits point to trees (and parent commits), trees point to other trees and blobs. Blobs are the leaves — they contain actual file content.
Each object is stored as:
<object type> <content length>\0<content>
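The header and content are zlib-compressed together and stored in .git/objects/ under a path derived from the SHA-1 hash: the first two hex characters name the directory and the remaining 38 name the file.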
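As a rough illustration of that format, the sketch below rebuilds a blob ID by hashing the header and content with sha1sum, then compares it with what git hash-object reports. It assumes the default SHA-1 object format, a POSIX shell, and sha1sum from coreutils; the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: reproduce a blob ID by hand (assumes SHA-1 object format and coreutils sha1sum).
set -eu

content='Hello, World!'

# Size in bytes of the content as it will be hashed, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "<type> <size>\0<content>", the same byte layout Git hashes for a blob.
manual_id=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

# Ask Git for the ID of the same bytes (no -w, so nothing is stored).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual_id"
echo "git:    $git_id"
```

If the two lines match, the ID really is nothing more than a hash over the header and the bytes.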
Architecture or Flow Diagram
flowchart LR
FILE["File Content"] -->|git hash-object| BLOB["Blob Object\nSHA: abc123..."]
BLOB -->|referenced by| TREE["Tree Object\nSHA: def456..."]
TREE -->|referenced by| COMMIT["Commit Object\nSHA: 789ghi..."]
COMMIT -->|referenced by| TAG["Tag Object\nSHA: jkl012..."]
META["Author, Date, Message"] --> COMMIT
PARENT["Parent SHA"] --> COMMIT
The flow shows how file content becomes a blob, which is referenced by a tree, which is referenced by a commit, which may be referenced by a tag. Metadata flows into commits separately from content.
Step-by-Step Guide / Deep Dive
Blob Objects
Blobs store file content. They do not store filenames, permissions, or directory structure — just raw bytes.
# Create a blob from a file
echo "Hello, World!" | git hash-object -w --stdin
# Output: 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# The object is now in .git/objects/8a/
ls .git/objects/8a/
# Inspect the blob
git cat-file -p 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# Output: Hello, World!
# Check the type
git cat-file -t 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# Output: blob
Key properties:
- Content-addressable: identical files produce identical blobs (deduplication)
- Immutable: once created, a blob never changes
- No metadata: filename and permissions are stored in the tree, not the blob
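The deduplication property can be checked in a throwaway repository. This sketch writes the same bytes to two hypothetical paths (a.txt and docs/b.txt) and counts how many loose objects Git actually stores; it assumes mktemp and a reasonably recent Git.

```shell
#!/bin/sh
# Sketch: identical content under two paths yields a single blob (paths are hypothetical).
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir docs
printf 'same bytes\n' > a.txt
printf 'same bytes\n' > docs/b.txt

# -w stores the blob; the file name plays no part in the ID.
id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w docs/b.txt)

# git count-objects prints "<n> objects, <k> kilobytes" for loose objects.
loose=$(git count-objects | cut -d' ' -f1)

echo "a.txt      -> $id_a"
echo "docs/b.txt -> $id_b"
echo "loose objects stored: $loose"
```

Both paths should report the same ID, and the second write should not add another object.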
Tree Objects
Trees represent directories. They map filenames to blob SHAs (or other tree SHAs for subdirectories).
# Create a tree from the current index
TREE_SHA=$(git write-tree)
echo "$TREE_SHA"
# With an empty index this prints the well-known empty-tree hash:
# 4b825dc642cb6eb9a060e54bf899d69f824970a0
# Inspect a tree
git cat-file -p "$TREE_SHA"
# Output format for a non-empty tree (hashes abbreviated for illustration):
# 100644 blob abc123...	file1.txt
# 100755 blob 789abc...	script.sh
# 040000 tree def456...	subdir
Each tree entry contains:
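- Mode: entry kind and permission bits (100644 for regular files, 100755 for executables, 120000 for symbolic links, 040000 for directories, 160000 for submodule commits)
- Type: blob or tree (commit for submodule entries)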
- SHA-1: hash of the referenced object
- Filename: the name within this directory
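To see these four fields on a real tree, the following sketch stages one blob under two hypothetical paths with different modes, writes the tree, and prints both the root tree and the subdirectory tree. It uses git update-index --cacheinfo so no working-tree files are needed; the repository is a temporary one.

```shell
#!/bin/sh
# Sketch: build a nested tree from the index and inspect its entries (paths are hypothetical).
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

blob=$(printf 'echo hi\n' | git hash-object -w --stdin)

# Stage the same blob under two paths and modes without touching the working tree.
git update-index --add --cacheinfo 100644,"$blob",README.txt
git update-index --add --cacheinfo 100755,"$blob",bin/run.sh

root=$(git write-tree)
root_listing=$(git cat-file -p "$root")
sub_listing=$(git cat-file -p "$root:bin")

echo "root tree $root:"
printf '%s\n' "$root_listing"
echo "bin subtree:"
printf '%s\n' "$sub_listing"
```

The root tree lists README.txt as a blob and bin as a tree; the executable bit for run.sh appears only in the subtree entry, never in the blob itself.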
Commit Objects
Commits are the backbone of Git history. Each commit records a snapshot (via a tree), authorship, and lineage.
# Create a commit object manually
export GIT_AUTHOR_NAME="Test User"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User"
export GIT_COMMITTER_EMAIL="test@example.com"
export GIT_AUTHOR_DATE="2026-03-31T12:00:00+00:00"
export GIT_COMMITTER_DATE="2026-03-31T12:00:00+00:00"
TREE_SHA=$(git write-tree)
COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)
echo $COMMIT_SHA
# Output: a1b2c3d4e5f6...
# Inspect the commit
git cat-file -p $COMMIT_SHA
# Output:
# tree 4b825dc642cb6eb9a060e54bf899d69f824970a0
# author Test User <test@example.com> 1774958400 +0000
# committer Test User <test@example.com> 1774958400 +0000
#
# Initial commit
A commit object contains:
- tree: SHA-1 of the root tree (the snapshot)
- parent: SHA-1 of the parent commit(s) — zero for the initial commit, one for normal commits, two+ for merges
- author: who wrote the code (name, email, timestamp, timezone)
- committer: who committed the code (can differ from author, e.g., after rebase)
- message: the commit message
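Because a commit's ID is a hash over exactly these fields, fixing the identity and dates makes it reproducible. The sketch below, run in a temporary repository with an empty index, creates the same root commit twice and then a child that names it as parent. The identity values are the same placeholders used above.

```shell
#!/bin/sh
# Sketch: commit IDs are determined by tree, parents, identities, dates, and message.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Fixed identity and dates make the commit IDs reproducible.
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"
export GIT_AUTHOR_DATE="2026-03-31T12:00:00+00:00"
export GIT_COMMITTER_DATE="2026-03-31T12:00:00+00:00"

tree=$(git write-tree)   # empty index, so this is the empty tree

root=$(echo "Initial commit" | git commit-tree "$tree")
again=$(echo "Initial commit" | git commit-tree "$tree")
child=$(echo "Second commit" | git commit-tree "$tree" -p "$root")

echo "root:  $root"
echo "again: $again"
git cat-file -p "$child"
```

The two root IDs should be identical, and only the child carries a parent line. Note that git commit-tree writes objects without updating any branch.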
Tag Objects
There are two types of tags in Git:
Lightweight tags are simply refs pointing to a commit — no tag object is created:
git tag v1.0-lightweight
# Creates: .git/refs/tags/v1.0-lightweight → commit SHA
Annotated tags are full objects with metadata:
git tag -a v1.0 -m "Release version 1.0"
# Creates a tag object
# Inspect the tag object
git cat-file -p $(git rev-parse v1.0)
# Output:
# object a1b2c3d4e5f6...
# type commit
# tag v1.0
# tagger Test User <test@example.com> 1774958400 +0000
#
# Release version 1.0
Annotated tags contain:
- object: the SHA-1 of what the tag points to (usually a commit)
- type: the type of the object (commit, tree, blob, or tag)
- tag: the tag name
- tagger: who created the tag
- message: the tag message
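One quick way to tell the two kinds of tag apart is to ask what object type the tag name resolves to. This sketch creates one of each on the same commit in a temporary repository; the tag name bookmark is a hypothetical example.

```shell
#!/bin/sh
# Sketch: a lightweight tag resolves to a commit, an annotated tag to a tag object.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

commit=$(echo "Initial commit" | git commit-tree "$(git write-tree)")

git tag bookmark "$commit"                          # lightweight: just a ref
git tag -a v1.0 -m "Release version 1.0" "$commit"  # annotated: creates a tag object

light_type=$(git cat-file -t bookmark)
annot_type=$(git cat-file -t v1.0)

# ^{commit} peels through the tag object to the commit it points at.
peeled=$(git rev-parse 'v1.0^{commit}')

echo "bookmark -> $light_type"
echo "v1.0     -> $annot_type (peels to $peeled)"
```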
Production Failure Scenarios + Mitigations
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Missing blob | Checkout fails; git fsck reports "missing blob <sha>" | Fetch from remote, or restore from backup; blobs are immutable so any copy works |
| Corrupted tree | Directory listing fails; "fatal: unable to read tree <sha>" | Rebuild the tree from an intact working tree with git add -A and git write-tree; verify with git fsck |
| Broken commit chain | git log stops abruptly | Use git replace to graft history, or rebase onto valid ancestor |
| Tag pointing to wrong type | Unexpected behavior on tag checkout | Verify with git cat-file -t; recreate annotated tag if needed |
| Object store corruption | Multiple “bad object” errors | Run git fsck --full; clone fresh from remote; restore from backup |
Trade-offs
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Content-addressable storage | Automatic deduplication, integrity verification | SHA-1 collision risk (being mitigated with SHA-256) |
| Snapshot-based (not delta) | Fast checkouts, simple model | Higher storage for text files (mitigated by pack files) |
| Immutable objects | Safe concurrent access, easy replication | No in-place updates; every change creates new objects |
| No filenames in blobs | Blobs can be shared across trees | Must traverse tree to find which file a blob belongs to |
Implementation Snippets
# Create a blob and store it
echo "content" | git hash-object -w --stdin
# Read a blob's content
git cat-file -p <sha>
# Get object type
git cat-file -t <sha>
# Get object size
git cat-file -s <sha>
# Create a tree from index
git write-tree
# Read a tree into index
git read-tree <tree-sha>
# Create a commit object
git commit-tree <tree-sha> -p <parent-sha> -m "message"
# Create an annotated tag object (writes the object only; no ref is created)
git mktag << EOF
object <commit-sha>
type commit
tag v1.0
tagger Name <email> <unix-timestamp> <+hhmm>

Release version 1.0
EOF
# List all objects reachable from any ref
git rev-list --objects --all
# Find all unreachable objects
git fsck --unreachable
Observability Checklist
- Monitor: Object count growth with git count-objects -v
- Verify: Run git fsck periodically to detect corruption
- Track: Ratio of loose to packed objects (should favor packed)
- Alert: Unexpected object count spikes (may indicate accidental large file commits)
- Audit: Tag signatures for release integrity verification
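The loose-to-packed ratio can be read from the count and in-pack fields of git count-objects -v. The sketch below is one possible way to extract them, shown before and after git gc in a temporary repository; the report helper is a hypothetical name, not a Git command.

```shell
#!/bin/sh
# Sketch: track loose vs packed object counts (report is a hypothetical helper).
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

git commit -q --allow-empty -m "Initial commit"
git commit -q --allow-empty -m "Second commit"

report() {
  stats=$(git count-objects -v)
  loose=$(printf '%s\n' "$stats" | awk '/^count:/ {print $2}')
  packed=$(printf '%s\n' "$stats" | awk '/^in-pack:/ {print $2}')
  echo "$1: loose=$loose packed=$packed"
}

report "before gc"
git gc -q
report "after gc"
```

In a monitoring job, the same two numbers could be exported as metrics instead of echoed.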
Security/Compliance Notes
- Object hashes provide integrity verification — tampering changes the hash
- SHA-1 is vulnerable to collision attacks; Git supports SHA-256 as an alternative object format (git init --object-format=sha256)
- Signed tags (GPG/SSH) provide non-repudiation for releases
- Objects are not encrypted — sensitive data in blobs is readable by anyone with repo access
- See Git Secrets Management for preventing secret commits
Common Pitfalls / Anti-Patterns
- Assuming on-disk object size equals file size: loose objects carry a header and are zlib-compressed; use git cat-file -s for the uncompressed content size
- Confusing lightweight and annotated tags: lightweight tags are just refs, not objects
- Modifying objects directly: objects are immutable; use Git commands to create new ones
- Ignoring unreachable objects: they consume space until git gc prunes them
- Storing large binary files as blobs: use Git LFS instead
Quick Recap Checklist
- Blobs store file content only — no filenames or metadata
- Trees map filenames to blob/tree SHAs — represent directories
- Commits point to a tree, parent(s), and record authorship
- Annotated tags are full objects; lightweight tags are just refs
- All objects are content-addressable by SHA-1 hash
- Objects are immutable and zlib-compressed
- The object graph forms a directed acyclic graph (DAG)
- Use git cat-file to inspect any object by type, size, or content
Interview Q&A
Why don't blobs store filenames?
Blobs store only file content, which enables deduplication. If two files in different directories have identical content, they share the same blob. Filenames are stored in tree objects, which map names to blob SHAs. This separation means renaming a file doesn't create a new blob, only a new tree.
How does Git verify object integrity?
Every object's SHA-1 hash is computed from its type, size, and content. Git can recompute the hash of an object it reads and compare it to the object's ID, and git fsck does this for every object. If they don't match, the object is corrupted. This is why Git is called a "content-addressable filesystem": the address is the content's fingerprint.
What does the -w flag of git hash-object do?
Without -w, git hash-object only computes and prints the SHA-1 hash without storing the object. With -w (write), it also stores the object in .git/objects/. Running it without -w is useful for checking whether content already exists before writing it.
Can a commit have no parents?
Yes. The initial commit of any repository has zero parents. Additionally, commits created with git commit-tree without the -p flag, and the first commit on a branch started with git checkout --orphan, have no parent. Each one creates a new root in the commit DAG.
What is the difference between an annotated tag and a lightweight tag?
An annotated tag creates a full tag object in .git/objects/ with metadata (tagger, date, message, and optionally a GPG signature). A lightweight tag is just a file in .git/refs/tags/ containing a commit SHA; no object is created. Annotated tags are preferred for releases because the tag object records who tagged what and when, and it can be signed and verified.
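The claim that a rename produces a new tree but reuses the blob can be checked with a short experiment. This sketch, run in a temporary repository with hypothetical file names, writes a tree before and after renaming a file and compares the blob IDs each tree records.

```shell
#!/bin/sh
# Sketch: renaming a file changes the tree ID but not the blob ID (file names are hypothetical).
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf 'data\n' > old.txt
git add old.txt
tree_before=$(git write-tree)

mv old.txt new.txt
git add -A
tree_after=$(git write-tree)

# <tree>:<path> resolves the object stored under that path in the tree.
blob_before=$(git rev-parse "$tree_before:old.txt")
blob_after=$(git rev-parse "$tree_after:new.txt")

echo "tree before: $tree_before"
echo "tree after:  $tree_after"
echo "blob before: $blob_before"
echo "blob after:  $blob_after"
```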
Object Relationship Diagram (Clean)
graph TD
REPO["Repository"] -->|contains| TAGS["Annotated Tags"]
TAGS -->|points to| COMMITS["Commits"]
COMMITS -->|tree ref| TREES["Trees"]
COMMITS -->|parent ref| PARENTS["Parent Commits"]
TREES -->|entries| SUBTREES["Subdirectory Trees"]
TREES -->|entries| BLOBS["Blobs (file content)"]
SUBTREES -->|entries| MORE_BLOBS["More Blobs"]
BLOBS -->|content only| FILES["Raw File Bytes"]
MORE_BLOBS -->|content only| FILES
Production Failure: Corrupted Object Database
Scenario: Missing blob causing checkout failure
# Symptoms
$ git checkout main
error: unable to read sha1 file of src/config.py (abc123def456...)
fatal: unable to checkout working tree
$ git fsck --full
missing blob abc123def456...
error: 789abc...: object corrupt or missing: .git/objects/78/9abc...
# Root cause: Disk corruption, interrupted gc, or filesystem error
# destroyed blob objects in .git/objects/
# Recovery steps:
# 1. Identify missing objects
git fsck --full 2>&1 | grep "missing"
# 2. Try to fetch missing objects from remote
git fetch origin --refetch # requires Git 2.36 or newer
# 3. If remote doesn't have them (local-only commits):
# Check reflog for last known good state
git reflog
git checkout HEAD@{1} # Try previous HEAD
# 4. As last resort, clone fresh and cherry-pick local-only commits
cd ..
git clone https://github.com/user/repo.git repo-clean
cd repo-clean
git fetch ../repo <branch> # copies intact objects; fails if the commits' own objects are missing
git cherry-pick <sha>
# 5. Prevent future corruption:
# - Use reliable storage (SSD > HDD for .git/)
# - Run git fsck periodically
# - Keep remote backups
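git push --mirror backup # Full backup to a dedicated remote (here named "backup"); --mirror force-updates and deletes refs there to match local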
Trade-offs: Annotated vs Lightweight Tags
| Aspect | Annotated Tags | Lightweight Tags |
|---|---|---|
| Object type | Full tag object in .git/objects/ | Simple ref file in .git/refs/tags/ |
| Metadata | Tagger name, email, date, message | None |
| GPG signing | Supported (git tag -s) | Not supported |
| Storage | ~200 bytes per tag | ~41 bytes (SHA only) |
| Immutability | Tag object is immutable; the ref can still be moved with git tag -f | Ref can be moved with git tag -f |
| Use case | Releases, public milestones | Private bookmarks, temporary markers |
| git describe | Considered by default | Ignored unless --tags is passed |
| Platform display | Shows message on GitHub/GitLab | Shows as simple pointer |
Recommendation: Use annotated tags for anything public or release-related. Lightweight tags are fine for personal, temporary markers.
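One practical consequence is how git describe treats the two kinds: by default it considers only annotated tags. The sketch below demonstrates this in a temporary repository; the tag name bookmark is a hypothetical example.

```shell
#!/bin/sh
# Sketch: git describe ignores lightweight tags unless --tags is given.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

git commit -q --allow-empty -m "Initial commit"
git tag bookmark   # lightweight tag on HEAD

# With only a lightweight tag present, plain git describe exits non-zero.
if default_out=$(git describe 2>/dev/null); then
  echo "git describe: $default_out"
else
  default_out=""
  echo "git describe: no annotated tag found"
fi

with_tags=$(git describe --tags)
echo "git describe --tags: $with_tags"

git tag -a v1.0 -m "Release 1.0"   # annotated tag on the same commit
annotated_out=$(git describe)
echo "git describe after annotated tag: $annotated_out"
```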
Implementation: Creating and Inspecting Each Object Type Manually
# === 1. BLOB ===
# Create blob from string
BLOB_SHA=$(echo "file content" | git hash-object -w --stdin)
echo "Blob SHA: $BLOB_SHA"
# Create blob from file
git hash-object -w myfile.txt
# Inspect
git cat-file -t $BLOB_SHA # blob
git cat-file -s $BLOB_SHA # size in bytes
git cat-file -p $BLOB_SHA # content
# === 2. TREE ===
# Build a tree directly from ls-tree formatted input (the name must follow a TAB)
TREE_SHA=$(printf '100644 blob %s\ttest.txt\n' "$BLOB_SHA" | git mktree)
echo "Tree SHA: $TREE_SHA"
# Or snapshot whatever is staged in the current index instead:
# TREE_SHA=$(git write-tree)
# Inspect
git cat-file -t $TREE_SHA # tree
git cat-file -p $TREE_SHA # entries (mode, type, sha, name)
# === 3. COMMIT ===
# Create commit (requires env vars for author)
export GIT_AUTHOR_NAME="Test"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test"
export GIT_COMMITTER_EMAIL="test@example.com"
COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)
# Inspect
git cat-file -t $COMMIT_SHA # commit
git cat-file -p $COMMIT_SHA # tree, author, committer, message
# === 4. ANNOTATED TAG ===
# Create tag object
git tag -a v1.0 -m "Release 1.0" $COMMIT_SHA
# Get tag object SHA (not the commit it points to)
TAG_SHA=$(git rev-parse v1.0^{tag})
# Inspect
git cat-file -t $TAG_SHA # tag
git cat-file -p $TAG_SHA # object, type, tag, tagger, message
# === Verify the chain ===
echo "Tag -> Commit -> Tree -> Blob"
git cat-file -p $TAG_SHA | grep "^object"
git cat-file -p $COMMIT_SHA | grep "^tree"
git cat-file -p $TREE_SHA | grep "blob"