Git Objects: Blobs, Trees, Commits, Tags

Understanding Git's four object types — blobs, trees, commits, and annotated tags — how they relate through content-addressable storage, and how to inspect them with plumbing commands.

published: March 31, 2026 reading time: 13 min read updated: March 31, 2026

Introduction

Git is fundamentally a content-addressable filesystem with a VCS user interface. At its core lies a simple but powerful abstraction: four object types that together represent every snapshot of your project’s history. Understanding these objects — blobs, trees, commits, and tags — is the key to demystifying Git’s internals.

Unlike version control systems whose model is a chain of deltas (differences between versions), Git models every commit as a complete snapshot, even though pack files may delta-compress objects on disk. Each snapshot is decomposed into these four object types, linked together by SHA-1 hashes into a directed acyclic graph (DAG). This design gives Git its speed, integrity guarantees, and distributed nature.

This article examines each object type in depth, shows how they interconnect, and teaches you to inspect them using Git’s plumbing commands. By the end, you’ll be able to manually reconstruct a Git repository from raw objects if needed.

When to Use / When Not to Use

When to understand Git objects:

Debugging corruption or missing data in repositories
Building Git tooling or integrations
Understanding how Git achieves integrity verification
Optimizing repository size and performance
Recovering lost data from the object database

When not to manipulate objects directly:

Daily development — use porcelain commands (git add, git commit)
When unsure — direct object manipulation can corrupt repositories
For simple history inspection — git log and git show are sufficient

Core Concepts

Git stores everything as objects in .git/objects/, each identified by the SHA-1 hash of a short header plus its content. There are exactly four types:


graph TD
    TAG["Annotated Tag\n(type: tag)"] -->|points to| COMMIT["Commit\n(type: commit)"]
    COMMIT -->|tree: points to| TREE["Tree\n(type: tree)"]
    COMMIT -->|parent: points to| COMMIT2["Parent Commit"]
    TREE -->|contains| TREE2["Subdirectory Tree"]
    TREE -->|contains| BLOB1["Blob\n(file content)"]
    TREE -->|contains| BLOB2["Blob\n(file content)"]
    TREE2 -->|contains| BLOB3["Blob\n(file content)"]

The relationship is hierarchical: annotated tags point to another object (usually a commit), commits point to trees (and parent commits), and trees point to other trees and blobs. Blobs are the leaves; they contain the actual file content.

Each object is stored as:


<object type> <content length>\0<content>

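The header and content are hashed together to produce the object ID. The same bytes are then zlib-compressed and, for loose objects, stored in .git/objects/ under a path derived from that ID: the first two hex digits name the directory and the remaining 38 name the file.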
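The storage format above can be checked by hand. The following is a minimal sketch, assuming a POSIX shell with `sha1sum` available (on macOS, `shasum` is the usual substitute) and Git's default SHA-1 object format; the sample string and variable names are illustrative only.

```shell
# Sample content (illustrative); printf '%s\n' appends the trailing newline
content='Hello, Git!'

# Byte length of the content, newline included
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "<type> <length>\0<content>" ourselves...
manual=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1)

# ...and let Git compute the ID (no -w, so nothing is written)
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual:  $manual"
echo "via git: $via_git"
```

Both lines should print the same 40-character ID. Compression is applied only when the object is written to disk, so it does not influence the ID.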

Architecture or Flow Diagram


flowchart LR
    FILE["File Content"] -->|git hash-object| BLOB["Blob Object\nSHA: abc123..."]
    BLOB -->|referenced by| TREE["Tree Object\nSHA: def456..."]
    TREE -->|referenced by| COMMIT["Commit Object\nSHA: 789ghi..."]
    COMMIT -->|referenced by| TAG["Tag Object\nSHA: jkl012..."]

    META["Author, Date, Message"] --> COMMIT
    PARENT["Parent SHA"] --> COMMIT

The flow shows how file content becomes a blob, which is referenced by a tree, which is referenced by a commit, which may be referenced by a tag. Metadata flows into commits separately from content.

Step-by-Step Guide / Deep Dive

Blob Objects

Blobs store file content. They do not store filenames, permissions, or directory structure — just raw bytes.


# Create a blob from stdin and capture its ID
BLOB_SHA=$(echo "Hello, Git!" | git hash-object -w --stdin)
echo "$BLOB_SHA"
# Output: a 40-character SHA-1 determined entirely by the content

# The object is stored in a directory named after the first two hex digits
ls ".git/objects/$(echo "$BLOB_SHA" | cut -c1-2)/"

# Inspect the blob
git cat-file -p "$BLOB_SHA"
# Output: Hello, Git!

# Check the type
git cat-file -t "$BLOB_SHA"
# Output: blob

Key properties:

Content-addressable: identical files produce identical blobs (deduplication)
Immutable: once created, a blob never changes
No metadata: filename and permissions are stored in the tree, not the blob
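The deduplication property can be observed directly. This is a small sketch, assuming a throwaway repository created with `mktemp`; the file names and contents are made up for illustration.

```shell
# Work in a throwaway repository (the path comes from mktemp)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Two files with different names and locations but identical bytes
mkdir dir
printf 'same bytes\n' > a.txt
printf 'same bytes\n' > dir/b.txt

id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w dir/b.txt)
echo "a.txt:     $id_a"
echo "dir/b.txt: $id_b"

# Both writes map to the same ID, so only one loose object exists
loose=$(git count-objects -v | sed -n 's/^count: //p')
echo "loose objects: $loose"
```

The two IDs match and the loose object count stays at one, since neither the name nor the path is part of the blob.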

Tree Objects

Trees represent directories. They map filenames to blob SHAs (or other tree SHAs for subdirectories).


# Create a tree from the current index
TREE_SHA=$(git write-tree)
echo "$TREE_SHA"
# Output: a 40-character SHA-1
# (4b825dc642cb6eb9a060e54bf899d69f824970a0 if the index is empty)

# Inspect a tree
git cat-file -p "$TREE_SHA"
# Output format (illustrative entries; an empty tree prints nothing):
# 100644 blob <sha>    file1.txt
# 100755 blob <sha>    script.sh
# 040000 tree <sha>    subdir

Each tree entry contains:

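Mode: entry kind plus executable bit (100644 for regular files, 100755 for executables, 120000 for symbolic links, 040000 for directories, 160000 for submodule links)
Type: blob, tree, or commit (the last only for submodule links)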
SHA-1: hash of the referenced object
Filename: the name within this directory
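To see these fields on a real tree, the sketch below registers one blob in the index under two paths and writes the result out. It assumes a throwaway repository; the paths `file1.txt` and `subdir/script.sh` are illustrative.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# One blob, registered in the index under two paths with different modes
blob=$(printf 'echo hi\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",file1.txt
git update-index --add --cacheinfo 100755,"$blob",subdir/script.sh

# write-tree turns the index into tree objects, one per directory
tree=$(git write-tree)

# Root tree: file1.txt as a blob entry, subdir as a nested tree entry
git cat-file -p "$tree"

# Recursive view: every blob with its mode and full path
git ls-tree -r "$tree"
```

The root tree lists only its direct children; the path `subdir/script.sh` is reconstructed by walking from the root tree into the `subdir` tree.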

Commit Objects

Commits are the backbone of Git history. Each commit records a snapshot (via a tree), authorship, and lineage.


# Create a commit object manually
export GIT_AUTHOR_NAME="Test User"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User"
export GIT_COMMITTER_EMAIL="test@example.com"
export GIT_AUTHOR_DATE="2026-03-31T12:00:00+00:00"
export GIT_COMMITTER_DATE="2026-03-31T12:00:00+00:00"

TREE_SHA=$(git write-tree)
COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)

echo $COMMIT_SHA
# Output: a1b2c3d4e5f6...

# Inspect the commit
git cat-file -p $COMMIT_SHA
# Output:
# tree 4b825dc642cb6eb9a060e54bf899d69f824970a0
# author Test User <test@example.com> 1774958400 +0000
# committer Test User <test@example.com> 1774958400 +0000
#
# Initial commit

A commit object contains:

tree: SHA-1 of the root tree (the snapshot)
parent: SHA-1 of the parent commit(s) — zero for the initial commit, one for normal commits, two+ for merges
author: who wrote the code (name, email, timestamp, timezone)
committer: who committed the code (can differ from author, e.g., after rebase)
message: the commit message
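The parent field is what links commits into a history. The following sketch, assuming a throwaway repository and an illustrative test identity, builds a two-commit chain with plumbing commands only.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Identity and fixed dates so the example does not depend on local config
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'
export GIT_AUTHOR_DATE='1774958400 +0000' GIT_COMMITTER_DATE='1774958400 +0000'

# The index is empty, so this is the well-known empty tree
tree=$(git write-tree)

# Root commit: no -p, so the object has no parent line
first=$(echo 'Initial commit' | git commit-tree "$tree")

# Second commit: -p records the first commit as its parent
second=$(echo 'Second commit' | git commit-tree "$tree" -p "$first")

git cat-file -p "$first"
git cat-file -p "$second"
```

Neither commit is reachable from a branch yet; `git commit-tree` only writes the object, and a ref has to be updated separately (for example with `git update-ref`).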

Tag Objects

There are two types of tags in Git:

Lightweight tags are simply refs pointing to a commit — no tag object is created:


git tag v1.0-lightweight
# Creates: .git/refs/tags/v1.0-lightweight → commit SHA

Annotated tags are full objects with metadata:


git tag -a v1.0 -m "Release version 1.0"
# Creates a tag object

# Inspect the tag object
git cat-file -p $(git rev-parse v1.0)
# Output:
# object a1b2c3d4e5f6...
# type commit
# tag v1.0
# tagger Test User <test@example.com> 1774958400 +0000
#
# Release version 1.0

Annotated tags contain:

object: the SHA-1 of what the tag points to (usually a commit)
type: the type of the object (commit, tree, blob, or tag)
tag: the tag name
tagger: who created the tag
message: the tag message
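The difference between the two kinds of tag shows up as soon as you ask Git what each name resolves to. This is a minimal sketch, assuming a throwaway repository and an illustrative test identity.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

commit=$(echo 'Initial commit' | git commit-tree "$(git write-tree)")

git tag v1.0-lightweight "$commit"                   # ref only
git tag -a v1.0 -m 'Release version 1.0' "$commit"   # ref plus tag object

# A lightweight tag resolves straight to the commit
light_type=$(git cat-file -t v1.0-lightweight)
# An annotated tag resolves to a tag object first
annotated_type=$(git cat-file -t v1.0)
# Peeling the annotated tag reaches the same commit
peeled=$(git rev-parse 'v1.0^{commit}')

echo "lightweight -> $light_type"
echo "annotated   -> $annotated_type (peels to $peeled)"
```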

Production Failure Scenarios + Mitigations

| Scenario | Symptoms | Mitigation |
| --- | --- | --- |
| Missing blob | Checkout fails to read the object; `git fsck` reports a missing blob | Fetch from remote, or restore from backup; blobs are immutable so any copy works |
| Corrupted tree | Directory listing fails | Re-stage the working tree with `git add -A` and write a fresh tree with `git write-tree`; verify with `git fsck` |
| Broken commit chain | `git log` stops abruptly | Use `git replace` to graft history, or rebase onto valid ancestor |
| Tag pointing to wrong type | Unexpected behavior on tag checkout | Verify with `git cat-file -t`; recreate annotated tag if needed |
| Object store corruption | Multiple "bad object" errors | Run `git fsck --full`; clone fresh from remote; restore from backup |
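The missing-blob scenario is easy to rehearse safely. The sketch below, assuming a throwaway repository, an illustrative identity, and a hypothetical branch name `demo`, deletes one loose object on purpose and shows how `git fsck` reports it. Exact wording of the report may vary between Git versions.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

# A blob that is reachable from a branch through a tree and a commit
blob=$(printf 'important data\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",data.txt
commit=$(echo 'Add data' | git commit-tree "$(git write-tree)")
git update-ref refs/heads/demo "$commit"

# Simulate corruption by deleting the blob's loose object file
dir=$(printf '%s' "$blob" | cut -c1-2)
file=$(printf '%s' "$blob" | cut -c3-)
rm -f ".git/objects/$dir/$file"

# fsck exits non-zero when it finds problems, hence the "|| true"
report=$(git fsck 2>&1 || true)
echo "$report"
```

Because the ID is derived from the content, writing the same bytes again with `git hash-object -w` would restore the object exactly.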

Trade-offs

| Aspect | Advantage | Disadvantage |
| --- | --- | --- |
| Content-addressable storage | Automatic deduplication, integrity verification | SHA-1 collision risk (being mitigated with SHA-256) |
| Snapshot-based (not delta) | Fast checkouts, simple model | Higher storage for text files (mitigated by pack files) |
| Immutable objects | Safe concurrent access, easy replication | No in-place updates; every change creates new objects |
| No filenames in blobs | Blobs can be shared across trees | Must traverse tree to find which file a blob belongs to |

Implementation Snippets


# Create a blob and store it
echo "content" | git hash-object -w --stdin

# Read a blob's content
git cat-file -p <sha>

# Get object type
git cat-file -t <sha>

# Get object size
git cat-file -s <sha>

# Create a tree from index
git write-tree

# Read a tree into index
git read-tree <tree-sha>

# Create a commit object
git commit-tree <tree-sha> -p <parent-sha> -m "message"

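# Create an annotated tag object (mktag writes the object only;
# point a ref at it afterwards with git update-ref refs/tags/v1.0 <tag-sha>)
git mktag << EOF
object <commit-sha>
type commit
tag v1.0
tagger Name <email> <unix-timestamp> <tz-offset>

Release message
EOF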

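# List all objects reachable from any ref
git rev-list --objects --all

# List every object in the object database, reachable or not
git cat-file --batch-all-objects --batch-check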

# Find all unreachable objects
git fsck --unreachable

Observability Checklist

Monitor: Object count growth with git count-objects -v
Verify: Run git fsck periodically to detect corruption
Track: Ratio of loose to packed objects (should favor packed)
Alert: Unexpected object count spikes (may indicate accidental large file commits)
Audit: Tag signatures for release integrity verification
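The loose-to-packed ratio from the checklist can be read out of `git count-objects -v`. The following sketch assumes a throwaway repository with an illustrative identity and a hypothetical branch name `demo`; it shows the counters before and after `git gc`.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

# Three reachable objects: one blob, one tree, one commit
blob=$(printf 'payload\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",payload.txt
commit=$(echo 'Add payload' | git commit-tree "$(git write-tree)")
git update-ref refs/heads/demo "$commit"

# "count" is the number of loose objects, "in-pack" the number of packed ones
loose_before=$(git count-objects -v | sed -n 's/^count: //p')

# gc packs reachable objects and removes their loose copies
git gc -q

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
packed_after=$(git count-objects -v | sed -n 's/^in-pack: //p')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after, packed: $packed_after"
```

A monitoring job could sample the same two fields on a schedule and alert when the loose count keeps growing between runs.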

Security/Compliance Notes

Object hashes provide integrity verification — tampering changes the hash
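SHA-1 is no longer considered collision-resistant; Git ships a collision-detecting SHA-1 implementation and supports SHA-256 repositories (git init --object-format=sha256) as the longer-term replacement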
Signed tags (GPG/SSH) provide non-repudiation for releases
Objects are not encrypted — sensitive data in blobs is readable by anyone with repo access
See Git Secrets Management for preventing secret commits

Common Pitfalls / Anti-Patterns

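Assuming on-disk object size equals file size: loose objects are zlib-compressed with a header prepended, so the file under .git/objects/ differs in size from the original; git cat-file -s reports the uncompressed content size without the header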
Confusing lightweight and annotated tags — lightweight tags are just refs, not objects
Modifying objects directly — objects are immutable; use Git commands to create new ones
Ignoring unreachable objects — they consume space until git gc prunes them
Storing large binary files as blobs — use Git LFS instead

Quick Recap Checklist

Blobs store file content only — no filenames or metadata
Trees map filenames to blob/tree SHAs — represent directories
Commits point to a tree, parent(s), and record authorship
Annotated tags are full objects; lightweight tags are just refs
All objects are content-addressable by SHA-1 hash
Objects are immutable and zlib-compressed
The object graph forms a directed acyclic graph (DAG)
Use git cat-file to inspect any object by type, size, or content

Interview Q&A

Why don't Git blobs store filenames?

Blobs store only file content to enable deduplication. If two files in different directories have identical content, they share the same blob. Filenames are stored in tree objects, which map names to blob SHAs. This separation means renaming a file doesn't create a new blob — only a new tree.

How does Git detect if an object has been corrupted?

Every object's SHA-1 hash is computed from its type, size, and content. When Git verifies an object, for example during git fsck or while indexing a received pack, it recomputes the hash and compares it to the ID the object is stored under. If they don't match, the object is corrupted. This is why Git is called a "content-addressable filesystem": the address is the content's fingerprint.
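The same check can be reproduced from the command line. This is a rough sketch of the idea, not how Git implements verification internally; it assumes `sha1sum` is available, the repository uses SHA-1, and `verify` is a hypothetical helper name.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

tree=$(git write-tree)
commit=$(echo 'Initial commit' | git commit-tree "$tree")

# Hypothetical helper: rebuild "<type> <size>\0<content>" and hash it again
verify() {
  id=$1
  type=$(git cat-file -t "$id")
  size=$(git cat-file -s "$id")
  { printf '%s %s\0' "$type" "$size"; git cat-file "$type" "$id"; } | sha1sum | cut -d' ' -f1
}

recomputed=$(verify "$commit")
echo "stored as:  $commit"
echo "recomputed: $recomputed"
```

The helper works for any object type because `git cat-file <type> <id>` emits the raw, uncompressed content.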

What's the difference between `git hash-object` and `git hash-object -w`?

Without -w, git hash-object only computes and prints the SHA-1 hash without storing the object. With -w (write), it also stores the object in .git/objects/. This is useful for checking if content already exists before writing it.
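A short sketch of that existence check, assuming a throwaway repository and illustrative content; `git cat-file -e` exits with status zero only if the object is present.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Compute the ID only; nothing is written to .git/objects/
id=$(printf 'draft\n' | git hash-object --stdin)

if git cat-file -e "$id" 2>/dev/null; then before=present; else before=absent; fi

# Same content with -w: the object is now stored under the same ID
printf 'draft\n' | git hash-object -w --stdin > /dev/null

if git cat-file -e "$id" 2>/dev/null; then after=present; else after=absent; fi

echo "before -w: $before"
echo "after -w:  $after"
```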

Can a commit have zero parents? When?

Yes — the initial commit of any repository has zero parents. Additionally, commits created with git commit-tree without the -p flag, or orphan branches created with git checkout --orphan, produce commits with no parent. This creates a new root in the commit DAG.

How do annotated tags differ from lightweight tags internally?

An annotated tag creates a full tag object in .git/objects/ with metadata (tagger, date, message, and optionally a GPG or SSH signature), plus a ref that points to it. A lightweight tag is only a ref: a file in .git/refs/tags/ (or an entry in packed-refs) containing a commit SHA, with no tag object behind it. Annotated tags are preferred for releases because the tag object is immutable and, when signed, verifiable.

Object Relationship Diagram (Clean)


graph TD
    REPO["Repository"] -->|contains| TAGS["Annotated Tags"]
    TAGS -->|points to| COMMITS["Commits"]
    COMMITS -->|tree ref| TREES["Trees"]
    COMMITS -->|parent ref| PARENTS["Parent Commits"]
    TREES -->|entries| SUBTREES["Subdirectory Trees"]
    TREES -->|entries| BLOBS["Blobs (file content)"]
    SUBTREES -->|entries| MORE_BLOBS["More Blobs"]

    BLOBS -->|content only| FILES["Raw File Bytes"]
    MORE_BLOBS -->|content only| FILES

Production Failure: Corrupted Object Database

Scenario: Missing blob causing checkout failure


# Symptoms (illustrative output; exact messages vary by Git version)
$ git checkout main
error: unable to read sha1 file (src/config.py)
fatal: unable to checkout working tree

$ git fsck --full
error: abc123def456...: object missing
error: 789ghi...: object corrupt

# Root cause: Disk corruption, interrupted gc, or filesystem error
# destroyed blob objects in .git/objects/

# Recovery steps:

# 1. Identify missing objects
git fsck --full 2>&1 | grep "missing"

# 2. Try to fetch missing objects from remote
git fetch origin --refetch

# 3. If remote doesn't have them (local-only commits):
#    Check reflog for last known good state
git reflog
git checkout HEAD@{1}  # Try previous HEAD

# 4. As a last resort, clone fresh and recover local-only commits
cd ..
git clone https://github.com/user/repo.git repo-clean
cd repo-clean
# Fetch whatever is still readable from the damaged repo, then cherry-pick
git fetch ../repo <branch>
git cherry-pick <sha>

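# 5. Prevent future corruption:
#    - Keep .git/ on reliable, monitored storage
#    - Run git fsck periodically
#    - Keep remote backups
# Mirror to a dedicated backup remote ("backup" is an assumed remote name);
# --mirror overwrites and deletes refs on the target, so avoid a shared origin
git push --mirror backup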

Trade-offs: Annotated vs Lightweight Tags

| Aspect | Annotated Tags | Lightweight Tags |
| --- | --- | --- |
| Object type | Full tag object in `.git/objects/`, plus a ref | Simple ref file in `.git/refs/tags/` |
| Metadata | Tagger name, email, date, message | None |
| GPG signing | Supported (`git tag -s`) | Not supported |
| Storage | ~200 bytes per tag | ~41 bytes (SHA only) |
| Immutability | Tag object is immutable; the ref can still be moved with `git tag -f` | Ref can be moved with `git tag -f` |
| Use case | Releases, public milestones | Private bookmarks, temporary markers |
| `git describe` | Considered by default | Ignored unless `--tags` is passed |
| Platform display | Shows message on GitHub/GitLab | Shows as simple pointer |

Recommendation: Use annotated tags for anything public or release-related. Lightweight tags are fine for personal, temporary markers.
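One practical consequence of the choice is how `git describe` treats each kind of tag. The sketch below assumes a throwaway repository and an illustrative identity; the tag names `wip` and `v1.0` are made up.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Test User' GIT_AUTHOR_EMAIL='test@example.com'
export GIT_COMMITTER_NAME='Test User' GIT_COMMITTER_EMAIL='test@example.com'

# One commit, made current by pointing HEAD's branch at it
commit=$(echo 'Initial commit' | git commit-tree "$(git write-tree)")
git update-ref HEAD "$commit"

# With only a lightweight tag, plain "git describe" finds nothing
git tag wip "$commit"
if default_out=$(git describe 2>/dev/null); then :; else default_out='(no annotated tag found)'; fi
with_tags=$(git describe --tags)

# Once an annotated tag exists, plain "git describe" uses it
git tag -a v1.0 -m 'Release 1.0' "$commit"
after_annotated=$(git describe)

echo "describe, lightweight only:        $default_out"
echo "describe --tags, lightweight only: $with_tags"
echo "describe, annotated present:       $after_annotated"
```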

Implementation: Creating and Inspecting Each Object Type Manually


# === 1. BLOB ===
# Create blob from string
BLOB_SHA=$(echo "file content" | git hash-object -w --stdin)
echo "Blob SHA: $BLOB_SHA"

# Create blob from file
git hash-object -w myfile.txt

# Inspect
git cat-file -t $BLOB_SHA   # blob
git cat-file -s $BLOB_SHA   # size in bytes
git cat-file -p $BLOB_SHA   # content

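# === 2. TREE ===
# Build a tree directly from ls-tree style input (mode, type, SHA, TAB, name);
# mktree requires a TAB before the filename and does not touch the index
TREE_SHA=$(printf '100644 blob %s\ttest.txt\n' "$BLOB_SHA" | git mktree)

# Or build it from the current index instead
# TREE_SHA=$(git write-tree)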

# Inspect
git cat-file -t $TREE_SHA   # tree
git cat-file -p $TREE_SHA   # entries (mode, type, sha, name)

# === 3. COMMIT ===
# Create commit (requires env vars for author)
export GIT_AUTHOR_NAME="Test"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test"
export GIT_COMMITTER_EMAIL="test@example.com"

COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)

# Inspect
git cat-file -t $COMMIT_SHA   # commit
git cat-file -p $COMMIT_SHA   # tree, author, committer, message

# === 4. ANNOTATED TAG ===
# Create tag object
git tag -a v1.0 -m "Release 1.0" $COMMIT_SHA

# Get tag object SHA (not the commit it points to)
TAG_SHA=$(git rev-parse 'v1.0^{tag}')

# Inspect
git cat-file -t $TAG_SHA   # tag
git cat-file -p $TAG_SHA   # object, type, tag, tagger, message

# === Verify the chain ===
echo "Tag -> Commit -> Tree -> Blob"
git cat-file -p $TAG_SHA | grep "^object"
git cat-file -p $COMMIT_SHA | grep "^tree"
git cat-file -p $TREE_SHA | grep "blob"