Object Storage: S3, Blob Storage, and Scale of Data

Learn how object storage systems like Amazon S3 handle massive unstructured data, buckets, keys, metadata, versioning, and durability patterns.

published: March 22, 2026 reading time: 36 min read author: GeekWorkBench updated: April 16, 2026

Quick Summary

Object storage systems like S3 store data as objects in flat buckets, using string keys and metadata. This post covers storage classes, versioning, lifecycle policies, multipart upload, access control, S3 Select, Object Lock, and cross-region replication. By the end, you'll have what you need to design object storage that fits your use case — not just follow best practices blindly.

Introduction

Object storage organizes data as objects in buckets. Each object has a key, data, and metadata. The key is a string, like a file path but without directory semantics. The data is arbitrary bytes. Metadata is key-value pairs describing the object.

import boto3

s3 = boto3.client('s3')

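# Upload an object (image_data is the raw bytes, read here from a local file)
with open('photo123.jpg', 'rb') as f:
    image_data = f.read()

s3.put_object(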
    Bucket='my-bucket',
    Key='images/product/photo123.jpg',
    Body=image_data,
    ContentType='image/jpeg',
    Metadata={'product-id': '12345', 'uploaded-by': 'jane'}
)

# Retrieve an object
response = s3.get_object(
    Bucket='my-bucket',
    Key='images/product/photo123.jpg'
)
image_data = response['Body'].read()

The key looks like a path but the storage is flat. There are no directories, though tools often display keys as if they had folders. The slash in the key is just another character.

Bucket and Key Fundamentals

Buckets are containers for objects. Each bucket lives in a single region, but bucket names share one global namespace: the name you choose must be unique across all S3 users worldwide.

# Create a bucket (outside us-east-1, also pass
# CreateBucketConfiguration={'LocationConstraint': '<your-region>'})
s3.create_bucket(Bucket='my-unique-bucket-name')

# List buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

Keys identify objects within a bucket. The combination of bucket name and key uniquely identifies every object. No two objects in the same bucket can have the same key.

Keys can contain slashes, which tools display as folder hierarchies. But the storage treats them as flat. You can list objects with a prefix filter to simulate directory browsing.
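
Prefix-based browsing can be sketched as follows. This is a minimal sketch, assuming a boto3 S3 client is passed in; `list_folder` is an illustrative helper name, not part of boto3. It relies on the `list_objects_v2` paginator with `Delimiter='/'`, which groups deeper keys into `CommonPrefixes` so they can be shown as subfolders.

```python
def list_folder(s3_client, bucket, prefix=""):
    """Return (subfolders, files) directly under a prefix, folder-style.

    Hypothetical helper: s3_client is assumed to be boto3.client('s3').
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    subfolders, files = [], []
    # Delimiter="/" makes S3 roll up deeper keys into CommonPrefixes
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        subfolders.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        files.extend(obj["Key"] for obj in page.get("Contents", []))
    return subfolders, files
```

Calling `list_folder(s3, 'my-bucket', 'images/')` would return the "folders" and objects one level below `images/`, even though the bucket itself stays flat.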

Object Metadata and Tags

Object metadata travels with the object. System metadata includes things like content type, size, and last-modified time. User metadata is custom key-value pairs you define.

# Retrieve metadata without downloading
head = s3.head_object(
    Bucket='my-bucket',
    Key='images/product/photo123.jpg'
)
print(head['ContentType'])  # image/jpeg
print(head['ContentLength'])  # 245632
print(head['Metadata'])  # {'product-id': '12345', 'uploaded-by': 'jane'}

Tags are separate from metadata. They are key-value pairs used for lifecycle filters, access policies, and cost allocation. You can have up to 10 tags per object, and unlike metadata, tags can be changed without rewriting the object.

# Add tags to an existing object
s3.put_object_tagging(
    Bucket='my-bucket',
    Key='images/product/photo123.jpg',
    Tagging={
        'TagSet': [
            {'Key': 'department', 'Value': 'marketing'},
            {'Key': 'project', 'Value': 'spring-campaign'}
        ]
    }
)

Data Protection & Compliance

Versioning

S3 versioning keeps multiple versions of an object. When you overwrite an object, the previous version is preserved. You can retrieve any historical version.

# Enable versioning on a bucket
s3.put_bucket_versioning(
    Bucket='my-bucket',
    VersioningConfiguration={'Status': 'Enabled'}
)

# List object versions
versions = s3.list_object_versions(Bucket='my-bucket')
for version in versions['Versions']:
    print(f"{version['Key']} - {version['VersionId']}")

Versioning costs more storage since every version is kept. But it protects against accidental overwrites and deletions. You can retrieve any previous state.

Versioning also enables point-in-time recovery. Combined with lifecycle policies, you can keep historical versions for compliance without manual intervention.
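
One way to combine versioning with lifecycle policies is a rule that expires noncurrent versions after a retention window. The sketch below only builds the rule dictionary; `noncurrent_version_rule` and its default rule ID are illustrative names, and the 365-day window is an assumption rather than a recommendation.

```python
def noncurrent_version_rule(prefix, keep_days, rule_id="ExpireOldVersions"):
    """Build a lifecycle rule that deletes versions `keep_days` after they
    stop being the current version. Hypothetical helper, not a boto3 API."""
    if keep_days < 1:
        raise ValueError("keep_days must be at least 1")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "NoncurrentVersionExpiration": {"NoncurrentDays": keep_days},
    }

rule = noncurrent_version_rule("reports/", keep_days=365)
# To apply it (note that this call replaces the bucket's whole lifecycle config):
# s3.put_bucket_lifecycle_configuration(
#     Bucket='my-bucket', LifecycleConfiguration={'Rules': [rule]})
```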

Storage Classes and Lifecycle Policies

S3 offers multiple storage classes with different cost and availability characteristics.

Standard is the default, most expensive tier. Infrequent Access stores data cheaper but charges more for access. Glacier is for archiving, with retrieval times of minutes to hours. Intelligent Tiering moves data automatically based on access patterns.

# Lifecycle rule: move objects under logs/ to Standard-IA after 30 days, then to Glacier after 90 days
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'MoveToIA',
                'Filter': {'Prefix': 'logs/'},
                'Status': 'Enabled',
                'Transitions': [
                    {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                    {'Days': 90, 'StorageClass': 'GLACIER'}
                ]
            }
        ]
    }
)

Lifecycle policies automate data movement between tiers. Old logs might move to Infrequent Access after 30 days, then to Glacier after 90 days. You set it and forget it.

flowchart LR
    Upload[("Object<br/>Uploaded")] --> Standard[("S3 Standard<br/>Frequently accessed")]

    Standard -->|30 days<br/>after creation| IA[("S3 Standard-IA<br/>Infrequent access")]

    IA -->|90 days<br/>after creation| Glacier[("S3 Glacier<br/>Archived")]

    Glacier -->|180 days<br/>after creation| DeepArchive[("S3 Deep Archive<br/>Long-term archive")]

    DeepArchive -->|365 days<br/>after creation| Delete[("Lifecycle<br/>Expiration")]

    Upload -.->|versioning<br/>enabled| Versions[("Version<br/>Preserved")]

    Versions -.->|MFA delete<br/>required| Restore[("Restore<br/>before delete")]

Without versioning, overwrites and deletes are permanent. With versioning, previous versions persist until you explicitly delete them or the lifecycle policy purges old versions.

Storage classes across AWS S3:

Class	Durability	Availability	Access Latency	Min Storage Duration	Best For
Standard	11 nines	99.99%	Milliseconds	None	Frequently accessed data
Standard-IA	11 nines	99.9%	Milliseconds	30 days	Infrequently accessed data
Intelligent Tiering	11 nines	99.9%	Milliseconds	None	Unknown or changing access patterns
One Zone-IA	11 nines (within a single AZ)	99.5%	Milliseconds	30 days	Re-creatable non-critical data in one AZ
Glacier Instant Retrieval	11 nines	99.9%	Milliseconds	90 days	Rarely accessed but needs instant retrieval
Glacier Flexible Retrieval	11 nines	99.99%	Minutes to 12 hours	90 days	Long-term archives with occasional access
Glacier Deep Archive	11 nines	99.99%	12-48 hours	180 days	Longest-term retention, regulatory compliance

Most teams keep everything in Standard. They should not. Standard-IA is about half the price, and Glacier Deep Archive is roughly 95% cheaper. The tradeoff is retrieval cost and latency — match the class to how you actually access the data.

Durability and Availability

Object storage is designed for massive scale and high durability. S3 Standard offers 11 nines of durability. That means losing an object is extraordinarily unlikely.

The durability comes from storing multiple copies across availability zones. S3 automatically replicates your data. You do not need to configure RAID or backup software.

Availability is a separate measure from durability. S3 Standard is designed for 99.99% availability, which works out to about 52 minutes of downtime per year. Different storage classes offer different availability targets and SLAs.
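
The downtime figure follows directly from the availability percentage. A small sketch of the arithmetic, using the availability values from the storage class table above (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def max_downtime_minutes_per_year(availability_percent):
    """Minutes per year a service may be unavailable at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.99, 99.9, 99.5):
    print(f"{pct}% available -> {max_downtime_minutes_per_year(pct):.1f} minutes/year")
```

Each step down in availability multiplies the downtime budget: 99.9% allows ten times as much as 99.99%.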

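from botocore.exceptions import ClientError

# Check if an object exists
try:
    s3.head_object(Bucket='my-bucket', Key='important-file.pdf')
    print("Object exists")
except ClientError as e:
    error_code = e.response['Error']['Code']
    if error_code == '404':
        print("Object not found")
    else:
        raise  # permission or other errors must not be mistaken for "missing"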

Using Object Storage in Applications

Object storage replaces both file storage and some database use cases. Static assets like images and videos live in object storage. User uploads go there. Backup files. Data lakes.

The pattern is straightforward. Generate a unique key, upload with metadata, store the key in your database if needed. The object storage handles the rest.

import uuid

def handle_upload(file_data, content_type):
    # Generate unique key
    key = f"uploads/{uuid.uuid4()}/{file_data.filename}"

    # Upload to S3
    s3.put_object(
        Bucket='my-bucket',
        Key=key,
        Body=file_data.read(),
        ContentType=content_type
    )

    # Store key in database, not the file itself
    db.files.insert({'s3_key': key, 'original_name': file_data.filename})

    return key

CDN integration is common. CloudFront, Cloudflare, and others cache objects from S3. Users download from edge locations rather than origin servers. This reduces latency and origin load.

Access Control Patterns: Bucket Policies, IAM, and CORS

S3 offers layered access control: IAM policies for fine-grained user permissions, bucket policies for bucket-wide rules, and ACLs for legacy compatibility. Bucket policies are JSON documents that apply to entire buckets — they can grant or deny access based on source IP, requestor, object prefix, or operation type.

import json

# Bucket policy: deny reads from outside a specific IP range, allow reads under internal/
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyReadOutsideVPN",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "203.0.113.0/24"
                }
            }
        },
        {
            "Sid": "AllowInternalReads",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/internal/*"
        }
    ]
}

s3.put_bucket_policy(Bucket='my-bucket', Policy=json.dumps(bucket_policy))

IAM vs bucket policies:

Aspect	IAM Policies	Bucket Policies
Scope	User, group, or role level	Entire bucket or prefix
Use case	Cross-account access, detailed action control	Public/private boundaries, cross-account grants
Evaluation	Evaluated in combination with bucket policies	Evaluated independently as resource-based policy
Management	Per-user in IAM	Single JSON doc attached to bucket

CORS for browser access:

If your application serves objects to browsers directly from S3, you need CORS configuration:

# Configure CORS to allow browser-based uploads from your domain
cors_configuration = {
    'CORSRules': [
        {
            'AllowedHeaders': ['Authorization', 'Content-Type'],
            'AllowedMethods': ['GET', 'POST', 'PUT'],
            'AllowedOrigins': ['https://your-domain.com'],
            'ExposeHeaders': ['ETag'],
            'MaxAgeSeconds': 3600
        }
    ]
}

s3.put_bucket_cors(Bucket='my-bucket', CORSConfiguration=cors_configuration)

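Public access block: At minimum, enable Block Public Access before deploying any bucket. The call below applies it to a single bucket; the account-level setting is configured separately through the S3 Control API: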

# Block all public access to prevent accidental exposure
s3.put_public_access_block(
    Bucket='my-bucket',
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True
    }
)

Multipart Upload: Parallelizing Large File Transfers

Multipart upload lets a client split a large object into parts that are uploaded independently. This enables parallel transfers, resumable uploads, and objects larger than 5GB (the single-PUT limit).

# Multipart upload example
import boto3
import math
from pathlib import Path

s3 = boto3.client('s3')

def upload_large_file(bucket, key, file_path, part_size_mb=100):
    """Upload a large file using multipart upload."""
    file_path = Path(file_path)  # accept either a str or a Path
    file_size = file_path.stat().st_size
    part_size = part_size_mb * 1024 * 1024  # 100MB default
    num_parts = math.ceil(file_size / part_size)

    # Initiate multipart upload
    response = s3.create_multipart_upload(
        Bucket=bucket,
        Key=key,
        ContentType='application/octet-stream'
    )
    upload_id = response['UploadId']

    parts = []
    try:
        with open(file_path, 'rb') as f:
            for i in range(num_parts):
                data = f.read(part_size)
                part_num = i + 1

                # Upload each part
                result = s3.upload_part(
                    Bucket=bucket,
                    Key=key,
                    UploadId=upload_id,
                    PartNumber=part_num,
                    Body=data
                )
                parts.append({
                    'PartNumber': part_num,
                    'ETag': result['ETag']
                })

        # Complete multipart upload
        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
    except Exception:
        # Abort on failure so already-uploaded parts stop accruing storage charges
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return key

Part size considerations:

File Size	Recommended Part Size	Max Parts	Notes
< 100MB	Single PUT	1	Simpler, no multipart needed
100MB - 5GB	50-100MB	< 100	Default part size works well
5GB - 5TB	100-500MB	10,000	Must use multipart; 5GB is single PUT limit
> 5TB	Not supported	-	S3 maximum object size is 5TB

Performance tips: Upload parts in parallel to maximize throughput. Use byte-range fetches for parallel downloads. Part numbers must be 1-10000 and can be uploaded concurrently.
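
To upload parts in parallel, it helps to plan the byte ranges first. The sketch below is one possible way to do that; `plan_parts` is an illustrative helper, not a boto3 function. It applies two S3 multipart limits: every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload

def plan_parts(file_size, part_size=100 * 1024 * 1024):
    """Return a list of (part_number, offset, length) tuples covering the file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the file would otherwise need more than 10,000 parts
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    part_number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts
```

Each tuple can be handed to a worker that seeks to `offset`, reads `length` bytes, and calls `upload_part` with that `part_number`. The same ranges work for parallel downloads as `Range: bytes={offset}-{offset + length - 1}` requests.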

# Abort incomplete multipart upload to avoid stray charges
s3.abort_multipart_upload(
    Bucket='my-bucket',
    Key='large-file.zip',
    UploadId='upload-id-here'
)

Set lifecycle rules to abort incomplete multipart uploads after 7 days. This prevents accumulating storage costs from failed uploads.
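
Such a rule could look like the following. This is a sketch that only builds the rule dictionary; the helper name and rule ID are illustrative, and the empty prefix (meaning the whole bucket) is an assumption.

```python
def abort_incomplete_uploads_rule(days=7, prefix=""):
    """Build a lifecycle rule that aborts multipart uploads left unfinished
    for `days` days. Hypothetical helper, not a boto3 API."""
    return {
        "ID": f"AbortIncompleteMultipartUploads-{days}d",
        "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }

cleanup_rule = abort_incomplete_uploads_rule()
# Applied together with any other rules, since the call replaces the whole config:
# s3.put_bucket_lifecycle_configuration(
#     Bucket='my-bucket', LifecycleConfiguration={'Rules': [cleanup_rule]})
```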

Performance & Retrieval

S3 Select: Querying Data Without Downloading

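S3 Select lets you query CSV, JSON, or Parquet objects in place with SQL. You retrieve only the data you need, which reduces transfer costs and speeds up queries. Note that AWS closed S3 Select to new customers in 2024; existing customers can continue to use it.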

# Query CSV data with S3 Select
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='analytics/events.csv',
    Expression='SELECT * FROM s3object WHERE status = \'completed\' LIMIT 10',
    ExpressionType='SQL',
    InputSerialization={
        'CSV': {
            'FileHeaderInfo': 'USE',
            'RecordDelimiter': '\n',
            'FieldDelimiter': ','
        }
    },
    OutputSerialization={'CSV': {}}
)

# Process results
for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'))

S3 Select use cases:

Log analysis: Query specific error codes from large CloudWatch logs
Time-series filtering: Extract metrics for a specific date range from Parquet files
Partial file retrieval: Get only needed columns from wide CSV datasets

Cost benefits: You pay for the data scanned and the data returned by the query rather than for transferring the whole object. For highly selective queries on large files, this can reduce costs by as much as 80-90%.

Object Lock and Compliance: WORM Storage

S3 Object Lock provides WORM (Write Once, Read Many) protection. Objects cannot be deleted or overwritten for a specified retention period.

# Enable Object Lock on bucket creation
s3.create_bucket(
    Bucket='compliance-bucket',
    ObjectLockEnabledForBucket=True
)

# Put Object Lock retention on an object
from datetime import datetime, timedelta, timezone

retention_until = datetime.now(timezone.utc) + timedelta(days=365)
s3.put_object_retention(
    Bucket='compliance-bucket',
    Key='financial-records/2024.xlsx',
    Retention={
        'Mode': 'COMPLIANCE',  # Cannot be shortened or removed by any user
        'RetainUntilDate': retention_until  # boto3 accepts a timezone-aware datetime
    }
)

# Put legal hold (indefinite protection)
s3.put_object_legal_hold(
    Bucket='compliance-bucket',
    Key='financial-records/2024.xlsx',
    LegalHold={'Status': 'ON'}
)

Retention modes:

Mode	Override by Admin	Use Case
COMPLIANCE	No	Regulatory requirements, legal hold
GOVERNANCE	Yes (with special permission)	Temporary protection, testing environments

Legal hold remains until explicitly removed. It is independent of any retention period and stays in effect until a user with the s3:PutObjectLegalHold permission turns it off. Ideal for litigation hold or records with no fixed retention date.

Cross-Region Replication: Disaster Recovery Patterns

CRR replicates objects to destination buckets in different regions. It supports lower-latency access for distant users, disaster recovery, and compliance requirements.

# Enable cross-region replication
import boto3

iam = boto3.client('iam')
s3 = boto3.client('s3')

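# Trust policy for the replication role: it lets S3 assume the role.
# The role itself ('s3-replication-role', looked up below) is assumed to exist already,
# created with this trust policy plus permissions to read the source and write the destination.
role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# Enable versioning first (required for CRR; the destination bucket needs it enabled too)
s3.put_bucket_versioning(
    Bucket='source-bucket',
    VersioningConfiguration={'Status': 'Enabled'}
)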

# Create replication configuration
replication_config = {
    'Role': iam.get_role(RoleName='s3-replication-role')['Role']['Arn'],
    'Rules': [{
        'ID': 'replicate-to-us-west-2',
        'Status': 'Enabled',
        'Priority': 1,
        'DeleteMarkerReplication': {'Status': 'Enabled'},
        'Destination': {
            'Bucket': 'arn:aws:s3:::dest-bucket-us-west-2',
            'StorageClass': 'STANDARD_IA',
            'EncryptionConfiguration': {'ReplicaKmsKeyID': 'kms-key-id'}
        },
        'Filter': {'Prefix': 'production/'}
    }]
}

s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration=replication_config
)

Replication considerations:

Cost: Data transfer between regions charges egress fees. Budget accordingly.
Latency: Objects appear in destination within seconds typically, but async replication means RPO > 0.
Filtered replication: Use prefix filters to replicate only production data, not test data.
KMS encryption: Encrypted objects require KMS key access in destination region.

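Multi-region active-active: S3 replication is always asynchronous, so it cannot deliver zero RPO on its own. Getting close to zero means having the application write to buckets in both regions before acknowledging an upload, with replication as a safety net and Route53 geolocation routing traffic to the nearest bucket. This is complex and expensive, but it is how you approach true active-active.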

Cost & Capacity Planning

Capacity Estimation: Cost Calculation for S3 Tiers

Object storage pricing varies significantly by storage class. Understanding tier costs enables cost-effective lifecycle management.

Monthly storage cost formula:

storage_cost_per_month = storage_gb × price_per_gb_per_month
total_storage_cost = sum(storage_tier_gb × tier_price_per_gb_per_month)

For a media platform with the following storage distribution:

Tier	Storage	Price/GB/mo	Monthly Cost
S3 Standard	10TB	$0.023	$230
S3 IA	50TB	$0.0125	$625
S3 Glacier	200TB	$0.004	$800
S3 Glacier Deep Archive	500TB	$0.00099	$495
Total	760TB		$2,150/mo
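
The table's total can be reproduced with a few lines of arithmetic. The sketch below uses the tier sizes and prices from the table and assumes decimal terabytes (1 TB = 1,000 GB), which is what the table's figures imply.

```python
GB_PER_TB = 1000  # decimal terabytes, matching the table's arithmetic

# (tier name, storage in TB, price per GB per month in USD), as in the table
TIERS = [
    ("S3 Standard", 10, 0.023),
    ("S3 IA", 50, 0.0125),
    ("S3 Glacier", 200, 0.004),
    ("S3 Glacier Deep Archive", 500, 0.00099),
]

def monthly_storage_cost(tiers):
    """Sum storage_gb * price_per_gb_per_month over all tiers."""
    return sum(tb * GB_PER_TB * price for _, tb, price in tiers)

for name, tb, price in TIERS:
    print(f"{name}: ${tb * GB_PER_TB * price:,.2f}/mo")
print(f"Total: ${monthly_storage_cost(TIERS):,.2f}/mo")
```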

Request costs matter as much as storage:

request_cost_per_month = (get_requests / 1000) × price_per_1k_gets + (put_requests / 1000) × price_per_1k_puts

S3 Standard pricing: $0.023/GB storage, $0.0004 per 1,000 GET requests, $0.005 per 1,000 PUT requests. For a high-traffic application with 100M GET/day and 1M PUT/day: storage might be $500/mo, but requests add $40 + $5 = $45 per day, or roughly $1,350/mo. For a low-traffic archive with 1M GET/day and 10K PUT/day: storage $500/mo, requests $0.40 + $0.05 = $0.45 per day, or about $13.50/mo. Storage dominates for cold data; requests dominate for hot data.
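
A short sketch of the request-cost arithmetic, showing both the per-day and per-month figures for the high-traffic case. The prices are the S3 Standard rates quoted above; the 30-day month is an assumption.

```python
def request_cost(gets, puts, price_per_1k_gets=0.0004, price_per_1k_puts=0.005):
    """Cost in USD for a given number of GET and PUT requests.
    Prices are per 1,000 requests, so request counts are divided by 1,000."""
    return gets / 1000 * price_per_1k_gets + puts / 1000 * price_per_1k_puts

DAYS_PER_MONTH = 30  # assumption for a rough monthly figure

daily = request_cost(gets=100_000_000, puts=1_000_000)
monthly = daily * DAYS_PER_MONTH
print(f"High traffic: ${daily:,.2f}/day, ${monthly:,.2f}/mo")
```

The per-1,000 pricing makes it easy to misread a daily figure as a monthly one, so it is worth keeping the time unit explicit in any estimate.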

Early deletion fees: S3 Glacier and Glacier Deep Archive charge early deletion fees if data is deleted within 90 or 180 days respectively. Plan lifecycle transitions accordingly — do not move data to Glacier if you might need to delete it within 90 days.

Trade-off Analysis

When designing object storage solutions, you constantly trade off cost, performance, durability, and operational complexity.

Storage Class Trade-offs:

Consideration	Standard	Standard-IA	Glacier
Storage cost	Highest	~50% of Standard	~80-95% cheaper than Standard
Retrieval cost	None	Per-GB fees	Per-GB + retrieval tier fees
Access latency	Milliseconds	Milliseconds	Minutes to hours
Minimum duration	None	30 days	90-180 days
Best for	Hot, accessed data	Monthly access	Archival, rarely accessed

Versioning Trade-offs:

Factor	With Versioning	Without Versioning
Storage cost	2x-10x depending on churn	1x base storage
Protection	Accidental deletes/overwrites recoverable	Permanent loss risk
Operational complexity	More storage to manage, list versions	Simpler
MFA delete available	Yes	No protection against admin mistakes

Replication Trade-offs:

Strategy	Cost	RPO	RTO	Complexity
No replication	$0	N/A	Full loss possible	None
Same-region replication	Low	Minutes	Minutes-hours	Medium
Cross-region async	Medium (egress fees)	Seconds-minutes	Minutes	Medium
Cross-region dual-write (application-level)	High	Near zero	Near zero	High

Key naming strategies:

Strategy	Pros	Cons
Descriptive paths (`images/2024/product/photo.jpg`)	Human readable, easy to navigate	May hit hot prefix limits
UUID-based (`uploads/a1b2c3d4/photo.jpg`)	No collision risk, evenly distributed	Hard to navigate manually
Hybrid (`uploads/2024/a1b2c3d4.jpg`)	Balanced	Slightly more complex
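
The hybrid strategy from the table can be sketched as a small key generator. The function name, the default `uploads` prefix, and the choice to keep only the lowercased file extension are illustrative assumptions.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath

def make_hybrid_key(original_name, prefix="uploads", now=None):
    """Build a key like uploads/2024/<uuid-hex>.jpg.

    The year keeps keys browsable, the UUID avoids collisions, and only
    the file extension of the user-supplied name is reused.
    """
    now = now or datetime.now(timezone.utc)
    extension = PurePosixPath(original_name).suffix.lower()
    return f"{prefix}/{now:%Y}/{uuid.uuid4().hex}{extension}"
```

Storing the original file name in the database or in object metadata, rather than in the key, keeps keys predictable in shape and avoids surprises from unusual characters in user-supplied names.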

Encryption trade-offs:

Method	Key management	Performance	Compliance
SSE-S3 (AES-256)	AWS managed	Minimal overhead	Basic
SSE-KMS	Your keys in KMS	~3-5% overhead	Enhanced
CSE-KMS	Your keys, client-side	Higher CPU	Highest control

When to Use / When Not to Use

When to Use Object Storage:

Storing unstructured files: images, videos, documents, audio files
Backing up databases, logs, or system files
Hosting static website assets (HTML, CSS, JS, images)
Building data lakes for analytics workloads
Distributing large files via CDN integration
Storing user uploads that exceed database BLOB limits
Archiving data for compliance with infrequent access patterns

When Not to Use Object Storage:

You need filesystem semantics (directories, symlinks, permissions)
Your workload requires sub-millisecond latency (use block storage or memory)
You need to modify parts of files frequently (object storage is write-all-or-nothing)
Your application needs locking or transactions to coordinate concurrent writers to the same object (by default, S3 resolves concurrent writes as last-writer-wins)
You need database-like queries across object metadata (use a database index)
Real-time file system operations are required (use NFS, EBS, or similar)

Production Failure Scenarios

Failure	Impact	Mitigation
Accidental deletion	Permanent data loss without versioning	Enable versioning, implement soft-delete policies, use MFA delete
Bucket policy misconfiguration	Public exposure or access denial	Review policies with least privilege, use policy validation tools
Cost overrun from unconstrained uploads	Unexpectedly high storage costs	Set budget alerts, implement upload size limits, lifecycle policies
Cross-region replication delay	Stale data in DR region	Set realistic RPO, test replication lag, use S3 Replication Time Control or application-level dual writes if needed
Throttling from request rate limits	Upload/download failures under load	Implement retry with exponential backoff, request rate smoothing
Object key namespace collision	Data overwrites	Use UUIDs or timestamp-based keys, implement key validation
Bucket name conflicts	Deployment failures	Use globally unique naming conventions, automate naming
CDN cache invalidation delays	Stale content served after updates	Plan invalidation strategy, use versioned object keys
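
The throttling mitigation in the table, retry with exponential backoff, can be sketched generically as below. This is a minimal illustration with assumed defaults (5 attempts, 0.1s base delay, full jitter); `retry_with_backoff` is a hypothetical helper, and in practice boto3's built-in retry configuration may already cover most cases.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retryable=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait before each retry is a random time up to an exponentially
    growing cap ("full jitter"), so many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

A call such as `retry_with_backoff(lambda: s3.get_object(Bucket='my-bucket', Key=key))` would wrap a single request. Narrowing `retryable` to throttling errors avoids retrying failures that can never succeed, such as access denied.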

Common Pitfalls / Anti-Patterns

Using object storage as a database: Object storage has no query language. Storing millions of objects with no index makes retrieval extremely slow. Use a database for structured data with query needs.
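Not planning for request rate limits: S3 scales request rates per prefix, at roughly 3,500 writes and 5,500 reads per second per prefix. Popular content concentrated under a single prefix can be throttled. Spread keys across prefixes and put CloudFront in front of hot objects.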
Storing too many small objects: Each object has overhead. Millions of tiny objects waste money and slow listings. Consider archiving small files together.
Ignoring storage class optimization: Storing everything in Standard costs more than necessary. Use lifecycle policies to move rarely-accessed data to cheaper tiers.
Not using versioning for mutable objects: Overwriting without versioning destroys the previous version. Enable versioning for any object that changes.
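Assuming every store is strongly consistent: Amazon S3 has delivered strong read-after-write consistency for new objects, overwrites, deletes, and listings since December 2020, so reads there see the latest write. Cross-region replicas and some S3-compatible systems are still eventually consistent, so do not assume read-after-write behavior where replication or a third-party store is involved.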
Leaking pre-signed URLs: Pre-signed URLs grant access to anyone with the URL. Do not log or expose them. Use short expiration times.
Not implementing cleanup policies: Upload failures, test runs, and temporary files accumulate. Implement lifecycle rules to auto-delete incomplete multipart uploads and old temp files.
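
For the small-object pitfall, one option is to bundle many tiny files into a single archive before uploading. The sketch below uses Python's standard `tarfile` module; `bundle_small_files` and the in-memory approach are illustrative assumptions that suit batches small enough to fit in memory.

```python
import io
import tarfile

def bundle_small_files(files):
    """Pack {name: bytes} into one gzip-compressed tar archive, returned as bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar needs the size before the data is written
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

bundle = bundle_small_files({
    "events/0001.json": b'{"status": "completed"}',
    "events/0002.json": b'{"status": "failed"}',
})
# The bundle can then be uploaded as a single object, for example:
# s3.put_object(Bucket='my-bucket', Key='events/batch-0001.tar.gz', Body=bundle)
```

The trade-off is that reading one file now means fetching and unpacking the whole archive, so this fits data that is written in batches and read in bulk.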

Real-World Case Studies

Case Study 1: Netflix — Video Storage at Scale

Netflix stores an enormous video catalog in S3-compatible object storage. Each title is encoded at multiple resolutions and bitrates, resulting in many separate objects per film or episode. Netflix uses:

Lifecycle policies to automatically transition old or rarely viewed content from Standard to Glacier storage classes
Multipart upload for reliable upload of very large video files
Cross-Region Replication to replicate popular content to regions near users, reducing playback latency

Netflix encodes every title in multiple profiles — a single two-hour film might produce 15-20 object versions across resolutions (4K, 1080p, 720p) and bitrates (450kbps for mobile, 25Mbps for premium). Each profile is a separate object with a structured key like titles/{title_id}/encodings/{profile}.mp4. The volume is staggering: Netflix reports storing over 250 million hours of video at peak usage, with thousands of new encodes added daily.

The architecture handles this with a producer-consumer pattern. Encode jobs run on compute clusters, writing output directly to S3 via multipart upload in 100MB parts. The encoding service tracks completion by object key, not file handle. Once a profile object is confirmed in S3, metadata is written to a DynamoDB table that the streaming serving layer queries to assemble the available bitrates for a given title.

For cost management, Netflix applies lifecycle transitions aggressively. Content older than 90 days without significant viewing audience moves from Standard to Glacier Instant Retrieval — the instant retrieval tier matters here because recommendation algorithms still query metadata for content that is not actively streamed. Popular titles stay in Standard or Standard-IA based on real-time viewership metrics. The cross-region replication is not for disaster recovery in the traditional sense — it is tuned for regional latency. Popular titles in the US East region are replicated to US West, EU Central, and Asia Pacific buckets so playback starts faster for international users without origin fetches.

Netflix also uses S3 Object Lock in COMPLIANCE mode for regulatory content that must be preserved unmodified for specified windows. Combined with versioning, this lets them meet legal hold requirements without sacrificing the ability to add new versions when content rights are renegotiated.

Case Study 2: Airbnb — Photo Storage with S3

Airbnb stores millions of listing photos as objects in S3. Each photo is uploaded via a backend service that:

Assigns a unique key (e.g., listings/{listing_id}/photos/{photo_id}.jpg)
Generates multiple thumbnails stored as separate objects under the same listing prefix
Applies storage classes to thumbnails based on age (older thumbnails move to Infrequent Access)

The key structure follows a predictable hierarchy: listings/{listing_id}/photos/{photo_id}.jpg. The listing_id prefix enables efficient directory-style listing when Airbnb needs to retrieve all photos for a specific listing — a common operation when rendering a listing detail page. With millions of listings and an average of 5-10 photos per listing, the bucket holds tens of millions of objects under a relatively shallow prefix tree.

When a host uploads a photo, the backend service assigns a UUID as the photo_id and stores the original at full resolution. Simultaneously, a background worker generates derived thumbnails — typically at 4 sizes: thumbnail (100x100), small (256x256), medium (720x720), and large (1920x1080). Each thumbnail is a separate object with a key like listings/{listing_id}/photos/{photo_id}_large.jpg. The original object never changes after upload, which makes S3 versioning largely unnecessary for listing photos — once a photo is published it is immutable.

Storage class transitions are age-based and tiered. New photos live in Standard for the first 30 days, matching the window when a listing is most likely to receive traffic. After 30 days, originals and large thumbnails transition to Standard-IA since listing views drop significantly after the initial booking window. The smallest thumbnails (_thumbnail) actually remain in Standard because the per-object retrieval cost for Standard-IA exceeds the storage savings for such small files — a subtle but important optimization that most teams miss.

Airbnb uses S3 Select in pipeline jobs that run periodic audits on photo metadata. S3 Select reads the contents of CSV, JSON, or Parquet objects rather than object headers, so an audit like this has to run against metadata listings stored in S3 (for example, records of each photo's ContentType and ContentLength) instead of the photos themselves. By filtering those listings in place, the data engineering team identifies malformed uploads (photos with zero bytes, wrong MIME types, or unusually large files that indicate encoding errors) without pulling the full dataset across the network. This kind of metadata auditing at scale is a practical example of S3 Select paying for itself in reduced transfer costs.

Case Study 3: GitHub — Git Object Storage

GitHub stores Git repository objects (blobs, trees, commits, tags) as content-addressable objects in S3. Each object is stored under its SHA-1 hash as the key, enabling efficient deduplication across millions of repositories.

Git’s storage model is fundamentally different from most object storage use cases. When you commit a file, Git computes the SHA-1 hash of its content, stores the compressed blob under that hash, and references it by that same hash in the tree object. Two repositories with identical package.json files — even across completely unrelated projects — store exactly one blob object in S3. GitHub’s S3-backed storage turns this into a site-wide deduplication layer: across tens of millions of repositories, the storage efficiency from content-addressable deduplication is substantial.

The key structure uses the first two characters of the hash as a prefix: objects/{first_two_chars}/{full_hash}. For example, a blob with hash 9daeafb9864cf43088d93adf3d4ea4615b61eb1f lives at objects/9d/9daeafb9864cf43088d93adf3d4ea4615b61eb1f. This two-character prefix is not a directory (S3 has no directories), but it distributes objects across many S3 prefix partitions, avoiding the hot-prefix throttling problem that plagues buckets with sequential naming. The prefix split is deliberate engineering, not an artifact of a filesystem mental model.
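
Content-addressable keys of this shape can be sketched in a few lines. `git_blob_hash` follows Git's documented blob hashing (SHA-1 over a `blob <size>\0` header plus the content); `object_key` mirrors the key layout described above and is an illustration, not GitHub's actual code.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """SHA-1 of a Git blob: the header 'blob <size>\\0' followed by the content."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()

def object_key(object_hash: str) -> str:
    """Map a hash to objects/{first_two_chars}/{full_hash} (hypothetical layout helper)."""
    return f"objects/{object_hash[:2]}/{object_hash}"

key = object_key(git_blob_hash(b"hello world\n"))
print(key)
```

Because the key is derived purely from the content, identical files always map to the same key, which is what makes deduplication automatic.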

GitHub does not use S3 versioning for repository objects. Repository data is already immutable at the object level — a blob with a given hash never changes, and Git’s reference model (branches, tags) provides version-like semantics through the commit graph. Overwriting a blob is not a Git operation; it would require rewriting repository history, which GitHub prevents at the application layer. This design choice eliminates the storage cost of versioning across billions of objects.

For cost management, GitHub uses S3 Intelligent-Tiering for blob objects. Access patterns for repository objects are highly unpredictable: a popular open-source library might receive thousands of clone requests per minute for weeks, then go months without a single fetch. Intelligent-Tiering eliminates the need to manually tune storage classes for this variance. The 30-day monitoring window before objects move to the infrequent-access tier aligns with the typical access pattern of active repositories.

GitHub also uses S3 Object Lock for legal hold on repositories under investigation or subject to data residency requirements. Because repository objects are immutable and the commit graph provides referential integrity, Object Lock in COMPLIANCE mode is a natural fit — the repository can continue to accept new commits, but historical objects cannot be purged during the hold period.

Quick Recap Checklist

Use this checklist to validate your object storage setup:

Interview Questions

1. Explain the difference between object storage and block storage. When would you choose one over the other?

Block storage works like a traditional hard drive: you get raw disks, you partition them, you format them with a filesystem. Object storage throws all that away: data goes in as objects identified by a string key, with metadata attached. There are no blocks, no directories, no in-place modification. Pick block storage when you need to run a database or mount virtual disks — situations that demand low latency and partial file changes. Pick object storage when you are storing things that are written once and read often: images, videos, log files, backups, archives.

2. How does S3 achieve 11 nines of durability?

S3 stores every object redundantly across multiple Availability Zones (the One Zone storage classes are the exception) and regularly verifies integrity with checksums. If a disk fails, AWS restores the lost redundancy without you noticing. The math behind 11 nines: a durability of 99.999999999% per year means that if you store 10 million objects, you can expect to lose one of them about once every 10,000 years. That is the redundancy working: not magic, just engineering.
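The arithmetic behind that figure is easy to check. The sketch below treats 11 nines as an annual loss probability of 1e-11 per object, which is a simplification of how the durability target is usually read, and the function names are illustrative.

```python
# Assumption: "11 nines" is read as a 1e-11 chance of losing a given
# object in a given year.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_annual_losses(object_count: int) -> float:
    """Expected number of objects lost per year."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_between_losses(object_count: int) -> float:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


print(round(years_between_losses(10_000_000)))  # about 10000 years
```

The same function shows why scale matters: at 10 billion objects the expected interval between losses drops to roughly ten years, which is one reason to keep replication or backups for critical data.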

3. What is the difference between S3 storage classes? When would you use Glacier vs Standard-IA?

Standard is always-on, always-fast, and always expensive. Standard-IA is cheaper storage but hits you with retrieval fees — good for data you access monthly. Glacier is the archive tier: retrieval times range from milliseconds (Instant) to half a day (Deep Archive), and there are 90-180 day minimums before you can delete without penalty. The short version: use Glacier when you need to keep something for years and rarely touch it. Use Standard-IA when monthly access is likely but you want to save on storage costs.

4. Explain multipart upload. Why is it important for large files?

Multipart upload breaks objects into up to 10,000 parts and uploads each independently. The practical wins: if your connection drops mid-upload, you pick up where you left off instead of starting over. Upload speeds scale with parallelism — more parts simultaneously means more throughput. It also lifts the 5GB single-upload ceiling entirely (5TB is the max object size). And your memory footprint stays small since you only hold one part in memory at a time.
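The 10,000-part limit means the part size has to grow with the object. Here is a minimal sketch of that planning step. The 5 MiB minimum part size and the binary reading of "5TB" are assumptions layered on the limits quoted above, and `plan_parts` is a hypothetical helper, not an SDK function.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000                  # limit quoted above
MIN_PART_SIZE = 5 * MIB             # assumed minimum for every part except the last
MAX_OBJECT_SIZE = 5 * 1024 ** 4     # "5TB" from the text, taken here as 5 TiB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) that fits within the part limit."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size is outside the supported range")
    # Smallest part size that still fits the object into MAX_PARTS parts.
    smallest_allowed = ceil_div(object_size, MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART_SIZE, smallest_allowed)
    return part_size, ceil_div(object_size, part_size)


print(plan_parts(100 * MIB))        # 8 MiB parts, 13 parts
print(plan_parts(MAX_OBJECT_SIZE))  # parts grow to roughly 525 MiB
```

Each part can then be uploaded independently and retried on its own, which is where the resumability and parallelism come from.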

5. How does S3 versioning protect against accidental deletion?

When you delete an object with versioning on, S3 does not actually delete it — it drops a delete marker over the current version. The previous version is still there, untouched. You can list versions, pull any historical version back, or permanently delete a specific version when you are ready. Add MFA delete and even admins cannot force-permanently-delete without a code from your authenticator app.
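Undoing an accidental delete then comes down to removing the delete marker. A minimal sketch, assuming a versioned bucket and a boto3 S3 client passed in by the caller; `remove_delete_marker` is a hypothetical helper, and pagination of the version listing is left out for brevity.

```python
def remove_delete_marker(s3_client, bucket: str, key: str) -> bool:
    """Restore a 'deleted' object by deleting its current delete marker.

    Returns True if a marker was removed, False if the key had none.
    """
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in response.get("DeleteMarkers", []):
        # Prefix matching can return other keys, so compare exactly.
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting a specific version ID removes the marker itself,
            # which makes the previous version current again.
            s3_client.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return True
    return False


# Usage (requires boto3 and credentials):
#   import boto3
#   remove_delete_marker(boto3.client("s3"), "my-bucket", "reports/q1.csv")
```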

6. What is a pre-signed URL and when would you use one?

A pre-signed URL carries a signature computed from your IAM credentials, plus an expiration time, letting anyone with the URL access a specific object or perform a specific operation. The secret key itself never appears in the URL. You see this when you upload profile pictures: the app generates a pre-signed URL, your browser uploads directly to S3, and the AWS credentials never reach the browser. Other uses: temporary download links for private files, or one-time upload links for user submissions. Default lifetime is one hour (the boto3 and AWS CLI default); the max is seven days, and less if the URL was signed with temporary credentials that expire sooner.
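A minimal sketch of the upload case, assuming a boto3 S3 client is passed in. `presigned_upload_url` is a hypothetical wrapper; the underlying call is boto3's `generate_presigned_url`, which signs locally without contacting S3.

```python
SEVEN_DAYS = 7 * 24 * 3600  # maximum lifetime mentioned above, in seconds


def presigned_upload_url(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a URL that lets the holder PUT exactly one object."""
    if not 0 < expires_in <= SEVEN_DAYS:
        raise ValueError("expires_in must be between 1 second and 7 days")
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Usage (requires boto3 and credentials):
#   import boto3
#   url = presigned_upload_url(boto3.client("s3"), "my-bucket", "avatars/jane.jpg", 900)
#   The browser then sends an HTTP PUT with the file body to `url`.
```

Keeping the lifetime short limits the damage if the URL leaks, since anyone holding it can use it until it expires.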

7. Explain S3 Object Lock. What is the difference between COMPLIANCE and GOVERNANCE retention modes?

S3 Object Lock is WORM storage for S3: once written, an object version cannot be deleted or overwritten until your retention period expires. COMPLIANCE mode is the strict one: not even the account root user or AWS support can remove it before the date, no matter what. GOVERNANCE is softer: users with the special s3:BypassGovernanceRetention permission can remove it early. Use COMPLIANCE when regulators are watching. Use GOVERNANCE for internal policies where you want protection but do not want to accidentally lock yourself out permanently.

8. What is cross-region replication (CRR) and what are its limitations?

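CRR automatically copies new objects from a bucket to a bucket in a different region. The upside is obvious: if your primary region goes down, you fail over. It also serves global users from a nearby replica. The catches: versioning must be enabled on both buckets, replication is asynchronous so your RPO is not zero, objects that existed before the rule was created are not replicated automatically, cross-region transfer incurs data transfer fees, and KMS-encrypted objects need a KMS key and permissions in the destination region. Replicating within a single region is a separate feature, Same-Region Replication (SRR).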

9. How would you design a cost optimization strategy for S3 at scale?

Layer your approach. First, lifecycle policies — set them on day one, not after you have 500TB of Standard storage you should have moved to IA. Second, Intelligent Tiering for workloads where you genuinely do not know access patterns. Third, do not version everything — version only what matters, because every version costs storage. Fourth, clean up failed multipart uploads before they burn money. Finally, watch request costs separately: storage dominates for cold data, requests dominate for hot data.

10. What is S3 Select and when would you use it?

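S3 Select lets you run SQL queries against objects stored as CSV, JSON, or Parquet without downloading the whole file. The selling point is cost and speed: you pay for the data scanned and the (usually much smaller) data returned, instead of transferring the whole object and filtering it client-side. For a 10GB CSV where you only need one column, that is a meaningful difference. Use it for filtering logs, extracting metrics from Parquet analytics files, or pulling specific records from structured datasets. Note that AWS closed S3 Select to new customers in 2024, so check availability in your account before designing around it.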

11. How does S3's consistency model affect your application design?

S3 used to offer only eventual consistency for overwrite and delete operations, so a GET issued right after an overwrite could return the old data for a brief window. Since December 2020, S3 provides strong read-after-write consistency for PUT, DELETE, and LIST operations in all regions: once a write succeeds, every subsequent read returns the new data. What remains your problem is concurrency. S3 does not lock objects for concurrent writers, so when two clients write the same key at the same time, the last write wins. Caches in front of S3, such as CloudFront, can also keep serving stale copies until they expire or are invalidated. In practice: do not assume ordering between distributed writers, coordinate updates to shared keys in your application, and use versioning if you need to track state transitions.

12. What are S3 Access Points and when would you use them?

Access Points are named network endpoints attached to buckets, each with its own IAM policy. Rather than one bucket policy governing all access, you create scoped access points for different applications or teams. Example: a data pipeline access point allows only prefixed writes, while an analytics access point allows reads from a read-only prefix. Access Points also support VPC-only access, closing down internet routes entirely for sensitive workloads. Use them when you need to segment access without complex bucket policy logic, or when you want to apply different network constraints to different consumers of the same bucket.

13. What is the difference between SSE-S3, SSE-KMS, and CSE-KMS encryption in S3?

SSE-S3 uses an AWS-managed key — encryption happens transparently, you pay nothing extra, and key rotation is automatic. SSE-KMS uses your own keys stored in AWS KMS — you control key policies and rotation, and you pay per API call. CSE-KMS (client-side encryption) encrypts data before it leaves your machine using KMS keys, so AWS never sees plaintext — highest control but operational overhead. Compliance requirements often mandate SSE-KMS for audit trails of key usage. For most workloads, SSE-S3 is sufficient. Use SSE-KMS when you need explicit key access logging or compliance controls. Use CSE-KMS when data must be encrypted end-to-end before touching any cloud infrastructure.

14. How does S3 Intelligent Tiering work and when does it make sense?

Intelligent Tiering monitors access patterns and moves objects between access tiers automatically: Frequent Access, Infrequent Access, and Archive Instant Access. New objects start in Frequent Access. If an object is not accessed for 30 consecutive days, it moves to Infrequent Access; after 90 days without access it moves to Archive Instant Access, and any access moves it back to Frequent Access. The key selling point is you do not have to predict access patterns: the system adapts, and there are no retrieval fees. The cost: a small monthly monitoring fee per object on top of the storage price. It makes sense when access patterns are unpredictable or when you have a large volume of objects with unknown or changing access patterns. It does not make sense for very cold data that sits for years, where Glacier is cheaper for truly archival data.

15. What are the limits of S3 and how do you design around them?

S3 limits you to 5TB max object size, 5GB for single PUT (use multipart above that), 10,000 parts per multipart upload, and request rates that apply per prefix rather than per bucket. The most common hot prefix problem: if all your traffic hits a single prefix, you hit throttling (503 Slow Down responses) even if overall request capacity is fine. Design key schemas to spread load: avoid funnelling everything through sequential names like `file-0001`, `file-0002` under one prefix, and use hashed prefixes like `uploads/a1b2/` instead. For read-heavy traffic, CloudFront absorbs repeat requests before they reach S3; S3 Transfer Acceleration speeds up long-distance transfers but does not raise request limits. S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, with no limit on the number of prefixes, so design keys intentionally to take advantage of partition distribution.
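One way to get prefixes like `uploads/a1b2/` without losing the original name is to derive the prefix from a hash of the key. A minimal sketch; `spread_key` and its parameters are illustrative, not an AWS convention.

```python
import hashlib


def spread_key(natural_key: str, prefix_chars: int = 4, root: str = "uploads") -> str:
    """Prepend a short hash-derived prefix so keys spread across partitions."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return f"{root}/{digest[:prefix_chars]}/{natural_key}"


# Sequential names end up under unrelated prefixes.
for name in ("file-0001", "file-0002", "file-0003"):
    print(spread_key(name))
```

The prefix is deterministic, so a reader can recompute the full key from the natural name. The trade-off is that listing objects in their natural order now requires listing across all prefixes.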

16. What is S3 Batch Operations and when would you use it?

Batch Operations performs repetitive S3 operations at scale: think millions of objects. You provide a manifest CSV or inventory report, specify the operation (copy, replace tag set, change storage class, restore from Glacier), and S3 executes it asynchronously. Common uses: migration projects, compliance remediation to tag or encrypt existing objects, bulk lifecycle transitions. You pay per job and per object processed, not per GB, making it cost-effective for large-scale metadata changes. The key difference from lifecycle policies: a batch job runs once, on demand, against exactly the objects in your manifest and supports many operation types; lifecycle rules are evaluated continuously in the background against both existing and new objects, but only perform transitions and expirations.

17. How do you handle S3 costs at scale? Walk through a cost optimization approach.

Start with lifecycle policies immediately when creating buckets — do not let Standard accumulate for months before optimizing. Separate hot and cold data into different prefixes so lifecycle transitions are targeted, not blanket. Use Intelligent Tiering for unpredictable workloads. Enable versioning selectively — version everything and your storage bill doubles or triples. Set up budget alerts before you get surprised. Abort incomplete multipart uploads on a 7-day lifecycle. Finally, monitor request pricing separately: a hot workload with millions of GETs per day adds meaningful cost that storage optimization does not address.
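Two of those steps, the 7-day multipart cleanup and a targeted transition for a cold prefix, fit into one lifecycle configuration. A minimal sketch; the rule IDs, the `archive/` prefix in the usage note, and the 30-day threshold are illustrative choices.

```python
def lifecycle_rules(cold_prefix: str, ia_after_days: int = 30, abort_after_days: int = 7) -> dict:
    """Build a LifecycleConfiguration for put_bucket_lifecycle_configuration."""
    return {
        "Rules": [
            {
                # Applies to the whole bucket: clean up abandoned uploads.
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_after_days
                },
            },
            {
                # Applies only to the cold prefix: move objects to Standard-IA.
                "ID": "cold-data-to-standard-ia",
                "Status": "Enabled",
                "Filter": {"Prefix": cold_prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"}
                ],
            },
        ]
    }


# Usage (requires boto3 and credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_rules("archive/"))
```

Note that this call replaces the bucket's whole lifecycle configuration, so existing rules need to be included in the same document.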

18. Explain the relationship between S3 and CloudFront for content delivery.

CloudFront is a CDN that caches your S3 objects at edge locations globally. Users download from the nearest edge rather than your S3 origin, reducing latency and origin load. You point CloudFront at an S3 bucket as the origin, or better yet, use Origin Access Control (OAC, the successor to the older Origin Access Identity) so the bucket can only be accessed through CloudFront, with no direct S3 access needed. CloudFront also terminates TLS at the edge, offloads your origin, and can compress objects on the fly. Cache invalidation is manual or via versioned keys. For static assets, this is usually a significant win. For dynamic content, caching is trickier: you need to set appropriate cache headers on your objects.

19. What is the S3 Inventory feature and what is it useful for?

S3 Inventory provides scheduled CSV, ORC, or Parquet reports of your bucket contents: every object, its metadata, encryption status, version ID, and more. Useful for compliance audits, migration planning, and identifying objects that should be in different storage classes. You configure it per-bucket with a daily or weekly schedule, output to a destination bucket, and process the files with your own tooling. It solves the problem of listing millions of objects through the API, which is slow and expensive. Combine it with Athena to query the inventory directly with SQL.

20. How would you audit an S3 bucket for security and compliance?

Layer your audit. First, enable server access logging and store logs in a separate bucket with versioning — you need them if an incident investigation requires pristine log history. Second, enable AWS Config or use S3 Storage Lens for compliance dashboards. Third, review bucket policies: use policy simulation to verify they grant only intended access. Fourth, check for public access block settings — this should be enabled at the account level. Fifth, verify encryption is applied via bucket default or object-level SSE. Sixth, audit IAM roles with s3:* permissions — least privilege means roles should target specific buckets and actions. Finally, use S3 Inventory plus Athena to identify unencrypted objects or objects without expected tags across your entire storage footprint.
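The public access block check is easy to automate per bucket. A minimal sketch, assuming a boto3 S3 client is passed in; `public_access_fully_blocked` is a hypothetical helper. With a real client, a bucket that has no configuration at all raises a `ClientError`, which this sketch does not handle.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def public_access_fully_blocked(s3_client, bucket: str) -> bool:
    """Return True only if all four public access block settings are on."""
    response = s3_client.get_public_access_block(Bucket=bucket)
    config = response["PublicAccessBlockConfiguration"]
    return all(config.get(flag, False) for flag in REQUIRED_FLAGS)


# Usage (requires boto3 and credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   for b in s3.list_buckets()["Buckets"]:
#       print(b["Name"], public_access_fully_blocked(s3, b["Name"]))
```

This covers the bucket-level setting only. The account-level setting recommended above is managed through a separate API and should be checked as well.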

Conclusion

Key takeaways from Object Storage:

Object storage organises data as objects (data + metadata + unique ID) rather than files in a hierarchy
The S3 API is the dominant interface for object storage: MinIO and other S3-compatible services implement it, while GCS and Azure Blob Storage offer the same model through their own APIs
Buckets are top-level containers; keys are the full object path within a bucket
Storage classes (S3 Standard, IA, Glacier, etc.) enable cost optimisation based on access patterns
Lifecycle policies automate object transitions between storage classes and expiration
Versioning preserves previous versions of objects; MFA Delete prevents accidental or malicious deletion
Multipart upload allows parallel upload of large objects in parts
S3 Select enables querying objects directly with SQL-like filters — no full download needed
Object Lock (WORM) enforces immutability for compliance and regulatory requirements
Cross-Region Replication (CRR) provides disaster recovery and lower-latency access for users in other regions
Object storage is not suitable for complex relational queries, ultra-low latency, or fine-grained ACID transactions
