Docker Fundamentals: From Images to Production Containers
Master Docker containers, images, Dockerfiles, docker-compose, volumes, and networking. A comprehensive guide for developers getting started with containerization.
Docker has reshaped how we build, ship, and run applications. If you are still manually installing dependencies and fighting “works on my machine” problems, you are spending time on problems that containers already solve. Containerization is not a passing trend; it is the standard deployment model for modern software.
This guide walks you through everything you need to go from Docker beginner to someone who can containerize a real application and run it reliably.
Introduction
Docker is a platform for packaging applications into self-contained units called containers. A container bundles your code, runtime, system tools, libraries, and settings — everything the application needs to run — independent of the host system.
Containers share the host kernel and do not emulate hardware. This makes them lightweight and fast to start. A VM needs to boot an entire operating system; a container starts in seconds.
Docker uses a client-server architecture. The Docker client (the CLI you interact with) sends requests over a REST API to the Docker daemon, which does the actual work of building, running, and distributing containers.
Core Concepts
An image is a read-only template with instructions for creating a container. Think of it as a snapshot of a filesystem with some metadata about how to run the process.
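Images are built in layers. Instructions that change the filesystem (RUN, COPY, and ADD) each create a new layer; the other instructions only record metadata. When you change an instruction or the files it copies, that step and every step after it rebuild, while the earlier steps come from cache. This caching mechanism is what makes Docker builds fast after the first run.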
Here is a simple Dockerfile for a Node.js application:
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
USER node
EXPOSE 3000
CMD ["node", "server.js"]
The FROM instruction sets the base image. Using Alpine variants keeps your images small: the Alpine base layer is only around 5MB compressed, and node:20-alpine is a fraction of the size of the full Debian-based node:20 image, which is roughly 1GB on disk.
Docker images use a naming convention: registry/repository:tag. If you do not specify a tag, Docker defaults to latest.
docker pull nginx:1.25-alpine
docker pull nginx:1.25
docker pull nginx
The three commands above pull different images. The first pins version 1.25 of the Alpine variant. The second pulls version 1.25 of the default Debian-based variant. The third omits the tag, so Docker pulls latest.
For production, always pin exact versions. latest is a moving target that will bite you when it changes unexpectedly.
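To make the tag rule concrete, here is a small shell sketch that extracts the tag from an image reference and falls back to latest when none is given. The helper names image_tag and is_pinned are illustrative and not part of the Docker CLI, and digest references (name@sha256:...) are deliberately left out of scope.

```shell
# image_tag REF: print the tag of an image reference, or "latest" when
# no tag is present (mirroring the default described above).
# Assumption: digest references (name@sha256:...) are not handled.
image_tag() {
    ref=$1
    last=${ref##*/}    # last path component, e.g. "nginx:1.25-alpine"
    case $last in
        *:*) printf '%s\n' "${last##*:}" ;;
        *)   printf '%s\n' "latest" ;;
    esac
}

# is_pinned REF: succeed only when REF carries an explicit tag other than "latest".
is_pinned() {
    [ "$(image_tag "$1")" != "latest" ]
}
```

Looking only at the last path component keeps a registry port such as localhost:5000 from being mistaken for a tag. A CI step could call is_pinned on each image name and fail the build when it returns non-zero.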
A container is a runnable instance of an image. You can create, start, stop, and delete containers. Each container is isolated from other containers and the host system, but they can communicate through defined networking channels.
docker run nginx:latest
This pulls the nginx image if not present locally, creates a container from it, and starts it. By default, nginx runs in the foreground and binds to port 80 inside the container.
To run it in detached mode with port mapping:
docker run -d -p 8080:80 --name my-nginx nginx:latest
The -d flag runs the container detached (in the background). -p 8080:80 maps host port 8080 to container port 80. The --name flag gives your container a memorable name instead of a random one.
docker ps # List running containers
docker ps -a # List all containers (including stopped)
docker stop my-nginx # Stop a running container
docker start my-nginx # Start a stopped container
docker restart my-nginx # Stop then start
docker rm my-nginx # Remove a container (must be stopped)
docker logs -f my-nginx # Follow logs in real-time
docker exec -it my-nginx sh # Get shell inside running container
The docker exec command is indispensable for debugging. Jump into a running container and inspect its filesystem, check environment variables, or figure out why something is not working.
Building Images with Dockerfiles
A Dockerfile is a script with instructions for building your custom image. Each instruction creates a new layer, and Docker caches layers when possible to speed up rebuilds.
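The way one changed instruction invalidates everything after it can be illustrated with a toy model in shell. The sketch below gives each instruction a key derived from its parent's key plus its own text, so editing one line changes every key that follows. This is a simplified illustration only: Docker's real cache also considers the contents of copied files and build arguments, and layer_keys is a made-up helper.

```shell
# layer_keys: read Dockerfile instructions from stdin (one per line) and
# print a chained key for each. A key depends on the instruction text and
# on the key of the previous step, so a change cascades downward.
layer_keys() {
    parent=""
    while IFS= read -r instruction; do
        parent=$(printf '%s\n%s' "$parent" "$instruction" | cksum | cut -d' ' -f1)
        printf '%s  %s\n' "$parent" "$instruction"
    done
}

# Print the key chain for a four-step build.
printf '%s\n' 'FROM node:20-alpine' 'COPY package*.json ./' 'RUN npm ci' 'COPY . .' | layer_keys
```

Running it twice with a different third instruction leaves the first two keys unchanged and alters the last two, which is why rarely changing steps belong near the top of a Dockerfile.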
Multi-stage builds let you use multiple FROM statements to separate build-time and runtime environments. This keeps production images lean by excluding build tools.
The final image only contains the production-ready artifact. The build dependencies never make it into the runtime image.
Docker Compose for Multi-Container Applications
Most real applications need multiple services: a web server, a database, a cache layer. Docker Compose manages these multi-container setups through a YAML configuration file.
docker-compose.yml Structure
version: "3.8"
services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://db:5432/app
    depends_on:
      - db
      - redis
    restart: unless-stopped
  db:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: user
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
volumes:
  postgres_data:
  redis_data:
secrets:
  db_password:
    file: ./secrets/db_password.txt
Compose Commands
docker-compose up -d # Start all services
docker-compose down # Stop and remove containers
docker-compose down -v # Also remove volumes
docker-compose logs -f web # Follow logs for web service
docker-compose ps # List running services
docker-compose exec db psql -U user app # Run psql in db container as the configured user
docker-compose restart web # Restart web service
The depends_on directive ensures services start in the right order. Note that it only waits for the container to start, not for the application inside to be ready. For databases and similar services, you often need a healthcheck or a startup script that waits for dependencies.
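A startup script that waits for a dependency can be as small as a retry loop. The following is a minimal sketch, assuming a POSIX shell is available in the image; wait_for is a hypothetical helper name, and the readiness command you pass to it is up to you.

```shell
# wait_for ATTEMPTS DELAY CMD...: run CMD until it succeeds or the
# attempts are used up. Returns 0 on success and 1 on giving up.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Pause between attempts, but not after the final one
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

In an entrypoint this might look like `wait_for 30 1 pg_isready -h db -p 5432 && exec node server.js`, which assumes the PostgreSQL client tools are installed in the web image. The point is that the loop checks application readiness, not merely that the container has started.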
Data Persistence with Volumes
Containers are ephemeral by default. Any data written inside a container disappears when the container is removed. Volumes solve this by providing persistent storage that exists independent of containers.
Volume Types
Named volumes are the simplest approach:
services:
  db:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
Docker creates the volume if it does not exist. The data persists across container restarts and removals.
Bind mounts map a host directory into the container:
services:
  app:
    image: node:20-alpine
    volumes:
      - ./src:/app/src:ro
The :ro suffix makes the mount read-only. Bind mounts are useful for development, letting you edit code on your host and see changes immediately inside the container.
tmpfs mounts store data in memory only, useful for sensitive data you do not want persisted:
services:
  cache:
    image: redis:7-alpine
    tmpfs:
      - /data
Container Networking
Docker provides several networking modes. Understanding them helps you design proper communication between services.
Network Drivers
| Driver | Use Case |
|---|---|
| bridge | Default for standalone containers |
| host | Remove network isolation, use host network directly |
| overlay | Connect containers across multiple Docker hosts |
| macvlan | Assign MAC address to containers for legacy applications |
| none | Disable all networking |
Custom Bridge Networks
Creating a custom bridge network enables automatic DNS resolution between containers by name:
version: "3.8"
services:
  web:
    build: .
    networks:
      - frontend
  api:
    build: ./api
    networks:
      - frontend
      - backend
  db:
    image: postgres:15-alpine
    networks:
      - backend
    volumes:
      - db_data:/var/lib/postgresql/data
networks:
  frontend:
  backend:
# Named volumes used by a service must be declared at the top level
volumes:
  db_data:
The web service can reach api by its service name, but cannot reach db directly because they are on separate networks. This network segmentation adds security by limiting what services can communicate.
Service Discovery
Within a custom bridge network, containers discover each other by the service name defined in compose. If you have a service named postgres, other containers can reach it at postgres:5432.
Docker embeds a DNS resolver that handles this resolution automatically. You do not need to hardcode IP addresses; they can change as containers restart.
Environment Variables and Configuration
Environment variables are the primary way to configure containerized applications at runtime. Docker provides several mechanisms for setting them.
Setting Environment Variables
services:
  web:
    environment:
      - NODE_ENV=production
      - API_KEY=${API_KEY}
      - DEBUG=false
You can also use an .env file with Docker Compose:
# .env file
NODE_ENV=production
API_KEY=your-secret-key
# docker-compose.yml (service fragment)
    environment:
      - NODE_ENV=${NODE_ENV}
      - API_KEY=${API_KEY}
For secrets in production, use Docker secrets or an external secrets manager. Never commit secrets to version control, even in private repositories.
Building for Production
A production Docker workflow differs from development in several ways.
Image Optimization Checklist
- Use Alpine-based images to reduce attack surface and pull times.
- Pin exact versions for all images.
- Use multi-stage builds to keep build tools and intermediate files out of production images.
- Run containers as non-root users.
- Remove unnecessary tools and shells from production images.
A hardened production Dockerfile might look like:
# Build stage
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
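# Build, then drop devDependencies so the node_modules copied below stay lean
RUN npm run build && npm prune --omit=dev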
# Production stage
FROM node:20-alpine
# Add labels for metadata
LABEL maintainer="dev@example.com"
LABEL version="1.0.0"
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY package*.json ./
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
The HEALTHCHECK instruction tells Docker how to verify the container is healthy. Docker records the result in the container's status, where monitoring tools can read it and where orchestrators such as Docker Swarm use it to send traffic only to healthy instances.
Container Health and Monitoring
Containers can fail in several ways: the application can crash, hang, or run out of memory. Docker provides restart policies to handle these scenarios automatically.
Restart Policies
services:
  web:
    image: nginx:latest
    restart: unless-stopped
  worker:
    image: my-worker:latest
    restart: on-failure
| Policy | Behavior |
|---|---|
| no | Do not restart (default) |
| on-failure | Restart only if container exits with non-zero code |
| unless-stopped | Restart unless explicitly stopped |
| always | Always restart, including after Docker daemon restart |
For production services, unless-stopped or always is usually appropriate. Keep in mind that automatic restarts can mask a bug that makes the service crash repeatedly, so decide whether you want that behavior and watch the restart count.
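The table can be read as a simple decision rule. The sketch below encodes it for the case where a container exits on its own; restarts_on_exit is an illustrative helper, and manual stops and daemon restarts are deliberately not modelled.

```shell
# restarts_on_exit POLICY EXIT_CODE
# Succeeds (returns 0) when a container that exited by itself with
# EXIT_CODE would be restarted under POLICY, following the table above.
restarts_on_exit() {
    case $1 in
        no)                    return 1 ;;
        on-failure)            [ "$2" -ne 0 ] ;;
        always|unless-stopped) return 0 ;;
        *) echo "unknown policy: $1" >&2; return 2 ;;
    esac
}
```

For example, `restarts_on_exit on-failure 0` fails because a clean exit is not treated as a failure, which makes on-failure a reasonable fit for workers that are expected to finish.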
Containerized applications need CI/CD pipelines that handle building, testing, and pushing images to a registry. The pattern here covers the build stage and multi-stage optimizations.
BuildKit enables parallel layer processing for faster builds. You can cache npm dependencies between runs:
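# BuildKit is the default builder in current Docker releases; on older versions enable it with:
# DOCKER_BUILDKIT=1 docker build .
# Use a cache mount so the npm download cache is reused across builds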
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
RUN npm run build
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
CMD ["node", "dist/server.js"]
Production deployments typically use orchestration platforms beyond Docker Compose. These tools handle container lifecycle, scaling across nodes, and service distribution.
Docker containers work well in many scenarios but are not always the right tool.
When to Use Docker
Use Docker when:
- Packaging applications for consistent deployment across environments
- Microservices architectures where services need isolation
- CI/CD pipelines requiring reproducible build environments
- Scaling applications horizontally with container orchestration
- Running multiple versions of dependencies side by side
- Development environments needing parity with production
Use Docker Compose when:
- Local development with multiple coordinated services
- Running integration tests in isolated containers
- Small-scale deployments without Kubernetes
- Demonstrating application stacks to stakeholders
When Not to Use Docker
Consider alternatives when:
- Applications requiring real-time kernel access or hardware passthrough
- Desktop applications with complex GUI requirements (native packaging may be better)
- Very small scripts that have minimal dependencies
- Applications with extreme performance requirements where container overhead matters
- Windows-specific workloads (Docker on Windows has more limitations)
Containerization Decision Tree
graph TD
A[Need to deploy application?] --> B{Multiple environments?}
B -->|Yes| C[Use containers]
B -->|No| D{Scalability needed?}
D -->|Yes| C
D -->|No| E{Single host only?}
E -->|Yes| F[Consider Docker Compose]
E -->|No| C
C --> G[Use multi-stage builds]
C --> H[Configure health checks]
C --> I[Set resource limits]
Production Failure Scenarios
Containers fail in predictable ways. Understanding these helps you design resilient systems.
| Failure | Impact | Mitigation |
|---|---|---|
| Application crash | Container exits with non-zero code | Implement restart policies, health checks, and logging |
| OOM kill | Container terminated, potential data loss | Set memory limits, monitor memory usage |
| Disk full | Container cannot write logs or data | Use log rotation, monitor disk usage, mount tmpfs for temp data |
| Network partition | Container cannot reach dependencies | Implement retry logic, circuit breakers, health checks |
| Image pull failure | Container cannot start, app unavailable | Use private registry, pre-pull images, pin exact versions |
| Port conflicts | Container fails to start | Configure port mapping carefully, use Docker Compose |
| Volume mount failure | Data inaccessible, potential crash | Verify volume paths exist, use named volumes |
| Dependency outage | Application cannot serve traffic | Implement graceful degradation, health checks |
Common Container Exit Codes
| Exit Code | Meaning | Resolution |
|---|---|---|
| 0 | Application exited successfully | Normal termination |
| 1 | Application exited with general error | Check application logs |
| 137 | SIGKILL (OOM or manual kill) | Increase memory limit, check for memory leaks |
| 139 | Segfault or SIGSEGV | Application bug, check core dump |
| 143 | SIGTERM (graceful shutdown) | Normal during restart or stop |
| 255 | Exit status out of range | Application error, check entrypoint |
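The codes above 128 in this table follow a shell convention: the exit code is 128 plus the number of the signal that ended the process. A small sketch of that arithmetic follows; explain_exit is an illustrative helper, not a Docker command.

```shell
# explain_exit CODE: describe a container exit code.
# Codes from 129 to 254 conventionally mean "killed by signal (CODE - 128)".
explain_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
        echo "killed by signal $((code - 128))"
    else
        echo "application error (exit $code)"
    fi
}

explain_exit 137    # prints: killed by signal 9
explain_exit 143    # prints: killed by signal 15
```

You can feed it the real value with `docker inspect --format '{{.State.ExitCode}}' my-nginx`. For code 137, `{{.State.OOMKilled}}` in the same output distinguishes an out-of-memory kill from a manual one.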
Common Pitfalls / Anti-Patterns
Image Building Pitfalls
Not using multi-stage builds
# Anti-pattern: Build artifacts in production image
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["node", "dist/server.js"]
# Better: Multi-stage build
FROM node:20 AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
Not cleaning up in the same layer
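# Anti-pattern: package lists stay in the layer, and without -y the install aborts in a non-interactive build
RUN apt-get update && apt-get install build-essential
# Better: Install non-interactively and clean up in the same layer
RUN apt-get update && apt-get install -y build-essential \
    && rm -rf /var/lib/apt/lists/*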
Copying too much
# Anti-pattern: Copies everything including .git, node_modules
COPY . .
# Better: Only copy necessary files
COPY package*.json ./
COPY src ./src
Container Execution Pitfalls
Running as root
# Anti-pattern: Running as root user
services:
  web:
    image: myapp:1.0.0
    user: root
# Better: Run as non-root
services:
  web:
    image: myapp:1.0.0
    user: "10001"
Not setting resource limits
# Anti-pattern: No limits means unbounded resource usage
services:
  web:
    image: myapp:1.0.0
# Better: Set appropriate limits
services:
  web:
    image: myapp:1.0.0
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'
Missing health checks
# Anti-pattern: No health check, Docker does not know if app is healthy
services:
  web:
    image: myapp:1.0.0
# Better: Define health check
services:
  web:
    image: myapp:1.0.0
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s
Networking Pitfalls
Exposing unnecessary ports
# Anti-pattern: Exposing debug ports
services:
  web:
    image: myapp:1.0.0
    ports:
      - "3000:3000"
      - "9229:9229" # Debug port exposed
# Better: Only expose needed ports
services:
  web:
    image: myapp:1.0.0
    ports:
      - "3000:3000"
Not using custom networks
# Anti-pattern: Everything on the implicit default network, database port published to the host
services:
  web:
    image: myapp:1.0.0
  db:
    image: postgres:15
    ports:
      - "5432:5432" # Unnecessary exposure
# Better: Custom network with proper isolation
services:
  web:
    image: myapp:1.0.0
    networks:
      - backend
  db:
    image: postgres:15
    networks:
      - backend
networks:
  backend:
Observability Checklist
Containerized applications need comprehensive monitoring to catch issues early.
Metrics to Collect
graph LR
A[Container Metrics] --> B[CPU Usage]
A --> C[Memory Usage]
A --> D[Network I/O]
A --> E[Block I/O]
F[Application Metrics] --> G[Request Rate]
F --> H[Error Rate]
F --> I[Latency]
F --> J[Active Connections]
Container-level metrics:
- CPU usage percentage vs limit
- Memory usage percentage vs limit
- Network bytes sent and received
- Block I/O read and write bytes
- Container restart count
Application-level metrics:
- Request throughput (requests per second)
- Error rate (4xx, 5xx responses)
- Response latency (p50, p95, p99)
- Active connections (database, Redis, HTTP)
- Queue depth for async processing
Logging Best Practices
graph TD
A[Container STDOUT STDERR] --> B[Log Driver]
B --> C[Centralized Logging]
C --> D[ELK Stack]
C --> E[Loki]
C --> F[CloudWatch]
G[Structured Logs] --> C
G --> H[JSON Format]
G --> I[Correlation ID]
- Use structured logging: JSON format enables easier parsing and querying
- Include correlation IDs: Trace requests across services
- Log to STDOUT/STDERR: Let Docker handle log routing, not files
- Implement log rotation: Prevent disk exhaustion
- Ship logs centrally: Aggregate logs from all containers
# Configure log rotation in daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
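With these settings each container keeps at most max-file files of max-size each, so the worst-case disk usage is easy to estimate. The following sketch does that arithmetic; log_budget_mb is a hypothetical helper, and the container count of 40 is an example value only.

```shell
# log_budget_mb MAX_SIZE_MB MAX_FILE CONTAINERS
# Worst-case disk space in MB used by json-file logs when every
# container has filled all of its rotated files.
log_budget_mb() {
    echo $(( $1 * $2 * $3 ))
}

log_budget_mb 10 3 1     # prints 30: one container with the settings above
log_budget_mb 10 3 40    # prints 1200: the same settings across 40 containers
```

Comparing that number against the free space on the Docker data directory is a quick sanity check before raising max-size.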
Alerts to Configure
Critical (immediate action):
- Container restart count > 5 in 10 minutes
- Memory usage > 90% of limit for > 5 minutes
- Container exit (unexpected termination)
- Health check failure for > 2 minutes
Warning (investigate soon):
- CPU usage > 80% of limit for > 10 minutes
- Disk usage > 80% on volume
- Restart count > 2 in 30 minutes
- Health check degradation
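The percentage thresholds above can be expressed as a small check. This is only a sketch of the comparison logic: usage_pct and alert_level are illustrative names, the "for more than N minutes" conditions are not modelled, and a real setup would express these rules in its monitoring system.

```shell
# usage_pct USED LIMIT: integer percentage of LIMIT consumed by USED.
usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# alert_level METRIC USED LIMIT: apply the thresholds listed above
# (memory above 90% is critical; cpu or disk above 80% is a warning).
alert_level() {
    pct=$(usage_pct "$2" "$3")
    case $1 in
        memory)   threshold=90; level=critical ;;
        cpu|disk) threshold=80; level=warning ;;
        *) echo "unknown metric: $1" >&2; return 2 ;;
    esac
    if [ "$pct" -gt "$threshold" ]; then
        echo "$level"
    else
        echo "ok"
    fi
}
```

For instance, a container using 480MB of a 512MB limit sits at 93 percent, so `alert_level memory 480 512` reports critical.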
Security Checklist
Container security requires defense in depth across multiple layers.
Image Security
Image selection:
- Use official images from trusted registries when possible
- Prefer Alpine or distroless images for smaller attack surface
- Never use the latest tag in production (pin exact versions)
- Scan images for vulnerabilities before deployment
# Scan image locally
trivy image myapp:1.0.0
# In CI/CD pipeline
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:1.0.0
Dockerfile hardening:
# Use specific version, not latest
FROM node:20-alpine3.18
# Work in a dedicated directory instead of the filesystem root
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup
# Copy files with correct ownership
COPY --chown=appuser:appgroup . .
# Switch to non-root user
USER appuser
# Set explicit exposure
EXPOSE 3000
# Use exec form for CMD (proper signal handling)
CMD ["node", "server.js"]
Runtime Security
graph LR
A[Runtime Security] --> B[Resource Limits]
A --> C[Capability Drop]
A --> D[No Privileged]
A --> E[Read-only FS]
B --> F[Memory Limit]
B --> G[CPU Limit]
C --> H[DROP ALL]
C --> I[Add specific]
Security options for docker run:
# Run with security hardening
docker run \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --memory=512m \
  --memory-swap=512m \
  --cpus=1.0 \
  --user=10001 \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  myapp:1.0.0
Security options for docker-compose.yml:
services:
  web:
    image: myapp:1.0.0
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
    mem_limit: 512m
    memswap_limit: 512m
    cpus: 1.0
    user: "10001"
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
Secret Management
Never:
- Store secrets in environment variables
- Commit secrets to Dockerfiles or docker-compose files
- Use secrets in build arguments (they get baked into image layers)
- Use Kubernetes ConfigMaps for sensitive data (they are meant for non-confidential configuration)
Always:
- Use Docker secrets for sensitive data in Compose
- Use external secrets managers (Vault, AWS Secrets Manager)
- Mount secrets as files at runtime (for example under /run/secrets) rather than passing them as environment variables
- Rotate secrets regularly
# docker-compose.yml with secrets
services:
  web:
    image: myapp:1.0.0
    secrets:
      - db_password
    environment:
      - DATABASE_PASSWORD_FILE=/run/secrets/db_password
secrets:
  db_password:
    file: ./secrets/db_password.txt
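On the application side, the variable ending in _FILE holds a path, not the secret itself, so something has to read the file. Here is a minimal sketch of that step for an entrypoint script; read_secret is a hypothetical helper, and it uses eval only to look up a variable by name, so it should be called with fixed names and never with user input.

```shell
# read_secret VAR_NAME: print the secret stored in the file that the
# variable VAR_NAME points to, e.g. DATABASE_PASSWORD_FILE.
# Returns 1 when the variable is unset or the file is not readable.
read_secret() {
    eval "secret_path=\${$1:-}"
    if [ -z "$secret_path" ] || [ ! -r "$secret_path" ]; then
        echo "secret file for $1 is not set or not readable" >&2
        return 1
    fi
    cat "$secret_path"
}
```

An entrypoint could call `read_secret DATABASE_PASSWORD_FILE` and hand the value to the application at startup, which keeps the password out of the image and out of the Compose file.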
Interview Questions
Expected answer points:
- Containers share the host kernel; VMs each include a full OS
- Containers start in seconds; VMs take minutes to boot
- VMs provide stronger isolation because each guest runs its own kernel; containers share the host kernel, so a kernel vulnerability can weaken the boundary between them
- Performance: VMs have overhead from emulating hardware; containers run near bare-metal performance
- Resource usage: VMs require dedicated RAM and CPU allocation; containers share resources dynamically
- Portability: Containers are highly portable; VMs require hypervisor compatibility
Expected answer points:
- Each Dockerfile instruction creates a new layer
- Docker caches layers when instructions have not changed
- When a layer changes, all subsequent layers must rebuild
- Best practice: Order instructions from least to most frequently changing
- COPY package*.json before COPY source code to leverage npm install caching
- Use multi-stage builds to keep cache efficient and production images small
Expected answer points:
- COPY is the preferred instruction for basic file copying
- ADD can extract tar files and copy from URLs automatically
- ADD can pull files from remote URLs, which makes the build depend on external content that is not verified by default
- ADD auto-extraction can cause unexpected behavior with large archives
- Recommendation: Use COPY unless you specifically need ADD tar extraction feature
Expected answer points:
- Multi-stage builds use multiple FROM statements to separate build and runtime environments
- The final production image only contains runtime artifacts, not build tools
- Build dependencies like compilers and test frameworks never enter the production image
- This reduces attack surface by excluding potential vulnerabilities from build tools
- Smaller images mean faster pulls, smaller storage, and reduced memory footprint
- The build stage can use larger base images with more tools; production stage uses minimal images
Expected answer points:
- bridge: Default driver for standalone containers; the default docker0 bridge has no name-based DNS, while user-defined bridge networks resolve container names automatically
- host: Removes network isolation; container uses host network directly for lower latency
- overlay: Connects containers across multiple Docker hosts; used in Docker Swarm clusters
- macvlan: Assigns a MAC address to each container; useful for legacy applications expecting physical network cards
- none: Disables all networking; container is completely isolated
- Network choice affects container-to-container communication, performance, and security isolation
Expected answer points:
- Named volumes: Docker-managed persistent storage; survive container removal; best for database storage
- Bind mounts: Map host directory into container; useful for development with live code reloading
- tmpfs mounts: Store data in memory only; fastest option; data lost when container stops
- Volumes can be pre-populated before container start; bind mounts reflect host filesystem exactly
- tmpfs is ideal for caches, session data, or any data that does not need to persist
Expected answer points:
- depends_on defines startup order: services listed wait for dependencies to start first
- It does NOT wait for the application inside the container to be ready
- A database container may be running but the database not yet accepting connections
- For production readiness, implement health checks or startup scripts that wait for dependencies
- Use wait-for scripts or tools like dockerize, wait-for-it, or healthcheck configurations
- Container orchestration platforms support proper dependency health tracking
Expected answer points:
- no: Never restart (default behavior)
- on-failure: Restart only when container exits with non-zero code
- unless-stopped: Restart unless explicitly stopped; survives Docker daemon restart
- always: Always restart; includes after Docker daemon restart
- Production recommendation: usually unless-stopped or always for critical services
- Consider whether you want restart after crashes that might mask underlying bugs
- HEALTHCHECK should accompany restart policies for proper load balancer integration
Expected answer points:
- Run containers as non-root user (USER instruction in Dockerfile)
- Use minimal base images (Alpine, distroless) to reduce attack surface
- Pin exact image versions; never use latest tag in production
- Scan images for vulnerabilities with tools like Trivy before deployment
- Use read-only filesystem (--read-only flag) and tmpfs for /tmp
- Drop all capabilities (--cap-drop=ALL) and disable privilege escalation (--security-opt=no-new-privileges)
- Set resource limits to prevent resource exhaustion attacks
- Use Docker secrets or external secrets managers for sensitive data; never environment variables
- Minimize exposed ports; only expose what is strictly necessary
Expected answer points:
- Check exit code: docker ps -a shows exit code (0, 1, 137, 139, 143, 255)
- 137 = SIGKILL usually from OOM kill; check memory limits and application memory usage
- 139 = SIGSEGV (segmentation fault); application bug or bad memory access
- 143 = SIGTERM (graceful shutdown); normal during stop or restart
- 255 = exit status out of range; often entrypoint misconfiguration
- View logs: docker logs container_name for application output
- Run interactively: docker run -it image_name sh to debug the entrypoint
- Check application configuration: environment variables, dependencies, file permissions
- Verify the CMD or ENTRYPOINT syntax; exec form vs shell form behaves differently
Expected answer points:
- Docker captures STDOUT and STDERR from container processes and writes them to json-file log driver by default
- Configure log driver in daemon.json or per-container with --log-driver flag
- Set log rotation with --log-opt max-size and --log-opt max-file to prevent disk exhaustion
- For production, use centralized logging drivers (fluentd, gelf, awslogs) or ship logs to ELK stack, Loki, or CloudWatch
- Application should write structured JSON logs to STDOUT for easier parsing and querying
- Include correlation IDs in logs to trace requests across services
- Never log sensitive data (secrets, tokens, passwords) even to stdout
- Use log levels appropriately: ERROR for failures, WARN for degraded state, INFO for normal operations
Expected answer points:
- ENTRYPOINT defines the main executable that always runs when the container starts
- CMD provides default arguments that can be overridden at runtime
- Shell form vs exec form: shell form adds /bin/sh -c wrapper which does not handle signals properly
- Exec form (JSON array) is preferred as it runs directly without shell wrapper, enabling proper signal handling
- Use CMD for default arguments: CMD ["--config", "/default.conf"]
- Use ENTRYPOINT when the container should always run as a specific executable: ENTRYPOINT ["python", "app.py"]
- Combine both when you need a fixed executable with default parameters
- Entrypoint can be overridden with --entrypoint flag for debugging or special use cases
Expected answer points:
- HEALTHCHECK instruction tells Docker how to verify container health by running a command inside the container
- Docker marks container unhealthy after consecutive failures matching --retries threshold
- Health checks enable load balancers to only send traffic to healthy containers
- Orchestrators like Kubernetes use liveness probes to restart unhealthy containers and readiness probes to remove from load balancing
- Health check should test the actual application, not just the process: wget to health endpoint, curl to API, or custom check script
- Set appropriate intervals and timeouts: too aggressive wastes resources, too lenient delays failure detection
- For databases, check actual connectivity, not just port open; the database may be starting up while port is listening
Expected answer points:
- COPY bakes code into image layer; changes require rebuild and push to registry
- Volume mounts (bind mounts) let you edit code on host and see changes immediately in container
- COPY is for production: image contains exact code that was tested, reproducible builds
- Bind mounts are for development: fast iteration, no rebuild needed, code may differ from production
- Named volumes are for persistent data: database storage, state that survives container restart
- tmpfs mounts are for sensitive data you never want persisted: tokens, session data, temporary caches
- For development, use bind mounts for code; for production, use COPY; tmpfs only for secrets in dev
- Performance: Bind mounts are close to native speed on Linux but can be noticeably slower through Docker Desktop on macOS and Windows; COPY creates additional image layers
Expected answer points:
- Each Dockerfile instruction creates a new layer that is cached if the instruction and its inputs have not changed
- When a layer changes, Docker invalidates the cache for that layer and all subsequent layers
- Order instructions from least to most frequently changing: base image, dependencies, source code last
- Split RUN commands to leverage caching: npm install in separate layer before COPY source
- Use --chown on COPY instead of a separate RUN chown step, which would duplicate the files in an extra layer
- BuildKit enables parallel layer building and better cache management
- Use mount caches for package managers (npm, pip, maven) to persist cache across builds
- Avoid COPY . when you only need specific files; COPY package.json first, then source
Expected answer points:
- Containers share the host kernel, creating a larger attack surface than VMs
- Run as non-root user: the USER instruction in the Dockerfile limits what a compromised process can do
- Use minimal base images (Alpine, distroless) to reduce attack surface and minimize CVEs
- Pin exact versions: latest tag can introduce breaking changes or vulnerabilities silently
- Scan images with Trivy or Snyk before deployment; integrate into CI/CD pipeline
- Drop all capabilities: --cap-drop=ALL removes unnecessary kernel permissions
- Use read-only filesystem: --read-only prevents writing to unexpected locations
- Prevent privilege escalation: --security-opt=no-new-privileges stops container from gaining more privileges
- Never store secrets in environment variables; use secrets managers or Docker secrets
- Network isolation: Use custom bridge networks to limit inter-container communication
Expected answer points:
- depends_on directive defines startup order: service listed waits for dependencies to start first
- depends_on only waits for container to start, not for application readiness inside container
- A database container may be running but the database not yet accepting connections
- For production readiness, implement health checks or startup scripts that poll for dependency readiness
- Use tools like wait-for-it, dockerize, or custom entrypoint scripts to wait for dependencies
- Kubernetes handles this better with init containers and readiness probes
- Use condition: service_healthy in Compose to wait for health check to pass
- Restart policies handle crashes but not slow-starting applications
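The custom entrypoint approach mentioned above can be sketched as a small retry helper. This is a minimal version under stated assumptions: wait_for is a hypothetical name, and the readiness command is whatever check suits the dependency.

```shell
# Sketch of an entrypoint helper: retry a readiness command until it succeeds.
# wait_for is a hypothetical name; pass the attempt count, the delay in
# seconds, and then the command to run as the readiness check.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0          # dependency answered, safe to start the app
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1              # gave up; let the restart policy or operator decide
}
```

In an entrypoint script it might be called as wait_for 30 1 nc -z db 5432 && exec node server.js, where db and 5432 are placeholder values and the image is assumed to ship an nc that supports -z. Unlike depends_on alone, this waits for the dependency to accept connections rather than for its container to exist.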
Expected answer points:
- Docker Swarm is built into Docker Engine; Kubernetes is a separate orchestration system
- Swarm has gentler learning curve; Kubernetes has steeper but more powerful abstractions
- Swarm uses Services and Stacks; Kubernetes uses Deployments, Services, Ingress, ConfigMaps, Secrets
- Kubernetes has richer ecosystem: Helm for package management, operators for custom controllers
- Swarm suits simple cases and small teams; Kubernetes scales better for large, complex deployments
- Both support rolling updates, service discovery, load balancing, scaling
- Kubernetes has more sophisticated scheduling: taints, tolerations, node affinity, pod priority
- Swarm deploys Compose files directly with docker stack deploy; Kubernetes uses its own YAML manifests
- Both run standard Docker (OCI) images; Kubernetes runs them through CRI runtimes such as containerd or CRI-O
Expected answer points:
- Containers are ephemeral by default; any data written inside is lost when container is removed
- Use named volumes for persistent data that must survive container restarts and recreation
- Database data should always go in named volumes, never in container filesystem
- Bind mounts useful for development (live code reload) but risky for production
- tmpfs mounts store sensitive data in memory only; data never persists to disk
- For distributed state, use external databases, Redis, or other stateful services outside containers
- Volume drivers can provide clustering, replication, encryption for production data needs
- Backup volumes regularly: docker run --rm -v volume_name:/data alpine tar czf - /data > backup.tar.gz
- Never store application state in container layer; treat containers as stateless processing units
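The backup one-liner above can be wrapped into a pair of helpers that also cover the restore direction. This is a sketch, not a hardened backup tool: backup_volume and restore_volume are hypothetical names, and the alpine image is used only because it ships tar.

```shell
# Sketch: back up and restore a named volume through a throwaway container.
# backup_volume and restore_volume are hypothetical helper names.
backup_volume() {
  volume=$1
  archive=$2
  # Mount the volume read-only and stream a gzip tarball to the host.
  docker run --rm -v "$volume":/data:ro alpine \
    tar czf - -C /data . > "$archive"
}

restore_volume() {
  volume=$1
  archive=$2
  # -i keeps stdin open so tar can read the archive from the host.
  docker run --rm -i -v "$volume":/data alpine \
    tar xzf - -C /data < "$archive"
}
```

A typical round trip would be backup_volume myapp_data backup.tar.gz followed by restore_volume myapp_data_restored backup.tar.gz, where the second volume name is a placeholder. For a consistent snapshot, stop any container that writes to the volume before backing it up.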
Expected answer points:
- Use Alpine-based images: the Alpine base is ~5MB versus tens of MB for the Ubuntu base image; smaller attack surface, faster pulls
- Multi-stage builds exclude build tools from production image
- Pin exact versions: never use latest; enables cache reuse, prevents unexpected changes
- Order Dockerfile instructions from least to most frequently changing for better caching
- Combine related RUN commands so cleanup happens in the same layer: RUN apt-get update && apt-get install -y --no-install-recommends <packages> && rm -rf /var/lib/apt/lists/*
- Use .dockerignore to exclude unnecessary files (.git, node_modules, docs) from build context
- BuildKit enables parallel builds and mount caches for package managers
- Only COPY necessary files: COPY package*.json first, then source; not COPY . .
- Consider distroless or scratch images for a minimal footprint; they include only runtime dependencies
- Layer ordering matters: dependencies change less often than source code, so they go first
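The multi-stage and .dockerignore points above might look like the following in practice. This is a sketch under assumptions that are not from the original guide: the project has an npm run build script that emits dist/, dist/server.js is the entry point, and Dockerfile.multistage is a hypothetical file name.

```shell
# Sketch: write a .dockerignore and a two-stage Dockerfile.
# Assumptions (hypothetical): "npm run build" emits dist/, and
# dist/server.js is the entry point.
cat > .dockerignore <<'EOF'
.git
node_modules
docs
EOF

cat > Dockerfile.multistage <<'EOF'
# Build stage: has dev dependencies and build tooling.
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: only production dependencies and the built output.
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
EOF
```

Building with docker build -f Dockerfile.multistage -t myapp:1.0.0 . and then running docker image ls myapp lets you compare the result against a single-stage build; only the final stage ends up in the tagged image.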
Further Reading
- Dockerfile Best Practices - Official guide to writing efficient Dockerfiles
- Docker Compose Documentation - Complete reference for docker-compose configuration
- Container Security Best Practices - Security hardening guidelines from Docker
- BuildKit Documentation - Advanced build features for faster, more efficient builds
- Kubernetes Documentation - Pod lifecycle, scheduling, and management
- Trivy Scanner - Vulnerability scanner for containers
- Docker Scout - Image analysis and vulnerability detection
- Helm Charts - Package manager for Kubernetes applications
Conclusion
Key Takeaways
- Docker containers package applications with their dependencies for consistent deployment across environments
- Images are built in layers; multi-stage builds keep production images lean
- Named volumes persist data independent of container lifecycle
- Docker Compose manages multi-container applications with automatic service discovery
- Health checks enable proper monitoring and orchestrator integration
- Restart policies handle common failure scenarios automatically
- Security requires defense in depth: image scanning, non-root users, resource limits, and secret management
Production Readiness Checklist
# Image building
docker build -t myapp:1.0.0 --platform linux/amd64 .
docker scout cves myapp:1.0.0
docker image inspect --format '{{json .Config.Healthcheck}}' myapp:1.0.0
# Security hardening
docker run \
--read-only \
--user=10001 \
--cap-drop=ALL \
--memory=512m \
--security-opt=no-new-privileges \
myapp:1.0.0
# Compose validation
docker-compose config --quiet
docker-compose up -d
docker-compose ps
docker-compose logs -f
# Volume management
docker volume create myapp_data
docker volume inspect myapp_data
docker volume ls
Pre-Deployment Verification
# Check for vulnerabilities
trivy image myapp:1.0.0
# Verify resource limits
docker inspect myapp | grep -A 10 Memory
# Test health check
docker exec myapp wget -qO- http://localhost:3000/health
# Check logs
docker logs myapp --tail 100 --timestamps
# Snapshot resource usage (omit --no-stream for a live view)
docker stats myapp --no-stream
Docker simplifies application deployment by providing consistent packaging across environments. Images, containers, volumes, and networking form the foundation for any containerized architecture.
Start simple: containerize a single application, run it locally with Docker Compose, and gradually add complexity as you need it. Most teams outgrow manual Docker commands quickly and move to orchestration tools like Kubernetes, but the fundamentals covered here apply throughout.
If you want to go deeper into container orchestration, the Advanced Kubernetes guide covers custom controllers, operators, and production-grade cluster management. Helm Charts provides a templating system that makes deploying complex applications manageable.