Container Security: Image Scanning and Vulnerability Management
Implement comprehensive container security: from scanning images for vulnerabilities to runtime security monitoring and secrets protection.
Introduction
Image Signing (Cosign) vs. Not
Use image signing when you deploy to production environments where you need to verify that only images you intentionally built reach your cluster. Signing matters most in multi-team environments where anyone could push to your registry, or when you pull base images from third parties.
Do not use image signing if you are the only person building and deploying images in a small team with a private registry. The operational overhead of key management exceeds the security benefit until you have multiple contributors.
Falco vs. Other Runtime Security Tools
Use Falco when you want open-source runtime security with an active community and Kubernetes-native integration. Falco has the largest rule set community and works well with standard Kubernetes logging.
Use alternatives like Sysdig Secure (the commercial platform built on Falco) or Aqua Security when you need commercial support, specific compliance framework integrations, or tighter integration with your SIEM.
AppArmor and Seccomp: When They Are Overkill
AppArmor and Seccomp profiles are worth the effort for regulated environments (financial services, healthcare) or for workloads handling sensitive data. The performance overhead is minimal and the blast radius reduction is significant.
Do not invest in custom Seccomp profiles for stateless microservices with no external network access. The operational cost of maintaining profiles exceeds the risk reduction for low-sensitivity workloads. Use the default Docker Seccomp profile instead.
Image Scanning with Trivy and Grype
Scan every image before it touches your cluster. Not sometimes. Not in staging only. Every image, every push, in your CI pipeline.
Trivy is the default choice for most teams. It is fast, has a large vulnerability database, and integrates with most CI systems.
# Install Trivy
brew install trivy
# Scan an image
trivy image myregistry/myapp:latest
# Scan in CI with exit code on high vulnerabilities
trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry/myapp:latest
Grype is another option, particularly if you want to scan SBOMs (Software Bills of Materials) or need a different database backend.
# Install Grype
brew install grype
# Scan with SBOM input
grype sbom:./sbom.json
# JSON output for automation
grype myregistry/myapp:latest -o json > results.json
Both tools pull from multiple vulnerability databases, including the Ubuntu, Debian, and Alpine security feeds, plus advisory sources that cover language ecosystems such as PyPI and npm (for example, the GitHub Advisory Database).
SBOM Generation and Vulnerability Tracking
An SBOM is a formal record of the packages and dependencies in your software. Think of it as an ingredient list for your container image.
Generate SBOMs at build time:
# Generate SBOM with Syft
syft myregistry/myapp:latest -o spdx-json > sbom.spdx.json
# Or build with Cloud Native Buildpacks (no Dockerfile needed); buildpacks that support it attach SBOMs to the image
pack build --builder heroku/buildpacks:20 myregistry/myapp:latest
SBOMs serve two purposes. First, when a new vulnerability drops (like Log4Shell), you can query your SBOM database to find every image affected in minutes, not hours. Second, SBOMs give you audit trails for compliance.
Store SBOMs alongside your images in a registry that supports it, or in a separate artifact storage.
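The "find every affected image in minutes" lookup can be sketched in a few lines of Python. This is a minimal sketch, assuming the SBOMs were written as SPDX JSON by Syft (as in the command above) into a single directory; the `./sboms` layout and the function names are hypothetical, and a real setup would more likely query a database or registry API.

```python
import json
from pathlib import Path


def packages_matching(sbom: dict, needle: str) -> list[tuple[str, str]]:
    """Return (name, version) for SPDX packages whose name contains needle."""
    matches = []
    for pkg in sbom.get("packages", []):
        name = pkg.get("name", "")
        if needle in name:
            matches.append((name, pkg.get("versionInfo", "unknown")))
    return matches


def search_sboms(sbom_dir: str, needle: str) -> dict[str, list[tuple[str, str]]]:
    """Map each SBOM file in sbom_dir to the matching packages it contains."""
    hits = {}
    for path in sorted(Path(sbom_dir).glob("*.spdx.json")):
        found = packages_matching(json.loads(path.read_text()), needle)
        if found:
            hits[path.name] = found
    return hits


if __name__ == "__main__":
    # Assumption: SBOMs were saved as ./sboms/<image-name>.spdx.json
    for sbom_file, pkgs in search_sboms("./sboms", "log4j-core").items():
        for name, version in pkgs:
            print(f"{sbom_file}: {name} {version}")
```

Matching on a substring of the package name is deliberately loose; for a real incident you would also compare versions against the advisory's affected range.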
Runtime Security with Falco
Scanning images at build time catches known vulnerabilities. Falco catches anomalous behavior at runtime, things that are not in any vulnerability database because they are specific to your environment.
Falco works by monitoring system calls. You define rules for behavior you consider suspicious:
# falco_rules.yaml
# spawned_process and container are macros from the default Falco rule set
- rule: Detect shell in container
  desc: A shell was spawned inside a container
  condition: >
    spawned_process and container and
    proc.name = bash
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository)
  priority: WARNING
- rule: Detect crypto mining
  desc: Detect execution of known crypto miner
  condition: >
    spawned_process and
    proc.name in (cpuminer, nanominer, ethminer)
  output: >
    Crypto miner detected
    (user=%user.name command=%proc.cmdline)
  priority: CRITICAL
Deploy Falco as a DaemonSet in your cluster. It will generate events for every suspicious behavior it sees.
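Once Falco is emitting events, a first step is usually to see which rules fire and how often. The following is a rough sketch, assuming Falco is configured with JSON output (`json_output: true`) so that each alert is one JSON object per line with a `rule` field; the function name and the sample events are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def alerts_by_rule(lines: Iterable[str]) -> Counter:
    """Count Falco alerts per rule from JSON-formatted output lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # startup banners and other non-JSON lines
        if isinstance(event, dict) and event.get("rule"):
            counts[event["rule"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; real events carry more fields (time, output, output_fields).
    sample = [
        '{"rule": "Detect shell in container", "priority": "Warning"}',
        '{"rule": "Detect shell in container", "priority": "Warning"}',
        '{"rule": "Detect crypto mining", "priority": "Critical"}',
        "Falco initialized",
    ]
    for rule, count in alerts_by_rule(sample).most_common():
        print(f"{count:5d}  {rule}")
```

In practice the input would come from the Falco pod logs or from whatever log pipeline collects them.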
Non-Root Users and Read-Only Root Filesystems
Design your containers to run as non-root by default. This is harder than it sounds because many official images run as root internally.
# Create a non-root user in your Dockerfile (Alpine/BusyBox syntax; Debian-based images use groupadd and useradd)
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
# If you must run as root, switch before running the app
USER root
RUN some-privileged-operation
USER appuser
Pair non-root users with read-only filesystems. If an attacker compromises your container, they cannot modify binaries or drop tools onto the root filesystem; only the volumes you explicitly mount stay writable.
# Kubernetes container securityContext (set under spec.containers[])
securityContext:
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 10000
You will need to identify which directories need write access and mount them as volumes.
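To make the "mount writable directories as volumes" step concrete, here is a small sketch that builds a Pod manifest with a read-only root filesystem and one emptyDir per writable path. It assumes you have already identified the writable paths; the function name, the `scratch-N` volume names, and the example paths are illustrative only. `kubectl apply -f` accepts JSON as well as YAML, so the output can be applied directly.

```python
import json


def read_only_pod(name: str, image: str, writable_paths: list[str]) -> dict:
    """Build a Pod manifest whose only writable locations are emptyDir mounts."""
    volumes, mounts = [], []
    for index, path in enumerate(writable_paths):
        volume_name = f"scratch-{index}"  # hypothetical naming scheme
        volumes.append({"name": volume_name, "emptyDir": {}})
        mounts.append({"name": volume_name, "mountPath": path})
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "readOnlyRootFilesystem": True,
                        "runAsNonRoot": True,
                        "runAsUser": 10000,
                    },
                    "volumeMounts": mounts,
                }
            ],
            "volumes": volumes,
        },
    }


if __name__ == "__main__":
    pod = read_only_pod("myapp", "myregistry/myapp:latest", ["/tmp", "/var/run"])
    print(json.dumps(pod, indent=2))
```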
Seccomp and AppArmor Profiles
Seccomp (secure computing mode) restricts the system calls a container can make. By default, containers can make hundreds of system calls. Seccomp lets you whittle that down to the handful your application actually needs.
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "exit", "sigreturn"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
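A four-syscall allowlist like the one above is too small for a real application, so profiles are usually derived from observation. The sketch below is one possible approach, not a definitive tool: it assumes you captured plain `strace -f` output for the workload (no timestamp prefixes), extracts the syscall names, and emits a profile in the same shape as the example. The regular expression and function names are assumptions made for illustration.

```python
import json
import re
from typing import Iterable

# Matches lines such as 'openat(AT_FDCWD, ...) = 3', '[pid  42] read(3, ...'
# or '1234 write(1, ...' (the form strace uses when writing to a file with -f).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s*|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def observed_syscalls(strace_lines: Iterable[str]) -> list[str]:
    """Return the sorted, de-duplicated syscall names seen in strace output."""
    names = set()
    for line in strace_lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def allowlist_profile(syscalls: Iterable[str],
                      architectures: tuple[str, ...] = ("SCMP_ARCH_X86_64",)) -> dict:
    """Build a deny-by-default seccomp profile that allows only the given syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(architectures),
        "syscalls": [
            {"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


if __name__ == "__main__":
    # Hypothetical trace excerpt
    trace = [
        'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
        '[pid  42] read(3, "", 4096) = 0',
        "+++ exited with 0 +++",
    ]
    print(json.dumps(allowlist_profile(observed_syscalls(trace)), indent=2))
```

A trace only covers the code paths that ran while you were recording, so exercise startup, shutdown, and error paths before trusting the resulting profile.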
AppArmor works at a higher level, controlling file access, capabilities, and network access based on profiles.
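# Apply an AppArmor profile to a container (pod annotation; replace <container-name>).
# Newer Kubernetes versions also offer a securityContext.appArmorProfile field.
container.apparmor.security.beta.kubernetes.io/<container-name>: runtime/default
Docker applies a default seccomp profile that blocks about 44 system calls. Kubernetes runs containers with seccomp unconfined by default unless the kubelet's seccomp-default option is enabled, so set securityContext.seccompProfile.type: RuntimeDefault explicitly if you want it.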
Supply Chain Security
The SolarWinds and Codecov breaches showed what happens when attackers compromise upstream supply chains. Your containers are only as secure as their dependencies.
flowchart LR
A[Image Build] --> B[Trivy Scan]
B --> C{ Vulnerabilities found? }
C -->|High/Critical| D[Block Deploy]
C -->|None/Low| E[Generate SBOM]
E --> F[Cosign Sign]
F --> G[Push to Registry]
G --> H[Kyverno Policy Check]
H --> I{ Signature Valid? }
I -->|No| J[Reject Pod]
I -->|Yes| K[Deploy to Cluster]
K --> L[Falco Runtime Monitor]
L --> M[Alert on Anomaly]
Pin base images to specific digests, not tags. Tags are mutable; a node:18-alpine today is not the same as node:18-alpine in six months.
# Pin to digest, not tag
FROM node@sha256:a1b2c3d4e5f6... AS builder
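Digest pinning is easy to enforce mechanically. The following sketch shows one way a CI step might flag unpinned base images; it is a simplified check under stated assumptions (it ignores `ARG`-substituted image names and treats `scratch` and earlier build stages as fine), and the function name is hypothetical.

```python
import re

# Captures the image reference of a FROM line, skipping an optional --platform flag.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images referenced by tag (or no tag) instead of a sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        is_stage = image in stage_names  # FROM <earlier stage> is not a registry pull
        if image.lower() != "scratch" and not is_stage and "@sha256:" not in image:
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stage_names.add(alias.group(1))
    return unpinned


if __name__ == "__main__":
    example = "FROM node:18-alpine AS builder\nFROM builder AS test\n"
    for image in unpinned_base_images(example):
        print(f"unpinned base image: {image}")
```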
Use image signing. Cosign (part of Sigstore) lets you sign images, and an admission controller such as Kyverno can verify those signatures before a pod is admitted to the cluster.
# Sign an image
cosign sign --key cosign.key myregistry/myapp:latest
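# Verify in Kubernetes with Kyverno
kubectl apply -f - <<EOF
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-image-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "myregistry/myapp*"
          attestors:
            - entries:
                - keys:
                    # Placeholder: paste the contents of cosign.pub here
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      ...
                      -----END PUBLIC KEY-----
EOF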
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Trivy blocking deployment for critical CVE with no immediate patch | Build pipeline halts, deployment delayed | Establish a vulnerability exception process with risk acceptance sign-off, prioritize CVEs by exploitability (EPSS score) over severity alone |
| Falco false positives causing alert fatigue | Security team ignores alerts, real threats missed | Tune Falco rules to your environment, suppress known-false positives, review rule effectiveness quarterly |
| Container running as root escaping to host | Attacker gains host access, full cluster compromise | Enforce runAsNonRoot: true with Pod Security Admission (restricted level) or a Kyverno/Gatekeeper policy, fail builds that produce root containers |
| Supply chain compromise via malicious base image | Backdoored image deployed to production | Pin base images to digests, use Cosign signature verification, scan all third-party images in CI |
| :latest tag image mutation causing inconsistency | Different nodes run different image versions, unpredictable behavior | Always tag builds with commit SHA, never pull :latest in production |
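Prioritizing by exploitability rather than severity alone, as the first row suggests, can be as simple as a sort. This is a minimal sketch, assuming each finding already carries an EPSS score (the estimated probability of exploitation in the next 30 days); the `Finding` class, the tie-breaking order, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"CRITICAL": 4, "HIGH": 3, "MEDIUM": 2, "LOW": 1, "UNKNOWN": 0}


@dataclass(frozen=True)
class Finding:
    cve_id: str
    severity: str
    epss: float  # 0.0 to 1.0


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings by EPSS score first, then by severity, then by ID."""
    return sorted(
        findings,
        key=lambda f: (-f.epss, -SEVERITY_RANK.get(f.severity, 0), f.cve_id),
    )


if __name__ == "__main__":
    # Made-up identifiers and scores, for illustration only.
    backlog = [
        Finding("CVE-EXAMPLE-1", "CRITICAL", 0.01),
        Finding("CVE-EXAMPLE-2", "HIGH", 0.90),
        Finding("CVE-EXAMPLE-3", "HIGH", 0.01),
    ]
    for finding in prioritize(backlog):
        print(f"{finding.cve_id}  {finding.severity:8s}  epss={finding.epss:.2f}")
```

With this ordering, a HIGH finding that is very likely to be exploited is handled before a CRITICAL one that is not.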
Trade-off Analysis
| Scenario | Trivy | Grype | Notes |
|---|---|---|---|
| Vulnerability database size | Large | Large | Both cover major OS and language package feeds |
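| SBOM generation | Native | Via Syft | Trivy can emit CycloneDX or SPDX directly (--format cyclonedx); Grype scans SBOMs but relies on Syft to generate them |
| CI integration | Native | Native | Both can exit non-zero on findings (Trivy with --exit-code, Grype with --fail-on) |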
| JSON output for automation | Yes | Yes | Both produce structured output |
| Speed (large images) | Fast | Fast | Comparable performance |
| Scenario | Falco (runtime) | Prevention-only | Notes |
|---|---|---|---|
| Detects zero-days | Yes | No | Runtime monitoring catches novel attacks |
| Performance overhead | Low (~5%) | None | Falco adds minimal latency |
| Requires tuning | Yes | No | Falco needs rule customization per environment |
| Compliance value | Medium | Low | Falco provides audit trail for behavior |
| Scenario | Image signing required | Signing optional | Notes |
|---|---|---|---|
| Multi-team registry | Yes | No | Signature verification keeps images pushed by unauthorized parties from being admitted |
| Single-person builds | No | Yes | Key management overhead exceeds risk without multiple contributors |
| Regulated environments | Yes | No | SOC 2, PCI-DSS often require artifact signing |
| Scenario | Non-root containers | Root or privileged containers | Notes |
|---|---|---|---|
| Security posture | Strong | Weak | Running as non-root significantly reduces container escape impact |
| Compatibility | Most apps work | Legacy apps may need root | Worth migrating legacy apps rather than running privileged |
| Performance | No overhead | No overhead | No performance reason to run as root or privileged |
Container Security Observability
Monitor CVE counts per image as a metric in your CI pipeline. A spike in critical CVEs for an image you have not changed means new vulnerabilities were disclosed against packages it already contains, not that the image itself changed. Set up alerts when image scan results change between builds.
Falco alert volume per rule tells you which rules are worth keeping. Rules that fire hundreds of times a day are noise. Suppress or remove them so real anomalies stand out.
Track container restart rates. Containers that restart every few minutes are either crashing or being evicted repeatedly. Both are worth investigating.
Key commands:
# Trivy scan with JSON output for metrics extraction
trivy image --exit-code 1 --severity HIGH,CRITICAL --format json myregistry/myapp:latest > scan-results.json
# Count CVEs by severity
jq '[.Results[]?.Vulnerabilities[]?.Severity] | group_by(.) | map({severity: .[0], count: length})' scan-results.json
# Falco alert volume by rule in the last hour (assumes json_output is enabled; adjust the label to your install)
kubectl logs -l app.kubernetes.io/name=falco -n falco --since=1h --tail=-1 | jq -rR 'fromjson? | objects | .rule // empty' | sort | uniq -c | sort -rn
# List images with most critical vulnerabilities across your cluster
kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u | while read -r img; do count=$(trivy image --quiet --severity CRITICAL --format json "$img" 2>/dev/null | jq '[.Results[]?.Vulnerabilities[]?] | length' 2>/dev/null); printf '%s\t%s\n' "${count:-0}" "$img"; done | sort -rn | head -10
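The alerting idea from this section, flagging scan results that change between builds, could be sketched as a diff of two Trivy JSON reports. This assumes the report layout used by `trivy image --format json` (`Results[].Vulnerabilities[]` with `VulnerabilityID`, `PkgName`, and `Severity`); the function names and sample identifiers are made up.

```python
def findings(report: dict) -> dict[tuple[str, str], str]:
    """Map (vulnerability ID, package name) to severity for one Trivy report."""
    found = {}
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            key = (vuln["VulnerabilityID"], vuln.get("PkgName", ""))
            found[key] = vuln.get("Severity", "UNKNOWN")
    return found


def new_since(previous: dict, current: dict) -> list[tuple[str, str, str]]:
    """Return (ID, package, severity) for findings present now but not before."""
    before = findings(previous)
    return sorted(
        (cve, pkg, severity)
        for (cve, pkg), severity in findings(current).items()
        if (cve, pkg) not in before
    )


if __name__ == "__main__":
    # Hypothetical reports; in CI these would be loaded with json.load()
    # from the previous and current scan-results.json files.
    old = {"Results": [{"Vulnerabilities": [
        {"VulnerabilityID": "CVE-EXAMPLE-1", "PkgName": "openssl", "Severity": "HIGH"},
    ]}]}
    new = {"Results": [{"Vulnerabilities": [
        {"VulnerabilityID": "CVE-EXAMPLE-1", "PkgName": "openssl", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-EXAMPLE-2", "PkgName": "zlib", "Severity": "CRITICAL"},
    ]}]}
    for cve, pkg, severity in new_since(old, new):
        print(f"new finding: {cve} in {pkg} ({severity})")
```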
Common Pitfalls / Anti-Patterns
Running containers as root. Many official images run as root internally. If an attacker escapes a container that runs as root, they have root on the host. Use runAsNonRoot: true and design your images with a non-root user from the start.
Not scanning images. Skipping scans to speed up builds means known vulnerabilities reach production. Block high and critical CVEs in CI. If the build cannot pass, that is the signal to fix the dependency.
Using the :latest tag. When you pull node:18-alpine, you get whatever node:18-alpine means today. Pin to digests: node:18-alpine@sha256:abc123.... Test your builds against a fixed version.
Not signing images. In any environment where untrusted parties can push to your registry, signature verification prevents unauthorized images from running. Cosign makes this straightforward.
Skipping runtime monitoring. Image scanning only catches known vulnerabilities. An attacker exploiting a misconfiguration or a zero-day will not show up in any scan. Falco closes that gap.
Interview Questions
Running as root means if an attacker escapes the container, they have root access to the host. Risks include container breakout to host filesystem, binding to privileged ports, and capability escalation. Fix by: setting runAsNonRoot: true in pod security context, using a non-root user in the Dockerfile (USER instruction), and ensuring the image builds with a non-root user. Also set allowPrivilegeEscalation: false and drop all capabilities with capabilities.drop: ["ALL"].
First, stop the bleeding: block the vulnerable image version in your CI/CD admission control (OPA Gatekeeper or Kyverno). Identify all affected services via your image registry tags and deployment inventory. Prioritize by exposure (internet-facing vs internal) and data sensitivity. Build and push fixed images for the highest-priority services, test, and deploy. For lower-priority services, schedule into sprint planning. Set up automatic vulnerability scanning on new image pushes to catch this earlier. Consider a "golden image" strategy where security hardens a base image centrally.
Use image signing and verification: sign images with Cosign or Notary during the build pipeline, then verify signatures at admission time using a policy controller (Kyverno or OPA Gatekeeper). Store signing keys in a KMS (AWS KMS, Google Cloud KMS, HashiCorp Vault). Enable admission control to reject unsigned images. Use short-lived tokens for CI/CD authentication rather than long-lived credentials. Audit all image pull events. Generate a software bill of materials (SBOM) for each image to track what went into it.
Seccomp restricts syscalls a container can make at the kernel level — the most granular control but requires knowing which syscalls an application needs. AppArmor works at the application level, restricting capabilities and file access paths — easier to use for known application profiles. SELinux works at the system level, labeling files and processes — most powerful but complex to configure. In practice: Docker defaults ship with a sensible seccomp profile blocking dangerous syscalls. For Kubernetes, seccomp via securityContext.seccompProfile and AppArmor via container.apparmor.security.beta.kubernetes.io are the common paths. SELinux is typically used at the host level.
Runtime detection tools like Falco monitor syscall behavior and flag anomalous activity: a shell spawning inside a container, unexpected network connections, writing to sensitive paths like /etc/ or /root/. Sysdig captures system calls for deeper analysis. Network monitoring detects exfiltration attempts via unusual outbound traffic. Integrate these with your SIEM or alerting system. Also monitor container restart counts, unexpected process trees, unusual resource usage (kubectl top pods showing a CPU spike), and node-level indicators like new SSH keys in /root/.ssh/.
Trivy is the default choice for most teams: it has a large vulnerability database, is fast, and integrates with most CI systems with minimal configuration. Grype fits best when your pipeline already generates SBOMs with Syft, since it scans SBOM input directly (grype sbom:./sbom.json) and comes from the same project. Trivy can also generate and scan SBOMs on its own, so the choice is mostly about ecosystem fit. If you are already invested in Syft, Grype adds little friction. If you want the simplest single-tool setup with the most baked-in integrations, Trivy wins.
Cosign generates a key pair, signs your container image digest, and stores the signature as an OCI artifact in your registry alongside the image. To enforce verification in Kubernetes, deploy Kyverno or OPA Gatekeeper as an admission controller. Create a policy that queries the registry for a valid Cosign signature before allowing the pod to start. If the image is not signed or the signature is invalid, the admission controller rejects the pod. Store the Cosign private key in a KMS (AWS KMS, Google Cloud KMS, HashiCorp Vault) — never in the pipeline itself. Rotation involves re-signing all images with the new key and updating the policy.
Identify the specific paths the application needs to write to at runtime. Common candidates are /tmp, /var/log, and /run. Create emptyDir volumes and mount them at those paths in your pod spec. Set readOnlyRootFilesystem: true on every container, then selectively mount writable volumes at the specific locations the application requires. For log directories, mount an emptyDir or an NFS share. This approach gives you the security benefit of a read-only root while allowing legitimate writes where needed. You can also use a tmpfs mount (an emptyDir with medium: Memory) for temporary files.
Pod Security Standards define three levels (privileged, baseline, and restricted) that cluster operators can enforce. Pod Security Admission (stable since Kubernetes 1.25) is the built-in admission controller that enforces those standards at the namespace level via labels. PodSecurityPolicies were the previous mechanism; they were deprecated in 1.21 and removed in 1.25 because their authorization model was confusing and they were easy to misconfigure. To enforce restricted PSS on a namespace, label it pod-security.kubernetes.io/enforce: restricted. The PSA check runs at admission, before pods are scheduled, and rejects those that violate the policy. This replaces PSP with something simpler and more maintainable.
Linux capabilities split the power of root into fine-grained units — CAP_NET_RAW lets a process send raw packets, CAP_SYS_TIME lets it set the system clock. Running as root in a container does not give all capabilities by default because Docker drops many, but not all. Dropping ALL capabilities and then adding back only what your application needs follows the principle of least privilege. To determine what your app needs: run it under seccomp with a permissive profile while monitoring which syscalls fire (use strace or Falco), then build a restrictive seccomp profile from that baseline. For capabilities, start with nothing and add them one at a time while testing functionality. CAP_NET_BIND_SERVICE is commonly needed for processes binding to ports below 1024.
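To see which capabilities a process actually holds, you can decode the `CapEff` bitmask from `/proc/<pid>/status`. The sketch below is illustrative: the capability names follow the bit order in the Linux headers, the function names are hypothetical, and the example mask `00000000a80425fb` is the value commonly reported for a root process in a default Docker container.

```python
# Capability names indexed by bit position, following linux/capability.h.
CAPABILITIES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(mask_hex: str) -> list[str]:
    """Translate a hex capability mask (as shown in CapEff) into names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAPABILITIES) if mask & (1 << bit)]


def effective_capabilities(status_path: str = "/proc/self/status") -> list[str]:
    """Read CapEff from a /proc status file (Linux only) and decode it."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_capabilities(line.split()[1])
    return []


if __name__ == "__main__":
    for capability in decode_capabilities("00000000a80425fb"):
        print(capability)
```

Running `effective_capabilities()` inside the container before and after adding `capabilities.drop` to the security context is a quick way to confirm the change took effect.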
Never bake secrets into images, and avoid passing them as plain-text environment variables: baked-in secrets persist in image layers, and environment variables are visible in the pod spec, in docker inspect output, and often in logs or crash dumps. The Kubernetes-native approach is to use a secrets management tool like HashiCorp Vault with the Vault Secrets Operator, or the Secrets Store CSI Driver, which mounts secrets as files in the container filesystem without exposing them as env vars. For Azure, use Key Vault with the Azure Key Vault provider for the Secrets Store CSI Driver. Alternatively, use the External Secrets Operator with AWS Secrets Manager or GCP Secret Manager. The key principle: secrets should be injected at runtime from an external store, never baked into the image at build time.
First, isolate the affected pod: cordon its node so no new work is scheduled there, and cut the pod off from traffic (for example, with a deny-all NetworkPolicy) while you investigate. Capture the running container state. On a Docker host that means docker inspect for the container config, docker diff to see filesystem changes from the image, and docker logs for stdout/stderr; on containerd nodes, crictl offers equivalents for inspect and logs. Extract the container's process tree with docker top and network connections with docker exec <container> netstat or similar. Take a snapshot of the container's memory with docker checkpoint if your runtime supports it (it relies on CRIU). Pull the image and compare it to the expected image digest. Preserve logs and audit trails before letting the pod restart or scaling it out.
Network policies in Kubernetes are namespace-scoped and act as a firewall for egress and ingress traffic per pod. Label your namespaces and pods, then create a NetworkPolicy that selects the appropriate pods. For a frontend-backend setup: the frontend policy allows ingress from the ingress controller only, the backend policy allows ingress from the frontend only. Common mistakes: forgetting that a pod selected by no policy is fully accessible and that policies are additive (allow rules combine; there are no deny rules), not accounting for DNS resolution (once egress is restricted, pods still need to reach the cluster DNS service in kube-system), and applying policies only to named namespaces without understanding that pods in unlabeled namespaces can still reach your services.
An SBOM (Software Bill of Materials) is a structured inventory of every package and dependency in your container image — the equivalent of an ingredient list. It matters because when a new vulnerability drops (like Log4Shell), you query your SBOM database to identify every affected image in minutes rather than scanning each one individually. To generate one: use Syft to scan your image and produce an SPDX or CycloneDX SBOM. Store the SBOM alongside the image in your registry or in a separate artifact store. In CI/CD, generate the SBOM after building the image, store it as a build artifact, and integrate it with your vulnerability scanner so Grype or Trivy can correlate CVE data with your exact package versions.
Start with the default Falco rule set. Falco only detects and alerts, it never blocks, so running the full set from day one is safe. Collect a week of alerts and identify which rules fire hundreds of times per day. Those rules are noise in your environment. Suppress them by adding rule exceptions or overrides in a custom rules file rather than editing the defaults. Then identify which alerts represent genuine security signals by correlating with known incidents. Keep those rules. Review quarterly, because rule effectiveness changes as your workload changes. The goal: security engineers should be able to investigate every Falco alert that fires in a day. If they cannot, you have too much noise and will miss real incidents.
hostPath volumes let a container read/write files on the host node's filesystem. An attacker who compromises a container with a hostPath mount does not even need to escape it: through the mounted directories they can read sensitive host data, write cron jobs to the host, or modify kubelet configuration. Alternatives: use Kubernetes ConfigMaps for configuration files, Secrets for credentials (via CSI or Vault), emptyDir for temporary storage, or PersistentVolumeClaims with appropriate access modes for persistent data. If you must use hostPath for system-level access (like the container runtime socket for a node agent), restrict it with Pod Security Admission or an admission policy (Kyverno, OPA Gatekeeper) to only the namespaces that need it, and document why it is required.
Never pull by tag alone — tag mutability means node:18-alpine today is not the same image in six months. Pin to a specific digest in your Dockerfile: FROM node@sha256:abc123.... Scan every image in CI before it is used, even if it comes from a trusted registry like Docker Hub. Use a VEX (Vulnerability Exploitability eXchange) document to communicate which CVEs in your dependencies are not exploitable in your context. For critical workloads, maintain a hardened "golden image" base that your security team audits and signs with Cosign. Pull from official sources only and verify the registry's image signature when available.
Build-time scanning (Trivy, Grype) catches known vulnerabilities in your dependencies and base image layers before they reach production. It is preventive and deterministic: given the same image and the same vulnerability database, it produces the same results. Runtime monitoring (Falco) catches anomalous behavior that is not in any vulnerability database: misconfiguration attacks, zero-days, and attacker behavior specific to your environment. You need both. Build-time scanning prevents known CVEs from reaching production. Runtime monitoring catches everything else, including attacks that exploit misconfigurations or vulnerabilities that have no CVE yet. Without runtime monitoring, a zero-day exploit, or an exploit of a medium-severity CVE that sits below your blocking threshold, will sail through because nothing flags it.
Layer ordering matters: put instructions that change most frequently at the end of the Dockerfile so that cache invalidation does not rebuild sensitive layers. Never put secrets in RUN commands — they appear in the layer history. Use multi-stage builds so the final image contains only the runtime artifacts, not the build toolchain (which may include source code or build secrets). Set appropriate file permissions in the Dockerfile (chmod only what is needed). Remove package manager caches, package lists, and temporary files in the same layer that installs them. Validate that the final image does not contain shell, package managers, or debugging tools unless explicitly needed at runtime.
In your CI pipeline, run trivy image --exit-code 1 --severity HIGH,CRITICAL after building and before pushing. If it exits with code 1, the build fails and the image is not pushed. Set appropriate severity thresholds — blocking on HIGH/CRITICAL is common; blocking on LOW/MEDIUM is too noisy for most teams. When a build fails due to a critical CVE, you have a few paths: update the affected dependency to a patched version (preferred), rebuild from a patched base image, apply a vulnerability exception with risk acceptance if the CVE is not exploitable in your context, or implement a compensating control like runtime monitoring to catch exploitation attempts. Never ignore critical CVEs without documented risk acceptance.
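The "documented risk acceptance" path can be made explicit in the gate itself. Below is a minimal sketch, assuming a Trivy JSON report and a hypothetical exceptions mapping from CVE ID to an expiry date agreed during sign-off; the function names and the data layout are illustrative, not a Trivy feature (Trivy's own `.trivyignore` file covers the simple case).

```python
from datetime import date


def blocking_findings(report: dict,
                      exceptions: dict[str, date],
                      today: date,
                      severities: tuple[str, ...] = ("HIGH", "CRITICAL")) -> list[str]:
    """Return IDs of findings that should fail the build.

    A finding is skipped only if it has an accepted exception that has not expired.
    """
    blocked = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in severities:
                continue
            expiry = exceptions.get(vuln["VulnerabilityID"])
            if expiry is not None and today <= expiry:
                continue  # risk accepted and still within its review window
            blocked.append(vuln["VulnerabilityID"])
    return blocked


def exit_code(report: dict, exceptions: dict[str, date], today: date) -> int:
    """Exit status a CI step could pass to sys.exit(): 1 blocks the push."""
    return 1 if blocking_findings(report, exceptions, today) else 0


if __name__ == "__main__":
    # Made-up report and exception list, for illustration only.
    report = {"Results": [{"Vulnerabilities": [
        {"VulnerabilityID": "CVE-EXAMPLE-1", "Severity": "CRITICAL"},
        {"VulnerabilityID": "CVE-EXAMPLE-2", "Severity": "HIGH"},
    ]}]}
    accepted = {"CVE-EXAMPLE-1": date(2030, 1, 1)}
    print(blocking_findings(report, accepted, date.today()))
```

Giving every exception an expiry date forces the accepted risk to be reviewed again instead of being ignored indefinitely.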
Further Reading
| Layer | Tool | Preventative vs Detective | CI/CD vs Runtime |
|---|---|---|---|
| Image scanning | Trivy, Grype, Snyk | Preventative | CI/CD |
| Signature verification | Cosign, Notary | Preventative | CI/CD + Registry |
| Runtime monitoring | Falco, Sysdig | Detective | Runtime |
| Policy enforcement | OPA Gatekeeper, Kyverno | Preventative | Admission control |
| User namespace remapping | --userns-remap | Preventative | Daemon config |
| Syscall filtering and mandatory access control | seccomp, AppArmor, SELinux | Preventative | Daemon config |
| Network policies | K8s NetworkPolicy | Preventative | Runtime |
Conclusion
Key Takeaways
- Image scanning catches known vulnerabilities; runtime monitoring catches anomalous behavior
- Pin base images to digests, not tags, to prevent supply chain drift
- Run containers as non-root with read-only filesystems to limit container escape blast radius
- Cosign signatures, verified at admission, prevent unauthorized images from running in your cluster
- Falco complements scanning by detecting post-deployment anomalies
Container Security Checklist
# 1. Scan every image in CI, block on HIGH/CRITICAL
trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry/myapp:$GIT_COMMIT
# 2. Pin base images to digest
FROM node@sha256:abc123... AS builder
# 3. Build as non-root
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
# 4. Enforce read-only root filesystem
securityContext:
  readOnlyRootFilesystem: true
# 5. Sign images with Cosign
cosign sign --key cosign.key myregistry/myapp:$GIT_COMMIT
# 6. Verify signatures in Kubernetes with Kyverno
kubectl apply -f kyverno-policy-require-signed-images.yaml
# 7. Deploy Falco as DaemonSet
helm install falco falcosecurity/falco -n falco --create-namespace
For more on securing Kubernetes workloads, see Network Security. For secrets handling, see Secrets Management.