Cloud Security: IAM, Network Isolation, and Encryption

Implement defense-in-depth security for cloud infrastructure—identity and access management, network isolation, encryption, and security monitoring.

published: March 25, 2026 reading time: 35 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

Cloud security means layering identity, network isolation, and encryption because the network perimeter is no longer a hard boundary. This guide walks through IAM least privilege, workload identity federation, VPC design with public and private subnets, and KMS key management for encryption at rest. You will learn how security groups, NACLs, and VPC endpoints work together, when to choose customer-managed versus cloud-managed keys, and which monitoring services (CloudTrail, GuardDuty, Security Hub) catch threats that configuration checks miss. By the end you will understand defense-in-depth for cloud and know which security failures cause real incidents.

Cloud security requires rethinking the assumption that your network perimeter is safe. In cloud environments, the network is potentially hostile by default. Any resource with a public IP or membership in a security group with public access is exposed. Security must be layered: identity, network, and data encryption work together for defense in depth.

This post covers the core security practices that apply regardless of which cloud provider you use. The examples use AWS, but the concepts translate to Azure, GCP, and other providers with different service names.

Introduction

Cloud security operates on a fundamentally different model than on-premises security. On-premises, your network perimeter is a hard boundary—firewalls, VLANs, and physical access controls keep threats out. In the cloud, that perimeter is soft. Any resource with a public IP is potentially reachable from the internet, and any misconfigured security group can expose sensitive services.

The shift requires thinking about security as a series of layers rather than a single hard shell. Identity—who can access what—matters as much as network controls. Encryption protects data at rest and in transit. Monitoring and logging give you visibility into what is happening so you can detect and respond to threats.

This guide covers identity and access management (IAM), network isolation using VPCs and security groups, encryption strategies, and the cloud-native security services that provide visibility across your infrastructure.

When to Use

Cloud-Native Security Services vs. Third-Party

Cloud-native security tools like Security Hub, GuardDuty, and CloudTrail on AWS integrate tightly with the provider’s control plane. Findings surface in the same console where you manage everything else, and CloudTrail already records management API calls as part of your normal operations (data events such as S3 object reads have to be enabled separately). The upside is simplicity: enable a service, get findings, done. The downside is that Security Hub only sees AWS. If you run EKS and Azure SQL in parallel, you are checking two separate consoles and correlating two separate streams of findings.

Third-party tools like Wiz, Prisma Cloud, and SentinelOne take the opposite approach. They ingest findings from every cloud provider and on-premises systems into one dashboard. This unified view matters when your environment is actually multi-cloud or when you need compliance reporting that spans providers. The tradeoff is added complexity: someone has to configure the integration, manage the vendor relationship, and pay for another subscription. Most organizations that go third-party also keep cloud-native tools running in each provider because the native services catch things the aggregator misses.

The detection coverage difference is where the separation gets real. GuardDuty uses machine learning trained on AWS-specific attack patterns. It catches things like unusual S3 access from an unusual location, brute force attempts on SSH, or cryptocurrency mining behavior in EC2. Third-party CSPMs use rules-based detection that works across providers but may miss AWS-specific signals that GuardDuty catches because the aggregation logic does not replicate the provider-native detection engine. Many organizations run both: GuardDuty for AWS-specific threats, the CSPM for cross-provider visibility and compliance reporting.

For single-cloud environments, cloud-native tools are usually sufficient. Security Hub plus GuardDuty plus CloudTrail give you threat detection, compliance monitoring, and audit logging. The CSPM aggregation overhead is not worth it if you only have one provider. For multi-cloud environments, the moment you have EKS on AWS and Azure SQL on Azure, you have two separate security consoles. A CSPM that aggregates findings into one dashboard makes cross-provider incident response faster. The cost is the integration complexity and the license.

The layered approach that works in practice: cloud-native tools handle core detection in each provider (always on, low overhead), a CSPM aggregates for cross-provider visibility, and a SIEM or SOAR tool automates response playbooks. This is not cheap and requires dedicated security engineering to operate. Pricing reality: cloud-native tools charge by usage or per asset. CSPMs like Wiz and Prisma Cloud also charge per asset, which can get expensive at scale. Factor in the total security tooling cost, including the CSPM, before deciding.

Factor	Cloud-Native	Third-Party CSPM
Detection depth	AWS-specific ML models, high accuracy on AWS threats	Rules-based, cross-provider, may miss provider-specific signals
Operational overhead	Low (enable per-service, native console)	High (integration setup, vendor management, ongoing config)
Multi-cloud support	Single provider only	Unified view across all providers
Cost model	Usage-based or per-asset	Per-asset, scales expensively
Compliance reporting	Per-provider	Cross-provider, aggregated
Best for	Single-cloud, AWS-focused environments	Multi-cloud, complex hybrid environments

In practice, the most common pattern is layered: cloud-native services handle core detection in each provider, a SIEM or CSPM aggregates findings for cross-provider visibility, and a SOAR tool automates response playbooks. If you are starting from scratch, enable cloud-native tools in every provider first. Add a third-party aggregator only when the operational overhead of checking multiple consoles becomes a real burden.

VPC Endpoints vs. NAT Gateway

The difference between VPC endpoints and NAT gateways comes down to what traffic you are trying to protect and where it goes. VPC endpoints create a private connection from your VPC to AWS services like S3 and DynamoDB without the traffic leaving the AWS network. Gateway endpoints, which exist only for S3 and DynamoDB, have no hourly cost and no data processing charge. Interface endpoints (PrivateLink), used for most other services, are billed per hour and per gigabyte. NAT gateways, by contrast, route outbound traffic from private subnets to the internet. You pay per hour the gateway runs and per gigabyte of data processed.

For private subnets that only need to reach AWS services, VPC endpoints are the obvious choice. A common production setup is: private subnets for application servers, a NAT gateway for outbound internet access (patching, package downloads, external API calls), and VPC endpoints for S3 and DynamoDB access. If you route S3 traffic through the NAT gateway instead, you are paying data processing fees on top of the NAT gateway hourly cost for no reason.

The catch is that VPC endpoints only work for AWS services. If your private instances need to reach anything else on the internet, you need a NAT gateway or some other outbound route. Some teams also run VPC endpoints alongside NAT gateways intentionally, using endpoints for AWS service traffic and the NAT for everything else. This works fine as long as your route tables are explicit about which traffic goes where. Misconfigured route tables that send endpoint traffic through the NAT are a common source of unexpected bills.

VPC endpoints are faster for a concrete reason: traffic to S3 via a gateway endpoint is routed directly to the service and never passes through the NAT gateway. S3 traffic via NAT gateway takes an extra hop: it exits the VPC through the NAT and internet gateway and reaches S3 at its public endpoint. The difference is roughly 1-3ms of latency plus $0.045 per GB in NAT data processing fees on top of the NAT gateway hourly cost. For a data pipeline moving 100GB per day, that is $4.50 per day (about $135 per month) in unnecessary processing fees through NAT versus $0 through a gateway endpoint.
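The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration: the $0.045 per GB NAT rate is the figure listed in the comparison table in this section and is assumed to be a regional list price, and the function and constant names are hypothetical.

```python
# Assumed rates: $0.045/GB is the NAT figure from the comparison table in this
# post; gateway endpoints have no per-GB charge. Verify against current pricing.
NAT_USD_PER_GB = 0.045
GATEWAY_ENDPOINT_USD_PER_GB = 0.0


def processing_cost(gb_per_day: float, usd_per_gb: float, days: int = 1) -> float:
    """Data processing charge only; hourly gateway fees are excluded."""
    return round(gb_per_day * usd_per_gb * days, 2)


nat_daily = processing_cost(100, NAT_USD_PER_GB)
nat_monthly = processing_cost(100, NAT_USD_PER_GB, days=30)
endpoint_monthly = processing_cost(100, GATEWAY_ENDPOINT_USD_PER_GB, days=30)
print(f"NAT: ${nat_daily:.2f}/day, ${nat_monthly:.2f}/30 days")
print(f"Gateway endpoint: ${endpoint_monthly:.2f}/30 days")
```

Plugging in your own daily volume gives a quick estimate of what an S3 gateway endpoint would save before you touch any route tables.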

The security angle matters for production workloads. VPC endpoints can have endpoint policies that restrict access to specific buckets or actions. A misconfigured application that tries to list all S3 buckets gets denied by the endpoint policy even if the IAM role allows s3:* on all resources. This is defense in depth at the network layer: IAM says what the role can do, the endpoint policy says what can actually be reached. Without the endpoint policy, the IAM permission is the only control, and a typo in a bucket policy or an overly broad IAM statement becomes the only thing stopping a data breach.
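To make the "both must allow" idea concrete, here is a deliberately simplified model of policy evaluation. It is not the real IAM engine: it ignores Deny statements and conditions, matches case-sensitively, and uses shell-style wildcards as a stand-in for IAM wildcards. The policies and helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def allows(policy: dict, action: str, resource: str) -> bool:
    """True if any Allow statement matches. Simplified: ignores Deny and Condition."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        action_ok = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


def reachable(iam_policy: dict, endpoint_policy: dict, action: str, resource: str) -> bool:
    """A request through the endpoint must be allowed by both policies."""
    return allows(iam_policy, action, resource) and allows(endpoint_policy, action, resource)


# Overly broad role policy, narrowed by the endpoint policy
IAM_ROLE_POLICY = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
ENDPOINT_POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-app-bucket/*"}
    ]
}

print(reachable(IAM_ROLE_POLICY, ENDPOINT_POLICY, "s3:GetObject", "arn:aws:s3:::my-app-bucket/report.csv"))
print(reachable(IAM_ROLE_POLICY, ENDPOINT_POLICY, "s3:ListAllMyBuckets", "*"))
```

The first request passes both checks; the second is allowed by the role but stopped by the endpoint policy, which is the layering described above.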

NAT gateway use cases that VPC endpoints cannot replace are straightforward to identify. Patching (yum or apt downloads from Amazon Linux or Ubuntu repos), downloading packages from public registries (npm, pip, Docker Hub), calling third-party APIs (Stripe, Twilio) that are not AWS services, and downloading software from vendor URLs all require outbound internet access. VPC endpoints only cover AWS services. If your private instances need any of these, a NAT gateway is not optional.

The bill shock scenario catches teams regularly. Teams that route S3 traffic through NAT gateways accumulate data processing fees that surprise them at month end. At $0.045 per GB, a 100GB per day data pipeline through NAT costs $4.50 per day in processing fees. Through a gateway endpoint it costs $0. Gateway endpoints for S3 and DynamoDB are free in the sense that there is no hourly cost and no per-GB charge; interface endpoints for other services are billed hourly and per GB, though typically at a lower per-GB rate than NAT.

Route table behavior is where teams get confused. A gateway endpoint adds a route to each route table you associate it with; the destination is the AWS-managed prefix list for the service (for example, the S3 prefix list) and the target is the endpoint. Because those prefixes are more specific than 0.0.0.0/0, longest-prefix matching sends S3 traffic to the endpoint and everything else to the NAT gateway. Two things follow. Subnets whose route tables are not associated with the endpoint keep sending S3 traffic through the NAT, and the endpoint is not a catch-all redirect: only traffic destined for the service's prefix list uses it.
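Route selection of this kind follows the usual most-specific-route rule, which can be sketched with Python's standard `ipaddress` module. The route table below is hypothetical, and the single S3 CIDR is only a stand-in for one entry of an AWS-managed prefix list, which in practice contains many ranges.

```python
import ipaddress

# Hypothetical route table for a private subnet
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "nat-gateway"),
    ("52.216.0.0/15", "s3-gateway-endpoint"),  # stand-in for an S3 prefix-list entry
]


def next_hop(destination: str) -> str:
    """Pick the target of the most specific route that contains the destination."""
    ip = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTES
        if ip in ipaddress.ip_network(cidr)
    ]
    return max(matches, key=lambda match: match[0].prefixlen)[1]


for address in ("52.216.10.5", "151.101.1.1", "10.0.10.7"):
    print(address, "->", next_hop(address))
```

Only destinations inside the service prefix take the endpoint route; everything else still falls through to the default route.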

Factor	VPC Endpoints	NAT Gateway
Cost	Gateway endpoints (S3, DynamoDB): free; interface endpoints: hourly + per-GB fee	$0.045/GB data processed + hourly fee
Latency	1-3ms lower (stays on AWS backbone)	Higher (exits VPC, internet, re-enters AWS)
Security	Endpoint policies restrict access to specific buckets/actions	No endpoint policy, relies on IAM only
Use case	AWS services only (S3, DynamoDB, Secrets Manager)	Any outbound internet (patching, public APIs, registries)
Route priority	Prefix-list route is more specific than 0.0.0.0/0, so it wins	0.0.0.0/0 catches all unmatched traffic
Best for	Private subnets accessing AWS services	Private subnets needing internet access

Customer-Managed KMS Keys vs. Cloud-Managed Keys

Customer-managed KMS keys cost roughly $1 per month per key plus per-request charges for API calls. In return, you control the key policy, can restrict which principals can use it, and can allow cross-account access by editing the policy directly. Key rotation happens on a schedule you define, and every Encrypt/Decrypt operation is logged in CloudTrail with the key ID, caller identity, and timestamp. If an auditor asks which identities accessed your encryption keys last quarter, you can answer that question with CloudTrail.

Cloud-managed keys (AWS managed keys) have no monthly fee and rotate automatically every year. AWS handles the policy and access control. The limitation is that you can view the key policy but cannot modify it, so you cannot restrict which principals use the key or share it across accounts. This is fine for development environments and non-sensitive workloads. It is a real problem when your compliance framework requires you to control who has access to encryption keys, or when you need to share encrypted data with a workload in another account.

The risk with customer-managed keys that catches teams off guard is deleting a key that still protects data. KMS never deletes a key immediately: you schedule deletion with a waiting period of 7 to 30 days (30 is the default), and you can cancel at any point during that window. Once the window expires, the key material is destroyed and any data still encrypted under it is gone for good. The waiting period exists because teams have permanently lost access to production data this way. Always set the deletion window to the maximum and verify nothing is still encrypted under the key before you delete it.
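A small guard like the following can sit in deployment tooling to reject short deletion windows and report when a key would actually disappear. It is a sketch under the 7-to-30-day bounds mentioned above; the function name and the choice to default to the maximum are assumptions, not part of any AWS SDK.

```python
from datetime import date, timedelta

MIN_WINDOW_DAYS = 7
MAX_WINDOW_DAYS = 30


def deletion_date(scheduled_on: date, window_days: int = MAX_WINDOW_DAYS) -> date:
    """Date on which a key scheduled for deletion stops being recoverable."""
    if not MIN_WINDOW_DAYS <= window_days <= MAX_WINDOW_DAYS:
        raise ValueError(
            f"window must be between {MIN_WINDOW_DAYS} and {MAX_WINDOW_DAYS} days"
        )
    return scheduled_on + timedelta(days=window_days)


print(deletion_date(date(2026, 3, 1)))      # maximum window
print(deletion_date(date(2026, 3, 1), 7))   # minimum window
```

Defaulting to the longest window keeps the safe choice the easy one; callers have to opt in to anything shorter.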

IAM Best Practices

Identity and Access Management (IAM) is the foundation of cloud security. Every request to a cloud API requires authentication and authorization. IAM policies determine what identities can do what operations on which resources.

The cardinal rule is least privilege: grant only the permissions required for a task, and nothing more. This applies to human users, service accounts, and compute workloads.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ReadOnlyForApplication",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-app-bucket", "arn:aws:s3:::my-app-bucket/*"]
    }
  ]
}
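Least privilege is easier to keep when wildcard grants are flagged automatically. The following is a minimal sketch of such a check, assuming policies arrive as JSON text; the function name and the two sample policies are illustrative, and a real review would also look at conditions and NotAction.

```python
import json


def find_wildcards(policy_json: str) -> list[str]:
    """Return a finding for every Allow statement with a wildcard action or resource."""
    findings = []
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        label = stmt.get("Sid", f"statement {index}")
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"{label}: wildcard action {action}")
        if "*" in resources:
            findings.append(f"{label}: wildcard resource")
    return findings


SCOPED = """{"Version": "2012-10-17", "Statement": [{
  "Sid": "S3ReadOnlyForApplication", "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::my-app-bucket", "arn:aws:s3:::my-app-bucket/*"]}]}"""
BROAD = """{"Version": "2012-10-17", "Statement": [{
  "Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}"""

print(find_wildcards(SCOPED))
print(find_wildcards(BROAD))
```

The scoped policy from this section produces no findings, while the broad one is flagged for both its action and its resource.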

Avoid attaching policies directly to users. Instead, create groups for roles, add users to groups, and attach policies to groups. This makes permission management systematic rather than ad hoc.

# Create a group
aws iam create-group --group-name developers

# Attach a policy to the group
aws iam attach-group-policy \
  --group-name developers \
  --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess

# Add a user to the group
aws iam add-user-to-group \
  --group-name developers \
  --user-name alice

Regularly audit IAM configurations. AWS IAM Access Analyzer, Azure AD external identities, and GCP Policy Analyzer can identify permissions that grant external access or violate least privilege. Remove unused access keys, deactivate old credentials, and rotate secrets on a schedule.
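The key-age part of that audit is simple to automate once you have the key metadata. The sketch below assumes records shaped like the output of `aws iam list-access-keys` (`AccessKeyId`, `Status`, `CreateDate`); the 90-day threshold and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(keys: list[dict], now: datetime, max_age_days: int = 90) -> list[str]:
    """IDs of active access keys created more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        key["AccessKeyId"]
        for key in keys
        if key["Status"] == "Active" and key["CreateDate"] < cutoff
    ]


# Sample data using AWS's documented example key IDs
NOW = datetime(2026, 6, 17, tzinfo=timezone.utc)
KEYS = [
    {"AccessKeyId": "AKIAIOSFODNN7EXAMPLE", "Status": "Active",
     "CreateDate": datetime(2025, 1, 10, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAI44QH8DHBEXAMPLE", "Status": "Active",
     "CreateDate": datetime(2026, 6, 1, tzinfo=timezone.utc)},
]

print(stale_keys(KEYS, NOW))
```

Inactive keys are skipped here because they can no longer be used, though they are still worth deleting.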

Service Accounts and Workload Identity

Human users are not the only identities in cloud environments. Compute workloads—EC2 instances, containers, Lambda functions—need permissions to access other AWS services. The question is how those workloads authenticate.

Embedding long-lived access keys in configuration files, machine images, or environment variables is risky. The keys outlive the workload and can be exfiltrated from logs, images, or the process environment.

Workload identity is the solution. Instead of storing credentials, workloads assume a role using short-lived tokens. The role permissions are scoped to what the workload actually needs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

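For Kubernetes workloads, cloud providers can map Kubernetes service accounts to cloud IAM roles by federating the cluster's projected service account tokens. On AWS this is IAM Roles for Service Accounts (IRSA) or EKS Pod Identity; GKE and AKS offer their own workload identity features. This lets you give a service account specific IAM permissions without managing cloud credentials. In the example below, the annotation links the service account to an IAM role, while the Role and RoleBinding are Kubernetes RBAC and only govern access to the Kubernetes API.

# Kubernetes service account linked to an IAM role (IRSA on EKS)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: production
  annotations:
    # Placeholder ARN: substitute the role your workload should assume
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-app-role
---
# Kubernetes RBAC: what the service account may do inside the cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-role-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-app-role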

VPC and Network Isolation

Network isolation in cloud environments uses virtual private clouds (VPCs) with subnet segmentation. The principle is straightforward: nothing should be directly accessible from the internet unless intentionally exposed.

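# Availability zones referenced by the subnets below
data "aws_availability_zones" "available" {
  state = "available"
}

# VPC with public and private subnets
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
}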

# Public subnets for load balancers
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Type = "Public"
  }
}

# Private subnets for application servers
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Type = "Private"
  }
}

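# Internet gateway: the NAT gateway needs it to reach the internet
# (public subnets also need a 0.0.0.0/0 route to it, not shown here)
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# NAT gateway for outbound traffic from private subnets
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.main]
}

# Private route table: default route goes out through the NAT gateway
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

# Without this association the private subnets fall back to the main route table
resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}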

Application servers sit in private subnets and cannot be reached directly from the internet. Load balancers in public subnets route traffic to application servers. Database and cache servers sit in private subnets with no internet access at all.
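It helps to know which address ranges the `cidrsubnet` calls above actually produce. The function below is an illustrative Python re-implementation of Terraform's `cidrsubnet` for IPv4 only, written for this explanation rather than taken from any library.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """IPv4-only sketch of Terraform's cidrsubnet(prefix, newbits, netnum)."""
    network = ipaddress.IPv4Network(prefix)
    new_len = network.prefixlen + newbits
    if new_len > 32 or not 0 <= netnum < 2 ** newbits:
        raise ValueError("newbits or netnum out of range for this prefix")
    # Shift the subnet number into the bits just below the original prefix
    base = int(network.network_address) + (netnum << (32 - new_len))
    return str(ipaddress.IPv4Network((base, new_len)))


public = [cidrsubnet("10.0.0.0/16", 8, index) for index in range(2)]
private = [cidrsubnet("10.0.0.0/16", 8, index + 10) for index in range(2)]
print("public: ", public)
print("private:", private)
```

With the values used in this section, the public subnets come out as 10.0.0.0/24 and 10.0.1.0/24 and the private subnets as 10.0.10.0/24 and 10.0.11.0/24, which is worth keeping in mind when writing CIDR-based security group rules.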

Security groups act as instance-level firewalls. They are stateful: allowing inbound traffic automatically allows outbound response traffic.

# Security group for web servers
resource "aws_security_group" "web" {
  name        = "web-servers"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.main.id

  # Allow inbound HTTP from the load balancer subnets
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = aws_subnet.public[*].cidr_block # Public subnets that host the load balancer
  }

  # Allow outbound to internet via NAT
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Encryption at Rest and in Transit

Encrypt data wherever it lives. Cloud providers offer encryption at rest by default for most services, using KMS keys you control or provider-managed keys.

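# S3 bucket with encryption
resource "aws_s3_bucket" "data" {
  bucket = "my-sensitive-data"
}

# Default encryption is configured as its own resource in current AWS provider versions
resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data.arn
    }
  }
}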

# KMS key with restricted usage
resource "aws_kms_key" "data" {
  description             = "KMS key for sensitive data"
  deletion_window_in_days  = 30
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
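      {
        Sid = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::123456789012:root" # Placeholder 12-digit account ID
        }
        Action = "kms:*"
        Resource = "*"
      },
      {
        Sid = "Allow use by application"
        Effect = "Allow"
        Principal = {
          # Placeholder: the IAM role your application runs as
          AWS = "arn:aws:iam::123456789012:role/my-app-role"
        }
        # S3 SSE-KMS needs GenerateDataKey for writes and Decrypt for reads
        Action = ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"]
        Resource = "*"
      }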
    ]
  })
}

TLS encrypts data in transit. Force HTTPS on all public endpoints. Use TLS for connections between services, especially when they cross network boundaries. Certificate management can be automated with services like AWS Certificate Manager or Let’s Encrypt.
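On the client side of a service-to-service call, enforcing TLS mostly means not weakening the defaults. The following is a minimal sketch using Python's standard `ssl` module; the function name is hypothetical, and the TLS 1.2 floor is an assumed policy rather than a requirement stated in this post.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client TLS context that verifies certificates and hostnames."""
    # create_default_context() enables certificate and hostname verification
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (assumed policy)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_client_context()
print(context.verify_mode, context.check_hostname, context.minimum_version)
```

For mutual TLS between services, the same context could additionally load a client certificate with `context.load_cert_chain(...)`, pointing at certificate files issued by your internal CA.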

Security Groups and Firewall Rules

Security groups should be as restrictive as possible. Start with deny all inbound, allow specific ports and sources.

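# Database security group - minimal access
resource "aws_security_group" "database" {
  name        = "database"
  description = "Security group for RDS instance"
  vpc_id      = aws_vpc.main.id

  # Only the application tier can reach PostgreSQL, via a security group
  # reference (aws_security_group.app is assumed to be defined elsewhere)
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  # No egress block: Terraform removes the default allow-all egress rule, and
  # replies to inbound connections still flow because the group is stateful
}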

Network ACLs provide subnet-level filtering as a secondary control. Security groups handle instance-level filtering. Use both together: NACLs for subnet-wide rules like blocking a specific IP range, security groups for instance-specific access control.
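NACL evaluation differs from security groups in one important way: rules are checked in ascending rule-number order and the first match decides. The model below is a simplified sketch of that behavior for inbound traffic; the rule set and class are hypothetical, and protocol matching is left out.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NaclRule:
    number: int
    cidr: str
    port: Optional[int]  # None means all ports
    allow: bool


def nacl_allows(rules: list, source_ip: str, port: int) -> bool:
    """Evaluate rules in ascending number order; the first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr) and rule.port in (None, port):
            return rule.allow
    return False  # the implicit final rule denies anything unmatched


RULES = [
    NaclRule(number=200, cidr="0.0.0.0/0", port=443, allow=True),
    NaclRule(number=100, cidr="203.0.113.0/24", port=None, allow=False),  # blocked range
]

print(nacl_allows(RULES, "203.0.113.9", 443))
print(nacl_allows(RULES, "198.51.100.4", 443))
print(nacl_allows(RULES, "198.51.100.4", 22))
```

Because NACLs are stateless, a real configuration also needs matching outbound rules for the return traffic, typically on ephemeral ports; security groups handle that automatically.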

VPC endpoint policies restrict which actions are allowed through VPC endpoints. Without endpoints, traffic to S3 and DynamoDB goes out through a NAT or internet gateway to the service's public endpoint. Endpoints keep that traffic on private routes, but the default endpoint policy allows full access, so write an explicit policy to control what can be reached.

Cloud-Native Security Services

Each major cloud provider offers security services that layer on top of basic IAM and networking.

AWS Security Hub aggregates findings from GuardDuty, Inspector, and Macie. Microsoft Defender for Cloud (formerly Azure Security Center) and GCP Security Command Center play similar roles. These services provide centralized visibility and compliance monitoring across your cloud footprint.

Cloud-native firewalls and WAFs filter traffic at the edge. AWS WAF works with CloudFront and Application Load Balancers, Azure WAF with Application Gateway, and GCP Cloud Armor with Cloud CDN and load balancers. If you expose any HTTP service, treat a WAF as mandatory: public HTTP endpoints are the first thing attackers probe.

Logging and monitoring make incident response possible. CloudTrail logs management API calls in your account (and data events if you enable them), VPC Flow Logs capture metadata about network connections, and GuardDuty uses machine learning to flag anomalies. Route these to a SIEM or analytics platform. Without them, you are blind to what is happening in your environment.

Defense-in-Depth Architecture

flowchart TD
    A[Internet Traffic] --> B[WAF / Cloud Firewall]
    B --> C[Load Balancer]
    C --> D[Security Groups]
    D --> E[Application Tier]
    E --> F[Database Tier]
    F --> G[KMS Encryption]
    A --> H[IAM Authentication]
    H --> E
    E --> I[VPC Endpoints]
    I --> J[S3 / DynamoDB]

Trade-off Analysis

Security Control	Complexity	Security Benefit	Best For
Customer-managed KMS keys	High	Full audit and rotation control	Regulated workloads, cross-account access
Cloud-managed KMS keys	Low	Automatic rotation, no cost	Development, non-sensitive workloads
VPC endpoints	Medium	Traffic stays internal, lower cost	Private access to S3, DynamoDB from private subnets
NAT gateway for private traffic	Medium	Outbound-only internet for private subnets	Patching, external API calls from private instances
Security groups	Low	Instance-level stateful firewall	Primary network isolation for compute
NACLs	Medium	Subnet-level stateless filtering	Broad subnet rules, blocking specific CIDRs
IAM roles over user credentials	Low	Short-lived tokens, no credential management	All compute workloads

Production Failure Scenarios

Failure	Impact	Mitigation
IAM role trust policy misconfiguration locking out resources	Resources cannot assume roles, deployments fail	Use AWS Access Analyzer before deploying, test trust policies in dev
KMS key scheduled for deletion while data is still encrypted under it	Encrypted data becomes unrecoverable once the waiting period ends	Use the maximum 30-day deletion window, alert on scheduled deletions, never delete keys that protect production data
Security group overly restrictive blocking legitimate traffic	Application cannot connect to dependencies, outages	Always test security group changes in staging first, use descriptive names
VPC endpoint policy denying required S3 access	Application cannot read from S3, deployments fail	Explicitly list required actions in endpoint policy, test after changes
CloudTrail not enabled for all regions	Attack activity in disabled regions goes unlogged	Enable CloudTrail across all regions, aggregate to single bucket

Cloud Security Observability

What to monitor:

CloudTrail monitors all API calls. Enable it in all regions and route logs to a centralized bucket with object lock to prevent tampering.

GuardDuty monitors for compromised workloads. Review findings daily and route alerts to your security team’s notification channel.

Security Hub aggregates findings from GuardDuty, Inspector, and Macie into a unified view. Enable all integrated services for complete coverage.

VPC Flow Logs record source and destination IPs, ports, and bytes transferred. Use Flow Logs to detect lateral movement and unusual traffic patterns.
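As an example of what Flow Log analysis looks like, the sketch below parses records in the default (version 2) space-separated format and flags accepted SSH connections that originate outside the VPC. The field order follows the default format; the sample records, the 10.0.0.0/16 VPC range, and the function names are assumptions for illustration.

```python
import ipaddress

# Field order of the default (version 2) VPC Flow Log record format
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str) -> dict:
    return dict(zip(FIELDS, line.split()))


def is_external_ssh(record: dict, vpc_cidr: str = "10.0.0.0/16") -> bool:
    """True for accepted TCP/22 connections whose source is outside the VPC."""
    if record["action"] != "ACCEPT" or record["dstport"] != "22" or record["protocol"] != "6":
        return False
    return ipaddress.ip_address(record["srcaddr"]) not in ipaddress.ip_network(vpc_cidr)


SAMPLES = [
    "2 123456789012 eni-0a1b2c3d 198.51.100.7 10.0.10.15 49152 22 6 20 4249 1718000000 1718000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.0.12 10.0.10.15 49153 22 6 12 2100 1718000000 1718000060 ACCEPT OK",
]

for sample in SAMPLES:
    record = parse_record(sample)
    print(record["srcaddr"], "->", record["dstaddr"], is_external_ssh(record))
```

The same pattern extends to other lateral-movement signals, for instance accepted connections between subnets that should never talk to each other.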

Key commands and queries:

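# List recent CloudTrail events
aws cloudtrail lookup-events --max-results 10

# Get high-severity GuardDuty findings (severity is numeric; 7 and above is high)
aws guardduty list-findings \
  --detector-id abc123 \
  --finding-criteria '{"Criterion": {"severity": {"GreaterThanOrEqual": 7}}}'

# Query VPC Flow Logs for port 22 access over the last hour (GNU date syntax)
aws logs start-query \
  --log-group-name /aws/vpc/flow-logs \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --query-string 'fields srcAddr, dstAddr, dstPort, action | filter dstPort = 22 | limit 20'

# Fetch the results using the queryId returned by start-query
aws logs get-query-results --query-id <query-id>

# Check IAM Access Analyzer findings (account ID and region are placeholders)
aws accessanalyzer list-findings \
  --analyzer-arn arn:aws:access-analyzer:us-east-1:123456789012:analyzer/my-analyzer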

Common Pitfalls / Anti-Patterns

Using AWS root account for daily operations. The root account has full permissions and cannot be restricted by IAM policies. Use root account only for initial setup, then switch to IAM users and roles for everything else.

Over-permissive IAM roles. Granting *:* or AdministratorAccess to workloads because it is faster than scoping permissions defeats the purpose of least privilege. Start with minimal permissions and add only what the workload actually needs.

Leaving security groups open to 0.0.0.0/0. Allowing all inbound traffic to a database or cache port from anywhere on the internet is a common breach vector. Security groups should restrict access to known CIDRs or specific security groups.

Not enabling encryption by default. Some services allow creating unencrypted resources by default. Enforce encryption through service control policies or AWS Config rules so new resources cannot be created without encryption.

Forgetting to rotate access keys. Long-lived access keys on service accounts are a common exfiltration target. Rotate keys regularly, use short-lived credentials via IAM roles wherever possible.
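Several of these pitfalls can be caught with simple checks before deployment. As one example, the open-security-group pitfall above can be sketched as a scan over ingress rules. The rule shape mirrors the Terraform `ingress` blocks used in this post, and the list of sensitive ports is an assumption you would tailor to your environment.

```python
SENSITIVE_PORTS = {22: "SSH", 3306: "MySQL", 5432: "PostgreSQL", 6379: "Redis"}


def open_sensitive_rules(ingress_rules: list) -> list:
    """Names of sensitive services reachable from 0.0.0.0/0."""
    exposed = []
    for rule in ingress_rules:
        if "0.0.0.0/0" not in rule["cidr_blocks"]:
            continue
        all_traffic = rule.get("protocol") == "-1"  # "-1" means every protocol and port
        for port, service in SENSITIVE_PORTS.items():
            if all_traffic or rule["from_port"] <= port <= rule["to_port"]:
                exposed.append(service)
    return exposed


RULES = [
    {"from_port": 443, "to_port": 443, "protocol": "tcp", "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 5432, "to_port": 5432, "protocol": "tcp", "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 22, "to_port": 22, "protocol": "tcp", "cidr_blocks": ["10.0.0.0/16"]},
]

print(open_sensitive_rules(RULES))
```

Public HTTPS is left alone, the internal SSH rule passes, and only the internet-facing PostgreSQL rule is reported.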

Trade-off Analysis (Tools)

Security Tool	Preventative vs Detective	CI/CD vs Runtime	Cost
Cloud-native (GuardDuty, Security Hub, Defender)	Detective	Runtime	Usage-based
CSPM (Prisma Cloud, Wiz)	Both	Runtime	Per-asset pricing
SAST / IaC scanning	Preventative	CI/CD	Tool cost
Secret scanning (Gitleaks, TruffleHog)	Preventative	CI/CD	Free / paid tiers
Runtime security (Falco, Sysdig)	Detective	Runtime	Infrastructure + license
SIEM (Splunk, Elastic)	Detective	Runtime	High (licensing + storage)

Real-world Failure Scenarios

Company / Context	Failure	Consequence	Lesson Learned
Target breach (2013)	Stolen HVAC vendor credentials used to reach POS systems	70 million customer records exposed	Segment networks; vendor access should never reach POS systems, even with valid credentials
Capital One breach (2019)	SSRF through a misconfigured WAF exposed credentials for an overly permissive IAM role with S3 access	100 million customer records exposed	Scope role permissions to the buckets they need; audit IAM and trust policies regularly
Toyota data exposure (2019)	S3 bucket public; CloudTrail not enabled for region	Customer data accessible; attack undetected	Enable CloudTrail everywhere; Block public S3 access by default
Meow attacks (2020)	Elasticsearch and MongoDB instances exposed to the internet with no authentication	Thousands of databases wiped, with no ransom demand	Network access controls alone are insufficient; require authentication on all data stores
SolarWinds supply chain attack (2020)	Software build process compromised; malicious update pushed	About 18,000 organizations installed the trojanized update	Verify software supply chain integrity; sign releases; monitor for anomalous build behavior

Interview Questions

1. Explain the principle of least privilege in cloud IAM. How do you implement it?

Least privilege means granting exactly the permissions needed for a task and nothing more. It applies to human users, service accounts, and compute workloads.

Implementation starts with understanding what permissions your identities actually need rather than defaulting to broad policies. Use IAM Access Analyzer to identify external access and policy simulators to test policies before deployment. Create groups for roles, attach policies to groups, and add users to groups rather than attaching policies directly to users. Regularly audit unused access keys and deactivate old credentials.

2. What's the difference between VPC endpoints and NAT gateways? When would you use each?

VPC endpoints let resources in your VPC reach AWS services like S3 or DynamoDB without traffic leaving the AWS network. For that use case they are faster, cheaper, and more secure than NAT.

NAT gateways handle outbound internet access for private instances—for patching, downloading packages, calling external APIs. They don't allow inbound connections from the internet.

Use VPC endpoints for private access to AWS services. Use NAT gateways when private instances need to reach the internet outbound. If you're routing S3 traffic through NAT, you're paying unnecessary data processing charges and adding latency.

3. How do workload identities work and why are they preferred over long-lived credentials?

Workload identities let compute workloads like EC2 instances, containers, or Lambda functions assume IAM roles using short-lived tokens instead of storing long-lived access keys. The role permissions are scoped to what the workload actually needs.

Long-lived access keys stored in configuration files or environment variables persist beyond the workload lifecycle and can be exfiltrated from logs or the process environment. Workload identity removes static credentials from code entirely: Azure uses managed identities, AWS uses IAM roles (instance profiles, IRSA, and EKS Pod Identity), and GCP uses workload identity federation.

4. Describe a defense-in-depth architecture for a three-tier web application on AWS.

Start at the edge: WAF or Cloud Firewall filtering malicious traffic before it reaches your load balancer. The load balancer sits in public subnets, routing to application servers in private subnets. Security groups on application servers allow only traffic from the load balancer.

Database servers sit in private subnets with no internet access at all, reachable only from application tier security groups. KMS encrypts data at rest. IAM roles handle authentication for any AWS service access.

For Kubernetes workloads, network policies restrict pod-to-pod communication, and service mesh adds mTLS between services. VPC endpoints keep traffic to S3 and DynamoDB internal.

5. How do you design security group rules for a database tier that must be accessed only by application servers?

Database security groups should have no inbound rules from 0.0.0.0/0 ever. The only inbound access comes from the application tier security group via security group references.

Configure the database security group to accept inbound PostgreSQL or MySQL traffic only from the application tier security group ID. This means the rule looks like: port 5432, source security group sg-xxxxxxxx. When application servers scale, they automatically get database access. When database servers scale, they inherit the same restrictions.

Outbound rules should be minimal—typically only to the application tier or specific external services the database needs to reach.

6. What's the difference between customer-managed KMS keys and cloud-managed keys? When would you choose each?

Customer-managed KMS keys give you control over key rotation, key policies, and cross-account access. You can inspect key policies and define exactly who can use the key. They cost money but provide audit trails.

Cloud-managed keys have no monthly fee and need no management: AWS handles rotation and policies. You can view their key policies but cannot modify them. They're fine for development and non-sensitive workloads.

Choose customer-managed keys for regulated workloads, production data requiring compliance controls, and scenarios where you need cross-account access to keys. Choose cloud-managed keys for development, test, and non-sensitive data where you want to minimize operational overhead.

7. How do you prevent a compromised IAM role from exfiltrating data from S3?

Layer several controls. First, scope the role's permissions narrowly—if it only needs read access to specific prefixes, grant only s3:GetObject on those prefixes, not s3:*.

Second, use VPC endpoints with endpoint policies that restrict S3 actions to specific buckets. Without an endpoint, traffic from private subnets reaches S3 through a NAT gateway and S3's public endpoints. With an endpoint policy, a compromised role inside the VPC cannot copy data to a bucket outside your allow list, and you can deny actions like PutObject if the role shouldn't be writing data.

Third, enable S3 Block Public Access at the account level. Fourth, use CloudTrail with S3 data events to detect unusual GetObject patterns—large volumes of downloads from an unusual location or at unusual times.
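
The first two layers can be sketched as policy documents. This is a minimal illustration, not a complete policy set. The bucket name, prefixes, and function names are hypothetical, and the allowed actions in the endpoint policy are an assumption about what a read-only workload needs.

```python
import json

def read_only_prefix_policy(bucket: str, prefixes: list[str]) -> dict:
    """Identity policy: s3:GetObject on the listed prefixes only, never s3:*."""
    resources = [f"arn:aws:s3:::{bucket}/{p.strip('/')}/*" for p in prefixes]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ReadSpecificPrefixes",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": resources,
        }],
    }

def s3_endpoint_policy(allowed_buckets: list[str]) -> dict:
    """VPC endpoint policy: only approved buckets are reachable through the endpoint."""
    resources = []
    for name in allowed_buckets:
        resources += [f"arn:aws:s3:::{name}", f"arn:aws:s3:::{name}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowOnlyApprovedBuckets",
            "Effect": "Allow",
            "Principal": "*",
            # No PutObject here, so writes through the endpoint are not allowed.
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": resources,
        }],
    }

role_policy = read_only_prefix_policy("example-reports", ["daily/", "monthly"])
endpoint_policy = s3_endpoint_policy(["example-reports"])
print(json.dumps(role_policy, indent=2))
print(json.dumps(endpoint_policy, indent=2))
```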

8. What is a Service Control Policy (SCP) and when would you use one?

SCPs are guardrails in AWS Organizations that restrict what actions IAM users and roles can perform, regardless of individual IAM permissions. They don't grant permissions—they deny actions that would otherwise be allowed.

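Use SCPs to enforce organizational security standards. For example: deny S3 uploads that do not request server-side encryption, deny creation of IAM access keys, or require MFA for delete operations on specific resources. Checks that depend on existing state, such as access keys older than 90 days or security groups already open to 0.0.0.0/0, are generally a better fit for AWS Config rules, because an SCP only evaluates the API request being made.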

SCPs apply to all accounts in an OU, which makes them powerful for enforcing security baselines across your entire AWS footprint.
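
As a sketch of the MFA guardrail mentioned above, the following builds an SCP document that denies bucket deletion when the request was not authenticated with MFA. The choice of `s3:DeleteBucket` and the Sid are illustrative assumptions. Note that `BoolIfExists` also denies role sessions that were assumed without MFA, so automation roles may need an exemption.

```python
import json

# Illustrative SCP: deny deleting buckets unless the caller authenticated with MFA.
require_mfa_for_delete_scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyBucketDeleteWithoutMFA",
        "Effect": "Deny",
        "Action": ["s3:DeleteBucket"],
        "Resource": "*",
        "Condition": {
            # BoolIfExists also matches requests where the key is absent,
            # such as calls made with long-term access keys.
            "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
        },
    }],
}

print(json.dumps(require_mfa_for_delete_scp, indent=2))
```

The statement only denies. Permissions to delete buckets still have to come from an IAM policy inside the account.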

9. How do you handle encryption in transit for microservices communicating across availability zones?

For Kubernetes services, implement mTLS through a service mesh like Istio or Linkerd. The service mesh handles certificate rotation automatically—all pod-to-pod traffic is encrypted without application code changes.

For non-Kubernetes services, use TLS for all network communication. Either terminate TLS at the load balancer and re-encrypt traffic to the backends, or pass TLS through end-to-end. Certificate management can be automated with AWS Certificate Manager, or with cert-manager and Let's Encrypt on Kubernetes.

The key requirement is enforcing TLS everywhere—not just on public endpoints. Internal services should also use TLS, especially when crossing network boundaries like availability zones.
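
For a service that handles TLS itself rather than through a mesh, a minimal sketch of a server-side context that requires client certificates (mutual TLS) might look like this. It uses Python's standard `ssl` module. The function name is hypothetical, and the file paths are placeholders you would supply.

```python
import ssl

def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server-side TLS context that refuses clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject older protocol versions
    ctx.verify_mode = ssl.CERT_REQUIRED           # client must present a certificate
    if ca_file:
        # CA bundle used to validate client certificates (internal CA assumed).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

# Without file paths this only demonstrates the settings; a real server needs all three.
context = make_mtls_server_context()
print(context.verify_mode, context.minimum_version)
```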

10. A company's security group accidentally allows all inbound traffic. What happens and how do you detect it?

If a security group allows all inbound traffic (0.0.0.0/0 for all ports), any resource with that security group that is reachable from the internet, such as an instance with a public IP in a public subnet, is fully exposed. Port scanners will find it within hours.

Detection methods: AWS Config rules can detect overly permissive security groups and alert or auto-remediate. GuardDuty can flag unusual inbound traffic patterns. VPC Flow Logs capture all traffic including the permissive rule's effect. Security Hub aggregates these findings.

Prevention: Use AWS Config rules with remediation to automatically close security groups that open ports to 0.0.0.0/0. Set up AWS Security Hub to alert on configuration changes. Never allow 0.0.0.0/0 for database or cache ports—these should only accept traffic from specific security groups or CIDRs you control.
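
A custom check for this condition can be sketched in a few lines. The example assumes input shaped like the `SecurityGroups` list returned by the EC2 `describe_security_groups` call. The sample data and function name are hypothetical.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}

def find_open_ingress(security_groups):
    """Return one finding per ingress permission that is open to the whole internet."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            open_cidrs = sorted(c for c in cidrs if c in OPEN_CIDRS)
            if open_cidrs:
                findings.append({
                    "GroupId": sg["GroupId"],
                    "Protocol": perm.get("IpProtocol"),  # "-1" means all protocols
                    "FromPort": perm.get("FromPort"),    # absent when all ports are open
                    "ToPort": perm.get("ToPort"),
                    "Cidrs": open_cidrs,
                })
    return findings

# Hypothetical sample data in the describe_security_groups response shape.
sample_groups = [
    {"GroupId": "sg-aaa", "IpPermissions": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-bbb", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "UserIdGroupPairs": [{"GroupId": "sg-app"}]}]},
]
open_rules = find_open_ingress(sample_groups)
print(open_rules)
```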

11. How do you implement and enforce encryption at rest for all S3 buckets in an organization?

Enforce encryption at rest through multiple layers. First, set default encryption on every bucket: S3 already applies SSE-S3 to new objects by default, so configure SSE-KMS where you need control over the key. Second, use S3 bucket policies that deny uploads that do not request the required encryption. Third, use AWS Config rules with remediation to detect and automatically fix noncompliant buckets.

For existing buckets, run a scan and remediate any unencrypted ones. Use S3 Inventory to track the encryption status of objects across buckets. For KMS encryption, ensure the key policy restricts key usage to the specific roles and services that need access.

Automate with a bucket creation workflow that applies the default encryption configuration and the enforcing bucket policy. A Lambda function triggered by bucket-creation events can apply these settings automatically.
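
The bucket policy layer can be sketched as follows. This is a minimal example of the common deny-unless-encrypted pattern, assuming SSE-KMS is the required algorithm. The bucket name and function name are hypothetical.

```python
import json

def require_kms_uploads_policy(bucket: str) -> dict:
    """Bucket policy that rejects PutObject requests not asking for SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUploadsWithoutKmsEncryption",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                # Also matches requests that omit the header entirely.
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
            },
        }],
    }

kms_only_policy = require_kms_uploads_policy("example-data")
print(json.dumps(kms_only_policy, indent=2))
```

Because the condition also matches requests with no encryption header, clients relying only on the bucket default would be denied. Whether that strictness is wanted depends on the workload.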

12. What is the Shared Responsibility Model in cloud security and how does it affect your security posture?

The cloud provider (AWS, Azure, GCP) is responsible for the security OF the cloud: physical data centers, hardware, networking infrastructure, hypervisor. You are responsible for security IN the cloud: data, identity, access management, application code, operating systems, network configuration.

The split varies by service type. For IaaS (EC2, VPC), you manage the OS and applications. For PaaS (RDS, Lambda), the provider manages the platform. For SaaS, most controls are provider-managed.

This means you cannot rely on the cloud provider to secure your data or configurations. Even on a "secure" cloud platform, misconfigured IAM, open security groups, and unencrypted data are your responsibility.

13. How do you detect and respond to a compromised IAM role being used for data exfiltration?

Detection starts with CloudTrail with S3 data events enabled. Set up Athena queries or GuardDuty to flag unusual GetObject patterns: large volumes from unusual IPs, at unusual times, or to unfamiliar geographic locations.

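Response steps: immediately revoke the role's active sessions. Temporary STS credentials cannot be deleted, so attach an inline deny policy to the role conditioned on aws:TokenIssueTime (the IAM console's "Revoke active sessions" action does this), which invalidates every session issued before that moment. Isolate the affected resources. Identify what data was accessed using CloudTrail logs. Identify the initial compromise vector (was it a GitHub secret, an exposed credential, a phishing attack?).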

Prevent exfiltration by using VPC endpoints with endpoint policies to restrict S3 access to specific buckets. Enable S3 Block Public Access. Use service control policies to prevent overly permissive access to S3 from any account.
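
Revoking a role's active sessions comes down to an inline deny policy keyed on when the session token was issued. A minimal sketch of that policy document, with a hypothetical function name, is below. Attaching it to the role (for example with the IAM `put_role_policy` call) is left out.

```python
import json
from datetime import datetime, timezone

def revoke_older_sessions_policy(cutoff=None) -> dict:
    """Deny everything for sessions whose token was issued before the cutoff."""
    cutoff = cutoff or datetime.now(timezone.utc)
    # Pass a timezone-aware datetime; the policy expects a UTC timestamp.
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
        }],
    }

policy = revoke_older_sessions_policy(
    datetime(2026, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
print(json.dumps(policy, indent=2))
```

Sessions issued after the cutoff keep working, so the workload can obtain fresh credentials once the compromise vector is closed.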

14. What is AWS Security Hub and how does it integrate with other AWS security services?

Security Hub is a centralized security findings aggregator. It collects findings from GuardDuty (threat detection), Inspector (vulnerability scanning), Macie (data classification), and Config (compliance monitoring). It normalizes findings into a common format and provides a unified dashboard across your AWS account or organization.

Enable Security Hub in each account and region. Use Security Hub standards (CIS AWS Foundations, PCI DSS) to check compliance against benchmarks. Route findings to a SIEM or ticketing system for response. Use automated remediation playbooks for common findings.

Without Security Hub, you would need to check each service individually and correlate findings manually. Security Hub automates this correlation and provides a single view of your security posture.

15. How do you design IAM policies for a federated identity scenario where external users need limited access?

Federation is appropriate when external users (contractors, partners, customers) need access without creating permanent IAM users. Use AWS IAM Identity Center (formerly SSO) or SAML-based federation with your identity provider (Okta, Azure AD, Google Workspace).

For external users, create an IAM role whose trust policy allows identities from your identity provider to assume it. Scope permissions to exactly what external users need. Set session duration to limit exposure. Enable MFA at the identity provider level, not just in AWS.

For programmatic access, use STS to issue short-lived credentials rather than access keys. Require an external ID in the role's trust policy when a third party needs cross-account access. Regularly audit which federated identities are active and remove unused ones.
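
The two trust relationships described above can be sketched as trust policy documents. The account IDs, provider name, external ID, and function names are all hypothetical placeholders.

```python
import json

def saml_trust_policy(account_id: str, provider_name: str) -> dict:
    """Trust policy for users federating in through a SAML identity provider."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
            },
            "Action": "sts:AssumeRoleWithSAML",
            "Condition": {
                "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
            },
        }],
    }

def third_party_trust_policy(partner_account_id: str, external_id: str) -> dict:
    """Trust policy for cross-account access that requires an agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

saml_policy = saml_trust_policy("111122223333", "ExampleIdP")
partner_policy = third_party_trust_policy("444455556666", "example-external-id")
print(json.dumps(saml_policy, indent=2))
print(json.dumps(partner_policy, indent=2))
```

The trust policy controls only who may assume the role. What the session can do is governed by the permissions policies attached to it.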

16. What is a Service Control Policy (SCP) and how does it differ from IAM policies?

SCPs are guardrails at the AWS Organization level that restrict what actions IAM users and roles can perform, regardless of their individual permissions. They do not grant permissions — they only deny actions that would otherwise be allowed. IAM policies grant permissions to users and roles within an account.

Use SCPs to enforce organizational security standards across all accounts. Example SCPs: deny S3 uploads that do not request server-side encryption, require MFA for delete operations on production resources, block access to specific regions. Detecting security groups opened to 0.0.0.0/0 is generally a better fit for AWS Config rules, because an SCP only evaluates the API request being made.

SCP hierarchy: SCPs attached to the organization root apply to every OU and account beneath it. An explicit deny in any SCP overrides any allow. If an SCP denies an action, no IAM policy within that account can permit it.
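
The interaction between SCPs and IAM policies can be illustrated with a deliberately simplified model. It ignores conditions, resources, and permission boundaries, and the data structures and function names are hypothetical. The point is only that an action must pass every level of the hierarchy and still needs an IAM allow.

```python
from fnmatch import fnmatchcase

def matches(action: str, patterns) -> bool:
    """Case-insensitive wildcard match, e.g. 's3:Get*' matches 's3:GetObject'."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

def is_permitted(action: str, iam_allow, scp_levels) -> bool:
    """scp_levels lists SCP effects from the organization root down to the account."""
    for level in scp_levels:
        if matches(action, level.get("deny", [])):
            return False  # an explicit deny at any level wins
        if not matches(action, level.get("allow", [])):
            return False  # the action must be allowed at every level
    # SCPs never grant access; the IAM policy still has to allow the action.
    return matches(action, iam_allow)

levels = [
    {"allow": ["*"]},                     # organization root
    {"allow": ["*"], "deny": ["ec2:*"]},  # OU that blocks EC2
]
print(is_permitted("ec2:RunInstances", ["*"], levels))
print(is_permitted("s3:GetObject", ["s3:Get*"], levels))
```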

17. How do you implement least privilege for a Kubernetes workload running on AWS EKS?

On EKS, use IAM roles for service accounts (IRSA) to assign AWS permissions to Kubernetes service accounts. Create an IAM role with a trust policy allowing the EKS cluster's OIDC provider to assume it, scoped to specific service account names in specific namespaces.

Within Kubernetes, use RBAC to grant permissions to service accounts, not to users. Network policies restrict pod-to-pod communication: with a default-deny policy in place, a compromised pod cannot reach other pods unless explicitly allowed. Limit what pods can do via Pod Security Standards.

Avoid giving pods cluster-admin or using default service accounts. Each workload should have its own service account with minimal RBAC permissions. Enable EKS Secrets encryption for sensitive data in etcd.
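
The IRSA trust policy described above can be sketched like this. The account ID, OIDC provider path, namespace, and service account name are hypothetical placeholders, as is the function name.

```python
import json

def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy that lets exactly one Kubernetes service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Pin the role to one service account in one namespace.
                f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{oidc_provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }

# Hypothetical cluster OIDC provider path (the issuer URL without "https://").
provider = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789"
trust = irsa_trust_policy("111122223333", provider, "payments", "invoice-worker")
print(json.dumps(trust, indent=2))
```

Using `StringEquals` on the `sub` claim, rather than a wildcard match, keeps other service accounts in the cluster from assuming the same role.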

18. What is the purpose of VPC Flow Logs and how do you use them for security monitoring?

VPC Flow Logs capture metadata about IP traffic on VPC network interfaces: source/destination IP, ports, protocol, bytes, action (accept/reject), and timestamps. They do not record packet contents. Enable Flow Logs at the VPC level to cover every subnet.

Security monitoring use cases: detect port scans (many connection attempts to different ports from the same IP), identify lateral movement (unusual traffic between subnets), find data exfiltration (large outbound transfers to unfamiliar IPs), and catch overly open security groups (accepted inbound connections from internet addresses on ports that should be closed).

Route Flow Logs to CloudWatch Logs or S3, then analyze with Athena or a SIEM. Set up CloudWatch alarms on unusual traffic patterns. Store logs with Object Lock to prevent tampering.
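
As a sketch of the port scan use case, the following parses records in the default flow log format and flags sources that touch many distinct destination ports on one target. The threshold, sample records, and function names are illustrative assumptions. A production detector would also bound the time window.

```python
from collections import defaultdict

# Field order of the default (version 2) flow log record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_record(line: str):
    """Return a dict for a well-formed record, or None for NODATA/SKIPDATA/garbage."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    return record if record["log_status"] == "OK" else None

def find_port_scanners(lines, min_ports: int = 20) -> dict:
    """Map (source, destination) to distinct port count when it reaches min_ports."""
    ports_by_pair = defaultdict(set)
    for line in lines:
        record = parse_record(line)
        if record is not None:
            ports_by_pair[(record["srcaddr"], record["dstaddr"])].add(record["dstport"])
    return {pair: len(ports) for pair, ports in ports_by_pair.items()
            if len(ports) >= min_ports}

# Synthetic sample: one source probing 25 ports, plus one ordinary connection.
sample = [
    f"2 111122223333 eni-0abc 203.0.113.5 10.0.1.10 40000 {port} 6 1 40 1700000000 1700000060 REJECT OK"
    for port in range(1000, 1025)
]
sample.append("2 111122223333 eni-0abc 10.0.2.20 10.0.1.10 51000 5432 6 10 840 1700000000 1700000060 ACCEPT OK")
scanners = find_port_scanners(sample)
print(scanners)
```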

19. How do you handle key rotation for customer-managed KMS keys without causing downtime?

Enable automatic key rotation for KMS keys. AWS rotates the key material once a year by default, without any code changes or downtime. The key ID and ARN stay the same, and older key material is retained so data encrypted under it still decrypts.

For manual rotation (if you need a different schedule or a different key): create a new KMS key, repoint the alias your applications use so new encryption goes to the new key, and keep the old key enabled so existing data can still be decrypted. Disable or delete the old key only after all data encrypted with it has been re-encrypted or is no longer needed, because a disabled key cannot decrypt.

Use envelope encryption: a data key encrypts data, the data key is encrypted by KMS key. Rotating the KMS key only requires re-encrypting data keys, not the data itself.

20. Describe a complete incident response workflow for a suspected cloud breach.

Detection: CloudTrail logs, GuardDuty findings, or unusual network traffic in VPC Flow Logs trigger the investigation. Confirm the breach through CloudTrail lookups for suspicious API calls (DescribeInstances, GetSecretValue, ListBuckets).

Containment: immediately isolate affected resources by revoking IAM credentials, tightening security groups, or adding Network ACL blocks. Preserve evidence: do not delete logs or stop instances that might contain artifacts.

Investigation: use CloudTrail to identify what actions were taken, from which IP, using which credentials. Check whether data was exfiltrated via S3 or other services. Identify the initial access vector — was it an exposed access key, a compromised service account, or a misconfigured resource?

Recovery: rotate all credentials that might be compromised. Remove any backdoors added by attacker (new IAM users, security group rules, cron jobs). Restore from known-good backups if data was modified.

Post-incident: document timeline, root cause, and remediation. Update detection rules to catch similar attacks faster. Notify affected customers if data was exposed.
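
During the investigation phase, a first pass over exported CloudTrail records can be as simple as grouping watched API calls by caller. The sketch below assumes records are already loaded as dictionaries using CloudTrail's standard field names (`eventName`, `sourceIPAddress`, `userIdentity.arn`). The watch list, sample records, and function name are illustrative.

```python
from collections import Counter

# API calls often seen during reconnaissance; adjust to your environment.
WATCHED_EVENTS = {"DescribeInstances", "GetSecretValue", "ListBuckets"}

def summarize_watched_calls(records, watched=WATCHED_EVENTS) -> dict:
    """Group watched API calls by caller ARN with per-event counts and source IPs."""
    summary = {}
    for record in records:
        name = record.get("eventName")
        if name not in watched:
            continue
        arn = record.get("userIdentity", {}).get("arn", "unknown")
        entry = summary.setdefault(arn, {"events": Counter(), "source_ips": set()})
        entry["events"][name] += 1
        entry["source_ips"].add(record.get("sourceIPAddress"))
    return summary

# Synthetic records for illustration only.
caller = "arn:aws:sts::111122223333:assumed-role/app-role/i-0123456789abcdef0"
records = [
    {"eventName": "ListBuckets", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"arn": caller}},
    {"eventName": "GetSecretValue", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"arn": caller}},
    {"eventName": "PutObject", "sourceIPAddress": "10.0.1.15",
     "userIdentity": {"arn": caller}},
]
report = summarize_watched_calls(records)
print(report)
```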

Conclusion

Key Takeaways

Defense in depth means layering IAM, network isolation, and encryption—not relying on any single control
Least privilege is the cardinal rule: grant only the permissions needed, nothing more
IAM roles with short-lived tokens beat long-lived credentials embedded in code or environment variables
VPC endpoints keep traffic internal and avoid NAT gateway costs for private resource access
CloudTrail, GuardDuty, and Security Hub provide the monitoring foundation for any AWS environment

Cloud Security Checklist

# 1. Enable CloudTrail in all regions
aws cloudtrail create-trail --name my-trail --is-multi-region-trail --s3-bucket-name my-cloudtrail-bucket
aws cloudtrail start-logging --name my-trail

# 2. Enable GuardDuty
aws guardduty create-detector --enable

# 3. Enforce encryption on S3 buckets
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'

# 4. Create IAM group with read-only permissions
aws iam create-group --group-name readonly
aws iam attach-group-policy --group-name readonly --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess

# 5. Block public access to S3 buckets
aws s3api put-public-access-block \
  --bucket my-bucket \
  --public-access-block-configuration 'BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true'

For more on managing cloud infrastructure, see our post on Cost Optimization.
