GCP Core Services: GCE, GKE, Cloud Storage, BigQuery

Master essential Google Cloud Platform services for containerized and serverless workloads—GCE, GKE, Cloud Storage, and serverless options.

published: March 25, 2026 reading time: 30 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

GCP organizes everything around projects, not accounts—IAM policies cascade down through folders to all resources inside. GCE gives VMs; GKE Standard gives you full control over nodes and networking; GKE Autopilot strips node management entirely, scaling pods on demand. Cloud Run handles containerized HTTP services scaling from zero to thousands without managing infrastructure. Cloud Storage lifecycle policies auto-transition objects from Standard through Nearline and Coldline to deletion, preventing bucket bloat. Workload Identity binds Kubernetes service accounts to GCP service accounts so pods authenticate via OIDC federation instead of key files—a pattern that eliminates an entire class of secret sprawl incidents. Pick compute options by whether you need persistent machines, Kubernetes compatibility, or serverless scaling, not by which one is newer.


Introduction

Google Cloud Platform shares concepts with other cloud providers but uses its own service names and organizational model. The key difference is that GCP organizes everything around projects rather than accounts. A project groups related resources, carries its own IAM policies, and accumulates its own billing. This structure makes it easy to isolate workloads and control access per team. Everything lives in a project, and organizational policies cascade down through folders to all projects underneath.

For DevOps workloads, GCP offers a range of compute options. Google Compute Engine (GCE) gives you virtual machines. Managed Instance Groups (MIGs) keep fleets of instances running across zones with automatic healing and scaling. Google Kubernetes Engine (GKE) comes in two flavors: Standard mode for full control over nodes, and Autopilot mode where Google handles node provisioning and you pay per pod. Cloud Run runs containerized HTTP services and scales from zero to thousands automatically. Cloud Functions handles event-driven single-purpose tasks. Cloud Storage holds objects with lifecycle policies, and Cloud SQL handles relational databases without operational overhead.

These services tie together through Workload Identity, which lets Kubernetes service accounts act as GCP service accounts without managing key files. That is a meaningful improvement over storing service account keys in your codebase. The rest of this guide covers each service with practical examples, trade-off analysis, and production failure scenarios to help you avoid common mistakes. You will understand when to pick each compute option, how to configure GKE and Cloud Run for production, and what monitoring to set up.

When to Use

GKE Autopilot vs Standard

Choose GKE Autopilot when you want Google to manage node provisioning, scaling, and upgrades. Autopilot works well for teams that want Kubernetes without the operational overhead, and for variable workloads where per-pod pricing beats paying for idle nodes.

Choose GKE Standard when you need SSH access to nodes, specific node configurations, daemon sets on dedicated infrastructure, or visibility into node-level resource allocation. Fixed, predictable workloads or compliance requirements around node access also point toward Standard.

Autopilot removes node management because Google provisions nodes on your behalf when pods schedule, then removes them when workloads complete. Google also handles node upgrades and repairs, so you do not manage node lifecycle directly. This shifts the operational burden to Google but introduces constraints: Autopilot restricts instance types to predefined profiles, and workloads that need privileged or host-level access, including many node-agent daemon sets, are generally rejected because the nodes are managed for you.

Standard gives you the equivalent of EKS Managed Node Groups plus self-managed nodes in the same cluster. You can provision specific machine types, custom node images, or GPU nodes explicitly. This matters when you need particular kernel versions, GPU drivers, or security compliance that requires hardened images. Daemon sets run on nodes you designate as dedicated system nodes, which Autopilot does not support.

Factor	Autopilot	Standard
Node management	Google manages	You manage
Instance types	Predefined profiles only	Any Google Cloud machine type
SSH access	No	Yes
Daemon sets	Restricted (no privileged or host-level access)	Yes
Per-pod pricing	Yes	No (per node)
Node allocation visibility	Opaque	Full
Upgrade management	Automatic	Manual or configured

Compute Engine vs Cloud Run vs Cloud Functions

Choose Compute Engine for long-running VMs requiring persistent state, specific hardware configurations, or legacy workloads that do not fit the container model.

Choose Cloud Run for containerized HTTP services that need automatic scaling from zero to thousands. Cloud Run handles requests of up to 60 minutes each, and most web services and APIs fit comfortably within that limit.

Choose Cloud Functions for single-purpose event-driven tasks—processing a Cloud Storage upload, responding to a Pub/Sub message, or a lightweight ETL step. For anything more substantial, Cloud Run is the better fit.

The execution model differs meaningfully across the three. Compute Engine VMs are always-on: you pay for the instance regardless of utilization. Cloud Run scales to zero between requests and bills for actual container runtime, rounded up to the nearest 100ms. Cloud Functions bills per invocation plus execution time, with a timeout of up to 60 minutes for 2nd gen HTTP functions and 9 minutes for event-driven and 1st gen functions.
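The two rounding rules described above can be compared with a few lines of arithmetic. This is a minimal sketch that assumes only the rules stated in the text (100ms rounding for Cloud Run, per-second billing with a 1-minute minimum for Compute Engine); the function names are hypothetical and no prices are modeled.

```python
import math


def cloud_run_billable_ms(duration_ms: int) -> int:
    """Round active container time up to the next 100 ms increment."""
    return math.ceil(duration_ms / 100) * 100


def compute_engine_billable_seconds(runtime_seconds: int) -> int:
    """Per-second billing with a 1-minute minimum charge."""
    return max(60, math.ceil(runtime_seconds))


# A 130 ms request is billed as two 100 ms increments.
print(cloud_run_billable_ms(130))
# A VM that runs for 20 seconds is still billed for a full minute.
print(compute_engine_billable_seconds(20))
# Beyond the first minute, billing follows actual runtime.
print(compute_engine_billable_seconds(3600))
```

The gap between the two models is largest for short, infrequent work, which is where scale-to-zero pays off.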

Traffic patterns determine which fits. Sustained, predictable traffic favors Compute Engine because per-second billing with a 1-minute minimum means you pay for what you use, and there is no cold start latency. Sporadic HTTP traffic with idle periods favors Cloud Run since scaling to zero eliminates idle cost and billing per millisecond aligns with actual consumption. Event-driven background work with unpredictable arrival patterns favors Cloud Functions because the per-invocation model maps cleanly to discrete tasks.

Cloud Run and Cloud Functions overlap but diverge in practice. Both cap HTTP requests at 60 minutes (2nd gen functions run on Cloud Run infrastructure), while event-driven functions are limited to 9 minutes. Cloud Run supports any HTTP framework in your container, while Cloud Functions constrains you to supported runtimes. Cloud Run concurrency is higher by default (80 requests per instance, configurable up to 1000), and you control the container directly. Use Cloud Functions for quick, stateless transformations triggered by events; use Cloud Run for anything that needs sustained processing or framework flexibility.

Cold starts matter for latency-sensitive services. Cloud Run cold starts range from 100ms for small, optimized images to 2s or more for larger images with initialization logic. Compute Engine has no cold start since instances are always running, but scaling requires Managed Instance Groups with health checks and pre-warming. Cloud Functions cold starts are typically faster since they load a single function rather than an entire container.

Factor	Compute Engine	Cloud Run	Cloud Functions
Execution model	Always-on VM	Container, scales to zero	Per-invocation function
Scaling	MIG-based, always-on	Auto 0 to thousands	Auto, per invocation
Max request timeout	Persistent	60 minutes	60 minutes HTTP, 9 minutes event-driven (2nd gen)
Cold start	None (always on)	100ms-2s	Typically faster
Billing	Per second, 1-min minimum	Per millisecond (100ms min)	Per invocation + GB-s
Framework flexibility	Any OS/runtime	Any HTTP framework	Supported runtimes only
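The selection guidance in this section can be condensed into a small decision helper. This is a sketch of the article's rules of thumb only; the function and parameter names are hypothetical, and real decisions involve more factors than three booleans.

```python
def pick_compute(persistent_vm: bool, needs_kubernetes: bool,
                 event_driven_task: bool) -> str:
    """Map the three questions from the text to a compute option."""
    if persistent_vm:
        # Long-running state, specific hardware, or legacy workloads.
        return "Compute Engine"
    if needs_kubernetes:
        # Kubernetes compatibility; choose Autopilot or Standard separately.
        return "GKE"
    if event_driven_task:
        # Single-purpose handlers for storage or Pub/Sub events.
        return "Cloud Functions"
    # Default for containerized HTTP services that can scale to zero.
    return "Cloud Run"


print(pick_compute(persistent_vm=False, needs_kubernetes=False,
                   event_driven_task=False))
```

The order of the checks encodes priority: a hard requirement for a persistent machine outranks a preference for serverless scaling.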

GCS Storage Class Selection

Storage class choice is really about how often you access the data, how quickly you need it when you do, and how much you want to spend per gig per month. Each class also enforces a minimum storage duration, and that duration is where teams get caught.

Class	Best For	Min Storage Duration	Retrieval Latency	Approx. Storage Cost
Standard	Hot data, frequent access	None	Milliseconds	Highest
Nearline	Data accessed about once a month, backups	30 days	Milliseconds	~$0.01/GB/mo plus retrieval fee
Coldline	Data accessed about once a quarter, disaster recovery	90 days	Milliseconds	~$0.004/GB/mo plus retrieval fee
Archive	Compliance data accessed less than once a year	365 days	Milliseconds	~$0.001/GB/mo plus retrieval fee

Standard is the default for anything in active use. Nearline fits backups and logs that rotate monthly. Coldline works for disaster recovery images and compliance archives. Archive is for data you legally have to keep but practically never read.

The minimum storage duration is the part that surprises people. Upload a file to Nearline and delete it after a week, and you still get billed for all 30 days. This catches teams running short-lived dev environments, staging migration artifacts, or any data that gets replaced rather than updated in place. When access patterns are unclear, stick with Standard at first, then move down once the pattern is established.
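The early-deletion effect is easy to quantify. The sketch below assumes minimum storage durations of 30, 90, and 365 days for Nearline, Coldline, and Archive, and none for Standard; the function name is hypothetical and only billable days are computed, not prices.

```python
# Assumed minimum storage durations in days, per storage class.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def billable_storage_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed when an object is deleted after days_stored."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


# Deleting a Nearline object after one week still bills the 30-day minimum.
print(billable_storage_days("NEARLINE", 7))
# The same object in Standard is billed only for the week it existed.
print(billable_storage_days("STANDARD", 7))
```

For short-lived data, the cheaper per-GB rate of a colder class can be wiped out by the extra billed days.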

Lifecycle policies automate class transitions so you do not have to reclassify manually. A typical chain: Standard for 30 days, Nearline through day 90, Coldline until day 365, then delete. One caveat: a new or changed lifecycle configuration can take up to 24 hours to take effect, and actions run asynchronously after an object meets a rule's condition. If you are watching billing after a new policy, give it a couple of days before concluding it is not working.
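The chain described above can be generated rather than hand-written, which helps keep the age thresholds consistent. This is a minimal sketch: the function name and defaults are hypothetical, and the output mirrors the rule/action/condition JSON layout used for lifecycle files elsewhere in this guide.

```python
import json


def build_lifecycle_config(nearline_after: int = 30, coldline_after: int = 90,
                           delete_after: int = 365) -> dict:
    """Build a Standard -> Nearline -> Coldline -> delete lifecycle config."""
    if not nearline_after < coldline_after < delete_after:
        raise ValueError("ages must increase along the chain")

    def transition(storage_class: str, age: int) -> dict:
        return {
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": age},
        }

    return {
        "rule": [
            transition("NEARLINE", nearline_after),
            transition("COLDLINE", coldline_after),
            {"action": {"type": "Delete"}, "condition": {"age": delete_after}},
        ]
    }


config = build_lifecycle_config()
# Write this JSON to a file and apply it to the bucket.
print(json.dumps(config, indent=2))
```

Validating that the ages increase catches the common mistake of a delete rule that fires before a transition rule ever applies.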

When Not to Use GCP

GCP works well for cloud-native workloads, but there are cases where it is the wrong call. These boundaries matter because choosing poorly means paying for complexity you do not need.

Avoid GKE Autopilot when you need specific kernel modules, custom node images, or daemon sets that require direct node access. Autopilot restricts SSH access and node-level customization. Teams with compliance requirements around host-based auditing or those running security agents that need node-level visibility should use GKE Standard or Compute Engine instead. Autopilot also restricts you to predefined instance profiles, so workloads with unusual CPU-to-memory ratios may not fit the available shapes.

Avoid Compute Engine when you do not need persistent VMs. The always-on billing model means you pay for idle capacity. Cloud Run scales to zero between requests and bills per actual milliseconds, which is cheaper for sporadic traffic. Even for steady-state workloads, the operational overhead of managing MIGs, health checks, and scaling policies often outweighs the flexibility benefit.

Avoid regional GCS buckets when you need global low-latency access. Standard regional buckets serve local traffic efficiently but add egress costs for cross-region downloads. If your users or services span multiple geographic regions, consider dual-region or multi-region storage instead. The higher storage cost is usually offset by reduced egress fees when traffic is truly global.

Avoid BigQuery as a primary application database. BigQuery is an analytics warehouse, not a transactional database. It handles analytical queries over large datasets well but lacks the random-access read/write patterns that operational databases need. Application-layer joins, high-frequency updates, and sub-second query requirements are outside its design. Use Cloud SQL, Spanner, or Firestore for operational data, and keep BigQuery for the analytical layer.

There is also the familiarity factor. The gcloud CLI, Cloud Console, and IAM model differ from AWS or Azure. If your team is already deep in CloudFormation or SAM, the migration cost may not be worth it. GCP makes sense when you are building new infrastructure or can commit to its tooling from the start.

GCP Project and Resource Hierarchy

GCP resources follow a four-level hierarchy: Organization, Folder, Project, and Resource. The organization sits at the top, followed by folders that can contain other folders or projects, then individual resources within projects.

# Set your project
gcloud config set project my-project-123

# List available projects
gcloud projects list

# Set compute zone
gcloud config set compute/zone us-central1-a

# Set compute region
gcloud config set compute/region us-central1

Folders let you group projects by team, environment, or department. Organizational policies applied at the folder level cascade down to all contained projects. This simplifies governance for large organizations.

IAM roles control access at project or resource level. GCP distinguishes between basic roles (Owner, Editor, Viewer; formerly called primitive roles) that apply broadly across a project's resources, and predefined roles that grant specific permissions for specific services.

flowchart TD
    A[Organization] --> B[Folder: Team A]
    A --> C[Folder: Team B]
    B --> D[Project: prod-api]
    B --> E[Project: prod-frontend]
    C --> F[Project: dev-services]
    D --> G[GKE Autopilot Cluster]
    D --> H[Cloud Run Service]
    D --> I[Cloud Storage Bucket]
    E --> J[Cloud Run Service]
    F --> K[Compute Engine MIG]
    G --> L[Workload Identity]
    L --> M[GCP Service Account]
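IAM allow policies are additive down the hierarchy: a principal's effective access on a project is the union of bindings set on the project and on every ancestor. The sketch below models that with plain dictionaries. The node names loosely follow the diagram above, and all groups, users, and role assignments are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: child -> parent.
PARENTS = {
    "folder-team-a": "organization",
    "prod-api": "folder-team-a",
}

# Hypothetical bindings set directly at each level: role -> members.
BINDINGS = {
    "organization": {"roles/viewer": {"group:auditors@example.com"}},
    "folder-team-a": {"roles/container.admin": {"group:team-a@example.com"}},
    "prod-api": {"roles/run.developer": {"user:dev@example.com"}},
}


def effective_bindings(node: str) -> dict:
    """Union the bindings of a node with those of all its ancestors."""
    merged = defaultdict(set)
    current = node
    while current is not None:
        for role, members in BINDINGS.get(current, {}).items():
            merged[role] |= members
        current = PARENTS.get(current)
    return dict(merged)


for role, members in sorted(effective_bindings("prod-api").items()):
    print(role, sorted(members))
```

Because grants only accumulate, a broad role at the organization or folder level cannot be revoked lower down, which is one reason to keep high-level bindings narrow.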

Compute Engine Fundamentals

Google Compute Engine (GCE) provides virtual machines similar to EC2. Instance templates define the machine type, image, disk, and network configuration for launching instances.

# Create an instance from a template
gcloud compute instances create web-server-1 \
  --source-instance-template=web-server-template \
  --zone=us-central1-a

# List running instances
gcloud compute instances list

# Connect via SSH
gcloud compute ssh web-server-1 --zone=us-central1-a

Managed instance groups (MIGs) maintain a fleet of instances across zones. Like AWS ASGs, MIGs automatically heal failed instances and scale based on load.

# Resize a managed instance group
gcloud compute instance-groups managed set-size web-server-mig \
  --size=5 \
  --zone=us-central1-a

# Update instance template for rolling updates
gcloud compute instance-groups managed rolling-action start-update web-server-mig \
  --zone=us-central1-a \
  --version=template=web-server-template-v2
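To reason about how a MIG might react to load, a simple proportional rule is a useful mental model. This is an illustrative approximation, not Google's actual autoscaler algorithm; the function name and the percentage-based inputs are assumptions made for the sketch.

```python
def recommended_size(current_size: int, avg_utilization_pct: int,
                     target_utilization_pct: int,
                     min_size: int, max_size: int) -> int:
    """Scale the group so average utilization moves toward the target."""
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target utilization must be in (0, 100]")
    load = current_size * avg_utilization_pct
    # Ceiling division keeps the group at or below the target utilization.
    desired = -(-load // target_utilization_pct)
    return max(min_size, min(max_size, desired))


# 4 instances at 90% CPU with a 60% target suggests growing to 6.
print(recommended_size(4, 90, 60, min_size=2, max_size=10))
# Light load shrinks the group, but never below the configured minimum.
print(recommended_size(4, 10, 60, min_size=2, max_size=10))
```

The clamp matters in practice: a maximum that is too low turns a traffic spike into sustained saturation instead of extra instances.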

Preemptible VMs cost less than regular instances but can be terminated by GCP at any time. They work well for batch jobs and fault-tolerant workloads. Spot VMs are the successor to preemptible VMs with similar pricing dynamics.

GKE Operating Modes

Google Kubernetes Engine (GKE) offers two operating modes. Standard mode gives you control over node provisioning, scaling, and upgrades. Autopilot mode offloads node management to Google, provisioning and scaling nodes automatically as your workloads demand them.

# Create a Standard GKE cluster
gcloud container clusters create standard-cluster \
  --zone=us-central1-a \
  --num-nodes=3 \
  --machine-type=e2-medium

# Create an Autopilot GKE cluster
gcloud container clusters create-auto autopilot-cluster \
  --region=us-central1

Autopilot clusters provision nodes when pods schedule and remove them when workloads complete. You pay per pod rather than per node, which can reduce costs for variable workloads. The tradeoff is less control over node configuration and the inability to SSH to nodes directly.
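The per-pod versus per-node trade-off comes down to how well your pods fill the nodes you pay for. The sketch below compares the two models for CPU only. Every price in it is a made-up placeholder, not a real GCP rate, and memory, bin-packing limits, and system overhead are deliberately ignored.

```python
def standard_hourly_cost(pod_cpu_millis: list, node_allocatable_millis: int,
                         node_hourly_price: float) -> float:
    """Pay for whole nodes, even if the last one is mostly empty."""
    total = sum(pod_cpu_millis)
    nodes = -(-total // node_allocatable_millis)  # ceiling division
    return nodes * node_hourly_price


def autopilot_hourly_cost(pod_cpu_millis: list, vcpu_hourly_price: float) -> float:
    """Pay only for the CPU the pods request."""
    return sum(pod_cpu_millis) / 1000 * vcpu_hourly_price


# Six pods requesting 500m CPU each (3 vCPU total).
pods = [500] * 6
# Placeholder prices for illustration only.
standard = standard_hourly_cost(pods, node_allocatable_millis=2000,
                                node_hourly_price=0.10)
autopilot = autopilot_hourly_cost(pods, vcpu_hourly_price=0.05)
print(f"standard={standard:.2f} autopilot={autopilot:.2f}")
```

With real prices plugged in, the comparison tends to favor per-pod billing when nodes would sit partly idle and per-node billing when they stay densely packed.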

Standard clusters give you full control. You choose instance types, manage node pools explicitly, and handle upgrades yourself. This works better when you have specific infrastructure requirements or need to run daemon sets and system workloads on dedicated nodes.

# node-pool configuration for standard cluster
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: compute-nodepool
spec:
  clusterRef:
    name: standard-cluster
  location: us-central1-a
  nodeConfig:
    machineType: e2-medium
    diskSizeGb: 50
  nodeCount: 3

GKE uses Kubernetes natively, so kubectl commands, Helm charts, and GitOps workflows work the same as any Kubernetes cluster. The main GCP-specific integrations are workload identity for service account authentication and Anthos for hybrid/multi-cluster management.

Cloud Storage for Artifacts

Google Cloud Storage (GCS) uses buckets to store objects. Buckets live in projects and have globally unique names. GCS supports multiple storage classes that trade off cost against access latency.

# Create a bucket
gcloud storage buckets create gs://my-artifacts-bucket \
  --location=US \
  --default-storage-class=STANDARD

# Upload artifacts
gcloud storage cp ./dist/app.tar.gz gs://my-artifacts-bucket/prod/

# List bucket contents
gcloud storage ls gs://my-artifacts-bucket/prod/

# Set lifecycle policy
gcloud storage buckets update gs://my-artifacts-bucket \
  --lifecycle-file=lifecycle-policy.json

Lifecycle policies automate object transitions between storage classes and deletions.

{
  "rule": [
    {
      "action": {
        "type": "SetStorageClass",
        "storageClass": "NEARLINE"
      },
      "condition": {
        "age": 30
      }
    },
    {
      "action": {
        "type": "Delete"
      },
      "condition": {
        "age": 365
      }
    }
  ]
}

GCS integrates with Cloud CDN for content delivery, with IAM for access control, and with Pub/Sub for triggering functions on object changes.

Cloud Run for Serverless Containers

Cloud Run runs containerized applications without managing infrastructure. It scales from zero to thousands of instances automatically based on incoming requests. Services are fully managed either way; attaching a Serverless VPC Access connector lets a service reach private resources in your VPC.

# Deploy a container to Cloud Run
gcloud run deploy webapp \
  --image=gcr.io/my-project/webapp:v1 \
  --platform=managed \
  --region=us-central1 \
  --allow-unauthenticated

# Check service status
gcloud run services describe webapp --region=us-central1

# View logs
gcloud run services logs read webapp --region=us-central1

Cloud Run bills for actual usage, rounded up to the nearest 100ms. This makes it economical for sporadic workloads, because a service scaled to zero accrues no compute charges.

# service.yaml for Cloud Run
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: webapp
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "100"
    spec:
      containers:
        - image: gcr.io/my-project/webapp:v1
          ports:
            - containerPort: 8080
          env:
            - name: NODE_ENV
              value: production
          resources:
            limits:
              cpu: 1000m
              memory: 512Mi

Cloud Functions is GCP's function-as-a-service offering, running individual functions in response to events. Cloud Run handles longer-running services and containers, while Cloud Functions handles quick event-driven tasks. Both are serverless options with different use cases.

GCP IAM for Service Accounts

IAM in GCP distinguishes human users from service accounts. Service accounts are identities that workloads use to authenticate to GCP APIs. Workload Identity Federation for GKE lets Kubernetes service accounts act as GCP service accounts without managing key files.

# Create a service account
gcloud iam service-accounts create build-bot \
  --display-name="CI/CD Build Bot"

# Grant roles to the service account
gcloud projects add-iam-policy-binding my-project-123 \
  --member="serviceAccount:build-bot@my-project-123.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Create a key for external use (CI/CD outside GCP)
gcloud iam service-accounts keys create key.json \
  --iam-account=build-bot@my-project-123.iam.gserviceaccount.com

Workload identity is the preferred approach for GKE workloads. It binds a Kubernetes service account to a GCP service account, and short-lived tokens replace key files.

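# Enable workload identity on the existing (zonal) cluster
gcloud container clusters update standard-cluster \
  --zone=us-central1-a \
  --workload-pool=my-project-123.svc.id.goog

# Allow the Kubernetes service account (KSA) to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:my-project-123.svc.id.goog[default/my-k8s-service-account]" \
  my-gcp-sa@my-project-123.iam.gserviceaccount.com

# Annotate the KSA so pods that use it act as the GCP service account
kubectl annotate serviceaccount my-k8s-service-account \
  --namespace=default \
  iam.gke.io/gcp-service-account=my-gcp-sa@my-project-123.iam.gserviceaccount.com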
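Most Workload Identity misconfigurations are typos in the member string or the annotation value, so it can help to generate both from the same inputs. The helpers below are a hypothetical sketch that only builds strings in the formats used by the commands above; they do not call any GCP API.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member string that identifies a Kubernetes service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def ksa_annotation(gsa_name: str, project_id: str) -> dict:
    """Annotation that links a Kubernetes service account to a GCP one."""
    email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {"iam.gke.io/gcp-service-account": email}


member = workload_identity_member("my-project-123", "default",
                                  "my-k8s-service-account")
annotation = ksa_annotation("my-gcp-sa", "my-project-123")
print(member)
print(annotation)
```

Both values must agree on the project, namespace, and account names; a mismatch in any one of them leaves pods unable to authenticate.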

For more on managing cloud costs across providers, see our post on Cost Optimization.

Trade-off Analysis

Scenario	Compute Engine	GKE Standard	GKE Autopilot	Cloud Run	Cloud Functions
Full node access	Yes	SSH to nodes	No	No	No
Serverless containers	No	No	No	Yes	No
Kubernetes ecosystem	No	Yes	Yes	No	No
Pay-per-second billing	Yes (1-minute minimum)	No (per node)	Per pod	Yes	Yes
Scale from zero	No	No	No	Yes	Yes
Max request timeout	Persistent	Persistent	Persistent	60 minutes	60 minutes (2nd gen)
Daemon sets	N/A (not Kubernetes)	Yes	Restricted	No	No

Production Failure Scenarios

Failure	Impact	Mitigation
GKE Autopilot pod scheduling failures due to quota	Pods pending indefinitely, deployments time out	Pre-check quota in the region, request quota increases via support
MIG instance health check failures	Instances marked unhealthy and replaced, service disruption	Verify firewall rules allow health check IPs, check instance startup scripts
Cloud Run cold start affecting latency SLOs	First request after an idle period is slow and can breach the SLO or time out	Set min instances to keep warm, use pre-warming pings
GCS bucket without lifecycle policy accumulating costs	Storage costs grow unbounded for old artifact versions	Apply lifecycle rules immediately on bucket creation, audit bucket sizes quarterly
Workload identity misconfiguration locking out GKE pods	Pods cannot authenticate to GCP APIs, services fail	Test workload identity bindings before deploying; when migrating from key files, keep the old key valid until the new binding is verified

GCP Observability Hooks

GCE and MIG monitoring:

# List MIG instances with their status and current action
gcloud compute instance-groups managed list-instances web-server-mig \
  --zone=us-central1-a

# Get instance metrics via Cloud Monitoring
gcloud monitoring metrics list --filter="resource.type=gce_instance"

GKE monitoring:

# Get cluster component status
gcloud container clusters describe standard-cluster \
  --zone=us-central1-a \
  --format="table(name,status,currentMasterVersion)"

# Check pod status across namespaces
kubectl get pods -A -o wide

# Get node pool sizes
gcloud container node-pools list --cluster=standard-cluster --zone=us-central1-a

Cloud Run monitoring:

# Check service revisions and traffic
gcloud run services describe webapp --region=us-central1

# View recent logs
gcloud logging read "resource.type=cloud_run_revision" --limit=50

Key Cloud Monitoring metrics to alert on:

Service	Metric	Alert Threshold
Compute Engine	CPU utilization	> 80% for 5 minutes
Compute Engine	Instance uptime	< 99.9% monthly
GKE	Pod pending time	> 2 minutes
GKE	Node CPU allocation	> 85%
Cloud Run	Request latency p99	> 2000ms
Cloud Run	Container instance count	> max - 2
Cloud Storage	Object count	unexpected growth
Cloud Storage	Monthly storage	> budget
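A threshold such as "CPU above 80% for 5 minutes" only fires when every sample in the window breaches it, which filters out short spikes. The function below is a minimal sketch of that evaluation, assuming one sample per minute; the name and inputs are hypothetical and it does not use the Cloud Monitoring API.

```python
def breaches_threshold(samples: list, threshold: float, window: int) -> bool:
    """True when the most recent `window` samples all exceed the threshold."""
    if len(samples) < window:
        return False  # not enough data to evaluate the full window
    return all(value > threshold for value in samples[-window:])


# One-minute CPU samples in percent.
sustained = [70, 82, 85, 90, 88, 91]
spiky = [85, 90, 79, 88, 91]
print(breaches_threshold(sustained, threshold=80, window=5))
print(breaches_threshold(spiky, threshold=80, window=5))
```

Requiring the whole window to breach trades a few minutes of detection delay for far fewer false alarms.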

Common Anti-Patterns

Using the default network. The auto-created default network comes with the same permissive, pre-populated firewall rules in every new project (for example, SSH and RDP allowed from any source). Create a dedicated network with explicit firewall rules for each environment.

Not using Workload Identity. Creating and managing service account key files is a security risk and an operational burden. Workload Identity eliminates key files entirely for GKE workloads.

Leaving service account keys in source code or CI/CD systems. Service account keys committed to git or stored in CI/CD variables are a common compromise vector. Use Workload Identity for GKE or short-lived credentials for CI/CD systems.

Not setting up GCS bucket policies. Buckets are private by default, but a broad IAM grant (for example to allUsers) or a stray object ACL can accidentally expose data. Enable uniform bucket-level access so IAM is the only access path, enforce public access prevention, and test policy changes in a non-production environment first.

Mixing prod and non-prod resources in the same project. Using a single project for all environments defeats GCP’s natural isolation. Separate projects per environment make IAM governance simpler and contain blast radius.

Capacity Estimation and Benchmark Data

Use these numbers for initial capacity planning. Actual performance varies by workload characteristics.

GCE Machine Types

Series	Best For	Machine Types	Network Performance
e2	Cost-effective general purpose	e2-medium → e2-standard-32	Up to 16 Gbps
n2	General purpose (standard workloads)	n2-standard-2 → n2-standard-80	Up to 100 Gbps
n1	General purpose (balanced)	n1-standard-1 → n1-standard-96	Up to 32 Gbps
c2	Compute optimized (high CPU)	c2-standard-4 → c2-standard-60	Up to 100 Gbps
m2	Memory optimized	m2-megamem-416 → m2-ultramem-416	Up to 100 Gbps
a2	GPU optimized	a2-highgpu-1g → a2-megagpu-16g	Up to 100 Gbps

Cloud Run Performance Targets

Metric	Value	Notes
Cold start (container instance)	100ms-2s	Depends on image size and initialization time
Cold start (dynamic loading)	2-5 seconds	First request to a new instance
Min instances = 0 latency	Cold starts apply when at zero	Set min instances for latency-sensitive services
Max requests per instance	80 concurrent (default)	Configure based on memory/CPU needs
Request timeout	300 seconds (default)	Configurable up to 60 minutes for long-running operations
Throughput per instance	~1,000 RPS (simple HTTP)	Varies significantly with workload type
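These targets can be turned into a rough instance estimate with Little's law: requests in flight equal arrival rate times average latency, and each instance absorbs up to its concurrency limit. This is a back-of-the-envelope sketch with hypothetical names and inputs, not a description of Cloud Run's actual autoscaler.

```python
def estimated_instances(requests_per_second: int, avg_latency_ms: int,
                        concurrency: int = 80) -> int:
    """Estimate instances needed to hold the steady-state in-flight requests."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    # In-flight requests = rps * latency in seconds (kept in integer ms here).
    in_flight_ms = requests_per_second * avg_latency_ms
    # Ceiling division by per-instance capacity; never fewer than one.
    return max(1, -(-in_flight_ms // (1000 * concurrency)))


# 2,000 RPS at 250 ms average latency keeps about 500 requests in flight.
print(estimated_instances(2000, 250, concurrency=80))
# Lowering concurrency for CPU-heavy handlers raises the instance count.
print(estimated_instances(2000, 250, concurrency=10))
```

Compare the result against the configured max instances: if the estimate is close to the cap, bursts will queue or fail rather than scale.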

GCS Storage Performance

Metric	Value
Single object GET latency	5-20ms (p50), 100-200ms (p99)
Single object PUT latency	20-50ms (p50)
Initial request rate per bucket	About 1,000 object writes/s and 5,000 object reads/s; scales higher when load ramps up gradually
Typical throughput per bucket	1-5 Gbps for large objects
Consistency	Strong consistency for all operations

Cloud SQL Instance Tiers

Tier	vCPUs	Memory	Max Connections	Typical Use
db-f1-micro	1	0.6 GB	45	Dev/test
db-g1-small	1	1.7 GB	200	Small production
db-n1-standard-1	1	3.75 GB	400	Entry production
db-n1-standard-4	4	15 GB	1,000	Medium production
db-n1-standard-8	8	30 GB	2,000	Large production
db-n1-highmem-4	4	26 GB	1,000	Memory-intensive

Interview Questions

1. What is the difference between GKE Autopilot and GKE Standard mode, and when would you choose each?

What to cover:

Autopilot: Google manages node provisioning, scaling, upgrades; pay per pod, not per node
Autopilot restricts SSH access and node-level customization; daemon sets cannot use privileged or host-level access
Standard: you control nodes, instance types, node pools; full node access
Choose Autopilot for: teams wanting Kubernetes without ops overhead, variable workloads
Choose Standard for: compliance needs node access, specific hardware, daemon sets

2. How does Workload Identity work in GKE and why is it preferred over service account keys?

What to cover:

Workload Identity binds a Kubernetes service account to a GCP service account
Pods get short-lived tokens via OIDC federation, no key files needed
Key advantages: no key file management, automatic rotation, audited access
Setup: enable Workload Identity on cluster, create IAM binding, annotate KSA
Security: compromised pod cannot steal static keys; tokens expire automatically

3. When would you use Cloud Run vs Cloud Functions vs Compute Engine?

What to cover:

Compute Engine: persistent VMs, legacy workloads, specific hardware, stateful services
Cloud Run: containerized HTTP services, serverless scaling from zero, up to 60 min requests
Cloud Functions: event-driven single-purpose tasks, Cloud Storage triggers, Pub/Sub handlers
Cloud Run handles longer-running processes and complex containers; Functions for quick tasks
Cost: Cloud Run bills per 100ms of actual usage; Functions bill similarly, with event-driven functions limited to 9 minutes

4. How do you choose the right GCS storage class for different access patterns?

What to cover:

Standard: hot data, accessed daily, immediate availability
Nearline: accessed about once a month or less, backups, monthly reports; lower storage cost plus a retrieval fee, 30-day minimum
Coldline: accessed about once a quarter or less, disaster recovery; cheaper storage, higher retrieval fee, 90-day minimum
Archive: compliance, accessed < once/year; lowest storage cost, highest retrieval fee, 365-day minimum, still millisecond access
Use lifecycle policies to auto-transition: Standard → Nearline (30d) → Coldline (90d) → Archive (365d)

5. How does GCP project hierarchy differ from AWS account hierarchy?

What to cover:

GCP: Organization → Folders → Projects → Resources (hierarchical)
AWS: Organization → Organizational Units → Accounts → Resources (accounts are the isolation boundary; most resources are region-scoped)
GCP projects are the billing and authorization boundary; everything lives in a project
Organizational policies cascade down through folders to all projects
AWS SCPs work similarly but apply to accounts rather than projects
Both support cross-project/cross-account networking via VPC peering or shared VPCs

6. What are MIGs (Managed Instance Groups) and how do they handle high availability?

What to cover:

MIGs maintain desired instance count across zones automatically
Auto-healing: replaces failed instances based on health checks
Rolling updates: canary or rolling pattern for zero-downtime updates
Load balancing: MIGs integrate with Cloud Load Balancing for traffic distribution
Stateful MIGs for databases; regular MIGs for stateless web services

7. How do you set up Cloud Run for private workload access within a VPC?

What to cover:

Deploy Cloud Run with VPC connector: gcloud run services update --vpc-connector
VPC connector handles routing to private GCP resources (Cloud SQL, Memorystore)
Fully managed Cloud Run accepts traffic from the internet by default; restrict network ingress with --ingress=internal, and require IAM authentication with --no-allow-unauthenticated
Cloud Run in VPC has cold start overhead for VPC initialization
Alternative: use Cloud Run for Anthos for full VPC control

8. What is the GCP equivalent of AWS Lambda, and how does it compare?

What to cover:

GCP: Cloud Functions (event-driven) and Cloud Run (containerized, longer-running)
Cloud Functions: similar to Lambda; 2nd gen supports up to a 60 min timeout for HTTP functions (9 min for event-driven)
Cloud Run: similar to Lambda but for containers; scales to zero, bills per ms
Billing: both GCP and AWS charge per invocation and per GB-second
Concurrency: Cloud Run 1000 concurrent per instance; Lambda 1000 concurrent per region
Key difference: Cloud Run is container-based, Lambda supports both container and zip

9. How do you configure lifecycle policies for GCS buckets?

What to cover:

Lifecycle rules defined in JSON: SetStorageClass and Delete actions
Conditions: age (days), createdBefore (date), isLive (boolean)
Example: age 30 → Nearline, age 90 → Coldline, age 365 → delete
Apply via: gcloud storage buckets update gs://bucket --lifecycle-file=policy.json
Test lifecycle rules in dev first; rules are async and may not apply immediately

10. What GCP monitoring tools would you use to alert on GKE pod scheduling failures?

What to cover:

Cloud Monitoring: metric policy for pod pending time > 2 minutes
Cloud Logging: filter for pods in Pending state for extended periods
Alerting policies: configure notification channels (email, SMS, PagerDuty) in Cloud Monitoring
Uptime checks: synthetic health checks for service availability
GKE Console: cluster component status shows control plane health

11. How does Cloud Armor differ from Cloud Load Balancing's built-in DDoS protection?

What to cover:

Cloud Load Balancing includes automatic DDoS protection for layer 3/4 attacks at no extra cost
Cloud Armor is application-level WAF: layer 7 protection against SQL injection, XSS, and other OWASP threats
Cloud Armor lets you create allow/deny rules based on IP, geolocation, or request attributes
Cloud Armor security policies attach to the load balancer's backend services and are enforced at Google's edge, inspecting HTTP(S) traffic before it reaches your service
Use Cloud Armor for: WAF rules, rate limiting, geo-restrictions, bot protection

12. What is Cloud Build and how does it integrate with GKE for CI/CD?

What to cover:

Cloud Build is GCP's managed CI/CD service—serverless build infrastructure
Build triggers watch source repos (Cloud Source Repositories, GitHub, Bitbucket) for changes
Build steps run in containers—each step is a Docker container you control
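For GKE deployments: build pushes the image to Artifact Registry (the successor to Container Registry, GCR), then updates the deployment in GKE
Cloud Build can run kubectl commands against GKE clusters using its build service account, granted a role such as roles/container.developer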

13. How does Cloud Run compare to GKE for a stateless HTTP API in terms of operational overhead?

What to cover:

Cloud Run: fully managed, no node management, auto-scales from zero to thousands
GKE: you manage the cluster (nodes, upgrades, autoscaling configuration)
Cloud Run is simpler for stateless HTTP services: containerize, deploy, done
GKE better for: non-HTTP workloads, stateful services, complex pod orchestration, service mesh
Cloud Run cold starts happen when scaling from zero; GKE nodes are always running (cost)
Both scale horizontally; Cloud Run is simpler, GKE is more flexible

14. What are the trade-offs between regional and dual-region GCS buckets?

What to cover:

Regional: single region, lower latency for local access, lower cost
Dual-region: data in two separate regions, higher cost, automatic failover
Dual-region serves reads from the closest region automatically
Dual-region for high availability requirements where regional failure causes unacceptable downtime
Consider: is your application architected to handle regional failure gracefully?

15. How does Secret Manager differ from env variables or Kubernetes Secrets for storing credentials?

What to cover:

Environment variables: set in the image or at deploy time, readable by anything that can inspect the container or its config, and easily leaked into logs
K8s Secrets: base64 encoded (not encrypted by default), stored in etcd, require additional config for encryption at rest
Secret Manager: encrypted at rest with Google-managed or customer-managed keys, audit logged
Secret Manager integrates with Cloud Run, GKE, and Cloud Functions through each workload's service account (Workload Identity on GKE), so no key files are needed
Secret Manager versioning: rotation creates new version, old version remains accessible
Secret Manager prevents accidental exposure in logs and provides access audit trails
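
The point that base64 is an encoding and not encryption is easy to demonstrate. This short sketch uses only the standard library; the helper names and the sample value are made up for illustration.

```python
import base64


def encode_like_k8s_secret(value: str) -> str:
    """Base64-encode a string, as the data field of a Secret manifest does."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_k8s_secret(encoded: str) -> str:
    """Recover the original string. No key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_like_k8s_secret("s3cr3t-password")  # sample value only
    print("stored form:", encoded)
    print("recovered:  ", decode_k8s_secret(encoded))
```

Anyone who can read the Secret object can reverse the encoding, which is why access control and encryption at rest still matter.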

16. What is the difference between Cloud SQL Proxy and direct Cloud SQL connections?

What to cover:

Direct connections over public IP require adding client IP addresses to authorized networks, which is problematic for dynamic Cloud Run/GKE IPs
Cloud SQL Proxy (now called the Cloud SQL Auth Proxy): establishes a secure tunnel using IAM credentials, no IP allowlisting needed
Proxy handles TLS certificate rotation automatically
Cloud Run and GKE workloads use the proxy to connect without managing SSL certs or IPs
Proxy adds connection overhead (latency) but eliminates IP management complexity

17. How does GCP's organization hierarchy affect IAM policy inheritance?

What to cover:

Organization → Folders → Projects → Resources (top-down hierarchy)
IAM policies at higher levels cascade down to all contained resources
Folders inherit Organization policies; projects inherit Folder policies
Child-level allow policies can add access but cannot remove access granted at a parent; restricting inherited access requires a separate deny policy
Use folders to group projects by team or environment—apply folder-level policies for team-wide access
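
Inheritance can be pictured as a union of allow bindings collected while walking from a resource up to the organization. The sketch below is a toy model under that assumption. The hierarchy, roles, and members are hypothetical, and deny policies and IAM conditions are left out.

```python
from collections import defaultdict

# Hypothetical hierarchy: each node maps to its parent (None at the top).
PARENTS = {
    "projects/payments-prod": "folders/payments",
    "folders/payments": "organizations/example",
    "organizations/example": None,
}

# Hypothetical allow bindings set directly on each node: role -> members.
POLICIES = {
    "organizations/example": {"roles/viewer": {"group:auditors@example.com"}},
    "folders/payments": {"roles/editor": {"group:payments-team@example.com"}},
    "projects/payments-prod": {"roles/viewer": {"user:oncall@example.com"}},
}


def effective_policy(resource):
    """Union the bindings on a resource with those of all its ancestors."""
    merged = defaultdict(set)
    node = resource
    while node is not None:
        for role, members in POLICIES.get(node, {}).items():
            merged[role] |= members
        node = PARENTS.get(node)
    return dict(merged)


if __name__ == "__main__":
    for role, members in sorted(effective_policy("projects/payments-prod").items()):
        print(role, sorted(members))
```

In this model the project ends up with the auditors group as a viewer even though no project-level binding mentions it, which is the cascading behavior the list describes.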

18. What is the difference between Anthos and standard GKE for hybrid cloud scenarios?

What to cover:

Anthos is GCP's hybrid and multi-cloud Kubernetes platform—run K8s on-premises or other clouds
Standard GKE: GCP-managed control plane, nodes run on GCP infrastructure
Anthos: control plane can be managed by GCP while workloads run on your infrastructure (Anthos GKE on-prem)
Anthos includes Config Sync, Policy Controller, and Service Mesh (Anthos Service Mesh)
Use Anthos when: regulatory requirements prevent cloud migration, low-latency edge workloads, existing on-prem investment

19. How do you set up and use Cloud Scheduler for triggering Cloud Functions or Cloud Run jobs?

What to cover:

Cloud Scheduler is a managed cron job service—runs jobs on a schedule with at-least-once delivery
Target types: HTTP endpoint (Cloud Functions, Cloud Run, or any HTTP), Pub/Sub topic, App Engine app
Define schedule in cron format: `*/5 * * * *` (every 5 minutes), `0 9 * * *` (daily at 9am)
Retry configuration: failed jobs are retried with exponential backoff
Use for: periodic data processing, scheduled cleanup jobs, daily report generation
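
For readers less familiar with cron syntax, the two schedules above can be checked with a small matcher. This is a deliberately limited sketch: it handles `*`, `*/n`, single numbers, and comma lists, and it ignores ranges, names, and the special day-of-month/day-of-week interaction in full cron. The function names are illustrative and this is not how Cloud Scheduler itself is implemented.

```python
from datetime import datetime


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Check one cron field against a value; steps count from the field minimum."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five-field cron expression fires at this minute."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(day, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )


if __name__ == "__main__":
    print(cron_matches("*/5 * * * *", datetime(2026, 3, 25, 10, 15)))
    print(cron_matches("0 9 * * *", datetime(2026, 3, 25, 9, 0)))
```

Because delivery is at-least-once, a handler behind such a schedule should tolerate the same tick arriving twice.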

20. What is Cloud Memorystore and when would you use it versus managed database services?

What to cover:

Memorystore is GCP's managed Redis and Memcached service
Use for: session caching, real-time data caching, pub/sub messaging queues
Use databases (Cloud SQL, Spanner, Firestore) for persistent data that survives service restarts
Memorystore is not a database—it stores transient data that could be rebuilt from source of truth
Memorystore fits cache-aside caching (often loosely called read-through): the app reads from the cache, falls back to the DB on a miss, then writes the result back to the cache
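
The caching pattern in the last bullet, where the application checks the cache first and falls back to the database on a miss, can be sketched as follows. `TTLCache` is an in-memory stand-in for a Redis client, and `load_from_db` is a hypothetical loader; both are assumptions made so the example runs without any external service.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=300):
    """Return a user record, consulting the cache before the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # cache miss: go to the source of truth
    if value is not None:
        cache.set(key, value, ttl_seconds)
    return value


if __name__ == "__main__":
    cache = TTLCache()
    fake_db = {42: {"name": "Ada"}}
    print(get_user(42, cache, fake_db.get))  # loaded from the fake database
    print(get_user(42, cache, fake_db.get))  # served from the cache
```

The TTL keeps stale entries from living forever, which matches the idea that cached data is transient and can always be rebuilt from the database.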

Conclusion

Key Takeaways

GCP uses projects as the core organizational unit, not accounts like AWS
GKE Autopilot removes node management entirely—Google handles provisioning and scaling
GKE Standard keeps full control over nodes and is better for daemon sets and custom configurations
Cloud Run handles containerized HTTP services with true serverless scaling from zero
Workload Identity is the recommended way to give GKE workloads access to GCP APIs

GCP Onboarding Checklist

# 1. Set up a new project and make it the default for later commands
gcloud projects create my-project-123 --name="My Project"
gcloud config set project my-project-123

# 2. Enable required APIs
gcloud services enable container.googleapis.com compute.googleapis.com storage.googleapis.com

# 3. Create a GKE Autopilot cluster and fetch kubectl credentials
gcloud container clusters create-auto autopilot-cluster \
  --region=us-central1
gcloud container clusters get-credentials autopilot-cluster \
  --region=us-central1

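# 4. Configure Workload Identity for a Kubernetes service account
gcloud iam service-accounts create my-gcp-sa
kubectl create serviceaccount my-sa -n default
# Allow the Kubernetes service account to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  my-gcp-sa@my-project-123.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:my-project-123.svc.id.goog[default/my-sa]"
# Point the Kubernetes service account at the GCP service account
kubectl annotate serviceaccount my-sa -n default \
  iam.gke.io/gcp-service-account=my-gcp-sa@my-project-123.iam.gserviceaccount.com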

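# 5. Create a Cloud Storage bucket and apply a lifecycle policy
#    (policy.json is a lifecycle rules file you supply)
gcloud storage buckets create gs://my-artifacts-bucket --location=US
gcloud storage buckets update gs://my-artifacts-bucket --lifecycle-file=policy.json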
