Cloud Technical Interview Prep for AWS and Azure

Prepare thoroughly for cloud technical interviews with insights on key topics hiring managers want to explore.


Cloud technical interviews test both conceptual understanding and practical experience. Interviewers are not just looking for candidates who can recite service names—they want to know that you have built things, hit real problems, and understand the trade-offs involved in architectural decisions. This article covers the specific questions and topics that appear repeatedly in AWS and Azure technical interviews, with the depth and framing that actually distinguishes strong candidates.

Core AWS Questions

IAM and Access Management

"Explain the difference between an IAM role and an IAM user."

An IAM user is a persistent identity with long-term credentials (access keys and/or a password) associated with a specific person or service account. An IAM role is an identity with no long-term credentials of its own. Services, applications, or principals in other AWS accounts assume the role and receive short-lived credentials from AWS STS that expire automatically.

The important implication: using IAM roles for EC2 instances, Lambda functions, and ECS tasks is the current best practice because credentials rotate automatically and are never stored as plaintext in code or configuration. Hardcoded IAM user access keys are a security anti-pattern and a recurring root cause in cloud breach post-mortems.

"What is the principle of least privilege and how does it apply to IAM policies?"

Least privilege means granting only the permissions required for a specific task and no more. In IAM, this means writing policies that specify exact actions (s3:GetObject rather than s3:*) on exact resources (arn:aws:s3:::my-bucket/prefix/* rather than *). Using AWS-managed policies for convenience often grants broader permissions than a workload needs.
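To make the contrast concrete, here is a minimal sketch in Python that builds a scoped policy document and flags wildcard grants. The function names and the `ReadOnlyOnePrefix` statement ID are hypothetical; only the policy JSON structure follows the IAM policy format.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build a policy document allowing s3:GetObject on one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyOnePrefix",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


def find_wildcards(policy: dict) -> list:
    """Return a finding for every overly broad action or resource."""
    findings = []
    for stmt in policy["Statement"]:
        for action in stmt["Action"]:
            if action == "*" or action.endswith(":*"):
                findings.append(f"broad action: {action}")
        for resource in stmt["Resource"]:
            if resource == "*":
                findings.append(f"broad resource: {resource}")
    return findings


policy = read_only_prefix_policy("my-bucket", "prefix")
print(json.dumps(policy, indent=2))
print(find_wildcards(policy))
```

In an interview, walking through why each wildcard would be flagged is usually more persuasive than reciting the definition.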

Compute: EC2 and Lambda

"What are the differences between EC2 On-Demand, Reserved, Spot, and Savings Plans pricing models?"

| Model | Use Case | Cost Relative to On-Demand |
| --- | --- | --- |
| On-Demand | Variable workloads, short-term | Baseline |
| Reserved Instances | Steady-state, committed 1-3 years | Up to 72% savings |
| Spot Instances | Fault-tolerant, interruptible workloads | Up to 90% savings |
| Savings Plans | Flexible commitment across instance types | Up to 66% savings (Compute Savings Plans); up to 72% (EC2 Instance Savings Plans) |

Spot Instances can be interrupted with a two-minute warning when AWS needs capacity back. They are appropriate for batch processing, CI/CD build workers, and stateless application tiers that can handle interruption.
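A quick back-of-the-envelope calculation helps when an interviewer asks what these discounts mean for a real fleet. The sketch below assumes a hypothetical On-Demand rate of $0.10 per hour (not a real AWS price) and applies the best-case discounts quoted for each model; actual discounts depend on instance type, term, and payment option.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def monthly_cost(on_demand_hourly: float, discount: float, instances: int = 1) -> float:
    """Monthly fleet cost at a given fractional discount off On-Demand."""
    return round(on_demand_hourly * (1 - discount) * HOURS_PER_MONTH * instances, 2)


ON_DEMAND_HOURLY = 0.10  # hypothetical rate in USD, for illustration only

# Best-case discounts as quoted for each pricing model; real ones are often lower.
best_case_discounts = {
    "On-Demand": 0.00,
    "Reserved Instances": 0.72,
    "Spot Instances": 0.90,
    "Savings Plans": 0.66,
}

for model, discount in best_case_discounts.items():
    cost = monthly_cost(ON_DEMAND_HOURLY, discount, instances=10)
    print(f"{model:<20} ${cost:>8.2f}")
```

The point of the exercise is the ratio between models, not the absolute figures: a ten-instance fleet that costs $730 per month On-Demand drops to roughly a tenth of that on Spot in the best case, at the price of interruption risk.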

"What is a Lambda cold start and how do you mitigate it?"

Lambda must initialize a new execution environment when a function has not been invoked recently or when it scales out to serve more concurrent requests: it downloads the code, starts the runtime, and runs the initialization code outside the handler. This delay is the cold start. Mitigations include:

  • Keeping package size small to reduce download time

  • Using Provisioned Concurrency for latency-sensitive functions

  • Using runtime languages with fast startup (Go, Node.js over Java)

  • Keeping expensive setup (SDK clients, connections, configuration loading) outside the handler so it runs once per execution environment rather than on every invocation
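The last mitigation is the easiest to demonstrate in code. The sketch below is a simplified, local simulation: `create_client` is a hypothetical stand-in for expensive setup, and the two direct calls to `handler` imitate two warm invocations landing on the same execution environment.

```python
import time

INIT_COUNT = 0  # tracks how many times the expensive setup has run


def create_client() -> dict:
    """Stand-in for expensive setup such as building an SDK client."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.05)  # simulate slow initialization
    return {"created_at": time.time()}


# Module scope: runs once per execution environment (during the cold start)
# and is reused by every warm invocation that lands on that environment.
CLIENT = create_client()


def handler(event, context):
    """Lambda-style entry point that reuses CLIENT instead of rebuilding it."""
    return {"statusCode": 200, "client_created_at": CLIENT["created_at"]}


first = handler({}, None)
second = handler({}, None)
print(INIT_COUNT, first == second)
```

If `create_client()` were called inside `handler`, every invocation would pay the setup cost, not just the cold one.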

Networking: VPC Architecture

"Design a VPC for a three-tier web application."

A standard three-tier VPC design:

Internet Gateway
       |
  Public Subnet (one per AZ)
  - Load Balancer (ALB)
  - NAT Gateway
       |
  Private Subnet - App Tier (one per AZ)
  - EC2 instances / ECS tasks
  - Auto Scaling Group
       |
  Private Subnet - Data Tier (one per AZ)
  - RDS Multi-AZ
  - ElastiCache

Key points to cover: why app and data tiers are in private subnets (not reachable from internet), why NAT Gateway enables outbound traffic without exposing instances, why multiple availability zones provide fault tolerance, and how security groups limit traffic between tiers.
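The layout above can be sketched as an address plan using Python's standard `ipaddress` module. The `10.0.0.0/16` range, the two AZ suffixes, and the `/20` subnet size are assumptions chosen for illustration; the `ALLOWED_FLOWS` set is a toy model of security group rules, not an AWS API.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # assumed VPC range
AZS = ["a", "b"]                                 # hypothetical AZ suffixes
TIERS = ["public", "app", "data"]

# Carve the /16 into /20 blocks and assign one per tier per AZ.
blocks = VPC_CIDR.subnets(new_prefix=20)
plan = {}
for tier in TIERS:
    for az in AZS:
        plan[(tier, az)] = next(blocks)

# Security-group style allow list: each tier accepts traffic only from the
# tier directly in front of it.
ALLOWED_FLOWS = {("internet", "public"), ("public", "app"), ("app", "data")}


def is_allowed(source_tier: str, dest_tier: str) -> bool:
    """True if traffic from source_tier may reach dest_tier."""
    return (source_tier, dest_tier) in ALLOWED_FLOWS


for (tier, az), cidr in plan.items():
    print(f"{tier:<6} az-{az}  {cidr}")
print(is_allowed("internet", "data"))
```

Leaving most of the /16 unallocated is deliberate in this sketch: it keeps room for additional AZs or tiers without renumbering.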

Storage: S3 and EBS

"What is the difference between S3 and EBS? When do you use each?"

EBS (Elastic Block Store) provides block storage attached to a single EC2 instance. It behaves like a hard drive—you format it, mount it, and read/write as a filesystem. It is appropriate for operating system volumes, databases, and any workload requiring low-latency block I/O.

S3 is object storage accessed via API. It is not mounted like a filesystem (though s3fs enables this with caveats). It is appropriate for static assets, backups, data lakes, and application artifacts. S3 is designed for 99.999999999% (11 nines) durability and scales without provisioning capacity.

"What is an S3 bucket policy vs. an ACL vs. a presigned URL?"

A bucket policy is a resource-based IAM policy attached to the bucket that grants permissions to AWS principals, including cross-account access. An ACL (Access Control List) is a legacy mechanism that grants coarse-grained access to specific canonical users or groups; AWS now recommends keeping ACLs disabled and using policies instead. A presigned URL grants time-limited access to a specific S3 object to anyone who has the URL, without requiring AWS credentials.
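The idea behind a presigned URL (a signature plus an expiry, both verified by the server) can be illustrated with a generic HMAC-signed URL. This is a conceptual sketch only: it is not the Signature Version 4 algorithm that S3 actually uses, and `SECRET_KEY`, the `example.invalid` host, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key; never hardcode real keys


def _sign(path: str, expires: int) -> str:
    """HMAC over the path and expiry, so neither can be altered undetected."""
    return hmac.new(SECRET_KEY, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, expires_in: int, now: Optional[float] = None) -> str:
    """Return a URL that is valid for expires_in seconds from now."""
    issued = time.time() if now is None else now
    expires = int(issued + expires_in)
    query = urlencode({"expires": expires, "signature": _sign(path, expires)})
    return f"https://example.invalid{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    expires = int(query["expires"][0])
    current = time.time() if now is None else now
    signature_ok = hmac.compare_digest(_sign(parsed.path, expires), query["signature"][0])
    return current <= expires and signature_ok


url = sign_url("/my-bucket/report.pdf", expires_in=300, now=1000)
print(verify_url(url, now=1200), verify_url(url, now=1400))
```

The holder of the URL needs no credentials of their own, which is exactly why presigned URLs should be short-lived and scoped to a single object.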

Core Azure Questions

Azure Active Directory and RBAC

"What is the difference between Azure AD (Entra ID) and on-premises Active Directory?"

Azure AD (now called Microsoft Entra ID) is a cloud-native identity provider designed for web protocols: OAuth 2.0, OpenID Connect, and SAML. On-premises Active Directory uses Kerberos and NTLM for authentication within a domain. Azure AD Connect (now Microsoft Entra Connect) synchronizes identities between on-premises AD and Azure AD, enabling hybrid identity scenarios. Azure AD does not support group policies or organizational units in the traditional AD sense.

"Explain Azure RBAC and the difference between Owner, Contributor, and Reader roles."

Azure RBAC (Role-Based Access Control) controls access to Azure resources at subscription, resource group, or individual resource scope. Built-in roles:

  • Owner: full access including the ability to delegate access to others

  • Contributor: full access to create and manage resources but cannot grant access

  • Reader: view resources but cannot make changes

Custom roles allow fine-grained permission sets. The principle of least privilege applies: most service accounts and automation should use Contributor at the resource group scope rather than Owner at the subscription scope.
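Scope inheritance is the part of RBAC that candidates most often get wrong, so a toy model can help. The sketch below reduces each built-in role to three illustrative verbs (`read`, `write`, `grant`); real role definitions use lists of resource provider operations. The principals, the subscription ID `sub-1`, and the resource group names are all hypothetical.

```python
ROLE_ACTIONS = {
    "Owner": {"read", "write", "grant"},
    "Contributor": {"read", "write"},
    "Reader": {"read"},
}

# Hypothetical role assignments: (principal, role, scope)
ASSIGNMENTS = [
    ("deploy-bot", "Contributor", "/subscriptions/sub-1/resourceGroups/rg-app"),
    ("auditor", "Reader", "/subscriptions/sub-1"),
]


def scope_covers(scope: str, resource_id: str) -> bool:
    """A scope covers itself and everything beneath it in the hierarchy."""
    scope = scope.rstrip("/").lower()
    resource_id = resource_id.rstrip("/").lower()
    return resource_id == scope or resource_id.startswith(scope + "/")


def is_allowed(principal: str, action: str, resource_id: str) -> bool:
    """True if any assignment grants the action at or above the resource."""
    return any(
        who == principal
        and action in ROLE_ACTIONS[role]
        and scope_covers(scope, resource_id)
        for who, role, scope in ASSIGNMENTS
    )


vm_in_app = "/subscriptions/sub-1/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm1"
vm_in_other = "/subscriptions/sub-1/resourceGroups/rg-other/providers/Microsoft.Compute/virtualMachines/vm2"

print(is_allowed("deploy-bot", "write", vm_in_app))    # inside its resource group
print(is_allowed("deploy-bot", "write", vm_in_other))  # outside its scope
print(is_allowed("deploy-bot", "grant", vm_in_app))    # Contributor cannot grant
```

Scoping `deploy-bot` to one resource group limits the blast radius of a leaked credential to that group, which is the practical argument for least privilege.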

Azure Networking

"What is the difference between a Network Security Group and an Azure Firewall?"

A Network Security Group (NSG) provides stateful packet filtering at the subnet or NIC level. Rules are based on source/destination IP, port, and protocol. NSGs are appropriate for segmenting traffic within a virtual network.
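NSG rules are evaluated in priority order, lowest number first, and processing stops at the first match. The sketch below models that behavior with a simplified rule shape (single port, source CIDR only); the rule values are hypothetical, and the final `Deny` stands in for the default deny rule that ends a real rule list.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int  # lower number = evaluated first
    source: str    # source CIDR range
    port: int      # destination port
    protocol: str
    access: str    # "Allow" or "Deny"


RULES = [
    Rule(200, "0.0.0.0/0", 22, "tcp", "Deny"),
    Rule(100, "10.0.1.0/24", 22, "tcp", "Allow"),
    Rule(300, "0.0.0.0/0", 443, "tcp", "Allow"),
]


def evaluate(rules, source_ip: str, port: int, protocol: str) -> str:
    """Return the access decision of the first matching rule by priority."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        matches = (
            ip in ipaddress.ip_network(rule.source)
            and port == rule.port
            and protocol == rule.protocol
        )
        if matches:
            return rule.access  # first match wins; later rules are skipped
    return "Deny"  # stand-in for the default deny at the end of the list


print(evaluate(RULES, "10.0.1.5", 22, "tcp"))     # matches priority 100
print(evaluate(RULES, "203.0.113.9", 22, "tcp"))  # falls through to priority 200
```

SSH from the management subnet is allowed because priority 100 is checked before the broader deny at priority 200, even though both rules match.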

Azure Firewall is a managed, cloud-native firewall with application-level filtering (FQDNs, URL categories), threat intelligence feeds, and centralized logging. It is appropriate for east-west traffic inspection and internet egress filtering at enterprise scale.

"What is VNet peering and when would you use it instead of a VPN?"

VNet peering connects two Azure virtual networks within the same region or across regions (global peering) using the Azure backbone network. Traffic is private, does not traverse the public internet, and has lower latency than a VPN connection. It is appropriate for connecting workloads across VNets when you do not need the overhead of VPN gateway management.

A VPN gateway is appropriate when you need site-to-site connectivity with on-premises infrastructure or when you need to connect to Azure over an encrypted tunnel from outside the Azure backbone.

Cross-Cloud Architecture Questions

Senior cloud interviews often include architecture and trade-off questions that span providers or compare cloud-native patterns with traditional approaches.

"The most common failure I see in cloud interviews is candidates who can describe services but cannot explain the trade-offs. Every architectural decision involves trade-offs—if a candidate cannot articulate what they gave up by choosing RDS over self-managed Postgres, they have not thought deeply about the decision." — Michael Wittig, co-author of Amazon Web Services in Action (Manning Publications)

"What is the CAP theorem and how does it affect your database choices in the cloud?"

The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions are a reality, the practical choice is what to give up when a partition occurs: availability, to stay consistent (CP), or consistency, to stay available (AP).

AWS DynamoDB defaults to eventually consistent reads for higher availability and lower latency, but supports strongly consistent reads at a performance cost. Amazon Aurora provides strong consistency within a region for reads and writes through the writer endpoint. Aurora Global Database replicates to secondary regions asynchronously, so reads there can lag the primary; globally distributed workloads that need strong consistency must either route those operations to the primary region or choose a CP database and accept higher latency.
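The difference between the two read modes can be shown with a toy model of a primary copy and a lagging replica. This is an illustration of the concept, not how DynamoDB is implemented; the `ToyTable` class is hypothetical, and its `consistent_read` flag merely echoes the `ConsistentRead` parameter of the DynamoDB read APIs.

```python
class ToyTable:
    """A primary store plus one replica that lags until replicate() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def put(self, key, value):
        """Writes are acknowledged once the primary has them."""
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply outstanding writes to the replica (normally asynchronous)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def get(self, key, consistent_read=False):
        """Strong reads go to the primary; default reads may be stale."""
        store = self.primary if consistent_read else self.replica
        return store.get(key)


table = ToyTable()
table.put("order-1", "PAID")
stale = table.get("order-1")                        # replica has not caught up
fresh = table.get("order-1", consistent_read=True)  # reads the primary
table.replicate()
converged = table.get("order-1")
print(stale, fresh, converged)
```

A read-after-write flow such as showing a customer their order status immediately after payment is the standard example of when the stronger read mode is worth its cost.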

"Explain Infrastructure as Code and why it matters for cloud operations."

IaC means managing cloud resources through code rather than through the management console. Tools include Terraform, AWS CloudFormation, and Azure Bicep/ARM templates. Benefits:

  • Resources are reproducible and version-controlled

  • Drift between environments is detectable

  • Provisioning can be automated and audited

  • Destruction and recreation is predictable

In interviews, be prepared to describe a real IaC workflow: writing Terraform, running plan to review changes, applying, storing state remotely (S3 + DynamoDB locking), and managing multiple environments through workspaces or separate state files.
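The review step in that workflow amounts to diffing desired configuration against recorded state. The sketch below is a toy version of that comparison, not Terraform's implementation; the `plan` function and the resource names are hypothetical.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare desired configuration with actual state, like a dry-run diff."""
    shared = desired.keys() & actual.keys()
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(k for k in shared if desired[k] != actual[k]),
    }


desired = {
    "vpc.main": {"cidr": "10.0.0.0/16"},
    "instance.web": {"type": "t3.small"},
}
actual = {
    "vpc.main": {"cidr": "10.0.0.0/16"},
    "instance.web": {"type": "t3.large"},  # changed by hand in the console: drift
    "bucket.tmp": {"versioning": False},   # created outside of code
}

print(plan(desired, actual))
```

Both findings here are drift: a manual resize and an unmanaged bucket. Being able to explain how each would be reconciled (apply the code, or import the resource) is the kind of detail interviewers listen for.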

Cloud role salaries and certification signal

Interview preparation pays off only if the roles you are preparing for exist and pay what you expect. The table below summarizes US salary ranges for 2024-2025 for cloud roles across seniority levels, drawn from the Robert Half 2024 Technology Salary Guide [1] and Levels.fyi aggregated data.

| Role | Seniority | US salary range (2024-2025) | Most valued certifications |
| --- | --- | --- | --- |
| Cloud Support Engineer | Associate | $75,000-$105,000 | AWS CCP, AWS SAA |
| Cloud Engineer | Mid | $110,000-$155,000 | AWS SAA, Azure AZ-104, GCP ACE |
| Senior Cloud Engineer | Senior | $145,000-$195,000 | AWS SAP, Azure AZ-305, GCP PCA |
| Cloud Architect | Senior | $165,000-$220,000 | AWS SAP, Azure AZ-305, TOGAF |
| Principal Cloud Architect | Staff | $200,000-$285,000 | AWS SAP, Azure AZ-305, domain specialty |
| DevOps Engineer | Mid | $115,000-$160,000 | AWS DOP, CKA, Terraform Associate |
| Site Reliability Engineer | Senior | $160,000-$230,000 | AWS DOP, CKA, CKS |
| Cloud Security Engineer | Senior | $155,000-$210,000 | AWS Security Specialty, CISSP |
| FAANG Cloud Engineer | L5/L6 | $275,000-$450,000 (TC with stock) | AWS Professional-tier certs, depth over breadth |

Salaries in San Francisco Bay Area, New York, and Seattle typically run 15-25% above the ranges above; secondary tech hubs (Austin, Denver, Raleigh, Atlanta) track the national median. Remote-only roles have compressed geographic premiums since 2022 but have not eliminated them entirely.

Certifications most signal-heavy in interviews

| Certification | Current exam code | Fee | Interview signal value |
| --- | --- | --- | --- |
| AWS Cloud Practitioner | CLF-C02 | $100 | Entry-level credibility; rarely decisive |
| AWS SAA-C03 | SAA-C03 | $150 | Strong signal for mid-level roles |
| AWS SAP-C02 | SAP-C02 | $300 | Near-prerequisite for senior AWS-focused roles |
| Azure AZ-104 | AZ-104 | $165 | Mid-level Azure credibility |
| Azure AZ-305 | AZ-305 | $165 | Senior Azure solutions architect prerequisite |
| Google Professional Cloud Architect | PCA | $200 | Strong signal in GCP-focused firms |
| CKA | CKA | $395 | Prerequisite for Kubernetes-heavy roles |
| Terraform Associate | 003 | $70.50 | Very signal-heavy for IaC-heavy teams |

Live coding and architecture whiteboard formats

Cloud interviews increasingly include architecture whiteboard sessions where candidates are asked to design a system end-to-end. The evaluation criteria are not just correctness but decision-making quality and trade-off articulation.

| Interview segment | Typical duration | What interviewers evaluate |
| --- | --- | --- |
| Behavioral screening | 30 min | Communication, role fit, motivation |
| Technical phone screen | 45-60 min | Service knowledge, scenario reasoning |
| Architecture whiteboard | 60-90 min | System design, trade-off articulation |
| Live coding (for SRE/DevOps) | 45-60 min | Python/bash/Go fluency, debugging skill |
| Behavioral deep-dive | 45-60 min | Leadership principles, conflict resolution |
| Bar-raiser / team fit | 45-60 min | Cultural alignment, senior feedback |

Candidates should practice articulating trade-offs aloud rather than simply identifying them. The most common feedback our cert research team hears from interviewers who pass candidates is "they explained their reasoning as they designed", not "they produced the ideal design".

"Cloud architecture interviews are not knowledge tests in the traditional sense. They are decision-making demonstrations. The best candidates are the ones who, when asked to design something, first clarify the requirements, then articulate three or four possible approaches, and explicitly reason through the trade-offs before committing to a design. Candidates who jump directly to a solution typically miss the mark even when their solution is technically reasonable." - Werner Vogels, CTO of Amazon, in a 2023 talk on architectural thinking [2].


Behavioral component: STAR technique with technical depth

Cloud engineering interviews pair technical scenarios with behavioral deep-dives. The standard STAR (Situation, Task, Action, Result) framework applies, but candidates often under-invest in the Action portion, which is where senior interviewers find signal.

  • Situation - one or two sentences of context. Too much context bores the interviewer and wastes interview time.

  • Task - the specific outcome required. Be concrete: "We needed to reduce P95 latency on the checkout API from 800ms to under 200ms without increasing infrastructure cost."

  • Action - what you personally did. This is the most important portion. Senior interviewers want to hear the specific trade-offs you considered, the options you rejected and why, and the technical reasoning behind your chosen approach.

  • Result - quantified outcome when possible. Percentage improvements, dollar savings, incident reduction rates, or customer-impact metrics.

A candidate describing "we migrated to Aurora" in one sentence provides zero signal. A candidate describing the Aurora-versus-RDS-versus-self-managed-Postgres evaluation with specific cost and performance benchmarks provides strong senior-level signal.


Common interview traps and how to handle them

  • The intentional ambiguity - interviewers sometimes leave requirements deliberately unclear to see whether candidates ask clarifying questions. The correct response is to ask about traffic patterns, consistency requirements, budget constraints, and SLA targets before designing.

  • The overengineering trap - when asked to design a simple read-heavy website, candidates who propose a multi-region Aurora Global Database with Route 53 failover demonstrate insensitivity to cost and complexity. Senior candidates propose the simplest solution that meets requirements, then articulate when more complex solutions become justified.

  • The "just memorize service names" trap - listing AWS services without describing when to choose one over another is a junior-level response. Senior-level responses articulate decision criteria.

  • The live documentation check - some interviewers openly invite candidates to check documentation during design. The correct behavior is to accept the offer for specific detail lookups (exact limits, service names) while articulating the conceptual design from memory.


Hands-on lab preparation by provider

Mock interviews are far more effective when paired with hands-on lab practice in the actual cloud environment.

| Provider | Free tier preparation environment | Cost after free tier |
| --- | --- | --- |
| AWS | AWS Free Tier (12 months), AWS Skill Builder labs | $29/month Skill Builder individual |
| Azure | Azure Free Account ($200 credit for the first 30 days, plus 12 months of select free services), Microsoft Learn Sandbox (free) | Pay-as-you-go |
| Google Cloud | GCP Free Tier (always free + $300 credit), Qwiklabs / Skills Boost | $29/month Skills Boost |
| Multi-cloud | A Cloud Guru hands-on labs, Pluralsight, Cloud Academy | $29-$45/month |
| Kubernetes | kind, Minikube (local free), Play with Kubernetes (free) | $0 if local |

For interview preparation, candidates should complete at least one end-to-end build in the provider's free tier before the interview. A candidate who has personally stood up a multi-tier application with IaC will articulate design trade-offs with credibility that video-course-only candidates cannot match.


Frequently Asked Questions

What AWS topics come up most in cloud engineer interviews?

The most common topics are IAM roles vs users and least privilege policy design, EC2 pricing models, VPC architecture for multi-tier applications, S3 vs EBS storage trade-offs, and Infrastructure as Code with Terraform or CloudFormation. Security and networking questions appear in nearly every cloud role interview.

What is the difference between an IAM role and an IAM user in AWS?

An IAM user is a persistent identity with long-term credentials. An IAM role has no long-term credentials of its own; services and applications assume it to get short-lived STS credentials. Using roles for EC2, Lambda, and ECS is best practice because credentials rotate automatically and are never stored as plaintext.

How should I answer a VPC design question in a cloud interview?

Start by confirming requirements (tiers, availability requirements, internet exposure). Then describe a design with public subnets for load balancers and NAT gateways, private subnets for application and database tiers, multi-AZ placement for fault tolerance, and security groups controlling traffic between tiers. Explain the reasoning for each decision.

What is the difference between Azure NSG and Azure Firewall?

A Network Security Group provides stateful packet filtering at the subnet or NIC level based on IP, port, and protocol. Azure Firewall is a managed service with application-layer filtering, FQDN rules, threat intelligence, and centralized logging—appropriate for enterprise-scale egress control and east-west inspection.

What is a Lambda cold start and how do you reduce it?

A cold start occurs when AWS initializes a new execution environment for a Lambda function, either because it has not been invoked recently or because it is scaling out to serve more concurrent requests. You can reduce cold start impact by minimizing package size, using runtimes that start quickly such as Go or Node.js, keeping initialization code outside the handler, and using Provisioned Concurrency for latency-sensitive functions.