Cloud technical interviews test both conceptual understanding and practical experience. Interviewers are not just looking for candidates who can recite service names—they want to know that you have built things, hit real problems, and understand the trade-offs involved in architectural decisions. This article covers the specific questions and topics that appear repeatedly in AWS and Azure technical interviews, with the depth and framing that actually distinguishes strong candidates.
Core AWS Questions
IAM and Access Management
"Explain the difference between an IAM role and an IAM user."
An IAM user is a persistent identity with long-term credentials (a password and/or access keys) tied to a specific person or service account. An IAM role is an identity with no long-term credentials of its own; it is assumed by AWS services, applications, federated users, or principals in other AWS accounts. Assuming a role returns short-lived credentials issued by AWS STS.
The practical implication: attaching IAM roles to EC2 instances, Lambda functions, and ECS tasks is the recommended practice because the temporary credentials rotate automatically and never need to be stored in code or configuration. Hardcoded IAM user access keys are a security anti-pattern and a recurring finding in cloud breach post-mortems.
"What is the principle of least privilege and how does it apply to IAM policies?"
Least privilege means granting only the permissions required for a specific task and no more. In IAM, this means writing policies that specify exact actions (s3:GetObject rather than s3:*) on exact resources (arn:aws:s3:::my-bucket/prefix/* rather than *). Using AWS-managed policies for convenience often grants broader permissions than a workload needs.
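As a rough illustration of the contrast above, the following Python sketch builds a narrowly scoped policy document and flags wildcard actions or resources. The bucket name, prefix, and the `wildcard_findings` helper are hypothetical, and the check is far simpler than a real policy analyzer.

```python
import json

# Hypothetical bucket and prefix, used only for illustration.
BUCKET = "my-bucket"
PREFIX = "prefix"

def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an identity-based policy allowing s3:GetObject on one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }

def wildcard_findings(policy: dict) -> list:
    """Return a message for each Allow statement with a broad action or resource."""
    findings = []
    for index, statement in enumerate(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # IAM accepts either a single string or a list for these fields.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: broad action {action}")
        for resource in resources:
            if resource == "*":
                findings.append(f"statement {index}: broad resource {resource}")
    return findings

policy = read_only_policy(BUCKET, PREFIX)
print(json.dumps(policy, indent=2))
```

In an interview, walking through a policy like this one and explaining why each wildcard was removed tends to be more convincing than reciting the definition.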
Compute: EC2 and Lambda
"What are the differences between EC2 On-Demand, Reserved, Spot, and Savings Plans pricing models?"
| Model | Use Case | Cost Relative to On-Demand |
|---|---|---|
| On-Demand | Variable workloads, short-term | Baseline |
| Reserved Instances | Steady-state, 1- or 3-year commitment | Up to 72% savings |
| Spot Instances | Fault-tolerant, interruptible workloads | Up to 90% savings |
| Savings Plans | Flexible commitment across instance types | Up to 66% savings (Compute) or 72% (EC2 Instance) |
Spot Instances can be interrupted with a two-minute warning when AWS needs capacity back. They are appropriate for batch processing, CI/CD build workers, and stateless application tiers that can handle interruption.
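To make the percentages in the table concrete, here is a small arithmetic sketch in Python. The hourly rate is a made-up figure, and the discounts are the "up to" maximums, so the results are best-case numbers rather than quotes.

```python
# Hypothetical On-Demand hourly rate in USD; real rates vary by instance type and region.
ON_DEMAND_HOURLY = 0.10
HOURS_PER_MONTH = 730  # common approximation for a month of continuous use

# Maximum advertised discounts; actual savings are often lower.
MAX_DISCOUNT = {
    "On-Demand": 0.00,
    "Reserved Instances": 0.72,
    "Spot Instances": 0.90,
    "Savings Plans": 0.66,  # Compute Savings Plans figure
}

def monthly_cost(model: str, hourly_rate: float = ON_DEMAND_HOURLY) -> float:
    """Best-case monthly cost for one always-on instance under a pricing model."""
    return round(hourly_rate * (1 - MAX_DISCOUNT[model]) * HOURS_PER_MONTH, 2)

for model in MAX_DISCOUNT:
    print(f"{model}: ${monthly_cost(model):.2f} per month")
```

The point to draw out is that the cheapest model is not automatically the right one: Spot pricing only pays off if the workload tolerates interruption, and commitments only pay off if usage is actually steady.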
"What is a Lambda cold start and how do you mitigate it?"
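When no initialized execution environment is available for a request, either because the function has not been invoked recently or because traffic is scaling beyond the environments that are already warm, AWS must create a new one: download the code, start the runtime, and run the initialization code outside the handler. This delay is the cold start. Mitigations include: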
- Keeping package size small to reduce download time
- Using Provisioned Concurrency for latency-sensitive functions
- Choosing runtimes with fast startup (Go or Node.js rather than Java)
- Keeping initialization code outside the handler to amortize startup cost
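The last mitigation can be sketched in Python as follows. The `load_config` function and its return value are hypothetical stand-ins for real setup work such as creating SDK clients, and the two calls at the bottom only simulate what Lambda does when it reuses a warm environment.

```python
import time

INIT_COUNT = 0  # counts how many times the expensive setup has run

def load_config() -> dict:
    """Stand-in for expensive setup such as creating SDK clients or loading config."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.05)  # simulate slow initialization
    return {"table": "example-table"}  # hypothetical setting

# Module scope: runs once per execution environment, during the cold start.
CONFIG = load_config()

def handler(event, context):
    """Runs on every invocation; warm invocations reuse CONFIG without re-initializing."""
    return {"table": CONFIG["table"], "echo": event.get("id")}

# Simulate two invocations in the same (warm) execution environment.
first = handler({"id": 1}, None)
second = handler({"id": 2}, None)
print(INIT_COUNT, first, second)
```

Moving the setup inside `handler` would repeat the slow step on every request, which is the cost this pattern avoids.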
Networking: VPC Architecture
"Design a VPC for a three-tier web application."
A standard three-tier VPC design:
Internet Gateway
|
Public Subnet (one per AZ)
- Load Balancer (ALB)
- NAT Gateway
|
Private Subnet - App Tier (one per AZ)
- EC2 instances / ECS tasks
- Auto Scaling Group
|
Private Subnet - Data Tier (one per AZ)
- RDS Multi-AZ
- ElastiCache
Key points to cover: why the app and data tiers sit in private subnets (not reachable from the internet), how a NAT Gateway enables outbound traffic without exposing instances to inbound connections, why spreading subnets across multiple Availability Zones provides fault tolerance, and how security groups limit traffic between tiers.
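One way to show you have actually planned such a layout is to carve the address space yourself. The sketch below uses Python's standard `ipaddress` module; the VPC CIDR, the two Availability Zone names, and the /24 subnet size are all assumptions chosen for illustration.

```python
import ipaddress

# Assumed values for illustration: a /16 VPC, two AZs, /24 subnets.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["us-east-1a", "us-east-1b"]
TIERS = ["public", "app", "data"]

def plan_subnets(vpc, azs, tiers, new_prefix=24):
    """Assign one non-overlapping subnet per (tier, AZ) pair."""
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of consecutive subnets
    return {(tier, az): next(blocks) for tier in tiers for az in azs}

layout = plan_subnets(VPC_CIDR, AZS, TIERS)
for (tier, az), cidr in layout.items():
    print(f"{tier:<6} {az}  {cidr}")
```

A real design would usually leave gaps between tiers for growth and size the public subnets smaller than the private ones; this sketch simply allocates blocks in order.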
Storage: S3 and EBS
"What is the difference between S3 and EBS? When do you use each?"
EBS (Elastic Block Store) provides block storage attached to an EC2 instance, typically one instance at a time, within a single Availability Zone. It behaves like a hard drive: you format it, mount it, and read and write through a filesystem. It is appropriate for operating system volumes, databases, and any workload requiring low-latency block I/O.
S3 is object storage accessed via API. It is not mounted like a filesystem (though s3fs enables this with caveats). It is appropriate for static assets, backups, data lakes, and application artifacts. S3 is designed for 99.999999999% (11 nines) durability and scales without provisioning capacity.
"What is an S3 bucket policy vs. an ACL vs. a presigned URL?"
A bucket policy is a resource-based IAM policy attached to the bucket that grants permissions to AWS principals, including cross-account access. An ACL (Access Control List) is a legacy mechanism that grants coarse-grained access to specific AWS accounts (identified by canonical user ID) or predefined groups; AWS now recommends keeping ACLs disabled. A presigned URL grants time-limited access to a specific S3 object to anyone who holds the URL. It carries the permissions of the principal that signed it, so the recipient needs no AWS credentials of their own.
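The idea behind a presigned URL, a signature plus an expiry embedded in the link, can be illustrated with a toy Python example. This is not the AWS Signature Version 4 algorithm that S3 actually uses; the secret key, parameter names, and object path are hypothetical, and the sketch only shows why such a URL works without credentials and stops working after it expires.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical key; S3 derives keys from the signer's credentials

def sign_url(path: str, ttl_seconds: int, now=None) -> str:
    """Return the path with an expiry timestamp and an HMAC over path + expiry."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    message = f"{path}:{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': signature})}"

def verify_url(url: str, now=None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    message = f"{parsed.path}:{expires}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and now <= expires

url = sign_url("/my-bucket/report.pdf", ttl_seconds=300, now=1_000)
print(url)
```

Because the expiry is part of the signed message, a holder of the URL cannot extend its lifetime or point it at a different object without invalidating the signature.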
Core Azure Questions
Azure Active Directory and RBAC
"What is the difference between Azure AD (Entra ID) and on-premises Active Directory?"
Azure AD (now called Microsoft Entra ID) is a cloud-native identity provider designed for web protocols: OAuth 2.0, OpenID Connect, and SAML. On-premises Active Directory uses Kerberos and NTLM for authentication within a domain. Azure AD Connect (now Microsoft Entra Connect) synchronizes identities between on-premises AD and Azure AD, enabling hybrid identity scenarios. Azure AD does not support group policies or organizational units in the traditional AD sense.
"Explain Azure RBAC and the difference between Owner, Contributor, and Reader roles."
Azure RBAC (Role-Based Access Control) controls access to Azure resources at management group, subscription, resource group, or individual resource scope, and a role assigned at one scope is inherited by the scopes beneath it. Built-in roles:
- Owner: full access including the ability to delegate access to others
- Contributor: full access to create and manage resources but cannot grant access
- Reader: view resources but cannot make changes
Custom roles allow fine-grained permission sets. The principle of least privilege applies: most service accounts and automation should use Contributor at the resource group scope rather than Owner at the subscription scope.
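The difference between the three built-in roles, and the effect of scope, can be sketched with a simplified evaluator in Python. The role definitions below are abbreviated excerpts (the real Contributor role lists more exclusions), the subscription ID and resource names are hypothetical, and actual Azure evaluation also handles deny assignments, data actions, and conditions that are ignored here.

```python
from fnmatch import fnmatchcase

# Simplified excerpts of the built-in role definitions; the real Contributor
# role lists additional NotActions beyond the ones shown here.
ROLES = {
    "Owner": {"actions": ["*"], "not_actions": []},
    "Contributor": {
        "actions": ["*"],
        "not_actions": [
            "Microsoft.Authorization/*/Write",
            "Microsoft.Authorization/*/Delete",
            "Microsoft.Authorization/elevateAccess/Action",
        ],
    },
    "Reader": {"actions": ["*/read"], "not_actions": []},
}

def matches(pattern: str, operation: str) -> bool:
    """Case-insensitive wildcard match between a role pattern and an operation."""
    return fnmatchcase(operation.lower(), pattern.lower())

def is_allowed(role: str, operation: str, assigned_scope: str, resource_id: str) -> bool:
    """Allow when the resource sits at or below the assignment scope and the
    operation matches Actions without matching NotActions."""
    scope = assigned_scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    in_scope = target == scope or target.startswith(scope + "/")
    definition = ROLES[role]
    granted = any(matches(p, operation) for p in definition["actions"])
    excluded = any(matches(p, operation) for p in definition["not_actions"])
    return in_scope and granted and not excluded

RG = "/subscriptions/0000/resourceGroups/app-rg"  # hypothetical scope
VM = RG + "/providers/Microsoft.Compute/virtualMachines/web-1"

print(is_allowed("Contributor", "Microsoft.Compute/virtualMachines/write", RG, VM))
print(is_allowed("Contributor", "Microsoft.Authorization/roleAssignments/write", RG, VM))
```

The second call shows the key distinction: Contributor can manage the virtual machine but cannot create role assignments, which is what separates it from Owner.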
Azure Networking
"What is the difference between a Network Security Group and an Azure Firewall?"
A Network Security Group (NSG) provides stateful packet filtering at the subnet or NIC level. Rules are based on source/destination IP, port, and protocol. NSGs are appropriate for segmenting traffic within a virtual network.
Azure Firewall is a managed, cloud-native firewall with application-level filtering (FQDNs, URL categories), threat intelligence feeds, and centralized logging. It is appropriate for east-west traffic inspection and internet egress filtering at enterprise scale.
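The NSG rule model can be illustrated with a short Python sketch. NSG rules are processed in priority order, lowest number first, and the first match decides the outcome. The rules and addresses below are hypothetical, and the sketch ignores Azure's built-in default rules (such as AllowVnetInBound), service tags, and port ranges, falling back to deny when nothing matches.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    priority: int   # lower number is evaluated first
    source: str     # CIDR block
    port: int
    protocol: str   # "Tcp", "Udp", or "*"
    access: str     # "Allow" or "Deny"

def evaluate(rules, source_ip: str, port: int, protocol: str) -> str:
    """Return the access decision of the first matching rule, in priority order."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        in_range = address in ipaddress.ip_network(rule.source)
        proto_ok = rule.protocol in ("*", protocol)
        if in_range and rule.port == port and proto_ok:
            return rule.access
    return "Deny"  # simplification: the real default rules are not modeled

# Hypothetical inbound rules for a data-tier subnet.
RULES = [
    Rule(200, "10.0.0.0/16", 5432, "Tcp", "Deny"),
    Rule(100, "10.0.2.0/24", 5432, "Tcp", "Allow"),
]

print(evaluate(RULES, "10.0.2.15", 5432, "Tcp"))
print(evaluate(RULES, "10.0.9.9", 5432, "Tcp"))
```

Everything in this model is an IP, port, or protocol match, which is the limitation that Azure Firewall's FQDN and application-level rules address.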
"What is VNet peering and when would you use it instead of a VPN?"
VNet peering connects two Azure virtual networks within the same region or across regions (global peering) using the Azure backbone network. Traffic is private, does not traverse the public internet, and has lower latency than a VPN connection. It is appropriate for connecting workloads across VNets when you do not need the overhead of VPN gateway management.
A VPN gateway is appropriate when you need site-to-site connectivity with on-premises infrastructure or when you need to connect to Azure over an encrypted tunnel from outside the Azure backbone.
Cross-Cloud Architecture Questions
Senior cloud interviews often include architecture and trade-off questions that span providers or compare cloud-native patterns with traditional approaches.
"The most common failure I see in cloud interviews is candidates who can describe services but cannot explain the trade-offs. Every architectural decision involves trade-offs—if a candidate cannot articulate what they gave up by choosing RDS over self-managed Postgres, they have not thought deeply about the decision." — Michael Wittig, co-author of Amazon Web Services in Action (Manning Publications)
"What is the CAP theorem and how does it affect your database choices in the cloud?"
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions are a reality, cloud database design primarily involves choosing between consistency (CP) and availability (AP).
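Amazon DynamoDB defaults to eventually consistent reads, which favor availability and low latency, and supports strongly consistent reads at higher cost and latency. Amazon Aurora provides strong consistency on its writer instance within a region. Aurora Global Database replicates asynchronously across regions, so reads in secondary regions can lag behind the primary. Globally distributed workloads that require strong consistency therefore need a CP database, or must send consistent reads to the primary region, and accept the higher latency that comes with either choice.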
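The difference between eventually consistent and strongly consistent reads can be shown with a toy model in Python. This is a deliberately minimal sketch with one primary and one replica; it does not reflect how DynamoDB or Aurora are implemented, and the class and key names are invented for the example.

```python
class ReplicatedStore:
    """Toy key-value store with one primary and one asynchronously updated replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply queued writes to the replica, as background replication would."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key, consistent=False):
        """Strong reads go to the primary; eventual reads may return stale data."""
        source = self.primary if consistent else self.replica
        return source.get(key)

store = ReplicatedStore()
store.write("order-1", "PAID")
stale = store.read("order-1")                   # replica has not caught up yet
fresh = store.read("order-1", consistent=True)  # served from the primary
store.replicate()
converged = store.read("order-1")               # replica now matches
print(stale, fresh, converged)
```

The trade-off to articulate is that the strong read depends on reaching the primary: if a partition isolates it, that read fails or waits, while the eventual read still answers with possibly stale data.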
"Explain Infrastructure as Code and why it matters for cloud operations."
IaC means managing cloud resources through code rather than through the management console. Tools include Terraform, AWS CloudFormation, and Azure Bicep/ARM templates. Benefits:
- Resources are reproducible and version-controlled
- Drift between environments is detectable
- Provisioning can be automated and audited
- Destruction and recreation is predictable
In interviews, be prepared to describe a real IaC workflow: writing Terraform, running plan to review changes, applying, storing state remotely (S3 + DynamoDB locking), and managing multiple environments through workspaces or separate state files.
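The "plan" step in that workflow is, at its core, a comparison between declared and deployed state. The Python sketch below is a conceptual illustration only; it is not how Terraform computes plans, and the resource names and attributes are hypothetical.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare declared resources with deployed ones and list the required changes."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
    }

# Hypothetical resource names and attributes.
desired = {
    "bucket.logs": {"versioning": True},
    "instance.web": {"type": "t3.small"},
}
actual = {
    "instance.web": {"type": "t3.micro"},     # drifted from the declared size
    "instance.legacy": {"type": "t2.micro"},  # no longer declared
}

changes = plan(desired, actual)
print(changes)
```

Reviewing a diff like this before applying it is what makes drift visible and changes auditable, which is the argument for IaC over console changes.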
References
- Wittig, M., & Wittig, A. (2019). Amazon Web Services in Action (2nd ed.). Manning Publications. ISBN: 978-1617295119
- AWS Documentation. (2024). "AWS Identity and Access Management User Guide." https://docs.aws.amazon.com/iam/
- Microsoft Azure Documentation. (2024). "Azure Role-Based Access Control documentation." https://learn.microsoft.com/en-us/azure/role-based-access-control/
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media. ISBN: 978-1449373320
- HashiCorp. (2024). "Terraform Best Practices." https://developer.hashicorp.com/terraform/docs/cloud-docs/recommended-practices
- Fowler, M. (2016). "Infrastructure as Code." https://martinfowler.com/bliki/InfrastructureAsCode.html
- Amazon Web Services. (2024). "AWS Lambda Developer Guide: Performance Optimization." https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
Frequently Asked Questions
What AWS topics come up most in cloud engineer interviews?
The most common topics are IAM roles vs users and least privilege policy design, EC2 pricing models, VPC architecture for multi-tier applications, S3 vs EBS storage trade-offs, and Infrastructure as Code with Terraform or CloudFormation. Security and networking questions appear in nearly every cloud role interview.
What is the difference between an IAM role and an IAM user in AWS?
An IAM user is a persistent identity with long-term credentials. An IAM role has no long-term credentials of its own; services and applications assume it to obtain short-lived credentials from AWS STS. Using roles for EC2, Lambda, and ECS is best practice because the credentials rotate automatically and never need to be stored in code or configuration.
How should I answer a VPC design question in a cloud interview?
Start by confirming requirements (tiers, availability requirements, internet exposure). Then describe a design with public subnets for load balancers and NAT gateways, private subnets for application and database tiers, multi-AZ placement for fault tolerance, and security groups controlling traffic between tiers. Explain the reasoning for each decision.
What is the difference between Azure NSG and Azure Firewall?
A Network Security Group provides stateful packet filtering at the subnet or NIC level based on IP, port, and protocol. Azure Firewall is a managed service with application-layer filtering, FQDN rules, threat intelligence, and centralized logging—appropriate for enterprise-scale egress control and east-west inspection.
What is a Lambda cold start and how do you reduce it?
A cold start occurs when AWS must initialize a new execution environment for a Lambda function, typically because it has not been invoked recently or because traffic is scaling up. You can reduce cold start impact by minimizing package size, choosing runtimes with fast startup such as Go or Node.js, keeping initialization code outside the handler, and using Provisioned Concurrency for latency-sensitive functions.
