Question 1

Walk me through how a request flows from a user to your backend.

Accepted Answer

DNS → CDN edge → load balancer → ingress / service mesh → application pod → database. Mention TLS termination, where caching happens, and where you would add observability. Strong answers also flag failure points at each hop.
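One way to make "failure points at each hop" concrete is to treat the path as a serial chain, where the request succeeds only if every hop does. The sketch below multiplies per-hop availabilities. The figures are hypothetical, and it assumes hops fail independently and ignores requests served from the CDN cache without reaching the backend.

```python
from math import prod

# Hypothetical per-hop availability figures, for illustration only.
HOPS = {
    "dns": 0.9999,
    "cdn_edge": 0.9999,
    "load_balancer": 0.9999,
    "ingress": 0.9995,
    "application_pod": 0.999,
    "database": 0.9995,
}


def end_to_end_availability(hops: dict[str, float]) -> float:
    """Availability of a serial chain: every hop must succeed."""
    return prod(hops.values())


def weakest_hop(hops: dict[str, float]) -> str:
    """Name of the hop with the lowest availability."""
    return min(hops, key=hops.get)


if __name__ == "__main__":
    print(f"end-to-end availability: {end_to_end_availability(HOPS):.4%}")
    print(f"weakest hop: {weakest_hop(HOPS)}")
```

The end-to-end figure is always lower than the weakest single hop, which is why an answer that names the failure mode at each layer is stronger than one that only lists the layers.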

Question 2

How do you safely deploy a change to production?

Accepted Answer

Pipeline gates (tests, security scan, manual approval if needed). Deploy strategy: blue-green, canary, or rolling. Health checks. Auto-rollback on metric regression. Feature flags decouple deploy from release. Discuss what you actually use, not theory.
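"Auto-rollback on metric regression" can be sketched as a comparison between the canary and the stable baseline over the same time window. The thresholds, field names, and the `canary_decision` function below are illustrative assumptions, not the behavior of any particular deployment tool.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Metrics for one deployment over an observation window."""
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: WindowStats,
    canary: WindowStats,
    *,
    min_requests: int = 500,              # hypothetical: minimum sample before judging
    max_error_rate_delta: float = 0.005,  # hypothetical: allowed absolute increase
    max_latency_ratio: float = 1.2,       # hypothetical: allowed p99 slowdown
) -> str:
    """Return "wait", "rollback", or "promote" for the canary."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to compare yet
    if canary.error_rate - baseline.error_rate > max_error_rate_delta:
        return "rollback"
    if (
        baseline.p99_latency_ms > 0
        and canary.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio
    ):
        return "rollback"
    return "promote"
```

Comparing against a live baseline rather than a fixed threshold keeps the check meaningful when overall traffic or latency shifts during the rollout.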

Question 3

A service is slow but not erroring. How do you debug?

Accepted Answer

Start with the four golden signals: latency, traffic, errors, saturation. Look at p95/p99, not the mean, because a small share of slow requests barely moves the average. Trace one slow request end-to-end. Check downstream dependencies (DB, cache, external API). Common causes of production slowness: lock contention, a slow query, GC pauses, or a noisy neighbor.
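A small worked example of why the mean is the wrong number to look at. The latency samples are made up, and `nearest_rank_percentile` is a simple nearest-rank implementation written for this sketch. Monitoring systems often interpolate or use histograms instead.

```python
import math
import statistics


def nearest_rank_percentile(samples: list[float], pct: int) -> float:
    """Smallest sample such that at least pct% of samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical window: 97 fast requests and 3 very slow ones.
latencies_ms = [20.0] * 97 + [2000.0] * 3

mean_ms = statistics.mean(latencies_ms)
p50_ms = nearest_rank_percentile(latencies_ms, 50)
p99_ms = nearest_rank_percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean={mean_ms:.1f}ms p50={p50_ms:.0f}ms p99={p99_ms:.0f}ms")
```

Here the mean stays under 100 ms and the median sits at 20 ms, while p99 exposes the 2-second requests that users are actually complaining about.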

Question 4

How do you manage secrets in production?

Accepted Answer

A secrets manager (AWS Secrets Manager, Vault, GCP Secret Manager). Short-lived credentials via IAM roles or workload identity. No secrets in env files committed to git. Rotation policy. Audit access. Strong answers cite a real near-miss.
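A minimal sketch of reading a secret at runtime rather than from a committed file, with a short cache so that rotated values are picked up. The `CachedSecret` class, the TTL, and the secret name in the usage note are hypothetical. The client is passed in so the code stays testable. It is assumed to expose `get_secret_value(SecretId=...)` and return a dict with a `SecretString` key, as the boto3 Secrets Manager client does for string secrets.

```python
import time
from typing import Any, Callable, Optional


class CachedSecret:
    """Fetch a secret on demand and re-fetch after a TTL so rotations take effect."""

    def __init__(
        self,
        client: Any,
        secret_id: str,
        ttl_seconds: float = 300.0,  # hypothetical refresh interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._client = client
        self._secret_id = secret_id
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Assumed call shape: boto3's Secrets Manager client.
            response = self._client.get_secret_value(SecretId=self._secret_id)
            self._value = response["SecretString"]
            self._expires_at = now + self._ttl
        return self._value
```

With boto3 installed and an IAM role attached to the workload, usage might look like `CachedSecret(boto3.client("secretsmanager"), "prod/db/password").get()`, where the secret name is a placeholder. No long-lived access key appears in code or config, since the client picks up short-lived credentials from the role.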

Question 5

Tell me about an incident you led and what changed afterwards.

Accepted Answer

Specific incident, your role, the timeline, the customer impact, and, most importantly, the systemic change. Tooling change > process change > "we communicated more". Blameless framing is expected by default.

Question 6

How would you reduce cloud spend without sacrificing reliability?

Accepted Answer

Reserved capacity for steady workloads, spot/preemptible instances for fault-tolerant batch jobs, right-sizing based on actual utilization, S3 lifecycle policies, and retiring orphaned resources. Reliability comes from chaos testing the cheaper config, not from over-provisioning.
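The "reserved capacity for steady workloads" rule follows from simple break-even arithmetic: a reservation is billed for every hour of the period, so it only wins when the instance runs for a large enough share of those hours. The prices and function names below are hypothetical and not taken from any provider's price list.

```python
HOURS_PER_MONTH = 730  # common approximation for billing math


def break_even_utilization(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Fraction of hours an instance must run before reserved pricing is cheaper."""
    return reserved_hourly / on_demand_hourly


def cheaper_option(
    hours_used: float,
    on_demand_hourly: float,
    reserved_hourly: float,
    hours_in_period: float = HOURS_PER_MONTH,
) -> str:
    """Compare pay-per-use cost with a reservation billed for the whole period."""
    on_demand_cost = hours_used * on_demand_hourly
    reserved_cost = hours_in_period * reserved_hourly  # paid whether used or not
    return "reserved" if reserved_cost < on_demand_cost else "on_demand"


if __name__ == "__main__":
    # Hypothetical prices: $0.10/h on demand, $0.06/h effective reserved rate.
    print(break_even_utilization(0.10, 0.06))
    print(cheaper_option(730, 0.10, 0.06))
    print(cheaper_option(200, 0.10, 0.06))
```

With these example prices the break-even sits at roughly 60% utilization, so an always-on service favors the reservation while a job that runs 200 hours a month does not.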

DevOps Engineer interview questions

Real questions with model answers
