Cloud Infrastructure Automation Platform

Environment provisioning: 4 weeks (manual) → 15 minutes

Provisioning a new cloud environment took 4+ weeks of manual click-ops. Every environment was slightly different from the last — configuration drift meant 'it works in staging' was meaningless.

DevOps · AWS · Terraform

What Was Broken

  • New environment provisioning: 4+ weeks of manual steps, tickets, and approvals
  • 100% configuration drift between dev, staging, and prod — no two environments were identical
  • No audit trail for infrastructure changes — who changed what, when, why
  • Zero automated rollback — a bad infrastructure change required manual reverting

Required Fix

  • Centralized IaC repository with modules for all standard infrastructure patterns
  • CI/CD pipeline that validates, plans, and applies Terraform on PR merge
  • Drift detection running on a schedule — alert when actual state diverges from code
  • Automated rollback on detection of provisioning failure

How It Was Built

Built Terraform modules for VPCs, EKS, and RDS. Orchestrated via GitHub Actions with S3/DynamoDB remote state. Enforced compliance with tfsec on every PR.

Modular Terraform with Remote State

Built reusable modules for VPCs (with public/private subnet patterns), EKS clusters, and RDS instances with parameter groups. Remote state in S3 with DynamoDB locking — no concurrent apply conflicts.
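The remote state setup described above can be sketched as a backend block. This is a minimal illustration, not the project's actual configuration: the bucket, key, region, and table names are placeholders.

```hcl
# Hypothetical backend configuration; every name below is a placeholder.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"        # assumed bucket name
    key            = "envs/staging/terraform.tfstate" # assumed layout: one state file per environment
    region         = "us-east-1"                      # assumed region
    dynamodb_table = "example-terraform-locks"        # lock table; needs a string hash key named "LockID"
    encrypt        = true                             # encrypt the state object at rest
  }
}
```

With a lock table configured, a second plan or apply against the same state waits or fails instead of writing over the first.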

modules/eks/main.tf
hcl
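module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # assumed pin; the arguments below match the v20 interface

  cluster_name    = var.cluster_name
  cluster_version = "1.29"
  vpc_id          = var.vpc_id
  subnet_ids      = var.private_subnet_ids

  # Managed node groups; module versions 18 and later use this argument name
  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.medium"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3
    }
  }
}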
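
As a rough sketch of how an environment could consume these modules, the root configuration below wires a VPC module into the EKS module shown above. The directory layout, the VPC module's inputs, and its output names are assumptions made for illustration. Only the EKS module's variables (cluster_name, vpc_id, private_subnet_ids) come from the listing above.

```hcl
# envs/staging/main.tf (hypothetical path and repository layout)

module "vpc" {
  source = "../../modules/vpc" # assumed location of the VPC module

  name = "staging"      # hypothetical input
  cidr = "10.20.0.0/16" # hypothetical input
}

module "eks" {
  source = "../../modules/eks" # the wrapper module shown above

  cluster_name       = "staging"
  vpc_id             = module.vpc.vpc_id             # assumed output of the VPC module
  private_subnet_ids = module.vpc.private_subnet_ids # assumed output of the VPC module
}
```

Because every environment calls the same modules and differs only in its inputs, dev, staging, and prod stay structurally identical.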

CI/CD Pipeline for Infrastructure

GitHub Actions runs tfsec (security scan) and terraform plan on every PR. Plan output is posted as a PR comment. On merge to main, terraform apply runs against the saved plan file, so what gets applied is what was reviewed.

.github/workflows/terraform.yml
yaml
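# Fail the job when tfsec reports findings
- name: Security scan
  uses: aquasecurity/tfsec-action@v1
  with:
    soft_fail: false

# Save the plan so the same file can be reviewed and later applied
- name: Terraform plan
  run: terraform plan -out=tfplan

# Post the saved plan as a PR comment
- name: Comment plan on PR
  uses: borchero/terraform-plan-comment@v1
  with:
    token: ${{ github.token }}
    planfile: tfplan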

What Changed

New environment provisioning dropped from 4 weeks to 15 minutes. 100% configuration consistency across all environments. Compute costs cut 15% via automated dev teardown.

  • Environment provisioning: 4 weeks (manual) → 15 minutes (~2,700× faster in calendar time)
  • Configuration consistency: unknown (drift) → 100% (guaranteed)
  • Compute costs: baseline → 15% lower (automated dev teardown)

"Infrastructure code now lives in Git, gets reviewed like application code, and deploys automatically. The team stopped being afraid of infrastructure changes."

Common Questions

Terraform has the largest ecosystem and the broadest module support. CDK is great for application developers, but Terraform's declarative configuration is easier to audit, and its plan step shows exactly what will change before anything is applied.
We used an S3 backend for state storage coupled with a DynamoDB table for state locking. This ensured that if two CI/CD jobs ran simultaneously, they wouldn't corrupt the infrastructure state.
Adoption took a culture shift. Engineers were used to clicking around in the console. We enforced a 'no manual changes' policy by locking down AWS permissions, making the Terraform pipeline the only way to deploy.