Cloud Cost Optimization — 40% Waste Found

Cloud bills were growing, but nobody had done a structured audit of what was actually being used.

What Was Broken

It was the classic "provision for peak, pay forever" problem: capacity sized once for worst-case traffic and never revisited.

How It Was Built

I started with a full audit: 30 days of CPU and memory utilization metrics across every running instance. About 40% were either massively oversized or had near-zero traffic. The oversized ones I right-sized against actual p95 utilization rather than the original capacity assumptions. For the near-idle ones I introduced scheduled scaling: scale down during known low-traffic hours, scale back up before peak. I also identified services running 24/7 that only needed to exist during business hours.
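The audit step above can be sketched as a simple classifier over utilization samples. This is an illustrative sketch, not the actual tooling from the project: the threshold values and the `classify` / `p95` helpers are assumptions chosen to show the shape of the analysis.

```python
def p95(samples):
    """95th percentile of a list of utilization samples (0-100)."""
    ordered = sorted(samples)
    idx = max(0, round(0.95 * len(ordered)) - 1)
    return ordered[idx]

def classify(instances, oversized_below=20.0, idle_below=2.0):
    """Bucket instances by p95 CPU over the observation window.

    instances: {name: [cpu_pct_sample, ...]} over ~30 days.
    Thresholds are illustrative, not the ones used in the project.
    """
    buckets = {"oversized": [], "idle": [], "ok": []}
    for name, cpu_samples in instances.items():
        peak = p95(cpu_samples)
        if peak < idle_below:
            buckets["idle"].append(name)       # candidate for shutdown/schedule
        elif peak < oversized_below:
            buckets["oversized"].append(name)  # candidate for right-sizing
        else:
            buckets["ok"].append(name)
    return buckets
```

In practice the samples would come from the monitoring stack's metrics API rather than an in-memory dict; the point is that the decision is driven by observed p95, not by what the instance was provisioned for.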

What Changed

We cut monthly cloud costs substantially while availability stayed flat. Finance noticed before I even sent the report.

Common Questions

How did you right-size without risking performance?
I used p95 utilization as the baseline, not the average, since averages hide spikes. I also kept 20-30% headroom above p95 when selecting the new instance size, and monitored closely for two weeks post-change.
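That sizing rule (p95 demand plus headroom, then the smallest instance that covers it) can be sketched as follows. The catalog here is a hypothetical subset of instance types, and `right_size` is an illustrative helper, not the project's actual tool.

```python
# (name, vCPUs), ordered smallest first; illustrative subset only.
CATALOG = [
    ("m5.large", 2),
    ("m5.xlarge", 4),
    ("m5.2xlarge", 8),
    ("m5.4xlarge", 16),
]

def right_size(current_vcpus, p95_cpu_pct, headroom=0.25):
    """Pick the smallest catalog entry covering p95 demand plus headroom.

    p95_cpu_pct is observed p95 CPU as a % of the *current* instance,
    so convert it to absolute vCPU demand before adding headroom.
    """
    demand = current_vcpus * (p95_cpu_pct / 100.0) * (1.0 + headroom)
    for name, vcpus in CATALOG:
        if vcpus >= demand:
            return name
    return CATALOG[-1][0]  # demand exceeds catalog; keep the largest
```

For example, an m5.4xlarge (16 vCPUs) peaking at 15% p95 CPU only needs 16 x 0.15 x 1.25 = 3 vCPUs of covered capacity, so it drops to an m5.xlarge.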
What tooling did this require?
Cloud-native cost dashboards and utilization metrics from our monitoring stack. I pulled data into spreadsheets for the initial analysis, then built dashboards to keep costs visible on an ongoing basis.