Cloud Cost Optimization: How We Saved a Client $2M Annually
The Problem: $3M Annual AWS Bill, No Visibility
When Meridian Logistics brought us in, they were spending $3M per year on AWS with no clear sense of where the money was going. Their FinOps practice was a spreadsheet updated monthly by one overworked engineer.
By the time we were done, they were spending $960K annually — a 68% reduction — with fully automated cost monitoring and a team that understood every dollar.
Step 1: Build a Cost Map Before Touching Anything
The worst mistake in cloud optimization is optimizing before you understand where the money goes. We spent the first two weeks doing nothing but mapping:
- Which services generate which costs
- Which teams own which resources
- Which resources are idle, underutilized, or orphaned
Using AWS Cost Explorer and a custom tagging enforcement policy, we found:
- 23% of compute spend went to resources that hadn't served traffic in 30+ days
- 41% of storage costs came from uncompressed, non-tiered S3 data that was accessed fewer than 3 times per year
- 18% of network costs were cross-region data transfer that could be eliminated with topology changes
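Those percentages apply to three different cost categories, so they can't be added into a single figure. All three sources of waste were identified before we wrote a single line of infrastructure code.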
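Once utilization and tag data are exported, a first pass at the cost map can be scripted. The sketch below assumes a hypothetical `Resource` record per resource. The 30-day idle window comes from the audit above and the 15% CPU cutoff mirrors Step 2. Treating a resource as orphaned when it lacks owner tags (hypothetical `team` and `product` keys) is our simplification, not a standard definition.

```python
from dataclasses import dataclass, field

IDLE_DAYS = 30                      # no traffic for this long counts as idle
LOW_CPU_PCT = 15.0                  # average CPU below this counts as underutilized
REQUIRED_TAGS = ("team", "product")  # hypothetical owner tag keys


@dataclass
class Resource:
    resource_id: str
    monthly_cost: float
    days_since_traffic: int
    avg_cpu_pct: float
    tags: dict = field(default_factory=dict)


def classify(resources):
    """Bucket resource IDs into idle, underutilized, and orphaned (untagged) groups."""
    report = {"idle": [], "underutilized": [], "orphaned": []}
    for r in resources:
        # Without owner tags, nobody can be asked whether the resource is still needed.
        if any(key not in r.tags for key in REQUIRED_TAGS):
            report["orphaned"].append(r.resource_id)
        # Idle takes precedence: an idle resource is not also reported as underutilized.
        if r.days_since_traffic >= IDLE_DAYS:
            report["idle"].append(r.resource_id)
        elif r.avg_cpu_pct < LOW_CPU_PCT:
            report["underutilized"].append(r.resource_id)
    return report


def share_of_cost(resources, ids):
    """Fraction of total monthly cost attributable to the given resource IDs."""
    wanted = set(ids)
    total = sum(r.monthly_cost for r in resources)
    flagged = sum(r.monthly_cost for r in resources if r.resource_id in wanted)
    return flagged / total if total else 0.0
```

Running `classify` over the fleet and feeding each bucket to `share_of_cost` yields the kind of per-category percentages listed above.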
Step 2: Right-Size Before You Optimize
Everyone wants to jump straight to Spot Instances and Reserved Instances. That's the wrong order: commit before right-sizing and you lock in a discount on capacity you're about to remove.
Right-sizing first:
- Analyzed actual CPU and memory utilization across 340 EC2 instances over 90 days
- Found 60% were running at under 15% CPU utilization
- Downgraded 200+ instances to smaller types, maintaining headroom
This alone saved $420K annually.
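At its core, the right-sizing pass finds a near-peak utilization figure and then picks the smallest size that keeps it under a target. This is a minimal sketch, assuming a single instance family where vCPUs and memory double at each size step (true for families such as m5), p95 as the near-peak measure, and a hypothetical 60% target that leaves headroom. EC2 doesn't publish memory utilization to CloudWatch by default; those samples require the CloudWatch agent.

```python
import math

# Hypothetical size ladder for one family: vCPUs (and memory) double each step.
SIZE_LADDER = [("large", 2), ("xlarge", 4), ("2xlarge", 8), ("4xlarge", 16), ("8xlarge", 32)]


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_size(current_size, cpu_pct, mem_pct, target_peak_pct=60.0):
    """Smallest size whose capacity keeps p95 CPU and memory at or below the target."""
    vcpus = dict(SIZE_LADDER)[current_size]
    # Whichever resource is busier at p95 drives the sizing decision.
    busiest = max(p95(cpu_pct), p95(mem_pct))
    # Capacity actually in use, scaled so it would sit at the target utilization.
    needed_vcpus = vcpus * busiest / target_peak_pct
    for name, size_vcpus in SIZE_LADDER:
        if size_vcpus >= needed_vcpus:
            return name
    return SIZE_LADDER[-1][0]
```

For example, a `2xlarge` whose p95 CPU is 10% and p95 memory is 20% needs about 2.7 vCPUs at a 60% target, so the sketch recommends `xlarge`. The same logic will recommend a larger size when p95 exceeds the target, which is a useful sanity check on the headroom assumption.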
Step 3: Storage Tiering — The Overlooked Win
Most teams treat S3 as a single-tier flat storage system. That's leaving significant money on the table.
We implemented a tiering strategy:
- S3 Standard — active data accessed frequently
- S3 Intelligent-Tiering — data with unpredictable access patterns
- S3 Glacier Instant Retrieval — archives accessed occasionally
- S3 Glacier Deep Archive — compliance archives, rarely accessed
Automated lifecycle rules transition objects between tiers based on object age, and Intelligent-Tiering moves objects automatically based on observed access. The savings: $380K annually on storage alone.
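The lifecycle side can be expressed as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. In this minimal sketch the prefixes and day counts are hypothetical. What it illustrates is that each rule keys on prefix and object age, while Intelligent-Tiering is the piece that reacts to actual access.

```python
def lifecycle_configuration():
    """Build an S3 lifecycle configuration in the shape boto3 expects.

    Prefixes and day counts are hypothetical placeholders.
    """
    return {
        "Rules": [
            {
                # Unpredictable access: hand objects to Intelligent-Tiering immediately.
                "ID": "shared-to-intelligent-tiering",
                "Filter": {"Prefix": "shared/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
            {
                # Occasionally read archives: millisecond retrieval, lower storage price.
                "ID": "archives-to-glacier-ir",
                "Filter": {"Prefix": "archives/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER_IR"}],
            },
            {
                # Compliance copies that are almost never read.
                "ID": "compliance-to-deep-archive",
                "Filter": {"Prefix": "compliance/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
            },
        ]
    }
```

You would apply it with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_configuration())`. That call replaces the bucket's existing lifecycle configuration rather than merging with it, so every rule the bucket needs has to be in the dictionary.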
Step 4: Savings Plans, Reserved Instances, and Spot
Only after right-sizing and tiering did we touch commitments.
With 12 months of reliable baseline data, we matched each workload to a pricing model:
- 1-year Compute Savings Plans for predictable baseline load
- Spot Instances with intelligent interruption handling for batch workloads
- Reserved Instances for databases that run 24/7
Combined savings: $540K annually.
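Commitment sizing is the judgment call in this step. One common heuristic, sketched below, is to commit to a low percentile of hourly baseline usage so that the commitment is almost always fully used, and let Spot and On-Demand absorb everything above it. The 10th percentile and the input series (hourly On-Demand-equivalent compute spend) are hypothetical. Real Savings Plans are priced at discounted rates, which this sketch ignores.

```python
import math


def nearest_rank(values, pct):
    """Nearest-rank percentile (pct in 0..100] of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def size_commitment(hourly_usage, pct=10):
    """Commit to a low percentile of hourly usage so the commitment stays busy."""
    if not hourly_usage:
        raise ValueError("need at least one hour of usage data")
    return nearest_rank(hourly_usage, pct)


def evaluate(hourly_usage, commitment):
    """Return (utilization, coverage) for a flat hourly commitment.

    utilization: share of the commitment actually consumed.
    coverage: share of total usage that the commitment paid for.
    """
    covered = sum(min(u, commitment) for u in hourly_usage)
    total = sum(hourly_usage)
    utilization = covered / (commitment * len(hourly_usage)) if commitment else 0.0
    coverage = covered / total if total else 0.0
    return utilization, coverage
```

With 90 hours at 10 units and 10 hours at 20, the sketch commits to 10 units: utilization is 100%, and the commitment covers about 91% of usage. Committing higher raises coverage but leaves paid-for capacity idle during quiet hours.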
Step 5: Architecture Changes for Long-Term Efficiency
Some optimizations require architectural changes, not just configuration tweaks:
- Moved scheduled batch jobs from ECS to AWS Batch — 40% cheaper for workloads that don't need constant containers
- Replaced polling with EventBridge — eliminated expensive Lambda invocations that ran every minute to check for work that wasn't there
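- Moved static assets to CloudFront, which serves cached responses from the edge without touching the origin; data transfer from AWS origins to CloudFront isn't billed, so origin load and origin egress costs dropped significantly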
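The polling-to-events change is easiest to see side by side. This sketch assumes the work arrives as objects in an S3 bucket (the bucket name is hypothetical) with EventBridge notifications enabled on that bucket. The pattern matches S3's `Object Created` events, so the function runs only when there is actually work to do.

```python
import json

# Hypothetical bucket; EventBridge notifications must be enabled on it.
INCOMING_BUCKET = "example-incoming-orders"


def object_created_pattern(bucket):
    """EventBridge event pattern matching S3 'Object Created' events for one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket]}},
    }


def event_pattern_json(bucket):
    """Serialize the pattern; EventBridge's PutRule takes it as a JSON string."""
    return json.dumps(object_created_pattern(bucket))


def polling_invocations(per_minute=1, days=30):
    """Invocations a fixed-schedule poller makes in a month, whether or not work exists."""
    return per_minute * 60 * 24 * days
```

The JSON string is what `boto3.client("events").put_rule(Name=..., EventPattern=...)` expects; the Lambda function is then attached with `put_targets`. The second function quantifies the baseline being replaced: a once-a-minute poller makes 43,200 invocations in a 30-day month.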
The Lasting Change: FinOps Culture
The technical changes saved $2M. But the cultural change is what makes the savings permanent.
Every service now has:
- A cost allocation tag linking it to a team and a product
- A monthly budget alert at 80% of expected spend
- A dashboard visible to every engineer on that team
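These per-service guardrails can be generated rather than clicked together. A minimal sketch, assuming hypothetical `team` and `product` tag keys and an email subscriber: one function reports missing cost allocation tags, and the other builds arguments for boto3's `budgets.create_budget` with an alert on actual spend at 80% of the limit. As written, the budget covers the whole account; scoping it to one team's tag requires a cost filter that we've left out.

```python
REQUIRED_TAGS = ("team", "product")  # hypothetical cost allocation tag keys
ALERT_THRESHOLD_PCT = 80.0           # alert at 80% of expected monthly spend


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return required tag keys that are absent or empty on a resource."""
    return [key for key in required if not tags.get(key)]


def budget_request(team, monthly_limit_usd, email):
    """Keyword arguments for budgets.create_budget (AccountId must be added)."""
    return {
        "Budget": {
            "BudgetName": f"{team}-monthly",
            # The Budgets API takes the amount as a string.
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": ALERT_THRESHOLD_PCT,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }
```

Running `missing_tags` in CI against planned resources catches untagged infrastructure before it ships, which keeps the cost map from Step 1 accurate without a monthly spreadsheet audit.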
When engineers can see the dollar impact of their architectural decisions on their own team's dashboard, they make better decisions. That's not a technology problem; it's a visibility problem. And it's the most important infrastructure you can build.
Join 5,000+ engineers and product leaders reading IntelliNodes weekly. No spam, unsubscribe anytime.