FinOps · AWS · Cloud · DevOps · Cost Optimization

We Cut Our AWS Bill by 47%

Without Cutting Features or Firing Anyone

Luke Halley


Cloud Developer

December 28, 2025
9 min read

$2.3 million.

That's what we were on track to spend on AWS last year. Up from $1.4 million the year before. Leadership noticed.

"Cut 30% from the cloud budget."

The usual response: find underutilized resources, right-size instances, delete old snapshots. We did all that. Got maybe 8% savings.

Then we tried something different. Instead of optimizing resources, we optimized culture.

Twelve months later: 47% reduction. Same workloads. Same team. Different mindset.

Here's how.

The Real Problem Wasn't Resources

We started where everyone starts—the AWS Cost Explorer. Found the usual suspects:

  • Dev instances running 24/7
  • Oversized RDS instances
  • Orphaned EBS volumes
  • Unused Elastic IPs

Fixed all of it. Saved 8%. Then costs kept climbing.

    Why? Because engineers kept creating new inefficiencies faster than we could find them. We were playing whack-a-mole with a thousand developers.

    The problem wasn't resources. The problem was visibility. Engineers had no idea what their code cost.

    FinOps Is a Culture, Not a Tool

    We tried cost management tools. Dashboards everywhere. Weekly reports. Nobody read them.

    The breakthrough came when we asked: "What if engineers saw cost the same way they see latency?"

    Latency matters because it's visible. It's in dashboards. It's in alerts. It's in performance reviews. Cost? Buried in a finance spreadsheet somewhere.

    So we made cost visible.

    Step 1: Cost Tags as Code

    First, we made tagging mandatory. Not "please tag your resources"—mandatory. SCPs that reject untagged resources:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireCostTags",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "lambda:CreateFunction"
      ],
      "Resource": "*",
      "Condition": {
        "Null": {
          "aws:RequestTag/cost-center": "true",
          "aws:RequestTag/team": "true",
          "aws:RequestTag/environment": "true"
        }
      }
    }
  ]
}
```

    Can't deploy without tags. Period. Engineers grumbled for a week. Then tagging became automatic—built into our Terraform modules.
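The same rule the SCP enforces can also run as a pre-deploy check in CI, so engineers get the failure before AWS does. A minimal sketch of that logic; the function names and tag set mirror the policy above but are otherwise illustrative:

```python
# Mirror of the SCP's Null-condition: a request is denied when any
# required cost tag is missing. Names here are illustrative.
REQUIRED_TAGS = {"cost-center", "team", "environment"}

def missing_cost_tags(request_tags: dict) -> set:
    """Return the required tag keys absent from a resource request."""
    return REQUIRED_TAGS - set(request_tags)

def allow_deploy(request_tags: dict) -> bool:
    """Deny (False) when any required tag is missing, like the SCP."""
    return not missing_cost_tags(request_tags)
```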

    Step 2: Cost in the CI/CD Pipeline

    Next, we added Infracost to every pull request:

```yaml
- name: Infracost
  run: |
    infracost breakdown --path=. --format=json --out-file=/tmp/infracost.json
    infracost comment github --path=/tmp/infracost.json \
      --repo=${{ github.repository }} \
      --pull-request=${{ github.event.pull_request.number }} \
      --github-token=${{ secrets.GITHUB_TOKEN }}
```

    Every PR now shows:

```
💰 Monthly cost will increase by $847 (+12%)
├── aws_rds_cluster.main            +$650/mo (new)
├── aws_instance.app[0]             +$120/mo (m5.xlarge → m5.2xlarge)
└── aws_elasticache_cluster.redis    +$77/mo (new)
```

    Engineers see cost before merge. Not after the bill arrives.
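The per-resource deltas in that comment are easy to aggregate once parsed. A sketch with a simplified record shape (this is not the actual Infracost JSON schema; the values come from the sample comment above):

```python
# Simplified per-resource monthly cost deltas (illustrative shape,
# not the real Infracost JSON schema); values from the sample PR comment.
resources = [
    {"name": "aws_rds_cluster.main",          "delta": 650.0},
    {"name": "aws_instance.app[0]",           "delta": 120.0},
    {"name": "aws_elasticache_cluster.redis", "delta": 77.0},
]

def monthly_delta(resources: list) -> float:
    """Total monthly cost change across all resources touched by a PR."""
    return sum(r["delta"] for r in resources)
```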

    Step 3: Team Cost Dashboards

    We built Grafana dashboards showing cost by team. Updated daily. Displayed on monitors in each team area.

```
Team Phoenix - December 2025
├── Daily average: $342
├── Month-to-date: $7,524
├── Projected month-end: $10,602
├── Budget: $12,000
└── Status: 🟢 On track

Top 5 Resources:
1. RDS cluster (production): $156/day
2. ECS Fargate tasks: $89/day
3. ElastiCache: $45/day
4. NAT Gateway: $32/day
5. S3 storage: $20/day
```

    Suddenly, teams owned their costs. "Why is our RDS so expensive?" became a standup topic.
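The projection on the dashboard is nothing fancier than linear extrapolation of month-to-date spend. A sketch, assuming that method (the 22 elapsed days are inferred from $7,524 at a $342 daily average):

```python
def projected_month_end(month_to_date: float, days_elapsed: int,
                        days_in_month: int) -> float:
    """Naive linear projection: average daily spend so far,
    scaled to the full month."""
    daily_avg = month_to_date / days_elapsed
    return daily_avg * days_in_month
```

This reproduces the sample dashboard: $7,524 over 22 days of a 31-day month projects to $10,602.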

    Step 4: Blameless Cost Reviews

    Monthly cost reviews. Not blame sessions—learning sessions. Format:

  • What changed? New resources, scaling events, traffic spikes
  • What surprised us? Unexpected cost increases
  • What did we learn? Architectural decisions that affected cost
  • What will we try? Experiments for next month

We found that most cost increases weren't negligence. They were reasonable decisions made without cost information. Once engineers had visibility, they made different decisions.

    The Wins That Added Up

    Reserved Instances & Savings Plans

    With predictable baseline from team dashboards, we could confidently commit:

| Commitment                  | Monthly Savings |
|-----------------------------|-----------------|
| 1-year EC2 Savings Plan     | $18,400         |
| 3-year RDS Reserved         | $12,200         |
| 1-year ElastiCache Reserved | $3,100          |

    Total: $33,700/month (40% of our savings)
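One simple way to size a commitment from dashboard history is to commit only to the floor of observed usage, with a safety margin. A heuristic sketch (this is our framing, not AWS's recommendation engine; the coverage factor is an assumed knob):

```python
def safe_commitment(daily_usage: list, coverage: float = 0.9) -> float:
    """Commit only to capacity the team uses essentially all the time:
    the minimum observed daily usage, scaled down by a safety factor.
    Heuristic sketch, not AWS's Savings Plans recommendation algorithm."""
    return min(daily_usage) * coverage
```

On-demand pricing still covers anything above the committed floor, so under-commitment only costs the discount, while over-commitment wastes real money.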

    Right-Sizing From Data

    Teams right-sized their own resources because they could see the impact:

```
Team Artemis reduced RDS from db.r5.2xlarge to db.r5.large
Savings: $412/month
Performance impact: None (CPU was at 8%)
```

    When the team makes the decision, there's no pushback.
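The Artemis example follows a pattern that's easy to automate: surface anything whose average utilization sits far below capacity and let the owning team decide. A sketch with a hypothetical record shape (the 10% threshold is an assumed default):

```python
def rightsize_candidates(instances: list, cpu_threshold: float = 10.0) -> list:
    """Flag instances whose average CPU is below the threshold as
    right-sizing candidates. Hypothetical record shape, not an AWS API
    response; the team still makes the final call."""
    return [i["id"] for i in instances if i["avg_cpu"] < cpu_threshold]
```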

    Zombie Resource Hunting

    We gamified it. Monthly leaderboard for finding unused resources:

```
🏆 Zombie Hunter Leaderboard - December
1. Sarah K.  - $2,340 saved (found 3 unused RDS snapshots)
2. Marcus T. - $1,890 saved (deleted dev EKS cluster)
3. Priya R.  - $1,200 saved (right-sized 12 EC2 instances)
```

    Small prizes. Big engagement. Resources that had been running for years got deleted in days.
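Candidate zombies can be pre-filtered from a resource inventory before humans hunt. A sketch with a hypothetical inventory record shape (the 90-day idle window is an assumed cutoff):

```python
from datetime import date, timedelta

def zombies(resources: list, today: date, idle_days: int = 90) -> list:
    """Resources that are unattached and untouched for `idle_days`.
    Hypothetical inventory shape; real hunting still needs a human to
    confirm before anything is deleted."""
    cutoff = today - timedelta(days=idle_days)
    return [r["id"] for r in resources
            if not r["attached"] and r["last_used"] < cutoff]
```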

    Architecture Changes

    The biggest wins came from engineers rethinking architecture:

Before: Lambda functions calling RDS directly

  • NAT Gateway costs: $800/month for data transfer
  • RDS connection limits: constant issues

After: Lambda functions using RDS Proxy

  • NAT Gateway costs: $200/month
  • Connection pooling: problem solved

Savings: $600/month, plus engineering time saved on connection issues.

    This change happened because an engineer saw the NAT Gateway cost in their dashboard and asked "why is this so high?"

    The Numbers

    After 12 months:

| Metric                     | Before   | After    | Change |
|----------------------------|----------|----------|--------|
| Monthly AWS spend          | $192,000 | $102,000 | -47%   |
| Cost per request           | $0.00034 | $0.00019 | -44%   |
| Resources tagged           | 34%      | 99.2%    | +191%  |
| Teams with cost visibility | 0        | 28       | +28    |

    We hit the 30% target in month 6. Kept going.
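The Change column is plain percent change. A one-liner that reproduces the rounded spend and per-request figures:

```python
def pct_change(before: float, after: float) -> int:
    """Percent change from before to after, rounded to whole percent."""
    return round((after - before) / before * 100)
```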

    What Didn't Work

    Centralized Optimization Team

    We tried having a dedicated team find savings. They found plenty—but couldn't implement changes without team cooperation. And teams resisted changes they didn't understand.

    Distributed ownership beats centralized optimization.

    Automated Shutdowns Without Context

    We tried automatically stopping dev instances at 7pm. Broke on-call debugging. Broke timezone-distributed teams. Broke deploys.

    Now we recommend schedules, but teams configure their own.
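"Recommended but team-configurable" can be as small as a per-team config check in the scheduler. A sketch with a hypothetical config shape (the `auto_stop` flag and `off_hours` window are assumptions):

```python
def should_stop(hour: int, team_schedule: dict) -> bool:
    """Stop dev instances only if the team opted in, and only inside the
    team's own off-hours window. Hypothetical config shape; the window
    may wrap past midnight, e.g. (19, 7)."""
    if not team_schedule.get("auto_stop", False):
        return False
    start, end = team_schedule["off_hours"]
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end  # window wraps past midnight
```

Teams that debug on-call at night, or span timezones, simply set `auto_stop` to false or shift the window.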

    Shaming Teams for High Costs

    Early on, we highlighted "top spending teams" in all-hands. Backfired immediately. Teams with legitimate high-cost workloads felt attacked. Others hid resources in shared accounts.

    Blameless reviews work. Shame doesn't.

    The Culture Shift

    The real change wasn't technical. It was cultural.

    Before: "Cost is finance's problem." After: "Cost is a feature we ship."

    Engineers now ask "what will this cost?" during design reviews. They compare instance types like they compare algorithms. They celebrate cost reductions in sprint retros.

    One engineer told me: "I used to ignore cost alerts. Now I check our dashboard every morning like I check Slack."

    That's the shift. Cost became visible, so it became manageable.

    Start Here

    If your cloud bill is climbing:

  • Tag everything. Use SCPs to enforce. No exceptions.
  • Add Infracost to PRs. Make cost visible at decision time.
  • Build team dashboards. Daily cost by team, by environment.
  • Run blameless reviews. Monthly, focused on learning.
  • Gamify the hunt. Leaderboards for finding waste.

You don't need a FinOps team. You need engineers who see their costs.


    47% wasn't a target—it was a side effect of making cost visible. When engineers can see the bill, they optimize naturally.