FinOps · AWS · Cloud · DevOps · Cost Optimization

We Cut Our AWS Bill by 47%

Without Cutting Features or Firing Anyone

Luke Halley


Cloud Developer

December 28, 2025
9 min read

$2.3 million.

That's what we were on track to spend on AWS last year. Up from $1.4 million the year before. Leadership noticed.

"Cut 30% from the cloud budget."

The usual response: find underutilized resources, right-size instances, delete old snapshots. We did all that. Got maybe 8% savings.

Then we tried something different. Instead of optimizing resources, we optimized culture.

Twelve months later: 47% reduction. Same workloads. Same team. Different mindset.

Here's how.

The Real Problem Wasn't Resources

We started where everyone starts—the AWS Cost Explorer. Found the usual suspects:

  • Dev instances running 24/7
  • Oversized RDS instances
  • Orphaned EBS volumes
  • Unused Elastic IPs

Fixed all of it. Saved 8%. Then costs kept climbing.

    Why? Because engineers kept creating new inefficiencies faster than we could find them. We were playing whack-a-mole with a thousand developers.

    The problem wasn't resources. The problem was visibility. Engineers had no idea what their code cost.

    FinOps Is a Culture, Not a Tool

    We tried cost management tools. Dashboards everywhere. Weekly reports. Nobody read them.

    The breakthrough came when we asked: "What if engineers saw cost the same way they see latency?"

    Latency matters because it's visible. It's in dashboards. It's in alerts. It's in performance reviews. Cost? Buried in a finance spreadsheet somewhere.

    So we made cost visible.

    Step 1: Cost Tags as Code

    First, we made tagging mandatory. Not "please tag your resources"—mandatory. SCPs that reject untagged resources:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireCostTags",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "rds:CreateDBInstance",
        "lambda:CreateFunction"
      ],
      "Resource": "*",
      "Condition": {
        "Null": {
          "aws:RequestTag/cost-center": "true",
          "aws:RequestTag/team": "true",
          "aws:RequestTag/environment": "true"
        }
      }
    }
  ]
}
```

    Can't deploy without tags. Period. Engineers grumbled for a week. Then tagging became automatic—built into our Terraform modules.
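The same rule the SCP enforces can also run as a pre-deploy check in CI, so engineers get the failure before AWS does. A minimal sketch of that logic; the function names and tag set mirror the policy above but are otherwise illustrative:

```python
# Mirror of the SCP's Null-condition: a request is denied when any
# required cost tag is missing. Names here are illustrative.
REQUIRED_TAGS = {"cost-center", "team", "environment"}

def missing_cost_tags(request_tags: dict) -> set:
    """Return the required tag keys absent from a resource request."""
    return REQUIRED_TAGS - set(request_tags)

def allow_deploy(request_tags: dict) -> bool:
    """Deny (False) when any required tag is missing, like the SCP."""
    return not missing_cost_tags(request_tags)
```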

    Step 2: Cost in the CI/CD Pipeline

    Next, we added Infracost to every pull request:

```yaml
- name: Infracost
  run: |
    infracost breakdown --path=. --format=json --out-file=/tmp/infracost.json
    infracost comment github --path=/tmp/infracost.json \
      --repo=${{ github.repository }} \
      --pull-request=${{ github.event.pull_request.number }} \
      --github-token=${{ secrets.GITHUB_TOKEN }}
```

    Every PR now shows:

```
💰 Monthly cost will increase by $847 (+12%)
├── aws_rds_cluster.main            +$650/mo (new)
├── aws_instance.app[0]             +$120/mo (m5.xlarge → m5.2xlarge)
└── aws_elasticache_cluster.redis    +$77/mo (new)
```

    Engineers see cost before merge. Not after the bill arrives.
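The per-resource deltas in that comment are easy to aggregate once parsed. A sketch with a simplified record shape (this is not the actual Infracost JSON schema; the values come from the sample comment above):

```python
# Simplified per-resource monthly cost deltas (illustrative shape,
# not the real Infracost JSON schema); values from the sample PR comment.
resources = [
    {"name": "aws_rds_cluster.main",          "delta": 650.0},
    {"name": "aws_instance.app[0]",           "delta": 120.0},
    {"name": "aws_elasticache_cluster.redis", "delta": 77.0},
]

def monthly_delta(resources: list) -> float:
    """Total monthly cost change across all resources touched by a PR."""
    return sum(r["delta"] for r in resources)
```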

    Step 3: Team Cost Dashboards

    We built Grafana dashboards showing cost by team. Updated daily. Displayed on monitors in each team area.

```
Team Phoenix - December 2025
├── Daily average: $342
├── Month-to-date: $7,524
├── Projected month-end: $10,602
├── Budget: $12,000
└── Status: 🟢 On track

Top 5 Resources:
1. RDS cluster (production): $156/day
2. ECS Fargate tasks: $89/day
3. ElastiCache: $45/day
4. NAT Gateway: $32/day
5. S3 storage: $20/day
```

    Suddenly, teams owned their costs. "Why is our RDS so expensive?" became a standup topic.
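The projection on the dashboard is nothing fancier than linear extrapolation of month-to-date spend. A sketch, assuming that method (the 22 elapsed days are inferred from $7,524 at a $342 daily average):

```python
def projected_month_end(month_to_date: float, days_elapsed: int,
                        days_in_month: int) -> float:
    """Naive linear projection: average daily spend so far,
    scaled to the full month."""
    daily_avg = month_to_date / days_elapsed
    return daily_avg * days_in_month
```

This reproduces the sample dashboard: $7,524 over 22 days of a 31-day month projects to $10,602.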

    Step 4: Blameless Cost Reviews

    Monthly cost reviews. Not blame sessions—learning sessions. Format:

  • What changed? New resources, scaling events, traffic spikes
  • What surprised us? Unexpected cost increases
  • What did we learn? Architectural decisions that affected cost
  • What will we try? Experiments for next month

We found that most cost increases weren't negligence. They were reasonable decisions made without cost information. Once engineers had visibility, they made different decisions.

    The Wins That Added Up

    Reserved Instances & Savings Plans

    With predictable baseline from team dashboards, we could confidently commit:

| Commitment                  | Monthly Savings |
|-----------------------------|-----------------|
| 1-year EC2 Savings Plan     | $18,400         |
| 3-year RDS Reserved         | $12,200         |
| 1-year ElastiCache Reserved | $3,100          |

    Total: $33,700/month (40% of our savings)
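One simple way to size a commitment from dashboard history is to commit only to the floor of observed usage, with a safety margin. A heuristic sketch (this is our framing, not AWS's recommendation engine; the coverage factor is an assumed knob):

```python
def safe_commitment(daily_usage: list, coverage: float = 0.9) -> float:
    """Commit only to capacity the team uses essentially all the time:
    the minimum observed daily usage, scaled down by a safety factor.
    Heuristic sketch, not AWS's Savings Plans recommendation algorithm."""
    return min(daily_usage) * coverage
```

On-demand pricing still covers anything above the committed floor, so under-commitment only costs the discount, while over-commitment wastes real money.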

    Right-Sizing From Data

    Teams right-sized their own resources because they could see the impact:

```
Team Artemis reduced RDS from db.r5.2xlarge to db.r5.large
Savings: $412/month
Performance impact: None (CPU was at 8%)
```

    When the team makes the decision, there's no pushback.
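The Artemis example follows a pattern that's easy to automate: surface anything whose average utilization sits far below capacity and let the owning team decide. A sketch with a hypothetical record shape (the 10% threshold is an assumed default):

```python
def rightsize_candidates(instances: list, cpu_threshold: float = 10.0) -> list:
    """Flag instances whose average CPU is below the threshold as
    right-sizing candidates. Hypothetical record shape, not an AWS API
    response; the team still makes the final call."""
    return [i["id"] for i in instances if i["avg_cpu"] < cpu_threshold]
```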

    Zombie Resource Hunting

    We gamified it. Monthly leaderboard for finding unused resources:

```
🏆 Zombie Hunter Leaderboard - December
1. Sarah K.  - $2,340 saved (found 3 unused RDS snapshots)
2. Marcus T. - $1,890 saved (deleted dev EKS cluster)
3. Priya R.  - $1,200 saved (right-sized 12 EC2 instances)
```

    Small prizes. Big engagement. Resources that had been running for years got deleted in days.
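Candidate zombies can be pre-filtered from a resource inventory before humans hunt. A sketch with a hypothetical inventory record shape (the 90-day idle window is an assumed cutoff):

```python
from datetime import date, timedelta

def zombies(resources: list, today: date, idle_days: int = 90) -> list:
    """Resources that are unattached and untouched for `idle_days`.
    Hypothetical inventory shape; real hunting still needs a human to
    confirm before anything is deleted."""
    cutoff = today - timedelta(days=idle_days)
    return [r["id"] for r in resources
            if not r["attached"] and r["last_used"] < cutoff]
```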

    Architecture Changes

    The biggest wins came from engineers rethinking architecture:

Before: Lambda functions calling RDS directly

  • NAT Gateway costs: $800/month for data transfer
  • RDS connection limits: constant issues

After: Lambda functions using RDS Proxy

  • NAT Gateway costs: $200/month
  • Connection pooling: problem solved

Savings: $600/month, plus engineering time saved on connection issues.

    This change happened because an engineer saw the NAT Gateway cost in their dashboard and asked "why is this so high?"

    The Numbers

    After 12 months:

| Metric                     | Before   | After    | Change |
|----------------------------|----------|----------|--------|
| Monthly AWS spend          | $192,000 | $102,000 | -47%   |
| Cost per request           | $0.00034 | $0.00019 | -44%   |
| Resources tagged           | 34%      | 99.2%    | +191%  |
| Teams with cost visibility | 0        | 28       | +28    |

    We hit the 30% target in month 6. Kept going.
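The Change column is plain percent change. A one-liner that reproduces the rounded spend and per-request figures:

```python
def pct_change(before: float, after: float) -> int:
    """Percent change from before to after, rounded to whole percent."""
    return round((after - before) / before * 100)
```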

    What Didn't Work

    Centralized Optimization Team

    We tried having a dedicated team find savings. They found plenty—but couldn't implement changes without team cooperation. And teams resisted changes they didn't understand.

    Distributed ownership beats centralized optimization.

    Automated Shutdowns Without Context

    We tried automatically stopping dev instances at 7pm. Broke on-call debugging. Broke timezone-distributed teams. Broke deploys.

    Now we recommend schedules, but teams configure their own.
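"Recommended but team-configurable" can be as small as a per-team config check in the scheduler. A sketch with a hypothetical config shape (the `auto_stop` flag and `off_hours` window are assumptions):

```python
def should_stop(hour: int, team_schedule: dict) -> bool:
    """Stop dev instances only if the team opted in, and only inside the
    team's own off-hours window. Hypothetical config shape; the window
    may wrap past midnight, e.g. (19, 7)."""
    if not team_schedule.get("auto_stop", False):
        return False
    start, end = team_schedule["off_hours"]
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end  # window wraps past midnight
```

Teams that debug on-call at night, or span timezones, simply set `auto_stop` to false or shift the window.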

    Shaming Teams for High Costs

    Early on, we highlighted "top spending teams" in all-hands. Backfired immediately. Teams with legitimate high-cost workloads felt attacked. Others hid resources in shared accounts.

    Blameless reviews work. Shame doesn't.

    The Culture Shift

    The real change wasn't technical. It was cultural.

    Before: "Cost is finance's problem." After: "Cost is a feature we ship."

    Engineers now ask "what will this cost?" during design reviews. They compare instance types like they compare algorithms. They celebrate cost reductions in sprint retros.

    One engineer told me: "I used to ignore cost alerts. Now I check our dashboard every morning like I check Slack."

    That's the shift. Cost became visible, so it became manageable.

    Start Here

    If your cloud bill is climbing:

  • Tag everything. Use SCPs to enforce. No exceptions.
  • Add Infracost to PRs. Make cost visible at decision time.
  • Build team dashboards. Daily cost by team, by environment.
  • Run blameless reviews. Monthly, focused on learning.
  • Gamify the hunt. Leaderboards for finding waste.

You don't need a FinOps team. You need engineers who see their costs.


    47% wasn't a target—it was a side effect of making cost visible. When engineers can see the bill, they optimize naturally.