
How We Reduced Our Client's AWS Bill by 73% (Real Numbers Inside)

A $12,400/month AWS bill dropped to $3,350. Here's the exact breakdown of what was wasteful, what we changed, and the Terraform configs we used.

PMML Engineering · Studio · 29 May 2026 · 9 min read

When a client came to us spending $12,400/month on AWS for an app serving 8,000 daily active users, we knew something was deeply wrong. Here's what we found and how we fixed it.

The Audit: Where the Money Was Going


We started with AWS Cost Explorer and tagged every resource. Here's the breakdown:

| Service | Monthly Cost | % of Total |
|---------|-------------|------------|
| EC2 (3x r5.2xlarge) | $4,320 | 35% |
| RDS (Multi-AZ r5.xlarge) | $2,880 | 23% |
| NAT Gateway | $1,860 | 15% |
| ECS Fargate | $1,240 | 10% |
| S3 + CloudFront | $680 | 5% |
| ElastiCache | $540 | 4% |
| Other (Lambda, SQS, etc.) | $880 | 7% |
| Total | $12,400 | 100% |
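As a quick sanity check, the percentage column can be recomputed from the dollar figures. This is a minimal Python sketch; the `costs` dict only restates the numbers in the table, and the variable names are illustrative.

```python
# Monthly cost per service in USD, copied from the audit table.
costs = {
    "EC2": 4320,
    "RDS": 2880,
    "NAT Gateway": 1860,
    "ECS Fargate": 1240,
    "S3 + CloudFront": 680,
    "ElastiCache": 540,
    "Other": 880,
}

total = sum(costs.values())

# Share of the bill for each service, rounded to a whole percent.
shares = {name: round(100 * cost / total) for name, cost in costs.items()}

# Print the largest line items first, since those are where to look for waste.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} ${cost:>6,}  {shares[name]:>3}%")
print(f"{'Total':<16} ${total:>6,}")
```

Sorting by cost makes it obvious which services deserve attention first.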

The three biggest problems jumped out immediately.

Problem 1: Oversized EC2 Instances ($4,320 → $480)

The client was running three r5.2xlarge instances (8 vCPU, 64GB RAM each) for a Node.js API that peaked at 12% CPU and 8GB of memory.

# Check actual utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time 2026-04-01T00:00:00Z \
  --end-time 2026-04-30T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum
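The command returns JSON with a `Datapoints` list. A small Python sketch like the one below can reduce that output to a peak figure. It assumes the CLI output was captured as a string, and the sample values are made up for illustration rather than taken from the client's account.

```python
import json


def summarize_cpu(raw_json: str) -> dict:
    """Summarize `aws cloudwatch get-metric-statistics` output for CPUUtilization."""
    datapoints = json.loads(raw_json)["Datapoints"]
    if not datapoints:
        raise ValueError("no datapoints returned; check the instance ID and time range")
    return {
        "hours_sampled": len(datapoints),
        "mean_of_averages": sum(d["Average"] for d in datapoints) / len(datapoints),
        "peak": max(d["Maximum"] for d in datapoints),
    }


# Hypothetical sample in the shape the CLI returns (values are illustrative).
sample = json.dumps({
    "Label": "CPUUtilization",
    "Datapoints": [
        {"Timestamp": "2026-04-01T00:00:00Z", "Average": 3.1, "Maximum": 7.4, "Unit": "Percent"},
        {"Timestamp": "2026-04-01T01:00:00Z", "Average": 4.9, "Maximum": 12.0, "Unit": "Percent"},
    ],
})

summary = summarize_cpu(sample)
print(summary)
```

Sizing decisions should lean on the peak (`Maximum`) rather than the average, since an hourly average hides short bursts.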

Fix: Replaced with 2x t3.medium (2 vCPU, 4GB) behind an ALB. Added auto-scaling for traffic spikes.

# New cost: 2x t3.medium (us-east-1, Linux)
# On-demand: $0.0416/hr × 2 × 730 hrs ≈ $60.74/month (~$30/month each)
# A 1-yr reservation lowers the compute cost further
# Plus ALB: ~$20/month
# Instances + ALB: $4,320 → ~$80/month

Problem 2: The NAT Gateway Tax ($1,860 → $45)


The NAT Gateway cost $1,860/month, mostly in data processing charges on traffic leaving the private subnets. The app's Fargate tasks were pulling Docker images and making API calls through the NAT.

Fix:

  • Moved S3 and DynamoDB traffic to gateway endpoints (free)
  • Used interface endpoints for ECR (ecr.api and ecr.dkr) and other AWS services; image layers are fetched through the S3 gateway endpoint
  • Reduced unnecessary outbound traffic

# VPC Gateway Endpoints (free)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-abc123

# Interface endpoints (~$7.20/month each per AZ, plus per-GB processing; far cheaper than NAT data processing)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --subnet-ids subnet-abc123

Result: NAT Gateway bill dropped from $1,860 to ~$45 in endpoint costs.
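To see why endpoints win, compare the two pricing models. The sketch below is a rough Python estimate. The prices are assumed us-east-1 list prices and should be checked against current AWS pricing, and the traffic volume and endpoint count are hypothetical, not the client's figures.

```python
HOURS_PER_MONTH = 730

# Assumed us-east-1 list prices in USD; verify before relying on them.
NAT_HOURLY = 0.045       # per NAT Gateway, per hour
NAT_PER_GB = 0.045       # per GB processed by the NAT Gateway
ENDPOINT_HOURLY = 0.01   # per interface endpoint, per AZ, per hour
ENDPOINT_PER_GB = 0.01   # per GB processed by an interface endpoint


def nat_monthly(gb_processed: float, gateways: int = 1) -> float:
    """Estimated monthly NAT Gateway cost: hourly charge plus data processing."""
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + gb_processed * NAT_PER_GB


def interface_endpoints_monthly(gb_processed: float, endpoints: int, azs: int = 1) -> float:
    """Estimated monthly cost of interface endpoints carrying the same traffic."""
    hourly = endpoints * azs * ENDPOINT_HOURLY * HOURS_PER_MONTH
    return hourly + gb_processed * ENDPOINT_PER_GB


# Hypothetical workload: 10 TB/month of AWS-bound traffic,
# served by 3 interface endpoints deployed in 2 AZs.
gb = 10_000
print(f"NAT:       ${nat_monthly(gb):,.2f}")
print(f"Endpoints: ${interface_endpoints_monthly(gb, endpoints=3, azs=2):,.2f}")
```

Traffic that moves to S3 or DynamoDB gateway endpoints drops out of both formulas, because gateway endpoints carry no hourly or per-GB charge.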

Problem 3: RDS Overkill ($2,880 → $580)

The database ran on a Multi-AZ db.r5.xlarge (4 vCPU, 32GB), yet it held 6GB of data and used 2GB of RAM at peak.

Fix:

  • Downgraded to db.t3.medium (2 vCPU, 4GB)
  • Switched from Provisioned IOPS to gp3 storage
  • Kept Multi-AZ (important for production)

-- Before migrating, check actual resource usage
SELECT
  pg_database_size(current_database()) / 1024 / 1024 AS db_size_mb,
  -- Point-in-time count of client sessions; sample it during peak traffic
  (SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend') AS client_connections,
  -- current_setting() returns a readable value such as '8GB' rather than a count of 8kB pages
  current_setting('shared_buffers') AS shared_buffers;

-- Results: 6GB database, 12 peak connections, 8GB shared_buffers (way oversized)

The Other Wins

Fargate Right-Sizing ($1,240 → $620)

{
  "cpu": "256",
  "memory": "512",
  "comment": "Was: cpu=1024, memory=2048 — 4x more than needed"
}
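Fargate only accepts certain CPU and memory pairings, so a right-sizing target has to land on a valid combination. The Python sketch below encodes the pairings for sizes up to 4 vCPU (larger sizes are omitted) and picks the smallest one that covers an observed peak. The helper names and the example peaks are illustrative assumptions.

```python
# Valid Fargate memory values (MiB) per CPU setting, for sizes up to 4 vCPU.
VALID_MEMORY_MIB = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: int, memory_mib: int) -> bool:
    """True if the (cpu, memory) pair is one Fargate accepts."""
    return memory_mib in VALID_MEMORY_MIB.get(cpu, [])


def smallest_size_for(peak_cpu_units: int, peak_memory_mib: int):
    """Return the first (cpu, memory) pair covering both peaks, trying small CPU sizes first."""
    for cpu in sorted(VALID_MEMORY_MIB):
        if cpu < peak_cpu_units:
            continue
        for memory in VALID_MEMORY_MIB[cpu]:
            if memory >= peak_memory_mib:
                return cpu, memory
    return None


# Hypothetical observed peaks: 200 CPU units and 400 MiB of memory.
print(smallest_size_for(200, 400))
```

In practice, leave some headroom above the observed peak before feeding numbers into a helper like this.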

S3 Lifecycle Policies ($680 → $340)

{
  "Rules": [{
    "ID": "archive-old-uploads",
    "Filter": { "Prefix": "" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" }
    ],
    "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
  }]
}
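To make the rule's effect concrete, the Python sketch below maps an object's age to the storage class it should end up in under the two transitions shown. It models only this rule; `storage_class_at` is a hypothetical helper, and S3 applies transitions asynchronously, so real objects may move somewhat later than the day count suggests.

```python
# Transitions from the lifecycle rule: (minimum age in days, storage class).
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER_IR")]


def storage_class_at(age_days: int) -> str:
    """Storage class an object of the given age should have under the rule."""
    current = "STANDARD"
    for min_age, storage_class in sorted(TRANSITIONS):
        if age_days >= min_age:
            current = storage_class
    return current


for age in (0, 29, 30, 89, 90, 365):
    print(f"{age:>3} days -> {storage_class_at(age)}")
```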


ElastiCache ($540 → $180)

Downgraded from cache.r5.large to cache.t3.small. The Redis node was using about 400MB of its 13GB.
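The same reasoning can be sketched in Python: pick the smallest node whose memory covers current usage at a target utilization. The memory figures are approximate values assumed from AWS's published node specs, the 50% target is an arbitrary illustration, and `smallest_node_for` is a hypothetical helper.

```python
# Approximate memory per node type in GiB (assumed; verify against AWS docs).
NODE_MEMORY_GIB = {
    "cache.t3.micro": 0.5,
    "cache.t3.small": 1.37,
    "cache.t3.medium": 3.09,
    "cache.r5.large": 13.07,
}


def smallest_node_for(used_mib: float, target_utilization: float = 0.5) -> str:
    """Smallest node type that keeps memory use at or below the target utilization."""
    needed_gib = used_mib / 1024 / target_utilization
    for node, memory_gib in sorted(NODE_MEMORY_GIB.items(), key=lambda kv: kv[1]):
        if memory_gib >= needed_gib:
            return node
    raise ValueError("usage exceeds the largest node type in the table")


# About 400MB in use, as observed on the Redis node.
print(smallest_node_for(400))
```

Burstable t3 nodes also have CPU credit limits, so memory is only one of the inputs worth checking before downsizing.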

Final Numbers

| Service | Before | After | Savings |
|---------|--------|-------|---------|
| EC2 | $4,320 | $480 | 89% |
| NAT Gateway | $1,860 | $45 | 98% |
| RDS | $2,880 | $580 | 80% |
| Fargate | $1,240 | $620 | 50% |
| S3 + CF | $680 | $340 | 50% |
| ElastiCache | $540 | $180 | 67% |
| Other | $880 | $880 | 0% |
| Total | $12,400 | $3,125 | 75% |
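The savings column and the totals can be recomputed from the before and after figures. This Python sketch only restates the table's numbers; the names are illustrative.

```python
# (before, after) monthly cost in USD per service, copied from the table.
before_after = {
    "EC2": (4320, 480),
    "NAT Gateway": (1860, 45),
    "RDS": (2880, 580),
    "Fargate": (1240, 620),
    "S3 + CF": (680, 340),
    "ElastiCache": (540, 180),
    "Other": (880, 880),
}


def savings_pct(before: float, after: float) -> int:
    """Percentage saved, rounded to a whole percent."""
    return round(100 * (before - after) / before)


total_before = sum(before for before, _ in before_after.values())
total_after = sum(after for _, after in before_after.values())

for name, (before, after) in before_after.items():
    print(f"{name:<12} ${before:>6,} -> ${after:>5,}  {savings_pct(before, after):>3}%")
print(f"{'Total':<12} ${total_before:>6,} -> ${total_after:>5,}  "
      f"{savings_pct(total_before, total_after):>3}%")
```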

The Checklist

Run this audit on your own AWS account:

  1. Check CPU and memory utilization on all EC2 instances, Fargate tasks, and Lambda functions
  2. Review NAT Gateway data processing charges
  3. Verify RDS instance size vs actual usage
  4. Add S3 lifecycle policies for old data
  5. Switch EBS volumes from gp2 to gp3 (about 20% cheaper per GB at baseline performance)
  6. Use Savings Plans or Reserved Instances for stable workloads
  7. Set up AWS Budgets alerts at 50%, 80%, and 100% thresholds
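AWS Budgets handles the alerting in item 7 natively. The Python sketch below only illustrates the threshold logic, for example to reuse in a custom report. The $3,500 budget and the function name are hypothetical.

```python
# Alert thresholds as whole percentages of the monthly budget.
THRESHOLDS_PCT = (50, 80, 100)


def crossed_thresholds(spend: float, budget: float, thresholds=THRESHOLDS_PCT) -> list:
    """Return the thresholds (in percent) that month-to-date spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [pct for pct in thresholds if spend * 100 >= pct * budget]


# Hypothetical monthly budget of $3,500.
budget = 3500
for spend in (1000, 2000, 3000, 3500):
    print(f"${spend:,} spent -> alerts at {crossed_thresholds(spend, budget)}%")
```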

Want Us to Audit Your Cloud Bill?

PMML runs free architecture reviews for startups spending over $2,000/month on cloud. Start a project conversation and mention this post.

