How We Reduced Our Client's AWS Bill by 75% (Real Numbers Inside)
A $12,400/month AWS bill dropped to $3,125. Here's the exact breakdown of what was wasteful, what we changed, and the commands and configs we used.
When a client came to us spending $12,400/month on AWS for an app serving 8,000 daily active users, we knew something was deeply wrong. Here's what we found and how we fixed it.
The Audit: Where the Money Was Going
We started with AWS Cost Explorer and tagged every resource. Here's the breakdown:
| Service | Monthly Cost | % of Total |
|---------|-------------|------------|
| EC2 (3x r5.2xlarge) | $4,320 | 35% |
| RDS (Multi-AZ r5.xlarge) | $2,880 | 23% |
| NAT Gateway | $1,860 | 15% |
| ECS Fargate | $1,240 | 10% |
| S3 + CloudFront | $680 | 5% |
| ElastiCache | $540 | 4% |
| Other (Lambda, SQS, etc.) | $880 | 7% |
| Total | $12,400 | 100% |
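A per-service breakdown like this can be pulled from Cost Explorer with `aws ce get-cost-and-usage --time-period Start=2026-04-01,End=2026-05-01 --granularity MONTHLY --metrics UnblendedCost --group-by Type=DIMENSION,Key=SERVICE`. The sketch below assumes that command's JSON output as input, totals each service across periods, and computes its share of the bill; the service names and amounts in the sample are hypothetical.

```python
import json


def cost_share_by_service(cli_output: str) -> list[tuple[str, float, float]]:
    """Sum UnblendedCost per service across all periods and return
    (service, amount_usd, percent_of_total) rows, largest first."""
    data = json.loads(cli_output)
    totals: dict[str, float] = {}
    for period in data["ResultsByTime"]:
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return [(svc, amt, 0.0) for svc, amt in totals.items()]
    rows = [(svc, amt, 100 * amt / grand_total) for svc, amt in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample in the shape Cost Explorer returns
    sample = json.dumps({
        "ResultsByTime": [{
            "Groups": [
                {"Keys": ["Amazon RDS"],
                 "Metrics": {"UnblendedCost": {"Amount": "2880.00", "Unit": "USD"}}},
                {"Keys": ["Amazon EC2"],
                 "Metrics": {"UnblendedCost": {"Amount": "4320.00", "Unit": "USD"}}},
            ]
        }]
    })
    for service, amount, pct in cost_share_by_service(sample):
        print(f"{service:<12} ${amount:>9,.2f} {pct:5.1f}%")
```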
The three biggest problems jumped out immediately.
Problem 1: Oversized EC2 Instances ($4,320 → $480)
Three r5.2xlarge instances (8 vCPU, 64GB RAM each) running a Node.js API that peaked at 12% CPU and 8GB memory usage.
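```bash
# Check actual CPU utilization (hourly average and peak for April)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time 2026-04-01T00:00:00Z \
  --end-time 2026-05-01T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum
# Memory is not in the AWS/EC2 namespace; it requires the CloudWatch agent,
# which publishes mem_used_percent under the CWAgent namespace.
```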
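To reduce that command's output to the two numbers that matter for rightsizing, a small parser helps. This is a sketch that assumes the CLI's default JSON output (a `Datapoints` list whose entries carry the requested `Average` and `Maximum` keys); `fleet_vcpus_at_peak` is a hypothetical helper that pessimistically assumes every instance peaks at the same moment.

```python
import json


def summarize_cpu(cli_output: str) -> dict:
    """Reduce get-metric-statistics JSON to peak and mean CPU percentages."""
    points = json.loads(cli_output)["Datapoints"]
    if not points:
        raise ValueError("no datapoints: check the instance ID and time range")
    peak = max(p["Maximum"] for p in points)
    # Every datapoint covers the same --period, so a plain mean is fair
    mean = sum(p["Average"] for p in points) / len(points)
    return {"peak_pct": peak, "mean_pct": mean, "samples": len(points)}


def fleet_vcpus_at_peak(peak_pct: float, vcpus_each: int, instances: int) -> float:
    """Worst case: vCPUs in use if all instances hit peak simultaneously."""
    return peak_pct / 100 * vcpus_each * instances


if __name__ == "__main__":
    sample = json.dumps({
        "Label": "CPUUtilization",
        "Datapoints": [
            {"Timestamp": "2026-04-01T00:00:00Z", "Average": 5.0,
             "Maximum": 12.0, "Unit": "Percent"},
            {"Timestamp": "2026-04-01T01:00:00Z", "Average": 7.0,
             "Maximum": 9.5, "Unit": "Percent"},
        ],
    })
    print(summarize_cpu(sample))
    # Three 8-vCPU instances at a 12% peak: under 3 vCPUs of real demand
    print(fleet_vcpus_at_peak(12, 8, 3))
```

Keep in mind that t3 instances are burstable, so sustained load above their baseline consumes CPU credits.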
Fix: Replaced with 2x t3.medium (2 vCPU, 4GB) behind an ALB. Added auto-scaling for traffic spikes.
```bash
# New baseline: 2x t3.medium (us-east-1 on-demand, $0.0416/hr)
# $0.0416/hr × 2 × 730 hrs ≈ $60.74/month (~$30/month each)
# A 1yr reservation lowers this further
# Plus ALB: ~$20/month
# Baseline (2 instances + ALB, before auto-scaling): ~$80/month
```
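The monthly figures here follow AWS's convention of 730 hours per month. A minimal sketch of that arithmetic, with the hourly rate passed in so you can plug in current pricing for your region and purchase option (t3.medium on-demand in us-east-1 is $0.0416/hr); the $20 ALB figure is the estimate from the post, and a real ALB bill also depends on LCU usage.

```python
HOURS_PER_MONTH = 730  # AWS pricing convention: 24 * 365 / 12 = 730


def monthly_cost(hourly_rate: float, count: int = 1) -> float:
    """Monthly cost of `count` always-on resources billed at `hourly_rate` USD/hr."""
    return round(hourly_rate * count * HOURS_PER_MONTH, 2)


def savings_pct(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`, one decimal place."""
    return round(100 * (before - after) / before, 1)


if __name__ == "__main__":
    compute = monthly_cost(0.0416, count=2)  # 2x t3.medium, us-east-1 on-demand
    alb = 20.0  # the post's ~$20/month ALB estimate
    baseline = compute + alb
    print(f"compute ${compute:.2f}, baseline ${baseline:.2f}")
    print(f"vs. $4,320: {savings_pct(4320, baseline)}% lower")
```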
Problem 2: The NAT Gateway Tax ($1,860 → $45)
$1,860/month on NAT Gateway — that's mostly data processing charges for traffic between private subnets and the internet. The app's Fargate tasks were pulling Docker images and making API calls through NAT.
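NAT Gateway billing has two parts: an hourly charge per gateway and a per-GB charge on everything it processes. A rough sketch, assuming us-east-1 list prices of $0.045 per gateway-hour and $0.045 per GB, plus a hypothetical three gateways (one per AZ), shows why the per-GB part dominates a bill this size:

```python
HOURS_PER_MONTH = 730
NAT_HOURLY_USD = 0.045  # per gateway-hour, us-east-1 list price (assumption)
NAT_PER_GB_USD = 0.045  # per GB processed, us-east-1 list price (assumption)


def nat_monthly_cost(gateways: int, gb_processed: float) -> float:
    """Hourly charges for every gateway plus per-GB processing charges."""
    hourly = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    return hourly + gb_processed * NAT_PER_GB_USD


def implied_gb_processed(monthly_bill: float, gateways: int) -> float:
    """Invert the model: GB that would account for the rest of the bill."""
    fixed = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    return max(monthly_bill - fixed, 0.0) / NAT_PER_GB_USD


if __name__ == "__main__":
    gateways = 3  # hypothetical: one NAT Gateway per AZ
    fixed = nat_monthly_cost(gateways, 0)
    print(f"hourly charges alone: ${fixed:.2f}/month")
    print(f"implied traffic for $1,860: {implied_gb_processed(1860, gateways):,.0f} GB")
```

With those hypothetical three gateways, the hourly part is about $99/month, so roughly 95% of a $1,860 bill would be per-GB processing, which is exactly the part VPC endpoints take off the NAT path.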
Fix:
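- Added gateway endpoints for S3 and DynamoDB, which are free (ECR serves image layers from S3, so the S3 endpoint carries the bulk of image-pull traffic)
- Added interface endpoints for ECR (ecr.api and ecr.dkr) and the other AWS services the tasks call
- Reduced unnecessary outbound traffic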
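```bash
# Gateway endpoints for S3 and DynamoDB (free)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-abc123
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.dynamodb \
  --route-table-ids rtb-abc123

# Interface endpoints (~$7.30/month per AZ each, plus per-GB processing,
# far cheaper than sending the same traffic through NAT).
# ECR needs both ecr.dkr and ecr.api; repeat this for each service.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --subnet-ids subnet-abc123 \
  --security-group-ids sg-abc123 \
  --private-dns-enabled
```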
Result: NAT Gateway bill dropped from $1,860 to ~$45 in endpoint costs.
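Interface endpoints trade a fixed hourly fee for a much lower per-GB rate, so each one pays off once enough traffic moves off the NAT Gateway. A sketch of the break-even point, assuming us-east-1 list prices ($0.01 per endpoint per AZ-hour and $0.01 per GB for interface endpoints, $0.045 per GB for NAT processing); the endpoint and AZ counts in the example are hypothetical:

```python
HOURS_PER_MONTH = 730
ENDPOINT_AZ_HOURLY_USD = 0.01  # per interface endpoint per AZ-hour (assumption)
ENDPOINT_PER_GB_USD = 0.01     # per GB processed by the endpoint (assumption)
NAT_PER_GB_USD = 0.045         # per GB processed by a NAT Gateway (assumption)


def endpoint_monthly_cost(endpoints: int, azs: int, gb: float = 0.0) -> float:
    """Fixed per-AZ hourly charges plus per-GB processing."""
    fixed = endpoints * azs * ENDPOINT_AZ_HOURLY_USD * HOURS_PER_MONTH
    return fixed + gb * ENDPOINT_PER_GB_USD


def break_even_gb(azs: int = 1) -> float:
    """GB/month one endpoint must carry before it beats routing that traffic via NAT."""
    fixed = azs * ENDPOINT_AZ_HOURLY_USD * HOURS_PER_MONTH
    return fixed / (NAT_PER_GB_USD - ENDPOINT_PER_GB_USD)


if __name__ == "__main__":
    print(f"3 endpoints x 2 AZs: ${endpoint_monthly_cost(3, 2):.2f}/month fixed")
    print(f"break-even per endpoint in 2 AZs: {break_even_gb(2):.0f} GB/month")
```

This compares per-GB rates only; if the NAT Gateway can be removed entirely, its hourly charge disappears as well.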
Problem 3: RDS Overkill ($2,880 → $580)
Multi-AZ r5.xlarge (4 vCPU, 32GB) for a database using 6GB of storage and 2GB of RAM during peak.
Fix:
- Downgraded to db.t3.medium (2 vCPU, 4GB)
- Switched from Provisioned IOPS to gp3 storage
- Kept Multi-AZ (important for production)
```sql
-- Before migrating, check actual resource usage
SELECT
  pg_size_pretty(pg_database_size(current_database())) AS db_size,
  (SELECT count(*) FROM pg_stat_activity
    WHERE backend_type = 'client backend') AS client_connections,
  current_setting('shared_buffers') AS shared_buffers;
-- client_connections is a point-in-time count; sample it during peak traffic.
-- Results: 6GB database, 12 peak connections, 8GB shared_buffers (more than the entire database)
```
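Two details help interpret the shared_buffers result. The `setting` column of `pg_settings` reports shared_buffers as a count of 8kB pages rather than bytes, and RDS's default PostgreSQL parameter group sets it to `{DBInstanceClassMemory/32768}` pages, about 25% of instance memory. A sketch of both conversions, assuming the instance used the default parameter group (the post doesn't say) and treating `DBInstanceClassMemory` as the nominal instance memory, which overstates it slightly because RDS reserves some memory for the OS and its own processes:

```python
PAGE_BYTES = 8192  # PostgreSQL's default block size; shared_buffers is counted in these
GIB = 1024 ** 3


def shared_buffers_bytes(setting_pages: int) -> int:
    """Convert pg_settings.setting for shared_buffers (8kB pages) to bytes."""
    return setting_pages * PAGE_BYTES


def rds_default_shared_buffers(instance_memory_bytes: int) -> int:
    """RDS default formula {DBInstanceClassMemory/32768} pages, i.e. ~25% of memory."""
    pages = instance_memory_bytes // 32768
    return pages * PAGE_BYTES


if __name__ == "__main__":
    print(shared_buffers_bytes(1_048_576) / GIB)       # raw setting that means 8GB
    print(rds_default_shared_buffers(32 * GIB) / GIB)  # db.r5.xlarge, 32GB
    print(rds_default_shared_buffers(4 * GIB) / GIB)   # db.t3.medium, 4GB
```

Under that assumption, the 8GB reading is simply what a 32GB instance gets by default, and it drops to roughly 1GB on db.t3.medium without manual tuning.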
The Other Wins
Fargate Right-Sizing ($1,240 → $620)
The previous task size (cpu=1024, memory=2048) was 4x more than the workload needed. The task definition now requests:

```json
{
  "cpu": "256",
  "memory": "512"
}
```
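Fargate bills per vCPU-hour and per GB-hour of the task size requested, whether or not the task uses it, and only certain CPU/memory pairings are valid. A sketch, assuming us-east-1 Linux/x86 list prices ($0.04048 per vCPU-hour, $0.004445 per GB-hour) and encoding only the valid memory sizes for the three smallest CPU settings:

```python
HOURS_PER_MONTH = 730
VCPU_HOUR_USD = 0.04048  # us-east-1 Linux/x86 list price (assumption)
GB_HOUR_USD = 0.004445   # us-east-1 Linux/x86 list price (assumption)

# Valid memory sizes (MiB) for the smallest Fargate CPU sizes (CPU units)
VALID_MEMORY_MIB = {
    256: {512, 1024, 2048},
    512: set(range(1024, 4097, 1024)),
    1024: set(range(2048, 8193, 1024)),
}


def task_monthly_cost(cpu_units: int, memory_mib: int) -> float:
    """Monthly cost of one always-on task at the requested size."""
    if memory_mib not in VALID_MEMORY_MIB.get(cpu_units, set()):
        raise ValueError(f"invalid Fargate size: cpu={cpu_units}, memory={memory_mib}")
    vcpu = cpu_units / 1024
    gb = memory_mib / 1024
    return round((vcpu * VCPU_HOUR_USD + gb * GB_HOUR_USD) * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    before = task_monthly_cost(1024, 2048)
    after = task_monthly_cost(256, 512)
    print(f"per always-on task: ${before:.2f} -> ${after:.2f}/month")
```

Larger CPU sizes follow the same pattern with wider memory ranges; check the ECS task definition documentation before adding them to the table.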
S3 Lifecycle Policies ($680 → $340)
```json
{
  "Rules": [{
    "ID": "archive-old-uploads",
    "Filter": { "Prefix": "" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" }
    ],
    "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
  }]
}
```
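A rough way to estimate what a lifecycle rule like this saves is to bucket data by age and price each bucket at its storage class. The sketch assumes us-east-1 first-tier list prices ($0.023, $0.0125, and $0.004 per GB-month for STANDARD, STANDARD_IA, and GLACIER_IR) and ignores retrieval fees, transition request charges, minimum storage durations, and the 128KB minimum billable object size in the IA and Glacier Instant Retrieval classes; the data sizes in the example are hypothetical.

```python
PRICE_PER_GB_MONTH = {  # us-east-1 first-tier list prices (assumption)
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
}


def storage_class_for_age(age_days: int) -> str:
    """Mirror the lifecycle rule: IA after 30 days, Glacier IR after 90."""
    if age_days >= 90:
        return "GLACIER_IR"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"


def monthly_storage_cost(objects: list[tuple[float, int]], lifecycle: bool = True) -> float:
    """objects: (size_gb, age_days) pairs. Returns storage cost only, in USD."""
    total = 0.0
    for size_gb, age_days in objects:
        cls = storage_class_for_age(age_days) if lifecycle else "STANDARD"
        total += size_gb * PRICE_PER_GB_MONTH[cls]
    return round(total, 2)


if __name__ == "__main__":
    data = [(100, 10), (400, 45), (1500, 200)]  # hypothetical: GB by age in days
    print(monthly_storage_cost(data, lifecycle=False), monthly_storage_cost(data))
```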
ElastiCache ($540 → $180)
Downgraded from r5.large to t3.small. The Redis instance was using 400MB of a 13GB instance.
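The same check generalizes: pick the smallest node whose usable memory covers observed usage with headroom. A sketch assuming the per-node memory AWS lists for these node types, Redis's default `reserved-memory-percent` of 25, and a hypothetical 2x headroom factor:

```python
from typing import Optional

NODE_MEMORY_GIB = {  # memory per node as listed by AWS for these types
    "cache.t3.micro": 0.5,
    "cache.t3.small": 1.37,
    "cache.t3.medium": 3.09,
    "cache.r5.large": 13.07,
}


def smallest_fitting_node(used_gib: float, reserved_pct: float = 25.0,
                          headroom: float = 2.0) -> Optional[str]:
    """Smallest node whose memory, minus the reserved share, holds used_gib * headroom."""
    needed = used_gib * headroom
    for name, mem_gib in sorted(NODE_MEMORY_GIB.items(), key=lambda kv: kv[1]):
        if mem_gib * (1 - reserved_pct / 100) >= needed:
            return name
    return None  # nothing in the table is big enough


if __name__ == "__main__":
    print(smallest_fitting_node(400 / 1024))  # ~400MB observed
```

For ~400MB it lands on cache.t3.small, the node chosen above; a larger headroom factor would push it up a size.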
Final Numbers
| Service | Before | After | Savings |
|---------|--------|-------|---------|
| EC2 | $4,320 | $480 | 89% |
| NAT Gateway | $1,860 | $45 | 98% |
| RDS | $2,880 | $580 | 80% |
| Fargate | $1,240 | $620 | 50% |
| S3 + CF | $680 | $340 | 50% |
| ElastiCache | $540 | $180 | 67% |
| Other | $880 | $880 | 0% |
| Total | $12,400 | $3,125 | 75% |
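The totals and percentages can be rechecked from the per-service figures. A short sketch that uses only the numbers in the table:

```python
COSTS = {  # service: (before, after), monthly USD from the table
    "EC2": (4320, 480),
    "NAT Gateway": (1860, 45),
    "RDS": (2880, 580),
    "Fargate": (1240, 620),
    "S3 + CF": (680, 340),
    "ElastiCache": (540, 180),
    "Other": (880, 880),
}


def pct_saved(before: float, after: float) -> float:
    """Percentage reduction from before to after."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    for service, (before, after) in COSTS.items():
        print(f"{service:<12} {pct_saved(before, after):5.1f}%")
    total_before = sum(b for b, _ in COSTS.values())
    total_after = sum(a for _, a in COSTS.values())
    print(total_before, total_after, f"{pct_saved(total_before, total_after):.1f}%")
```

The itemized total comes to $3,125, a 74.8% reduction.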
The Checklist
Run this audit on your own AWS account:
- Check CPU/memory utilization on all EC2/Fargate/Lambda
- Review NAT Gateway data processing charges
- Verify RDS instance size vs actual usage
- Add S3 lifecycle policies for old data
- Switch to gp3 storage (cheaper per GB than gp2; only provisioning IOPS or throughput beyond the included baseline adds cost)
- Use Savings Plans or Reserved Instances for stable workloads
- Set up AWS Budgets alerts at 50%, 80%, and 100% thresholds
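For the budget alerts, `aws budgets create-budget` takes a budget definition (`--budget`) and a list of notifications (`--notifications-with-subscribers`) as JSON. A sketch that generates both for 50/80/100% alerts on actual spend; the budget name, dollar limit, and email address are placeholders.

```python
import json


def budget_definition(name: str, monthly_limit_usd: float) -> dict:
    """A monthly cost budget; the API expects the amount as a string."""
    return {
        "BudgetName": name,
        "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }


def threshold_notifications(email: str, thresholds=(50, 80, 100)) -> list[dict]:
    """One ACTUAL-spend alert per percentage threshold, emailed to `email`."""
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(t),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for t in thresholds
    ]


if __name__ == "__main__":
    # Placeholders: save these outputs as budget.json and notifications.json
    print(json.dumps(budget_definition("monthly-aws-spend", 3500), indent=2))
    print(json.dumps(threshold_notifications("ops@example.com"), indent=2))
```

Saved to files, they would be passed as `aws budgets create-budget --account-id <account-id> --budget file://budget.json --notifications-with-subscribers file://notifications.json`.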
Want Us to Audit Your Cloud Bill?
PMML runs free architecture reviews for startups spending over $2,000/month on cloud. Start a project conversation and mention this post.