Right-Sizing Cloud VMs: Why Most Teams Over-Provision by 40–60%
Most production fleets are not oversized because engineers are careless. They are oversized because nobody gets rewarded for removing a safety cushion that has not caused visible pain yet. A large instance that runs quietly looks prudent. A smaller instance that saves money feels risky, even when the numbers support it.
That is why over-provisioning survives long after launch week. A workload gets a generous VM during a tense release, traffic stabilises, and nobody comes back to check whether the instance ever needed that much headroom.
1. Why over-provisioning happens
Fear is the first driver. Teams remember the outage that happened once, not the 27 quiet days a month when the system idled at a fraction of capacity. Vague requirements make it worse. If nobody can state the true peak request rate or memory footprint, the default answer is to buy more room.
The second driver is a missing feedback loop. Many companies monitor incidents but not waste. They know when CPU hits 95%, yet they do not review fleets where CPU never rises above 18% and memory sits flat at 42% for weeks.
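Closing that loop does not require a new platform. Below is a minimal sketch of a waste review, assuming boto3 credentials are configured and EC2 instances reporting to CloudWatch; the instance ID, the 20% threshold, and the 90-day window are illustrative assumptions, not prescriptions.

```python
# Sketch: flag instances whose CPU never cleared a waste threshold over the
# lookback window. Assumes boto3 credentials are configured; the instance
# list and the 20% threshold are illustrative.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

LOOKBACK_DAYS = 90
WASTE_THRESHOLD_PCT = 20.0  # fleets that "never rise above 18%" land under this


def max_cpu(instance_id: str) -> float:
    """Return the highest six-hourly maximum CPU seen in the lookback window."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=now - timedelta(days=LOOKBACK_DAYS),
        EndTime=now,
        Period=21600,  # six-hour buckets keep the call under the API's datapoint limit
        Statistics=["Maximum"],
    )
    points = resp["Datapoints"]
    return max(p["Maximum"] for p in points) if points else 0.0


for instance_id in ["i-0123456789abcdef0"]:  # hypothetical fleet list
    peak = max_cpu(instance_id)
    if peak < WASTE_THRESHOLD_PCT:
        print(f"{instance_id}: peak CPU {peak:.1f}% over {LOOKBACK_DAYS}d, review for right-sizing")
```

Running something like this monthly turns waste into a report people actually see, which is the feedback loop most teams are missing.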
2. What utilisation data actually shows
When you inspect ninety days of CloudWatch or Datadog charts, the story is usually calmer than people expect. CPU spikes are short. Memory usage is steady. Disk and network charts show predictable business-hour patterns rather than chaotic bursts.
That matters because right-sizing should be based on peaks with context, not on averages alone. I usually want p95 CPU, sustained memory usage, swap activity, and a clear release calendar. Those four signals, unpacked in the list below, tell you far more than a debate about hypothetical future load; a short sketch after the list shows how to extract the first of them from raw samples.
- CPU p95 and short-lived spike duration
- Memory saturation rather than allocated memory
- Request rate changes after releases or campaigns
- Background job schedules that create predictable bursts
- Instance restart history and paging activity
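The first bullet reduces to two numbers you can compute from any exported sample series. This is a minimal sketch in plain Python; the sample values and the 60-second collection interval are hypothetical stand-ins for your own metrics export.

```python
# Sketch: turn a raw CPU sample series into p95 utilisation and the longest
# sustained spike. Samples and the 60s interval are illustrative.
import math


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]


def longest_spike(samples: list[float], threshold: float, interval_s: int) -> int:
    """Longest run of consecutive samples above threshold, in seconds."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest * interval_s


cpu = [12.0, 14.5, 11.0, 88.0, 91.0, 13.2, 12.8, 15.0]  # illustrative samples
print(f"p95 CPU: {p95(cpu):.1f}%")                       # -> 91.0%
print(f"longest spike above 80%: {longest_spike(cpu, 80.0, 60)}s")  # -> 120s
```

The spike-duration number matters as much as the percentile: a two-minute burst argues for burstable capacity, while a two-hour plateau argues for a bigger baseline.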
3. The 70% target rule
For steady workloads, aiming for roughly 70% utilisation is a sensible operating point. It leaves room for variance without paying for idle hardware. Bursty web services deserve more slack, often closer to 60%, because sudden traffic shifts arrive faster than any manual resize cycle.
The rule is not rigid. Database servers may need extra memory headroom, and latency-sensitive inference services may keep more spare CPU. The point is to anchor the decision in a target rather than in instinct.
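Anchoring in a target also makes the arithmetic mechanical. Here is a worked sketch: convert the observed p95 into absolute demand, divide by the target, and round up to the next size on the ladder. The power-of-two vCPU ladder and the example figures are assumptions for illustration.

```python
# Sketch: apply the target-utilisation rule to pick a candidate size.
# The power-of-two size ladder and the observed numbers are illustrative.
import math


def candidate_vcpus(p95_cpu_pct: float, current_vcpus: int, target: float) -> int:
    """Smallest power-of-two vCPU count keeping p95 at or under target."""
    demand = (p95_cpu_pct / 100.0) * current_vcpus  # absolute vCPUs in use
    needed = demand / target                         # headroom-adjusted need
    return 2 ** math.ceil(math.log2(max(needed, 1.0)))


# A service showing p95 CPU of 22% on an 8-vCPU instance:
size = candidate_vcpus(p95_cpu_pct=22.0, current_vcpus=8, target=0.70)
print(size)  # -> 4: demand is ~1.76 vCPUs, 1.76 / 0.70 ≈ 2.5, next size up is 4
```

Note that even at the bursty 60% target the answer is still 4 vCPUs here, which is the point: the target changes the margin, not the order of magnitude.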
4. Building a right-sizing cycle
The healthiest teams treat right-sizing as a recurring review, not a heroic one-off. Pull utilisation data monthly, rank the worst offenders, make the smallest safe reduction, then watch the next release. The cycle becomes ordinary, which is exactly what you want.
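Ranking the worst offenders is the step teams most often skip, so here is a minimal sketch of it. The service names, sizes, and utilisation figures are hypothetical; in practice they would come from the metrics export described earlier. "Waste" here is the vCPU count beyond what the 70% target requires.

```python
# Sketch: rank a fleet's worst right-sizing offenders for a monthly review.
# Fleet entries are illustrative stand-ins for a real metrics export.
TARGET = 0.70

fleet = [
    # (name, vCPUs, p95 CPU %)
    ("checkout-api", 16, 18.0),
    ("report-worker", 8, 55.0),
    ("auth-service", 4, 64.0),
]


def wasted_vcpus(vcpus: int, p95_cpu_pct: float) -> float:
    """vCPUs provisioned beyond what the target utilisation requires."""
    demand = (p95_cpu_pct / 100.0) * vcpus
    return vcpus - demand / TARGET


for name, vcpus, cpu in sorted(fleet, key=lambda r: wasted_vcpus(r[1], r[2]), reverse=True):
    print(f"{name}: ~{wasted_vcpus(vcpus, cpu):.1f} vCPUs beyond the {TARGET:.0%} target")
```

Starting from the top of that list and making the smallest safe reduction keeps each review cheap and low-drama.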
Auto-scaling also has a place. If demand moves sharply and frequently, scale-out rules may be safer than guessing one perfect instance size. Stable services, though, should not hide behind auto-scaling as an excuse to ignore oversized baselines.
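For the sharply-moving case, a target-tracking policy is usually simpler than hand-tuned step rules. This is a sketch using the EC2 Auto Scaling API via boto3, assuming an existing Auto Scaling group; "web-asg" is a hypothetical name, and the 60% target mirrors the slack suggested for bursty services above.

```python
# Sketch: a target-tracking scale-out rule for demand too spiky to pin to
# one instance size. Assumes boto3 credentials and an existing EC2 Auto
# Scaling group; the group and policy names are hypothetical.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,  # hold average fleet CPU near the bursty-service target
    },
)
```

The same monthly review should still look at the group's minimum size, because an oversized floor under an auto-scaler wastes money just as quietly as an oversized instance.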