Observability Spend Forecasting for Engineering Leaders

Build a 12-month observability cost model accounting for infrastructure growth, cardinality explosion, and pricing tier transitions.

Tags: forecasting, budgeting, engineering-leadership, finops

Quick take

Observability spend rarely scales linearly with hosts. Model cardinality, log verbosity, and SKU creep separately or miss budget by 2×.

Engineering leaders who can't model observability cost growth with precision lose budget battles to teams that can.

Why Costs Don't Scale Linearly

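  • Cardinality multiplication. Adding a Kubernetes label to a metric multiplies its potential series count by the number of distinct values that label takes.
  • Log verbosity correlation. More services produce more inter-service logs, and volume grows in proportion to the number of connections between services rather than the number of services.
  • Tier transitions. Vendor pricing has cliffs where the effective unit cost rises before the next volume discount applies.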
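A minimal sketch of the first two effects, using hypothetical label counts and service counts. The product is a worst-case bound; the real series count depends on which label combinations actually occur.

```python
from math import prod

def series_count(label_cardinalities):
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())

def max_service_pairs(n_services):
    """Upper bound on distinct service-to-service connections (undirected)."""
    return n_services * (n_services - 1) // 2

# Hypothetical label sets for a single metric.
before = {"service": 20, "status_code": 5}
after = {**before, "pod": 50}  # one new Kubernetes label with 50 distinct values

print(series_count(before))   # 100
print(series_count(after))    # 5000
print(max_service_pairs(10))  # 45
print(max_service_pairs(20))  # 190
```

Doubling the number of services from 10 to 20 roughly quadruples the possible connections, which is why log volume tends to outpace service count.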

The Forecasting Model

Step 1: Establish Baselines

Gather 6 months of historical data per signal type and calculate month-over-month (MoM) growth for each. If hosts grow 4%/month but logs grow 10%/month, the roughly 6-point gap is behavioral growth, and that part is controllable.
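One way to sketch the baseline calculation, assuming six monthly data points per signal. The sample figures are hypothetical and chosen to match the 4% and 10% rates in the example.

```python
def avg_mom_growth(values):
    """Compound average month-over-month growth across a monthly series."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1

# Hypothetical six-month history.
hosts = [100.0, 104.0, 108.2, 112.5, 117.0, 121.7]
logs_gb_per_day = [40.0, 44.0, 48.4, 53.2, 58.6, 64.4]

host_growth = avg_mom_growth(hosts)
log_growth = avg_mom_growth(logs_gb_per_day)

# Gap in percentage points: the share of log growth not explained by hosts.
behavioral_gap = log_growth - host_growth

# Growth in logs per host, which compounds slightly differently from the gap.
per_host_growth = (1 + log_growth) / (1 + host_growth) - 1

print(f"hosts {host_growth:.1%}, logs {log_growth:.1%}")
print(f"gap {behavioral_gap:.1%}, logs per host {per_host_growth:.1%}")
```

The simple gap is easier to explain to finance; the per-host figure is the one to use if you compound it over 12 months.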

Step 2: Model Scenarios

Scenario     | Approach          | Projected Annual | Savings
Conservative | Do nothing        | ~$520K           | -
Optimized    | Pipeline controls | ~$380K           | 27%
Aggressive   | Full optimization | ~$280K           | 46%
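The savings column is each scenario measured against the do-nothing baseline. A short sketch of that arithmetic, using the projected figures above:

```python
def savings_pct(baseline, scenario):
    """Savings of a scenario relative to the do-nothing baseline, in percent."""
    return (baseline - scenario) / baseline * 100

scenarios = {"Conservative": 520_000, "Optimized": 380_000, "Aggressive": 280_000}
baseline = scenarios["Conservative"]

for name, annual in scenarios.items():
    print(f"{name:<12} ${annual:>9,} {savings_pct(baseline, annual):5.1f}%")
```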

Step 3: Pricing Events

  • Commitment renewal dates and rates
  • Tier boundary crossings
  • New feature adoption plans
  • Annual 5-10% vendor price increases
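A rough sketch of how a tier crossing and a renewal increase might enter the model. The commitment size and both rates are hypothetical and do not reflect any vendor's actual pricing.

```python
def monthly_cost(gb_per_day, committed_gb_per_day, committed_rate, overage_rate, days=30):
    """Committed volume is paid in full; anything above it bills at the overage rate."""
    overage = max(0.0, gb_per_day - committed_gb_per_day)
    return days * (committed_gb_per_day * committed_rate + overage * overage_rate)

def rate_after_renewals(rate, annual_increase, renewals):
    """Unit rate after a number of annual vendor price increases."""
    return rate * (1 + annual_increase) ** renewals

# Hypothetical contract: 50 GB/day committed at $2.00/GB, overage at $3.00/GB.
for volume in (40, 50, 60):
    cost = monthly_cost(volume, 50, 2.0, 3.0)
    effective = cost / (volume * 30)
    print(f"{volume} GB/day: ${cost:,.0f}/month, effective ${effective:.2f}/GB")

# A 5% increase at the next renewal.
print(rate_after_renewals(2.0, 0.05, 1))
```

Below the commitment the effective rate is high because unused volume is still paid for; above it the overage rate pulls the effective rate up again. Both edges are worth marking on the forecast timeline.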

Step 4: Executive Summary

  • For finance: quarterly projection vs. budget.
  • For engineering: per-team cost attribution.
  • For the CTO: observability as a percentage of cloud spend (5-10% is healthy, 20%+ is alarming).

Pitfalls

  • Short averaging window. Use a 6-month rolling average rather than extrapolating from a one-month spike.
  • Ignoring step functions. New teams add cost in chunks, not gradual ramps.
  • Assuming all growth is inevitable. Most log growth is behavioral and controllable.
  • Forgetting retention. With 30-day retention, steady-state storage holds about 30x daily ingest.
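Two of these pitfalls reduce to simple arithmetic. The sketch below uses hypothetical monthly figures, with one spike month, to show how a rolling window dampens it and how retention multiplies stored volume.

```python
def rolling_mean(values, window=6):
    """Trailing rolling average; returns one value per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def steady_state_storage_gb(daily_ingest_gb, retention_days):
    """Stored volume once the retention window is full."""
    return daily_ingest_gb * retention_days

# Hypothetical monthly ingest in GB/day, with a spike in month 6.
monthly_ingest = [10, 10, 10, 10, 10, 40, 10]

print(rolling_mean(monthly_ingest))     # [15.0, 15.0]
print(steady_state_storage_gb(40, 30))  # 1200
```

Forecasting from the spike month alone would start from 40 GB/day; the 6-month window starts from 15.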

Worked example: 12-month forecast model

Start with three growth curves, not one:

Driver        | Current | Growth assumption       | Month 12 impact
Hosts         | 80      | +15%/yr (K8s expansion) | 92 hosts
Logs GB/day   | 40      | +8%/qtr (feature flags) | ~54 GB/day
Custom series | 30K     | +20%/yr (new services)  | 36K series

Non-linear triggers to model:
  • Crossing Datadog custom metric tiers
  • Splunk daily ingest cap → overage rate
  • New Relic ingest pool exhaustion → on-demand rate
Add 15% contingency for instrumentation projects (every major launch adds 5–15% telemetry).
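A minimal sketch of the three curves as compound growth, assuming each rate applies once per stated period. Compounding 8% per quarter over four quarters takes 40 GB/day to about 54 GB/day. The annual cost passed to the contingency step is a placeholder, not a derived figure.

```python
def project(current, rate, months_per_period, months=12):
    """Compound growth where `rate` applies once every `months_per_period` months."""
    return current * (1 + rate) ** (months / months_per_period)

def with_contingency(cost, contingency=0.15):
    """Add a contingency buffer for instrumentation projects."""
    return cost * (1 + contingency)

hosts_m12 = project(80, 0.15, months_per_period=12)
logs_m12 = project(40, 0.08, months_per_period=3)
series_m12 = project(30_000, 0.20, months_per_period=12)

print(f"hosts:  {hosts_m12:.0f}")
print(f"logs:   {logs_m12:.1f} GB/day")
print(f"series: {series_m12:,.0f}")

# Month-by-month log volume, useful for spotting when a tier or cap is crossed.
log_path = [project(40, 0.08, 3, months=m) for m in range(13)]

# Placeholder annual forecast of $520K before contingency.
print(f"with contingency: ${with_contingency(520_000):,.0f}")
```

Keeping the monthly path rather than only the month-12 value makes it possible to check each month against the non-linear triggers listed above.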

What to do this week

  • [ ] Export 12 months of billing history into a spreadsheet
  • [ ] Plot $/host and $/GB/day trends separately
  • [ ] Document planned launches that add instrumentation
  • [ ] Present finance with P50/P90 scenarios, not a single number
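One way to produce P50/P90 figures is a small Monte Carlo simulation over the growth assumption. This sketch assumes annual cost growth follows a triangular distribution; the current spend and the low, likely, and high growth rates are hypothetical inputs you would replace with your own.

```python
import random
import statistics

def simulate_annual_cost(current_annual_cost, growth_low, growth_mode, growth_high,
                         n=10_000, seed=42):
    """Sample next-year cost under a triangular distribution of growth rates."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        current_annual_cost * (1 + rng.triangular(growth_low, growth_high, growth_mode))
        for _ in range(n)
    ]

# Hypothetical inputs: $400K current spend, growth between 10% and 60%, most likely 25%.
costs = simulate_annual_cost(400_000, 0.10, 0.25, 0.60)

# statistics.quantiles with n=10 returns the 10th..90th percentile cut points.
deciles = statistics.quantiles(costs, n=10)
p50, p90 = deciles[4], deciles[8]

print(f"P50: ${p50:,.0f}")
print(f"P90: ${p90:,.0f}")
```

Presenting both numbers gives finance a planning figure (P50) and a ceiling to reserve against (P90).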
