Managing Metrics Cardinality to Control Observability Spend

Why high-cardinality metrics are the silent budget killer. Label pruning, aggregation rules, and cardinality limits.

Tags: metrics, cardinality, prometheus, cost-control

Quick take

One high-cardinality label (user_id, request_id) can turn a $500/mo metrics bill into $50K/mo. Audit label sets monthly.

A single high-cardinality label can multiply your time series count — and your bill — by orders of magnitude.

Understanding Cardinality

Every unique combination of metric name and label values is one time series. In the worst case, http_requests_total{service, endpoint, method, status_code} across 20 services x 50 endpoints x 4 methods x 5 statuses = 20,000 series. Add an instance label with 100 values: 2,000,000. Add user_id with 100K values: 200 billion.
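The multiplication above can be checked with a few lines of Python. This is a minimal sketch: the function name series_upper_bound is hypothetical, and the result is an upper bound that assumes every label value occurs with every other.

```python
from math import prod

def series_upper_bound(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series count for one metric: the product of distinct values per label."""
    return prod(label_cardinalities.values())

# Label cardinalities taken from the example in the text.
labels = {"service": 20, "endpoint": 50, "method": 4, "status_code": 5}

base = series_upper_bound(labels)
with_instance = series_upper_bound({**labels, "instance": 100})
with_user_id = series_upper_bound({**labels, "instance": 100, "user_id": 100_000})

print(base, with_instance, with_user_id)
```

Real series counts are usually lower than this bound, because not every endpoint serves every method or status code.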

Detection

Warning signs: a single metric with more than 100K series, a label with more than 1,000 unique values, cardinality growing more than 10% per week, or "too many series" errors from the backend.
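The numeric warning signs can be turned into a simple check. The sketch below assumes you can already export per-metric series counts from your backend; the MetricStats fields and the warning_signs function are hypothetical names, and the thresholds are the ones listed above.

```python
from dataclasses import dataclass

@dataclass
class MetricStats:
    name: str
    series_count: int            # current number of series for this metric
    max_label_values: int        # distinct values of its widest label
    series_count_last_week: int  # series count seven days ago

def warning_signs(m: MetricStats,
                  max_series: int = 100_000,
                  max_label_values: int = 1_000,
                  max_weekly_growth: float = 0.10) -> list[str]:
    """Return the names of the thresholds this metric exceeds."""
    signs = []
    if m.series_count > max_series:
        signs.append("series_count")
    if m.max_label_values > max_label_values:
        signs.append("label_values")
    if m.series_count_last_week > 0:
        growth = (m.series_count - m.series_count_last_week) / m.series_count_last_week
        if growth > max_weekly_growth:
            signs.append("weekly_growth")
    return signs

noisy = MetricStats("http_requests_total", 150_000, 50_000, 120_000)
quiet = MetricStats("disk_free_bytes", 400, 20, 390)
print(warning_signs(noisy), warning_signs(quiet))
```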

Optimization Techniques

Label Pruning

Dropping the instance label from aggregated service metrics gives a 100x reduction per metric when there are 100 instances.

Recording Rules

Pre-aggregate high-cardinality metrics into lower-cardinality summaries for dashboards and alerts. Drop the raw metric if per-instance granularity isn't needed.
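To illustrate what pre-aggregation does to the series count, here is a small Python sketch that sums samples across a dropped label, similar in spirit to a PromQL sum without (instance) aggregation. The function aggregate_without and the sample data are hypothetical, and summing is only appropriate for additive values such as counters or rates.

```python
from collections import defaultdict

def aggregate_without(samples, drop):
    """Sum sample values over the labels in `drop`, keeping all other labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, in a stable order, identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        totals[key] += value
    return dict(totals)

# Hypothetical raw samples: one series per (service, instance) pair.
raw = [
    ({"service": "api", "instance": "i-1"}, 10.0),
    ({"service": "api", "instance": "i-2"}, 5.0),
    ({"service": "web", "instance": "i-3"}, 7.0),
]

aggregated = aggregate_without(raw, drop={"instance"})
print(len(raw), "->", len(aggregated))
```

In a real deployment the aggregation would run in the metrics backend or collector, not in application code.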

Collection Interval Optimization

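Scraping slow-changing metrics (disk, memory) every 60 seconds instead of every 10 seconds ingests 6x fewer samples. This cuts data volume, not the number of series, so it helps most where billing is per sample or per data point. Confirm the interval your collector actually uses first: Prometheus's own default scrape_interval is 1 minute.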
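The effect of the scrape interval on ingested volume is simple arithmetic. The sketch below uses a hypothetical helper, samples_per_day, and an assumed 1,000 series.

```python
SECONDS_PER_DAY = 86_400

def samples_per_day(series_count: int, scrape_interval_s: int) -> int:
    """Samples ingested per day: one sample per series per scrape."""
    return series_count * (SECONDS_PER_DAY // scrape_interval_s)

fast = samples_per_day(1_000, 10)   # 10-second scrape
slow = samples_per_day(1_000, 60)   # 60-second scrape
print(fast, slow, fast // slow)
```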

Metric Allow/Deny Lists

Only collect metrics referenced in dashboards or alerts.
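One way to build such a list is to compare the metrics you collect with the query strings behind your dashboards and alerts. This is a rough sketch: unused_metrics is a hypothetical helper, the metric names and queries are made up, and whole-word matching is only an approximation of real query parsing.

```python
import re

def unused_metrics(collected: set[str], queries: list[str]) -> set[str]:
    """Return collected metric names that appear in no dashboard or alert query."""
    unused = set()
    for name in collected:
        # Whole-word match so "http_requests" does not match "http_requests_total".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if not any(pattern.search(q) for q in queries):
            unused.add(name)
    return unused

collected = {
    "http_requests_total",
    "go_gc_duration_seconds",
    "node_disk_io_time_seconds_total",
}
queries = [
    "sum by (service) (rate(http_requests_total[5m]))",
    "rate(node_disk_io_time_seconds_total[5m]) > 0.9",
]

candidates = unused_metrics(collected, queries)
print(candidates)
```

Treat the result as candidates for a deny list, and review them before dropping anything.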

Histogram Bucket Pruning

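Each histogram bucket is its own time series per label combo, on top of the +Inf bucket, _sum, and _count. Client defaults commonly define around 11 buckets (the Prometheus Go client's DefBuckets has 11). Reduce to the 5-6 buckets you actually alert or report on.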
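The saving from bucket pruning can be estimated as follows. This sketch assumes a Prometheus-style classic histogram, which exposes one _bucket series per finite bucket plus a +Inf bucket, a _sum, and a _count series. The helper name and the 1,000 label combinations are hypothetical.

```python
def histogram_series(finite_buckets: int, label_combos: int) -> int:
    """Series for one histogram: finite buckets + the +Inf bucket + _sum + _count, per label combo."""
    return (finite_buckets + 3) * label_combos

before = histogram_series(finite_buckets=11, label_combos=1_000)
after = histogram_series(finite_buckets=5, label_combos=1_000)
print(before, after)
```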

Prevention

  1. Before shipping: Does this metric add labels with >100 unique values?
  2. Monthly: Review new metrics for unexpected growth
  3. Quarterly: Audit top-50 by cardinality, prune unused labels
  4. Automation: Cardinality limits/alerts at collector level
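A collector-level cardinality limit can be as simple as refusing new label sets once a metric reaches its budget. The class below is a minimal in-memory sketch with hypothetical names, not the API of any particular collector. A production version would also need expiry of stale series and an alert when the limit is hit.

```python
class CardinalityLimiter:
    """Admit samples only while a metric stays under its series budget."""

    def __init__(self, max_series_per_metric: int):
        self.max_series = max_series_per_metric
        self.seen: dict[str, set[tuple]] = {}

    def admit(self, metric: str, labels: dict[str, str]) -> bool:
        key = tuple(sorted(labels.items()))
        known = self.seen.setdefault(metric, set())
        if key in known:
            return True    # existing series: always accepted
        if len(known) >= self.max_series:
            return False   # new series over budget: rejected
        known.add(key)
        return True

limiter = CardinalityLimiter(max_series_per_metric=2)
results = [
    limiter.admit("http_requests_total", {"user_id": "1"}),
    limiter.admit("http_requests_total", {"user_id": "2"}),
    limiter.admit("http_requests_total", {"user_id": "3"}),  # third distinct series
    limiter.admit("http_requests_total", {"user_id": "1"}),  # already known
]
print(results)
```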

Worked example: one label explosion

http_requests_total{route="/users/:id", user_id="..."} with 50K active users → 50K+ series from one metric.

At $5 per 100 custom series (Datadog list price): 50,000 / 100 x $5 = $2,500/mo for a single metric.

Fix: normalize the route to its template, drop user_id from metric labels, and keep per-user detail in traces (linked via exemplars) instead. Series count → ~200, cost → $10/mo.
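The fix and its cost impact can be sketched in Python. The regular expression, normalize_route, and monthly_cost are hypothetical helpers. The pattern only covers numeric and UUID-shaped path segments, and the price is the $5 per 100 series figure quoted above.

```python
import re

# Matches a path segment that is all digits or shaped like a UUID.
_ID_SEGMENT = re.compile(r"/(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F-]{27})(?=/|$)")

def normalize_route(path: str) -> str:
    """Replace ID-like path segments with a ':id' placeholder."""
    return _ID_SEGMENT.sub("/:id", path)

def monthly_cost(series: int, usd_per_100_series: float = 5.0) -> float:
    """Monthly cost at a flat per-100-series price."""
    return series / 100 * usd_per_100_series

print(normalize_route("/users/12345/orders/987"))
print(monthly_cost(50_000), monthly_cost(200))
```

If your web framework exposes the matched route template directly, using that as the label value is more reliable than rewriting paths with a regex.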

What to do this week

  • [ ] Export top 20 metrics by series count from vendor
  • [ ] Ban user_id, request_id, session_id in label policy
  • [ ] Use recording rules / aggregate views for dashboards
  • [ ] Add CI check that rejects new high-cardinality labels
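The label policy and CI check from this list could start as a small script. The sketch below assumes metric definitions have already been extracted into a mapping of metric name to label names. How you extract them depends on your instrumentation library, and check_metric_labels is a hypothetical name.

```python
BANNED_LABELS = {"user_id", "request_id", "session_id"}

def check_metric_labels(metrics: dict[str, list[str]]) -> list[str]:
    """Return one violation message per banned label found on a metric."""
    violations = []
    for name, labels in sorted(metrics.items()):
        for label in sorted(set(labels) & BANNED_LABELS):
            violations.append(f"{name}: banned label '{label}'")
    return violations

# Hypothetical metric definitions collected from the codebase.
declared = {
    "http_requests_total": ["service", "route", "user_id"],
    "queue_depth": ["service", "queue"],
}

violations = check_metric_labels(declared)
print(violations)
```

A CI job would exit non-zero whenever the returned list is non-empty.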
