Managed vs. Self-Hosted Observability: The Real Cost Comparison
Beyond license fees: the full cost picture of running your own stack vs paying for SaaS.
Quick take
Self-hosted wins on $/GB at scale; managed wins on time-to-value. Include 0.5–1 FTE SRE overhead in self-hosted TCO.
The self-hosted vs managed debate isn't about license cost — it's about total cost including engineering time, reliability risk, and opportunity cost.
The True Cost Model
Managed (SaaS) Costs
License fees + agent overhead. That's it. No infra, no ops, no upgrades.
Self-Hosted Costs
- Infrastructure: Compute, storage, network for the observability stack itself
- Engineering time: 1-3 FTEs at mid-size scale and above for setup, maintenance, and upgrades (fully loaded: $150-300K each per year)
- Reliability risk: Self-hosted outage during a production incident = double crisis
- Feature development: Building the integrations, dashboards, and alerting that vendors provide out of the box
- Opportunity cost: Those engineers could be building product features
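To put the engineering-time line in monthly terms, here is a minimal sketch that converts the 1-3 FTE and $150-300K ranges above into a monthly band. The function name is hypothetical, and the sketch assumes the fully loaded figure is annual. Reliability risk and opportunity cost are not modeled because the article gives no numbers for them.

```python
def engineering_cost_range(fte_low, fte_high, loaded_low, loaded_high):
    """Return (min, max) monthly engineering cost in USD.

    Assumes loaded_low and loaded_high are annual fully loaded costs per FTE.
    """
    monthly_low = fte_low * loaded_low / 12
    monthly_high = fte_high * loaded_high / 12
    return monthly_low, monthly_high


# Ranges quoted in the list above: 1-3 FTEs at $150-300K each.
low, high = engineering_cost_range(1, 3, 150_000, 300_000)
print(f"Engineering time: ${low:,.0f} to ${high:,.0f} per month")
# prints "Engineering time: $12,500 to $75,000 per month"
```

Even the low end of that band is several times a modest infrastructure bill, which is why headcount usually decides the comparison.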
Break-Even Analysis
| Scale | SaaS monthly cost | Self-hosted monthly cost | Self-hosted cheaper? |
|---|---|---|---|
| Small (20 hosts) | $2-5K | $5-8K (infra + 0.25 FTE) | No |
| Mid (200 hosts) | $15-40K | $10-20K (infra + 1 FTE) | Maybe |
| Large (1,000 hosts) | $50-150K | $15-40K (infra + 2 FTE) | Yes |
| Enterprise (5,000+ hosts) | $200-500K | $30-80K (infra + 3 FTE) | Definitely |
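One way to read the last column of the table is as a comparison of the two cost ranges: self-hosted clearly wins when its worst case is at or below the SaaS best case, clearly loses in the opposite situation, and is a toss-up when the ranges overlap. The sketch below encodes that reading. The rule and the `verdict` name are assumptions for illustration, not the method used to build the table.

```python
def verdict(saas_range, self_hosted_range):
    """Compare two (low, high) monthly cost ranges, both in $K."""
    saas_low, saas_high = saas_range
    sh_low, sh_high = self_hosted_range
    if sh_high <= saas_low:
        return "Yes"    # self-hosted worst case still beats SaaS best case
    if saas_high <= sh_low:
        return "No"     # SaaS worst case still beats self-hosted best case
    return "Maybe"      # ranges overlap, so the details decide


# Ranges copied from the table above, in $K per month.
TIERS = {
    "Small (20 hosts)": ((2, 5), (5, 8)),
    "Mid (200 hosts)": ((15, 40), (10, 20)),
    "Large (1,000 hosts)": ((50, 150), (15, 40)),
    "Enterprise (5,000+ hosts)": ((200, 500), (30, 80)),
}

for name, (saas, self_hosted) in TIERS.items():
    print(f"{name}: {verdict(saas, self_hosted)}")
```

Under this rule the four tiers come out as No, Maybe, Yes, Yes. Plugging in your own quoted SaaS price and infrastructure estimate is more useful than relying on the generic ranges.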
The LGTM Stack (Self-Hosted)
Loki (logs) + Grafana (visualization) + Tempo (traces) + Mimir (metrics). All open-source, all OTel-native.
Realistic infrastructure for 200 hosts:
- 3 Mimir nodes (metrics): ~$600/month
- 3 Loki nodes (logs): ~$800/month
- 1 Tempo node (traces): ~$200/month
- 1 Grafana instance: ~$100/month
- S3 storage: ~$200/month
- Total infra: ~$1,900/month
- Total with 1 FTE: ~$14,400/month
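The totals above can be reproduced with a few lines of arithmetic. This sketch assumes the 1 FTE is costed at the low end of the fully loaded range ($150K per year), which is what the ~$14,400 figure implies. The dictionary keys and function name are illustrative only.

```python
# Monthly infrastructure line items from the list above, in USD.
INFRA_MONTHLY = {
    "mimir_nodes": 600,
    "loki_nodes": 800,
    "tempo_node": 200,
    "grafana": 100,
    "s3_storage": 200,
}


def total_monthly(infra, fte_count, fte_annual_loaded):
    """Infrastructure plus engineering time, in USD per month."""
    return sum(infra.values()) + fte_count * fte_annual_loaded / 12


print(sum(INFRA_MONTHLY.values()))                # 1900
print(total_monthly(INFRA_MONTHLY, 1, 150_000))   # 14400.0
print(total_monthly(INFRA_MONTHLY, 1, 300_000))   # 26900.0
```

At the high end of the loaded-cost range the same stack costs about $26,900 per month, so the FTE assumption moves the total far more than any single infrastructure line.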
When Self-Hosted Wins
- SaaS spend exceeds $30K/month
- You have a strong platform engineering team
- Data sovereignty requirements (GDPR, regulated industries)
- Need customization vendors don't support
- Already invested in Prometheus/Grafana ecosystem
When Managed Wins
- SaaS spend under $20K/month
- Small engineering team (<50 engineers)
- Need a broad integration ecosystem out of the box
- Observability is not a core competency
- Speed to value matters more than cost optimization
The Hybrid Approach
Many organizations land on hybrid: self-hosted for high-volume, well-understood workloads (infrastructure metrics, application logs) and SaaS for specialized capabilities (APM, synthetics, RUM). This can capture roughly 60-70% of the self-hosted savings without giving up vendor capabilities entirely.
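To see what the 60-70% figure means in dollars, a rough sketch: take the monthly savings a full migration would deliver and scale it by the capture range. The function name and the $10K example input are hypothetical. The capture percentages are the article's estimate, not a measured value.

```python
def hybrid_savings_range(full_savings_monthly, capture_pct=(60, 70)):
    """Return (low, high) monthly savings a hybrid setup might keep, in USD."""
    low_pct, high_pct = capture_pct
    return (
        full_savings_monthly * low_pct / 100,
        full_savings_monthly * high_pct / 100,
    )


# Hypothetical input: a full self-hosted migration would save $10K per month.
low, high = hybrid_savings_range(10_000)
print(f"Hybrid keeps roughly ${low:,.0f} to ${high:,.0f} per month")
# prints "Hybrid keeps roughly $6,000 to $7,000 per month"
```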
Break-even sketch (500-host estate)
| | Managed (Datadog-class) | Self-hosted LGTM on AWS |
|---|---|---|
| Platform $ | ~$45K/mo | ~$8K infra + $25K SRE (0.5 FTE) |
| Ops burden | Low | High (upgrades, scaling, on-call) |
| Time to value | Days | 3–6 months hardening |
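The table leaves out how long the hardening period delays any payoff. The sketch below estimates a payback time under one loud assumption that is not in the table: during hardening you keep paying the SaaS bill, so the self-hosted spend in those months is pure extra cost. The function name is hypothetical, and the self-hosted figure takes the table's $8K infra + $25K SRE as given.

```python
def payback_months(saas_monthly, self_hosted_monthly, hardening_months):
    """Months after cutover needed to recover the cost of the hardening period.

    Assumes both stacks are paid for in parallel while hardening.
    Returns None when self-hosted is not cheaper per month.
    """
    saving = saas_monthly - self_hosted_monthly
    if saving <= 0:
        return None
    overlap_cost = self_hosted_monthly * hardening_months
    return overlap_cost / saving


saas = 45_000
self_hosted = 8_000 + 25_000  # infra + SRE time, as listed in the table

print(payback_months(saas, self_hosted, 3))  # 8.25
print(payback_months(saas, self_hosted, 6))  # 16.5
```

Under that assumption the $12K monthly saving takes roughly 8 to 17 months after cutover to repay the parallel-run period, which is worth weighing against contract renewal dates.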
What to do this week
- [ ] Model self-hosted infra with the calculator's LGTM preset
- [ ] Add 0.5–1 FTE fully loaded cost to self-hosted column
- [ ] List features you'd lose (RUM, synthetics, ML alerts)
- [ ] Hybrid: managed for APM, self-hosted for logs/metrics?
Sources & further reading
- OpenTelemetry cost guide
- Grafana LGTM stack docs — operational requirements