Telemetry Data Lifecycle Management
Design a telemetry data lifecycle that balances cost against query performance, using hot, warm, and cold tiers with retention policies for each.
Quick take
Size the hot → warm → cold tiers by how often the data is actually queried, not by fear of needing it someday. As a rule of thumb, about 95% of a trace's value is realized in its first 48 hours.
Most organizations apply one retention policy to all telemetry. That's like paying for same-day delivery on everything.
The Lifecycle Model
| Tier | Latency | Cost/GB/Month | Use Case |
|---|---|---|---|
| Hot | <1s | $5-20 | Active dashboards, alerting |
| Warm | 1-10s | $1-5 | Recent investigation |
| Cold | 10-60s | $0.05-0.50 | Forensics, compliance |
| Frozen | Minutes | $0.01-0.05 | Long-term archive |
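The price ranges in the table can be turned into a rough cost envelope. The following is a minimal sketch, not a vendor calculator: the dictionary copies the table's Cost/GB/Month ranges, while the function name `monthly_cost_range` and the example volumes are assumptions made for illustration.

```python
# Price ranges (USD per GB per month) copied from the tier table above.
TIER_PRICE_USD_PER_GB_MONTH = {
    "hot": (5.00, 20.00),
    "warm": (1.00, 5.00),
    "cold": (0.05, 0.50),
    "frozen": (0.01, 0.05),
}


def monthly_cost_range(stored_gb: dict[str, float]) -> tuple[float, float]:
    """Return the (low, high) monthly cost for GB stored in each tier."""
    low = sum(gb * TIER_PRICE_USD_PER_GB_MONTH[tier][0] for tier, gb in stored_gb.items())
    high = sum(gb * TIER_PRICE_USD_PER_GB_MONTH[tier][1] for tier, gb in stored_gb.items())
    return low, high


# Hypothetical footprint: 100 GB hot and 1,000 GB cold.
low, high = monthly_cost_range({"hot": 100, "cold": 1000})
print(f"${low:,.0f} to ${high:,.0f} per month")
```

With these illustrative volumes, the 100 GB in hot storage costs at least as much as the 1,000 GB in cold storage at either end of the range, which is the asymmetry the tier model is meant to exploit.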
Signal-Specific Retention
Metrics: Alerting metrics stay hot for 90d, then move to warm, downsampled, for 1yr. Dashboard metrics: 30d hot, 90d warm. Debug metrics: 7d, then delete.
Logs: Error logs: 30d hot, then cold for 1yr. Info: 7d hot, then cold for 90d. Debug: 3d, then delete. Audit: 90d hot, then frozen for 7yr.
Traces: Error traces: 30d hot, then cold for 90d. Normal traces: 3-7d, then delete. Span metrics: 90d hot.
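One way to make these rules machine-checkable is to express them as data. The sketch below is illustrative only and rests on two assumptions the text does not settle: each duration is read as time spent in that stage, with stages running back to back, and "3-7d" for normal traces is taken at its upper bound. The names `Stage`, `POLICIES`, and `tier_for` are hypothetical, and only the log and trace rules are encoded.

```python
from dataclasses import dataclass

YEAR = 365


@dataclass(frozen=True)
class Stage:
    tier: str  # "hot", "warm", "cold", or "frozen"
    days: int  # days spent in this stage (assumption: stages run back to back)


# (signal, class) -> ordered stages; data older than the last stage is deleted.
POLICIES = {
    ("logs", "error"): [Stage("hot", 30), Stage("cold", YEAR)],
    ("logs", "info"): [Stage("hot", 7), Stage("cold", 90)],
    ("logs", "debug"): [Stage("hot", 3)],
    ("logs", "audit"): [Stage("hot", 90), Stage("frozen", 7 * YEAR)],
    ("traces", "error"): [Stage("hot", 30), Stage("cold", 90)],
    ("traces", "normal"): [Stage("hot", 7)],  # assumption: upper end of 3-7d
}


def tier_for(signal: str, data_class: str, age_days: int) -> str:
    """Return the tier that data of this age belongs in, or "delete"."""
    elapsed = 0
    for stage in POLICIES[(signal, data_class)]:
        elapsed += stage.days
        if age_days < elapsed:
            return stage.tier
    return "delete"


print(tier_for("logs", "error", 10))   # still inside the 30d hot window
print(tier_for("logs", "audit", 100))  # past 90d hot, inside the frozen window
print(tier_for("logs", "debug", 3))    # past the 3d window
```

Keeping the policy as a plain table makes it easy to review in a pull request and to translate into whatever retention settings a given vendor exposes.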
Cost Impact
Example: a mid-size company ingesting 100 GB/day of logs, 50K metrics, and 10K spans/min:
| Strategy | Monthly Cost | Savings |
|---|---|---|
| Uniform 30-day hot | $15,000 | Baseline |
| Tiered | $7,500 | 50% |
| Tiered + sampling | $4,500 | 70% |
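The arithmetic behind a table like this can be sketched as steady-state volume times unit price. The article does not say how its figures were derived, so treat the following as an assumption: counting logs only, at a hot-tier price of $5/GB/month, happens to reproduce the $15,000 baseline. The function names are hypothetical.

```python
def monthly_cost(gb_per_day: float, retention_days: int, usd_per_gb_month: float) -> float:
    """Steady-state storage cost: data on hand (GB) times price per GB-month."""
    stored_gb = gb_per_day * retention_days  # volume held once retention is full
    return stored_gb * usd_per_gb_month


def savings_pct(baseline: float, cost: float) -> float:
    """Savings relative to the baseline, as a percentage."""
    return (baseline - cost) * 100 / baseline


# Assumption: logs only, 100 GB/day kept hot for 30 days at $5/GB/month.
baseline = monthly_cost(100, 30, 5.0)
print(baseline)                     # 15000.0
print(savings_pct(baseline, 7500))  # 50.0
print(savings_pct(baseline, 4500))  # 70.0
```

Plugging in your own daily volume, retention windows, and contract prices gives a first-order estimate before asking a vendor for a quote.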
Compliance
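SOC 2: no fixed period is prescribed, but 1yr of audit logs is a common target. HIPAA: 6yr for required documentation, commonly applied to access logs. PCI-DSS: at least 1yr of audit log history, with the most recent 3 months immediately available. GDPR: delete PII on request. Separate compliance logs from operational logs: compliance data goes to cheap cold storage.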
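Separating the two streams usually comes down to a routing decision at ingest. Here is a minimal sketch, assuming each record carries a `category` field; that field, the category labels, and the destination names are all hypothetical and would map to whatever your pipeline actually exposes.

```python
# Hypothetical category labels that mark a record as compliance data.
COMPLIANCE_CATEGORIES = {"audit", "access"}


def route(record: dict) -> str:
    """Pick a destination: compliance data goes to the cheap archive."""
    if record.get("category") in COMPLIANCE_CATEGORIES:
        return "compliance-archive"  # hypothetical cold or frozen store
    return "operational"             # hypothetical hot/warm pipeline


print(route({"category": "audit", "user": "alice"}))  # compliance-archive
print(route({"level": "info", "msg": "cache miss"}))  # operational
```

Routing on an explicit label, rather than on log level or source, keeps compliance retention independent of changes to operational retention.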
Tiered retention template
| Tier | Data | Hot (query) | Warm | Cold / archive |
|---|---|---|---|---|
| Incidents | Error logs, slow traces | 14d | 30d | 1yr object storage |
| SLOs | RED metrics | 90d | 1yr | — |
| Compliance | Audit logs | 30d indexed | — | 7yr S3/Glacier |
| Debug | Verbose app logs | 3d | 7d | drop |
What to do this week
- [ ] Classify each log source into tier 1–4 from audit framework
- [ ] Configure vendor retention policies per tier
- [ ] Route compliance streams to cheapest durable store
- [ ] Document which tiers are allowed for prod vs staging
Sources & further reading
- AWS S3 Intelligent-Tiering — cold archive economics
- Grafana Mimir / Loki retention — self-hosted tiering patterns