OpenData Timeseries: Prometheus-compatible metrics on object storage

https://www.opendata.dev/blog/introducing-timeseries

9•apurvamehta•1h ago

Comments

mdwaud•1h ago

The "why should I care" is about 3/4 of the way down the page:

> None of these numbers are exact, but the structural gap is clear: a handful of nodes costing roughly $560/month versus $10,000-20,000/month for a managed service at the same scale. As we explained earlier, it’s practical to operate OpenData Timeseries yourself and fully realize these massive cost savings since it isn’t a traditional distributed database that manages partitioned and replicated state.

It doesn't look 100% turn-key, but those are compelling numbers.

agavra•1h ago

Good point, a tl;dr is probably worthwhile.

It's definitely not quite turn key just yet but we've been dogfooding it in production against a moderate metrics use case (~30k samples/s) and have it hooked up to grafana (you just configure a prometheus source and point to your deployed URL). We run it on a single node with no replicas ;)

apurvamehta•1h ago

Good call out, updated the intro with a summary of the cost benefit. Thanks for the feedback!

davistreybig•1h ago

Wow this is so, so much cheaper than alternatives

hagen1778•1h ago

Comparing self-hosted prices with managed solutions isn't exactly apples to apples.

But if you do compare, VictoriaMetrics cloud for 3Mil active series and twice higher ingestion rate (100K samples/s or 30s scrape interval) will cost you ~$1k/month + storage costs.

See https://victoriametrics.cloud/#estimate-cost

apurvamehta•54m ago

Agreed. VictoriaMetrics is indeed a very compelling offering. The disk-less approach is significantly simpler to operate, which I think is the biggest difference. Running opendata's version yourself has fewer moving pieces, and standard operations become trivial because no single service retains permanent state.

It's a meaningful change in calculation of running yourself vs paying someone to do it for you IMO.

agavra•51m ago

anecdotally I've heard confirmations of the challenge of running VictoriaMetrics clusters at scale. they're way better than Cortex/Thanos and they've built a pretty awesome product but still are a pretty significant operational burden.

valyala•50m ago

Interesting solution! According to the provided numbers at "query latency" chapter, the query over cold data, which selects samples for 497 time series over 6 hours time range takes 15 seconds if the queried data isn't available in the cache. This means that typical queries over historical data will take eternity to execute ;(

apurvamehta•45m ago

yes. this is current issue. there are two solutions:

1. the reason it's slow as you select more series over longer periods of time is that the series has to be pulled for each time bucket in the range, and then the samples have to be pulled for each bucket. By compacting older buckets and merging samples together, historical queries should be pretty comparable to 'more recent' cold queries. 2. We don't pre-cache all the metadata today. If we did that, then we could parallelize sample loads much more efficiently, lowering latency. 3. There is a lot of room to do better batching and tune the parallelism of cold reads.

We've only been at this for a couple of months. THe techniques to improve latency on object storage are well known, we just have to implement them.

Another benefit is this: all the data is on S3, so spinning up more optimized readers to transform older data to do more detailed analysis is also an option with this architecture.

valyala•36m ago

Yes, there is a solution for masking the read latency at object storage - to run many readers in parallel. I tweeted about it some time ago - https://x.com/valyala/status/1965093140525715714

agavra•34m ago

The other solution is to aggressively size your disk cache and keep effectively the full working set on disk, using object storage just as a durability layer. Then the main benefit is operational simplicity because you have a true shared-nothing architecture between the read replicas (there's no quorum or hash ring to maintain and no deduplication on read). Obviously you'll have a more expensive deployment topology if you do so, but it's still compelling IMO because you have the knobs to tune whether you want to cache on disk or not.

agavra•33m ago

also super cool to see you on here valyala! we took a bunch of inspiration from your work at VM. kudos to all you've done :)

apurvamehta•20m ago

+1 to what @agavra said. It's awesome to see you here @valyala. Your writing and talks about timeseries databases were a great inspriratino for us. I recall one of your earlier talks about the data layout design of VM. Opendata Timeseries has emulated a lot of it.

hagen1778•47m ago

I am curious to see more tests on the reading path. The article mentions matching 500 series over 6h window with 1m step - and it takes 2s for warmed caches. That doesn't sound good at all.

Especially nowadays, when metrics from k8s ramping up churn rate to hundreds of thousands and millions series.

agavra•27m ago

This is the biggest gap in the 0.2.1 release. We have a pretty naive query execution engine because we've spent most of the time on core data structures and ingestion.

I have some prototypes of vectorized compute that takes that same query from 2s -> ~800ms, and it's just early days. If you want to contribute to help make it better, the query engine part of it is begging for help!

Generate coherent personas from pop culture universes

Atomic Operations in Go

The Case for AI "Cooperatives"

Free airport ride exchange platform

A Better R Programming Experience Thanks to Tree-sitter

AI's Next Frontier: People Skills

I built a game: convince an AI bouncer you're also an AI in 3 messages

History Is Running Backwards

WordPress needs to refactor, not redecorate

The Iterated Surgeon's Dilemma

New AI Paradigm: Communitized RL

FrontierSWE: An ultra-long horizon coding benchmark

Open-source AI runtime security

Banks' soaring exposure to trading firms creating inherent fragility, warns S&P

Jane Street signs $6B AI cloud deal with CoreWeave, invests $1B

KeePassChi – Codeberg.org

Elon Musk buys a fifth of his own Cybertrucks

We're Hooked on Satellites. It Could Blow Up in Our Faces

Filling DOCX forms: GPT-5.1 broke it, every Claude model handled it

Lutnick's old firm pumps $10M into super PAC led by Tether executive

US, EU move toward landmark biometric data sharing deal

A privacy-first QR code generator (100% client-side, PNG and SVG export)

TCode: An AI Coding Agent Leverages Neovim and Tmux

Reed Hastings to Step Down as Netflix Chair

Frontier Coding Agents Built a Video Diffusion Pipeline on Max

Scientists Just Created Super-Strong Steel That Never Rusts

The Beginning of Scarcity in AI

Mechanical Sympathy

Neuromorphic Event-Based Camera Achieves Kilohertz Vascular Imaging

Accelerate building an independent European social web