Over the past months I’ve been building Arc, a time-series data platform designed to combine very fast ingestion with strong analytical queries.
What does Arc do?
- Ingests via a binary MessagePack API (the fast path)
- Stays compatible with Line Protocol for existing tools (like InfluxDB; I'm an ex-Influxer)
- Stores data as Parquet with hourly partitions
- Queries via the DuckDB engine using SQL
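To make the hourly partitioning idea concrete, here is a minimal Python sketch of how incoming records could be bucketed by hour before being written out as Parquet files. The path layout (measurement/YYYY/MM/DD/HH), the record field names, and both helper functions are assumptions for illustration only, not Arc's confirmed on-disk layout or API.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_partition(measurement: str, ts_ns: int) -> str:
    """Return a hypothetical partition directory for a nanosecond timestamp.

    The layout measurement/YYYY/MM/DD/HH is an assumed convention.
    """
    dt = datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc)
    return f"{measurement}/{dt:%Y/%m/%d/%H}"


def group_by_partition(records):
    """Bucket records by hourly partition so each bucket could become one file."""
    groups = defaultdict(list)
    for rec in records:
        groups[hourly_partition(rec["measurement"], rec["time"])].append(rec)
    return dict(groups)


BASE_NS = 1_700_000_000 * 1_000_000_000  # 2023-11-14 22:13:20 UTC

# Hypothetical records: two fall in the same hour, one in the next hour.
records = [
    {"measurement": "cpu", "time": BASE_NS, "usage": 0.42},
    {"measurement": "cpu", "time": BASE_NS + 60 * 1_000_000_000, "usage": 0.40},
    {"measurement": "cpu", "time": BASE_NS + 3600 * 1_000_000_000, "usage": 0.55},
]

groups = group_by_partition(records)
for path, recs in sorted(groups.items()):
    print(path, len(recs))
```

Grouping by hour keeps each Parquet file scoped to a bounded time window, so a time-range query only needs to touch the directories that overlap the requested range.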
Why I built it:
Many systems force you to trade off retention, throughput, and complexity against each other. I wanted something where ingestion performance doesn’t kill your analytics.
Performance and benchmarks so far:
- Write throughput: ~1.88M records/sec (MessagePack, untuned) on my M3 Pro Max (14 cores, 36 GB RAM)
- ClickBench on AWS c6a.4xlarge: 35.18 s cold, ~0.81 s hot (43/43 queries succeeded)
- In those runs, caching was disabled to match benchmark rules; enabling the cache in production gives ~20% faster repeated queries
I’ve open-sourced the Arc repo so you can dive into implementation, benchmarks, and code. Would love your thoughts, critiques, and use-case ideas.
Thanks!
leakycap•1h ago
ignaciovdk•1h ago
I didn’t really worry about confusion since this isn’t a browser, it’s a completely different animal.
The name actually came from “Ark”, as in something that stores and carries, but I decided to go with Arc to avoid sounding too biblical.
The deeper reason is that Arc isn’t just about ingestion; it’s designed to store data long-term for other systems like InfluxDB, Timescale, or Kafka, using Parquet and S3-style backends that scale economically while still letting you query everything with SQL.