frontpage.

A Bid-Based NFT Advertising Grid

https://bidsabillion.com/
1•chainbuilder•40s ago•1 comments

AI readability score for your documentation

https://docsalot.dev/tools/docsagent-score
1•fazkan•8m ago•0 comments

NASA Study: Non-Biologic Processes Don't Explain Mars Organics

https://science.nasa.gov/blogs/science-news/2026/02/06/nasa-study-non-biologic-processes-dont-ful...
1•bediger4000•11m ago•2 comments

I inhaled traffic fumes to find out where air pollution goes in my body

https://www.bbc.com/news/articles/c74w48d8epgo
1•dabinat•11m ago•0 comments

X said it would give $1M to a user who had previously shared racist posts

https://www.nbcnews.com/tech/internet/x-pays-1-million-prize-creator-history-racist-posts-rcna257768
2•doener•14m ago•1 comments

155M US land parcel boundaries

https://www.kaggle.com/datasets/landrecordsus/us-parcel-layer
2•tjwebbnorfolk•18m ago•0 comments

Private Inference

https://confer.to/blog/2026/01/private-inference/
2•jbegley•22m ago•1 comments

Font Rendering from First Principles

https://mccloskeybr.com/articles/font_rendering.html
1•krapp•25m ago•0 comments

Show HN: Seedance 2.0 AI video generator for creators and ecommerce

https://seedance-2.net
1•dallen97•29m ago•0 comments

Wally: A fun, reliable voice assistant in the shape of a penguin

https://github.com/JLW-7/Wally
2•PaulHoule•30m ago•0 comments

Rewriting Pycparser with the Help of an LLM

https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/
2•y1n0•32m ago•0 comments

Lobsters Vibecoding Challenge

https://gist.github.com/MostAwesomeDude/bb8cbfd005a33f5dd262d1f20a63a693
1•tolerance•32m ago•0 comments

E-Commerce vs. Social Commerce

https://moondala.one/
1•HamoodBahzar•32m ago•1 comments

Avoiding Modern C++ – Anton Mikhailov [video]

https://www.youtube.com/watch?v=ShSGHb65f3M
2•linkdd•34m ago•0 comments

Show HN: AegisMind–AI system with 12 brain regions modeled on human neuroscience

https://www.aegismind.app
2•aegismind_app•38m ago•1 comments

Zig – Package Management Workflow Enhancements

https://ziglang.org/devlog/2026/#2026-02-06
1•Retro_Dev•39m ago•0 comments

AI-powered text correction for macOS

https://taipo.app/
1•neuling•43m ago•1 comments

AppSecMaster – Learn Application Security with hands on challenges

https://www.appsecmaster.net/en
1•aqeisi•44m ago•1 comments

Fibonacci Number Certificates

https://www.johndcook.com/blog/2026/02/05/fibonacci-certificate/
2•y1n0•45m ago•0 comments

AI Overviews are killing the web search, and there's nothing we can do about it

https://www.neowin.net/editorials/ai-overviews-are-killing-the-web-search-and-theres-nothing-we-c...
4•bundie•50m ago•1 comments

City skylines need an upgrade in the face of climate stress

https://theconversation.com/city-skylines-need-an-upgrade-in-the-face-of-climate-stress-267763
3•gnabgib•51m ago•0 comments

1979: The Model World of Robert Symes [video]

https://www.youtube.com/watch?v=HmDxmxhrGDc
1•xqcgrek2•56m ago•0 comments

Satellites Have a Lot of Room

https://www.johndcook.com/blog/2026/02/02/satellites-have-a-lot-of-room/
3•y1n0•56m ago•0 comments

1980s Farm Crisis

https://en.wikipedia.org/wiki/1980s_farm_crisis
4•calebhwin•57m ago•1 comments

Show HN: FSID - Identifier for files and directories (like ISBN for Books)

https://github.com/skorotkiewicz/fsid
1•modinfo•1h ago•0 comments

Show HN: Holy Grail: Open-Source Autonomous Development Agent

https://github.com/dakotalock/holygrailopensource
1•Moriarty2026•1h ago•1 comments

Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•1h ago•1 comments

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•1h ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
3•rolph•1h ago•1 comments

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•1h ago•3 comments

Show HN: Arc – high-throughput time-series warehouse with DuckDB analytics

https://github.com/Basekick-Labs/arc
30•ignaciovdk•4mo ago
Hi HN, I’m Ignacio, founder at Basekick Labs.

Over the past months I’ve been building Arc, a time-series data platform designed to combine very fast ingestion with strong analytical query performance.

What Arc does:

- Ingests via a binary MessagePack API (the fast path)
- Stays compatible with Line Protocol for existing tools (like InfluxDB; I'm an ex-Influxer)
- Stores data as Parquet with hourly partitions
- Queries via a DuckDB engine using SQL
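Roughly, the client-side flow looks like the sketch below. The endpoint paths, auth, and payload shape here are my assumptions for illustration, not Arc's documented API, so check the repo for the real routes:

    import time

    import msgpack   # binary serialization for the fast ingest path
    import requests

    # One time-series record: measurement, tags, fields, nanosecond timestamp.
    records = [{
        "measurement": "cpu",
        "tags": {"host": "server-01", "region": "us-east"},
        "fields": {"usage_user": 12.5, "usage_system": 3.1},
        "timestamp": int(time.time() * 1e9),
    }]

    # Hypothetical ingest endpoint (binary MessagePack body).
    requests.post("http://localhost:8000/write/msgpack",
                  data=msgpack.packb(records),
                  headers={"Content-Type": "application/msgpack"})

    # Hypothetical query endpoint: the SQL is handed to the embedded DuckDB engine.
    resp = requests.post("http://localhost:8000/query",
                         json={"sql": "SELECT host, avg(usage_user) "
                                      "FROM cpu GROUP BY host"})
    print(resp.json())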

Why I built it:

Many systems force you to trade among retention, throughput, and operational complexity. I wanted something where ingestion performance doesn’t kill your analytics.

Performance & benchmarks that I have so far.

Write throughput: ~1.88M records/sec (MessagePack, untuned) in my M3 Pro Max (14 cores, 36gb RAM) ClickBench on AWS c6a.4xlarge: 35.18 s cold, ~0.81 s hot (43/43 queries succeeded) In those runs, caching was disabled to match benchmark rules; enabling cache in production gives ~20% faster repeated queries

I’ve open-sourced the Arc repo so you can dive into the implementation, benchmarks, and code. Would love your thoughts, critiques, and use-case ideas.

Thanks!

Comments

leakycap•4mo ago
Did you consider confusion with the Arc browser and still go with the name, or were you calling this Arc first and decided to just stick with it?
ignaciovdk•4mo ago
Hey, good question!

I didn’t really worry about confusion since this isn’t a browser, it’s a completely different animal.

The name actually came from “Ark”, as in something that stores and carries, but I decided to go with Arc to avoid sounding too biblical.

The deeper reason is that Arc isn’t just about ingestion; it’s designed to store data long-term for other databases like InfluxDB, Timescale, or Kafka using Parquet and S3-style backends that scale economically while still letting you query everything with SQL.

nozzlegear•4mo ago
Didn't that browser get mothballed by its devs?
bl4kers•4mo ago
The browser is dead anyway
simlevesque•4mo ago
I'll try this right now. I'm looking to self-host duckdb because MotherDuck is way too expensive.
ignaciovdk•4mo ago
Awesome, would love to hear what you think once you try it out!

If it’s not too much trouble, feel free to share feedback at ignacio [at] basekick [dot] net.

Nesco•4mo ago
Arc Browser, Arc Prize, Arc Institute and now the Arc Warehouse

I am afraid “Arc” has become too fashionable this decade, and using it might hurt brand visibility

signal11•4mo ago
Coincidentally, this site runs Arc[1] code.

[1] https://en.wikipedia.org/wiki/Arc_(programming_language)

whalesalad•4mo ago
> Arc Core is designed with MinIO as the primary storage backend

Noticing that all the benchmarking is being done with MinIO, which I presume is also running alongside/locally, so there is no latency and it will be roughly as fast as whatever underlying disk it’s operating from.

Are there any benchmarks for using actual S3 as the storage layer?

How does Arc decide what to keep hot and local? TTL based? Frequency of access based?

We're going to be evaluating Clickhouse with this sort of hot (local), cold (S3) configuration soon (https://clickhouse.com/docs/guides/separation-storage-comput...) but would like to evaluate other platforms if they are relevant.

ignaciovdk•4mo ago
Hey there, great questions.

The benchmarks weren’t run on the same machine as MinIO, but on the same network, connected over a 1 Gbps switch, so there’s a bit of real network latency, though still close to local-disk performance.

We’ve also tried a true remote setup before (compute around ~160 ms away from AWS S3). I plan to rerun that scenario soon and publish the updated results for transparency.

Regarding “hot vs. cold” data, Arc doesn’t maintain separate tiers in the traditional sense. All data lives in the S3-compatible storage (MinIO or AWS S3), and we rely on caching for repeated query patterns instead of a separate local tier.

In practice, Arc performs better than ClickHouse when using S3 as the primary storage layer. ClickHouse can scan faster in pure analytical workloads, but Arc tends to outperform it on time-range–based queries (typical in observability and IoT).

I’ll post the new benchmark numbers in the next few days, they should give a clearer picture of the trade-offs.

drchaim•4mo ago
Sounds interesting, just some questions:

- Are tables partitioned? By year/month?
- How do you handle too many small Parquet files?
- Are updates/deletes allowed/planned?
ignaciovdk•4mo ago
Great questions, thanks!

Partitioning: yes, Arc partitions by measurement > year > month > day > hour. This structure makes time-range queries very fast and simplifies retention policies (you can drop by hour/day instead of re-clustering).
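For a concrete picture, this is roughly what an hour-partitioned layout plus a time-range query looks like with DuckDB. The exact path convention below is my assumption, not necessarily Arc's:

    import duckdb

    # Assumed layout (measurement > year > month > day > hour), e.g.:
    #   data/cpu/2026/02/06/14/batch-0001.parquet
    #   data/cpu/2026/02/06/15/batch-0001.parquet
    # Retention becomes "delete directories older than N hours/days"
    # instead of re-clustering a large table.
    con = duckdb.connect()
    print(con.sql("""
        SELECT host, avg(usage_user) AS avg_user
        FROM read_parquet('data/cpu/2026/02/06/*/*.parquet')
        WHERE timestamp >= TIMESTAMP '2026-02-06 14:00:00'
        GROUP BY host
    """).fetchall())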

Small Parquet files: we batch writes by measurement before flushing, typically every 10 K records or 60 seconds. That keeps file counts manageable while maintaining near-real-time visibility. Compaction jobs (optional) can later merge smaller Parquet files for long-term optimization.
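As an illustration of that count-or-age flush policy, here is a minimal sketch of the idea (my own toy version, not Arc's actual writer):

    import time

    class MeasurementBuffer:
        """Buffers rows for one measurement; flushes every N rows or T seconds."""

        def __init__(self, sink, flush_rows=10_000, flush_seconds=60):
            self.sink = sink              # called with each batch, e.g. a Parquet writer
            self.flush_rows = flush_rows
            self.flush_seconds = flush_seconds
            self.rows = []
            self.opened = time.monotonic()

        def add(self, row):
            self.rows.append(row)
            if (len(self.rows) >= self.flush_rows
                    or time.monotonic() - self.opened >= self.flush_seconds):
                self.flush()

        def flush(self):
            if self.rows:
                self.sink(self.rows)      # each flush would become one Parquet file
            self.rows = []
            self.opened = time.monotonic()

    # Usage: bounded file counts with near-real-time visibility.
    buf = MeasurementBuffer(sink=lambda batch: print(f"flush {len(batch)} rows"),
                            flush_rows=3)
    for i in range(7):
        buf.add({"timestamp": i, "value": i * 1.5})
    buf.flush()  # flush the tail on shutdown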

Updates/deletes: today Arc is append-only (like most time-series systems). Updates/deletes are planned via “rewrite on retention”, meaning you’ll be able to apply corrections or retention windows by rewriting affected partitions.
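A rough sketch of what rewriting an affected partition could look like on top of Parquet (my illustration, not the planned implementation; paths and the predicate are made up):

    import duckdb

    # Rewrite one hourly partition, dropping the rows a correction or
    # retention window says should go.
    con = duckdb.connect()
    con.sql("""
        COPY (
            SELECT *
            FROM read_parquet('data/cpu/2026/02/06/14/*.parquet')
            WHERE host != 'decommissioned-host'
        ) TO 'cpu-2026020614-rewritten.parquet' (FORMAT PARQUET)
    """)
    # The rewritten file then replaces the old files for that hour.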

The current focus is on predictable write throughput and analytical query performance, but schema evolution and partial rewrites are definitely on the roadmap.

bormaj•4mo ago
Exciting project and definitely something I'd like to explore using. I particularly like the look of the API ergonomics. A few questions:

- Is the schema inferred from the data?
- Can/does the schema evolve?
- Are custom partitions supported?
- Is there a roadmap for future features?

ignaciovdk•4mo ago
Thanks! Let’s go by parts, as Jason would say

Schema inference: yes, Arc infers the schema automatically from incoming data (both for MessagePack and Line Protocol). Each measurement becomes a table, and fields/tags map to columns.

Schema evolution: supported. New fields can appear at any time; they’re added to the Parquet schema automatically without migration or downtime.
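To illustrate at the Parquet/DuckDB level (my own sketch; the file names and columns are made up): a later batch can carry a field the earlier files never had, and it simply shows up as a new column.

    import duckdb
    import pyarrow as pa
    import pyarrow.parquet as pq

    # First batch of the "cpu" measurement: two columns are inferred.
    pq.write_table(pa.table({"ts": [1, 2], "usage_user": [10.0, 11.5]}),
                   "cpu-0001.parquet")

    # Later batch: a new field appears and becomes a new column.
    pq.write_table(pa.table({"ts": [3], "usage_user": [9.2], "usage_iowait": [0.4]}),
                   "cpu-0002.parquet")

    # DuckDB merges the evolving schemas at query time; older rows get NULLs
    # for the column they never had.
    print(duckdb.sql(
        "SELECT * FROM read_parquet('cpu-*.parquet', union_by_name = true)"
    ).fetchall())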

Custom partitions: currently partitioning is time-based (hour-level by default), but custom partitioning by tag or host or whatever is planned. The idea is to allow you to group by any tag (e.g. device, region) in the storage path for large-scale IoT data.

Roadmap: absolutely. Grafana data source, Prometheus remote write, retention policies, gRPC streaming, and distributed query execution are all in the works.

We are going to start blogging about it, so stay tuned.

Would love any feedback on what you’d prioritize or what would make adoption easier for your use case.

bormaj•4mo ago
My use case isn't IoT, but about once a month I get a massive data dump from a vendor. Think tens of millions of rows and 100+ columns. Cleaning, ingesting, and querying this data via a standard RDBMS is a slow and brittle process. There is a time-series aspect, but partitioning across other keys/groups is critical.
leguy•4mo ago
In conjunction with Postgres for related relational data, I’m using timescale for IoT based time series data.

Is this something I’d use instead of timescale, or, am I understanding that the intention here is to be a data warehouse, where we could potentially offload older data to Arc for longer term storage or trend analysis?

ignaciovdk•4mo ago
Hey, thanks for asking.

I’d say both roles are possible, though the original intent of Arc was indeed to act as an offload / long-term store for systems like TimescaleDB, InfluxDB, Kafka, etc. The idea: you send data into Arc to reduce storage and query load on your primary database for ML, deep analysis, etc.

But as we built it, we discovered that Arc is really good not just at storage but at actively answering queries, so it's kind of a hybrid: somewhat “warehouse-like,” but still retaining database qualities in performance. I feel that calling it a database is too much, but we are heading in that direction.

IoT is absolutely one of the core use cases. You’re often ingesting tens or hundreds of thousands of events per second from edge devices, and you need a system that doesn’t choke. Our binary MessagePack ingestion helps shrink the payload size and reduce parsing overhead, which allows higher write throughput, crucial in IoT scenarios.
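To give a feel for the payload point, here is a generic comparison using the msgpack library; the record layout is my assumption and this is not Arc's exact wire format:

    import json
    import msgpack

    point = {
        "measurement": "cpu",
        "tags": {"host": "edge-42", "region": "us-east"},
        "fields": {"usage_user": 12.5, "usage_system": 3.1},
        "timestamp": 1767225600000000000,
    }

    print(len(json.dumps(point).encode()))  # text encoding of the same record
    print(len(msgpack.packb(point)))        # binary MessagePack, noticeably smaller
    # Beyond size, numbers and the timestamp arrive already typed, so the server
    # skips the string parsing a text protocol would need.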

Let me know if you want to explore this a little more. I'm not trying to sell you anything, at least not yet; I would just love to understand your use case. Let me know if you're open: ignacio[at]basekick[dot]net

riku_iki•4mo ago
> Write throughput: ~1.88M records/sec (MessagePack, untuned)

this doesn't sound like much, unless records are very large..

ignaciovdk•4mo ago
That’s fair, the number alone doesn’t mean much without context.

The benchmark measures fully written time-series records, not bytes. Each record typically includes 1–4 fields, plus tags and a timestamp, similar to InfluxDB’s Line Protocol structure.

For comparison, the same hardware (AWS c6a.4xlarge) handles around 240K RPS using Line Protocol, while Arc reaches 1.88M RPS with MessagePack, about 7.8× faster on ingestion throughput.

The full ClickBench and ingestion benchmarks are in the repo.

TL;DR: Arc’s strength isn’t massive single records, it’s sustained high-throughput ingestion of structured time-series data while still staying analytical-query friendly.