Show HN: Rocky – Rust SQL engine with branches, replay, column lineage

https://github.com/rocky-data/rocky
54•hugocorreia90•18h ago
Hi HN, I'm Hugo. I've been building Rocky over the past month, shipping fast in the open. The binary is on GitHub Releases, `dagster-rocky` on PyPI, and the VS Code extension on the Marketplace. I held off on a broader announcement until the trust-system surface was coherent enough to talk about as one thing. The governance waveplan — column classification, per-env masking, 8-field audit trail on every run, `rocky compliance` rollup, role-graph reconciliation, retention policies — landed end-to-end last week in engine-v1.16.0 and rounded out in v1.17.4 (tagged 2026-04-26). That's the milestone I'd been waiting for.

The pitch: keep Databricks or Snowflake. Bring Rocky for the DAG. Rocky is a Rust-based control plane for warehouse pipelines. Storage and compute stay with your warehouse. Rocky owns the graph — dependencies, compile-time types, drift, incremental logic, cost, lineage, governance. The things your current stack can't give you because it doesn't own the DAG.
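As a rough illustration of what "owning the graph" buys you (this is not Rocky's code; the model names and the `run_order` / `downstream` helpers are hypothetical), a dependency map is enough to derive both a safe execution order and the set of models affected by a change:

```python
from graphlib import TopologicalSorter

# Hypothetical model graph: each model maps to the models it depends on.
deps = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "fct_orders": {"stg_orders", "stg_customers"},
    "rpt_revenue": {"fct_orders"},
}


def run_order(deps):
    """Return models in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


def downstream(deps, changed):
    """Return every model that transitively depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for model, parents in deps.items():
            if current in parents and model not in affected:
                affected.add(model)
                frontier.append(model)
    return affected


order = run_order(deps)
impact = downstream(deps, "stg_orders")
```

Here `order` is one valid execution order and `impact` is the set of models that would need to be rebuilt or re-checked if `stg_orders` changed.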

A few things I think are interesting:

- Branches + replay. `rocky branch create stg` gives you a logical copy of a pipeline's tables (schema-prefix today; native Delta SHALLOW CLONE and Snowflake zero-copy are next). `rocky replay <run_id>` reconstructs which SQL ran against which inputs. Git-grade workflow on a warehouse.

- Column-level lineage from the compiler, not a post-hoc graph crawl. The type checker traces columns through joins, CTEs, and windows. VS Code surfaces it inline via LSP.

- Governance as a first-class surface. Column classification tags plus per-env masking policies, applied to the warehouse via Unity Catalog (Databricks) or masking policies (Snowflake). 8-field audit trail on every run. `rocky compliance` rollup that CI can gate on. Role-graph reconciliation via SCIM + per-catalog GRANT. Retention policies with a warehouse-side drift probe.

- Cost attribution. Every run produces per-model cost (bytes, duration). `[budget]` blocks in `rocky.toml`; breaches fire a `budget_breach` hook event.

- Compile-time portability + blast radius. Dialect-divergence lint across Databricks / Snowflake / BigQuery / DuckDB (12 constructs). `SELECT *` downstream-impact lint.

- Schema-grounded AI. Generated SQL goes through the compiler — AI suggestions type-check before they can land.
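To make the cost-attribution bullet concrete, here is a minimal sketch of checking per-model cost against a byte budget. Only the `budget_breach` event name comes from the description above; the `ModelCost` record, the `budget_breaches` helper, the `max_bytes` parameter, and the event payload shape are assumptions for illustration, not Rocky's actual schema or config keys:

```python
from dataclasses import dataclass


@dataclass
class ModelCost:
    # Hypothetical per-model cost record (bytes and duration per run).
    model: str
    bytes_scanned: int
    duration_s: float


def budget_breaches(costs, max_bytes):
    """Return one hypothetical breach event per model over the byte budget."""
    return [
        {
            "event": "budget_breach",
            "model": c.model,
            "bytes": c.bytes_scanned,
            "limit": max_bytes,
        }
        for c in costs
        if c.bytes_scanned > max_bytes
    ]


run = [
    ModelCost("stg_orders", 2_000_000, 1.4),
    ModelCost("fct_orders", 9_000_000, 6.2),
]
events = budget_breaches(run, max_bytes=5_000_000)
```

A hook consumer could then route each entry in `events` to CI or alerting.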

What Rocky isn't:

- Not a warehouse — it's the control plane on top.

- Not a Fivetran replacement. `rocky load` handles files (CSV/Parquet/JSONL); for SaaS sources use Fivetran, Airbyte, or warehouse-native CDC.

- Not dbt Cloud — no hosted UI, no managed scheduler. First-class Dagster integration if you need orchestration.

Adapters: Databricks (GA), Snowflake (Beta), BigQuery (Beta), DuckDB (local dev / playground). Apache 2.0.

I'd love feedback on the trust-system framing, the governance surface (particularly classification-to-masking resolution in `rocky compile` and the `rocky compliance` CI gate), the branches/replay design, the cost-attribution primitives, or anything else that catches your eye. Happy to go deep in the thread.

Comments

mergisi•1h ago
* * *
hasyimibhar•1h ago
Looks cool, I've been waiting for someone to build this since the dbt and SQLMesh acquisitions. It would be great to have model versioning and support for ClickHouse SQL.
mollerhoj•4m ago
It's a bit confusing to claim that "The things your current stack can't give you because it doesn't own the DAG" and use Databricks as your example: Databricks includes jobs and pipelines, so it very much owns the DAG, no?
