
Datalog in Rust

https://github.com/frankmcsherry/blog/blob/master/posts/2025-06-03.md
326•brson•7mo ago

Comments

Leynos•7mo ago
It's funny seeing this as the top story.

I'm in the middle of putting together a real-time strategy game using Differential Datalog[1] and Rust, with DDlog managing the game's logic. Mostly as an excuse to expose myself to new ideas and engage in a whole lot of yak shaving.

[1] https://github.com/vmware-archive/differential-datalog

Yoric•7mo ago
Oh, nice!

I'll be interested in reading how this goes!

cmrdporcupine•7mo ago
Very cool, I'm curious to see what the state of that implementation is and how far you get, since DDLog is not being actively maintained anymore.
foota•7mo ago
I wonder if you could make a frankenstein version of differential datalog by combining the OP repo with salsa[1] (the crate that powers rust-analyzer).

[1] https://github.com/salsa-rs/salsa

lsuresh•7mo ago
That's a cool demo you're building using ddlog! FWIW, we, the ddlog team, have moved on to found Feldera (https://github.com/feldera/feldera). You could consider using DBSP directly through Rust.
rienbdj•7mo ago
A new McSherry post! Excellent.

Last I checked, VMware had moved away from Differential Datalog?

jitl•7mo ago
The Differential Datalog team founded Feldera: https://www.feldera.com/

They switched from differential Datalog to differential SQL, I think because they realized Datalog is a really tough sell.

rebanevapustus•7mo ago
They did, and their product is great.

It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

I have made an accessible version of a subset of Differential Dataflow (DBSP) in Python right here: https://github.com/brurucy/pydbsp

DBSP is so expressive that I have implemented a fully incremental dynamic datalog engine as a DBSP program.

Think of SQL/Datalog where the query can change at runtime, and the changes themselves (program diffs) are incrementally computed: https://github.com/brurucy/pydbsp/blob/master/notebooks/data...

gunnarmorling•7mo ago
> It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

Flink SQL also checks that box.

rebanevapustus•7mo ago
Not true.

There has to be some change in the code, and they will not share the same semantics (and perhaps won't work when retractions/deletions also appear whilst streaming). And let's not even get to the leaky abstractions for good performance (watermarks et al).

jitl•7mo ago
Flink SQL is quite limited compared to Feldera/DBSP or Frank’s Materialize.com, and has some correctness limitations: it’s “eventually consistent”, but until you stop the incoming data it’s unlikely to ever be actually correct when working with streaming joins. https://www.scattered-thoughts.net/writing/internal-consiste...
rc00•7mo ago
Posted 1 day ago

https://news.ycombinator.com/item?id=44274592

tulio_ribeiro•7mo ago
"I, a notorious villain, was invited for what I was half sure was my long-due comeuppance." -- Best opening line of a technical blog post I've read all year.

The narrator's interjections were a great touch. It's rare to see a post that is this technically deep but also so fun to read. The journey through optimizing the aliasing query felt like a detective story. We, the readers, were right there with you, groaning at the 50GB memory usage and cheering when you got it down to 5GB.

Fantastic work, both on the code and the prose.

29athrowaway•7mo ago
If you wish to use Datalog and Rust, cozodb is written in Rust and has a Datalog query syntax.
jitl•7mo ago
Cozodb seems cool but also inactive. I poked around in November 2024 and found some low-hanging fruit in the sqlite storage backend: https://github.com/cozodb/cozo/issues/285
29athrowaway•7mo ago
It's not a lot of code so it's easy to tinker with.
w10-1•7mo ago
Yes, Cozodb has mostly worked as documented for me and has been a pleasure to work with (also for program static analysis, also using sorted trees and type-fu internally). The docs have enough for an initial comparison with the blog walk-through, esp. the interesting work on optimizing queries. But if you're not working in-memory in Rust, data serialization is costly, and the project is quiet at best.
maweki•7mo ago
It is nice to see a core group of Datalog enthusiasts persist, even though the current Datalog revival seems to be on the decline. The recent Datalog 2.0 conference was quite small compared to previous years, and the second HYTRADBOI conference was very light on Datalog as well, while the first one had a quarter of its submissions with a Datalog connection.

I'm encouraged by the other commenters sharing their recent Datalog projects. I am currently building a set of data quality pipelines for a legacy SQL database in preparation for a huge software migration.

We find Datalog much more useful than SQL for identifying and looking for data quality issues, as the queries can be incredibly readable when well-structured.

kmicinski•7mo ago
No offense, but I wouldn't take Datalog 2.0's small attendance as evidence of Datalog's decline, even if I agree with that high-level point. Datalog 2.0 is a satellite workshop of LPNMR, a relatively unknown European conference that was randomly held in Dallas. I attended Datalog 2.0 myself and also felt the event was relatively sparse. I also had a paper (not my primary work, the first author is the real wizard of course :-) at the workshop. I saw relatively few folks in that space even attending the event--with the notable exception of some European folks (e.g., those introducing the Nemo solver).

All of this is to say, I think Datalog 2.0's sparse attendance this year may be more indicative of the fact that it is a satellite workshop of an already less prestigious conference (itself not even the main event! That was ICLP!) rather than a lack of Datalog implementation excitement.

For what it's worth, none of what I'm saying is meant to rebut your high-level point that there is little novelty left in implementing raw Datalog engines. Of course I agree; the research space has moved far beyond that (arguably it did a while ago) and into more exotic problems involving things like streaming (HydroFlow), choice (Dusa), things that get closer to the general chase (e.g., Egglog's chase engine), etc. I don't think anyone disagrees that vanilla Datalog is boring; it's just that monotonic, forward-chaining saturation (Horn clauses!) is a rich baseline with a well-understood engineering landscape (esp. in the high-performance space) on which to build out more interesting theories (semirings, Z-sets, etc.).

BartjeD•7mo ago
For Europeans, the USA is a hostile environment to visit. Avoiding it is a common theme all over western and northern Europe.

If you were expecting European attendance, that would explain why it's different from the past.

It's the same in science circles, tourism and even business travel.

freilanzer•7mo ago
As a European, I'm definitely avoiding the US. I was asked whether I wanted to visit our US branch and I declined - which was simply accepted and probably expected in advance. Almost everyone I know does the same.
kmicinski•7mo ago
I completely agree with you, but this event was held in October 2024--Trump's horrendous behavior wasn't a factor there. There are other reasons folks internationally probably don't want to travel to Texas aside from Trump--but in this specific case I think it might have more to do with the fact that LPNMR is a traditionally European conference randomly held in the US.
burakemir•7mo ago
I made some progress porting mangle datalog to Rust https://github.com/google/mangle/tree/main/rust - it is in the same repo as the golang implementation.

It is slow going, partly because it is not a priority and partly because I suffer from second-system syndrome. Mangle Rust should deal with data of any size by reading and writing facts to disk via memory mapping. The golang implementation is in-memory.

This post is nice because it parses Datalog and mentions the LSM tree, and it is much easier to follow than the datafrog stuff.

There are very many Datalog implementations in Rust (ascent, crepe) that use proc-macros. The downside is that they can't handle queries arriving at runtime. For the static analysis use case, where queries/programs are fixed, the proc-macro approach might be better.
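
To give a flavour of the proc-macro style, a transitive-closure program in ascent looks roughly like this (a minimal sketch along the lines of ascent's documented examples; exact details may vary by version):

    use ascent::ascent;

    ascent! {
        // Relations are declared up front; rules use the <-- arrow.
        relation edge(i32, i32);
        relation path(i32, i32);

        path(x, y) <-- edge(x, y);
        path(x, z) <-- edge(x, y), path(y, z);
    }

    fn main() {
        // The macro generates an AscentProgram struct; seed it with facts and saturate.
        let mut prog = AscentProgram::default();
        prog.edge = vec![(1, 2), (2, 3), (3, 4)];
        prog.run();
        println!("{:?}", prog.path);
    }

Everything here is fixed at compile time, which is exactly why queries that only arrive at runtime don't fit this model.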

banana_feather•7mo ago
I like the author's datalog work generally, but I really wish his introductory material did not teach using binary join, which I found to get very messy internally as soon as you get away from the ideal case. I found the generic join style methods to be much, much simpler to generalize in one's head (see https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).
davery22•7mo ago
related: McSherry's preceding blog post was all about demonstrating how binary joins can achieve worst-case optimal runtime, given suitable adjustments to the query plan.

- https://github.com/frankmcsherry/blog/blob/master/posts/2025...

kmicinski•7mo ago
For materialization-heavy workloads (program analysis, etc.), we often find that optimized binary join plans (e.g., profile-optimized, hand-optimized, etc.) beat worst-case optimal plans due to the ability to get better scalability (less locking) without the need to use a trie-based representation. Within the space of worst-case optimal plans there are still lots of choices, but a bad worst-case optimal plan can often beat a bad (randomly chosen) binary plan. And of course (the whole point of this exercise), there are some queries where every binary plan explodes and you do need WCOJ. There's also some work on making more traditional binary joins robust (https://db.in.tum.de/people/sites/birler/papers/diamond.pdf), among other interesting work (https://arxiv.org/html/2502.15181v1). Effectively parallelizing WCOJs is still an open problem as far as I am aware (at least, this is what folks working on it tell me), but I believe there are some exciting potential directions for tackling it that several folks are working on.
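
For concreteness, "binary join" here just means joining two relations at a time; a multiway query becomes a chain of such steps, and the plan is the order of that chain. A minimal, purely illustrative hash-join step in Rust (not the blog's or any engine's actual code):

    use std::collections::HashMap;

    // One binary join step: R(a, b) joined with S(b, c) on the shared column b.
    // A multiway query is answered by chaining steps like this one.
    fn hash_join(r: &[(u32, u32)], s: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
        // Build phase: index R by its join key.
        let mut by_b: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(a, b) in r {
            by_b.entry(b).or_default().push(a);
        }
        // Probe phase: stream S and emit every matching (a, b, c).
        let mut out = Vec::new();
        for &(b, c) in s {
            if let Some(lefts) = by_b.get(&b) {
                for &a in lefts {
                    out.push((a, b, c));
                }
            }
        }
        out
    }

The trouble cases are exactly the ones where an intermediate result of the chain blows up even though the final output is small, which is what worst-case optimal joins are designed to avoid.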
rnikander•7mo ago
Some Clojure fans once told me they thought datalog was better than SQL and it was a shame that the relational DBs all used SQL. I never dug into it enough to find out why they thought that way.
jitl•7mo ago
I struggle to understand the Clojure/Datomic dialect, but I agree generally. I recommend Percival for playing around with Datalog in a friendly notebook environment online: https://percival.ink/

Although there’s no “ANSI SQL”-equivalent standard across Datalog implementations, once you get the hang of the core idea it’s not too hard to understand another Datalog.

I started a Percival fork that compiles the Datalog to SQLite, if you want to check out how the two can express the same thing: https://percival.jake.tl/ (unfinished when it comes to aggregates and more advanced joins but the basic forms work okay). Logica is a much more serious / complete Datalog->SQL compiler written by a Google researcher that compiles to BigTable, DuckDB, and a few other SQL dialects (https://logica.dev/).

One area Datalog is an order of magnitude easier is when working with recursive queries / rules; this is possible in SQL but feels a bit like drinking playdough through a straw. Frank’s Materialize.com has a “WITH MUTUALLY RECURSIVE” SQL form (https://materialize.com/blog/recursion-in-materialize/) that’s much nicer than the ancient ANSI SQL recursive approach, we’re evaluating it for page load queries & data sync at Notion.

Feldera has a similar form for recursive views as well (https://www.feldera.com/blog/recursive-sql-queries-in-felder...). I like that Feldera lets you make each “rule” or subview its own statement rather than needing to pack everything into a single huge statement. The main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite, whereas the Materialize SQL dialect tries very hard to be PostgreSQL compatible.

ben_pfaff•7mo ago
> Main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite

At Feldera, we're adding features to our SQL over time, by contributing them upstream to Calcite, making it better for everyone. Mihai Budiu, who is the author of the Feldera SQL compiler, is a Calcite committer.

jitl•7mo ago
Thanks for contributing. I see Mihai implemented the UUID type in Calcite (https://issues.apache.org/jira/browse/CALCITE-6738) back in January which is one of the issues I hit, so for sure my experience with Feldera is 6 months out of date and y'all move pretty quick.

Most of what I mean is places where Feldera/Calcite has slightly different syntax from Postgres for things. For example, the Postgres syntax for a cast to bigint is `some_expression::bigint`; although Postgres also supports ANSI SQL `CAST(some_expression AS bigint)`, most examples I find in the wild and in my own Postgres SQL use the Postgres-specific syntax. JSON syntax also differs; Feldera uses its own pretty elegant `VARIANT` type and `some_expression[key_expression]` to access properties, where Postgres calls this `json` or `jsonb` and uses `some_expression->key_expression` to access properties. In those cases it's not like Feldera is wrong or lacks some support, but it's a bit harder to work with for me because I'm so used to Postgres syntax and I need to do some regex replacing whenever I bring a query from Postgres over to Feldera.

Definitely not a deal-breaker, I am a Feldera enjoyer, but it does add some friction.

lsuresh•7mo ago
Thanks for the kind words. :) We hear you on the dialect differences.

An interesting case of a user dealing with this problem: they use LLMs to mass-migrate SparkSQL code over to Feldera (it's often JSON-related constructs, as you also ran into). They then verify that both their original warehouse and Feldera compute the same results for the same inputs to ensure correctness.

kragen•7mo ago
Basically Datalog is much less verbose than SQL, imposes much lighter tariffs on factoring out views, and supports transitive closure enormously better. I started http://canonical.org/~kragen/binary-relations off with a simple nonrecursive query for which the SQL translation (given below) is already criminal and whose properly factored SQL solution merits the death penalty.

Recent additions to ANSI SQL have added the capacity for recursion, so it's no longer completely impossible. But they have three big disadvantages:

1. They accidentally made SQL Turing-complete. Datalog queries, by contrast, are guaranteed to terminate.

2. They're still extremely clumsy to use.

3. Because of #1, they're often not implemented fully, so they are hard to rely on.

ulrikrasmussen•7mo ago
Yes, #1 basically means that they screwed up the design from the get go, since it is impossible to reap the actual benefits of Datalog when the language you evaluate is not, in fact, Datalog. Recursive queries have the ability to perform arbitrary computation in projections, so for starters any top-down evaluation strategy or hybrid evaluation such as magic sets is ruled out.
twoodfin•7mo ago
Curious: How would you usefully & naturally add recursion to SQL without making it Turing-complete?
kragen•7mo ago
At first blush that sounds impossible. Maybe there's a clever solution that would occur to someone if they spent a few months working on it, or maybe not.
lsuresh•7mo ago
Not sure either. We added recursion to SQL in Feldera and it's Turing-complete: https://www.feldera.com/blog/recursive-sql-queries-in-felder...
akavel•7mo ago
I had some light contact with Prolog long ago during my studies - I have a rough idea of how it is used and what it can be useful for, but only on the surface, not deep at all. Since then I keep hearing about Datalog as some amazing thing, but I don't seem able to understand what it is - i.e. to grasp an answer to a simple question:

what is it that Datalog improves over Prolog?

Just now I tried to skim the Wikipedia page on Datalog; the vague theory I'm getting from it is that maybe Prolog has relatively poor performance, whereas Datalog dramatically improves performance (presumably allowing much bigger datasets and much more parallelized processing), at the cost of reducing expressiveness and features in some other important ways (including making it no longer Turing-complete)? Is that what it's about, or am I completely missing the mark?

codesnik•7mo ago
From what I know, Prolog looked declarative, in the sense that you just encode relations and it figures out the answers, but it really depended on the order of those rules, and on additional instructions like "cut" which not only prevented wasted computation but could affect the results.

Datalog, on the other hand, is more or less a relational DB with a different syntax.

johnnyjeans•7mo ago
Datalog is simpler, not Turing-complete, and IIRC uses forward chaining, which has knock-on effects on its performance and memory characteristics. Huge search spaces that are trivial in Prolog are impossible to represent in Datalog because it eats too much memory.
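
Concretely, forward chaining means bottom-up saturation: start from the base facts and keep applying the rules until nothing new can be derived. A toy sketch of that loop for the classic edge/path program (purely illustrative; real engines use semi-naive evaluation and indexes):

    use std::collections::HashSet;

    // Naive bottom-up evaluation of:
    //   path(x, y) :- edge(x, y).
    //   path(x, z) :- edge(x, y), path(y, z).
    // Every derivable fact gets materialized, which is where the memory cost comes from.
    fn saturate(edges: &HashSet<(u32, u32)>) -> HashSet<(u32, u32)> {
        let mut path: HashSet<(u32, u32)> = edges.clone();
        loop {
            let mut fresh = Vec::new();
            for &(x, y) in edges {
                for &(y2, z) in &path {
                    if y == y2 && !path.contains(&(x, z)) {
                        fresh.push((x, z));
                    }
                }
            }
            if fresh.is_empty() {
                break; // fixpoint: no rule application produces a new fact
            }
            path.extend(fresh);
        }
        path
    }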

Datalog is a commuter car with a CVT. Prolog is an F1 car. Basically, it's not about improvement. It's about lobotomizing Prolog into something people won't blow their legs off with. Something that's also much easier to implement and embed in another application (though Prologs can be very easy to embed.)

If you're used to Prolog, you'll mostly just find Datalog to be claustrophobic. No call/3? No term/goal expansion? Datalog is basically designed to pull out the LCD featureset of Prolog for use as an interactive database search.

> Prolog has relatively poor performance, whereas Datalog dramatically improves performance

It's easier to write fast Datalog code, but the ceiling is also way lower. Prolog can be written in a way that allows for concurrency, but that's an intermediate-level task that requires understanding of your implementation. Guarded Horn Clauses and their derived languages[2] were developed to formalize some of that, but the Japanese advancements over Prolog are extremely esoteric. Prolog performance really depends on the programmer, the implementation being used, and where it's being used. Prolog, like a Lisp, can be used to generate native machine code from a DSL at compile time.

If you understand how the underlying implementation of your Prolog works, and how to write code with the grain of your implementation, it's absolutely "fast enough". Unfortunately, that requires years of writing Prolog code against a single implementation. There's a lot of work on optimizing Prolog compilers[3][4] out there, as well as some proprietary examples[5].

[1] - http://logicprogramming.stanford.edu/readings/ullman.pdf

[2] - https://www.ueda.info.waseda.ac.jp/AITEC_ICOT_ARCHIVES/ICOT/...

[3] - https://www.sciencedirect.com/science/article/pii/S074310669...

[4] - https://link.springer.com/content/pdf/10.1007/3-540-18024-9_...

[5] - https://sicstus.sics.se/

Xeoncross•7mo ago
If you enjoyed the state machine + parsing part, I also recommend the older "Lexical Scanning in Go" talk that Rob Pike gave: https://www.youtube.com/watch?v=HxaD_trXwRE

It's in Go, but you can apply most of it easily in other languages.

I'm so happy that modern languages like Rust, Zig, and Go support unicode/runes/graphemes natively. So many problems just disappear compared to Java, .NET, C++, or scripting languages.
