Datalog in Rust

https://github.com/frankmcsherry/blog/blob/master/posts/2025-06-03.md

224•brson•11h ago

Comments

Leynos•10h ago

It's funny seeing this as the top story.

I'm in the middle of putting together a realtime strategy game using Differential Datalog[1] and Rust, with DDL managing the game's logic. Mostly as an excuse to expose myself to new ideas and engage in a whole lot of yak shaving.

[1] https://github.com/vmware-archive/differential-datalog

Yoric•10h ago

Oh, nice!

I'll be interested in reading how this goes!

cmrdporcupine•9h ago

Very cool, I'm curious to see what the state of that implementation is and how far you get, since DDLog is not being actively maintained anymore.

rienbdj•10h ago

A new McSherry post! Excellent

Last I checked, VMWare had moved away from differential datalog?

jitl•10h ago

The Differential Datalog team founded Feldera: https://www.feldera.com/

They switched from differential Datalog to differential SQL, I think because they realized Datalog is a really tough sell.

rebanevapustus•9h ago

They did, and their product is great.

It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

I have made an accessible version of a subset of Differential Dataflow (DBSP) in Python right here: https://github.com/brurucy/pydbsp

DBSP is so expressive that I have implemented a fully incremental dynamic datalog engine as a DBSP program.

Think of SQL/Datalog where the query can change at runtime, and the changes themselves (program diffs) are incrementally computed: https://github.com/brurucy/pydbsp/blob/master/notebooks/data...
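For readers unfamiliar with the model, here is a minimal sketch of the Z-set idea that DBSP builds on. It is not pydbsp's or Feldera's API: the `ZSet` alias and the `add` and `filter` helpers are hypothetical names made up for illustration. Each record carries an integer weight (+1 for an insertion, -1 for a retraction), and a linear operator such as a filter can be applied to a change alone to get the change in its output.

```rust
use std::collections::HashMap;

/// A Z-set: every record carries an integer weight.
/// +1 means "inserted once", -1 means "retracted once".
/// (Hypothetical alias for illustration, not a real library type.)
type ZSet = HashMap<String, i64>;

/// Adds two Z-sets pointwise and drops records whose weights cancel to zero.
fn add(a: &ZSet, b: &ZSet) -> ZSet {
    let mut out = a.clone();
    for (record, weight) in b {
        *out.entry(record.clone()).or_insert(0) += *weight;
    }
    out.retain(|_, weight| *weight != 0);
    out
}

/// A linear operator: keeps the records that satisfy `keep`, weights unchanged.
/// Because it is linear, filter(a + d) == filter(a) + filter(d), so the
/// operator can be run on a delta `d` instead of on the full input.
fn filter(z: &ZSet, keep: impl Fn(&str) -> bool) -> ZSet {
    z.iter()
        .filter(|(record, _)| keep(record.as_str()))
        .map(|(record, weight)| (record.clone(), *weight))
        .collect()
}
```

Filtering `add(&base, &delta)` gives the same Z-set as adding `filter(&base, ..)` and `filter(&delta, ..)`, including when `delta` contains retractions. Joins and recursion need more machinery than this sketch shows.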

gunnarmorling•6h ago

> It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

Flink SQL also checks that box.

rebanevapustus•6h ago

Not true.

There has to be some change in the code, and they will not share the same semantics (and perhaps won't work when retractions/deletions also appear whilst streaming). And let's not even get to the leaky abstractions for good performance (watermarks et al).

jitl•3h ago

Flink SQL is quite limited compared to Feldera/DBSP or Frank’s Materialize.com, and has some correctness limitations: it’s “eventually consistent” but until you stop the data it’s unlikely to ever be actually correct when working with streaming joins. https://www.scattered-thoughts.net/writing/internal-consiste...

rc00•9h ago

Posted 1 day ago

https://news.ycombinator.com/item?id=44274592

tulio_ribeiro•8h ago

"I, a notorious villain, was invited for what I was half sure was my long-due comeuppance." -- Best opening line of a technical blog post I've read all year.

The narrator's interjections were a great touch. It's rare to see a post that is this technically deep but also so fun to read. The journey through optimizing the aliasing query felt like a detective story. We, the readers, were right there with you, groaning at the 50GB memory usage and cheering when you got it down to 5GB.

Fantastic work, both on the code and the prose.

29athrowaway•7h ago

If you wish to use Datalog and Rust, cozodb is written in Rust and has a Datalog query syntax.

jitl•6h ago

Cozodb seems cool but also inactive. I poked around in roughly November 2024 and found some low-hanging fruit in the sqlite storage backend: https://github.com/cozodb/cozo/issues/285

29athrowaway•2h ago

It's not a lot of code so it's easy to tinker with.

maweki•6h ago

It is nice to see a core group of Datalog enthusiasts persist, even though the current Datalog revival seems to be on the decline. The recent Datalog 2.0 conference was quite small compared to previous years and the second HYTRADBOI conference was very light on Datalog as well, while the first one had a quarter of submissions with Datalog connection.

I'm encouraged by the other commenters sharing their recent Datalog projects. I am currently building a set of data quality pipelines for a legacy SQL database in preparation of a huge software migration.

We find Datalog much more useful than SQL for identifying and looking for data quality issues, as the queries can be incredibly readable when well-structured.

kmicinski•6h ago

No offense, but I wouldn't take Datalog 2.0's small attendance as an exemplar of Datalog's decline, even if I agree with that high-level point. Datalog 2.0 is a satellite workshop of LPNMR, a relatively-unknown European conference that was randomly held in Dallas. I myself attended Datalog 2.0 and also felt the event was relatively sparse. I also had a paper (not my primary work, the first author is the real wizard of course :-) at the workshop. I myself saw relatively few folks in that space even attending that event--with the notable exception of some European folks (e.g., introducing the Nemo solver).

All of this is to say, I think Datalog 2.0's sparse attendance this year may be more indicative of the fact that it is a satellite workshop of an already-lesser-prestigious conference (itself not even the main event! That was ICLP!) rather than a lack of Datalog implementation excitement.

For what it's worth, none of what I'm saying is meant to rebut your high-level point that there is little novelty left in implementing raw Datalog engines. Of course I agree, the research space has moved far beyond that (arguably it did a while ago) and into more exotic problems involving things like streaming (HydroFlow), choice (Dusa), things that get closer to the general chase (e.g., Egglog's chase engine), etc. I don't think anyone disagrees that vanilla Datalog is boring, it's just that monotonic, forward-chaining saturation (Horn clauses!) is a rich baseline with a well-understood engineering landscape (esp in the high-performance space) to build out more interesting theories (semirings, Z-sets, etc.).
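To make "monotonic saturation over Horn clauses" concrete, here is a minimal sketch of semi-naive evaluation for graph reachability. It is an illustration only, not code from the post or from any engine named above; the function name `transitive_closure` and the nested-loop join are simplifying assumptions (a real engine would index `edge`).

```rust
use std::collections::HashSet;

/// Saturates the two Horn clauses
///   path(x, y) :- edge(x, y).
///   path(x, z) :- path(x, y), edge(y, z).
/// with semi-naive evaluation: each round joins only the facts that were
/// new in the previous round (`delta`) against `edge`.
/// (Hypothetical helper for illustration.)
fn transitive_closure(edges: &[(u32, u32)]) -> HashSet<(u32, u32)> {
    // First clause: every edge is a path.
    let mut path: HashSet<(u32, u32)> = edges.iter().copied().collect();
    let mut delta: Vec<(u32, u32)> = path.iter().copied().collect();

    // Second clause, repeated until a round derives nothing new.
    while !delta.is_empty() {
        let mut next = Vec::new();
        for &(x, y) in &delta {
            for &(y2, z) in edges {
                // `insert` returns true only for facts not seen before,
                // so `next` holds exactly this round's new facts.
                if y == y2 && path.insert((x, z)) {
                    next.push((x, z));
                }
            }
        }
        delta = next;
    }
    path
}
```

Facts are only ever added, never removed, so the loop must terminate on a finite graph, even one with cycles.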

burakemir•6h ago

I made some progress porting mangle datalog to Rust https://github.com/google/mangle/tree/main/rust - it is in the same repo as the golang implementation.

It is slow going, partly since it is not a priority, partly because I suffer from second system syndrome. Mangle Rust should deal with any size data through getting and writing facts to disk via memory mapping. The golang implementation is in-memory.

This post is nice because it parses datalog and mentions the LSM tree, and it is much easier to follow than the datafrog stuff.

There are very many datalog implementations in Rust (ascent, crepe) that use proc-macros. The downside is that they won't handle getting queries at runtime. For the static analysis use case where queries/programs are fixed, the proc macro approach might be better.

banana_feather•5h ago

I like the author's datalog work generally, but I really wish his introductory material did not teach using binary join, which I found to get very messy internally as soon as you get away from the ideal case. I found the generic join style methods to be much, much simpler to generalize in one's head (see https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).

davery22•3h ago

related: McSherry's preceding blog post was all about demonstrating how binary joins can achieve worst-case optimal runtime, given suitable adjustments to the query plan.

- https://github.com/frankmcsherry/blog/blob/master/posts/2025...

kmicinski•3h ago

For materialization-heavy workloads (program analysis, etc.), we often find that optimized binary join plans (e.g., profile-optimized, hand-optimized, etc.) beat worst-case optimal plans due to the ability to get better scalability (less locking) without the need to use a trie-based representation. Within the space of worst-case optimal plans, there are still lots of choices: but a bad worst-case optimal plan can often beat a bad (randomly-chosen) binary plan. And of course (the whole point of this exercise), there are some queries where every binary plan explodes and you do need WCOJ. There's also some work on making more traditional binary joins robust (https://db.in.tum.de/people/sites/birler/papers/diamond.pdf), among other interesting work (https://arxiv.org/html/2502.15181v1). Effectively parallelizing WCOJs is still an open problem as far as I am aware (at least, this is what folks working on it tell me), but there are some exciting potential directions in tackling that that several folks are working on I believe.
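The triangle query is the usual example where the order of joining matters. Below is a minimal sketch in the spirit of generic join, binding one variable at a time and intersecting candidate sets by scanning the smaller one. It is not taken from the post or the linked papers; the name `triangles` and the hash-set adjacency index are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Lists all (a, b, c) with edge(a, b), edge(b, c) and edge(a, c).
/// Variables are bound one at a time; the candidates for `c` are the
/// intersection of the out-neighbours of `a` and of `b`, computed by
/// scanning the smaller set and probing the larger one.
/// (Hypothetical helper for illustration.)
fn triangles(edges: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    // Index edge by its first column.
    let mut adj: HashMap<u32, HashSet<u32>> = HashMap::new();
    for &(src, dst) in edges {
        adj.entry(src).or_default().insert(dst);
    }

    let mut out = Vec::new();
    for (&a, from_a) in &adj {
        for &b in from_a {
            if let Some(from_b) = adj.get(&b) {
                let (small, large) = if from_a.len() <= from_b.len() {
                    (from_a, from_b)
                } else {
                    (from_b, from_a)
                };
                for &c in small {
                    if large.contains(&c) {
                        out.push((a, b, c));
                    }
                }
            }
        }
    }
    out.sort_unstable(); // deterministic order, since HashMap iteration is unordered
    out
}
```

A plain binary plan would first materialize every two-hop path (a, b, c) and only then check edge(a, c); the intersection above avoids building that intermediate result.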

rnikander•3h ago

Some Clojure fans once told me they thought datalog was better than SQL and it was a shame that the relational DBs all used SQL. I never dug into it enough to find out why they thought that way.

