Datalog in miniKanren

https://deosjr.github.io/dynamicland/datalog.html

134•deosjr•8mo ago

Comments

deosjr•8mo ago

Seems like interest in Datalog is high this week, so I thought I'd share a write-up of a minimal Datalog implementation I did a while ago.

Runs in the browser using Hoot (https://spritely.institute/hoot/) which compiles Guile Scheme to WebAssembly.

davexunit•8mo ago

Wow this might be the coolest use of Hoot I've seen! I need to run this for myself soon.

deosjr•8mo ago

Thanks Dave, high praise! I was inspired after seeing you all take over the declarative & minimalist programming room at FOSDEM this year.

If you thought this was cool, wait until you see what I ended up using it for: https://deosjr.github.io/dynamicland/ I personally think this is much cooler :) But it needs some more explaining before I can broadly share, I think.

Now that I have you here, a question: am I correct in thinking that in Hoot, eval in the browser does not currently work with macros?

davexunit•8mo ago

I'm glad you felt inspired! This Dynamicland implementation looks awesome. I look forward to this being shared to a wider audience. :)

Regarding your question, as of Hoot 0.6.1 we now have a psyntax-based macro expander integrated with eval so you can use syntax-rules and syntax-case. There are still rough edges, though. I'm currently focused on some non-Hoot tasks but the next Hoot priority is to implement a Guile-like REPL and really kick the tires on the interpreter before the 0.7.0 release.

fithisux•8mo ago

What scheme is this?

deosjr•8mo ago

Guile Scheme. See https://github.com/deosjr/deosjr.github.io/blob/master/dynam... for more.

upghost•8mo ago

Datalog is a syntactic subset of Prolog[1], which this is... not.

I think the most misunderstood thing about Prolog (and Datalog, the functor-free subset of pure Prolog) is that the syntax is really, really important.

It's like, the whole gimmick of the language. It is designed to efficiently and elegantly query and transform itself. If you lose the syntax you lose all of intermediate and advanced Prolog (and Datalog).

[1]: https://en.m.wikipedia.org/wiki/Datalog

kragen•8mo ago

Semantics are more important than syntax. Prolog's flexible syntax is a nice-to-have rather than essential when you're in Lisp. And Datalog is purely first-order, so the advanced Prolog you're talking about doesn't exist in it.

However, syntax does matter, and this is not acceptable

    (dl-find
     (fresh-vars 1
      (lambda (?id)
       (dl-findo dl
        ((,?id reachable ,?id))))))

as a way to ask

    reachable(Id, Id).

I think you could, however, write a bit more Scheme and be able to ask

    (?id reachable ?id)

which would be acceptable.

However, the ordering betrays a deeper semantic difference with orthodox Datalog, which is about distinct N-ary relations, like a relational database, not binary relations. This implementation seems to be specific to binary relations, so it's not really Datalog for reasons that go beyond mere syntax.

On the other hand, this (from the initial goal) would be perfectly fine:

    (dl-rule! dl (reachable ,?x ,?y) :- 
                     (edge ,?x ,?z) (reachable ,?z ,?y))

The orthodox Datalog syntax is:

    reachable(X, Y) :- edge(X, Z), reachable(Z, Y).

jitl•8mo ago

Shouldn’t lisp macros make it easy to present such a nice syntax? Perhaps the author could easily implement that bit, if not the wide rows. Or is that the point you’re making?

There is a dl-rule here: https://github.com/deosjr/deosjr.github.io/blob/15b5f7e02153...

kragen•8mo ago

I don't think you need Lisp macros for it; you could use just a regular Lisp function. I don't think the standard R5RS macros are powerful enough to grovel over the query expression to make a list of the free variables, but then, standard Scheme also doesn't have records. I think Guile has a procedural macro system that you could use, but I don't think it would be a good idea.
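A minimal sketch of the "regular function" approach, assuming query variables are plain symbols whose names start with `?` and that the query arrives as quoted data. The names `query-var?` and `query-vars` are hypothetical and not part of the linked implementation.

```scheme
;; Hypothetical helper: treat any symbol whose name starts with "?"
;; as a query variable.
(define (query-var? x)
  (and (symbol? x)
       (let ((name (symbol->string x)))
         (and (> (string-length name) 0)
              (char=? (string-ref name 0) #\?)))))

;; Walk a quoted query and return its distinct variables,
;; in order of first appearance.
(define (query-vars query)
  (let walk ((x query) (seen '()))
    (cond ((query-var? x)
           (if (memq x seen) seen (append seen (list x))))
          ((pair? x) (walk (cdr x) (walk (car x) seen)))
          (else seen))))

(query-vars '(?id reachable ?id))               ; => (?id)
(query-vars '((?x edge ?z) (?z reachable ?y)))  ; => (?x ?z ?y)
```

With the variable list in hand, an ordinary function could create the fresh logic variables itself, so the caller would only need to write the quoted query.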

Yes, I think the semantic divergence is more fundamental. Triple stores and graph databases and binary relations are awesome, but they aren't what Datalog is.

deosjr•8mo ago

Thank you for the feedback! I agree with all of the above.

Should've probably been a bit more clear on the dl-find syntax; I find it just as unacceptable as you do. It is the result of laziness: my intended use of this minimal Datalog does not include any querying whatsoever but abuses fixpoint analysis for side-effects (see https://github.com/deosjr/deosjr.github.io/blob/master/dynam... which I intend to go over in a future post). I initially had it working like you described but butchered it for the above and haven't repaired it yet (see https://github.com/deosjr/whistle?tab=readme-ov-file#datalog). This version relied on some monstrous eval-hacking using a homebrew Lisp, which I've mostly cleaned up now in this version (https://github.com/deosjr/whistle/blob/main/datalog/datalog.... is a crime, for example).

The semantics are indeed limited to binary relations atm, which I agree is the main thing that disqualifies this as a proper Datalog. iirc the tutorial on Datalog that I based this implementation on only handled triples as well so I stopped there, but extending to N-ary relations is on my list to look into for sure.

kragen•8mo ago

This sounds very interesting! I'll have to take a look.

I am always worried about posting comments like mine because often people get defensive when I try to engage, as I see it, on substance. Responses like yours make it all worthwhile!

deosjr•8mo ago

I appreciate it; this kind of exchange is exactly why I read HackerNews. If you have any good sources on extending Datalog to N-ary relations, I'd love to know. Just had a look at the implementation I based mine on and it exclusively talks about triples: https://www.instantdb.com/essays/datalogjs

Coming from Prolog I'd like to get closer to the original if possible :)

thesz•8mo ago

They use triples because triples can represent any n-tuple fact.

E.g., if you have a fact id=(a,b,c,d), you can record triples (id, 1, a), (id, 2, b), (id, 3, c) and (id, 4, d) and reconstruct the original fact.

Look at it as columnar storage in databases.

Then, if your query only needs the third value of 4-tuple facts, you can fetch only those, ignoring the first, second and fourth values. This is what columnar storage engines do.
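As a rough sketch of that encoding, assuming facts are stored as plain `(id position value)` lists with positions counted from 1. The names `fact->triples`, `triple-ref` and `triples->fact` are made up for illustration.

```scheme
;; Encode an n-ary fact as (id position value) triples.
(define (fact->triples id values)
  (let loop ((vs values) (i 1) (acc '()))
    (if (null? vs)
        (reverse acc)
        (loop (cdr vs) (+ i 1) (cons (list id i (car vs)) acc)))))

;; Column lookup: the value at one position of one fact, or #f if absent.
(define (triple-ref triples id pos)
  (let loop ((ts triples))
    (cond ((null? ts) #f)
          ((and (equal? (car (car ts)) id)
                (equal? (cadr (car ts)) pos))
           (caddr (car ts)))
          (else (loop (cdr ts))))))

;; Rebuild the original tuple by reading positions 1..arity in order.
(define (triples->fact triples id arity)
  (let loop ((i arity) (acc '()))
    (if (= i 0)
        acc
        (loop (- i 1) (cons (triple-ref triples id i) acc)))))

(define store (fact->triples 'id '(a b c d)))
;; store => ((id 1 a) (id 2 b) (id 3 c) (id 4 d))
(triple-ref store 'id 3)     ; => c
(triples->fact store 'id 4)  ; => (a b c d)
```

Here `triple-ref` touches only the triples for one position, which is the column-style access described above; a real engine would index by position instead of scanning a list.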

In fact, I read that one of the most efficient Datalog engines uses relational query execution under the hood.

Take a look here: https://github.com/philzook58/awesome-egraphs

The paper you'll most probably find interesting is "Better Together: Unifying Datalog and Equality Saturation," but there are many other interesting things there.

deosjr•8mo ago

Cheers, this is super useful. I will have to do some reading. Being able to build up n-ary predicates using triples that way makes a lot of sense.

kragen•8mo ago

Datalog always supports N-ary relations! It's not an extension.

The Wikipedia article recommends https://search.worldcat.org/title/30546436 "Foundations of Databases" by Abiteboul, Hull, and Vianu, from 01995, and https://archive.org/details/logicdatabases0000symp/page/n5/m... "Logic and Data Bases [sic]" by Gallaire and Minker from 01978. Some poking at Google Scholar also turns up, in rough order of how promising they look (without having read them that I can recall):

https://dl.acm.org/doi/pdf/10.1145/6012.15399 "Magic Sets and Other Strange Ways to Implement Logic Programs", Bancilhon, Maier, Sagiv, & Ullman (yes, that Ullman), 01985 (15 pp.)

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... "What You Always Wanted to Know About Datalog (And Never Dared to Ask)", Ceri, Gottlob, and Tanca, 01989 (21 pp.)

https://web.cecs.pdx.edu/harry/earley/datalog.pdf "Optimizations to Earley Deduction for DATALOG Programs", Porter, 01985 (12 pp.)

https://dl.acm.org/doi/pdf/10.1145/308386.308420 "Optimizing Existential Datalog Queries", Ramakrishnan, Beeri, & Krishnamurthy, 01988 (14 pp.)

https://dl.acm.org/doi/pdf/10.1145/298514.298542 "On the Expressive Power of Datalog: Tools and a Case Study", Kolaitis & Vardi, 01990

https://dl.acm.org/doi/pdf/10.1145/93605.98724 "A Framework for the Parallel Processing of Datalog Queries", Ganguly, Silberschatz, & Tsur, 01990

https://deepblue.lib.umich.edu/bitstream/handle/2027.42/3116... "Datalog vs First-Order Logic", Ajtai & Gurevich, 01993

https://lat.inf.tu-dresden.de/teaching/ss2014/Seminar/Papers... "Equivalence of Datalog Queries is Undecidable", Shmueli, 01993

It also turned up "Portability of Syntax and Semantics in Datalog" which turned out to be an unrelated NLP AI system called Datalog.

Bancilhon, Maier, Sagiv, & Ullman give as their reference for Datalog "Maier and Warren [1985]", which turns out to be "D. Maier and D. S. Warren [1985]. Introduction to Logic Programming, unpublished memorandum, Oregon Graduate Center," which I can't find a copy of easily. But given that Maier is a shared author we can probably trust their summary of what Datalog is.

Ceri, Gottlob, and Tanca reference "[120], [15], [16]," which are respectively:

J. D. Ullman, “Implementation of logic query languages for databases,” ACM Trans. Database Syst., vol. 10, no. 3, 1985

F. Bancilhon and R. Ramakrishnan, “An amateur’s introduction to recursive query processing,” in Proc. ACM-SIGMOD Conf., May 1986.

-, “Performance evaluation of data intensive logic programs,” in Foundations of Deductive Databases and Logic Programming, J. Minker, Ed. Washington, DC, 1986.

The Ullman paper is https://dl.acm.org/doi/pdf/10.1145/3979.3980, "Implementation of Logical Query Languages for Databases", ACM Transactions on Database Systems, Vol. 10, No. 3, September 1985, Pages 289-321 (33 pp.). Ceri, Gottlob, and Tanca screwed up the title. https://dl.acm.org/doi/10.1145/971699.320000 probably isn't it; the journal name, volume number, and issue number don't match, although it's the right author and year. That seems to be the oldest published Datalog paper, although the word "Datalog" hadn't been invented yet and doesn't appear in the paper.

I think I'm going to read the Bancilhon, Maier, Sagiv, & Ullman paper first, because it's shorter and has a more readable-sounding title, and then maybe Ceri, Gottlob, and Tanca, and then maybe Ullman, and then maybe a relevant chapter or two of Gallaire and Minker.

j-pb•8mo ago

Most database literature simply uses Datalog to mean the query language fragment of conjunctive queries + recursion/fixpoint-iteration and potentially stratified negation.

Yes, it started out as a Prolog subset, but the definition in terms of the fragments it supports has become much more prevalent, mainly to contrast it with non-recursive fragments with arbitrary negation (e.g. SQL).

This usage dates back to database literature of the 80s by Ullman et al.
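To make "conjunctive queries + fixpoint iteration" concrete, here is a naive bottom-up sketch for the reachable rule quoted upthread. The edge data, the base case `reachable(X, Y) :- edge(X, Y).` and all function names are assumptions for illustration; this is not how the linked implementation works, and real engines use semi-naive evaluation rather than recomputing everything each round.

```scheme
;; Assumed example data; each edge is a (from . to) pair.
(define edges '((a . b) (b . c) (c . a) (c . d)))

;; Add to `set` every element of `items` that is not already in it.
(define (add-new items set)
  (cond ((null? items) set)
        ((member (car items) set) (add-new (cdr items) set))
        (else (add-new (cdr items) (cons (car items) set)))))

;; One round of rule application over the facts derived so far:
;;   reachable(X, Y) :- edge(X, Y).                  (assumed base case)
;;   reachable(X, Y) :- edge(X, Z), reachable(Z, Y).
(define (step facts)
  (let loop ((es edges) (acc (add-new edges facts)))
    (if (null? es)
        acc
        (let* ((x (car (car es)))
               (z (cdr (car es)))
               ;; Join edge(x, z) with every known reachable(z, y).
               (derived
                (let collect ((rs facts) (out '()))
                  (cond ((null? rs) out)
                        ((eq? (car (car rs)) z)
                         (collect (cdr rs)
                                  (cons (cons x (cdr (car rs))) out)))
                        (else (collect (cdr rs) out))))))
          (loop (cdr es) (add-new derived acc))))))

;; Repeat until a round adds no new facts (the least fixpoint).
(define (fixpoint facts)
  (let ((next (step facts)))
    (if (= (length next) (length facts))
        facts
        (fixpoint next))))

;; The query reachable(Id, Id): nodes that can reach themselves.
(define (self-reachable facts)
  (cond ((null? facts) '())
        ((eq? (car (car facts)) (cdr (car facts)))
         (cons (car (car facts)) (self-reachable (cdr facts))))
        (else (self-reachable (cdr facts)))))

(define reachable (fixpoint '()))
(length reachable)                   ; => 12
(length (self-reachable reachable))  ; => 3, the nodes a, b and c on the cycle
```

Termination follows from the fact set only growing and being bounded by the finite number of node pairs, which is the property that separates Datalog from full Prolog.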
