In-process (aka embedded/embeddable) databases are not new. In fact, SQLite is the most widely deployed database in the world. However, starting with DuckDB, a new set of in-process database systems has emerged, such as Kuzu and Lance. As a co-developer of Kuzu, I hear several frequently asked questions about in-process databases, some of which rest on misconceptions:
- What are their advantages/disadvantages compared to client-server databases?
- Does in-process mean databases are in-memory/ephemeral? (NO!)
- Can in-process databases handle only small amounts of data? (NO!)
- What are some common use cases of in-process databases?
- What if my application needs a server?
I tried to answer some of these questions in a blog post, with pointers to other resources that cover several of these points in more detail than I do.
I hope it's helpful to clarify some of these questions and help developers position in-process DBMSs against client-server ones.
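To make the "in-process does not mean in-memory" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module, since SQLite is the in-process database most people already have installed. The table, the rows, and the function name `write_then_reopen` are made up for illustration; this is not Kuzu's API, only the same idea shown with a different engine.

```python
import os
import sqlite3
import tempfile


def write_then_reopen(db_path: str) -> list[tuple[int, str]]:
    """Write rows through one connection, close it, then read them back through a new one."""
    # First session: the engine runs inside this process, but the data is stored in a file on disk.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],  # hypothetical sample data
    )
    conn.commit()
    conn.close()

    # Second session: a fresh connection still sees the committed rows.
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_reopen(os.path.join(tmp, "demo.db")))
```

Nothing here talks to a server: the only thing that outlives the first connection is the file, and that is enough for the data to persist.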
emmanueloga_•2mo ago
The article suggests running Kuzu in a FastAPI frontend for network access. A caveat: production Python servers like Uvicorn [1] typically spawn multiple worker processes.
A simple workaround is serving HTTP through a single-process language like Go or JavaScript, since Kuzu has bindings for both. Other processes could access the database directly in read-only mode for analysis [2].
For better DX, the ideal would be Kuzu implementing Neo4j's Bolt protocol directly in the binary, handling single-writer and multi-reader coordination internally. Simpler alternative: port the code from [3] to C++ and add a `kuzu --server` option.
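A rough sketch of the single-writer, read-only-readers split described above. It uses Python's built-in `sqlite3` as a stand-in rather than Kuzu's own API, because SQLite's read-only URI mode is easy to try anywhere. The function names and the `city` table are made up for illustration, and the read-only mode that [2] describes for Kuzu may behave differently in its details.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_writer(db_path: str) -> sqlite3.Connection:
    """The one connection that is allowed to modify the database."""
    return sqlite3.connect(db_path)


def open_reader(db_path: str) -> sqlite3.Connection:
    """A read-only connection; SQLite rejects any write attempted through it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo(db_path: str) -> tuple[list[tuple[str]], bool]:
    # The writer creates and populates the database, then commits.
    writer = open_writer(db_path)
    writer.execute("CREATE TABLE city (name TEXT)")
    writer.execute("INSERT INTO city VALUES ('Waterloo')")
    writer.commit()

    # A reader (in practice, usually a different process) can query the committed data...
    reader = open_reader(db_path)
    rows = reader.execute("SELECT name FROM city").fetchall()

    # ...but cannot modify it.
    try:
        reader.execute("INSERT INTO city VALUES ('Toronto')")
        write_rejected = False
    except sqlite3.OperationalError:
        write_rejected = True

    reader.close()
    writer.close()
    return rows, write_rejected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "demo.db")))
```

In a multi-worker web server, each worker process would call something like `open_reader`, and only one dedicated process would ever call `open_writer`.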
Yes, this makes sense, and we plan to eventually do something along the lines of what you are suggesting. We also plan to have a built-in server/GUI, where users can launch a web-based explorer directly from our CLI by typing "kuzudb -ui".
--
1: https://fastapi.tiangolo.com/deployment/server-workers/#mult...
2: https://docs.kuzudb.com/concurrency/#scenario-2-multiple-pro...
3: https://github.com/kuzudb/explorer/tree/master/src/server