So, I set out to create something myself: a thin NAPI layer around LMDB (http://www.lmdb.tech/doc/index.html). LMDB is cool because it's very fast and allows multiple processes to do lockless read-only transactions in parallel. And it's very much proven tech.
Eventually, the thin NAPI layer grew to be a little less thin, because I wanted to support simultaneous read/write transactions as well. The main trade-off that LMDB makes, though, is that it allows only a single read/write transaction at a time. This is how I've solved it:
1. I've introduced application-level 'logical' transactions. A single JavaScript process can have many logical transactions running simultaneously (say, one for each async request handler function); see the usage sketch after this list.
2. Reads within a logical transaction are handled by a read-only LMDB transaction/snapshot. Logical transactions within the same process that are started shortly after one another can share a single LMDB transaction.
3. Because reads in LMDB are very fast, they're synchronous function calls. They return `ArrayBuffer` objects that point at the actual on-disk data memory-mapped by LMDB: zero-copy!
4. Writes are initially stored in an in-memory buffer attached to the transaction. Subsequent reads within that transaction search the (indexed) buffer first, so that uncommitted updates can be read back within the transaction itself (there's a read-path sketch after this list).
5. We also keep track of all reads done within the transaction, along with a checksum of the values that were read.
6. Commits for read-only transactions don't do anything except release some resources, and they happen synchronously.
7. Commits for read/write transactions are more involved:
- A socket connection is set up to the 'commit worker' daemon. This daemon is started automatically if it isn't running yet for a particular database, and it stops automatically when unused.
- The client hands control of the transaction details and buffers (which live in a shared memory segment) over to the daemon for processing.
- The daemon starts an LMDB read/write transaction, within which it can process a large number of logical transactions. This massively improves throughput, as the sync() and top-level block rewrites can be amortized over many logical transactions.
- For each logical transaction, the daemon verifies that the results of all the transaction's reads haven't changed since they were initially performed. If they have, that indicates a race condition, which causes the library to rerun the logical transaction function (the daemon's batch loop is sketched below).
- If the logical transaction was not raced, its updates are applied within the LMDB read/write transaction, and later committed together with the rest of the batch.
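To make the API shape concrete, here's a minimal sketch of what using a logical transaction could look like. The `transact`, `get`, and `put` names are hypothetical, for illustration only, and not necessarily OLMDB's actual surface:

```typescript
import { transact } from "olmdb"; // hypothetical import, for illustration

// The callback runs as one logical transaction. Reads are synchronous and
// zero-copy; writes go into the transaction's in-memory buffer. If
// commit-time validation detects a race, the library reruns the callback.
const newBalance = await transact((txn) => {
  const raw = txn.get("balance:alice"); // ArrayBuffer view into the memory map
  const balance = raw ? Number(Buffer.from(raw).toString()) : 0;
  txn.put("balance:alice", Buffer.from(String(balance + 10)));
  return balance + 10;
});
// The await resolves once the commit daemon has durably committed the batch.
```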
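Here's also a sketch of what the read path inside such a logical transaction might look like, assuming (purely for illustration) that the write buffer is a simple map keyed by string and that the checksum is a trivial rolling hash; the real indexed buffer and checksum are presumably more involved:

```typescript
// Stand-in for whatever checksum the library actually uses.
function checksum(value: ArrayBuffer | undefined): number {
  let h = 0;
  for (const byte of new Uint8Array(value ?? new ArrayBuffer(0))) {
    h = (h * 31 + byte) | 0;
  }
  return h;
}

class LogicalTransaction {
  private writeBuffer = new Map<string, ArrayBuffer>(); // uncommitted writes
  private readSet: { key: string; checksum: number }[] = []; // for validation

  constructor(private snapshot: { get(key: string): ArrayBuffer | undefined }) {}

  get(key: string): ArrayBuffer | undefined {
    // This transaction's own uncommitted writes win over the LMDB snapshot.
    const buffered = this.writeBuffer.get(key);
    if (buffered !== undefined) return buffered;
    const value = this.snapshot.get(key);
    // Record the read so the commit daemon can detect races later.
    this.readSet.push({ key, checksum: checksum(value) });
    return value;
  }

  put(key: string, value: ArrayBuffer): void {
    this.writeBuffer.set(key, value);
  }
}
```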
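And finally a conceptual sketch of the daemon's batch loop, again with made-up types; the real daemon pulls the transaction details out of the shared memory segment rather than out of plain objects:

```typescript
interface WriteTxn {
  get(key: string): ArrayBuffer | undefined;
  put(key: string, value: ArrayBuffer): void;
  commit(): void; // triggers LMDB's top-level block rewrites and sync()
}

interface LogicalTxnRecord {
  reads: { key: string; checksum: number }[]; // read set from the client
  writes: { key: string; value: ArrayBuffer }[]; // buffered updates
}

// Must match the checksum the client used (same stand-in as the sketch above).
function checksum(value: ArrayBuffer | undefined): number {
  let h = 0;
  for (const b of new Uint8Array(value ?? new ArrayBuffer(0))) h = (h * 31 + b) | 0;
  return h;
}

// Process many logical transactions inside one LMDB read/write transaction,
// amortizing the commit cost over the whole batch.
function processBatch(txn: WriteTxn, batch: LogicalTxnRecord[]): boolean[] {
  const committed: boolean[] = [];
  for (const lt of batch) {
    // Optimistic validation: reread every key in the read set; a checksum
    // mismatch means another transaction raced this one.
    const raced = lt.reads.some((r) => checksum(txn.get(r.key)) !== r.checksum);
    if (raced) {
      committed.push(false); // the client reruns the logical transaction
      continue;
    }
    for (const w of lt.writes) txn.put(w.key, w.value);
    committed.push(true);
  }
  txn.commit(); // one sync() covers the whole batch
  return committed;
}
```

Note that rereading within the daemon's write transaction also validates against earlier logical transactions in the same batch, so the batch as a whole stays serializable.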
Phew... so that turned out a little more, uh, interesting than the thin NAPI layer I set out to do. I'm calling it OLMDB, for Optimistic LMDB!