https://github.com/lsferreira42/nadb
It is a disk-based KV store with tags for search.
Can you guys please explain this to me like I am 5 (or maybe 10)? Is this something revolutionary to keep in the back of my mind? How does it compare to Redis? When should I use it, if ever? I always prefer SQLite first, then PostgreSQL if I need scalability, and after that I am not sure, but maybe things like ClickHouse. I am also looking more into DuckDB, though not as a primary database, just for fun. There are also things like Turso and Cloudflare D1 (if I remember correctly); I kind of prefer Cloudflare D1, but I also like Turso and SQLite in general. Still, the database space really piques my interest.
Thanks in advance for helping this young fellow out!
So can I say that this is just a toy project created by the author to learn about DB/storage engines, and I should just use SQLite in prod, right?
Riak (the inspiration for this project) and similar projects came out at a time when software engineers were exploring how to make disk writes fast, potentially even faster than reads, for practical applications. Tradeoffs made to reach this goal include enforcing that all writes be sequential ("log-structured" in Riak, Kafka, and Cassandra parlance) and embracing the model of "eventual consistency".
Eventual consistency is similar to how orders are processed at a cafe or fast-food restaurant. The cashier takes the order and passes it on to the barista or chef; we'll just say "kitchen". The kitchen might not know your order at that moment, but it's right there nearby (the equivalent in our case: in a RAM buffer, ready for a disk write). Once the kitchen has finished the orders ahead of yours (the sync interval is reached), it makes your order and delivers it to the counter (the data actually gets written to disk, or "committed" in DB talk).
The key point in this analogy is that the cashier station (system front end UI) doesn't wait around until your order gets made before taking other orders. It assumes all is well and your order will be served by the kitchen "soon enough".
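To make the cafe analogy a bit more concrete, here is a rough Python sketch of a write-behind buffer. `BufferedStore` and its methods are made-up names for illustration, not the actual API of nadb or Riak, and the JSON file is just the simplest thing that works. The only point is that `set()` returns before anything is on disk, and `flush()` plays the kitchen.

```python
import json
import os


class BufferedStore:
    """Toy write-behind store (hypothetical): set() is instant, flush() hits disk."""

    def __init__(self, path):
        self.path = path
        self.pending = {}  # orders the cashier has taken but the kitchen has not made

    def set(self, key, value):
        # Acknowledge right away; nothing touches the disk here.
        self.pending[key] = value

    def get(self, key):
        # Serve from the buffer first, then fall back to what is on disk.
        if key in self.pending:
            return self.pending[key]
        return self._read_disk().get(key)

    def flush(self):
        # The kitchen catches up: merge pending writes into the file.
        data = self._read_disk()
        data.update(self.pending)
        with open(self.path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to the device
        self.pending.clear()

    def _read_disk(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)
```

Anything still sitting in `pending` when the process dies is simply gone, which is exactly the tradeoff being described: faster acknowledgement in exchange for a small window of possible loss.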
When might these tradeoffs make sense for production systems? Answer: not all data is created equal. For example, if your system stores a steady stream of GPS coordinates from package delivery trucks so customers can know when a truck is near their house, it doesn't actually matter if one or two of the coordinates are not immediately available (or even get lost). The same can go for backend system telemetry showing CPU or RAM utilization. The trend is the main thing, and it doesn't really matter whether, at any given instant, the dashboard chart is missing the last 3 readings because they have yet to be written to disk. In cases like these, "ACID" (traditional DB term) guarantees are not only unnecessary, they get in the way of proper system design and implementation.
append only writes meant less complexity. loaded keys into memory on startup, mapped offsets, done. didn't need range queries or indexes, just fast put/get. wrote a simple merge script to compact old segments. performance was solid and startup time didn’t degrade as data grew.
biggest learning was how bitcask avoided cleverness. no tricks, no layered abstractions. it was just clean storage logic with a clear mental model. still think about it when touching newer engines that try to do too much
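For anyone who wants to see the shape of that idea, here is a minimal Python sketch of an append-only log with an in-memory key-to-offset map and a naive compaction step. `TinyCask`, the record layout, and the `.compact` file name are all assumptions made up for this example; real Bitcask records also carry a CRC and a timestamp, and it uses hint files and tombstones, none of which are shown here.

```python
import os
import struct

# Assumed record layout: 4-byte key length, 4-byte value length, key, value.
HEADER = struct.Struct(">II")


class TinyCask:
    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (offset of value in file, value size)
        open(path, "ab").close()  # create the file if it does not exist
        self._load()

    def _load(self):
        # Startup scan: read headers and keys, skip over the values.
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                klen, vlen = HEADER.unpack(header)
                key = f.read(klen)
                self.keydir[key] = (f.tell(), vlen)  # later records win
                f.seek(vlen, os.SEEK_CUR)

    def put(self, key, value):
        # Always append; old versions of the key stay in the file as garbage.
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            f.write(HEADER.pack(len(key), len(value)) + key)
            offset = f.tell()
            f.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def compact(self):
        # Copy only the live value of each key into a fresh file, then swap.
        tmp = self.path + ".compact"
        if os.path.exists(tmp):
            os.remove(tmp)
        fresh = TinyCask(tmp)
        for key in list(self.keydir):
            fresh.put(key, self.get(key))
        os.replace(tmp, self.path)
        self.keydir = fresh.keydir
```

Note that this sketch rescans the whole file on startup, so its load time does grow with the data; it is only meant to show the put/get/compact model, not to match the startup behaviour described above.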
mukesh610•10h ago
A sync process syncs the open disk files once every config.syncInterval. Sync can also be done on every request if config.alwaysFsync is True.
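Roughly, those two settings could look like the Python sketch below. This is not the project's code: `SyncedLog`, `sync_interval`, `always_fsync`, and the `syncs` counter are hypothetical names loosely mirroring `config.syncInterval` and `config.alwaysFsync`, and the background thread is just one plausible way to do a periodic sync.

```python
import os
import threading


class SyncedLog:
    """Hypothetical append-only file with periodic or per-write fsync."""

    def __init__(self, path, sync_interval=1.0, always_fsync=False):
        self.f = open(path, "ab")
        self.always_fsync = always_fsync
        self.lock = threading.Lock()
        self.stop = threading.Event()
        self.syncs = 0  # counts fsync calls, only here to make the behaviour visible
        self.thread = threading.Thread(
            target=self._sync_loop, args=(sync_interval,), daemon=True
        )
        self.thread.start()

    def _sync(self):
        with self.lock:
            self.f.flush()             # Python buffer -> OS page cache
            os.fsync(self.f.fileno())  # OS page cache -> storage device
            self.syncs += 1

    def _sync_loop(self, interval):
        # Event.wait returns False on timeout and True once close() sets the event.
        while not self.stop.wait(interval):
            self._sync()

    def append(self, data):
        with self.lock:
            self.f.write(data)
        if self.always_fsync:
            self._sync()  # durable before returning, at the cost of latency

    def close(self):
        self.stop.set()
        self.thread.join()
        self._sync()
        self.f.close()
```

With `always_fsync=False`, a crash can lose up to one interval's worth of writes; with `always_fsync=True`, every `append()` pays for a full fsync before returning.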