Filedb: Disk-based key-value store inspired by Bitcask

https://github.com/rajivharlalka/filedb
98•todsacerdoti•15h ago

Comments

wallstop•14h ago
This looks interesting. Maybe I'm not in the know, but why would you offload such important aspects as `sync` to the client instead of building in some protocol to ensure that file integrity is maintained? With this kind of design choice, it seems quite easy to lose data, unless I'm missing something.
mukesh610•10h ago
From the README:

A sync process syncs the open disk files once every config.syncInterval. Sync also can be done on every request if config.alwaysFsync is True.
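To make the two modes in that README excerpt concrete, here is a minimal sketch in Python. It is not filedb's code: the class `SyncedLog` and its parameters `always_fsync` and `sync_interval` are hypothetical names that mirror `config.alwaysFsync` and `config.syncInterval`, and the interval is checked on each write rather than by a separate sync process, purely to keep the example short.

```python
import os
import tempfile
import time


class SyncedLog:
    """Append-only log with two durability modes (hypothetical sketch)."""

    def __init__(self, path, always_fsync=False, sync_interval=1.0):
        self.f = open(path, "ab")
        self.always_fsync = always_fsync
        self.sync_interval = sync_interval  # seconds between periodic syncs
        self.last_sync = time.monotonic()
        self.fsync_count = 0  # counted only so the demo can observe behavior

    def _fsync(self):
        self.f.flush()             # move Python's buffer into the OS page cache
        os.fsync(self.f.fileno())  # ask the OS to write the file to the device
        self.last_sync = time.monotonic()
        self.fsync_count += 1

    def append(self, record: bytes):
        self.f.write(record)
        if self.always_fsync:
            self._fsync()  # durable before returning, slower per write
        elif time.monotonic() - self.last_sync >= self.sync_interval:
            self._fsync()  # periodic: writes since the last sync are at risk

    def close(self):
        self._fsync()
        self.f.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        safe = SyncedLog(os.path.join(d, "safe.log"), always_fsync=True)
        lazy = SyncedLog(os.path.join(d, "lazy.log"), sync_interval=3600)
        for _ in range(3):
            safe.append(b"x")
            lazy.append(b"x")
        counts = (safe.fsync_count, lazy.fsync_count)
        safe.close()
        lazy.close()
        return counts
```

In this sketch the always-fsync log syncs once per write, while the interval-based log has not synced at all by the end of the loop. Any record written between two syncs is what a crash could lose, which is the trade-off raised in the parent comment.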

im_down_w_otp•13h ago
Bitcask, now there's a blast from the Basho past. It always bugged me that no good secondary indexing strategy was built to make using Bitcask viable for more use cases. Everyone always wanted to use the LevelDB backend just to get at secondary indexing features (whose performance also scaled inversely with cluster size, which was its own problem). But getting Riak to exhibit consistent high performance was waaaaaaaay easier on Bitcask.
lsferreira42•7h ago
This is something that I sometimes play with:

https://github.com/lsferreira42/nadb

It is a disk-based KV store with tags for search.

Imustaskforhelp•5h ago
Sorry, maybe I am not in the mood to delve too deep into the project (but I starred it! Amazing job, I suppose), and I don't want to ask AI but rather some experts who are surely lurking on HN.

Can you guys please explain this to me like I am 5 (or maybe 10)? Is this something revolutionary to keep in the back of my mind? How does it compare to Redis? When should I use it, if ever? I always prefer SQLite, then PostgreSQL if I need scalability, and after that I am not sure, but maybe things like ClickHouse. I am also looking more into DuckDB, though not as a primary database, just for fun. There are also things like Turso and Cloudflare D1 (if I remember correctly); I kinda prefer Cloudflare D1 but also like Turso, or SQLite in general. Still, the database space really piques my interest.

Thanks in advance for helping this young fellow out!

packetlost•5h ago
Implementing Bitcask is sorta like a rite of passage for people interested in DBs/storage engines. You shouldn't use this in production. SQLite is most likely more flexible, reliable, and ubiquitous for situations where this project would be useful.
Imustaskforhelp•4h ago
Gotcha! Thanks a lot mate!

So can I say that this is just a toy project created by the author to learn about DB/storage engines, and I should just use SQLite in prod, right?

ezekiel68•2h ago
I disagree with the other reply indicating something like this should not be used in production. For most of the history of practical disk IO, it was observed and assumed that disk reads would be relatively much faster than disk writes. It turns out that this assumption was based on other assumptions, such as that most reading and writing would be handled as "random IO" where a physical disk head accessing an actual spinning disk might need to move around at any given time to read or to update some data.

Riak (whose Bitcask backend inspired this project) and other projects came out at a time when software engineers were exploring how to make disk writes fast and potentially even faster than reads for practical applications. Some tradeoffs to achieve this goal could be enforcing all writes to be sequential ("log-structured" in Riak, Kafka, and Cassandra parlance) and embracing the model of "eventual consistency".

Eventual consistency is similar to how orders are processed at a cafe or fast-food restaurant. The cashier takes the order and passes it on to the barista or chef - we'll just say "kitchen". The kitchen might not know your order at that moment but it's right there nearby (equivalent in our case: in a RAM buffer ready for disk write). Once the kitchen has finished other orders ahead of yours (the sync interval is reached), it makes your order and delivers it to the counter (the data gets actually written to disk -- "committed" in DB talk).

The key point in this analogy is that the cashier station (system front end UI) doesn't wait around until your order gets made before taking other orders. It assumes all is well and your order will be served by the kitchen "soon enough".

When might these tradeoffs make sense for production systems? Answer: not all data is created equal. For example, if your system stores a steady stream of GPS coordinates from package delivery trucks so customers can know when a truck is near their house, it doesn't actually matter if one or two of the coordinates are not immediately available (or even get lost). The same can go for backend system telemetry showing CPU or RAM utilization. The trend is the main thing, and it's not actually important at a particular real-time instant whether the dashboard chart shows the last 3 readings (since they have yet to be finally written to disk). In cases like these, "ACID" (traditional DB term) guarantees are not only not required, they get in the way of proper system design and implementation.

b0a04gl•3h ago
used bitcask during undergrad for a systems course project. task was to build a minimal key value store with durability and fast writes. no frameworks allowed. tried leveldb first but spent too much time tuning compaction. switched to bitcask after reading the original Riak paper and it just worked.

append only writes meant less complexity. loaded keys into memory on startup, mapped offsets, done. didn't need range queries or indexes, just fast put/get. wrote a simple merge script to compact old segments. performance was solid and startup time didn’t degrade as data grew.

biggest learning was how bitcask avoided cleverness. no tricks, no layered abstractions. it was just clean storage logic with a clear mental model. still think about it when touching newer engines that try to do too much
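The design described in the comment above (append-only writes, an in-memory map from keys to file offsets rebuilt on startup, and a merge step that compacts old data) can be sketched roughly as follows. This is an illustrative toy, not filedb's or Bitcask's actual on-disk format: `MiniCask`, `HEADER`, and the record layout are assumptions made for the example, it uses a single file instead of rotating segments, and it omits CRCs, timestamps, tombstones, and crash recovery.

```python
import os
import struct
import tempfile

# Hypothetical record layout: key length, value length, key bytes, value bytes.
HEADER = struct.Struct(">II")


class MiniCask:
    def __init__(self, path):
        self.path = path
        self.keydir = {}          # key -> (offset of value, length of value)
        open(path, "ab").close()  # create the log file if it is missing
        self._load()
        self.f = open(path, "r+b")

    def _load(self):
        """Rebuild the keydir by scanning the log; later records win."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                klen, vlen = HEADER.unpack(header)
                key = f.read(klen)
                self.keydir[key] = (f.tell(), vlen)
                f.seek(vlen, os.SEEK_CUR)  # skip the value, keep only its offset

    def put(self, key: bytes, value: bytes):
        self.f.seek(0, os.SEEK_END)  # writes only ever go to the end of the log
        self.f.write(HEADER.pack(len(key), len(value)) + key)
        offset = self.f.tell()
        self.f.write(value)
        self.f.flush()
        self.keydir[key] = (offset, len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, vlen = entry
        self.f.seek(offset)  # one seek and one read per lookup
        return self.f.read(vlen)

    def merge(self):
        """Compact: copy only the live records to a new file, then swap it in."""
        live = {k: self.get(k) for k in self.keydir}
        self.f.close()
        tmp = self.path + ".merge"
        with open(tmp, "wb") as out:
            for k, v in live.items():
                out.write(HEADER.pack(len(k), len(v)) + k + v)
        os.replace(tmp, self.path)
        self.keydir.clear()
        self._load()
        self.f = open(self.path, "r+b")

    def close(self):
        self.f.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.log")
        db = MiniCask(path)
        db.put(b"a", b"1")
        db.put(b"a", b"2")  # an overwrite appends a second record for "a"
        db.put(b"b", b"3")
        size_before = os.path.getsize(path)
        db.merge()          # drops the stale record for "a"
        size_after = os.path.getsize(path)
        result = (db.get(b"a"), db.get(b"b"), size_after < size_before)
        db.close()
        reopened = MiniCask(path)  # keydir is rebuilt from the file on startup
        result += (reopened.get(b"a"),)
        reopened.close()
        return result
```

Calling `demo()` overwrites one key, compacts the log, and reopens the store to show that the latest value survives both steps. The whole model is a dictionary plus a file that only grows until a merge, which is the simplicity the comment points to; the cost is that every key must fit in memory.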
