Hahaha. (Seems like the bloom filter library isn't configured with a maximum false-positive rate and/or set to auto-expand.)
Edit: Actually there's a BloomFalsePositive setting, maybe it never gets used? Also maybe it's not a library and it's a custom implementation.
The author wrote this as a learning exercise and is sharing the process.
That's what Cassandra does iirc
> This is the kind of bug you only find by building the thing and measuring it.
No? I mean, maybe if you're vibecoding it's the only way, but in the prehistoric days you could reason about what code would do before you ran it.
[Obviously, I've made my own silly mistakes over the years, many much sillier than this; it's just weird to describe this one as only detectable by profiling.]
  bool getSchemaSizes(size_t *expectedBatchSize, size_t *expectedEntriesPerBlock) { ... }
  size_t expectedEntriesPerBlock, expectedBatchSize;
  getSchemaSizes(&expectedEntriesPerBlock, &expectedBatchSize);
  initBloomFilter(expectedEntriesPerBlock);
I'm sure you've never made a silly mistake where you passed the wrong integer parameter to a function, stared at your screen, and failed to notice it. Or forgotten the order of arguments to calloc().
If you're saying that profiling is for those too lazy to reason about their code, you're distorting the whole lesson: profiling is more powerful than guessing.
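For what it's worth, a swapped-argument bug like the one in the snippet above is something a type checker can flag, as long as the two sizes are not both bare size_t. A minimal sketch in C, assuming hypothetical wrapper types BatchSize and EntriesPerBlock and stand-in functions with made-up constants (none of this is taken from the project under discussion):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper types: same representation, but distinct to the compiler. */
typedef struct { size_t value; } BatchSize;
typedef struct { size_t value; } EntriesPerBlock;

/* Stand-in for getSchemaSizes; the constants are invented for illustration. */
bool get_schema_sizes(BatchSize *batch, EntriesPerBlock *entries) {
    batch->value = 100;
    entries->value = 100000;
    return true;
}

/* Stand-in for initBloomFilter; it just reports the capacity it was asked for. */
size_t init_bloom_filter(EntriesPerBlock expected) {
    return expected.value;
}

size_t setup_filter(void) {
    BatchSize batch;
    EntriesPerBlock entries;
    if (!get_schema_sizes(&batch, &entries)) {
        return 0;
    }
    /* init_bloom_filter(batch) would not compile: the struct types are
     * incompatible. Swapping the two pointers passed to get_schema_sizes
     * is a pointer-type mismatch that compilers diagnose (at least as a
     * warning). */
    return init_bloom_filter(entries);
}
```

This doesn't replace measuring. It only moves one class of mistake from "found in the profiler" to "found by the compiler".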
I'm very amused by this obviously AI-generated "benchmark program": https://github.com/AasheeshLikePanner/lsm-tree-go/blob/main/...
Immediate tell that this was written by AI. Another thing I've noticed lately - AI's overuse of "every":
> Every batch of writes called `file.Write` on the write-ahead log.
> Every read was scanning entire SSTable files.
> Every bit is set.
> Every value matches.
Me: so you have an in-memory cache, right?
Them: yes!
Me: what is the TTL?
Them: Oh, it's not set, oops. Here, let's set it to 1 minute. Hey look, the performance went way up!
Me: okay, great. When you say 1 minute, do you mean 60 seconds?
Them: uh...wait...uh....oh, the unit is seconds. Wait, why is the performance so good with a 1 second TTL?
Me: What's your load test?
Them: We crank 1M TPS fetching the same 30 items over and over.
Me: ....
I totally agree about the power of profiling but profiling without understanding would not have helped this team.
I also don't think the author wrote much of their codebase, or much of their blog post, but that's the brave new world we're living in.
But even just thinking about it for half a second from a balls and bins perspective, 100k items into 100 binary bins is obviously gonna saturate.
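The balls-and-bins estimate can be written out. This is the standard bloom filter approximation, with m bits, n inserted items, and k hash functions; m = 100 and n = 100,000 come from the comment above, and k = 1 is an assumption picked to give the filter its best case:

```latex
\Pr[\text{a given bit is still } 0]
  = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m},
\qquad
\text{FPR} \approx \left(1 - e^{-kn/m}\right)^{k}

% m = 100,\ n = 100000,\ k = 1:
e^{-kn/m} = e^{-1000} \approx 10^{-434},
\qquad
\text{FPR} \approx 1 - 10^{-434} \approx 1
```

So essentially every bit is set and the filter answers "maybe present" for every key. A larger k only fills it faster.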
jmalicki•1h ago
Sure, it's faster to never write to disk, but then you reboot and you've lost data.
/dev/null is a webscale database that is even faster!
FarmerPotato•52m ago
jmalicki•47m ago
You don't write to the WAL on a batch.
> the author tested correctness after a crash.
You mean the LLM?
bawolff•34m ago
teraflop•31m ago
It seems to me that neither the old nor the new version of the code is really "durable" as I would understand the word. The old version made a write syscall per batch, but the post doesn't say it also did an fsync per batch. The new version writes data to an mmap'ed file, and calls fsync in the background.
So both versions are "durable" in the sense that written data is preserved even if the process gets killed, because it's in the OS page cache. But in both versions, a write can be completed before the data actually makes it to disk, so a power failure will lose acknowledged writes.
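To make the distinction concrete, here is a rough sketch in C of the two acknowledgement points, using only POSIX open/write/fsync/close. The function names, the sync_each flag, and the one-record-per-call layout are hypothetical and are not the project's WAL code:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. On success the data is in the OS
 * page cache: it survives the process being killed, not a power failure. */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical WAL append. With sync_each != 0 the call returns only
 * after fsync, so a caller that acknowledges afterwards has asked the
 * kernel to push the record to stable storage first. With
 * sync_each == 0 it returns as soon as the page cache has the record. */
int wal_append(const char *path, const char *record, size_t len, int sync_each) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    int rc = write_all(fd, record, len);
    if (rc == 0 && sync_each) rc = fsync(fd);
    if (close(fd) != 0) rc = -1;
    return rc;
}
```

A real log would keep the file descriptor open and batch several records per fsync. A newly created file also needs its parent directory synced before the file itself is guaranteed to survive a crash.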
Retr0id•31m ago
https://github.com/facebook/rocksdb/wiki/WAL-Performance#non...
jmalicki•18m ago
Fsync is often used when the data doesn't truly need to be on disk, because there aren't very good write ordering APIs exposed, even if that's all you truly need.