It appears that perfect hash is the one that works the best for my use case.
But if you put things into the perfect hash function it is not expecting, some fraction of them will collide.
If you're searching for a fixed set, look at the Ragel library. Compile-time generation of the search in a way that is very hard to beat.
No, 99.999999% of the time the values is NOT what I am looking for. So the error rate is EXTREMELY high, so its very important that the failure path is quick.
I probably could have made $100,000,000 if I had made a different choice there.
I wrote a hashcash implementation on a GPU 10 years ago. Pretty sure it's valueless now.
I'm quite sure I wrote one of the first GPU optimized bioinformatics toolkits ... out of curiosity, on a GeForce 8, using CUDA v1(!) back in 2009.
Then went on to do other things ... missed the big bux.
After being aware of bloom filters for a while it was quite satisfying to organically find a real use for one that was such an improvement.
All three concepts are key in understanding operations of complex distributed systems.
But yes, those are pretty important too.
It will demonstrate why the answer is always "maybe it's in the set!" instead of "yes".
Both collide with: fnv: 7 murmur: 12
> When you add a string, you can see that the bits at the index given by the hashes are set to 1. I've used the color green to show the newly added ones
because all I saw was the content text of "Your set:" changing, no bits, no colors. It took me trying a different browser and then scrolling up and down to realize that it's the visualization two paragraphs above that was changing.
There is a only bloom filter calculator that lets you just mess arpund with parameters
This one compares bloom and coockoo filters, so it adds a little more information.
marginalia_nu•7mo ago
For sets that are plausibly sometimes going to be small where you're going to do a lot of membership checks, you can speculatively add a 64 bit bloom filter with a trivial hash function.
This sounds really stupid, but the cost of doing this is so small you can do it as a gamble. If it doesn't work out you've added like 10ns to your insertions and membership checks, but when it does work out, you can save an incredible amount of work.
Sesse__•7mo ago
(I added some of these)
marginalia_nu•7mo ago
Sesse__•7mo ago