Bloom Filters by Example

https://llimllib.github.io/bloomfilter-tutorial/

92•ibobev•5h ago

Comments

marginalia_nu•3h ago

I have a trick I like:

For sets that are plausibly sometimes going to be small where you're going to do a lot of membership checks, you can speculatively add a 64 bit bloom filter with a trivial hash function.

This sounds really stupid, but the cost of doing this is so small you can do it as a gamble. If it doesn't work out you've added like 10ns to your insertions and membership checks, but when it does work out, you can save an incredible amount of work.
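
A minimal sketch of the idea, assuming integer keys and using the low 6 bits of the key as the "trivial hash". The `FilteredSet` name and its methods are made up for illustration; in Python the inner set lookup is already cheap, so this only shows the shape of the trick, which pays off when the real lookup is expensive.

```python
class FilteredSet:
    """A set fronted by a 64-bit, single-hash Bloom filter (hypothetical helper)."""

    def __init__(self):
        self._mask = 0        # the whole 64-bit filter lives in one integer
        self._items = set()   # the real, authoritative set

    @staticmethod
    def _bit(key: int) -> int:
        # Trivial hash (an assumption): the low 6 bits pick one of 64 bits.
        return 1 << (key & 63)

    def add(self, key: int) -> None:
        self._mask |= self._bit(key)
        self._items.add(key)

    def __contains__(self, key: int) -> bool:
        # A clear bit means the key was never added: skip the real lookup.
        if not (self._mask & self._bit(key)):
            return False
        # A set bit only means "maybe": fall through to the real set.
        return key in self._items


s = FilteredSet()
for k in (3, 70, 1000):
    s.add(k)

print(3 in s)    # True
print(4 in s)    # False, rejected by the filter alone
print(67 in s)   # False, but 67 & 63 == 3 so the set had to be consulted
```

With only a few elements most of the 64 bits stay clear, so most negative lookups never touch the set. Once the set grows, the mask saturates and every check falls through, which is the "didn't work out" case that costs a couple of extra instructions.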

Sesse__•3h ago

Chromium does this in a bunch of places; the article only links to Safe Browsing using murmur, but the renderer (Blink) generally uses rapidhash and has some of these micro-filters which it uses for e.g.:

  - querySelector() in certain cases
  - Prefiltering hash lookups in CSS buckets
  - Rapid reject of elements when looking for certain Aria attributes (for accessibility)

It's surprising that such tiny filters (32 or 64 bits) work at all, but they often do. There are also some larger Bloom filters around.

(I added some of these)

marginalia_nu•3h ago

They just have a really unintuitive economy where they basically only need to work once or twice to make up for the cost of all the times they don't contribute any benefit.

alienbaby•3h ago

This article is aimed squarely at people like me. I'd heard of them. I kept meaning to look them up every time I saw them mentioned. I finally did when I saw your article, and it was the perfect intro that I was looking for :)

konsalexee•2h ago

Another Bloom filter post I really appreciated, from Eli Bendersky, if anyone wants to read more: https://eli.thegreenplace.net/2025/bloom-filters/

256bit•2h ago

Another visualisation of Bloom filters can be found at the end of this page: https://www.chrislaux.com/hashtable.html

verytrivial•1h ago

The overlap in concepts required to understand Bloom filters, sets and hash tables is about 95% IMHO. A set is a hash table used for membership tests where you only care about the key, not the value. And a Bloom filter is just a set that exploits the fact that many-to-one hashing 'compresses' the key-space with collisions. It deliberately uses a very collide-y hash function. If a specific key was ever hashed, you WILL get a hit, but there might be other keys that produced the same hash. It's a feature, not a bug.

marginalia_nu•1h ago

If you've grokked bloom filters, you're very close to also understanding both random projection and certain implementations of locality-sensitive hashes.

cherrycherry98•1h ago

Glad to know I'm not alone in my mental modeling of Bloom filters as just hash tables that only track the buckets which have data but not the actual data itself.

anon-3988•52m ago

I have a specific use case where I know from startup the list of words that I want to find and this will not change for the duration of the program. Can anyone think of a low latency solution to this? I have tried a lot of variations of bloom filter, perfect hash, linear lookup, binary search, set search etc

It appears that perfect hash is the one that works the best for my use case.

jerf•31m ago

Does saying you can use a perfect hash also imply that you know you will only ever look up those values? If so, then yes, the name is accurate and it is probably a very good choice.

But if you put things into the perfect hash function it is not expecting, some fraction of them will collide.

If you're searching for a fixed set, look at Ragel. It generates the search at compile time in a way that is very hard to beat.
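
To illustrate the collision point, here is a rough sketch of a brute-forced perfect hash over a fixed word list. Everything in it is an assumption for illustration: the function names, the choice of BLAKE2b as the hash, and the seed search, which is only practical for small lists. The key detail is the final comparison in `contains`: an unexpected word still lands in some slot, so the stored key has to be checked.

```python
import hashlib
from itertools import count


def _slot(seed: int, word: str, size: int) -> int:
    # Seeded hash (BLAKE2b is an arbitrary choice here) reduced to a slot index.
    data = seed.to_bytes(8, "little") + word.encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


def build_perfect_table(words):
    """Try seeds until every word lands in its own slot (small lists only)."""
    words = list(dict.fromkeys(words))  # drop duplicates, keep order
    size = len(words)
    for seed in count():
        table = [None] * size
        for w in words:
            i = _slot(seed, w, size)
            if table[i] is not None:
                break  # collision under this seed, try the next one
            table[i] = w
        else:
            return seed, table


def contains(seed, table, word) -> bool:
    # Unknown words also hash to a slot, so compare against the stored key.
    return bool(table) and table[_slot(seed, word, len(table))] == word


seed, table = build_perfect_table(["alpha", "beta", "gamma", "delta", "epsilon"])
print(contains(seed, table, "gamma"))   # True
print(contains(seed, table, "omega"))   # False, caught by the key comparison
```

If the inputs are guaranteed to come from the fixed set, the comparison can be dropped and the slot index used directly, which is where a perfect hash is at its cheapest.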

b0a04gl•49m ago

I got into Bloom filters while debugging Cassandra read spikes. There were a lot of SSTable lookups even when the key didn't exist, which didn't make sense at first. Then I realised the Bloom filter on each SSTable is meant to let reads skip the disk, but the default false-positive rate was high, like 0.1 or so, which was too much for our case. Most reads were cache misses anyway, so those false positives were killing us. I changed it to 0.01: it consumed a bit more memory but caused far fewer useless reads, and it brought p99 read latency down by a good 16-18%.
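
For a sense of the memory trade-off, here is a quick calculation using the textbook sizing formulas for an optimally configured Bloom filter: about -ln(p) / (ln 2)^2 bits per key and -log2(p) hash functions for a target false-positive rate p. This is a sketch of the standard math only; how Cassandra actually sizes its filters is not something this snippet claims to reproduce.

```python
import math


def bits_per_key(fp_rate: float) -> float:
    # Textbook optimum: m/n = -ln(p) / (ln 2)^2
    return -math.log(fp_rate) / (math.log(2) ** 2)


def optimal_hashes(fp_rate: float) -> float:
    # Textbook optimum: k = (m/n) * ln 2 = -log2(p)
    return -math.log2(fp_rate)


for p in (0.1, 0.01):
    print(f"{p}: {bits_per_key(p):.2f} bits/key, {optimal_hashes(p):.2f} hashes")
# 0.1: 4.79 bits/key, 3.32 hashes
# 0.01: 9.59 bits/key, 6.64 hashes
```

Under these formulas, going from 0.1 to 0.01 roughly doubles the filter size while cutting false positives by a factor of ten.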

costco•16m ago

I had used Bloom filters in the past without really understanding how they worked. Then one day I decided to implement them just going off the Wikipedia article with the 32-bit MurmurHash function and was surprised at how simple it was. If you're using C++ you can use std::vector<bool> (or std::bitset, if the size is fixed at compile time) to make it even easier to store the bits in a space-efficient way.
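
To show how little code is involved, here is a minimal sketch in Python rather than C++. It is not the implementation described above: it swaps MurmurHash for BLAKE2b from the standard library and derives the k bit positions from two 64-bit halves of one digest (double hashing). The class and method names are made up for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, item: str):
        # One 128-bit digest split into two 64-bit values, then combined
        # as h1 + i*h2 to simulate num_hashes independent hash functions.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force h2 odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=7)
for word in ("apple", "banana", "cherry"):
    bf.add(word)

print(bf.might_contain("banana"))  # True: added items are never missed
```

Lookups for items that were never added usually return False, but can return True at a rate that depends on the number of bits, hashes, and inserted items.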
