Bloom Filters by Example

https://llimllib.github.io/bloomfilter-tutorial/
140•ibobev•8h ago

Comments

marginalia_nu•6h ago
I have a trick I like:

For sets that will plausibly sometimes be small and that see a lot of membership checks, you can speculatively add a 64-bit Bloom filter with a trivial hash function.

This sounds really stupid, but the cost of doing this is so small you can do it as a gamble. If it doesn't work out you've added like 10ns to your insertions and membership checks, but when it does work out, you can save an incredible amount of work.
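A minimal sketch of the trick described above, assuming integer keys and using the low 6 bits of the key as the "trivial hash". The class name `SmallSetWithFilter` and the `skipped` counter are made up for illustration. In Python the built-in set is already cheap, so this only shows the logic; the payoff comes when the real membership check is expensive.

```python
class SmallSetWithFilter:
    """A set guarded by a 64-bit, single-hash Bloom filter (illustrative only)."""

    def __init__(self):
        self.mask = 0        # the 64-bit filter, one bit per hash bucket
        self.items = set()   # stands in for the real, more expensive container
        self.skipped = 0     # lookups answered by the filter alone

    @staticmethod
    def _bit(key: int) -> int:
        # Trivial hash (an assumption): the low 6 bits pick one of 64 bits.
        return 1 << (key & 63)

    def add(self, key: int) -> None:
        self.mask |= self._bit(key)
        self.items.add(key)

    def __contains__(self, key: int) -> bool:
        if not (self.mask & self._bit(key)):
            self.skipped += 1
            return False             # bit not set: definitely absent
        return key in self.items     # bit set: maybe present, do the real check


demo = SmallSetWithFilter()
for k in (3, 17, 42):
    demo.add(k)

# 5 is rejected by the filter; 67 shares bit 3 with key 3, so it falls
# through to the real lookup (a filter false positive).
print(42 in demo, 5 in demo, 67 in demo)  # True False False
```

The filter only pays off while the set stays small: once most of the 64 bits are set, nearly every lookup falls through to the real check, and the cost is the extra mask test.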

Sesse__•6h ago
Chromium does this in a bunch of places; the article only links to Safe Browsing using murmur, but the renderer (Blink) generally uses rapidhash and has some of these micro-filters which it uses for e.g.:

  - querySelector() in certain cases
  - Prefiltering hash lookups in CSS buckets
  - Rapid reject of elements when looking for certain Aria attributes (for accessibility)

It's surprising that such tiny filters (32 or 64 bits) work at all, but they often do. There are also some larger Bloom filters around.

(I added some of these)

marginalia_nu•6h ago
They just have a really unintuitive economy where they basically only need to work once or twice to make up for the cost of all the times they don't contribute any benefit.
Sesse__•1h ago
For extra fun, you sometimes can make ideal filters with no false positives, if you know your possible elements ahead of time and you don't insert too many of them. (E.g., for 20 elements, you can construct a 12-bit code where there are guaranteed no false positives as long as you insert at most two elements.)
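To make the idea concrete, here is a smaller cousin of such a code, not the 20-element, 12-bit one mentioned above: 12 elements in 9 bits, built from the lines of the 3x3 affine plane. Each code sets 3 bits and any two codes share at most one bit, so the OR of two inserted codes can cover at most 2 of the 3 bits of any other code. This construction is an assumption chosen for illustration, and the function names are made up. The brute-force check confirms the property rather than taking it on faith.

```python
from itertools import combinations


def affine_plane_codes() -> list[int]:
    """Return 12 nine-bit codes, one per line of the 3x3 affine plane."""
    lines = []
    for slope in range(3):
        for intercept in range(3):
            lines.append([(x, (slope * x + intercept) % 3) for x in range(3)])
    for c in range(3):  # the three vertical lines x = c
        lines.append([(c, y) for y in range(3)])
    # Point (x, y) maps to bit 3*x + y.
    return [sum(1 << (3 * x + y) for x, y in line) for line in lines]


def has_false_positive(codes: list[int], max_inserted: int) -> bool:
    """Brute force: can an element that was not inserted look present?"""
    for r in range(1, max_inserted + 1):
        for inserted in combinations(range(len(codes)), r):
            filt = 0
            for i in inserted:
                filt |= codes[i]
            for j, code in enumerate(codes):
                if j not in inserted and (code & filt) == code:
                    return True
    return False


codes = affine_plane_codes()
# 12 codes; exact with up to two insertions, no longer exact with three.
print(len(codes), has_false_positive(codes, 2), has_false_positive(codes, 3))
# 12 False True
```

With 9 bits, a one-bit-per-element encoding could only represent 9 elements, so the code buys three extra elements at the price of limiting insertions to two.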
alienbaby•6h ago
This article is aimed squarely at people like me. I'd heard of them. I kept meaning to look them up every time I saw them mentioned. I finally did when I saw your article, and it was the perfect intro I was looking for :)
konsalexee•5h ago
Another Bloom filter post I really appreciated, from Eli Bendersky, if anyone wants to read more: https://eli.thegreenplace.net/2025/bloom-filters/
256bit•5h ago
Another visualisation of Bloom filters can be found at the end of this page: https://www.chrislaux.com/hashtable.html
verytrivial•4h ago
The overlap in concepts required to understand Bloom filters, sets and hash tables is about 95% IMHO. A set is a hash table used for membership tests where you only care about the key, not the value. And a Bloom filter is just a set that exploits the fact that many-to-one hashing 'compresses' the key-space with collisions. It deliberately uses a very collide-y hash function. If a specific key was ever hashed, you WILL get a hit, but there might be other keys that produced the same hash. It's a feature, not a bug.
marginalia_nu•4h ago
If you've grokked bloom filters, you're very close to also understanding both random projection and certain implementations of locality-sensitive hashes.
cherrycherry98•4h ago
Glad to know I'm not alone in my mental modeling of Bloom filters as just hash tables that only track the buckets which have data but not the actual data itself.
anon-3988•4h ago
I have a specific use case where I know at startup the list of words that I want to find, and this list will not change for the duration of the program. Can anyone think of a low-latency solution to this? I have tried a lot of variations: Bloom filter, perfect hash, linear lookup, binary search, set search, etc.

It appears that perfect hash is the one that works the best for my use case.

jerf•3h ago
If you're saying you can use a perfect hash, does that also imply you know you will only ever look up those values? If so, then yes, the name is accurate and it is probably a very good choice.

But if you put things into the perfect hash function it is not expecting, some fraction of them will collide.

If you're searching for a fixed set, look at the Ragel library. Compile-time generation of the search in a way that is very hard to beat.
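A rough sketch of the caveat above: a perfect hash for a fixed word list, found here by brute-force seed search, plus the key comparison needed because words outside the list still land in some slot. Everything in it is an assumption for illustration (the helper names, the salted BLAKE2b hash, the table sized to twice the word count); production generators such as gperf work differently and far more efficiently.

```python
import hashlib


def _slot(word: str, seed: int, size: int) -> int:
    # Salted hash, so each seed gives an independent-looking mapping.
    digest = hashlib.blake2b(
        word.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % size


def build_perfect_table(words, max_seed=100_000):
    """Search for a seed that sends every word to its own slot."""
    words = list(dict.fromkeys(words))  # drop duplicates, keep order
    size = 1
    while size < 2 * len(words):
        size *= 2
    for seed in range(max_seed):
        table = [None] * size
        for w in words:
            i = _slot(w, seed, size)
            if table[i] is not None:
                break  # collision: try the next seed
            table[i] = w
        else:
            return seed, table
    raise RuntimeError("no collision-free seed found")


def contains(word: str, seed: int, table) -> bool:
    # Unknown words still hash to a slot, so compare against the stored key.
    return table[_slot(word, seed, len(table))] == word


WORDS = ["select", "insert", "update", "delete", "create", "drop"]
seed, table = build_perfect_table(WORDS)
print(all(contains(w, seed, table) for w in WORDS), contains("merge", seed, table))
# True False
```

If the input is guaranteed to come from the fixed set, the final comparison can be dropped; otherwise it is what turns a slot number into a real membership answer.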

b0a04gl•3h ago
I got into Bloom filters while debugging Cassandra read spikes. There were a lot of SSTable lookups even when the key didn't exist, which didn't make sense at first. Then I realised the Bloom filter on each SSTable is meant to let reads skip the disk, but the default false-positive rate was high, around 0.1, which was too much for our case. Most reads were cache misses anyway, so those false positives were killing us. I changed it to 0.01: it consumed a bit more memory but caused far fewer useless reads, and it brought p99 read latency down by a good 16-18%.
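For a sense of the memory trade-off described above, the textbook sizing formulas for an optimally tuned Bloom filter are bits per key = -ln(p) / (ln 2)^2 and number of hashes k = (bits per key) * ln 2. The sketch below just evaluates them; it assumes the standard model, and Cassandra's actual sizing may differ in detail.

```python
import math


def bloom_sizing(fp_rate: float) -> tuple[float, float]:
    """Return (bits per key, optimal number of hash functions) for a target
    false-positive rate, using the standard Bloom filter formulas."""
    bits_per_key = -math.log(fp_rate) / (math.log(2) ** 2)
    num_hashes = bits_per_key * math.log(2)
    return bits_per_key, num_hashes


for p in (0.1, 0.01):
    bits, k = bloom_sizing(p)
    print(f"p={p}: {bits:.2f} bits/key, k={k:.2f}")
# p=0.1: 4.79 bits/key, k=3.32
# p=0.01: 9.59 bits/key, k=6.64
```

Under this model, going from 0.1 to 0.01 roughly doubles the filter size while cutting false positives tenfold.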
costco•3h ago
I had used Bloom filters in the past without really understanding how they worked. Then one day I decided to implement them just going off the Wikipedia article with the 32-bit MurmurHash function and was surprised at how simple it was. If you're using C++ you can use std::vector<bool> (or std::bitset, when the size is known at compile time) to make it even easier to store the bits in a space-efficient way.
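To show how little code is involved, here is a minimal Bloom filter sketch in Python. It is not the implementation described above: it swaps MurmurHash for SHA-256 (Python's standard library has no Murmur) and derives the k bit positions by double hashing, and it uses a `bytearray` as the packed bit store, playing the role `std::vector<bool>` plays in C++. The class name and parameters are illustrative.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # 8 bits packed per byte

    def _positions(self, item: str):
        # Double hashing: position_i = h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep h2 odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=4)
bf.add("apple")
bf.add("banana")
print("apple" in bf, "banana" in bf)  # True True
```

Items that were added always test as present; items that were never added usually test as absent, with a false-positive rate that depends on the filter size, the number of hashes, and how many items have been inserted.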
kridsdale3•3h ago
I wrote a Bloom Filter for college in CUDA in 2009. My advisor was a former Nvidia guy. I then went on to not do any GPU programming at all in my career.

I probably could have made $100,000,000 if I had made a different choice there.

Kranar•35m ago
Could have also bought Bitcoin and made a lot more... just saying.
