frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
142•theblazehen•2d ago•42 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
668•klaussilveira•14h ago•202 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
949•xnx•19h ago•551 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
122•matheusalmeida•2d ago•32 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
53•videotopia•4d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
229•isitcontent•14h ago•25 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
16•kaonwarb•3d ago•19 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
222•dmpetrov•14h ago•117 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
27•jesperordrup•4h ago•16 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
330•vecti•16h ago•143 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
494•todsacerdoti•22h ago•243 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
381•ostacke•20h ago•95 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
359•aktau•20h ago•181 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
288•eljojo•17h ago•169 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
412•lstoll•20h ago•278 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
19•bikenaga•3d ago•4 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
63•kmm•5d ago•6 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
90•quibono•4d ago•21 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
256•i5heu•17h ago•196 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
32•romes•4d ago•3 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
43•helloplanets•4d ago•42 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
12•speckx•3d ago•4 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
59•gfortaine•12h ago•25 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
33•gmays•9h ago•12 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1066•cdrnsf•23h ago•446 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
150•vmatsiiako•19h ago•67 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
149•SerCe•10h ago•138 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
287•surprisetalk•3d ago•43 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
182•limoce•3d ago•98 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
73•phreda4•13h ago•14 comments
Open in hackernews

Binary fuse filters: Fast and smaller than xor filters (2022)

https://arxiv.org/abs/2201.01174
138•redbell•2w ago

Comments

djmips•2w ago
This feels like a better xor filter implementation.
orlp•2w ago
Same author.
pwagland•2w ago
This is mentioned on the first page of the paper:

> Building on theoretical work by Dietzfelbinger and Walzer [8], we propose a novel practical approach, the binary fuse filters. They are conceptually similar to xor filters, and they rely on nearly the same simple code.

dang•2w ago
Related prior work:

Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters - https://news.ycombinator.com/item?id=22742905 - March 2020 (25 comments)

Xor Filters: Faster and Smaller Than Bloom Filters - https://news.ycombinator.com/item?id=21840821 - Dec 2019 (81 comments)

less_less•2w ago
See also the paper Ribbon filter: practically smaller than Bloom and Xor: https://arxiv.org/abs/2103.02515, which is a similar idea though not by the same authors.

IIRC, binary fuse filters are faster to construct than ribbon filters, but typically not quite as space-efficient. There are also frayed ribbon filters (by me) which are slower and more complex to construct but more space-efficient. There's no paper for those, just a Rust implementation.

Ribbon filters are deployed in Mozilla's Clubcard for distributing compressed certificate revocation lists: https://github.com/mozilla/clubcard and https://jmschanck.info/papers/20250327-clubcard.pdf. CRLs are an almost ideal application of this sort of compressed set tech, since the aggregator runs batch jobs and needs to distribute the set to very many clients. It's not perfectly ideal because CRLs require frequent updates and none of these methods support delta updates. There is a straightforward but inelegant workaround, which is to send a compressed set that represents the delta, and query both on the client.

vgb2k18•2w ago
Fast, but not faster than XOR filters. I was wondering if the title was a typo, but the article clarifies they sacrificed some speed for the smaller size.
nine_k•2w ago
One construction is smaller and faster to build than xor filters, and another is even more compact, though slower:

> we build probabilistic filters -- called binary fuse filters -- that are within 13% of the storage lower bound -- without sacrificing query speed. As an additional benefit, the construction of the new binary fuse filters can be more than twice as fast as the construction of xor filters. By slightly sacrificing query speed, we further reduce storage to within 8% of the lower bound.

vlovich123•2w ago
I think you misread:

> that are within 13% of the storage lower bound -- without sacrificing query speed. As an additional benefit, the construction of the new binary fuse filters can be more than twice as fast as the construction of xor filters. By slightly sacrificing query speed, we further reduce storage to within 8% of the lower bound.

They are faster to construct (2x) and smaller (within 13% of theoretical limit) while maintaining the same query performance.

pastage•2w ago
I recommend the zig library [1], it was a joy to use. Bloom filters was one of the first interesting algorithms I did in class back in university, we upgraded hardware during the lab making the use of bloom filters unnecessary in a lab ment to interactively show its usefulness. I have had this repeated since then, these filters are magic until hardware catches up, having smaller filter is lovely.

[1] https://github.com/hexops/fastfilter

Sesse__•2w ago
As far as I can see, these classes of filters (including xor filters) have some practical issues for many applications: They can become full (refuse new entries altogether), and they need to know all the elements up-front (no incremental inserts). Is there anything more modern than Bloom filters that don't have these restrictions?

I'm especially fond of tiny filters; a well-placed 32- or 64-bit Bloom filter can be surprisingly effective in the right context!

withoutboats3•2w ago
Cuckoo filters outperform bloom filters and allow dynamic insertion and deletion (unlike bloom filters, which only allow insertion). The trade off is that insertion can fail if the table is too full and then would need to expand or store those entries some other way to avoid a false negative.
thomasmg•2w ago
There are many variants. It really depends on what features you need. Cuckoo filters were mentioned. If you want to add and remove entries and the regular counting Bloom filter are not good enough, I would look at the "succinct counting blocked Bloom filter" [1]: they only need about twice the space than regular Bloom filters. Sure, cuckoo filters need less memory, but they can fail basically any time, while these can not.

Tiny filters: Some time ago I worked on tiny statistics [2]. This includes some 64-bit HyperLogLog implementations; some use linear counting, which is basically a 64-bit Bloom filter, until some limit, and only then switch to HyperLogLog. This is great for distinct counts of columns in databases (cardinality estimation). This project also includes 64-bit approximate counts and histograms.

[1] https://github.com/FastFilter/fastfilter_java/blob/master/fa... [2] https://github.com/thomasmueller/tinyStats

Sesse__•2w ago
FWIW, I found https://github.com/FastFilter/fastfilter_java/issues/28 a pretty good explanation of what's going on in the succinct counting blocked Bloom filters. (I'm not sure if the blocking is needed for the normal Bloom filter part, though, i.e., all bits don't necessarily need to fall within the same block, no? But the counts are stored separately for each block.)
thomasmg•2w ago
Yes, non-blocked is also possible. This would need a bit less space, and would be a bit slower. The counts > 1 (per bit that is set) are stored spearately, yes.
Sesse__•2w ago
> This would need a bit less space, and would be a bit slower.

I guess that is because the count storage update is really slow, right, so it's better to have one than two (or whatever number of set bits) operations? At least the linked code seems to process it one by one bit when updating, and without some sort of “rank of n-th set bit” operation (which would accelerate the “select” version fairly well), I'm not sure it could be made much faster than that either.

Edit: I see https://github.com/FastFilter/fastfilter_java/blob/master/fa... tries to make that operation in O(1), but to be honest, with this many operations, the cure almost looks worse than the disease. :-) Haven't benchmarked, though.

thomasmg•1w ago
Yes the count storage update is a bit slow. It could be implemented as a background thread, so that it is not blocking. It depends on the use case wheter this is a problem. The 'blocked' variant might be faster to optimize I assume, if this is needed.
Sesse__•1w ago
> It could be implemented as a background thread, so that it is not blocking.

How would you realistically do this without creating more overhead in the thread communication than what you're saving? I've never heard of offloading a 30–40 cycle operation to a thread before. Typically sending an atomic across CPUs is what, 200 cycles? (That's assuming you have the thread just sitting there in some kind of pool; firing up a new one is much, much more expensive. It also assumes you never need to hear back from it, so wouldn't work well for a decrease where you need to know the count.)

thomasmg•1w ago
Right, it would be wasteful if it's done for each entry separately. But that is not needed. (I should probably write an article about this and a better implementation).

The "succinct counting (blocked) Bloom filters" have two components: the regular Bloom filter for querying, and the count storage. The count storage is not "strictly" needed for querying: you will never get a false _negative_ is increments / decrements in the count storage are deferred. So updating the count storage can be done asynchronously. Both increments and decrements, if you want. So these can be buffered, eg 100 increments / decrements at a time. Sure, the false-positive rate is slightly higher than needed if the decrements are deferred, but with a large filter this effect is small.

So that means, you can buffer increments / decrements in the count storage. You still want to do the "or" operations in the the Bloom filter part synchronously, so that you don't get false negatives. And then, it is no longer just 30-40 cycles, but 3000-4000 cycles, or 30'000-40'000 cycles, or so. I understand this would not be trivial to implement, but also not very complex. I never really had a real-world use case so far, so I didn't work on a full implementation yet.

Sesse__•1w ago
Sure, you can batch. But that means your decrements must be _entirely_ buffered, right? Because you cannot remove anything from the main filter until you know its count is zero, and you don't know its count until you've processed all increments. And you need to do that under a lock, or you'll have a race where you could have a false negative in the main filter for a short while.
thomasmg•1w ago
Yes exactly!
orlp•2w ago
Bloom filters also become full.

As it fills up the false probability rate goes up. Once the false probability rate reaches the threshold of unacceptability, the bloom filter is full, and you can no longer insert into it.

That most interfaces still let you do something that looks like an insert is an interface failure, not a bloom filter feature.

If you find this controversial and want to reply "I don't have a threshold of unacceptability", I'll counter that a false probability rate of 100% will be reached eventually. And if you still find that acceptable, you can trivially modify any probabilistic filter to "never become full" by replacing the "is full" error condition with setting a flag that all future queries should return a false positive.

fmajid•2w ago
Counting Quotient Filters (CQF) and Mixed-Counters Quotient Filters (MQF) are more efficient than Bloom Filters, and can count, making them highly suited for applications like rate-limiting at scale.

https://link.springer.com/content/pdf/10.1186/s12859-021-039...

thomasmg•2w ago
I'm one of the authors. Feel free to ask anything.
kuhdungbingo•2w ago
Now would this be useful in high scaling games, specifically: determining if you might be a winner of game with a grid of 10^nx10^n (where n>5). A very large "cow bingo game", where the insertions are made randomly spawned on a grid? Seems one of the other filters would be more appropriate as they support dynamic insertion.

Still very neat, nice work!

thomasmg•2w ago
Yes, you would need a dynamic filter (eg. regular Bloom filter or cuckoo filter) for this, due to the insertions.

Static filters are good for eg. leaked password lists, LSM trees (LevelDB, RocksDB), and so on.

avadodin•1w ago
I'm saddened that there aren't more replies to this.

Bloom filters are the coolest thing ever and I wish I could replace them with something better.