Show HN: BloomSearch – Keyword search with hierarchical bloom filters

https://github.com/danthegoodman1/bloomsearch
35•dangoodmanUT•3d ago•9 comments

Hey HN! I got nerd-sniped by Bloom Filters this weekend, specifically for searching datasets with high "cardinality" (number of unique items).

They're an _amazing_ data structure that, at a fixed size, tracks potential set membership. That means, unlike normal B-tree indexes, they don't grow with the number of unique items in the dataset.
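To make "fixed size, potential set membership" concrete, here is a minimal Bloom filter sketch in Python. It is not BloomSearch's implementation: the class and method names, and the SHA-256 double-hashing scheme, are illustrative assumptions. It uses the textbook sizing formulas to pick the bit count `m` and hash count `k` for an expected item count and target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a fixed-size bit array with k hashed bit positions per item."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Textbook sizing for n items at false-positive rate p:
        #   m = -n * ln(p) / (ln 2)^2 bits,  k = (m / n) * ln 2 hash functions
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # allocated once; never grows

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # make the second hash odd, hence nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(expected_items=1000, fp_rate=0.01)
for token in ["john", "smith", "admin"]:
    bf.add(token)
print(bf.might_contain("john"))  # True (Bloom filters have no false negatives)
print(len(bf.bits))              # 1199 bytes, however many tokens are added
```

The trade-off behind "fixed size": inserting far more than `expected_items` leaves the byte count unchanged but pushes the false-positive rate up, so more row groups survive pruning and get scanned.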

This makes them great for "needle in a haystack" search (logs, documents), as implementations like VictoriaMetrics and Bing's BitFunnel show. I've used them in the past, but they've never been center-stage in my projects.

I wanted high cardinality keyword search for ANOTHER project... and, well, down the yak-shaving rabbit hole we go!

BloomSearch brings this into an extensible Go package:

- Very memory efficient via bloom filters and streaming row scans

- DataStore and MetaStore interfaces for any backend (can be same or separate)

- Hierarchical pruning via partitions, minmax indexes, and of course bloom filters

- Search by field, token, or field:token with complex combinators

- Disaggregated storage and compute for unbounded ingest and query throughput

And of course, you know I had to make a custom file format ^-^ (FILE_FORMAT.MD)

BloomSearch is optimized for massive concurrency, arbitrary cardinality and dataset size, and super low memory usage. There's still a lot on the table too in terms of size and performance optimizations, but I'm already super pleased with it. With distributed query processing I'm targeting >100B rows/s over large datasets.

I'm also excited to replace our big logging bill with ~$0.003/GB log storage, infinite retention, and guilt-free querying :P

Comments

SwiftyBug•5h ago
How do you use Bloom filters to replace your current logs? Bloom filters are very good at knowing for sure that something does not exist in a set. What exactly is your set in this case? In other words, how can you query a dataset that's behind a bloom filter?
dangoodmanUT•5h ago
There are three kinds of queries supported for keywords:

- field

- term

- term in field

Each file, and each row group within the file, has 3 bloom filters to handle these queries.

So something like:

{"user": {"name": "John", "tags": [{"type": "user"}, {"role": "admin"}]}}

Gets turned into queryable pairs of:

[{Path: "user.name", Values: ["John"]}, {Path: "user.tags.type", Values: ["user"]}, {Path: "user.tags.role", Values: ["admin"]}]
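A minimal Python sketch of that flattening step, assuming nested object keys are joined with `.`, array elements inherit their parent's path (which matches the example: no array index appears in `user.tags.type`), and non-string scalars are JSON-encoded. `flatten` is a hypothetical helper, not BloomSearch's API.

```python
import json
from typing import Any


def flatten(record: Any) -> dict[str, list[str]]:
    """Map each dot-joined key path to the scalar values found under it."""
    out: dict[str, list[str]] = {}

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            # Array elements share their parent's path: no index appears in it.
            for child in node:
                walk(child, path)
        elif node is not None:
            # Strings kept verbatim; other scalars JSON-encoded (True -> "true").
            out.setdefault(path, []).append(node if isinstance(node, str) else json.dumps(node))

    walk(record, "")
    return out


record = {"user": {"name": "John", "tags": [{"type": "user"}, {"role": "admin"}]}}
for path, values in flatten(record).items():
    print(path, values)
# user.name ['John']
# user.tags.type ['user']
# user.tags.role ['admin']
```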

Then you can search for:

- any record that has "john" in it

- any record that has the "user.tags.type" key

- any record that has "user.tags.type"="user" and "user.tags.role"="admin"

Which bloom filters are used depends on how you build the query, but they all test whether a row matching the condition(s) might be in the file/row group.
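The routing described above can be sketched in Python as a small query tree evaluated against one row group's three filters. The node classes, `may_match`, and the `path:token` key format are assumptions for illustration, not BloomSearch's API. To keep it short, plain sets stand in for the Bloom filters; a real filter can only answer "maybe present", but the routing is the same.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class RowGroupFilters:
    # Plain sets stand in for the three Bloom filters.
    fields: set[str] = field(default_factory=set)        # e.g. "user.tags.type"
    tokens: set[str] = field(default_factory=set)        # e.g. "john"
    field_tokens: set[str] = field(default_factory=set)  # e.g. "user.name:john"


@dataclass
class HasField:
    path: str


@dataclass
class HasToken:
    token: str


@dataclass
class HasFieldToken:
    path: str
    token: str


@dataclass
class And:
    clauses: list


@dataclass
class Or:
    clauses: list


Query = Union[HasField, HasToken, HasFieldToken, And, Or]


def may_match(q: Query, f: RowGroupFilters) -> bool:
    """False: no row in this group can match, so skip it. True: the group must be scanned."""
    if isinstance(q, HasField):
        return q.path in f.fields
    if isinstance(q, HasToken):
        return q.token in f.tokens
    if isinstance(q, HasFieldToken):
        return f"{q.path}:{q.token}" in f.field_tokens
    if isinstance(q, And):
        # Each clause only has to hold somewhere in the group; the later row scan
        # checks that a single row satisfies all of them.
        return all(may_match(c, f) for c in q.clauses)
    if isinstance(q, Or):
        return any(may_match(c, f) for c in q.clauses)
    raise TypeError(f"unknown query node: {q!r}")


group = RowGroupFilters(
    fields={"user.name", "user.tags.type", "user.tags.role"},
    tokens={"john", "user", "admin"},
    field_tokens={"user.name:john", "user.tags.type:user", "user.tags.role:admin"},
)
both = And([HasFieldToken("user.tags.type", "user"), HasFieldToken("user.tags.role", "admin")])
print(may_match(both, group))               # True: scan this row group
print(may_match(HasToken("alice"), group))  # False: skip it
```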

SwiftyBug•3h ago
Does that mean that you can't query substrings or do fuzzy searches?
panic•3h ago
If you want to adapt the technique to full-text search, you can index trigrams instead of full keywords.
dangoodmanUT•3h ago
Haha, you beat me to it! Yes, tokenizing with trigrams is a very simple way to get this functionality. That's how systems like Postgres have historically done it.
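A Python sketch of the trigram approach. The ingest side pads each lowercased word with two leading spaces and one trailing space, following the convention PostgreSQL's pg_trgm documents; the query side for a "contains" search uses unpadded trigrams because the fragment may start or end mid-word. The function names are hypothetical, and a set stands in for the token Bloom filter that the trigrams would be inserted into.

```python
import re


def index_trigrams(text: str) -> set[str]:
    """Ingest side: trigrams of each lowercased word (split on non-word characters),
    padded with two leading spaces and one trailing space so word boundaries are encoded."""
    grams: set[str] = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def substring_trigrams(fragment: str) -> set[str]:
    """Query side for a "contains" search: no padding, because the fragment may
    begin or end in the middle of a word."""
    frag = fragment.lower()
    return {frag[i:i + 3] for i in range(len(frag) - 2)}


def might_contain_substring(indexed: set[str], fragment: str) -> bool:
    # Every query trigram must be present. Like a Bloom filter probe this can give
    # false positives, so surviving rows still need a real substring check. A
    # fragment under 3 characters has no trigrams, so nothing can be ruled out.
    return substring_trigrams(fragment) <= indexed


grams = index_trigrams("John Smith")
print(might_contain_substring(grams, "mith"))   # True
print(might_contain_substring(grams, "smyth"))  # False
```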
dangoodmanUT•3h ago
The BloomSearchEngine takes a TokenizerFunc so you can determine how JSON values are tokenized (that's why each path always returns an array of strings).

The default tokenizer is a whitespace one: https://github.com/danthegoodman1/bloomsearch/blob/148a79967...

So {"name": "John Smith"} is tokenized to [{Path: "name", Values: ["john", "smith"]}], and the bloom filters will store:

- field: "name"

- token: "john"

- token: "smith"

- fieldtoken: "name:john"

- fieldtoken: "name:smith"

The same tokenizer must be used at query time too.

Fuzzy searches and sub-word searches could be supported with custom tokenizers (e.g. trigrams, stemming), but it's more generally targeting the "I know some exact subset of the record, I need all that have this exactly" searches.
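A Python sketch of the ingest/query key generation described in this reply, assuming the default tokenizer lowercases and splits on whitespace (as the "John Smith" example implies) and that field/token pairs are joined with `:`. It mirrors the described behavior, not the linked Go code; `Tokenizer`, `filter_keys`, and `field_token_query_keys` are hypothetical names.

```python
from typing import Callable

# Hypothetical Python stand-in for a tokenizer function: value in, tokens out.
# (BloomSearch's real TokenizerFunc is a Go type; its signature isn't shown here.)
Tokenizer = Callable[[str], list[str]]


def whitespace_tokenizer(value: str) -> list[str]:
    # Lowercase, then split on runs of whitespace: "John Smith" -> ["john", "smith"].
    return value.lower().split()


def filter_keys(flat: dict[str, list[str]], tokenize: Tokenizer = whitespace_tokenizer):
    """Ingest side: the (field, token, field:token) keys bound for the three Bloom filters."""
    fields: set[str] = set()
    tokens: set[str] = set()
    field_tokens: set[str] = set()
    for path, values in flat.items():
        fields.add(path)
        for value in values:
            for tok in tokenize(value):
                tokens.add(tok)
                field_tokens.add(f"{path}:{tok}")
    return fields, tokens, field_tokens


def field_token_query_keys(path: str, value: str, tokenize: Tokenizer = whitespace_tokenizer) -> list[str]:
    # Query side: run the same tokenizer, otherwise "John" would never match "john".
    return [f"{path}:{tok}" for tok in tokenize(value)]


fields, tokens, field_tokens = filter_keys({"name": ["John Smith"]})
print(sorted(fields))                          # ['name']
print(sorted(tokens))                          # ['john', 'smith']
print(sorted(field_tokens))                    # ['name:john', 'name:smith']
print(field_token_query_keys("name", "JOHN"))  # ['name:john']
```

Swapping in a different tokenizer (say, a trigram one) changes only the `tokenize` argument, which is why the same function has to be passed on both the ingest and query sides.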

bonobocop•3h ago
Not OP, but to me, this reads fairly similar to how ClickHouse can be set up, with Bloom filters, MinMax indexes, etc.

A way to “handle” partial substrings is to break up your input data into tokens (like substrings split on spaces or dashes) and then break up your search string the same way.

EGreg•2h ago
Doesn't this mean you have to do a row scan, though? With a B-tree you have an O(log N) index lookup and that's it.
dangoodmanUT•1h ago
To actually retrieve the row, yeah, but a B-tree index's size scales ~linearly with the dataset size.

You can first prune based on partitions, then minmax indexes, then bloom filters. By that point, if all the other checks suggest that the row you're after is in the block, the row group scan covers a very small amount of data.

https://itnext.io/how-do-open-source-solutions-for-logs-work... covers this very well
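A toy Python sketch of the pruning order from that reply: partitions, then minmax, then Bloom filter, then the row scan. The date partition key, the `ts` minmax column, and the data layout are hypothetical; a set stands in for each row group's token Bloom filter, and BloomSearch's actual file format is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    min_ts: int        # minmax index over a hypothetical "ts" column
    max_ts: int
    tokens: set[str]   # stand-in for the row group's token Bloom filter
    rows: list[dict] = field(default_factory=list)


@dataclass
class Partition:
    key: str           # hypothetical partition key, e.g. a date
    groups: list[RowGroup]


def search(partitions: list[Partition], partition_key: str, ts_from: int, ts_to: int, token: str):
    """Prune coarse to fine; only row groups that survive every check are scanned."""
    scanned = 0
    hits = []
    for part in partitions:
        if part.key != partition_key:                           # 1. partition pruning
            continue
        for group in part.groups:
            if group.max_ts < ts_from or group.min_ts > ts_to:  # 2. minmax pruning
                continue
            if token not in group.tokens:                       # 3. Bloom filter pruning
                continue
            scanned += 1                                        # 4. row scan confirms matches
            hits.extend(
                row for row in group.rows
                if ts_from <= row["ts"] <= ts_to and token in row["msg"].lower().split()
            )
    return hits, scanned


partitions = [
    Partition("2025-07-14", [RowGroup(0, 99, {"timeout"}, [{"ts": 5, "msg": "timeout"}])]),
    Partition("2025-07-15", [
        RowGroup(0, 99, {"ok"}, [{"ts": 10, "msg": "ok"}]),
        RowGroup(100, 199, {"upstream", "timeout", "ok"},
                 [{"ts": 150, "msg": "upstream timeout"}, {"ts": 160, "msg": "ok"}]),
        RowGroup(200, 299, {"ok"}, [{"ts": 210, "msg": "ok"}]),
    ]),
]
hits, scanned = search(partitions, "2025-07-15", 100, 220, "timeout")
print(hits)     # [{'ts': 150, 'msg': 'upstream timeout'}]
print(scanned)  # 1 of the 4 row groups
```

Here the partition check drops one row group, minmax drops another, and the Bloom filter drops a third even though its time range overlaps the query, so the linear row scan touches a single row group.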