Show HN: BloomSearch – Keyword search with hierarchical Bloom filters

https://github.com/danthegoodman1/bloomsearch
66•dangoodmanUT•6mo ago
Hey HN! I got nerd-sniped by Bloom Filters this weekend, specifically for searching datasets with high "cardinality" (number of unique items).

They're an _amazing_ data structure that, at a fixed size, tracks potential set membership. That means unlike normal b-tree indexes, they don't grow with the number of unique items in the dataset.

This makes them great for "needle in a haystack" search (logs, documents), as implementations like VictoriaMetrics and Bing's BitFunnel show. I've used them in the past, but they've never been center-stage in my projects.
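As a rough illustration of the "fixed size, potential set membership" property described above, here is a minimal Bloom filter sketch in Go. It is not BloomSearch's implementation: the `Bloom` type, the FNV-1a hash, and the double-hashing scheme are assumptions chosen for brevity.

```go
package main

import "hash/fnv"

// Bloom is a fixed-size bit set probed at k positions per item.
// Hypothetical type for illustration; not taken from BloomSearch.
type Bloom struct {
	bits []uint64
	m    uint64 // total number of bits
	k    uint64 // probes per item
}

// NewBloom allocates a filter with at least m bits (rounded up to a
// multiple of 64) and k probes per item. The size never changes afterwards.
func NewBloom(m, k uint64) *Bloom {
	words := (m + 63) / 64
	return &Bloom{bits: make([]uint64, words), m: words * 64, k: k}
}

// positions derives k bit positions from one 64-bit FNV-1a hash
// by splitting it into two halves (double hashing).
func (b *Bloom) positions(item string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1 := sum & 0xffffffff
	h2 := (sum >> 32) | 1 // forced odd so the step is never zero
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % b.m
	}
	return out
}

// Add sets the k bits for item.
func (b *Bloom) Add(item string) {
	for _, p := range b.positions(item) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain returns false only if item was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(item string) bool {
	for _, p := range b.positions(item) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}
```

Adding more items raises the false-positive rate but never the memory footprint, which is the trade-off against an index that grows with cardinality.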

I wanted high cardinality keyword search for ANOTHER project... and, well, down the yak-shaving rabbit hole we go!

BloomSearch brings this into an extensible Go package:

- Very memory efficient via bloom filters and streaming row scans

- DataStore and MetaStore interfaces for any backend (can be same or separate)

- Hierarchical pruning via partitions, minmax indexes, and of course bloom filters

- Search by field, token, or field:token with complex combinators

- Disaggregated storage and compute for unbounded ingest and query throughput

And of course, you know I had to make a custom file format ^-^ (FILE_FORMAT.MD)

BloomSearch is optimized for massive concurrency, arbitrary cardinality and dataset size, and super low memory usage. There's still a lot on the table too in terms of size and performance optimizations, but I'm already super pleased with it. With distributed query processing I'm targeting >100B rows/s over large datasets.

I'm also excited to replace our big logging bill with ~$0.003/GB log storage, infinite retention, and guilt-free querying :P

Comments

SwiftyBug•6mo ago
How do you use Bloom filters to replace your current logs? Bloom filters are very good at knowing for sure that something does not exist in a set. What exactly is your set in this case? In other words, how can you query a dataset that's behind a bloom filter?
dangoodmanUT•6mo ago
There are three kinds of queries supported for keywords:

- field

- term

- term in field

Each file, and each row group within the file, has 3 bloom filters to handle these queries.

So something like:

{"user": {"name": "John", "tags": [{"type": "user"}, {"role": "admin"}]}}

Gets turned into queryable pairs of:

[{Path: "user.name", Values: ["John"]}, {Path: "user.tags.type", Values: ["user"]}, {Path: "user.tags.role", Values: ["admin"]}]

Then you can search for:

- any record that has "john" in it

- any record that has the "user.tags.type" key

- any record that has "user.tags.type"="user" and "user.tags.role"="admin"

Which bloom filters are used depends on how you build the query, but they test whether a row matching the condition(s) may be in the file/row group.
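The flattening step from nested JSON to path/value pairs could look roughly like the sketch below. The function names are hypothetical and the behavior is inferred only from the example above (arrays add no path segment); how BloomSearch treats numbers, booleans, and nulls is not stated in the thread, so the handling here is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten walks a decoded JSON value and collects leaf values under
// dotted paths. Arrays do not add a path segment, matching the
// "user.tags.type" example in the comment above.
func flatten(prefix string, v any, out map[string][]string) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			p := k
			if prefix != "" {
				p = prefix + "." + k
			}
			flatten(p, child, out)
		}
	case []any:
		for _, child := range t {
			flatten(prefix, child, out)
		}
	case nil:
		// Assumption: JSON null contributes no value.
	default:
		// Assumption: non-string scalars are stored in their printed form.
		out[prefix] = append(out[prefix], fmt.Sprint(t))
	}
}

// flattenJSON decodes raw JSON and returns path -> values.
func flattenJSON(raw []byte) (map[string][]string, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	flatten("", doc, out)
	return out, nil
}
```

Running `flattenJSON` on the record above yields the three paths `user.name`, `user.tags.type`, and `user.tags.role`, each with a single value.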

SwiftyBug•6mo ago
Does that mean that you can't query substrings or do fuzzy searches?
panic•6mo ago
If you want to adapt the technique to full-text search, you can index trigrams instead of full keywords.
dangoodmanUT•6mo ago
haha you beat me to it! Yes, tokenizing with trigrams is a very simple way to get this functionality. That's how systems like Postgres have historically done it.
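A minimal sketch of the trigram idea, assuming a plain sliding window over lowercased runes (Postgres's pg_trgm also pads words, which is not reproduced here). The `indexed` map stands in for a bloom filter, and both function names are hypothetical.

```go
package main

import "strings"

// trigrams returns the distinct lowercase 3-rune windows of s,
// in first-seen order.
func trigrams(s string) []string {
	r := []rune(strings.ToLower(s))
	seen := map[string]bool{}
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		g := string(r[i : i+3])
		if !seen[g] {
			seen[g] = true
			out = append(out, g)
		}
	}
	return out
}

// mayContainSubstring reports whether every trigram of query is present
// in the indexed set. The map stands in for a bloom filter: a true
// result only marks a candidate that still needs a real substring check,
// and queries shorter than 3 runes cannot be pruned at all.
func mayContainSubstring(indexed map[string]bool, query string) bool {
	for _, g := range trigrams(query) {
		if !indexed[g] {
			return false
		}
	}
	return true
}
```

With "John Smith" indexed, a query for "smi" survives the check while "xyz" is pruned without touching the row data.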
dangoodmanUT•6mo ago
The BloomSearchEngine takes a TokenizerFunc so you can determine how JSON values are tokenized (that's why each path always returns an array of strings).

The default tokenizer is a whitespace one: https://github.com/danthegoodman1/bloomsearch/blob/148a79967...

So {"name": "John Smith"} is tokenized to [{Path: "name", Values: ["john", "smith"]}], and the bloom filters will store:

- field: "name"

- token: "john"

- token: "smith"

- fieldtoken: "name:john"

- fieldtoken: "name:smith"

The same tokenizer must be used at query time too.
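To make the three key kinds concrete, here is a small sketch that derives them for one path/value pair. The `FilterKeys` type and both functions are hypothetical and do not reflect the real `TokenizerFunc` signature; lowercasing is assumed because the example above turns "John Smith" into "john" and "smith".

```go
package main

import "strings"

// FilterKeys groups the keys destined for each of the three bloom filters.
// Hypothetical type for illustration only.
type FilterKeys struct {
	Fields      []string
	Tokens      []string
	FieldTokens []string
}

// whitespaceTokens lowercases value and splits it on whitespace.
// Assumed to approximate the default tokenizer described above.
func whitespaceTokens(value string) []string {
	return strings.Fields(strings.ToLower(value))
}

// keysFor derives the field, token, and field:token keys for one pair.
func keysFor(path, value string) FilterKeys {
	keys := FilterKeys{Fields: []string{path}}
	for _, tok := range whitespaceTokens(value) {
		keys.Tokens = append(keys.Tokens, tok)
		keys.FieldTokens = append(keys.FieldTokens, path+":"+tok)
	}
	return keys
}
```

Because the query side has to produce identical keys, running a different tokenizer at query time would probe for keys that were never inserted.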

Fuzzy searches and sub-word searches could be supported with custom tokenizers (e.g. trigrams, stemming), but it's more generally targeting the "I know some exact subset of the record, I need all that have this exactly" searches.

bonobocop•6mo ago
Not OP, but to me, this reads fairly similar to how ClickHouse can be set up, with Bloom filters, MinMax indexes, etc.

A way to “handle” partial substrings is to break up your input data into tokens (like substrings split on spaces or dashes) and then break up your search string in the same way.

EGreg•6mo ago
Doesn't this mean you have to do a row scan though? With BTREE you have O(log N) index query and that’s it
dangoodmanUT•6mo ago
To actually retrieve the row, yeah, but a btree index size scales ~linearly with the dataset size.

You can prune based on partitions, minmax indexes, then bloom filters first. By that point the row group scan, if all other checks suggest that the row you are after is in the block, is a very small amount of data.

https://itnext.io/how-do-open-source-solutions-for-logs-work... covers this very well
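The pruning order described above (partitions, then minmax, then bloom filters) can be sketched as follows. All types and field names are hypothetical, the minmax index is assumed to be over a timestamp, and a plain map stands in for the bloom filter.

```go
package main

// RowGroup is a hypothetical per-block summary kept in metadata.
type RowGroup struct {
	Partition    string
	MinTS, MaxTS int64
	Tokens       map[string]bool // stand-in for a bloom filter
}

// Query is a hypothetical keyword query with a partition and time range.
type Query struct {
	Partition    string
	FromTS, ToTS int64
	Token        string
}

// candidates returns the indexes of row groups that survive all three
// pruning levels, cheapest check first. Only these need a row scan.
func candidates(groups []RowGroup, q Query) []int {
	var out []int
	for i, g := range groups {
		if g.Partition != q.Partition {
			continue // level 1: partition pruning
		}
		if g.MaxTS < q.FromTS || g.MinTS > q.ToTS {
			continue // level 2: minmax range does not overlap
		}
		if !g.Tokens[q.Token] {
			continue // level 3: filter says the token is definitely absent
		}
		out = append(out, i)
	}
	return out
}
```

With a real bloom filter the third check can let a false positive through, so the surviving row groups still get scanned, but they are usually a small fraction of the dataset.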

hztar•6mo ago
Super! Bloom filters are smart. Created a hierarchical bloom filter for a revisit log for an indexer almost 20 years ago. Saved us $$$ and I'm still kind of proud of it.
ianred•6mo ago
Library/package with AGPL license, not a great thing even for a lot of FOSS projects.
dangoodmanUT•6mo ago
Not true, it's very permissive as long as it's not a "feature" of a product you're building, or offering as a service.

Otherwise you can happily use it in indirect backend services (e.g. your own logging) without license concerns.

another_twist•6mo ago
I have a question about using HBFs for logs - how do you determine the hierarchy?