Biscuit is a specialized PostgreSQL index for fast pattern matching LIKE queries

https://github.com/CrystallineCore/Biscuit
123•eatonphil•1mo ago

Comments

eatonphil•1mo ago
Noticed Daniel Lemire talking about it and how they use Roaring Bitmaps.

https://x.com/lemire/status/2000944944832504025

fabian2k•1mo ago
Looks very interesting. I really like trigram indexes for certain use cases, but those are essentially running an ILIKE %something% on various text content in the DB. So that would fit the described limitations of this index type very well.

Usually you're quickly steered towards fulltext search (tsvector) in Postgres if you want to do something like that. But depending on what kind of search you actually need, trigram indexes can be a better option. If you search less for natural language and more for specific keywords, the stemming in fulltext search can get in the way.

One piece of information that would be nice here is a comparison of the index size on disk for both index types.

out_of_protocol•1mo ago
Any data on index size for big tables? Comparison (with ms/megabytes) vs trigram regarding size/speed?

UPD

> Biscuit is 15.0× faster than B-tree (median) and 5.6× faster than Trigram (median)

> Trade-off: 3.2× larger index than Trigram, but 5.6× faster queries (median)

maxmcd•1mo ago
I found some more info here: https://biscuit.readthedocs.io/en/latest/benchmark_roaring.h...
tandr•1mo ago
Index Size

    Biscuit 277.09 MB
    Trigram 86 MB
    B-Tree  43 MB
Pretty much you exchange space for speed
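
As a quick arithmetic check, the sizes listed above line up with the "3.2× larger index than Trigram" figure quoted earlier in the thread, assuming both come from the same benchmark run. The dictionary and helper name below are illustrative only.

```python
# Index sizes in MB, copied from the comment above.
sizes_mb = {"Biscuit": 277.09, "Trigram": 86.0, "B-Tree": 43.0}

def size_ratio(a, b):
    """How many times larger index `a` is than index `b`."""
    return sizes_mb[a] / sizes_mb[b]

print(f"Biscuit vs Trigram: {size_ratio('Biscuit', 'Trigram'):.1f}x")
print(f"Biscuit vs B-Tree:  {size_ratio('Biscuit', 'B-Tree'):.1f}x")
```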
kwillets•1mo ago
This is a fairly simple idea of indexing characters for each column/offset and compressing the bitmaps. Simple is good, as the overhead of more sophisticated ideas (eg suffix sorting) is often prohibitive.

One suggestion is to index the end-of-string as a character as well; then you don't need negative offsets. But that turns the suffix search into a wildcard type of thing where you have to try all offsets, which is what the '%pat%' searches do already, so maybe it's OK.
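
The idea in this comment can be sketched in a few lines of Python. This is a toy model, not Biscuit's code: plain Python integers stand in for compressed (Roaring) bitmaps, and every name here (`build_index`, `match_anchored`, `END`, and so on) is hypothetical. It shows a prefix match, a suffix match through negative offsets, and the end-of-string variant, which has to try every offset.

```python
from collections import defaultdict

END = "\0"  # hypothetical end-of-string sentinel character

def build_index(rows):
    """Map (char, offset) -> bitmap of row ids, stored as a Python int."""
    fwd = defaultdict(int)  # offsets counted from the start: 0, 1, 2, ...
    bwd = defaultdict(int)  # offsets counted from the end: -1, -2, ...
    for row_id, text in enumerate(rows):
        bit = 1 << row_id
        for i, ch in enumerate(text):
            fwd[(ch, i)] |= bit
            bwd[(ch, i - len(text))] |= bit
    return fwd, bwd

def match_anchored(index, literal, start):
    """AND the bitmaps of each character of a non-empty literal at fixed offsets."""
    result = -1  # all bits set; each intersection narrows it
    for k, ch in enumerate(literal):
        result &= index.get((ch, start + k), 0)
    return result

def match_suffix_with_sentinel(fwd_s, literal, max_len):
    """Suffix match using only forward offsets: try every start offset."""
    result = 0
    for k in range(max_len):
        result |= match_anchored(fwd_s, literal + END, k)
    return result

def row_ids(bitmap):
    """Expand a bitmap into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

rows = ["biscuit", "bishop", "rabbit", "circuit"]
fwd, bwd = build_index(rows)
prefix_hits = row_ids(match_anchored(fwd, "bis", 0))    # LIKE 'bis%'
suffix_hits = row_ids(match_anchored(bwd, "uit", -3))   # LIKE '%uit'

rows_s = [r + END for r in rows]
fwd_s, _ = build_index(rows_s)
sentinel_hits = row_ids(
    match_suffix_with_sentinel(fwd_s, "uit", max(map(len, rows_s)))
)
```

With negative offsets the suffix query is a fixed number of intersections; with the sentinel it becomes one set of intersections per candidate offset, which is the trade-off described above.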

Sesse__•1mo ago
AFAIK the most common design for these kinds of systems is using trigram posting lists with position information, i.e., where in the string does the trigram occur. (It's the extra position information that means that you don't need to re-check the string itself.) No need for many different bitmaps; you just take an existing GIN-like design, remove deduplication and add some side information.
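
A minimal sketch of the positional-trigram design described in this comment, assuming an in-memory dictionary in place of a real GIN-like structure; the function names are hypothetical. Because each posting records where the trigram occurs, a row is reported only when all trigrams of the search string line up at consecutive positions, so the row text never has to be re-read.

```python
from collections import defaultdict

def build_positional_trigrams(rows):
    """Map trigram -> list of (row_id, position), one posting per occurrence."""
    postings = defaultdict(list)
    for row_id, text in enumerate(rows):
        for pos in range(len(text) - 2):
            postings[text[pos:pos + 3]].append((row_id, pos))
    return postings

def substring_search(postings, needle):
    """Row ids containing `needle` (at least 3 chars), without rechecking rows."""
    if len(needle) < 3:
        raise ValueError("needle must have at least three characters")
    # Each candidate is (row_id, start): where the needle would have to begin.
    candidates = set(postings.get(needle[:3], []))
    for k in range(1, len(needle) - 2):
        # Shift each posting back by k so it names the implied needle start.
        shifted = {(r, p - k) for r, p in postings.get(needle[k:k + 3], [])}
        candidates &= shifted
    return sorted({r for r, _ in candidates})

rows = ["foobar", "bar oba", "fobaro"]
postings = build_positional_trigrams(rows)
hits = substring_search(postings, "obar")
```

The second row contains both trigrams of "obar" ("oba" and "bar") but at the wrong relative positions, so a position-less trigram index would have to fetch and recheck it, while the positional lookup rejects it directly.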
pedrozieg•1mo ago
Postgres’s extensible index AM story doesn’t get enough love, so it’s nice to see someone really lean into it for LIKE. Biscuit is basically saying: “what if we precompute an aggressive amount of bitmap structure (forward/backward char positions, case-insensitive variants, length buckets) so most wildcard patterns become a handful of bitmap ops instead of a heap scan or bitmap heap recheck?” That’s a very different design point from pg_trgm, which optimizes more for fuzzy-ish matching and general text search than for “I run a ton of LIKE '%foo%bar%' on the same columns”.

The interesting question in prod is always the other side of that trade: write amplification and index bloat. The docs are pretty up-front that write performance and concurrency haven’t been deeply characterized yet, and they even have a section on when you should stick with pg_trgm or plain B-trees instead. If they can show that Biscuit stays sane under a steady stream of updates on moderately long text fields, it’ll be a really compelling option for the common “poor man’s search” use case where you don’t want to drag in an external search engine but ILIKE '%foo%' is killing your box.

bjt•1mo ago
Wouldn't tsvector, tsquery, ts_rank, etc. be Postgres's "poor man's search" solution? With language-aware stemming they don't need to be as aggressive with writing to indexes as you describe Biscuit above.

But if you really need to optimize LIKE instead of providing plain text search, sure.

eats_indigo•1mo ago
How is the postgres ecosystem at stating when these kinds of things are ready for adoption? I can think of a use case at work where this might be useful, but I'm hesitant to just start throwing random open-source extensions at our monolith DB.
fwip•1mo ago
The GitHub repo is about two weeks old and there's a single author - if I were you, I'd let it cook for a while longer.
eats_indigo•1mo ago
My thoughts exactly
tpetry•1mo ago
In my experience, you wait for the next two major PG releases. If the extension is actively maintained, it gains support for them quickly. If not, that tells you it has been abandoned…
oldgregg•1mo ago
Would this be a good fit to replace FTS for hybrid search? Biscuit + Vector?
viraptor•1mo ago
I'm confused by the example in readme:

   Example: LIKE '%abc%def'
   ...
   Step 2: Match first part as prefix
   
   -- "abc" must start at position 0
   Candidates = pos[a@0] ∩ pos[b@1] ∩ pos[c@2]
Is this a mistake, or is there some position magic that makes the position == 0, even after an arbitrary prefix?
Crystallinecore•1mo ago
Hi! Thanks for pointing that out. The Readme has now been updated, and the example has been fixed.
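
To illustrate the question above, here is a hedged sketch of how a pattern like '%abc%def' could be answered with per-position character sets. It is not taken from the Biscuit README or source; the names are hypothetical and Python sets stand in for bitmaps. The point is that "abc" may start at any offset k, while "def" is anchored to the end, and a length check keeps the two literals from overlapping.

```python
from collections import defaultdict

def build(rows):
    fwd = defaultdict(set)     # (char, offset from start) -> row ids
    bwd = defaultdict(set)     # (char, offset from end, -1 = last) -> row ids
    by_len = defaultdict(set)  # string length -> row ids
    for rid, text in enumerate(rows):
        by_len[len(text)].add(rid)
        for i, ch in enumerate(text):
            fwd[(ch, i)].add(rid)
            bwd[(ch, i - len(text))].add(rid)
    return fwd, bwd, by_len

def anchored(index, literal, start):
    """Rows where the non-empty literal sits exactly at offset `start`."""
    sets = [index.get((ch, start + k), set()) for k, ch in enumerate(literal)]
    return set.intersection(*sets)

def like_contains_then_suffix(rows_index, middle, suffix):
    """Rows matching LIKE '%<middle>%<suffix>' (both literals non-empty)."""
    fwd, bwd, by_len = rows_index
    ends_ok = anchored(bwd, suffix, -len(suffix))
    max_len = max(by_len, default=0)
    result = set()
    for k in range(max_len):  # try every possible start offset for `middle`
        need = k + len(middle) + len(suffix)
        long_enough = set().union(
            *(ids for n, ids in by_len.items() if n >= need)
        )
        result |= anchored(fwd, middle, k) & ends_ok & long_enough
    return sorted(result)

rows = ["abcdef", "xxabcxxdef", "defabc", "abcdefx", "zabc-def"]
matches = like_contains_then_suffix(build(rows), "abc", "def")
```

Restricting "abc" to position 0 would find only the first row here; rows 1 and 4 match because "abc" starts at offsets 2 and 1.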
sroerick•1mo ago
I don't know much about postgres - I'm wondering how I would match something like "Foobario 451" with the string "Foo 4". Is this too much complexity for trigrams? Would biscuit work for this?
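
One way to express this match is a LIKE pattern with a wildcard after each fragment, e.g. 'Foo% 4%'. The sketch below only models standard LIKE semantics in Python (`%` for any run of characters, `_` for one character, no escape-character handling) to show that the pattern matches; the helper names are hypothetical, and it says nothing about which index type would serve such a query.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like(text, pattern):
    """True when the whole of `text` matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None

wildcard_match = like("Foobario 451", "Foo% 4%")
literal_match = like("Foobario 451", "Foo 4%")
```

The first call matches because `%` absorbs "bario" and "51"; the second does not, since "Foo 4" is not a literal prefix of "Foobario 451".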
