Built an API that tells any hybrid search stack what alpha to use for RRF, based on the query vector + text. (Also a compressed index service, but that's another thing.)
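For anyone unfamiliar with the knob: in weighted RRF, alpha trades the dense ranking off against the BM25 ranking. A toy sketch of the fusion itself (doc IDs are made up; k=60 is the conventional RRF smoothing constant):

```python
def weighted_rrf(dense_ranked, bm25_ranked, alpha, k=60):
    """Fuse two ranked lists of doc IDs.

    alpha=1.0 is dense-only, alpha=0.0 is BM25-only.
    """
    scores = {}
    for rank, doc in enumerate(dense_ranked):
        scores[doc] = scores.get(doc, 0.0) + alpha / (k + rank + 1)
    for rank, doc in enumerate(bm25_ranked):
        scores[doc] = scores.get(doc, 0.0) + (1 - alpha) / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # dense retriever's ranking
bm25  = ["d3", "d4", "d1"]   # BM25's ranking
print(weighted_rrf(dense, bm25, alpha=0.9)[:2])  # → ['d1', 'd3']
```

Picking alpha per query, instead of fixing it index-wide, is the whole pitch.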
Hybrid search always felt like a devil's bargain. I could either rescue keyword queries and the no-duh searches at the cost of tanking my dense ranking at the very top, or skip BM25 entirely and miss those queries outright. Either way it never quite sat right.
The premise is simple: it's pretty obvious to anyone looking at a query whether BM25 will help or not. If it's obvious to us, a model should be able to learn the same judgment. (Turns out you only need 5M params.)
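To make "obvious to us" concrete, here's a toy heuristic capturing the kind of surface signal involved. This is NOT the actual 5M-param model, just an illustration of the intuition: queries full of exact-match signals (IDs, version strings, quoted phrases) lean lexical, while short natural-language questions lean dense.

```python
import re

def looks_lexical(query: str) -> bool:
    """Crude stand-in for the intuition, not the real classifier."""
    # Identifier-ish tokens containing digits, e.g. "CVE-2024-3094" or "v1.2"
    has_id_like = bool(re.search(r"\b[A-Za-z]*\d[\w-]*\b", query))
    # Quoted phrases signal the user wants exact matching
    has_quote = '"' in query
    return has_id_like or has_quote

print(looks_lexical("CVE-2024-3094 xz backdoor"))  # → True  (BM25 helps)
print(looks_lexical("why do cats purr"))           # → False (dense is fine)
```

The real model presumably learns far subtler signals from the query embedding itself; this just shows why the problem is learnable at all.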
The hard part was making it work universally with every embedding model out there. I had to spin up an entire serving pod just to store all the duplicated vectors.
Anyways, you don’t need to port your data over or anything. Just shoot the query vector + text at the API and get an ideal alpha.
The more advanced version with higher lift (+12 NDCG) has to run inside our vector store because it relies on the retrieved results themselves, and shipping 200 vectors + text per query over an API doesn't make a lot of sense.
You don’t have to take my word for it. Full results are in the repo: 13 models (3 held out), 4 distinct datasets, full alpha sweeps, dense vs fixed hybrid vs dynamic. Or just throw it at your stack and see for yourself.
Would love any and all feedback, even if it's "who cares."