Show HN: Improving search ranking with chess Elo scores

https://www.zeroentropy.dev/blog/improving-rag-with-elo-scores
89•ghita_•4h ago
Hello HN,

I'm Ghita, co-founder of ZeroEntropy (YC W25). We build high accuracy search infrastructure for RAG and AI Agents.

We just released two new state-of-the-art rerankers, zerank-1 and zerank-1-small. One of them is fully open-source under Apache 2.0.

We trained these models using a novel Elo-score-inspired pipeline, which we describe in detail in the attached blog post. In a nutshell, here is an outline of the training steps:

* Collect soft preferences between pairs of documents using an ensemble of LLMs.
* Fit an Elo-style rating system (Bradley-Terry) to turn pairwise comparisons into absolute per-document scores.
* Normalize relevance scores across queries using a bias correction step, modeled using cross-query comparisons and solved with MLE.
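To make the Bradley-Terry step in the outline above concrete, here is a minimal sketch. It is not ZeroEntropy's actual pipeline; it assumes soft preferences arrive as `(i, j, p)` triples ("document `i` beats document `j` with probability `p`") and fits the strengths with the standard minorization-maximization update. The function name `fit_bradley_terry`, the iteration count, and the 400-point logarithmic scale are illustrative assumptions, and the cross-query bias correction is not covered.

```python
import math


def fit_bradley_terry(n_docs, prefs, iters=200):
    """Fit Bradley-Terry strengths from soft pairwise preferences.

    prefs is a list of (i, j, p) triples: document i is preferred over
    document j with probability p. Hypothetical helper for illustration.
    Returns one Elo-like score per document, centred on zero.
    """
    # wins[i][j] accumulates the (soft) number of times i beat j.
    wins = [[0.0] * n_docs for _ in range(n_docs)]
    for i, j, p in prefs:
        wins[i][j] += p
        wins[j][i] += 1.0 - p

    strength = [1.0] * n_docs
    for _ in range(iters):
        updated = []
        for i in range(n_docs):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n_docs)
                if j != i
            )
            # Documents with no comparisons keep their current strength.
            value = total_wins / denom if denom > 0 else strength[i]
            updated.append(max(value, 1e-12))  # guard against log(0)
        # Strengths are only defined up to scale: fix the geometric mean at 1.
        log_mean = sum(math.log(s) for s in updated) / n_docs
        strength = [s / math.exp(log_mean) for s in updated]

    # Map strengths onto an Elo-like logarithmic scale (assumed convention).
    return [400.0 * math.log10(s) for s in strength]


ratings = fit_bradley_terry(3, [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.95)])
print(ratings)
```

In this toy input document 0 is preferred over both others and document 1 over document 2, so the fitted scores should come out in that order.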

You can try the models either through our API (https://docs.zeroentropy.dev/models), or via HuggingFace (https://huggingface.co/zeroentropy/zerank-1-small).

We would love this community's feedback on the models, and the training approach. A full technical report is also going to be released soon.

Thank you!

Comments

sippeangelo•3h ago
Really cool stuff! Just want to let you know you forgot to link to the evals at the end.
ghita_•2h ago
oh wow thanks for flagging, just fixed, thanks!
esafak•2h ago
I would have titled it "Improving ranking..."

I like that it works with `sentence_transformers`

ghita_•2h ago
yes we found it hard to find a good title for this, thanks for the feedback
dang•1h ago
We could change the title to "Improving search ranking with chess Elo scores". Anybody object?

Edit: ok, done. Submitted title was "Show HN: Improving RAG with chess Elo scores".

ashwindharne•2h ago
Cool stuff! We use a similar process internally to rerank and filter our cold outbound lists. We just use an off-the-shelf model as the judge, give it custom criteria, and let it run for a set number of iterations. It's helped narrow down wide searches to the maximally relevant set of people (from a few thousand medium-bad matches to a few hundred good matches).

It's not cheap and it's not fast, but it definitely works pretty well!
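A loop of this kind can be sketched as follows. This is a guess at the general shape, not the commenter's actual system: `toy_judge` is a hypothetical stand-in for the LLM call (here it simply prefers the longer string), and the starting rating of 1000, the K-factor of 16, and the round count are arbitrary assumptions.

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expectation that A beats B given their ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def toy_judge(a, b):
    """Hypothetical judge: returns 1.0 if a is preferred, else 0.0.

    A real setup would call a model with the custom criteria here.
    """
    return 1.0 if len(a) > len(b) else 0.0


def elo_rerank(items, judge, rounds=2000, k=16.0, seed=0):
    """Rank items by running random pairwise comparisons through a judge."""
    rng = random.Random(seed)
    ratings = {item: 1000.0 for item in items}
    for _ in range(rounds):
        a, b = rng.sample(items, 2)
        outcome = judge(a, b)  # 1.0, 0.0, or a soft value in between
        delta = k * (outcome - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta  # winner gains what the loser gives up
        ratings[b] -= delta
    return sorted(items, key=ratings.get, reverse=True)


print(elo_rerank(["a", "bbb", "cc", "dddd"], toy_judge))
```

Each round costs one judge call, which is where the expense and latency come from; filtering then amounts to keeping the top slice of the returned list.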

jayunit•48m ago
Very interesting! What are some examples of criteria that you can evaluate pairwise, but couldn't score individually?
bravura•17m ago
Pairwise rank constraints involve fewer assumptions than per-item scoring about the underlying nature of the data, thus they are more robust.
yalok•2h ago
What’s the expected additional latency due to running this re-ranker?
ghita_•2h ago
It actually runs pretty fast: our benchmarks show ~149ms for 12665 bytes. It's faster than many other models.
esafak•1h ago
I would prominently display your benchmarks (against your competitors, of course). That's your selling point, right?
ghita_•1h ago
Yes! We did this here: https://www.zeroentropy.dev/blog/announcing-zeroentropys-fir... We wanted to share the approach with the community in this post. It does do better than competitors though!
seanhunter•2h ago
Fun fact about ELO. It's natural to think that it is some kind of initialism, but in fact ELO doesn't stand for anything. It's the name of the guy who invented the system. https://en.wikipedia.org/wiki/Arpad_Elo

So don't say it "E.L.O." (unless you're talking about the band, I guess), say "ee-low"

ghita_•2h ago
oh interesting, had no idea, thanks for sharing
amelius•1h ago
What was his ELO rating?
homarp•1h ago
https://chess.stackexchange.com/questions/35420/what-was-arp...

2065

esafak•27m ago
It should be Elo rating! https://en.wikipedia.org/wiki/Elo_rating_system
rahulnair23•1h ago
Interesting work.

For a slightly different take using a similar intuition, see our paper [at ACL 2024](https://arxiv.org/abs/2402.14860) on ranking LLMs which may be of interest.

Our HuggingFace space has some examples: https://huggingface.co/spaces/ibm/llm-rank-themselves

ghita_•1h ago
thank you, will check out the paper, the hf space is very cool!
mkaszkowiak•1h ago
Happy to see competition in rerankers! Good luck with your product.

My questions: what languages do your models currently support? Did you perform multilingual benchmarks? Couldn't find an answer on the website

ghita_•1h ago
Thanks! We trained on most European languages (English, French, Spanish, Russian...), Arabic, and Chinese, so it does well on those! We haven't tested much on other languages, but we're happy to do so if there is a use case.
