Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG

https://github.com/Pringled/pyversity
32•Tananon•3h ago
Hey HN! I’ve recently open-sourced Pyversity, a lightweight library for diversifying retrieval results. Most retrieval systems optimize only for relevance, which can lead to top-k results that look almost identical. Pyversity efficiently re-ranks results to balance relevance and diversity, surfacing items that remain relevant but are less redundant. This helps with improving retrieval, recommendation, and RAG pipelines without adding latency or complexity.

Main features:

- Unified API: one function (diversify) supporting several well-known strategies: MMR, MSD, DPP, and COVER (with more to come)

- Lightweight: the only dependency is NumPy, keeping the package small and easy to install

- Fast: efficient implementations for all supported strategies; diversify results in milliseconds
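To make the relevance/diversity trade-off concrete, here is a minimal sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies listed above. This is not Pyversity's code or API: the function name `mmr_rerank`, its parameters, and the `lambda_` weighting are illustrative assumptions, and it assumes non-zero embedding vectors and relevance scores on a scale comparable to cosine similarity.

```python
import numpy as np


def mmr_rerank(embeddings, relevance, k, lambda_=0.5):
    """Hypothetical helper: return indices of k items chosen by greedy MMR.

    lambda_ = 1.0 ranks purely by relevance; lower values penalize
    similarity to already-selected items more strongly.
    """
    emb = np.asarray(embeddings, dtype=float)
    rel = np.asarray(relevance, dtype=float)
    k = min(k, len(rel))
    if k <= 0:
        return []

    # Normalize rows so that dot products are cosine similarities
    # (assumes no all-zero vectors).
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T

    # Start with the most relevant item.
    selected = [int(np.argmax(rel))]
    # For every item, track its highest similarity to anything selected so far.
    max_sim = sim[selected[0]].copy()

    while len(selected) < k:
        score = lambda_ * rel - (1.0 - lambda_) * max_sim
        score[selected] = -np.inf  # never pick the same item twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        max_sim = np.maximum(max_sim, sim[nxt])

    return selected


# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.85, 0.6]
diverse = mmr_rerank(docs, scores, k=2, lambda_=0.5)
relevance_only = mmr_rerank(docs, scores, k=2, lambda_=1.0)
```

With `lambda_=0.5` the near-duplicate is skipped in favor of the dissimilar item; with `lambda_=1.0` the ordering falls back to plain relevance.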

Re-ranking with cross-encoders is very popular right now, but also very expensive. From my experience, you can usually improve retrieval results with simpler and faster methods, such as the ones implemented in this package. This helps retrieval, recommendation, and RAG systems present richer, more informative results by ensuring each new item adds new information.

Code and docs: github.com/pringled/pyversity

Let me know if you have any feedback, or suggestions for other diversification strategies to support!

Comments

leobg•2h ago
Might also be useful for dataset curation, or even just prompt engineering. For example, when training a classifier, you could use it to pick a diverse set of examples for training or evaluation.
Tananon•2h ago
True, I think that's also a great use case! These algorithms likely won't scale to very large datasets (e.g. millions of samples), but for smaller datasets, like fine-tuning sets, I think this would work very well. I've worked on something similar in the past that works for larger datasets (semantic deduplication: https://github.com/MinishLab/semhash).
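As a rough illustration of the dataset-curation idea in this thread, the sketch below picks a diverse subset of examples with greedy farthest-point selection. This is a swapped-in technique for illustration only, not something Pyversity or semhash is confirmed to implement; the name `pick_diverse` and its parameters are hypothetical. Each step recomputes distances against all points, which fits the small fine-tuning-set scale mentioned above rather than millions of samples.

```python
import numpy as np


def pick_diverse(embeddings, k, start=0):
    """Hypothetical helper: greedily pick k mutually distant examples.

    Each step adds the example whose nearest already-selected neighbor
    is farthest away (Euclidean distance).
    """
    emb = np.asarray(embeddings, dtype=float)
    n = len(emb)
    if k <= 0 or n == 0:
        return []

    selected = [start]
    # Distance from every example to its nearest selected example.
    min_dist = np.linalg.norm(emb - emb[start], axis=1)
    min_dist[start] = -np.inf  # mark as taken

    while len(selected) < min(k, n):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(emb - emb[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[nxt] = -np.inf  # mark as taken

    return selected


# Examples 0 and 1 are nearly identical; 2 and 3 are far from both.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 5.0]]
subset = pick_diverse(points, k=3)
```

Here the near-duplicate example 1 is the one left out of the three-item subset.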

Doing well in your courses: Andrej's advice for success (2013)

https://cs.stanford.edu/people/karpathy/advice.html
63•peterkshultz•1h ago•15 comments

What Are RFCs? The Forgotten Blueprints of the Internet

https://ackreq.github.io/posts/what-are-rfcs/
47•ackreq•2h ago•36 comments

The Trinary Dream Endures

https://www.robinsloan.com/lab/trinary-dream/
11•FromTheArchives•51m ago•5 comments

Replacement.ai

https://replacement.ai
626•wh313•4h ago•395 comments

Comparing the power consumption of a 30 year old refrigerator to a brand new one

https://ounapuu.ee/posts/2025/10/14/fridge-power-consumption/
48•furkansahin•5d ago•48 comments

Show HN: Duck-UI – Browser-Based SQL IDE for DuckDB

https://demo.duckui.com
134•caioricciuti•6h ago•42 comments

How to Assemble an Electric Heating Element from Scratch

https://solar.lowtechmagazine.com/2025/10/how-to-build-an-electric-heating-element-from-scratch/
38•surprisetalk•4h ago•19 comments

Infisical (YC W23) Is Hiring Full Stack Engineers

https://www.ycombinator.com/companies/infisical/jobs/0gY2Da1-full-stack-engineer-global
1•vmatsiiako•48m ago

The case for the return of fine-tuning

https://welovesota.com/article/the-case-for-the-return-of-fine-tuning
96•nanark•8h ago•42 comments

The macOS LC_COLLATE hunt: Or why does sort order differently on macOS and Linux

https://blog.zhimingwang.org/macos-lc_collate-hunt
30•g0xA52A2A•4h ago•2 comments

The zipper is getting its first major upgrade in 100 years

https://www.wired.com/story/the-zipper-is-getting-its-first-major-upgrade-in-100-years/
44•bookofjoe•2h ago•52 comments

Abandoned land drives dangerous heat in Houston, study finds

https://stories.tamu.edu/news/2025/10/07/abandoned-land-drives-dangerous-heat-in-houston-texas-am...
83•PaulHoule•4h ago•77 comments

Why an abundance of choice is not the same as freedom

https://aeon.co/essays/why-an-abundance-of-choice-is-not-the-same-as-freedom
63•herbertl•2h ago•25 comments

Xubuntu.org Might Be Compromised

https://old.reddit.com/r/Ubuntu/comments/1oa4549/xubuntuorg_might_be_compromised/
174•kekqqq•3h ago•60 comments

The Spherical Cows of Programming

https://programmingsimplicity.substack.com/p/the-spherical-cows-of-programming
16•whobre•2h ago•18 comments

Lost Jack Kerouac story found among assassinated mafia boss' belongings

https://www.sfgate.com/sf-culture/article/lost-jack-kerouac-chapter-found-mafia-boss-estate-21098...
72•rmason•4d ago•37 comments

Thieves steal crown jewels in 4 minutes from Louvre Museum

https://apnews.com/article/france-louvre-museum-robbery-a3687f330a43e0aaff68c732c4b2585b
40•malshe•1h ago•9 comments

Improving PixelMelt's Kindle Web Deobfuscator

https://shkspr.mobi/blog/2025/10/improving-pixelmelts-kindle-web-deobfuscator/
59•ColinWright•5h ago•13 comments

Windows 11 25H2 October Update Bug Renders Recovery Environment Unusable

https://www.techpowerup.com/342032/windows-11-25h2-october-update-bug-renders-recovery-environmen...
32•MaximilianEmel•1h ago•10 comments

Show HN: Open-Source Voice AI Badge Powered by ESP32+WebRTC

https://github.com/VapiAI/vapicon-2025-hardware-workshop
19•Sean-Der•1w ago•3 comments

EQ: A video about all forms of equalizers

https://www.youtube.com/watch?v=CLAt95PrwL4
233•robinhouston•1d ago•66 comments

Feed me up, Scotty – custom RSS feed generation using CSS selectors

https://feed-me-up-scotty.vincenttunru.com/
18•diymaker•4h ago•4 comments

Scheme Reports at Fifty

https://crumbles.blog/posts/2025-10-18-scheme-reports-at-fifty.html
6•djwatson24•3h ago•0 comments

Show HN: Notepad.exe – macOS editor for Swift and Python (now Linux runtime)

https://notepadexe.com/
4•krzyzanowskim•1h ago•0 comments

When Pollution Spikes in Southeast Asia, Rainfall Shifts from Land to Sea

https://e360.yale.edu/digest/southeast-asia-aerosols-rainfall?asds
13•Brajeshwar•1h ago•0 comments

OpenAI researcher announced GPT-5 math breakthrough that never happened

https://the-decoder.com/leading-openai-researcher-announced-a-gpt-5-math-breakthrough-that-never-...
271•Topfi•6h ago•175 comments

GNU Octave Meets JupyterLite: Compute Anywhere, Anytime

https://blog.jupyter.org/gnu-octave-meets-jupyterlite-compute-anywhere-anytime-8b033afbbcdc
5•bauta-steen•2h ago•0 comments

A Tower on Billionaires' Row Is Full of Cracks. Who's to Blame?

https://www.nytimes.com/2025/10/19/nyregion/432-park-avenue-condo-tower.html
85•danso•5h ago•53 comments

I wish SSDs gave you CPU performance style metrics about their activity

https://utcc.utoronto.ca/~cks/space/blog/tech/SSDWritePerfMetricsWish
6•ingve•35m ago•1 comment