Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG

https://github.com/Pringled/pyversity
29•Tananon•3h ago
Hey HN! I’ve recently open-sourced Pyversity, a lightweight library for diversifying retrieval results. Most retrieval systems optimize only for relevance, which can lead to top-k results that look almost identical. Pyversity efficiently re-ranks results to balance relevance and diversity, surfacing items that remain relevant but are less redundant. This helps with improving retrieval, recommendation, and RAG pipelines without adding latency or complexity.

Main features:

- Unified API: one function (diversify) supporting several well-known strategies: MMR, MSD, DPP, and COVER (with more to come)

- Lightweight: the only dependency is NumPy, keeping the package small and easy to install

- Fast: efficient implementations for all supported strategies; diversify results in milliseconds

Re-ranking with cross-encoders is very popular right now, but also very expensive. From my experience, you can usually improve retrieval results with simpler and faster methods, such as the ones implemented in this package. This helps retrieval, recommendation, and RAG systems present richer, more informative results by ensuring each new item adds new information.
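
To make "each new item adds new information" concrete, here is a minimal NumPy sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies listed above. This is not Pyversity's implementation or API. The function name `mmr_select` and the `lambda_` trade-off parameter are made up for illustration, and using cosine similarity between embeddings is an assumption.

```python
import numpy as np


def mmr_select(embeddings, scores, k, lambda_=0.5):
    """Illustrative greedy MMR; hypothetical helper, not the Pyversity API.

    lambda_ = 1.0 ranks purely by relevance; lower values penalize
    candidates that are similar to items already selected.
    """
    emb = np.asarray(embeddings, dtype=float)
    rel = np.asarray(scores, dtype=float)

    # Normalize rows so that dot products are cosine similarities (assumption).
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)
    sim = emb @ emb.T

    n = len(rel)
    selected = []
    available = np.ones(n, dtype=bool)
    # max_sim[i]: highest similarity between item i and any selected item.
    max_sim = np.full(n, -np.inf)

    for _ in range(min(k, n)):
        if selected:
            mmr = lambda_ * rel - (1.0 - lambda_) * max_sim
        else:
            # Nothing selected yet, so there is no redundancy penalty.
            mmr = lambda_ * rel
        mmr = np.where(available, mmr, -np.inf)
        best = int(np.argmax(mmr))
        selected.append(best)
        available[best] = False
        max_sim = np.maximum(max_sim, sim[best])

    return selected


# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.85, 0.5]
print(mmr_select(embeddings, scores, k=3, lambda_=0.5))  # [0, 2, 1]
print(mmr_select(embeddings, scores, k=3, lambda_=1.0))  # [0, 1, 2]
```

With `lambda_=0.5` the near-duplicate of the top result drops below the less relevant but distinct item, while `lambda_=1.0` reproduces the plain relevance ranking.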

Code and docs: github.com/pringled/pyversity

Let me know if you have any feedback, or suggestions for other diversification strategies to support!

Comments

leobg•1h ago
Might also be useful for dataset curation, or even just prompt engineering. For example, when training a classifier, you could pick a diverse set of examples for training or evaluation.
Tananon•1h ago
True, I think that's also a great use case! These algorithms likely won't scale to very large datasets (e.g. millions of samples), but for smaller datasets, like fine-tuning sets, I think this would work very well. I've worked on something similar in the past that works for larger datasets (semantic deduplication: https://github.com/MinishLab/semhash).
