
Show HN: Pyversity – Fast Result Diversification for Retrieval and RAG

https://github.com/Pringled/pyversity
86•Tananon•3mo ago
Hey HN! I’ve recently open-sourced Pyversity, a lightweight library for diversifying retrieval results. Most retrieval systems optimize only for relevance, which can lead to top-k results that look almost identical. Pyversity efficiently re-ranks results to balance relevance and diversity, surfacing items that remain relevant but are less redundant. This helps with improving retrieval, recommendation, and RAG pipelines without adding latency or complexity.

Main features:

- Unified API: one function (diversify) supporting several well-known strategies: MMR, MSD, DPP, and COVER (with more to come)

- Lightweight: the only dependency is NumPy, keeping the package small and easy to install

- Fast: efficient implementations for all supported strategies; diversify results in milliseconds

Re-ranking with cross-encoders is very popular right now, but also very expensive. In my experience, you can usually improve retrieval results with simpler and faster methods, such as the ones implemented in this package. This helps retrieval, recommendation, and RAG systems present richer, more informative results by ensuring each new item adds new information.
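
To illustrate the idea of trading relevance against redundancy, here is a minimal NumPy sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies named above. It is an illustration under stated assumptions, not Pyversity's implementation or API: the function name `mmr`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np

def mmr(embeddings, scores, k, diversity=0.5):
    """Greedy Maximal Marginal Relevance (illustrative sketch, not Pyversity's code).

    Returns the indices of the selected items, in selection order.
    """
    emb = np.asarray(embeddings, dtype=float)
    rel = np.asarray(scores, dtype=float)
    # Normalize rows so that dot products are cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T

    selected = [int(np.argmax(rel))]  # start from the most relevant item
    candidates = set(range(len(rel))) - set(selected)
    while candidates and len(selected) < k:
        cand = np.array(sorted(candidates))
        # Redundancy: highest similarity to anything already selected.
        redundancy = sim[np.ix_(cand, selected)].max(axis=1)
        mmr_score = (1.0 - diversity) * rel[cand] - diversity * redundancy
        best = int(cand[np.argmax(mmr_score)])
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (hypothetical): items 0 and 1 are near-duplicates,
# item 2 is different but slightly less relevant.
embeddings = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
scores = np.array([0.9, 0.85, 0.7])

top_by_relevance = np.argsort(-scores)[:2].tolist()
top_by_mmr = mmr(embeddings, scores, k=2, diversity=0.5)
print("relevance only:", top_by_relevance)
print("MMR:", top_by_mmr)
```

With relevance alone, the two near-duplicates fill the top 2; with the redundancy penalty, the second slot goes to the item that adds new information.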

Code and docs: github.com/pringled/pyversity

Let me know if you have any feedback, or suggestions for other diversification strategies to support!

Comments

leobg•3mo ago
Might also be useful for dataset curation, or even just prompt engineering. For example when training a classification task and picking a diverse set of examples for training or evaluation.
Tananon•3mo ago
True, I think that's also a great use case! These algorithms likely won't scale to very large datasets (e.g. millions of samples), but for smaller datasets, like fine-tuning sets, I think this would work very well. I've worked on something similar in the past that works for larger datasets (semantic deduplication: https://github.com/MinishLab/semhash).
CarlosD•3mo ago
Fascinating!
pu_pu•3mo ago
The biggest problem with retrieval is actually semantic relevance. I think most embedding models don't really capture sentence-level semantic content and instead act more like bag-of-words models averaging local word-level information.

Consider this simple test I’ve been running:

Anchor: “A background service listens to a task queue and processes incoming data payloads using a custom rules engine before persisting output to a local SQLite database.”

Option A (Lexical Match): “A background service listens to a message queue and processes outgoing authentication tokens using a custom hash function before transmitting output to a local SQLite database.”

Option B (Semantic Match): “An asynchronous worker fetches jobs from a scheduling channel, transforms each record according to a user-defined logic system, and saves the results to an embedded relational data store on disk.”

Any decent LLM (e.g., Gemini 2.5 Pro, GPT-4/5) immediately knows that the Anchor and Option B describe the same concept, just with different words. But when I test embedding models like gemini-embedding-001 (currently top of MTEB), they consistently rate Option A as more similar, as measured by cosine similarity. They’re getting tricked by surface-level word overlap.

I put together a small GitHub repo that uses ChatGPT to generate and test these “semantic triplets”:

https://github.com/semvec/embedstresstest

gemini-embedding-001 (current #1 on the MTEB leaderboard) scored close to 0% on these adversarial examples.

The repo is unpolished at the moment but it gets the idea across and everything is reproducible.

Anyway, did anyone else notice this problem?
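
The triplet test described in this comment can be sketched as follows. This is not the code from the linked repo, and it does not call any embedding model: as a stated assumption, a plain bag-of-words vector stands in for the embedding, which shows how word overlap alone pushes Option A above Option B. To run the real test, one would swap the hypothetical `bag_of_words` helper for a call to an actual embedding model.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Stand-in 'embedding': lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

anchor = ("A background service listens to a task queue and processes incoming "
          "data payloads using a custom rules engine before persisting output "
          "to a local SQLite database.")
option_a = ("A background service listens to a message queue and processes "
            "outgoing authentication tokens using a custom hash function before "
            "transmitting output to a local SQLite database.")
option_b = ("An asynchronous worker fetches jobs from a scheduling channel, "
            "transforms each record according to a user-defined logic system, "
            "and saves the results to an embedded relational data store on disk.")

sim_a = cosine(bag_of_words(anchor), bag_of_words(option_a))
sim_b = cosine(bag_of_words(anchor), bag_of_words(option_b))

# A triplet counts as "passed" only if the semantic match scores higher.
passed = sim_b > sim_a
print(f"lexical match: {sim_a:.3f}  semantic match: {sim_b:.3f}  passed: {passed}")
```

A purely lexical representation fails this triplet by construction; the claim in the comment is that strong embedding models fail it too.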

softwaredoug•3mo ago
I’m not sure what the “biggest” problem is, but I do think diversity is vastly underappreciated compared to relevance.

You can have maximally relevant search results that are horrible, because most users (and LLMs) want to understand the range of options, not just one type of relevant option.

Searching for “shoes” and only seeing athletic shoes is a bad experience. You’ll sell more shoes, and keep the user engaged, if you show a diverse range of shoes.

jimmySixDOF•3mo ago
I liked how Karpathy explained part of this problem as "silent collapse" in his recent Dwarkesh podcast, meaning that models tend to fall into a local minimum, reusing a few output wording templates for a large number of similar questions. This lack of entropy and diversity becomes a hard-to-detect problem when doing distillation or synthetic data generation in general. These algorithms, packaged as nice Python functions, are also useful when repurposed for labeling parts of ontologies, topic clusters, etc. [1]. Will definitely star and keep an eye on the repo!

[1] https://jina.ai/news/submodular-optimization-for-text-select...

Tananon•3mo ago
Nice, I actually read that Jina article when it was published, but forgot they use facility location as well! The saturated coverage algorithm looks pretty interesting, I'll have a look at how feasible it would be to add that to Pyversity.
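
For readers unfamiliar with facility location, mentioned above: the objective rewards a selection in which every item has at least one close representative. Below is a minimal greedy sketch in NumPy. It is not taken from the Jina article or from Pyversity; the function name and toy similarity matrix are hypothetical, and the sketch assumes non-negative similarities.

```python
import numpy as np

def greedy_facility_location(similarity, k):
    """Greedy maximization of sum_i max_{j in S} similarity[i, j] (illustrative sketch).

    Assumes similarity values are non-negative. Returns selected indices in order.
    """
    sim = np.asarray(similarity, dtype=float)
    n = sim.shape[0]
    best_cover = np.zeros(n)  # how well each item is covered by the current selection
    selected = []
    for _ in range(min(k, n)):
        # Gain of adding candidate j: total coverage improvement over all items.
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        if selected:
            gains[selected] = -np.inf  # never pick the same item twice
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected

# Toy similarity matrix (hypothetical): items 0-2 form one cluster, items 3-4 another.
sim = np.array([
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
])
picked = greedy_facility_location(sim, k=2)
print(picked)
```

With k=2, the greedy pass picks one representative from each cluster, because a second item from an already-covered cluster adds almost no coverage.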
ehsanu1•3mo ago
This seems like a good template to generate synthetic data, with positive/negative examples, allowing an embedding model to be aligned more semantically to underlying concepts.

Anyway, I'd hope reranking models do better. Have you tried those?

paulfharrison•3mo ago
Producing a diverse list of results may still help in a couple of ways here.

* If there are a lot of lexical matches, real semantic matches may still be in the list but far down the list. A diverse set of, say, 20 results may have a better chance of including a semantic match than the top 20 results by some score.

* There might be a lot of semantic matches, but a vast majority of the semantic matches follow a particular viewpoint. A diverse set of results has a better chance of including the viewpoint that solves the problem.

Yes, semantic matching is important, but this is solving an orthogonal and complementary problem. Both are important.

liqilin1567•3mo ago
It would be better if there were some real-world performance tests and comparisons with other embedding-only search methods.
Tananon•3mo ago
That's indeed something I plan to add in the near future. I'll probably add a tutorial as well to showcase how you can use this with e.g. sentence transformers. There are some pretty good benchmarks in the paper that I used as inspiration for some of these algorithms (https://arxiv.org/pdf/1709.05135); I'll most likely try to reproduce some of them.