
Show HN: PyNear – exact and approximate KNN, faster than Faiss

2•pcael•1h ago

PyNear is a Python KNN library built around Vantage-Point Trees with a C++ SIMD core. I've been working on it for a while and just shipped v2.2 with two new approximate binary indices. The benchmarks surprised me, so I wanted to share.

  * Where it beats Faiss:

  - Exact L2 search: VP-Trees prune aggressively using the triangle inequality. At d=512, N=500k: 2.2 ms vs Faiss IndexFlatL2's 85 ms (39×). At low
  dimensionality (d≤16) PyNear is 2–4× faster than Faiss.

  - Approximate binary search: this one was unexpected. The new MIHBinaryIndex (Multi-Index Hashing) splits 512-bit descriptors into 8 sub-tables of
  64-bit keys. By the pigeonhole principle, any true neighbour within Hamming radius 8 must match the query in at least one sub-table exactly or with
  1 bit flip, so each query is just 8 × (1 + 64) = 520 hash lookups instead of a linear scan. At N=1M, d=512: 0.037 ms vs Faiss IndexBinaryFlat's
  9.5 ms (257×), with 100% Recall@10.

  - Faiss's approximate binary index (IndexBinaryIVF) turned out to have an O(N²) bug in its add() path — 34 minutes to build at N=1M. So in practice
  Faiss can't do approximate binary search at scale.
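
To make the triangle-inequality pruning behind the exact-search numbers concrete, here is a minimal pure-Python sketch of a VP-tree with exact k-NN search. It is not PyNear's implementation (that is C++ with SIMD); the names `VPNode`, `build`, and `knn` are hypothetical, vantage points are picked at random, and `k` is assumed to be at least 1.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean (L2) distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build(points):
    """Build a VP-tree recursively, picking each vantage point at random."""
    points = list(points)
    if not points:
        return None
    vp = points.pop(random.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [l2(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius, build(inside), build(outside))


def knn(root, query, k):
    """Exact k-NN search; returns (distance, point) pairs, nearest first."""
    best = []  # max-heap of (-distance, point) holding the k best so far

    def tau():
        # Current search radius: distance to the k-th best candidate.
        return -best[0][0] if len(best) == k else math.inf

    def visit(node):
        if node is None:
            return
        d = l2(query, node.point)
        if d < tau():
            if len(best) == k:
                heapq.heapreplace(best, (-d, node.point))
            else:
                heapq.heappush(best, (-d, node.point))
        if d <= node.radius:
            visit(node.inside)
            # Outside points are > radius from the vantage point, so by the
            # triangle inequality they are > radius - d from the query.
            if d + tau() > node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            # Inside points are <= radius from the vantage point, so they
            # are >= d - radius from the query.
            if d - tau() <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-neg, p) for neg, p in best)
```

A subtree is skipped whenever the ball of radius `tau()` around the query cannot reach it, which is where the savings over a flat scan come from. How much gets pruned depends heavily on the data distribution.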

  * Where Faiss still wins:
  - Approximate float search at very large N (≥500k) and very high d — their compiled BLAS K-Means is faster than ours for big clustering jobs. If you're
  doing CLIP or LLM embedding retrieval at scale, Faiss IVF is still the right tool.

  * Other things PyNear does that Faiss doesn't:
  - Plain Python-package install (NumPy is the only dependency; no separate native lib to manage)
  - Pickle serialization out of the box
  - L1, L∞, and Hamming exact search with the same API
  - Drop-in scikit-learn adapter (same fit/predict/kneighbors interface)
  - BKTree for Hamming range/threshold queries

   The approximate binary story is the most practically interesting to me: binary descriptors (ORB, BRIEF, AKAZE) are always high-dimensional and, in
   practice, always searched approximately, and it turns out MIH is a much better fit for that problem than IVF.
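
For readers who have not seen multi-index hashing, the idea described earlier in the post can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the MIHBinaryIndex code: descriptors are modelled as 512-bit Python integers, the helper names (`build_tables`, `query`, `chunks`) are hypothetical, and the sketch only handles radii below 16, where the pigeonhole bound is at most 1 flipped bit per 64-bit chunk.

```python
from collections import defaultdict

BITS, CHUNKS = 512, 8
CHUNK_BITS = BITS // CHUNKS       # 64 bits per sub-table key
MASK = (1 << CHUNK_BITS) - 1


def chunks(code):
    """Split a 512-bit integer into eight 64-bit keys."""
    return [(code >> (i * CHUNK_BITS)) & MASK for i in range(CHUNKS)]


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def build_tables(codes):
    """One hash table per chunk, mapping a 64-bit key to descriptor ids."""
    tables = [defaultdict(list) for _ in range(CHUNKS)]
    for idx, code in enumerate(codes):
        for table, key in zip(tables, chunks(code)):
            table[key].append(idx)
    return tables


def query(tables, codes, q, radius=8):
    """Return ((distance, id) matches within radius, number of probes)."""
    # Pigeonhole: if at most `radius` bits differ over 8 chunks, some chunk
    # differs in at most radius // 8 bits.
    per_chunk = radius // CHUNKS
    assert per_chunk <= 1, "sketch only enumerates 0- and 1-bit flips"
    candidates, probes = set(), 0
    for table, key in zip(tables, chunks(q)):
        keys = [key]
        if per_chunk == 1:
            keys += [key ^ (1 << b) for b in range(CHUNK_BITS)]
        for k in keys:
            probes += 1
            candidates.update(table.get(k, ()))
    # Verify candidates with the full 512-bit Hamming distance.
    hits = sorted((hamming(codes[i], q), i) for i in candidates)
    return [(d, i) for d, i in hits if d <= radius], probes
```

With radius 8 each table is probed 1 + 64 times, giving the 520 lookups quoted above. The final verification pass only touches the candidates those probes return.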

   GitHub: https://github.com/pablocael/pynear
   Benchmark report (PDF): https://github.com/pablocael/pynear/blob/main/docs/benchmarks.pdf
