PyNear is a Python KNN library built around Vantage-Point Trees with a C++ SIMD core. I've been working on it for a while and just shipped v2.2 with
two new approximate binary indices. The benchmarks surprised me, so I wanted to share.
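
For anyone who hasn't met VP-Trees before, here is a toy sketch of the triangle-inequality pruning they rely on. This is not PyNear's code (the real core is C++ with SIMD); `VPNode` and `nearest` are names made up for illustration, and it only does 1-NN on tuples of floats.

```python
import math

class VPNode:
    """Toy vantage-point tree node (illustrative only, not PyNear's API)."""

    def __init__(self, points):
        # Use the first point as the vantage point, split the rest by distance.
        self.vp, rest = points[0], points[1:]
        self.radius, self.inside, self.outside = 0.0, None, None
        if rest:
            dists = [math.dist(self.vp, p) for p in rest]
            self.radius = sorted(dists)[len(dists) // 2]  # median distance
            inner = [p for p, d in zip(rest, dists) if d <= self.radius]
            outer = [p for p, d in zip(rest, dists) if d > self.radius]
            self.inside = VPNode(inner) if inner else None
            self.outside = VPNode(outer) if outer else None


def nearest(node, q, best=(math.inf, None)):
    """Return (distance, point) of the nearest stored point to q."""
    if node is None:
        return best
    d = math.dist(q, node.vp)
    if d < best[0]:
        best = (d, node.vp)
    # Search the side that contains q first.
    if d <= node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, q, best)
    # Triangle inequality: every point on the other side is at least
    # |d - radius| away from q, so skip it unless that bound beats `best`.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, q, best)
    return best
```

Build with `tree = VPNode(list_of_tuples)` and query with `nearest(tree, q)`. The whole speed-up comes from the last `if`: whenever the bound fails, an entire subtree is never touched.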
* Where it beats Faiss:
- Exact L2 search: VP-Trees prune aggressively using the triangle inequality. At d=512, N=500k: 2.2 ms vs Faiss IndexFlatL2's 85 ms (39×). At low
dimensionality (d≤16) it's 2–4× faster than IndexFlatL2.
- Approximate binary search: this one was unexpected. The new MIHBinaryIndex (Multi-Index Hashing) splits 512-bit descriptors into 8 sub-tables of
64-bit keys. By the pigeonhole principle, any true neighbour within Hamming radius 8 must match at least one sub-table exactly or with 1 bit flip
(if all 8 chunks differed by 2+ bits, the total distance would be at least 16), so each query is just 8 × (1 + 64) = 520 hash lookups instead of a
linear scan. At N=1M, d=512: 0.037 ms vs Faiss IndexBinaryFlat's 9.5 ms (257×), with 100% Recall@10.
- Faiss's approximate binary index (IndexBinaryIVF) showed what looks like an O(N²) bug in its add() path in my runs: 34 minutes to build at N=1M.
So in practice I couldn't use Faiss for approximate binary search at that scale.
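
To make the pigeonhole argument and the 520-lookup count concrete, here is a minimal pure-Python sketch of Multi-Index Hashing. It is an illustration under my own assumptions, not MIHBinaryIndex itself: descriptors are plain Python ints, and `build`, `probes` and `search` are hypothetical helper names.

```python
from collections import defaultdict

BITS, CHUNKS = 512, 8
CHUNK_BITS = BITS // CHUNKS            # 64 bits per sub-table key
MASK = (1 << CHUNK_BITS) - 1


def chunks(code):
    """Split a 512-bit integer into 8 keys of 64 bits each."""
    return [(code >> (i * CHUNK_BITS)) & MASK for i in range(CHUNKS)]


def hamming(a, b):
    return bin(a ^ b).count("1")


def build(codes):
    """One hash table per chunk position, mapping key -> list of row ids."""
    tables = [defaultdict(list) for _ in range(CHUNKS)]
    for idx, code in enumerate(codes):
        for table, key in zip(tables, chunks(code)):
            table[key].append(idx)
    return tables


def probes(key):
    """The key itself plus its 64 single-bit flips: 65 probes per table."""
    yield key
    for b in range(CHUNK_BITS):
        yield key ^ (1 << b)


def search(tables, codes, query, radius=8):
    """Return ([(distance, row id), ...] within radius, number of lookups)."""
    candidates, n_lookups = set(), 0
    for table, key in zip(tables, chunks(query)):
        for p in probes(key):
            n_lookups += 1
            candidates.update(table.get(p, ()))
    # Candidates only share one (near-)matching chunk, so verify each one
    # against the full 512-bit distance.
    hits = sorted((hamming(codes[i], query), i) for i in candidates)
    return [(d, i) for d, i in hits if d <= radius], n_lookups
```

Every query does 8 × 65 = 520 dictionary lookups no matter how large N is; the only cost that grows with the dataset is verifying whatever candidates those lookups return.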
* Where Faiss still wins:
- Approximate float search at very large N (≥500k) and very high d — their compiled BLAS K-Means is faster than ours for big clustering jobs. If you're
doing CLIP or LLM embedding retrieval at scale, Faiss IVF is still the right tool.
* Other things PyNear does that Faiss doesn't:
- Simple pip install (NumPy is the only dependency, no separate native lib to manage)
- Pickle serialization out of the box
- L1, L∞, and Hamming exact search with the same API
- Drop-in scikit-learn adapter (same fit/predict/kneighbors interface)
- BKTree for Hamming range/threshold queries
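
If BK-trees are new to you, the idea behind Hamming threshold queries fits in a few lines. The sketch below is a toy written for this post, not PyNear's BKTree: `ToyBKTree`, `add` and `within` are hypothetical names, and codes are plain Python ints.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


class ToyBKTree:
    """Minimal BK-tree over integer codes (illustrative only)."""

    def __init__(self):
        self.root = None  # each node is [value, {distance: child node}]

    def add(self, value):
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]  # hang the new value off edge d
                return
            node = child

    def within(self, query, threshold):
        """Return sorted (distance, value) pairs with distance <= threshold."""
        out = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(query, value)
            if d <= threshold:
                out.append((d, value))
            # Triangle inequality: anything under edge k is exactly k away
            # from this node, so it can only match if |k - d| <= threshold.
            for k, child in children.items():
                if abs(k - d) <= threshold:
                    stack.append(child)
        return sorted(out)
```

Usage is `tree.add(code)` for each descriptor, then `tree.within(query, t)` to get everything within Hamming distance `t`. Small thresholds prune most edges, which is why this structure suits range queries better than top-k search.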
The approximate binary story is the most practically interesting to me: binary descriptors (ORB, BRIEF, AKAZE) are high-dimensional and almost always
matched approximately in practice, and it turns out MIH is a much better fit for that problem than IVF.
GitHub: https://github.com/pablocael/pynear
Benchmark report (PDF): https://github.com/pablocael/pynear/blob/main/docs/benchmarks.pdf