I've been building a small in-memory vector search library as a way to explore ANN (approximate nearest neighbor) systems from first principles, inspired by Spotify's annoy and Meta's FAISS.
Currently, it's a CPU-first C++ library with Python bindings that supports Flat/IVF indexes and Cosine/L2 distance metrics. There's a Colab notebook linked in the README if you want to try it quickly without installing anything.
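For anyone unfamiliar with IVF: the index buckets vectors under their nearest coarse centroid (typically from k-means), and a query only scans the few closest buckets instead of the whole dataset. Here's a simplified illustration of that idea; it's not my library's actual API, and the k-means training step is left out:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified IVF sketch: vectors are bucketed under their nearest coarse
// centroid, and a query scans only the nprobe closest buckets.
// Centroids would come from k-means training, which is omitted here.
struct IVFSketch {
    std::vector<std::vector<float>> centroids;    // nlist coarse centroids
    std::vector<std::vector<std::size_t>> lists;  // vector ids per centroid
    std::vector<std::vector<float>> data;         // original vectors

    explicit IVFSketch(std::vector<std::vector<float>> cents)
        : centroids(std::move(cents)), lists(centroids.size()) {}

    static float l2(const std::vector<float>& a, const std::vector<float>& b) {
        float d = 0.f;
        for (std::size_t i = 0; i < a.size(); ++i) {
            float t = a[i] - b[i];
            d += t * t;
        }
        return d;
    }

    // Assign each new vector to the inverted list of its nearest centroid.
    void add(const std::vector<float>& v) {
        std::size_t best = 0;
        for (std::size_t c = 1; c < centroids.size(); ++c)
            if (l2(v, centroids[c]) < l2(v, centroids[best])) best = c;
        lists[best].push_back(data.size());
        data.push_back(v);
    }

    // Scan only the nprobe closest lists; return ids of the k nearest vectors.
    std::vector<std::size_t> search(const std::vector<float>& q,
                                    std::size_t k, std::size_t nprobe) const {
        std::vector<std::pair<float, std::size_t>> cd;  // (dist to centroid, list id)
        for (std::size_t c = 0; c < centroids.size(); ++c)
            cd.emplace_back(l2(q, centroids[c]), c);
        std::size_t probes = std::min(nprobe, cd.size());
        std::partial_sort(cd.begin(), cd.begin() + probes, cd.end());

        std::vector<std::pair<float, std::size_t>> cand; // (dist to vector, vector id)
        for (std::size_t p = 0; p < probes; ++p)
            for (std::size_t id : lists[cd[p].second])
                cand.emplace_back(l2(q, data[id]), id);

        std::size_t topk = std::min(k, cand.size());
        std::partial_sort(cand.begin(), cand.begin() + topk, cand.end());
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < topk; ++i) out.push_back(cand[i].second);
        return out;
    }
};
```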
I went from a naive brute-force scan (millisecond-level latency) to under half a millisecond per query with IVF, benchmarked on a SIFT1M subset. After that, I increased throughput by ~2.4x by making the search multi-threaded on my 4-core (U-series) CPU. With scalar quantization, I cut memory usage by ~73% with negligible loss in accuracy.
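To give a sense of where the memory number comes from: with 8-bit codes, each float32 component becomes a single byte, so a 128-dim SIFT vector drops from 512 bytes to 128 plus a few bytes of per-vector metadata, which lines up with the ~73% reduction. A simplified sketch of the encode/decode (not the exact code in the repo):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch of 8-bit scalar quantization with per-vector min/max scaling:
// each float32 component becomes one byte, plus two floats of per-vector
// metadata used for dequantization.
struct SQ8Vector {
    float vmin;                      // smallest component of the original vector
    float scale;                     // dequantize as vmin + code * scale
    std::vector<std::uint8_t> codes; // one byte per dimension
};

SQ8Vector sq8_encode(const std::vector<float>& v) {
    float vmin = *std::min_element(v.begin(), v.end());
    float vmax = *std::max_element(v.begin(), v.end());
    float scale = (vmax - vmin) / 255.f;
    if (scale == 0.f) scale = 1.f;   // constant-vector edge case
    SQ8Vector out{vmin, scale, {}};
    out.codes.reserve(v.size());
    for (float x : v)
        out.codes.push_back(static_cast<std::uint8_t>((x - vmin) / scale + 0.5f));
    return out;
}

std::vector<float> sq8_decode(const SQ8Vector& q) {
    std::vector<float> v;
    v.reserve(q.codes.size());
    for (std::uint8_t c : q.codes) v.push_back(q.vmin + c * q.scale);
    return v;
}
```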
I've documented the performance progression and the overall architecture in the repository.
Right now I'm focused on tightening memory alignment, improving cache locality, and making top-k selection faster; longer term, I plan to implement IVF-PQ and HNSW.
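On the top-k point, the standard trick is to avoid fully sorting all candidate distances and instead keep a bounded max-heap of size k. A rough sketch of the heap version (not code from the repo):

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Bounded top-k selection: keep a max-heap of the k best (distance, id) pairs
// seen so far, so selecting from N candidates costs O(N log k) instead of the
// O(N log N) of a full sort. Returns results in ascending distance order.
std::vector<std::pair<float, std::size_t>>
top_k(const std::vector<std::pair<float, std::size_t>>& candidates, std::size_t k) {
    if (k == 0) return {};
    // Max-heap ordered by distance: the worst of the current best k sits on top.
    std::priority_queue<std::pair<float, std::size_t>> heap;
    for (const auto& c : candidates) {
        if (heap.size() < k) {
            heap.push(c);
        } else if (c.first < heap.top().first) {
            heap.pop();            // evict the current worst
            heap.push(c);
        }
    }
    std::vector<std::pair<float, std::size_t>> out(heap.size());
    for (std::size_t i = out.size(); i-- > 0; ) {  // unload from worst to best
        out[i] = heap.top();
        heap.pop();
    }
    return out;
}
```

An alternative is std::nth_element (partition-based selection) followed by sorting just the first k elements, which can be faster for large candidate sets.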
I'd appreciate any feedback on where to take this next and how to think through the process.
I used AI tools to implement serialization and the bindings and for early architecture brainstorming; the details and prompts are documented in the README's 'Disclosure' section.