I've been working on a vector search engine called QSS (Quantized Similarity Search). It's written in C and explores the idea of aggressively quantizing embedding vectors to 1 bit per dimension. It uses XOR + popcount for fast approximate search, followed by re-ranking with cosine similarity on the original vectors.
The main goal is to see how far quantization can be pushed without sacrificing too much search quality, while gaining significantly in memory usage and speed.
How it works

Embeddings are quantized to 1 bit per dimension (e.g. 300D → 300 bits → ~40 bytes).
Search is done using bitwise XOR and popcount (Hamming distance).
A shortlist is re-ranked using cosine similarity on the original (float) embeddings.
Supports GloVe, Word2Vec, and fastText formats.
Goals

Analyze the trade-offs between quantization and search accuracy.
Measure potential speed and memory gains.
Explore how this approach scales with larger datasets.
Preliminary tests

I’ve only run a few small-scale tests so far, but the early signs are encouraging:
For some queries (e.g. "hello", "italy"), the top 30 results matched the full-precision cosine search.
On Word2Vec embeddings, the quantized pipeline was up to 18× faster than the standard cosine similarity loop.
These results are anecdotal for now—I’m sharing the project early to get feedback before going deeper into benchmarks.
Other notes

Word lookup is currently linear and unoptimized; the focus so far has been on the similarity search logic.
Testing has been done single-threaded on a 2018 iMac (3.6 GHz Intel i3).
If you're interested in vector search, quantization, or just low-level performance tricks, I'd love your thoughts:
Do you think this kind of aggressive quantization could work at scale?
Are there other fast approximate search techniques you'd recommend exploring?
Repo is here: https://github.com/buddyspencer/QSS
Thanks for reading!