Main Features:
- Rust-native inference: Load any Model2Vec model from Hugging Face or a local path with StaticModel::from_pretrained(...).
- Tiny footprint: The crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.
Performance:
We benchmarked both implementations single-threaded on a CPU:
- Python: ~4650 embeddings/sec
- Rust: ~8000 embeddings/sec (~1.7× speedup)
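To put those numbers in context, here is a small sketch of the arithmetic behind the speedup figure, plus what it would mean for a larger job. The two rates come from the benchmark above; the 1M-text corpus size is a made-up number for illustration only, and the helper names are hypothetical.

```rust
/// Speedup of `fast` over `slow`, both given in embeddings per second.
fn speedup(slow: f64, fast: f64) -> f64 {
    fast / slow
}

/// Seconds needed to embed `n` texts at `rate` embeddings per second.
fn seconds_for(n: f64, rate: f64) -> f64 {
    n / rate
}

fn main() {
    let python_rate = 4650.0; // embeddings/sec, from the benchmark above
    let rust_rate = 8000.0; // embeddings/sec, from the benchmark above
    let corpus = 1_000_000.0; // hypothetical corpus size, for illustration only

    println!("speedup: {:.2}x", speedup(python_rate, rust_rate));
    println!("python:  {:.0} s", seconds_for(corpus, python_rate));
    println!("rust:    {:.0} s", seconds_for(corpus, rust_rate));
}
```

At these rates, a single thread would need roughly 215 s in Python versus 125 s in Rust for that hypothetical corpus.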
This is our first open-source project in Rust, so it would be great to get some feedback!
noahbp•3h ago
For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?
Tananon•3h ago
There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and need very high throughput, these models still let you produce decent-quality embeddings. Of course, an attention-based model will be better, but also more expensive.
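To give a rough feel for where the speed comes from: a static embedding model, in general, looks up a precomputed vector per token and pools them (commonly by averaging), with no attention layers to run at inference time. The sketch below is a toy illustration of that idea, not the crate's API. The two-word table, the whitespace "tokenizer", and the function names are all assumptions made up for the example.

```rust
use std::collections::HashMap;

/// Embed `text` by averaging the vectors of its known tokens.
/// Whitespace splitting stands in for a real tokenizer (an assumption).
/// Unknown tokens are skipped; a text with no known tokens yields zeros.
fn embed(table: &HashMap<&str, Vec<f32>>, text: &str, dim: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for token in text.split_whitespace() {
        if let Some(vector) = table.get(token) {
            for (s, v) in sum.iter_mut().zip(vector) {
                *s += *v;
            }
            count += 1;
        }
    }
    if count > 0 {
        for s in sum.iter_mut() {
            *s /= count as f32;
        }
    }
    sum
}

/// Hypothetical two-dimensional embedding table, for illustration only.
fn toy_table() -> HashMap<&'static str, Vec<f32>> {
    HashMap::from([
        ("fast", vec![1.0, 0.0]),
        ("search", vec![0.0, 1.0]),
    ])
}

fn main() {
    let table = toy_table();
    // Each token costs one hash lookup and one vector add; nothing else runs.
    println!("{:?}", embed(&table, "fast search", 2)); // prints [0.5, 0.5]
}
```

The flip side is the quality trade-off discussed above: a token's vector is the same regardless of the words around it, which is exactly what attention-based models do differently.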