Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust

https://github.com/MinishLab/model2vec-rs

22•Tananon•3h ago

Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This allows you to embed text at (very) high throughput; for example, in a Rust-based microservice or CLI tool. This can be used for semantic search, retrieval, RAG, or any other text embedding usecase.

Main Features:

- Rust-native inference: Load any Model2Vec model from Hugging Face or your local path with StaticModel::from_pretrained(...).

- Tiny footprint: The crate itself is only ~1.7 mb, with embedding models between 7 and 30 mb.

Performance:

We benchmarked single-threaded on a CPU:

- Python: ~4650 embeddings/sec

- Rust: ~8000 embeddings/sec (~1.7× speedup)

First open-source project in Rust for us, so would be great to get some feedback!

Comments

noahbp•27m ago

What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?

Tananon•14m ago

It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc). For retrieval minishlab/potion-retrieval-32M works best. If performance is critical minishlab/potion-base-32M is best, although it's a bit bigger (~100mb).

There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and very high throughput, these models will allow you to still make decent quality embeddings, but ofcourse an attention based model will be better, but more expensive.

Havoc•22m ago

Surprised it is so much faster. I would have thought the python one is C under the hood

Tananon•11m ago

Indeed, I also didn't expect it to be so much faster! I think it's because most of the time is actually spent on tokenization (which also happens in Rust in the Python package), but there is some transfer overhead there between Rust and Python. The other operations should be the same speed I think.

The Fall of Roam

Gen Z Loves China, and Nobody Knows Why [video]

Ask HN: What salutation do you use at townhalls?

What Is the Difference Between a Block, a Proc, and a Lambda in Ruby? (2013)

What is the deal with NULLs? (2009)

Traffic Enforcement Dwindled in the Pandemic. In Many Places, It Hasnt Come Back

Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

Noyb sends Meta 'cease and desist' letter over AI training

What Your Brain Looks Like When You Solve a Problem

Why does advice work so poorly?

Are We at the End of Science Fiction? (2017)

No one seems to know if AI will take our jobs or make us productive superstars

Grok's 'white genocide' responses show gen AI tampered with 'at will'

Kerrisdale Capital on D-Wave Quantum (QBTS)

Ask HN: Extract text and translate on every PR

Google fixes high severity Chrome flaw with public exploit

You are invited to the AEGIS waiting list

Putting Scaffolding Around Vibe Coding to Build More Complex Apps

Show HN: Claude Code in the Cloud is better than CodeX

Ask HN: How can we get close to Full Dive VR?

I Learned to Stop Worrying and Love Building My Own Solar System

Xiaomi's XRing 01 SoC leaked – 10-core Arm Cortex CPU plus 16-core Mali G925 GPU

The Chores Rota (#3 in the `Itertools` Series • `Cycle()` and Combining Tools)

Show HN: Vibe Coded GitHub PR Bot for Integrating a GitHub Action

Sell your sass bro – feedback appreciated

Built a Job Search AI That Gets Interviews (+How We Got 2M TikTok Views with $0)

Europe Built Trains. America Built Highways and Regret.

Sci-fi/fantasy books about an ordinary man in extraordinary circumstances

Next Generation Solar Panels Are Revolutionizing Clean Energy

The Long Arc of Semiconductor Scaling – By Austin Lyons