Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust

https://github.com/MinishLab/model2vec-rs

45•Tananon•7h ago

Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependencies. It lets you embed text at (very) high throughput, for example in a Rust-based microservice or CLI tool, for semantic search, retrieval, RAG, or any other text embedding use case.
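For the semantic-search use case, the core operation is ranking stored document embeddings by their similarity to a query embedding. Below is a minimal, std-only sketch of that step, assuming cosine similarity as the metric and using toy 3-dimensional vectors in place of real model output. `cosine` and `top_k` are hypothetical helpers, not part of the model2vec-rs API.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Indices of the `k` documents most similar to `query`, best match first.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(query, d)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Toy vectors standing in for real embeddings (hypothetical data).
    let docs = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.7, 0.7, 0.0],
    ];
    let query = vec![0.9, 0.1, 0.0];
    println!("{:?}", top_k(&query, &docs, 2)); // [0, 2]
}
```

With a real model, the document and query vectors would come from the embedding model instead of being written by hand.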

Main Features:

- Rust-native inference: Load any Model2Vec model from Hugging Face or a local path with StaticModel::from_pretrained(...).

- Tiny footprint: The crate itself is only ~1.7 MB, and the embedding models range from 7 to 30 MB.

Performance:

We benchmarked single-threaded throughput on a CPU:

- Python: ~4650 embeddings/sec

- Rust: ~8000 embeddings/sec (~1.7× speedup)
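As a quick sanity check on these numbers: 8000 / 4650 ≈ 1.72, and converting throughput to average per-embedding time gives roughly 215 µs for Python versus 125 µs for Rust on a single thread. A minimal sketch of that arithmetic, assuming the rounded figures quoted above (the helper names are hypothetical):

```rust
/// Speedup of one throughput figure over another (both in embeddings/sec).
fn speedup(rust_eps: f64, python_eps: f64) -> f64 {
    rust_eps / python_eps
}

/// Average time per embedding in microseconds, given embeddings/sec.
fn micros_per_embedding(eps: f64) -> f64 {
    1_000_000.0 / eps
}

fn main() {
    let (python, rust) = (4650.0, 8000.0);
    println!("speedup: {:.2}x", speedup(rust, python)); // speedup: 1.72x
    println!("python: {:.0} us/embedding", micros_per_embedding(python)); // 215
    println!("rust:   {:.0} us/embedding", micros_per_embedding(rust)); // 125
}
```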

This is our first open-source project in Rust, so we'd love to get some feedback!

Comments

noahbp•3h ago

What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?

Tananon•3h ago

It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc.). For retrieval, minishlab/potion-retrieval-32M works best. If embedding quality is critical, minishlab/potion-base-32M is best, although it's a bit bigger (~100 MB).
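The rule of thumb above can be written down as a small lookup. This is a hypothetical sketch: the `Task` enum and `pick_model` function are illustrative and not part of model2vec-rs, and it assumes the "quality is critical" choice applies only to non-retrieval tasks, which the comment does not state explicitly.

```rust
/// Hypothetical task categories (not types from model2vec-rs).
#[derive(Clone, Copy)]
enum Task {
    Classification,
    Clustering,
    Retrieval,
}

/// Pick a Hugging Face model ID following the rule of thumb in the comment.
/// Assumption: `quality_critical` only switches models for non-retrieval tasks.
fn pick_model(task: Task, quality_critical: bool) -> &'static str {
    match task {
        Task::Retrieval => "minishlab/potion-retrieval-32M",
        // Larger (~100 MB) model when embedding quality matters most.
        _ if quality_critical => "minishlab/potion-base-32M",
        Task::Classification | Task::Clustering => "minishlab/potion-base-8M",
    }
}

fn main() {
    println!("{}", pick_model(Task::Clustering, false)); // minishlab/potion-base-8M
    println!("{}", pick_model(Task::Retrieval, false)); // minishlab/potion-retrieval-32M
}
```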

There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). Which one to pick depends on your constraints: if you have limited hardware and need very high throughput, these models still let you produce decent-quality embeddings. An attention-based model will of course be better, but also more expensive.

Havoc•3h ago

Surprised it is so much faster. I would have thought the Python one is C under the hood.

Tananon•3h ago

Indeed, I also didn't expect it to be so much faster! I think it's because most of the time is actually spent on tokenization, which also happens in Rust in the Python package, but there is some transfer overhead between Rust and Python there. The other operations should be about the same speed, I think.
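To see why tokenization can dominate, it helps to look at what a static embedding model does after tokenizing: a table lookup per token and an average. The sketch below is a toy illustration, assuming mean pooling over a token-embedding table; it uses a hypothetical lowercase-and-split-on-whitespace tokenizer instead of a real subword tokenizer, and `ToyStaticModel` is not the model2vec-rs implementation.

```rust
use std::collections::HashMap;

/// Toy static embedding model: a fixed lookup table from token id to vector.
/// Real models use a subword tokenizer; whitespace splitting is a simplification.
struct ToyStaticModel {
    vocab: HashMap<String, usize>,
    table: Vec<Vec<f32>>,
    dim: usize,
}

impl ToyStaticModel {
    /// Step 1: tokenize (lowercase + whitespace split), dropping unknown words.
    fn tokenize(&self, text: &str) -> Vec<usize> {
        text.split_whitespace()
            .filter_map(|w| self.vocab.get(&w.to_lowercase()).copied())
            .collect()
    }

    /// Steps 2 and 3: look up each token's row, then average the rows.
    fn embed(&self, text: &str) -> Vec<f32> {
        let ids = self.tokenize(text);
        let mut out = vec![0.0f32; self.dim];
        if ids.is_empty() {
            return out; // no known tokens: return the zero vector
        }
        for id in &ids {
            for (o, v) in out.iter_mut().zip(&self.table[*id]) {
                *o += v;
            }
        }
        let n = ids.len() as f32;
        out.iter_mut().for_each(|o| *o /= n);
        out
    }
}

fn main() {
    let mut vocab = HashMap::new();
    vocab.insert("fast".to_string(), 0);
    vocab.insert("rust".to_string(), 1);
    let model = ToyStaticModel {
        vocab,
        table: vec![vec![1.0, 0.0], vec![0.0, 1.0]],
        dim: 2,
    };
    println!("{:?}", model.embed("Fast Rust")); // [0.5, 0.5]
}
```

Once tokens are known, the remaining work is a few additions per dimension, so most of the cost sits in tokenization and in any data transfer around it.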
