Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust

https://github.com/MinishLab/model2vec-rs

47•Tananon•9h ago

Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This lets you embed text at (very) high throughput, for example in a Rust-based microservice or CLI tool. It can be used for semantic search, retrieval, RAG, or any other text embedding use case.

Main Features:

- Rust-native inference: Load any Model2Vec model from Hugging Face or your local path with StaticModel::from_pretrained(...).

- Tiny footprint: The crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.
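To illustrate why static embeddings can be this small and fast, here is a toy sketch of the general idea: a fixed vector per token, averaged over the input, with cosine similarity for comparison. This is not the model2vec-rs implementation. `ToyStaticModel`, its whitespace tokenizer, the mean pooling, and the two-dimensional vectors are all assumptions made up for illustration; the real crate uses a proper tokenizer and trained weights.

```rust
use std::collections::HashMap;

/// Toy static embedding table (hypothetical): one fixed vector per token.
pub struct ToyStaticModel {
    vectors: HashMap<String, Vec<f32>>,
    dim: usize,
}

impl ToyStaticModel {
    /// Build the table from (token, vector) pairs; all vectors are assumed
    /// to share the dimension of the first entry.
    pub fn new(entries: &[(&str, Vec<f32>)]) -> Self {
        let dim = entries.first().map(|(_, v)| v.len()).unwrap_or(0);
        let vectors = entries
            .iter()
            .map(|(t, v)| (t.to_string(), v.clone()))
            .collect();
        Self { vectors, dim }
    }

    /// Embed a text: split on whitespace, lowercase, look up each token,
    /// and average the vectors that were found. Unknown tokens are skipped;
    /// a text with no known tokens yields the zero vector.
    pub fn encode(&self, text: &str) -> Vec<f32> {
        let mut sum = vec![0.0f32; self.dim];
        let mut count = 0usize;
        for token in text.split_whitespace() {
            if let Some(v) = self.vectors.get(&token.to_lowercase()) {
                for (s, x) in sum.iter_mut().zip(v) {
                    *s += *x;
                }
                count += 1;
            }
        }
        if count > 0 {
            for s in sum.iter_mut() {
                *s /= count as f32;
            }
        }
        sum
    }
}

/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

There is no attention or matrix multiplication per token here, only lookups and an average, which is roughly why this family of models trades some quality for speed.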

Performance:

We benchmarked single-threaded on a CPU:

- Python: ~4650 embeddings/sec

- Rust: ~8000 embeddings/sec (~1.7× speedup)
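As a quick sanity check on those numbers, the speedup is just the ratio of the two throughputs. The helper names below (`speedup`, `seconds_for`) are hypothetical and only restate the arithmetic; they are not part of the crate or of our benchmark code.

```rust
/// Relative speedup of a new throughput over a baseline (both in items/sec).
pub fn speedup(baseline_per_sec: f64, new_per_sec: f64) -> f64 {
    new_per_sec / baseline_per_sec
}

/// Seconds needed to embed `n` texts at a given throughput (items/sec).
pub fn seconds_for(n: u64, per_sec: f64) -> f64 {
    n as f64 / per_sec
}
```

With the figures above, `speedup(4650.0, 8000.0)` is about 1.72, and embedding one million texts would take roughly 215 seconds at the Python rate versus 125 seconds at the Rust rate, assuming throughput stays constant.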

First open-source project in Rust for us, so would be great to get some feedback!

Comments

noahbp•6h ago

What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?

Tananon•6h ago

It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc.). For retrieval, minishlab/potion-retrieval-32M works best. If performance is critical, minishlab/potion-base-32M is best, although it's a bit bigger (~100 MB).

There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and need very high throughput, these models will still let you make decent-quality embeddings. Of course, an attention-based model will be better, but also more expensive.

refulgentis•2h ago

Thanks man, this is incredible work, really appreciate the details you went into.

I've been chewing on whether there was a miracle that could make embeddings 10x faster for my search app that uses minilmv3, and it sounds like there is :) I never would have dreamed. I'll definitely be trying potion-base in my library for Flutter x ONNX.

EDIT: I was thanking you for thorough benchmarking, then it dawned on me you were on the team that built the model - fantastic work, I can't wait to try this. And you already have ONNX!

EDIT2: Craziest demo I've seen in a while. I'm seeing 23x faster, after 10 minutes of work.

Havoc•6h ago

Surprised it is so much faster. I would have thought the Python one is C under the hood.

Tananon•6h ago

Indeed, I also didn't expect it to be so much faster! I think it's because most of the time is actually spent on tokenization (which also happens in Rust in the Python package), but there is some transfer overhead there between Rust and Python. The other operations should be the same speed I think.
