I built ChatVault because I was frustrated with WhatsApp's native search. It relies on exact keyword matching, which makes finding old messages ("where was that sushi place?") impossible if you don't remember the exact phrasing.
I wanted semantic search, but I wasn't willing to upload my chat history to a cloud vector database or an LLM API.
So I built a strictly local-first search engine using Rust compiled to WebAssembly.
The Architecture:
1. Inference: It runs a quantized BERT model (all-MiniLM-L6-v2) directly in the browser using Candle (Hugging Face's Rust ML framework). No ONNX runtime, just pure Rust Wasm. (There's a small sketch of the embedding step right after this list.)
2. Vector Store: I implemented a custom in-memory vector store in Rust. It normalizes every vector on insertion, so cosine similarity reduces to a fast dot product at query time. (Second sketch below.)
3. Performance: To prevent blocking the main thread during indexing (computing embeddings for thousands of messages), the Wasm module runs inside a Web Worker.
4. UI: Since all the heavy lifting happens off the main thread, the Next.js 16 front end stays responsive at 60fps.
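For the curious, here's roughly what the embedding step looks like. This is a minimal sketch assuming Candle's tensor API; the tokenizer and model forward pass are elided, pool_and_normalize is just an illustrative name, and shapes assume MiniLM-L6's 384-dim output.

    use candle_core::Tensor;

    /// Mean-pool token embeddings into one sentence vector, then L2-normalize
    /// it so that similarity search later reduces to a plain dot product.
    /// `hidden` is the encoder output, shape (1, seq_len, 384) for MiniLM-L6.
    fn pool_and_normalize(hidden: &Tensor) -> candle_core::Result<Vec<f32>> {
        let (_batch, seq_len, _dim) = hidden.dims3()?;
        // Mean over the sequence dimension: (1, seq_len, 384) -> (1, 384).
        let pooled = (hidden.sum(1)? / (seq_len as f64))?;
        // Divide by the Euclidean norm to get a unit-length vector.
        let norm = pooled.sqr()?.sum_keepdim(1)?.sqrt()?;
        let unit = pooled.broadcast_div(&norm)?;
        unit.squeeze(0)?.to_vec1::<f32>()
    }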
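The vector store itself is nothing exotic. A simplified version (field and method names are illustrative; the query vector must be normalized the same way as the stored ones):

    /// Minimal in-memory store: vectors are L2-normalized on insert, so
    /// cosine similarity at query time is just a dot product.
    struct VectorStore {
        dim: usize,
        ids: Vec<usize>,        // message id for each stored vector
        vectors: Vec<Vec<f32>>, // one normalized embedding per message
    }

    impl VectorStore {
        fn new(dim: usize) -> Self {
            Self { dim, ids: Vec::new(), vectors: Vec::new() }
        }

        fn insert(&mut self, id: usize, mut v: Vec<f32>) {
            assert_eq!(v.len(), self.dim);
            let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            if norm > 0.0 {
                v.iter_mut().for_each(|x| *x /= norm);
            }
            self.ids.push(id);
            self.vectors.push(v);
        }

        /// Brute-force top-k by dot product; fine at chat-history scale.
        fn search(&self, query: &[f32], k: usize) -> Vec<(usize, f32)> {
            let mut scored: Vec<(usize, f32)> = self
                .vectors
                .iter()
                .zip(&self.ids)
                .map(|(v, &id)| (id, v.iter().zip(query).map(|(a, b)| a * b).sum()))
                .collect();
            scored.sort_by(|a, b| b.1.total_cmp(&a.1));
            scored.truncate(k);
            scored
        }
    }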
Search: I implemented a hybrid scoring algorithm, (Vector_Score * 0.5) + (Keyword_Score * 0.5), to balance semantic meaning with exact keyword matches.
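The blend itself is one line. The lexical side below is a toy stand-in (anything yielding a score in [0, 1] works), shown only to make the formula concrete; note that a raw cosine lands in [-1, 1], so it may need remapping first.

    /// Equal-weight blend of semantic and lexical relevance.
    /// Both inputs are assumed to already be scaled to [0, 1].
    fn hybrid_score(vector_score: f32, keyword_score: f32) -> f32 {
        vector_score * 0.5 + keyword_score * 0.5
    }

    /// Toy keyword scorer: fraction of query tokens found in the message.
    fn keyword_score(query: &str, message: &str) -> f32 {
        let msg = message.to_lowercase();
        let tokens: Vec<String> =
            query.to_lowercase().split_whitespace().map(str::to_owned).collect();
        if tokens.is_empty() {
            return 0.0;
        }
        let hits = tokens.iter().filter(|t| msg.contains(t.as_str())).count();
        hits as f32 / tokens.len() as f32
    }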
The app works 100% offline once the model weights (~90MB) are cached.

Live Demo: https://chat-vault-mh.vercel.app/
I'm a student looking to get deeper into systems programming/Rust. I'd love feedback on the memory management between JS and Wasm, as passing strings back and forth was the biggest bottleneck I encountered.
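To make the bottleneck concrete, one mitigation is batching: crossing the JS/Wasm boundary once per chunk of messages instead of once per message. A sketch of what I mean (the export name is illustrative, and embed() stands in for the real tokenizer + Candle pipeline):

    use wasm_bindgen::prelude::*;

    /// Hypothetical batched entry point: wasm-bindgen converts the whole JS
    /// array of strings into Wasm memory in one go, instead of paying one
    /// UTF-8 copy (and one call) per message.
    #[wasm_bindgen]
    pub fn index_messages(messages: Vec<String>) -> Vec<f32> {
        // Return all embeddings as one flat Float32Array (384 floats per
        // message); again a single copy back out to JS.
        messages.iter().flat_map(|m| embed(m)).collect()
    }

    /// Stand-in for the real tokenizer + Candle forward pass.
    fn embed(_text: &str) -> Vec<f32> {
        vec![0.0; 384]
    }

The tradeoff is memory: bigger batches amortize the copy overhead but keep more strings alive on both sides at once.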