Hi HN! I built Shimmy, a lightweight AI inference server that can now load HuggingFace SafeTensors models directly without any Python dependencies.
The core problem: I wanted to run HuggingFace models locally but didn't want the heavyweight Python ML stack. Most solutions require Python + PyTorch + the transformers library, which can add up to 2GB+ of dependencies alone.
What's new in v1.2.0:
• Native SafeTensors support - loads .safetensors files directly in Rust
• 2x faster model loading compared to traditional formats
• Zero Python dependencies - pure Rust implementation
• Still just a 5MB binary (vs 50MB+ alternatives like Ollama)
• Full OpenAI API compatibility, so it works as a drop-in replacement (see the example below)
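To illustrate the drop-in claim, here's a minimal sketch of hitting shimmy through the standard OpenAI chat-completions route using reqwest. The port and model name are assumptions on my part; substitute whatever your shimmy instance actually reports.

```rust
// Cargo.toml: reqwest = { version = "0.12", features = ["blocking", "json"] }
//             serde_json = "1"
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed base URL; point this at wherever shimmy is listening.
    let base = "http://127.0.0.1:11435";

    // Standard OpenAI-style chat completion request; the model name
    // ("phi-3-mini") is a placeholder for whatever model you've loaded.
    let body = json!({
        "model": "phi-3-mini",
        "messages": [{ "role": "user", "content": "Say hello in five words." }]
    });

    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post(format!("{base}/v1/chat/completions"))
        .json(&body)
        .send()?
        .json()?;

    // The response follows the OpenAI schema: choices[0].message.content.
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```

Anything already speaking the OpenAI API (SDKs, LangChain, etc.) should only need the base URL changed.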
Technical details:
- Built with native SafeTensors parsing in Rust, not Python bindings (see the parsing sketch after this list)
- Memory-efficient tensor loading with bounds checking
- Tested on 100MB+ models with sub-second load times
- Cross-platform: Windows, macOS (Intel/ARM), Linux
- Supports mixed model formats (GGUF + SafeTensors)
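For those curious about the format itself: a .safetensors file is an 8-byte little-endian length prefix, then a JSON header mapping tensor names to dtype/shape/offsets, then the raw tensor bytes. This is not shimmy's actual loader, just a minimal sketch of the header-parsing step with the kind of bounds checking mentioned above:

```rust
// Minimal SafeTensors header parse (illustrative, not shimmy's code).
// Cargo.toml: serde_json = "1"
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = fs::read("model.safetensors")?; // path is a placeholder

    // First 8 bytes: little-endian u64 giving the JSON header's length.
    let prefix: [u8; 8] = bytes
        .get(..8)
        .ok_or("file too short for length prefix")?
        .try_into()?;
    let header_len = u64::from_le_bytes(prefix) as usize;

    // Bounds-check before slicing so a corrupt length can't over-read.
    let header_end = 8usize
        .checked_add(header_len)
        .filter(|&end| end <= bytes.len())
        .ok_or("header length exceeds file size")?;

    // The header is JSON: tensor name -> { dtype, shape, data_offsets }.
    let header: serde_json::Value = serde_json::from_slice(&bytes[8..header_end])?;
    for (name, meta) in header.as_object().ok_or("header is not a JSON object")? {
        if name.as_str() == "__metadata__" {
            continue; // optional free-form metadata block
        }
        println!("{name}: dtype={} shape={}", meta["dtype"], meta["shape"]);
    }
    // Tensor data follows at bytes[header_end..], addressed by each
    // tensor's data_offsets relative to that point.
    Ok(())
}
```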
This bridges the gap between HuggingFace's model ecosystem and lightweight local deployment. You can now grab any SafeTensors model from HuggingFace and run it locally with just a single binary.
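As a concrete example of that workflow, one way to pull a SafeTensors file from the Hub without any Python is HuggingFace's own hf-hub Rust crate. This is separate from shimmy, and the repo id and filename below are just examples:

```rust
// Cargo.toml: hf-hub = "0.3" (sync API; version/features may differ, check the docs)
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Downloads (and caches) the file, returning a local path you can
    // then hand to shimmy or any other SafeTensors consumer.
    let path = Api::new()?
        .model("gpt2".to_string()) // example repo id
        .get("model.safetensors")?;
    println!("downloaded to {}", path.display());
    Ok(())
}
```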
GitHub: https://github.com/Michael-A-Kuykendall/shimmy
Install: `cargo install shimmy`
Happy to answer questions about the SafeTensors implementation or Rust AI inference in general!