Built this because deploying ML models is painful. Python + pip dependencies + Docker = 5GB+ images. For edge/embedded/air-gapped systems, this doesn't work.
LOOM loads HuggingFace transformers directly in Go. No Python runtime. ~10MB binary.
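To give a feel for it, here's roughly what inference looks like. Every name below (the import path, loom.Load, Generate, MaxTokens) is a hypothetical placeholder for illustration, not the real API — check the repo for the actual entry points:

```go
package main

import (
	"fmt"

	"example.com/loom" // hypothetical import path, not the real module
)

func main() {
	// Hypothetical call: point at a HuggingFace model directory
	// (config.json, tokenizer.json, *.safetensors) and load it.
	model, err := loom.Load("./qwen2.5-0.5b-instruct")
	if err != nil {
		panic(err)
	}

	// Hypothetical call: tokenize, run the forward pass, decode.
	out, err := model.Generate("Explain BPE in one sentence.", loom.MaxTokens(64))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```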
Technical highlights:
- Native safetensors parser (format sketch below)
- Pure Go BPE tokenizer (no transformers library)
- Full transformer stack (MHA, GQA, RMSNorm, SwiGLU)
- Cross-platform determinism (MAE < 1e-8)
- Published to PyPI, npm, NuGet
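On the safetensors point: the format itself is simple — 8 bytes of little-endian header length, then a JSON table mapping tensor names to dtype/shape/byte-offsets, then the raw tensor data. This isn't LOOM's actual parser, just a minimal standalone sketch of reading that header (the filename is a placeholder):

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"os"
)

// tensorInfo is one entry in the safetensors JSON header:
// dtype, shape, and [begin, end) byte offsets into the data section.
type tensorInfo struct {
	Dtype       string   `json:"dtype"`
	Shape       []int64  `json:"shape"`
	DataOffsets [2]int64 `json:"data_offsets"`
}

func main() {
	raw, err := os.ReadFile("model.safetensors") // placeholder path
	if err != nil {
		panic(err)
	}

	// First 8 bytes: little-endian u64 giving the JSON header length.
	hdrLen := binary.LittleEndian.Uint64(raw[:8])
	header := raw[8 : 8+hdrLen]

	var entries map[string]json.RawMessage
	if err := json.Unmarshal(header, &entries); err != nil {
		panic(err)
	}
	for name, meta := range entries {
		if name == "__metadata__" { // optional string map, not a tensor
			continue
		}
		var info tensorInfo
		if err := json.Unmarshal(meta, &info); err != nil {
			panic(err)
		}
		// The tensor bytes live at raw[8+hdrLen+begin : 8+hdrLen+end].
		fmt.Printf("%-40s %-6s %v\n", name, info.Dtype, info.Shape)
	}
}
```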
Tradeoff: CPU-only, 1-3 tok/s on small models. Correctness first, speed second.
Works with Qwen, Llama, Mistral, SmolLM. Cross-compiles everywhere Go runs.
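Cross-compiling is just the stock Go toolchain flow — since it's pure Go, CGO_ENABLED=0 works fine (output names here are placeholders):

```
CGO_ENABLED=0 GOOS=linux   GOARCH=arm64 go build -o loom-linux-arm64 .
CGO_ENABLED=0 GOOS=darwin  GOARCH=arm64 go build -o loom-darwin-arm64 .
CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build -o loom-windows-amd64.exe .
```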
Demo: https://youtu.be/86tUjFWow60
What layer types should I add next? Currently have: Dense, Conv2D, MHA, RNN, LSTM, LayerNorm, RMSNorm, SwiGLU, Softmax (10 variants), Residual.
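For a sense of what one of those layers amounts to, here's the standard RMSNorm computation, y[i] = x[i] * g[i] / sqrt(mean(x^2) + eps) — a textbook sketch, not LOOM's actual implementation:

```go
package main

import (
	"fmt"
	"math"
)

// rmsNorm computes y[i] = x[i] * gain[i] / sqrt(mean(x^2) + eps).
// Standard RMSNorm definition; a sketch, not LOOM's actual code.
func rmsNorm(x, gain []float64, eps float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	inv := 1.0 / math.Sqrt(sumSq/float64(len(x))+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * gain[i]
	}
	return out
}

func main() {
	x := []float64{1, 2, 3, 4}
	g := []float64{1, 1, 1, 1}
	fmt.Println(rmsNorm(x, g, 1e-6))
}
```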
Questions welcome!