We built Xybrid, a Rust library for running LLM + speech pipelines directly inside your app: no server, no daemon, just one binary.
We started building it while working on a privacy-focused LLM app with Tauri and realized there wasn’t a straightforward way to embed models directly into shipped applications without relying on a separate server process.
Xybrid links into your process like any other library. It supports GGUF, ONNX, and CoreML model formats and integrates with Flutter, Swift, Kotlin, Unity, and Tauri, letting you run pipelines like speech → LLM → speech in a single call.
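To give a feel for the "single call" idea, here's a pseudocode-style Rust sketch. The type and method names below are hypothetical illustrations, not Xybrid's actual API; the point is the shape, with every stage running in-process:

```rust
// Hypothetical sketch — these names are NOT Xybrid's real API.
// Illustrates an embedded speech → LLM → speech pipeline where
// every stage runs inside your own process, no server involved.
let pipeline = Pipeline::builder()
    .stt("whisper-small.gguf")   // speech-to-text stage
    .llm("llama-3b-q4.gguf")     // small quantized language model
    .tts("piper-voice.onnx")     // text-to-speech stage
    .build()?;

// One call: raw microphone audio in, synthesized reply audio out,
// entirely on-device.
let reply_audio = pipeline.run(mic_audio)?;
```

See the repo for the real interface.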
On recent phones, we’re seeing ~20 tok/s on Android and ~40 tok/s on iOS for small (~3B) quantized models (varies by device, backend, and thermals).
The demo that shows it best: a Unity tavern scene where 6 NPCs generate real-time dialogue fully on-device — no API key, no internet, no per-request cost.
Unity demo: https://youtu.be/vSPeTyeow6A

Desktop demo (Tauri): https://youtu.be/o83YShqV7O4
GitHub: https://github.com/xybrid-ai/xybrid
It’s still early — there are rough edges, especially around model support and performance tuning. Happy to answer questions about the architecture, backends, or integrations (Flutter, Swift, Kotlin, Unity, Tauri).