It wraps Resemble AI’s Chatterbox ONNX models in a Rust CLI (ONNX Runtime under the hood). The goal: “I want good TTS in a shell script” without juggling Python environments, pip, or venvs.
Quick start:

  cbx speak --text "Hello from cbx." --voice-wav ./your-voice.wav --out-wav ./output.wav
The first run downloads the model files (~1–2 GB depending on variant); after that everything runs locally. If you’re doing repeated runs with the same reference voice, you can cache the voice encoding once:

  cbx voice add --name myvoice --voice-wav ./your-voice.wav
  cbx speak --voice myvoice --text "Much faster now." --out-wav ./output.wav
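To show the kind of shell-script use this is aimed at, here’s a minimal batch sketch that generates one WAV per line of input using a cached voice profile. The `cbx speak --voice ... --text ... --out-wav ...` form is from the usage above; the profile name `myvoice`, the output file names, and the `DRY_RUN` guard are hypothetical choices for illustration, not part of the tool.

```shell
#!/bin/sh
# Hypothetical batch loop: one output WAV per line of text, reusing a
# voice profile assumed to have been created with `cbx voice add`.
# DRY_RUN=1 (the default here) only prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}
i=0
while IFS= read -r line; do
  i=$((i + 1))
  cmd="cbx speak --voice myvoice --text \"$line\" --out-wav ./out-$i.wav"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"   # show what would be executed
  else
    cbx speak --voice myvoice --text "$line" --out-wav "./out-$i.wav"
  fi
done <<'EOF'
First line to speak.
Second line to speak.
EOF
```

Because the reference clip is only encoded once (at `cbx voice add` time), the per-line cost in a loop like this stays at synthesis, not re-encoding.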
What it does (intentionally small surface area):

- single binary, cross-platform CLI
- built-in model download/list/clean commands
- voice profile caching (avoid re-encoding the reference clip every run)
What it doesn’t do:

- it’s not the full Chatterbox project (multilingual, fine-tuning, etc.). It’s a packaging + UX layer for basic TTS.
Slightly counterintuitive perf note: on an M1 MacBook Pro, CPU ended up faster than CoreML for this model due to accelerator partitioning overhead; numbers are in the README.
If you try it, I’m especially interested in feedback on: install/packaging trust, cache layout, and what you’d want from a “tiny model / fast mode”.