It wraps mlx-audio and handles the full lifecycle: bootstraps its own Python environment via uv, downloads the model on first run, manages the server process, and auto-restarts on crash. On top of that it exposes a standard OpenAI-compatible /v1/audio/speech endpoint.
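Because the endpoint follows the standard OpenAI speech request shape, any HTTP client can talk to it. Here is a minimal sketch of a raw call; the port, model id, and voice name are illustrative assumptions, not the plugin's documented defaults:

    # Sketch: POST the standard OpenAI speech payload to the local server.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/audio/speech",  # assumed port
        json={
            "model": "kokoro",    # assumed model id
            "input": "Hello from a local TTS server.",
            "voice": "af_heart",  # assumed voice name
            "response_format": "mp3",
        },
    )
    resp.raise_for_status()
    with open("speech.mp3", "wb") as f:
        f.write(resp.content)  # response body is raw audio bytes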
Installation:

    openclaw plugin install @cosformula/openclaw-mlx-audio

Four models out of the box:
• Kokoro-82M: ~400 MB RAM, fastest, good for English/Japanese
• Qwen3-TTS-0.6B: ~1.4 GB RAM, best Chinese quality, 3-second voice cloning
• Qwen3-TTS-1.7B VoiceDesign: generate voices from text descriptions
• Chatterbox: 16 languages, ~3.5 GB RAM
Works on 8 GB Macs with Kokoro or Qwen3-0.6B. A proxy layer injects model-specific parameters so OpenClaw's TTS client needs zero changes.
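The zero-change claim comes down to the usual base-URL override: an unmodified OpenAI SDK client can point straight at the local server. A sketch with the official Python SDK, where the base URL, model id, and voice are again assumptions:

    # Sketch: an unmodified OpenAI client pointed at the local server.
    from openai import OpenAI

    # Only base_url changes; a local server typically ignores the API key.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    response = client.audio.speech.create(
        model="kokoro",    # assumed model id
        voice="af_heart",  # assumed voice name
        input="Local speech synthesis on Apple silicon.",
    )
    with open("speech.mp3", "wb") as f:
        f.write(response.content)  # binary audio payload

Switching back to OpenAI's hosted TTS is the same one-line change in reverse.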
Why not just run mlx-audio directly? You can. This plugin removes the setup friction: no Python version juggling, no pip install, no manual server management. It also adds OOM detection, memory pre-checks, startup progress tracking, and hot config reload.
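To give a rough idea of what a memory pre-check can look like, here is an illustrative sketch, not the plugin's actual code; psutil and the headroom factor are assumptions, with footprints taken from the model list above:

    # Illustrative memory pre-check, not the plugin's implementation.
    import psutil

    # Approximate resident footprints from the model list above.
    MODEL_RAM_BYTES = {
        "kokoro-82m": 400 * 1024**2,           # ~400 MB
        "qwen3-tts-0.6b": int(1.4 * 1024**3),  # ~1.4 GB
        "chatterbox": int(3.5 * 1024**3),      # ~3.5 GB
    }

    def precheck(model: str, headroom: float = 1.2) -> None:
        """Refuse to start the server if the model won't fit in free RAM."""
        need = int(MODEL_RAM_BYTES[model] * headroom)  # safety margin
        avail = psutil.virtual_memory().available
        if avail < need:
            raise MemoryError(
                f"{model} needs ~{need >> 20} MiB, only {avail >> 20} MiB free"
            )

    precheck("kokoro-82m")  # fails fast, before the server process is spawned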