So I built OpenToys so that anyone with an ESP32 can create their own AI toys that run inference locally (starting with Apple Silicon) and keep their data from ever leaving the home network.
The repo currently supports voice cloning and multilingual conversations in 10 languages, all running locally. The app is a Rust Tauri shell with a Python sidecar that runs the voice pipeline: Whisper for STT, any MLX LLM in the middle, and Qwen3-TTS or Chatterbox-Turbo for TTS.
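At its core, the sidecar is a three-stage loop: audio in, transcript out of STT, a reply out of the LLM, and synthesized audio out of TTS. Here's a minimal sketch of that flow in Python; the `VoicePipeline` class and stub stages are hypothetical illustrations, not the actual OpenToys API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the STT -> LLM -> TTS loop. Each stage is injected
# as a callable so real models (Whisper, an MLX LLM, Qwen3-TTS/Chatterbox)
# could be swapped in without changing the control flow.
@dataclass
class VoicePipeline:
    stt: Callable[[bytes], str]   # speech-to-text: audio -> transcript
    llm: Callable[[str], str]     # language model: transcript -> reply text
    tts: Callable[[str], bytes]   # text-to-speech: reply text -> audio

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)

# Stub stages stand in for the real models so the flow runs anywhere.
pipeline = VoicePipeline(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"echo: {text}",
    tts=lambda text: text.encode("utf-8"),
)
print(pipeline.respond(b"hello toy"))  # b'echo: hello toy'
```

Keeping each stage behind a plain callable is also what makes the local-only guarantee easy to audit: nothing in the loop touches the network unless a stage explicitly does.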