EchoKit is a DIY, open-source voice agent running on an ESP32-S3. The fun part is the server backend, which I wrote entirely in Rust to handle the AI pipeline (ASR, LLM, TTS).
The stack is:
Hardware: EchoKit board (ESP32-S3)
Firmware: ESP-IDF
Server: Rust (Actix Web/Tungstenite)
AI: Customizable pipeline. The tutorial uses Groq to host the Whisper (ASR), Llama 3 (LLM), and TTS models, which keeps latency low: the full ASR -> LLM -> TTS round trip usually takes just a few seconds.
It's designed so that makers, students, or anyone curious about AI can get it built in just a few minutes. You can modify the system prompts, swap out models, or even add custom actions (Step 6 in the guide).
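To make the ASR -> LLM -> TTS round trip and the "swap out models" idea concrete, here is a minimal sketch in plain Rust. It is not taken from echokit_server: the trait names (`Asr`, `Llm`, `Tts`), the `Pipeline` struct, and the stub stages are all hypothetical, and the stubs stand in for real network calls to a provider such as Groq.

```rust
// Hypothetical stage traits; the real server's types may look quite different.
trait Asr {
    fn transcribe(&self, audio: &[i16]) -> Result<String, String>;
}
trait Llm {
    fn reply(&self, system_prompt: &str, user_text: &str) -> Result<String, String>;
}
trait Tts {
    fn synthesize(&self, text: &str) -> Result<Vec<i16>, String>;
}

// One voice turn: audio in, audio out. Each stage is boxed so it can be swapped.
struct Pipeline {
    asr: Box<dyn Asr>,
    llm: Box<dyn Llm>,
    tts: Box<dyn Tts>,
    system_prompt: String,
}

impl Pipeline {
    // Runs ASR, then the LLM, then TTS, stopping at the first stage that fails.
    fn run_turn(&self, audio: &[i16]) -> Result<Vec<i16>, String> {
        let text = self.asr.transcribe(audio)?;
        let answer = self.llm.reply(&self.system_prompt, &text)?;
        self.tts.synthesize(&answer)
    }
}

// Stub stages used only for illustration; they do no real inference.
struct StubAsr;
impl Asr for StubAsr {
    fn transcribe(&self, audio: &[i16]) -> Result<String, String> {
        if audio.is_empty() {
            return Err("empty audio".to_string());
        }
        Ok("hello".to_string())
    }
}

struct StubLlm;
impl Llm for StubLlm {
    fn reply(&self, _system_prompt: &str, user_text: &str) -> Result<String, String> {
        Ok(format!("You said: {user_text}"))
    }
}

struct StubTts;
impl Tts for StubTts {
    // Pretends to synthesize speech by emitting one fake sample per byte of text.
    fn synthesize(&self, text: &str) -> Result<Vec<i16>, String> {
        Ok(text.bytes().map(i16::from).collect())
    }
}

// Builds a pipeline from the stubs; swapping a model means swapping one Box.
fn stub_pipeline() -> Pipeline {
    Pipeline {
        asr: Box::new(StubAsr),
        llm: Box::new(StubLlm),
        tts: Box::new(StubTts),
        system_prompt: "You are a helpful voice assistant.".to_string(),
    }
}
```

Calling `stub_pipeline().run_turn(&samples)` on a non-empty buffer returns the fake audio for the reply. In a real setup each stub would be replaced by a client for the chosen model, and changing the assistant's behavior would come down to editing `system_prompt`.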
The tutorial (linked) walks through assembly, flashing, and setting up the server. The server code is on GitHub (also linked).
Happy to answer any questions. Would love to hear your thoughts and what you think we could build with this!
Server Repo: https://github.com/second-state/echokit_server