The goal was to see how responsive an LLM → speech system can be on normal laptops or edge devices.
It includes:

- Voice Activity Detection
- CPU-friendly LLM + TTS streaming
- Async pipeline to reduce latency
- Modular LLM backend
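
For anyone wondering what the async part buys you, here is a rough sketch of the general idea, not the repo's actual code. The LLM stage pushes each finished sentence into a queue, and the TTS stage starts synthesizing as soon as the first one arrives instead of waiting for the full reply. `fake_llm_stream`, `fake_tts`, and `run_pipeline` are made-up placeholders; a real setup would swap in whatever streaming LLM and TTS backends are in use.

```python
import asyncio
import re

# A chunk counts as "speakable" once it ends in sentence punctuation.
SENTENCE_END = re.compile(r"[.!?]\s*$")


async def fake_llm_stream(prompt):
    """Hypothetical stand-in for a streaming LLM backend (yields tokens)."""
    for token in ["Hello", " there", ".", " How", " can", " I", " help", "?"]:
        await asyncio.sleep(0.01)  # simulated per-token generation time
        yield token


async def fake_tts(sentence):
    """Hypothetical stand-in for a TTS engine (returns a fake audio handle)."""
    await asyncio.sleep(0.02)  # simulated synthesis time
    return f"<audio:{sentence}>"


async def llm_stage(prompt, out_q):
    """Accumulate tokens and emit each complete sentence as soon as it ends."""
    buf = ""
    async for token in fake_llm_stream(prompt):
        buf += token
        if SENTENCE_END.search(buf):
            await out_q.put(buf.strip())
            buf = ""
    if buf.strip():  # flush any trailing text without punctuation
        await out_q.put(buf.strip())
    await out_q.put(None)  # sentinel: no more sentences


async def tts_stage(in_q, spoken):
    """Synthesize sentences in arrival order while the LLM keeps generating."""
    while True:
        sentence = await in_q.get()
        if sentence is None:
            break
        spoken.append(await fake_tts(sentence))


async def run_pipeline(prompt):
    # A small bounded queue keeps the LLM from running far ahead of playback.
    q = asyncio.Queue(maxsize=4)
    spoken = []
    await asyncio.gather(llm_stage(prompt, q), tts_stage(q, spoken))
    return spoken
```

Running `asyncio.run(run_pipeline("hi"))` returns the synthesized chunks in order. The latency win is that time-to-first-audio depends on the first sentence only, not on the length of the whole response.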
Useful for local assistants, robotics prototypes, privacy-first setups, or benchmarking STT/LLM/TTS latency.
We’ve been experimenting with similar CPU-first pipelines inside NEO workflows for on-device agents, and this repo is a minimal standalone version.
Would love suggestions on lightweight STT/TTS models or latency tricks people have used on CPU.