We’ve been exploring what’s possible for low-latency, privacy-first voice agents, and we’ve just released a demo: Agent Santa.
The entire speech-to-text → LLM → text-to-speech loop runs locally on a sub-$250 NVIDIA Jetson Orin Nano.
The ML Stack:
- STT: OpenAI Whisper tiny.en
- LLM: LiquidAI’s 700M-parameter LFM2
- TTS: Our NeuTTS (zero-cost cloning, high quality)
The whole thing consumes under 4GB RAM and 2GB VRAM. This showcases that complex, multi-model AI can be fully deployed on edge devices today.
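For anyone curious how such a loop could be wired up, here's a minimal sketch using off-the-shelf openai-whisper and llama-cpp-python bindings. This is illustrative glue code, not our actual demo code: the GGUF model path is an assumed quantized LFM2 export, and the NeuTTS call is a hypothetical stub since its API isn't shown in this post.

```python
# Illustrative sketch: Whisper (STT) -> LFM2 (LLM) -> TTS glue loop.
# Not the demo's actual code; model path and synthesize() are stand-ins.
import whisper
from llama_cpp import Llama

stt = whisper.load_model("tiny.en")            # Whisper tiny.en for STT
llm = Llama(model_path="lfm2-700m-q4_0.gguf")  # assumed quantized LFM2 export

def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the NeuTTS call."""
    raise NotImplementedError

def one_turn(wav_path: str) -> bytes:
    text = stt.transcribe(wav_path)["text"]    # speech -> text
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": text}],
        max_tokens=128,
    )["choices"][0]["message"]["content"]      # text -> reply text
    return synthesize(reply)                   # reply text -> audio
```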
We'd love to hear your feedback on the latency and potential applications for this level of extreme on-device efficiency.
digdugdirk•1mo ago
It looks cool, and I'm 100% behind the idea, but I'm more curious about what could be done on hardware that we all have broader access to, without requiring a standalone, special-purpose device.
What are the options for midlevel (or older flagship) smartphones? Used PCs? Macbooks with broken screens?
neuphonic•1mo ago
TTS models can be deployed on most devices (we do a lot of CPU-only work), but the real bottleneck for these on-device deployments is the LLM. Even a "small" 700M-parameter LLM was problematic. Also, running three models in parallel on a single device is not straightforward.
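For illustration, one common pattern (a sketch, not our exact setup): give each model its own worker thread and connect the stages with bounded queues, so STT, LLM, and TTS can overlap across turns instead of blocking one another. The `transcribe`/`generate`/`synthesize` functions below are hypothetical stand-ins for the three models.

```python
# Generic three-stage pipeline pattern (illustrative sketch only):
# each stage runs in its own thread, handing work downstream via queues.
import queue
import threading

# Hypothetical stand-ins for the three models; swap in real calls.
def transcribe(audio: bytes) -> str:  return "hello"     # Whisper goes here
def generate(text: str) -> str:       return "hi there"  # LFM2 goes here
def synthesize(text: str) -> bytes:   return b"\x00"     # NeuTTS goes here

def stage(inbox: queue.Queue, outbox: queue.Queue, fn) -> None:
    """Pull an item, run one model on it, push the result downstream."""
    while True:
        item = inbox.get()
        if item is None:            # sentinel: propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(fn(item))

# Bounded queues apply backpressure so a slow stage can't pile up work.
audio_in, text_q, reply_q, speech_out = (queue.Queue(maxsize=4) for _ in range(4))

for inbox, outbox, fn in [(audio_in, text_q, transcribe),
                          (text_q, reply_q, generate),
                          (reply_q, speech_out, synthesize)]:
    threading.Thread(target=stage, args=(inbox, outbox, fn), daemon=True).start()

audio_in.put(b"...mic chunk...")  # feed one utterance through the pipeline
print(speech_out.get())           # synthesized reply audio comes out the end
```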