Hey HN, I built this because I wanted a voice AI assistant in my Discord
server without paying for cloud services.
How it works: the bot captures each user's speech from a per-user Opus
stream, transcribes it with Groq Whisper (free tier), generates a reply
with LLaMA 3.1 (also free via Groq), and speaks it back into the voice
channel using ElevenLabs, with Google TTS as a fallback.
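
To make that flow concrete, here is a minimal sketch with each stage passed in as a plain async function. It is an illustration under assumptions, not the bot's actual code: `handleUtterance`, `transcribe`, `generateReply`, `primaryTts`, and `fallbackTts` are hypothetical names, and none of them is a real Groq, ElevenLabs, or Google API call.

```javascript
// Sketch of the per-utterance pipeline: speech -> text -> reply -> audio.
// All stage functions are hypothetical and injected by the caller.
async function handleUtterance(audio, stages) {
  const { transcribe, generateReply, primaryTts, fallbackTts } = stages;

  // 1. Speech-to-text (Groq Whisper in the post).
  const text = (await transcribe(audio)).trim();
  if (text === '') return null; // nothing intelligible was said

  // 2. Generate a reply (LLaMA 3.1 via Groq in the post).
  const reply = await generateReply(text);

  // 3. Text-to-speech: try the primary provider, use the fallback on failure.
  let speech;
  try {
    speech = await primaryTts(reply);
  } catch (err) {
    speech = await fallbackTts(reply);
  }

  return { text, reply, speech };
}
```

Injecting the stages keeps the control flow testable with stubs, so the fallback path can be exercised without touching any network API.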
The trickiest part was the echo guard: the bot needs to ignore its own
voice when it gets picked up through other users' speakers and mics. I
solved this with a time-window filter applied after each bot utterance.
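
A time-window filter of this kind could look roughly like the sketch below. The class name `EchoGuard`, its methods, and the 1500 ms default are assumptions made for illustration; the post does not state the real window length or whether input is also dropped while the bot is still speaking.

```javascript
// Sketch of a time-window echo guard (hypothetical names and default value).
class EchoGuard {
  // `now` is injectable so the guard can be tested with a fake clock.
  constructor(windowMs = 1500, now = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
    this.ignoreUntil = 0;
  }

  // Call when the bot finishes an utterance; opens the ignore window.
  botFinishedSpeaking() {
    this.ignoreUntil = this.now() + this.windowMs;
  }

  // True if audio captured right now should be dropped as probable echo.
  shouldIgnore() {
    return this.now() < this.ignoreUntil;
  }
}
```

In use, the speech handler would check `guard.shouldIgnore()` before sending captured audio for transcription, and the playback code would call `guard.botFinishedSpeaking()` once the reply has been played.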
Stack: Node.js, discord.js v14, @discordjs/voice, Groq API, ElevenLabs.
Single file, ~1400 lines, no build step.
agentzz•1h ago
Happy to answer questions about the architecture!