Hi HN,
We’re the Dograh team (YC alumni). While building voice bots, we found that wiring WebRTC/telephony + STT + LLM + TTS took more time than the bots themselves. Teams spend weeks on plumbing: handling call flows, extracting variables, dealing with telephony edge cases, and redeploying for small changes. Tools like Vapi/Retell are easy to start with but come with lock-in and platform fees. So we built Dograh: a 100% open-source platform that handles the full stack, with a visual workflow builder and self-hosting by default.
Dograh v1.20 introduces two major additions:

1. Gemini 3.1 Live support: run fully real-time voice agents using Gemini’s streaming APIs, without stitching together separate STT + LLM + TTS components.

2. Pre-recorded audio (hybrid voice): upload real voice clips (greetings, confirmations, etc.), and the agent plays them instantly, using TTS only for dynamic responses. This reduces latency, improves naturalness, and cuts TTS costs.
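For anyone curious how the hybrid-voice idea works, here is a minimal sketch of the routing logic: fixed phrases play from pre-recorded clips, and everything else falls back to TTS. All names here (the clip table, the function) are illustrative, not Dograh’s actual API.

```python
# Hypothetical sketch of hybrid voice routing: known phrases map to
# pre-recorded clips; dynamic responses go to the TTS engine.
PRERECORDED = {
    "greeting": "clips/greeting.wav",
    "confirmation": "clips/confirmation.wav",
}

def route_utterance(kind: str, text: str) -> tuple[str, str]:
    """Return (source, payload): a clip path for known phrase kinds,
    or the raw text to hand to the TTS engine otherwise."""
    if kind in PRERECORDED:
        # Instant playback from disk: no synthesis latency, no TTS cost.
        return ("clip", PRERECORDED[kind])
    # Dynamic content (names, amounts, answers) still needs synthesis.
    return ("tts", text)
```

The win comes from the fact that a large share of call audio (openers, confirmations, hold messages) is identical across calls, so it can be recorded once with a real voice.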
It also includes:
- Plug-and-play LLM / STT / TTS (including self-hosted models)
- Telephony integrations (Twilio, Vonage, Telnyx), plus call transfer
- Post-call QA, transcripts, and variable extraction
- Observability via Langfuse (OpenTelemetry traces + prompt playground)
Try it now: if you have Docker, the command below gets you a 2-minute setup (no API keys needed out of the box).
https://gist.github.com/a6kme/072252bf885270787bbb8376687c67... [sorry, HN won't let me post the entire command]
Looking ahead: we’re expanding self-hosted model support. You can already bring any LLM (e.g. Llama, Qwen), STT (e.g. Voxtral), or TTS (e.g. Kokoro) by pointing Dograh at their API endpoints. We’re working on updates that will let anyone run everything on a single server: your AI models alongside Dograh orchestration.
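As an illustration of what "configuring API endpoints" looks like in practice, here is a hedged sketch of wiring a self-hosted, OpenAI-compatible model server. The variable names below are hypothetical placeholders, not Dograh’s actual settings; check the repo docs for the real keys.

```shell
# Hypothetical config sketch: point the orchestrator at self-hosted model servers.
# e.g. vLLM serving Llama/Qwen behind an OpenAI-compatible endpoint:
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_MODEL="meta-llama/Llama-3.1-8B-Instruct"
# e.g. a local Kokoro TTS server:
export TTS_BASE_URL="http://localhost:8880/v1"
```

The design choice here (standardizing on OpenAI-compatible HTTP endpoints) is what makes models swappable without code changes: any server speaking that protocol can slot in.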
Looking forward to hearing the community’s thoughts.