Show HN: Dograh – an OSS Vapi alternative to quickly build and test voice agents

16•a6kme•2mo ago

Hi HN, I have been building voice agents for sometime now. I was earlier automating parts of visa processing, and we needed real-time, multilingual voice calling.

I assumed the hard work was just wiring LiveKit/Pipecat + STT/TTS + an LLM. It wasn’t.

Even with solid OSS (Pipecat/LiveKit), we still had to do a lot of plumbing- variable extraction, tracing, testing etc and any workflow changes required constant redeploys.

We eventually realized we’d spent more time building infrastructure than building the actual agents. Everything felt custom. We hit every possible pain with Pipecat and VAPI style systems.

So we built Dograh - a fully open-source voice agent framework that includes all the boring, painful pieces by default.

What’s different:

- Pipecat-based engine, but forked - custom event model, and concurrency fixes

- One-click start template generated by an LLM Agent for a quick get start template for any use case

- Drag-and-drop visual agent builder for quick iteration (the thing we wished existed earlier)

- Variable extraction layer (name/order/date/etc.) baked into the LLM loop

- Built in Telephony integration (Twilio/ Vonage/ Vobiz/ Cloudonix)

- Multilingual support end-to-end

- Select any LLM TTS STT (add their credits, if any)

- AI-to-AI call testing: automatically stress-test an agent before shipping (still a work in progress- so patchy as of now)

- Fully Open Source

It's built and maintained by YC alumni / exit founders who got tired of rebuilding the same plumbing.

Why we open-sourced it: We kept feeling that the space was drifting toward closed SaaS abstractions (VAPI, Retell). Those are good for demos, but once you need data controls, privacy or self/offline deployment, you end up stuck. We wanted a stack where you can see every part, fork it, self-host it, and patch it as needed.

Try it:

- Repo: https://github.com/dograh-hq/dograh

This spins up a basic multilingual agent with everything pre-wired.

Who this is for:

- If you are looking for self hosting a Vapi like platform for Data Privacy etc.

- Anyone trying to build production-grade voice agents without reinventing audio plumbing.

- If you’ve tried to glue STT→LLM→TTS manually, you probably know the exact pain this is built for

Happy to answer technical questions, show the architecture, or hear how we can improve the product.

Comments

a6kme•2mo ago

Earlier I was using other platforms for production voice agents. One thing that became obvious was the cost: 60–70% of our total spend was the Vapi platform fee, and only 30-40% was actual LLM/STT/TTS usage. Platform cost dominated everything. That alone pushed us toward something self-hosted.

But when we switched to OSS stacks (Pipecat, LiveKit), we realise that even with great OSS, the plumbing was still painful and necessary- no standard way to extract variables from conversations (name/date/order ID), no straightforward tracing of LLM calls, no way to run AI-to-AI test loops, and no fast workflow iteration - and every change meant another redeploy.

The infrastructure glue kept ballooning, and each time it felt like rebuilding the same system from scratch.

Dograh came out of that combination of cost pain and integration pain. Happy to dig deeper into anything.

pritesh1908•2mo ago

Hey HN, sometime back someone on HN asked for an open-source alternative for Vapi or Retell and we replied there (https://news.ycombinator.com/item?id=45884165) That thread just confirmed otehrs running into the same problems we had been dealing with. Now Dograh is more mature.

We are happy to share some technical details for anyone interested. A lot of Dograh’s internal work went into extending the functionality of the pipeline by including custom Frames and Processors, creating a ReactFlow based visual agent builder and creating an Engine that can parse that Agent JSON and call conversational LLM loops with function calling. Also we enhanced the functionality by creating easier access to extracted variables, call transcripts and recordings - things that are needed in any production deployment.

One thing we are still trying to understand better: how teams handle long-running conversations while keeping context tight and cheap. Would love to hear how others have approached that.

eddywebs•1mo ago

Just did a test drive, CONGRATULATIONS first of all for getting this launched. Few pointers:

1) It would be great to provide different voice personas like vapi does maybe it's there already but couldn't find the config. 2) My agent reported some lag in getting responses during the call, perhaps that's just resource issue ?

Either Way you're to a great start and I look forward for this project to grow, starred the repo on GH,I think I was the 100th one :).

a6kme•1mo ago

Hello. Thank you for trying out Dograh and being our 100th Github Star.:)

1. Having different voice personas selector like Vapi is in our pipeline. 2. The lag can be either because of system resource constraints, or due to LLM Inference Lags from the LLM inference providers. We are constantly trying to squeeze out every milisecond to combat the latency issues.

Thank you again for your kind words.

Multicomp•1mo ago

Thank you for sharing your hard work with the world! I get to play with these AI technologies without having to train my own model or wire up an entire composition because of precompiled systems ither have made and shared, like yours.

I hope you find product market fit and are able to do what you desire with this product. In the meantime, I am grateful that you are helping us advance towards the Star Trek Voice Computer being defictionalized!

a6kme•1mo ago

Thank you for your kind words.

Among many other useful and fun things, yes, the dream of having a Star Trek Voice Computer or the good HAL is not very far away. :)

android521•1mo ago

is end to end speech model like openai real time /gemini live or open source qwen 3 omni better in terms of latency?

a6kme•1mo ago

There is always a tradeoff between latency and reasoning. The bigger the model, the more stuff we can get it to do by better instruction following, but it comes at a cost of increased latency. OpenSource colocated smaller models do much better in terms of latency, but the instruction following is not that great, and we might have to tune the prompts much more than tuning for bigger models.

brihati•1mo ago

Thank you so much for sharing this with the community. Starred the project and will definitely try it out within my company. More power to you!

pritesh1908•1mo ago

Thanks brihati . Reachout (slack/chat) to us incase you need any support with any usecase

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

OpenClaw Is Changing My Life

Software factories and the agentic moment

LLMs as the new high level language

Speed up responses with fast mode

LineageOS 23.2

Hoot: Scheme on WebAssembly

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Stories from 25 Years of Software Development

The Architecture of Open Source Applications (Volume 1) Berkeley DB

Vocal Guide – belt sing without killing yourself

First Proof

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Substack confirms data breach affects users’ email addresses and phone numbers

The world heard JD Vance being booed at the Olympics. Except for viewers in USA

Wood Gas Vehicles: Firewood in the Fuel Tank (2010)

uLauncher

Vouch

Al Lowe on model trains, funny deaths and working with Disney

Start all of your commands with a comma (2009)

Show HN: A luma dependent chroma compression algorithm (image compression)

The AI boom is causing shortages everywhere else

FDA intends to take action against non-FDA-approved GLP-1 drugs

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Where did all the starships go?

Selection rather than prediction

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

Haskell for all: Beyond agentic coding

SectorC: A C Compiler in 512 bytes (2023)

OpenClaw Is Changing My Life

Software factories and the agentic moment

LLMs as the new high level language

Speed up responses with fast mode

LineageOS 23.2

Hoot: Scheme on WebAssembly

Brookhaven Lab's RHIC concludes 25-year run with final collisions

Stories from 25 Years of Software Development

The Architecture of Open Source Applications (Volume 1) Berkeley DB

Vocal Guide – belt sing without killing yourself

First Proof

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Substack confirms data breach affects users’ email addresses and phone numbers

The world heard JD Vance being booed at the Olympics. Except for viewers in USA

Wood Gas Vehicles: Firewood in the Fuel Tank (2010)

uLauncher

Vouch

Al Lowe on model trains, funny deaths and working with Disney

Start all of your commands with a comma (2009)

Show HN: A luma dependent chroma compression algorithm (image compression)

The AI boom is causing shortages everywhere else

FDA intends to take action against non-FDA-approved GLP-1 drugs

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Where did all the starships go?

Selection rather than prediction

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Show HN: Dograh – an OSS Vapi alternative to quickly build and test voice agents

Comments