We’ve built a ton of voice agents in the past few years. We noticed that voice agent dev follows a pretty typical pattern:
⁃ Start with a “single prompt” version that dumps all relevant context and instructions into one prompt, and gives the agent access to every tool it will ever need (see the sketch after this list)
⁃ As the agent gets more complex and users discover more edge cases, break the single-prompt version out into a more structured flow that gives you finer control over the precise conversational path. There’s a pretty high ceiling for how complicated things can get with real-world systems.
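For concreteness, the single-prompt shape looks roughly like this. It’s a minimal Python sketch against the OpenAI chat completions API; the business context, prompt, and tool are made up for illustration, and this is not Voicekit-specific code:

    # Minimal sketch of one "single prompt" agent turn: one system prompt
    # carrying all the context, plus every tool the agent might ever need.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = """You are a phone receptionist for Acme Dental.
    You can book, reschedule, and cancel appointments.
    Office hours are Mon-Fri 9am-5pm. Always confirm the caller's name first."""

    TOOLS = [
        {
            "type": "function",
            "function": {
                "name": "book_appointment",  # illustrative tool
                "description": "Book an appointment slot for a caller.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "slot": {"type": "string", "description": "ISO 8601 datetime"},
                    },
                    "required": ["name", "slot"],
                },
            },
        },
        # ...plus every other tool the agent could ever need, all in one list
    ]

    def next_turn(transcript):
        # transcript: the call so far, as a list of {"role", "content"} messages
        return client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": SYSTEM_PROMPT}, *transcript],
            tools=TOOLS,
        )

This works great until the one prompt has to cover dozens of edge cases at once, which is usually what pushes people toward the structured version.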
We’ve been building https://www.voicekit.com/ for this workflow. It’s a full-featured (though still minimal) platform for making and receiving phone calls with LLM-based agents. You can buy phone numbers, connect them to LLM-based workflows with access to HTTP-based tool calls, make outbound phone calls via API, and monitor calls in a simple dashboard.

Our intended workflow is for folks to start with the single-prompt “simple mode” and graduate into advanced mode over time, while keeping a simple UX that works well for both.
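As a taste of the API side, placing an outbound call looks roughly like this. A hedged sketch: the endpoint path, auth header, and payload fields below are illustrative placeholders, not our documented API; the docs have the real shape.

    # Hypothetical sketch of placing an outbound call over HTTP. The endpoint,
    # header, and field names are assumptions for illustration only.
    import os
    import requests

    resp = requests.post(
        "https://api.voicekit.com/v1/calls",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {os.environ['VOICEKIT_API_KEY']}"},
        json={
            "from": "+15555550100",   # a number bought on the platform
            "to": "+15555550123",     # who to call
            "agent_id": "agent_123",  # which prompt/workflow runs the call
        },
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())  # e.g. call id and status, for tracking in the dashboard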
We’d love to hear your feedback.