voicetest is an open source (Apache 2.0) test harness that works across voice AI platforms. You import your agent graph from any supported platform (or define one from scratch), write test scenarios with expected behaviors, and voicetest simulates conversations and evaluates them with LLM judges that score each turn 0.0-1.0 with written reasoning. It also ships global compliance evaluators for things like HIPAA, PCI-DSS, and brand voice consistency. The core abstraction is an AgentGraph IR that normalizes across platform formats, so you can convert between Retell, VAPI, LiveKit, and Bland configs and test them all the same way.
Quick start:
``` uv tool install voicetest voicetest demo --serve ```
That gives you a web UI at localhost with a sample agent, test cases, and evaluation results you can poke at. There's also a CLI, a TUI, and a REST API. It integrates into CI/CD with GitHub Actions, uses DuckDB for persistence, and includes a Docker Compose dev environment with LiveKit, Whisper STT, and Kokoro TTS. If you have a Claude Code subscription, voicetest can pass through to it instead of requiring separate API keys for evaluation.
GitHub: https://github.com/voicetestdev/voicetest Docs: https://voicetest.dev API reference: https://voicetest.dev/api/