
We've been building voice agents across Retell, VAPI, LiveKit, and Bland, and the testing story is... rough. Every platform has its own config format, there's no shared way to define what "correct" looks like, and most teams end up doing manual QA by literally calling their agent and listening. So we built voicetest.

voicetest is an open source (Apache 2.0) test harness that works across voice AI platforms. You import your agent graph from any supported platform (or define one from scratch), write test scenarios with expected behaviors, and voicetest simulates conversations and evaluates them with LLM judges that score each turn 0.0-1.0 with written reasoning. It also ships global compliance evaluators for things like HIPAA, PCI-DSS, and brand voice consistency. The core abstraction is an AgentGraph IR that normalizes across platform formats, so you can convert between Retell, VAPI, LiveKit, and Bland configs and test them all the same way.
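To make the idea concrete, here is a rough Python sketch of what a platform-neutral agent graph and a per-turn score check could look like. It is not voicetest's actual API: the names `Node`, `AgentGraph`, `TurnScore`, and `passes`, and the 0.7 pass threshold, are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    """One conversational state: a prompt plus outgoing transitions."""
    node_id: str
    prompt: str
    # Maps a transition condition (free text) to the target node id.
    transitions: dict[str, str] = field(default_factory=dict)


@dataclass
class AgentGraph:
    """Hypothetical platform-neutral graph, not voicetest's real class."""
    entry: str
    nodes: dict[str, Node]

    def unreachable(self) -> set[str]:
        """Return ids of nodes that cannot be reached from the entry node."""
        seen: set[str] = set()
        stack = [self.entry]
        while stack:
            current = stack.pop()
            # Skip nodes already visited and dangling transition targets.
            if current in seen or current not in self.nodes:
                continue
            seen.add(current)
            stack.extend(self.nodes[current].transitions.values())
        return set(self.nodes) - seen


@dataclass
class TurnScore:
    """A judge's verdict for one turn: a 0.0-1.0 score plus written reasoning."""
    score: float
    reasoning: str


def passes(turns: list[TurnScore], threshold: float = 0.7) -> bool:
    """Pass when the mean turn score meets the (assumed) threshold."""
    if not turns:
        return False
    return mean(t.score for t in turns) >= threshold
```

With a shape like this, a config imported from any platform can be checked the same way, for example by flagging nodes that no transition ever reaches before any simulated conversation runs.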

Quick start:

```
uv tool install voicetest
voicetest demo --serve
```

That gives you a web UI at localhost with a sample agent, test cases, and evaluation results you can poke at. There's also a CLI, a TUI, and a REST API. It integrates into CI/CD with GitHub Actions, uses DuckDB for persistence, and includes a Docker Compose dev environment with LiveKit, Whisper STT, and Kokoro TTS. If you have a Claude Code subscription, voicetest can pass through to it instead of requiring separate API keys for evaluation.

GitHub: https://github.com/voicetestdev/voicetest
Docs: https://voicetest.dev
API reference: https://voicetest.dev/api/


Show HN: Voicetest – open-source test harness for voice AI agents