The speech-to-speech (S2S) landscape right now: OpenAI (GPT-Realtime), Hume (EVI), and now this. The first two are closed-source. Qwen3-Omni is open.
What we built: a real-time inference stack optimized for voice, deployed across multiple regions. You can test the latency directly at the link.
Honest take: in our testing, a chained ASR → LLM → TTS pipeline still delivers lower latency than native S2S models. But the progress on end-to-end models in the last few months has been impressive, and we wanted to make it easy for people to experiment.
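For intuition on why a chained pipeline can stay competitive on latency, here's a toy model of time-to-first-audio (TTFA). All stage latencies are illustrative assumptions, not measurements from this deployment; the point is only that a well-streamed chain pays for each stage's first partial output, not its full runtime.

```python
# Toy TTFA model for a chained ASR -> LLM -> TTS voice pipeline.
# Latencies (ms) are made-up placeholders for illustration.
STAGES = {
    "asr": {"final": 400, "first_partial": 120},
    "llm": {"final": 900, "first_partial": 180},
    "tts": {"final": 600, "first_partial": 100},
}

def ttfa_serial(stages: dict) -> int:
    """Naive chaining: each stage waits for the previous one to fully finish.
    Only the last stage gets to stop at its first audio chunk."""
    names = list(stages)
    return sum(stages[n]["final"] for n in names[:-1]) + stages[names[-1]]["first_partial"]

def ttfa_streamed(stages: dict) -> int:
    """Streamed chaining: each stage starts as soon as the previous stage
    emits its first partial output, so first-partial latencies add instead."""
    return sum(s["first_partial"] for s in stages.values())

if __name__ == "__main__":
    print("serial TTFA (ms):  ", ttfa_serial(STAGES))    # 400 + 900 + 100
    print("streamed TTFA (ms):", ttfa_streamed(STAGES))  # 120 + 180 + 100
```

Under these made-up numbers, streaming every stage cuts TTFA from 1400 ms to 400 ms, which is why a chained stack can still undercut a native S2S model.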
Would love feedback from anyone who tries it, particularly on latency, voice quality, and where it breaks.