We rebuilt it from the ground up around a new architecture that uses predictive transcription – it anticipates the next likely word before it’s spoken. That makes it both fast (around 150 ms latency) and highly accurate.
Scribe v2 Realtime outperforms every low-latency transcription model across 30 commonly used European and Asian languages, with 93.5% accuracy compared to Gemini 2.5 Flash (91.4%), GPT-4o Mini Transcribe (90.7%), and Deepgram Nova 3 (85%).
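To put those percentages in perspective, here is a minimal Python sketch that converts the accuracy figures quoted above into relative error reduction. It assumes error rate is simply 100 minus accuracy, which this post does not state explicitly; the function names are illustrative only and not part of any ElevenLabs API.

```python
# Accuracy figures quoted in this post, in percent.
ACCURACY = {
    "Scribe v2 Realtime": 93.5,
    "Gemini 2.5 Flash": 91.4,
    "GPT-4o Mini Transcribe": 90.7,
    "Deepgram Nova 3": 85.0,
}


def error_rate(accuracy_pct: float) -> float:
    """Error rate in percent. Assumption: error = 100 - accuracy."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline_pct: float, candidate_pct: float) -> float:
    """Fraction of the baseline model's errors that the candidate avoids."""
    baseline_err = error_rate(baseline_pct)
    return (baseline_err - error_rate(candidate_pct)) / baseline_err


if __name__ == "__main__":
    ours = ACCURACY["Scribe v2 Realtime"]
    for name, acc in ACCURACY.items():
        if name == "Scribe v2 Realtime":
            continue
        reduction = relative_error_reduction(acc, ours)
        print(f"vs {name}: {reduction:.0%} fewer errors")
```

Under that assumption, a gap of two or three accuracy points works out to roughly a quarter to a third fewer errors against the closest models, and more than half against the lowest-scoring one.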
It supports 90+ languages including English, French, German, Italian, Spanish, Portuguese, Hindi, and Japanese. It’s designed for live, agentic applications like AI voice agents, meeting notetakers, and captioning systems.
We also built an open-source real-time transcription UI component at https://ui.elevenlabs.io/blocks to make it easy to integrate voice into any product.
You can use Scribe v2 Realtime via our API or directly within ElevenLabs Agents.
Try it out here: https://elevenlabs.io/realtime-speech-to-text or read the docs: https://elevenlabs.io/docs/capabilities/speech-to-text
We’d love your feedback!