Voxtral Transcribe 2

https://mistral.ai/news/voxtral-transcribe-2
144•meetpateltech•1h ago

Comments

observationist•1h ago
Native diarization, this looks exciting. edit: or not, no diarization in real-time.

https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

~9GB model.

coder543•41m ago
The diarization is on Voxtral Mini Transcribe V2, not Voxtral Mini 4B.
observationist•26m ago
Ahh, yeah, and it's explicitly not working for realtime streams. Good catch!
sbrother•2m ago
Do you have experience with that model for diarization? Does it feel accurate, and what's its realtime factor on a typical GPU? Diarization has been the biggest thorn in my side for a long time.
serf•1h ago
things I hate:

"Click me to try now!" banners that lead to a warning screen that says "Oh, only paying members, whoops!"

So, you don't mean 'try this out', you mean 'buy this product'.

Let's not act like it's a free sampler.

I can't comment on the model: I'm not giving them money.

ReadEvalPost•59m ago
You can try it on HF: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...
boobsbr•36m ago
I'm impressed.
mdrzn•54m ago
There's no comparison to Whisper Large v3 or other Whisper models.

Is it better? Worse? Why do they only compare to gpt4o mini transcribe?

GaggiX•50m ago
Gpt4o mini transcribe is better and actually realtime. Whisper is trained to encode the entire audio (or at least 30s chunks) and then decode it.
emmettm•47m ago
The linked article claims the average word error rate for Voxtral mini v2 is lower than that of GPT-4o mini transcribe.
GaggiX•46m ago
Gpt4o mini transcribe is better than whisper, the context is the parent comment.
mdrzn•47m ago
So "gpt4o mini transcribe" is not just whisper v3 under the hood? Btw it's $0.006 / minute

For Whisper API online (with v3 large) I've found "$0.00125 per compute second" which is the cheapest absolute I've ever found.

GaggiX•44m ago
>So it's not just whisper v3 under the hood?

Why should it be Whisper v3? They even released an open model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

tekacs•46m ago
WER is slightly misleading, but Whisper Large v3 WER is classically around 10%, I think, and 12% with Turbo.

The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.

But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.
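To make the normalization point concrete, here is a minimal sketch of word error rate computed with and without text normalization. The `normalize` function is a toy stand-in written for illustration (lowercasing and stripping punctuation); it is not the normalizer used by any benchmark or by the models discussed here, and the sentences are made up.

```python
import re


def normalize(text: str) -> list[str]:
    """Toy normalizer (an assumption for illustration): lowercase, drop punctuation."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str, normalized: bool = True) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref = normalize(reference) if normalized else reference.split()
    hyp = normalize(hypothesis) if normalized else hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


reference = "Hello, world. How are you?"
hypothesis = "hello world how are you"

raw_wer = wer(reference, hypothesis, normalized=False)
normalized_wer = wer(reference, hypothesis, normalized=True)
print(f"raw: {raw_wer:.2f}, normalized: {normalized_wer:.2f}")
```

In this toy case every word but one counts as an error before normalization and none after it, which is why a single WER number is hard to compare across models that emit text in different forms.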

tekacs•45m ago
On the topic of things being misleading, GPT-4o transcriber is a very _different_ transcriber to Whisper. I would say not better or worse, despite characterizations as such. So it is a little difficult to compare on just the numbers.

There's a reason that quite a lot of good transcribers still use V2, not V3.

satvikpendem•16m ago
Different how?
dmix•50m ago
> At approximately 4% word error rate on FLEURS and $0.003/min

Amazon's transcription service is $0.024 per minute, a pretty big difference: https://aws.amazon.com/transcribe/pricing/

mdrzn•48m ago
Is it 0.003 per minute of audio uploaded, or "compute minute"?

For example, fal.ai has a Whisper API endpoint priced at "$0.00125 per compute second", which (at 10-25x realtime) is MUCH cheaper than all the competitors.
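A rough sketch of how a per-compute-second price converts to a per-audio-minute price, so the two billing models can be compared. The function name is hypothetical, the prices are the ones quoted in this thread, and the sketch assumes billing is strictly proportional to compute time with no minimum charge or startup overhead.

```python
def cost_per_audio_minute(price_per_compute_second: float, realtime_factor: float) -> float:
    """Convert compute-time pricing into a price per minute of audio.

    realtime_factor is how many seconds of audio are processed per second
    of compute (10.0 means 10x faster than realtime).
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    compute_seconds = 60.0 / realtime_factor
    return price_per_compute_second * compute_seconds


PRICE_PER_COMPUTE_SECOND = 0.00125  # figure quoted in the comment above

at_10x = cost_per_audio_minute(PRICE_PER_COMPUTE_SECOND, 10)
at_25x = cost_per_audio_minute(PRICE_PER_COMPUTE_SECOND, 25)
print(f"10x realtime: ${at_10x:.4f} per audio minute")
print(f"25x realtime: ${at_25x:.4f} per audio minute")
```

Under these assumptions the price works out to about $0.0075 per audio minute at 10x and about $0.003 at 25x, so the comparison against a flat per-minute rate depends heavily on the speedup actually achieved.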

Oras•26m ago
I think the point is having it for real-time; this is for conversations rather than transcribing audio files.
Archelaos•45m ago
As a rule of thumb for software that I use regularly, I find it very useful to consider the cost over a 10-year period so I can compare it with software that I buy once with a lifetime license and install at home. That comes to $1,798.80 for the Pro version.

What estimates do others use?
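The 10-year figure can be reproduced with a short sketch. The monthly price of $14.99 is an inference (1,798.80 divided by 120 months), not a number stated in the comment, and the function name is hypothetical.

```python
from decimal import Decimal


def subscription_cost(monthly_price: str, years: int = 10) -> Decimal:
    """Total subscription cost over a number of years, using exact decimal arithmetic."""
    return Decimal(monthly_price) * 12 * years


# Assumed monthly price, inferred from the total quoted above.
ten_year_total = subscription_cost("14.99")
print(f"10-year cost: ${ten_year_total}")
```

The same function can be pointed at a different horizon (for example `years=5`) to see how sensitive the comparison with a one-time purchase is to the chosen period.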

antirez•42m ago
Italian represents, I believe, the most phonetically advanced human language. It has the right compromise among information density, understandability, and the ability to speak much faster to compensate for the redundancy. It's as if it had error correction built in. Note that it not only has the lower error rate but is also underrepresented in most datasets.
Archelaos•21m ago
This is largely due to the fact that modern Italian is a systematised language that emerged from a literary movement (whose most prominent representative is Alessandro Manzoni) to establish a uniform language for the Italian people. At the time of Italian unification in 1861, only about 2.5% of the population could speak this language.
gbalduzzi•5m ago
The language itself was not invented for the purpose: it was the language spoken in Florence, then adopted by the literary movement and then selected as the national language.

It seems like the best tradeoff between information density and understandability actually comes from the deep Latin roots of the language.

gbalduzzi•12m ago
I was honestly surprised to find it in first place, because I assumed English would be in first place given its simpler grammar and the huge datasets available.

I agree with your belief; other languages have either lower density (e.g. German) or lower understandability (e.g. English).

NewsaHackO•7m ago
The only knowledge I have about how difficult Italian is comes from Inglourious Basterds.
simonw•36m ago
This demo is really impressive: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...

Don't be confused if it says "no microphone", the moment you click the record button it will request browser permission and then start working.

I spoke fast and dropped in some jargon and it got it all right - I said this and it transcribed it exactly right, WebAssembly spelling included:

> Can you tell me about RSS and Atom and the role of CSP headers in browser security, especially if you're using WebAssembly?

Oras•32m ago
Thank you for the link! Their playground in Mistral does not have a microphone. It just uploads files, which does not demonstrate the speed and accuracy, but the link you shared does.

I tried speaking in 2 languages at once, and it picked it up correctly. Truly impressive for real-time.

tekacs•15m ago
Having built with and tried every voice model over the last three years, real time and non-real time... this is off the charts compared to anything I've seen before.

And open weight too! So grateful for this.

daemonologist•9m ago
404 on https://mistralai-voxtral-mini-realtime.hf.space/gradio_api/... for me (which shows up in the UI as a little red error in the top right).
satvikpendem•18m ago
Looks like this model doesn't do realtime diarization. What model should I use if I want that? So far I've only seen paid models do diarization well. I heard about Nvidia NeMo but haven't tried it or even found where to try it out.
aavci•16m ago
What's the cheapest device specs that this could realistically run on?
pietz•10m ago
Do we know if this is better than Nvidia Parakeet V3? That has been my go-to model locally and it's hard to imagine there's something even better.
boringg•4m ago
Pseudo related -- am I the only one uncomfortable using my voice with AI, out of concern that once it is in the training model it is forever reproducible? As a non-public person it seems like a risk vector (albeit small).
