"Click me to try now!" banners that lead to a warning screen that says "Oh, only paying members, whoops!"
So, you don't mean 'try this out', you mean 'buy this product'.
Let's not act like it's a free sampler.
I can't comment on the model : i'm not giving them money.
Is it better? Worse? Why do they only compare to gpt4o mini transcribe?
For Whisper API online (with v3 large) I've found "$0.00125 per compute second" which is the cheapest absolute I've ever found.
Why it should be Whisper v3? They even released an open model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...
The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.
But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.
There's a reason that quite a lot of good transcribers still use V2, not V3.
Amazons transcription service is $0.024 per minute, pretty big difference https://aws.amazon.com/transcribe/pricing/
For example fal.ai has a Whisper API endpoint priced at "$0.00125 per compute second" which (at 10-25x realtime) is EXTREMELY cheaper than all the competitors.
What estimates do others use?
It seems like the best tradeoff between information density and understandability actually comes from the deep latin roots of the language
I agree with your belief, other languages have either lower density (e.g. German) or lower understandability (e.g. English)
Italian has one official italian (two, if you count IT_ch, but difference is minor), doesn't pay much attention to stress and vowel length, and only has a few "confusable" sounds (gl/l, gn/n, double consonants, stuff you get wrong in primary school). Italian dialects would be a disaster tho :)
I don't know how widely accepted that conclusion is, what exceptions there may be, etc.
Don't be confused if it says "no microphone", the moment you click the record button it will request browser permission and then start working.
I spoke fast and dropped in some jargon and it got it all right - I said this and it transcribed it exactly right, WebAssembly spelling included:
> Can you tell me about RSS and Atom and the role of CSP headers in browser security, especially if you're using WebAssembly?
I tried speaking in 2 languages at once, and it picked it up correctly. Truly impressive for real-time.
And open weight too! So grateful for this.
observationist•1h ago
https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...
~9GB model.
coder543•1h ago
observationist•1h ago
sbrother•41m ago
coder543•13m ago
No, I just heard about it this morning.