Voxtral – Frontier open source speech understanding models

https://mistral.ai/news/voxtral
156•meetpateltech•6mo ago

Comments

danelski•6mo ago
They claim to undercut competitors of similar quality by half for both models, yet they released both as Apache 2.0 instead of following the "smaller open, larger closed" strategy used for their last releases. What's different here?
Havoc•6mo ago
Probably not looking to compete directly in the transcription space.
wmf•6mo ago
They're working on a bunch of features so maybe those will be closed. I guess they're feeling generous on the base model.
halJordan•6mo ago
They didn't release Voxtral Large, so your question doesn't really make sense.
danelski•6mo ago
It's about what their top offering is at the moment, not about having Large in the name. Mistral Medium 3 is notably not Mistral Large 3, but it was released as API-only.
homarp•6mo ago
Weights: https://huggingface.co/mistralai/Voxtral-Mini-3B-2507 and https://huggingface.co/mistralai/Voxtral-Small-24B-2507
homarp•6mo ago
Running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM in bf16 or fp16.

Running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
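
As a rough sanity check on those figures, here is a minimal sketch that estimates the memory taken by the weights alone. It assumes the nominal parameter counts implied by the model names (3B and 24B), which may not match the true totals once the audio encoder is included, and 2 bytes per parameter for bf16/fp16. The gap to the quoted numbers would then presumably be activations, cache, and other runtime overhead; that split is an assumption, not something stated in the model cards.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Lower bound: memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts are taken from the model names (an assumption);
# quoted_gb is the GPU RAM figure quoted in the comment above.
for name, nominal_params, quoted_gb in [
    ("Voxtral-Mini-3B-2507", 3e9, 9.5),
    ("Voxtral-Small-24B-2507", 24e9, 55.0),
]:
    weights = weight_memory_gb(nominal_params)
    other = quoted_gb - weights
    print(f"{name}: ~{weights:.0f} GB weights, ~{other:.1f} GB everything else")
```

Under these assumptions the weights account for most of the quoted requirement, which is why quantized builds would be expected to shrink it substantially.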

GaggiX•6mo ago
There is also a Voxtral Small 24B model available for download: https://huggingface.co/mistralai/Voxtral-Small-24B-2507
lostmsu•6mo ago
My Whisper v3 Large Turbo is $0.001/min, so their price comparison is not exactly perfect.
ImageXav•6mo ago
How did you achieve that? I was looking into it and $0.006/min is quoted everywhere.
lostmsu•6mo ago
Harvesting idle compute. https://borgcloud.org/speech-to-text
BetterWhisper•6mo ago
Do you support speaker recognition?
lostmsu•6mo ago
No. I found models doing that unreliable when there are many speakers.
4b11b4•6mo ago
This is your service?
lostmsu•6mo ago
Yes
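
For scale, the two per-minute prices quoted in this subthread ($0.001 and $0.006) can be compared over a fixed amount of audio. This is only an arithmetic sketch: the helper name `cost_usd` is hypothetical, and the prices are the ones commenters quoted, not verified rates.

```python
from fractions import Fraction


def cost_usd(price_per_min: str, minutes: int) -> Fraction:
    """Exact cost in USD for `minutes` of audio at a per-minute price string."""
    return Fraction(price_per_min) * minutes


# Prices as quoted in the comments above (unverified).
quoted = {"lostmsu's service": "0.001", "commonly quoted": "0.006"}

for label, price in quoted.items():
    per_hour = cost_usd(price, 60)
    print(f"{label}: ${float(per_hour):.2f} per hour of audio")

# Ratio between the two quoted prices.
ratio = Fraction(quoted["commonly quoted"]) / Fraction(quoted["lostmsu's service"])
print(f"ratio: {ratio}x")
```

`Fraction` is used instead of floats so the small decimal prices multiply without rounding noise.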
lostmsu•6mo ago
Does it support realtime transcription? What is the approximate latency?
rolisz•6mo ago
Unlikely. The small model is much larger than Whisper (which is already hard to use for realtime).
ipsum2•6mo ago
24B is crazy expensive for speech transcription. Conspicuously, there is no comparison with Parakeet, a 600M-parameter model that's currently dominating leaderboards (but only for English).
azinman2•6mo ago
But it also includes world knowledge, can do tool calls, etc. It’s an omni model.
qwertox•6mo ago
Only the mini is meant for pure transcription. And with the tests I just did on their API, comparing to Whisper large, they are around three times faster, more accurate and cheaper.

The 24B is, as the sibling comment says, an omni model; it can also do function calling.

sheerun•6mo ago
In the demo they mention that the Polish pronunciation is pretty bad, spoken as if it were the second language of a native English speaker. I wonder if it's the same for other languages. On the other hand, the whispered English is hilariously good, especially the different emotions.
Raed667•6mo ago
It is insane how good the "French man speaking English" demo is. It captures a lot of subtleties.
potlee•6mo ago
That’s an actual French man speaking English.
kamranjon•6mo ago
I’m pretty excited to play around with this. I’ve worked with Whisper quite a bit, and it’s awesome to have another model in the same class, and from Mistral, who tend to be very open. I’m sure Unsloth is already working on some GGUF quants. I will probably spin it up tomorrow and try it on some audio.
vivalapomy•6mo ago
I won't comment on the 24B model, as I see no use for it personally, but for pure ASR tasks I honestly can't see Voxtral taking off. For personal usage, I've been running quants of Whisper tiny (for English) and Whisper small (for Spanish, as it is my native language), and have never experienced major latency when using them for globally available voice commands. Considering that my machine runs an Ivy Bridge processor and uses CPU inference, the pricing seems unreasonable.
