
Show HN: seriously fast local speech+LLM cleanup on Apple Silicon - Onit Dictate

https://www.getonit.ai/dictate
1•telenardo•1h ago
TL;DR: How far can you go with local ML on a Mac? We built a dictation app to find out, and the answer is: pretty far! On a stock M-series Mac, the full pipeline (speech → text → LLM cleanup) runs in under 1 s for a typical sentence.

What is this? A local dictation app for macOS, and a free alternative to Wispr Flow, SuperWhisper, or MacWhisper. Since it runs entirely on YOUR device, we made it free: there are no servers to maintain, so we couldn't find anything to charge you for. We were playing with Apple Silicon, it turned into something usable, and so we're releasing it.

If you've written off on-device transcription before, it's worth another look. Apple Silicon + MLX is seriously fast. We've used it daily for the past few weeks, and it has replaced our previous setups.

The numbers that surprised us:
- Under 500 ms end to end if you disable LLM post-processing (in settings) or use our fine-tuned 1B cleanup model (more on this below). This feels instant: you stop talking and the text is THERE.
- With LLM cleanup, p50 latency for a sentence is ~800 ms (transcription + LLM post-processing combined). In practice, it feels quick!
- Tested on M1, M2, and M4!
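For anyone who wants to reproduce p50 numbers like these on their own machine, here is a minimal timing sketch. It assumes the pipeline can be called as two plain functions (`transcribe` and `cleanup` are hypothetical stand-ins, not Onit's code) and reports the median (p50) of the end-to-end wall-clock latencies.

```python
import statistics
import time
from typing import Callable, Sequence


def p50_ms(latencies_s: Sequence[float]) -> float:
    """Median (p50) latency, converted from seconds to milliseconds."""
    return statistics.median(latencies_s) * 1000.0


def time_pipeline(
    transcribe: Callable[[bytes], str],
    cleanup: Callable[[str], str],
    clips: Sequence[bytes],
) -> list[float]:
    """Wall-clock seconds from audio in to cleaned text out, one entry per clip."""
    latencies = []
    for clip in clips:
        start = time.perf_counter()
        cleanup(transcribe(clip))  # both stages are timed together
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for the speech model; swap in a real call.
    return "um so the meeting is at three oh two"


def fake_cleanup(text: str) -> str:
    # Hypothetical stand-in for the LLM cleanup step.
    return "So the meeting is at 3:02."


if __name__ == "__main__":
    runs = time_pipeline(fake_transcribe, fake_cleanup, [b""] * 20)
    print(f"p50: {p50_ms(runs):.3f} ms")
```

Reporting the median rather than the mean keeps one slow cold-start run (for example, the first model load) from skewing the headline number.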

Technical details:
- Models: Parakeet 0.6B (transcription) + Llama 3B (cleanup), both running via MLX.
- The cleanup model handles 8 tasks: removing filler words (ums and uhs), removing stutters/repeats, and converting numbers, special characters, acronyms (A P I → API), emails (hi at example dot com → hi@example.com), currency (two ninety nine → $2.99), and times (three oh two → 3:02). We'd like to add more, but each task increases latency (more on this below), so we settled here for now.
- The cleanup model uses a simple few-shot approach: before processing your input, it pulls in the N most relevant examples. The current implementation sets N=5.
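The few-shot step can be sketched as "rank a pool of examples by similarity to the input, keep the top N, and prepend them to the prompt." The post doesn't say how relevance is measured, so this sketch assumes plain token-overlap (Jaccard) similarity; the example pool and prompt wording are hypothetical.

```python
import re

# Hypothetical pool of (raw transcript, cleaned text) pairs covering the
# cleanup tasks listed above. A real pool would be much larger.
EXAMPLES = [
    ("um so send it to hi at example dot com", "So send it to hi@example.com."),
    ("the the price is two ninety nine", "The price is $2.99."),
    ("uh the A P I call failed", "The API call failed."),
    ("let's meet at three oh two", "Let's meet at 3:02."),
    ("we we need uh twelve copies", "We need 12 copies."),
    ("it's like uh fifty percent done", "It's like 50% done."),
]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1] (an assumed relevance measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_examples(query: str, pool, n: int = 5):
    """Return the n pool entries whose raw side overlaps most with the query."""
    q = tokens(query)
    # sorted() is stable even with reverse=True, so ties keep pool order.
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    return ranked[:n]


def build_prompt(query: str, pool, n: int = 5) -> str:
    """Assemble a few-shot prompt: instruction, n examples, then the input."""
    parts = ["Clean up the dictated text. Reply with the cleaned text only.\n"]
    for raw, clean in select_examples(query, pool, n):
        parts.append(f"Input: {raw}\nOutput: {clean}\n")
    parts.append(f"Input: {query}\nOutput:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("email me at bob at example dot com", EXAMPLES))
```

Every selected example adds input tokens, which is why raising N trades latency for quality.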

Challenges:
- Cleanup hallucinations: out of the box, small LLMs (3B, 1B) still make mistakes. They can hallucinate long, unrelated responses and occasionally repeat back a few-shot example. We had to add scaffolding that falls back to the raw transcript when such cases are detected, so some "ums" and "ahs" still make it through.
- Cleanup latency: we get better cleanup results with longer instructions or more few-shot examples (N=20 beats N=5), but every input token adds latency. At N=20, for example, LLM latency rises to 1.5-3 s. We decided the delays weren't worth it for marginally better results.
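The fallback scaffolding isn't described in detail, so here is one hypothetical way to catch the two failure modes mentioned (long unrelated output, and echoing a few-shot example) and fall back to the raw transcript. The checks and thresholds are illustrative guesses, not Onit's actual logic.

```python
from difflib import SequenceMatcher


def looks_hallucinated(
    raw: str,
    cleaned: str,
    example_outputs: list[str],
    max_len_ratio: float = 1.5,
    min_similarity: float = 0.3,
) -> bool:
    """Heuristic checks for cleanup failures (thresholds are guesses)."""
    # Cleanup should mostly delete or compress text, so a much longer output is suspect.
    if len(cleaned) > max_len_ratio * max(len(raw), 1):
        return True
    # The model repeated one of its few-shot examples verbatim.
    if cleaned.strip() in {out.strip() for out in example_outputs}:
        return True
    # Output that shares little with the input has probably drifted off-topic.
    similarity = SequenceMatcher(None, raw.lower(), cleaned.lower()).ratio()
    return similarity < min_similarity


def safe_cleanup(raw: str, cleaned: str, example_outputs: list[str]) -> str:
    """Use the LLM output unless it looks hallucinated; otherwise keep the raw transcript."""
    return raw if looks_hallucinated(raw, cleaned, example_outputs) else cleaned


if __name__ == "__main__":
    print(safe_cleanup("um send the report", "Send the report.", []))
    print(safe_cleanup("hi", "Here is a long story about something else entirely.", []))
```

The trade-off matches the post: when a check fires, the user gets the uncleaned transcript (fillers included) instead of a confidently wrong rewrite.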

Experimental:
- Corrections: since local models aren't perfect, we've added a feedback loop. When your transcript isn't right, there's a simple interface to correct it. Each correction becomes a fine-tuning example (stored locally on your machine, of course). We're working on a one-click "Optimize" flow that will use DSPy locally to adjust the LLM cleanup prompt and to fine-tune the transcription model and the LLM on your examples. We want to see whether personalization can close the accuracy gap. We're still experimenting, but early results are promising!
- Fine-tuned 1B model: as mentioned above, we've fine-tuned a 1B cleanup model on our own labeled data. There's a toggle to try it in settings. It's blazing fast, under 500 ms. Because it's fine-tuned for this use case, it doesn't need a long system prompt (which consumes input tokens and slows things down). If you try it, let us know what you think; we're curious how well our model generalizes to other setups.
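A local corrections log could be as simple as an append-only JSON Lines file. This sketch assumes a prompt/completion layout, a common shape for fine-tuning datasets; the field names and file name are hypothetical, and Onit's actual storage format isn't described in the post.

```python
import json
import tempfile
from pathlib import Path


def record_correction(log: Path, raw: str, model_output: str, corrected: str) -> None:
    """Append one user correction to a local JSONL file (one JSON object per line)."""
    entry = {"prompt": raw, "completion": corrected, "model_output": model_output}
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def load_corrections(log: Path) -> list[dict]:
    """Read all stored corrections back, e.g. to build a fine-tuning set."""
    if not log.exists():
        return []
    with log.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "corrections.jsonl"
        record_correction(log, "run the bests", "Run the bests.", "Run the tests.")
        print(load_corrections(log))
```

Keeping the model's original output next to the user's fix also leaves room for later analysis of which cleanup tasks fail most often.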

*Product details*
- Universal hotkey (CapsLock by default)
- Works in any text field via simulated paste events
- Accessible from the menu bar and from the right edge of your screen (the latter can be disabled in settings)
- Pairs well with our other tool, QuickEdit, if you want to polish dictated text further
- If it wasn't clear: yes, it's Mac only. Linux folks, please roast us in the comments.

Comments

mkw5053•1h ago
I'm interested!

My main gripe with Wispr Flow is that it's slow and does the entire transcription in one pass after you finish speaking. Does this stream and transcribe as you talk?

I really want to see the transcription in progress while I'm speaking.

telenardo•33m ago
It's not set up for that, no, though it's theoretically possible!

The issues I see are:
- Transcription models use beam search to choose the most likely words at each step, taking the surrounding words into account. Accuracy drops a lot if you commit to the top word at each step as it's spoken; the surrounding context matters a lot.
- To that point, transcription models do get things wrong (e.g. "best" instead of "test"). LLM post-processing can help here by taking the top-N hypotheses from the transcription model and determining which makes the most sense (e.g. "run the tests", not "run the bests"), adding another layer of semantic understanding. Again, the surrounding context really matters.
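The "pick the hypothesis that makes the most sense" idea can be sketched as N-best rescoring: combine each hypothesis's acoustic score with a language-model score. Here a tiny hand-written bigram table stands in for the LLM, and all scores are made-up illustrative numbers.

```python
# Hypothetical log-probabilities standing in for an LLM's sense of
# which word sequences are plausible.
BIGRAM_LOGPROB = {
    ("run", "the"): -0.5,
    ("the", "tests"): -1.0,
    ("the", "bests"): -6.0,
}
UNSEEN_LOGPROB = -8.0


def lm_score(sentence: str) -> float:
    """Sum of bigram log-probabilities over the sentence."""
    words = sentence.lower().split()
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB) for pair in zip(words, words[1:]))


def rescore(nbest: list[tuple[str, float]], lm_weight: float = 1.0) -> str:
    """Pick the hypothesis with the best acoustic + weighted language-model score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_score(h[0]))[0]


if __name__ == "__main__":
    # The acoustic model slightly prefers "bests"; context flips the choice.
    nbest = [("run the bests", -2.0), ("run the tests", -2.3)]
    print(rescore(nbest))                 # run the tests
    print(rescore(nbest, lm_weight=0.0))  # run the bests
```

This only works once the whole phrase is available, which is the commenter's point about why word-by-word streaming hurts accuracy.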

Do you need each word to stream individually? Or would it be sufficient for short phrases to stream?

The MLX inference is so fast that you could approximate the latter by releasing and re-pressing the shortcut every 5-10 words. It's so fast that it honestly feels like streaming. In practice, I tend to do something like this anyway, because I find shorter transcripts easier to review!

Venus Might Harbor Subsurface Lava Tunnels

https://www.universetoday.com/articles/venus-might-harbor-massive-subsurface-lava-tunnels
1•rbanffy•1m ago•0 comments

Rover wheel tribocharging in lunar shadowed regions

https://www.sciencedirect.com/science/article/pii/S0273117725012724
1•PaulHoule•1m ago•0 comments

MIDI Survivor

https://www.funwithcomputervision.com/piano
1•bilsbie•2m ago•0 comments

Theorizer: Turning Papers into Scientific Laws

https://allenai.org/blog/theorizer
1•kjhughes•3m ago•0 comments

The Quiet Shift in America's Population Growth

https://brookstonenews.substack.com/p/the-quiet-shift-in-americas-population
1•toomuchtodo•3m ago•0 comments

I gave my personal site a new look, what do you think? Built using Flutter

https://thrivedev.net/
1•luis_journey•3m ago•0 comments

Slow AI Manifesto

https://www.shardcore.org/spx/2026/01/30/slow-ai-manifesto/
1•speckx•4m ago•0 comments

Show HN: Flowly – Managed Clawdbot in 5 min

2•hakanorensy•7m ago•0 comments

Show HN: Claude Commander: runtime model switching in Cloud Code via hooks/API

https://github.com/sstraus/claude-commander
1•stefanostraus•8m ago•0 comments

Buttered Crumpet, a custom typeface for Wallace and Gromit

https://jamieclarketype.com/case-study/wallace-and-gromit-font/
1•tobr•8m ago•0 comments

Trump Taps Kevin Warsh to Lead the Federal Reserve

https://www.npr.org/2026/01/30/nx-s1-5645091/trump-kevin-warsh-federal-reserve-chair
1•healsdata•10m ago•0 comments

Americans can expect to live longer than ever, per the latest CDC data

https://sherwood.news/world/americans-can-expect-to-live-longer-than-ever-per-the-latest-cdc-data/
1•avonmach•11m ago•0 comments

Monsanto's House of the Future: A Plastic Dream of Tomorrow in Photos

https://rarehistoricalphotos.com/monsanto-house-of-the-future-photos/
2•celsoazevedo•12m ago•0 comments

More Than 2M Afghan Girls Denied Secondary Education, Says UN

https://www.afintl.com/en/202601245551
2•mhb•12m ago•0 comments

Taliban's New Law Legalises Slavery in Afghanistan, Makes Mullahs Immune

https://www.ndtv.com/world-news/talibans-new-law-legalises-slavery-in-afghanistan-makes-mullahs-i...
1•mhb•13m ago•1 comments

Challenge to compress 1M rows to the smallest possible size

https://github.com/agavra/compression-golf
1•birdculture•13m ago•0 comments

Ex-CNN anchor Don Lemon arrested on charges connected to Minnesota church protes

https://www.theguardian.com/us-news/2026/jan/30/don-lemon-minnesota-protest-charges
3•gizzlon•14m ago•1 comments

Stop using low DNS TTLs

https://blog.apnic.net/2019/11/12/stop-using-ridiculously-low-dns-ttls/
1•swills•15m ago•0 comments

Optimal Software Pipelining Using an SMT-Solver

https://arxiv.org/abs/2601.21842
1•ahsillyme•15m ago•0 comments

Resist and Unsubscribe

https://www.resistandunsubscribe.com
1•ptrhvns•16m ago•0 comments

GitHub Action that updates an OpenRouter guardrail daily with US-only providers

https://github.com/speedshop/openrouter-us-only-cached-guardrail
1•dataminer•16m ago•0 comments

Decrease the frequency of product subscription deliveries

https://practicalbetterments.com/decrease-the-frequency-of-product-subscription-deliveries/
1•surprisetalk•16m ago•0 comments

One Minute Park

https://oneminutepark.tv/?park=27296633
1•surprisetalk•16m ago•0 comments

Can AI (actually) beat Minecraft? [video]

https://www.youtube.com/watch?v=Wh4abvcUj8Q
1•surprisetalk•16m ago•0 comments

Stagnant Construction Productivity Is a Worldwide Problem

https://www.construction-physics.com/p/stagnant-construction-productivity
1•surprisetalk•17m ago•0 comments

Evaluations for Testing Agentic AI

1•stichers•18m ago•0 comments

Amazon: "We are also forming a new engineering team in India "

https://twitter.com/PlumbNick/status/2017239458677231736
3•sergiotapia•19m ago•1 comments

Standalone Android utility apps and a VS Code companion I built

2•kalinuxer•21m ago•0 comments

Ask HN: Is free identity theft protection after a data breach worth the bother?

2•daoboy•21m ago•0 comments

Preserving Human Voices and Faces

https://www.vatican.va/content/leo-xiv/en/messages/communications/documents/20260124-messaggio-co...
2•swannodette•22m ago•0 comments