Show HN: EdgeWhisper – On-device voice-to-text for macOS (Voxtral 4B via MLX)

https://edgewhisper.com

2 points • raphaelmansuy • 2h ago

I built a macOS voice dictation app where zero bytes of audio ever leave your machine.

EdgeWhisper runs Voxtral Mini 4B Realtime (Mistral AI, Apache 2.0) locally on Apple Silicon via the MLX framework. Hold a key, speak, release — text appears at your cursor in whatever app has focus.

Architecture:
- Native Swift (SwiftUI + AppKit). No Electron.
- Voxtral 4B inference via MLX on the Apple Silicon GPU (Metal). ~3GB model, runs in ~2GB RAM on M1+.
- Dual text injection: AXUIElement (preserves the undo stack), with an NSPasteboard + CGEvent fallback.
- 6-stage post-processing pipeline: filler removal → dictionary → snippets → punctuation → capitalization → formatting.
- Sliding-window KV cache, so long streaming sessions don't degrade latency.
- Configurable transcription delay (240ms–2.4s). Sweet spot at 480ms.
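The post lists the order of the six post-processing stages but not what each one does internally. The sketch below is a minimal Python illustration of that ordering only (the app itself is native Swift): each stage is a plain str → str function applied in sequence. Every stage body, the FILLERS tuple, and the dictionary/snippet entries are hypothetical stand-ins, not EdgeWhisper's actual rules.

```python
import re
from typing import Callable

# A stage is a plain str -> str function; the pipeline applies stages in order.
Stage = Callable[[str], str]

# Assumption: only unambiguous fillers. Removing "like" safely needs context
# (e.g. "I like it"), which this sketch does not attempt.
FILLERS = ("um", "uh")


def remove_fillers(text: str) -> str:
    # Drop standalone filler words plus an attached comma and trailing spaces.
    pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


def make_replacement_stage(entries: dict[str, str]) -> Stage:
    # Whole-word, case-insensitive phrase replacement. Used here for both the
    # personal dictionary and simple (variable-free) snippets.
    def apply(text: str) -> str:
        for spoken, written in entries.items():
            text = re.sub(rf"\b{re.escape(spoken)}\b",
                          lambda _m, w=written: w, text, flags=re.IGNORECASE)
        return text
    return apply


def fix_punctuation(text: str) -> str:
    # Remove spaces before punctuation; add a final period if none is present.
    text = re.sub(r"\s+([,.!?])", r"\1", text.strip())
    return text if not text or text[-1] in ".!?" else text + "."


def capitalize_sentences(text: str) -> str:
    # Upper-case the first letter of the text and of each word after . ! or ?
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)


def normalize_whitespace(text: str) -> str:
    # Final formatting pass: collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def run_pipeline(text: str, stages: list[Stage]) -> str:
    for stage in stages:
        text = stage(text)
    return text


if __name__ == "__main__":
    stages = [
        remove_fillers,
        make_replacement_stage({"my sequel": "MySQL"}),      # hypothetical dictionary entry
        make_replacement_stage({"sig block": "Best, Ada"}),  # hypothetical snippet
        fix_punctuation,
        capitalize_sentences,
        normalize_whitespace,
    ]
    print(run_pipeline("um so the my sequel query is uh slow , right", stages))
    # → So the MySQL query is slow, right.
```

Keeping every stage the same shape makes the order explicit and lets a stage be reordered, disabled, or unit-tested in isolation. Running dictionary correction before punctuation and capitalization means those later passes see the corrected words.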

What it does well:
- Works in 20+ terminals and IDEs (VS Code, Xcode, iTerm2, Warp, JetBrains). Most dictation tools break in terminals; we detect them and switch injection strategy.
- Removes filler words ("um", "uh", "like") automatically.
- 13 languages with auto-detection.
- Personal dictionary + snippet expansion with variable support ({{date}}, {{clipboard}}).
- Works fully offline after the model download. No accounts, no telemetry, no analytics.
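The post names two snippet variables but doesn't document the syntax further. Here is a minimal Python sketch of {{variable}} expansion, assuming placeholders are resolved when the snippet is inserted and unknown names are left as typed. `expand_snippet`, `default_resolvers`, the ISO-8601 date format, and passing clipboard contents in as a string (a macOS app would read NSPasteboard) are all assumptions for illustration.

```python
import re
from datetime import date
from typing import Callable, Mapping

# Matches {{name}} placeholders, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_snippet(template: str, resolvers: Mapping[str, Callable[[], str]]) -> str:
    """Replace each {{var}} with resolvers[var](); leave unknown variables untouched."""
    def substitute(match: re.Match) -> str:
        resolver = resolvers.get(match.group(1))
        return resolver() if resolver is not None else match.group(0)
    return PLACEHOLDER.sub(substitute, template)


def default_resolvers(clipboard_text: str) -> dict[str, Callable[[], str]]:
    # Assumption: ISO-8601 dates; the caller supplies the clipboard contents.
    return {
        "date": lambda: date.today().isoformat(),
        "clipboard": lambda: clipboard_text,
    }


if __name__ == "__main__":
    # Hypothetical snippet triggered by saying "bug report".
    snippets = {"bug report": "Filed {{date}}. Repro steps: {{clipboard}}"}
    print(expand_snippet(snippets["bug report"], default_resolvers("run make test")))
```

Resolvers are zero-argument callables rather than precomputed strings, so a value such as the clipboard is only fetched when a snippet actually references it.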

What it doesn't do (yet):
- No file/meeting transcription (coming)
- No translation (coming)
- No Linux/Windows (macOS only, Apple Silicon required)

Pricing: Free tier (5 min/day, no account needed). Pro at $7.99/mo or $79.99/yr.

I'd love feedback on:
1. Would local LLM post-processing (e.g., Phi-4-mini via MLX) for grammar/tone be worth the extra ~1GB RAM?
2. For developers using voice→code workflows: what context would you want passed to your editor?
3. Anyone else building on Voxtral Realtime? I'm curious about your experience with the causal audio encoder.

Comments

Dumbledumb • 1h ago

I have been using Parakeet TDT v3, with just 0.6B params, and it's insanely fast (feels instant, even on an M1 Air). The accuracy is all I could ask for. What's the benefit of a much larger 4B model?

Not knocking your app, just asking because it seems very focused on one model, while others let the user pick according to their needs.
