Show HN: EdgeWhisper – On-device voice-to-text for macOS (Voxtral 4B via MLX)

https://edgewhisper.com

2•raphaelmansuy•2h ago

I built a macOS voice dictation app where zero bytes of audio ever leave your machine.

EdgeWhisper runs Voxtral Mini 4B Realtime (Mistral AI, Apache 2.0) locally on Apple Silicon via the MLX framework. Hold a key, speak, release — text appears at your cursor in whatever app has focus.

Architecture:
- Native Swift (SwiftUI + AppKit). No Electron.
- Voxtral 4B inference via MLX on the Apple Silicon GPU (Metal). ~3GB model, runs in ~2GB RAM on M1+.
- Dual text injection: AXUIElement (preserves the undo stack) with an NSPasteboard + CGEvent fallback.
- 6-stage post-processing pipeline: filler removal → dictionary → snippets → punctuation → capitalization → formatting.
- Sliding-window KV cache for unlimited streaming without latency degradation.
- Configurable transcription delay (240ms–2.4s), with the sweet spot at 480ms.
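The post names the six post-processing stages but not how each one works. Below is a minimal Python sketch of that ordering, assuming each stage is a plain `str -> str` function applied in sequence. The filler list, dictionary entries, snippet trigger, and every stage body are hypothetical placeholders, not EdgeWhisper's actual rules.

```python
import re
from typing import Callable, List

# Hypothetical word lists; EdgeWhisper's real lists are not published.
FILLERS = ("um", "uh")  # "like" left out here: it is often not a filler ("I like it")
DICTIONARY = {"voxtral": "Voxtral", "mlx": "MLX"}
SNIPPETS = {"my email": "me@example.com"}


def remove_fillers(text: str) -> str:
    # Drop standalone filler words plus an optional trailing comma/space.
    pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE).strip()


def _replace_phrases(text: str, table: dict) -> str:
    # Whole-word, case-insensitive replacement; the lambda returns the text literally.
    for spoken, written in table.items():
        text = re.sub(rf"\b{re.escape(spoken)}\b", lambda _m: written,
                      text, flags=re.IGNORECASE)
    return text


def apply_dictionary(text: str) -> str:
    return _replace_phrases(text, DICTIONARY)


def expand_snippets(text: str) -> str:
    return _replace_phrases(text, SNIPPETS)


def punctuate(text: str) -> str:
    # Naive placeholder: only guarantees terminal punctuation.
    return text if not text or text[-1] in ".?!" else text + "."


def capitalize(text: str) -> str:
    # Uppercase the first letter of the text and of each new sentence.
    return re.sub(r"(^|[.?!]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)


def format_text(text: str) -> str:
    # Collapse runs of whitespace left behind by earlier stages.
    return re.sub(r"\s+", " ", text).strip()


PIPELINE: List[Callable[[str], str]] = [
    remove_fillers, apply_dictionary, expand_snippets,
    punctuate, capitalize, format_text,
]


def post_process(text: str) -> str:
    for stage in PIPELINE:
        text = stage(text)
    return text


print(post_process("um so voxtral runs on mlx uh send it to my email"))
# prints: So Voxtral runs on MLX send it to me@example.com.
```

Because the stages are just a list, reordering them or adding one (such as a local LLM grammar pass) is a one-line change.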
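A sliding-window KV cache keeps the attention cost of each decoding step bounded: only the most recent W key/value pairs are kept, so a step deep into a long dictation costs the same as an early one. The post doesn't say how EdgeWhisper implements this, so the following is a generic single-head sketch in plain Python, with a hypothetical window size and toy 2-d vectors. A real MLX implementation would hold per-layer tensors rather than Python lists, and the trade-off is that audio older than the window can no longer be attended to.

```python
import math
from collections import deque


class SlidingWindowKVCache:
    """Keeps only the most recent `window` key/value pairs (generic sketch)."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque(maxlen=...) evicts the oldest entry once full, so memory
        # stays O(window) no matter how long the stream runs.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        """Scaled dot-product attention of `query` over the cached window."""
        if not self.keys:
            raise ValueError("cache is empty")
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) / total
                for i in range(dim)]


cache = SlidingWindowKVCache(window=4)  # hypothetical window size
for step in range(10):
    # Each decoding step contributes one toy key/value pair.
    cache.append([float(step), 1.0], [float(step), 0.0])
print(len(cache.keys))  # 4: only steps 6..9 remain
```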

What it does well:
- Works in 20+ terminals and IDEs (VS Code, Xcode, iTerm2, Warp, JetBrains). Most dictation tools break in terminals; we detect them and switch injection strategy.
- Removes filler words ("um", "uh", "like") automatically.
- 13 languages with auto-detection.
- Personal dictionary + snippet expansion with variable support ({{date}}, {{clipboard}}).
- Works fully offline after the model download. No accounts, no telemetry, no analytics.
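Terminal detection plus the AXUIElement/pasteboard fallback boils down to choosing an injection path per frontmost app. This hypothetical Python sketch shows only that decision logic: the bundle IDs are assumptions to verify on your own machine, and `ax_insert` / `paste_insert` are placeholder callbacks. In a Swift app they might wrap `AXUIElementSetAttributeValue` on the focused element's `kAXSelectedTextAttribute`, and `NSPasteboard` plus a synthesized Cmd+V `CGEvent`, respectively.

```python
from enum import Enum
from typing import Callable


class Injection(Enum):
    ACCESSIBILITY = "AXUIElement"        # write at the cursor via the accessibility API
    PASTEBOARD = "NSPasteboard+CGEvent"  # put text on the clipboard, then send Cmd+V


# Assumed bundle IDs of terminals that always get the pasteboard path.
TERMINAL_BUNDLE_IDS = frozenset({
    "com.apple.Terminal",
    "com.googlecode.iterm2",
    "dev.warp.Warp-Stable",
})


def inject(text: str, bundle_id: str,
           ax_insert: Callable[[str], bool],
           paste_insert: Callable[[str], bool]) -> Injection:
    """Try the accessibility path first unless the app is a known terminal."""
    if bundle_id not in TERMINAL_BUNDLE_IDS and ax_insert(text):
        return Injection.ACCESSIBILITY
    # Fallback: terminals, or any app where the accessibility write failed.
    if paste_insert(text):
        return Injection.PASTEBOARD
    raise RuntimeError(f"could not inject text into {bundle_id}")


ok = lambda _text: True
fail = lambda _text: False
print(inject("ls -la", "com.googlecode.iterm2", ok, ok))  # Injection.PASTEBOARD
print(inject("Hello", "com.apple.TextEdit", ok, ok))      # Injection.ACCESSIBILITY
print(inject("Hello", "com.apple.TextEdit", fail, ok))    # Injection.PASTEBOARD
```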
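Snippet variables such as {{date}} and {{clipboard}} can be expanded in a single regex pass over the template. This is a minimal sketch, assuming the `{{name}}` syntax shown in the post and a hypothetical `resolvers` table that computes each value lazily; the ISO date format and the clipboard stand-in are illustrative choices, not the app's documented behavior.

```python
import re
from datetime import date
from typing import Callable, Mapping

VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def expand_snippet(template: str,
                   resolvers: Mapping[str, Callable[[], str]]) -> str:
    def substitute(match):
        resolver = resolvers.get(match.group(1))
        # Unknown variables are left as-is instead of raising.
        return resolver() if resolver else match.group(0)

    # A function replacement is inserted literally, so backslashes are safe.
    return VARIABLE.sub(substitute, template)


resolvers = {
    "date": lambda: date.today().isoformat(),  # YYYY-MM-DD
    # Stand-in: a macOS app would read NSPasteboard here instead.
    "clipboard": lambda: "<clipboard contents>",
}
print(expand_snippet("Notes for {{date}}: {{clipboard}} {{unknown}}", resolvers))
```

Resolving lazily means the clipboard is only read when a snippet actually contains {{clipboard}}.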

What it doesn't do (yet):
- No file/meeting transcription (coming)
- No translation (coming)
- No Linux/Windows (macOS only, Apple Silicon required)

Pricing: Free tier (5 min/day, no account needed). Pro at $7.99/mo or $79.99/yr.

I'd love feedback on:
1. Would local LLM post-processing (e.g., Phi-4-mini via MLX) for grammar/tone be worth the extra ~1GB RAM?
2. For developers using voice→code workflows: what context would you want passed to your editor?
3. Is anyone else building on Voxtral Realtime? I'm curious about your experience with the causal audio encoder.

Comments

Dumbledumb•1h ago

I have been using Parakeet TDT v3 with just 0.6B params and it's insanely fast (feels instant, even on an M1 Air). The accuracy is all I could ask for, so what's the benefit of a much larger 4B model?

Not knocking your app, just asking because it seems very focused on one model, while other apps let the user pick one according to their needs.
