I type a lot and got extremely frustrated with the current state of Mac dictation tools. Most of them are either heavy Electron wrappers, rely on cloud APIs (a privacy nightmare), or force you into a SaaS subscription for a tool that essentially runs on your own hardware. I wanted something that feels native, respects system resources, and runs entirely offline without forced subscriptions.
The stack is Rust, Tauri, and whisper.cpp. Here are the design decisions I made:
Model Size vs. Accuracy: Instead of using the smallest possible model just to claim a tiny footprint, the app downloads a ~490MB multi-language Whisper model on the first run. I found this to be the sweet spot: accuracy high enough (accents, technical jargon) that time spent correcting the output drops dramatically.
Hardware Acceleration: On first run, the model's encoder is compiled to CoreML. This lets transcription run directly on the Apple Neural Engine (ANE) and Metal on M-series chips, keeping the main CPU largely idle.
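For context, the CoreML path in whisper.cpp works by converting the Whisper encoder into a `.mlmodelc` bundle that Core ML can schedule on the ANE. The upstream recipe looks roughly like this (the `base.en` model name is just an example; my app automates the equivalent steps):

```shell
# One-time Python dependencies for the conversion script
pip install ane_transformers openai-whisper coremltools

# Convert the Whisper encoder to a Core ML model
# (produces models/ggml-base.en-encoder.mlmodelc)
./models/generate-coreml-model.sh base.en

# Build whisper.cpp with Core ML support enabled
cmake -B build -DWHISPER_COREML=1
cmake --build build -j
```

At runtime, whisper.cpp picks up the `.mlmodelc` sitting next to the ggml model file and routes the encoder through Core ML, while the decoder still runs via Metal/CPU.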
Memory Footprint: By using Tauri instead of Electron, the UI footprint is negligible. While actively running, the app uses around 500MB of RAM, which makes sense: it is almost entirely the ~490MB model held resident in memory so transcription can start the instant you hit the global shortcut.
Input Method: It uses macOS accessibility APIs to type directly into your active window.
Business Model & Pricing: I strongly dislike subscription fatigue for local tools, so this is a fair one-time purchase (€125 for a lifetime license), with a fully functional 7-day free trial (no account required). Since I highly value the technical feedback from this community, I also generated an exclusive launch code (HN25) that takes 25% off at checkout (dropping it to roughly €93 / ~$100).
Bug Bounty: Since I'm a solo dev, I know I might have missed some edge cases (especially around CoreML compilation on specific M-chips or weird keyboard layouts). If you find a genuine, reproducible bug and take the time to report it here in the thread, I will happily manually upgrade you to a free Lifetime license as a massive thank you for the QA help.
I'd love to hear your technical feedback on the Rust/Tauri architecture or how the CoreML compilation performs on your specific Apple Silicon setup. Happy to answer any questions!
rekabis•1h ago
Quick question: while I love the offline aspect, how does this handle spelling in relation to context? Is that via a ruleset, or is there some intelligence that learns user speaking patterns and common subjects?
Edos8877•1h ago
Good question! The short answer: neither. There are no hardcoded rules, and the app doesn't actively learn your personal speaking patterns over time.
All the context-awareness comes straight from the pre-trained Whisper model. Since it's a transformer network, it attends to the entire sentence rather than transcribing word by word. For example, if you dictate a sentence about coding, it naturally knows to capitalize "Rust" and "Python" instead of writing about rusty metal and snakes.
I deliberately kept the model static. Trying to fine-tune it locally on the fly would mean I'd have to store your voice data (which kills the 100% privacy promise).
That being said, adding a custom dictionary feature, so you can feed it highly specific industry jargon right before you speak, is at the very top of my to-do list!
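One cheap way to implement that feature would be Whisper's initial-prompt conditioning (whisper.cpp exposes this as the `--prompt` option): the model treats the prompt as prior context, which biases the decoder toward the spellings and capitalization it contains. A minimal sketch of assembling such a prompt from a user's jargon list (`build_jargon_prompt` is a hypothetical helper, not anything shipping in the app):

```rust
/// Build an initial prompt that biases Whisper toward domain-specific
/// terms. Hypothetical helper: the resulting string would be passed to
/// whisper.cpp as its initial prompt (`--prompt` on the CLI).
fn build_jargon_prompt(terms: &[&str]) -> String {
    // Whisper conditions on the prompt as prior context, so listing the
    // terms in a short natural phrase nudges the decoder toward their
    // exact spellings and capitalization.
    format!("Glossary: {}.", terms.join(", "))
}

fn main() {
    let prompt = build_jargon_prompt(&["Tauri", "whisper.cpp", "CoreML", "ANE"]);
    println!("{}", prompt); // prints: Glossary: Tauri, whisper.cpp, CoreML, ANE.
}
```

The nice property of this approach is that it stays stateless: nothing about your voice or dictation history needs to be stored, only a plain-text word list.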
Let me know how it handles your vocabulary if you give the trial a spin.