The app uses NVIDIA's Parakeet v3 (TDT) as the primary engine, with inference handled by the FluidAudio library; it's insanely fast on M-series chips. For Intel users, I added local Whisper support, from Tiny up to Large v3 Turbo (quantized to save disk space).
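For anyone curious, the transcription path is roughly shaped like this. This is a sketch based on my reading of FluidAudio's README; the `AsrModels`/`AsrManager` names may not match the current API exactly:

```swift
import FluidAudio

// Rough sketch of Parakeet transcription via FluidAudio.
// Names are assumptions from the README, not guaranteed API.
func transcribe(samples: [Float]) async throws -> String {
    // Downloads the Parakeet CoreML models on first run, then caches them.
    let models = try await AsrModels.downloadAndLoad()
    let manager = AsrManager(config: .default)
    try await manager.initialize(models: models)

    // `samples` is 16 kHz mono PCM captured from the mic.
    let result = try await manager.transcribe(samples)
    return result.text
}
```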
The interesting part was the AI integration. I originally tried to pipeline the ASR output into Apple's on-device Foundation Models framework (Apple Intelligence). It was a nightmare: the guardrails (content moderation) blocked nearly every useful request.
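To give a flavor of the failure mode, here's a sketch of the scrapped approach. The guardrail error case is real FoundationModels API as far as I know; the instructions string and helper are made up:

```swift
import FoundationModels

// Sketch of the abandoned pipeline: ASR output -> Apple's on-device model.
func cleanUp(_ transcript: String) async -> String? {
    let session = LanguageModelSession(
        instructions: "Clean up dictated text. Fix punctuation, keep the wording.")
    do {
        let response = try await session.respond(to: transcript)
        return response.content
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        // This fired constantly, even for mundane dictation,
        // which is why I abandoned this path.
        return nil
    } catch {
        return nil
    }
}
```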
So I scrapped that and moved to MLX Swift. Now it runs Qwen 3 (0.6B to 8B) locally. You set a hotkey, speak, and the local LLM handles formatting, rewriting, and translation with very low latency. Zero data leaves your Mac.
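The rewrite step looks roughly like this. It's a sketch against the mlx-swift-examples LLM API; the model id, prompt, and generate-call details are all assumptions:

```swift
import MLXLLM
import MLXLMCommon

// Sketch of the local rewrite step, assuming the mlx-swift-examples API
// (LLMModelFactory, UserInput, generate). The model id is an example.
func rewrite(_ transcript: String) async throws -> String {
    let container = try await LLMModelFactory.shared.loadContainer(
        configuration: ModelConfiguration(id: "mlx-community/Qwen3-0.6B-4bit"))

    return try await container.perform { context in
        let input = try await context.processor.prepare(
            input: UserInput(prompt: "Fix punctuation and grammar:\n\(transcript)"))
        var text = ""
        _ = try MLXLMCommon.generate(
            input: input,
            parameters: GenerateParameters(),
            context: context
        ) { tokens in
            // Decode incrementally; stop after a reasonable token budget.
            text = context.tokenizer.decode(tokens: tokens)
            return tokens.count < 512 ? .more : .stop
        }
        return text
    }
}
```

In practice the whole loop (hotkey, ASR, rewrite, paste) stays on-device, which is where the low latency comes from.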
It's a one-time purchase (no subscription). I'd love to hear how the latency compares to the standard Whisper implementations you've used.