Show HN: Dicta.to – Local voice dictation for Mac with on-device AI

https://dicta.to/

2•alamparelli•3h ago

I built a macOS dictation app where everything runs on-device: transcription, auto-correct, and translation. No audio or text leaves your machine.

It ships with 4 transcription engines you can swap between: WhisperKit (99 languages), NVIDIA Parakeet TDT 0.6B (25 European languages, fastest of the bunch), Qwen3-ASR 0.6B (30 languages), and Apple Speech on macOS 26+. They all run through CoreML/Metal. Whisper is the most versatile, Parakeet wins on raw latency for European languages, Qwen3 does better with CJK. I went with a protocol-based architecture so you pick the engine that fits your use case instead of me pretending one model rules them all.
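To make the protocol-based idea concrete, here is a minimal sketch of how engines could sit behind one interface, with an optional auto-select by language. Every name in it (`TranscriptionEngine`, `StubEngine`, `pickEngine`) and the language sets are hypothetical placeholders for illustration, not Dicta.to's actual code or the real coverage of each model.

```swift
import Foundation

// Hypothetical protocol: each engine hides its model behind the same interface.
protocol TranscriptionEngine {
    var name: String { get }
    var supportedLanguages: Set<String> { get }
    func transcribe(_ samples: [Float], language: String) async throws -> String
}

// Stand-in engine so the sketch runs without any model weights.
struct StubEngine: TranscriptionEngine {
    let name: String
    let supportedLanguages: Set<String>

    func transcribe(_ samples: [Float], language: String) async throws -> String {
        "[\(name)] \(samples.count) samples, language \(language)"
    }
}

/// Returns the first engine, in preference order, that supports `language`.
func pickEngine(for language: String,
                from engines: [any TranscriptionEngine]) -> (any TranscriptionEngine)? {
    engines.first { $0.supportedLanguages.contains(language) }
}

// Preference order is an assumption: fastest first, most versatile last.
// The language sets are truncated placeholders.
let engines: [any TranscriptionEngine] = [
    StubEngine(name: "parakeet", supportedLanguages: ["en", "fr", "de"]),
    StubEngine(name: "qwen3-asr", supportedLanguages: ["zh", "ja", "ko"]),
    StubEngine(name: "whisper", supportedLanguages: ["en", "fr", "de", "zh", "ja", "ko", "sw"]),
]
```

With this shape, a manual picker and an auto-select default can coexist: `pickEngine(for:from:)` supplies the default, and the user's explicit choice simply overrides it.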

After transcription, there's an optional post-processing pipeline using Apple Intelligence (FoundationModels framework, macOS 26+, also fully on-device): auto-correct with filler word removal, tone rewriting, translation. The annoying part was FoundationModels cold start. First inference after idle takes 2-3s, which kills the experience. I worked around it by firing a throwaway mini-inference (`session.respond(to: "ok")`) in parallel while audio is still being transcribed, so the model is already warm when the text arrives. Hacky, but it shaved off the perceived latency.
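The warm-up trick can be sketched with structured concurrency. This is only an illustration of the pattern: `TextModel` and `FakeModel` are hypothetical stand-ins for the real FoundationModels session, and the 50 ms sleep simulates the cold start rather than measuring it.

```swift
import Foundation

// Hypothetical stand-in for an on-device language model session.
protocol TextModel {
    func respond(to prompt: String) async throws -> String
}

// Fake model that pays a simulated cold-start cost on its first request.
actor FakeModel: TextModel {
    private var warm = false
    private(set) var coldStarts = 0

    func respond(to prompt: String) async throws -> String {
        if !warm {
            coldStarts += 1
            try await Task.sleep(nanoseconds: 50_000_000) // simulated cold start
            warm = true
        }
        return "corrected: \(prompt)"
    }
}

/// Throwaway request; the result and any error are deliberately ignored.
func warmUp<M: TextModel & Sendable>(_ model: M) async {
    _ = try? await model.respond(to: "ok")
}

/// Runs the warm-up concurrently with transcription, then post-processes the text.
func dictate<M: TextModel & Sendable>(
    model: M,
    transcribe: () async throws -> String
) async throws -> String {
    async let warming: Void = warmUp(model)   // starts immediately in a child task
    let raw = try await transcribe()          // overlaps with the warm-up
    await warming                             // at most the remaining cold-start time
    return try await model.respond(to: raw)
}
```

If transcription takes longer than the cold start, the final `respond(to:)` call should hit an already warm model; if it is shorter, the wait is only the remainder of the warm-up.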

Getting transcribed text into any arbitrary macOS app was honestly the hardest part. I use clipboard save/restore: read all NSPasteboard types (not just strings, also images, RTF, whatever the user had copied), write the transcribed text, simulate Cmd+V via CGEvent posted to `cghidEventTap`, then restore the original clipboard. Electron apps are slower to process paste events, so I detect them by checking if `Contents/Frameworks/Electron Framework.framework` exists in the app bundle and add extra delay. This whole approach requires Accessibility permissions, which means no sandbox, which means no App Store. I'm fine with that trade-off.
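For readers who want to try the same approach, here is a rough sketch of the save, paste, restore sequence described above. It is not the app's actual code: the function names are made up, and the two delay values are placeholders rather than measured numbers. The AppKit part is guarded so the Electron check can be compiled on its own.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// True if the app bundle at `bundleURL` embeds the Electron framework.
func isElectronApp(at bundleURL: URL) -> Bool {
    let framework = bundleURL
        .appendingPathComponent("Contents/Frameworks/Electron Framework.framework")
    return FileManager.default.fileExists(atPath: framework.path)
}

/// Delay before restoring the clipboard. Both values are placeholder assumptions.
func restoreDelay(forBundleAt bundleURL: URL?) -> TimeInterval {
    guard let url = bundleURL, isElectronApp(at: url) else { return 0.1 }
    return 0.3
}

#if canImport(AppKit)
typealias ClipboardSnapshot = [[NSPasteboard.PasteboardType: Data]]

/// Copies every type of every pasteboard item (strings, images, RTF, ...).
func snapshotClipboard(_ pb: NSPasteboard = .general) -> ClipboardSnapshot {
    (pb.pasteboardItems ?? []).map { item in
        var contents: [NSPasteboard.PasteboardType: Data] = [:]
        for type in item.types {
            if let data = item.data(forType: type) { contents[type] = data }
        }
        return contents
    }
}

/// Rebuilds the pasteboard from a snapshot.
func restoreClipboard(_ snapshot: ClipboardSnapshot, to pb: NSPasteboard = .general) {
    pb.clearContents()
    let items = snapshot.map { contents -> NSPasteboardItem in
        let item = NSPasteboardItem()
        for (type, data) in contents { item.setData(data, forType: type) }
        return item
    }
    pb.writeObjects(items)
}

/// Simulates Cmd+V; requires the Accessibility permission to have any effect.
func postCommandV() {
    let source = CGEventSource(stateID: .combinedSessionState)
    let vKey: CGKeyCode = 9 // kVK_ANSI_V
    for keyDown in [true, false] {
        let event = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: keyDown)
        event?.flags = .maskCommand
        event?.post(tap: .cghidEventTap)
    }
}

/// Save the clipboard, paste `text` into the frontmost app, then restore.
func paste(_ text: String) {
    let pb = NSPasteboard.general
    let saved = snapshotClipboard(pb)
    pb.clearContents()
    pb.setString(text, forType: .string)
    postCommandV()
    let target = NSWorkspace.shared.frontmostApplication?.bundleURL
    DispatchQueue.main.asyncAfter(deadline: .now() + restoreDelay(forBundleAt: target)) {
        restoreClipboard(saved, to: pb)
    }
}
#endif
```

The fixed delay is the fragile part: restoring too early makes the target app paste the old clipboard contents, which is presumably why slower Electron apps need the longer wait.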

Built this solo in about 6 weeks. One-time purchase, no subscription.

I'm genuinely unsure about the multi-engine approach. Is letting users choose between Whisper/Parakeet/Qwen3 useful, or would most people prefer I just auto-select based on their language? Also curious if anyone has a cleaner approach to text injection on macOS. The clipboard hack works everywhere but it feels fragile and I don't love it.
