Dictate in most languages. It types into any app.
Listen back with local Piper TTS.
3,000 words free. Then $6.99/month.
Next: joins meetings, transcribes, writes short notes. Later: lightweight Obsidian-style notes built from your text.
Built on Whisper + Piper. Runs on your machine.
Feedback on UX, speed, and pricing is welcome.
mesadb•2mo ago
My goal was "press hotkey, start talking, see text within ~1–2 seconds" on an M2 MacBook Pro, with support for multiple languages.
First attempts (cloud)
– I tried Hugging Face real-time transcription. It worked, but latency was all over the place and the costs would not scale.
– I tried OpenAI real-time transcription. Latency was better (I saw ~200 ms responses), but with background noise it would transcribe the wrong things. I may bring it back if I can make it stable.
– I briefly experimented with Gemini for transcribing and formatting multi-language text. Quality was not consistent enough compared to Whisper for multi-language input.
Local experiments
– I used FFmpeg + Whisper CLI in a bunch of ways: batching, buffering, and trying to "stream" partial results out of Whisper to make it feel live (rough sketch after this list).
– I also tried a local Llama model to format the raw transcript into an email. On an M2 Pro this took ~2 seconds for short emails and got much slower for long text. The output looked nice, but the latency was not acceptable for everyday use.
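To make the chunking idea concrete, here is a minimal sketch of FFmpeg segmenting plus one Whisper CLI call per chunk. It assumes whisper.cpp's command-line tool and a ggml model; the binary name, model path, device index, and 2-second chunk length are placeholders, not the app's actual code.

```python
import glob
import os
import subprocess
import time

# Assumed paths: whisper.cpp CLI binary and a downloaded ggml model (placeholders).
WHISPER_BIN = "./whisper-cli"
MODEL = "models/ggml-base.bin"
CHUNK_DIR = "/tmp/dictation_chunks"

def start_recorder():
    """Capture the default mic with FFmpeg and cut it into ~2 s WAV chunks."""
    os.makedirs(CHUNK_DIR, exist_ok=True)
    return subprocess.Popen([
        "ffmpeg", "-loglevel", "quiet",
        "-f", "avfoundation", "-i", ":0",   # macOS default audio input
        "-ac", "1", "-ar", "16000",         # mono, 16 kHz, which Whisper expects
        "-f", "segment", "-segment_time", "2",
        os.path.join(CHUNK_DIR, "chunk_%05d.wav"),
    ])

def transcribe(path):
    """Run one chunk through the Whisper CLI and return the plain text."""
    out = subprocess.run(
        [WHISPER_BIN, "-m", MODEL, "-f", path,
         "--language", "auto", "--no-timestamps"],
        capture_output=True, text=True,
    )
    return out.stdout.strip()

def main():
    rec = start_recorder()
    seen = set()
    try:
        while True:
            chunks = sorted(glob.glob(os.path.join(CHUNK_DIR, "chunk_*.wav")))
            # Skip the newest file: FFmpeg is probably still writing it.
            for path in chunks[:-1]:
                if path in seen:
                    continue
                seen.add(path)
                text = transcribe(path)
                if text:
                    print(text, flush=True)  # a real app would type this into the focused field
            time.sleep(0.2)
    finally:
        rec.terminate()

if __name__ == "__main__":
    main()
```

Chunk length is the main knob here: shorter segments feel more live, but each Whisper call gets less context to work with.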
Where I ended up (for now)
– The current version sticks to FFmpeg + Whisper CLI locally, optimized for short chunks, so you usually see text within about 1–2 seconds.
– I dropped the heavy on-device LLM formatting and keep the formatting logic much simpler so it stays predictable and fast (an example of that kind of rule-based cleanup is sketched below).
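The post does not spell out what the simpler formatting does, but to illustrate why rule-based cleanup stays predictable and fast compared to an LLM pass, something along these lines would fit (the rules and function name are mine, not the app's):

```python
import re

def tidy(transcript: str) -> str:
    """Cheap, deterministic cleanup of a raw transcript. No LLM involved."""
    text = " ".join(transcript.split())                # collapse runs of whitespace
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)       # drop spaces before punctuation
    # Ensure one space after punctuation (naive: would also split decimals like 3.5).
    text = re.sub(r"([,.!?;:])(?=\S)", r"\1 ", text)
    # Capitalize the first letter of each sentence.
    text = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    return text

print(tidy("hello there,this is   a test.does it work ?yes"))
# -> Hello there, this is a test. Does it work? Yes
```

Rules like these run in microseconds even on long dictations, which is the kind of predictability the heavier Llama formatting could not give.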
Next step is to re-introduce “smart” formatting and meeting notes, but only when I can do it without blowing up latency. Happy to dig deeper into any of these if people are curious.