How it works technically:
- Push-to-talk hotkey (configurable) → audio captured → transcribed → pasted directly into the active text field via OS accessibility APIs
- Local mode: runs faster-whisper (or MLX-Whisper on Apple Silicon) entirely on-device. Audio never leaves the machine.
- Cloud mode (Pro): audio → Cloud → LLM for ASR → Llama 3.1-8B for formatting → back to client via WebSocket
- App-aware formatting: detects the active app (Discord, Outlook, VS Code, Slack) and applies different formatting styles via prompt engineering
- Auto-learn dictionary: reads the text field 7s after injection via UIA/AXUIElement, diffs against injected text, learns corrections
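For anyone curious what the auto-learn diff step might look like, here is a minimal sketch. It assumes a word-level comparison using Python's standard `difflib`; the function name `learn_corrections` and the sample strings are hypothetical, and the real app may diff at a different granularity or filter results further.

```python
import difflib


def learn_corrections(injected: str, observed: str) -> dict[str, str]:
    """Map each replaced word run in `injected` to what the user changed it to.

    Hypothetical helper: `injected` is the text that was pasted, `observed`
    is the text read back from the field a few seconds later.
    """
    before = injected.split()
    after = observed.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)

    corrections: dict[str, str] = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" spans count as corrections; pure insertions or
        # deletions are more likely the user continuing to type or trimming.
        if tag == "replace":
            corrections[" ".join(before[i1:i2])] = " ".join(after[j1:j2])
    return corrections


if __name__ == "__main__":
    pasted = "send the report to jon at lotus queue"
    edited = "send the report to John at LotusQ"
    print(learn_corrections(pasted, edited))
```

Ignoring insertions and deletions is a design choice in this sketch: it keeps text the user typed after the paste from being mistaken for a correction.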
Stack: Tauri (Rust + React), Python FastAPI, GCP Cloud Run, Supabase auth, Stripe billing.
Why I built it: Wispr Flow is the dominant player, but it's Mac & Windows only and has no free tier. Linux users have been asking for over a year. I wanted something that actually works everywhere, respects privacy in local mode, and gets smarter over time.
Free tier: Unlimited local Whisper, forever. No account required to start.
Download and source available at: https://lotusq.app/download
Happy to answer questions about the architecture. The hybrid local/cloud routing logic and the UIA-based correction detection were the trickiest parts.
vacano•2h ago