SpeechDock can capture and transcribe system-wide audio or audio from a specific app — video calls, online lectures, podcasts, anything your Mac can play. It also does the reverse: select any text on screen (or capture it via OCR) and have it read aloud.
Key points:
- Works out of the box with macOS native STT/TTS (no API keys needed)
- Optionally connect OpenAI, Gemini, ElevenLabs, or Grok for higher accuracy
- Real-time subtitle overlay with translation (80+ languages)
- Global hotkeys: use from anywhere without switching apps
- AppleScript support for automation
- Open source (Apache 2.0)
Requires macOS 14+. Built with Swift/SwiftUI.
GitHub: https://github.com/yohasebe/speechdock
Docs: https://yohasebe.github.io/speechdock/