The latest release (v0.15) adds Local Inference mode — fully on-device ASR, translation, and TTS using WASM and WebGPU. No API key, no internet, no data leaving your machine. It ships with:
- 48 ASR models covering 99+ languages (sherpa-onnx WASM + Whisper WebGPU)
- 55+ translation language pairs (Opus-MT) plus multilingual LLMs (Qwen 2.5/3/3.5) via WebGPU
- 136 TTS models across 53 languages (Piper, Coqui, Mimic3, Matcha)
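The local pipeline chains those three stages. A minimal sketch of the flow, with hypothetical stage interfaces (these names are illustrative, not Sokuji's actual internals):

```typescript
// Each stage is pluggable: ASR and TTS run on WASM/WebGPU models,
// translation via Opus-MT or a local LLM. All types here are assumptions
// for illustration, not the project's real API.
type Asr = (audio: Float32Array) => Promise<string>;
type Translate = (text: string, from: string, to: string) => Promise<string>;
type Tts = (text: string) => Promise<Float32Array>;

async function translateSpeech(
  audio: Float32Array,
  from: string,
  to: string,
  stages: { asr: Asr; translate: Translate; tts: Tts },
): Promise<{ transcript: string; translation: string; audio: Float32Array }> {
  const transcript = await stages.asr(audio);                       // on-device ASR
  const translation = await stages.translate(transcript, from, to); // local MT or LLM
  const outAudio = await stages.tts(translation);                   // local TTS voice
  return { transcript, translation, audio: outAudio };
}
```

Keeping the stages behind function types is what lets one app swap between 48 ASR and 136 TTS models without touching the orchestration code.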
For those who prefer cloud providers, it also supports OpenAI Realtime API, Google Gemini Live, Palabra.ai, Volcengine ST, Doubao AST 2.0, and any OpenAI-compatible endpoint.
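"Any OpenAI-compatible endpoint" means the same request shape works against hosted APIs and self-hosted servers alike. A sketch of how such a translation request might be assembled (the base URL, model name, and prompt are assumptions, not Sokuji's actual configuration):

```typescript
// Builds a chat-completions request for any OpenAI-compatible server.
// Self-hosted endpoints often need no API key, so the Authorization
// header is only added when a key is configured.
interface EndpointConfig {
  baseUrl: string; // e.g. a hypothetical local server at "http://localhost:8000/v1"
  apiKey?: string; // optional for self-hosted endpoints
  model: string;
}

function buildTranslationRequest(cfg: EndpointConfig, text: string, targetLang: string) {
  return {
    url: `${cfg.baseUrl}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      ...(cfg.apiKey ? { Authorization: `Bearer ${cfg.apiKey}` } : {}),
    },
    body: {
      model: cfg.model,
      messages: [
        {
          role: "system",
          content: `Translate the user's message into ${targetLang}. Reply with the translation only.`,
        },
        { role: "user", content: text },
      ],
    },
  };
}
```

The returned object can be passed straight to `fetch` with `JSON.stringify(body)`; only `baseUrl` and `model` differ between providers.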
The browser extension integrates with Google Meet, Teams, Zoom, Discord, Slack, and others — it can capture participant audio and inject translated speech via a virtual microphone.
Tech stack: React + Zustand + Vite, Electron Forge, sherpa-onnx compiled to WASM, HuggingFace Transformers.js for WebGPU inference. Models are downloaded on demand and cached in IndexedDB.
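The download-on-demand pattern can be sketched as a cache-through loader. In the app the store would be IndexedDB; here a pluggable store interface (with an in-memory stand-in) keeps the idea self-contained. All names are illustrative, not Sokuji's internals:

```typescript
// Model weights are fetched once, persisted, and served from cache on
// every later launch. The store interface is an assumption standing in
// for the app's IndexedDB-backed storage.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, data: Uint8Array): Promise<void>;
}

class MemoryStore implements ModelStore {
  private map = new Map<string, Uint8Array>();
  async get(key: string) { return this.map.get(key); }
  async put(key: string, data: Uint8Array) { this.map.set(key, data); }
}

async function loadModel(
  id: string,
  store: ModelStore,
  download: (id: string) => Promise<Uint8Array>, // e.g. fetch from a model hub
): Promise<Uint8Array> {
  const cached = await store.get(id);
  if (cached) return cached;       // cache hit: fully offline
  const data = await download(id); // cache miss: fetch once
  await store.put(id, data);       // persist for next launch
  return data;
}
```

After the first download, every subsequent load resolves from the store, which is what makes the no-internet mode practical.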
I built this because existing translation tools either require expensive API keys, send your audio to the cloud, or don't support enough languages. The local inference mode makes it practical for privacy-sensitive use cases and for people without reliable internet.
AGPL-3.0 licensed. Available on Windows, macOS, Linux, Chrome Web Store, and Edge Add-ons.
GitHub: https://github.com/kizuna-ai-lab/sokuji
Official site: https://sokuji.kizuna.ai