No API keys, no backend, no data leaves your machine.
When you open the site, you'll hear it immediately — the landing page auto-generates speech from three different sentences right in your browser, no setup required.
You can then try any model yourself: type text, hit generate, hear it instantly. Models download once and get cached locally.
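The "download once, cached locally" behavior can be sketched as a memoized loader. This is a simplified, hypothetical illustration (the real app relies on the library's own browser-side caching), with `fetchWeights` standing in for the network download:

```javascript
// Hypothetical download-once loader: first call fetches, later calls hit the cache.
const modelCache = new Map();
let downloads = 0; // counts simulated network downloads

async function fetchWeights(modelId) {
  downloads += 1; // stands in for the actual model download
  return { id: modelId, weights: new ArrayBuffer(8) };
}

async function loadModel(modelId) {
  if (!modelCache.has(modelId)) {
    modelCache.set(modelId, await fetchWeights(modelId));
  }
  return modelCache.get(modelId); // subsequent loads are instant, no re-download
}
```

In the browser the cache is persistent storage rather than an in-memory map, so the download survives page reloads.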
The most experimental feature: a fully in-browser Voice Agent. It chains speech-to-text → LLM → text-to-speech, all running locally on your GPU via WebGPU. You can have a spoken conversation with an AI without a single network request.
Currently supported models:

- TTS: Kokoro 82M, SpeechT5, Piper (VITS)
- STT: Whisper Tiny, Whisper Base
Other features:

- Side-by-side model comparison
- Speed benchmarking on your hardware
- Streaming generation for supported models
Source: https://github.com/MbBrainz/ttslab (MIT)
Feedback I'd especially like:

1. How does performance feel on your hardware?
2. What models should I add next?
3. Did the Voice Agent work for you? That's the most experimental part.
Built on top of ONNX Runtime Web (https://onnxruntime.ai) and Transformers.js — huge thanks to those communities for making in-browser ML inference possible.
MbBrainz•2h ago
The Voice Agent chains three models in the browser: Whisper for STT → a local LLM → Kokoro/SpeechT5 for TTS. All inference runs on-device via WebGPU. The latency isn't amazing yet, but the fact that it works at all with zero backend is kind of wild.
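The shape of that chain is simple to sketch. Here's a minimal version with stub functions standing in for the real model pipelines (all names here are hypothetical, and the stubs are synchronous-in-spirit placeholders for what are actually WebGPU inference calls):

```javascript
// Stubs standing in for the real in-browser model pipelines.
async function transcribe(audio) {
  // Real version: a Whisper speech-to-text pipeline running on WebGPU.
  return "what time is it"; // stub: pretend Whisper heard this
}

async function generateReply(prompt) {
  // Real version: a local LLM text-generation pipeline.
  return `You said: "${prompt}"`; // stub: echo the user back
}

async function synthesize(text) {
  // Real version: Kokoro/SpeechT5 text-to-speech returning audio samples.
  return { text, samples: new Float32Array(16000) }; // stub: 1s of silence at 16 kHz
}

// One conversational turn: mic audio in, spoken reply out.
async function voiceAgentTurn(micAudio) {
  const userText = await transcribe(micAudio);      // STT
  const replyText = await generateReply(userText);  // LLM
  return synthesize(replyText);                     // TTS
}
```

Because each stage is just an async function, the stages can be swapped independently, which is also what makes model comparison cheap.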
The landing page has an auto-playing demo that generates speech locally as soon as you visit — you'll hear it typewrite and speak three sentences. That was important to me because "runs in your browser" sounds like marketing until you actually hear it happen.
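The typewriter part of that demo is ordinary timer-driven text reveal; a minimal sketch (hypothetical names, and in the real demo the done-callback is where local TTS kicks in and plays):

```javascript
// Reveal a sentence character by character, then fire a completion callback.
function typewrite(sentence, onChar, onDone) {
  let i = 0;
  const tick = () => {
    onChar(sentence.slice(0, ++i)); // current visible prefix
    if (i < sentence.length) setTimeout(tick, 30); // ~30ms per character
    else onDone(sentence); // real demo: synthesize + play the sentence here
  };
  tick();
}
```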
Happy to go deep on the WebGPU inference pipeline, model conversion process, or anything else.