No API keys, no backend, no data leaves your machine.
When you open the site, you'll hear it immediately — the landing page auto-generates speech from three different sentences right in your browser, no setup required.
You can then try any model yourself: type text, hit generate, hear it instantly. Models download once and get cached locally.
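The "download once, cached locally" behavior can be sketched as a memoized loader. This is a simplified, hypothetical illustration (the real app relies on the library's own browser-side caching), with `fetchWeights` standing in for the network download:

```javascript
// Hypothetical download-once loader: first call fetches, later calls hit the cache.
const modelCache = new Map();
let downloads = 0; // counts simulated network downloads

async function fetchWeights(modelId) {
  downloads += 1; // stands in for the actual model download
  return { id: modelId, weights: new ArrayBuffer(8) };
}

async function loadModel(modelId) {
  if (!modelCache.has(modelId)) {
    modelCache.set(modelId, await fetchWeights(modelId));
  }
  return modelCache.get(modelId); // subsequent loads are instant, no re-download
}
```

In the browser the cache is persistent storage rather than an in-memory map, so the download survives page reloads.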
The most experimental feature: a fully in-browser Voice Agent. It chains speech-to-text → LLM → text-to-speech, all running locally on your GPU via WebGPU. You can have a spoken conversation with an AI without a single network request.
Currently supported models:

- TTS: Kokoro 82M, SpeechT5, Piper (VITS)
- STT: Whisper Tiny, Whisper Base
Other features:

- Side-by-side model comparison
- Speed benchmarking on your hardware
- Streaming generation for supported models
Source: https://github.com/MbBrainz/ttslab (MIT)
Feedback I'd especially like:

1. How does performance feel on your hardware?
2. What models should I add next?
3. Did the Voice Agent work for you? That's the most experimental part.
Built on top of ONNX Runtime Web (https://onnxruntime.ai) and Transformers.js — huge thanks to those communities for making in-browser ML inference possible.
MbBrainz•2h ago
The Voice Agent chains three models in the browser: Whisper for STT → a local LLM → Kokoro/SpeechT5 for TTS. All inference runs on-device via WebGPU. The latency isn't amazing yet, but the fact that it works at all with zero backend is kind of wild.
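The shape of that chain is simple to sketch. Here's a minimal version with stub functions standing in for the real model pipelines (all names here are hypothetical, and the stubs are synchronous-in-spirit placeholders for what are actually WebGPU inference calls):

```javascript
// Stubs standing in for the real in-browser model pipelines.
async function transcribe(audio) {
  // Real version: a Whisper speech-to-text pipeline running on WebGPU.
  return "what time is it"; // stub: pretend Whisper heard this
}

async function generateReply(prompt) {
  // Real version: a local LLM text-generation pipeline.
  return `You said: "${prompt}"`; // stub: echo the user back
}

async function synthesize(text) {
  // Real version: Kokoro/SpeechT5 text-to-speech returning audio samples.
  return { text, samples: new Float32Array(16000) }; // stub: 1s of silence at 16 kHz
}

// One conversational turn: mic audio in, spoken reply out.
async function voiceAgentTurn(micAudio) {
  const userText = await transcribe(micAudio);      // STT
  const replyText = await generateReply(userText);  // LLM
  return synthesize(replyText);                     // TTS
}
```

Because each stage is just an async function, the stages can be swapped independently, which is also what makes model comparison cheap.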
The landing page has an auto-playing demo that generates speech locally as soon as you visit — you'll hear it typewrite and speak three sentences. That was important to me because "runs in your browser" sounds like marketing until you actually hear it happen.
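The typewriter part of that demo is ordinary timer-driven text reveal; a minimal sketch (hypothetical names, and in the real demo the done-callback is where local TTS kicks in and plays):

```javascript
// Reveal a sentence character by character, then fire a completion callback.
function typewrite(sentence, onChar, onDone) {
  let i = 0;
  const tick = () => {
    onChar(sentence.slice(0, ++i)); // current visible prefix
    if (i < sentence.length) setTimeout(tick, 30); // ~30ms per character
    else onDone(sentence); // real demo: synthesize + play the sentence here
  };
  tick();
}
```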
Happy to go deep on the WebGPU inference pipeline, model conversion process, or anything else.