The endless loop grew tiresome: download a model, script a test, watch it falter on my setup while big tech smirks from afar with hosted APIs. Trial-and-error felt like a slow grind, so Speechos came about: a web UI to drop in audio, switch models on the fly, and compare the results side by side.
All local, data stays put. Mic input or file upload. It auto-senses GPU/CPU/RAM for smart defaults, but tweaks are possible.
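A minimal sketch of what the auto-sensing could look like (hypothetical names, not Speechos's actual code): probe for a GPU and available RAM, degrading gracefully when torch or psutil isn't installed, then map the result to a default faster-whisper size.

```python
def detect_hardware():
    """Best-effort probe of GPU, VRAM, and RAM; every dependency is optional."""
    gpu, vram_gb = False, 0.0
    try:
        import torch  # optional: only consulted if installed
        if torch.cuda.is_available():
            gpu = True
            vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    except ImportError:
        pass
    try:
        import psutil  # optional: fall back to a conservative guess
        ram_gb = psutil.virtual_memory().total / 1e9
    except ImportError:
        ram_gb = 4.0
    return gpu, vram_gb, ram_gb


def default_model(gpu, vram_gb, ram_gb):
    """Map detected resources to a default faster-whisper model size."""
    if gpu and vram_gb >= 10:
        return "large-v3"
    if gpu or ram_gb >= 8:
        return "small"
    if ram_gb >= 4:
        return "base"
    return "tiny"
```

The thresholds here are illustrative guesses; the point is that defaults come from a probe rather than a config file, with everything still overridable.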
Built-in (no Docker): faster-whisper (tiny to large-v3), Vosk, Wav2Vec2, Piper, Kokoro, Bark, eSpeak, Chatterbox, emotion2vec+, HuBERT, Resemblyzer, Silero VAD.
Docker extras: XTTS, ChatTTS, Orpheus, Fish-Speech, Qwen3-TTS, Parler, MeloTTS, Speaches, NeMo, PyAnnote, and more.
Python/FastAPI backend + Next.js frontend, managed with uv/pnpm. ./dev.sh starts everything. MIT-licensed; scales from basic models on a 2GB-RAM CPU-only box to the full lineup on a 24GB GPU.
Grab it if it speaks to you.