A 135M-parameter TTS model trained for ~$100 on a single GPU, running ~20× real-time on a base MacBook M3 CPU.
v1.5 highlights (on CPU):
• 250 ms TTFA (time to first audio) when streaming
• 0.05 RTF (real-time factor), i.e. ~20× real-time
• Zero-shot voice cloning
• Smaller, faster, and more stable than the previous release
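The two latency numbers are linked by simple arithmetic. RTF is wall-clock synthesis time divided by the duration of the audio produced, so 0.05 RTF means one second of speech takes about 50 ms to generate, and 1 / 0.05 = 20× real-time. TTFA is the delay before the first audio chunk arrives, which matters more than RTF for streaming. Below is a minimal sketch of measuring both for any synthesizer that yields audio in chunks. The function names, the chunked-stream interface, and the sample rate are illustrative assumptions, not part of the sopro API.

```python
import time
from typing import Callable, Iterator, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of audio produced (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """An RTF of r means audio is produced 1/r times faster than real time."""
    return 1.0 / rtf


def measure_stream(
    stream: Iterator[Sequence[float]],  # hypothetical: yields chunks of audio samples
    sample_rate: int,                   # assumed sample rate of the stream, in Hz
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, float]:
    """Consume a chunked audio stream and return (ttfa_seconds, rtf)."""
    start = clock()
    ttfa = None
    n_samples = 0
    for chunk in stream:
        if ttfa is None:
            ttfa = clock() - start  # time to first audio chunk
        n_samples += len(chunk)
    total = clock() - start  # total wall-clock time to drain the stream
    if ttfa is None:
        raise ValueError("stream produced no audio")
    return ttfa, real_time_factor(total, n_samples / sample_rate)
```

For example, `real_time_factor(0.5, 10.0)` gives `0.05`, and `speedup(0.05)` gives `20.0`, matching the headline figures. Passing `clock` as a parameter keeps the measurement testable with a fake timer.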
Still not perfect (out-of-distribution voices can be tricky, and some artifacts remain), but a decent upgrade.
Repo: https://github.com/samuel-vitorino/sopro
sammyyyyyyy