The TTS example clip in the repo of 'spontaneous singing' is creepy as fuck
Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.
https://github.com/microsoft/VibeVoice/commit/e73d1e17c3754f...
which is Microsoft for "we removed two dead links". AI innovation knows no limits!
- Cohere Transcribe (self hosted)
- Grok Speech To Text (they provide an API, only $0.10/hr!)
They are both excellent. I'm not sure about this one. Would you like to see it in a consumer speech-to-text app?
This ship has sailed. It’s now in the same category as hacker/cracker and the pronunciation of GIF.
CubsFan1060•41m ago
542458•32m ago
JumpCrisscross•2m ago
Why?