The goal was to keep Kokoro’s speed and real-time performance while letting it generate speech in the timbre of a reference voice.
You can type text, upload a ~3–10 second voice sample, and generate speech in that voice.
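If you are preparing reference clips locally, a quick length check before uploading can help. The sketch below is not part of KokoClone. It is a hypothetical helper that assumes an uncompressed PCM WAV file and uses bounds borrowed loosely from the "~3–10 second" guidance above (the tool's actual limits, if any, aren't stated). It uses only Python's standard `wave` module:

```python
import wave

# Assumed bounds, based loosely on the "~3-10 second" guidance;
# KokoClone's real limits (if it enforces any) are not stated.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Total frames divided by frames per second gives seconds.
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path: str) -> bool:
    """Hypothetical pre-upload check: is the clip roughly 3-10 s long?"""
    duration = clip_duration_seconds(path)
    return MIN_SECONDS <= duration <= MAX_SECONDS
```

For example, `is_suitable_reference("my_voice.wav")` returns `True` for a 5-second recording and `False` for a 1-second one. Other formats such as MP3 would need a decoder library instead of `wave`.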
It supports several languages, including English, Hindi, French, Japanese, Chinese, Spanish, Portuguese, and Italian.
It runs on CPU and can use a GPU if one is available.
Live demo: https://huggingface.co/spaces/PatnaikAshish/kokoclone
Would appreciate feedback.