
Show HN: KVoiceWalk – Voice cloning for Kokoro TTS using random walk algorithms

https://github.com/RobViren/kvoicewalk

13•robviren•8mo ago

I was blown away by Kokoro and what it manages to do in so little space. I became curious whether it would be possible to create new voices by directly manipulating the style tensors. After many failed attempts I finally landed on a method for scoring the similarity of two audio segments that works well enough to random walk toward similar voices for Kokoro. I plan on using this scoring as part of a genetic algorithm, but wanted to baseline test it with this code.

The scoring mechanism uses Resemblyzer to calculate two things: similarity to the target audio, and similarity to another segment of audio the model generates itself (self similarity). Self similarity was key in keeping the model stable and the audio consistent across inputs, but it was not enough to prevent overfitting to Resemblyzer.
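To make the two embedding-based scores concrete, here is a minimal sketch. It is not the code from the repo. It assumes the speaker embeddings already exist as plain lists of floats (Resemblyzer's `VoiceEncoder.embed_utterance` produces vectors of this kind), and the function names `cosine_similarity` and `embedding_scores` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a silent or empty clip has no meaningful direction
    return dot / (norm_a * norm_b)


def embedding_scores(target_embed, generated_embed, second_generated_embed):
    """Return the two embedding-based scores described in the post.

    target_embed:            embedding of the target speaker's audio
    generated_embed:         embedding of audio generated with the candidate voice
    second_generated_embed:  embedding of a second, different text generated
                             with the same candidate voice (assumption: this is
                             how "self similarity" is measured)
    """
    return {
        "target_similarity": cosine_similarity(generated_embed, target_embed),
        "self_similarity": cosine_similarity(generated_embed, second_generated_embed),
    }
```

A candidate voice that matches the target on one sentence but drifts on another would score high on `target_similarity` and low on `self_similarity`, which is the instability the second score is meant to catch.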

I had to create a third metric, which uses a normalized difference between a variety of audio features and the corresponding target features. Summing those gives a feature similarity metric that keeps audio quality from degrading too much and prevents overfitting.
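A rough sketch of what such a feature metric could look like follows. The feature names in the usage note are placeholders, and mapping the summed difference to a score with `1 / (1 + penalty)` is an assumption made here for illustration. The repo may normalize and combine its features differently.

```python
def feature_penalty(generated, target, eps=1e-8):
    """Sum of per-feature differences, each normalized by the target value.

    `generated` and `target` are dicts mapping a feature name to a float.
    """
    total = 0.0
    for name, target_value in target.items():
        diff = abs(generated[name] - target_value)
        # Normalizing by the target magnitude keeps large-valued features
        # (for example a frequency in Hz) from dominating small-valued ones.
        total += diff / (abs(target_value) + eps)
    return total


def feature_similarity(generated, target):
    """Map the penalty to (0, 1]: identical features give exactly 1.0.

    Assumption: this 1 / (1 + penalty) mapping is illustrative only.
    """
    return 1.0 / (1.0 + feature_penalty(generated, target))
```

For example, with hypothetical features `{"rms_energy": 0.1, "spectral_centroid": 1800.0}` for the target, a generated clip with identical values scores 1.0, and the score falls toward 0 as the features move away from the target.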

The last challenge was weighting the score while keeping it flexible enough to explore the complex text-to-speech style space. Using a weighted harmonic mean allowed backsliding on some metrics in exchange for significant improvement in others, which reduced stagnation and worked well enough for the random walk to make progress.
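The combination step and the walk itself can be sketched as below. This is an illustration under stated assumptions, not the repo's implementation: the weights, the Gaussian step, the greedy accept rule, and the names `weighted_harmonic_mean` and `random_walk` are all hypothetical, and a plain list of floats stands in for a Kokoro style tensor.

```python
import random


def weighted_harmonic_mean(scores, weights, eps=1e-8):
    """Weighted harmonic mean of named scores.

    Scores are clamped to a small positive value, since the harmonic mean
    is only defined for positive inputs.
    """
    total_weight = sum(weights[name] for name in scores)
    denom = sum(weights[name] / max(scores[name], eps) for name in scores)
    return total_weight / denom


def random_walk(start, score_fn, steps=200, step_size=0.05, seed=0):
    """Greedy random walk: perturb the vector, keep the change if it scores higher."""
    rng = random.Random(seed)
    best = list(start)
    best_score = score_fn(best)
    for _ in range(steps):
        # Propose a small Gaussian perturbation of every component.
        candidate = [v + rng.gauss(0.0, step_size) for v in best]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

With equal weights, scores of 0.90 and 0.70 combine to 0.7875, while 0.85 and 0.85 combine to 0.85. A step that gives up a little on the first metric to gain a lot on the second is therefore accepted, which is the backsliding behavior described above.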

The results are fairly good. I would say it ends up in the uncanny valley of similarity rather than producing a proper clone of the target voice. The result sounds like it might be the target voice, and the walk does well enough to raise the similarity score from about 70% to around 90%. There are probably limits in Kokoro's architecture on how close it can get to other voices, but there is probably more progress to be made using a more advanced genetic algorithm.

Check out the code, make some new voices, and let me know if you have any ideas on ways to improve.
