Show HN: KVoiceWalk – Voice cloning for Kokoro TTS using random walk algorithms

https://github.com/RobViren/kvoicewalk
2•robviren•5h ago
I was blown away by Kokoro and how much it manages to do in so little space. I became curious whether new voices could be created by directly manipulating the style tensors. After many failed attempts, I finally landed on a method for scoring the similarity of two audio segments that works well enough to random walk toward similar voices for Kokoro. I plan to use this scoring as part of a genetic algorithm, but wanted to baseline test it with this code first.
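
The random walk idea can be sketched in a few lines. This is a minimal illustration, not the code in the repository: a plain Python list stands in for a Kokoro style tensor, `toy_score` and `HIDDEN_TARGET` are hypothetical stand-ins for the audio similarity score, and accepting only improving steps is an assumption.

```python
import random


def random_walk(start, score_fn, steps=500, step_size=0.05, seed=0):
    """Greedy random walk: perturb the vector, keep the change only if the score improves."""
    rng = random.Random(seed)
    best = list(start)
    best_score = score_fn(best)
    for _ in range(steps):
        # Propose a nearby candidate by adding small Gaussian noise to every element.
        candidate = [x + rng.gauss(0.0, step_size) for x in best]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical stand-in for "how close does this voice sound to the target".
# A real score would synthesize audio from the style tensor and compare it to the target.
HIDDEN_TARGET = [0.3, -0.7, 0.1, 0.9]


def toy_score(vector):
    # Higher is better: negative squared distance to the hidden target.
    return -sum((v - t) ** 2 for v, t in zip(vector, HIDDEN_TARGET))


start = [0.0, 0.0, 0.0, 0.0]
voice, score = random_walk(start, toy_score)
print(f"start score {toy_score(start):.3f}, final score {score:.3f}")
```

Because a step is kept only when it scores higher, the walk can never end up worse than where it started, but it can stall once no small perturbation helps.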

The scoring mechanism uses Resemblyzer to calculate two things: similarity to the target audio, and similarity to another segment of audio that the model generates itself (self similarity). This self similarity was key to keeping the model stable and the audio consistent across inputs, but it was not enough to prevent overfitting to Resemblyzer.
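
As a rough sketch of those two scores: assume each audio clip has already been turned into a fixed-length speaker embedding (the kind of vector a speaker encoder such as Resemblyzer produces) and that embeddings are compared with cosine similarity. Both points are assumptions, and the function names and example vectors below are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_scores(target_emb, generated_emb, second_generated_emb):
    """Compare one generated clip to the target and to a second generated clip."""
    return {
        # How close the generated voice is to the voice being cloned.
        "target_similarity": cosine_similarity(generated_emb, target_emb),
        # How consistent the generated voice is with itself on different text.
        "self_similarity": cosine_similarity(generated_emb, second_generated_emb),
    }


# Made-up 3-dimensional embeddings; real speaker embeddings are much longer.
scores = similarity_scores(
    target_emb=[0.9, 0.1, 0.4],
    generated_emb=[0.8, 0.2, 0.5],
    second_generated_emb=[0.8, 0.25, 0.45],
)
print(scores)
```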

I had to create a third metric, which uses a normalized difference between a variety of audio features and the corresponding target features. Summing those gives a feature similarity metric that keeps audio quality from degrading too much and prevents overfitting.
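
One way such a metric could look, as a sketch under stated assumptions: the feature names are hypothetical, each difference is normalized by the target value, and the per-feature scores are averaged rather than summed so the result stays between 0 and 1. The actual features and normalization in the repository may differ.

```python
def feature_similarity(candidate, target, eps=1e-9):
    """Average of (1 - normalized absolute difference) over the target's features."""
    per_feature = []
    for name, target_value in target.items():
        # Relative difference: 0.0 means identical, 1.0 or more means far off.
        diff = abs(candidate[name] - target_value) / (abs(target_value) + eps)
        # Clamp so a single wildly different feature cannot push the score below 0.
        per_feature.append(max(0.0, 1.0 - diff))
    return sum(per_feature) / len(per_feature)


# Hypothetical feature values; real ones would be measured from the audio.
target_features = {"rms": 0.20, "zero_crossing_rate": 0.10, "spectral_centroid": 2000.0}
candidate_features = {"rms": 0.18, "zero_crossing_rate": 0.12, "spectral_centroid": 2100.0}

print(feature_similarity(candidate_features, target_features))
```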

The last challenge was weighting the score while keeping it flexible enough to explore the complex text-to-speech style space. Using a weighted harmonic mean allowed backsliding on some metrics in exchange for significant improvement in others, which reduced stagnation and worked well enough for the random walk to make progress.
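
For reference, a weighted harmonic mean of scores $s_i$ with weights $w_i$ is $\sum_i w_i / \sum_i (w_i / s_i)$. A minimal sketch follows; the weights and score values are made up for illustration and are not the ones used in the project.

```python
def weighted_harmonic_mean(scores, weights, eps=1e-9):
    """Weighted harmonic mean; low scores pull the result down more than high ones lift it."""
    total_weight = sum(weights)
    # Guard against division by zero when a score is exactly 0.
    denominator = sum(w / max(s, eps) for s, w in zip(scores, weights))
    return total_weight / denominator


# Order: target similarity, self similarity, feature similarity (illustrative values).
weights = [2.0, 1.0, 1.0]
balanced = weighted_harmonic_mean([0.85, 0.90, 0.80], weights)
lopsided = weighted_harmonic_mean([0.95, 0.90, 0.30], weights)
print(f"balanced: {balanced:.3f}, lopsided: {lopsided:.3f}")
```

A candidate can trade a little of one metric for a gain in another and still score higher overall, but letting any single metric collapse drags the combined score down sharply.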

The results are fairly good. I would say it ends up in the uncanny valley of similarity rather than producing a proper clone of the target voice: it sounds like it might be the target voice. Still, it does well enough to improve similarity from 70% to around 90%. There are probably limits in Kokoro's architecture on how close it can sound to other voices, but there is probably more progress to be made with a more advanced genetic algorithm.

Check out the code, make some new voices, and let me know if you have any ideas on ways to improve it.

Obsidian 1.9.0 launches with new file format

https://www.neowin.net/news/obsidian-190-launches-with-new-file-format-footnotes-view-plugin-and-more/
1•bundie•4m ago•0 comments

Verizon tries to get out of merger condition requiring it to unlock phones

https://arstechnica.com/tech-policy/2025/05/verizon-tries-to-get-out-of-merger-condition-requiring-it-to-unlock-phones/
2•coloneltcb•5m ago•0 comments

I built Zuzia.app to simplify server monitoring and task automation

https://zuzia.app/
1•gbukat•6m ago•1 comment

Mason: Scalable, Contiguous Sequencing for Building Consistent Services [pdf]

https://www.cs.princeton.edu/~wlloyd/papers/mason-jsys23.pdf
1•foota•11m ago•1 comment

Tesla's head of self-driving admits 'lagging a couple years' behind Waymo

https://electrek.co/2025/05/21/tesla-head-self-driving-admits-lagging-a-couple-years-behind-waymo/
4•NullHypothesist•11m ago•0 comments

Ask HN: What are you using to generate logos?

2•slow_turtle3•14m ago•0 comments

Google is baking Gemini AI into Chrome

https://www.pcworld.com/article/2788839/project-mariner-google-is-baking-gemini-ai-into-chrome.html
2•iio7•14m ago•0 comments

All Embedding Models Learn the Same Thing

https://twitter.com/jxmnop/status/1925224612872233081
1•MrBuddyCasino•20m ago•0 comments

An autostereogram ("Magic Eye") solver

https://huggingface.co/spaces/thearn/magiceye-solver
1•thearn4•20m ago•1 comment

Ask HN: Using LLMs for Better Design in Front End Development?

1•lukis_mx•22m ago•0 comments

Show HN: RepublishAI – WordPress AI Agents

https://republishai.com/
1•domid•22m ago•0 comments

Apple was Captured by China [video]

https://www.youtube.com/watch?v=NAj9zB4vaZc
3•ViktorRay•23m ago•0 comments

Big US cities are sinking. Which city is sinking fastest?

https://www.usatoday.com/story/news/nation/2025/05/08/big-us-cities-are-sinking-which-city-is-sinking-fastest/83492473007/
3•gmays•23m ago•0 comments

Rocky Linux 10 Will Support RISC-V

https://rockylinux.org/news/rockylinux-support-for-riscv
3•fork-bomber•24m ago•0 comments

Show HN: Manage your rentals by day (inflatables, mascot costumes)

https://rent-ejej.onrender.com
1•oscarolbe•24m ago•0 comments

Show HN: High-resolution surface analysis with Lidar data

https://github.com/r-follador/delta-relief
1•folli•25m ago•0 comments

The Long Way into Open Source

https://opensource.org/maintainers/kgodey
2•kgodey•28m ago•0 comments

Make Your Code Sound Beautiful

https://code-to-music-ow50.onrender.com
3•rk3000•29m ago•1 comment

Campaign Against Avelo Airlines over ICE Deportation Flights Sets Off Fight

https://www.nytimes.com/2025/05/16/business/deportation-flights-avelo-airlines-billboard.html
1•pavel_lishin•29m ago•0 comments

The Fastest Native Hacker News Reader Built with Rust

http://fastHNreader.com
1•coolwulf•32m ago•2 comments

GPU Price Tracker

https://www.unitedcompute.ai/
1•RolandTheDragon•33m ago•1 comment

Is Microsoft replacing coders with AI, or is a Big Tech hiring boom coming?

https://aboard.com/ms-blurred-microsofts-ai-path-gets-slippery/
1•gbseventeen3331•35m ago•0 comments

Lambda the Ultimate AI Agent

https://www.boundaryml.com/blog/lambda-the-ultimate-ai-agent
3•aaronvg•36m ago•0 comments

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization

https://arxiv.org/abs/2505.14633
1•badmonster•37m ago•0 comments

China to donate $500M to WHO, stepping into gap left by U.S.

https://www.washingtonpost.com/world/2025/05/21/china-who-donation-500-million/
5•buuu•37m ago•0 comments

Microsoft's AI Vision: An Open Internet Made for Agents

https://every.to/chain-of-thought/microsoft-s-ai-vision-an-open-internet-made-for-agents
1•rbanffy•42m ago•0 comments

Show HN: I Built a Simple Prompt Manager to Organize AI Prompts with Windsurf

https://prompt-manager.com
1•gduale•43m ago•0 comments

Ask HN: Why did the godfather of AI retire given that AI can help you code?

1•amichail•43m ago•0 comments

Monitoring Claude Code with OTel / Datadog

https://ma.rtin.so/posts/monitoring-claude-code-with-datadog/
1•martin_•43m ago•1 comment

Show HN: Confidential computing for high-assurance RISC-V embedded systems

https://github.com/IBM/ACE-RISCV
13•mrnoone•44m ago•0 comments