I’m the creator of VoGen [https://vogen.app].
I’ve always been fascinated by how far AI voice technology has come, but I found that most existing tools capable of expressing emotion are gated behind expensive subscriptions. I built VoGen to explore how we can make AI voices more "human" and accessible.
What it does:
Voice Cloning: You can clone a voice using a 3-60 second sample. It works best with a clean, single-speaker recording.
Emotional TTS: Instead of a flat tone, you can choose from emotions such as Happy, Angry, and Sad.
Bilingual Support: It currently supports both English and Mandarin Chinese.
Privacy-First Tools: I also added a browser-based audio speed changer that processes files locally—no audio data ever leaves your machine for that specific tool.
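For the curious, here is a simplified sketch of how a fully client-side speed changer can work with the Web Audio API. It's illustrative rather than the exact production code, and the changeSpeed name is just a placeholder:

    // Simplified sketch of a client-side speed changer using the Web Audio API.
    // The file is decoded and re-rendered entirely in the browser; nothing is
    // uploaded. Note: changing playbackRate also shifts pitch (tape-style);
    // pitch-preserving stretch needs extra DSP on top of this.
    async function changeSpeed(file: File, rate: number): Promise<AudioBuffer> {
      const ctx = new AudioContext();
      const original = await ctx.decodeAudioData(await file.arrayBuffer());

      // Output length scales inversely with the playback rate.
      const offline = new OfflineAudioContext(
        original.numberOfChannels,
        Math.ceil(original.length / rate),
        original.sampleRate
      );

      const source = offline.createBufferSource();
      source.buffer = original;
      source.playbackRate.value = rate; // e.g. 1.5 = 50% faster
      source.connect(offline.destination);
      source.start();

      return offline.startRendering(); // rendered AudioBuffer stays on-device
    }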
The Tech Stack: The frontend is built with React.js, and it's deployed on Vercel. For the voice engine, I'm using a customized pipeline that focuses on low-latency inference while maintaining high fidelity.
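To give a sense of why low-latency inference matters end to end, here is a rough, generic illustration of how a browser client can start playing audio while it is still being generated, so fast time-to-first-chunk on the server turns into fast perceived response in the UI. The URL and the audio/mpeg container below are placeholders for illustration, not VoGen's actual API:

    // Rough illustration: begin playback as soon as the first audio chunks
    // arrive instead of waiting for the whole file. The endpoint URL and
    // "audio/mpeg" MIME type are placeholders, not VoGen's real API.
    async function playStreaming(url: string): Promise<void> {
      const audio = new Audio();
      const mediaSource = new MediaSource();
      audio.src = URL.createObjectURL(mediaSource);
      void audio.play(); // starts once enough data is buffered

      await new Promise<void>((resolve) =>
        mediaSource.addEventListener("sourceopen", () => resolve(), { once: true })
      );
      const buffer = mediaSource.addSourceBuffer("audio/mpeg");

      const reader = (await fetch(url)).body!.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer.appendBuffer(value);
        // Wait for the source buffer to finish ingesting before appending more.
        await new Promise((r) => buffer.addEventListener("updateend", r, { once: true }));
      }
      mediaSource.endOfStream();
    }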
Why is it free? Right now, VoGen is in its early stages (MVP). I want to see how people use it and what kind of voice quality the community expects before even thinking about monetization.
Privacy Note: I know how sensitive voice data is. Your uploaded cloning samples are not used to train the base models.
I’d love to get some feedback from the HN community. Whether it’s about the latency, the naturalness of the emotions, or the UI/UX—I’m all ears.
What features would make this more useful for your workflow?