I'm Praney, a solo developer. I'm partially dyslexic, so since high school I've been converting articles and papers to audio to actually absorb them. That habit became a personal tool, and over the last year that tool became this.
Vois is a desktop voice AI studio. Everything runs on your machine. No cloud calls, no data uploaded, no per-character billing.
Three voice engines in one app: a fast one (English, 6x real-time on Apple Silicon), an expressive one for character work, and a multilingual one covering 23 languages. 63 voices across 15 character archetypes, voice cloning from a 15-second sample, and voice design where you describe the voice you want in plain text and get something in the right ballpark.
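For anyone unfamiliar with the term, "6x real-time" means the engine produces six seconds of audio per second of wall-clock time. A minimal sketch of that arithmetic follows; the function name and the 30-minute example are illustrative assumptions, not part of Vois.

```python
def generation_seconds(audio_seconds: float, speed_factor: float = 6.0) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech
    when the engine runs at `speed_factor` times real-time.

    `speed_factor=6.0` mirrors the 6x figure quoted for Apple Silicon;
    actual throughput will vary with hardware and text.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# Hypothetical example: a 30-minute chapter at 6x real-time.
print(generation_seconds(30 * 60))  # 300.0 seconds, i.e. 5 minutes
```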
The script editor supports multi-speaker dialogue, so game developers can prototype NPC conversations without burning through credits on lines that'll change in the next revision. The timeline is multi-track for mixing and arranging. Mastering covers loudness normalization, de-esser, EQ, and limiter. Export to WAV, MP3, FLAC, or AAC.
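As a rough illustration of what the normalization and limiter stages of a mastering chain do, here is a simplified sketch. It is not Vois's implementation: it assumes plain RMS level instead of a proper LUFS measurement (which needs K-weighting and gating per ITU-R BS.1770), and it uses a hard clip where a real limiter would apply smooth gain reduction. The function names, the -16 dBFS target, and the 0.98 ceiling are all assumptions chosen for the example.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples (range -1.0..1.0) in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def normalize_and_limit(samples, target_dbfs=-16.0, ceiling=0.98):
    """Apply one static gain so the RMS level hits `target_dbfs`,
    then clamp any peak that exceeds `ceiling` (a crude stand-in for a limiter)."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence: nothing to normalize
    gain = 10 ** ((target_dbfs - current) / 20)
    return [max(-ceiling, min(ceiling, s * gain)) for s in samples]


# One second of a quiet 440 Hz tone at 16 kHz, raised to the target level.
tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
mastered = normalize_and_limit(tone)
print(round(rms_dbfs(tone), 1), "->", round(rms_dbfs(mastered), 1))
```

The order matters: gain is applied first so the limiter only has to catch the peaks that the gain pushes over the ceiling.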
Content creators use it for podcast narration and audiobooks. App developers are building it into accessibility tools and AI companion apps. L&D teams use it to keep e-learning voiceovers updated without paying per character on every edit. For me personally, the Listen mode — converting whatever I'm reading into audio — is still the feature I use most.
The build:
Tauri 2, Rust backend, React/TypeScript frontend. GPU acceleration on Windows landed just today in v1.2.1.
No token counting: generation is unlimited and runs locally.
Pricing: $29/month or $14/month billed annually ($168/year). Free tier: all voices and all engines, 10 generations a day, no card required.
Happy to go deeper on any of the engineering decisions.