ulan_kg•1h ago
We've been tinkering with TTS models for a while, and I'm excited to share KaniTTS – an open-source text-to-speech system we built at NineNineSix.ai. It's designed for speed and quality, hitting real-time generation on consumer GPUs while sounding natural and expressive.
Quick overview:
Architecture: Two-stage pipeline – a LiquidAI LFM2-350M backbone generates compact semantic/acoustic tokens from text (handling prosody, punctuation, etc.), then NVIDIA's NanoCodec synthesizes them into 22kHz waveforms. Trained on ~50k hours of data.
Performance: On an RTX 5080, it generates 15s of audio in ~1s with only 2GB VRAM.
Languages: English-focused, but tokenizer supports Arabic, Chinese, French, German, Japanese, Korean, Spanish (fine-tune for better non-English prosody).
Use cases: Conversational AI, edge devices, accessibility, or research. Batch up to 16 texts for high throughput.
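To make the throughput claims above concrete, here is a small back-of-the-envelope sketch. The chunking helper and constants are my own illustration (not code from the KaniTTS repo); the numbers are just the figures quoted above: batches of up to 16 texts, and ~15 s of audio generated in ~1 s.

```python
from typing import Iterator, List

MAX_BATCH = 16  # stated maximum batch size for high-throughput inference

def batches(texts: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Split a list of input texts into consecutive batches of at most `size`."""
    for i in range(0, len(texts), size):
        yield texts[i:i + size]

# Real-time factor implied by the quoted benchmark: 15 s of audio in ~1 s of compute.
AUDIO_SECONDS, GEN_SECONDS = 15.0, 1.0
rtf = AUDIO_SECONDS / GEN_SECONDS  # ~15x faster than real time on an RTX 5080

texts = [f"utterance {n}" for n in range(40)]
groups = list(batches(texts))
print(len(groups), [len(g) for g in groups], rtf)  # 3 [16, 16, 8] 15.0
```

Actual batch throughput will depend on text length and hardware; treat the 15x figure as the single-stream number from the post, not a guarantee under batching.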
It's Apache 2.0 licensed, so fork away.
Check the audio comparisons on the page – it holds up well against ElevenLabs or Cartesia.
Repo: https://github.com/nineninesix-ai/kani-tts
Model: https://huggingface.co/nineninesix/kani-tts-450m-0.1-pt
Page: https://www.nineninesix.ai/n/kani-tts
Feedback welcome – what's your go-to TTS setup?

homarp•1h ago
There are lots of TTS models out there; what I care about most is how easily I can actually use one – for example, can I clone my voice?

ulan_kg•1h ago
Most TTS models are either small and too robotic, or big and slow. We try to hit the sweet spot: near-human quality (MOS 4.3/5) while running fast on consumer GPUs like an RTX 5080.
For voice cloning, it's better to fine-tune the base model if you're chasing top quality. What's your use case?