Show HN: Three new Kitten TTS models – smallest less than 25MB

https://github.com/KittenML/KittenTTS
88•rohan_joshi•1h ago
Kitten TTS (https://github.com/KittenML/KittenTTS) is an open-source series of tiny and expressive text-to-speech models for on-device applications. We had a thread last year here: https://news.ycombinator.com/item?id=44807868.

Today we're releasing three new models with 80M, 40M and 14M parameters.

The largest model (80M) has the highest quality. The 14M variant reaches a new SOTA in expressivity among similarly sized models, despite being <25MB in size. This release is a major upgrade from the previous one and supports English text-to-speech applications in eight voices: four male and four female.

Here's a short demo: https://www.youtube.com/watch?v=ge3u5qblqZA.

Most models are quantized to int8 + fp16 and use ONNX as the runtime. Our models are designed to run anywhere, e.g., Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release aims to bridge the gap between on-device and cloud models for TTS applications. A multilingual model release is coming soon.
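As a rough sanity check on the sizes quoted above, here is a back-of-the-envelope sketch in Python. It assumes weight storage dominates the file size and ignores ONNX graph overhead; which layers are int8 and which are fp16 is not stated in the post, so the function and the two uniform-precision bounds are illustrative assumptions only.

```python
# Back-of-the-envelope weight sizes for the three released parameter counts.
# Assumption: file size is roughly (parameter count) x (bytes per parameter);
# the real int8/fp16 split per layer is not given in the post.

def estimated_size_mb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


PARAM_COUNTS = {"80M": 80_000_000, "40M": 40_000_000, "14M": 14_000_000}

for name, n in PARAM_COUNTS.items():
    int8_mb = estimated_size_mb(n, 1)  # every weight stored as int8
    fp16_mb = estimated_size_mb(n, 2)  # every weight stored as fp16
    print(f"{name}: ~{int8_mb:.0f} MB at int8, ~{fp16_mb:.0f} MB at fp16")
```

Under these assumptions the 14M model falls between about 14 MB (all int8) and 28 MB (all fp16), which is consistent with a mixed int8 + fp16 file coming in under 25MB.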

On-device AI is bottlenecked by one thing: a lack of tiny models that actually perform. Our goal is to open-source more models to run production-ready voice agents and apps entirely on-device.

We would love your feedback!

Comments

great_psy•1h ago
Thanks for working on this!

Is there any way to get those running on iPhone? I would love to have the ability for it to read articles to me like a podcast.

rohan_joshi•57m ago
Yes, we're releasing an official mobile SDK and inference engine very soon. If you want to use something until then, some folks from the OSS community have built ways to run Kitten on iOS. If you search "kittentts ios" on GitHub you should find a few. If you can't find one, feel free to ping me and I can help you set it up. Thanks a lot for your support and feedback!
ilaksh•1h ago
Thanks for open sourcing this.

Is there any way to do a custom voice as a DIY? Or do we need to go through you? If so, would you consider making a pricing page for purchasing a license/alternative voice? All but one of the voices are unusable in a business context.

rohan_joshi•1h ago
Thanks a lot for the feedback. Yes, we're working on a DIY way to add custom voices and will also be releasing a model with more professional voices in the next 2-3 weeks. As of now, we're providing commercial support for custom voices, languages and deployment through the support form on our GitHub. Can you share more about your business use case? If possible, I'd like to ensure the next release can serve that.
Tacite•1h ago
Is it English only?
rohan_joshi•55m ago
As of now it's English only. Training for the multilingual model is underway and it should be out in April! What languages are you most interested in? Right now, we are providing deployments for custom languages + voices through the support form on the GitHub.
altruios•1h ago
One of the core features I look for is expressive control.

Either through the API, via pitch/speed/volume controls, for more deterministic control.

Or through expressive tags such as [coughs], [urgently], or [laughs in melodic ascending and descending arpeggiated gibberish babbles].

The 25MB model is amazingly good for being 25MB. How does it handle expressive tags?

rohan_joshi•44m ago
Thank you so much. Right now, it cannot handle expressive tags. What kind of tags would be most helpful, in your view?
altruios•4m ago
To narrow it down, emotion-based tagging control would be the most helpful. Tags like [sarcastically] [happily] [joyfully] [fearfully]: so a subset of adverbs.

A stretch goal is 'arbitrary tags' from [singing] [sung to the tune of {x}] [pausing for emphasis] [slowly decreasing speed for emphasis] [emphasizing the object of this sentence] [clapping] [car crash in the distance] [laser's pew pew].

But yeah: instruction/control via [tags] is the deciding feature for me, provided prompt adherence is strong enough.

Also: a thought...

Everyone is using [] for different kinds of tags in this space, which is very simple. Maybe it makes sense to differentiate kinds of tags? I.e., [tags for modifying how text is spoken] vs {tags for creating sounds that are not specifically speech: not modifying anything, but instead acting as their own 'sound/word'}
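To make the two-kinds-of-tags idea concrete, here is a minimal Python sketch of a preprocessor that separates the two. This is purely illustrative: the `split_tags` helper and the "style"/"sound" labels are hypothetical, and per the reply above Kitten TTS does not currently handle expressive tags at all.

```python
import re
from typing import List, Tuple

# [ ... ] modifies how the following text is spoken; { ... } is a standalone
# non-speech sound. Both conventions are the proposal above, not a Kitten TTS feature.
TOKEN_RE = re.compile(r"\[([^\[\]]+)\]|\{([^{}]+)\}")


def split_tags(text: str) -> List[Tuple[str, str]]:
    """Split input into ("text" | "style" | "sound", value) segments, in order."""
    parts: List[Tuple[str, str]] = []
    pos = 0
    for match in TOKEN_RE.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:
            parts.append(("text", plain))
        if match.group(1) is not None:
            parts.append(("style", match.group(1).strip()))
        else:
            parts.append(("sound", match.group(2).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


print(split_tags("[happily] Hello there {clapping} bye"))
```

A front end like this would let a model treat style tags as conditioning for the next text span and sound tags as separate audio events, which is the distinction being suggested.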

kevin42•1h ago
What I love about OpenClaw is that I was able to send it a message on Discord with just this github URL and it started sending me voice messages using it within a few minutes. It also gave me a bunch of different benchmarks and sample audio.

I'm impressed with the quality given the size. I don't love the voices, but it's not bad. Running on an Intel 9700 CPU, it's about 1.5x realtime using the 80M model. It wasn't any faster running on a 3080 GPU though.
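For anyone wanting to reproduce a figure like "1.5x realtime", here is a small Python sketch of how that ratio is usually computed. The `synthesize` callable and the sample rate are hypothetical stand-ins for whatever TTS call you are timing; nothing here is the Kitten TTS API.

```python
import time
from typing import Callable, Sequence


def speed_vs_realtime(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute (1.5 means 1.5x realtime)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def measure(synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical synthesize(text) that returns raw samples."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return speed_vs_realtime(len(samples) / sample_rate, elapsed)


# Example: 15 seconds of audio generated in 10 seconds of wall-clock time.
print(speed_vs_realtime(15.0, 10.0))  # 1.5
```

Running the same measurement on CPU and GPU with identical text makes it easier to report a slowdown like the one described here.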

rohan_joshi•46m ago
Yeah, we'll add some more professional-sounding voices and also support for DIY custom voices. We tried to add more anime/cartoon-ish voices to showcase the expressivity.

Regarding running on the 3080 GPU, can you share more details via GitHub issues, Discord or email? It should be blazing fast on that. I'll add an example for running the model on GPU too.

ks2048•1h ago
You should put examples comparing the 4 models you released - same text spoken by each.
rohan_joshi•27m ago
Great idea, let me add this. Meanwhile, you can try the models on our Hugging Face Spaces demo here: https://huggingface.co/spaces/KittenML/KittenTTS-Demo
fwsgonzo•56m ago
How much work would it be to use the C++ ONNX runtime with this instead of Python? Is it a Claudeable amount of work?

The iOS version is Swift-based.

rohan_joshi•49m ago
Shouldn't be hard. What backend/hardware are you interested in running this with? I'll add an example for using the ONNX model from C++. By the way, check out the roadmap: our inference engine will be out in 1-2 weeks and is expected to be faster than ONNX.
ks2048•54m ago
There are a number of recent, good-quality, small TTS models.

If the author doesn't describe some detail about the data, training, or a novel architecture, etc., I can only assume they just took another one, did a little finetuning, and repackaged it as a new product.

the_duke•49m ago
Any recommendations?
wiradikusuma•16m ago
I'm thinking of giving a "voice" to my virtual pets (think Pokemon, but fewer than a dozen). The pets are made-up animals but based on real animals, like Mouseier from Mouse (something like that). Is this possible?

TL;DR: generate a human-like voice based on an animal sound. Anyway, maybe it doesn't make sense.

magicalhippo•15m ago
A lot of good small TTS models in recent times. Most seem to struggle hard on prosody though.

Kokoro TTS, for example, has a very good Norwegian voice, but the rhythm and emphasis are often so out of whack that the generated speech is almost incomprehensible.

I haven't had time to check this model out yet; how does it fare here? What's needed to improve the models in this area now that the voice part is more or less solved?

devinprater•8m ago
A lot of these models struggle with short text strings, like "next button", that screen readers are going to speak a lot.
soco•2m ago
I think I tried everything I could on my Android: 1. outside of webpage reading, there aren't many options; 2. as browser extensions, also not many (I don't like copying URLs into an app); 3. they all insist on reading every little shit, not only buttons but also "wave arrow pointing directly right", which some people use in their texts. So basically, reading text aloud comes down to a bunch of shitty options. Is anyone jumping into this market opening?
