But yeah, if it's like any of the others we'll likely see a different "model" per language down the line based on the same techniques
[0] https://old.reddit.com/r/LocalLLaMA/comments/1mhyzp7/kitten_...
It sounds only OK, but that's impressive for the size.
https://clowerweb.github.io/node_modules/onnxruntime-web/dis...
(seems reverted now)
If anyone else wants to try:
> Kitten TTS is an open-source series of tiny and expressive text-to-speech models for on-device applications. Our smallest model is less than 25 megabytes.
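For the curious, the quick start is just a pip install and a few lines of Python. This sketch follows the repo README at the time of writing; the wheel URL, model ID, voice name, and 24 kHz output rate are all taken from there and may have changed since:

    # KittenTTS quick start, per the repo README (details may have changed since).
    # pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl
    from kittentts import KittenTTS
    import soundfile as sf

    m = KittenTTS("KittenML/kitten-tts-nano-0.1")
    audio = m.generate("This high quality TTS model works without a GPU",
                       voice="expr-voice-2-f")
    sf.write("output.wav", audio, 24000)  # README lists 24 kHz output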
Doesn't seem to work with Thai.
This is an ouroboros that will continue.
(Not saying this is or isn't, simply that these claims are rampant on a huge number of posts and seem to be growing.)
Because, well, there's a huge number of models. Are they all, as they say, "in cahoots"? (working together, clandestinely)
This is a good list: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
It's one thing to observe "LLM-generated writing all looks the same". Whether the LLMs were all post-trained the same way is a different question.
I don't agree that "everyone says everything is AI". Do you have examples where a consensus of people is accusing something of being AI-generated even though it lacks those indicators?
It’s not slop — it’s inspiration!
The problem I see this leading to is plenty of legitimate writing getting thrown away because somebody's online exposure bubbles don't happen to include Medium or Tumblr or a certain Discord or whatever bubble where _huge_ groups of people actually do write in whatever $STYLE the reader and commenter is identifying as AI. And then, because of their post, other people won't even look.
It seems like a disaster, frankly.
No human comments on meta formatting like that outside the deepest trenches of Apple/FB corporate stuff.
Is that tested and proven or just gut feeling?
M1-Mac-mini ~ % ls -lah /usr/bin/say
-rwxr-xr-x 1 root wheel 193K 15 Nov 2024 /usr/bin/say
M1-Mac-mini ~ % say "hello world this is the kitten TTS model speaking"
That being said, the ‘classical’ (pre-AI) speech synthesisers are much smaller than Kitten, so you’re not wrong per se, just right for the wrong reason.
https://project64.c64.org/Software/SAM10.TXT
Obviously it's not fair to compare these with ML models.
Running `man say` reveals that "this tool uses the Speech Synthesis manager", so I'm guessing the Apple Intelligence stuff is kicking in.
For STT, Whisper is really amazing, but I'm missing a good TTS, and I don't mind throwing GPU power at it. Anyway, this isn't it either; it sounds worse than Kokoro.
This isn't for you, then. You should evaluate quality here based on the fact you don't need a GPU.
Back in the pre-Tacotron2 days, I was running slim TTS and vocoder models like GlowTTS and MelGAN on DigitalOcean droplets. No GPU to speak of. It cost next to nothing to run.
Since then, the trend has been to scale up. We need more models to scale down.
In the future we'll see small models living on-device. Embedded within toys and tools that don't need or want a network connection. Deployed with Raspberry Pi.
Edge AI will be huge for robotics, toys and consumer products, and gaming (i.e. world models).
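For a sense of how little that older GlowTTS/MelGAN-era stack needs, here's a minimal CPU-only sketch using Coqui TTS; the LJSpeech checkpoint name is an assumption based on Coqui's published model zoo, not necessarily the exact setup described above:

    # CPU-only GlowTTS synthesis via Coqui TTS (pip install TTS).
    # The checkpoint downloads once, then runs fully offline; Coqui pulls a
    # matching vocoder for it automatically.
    from TTS.api import TTS

    tts = TTS(model_name="tts_models/en/ljspeech/glow-tts", gpu=False)
    tts.tts_to_file(text="Small models run fine without a GPU.",
                    file_path="out.wav")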
Aside: are there any models for voice-to-text (speech recognition) that run fully offline, without training?
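Whisper itself fits that bill: the weights download once and inference then runs fully offline, no training needed. A minimal sketch with the openai-whisper package (the "base" size and the recording.wav input are assumptions):

    # Offline speech-to-text with openai-whisper (pip install -U openai-whisper).
    # Weights download on first use; after that everything runs locally, CPU included.
    import whisper

    model = whisper.load_model("base")          # any of tiny/base/small/medium/large
    result = model.transcribe("recording.wav")  # assumed local audio file
    print(result["text"])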
I will be very impressed when we're able to have a conversation with an AI at a natural pace, and not "prompt, pause, response".
While I think this is indeed impressive and has a specific use case (e.g. in the embedded sector), I'm not totally convinced that the quality is good enough to replace bigger models.
With fish-speech[1] and F5-TTS[2] there are at least two open-source models pushing the quality limits of offline text-to-speech. I tested F5-TTS on an old Nvidia 1660 (6GB VRAM) and it worked OK-ish, so running it on slightly more modern hardware will not cost you a fortune and will produce MUCH higher quality, with multi-language and zero-shot support.
For Android there is SherpaTTS[3], which plays pretty well with most TTS Applications.
1: https://github.com/fishaudio/fish-speech
2: https://github.com/SWivid/F5-TTS
Here is the link to our repo: https://github.com/KittenML/KittenTTS
We would appreciate a star!
Thanks
It would be great if the training data were released too!
https://github.com/KittenML/KittenTTS
This is the model and GitHub page; the blog post looks very much AI-generated.