Show HN: Inworld TTS – high-quality, affordable, and low-latency TTS

https://inworld.ai/tts
15•rogilop•3h ago
Hi HN, Igor here, one of the engineers behind this project.

High-quality voice APIs are usually either expensive, slow, or both. Cheaper and faster solutions very often lack realism. We decided to build Inworld TTS to bridge this gap.

We just released two multilingual models. Our small model, TTS-1, is on par with SOTA models in quality on the objective metrics WER, SIM, and DNSMOS. The larger model, TTS-1-Max, is even better: it produces more nuanced speech and has ~3.5% better WER, averaged across all 11 supported languages. Both models also support markup tags (e.g. prepend "[happy]" to the text to make the generation more enthusiastic).
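
As a rough illustration of the markup tags mentioned above, here is a minimal sketch of prepending a tag to the input text before sending it for synthesis. The helper name `with_style_tag` is hypothetical, and the single space between the tag and the text is an assumption; the post only says the tag is prepended.

```python
def with_style_tag(text: str, tag: str) -> str:
    """Prepend a bracketed markup tag such as [happy] to the input text.

    Hypothetical helper, not part of any official client. The space between
    the tag and the text is an assumption.
    """
    # Accept both "happy" and "[happy]" by stripping brackets and whitespace.
    tag = tag.strip().strip("[]")
    if not tag:
        # No tag requested: return the text unchanged.
        return text
    return f"[{tag}] {text}"


if __name__ == "__main__":
    print(with_style_tag("We just shipped the release!", "happy"))
```

Which tags are accepted, and whether several can be combined, is not stated in the post, so the helper does not validate the tag name.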

The models use LLaMA 1B and 8B as the SpeechLM backbones for TTS-1 and TTS-1-Max, respectively. We up-trained both models on a mixture of text and audio, then fine-tuned them on text-audio pairs and polished the final checkpoints with GRPO on a small, high-quality dataset. Our Speech Lab team (4 MLEs) started collecting audio data and exploring different audio codec architectures around late February. We were inspired by the simplicity of the single vector quantization used in the Xcodec2 neural audio codec architecture and decided to use a similar idea. We started training in early April. Once the codec was ready, training the SpeechLMs took another month and a half. We finished in mid-June. All of this was done on 32 H100 GPUs.
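
For readers unfamiliar with the idea, single vector quantization means each audio frame is mapped to exactly one token from a single codebook, which is what lets a language model treat speech as a flat token sequence. The following is a generic nearest-neighbour sketch of that idea in plain Python. It is not the Inworld codec or Xcodec2 itself; the function names and the toy codebook values are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each frame embedding to the index of its nearest codebook entry."""

    def sq_dist(a: Vector, b: Vector) -> float:
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))
        for frame in frames
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look up the codebook vector for each token (the decoder's input)."""
    return [list(codebook[t]) for t in tokens]


if __name__ == "__main__":
    # Toy 2-D codebook with three entries; real codecs learn far larger ones.
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
    tokens = quantize(frames, codebook)
    print(tokens)  # one token per frame
    print(dequantize(tokens, codebook))
```

With a single codebook there is one token per frame, so the SpeechLM can predict audio tokens the same way it predicts text tokens, without modeling several parallel codebook streams.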

To make the models real-time ready for serving, we collaborated with Modular to migrate from a vanilla vLLM solution to the Mojo-written MAX server. Our bet on keeping the serving architecture as simple as possible paid off: both models turned out to be really fast. TTS-1, which can be accessed via a streaming API, has ~500ms p90 latency for returning the first ~2 seconds of audio. The pricing is simple: $5 per 1M characters. API access to the larger model will open soon. We'll share more details about the serving performance optimizations in the coming weeks.
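
To make the pricing concrete, here is a small sketch that estimates cost from character count at the $5 per 1M characters rate quoted above. The function name is hypothetical, and it assumes every character of the input string is billed; how markup tags or whitespace are counted is not stated in the post.

```python
PRICE_PER_MILLION_CHARS_USD = 5.0  # rate quoted in the post


def estimate_cost_usd(num_chars: int) -> float:
    """Estimate synthesis cost in USD for a given number of input characters.

    Assumes every character is billed at the flat quoted rate.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    script = "Hello, and welcome back to the show."
    print(f"{len(script)} chars -> ${estimate_cost_usd(len(script)):.6f}")
    print(f"200,000 chars -> ${estimate_cost_usd(200_000):.2f}")
```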

We are also about to release all the training, modeling, and benchmarking code on GitHub to be transparent about how we built it. The repo is very flexible and can easily be adjusted to train an arbitrary neural net, but we'll release the code with a focus on speech modeling. By the way, we used PyTorch Lightning as the framework for multi-node/multi-GPU training, as it proved to be very easy to use and reliable.

--

Check the TTS out at https://inworld.ai/tts

Happy to answer any questions you have!

Comments

igh•56m ago
Thank you for sharing the details!
rogilop•1m ago
Sure! We plan to release a detailed tech report alongside the repo too. We have a lot of interesting lessons to share.
feifan123•47m ago
This is amazing! It unblocks many potential AI applications with voices.
fr25•46m ago
Interesting approach... thanks for sharing
RohanPanda99•43m ago
Kudos on the launch! The price point, along with superior quality compared to peer models, would make it a go-to solution for TTS!
kalacoffee•39m ago
TTS Playground is easy to use and impressive. Clone voice was intuitive.
cremaster_•24m ago
I've used Inworld for AI characters in the past. Are you pivoting to a TTS company?

Also, can these voices be plugged into the Unreal/Unity SDKs?

rogilop•1m ago
Not really, we aren't pivoting: TTS is part of our strategy to make great AI solutions accessible to as many developers as possible. We don't have official plugins for UE/Unity yet, but we will have something to share soon. For the moment, feel free to use it directly via the API.
jsx888•3m ago
Love it! Can't wait to try this out and cut down the costs we incur using other services.
