Ask HN: What happens when AI-voice becomes good enough?

1•boa00•1h ago
I fell into the rabbit hole of TTS models lately. Tried all major paid tools (ElevenLabs/InWorld/etc.), and all the newest open-source models.

I started asking myself: what happens when voice is "solved", i.e. when it becomes impossible to distinguish from a human? Wanted to hear your opinions!

Sketched some of my own thoughts, and I see two futures:

Future 1: the nuanced version

Audiobooks: I think established authors will still prefer human narrators. If you can afford a $3k–$4k fixed cost for narration, a good human voice is usually worth it. TTS may even push human narration prices down, making that choice easier.

But for new/self-published authors, especially in non-fiction, AI narration may become the default. The choice is often not “AI vs. human narrator,” but “AI audiobook vs. no audiobook.” There will be backlash, but I think people will partly get used to it.

The more interesting threat may be AI readers. If I can buy an ebook for $8–$10 and have it narrated in a voice/style I like for $1–$2, why pay for an AI-narrated audiobook as a separate product? This could partly unbundle audiobooks from platforms like Audible. I’m torn here: AI-narrated self-published audiobooks and AI readers may co-exist, but AI readers could eventually replace most non-human audiobook editions.

Business content: training videos, museum guides, phone systems, short ads, internal explainers, etc. will be mostly AI. Anywhere “good enough is good enough” meets budget pressure, TTS wins. It already does.

Content creation: YouTube, podcasts, TikTok, etc. are different. Among top creators, I think human narration still dominates because personality and authenticity matter. If the voice is part of the brand, TTS is counterproductive.

That said, AI narration will explode in low-effort content. As generative text/video tools create more slop, most of that slop will probably have AI narration. So maybe the ratio of human vs. TTS voices on social media becomes 1:10 by volume, but 10:1 by total viewership in favor of human voices.
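
To make the 1:10 vs. 10:1 guess concrete: taken together, the two ratios imply that an average human-voiced item would draw roughly 100 times the views of an average AI-voiced one. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the inputs are just the guessed ratios from this post, not measured data.

```python
from fractions import Fraction


def per_item_view_ratio(count_human, count_tts, views_human, views_tts):
    """Average views per human-voiced item divided by average views per TTS item."""
    avg_human = Fraction(views_human, count_human)
    avg_tts = Fraction(views_tts, count_tts)
    return avg_human / avg_tts


# Guessed ratios: 1:10 (human:TTS) by volume, 10:1 (human:TTS) by total viewership.
ratio = per_item_view_ratio(count_human=1, count_tts=10, views_human=10, views_tts=1)
print(ratio)  # 100
```

Only the ratios matter here, so any counts and view totals in the same proportions give the same result.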

Dubbing/translations: heavily AI-dominated, except for high-end creative work like major films or books.

Films: only humans for now, but it could change. I can easily see generative AI technology going far enough that films of Hollywood quality are fully produced with AI. It would involve a new type of “producer,” someone who could manipulate generative AI and mold it into something beautiful, and it would require a new set of tools. Essentially, there would be many, many Pixar-style studios focused on ultra-realistic video with relatively small budgets. For such cases, AI narration would be used, and eventually it could eat almost the whole industry.

Games: TTS seems especially strong here: many distinct voices, short lines, lots of minor characters, and poor economics for hiring actors for everything. I think studios will still use humans for main characters, but many NPCs and indie-game voices will become AI.

Future 2: the hardline version

Anything outside of personal-brand stuff would be AI-generated. If it gets cheap and good enough, and society accepts it, everything from books to films and ads would be AI-narrated.

Human narration would evolve as a profession: you would “sell” the rights to have your voice AI-generated.

A new profession of AI sound engineers will emerge, who will use AI to get creative with voice design and voice orchestration to get the best results.

I also feel that voice is quite different from text or image generation, in that the uncanny valley is weaker. In 95% of cases, voice is just a tool for correctly conveying creatively written text (hopefully written by a human). And for tools, it is mostly a question of getting good enough.

It is also possible that the two futures are not either/or: the first describes the next 10 years, and the second comes some time after that.

Comments

kvasserman•1h ago
I think of it this way. LLMs are supposed to be good at generating text/writing, right? Well, they are not very good at it. They generate plausible content that superficially makes sense. Most people can easily tell AI-generated slop from human writing. I suspect that mimicking the human voice is multiple levels more difficult for LLMs than mimicking human content. The level of nuance that humans produce in their speech is probably staggering. So I may be completely wrong, but I see no evidence so far to support the idea that either LLMs' writing or speaking is going to get much better any time soon.

ben_w•1h ago
Perhaps, but for what it's worth, when I first heard OpenAI's TTS demo, I assumed they were faking it and a human was speaking because it had "um"s and "err"s.

Right now, the main thing making these things recognisable is that there are so few voices. The voices themselves are basically celebrities, albeit in the same way as some annoying D-list celebrity who somehow managed to get a bajillion contracts for advertising cheap tat.

Given that LLM slop is currently rapidly degrading the trustworthiness of search results (even more so than SEO already had), it's probably for the best if the major AI providers don't release a bunch more voices.

boa00•47m ago
Not sure I agree here.

Text is just human thoughts in their most simple form. Writing is about expressing ideas, and there is an almost infinite number of ways to express them. It's an extremely difficult task, and LLMs only "imitate" it to the best of their training.

This is not at all true for voice. There are an infinite number of possible voices, but a finite number of tones and phonemes you can use to express the text.

It's a much easier technical problem; it's just that it's much harder to gather proper data (you cannot just scrape Reddit and hope for the best, as LLMs do). And voice gets like 1/100th of LLMs' funding.

damnesian•57m ago
I wonder whether, when it truly becomes indistinguishable from reality, people won't increasingly seek direct experiences with fellow humans. We're already experiencing this as a family. AI is such a strange mental rabbit hole; we're suffering from "tailored for you" fatigue. When you just want some objective answers, what pleases you most is NOT useful, and at this point in the curve, you have to work harder to get LLMs to give you what you need rather than what they think will engage you more. My adult kids have started gathering to play board games and hang out in person, whereas three years ago they'd have been content to play online games together. We're hitting that threshold, right now, where our biology is pushing back.

I don't think the future as painted for us presently is as guaranteed as those who would profit from it would like you to think.

Jblx2•52m ago
Dystopian Future 3: Elderly people getting scammed out of their life savings by scammers on the phone who sound indistinguishable from their grandchildren. (The ones whose grandchildren had their voices scraped from TikToks.)
