Streaming Speech Synthesis Without the Trade-Offs: Meet StreamFlow

https://arxiv.org/abs/2506.23986
3•PranayBatta•1mo ago

Comments

PranayBatta•1mo ago
TL;DR: Diffusion-based TTS models sound amazing but break down for real-time streaming because they require full-sequence attention. StreamFlow introduces a block-wise guided attention scheme that lets diffusion transformers generate speech chunk by chunk with near-SOTA quality and predictable low latency.

Why this matters: Current diffusion speech models need to see the entire audio sequence, which makes them too slow and memory-heavy for assistants, agents, or anything that needs instant voice responses. Causal masking tends to sound robotic, and naive chunking leaves audible seams at chunk boundaries. Streaming TTS has been stuck with a quality-latency tradeoff.

The idea: StreamFlow restricts attention using sliding windows over blocks:

Each block can see W_b past blocks and W_f future blocks

Compute becomes roughly O(B × W × N) instead of full O(N²)

Prosody stays smooth, latency stays constant, and boundaries disappear with small overlaps + cross-fades
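To make the windowing concrete, here is a minimal sketch of a block-wise attention mask. It is not taken from the paper: the function names, the frame-level granularity, and the reading of the cost estimate (N frames, each attending to at most W = W_b + W_f + 1 blocks of B frames) are my assumptions.

```python
def blockwise_mask(n_frames, block_size, w_past, w_future):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumption (not from the paper): frames are grouped into consecutive
    blocks of `block_size`, and a frame sees its own block plus `w_past`
    earlier blocks and `w_future` later blocks.
    """
    block_id = [i // block_size for i in range(n_frames)]
    return [
        [-w_past <= block_id[j] - block_id[i] <= w_future for j in range(n_frames)]
        for i in range(n_frames)
    ]


def attended_pairs(mask):
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Hypothetical toy sizes: 8 frames, blocks of 2, one past and one future block.
mask = blockwise_mask(n_frames=8, block_size=2, w_past=1, w_future=1)
full = 8 * 8                          # pairs under full attention
windowed = attended_pairs(mask)       # pairs under the block-wise window
per_frame_cap = (1 + 1 + 1) * 2       # W blocks times B frames per block
```

Under this reading, no frame attends to more than `per_frame_cap` keys regardless of sequence length, which is why the total grows linearly in N rather than quadratically.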

How it works: The system is still a Diffusion Transformer, but trained in two phases:

Full-attention pretraining for global quality

Block-wise fine-tuning to adapt to streaming constraints

The model generates mel-spectrograms, and a BigVGAN vocoder runs in parallel to turn them into audio.

Performance:

~180 ms first-packet latency (80 ms model + 60 ms vocoder + 40 ms overhead)

No latency growth with longer speech

MOS tests show near-indistinguishable quality vs non-streaming diffusion

Speaker similarity within ~2%, prosody continuity preserved

Key ablation takeaways:

Past context helps until ~3 blocks; more adds little

Even a tiny future window greatly boosts naturalness

Best results: 0.4–0.6s block size, ~10–20% overlap
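For the overlap + cross-fade part, a rough sketch of how adjacent chunks could be stitched is below. The paper's actual blending is not described in this summary, so the linear fade, the function name, and the choice to work on a plain 1-D list of values are all assumptions made for illustration.

```python
def crossfade_concat(chunks, overlap):
    """Join chunks, linearly blending each boundary over `overlap` values.

    Assumptions (hypothetical, not from the paper): the last `overlap`
    values of one chunk cover the same time span as the first `overlap`
    values of the next, and every chunk is longer than `overlap`.
    """
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        start = len(out) - overlap          # where the shared region begins
        for k in range(overlap):
            w = (k + 1) / (overlap + 1)     # weight of the incoming chunk
            out[start + k] = (1 - w) * out[start + k] + w * chunk[k]
        out.extend(chunk[overlap:])         # append the non-overlapping rest
    return out


# Hypothetical toy data: two chunks of 4 values sharing 2 values.
stitched = crossfade_concat([[0.0] * 4, [3.0] * 4], overlap=2)
```

With a 0.5 s block and 20% overlap, the shared region would be about 0.1 s; how many frames or samples that corresponds to depends on the frame rate or sample rate, which this summary does not state.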

Comparison:

Autoregressive TTS → streaming but meh quality

GAN TTS → fast but inconsistent

Causal diffusion → real-time but degraded

StreamFlow → streaming + near-SOTA quality

Bigger picture: Smart attention shaping lets diffusion models work in real time without throwing away global quality. The same technique could apply to streaming music generation, translation, or interactive media.
