frontpage.

Diffusion LLM may make most of the AI engineering stack obsolete

3•victorpiles99•1h ago
I've been deep-diving into diffusion language models this week and I think this is the most underrated direction in AI right now.

The core issue with autoregressive LLMs:

Every major model today (GPT, Claude, Gemini) generates one token at a time, left to right. Each token depends on the previous one. This single architectural constraint has shaped the entire AI industry:

- Models can't revise what they already wrote → we build chain-of-thought, reflection, and multi-pass reasoning to force them to "think before committing"
- One forward pass per token → we invest heavily in speculative decoding, KV caches, and quantization to make generation tolerable
- Can't edit mid-output → we build agent frameworks with retry loops, tool calls, and planning layers to work around it
- Can't generate in parallel → we build orchestration systems that chain multiple slow calls together

Most of what we call "AI engineering" today is patching around one thing: the model can't look back.

Diffusion LMs flip the paradigm: start with a canvas of masked tokens and iteratively refine the entire output in parallel. Every position is updated simultaneously, and the model sees and edits all of its output at every step. Same principle as image diffusion (Stable Diffusion, DALL-E), applied to text.
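To make the control flow concrete, here's a toy sketch of that masked-canvas loop. The "denoiser" is a stand-in I wrote for illustration (it just peeks at a fixed target string so the code runs); a real diffusion LM would predict a token and a confidence for every position at once.

```python
MASK = "?"

def toy_denoiser(canvas, target):
    """Stand-in for a diffusion LM. A real model would predict a token
    and a confidence for every position in parallel; this toy peeks at
    a fixed target string so the loop is runnable."""
    return [(i, ch) for i, ch in enumerate(target) if canvas[i] == MASK]

def generate(target, steps=4):
    canvas = [MASK] * len(target)       # fixed-size canvas, pre-allocated
    per_step = max(1, len(target) // steps)
    while MASK in canvas:
        proposals = toy_denoiser(canvas, target)
        # Commit a batch of masked positions each step; a real model
        # would rank positions by predicted confidence.
        for i, ch in proposals[:per_step]:
            canvas[i] = ch
    return "".join(canvas)

print(generate("hello world"))  # canvas fills over a few parallel steps
```

Note the two properties the post leans on: the whole output exists (as masks) from step one, and every step touches many positions at once instead of appending a single token.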

Why I think the theory actually holds:

1. Parallelism is real, not theoretical. Inception Labs' Mercury 2 (closed-source, diffusion-based) already hits ~1000 tok/s with quality competitive with GPT-4o mini on MMLU, HumanEval, and MATH. That's not a benchmark trick; it's a direct consequence of not being bottlenecked by sequential generation.

2. The complexity reduction is massive. If a model can see and edit its entire output at once, you don't need half the scaffolding we've built: reflection prompting becomes native (the model already iterates on its own output), retry loops become unnecessary (edit in place), planning agents get simpler (the model can restructure, not just append). The whole stack flattens.

3. The conversion path exists. You can take an existing pretrained AR model and convert it to diffusion via fine-tuning alone, with no pretraining from scratch. This means the billions already invested in AR pretraining aren't wasted. It's an upgrade path, not a restart.

The main limitation today: fixed output length. You must pre-allocate the canvas size before generation starts. Block Diffusion (generating in sequential chunks, diffusing within each chunk) is one workaround. Hierarchical generation (outline first, then expand sections in parallel) is another. Ironically, orchestrating that requires an agent, so diffusion doesn't kill agents; it changes what they do.
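The Block Diffusion workaround can be sketched the same way: an outer autoregressive loop over fixed-size blocks, with a small parallel-refinement loop inside each block. Everything below is a hypothetical toy of my own (including `DEMO_TEXT` and the halve-the-masks schedule); a real model would re-predict each block conditioned on the finished prefix.

```python
MASK = "?"
DEMO_TEXT = "block diffusion demo"  # stands in for what a real model would predict

def refine_block(prefix, block):
    """One toy denoising step over a block: commit about half of the
    remaining masked positions. A real model would re-predict the whole
    block in parallel, conditioned on the completed `prefix`."""
    target = DEMO_TEXT[len(prefix):len(prefix) + len(block)]
    out = list(block)
    masked = [i for i, ch in enumerate(out) if ch == MASK]
    for i in masked[: max(1, len(masked) // 2)]:
        out[i] = target[i]
    return out

def generate(length, block_size=5):
    prefix = ""
    while len(prefix) < length:
        block = [MASK] * min(block_size, length - len(prefix))
        while MASK in block:        # inner diffusion loop within the block
            block = refine_block(prefix, block)
        prefix += "".join(block)    # blocks themselves are sequential
    return prefix

print(generate(len(DEMO_TEXT)))
```

The trade-off is visible in the structure: the inner loop keeps diffusion's parallelism, while the outer loop reintroduces sequential dependence, which is exactly why total length no longer has to be fixed up front.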

Honest take: Open diffusion LMs still trail top AR models on knowledge and reasoning at comparable scale. But Mercury 2 shows the ceiling is high, the conversion results are surprisingly good, and the architecture eliminates entire categories of engineering complexity. I think within a year we'll see diffusion models competitive with frontier AR models, and when that happens, a lot of the current tooling (agent frameworks, prompt engineering techniques, inference optimization stacks) gets dramatically simpler or unnecessary.

While researching all this I found dLLM, an open-source library that unifies training, inference, and evaluation for diffusion LMs. It has recipes for LLaDA, Dream, Block Diffusion, and converting any AR model to diffusion. Good starting point if you want to experiment.

Paper: https://arxiv.org/abs/2602.22661

Code: https://github.com/ZHZisZZ/dllm

Models: https://huggingface.co/dllm-hub

Infrastructure orchestration is an agent skill

https://dstack.ai/blog/agentic-orchestration/
1•latchkey•1m ago•0 comments

XonY.org – the structured opinions of public figures

https://xony.org/
1•itshywu•1m ago•0 comments

Nvidia Nemotron 3 Super Delivers 5x Higher Throughput

https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/
1•buildbot•1m ago•0 comments

MacBook Neo review: Fresh-squeezed laptop

https://sixcolors.com/post/2026/03/macbook-neo-review/
1•tosh•2m ago•0 comments

M5 MacBook Air Review: Not just more of the same–the same, but more

https://sixcolors.com/post/2026/03/m5-macbook-air-review-not-just-more-of-the-same-the-same-but-m...
1•tosh•2m ago•0 comments

Nvidia Nemotron 3 Super

https://research.nvidia.com/labs/nemotron/Nemotron-3-Super/
2•vinhnx•5m ago•0 comments

Enable Code-Mode for all your MCP servers even if they don't support it natively

https://github.com/aakashh242/remote-mcp-adapter
1•aakashh242•6m ago•0 comments

Show HN: Kronos – A calendar-style scheduler for AI Agents agent runs

https://github.com/Reqeique/Kronos
1•Reqeique•7m ago•0 comments

GitHub Accounts Compromised

https://opensourcemalware.com/blog/polinrider-attack
2•6mile•7m ago•1 comments

Show HN: KnowledgeWorker – A Corporate Productivity Simulator

https://knowledgeworker.alexmeub.com/
1•meub•7m ago•0 comments

Show HN: JD Roast – Paste a job description, get it brutally roasted

https://jd-roast.openjobs-ai.com/
1•genedai•7m ago•0 comments

WireGuardClient is Transport Encryption not a VPN

https://github.com/proxylity/wg-client
1•mlhpdx•8m ago•0 comments

Built an Intelligence Platform to Map the "PizzaGate.Online" Scandal

https://pizzagate.online/
1•whistleblowhy•8m ago•1 comments

How the UK government's new digital ID will work

https://takes.jamesomalley.co.uk/p/how-the-uk-digital-id-will-work
1•dgroshev•9m ago•0 comments

AMD Ryzen AI NPUs Are Finally Useful Under Linux for Running LLMs

https://www.phoronix.com/news/AMD-Ryzen-AI-NPUs-Linux-LLMs
2•mikece•10m ago•0 comments

Show HN: Debrief CLI, local CLI to turn Git pushes into product updates

https://github.com/trydebrief/debrief-cli
1•baetylus•10m ago•0 comments

The App Store Accountability Act

https://proton.me/blog/app-store-accountability-act
1•mikece•11m ago•0 comments

Don't lick that cold metal pole in winter–if you do, don't panic

https://arstechnica.com/science/2026/03/exploring-the-science-of-tundra-tongue/
1•canucker2016•12m ago•0 comments

Turnstone: Multi-node AI orchestration platform

https://github.com/turnstonelabs/turnstone/
1•huslage•14m ago•0 comments

Revolut secures full UK banking licence after four-year wait

https://www.ft.com/content/b4df4126-351e-4424-9707-8a12ca6b79a6
1•0xFA11•14m ago•0 comments

Replit Agent 4

https://replit.com/agent4
2•colesantiago•14m ago•1 comments

Solution to the Sleuth puzzle made by Julian Assange

https://wondrousnet.blogspot.com/2023/05/solution-to-puzzle-sleuth.html
1•morethenthis•15m ago•0 comments

10x Is the New Floor

https://writing.nikunjk.com/p/10x-is-the-new-floor
1•vinhnx•16m ago•0 comments

BOE Open to Changing Stablecoin Caps After Industry Backlash

https://www.bloomberg.com/news/articles/2026-03-11/boe-open-to-changing-stablecoin-cap-after-indu...
1•petethomas•16m ago•0 comments

Launch HN: Sentrial (YC W26) – Catch AI Agent Failures Before Your Users Do

https://www.sentrial.com/
4•anayrshukla•16m ago•2 comments

Binance brings back tokenized stocks trading with Ondo Finance deal

https://www.coindesk.com/business/2026/02/23/binance-brings-back-tokenized-stocks-trading-with-on...
1•PaulHoule•16m ago•0 comments

SQLite WAL-Reset Database Corruption Bug

https://sqlite.org/wal.html#walresetbug
1•tcbrah•17m ago•0 comments

Show HN: First IDL for Object-Graph Serialization (Apache Fory IDL)

https://fory.apache.org/blog/fory_schema_idl_for_object_graph/
1•chaokunyang•17m ago•1 comments

Iran-Backed Hackers Claim Wiper Attack on Medtech Firm Stryker

https://krebsonsecurity.com/2026/03/iran-backed-hackers-claim-wiper-attack-on-medtech-firm-stryker/
1•todsacerdoti•18m ago•0 comments

OpenAIReview: AI-assisted Reviewing is Necessary and Should be Open

https://openaireview.github.io/blog.html
1•jprs•19m ago•0 comments