
Introspective Diffusion Language Models

https://introspective-diffusion.github.io/
83•zagwdt•4h ago

Comments

andsoitis•4h ago
Is anyone here experimenting seriously with Diffusion for text generation? I’d love to learn about your experiences!
moostee•3h ago
I have. It requires a different intuition than a standard autoregressive language model, but it's very well suited to certain problems.
andsoitis•3h ago
Can you tell us more?
recsv-heredoc•3h ago
https://www.inceptionlabs.ai/

This startup seems to have been at it a while.

From our look into it - amazing speed, but challenges remain around time-to-first-token user experience and overall answer quality.

Can absolutely see this working if we can get the speed and accuracy up to that “good enough” position for cheaper models - or non-user facing async work.

One other question I've had: is it possible to set a huge amount of text to diffuse as the output, using a larger body to mechanically force greater levels of reasoning? I'm sure there's some incredibly interesting research taking place in the big labs on this.

IanCal•3h ago
The overall speed rather than TTFT might start to be more relevant as the caller moves from being a human to another model.

However quality is really important. I tried that site and clicked one of their examples, "create a javascript animation". Fast response, but while it starts like this

``` Below is a self‑contained HTML + CSS + JavaScript example that creates a simple, smooth animation: a colorful ball bounces around the browser window while leaving a fading trail behind it.

<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>JavaScript Bounce Animation</title> <style> body, html { margin: 0; padding: 0;

```

the answer then degrades to

``` radius: BALL_RADIUS, color: BALL_COLOR, traivD O] // array of previous {x,y} positions }; ```

Then more things start creeping in

``` // 3⃣ Bounce off walls if (ball.G 0 ball.radius < 0 || ball.x + ball.radius > _7{nas.width) { ball.vx *= -1; ibSl.x = Math.max(ball.radius, Math.min(ball.x, canvbbF4idth - ball.radius)); } if

```

and the longer it goes on, the worse it gets

``` Ho7 J3 Works 0 Atep | Description | ```

and

``` • prwrZ8}E6on 5 jdF wVuJg Ar touc> 2ysteners ,2 Ppawn \?) balls w>SFu the 8b$] cliM#]9 ```

This is for the demo on the front page, so I expect this is a pretty good outcome compared to what else you might ask.

cataflutter•2h ago
Weird; I clicked through out of curiosity and didn't get any corruption of the sort in the end result.

I also asked it some technical details about how diffusion LLMs could work, and it provided grammatically correct, plausible answers in a very short time (I don't know the tech well enough to say whether they're correct).

nl•5m ago
Mercury 2 is better than that in my testing, but it does have trouble with tool calling.
girvo•2h ago
It's being explored right now for speculative decoding in the local-LLM space, which I think is quite interesting as a use-case

https://www.emergentmind.com/topics/dflash-block-diffusion-f...

roger_•42m ago
DFlash immediately came to my mind.

There are several Mac implementations of it that show > 2x faster Qwen3.5 already.

LoganDark•2h ago
I've been playing with a Swift implementation of a diffusion language model (WeDLM), but performance is not yet acceptable and it still generates roughly from left-to-right like a language model (just within a sliding window rather than strictly token-by-token... but that doesn't matter when the sliding window is only like 16 tokens.)
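That sliding-window style of decoding can be pictured with a toy sketch (all names and the random "confidence" scores below are hypothetical stand-ins, not WeDLM's actual implementation): unmask whichever position in the window the model is most confident about, and slide the window forward once its left edge is committed.

```python
import random

MASK = "_"

def toy_denoise(out, window):
    # Hypothetical stand-in for one dLLM denoising pass: propose a token
    # and a confidence score for every still-masked slot in the window.
    return [(i, f"t{len(out) + i}", random.random())
            for i, tok in enumerate(window) if tok == MASK]

def generate(n_tokens, window_size=4):
    out = []                       # committed tokens left of the window
    window = [MASK] * window_size  # active denoising window
    while len(out) < n_tokens:
        # Unmask the single most confident masked position...
        preds = toy_denoise(out, window)
        i, tok, _ = max(preds, key=lambda p: p[2])
        window[i] = tok
        # ...and slide the window right once its left edge is committed.
        while window[0] != MASK:
            out.append(window.pop(0))
            window.append(MASK)
    return out[:n_tokens]

# Tokens come out in left-to-right order even though positions inside
# the window are filled in confidence order.
print(generate(8))
```

The point of the sketch: with a small window (like the ~16 tokens mentioned above), the "diffusion" freedom only exists inside that window, so overall generation still proceeds roughly left-to-right.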
Topfi•30m ago
I've found the latency and pricing make Mercury 2 extremely compelling for some UX experiments focused on automated note tagging/interlinking. Far more than the Gemini Flash Lite I used before, it made some interactions nearly frictionless, very close to how old-school autocomplete/T9/autocorrect works, in a way that users don't even think about the process behind it.

Sadly, it does not perform at the level of e.g. Haiku 3.5 for tool calling, despite their own benchmarks claiming parity with Haiku 4.5, but it does compete with Flash Lite there too.

Anything with very targeted output and sufficient existing input that benefits from a seamless feel lends itself to dLLMs. I could see a place in tab-complete too, though Cursor's model seems to be sufficiently low-latency already.

nl•7m ago
If you like Mercury 2 you should try Xiaomi Mimo-v2-flash.

I have an agentic benchmark; it shows Mercury 2 at 19/25 in 58 seconds and Mimo v2 Flash at 22/25 in 109 seconds.

https://sql-benchmark.nicklothian.com/?highlight=xiaomi_mimo... (flip to the Cost vs Performance tab to see speed more graphically too)

thepasch•2h ago
If I’m reading this right, this is pretty wild. They turned a Qwen autoregressor into a diffuser by using a bunch of really clever techniques, and they vastly outperform any “native diffuser,” actually being competitive with the base model they were trained from. The obvious upside here is the massive speedup in generation.

And then through a LoRA adapter, you can ground the diffuser on the base model’s distribution (essentially have it “compare” its proposals against what the base model would’ve generated), which effectively means: exact same byte-for-byte output for the same seed, just roughly twice as fast (which should improve even more for batched tasks).

I’m not an expert, more of a “practicing enthusiast,” so I might be missing something, but at first glance, this reads super exciting to me.

awestroke•1h ago
I don't understand how you can compare against the base model output without generating with the base model, in which case what's the point?
a1j9o94•39m ago
You would only use the base model during training. This is a distillation technique
qeternity•28m ago
I haven't read TFA yet, but a common technique is speculative decoding, where a fast draft model generates X tokens, which are then verified by the larger target model. The target model may accept some Y < X tokens, but the speedup comes from the fact that verification can be done in parallel as a prefill operation due to the nature of transformers.

So let's say a draft model generates 5 tokens: all 5 can be verified in parallel with a single forward pass of the target model. The target may only accept the first 4 (or whatever), but as long as the 5 forward passes of the draft model plus 1 prefill of the target are faster than 4 forward passes of the target, you get a speedup while maintaining the exact output distribution of the target.
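The draft/verify bookkeeping can be sketched with toy stand-in "models" (hedged: real implementations verify with one batched forward pass of the target transformer and resample at the first rejected position; here both models are plain functions so the accept/reject logic is visible):

```python
def draft_next(ctx):
    # Fast draft "model": usually right, but wrong at every 5th position.
    n = len(ctx)
    return n if n % 5 != 4 else -1

def target_next(ctx):
    # Slow target "model": the ground truth (next integer token).
    return len(ctx)

def speculative_step(ctx, k=5):
    # 1) Draft k tokens autoregressively (k cheap forward passes).
    proposed = []
    for _ in range(k):
        proposed.append(draft_next(ctx + proposed))
    # 2) Verify all k positions with the target "in parallel"
    #    (one prefill pass in a real transformer).
    verified = [target_next(ctx + proposed[:i]) for i in range(k)]
    # 3) Accept the longest agreeing prefix; at the first mismatch,
    #    take the target's token instead and stop.
    accepted = []
    for d, t in zip(proposed, verified):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # target's correction
            break
    return ctx + accepted

ctx = []
while len(ctx) < 12:
    ctx = speculative_step(ctx)
print(ctx[:12])  # → [0, 1, ..., 11], identical to the target alone
```

Even when the draft is wrong, the output is byte-for-byte what the target would have produced, which is the guarantee that makes the technique attractive.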

anentropic•23m ago
presumably that happens at training time?

then once successfully trained you get faster inference from just the diffusion model

ramon156•2h ago
> 2025-04-12: Initial code release with training and inference support.

> 2025-04-12: Released I-DLM-8B, I-DLM-32B, and I-DLM-8B-LoRA on HuggingFace.

Is this old already? Not saying that's a bad thing, since it seems very sophisticated. Just curious if there's an update

oersted•1h ago
It's clearly a typo in the year; April 12 was two days ago, and a quick check on HuggingFace shows they were uploaded 5 days ago.
simianwords•1h ago
Can diffusion models have reasoning steps where they generate a block, introspect and then generate another until the output is satisfactory?
moeadham•1h ago
Well, you can take the output of a first pass and pass it back through the model like AR “reasoning” models do at inference time.
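That kind of inference-time loop could be as simple as re-conditioning the model on its own previous draft until some check passes (a sketch; `model_step` and `good_enough` are hypothetical stand-ins for a full diffusion pass and a quality check, not any real API):

```python
def refine(model_step, prompt, max_rounds=3, good_enough=lambda draft: False):
    # Round 1: generate a draft from the prompt alone.
    draft = model_step(prompt, previous=None)
    # Later rounds: feed the previous draft back in, AR-"reasoning" style.
    for _ in range(max_rounds - 1):
        if good_enough(draft):
            break
        draft = model_step(prompt, previous=draft)
    return draft

# Toy model_step so the loop is runnable: each pass revises the prior draft.
def toy_step(prompt, previous=None):
    base = previous if previous is not None else prompt.upper()
    return base + "!"

print(refine(toy_step, "hello"))  # → "HELLO!!!" after three passes
```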
simianwords•1h ago
Yes and has this been tried?
Topfi•21m ago
Yes, Mercury 2 is a reasoning model [0].

[0] https://docs.inceptionlabs.ai/get-started/models#mercury-2

scotty79•23m ago
So can you just use this and have a faster Qwen32b?

https://huggingface.co/yifanyu/I-DLM-32B/tree/main

DaVinci Resolve – Photo

https://www.blackmagicdesign.com/products/davinciresolve/photo
679•thebiblelover7•9h ago•179 comments

NimConf 2026: Dates Announced, Registrations Open

https://nim-lang.org/blog/2026/04/07/nimconf-2026.html
18•moigagoo•53m ago•3 comments

A new spam policy for “back button hijacking”

https://developers.google.com/search/blog/2026/04/back-button-hijacking
448•zdw•9h ago•269 comments

What is jj and why should I care?

https://steveklabnik.github.io/jujutsu-tutorial/introduction/what-is-jj-and-why-should-i-care.html
24•tigerlily•1h ago•9 comments

Someone bought 30 WordPress plugins and planted a backdoor in all of them

https://anchor.host/someone-bought-30-wordpress-plugins-and-planted-a-backdoor-in-all-of-them/
999•speckx•18h ago•281 comments

Introspective Diffusion Language Models

https://introspective-diffusion.github.io/
83•zagwdt•4h ago•24 comments

Backblaze has stopped backing up your data

https://rareese.com/posts/backblaze/
269•rrreese•3h ago•182 comments

GitHub Stacked PRs

https://github.github.com/gh-stack/
752•ezekg•15h ago•398 comments

Franklin's bad ads for Apple ][ clones and the beloved impersonator they depict

https://buttondown.com/suchbadtechads/archive/franklin-ace-1000/
34•rfarley04•3d ago•14 comments

Distributed DuckDB Instance

https://github.com/citguru/openduck
85•citguru•5h ago•18 comments

Ask HN: I quit my job over weaponized robots to start my own venture

19•barratia•41m ago•10 comments

Lean proved this program correct; then I found a bug

https://kirancodes.me/posts/log-who-watches-the-watchers.html
280•bumbledraven•11h ago•134 comments

The M×N problem of tool calling and open-source models

https://www.thetypicalset.com/blog/grammar-parser-maintenance-contract
14•remilouf•4d ago•3 comments

Ransomware Is Growing Three Times Faster Than the Spending Meant to Stop It

https://ciphercue.com/blog/ransomware-claims-grew-faster-than-security-spend-2025
22•adulion•3h ago•19 comments

WiiFin – Jellyfin Client for Nintendo Wii

https://github.com/fabienmillet/WiiFin
180•throwawayk7h•12h ago•78 comments

A soft robot has no problem moving with no motor and no gears

https://engineering.princeton.edu/news/2026/04/08/soft-robot-has-no-problem-moving-no-motor-and-n...
42•hhs•4d ago•8 comments

Multi-Agentic Software Development Is a Distributed Systems Problem

https://kirancodes.me/posts/log-distributed-llms.html
50•tie-in•6h ago•17 comments

MOS tech 6502 8-bit microprocessor in pure SQL powered by Postgres

https://github.com/lasect/pg_6502
36•adunk•6h ago•3 comments

The Great Majority: Body Snatching and Burial Reform in 19th-Century Britain

https://publicdomainreview.org/essay/the-great-majority/
3•apollinaire•3d ago•0 comments

Nothing Ever Happens: Polymarket bot that always buys No on non-sports markets

https://github.com/sterlingcrispin/nothing-ever-happens
437•m-hodges•20h ago•242 comments

Lumina – a statically typed web-native language for JavaScript and WASM

https://github.com/nyigoro/lumina-lang
22•light_ideas•4d ago•7 comments

Design and implementation of DuckDB internals

https://duckdb.org/library/design-and-implementation-of-duckdb-internals/
139•mpweiher•3d ago•9 comments

US appeals court declares 158-year-old home distilling ban unconstitutional

https://nypost.com/2026/04/11/us-news/us-appeals-court-declares-158-year-old-home-distilling-ban-...
400•t-3•22h ago•268 comments

Rust Threads on the GPU

https://www.vectorware.com/blog/threads-on-gpu/
88•PaulHoule•4d ago•24 comments

The secrets of the Shinkansen

https://www.worksinprogress.news/p/the-secret-behind-japans-railways
106•WillDaSilva•5h ago•97 comments

N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?

https://ndaybench.winfunc.com
78•mufeedvh•14h ago•26 comments

Make tmux pretty and usable (2024)

https://hamvocke.com/blog/a-guide-to-customizing-your-tmux-conf/
394•speckx•21h ago•243 comments

Write less code, be more responsible

https://blog.orhun.dev/code-responsibly/
108•orhunp_•3d ago•65 comments

TanStack Start Now Supports React Server Components

https://tanstack.com/blog/react-server-components
74•polywock•6h ago•53 comments

Android now stops you sharing your location in photos

https://shkspr.mobi/blog/2026/04/android-now-stops-you-sharing-your-location-in-photos/
386•edent•1d ago•305 comments