Introspective Diffusion Language Models

https://introspective-diffusion.github.io/

67•zagwdt•3h ago

Comments

andsoitis•3h ago

Is anyone here experimenting seriously with Diffusion for text generation? I’d love to learn about your experiences!

moostee•3h ago

I have. It requires a distinct intuition compared to a normal language model. Very well suited to certain problems.

andsoitis•2h ago

Can you tell us more?

recsv-heredoc•3h ago

https://www.inceptionlabs.ai/

This startup seems to have been at it a while.

From our look into it: amazing speed, but challenges remain around time-to-first-token (TTFT) user experience and overall answer quality.

I can absolutely see this working if we can get speed and accuracy up to a “good enough” level for cheaper models, or for non-user-facing async work.

One other question I’ve had is whether it’s possible to set a huge amount of text as the output to diffuse, using a larger body to mechanically force greater levels of reasoning. I’m sure there’s some incredibly interesting research on this taking place in the big labs.

IanCal•2h ago

The overall speed rather than TTFT might start to be more relevant as the caller moves from being a human to another model.

However, quality is really important. I tried that site and clicked one of their examples, "create a javascript animation". The response was fast, but while it starts like this

```
Below is a self‑contained HTML + CSS + JavaScript example that creates a simple, smooth animation: a colorful ball bounces around the browser window while leaving a fading trail behind it.

<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>JavaScript Bounce Animation</title> <style> body, html { margin: 0; padding: 0;
```

the answer then degrades to

``` radius: BALL_RADIUS, color: BALL_COLOR, traivD O] // array of previous {x,y} positions }; ```

Then more things start creeping in

```
// 3⃣ Bounce off walls if (ball.G 0 ball.radius < 0 || ball.x + ball.radius > _7{nas.width) { ball.vx *= -1; ibSl.x = Math.max(ball.radius, Math.min(ball.x, canvbbF4idth - ball.radius)); } if
```

and the more it goes on the worse it gets

``` Ho7 J3 Works 0 Atep | Description | ```

and

``` • prwrZ8}E6on 5 jdF wVuJg Ar touc> 2ysteners ,2 Ppawn \?) balls w>SFu the 8b$] cliM#]9 ```

This is for the demo on the front page, so I expect this is a pretty good outcome compared to what else you might ask.

cataflutter•1h ago

Weird; I clicked through out of curiosity and didn't get any corruption of the sort in the end result.

I also asked it about some technical details of how diffusion LLMs could work, and it provided grammatically correct, plausible answers in a very short time (I don't know the tech well enough to say whether they're correct).

girvo•2h ago

It's being explored right now for speculative decoding in the local-LLM space, which I think is quite interesting as a use case.

https://www.emergentmind.com/topics/dflash-block-diffusion-f...

roger_•4m ago

DFlash immediately came to my mind.

There are already several Mac implementations of it that show a >2x speedup on Qwen3.5.
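Speculative-decoding speedups like that come from emitting several tokens per expensive target-model forward pass. A minimal sketch of the usual back-of-envelope estimate, assuming each of `gamma` drafted tokens is accepted independently with probability `alpha` (a simplification; real acceptance rates vary by position and prompt). The function name is made up for illustration, and it counts tokens per target pass only: wall-clock speedup also depends on how cheap the drafter is.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Assumes each of `gamma` drafted tokens is accepted independently with
    probability `alpha`. The target always contributes one extra token:
    a correction at the first rejection, or a bonus token if all are accepted.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft accepted, plus the bonus token
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


for alpha in (0.6, 0.8, 0.9):
    print(alpha, round(expected_tokens_per_pass(alpha, gamma=4), 2))
```

With a fairly accurate drafter (alpha around 0.8) and four drafted tokens, each target pass yields a bit over three tokens, which leaves room for a net >2x even after paying for the drafts.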

LoganDark•1h ago

I've been playing with a Swift implementation of a diffusion language model (WeDLM), but performance is not yet acceptable, and it still generates roughly left to right like an autoregressive language model (just within a sliding window rather than strictly token by token, but that hardly matters when the window is only about 16 tokens).
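To make the sliding-window behavior concrete, here is a toy sketch of confidence-based parallel unmasking inside a window. It is not WeDLM's actual decoder: `predict` stands in for a hypothetical model call returning a token and a confidence for one masked position, and the window and per-step sizes are illustrative. With a scorer whose confidence falls off with position, the decoder fills the sequence left to right in small chunks, which is the pattern described above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: given the partially decoded sequence and a
# masked position, return (token, confidence) for that position.
Predictor = Callable[[List[Optional[str]], int], Tuple[str, float]]


def decode_sliding_window(
    length: int, predict: Predictor, window: int = 16, per_step: int = 4
) -> Tuple[List[str], int]:
    seq: List[Optional[str]] = [None] * length  # None marks a still-masked position
    steps = 0
    while any(tok is None for tok in seq):
        # The window covers the leftmost `window` positions that are still masked.
        masked = [i for i, tok in enumerate(seq) if tok is None][:window]
        # A real DLM scores all of these in one forward pass; here we loop.
        proposals = [(i, *predict(seq, i)) for i in masked]
        # Commit only the `per_step` most confident proposals; the rest stay masked.
        proposals.sort(key=lambda p: p[2], reverse=True)
        for i, tok, _conf in proposals[:per_step]:
            seq[i] = tok
        steps += 1
    return [tok for tok in seq if tok is not None], steps


target = "the ball bounces off the walls and leaves a trail".split()


def toy_predict(seq: List[Optional[str]], i: int) -> Tuple[str, float]:
    # Toy scorer: knows the answer but is less confident the further right it looks.
    return target[i], 1.0 / (1 + i)


tokens, steps = decode_sliding_window(len(target), toy_predict, window=16, per_step=4)
print(" ".join(tokens), steps)
```

Raising `per_step` trades quality for fewer model calls; with a window of only 16 tokens, the left edge of the window ends up being much the same frontier an autoregressive decoder would have.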

thepasch•1h ago

If I’m reading this right, this is pretty wild. They turned a Qwen autoregressor into a diffuser by using a bunch of really clever techniques, and they vastly outperform any “native diffuser,” while actually being competitive with the base model they were trained from. The obvious upside here is the massive speedup in generation.

And then through a LoRA adapter, you can ground the diffuser on the base model’s distribution (essentially have it “compare” its proposals against what the base model would’ve generated), which effectively means: exact same byte-for-byte output for the same seed, just roughly twice as fast (which should improve even more for batched tasks).

I’m not an expert, more of a “practicing enthusiast,” so I might be missing something, but at first glance, this reads super exciting to me.
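The "byte-for-byte" claim matches how lossless speculative decoding works in general. A minimal greedy sketch, assuming a hypothetical fast drafter (`draft_block`) and the base model's greedy next-token function (`target_next`); this is the generic draft-then-verify technique, not I-DLM's LoRA mechanism. In this generic scheme the base model does still run, but in a real system it checks a whole drafted block in one batched forward pass instead of one pass per token, and everything it keeps is exactly what it would have produced on its own.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # greedy next token of the base model
DraftBlock = Callable[[List[int], int], List[int]]  # cheaply proposes k tokens


def speculative_greedy(
    prompt: List[int], target_next: NextToken, draft_block: DraftBlock,
    max_new: int, k: int = 4,
) -> Tuple[List[int], int]:
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        proposal = draft_block(out, k)
        # Verify: a real system scores all k positions in ONE target forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            accepted.append(target_next(out + accepted))  # bonus token: all k accepted
        out.extend(accepted)
        rounds += 1
    return out[: len(prompt) + max_new], rounds


def greedy(prompt: List[int], target_next: NextToken, max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out


# Toy models over a 17-token vocabulary (hypothetical, for illustration only).
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 17


def toy_draft(ctx: List[int], k: int) -> List[int]:
    toks: List[int] = []
    for _ in range(k):
        nxt = toy_target(ctx + toks)
        if (len(ctx) + len(toks)) % 3 == 0:
            nxt = (nxt + 1) % 17  # the drafter is wrong at every third position
        toks.append(nxt)
    return toks


fast, rounds = speculative_greedy([1], toy_target, toy_draft, max_new=20)
assert fast == greedy([1], toy_target, max_new=20)  # identical output
print(rounds, "verification rounds for 20 tokens")
```

The speedup comes from `rounds` being smaller than `max_new`; the output never differs from plain greedy decoding, however bad the drafter is.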

awestroke•40m ago

I don't understand how you can compare against the base model output without generating with the base model, in which case what's the point?

a1j9o94•1m ago

You would only use the base model during training. This is a distillation technique.

ramon156•1h ago

> 2025-04-12: Initial code release with training and inference support.

> 2025-04-12: Released I-DLM-8B, I-DLM-32B, and I-DLM-8B-LoRA on HuggingFace.

Is this old already? Not saying that's a bad thing, since it seems very sophisticated. Just curious if there's an update.

oersted•1h ago

It's clearly a typo in the year: April 12 was two days ago, and a quick check on HuggingFace shows that they were uploaded 5 days ago.

simianwords•1h ago

Can diffusion models have reasoning steps where they generate a block, introspect and then generate another until the output is satisfactory?

moeadham•56m ago

Well, you can take the output of a first pass and pass it back through the model like AR “reasoning” models do at inference time.
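The feed-it-back-in idea can be sketched as a fixed-point loop: run the model on its own draft until the output stops changing or a pass budget runs out. `model` here is a hypothetical text-to-text callable, and the toy "model" that repairs one garbled character per pass only stands in for a real denoising step; whether this actually improves diffusion-LM answers is the open question in this thread.

```python
from typing import Callable, Tuple


def refine(draft: str, model: Callable[[str], str], max_passes: int = 4) -> Tuple[str, int]:
    """Re-run `model` on its own output until it stops changing or the budget runs out.

    Returns the final text and the number of model calls made.
    """
    for calls in range(1, max_passes + 1):
        revised = model(draft)
        if revised == draft:  # fixed point: another pass would change nothing
            return draft, calls
        draft = revised
    return draft, max_passes


def fix_one(text: str) -> str:
    # Toy stand-in for a denoising pass: repairs one garbled character per call.
    return text.replace("0", "o", 1)


print(refine("h0w d0es it w0rk", fix_one))
```

The pass budget matters: with too few passes the loop returns a partially repaired draft, and a real model may never reach a fixed point at all.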

simianwords•46m ago

Yes, and has this been tried?
