Diffusion Models Explained Simply

https://www.seangoedecke.com/diffusion-models-explained/
96•onnnon•8h ago

Comments

user14159265•6h ago
https://lilianweng.github.io/posts/2021-07-11-diffusion-mode...
Philpax•5h ago
Notably, Lilian did not explain diffusion models simply. This is a fantastic resource that details how they actually work, but your casual reader is unlikely to develop any sort of understanding from this.
kmitz•5h ago
Thanks, I was looking for an article like this, with a focus on the differences between generative AI techniques. My guess is that since LLMs and image generation became mainstream at the same time, most people don't have the slightest idea they are based on fundamentally different technologies.
cubefox•5h ago
That's a nice high-level explanation: short and easy to understand.
cubefox•5h ago
It's nice that this contains a comparison between diffusion models that are used for image models, and the autoregressive models that are used for LLMs.

But recently (2024 NeurIPS paper of the year) there was a new paper on autoregressive image modelling that apparently outperforms diffusion models: https://arxiv.org/abs/2404.02905

The innovation is that it doesn't predict image patches one after another (like older autoregressive image models) but instead does "next scale" or "next resolution" prediction, generating the image at successively higher resolutions.

In the past, autoregressive image models did not perform as well as diffusion models, which meant that most image models used diffusion. Now it seems autoregressive techniques have a strict advantage over diffusion models. Another advantage is that they can be integrated with autoregressive LLMs (multimodality), which is not possible with diffusion image models. In fact, the recent GPT-4o image generation is autoregressive according to OpenAI. I wonder whether diffusion models still have a future now.

earthnail•2h ago
From what I can tell, it doesn't look like the recent GPT-4o image generation includes the research of the NeurIPS paper you cited. If it did, we wouldn't see a line-by-line generation of the image, which we do currently in GPT-4o, but rather a decoding similar to progressive JPEG.

I'm not 100% convinced that diffusion models are dead. That paper fixes autoregression for 2D spaces by basically turning the generation problem from pixel-by-pixel to iterative upsampling, but if 2D was the problem (and 1D was not), why don't we have more autoregressive models in 1D spaces like audio?
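To make the contrast between pixel-by-pixel generation and iterative upsampling concrete, here is a minimal sketch that only counts sequential generation steps. It is not the method from the cited paper: the doubling schedule and the function names `raster_steps` and `scale_schedule` are assumptions made for illustration.

```python
def raster_steps(side):
    """Sequential steps when one grid cell is generated at a time, row by row."""
    return side * side


def scale_schedule(side):
    """Resolutions visited when each step emits a whole grid at once.

    Assumption for illustration: the side length doubles at every step
    until the target resolution is reached.
    """
    scales = []
    s = 1
    while s < side:
        scales.append(s)
        s *= 2
    scales.append(side)
    return scales


if __name__ == "__main__":
    side = 16
    print("raster order steps:", raster_steps(side))
    print("coarse-to-fine scales:", scale_schedule(side))
```

Under these assumptions, the raster order needs one step per cell, while the coarse-to-fine order needs only one step per scale, with each step conditioned on all coarser grids.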

porphyra•5h ago
Meanwhile, if you want diffusion models explained with math for a graduate student, there's Tony Duan's Diffusion Models From Scratch [1].

[1] https://www.tonyduan.com/diffusion/index.html

bcherry•4h ago
"The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material."

- Michelangelo

jdthedisciple•3h ago
Not to be that guy but an article on diffusion models with only one image ... and that too just noise?
ActorNightly•3h ago
The thing to understand about any model architecture is that there isn't really anything special about one or the other - as long as the process is differentiable, ML can learn it.

You can build an image generator that basically renders each word on one line in an image, and then uses a transformer architecture to morph the image of the words into what the words are describing.

The only big difference is really efficiency, but we are just taking stabs in the dark at this point - there is work that Google is doing that eventually is going to result in the most optimal model for a certain type of task.

g42gregory•2h ago
One of the key intuitions: If you take a natural image and add random noise, you will get a different random noise image every time you do this. However, all of these (different!) random noise images will be lined up in the direction perpendicular to the natural images manifold.

So you will always know where to go to restore the original image: shortest distance to the natural image manifold.

How do all these random images end up perpendicular to the manifold? High dimensional statistics and the fact that the natural image manifold has much lower dimension than the overall space.
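The high-dimensional claim can be checked numerically with a small sketch. As a simplifying assumption, the manifold's tangent space at the image is replaced by a flat subspace spanned by the first `manifold_dim` coordinate axes. Because Gaussian noise is isotropic, the choice of axes does not matter for a flat subspace. The function names are hypothetical.

```python
import math
import random


def tangent_fraction(dim, manifold_dim, rng):
    """Share of one noise vector's length that lies inside the subspace.

    Values near 0 mean the noise is almost perpendicular to the subspace.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    tangent_sq = sum(x * x for x in noise[:manifold_dim])
    total_sq = sum(x * x for x in noise)
    return math.sqrt(tangent_sq / total_sq)


def mean_tangent_fraction(dim, manifold_dim, trials=50, seed=0):
    """Average the share over several independent noise draws."""
    rng = random.Random(seed)
    total = sum(tangent_fraction(dim, manifold_dim, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for dim in (10, 100, 1000, 10000):
        print(dim, round(mean_tangent_fraction(dim, manifold_dim=5), 3))
```

With the subspace dimension held fixed, the share falls roughly like sqrt(manifold_dim / dim), so in a space with as many dimensions as an image has pixels, nearly all of the noise points away from the subspace.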

fisian•2h ago
I found this course very helpful if you're interested in a bit of math (but all very well explained): https://diffusion.csail.mit.edu/

It is short, with good lecture notes and has hands on examples that are very approachable (with solutions available if you get stuck).

woolion•36m ago
Discussed on hn: https://news.ycombinator.com/item?id=43238893

I found it to be the best resource to understand the material. That's certainly a good reference to delve deeper into the intuitions given by OP (it's about 5 hours of lectures, plus exercises).

IncreasePosts•6m ago
Are there any diffusion models for text? I'd imagine they'd be very fast, if the whole result can be processed simultaneously, instead of outputting a linear series of tokens that each depend on the last.
