Modeling data distributions is challenging; DDN takes a simple yet fundamentally different approach from mainstream generative models (diffusion, GAN, VAE, autoregressive models):
1. The model generates multiple outputs simultaneously in a single forward pass, rather than just one output.
2. It uses these multiple outputs to approximate the target distribution of the training data.
3. Together, these outputs represent a discrete distribution. This is why we named it "Discrete Distribution Networks".
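A minimal PyTorch sketch of this core idea (illustrative names and shapes, not the actual repo API): a layer emits K candidates in one forward pass, and the loss flows only through the candidate nearest the training sample, so the K outputs spread out to jointly cover the data.

```python
import torch
import torch.nn as nn

class DDNLayerSketch(nn.Module):
    """Toy DDN-style layer: K candidate outputs from one forward pass.

    Illustrative simplification; the real DDN stacks several such
    layers for coarse-to-fine refinement.
    """

    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.trunk = nn.Linear(dim, dim)
        self.heads = nn.Linear(dim, dim * k)  # K lightweight output heads

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.trunk(h))
        return self.heads(h).view(h.shape[0], self.k, -1)  # (B, K, D)

def ddn_train_step(layer: DDNLayerSketch, h: torch.Tensor, target: torch.Tensor):
    """One training step: route the loss through the candidate nearest
    the ground truth, so the K outputs spread out to approximate the
    data distribution (together they form a discrete distribution)."""
    candidates = layer(h)                                       # (B, K, D)
    dists = ((candidates - target.unsqueeze(1)) ** 2).mean(-1)  # (B, K)
    idx = dists.argmin(dim=1)          # chosen branch index = latent code
    chosen = candidates[torch.arange(h.shape[0]), idx]          # (B, D)
    loss = ((chosen - target) ** 2).mean()
    return loss, idx
```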
Every generative model has its own unique properties, and DDN is no exception. Here we highlight three characteristics of DDN:
- Zero-Shot Conditional Generation (ZSCG).
- A one-dimensional discrete latent representation, organized in a tree structure.
- Fully end-to-end differentiable.
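The first two properties can be sketched in one short, hypothetical inference loop (reusing `DDNLayerSketch` from above; `sample_latent` and `guide` are invented names, not the repo's API): the latent is simply the sequence of branch indices picked layer by layer, and swapping the random pick for a condition-scoring `guide` yields zero-shot conditional generation with no gradients or fine-tuning.

```python
import random
import torch

def sample_latent(layers, h, guide=None):
    """Hypothetical inference loop over DDNLayerSketch layers.

    The latent is the list of branch indices chosen layer by layer:
    a path down a K-ary tree, i.e. a 1-D discrete code. Passing a
    `guide` that scores a candidate against a condition (say, pixel
    distance to a reference image) turns the same loop into
    zero-shot conditional generation via greedy branch selection.
    """
    latent = []
    with torch.no_grad():
        for layer in layers:
            candidates = layer(h)  # (1, K, D) candidates at this level
            k = candidates.shape[1]
            if guide is None:
                idx = random.randrange(k)  # unconditional: random branch
            else:
                idx = min(range(k), key=lambda i: guide(candidates[0, i]))
            latent.append(idx)        # one integer per level
            h = candidates[:, idx]    # refine from the chosen branch
    return latent, h
```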
Reviews from ICLR:
> I find the method novel and elegant. The novelty is very strong, and this should not be overlooked. This is a whole new method, very different from any of the existing generative models.

> This is a very good paper that can open a door to new directions in generative modeling.
diyer22•3h ago
I made an initial attempt to combine [DDN with GPT](https://github.com/Discrete-Distribution-Networks/Discrete-D...), aiming to remove tokenizers and let LLMs directly model binary strings. In each forward pass, the model adaptively adjusts the byte length of generated content based on generation difficulty (naturally supporting speculative sampling).
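Roughly, the acceptance rule could look like the following sketch (illustrative only; `select_candidate` is not from the linked repo):

```python
def select_candidate(candidates: list[bytes], target: bytes) -> bytes:
    """Of K candidate byte strings from one forward pass, accept the
    longest prefix that matches the target. Easy spans match many
    bytes (fast decoding); hard spans match few -- so the accepted
    length adapts to difficulty, like speculative-sampling acceptance."""
    def match_len(cand: bytes) -> int:
        n = 0
        for a, b in zip(cand, target):
            if a != b:
                break
            n += 1
        return n

    best = max(candidates, key=match_len)
    return best[:match_len(best)]

# e.g. select_candidate([b"hello w", b"help", b"hexagon"], b"hello world")
# -> b"hello w"  (7 bytes accepted in one step)
```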