frontpage.

UGMM-NN: Univariate Gaussian Mixture Model Neural Network

https://arxiv.org/abs/2509.07569
23•zakeria•3h ago

Comments

zakeria•3h ago
uGMM-NN is a novel neural architecture that embeds probabilistic reasoning directly into the computational units of deep networks. Unlike traditional neurons, which apply weighted sums followed by fixed nonlinearities, each uGMM-NN node parameterizes its activations as a univariate Gaussian mixture, with learnable means, variances, and mixing coefficients.
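[For a concrete mental model of that description, here is a minimal sketch of one such density-evaluating unit, assuming the mixture form the abstract describes; the class and all names are hypothetical, not the paper's reference code.]

```python
import math
import torch
import torch.nn as nn

class UGMMUnit(nn.Module):
    """One uGMM neuron: a K-component univariate Gaussian mixture with
    learnable means, variances, and mixing coefficients (hypothetical
    sketch based on the abstract, not the paper's reference code)."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.means = nn.Parameter(torch.randn(k))
        self.log_vars = nn.Parameter(torch.zeros(k))    # log sigma^2 keeps variance positive
        self.mix_logits = nn.Parameter(torch.zeros(k))  # softmax -> mixing coefficients

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # log p(y) = logsumexp_k [ log pi_k + log N(y; mu_k, sigma_k^2) ]
        log_pi = torch.log_softmax(self.mix_logits, dim=-1)
        var = self.log_vars.exp()
        log_norm = -0.5 * (torch.log(2 * math.pi * var)
                           + (y.unsqueeze(-1) - self.means) ** 2 / var)
        return torch.logsumexp(log_pi + log_norm, dim=-1)
```

[How one layer's log-densities feed the next layer is the part the later comments probe.]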
vessenes•1h ago
Meh. Well, at least, possibly “meh”.

Upshot: Gaussian sampling along the parameters of nodes rather than fixed values. This might offer one of the following:

* Better inference time accuracy on average

* Faster convergence during training

It probably costs additional inference and training compute.

The paper demonstrates worse results on MNIST, and shows the architecture is more than capable of dealing with the Iris test (which I hadn’t heard of; categorizing types of irises, I presume the flower, but maybe the eye?).

The paper claims to keep the number of parameters and depth the same, but it doesn’t report on:

* training time/flops (probably more, I’d guess?)

* inference time/flops (almost certainly more)

Intuitively, if you’ve got a mean, variance, and mix coefficient, then you have triple the data space per parameter; there’s no word as to whether the networks were normalized by the total data taken by the NN or just by the number of “parameters”.
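[To make that accounting concrete, a back-of-the-envelope version under one plausible reading; the layer sizes are made up and the paper's actual parameterization may differ.]

```python
# Hypothetical sizes: a 784 -> 128 dense layer vs. a uGMM layer where each
# connection carries K (mean, variance, mixing-coefficient) triples.
fan_in, fan_out, k = 784, 128, 3

mlp_params = fan_in * fan_out + fan_out   # weights + biases
ugmm_params = 3 * k * fan_in * fan_out    # 3 scalars per component per connection

print(f"MLP:  {mlp_params:,}")   # MLP:  100,480
print(f"uGMM: {ugmm_params:,}")  # uGMM: 903,168  (3x at K=1, 9x at K=3)
```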

Upshot: I don’t think this paper demonstrates any sort of benefit here or elucidates the tradeoffs.

Quick reminder: negative results are good, too. I’d almost rather see the paper framed that way.

zakeria•1h ago
Thanks for the comment. Just to clarify, the uGMM-NN isn't simply "Gaussian sampling along the parameters of nodes."

Each neuron is a univariate Gaussian mixture with learnable mean, variance, and mixture weights. This gives the network the ability to perform probabilistic inference natively inside its architecture, rather than approximating uncertainty after the fact.

The work isn’t framed as "replacing MLPs." The motivation is to bridge two research traditions:

- probabilistic graphical models and probabilistic circuits (the latter relatively newer)

- deep learning architectures

That's why the Iris dataset (despite being simple) was included - not as a discriminative benchmark, but to show the model could be trained generatively in a way similar to PGMs, something a standard MLP cannot do. Hence, the other benefits of the approach mentioned in the paper.
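[For readers unfamiliar with the distinction: "trained generatively" means maximizing the likelihood the model assigns to the inputs themselves, rather than minimizing a label loss. A toy, self-contained sketch of that objective on a single Gaussian-mixture unit — my illustration, not the paper's training setup:]

```python
import math
import torch

# Toy generative training: fit one univariate Gaussian mixture (the uGMM
# building block) by gradient descent on the negative log-likelihood.
torch.manual_seed(0)
data = torch.cat([torch.randn(500) - 2.0, torch.randn(500) + 2.0])  # bimodal toy data

k = 2
means = torch.randn(k, requires_grad=True)
log_vars = torch.zeros(k, requires_grad=True)
mix_logits = torch.zeros(k, requires_grad=True)
opt = torch.optim.Adam([means, log_vars, mix_logits], lr=0.05)

for _ in range(300):
    log_pi = torch.log_softmax(mix_logits, dim=-1)
    var = log_vars.exp()
    log_norm = -0.5 * (torch.log(2 * math.pi * var)
                       + (data.unsqueeze(-1) - means) ** 2 / var)
    nll = -torch.logsumexp(log_pi + log_norm, dim=-1).mean()  # -E[log p(x)]
    opt.zero_grad()
    nll.backward()
    opt.step()

print(means.sort().values)  # approaches the true modes near -2 and +2
```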

ericdoerheit•1h ago
Thank you for your work! I would be interested to see what this means for a CNN architecture. Maybe the whole architecture wouldn't need to be based on uGMM-NNs, but only the last layers?
zakeria•47m ago
Thanks - good question. In theory, the uGMM layer could complement CNNs in different ways - for example, one could imagine (as you mentioned):

- using standard convolutional layers for feature extraction,

- then replacing the final dense layers with uGMM neurons to enable probabilistic inference and uncertainty modeling on top of the learned features (see the sketch below).
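[A hedged sketch of that hybrid: an ordinary convolutional backbone with a density-producing head. UGMMHead is my own placeholder, assuming one mixture per class; nothing here is from the paper.]

```python
import math
import torch
import torch.nn as nn

class UGMMHead(nn.Module):
    """Hypothetical head: one K-component univariate Gaussian mixture per
    class, scoring a pooled feature by log-density instead of a raw logit."""
    def __init__(self, in_dim: int, n_classes: int, k: int = 3):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_classes)           # one scalar feature per class
        self.means = nn.Parameter(torch.randn(n_classes, k))
        self.log_vars = nn.Parameter(torch.zeros(n_classes, k))
        self.mix_logits = nn.Parameter(torch.zeros(n_classes, k))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        y = self.proj(h).unsqueeze(-1)                     # (batch, n_classes, 1)
        log_pi = torch.log_softmax(self.mix_logits, dim=-1)
        var = self.log_vars.exp()
        log_norm = -0.5 * (torch.log(2 * math.pi * var) + (y - self.means) ** 2 / var)
        return torch.logsumexp(log_pi + log_norm, dim=-1)  # (batch, n_classes) log-densities

backbone = nn.Sequential(                                  # ordinary conv feature extractor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = nn.Sequential(backbone, UGMMHead(16, 10))
print(model(torch.randn(4, 1, 28, 28)).shape)              # torch.Size([4, 10])
```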

My current focus, however, is exploring how uGMMs translate into Transformer architectures, which could open up interesting possibilities for probabilistic reasoning in attention-based models.

magicalhippo•51m ago
I'm having a very dense moment, I think, and it's been far too long since my statistics courses.

They state the output of a neuron j is a log density P_j(y), where y is a latent variable.

But how does the output from the previous layer, x, come into play?

I guess I was expecting some kind of conditional probabilities, i.e., the output is P_j(y | x) or something.

Again, perhaps trivial. Just struggling to figure out how it works in practice.

ChatGPT Developer Mode: Full MCP client access

https://platform.openai.com/docs/guides/developer-mode
326•meetpateltech•6h ago•162 comments

Show HN: Term.everything – Run any GUI app in the terminal

https://github.com/mmulet/term.everything
546•mmulet•1d ago•86 comments

Pontevedra, Spain declares its entire urban area a "reduced traffic zone"

https://www.greeneuropeanjournal.eu/made-for-people-not-cars-reclaiming-european-cities/
583•robtherobber•12h ago•755 comments

Defeating Nondeterminism in LLM Inference

https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
158•jxmorris12•4h ago•53 comments

KDE launches its own distribution (again)

https://lwn.net/SubscriberLink/1037166/caa6979c16a99c9e/
15•Bogdanp•35m ago•5 comments

The HackberryPi CM5 handheld computer

https://github.com/ZitaoTech/HackberryPiCM5
118•kristianpaul•2d ago•35 comments

Christie's Deletes Digital Art Department

https://news.artnet.com/market/christies-scraps-digital-art-department-2685784
8•recursive4•41m ago•3 comments

Launch HN: Recall.ai (YC W20) – API for meeting recordings and transcripts

49•davidgu•6h ago•27 comments

Mux (YC W16) Is Hiring Engineering ICs and Managers

https://mux.com/jobs
1•mmcclure•1h ago

Dotter: Dotfile manager and templater written in Rust

https://github.com/SuperCuber/dotter
40•nateb2022•3h ago•18 comments

OrioleDB Patent: now freely available to the Postgres community

https://supabase.com/blog/orioledb-patent-free
343•tosh•10h ago•115 comments

Show HN: Haystack – Review pull requests like you wrote them yourself

https://haystackeditor.com
43•akshaysg•4h ago•23 comments

Longhorn – A Kubernetes-Native Filesystem

https://vegard.blog.engen.priv.no/?p=518
14•jandeboevrie•3d ago•9 comments

Clojure's Solutions to the Expression Problem

https://www.infoq.com/presentations/Clojure-Expression-Problem/
30•adityaathalye•3d ago•1 comment

I didn't bring my son to a museum to look at screens

https://sethpurcell.com/writing/screens-in-museums/
671•arch_deluxe•6h ago•241 comments

Jiratui – A Textual UI for interacting with Atlassian Jira from your shell

https://jiratui.sh/
98•gjvc•7h ago•26 comments

Harvey Mudd Miniature Machine

https://www.cs.hmc.edu/~cs5grad/cs5/hmmm/documentation/documentation.html
36•nill0•2d ago•13 comments

"No Tax on Tips" Includes Digital Creators, Too

https://www.hollywoodreporter.com/business/business-news/no-tax-on-tips-guidance-creators-trump-t...
51•aspenmayer•5h ago•67 comments

Show HN: HumanAlarm – Real people knock on your door to wake you up

https://humanalarm.com
12•soelost•1h ago•13 comments

Show HN: TailGuard – Bridge your WireGuard router into Tailscale via a container

https://github.com/juhovh/tailguard
84•juhovh•18h ago•22 comments

UGMM-NN: Univariate Gaussian Mixture Model Neural Network

https://arxiv.org/abs/2509.07569
23•zakeria•3h ago•6 comments

Kerberoasting

https://blog.cryptographyengineering.com/2025/09/10/kerberoasting/
131•feross•10h ago•47 comments

Zoox robotaxi launches in Las Vegas

https://zoox.com/journal/las-vegas
151•krschultz•7h ago•196 comments

Charlie Kirk killed at event in Utah

https://www.nbcnews.com/news/us-news/live-blog/live-updates-shooting-charlie-kirk-event-utah-rcna...
416•david927•3h ago•827 comments

The origin story of merge queues

https://mergify.com/blog/the-origin-story-of-merge-queues
64•jd__•6h ago•19 comments

Tarsnap is cozy

https://til.andrew-quinn.me/posts/tarsnap-is-cozy/
86•hiAndrewQuinn•10h ago•57 comments

Things you can do with a debugger but not with print debugging

https://mahesh-hegde.github.io/posts/what_debugger_can/
184•never_inline•3d ago•180 comments

TikTok has turned culture into a feedback loop of impulse and machine learning

https://www.thenexus.media/tiktok-won-now-everything-is-60-seconds/
246•natalie3p•6h ago•182 comments

Semantic Line Breaks (2017)

https://sembr.org
71•Bogdanp•3d ago•48 comments

Distributing your own scripts via Homebrew

https://justin.searls.co/posts/how-to-distribute-your-own-scripts-via-homebrew/
59•ingve•2d ago•12 comments