It integrates with scikit-learn, comes with documentation and examples, and is available on PyPI.
Key features:
* model non-Gaussian conditional distributions
* capture non-linear dependencies
* handle heteroscedastic noise (variance that changes with inputs)
* provide full predictive distributions, not just point estimates (see the usage sketch just after this list)
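As a rough illustration of the intended scikit-learn-style workflow, here is a minimal usage sketch. The estimator name, constructor arguments, and methods below are illustrative placeholders rather than a literal copy of the API, so please check the docs linked at the end for the exact interface.

    import numpy as np
    # Illustrative placeholder import; the actual class/module names in cgmm
    # may differ (see the project documentation for the real interface).
    from cgmm import ConditionalGMM

    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(2000, 1))
    # Non-linear mean with heteroscedastic, heavy-tailed noise.
    y = np.sin(X[:, 0]) + 0.2 * np.abs(X[:, 0]) * rng.standard_t(df=3, size=2000)

    model = ConditionalGMM(n_components=5)   # hypothetical signature
    model.fit(X, y)                          # scikit-learn-style fit(X, y)

    X_new = np.array([[0.0], [2.5]])
    y_point = model.predict(X_new)           # point prediction (conditional mean)
    # Beyond the point estimate, a conditional mixture exposes the full p(y | x):
    # per-input component weights, means and covariances, or samples from it.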
The latest release adds:
* Mixture of Experts (MoE): softmax-gated experts with linear mean functions (Jordan & Jacobs, “Hierarchical Mixtures of Experts and the EM Algorithm”, Neural Computation, 1994); a small sketch of the gated prediction follows this list
* Direct conditional likelihood optimization: implementing EM from Jaakkola & Haussler, “Expectation-Maximization Algorithms for Conditional Likelihoods”, ICML 2000
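To make the MoE addition concrete, here is a minimal NumPy sketch of the softmax-gated prediction in the Jordan & Jacobs formulation (gate probabilities and expert means both linear in the input); the parameter names are my own and this is not cgmm's internal code.

    import numpy as np

    def moe_predict_mean(x, W_gate, b_gate, A_experts, b_experts):
        """Predictive mean of a softmax-gated mixture of linear experts.

        x:          (d,) input vector
        W_gate:     (K, d) gating weights,   b_gate:    (K,) gating biases
        A_experts:  (K, p, d) expert slopes, b_experts: (K, p) expert intercepts
        """
        # Gate: softmax over the K experts, so the mixing weights depend on x.
        logits = W_gate @ x + b_gate
        logits -= logits.max()                         # numerical stability
        gate = np.exp(logits) / np.exp(logits).sum()   # shape (K,)

        # Experts: each contributes a linear mean A_k x + b_k.
        means = A_experts @ x + b_experts              # shape (K, p)

        # Mixture predictive mean: gate-weighted combination of expert means.
        return gate @ means                            # shape (p,)

    # Tiny usage example with random parameters (K=3 experts, d=2 inputs, p=1 output).
    rng = np.random.default_rng(0)
    print(moe_predict_mean(rng.standard_normal(2),
                           rng.standard_normal((3, 2)), rng.standard_normal(3),
                           rng.standard_normal((3, 1, 2)), rng.standard_normal((3, 1))))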
Examples now cover a range of applications:
* VIX volatility Monte Carlo simulation (non-linear, non-Gaussian SDEs)
* Multivariate seasonal forecasts (temperature, wind speed, light intensity)
* Iris dataset + scikit-learn benchmarks
* Generative modelling of handwritten digits
Links:
Docs: https://cgmm.readthedocs.io/en/latest/
GitHub: https://github.com/sitmo/cgmm
PyPI: https://pypi.org/project/cgmm/
I'd love to get feedback from the community, especially on use cases where people model non-Gaussian, non-linear data.
sitmo•1h ago
* scikit-learn's GaussianMixture models the unconditional (joint) distribution of the data. cgmm, on the other hand, models conditional distributions p(y|x), which makes it more suitable for regression and forecasting tasks (see the conditioning sketch at the end of this comment).
* Compared to linear or generalized linear models, cgmm can capture multi-modal outputs, non-Gaussian behavior, and input-dependent variance.
* Compared to Bayesian frameworks (like PyMC or Stan), cgmm is more focused and lightweight: it provides efficient EM-based algorithms and scikit-learn–style APIs rather than full Bayesian inference.
So I see cgmm as complementary, a middle ground between simple regression models and full probabilistic programming frameworks, with a focus on conditional mixture models that are easy to drop into existing Python/ML pipelines.
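To make the first point concrete: you can already recover p(y|x) from a plain GaussianMixture by fitting the joint density of (x, y) and applying the standard Gaussian conditioning formulas component by component; a conditional-mixture library wraps this kind of machinery (plus conditional-likelihood training) behind an estimator interface. A rough sketch for 1-D x and y, illustrative rather than cgmm's actual code:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit a joint GMM on stacked (x, y) samples, then condition on x.
    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=2000)
    y = np.sin(x) + 0.3 * np.abs(x) * rng.standard_normal(2000)   # heteroscedastic noise
    gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
    gmm.fit(np.column_stack([x, y]))

    def conditional_params(gmm, x0):
        """Component means, variances and weights of p(y | x = x0)."""
        mu, cov, w = gmm.means_, gmm.covariances_, gmm.weights_
        mu_x, mu_y = mu[:, 0], mu[:, 1]
        sxx, sxy, syy = cov[:, 0, 0], cov[:, 0, 1], cov[:, 1, 1]
        # Gaussian conditioning inside each component.
        cond_mean = mu_y + sxy / sxx * (x0 - mu_x)
        cond_var = syy - sxy ** 2 / sxx
        # Re-weight components by how likely each is to have generated x0.
        resp = w * np.exp(-0.5 * (x0 - mu_x) ** 2 / sxx) / np.sqrt(2 * np.pi * sxx)
        return cond_mean, cond_var, resp / resp.sum()

    means, variances, weights = conditional_params(gmm, x0=2.0)
    print("E[y | x=2] ~", weights @ means)   # mixture predictive mean at x = 2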