It integrates with scikit-learn, comes with documentation and examples, and is available on PyPI.
Key features:
* model non-Gaussian conditional distributions
* capture non-linear dependencies
* handle heteroscedastic noise (variance that changes with inputs)
* provide full predictive distributions, not just point estimates (see the usage sketch just after this list)
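As a rough illustration of the intended scikit-learn-style workflow, here is a minimal usage sketch. The estimator name, constructor arguments, and methods below are illustrative placeholders rather than a literal copy of the API, so please check the docs linked at the end for the exact interface.

    import numpy as np
    # Illustrative placeholder import; the actual class/module names in cgmm
    # may differ (see the project documentation for the real interface).
    from cgmm import ConditionalGMM

    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(2000, 1))
    # Non-linear mean with heteroscedastic, heavy-tailed noise.
    y = np.sin(X[:, 0]) + 0.2 * np.abs(X[:, 0]) * rng.standard_t(df=3, size=2000)

    model = ConditionalGMM(n_components=5)   # hypothetical signature
    model.fit(X, y)                          # scikit-learn-style fit(X, y)

    X_new = np.array([[0.0], [2.5]])
    y_point = model.predict(X_new)           # point prediction (conditional mean)
    # Beyond the point estimate, a conditional mixture exposes the full p(y | x):
    # per-input component weights, means and covariances, or samples from it.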
The latest release adds:
* Mixture of Experts (MoE): softmax-gated experts with linear mean functions (Jordan & Jacobs, “Hierarchical Mixtures of Experts and the EM Algorithm”, Neural Computation, 1994); a small sketch of the gated prediction follows this list
* Direct conditional likelihood optimization: implementing EM from Jaakkola & Haussler, “Expectation-Maximization Algorithms for Conditional Likelihoods”, ICML 2000
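To make the MoE addition concrete, here is a minimal NumPy sketch of the softmax-gated prediction in the Jordan & Jacobs formulation (gate probabilities and expert means both linear in the input); the parameter names are my own and this is not cgmm's internal code.

    import numpy as np

    def moe_predict_mean(x, W_gate, b_gate, A_experts, b_experts):
        """Predictive mean of a softmax-gated mixture of linear experts.

        x:          (d,) input vector
        W_gate:     (K, d) gating weights,   b_gate:    (K,) gating biases
        A_experts:  (K, p, d) expert slopes, b_experts: (K, p) expert intercepts
        """
        # Gate: softmax over the K experts, so the mixing weights depend on x.
        logits = W_gate @ x + b_gate
        logits -= logits.max()                         # numerical stability
        gate = np.exp(logits) / np.exp(logits).sum()   # shape (K,)

        # Experts: each contributes a linear mean A_k x + b_k.
        means = A_experts @ x + b_experts              # shape (K, p)

        # Mixture predictive mean: gate-weighted combination of expert means.
        return gate @ means                            # shape (p,)

    # Tiny usage example with random parameters (K=3 experts, d=2 inputs, p=1 output).
    rng = np.random.default_rng(0)
    print(moe_predict_mean(rng.standard_normal(2),
                           rng.standard_normal((3, 2)), rng.standard_normal(3),
                           rng.standard_normal((3, 1, 2)), rng.standard_normal((3, 1))))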
Examples now cover a range of applications:
* VIX volatility Monte Carlo simulation (non-linear, non-Gaussian SDEs)
* Multivariate seasonal forecasts (temperature, wind speed, light intensity)
* Iris dataset + scikit-learn benchmarks
* Generative modelling of handwritten digits
Links:
Docs: https://cgmm.readthedocs.io/en/latest/
GitHub: https://github.com/sitmo/cgmm
PyPI: https://pypi.org/project/cgmm/
I'd love to get feedback from the community, especially on use cases where people model non-Gaussian, non-linear data.
sitmo•1h ago
* scikit-learn's GaussianMixture models the unconditional (joint) distribution of the data. cgmm, on the other hand, models conditional distributions p(y|x), which makes it more suitable for regression and forecasting tasks (see the conditioning sketch at the end of this comment).
* Compared to linear or generalized linear models, cgmm can capture multi-modal outputs, non-Gaussian behavior, and input-dependent variance.
* Compared to Bayesian frameworks (like PyMC or Stan), cgmm is more focused and lightweight: it provides efficient EM-based algorithms and scikit-learn–style APIs rather than full Bayesian inference.
So I see cgmm as complementary, a middle ground between simple regression models and full probabilistic programming frameworks, with a focus on conditional mixture models that are easy to drop into existing Python/ML pipelines.
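To make the first point concrete: you can already recover p(y|x) from a plain GaussianMixture by fitting the joint density of (x, y) and applying the standard Gaussian conditioning formulas component by component; a conditional-mixture library wraps this kind of machinery (plus conditional-likelihood training) behind an estimator interface. A rough sketch for 1-D x and y, illustrative rather than cgmm's actual code:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit a joint GMM on stacked (x, y) samples, then condition on x.
    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=2000)
    y = np.sin(x) + 0.3 * np.abs(x) * rng.standard_normal(2000)   # heteroscedastic noise
    gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
    gmm.fit(np.column_stack([x, y]))

    def conditional_params(gmm, x0):
        """Component means, variances and weights of p(y | x = x0)."""
        mu, cov, w = gmm.means_, gmm.covariances_, gmm.weights_
        mu_x, mu_y = mu[:, 0], mu[:, 1]
        sxx, sxy, syy = cov[:, 0, 0], cov[:, 0, 1], cov[:, 1, 1]
        # Gaussian conditioning inside each component.
        cond_mean = mu_y + sxy / sxx * (x0 - mu_x)
        cond_var = syy - sxy ** 2 / sxx
        # Re-weight components by how likely each is to have generated x0.
        resp = w * np.exp(-0.5 * (x0 - mu_x) ** 2 / sxx) / np.sqrt(2 * np.pi * sxx)
        return cond_mean, cond_var, resp / resp.sum()

    means, variances, weights = conditional_params(gmm, x0=2.0)
    print("E[y | x=2] ~", weights @ means)   # mixture predictive mean at x = 2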