Anti-money laundering (banks)

Early in my career, I was hired to improve an anti-money laundering system. The incumbent system was a set of 28 hard-coded rules: if enough thresholds fired (e.g., $3,000 in ATM withdrawals over 30 days), the account was flagged. No one knew where the thresholds came from. There was no modeling of the underlying behavior, just rule accumulation. I convinced the bank to provide the raw financial features behind those rule firings, and we trained an interpretable ML model directly on the underlying activity patterns. The result: ~200% more true positives (accounts actually involved in fraud or laundering). But what leadership cared about most wasn’t the metric. It was this: “Why is this account suspicious?” That theme repeated across industries.
Insurance claim adjudication

I later built a claim adjudication model for a major health insurer. The legacy system was massive, brittle, and effectively a black box. It would frequently deny claims incorrectly, and no one fully understood how it worked. We built a new ML system that brought claim-level adjudication accuracy to ~95%. Again, the metric wasn’t the headline internally. The headline was: “Why did this claim get denied?” In regulated environments, interpretability isn’t optional.
Stock forecasting and calibration

I also learned this lesson personally. I built stock-forecasting models that performed well in historical backtests. Some predictions showed an 80% probability of a price increase. Then the market regime shifted. The probabilities were overconfident, some trades went the opposite direction, and I lost money. Accuracy ≠ trustworthy probabilities. Calibration and drift awareness matter far more in deployment than most tutorials suggest. That experience fundamentally changed how I think about ML systems.
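A quick way to see that kind of overconfidence is a reliability (calibration) curve: bin predictions by predicted probability and compare against the observed frequency in each bin. A minimal sketch with scikit-learn on synthetic data (no relation to my actual trading models):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# frac_pos: observed frequency of the positive class per bin
# mean_pred: average predicted probability per bin
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)

# Per-bin gap between what the model claims and what actually happens.
# A model that says "80%" but is right only 55% of the time shows up here.
gap = np.abs(frac_pos - mean_pred)
```

If `gap` is large in the high-probability bins, the model is overconfident exactly where overconfidence is most expensive.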
The core idea

Endgame is my attempt to encode those lessons into a framework. It’s not trying to replace scikit-learn; every estimator implements fit / predict / transform. But it extends the ecosystem with:

- Glass-box models (EBM, GAM, CORELS, SLIM, GOSDT, etc.)
- SOTA deep tabular models (FT-Transformer, TabPFN, SAINT, etc.)
- Conformal prediction and Venn-ABERS calibration
- Deployment guardrails (leakage detection, latency constraints, drift checks)
- 42 self-contained HTML visualizations
- Super Learner, BMA, and cascade ensembles
- A full AutoML pipeline that respects deployment constraints

All under a unified sklearn-compatible API.
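To make the conformal-prediction item concrete: the core of split conformal classification fits in a few lines. This is a generic sketch with scikit-learn and NumPy, not Endgame’s implementation; all names here are my own:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

alpha = 0.1  # target miscoverage: aim for >= 90% coverage

X, y = make_classification(n_samples=3000, n_classes=3,
                           n_informative=6, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5,
                                                random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest,
                                                test_size=0.5,
                                                random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Nonconformity score on the held-out calibration set:
# 1 - predicted probability of the true class.
cal_probs = clf.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

# Conformal quantile with the finite-sample (n + 1) correction.
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                method="higher")

# Prediction set: every class whose score falls within the quantile.
test_probs = clf.predict_proba(X_test)
pred_sets = (1.0 - test_probs) <= q  # boolean, shape (n_test, n_classes)

coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
```

Instead of a single point prediction, each row gets a set of plausible classes with a distribution-free coverage guarantee: ambiguous inputs get larger sets rather than confidently wrong labels.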
Agent-native ML (MCP)

We’re in the agentic AI era. You can ask an LLM to build a pipeline for you, but it often takes multiple prompts and manual corrections. Endgame ships with a native MCP server, which lets agents:

- load data
- train models
- compare results
- generate reports
- export reproducible scripts

All through structured tool calls, not fragile prompt chains. My belief is that ML pipelines will increasingly become conversational infrastructure.
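Under MCP, each of those operations is a plain JSON-RPC 2.0 `tools/call` request rather than free-form prompting. A sketch of what such a call might look like on the wire; the tool name and arguments below are hypothetical, not Endgame’s actual schema:

```python
import json

# Hypothetical tool name and arguments -- the real schema is defined
# by the server's advertised tool list, not shown here.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "train_model",        # hypothetical tool name
        "arguments": {
            "dataset": "claims.csv",  # hypothetical arguments
            "target": "denied",
            "model_family": "glass_box",
        },
    },
}
payload = json.dumps(request)
```

Because the arguments are typed and validated against a declared schema, the agent gets a structured result (or a structured error) back, which is what makes the loop reproducible in a way prompt chains are not.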
A small contrarian view

The ML community is underestimating the problems left to solve in tabular data and overestimating the demand for accuracy-optimized models. Most real-world data in business, healthcare, and finance is tabular (often multimodal). And most real-world systems need to be interpretable, calibrated, and deployable, not just accurate.
Endgame v1.0.0 is open source (Apache 2.0), Python 3.10+. If you work on production ML systems, especially in regulated domains, I’d genuinely value feedback.

GitHub: https://github.com/allianceai/endgame
Install: pip install endgame-ml

Happy to answer technical questions.