Anti-money laundering (banks)

Early in my career, I was hired to improve an anti-money laundering system. The incumbent system was a set of 28 hard-coded rules: if enough thresholds fired (e.g., $3,000 in ATM withdrawals over 30 days), the account was flagged. No one knew where the thresholds came from. There was no modeling of the underlying behavior, just rule accumulation. I convinced the bank to provide the raw financial features behind those rule firings, and we trained an interpretable ML model directly on the underlying activity patterns. The result: ~200% more true positives (accounts actually involved in fraud or laundering). But what leadership cared about most wasn’t the metric. It was this: “Why is this account suspicious?” That theme repeated across industries.
Insurance claim adjudication

I later built a claim adjudication model for a major health insurer. The legacy system was massive, brittle, and effectively a black box. It would frequently deny claims incorrectly, and no one fully understood how it worked. We built a new ML system that brought claim-level adjudication accuracy to ~95%. Again, the metric wasn’t the headline internally. The headline was: “Why did this claim get denied?” In regulated environments, interpretability isn’t optional.
Stock forecasting and calibration

I also learned this lesson personally. I built stock-forecasting models that performed well in historical backtests. Some predictions showed an 80% probability of a price increase. Then the market regime shifted. The probabilities were overconfident, some trades went the opposite direction, and I lost money. Accuracy ≠ trustworthy probabilities. Calibration and drift awareness matter far more in deployment than most tutorials suggest. That experience fundamentally changed how I think about ML systems.
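A quick way to see that kind of overconfidence is a reliability (calibration) curve: bin predictions by predicted probability and compare against the observed frequency in each bin. A minimal sketch with scikit-learn on synthetic data (no relation to my actual trading models):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# frac_pos: observed frequency of the positive class per bin
# mean_pred: average predicted probability per bin
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)

# Per-bin gap between what the model claims and what actually happens.
# A model that says "80%" but is right only 55% of the time shows up here.
gap = np.abs(frac_pos - mean_pred)
```

If `gap` is large in the high-probability bins, the model is overconfident exactly where overconfidence is most expensive.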
The core idea

Endgame is my attempt to encode those lessons into a framework. It’s not trying to replace scikit-learn; every estimator implements fit / predict / transform. But it extends the ecosystem with:

- Glass-box models (EBM, GAM, CORELS, SLIM, GOSDT, etc.)
- SOTA deep tabular models (FT-Transformer, TabPFN, SAINT, etc.)
- Conformal prediction and Venn-ABERS calibration
- Deployment guardrails (leakage detection, latency constraints, drift checks)
- 42 self-contained HTML visualizations
- Super Learner, BMA, and cascade ensembles
- A full AutoML pipeline that respects deployment constraints

All under a unified sklearn-compatible API.
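To make the conformal-prediction item concrete: the core of split conformal classification fits in a few lines. This is a generic sketch with scikit-learn and NumPy, not Endgame’s implementation; all names here are my own:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

alpha = 0.1  # target miscoverage: aim for >= 90% coverage

X, y = make_classification(n_samples=3000, n_classes=3,
                           n_informative=6, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5,
                                                random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest,
                                                test_size=0.5,
                                                random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Nonconformity score on the held-out calibration set:
# 1 - predicted probability of the true class.
cal_probs = clf.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

# Conformal quantile with the finite-sample (n + 1) correction.
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                method="higher")

# Prediction set: every class whose score falls within the quantile.
test_probs = clf.predict_proba(X_test)
pred_sets = (1.0 - test_probs) <= q  # boolean, shape (n_test, n_classes)

coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
```

Instead of a single point prediction, each row gets a set of plausible classes with a distribution-free coverage guarantee: ambiguous inputs get larger sets rather than confidently wrong labels.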
Agent-native ML (MCP)

We’re in the agentic AI era. You can ask an LLM to build a pipeline for you, but it often takes multiple prompts and manual corrections. Endgame ships with a native MCP server, which lets agents:

- load data
- train models
- compare results
- generate reports
- export reproducible scripts

All through structured tool calls, not fragile prompt chains. My belief is that ML pipelines will increasingly become conversational infrastructure.
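Under MCP, each of those operations is a plain JSON-RPC 2.0 `tools/call` request rather than free-form prompting. A sketch of what such a call might look like on the wire; the tool name and arguments below are hypothetical, not Endgame’s actual schema:

```python
import json

# Hypothetical tool name and arguments -- the real schema is defined
# by the server's advertised tool list, not shown here.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "train_model",        # hypothetical tool name
        "arguments": {
            "dataset": "claims.csv",  # hypothetical arguments
            "target": "denied",
            "model_family": "glass_box",
        },
    },
}
payload = json.dumps(request)
```

Because the arguments are typed and validated against a declared schema, the agent gets a structured result (or a structured error) back, which is what makes the loop reproducible in a way prompt chains are not.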
A small contrarian view

The ML community is underestimating the problems left to solve in tabular data and overestimating the demand for accuracy-optimized models. Most real-world data in business, healthcare, and finance is tabular (often multimodal). And most real-world systems need to be interpretable, calibrated, and deployable, not just accurate.
Endgame v1.0.0 is open source (Apache 2.0), Python 3.10+. If you work on production ML systems, especially in regulated domains, I’d genuinely value feedback.

GitHub: https://github.com/allianceai/endgame
Install: pip install endgame-ml

Happy to answer technical questions.