*What's new in V2:*
1. *ML Impact Predictor* (XGBoost)
   - Trained on 448 historical catalyst patterns (2015-2025)
   - Direction accuracy: 76.67%
   - Magnitude MAE: 3.09%
   - Top features: catalyst_type (0.649), price_momentum_30d (0.149), volatility_30d (0.110)
2. *Likelihood of Approval (LOA) scores*
   - Random Forest model
   - Based on phase, indication, and company track record
3. *Optimal Entry Predictor*
   - Suggests when to enter before a catalyst
   - Analyzes pre-catalyst price patterns
4. *Smart Money Score* (-100 to +100)
   - Combines SEC Form 4 insider trades and 13F institutional holdings
   - Flags unusual activity near catalyst dates
5. *Auto-retraining*
   - Weekly cron job retrains the models
   - Drift detection triggers retraining if accuracy drops
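To make the drift-detection item more concrete, here is a minimal sketch of one way such a check could work: keep a rolling window of recent direction calls and flag retraining when rolling accuracy falls a set margin below the reported 76.67%. The class name `DriftMonitor`, the window size, and the `max_drop` margin are all assumptions for illustration, not the values or code the service runs.

```python
from collections import deque


class DriftMonitor:
    """Rolling-accuracy drift check (hypothetical; thresholds are illustrative)."""

    def __init__(self, baseline_accuracy, window=50, max_drop=0.05):
        self.baseline_accuracy = baseline_accuracy  # e.g. accuracy at training time
        self.max_drop = max_drop                    # tolerated drop before retraining
        self.outcomes = deque(maxlen=window)        # 1 = direction correct, 0 = wrong

    def record(self, predicted_up, actual_up):
        """Store whether the predicted direction matched the realized one."""
        self.outcomes.append(1 if predicted_up == actual_up else 0)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        """True once a full window sits more than max_drop below the baseline."""
        # Wait for a full window so a few early misses do not trigger retraining.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.max_drop


monitor = DriftMonitor(baseline_accuracy=0.7667, window=10, max_drop=0.05)
for i in range(10):
    monitor.record(predicted_up=True, actual_up=(i < 6))  # 6 of 10 correct
print(monitor.should_retrain())
```

With so few catalysts resolving per week, a small window will be noisy, so in practice the margin probably needs to be wider than the sampling error of the window itself.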
*Technical details:*
- ML Service: Python FastAPI + XGBoost/Random Forest
- Training data: 1,884 catalysts, 490K stock price rows, 727 tickers
- Deployed on a VPS with Docker Compose
- Models stored in a Docker volume, hot-reloaded
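For readers wondering what hot-reloading from a volume can look like, below is a rough sketch under several assumptions: the model is a pickled object at a single path, the retraining job replaces the file atomically, and the serving process re-reads it whenever the modification time changes. `HotReloadingModel` and `save_atomically` are hypothetical names, and pickle is a stand-in for whatever serialization the service uses.

```python
import os
import pickle
import threading


def save_atomically(model, path):
    """Write to a temp file, then swap it in so readers never see a partial file."""
    tmp = f"{path}.tmp"
    with open(tmp, "wb") as f:
        pickle.dump(model, f)
    os.replace(tmp, path)  # atomic when tmp and path are on the same filesystem


class HotReloadingModel:
    """Reloads the model from disk when the file's mtime changes (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()
        self._mtime = None
        self._model = None

    def get(self):
        """Return the current model, reloading it if the file was replaced."""
        mtime = os.stat(self.path).st_mtime_ns
        with self._lock:
            if mtime != self._mtime:
                with open(self.path, "rb") as f:
                    self._model = pickle.load(f)
                self._mtime = mtime
            return self._model
```

A request handler would call `get()` on each prediction; the `os.stat` call is cheap compared with inference, and the lock keeps concurrent requests from loading the same file twice.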
*What stayed free:*
- 992 companies, 2,192 drugs, full catalyst calendar
- SEC insider filings, EMA Europe coverage
- Twitter bot posts movers 3x daily (@catalystalert)
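The class distribution in the training data is skewed (73.7% positive outcomes), which inflates accuracy: a model that always predicts a positive outcome would already score 73.7%, so 76.67% is only about 3 points above that baseline. I'm working on balancing techniques.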
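The arithmetic behind that caveat can be sketched as follows. The helpers are hypothetical and use only the figures quoted above; the confusion-matrix counts in the usage lines are made up to show the always-positive case. `scale_pos_weight` is a real XGBoost parameter whose commonly suggested value is the negative-to-positive ratio, but whether it suits this model is an open question, and it is only one of several balancing options.

```python
def majority_baseline_accuracy(positive_rate):
    """Accuracy of a model that always predicts the more common class."""
    return max(positive_rate, 1.0 - positive_rate)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of recall on positives and recall on negatives."""
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return (true_positive_rate + true_negative_rate) / 2


def suggested_scale_pos_weight(positive_rate):
    """Negative-to-positive ratio, the usual starting point for scale_pos_weight."""
    return (1.0 - positive_rate) / positive_rate


positive_rate = 0.737      # from the post
reported_accuracy = 0.7667  # from the post

baseline = majority_baseline_accuracy(positive_rate)
print(f"baseline accuracy: {baseline:.3f}")
print(f"lift over baseline: {reported_accuracy - baseline:.4f}")
print(f"scale_pos_weight starting point: {suggested_scale_pos_weight(positive_rate):.3f}")

# Hypothetical counts for an always-positive model on 1,000 samples:
print(f"balanced accuracy: {balanced_accuracy(tp=737, fn=0, tn=0, fp=263):.2f}")
```

Reporting balanced accuracy (or per-class recall) alongside raw accuracy makes it harder for the skew to hide a model that rarely calls the negative direction correctly.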
Questions I'm wrestling with:
- Is 76% accuracy actually useful for trading decisions?
- Should I open-source the ML pipeline?
- What would make the predictions more trustworthy?
Original post: https://news.ycombinator.com/item?id=46189127
catalystalert.io