*The pipeline:*
- Crawls articles from Substack
- Extracts high-conviction stock picks using Gemini's structured output: filters out casual ticker mentions and only counts calls where the author dedicates real analysis, specific data, or price targets
- Tracks returns at 1d, 7d, 15d, 30d, and 60d post-publication using yfinance
- Calculates alpha vs. sector-specific ETF benchmarks (SOXX for semis, IGV for SaaS, XLF for financials, EWJ for Japan, SPY as fallback)
- Deduplication: same author, same ticker within 14 days = one call. Cross-author calls are independent.
Total dataset: 3,519 high-conviction calls from 22 authors over 1 year.
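The return-tracking step boils down to looking up closes at fixed offsets from the publication date. Below is a minimal sketch, not the actual pipeline code. It assumes the horizons are calendar days, that entry is the first close on or after publication, and that each exit is the first close on or after publication + N days; the post doesn't say which convention it uses. The names `horizon_returns` and `close_on_or_after` are hypothetical. The price lists would come from yfinance (e.g. the `Close` column of `yf.Ticker(ticker).history(...)`), which is left out here so the sketch runs offline.

```python
import bisect
from datetime import date, timedelta
from typing import Dict, List, Optional

# Horizons from the post, interpreted as calendar days (an assumption).
HORIZONS_DAYS = (1, 7, 15, 30, 60)


def close_on_or_after(dates: List[date], closes: List[float], target: date) -> Optional[float]:
    """Return the first close on or after `target` (dates must be sorted), or None if data ends first."""
    i = bisect.bisect_left(dates, target)
    return closes[i] if i < len(dates) else None


def horizon_returns(dates: List[date], closes: List[float], published: date) -> Dict[int, Optional[float]]:
    """Simple return from the first close on/after publication to each horizon."""
    entry = close_on_or_after(dates, closes, published)
    returns: Dict[int, Optional[float]] = {}
    for h in HORIZONS_DAYS:
        exit_close = close_on_or_after(dates, closes, published + timedelta(days=h))
        if entry is None or exit_close is None:
            returns[h] = None  # horizon not reached yet, or no data at all
        else:
            returns[h] = exit_close / entry - 1.0
    return returns
```

Snapping to the next available close handles weekends and holidays; returning `None` keeps calls that are too recent for the 60d horizon out of the averages instead of scoring them on partial data.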
*Interesting technical challenges:*
1. AI extraction accuracy. Gemini is surprisingly good at identifying whether an author is making a real call vs. just mentioning a ticker in passing. We tag calls with conviction level (high/low) and direction (bullish/bearish). To validate this, we spot-checked against manual reads and cross-verified with alternative model outputs. Not perfect, but consistent enough to be useful.
2. Custom domain handling. Many Substack authors use custom domains (e.g., collyerbridge.com, lordfed.co.uk) which sometimes trigger Cloudflare challenges. We fall back to headless Playwright when the standard HTTP client gets blocked.
3. Benchmark selection. A naive "did the stock go up?" metric is meaningless in a bull market. We map each ticker to a sector ETF benchmark, so alpha = position return minus benchmark return over the same period. This separates genuine stock-picking skill from just being long in a rising market.
4. Deduplication logic. Authors often revisit the same thesis across multiple articles. Without dedup, a single stock mentioned in 5 articles would count as 5 independent "calls." We use a 14-day window per author per ticker — only the first mention counts.
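For item 1, the useful part of structured output is that the model's answer can be validated mechanically before it's counted. Here is a minimal sketch of that validation step, assuming a hypothetical schema with `ticker`, `direction`, `conviction`, and `evidence` fields; the post doesn't share its prompt or schema. `CALL_SCHEMA` follows the OpenAPI-style shape Gemini's response schema uses (check the current Gemini docs for exact field support). With the `google-genai` SDK it would go in the request config next to `response_mime_type="application/json"`. The API call itself is omitted so the sketch runs without credentials.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical response schema; field names are illustrative, not the author's.
CALL_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "ticker": {"type": "STRING"},
            "direction": {"type": "STRING", "enum": ["bullish", "bearish"]},
            "conviction": {"type": "STRING", "enum": ["high", "low"]},
            "evidence": {"type": "STRING"},
        },
        "required": ["ticker", "direction", "conviction"],
    },
}


@dataclass(frozen=True)
class Call:
    ticker: str
    direction: str   # "bullish" or "bearish"
    conviction: str  # "high" or "low"


def parse_high_conviction(raw_json: str) -> List[Call]:
    """Validate the model's JSON output and keep only well-formed high-conviction calls."""
    calls: List[Call] = []
    for item in json.loads(raw_json):
        if not isinstance(item, dict):
            continue
        ticker = item.get("ticker")
        direction = item.get("direction")
        conviction = item.get("conviction")
        if not isinstance(ticker, str) or not ticker.strip():
            continue  # no usable ticker
        if direction not in ("bullish", "bearish") or conviction not in ("high", "low"):
            continue  # drop rows outside the schema instead of guessing
        if conviction == "high":
            calls.append(Call(ticker.strip().upper(), direction, conviction))
    return calls
```

Even with a schema attached, re-checking the enum values client-side is cheap insurance against the occasional malformed row.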
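The fallback described in item 2 can be sketched as "try a plain HTTP fetch, and only launch a browser if the response looks like a challenge page". This is an illustration, not the author's code. The blocked-response heuristics (status codes and the markers in `CHALLENGE_MARKERS`) are assumptions, and headless Playwright is not guaranteed to pass every challenge either. Playwright is imported lazily so it stays optional, as in the stack list.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Heuristic markers of a Cloudflare interstitial page (an assumption, not exhaustive).
CHALLENGE_MARKERS = ("<title>just a moment...</title>", "/cdn-cgi/challenge-platform/", "cf_chl_opt")


def looks_blocked(status: int, body: str) -> bool:
    """Guess whether a response is a bot challenge rather than the article."""
    if status in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_plain(url: str, timeout: float = 20.0) -> Tuple[int, str]:
    """Standard-library HTTP GET that returns (status, body) even for HTTP errors."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def fetch_with_browser(url: str) -> str:
    """Render the page in headless Chromium; only called when the plain fetch is blocked."""
    from playwright.sync_api import sync_playwright  # optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def fetch_article(url: str) -> str:
    status, body = fetch_plain(url)
    if looks_blocked(status, body):
        return fetch_with_browser(url)
    return body
```

Keeping the cheap client as the default path means the browser only pays its startup cost on the custom domains that actually need it.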
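The benchmark logic in item 3 amounts to a sector lookup plus a subtraction. This is a minimal sketch: the ETF map is taken from the post, but `TICKER_SECTOR` and `benchmark_for` are hypothetical, and the post doesn't specify how bearish calls are scored. Here the excess return is simply flipped for shorts, which is an assumption.

```python
from typing import Dict, Optional

# Sector -> benchmark ETF, as listed in the post; SPY is the fallback.
SECTOR_ETF: Dict[str, str] = {
    "semis": "SOXX",
    "saas": "IGV",
    "financials": "XLF",
    "japan": "EWJ",
}
FALLBACK_ETF = "SPY"

# Hypothetical ticker -> sector lookup; a real pipeline needs a much fuller mapping.
TICKER_SECTOR: Dict[str, str] = {"NVDA": "semis", "CRM": "saas", "JPM": "financials"}


def benchmark_for(ticker: str) -> str:
    """Pick the sector ETF for a ticker, falling back to SPY when the sector is unknown."""
    sector: Optional[str] = TICKER_SECTOR.get(ticker.upper())
    return SECTOR_ETF.get(sector, FALLBACK_ETF) if sector else FALLBACK_ETF


def alpha(stock_entry: float, stock_exit: float,
          bench_entry: float, bench_exit: float,
          direction: str = "bullish") -> float:
    """Position return minus benchmark return over the same window."""
    stock_ret = stock_exit / stock_entry - 1.0
    bench_ret = bench_exit / bench_entry - 1.0
    excess = stock_ret - bench_ret
    # Assumption: a bearish call is scored as short the stock vs. the benchmark.
    return excess if direction == "bullish" else -excess
```

For example, a hypothetical long call that gains 10% while its benchmark gains 5% over the same window scores 5 percentage points of alpha, not 10.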
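The deduplication rule in item 4 can be sketched as a single pass over mentions sorted by date. This is an illustration under stated assumptions: the window is anchored at the last *counted* call (a mention 15+ days later starts a new call, even if the author kept writing about it in between), and "within 14 days" is read as inclusive. The post doesn't say whether its window is anchored or rolling. `Mention` and `dedupe` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, NamedTuple, Tuple

DEDUP_WINDOW = timedelta(days=14)


class Mention(NamedTuple):
    author: str
    ticker: str
    published: date


def dedupe(mentions: Iterable[Mention]) -> List[Mention]:
    """Keep the first mention per (author, ticker); repeats within the window are dropped."""
    last_counted: Dict[Tuple[str, str], date] = {}
    kept: List[Mention] = []
    for m in sorted(mentions, key=lambda m: m.published):
        key = (m.author, m.ticker)  # cross-author calls get separate keys
        prev = last_counted.get(key)
        if prev is not None and m.published - prev <= DEDUP_WINDOW:
            continue  # same author revisiting a thesis already counted
        last_counted[key] = m.published
        kept.append(m)
    return kept
```

Keying on `(author, ticker)` is what keeps cross-author calls independent: two authors pitching the same stock on the same day both count.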
*Some findings (for context, not the point of this post):*
- Top performer averaged +14.9% at 30d and +26.7% at 60d on long calls
- The most expensive newsletters ($1,000+/year) were not the best performers
- Authors with fewer, more targeted calls (15-80) tended to outperform those with 300+ calls
- 30d vs. 60d rankings shift significantly: deep value investors look much better at longer horizons
- Short calls were harder for almost everyone
*Stack:* Python, SQLite, Gemini API (structured output), yfinance, Playwright (optional)
I wrote a more detailed breakdown with charts as an X thread: https://x.com/pyhrroll/status/2027374283669066045?s=20
Happy to discuss the methodology, architecture, or share the extraction prompts. The pipeline is ~2,000 lines of Python if there's interest in seeing the code.
zahlman•54m ago
Suppose you had put that money in index funds instead?
lineudemonia•5m ago