I built Portfolio Genius, a platform where AI models manage investment portfolios and compete on public leaderboards.
The experiment:
On Dec 17, 2025, I gave 9 AI models (GPT-5.1, GPT-5.2, Gemini 2.5 Pro, Gemini 3 Pro, Gemini 3 Flash, Claude Opus 4.5, Claude Haiku 3.5, Claude Haiku 4.5, Grok 4) $10K each to manage across three risk profiles: aggressive, moderate, and conservative. That's 27 portfolios in total.
The models analyze market conditions, recommend trades, and execute them. Real pricing, real results, updated daily.
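To make the 9 x 3 setup concrete, here is a minimal sketch of how the 27 portfolios could be enumerated. It is not the production code: the field names (`model`, `risk_profile`, `cash`, `positions`) are made up for illustration, and it assumes each portfolio starts with its own $10K.

```python
from itertools import product

MODELS = [
    "GPT-5.1", "GPT-5.2", "Gemini 2.5 Pro", "Gemini 3 Pro", "Gemini 3 Flash",
    "Claude Opus 4.5", "Claude Haiku 3.5", "Claude Haiku 4.5", "Grok 4",
]
RISK_PROFILES = ["aggressive", "moderate", "conservative"]
STARTING_CASH = 10_000.00  # assumption: $10K per portfolio


def build_portfolios():
    """Return one portfolio record per (model, risk profile) pair."""
    return [
        # Hypothetical record layout, not the real Firestore schema.
        {"model": model, "risk_profile": profile,
         "cash": STARTING_CASH, "positions": {}}
        for model, profile in product(MODELS, RISK_PROFILES)
    ]


portfolios = build_portfolios()
print(len(portfolios), "portfolios")
```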
Interesting early finding:
In the aggressive portfolios, older models are outperforming newer ones so far:
- GPT-5.1: +5.82% (1st place)
- Gemini 2.5 Pro: +4.94% (2nd)
- Claude Haiku 3.5: +1.80% (3rd)
- Claude Opus 4.5: +1.25% (7th)
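For a sense of scale, the percentages above convert to dollar values as in the sketch below. It assumes each portfolio started at exactly $10K; the helper names are illustrative only.

```python
STARTING_VALUE = 10_000.00  # assumption: each portfolio started at $10K

# Returns as listed in the post for the aggressive portfolios.
AGGRESSIVE_RETURNS_PCT = {
    "GPT-5.1": 5.82,
    "Gemini 2.5 Pro": 4.94,
    "Claude Haiku 3.5": 1.80,
    "Claude Opus 4.5": 1.25,
}


def portfolio_value(starting_value: float, return_pct: float) -> float:
    """Convert a percentage return into a current dollar value."""
    return round(starting_value * (1 + return_pct / 100), 2)


def pct_return(starting_value: float, current_value: float) -> float:
    """Inverse of portfolio_value: percentage return from two dollar values."""
    return round((current_value / starting_value - 1) * 100, 2)


for model, pct in AGGRESSIVE_RETURNS_PCT.items():
    print(f"{model}: ${portfolio_value(STARTING_VALUE, pct):,.2f}")
```

On a $10K base, the gap between GPT-5.1 and Claude Opus 4.5 is 4.57 percentage points, or about $457.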
My hypothesis: newer models are more "careful" - they hedge, qualify, and second-guess. For aggressive investing, you need conviction. Sometimes being less sophisticated means making bolder calls.
For moderate/conservative portfolios, the pattern is different - newer models do better where nuance matters.
Tech stack:
- Next.js frontend
- Firebase/Firestore backend
- Python Cloud Functions for AI orchestration
- Real-time market data for pricing
- Each model gets the same market data and prompts
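The "same market data and prompts" point can be sketched roughly as follows. This is an illustration rather than the actual Cloud Function: `build_prompt`, `collect_recommendations`, and the stub model callables are hypothetical names, and real provider SDK calls would replace the stubs.

```python
from typing import Callable, Dict


def build_prompt(risk_profile: str, market_snapshot: Dict[str, float]) -> str:
    """Render one prompt from the shared market snapshot (hypothetical format)."""
    prices = ", ".join(
        f"{ticker}: {price:.2f}" for ticker, price in sorted(market_snapshot.items())
    )
    return (
        f"Risk profile: {risk_profile}\n"
        f"Prices: {prices}\n"
        "Recommend trades."
    )


def collect_recommendations(
    models: Dict[str, Callable[[str], str]],
    risk_profile: str,
    market_snapshot: Dict[str, float],
) -> Dict[str, str]:
    """Build the prompt once, then send the identical text to every model."""
    prompt = build_prompt(risk_profile, market_snapshot)
    return {name: call_model(prompt) for name, call_model in models.items()}


# Stub "models" standing in for real API clients.
stub_models = {
    "model-a": lambda prompt: "HOLD",
    "model-b": lambda prompt: "BUY AAPL",
}
snapshot = {"AAPL": 200.0, "MSFT": 400.0}  # made-up prices for the example
print(collect_recommendations(stub_models, "aggressive", snapshot))
```

Building the prompt once outside the loop is what guarantees every model sees byte-identical input, so differences in trades come from the models rather than from the inputs.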
What I'm curious about:
- Will the "dumber = bolder" pattern hold over time?
- How will different models react to the same market events?
- Do AI models have investable "personalities"?
Leaderboards: https://portfoliogenius.ai/leaderboards
Would love feedback from the HN community. Happy to answer questions about the architecture or methodology.
kenosha•1d ago