Show HN: Trained an LLM to predict "What will Trump do?"

https://huggingface.co/LightningRodLabs/Trump-Forecaster

9•bturtel•1h ago

Hey HN! I RL-tuned an open-source LLM (gpt-oss-120b — 120B MoE, but only 5.1B active params) to predict "What will Trump do?" in any situation, trained on nothing but public news collected automatically from search queries. The trained model beats GPT-5, and both dataset and trained model are open sourced.

Data generation: Generated 2,108 binary forecasting questions from just a search query and a date range using the Lightning Rod SDK (https://github.com/lightning-rod-labs/lightningrod-python-sd...). Questions are generated from historic news articles — like "Will Trump impose 25% tariffs on Mexico by March 1?" — and resolved by checking what actually happened after the deadline. No human annotation — the whole pipeline is automated.

Training: GRPO with Brier score as the reward signal. LoRA rank 32, 50 training steps.
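
To make the reward signal concrete, here is a minimal sketch of a Brier-based reward for a single binary question. The post does not say how the reward is scaled or shaped, so the negation (higher is better) and the clamping are assumptions, and `brier_reward` is a hypothetical name, not something from the Lightning Rod SDK.

```python
def brier_reward(prob: float, outcome: int) -> float:
    """Return the negative squared error between a forecast and the outcome.

    prob    -- model's predicted probability that the event happens
    outcome -- 1 if the event happened, 0 otherwise
    The result lies in [-1, 0]; 0 is a perfect forecast.
    Assumption: the reward is the negated Brier score of one question.
    """
    # Clamp so a malformed model output cannot produce a reward below -1.
    p = min(max(prob, 0.0), 1.0)
    return -((p - outcome) ** 2)


if __name__ == "__main__":
    # A confident correct forecast earns more than a hedged one.
    print(brier_reward(0.9, 1), brier_reward(0.5, 1))
```

Because the squared error is a proper scoring rule, the expected reward is maximized by reporting the true probability, which is why it suits training for calibration rather than only for picking the likelier side.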

Results: Slight edge over GPT-5 on Brier score (0.194 vs 0.200), but big gains in calibration: the RL-tuned model produces much better probabilities (expected calibration error, ECE, 0.079 vs 0.091). Lower is better for both metrics.
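
For reference, a rough sketch of how the two metrics are commonly computed for binary forecasts. The post does not state the binning used for ECE, so the 10 equal-width bins here are an assumption, and the function names are illustrative only.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean forecast and observed frequency.

    Forecasts are grouped into n_bins equal-width probability bins
    (an assumption; the post does not specify the binning scheme).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - freq)
    return ece


if __name__ == "__main__":
    probs = [0.9, 0.9, 0.1, 0.1]
    outcomes = [1, 1, 0, 0]
    print(brier_score(probs, outcomes))
    print(expected_calibration_error(probs, outcomes))
```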

Dataset: https://huggingface.co/datasets/LightningRodLabs/WWTD-2025

This is a fully automated way to spin up domain expert LLMs from public web data with just a few search queries, no labeling/annotation required.

I’d love any feedback, or suggestions for what domain expert to train next!

Comments

sleno•1h ago

interesting...what were some examples of things trump did that your model got right and gpt-5 got wrong?

bturtel•57m ago

Great question! It's probabilistic, so it's less about "right vs. wrong" on any single question and more about who better estimated the likelihood. One big difference shows up when there's no useful context: we ran the same eval WITHOUT including any up-to-date context with the questions. In that case, GPT-5 stays overconfident and its Brier Skill Score (BSS) drops to -11.3% (vs. -4.3% for ours), worse than just guessing the base rate. So one advantage of the RL training is just learning to know what you don't know, and to identify when there's real signal.
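
A small sketch of how a Brier Skill Score against a base-rate reference is typically computed, to show why overconfidence can push it below zero. It assumes the base rate is taken from the evaluated questions themselves; the thread does not say which reference was used, and `brier_skill_score` is an illustrative name.

```python
def brier_skill_score(probs, outcomes):
    """BSS = 1 - BS_model / BS_reference.

    The reference forecast predicts the base rate for every question
    (assumption: base rate is computed from the same outcomes).
    0 means no better than the base rate; negative means worse.
    """
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    bs_model = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    bs_ref = sum((base_rate - y) ** 2 for y in outcomes) / n
    if bs_ref == 0:
        raise ValueError("all outcomes identical; reference score is zero")
    return 1 - bs_model / bs_ref


if __name__ == "__main__":
    outcomes = [1, 0, 1, 0]
    # Always answering the base rate scores exactly zero skill.
    print(brier_skill_score([0.5, 0.5, 0.5, 0.5], outcomes))
    # Confident forecasts that are wrong half the time score below zero.
    print(brier_skill_score([0.0, 1.0, 1.0, 0.0], outcomes))
```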
