Data generation: Generated 2,108 binary forecasting questions from just a search query and a date range using the Lightning Rod SDK (https://github.com/lightning-rod-labs/lightningrod-python-sd...). Questions are generated from historic news articles — like "Will Trump impose 25% tariffs on Mexico by March 1?" — and resolved by checking what actually happened after the deadline. No human annotation — the whole pipeline is automated.
Training: GRPO with Brier score as the reward signal. LoRA rank 32, 50 training steps.
Results: Slight accuracy edge over GPT-5 (Brier 0.194 vs 0.200), but big gains in calibration — the RL-tuned model produces much better probabilities (ECE 0.079 vs 0.091).
Dataset: https://huggingface.co/datasets/LightningRodLabs/WWTD-2025
This is a fully automated way to spin up domain expert LLMs from public web data with just a few search queries, no labeling/annotation required.
I’d love any feedback, or suggestions for what domain expert to train next!
sleno•1h ago
bturtel•57m ago