Spurious Rewards: Rethinking Training Signals in RLVR
https://rethink-rlvr.notion.site/Spurious-Rewards-Rethinking-Training-Signals-in-RLVR-1f4df34dac1880948858f95aeb88872f
1 • andy12_ • 8mo ago