The Extreme Inefficiency of RL for Frontier Models
https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning
2 • kiyanwang • 4mo ago