The Extreme Inefficiency of RL for Frontier Models

https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning
2 • kiyanwang • 4mo ago