Reinforcement learning dominated the recent NeurIPS papers. But here's one that stood out to me about how exactly pre-training can affect post-training.
This means that if the core data (e.g. additions, subtractions, etc.) were not present in the pre-training stage, RL on complex math problems would not lead the model to develop improvements in those core areas.
binsquare•1h ago