Retr0id•4m ago
This feels like an easy enough hypothesis to verify, for anyone in the business of training LLMs - does the not-X-but-Y rate increase after RLVR?