Very insightful post. This may work in the IMO setting because mathematical problems are inherently binary, if we ignore things like the incompleteness theorem. In contrast, subjective tasks, such as evaluating a painting or rating a poem, lack absolute truth. How would such reasoners estimate confidence in these cases, and to what extent could the RL techniques that are effective on the IMO transfer to real-world problems?
aylinakkus•5mo ago
Yes, that's indeed a remaining question: how it would transfer to tasks like creative writing etc. :-) But my guess is that the claim that they have "solved" hard-to-verify tasks is a bit of an overstatement on OpenAI's part.
martianlantern•6mo ago
aylinakkus•5mo ago