How Confident Are You, ChatGPT?

https://aylinakkus.github.io/how-confident-are-you-chatgpt/

2•aylinakkus•6mo ago

Comments

martianlantern•6mo ago

Very insightful post. This may work in the IMO setting because mathematical problems are inherently binary, if we ignore some things like the incompleteness theorem. In contrast, subjective tasks, such as evaluating a painting or rating a poem, lack absolute truth. How would such reasoners estimate confidence in these cases, and to what extent could RL techniques that are effective in the IMO setting transfer to real-world problems?

aylinakkus•6mo ago

Yes, that's indeed a remaining question: how it would transfer to tasks like creative writing etc. :-) But my guess is that the statement that they have "solved" hard-to-verify tasks is a bit of an overstatement on OpenAI's part.