How good are you at programming on a whiteboard? How good is anybody? With code execution tools withheld from me, I'll freely admit that I'm pretty shit at programming. Hell, I barely remember the syntax in some of the more esoteric, unpracticed places of my knowledge. Thus, it's hard not to see case studies like this as dunking on a blindfolded free throw shooter, and calling it analysis.
pretty good?
I could certainly do a square root
(given enough time, that one would take me a while)
The only fix is tight verification loops. You can't trust the generative step without a deterministic compilation/execution step immediately following it. The model needs to be punished/corrected by the environment, not just by the prompter.
The keyword is convince. So it just needs to convince people that’s it’s right.
It is optimizing for convincing people. Out of all answers that can convince people some can be actual correct answers, others can be wrong answers.
benreesman•1h ago
dehsge•28m ago