While it reads as an insightful satire of mass-training LLMs with (negative) reinforcement learning, it's actually from the 1993 episode "Last Exit to Springfield", thought by many (including me) to be the single greatest Simpsons episode of all time (https://www.reddit.com/r/Simpsons/comments/1f813ki/last_exit...).
Codex makes all kinds of terrible blunders that it presents as "correct". What's to stop it from just doing that in the loop? The LLM is still driving, same as when a human is in the loop.
jcz_nz•59m ago