This is super cool! One thing I find counterintuitive is that GPT5 and o3 don't perform better. GPT5 gets about 800k on average per round, but I would have expected it to be nearly perfect, since these are not particularly hard questions, mostly trivia or simple lookup questions. There is little reasoning involved, so I expected the big models to do much better.
mynti•2h ago