Very cool! GPT-5's massive outperformance suggests there really is something different in their training data. Considering their previous work on games, it wouldn't be surprising if they generated some synthetic game data.
nsypteras•50m ago
Ya, interesting thought - would be fascinating if generating games w/ solutions is part of the training data pipeline. There's been previous work on testing LLMs on logic puzzles[1][2][3], so they could be building on those ideas to improve performance.
pyankoff•1h ago
nsypteras•50m ago
[1] https://huggingface.co/papers/2504.00043 [2] https://huggingface.co/blog/yuchenlin/zebra-logic [3] https://arxiv.org/pdf/2403.12094