Based on a month of GPT-5 usage, this model primarily feels like a regression:
1. It's slow: thinking mode can take ages, and sometimes gets downright stuck. Its auto-assessment of whether or not it needs to think feels poorly tuned for most tasks and defaults too readily to deep reasoning mode.
2. Hallucinations are in overdrive: I would estimate that in 7/10 tasks, hallucinations clutter the responses enough to warrant corrections, careful monitoring, and steering back. It hallucinates list items that weren't in your prompt, software package functionalities/capabilities, CLI parameters, etc. Even thorough prompting with explicit linking to sources, e.g. within deep research, frequently goes off the rails.
3. Not self-critical: even in thinking mode, it frequently spews out incorrect stuff that a blunt "this is not correct, check your answer" can directly fix.
Note: I am not a super advanced prompt engineer, and the above assessment is mainly relative to the previous generation of models. I would expect that as model capabilities progress, the need for users to apply careful prompt engineering goes down, not up.
I am very curious to hear your experiences.
patrakov•1h ago
Another pet peeve: when asked to provide several possible solutions, it sometimes generates two that are identical but with different explanations.
technocratius•1h ago