1. Coding tasks: they don't follow instructions at all. They consistently miss close to a quarter of the ask, which cascades into bloat over time.
2. Reviewing existing code: they literally fail to read the code or use the tools properly.
Both of these have, in my opinion, eroded the time-saving value of the tool. Hard opinion: real code review is harder than coding.
So I tried Codex. This is my first time, so I have only a little experience with it, but the distinction is clear. Codex is remarkably precise about the exact changes I need, reliable nearly 95% of the time. What it lacks is the flair, the bombastic ideas and presentation that Claude has. I use Claude to discuss ideas; it gives me great variety and draws ASCII block diagrams that Codex never could. But Codex is the most reliable coding agent.
Suggestion: Don't trust Claude when it says it finished something :) Always review it, at least post-4.6.
Is this just my experience, or do others feel the same?