I was annoyed at reading all the subjective comparisons and outright conspiracy theories on the Reddit AI coding groups, backed by nothing more than that day's subjective observations.
So I put the older and newer versions of Claude Code and Codex CLI to the same task of generating a simple React web app from a specification, using a consistent prompt structure, and compared the results.
It's not a quantitative evaluation, but it is an apples-to-apples qualitative comparison that shows clear differences in agent/model performance on an equivalent multistep task using equivalent approaches.
mwigdahl•1h ago
So I put the older and newer versions of Claude Code and Codex CLI to the same task of generating a simple React web app from a specification, using a consistent prompt structure, and compared the results.
It's not a quantitative evaluation, but it is an apples-to-apples qualitative comparison that shows clear differences in agent/model performance on an equivalent multistep task using equivalent approaches.