It has already happened with the music gen models. It's only a matter of time before the open-weight models overtake Anthropic.
Expect them to dial up the scaremongering until they IPO. The Claude family of models is the only AI product keeping them alive.
- a reasonable improvement over Sonnet 4.5, esp. with agentic tool use
- generally worse than Opus 4.6
Probably not worth it for coding, but a win for anybody building agentic AI assistants of any sort with Sonnet.
As a reminder, Opus 4.5 was SOTA 2-3 weeks ago.
SWE-bench, for example, creates a predictions file and evaluates the results in its harness. Without Codex 5.3 being available in the API, that can't be done.
People who do not know how reproducible research works.
Any benchmark presented by an AI lab must be reliably reproduced by someone independent of the lab presenting the results.
Otherwise, not only is it biased, but the numbers could simply be made up for marketing purposes.
a_void_sky•1h ago