Not because I love Anthropic (I do like them) but because it's staving off me having to change my Coding Agent.
This world is changing fast, and both keeping up with State of the Art and/or the feeling of FOMO is exhausting.
Ive been holding onto Claude Code for the last little while since Ive built up a robust set of habits, slash commands, and sub agents that help me squeeze as much out of the platform as possible.
But with the last few releases of Gemini and Codex I've been getting closer and closer to throwing it all out to start fresh in a new ecosystem.
Thankfully Anthropic has come out swinging today and my own SOP's can remain in tact a little while longer.
On-topic, I love the fact that Opus is now three times cheaper. I hope it's available in Claude Code with the Pro subscription.
EDIT: Apparently it's not available in Claude Code with the Pro subscription, but you can add funds to your Claude wallet and use Opus with pay-as-you-go. This is going to be really nice to use Opus for planning and Sonnet for implementation with the Pro subscription.
However, I noticed that the previously-there option of "use Opus for planning and Sonnet for implementation" isn't there in Claude Code with this setup any more. Hopefully they'll implement it soon, as that would be the best of both worlds.
I think Anthropic is making the right decisions with their models. Given that software engineering is probably one of the very few domains of AI usage that is driving real, serious revenue: I have far better feelings about Anthropic going into 2026 than any other foundation model. Excited to put Opus 4.5 through its paces.
What do you mean?
On the other hand, it’s a truly multi modal model whereas Claude remains to be specifically targeted at coding tasks, and therefore is only a text model.
>> I'll execute.
>> I'll execute.
>> Wait, what if...?
>> I'll execute.
Suffice it to say I've switched back to Sonnet as my daily driver. Excited to give Opus a try.
It gave me the Youtube-URL to Rick Astley.
Same with asking a person to solve something in their head vs. giving them an editor and a random python interpreter, or whatever it is normal people use to solve problems.
Claude is still a go to but i have found that composer was “good enough” in practice.
Also notable: they're claiming SOTA prompt injection resistance. The industry has largely given up on solving this problem through training alone, so if the numbers in the system card hold up under adversarial testing, that's legitimately significant for anyone deploying agents with tool access.
The "most aligned model" framing is doing a lot of heavy lifting though. Would love to see third-party red team results.
I’ve always found Opus significantly better than the benchmarks suggested.
LFG
And they left Haiku out of most of the comparisons! That's the most interesting model for me. Because for some tasks it's fine. And it's still not clear to me which ones those are.
Because in my experience, Haiku sits at this weird middle point where, if you have a well defined task, you can use a smaller/faster/cheaper model than Haiku, and if you don't, then you need to reach for a bigger/slower/costlier model than Haiku.
Then for the next 2-3 months people complaining about the degradation will be labeled “skill issue”.
Then a sacrificial Anthropic engineer will “discover” a couple obscure bugs that “in some cases” might have lead to less than optimal performance. Still largely a user skill issue though.
Then a couple months later they’ll release Opus 4.7 and go through the cycle again.
My allegiance to these companies is now measured in nerf cycles.
I’m a nerf cycle customer.
jumploops•29m ago
So it’s 1/3 the price of Opus 4.1…
> [..] matches Sonnet 4.5’s best score on SWE-bench Verified, but uses 76% fewer output tokens
…and potentially uses a lot less tokens?
Excited to stress test this in Claude Code, looks like a great model on paper!
jmkni•19m ago
For anyone else confused, it's input/output tokens
$5 for 1million tokens in $25 for 1million tokens out
alach11•15m ago
Also increasingly it's becoming important to look at token usage rather than just token cost. They say Opus 4.5 (with high reasoning) used 50% fewer tokens than Sonnet 4.5. So you get a higher score on SWE-bench verified, you pay more per token, but you use fewer tokens and overall pay less!