Claude Opus 4.5

https://www.anthropic.com/news/claude-opus-4-5

156•adocomplete•35m ago

Comments

jumploops•29m ago

> Pricing is now $5/$25 per million tokens

So it’s 1/3 the price of Opus 4.1…

> [..] matches Sonnet 4.5’s best score on SWE-bench Verified, but uses 76% fewer output tokens

…and potentially uses a lot fewer tokens?

Excited to stress test this in Claude Code, looks like a great model on paper!

jmkni•19m ago

> Pricing is now $5/$25 per million tokens

For anyone else confused, it's input/output tokens

$5 for 1 million tokens in, $25 for 1 million tokens out
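As a rough sketch of how the input/output split works out per request, assuming the $5/$25 per-million-token prices quoted above (the function name and the token counts in the usage line are made up for illustration):

```python
# Prices quoted in the thread: USD per one million tokens.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when input and output tokens are billed separately."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 20k tokens in, 2k tokens out.
print(f"${request_cost_usd(20_000, 2_000):.2f}")  # $0.15
```

Output tokens cost five times as much as input tokens here, so a short prompt with a long answer can cost more than a long prompt with a short one.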

alach11•15m ago

This is the biggest news of the announcement. Prior Opus models were strong, but the cost was a big limiter of usage. This price point still makes it a "premium" option, but isn't prohibitive.

Also increasingly it's becoming important to look at token usage rather than just token cost. They say Opus 4.5 (with high reasoning) used 50% fewer tokens than Sonnet 4.5. So you get a higher score on SWE-bench verified, you pay more per token, but you use fewer tokens and overall pay less!
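To put rough numbers on that, here is a sketch of the output-side comparison only. The $25 Opus 4.5 output price comes from the thread; the $15 per million output tokens for Sonnet 4.5 is an assumption that is not stated here, and the 10,000-token baseline is arbitrary. Input tokens are ignored.

```python
OPUS_OUT_USD_PER_MTOK = 25.0    # from the thread
SONNET_OUT_USD_PER_MTOK = 15.0  # assumption, not stated in the thread


def output_cost_usd(tokens: int, usd_per_mtok: float) -> float:
    """Cost of the output tokens alone."""
    return tokens * usd_per_mtok / 1_000_000


def breakeven_reduction(cheaper_price: float, pricier_price: float) -> float:
    """Fraction of output tokens the pricier model must save to cost the same."""
    return 1 - cheaper_price / pricier_price


baseline_tokens = 10_000  # arbitrary Sonnet output size for one task
sonnet_cost = output_cost_usd(baseline_tokens, SONNET_OUT_USD_PER_MTOK)
opus_cost_50 = output_cost_usd(5_000, OPUS_OUT_USD_PER_MTOK)  # 50% fewer tokens
opus_cost_76 = output_cost_usd(2_400, OPUS_OUT_USD_PER_MTOK)  # 76% fewer tokens

print(f"Sonnet: ${sonnet_cost:.3f}")
print(f"Opus at 50% fewer tokens: ${opus_cost_50:.3f}")
print(f"Opus at 76% fewer tokens: ${opus_cost_76:.3f}")
print(f"Break-even reduction: {breakeven_reduction(15.0, 25.0):.0%}")
```

Under these assumed prices, Opus comes out cheaper on output whenever it saves more than 40% of the tokens, which both quoted figures (50% and 76%) would clear.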

elvin_d•27m ago

Great seeing the price reduction. Opus historically was priced at 15/75, this one delivers at 5/25 which is close to Gemini 3 Pro. I hope Anthropic can afford increasing limits for the new Opus.

rishabhaiover•26m ago

Is this available on claude-code?

greenavocado•25m ago

What are you thinking of trying to use it for? It is generally a huge waste of money to unleash Opus on high content tasks ime

rishabhaiover•23m ago

I use claude-code extensively to plan and study for my college using the socrates learning mode. It's a great way to learn for me. I wanted to test the new model's capabilities on that front.

flutas•21m ago

My workflow has always been opus for planning, sonnet for actual work.

elvin_d•24m ago

Yes, the first run was nice - feels faster than 4.1 and did what Sonnet 4.5 struggled to execute properly.

rishabhaiover•20m ago

damn, I need a MAX sub for this.

stavros•13m ago

You don't, you can add $5 or whatever to your Claude wallet with the Pro subscription and use those for Opus.

bnchrch•25m ago

Seeing these benchmarks makes me so happy.

Not because I love Anthropic (I do like them) but because it's staving off me having to change my Coding Agent.

This world is changing fast, and both keeping up with State of the Art and/or the feeling of FOMO is exhausting.

I've been holding onto Claude Code for the last little while since I've built up a robust set of habits, slash commands, and sub agents that help me squeeze as much out of the platform as possible.

But with the last few releases of Gemini and Codex I've been getting closer and closer to throwing it all out to start fresh in a new ecosystem.

Thankfully Anthropic has come out swinging today and my own SOPs can remain intact a little while longer.

stavros•21m ago

Did anyone else notice Sonnet 4.5 being much dumber recently? I tried it today and it was really struggling with some very simple CSS on a 100-line self-contained HTML page. This never used to happen before, and now I'm wondering if this release has something to do with it.

On-topic, I love the fact that Opus is now three times cheaper. I hope it's available in Claude Code with the Pro subscription.

EDIT: Apparently it's not available in Claude Code with the Pro subscription, but you can add funds to your Claude wallet and use Opus with pay-as-you-go. This is going to be really nice to use Opus for planning and Sonnet for implementation with the Pro subscription.

However, I noticed that the previously-there option of "use Opus for planning and Sonnet for implementation" isn't there in Claude Code with this setup any more. Hopefully they'll implement it soon, as that would be the best of both worlds.

kjgkjhfkjf•15m ago

My guess is that Claude's "bad days" are due to the service becoming overloaded and failing over to use cheaper models.

bryanlarsen•14m ago

On Friday my Claude was particularly stupid. It's sometimes stupid, but I've never seen it be that consistently stupid. Just assumed it was a fluke, but maybe something was changing.

vunderba•3m ago

Anecdotally, I kind of compare the quality of Sonnet 4.5 to that of a chess engine: it performs better when given more time to search deeper into the tree of possible moves (more plies). So when Anthropic is under peak load I think some degradation is to be expected. I just wish Claude Code had a "Signal Peak" so that I could schedule more challenging tasks for a time when it's not under high demand.

827a•21m ago

I've played around with Gemini 3 Pro in Cursor, and honestly: I find it to be significantly worse than Sonnet 4.5. I've also had some problems that only Claude Code has been able to really solve; Sonnet 4.5 in there consistently performs better than Sonnet 4.5 anywhere else.

I think Anthropic is making the right decisions with their models. Given that software engineering is probably one of the very few domains of AI usage that is driving real, serious revenue: I have far better feelings about Anthropic going into 2026 than any other foundation model. Excited to put Opus 4.5 through its paces.

visioninmyblood•18m ago

The model is great; it is able to code up some interesting visual tasks (I guess they have pretty strong tool-calling capabilities), like orchestrating prompt -> image generation -> segmentation -> 3D reconstruction. Check out the results here: https://chat.vlm.run/c/3fcd6b33-266f-4796-9d10-cfc152e945b7. Note the model was only used to orchestrate the pipeline; the tasks are done by other models in an agentic framework. They must have improved the tool-calling framework with all the MCP usage. Gemini 3 was able to orchestrate the same, but Claude 4.5 is much faster.

Squarex•18m ago

I have heard that Gemini 3 is not that great in Cursor, but excellent in Antigravity. I don't have time to personally verify all that though.

incoming1211•16m ago

I think Gemini 3 is hot garbage in everything. It's great on a greenfield project when trying to one-shot something, but if you're working on a long-term project it just sucks.

koakuma-chan•9m ago

Nothing is great in Cursor.

itsdrewmiller•6m ago

My first couple of attempts at antigravity / Gemini were pretty bad - the model kept aborting and it was relatively helpless at tools compared to Claude (although I have a lot more experience tuning Claude to be fair). Seems like there are some good ideas in antigravity but it’s more like an alpha than a product.

rishabhaiover•17m ago

I suspect Cursor is not the right platform to write code on. IMO, humans are lazy and would never code on Cursor. They default to code generation via prompt which is sub-optimal.

viraptor•13m ago

> They default to writing code via prompt generation which is sub-optimal.

What do you mean?

rishabhaiover•7m ago

If you're given a finite context window, what's the most efficient token to present for a programming task: sloppy prompts or actual code (using it with autocomplete)?

behnamoh•14m ago

I’ve tried Gemini in Google AI Studio as well and was very disappointed by the superficial responses it provided. It seems like it's at the level of GPT-5-low or even lower.

On the other hand, it’s a truly multi modal model whereas Claude remains to be specifically targeted at coding tasks, and therefore is only a text model.

poszlem•11m ago

I’ve trashed Gemini non-stop (seriously, check my history on this site), but 3 Pro is the one that finally made me switch from OpenAI. It’s still hot garbage at coding next to Claude, but for general stuff, it’s legit fantastic.

enraged_camel•9m ago

My testing of Gemini 3 Pro in Cursor yielded mixed results. Sometimes it's phenomenal. At other times I either get the "provider overloaded" message (after like 5 mins or whatever the timeout is), or the model's internal monologue starts spilling out to the chat window, which becomes really messy and unreadable. It'll do things like:

>> I'll execute.

>> Wait, what if...?

>> I'll execute.

Suffice it to say I've switched back to Sonnet as my daily driver. Excited to give Opus a try.

vunderba•8m ago

My workflow was usually to use Gemini 2.5 Pro (now 3.0) for high-level architecture and design. Then I would take the finished "spec" and have Sonnet 4.5 perform the actual implementation.

vessenes•5m ago

I like this plan, too - gemini's recent series have long seemed to have the best large context awareness vs competing frontier models - anecdotally, although much slower, I think gpt-5's architecture plans are slightly better.

chinathrow•5m ago

I gave Sonnet 4.5 a base64 encoded PHP serialize() json of an object dump and told him to extract the URL within.

It gave me the Youtube-URL to Rick Astley.

mikestorrent•3m ago

You should probably tell AI to write you programs to do tasks that programs are better at than minds.

arghwhat•2m ago

If you're asking an LLM to compute something "off the top of its head", you're using it wrong. Ask it to write the code to perform the computation and it'll do better.

Same with asking a person to solve something in their head vs. giving them an editor and a random python interpreter, or whatever it is normal people use to solve problems.
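For the base64 case above, the kind of program being suggested might look like the following sketch. The original payload is not in the thread, so the sample data, the `extract_urls` helper name, and the URL pattern are all hypothetical; the approach is simply to decode first and then search the decoded text, rather than guessing at the contents.

```python
import base64
import re

# Loose URL pattern: stops at whitespace or a quote, which is enough for
# URLs sitting inside quoted strings in PHP serialize() or JSON output.
URL_RE = re.compile(r"https?://[^\s\"']+")


def extract_urls(b64_payload: str) -> list[str]:
    """Base64-decode the payload and return every URL found in the text."""
    decoded = base64.b64decode(b64_payload).decode("utf-8", errors="replace")
    return URL_RE.findall(decoded)


# Hypothetical stand-in for the real dump: a one-element PHP-serialized array.
# PHP records the byte length of each string; len() matches it for ASCII.
url = "https://example.com/watch?v=abc123"
serialized = f'a:1:{{s:3:"url";s:{len(url)}:"{url}";}}'
payload = base64.b64encode(serialized.encode("utf-8")).decode("ascii")

print(extract_urls(payload))  # ['https://example.com/watch?v=abc123']
```

A regex search is a shortcut that avoids parsing the serialized structure at all; a proper PHP unserializer would be the more robust choice if the field name matters.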

rustystump•5m ago

Gemini 3 was awful when I gave it a spin. It was worse than Cursor’s Composer model.

Claude is still a go-to, but I have found that Composer was “good enough” in practice.

lvl155•2m ago

I really don’t understand the hype around Gemini. Opus/Sonnet/GPT are much better for agentic workflows. Seems people get hyped for the first few days.

GodelNumbering•19m ago

The fact that the post singled out SWE-bench at the top makes the opposite impression that they probably intended.

grantpitt•17m ago

do say more

GodelNumbering•9m ago

Makes it sound like a one trick pony

alvis•18m ago

What surprises me is that Opus 4.5 lost all reasoning scores to Gemini and GPT. I thought it was the area where the model would shine the most.

viraptor•14m ago

Has there been any announcement of a new programming benchmark? SWE looks like it's close to saturation already. At this point for SWE it may be more interesting to start looking at which types of issues consistently fail/work between model families.

llamasushi•10m ago

The burying of the lede here is insane. $5/$25 per MTok is a 5x price drop from Opus 4. At that price point, Opus stops being "the model you use for important things" and becomes actually viable for production workloads.

Also notable: they're claiming SOTA prompt injection resistance. The industry has largely given up on solving this problem through training alone, so if the numbers in the system card hold up under adversarial testing, that's legitimately significant for anyone deploying agents with tool access.

The "most aligned model" framing is doing a lot of heavy lifting though. Would love to see third-party red team results.

wolttam•5m ago

It's 1/3 the old price ($15/$75)

keeeba•10m ago

Oh boy, if the benchmarks are this good and Opus feels like it usually does then this is insane.

I’ve always found Opus significantly better than the benchmarks suggested.

LFG

aliljet•10m ago

The real question I have after seeing the usage rug being pulled is what this costs and how usable this ACTUALLY is with a Claude Max 20x subscription. In practice, Opus is basically unusable by anyone paying enterprise-prices. And the modification of "usage" quotas has made the platform fundamentally unstable, and honestly, it left me personally feeling like I was cheated by Anthropic...

zb3•5m ago

The first chart is straight from "how to lie in charts"..

andai•5m ago

Why do they always cut off 70% of the y-axis? Sure it exaggerates the differences, but... it exaggerates the differences.

And they left Haiku out of most of the comparisons! That's the most interesting model for me. Because for some tasks it's fine. And it's still not clear to me which ones those are.

Because in my experience, Haiku sits at this weird middle point where, if you have a well defined task, you can use a smaller/faster/cheaper model than Haiku, and if you don't, then you need to reach for a bigger/slower/costlier model than Haiku.

chaosprint•5m ago

SWE's results were actually very close, but they used a poor marketing visualization. I know this isn't a research paper, but for Anthropic, I expect more.
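On the truncated y-axis point raised above, a small sketch of how much the drawn bars can diverge from the underlying numbers. The two scores and the axis floor of 70 are hypothetical values picked for illustration, not figures from the announcement.

```python
def apparent_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """Ratio of drawn bar heights when the y-axis starts at axis_min."""
    return (a - axis_min) / (b - axis_min)


# Hypothetical benchmark scores, in percent.
high, low = 80.0, 77.0

print(round(apparent_ratio(high, low), 2))        # 1.04 with the axis at 0
print(round(apparent_ratio(high, low, 70.0), 2))  # 1.43 with the axis at 70
```

With these made-up numbers, a roughly 4% difference is drawn as a bar about 43% taller once the axis starts at 70.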

unsupp0rted•4m ago

This is gonna be game-changing for the next 2-4 weeks before they nerf the model.

Then for the next 2-3 months people complaining about the degradation will be labeled “skill issue”.

Then a sacrificial Anthropic engineer will “discover” a couple obscure bugs that “in some cases” might have led to less than optimal performance. Still largely a user skill issue though.

Then a couple months later they’ll release Opus 4.7 and go through the cycle again.

My allegiance to these companies is now measured in nerf cycles.

I’m a nerf cycle customer.
