Claude Opus 4.7 costs 20–30% more per session

https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new-tokenizer-here-s-what-it-costs-you

205•aray07•2h ago

Comments

uberman•1h ago

On actual code, I see what you see: a 30% increase in tokens, which is in line with what they claim as well. I personally don't tend to feed technical documentation or random prose into LLMs.

Given that Opus 4.6 and even Sonnet 4.6 are still valid options, for me the question is not "Does 4.7 cost more than claimed?" but "What capabilities does 4.7 give me that 4.6 did not?"

Yesterday 4.6 was a great option and it is too soon for me to tell if 4.7 is a meaningful lift. If it is, then I can evaluate if the increased cost is justified.

pier25•1h ago

haven't people been complaining lately about 4.6 getting worse?

ed_elliott_asc•1h ago

No we increased our plans

solenoid0937•1h ago

People complain about a lot of things. Claude has been fine:

https://marginlab.ai/trackers/claude-code-historical-perform...

Majromax•1h ago

While that's a nice effort, the inter-run variability is too high to diagnose anything short of catastrophic model degradation. The typical 95% confidence interval runs from 35% to 65% pass rates, a full factor of two performance difference.

Moreover, on the companion codex graphs (https://marginlab.ai/trackers/codex-historical-performance/), you can see a few different GPT model releases marked yet none correspond to a visual break in the series. Either GPT 5.4-xhigh is no more powerful than GPT 5.2, or the benchmarking apparatus is not sensitive enough to detect such changes.
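
A rough way to sanity-check the width of that interval is the normal approximation to a binomial proportion. The sketch below assumes each benchmark task is an independent pass/fail trial, which the tracker may not satisfy; the function names are illustrative and not taken from the tracker.

```python
import math
from statistics import NormalDist


def ci_half_width(pass_rate: float, n_tasks: int, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation CI for a pass rate over n_tasks trials."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(pass_rate * (1 - pass_rate) / n_tasks)


def tasks_needed(pass_rate: float, half_width: float, confidence: float = 0.95) -> int:
    """Smallest number of independent trials giving a CI no wider than +/- half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * pass_rate * (1 - pass_rate) / half_width ** 2)


# An interval of 35% to 65% is +/- 15 points around a 50% pass rate.
print(tasks_needed(0.5, 0.15))
# Narrowing it to +/- 5 points takes roughly nine times as many trials.
print(tasks_needed(0.5, 0.05))
```

Under these assumptions, an interval that wide corresponds to a few dozen trials per data point, and the half-width shrinks only with the square root of the trial count.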

cbg0•1h ago

That performance monitor is super easy to game if you cache responses to all the SWE bench questions.

addisonj•1h ago

I will be the first to acknowledge that humans are a bad judge of performance and that some of the allegations are likely just hallucinations...

But... Are you really going to rely completely on benchmarks that have time and time again been shown to be gamed as the complete story?

My take: It is pretty clear that the capacity crunch is real and the changes they made to effort are in part to reduce that. It likely changed the experience for users.

grim_io•1h ago

How long will they host 4.6? Maybe longer for enterprise, but if you have a consumer subscription, you won't have a choice for long, if at all anymore.

nfredericks•1h ago

Opus 4.5 is still available

grim_io•56m ago

Wow, they hosted it for 6 months. Truly LTS territory :)

Jeremy1026•1h ago

I was trying to figure out earlier today how to get 4.6 to run in Claude Code, as part of the output it included "- Still fully supported — not scheduled for retirement until Feb 2027." Full caveat of, I don't know where it came up with this information, but as others have said, 4.5 is still available today and it is now 5, almost 6 months old.

hypercube33•43m ago

I'm still using 4.5 because it gets the niche work I'm using it for where 4.6 would just fight me.

dallen33•1h ago

I'm still using Sonnet 4.6 with no issues.

risyachka•1h ago

How does this solve the issue? 4.6 will be disabled after one or more releases like any other legacy model.

gadflyinyoureye•58m ago

Won't the thing that replaces 4.6 come down in token cost?

iknowstuff•1h ago

Interesting because I already felt like current models spit out too much garbage verbose code that a human would write in a far more terse, beautiful and grokable way

aray07•1h ago

yeah opus 4.7 feels a lot more verbose - i think they changed the system prompt and removed instructions to be terse in its responses

louiereederson•1h ago

LLMs exist on a logarithmic performance/cost frontier. It's not really clear whether Opus 4.5+ represents a level shift on this frontier or just inhabits a place on that curve which delivers higher performance, but at rapidly diminishing returns to inference cost.

To me, it is hard to reject this hypothesis today. The fact that Anthropic is rapidly trying to increase price may betray the fact that their recent lead is at the cost of dramatically higher operating costs. Their gross margins in this past quarter will be an important data point on this.

I think the tendency for graphs of model assessment to display the log of cost/tokens on the x axis (i.e. Artificial Analysis' site) has obscured this dynamic.

snek_case•59m ago

They're also getting closer to IPO and have a growing user base. They can't justify losing a very large number of billions of other people's money in their IPO prospectus.

So there's a push for them to increase revenue per user, which brings us closer to the real cost of running these models.

giwook•48m ago

I agree, and I'm also quite skeptical that Anthropic will be able to remain true to its initial, noble mission statement of acting for the global good once they IPO.

At that point you are beholden to your shareholders and no longer can eschew profit in favor of ethics.

Unfortunately, I think this is the beginning of the end of Anthropic and Amodei being a company and CEO you could actually get behind and believe were trying to do "the right thing".

It will become an increasingly more cutthroat competition between Anthropic and OpenAI (and perhaps Google eventually if they can close the gap between their frontier models and Claude/GPT) to win market share and revenue.

Perhaps Amodei will eventually leave Anthropic too and start yet another AI startup because of Anthropic's seemingly inevitable prioritization of profit over safety.

devmor•45m ago

Skeptical is a light way to put it. It is essentially a foregone conclusion that once a company IPOs, any veil that they might be working for the global good is entirely lifted.

A publicly traded company is legally obligated to go against the global good.

giwook•39m ago

Fair point.

Call me an optimist, but I'm still holding out hope that Amodei is doing and still can do the right thing. That hope is fading fast though.

WarmWash•35m ago

The problem is that people equate money to power and power to evil.

So no matter what, if you do something lots of people like (and hence compensate you for), you will be evil.

It's a very interesting quirk of human intuition.

arcanemachiner•26m ago

A reasonable conclusion, considering that money and power seem to have their own gravity, so people with more of both end up getting even more of both, and vice versa.

Can't blame someone who comes to such a conclusion about money and power.

mattkevan•25m ago

It’s not really, companies like GM used to boast about how well they treated their employees and communities. It was Jack Welch and a legion of like-minded arseholes who decided they should be increasingly richer no matter who or what paid for it.

dboreham•12m ago

See also HP. Pretty much only Costco left.

snek_case•38m ago

I think the pivot to profit over good has been happening for a long time. See Dario hyping and salivating over all programming jobs disappearing in N months. He doesn't care at all if it's true or not. In fact he's in a terrible position to even understand if this is possible or not (probably hasn't coded for 10+ years). He's just in the business of selling tokens.

louiereederson•56m ago

I meant to reference Toby Ord's work here. I think his framing of the performance/cost frontier hasn't gotten enough attention: https://www.tobyord.com/writing/hourly-costs-for-ai-agents

paulddraper•49m ago

> The fact that Anthropic is rapidly trying to increase price may betray the fact that their recent lead is at the cost of dramatically higher operating costs.

Or they are just not willing to burn obscene levels of capital like OpenAI.

xd1936•1h ago

And what about with Caveman[1]?

1. https://github.com/juliusbrussee/caveman

Majromax•1h ago

Caveman doesn't and cannot change the tokenizer, so the relative token count differences by input category will remain unchanged.

brokencode•1h ago

Can we have one thread about Claude without people trying to shovel Caveman?

Much of the token usage is in reasoning, exploring, and code generation rather than outputs to the user.

Does making Claude sound like a caveman actually move the needle on costs? I am not sure anymore whether people are serious about this.

To me, caveman sounds bad and is not as easy to understand compared to normal English.

aray07•1h ago

isn’t caveman a joke? why would you use it for real work?

atonse•1h ago

Just yesterday I was happy to have gotten my weekly limit reset [1]. And although I've been doing a lot of mockup work (so a lot of HTML getting written), I think the 1M token stuff is absolutely eating up tokens like CRAZY.

I'm already at 27% of my weekly limit in ONE DAY.

https://news.ycombinator.com/item?id=47799256

aray07•1h ago

yeah similar for me - it uses a bunch more tokens and I haven’t been able to tell the ROI in terms of better instruction following

it seems to hallucinate a bit more (anecdotal)

titaniumtown•53m ago

I had it hallucinate a tool that didn't exist, it was very frustrating!

jabart•1h ago

I'm seeing the opposite. With Opus 4.7 and xhigh, I'm seeing less session usage, it's moving faster, and my weekly usage is not moving that much on a Team Pro account.

cbm-vic-20•27m ago

Four day workweek!

jmward01•1h ago

Yeah. I just did a day with 4.7 and I won't be going back for a while. It is just too expensive. On top of the tokenization the thinking seems like it is eating a lot more too.

aray07•1h ago

yeah i am still not clear why there are 5 effort modes now on top of more expensive tokenization

jddj•47m ago

Once you've seen a few results of an LLM given too much sway over product decisions, 5 effort modes expressed as various english adjectives is pretty much par for the course

rafram•1h ago

Pretty funny that this article was clearly written by Claude.

markrogersjr•1h ago

4.7 one-shot rate is at least 20-30% higher for me

ChicagoBoy11•40m ago

How are you able to track this as you use it? A bit stumped atm

markrogersjr•9m ago

Purely empirical

bcjdjsndon•1h ago

Because those brainiacs added 20-30% more system prompt

CodingJeebus•1h ago

The fundamental problem with these frontier model companies is that they're incentivized to create models that burn through more tokens, full stop. It's a tale as old as capitalism: you wake up every day and choose to deliver more value to your customers or your shareholders, you cannot do both simultaneously forever.

People love to throw around "this is the dumbest AI will ever be", but the corollary to that is "this is the most aligned the incentives between model providers and customers will ever be" because we're all just burning VC money for now.

NickC25•57m ago

> but the corollary to that is "this is the most aligned the incentives between model providers and customers will ever be" because we're all just burning VC money for now.

Please say this louder for everyone to hear. We are still at the stage where it is best for Anthropic's product to be as consumer aligned (and cost-friendly) as possible. Anthropic is losing a lot of money. Both of those things will not be true in the near future.

BosunoB•46m ago

Their bigger incentive is to deliver the best product in the cheapest way, because there is tight competition with at least 2 other companies. I know we all love to hate on capitalism but it's actually functioning fine in this situation, and the token inflation is their attempt to provide a better product, not a worse one.

stefan_•1h ago

I don't know anything about tokens. Anthropic says Pro has "more usage*", Max has 5x or 20x "more usage*" than Pro. The link to "usage limits" says "determines how many messages you can send". Clearly no one is getting billed for tokens.

aray07•49m ago

anthropic’s pricing is all based on token usage

https://platform.claude.com/docs/en/about-claude/pricing

So if you are generating more tokens, you are eating up your usage faster

_pdp_•1h ago

IMHO there is a point where incremental model quality will hit diminishing returns.

It is like comparing an 8K display to a 16K display because at normal viewing distance, the difference is imperceptible, but 16K comes at significant premium.

The same applies to intelligence. Sure, some users might register a meaningful bump, but if 99% can't tell the difference in their day-to-day work, does it matter?

A 20-30% cost increase needs to deliver a proportional leap in perceivable value.

snek_case•1h ago

It probably depends what you're using the models for. If you use them for web search, summarizing web pages, I can imagine there's a plateau and we're probably already hitting it.

For coding though, there is kind of no limit to the complexity of software. The more invariants and potential interactions the model can be aware of, the better presumably. It can handle larger codebases. Probably past the point where humans could work on said codebases unassisted (which brings other potential problems).

aray07•51m ago

yeah that is my biggest issue - im okay with paying 20-30% more but what is the ROI? i dont see an equivalent improvement in performance. Anthropic hasnt published any data around what these improvements are - just some vague "better instruction following"

ZeroCool2u•35m ago

Whenever we get the locally runnable 4k models things are going to get really awkward for the big 3 labs. Well at least Google will still have their ad revenue I guess.

robot_jesus•25m ago

They're not perfect but the local model game is progressing so quickly that they're impossible to ignore. I've only played around with the new qwen 3.6 models for a few minutes (it's damn impressive) but this weekend's project is to really put it through its paces.

If I can get the performance I'm seeing out of free models on a 6-year-old Macbook Pro M1, it's a sign of things to come.

Frontier models will have their place for 1) extensive integrations and tooling and 2) massive context windows. But I could see a very real local-first near future where a good portion of compute and inference is run locally and only goes to a frontier model as needed.

UncleOxidant•17m ago

I've had really good results from qwen3-coder-next. I'm hoping we get a qwen3.6-coder soon since claude seems to get less-and-less available on the pro plan.

UncleOxidant•18m ago

Given how little claude usage they've been giving us on the "pro" plan lately, I've started doing more with the various open Qwen3.* models. Both Qwen3-coder-next and Qwen3.5-27b have been giving me good results and their 3.6 models are starting to be released. I think Anthropic may be shooting themselves in the foot here as more people start moving to local models due to costs and/or availability. Are the Qwen models as good as Claude right now? No. But they're about as good as Claude was 9 months ago (prior to 4.5). If I need some complex planning I save that for claude and have the Qwen models do the implementation.

mlinsey•19m ago

I agree, but also the model intelligence is quite spikey. There are areas of intelligence that I don't care at all about, except as proxies for general improvement (this includes knowledge based benchmarks like Humanity's Last Exam, as well as proving math theorems etc). There are other areas of intelligence where I would gladly pay more, even 10X more, if it meant meaningful improvements: tool use, instruction following, judgement/"common sense", learning from experience, taste, etc. Some of these are seeing some progress, others seem inherent to the current LLM+chain of thought reasoning paradigm.

nisegami•18m ago

>IMHO there is a point where incremental model quality will hit diminishing returns.

It's not necessarily a single discrete point I think. In my experience, it's tied to the quality/power of your harness and tooling. More powerful tooling has revealed differences between models that were previously not easy to notice. This matches your display analogy, because I'm essentially saying that the point at which display resolution improvements are imperceptible depends on how far away you sit.

simplyluke•12m ago

I'm seeing a lot of sentiment, and agree with a lot of it, that opus 4.6 un-nerfed is there already and for many if not most software use cases there's more value to be had in tooling, speed, and cost than raw model intelligence.

_pdp_•9m ago

Longer version of the comment https://www.linkedin.com/pulse/imperceptible-upgrade-petko-d...

mikert89•59m ago

The compute is expensive, what is with this outrage? People just want free tools forever?

rvz•53m ago

> The compute is expensive, what is with this outrage?

Gamblers (vibe-coders) at Anthropic's casino realising that their new slot machine upgrade (Claude Opus) is now taking 20%-30% more credits for every push of the spin button.

Problem is, it advertises how good it is (unverified benchmarks) and has a better random number generator but it still can be rigged (made dumber) by the vendor (Anthropic).

The house (Anthropic) always wins.

> People just want free tools forever?

Using local models is the answer to this if you want to use AI models for free forever.

aray07•50m ago

are you okay with paying more for your services without any perceived improvement in the service itself?

schmookeeg•46m ago

That's been a constant for my entire adult life.

throw_m239339•25m ago

> are you okay with paying more for your services without any perceived improvement in the service itself?

These services were and still are wholly subsidized by VC money, in terms of price increase you have seen nothing yet. Same with the competition...

sipsi•59m ago

I tried to do my usual test (similar to pelican but a bit more complex) but it ran out of 5 hour limit in 5 minutes. Then after 5 hours I said "go on" and the results were the worst I've ever seen.

qq66•52m ago

This is the backdoor way of raising prices... just inflate the token pricing. It's like ice cream companies shrinking the box instead of raising the price

Yukonv•49m ago

Some broad assumptions are being made that plans give you a precise equivalent to API cost. This is not the case, with reverse engineering of plan usage showing cached input is free [0]. If you re-run the math removing cached input the usage cost is ~5-34% more. Was the token plan budget increase [1] proportional to account for this? Can’t say with certainty. For those paying API costs, though, the price hike is real.

[0] https://she-llac.com/claude-limits

[1] https://xcancel.com/bcherny/status/2044839936235553167

encoderer•48m ago

In my “repo os” we have an adversarial agent harness running gpt5.4 for plan and implementation and opus4.6 for review. This was the clear winner in the bake-off when 5.4 came out a couple months ago.

Re-ran the bake-off with 4.7 authoring and… gpt5.4 still clearly winning. Same skills, same prompts, same agents.md.

lacoolj•48m ago

This is probably an adjacent result of this (from anthropic launch post):

> In Claude Code, we’ve raised the default effort level to xhigh for all plans.

Try changing your effort level and see what results you get

aray07•36m ago

effort level is separate from tokenization. Tokenization impacts you the same regardless.

I find 5 thinking levels to be super confusing - I dont really get why they went from 3 -> 5

curioussquirrel•48m ago

Claude's tokenizers have actually been getting less efficient over the years (I think we're at the third iteration at least since Sonnet 3.5). And if you prompt the LLM in a language other than English, or if your users prompt it or generate content in other languages, the costs go up even more. And I mean hundreds of percent more for languages with complex scripts like Tamil or Japanese. If you're interested in the research we did comparing tokenizers of several SOTA models in multiple languages, just hit me up.

arcanemachiner•15m ago

I would encourage you to post a link here, and also to submit to HN if you haven't already. :)

varispeed•46m ago

Don't forget that the model doesn't have an incentive to give the right solution the first time. At least with Opus 4.6 after it got nerfed, it would go round in circles until you told it to stop defrauding you and get to the correct solution. That didn't always work though. I found myself starting sessions again and again until a less nerfed model was put on the request. It all still points to artificially making the customer pay more.

namnnumbr•46m ago

The title is a misdirection. The token counts may be higher, but the cost-per-task may not be for a given intelligence level. Need to wait to see Artificial Analysis' Intelligence Index run for this, or some other independent per-task cost analysis.

The final calculation assumes that Opus 4.7 uses the exact same trajectory + reasoning output as Opus 4.6. I have not verified, but I assume it not to be the case, given that Opus 4.7 on Low thinking is strictly better than Opus 4.6 on Medium, etc., etc.
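
As a back-of-the-envelope check on that point, the sketch below computes how much shorter a task's trajectory would have to be for a 20-30% tokenizer inflation to cancel out. It assumes the price per token is the same for both models and that inflation applies uniformly to all text in the session; neither is verified here, and the function name is illustrative.

```python
def break_even_reduction(token_inflation: float) -> float:
    """Fraction by which the text per task must shrink to offset tokenizer inflation.

    If the same text costs `token_inflation` times as many tokens, the task
    costs the same only when it needs 1 / token_inflation as much text.
    """
    return 1 - 1 / token_inflation


for inflation in (1.20, 1.30):
    reduction = break_even_reduction(inflation)
    print(f"{inflation:.2f}x tokens -> needs {reduction:.1%} less text per task")
```

Under those assumptions, 4.7 would need to get the same task done with roughly 17% to 23% less reading, reasoning, and output to cost the same per task.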

aray07•24m ago

im running some experiments on this but based on what i have seen on my own personal data - I dont think this is true

"given that Opus 4.7 on Low thinking is strictly better than Opus 4.6 on Medium, etc., etc.”

Opus 4.7 in general is more expensive for similar usage. Now we can argue that it provides better performance all else being equal but I haven’t been able to see that

unpwn•23m ago

Very unlikely that the article is wrong. the 4.7 intelligence bump is not that big, plus most of the token spend is in inputs/tool calls etc, much of which won't change even with this bump.

_fat_santa•39m ago

A question I've been asking a lot lately (really since the release of GPT-5.3) is "do I really need the more powerful model"?

I think a big issue with the industry right now is it's constantly chasing higher performing models and that comes at the cost of everything else. What I would love to see in the next few years is all these frontier AI labs go from just trying to create the most powerful model at any cost to actually making the whole thing sustainable and focusing on efficiency.

The GPT-3 era was a taste of what the future could hold but those models were toys compared to what we have today. We saw real gains during the GPT-4 / Claude 3 era where they could start being used as tools but required quite a bit of oversight. Now in the GPT-5 / Claude 4 era I don't really think we need to go much further and can start focusing on efficiency and sustainability.

What I would love the industry to start focusing on in the next few years is not on the high end but the low end. Focus on making the 0.5B - 1B parameter models better for specific tasks. I'm currently experimenting with fine-tuning 0.5B models for very specific tasks and long term I think that's the future of AI.

beej71•37m ago

News like this always makes me wonder about running my own model, something I've never done. A couple thousand bucks can get you some decent hardware, it looks like, but is it good for coding? What is your all's experience?

And if it's not good enough for coding, what kind of money, if any, would make it good enough?

aray07•37m ago

i think the new qwen models are supposed to be good based on some of the articles that i read

hleszek•32m ago

The latest Qwen3.6 model is very impressive for its size. Get an RTX 3090 and go to https://www.reddit.com/r/LocalLLaMA/ to see the latest news on how to run models locally. Totally fine for coding.

bakugo•22m ago

You should be aware that any model you can run on less than $10k worth of hardware isn't going to be anywhere close to the best cloud models on any remotely complex task.

Many providers out there host open weights models for cheap, try them out and see what you think before actually investing in hardware to run your own.

arcanemachiner•17m ago

I want to give you realistic expectations: Unless you spend well over $10K on hardware, you will be disappointed, and will spend a lot of time getting there. For sophisticated coding tasks, at least. (For simple agentic work, you can get workable results with a 3090 or two, or even a couple 3060 12GBs for half the price. But they're pretty dumb, and it's a tease. Hobby territory, lots of dicking around.)

Do yourself a favor: Set up OpenCode and OpenRouter, and try all the models you want to try there.

Other than the top performers (e.g. GLM 5.1, Kimi K2.5, where required hardware is basically unaffordable for a single person), the open models are more trouble than they're worth IMO, at least for now (in terms of actually Getting Shit Done).

mfro•5m ago

Not sure why all the other commenters are failing to mention you can spend considerably less money on an apple silicon machine to run decent local models.

Fun fact: AWS offers apple silicon EC2 instances you can spin up to test.

adaptive_loop•37m ago

Every time a new model comes out, I'm left guessing what it means for my token budget in order to sustain the quality of output I'm getting. And it varies unpredictably each time. Beyond token efficiency, we need benchmarks to measure model output quality per token consumed for a diverse set of multi-turn conversation scenarios. Measuring single exchanges is not just synthetic, it's unrealistic. Without good cost/quality trade-off measures, every model upgrade feels like a gamble.

therobots927•25m ago

That’s the joy of purchasing an intangible and non-deterministic product. The profit margin is completely within the vendor’s control and quality is hard for users to measure.

taosx•36m ago

Claude seems so frustrating lately to the point where I avoid and completely ignore it. I can't identify a single cause but I believe it's mostly the self-righteousness and leadership that drive all the decisions that make me distrust and disengage with it.

estearum•36m ago

using dumber models to own the libs

testbjjl•32m ago

Definitely experimenting with less expensive ones. I have a few versions of my settings.json

I also wonder if token utilization has or will ever find its way to employee performance reviews as these models go up in price.

sysmax•35m ago

Well, LLMs are priced per token, and most of the tokens are just echoing back the old code with minimal changes. So, a lot of the cost is actually paying for the LLM to echo back the same code.

Except, it's not that trivial to solve. I tried experimenting with asking the model to first give a list of symbols it will modify, and then just write the modified symbols. The results were OK, but less refined than when it echoes back the entire file.

The way I see it is that when you echo back the entire file, the process of thinking "should I do an edit here" is distributed over a longer span, so it has more room to make a good decision. Like instead of asking "which 2 of the 10 functions should you change" you're asking it "should you change method1? what about method2? what about method3?", etc., and that puts less pressure on the LLM.

Except, currently we are effectively paying for the LLM to make that decision for *every token*, which is terribly inefficient. So, there has to be some middle ground between expensively echoing back thousands of unchanged tokens and giving an error-ridden high-level summary. We just haven't found that middle ground yet.
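
To make the cost gap concrete, the sketch below compares the size of echoing a whole file against a unified diff of the same change, using Python's standard `difflib`. Character counts stand in for tokens, which is only a crude proxy, and the 200-line file is a made-up example. It illustrates the cost side only and says nothing about the quality trade-off described above.

```python
import difflib


def echo_vs_diff(old: str, new: str) -> tuple[int, int]:
    """Return (chars needed to echo the whole new file, chars in a unified diff)."""
    diff = "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="a/example.py",  # hypothetical file name
            tofile="b/example.py",
        )
    )
    return len(new), len(diff)


# Hypothetical 200-line file in which a single line changes.
old = "".join(f"value_{i} = {i}\n" for i in range(200))
new = old.replace("value_100 = 100\n", "value_100 = 101\n")

full_chars, diff_chars = echo_vs_diff(old, new)
print(f"whole file: {full_chars} chars, diff: {diff_chars} chars")
```

For a one-line change, the diff is a small fraction of the file, which is the saving that file-level echoing gives up.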

gruez•33m ago

>and most of the tokens are just echoing back the old code with minimal changes

I thought coding harnesses provided tools to apply diffs so the LLM didn't have to echo back the entire file?

sysmax•22m ago

They can, but this reduces the quality. The LLM has a harder time picking the first edit, and then all subsequent work is influenced by that one edit. Like first creating an unnecessary auxiliary type, and then being stuck modifying the rest of the code to work with it.

So, in practice, many tools still work on the file level.

mmastrac•32m ago

I think the ideal way for these LLMs to work will be using AST-level changes instead of "let me edit this file".

grit.io was working on this years ago, not sure if they are still alive/around, but I liked their approach (just had a very buggy transformer/language).

ricardobeat•33m ago

I can’t stand reading this. One article. Many words. Not written by a human.

Feels like LLMs are devolving into having a single, instantly recognizable and predictable writing style.

aliljet•31m ago

This is the reality I'm seeing too. Does this mean that the subscriptions (5x, 10x, 20x) are essentially reduced in token-count by 20-30%?

aray07•23m ago

yeah thats the part that is unclear to me as well - if our usage capacity is now going to run out faster.

Bingolotto•20m ago

Talked to Claude earlier today and Opus 4.7 cost up to 35% more.

technotony•17m ago

Not only that but they seem to have cut my plan ability to use Sonnet too. I have a routine that used to use about 40% of my 5 hour max plan tokens, then since yesterday it gets stopped because it uses the whole 100%. Anyone else experience this?

mfro•9m ago

yeah it seems like sonnet 4.6 burns thru tokens crazy fast. I did one prompt, sonnet misunderstood it as 'generate an image of this' and used all of my free tokens.

jmward01•17m ago

Claude code seems to be getting worse on several fronts and better on others. I suspect product is shifting from 'make it great' to 'make it make as much money for us as possible and that includes gathering data'.

Recently it started prompting me for feedback even though I am on API access and have disabled this. When I did a deep dive of their feedback mechanism in the past (months ago so probably changed a lot since then) the feedback prompt was pushing message ids even if you didn't respond. If you are on API usage and have told them no to training on your data then anything pushing a message id implies that it is leaking information about your session. It is hard to keep auditing them when they push so many changes so I am now 'default they are stealing my info' instead of believing their privacy/data use policy claims. Basically, my level of trust is eroding fast in their commitment to not training on me and I am paying a premium to not have that happen.

thibran•14m ago

For me there is no point in using Claude Opus 4.7, it's too expensive since it does not do 100% of the job. Since AI can anyway only do 90% of most tasks, I can use another model and do the remaining 15-30% myself.

speedgoose•13m ago

The "multiplier" on Github Copilot went from 3 to 7.5. Nice to see that it is actually only 20-30% and Microsoft wanting to lose money slightly slower.

https://docs.github.com/fr/copilot/reference/ai-models/suppo...
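
For scale, the multiplier change quoted above can be put next to the 20-30% token figure with a line of arithmetic. This is a sketch only: it takes the 3 and 7.5 multipliers from the comment at face value, and the helper name is illustrative.

```python
def relative_increase(old: float, new: float) -> float:
    """Fractional increase going from old to new (0.5 means +50%)."""
    return new / old - 1


copilot_increase = relative_increase(3.0, 7.5)
print(f"Copilot multiplier: +{copilot_increase:.0%}")
```

That is an increase of 150%, several times larger than the 20-30% token increase discussed in the thread.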

Someone1234•5m ago

Yep, and I just made a recommendation that was essentially "never enable Opus 4.7" to my org as a direct result. We have Opus 4.6 (3x) and Opus 4.5 (3x) enabled currently. They are worth it for planning.

At 7.5x for 4.7, heck no. It isn't even clear it is an upgrade over Opus 4.6.

therobots927•13m ago

As a regular listener of Ed Zitron this comes as absolutely no surprise. Once you understand the levels of obfuscation available to anthro / OAI you will realize that they have almost certainly hit a model plateau ~1 year ago. All benchmark improvements since have come at a high compute cost. And the model used when evaluating said benchmarks is not the same model you get with your subscription.

This is already becoming apparent as users are seeing quality degrade which implies that anthropic is dropping performance across the board to minimize financial losses.

montjoy•8m ago

It appears that they are testing using Max. For 4.7 Anthropic recognizes the high token usage of max and recommends the new xhigh mode for most cases. So I think the real question is whether 4.7 xhigh is “better” than 4.6 max.

> max: Max effort can deliver performance gains in some use cases, but may show diminishing returns from increased token usage. This setting can also sometimes be prone to overthinking. We recommend testing max effort for intelligence-demanding tasks.

> xhigh (new): Extra high effort is the best setting for most coding and agentic use cases

Ref: https://platform.claude.com/docs/en/build-with-claude/prompt...

dcrazy•2m ago

Inserting an xhigh tier and pushing max way out has very “these go to 11” vibes.

omega3•6m ago

Contrary to people here who feel the price increases, reduction of subscription limits, etc. are the result of the Anthropic models being more expensive to run than the API & subscription revenue they generate, I have a theory that Anthropic has been in the enshittification & rent-seeking phase for a while, in which they will attempt to extract as much money out of existing users as possible.

Commercial inference providers serve Chinese models of comparable quality at 0.1x-0.25x. I think Anthropic realised that the game is up and they will not be able to hold the lead in quality forever so it's best to switch to value extraction whilst that lead is still somewhat there.

CharlesW•4m ago

> Commercial inference providers serve Chinese models of comparable quality…

"Comparable" is doing some heavy lifting there. Comparable to Anthropic models in 1H'25, maybe.

ndom91•4m ago

`/model claude-opus-4-6`
