frontpage.

AI Darwin Awards Launch

https://www.theregister.com/2025/09/09/ai_darwin_awards/
1•jjgreen•1m ago•0 comments

Ask HN: E-ink devices with real AI/LLM integration?

1•arbayi•1m ago•0 comments

Treat Your Career Like a Product

https://www.leadinginproduct.com/p/treat-your-career-like-a-product
1•benkan•2m ago•0 comments

I'm going to buy bybit you add me to the account verified

1•Fahadidris•3m ago•0 comments

QuantumScape and PowerCo Debut Solid-State Batteries in Ducati Motorcycle

https://www.quantumscape.com/quantumscape-and-powerco-debut-solid-state-batteries-in-ducati-motor...
1•taubek•3m ago•0 comments

Show HN: I built a tool to create ad variations to fight ad fatigue

https://vibecarousels.com/
1•Lindadao•4m ago•0 comments

Why monsoon rains have been so deadly in India this year

https://www.bbc.com/news/articles/c9wdr08wq2zo
1•Brajeshwar•6m ago•0 comments

Aurora season and the Russell-McPherron effect

https://earthsky.org/sun/aurora-season-auroras-equinox-connection/
1•NKosmatos•8m ago•1 comments

Politics, Payrolls, and Policy: Markets Brace for a Volatile September

https://nxtribes.com/blog/politics-payrolls-and-policy-markets-brace-for-a-volatile-september
1•douglas5•9m ago•1 comments

US tech companies enabled the surveillance and detention in China

https://apnews.com/article/chinese-surveillance-silicon-valley-uyghurs-tech-xinjiang-8e000601dadb...
1•c420•11m ago•1 comments

Wikimedia will sunset separate mobile domains

https://www.mediawiki.org/wiki/Requests_for_comment/Mobile_domain_sunsetting/2025_Announcement
1•Recursing•12m ago•0 comments

Show HN: Smile – an open source language for structuring prompts

https://github.com/DrThomasAger/smile
1•DrThomasAger•13m ago•0 comments

Anthropic Judge Rejects $1.5B AI Copyright Settlement

https://news.bloomberglaw.com/ip-law/anthropic-judge-blasts-copyright-pact-as-nowhere-close-to-done
2•nobody9999•13m ago•1 comments

Built an AI news agent that stops information overload

https://reckoning.dev/posts/news-agent-reactive-intelligence
2•sadanand4singh•14m ago•1 comments

Indexing Jsonb in PostgreSQL

https://www.crunchydata.com/blog/indexing-jsonb-in-postgres
1•fanf2•18m ago•0 comments

Seedream 4.0

https://seed.bytedance.com/en/seedream4_0
2•BoorishBears•20m ago•0 comments

Dev3000 – The browser for AI-based development by Vercel

https://d3k.vercel.sh/
1•mustaphah•21m ago•0 comments

RSA Signhash exception since last W11 update

https://github.com/microsoft/SymCrypt/issues/52
1•osivertsson•21m ago•0 comments

Disaggregation: A New Architecture for Cloud Databases

http://muratbuffalo.blogspot.com/2025/09/disaggregation-new-architecture-for.html
1•furkansahin•22m ago•0 comments

Show HN: CuckooTimer – Cuckoo Clock Productivity Timer

https://cuckootimer.com/
2•bribri•26m ago•0 comments

One mother for two species via obligate cross-species cloning in ants

https://www.nature.com/articles/s41586-025-09425-w
2•mighty_plant•26m ago•0 comments

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

https://simonwillison.net/2025/Sep/9/apollo-ai-adoption/
2•simonw•27m ago•0 comments

Chinese robotics firm Unitree eyeing $7B IPO valuation

https://www.reuters.com/business/autos-transportation/chinese-robotics-firm-unitree-eyeing-7-bill...
3•defrost•29m ago•0 comments

Caterham Seven 160 achieves 57.6mpg (24.5kml)

https://www.greencarguide.co.uk/2014/01/caterham-seven-160-57-mpg/
3•palmfacehn•29m ago•3 comments

Outcome-Based Exploration for LLM Reasoning

https://arxiv.org/abs/2509.06941
1•badmonster•30m ago•0 comments

Cyborgtest – the chill way to evolve your QA game from manual to automated

https://github.com/CyborgTests/playwright-manual-step-automation
10•epaminond•31m ago•1 comments

TailGuard: A way to connect your home WireGuard router into Tailscale via Docker

https://github.com/juhovh/tailguard
5•daduke•35m ago•2 comments

A Novel Technique for SQL Injection in PDO's Prepared Statements

https://slcyber.io/assetnote-security-research-center/a-novel-technique-for-sql-injection-in-pdos...
1•Bogdanp•41m ago•0 comments

iPhone Isn't Listening to You. But the Truth Is Worse

https://www.cnet.com/tech/services-and-software/features/no-your-iphone-isnt-listening-to-you-her...
2•koolhead17•45m ago•0 comments

Show HN: Quadrant – OKRs with Focus for small teams

https://mygoals.io/
1•davkh•45m ago•0 comments

Incident Report for Anthropic

https://status.anthropic.com/incidents/72f99lh1cj2c
66•bashtoni•7h ago

Comments

paradite•4h ago
Announcement on X:

https://x.com/claudeai/status/1965208247302029728

viraptor•3h ago
https://xcancel.com/claudeai/status/1965208247302029728
jondwillis•3h ago
It’s like a thread of rabid animals replying. So much unbridled entitlement and frustration without any hope of recourse.

I’d almost say it’s hard to understand how people don’t realize that Grok has all of the same power and incentive structures behind it as Anthropic’s cloud models.

watwut•3h ago
Grok has Musk behind it, and that has ... much worse implications than the backgrounds of the other companies. Not that those would be saints, but they are not openly like Musk.
metadat•3h ago
Do they credit your account if you were impacted? Or is it just "sorry 'bout 'dat month of trash"?

Unfortunate timing, as I am rooting for Anthropic as the underdog, but I feel compelled to use whatever works best. Since mid-August I've demoted Claude to only putting out fires on UIs and am getting amazing results with GPT-5 for everything else. Given the nonstop capacity warnings on the Codex CLI, I might not be the only one.

behnamoh•3h ago
> Unfortunate timing, as I am rooting for Anthropic as the underdog...

Give me a break... Anthropic has never been the underdog. Their CEO is one of the most hypocritical people in the field. In the name of "safety" and "ethics", they have gotten away with not releasing even a single open-weight (or open-source) model, while calling out OpenAI as the "bad guys" and constantly trying to sabotage pro-competition and pro-consumer AI laws in the US.

watwut•3h ago
Well OpenAI and Sam Altman are "bad guys". At least that part is true. It is just that Anthropic is not better.
behnamoh•3h ago
> Well OpenAI and Sam Altman are "bad guys".

Define "bad". Sama is a businessman and at least doesn't pretend to be a saint like Amodei does.

testfrequency•2h ago
If you know sama you’d know damn well he’s not nice, but believe whatever you want of course
pdksam•2h ago
You don't need to be nice to be a good businessman
manojlds•48m ago
Oh, as in being upfront about being a for-profit organization from the get-go?
cma•3h ago
They were also the first to work with the NSA, years before their policy change to support military uses, according to Dean Ball, former White House AI policy advisor, in an interview with Nathan Labenz.
nextworddev•2h ago
TIL
rfoo•1h ago
You are absolutely right! But China bad Dario good Anthropic the only firm caring about AI safety /s
paulddraper•3h ago
Rooting for the underdog is a moving target.
andy_ppp•3h ago
So they aren’t saying what the bug was that caused this issue? I would love a more detailed explanation; what could possibly cause the model degradation, apart from potentially routing queries to the wrong model?
qsort•3h ago
If I had to guess, something related to floating point operations. FP addition and multiplication are commutative but not associative, so changing the order of operations can change the results.
allisdust•3h ago
Opus has been utter garbage for the last month or so.
Aeolun•2h ago
I’ve definitely been more annoyed with it recently. I never used to have to curse at it for taking the lazy way out.

"Oh, let me just fix that!" (comments out the test)

speedgoose•2h ago
When it happens, I stop it and tell it that we aren’t working for one of the IT consulting companies I hate, and a "you are absolutely right" later we are back on track.
CuriouslyC•2h ago
OH MY GOD YES! I actually had it inject synthetic data into my experiments! I had to go back through all my work and re-validate so much to make sure I found all instances of it (it happened in a few different projects).

I now have a system of automated tripwires on all experimental scripts that notifies me and terminates the experiment when any sort of statistical irregularity is detected.

ares623•3h ago
One man’s bug is another man’s load balancing experiment.
slacktivism123•3h ago
>Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

Sure. I give it a few hours until the prolific promoters start to parrot this apologia.

Don't forget: the black-box nature of these hosted services means there's no way to audit for changes to quantization or model re-routing, nor any way to tell what you're actually getting during these "demand" periods.

mccoyb•3h ago
Here’s a report: Claude Code (the software) is getting worse by the day.

Removing the displayed token consumption rates (which made it possible to tell when tokens were actually being sent/received!) … sometimes hiding the compaction percentage … the incredible lag on ESC interruption in long-running sessions, the now-broken clearing of context window content on Task tool usage …

Who the fuck is working on this software and do they actually use it themselves?

Maybe the quality of Claude Code on any given day is indicative of whether their models are degraded …

yurifury•3h ago
Use /config and enable verbose output to see the token consumption/usage per message.
CuriouslyC•2h ago
Claude Code is indeed legit bad. You'd never know this was a billion-dollar company from the mess of JavaScript they hacked together. You have to periodically close and re-open the client, because otherwise it starts to lag the system from constantly scanning and saving a big JSON file; they didn't think to shard their storage or use a database. I have 128GB of RAM on my workstation, and running 8 Claude Code instances at once sometimes causes heavy thrashing and desktop responsiveness issues... That's just insane.

Needless to say, I built my own agent (just needs a good web UI, last step!). The only thing keeping me with Anthropic right now is the economics of the plan; my inference bill would be a second mortgage without it.

pton_xd•1h ago
Lately I've noticed even the Claude web interface for chat is laggy on my 16 core / 32 GB RAM laptop. How is that possible?! It's just text!
behnamoh•3h ago
Anthropic only has _one_ product that people want: Claude Code. Everything else about their offerings sucks compared to the competition:

- shitty voice to text (why not just use Whisper at this point?)

- clunky website

- no image/video generation models

- DeepResearch sucks big time

- "Extended Thinking" doesn't seem to do much "thinking". I get the same results without it.

- API too expensive for what it is.

- No open-weight model to boost their reputation. Literally every other player has released an open model at this point.

viraptor•3h ago
That's a weird summary, given how many people around me use Claude with Cursor and still prefer it over GPT-5. I don't think you can claim a complete view of what their customers want.
visarga•3h ago
This is why it is hard to take out a subscription or a dependency on them if they degrade the services willy-nilly. A bait-and-switch tactic.

In Cursor I am seeing varying degrees of delays after exhausting my points for on-demand usage. Some days it works well; other days it just inserts a 30s wait on each message. What am I paying for? You never know when you buy.

behnamoh•3h ago
You should never buy annual AI subs. This field moves fast, and companies often change their ToS. Poe.com did the same and I was out (one day they decided to change the monthly quotas for the SOTA models and turned off the good old GPT-4, replacing it with GPT-4-Turbo, which was quantized and bad).
andrewinardeer•2h ago
Could you not ask for, and be entitled to, a refund for your remaining time on an annual subscription if the ToS change n months into it?
bakugo•59m ago
Could you ask them? Sure, but good luck getting it. In theory, forcefully changing the terms of a service after payment without offering a refund should clearly not be allowed, but in practice, it very much is unless you're willing to waste disproportionate amounts of money taking them to court.
troupo•3h ago
One of the many reasons why any advice du jour on "just use this methodology to make agentic coding produce amazing results" is utter crap.
naiv•3h ago
I think this is directly related to https://x.com/sama/status/1965110064215458055

And I think it was 100% on purpose that they degraded the model performance as Claude Code got so popular and they either ran out of capacity or were losing money too fast.

But now that people are fleeing to Codex, which improved so much in the meantime, they had to act.

deepdarkforest•1h ago
They will probably also release Sonnet 4.2 or something soon, to make people jump back in to try it and, hopefully, stick around again.
BhavdeepSethi•3h ago
You're absolutely right! The degraded model quality finally pushed me to stop paying for the max plan. Still on the Pro for now.
irthomasthomas•2h ago

  "we often make changes intended to improve the efficiency and throughput of our models.." 
https://status.anthropic.com/incidents/h26lykctfnsz

I thought Anthropic said they never mess with their models like this? Now they do it often?

mccoyb•2h ago
I read this as changes to quantization and batching techniques. The latter shouldn’t affect logits; the former definitely will.
jjani•1h ago
They already have a track record of messing with internal system prompts (including those that affect the API), which obviously directly changes outputs given the same prompts. So in effect, they've already been messing with the models for a long time. It's well known among founders who run services based on their products that this happened; everyone who does long output saw the same. It happened around November last year. If you had a set of evals running that expected an output of e.g. 6k tokens in length on 3.5 Sonnet, overnight it suddenly started cutting off at <2k, ending the message with something like "(Would you like me to continue?)". This is on raw API calls.

Never seen or heard of (from people running services at scale, not just rumours) this kind of API behaviour change for the same model from OpenAI or Google. Gemini 2.5 Pro did materially change at the time of prod release, despite them claiming they had simply "promoted the final preview endpoint to GA", but in that case you can give them the benefit of it being technically a new endpoint. Still lying, but less severe.

simonw•35m ago
Can you expand on "messing with internal system prompts" - this is the first I have heard of that.
simonw•36m ago
Anthropic have frequently claimed that they do not change the model weights without bumping the version number.

I think that is compatible with making "changes intended to improve the efficiency and throughput of our models" - i.e. optimizing their inference stack, but only if they do so in a way that doesn't affect model output quality.

Clearly they've not managed to do that recently, but they are at least treating these problems as bugs and rolling out fixes for them.

simianwords•2h ago
There are loads of people who just tried Claude, left unimpressed, and moved on to something else. They would never know about this regression.

And this bad memory might stick for a while.

avishai2112•2h ago
What kind of incident report is this? "It's a bug, we fixed it!" - Anthropic
naiv•2h ago
The model providers should analyse the tone of the instructions.

Before I finally gave up on Claude Code, I noticed that I got more aggressive towards it the more stupid it got, as I could not believe how dumb it had become.

And I am sure I was not the only one.

stpedgwdgfhgdd•2h ago
This RCA is too vague: ‘a bug’.

I want to know how I could have been impacted.

fxtentacle•2h ago
My guess would be that they tried to save money with speculative decoding and had thresholds in the verification stage that were too loose.

As someone who has implemented this myself, I know that it’s pretty easy to make innocent mistakes there. And the only visible result is a tiny distortion of the output distribution, which only really becomes visible after analysing thousands of tokens. And I would assume that all providers are using speculative decoding by now, because it’s the only way to get good inference speed at scale.

As a quick recap: you train a small model to quickly predict the easy tokens, like filler words, so that you can jump over them in the recurrent decoding loop. That way, a serial model can emit multiple tokens per invocation, easily doubling throughput.

And the fact that they need lots of user tokens to verify that it works correctly would nicely explain why it took them a while to find and fix the issue.

metadat•1h ago
Speculative Decoding, for the uninitiated (like me..): https://research.google/blog/looking-back-at-speculative-dec...
buildbot•1h ago
Standard speculative decoding without relaxed acceptance has no accuracy impact, as far as I understand things. If you always run the verification, you always have the true target model output.