Especially concerning since we just had an npm phishing attack and people can't tell.
But less tongue-in-cheek, yeah Anthropic definitely has reliability issues. It might be part of trying to move fast to stay ahead of competitors.
B. Let's just say I didn't write the most robust javascript decompilation/deminification engine in existence solely as an academic exercise :)
Gemini never goes down, OpenAI used to go down once in a while but is much more stable now, and Anthropic almost never goes a full week without throwing an error message or suffering downtime. It's a shame because I generally prefer Claude to the others.
But even when the API is up, all three have quite high API failure rates, such as tool calls not responding with valid JSON, or API calls timing out after five minutes with no response.
Definitely need robust error handling and retries with exponential backoff because maybe one in twenty-five calls fails and then succeeds on retry.
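To make that concrete, here's a minimal retry-with-exponential-backoff sketch in TypeScript. `callModel` is a hypothetical stand-in for whatever client call you actually make, and the attempt counts and delays are illustrative rather than tuned recommendations:

    // Minimal sketch: retry a flaky API call with exponential backoff and jitter.
    // `callModel` is a hypothetical placeholder for your actual client call;
    // maxAttempts and baseDelayMs are illustrative values, not recommendations.
    async function withRetries<T>(
      fn: () => Promise<T>,
      maxAttempts = 5,
      baseDelayMs = 500,
    ): Promise<T> {
      let lastError: unknown;
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
          return await fn();
        } catch (err) {
          lastError = err;
          // Exponential backoff with jitter: ~0.5s, 1s, 2s, 4s... plus random noise.
          const delay = baseDelayMs * 2 ** attempt + Math.random() * 250;
          await new Promise((resolve) => setTimeout(resolve, delay));
        }
      }
      throw lastError;
    }

    // Usage: wrap the flaky call; in my experience most failures succeed on the first retry.
    // const result = await withRetries(() => callModel(prompt));

For the five-minute-timeout case you'd also want a client-side timeout (e.g. an AbortController) inside `fn` so a hung request counts as a failure and gets retried instead of blocking.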
It’s like every other day, the moment US working hours start, AI (in my case I mostly use Anthropic, others may be better) starts dying or at least getting intermittent errors.
In EU working hours there’s rarely any outages.
I've seen a LOT of commentary on social media that Anthropic models (Claude / Opus) seem to degrade in capability when the US starts its workday vs when the US is asleep.
> Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.
the statement is carefully worded to avoid the true issue: an influx of traffic resulting in service quality unintentionally degrading
I was trying to say that systemic issues (such as load capacity) seem to degrade the models during US working hours, and this has been noticed by a non-zero number of users (myself included).
Glad I switched.
A comment from last time that had me chuckling.
(nit.. please don't actually do this).
I've noticed a trend with their incident reports... "all fixed", basically. Little thought or words given to prevention.
https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...
edit: before some drive-by datamining nerd thinks I do/did SRE for Google, no
To be fair, too, it's likely been mentioned. I'm biased towards an unreasonable standard due to my line of work.
A status page without some thorough history is glorified 'About Us' :P
Every status page incident at every normal company everywhere in the world links you to the postmortem and the steps taken to prevent a recurrence. Here are a few examples:
https://status.gitlab.com/ -> https://status.gitlab.com/pages/history/5b36dc6502d06804c083...
https://status.hetzner.com/ -> https://status.hetzner.com/incident/2e715748-fddd-427b-a07b-...
https://www.githubstatus.com/ -> https://www.githubstatus.com/incidents/mj067hg9slb4
https://bitbucket.status.atlassian.com/ -> https://bitbucket.status.atlassian.com/incidents/4mcg46242wz...
It's literally standard for a status page to communicate both the root cause and the action plan for preventing it in the future. Sure, while an incident is still ongoing, the status page entry doesn't include the postmortem and prevention steps, but those get added later.
Being so overconfidently wrong reminds me of an LLM.
The shoulders of giants we stand on are slumped in sadness.
Or is there a better alternative to address this availability concern?
> This incident affects: claude.ai, console.anthropic.com, and api.anthropic.com.