Currently nothing on the status: https://status.claude.com/
EDIT: Now they show the issue, kudos to them! Transparency is key to building trust. Nobody expects a perfect service; thanks, Claude team, for your efforts.
When Claude took an extra day off, he forgot to report to the dashboard when he would be unavailable/unresponsive, which is probably why people here are complaining about the missing status update.
Wonder where I have seen that before?
The one that pissed me off the most was Gemini very clearly displaying 1) "user cancelled request" in the Gemini chat app and 2) "user quota reached" from the API. Both were blatant lies. In the latter case, you could find the actual cause, a global quota, buried later in the error message. I don't know why there isn't more outrage. I'm guessing this sort of behavior is not new, but it's never been this visible to me.
Probably the most damning fact about LLMs is just how poorly written their parent companies' systems are.
It's actually about the same speed once you account for how much more responsive my local system is than Anthropic's SaaS infrastructure.
I've tried different harnesses, built my own, etc.
They are reasonably close to Haiku? Maybe?
Claims to the tune of "this 0.5B local model running on my phone is almost as good as [large expensive model]" are common but greatly exaggerated, it's simply not true beyond the most basic use cases.
Only the much larger models (such as the 744B GLM-5) manage to come close, but nobody's running those locally.
I'm pretty sympathetic to Anthropic/OpenAI just because they are scaling a pretty new technology by 10x every year. It is too bad Google isn't trying to compete on coding models, though; I feel like they'd do way better on the infra and stability side.
Recently I wrote a data transformation pipeline and I added a note that the whole pipeline should be idempotent. I asked Claude to prove it or find a counterexample. It found one after 25 minutes of thinking; I reasonably estimate that it would take me far longer, perhaps one whole day. I couldn’t care less about using Claude to type code I already knew.
Lol no, I've yet to find a model with those properties. Sounds like a fast track to AI psychosis.
The domain I work in doesn't have enough public documentation for these models to be particularly helpful without a lot of handholding though.
Documentation is helpful to describe high-level intentions, but the beauty is when you have access to source code. Now a good model can derive behavior from implementation instead of docs which are inherently limited.
I implemented the luks+btrfs part by hand a few years ago, and I resurrected the project a couple of months ago. Using the source code as a local reference, Claude discovered so many major cases I had missed, especially in unhappy-path scenarios, and even in my own hand-written tests. It also helped me set up an amazing NixOS VM test system, including reproduction tests against the libraries to see what they do in weird, undocumented cases.
So I think "tasks beyond our intellect (and/or time and energy)" can be fitting. Otherwise I'd only be capable of polishing this project if luks+btrfs+systemd were specifically my day job. I just can't fit that much in my head and working memory.
Though my workflow always starts in plan mode, where Claude is clearly more thorough (which is the reason it takes 10x as long as going straight to impl). I rarely skip it.
Can you provide actual context to what was beyond your ability and how you're able to determine if the solution was correct?
I keep finding that these comments referencing the "magical incantation" tend to be full of hot air. Maybe yours is different.
I had hundreds of unit tests that did not trigger an assertion I added for idempotency. Claude wrote one that triggered an assertion failure. Simple as that. A counterexample suffices.
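To make the shape of that concrete, here is a minimal sketch of such a counterexample test in Python; `transform` is a toy stand-in for the real pipeline, not the author's code:

```python
# A transformation that looks harmless but is not idempotent:
# it appends a suffix on every pass, so a second run changes data again.
def transform(records):
    return [{**r, "value": r["value"] + "_clean"} for r in records]

def test_pipeline_is_idempotent():
    data = [{"id": 1, "value": "raw"}]
    once = transform(data)
    twice = transform(once)
    # Fails for this toy transform -- exactly the kind of assertion
    # failure that serves as a counterexample.
    assert twice == once, "pipeline is not idempotent for this input"
```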
How have you set things up to have a good experience?
Once it can search for factual information online, the smaller model size becomes less noticeable.
I've got an AMD 395+ with 128GB, so running a ~46GB model leaves me about 85k tokens of context. That handles copy/paste/find/replace work easily; it mocks up new components; it can wire in some functionality, but that's usually at its limits and requires more debugging.
I've been looking at how to schedule it using systemd to keep a wiki up to date on a long-running project, and to break the "blank page" problem when extending behaviors in a side project.
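A minimal sketch of that scheduling, assuming a user-level systemd timer; the unit names, script path, and nightly cadence are placeholders, not details from the comment above:

```ini
# ~/.config/systemd/user/wiki-update.service (hypothetical)
[Unit]
Description=Refresh the project wiki with the local model

[Service]
Type=oneshot
ExecStart=%h/bin/update-wiki.sh

# ~/.config/systemd/user/wiki-update.timer (hypothetical)
[Unit]
Description=Run the wiki refresh nightly

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with `systemctl --user enable --now wiki-update.timer`; `Persistent=true` makes a missed run fire on the next boot.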
I understand some of these larger models can do things faster and smarter, but I don't see how they can implement novel functionality required for the type of app I'm concerned with. If I just wanted to make endless CRUD or TODO apps, I'm betting I could figure out a loop that's mostly hands off.
> Probably the most damning fact about LLMs is just how poorly written their parent companies' systems are
I have been doing some work related to MCP and found gaps in the implementations in Claude and Codex. This is a relatively simple, well-defined spec, and both Claude Code and Codex CLI have incomplete or incorrect implementations. During this investigation, I checked the CC repo and noticed they had 5000+ issues open. Out of curiosity, I skimmed through them, and many point to regressions, real bugs, simple changes, etc. Maybe they have some internal tracker they are using, but you would think a company with functionally unlimited tokens and access to the best models would be able to use those tokens to get their own house in order.
My sense now is that the industry needs to generate a lot of hype right now, hence the showmanship like the kernel compiler and the agent swarms building a semi-functional browser, etc. Yet their own tooling has not correctly implemented their own protocol (MCP). They need all of us to believe that these agents are more capable than they actually are; the more piles of tangled code you write and the more discipline you cede to their LLMs, the more dependent you are on those LLMs to even know what the code is doing. At some point, teams become incapable of teasing the code apart anymore because no one understands it.
Peeking at the issues in the repos and seeing big gaps in functionality, like Codex's missing support for MCP prompts and resources, is like looking behind the curtain at reality.
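For reference, prompts and resources are first-class parts of the MCP spec alongside tools. A rough Python sketch of probing a stdio MCP server for them; the server command is a placeholder, and the exchange assumes the spec's newline-delimited JSON-RPC framing:

```python
import json
import subprocess

# Placeholder command; point this at any MCP stdio server.
proc = subprocess.Popen(["my-mcp-server"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)

def send(msg):
    # MCP's stdio transport frames messages as newline-delimited JSON-RPC 2.0
    proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()

def request(rid, method, params=None):
    send({"jsonrpc": "2.0", "id": rid, "method": method, "params": params or {}})
    return json.loads(proc.stdout.readline())

request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "probe", "version": "0.1"},
})
send({"jsonrpc": "2.0", "method": "notifications/initialized"})

# Both methods are defined by the spec; an incomplete client simply
# never issues them.
print(request(2, "prompts/list"))
print(request(3, "resources/list"))
```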
This seems like a popular take, but I think it's the other way around. Them dogfooding cc with cc is proof that it can work, and that "code quality" doesn't really matter in the end.
Before cc, claude.ai (their equivalent of ChatGPT) was meh. They were behind in features, behind in users, behind in mindshare. cc took them from "weirdos who use AI for coding" to "wait, you're NOT using cc? you freak" in ~1 year. And cc is a very big part of them reaching $1-2B monthly revenue.
Yes, it's buggy. Yes, the code is a mess (as per the leak, etc.). But it's also the most used coding harness. And on the technical side, having cc as early as they did helped them immensely: it brought users, real-usage data, and real-usage signals. They trained the models on that data, and trained the models in sync with the harness. And it shows; their models are consistently the highest ranked, both on benchmarks and on "vibes" from coders. Without cc, they would have lacked that real-world data.
And if you look at the competition, it's even clearer. Goog is kinda nowhere with their gemini-cli, is all over the place with their antigravity-ex-windsurf, and while they have really good generalist models, the mindshare for coding is just not there. Same for oAI: they have an open-source, rust-based, "solid" CLI, and they have solid models (esp. in code review, planning, architecture, bug fixing, etc.), but they are not #1. Claude is, with their cc.
So yeah, I think it's really the other way around. Having a vibe-coded, buggy, bad-code solution, but being the first to have it, the first to push it, and the first to keep iterating on it, is really what sets them apart. Food for thought on the future and where coding is headed.
Edit: But the status page - at least as of now - is clearly communicating elevated error rates.
He'll be back to work by tomorrow.
https://gist.github.com/ManveerBhullar/7ed5c01a0850d59188632...
A simple script I use to toggle which backend my Claude Code is using.
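The gist is the source of truth; as a hedged sketch of the general idea, assuming Claude Code picks up an `env` map from `~/.claude/settings.json` and that `ANTHROPIC_BASE_URL` selects the endpoint, a toggle could look like this (the alternate URL is a placeholder):

```python
#!/usr/bin/env python3
"""Flip Claude Code between the default Anthropic endpoint and an
alternate backend by rewriting ~/.claude/settings.json."""
import json
from pathlib import Path

SETTINGS = Path.home() / ".claude" / "settings.json"
DEFAULT = "https://api.anthropic.com"
ALTERNATE = "https://my-proxy.example.com"  # placeholder backend

settings = json.loads(SETTINGS.read_text()) if SETTINGS.exists() else {}
env = settings.setdefault("env", {})
current = env.get("ANTHROPIC_BASE_URL", DEFAULT)
env["ANTHROPIC_BASE_URL"] = ALTERNATE if current == DEFAULT else DEFAULT
SETTINGS.write_text(json.dumps(settings, indent=2))
print(f"Claude Code backend now: {env['ANTHROPIC_BASE_URL']}")
```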
"Wait, I'm editing the wrong sections. The edit tool tried to match but replaced with different prop names than what was in the file. Let me re-read the file and understand the current state properly."
And of course, models are not 1-to-1; they have different strengths and weaknesses. I know I probably won't get the same quality of plan-mode output. It's a tradeoff.
e.g., you can run Claude models on AWS Bedrock, giving you provider choice for the same model. Whether you need model agnosticism beyond that point is a very different question.
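A minimal sketch of that with boto3's Converse API; the region and model ID are examples, and availability varies by account and region:

```python
import boto3

# Same Claude model, reached through AWS rather than Anthropic directly.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello from Bedrock"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```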
Is anyone doing this for personal dev who isn't token-fed by an employer? Coding plans are subsidized for a reason, right? If I did API usage through a cloud provider, I'd be out tens of thousands already.
I put my Amex details into OAI, I get tokens, it just works. I really don't understand what the hell is going on with Claude. The $200/mo thing is so confusing to me. I'd rather just buy however many tokens I plan to use. $200 worth of OAI tokens would go a really long way for me (much longer than a month), but perhaps I'm holding it wrong.
For really old programmers: this is like when Computer Literacy bookstore was closed.
Loads of people cancelled their subscriptions.
Should be the least load they have been under in months. Yet unreliable.
Crazy that people are going with their benchmaxxed models.