There is one caveat: you have to give the model well-thought-out constraints to guide it properly, and absolutely take the time to read all the thinking it's doing and not be afraid to stop the process whenever things go sideways.
People who just let Claude roam free on their repository deserve everything they end up with.
I think even with the worse limits people still hated it, but when you start to make the model dumber, either on purpose or inadvertently, that's when there's really no reason to keep using Claude anymore.
Max 5, sonnet for 95% of things. I never run out of tokens in a week and I use it for ~5-6 hours a day.
The product keeps getting worse so I will definitely evaluate options and possibly switch if management keeps screwing up the product.
Dear Anthropic:
Please, for the love of all things holy, NEVER change someone's defaults without INFORMING the end user first, because you will wind up with people confused, upset, and leaving your service.
I was worried about the quality of Anthropic's models varying and about Anthropic jacking up prices.
I don't think Claude Code is the best agent orchestrator and harness in existence but it's the most widely supported by plugins and skills.
I'm debating trying out Codex. From some people I hear it's "uncapped"; from others I hear they reached limits in short spans of time.
There's also the really obnoxious "trust me bro" documentation update from OpenClaw where they claim Anthropic is allowing OpenClaw usage again, but no official statement?
Dear Anthropic:
I would love to build a custom harness that just uses my Claude Code subscription. I promise I won't leave it running 24/7, 365. Can you please tell me how I can do this? I don't want to see some obscure tweet; make official blog posts or documentation pages to reflect policies.
Can I get whitelisted for "sane use" of my Claude Code subscription? I would love this. I am not dropping $2400 in credits for something I do for fun in my free time.
Plus is still very usable for me though. I have not tried Claude Pro in quite a while and if people are complaining about usage limits I know it's going to be a bad time for me. I had to move up from Claude Pro when the weekly limits were introduced because it was too annoying to schedule my life around 5hr windows.
I started using codex around December when I started to worry I was becoming too dependent on Claude and needed to encourage competition. Codex wasn't particularly competitive with Claude until 5.4 but has grown on me.
The only thing I really care about is that whatever I'm using "just works" and doesn't hurt limits, and Claude Code has been flaky as all hell on multiple fronts ever since everyone discovered it during the Pentagon flap. So I tend to reach for ChatGPT and codex at the moment because it will "just work" and there's a good chance Claude will not.
They won't ever be SOTA due to money, but "last year's SOTA" when it costs 1/4 or less, may be good enough. More quantity, more flexibility, at lower edge quality. It can make sense. A 7% dumber agent TEAM Vs. a single objectively superior super-agent.
That's the most exciting thing going on in that space. New workflows opening up not due to intelligence improvements but cost improvements for "good enough" intelligence.
Why should anyone waste time on poorer results? I'd rather pay my $200/mo because my time matters. I'm not a poor college student anymore, and I need more return on my time.
I'm not shitting on open weights here - I want open source to win. I just don't see how that's possible.
It's like Photoshop vs. Gimp. Not only is the Gimp UX awful, but it didn't even offer (maybe still doesn't?) full bit depth support. For a hacker with free time, that's fine. But if my primary job function is to transform graphics in exchange for money, I'm paying for the better tool. Gimp is entirely a no-go in a professional setting.
Or it's like Google Docs / Microsoft Office vs. LibreOffice. LibreOffice is still pretty trash compared to the big tools. It's not just that Google and Microsoft have more money, but their products are involved in larger scale feedback loops that refine the product much more quickly.
But with weights it's even worse than bad UX. These open weights models just aren't as smart. They're not getting RLHF'd on real world data. The developers of these open weights models can game benchmarks, but the actual intelligence for real world problems is lacking. And that's unfortunately the part that actually matters.
Again, to be clear: I hate this. I want open. I just don't see how it will ever be able to catch up to full-featured products.
The trick is going to be recognizing tasks which have some ceiling on what they need and which will therefore eventually be doable by open models, and those which can always be done better if you add a bit more intelligence.
When was the last time you used any of them? Because a lot of people are actively using them for 9-5 work today, and I count myself in that group. That opinion feels outdated, like it was formed a year or more ago and held onto. Or based on highly quantized versions and/or small non-Thinking models.
Do you really think Qwen3.6 for a specific example is "50%" as good as Opus4.7? Opus4.7 is clearly and objectively better, no debate on that, but the gap isn't anywhere near that wide. I'd call "20%" hyperbole, the true difference is difficult to exactly measure but sub-10% for their top-tier Thinking models is likely.
The breakeven at this price is 6 minutes of productivity per work day for an engineer making $200k.
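The arithmetic behind that figure can be sketched roughly as follows. This assumes "this price" means the $200/mo plan mentioned upthread, and the 2,080 working hours per year and 21 workdays per month are my own round-number assumptions, not anything the commenter stated.

```python
def breakeven_minutes_per_workday(
    monthly_price: float,
    annual_salary: float,
    hours_per_year: float = 2080,      # assumption: 52 weeks x 40 hours
    workdays_per_month: float = 21,    # assumption: typical working month
) -> float:
    """Minutes of saved engineer time per workday needed to pay for the plan."""
    salary_per_minute = annual_salary / hours_per_year / 60
    price_per_workday = monthly_price / workdays_per_month
    return price_per_workday / salary_per_minute


# Assumed inputs: $200/mo plan, $200k/yr salary.
minutes = breakeven_minutes_per_workday(200, 200_000)
print(f"{minutes:.1f} minutes per workday")
```

Under those assumptions it comes out just under 6 minutes, which matches the claim. Fully loaded employee cost is usually higher than salary, which would push the breakeven lower still.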
Who said so? GLM 5.1 is 90% Opus, at least. Some people are quite happy with Kimi 2.6 too. I did not try Deepseek 4 yet but am also hearing it is as good as Opus. You might be confusing open source models with local models. It is not easy to run a 1.6T model locally, but they are not 50% of SOTA models.
Because in almost no real-world project is "programming time" the limiting factor?
This kind of rhetoric is not helpful. If you want to make a point, then make one, but this adds nothing to the conversation. Maybe open source models don't work for you. They work very well for me.
I'm not disagreeing per se, but if you think the benchmarks are flawed and "my real world usage" is more reflective of model capabilities, why not write some benchmarks of your own?
You stand to make a lot of money and gain a lot of clout in the industry if you've figured out a better way to measure model capability, maybe the frontier labs would hire you.
Starting closer to 40k if you want something that's practical. 10k can't run anything worthwhile for SDLC at useful speeds.
(If you are willing to let the machine work mostly overnight/unattended, with only incidental and sporadic human intervention, you could even decrease that memory requirement a bit.)
So you can run 1 agent locally on $1k to $3k hardware
They can run a fleet of thousands
Yes, it's possible to run tiny quantized models, but you're working with extremely small context windows and tons of hallucinations. It's fun to play with them, but they're not at all practical.
[†] The latest Qwen 3.6 whatever has been a noticeable improvement, and I'm not even at the point where I tweak settings like sampling, temperature, etc. No idea what that stuff does, I just use the staff picks in LM Studio and customize the system prompts.
Competition (OpenAI vs Anthropic is fun to watch) and open source will get us there soon I think.
For now. That doesn't really change the risk, that just means they are all hyper competitive right this moment, and so they are comparable. If one of them becomes king of the hill, nothing stops them from silently degrading or jacking prices.
The only shield is to not be dependent in the first place. That means keeping your skills sharp and being willing to pass on your knowledge to juniors, so they aren't dependent on these things.
Of course, many people are building their business on huge AI scaffolding. There's nothing they can do.
It's like dating apps. They don't want you to find a good match, because then you cancel the subscription.
Speaking of which:
https://www.cnbc.com/2026/04/24/deepseek-v4-llm-preview-open...
AI companies have the same incentive. Make it cheaper and people will use it more, making you more money (assuming your price is still above cost). And of course they have every reason to reduce their own costs.
Now I'm looking for an extremely simple open-source coding agent. Nanocoder doesn't seem to install on my Mac and it brings node-modules bloat, so no. Opencode seems not quite open-source. For now, I'm doing the work of the coding agent myself and using the llama_cpp web UI. Chugging along fine.
Even the FSF recognizes that non-copyleft licenses still follow the Freedoms, and therefore are still Free Software.
However, it's hard to justify Cursor's cost. My bill was $1,500/mo at one point, which is what encouraged me to give CC a try.
I got annoyed enough with Anthropic's weird behavior this week to actually try this, and got something workable up & running in a few days. My case was unique: there's no Claude Code for BeOS, or my older / ancient Macs, so it was easier to bootstrap & stitch something together if I really wanted an agentic coding agent on those platforms. You'll learn a lot about how models actually work in the process too, and how much crazy ridiculous bandaid patching is happening in Claude Code. Though you might also appreciate some of the difficulties that the agents / harnesses have to solve too. (And to be clear, I'm still using CC when I'm on a platform that supports it.)
As for the llama_cpp vs Claude Code delays - I've run into that too. My theory is API is prioritized over Claude Code subscription traffic. API certainly feels way faster. But you're also paying significantly more.
Heck two weeks ago i tried my hardest to hit my limit just to make use of my subscription (i sometimes feel like i'm wasting it), and i still only managed to get to 80% for the week.
I generally prune my context frequently though, each new plan is a prune for example, because i don't trust large context windows and degradation. My CLAUDE.md's are also somewhat trim for this same fear and i don't use any plugins, and only a couple MCPs (LSP).
No idea why everyone seems to be having such wildly different experiences on token usage.
First was the CC adaptive thinking change, then 4.7. Even with `/effort max` and keeping under 20% of 1M context, the quality degradation is obvious.
I don't understand their strategy here.
Here is a sample report that tries out the cheaper models + the newest Kimi2.6 model against the 5.4 'gold' testcases from the repo: https://repogauge.org/sample_report.
There's really no immediate solution to this other than letting the price float or limiting users; as capacity is built out, this gets better.
The market-leading technology is pretty close to "good enough" for how I'm using it. I look forward to the day when LLM-assisted coding is commoditized. I could really go for an open source model based on properly licensed code.
It does seem like the sweet spot between WallE and the destroyed earth in WallE.
(but I guess they're not really conflicting, if the "solution" involves upgrading to a higher plan)
I find it incredibly difficult to saturate my usage. I'm ending the average week at 30-ish percentage, despite this thing doing an enormous amount of work for (with?) me.
Now I will say that with pro I was constantly hitting the limit -- like comically so, and single requests would push me over 100% for the session and into paying for extra usage -- and max 5x feels like far more than 5x the usage, but who knows. Anthropic is extremely squirrely about things like surge rates, and so on.
I'm super skeptical of the influx of "DAE think Opus sucks now. Let's all move to Codex!" nonsense that has flooded HN. A part of it is the ex-girlfriend thing where people are angry about something and try to force-multiply their disagreement, but some of it legitimately smells like astroturfing. Like OpenAI got done paying $100M for some unknown podcaster and started hiring people to write this stuff online.
>I'm super skeptical of the influx of "DAE think Opus sucks now. Let's all move to Codex!" nonsense that has flooded HN. A part of it is the ex-girlfriend thing where people are angry about something and try to force-multiply their disagreement, but some of it legitimately smells like astroturfing. Like OpenAI got done paying $100M for some unknown podcaster and started hiring people to write this stuff online.
A lot of people are angry about the whole openclaw situation. They are especially bitter that when they attempted to justify exfiltrating the OAuth token to use for openclaw, nobody agreed with them that they had the right to do so, and sided with Claude that different limits for first-party use is standard. So they create threads like this, and complain about some opaque reason why Anthropic is finished (while still keeping their subscription, of course).
I did a 1:1 map of all my Claude Code skills, and it feels like I never left Opus.
Super happy with the results.
I tried Kimi 2.6 and it's almost comparable to Opus. Anthropic dropped the ball. I hope this is a sign that we are moving towards a future where model usage is a commodity with heavy competition on price/performance.
I hate enshittification and I hate seeing this happening to Claude Code right now.
Anthropic can't even scale their own infrastructure operations, because it does not exist and they do not have the compute; even when they are losing tens of billions and can nerf models when they feel like it.
Once again, local models are the answer, and Anthropic continues to get you addicted to their casino when you could be running your own cheaper slot machine, which would save you money.
Every time you go to Anthropic's casino, the house always wins.
I am certainly not saying people should “spend more money,” more like the Claude Code access in the Pro plan seems kind of like false advertising. Since it’s technically usable, but not really.
It's particularly noticeable when for a long time you could work an 8 hour day in codex on ChatGPT's $20/month plan (though they too started tightening the screws a couple of weeks back).
Strange how things can change!
GPT 5.4+ takes its time, considers even edge cases unprompted that in fact turn out to be correct, saves me subsequent error-hunting turns, and finally delivers. Plus no "this doesn't look like malware" or "actually wait" thinking loops for minutes over a one-liner script change.
GLM always feels like it's doing things smarter, until you actually review the code. So you still need the build/prune cycle. That's my experience anyway.
Oh wait, I don’t have to imagine. That’s what Anthropic does. A nice preview for what is in store for those who chose to turn off their brains and turn on their AI agents.
I use AI, but only what is free-of-charge, and if that doesn't cut it, I just do it like in the good old times, by using my own brain.
And by crikey do I empathise with the poor support in this article. Nothing has soured me on Anthropic more than their attitude.
Great AI engineers. Questionable command line engineers (but highly successful.) Downright awful to their customers.
I asked support: hey, I got nothing back, I tried prompting several times, used a ton of usage, and it gave no response. I'd just like the usage back. What I paid for I never got.
Just a bot response: we don't do refunds, no exceptions. Even in the case where they don't serve you what your plan should give you.
Like 3 weeks ago Qwen3-coder was the best coding LLM to run locally. I haven’t spent time since to figure out if anything is better.
You can also power Opencode with OpenRouter which lets you pay for any LLM à la carte.
I haven't seen anyone mention this publicly, but I've noticed that the same model will give wildly different results depending on the quantization. 4-bit is not the same as 8-bit and so on in compute requirements and output quality. https://newsletter.maartengrootendorst.com/p/a-visual-guide-...
I'm aware that frontier models don't work in the same way, but I've often wondered if there's a fidelity dial somewhere that's being used to change the amount of memory / resources each model takes during peak hours v. off hours. Does anyone know if that's the case?
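To make the 4-bit vs. 8-bit point concrete, here is a toy sketch of why fewer bits means both less memory and more error. It uses plain symmetric uniform quantization over random weights, which is an assumption for illustration only; real local-model formats use more elaborate group-wise schemes, and nothing here says anything about how hosted frontier models are served. The 7B parameter count is likewise hypothetical.

```python
import random


def quantize_roundtrip(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers, then back to floats."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


def weight_memory_gb(n_params, bits):
    """Memory for the weights alone (no activations, no KV cache), in GB."""
    return n_params * bits / 8 / 1e9


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

err4 = mean_abs_error(weights, 4)
err8 = mean_abs_error(weights, 8)
print(f"4-bit mean abs error: {err4:.4f}, 8-bit: {err8:.4f}")

# Hypothetical 7B-parameter model: half the memory at 4-bit, but coarser weights.
print(weight_memory_gb(7e9, 4), "GB vs", weight_memory_gb(7e9, 8), "GB")
```

The 4-bit round trip has a visibly larger error than the 8-bit one, which is the trade the comment is describing: the same model at a lower bit width is cheaper to run and measurably less faithful to the original weights.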
Then within the last few months everything changed and went to shit. My trust was lost. Behavior became completely inconsistent.
During the height of Claude's mental retardation (now finally acknowledged by the creators) I had an incident where CC ran a query against an unpartitioned/massive BQ table that resulted in $5,000 in extra spend because it scanned a table which should have been daily partitioned 30 times. 27 TB per scan. I recall going over and over the setup and exhaustively refining confidence. After I realized this blunder, I referred to it in the same CC session, "jesus fucking christ, I flagged this issue earlier" -- it responded, "you did. you called out the string types and full table scans and I said 'let's do it later.' That was wrong. I should have prioritized it when you raised it". Now obviously this is MY fault. I fucked up here, because I am the operator, and the buck stops with me. But this incident really galvanized that the Claude I had come to vibe with so well over the last N months was entirely gone.
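For anyone checking the numbers, the figures in that story are roughly consistent with on-demand query pricing. The sketch below assumes a rate of $6.25 per TB scanned, which is my assumption about the billing rate and not something stated above; the 27 TB and 30 scans are taken from the comment.

```python
ASSUMED_PRICE_PER_TB_USD = 6.25  # assumption: on-demand rate, not stated in the comment


def full_scan_cost_usd(tb_per_scan: float, scans: int,
                       price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Cost of repeatedly scanning an unpartitioned table in full."""
    return tb_per_scan * scans * price_per_tb


cost = full_scan_cost_usd(tb_per_scan=27, scans=30)
print(f"${cost:,.2f}")  # 810 TB scanned in total
```

At that assumed rate, 30 full scans of 27 TB land a little over $5,000, in line with the bill described. A dry run or a per-query bytes-billed cap before letting an agent execute queries would surface this kind of scan size up front.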
We all knew it was making mistakes, becoming fully retarded. We all felt and flagged this. When Anthropic came out and said, "yeah ... you guys are using it wrong, it's a skill issue" I knew this honeymoon was over. Then recently when they finally came out and ack'd more of the issues (while somehow still glossing over how bad they fucked up?) it was the final nail. I'm done spending $ on the Anthropic ecosystem. I signed up for OpenAI pro $200/mo and will continue working on my own local inference in the meantime.
They could have just kept doing this - literally printing money. Literally: do absolutely nothing, go on vacation, profit $$$. So why did so much change? I think that the issue is they were trying to optimize CC for the monthly plan folks, the ones who are likely losing the company money, but API users became collateral damage.
WTF are y'all doing that chews tokens so fast? I mean, sure, I could spin up Gas Town and Beads and produce infinite busy work for the agents, but that won't make useful software, because the models don't want anything. They don't know what to build without pretty constant guidance. Left to their own devices, they do busy work. The folks who "set and forget" on AI development are producing a whole lot of code to do nothing that needed doing. And, a lot of those folks are proud of their useless million lines of code.
I'm not trying to burn as many tokens as possible, I'm trying to build good software. If you're paying attention to what you're building, there are so many points where a human is in the loop that it's unusual to run up against token limits.
Anyway, I assume that at some point they have to make enough money to pay the bills. Everything has been subsidized by investors for quite some time, and while the cost per token is going down with efficiency gains in the models/harnesses and with newer compute hardware tuned for these workloads, I think we're all still enjoying subsidized compute at the moment. I don't think Anthropic is making much profit on their plans, especially with folks who somehow run right at the edge of their token limit 24/7. And, I would guess OpenAI is running an even lossier balance sheet (they've raised more money and their prices are lower).
I dunno. I hear a lot of complaining about Claude, but it's been pretty much fine for me throughout 4.5, 4.6 and 4.7. It got Good Enough at 4.5, and it's never been less than Good Enough since. And, when I've tried alternatives, they usually proved to be not quite Good Enough for some reason, sometimes non-technical reasons (I won't use OpenAI, anymore, because I don't trust OpenAI, and Gemini is just not as good at coding as Claude).
but then two months ago 4.6 started getting forgetful and making very dumb decisions and so on. Everyone started comparing notes and realising it wasn’t “just them”. And 4.7 isn’t much better and the last few weeks we keep having to battle the auto level of effort downgrade and so on. So much friction as you think “that was dumb” and have to go check the settings again and see there has been some silent downgrade.
We all miss the early days of 4.6, which just shows you can have a good useful model. LLMs can be really powerful, but in delivering it to the mass market Anthropic throttle and downgrade it until it's not useful.
My thinking is that soon deepseek reaches the more-than-good-enough 4.6+ level and everyone can get off the Claude pay-more-for-less trajectory. We don’t need much more than we’ve already had a glimpse of and now know is possible. We just need it in our control and provisioned not metered so we can depend upon it.
wilbur_whateley•1h ago
API Error: Claude's response exceeded the 32000 output token maximum. To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
isjcjwjdkwjxk•1h ago
Please. This is a toy. A novel little tech-toy. If you depend on it now for doing your job then, frankly, you deserve to have your rug pulled now and then.