The one statistic mentioned in this overview where they observed a 67% drop seems like it could easily be reduced simply by editing 3.7’s system prompt.
What are folks’ theories on the version increment? Is the architecture significantly different (not talking about adding more experts to the MoE or fine tuning on 3.7’s worst failures. I consider those minor increments rather than major).
One way that it could be different is if they varied several core hyperparameters to make this a wider/deeper system but trained it on the same data or initialized inner layers to their exact 3.7 weights. And then this would “kick off” the 4 series by allowing them to continue scaling within the 4 series model architecture.
I feel like a company doesn’t have to justify a version increment. They should justify price increases.
If you get hyped and have expectations for a number then I’m comfortable saying that’s on you.
It does make sense. The companies are expected to exponentially improve LLMs, and the increasing versions are catering to the enthusiast crowd who just need a number to go up to lose their mind over how all jobs are over and AGI is coming this year.
But there's less and less room to improve LLMs and there are currently no known new scaling vectors (size and reasoning have already been largely exhausted), so the improvement from version to version is decreasing. But I assure you, the people at Anthropic worked their asses off, neglecting their families and sleep and they want to show something for their efforts.
It makes sense, just not the sense that some people want.
I had to stop the model going crazy with unnecessary tests several times, which isn't something I had to do previously. Can be fixed with a prompt but can't help but wonder if some providers explicitly train their models to be overly verbose.
However, after having pretty deep experience with writing book (or novella) length system prompts, what you mentioned doesn’t feel like a “regime change” in model behavior. I.e it could do those things because its been asked to do those things.
The numbers presented in this paper were almost certainly after extensive system prompt ablations, and the fact that we’re within a tenth of a percent difference in some cases indicates less fundamental changes.
I’m fine with a v4 that is marginally better since the price is still the same. 3.7 was already pretty good, so as long as they don’t regress it’s all a win to me.
saladtoes•1h ago
These LLMs still fall short on a bunch of pretty simple tasks. Attackers can get Claude 4 to deny legitimate requests easily by manipulating third party data sources for example.
simonw•1h ago
I still don't see guardrails and scanning as effective ways to prevent malicious attackers. They can't get to 100% effective, at which point a sufficiently motivated attacker is going to find a way through.
I'm hoping someone implements a version of the CaMeL paper - that solution seems much more credible to me. https://simonwillison.net/2025/Apr/11/camel/
saladtoes•1h ago
sureglymop•33m ago
Or is it more about the user then having to confirm/verify certain actions and what is essentially a "permission system" for what the LLM can do?
My immediate thought is that that may be circumvented in a way where the user unknowingly thinks they are confirming something safe. Analogous to spam websites that show a fake "Allow Notifications" prompt that is rendered as part of the actual website body. If the P-LLM creates the plan it could make it arbitrarily complex and confusing for the user, allowing something malicious to happen.
Overall it's very good to see research in this area though (also seems very interesting and fun).