Prediction: Claude 5 will be a major regression

3•cadabrabra•1h ago

At this point it should be completely obvious to everyone that there’s a roughly linear relationship between model cost and model performance. Anthropic is claiming that Claude 5 Sonnet will cost about half as much as their current SOTA models. Therefore, expect about half the performance. This is Anthropic’s version of GPT-5, i.e. a way to fool their customers into using a less compute-intensive model, almost purely for the benefit of the company. But as usual, they will rig the benchmarks and make it appear as though the model is better at certain things, like coding.

It’s an illusion, folks. You’re being played. Wake the hell up.

Also, I can’t believe that people still talk about SWE-Bench when there is a paper proving that the benchmark is completely useless because models regurgitate memorized answers.

Again, please, wake up.

https://arxiv.org/abs/2506.12286

Comments

bigyabai•1h ago

> It’s an illusion, folks. You’re being played.

How are they "being played" if Claude 5 isn't even out yet?

cadabrabra•1h ago

It’s already obvious that it will be a scam. Higher benchmark scores and lower cost are two signs that customers are about to get scammed. We saw it with GPT-5.

bigyabai•1h ago

Is it? It might be possible that it's a scam, but for something to be "obvious" it has to release first.

There are plenty of ways to reduce inference cost for a high-intelligence model. Making sparser weights, for example, can increase the parameter count while reducing the inference cost and time.
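A rough sketch of that point, assuming the sparsity comes from a mixture-of-experts feed-forward layer (an assumption on my part; the comment doesn't name a specific technique). The function name and the layer sizes are hypothetical and chosen only for illustration. It counts the parameters a layer stores versus the parameters a single token actually passes through:

```python
def moe_ffn_params(d_model: int, d_ff: int, num_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active_per_token) parameter counts for one MoE feed-forward layer.

    Hypothetical helper: assumes each expert is an up-projection plus a
    down-projection (biases ignored) and a linear router over the experts.
    """
    per_expert = 2 * d_model * d_ff       # up-projection + down-projection
    router = d_model * num_experts        # one routing score per expert
    total = num_experts * per_expert + router
    active = top_k * per_expert + router  # only top_k experts run for each token
    return total, active


# Illustrative sizes only; these are not the dimensions of any real model.
D_MODEL, D_FF = 4096, 16384

dense_params = 2 * D_MODEL * D_FF  # a plain dense layer uses every weight per token
sparse_total, sparse_active = moe_ffn_params(D_MODEL, D_FF, num_experts=8, top_k=2)

print(f"dense:  {dense_params:,} stored, {dense_params:,} used per token")
print(f"sparse: {sparse_total:,} stored, {sparse_active:,} used per token")
```

Under these assumptions the sparse layer stores about 8x the weights of the dense one, yet each token touches only about a quarter of them. Per-token compute scales with the active count, not the stored count, which is why price and capability don't have to move together.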

cadabrabra•1h ago

I get what you’re saying, but I still think that it will be a scam. Bookmark this thread and let’s continue the conversation after it’s released.

bigyabai•1h ago

I think you're driven more by an emotional interest than a technical one here. You've written several such posts, and many of them are astronomically unlikely predictions.

cadabrabra•1h ago

Ok but didn’t Karpathy make it clear that we live in the vibe era? I’m inclined to trust vibes more than technical jargon, and boy are the vibes off with what’s been happening!

Let’s see what happens :)

Redster•1h ago

Respectfully,

Claude 3 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4.1 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4.5 Opus: $5.00 (Input) / $25.00 (Output) per 1M tokens

cadabrabra•1h ago

This actually proves my point because if you read the anecdotes, you will notice a marked decline in performance. The version number goes up but the actual performance declines. The benchmarks can tell any story you want them to.

minimaxir•1h ago

> Anthropic is claiming that Claude 5 Sonnet will cost about half as much as their current SOTA models. Therefore, expect about half the performance.

That's not how LLM quality works.

cadabrabra•1h ago

Maybe not in theory but definitely in practice, as we’ve seen with GPT-5. These companies are lighting money on fire. If they reduce the cost, expect a proportional decrease in quality. All of the GPT-5 anecdotes confirm this. When the data and anecdotes disagree, the anecdotes are usually right, and the data is usually bullshit.

minimaxir•1h ago

GPT-5's issues were due to router shenanigans which Claude models do not do.

cadabrabra•1h ago

No dude, the latest versions of the models it routes to are markedly poorer in performance than their predecessors.

I’m observing a law that states: There appears to be a direct relationship between model performance and cost, such that whenever a company claims to have reduced inference costs, customers immediately notice a corresponding decline in model performance.
