frontpage.

Made with ♥ by @iamnishanth

Prediction: Claude 5 will be a major regression

3•cadabrabra•1h ago
At this point it should be completely obvious to everyone that there’s an approximately linear relationship between model cost and model performance. Anthropic is claiming that Claude 5 Sonnet will cost about half as much as their current SOTA models. Therefore, expect about half the performance. This is Anthropic’s version of GPT-5, i.e. a way to fool their customers into using a less compute-intensive model, almost purely for the benefit of the company. But as usual, they will rig the benchmarks and make it appear as though the model is better at certain things, like coding.

It’s an illusion, folks. You’re being played. Wake the hell up.

Also, I can’t believe that people still talk about SWE-Bench when there is a paper proving that the benchmark is completely useless because models regurgitate memorized answers.

Again, please, wake up.

https://arxiv.org/abs/2506.12286

Comments

bigyabai•1h ago
> It’s an illusion, folks. You’re being played.

How are they "being played" if Claude 5 isn't even out yet?

cadabrabra•1h ago
It’s already obvious that it will be a scam. Higher benchmark scores and lower cost are two signs that customers are about to get scammed. We saw it with GPT-5.

bigyabai•1h ago
Is it? It might be possible that it's a scam, but for something to be "obvious" it has to release first.

There are plenty of ways to reduce inference cost for a high-intelligence model. Sparser weights, for example, can let a model increase its parameter count while reducing inference cost and time.
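To make the sparsity point concrete, here is a rough back-of-the-envelope sketch. It assumes the sparsity in question is mixture-of-experts style routing (one common form; the comment does not say which kind is meant), uses the usual rule of thumb of about 2 forward-pass FLOPs per *active* parameter per token, and plugs in made-up sizes. `ModelShape` and `forward_flops_per_token` are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical parameter budget for a (possibly sparse) transformer."""
    shared_params: float      # always-active weights (attention, embeddings, ...)
    num_experts: int          # feed-forward experts in the model
    params_per_expert: float  # weights in each expert
    experts_per_token: int    # top-k experts the router activates per token

    def total_params(self) -> float:
        # Every expert counts toward the model's size.
        return self.shared_params + self.num_experts * self.params_per_expert

    def active_params(self) -> float:
        # Only the routed experts take part in a given token's forward pass.
        return self.shared_params + self.experts_per_token * self.params_per_expert


def forward_flops_per_token(shape: ModelShape) -> float:
    # Rule of thumb (assumption): about 2 FLOPs per active parameter per token.
    return 2.0 * shape.active_params()


# Made-up sizes for illustration only; not any real model.
dense = ModelShape(shared_params=10e9, num_experts=1, params_per_expert=40e9, experts_per_token=1)
sparse = ModelShape(shared_params=10e9, num_experts=16, params_per_expert=10e9, experts_per_token=2)

for name, shape in [("dense", dense), ("sparse", sparse)]:
    print(f"{name}: {shape.total_params() / 1e9:.0f}B total, "
          f"{shape.active_params() / 1e9:.0f}B active, "
          f"{forward_flops_per_token(shape) / 1e9:.0f} GFLOPs/token")
```

With these numbers the sparse model has 170B parameters in total but only 30B active per token, versus 50B for the dense one, so it is larger yet cheaper per token. The estimate ignores attention cost over long contexts and the fact that all experts still have to sit in memory, so real serving cost is not purely a function of active parameters.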

cadabrabra•1h ago
I get what you’re saying, but I still think that it will be a scam. Bookmark this thread and let’s continue the conversation after it’s released.

bigyabai•1h ago
I think you are informed by more of an emotional interest than a technical one, here. You've written several such posts and many of them are astronomically unlikely predictions.

cadabrabra•1h ago
Ok but didn’t Karpathy make it clear that we live in the vibe era? I’m inclined to trust vibes more than technical jargon, and boy are the vibes off with what’s been happening!

Let’s see what happens :)

Redster•1h ago
Respectfully,

Claude 3 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4.1 Opus: $15.00 (Input) / $75.00 (Output) per 1M tokens

Claude 4.5 Opus: $5.00 (Input) / $25.00 (Output) per 1M tokens
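Taking the listed per-1M-token prices at face value, a short calculation shows what they mean for a workload. The 2M-input / 500k-output mix below is a made-up example, and `workload_cost` is a hypothetical helper; because both the input and output prices fall by the same factor, any mix gives the same ratio.

```python
# Per-1M-token list prices (USD) as quoted in the comment: (input, output).
PRICES_PER_MTOK = {
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 4 Opus": (15.00, 75.00),
    "Claude 4.1 Opus": (15.00, 75.00),
    "Claude 4.5 Opus": (5.00, 25.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload at the quoted per-1M-token prices."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


# Hypothetical workload: 2M input tokens, 500k output tokens.
old = workload_cost("Claude 4.1 Opus", 2_000_000, 500_000)
new = workload_cost("Claude 4.5 Opus", 2_000_000, 500_000)
print(f"4.1 Opus: ${old:.2f}, 4.5 Opus: ${new:.2f}, ratio {old / new:.1f}x")
```

This prints `4.1 Opus: $67.50, 4.5 Opus: $22.50, ratio 3.0x`, i.e. the newest listed model costs one third as much as its predecessors.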

cadabrabra•1h ago
This actually proves my point because if you read the anecdotes, you will notice a marked decline in performance. The version number goes up but the actual performance declines. The benchmarks can tell any story you want them to.

minimaxir•1h ago
> Anthropic is claiming that Claude 5 Sonnet will cost about half as much as their current SOTA models. Therefore, expect about half the performance.

That's not how LLM quality works.

cadabrabra•1h ago
Maybe not in theory but definitely in practice, as we’ve seen with GPT-5. These companies are lighting money on fire. If they reduce the cost, expect a proportional decrease in quality. All of the GPT-5 anecdotes confirm this. When the data and anecdotes disagree, the anecdotes are usually right, and the data is usually bullshit.

minimaxir•1h ago
GPT-5's issues were due to router shenanigans, which Claude models do not do.

cadabrabra•1h ago
No dude, the latest versions of the models it routes to are markedly poorer in performance than their predecessors.

I’m observing a law that states: There appears to be a direct relationship between model performance and cost, such that whenever a company claims to have reduced inference costs, customers immediately notice a corresponding decline in model performance.
