Opus 4.7 vs. 4.6: 3 days of real coding, side by side, from my actual sessions

2•agentseal•1h ago
I spent some time today comparing Opus 4.6 and 4.7 using my own usage data to see how they actually behave side by side.

It's still pretty early for 4.7, but a few things surprised me.

In my sessions, 4.7 gets things right on the first try less often than 4.6. Its one-shot rate sits around 74.5% vs 83.8% (4.7 vs 4.6), and I'm seeing roughly double the retries per edit (0.46 vs 0.22).
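As a rough illustration of how these two metrics relate, here is a minimal sketch. It assumes each edit is logged with a retry count and that "one-shot" means zero retries; codeburn's actual definitions may differ, and the `edit_metrics` function and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EditStats:
    one_shot_rate: float     # fraction of edits that needed no retry
    retries_per_edit: float  # mean number of retries across all edits


def edit_metrics(retries: list[int]) -> EditStats:
    """Summarize per-edit retry counts (hypothetical log format)."""
    if not retries:
        raise ValueError("need at least one edit")
    one_shot = sum(1 for r in retries if r == 0)
    return EditStats(
        one_shot_rate=one_shot / len(retries),
        retries_per_edit=sum(retries) / len(retries),
    )


# Made-up example: 8 edits, 6 landed first try, one retried once, one twice.
sample = [0, 0, 0, 1, 0, 0, 2, 0]
stats = edit_metrics(sample)
print(f"one-shot {stats.one_shot_rate:.1%}, retries/edit {stats.retries_per_edit:.2f}")
```

The two numbers can move somewhat independently: the one-shot rate only records whether an edit needed any retry, while retries per edit also grows when a single edit is retried several times.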

It also produces a lot more output per call, about 800 tokens vs 372 on 4.6, which makes it noticeably more expensive: cost per call is $0.185 vs $0.112.
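A quick back-of-the-envelope check on these averages: output per call roughly doubles, but cost per call rises by a smaller factor, so per-call cost isn't simply proportional to output tokens. Input tokens or other parts of the bill presumably make up the difference. This sketch only uses the numbers quoted above and assumes nothing about how codeburn prices calls.

```python
# Figures quoted in the post (4.7 vs 4.6, averages per call).
tokens_47, tokens_46 = 800, 372
cost_47, cost_46 = 0.185, 0.112

token_ratio = tokens_47 / tokens_46  # about 2.15x more output per call
cost_ratio = cost_47 / cost_46       # about 1.65x more cost per call

print(f"output tokens: {token_ratio:.2f}x, cost: {cost_ratio:.2f}x")
```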

When I broke it down by task type, coding and debugging both looked weaker on 4.7. Coding one-shot dropped from 84.7% to 75.4%, and debugging from 85.3% to 76.5%. Feature work was slightly better on 4.7 (75% vs 71.4%), but the sample is small. Delegation showed a big gap (100% vs 33.3%), though that one only has 3 samples on the 4.7 side, so I wouldn't read much into it yet.
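To put the 3-sample caveat in numbers, here is a sketch using the Wilson score interval, a standard confidence interval for a proportion. The choice of method is mine; the post doesn't compute intervals. Since the post doesn't say which side scored 33.3%, the example tries both 1/3 and 3/3 as hypothetical counts for a 3-sample category.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical: a 3-sample category where 1 of 3 attempts was one-shot (33.3%).
low, high = wilson_interval(1, 3)
print(f"1/3 -> {low:.0%} to {high:.0%}")

# Same for 3 of 3 (100%): the interval is still wide.
low, high = wilson_interval(3, 3)
print(f"3/3 -> {low:.0%} to {high:.0%}")
```

With only 3 samples, 1/3 is compatible with anything from about 6% to 79%, and even 3/3 only rules out rates below roughly 44%, so a gap like 100% vs 33.3% could easily be noise.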

4.7 also uses fewer tools per turn (1.83 vs 2.77) and barely delegates to subagents (0.6% vs 3.1%). Not sure yet if that's a style difference or just the smaller sample.

A couple of caveats. This is about 3 days of 4.7 data (3,592 calls) vs 8 days of 4.6 (8,020 calls). Some categories only have a handful of examples. These numbers will shift with more usage, and your results will probably look different depending on what kind of work you do.
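Normalizing the two samples by duration suggests daily call volume was broadly similar, so the size gap mostly reflects how many days each model has been in use. This is back-of-the-envelope arithmetic on the numbers above, assuming the quoted average cost per call holds across every call.

```python
# Sample sizes, durations, and average cost per call quoted in the post.
calls = {"4.7": 3_592, "4.6": 8_020}
days = {"4.7": 3, "4.6": 8}
cost_per_call = {"4.7": 0.185, "4.6": 0.112}

for model in calls:
    per_day = calls[model] / days[model]
    # Rough total, assuming the average cost applies to every call.
    total = calls[model] * cost_per_call[model]
    print(f"Opus {model}: {per_day:,.0f} calls/day, ~${total:,.2f} total")
```

That works out to roughly 1,200 vs 1,000 calls per day, and about $665 vs $898 in total spend over each window.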

npx codeburn compare

Comments

alegd•45m ago
Interesting data. I use Claude Code daily and noticed 4.7 feels different, but couldn't put numbers to it like this.

Does your one-shot rate account for how much context you give it? I keep a detailed CLAUDE.md with project conventions, and I'm wondering if that closes the gap at all or if 4.7 just struggles regardless.

The fewer-tools-per-turn thing worries me. Are you seeing it hallucinate project structure more? In my sessions it seems to want to figure things out in its head instead of actually reading the files.

More expensive and lower first-try accuracy is rough. Are you planning to stick with 4.7 or go back?

alwillis•10m ago
Anthropic documents the differences between Opus 4.7 and 4.6, including that Opus 4.7 calls tools less frequently than 4.6 because it is more capable. Depending on the task at hand, that could be a good thing or not so good [1].

For example, regarding instruction following:

Claude Opus 4.7 interprets prompts more literally and explicitly than Claude Opus 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make.

[1]: https://platform.claude.com/docs/en/build-with-claude/prompt...

The Wall of Shame

https://gagor.pro/2026/04/the-wall-of-shame/
1•___timor___•2m ago•0 comments

Verum, examined – a systems language for an age when humans write less code

https://verum-lang.org/blog/verum-examined
1•old8man•3m ago•0 comments

Show HN: Free PDF redactor that runs client-side

https://redactpdf.net
1•MrGuacamole•3m ago•0 comments

Presentator: Free and Open source design feedback and presentation platform

https://presentator.io/
1•thunderbong•5m ago•0 comments

Six Levels of Dark Mode

https://cssence.com/2024/six-levels-of-dark-mode/
2•Akcium•6m ago•0 comments

I Created OpenClaw (Peter Steinberger, TED Talk, YT)

https://www.youtube.com/watch?v=7rzYDM6vMtI
1•hbarka•6m ago•0 comments

A more troubling picture of sea level rise is coming into view

https://e360.yale.edu/features/sea-level-rise-land-subsidence
2•Brajeshwar•7m ago•0 comments

Europe has 'maybe six weeks of jet fuel left'

https://www.bbc.com/news/articles/czjw2kz0l22o
2•measurablefunc•10m ago•0 comments

PerryTS: Compile TypeScript to native executables with LLVM

https://www.perryts.com/
1•simjnd•13m ago•0 comments

Sagas vs. Process Managers

https://docs.eventsourcingdb.io/blog/2026/04/20/sagas-vs-process-managers/
1•goloroden•15m ago•0 comments

Epistemic Suicide: Why AI Is Collapsing into Mediocrity

https://medium.com/@erinacius4455/full-linkedin-article-english-version-6611a87d02c5
1•alex_gold•18m ago•0 comments

Cardynal – AI support agent for businesses, no code, WhatsApp and web chat

https://cardynal.io
1•Cardynal•18m ago•0 comments

Can You Hear an Ambulance Moving Faster Than Sound?

https://snoeprol.github.io/science/doppler-effect.html
1•Snoeprol•20m ago•0 comments

NNA (Natural Number Array)

https://users.rust-lang.org/t/this-is-my-first-project-i-invented-a-new-algorithm-that-even-ai-do...
1•Erenay09•22m ago•1 comments

PostgresBench: A Reproducible Benchmark for Postgres Services

https://clickhouse.com/blog/postgresbench
1•saisrirampur•24m ago•0 comments

New CDC pick may face "threat to follow ideology over evidence," ex-officials say

https://www.cbsnews.com/news/jerome-adams-erica-schwartz-face-the-nation-surgeon-general-kennedy-...
1•rolph•26m ago•0 comments

Wind and solar power surge across the Mountain West as demand tests the grid

https://www.kunr.org/local-stories/2026-04-14/wind-solar-power-surge-mountain-west
1•Bender•28m ago•0 comments

Uber's AI Push Hits a Wall–CTO Says Budget Struggles Despite $3.4B Spend

https://finance.yahoo.com/sectors/technology/articles/ubers-anthropic-ai-push-hits-223109852.html
3•dakiol•28m ago•0 comments

Bronx officials try to rein in social media 'takeovers' after events turn chaotic

https://gothamist.com/news/bronx-officials-try-to-rein-in-social-media-takeovers-after-winter-eve...
1•gnabgib•30m ago•0 comments

Fixing Unix Filenames (2009)

https://dwheeler.com/essays/fixing-unix-linux-filenames.html
2•LorenDB•32m ago•2 comments

How Can Make

https://www.forumvc.com
1•dongtam•34m ago•0 comments

Do Not Default to a Public VPN

https://avkcode.github.io/blog/do-not-default-to-a-vpn.html
1•KyleVlaros•41m ago•1 comments

The Missing Human Half of AI

https://www.utkarshapoorva.com/writing/missing-human-half-of-ai/
1•utkarsh_apoorva•42m ago•1 comments

Musk's SpaceX urges Trump to crack down on EU satellites

https://www.telegraph.co.uk/business/2026/04/19/musks-spacex-urges-trump-to-crackdown-on-eu-satel...
3•doener•43m ago•0 comments

An explainer of the invisible temporal logic shaping platform behavior

https://github.com/Dario-Chang/The-Invisible-Logic-Regulators-Missed-for-23-Years-How-Platforms-R...
1•governace-layer•46m ago•1 comments

After 6 months I shipped Transita found a niche I could build for

https://transita.app
1•snenenenene•46m ago•0 comments

"Now I Have the Full Picture"

https://taoofmac.com/space/notes/2026/04/19/1400
3•rcarmo•48m ago•0 comments

They Went Abroad to Save Money. Moving Back Seems Unaffordable

https://www.nytimes.com/2026/04/19/business/americans-abroad-cheaper-living-costs.html
2•mikhael•50m ago•0 comments

The Unsuitability of English (2015)

https://www.chronicle.com/blogs/linguafranca/the-unsuitability-of-english
1•downbad_•51m ago•1 comments

A Chinese Android just ran a half-marathon faster than any human

https://www.cnn.com/2026/04/19/china/china-robot-half-marathon-intl-hnk
8•Bender•57m ago•1 comments