frontpage.

CursorBench 3.1

https://cursor.com/evals
16•handfuloflight•1h ago

Comments

o10449366•26m ago
I feel like this benchmark reiterates my disbelief that anyone uses the latest Anthropic models for any productive work. They seem to be the best at burning tokens and spawning unnecessary subagents even for well-defined and tightly scoped tasks.

Can we get a count of people that have had Claude read irrelevant documents or perform unnecessary web searches even when told not to from the beginning?

I'm starting to wonder if this increased token usage is inadvertently bleeding into how Anthropic actually trains their models, especially leading up to the IPO. As older models are deprecated and users are forced onto newer ones, if the default is less efficient and more token-expensive, that directly results in higher "profit" for Anthropic in terms of the consumption their users have to tolerate, lest they jump to a competitor.

anilgulecha•23m ago
Is Composer 2.5 that good at that price point? It seems like the Gemini Flash playbook of trying to get the most bang for the buck.
aabdi•19m ago
Yes, it's very good.
danfritz•15m ago
It's my daily driver: it's fast, affordable, and with a bit of guidance it gets the job done.

I only reach for Claude when I need to plan something big or want a sparring partner to fire off some ideas.

I think what a lot of people don't realize is that you don't need a frontier model for 80% of coding tasks. Composer 2.5 is often more than good enough, less token-hungry, and way faster.

uf00lme•10m ago
It's surprisingly usable, and cheap enough to run in 'fast' mode when vibing something quick. For simple code, I prefer the code it writes over the GLM or Gemini families.
tekacs•14m ago
I'm pretty baffled by their choice of axes. I would have thought that the left was the cheapest, not the most expensive. I appreciate that this layout means that top right can be best, but it's still unintuitive to have this backwards cost axis IMO.

Putting that aside, I spend all day every day implementing very, very hard things right on the edge of what agents are (barely, sometimes) capable of, and I have had to keep Opus on max for things that need 'real validation' for a while now. And that has felt like 'the only way' to get Opus to perform even close to 5.5 xhigh. I'm only using Opus at all because GPT-5.5 in the subscriptions only has a small (400k, but 258k effective) context window.

The difference is that 5.5 xhigh is extremely fast in most practical cases, both efficiently implementing _overall_, and responding very quickly with great adaptive thinking if you ask it something that it doesn't have to think about. Opus 4.8 Max will needlessly chew on everything and can take hours to implement even simple things, so I can mostly only use it for planning/review.

Fable is much, much better at adaptive thinking / responding quickly (although probably still worse than 5.5 xhigh), and... I think folks have said enough elsewhere about its strengths and weaknesses. Sadly, it's still not a reliable implementor for my hard tasks (that's still GPT's domain): it tends to leave big, dangerous holes hiding inside implementations unless babied.

Horsewood (2 July 2026) We Tried It My Honest ReviewS

https://finance.yahoo.com/sectors/healthcare/articles/horsewood-urgent-report-2026-horse-19110038...
1•Gafyhanu•1m ago•0 comments

Seattle Just Had an Earthquake

1•tobinfekkes•5m ago•0 comments

Feds Might Flip the Script on Right to Repair Vehicle Emissions Systems

https://www.thedrive.com/news/feds-might-flip-the-script-on-right-to-repair-vehicle-emissions-sys...
1•josephcsible•12m ago•0 comments

Likelihood, and Maximum Likelihood, in Statistics

https://bactra.org/notebooks/likelihood.html
1•Tomte•15m ago•0 comments

Fable 5 is insanely good

1•vuphanse•15m ago•0 comments

Ask HN: Who's Hiring Remote Contractors? (July 2026)

1•akashwadhwani35•15m ago•0 comments

Typst: Designing for Incrementality (Laurenz Mädje at RustWeek) [video]

https://www.youtube.com/watch?v=yWWVhbyOWWE
1•felixhummel•16m ago•0 comments

Rasa Intelligence: AI diagnostic engine-gives one business verdict in 90 seconds

https://tech-rasa.com
1•Deepti251•19m ago•0 comments

My Story of 3D Realms / Apogee Part I (2020)

https://joesiegler.blog/2020/11/my-story-of-apogee-3dr/
1•Michelangelo11•20m ago•0 comments

NoUI()

https://www.swiftjectivec.com/noui/
1•ingve•23m ago•0 comments

OpenAgents makes Sonnet 5, Fable 5 and other agents collaborate in one thread

https://openagents.org/workspace
1•gshg12•27m ago•1 comment

The Socialist Wave Reaches the Heartland

https://www.wsj.com/opinion/colorado-democrats-socialists-melat-kiros-michael-bennet-99ad5a66
1•doener•27m ago•2 comments

PyCanopy: A spatial query layer for Polars, competitive with DuckDB, SedonaDB

https://github.com/pranav-walimbe/PyCanopy
1•pranav1077•27m ago•1 comment

The Complete Homemade Juggling Beanbag Guide

https://www.joshuaclifton.com/juggle/
1•mrauha•28m ago•1 comment

Show HN: LinkedIn Focus Chrome Extension

https://yvetter438.github.io/LinkedInFeedBlockerWebsite/
1•ywv•29m ago•0 comments

Show HN: What GPTBot sees before your React app hydrates

https://botscore.io/blog/what-gptbot-sees-before-hydration/
1•_tool•31m ago•0 comments

Show HN: Designing a factory-safety agent (model reasons, code routes)

https://github.com/HumphreySun98/safety-commander-agent
1•humphreysun98•34m ago•0 comments

Section 194J of Income Tax Act: Meaning, Rules and Examples – SMFG India Credit

https://www.smfgindiacredit.com/knowledge-center/section-194j-of-income-tax-act.aspx
1•saumyaraut11•35m ago•0 comments

Black Pepper Won Europe from a Tastier Pepper (2016)

https://www.atlasobscura.com/articles/long-pepper-better-than-black-pepper
1•downbad_•36m ago•0 comments

Google's exponential path to climate-wrecking digital bloat

https://ketanjoshi.co/2026/07/01/googles-exponential-path-to-climate-wrecking-digital-bloat/
1•jalev•37m ago•0 comments

LongCat 2.0: The first trillion-parameter model trained on Chinese-made GPUs

https://www.reuters.com/world/china/chinas-meituan-says-new-ai-model-trained-domestic-chips-2026-...
1•linzhangrun•38m ago•0 comments

Limine: Modern, secure, portable, multiprotocol bootloader and boot manager

https://github.com/Limine-Bootloader/Limine
2•noteness•43m ago•0 comments

Cotal: Agentic Coordination Layer

https://cotal.ai
1•handfuloflight•45m ago•0 comments

CSS Logical Properties Converter

https://cssawwwards.com/frontend-toolkit/css-logical-properties
1•cssawwwards•47m ago•0 comments

Show HN: I wrote a Rust book ending with a Redis clone

https://shankhan3.gumroad.com/l/dnwmtp
2•zeeshanali0094•47m ago•0 comments

The Three Projections of Doctor Futamura

http://blog.sigfpe.com/2009/05/three-projections-of-doctor-futamura.html
1•tristenharr•47m ago•0 comments

Torlink

https://github.com/baairon/torlink
3•handfuloflight•47m ago•1 comment

AI Tutor on Your Screen

https://heybraza.com
1•orakulus•52m ago•1 comment

What Happened to the Fight for the Internet?

https://dustycloud.org/blog/what-happened-to-the-fight-for-the-internet/
1•signa11•55m ago•0 comments

Power House – a Rust/Python toolkit for verifiable computation artifacts

https://mfenx.com
1•psl-fox•57m ago•0 comments