GPT-5.5 – No ARC-AGI-3 scores

6•AG25•5h ago
Did the model perform poorly, so OpenAI decided not to publish ARC-AGI-3 scores? This is honestly the best benchmark right now for measuring true intelligence.

Comments

casey2•4h ago
ARC-AGI-3 scoring is really weird: in some views it's already saturated, in others it's near 0. But I assume, since the entire benchmark is IMO a PR tool for OpenAI, they will publish it eventually.
ForgeSynapse•3h ago
Spot on. If they had decent ARC-AGI-3 scores, it would be the first slide of their keynote.

Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.

You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.
