GPT-5.5 – No ARC-AGI-3 scores

3•AG25•1h ago

Did the model perform poorly and OpenAI decided to not publish arc agi 3 scores? This is honestly the best benchmark right now to measure true intelligence.

Comments

casey2•28m ago

ARC-AGI-3 scoring is really weird, in some views it's already saturated in others it's near 0. But I assume, since the entire benchmark IMO is a PR tool for OpenAI they will publish it eventually.

ForgeSynapse•17m ago

Spot on. If they had decent ARC-AGI-3 scores, it would be the first slide of their keynote.

Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.

You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.

Anthropic reaches $1T valuation on secondary markets

Friendster Relaunch

When you’re stuck on “Help Wanted”

OpenAI Releases GPT-5.5

A Grounded Conceptual Model for Ownership Types in Rust

Ask HN: Dear astronomers, what are the most interesting things in space lately?

How tolls saved Britain from pothole hell in the Industrial Revolution

The Disillusioned College Grads Turning to the Labor Movement

How Have Universities Survived for Nearly a Millenium

Suddenly Everyone Wants a Tailor. They're in Short Supply

The first 25 years of the Northwestern University SuperAging Program

US Soldier in Maduro Raid Is Charged with Making Bets on Former Leader's Ouster

Show HN: Free AI Stock analysis in 6 seconds, any US ticker, no signup

U.S. Soldier Charged with Using Classified Info to Profit from Prediction Market

Claude Code channels not available for Personal Max plan

Could the math 'shape' of the universe solve the cosmological constant problem?

Blame The Pentagon, Not AI, for Preventable Targeting Mistakes

Build with Kiro: Introducing the Community Hub and Kiro Labs

Lovable admits public project chats and source code were exposed, apologizes

Protein-maxxing, GLP-1s have US farmers betting on peas and lentils

Agents grew up, so did our docs

Charge: Soldier Used Classified Info to Profit from Prediction Market Bets

The Rich Don't Play by the Rules. So Why Should I?

I scanned 10 open-source AI apps for EU AI Act compliance – here's what I found

Cool Makie Papers: Science Visualized with Makie.jl

Gentital Costume Protester Exonerated in Short, Absurd Trial

MationX – Android Automation Made Simple

I run my business like an open source project (2013)

Speaking Freely: Lizzie O'Shea

Show HN: Stash – CLI to search over your team's coding agent sessions