OpenAI claiming gold medal standard at IMO 2025

https://github.com/aw31/openai-imo-2025-proofs

21•ocfnash•6mo ago

Comments

ocfnash•6mo ago

According to post 6/N of this series, they are claiming full marks for problems 1 through 5.

https://x.com/alexwei_/status/1946477742855532918

Davidzheng•6mo ago

I posted about one of the twitter threads at https://news.ycombinator.com/item?id=44613840

Davidzheng•6mo ago

The proofs look superficially super interesting, especially because they're not in the style of the usual LLM babble and filler. It's almost the exact opposite: very efficient use of words, dropping the parts of grammar that aren't important. Reminds me of how people write down proofs in drafts, or how we communicate proofs to peers before writing the final versions.

Davidzheng•6mo ago

P1 has, in its setup section, basically a very precise summary of the proof, which it fills in later: "So main is: (a) for n>=4, any n-line cover must contain a side-line; inductively reduce to n=3. (b) Analyze n=3 exactly."

I suspect there's some (tree-based?) search + a separate process verifier + a large # of parallel generation sessions. That's just going off hints from how structured/monotone the generated text is.

A lot of colons, like "So:", "Now:", "Need:", etc.

Davidzheng•6mo ago

P2 is geometry. It looks coordinate bashed? Very interesting to see it writing "Good." and "Perfect." after some lines. Very human-like thinking process. It reads like a person talking through the proof orally.

Davidzheng•6mo ago

P3: interesting that in the basics section it makes an easy observation but gives no proof sketch, unlike P1/P2 (P1 has a full sketch of the proof idea; P2 says we'll bash). This suggests the whole proof is actually generated one-shot (contrary to my previous comment). I guess it's not doing search in text space (e.g. output a line, then search for the next line, etc.). Of course there's probably some final process outputting the proof from parts, so the search could be obscured.

Come to think of it, informal proof generation probably can't easily use search? Probably it's doing parallel generation with some information sharing + a global verification process. No real evidence, except that the proof as a whole is very unstructured even though each line is written with a consistent style.

energy123•6mo ago

This is incredible. We know these questions are not in the training data. How can you still say that LLMs aren't reasoning?
