frontpage.

React Router and React Server Components: The Path Forward

https://remix.run/blog/react-router-and-react-server-components
1•SpaceIcedLatte•1m ago•0 comments

Gum: A Go-powered tool for writing shell scripts, no Go code required

https://github.com/charmbracelet/gum
1•bundie•2m ago•0 comments

AI's energy demand ratchets up pressure on Republicans

https://www.eenews.net/articles/ais-energy-demand-ratchets-up-pressure-on-republicans/
1•pera•2m ago•0 comments

Popular NPM linter packages hijacked via phishing to drop malware

https://www.bleepingcomputer.com/news/security/popular-npm-linter-packages-hijacked-via-phishing-to-drop-malware/
2•OptionOfT•3m ago•0 comments

Using leaked data to examine vulnerabilities in SMS routing and SS7 signalling

https://medium.com/@lighthousereports/using-leaked-data-to-examine-vulnerabilities-in-sms-routing-and-ss7-signalling-8e30298491d9
2•todsacerdoti•4m ago•0 comments

Unexpected inconsistency in records – Jon Skeet's coding blog

https://codeblog.jonskeet.uk/2025/07/19/unexpected-inconsistency-in-records/
1•OptionOfT•4m ago•0 comments

Where Did All Those Brave Free Speech Warriors Go?

https://www.techdirt.com/2025/05/19/where-did-all-those-brave-free-speech-warriors-go/
3•the_why_of_y•11m ago•0 comments

Interesting thoughts on the limits of AI in the context of software development

https://www.ufried.com/blog/ai_and_software_development_5/
1•BinaryIgor•13m ago•0 comments

A Look Back at WeChat's PhxSQL and the 'Fastest Majority'

https://www.supasaf.com/blog/general/phxsql
1•supasaf•13m ago•0 comments

New Russian law criminalizes online searches for controversial content

https://www.washingtonpost.com/world/2025/07/17/russia-internet-censorship/
2•voxleone•15m ago•0 comments

DunedinPACNI estimates the longitudinal Pace of Aging from a single brain image

https://www.nature.com/articles/s43587-025-00897-z
1•bookofjoe•18m ago•0 comments

Why Is ReactOS Development So Undervalued?

2•Waraqa•20m ago•1 comments

AI guzzled books without permission. Authors are fighting back

https://www.washingtonpost.com/technology/2025/07/19/ai-books-authors-congress-courts/
2•amirkabbara•22m ago•1 comments

Kimi K2 scored 59% on the aider polyglot coding benchmark

https://twitter.com/paulgauthier/status/1946165321611526229
1•tosh•24m ago•0 comments

Spectrally Tunable Lighting: How LEDs can emulate blackbody emitters

https://enody.lighting/journal/01-spectrally-tunable-lighting/
1•carterpeterson•24m ago•0 comments

'I was floored by the data': Psilocybin shows anti-aging properties

https://www.livescience.com/health/ageing/i-was-floored-by-the-data-psilocybin-shows-anti-aging-properties-in-early-study
3•Bluestein•28m ago•0 comments

Extending Iterated, Spatialized Prisoners Dilemma to Understand Multicellularity

https://lksshw.github.io/
1•ca98am79•32m ago•0 comments

Field Guide to the North American Weigh Station

https://hackaday.com/2025/06/26/field-guide-to-the-north-american-weigh-station/
1•toomuchtodo•32m ago•0 comments

uv 0.8

https://github.com/astral-sh/uv/releases/tag/0.8.0
2•tosh•32m ago•0 comments

Is automating your AI too hard? Let AI automate that too

https://github.com/czlonkowski/n8n-mcp
1•greggh•33m ago•2 comments

Origami Space Planes Could Solve a Major Problem in Orbit

https://gizmodo.com/origami-space-planes-could-solve-a-major-problem-in-orbit-2000629875
1•Bluestein•33m ago•0 comments

Scenarios for solar radiation modification need to include perceptions of risk

https://iopscience.iop.org/article/10.1088/2752-5295/addd42
1•PaulHoule•34m ago•0 comments

Google Backs 10 New Nuclear Reactors for AI, Built by AI. What Could Go Wrong?

https://www.pcmag.com/news/google-backs-10-new-nuclear-reactors-for-ai-will-it-work-this-time
1•Bluestein•36m ago•0 comments

Karen Hao – Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI

https://www.youtube.com/watch?v=NtQCthF2vlY
1•belter•40m ago•1 comments

Death by AI

https://davebarry.substack.com/p/death-by-ai
2•ano-ther•41m ago•0 comments

Elon Musk's Starlink internet works great if hardly anyone uses it

https://www.washingtonpost.com/technology/2025/07/18/starlink-internet-satellite-speed-elon-musk/
2•reaperducer•41m ago•0 comments

Angel vs. Devil Accounting: Reviving a 500-Yr-Old Idea for Modern Mental Health

https://ledgeroflife.blog/angel-vs-devil-accounting-resurrecting-a-500-year-old-idea-for-modern-mental-health/
4•shadowvoxing•47m ago•0 comments

That how to calculate the hours that you worked and revenue amount

https://billr.us/
1•ulicaki8991•47m ago•0 comments

Groq's First Compound AI System

https://groq.com/blog/now-in-preview-groqs-first-compound-ai-system
1•tosh•48m ago•0 comments

Why you should choose HTMX for your next web-based side project (2024)

https://hamy.xyz/blog/2024-02_htmx-for-side-projects
3•kugurerdem•52m ago•3 comments

OpenAI claims Gold-medal performance at IMO 2025

https://twitter.com/alexwei_/status/1946477742855532918
117•Davidzheng•6h ago

Comments

tester756•4h ago
huh?

any details?

ktallett•4h ago
It is able to solve some high school/early bsc maths problems.
littlestymaar•4h ago
Which would be impressive if we knew those problems weren't in the training data already.

I mean it is quite impressive how language models are able to mobilize the knowledge they have been trained on, especially since they are able to retrieve information from sources that may be formatted very differently, with completely different problem statement sentences, different variable names and so on, and really operate at the conceptual level.

But we must be wary of mixing up smart information retrieval with reasoning.

ktallett•4h ago
Considering these LLMs utilise the entirety of the internet, there will be no unique problems that come up in the Olympiad. Even across the course of a degree, you will likely have been exposed to 95% of the various ways to write problems. As you say, retrieval is really the only skill here. There is likely no reasoning.
Jcampuzano2•2h ago
Calling these high school/early bsc maths questions is an understatement lol.
demirbey05•4h ago
Progress is astounding. A report was recently published evaluating LLMs on IMO 2025: o3 (high) didn't even get bronze.

https://matharena.ai/imo/

Waiting for Terry Tao's thoughts, but these kinds of things are a good use of AI. We need to make science progress faster, rather than disrupting our economy before we're ready.

ktallett•4h ago
Astounding in what sense? I assume you are aware of the standard of Olympiad problems and that they are not particularly high. They are just challenging for the age range, but they shouldn't be for AI considering they aren't really anything but proofs and basic structured math problems.

Considering OpenAI can't currently analyse and provide real paper sources to cutting edge scientific issues, I wouldn't trust it to do actual research outside of generating matplotlib code.

demirbey05•4h ago
I mean the speed of progress: a few months ago they released o3, and it got 16 points on IMO 2025.
ktallett•4h ago
In that regard I would agree, but to me that suggests the prior hype was unfounded.
Davidzheng•4h ago
sorry but I don't think it's accurate to say "they are just challenging for the age range"
ktallett•3h ago
I'm aware you believe they are impossible tasks unless you have specific training, I happen to disagree with that.
Davidzheng•3h ago
You mean specific IMO training or general math training? The latter is certainly needed; that the former is needed is, in my opinion, a general observation, for example about the people who make it onto the teams.
ktallett•3h ago
I mean IMO training; yes, I agree you wouldn't be able to do this without comprehensive math knowledge.
saagarjha•3h ago
I did competitive math in high school and I can confidently say that they are anything but "basic". I definitely can't solve them now (as an adult) and it's likely I never will. The same is true for most people, including people who actually pursued math in college (I didn't). I'm not going to be the next guy who unknowingly challenges a Putnam winner to do these but I will just say that it is unlikely that someone who actually understands the difficulty of these problems would say that they are not hard.

For those following along but without math specific experience: consider whether your average CS professor could solve a top competitive programming question. Not Leetcode hard, Codeforces hard.

zug_zug•1h ago
I feel like I've noticed you making the same comment in 12 places in this thread -- misrepresenting the difficulty of this tournament -- and ultimately it comes across as a bitter ex.

Here's an example problem 5:

Let a_1, a_2, …, a_n be distinct positive integers and let M = max_{1 ≤ i < j ≤ n} (a_i + a_j)(a_j − a_i).

Find the maximum number of pairs (i, j) with 1 ≤ i < j ≤ n for which (a_i + a_j)(a_j − a_i) = M.
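Taking M to be the maximum of (a_i + a_j)(a_j − a_i) over index pairs i < j (which is how the second sentence reads), the claim can be explored by brute force for small n. A minimal sketch; `count_max_pairs` is my own helper name, not anything from the thread:

```python
from itertools import combinations

def count_max_pairs(a):
    # (a_i + a_j)(a_j - a_i) = a_j^2 - a_i^2, so each pair's value is a
    # difference of squares taken in index order (it can be negative if
    # the sequence is not increasing)
    vals = [(a[j] + a[i]) * (a[j] - a[i])
            for i, j in combinations(range(len(a)), 2)]
    m = max(vals)  # this is M
    return sum(v == m for v in vals)

# For an increasing sequence the maximum is attained only by the
# (smallest, largest) index pair, so the count is 1
print(count_max_pairs((1, 2, 3, 4)))  # 1
```

Reordering the sequence can create ties at the maximum, e.g. `count_max_pairs((7, 8, 1, 4))` returns 2, which is what makes the question about the index ordering interesting.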

causal•1h ago
What does max⁡1≤i<j≤n mean? Wouldn't M always be j?
causal•41m ago
Where did you get this? Don't see it on the 2025 problem set and now I wanna see if I have the right answer
Aurornis•56m ago
> I assume you are aware of the standard of Olympiad problems and that they are not particularly high.

Every time an LLM reaches a new benchmark there’s a scramble to downplay it and move the goalposts for what should be considered impressive.

The International Math Olympiad was used by many people as an example of something that would be too difficult for LLMs. It has been a topic of discussion for some time. The fact that an LLM has achieved this level of performance is very impressive.

You’re downplaying the difficulty of these problems. It’s called international because the best in the entire world are challenged by it.

ktallett•4h ago
Tbh, the way everyone has been going on about the quality of OpenAI, high school/early university maths problems should not have been a stretch for it at all. The fact that this unverified claim is only just being mentioned suggests their AI isn't quite as amazing as marketed, especially since logic and rule-following should fundamentally be easy for it, and the key details are rather easy to extract from most Olympiad problems.
gametorch•3h ago
> high school/early university maths problems should not have been a stretch at all for it

This is a ridiculous understatement of the difficulty of getting gold at the IMO.

ktallett•3h ago
That is the level of math you need to do these problems, along with a brief understanding of what certain concepts are. There is no calculus etc. The vast majority of IMO questions are about applying the base rules to new problems.
gametorch•3h ago
Okay, let's see you try any one of the past IMOs and show us your score.

It's really hard.

See my other comment. I was voted the best at math in my entire high school by my teachers, completed the first two years of college classes while still in high school. I've tried IMO problems for fun. I'm very happy if I get one right. I'd be infinitely satisfied to score a perfect on 3 out of 6 problems and that's nowhere near gold.

Davidzheng•3h ago
You'd be surprised at how much math the people who actually get IMO gold know...
Jcampuzano2•2h ago
There are entire fields of math with exceptional people trying to solve impossibly hard problems that utilize quite literally 0 calculus.

Many of them are also questions that eventually end up with proofs or solutions that only require very high level understanding of basic principles. But when I say very high I mean like impossibly high for the average person and ability to combine simple concepts to solve complex problems.

I'd wager the majority of Math graduates from universities would struggle to answer most IMO questions.

oytis•2h ago
It's like saying getting a gold medal in boxing is not hard, because it doesn't involve any firearms
pragmatic•1h ago
More fair comparison: Military grade killbot enters ring with boxer and proceeds to fire pneumatic hammer at boxer until KO?
curt15•2h ago
Olympiad questions don't require advanced concepts except maybe some classical geometry techniques that you wouldn't normally encounter in modern research mathematics. But they're fundamentally designed as puzzles. You need to spot the tricks.
Aurornis•46m ago
> high school/early university maths problems should not have been a stretch at all for it.

Either you are unfamiliar with the International Math Olympiad or you’re trying to be misleading.

Calling these problems high school/early university maths is a ridiculous characterization.

Lionga•4h ago
counting "R"s in strawberry now counts for a gold medal in math?
ktallett•4h ago
The Olympiad is a great thing for children for sure. This is not what I feel we should be wasting resources on though for AI. I question if it's even impressive.
baq•4h ago
Velocity of AI progress in recent years is exceeded only by velocity of goalposts.
ktallett•4h ago
The goalposts should focus on being able to make a coherent statement using papers on a subject with sources. At this point it can't do that for any remotely cutting edge topic. This is just a distraction.
mindwok•3h ago
Even just 3 years ago, the idea of a computer solving IMO problems it has not seen before, posed in natural language, would have been complete science fiction. This is astounding progress.
timbaboon•1h ago
Haha no - then it wouldn't have got a gold medal ;)
z7•4h ago
Some previous predictions:

In 2021 Paul Christiano wrote he would update from 30% to "50% chance of hard takeoff" if we saw an IMO gold by 2025.

He thought there was an 8% chance of this happening.

Eliezer Yudkowsky said "at least 16%".

Source:

https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challe...

exegeist•2h ago
Impressive prediction, especially pre-ChatGPT. Compare to Gary Marcus 3 months ago: https://garymarcus.substack.com/p/reports-of-llms-mastering-...

We may certainly hope Eliezer's other predictions don't prove so well-calibrated.

rafaelero•1h ago
Gary Marcus is so systematically and overconfidently wrong that I wonder why we keep talking about this clown.
qoez•56m ago
People just give attention to people making surprising bold counter narrative predictions but don't give them any attention when they're wrong.
dcre•1h ago
I do think Gary Marcus says a lot of wrong stuff about LLMs but I don’t see anything too egregious in that post. He’s just describing the results they got a few months ago.
m3kw9•1h ago
He definitely cannot use the original arguments from when ChatGPT arrived; he's a perennial goalpost shifter.
causal•1h ago
These numbers feel kind of meaningless without any work showing how he got to 16%
shuckles•57m ago
My understanding is that Eliezer more or less thinks it's over for humans.
0xDEAFBEAD•32m ago
He hasn't given up though: https://xcancel.com/ESYudkowsky/status/1922710969785917691#m
sigmoid10•1h ago
While I usually enjoy seeing these discussions, I think they are really stretching the usefulness of Bayesian statistics. If one dude says the chance for an outcome is 8% and another says it's 16%, and the outcome does occur, they were both pretty wrong, even though it might seem like the one who guessed a few % higher had a better belief system. Now if one of them had said 90% while the other said 8% or 16%, then we should pay close attention to what they are saying.
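
The calibration point can be made concrete with a Brier score (the squared error of a probability forecast). Using the 8% and 16% figures from upthread plus a hypothetical 90% forecaster:

```python
def brier_score(p, outcome):
    # Squared error of a probabilistic forecast; 0 is perfect, 1 is worst.
    return (outcome - p) ** 2

# The event (IMO gold by 2025) resolved true, i.e. outcome = 1
for p in (0.08, 0.16, 0.90):
    print(p, brier_score(p, 1))
# 0.08 -> ~0.8464, 0.16 -> ~0.7056, 0.90 -> ~0.01
```

A single resolved event barely separates the 8% and 16% forecasters, but strongly separates either from a 90% forecaster, which is the point being made.
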
grillitoazul•45m ago
From a mathematical point of view there are two factors: (1) the prior predictive capability of the human agents and (2) the acceleration in the predicted event. Examining the result under such a model, we conclude:

The more prior predictive power the human agents had, the greater the posterior acceleration of progress in LLMs' math capability.

Here we are supposing that the increase in training data is not the main explanatory factor.

This example is the germ of a general framework for assessing acceleration in LLM progress, and I think applying it to many data points could give us valuable information.

grillitoazul•15m ago
Another take at a sound interpretation:

(1) Poor prior predictive capability of the humans implies that the result provides no information.

(2) Good prior predictive capability of the humans implies that there is acceleration in the math capabilities of LLMs.

zeroonetwothree•28m ago
A 16% or even 8% event happening is quite common so really it tells us nothing and doesn’t mean either one was pretty wrong.
andrepd•1h ago
Context? Who are these people and what are these numbers and why shouldn't I assume they're pulled from thin air?
orespo•4h ago
Definitely interesting. Two thoughts. First, are the IMO questions somewhat related to other openly available questions online, making it easier for LLMs that are more efficient and better at reasoning to deduce the results from the available content?

Second, happy to test it on open math conjectures or by attempting to reprove recent math results.

ktallett•4h ago
You mean the previous years' questions will have been used to train it? Yes, they are the same kind of questions, and due to the limited format of math questions there are repeats, so LLMs should fundamentally be able to recognise the structure and similarities and use that.
laurent_du•1h ago
They are not the same questions. Why are you spreading so many misinformed takes in this thread? I know a guy who had one of the best scores in history at the IMO, and he's incredibly intelligent. Stop repeating that getting a gold medal at the IMO is a piece of cake - it's not.
evrimoztamur•4h ago
From what I've seen, IMO question sets are very diverse. Moreover, humans also train on all available set of math olympiad questions and similar sets too. It seems fair game to have the AI train on them as well.

For 2, there's an army of independent mathematicians right now using automated theorem provers to formalise more or less all mathematics as we know it. It seems like open conjectures are chiefly bounded by a genuine lack of new tools/mathematics.

dylanbyte•4h ago
These are high school level only in the sense of assumed background knowledge, they are extremely difficult.

Professional mathematicians would not get this level of performance, unless they have a background in IMO themselves.

This doesn’t mean that the model is better than them in math, just that mathematicians specialize in extending the frontier of math.

The answers are not in the training data.

This is not a model specialized to IMO problems.

ktallett•4h ago
I think that's an insult to professional mathematicians. Any mathematician who has got to the stage where they do this for a living will be more than capable of doing Olympiad questions. These are proofs and some general numerical maths; some are probably a little trickier than others, but the questions aren't unique, and most final-year BSc students in maths will have encountered similar. I wouldn't consider myself particularly great at maths (despite it being the language of physics/engineering, as many of my lecturers told me), but I can do plenty of the past questions without any significant reading. Most of these are similar to later-years uni problems, so the LLM will be able to find answers with the right searching. It may not be specialised to IMO problems, but these sorts of math questions pop up in plenty of settings, so it doesn't need to be.
Davidzheng•4h ago
No, I assure you >50% of working mathematicians will not score gold level at the IMO consistently (I'm in the field). As the original parent said, pretty much only people who had the training in high school can. Number theorists without training might be able to do some number-theory IMO questions, but this level is basically impossible without specialized training (with maybe a few exceptions for very strong mathematicians).
ktallett•3h ago
I sense we may just have different experiences of colleagues' skill sets, as I can think of 5 people I could send some questions to, and I know they would do them just fine. In fact we often have done similar problems on a free afternoon, and I often do the same on flights as a way to pass the time and improve my focus (my issue isn't my talent/understanding at maths, it's my ability to concentrate). I don't disagree that some level of training is needed, but these questions aren't unique, nor impossible, especially as said training does exist and LLMs can access said examples. LLMs also have brute force, which is a significant help with these types of issues. One particular point: of all the STEM topics, math is probably the best documented, alongside CS.
Davidzheng•3h ago
I mean, you can get better at these problems with practice. But if you haven't solved many before and can do them after an afternoon of thought, I would be very impressed. Not that I don't believe you; it's just that in my experience people like this are very rare. (Also, I assume they have to have some degree of familiarity with common tricks, otherwise they would have to derive basic number theory from scratch, and that seems a bit much for me to believe.)
ktallett•3h ago
I think honestly it's probably different experiences and skillsets. I find these sorts of things doable, bar dumb mistakes, yet there will be other things I'll get stressed about and not be able to do for ages (some lab skills, no matter the number of times I do them, and some physical equation derivations that I regularly muck up). I sometimes assume that what comes easy for me comes easy for all, and what I struggle with, everyone struggles with, and that's probably not always the case. Likewise I did similar tasks as a teen in school and assume that is the case for many of the academically bright, so to speak, but perhaps it isn't, so that probably helped me learn some tricks that I may not have otherwise. But as you say, I do feel you can learn the tricks and learn how to do them, even at an older age (academically speaking), if you have the time, the patience and the right guide.
credit_guy•3h ago
> No I assure you >50% of working mathematicians will not score gold level at IMO consistently (I'm in the field)

I agree with you. However, would a lot of working mathematicians score gold level without the IMO time constraints? Working mathematicians generally are not trying to solve a problem in the time span of one hour. I would argue that most working mathematicians, if given an arbitrary IMO problem and allowed to work on it for a week, would solve it. As for "gold level", with IMO problems you either solve one or you don't.

You could counter that it is meaningless to remove the time constraints. But we are comparing humans with OpenAI here. It is very likely OpenAI solved the IMO problems in a matter of minutes, maybe even seconds. When we talk about a chatbot achieving human-level performance, it's understood that the time is not a constraint on the human side. We are only concerned with the quality of the human output. For example: can OpenAI write a novel at the level of Jane Austen? Maybe it can, maybe it can't (for now) but Jane Austen was spending years to write such a novel, while our expectation is for OpenAI to do it at the speed of multiple words per second.

Davidzheng•3h ago
I mean, back when I was practicing these problems I would sometimes try them on and off for a week and would be able to do some 3s and 6s (usually I can do 1 and 4 somewhat consistently, and usually none of the others). As a working mathematician today, I would almost certainly not be able to get gold-medal performance in a week, but for a given problem I'd guess I'd have at least a ~50% chance of solving it in a week? But I haven't tried in a while. I suspect the professionals here do worse at these competition questions than you think. Certainly these problems are "easy" compared to many of the questions we think about, but expertise drastically shifts the speed/difficulty of questions we can solve within our domains, if that makes sense.

Addendum: Actually, I am not sure the probability of solving one in a week is much better than in 6 hours for these questions, because they are kind of random questions. But I agree with parts of your post, tbf.

jsnell•1h ago
> It is very likely OpenAI solved the IMO problems in a matter of minutes, maybe even seconds

Really? My expectation would have been the opposite, that time was a constraint for the AIs. OpenAI's highest end public reasoning models are slow, and there's only so much that you can do by parallelization.

Understanding how they dealt with time actually seems like the most important thing to put these results into context, and they said nothing about it. Like, I'd hope they gave the same total time allocation for a whole problem set as the human competitors. But how did they split that time? Did they work on multiple problems in parallel?

gametorch•3h ago
Getting gold at the IMO is pretty damn hard.

I grew up in a relatively underserved rural city. I skipped multiple grades in math, completed the first two years of college math classes while in high school, and won the award for being the best at math out of everyone in my school.

I've met and worked with a few IMO gold medalists. Even though I was used to scoring in the 99th percentile on all my tests, it felt like these people were simply in another league above me.

I'm not trying to toot my own horn. I'm definitely not that smart. But it's just ridiculous to shoot down the capabilities of these models at this point.

npinsker•3h ago
The trouble is, getting an IMO gold medal is much easier (by frequency) than being the #1 Go player in the world, which was achieved by AI 10 years ago. I'm not sure it's enough to just gesture at the task; drilling down into precisely how it was achieved feels important.

(Not to take away from the result, which I'm really impressed by!)

Invictus0•2h ago
The "AI" that won Go was Monte Carlo tree search on a neural net "memory" of the outcome of millions of previous games; this is a LLM solving open ended problems. The tasks are hardly even comparable.
gafferongames•2h ago
And then they created AlphaGo Zero, which is not trained on any previous games, and it was even stronger!

https://deepmind.google/discover/blog/alphago-zero-starting-...

yobbo•23m ago
A "reasoning LLM" might not be conceptually far from MCTS.
parsimo2010•1h ago
I am a professor in a math department (I teach statistics but there is a good complement of actual math PhDs) and there are only about 10% who care about these types of problems and definitely less than half who could get gold on an IMO test even if they didn’t care.

They are all outstanding mathematicians, but the IMO type questions are not something that mathematicians can universally solve without preparation.

There are of course some places that pride themselves on only taking “high scoring” mathematicians, and people will introduce themselves with their name and what they scored on the Putnam exam. I don’t like being around those places or people.

crinkly•1h ago
100% agree with this.

My second degree is in mathematics. Not only can I probably not do these but they likely aren’t useful to my work so I don’t actually care.

I’m not sure an LLM could replace the mathematical side of my work (modelling). Mostly because it’s applied and people don’t know what they are asking for, what is possible or how to do it and all the problems turn out to be quite simple really.

jebarker•1h ago
IMO questions are to math as leetcode questions are to software engineering. Not necessarily easier or harder but they test ability on different axes. There’s definitely some overlap with undergrad level proof style questions but I disagree that being a working mathematician would necessarily mean you can solve these type of questions quickly. I did a PhD in pure math (and undergrad obv) and I know I’d have to spend time revising and then practicing to even begin answering most IMO questions.
demirbey05•4h ago
Are you from OpenAI ?
ktallett•4h ago
Hahaha! It's either that or they are determined to get a job there.
Davidzheng•4h ago
Are you sure this is not specialized to IMO? I do see the twitter thread saying it's "general reasoning" but I'd imagine they RL'd on olympiad math questions? If not I really hope someone from OpenAI says that bc it would be pretty astounding.
stingraycharles•1h ago
They also said this is not part of GPT-5 and "will be released later". It's very, very likely a model specifically fine-tuned for this benchmark, where afterwards they'll evaluate what actual real-world problems it's good at (e.g. "use o4-mini-high for coding").
AIPedant•3h ago
It almost certainly is specialized to IMO problems, look at the way it is answering the questions: https://xcancel.com/alexwei_/status/1946477742855532918

E.g here: https://pbs.twimg.com/media/GwLtrPeWIAUMDYI.png?name=orig

Frankly it looks to me like it's using an AlphaProof style system, going between natural language and Lean/etc. Of course OpenAI will not tell us any of this.
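
For readers who haven't seen it, Lean is a proof assistant that mechanically checks proofs; an AlphaProof-style pipeline translates natural-language math into statements of roughly this shape. A toy example (my own, unrelated to the IMO problems), which I believe is valid Lean 4 but should be treated as a sketch:

```lean
-- Toy machine-checkable statement: the sum of two even naturals is even.
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ => ⟨a + b, by omega⟩  -- omega closes the arithmetic goal
```

The point is that once a proof is in this form, correctness is verified by the kernel rather than judged by a human or another model.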

redlock•3h ago
Nope

https://x.com/polynoamial/status/1946478249187377206?s=46&t=...

AIPedant•2h ago
If you don't have a Twitter account then x.com links are useless, use a mirror: https://xcancel.com/polynoamial/status/1946478249187377206

Anyway, that doesn't refute my point, it's just PR from a weaselly and dishonest company. I didn't say it was "IMO-specific" but the output strongly suggests specialized tooling and training, and they said this was an experimental LLM that wouldn't be released. I strongly suspect they basically attached their version of AlphaProof to ChatGPT.

Davidzheng•2h ago
We can only go off their word, unfortunately, and they say no formal math, so I assume it's being eval'd by a verifier model instead of a formal system. There are actually some hints of this, because geometry in Lean is not that well developed, so unless they also built their own system it's hard to do it formally (though their P2 proof is by coordinate bash, i.e. computation by algebra instead of geometric construction, so it's hard to tell).
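
"Coordinate bash" means assigning coordinates and grinding out the algebra instead of finding a synthetic construction. A toy illustration of the flavor (nothing to do with the actual P2): checking that a parallelogram's diagonals bisect each other, using exact rational arithmetic over random coordinates:

```python
from fractions import Fraction
import random

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

rng = random.Random(1)

def rand_pt():
    return (Fraction(rng.randint(-9, 9)), Fraction(rng.randint(-9, 9)))

# Check the identity on many random parallelograms; Fractions make the
# comparison exact, so no floating-point tolerance is needed
for _ in range(100):
    A, B, D = rand_pt(), rand_pt(), rand_pt()
    C = (B[0] + D[0] - A[0], B[1] + D[1] - A[1])  # ABCD a parallelogram
    assert midpoint(A, C) == midpoint(B, D)
print("diagonals bisect each other in all trials")
```

In a real coordinate bash the algebra is done symbolically once rather than sampled, but the spirit is the same: geometry reduced to computation.
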
skdixhxbsb•1h ago
> We can only go off their word

We’re talking about Sam Altman’s company here. The same company that started out as a non profit claiming they wanted to better the world.

Suggesting they should be given the benefit of the doubt is dishonest at this point.

fnordpiglet•1h ago
I actually think this “cheating” is fine. In fact it’s preferable. I don’t need an AI that can act as a really expensive calculator or solver. We’ve already built really good calculators and solvers that are near optimal. What has been missing is the abductive ability to successfully use those tools in an unconstrained space with agency. I find really no value in avoiding the optimal or near optimal techniques we’ve devised rather than focusing on the harder reasoning tasks of choosing tools, instrumenting them properly, interpreting their results, and iterating. This is the missing piece in automated reasoning after all. A NN that can approximate at great cost those tools is a parlor trick and while interesting not useful or practical. Even if they have some agent system here, it doesn’t make the achievement any less that a machine can zero shot do as well as top humans at incredibly difficult reasoning problems posed in natural language.
YeGoblynQueenne•43m ago
>> This is not a model specialized to IMO problems.

How do you know?

gniv•2h ago
From that thread: "The model solved P1 through P5; it did not produce a solution for P6."

It's interesting that it didn't solve the problem that was by far the hardest for humans too. China, the #1 team got only 21/42 points on it. In most other teams nobody solved it.

demirbey05•1h ago
I think someone from the Canadian team solved it, but overall very few did.
gus_massa•1h ago
In the IMO, the idea is that on the first day you get P1, P2 and P3, and on the second day you get P4, P5 and P6. Ordered by difficulty, they are usually P1, P4, P2, P5, P3, P6. So usually P1 is "easy" and P6 is very hard. At least that is the intended order, but sometimes reality disagrees.

Edit: Fixed P4 -> P3. Thanks.

thundergolfer•57m ago
You have P4 twice in there; the latter should be P3
masterjack•5m ago
In this case P6 was unusually hard and P3 was unusually easy https://sugaku.net/content/imo-2025-problems/
davidguetta•2h ago
Wait for the Chinese version
procgen•1h ago
riding coattails
johnecheck•2h ago
Wow. That's an impressive result, but how did they do it?

Wei references scaling up test-time compute, so I have to assume they threw a boatload of money at this. I've heard talk of running models in parallel and comparing results; if OpenAI ran this 10,000 times in parallel and cherry-picked the best one, this is a lot less exciting.

If this is legit, then we need to know what tools were used and how the model used them. I'd bet those are the 'techniques to make them better at hard to verify tasks'.

Davidzheng•2h ago
I don't think it's much less exciting if they ran it 10,000 times in parallel. It implies an ability to discern when a proof is correct and rigorous (which o3 can't do consistently), and it also means that outputting the full proof is within the model's capabilities, even if rare.
FeepingCreature•1h ago
The whole point of RL is if you can get it to work 0.01% of the time you can get it to work 100% of the time.
lcnPylGDnU4H9OF•2h ago
> what tools were used and how the model used them

According to the twitter thread, the model was not given access to tools.

constantcrying•1h ago
>if OpenAI ran this 10000 times in parallel and cherry-picked the best one, this is a lot less exciting.

That entirely depends on who did the cherry-picking. If the LLM had 10,000 attempts and a human had to falsify each one, this story means absolutely nothing. If the LLM itself did the cherry-picking, then this is just akin to a human solving a hard problem: attempting solutions and falsifying them until the desired result is achieved. Just that the LLM scales with compute, while humans operate only sequentially.
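The generate-and-falsify loop described here is essentially best-of-n sampling with a verifier. Below is a minimal sketch of that idea; the `generate_candidate` and `verify` functions and the 2% per-attempt success rate are stand-in assumptions, not anything OpenAI has disclosed:

```python
import random

def generate_candidate(rng: random.Random) -> bool:
    # Stand-in for sampling one proof attempt from a model.
    # We assume each attempt is "correct" with small probability p = 0.02.
    return rng.random() < 0.02

def verify(candidate: bool) -> bool:
    # Stand-in for an automated checker (a formal proof verifier,
    # or a second model acting as a grader).
    return candidate

def best_of_n(n: int, seed: int = 0):
    """Sample n independent attempts and return the index of the
    first one that passes verification, or None if none do."""
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        if verify(generate_candidate(rng)):
            return attempt
    return None

# With p = 0.02 per attempt, 10,000 samples almost surely contain a
# verified solution: 1 - (1 - 0.02)**10_000 is indistinguishable from 1.
print(best_of_n(10_000))
```

The point the comment makes is visible in the sketch: as long as `verify` is automated, no human ever has to inspect the failed attempts, so scaling n is purely a compute question.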

johnecheck•1h ago
The key bit here is whether the LLM doing the cherry-picking had knowledge of the solution. If it didn't, this is a meaningful result. That's why I'd like more info, but I fear OpenAI is going to try to keep things under wraps.
diggan•1h ago
> If it didn't

We kind of have to assume it didn't, right? Otherwise bragging about the results makes zero sense and would be outright misleading.

blibble•26m ago
OpenAI has been caught doing exactly this before
samat•23m ago
> would be outright misleading

why wouldn't they? what's the incentive not to?

fnordpiglet•1h ago
Why is that less exciting? A machine competing in an unconstrained, difficult, natural-language math contest and coming out on top by any means would have been breathtaking science fiction a few years ago. Now it's not exciting? Regardless of the tools for verification, or even solvers, why are the goalposts moving so fast? There is no bonus for "purity of essence" in using only neural networks. We live in an era where it's hard to tell whether machines are thinking or not, which since the first computing machines was seen as the ultimate achievement. Now we pooh-pooh the results of each iteration, which unfold month over month, not decade over decade.

You don’t have to be hyped to be amazed. You can retain the ability to dream while not buying into the snake oil. This is amazing no matter what ensemble of techniques used. In fact - you should be excited if we’ve started to break out of the limitations of forcing NN to be load bearing in literally everything. That’s a sign of maturing technology not of limitations.

parasubvert•49m ago
I think the main hesitancy is due to rampant anthropomorphism. These models cannot reason, they pattern match language tokens and generate emergent behaviour as a result.

Certainly the emergent behaviour is exciting but we tend to jump to conclusions as to what it implies.

This means we are far more trusting of software that lacks formal guarantees than we should be. We are used to software being sound by default but otherwise a moron that requires very precise inputs, parameters, and testing to act correctly. System 2 thinking.

Now with NNs it's inverted: a brilliant know-it-all that bullshits a lot and falls apart in ways we may gloss over, even with enormous resources spent on training. It's effectively incredible progress on System 1 thinking, with questionable but evolving System 2 skills whose limits we don't know.

If you're not familiar with System 1 / System 2, it's googlable.

logicchains•10m ago
>I think the main hesitancy is due to rampant anthropomorphism. These models cannot reason, they pattern match language tokens and generate emergent behaviour as a result

This is rampant human chauvinism. There's absolutely no empirical basis for the statement that these models "cannot reason", it's just pseudoscientific woo thrown around by people who want to feel that humans are somehow special. By pretty much every empirical measure of "reasoning" or intelligence we have, SOTA LLMs are better at it than the average human.

YeGoblynQueenne•45m ago
>> Why is that less exciting? A machine competing in an unconstrained natural language difficult math contest and coming out on top by any means is breath taking science fiction a few years ago - now it’s not exciting?

Half the internet is convinced that LLMs are a big data cheating machine and if they're right then, yes, boldly cheating where nobody has cheated before is not that exciting.

reactordev•2h ago
The Final boss was:

   Which is greater, 9.11 or 9.9?

/s

I kid, this is actually pretty amazing!! I've noticed over the last several months that I've had to correct it less and less when dealing with advanced math topics so this aligns.

zkmon•1h ago
This is awesome progress in human achievement in making these machines intelligent. And it is also a fast regression and decline in human wisdom!

We are simply greasing the grooves, letting things slide faster and faster, and calling it progress. How does this help make the integration of humans and nature better?

Does this improve the climate, or help humans adapt to a changing climate? Are intelligent machines a burning need for humanity today? Or is it all about business and political dominance? At what cost? What's the fallout of all this?

jebarker•1h ago
Nobody knows the answers to these questions. Relying on AGI to solve problems like climate change seems like a risky strategy, but on the other hand it's very plausible that these tools can help in some capacity. So we have to build, study, and find out, but also consider the opportunity cost of building these tools versus others.
jfengel•15m ago
Solving climate change isn't a technical problem, but a human one. We know the steps we have to take, and have for many years. The hard part is getting people to actually do them.

No human has any idea how to accomplish that. If a machine could, we would all have much to learn from it.

chairhairair•1h ago
OpenAI simply can’t be trusted on any benchmarks: https://news.ycombinator.com/item?id=42761648
amelius•1h ago
This is not a benchmark, really. It's an official test.
andrepd•1h ago
And what were the methods? How was the evaluation? They could be making it all up for all we know!
qoez•57m ago
Remember that they've fired all the whistleblowers who would admit to breaking the verbal agreement not to train on the test data.
samat•28m ago
Could not find it on the open web. Do you have clues to search for?
Aurornis•7m ago
The International Math Olympiad isn’t an AI benchmark.

It’s an annual human competition.

tlb•1h ago
I encourage anyone who thinks these are easy high-school problems to try to solve some. They're published (including this year's) at https://www.imo-official.org/problems.aspx. They make my head spin.
xpressvideoz•43m ago
I didn't know there were localized versions of the IMO problems. But now that I think of it, having versions in multiple languages is a must to remove the language barrier for competitors. I guess having that many language versions (I see ~50 languages?) makes keeping the problems secure considerably harder?
ksec•1h ago
I am neither an optimist nor a pessimist about AI. I would likely be called both by the opposing camps. But the fact that AI/LLMs are still rapidly improving is impressive in itself and worth celebrating. Is it perfect, AGI, ASI? No. Is it useless? Absolutely not.

I am just happy the prize for AI is so big that there is enough money involved to push for all the hardware advancement. Foundry, packaging, interconnect, networking, etc.: all the hardware research and tech improvements previously thought too expensive are now in the "shut up and take my money" scenario.

andrepd•1h ago
Am I missing something, or is this completely meaningless? It's 100% opaque: no details whatsoever, and no transparency or reproducibility.

I wouldn't trust these results as it is. Considering that there are trillions of dollars on the line as a reward for hyping up LLMs, I trust it even less.

flappyeagle•48m ago
Yes you are missing the entire boat
meroes•1h ago
In the RLHF sphere you could tell some AI companies were targeting this, given how many IMO-specific RLHF'ers they were hiring. I don't think it's easy to say how much "progress" this represents, given that.
ALLTaken•55m ago
I think OpenAI participating is nothing but a publicity stunt, and wholly unfair and disrespectful to Human participants. AI models should be allowed to participate, but they should not be ranked equally, nor put any engineers under duress of having to pull all-nighters. AI model performance should be shown T+2 days AFTER the contest! I wish that real Humans who worked hard could enjoy the attention, prize, and respect they deserve!

Billion-dollar companies stealing not only the prize, prestige, time, and sleep of participants by brute-forcing their model through all the illegally scraped code on GitHub is a disgrace to humanity.

AI models should read the same materials to become proficient in coding, without having trillions of lines of code to ape through mindlessly. Otherwise the "AI" is no different from an elaborate Monte Carlo Tree Search (MCTS).

Yes, I know AI is quite advanced. I know that quite well, study the latest SOTA papers daily, and have developed my own models from the ground up as well, but despite all the advancements it is still far from being substantially better than MCTS (see: https://icml.cc/virtual/2025/poster/44177 and https://allenai.org/blog/autods )

EDIT, adding proof:

These are the results of the last competition they tried to win and LOST: https://arstechnica.com/ai/2025/07/exhausted-man-defeats-ai-...

(Looks like a pattern: OpenAI is scraping competitions to place itself in the spotlight and headlines.)

aubanel•49m ago
- AI competing is "wholly unfair"

- "[AI is] far away from being substantially being better than MCTs"

^ pick only one

stingraycharles•41m ago
Yeah, it's a completely fair playing field; it's completely obvious that AI should be able to compete with humans, in the same way that robotics and computers can compete with humanity (and are better suited for many tasks).

Whether or not they're far from being better than humans is up for debate, but the entire point of these types of benchmarks is to compare them to humans.

bluecalm•14m ago
>>Yeah it’s a completely fair playing field, it’s completely obvious that AI should be able to compete with humans in the same way that robotics and computers can compete with humanity (and are better suited for many tasks).

Yeah, the same way computers and robots should be able to win the World Chess Championship, the 100m dash, and Wimbledon.

>>but the entire point of these types of benchmarks it to compare them to humans

The entire point of the competition is to compete against participants who are similar to you, have similar capabilities, and go through similar struggles. If you want bot-vs-human competitions, great: organize them yourself instead of hijacking well-established competitions.

yobbo•36m ago
Running MCTS over algorithms is the part that might be considered unfair if used in competition with humans.
threatripper•30m ago
Humans should be allowed to compete in groups of arbitrary size. This would also be a demonstration of excellent teamwork under time pressure.
Aurornis•37m ago
> nor put any engineers under duress of having to pull all-nighters.

Under duress? At a company like this, all of the people working on this project are there because they want to be and they’re compensated millions.

up2isomorphism•54m ago
In fact, no car company claims "gold medal" performance in Olympic running, even though they could have done that 100 years ago. Obviously, since the IMO does not generate much money, it is an easy target.

BTW, "gold medal performance" looks like a promotional term to me.

flappyeagle•51m ago
LMAO
ddtaylor•44m ago
Glock should show up to the UFC and win the whole tournament handily.
amelius•54m ago
Makes sense. Mathematicians use intuition a lot to drive their solution-seeking, and I suppose an AI such as an LLM could develop intuition too. Of course, where AI really wins is search speed and the fact that an LLM doesn't get tired when exploring different strategies and the steps within each strategy.

However, I expect that geometric intuition may still be lacking, mostly because of the difficulty of encoding it in a form an LLM can easily work with. After all, ChatGPT still can't draw a unicorn [1], although it seems to be getting closer.

[1] https://gpt-unicorn.adamkdean.co.uk/

quirino•52m ago
I think equally impressive is the performance of the OpenAI team at the "AtCoder World Tour Finals 2025" a couple of days ago. There were 12 human participants, and only one did better than OpenAI.

Not sure there is a good writeup about it yet but here is the livestream: https://www.youtube.com/live/TG3ChQH61vE.

zeroonetwothree•27m ago
And yet when working on production code current LLMs are about as good as a poor intern. Not sure why the disconnect.
kenjackson•10m ago
Depends. I've been using it for some of my workflows, and I'd say it is more like a solid junior developer with weird quirks: sometimes it makes stupid mistakes, and other times it behaves like a 30-year SME vet.
chvid•51m ago
I believe this company used to present its results and approach in academic papers with enough detail that they could be reproduced by third parties.

Now it is just doing a bunch of tweets?

samat•30m ago
This company used to be non profit

And many other things

do_not_redeem•24m ago
They're doing tweets because the results cannot be reproduced. https://matharena.ai/
modeless•42m ago
Noam Brown:

> this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

> it’s also more efficient [than o1 or o3] with its thinking. And there’s a lot of room to push the test-time compute and efficiency further.

> As fast as recent AI progress has been, I fully expect the trend to continue. Importantly, I think we’re close to AI substantially contributing to scientific discovery.

I thought progress might be slowing down, but this is clear evidence to the contrary. Not the result itself, but the claims that it is a fully general model and has a clear path to improved efficiency.

https://x.com/polynoamial/status/1946478249187377206

stingraycharles•36m ago
My issue with all these citations is that it’s all OpenAI employees that make these claims.

I’ll wait to see third party verification and/or use it myself before judging. There’s a lot of incentives right now to hype things up for OpenAI.

do_not_redeem•27m ago
A third party also tried this exact experiment. OpenAI did half as well as Gemini, and none of the models even got bronze.

https://matharena.ai/imo/

jsnell•9m ago
I feel you're misunderstanding something. That's not "this exact experiment". Matharena is testing publicly available models against the IMO problem set; OpenAI was announcing the results of a new, unpublished model on that problem set.

It is totally fair to discount OpenAI's statement until we have way more details about their setup, and maybe even until there is some level of public access to the model. But you're doing something very different: claiming that their results are fraudulent and (incorrectly) using the Matharena results as your proof.

YeGoblynQueenne•32m ago
How is a claim, "clear evidence" to anything?
kelipso•21m ago
Haha, if Musk had made a claim five years ago, it would've been taken as clear evidence here. Now it's other people, I guess; hype never dies.
modeless•19m ago
Most evidence you have about the world is claims from other people, not direct experiment. There seems to be a thought-terminating cliche here on HN, dismissing any claim from employees of large tech companies.

Unlike seemingly most here on HN, I judge people's trustworthiness individually and not solely by the organization they belong to. Noam Brown is a well known researcher in the field and I see no reason to doubt these claims other than a vague distrust of OpenAI or big tech employees generally which I reject.

csomar•25m ago
The issue is that trust is very hard to build and very easy to lose. Even in today's age where regular humans have a memory span shorter than that of an LLM, OpenAI keeps abusing the public's trust. As a result, I take their word on AI/LLMs about as seriously as I'd take my grocery store clerk's opinion on quantum physics.
YeGoblynQueenne•36m ago
Guys, that's nothing. My new AI system is not LLM-based but neuro-symbolic and yet it just scored 100% on the IMO 2026 problems that haven't even been written yet, it is that good.

What? This is a claim with all the trustworthiness of OpenAI's claim. I mean, I can claim anything I want at this point and it would still be just as trustworthy as OpenAI's claim, with exactly zero details about anything other than "we did it, promise".

Jackson__•30m ago
Also interesting takeaways from that tweet chain:

>GPT5 soon

>it will not be as good as this secret(?) model

mehulashah•29m ago
The AI scaling that went on for the last five years is going to be very different from the scaling that will happen in the next ten years. These models have latent capabilities that we are racing to unearth. IMO is but one example.

There's so much to do at inference time. This result could not have been achieved without the substrate of general models. It's not like Go or protein folding: you need the collective public knowledge of society to build on. And yes, there's enough left for ten years of exploration.

More importantly, the stakes are high. There may be zero day attacks, biological weapons, and more that could be discovered. The race is on.

mikert89•27m ago
The cynicism/denial on HN about AI is exhausting. Half the comments are some weird form of explaining away the ever-increasing performance of these models
softwaredoug•25m ago
Probably because both sides have strong vested interests and it’s next to impossible to find a dispassionate point of view.

The pro-AI crowd (VCs, tech CEOs, etc.) have a strong incentive to claim humans are obsolete. Tech employees see threats to their jobs and want to pooh-pooh any way AI could be useful or competitive.

rvz•18m ago
Or some can spot a euphoric bubble when they see one, with lots of participants who have over-invested in the 90% of these so-called AI startups that are not frontier labs.
yunwal•13m ago
What does this have to do with the math Olympiad? Why would it frame your view of the accomplishment?
orbital-decay•10m ago
That's a huge hyperbole. I can assure you many people find the entire thing genuinely fascinating, without having any vested interest and without buying the hype.
halfmatthalfcat•23m ago
The overconfidence/short sightedness on HN about AI is exhausting. Half the comments are some weird form of explaining how developers will be obsolete in five years and how close we are to AGI.
blamestross•12m ago
Nobody likes the idea that this is only "economically superior AI": not as good as humans, but a LOT cheaper.

The "It will just get better" is bubble baiting the investors. The tech companies learned from the past and they are riding and managing the bubble to extract maximum ROI before it pops.

The reality is that a lot of work done by humans can be replaced by an LLM at lower quality and nuance. The loss in sales/satisfaction/etc. is more than offset by the reduced cost.

The current crop of LLMs are enshittification accelerators, and that will have real effects.

Aurornis•10m ago
> Half the comments are some weird form of explaining how developers will be obsolete in five years and how close we are to AGI.

I do not see that at all in this comment section.

There is a lot of denial and cynicism like the parent comment suggested. The comments trying to dismiss this as just “some high school math problem” are the funniest example.

gellybeans•10m ago
Making an account just to point out how these comments are far more exhausting, because they don't engage with the subject matter. They just agree with a headline and say, "See?"

You say "explaining away the increasing performance" as though that were a good-faith representation of the arguments made against LLMs, or even this specific article. Questioning the self-congratulatory nature of these businesses is perfectly reasonable.

gcanyon•15m ago
99.99+% of the problems humans face do not require particularly original solutions. Determining whether LLMs can solve truly original (or at least obscure) problems is interesting and worth doing, but it ignores the vast majority of the (near-term, at least) impact they will have.
another_twist•12m ago
It's a level playing field, IMO. But there's another thread which claims not even bronze, and I really don't want to go to X for anything.
another_twist•11m ago
I am quite surprised that DeepMind with MCTS wasn't able to figure out math performance itself.