LLMs Are Currently Not Helpful at All for Math Research: Hamkins

https://officechai.com/ai/llms-are-currently-not-helpful-at-all-for-math-research-give-garbage-answers-mathematician-joel-david-hamkins/

27•ablaba•1mo ago

Comments

j16sdiz•1mo ago

> The frustration, for Hamkins, goes beyond mere incorrectness—it’s the nature of the interaction itself that proves problematic.

I would assume pairing LLMs with a formal proof system would help a lot. At the very least, the system can know what is incorrect, without lengthy debates, which frustrate him most.

This won't help the system discover or solve new math, but it make the experience far more endurable.

112233•1mo ago

That itself may be tricky. Suppose you proof system tells that the proof is correct — how do you verify it is proof of the assertion you want, unless you have beforehand written the "test harness" by hand? At least in programming, formally checking that the code does exactly what is required is orders of magnitude harder than simply writing code that compiles.

panny•1mo ago

"It is difficult to get a man to understand something when his salary depends on his not understanding it."

Well, this definitely won't make the front page despite upvotes. Too many people here are heavily invested and cannot stand for it.

fragmede•1mo ago

It's currently #15 on the front page.

https://hnrankings.info/46496026/

panny•1mo ago

I believe you, but also believe me when I tell you it is not on my frontpage.

ctrl-F llm

https://i.imgur.com/D5K6b9c.png

Not this story.

edit: Hi fragmede, I'd reply but Nazi @dang says I'm posting too fast again. I guess you won't be working on that fully free chromebook hardware either. I'll keep trying anyway. I'm not motivated by money like you guys. Keep chasing that carrot however.

fragmede•1mo ago

HN rankings put it on the front page for 15 minutes, now it's dropped to 44th. Point is, not everything is a vast conspiracy against you, as much as it would make it easier for you to believe the world is run by some shadowy conspiracy, that there's someone in charge of things.

NitpickLawyer•1mo ago

> “I’ve played around with it and I’ve tried experimenting, but I haven’t found it helpful at all. Basically zero. It’s not helpful to me. And I’ve used various systems and so on, the paid models and so on.”

Eh... I don't know. It's hard for me to believe such absolutist takes. Especially since other proeminent mathematicians (i.e. Tao) have spoken highly of the help and value they get from these systems.

I also see this kind of takes in my field. Especially from "senior" people, 20+ years of experience. The problem is that, when pressed, their "trying it out" is often the most basic, naive, technically atrocious type of testing. "I tried coding with it but it's useless" -> they tried once, 3 years ago, in a chatgpt web interface, and never touched it since.

I think there's a lot of disconnect between the overhypers and deniers. Both are wrong, IMO, and unfortunately both are extremely vocal while also being stubborn and unwilling to try different things. But my gut feeling is that if in 2025/26 someone says "basically zero" they are wrong. Or naive. Or make strawman arguments. And ignorable.

mcv•1mo ago

I've been using it very recently. Not for mathematics , but for programming. And while Claude Opus is much more likely to admit mistake ("You're absolutely correct!" instead of "That's fine") when I correct it, it does require correcting, and has been incapable at grasping complex problems. I can't trust it to produce correct code for complex problems, because when I did, the solution turned out to be wrong. Plausible looking, and certainly producing some results, but not the correct ones.

It has still been useful, because parts of the problem have been solved before, and in some cases it can accurately reproduce those parts, which has been massively helpful. And it can help me understand some aspects of the problem. But it can't solve it for me. It has shown that that's simply beyond its capabilities.

MaysonL•1mo ago

Terry Tao has found them helpful in the past[0], and I’d trust him over Hamkins.

[0] https://terrytao.wordpress.com/2025/12/08/the-story-of-erdos...

measurablefunc•1mo ago

They do different kinds of mathematics. Tao works mainly in analytic number theory & that's why he got a Fields medal. Hamkins works in foundations of mathematics, logic, & computability theory.

pfdietz•1mo ago

Tao has also used "Good Old Fashioned AI" (in the form of Mace4/Prover9) for math research.

https://terrytao.wordpress.com/2025/12/09/the-equational-the...

watchthedemo•1mo ago

For the sake of rigor it appears he found it interesting that someone else found them helpful in the past rather than "Terry Tao[himself] has found them helpful in the past."

> This was one further addition to a recent sequence of examples where an Erdős problem had been automatically solved in one fashion or another by an AI tool.[Aristotle] Like the previous cases, the proof turned out to not be particularly novel

He concludes:

> One striking feature of this story for me is how important it was to have a diverse set of people, literature, and tools to attack this problem. To be able to state and prove the precise formula for {c(n)} required multiple observations

Which looks like he wants to focus the praise for the solution on the "diverse set of people" relegating any "praise" for LLMs to the obfuscated and ambiguous category of "tools" behind literature.

sph•1mo ago

Isaac Newton was deep into alchemy and occult. Even very smart people suffer from confirmation bias, Dunning-Kruger effect & co.

20kHz•1mo ago

Well folks, Tao says he finds AI useful so I guess we can flag this and other stories that say anything to the contrary, pack it up and go home.

pfdietz•1mo ago

Not only said, but demonstrated. That counts for quite a lot.

watchthedemo•1mo ago

I think you are missing the sarcasm of the parent comment.

I am also fairly certain you have yet to actually watch Tao "demonstrate" his interaction with these technologies because if you had you would know in those demonstrations he formally verifies theorems that already have proofs, and even more importantly, that he had already formally verified twice before 'off camera'. Despite these two realities he still spends a lot of time correcting the LLMs or completely ignoring their suggestions while expressing a mostly polite appraisal of the tech's use and capability.

Of course the hype machine takes his politeness, which has more to do with his specific personality than anything to do with the tech, as universal endorsement for the tech and all of its externalities.

I called it when he started getting in bed with these malevolent tech companies that they were laundering their misdeeds through his reputation, and here we are.

pfdietz•1mo ago

Unless you claim Tao is outright lying, his statements say he's getting objective value out of AI tools. They are helping him accelerate actual math research.

Math seems like an ideal target for AI, as it provides a firm basis for identifying correctness (has something actually been proved.) I think the consequences of this are going to be enormous for mathematics, including the conversion of the entire historical math literature to formalized, checked proofs. Once that is done, once all that training data is available, the capabilities of math AIs are IMO likely to be extraordinary. Add to that the sort of Alpha Go-style self training as AI provers work on new problems.

watchthedemo•1mo ago

> Unless you claim Tao is outright lying, his statements say he's getting objective value out of AI tools. They are helping him accelerate actual math research.

I lack the personal relationship with Tao necessary to determine if his words are truthful or otherwise (namely for me, paid for), but you also failed to provide "his statements" that say either "he's getting objective value out of AI tools" (however one may quantify such a statement) or that "they are helping him accelerate actual math research" choosing instead to simply claim those words as Tao's own.

All of the future tense coded "I think", "going to be", "Once that is done", and "IMO likely" in your second paragraph seem to be at conflict, in the very least tensewise, with your gp comment's past tense of "Not only said, but demonstrated."

pfdietz•1mo ago

I think you're just engaging in passive-aggressive denial here. There's no arguing with that, so I'm going to stop. Good day!

watchthedemo•1mo ago

Just regular ole skepticism in response to claims without substantiative evidence.

demirbey05•1mo ago

It's hard to cut through the AI hype when there are billions of dollars at stake. I usually trust negative comments more, as long as the person isn't trying to sell a course. Even though Terence Tao is a respected scientist, I wonder if his recent comments are driven by a need for funding due to federal cuts. I’ve had similar experiences with LLMs—whenever I ask them about hard math or RL theory, they almost always give me the wrong answers.

ben_w•1mo ago

I also care more about the failure modes than the successes, although in my case, it's because I keep finding them exceptionally useful at software development, and I:

1. Don't want to use them where they suck.

Think normalisation of deviance: "the problems haven't affected me therefore they don't exist" is a way to get really badly burned.

2. Want to train up in things they will still suck at by the time I've leared whatever it is.

I find LLMs seem kinda bad at writing sheet music, and Suno is kinda bad at werid instructions (like Stable Diffusion for images), but I expect them to get good before I can.

I also find them inconsistent at non-Euclidian problems: sometimes they can, sometimes they can't. I have absolutely no idea how to monetise that, but even if I could, "inconsistent" is itself an improvement on "cannot ever" which is what SOTA was a few years ago.

erelong•1mo ago

As the kids would say, sounds like a "skill issue", but more kindly just that this user hasn't developed a use for the tools for his specific use cases, which is fine; however an implication that the tools aren't useful for other people seems to contradict others' experience reports on how they have been able to find uses for the tools

imtringued•1mo ago

For me personally LLMs have made it way easier to discover obscure or unfamiliar math concepts. It's way harder to get into a topic if you have to read through hundreds of pages and then wonder if it was time well spent, compared to just finding the bits that are interesting to you and then getting deeper into the topic as you need.

jupitr•1mo ago

This is quite seriously missing the point: computational theorem-provers and contributions to mathlib (for Lean, anyway) allow for checking "correctness". The LLM is one of the many ways to search for tactics that help complete a proof for a theorem not yet in mathlib.

The LLM is not done until the theorem-prover says it is done - at which point we can be sure of the correctness of the assertion in the context. AFAIK there is no debate about inaccuracy with theorem-provers. Their real limitation continues to be converging in realistic timescales and efficient search over all combinations of proof tactics.

watchthedemo•1mo ago

> The LLM is not done until the theorem-prover says it is done - at which point we can be sure of the correctness of the assertion in the context. AFAIK there is no debate about inaccuracy with theorem-provers.

Your "in the context" is doing a lot of heavy lifting here, see:

- The Math Is Haunted : https://news.ycombinator.com/item?id=44739315 : https://overreacted.io/the-math-is-haunted/

- Three ways formally verified code can go wrong in practice : https://news.ycombinator.com/item?id=45555727 : https://buttondown.com/hillelwayne/archive/three-ways-formal...

- Breaking “provably correct” Leftpad : https://news.ycombinator.com/item?id=45492274 : https://lukeplant.me.uk/blog/posts/breaking-provably-correct...

- what does a lean proof prove? : https://news.ycombinator.com/item?id=46286605 : https://supaiku.com/what-does-a-lean-proof-prove

> Their real limitation continues to be converging in realistic timescales and efficient search over all combinations of proof tactics.

I agree "efficient search over all combination of proof tactics" will be a defining moment in the field. I disagree these LLMs are it.

I Write Games in C (yes, C)

Brookhaven Lab's RHIC Concludes 25-Year Run with Final Collisions

SectorC: A C Compiler in 512 bytes

The AI boom is causing shortages everywhere else

Hoot: Scheme on WebAssembly

We have broken SHA-1 in practice

Stories from 25 Years of Software Development

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Al Lowe on model trains, funny deaths and working with Disney

The Waymo World Model

The F Word

We Mourn Our Craft

Reinforcement Learning from Human Feedback

Start all of your commands with a comma (2009)

Microsoft Account bugs locked me out of Notepad – are Thin Clients ruining PCs?

Vocal Guide – belt sing without killing yourself

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Selection Rather Than Prediction

72M Points of Interest

Coding agents have replaced every framework I used

France's homegrown open source online office suite

A Fresh Look at IBM 3270 Information Display System

Unseen Footage of Atari Battlezone Arcade Cabinet Production

History and Timeline of the Proco Rat Pedal (2021)

Where did all the starships go?

Learning from context is harder than we thought

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

I Write Games in C (yes, C)

Brookhaven Lab's RHIC Concludes 25-Year Run with Final Collisions

SectorC: A C Compiler in 512 bytes

The AI boom is causing shortages everywhere else

Hoot: Scheme on WebAssembly

We have broken SHA-1 in practice

Stories from 25 Years of Software Development

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

Al Lowe on model trains, funny deaths and working with Disney

The Waymo World Model

The F Word

We Mourn Our Craft

Reinforcement Learning from Human Feedback

Start all of your commands with a comma (2009)

Microsoft Account bugs locked me out of Notepad – are Thin Clients ruining PCs?

Vocal Guide – belt sing without killing yourself

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Selection Rather Than Prediction

72M Points of Interest

Coding agents have replaced every framework I used

France's homegrown open source online office suite

A Fresh Look at IBM 3270 Information Display System

Unseen Footage of Atari Battlezone Arcade Cabinet Production

History and Timeline of the Proco Rat Pedal (2021)

Where did all the starships go?

Learning from context is harder than we thought

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

LLMs Are Currently Not Helpful at All for Math Research: Hamkins

Comments