But I suppose the bigger goal remains improving their language model, and this was an experiment born of that effort. These works are symbiotic; the original DeepSeekMath paper produced GRPO, which eventually formed the backbone of their R1 model: https://arxiv.org/abs/2402.03300
I don't think an LLM is a great fit for pure RLVR.
[1] IIRC, AlphaProof is a bit like this. But I bet that either there's a whole lot of effort on this sort of thing in the major AI labs, or else there's some good reason to expect it not to work that I haven't thought of. (Maybe just the "bitter lesson", I guess.)
It would doubtless be challenging to get such a system to find large difficult proofs, because it's not so easy to tell what's making progress and what isn't. Maybe you need LLM #3, which again might or might not be the same as the other two LLMs, to assess what parts of the attempt so far seem like they're useful, and scrub the rest from the context or at least stash it somewhere less visible.
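Concretely, that loop might look something like this; a minimal sketch, where `llm_prove`, `llm_critique`, and `lean_check` are hypothetical stand-ins for whatever model and verifier APIs one actually has, not anything described in the paper:

```python
from typing import Callable

# All three callables are placeholders, passed in by the caller.
LlmProve = Callable[[str, list[str]], str]                  # LLM #1
LlmCritique = Callable[[str, list[str]], list[tuple[str, bool]]]  # LLM #3
LeanCheck = Callable[[str, str], bool]                      # formal verifier

def prove(goal: str, llm_prove: LlmProve, llm_critique: LlmCritique,
          lean_check: LeanCheck, max_rounds: int = 10) -> str | None:
    context: list[str] = []  # proof fragments currently judged useful
    for _ in range(max_rounds):
        attempt = llm_prove(goal, context)      # propose next steps
        if lean_check(goal, attempt):           # verifier accepts: done
            return attempt
        # Triage: decide which fragments look like progress, and
        # scrub the rest from the visible context.
        verdicts = llm_critique(goal, context + [attempt])
        context = [frag for frag, useful in verdicts if useful]
    return None
```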
It is, of course, also challenging for human mathematicians to find large difficult proofs, and one of the reasons is that it's not so easy to tell what's making progress and what isn't. Another major reason, though, is that sometimes you need a genuinely new idea, and so far LLMs aren't particularly good at coming up with those. But a lot of new-enough-ideas[2] are things like "try a version of this technique that worked well in an apparently unrelated field", which is the kind of thing LLMs aren't so bad at.
[2] Also a lot of the new-enough-ideas that mathematicians get really happy about. One of the cool things about mathematics is the way that superficially-unrelated things can turn out to share some of their structure. If LLMs get good at finding that sort of thing but never manage any deeper creativity than that, it could still be enough to produce things that human mathematicians find beautiful.
Plus there isn't a lot of training data in Lean.
Most gains come from training on data that's already out there, not from the RLVR part, which just amplifies it a bit.
Turing machines are also deterministic, but there is no algorithm that can decide whether any given Turing machine halts. What you're asking for is a solution to the Halting Problem.
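The classic diagonal argument for why can even be sketched as code: assume a decider `halts` exists, then build a program that does the opposite of whatever the decider predicts about it. A purely illustrative Python sketch:

```python
def halts(program, arg) -> bool:
    """Hypothetical decider: True iff program(arg) halts.
    The argument below shows no such function can exist."""
    raise NotImplementedError

def paradox(p):
    # Do the opposite of whatever the decider predicts about p(p).
    if halts(p, p):
        while True:  # loop forever
            pass
    return  # halt immediately

# Now ask: does paradox(paradox) halt?
# If halts says yes, paradox loops forever; if it says no, paradox
# halts immediately. Either way halts is wrong, so it cannot exist.
```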
That's the first problem. The second problem is that any such system that didn't support natural language would require a formal language of some sort, and then you would have to convince every mathematician to write their proofs in your language so it can be checked. All attempts at this have failed to gain much traction, although Lean has gotten pretty far.
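For a sense of what writing in such a formal language looks like, here's a trivial Lean 4 example (real research-level formalizations run to thousands of lines):

```lean
-- A trivial machine-checkable statement in Lean 4, proved by
-- applying a lemma from the core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```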
It's like chess: verifying that a given position is checkmate is easy, but finding the moves that get you there is hard.
Of course, one can try all possible moves and see what happens. Similar to chess AIs based on search methods (e.g. minimax), there are proof search methods; see the related work section of the paper.
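As a rough illustration of the analogy, here's what a generic best-first proof search might look like; `apply_tactic` and `score` are hypothetical stand-ins for a real prover backend and a heuristic or learned value function, not the paper's method:

```python
import heapq
from typing import Callable

def best_first_search(
    goals: tuple[str, ...],                    # open proof obligations
    tactics: list[str],
    apply_tactic: Callable[[tuple[str, ...], str], tuple[str, ...] | None],
    score: Callable[[tuple[str, ...]], float],  # lower = more promising
    max_expansions: int = 10_000,
) -> list[str] | None:
    frontier = [(score(goals), goals, [])]      # (priority, goals, proof so far)
    seen = {goals}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, state, proof = heapq.heappop(frontier)
        if not state:                           # no goals left: proof found
            return proof
        for t in tactics:
            nxt = apply_tactic(state, t)        # None if the tactic fails
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt, proof + [t]))
    return None
```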
What's the use case for a system like this?
I suspect it's also because there isn't a lot of data to train on.
We've seen absolutely ridiculous progress in model capability over the past year (which is also quite terrifying).
The approach shifts from "result-oriented" to "process-oriented" verification, which is particularly important for theorem proving, where rigorous step-by-step derivation matters more than just numerical answers.
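A toy contrast between the two kinds of verification, assuming a hypothetical per-step checker `verify_step` (this illustrates the general idea, not the paper's actual reward):

```python
from typing import Callable

def outcome_reward(final_answer: str, reference: str) -> float:
    # Result-oriented: only the final answer is checked.
    return 1.0 if final_answer == reference else 0.0

def process_reward(steps: list[str],
                   verify_step: Callable[[str], bool]) -> float:
    # Process-oriented: every derivation step is checked, and
    # credit stops at the first flawed step.
    valid = 0
    for step in steps:
        if not verify_step(step):
            break
        valid += 1
    return valid / max(len(steps), 1)
```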
For another thing, the 2024 Putnam problems are in their RL data.
Also, it's very unclear how performance on these competitions, whose problems are designed to have clear-cut answers and to be solvable by (well-prepared) humans in about an hour, will translate to anything else.
Why do you think that the 2024 Putnam problems that they used to test were in the training data?
/? "Art of Problem Solving" Putnam https://www.google.com/search?q=%22Art+of+Problem+Solving%22...
From p.3 of the PDF:
> Curating Cold Start RL Data: We constructed our initial training data through the following process:
> 1. We crawled problems from Art of Problem Solving (AoPS) contests, prioritizing math olympiads, team selection tests, and post-2010 problems explicitly requiring proofs, totaling 17,503 problems.
Putnam solutions can be found in multiple places online: https://kskedlaya.org/putnam-archive/ and https://artofproblemsolving.com/community/c3249_putnam. These could have appeared in the pretraining of the base LLM (DeepSeek-V3.2-Exp) or as problems in the RL training set; the paper gives no further detail on which problems were selected from AoPS, and as the second link shows, the Putnam problems are there.
They reference https://artofproblemsolving.com/community/c13_contest_collec... as the source of their scrape, and the Putnam problems are on that page under 'Undergraduate Contests'.
It would also make it easier to keep engineers in the loop: they would decompose the problem into an appropriate set of formally specified functions, and could chip in when necessary to complete difficult proofs or redefine the functions.
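For example, an engineer might pin down a function and its contract in Lean and leave the proof obligation for the system (or a colleague) to discharge; a toy sketch with invented names:

```lean
-- Toy sketch: the engineer writes the function and its contract;
-- the proof is left as an open obligation (`sorry`) for the system
-- or a human to fill in. All names here are invented.
def clamp (lo hi x : Nat) : Nat :=
  max lo (min hi x)

theorem clamp_ge_lo (lo hi x : Nat) : lo ≤ clamp lo hi x := by
  sorry
```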