> The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.
Which is 0.07% (edit: 7%) for PutnamBench
It feels pretty intuitive to me that the ability of an LLM to break a complex problem down into smaller, more easily solvable pieces will unlock the next level of complexity.
This pattern feels like a technique often taught to junior engineers: how to break a multi-week project into bite-sized tasks. This model is obviously math-focused, but I see no reason why this wouldn't be incredibly powerful for code-based problem solving.
1) Maintaining context of the overall project and goals while working in the weeds on a subtask of a task on an epic (so to speak), both in terms of what has already been accomplished and what still needs to be done,
and 2) Getting an agentic coding tool that can actually handle the scale of doing 50 small projects back to back. With these agentic tools, I find they start projects off really strong, but by task #5 they're just making a mess with every change.
I've played with keeping basically a dev-progress.md file and an implementation-plan.md file that I keep in context for every request, ending each task by updating those files. But manually maintaining all this context isn't solving all my problems.
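The workflow described above can be sketched as a couple of small helpers. Everything here is hypothetical glue around the two markdown files mentioned; the file names are the ones from the comment, and the helper names are made up:

```python
from pathlib import Path
from datetime import date

# Hypothetical helper: append a one-line summary of each finished task
# to dev-progress.md so the next request can load it as context.
def log_task(progress_file: str, task: str, outcome: str) -> None:
    entry = f"- {date.today().isoformat()} | {task}: {outcome}\n"
    with Path(progress_file).open("a", encoding="utf-8") as f:
        f.write(entry)

def load_context(progress_file: str, plan_file: str) -> str:
    """Concatenate the plan and the progress log into one context blob
    to prepend to the next request."""
    parts = []
    for name in (plan_file, progress_file):
        p = Path(name)
        if p.exists():
            parts.append(p.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

The point of the append-only log is that each task leaves a durable trace the next task can read, instead of relying on the agent's conversation window.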
And all the while, tools like Cline are gobbling up 2M tokens to make small changes.
This is a struggle for every human I’ve ever worked with
Shit. Do we need agile AI now?
There are a default set of modes (orchestrator, code, architect, debug, and ask) and you can create your own custom ones (or have roo do it for you, which is kind of a fun meta play).
Orchestrator basically consults the others and uses them when appropriate, feeding a sensible amount of task definition and context into the subtask. You can use different LLMs for different modes as well (I like Gemini 2.5 Pro for most of the thinking-style ones and o4-mini for the coding).
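The dispatch pattern described above might look roughly like this. This is a sketch, not Roo's actual implementation: the mode names and model strings are just the ones mentioned in this thread, and `run` stands in for a real LLM call:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each mode pairs a model with a role-defining prompt; the orchestrator
# hands each subtask to its mode with only the context that subtask needs,
# rather than the whole conversation history.
@dataclass
class Mode:
    model: str
    system_prompt: str

MODES = {
    "architect": Mode("gemini-2.5-pro", "Design the solution at a high level."),
    "code": Mode("o4-mini", "Implement exactly the task you are given."),
    "debug": Mode("gemini-2.5-pro", "Find and explain the root cause."),
}

def orchestrate(subtasks: List[Tuple[str, str]],
                run: Callable[[Mode, str], str]) -> List[str]:
    """Dispatch each (mode_name, task) pair to its mode."""
    results = []
    for mode_name, task in subtasks:
        mode = MODES[mode_name]
        results.append(run(mode, f"{mode.system_prompt}\n\nTask: {task}"))
    return results
```

Keeping each subtask's prompt narrow is what stops any single task from ballooning in token count.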
I've done some reasonably complicated things and haven't really had an orchestrator task creep past ~400k tokens before I was finished and able to start a new task.
There are some people out there who do really cool stuff with memory banks (basically logging and progress tracking), but I haven't played a ton with that yet.
Basic overview: https://docs.roocode.com/features/boomerang-tasks
Custom Modes: https://docs.roocode.com/features/custom-modes
For example, I made a bot where you could give it a problem statement and it would return an array of steps to accomplish it.
Then you could click on any step to break it down further and add the sub-steps to the list. If you just kept clicking, you would get to excruciating detail.
For example, taking out the trash can become over 70 individual steps if you really drill into the details.
Some of the steps:
Stand close to the trash can – Position yourself so you have stable footing and easy access.
Place one hand on the rim of the can – Use your non-dominant hand to press lightly on the edge of the trash can to hold it in place.
Grip the top edge of the bag with your other hand – Find the part of the bag that extends past the rim.
Gently lift the bag upward – While your one hand stabilizes the can, slowly pull the bag up with the other.
Tilt the can slightly if needed – If the bag sticks or creates suction, rock or tilt the can slightly while continuing to lift.
Avoid jerking motions – Move steadily to prevent tears or spills.
--> "In an increasingly crowded field of LLMs, how will our (costly to produce) model stand out?"
Work from 2023 [1] showed that general-purpose models did better when they were able to incorporate error feedback. Humans incorporate error feedback too, yet none of the SOTA models on MiniF2F seem to.
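The feedback loop that work describes is simple to state: generate a proof, run the checker, and on failure feed the error message back into the next attempt. A sketch, with both callables standing in for an LLM and a proof checker (e.g. Lean) respectively:

```python
from typing import Callable, Optional

def prove_with_feedback(goal: str,
                        generate: Callable[[str, Optional[str]], str],
                        check: Callable[[str], Optional[str]],
                        max_attempts: int = 4) -> Optional[str]:
    """Retry proof generation, feeding the checker's error message
    back into each subsequent attempt. check returns None on success,
    or an error string on failure."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        proof = generate(goal, error)   # error is None on the first try
        error = check(proof)
        if error is None:
            return proof
    return None
```

The interesting part is entirely inside `generate`: a model that can actually use "type mismatch at line 12" to repair its proof gets the benefit; one that ignores the error just burns attempts.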
- Making correct and smart assumptions. Currently, all LLM bots are too stupid at making good assumptions. I don't want to explicitly repeat my own assumptions again and again when the context is clear enough. Hey bots, try harder.
- LLM bots need to bring their own secondary and contextual memory into their reasoning. I don't want to do it for you, ok? You're the bot.
- Thinking outside the box. This is the final stage of intelligence: adapting old techniques into your own new techniques to solve problems that didn't exist before.
Is it really open source? Has something changed?
theorem convergesTo_unique {s : ℕ → ℝ} {a b : ℝ} (sa : ConvergesTo s a) (sb : ConvergesTo s b) : a = b := by
For fun I tried it on the free model on openrouter.ai. Got the answer the first time.
https://leanprover-community.github.io/mathematics_in_lean/m...
Here's the answer just to give you a feel.
  by_contra h
  have h₁ : a ≠ b := h
  have h₂ : |a - b| > 0 := by
    apply abs_pos.mpr
    exact sub_ne_zero.mpr h₁
  -- Use the definition of convergence to find N₁ and N₂
  have h₃ := sa (|a - b| / 2) (by linarith)
  have h₄ := sb (|a - b| / 2) (by linarith)
  cases' h₃ with N₁ h₃
  cases' h₄ with N₂ h₄
  -- Choose N to be the maximum of N₁ and N₂
  let N := max N₁ N₂
  have h₅ := h₃ N (by simp [N, le_max_left])
  have h₆ := h₄ N (by simp [N, le_max_right])
  -- Derive a contradiction using the triangle inequality
  have h₇ : |s N - a| < |a - b| / 2 := by simpa using h₅
  have h₈ : |s N - b| < |a - b| / 2 := by simpa using h₆
  have h₉ : |a - b| < |a - b| := by
    calc
      |a - b| = |a - s N + (s N - b)| := by ring_nf
      _ ≤ |a - s N| + |s N - b| := by apply abs_add
      _ = |s N - a| + |s N - b| := by rw [abs_sub_comm]
      _ < |a - b| / 2 + |a - b| / 2 := by linarith
      _ = |a - b| := by ring
  linarith
simianwords•4h ago
A prover model might be used as a tool in the future.
koakuma-chan•4h ago
In the future? I'm pretty sure people do that already.
simianwords•3h ago
The point being that a separate expert model would be better at its own field than a single model that tries to be good at everything. Intuitively it makes sense, in practice I have seen anecdotes where finetuning a small model on domain data makes the model lose coherence on other topics.
koakuma-chan•1h ago
This is expected behaviour.