> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).
Interesting. What's that “scaffold”? A sort of unit test framework for proofs?
inkysigma•9m ago
I think in this context, scaffolds are generally the harness that surrounds the actual model. For example, any tools, ways to lay out tasks, or auto-critiquing methods.
I think there's quite a bit of variance in model performance depending on the scaffold so comparisons are always a bit murky.
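To make "harness around the model" concrete, here's a minimal sketch of the loop such a scaffold typically runs: lay out the task, let the model attempt it, run a tool check, auto-critique, and retry. Everything here is hypothetical — `call_model`, `run_tool`, and `scaffold` are illustrative stand-ins, not any lab's actual implementation; a real scaffold would call an LLM API and real verification tools.

```python
def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call.
    if "critique" in prompt:
        return "OK" if "proof" in prompt else "REVISE"
    return "proof of the claim"

def run_tool(name: str, arg: str) -> str:
    # Tool layer: e.g. a computer-algebra or proof-checker the model can invoke.
    tools = {"verify_identity": lambda s: "verified"}
    return tools[name](arg)

def scaffold(task: str, max_rounds: int = 3) -> str:
    """Harness around the model: task layout, tool use, auto-critique, retry."""
    attempt = ""
    for _ in range(max_rounds):
        # 1. Lay out the task (plus any previous attempt) for the model.
        attempt = call_model(f"Task: {task}\nPrevious attempt: {attempt}")
        # 2. Check the attempt with an external tool.
        check = run_tool("verify_identity", attempt)
        # 3. Auto-critique: a second model pass judges the attempt.
        verdict = call_model(f"critique: {attempt} ({check})")
        if verdict == "OK":
            return attempt
    return attempt
```

This also illustrates why cross-model comparisons get murky: the retry budget, the tools exposed, and the critique prompt are all scaffold choices that can swing scores independently of the underlying model.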
karmasimida•8m ago
There's no denying at this point that AI can produce something novel, and models will be doing more of this going forward.
osti•6m ago
Seems like the high-compute parallel-thinking models weren't even needed; both the normal 5.4 and Gemini 3.1 Pro solved it. Somehow Gemini 3 Deep Think couldn't solve it.
renewiltord•6m ago
Fantastic news! That means with the right support tooling, existing models are already capable of solving novel mathematics. There's probably a lot of good mathematics out there that we're going to make progress on.
6thbit•43m ago