I still don't get how achieving 96% on some benchmark means it's a super genius but that last 4% is somehow still out of reach. The people who constantly compare robots to people should really ponder how a person who manages to achieve 90% on some advanced math benchmark still misses that last 10% somehow.
botusaurus•30m ago
Do you think Terence Tao can solve any math problem in the world that is solvable by another mathematician?
This is what everyone who uses LLMs regularly expected. Good results require a human in the loop, and the internet is so big that just about everything has been done there by someone. Most often you.
u1hcw9nx•8m ago
>The results of this paper should not be interpreted as suggesting that AI can consistently solve research-level mathematics questions. In fact, our anecdotal experience is the opposite: success cases are rare, and an apt intuition for autonomous capabilities (and limitations) may currently be important for finding such cases. The papers (ACGKMP26; Feng26; LeeSeo26) grew out of spontaneous positive outcomes in a wider benchmarking effort on research-level problems; for most of these problems, no autonomous progress was made.
measurablefunc•1h ago