In June 2025, Apple published a highly controversial paper, "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," which claimed that Large Reasoning Models (LRMs) do very little reasoning (planning).
jamesblonde•10h ago
Anthropic's Lawsen fired back, condemning the experimental setup as flawed and the conclusions as overstated.
This paper provides evidence supporting Apple's take: "failures solving the Towers of Hanoi were not purely a result of output constraints, but also partly a result of cognition limitations: LRMs still stumble when complexity rises moderately (around 8 disks)".
"we also identified persistent failure modes that reveal limitations in long-horizon consistency and symbolic generalization. Our analysis suggests that these reasoning breakdowns stem not only from architectural constraints, but also from the inherently stochastic nature of these systems and the optimization methods they rely on."