That is very impressive.
Side note: Superficially reminds me of Hierarchical Temporal Memory from Jeff Hawkins "On Intelligence". Although this doesn't have the sparsity aspect, its hierarchical and temporal aspects are related.
https://en.wikipedia.org/wiki/Hierarchical_temporal_memory https://www.numenta.com
But it has the potential to alter the economics of AI quite dramatically.
Also would possibly instantly void the value of trillions of pending AI datacenter capex, which would be funny. (Though possibly not for very long.)
https://arcprize.org/blog/hrm-analysis
This here looks like a stripped down version of HRM - possibly drawing on the ablation studies from this very analysis.
Worth noting that HRMs aren't generally applicable in the same way normal transformer LLMs are. Or, at least, no one has found a way to apply them to the typical generative AI tasks yet.
I'm still reading the paper, but I expect this version to be similar - it uses the same tasks as HRMs as examples. Possibly quite good at spatial reasoning tasks (ARC-AGI and ARC-AGI-2 are both spatial reasoning benchmarks), but it would have to be integrated into a larger more generally capable architecture to go past that.
With the same data augmentation / 'test time training' setting, vanilla Transformers do pretty well, close to the "breakthrough" numbers HRM reported. From a brief skim, this paper is using similar settings to compare itself on ARC-AGI.
I too want to believe in smaller models with excellent reasoning performance. But first it's worth understanding what ARC-AGI tests for, what the general setting is -- the one commercial LLMs use to compare against each other -- and what the specialised setting is that HRM and this paper use for evaluation.
The naming of that benchmark lends itself to hype, as we've seen in both HRM and this paper.
Which is still a fun idea to play around with - this approach clearly has its strengths. But it doesn't appear to be an actual "better Transformer". I don't think it deserves nearly as much hype as it gets.
Re: recurrence, the idea has been around for a while, e.g. the Universal Transformer: https://arxiv.org/abs/1807.03819
There are reasons why it hasn't really been picked up at scale, and the method tends to do well on synthetic tasks.
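For anyone unfamiliar: the core idea in that Universal Transformer paper is weight tying across depth, i.e. applying one layer T times instead of stacking T distinct layers. A toy numpy sketch of the parameter/compute trade-off (my own illustration, not the paper's code; the residual block here is a stand-in for a full transformer layer):

```python
import numpy as np

rng = np.random.default_rng(1)
D, T = 64, 12  # toy width and depth (arbitrary choices)

def make_layer():
    return rng.normal(0, 0.1, (D, D))

# Standard depth-T stack: T separate weight matrices.
stacked = [make_layer() for _ in range(T)]
# Universal-Transformer-style recurrence: one shared matrix, applied T times.
shared = make_layer()

def run(x, layers):
    for W in layers:
        x = x + np.tanh(x @ W)  # residual block stand-in for a full layer
    return x

x = rng.normal(size=D)
y_stacked = run(x, stacked)
y_shared = run(x, [shared] * T)  # same depth of compute, 1/T the parameters

print(sum(W.size for W in stacked), shared.size)
```

Same number of layer applications either way; the recurrent version just reuses one set of weights, which is why these models can be so small.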
guybedo•2h ago
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies.
This biologically inspired method beats large language models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI while trained with small models (27M parameters) on small data (around 1000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal.
We propose Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM, while using a single tiny network with only 2 layers.
With only 7M parameters, TRM obtains 45% test-accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., Deepseek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters.
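As I read the abstract, the key move is collapsing HRM's two networks into one tiny network that alternates between refining a latent "scratchpad" and updating the current answer. A toy numpy sketch of that recursion (my own guess at the structure; the widths, update rules, and loop counts are assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # toy hidden width (assumption, not the paper's)

# One tiny 2-layer MLP, reused for every recursion step.
W1 = rng.normal(0, 0.1, (3 * D, D))
W2 = rng.normal(0, 0.1, (D, D))

def f(x, y, z):
    """Single shared network: refine latent z given question x and current answer y."""
    h = np.tanh(np.concatenate([x, y, z]) @ W1)
    return z + h @ W2  # residual latent update

def trm_step(x, y, z, n_inner=6):
    for _ in range(n_inner):  # recurse on the latent several times...
        z = f(x, y, z)
    y = y + np.tanh(z)        # ...then revise the answer once (toy rule)
    return y, z

x = rng.normal(size=D)  # embedded puzzle
y = np.zeros(D)         # initial answer guess
z = np.zeros(D)         # initial latent scratchpad
for _ in range(3):      # outer improvement steps
    y, z = trm_step(x, y, z)

print(y.shape)  # (32,)
```

The point is that depth comes from iteration, not from parameters: one 2-layer network applied many times, which is how you get to 7M parameters.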
SeanAnderson•55m ago
Well, that's pretty compelling when taken in isolation. I wonder what the catch is?
esafak•53m ago
My gut feeling is that this will limit its capability, because creativity and intelligence involve connecting disparate things, and to do that you need to know them first. Though philosophers have tried, you can't unravel the mysteries of the universe through reasoning alone. You need observations, facts.
What I could see it being good for is a dedicated reasoning module.
Grosvenor•49m ago
We'll need a memory system, an executive function/reasoning system as well as some sort of sense integration - auditory, visual, text in the case of LLMs, symbolic probably.
A good avenue of research would be to see if you could glue opencyc to this for external "knowledge".
LLMs are fundamentally a dead end.
Github link: https://github.com/SamsungSAILMontreal/TinyRecursiveModels