
Sapient's paper on the concept of Hierarchical Reasoning Model

https://arxiv.org/abs/2506.21734
63•hansmayer•4h ago

Comments

torginus•2h ago
Is it just me, or is symbolic AI (or, as I like to call it, 'video game AI') seeping back into AI?
taylorius•2h ago
Perhaps so - but represented in a trainable, neural form. Very exciting!
bobosha•46m ago
But symbolic != hierarchical
cs702•2h ago
Based on a quick first skim of the abstract and the introduction, the results from the Hierarchical Reasoning Model (HRM) look incredible:

> Using only 1,000 input-output examples, without pre-training or CoT supervision, HRM learns to solve problems that are intractable for even the most advanced LLMs. For example, it achieves near-perfect accuracy in complex Sudoku puzzles (Sudoku-Extreme Full) and optimal pathfinding in 30x30 mazes, where state-of-the-art CoT methods completely fail (0% accuracy). In the Abstraction and Reasoning Corpus (ARC) AGI Challenge - a benchmark of inductive reasoning - HRM, trained from scratch with only the official dataset (~1000 examples), with only 27M parameters and a 30x30 grid context (900 tokens), achieves a performance of 40.3%, which substantially surpasses leading CoT-based models like o3-mini-high (34.5%) and Claude 3.7 8K context (21.2%), despite their considerably larger parameter sizes and context lengths, as shown in Figure 1.

I'm going to read this carefully, in its entirety.

Thank you for sharing it on HN!

diwank•2h ago
Exactly!

> It uses two interdependent recurrent modules: a *high-level module* for abstract, slow planning and a *low-level module* for rapid, detailed computations. This structure enables HRM to achieve significant computational depth while maintaining training stability and efficiency, even with minimal parameters (27 million) and small datasets (~1,000 examples).

> HRM outperforms state-of-the-art CoT models on challenging benchmarks like Sudoku-Extreme, Maze-Hard, and the Abstraction and Reasoning Corpus (ARC-AGI), where CoT methods fail entirely. For instance, it solves 96% of Sudoku puzzles and achieves 40.3% accuracy on ARC-AGI-2, surpassing larger models like Claude 3.7 and DeepSeek R1.

Erm, what? How? This needs a computer and a sit-down.

mkagenius•36m ago
Is it talking about fine-tuning existing models with 1,000 examples to beat them on those tasks?
electroglyph•2h ago
but does it scale?
lispitillo•2h ago
I hope/fear this HRM model is going to be merged with MoE very soon. Given the huge economic pressure to develop powerful LLMs, I think this can be done in just a month.

The paper seems to study only problems like Sudoku solving, not question answering or other applications of LLMs. Furthermore, they omit a section on future applications or fusion with current LLMs.

I think anyone working in this field can envision the applications, but the details of combining MoE with an HRM model could be their next paper.

I only skimmed the paper and I am not an expert; surely others will/can explain why they don't discuss such a new structure. Anyway, my post is just blissful ignorance of the complexity involved and the impossible task of predicting change.

Edit: A more general idea is that Mixture of Experts is related to clusters of concepts, and now we would have to consider clusters of concepts related by the time they take to be grasped. In a sense, the model would hold in latent space an estimate of the depth, number of layers, and time required for each concept, just as we adapt our reading style for a dense math book versus a short newspaper story.
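
Purely to make that concrete, a speculative sketch (every name and number here is my own assumption, nothing from the paper): a MoE-style gate that, besides picking an expert, also predicts a per-token cycle budget, i.e. a learned estimate of how much reasoning depth a concept needs.

    # Hypothetical depth-aware MoE router: pick an expert ("cluster of concepts")
    # and estimate how many reasoning cycles the token deserves. Illustrative only.
    import torch
    import torch.nn as nn

    class DepthAwareRouter(nn.Module):
        def __init__(self, dim=128, n_experts=8, max_depth=16):
            super().__init__()
            self.gate = nn.Linear(dim, n_experts)   # which expert handles the token
            self.depth = nn.Linear(dim, 1)          # how long it should be "thought about"
            self.max_depth = max_depth

        def forward(self, h):                       # h: (tokens, dim) hidden states
            expert = self.gate(h).argmax(-1)                             # chosen expert per token
            cycles = (self.depth(h).sigmoid() * self.max_depth).ceil()   # per-token cycle budget
            return expert, cycles.long().squeeze(-1)

Nothing here is trained or validated; it just pins down what "an estimate of the depth for each concept" could mean architecturally.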

buster•1h ago
Must say I am suspicious in this regard, as they don't show applications other than a Sudoku solver and don't discuss downsides.
Oras•1h ago
And the training was only on Sudoku, which means they would need to train a small model for every problem that currently exists.

Back to ML models?

yorwba•1h ago
This HRM is essentially purpose-designed for solving puzzles with a small number of rules interacting in complex ways. Because the number of rules is small, a small model can learn them. Because the model is small, it can be run many times in a loop to resolve all interactions.

In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other, so I don't think you could ever get away with a similarly small model. Fortunately, a comparatively small number of steps typically seems to be enough to get decent results.

But if you tried to use an LLM-sized model in an HRM-style loop, it would be dog slow, so I don't expect anyone to try it anytime soon. Certainly not within a month.

Maybe you could have a hybrid where an LLM has a smaller HRM bolted on to solve the occasional constraint-satisfaction task.

energy123•8m ago
What about many small HRM models that solve conceptually distinct subtasks, as determined and routed by a master model that then analyzes and aggregates the outputs, with all of that learned during training?
OgsyedIE•46m ago
Skimming this, there is no reason why a MoE LLM system (whether autoregressive, diffusion, energy-based or mixed) couldn't be given a nested architecture that duplicates the layout of an HRM. Combining these in different ways should allow for some novel benchmarks around efficiency and quality, which will be interesting.
0x000xca0xfe•30m ago
Goodbye captchas I guess? Somehow they are still around.
topspin•27m ago
> "After completing the T steps, the H-module incorporates the sub-computation’s outcome (the final state L) and performs its own update. This H update establishes a fresh context for the L-module, essentially “restarting” its computational path and initiating a new convergence phase toward a different local equilibrium."

So they let the low-level RNN bottom out, evaluate its output in the high-level module, and generate a new context for the low-level RNN. Rinse, repeat. The low-level RNN iterates its recurrent update while the high-level module periodically kicks it with fresh context to push it toward better outputs. Loops within loops.
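
To make that loop concrete, here is a minimal sketch of the control flow as I read it (module types, sizes, and cycle counts are my assumptions, not the paper's code):

    # Minimal sketch of the hierarchical loop described above (assumed shapes/sizes).
    import torch
    import torch.nn as nn

    class TinyHRM(nn.Module):
        def __init__(self, dim=128, T=8, cycles=4):
            super().__init__()
            self.T, self.cycles = T, cycles
            self.low = nn.GRUCell(dim, dim)    # fast, detailed computation
            self.high = nn.GRUCell(dim, dim)   # slow, abstract planning
            self.readout = nn.Linear(dim, dim)

        def forward(self, x):                       # x: (batch, dim)
            zL, zH = torch.zeros_like(x), torch.zeros_like(x)
            for _ in range(self.cycles):            # high-level "kicks"
                for _ in range(self.T):             # low-level runs toward a local equilibrium
                    zL = self.low(x + zH, zL)       # conditioned on the current high-level context
                zH = self.high(zL, zH)              # H-update: absorb zL's outcome; the fresh zH
                                                    # restarts the next low-level phase
            return self.readout(zH)

Untrained and toy-sized, but it shows the nesting: the outer loop only ever sees the low-level module's final state, which is presumably what lets the authors avoid backpropagating through the full unrolled history.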

Another interesting part:

> "Neuroscientific evidence shows that these cognitive modes share overlapping neural circuits, particularly within regions such as the prefrontal cortex and the default mode network. This indicates that the brain dynamically modulates the “runtime” of these circuits according to task complexity and potential rewards.

> Inspired by the above mechanism, we incorporate an adaptive halting strategy into HRM that enables 'thinking, fast and slow'"

A scheduler that dynamically balances resources based on the necessary depth of reasoning and the available data.
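
As an illustration only (an ACT-style halting head, not necessarily the paper's exact mechanism), that scheduler could look roughly like this:

    # Hedged sketch: a small head scores halt vs. continue after each high-level
    # cycle, so easy inputs stop early and hard ones get more "thinking" cycles.
    import torch
    import torch.nn as nn

    class HaltingHead(nn.Module):
        def __init__(self, dim=128, max_cycles=16):
            super().__init__()
            self.q = nn.Linear(dim, 2)     # scores for (halt, continue)
            self.max_cycles = max_cycles

        def run(self, hrm_cycle, zH, zL, x):
            # hrm_cycle(zH, zL, x) -> (zH, zL): one full high-level cycle of a
            # TinyHRM-like model (a stand-in for whatever the real model exposes).
            for step in range(self.max_cycles):
                zH, zL = hrm_cycle(zH, zL, x)
                halt, cont = self.q(zH).unbind(-1)   # per-example halt/continue scores
                if (halt > cont).all():              # the whole batch prefers to stop
                    break
            return zH, step + 1                      # result plus how many cycles were spent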

I love how this paper cites parallels with real brains throughout. I believe AGI will be solved as the primitives we're developing are composed to extreme complexity, utilizing many cooperating, competing, communicating, concurrent, specialized "modules." It is apparent to me that the human brain must have this complexity, because it's the only feasible way evolution had to achieve cognition using slow, low-power tissue.

JonathanRaines•9m ago
I advise scepticism.

This work does have some very interesting ideas, specifically avoiding the costs of backpropagation through time.

However, it does not appear to have been peer reviewed.

The results section is odd. It does not include details of how they performed the assessments, and the only numerical values are in the figure on the front page. The results for ARC2 are (contrary to that figure) not top of the leaderboard (currently 19% compared to HRM's 5%: https://www.kaggle.com/competitions/arc-prize-2025/leaderboa...)

When We Get Komooted

https://bikepacking.com/plog/when-we-get-komooted/
187•atakan_gurkan•4h ago•83 comments

Jeff Bezos doesn't believe in PowerPoint, and his employees agree

https://texttoslides.ai/blog/amazon-not-using-powerpoint
16•sh_tomer•43m ago•12 comments

Linux on Snapdragon X Elite: Linaro and Tuxedo Pave the Way for ARM64 Laptops

https://www.linaro.org/blog/linux-on-snapdragon-x-elite/
86•MarcusE1W•4h ago•44 comments

Chemical process produces critical battery metals with no waste

https://spectrum.ieee.org/nmc-battery-aspiring-materials
128•stubish•6h ago•7 comments

Fast and cheap bulk storage: using LVM to cache HDDs on SSDs

https://quantum5.ca/2025/05/11/fast-cheap-bulk-storage-using-lvm-to-cache-hdds-on-ssds/
106•todsacerdoti•7h ago•25 comments

Smallest particulate matter air quality sensor for ultra-compact IoT devices

https://www.bosch-sensortec.com/news/worlds-smallest-particulate-matter-sensor-bmv080.html
88•Liftyee•7h ago•30 comments

A low power 1U Raspberry Pi cluster server for inexpensive colocation

https://github.com/pawl/raspberry-pi-1u-server
69•LorenDB•3d ago•26 comments

Janet: Lightweight, Expressive, Modern Lisp

https://janet-lang.org
75•veqq•9h ago•29 comments

Cable Bacteria Are Living Batteries

https://www.asimov.press/p/cable-bacteria
40•mailyk•3d ago•5 comments

Resizable structs in Zig

https://tristanpemble.com/resizable-structs-in-zig/
130•rvrb•13h ago•57 comments

How we rooted Copilot

https://research.eye.security/how-we-rooted-copilot/
311•uponasmile•19h ago•125 comments

Purple Earth hypothesis

https://en.wikipedia.org/wiki/Purple_Earth_hypothesis
235•colinprince•3d ago•63 comments

Implementing dynamic scope for Fennel and Lua

https://andreyor.st/posts/2025-06-09-implementing-dynamic-scope-for-fennel-and-lua/
12•Bogdanp•3d ago•0 comments

16colo.rs: ANSI/ASCII art archive

https://16colo.rs/
54•debo_•3d ago•13 comments

Rust running on every GPU

https://rust-gpu.github.io/blog/2025/07/25/rust-on-every-gpu/
548•littlestymaar•1d ago•184 comments

Low cost mmWave 60GHz radar sensor for advanced sensing

https://www.infineon.com/part/BGT60TR13C
85•teleforce•3d ago•29 comments

Coronary artery calcium testing can reveal plaque in arteries, but is underused

https://www.nytimes.com/2025/07/26/health/coronary-artery-calcium-heart.html
102•brandonb•13h ago•92 comments

Reading QR codes without a computer

https://qr.blinry.org/
20•taubek•3d ago•3 comments

Personal aviation is about to get interesting (2023)

https://www.elidourado.com/p/personal-aviation
112•JumpCrisscross•12h ago•90 comments

Constrained languages are easier to optimize

https://jyn.dev/constrained-languages-are-easier-to-optimize/
3•PaulHoule•2h ago•1 comment

What went wrong for Yahoo

https://dfarq.homeip.net/what-went-wrong-for-yahoo/
185•giuliomagnifico•17h ago•177 comments

Teach Yourself Programming in Ten Years (1998)

https://norvig.com/21-days.html
93•smartmic•14h ago•40 comments

Paul Dirac and the religion of mathematical beauty (2011) [video]

https://www.youtube.com/watch?v=jPwo1XsKKXg
71•magnifique•12h ago•5 comments

Show HN: QuickTunes: Apple Music player for Mac with iPod vibes

https://furnacecreek.org/quicktunes/
78•albertru90•11h ago•24 comments

The natural diamond industry is getting rocked. Thank the lab-grown variety

https://www.cbc.ca/news/business/lab-grown-diamonds-1.7592336
215•geox•23h ago•250 comments

Getting decent error reports in Bash when you're using 'set -e'

https://utcc.utoronto.ca/~cks/space/blog/programming/BashGoodSetEReports
123•zdw•3d ago•35 comments

Three high-performance RISC-V processors to watch in H2 2025

https://www.cnx-software.com/2025/07/22/three-high-performance-risc-v-processors-to-watch-in-h2-2025-ultrarisc-ur-dp1000-zizhe-a210-and-spacemit-k3/
7•fork-bomber•3d ago•2 comments

Beyond Food and People

https://aeon.co/essays/nietzsches-startling-provocation-youre-edible-and-delicious
11•Petiver•4h ago•3 comments

Arvo Pärt at 90

https://www.theguardian.com/music/2025/jul/24/the-god-of-small-things-celebrating-arvo-part-at-90
92•merrier•14h ago•24 comments