Sapients paper on the concept of Hierarchical Reasoning Model

48•hansmayer•3h ago

Comments

torginus•1h ago

Is it just me or are symbolic (or as I like to call it 'video game') AI is seeping back into AI?

taylorius•1h ago

Perhaps so - but represented in a trainable, neural form. Very exciting!

cs702•1h ago

Based on a quick first skim of the abstract and the introduction, the results from hierarchical reasoning (HRM) models look incredible:

> Using only 1,000 input-output examples, without pre-training or CoT supervision, HRM learns to solve problems that are intractable for even the most advanced LLMs. For example, it achieves near-perfect accuracy in complex Sudoku puzzles (Sudoku-Extreme Full) and optimal pathfinding in 30x30 mazes, where state-of-the-art CoT methods completely fail (0% accuracy). In the Abstraction and Reasoning Corpus (ARC) AGI Challenge 27,28,29 - a benchmark of inductive reasoning - HRM, trained from scratch with only the official dataset (~1000 examples), with only 27M parameters and a 30x30 grid context (900 tokens), achieves a performance of 40.3%, which substantially surpasses leading CoT-based models like o3-mini-high (34.5%) and Claude 3.7 8K context (21.2%), despite their considerably larger parameter sizes and context lengths, as shown in Figure 1.

I'm going to read this carefully, in its entirety.

Thank you for sharing it on HN!

diwank•1h ago

Exactly!

> It uses two interdependent recurrent modules: a *high-level module* for abstract, slow planning and a *low-level module* for rapid, detailed computations. This structure enables HRM to achieve significant computational depth while maintaining training stability and efficiency, even with minimal parameters (27 million) and small datasets (~1,000 examples).

> HRM outperforms state-of-the-art CoT models on challenging benchmarks like Sudoku-Extreme, Maze-Hard, and the Abstraction and Reasoning Corpus (ARC-AGI), where CoT methods fail entirely. For instance, it solves 96% of Sudoku puzzles and achieves 40.3% accuracy on ARC-AGI-2, surpassing larger models like Claude 3.7 and DeepSeek R1.

Erm what? How? Needs a computer and sitting down.

electroglyph•1h ago

but does it scale?

lispitillo•54m ago

I hope/fear this HRM model is going to be merged with MoE very soon. Given the huge economic pressure to develop powerful LLMs I think this can be done in just a month.

The paper seems to only study problems like sudoku solving, and not question answering or other applications of LLMs. Furthermore they omit a section for future applications or fusion with current LLMs.

I think anyone working in this field can envision their applications, but the details to have a MoE with an HRM model could be their next paper.

I only skimmed the paper and I am not an expert, sure other will/can explain why they don't discuss such a new structure. Anyway, my post is just blissful ignorance over the complexity involved and the impossible task to predict change.

Edit: A more general idea is that Mixture of Expert is related to cluster of concepts and now we would have to consider a cluster of concepts related by the time they take to be grasped, so in a sense the model would have in latent space an estimation of the depth, number of layers, and time required for each concept, just like we adapt our reading style for a dense math book different to a newspaper short story.

buster•33m ago

must say I am suspicious in this regard, as they don't show applications other than a Sudoku solver and don't discuss downsides.

Analoguediehard

Has the Russian intelligence service penetrated Telegram?

SharePoint Exploit Intelligence with Honeypots

How to build the Stasheff Associahedron out of a trefoil knot

Show HN: A Modular Phoenix SaaS Starter Kit

The U.S. Central Intelligence Agency's (CIA) remote viewing experiments

What if sailing had no rules? [video]

LLMs are bad at returning code in JSON

Conspiracy theorists think their views are mainstream

US drops sanctions on Myanmar junta's allies after military chief praises man

Show HN: Mapping supply chain of products (updated)

The Internet Archive just became an official U.S. federal library

Astronomer's 'clever' PR move embracing CEO scandal – featuring Gwyneth Paltrow

djbwares version 10

Exploring Windows XP on macOS ARM64

Careers at the Frontier: Hiring the Future at OpenAI

JANET – The UK Joint Academic Network (1988) [pdf]

Constrained languages are easier to optimize

Using Codex-CLI with ChatGPT Plus/Pro

Draw a fish and watch it swim

Has the Qianfan satellite network – China's Starlink rival – run into trouble?

Scala Highlights June 2025 – Scala 3.9 will be the new LTS

Ronald Coase (1960) – The Problem of Social Cost [pdf]

Apocalyptica

Four ways of declaring interfaces in Haskell

Who Likes Authoritarianism?

Libu8ident: Unicode security guidelines for programming language identifiers

Emad Mostaque: The Plan to Save Humanity from AI [video]

Thank You for Finding Me

Making Cloudflare Pages Faster with Cloudflare's CDN