nabla9•1h ago
https://arcprize.org/blog/hrm-analysis
...we made some surprising findings that call into question the prevailing narrative around HRM:
1. The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer.
2. However, the relatively under-documented "outer loop" refinement process drove substantial performance, especially at training time.
3. Cross-task transfer learning has limited benefits; most of the performance comes from memorizing solutions to the specific tasks used at evaluation time.
4. Pre-training task augmentation is critical, though only 300 augmentations are needed (not 1K augmentations as reported in the paper). Inference-time task augmentation had limited impact.
Findings 2 & 3 suggest that the paper's approach is fundamentally similar to Liao and Gu's "ARC-AGI without pretraining".
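For readers unfamiliar with the "outer loop" in finding 2: it refers to repeatedly re-running the model on its own carried state and supervising every refinement step, rather than making a single forward pass. Below is a minimal sketch of that training pattern, assuming a PyTorch model whose forward pass returns (logits, state) and a hypothetical init_state helper; it paraphrases the idea described in the blog post and is not HRM's actual code.

```python
import torch
import torch.nn.functional as F


def outer_loop_training_step(model, x, target, optimizer, num_outer_steps=8):
    """One training step with iterative ("outer loop") refinement.

    Assumes `model(x, state)` returns `(logits, new_state)` and that
    `model.init_state(x)` (a hypothetical helper) builds the initial latent
    state. A per-step loss gives deep supervision; detaching the carried
    state keeps gradients local to each refinement step.
    """
    state = model.init_state(x)
    total_loss = torch.zeros((), device=x.device)
    for _ in range(num_outer_steps):
        logits, state = model(x, state)               # refine the prediction
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target.reshape(-1))
        total_loss = total_loss + loss
        state = state.detach()                        # truncate gradient flow

    optimizer.zero_grad()
    (total_loss / num_outer_steps).backward()
    optimizer.step()
    return (total_loss / num_outer_steps).item()
```

The detach between outer steps is what keeps this cheap relative to backpropagating through all refinement steps at once, which is part of why the effect shows up mostly at training time.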
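Finding 4's "task augmentation" is the standard ARC trick of generating many equivalent copies of a task by applying the same rotation, reflection, and color permutation to all of a task's grids. A rough sketch of generating the ~300 copies the analysis found sufficient, with illustrative names and no claim to match the paper's exact pipeline (real pipelines sometimes keep the background color fixed):

```python
import random

import numpy as np

NUM_COLORS = 10  # ARC grids use color indices 0-9


def augment_task(task_grids, num_augments=300, seed=0):
    """Return `num_augments` transformed copies of one task.

    `task_grids` is a list of 2D integer numpy arrays (all input and output
    grids of a single task). Each augmentation applies the same rotation,
    optional flip, and color permutation to every grid, so the task's
    underlying rule is preserved.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(num_augments):
        k = rng.randrange(4)              # number of 90-degree rotations
        flip = rng.random() < 0.5         # whether to mirror horizontally
        perm = list(range(NUM_COLORS))
        rng.shuffle(perm)                 # random relabeling of colors
        lut = np.array(perm)

        def transform(grid):
            g = np.rot90(grid, k)
            if flip:
                g = np.fliplr(g)
            return lut[g]                 # remap colors via lookup table

        augmented.append([transform(g) for g in task_grids])
    return augmented
```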