CellARC is a synthetic benchmark for abstraction and reasoning built from multicolor 1D cellular automata (CA). Each episode consists of five support pairs and one query, serialized in 256 tokens, enabling rapid iteration with small models while exposing a controllable task space with explicit knobs for alphabet size k, radius r, rule family, Langton's lambda, query coverage, and cell entropy. We release 95k training episodes plus two 1k test splits (interpolation/extrapolation) and evaluate symbolic, recurrent, convolutional, transformer, recursive, and LLM baselines. CellARC decouples generalization from anthropomorphic priors, supports unlimited difficulty-controlled sampling, and enables reproducible studies of how quickly models infer new rules under tight budgets.
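To make the setup concrete, here is a minimal sketch (not the CellARC implementation; all names are illustrative) of how a multicolor 1D CA with alphabet size k and radius r can generate support/query pairs: a random rule table maps each (2r+1)-cell neighborhood to a new symbol, and applying it to random tapes yields input/output pairs.

```python
import itertools
import random

def random_rule(k: int, r: int, rng: random.Random) -> dict:
    """Random lookup table: every (2r+1)-neighborhood over alphabet 0..k-1
    maps to a new symbol in 0..k-1."""
    return {nb: rng.randrange(k)
            for nb in itertools.product(range(k), repeat=2 * r + 1)}

def step(tape: list, rule: dict, r: int) -> list:
    """One synchronous CA update with periodic (wrap-around) boundaries."""
    n = len(tape)
    return [rule[tuple(tape[(i + d) % n] for d in range(-r, r + 1))]
            for i in range(n)]

def make_pair(rule: dict, k: int, r: int, length: int, rng: random.Random):
    """Sample a random input tape and return (input, output) under the rule."""
    x = [rng.randrange(k) for _ in range(length)]
    return x, step(x, rule, r)

# One toy "episode": five support pairs plus one query, all from the same rule.
rng = random.Random(0)
rule = random_rule(k=4, r=1, rng=rng)
supports = [make_pair(rule, k=4, r=1, length=16, rng=rng) for _ in range(5)]
query_in, query_out = make_pair(rule, k=4, r=1, length=16, rng=rng)
```

A learner sees the five support pairs and must predict `query_out` from `query_in`; the knobs k and r (and, in the real benchmark, rule family, lambda, coverage, and entropy) control difficulty.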
Paper: https://arxiv.org/abs/2511.07908
Code: https://github.com/mireklzicar/cellarc
Baselines: https://github.com/mireklzicar/cellarc_baselines
Dataset: https://huggingface.co/datasets/mireklzicar/cellarc_100k
X Thread: https://x.com/miroslavlzicar/status/1988502075664105561?s=20
Web & Leaderboard: https://cellarc.mireklzicar.com/