Context windows are now 1M+ tokens, but context depth is limited. Often the answer is hidden behind layers of linked information, yet an attention block can only resolve one link at a time. We trained a tiny 5-layer model that beats GPT-4.5 on a variable-evaluation task requiring deep, recursive reasoning. How? It learned a divide-and-conquer mechanism.
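To make the task concrete, here is a minimal sketch (not from the post; all names and the chain format are illustrative assumptions) of a variable-evaluation chain plus a pointer-doubling pass. Each pass composes every link with its parent's link, which is one way a divide-and-conquer mechanism can resolve an n-link chain in ~log2(n) steps instead of n:

```python
# Illustrative sketch of a variable-evaluation task: a chain of
# assignments where answering a query means following many links.
import random
import string

def make_chain(depth: int, seed: int = 0) -> tuple[str, str, int]:
    """Build a program like 'a = 7; b = a; c = b; ...' and a query."""
    rng = random.Random(seed)
    names = rng.sample(string.ascii_lowercase, depth)
    value = rng.randint(0, 9)
    lines = [f"{names[0]} = {value}"]
    for prev, cur in zip(names, names[1:]):
        lines.append(f"{cur} = {prev}")
    return "; ".join(lines), names[-1], value

def doubling_pass(parent: dict[str, str]) -> dict[str, str]:
    """One divide-and-conquer pass: compose each link with its
    parent's link, so every pointer jumps twice as far per pass."""
    return {k: parent.get(v, v) for k, v in parent.items()}

program, query, answer = make_chain(depth=8)
# Parse 'x = y' links; the root maps to its literal value.
links = dict(pair.split(" = ") for pair in program.split("; "))
passes = 0
while any(v in links for v in links.values()):
    links = doubling_pass(links)
    passes += 1
print(program)
print(f"{query} -> {links[query]} (expected {answer}) in {passes} passes")
```

A serial reader needs 7 hops for this depth-8 chain; the doubling loop finishes in 3 passes, the same logarithmic flavor of shortcut the post attributes to the learned mechanism.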
ghostgoober•4h ago
Nice. Does this give general improvements on models (other benchmarks, etc.), or is it very specific to narrow domains?
michael_lutz•3h ago
That's a really interesting question, and one I'd love to answer in future work. This blog mostly focuses on characterizing context-depth limits.