The "Goldfish" Problem The Transformer is incredibly fast at learning how to finish a sentence. Because it uses "Attention," it’s like a student with perfect short-term memory. But it is "blind" to the long run. It only sees 128 characters at a time. It has no way to remember the beginning of the book while it's reading the end.
The Crossover My "Infinite Brain" model (a recurrent architecture) started out much worse. It was confused and the output was garbage. But around the 5th time reading the book, it "crossed over" and started beating the Transformer.
Because the Brain carries a small "memory state" forward forever, it eventually builds a "vibe" of the whole book that the Transformer just can't see.
What I learned:
Transformers are like great eyes: They see the immediate details perfectly.
Recurrent models are like a stomach: They digest the whole thing slowly, but they keep the "nutrients" of the story for much longer.
It’s a small toy experiment, but it reminded me that while Attention is powerful, having a persistent "soul" or memory state still matters for long-form data.
Loaded 2634700 bytes. Each batch chunk: 82334 Total steps to read book once: 643 Step 20 | Brain: 5.563 | Trans: 4.369 SAMPLE: 黛玉|)'Q:��t�д��T*��䎰�"��-��H� Step 40 | Brain: 5.397 | Trans: 3.469 SAMPLE: 黛玉C9j������%�����H�)���IF� Step 60 | Brain: 5.256 | Trans: 3.181 SAMPLE: 黛玉���f�XݪsʇGa���7��K��|)[o Step 80 | Brain: 5.107 | Trans: 3.015 � ����EU) 玉�����.��-n�������: Step 100 | Brain: 4.925 | Trans: 2.891 SAMPLE: 黛玉���5��X3�"��䜍��r��V:ی��;(�
......
Step 5540 | Brain: 2.057 | Trans: 2.327 SAMPLE: 黛玉,只賈王夫白說.這毫 Step 5560 | Brain: 2.079 | Trans: 2.354 SAMPLE: 黛玉了一大人,然時眾不要 Step 5580 | Brain: 2.112 | Trans: 2.381 SAMPLE: 黛玉姐的個 坐叫來.不著� Step 5600 | Brain: 2.115 | Trans: 2.394 SAMPLE: 黛玉天豥太人說䟥這撆, � Step 5620 | Brain: 2.026 | Trans: 2.290 SAMPLE: 黛玉奶歉打.命盞嚄看梅聴 Step 5640 | Brain: 2.134 | Trans: 2.414 SAMPLE: 黛玉圉起不又尚不什了,政 Step 5660 | Brain: 2.071 | Trans: 2.331 SAMPLE: 黛玉怎我古搬親就徴上一就 Step 5680 | Brain: 2.127 | Trans: 2.361 SAMPLE: 黛玉,,越于頭姐姒實眒頓 Step 5700 | Brain: 2.175 | Trans: 2.436 SAMPLE: 黛玉的你 又罜家中還太歉� Step 5720 | Brain: 2.127 | Trans: 2.348 SAMPLE: 黛玉半,他面是以. "釳有� Step 5740 | Brain: 2.141 | Trans: 2.394 SAMPLE: 黛玉,是難隄日既.阯要听 Step 5760 | Brain: 2.170 | Trans: 2.429 SAMPLE: 黛玉寶便吃,有了。”凌的 Step 5780 | Brain: 2.145 | Trans: 2.378 SAMPLE: 黛玉那好,乴日不拍說賈拿 Finished Read #9 Step 5800 | Brain: 2.092 | Trans: 2.376 SAMPLE: 黛玉人來,姐气有你兄話. Step 5820 | Brain: 2.058 | Trans: 2.321 SAMPLE: 黛玉了.政遚的�,著嬑 家
https://github.com/MrPan2048/GeometricTransformer