Despite using only unitary operations and no attention mechanism, a 1024×32 model achieves coherent TinyStories generation after less than 1.8 hours of training on a single consumer GPU.
This is Part 1; the next step is a physical implementation using $50 worth of optics from AliExpress.
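
For anyone unfamiliar with the constraint, here is a minimal sketch of what "only unitary operations" implies: every layer preserves the norm of the state, much as a lossless optical element preserves total power. This is not the model's actual code. The helper names (`random_unitary`, `apply_layers`), the toy width of 8, and the depth of 4 are all assumptions made for illustration, and the matrices are random rather than trained.

```python
import numpy as np

def random_unitary(n, seed=0):
    """Hypothetical helper: build an n x n unitary matrix from the QR
    factorization of a random complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)  # Q has orthonormal columns, so it is unitary
    return q

def apply_layers(x, layers):
    """Apply a stack of unitary matrices to a complex state vector."""
    for u in layers:
        x = u @ x
    return x

n = 8  # toy width, chosen only for illustration
layers = [random_unitary(n, seed=s) for s in range(4)]  # toy depth of 4
x = np.random.default_rng(42).standard_normal(n) + 0j   # complex input state

y = apply_layers(x, layers)
norm_in = np.linalg.norm(x)
norm_out = np.linalg.norm(y)
print(norm_in, norm_out)  # the two norms agree up to floating-point error
```

Because a product of unitary matrices is itself unitary, a stack like this is a single norm-preserving linear map; any nonlinearity would have to enter elsewhere, for example at measurement or readout.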
tliltocatl•48m ago