fp.

The model uses a 1024-dimensional complex Hilbert space with 32 layers of programmable Mach–Zehnder meshes (Reck architecture) and derives token probabilities directly via the Born rule.
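As a rough illustration of the idea above, here is a minimal sketch of a stack of 2×2 Mach–Zehnder blocks acting on a complex amplitude vector, with output probabilities read off via the Born rule. Everything in it is an assumption made for illustration: the helper names (`mzi`, `apply_layer`, `run_mesh`, `born_probabilities`), the particular phase parametrisation, the simple alternating-pair layout (not a faithful Reck triangle), and the toy size of 8 modes and 4 layers instead of 1024×32. It is not the author's implementation.

```python
import cmath
import math
import random

def mzi(theta, phi):
    """2x2 unitary for one Mach-Zehnder block (one common parametrisation, assumed here)."""
    e = cmath.exp(1j * phi)
    return ((e * math.cos(theta), -math.sin(theta)),
            (e * math.sin(theta), math.cos(theta)))

def apply_layer(state, params, offset):
    """Apply MZI blocks to adjacent mode pairs (offset, offset+1), (offset+2, offset+3), ..."""
    out = list(state)
    for k, i in enumerate(range(offset, len(state) - 1, 2)):
        (a, b), (c, d) = mzi(*params[k])
        out[i] = a * state[i] + b * state[i + 1]
        out[i + 1] = c * state[i] + d * state[i + 1]
    return out

def run_mesh(state, layers):
    """Alternate the pairing offset per layer so every mode can mix with both neighbours."""
    for depth, params in enumerate(layers):
        state = apply_layer(state, params, depth % 2)
    return state

def born_probabilities(state):
    """Born rule: the probability of each mode is the squared magnitude of its amplitude."""
    return [abs(z) ** 2 for z in state]

random.seed(0)
N_MODES, N_LAYERS = 8, 4  # toy sizes, far smaller than the 1024x32 model in the post
layers = [
    [(random.uniform(0, math.pi), random.uniform(0, 2 * math.pi))
     for _ in range(N_MODES // 2)]
    for _ in range(N_LAYERS)
]

# Start with all the light in mode 0 (a normalised input state).
state = [1.0 + 0j] + [0j] * (N_MODES - 1)
probs = born_probabilities(run_mesh(state, layers))
print(sum(probs))  # stays at 1 up to rounding, because every block is unitary
```

Because each block is unitary, the total probability is conserved from layer to layer; the mesh only redistributes amplitude between modes.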

Despite using only unitary operations and no attention mechanism, a 1024×32 model achieves coherent TinyStories generation in under 1.8 hours of training on a single consumer GPU.

This is Part 1 - the next step is physical implementation with $50 of optics from AliExpress.

Comments

tliltocatl•2mo ago

Stupid question - how is it even possible given that you lose information on each layer? And how does one implement a non-linear activation function without an amplifier of some sort?

IronyMan100•2mo ago

Normally in this kind of system, the detection is the nonlinearity. That is, you send light through the system, where it can interfere and change path, but in the end you can detect only the intensities, |E|^2.
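A tiny sketch of why intensity detection behaves as a nonlinearity, assuming ideal detectors and unit-amplitude fields (the `intensity` helper is hypothetical, purely for illustration): the reading for two superposed fields is not the sum of their individual readings, because the cross term depends on the relative phase.

```python
import cmath
import math

def intensity(field):
    """What an ideal detector reports: |E|^2, with the phase discarded."""
    return abs(field) ** 2

e1 = 1 + 0j
e2_in_phase = cmath.exp(1j * 0.0)       # same phase as e1
e2_opposite = cmath.exp(1j * math.pi)   # half a wavelength out of phase

sum_of_intensities = intensity(e1) + intensity(e2_in_phase)  # linear expectation
constructive = intensity(e1 + e2_in_phase)                   # fields add, then square
destructive = intensity(e1 + e2_opposite)                    # fields cancel, then square

print(sum_of_intensities, constructive, destructive)
```

The linear part (interference) happens on the complex fields; the squaring at the detector is what makes the overall input-to-output map non-linear.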

ifuknowuknow•2mo ago

meds

bastawhiz•2mo ago

This is a neat idea, but it's extremely light (no pun intended) on real details. Translating a simulation into real hardware that can do real computation in a reliable manner is properly hard. As much as I'd love to be an optimist about this project, I have to say I'll believe it when I see it actually running on a workbench.

If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's not an obvious path to scaling it up, it's a bit of a dead end.

damir00•2mo ago

Oh absolutely...this is kitchen-table level at this point. There is a clear path to a really huge number of parameters, but a bunch of things need to be proven first. Like...can the detector meaningfully read what comes out the end of the optical chain?

I expect to have an answer this week...

cpldcpu•2mo ago

"Zero power" does not include the power needed to translate information between electronic and optical domains and the light source itself.

damir00•2mo ago

Yes, correct. I will phrase this better in the future. The "zero power" refers only to what is, in effect, the optical replacement for the ocean of matmul you have in a standard Transformer implementation.

I apologize for not being clearer.

The goal isn't actually "zero power" - the goal is "so little heat that dissipation in orbit is easy".