Ouch, found the killer: it takes up 0.1 mm^2 in area. That's a showstopper. Hopefully they can scale it down, or use it for server infra.
> bitcell achieves at least 10 GHz read, write, and compute operations entirely in the optical domain
> Validated on GlobalFoundries' 45SPCLO node
> X-pSRAM consumed 13.2 fJ energy per bit for XOR computation
Don't only think about area.
Put another way: the TLB in a CPU is relatively small and definitely a hotspot, but you could estimate its area at ≈ 0.0003–0.002 mm^2, which is ~50 times smaller than the single bit in this paper. To get to 10 GHz we could just make 10 copies of an existing TLB operating at 1 GHz and still have a ton of headroom. There is also an electro-optical conversion penalty that you need to take into account with most optical systems.
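For scale, here's the back-of-envelope math from above, using the paper's 0.1 mm^2 bitcell figure and my rough TLB area estimate (the TLB numbers are my guess, not from the paper):

```python
# Area comparison: one optical bitcell vs. a conventional CMOS TLB.
optical_bitcell_mm2 = 0.1                   # X-pSRAM bitcell, per the paper
tlb_mm2_low, tlb_mm2_high = 0.0003, 0.002   # rough CMOS TLB estimate (mine)

# One optical bitcell is ~50-300x the area of an entire TLB:
ratio_low = optical_bitcell_mm2 / tlb_mm2_high   # 50x, worst case for my argument
ratio_high = optical_bitcell_mm2 / tlb_mm2_low   # ~333x, best case

# Matching 10 GHz by banking ten 1 GHz TLBs still uses far less area:
banked_tlb_mm2 = 10 * tlb_mm2_high               # 0.02 mm^2, worst case
assert banked_tlb_mm2 < optical_bitcell_mm2
```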
Not trying to be a Debbie Downer. It's a cool result, and no doubt incredibly useful for fully optical systems. There's probably something really useful here for optical switching at the datacenter infrastructure level.
The advantage here is that it can do storage and computation directly in the optical domain, which immensely reduces the latency of converting from photons to electrons and back to light again. Exactly what you want in a network switch.
I made no comment about it being used in a cpu.
It's only when you expect data to be able to cross a chip in a single clock cycle that you need to slow down to the ~5 GHz that CPUs struggle to exceed.
The idea is that RAM itself is the bottleneck. If you can feed data in one end of a process and get results out the other end without ever touching RAM, you can do wonders.
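In software terms, that "never touch RAM" idea looks like a streaming pipeline where no stage materializes an intermediate buffer. A minimal sketch (the stages here are hypothetical, just to illustrate the dataflow shape):

```python
# Each stage is a generator: data flows through one item at a time
# instead of being stored in a big intermediate array.
def source(n):
    for i in range(n):
        yield i

def transform(items):
    for x in items:
        yield x ^ 0b1010  # e.g. an XOR stage, like the bitcell computes

def sink(items):
    return sum(items)

# The pipeline is fully streaming: no intermediate list is ever built.
result = sink(transform(source(8)))
```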
(I did a Google search on the acknowledged grant in the paper; no connection.)
[0] https://sam.gov/opp/e0fb2b2466cd470481b0ca5cab3d210d/view
Scene_Cast2•6mo ago
I understand that you can get highly power-efficient XORs, for example. But if we go down this path, would they help with a matrix multiply? Or the bias term of an FFN? Would there be any improvement (i.e. is there anything to offload) in regular business logic? Should I think of it as a more efficient but highly limited DSP? Or a fixed-function accelerator replacement (e.g. "we want to encrypt this segment of memory")?
roflmaostc•6mo ago
For example, in this work (Lin, Z., Shastri, B.J., Yu, S. et al. 120 GOPS photonic tensor core in thin-film lithium niobate for inference and in situ training. Nat Commun 15, 9081 (2024). https://doi.org/10.1038/s41467-024-53261-x)
they achieve a "weight update speed of 60 GHz", which is much faster than the average ~3–4 GHz CPU clock.
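Caveat on that comparison: clock rate is only a proxy, since the work done per cycle differs wildly between a photonic tensor core and a CPU. The raw ratio is:

```python
# Naive speedup implied by the cited numbers (rates only, not work/cycle).
update_rate_ghz = 60.0
cpu_clock_ghz = 4.0                        # upper end of the ~3-4 GHz range
speedup = update_rate_ghz / cpu_clock_ghz  # 15x even vs. the fast end
```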
GloamingNiblets•6mo ago
It's still very niche but could offer enormous power savings for ML inference.
larodi•6mo ago
IBM is experimenting in this direction, or at least they claim to, here: https://www.ibm.com/think/topics/neuromorphic-computing
There is another CPU that was recently featured which again has a lattice (sort of an FPGA, but very fast), where different modules are loaded with some tasks, each marble pumps data to some other, and the orchestrator decides how and what goes into each of these.
oneseven•6mo ago
https://news.ycombinator.com/item?id=44685050