Ouch, found the killer: it takes up 0.1 mm^2 in area. That's a showstopper. Hopefully they can scale it down, or use it for server infra.
> bitcell achieves at least 10 GHz read, write, and compute operations entirely in the optical domain
> Validated on GlobalFoundries' 45SPCLO node
> X-pSRAM consumed 13.2 fJ energy per bit for XOR computation
Don't only think about area.
Put another way: the TLB in a CPU is relatively small and definitely a hotspot, but you could estimate its area at ≈ 0.0003–0.002 mm^2, which is ~50 times smaller than the single bit in this paper. To get to 10 GHz we could just make 10 copies of an existing TLB operating at 1 GHz and still have a ton of headroom. There is also an electro-optical conversion penalty that you need to take into account with most optical systems.
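For scale, here's the back-of-envelope math from above, using the paper's 0.1 mm^2 bitcell figure and my rough TLB area estimate (the TLB numbers are my guess, not from the paper):

```python
# Area comparison: one optical bitcell vs. a conventional CMOS TLB.
optical_bitcell_mm2 = 0.1                   # X-pSRAM bitcell, per the paper
tlb_mm2_low, tlb_mm2_high = 0.0003, 0.002   # rough CMOS TLB estimate (mine)

# One optical bitcell is ~50-300x the area of an entire TLB:
ratio_low = optical_bitcell_mm2 / tlb_mm2_high   # 50x, worst case for my argument
ratio_high = optical_bitcell_mm2 / tlb_mm2_low   # ~333x, best case

# Matching 10 GHz by banking ten 1 GHz TLBs still uses far less area:
banked_tlb_mm2 = 10 * tlb_mm2_high               # 0.02 mm^2, worst case
assert banked_tlb_mm2 < optical_bitcell_mm2
```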
Not trying to be a Debbie Downer. It's a cool result, and no doubt incredibly useful for fully optical systems. There's probably something really useful here for optical switching at the datacenter infrastructure level.
The advantage here is that it can do storage and computation directly in the optical domain, which immensely reduces the latency of converting from photons to electrons and back to light again. Exactly what you want in a network switch.
I made no comment about it being used in a cpu.
It's only when you expect data to be able to cross a chip in a single clock cycle that you need to slow down to the ~5 GHz that CPUs struggle to exceed.
The idea is that RAM itself is the bottleneck. If you can feed data in one end of a process and get results out the other end without ever touching RAM, you can do wonders.
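In software terms, that "never touch RAM" idea looks like a streaming pipeline where no stage materializes an intermediate buffer. A minimal sketch (the stages here are hypothetical, just to illustrate the dataflow shape):

```python
# Each stage is a generator: data flows through one item at a time
# instead of being stored in a big intermediate array.
def source(n):
    for i in range(n):
        yield i

def transform(items):
    for x in items:
        yield x ^ 0b1010  # e.g. an XOR stage, like the bitcell computes

def sink(items):
    return sum(items)

# The pipeline is fully streaming: no intermediate list is ever built.
result = sink(transform(source(8)))
```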
(I did a Google search on the acknowledged grant in the paper; no connection.)
[0] https://sam.gov/opp/e0fb2b2466cd470481b0ca5cab3d210d/view
Scene_Cast2•6mo ago
I understand that you can get highly power-efficient XORs, for example. But if we go down this path, would they help with a matrix multiply? Or the bias term of an FFN? Would there be any improvement (i.e. is there anything to offload) in regular business logic? Should I think of it as a more efficient but highly limited DSP? Or a fixed-function accelerator replacement (e.g. "we want to encrypt this segment of memory")?
roflmaostc•6mo ago
For example, in this work (Lin, Z., Shastri, B.J., Yu, S. et al. 120 GOPS photonic tensor core in thin-film lithium niobate for inference and in situ training. Nat Commun 15, 9081 (2024). https://doi.org/10.1038/s41467-024-53261-x)
they achieve a "weight update speed of 60 GHz", which is much faster than the average ~3–4 GHz CPU clock.
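Caveat on that comparison: clock rate is only a proxy, since the work done per cycle differs wildly between a photonic tensor core and a CPU. The raw ratio is:

```python
# Naive speedup implied by the cited numbers (rates only, not work/cycle).
update_rate_ghz = 60.0
cpu_clock_ghz = 4.0                        # upper end of the ~3-4 GHz range
speedup = update_rate_ghz / cpu_clock_ghz  # 15x even vs. the fast end
```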
GloamingNiblets•6mo ago
It's still very niche but could offer enormous power savings for ML inference.
larodi•6mo ago
IBM is experimenting in this direction, or at least they claim to, here: https://www.ibm.com/think/topics/neuromorphic-computing
There is another CPU that was recently featured which again has a lattice (sort of an FPGA, but very fast), where different modules are loaded with some tasks, each marble pumps data to some other, and the orchestrator decides how and what goes into each of these.
oneseven•6mo ago
https://news.ycombinator.com/item?id=44685050