2 + 2 + 2
<=> reversible
2 + 2 + 2, 2 + 4
<=> reversible
2 + 2 + 2, 2 + 4, 6
=> irreversible
6
Edit: I see now. Well, this is much less exciting than I thought. Still, I'm excited for all the other people who are excited.
> What does it mean for computation to have a direction?
Actually, all computation has a directionality! This is a subject I get really excited about ^__^ Think about it: we have a function f, with input x and output y. We'd even write it that way: f(x) -> y. That arrow is our direction.
Now, the reverse actually gets a bit tricky. If the reverse is straightforward, our function has an inverse. But it might not always have one. Our function f(x) = mx + b is invertible, because we can write x = (f(x) - b)/m (well... as long as m isn't 0), which gives a unique solution: every x corresponds to a unique f(x), and vice versa. But if instead we have the function f(x) = x^2, this is not true! x = ±sqrt(f(x)), and here every f(x) > 0 corresponds to both x and -x. The preimage is not unique.
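A quick sketch of the contrast, with hypothetical helper names just for illustration:

```python
# f(x) = m*x + b is invertible (as long as m != 0): each output
# comes from exactly one input.
def f_linear(x, m=3.0, b=1.0):
    return m * x + b

def f_linear_inverse(y, m=3.0, b=1.0):
    return (y - b) / m  # unique preimage

# f(x) = x**2 is NOT invertible over the reals: both x and -x
# land on the same output, so the "inverse" is a set, not a point.
def f_square(x):
    return x * x

def f_square_preimage(y):
    r = y ** 0.5
    return {r, -r}  # two preimages (one only when y == 0)

assert f_linear_inverse(f_linear(2.0)) == 2.0
assert f_square_preimage(f_square(2.0)) == {2.0, -2.0}
```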
We can start adopting the language of images and preimages if you want to head down that route. There are a lot of interesting phenomena here, and if you hadn't already guessed it, yes, this is related to the P = NP problem!
An easy way to see this visually is to write down the computational circuit. Polylog actually has a really good video that makes the connection to P vs NP [0].
In the context of machine learning, a normalizing flow is invertible, while a diffusion model is reversible. A pet peeve of mine (I'm an ML researcher) is that people in ML call it the "inverse problem", as in GAN-Inversion, but that is a misnomer and we shouldn't propagate it... This also has to do with the naivety of statements like these [1,2]. If you understand this, you'll understand how one could make accurate predictions in one direction but fail in the other, which really puts a damper on that causality stuff. Essentially, we run into the problem of generating counterfactuals.
> Said direction does not seem to refer to causality
Understanding this, I think you can actually tell that there's a direct relationship to causality here! In physics we love to manipulate equations around because... well... the point of physics is generating a causal mapping of the universe. But there are some problems... Entropy is the classic prime example (but there are many more in QM), and perhaps this led to his demise [3]. (This is also related to the phenomena of emergence and chaos.) Here the issue is that we can take some gas molecules, run our computation forward, and get a new "state" (configuration of our molecules). But now... how do we run this in reverse? We will not get a unique solution; instead we have a whole family of solutions.
Funnily enough, you ran into this when you took calculus! That's why your professor always got mad when you integrated and dropped the "+C"! So here you can see that differentiation isn't (necessarily) an invertible process: every f(x) + c maps to the same f'(x). It's a many-to-one relationship, just like with f(x) = x^2.
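A minimal numerical check of the "+C" point, assuming nothing beyond standard Python: every f(x) + c has the same derivative, so differentiation collapses a whole family of functions onto one.

```python
# Central finite difference: d/dx [x**2 + c] should be ~2x for every c.
def derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5
for c in [0.0, 5.0, -42.0]:
    d = derivative(lambda t: t**2 + c, x)
    print(f"c={c:6.1f}  f'({x}) ~ {d:.6f}")  # same ~3.0 every time

# Many functions -> one derivative: integration can't pick the
# original back out without extra information (the +C).
```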
> So this is just about preserving state by default to make backtracking easier?
I think this could use some more clarity? If not, think about our gas problem. If instead of sampling only at time 0 and time T we sampled at {0, t0, t1, ..., T}, we greatly reduce the solution space, right? Because now our mapping from T -> 0 needs to pass through all those intermediate states. That's still a lot of potential paths, but it is fewer... [4]

[0] https://www.youtube.com/watch?v=6OPsH8PK7xM
[1] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...
[2] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s
[3] The opening of Goodstein's States of Matter book (the standard graduate textbook on statistical mechanics). Be sure to also read the first line of the second paragraph: https://i.imgur.com/Dm0PeJU.png
[4] I know...
First it says we lose electrons by deleting information. But AFAIK we are losing electrons everywhere; most gates operate on negating a current, which I understand is what they refer to as losing electrons. So, are all gates bad now?
Also, why would keeping a history of all memory changes prevent losing heat? You still have to keep all that memory powered, so...
And finally, why would this be useful? Who needs to go back in time in their computations??
Edit: and yes, most of the logical operations in a regular chip, like AND, OR, NAND, etc., are irreversible (in isolation, anyway)
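To make the "irreversible in isolation" point concrete, here's a truth-table check (my own sketch, not from the article): AND collapses three input pairs onto output 0, while a CNOT-style gate is a bijection on its inputs.

```python
from itertools import product

def preimages(gate, output, n_inputs=2):
    """All input tuples that the gate maps to `output`."""
    return [bits for bits in product([0, 1], repeat=n_inputs)
            if gate(*bits) == output]

AND = lambda a, b: a & b
print(preimages(AND, 0))  # [(0,0), (0,1), (1,0)] -- 3 preimages: info lost

# CNOT keeps both bits around: (a, b) -> (a, a XOR b). It's a bijection,
# so every output has exactly one preimage and nothing is forgotten.
CNOT = lambda a, b: (a, a ^ b)
for out in product([0, 1], repeat=2):
    assert len(preimages(CNOT, out)) == 1
```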
The Landauer limit at ambient temperature is on the order of 10⁻²¹ J per irreversibly flipped bit, while, if I read this paper [1] correctly, current transistors are around 10⁻¹⁵ J per switch. So, definitely not coming to AI "soon".
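The arithmetic behind those numbers, for anyone who wants to check (k·T·ln 2 at ~300 K):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # ambient temperature, K

landauer = k_B * T * math.log(2)
print(f"Landauer limit: {landauer:.2e} J/bit")  # ~2.87e-21 J

transistor = 1e-15  # rough per-switch energy cited above
print(f"Headroom: ~{transistor / landauer:.0e}x above the limit")
```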
Obviously, in real life, most power consumed by computers is lost by wire resistance, not through "forgetting" memory in logic gates. You would need superconducting wires and gates to build an actually reversible CPU.
Also, you would need to "uncompute" the result of a computation to bring your reversible computer from its result back to its initial state, which may be problematic. Or you can expend energy to erase the state.
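A toy sketch of that compute / copy-out / uncompute pattern (often attributed to Bennett); the XOR trick and names here are my own illustration, not from the comment:

```python
# State: (x, scratch). Each step is a bijection, so it can be undone.
def step_forward(x, scratch):
    return x, scratch ^ (x * x)    # XOR result into scratch (self-inverse)

def copy_out(scratch, out):
    return scratch, out ^ scratch  # XOR-copy the answer (reversible)

x, scratch, out = 5, 0, 0
x, scratch = step_forward(x, scratch)  # compute x*x into scratch
scratch, out = copy_out(scratch, out)  # save the answer
x, scratch = step_forward(x, scratch)  # uncompute: scratch back to 0

assert (x, scratch, out) == (5, 0, 25)  # machine reset, answer kept
```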
Quantum computers are reversible computers, if you seek a real life example. Quantum logic gates are reversible and can all be inverted.
How much power does persistent storage (a hard drive, an SSD) require to preserve its stored data? Zero, which is why it emits zero heat.
> Who needs to go back in time in their computations??
At its most basic level, erasing/overwriting data requires energy. This generates a lot of heat. Heat dissipation is a major obstacle to scaling chips down even further. If you can design a computer that doesn't need to erase nearly as much data, you generate orders of magnitude less heat, which opens up more scaling headroom and considerable power savings.
Edit: One of their white papers mentions "Application Framework: A PyTorch-compatible interface supports both AI applications and general-purpose computing, ensuring versatility without sacrificing performance."
This idea of reversible computing was new to me. I didn’t know it was even possible to run computations “backwards” to save power. It’s interesting that slowing things down might actually save more energy in the long run. I’ll definitely be reading more about this.
Since then, diffusion models have become popular. Generating from one can be seen as a special case of a continuous-time normalizing flow, and so is (in theory) a reversible computation. Although the distilled/fast generation that's run in production is probably not!
Simulating differential equations is usually not actually reversible in practice, due to round-off errors. But when done carefully, simulations performed on a computer can be exactly, bit-for-bit reversible: https://arxiv.org/abs/1704.07715
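One common way to get bit-for-bit reversibility is to keep the state in integers, so there is no round-off to lose. A minimal sketch under that assumption (the linked paper does this far more carefully):

```python
# Fixed-point leapfrog: every update is += / -= on integers,
# so running the steps in reverse order undoes them exactly.
def force(x):            # toy integer-valued restoring force
    return -x // 4

def step(x, v):
    v += force(x)
    x += v
    return x, v

def unstep(x, v):
    x -= v
    v -= force(x)
    return x, v

x, v = 1000, 0
for _ in range(10_000):
    x, v = step(x, v)
for _ in range(10_000):
    x, v = unstep(x, v)
assert (x, v) == (1000, 0)  # exact, no drift
```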
People tend to ignore a problem if it's someone else's. The costs of [insert disruptive technology here] are largely externalised - on our natural environment, on individuals' livelihoods, on violated copyrights, on independent hosts' infrastructure, on pedestrians, on about-to-be burnt-out/jobless/homeless, etc. What you gain in efficiency, you will use to bring more for yourself, not to bring less harm to someone else. ¯\_(ツ)_/¯
But in any case, I don't understand the claim of the article. If you can reverse the computation (say, only use reversible matrices), you can do it for less energy?
In current chips we just charge and dump a bunch of parasitic capacitances every clock cycle.
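Rough numbers for that, using the usual ½CV² energy per charge/dump cycle (the capacitance and voltage below are illustrative guesses, not from the comment):

```python
C = 1e-16  # ~0.1 fF parasitic capacitance (illustrative)
V = 0.7    # supply voltage, volts

energy_per_switch = 0.5 * C * V**2
print(f"~{energy_per_switch:.1e} J per charge/dump cycle")  # ~2.5e-17 J
```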
In current computers, we’re nowhere near those limits anyway. Reversible computing is interesting research for the future.
If you're also batching matmuls, isn't there an unavoidable information loss when you evict the old data and drop in the new batch?
Let f: V -> V.
Then g: V -> V x V is the reversible form of f, where g(v) = (v, f(v)),
and g'((v, w)) = v.
g' can be "fooled" with a fake w, but that is of no concern here. We trust our own chip!
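That construction in a few lines of Python (a direct transcription of the g and g' above):

```python
def make_reversible(f):
    """Wrap any f into g(v) = (v, f(v)): nothing is forgotten."""
    def g(v):
        return (v, f(v))
    def g_inverse(pair):
        v, _w = pair  # the carried-along input is the inverse
        return v
    return g, g_inverse

square = lambda v: v * v
g, g_inv = make_reversible(square)
assert g(7) == (7, 49)
assert g_inv(g(7)) == 7  # trivially invertible, as claimed
```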
f: V -> V
g: V x A -> V x A
with g(x, a) = (f(x), b) for some value(s) of a. And b cannot simply be set to x, because then you can't get a back with g', and your function is not invertible.
add(x, a) = (x + a, x - a), and add†(y, b) = ((y + b) / 2, (y - b) / 2)
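A quick check that this pair really round-trips:

```python
def add(x, a):
    return (x + a, x - a)

def add_dagger(y, b):
    return ((y + b) / 2, (y - b) / 2)

x, a = 10.0, 3.0
assert add_dagger(*add(x, a)) == (x, a)  # add† really inverts add
```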
It's a well-known thing in thermodynamics that a reversible process doesn't increase entropy (dissipate heat). So in theory, a reversible computer consumes no power whatsoever, since it doesn't leak any. In practice, most power loss in real-life computers is due to wire resistance, not the irreversibility of computations. Also, even with a perfectly reversible CPU, you would need to expend some energy to (re)initialize the state of your computer (input) and to copy its results out (output). Alternatively, once a computation is done, you can always "uncompute" it to get back to the initial state without using any power, at the cost of time.
If you want an example of a real reversible computer, look into quantum computers, which are unitary (hence reversible) by necessity, in accordance with the laws of quantum physics.
* Actually, you can represent reversible gates with invertible matrices, and that has quite profound implications. A gate/operation is reversible if and only if its corresponding matrix is invertible. But let's not get into that here.
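To make that concrete, here's the CNOT gate written as a 4×4 permutation matrix; it's invertible (in fact its own inverse), in line with "reversible iff invertible". A small sketch with numpy:

```python
import numpy as np

# Basis order: |00>, |01>, |10>, |11>. CNOT flips the second bit
# when the first bit is 1, which permutes |10> <-> |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

assert np.array_equal(CNOT @ CNOT, np.eye(4, dtype=int))  # self-inverse

# An AND-like map that collapses distinct states can't be written as
# an invertible matrix: information loss == singular matrix.
```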