So... Automatic integration?
Proportional, integral, derivative. A PID loop sure sounds like what they're talking about.
It has a lot more overhead than regular forward-mode autodiff because you need to cache values from running the function and refer back to them in reverse order, but the advantage is that for functions with many inputs and very few outputs (the classic example is computing the gradient of a scalar function in a high-dimensional space, as in gradient descent), it is algorithmically more efficient and requires only one pass through the primal function.
On the other hand, traditional forward-mode derivatives are most efficient for functions with very few inputs but many outputs. It's essentially a duality relationship.
For vector-valued functions, the naive way you would learn in a vector calculus class corresponds to forward-mode AD.
As the name implies, the calculation is done forward.
Reverse-mode automatic differentiation starts from the root of the expression (the output) and propagates the derivative back to every subexpression in a single reverse sweep.
The difference between the two is like the difference between calculating the Fibonacci sequence recursively without memoization and calculating it iteratively. You avoid doing redundant work over and over again.
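To make the caching idea concrete, here is a minimal sketch of reverse-mode AD in Python (a toy `Var` class of my own, not any real library's API): the forward pass caches every intermediate value, and one reverse sweep over the recorded operations yields all the partial derivatives at once.

```python
import math

class Var:
    """Toy reverse-mode AD node: the forward pass caches values and local
    derivatives; the backward pass replays the record once, in reverse order."""
    def __init__(self, value, parents=()):
        self.value = value        # primal value cached during the forward pass
        self.parents = parents    # (parent Var, local partial derivative) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def sin(self):
        return Var(math.sin(self.value), parents=((self, math.cos(self.value)),))

    def backward(self):
        # Visit each node exactly once, in reverse topological order, so shared
        # subexpressions are not re-traversed (this is the "memoization").
        order, seen = [], set()
        def topo(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    topo(p)
                order.append(v)
        topo(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in v.parents:
                p.grad += v.grad * local

# One forward pass, one backward pass, all partial derivatives at once:
x, y = Var(2.0), Var(3.0)
z = x.sin() * y
z.backward()
print(x.grad, y.grad)   # 3*cos(2) and sin(2)
```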
e.g. optimization of state-space control coefficients looks something like training an LLM's matrices...
However, from what I have seen, this isn't really a useful way of reframing the problem. The optimal control problem is at least as hard, if not harder, than the original problem of training the neural network, and the latter has mature and performant software for doing it efficiently. That's not to say there isn't good software for optimal control, but it's a more general problem and therefore off-the-shelf solvers can't leverage the network structure very well.
Some researchers have made interesting theoretical connections like in neural ODEs, but even there the practicality is limited.
We can also reduce supervised learning to reinforcement learning, but that doesn't mean we should use RL algorithms to do supervised learning.
We can also reduce sorting a list of integers to SAT, but that doesn't mean we should use a SAT solver to sort lists of integers.
In it, he stated the following:
> Indeed, the famous “backpropagation” algorithm that was rediscovered by David Rumelhart in the early 1980s, and which is now viewed as being at the core of the so-called “AI revolution,” first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon.
I was wondering whether anyone could point me to the paper or piece of work he was referring to. There are many citations in Schmidhuber’s piece, and in my previous attempts I've gotten lost in papers.
AIAA 65‑701 (1965) “optimum thrust programming” for lunar transfers via steepest descent (Apollo‑era) https://arc.aiaa.org/doi/abs/10.2514/6.1965-701
Meditch 1964 (optimal thrust programming for lunar landing) https://openmdao.github.io/dymos/examples/moon_landing/moon_...
Smith 1967 & Colunga 1970 (explicit Apollo‑type trajectory/re‑entry optimization using adjoint gradients) https://ntrs.nasa.gov/citations/19670015714
One thing AI has been great for recently is searching for obscure or indirect references like this, ones that might be one step removed from any specific thing you're searching for, or tip-of-the-tongue searches where you've forgotten a phrase or know you're using the wrong wording.
It's cool that you can trace the work of these rocket scientists all the way to the state of the art AI.
Henry J. Kelley (1960). Gradient Theory of Optimal Flight Paths.
[1] https://claude.ai/public/artifacts/8e1dfe2b-69b0-4f2c-88f5-0...
I am still going through it, but the latter is quite interesting!
So sad to see the current state. Hopefully we can turn it around.
I think "its" refers to control theory, not backpropagation.
The Minimum-Time Thrust-Vector Control Law in the Apollo Lunar-Module Autopilot (1970)
https://www.sciencedirect.com/science/article/pii/S147466701...
> "Since his first work on the subject, the author has found that A. Bryson and Y.-C. Ho [Bryson and Ho, 1969] described the backpropagation algorithm using Lagrange formalism. Although their description was, of course, within the framework of optimal control rather than machine learning, the resulting procedure is identical to backpropagation."
I remember reading this book enthusiastically back in the mid 90s. I don't recall struggling with the proof; it was fairly straightforward. (I was in my senior year of high school at the time.)
Some ask: "Isn't backpropagation just the chain rule of Leibniz (1676) [LEI07-10] & L'Hopital (1696)?" No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (see Sec. XII of [T22][DLH]). (There are also many inefficient ways of doing this.) It was not published until 1970 [BP1].
[1]: https://www.amazon.com/Talking-Nets-History-Neural-Networks/...
https://en.wikipedia.org/wiki/Adaptive_filter
An adaptive filter doesn't need differentiation of the forward term, but if you squint it looks pretty close.
Once people had a sufficiently compelling reason to write differentiable code, the frameworks around differentiable programming (Theano, TensorFlow, Torch, JAX) picked up a lot of steam.
Maybe they are. I'm not here to do a deep research project that involves reading every citation in that article. If it makes you feel better, pretend that what I said was instead:
"I don't have all the relevant citations stored in my short-term memory right this second and I am not interested in writing a lengthy thesis to satisfy pedantic navel-gazers on HN."
Or, if you really know some reinvention of backprop that is not mentioned here,
WTF are you on about? I never made any such claim, or anything remotely close to it.
I don't really understand your negativity here, and what you are reading into my comment. I never asked you to do a research project? I just thought you might know some other references which are not in the article. If you don't, fine.
Note that I don't expect that any relevant reference is missing here. Schmidhuber always tries to be very careful to completely and exhaustively cite everything there is on a topic. That is why I was doubly curious about the possibility that something is missing, and what it could be.
Nah, I wasn't trying to imply that that book had anything more than the article, at least in regards to the backprop question specifically. Just pointing it out as one more good resource for this kind of historical perspective.
> I don't really understand your negativity here, and what you are reading into my comment. I never asked you to do a research project? I just thought you might know some other references which are not in the article. If you don't, fine.
No worries. I may be reacting more to a general HN meme than to you in particular. There's a certain brand of pedantry and obsessive nit-picking that is all too common here IMO. It grates on my nerves, so if I ever seem a little salty, it's probably because I thought somebody was doing that thing. It's all good. My apologies for the argumentative tone earlier.
> Schmidhuber always tries to be very careful to completely and exhaustively cite everything there is on a topic.
Agreed. That's one reason I don't get why people are always busting on Jurgen. For the most part, it seems that he can back up the claims he makes, and then some. I've heard plenty of people complain about him, but I'm not sure any of them have ever been able to state any particular sense in which he is actually wrong about anything. :-)
[a] https://www.nobelprize.org/uploads/2024/11/advanced-physicsp...
Some things never change.
Hint: Schmidhuber has amassed solid evidence over years of digging.
In a recent talk he made a quip that he had to change some slides because if you have a Nobel prize in physics you should at least get the units right.
Now perhaps Hinton does deserve the award, but certainly it should not be because of the reasons you cite: money and popularity.
You refuted an argument about being honest about accepting an award on the basis that the award pays a lot of money and grants one a great deal of popularity.
If your argument didn't involve money and popularity, then why did you choose those two specific criteria as the justification for accepting this award?
I want to be clear, I am not claiming that Dr. Hinton accepted the award in a dishonest manner or that he did it for money, I am simply refuting your position that money is a valid reason to disregard honesty for accepting a prestigious award.
There are two reasons why Hinton got the prize.
A good majority of modern physics research depends on ML in some aspect. Look at the list of talks at any physics conference and count the number of talks that mention ML in the title.
And the 'physics community' has not produced any fundamental physics for a while. Look at the last several years of physics Nobel prizes. You can categorize the last ten years of prizes into two categories: engineering breakthroughs, and confirming important predictions. Both are important, but lacking fundamental physics breakthroughs, they are not clearly ahead in impact compared to ML.
The setup by itself can also be a general technique that is useful beyond confirming one thing (LIGO, for example). But then, ML itself is a more general technique that has enabled a lot more new physics than one new experiment.
And a large number of predictions being made now are unlikely to be ever confirmed.
That's not a good argument. They do in fact sometimes give awards with which the "scientific community" disagrees. Schmidhuber actually gave object level arguments on why the official justification for the Turing award contained substantial errors.
[HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record.
[RUM] DE Rumelhart, GE Hinton, RJ Williams (1985). Learning Internal Representations by Error Propagation.
Neither is really an invention; they are discoveries. If anything, the chain rule leans slightly more toward invention than backprop does.
I understand the need for attribution as a means to track the means and validity of discovery, but I intensely dislike it when people act like it is a deed of ownership of an idea.
I remember when I learnt about artificial neural networks at university in the late 00s, my professors were really sceptical of them, rightly explaining that they become hard to train as you add more hidden layers.
See, what makes backpropagation and artificial neural networks work are all of the small optimisations and algorithm improvements that were added on top of backpropagation. Without these improvements it's too computationally inefficient to be practical and you have to contend with issues like exploding gradients.
I think Geoffrey Hinton has noted a few times that for people like him who have been working on artificial neural networks for years, it's quite surprising that today neural networks just work, because for years it was so hard to get them to do anything. In this sense, while backpropagation is the foundational algorithm, it's not sufficient on its own. It was the many improvements that were made on top of backpropagation that actually made artificial neural networks work and take off in the 2010s, when some of the core components of modern neural networks started to fall into place.
I remember when I first learnt about neural networks I thought maybe coupling them with some kind of evolutionary approach might be what was needed to make them work. I had absolutely no idea what I was doing of course, but I spent so many nights experimenting with neural networks. I just loved the idea of an artificial "neural network" being able to learn a new problem and spit out an answer. The biggest regret of my life was coming out of university and going into web development because there were basically no AI jobs back then, and no such thing as an AI startup. If you wanted to do AI back then you basically had to be a researcher which didn't interest me at the time.
I did this in an artificial life simulation. It was pretty fun to see the creatures change from pure random bouncing around to movement that helped them get food and move away from something eating them.
My naive vision was all kinds of advanced movement, like hiding around corners for prey, but it never got close to something like that.
As I worked the evolutionary parameters I began to realize more and more that the process of evolving specific advanced traits requires lots of time and (I think) environmental complexity and compartmentalization of groups of creatures.
There are lots of simple/dumb capabilities that help with survival, and they are much, much easier to acquire than a more advanced capability like being aware of another creature and tracking its movement on the other side of an obstacle.
There are mainly two forms of AD: forward mode (optimal when the function being differentiated has more outputs than latent parameter inputs) and reverse mode (optimal when it has more latent parameter inputs than outputs). If you don't understand why, you don't understand AD.
If you understand AD, you'd know why, but then you'd also see a huge difference from symbolic differentiation. In symbolic differentiation the input is an expression or DAG, and the variables computed along the way are themselves symbolic expressions (typically computed in reverse order in high school or university, so the expression would grow exponentially with each deeper nested function, and only at the end are the input coordinates substituted into the final expression to obtain the gradient). In both forward and reverse mode the variables being calculated are numeric, not symbolic expressions.
The third "option" is numeric differentiation, but for N latent parameter inputs this requires (N+1) forward evaluations: N of the function f(x1, x2, ..., xi + delta, ..., xN) and 1 reference evaluation at f(x1, ..., xN). Picking a smaller delta makes it closer to the real gradient assuming infinite precision, but in practice there will be irregular rounding near the pseudo-"infinitesimal" values of real-world floats; alternatively, take delta big enough, but then it's no longer the theoretical gradient.
So symbolic differentiation was destined to fail due to ever increasing symbolic expression length (the chain rule).
Numeric differentiation was destined to fail due to imprecise gradient computation and huge amounts (N+1, many billions for current models) of forward passes to get a single (!) gradient.
AD gives the theoretically correct result with a single forward and backward pass (as opposed to N+1 passes), without requiring billions of passes, or lots of storage to store strings of formulas.
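To make those costs concrete, here is a rough sketch of the numeric (finite-difference) approach; the toy function and names are just illustrative:

```python
import numpy as np

def numeric_grad(f, x, delta=1e-6):
    """Finite-difference gradient: needs N+1 evaluations of f for N inputs."""
    base = f(x)                      # 1 reference evaluation
    grad = np.empty_like(x)
    for i in range(x.size):          # N perturbed evaluations
        xp = x.copy()
        xp[i] += delta
        grad[i] = (f(xp) - base) / delta
    return grad

f = lambda x: np.sum(np.sin(x) ** 2)   # toy scalar function of N inputs
x = np.linspace(0.0, 1.0, 5)
print(numeric_grad(f, x))               # approximate, and delta-dependent
print(2 * np.sin(x) * np.cos(x))        # exact gradient, what AD would return
```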
AD is just a simple application of the pushforwards/pullbacks from differential geometry, which are just the chain rule. It is important to distinguish between a mathematical concept and a particular algorithm/computation for implementing it. The symbolic manipulation with an 'exponentially growing nested function' is a particular way of applying the chain rule, but it is not the only way.
The problem you describe with symbolic differentiation (exponential growth of expressions) is not inherent to symbolic differentiation itself, but to a particular naïve implementation. If you represent computations as DAGs and apply common subexpression elimination, the blow-up you mention can be avoided. In fact, forward- and reverse-mode AD can be viewed as particular algorithmic choices for evaluating the same derivative information that symbolic differentiation encodes. If you represent your function as a DAG and propagate pushforwards/pullbacks, you've already avoided the expression swell.
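As a small illustration of that point (using SymPy purely for convenience; the nesting depth and the function are arbitrary), the derivative swells under a naive symbolic treatment, while common subexpression elimination recovers the shared DAG structure:

```python
import sympy as sp

x = sp.Symbol('x')
expr = x
for _ in range(5):           # nest a function a few times
    expr = sp.sin(expr) * expr

d = sp.diff(expr, x)         # naive symbolic derivative: the expression swells
print(sp.count_ops(d))       # far more operations than in the original expression

# Common subexpression elimination recovers the shared (DAG) structure,
# which is essentially what forward/reverse-mode AD exploits from the start.
replacements, reduced = sp.cse(d)
print(len(replacements), sp.count_ops(reduced[0]))
```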
[0] https://www.urbandictionary.com/define.php?term=schmidhubere...
Annotated History of Modern AI and Deep Learning - https://people.idsia.ch/~juergen/deep-learning-history.html
Japanese scientists were pioneers of AI, yet they’re being written out of its history - https://theconversation.com/japanese-scientists-were-pioneer...
Iterating gradient descent is much, much older, and was recognized immediately upon defining what a gradient is (regardless of how one computes that gradient).
AD vs. { symbolic differentiation, numeric finite "differentiation" } is about the insight of how to compute a numeric gradient efficiently, both in terms of memory (space) and time (compute) requirements.
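Put differently, the descent loop itself doesn't care where the gradient comes from; a minimal sketch (names are my own):

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, steps=200):
    """The update rule is identical whether grad_f comes from finite
    differences, a symbolic derivative, or AD; only cost and accuracy differ."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

# e.g. minimize f(x) = sum(x**2) using its hand-written gradient 2*x
print(gradient_descent(lambda x: 2 * x, [3.0, -4.0]))   # converges toward [0, 0]
```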