So... Automatic integration?
Proportional, integral, derivative. A PID loop sure sounds like what they're talking about.
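For reference, here's what those three terms look like in a toy controller (purely illustrative; the class and gains below are made up, not from the thread):

```python
class PID:
    """Toy PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # proportional: react to the current error
        # integral: accumulate past error
        # derivative: react to the error's rate of change
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```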
It has a lot more overhead than regular forward mode autodiff, because you need to cache values from running the function and refer back to them in reverse order. The advantage is that for functions with many inputs and very few outputs (the classic example is computing the gradient of a scalar function in a high-dimensional space, as in gradient descent), it is algorithmically more efficient and requires only one pass through the primal function.
On the other hand, traditional forward mode derivatives are most efficient for functions with very few inputs but many outputs. It's essentially a duality relationship.
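A minimal hand-rolled sketch of both modes on f(x1, x2) = x1*x2 + sin(x1) (not any particular library's API):

```python
import math

# ---- Forward mode: dual numbers carry (value, derivative) together ----
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def dual_sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x1, x2):
    return x1 * x2 + dual_sin(x1)

# One forward pass per input: seed dot=1 on the input you differentiate w.r.t.
df_dx1 = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).dot   # = x2 + cos(x1)
df_dx2 = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).dot   # = x1

# ---- Reverse mode: run the primal once, cache intermediates, then sweep
# back from the output to get the gradient for *all* inputs at once ----
def f_and_grad(x1, x2):
    v1 = x1 * x2                 # cached intermediate
    v2 = math.sin(x1)            # cached intermediate
    y = v1 + v2
    dy_dv1, dy_dv2 = 1.0, 1.0    # reverse sweep starts at the output
    dx1 = dy_dv1 * x2 + dy_dv2 * math.cos(x1)
    dx2 = dy_dv1 * x1
    return y, (dx1, dx2)

print(df_dx1, df_dx2)            # 3.0 + cos(2.0), 2.0
print(f_and_grad(2.0, 3.0)[1])   # same gradients, from a single reverse sweep
```

With many inputs, forward mode needs one pass per input, while the reverse sweep delivers the whole gradient in one pass, which is the duality mentioned above.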
As the name implies, the calculation is done forward.
Reverse mode automatic differentiation starts from the root of the symbolic expression (the output) and works backwards, computing the derivative with respect to every subexpression in a single reverse sweep.
The difference between the two is like the difference between calculating the Fibonacci sequence recursively without memoization and calculating it iteratively. You avoid doing redundant work over and over again.
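The analogy in code (a sketch): the cache plays the same role as the tape of saved intermediate values in reverse mode.

```python
from functools import lru_cache

def fib_naive(n):
    # recomputes the same subproblems exponentially many times
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # each fib(k) is computed once and cached, so the work is linear
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
```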
e.g. optimizing state-space control coefficients looks something like training an LLM's weight matrices...
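A hedged sketch of that analogy (the dynamics, cost, and finite-difference "training" loop below are invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy discrete-time dynamics x' = Ax + Bu
B = np.array([[0.0], [0.1]])
K = np.zeros((1, 2))                      # feedback gain: the "weights" to learn

def rollout_cost(K, x0=np.array([1.0, 0.0]), steps=20):
    x, cost = x0.copy(), 0.0
    for _ in range(steps):
        u = -K @ x                        # state feedback u = -Kx
        cost += float(x @ x) + float(u @ u)
        x = A @ x + (B @ u).ravel()
    return cost

# Gradient descent on the gain entries, shaped just like a (tiny) training loop.
# Finite differences keep the sketch short; in practice you would differentiate
# through the rollout instead.
lr, eps = 1e-3, 1e-5
for _ in range(100):
    grad = np.zeros_like(K)
    for idx in np.ndindex(K.shape):
        K_eps = K.copy()
        K_eps[idx] += eps
        grad[idx] = (rollout_cost(K_eps) - rollout_cost(K)) / eps
    K -= lr * grad
```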
In it, he stated the following:
> Indeed, the famous “backpropagation” algorithm that was rediscovered by David Rumelhart in the early 1980s, and which is now viewed as being at the core of the so-called “AI revolution,” first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon.
I was wondering whether anyone could point me to the paper or piece of work he was referring to. There are many citations in Schmidhuber’s piece, and in my previous attempts I've gotten lost in papers.
- Henry J. Kelley (1960), “Gradient Theory of Optimal Flight Paths,” ARS Journal.
- A.E. Bryson & W.F. Denham (1962), “A Steepest-Ascent Method for Solving Optimum Programming Problems,” Journal of Applied Mechanics.
- B.G. Junkin (1971), “Application of the Steepest-Ascent Method to an Apollo Three-Dimensional Reentry Optimization Problem,” NASA/MSFC report.
It's a weird thing to wonder after so many people expressed their dislike of the upthread low-effort comment with a down vote (and then another voiced a more explicit opinion). The point is that a reader may want to know that the text they're reading is something a human took the time to write themselves. That fact is what makes it valuable.
> pncnmnp seems happy
They just haven't commented. There is no reason to attribute this specific motive to that fact.
Also, I quite love it when people clearly demarcate which part of their content came from an LLM, and specify which model.
The little citation carries a huge amount of useful information.
The folks who don't like AI should like it too, as they can easily filter the content.
Henry J. Kelley (1960). Gradient Theory of Optimal Flight Paths.
[1] https://claude.ai/public/artifacts/8e1dfe2b-69b0-4f2c-88f5-0...
I am still going through it, but the latter is quite interesting!
I think "its" refers to control theory, not backpropagation.
> Some ask: "Isn't backpropagation just the chain rule of Leibniz (1676) [LEI07-10] & L'Hopital (1696)?" No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (see Sec. XII of [T22][DLH]). (There are also many inefficient ways of doing this.) It was not published until 1970 [BP1].
[1]: https://www.amazon.com/Talking-Nets-History-Neural-Networks/...
https://en.wikipedia.org/wiki/Adaptive_filter
It doesn't need an explicit differentiation of the forward term, but if you squint it looks pretty close.
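For the curious, the LMS update at the core of many adaptive filters looks like this (a sketch, with made-up signal names):

```python
import numpy as np

def lms(x, d, n_taps=4, mu=0.01):
    """Adapt FIR filter weights w so the filter output tracks the desired signal d."""
    w = np.zeros(n_taps)
    for n in range(n_taps, len(x)):
        window = x[n - n_taps:n][::-1]   # most recent input samples first
        y = w @ window                    # forward pass: filter output
        e = d[n] - y                      # error against the desired signal
        # mu * e * window is the stochastic gradient of the squared error w.r.t. w,
        # so no differentiation of the forward pass is ever written down, yet the
        # update has exactly the shape of a one-layer gradient step.
        w += mu * e * window
    return w
```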
[a] https://www.nobelprize.org/uploads/2024/11/advanced-physicsp...
Some things never change.