Why Momentum Works - https://news.ycombinator.com/item?id=14034426 - April 2017 (95 comments)
var s = 3
var x = xy[0]; var y = xy[1]*s
var fx = (1-x)*(1-x) + 20*(y - x*x )*(y - x*x )
var dfx = [-2*(1-x) - 80*x*(-x*x + y), s*40*(-x*x + y)]
The interactive example uses an initial guess of [-1.21, 0.853] and a fixed 150 iterations, with no convergence test.From manually fiddling with (step-size) alpha & (momentum) beta parameters, and editing the code to specify a smaller number of iterations, it seems quite difficult to tune this momentum-based approach to get near the minima and stay there without bouncing away in 50 iterations or fewer.
Out of curiosity, I compared minimising this bananaf function with scipy.optimize.minimize, using the same initial guess.
If we force scipy.optimize.minimize to use method='cg', leaving all other parameters as defaults, it converges to the optimal solution of [1.0, 1./3.] requiring 43 evaluations of fx and dfx,
If we allow scipy.optimize.minimize to use all defaults -- including the default method='bfgs', it converges to the optimal solution after only 34 evaluations of fx and dfx.
Under the hood, scipy's method='cg' and method='bfgs' solvers do not use a fixed step size or momentum to determine the step size, but instead solve a line search problem. The line search problem is to identify a step size that satisfies a sufficient decrease condition and a curvature condition - see Wolfe conditions [1]. Scipy's default line search method -- used for cg and bfgs -- is a python port [2] of the dcsrch routine from MINPACK2. A good reference covering line search methods & BFGS is Nocedal & Wright's 2006 book Numerical Optimization.
[1] https://en.wikipedia.org/wiki/Wolfe_conditions [2] https://github.com/scipy/scipy/blob/main/scipy/optimize/_dcs...
porridgeraisin•6h ago
imvg•6h ago
gwern•5h ago
michael_nielsen•4h ago
gwern•2h ago