![Page 1: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/1.jpg)
CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Gradient descent
![Page 2: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/2.jpg)
KEY CONCEPTS
- Gradient descent
- Line search
- Convergence rates depend on scaling
- Variants: discrete analogues, coordinate descent
- Random restarts
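As a rough illustration of the claim that convergence rates depend on scaling, the sketch below runs gradient descent with an exact step size on the quadratic f(x, y) = 0.5 (x² + κ y²) and counts the steps needed. The test function, the starting points, and the closed-form step size (valid only for quadratics) are assumptions chosen for this example, not taken from the slides.

```python
import math

def steps_to_converge(kappa, x0, tol=1e-6, max_iters=10000):
    """Gradient descent with exact line search on
    f(x, y) = 0.5 * (x**2 + kappa * y**2); returns the number of steps taken."""
    x, y = x0
    for steps in range(max_iters):
        gx, gy = x, kappa * y                 # gradient of f at (x, y)
        if math.hypot(gx, gy) < tol:          # gradient small enough: stop
            return steps
        # For a quadratic with Hessian diag(1, kappa), the step size that
        # minimizes f along -g is (g.g) / (g.H g).
        alpha = (gx * gx + gy * gy) / (gx * gx + kappa * gy * gy)
        x, y = x - alpha * gx, y - alpha * gy
    return max_iters

well_scaled = steps_to_converge(1.0, (1.0, 1.0))        # circular contours
badly_scaled = steps_to_converge(100.0, (100.0, 1.0))   # elongated contours
print(well_scaled, badly_scaled)
```

With circular contours the negative gradient points straight at the minimum, so a single step suffices; with elongated contours the iterates zigzag across the valley and need far more steps.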
![Page 3: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/3.jpg)
The gradient $\nabla f(x)$ is orthogonal to the level sets (contours) of $f$ and points in the direction of steepest increase
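Both properties can be checked numerically. The sketch below uses an example function f(x, y) = x² + 4y², whose contour f = c² is the ellipse (c cos θ, (c/2) sin θ); the function, the contour parametrization, and the helper names are assumptions made for this illustration.

```python
import math

def grad_f(x, y):
    """Gradient of the example function f(x, y) = x**2 + 4*y**2."""
    return (2 * x, 8 * y)

def level_set_point_and_tangent(c, theta):
    """A point on the contour f = c**2 and the contour's tangent vector there."""
    point = (c * math.cos(theta), 0.5 * c * math.sin(theta))
    tangent = (-c * math.sin(theta), 0.5 * c * math.cos(theta))
    return point, tangent

def directional_slope(g, angle):
    """Rate of change of f along the unit vector at the given angle."""
    return g[0] * math.cos(angle) + g[1] * math.sin(angle)

point, tangent = level_set_point_and_tangent(c=2.0, theta=0.7)
g = grad_f(*point)

# Orthogonality: the gradient has zero component along the contour.
dot = g[0] * tangent[0] + g[1] * tangent[1]

# Steepest increase: among unit directions sampled every degree, the one
# with the largest slope lies next to the gradient direction.
angles = [math.radians(k) for k in range(360)]
best_angle = max(angles, key=lambda a: directional_slope(g, a))
gradient_angle = math.atan2(g[1], g[0]) % (2 * math.pi)
print(dot, best_angle, gradient_angle)
```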
![Page 4: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/4.jpg)
![Page 5: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/5.jpg)
Gradient descent: iteratively move in the direction $-\nabla f(x)$
![Page 6: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/6.jpg)
![Page 7: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/7.jpg)
![Page 8: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/8.jpg)
![Page 9: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/9.jpg)
![Page 10: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/10.jpg)
![Page 11: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/11.jpg)
![Page 12: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/12.jpg)
Line search: pick the step size $\alpha$ that leads to a decrease in the function value
![Page 13: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/13.jpg)
Line search: pick the step size $\alpha$ that leads to a decrease in the function value
(Use your favorite univariate optimization method)
[Figure: plot of $f(x - \alpha \nabla f(x))$ as a function of the step size $\alpha$, with the best step size marked $\alpha^*$.]
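The slides leave the choice of univariate method open. One common option, shown here as a sketch rather than as the method used in the course, is backtracking: start from a trial step and shrink it until the function value drops by a sufficient amount (the Armijo condition). The function name and the parameters `alpha0`, `shrink`, `c`, and `max_halvings` are assumptions made for this example.

```python
def backtracking_line_search(f, x, d, g, alpha0=1.0, shrink=0.5, c=1e-4,
                             max_halvings=50):
    """Return a step size alpha with f(x + alpha*d) <= f(x) + c*alpha*(g.d).

    x, d, g are lists: the current point, the search direction, and the
    gradient at x. d must be a descent direction (g.d < 0).
    Returns 0.0 if no acceptable step is found.
    """
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(g, d))   # directional derivative
    alpha = alpha0
    for _ in range(max_halvings):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) <= fx + c * alpha * slope:     # sufficient decrease
            return alpha
        alpha *= shrink                            # step too long: shrink it
    return 0.0

# Usage on the example function f(x, y) = x**2 + 4*y**2.
f = lambda p: p[0] ** 2 + 4 * p[1] ** 2
x = [2.0, 1.0]
g = [2 * x[0], 8 * x[1]]          # gradient at x
d = [-gi for gi in g]             # steepest-descent direction
alpha = backtracking_line_search(f, x, d, g)
x_next = [xi + alpha * di for xi, di in zip(x, d)]
```

Requiring a sufficient decrease, rather than any decrease at all, guards against steps that lower the function value only negligibly.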
![Page 14: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/14.jpg)
GRADIENT DESCENT PSEUDOCODE
Input: $f$, starting value $x_1$, termination tolerances $\epsilon_g$, $\epsilon_x$
For $t = 1, 2, \ldots, \text{maxIters}$:
- Compute the search direction $d_t = -\nabla f(x_t)$
- If $\|d_t\| < \epsilon_g$ then:
  - return “Converged to critical point”, output $x_t$
- Find $\alpha_t$ so that $f(x_t + \alpha_t d_t) < f(x_t)$ using line search
- If $\|\alpha_t d_t\| < \epsilon_x$ then:
  - return “Converged in x”, output $x_t$
- Let $x_{t+1} = x_t + \alpha_t d_t$

Return “Max number of iterations reached”, output $x_{\text{maxIters}}$
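The pseudocode translates fairly directly into Python. The sketch below follows its structure and return messages; the backtracking rule used for the line search (with a sufficient-decrease test, which is stricter than the plain decrease the pseudocode asks for), the default tolerances, and the test function are assumptions added for illustration. After the loop it returns the last iterate it computed.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    """Gradient descent with backtracking line search.

    Returns (message, x) using the termination messages of the pseudocode.
    """
    x = list(x1)
    for t in range(1, max_iters + 1):
        d = [-gi for gi in grad_f(x)]              # search direction
        if norm(d) < eps_g:
            return "Converged to critical point", x
        # Line search (assumed rule): halve alpha until the function value
        # decreases by at least 1e-4 * alpha * ||d||^2.
        fx, alpha = f(x), 1.0
        for _ in range(60):
            trial = [xi + alpha * di for xi, di in zip(x, d)]
            if f(trial) <= fx - 1e-4 * alpha * norm(d) ** 2:
                break
            alpha *= 0.5
        else:
            alpha = 0.0                            # no decreasing step found
        step = [alpha * di for di in d]
        if norm(step) < eps_x:
            return "Converged in x", x
        x = [xi + si for xi, si in zip(x, step)]
    return "Max number of iterations reached", x

# Usage on an example quadratic with its minimum at (1, -2).
f = lambda p: (p[0] - 1) ** 2 + 4 * (p[1] + 2) ** 2
grad_f = lambda p: [2 * (p[0] - 1), 8 * (p[1] + 2)]
message, x_min = gradient_descent(f, grad_f, [0.0, 0.0])
print(message, x_min)
```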
![Page 15: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/15.jpg)
![Page 16: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/16.jpg)
![Page 17: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/17.jpg)
![Page 18: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/18.jpg)
![Page 19: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/19.jpg)
![Page 20: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/20.jpg)
RELATED METHODS
- Steepest descent (discrete)
- Coordinate descent
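The slide only names coordinate descent, so the following is a generic sketch of the idea rather than the course's formulation: cycle through the coordinates and minimize f along one coordinate at a time while holding the others fixed. The golden-section search used for each one-dimensional problem, the search `radius`, and the test function are assumptions of this example, and the search presumes f is unimodal along each coordinate within the bracket.

```python
import math

def golden_section(phi, lo, hi, tol=1e-8):
    """Minimize a unimodal function phi on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if phi(c) < phi(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return 0.5 * (a + b)

def coordinate_descent(f, x0, radius=10.0, sweeps=50, tol=1e-6):
    """Minimize f by exact minimization along one coordinate at a time."""
    x = list(x0)
    for _ in range(sweeps):
        x_old = list(x)
        for i in range(len(x)):
            def phi(v, i=i):
                trial = list(x)    # copy the current point, vary coordinate i
                trial[i] = v
                return f(trial)
            x[i] = golden_section(phi, x[i] - radius, x[i] + radius)
        if max(abs(a - b) for a, b in zip(x, x_old)) < tol:
            break                  # a full sweep barely moved the point
    return x

# Example: f(x, y) = x**2 + y**2 + x*y - 3*x has its minimum at (2, -1).
f = lambda p: p[0] ** 2 + p[1] ** 2 + p[0] * p[1] - 3 * p[0]
x_min = coordinate_descent(f, [0.0, 0.0])
print(x_min)
```

No gradient is needed here, which is the main appeal when derivatives are unavailable or when each one-dimensional problem is cheap to solve.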
![Page 21: CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING Gradient descent](https://reader035.vdocument.in/reader035/viewer/2022070323/56649dc75503460f94abbe65/html5/thumbnails/21.jpg)
When $f$ has many local minima, gradient descent finds only the one whose basin contains the starting point: use a good initialization, or random restarts
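A minimal sketch of random restarts, under assumptions made for this example: the one-dimensional function f(x) = x⁴ − 3x² + x (a global minimum near x ≈ −1.30 and a shallower local minimum near x ≈ 1.13), a fixed-step local descent, and uniform sampling of starting points on [−2, 2]. The function names, the step size, and the number of restarts are all hypothetical choices.

```python
import random

def f(x):
    return x ** 4 - 3 * x ** 2 + x

def df(x):
    return 4 * x ** 3 - 6 * x + 1

def local_descent(x, step=0.01, iters=2000):
    """Fixed-step gradient descent; converges to the minimum of the basin
    containing the starting point."""
    for _ in range(iters):
        x -= step * df(x)
    return x

def random_restarts(n_restarts=20, lo=-2.0, hi=2.0, seed=0):
    """Run local descent from random starting points and keep the best result."""
    rng = random.Random(seed)      # seeded for reproducibility
    best_x = None
    for _ in range(n_restarts):
        x = local_descent(rng.uniform(lo, hi))
        if best_x is None or f(x) < f(best_x):
            best_x = x
    return best_x, f(best_x)

stuck = local_descent(1.5)         # a single poor start ends in the local minimum
best_x, best_f = random_restarts()
print(stuck, best_x, best_f)
```

A single run from x = 1.5 settles in the shallower minimum, while the restarts are very likely to sample the other basin at least once and so recover the deeper one.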