Survey of unconstrained optimization gradient based algorithms


TRANSCRIPT

Slide 1

Survey of unconstrained optimization gradient based algorithms
Unconstrained minimization
Steepest descent vs. conjugate gradients
Newton and quasi-Newton methods
Matlab fminunc

This lecture provides a brief survey of gradient-based local unconstrained optimization algorithms. We first discuss algorithms that are based only on the gradient, comparing the intuitive steepest descent method to the much better conjugate gradient method. Then we discuss algorithms that calculate or approximate the Hessian matrix, in particular Newton's method and quasi-Newton methods. We finally discuss their use in Matlab's fminunc.

Slide 2

Unconstrained local minimization
The necessity for one dimensional searches

The most intuitive choice of sk is the direction of steepest descent

This choice, however, is very poor
Methods are based on the dictum that all functions of interest are locally quadratic

Gradient-based optimization algorithms assume that gradient calculations are expensive. Therefore, once the gradient is calculated and a search direction selected, it pays to go in that direction for a while, rather than stop soon and update the direction based on another gradient calculation. The move is described by

xk+1 = xk + alpha*sk

which indicates that the (k+1)-th position is found by moving a distance alpha in the direction sk. The distance alpha is usually based on a one-dimensional minimization of the function in that direction. The most intuitive direction is the direction of steepest descent (the negative of the gradient direction), the way water moves on a terrain. However, water can continually change direction, while we want to stick to a single direction as long as possible. This makes the steepest descent direction a poor choice. Instead, methods for choosing a better direction are based on the idea that close enough to a minimum the function behaves like a quadratic.
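As an illustration of this move, here is a minimal Matlab sketch of steepest descent with a one-dimensional search along sk. The quadratic test function, starting point, search bracket, and tolerance are assumptions made for the example, not part of the lecture.

% Minimal sketch: steepest descent with a 1-D line search (illustrative only).
f    = @(x) 0.5*x'*[4 1; 1 3]*x;      % a simple quadratic test function (assumed)
grad = @(x) [4 1; 1 3]*x;             % its gradient
x = [2; 1];                           % assumed starting point
for k = 1:20
    s = -grad(x);                     % steepest descent direction
    if norm(s) < 1e-8, break; end     % stop when the gradient is (nearly) zero
    phi   = @(alpha) f(x + alpha*s);  % the function along the search direction
    alpha = fminbnd(phi, 0, 1);       % one-dimensional minimization for the step
    x = x + alpha*s;                  % the move xk+1 = xk + alpha*sk
end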

Slide 3

Conjugate gradients

Conjugate gradient methods select a direction which takes into account not only the gradient direction but also the search direction in the previous iteration. By doing that, we can develop algorithms that are guaranteed to converge to the minimum of an n-dimensional quadratic function in no more than n iterations.

The equation at the top of the slide gives the recipe for the Fletcher-Reeves conjugate gradient method, and the figure compares it to the steepest descent method for a quadratic function.

The two methods start the same, in the negative gradient direction. However, the steepest descent method zig-zags around, while the conjugate directions method homes in on the minimum in the second move. Note that each move ends up tangent to the function contour, because we minimize the function along the search direction. Most implementations approximate the function as a quadratic along the search direction, so that if the function is a quadratic the search will end at the minimum in that direction. However, if the function is not quadratic we are likely to stop at some distance from the point where the direction is tangent to the contour.
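The slide's equation is not reproduced in this transcript; the sketch below spells out the standard Fletcher-Reeves recipe in Matlab: the new direction is the negative gradient plus beta times the previous direction, where beta is the ratio of the current to the previous squared gradient norm. The quadratic test function, starting point, and line-search bracket are assumptions for illustration.

% Sketch of Fletcher-Reeves conjugate gradients on a quadratic (illustrative only).
A = [4 1; 1 3];                          % assumed quadratic: f(x) = 0.5*x'*A*x
f    = @(x) 0.5*x'*A*x;
grad = @(x) A*x;
x = [2; 1];                              % assumed starting point
g = grad(x);
s = -g;                                  % first direction: steepest descent
for k = 1:2                              % with exact line searches, at most n = 2 iterations
    phi   = @(alpha) f(x + alpha*s);
    alpha = fminbnd(phi, 0, 2);          % 1-D minimization along s
    x = x + alpha*s;
    gnew = grad(x);
    beta = (gnew'*gnew)/(g'*g);          % Fletcher-Reeves: ratio of squared gradient norms
    s = -gnew + beta*s;                  % new direction combines gradient and old direction
    g = gnew;
end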

Slide 4

Newton and quasi-Newton methods
Newton

Quasi-Newton methods use successive evaluations of the gradient to obtain an approximation to the Hessian or its inverse
Matlab's fminunc uses a variant of Newton's method if a gradient routine is provided, otherwise BFGS quasi-Newton
The variant of Newton's method is called the trust-region approach and is based on using a quadratic approximation of the function inside a box

If we have the Hessian matrix, we can use a first-order Taylor expansion for the gradient
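The slide's equation does not survive in the transcript; written in the notation of these notes, the first-order expansion is

grad f(xk + delta-x) ≈ grad f(xk) + Qk delta-x

and requiring this approximate gradient to vanish gives Qk delta-x = -grad f(xk), which is solved for the Newton step delta-x.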

where Qk is the Hessian matrix at the k-th iteration. Then we select delta-x so that the gradient vanishes, to satisfy the stationarity condition. Instead of going there, we just use it to select a direction, because at the present point we may be far from the minimum, and the equation for the gradient may have large errors for large delta-x. This is Newton's method.

When it is too expensive to calculate the Hessian matrix we fall back on quasi-Newton methods. These are methods that build an approximation to the Hessian matrix and update it after each gradient calculation. For quadratic functions they are guaranteed to reach the minimum in no more than n iterations and to have an exact Hessian after n iterations. One of the most popular quasi-Newton methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.

Matlab's fminunc will use a variant of Newton's method if given a routine to calculate the gradient, and BFGS if it has to calculate the gradient by finite differences. The variant of Newton's method used by fminunc is called a trust-region method. It constructs a quadratic approximation to the function and minimizes it in a box around the current point, with the size of the box adjusted depending on the success of the previous iteration.
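The following Matlab sketch shows how the Hessian is used to pick the Newton direction. The Rosenbrock test function, its hand-coded derivatives, the starting point, and the iteration limits are assumptions for illustration, not the lecture's code.

% Sketch of Newton's method: solve Qk*delta_x = -grad for the direction (illustrative only).
f    = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;             % Rosenbrock banana function (assumed example)
grad = @(x) [-400*x(1)*(x(2)-x(1)^2) - 2*(1-x(1));
              200*(x(2)-x(1)^2)];
hess = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1);
             -400*x(1),                   200];
x = [-1.2; 1];                                            % assumed starting point
for k = 1:50
    g = grad(x);
    if norm(g) < 1e-8, break; end
    delta_x = -(hess(x)\g);     % Newton direction from Qk*delta_x = -grad f(xk)
    x = x + delta_x;            % full step here; in practice a 1-D search or trust region scales it
end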

Slide 5

Problems: Unconstrained algorithms

Explain the differences and commonalities of steepest descent, conjugate gradients, Newton's method, and quasi-Newton methods for unconstrained minimization. Solution on Notes page.

Use fminunc to minimize the Rosenbrock banana function and compare the trajectories of fminsearch and fminunc starting from (-1.2,1), with and without the routine for calculating the gradient. Plot the three trajectories.

Solution: Explain the differences and commonalities of steepest descent, conjugate gradients, Newton's method, and quasi-Newton methods for unconstrained minimization.

Solution: All the algorithms share the philosophy of calculating a direction and then doing a one-dimensional minimization along this direction. All the algorithms except for steepest descent are guaranteed to converge in a finite number of steps to the solution for a quadratic function: conjugate gradients and quasi-Newton in n steps, and Newton in one step. The difference between conjugate gradients and quasi-Newton is that the latter produces an approximation of the Hessian or its inverse, which accelerates convergence for non-quadratic functions.
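As a starting point for the second problem, the sketch below sets up the three runs; it is a hedged outline, not the posted solution. The option names follow recent releases of the Optimization Toolbox (older releases use optimset with 'GradObj','on' instead), and recording the trajectories for plotting would be done with an OutputFcn, which is omitted here for brevity.

% Sketch: three runs on the Rosenbrock banana function from (-1.2, 1).
banana = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
x0 = [-1.2; 1];

% 1) Nelder-Mead simplex (fminsearch, no gradients at all).
xNM = fminsearch(banana, x0);

% 2) fminunc without a gradient routine: BFGS quasi-Newton with finite-difference gradients.
optQN = optimoptions('fminunc', 'Algorithm', 'quasi-newton');
xQN = fminunc(banana, x0, optQN);

% 3) fminunc with a gradient routine: the trust-region (Newton-type) variant.
optTR = optimoptions('fminunc', 'Algorithm', 'trust-region', ...
                     'SpecifyObjectiveGradient', true);
xTR = fminunc(@bananaWithGrad, x0, optTR);

% Objective with analytic gradient (save as bananaWithGrad.m or keep as a local function).
function [fval, g] = bananaWithGrad(x)
fval = 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
g = [-400*x(1)*(x(2)-x(1)^2) - 2*(1-x(1));
      200*(x(2)-x(1)^2)];
end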