
Steepest Gradient Method
Conjugate Gradient Method

Hoonyoung Jeong
Department of Energy Resources Engineering
Seoul National University

Review

• What is the formula of ∇f?
• What is the physical meaning of ∇f?
• What is the formula of a Jacobian matrix?
• How is a Jacobian matrix utilized?
• What is the formula of a Hessian matrix?
• How is a Hessian matrix utilized?

Key Questions

• Explain the steepest gradient method intuitively
• Explain a step size
• Present stop criteria
• What is the disadvantage of the steepest gradient method?
• Explain the conjugate gradient method by comparing it with the steepest gradient method

Comparison

Intuitive Solution to Steepest Gradient Method

The example is mathematically expressed as:
  f(x): elevation at location x = (x, y)
  Find x maximizing f(x)

1) Find the steepest ascent direction by taking one step in each direction at the current location
   Steepest ascent direction
   = Tangential direction at the current location
   = ∇f(x_k) = [∂f/∂x; ∂f/∂y]
   = Direction locally maximizing f(x), i.e., the locally steepest ascent direction

2) Take steps along the steepest ascent direction
   x_{k+1} = x_k + λ_k ∇f(x_k)
   k: iteration index
   λ_k: step size (how far to move in the steepest ascent direction)

3) Stop taking steps along that direction once you start going downhill

4) Repeat 1) – 3) until you reach a peak

[Figure: one iteration of steepest ascent. The step from x_k to x_{k+1} has its direction and length given by λ_k ∇f(x_k), i.e., x_{k+1} − x_k = λ_k ∇f(x_k).]

Example of Gradient Descent Method (1)

β€’ π‘₯π‘₯π‘˜π‘˜+1 = π‘₯π‘₯π‘˜π‘˜ βˆ’ πœ†πœ†π‘˜π‘˜π›»π›»π‘“π‘“(π‘₯π‘₯π‘˜π‘˜)β€’ 𝑓𝑓 = π‘₯π‘₯4

β€’ 𝛻𝛻𝑓𝑓‒ π‘₯π‘₯0 = 2β€’ πœ†πœ†π‘˜π‘˜ =0.02 (constant)β€’ π‘₯π‘₯1 = π‘₯π‘₯0 βˆ’ πœ†πœ†0𝛻𝛻𝑓𝑓(π‘₯π‘₯0)=?β€’ π‘₯π‘₯2 = π‘₯π‘₯1 βˆ’ πœ†πœ†1𝛻𝛻𝑓𝑓(π‘₯π‘₯1)=?β€’ π‘₯π‘₯3 = π‘₯π‘₯2 βˆ’ πœ†πœ†2𝛻𝛻𝑓𝑓(π‘₯π‘₯2)=?β€’ Repeat for πœ†πœ†π‘˜π‘˜ =0.1 (constant)
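The full worked version is in gradient_descent_method.mlx (next slide); the fragment below is only a sketch of the same iteration so the values asked for above can be checked.

% Sketch of x_{k+1} = x_k - lambda*grad_f(x_k) for f(x) = x^4, x_0 = 2.
gradf = @(x) 4*x.^3;                 % gradient of f = x^4
for lambda = [0.02 0.1]              % repeat the exercise for both constant step sizes
    x = 2;                           % x_0
    for k = 1:3
        x = x - lambda*gradf(x);     % x_1, x_2, x_3
        fprintf('lambda = %.2f: x_%d = %.4f\n', lambda, k, x);
    end
end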

Example of Gradient Descent Method (2)

β€’ See gradient_descent_method.mlx

Disadvantage of Gradient Descent Method

• Slow near local minima or maxima
  The 2-norm of the gradient vector becomes small near local minima or maxima, so the steps λ_k ∇f(x_k) also become small

Step Size

• Also called the step length, or the learning rate in machine learning
• What are the advantages of large and small step sizes?
  Large: fast
  Small: stable
• What are the disadvantages of large and small step sizes?
  Large: unstable
  Small: slow
• The step size should be appropriate, but the appropriate value differs from problem to problem

How to Determine a Step Size?

• Line search methods
  Backtracking line search
  Golden section search
  Secant method
• No implementation here, and no need to know the details
  Why? MATLAB already has good line search methods

[Figure: backtracking line search. Try a large step size λ_0 first, then halve it repeatedly (λ_1 = λ_0/2, λ_2 = λ_1/2, …) until the convergence criterion is satisfied and the step from x_k along ∇f(x_k) to x_{k+1} is accepted.]

Stop (Convergence) Criteria

• 2-norm of ∇f
  ‖∇f(x_k)‖ < ε_∇f, e.g., 1e-5
• Relative change of f
  |f(x_{k+1}) − f(x_k)| / |f(x_{k+1})| < ε_f, e.g., 1e-5
• Relative change of x
  max_i |x_{k+1,i} − x_{k,i}| / |x_{k+1,i}| < ε_x for i = 1, …, N_x, e.g., 1e-10
• Maximum number of iterations
• Maximum number of f evaluations
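In code these tests are just a few comparisons. The snippet below is a sketch using the example tolerances from the slide; the iterate and gradient values are dummies (in practice x_old, x_new, f_old, f_new, and g_new come from the current iteration), and combining the tests with "or" is one common choice.

% Sketch of the stop criteria with the example tolerances above.
x_old = [2; 2];  x_new = [1.36; 1.36];       % dummy iterates so the snippet runs
f_old = sum(x_old.^4);  f_new = sum(x_new.^4);
g_new = 4*x_new.^3;                          % dummy gradient at x_new
eps_g = 1e-5;  eps_f = 1e-5;  eps_x = 1e-10;
stop_grad = norm(g_new) < eps_g;                            % 2-norm of the gradient
stop_f    = abs(f_new - f_old)/abs(f_new) < eps_f;          % relative change of f
stop_x    = max(abs(x_new - x_old)./abs(x_new)) < eps_x;    % relative change of x
converged = stop_grad || stop_f || stop_x                   % plus iteration/evaluation caps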

Conjugate Gradient Method

• A modification of the gradient descent method
• Adds a scaled direction to the search direction
• The scaled direction is computed from the previous search direction and the current and previous gradients
• More stable convergence because the previous search direction is taken into account (fewer oscillations)
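A sketch of that idea on a small quadratic is given below. The objective, the exact line search (possible only because the objective is quadratic), and the Fletcher-Reeves formula for the scaling factor beta are illustrative choices; the lecture's example on the following slides may use different ones.

% Sketch of the conjugate gradient method on a 2-D quadratic:
% the new search direction is the steepest descent direction plus
% beta times the previous search direction.
A = diag([2 20]);                       % Hessian of the placeholder quadratic
f = @(x) 0.5*(x.'*A*x);                 % f = x1^2 + 10*x2^2
x = [5; 1];
g = A*x;                                % gradient of the quadratic
d = -g;                                 % first direction: steepest descent
for k = 1:10
    lambda = -(g.'*d)/(d.'*A*d);        % exact line search (quadratic objective only)
    x      = x + lambda*d;              % move along the search direction
    g_new  = A*x;
    if norm(g_new) < 1e-10, break, end
    beta   = (g_new.'*g_new)/(g.'*g);   % Fletcher-Reeves: current vs. previous gradient
    d      = -g_new + beta*d;           % add the scaled previous search direction
    g      = g_new;
end
fprintf('minimum at (%.2g, %.2g), f = %.2g, after %d iterations\n', x(1), x(2), f(x), k)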

SGM vs. CGM

Example of SGM and CGM

• Manual calculation: Example_of_SGM_and_CGM.pdf
• MATLAB: Example_of_SGM_and_CGM.mlx