Gradient Methods Yaron Lipman May 2003


Page 1:

Gradient Methods

Yaron Lipman

May 2003

Page 2:

Preview

Background Steepest Descent Conjugate Gradient

Page 3:

Preview

Background Steepest Descent Conjugate Gradient

Page 4:

Background

Motivation The gradient notion The Wolfe Theorems

Page 5:

Motivation

The min(max) problem:

$$\min_x f(x)$$

But we learned in calculus how to solve that kind of question!

Page 6:

Motivation

Not exactly. Functions $f: \mathbb{R}^n \to \mathbb{R}$; high-order polynomials, e.g.

$$x - \tfrac{1}{6}x^3 + \tfrac{1}{120}x^5 - \tfrac{1}{5040}x^7$$

And what about functions that don't have an analytic presentation: a "Black Box"?

Page 7:

Motivation

A "real world" problem: finding a harmonic mapping.

$$E_{harm}(x_1,\dots,x_n,y_1,\dots,y_n) = \frac{1}{2}\sum_{(i,j)\in E} k_{i,j}\,\|v_i - v_j\|^2, \qquad E_{harm}: \mathbb{R}^{2n} \to \mathbb{R}$$

General problem: find a global min (max). This lecture will concentrate on finding a local minimum.
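The harmonic energy is a straightforward sum over the edges of a mesh. As a concrete illustration, here is a minimal sketch that evaluates $E_{harm} = \frac{1}{2}\sum_{(i,j)\in E} k_{i,j}\|v_i - v_j\|^2$ for a tiny triangle graph; the vertex positions, edge list, and spring constants are entirely made up:

```python
# Hypothetical illustration: evaluating the harmonic (spring) energy
# E_harm = 1/2 * sum over edges (i,j) of k_ij * ||v_i - v_j||^2
# for a tiny made-up graph.

def harmonic_energy(positions, edges, k):
    """positions: list of (x, y); edges: list of (i, j); k: dict (i, j) -> weight."""
    total = 0.0
    for (i, j) in edges:
        dx = positions[i][0] - positions[j][0]
        dy = positions[i][1] - positions[j][1]
        total += k[(i, j)] * (dx * dx + dy * dy)
    return 0.5 * total

positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # a single triangle
edges = [(0, 1), (1, 2), (0, 2)]
k = {e: 1.0 for e in edges}                        # unit spring constants
print(harmonic_energy(positions, edges, k))        # 0.5 * (1 + 2 + 1) = 2.0
```

Minimizing this energy over the vertex positions is exactly the kind of $\min_x f(x)$ problem the lecture develops.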

Page 8:

Background

Motivation The gradient notion The Wolfe Theorems

Page 9:

$$f(x, y) := \cos\tfrac{x}{2}\,\cos\tfrac{y}{2}\,x$$

Page 10:

Directional Derivatives: first, the one-dimensional derivative:

Page 11:

Directional Derivatives: Along the Axes…

$$\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}$$

Page 12:

Directional Derivatives: In a general direction…

$$\frac{\partial f(x,y)}{\partial v}, \qquad v \in \mathbb{R}^2,\ \|v\| = 1$$

Page 13:

Directional Derivatives

$$\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}$$

Page 14:

The Gradient: Definition in the plane $\mathbb{R}^2$

$$\nabla f(x,y) := \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right), \qquad f: \mathbb{R}^2 \to \mathbb{R}$$

Page 15:

The Gradient: Definition

$$\nabla f(x_1,\dots,x_n) := \left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right), \qquad f: \mathbb{R}^n \to \mathbb{R}$$
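To make the definition concrete, here is a minimal sketch (not from the slides) that approximates $\nabla f$ by central finite differences; the sample function is made up:

```python
# A sketch: approximating the gradient of f: R^n -> R by central
# finite differences, component by component.

def grad(f, x, h=1e-6):
    """Central-difference approximation of (df/dx_1, ..., df/dx_n) at x."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

f = lambda x: x[0] ** 2 + 3 * x[1]   # true gradient is (2*x_0, 3)
print(grad(f, [1.0, 2.0]))           # approximately [2.0, 3.0]
```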

Page 16:

The Gradient Properties

The gradient defines a (hyper)plane approximating the function infinitesimally:

$$\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$$

Page 17:

The Gradient Properties

By the chain rule (important for later use): for $\|v\| = 1$,

$$\frac{\partial f}{\partial v}(p) = \langle \nabla f(p),\ v \rangle$$

Page 18:

The Gradient Properties

Proposition 1: $\frac{\partial f}{\partial v}(p)$ is maximal choosing $v = \frac{\nabla f(p)}{\|\nabla f(p)\|}$, and minimal choosing $v = -\frac{\nabla f(p)}{\|\nabla f(p)\|}$.

(Intuitive: the gradient points in the direction of greatest change.)
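Proposition 1 can be checked numerically. The sketch below (with a made-up function $f(x,y) = x^2 + 2y^2$) samples unit directions $v$ and confirms that the directional derivative $\langle \nabla f(p), v\rangle$ is smallest at $v = -\nabla f(p)/\|\nabla f(p)\|$:

```python
# Numerical check of Proposition 1 on an illustrative function.
import math

def f(x, y):
    return x * x + 2 * y * y

def grad_f(x, y):
    return (2 * x, 4 * y)

p = (1.0, 1.0)
gx, gy = grad_f(*p)
norm = math.hypot(gx, gy)

# Sample many unit directions and record the directional derivative.
best_v, best_d = None, float("inf")
for k in range(3600):
    t = 2 * math.pi * k / 3600
    v = (math.cos(t), math.sin(t))
    d = gx * v[0] + gy * v[1]        # <grad f(p), v>, by the chain rule
    if d < best_d:
        best_d, best_v = d, v

print(best_d)                        # close to -||grad f(p)|| = -sqrt(20)
print((-gx / norm, -gy / norm))      # the minimizing direction from Prop. 1
```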

Page 19:

The Gradient Properties

Proof (only for the minimum case): Assign $v = -\frac{\nabla f(p)}{\|\nabla f(p)\|}$. By the chain rule:

$$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f(p),\ -\frac{\nabla f(p)}{\|\nabla f(p)\|} \right\rangle = -\frac{\|\nabla f(p)\|^2}{\|\nabla f(p)\|} = -\|\nabla f(p)\|$$

Page 20:

The Gradient Properties

On the other hand, for a general unit vector $v$, by the Cauchy–Schwarz inequality:

$$\frac{\partial f}{\partial v}(p) = \langle \nabla f(p),\ v \rangle \ \ge\ -\|\nabla f(p)\|\,\|v\| = -\|\nabla f(p)\|$$

Page 21:

The Gradient Properties

Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ smooth function around $p$. If $f$ has a local minimum (maximum) at $p$, then

$$\nabla f(p) = 0$$

(Intuitive: a necessary condition for a local min (max).)

Page 22:

The Gradient Properties

Proof:

Intuitive:

Page 23:

The Gradient Properties

Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get:

$$\frac{d}{dt} f(p + tv)\Big|_{t=0} = \langle \nabla f(p),\ v \rangle = 0$$

Page 24:

The Gradient Properties

We found the best INFINITESIMAL DIRECTION at each point. Looking for a minimum is a "blind man" procedure: how can we derive the way to the minimum using this knowledge?

Page 25:

Background

Motivation The gradient notion The Wolfe Theorems

Page 26:

The Wolfe Theorem

This is the link from the previous gradient properties to the constructive algorithm.

The problem: $\min_x f(x)$

Page 27:

The Wolfe Theorem

We introduce a model algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set $i = 0$

Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in \mathbb{R}^n$

Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, $i = i + 1$, and go to Step 1
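The model algorithm can be sketched in code. Everything concrete below (the test function, the grid-search stand-in for $\arg\min_{\lambda \ge 0}$, the tolerances) is an illustrative assumption, not part of the slides:

```python
# A sketch of the model algorithm: the search-direction rule is passed
# in as a function, and the exact step-size rule is approximated by a
# crude grid search over step sizes.

def model_algorithm(f, grad, direction, x0, tol=1e-6, max_iter=100):
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:      # Step 1: grad ~ 0 -> stop
            break
        h = direction(x, g)                            # Step 1: search direction
        # Step 2: approximate argmin over a >= 0 on a fixed grid.
        a = min((s * 1e-3 for s in range(1, 2001)),
                key=lambda a: f([xi + a * hi for xi, hi in zip(x, h)]))
        x = [xi + a * hi for xi, hi in zip(x, h)]      # Step 3
    return x

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
grad = lambda x: [2 * (x[0] - 1.0), 2 * (x[1] + 2.0)]
steepest = lambda x, g: [-gi for gi in g]              # one possible rule
x_min = model_algorithm(f, grad, steepest, [0.0, 0.0])
print(x_min)                                           # near the minimizer (1, -2)
```

Passing a different `direction` rule yields a different member of the family; the steepest-descent rule shown is just one choice.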

Page 28:

The Wolfe Theorem

The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function $k: \mathbb{R}^n \to [0,1]$ such that

$$\forall x:\ k(x) = 0 \iff \nabla f(x) = 0$$

And the search vectors constructed by the model algorithm satisfy:

$$\langle h_i,\ -\nabla f(x_i) \rangle \ \ge\ k(x_i)\,\|\nabla f(x_i)\|\,\|h_i\|$$

Page 29:

The Wolfe Theorem

And: $\nabla f(x_i) \neq 0 \Rightarrow h_i \neq 0$.

Then if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the model algorithm, any accumulation point $y$ of this sequence satisfies:

$$\nabla f(y) = 0$$

Page 30:

The Wolfe Theorem

The theorem has a very intuitive interpretation: always go in a descent direction $h_i$, i.e. one making an acute angle with $-\nabla f(x_i)$.

Page 31:

Preview

Background Steepest Descent Conjugate Gradient

Page 32:

Steepest Descent

What does it mean? We now use what we have learned to implement the most basic minimization technique.

First we introduce the algorithm, which is a version of the model algorithm.

The problem: $\min_x f(x)$

Page 33:

Steepest Descent

Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set $i = 0$

Step 1: if $\nabla f(x_i) = 0$, stop; else, set the search direction $h_i = -\nabla f(x_i)$

Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, $i = i + 1$, and go to Step 1
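For a quadratic $f(x) = \frac{1}{2}\langle x, Hx\rangle$ the exact line search has the closed form $\lambda = \langle g, g\rangle / \langle g, Hg\rangle$ (a standard fact, not stated on the slides), which makes the SD iteration easy to sketch; the matrix and starting point below are made up. The loop also asserts a classical consequence of exact line search: consecutive search directions are orthogonal.

```python
# A sketch of steepest descent on the quadratic f(x) = 1/2 x^T H x,
# with the exact line-search step a = <g, g> / <g, H g>.

H = [[1.0, 0.0], [0.0, 9.0]]           # symmetric positive definite example

def Hv(v):
    return [H[0][0] * v[0] + H[0][1] * v[1],
            H[1][0] * v[0] + H[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

x = [9.0, 1.0]
prev_h = None
for i in range(100):
    g = Hv(x)                          # grad f(x) = H x
    h = [-g[0], -g[1]]                 # steepest-descent direction
    if prev_h is not None:             # exact line search => h_{i+1} ⟂ h_i
        assert abs(dot(h, prev_h)) < 1e-8 * dot(h, h) + 1e-12
    a = dot(g, g) / dot(g, Hv(g))      # exact minimizer along x + a*h
    x = [x[0] + a * h[0], x[1] + a * h[1]]
    prev_h = h

print(x)                               # close to the minimizer (0, 0)
```

The orthogonality of consecutive steps is what produces the characteristic zig-zag path on ill-conditioned problems.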

Page 34:

Steepest Descent

Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:

$$\nabla f(y) = 0$$

Proof: from the Wolfe theorem.

Page 35:

Steepest Descent

From the chain rule:

$$\frac{d}{d\lambda} f(x_i + \lambda h_i)\Big|_{\lambda = \lambda_i} = \langle \nabla f(x_i + \lambda_i h_i),\ h_i \rangle = 0$$

so consecutive search directions are orthogonal. Therefore the method of steepest descent looks like this:

Page 36:

Steepest Descent

Page 37:

Steepest Descent

The steepest descent method finds critical points, i.e. candidate local minima.

Implicit step-size rule: actually we reduced the problem to finding the minimum of a one-dimensional function $f: \mathbb{R} \to \mathbb{R}$.

There are extensions that give the step-size rule in a discrete sense (Armijo).
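Here is a minimal sketch of an Armijo-style backtracking rule (the slides only name it): halve a trial step until a sufficient-decrease condition holds. The test function and the constants `c`, `beta`, `a0` are illustrative assumptions:

```python
# Armijo backtracking sketch: shrink the step until
# f(x + a*h) <= f(x) + c*a*<grad f(x), h> holds.

def armijo_step(f, fx, gx, x, h, c=0.3, beta=0.5, a0=1.0):
    """Return a step a satisfying the sufficient-decrease condition."""
    slope = sum(gi * hi for gi, hi in zip(gx, h))    # directional derivative
    a = a0
    while f([xi + a * hi for xi, hi in zip(x, h)]) > fx + c * a * slope:
        a *= beta                                    # backtrack
    return a

f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 20.0 * x[1]]
x = [1.0, 1.0]
for _ in range(400):                                 # steepest descent + Armijo
    g = grad(x)
    h = [-gi for gi in g]
    a = armijo_step(f, f(x), g, x, h)
    x = [xi + a * hi for xi, hi in zip(x, h)]
print(x, f(x))                                       # near the minimizer (0, 0)
```

Unlike the implicit rule, this needs only function evaluations at trial steps, no 1-D minimization.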

Page 38:

Preview

Background Steepest Descent Conjugate Gradient

Page 39:

Conjugate Gradient

Modern optimization methods: "conjugate direction" methods.

A method to solve quadratic function minimization:

$$\min_{x \in \mathbb{R}^n}\ \tfrac{1}{2}\langle x, Hx \rangle + \langle d, x \rangle$$

(H is symmetric and positive definite)

Page 40:

Conjugate Gradient

Originally aimed to solve linear problems:

$$\min_{x \in \mathbb{R}^n}\ \|Ax - b\|^2$$

Later extended to general functions, under the rationale that the quadratic approximation to a function is quite accurate.

Page 41:

Conjugate Gradient

The basic idea: decompose the n-dimensional quadratic problem into n problems of 1 dimension. This is done by exploring the function in "conjugate directions".

Definition (H-conjugate vectors):

$$\{u_i\}_{i=1}^{n} \subset \mathbb{R}^n, \qquad \langle u_i,\ Hu_j \rangle = 0 \quad \forall i \neq j$$
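A quick numerical illustration (with a made-up $H$): a Gram–Schmidt-style step in the inner product $\langle u, Hv\rangle$ turns the coordinate axes into an H-conjugate pair:

```python
# Building an H-conjugate pair in R^2 by H-orthogonalization.

H = [[3.0, 1.0], [1.0, 2.0]]          # arbitrary SPD example

def Hv(v):
    return [H[0][0] * v[0] + H[0][1] * v[1],
            H[1][0] * v[0] + H[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

u1 = [1.0, 0.0]
v = [0.0, 1.0]
# Remove from v its H-component along u1, leaving <u1, H u2> = 0.
c = dot(v, Hv(u1)) / dot(u1, Hv(u1))
u2 = [v[0] - c * u1[0], v[1] - c * u1[1]]

print(dot(u1, Hv(u2)))   # 0 up to rounding: u1, u2 are H-conjugate
```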

Page 42:

Conjugate Gradient

If there is an H-conjugate basis, then writing

$$f(x) := \tfrac{1}{2}\langle x, Hx \rangle + \langle d, x \rangle, \qquad x = x_0 + \sum_j \alpha_j h_j,$$

conjugacy kills the cross terms and

$$f(x) = f(x_0) + \sum_j \left( \tfrac{1}{2}\alpha_j^2 \langle h_j, Hh_j \rangle + \alpha_j \langle d + Hx_0,\ h_j \rangle \right)$$

n problems in 1 dimension (simple smiling quadratics). The global minimizer is calculated sequentially, starting from $x_0$:

$$x_{i+1} = x_i + \hat{\alpha}_i h_i \qquad (i = 0, 1, \dots, n-1)$$
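The sequential minimization can be sketched end to end on a small made-up quadratic. This illustrates the conjugate-direction idea only, not the full conjugate-gradient algorithm (which generates the $h_i$ from gradients on the fly). With an H-conjugate basis, one exact 1-D minimization per direction reaches the global minimizer, which satisfies $Hx = -d$:

```python
# Conjugate-direction minimization of f(x) = 1/2 <x, Hx> + <d, x>
# on an illustrative 2-D problem.

H = [[3.0, 1.0], [1.0, 2.0]]
d = [-1.0, -4.0]

def Hv(v):
    return [H[0][0] * v[0] + H[0][1] * v[1],
            H[1][0] * v[0] + H[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# An H-conjugate pair: u2 is e2 H-orthogonalized against e1.
u1 = [1.0, 0.0]
c = dot([0.0, 1.0], Hv(u1)) / dot(u1, Hv(u1))
u2 = [-c, 1.0]

x = [0.0, 0.0]
for h in (u1, u2):
    hx = Hv(x)
    g = [hx[0] + d[0], hx[1] + d[1]]   # grad f(x) = Hx + d
    a = -dot(g, h) / dot(h, Hv(h))     # exact 1-D minimizer along h
    x = [x[0] + a * h[0], x[1] + a * h[1]]

print(x)            # solves Hx = -d, i.e. x = (-0.4, 2.2)
```

After exactly n = 2 one-dimensional minimizations the global minimizer is reached, as the decomposition on the slide promises.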