Lecture 15: Summary of Single-Variable Methods
TRANSCRIPT
-
Optimization Techniques
Topic: All about One-Dimensional Unconstrained Optimization
Dr. Nasir M Mirza
Email: [email protected]
-
Classification of Optimization Problems

• If f(x) and the constraints are linear, we have linear programming.
  e.g.: Maximize x + y subject to
  3x + 4y ≤ 2
  y ≤ 5
• If f(x) is quadratic and the constraints are linear, we have quadratic programming.
• If f(x) is not linear or quadratic and/or the constraints are nonlinear, we have nonlinear programming.
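As an aside, the toy LP above can be solved mechanically. Here is a minimal Python sketch using scipy.optimize.linprog; note that linprog minimizes (so we negate the objective) and applies x, y ≥ 0 bounds by default. That nonnegativity assumption is added here, not stated on the slide; without it this particular LP is unbounded.

from scipy.optimize import linprog

# Maximize x + y  <=>  minimize -(x + y),
# subject to 3x + 4y <= 2 and y <= 5 (plus linprog's default x, y >= 0).
res = linprog(c=[-1, -1], A_ub=[[3, 4], [0, 1]], b_ub=[2, 5])
print(res.x)   # optimal point, here (2/3, 0) with objective value 2/3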
-
Classification of Optimization Problems

When constraints (equations marked with * below) are included, we have a constrained optimization problem.
Otherwise, we have an unconstrained optimization problem.
-
Optimization Methods

One-Dimensional Unconstrained Optimization
  Golden-Section Search
  Quadratic Interpolation
  Newton's Method
Multi-Dimensional Unconstrained Optimization
  Non-gradient or direct methods
  Gradient methods
Linear Programming (Constrained)
  Graphical Solution
  Simplex Method
-
Global and Local Optima

A function is said to be multimodal on a given interval if there is more than one minimum/maximum point in the interval.
-
Characteristics of Optima

To find the optima, we can find the zeroes of f'(x).
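For example, if f(x) = x^2 - 4x, then f'(x) = 2x - 4 = 0 gives x = 2, and since f''(x) = 2 > 0 that point is a minimum.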
-
Mathematical Background

Objective: Maximize or Minimize f(x)

subject to the constraints (*):
  di(x) ≤ ai,  i = 1, 2, ..., m
  ei(x) = bi,  i = 1, 2, ..., p

where
  x = {x1, x2, ..., xn}
  f(x): objective function
  di(x): inequality constraints
  ei(x): equality constraints
  ai and bi are constants

Note that Maximize f(x) is equivalent to Minimize -f(x).
-
Line search methods

Line search techniques are in essence optimization algorithms for one-dimensional minimization problems.
They are often regarded as the backbone of nonlinear optimization algorithms.
Typically, these techniques search a bracketed interval. Often, unimodality is assumed.
Exhaustive search requires N = (b-a)/ε + 1 calculations to search the interval [a, b], where ε is the resolution.

[Figure: the interval [a, b] with the minimizer x* inside]
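To make that cost concrete, here is a minimal Python sketch of exhaustive search; the test function, interval, and resolution are illustrative, not from the slides.

# Exhaustive search: evaluate f at evenly spaced points and keep the best.
# Uses N = (b - a)/eps + 1 evaluations for resolution eps (minimization).
def exhaustive_search(f, a, b, eps):
    best_x, best_f = a, f(a)
    n = int((b - a) / eps) + 1
    for i in range(1, n):
        x = a + i * eps
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Example: minimize f(x) = (x - 1)^2 on [0, 3] with resolution 0.001
xmin, fmin = exhaustive_search(lambda x: (x - 1)**2, 0.0, 3.0, 1e-3)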
-
Bracketing Method

[Figure: a unimodal f(x) on the interval [xl, xu]]

Suppose f(x) is unimodal on the interval [xl, xu]. That is, there is only one local maximum in [xl, xu].
Objective: gradually narrow down the interval by eliminating the sub-interval that does not contain the maximum.
-
Bracketing Method

[Figure: points xl, xa, xb, xu marked on the x-axis]

Let xa and xb be two points in (xl, xu) with xa < xb. If f(xa) > f(xb), then the maximum point will not reside in the interval [xb, xu], and as a result we can eliminate the portion toward the right of xb.
In other words, in the next iteration we can make xb the new xu.
-
Basic bracketing algorithm

Two-point search (dichotomous search) for finding the solution to minimizing f(x):

0) Assume an interval [a, b].
1) Find x1 = a + (b-a)/2 - ε/2 and x2 = a + (b-a)/2 + ε/2, where ε is the resolution.
2) Compare f(x1) and f(x2).
3) If f(x1) < f(x2), then eliminate x > x2 and set b = x2.
   If f(x1) > f(x2), then eliminate x < x1 and set a = x1.
   If f(x1) = f(x2), then pick another pair of points.
4) Continue placing point pairs until the interval < 2ε.

[Figure: points x1, x2 inside the interval [a, b]]
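A minimal Python sketch of this dichotomous search, assuming minimization; for simplicity it folds the tie case f(x1) = f(x2) into the else branch (safe for a unimodal function) rather than picking a new pair, and the tolerance values are illustrative.

# Dichotomous (two-point) search for minimizing f on [a, b].
def dichotomous_search(f, a, b, eps, tol):
    while (b - a) > 2 * tol:
        mid = a + (b - a) / 2
        x1 = mid - eps / 2
        x2 = mid + eps / 2
        if f(x1) < f(x2):
            b = x2              # minimum cannot lie to the right of x2
        else:
            a = x1              # minimum cannot lie to the left of x1
    return (a + b) / 2

# Example: minimize f(x) = (x - 2)^2 on [0, 5]
x_star = dichotomous_search(lambda x: (x - 2)**2, 0.0, 5.0, 1e-4, 1e-3)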
-
Generic Bracketing Method (Pseudocode)

// xl, xu: Lower and upper bounds of the interval
// es: Acceptable relative error
function BracketingMax(xl, xu, es) {
  do { prev_optimal = optimal
    Select xa and xb s.t. xl < xa < xb < xu
    if f(xa) > f(xb) then xu = xb else xl = xa   // keep the half containing the maximum
    optimal = max(f(xa), f(xb))
  } while (relative change from prev_optimal > es)
}
-
Bracketing Method

How would you suggest we select xa and xb (with the objective to minimize computation)?

• Eliminate as much of the interval as possible in each iteration.
  • Set xa and xb close to the center so that we can halve the interval in each iteration.
  • Drawbacks: function evaluation is usually a costly operation.
• Minimize the number of function evaluations.
  • Select xa and xb such that one of them can be reused in the next iteration (so that we only need to evaluate f(x) once in each iteration).
  • How should we select such points?
-
[Figure: current iteration with points xl, xa, xb, xu and segment lengths l0, l1; next iteration with points x'l, x'a, x'b, x'u and lengths l'0, l'1]

If we can calculate xa and xb based on the ratio R w.r.t. the current interval length in each iteration, then we can reuse one of xa and xb in the next iteration.

Objective:
  l1/l0 = l'1/l'0 = R,  with xa = x'b or xb = x'a

In this example, xa is reused as x'b in the next iteration, so in the next iteration we only need to evaluate f(x'a).
-
Golden Ratio

[Figure: current iteration with points xl, xa, xb, xu and lengths l0, l1; next iteration with points x'l, x'a, x'b, x'u and lengths l'0, l'1]

Since l0 = l1 + l'1 and l'0 = l1:

  R = l1/l0 = l'1/l'0
  => l'1 = R l'0 = R l1 = R (R l0) = R^2 l0
  => l0 = R l0 + R^2 l0
  => R^2 + R - 1 = 0
  => R = (-1 + sqrt(1 + 4)) / 2 = (sqrt(5) - 1) / 2 ≈ 0.61803
-
Golden-Section Search

• Starts with two initial guesses, xl and xu.
• Two interior points xa and xb are calculated based on the golden ratio as

  xa = xu - d  and  xb = xl + d,  where d = ((sqrt(5) - 1)/2)(xu - xl)

• In the first iteration, both xa and xb need to be calculated.
• In subsequent iterations, xl and xu are updated accordingly and only one of the two interior points needs to be calculated. (The other one is inherited from the previous iteration.)
-
Golden-Section Search

• In each iteration the interval is reduced to about 61.8% (the golden ratio) of its previous length.
• After 10 iterations, the interval is shrunk to about (0.618)^10, or 0.8%, of its initial length.
• After 20 iterations, the interval is shrunk to about (0.618)^20, or 0.0066%.
-
Bracketing a Minimum using Golden Section

[Figure: points x1, x2 inside the interval [a, b]]

Initialize:
  x1 = a + (b-a)*0.382
  x2 = a + (b-a)*0.618
  f1 = f(x1)
  f2 = f(x2)
Loop:
  if f1 > f2 then
    a = x1; x1 = x2; f1 = f2
    x2 = a + (b-a)*0.618
    f2 = f(x2)
  else
    b = x2; x2 = x1; f2 = f1
    x1 = a + (b-a)*0.382
    f1 = f(x1)
  endif
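The pseudocode above translates almost line for line into Python. This is a sketch under the slide's assumptions (unimodal f, minimization); the tolerance and test function are illustrative.

import math

# Golden-section search for a minimum of a unimodal f on [a, b].
def golden_section(f, a, b, tol=1e-6):
    r = (math.sqrt(5) - 1) / 2            # golden ratio, ~0.618
    x1 = a + (b - a) * (1 - r)            # a + 0.382*(b - a)
    x2 = a + (b - a) * r                  # a + 0.618*(b - a)
    f1, f2 = f(x1), f(x2)
    while (b - a) > tol:
        if f1 > f2:                        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2         # reuse x2 as the new x1
            x2 = a + (b - a) * r
            f2 = f(x2)
        else:                              # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1         # reuse x1 as the new x2
            x1 = a + (b - a) * (1 - r)
            f1 = f(x1)
    return (a + b) / 2

# Example: minimize f(x) = x^2 - 4x on [0, 5]; the minimizer is x = 2
x_star = golden_section(lambda x: x*x - 4*x, 0.0, 5.0)

Note how each pass through the loop evaluates f only once, which is the whole point of the golden-ratio placement.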
-
Fibonacci Search

[Figure: interval [a, b] with interior points x1, x2; lengths L1, L2, L3 with L1 = L2 + L3, and resolution ε]

Fibonacci numbers are 1, 1, 2, 3, 5, 8, 13, 21, 34, ...; that is, each number is the sum of the previous two:
  Fn = Fn-1 + Fn-2

It can be derived that
  Ln = (L1 + Fn-2 ε) / Fn
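A sketch of Fibonacci search in the same style, assuming a unimodal f to be minimized and a number of iterations n fixed in advance; the loop bounds and the test call are my choices for illustration, not from the slides.

# Fibonacci search for a minimum of a unimodal f on [a, b]; assumes n >= 4.
# Interior points sit at Fibonacci fractions of the interval, so one
# point is reused per iteration, as in golden-section search.
def fibonacci_search(f, a, b, n):
    F = [0, 1, 1]                          # F[i] is the i-th Fibonacci number
    while len(F) <= n:
        F.append(F[-1] + F[-2])
    x1 = a + (b - a) * F[n - 2] / F[n]
    x2 = a + (b - a) * F[n - 1] / F[n]
    f1, f2 = f(x1), f(x2)
    for m in range(n, 3, -1):
        if f1 > f2:                        # minimum in [x1, b]
            a = x1
            x1, f1 = x2, f2                # reuse x2 as the new x1
            x2 = a + (b - a) * F[m - 2] / F[m - 1]
            f2 = f(x2)
        else:                              # minimum in [a, x2]
            b = x2
            x2, f2 = x1, f1                # reuse x1 as the new x2
            x1 = a + (b - a) * F[m - 3] / F[m - 1]
            f1 = f(x1)
    return (a + b) / 2

# Example: minimize f(x) = (x - 2)^2 on [0, 5] with n = 15 iterations
x_star = fibonacci_search(lambda x: (x - 2)**2, 0.0, 5.0, 15)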
-
Quadratic Interpolation

[Figure: f(x) with sample points x0, x1, x3, x2 on the x-axis; the optimum of f(x) and the optimum of g(x) nearly coincide]

Idea:
(i) Approximate f(x) using a quadratic function g(x) = ax^2 + bx + c.
(ii) Optimum of f(x) ≈ optimum of g(x).
-
• The shape near an optimum typically appears like a parabola. We can approximate the original function f(x) using a quadratic function:
  g(x) = ax^2 + bx + c.
• At the optimum point of g(x), g'(x) = 2ax + b = 0. Let x3 be the optimum point; then x3 = -b/2a.
• How to compute b and a?
  • 2 points => unique straight line (1st-order polynomial)
  • 3 points => unique parabola (2nd-order polynomial)
• So, we need to pick three points that surround the optimum. Let these points be x0, x1, x2 such that x0 < x1 < x2.
-
Quadratic Interpolation

• a and b can be obtained by solving the system of linear equations:

  a x0^2 + b x0 + c = f(x0)
  a x1^2 + b x1 + c = f(x1)
  a x2^2 + b x2 + c = f(x2)

• Substituting a and b into x3 = -b/2a yields

  x3 = [ f(x0)(x1^2 - x2^2) + f(x1)(x2^2 - x0^2) + f(x2)(x0^2 - x1^2) ] /
       [ 2 f(x0)(x1 - x2) + 2 f(x1)(x2 - x0) + 2 f(x2)(x0 - x1) ]
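A direct transcription of this formula into Python; the test function and starting points are illustrative only.

# One quadratic-interpolation step: vertex of the parabola through
# (x0, f(x0)), (x1, f(x1)), (x2, f(x2)).
def quad_interp_step(f, x0, x1, x2):
    f0, f1, f2 = f(x0), f(x1), f(x2)
    num = f0*(x1**2 - x2**2) + f1*(x2**2 - x0**2) + f2*(x0**2 - x1**2)
    den = 2*f0*(x1 - x2) + 2*f1*(x2 - x0) + 2*f2*(x0 - x1)
    return num / den

# Example: f(x) = (x - 2)^2 with x0 = 0, x1 = 1, x2 = 4; since f is itself
# a parabola, a single step lands exactly on the optimum x3 = 2.
x3 = quad_interp_step(lambda x: (x - 2)**2, 0.0, 1.0, 4.0)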
-
Quadratic Interpolation

• The process can be repeated to improve the approximation.
• Next step: decide which sub-interval to discard. Since f(x3) > f(x1):
  if x3 > x1, discard the interval toward the left of x1, i.e., set x0 = x1 and x1 = x3;
  if x3 < x1, discard the interval toward the right of x1, i.e., set x2 = x1 and x1 = x3.
• Calculate x3 based on the new x0, x1, x2.
-
Gradient method: Newton's Method

If your function is differentiable, then you do not need to evaluate two points to determine the region to be discarded: get the slope, and the sign indicates which region to discard.

Basic premise of the Newton-Raphson method: root finding on the first derivative is equivalent to finding the optimum (if the function is differentiable).

The method is sometimes referred to as a line search by curve fit because it approximates the real (unknown) objective function to be minimized.
-
Newton's Method

Let g(x) = f'(x).
Thus the zeroes of g(x) are the optima of f(x).
Substituting g(x) into the updating formula of the Newton-Raphson method, we have

  x_{i+1} = x_i - g(x_i)/g'(x_i) = x_i - f'(x_i)/f''(x_i)
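A minimal sketch of this update in Python, assuming f'(x) and f''(x) are supplied analytically; the tolerance, iteration cap, and test function are illustrative.

# Newton's method for optimization: iterate x <- x - f'(x)/f''(x).
def newton_opt(df, d2f, x0, tol=1e-8, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x^2 - 4x, f'(x) = 2x - 4, f''(x) = 2; optimum at x = 2
x_star = newton_opt(lambda x: 2*x - 4, lambda x: 2.0, x0=10.0)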
-
Newton's Method

• Shortcomings
  • Need to derive f'(x) and f''(x).
  • May diverge.
  • May "jump" to another solution far away.
• Advantages
  • Fast convergence rate near the solution.
• Hybrid approach: use a bracketing method to find an approximation near the solution, then switch to Newton's method.
-
False Position Method or Secant Method

Second-order information is expensive to calculate (for multi-variable problems). Thus, try to approximate the second-order derivative.

Replace y''(xk) in Newton-Raphson with

  y''(x_k) ≈ (y'(x_k) - y'(x_{k-1})) / (x_k - x_{k-1})

Hence, Newton-Raphson becomes

  x_{k+1} = x_k - y'(x_k) (x_k - x_{k-1}) / (y'(x_k) - y'(x_{k-1}))

Question: Why is this an advantage? The main advantage is that no second derivative is required.
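A sketch of the resulting secant update in Python; it needs only first derivatives and two starting points, and all numeric values in the example are illustrative.

# Secant method for optimization: the second derivative in Newton's
# update is replaced by a finite difference of first derivatives.
def secant_opt(dy, x_prev, x_curr, tol=1e-8, max_iter=100):
    g_prev = dy(x_prev)
    for _ in range(max_iter):
        g_curr = dy(x_curr)
        if g_curr == g_prev:               # flat difference; cannot proceed
            break
        x_next = x_curr - g_curr * (x_curr - x_prev) / (g_curr - g_prev)
        if abs(x_next - x_curr) < tol:
            return x_next
        x_prev, g_prev = x_curr, g_curr
        x_curr = x_next
    return x_curr

# Example: f(x) = x^2 - 4x, f'(x) = 2x - 4; optimum at x = 2
x_star = secant_opt(lambda x: 2*x - 4, 0.0, 1.0)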