8 Performance Surfaces and Optimum Points
Optimization process consists of
• Defining performance index
• Searching for optimal parameters in the index
8.1 Taylor Series
$F(x) = F(x^*) + \left.\frac{dF(x)}{dx}\right|_{x=x^*}(x-x^*) + \frac{1}{2}\left.\frac{d^2F(x)}{dx^2}\right|_{x=x^*}(x-x^*)^2 + \cdots + \frac{1}{n!}\left.\frac{d^nF(x)}{dx^n}\right|_{x=x^*}(x-x^*)^n + \cdots$ (8.1-1)
8.1.1 Vector Case
$F(\mathbf{x}) = F(x_1, x_2, \ldots, x_n)$ (8.1.1-1)
$F(\mathbf{x}) = F(\mathbf{x}^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1-x_1^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_2-x_2^*) + \cdots + \left.\frac{\partial F(\mathbf{x})}{\partial x_n}\right|_{\mathbf{x}=\mathbf{x}^*}(x_n-x_n^*) + \frac{1}{2}\left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1^2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1-x_1^*)^2 + \frac{1}{2}\left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1\,\partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1-x_1^*)(x_2-x_2^*) + \cdots$ (8.1.1-2)
In matrix form,
$F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x}-\mathbf{x}^*) + \frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x}-\mathbf{x}^*) + \cdots$ (8.1.1-3)
∇F(x): Gradient,
$\nabla F(\mathbf{x}) = \left[\frac{\partial F(\mathbf{x})}{\partial x_1}\ \ \frac{\partial F(\mathbf{x})}{\partial x_2}\ \ \cdots\ \ \frac{\partial F(\mathbf{x})}{\partial x_n}\right]^T$ (8.1.1-4)
∇²F(x): Hessian,
$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial^2 F(\mathbf{x})}{\partial x_1^2} & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_1\,\partial x_n} \\ \dfrac{\partial^2 F(\mathbf{x})}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_2^2} & \cdots & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_2\,\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 F(\mathbf{x})}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 F(\mathbf{x})}{\partial x_n^2} \end{bmatrix}$ (8.1.1-5)
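As a quick numerical sanity check of the gradient (8.1.1-4) and Hessian (8.1.1-5), both can be approximated by central finite differences. This is a minimal sketch assuming NumPy; the test function F and helper names are illustrative, not from the lecture.

import numpy as np

def F(x):
    # illustrative test function F(x) = x1^2 + 2*x2^2 (reused in Section 8.2)
    return x[0]**2 + 2.0 * x[1]**2

def num_gradient(F, x, eps=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (F(x + e) - F(x - e)) / (2 * eps)   # central difference
    return g

def num_hessian(F, x, eps=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x); e[i] = eps
        H[:, i] = (num_gradient(F, x + e) - num_gradient(F, x - e)) / (2 * eps)
    return H

x = np.array([0.5, 0.5])
print(num_gradient(F, x))   # ~ [1, 2], matching (8.2-5) below
print(num_hessian(F, x))    # ~ [[2, 0], [0, 4]]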
8.2 Directional Derivatives
Directional derivative along p: (+ : increasing, - : decreasing, 0 : constant)
$\dfrac{\mathbf{p}^T\,\nabla F(\mathbf{x})}{\|\mathbf{p}\|}$ (8.2-1)
Second derivative along p: (+ : convex, - : concave, 0 : straight)
$\dfrac{\mathbf{p}^T\,\nabla^2 F(\mathbf{x})\,\mathbf{p}}{\|\mathbf{p}\|^2}$ (8.2-2)
Example: $F(\mathbf{x}) = x_1^2 + 2x_2^2$ (8.2-3)
$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 \\ 4x_2 \end{bmatrix}$ (8.2-4)
At $\mathbf{x} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$, $\nabla F(\mathbf{x}) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ (8.2-5)
For $\mathbf{p} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, parallel to the gradient,
Directional derivative $\dfrac{\mathbf{p}^T\,\nabla F(\mathbf{x})}{\|\mathbf{p}\|} = \dfrac{[1\ \ 2]\begin{bmatrix} 1 \\ 2 \end{bmatrix}}{\sqrt{1^2 + 2^2}} = \dfrac{5}{\sqrt{5}} = \sqrt{5}$ (8.2-6)
For $\mathbf{p} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$, orthogonal to the gradient,
Directional derivative $\dfrac{\mathbf{p}^T\,\nabla F(\mathbf{x})}{\|\mathbf{p}\|} = \dfrac{[2\ \ {-1}]\begin{bmatrix} 1 \\ 2 \end{bmatrix}}{\sqrt{2^2 + (-1)^2}} = \dfrac{0}{\sqrt{5}} = 0$ (8.2-7)
Figure 8.2-1 Quadratic Function and Directional Derivatives
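The two directional derivatives above, and the curvature along each direction, are easy to reproduce numerically. A minimal sketch assuming NumPy; the gradient is hard-coded from (8.2-4).

import numpy as np

A = np.array([[2.0, 0.0], [0.0, 4.0]])     # Hessian of F(x) = x1^2 + 2 x2^2
grad = lambda x: A @ x                      # gradient (8.2-4)

x = np.array([0.5, 0.5])
for p in (np.array([1.0, 2.0]), np.array([2.0, -1.0])):
    slope = grad(x) @ p / np.linalg.norm(p)      # (8.2-1)
    curvature = p @ A @ p / (p @ p)              # (8.2-2)
    print(p, slope, curvature)
# p = [1, 2]  -> slope sqrt(5) (steepest ascent), curvature 18/5
# p = [2, -1] -> slope 0 (along a contour), curvature 12/5 (still convex)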
8.3 Minima
Definition: The point x* is a strong minimum of F(x) if a scalar δ > 0 exists, such that F(x*) < F(x*+Δx) for all Δx such
that δ > ||Δx|| > 0.
Definition: The point x* is a unique global minimum of F(x) if F(x*) < F(x*+Δx) for all Δx≠0.
Definition: The point x* is a weak minimum of F(x) if it is not a strong minimum, and a scalar δ > 0 exists, such that
F(x*) ≤ F(x*+Δx) for all Δx such that δ > ||Δx|| > 0.
Examples:
$F(x) = 3x^4 - 7x^2 - \frac{1}{2}x + 6$ (8.3-1)
Figure 8.3-1 Scalar Example of Local and Global Minima
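The stationary points of (8.3-1) can be located by finding the roots of $F'(x) = 12x^3 - 14x - \frac{1}{2}$ and classifying each with $F''(x)$. A sketch assuming NumPy; the printed values are approximate.

import numpy as np

roots = np.roots([12.0, 0.0, -14.0, -0.5])    # coefficients of F'(x)
F = lambda x: 3*x**4 - 7*x**2 - 0.5*x + 6
Fpp = lambda x: 36*x**2 - 14                   # F''(x) classifies each point
for r in np.sort(roots.real):
    kind = "minimum" if Fpp(r) > 0 else "maximum"
    print(f"x = {r:+.3f}  F = {F(r):.3f}  ({kind})")
# Two minima (a local one near x = -1.06 and the global one near x = +1.10)
# separated by a local maximum near x = 0.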
$F(\mathbf{x}) = (x_2 - x_1)^4 + 8x_1x_2 - x_1 + x_2 + 3$ (8.3-2)
Figure 8.3-2 Vector Example of Minima and Saddle Point
$F(\mathbf{x}) = (x_1^2 - 1.5x_1x_2 + 2x_2^2)\,x_1^2$ (8.3-3)
Figure 8.3-3 Weak Minimum Example
8.4 Necessary Conditions for Optimality
8.4.1 First-Order Conditions
Approximation by Taylor’s series
$F(\mathbf{x}) = F(\mathbf{x}^* + \Delta\mathbf{x}) \approx F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x}$ (8.4.1-1)
If x* is a minimum point, the second term must be nonnegative,
$\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x} \geq 0$ (8.4.1-2)
Suppose the second term were positive for some Δx,
$\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x} > 0$ (8.4.1-3)
Then a step in the opposite direction would decrease the function,
$F(\mathbf{x}^* - \Delta\mathbf{x}) \approx F(\mathbf{x}^*) - \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x} < F(\mathbf{x}^*)$ (8.4.1-4)
which contradicts x* being a minimum. Hence the second term must be zero for every Δx,
$\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x} = 0$ (8.4.1-5)
$\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*} = \mathbf{0}$ (8.4.1-6)
• The gradient must be zero at a minimum point.
• This is a first-order, necessary (but not sufficient) condition for x* to be a local minimum point.
• Any points that have zero gradient are called stationary points.
8.4.2 Second-Order Conditions
Approximation by Taylor’s series
$F(\mathbf{x}^* + \Delta\mathbf{x}) = F(\mathbf{x}^*) + \frac{1}{2}\Delta\mathbf{x}^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x} + \cdots$ (8.4.2-1)
(the first-order term vanishes because the gradient is zero at x*). If x* is a minimum point, the second term must be nonnegative,
$\Delta\mathbf{x}^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\,\Delta\mathbf{x} \geq 0$ (8.4.2-2)
A matrix A is positive definite (all eigenvalues are positive) if, for any vector z ≠ 0,
$\mathbf{z}^T\mathbf{A}\mathbf{z} > 0$ (8.4.2-3)
A matrix A is positive semidefinite (all eigenvalues are nonnegative) if, for any vector z,
$\mathbf{z}^T\mathbf{A}\mathbf{z} \geq 0$ (8.4.2-4)
• A positive definite Hessian matrix is a second-order, sufficient condition for a strong minimum to exist.
• It is not a necessary condition. A minimum can still be strong if the second-order term of the Taylor series is zero, but
the third-order term is positive.
• The second-order, necessary condition for a strong minimum is that the Hessian matrix be positive semidefinite.
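Since the Hessian is symmetric, its definiteness can be tested directly from its eigenvalues, exactly as (8.4.2-3) and (8.4.2-4) suggest. A sketch assuming NumPy; the tolerance and classifier name are illustrative.

import numpy as np

def classify(H, tol=1e-10):
    lam = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
    if np.all(lam > tol):   return "positive definite: strong minimum"
    if np.all(lam >= -tol): return "positive semidefinite: weak or strong minimum"
    if np.all(lam < -tol):  return "negative definite: strong maximum"
    return "indefinite: saddle point"

print(classify(np.array([[2.0, 0.0], [0.0, 4.0]])))       # minimum
print(classify(np.array([[-0.5, -1.5], [-1.5, -0.5]])))   # the saddle of Section 8.5.1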
8.5 Quadratic Functions
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$ (8.5-1)
where the matrix A is symmetric.
$\nabla(\mathbf{h}^T\mathbf{x}) = \nabla(\mathbf{x}^T\mathbf{h}) = \mathbf{h}$ (8.5-2)
where h is a constant vector,
$\nabla\,\mathbf{x}^T\mathbf{Q}\mathbf{x} = \mathbf{Q}\mathbf{x} + \mathbf{Q}^T\mathbf{x} = 2\mathbf{Q}\mathbf{x}$ (for symmetric Q) (8.5-3)
$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$ (8.5-4)
$\nabla^2 F(\mathbf{x}) = \mathbf{A}$ (8.5-5)
• All higher derivatives of the quadratic function are zero.
• The first three terms of the Taylor series expansion give an exact representation of the function.
• All analytic functions behave like quadratics over a small neighborhood.
8.5.1 Eigensystem of the Hessian
Consider a quadratic function that has a stationary point at the origin, and whose value there is zero:
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}$ (8.5.1-1)
The basis vectors are changed to the basis vectors of eigenvectors of the Hessian matrix, A.
• Since A is symmetric, its normalized eigenvectors are mutually orthogonal.
$\mathbf{B} = [\mathbf{z}_1\ \ \mathbf{z}_2\ \ \cdots\ \ \mathbf{z}_n]$, $\mathbf{B}^{-1} = \mathbf{B}^T$ (8.5.1-2)
$\mathbf{A}' = \mathbf{B}^T\mathbf{A}\mathbf{B} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \mathbf{\Lambda}$ (8.5.1-3)
$\mathbf{A} = \mathbf{B}\mathbf{\Lambda}\mathbf{B}^T$ (8.5.1-4)
The second derivative of a function F(x) in the direction of a vector p,
$\dfrac{\mathbf{p}^T\,\nabla^2 F(\mathbf{x})\,\mathbf{p}}{\|\mathbf{p}\|^2} = \dfrac{\mathbf{p}^T\mathbf{A}\mathbf{p}}{\|\mathbf{p}\|^2}$ (8.5.1-5)
$\mathbf{p} = \mathbf{B}\mathbf{c}$ (8.5.1-6)
where c is the representation of the vector p with respect to the eigenvectors of A.
$\dfrac{\mathbf{p}^T\mathbf{A}\mathbf{p}}{\|\mathbf{p}\|^2} = \dfrac{\mathbf{c}^T\mathbf{B}^T(\mathbf{B}\mathbf{\Lambda}\mathbf{B}^T)\mathbf{B}\mathbf{c}}{\mathbf{c}^T\mathbf{B}^T\mathbf{B}\mathbf{c}} = \dfrac{\mathbf{c}^T\mathbf{\Lambda}\mathbf{c}}{\mathbf{c}^T\mathbf{c}} = \dfrac{\sum_{i=1}^{n}\lambda_i c_i^2}{\sum_{i=1}^{n} c_i^2}$ (8.5.1-7)
• The second derivative is a weighted average of the eigenvalues.
• The second derivative can never be larger than the largest eigenvalue, or smaller than the smallest eigenvalue.
$\lambda_{\min} \leq \dfrac{\mathbf{p}^T\mathbf{A}\mathbf{p}}{\|\mathbf{p}\|^2} \leq \lambda_{\max}$ (8.5.1-8)
• The maximum second derivative occurs in the direction of the eigenvector that corresponds to the largest eigenvalue.
• In each of the eigenvector directions the second derivatives will be equal to the corresponding eigenvalue.
• In other directions, the second derivative will be a weighted average of the eigenvalues.
• The eigenvectors define a new coordinate system in which the quadratic cross terms vanish.
• The eigenvectors are known as the principal axes of the function contours.
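The weighted-average identity (8.5.1-7) and the bound (8.5.1-8) can be verified directly on the elliptic example that follows. A sketch assuming NumPy; the direction p is arbitrary.

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # Hessian of the elliptic hollow below
lam, B = np.linalg.eigh(A)                 # columns of B are eigenvectors z_i

p = np.array([1.0, 0.0])                   # any test direction
c = B.T @ p                                # p = B c, so c = B^T p
second_deriv = (p @ A @ p) / (p @ p)
weighted_avg = np.sum(lam * c**2) / np.sum(c**2)   # (8.5.1-7)
print(second_deriv, weighted_avg)                  # identical values
print(lam.min() <= second_deriv <= lam.max())      # (8.5.1-8): True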
Figure 8.5.1-1 The Principal Axes of the Function Contours
Example:
$F(\mathbf{x}) = x_1^2 + x_2^2 = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}\mathbf{x}$ (8.5.1-9)
$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$, $\lambda_1 = 2$ with $\mathbf{z}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\lambda_2 = 2$ with $\mathbf{z}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ (8.5.1-10)
Figure 8.5.1-2 Circular Hollow
Example:
$F(\mathbf{x}) = x_1^2 + x_1x_2 + x_2^2 = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\mathbf{x}$ (8.5.1-11)
$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, $\lambda_1 = 1$ with $\mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, $\lambda_2 = 3$ with $\mathbf{z}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (8.5.1-12)
Figure 8.5.1-3 Elliptic Hollow
Example:
$F(\mathbf{x}) = -\frac{1}{4}x_1^2 - \frac{3}{2}x_1x_2 - \frac{1}{4}x_2^2 = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} -0.5 & -1.5 \\ -1.5 & -0.5 \end{bmatrix}\mathbf{x}$ (8.5.1-13)
$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} -0.5 & -1.5 \\ -1.5 & -0.5 \end{bmatrix}$, $\lambda_1 = 1$ with $\mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, $\lambda_2 = -2$ with $\mathbf{z}_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$ (8.5.1-14)
Figure 8.5.1-4 Elongated Saddle
Example:
$F(\mathbf{x}) = \frac{1}{2}x_1^2 - x_1x_2 + \frac{1}{2}x_2^2 = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}\mathbf{x}$ (8.5.1-16)
$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$, $\lambda_1 = 2$ with $\mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, $\lambda_2 = 0$ with $\mathbf{z}_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$ (8.5.1-17)
A weak minimum exists along the line
$x_1 = x_2$ (8.5.1-18)
Figure 8.5.1-5 Stationary Valley
Characteristics of the quadratic function
1. If the eigenvalues of the Hessian matrix are all positive, the function will have a single strong minimum.
2. If the eigenvalues are all negative, the function will have a single strong maximum.
3. If some eigenvalues are positive and other eigenvalues are negative, the function will have a single saddle point.
4. If the eigenvalues are all nonnegative, but some eigenvalues are zero, then the function will either have a weak
minimum or will have no stationary point.
5. If the eigenvalues are all nonpositive, but some eigenvalues are zero, then the function will either have a weak
maximum or will have no stationary point.
Stationary point of a quadratic function,
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$ (8.5.1-19)
$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$ (8.5.1-20)
$\mathbf{x}^* = -\mathbf{A}^{-1}\mathbf{d}$ (8.5.1-21)
• If A is not invertible (has some zero eigenvalues) and d is nonzero then no stationary points will exist.
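Computing the stationary point (8.5.1-21) is a one-line linear solve. A sketch assuming NumPy; the linear term d here is an illustrative choice, not from the lecture.

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
d = np.array([-1.0, 1.0])                 # illustrative linear term
x_star = -np.linalg.solve(A, d)           # x* = -A^{-1} d, without forming A^{-1}
print(x_star)
print(A @ x_star + d)                     # the gradient (8.5.1-20) is zero at x*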
9 Performance Optimization
9.1 Steepest Descent
Updating to minimum point,
$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\mathbf{p}_k$ (9.1-1)
$F(\mathbf{x}_{k+1}) < F(\mathbf{x}_k)$ (9.1-2)
$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T\,\Delta\mathbf{x}_k$ (9.1-3)
$\mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$ (9.1-4)
$\mathbf{g}_k^T\,\Delta\mathbf{x}_k = \alpha_k\,\mathbf{g}_k^T\mathbf{p}_k < 0$ (9.1-5)
$\mathbf{g}_k^T\mathbf{p}_k < 0$ (9.1-6)
The direction of steepest descent,
$\mathbf{g}_k^T\mathbf{p}_k$: most negative (9.1-7)
$\mathbf{p}_k = -\mathbf{g}_k$ (9.1-8)
The method of steepest descent,
$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\mathbf{g}_k$ (9.1-9)
Learning rate determination, αk,
1. Stable learning rate: Using fixed learning rate or predetermined learning rate
2. Minimizing along a line: Minimize the performance index F(x) with respect to αk at each iteration
$F(\mathbf{x}_k - \alpha_k\mathbf{g}_k)$ (9.1-10)
Consider $F(\mathbf{x}) = x_1^2 + 25x_2^2$ (9.1-11)
$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$ (9.1-12)
$\nabla F(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial F(\mathbf{x})}{\partial x_1} \\ \dfrac{\partial F(\mathbf{x})}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 50x_2 \end{bmatrix}$ (9.1-13)
$\mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 1 \\ 25 \end{bmatrix}$ (9.1-14)
Fixed learning rate of α = 0.01,
$\mathbf{x}_1 = \mathbf{x}_0 - \alpha\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.01\begin{bmatrix} 1 \\ 25 \end{bmatrix} = \begin{bmatrix} 0.49 \\ 0.25 \end{bmatrix}$ (9.1-15)
$\mathbf{x}_2 = \mathbf{x}_1 - \alpha\mathbf{g}_1 = \begin{bmatrix} 0.49 \\ 0.25 \end{bmatrix} - 0.01\begin{bmatrix} 0.98 \\ 12.5 \end{bmatrix} = \begin{bmatrix} 0.4802 \\ 0.125 \end{bmatrix}$ (9.1-16)
Figure 9.1-1 Trajectory for Steepest Descent with α = 0.01 and 0.035
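The iterates (9.1-15) and (9.1-16) come from a three-line loop. A sketch assuming NumPy, reproducing the fixed-rate trajectory.

import numpy as np

grad = lambda x: np.array([2*x[0], 50*x[1]])   # gradient (9.1-13)
x = np.array([0.5, 0.5])
alpha = 0.01
for k in range(3):
    x = x - alpha * grad(x)                    # steepest descent (9.1-9)
    print(k + 1, x)
# k=1: [0.49, 0.25]; k=2: [0.4802, 0.125]; x2 shrinks quickly, x1 very slowly,
# reflecting the spread between the eigenvalues 2 and 50.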
9.1.1 Stable Learning Rates
For quadratic performance index,
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$ (9.1.1-1)
$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$ (9.1.1-2)
$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha\mathbf{g}_k = \mathbf{x}_k - \alpha(\mathbf{A}\mathbf{x}_k + \mathbf{d})$ (9.1.1-3)
$\mathbf{x}_{k+1} = [\mathbf{I} - \alpha\mathbf{A}]\mathbf{x}_k - \alpha\mathbf{d}$ (9.1.1-4)
• This linear dynamic system is stable if the eigenvalues of the matrix [I − αA] are less than one in magnitude.
{λ1, λ2, …, λn}: eigenvalues of the Hessian matrix, {z1, z2, …, zn}: eigenvectors of the Hessian matrix,
$[\mathbf{I} - \alpha\mathbf{A}]\mathbf{z}_i = \mathbf{z}_i - \alpha\mathbf{A}\mathbf{z}_i = \mathbf{z}_i - \alpha\lambda_i\mathbf{z}_i = (1 - \alpha\lambda_i)\mathbf{z}_i$ (9.1.1-5)
• The eigenvectors of [I − αA] are the same as the eigenvectors of A, and the eigenvalues of [I − αA] are (1 − αλi).
The condition for the stability of the steepest descent algorithm,
$|1 - \alpha\lambda_i| < 1$ (9.1.1-6)
$0 < \alpha < \dfrac{2}{\lambda_i}$ (9.1.1-7)
$\alpha < \dfrac{2}{\lambda_{\max}}$ (9.1.1-8)
Example:
$F(\mathbf{x}) = x_1^2 + 25x_2^2 = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 0 \\ 0 & 50 \end{bmatrix}\mathbf{x}$, $\mathbf{A} = \begin{bmatrix} 2 & 0 \\ 0 & 50 \end{bmatrix}$ (9.1.1-9)
$\lambda_1 = 2$ with $\mathbf{z}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\lambda_2 = 50$ with $\mathbf{z}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ (9.1.1-10)
$\alpha < \dfrac{2}{\lambda_{\max}} = \dfrac{2}{50} = 0.04$ (9.1.1-11)
Figure 9.1.1-1 Trajectories for α = 0.039 (Left), and α = 0.041 (Right)
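Stepping just below and just above the bound (9.1.1-11) shows the stability threshold directly. A sketch assuming NumPy.

import numpy as np

grad = lambda x: np.array([2*x[0], 50*x[1]])
for alpha in (0.039, 0.041):
    x = np.array([0.5, 0.5])
    for _ in range(50):
        x = x - alpha * grad(x)
    print(alpha, x)
# alpha = 0.039: |1 - 0.039*50| = 0.95 < 1, the x2 component decays (oscillating);
# alpha = 0.041: |1 - 0.041*50| = 1.05 > 1, the x2 component grows without bound.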
9.1.2 Minimizing Along a Line
$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \alpha_k\mathbf{p}_k) \approx F(\mathbf{x}_k) + \alpha_k\,\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k + \frac{\alpha_k^2}{2}\,\mathbf{p}_k^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k$ (9.1.2-1)
$\dfrac{d}{d\alpha_k}F(\mathbf{x}_k + \alpha_k\mathbf{p}_k) = \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k + \alpha_k\,\mathbf{p}_k^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k$ (9.1.2-2)
Setting this derivative to zero gives
$\alpha_k = -\dfrac{\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k}{\mathbf{p}_k^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k} = -\dfrac{\mathbf{g}_k^T\mathbf{p}_k}{\mathbf{p}_k^T\mathbf{A}_k\mathbf{p}_k}$ (9.1.2-3)
$\mathbf{A}_k = \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$ (9.1.2-4)
Example:
$F(\mathbf{x}) = x_1^2 + x_1x_2 + x_2^2 = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\mathbf{x}$ (9.1.2-5)
$\mathbf{x}_0 = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix}$ (9.1.2-6)
$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 + 2x_2 \end{bmatrix}$ (9.1.2-7)
$\mathbf{p}_0 = -\mathbf{g}_0 = -\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}$ (9.1.2-8)
$\alpha_0 = -\dfrac{[1.35\ \ 0.3]\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}}{[-1.35\ \ {-0.3}]\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}} = 0.413$ (9.1.2-9)
$\mathbf{x}_1 = \mathbf{x}_0 - \alpha_0\mathbf{g}_0 = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} - 0.413\begin{bmatrix} 1.35 \\ 0.3 \end{bmatrix} = \begin{bmatrix} 0.24 \\ -0.37 \end{bmatrix}$ (9.1.2-10)
Figure 9.1.2-1 Steepest Descent with Minimization Along a Line
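For a quadratic the exact line search (9.1.2-3) is one vector expression per iteration. A sketch assuming NumPy, reproducing α₀ = 0.413 and x₁ above.

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # constant Hessian of the quadratic
x = np.array([0.8, -0.25])
for k in range(2):
    g = A @ x                               # gradient of F(x) = 1/2 x^T A x
    p = -g                                  # steepest-descent direction
    alpha = -(g @ p) / (p @ A @ p)          # exact line search (9.1.2-3)
    x = x + alpha * p
    print(k, alpha, x)
# Consecutive steps are orthogonal, giving the zig-zag of Figure 9.1.2-1.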
9.2 Newton's Method
$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T\,\Delta\mathbf{x}_k + \frac{1}{2}\Delta\mathbf{x}_k^T\mathbf{A}_k\,\Delta\mathbf{x}_k$ (9.2-1)
For a quadratic function,
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$ (9.2-2)
$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$ (9.2-3)
Setting the gradient of (9.2-1) with respect to Δxk to zero,
$\mathbf{g}_k + \mathbf{A}_k\,\Delta\mathbf{x}_k = \mathbf{0}$ (9.2-4)
$\Delta\mathbf{x}_k = -\mathbf{A}_k^{-1}\mathbf{g}_k$ (9.2-5)
$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\mathbf{g}_k$ (9.2-6)
Example: $F(\mathbf{x}) = x_1^2 + 25x_2^2$ (9.2-7)
$\nabla F(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial F(\mathbf{x})}{\partial x_1} \\ \dfrac{\partial F(\mathbf{x})}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 50x_2 \end{bmatrix}$, $\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 2 & 0 \\ 0 & 50 \end{bmatrix}$ (9.2-8)
$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$ (9.2-9)
$\mathbf{x}_1 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 2 & 0 \\ 0 & 50 \end{bmatrix}^{-1}\begin{bmatrix} 1 \\ 25 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ (9.2-10)
Figure 9.2-1 Trajectory for Newton’s Method
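One Newton step (9.2-6) reaches the minimum of any quadratic because the Hessian is exact and constant. A sketch assuming NumPy, reproducing (9.2-10).

import numpy as np

grad = lambda x: np.array([2*x[0], 50*x[1]])
A = np.array([[2.0, 0.0], [0.0, 50.0]])      # Hessian from (9.2-8)
x0 = np.array([0.5, 0.5])
x1 = x0 - np.linalg.solve(A, grad(x0))       # x1 = x0 - A^{-1} g0, via a solve
print(x1)                                    # [0, 0] in a single step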
• If the function F(x) is not quadratic, then Newton's method will not generally converge in one step.
Example:
$F(\mathbf{x}) = (x_2 - x_1)^4 + 8x_1x_2 - x_1 + x_2 + 3$ (9.2-11)
Three stationary points:
$\mathbf{x}^1 = \begin{bmatrix} -0.42 \\ 0.42 \end{bmatrix}$, $\mathbf{x}^2 = \begin{bmatrix} -0.13 \\ 0.13 \end{bmatrix}$, and $\mathbf{x}^3 = \begin{bmatrix} 0.55 \\ -0.55 \end{bmatrix}$ (9.2-12)
Figure 9.2-2 One Iteration of Newton’s Method from x0 = [1.5 0]T
Figure 9.2-3 One Iteration of Newton’s Method from x0 = [-1.5 0]T
Figure 9.2-4 One Iteration of Newton’s Method from x0 = [0.75 0.75]T
9.3 Conjugate Gradient
For a quadratic function,
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$ (9.3-1)
Definition: A set of vectors {pk} is mutually conjugate with respect to a positive definite Hessian matrix A if and only if
$\mathbf{p}_k^T\mathbf{A}\mathbf{p}_j = 0$ for $k \neq j$ (9.3-2)
• If we make a sequence of exact linear searches along any set of conjugate directions {p1, p2, …, pn}, then the exact minimum of any quadratic function, with n parameters, will be reached in at most n searches.
The change in the gradient at iteration k+1 for quadratic functions,
$\Delta\mathbf{g}_k = \mathbf{g}_{k+1} - \mathbf{g}_k = (\mathbf{A}\mathbf{x}_{k+1} + \mathbf{d}) - (\mathbf{A}\mathbf{x}_k + \mathbf{d}) = \mathbf{A}\,\Delta\mathbf{x}_k$ (9.3-3)
$\Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k = \alpha_k\mathbf{p}_k$ (9.3-4)
$\alpha_k\,\mathbf{p}_k^T\mathbf{A}\mathbf{p}_j = \Delta\mathbf{x}_k^T\mathbf{A}\mathbf{p}_j = \Delta\mathbf{g}_k^T\mathbf{p}_j = 0$ for $k \neq j$ (9.3-5)
• The search directions will be conjugate if they are orthogonal to the changes in the gradient.
$\mathbf{p}_0 = -\mathbf{g}_0$ (9.3-6)
$\mathbf{p}_k = -\mathbf{g}_k + \beta_k\mathbf{p}_{k-1}$ (9.3-7)
By Hestenes and Stiefel,
$\beta_k = \dfrac{\Delta\mathbf{g}_{k-1}^T\mathbf{g}_k}{\Delta\mathbf{g}_{k-1}^T\mathbf{p}_{k-1}}$ (9.3-8)
Fletcher and Reeves,
$\beta_k = \dfrac{\mathbf{g}_k^T\mathbf{g}_k}{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}$ (9.3-9)
Polak and Ribière,
$\beta_k = \dfrac{\Delta\mathbf{g}_{k-1}^T\mathbf{g}_k}{\mathbf{g}_{k-1}^T\mathbf{g}_{k-1}}$ (9.3-10)
The conjugate gradient method consists of the following steps:
1. Select the first search direction to be the negative of the gradient, $\mathbf{p}_0 = -\mathbf{g}_0$.
2. Take a step according to $\Delta\mathbf{x}_k = \alpha_k\mathbf{p}_k$, selecting the learning rate αk to minimize the function along the search direction.
3. Select the next search direction according to $\mathbf{p}_k = -\mathbf{g}_k + \beta_k\mathbf{p}_{k-1}$, using (9.3-8), (9.3-9), or (9.3-10) to calculate βk.
4. If the algorithm has not converged, return to step 2.
Example:
$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\mathbf{x}$ (9.3-11)
$\mathbf{x}_0 = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix}$ (9.3-12)
$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 + 2x_2 \end{bmatrix}$ (9.3-13)
$\mathbf{p}_0 = -\mathbf{g}_0 = -\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}$ (9.3-14)
$\alpha_0 = -\dfrac{[1.35\ \ 0.3]\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}}{[-1.35\ \ {-0.3}]\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}} = 0.413$ (9.3-15)
$\mathbf{x}_1 = \mathbf{x}_0 + \alpha_0\mathbf{p}_0 = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} + 0.413\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix} = \begin{bmatrix} 0.24 \\ -0.37 \end{bmatrix}$ (9.3-16)
$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 0.24 \\ -0.37 \end{bmatrix} = \begin{bmatrix} 0.11 \\ -0.5 \end{bmatrix}$ (9.3-17)
Using the method of Fletcher and Reeves,
$\beta_1 = \dfrac{\mathbf{g}_1^T\mathbf{g}_1}{\mathbf{g}_0^T\mathbf{g}_0} = \dfrac{[0.11\ \ {-0.5}]\begin{bmatrix} 0.11 \\ -0.5 \end{bmatrix}}{[1.35\ \ 0.3]\begin{bmatrix} 1.35 \\ 0.3 \end{bmatrix}} = \dfrac{0.2621}{1.9125} = 0.137$ (9.3-18)
$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\mathbf{p}_0 = \begin{bmatrix} -0.11 \\ 0.5 \end{bmatrix} + 0.137\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix} = \begin{bmatrix} -0.295 \\ 0.459 \end{bmatrix}$ (9.3-19)
$\alpha_1 = -\dfrac{[0.11\ \ {-0.5}]\begin{bmatrix} -0.295 \\ 0.459 \end{bmatrix}}{[-0.295\ \ 0.459]\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} -0.295 \\ 0.459 \end{bmatrix}} = 0.807$ (9.3-20)
$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\mathbf{p}_1 = \begin{bmatrix} 0.24 \\ -0.37 \end{bmatrix} + 0.807\begin{bmatrix} -0.295 \\ 0.459 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ (9.3-21)
Figure 9.3-1 Conjugate Gradient Algorithm
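The whole example collapses to a short loop. A sketch assuming NumPy, using the Fletcher-Reeves beta (9.3-9) with exact line searches.

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
x = np.array([0.8, -0.25])
g = A @ x
p = -g                                       # first direction (9.3-6)
for k in range(2):
    alpha = -(g @ p) / (p @ A @ p)           # exact line search
    x = x + alpha * p
    g_new = A @ x
    beta = (g_new @ g_new) / (g @ g)         # Fletcher-Reeves (9.3-9)
    p = -g_new + beta * p                    # next direction (9.3-7)
    g = g_new
    print(k, x)
# x reaches [0, 0] after n = 2 searches, as the conjugacy argument predicts.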
10 Widrow-Hoff Learning
10.1 ADALINE (ADAptive LInear NEuron) Network
Figure 10.1-1 ADALINE Network
$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b}) = \mathbf{W}\mathbf{p} + \mathbf{b}$ (10.1-1)
$a_i = \mathrm{purelin}(n_i) = \mathrm{purelin}({}_i\mathbf{w}^T\mathbf{p} + b_i) = {}_i\mathbf{w}^T\mathbf{p} + b_i$ (10.1-2)
${}_i\mathbf{w} = \begin{bmatrix} w_{i,1} \\ w_{i,2} \\ \vdots \\ w_{i,R} \end{bmatrix}$ (10.1-3)
10.1.1 Single ADALINE
Figure 10.1.1-1 Two-Input Linear Neuron
$a = \mathrm{purelin}(n) = \mathrm{purelin}({}_1\mathbf{w}^T\mathbf{p} + b) = {}_1\mathbf{w}^T\mathbf{p} + b = w_{1,1}p_1 + w_{1,2}p_2 + b$ (10.1.1-1)
10.2 Mean Square Error
LMS algorithm: supervised training
{p1, t1}, {p2, t2}, …, {pQ, tQ} (10.2-1)
$\mathbf{x} = \begin{bmatrix} {}_1\mathbf{w} \\ b \end{bmatrix}$ (10.2-2)
$\mathbf{z} = \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}$ (10.2-3)
$a = {}_1\mathbf{w}^T\mathbf{p} + b$ (10.2-4)
$a = \mathbf{x}^T\mathbf{z}$ (10.2-5)
$F(\mathbf{x}) = E[e^2] = E[(t - a)^2] = E[(t - \mathbf{x}^T\mathbf{z})^2]$ (10.2-6)
$F(\mathbf{x}) = E[t^2 - 2t\,\mathbf{x}^T\mathbf{z} + \mathbf{x}^T\mathbf{z}\mathbf{z}^T\mathbf{x}] = E[t^2] - 2\mathbf{x}^T E[t\mathbf{z}] + \mathbf{x}^T E[\mathbf{z}\mathbf{z}^T]\mathbf{x}$ (10.2-7)
$F(\mathbf{x}) = c - 2\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\mathbf{x}$ (10.2-8)
where
$c = E[t^2]$, $\mathbf{h} = E[t\mathbf{z}]$, and $\mathbf{R} = E[\mathbf{z}\mathbf{z}^T]$ (10.2-9)
General form of the quadratic function,
$F(\mathbf{x}) = c + \mathbf{d}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}$ (10.2-10)
$\mathbf{d} = -2\mathbf{h}$ and $\mathbf{A} = 2\mathbf{R}$ (10.2-11)
The gradient,
$\nabla F(\mathbf{x}) = \nabla\!\left(c + \mathbf{d}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}\right) = \mathbf{d} + \mathbf{A}\mathbf{x} = -2\mathbf{h} + 2\mathbf{R}\mathbf{x}$ (10.2-12)
The stationary point of the performance index,
$-2\mathbf{h} + 2\mathbf{R}\mathbf{x} = \mathbf{0}$ (10.2-13)
$\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h}$ (10.2-14)
Example: $\left\{\mathbf{z}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\ t_1 = 1\right\} \to 50\%$, $\left\{\mathbf{z}_2 = \begin{bmatrix} 3 \\ 4 \end{bmatrix},\ t_2 = -1\right\} \to 50\%$
$c = E[t^2] = 0.5(1)^2 + 0.5(-1)^2 = 1$ (10.2-15)
$\mathbf{h} = E[t\mathbf{z}] = 0.5(1)\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0.5(-1)\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$ (10.2-16)
$\mathbf{R} = E[\mathbf{z}\mathbf{z}^T] = 0.5\begin{bmatrix} 1 \\ 2 \end{bmatrix}[1\ \ 2] + 0.5\begin{bmatrix} 3 \\ 4 \end{bmatrix}[3\ \ 4] = \begin{bmatrix} 5 & 7 \\ 7 & 10 \end{bmatrix}$ (10.2-17)
$F(\mathbf{x}) = 1 - 2\mathbf{x}^T\begin{bmatrix} -1 \\ -1 \end{bmatrix} + \mathbf{x}^T\begin{bmatrix} 5 & 7 \\ 7 & 10 \end{bmatrix}\mathbf{x}$ (10.2-18)
$\nabla F(\mathbf{x}) = 2\begin{bmatrix} 5 & 7 \\ 7 & 10 \end{bmatrix}\mathbf{x} - 2\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \mathbf{0}$ (10.2-19)
$\mathbf{x}^* = \begin{bmatrix} w_{1,1} \\ w_{1,2} \end{bmatrix} = \mathbf{R}^{-1}\mathbf{h} = \begin{bmatrix} 5 & 7 \\ 7 & 10 \end{bmatrix}^{-1}\begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \end{bmatrix}$ (10.2-20)
Checking the two patterns:
$\mathbf{z}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $t_1 = 1$: $a_1 = [-3\ \ 2]\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 = t_1$ (10.2-21)
$\mathbf{z}_2 = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, $t_2 = -1$: $a_2 = [-3\ \ 2]\begin{bmatrix} 3 \\ 4 \end{bmatrix} = -1 = t_2$ (10.2-22)
Figure 10.2-1 Network
Figure 10.2-2 Weight Vector and Decision Boundary
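The optimal weight vector (10.2-20) and the verification in (10.2-21)/(10.2-22) take a few lines. A sketch assuming NumPy.

import numpy as np

z = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
t = [1.0, -1.0]
c = 0.5 * sum(ti**2 for ti in t)                 # E[t^2] = 1
h = 0.5 * sum(ti * zi for ti, zi in zip(t, z))   # E[tz] = [-1, -1]
R = 0.5 * sum(np.outer(zi, zi) for zi in z)      # E[zz^T] = [[5, 7], [7, 10]]
x_star = np.linalg.solve(R, h)                   # x* = R^{-1} h  (10.2-14)
print(x_star)                                    # [-3, 2]
print([x_star @ zi for zi in z])                 # [1, -1]: reproduces t1, t2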
10.3 LMS Algorithm
In the Widrow-Hoff learning rule, the mean square error F(x) is estimated by the squared error at iteration k,
$\hat{F}(\mathbf{x}) = (t(k) - a(k))^2 = e^2(k)$ (10.3-1)
$\nabla\hat{F}(\mathbf{x}) = \nabla e^2(k)$ (10.3-2)
$[\nabla e^2(k)]_j = \dfrac{\partial e^2(k)}{\partial w_{1,j}} = 2e(k)\dfrac{\partial e(k)}{\partial w_{1,j}}$ for j = 1, 2, …, R, (10.3-3)
$[\nabla e^2(k)]_{R+1} = \dfrac{\partial e^2(k)}{\partial b} = 2e(k)\dfrac{\partial e(k)}{\partial b}$ (10.3-4)
$\dfrac{\partial e(k)}{\partial w_{1,j}} = \dfrac{\partial[t(k) - a(k)]}{\partial w_{1,j}} = \dfrac{\partial}{\partial w_{1,j}}\!\left[t(k) - ({}_1\mathbf{w}^T\mathbf{p}(k) + b)\right] = \dfrac{\partial}{\partial w_{1,j}}\!\left[t(k) - \left(\sum_{i=1}^{R} w_{1,i}\,p_i(k) + b\right)\right]$ (10.3-5)
$\dfrac{\partial e(k)}{\partial w_{1,j}} = -p_j(k)$ (10.3-6)
$\dfrac{\partial e(k)}{\partial b} = -1$ (10.3-7)
$\nabla\hat{F}(\mathbf{x}) = \nabla e^2(k) = -2e(k)\,\mathbf{z}(k)$ (10.3-8)
$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha\,\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$ (10.3-9)
$\mathbf{x}_{k+1} = \mathbf{x}_k + 2\alpha e(k)\,\mathbf{z}(k)$ (10.3-10)
${}_1\mathbf{w}(k+1) = {}_1\mathbf{w}(k) + 2\alpha e(k)\,\mathbf{p}(k)$ (10.3-11)
$b(k+1) = b(k) + 2\alpha e(k)$ (10.3-12)
• The Widrow-Hoff learning algorithm is also called the delta rule.
For a network with multiple outputs,
${}_i\mathbf{w}(k+1) = {}_i\mathbf{w}(k) + 2\alpha e_i(k)\,\mathbf{p}(k)$ (10.3-13)
$b_i(k+1) = b_i(k) + 2\alpha e_i(k)$ (10.3-14)
In matrix form,
$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\,\mathbf{e}(k)\,\mathbf{p}^T(k)$ (10.3-15)
$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha\,\mathbf{e}(k)$ (10.3-16)
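The matrix-form updates (10.3-15)/(10.3-16) translate directly into a training loop. This is a minimal sketch assuming NumPy; the function name and epoch count are illustrative.

import numpy as np

def lms(patterns, targets, alpha, epochs=50, n_out=1):
    R = patterns[0].size
    W = np.zeros((n_out, R))
    b = np.zeros(n_out)
    for _ in range(epochs):
        for p, t in zip(patterns, targets):
            a = W @ p + b                          # purelin output
            e = t - a                              # error
            W = W + 2 * alpha * np.outer(e, p)     # (10.3-15)
            b = b + 2 * alpha * e                  # (10.3-16)
    return W, b

The apple/orange example in the next section follows exactly this loop, with α = 0.2 and the bias omitted.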
10.4 Example on Apple/Orange Recognition
For simplicity, a zero bias is used in the ADALINE network.
$\left\{\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix},\ t_1 = -1\right\} \to 50\%$, $\left\{\mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ t_2 = 1\right\} \to 50\%$ (10.4-1)
$c = E[t^2] = 0.5(-1)^2 + 0.5(1)^2 = 1$ (10.4-2)
With zero bias, z = p, so
$\mathbf{h} = E[t\mathbf{z}] = 0.5(-1)\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} + 0.5(1)\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ (10.4-3)
$\mathbf{R} = E[\mathbf{p}\mathbf{p}^T] = \frac{1}{2}\mathbf{p}_1\mathbf{p}_1^T + \frac{1}{2}\mathbf{p}_2\mathbf{p}_2^T = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}$ (10.4-4)
The eigenvalues of R,
λ1 = 1.0, λ2 = 0.0, λ3 = 2.0
Since the Hessian of the mean square error is A = 2R, the stable learning rates satisfy
$\alpha < \dfrac{1}{\lambda_{\max}} = \dfrac{1}{2.0} = 0.5$
Select α = 0.2,
Presenting the orange, $\left\{\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix},\ t_1 = -1\right\}$,
$a(0) = \mathbf{W}(0)\mathbf{p}(0) = [0\ \ 0\ \ 0]\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = 0$ (10.4-5)
$e(0) = t(0) - a(0) = -1 - 0 = -1$ (10.4-6)
$\mathbf{W}(1) = \mathbf{W}(0) + 2\alpha e(0)\,\mathbf{p}^T(0) = [0\ \ 0\ \ 0] + 2(0.2)(-1)[1\ \ {-1}\ \ {-1}] = [-0.4\ \ 0.4\ \ 0.4]$ (10.4-7)
Presenting the apple, $\left\{\mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ t_2 = 1\right\}$,
$a(1) = \mathbf{W}(1)\mathbf{p}(1) = [-0.4\ \ 0.4\ \ 0.4]\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = -0.4$ (10.4-8)
$e(1) = t(1) - a(1) = 1 - (-0.4) = 1.4$ (10.4-9)
$\mathbf{W}(2) = \mathbf{W}(1) + 2\alpha e(1)\,\mathbf{p}^T(1) = [-0.4\ \ 0.4\ \ 0.4] + 2(0.2)(1.4)[1\ \ 1\ \ {-1}] = [0.16\ \ 0.96\ \ {-0.16}]$ (10.4-10)
If we continue this procedure, the algorithm converges to
$\mathbf{W}(\infty) = [0\ \ 1\ \ 0]$ (10.4-11)
Figure 10.4-1 Prototype Vectors and Decision Boundary
• The decision boundary generated by ADALINE falls halfway between the two reference patterns.
• The perceptron rule does not produce a halfway decision boundary, since the perceptron rule stops as soon as the patterns are correctly classified.
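Replaying the iterations (10.4-5)-(10.4-10) confirms the limit (10.4-11). A sketch assuming NumPy, with the same alternating presentation order.

import numpy as np

p = [np.array([1.0, -1.0, -1.0]),    # orange, t = -1
     np.array([1.0,  1.0, -1.0])]    # apple,  t = +1
t = [-1.0, 1.0]
W = np.zeros(3)
alpha = 0.2
for k in range(40):                  # alternate orange, apple, orange, ...
    i = k % 2
    e = t[i] - W @ p[i]              # error for the presented pattern
    W = W + 2 * alpha * e * p[i]     # the update pattern of (10.4-7)/(10.4-10)
print(W)                             # approaches [0, 1, 0], as in (10.4-11)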
10.5 Example on Linear Regression
An ADALINE network is used to determine the parameters $a_0$, $a_1$, and $a_2$ of the quadratic relation $y = a_2x^2 + a_1x + a_0$ when the data of x and y are as follows.
x: -3 -2 -1 0 1 2 3
y: 6 3 2 3 6 11 18
By the LMS algorithm,
$F(\mathbf{x}) = E[t^2] - 2\mathbf{x}^T E[t\mathbf{z}] + \mathbf{x}^T E[\mathbf{z}\mathbf{z}^T]\mathbf{x} = c - 2\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\mathbf{x}$ (10.5-1)
$c = E[t^2] = \frac{1}{7}(6^2 + 3^2 + 2^2 + 3^2 + 6^2 + 11^2 + 18^2) = \frac{539}{7} = 77$ (10.5-2)
(Network diagram: inputs $x$ and $x^2$ with weights $a_1$ and $a_2$, bias $a_0$ on the constant input 1, summed and passed through purelin to give $y$.)
With $\mathbf{z} = [x^2\ \ x\ \ 1]^T$,
$\mathbf{h} = E[t\mathbf{z}] = \frac{1}{7}\left(6\begin{bmatrix} 9 \\ -3 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + 6\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 11\begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix} + 18\begin{bmatrix} 9 \\ 3 \\ 1 \end{bmatrix}\right) = \frac{1}{7}\begin{bmatrix} 280 \\ 56 \\ 49 \end{bmatrix} = \begin{bmatrix} 40 \\ 8 \\ 7 \end{bmatrix}$ (10.5-3)
Summing $\mathbf{z}\mathbf{z}^T$ over the seven patterns,
$\mathbf{R} = E[\mathbf{z}\mathbf{z}^T] = \frac{1}{7}\begin{bmatrix} 196 & 0 & 28 \\ 0 & 28 & 0 \\ 28 & 0 & 7 \end{bmatrix} = \begin{bmatrix} 28 & 0 & 4 \\ 0 & 4 & 0 \\ 4 & 0 & 1 \end{bmatrix}$ (10.5-4)
$\mathbf{x} = \begin{bmatrix} a_2 \\ a_1 \\ a_0 \end{bmatrix} = \mathbf{R}^{-1}\mathbf{h} = \begin{bmatrix} 28 & 0 & 4 \\ 0 & 4 & 0 \\ 4 & 0 & 1 \end{bmatrix}^{-1}\begin{bmatrix} 40 \\ 8 \\ 7 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ (10.5-5)
so the fitted relation is $y = x^2 + 2x + 3$.
By the Widrow-Hoff learning rule with a 0.02 learning rate,
$\mathbf{W}_{new} = \mathbf{W}_{old} + 2\alpha e\,\mathbf{z}^T$ (10.5-6)
Present (-3, 6),
$y = 0(-3)^2 + 0(-3) + 0 = 0$; $e = 6 - 0 = 6$ (10.5-7)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + 2(0.02)(6)\begin{bmatrix} (-3)^2 \\ -3 \\ 1 \end{bmatrix} = \begin{bmatrix} 2.16 \\ -0.72 \\ 0.24 \end{bmatrix}$ (10.5-8)
Present (-2, 3),
$y = 2.16(-2)^2 - 0.72(-2) + 0.24 = 10.32$; $e = 3 - 10.32 = -7.32$ (10.5-9)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 2.16 \\ -0.72 \\ 0.24 \end{bmatrix} + 2(0.02)(-7.32)\begin{bmatrix} (-2)^2 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.99 \\ -0.13 \\ -0.05 \end{bmatrix}$ (10.5-10)
Present (-1, 2),
$y = 0.99(-1)^2 - 0.13(-1) - 0.05 = 1.07$; $e = 2 - 1.07 = 0.93$ (10.5-11)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 0.99 \\ -0.13 \\ -0.05 \end{bmatrix} + 2(0.02)(0.93)\begin{bmatrix} (-1)^2 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.03 \\ -0.17 \\ -0.01 \end{bmatrix}$ (10.5-12)
Present (0, 3),
$y = 1.03(0)^2 - 0.17(0) - 0.01 = -0.01$; $e = 3 - (-0.01) = 3.01$ (10.5-13)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 1.03 \\ -0.17 \\ -0.01 \end{bmatrix} + 2(0.02)(3.01)\begin{bmatrix} 0^2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.03 \\ -0.17 \\ 0.11 \end{bmatrix}$ (10.5-14)
Present (1, 6),
$y = 1.03(1)^2 - 0.17(1) + 0.11 = 0.97$; $e = 6 - 0.97 = 5.03$ (10.5-15)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 1.03 \\ -0.17 \\ 0.11 \end{bmatrix} + 2(0.02)(5.03)\begin{bmatrix} 1^2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.23 \\ 0.03 \\ 0.31 \end{bmatrix}$ (10.5-16)
Present (2, 11),
$y = 1.23(2)^2 + 0.03(2) + 0.31 = 5.29$; $e = 11 - 5.29 = 5.71$ (10.5-17)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 1.23 \\ 0.03 \\ 0.31 \end{bmatrix} + 2(0.02)(5.71)\begin{bmatrix} 2^2 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2.14 \\ 0.49 \\ 0.54 \end{bmatrix}$ (10.5-18)
Present (3, 18),
$y = 2.14(3)^2 + 0.49(3) + 0.54 = 21.27$; $e = 18 - 21.27 = -3.27$ (10.5-19)
$\begin{bmatrix} w_{1,1} \\ w_{1,2} \\ b \end{bmatrix} = \begin{bmatrix} 2.14 \\ 0.49 \\ 0.54 \end{bmatrix} + 2(0.02)(-3.27)\begin{bmatrix} 3^2 \\ 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.96 \\ 0.10 \\ 0.41 \end{bmatrix}$ (10.5-20)
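Both routes to the fit can be checked in a few lines: the analytic solution (10.5-5), and the Widrow-Hoff updates cycled over the data for many passes. A sketch assuming NumPy; the pass count is an illustrative choice.

import numpy as np

xs = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
ys = np.array([6.0, 3.0, 2.0, 3.0, 6.0, 11.0, 18.0])
Z = np.stack([xs**2, xs, np.ones_like(xs)], axis=1)   # z = [x^2, x, 1]

# Analytic: x* = R^{-1} h with R = E[zz^T], h = E[yz], as in (10.5-3)-(10.5-5)
R = Z.T @ Z / len(xs)
h = Z.T @ ys / len(xs)
print(np.linalg.solve(R, h))          # [1, 2, 3] -> y = x^2 + 2x + 3

# Iterative: Widrow-Hoff (10.5-6) with alpha = 0.02, many passes over the data
w = np.zeros(3)
for _ in range(500):
    for z, y in zip(Z, ys):
        e = y - w @ z
        w = w + 2 * 0.02 * e * z
print(w)                              # approaches [1, 2, 3]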