Adaptive Filter Theory
Sung Ho Cho
Hanyang University, Seoul, Korea
(Office) +82-2-2220-0390  (Mobile) +82-10-5412-5178
[email protected]
Table of Contents
Wiener Filters
Gradient Search by Steepest Descent Method
Stochastic Gradient Adaptive Algorithms
Recursive Least Squares (RLS) Algorithm
Wiener Filters
Filter-Optimization Problem
Wiener Filtering: A priori knowledge of the signal statistics, or at least estimates of them, is required.
Complex and expensive hardware systems are necessary (particularly in nonstationary environments).
Adaptive Filtering: Complete knowledge of the signal statistics is not required.
Filter weights eventually converge to the optimum Wiener solutions for stationary processes.
Filter weights show tracking capability in slowly time-varying nonstationary environments.
Complex and expensive hardware systems are not, in general, necessary.
Wiener Filters (1/7)
Objectives: We want to design a filter that minimizes the mean-squared estimation error $E\{e^2(n)\}$ so that the estimated signal $\hat d(n)$ best approximates the desired signal $d(n)$.

[Block diagram: the reference signal $x(n)$ drives a filter with coefficients $h_i$, $0 \le i \le N-1$; the estimated signal is $\hat d(n) = \sum_{i=0}^{N-1} h_i\, x(n-i)$, and the estimation error is $e(n) = d(n) - \hat d(n)$.]
Wiener Filters (2/7)
Basic Structure:
$e(n) = d(n) - \sum_{i=0}^{N-1} h_i\, x(n-i) = d(n) - H^T X(n)$

[Block diagram: a tapped delay line ($z^{-1}$ elements) produces $x(n), x(n-1), \dots, x(n-N+1)$, which are weighted by $h_0, h_1, \dots, h_{N-1}$ and summed to form $\hat d(n)$, a linear combination of the current and past input signals; $\hat d(n)$ is subtracted from $d(n)$ to form $e(n)$.]
Wiener Filters (3/7)
Basic Assumptions: d(n) and x(n) are zero-mean and jointly wide-sense stationary.
Notations:
Filter Coefficient Vector: $H = [h_0, h_1, \dots, h_{N-1}]^T$
Reference Input Vector: $X(n) = [x(n), x(n-1), \dots, x(n-N+1)]^T$
Estimation Error Signal: $e(n) = d(n) - \sum_{i=0}^{N-1} h_i\, x(n-i) = d(n) - H^T X(n)$
Autocorrelation Matrix: $R_{XX} = E\{X(n) X^T(n)\}$
Cross-correlation Vector: $R_{dX} = E\{d(n)\, X(n)\}$
Optimum Filter Coefficient Vector: $H_{opt} = [h_{0,opt}, h_{1,opt}, \dots, h_{N-1,opt}]^T$
Wiener Filters (4/7)
Performance Measure (Cost Function):
$\xi = E\{e^2(n)\} = E\left\{\left(d(n) - H^T X(n)\right)^2\right\} = E\{d^2(n)\} - 2 H^T R_{dX} + H^T R_{XX} H$

We now want to minimize $\xi$ with respect to $H$:
$\frac{\partial \xi}{\partial H} = -2 R_{dX} + 2 R_{XX} H = 0$

Wiener-Hopf Solution (1931): $R_{XX} H_{opt} = R_{dX}$, i.e., $H_{opt} = R_{XX}^{-1} R_{dX}$
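To connect the formula to data, here is a minimal NumPy sketch (not from the slides; the signal names, data length, and noise level are illustrative assumptions) that estimates $R_{XX}$ and $R_{dX}$ from samples and solves the Wiener-Hopf equation:

```python
import numpy as np

def wiener_filter(x, d, N):
    """Minimal sketch: estimate R_XX and R_dX from sample data and
    solve the Wiener-Hopf equation H_opt = R_XX^{-1} R_dX."""
    n_samples = len(x)
    # Stack of reference input vectors X(n) = [x(n), ..., x(n-N+1)]^T
    X = np.array([x[n - np.arange(N)] for n in range(N - 1, n_samples)])
    d_vec = d[N - 1:n_samples]
    R_xx = X.T @ X / len(d_vec)          # sample autocorrelation matrix
    R_dx = X.T @ d_vec / len(d_vec)      # sample cross-correlation vector
    return np.linalg.solve(R_xx, R_dx)   # H_opt (solve, rather than an explicit inverse)

# Hypothetical usage: identify a 4-tap system from noisy observations
rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
h_true = np.array([0.1, 0.3, 0.5, 0.2])
d = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
print(wiener_filter(x, d, N=4))          # should be close to h_true
```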
Wiener Filters (5/7)
Autocorrelation Matrix: $R_{XX} = E\{X(n) X^T(n)\}$

$R_{XX} = \begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(N-1) \\ r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}(N-1) & r_{xx}(N-2) & \cdots & r_{xx}(0) \end{bmatrix}$

$R_{XX}$ is symmetric and Toeplitz.
Is $R_{XX}$ invertible? Yes, almost always.
$R_{XX}$ is almost always a positive definite matrix.
A symmetric matrix A is called positive definite if $x^T A x > 0$ for every nonzero x. All the eigenvalues of A are positive.
The determinant of every principal submatrix of A is positive.
Since the determinant of A is not zero, A is invertible.
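As a small illustration of this invertibility argument (the matrix values below are made up for the example, not taken from the slides):

```python
import numpy as np

# A sample autocorrelation matrix is positive definite when all of its
# eigenvalues are positive, hence its determinant is nonzero and it is invertible.
R_xx = np.array([[1.0, 0.9, 0.7],
                 [0.9, 1.0, 0.9],
                 [0.7, 0.9, 1.0]])
eigvals = np.linalg.eigvalsh(R_xx)           # eigvalsh: eigenvalues of a symmetric matrix
print("positive definite:", np.all(eigvals > 0))
print("determinant:", np.linalg.det(R_xx))   # nonzero, so R_xx is invertible
```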
Wiener Filters (6/7)
Let $X_B(n)$ denote the vector obtained by rearranging the elements of $X(n)$ backward, i.e.,
$X_B(n) = [x(n-N+1), x(n-N+2), \dots, x(n)]^T$.
Then $E\{X_B(n)\, X_B^T(n)\} = R_{XX}$.

Cross-correlation Vector: $R_{dX} = E\{d(n)\, X(n)\} = [r_{dx}(0), r_{dx}(1), \dots, r_{dx}(N-1)]^T$

Minimum Estimation Error: $e_{min}(n) = d(n) - H_{opt}^T X(n)$
Wiener Filters (7/7)
Minimum Mean-Squared Estimation Error:
$\xi_{min} = E\{e_{min}^2(n)\} = E\left\{\left(d(n) - H_{opt}^T X(n)\right)^2\right\} = E\{d^2(n)\} - H_{opt}^T R_{dX} = E\{d^2(n)\} - H_{opt}^T R_{XX} H_{opt}$
Example:
[Figure: error surfaces for N = 1 ($\xi$ versus $h_0$, minimum $\xi_{min}$ at $h_{0,opt}$) and N = 2 ($\xi$ over $(h_0, h_1)$, minimum $\xi_{min}$ at $(h_{0,opt}, h_{1,opt})$).]
Orthogonality Principle:
[Figure: the desired signal $d(n)$ viewed as a vector; its projection onto the plane M is the estimate $\hat d(n)$, the minimum error $e_{min}(n)$ is orthogonal to M, and θ is the angle between $d(n)$ and the plane.]

The plane M is spanned by $X(n) = [x(n), x(n-1), \dots, x(n-N+1)]^T$, and $\hat d(n) = \sum_{i=0}^{N-1} h_i\, x(n-i)$ lies in M.

$e_{min}(n) \perp M$, i.e., $E\{e_{min}(n)\, X(n)\} = 0_N$

Perfect estimation is possible if θ = 0, and the estimation fails if θ = π/2.
Some Drawbacks of the Wiener Filter:
Signal statistics must be known a priori: we must know $R_{XX}$ and $R_{dX}$, or at least their estimates.
A matrix inversion operation is required, which imposes a heavy computational load.
Not suitable for real-time applications.
Situations get worse in nonstationary environments: we have to compute $R_{XX}(n)$ and $R_{dX}(n)$ at every time n, and we must repeat the matrix inversion at every time n.
Gradient Search by Steepest Descent Method
Steepest Descent Method (1/5)

Objectives: We want to design a filter in a recursive form in order to avoid the matrix inversion operation required in the Wiener solution.

[Block diagram: the same configuration as before, but with time-varying coefficients $h_i(n)$, $0 \le i \le N-1$; the estimate is $\hat d(n) = \sum_{i=0}^{N-1} h_i(n)\, x(n-i)$ and the error is $e(n) = d(n) - \hat d(n)$.]
Steepest Descent Method (2/5)

Basic Structure:
$e(n) = d(n) - \sum_{i=0}^{N-1} h_i(n)\, x(n-i) = d(n) - H^T(n)\, X(n)$

[Block diagram: the tapped delay line as before, but with time-varying coefficients $h_0(n), h_1(n), \dots, h_{N-1}(n)$ forming $\hat d(n)$.]
Steepest Descent Method (3/5)

Basic Assumptions: d(n) and x(n) are zero-mean and jointly wide-sense stationary.

Notations:
Filter Coefficient Vector: $H(n) = [h_0(n), h_1(n), \dots, h_{N-1}(n)]^T$
Reference Input Vector: $X(n) = [x(n), x(n-1), \dots, x(n-N+1)]^T$
Estimation Error Signal: $e(n) = d(n) - \sum_{i=0}^{N-1} h_i(n)\, x(n-i) = d(n) - H^T(n)\, X(n)$
Autocorrelation Matrix: $R_{XX} = E\{X(n) X^T(n)\}$
Cross-correlation Vector: $R_{dX} = E\{d(n)\, X(n)\}$
Optimum Filter Coefficient Vector: $H_{opt} = [h_{0,opt}, h_{1,opt}, \dots, h_{N-1,opt}]^T$
Steepest Descent Method (4/5)

The filter coefficient vector at time n+1 is equal to the coefficient vector at time n plus a change proportional to the negative gradient of the mean-squared error, i.e.,
$H(n+1) = H(n) - \tfrac{1}{2}\mu\, \nabla_{H(n)}\xi(n)$
where $H(n) = [h_0(n), h_1(n), \dots, h_{N-1}(n)]^T$ and μ = adaptation step-size.

Performance Measure (Cost Function):
$\xi(n) = E\{e^2(n)\} = E\{d^2(n)\} - 2 H^T(n) R_{dX} + H^T(n) R_{XX} H(n)$
Steepest Descent Method (5/5)

The Gradient of the Mean-Squared Error:
$\nabla_{H(n)}\xi(n) = \frac{\partial \xi(n)}{\partial H(n)} = -2 R_{dX} + 2 R_{XX} H(n)$

Therefore, the recursive update equation for the coefficient vector becomes
$H(n+1) = \left[I_N - \mu R_{XX}\right] H(n) + \mu R_{dX}$

Misalignment Vector: $V(n) = H(n) - H_{opt}$
$V(n+1) = \left[I_N - \mu R_{XX}\right] V(n)$
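To make the recursion concrete, a minimal NumPy sketch of the steepest-descent iteration (the function name and the numeric values of $R_{XX}$, $R_{dX}$, and μ are illustrative assumptions, not from the slides):

```python
import numpy as np

def steepest_descent(R_xx, R_dx, mu, n_iter=500):
    """Minimal sketch of H(n+1) = [I_N - mu*R_xx] H(n) + mu*R_dx,
    assuming R_xx and R_dx are known a priori."""
    H = np.zeros(len(R_dx))
    for _ in range(n_iter):
        # -(1/2)*gradient = R_dx - R_xx @ H, so each step adds mu*(R_dx - R_xx H)
        H = H + mu * (R_dx - R_xx @ H)
    return H

# Illustrative values: the recursion converges toward the Wiener solution R_xx^{-1} R_dx
R_xx = np.array([[1.0, 0.5], [0.5, 1.0]])
R_dx = np.array([0.8, 0.3])
print(steepest_descent(R_xx, R_dx, mu=0.1))
print(np.linalg.solve(R_xx, R_dx))   # Wiener solution for comparison
```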
Convergence of Steepest Descent Method (1/2)

Convergence (or Stability) Condition:
$\left|1 - \mu\lambda_i\right| < 1 \;\Rightarrow\; 0 < \mu < \frac{2}{\lambda_i},\ \forall i \;\Rightarrow\; 0 < \mu < \frac{2}{\lambda_{max}}$
($\lambda_i$ = the $i$-th eigenvalue of $R_{XX}$)

Slow convergence if the eigenvalue spread $\lambda_{max}/\lambda_{min}$ is large.
Convergence of Steepest Descent Method (2/2)

Time Constant: The convergence behavior of the $i$-th element of the misalignment vector is
$v_i(n+1) = (1 - \mu\lambda_i)\, v_i(n) \;\Rightarrow\; v_i(n) = (1 - \mu\lambda_i)^n\, v_i(0)$

Time constant for the $i$-th element of the misalignment vector:
$1 - \mu\lambda_i = \exp\!\left(-\frac{1}{\tau_i}\right) \;\Rightarrow\; \tau_i = \frac{-1}{\ln(1 - \mu\lambda_i)} \approx \frac{1}{\mu\lambda_i}\ \text{(samples)} \quad \text{for } \mu\lambda_i \ll 1$
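As a quick numeric illustration (the numbers are chosen here for illustration, not taken from the slides): if $\mu\lambda_i = 0.01$, then $\tau_i \approx 1/(\mu\lambda_i) = 100$ samples, so the $i$-th mode of the misalignment decays by a factor of $e$ roughly every 100 iterations.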
Steady-State Value: $H(\infty) = H_{opt}$, or equivalently $V(\infty) = 0_N$
We still need a priori knowledge of signal statistics.
Stochastic Gradient Adaptive Algorithms
Stochastic Gradient Adaptive Filters

Motivations:
No a priori information about signal statistics
No matrix inversion
Tracking capability
Self-designing (recursive method)
The filter gradually learns the required correlation of the input signals and adjusts its coefficient vector recursively according to some suitably chosen instantaneous error criterion.

Evaluation Criteria:
Rate of convergence
Misadjustment (deviation from the optimum solution)
Robustness to ill-conditioned data
Computational costs
Hardware implementation costs
Numerical problems
Applications of Stochastic Gradient Adaptive Filters (1/2)
System Identification:
[Block diagram: the input x(n) drives both an unknown system and the adaptive filter; the unknown-system output plus noise ξ(n) forms d(n), and e(n) is the difference between d(n) and the adaptive-filter output.]

Adaptive Prediction:
[Block diagram: the adaptive filter is fed the delayed signal x(n) = d(n-Δ) (delay $z^{-\Delta}$) and tries to predict d(n); e(n) is the prediction error.]
Applications of Stochastic Gradient Adaptive Filters (2/2)

Noise Cancellation:
[Block diagram: the primary input d(n) = y(n) + ξ(n) contains the signal plus noise; a reference input x(n), correlated with the noise, drives the adaptive filter to produce the noise estimate ξ̂(n), which is subtracted to give e(n).]

Inverse Filtering:
[Block diagram: a training signal (TX) passes through an unknown channel and is corrupted by noise ξ(n); the adaptive filter processes the received signal x(n), and its output is compared with a delayed ($z^{-\Delta}$) version of the training signal (RX) to form e(n).]
Classification of Adaptive Filters

System Identification:
Layered Earth Modeling

Adaptive Prediction:
Linear Predictive Coding
Autoregressive Spectral Analysis
ADPCM

Noise Cancellation:
Adaptive Noise Cancellation
Adaptive Echo Cancellation
Active Noise Control
Adaptive Beamforming

Inverse Filtering:
Adaptive Equalization
Deconvolution
Blind Equalization
Stochastic Gradient Adaptive Algorithms (1/6)

[Block diagram: the adaptive filter with coefficients $h_i(n)$, $0 \le i \le N-1$, forms $\hat d(n) = \sum_{i=0}^{N-1} h_i(n)\, x(n-i)$; the estimation error $e(n) = d(n) - \hat d(n) = d(n) - H^T(n)\, X(n)$ drives the adaptive algorithm.]

General coefficient update:
$H(n+1) = H(n) - \frac{\mu}{\alpha}\, \nabla_{H(n)}(n), \qquad \nabla_{H(n)}(n) = \frac{\partial\, |e(n)|^{\alpha}}{\partial H(n)}$

Various forms result according to the choice of the performance measure $|e(n)|^{\alpha}$.

If there is no correlation between d(n) and x(n), then no estimation can be made.
Stochastic Gradient Adaptive Algorithms (2/6)

Notations:
Filter Coefficient Vector: $H(n) = [h_0(n), h_1(n), \dots, h_{N-1}(n)]^T$
Reference Input Vector: $X(n) = [x(n), x(n-1), \dots, x(n-N+1)]^T$
Estimation Error Signal: $e(n) = d(n) - \sum_{i=0}^{N-1} h_i(n)\, x(n-i) = d(n) - H^T(n)\, X(n)$
Autocorrelation Matrix: $R_{XX} = E\{X(n) X^T(n)\}$
Cross-correlation Vector: $R_{dX} = E\{d(n)\, X(n)\}$
Optimum Filter Coefficient Vector: $H_{opt} = [h_{0,opt}, h_{1,opt}, \dots, h_{N-1,opt}]^T$
Misalignment Vector: $V(n) = H(n) - H_{opt}$
Covariance Matrix of the Misalignment Vector: $K(n) = E\{V(n)\, V^T(n)\}$
Stochastic Gradient Adaptive Algorithms (3/6)

Sign Algorithm: α = 1
The sign algorithm tries to minimize the instantaneous absolute error value at each iteration.

$\nabla_{H(n)}(n) = \frac{\partial\, |e(n)|}{\partial H(n)}, \qquad e(n) = d(n) - H^T(n)\, X(n)$

Filter Coefficient Updates:
$H(n+1) = H(n) + \mu\, X(n)\, \mathrm{sign}\{e(n)\}$

$\mathrm{sign}\{e(n)\} = \begin{cases} 1, & e(n) \ge 0 \\ -1, & e(n) < 0 \end{cases}$
Stochastic Gradient Adaptive Algorithms (4/6)

Least Mean Square (LMS) Algorithm: α = 2
The LMS algorithm tries to minimize the instantaneous squared error value at each iteration.

$\nabla_{H(n)}(n) = \frac{\partial\, e^2(n)}{\partial H(n)}, \qquad e(n) = d(n) - H^T(n)\, X(n)$

Filter Coefficient Updates:
$H(n+1) = H(n) + \mu\, X(n)\, e(n)$
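To make the LMS recursion concrete, a minimal NumPy sketch (the function name, signal names, step size, and filter length are illustrative assumptions, not from the slides):

```python
import numpy as np

def lms(x, d, N, mu):
    """Minimal LMS sketch: H(n+1) = H(n) + mu * X(n) * e(n),
    with e(n) = d(n) - H(n)^T X(n); x and d are NumPy arrays."""
    H = np.zeros(N)                      # filter coefficient vector H(n)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        X = x[n - np.arange(N)]          # X(n) = [x(n), ..., x(n-N+1)]^T
        e[n] = d[n] - H @ X              # estimation error e(n)
        H = H + mu * X * e[n]            # coefficient update
    return H, e
```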
Stochastic Gradient Adaptive Algorithms (5/6)
Least Mean Absolute Third (LMAT) Algorithm: α = 3
The LMAT algorithm tries to minimize the instantaneous absolute error value to the third power at each iteration.
$\nabla_{H(n)}(n) = \frac{\partial\, |e(n)|^3}{\partial H(n)}, \qquad e(n) = d(n) - H^T(n)\, X(n)$

Filter Coefficient Updates:
$H(n+1) = H(n) + \mu\, X(n)\, e^2(n)\, \mathrm{sign}\{e(n)\}$
Stochastic Gradient Adaptive Algorithms (6/6)
Least Mean Fourth (LMF) Algorithm: α = 4
The LMF algorithm tries to minimize the instantaneous error value to the fourth power at each iteration.
$\nabla_{H(n)}(n) = \frac{\partial\, e^4(n)}{\partial H(n)}, \qquad e(n) = d(n) - H^T(n)\, X(n)$

Filter Coefficient Updates:
$H(n+1) = H(n) + \mu\, X(n)\, e^3(n)$
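The four stochastic-gradient algorithms above differ only in the error nonlinearity applied inside the same update loop. A minimal sketch of just the per-sample update (the function and parameter names are illustrative assumptions; it follows the structure of the hypothetical lms() loop shown earlier):

```python
import numpy as np

def sg_update(H, X, e, mu, variant="lms"):
    """One coefficient update H(n+1) = H(n) + mu * X(n) * f(e(n)),
    where f depends on the chosen error criterion |e|^alpha."""
    f = {
        "sign": np.sign(e),              # alpha = 1 (Sign algorithm)
        "lms":  e,                       # alpha = 2 (LMS)
        "lmat": e**2 * np.sign(e),       # alpha = 3 (LMAT)
        "lmf":  e**3,                    # alpha = 4 (LMF)
    }[variant]
    return H + mu * X * f
```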
Convergence of the Adaptive Algorithms (1/2)
Basically, we need to know the mean and mean-squared behavior of the algorithms.
For the analysis of the statistical mean behavior:We want to know a set of statistical difference equations that characterizes E{H(n)} or E{V(n)}.
We also need to check
Stability conditions
Convergence speed
Unbiased estimation capability
For the analysis of the statistical mean-squared behavior: We want to know a set of statistical difference equations that characterizes $\sigma_e^2(n) = E\{e^2(n)\}$ and $K(n) = E\{V(n)\, V^T(n)\}$.
We also need to check
Stability conditions
Convergence speed
Estimation precision
Convergence of the Adaptive Algorithms (2/2)
Basic Assumptions for the Convergence Analysis:
The input signals d(n) and x(n) are zero-mean, jointly wide-sense stationary, and jointly Gaussian with finite variances.
A consequence of this assumption is that the estimation error $e(n) = d(n) - H^T(n)X(n)$ is also zero-mean and Gaussian when conditioned on the coefficient vector H(n).

Independence Assumption:
"The input pair {d(n), X(n)} at time n is independent of {d(k), X(k)} at time k, if n is not equal to k."
This assumption is seldom true in practice, but is valid when the step-size μ is chosen to be sufficiently small.
One direct consequence of the independence assumption is that the coefficient vector H(n) is uncorrelated with the input pair {d(n), X(n)}, since H(n) depends only on inputs at time n-1 and before.
Sign Algorithm (1/2)

Mean Behavior:
$E\{H(n+1)\} = \left[I_N - \sqrt{\tfrac{2}{\pi}}\,\frac{\mu}{\sigma_e(n)}\, R_{XX}\right] E\{H(n)\} + \sqrt{\tfrac{2}{\pi}}\,\frac{\mu}{\sigma_e(n)}\, R_{dX}$
$E\{V(n+1)\} = \left[I_N - \sqrt{\tfrac{2}{\pi}}\,\frac{\mu}{\sigma_e(n)}\, R_{XX}\right] E\{V(n)\}$

Mean-Squared Behavior:
$\sigma_e^2(n) = \xi_{min} + tr\{K(n) R_{XX}\}$
$K(n+1) = K(n) - \sqrt{\tfrac{2}{\pi}}\,\frac{\mu}{\sigma_e(n)}\left[K(n) R_{XX} + R_{XX} K(n)\right] + \mu^2 R_{XX}$
Sign Algorithm (2/2)

Steady-State Mean-Squared Estimation Error:
$\sigma_e^2(\infty) \approx \xi_{min} + \frac{\mu}{2}\sqrt{\frac{\pi\,\xi_{min}}{2}}\; tr\{R_{XX}\}$

Convergence Condition (Weak Convergence):
"The long-term time-average of the MAE is bounded for any positive value of μ."

Very robust, but slow.
LMS Algorithm (1/2)

Mean Behavior:
$E\{H(n+1)\} = \left[I_N - \mu R_{XX}\right] E\{H(n)\} + \mu R_{dX}$
$E\{V(n+1)\} = \left[I_N - \mu R_{XX}\right] E\{V(n)\}$

Mean-Squared Behavior:
$\sigma_e^2(n) = \xi_{min} + tr\{K(n) R_{XX}\}$
$K(n+1) = K(n) - \mu\left[K(n) R_{XX} + R_{XX} K(n)\right] + \mu^2\left[\sigma_e^2(n)\, I_N + 2\, R_{XX} K(n)\right] R_{XX}$
LMS Algorithm (2/2)

Steady-State Mean-Squared Estimation Error:
$\sigma_e^2(\infty) \approx \xi_{min} + \frac{\mu}{2}\,\xi_{min}\, tr\{R_{XX}\}$

Mean Convergence: $0 < \mu < \dfrac{2}{\lambda_{max}}$

Mean-Squared Convergence: $0 < \mu < \dfrac{2}{3\, tr\{R_{XX}\}}$

If $\mu_{LMS} = \sqrt{\dfrac{\pi}{2\,\xi_{min}}}\,\mu_{sign}$, then $\sigma_{e,LMS}^2(\infty) \approx \sigma_{e,sign}^2(\infty)$.
The convergence of the algorithm strongly depends on the input signal statistics.
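For orientation, a small illustrative check of these step-size bounds (the $R_{XX}$ values below are made-up numbers, not from the slides):

```python
import numpy as np

# Illustrative LMS step-size bounds for an assumed input autocorrelation matrix
R_xx = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.5],
                 [0.2, 0.5, 1.0]])
lam_max = np.linalg.eigvalsh(R_xx).max()
print("mean convergence bound:         mu <", 2.0 / lam_max)
print("mean-squared convergence bound: mu <", 2.0 / (3.0 * np.trace(R_xx)))
```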
LMAT Algorithm (1/2)

Mean Behavior:
$E\{H(n+1)\} = \left[I_N - 2\sqrt{\tfrac{2}{\pi}}\,\mu\,\sigma_e(n)\, R_{XX}\right] E\{H(n)\} + 2\sqrt{\tfrac{2}{\pi}}\,\mu\,\sigma_e(n)\, R_{dX}$
$E\{V(n+1)\} = \left[I_N - 2\sqrt{\tfrac{2}{\pi}}\,\mu\,\sigma_e(n)\, R_{XX}\right] E\{V(n)\}$

Mean-Squared Behavior:
$\sigma_e^2(n) = \xi_{min} + tr\{K(n) R_{XX}\}$
$K(n+1) = K(n) - 2\sqrt{\tfrac{2}{\pi}}\,\mu\,\sigma_e(n)\left[K(n) R_{XX} + R_{XX} K(n)\right] + 3\mu^2\sigma_e^2(n)\left[\sigma_e^2(n)\, R_{XX} + 3\, R_{XX} K(n) R_{XX}\right]$
LMAT Algorithm (2/2)

Steady-State Mean-Squared Estimation Error:
$\sigma_e^2(\infty) \approx \xi_{min} + \frac{3\mu}{4}\sqrt{\frac{\pi\,\xi_{min}}{2}}\;\xi_{min}\, tr\{R_{XX}\}$

Mean Convergence: $0 < \mu < \dfrac{1}{\lambda_{max}\,\sigma_e(n)}\sqrt{\dfrac{\pi}{2}},\ \forall n$
Very fast, but must be careful.
The convergence of the LMAT algorithm depends on the initial choice of the coefficient vector.
If $\mu_{LMAT} = \dfrac{2}{3}\sqrt{\dfrac{2}{\pi\,\xi_{min}}}\,\mu_{LMS}$, then $\sigma_{e,LMAT}^2(\infty) \approx \sigma_{e,LMS}^2(\infty)$.
LMF Algorithm (1/2)

Mean Behavior:
$E\{H(n+1)\} = \left[I_N - 3\mu\,\sigma_e^2(n)\, R_{XX}\right] E\{H(n)\} + 3\mu\,\sigma_e^2(n)\, R_{dX}$
$E\{V(n+1)\} = \left[I_N - 3\mu\,\sigma_e^2(n)\, R_{XX}\right] E\{V(n)\}$

Mean-Squared Behavior:
$\sigma_e^2(n) = \xi_{min} + tr\{K(n) R_{XX}\}$
$K(n+1) = K(n) - 3\mu\,\sigma_e^2(n)\left[K(n) R_{XX} + R_{XX} K(n)\right] + 15\mu^2\sigma_e^4(n)\left[\sigma_e^2(n)\, I_N + 6\, R_{XX} K(n)\right] R_{XX}$
LMF Algorithm (2/2)

Steady-State Mean-Squared Estimation Error: ?

Mean Convergence: $0 < \mu < \dfrac{2}{3\,\lambda_{max}\,\sigma_e^2(n)},\ \forall n$
Very fast, but must be careful also.
The convergence of the LMF algorithm also depends on the initial choice of the coefficient vector.
Further Observations (1/2)
Misadjustment: $M = \dfrac{\xi_{ex}(\infty)}{\xi_{min}}$

Sign Algorithm: $M \approx \dfrac{\mu}{2}\sqrt{\dfrac{\pi}{2\,\xi_{min}}}\; tr\{R_{XX}\}$

LMS Algorithm: $M \approx \dfrac{\mu}{2}\; tr\{R_{XX}\}$

LMAT Algorithm: $M \approx \dfrac{3\mu}{4}\sqrt{\dfrac{\pi\,\xi_{min}}{2}}\; tr\{R_{XX}\}$

LMF Algorithm: ?
Further Observations (2/2)
The misadjustment M increases with the filter order N.
The misadjustment M is directly proportional to μ.
The convergence speed is inversely proportional to μ.
Convergence Speed: (Fast) LMAT – LMF ≈ LMS – Sign (Slow)
Robustness (or Stability): (Good) Sign – LMS – LMAT – LMF (Bad)
Example: System Identification Mode (1/6)

[Block diagram: system identification configuration; the input x(n) drives both the unknown system and the adaptive filter, the unknown-system output plus measurement noise ξ(n) forms d(n), and e(n) is the difference between d(n) and the adaptive-filter output.]

$H_{opt} = [0.1,\ 0.3,\ 0.5,\ 0.7,\ 0.5,\ 0.3,\ 0.1]^T$
Example: System Identification Mode (2/6)

Two Sets of Reference Inputs:

CASE 1: Eigenvalue Spread Ratio = 25.3
$x_1(n) = \zeta_1(n) + 0.9\,x_1(n-1) - 0.1\,x_1(n-2) - 0.2\,x_1(n-3)$

CASE 2: Eigenvalue Spread Ratio = 185.8
$x_2(n) = \zeta_2(n) + 1.5\,x_2(n-1) - x_2(n-2) - 0.25\,x_2(n-3)$

Measurement Noise ζ(n): White Gaussian Process

Convergence Parameter μ: Sign = 0.00016, LMS = 0.002, LMAT = 0.011, LMF = 0.002
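A minimal simulation sketch of this CASE 1 setup (the AR coefficients and $H_{opt}$ come from the slides; the data length, noise level, and the reuse of the hypothetical lms() sketch defined earlier are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20000
zeta = rng.standard_normal(n_samples)

# CASE 1 reference input: x(n) = zeta(n) + 0.9 x(n-1) - 0.1 x(n-2) - 0.2 x(n-3)
x = np.zeros(n_samples)
for n in range(n_samples):
    x[n] = zeta[n] + 0.9 * x[n-1] - 0.1 * x[n-2] - 0.2 * x[n-3]

# Unknown system from the slides; d(n) is its output plus a small measurement noise
H_opt = np.array([0.1, 0.3, 0.5, 0.7, 0.5, 0.3, 0.1])
d = np.convolve(x, H_opt)[:n_samples] + 0.01 * rng.standard_normal(n_samples)

H_hat, e = lms(x, d, N=7, mu=0.002)   # reuse the LMS sketch shown earlier
print(H_hat)                           # should approach H_opt
```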
Example: System Identification Mode (3/6)

CASE 1: Eigenvalue Spread Ratio = 25.3
$x_1(n) = \zeta_1(n) + 0.9\,x_1(n-1) - 0.1\,x_1(n-2) - 0.2\,x_1(n-3)$

[Figure: Mean-squared behavior of the coefficients for CASE 1; MSE in dB versus number of iterations (0 to 20000) for the LMAT, LMS, LMF, and Sign algorithms.]
Example: System Identification Mode (4/6)

[Figure: Mean behavior of the coefficients for CASE 1; $E\{h_1(n)\}$ versus number of iterations (0 to 20000) for the LMAT, LMS, LMF, and Sign algorithms.]
Example: System Identification Mode (5/6)

CASE 2: Eigenvalue Spread Ratio = 185.8
$x_2(n) = \zeta_2(n) + 1.5\,x_2(n-1) - x_2(n-2) - 0.25\,x_2(n-3)$

[Figure: Mean-squared behavior of the coefficients for CASE 2; MSE in dB versus number of iterations (0 to 20000) for the LMAT, LMS, LMF, and Sign algorithms.]
Example: System Identification Mode (6/6)

[Figure: Mean behavior of the coefficients for CASE 2; $E\{h_1(n)\}$ versus number of iterations (0 to 20000) for the LMAT, LMS, LMF, and Sign algorithms.]
Other Algorithms (1/2)

Signed Regressor Algorithm: $H(n+1) = H(n) + \mu\,\mathrm{sign}\{X(n)\}\, e(n)$

Sign-Sign Algorithm: $H(n+1) = H(n) + \mu\,\mathrm{sign}\{X(n)\}\,\mathrm{sign}\{e(n)\}$

Normalized LMS Algorithm: $H(n+1) = H(n) + \dfrac{\mu}{X^T(n)\, X(n)}\, X(n)\, e(n)$

Complex LMS Algorithm: $H(n+1) = H(n) + \mu\, X^*(n)\, e(n)$
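A minimal NumPy sketch of the Normalized LMS recursion (the function name and the small regularization constant eps are assumptions added here for numerical safety, not from the slides):

```python
import numpy as np

def nlms(x, d, N, mu, eps=1e-8):
    """Minimal Normalized LMS sketch:
    H(n+1) = H(n) + (mu / (X(n)^T X(n) + eps)) * X(n) * e(n)."""
    H = np.zeros(N)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        X = x[n - np.arange(N)]              # X(n) = [x(n), ..., x(n-N+1)]^T
        e[n] = d[n] - H @ X                  # estimation error e(n)
        H = H + (mu / (X @ X + eps)) * X * e[n]   # normalized update
    return H, e
```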
Other Algorithms (2/2)

Hybrid Algorithm #1: LMS + LMF
$\nabla_{H(n)}(n) = \frac{\partial\left\{\phi\, e^2(n) + (1-\phi)\, e^4(n)\right\}}{\partial H(n)}, \qquad 0 \le \phi \le 1$
$H(n+1) = H(n) + \mu\left\{\phi\, X(n)\, e(n) + 2(1-\phi)\, X(n)\, e^3(n)\right\}$

Hybrid Algorithm #2: Sign + LMAT
$\nabla_{H(n)}(n) = \frac{\partial\left\{\phi\, |e(n)| + (1-\phi)\, |e(n)|^3\right\}}{\partial H(n)}, \qquad 0 \le \phi \le 1$
$H(n+1) = H(n) + \mu\left\{\phi\, X(n) + 3(1-\phi)\, X(n)\, e^2(n)\right\}\mathrm{sign}\{e(n)\}$
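A short sketch of the two hybrid per-sample updates above (function and parameter names are illustrative assumptions):

```python
import numpy as np

def hybrid_update(H, X, e, mu, phi=0.5, which=1):
    """One hybrid coefficient update; phi weights the two error criteria."""
    if which == 1:   # LMS + LMF
        f = phi * e + 2.0 * (1.0 - phi) * e**3
    else:            # Sign + LMAT
        f = (phi + 3.0 * (1.0 - phi) * e**2) * np.sign(e)
    return H + mu * X * f
```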
Recursive Least Squares (RLS) Algorithm
RLS Algorithm (1/5)

Cost Function: $\varepsilon(n) = \sum_{i=1}^{n} \beta(n,i)\, e^2(i)$, where n = length of the observable data.

Error signal at time instant i: $e(i) = d(i) - H^T(n)\, X(i)$
The coefficient vector H(n) remains fixed during the observation interval $1 \le i \le n$.

Weighting Factor: $0 < \beta(n,i) \le 1$ (normally $\beta(n,i) = \lambda^{n-i}$, λ = forgetting factor)

By the method of exponentially weighted least squares, we want to minimize
$\varepsilon(n) = \sum_{i=1}^{n} \lambda^{n-i}\, e^2(i)$

Very fast, but computationally very complex.
The algorithm is useful when the number of taps required is small.
RLS Algorithm (2/5)

Normal Equation: $\Phi(n)\, H(n) = \Theta(n)$

where $\Phi(n) = \sum_{i=1}^{n} \lambda^{n-i}\, X(i)\, X^T(i)$ and $\Theta(n) = \sum_{i=1}^{n} \lambda^{n-i}\, d(i)\, X(i)$

We write
$\Phi(n) = \lambda\left[\sum_{i=1}^{n-1} \lambda^{n-1-i}\, X(i)\, X^T(i)\right] + X(n)\, X^T(n) = \lambda\,\Phi(n-1) + X(n)\, X^T(n)$
$\Theta(n) = \lambda\,\Theta(n-1) + d(n)\, X(n)$

Do we need a matrix inversion? ⇒ No!
RLS Algorithm (3/5)

Matrix Inversion Lemma:
If $A = B^{-1} + C D^{-1} C^T$, then $A^{-1} = B - B C\left(D + C^T B C\right)^{-1} C^T B$,
where A and B are N × N positive definite, C is N × M, and D is M × M positive definite.

Letting $A = \Phi(n)$, $B^{-1} = \lambda\,\Phi(n-1)$, $C = X(n)$, $D = 1$, we express $\Phi^{-1}(n)$ in a recursive form:
$\Phi^{-1}(n) = \lambda^{-1}\Phi^{-1}(n-1) - \frac{\lambda^{-2}\,\Phi^{-1}(n-1)\, X(n)\, X^T(n)\, \Phi^{-1}(n-1)}{1 + \lambda^{-1}\, X^T(n)\, \Phi^{-1}(n-1)\, X(n)}$

(The bracketed factor corresponds to the gain vector K(n) introduced on the next slide.)
RLS Algorithm (4/5)

Define $P(n) = \Phi^{-1}(n)$  (N × N)

$K(n) = \dfrac{\lambda^{-1}\, P(n-1)\, X(n)}{1 + \lambda^{-1}\, X^T(n)\, P(n-1)\, X(n)}$  (N × 1)

$\Rightarrow K(n)\left[1 + \lambda^{-1}\, X^T(n)\, P(n-1)\, X(n)\right] = \lambda^{-1}\, P(n-1)\, X(n)$
$\Rightarrow K(n) = \left\{\lambda^{-1}\, P(n-1) - \lambda^{-1}\, K(n)\, X^T(n)\, P(n-1)\right\} X(n)$
$\Rightarrow K(n) = P(n)\, X(n) = \Phi^{-1}(n)\, X(n)$

Therefore, $P(n) = \lambda^{-1}\, P(n-1) - \lambda^{-1}\, K(n)\, X^T(n)\, P(n-1)$
RLS Algorithm (5/5)

Time Update for H(n):
$H(n) = \Phi^{-1}(n)\,\Theta(n) = P(n)\,\Theta(n) = \lambda\, P(n)\,\Theta(n-1) + P(n)\, d(n)\, X(n)$
$\quad = P(n-1)\,\Theta(n-1) - K(n)\, X^T(n)\, P(n-1)\,\Theta(n-1) + K(n)\, d(n)$
$\quad = \Phi^{-1}(n-1)\,\Theta(n-1) - K(n)\, X^T(n)\,\Phi^{-1}(n-1)\,\Theta(n-1) + K(n)\, d(n)$
$\Rightarrow H(n) = H(n-1) + K(n)\left[d(n) - X^T(n)\, H(n-1)\right]$

Innovation ("a priori" estimation error): $\alpha(n) = d(n) - X^T(n)\, H(n-1)$
$H(n) = H(n-1) + K(n)\,\alpha(n)$

A posteriori estimation error: $e(n) = d(n) - X^T(n)\, H(n)$
Summary of the RLS Algorithm

Initialization:
Determine the forgetting factor λ (normally 0.9 ≤ λ < 1)
$P(0) = \delta^{-1} I_N$  (N × N), where δ is a small positive number
$H(0) = 0_N$  (N × 1)

Main Iteration (for each n):
$K(n) = \dfrac{\lambda^{-1}\, P(n-1)\, X(n)}{1 + \lambda^{-1}\, X^T(n)\, P(n-1)\, X(n)}$  (N × 1)
$\alpha(n) = d(n) - X^T(n)\, H(n-1)$  (1 × 1)
$H(n) = H(n-1) + K(n)\,\alpha(n)$  (N × 1)
$P(n) = \lambda^{-1}\, P(n-1) - \lambda^{-1}\, K(n)\, X^T(n)\, P(n-1)$  (N × N)
$e(n) = d(n) - X^T(n)\, H(n)$  (1 × 1, if necessary)
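Closing with a minimal NumPy sketch of this summary (the function name, default forgetting factor, and initialization constant are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def rls(x, d, N, lam=0.99, delta=0.01):
    """Minimal RLS sketch following the summary above:
    per sample, compute K(n), alpha(n), then update H(n) and P(n)."""
    P = np.eye(N) / delta                    # P(0) = delta^{-1} I_N
    H = np.zeros(N)                          # H(0) = 0_N
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        X = x[n - np.arange(N)]              # X(n) = [x(n), ..., x(n-N+1)]^T
        PX = P @ X
        K = PX / (lam + X @ PX)              # gain vector K(n)
        alpha = d[n] - X @ H                 # a priori error (innovation)
        H = H + K * alpha                    # coefficient update
        P = (P - np.outer(K, X @ P)) / lam   # P(n) = lam^{-1}[P(n-1) - K(n) X^T(n) P(n-1)]
        e[n] = d[n] - X @ H                  # a posteriori error (if needed)
    return H, e
```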