ECE 8443 – Pattern Recognition / ECE 8423 – Adaptive Signal Processing
• Objectives: Newton’s Method; Application to LMS; Recursive Least Squares; Exponentially-Weighted RLS; Comparison to LMS
• Resources: Wiki: Recursive Least Squares; Wiki: Newton’s Method; IT: Recursive Least Squares; YE: Kernel-Based RLS
• URL: .../publications/courses/ece_8423/lectures/current/lecture_09.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_09.mp3
LECTURE 09: RECURSIVE LEAST SQUARES
ECE 8423: Lecture 09, Slide 2
Newton’s Method
• The main challenge with the steepest descent approach of the LMS algorithm is its slow and non-uniform convergence.
• Another concern is the use of a single, instantaneous point estimate of the gradient (and we discussed an alternative block estimation approach).
• We can derive a more powerful iterative approach that uses all previous data and is based on Newton’s method for finding the zeroes of a function.
• Consider a function having a single zero:
$$f(x) = 0 \text{ at } x = x^*$$
• Start with an initial guess, $x_0$.
• The next estimate, $x_1$, is obtained by projecting the tangent to the curve to where it crosses the x-axis:
$$f'(x_0) = \frac{f(x_0)}{x_0 - x_1} \quad\Longrightarrow\quad x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$
• The general iterative formula is:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
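The iteration above can be sketched in a few lines of Python. The test function $f(x) = x^2 - 2$ (whose zero is $\sqrt{2}$) and the starting point are illustrative choices, not from the lecture:

```python
# Newton's method for finding a zero of f(x): x_{n+1} = x_n - f(x_n)/f'(x_n)
def newton(f, fprime, x0, n_iter=10):
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / fprime(x)  # project the tangent to the x-axis
    return x

# Illustrative example: the zero of f(x) = x^2 - 2 is sqrt(2).
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
```

Convergence is quadratic near the root, so a handful of iterations reaches machine precision.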
ECE 8423: Lecture 09, Slide 3
Application to Adaptive Filtering
• To apply this to the problem of least-squares minimization, we must find the zero of the gradient of the mean-squared error.
• Since the mean-squared error is a quadratic function, the gradient is linear, and hence convergence takes place in a single step.
• Noting that $e(n) = d(n) - \mathbf{f}^t\mathbf{x}_n$, recall our error function:
$$J = E[e^2(n)] = E\left[\left(d(n) - \mathbf{f}^t\mathbf{x}_n\right)^2\right] = \sigma_d^2 - 2\,\mathbf{f}^t\mathbf{g} + \mathbf{f}^t\mathbf{R}\,\mathbf{f}$$
• We find the optimal solution by equating the gradient of the error to zero:
$$\nabla J = -2\mathbf{g} + 2\mathbf{R}\mathbf{f} = \mathbf{0}$$
and the optimum solution is:
$$\mathbf{f}^* = \mathbf{R}^{-1}\mathbf{g}$$
• We can demonstrate that the Newton algorithm is given by:
$$\mathbf{f}_n = \mathbf{f}_{n-1} - \tfrac{1}{2}\,\mathbf{R}^{-1}\nabla J$$
by substituting our expression for the gradient:
$$\mathbf{f}_n = \mathbf{f}_{n-1} - \tfrac{1}{2}\,\mathbf{R}^{-1}\left(2\mathbf{R}\mathbf{f}_{n-1} - 2\mathbf{g}\right) = \mathbf{R}^{-1}\mathbf{g} = \mathbf{f}^*$$
• Note that this still requires an estimate of the autocorrelation and the derivative.
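The single-step convergence claim is easy to verify numerically. In the sketch below, R and g are small illustrative values (not from the lecture); one Newton step from an arbitrary starting point lands exactly on the Wiener solution:

```python
import numpy as np

# One Newton step on the quadratic error surface:
# f_1 = f_0 - (1/2) R^{-1} grad J = R^{-1} g = f*, for ANY starting point f_0.
R = np.array([[2.0, 0.5], [0.5, 1.0]])    # illustrative autocorrelation matrix
g = np.array([1.0, 0.3])                  # illustrative cross-correlation vector

f0 = np.zeros(2)                          # arbitrary initial guess
grad = -2 * g + 2 * R @ f0                # gradient of J at f0
f1 = f0 - 0.5 * np.linalg.solve(R, grad)  # one Newton step
f_star = np.linalg.solve(R, g)            # optimum solution R^{-1} g
```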
ECE 8423: Lecture 09, Slide 4
Estimating the Gradient
• In practice, we can use an estimate of the gradient, $\hat{\nabla} J = -2\,e(n)\,\mathbf{x}_n$, as we did for the LMS gradient. The update equation becomes:
$$\mathbf{f}_n = \mathbf{f}_{n-1} - \tfrac{1}{2}\,\mathbf{R}^{-1}\left(-2\,e(n)\,\mathbf{x}_n\right) = \mathbf{f}_{n-1} + e(n)\,\mathbf{R}^{-1}\mathbf{x}_n$$
• The noisy estimate of the gradient will produce excess mean-squared error. To combat this, we can introduce an adaptation constant:
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \mu\,e(n)\,\mathbf{R}^{-1}\mathbf{x}_n, \qquad 0 < \mu \le 1$$
Of course, convergence no longer occurs in one step; we are somewhat back to where we started with the iterative LMS algorithm, and we still have to worry about estimating the autocorrelation matrix.
• To compare this solution to the LMS algorithm, we can rewrite the update equation in terms of the error signal:
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \mu\left(d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n\right)\mathbf{R}^{-1}\mathbf{x}_n = \left(\mathbf{I} - \mu\,\mathbf{R}^{-1}\mathbf{x}_n\mathbf{x}_n^t\right)\mathbf{f}_{n-1} + \mu\,d(n)\,\mathbf{R}^{-1}\mathbf{x}_n$$
• Taking the expectation of both sides, and invoking independence:
$$E[\mathbf{f}_n] = (1 - \mu)\,E[\mathbf{f}_{n-1}] + \mu\,\mathbf{R}^{-1}\mathbf{g}$$
• Note that if $\mu = 1$: $E[\mathbf{f}_n] = \mathbf{R}^{-1}\mathbf{g}$. Our solution is simply a weighted average of the previous value and the newest estimate.
ECE 8423: Lecture 09, Slide 5
Analysis of Convergence
• Once again we can define an error vector:
$$\mathbf{u}_n = E[\mathbf{f}_n] - \mathbf{f}^*, \qquad \mathbf{u}_n = (1 - \mu)\,\mathbf{u}_{n-1}$$
• The solution to this first-order difference equation is:
$$\mathbf{u}_n = (1 - \mu)^n\,\mathbf{u}_0$$
• We can observe the following: The algorithm converges in the mean provided $|1 - \mu| < 1$, or $0 < \mu < 2$. Convergence proceeds exponentially at a rate determined by $(1 - \mu)$. The convergence rate of each coefficient is identical and independent of the eigenvalue spread of the autocorrelation matrix, R.
• The last point is a crucial difference between the Newton algorithm and LMS.
• We still need to worry about our estimate of the autocorrelation matrix, R:
$$\mathbf{R}_n(i,j) = \sum_{l=0}^{n} x(l-i)\,x(l-j)$$
and we assume x(n) = 0 for n < 0. We can write an update equation as a function of n:
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \mu\,e(n)\,\mathbf{R}_n^{-1}\mathbf{x}_n, \qquad \text{where } \mathbf{R}_n = \sum_{l=0}^{n}\mathbf{x}_l\mathbf{x}_l^t$$
ECE 8423: Lecture 09, Slide 6
Estimation of the Autocorrelation Matrix and Its Inverse
• The effort to estimate the autocorrelation matrix and its inverse is still considerable. We can easily derive an update equation for the autocorrelation:
$$\mathbf{R}_n = \mathbf{R}_{n-1} + \mathbf{x}_n\mathbf{x}_n^t$$
• To reduce the computational complexity of the inverse, we can invoke the matrix inversion lemma:
$$\left(\mathbf{A} + \mathbf{u}\mathbf{u}^t\right)^{-1} = \mathbf{A}^{-1} - \frac{\mathbf{A}^{-1}\mathbf{u}\mathbf{u}^t\mathbf{A}^{-1}}{1 + \mathbf{u}^t\mathbf{A}^{-1}\mathbf{u}}$$
• Applying this to the update equation for the autocorrelation function:
$$\mathbf{R}_n^{-1} = \mathbf{R}_{n-1}^{-1} - \frac{\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}}{1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}$$
• Note that no matrix inversions are required ($\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$ is a scalar).
• The computation is proportional to L² rather than L³ for the inverse.
• The autocorrelation is never calculated; its estimate is simply updated.
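The rank-one inverse update can be checked numerically against a direct matrix inversion; the matrix and data vector below are illustrative values, not from the lecture:

```python
import numpy as np

# Rank-one update of an inverse via the matrix inversion lemma:
# (R + x x^t)^{-1} = R^{-1} - (R^{-1} x x^t R^{-1}) / (1 + x^t R^{-1} x)
rng = np.random.default_rng(0)
L = 4
X = rng.standard_normal((50, L))
R = X.T @ X + np.eye(L)        # a well-conditioned SPD matrix to start from
R_inv = np.linalg.inv(R)

x = rng.standard_normal(L)     # new data vector x_n
Rx = R_inv @ x                 # R^{-1} x (R_inv is symmetric)
R_inv_new = R_inv - np.outer(Rx, Rx) / (1.0 + x @ Rx)  # denominator is a scalar
```

Only matrix-vector products and an outer product are needed, which is the source of the O(L²) cost quoted above.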
ECE 8423: Lecture 09, Slide 7
Summary of the Overall Algorithm
1) Initialize $\mathbf{f}_0$, $\mathbf{R}_{-1}^{-1}$.
2) Iterate for n = 0, 1, …
$$\mathbf{R}_n^{-1} = \mathbf{R}_{n-1}^{-1} - \frac{\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}}{1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}$$
$$e(n) = d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n$$
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \mu\,e(n)\,\mathbf{R}_n^{-1}\mathbf{x}_n$$
There are a few approaches to the initialization in step (1). The most straightforward thing to do is:
$$\mathbf{R}_{-1} = \sigma^2\,\mathbf{I}$$
where σ² is chosen to be a small positive constant (and can often be estimated based on a priori knowledge of the environment).
This approach has been superseded by recursive-in-time least squares solutions, which we will study next.
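The two steps above can be sketched directly in code. The system-identification setup (the unknown filter h, the signal lengths, and the parameter values) is our illustrative assumption, not from the slides:

```python
import numpy as np

# Sketch of the overall algorithm: initialize f_0 and R^{-1}, then iterate
# the inverse update, the error, and the coefficient update per sample.
rng = np.random.default_rng(1)
L, N, mu, sigma2 = 4, 200, 0.5, 0.01
h = np.array([1.0, -0.5, 0.25, 0.1])       # "unknown" FIR system (illustrative)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N]                  # desired signal d(n) (noise-free here)

f = np.zeros(L)                            # 1) initialize f_0 ...
R_inv = (1.0 / sigma2) * np.eye(L)         # ... and R_{-1} = sigma^2 I inverted
for n in range(N):                         # 2) iterate
    xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])
    Rx = R_inv @ xn                        # matrix-inversion-lemma update of R^{-1}
    R_inv -= np.outer(Rx, Rx) / (1.0 + xn @ Rx)
    e = d[n] - f @ xn                      # e(n) = d(n) - f_{n-1}^t x_n
    f += mu * e * (R_inv @ xn)             # f_n = f_{n-1} + mu e(n) R_n^{-1} x_n
```

With noise-free data the coefficient vector settles onto h; the mean error contracts by roughly (1 − μ) per iteration, as derived on the previous slide.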
ECE 8423: Lecture 09, Slide 8
Recursive Least Squares (RLS)
• Consider minimization of a finite duration version of the error:
$$J_n = \sum_{l=0}^{n} e^2(l) = \sum_{l=0}^{n}\left(d(l) - y(l)\right)^2$$
• The objective of the RLS algorithm is to maintain a solution which is optimal at each iteration. Differentiation of the error leads to the normal equation:
$$\mathbf{R}_n\mathbf{f}_n = \mathbf{g}_n, \qquad \mathbf{R}_n(i,j) = \sum_{l=0}^{n} x(l-i)\,x(l-j), \qquad \mathbf{g}_n(i) = \sum_{l=0}^{n} d(l)\,x(l-i)$$
• Note that we can now write recursive-in-time equations for R and g:
$$\mathbf{R}_n = \mathbf{R}_{n-1} + \mathbf{x}_n\mathbf{x}_n^t, \qquad \mathbf{g}_n = \mathbf{g}_{n-1} + d(n)\,\mathbf{x}_n$$
• We seek solutions of the form:
$$\mathbf{f}_n = \mathbf{R}_n^{-1}\mathbf{g}_n, \qquad \mathbf{f}_{n-1} = \mathbf{R}_{n-1}^{-1}\mathbf{g}_{n-1}$$
• We can apply the matrix inversion lemma for computation of the inverse:
$$\mathbf{R}_n^{-1} = \mathbf{R}_{n-1}^{-1} - \frac{\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}}{1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}$$
ECE 8423: Lecture 09, Slide 9
Recursive Least Squares (Cont.)
• Substituting the recursions for $\mathbf{R}_n^{-1}$ and $\mathbf{g}_n$ into $\mathbf{f}_n = \mathbf{R}_n^{-1}\mathbf{g}_n$:
$$\mathbf{f}_n = \left[\mathbf{R}_{n-1}^{-1} - \frac{\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}}{1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}\right]\left[\mathbf{g}_{n-1} + d(n)\,\mathbf{x}_n\right]$$
• Define an intermediate vector variable, $\mathbf{z}_n = \mathbf{R}_{n-1}^{-1}\mathbf{x}_n$. Expanding the product and collecting terms:
$$\mathbf{f}_n = \mathbf{f}_{n-1} + d(n)\,\mathbf{z}_n - \frac{\mathbf{z}_n\mathbf{x}_n^t\mathbf{f}_{n-1} + d(n)\,\mathbf{z}_n\mathbf{x}_n^t\mathbf{z}_n}{1 + \mathbf{x}_n^t\mathbf{z}_n} = \mathbf{f}_{n-1} + \frac{d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n}{1 + \mathbf{x}_n^t\mathbf{z}_n}\,\mathbf{z}_n$$
• Define another intermediate scalar variable, $k_n = 1 + \mathbf{x}_n^t\mathbf{z}_n$.
• Define the a priori error as:
$$e(n/n-1) = d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n$$
reflecting that this is the error obtained using the old filter and the new data.
• Using these definitions, we can rewrite the RLS algorithm update equation as:
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \frac{e(n/n-1)}{k_n}\,\mathbf{z}_n = \mathbf{f}_{n-1} + \frac{e(n/n-1)}{1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$$
ECE 8423: Lecture 09, Slide 10
Summary of the RLS Algorithm
1) Initialize $\mathbf{f}_0$, $\mathbf{R}_{-1}^{-1}$.
2) Iterate for n = 0, 1, …
$$e(n/n-1) = d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n$$
$$\alpha(n) = 1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$$
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \alpha^{-1}(n)\,e(n/n-1)\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$$
$$\mathbf{R}_n^{-1} = \mathbf{R}_{n-1}^{-1} - \alpha^{-1}(n)\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}$$
• Compare this to the Newton method:
$$\mathbf{R}_n^{-1} = \mathbf{R}_{n-1}^{-1} - \frac{\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}}{1 + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n}$$
$$e(n) = d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n, \qquad \mathbf{f}_n = \mathbf{f}_{n-1} + \mu\,e(n)\,\mathbf{R}_n^{-1}\mathbf{x}_n$$
• The RLS algorithm can be expected to converge more quickly because of its use of an aggressive, adaptive step size.
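The two-step summary above maps almost line-for-line onto code. The system-identification setup below (the unknown filter, signal lengths, noise level, and the initialization constant for R⁻¹) is our illustrative assumption:

```python
import numpy as np

# Minimal RLS sketch: a priori error, alpha(n), coefficient update, R^{-1} update.
rng = np.random.default_rng(2)
L, N = 4, 300
h = np.array([0.8, -0.4, 0.2, -0.1])       # "unknown" FIR system (illustrative)

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.001 * rng.standard_normal(N)  # slightly noisy d(n)

f = np.zeros(L)                            # 1) initialize f_0 ...
P = 100.0 * np.eye(L)                      # ... and R_{-1}^{-1} (a large multiple of I)
for n in range(N):                         # 2) iterate
    xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])
    e_prior = d[n] - f @ xn                # a priori error e(n/n-1)
    z = P @ xn                             # z_n = R_{n-1}^{-1} x_n
    alpha = 1.0 + xn @ z                   # alpha(n), a scalar
    f += (e_prior / alpha) * z             # f_n = f_{n-1} + alpha^{-1} e z_n
    P -= np.outer(z, z) / alpha            # R_n^{-1} update
```

Note that the gain α⁻¹(n)z_n plays the role of the fixed step size μ in the Newton update, which is why no step-size tuning is needed.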
ECE 8423: Lecture 09, Slide 11
Exponentially-Weighted RLS Algorithm
• We can define a weighted error function:
$$\tilde{J}_n = \sum_{l=0}^{n} \lambda^{n-l}\,e^2(l), \qquad 0 < \lambda \le 1$$
This gives more weight to the most recent errors.
• The RLS algorithm can be modified in this case:
1) Initialize $\mathbf{f}_0$, $\mathbf{R}_{-1}^{-1}$.
2) Iterate for n = 1, 2, …
$$e(n/n-1) = d(n) - \mathbf{f}_{n-1}^t\mathbf{x}_n$$
$$\alpha(n) = \lambda + \mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$$
$$\mathbf{f}_n = \mathbf{f}_{n-1} + \alpha^{-1}(n)\,e(n/n-1)\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n$$
$$\mathbf{R}_n^{-1} = \frac{1}{\lambda}\left[\mathbf{R}_{n-1}^{-1} - \alpha^{-1}(n)\,\mathbf{R}_{n-1}^{-1}\mathbf{x}_n\mathbf{x}_n^t\mathbf{R}_{n-1}^{-1}\right]$$
• RLS is computationally more complex than simple LMS because it is O(L²) per iteration rather than O(L).
• In principle, convergence is independent of the eigenvalue structure of the signal due to the premultiplication by the inverse of the autocorrelation matrix.
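Exponential weighting changes only two lines relative to the standard RLS loop: λ enters α(n) and scales the R⁻¹ update. The tracking scenario below (a system that switches halfway through the data) is our illustration of why forgetting old data matters; all names and values are assumptions:

```python
import numpy as np

# Exponentially-weighted RLS with forgetting factor lam (0 < lam <= 1).
rng = np.random.default_rng(3)
L, N, lam = 3, 400, 0.95
h1 = np.array([1.0, 0.5, -0.3])            # system before the switch (illustrative)
h2 = np.array([-0.2, 0.9, 0.4])            # system after the switch (illustrative)

f = np.zeros(L)
P = 100.0 * np.eye(L)                      # R_{-1}^{-1}
x = rng.standard_normal(N)
for n in range(N):
    h = h1 if n < N // 2 else h2           # the unknown system changes mid-stream
    xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])
    d = h @ xn
    e_prior = d - f @ xn                   # e(n/n-1)
    z = P @ xn
    alpha = lam + xn @ z                   # alpha(n) = lam + x^t R^{-1} x
    f += (e_prior / alpha) * z
    P = (P - np.outer(z, z) / alpha) / lam # R_n^{-1} = (1/lam)[ ... ]
```

With λ = 0.95 the effective memory is on the order of 1/(1 − λ) = 20 samples, so the filter re-converges to the new system shortly after the switch; with λ = 1 (plain RLS) the stale pre-switch data would keep biasing the solution.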
ECE 8423: Lecture 09, Slide 12
Example: LMS and RLS Comparison
• An IID sequence, x(n), is input to a filter:
$$H(z) = 1 + 0.5z^{-1}$$
• Measurement noise was assumed to be zero-mean Gaussian noise with unit variance, and a gain such that the SNR was 40 dB.
• The norm of the coefficient error vector is plotted in the top figure for 1000 trials.
• The filter length, L, was set to 8; the LMS adaptation constant, μ, was set to 0.05.
• The adaptation step size was set to the largest value for which the LMS algorithm would give stable results, and yet the RLS algorithm still outperforms LMS.
• The lower figure corresponds to the same analysis with an input sequence:
$$x(n) = w(n) + 0.8\,w(n-1)$$
Why is performance in this case degraded?
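A single-trial, qualitative version of this experiment can be run as follows. The "unknown" system is our own random choice, and there is no 1000-trial averaging, so only the stated parameters (L = 8, μ = 0.05, white IID input, roughly 40 dB SNR) are taken from the slide:

```python
import numpy as np

# Single-trial LMS vs. RLS comparison on a system-identification task.
rng = np.random.default_rng(4)
L, N, mu = 8, 500, 0.05
h = rng.standard_normal(L)                 # "unknown" system (our choice)
x = rng.standard_normal(N)                 # IID input x(n)
d_clean = np.convolve(x, h)[:N]
noise = rng.standard_normal(N)
noise *= np.sqrt(np.var(d_clean) / np.var(noise) * 10 ** (-40 / 10))  # ~40 dB SNR
d = d_clean + noise

f_lms, f_rls = np.zeros(L), np.zeros(L)
P = 100.0 * np.eye(L)                      # R_{-1}^{-1} for RLS
err_lms, err_rls = [], []
for n in range(N):
    xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])
    f_lms += mu * (d[n] - f_lms @ xn) * xn # plain LMS update
    z = P @ xn                             # RLS update, as summarized earlier
    alpha = 1.0 + xn @ z
    f_rls += ((d[n] - f_rls @ xn) / alpha) * z
    P -= np.outer(z, z) / alpha
    err_lms.append(np.linalg.norm(f_lms - h))  # norm of the coefficient error
    err_rls.append(np.linalg.norm(f_rls - h))
```

Plotting the two error curves reproduces the qualitative behavior described above: RLS drops to the noise floor within a few multiples of L samples, while LMS descends more slowly. Replacing x with the correlated input of the lower figure widens the eigenvalue spread of R, which slows LMS but (in principle) not RLS.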
ECE 8423: Lecture 09, Slide 13
Summary
• Introduced Newton’s method as an alternative to simple LMS.
• Derived the update equations for this approach.
• Introduced the Recursive Least Squares (RLS) approach and an exponentially-weighted version of RLS.
• Briefly discussed convergence and computational complexity.
• Next: IIR adaptive filters.