ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing
• Objectives:
  The Linear Prediction Model
  The Autocorrelation Method
  Levinson and Durbin Recursions
  Spectral Modeling
  Inverse Filtering and Deconvolution
• Resources:
  ECE 4773: Intro to DSP
  ECE 8463: Fund. of Speech
  WIKI: Minimum Phase
  Markel and Gray: Linear Prediction
  Deller: DT Processing of Speech
  AJR: LP Modeling of Speech
  MC: MATLAB Demo
• URL: .../publications/courses/ece_8423/lectures/current/lecture_04.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_04.mp3
LECTURE 04: LINEAR PREDICTION
ECE 8423: Lecture 04, Slide 2
The Linear Prediction (LP) Model
• Consider a pth-order linear prediction model:

  \hat{x}(n) = \sum_{i=1}^{p} a_i\, x(n - n_0 - i)

  Without loss of generality, assume n_0 = 0.
• The prediction error is defined as:

  e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i\, x(n-i)

(Block diagram: x(n) drives the predictor \{a_i\} to produce \hat{x}(n); a summing junction forms e(n) = x(n) - \hat{x}(n).)
• We can define an objective function:

  J = E\{e^2(n)\}
    = E\left\{\left[x(n) - \sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}
    = E\{x^2(n)\} - 2E\left\{x(n)\sum_{i=1}^{p} a_i x(n-i)\right\} + E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}
    = E\{x^2(n)\} - 2\sum_{i=1}^{p} a_i E\{x(n)\,x(n-i)\} + E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}

  where e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i x(n-i).
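The error and its energy can be checked numerically. The sketch below (the helper name `prediction_error` is illustrative, not from the lecture) forms e(n) with plain NumPy and estimates J = E{e²(n)} as a sample mean, assuming a zero-padded history for n < p:

```python
import numpy as np

# Form e(n) = x(n) - sum_{i=1}^{p} a_i x(n-i); samples with n < p use a
# zero-padded history. The function name is illustrative, not from the slides.
def prediction_error(x, a):
    e = x.astype(float)
    for i in range(1, len(a) + 1):
        e[i:] -= a[i - 1] * x[:-i]
    return e

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
a = np.array([0.5, -0.25])   # arbitrary example coefficients
e = prediction_error(x, a)
J = np.mean(e ** 2)          # sample estimate of E{e^2(n)}
```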
ECE 8423: Lecture 04, Slide 3
Minimization of the Objective Function
• Differentiate w.r.t. a_l:
  \frac{\partial J}{\partial a_l}
    = \frac{\partial}{\partial a_l} E\{x^2(n)\} - 2\frac{\partial}{\partial a_l}\sum_{i=1}^{p} a_i E\{x(n)\,x(n-i)\} + \frac{\partial}{\partial a_l} E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}
    = -2E\{x(n)\,x(n-l)\} + 2E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right] x(n-l)\right\} = 0
• Rearranging terms:
  E\left\{\left[\sum_{i=1}^{p} a_i x(n-i)\right] x(n-l)\right\} = E\{x(n)\,x(n-l)\}
• Interchanging the order of summation and expectation on the left (valid because expectation is a linear operator):
  \sum_{i=1}^{p} a_i E\{x(n-i)\,x(n-l)\} = E\{x(n)\,x(n-l)\}
• Define a covariance function:
  c(i, j) = E\{x(n-i)\,x(n-j)\}
ECE 8423: Lecture 04, Slide 4
The Yule-Walker Equations (aka Normal Equations)
• We can rewrite our prediction equation as:
  \sum_{i=1}^{p} a_i E\{x(n-i)\,x(n-l)\} = E\{x(n)\,x(n-l)\}
  \quad\Longrightarrow\quad
  \sum_{i=1}^{p} a_i\, c(i, l) = c(0, l), \qquad l = 1, 2, \ldots, p
• This is known as the Yule-Walker equation. Its solution produces what we refer to as the Covariance Method for linear prediction.
  a_1 c(1,1) + a_2 c(2,1) + \cdots + a_p c(p,1) = c(0,1)
  a_1 c(1,2) + a_2 c(2,2) + \cdots + a_p c(p,2) = c(0,2)
  \vdots
  a_1 c(1,p) + a_2 c(2,p) + \cdots + a_p c(p,p) = c(0,p)
• We can write this set of p equations in matrix form:

  C a = c, \quad \text{where} \quad
  C = \begin{bmatrix} c(1,1) & c(2,1) & \cdots & c(p,1) \\ c(1,2) & c(2,2) & \cdots & c(p,2) \\ \vdots & & & \vdots \\ c(1,p) & c(2,p) & \cdots & c(p,p) \end{bmatrix}, \quad
  a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}, \quad
  c = \begin{bmatrix} c(0,1) \\ c(0,2) \\ \vdots \\ c(0,p) \end{bmatrix}

and can easily solve for the prediction coefficients:

  a = C^{-1} c
• Note that the covariance matrix is symmetric: c(i, j) = c(j, i), e.g. c(1,2) = c(2,1).
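As a sketch (names and test signal are illustrative, not from the lecture), the Covariance Method can be implemented directly: estimate c(i, j) over one analysis frame and solve C a = c with a general linear solver:

```python
import numpy as np

# Covariance Method sketch: estimate c(i, j) = E{x(n-i) x(n-j)} by summing
# over n = p..N-1 (so every lag stays inside the frame), then solve C a = c.
def covariance_lpc(x, p):
    N = len(x)
    c = np.empty((p + 1, p + 1))
    for i in range(p + 1):
        for j in range(p + 1):
            c[i, j] = np.dot(x[p - i:N - i], x[p - j:N - j])
    C = c[1:, 1:]      # C[l-1, i-1] = c(i, l); symmetric
    rhs = c[0, 1:]     # right-hand side: c(0, l)
    return np.linalg.solve(C, rhs)

# Illustrative check on a synthetic AR(2) signal:
# x(n) = 0.6 x(n-1) - 0.3 x(n-2) + w(n), w(n) white noise.
rng = np.random.default_rng(1)
n = 4000
x = np.zeros(n)
w = rng.standard_normal(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + w[t]
a = covariance_lpc(x, 2)   # expected to lie near [0.6, -0.3]
```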
ECE 8423: Lecture 04, Slide 5
Autocorrelation Method
• C is a covariance matrix, which means it has some special properties:
  Symmetric: under what conditions does its inverse exist?
  Fast inversion: we can factor this matrix into upper and lower triangular matrices and derive a fast algorithm for inversion known as the Cholesky decomposition.
• If we assume stationary inputs, we can convert covariances to correlations:
  R a = r:
  \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & & & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}
  \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} =
  \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(p) \end{bmatrix}

  where, by stationarity, c(i, j) depends only on the lag: r(|i - j|) = c(i, j).
• This is known as the Autocorrelation Method. This matrix is symmetric, but is also Toeplitz, which means the inverse can be performed efficiently using an iterative algorithm we will introduce shortly.
• Note that the Covariance Method requires p(p+1)/2 unique values for the symmetric matrix, and p values for the associated vector. A fast algorithm, known as the Factored Covariance Algorithm, exists to compute C.
• The Autocorrelation method requires p+1 values to produce p LP coefficients.
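A minimal sketch of the Autocorrelation Method (function name and test signal are illustrative; a plain `np.linalg.solve` stands in for the fast Toeplitz solver introduced shortly):

```python
import numpy as np

# Autocorrelation Method sketch: only r(0)..r(p) are needed. R is Toeplitz,
# indexed by |i - j|, which is what enables the fast recursion covered later.
def autocorr_lpc(x, p):
    N = len(x)
    r = np.array([np.dot(x[:N - k], x[k:]) for k in range(p + 1)]) / N
    R = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]  # R[i,j] = r(|i-j|)
    return np.linalg.solve(R, r[1:p + 1]), r

# Same illustrative AR(2) signal as before.
rng = np.random.default_rng(2)
n = 4000
x = np.zeros(n)
w = rng.standard_normal(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + w[t]
a, r = autocorr_lpc(x, 2)   # expected to lie near [0.6, -0.3]
```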
ECE 8423: Lecture 04, Slide 6
Linear Prediction Error
• Recall our expression for J, the prediction error energy:

  J = E\{e^2(n)\} = E\{[x(n) - \hat{x}(n)]^2\} = E\left\{\left[x(n) - \sum_{i=1}^{p} a_i x(n-i)\right]^2\right\}

• We can substitute our expression for the predictor coefficients, and show:

  J = r(0) - \sum_{i=1}^{p} a_i\, r(i) \qquad \text{(Autocorrelation Method)}
  J = c(0,0) - \sum_{i=1}^{p} a_i\, c(0,i) \qquad \text{(Covariance Method)}

• These relations are significant because they show the error obeys the same linear prediction equation that we applied to the signal. This result has two interesting implications:
  Missing values of the autocorrelation function can be calculated using this relation under certain assumptions (e.g., maximum entropy).
  The autocorrelation function shares many properties with the linear prediction model (e.g., minimum phase). In fact, the two representations are interchangeable.
ECE 8423: Lecture 04, Slide 7
Linear Filter Interpretation of Linear Prediction
• Recall our expression for the error signal:

  e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{i=1}^{p} a_i\, x(n-i)
• We can rewrite this using the z-Transform:

  E(z) = Z\{e(n)\} = Z\left\{x(n) - \sum_{i=1}^{p} a_i x(n-i)\right\} = X(z) - \sum_{i=1}^{p} a_i z^{-i} X(z) = X(z)\left[1 - \sum_{i=1}^{p} a_i z^{-i}\right]

• This implies we can view the computation of the error as a filtering process:

  E(z) = A(z) X(z), \quad \text{where } A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}

  (Diagram: x(n) → H(z) = A(z) → e(n))

• This, of course, implies we can invert the process and generate the original signal from the error signal:

  X(z) = E(z) / A(z)

  (Diagram: e(n) → H(z) = 1/A(z) → x(n))

• This rather remarkable view of the process exposes some important questions about the nature of this filter:
  A(z) is an FIR filter. Under what conditions is it minimum phase?
  Under what conditions is the inverse, 1/A(z), stable?
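The analysis/synthesis pair can be sketched with `scipy.signal.lfilter` (the coefficients below are illustrative): A(z) is applied as an FIR filter, and 1/A(z) as its all-pole inverse, reconstructing x(n) from e(n):

```python
import numpy as np
from scipy.signal import lfilter

a = np.array([0.6, -0.3])        # illustrative predictor coefficients
A = np.concatenate(([1.0], -a))  # A(z) = 1 - a1 z^-1 - ... - ap z^-p

rng = np.random.default_rng(3)
x = rng.standard_normal(128)
e = lfilter(A, [1.0], x)         # analysis:  E(z) = A(z) X(z)   (FIR)
x_rec = lfilter([1.0], A, e)     # synthesis: X(z) = E(z)/A(z)   (all-pole IIR)
```

With zero initial conditions the round trip is exact, and the synthesis filter is stable here because the zeros of A(z) lie inside the unit circle.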
ECE 8423: Lecture 04, Slide 8
Residual Error
• To the right are some examples of the linear prediction error for voiced speech signals.
• The points where the prediction error peaks are points in the signal where the signal is least predictable by a linear prediction model. In the case of voiced speech, this relates to the manner in which the signal is produced.
• Speech compression and synthesis systems exploit the linear prediction model as a first-order attempt to remove redundancy from the signal.
• The LP model is independent of the energy of the input signal. It is also independent of the phase of the input signal because the LP filter is a minimum phase filter.
ECE 8423: Lecture 04, Slide 9
Durbin Recursion
• There are several efficient algorithms to compute the LP coefficients without doing a matrix inverse. One of the most popular and insightful is known as the Durbin recursion:
  E^{(0)} = r(0)
  \text{For } i = 1, 2, \ldots, p:
    k_i = \left[ r(i) - \sum_{j=1}^{i-1} a_j^{(i-1)}\, r(i-j) \right] / E^{(i-1)}
    a_i^{(i)} = k_i
    a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad j = 1, \ldots, i-1
    E^{(i)} = (1 - k_i^2)\, E^{(i-1)}
• The intermediate coefficients, {k_i}, are referred to as reflection coefficients. To compute a pth-order model, all orders from 1 to p are computed.
• This recursion is significant for several reasons:
  The error energy decreases as the LP order increases, indicating the model continually improves.
  There is a one-to-one mapping between {r_i}, {k_i}, and {a_i}.
  For the LP filter to be stable, |k_i| < 1. Note that the Autocorrelation Method guarantees the filter to be stable; the Covariance Method does not.
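The recursion transcribes almost line-for-line into code. This sketch (the function name `durbin` is illustrative) follows the indexing above and returns the predictor coefficients, reflection coefficients, and final error energy:

```python
import numpy as np

# Durbin recursion: r holds r(0)..r(p); returns ({a_i}, {k_i}, E^(p)).
def durbin(r, p):
    a = np.zeros(p)
    k = np.zeros(p)
    E = r[0]
    for i in range(1, p + 1):
        ki = (r[i] - np.dot(a[:i - 1], r[i - 1:0:-1])) / E
        k[i - 1] = ki
        prev = a[:i - 1].copy()
        a[:i - 1] = prev - ki * prev[::-1]  # a_j <- a_j - k_i a_{i-j}
        a[i - 1] = ki
        E *= 1.0 - ki * ki                  # error energy shrinks each order
    return a, k, E

# Small worked example: r = [1, 0.5, 0.25] gives a = [0.5, 0], k = [0.5, 0].
a, k, E = durbin(np.array([1.0, 0.5, 0.25]), 2)
```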
ECE 8423: Lecture 04, Slide 10
The Burg Algorithm
• Digital filters can be implemented using many different forms. One very important and popular form is a lattice filter, shown to the right.
• Itakura showed the {ki}’s can be computed directly:
  k_i = \frac{\sum_{m=0}^{N-1} e_{i-1}(m)\, b_{i-1}(m-1)}
             {\left[ \sum_{m=0}^{N-1} e_{i-1}^2(m) \;\sum_{m=0}^{N-1} b_{i-1}^2(m-1) \right]^{1/2}}
• Burg demonstrated that the LP approach can be viewed as a maximum entropy spectral estimate, and derived an expression for the reflection coefficients that guarantees |k_i| ≤ 1:

  k_i = \frac{2 \sum_{m=0}^{N-1} e_{i-1}(m)\, b_{i-1}(m-1)}
             {\sum_{m=0}^{N-1} e_{i-1}^2(m) + \sum_{m=0}^{N-1} b_{i-1}^2(m-1)}

• Makhoul showed that a family of lattice-based formulations exist.
• Most importantly, the filter coefficients can be updated in real-time in O(n).
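One way to see the |k_i| ≤ 1 guarantee is to run the lattice directly. This sketch (the function name is illustrative, and sign conventions for the lattice vary across texts) applies Burg's formula stage by stage to the forward and backward errors:

```python
import numpy as np

# One Burg lattice stage per model order: k_i is the ratio above, and by the
# AM-GM inequality 2|sum(e*b)| <= sum(e^2) + sum(b^2), so |k_i| <= 1 always.
def burg_reflections(x, p):
    e = x.astype(float)          # forward error  e_0(m) = x(m)
    b = x.astype(float)          # backward error b_0(m) = x(m)
    k = np.zeros(p)
    for i in range(p):
        ef, bf = e[1:], b[:-1]   # align e_{i-1}(m) with b_{i-1}(m-1)
        k[i] = 2.0 * np.dot(ef, bf) / (np.dot(ef, ef) + np.dot(bf, bf))
        # Lattice update: new forward and backward errors.
        e, b = ef - k[i] * bf, bf - k[i] * ef
    return k

rng = np.random.default_rng(4)
k = burg_reflections(rng.standard_normal(64), 4)
```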
ECE 8423: Lecture 04, Slide 11
The Autoregressive Model
• Suppose we model our signal as the output of a linear filter with a white noise input:

  (Diagram: w(n) → H(z) = 1/A(z) → x(n))

• The inverse LP filter can be thought of as an all-pole (IIR) filter:

  H(z) = \frac{1}{A(z)} = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}}

• This is referred to as an autoregressive (AR) model.
• If the system is actually a mixed model, referred to as an autoregressive moving average (ARMA) model:

  H(z) = \frac{B(z)}{A(z)} = \frac{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_q z^{-q}}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}}
• The LP model can still approximate such a system because:

  \frac{1}{1 - a_1 z^{-1}} = 1 + a_1 z^{-1} + a_1^2 z^{-2} + \cdots

  i.e., a single pole is equivalent to an infinite series of zeros and, conversely, a zero can be approximated by a sufficiently high-order all-pole model.
Hence, even if the system has poles and zeroes, the LP model is capable of approximating the system’s overall impulse or frequency response.
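This approximation can be checked numerically. In the sketch below (the values of b and the order M are illustrative), a single-zero FIR system is matched by a truncated all-pole expansion, and the impulse responses agree closely:

```python
import numpy as np
from scipy.signal import lfilter

# A single zero (1 - b z^-1) equals 1 / (1 + b z^-1 + b^2 z^-2 + ...), so a
# high-order all-pole (LP) model can absorb it; truncate the series at order M.
b, M = 0.5, 20
impulse = np.zeros(32)
impulse[0] = 1.0
h_zero = lfilter([1.0, -b], [1.0], impulse)   # FIR system with one zero
denom = b ** np.arange(M + 1)                 # 1, b, b^2, ..., b^M
h_ar = lfilter([1.0], denom, impulse)         # truncated all-pole approximation
err = np.max(np.abs(h_zero - h_ar))           # on the order of b^(M+1), tiny since |b| < 1
```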
ECE 8423: Lecture 04, Slide 12
Spectral Matching and Blind Deconvolution
• Recall our expression for the error energy:

  E^{(i)} = (1 - k_i^2)\, E^{(i-1)}

• The LP filter becomes increasingly more accurate if you increase the order of the model.
• We can interpret this as a spectral matching process, as shown to the right. As the order increases, the LP model better models the envelope of the spectrum of the original signal.
• The LP model attempts to minimize the error equally across the entire spectrum.
• If the spectrum of the input signal has a systematic variation, such as a bandpass filter shape, or a spectral tilt, the LP model will attempt to model this. Therefore, we typically pre-whiten the signal before LP analysis.
• The process by which the LP filter learns the spectrum of the input signal is often referred to as blind deconvolution.
ECE 8423: Lecture 04, Slide 13
Summary
• There are many interpretations and motivations for linear prediction ranging from minimum mean-square error estimation to maximum entropy spectral estimation.
• There are many implementations of the filter, including the direct form and the lattice representation.
• There are many representations for the coefficients including predictor and reflection coefficients.
• The LP approach can be extended to estimate the parameters of most digital filters, and can also be applied to the problem of digital filter design.
• The filter can be estimated in batch mode using a frame-based analysis, or it can be updated on a sample basis using a sequential or iterative estimator. Hence, the LP model is our first adaptive filter. Such a filter can be viewed as a time-varying digital filter that tracks a signal in real-time.
• Under appropriate Gaussian assumptions, LP analysis can be shown to be a maximum likelihood estimate of the model parameters.
• Further, two models can be compared using a metric called the log likelihood ratio. Many other metrics exist to compare such models, including cepstral and principal components approaches.