
ELEG-636: Statistical Signal Processing

Gonzalo R. Arce

Department of Electrical and Computer Engineering, University of Delaware

Spring 2010


Course Objectives & Structure

Objective: Given a discrete time sequence x(n), develop

Statistical and spectral signal representations

Filtering, prediction, and system identification algorithms

Optimization methods that are statistical and adaptive

Course Structure:

Weekly lectures [notes: www.ece.udel.edu/∼arce]

Periodic homework (theory & Matlab implementations) [15%]

Midterm & Final examinations [85%]

Textbook:

Haykin, Adaptive Filter Theory.


Broad applications in communications, imaging, and sensors. Emerging applications in:

Brain-imaging techniques

Brain-machine interfaces

Implantable devices

Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction.

Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough.


Wiener (MSE) Filtering Theory: Wiener Filtering

Problem Statement

Produce an estimate of a desired process statistically related to a set of observations.

Historical Notes: The linear filtering problem was solved by

Andrey Kolmogorov for discrete time – his 1938 paper "established the basic theorems for smoothing and predicting stationary stochastic processes"

Norbert Wiener in 1941 for continuous time – not published until the 1949 paper Extrapolation, Interpolation, and Smoothing of Stationary Time Series


System restrictions and considerations:

Filter is linear

Filter is discrete time

Filter is finite impulse response (FIR)

The process is WSS

Statistical optimization is employed


For the discrete time case

[Block diagram: x(n) is filtered by the weights w_k to produce the estimate d̂(n); the error is e(n) = d(n) − d̂(n)]

The filter impulse response is finite and given by

h_k = w_k^*  for k = 0, 1, ..., M − 1;  h_k = 0 otherwise

The output d̂(n) is an estimate of the desired signal d(n). Since x(n) and d(n) are statistically related, d̂(n) and d(n) are statistically related.


In convolution and vector form,

d̂(n) = Σ_{k=0}^{M−1} w_k^* x(n − k) = w^H x(n)

where

w = [w_0, w_1, ..., w_{M−1}]^T  [filter coefficient vector]

x(n) = [x(n), x(n − 1), ..., x(n − M + 1)]^T  [observation vector]

The error can now be written as

e(n) = d(n) − d̂(n) = d(n) − w^H x(n)
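For concreteness, a short NumPy sketch of this estimator; the filter length, weights, and signals below are illustrative placeholders, not quantities from the course:

import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 4                                     # signal length and filter taps (assumed)
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # placeholder weight vector
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # observations
d = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # desired signal

d_hat = np.zeros(N, dtype=complex)                # first M-1 outputs left at 0 (no full history)
for n in range(M - 1, N):
    xn = x[n - M + 1:n + 1][::-1]                 # observation vector [x(n), ..., x(n-M+1)]
    d_hat[n] = np.vdot(w, xn)                     # w^H x(n); np.vdot conjugates its first argument
e = d - d_hat                                     # estimation error e(n) = d(n) - d_hat(n)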

Question: Under what criterion should the error be minimized?

Selected Criterion: Mean squared error (MSE)

J(w) = E{e(n) e^*(n)}   (∗)

Result: The w that minimizes J(w) is the optimal (Wiener) filter


Utilizing e(n) = d(n) − w^H x(n) in (∗) and expanding,

J(w) = E{e(n) e^*(n)}
     = E{(d(n) − w^H x(n))(d^*(n) − x^H(n) w)}
     = E{|d(n)|² − d(n) x^H(n) w − w^H x(n) d^*(n) + w^H x(n) x^H(n) w}
     = E{|d(n)|²} − E{d(n) x^H(n)} w − w^H E{x(n) d^*(n)} + w^H E{x(n) x^H(n)} w   (∗∗)

Let

R = E{x(n) x^H(n)}   [autocorrelation matrix of x(n)]

p = E{x(n) d^*(n)}   [cross-correlation between x(n) and d(n)]

Then (∗∗) can be compactly expressed as

J(w) = σ_d² − p^H w − w^H p + w^H R w

where we have assumed x(n) and d(n) are zero mean and WSS.
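In practice R and p are rarely known in advance and are estimated by sample averages over observed data. A minimal NumPy sketch under the zero mean, WSS assumption; the synthetic signals here are placeholders:

import numpy as np

rng = np.random.default_rng(1)
N, M = 5000, 2
d = rng.standard_normal(N)                        # synthetic desired signal
x = d + 0.5 * rng.standard_normal(N)              # observation statistically related to d

R = np.zeros((M, M))
p = np.zeros(M)
for n in range(M - 1, N):
    xn = x[n - M + 1:n + 1][::-1]                 # observation vector [x(n), x(n-1)]
    R += np.outer(xn, xn)                         # accumulate x(n) x^H(n) (real-valued case)
    p += xn * d[n]                                # accumulate x(n) d*(n)
R /= N - M + 1
p /= N - M + 1

sigma_d2 = d.var()
def J(w):
    # quadratic MSE; for real data, -p^H w - w^H p collapses to -2 p^T w
    return sigma_d2 - 2 * p @ w + w @ R @ w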


The MSE criterion as a function of the filter weight vector w:

J(w) = σ_d² − p^H w − w^H p + w^H R w

Observation: The error is a quadratic function of w

Consequences: The error is an M-dimensional bowl-shaped function of w with a unique minimum

Result: The optimal weight vector, w_0, is determined by differentiating J(w) and setting the result to zero:

∇_w J(w) |_{w=w_0} = 0

A closed form solution exists


Example

Consider a two dimensional case, i.e., an M = 2 tap filter. Plot the error surface and error contours.

[Figure: error surface (left) and error contours (right)]
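The figure can be reproduced with a short matplotlib sketch. The R and p values below are taken from the communications example at the end of these notes; σ_d² is an assumed placeholder, since it only shifts the bowl vertically:

import numpy as np
import matplotlib.pyplot as plt

R = np.array([[1.1, 0.5], [0.5, 1.1]])            # from the communications example below
p = np.array([0.5272, -0.4458])
sigma_d2 = 1.0                                    # assumed value; changes height, not shape

w0g, w1g = np.meshgrid(np.linspace(-1, 3, 200), np.linspace(-3, 1, 200))
Jg = (sigma_d2 - 2 * (p[0] * w0g + p[1] * w1g)
      + R[0, 0] * w0g**2 + 2 * R[0, 1] * w0g * w1g + R[1, 1] * w1g**2)

plt.contour(w0g, w1g, Jg, levels=25)              # elliptical contours of the quadratic bowl
plt.xlabel('w_0')
plt.ylabel('w_1')
plt.show()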


Aside (Matrix Differentiation): For complex data,

w_k = a_k + j b_k,   k = 0, 1, ..., M − 1

the gradient, with respect to w_k, is

∇_k(J) = ∂J/∂a_k + j ∂J/∂b_k,   k = 0, 1, ..., M − 1

The complete gradient is thus given by

∇_w(J) = [∇_0(J), ∇_1(J), ..., ∇_{M−1}(J)]^T = [∂J/∂a_0 + j ∂J/∂b_0, ∂J/∂a_1 + j ∂J/∂b_1, ..., ∂J/∂a_{M−1} + j ∂J/∂b_{M−1}]^T


Example

Let c and w be M × 1 complex vectors. For g = c^H w, find ∇_w(g).

Note

g = c^H w = Σ_{k=0}^{M−1} c_k^* w_k = Σ_{k=0}^{M−1} c_k^* (a_k + j b_k)

Thus

∇_k(g) = ∂g/∂a_k + j ∂g/∂b_k = c_k^* + j(j c_k^*) = 0,   k = 0, 1, ..., M − 1

Result: For g = c^H w,

∇_w(g) = [∇_0(g), ∇_1(g), ..., ∇_{M−1}(g)]^T = [0, 0, ..., 0]^T = 0


Example

Now suppose g = w^H c. Find ∇_w(g).

In this case,

g = w^H c = Σ_{k=0}^{M−1} w_k^* c_k = Σ_{k=0}^{M−1} c_k (a_k − j b_k)

and

∇_k(g) = ∂g/∂a_k + j ∂g/∂b_k = c_k + j(−j c_k) = 2c_k,   k = 0, 1, ..., M − 1

Result: For g = w^H c,

∇_w(g) = [∇_0(g), ∇_1(g), ..., ∇_{M−1}(g)]^T = [2c_0, 2c_1, ..., 2c_{M−1}]^T = 2c


Example

Lastly, suppose g = w^H Q w. Find ∇_w(g).

In this case,

g = Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} w_i^* w_j q_{i,j} = Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} (a_i − j b_i)(a_j + j b_j) q_{i,j}

⇒ ∇_k(g) = ∂g/∂a_k + j ∂g/∂b_k = 2 Σ_{j=0}^{M−1} (a_j + j b_j) q_{k,j} = 2 Σ_{j=0}^{M−1} w_j q_{k,j}


Result: For g = w^H Q w,

∇_w(g) = [∇_0(g), ∇_1(g), ..., ∇_{M−1}(g)]^T = 2 [Σ_{i=0}^{M−1} q_{0,i} w_i, Σ_{i=0}^{M−1} q_{1,i} w_i, ..., Σ_{i=0}^{M−1} q_{M−1,i} w_i]^T = 2Qw

Observation: Differentiation result depends on matrix ordering
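All three gradient results are easy to sanity-check numerically by applying the definition ∇_k = ∂/∂a_k + j ∂/∂b_k with central differences. A sketch with random placeholder vectors and matrix:

import numpy as np

rng = np.random.default_rng(2)
M, h = 3, 1e-6
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Q = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

def grad(g, w):
    # grad_k(g) = dg/da_k + j dg/db_k, approximated by central differences
    out = np.zeros(M, dtype=complex)
    for k in range(M):
        da = np.zeros(M, dtype=complex); da[k] = h        # step in the real part a_k
        db = np.zeros(M, dtype=complex); db[k] = 1j * h   # step in the imaginary part b_k
        out[k] = ((g(w + da) - g(w - da)) + 1j * (g(w + db) - g(w - db))) / (2 * h)
    return out

print(np.allclose(grad(lambda v: np.vdot(c, v), w), 0))              # c^H w   -> 0
print(np.allclose(grad(lambda v: np.vdot(v, c), w), 2 * c))          # w^H c   -> 2c
print(np.allclose(grad(lambda v: np.vdot(v, Q @ v), w), 2 * Q @ w))  # w^H Q w -> 2Qw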


Returning to the MSE performance criterion

J(w) = σ_d² − p^H w − w^H p + w^H R w

Approach: Minimize the error by differentiating with respect to w and setting the result to 0:

∇_w(J) = 0 − 0 − 2p + 2Rw = 0

⇒ R w_0 = p   [normal equation]

Result: The Wiener filter coefficients are defined by

w_0 = R^{−1} p

Question: Does R^{−1} always exist? Recall R is positive semi-definite, and usually positive definite.
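Numerically, one solves R w_0 = p with a linear solver rather than forming R^{−1} explicitly; a Cholesky factorization doubles as a positive-definiteness check, since it fails when R is singular or indefinite. A sketch using the R and p derived in the communications example below:

import numpy as np

R = np.array([[1.1, 0.5], [0.5, 1.1]])
p = np.array([0.5272, -0.4458])

L = np.linalg.cholesky(R)          # succeeds only if R is positive definite
w0 = np.linalg.solve(R, p)         # normal equation R w0 = p
print(w0)                          # approx [0.836, -0.785]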


Wiener (MSE) Filtering Theory: Orthogonality Principle

Consider again the normal equation that defines the optimal solution:

R w_0 = p
⇒ E{x(n) x^H(n)} w_0 = E{x(n) d^*(n)}

Rearranging,

E{x(n) d^*(n)} − E{x(n) x^H(n)} w_0 = 0
E{x(n) [d^*(n) − x^H(n) w_0]} = 0
E{x(n) e_0^*(n)} = 0

Note: e_0(n) is the error when the optimal weights are used, i.e.,

e_0^*(n) = d^*(n) − x^H(n) w_0


Thus

E{x(n) e_0^*(n)} = E{ [x(n) e_0^*(n), x(n − 1) e_0^*(n), ..., x(n − M + 1) e_0^*(n)]^T } = [0, 0, ..., 0]^T

Orthogonality Principle

A necessary and sufficient condition for a filter to be optimal is that the estimation error, e_0(n), be orthogonal to each input sample in x(n).

Interpretation: The observation samples and the error are orthogonal and contain no mutual "information".
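The principle can be checked on data: with the (sample) Wiener weights, the sample correlation between each input tap and the error is numerically zero. A sketch with synthetic real-valued placeholder signals and M = 2 taps:

import numpy as np

rng = np.random.default_rng(3)
N = 20000
d = rng.standard_normal(N)
x = d + 0.5 * rng.standard_normal(N)              # observation related to d

X = np.column_stack([x[1:], x[:-1]])              # rows are [x(n), x(n-1)]
R = X.T @ X / len(X)                              # sample autocorrelation matrix
p = X.T @ d[1:] / len(X)                          # sample cross-correlation vector

w0 = np.linalg.solve(R, p)                        # sample Wiener weights
e0 = d[1:] - X @ w0                               # optimal error sequence
print(X.T @ e0 / len(X))                          # approx [0, 0]: E{x(n) e0(n)} = 0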


Wiener (MSE) Filtering Theory: Minimum MSE

Objective: Determine the minimum MSE.

Approach: Use the optimal weights w_0 = R^{−1} p in the MSE expression

J(w) = σ_d² − p^H w − w^H p + w^H R w

⇒ J_min = σ_d² − p^H w_0 − w_0^H p + w_0^H R (R^{−1} p)
        = σ_d² − p^H w_0 − w_0^H p + w_0^H p
        = σ_d² − p^H w_0

Result:

J_min = σ_d² − p^H R^{−1} p

where the substitution w_0 = R^{−1} p has been employed.


Wiener (MSE) Filtering Theory: Excess MSE

Objective: Consider the excess MSE introduced by using a weight vector that is not optimal.

J(w) − J_min = (σ_d² − p^H w − w^H p + w^H R w) − (σ_d² − p^H w_0 − w_0^H p + w_0^H R w_0)

Using the fact that

p = R w_0   and   p^H = w_0^H R

yields

J(w) − J_min = −p^H w − w^H p + w^H R w + p^H w_0 + w_0^H p − w_0^H R w_0
             = −w_0^H R w − w^H R w_0 + w^H R w + w_0^H R w_0 + w_0^H R w_0 − w_0^H R w_0
             = −w_0^H R w − w^H R w_0 + w^H R w + w_0^H R w_0
             = (w − w_0)^H R (w − w_0)

⇒ J(w) = J_min + (w − w_0)^H R (w − w_0)


Finally, using the eigenvalue/eigenvector representation R = Q Ω Q^H,

J(w) = J_min + (w − w_0)^H Q Ω Q^H (w − w_0)

or, defining the eigenvector-transformed difference

v = Q^H (w − w_0)   (∗)

⇒ J(w) = J_min + v^H Ω v = J_min + Σ_{k=1}^{M} λ_k v_k v_k^*

Result:

J(w) = J_min + Σ_{k=1}^{M} λ_k |v_k|²

Note: (∗) shows that v_k is the difference (w − w_0) projected onto eigenvector q_k.
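A quick numerical confirmation of this decomposition, using numpy.linalg.eigh; R and p are the communications-example values, while σ_d² and the test weights w are placeholders:

import numpy as np

R = np.array([[1.1, 0.5], [0.5, 1.1]])
p = np.array([0.5272, -0.4458])
sigma_d2 = 1.0                                    # assumed value
w0 = np.linalg.solve(R, p)
Jmin = sigma_d2 - p @ w0                          # Jmin = sigma_d^2 - p^H w0

w = np.array([0.3, -0.2])                         # arbitrary non-optimal weights
J = sigma_d2 - 2 * p @ w + w @ R @ w              # quadratic MSE (real case)

lam, Q = np.linalg.eigh(R)                        # R = Q diag(lam) Q^H
v = Q.T @ (w - w0)                                # transformed difference v = Q^H (w - w0)
print(np.isclose(J, Jmin + np.sum(lam * v**2)))   # J = Jmin + sum_k lam_k |v_k|^2 -> True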


Wiener (MSE) Filtering Theory: Communications Example

Example

Consider the following system:

[Block diagram: white noise v1(n) drives H1(z) to produce the desired signal d(n); d(n) passes through the communication channel H2(z) to give x(n); additive white noise v2(n) gives the received signal u(n) = x(n) + v2(n), which is filtered to produce the estimate d̂(n)]

Objective: Determine the optimal filter for a given system and channel


Specific Objective

Determine the optimal order two filter weights, w_0, for

H1(z) = 1/(1 + 0.8458 z^{−1})   [AR process]

H2(z) = 1/(1 − 0.9458 z^{−1})   [communication channel]

where v1(n) and v2(n) are zero mean white noise processes with σ1² = 0.27 and σ2² = 0.1.

Note: To determine w_0, we need:

R_u, the autocorrelation of the received signal u(n)

p, the cross-correlation between the received signal u(n) and the desired signal d(n)


Procedure: Consider R_u first.

Since u(n) = x(n) + v2(n), where v2(n) is white with σ2² = 0.1 and is uncorrelated with x(n),

R_u = R_x + R_v2 = R_x + [0.1 0; 0 0.1]


Next, consider the relation between x(n) and v1(n):

X(z) = H1(z) H2(z) V1(z)

where for the given systems

H1(z) H2(z) = 1 / [(1 + 0.8458 z^{−1})(1 − 0.9458 z^{−1})]

Converting to the time domain, we see x(n) is an order 2 AR process:

x(n) − 0.1 x(n − 1) − 0.8 x(n − 2) = v1(n)

x(n) + a1 x(n − 1) + a2 x(n − 2) = v1(n)
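The AR(2) model can be simulated directly to check the correlation values derived below (r(0) = 1, r(1) = 0.5). A sketch; the sample size and discarded transient are arbitrary choices:

import numpy as np

rng = np.random.default_rng(4)
N = 200000
v1 = np.sqrt(0.27) * rng.standard_normal(N)       # white noise with variance 0.27
x = np.zeros(N)
for n in range(2, N):
    x[n] = 0.1 * x[n - 1] + 0.8 * x[n - 2] + v1[n]    # x(n) - 0.1 x(n-1) - 0.8 x(n-2) = v1(n)

x = x[1000:]                                      # drop the start-up transient
print(np.mean(x * x))                             # r(0), approx 1
print(np.mean(x[1:] * x[:-1]))                    # r(1), approx 0.5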


Since x(n) is a real valued order two AR process, the Yule-Walker equations are given by

[ r(0) r(1); r^*(1) r(0) ] [ −a1; −a2 ] = [ r^*(1); r^*(2) ]

or, for a real process,

[ r(0) r(1); r(1) r(0) ] [ −a1; −a2 ] = [ r(1); r(2) ]

Solving for the coefficients:

−a1 = r(1)[r(0) − r(2)] / [r²(0) − r²(1)]

−a2 = [r(0) r(2) − r²(1)] / [r²(0) − r²(1)]

Question: What are the known and unknown terms in this system?


Note: We must solve the system to obtain the unknown r(·) values.

Noting r(0) = σ_x² and rearranging to solve for r(1) and r(2):

r(1) = [−a1/(1 + a2)] σ_x²   (∗)

r(2) = [−a2 + a1²/(1 + a2)] σ_x²   (∗∗)

The Yule-Walker equations also stipulate

σ_v1² = r(0) + a1 r(1) + a2 r(2)   (∗∗∗)

Next, utilize the determined r(1), r(2) and the given a1, a2, σ_v1² values to determine r(0) = σ_x².


Substituting (∗) and (∗∗) into (∗∗∗), using r(0) = σ_x², and rearranging:

σ_v1² = r(0) + a1 r(1) + a2 r(2)
      = σ_x² + a1 [−a1/(1 + a2)] σ_x² + a2 [−a2 + a1²/(1 + a2)] σ_x²
      = [1 − a1²/(1 + a2) − a2² + a1² a2/(1 + a2)] σ_x²

⇒ σ_x² = [(1 + a2)/(1 − a2)] · σ_v1² / [(1 + a2)² − a1²]

Note: All correlation terms have been determined since r(0) = σ_x².
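These closed-form expressions translate directly into a few lines of plain Python; the printed values anticipate the numbers computed below:

a1, a2 = -0.1, -0.8                               # AR coefficients from the example
sigma_v1_sq = 0.27                                # driving-noise variance

sigma_x_sq = ((1 + a2) / (1 - a2)) * sigma_v1_sq / ((1 + a2)**2 - a1**2)
r0 = sigma_x_sq                                   # r(0) = sigma_x^2 = 1
r1 = (-a1 / (1 + a2)) * sigma_x_sq                # (*)   r(1) = 0.5
r2 = (-a2 + a1**2 / (1 + a2)) * sigma_x_sq        # (**)  r(2) = 0.85
print(r0, r1, r2)
print(r0 + a1 * r1 + a2 * r2)                     # (***) recovers sigma_v1^2 = 0.27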


Using a1 = −0.1, a2 = −0.8, and σ_v1² = 0.27,

σ_x² = [(1 + a2)/(1 − a2)] · σ_v1² / [(1 + a2)² − a1²] = 1

Thus r(0) = 1. Similarly,

r(1) = [−a1/(1 + a2)] σ_x² = 0.5

and finally, R_x is

R_x = [1 0.5; 0.5 1]


Recall the overall system block diagram [desired signal d(n), channel H2(z), additive noise v2(n), received signal u(n)].

Putting the pieces of R_u together:

R_u = R_x + R_v2 = [1 0.5; 0.5 1] + [0.1 0; 0 0.1] = [1.1 0.5; 0.5 1.1]


Recall: The Wiener solution is given by w_0 = R^{−1} p.

Note: R is known, but p is still to be determined:

p = E{ [u(n) d(n); u(n − 1) d(n)] }

Recall

X(z) = H2(z) D(z) = D(z) / (1 − 0.9458 z^{−1})

or in the time domain

x(n) − 0.9458 x(n − 1) = d(n)

Lastly, the observation is corrupted by additive noise:

u(n) = x(n) + v2(n)


Thus

E{u(n) d(n)} = E{[x(n) + v2(n)][x(n) − 0.9458 x(n − 1)]}
             = E{x²(n)} + E{x(n) v2(n)} − 0.9458 E{x(n) x(n − 1)} − 0.9458 E{v2(n) x(n − 1)}
             = σ_x² + 0 − 0.9458 r(1) − 0
             = 1 − 0.9458 (1/2) = 0.5272

Similarly,

E{u(n − 1) d(n)} = E{[x(n − 1) + v2(n − 1)][x(n) − 0.9458 x(n − 1)]}
                 = r(1) − 0.9458 r(0)
                 = −0.4458


Thus

p = [0.5272; −0.4458]

Final Solution:

w_0 = R^{−1} p = [1.1 0.5; 0.5 1.1]^{−1} [0.5272; −0.4458] = [0.8360; −0.7853]

The optimal filter weights are a function of the source signal statistics and the communications channel.
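As a closing check, the whole example can be simulated end to end: generate d(n) and u(n) from the given systems, estimate R_u and p by sample averages, and solve the normal equation. A sketch; the sample size and discarded transient are arbitrary choices, and the estimates converge to roughly [0.836, −0.785]:

import numpy as np

rng = np.random.default_rng(5)
N = 500000
v1 = np.sqrt(0.27) * rng.standard_normal(N)
v2 = np.sqrt(0.1) * rng.standard_normal(N)

d = np.zeros(N)
x = np.zeros(N)
for n in range(1, N):
    d[n] = -0.8458 * d[n - 1] + v1[n]             # H1: d(n) + 0.8458 d(n-1) = v1(n)
    x[n] = 0.9458 * x[n - 1] + d[n]               # H2: x(n) - 0.9458 x(n-1) = d(n)
u = x + v2                                        # received signal

U = np.column_stack([u[1000:], u[999:-1]])        # rows [u(n), u(n-1)], transient dropped
Ru = U.T @ U / len(U)                             # sample estimate of R_u
p = U.T @ d[1000:] / len(U)                       # sample estimate of p

w0 = np.linalg.solve(Ru, p)
print(w0)                                         # approx [0.836, -0.785]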

Question: How do we optimize a filter when the statistics are not known in closed form or a priori?
