Lecture 4: Modern Spectral Estimation
Danilo P. Mandic
Room: 813 Ext: 46271
Department of Electrical and Electronic Engineering, Imperial College London, UK
[email protected] URL: www.commsp.ee.ic.ac.uk/∼mandic
Adaptive SP & Machine Intelligence
Overview of Spectral Estimation Methods
Introduction to spectrum estimation and the relationship between the methods:

Non-Parametric Methods:
• Periodogram (using the data), with modifications: 1. windowing the data, 2. Bartlett's method, 3. Welch's method
• Blackman-Tukey method (using the ACF)

Parametric Methods:
• Maximum Entropy Method
• ARMA spectrum estimation
• Subspace methods: Pisarenko, MUSIC, Eigenvector Method, Minimum Norm
Periodogram Based Methods
The periodogram:
$$P_{per}(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-jn\omega}\right|^2 = \sum_{k=-N+1}^{N-1} \hat{r}(k)\,e^{-jk\omega}$$

Windowing → Modified Periodogram:
$$P_{mod}(\omega) = \frac{1}{NU}\left|\sum_{n=0}^{N-1} w(n)\,x(n)\,e^{-jn\omega}\right|^2$$

Averaging → Bartlett's Method:
$$P_{B}(\omega) = \frac{1}{N}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} x(n+iL)\,e^{-jn\omega}\right|^2$$

+ Overlapping windows → Welch's Method:
$$P_{W}(\omega) = \frac{1}{KLU}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} w(n)\,x(n+iD)\,e^{-jn\omega}\right|^2$$
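The equivalence of the two forms of the periodogram above can be checked numerically. Below is a minimal Python sketch (not from the lecture; the test signal and parameters are illustrative) that computes $P_{per}(\omega)$ both directly from the data and from the biased autocorrelation estimate:

```python
import numpy as np

# Verify that the periodogram computed from the data equals the Fourier
# transform of the biased autocorrelation estimate.
rng = np.random.default_rng(0)
N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 0.2 * n) + 0.5 * rng.standard_normal(N)

nfft = 512
w = 2 * np.pi * np.arange(nfft) / nfft

# Direct form: squared magnitude of the DFT, scaled by 1/N
P_direct = np.abs(np.fft.fft(x, nfft)) ** 2 / N

# ACF form: biased autocorrelation estimate r(k), k = -(N-1)..(N-1)
r = np.correlate(x, x, mode="full") / N
k = np.arange(-(N - 1), N)
P_acf = np.array([np.sum(r * np.exp(-1j * k * wi)) for wi in w]).real

assert np.allclose(P_direct, P_acf, atol=1e-8)
```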
Modified Periodogram
$$P_{per}(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-jn\omega}\right|^2 = \sum_{k=-N+1}^{N-1} \hat{r}(k)\,e^{-jk\omega}$$

[Figure: effect of the window on the periodogram, $P(\omega)$.]
Windowing mitigates the problem of spurious high-frequency components in the spectrum, i.e., it reduces the "edge effects" caused by the abrupt truncation of the data record.
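As an illustration, here is a minimal sketch of the modified periodogram (the Hamming window is an assumption; any window works). The normalisation $U = \frac{1}{N}\sum_n |w(n)|^2$ removes the power bias introduced by the window:

```python
import numpy as np

# Modified periodogram: P_mod(w) = (1/(N*U)) |sum w(n)x(n)e^{-jnw}|^2
def modified_periodogram(x, window, nfft=512):
    N = len(x)
    U = np.sum(window ** 2) / N          # window power normalisation
    X = np.fft.fft(window * x, nfft)
    return np.abs(X) ** 2 / (N * U)

N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 0.2 * n) + np.sin(2 * np.pi * 0.25 * n)
P = modified_periodogram(x, np.hamming(N))
```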
Bartlett's Method
The data record is divided into $K$ non-overlapping segments of length $L$ (so that $N = KL$); a periodogram is computed for each segment and the $K$ estimates are averaged:

$$P_{B}(\omega) = \frac{1}{N}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} x(n+iL)\,e^{-jn\omega}\right|^2$$

Averaging → Reduction in variance
Tradeoff: frequency resolution vs. variance reduction
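A minimal sketch of Bartlett's method (parameters illustrative): the record is reshaped into $K$ segments and the per-segment periodograms are averaged, trading resolution (segment length $L < N$) for a roughly $K$-fold variance reduction:

```python
import numpy as np

# Bartlett's method: average periodograms of K non-overlapping segments.
def bartlett_psd(x, L, nfft=512):
    K = len(x) // L
    segs = x[:K * L].reshape(K, L)
    P = np.abs(np.fft.fft(segs, nfft, axis=1)) ** 2 / L
    return P.mean(axis=0)        # averaging reduces the variance ~ 1/K
```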
Welch's Method
The segments are now allowed to overlap (successive segments are offset by $D \le L$ samples), and each segment is windowed before its periodogram is computed; the $K$ modified periodograms are then averaged:

$$P_{W}(\omega) = \frac{1}{KLU}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} w(n)\,x(n+iD)\,e^{-jn\omega}\right|^2$$

Overlapping windows + averaging → achieves a good balance between resolution and variance.
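In practice this is rarely hand-coded; a minimal sketch (assuming SciPy is available) using `scipy.signal.welch`, which implements exactly this windowed, overlapped averaging, compared against the raw periodogram:

```python
import numpy as np
from scipy.signal import periodogram, welch

# The raw periodogram has high variance; Welch's overlapped, windowed
# averaging smooths it at the cost of frequency resolution.
rng = np.random.default_rng(0)
n = np.arange(1024)
x = np.sin(2 * np.pi * 0.2 * n) + rng.standard_normal(1024)

f_per, P_raw = periodogram(x)                      # single periodogram
f_w, P_welch = welch(x, nperseg=128, noverlap=64)  # L = 128, D = 64
```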
Blackman-Tukey Method
The periodogram can also be expressed as:
$$P_{per}(\omega) = \sum_{k=-N+1}^{N-1} \hat{r}(k)\,e^{-jk\omega}$$
However, the autocorrelation estimates at large lags are unreliable. Windowing the ACF keeps only the reliable lags, $M < N-1$:
$$P_{BT}(\omega) = \sum_{k=-M}^{M} w(k)\,\hat{r}(k)\,e^{-jk\omega}$$
Next: can we extrapolate the autocorrelation estimates for lags $|k| > M$?
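A minimal sketch of the Blackman-Tukey estimator (the triangular lag window is an assumption; any lag window works):

```python
import numpy as np

# Blackman-Tukey: window the biased ACF estimate with a length-(2M+1)
# lag window, then Fourier transform the windowed lags.
def blackman_tukey(x, M, nfft=512):
    N = len(x)
    r = np.correlate(x, x, mode="full") / N     # lags -(N-1)..(N-1)
    mid = N - 1
    lags = np.arange(-M, M + 1)
    w_lag = np.bartlett(2 * M + 1)              # triangular lag window
    r_w = w_lag * r[mid - M: mid + M + 1]
    omega = 2 * np.pi * np.arange(nfft) / nfft
    return np.real(np.exp(-1j * np.outer(omega, lags)) @ r_w)
```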
Maximum Entropy Method
How can we extrapolate the autocorrelation estimates while imposing the least amount of structure on the data?
Maximize the randomness → Maximize the entropy.
[Figure: three candidate autocorrelation sequences $r_x(k)$ (vs. lag $k$) and their corresponding power spectral densities (PSDs) $P_x(e^{j\omega})$, $-\pi \le \omega \le \pi$.]
Which one has the “flattest” PSD?
Maximum Entropy Method (MEM)
The entropy of a Gaussian random process with PSD $P_x(e^{j\omega})$ is
$$H = \frac{1}{4\pi}\int_{-\pi}^{\pi} \ln P_x(e^{j\omega})\,d\omega$$
Goal: find the extrapolated autocorrelation values that maximize this entropy, subject to matching the known lags. The result* is the all-pole spectrum
$$P_{MEM}(e^{j\omega}) = \frac{\epsilon_p}{\left|1 + \sum_{k=1}^{p} a_p(k)\,e^{-jk\omega}\right|^2}$$
with the coefficients $a_p(k)$ estimated using the Yule-Walker method.
*Refer to the handout for the full derivation.
The MEM spectrum is identical to the all-pole AR($p$) spectrum, although no assumptions were made about a model for the data (except Gaussianity).
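A minimal sketch of the resulting estimator (assumptions: the model order $p$ is known, real-valued data): solve the Yule-Walker equations for the AR coefficients, then evaluate the all-pole spectrum. Note the sign convention: here `a` holds the forward-predictor coefficients, so the denominator is $1 - \sum_k a_k e^{-jk\omega}$:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# MEM/AR spectrum via the Yule-Walker equations.
def mem_spectrum(x, p, nfft=512):
    N = len(x)
    r = np.correlate(x, x, mode="full")[N - 1:] / N     # r(0)..r(N-1)
    a = solve_toeplitz((r[:p], r[:p]), r[1:p + 1])      # Yule-Walker solve
    eps = r[0] - a @ r[1:p + 1]                         # prediction error power
    A = np.fft.fft(np.concatenate(([1.0], -a)), nfft)   # 1 - sum a_k e^{-jkw}
    return eps / np.abs(A) ** 2
```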
MEM, derivation

MEM, spectrum
Power spectra of ARMA processes
A white input $w(n)$, with a flat spectrum $P_w(\omega) = \sigma_w^2$, drives a rational filter to produce an output $y(n)$ with a shaped (e.g. bandpass) spectrum:
$$P_y(\omega) = \left|\frac{B_q(\omega)}{A_p(\omega)}\right|^2 P_w(\omega)$$
The ARMA($p,q$) difference equation
$$y(n) = -\sum_{k=1}^{p} a_k\,y(n-k) + \sum_{k=0}^{q} b_k\,w(n-k)$$
yields the spectrum
$$P_{ARMA}(\omega) = \frac{\left|\sum_{k=0}^{q} b_k\,e^{-jk\omega}\right|^2}{\left|1 + \sum_{k=1}^{p} a_k\,e^{-jk\omega}\right|^2}$$
Denominator → Autoregressive AR($p$); Numerator → Moving Average MA($q$).
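For a numerical check, the ARMA spectrum can be evaluated as the squared magnitude of the frequency response of the rational filter; a minimal sketch with illustrative (assumed) coefficients:

```python
import numpy as np
from scipy.signal import freqz

# Evaluate |B(w)|^2 / |A(w)|^2 for an example ARMA(1,1) process.
b = [1.0, 0.5]                   # MA part: b_0 + b_1 e^{-jw}
a = [1.0, -0.9]                  # AR part: 1 + a_1 e^{-jw}, with a_1 = -0.9
w, H = freqz(b, a, worN=512, whole=True)
P_arma = np.abs(H) ** 2          # PSD shape of the ARMA(1,1) process
```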
Recap: ARMA processes

Recap: Yule-Walker equations

Recap: ARMA processes

ARMA spectrum estimation

Complex AR modelling (from Lecture 2)

Complex AR modelling, simulations

Subspace Methods: Introduction
Consider a single complex sinusoid in white noise:
$$x(n) = A_1 e^{jn\omega_1} + w(n), \qquad A_1 = |A_1|e^{j\phi}, \qquad w(n) \sim \mathcal{N}(0, \sigma_w^2)$$
In vector form, with $\mathbf{x} = [x(0), x(1), \ldots, x(M-1)]^T$ and $\mathbf{e}_1 = [1, e^{j\omega_1}, \ldots, e^{j\omega_1(M-1)}]^T$:
$$\mathbf{x} = A_1\mathbf{e}_1 + \mathbf{w}$$
Autocorrelation matrix:
$$\mathbf{R}_{xx} = E(\mathbf{x}\mathbf{x}^H) = \underbrace{|A_1|^2\,\mathbf{e}_1\mathbf{e}_1^H}_{\mathbf{R}_s} + \underbrace{\sigma_w^2\,\mathbf{I}}_{\mathbf{R}_n}$$
Signal autocorrelation $\mathbf{R}_s$: rank 1, single non-zero eigenvalue $= M|A_1|^2$.
Noise autocorrelation $\mathbf{R}_n$: rank $M$, all eigenvalues $= \sigma_w^2$.
Decomposing the Autocorrelation Matrix
$\mathbf{R}_{xx}$ is Hermitian, so its eigenvectors are orthogonal; the remaining $M-1$ eigenvectors are orthogonal to $\mathbf{e}_1$.
Eigenvectors of $\mathbf{R}_{xx}$ = eigenvectors of $\mathbf{R}_s$ (adding $\sigma_w^2\mathbf{I}$ shifts every eigenvalue by $\sigma_w^2$ but leaves the eigenvectors unchanged).
Can we use the idea that the noise eigenvectors are orthogonal to $\mathbf{e}_1$ to somehow estimate the power spectrum?
Multiple Sinusoids
Consider two complex sinusoids in white noise:
$$x(n) = A_1 e^{jn\omega_1} + A_2 e^{jn\omega_2} + w(n)$$
so that $\mathbf{R}_s$ now has rank 2. The first 2 eigenvalues of $\mathbf{R}_{xx}$ contain the signal (plus noise) power; the remaining $M-2$ eigenvalues are due to the noise alone ($= \sigma_w^2$).
[Figure: the signal vectors $\mathbf{e}_1, \mathbf{e}_2$ and the eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$; $\mathbf{v}_1$ and $\mathbf{v}_2$ span the signal subspace.]
The signal and noise subspaces are orthogonal, but $\mathbf{e}_1$ and $\mathbf{e}_2$ are not the signal eigenvectors in this case (they span the signal subspace without being orthogonal to each other).
• Signal eigenvectors span a 2D subspace
• Noise eigenvectors span an $(M-2)$-dimensional subspace
Subspace Methods
Extending to $p$ sinusoids: $\mathbf{R}_s$ now has rank $p$, and the eigenvectors $\mathbf{v}_{p+1}, \ldots, \mathbf{v}_M$ span the noise subspace. Using the orthogonality of the noise eigenvectors to the signal vectors, $\mathbf{e}^H(\omega_i)\,\mathbf{v}_k = 0$ for $k = p+1, \ldots, M$, PSD estimation can be performed as:
$$\hat{P}(e^{j\omega}) = \frac{1}{\sum_{k=p+1}^{M}\alpha_k\left|\mathbf{e}^H(\omega)\,\mathbf{v}_k\right|^2}$$
which exhibits sharp peaks at the sinusoid frequencies. The methods differ in the weights $\alpha_k$ and in how much of the noise subspace is used:
• Pisarenko Harmonic Decomposition: uses only the minimum-eigenvalue eigenvector (with $M = p+1$)
• MUltiple SIgnal Classification (MUSIC): $\alpha_k = 1$ for all noise eigenvectors
• EigenVector Method: $\alpha_k = 1/\lambda_k$
• Minimum Norm Method: uses a single vector that lies in the noise subspace and has minimum norm
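A minimal sketch of one of these estimators, MUSIC (assumptions: the number of sinusoids $p$ is known, $\alpha_k = 1$, and the autocorrelation matrix is estimated from overlapping length-$M$ snapshots):

```python
import numpy as np

# MUSIC pseudospectrum: project steering vectors e(w) onto the noise
# subspace of the estimated M x M autocorrelation matrix.
def music(x, p, M=16, nfft=512):
    N = len(x)
    X = np.array([x[i:i + M] for i in range(N - M + 1)]).T  # snapshot columns
    Rxx = X @ X.conj().T / (N - M + 1)
    lam, V = np.linalg.eigh(Rxx)          # eigenvalues in ascending order
    Vn = V[:, :M - p]                     # noise subspace eigenvectors
    omega = 2 * np.pi * np.arange(nfft) / nfft
    E = np.exp(1j * np.outer(np.arange(M), omega))          # steering vectors
    denom = np.sum(np.abs(Vn.conj().T @ E) ** 2, axis=0)
    return omega, 1.0 / denom             # peaks at the sinusoid frequencies

# Example: two complex sinusoids in complex white noise
rng = np.random.default_rng(0)
n = np.arange(256)
x = (np.exp(1j * 0.4 * np.pi * n) + np.exp(1j * 0.6 * np.pi * n)
     + 0.5 * (rng.standard_normal(256) + 1j * rng.standard_normal(256)))
omega, P = music(x, p=2)
```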
Comparison of the 4 Subspace Methods
[Figure: estimated pseudospectra, magnitude (dB) vs. frequency (units of $\pi$), for the four methods: Pisarenko, MUSIC, EigenVector, and Minimum Norm.]
Pisarenko only needs a 5 x 5 correlation matrix
A 64 x 64 correlation matrix was used for the other methods.
Except for Pisarenko's method, all estimates are correct!
Overlay of 10 different realizations of 4 complex sinusoids in white noise.
Principal Components Spectral Estimation
$$\mathbf{R}_{xx} = \underbrace{\sum_{i=1}^{p}\lambda_i\,\mathbf{v}_i\mathbf{v}_i^H}_{\text{Signal}} + \underbrace{\sum_{i=p+1}^{M}\lambda_i\,\mathbf{v}_i\mathbf{v}_i^H}_{\text{Noise}}$$
Can we de-noise the signal by discarding the noise eigenvectors? In linear algebra terms: we impose a rank-$p$ constraint on $\mathbf{R}_{xx}$.
Principal component analysis (PCA) can be used with the Blackman-Tukey, maximum entropy, and AR spectrum estimation methods.
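In linear-algebra terms, the de-noising step is just an eigenvalue-truncated reconstruction of $\mathbf{R}_{xx}$; a minimal sketch:

```python
import numpy as np

# Rank-p approximation: keep only the p largest eigen-components of the
# autocorrelation matrix, discarding the noise eigenvectors.
def rank_p_denoise(Rxx, p):
    lam, V = np.linalg.eigh(Rxx)              # ascending eigenvalues
    lam_s, V_s = lam[-p:], V[:, -p:]          # signal subspace
    return (V_s * lam_s) @ V_s.conj().T       # sum_i lam_i v_i v_i^H
```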
Summary of the Different Methods
• "Numerator" methods: Periodogram, Blackman-Tukey, Moving Average (MA)
• "Denominator" methods: Autoregressive (AR), Max. Entropy
• Both numerator and denominator: ARMA
• Subspace methods (signal subspace / noise subspace): Pisarenko, MUSIC, Min. Norm, Eigenvector
Adaptive Signal Processing & Machine Intelligence
Principal Component Analysis
Danilo Mandic, room 813, ext: 46271
Department of Electrical and Electronic Engineering
Imperial College London, UK
[email protected], URL: www.commsp.ee.ic.ac.uk/∼mandic
What is it that a matrix does to a vector?
Ampli-twist: a matrix $\mathbf{A}$ which multiplies a vector $\mathbf{x}$ (i) stretches or shortens the vector and (ii) rotates the vector.
• $\mathbf{A}$: any general matrix
• $\mathbf{R}$: a rotation matrix ($\mathbf{R}^T = \mathbf{R}^{-1}$ and $\det\mathbf{R} = 1$)
• $\mathbf{E}$: eigenanalysis, $\mathbf{E}\mathbf{x} = \lambda\mathbf{x}$
• $\mathbf{P}$: a projection matrix

An example of a rotation matrix:
$$\mathbf{R} = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}$$

What can we say about the properties of the matrix $\mathbf{A}$, the matrix $\mathbf{E}$, and the projection matrix $\mathbf{P}$ (rank, invertibility, ...)? Is the projection matrix invertible?
The meaning of eigenanalysis
Let $\mathbf{A}$ be an $n\times n$ matrix, where $\mathbf{A}$ is a linear operator on vectors in $\mathbb{R}^n$, such that $\mathbf{A}\mathbf{x} = \mathbf{b}$.
An eigenvector of $\mathbf{A}$ is a vector $\mathbf{v} \in \mathbb{R}^n$ such that $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$, where $\lambda$ is called the corresponding eigenvalue.
Matrix $\mathbf{A}$ only changes the length of $\mathbf{v}$, not its direction!
Compare the eigenvalue equation $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ with the general equation $\mathbf{A}\mathbf{x} = \mathbf{b}$: an eigenvector is mapped to a scaled version of itself, whereas a general vector is mapped to an arbitrary new vector.
Motivation for Principal Component Analysis (PCA) (from material by Rasmus Bro)
And now, principal directions in data (from material by Rasmus Bro)
Clearly, a 2D representation provides a perfect separability between humans and monkeys, together with the principal directions in the data.
Principal Component Analysis (PCA): also known as the Karhunen-Loève transform
• Many signal processing, control and machine learning tasks employ multivariate data which often exhibit dependencies and redundancies.
• For example, it is often useful to reduce the dimensionality of a signal while maintaining the useful information.
• This reduces the computational complexity of any algorithm while preserving the physical meaning of the data.
• Besides dimensionality reduction, we often would like to transform the multi-channel data such that each channel is orthogonal to the others (the data covariance matrix is diagonal).
• We use the PCA to accomplish this goal → the PCA has been called one of the most valuable results from applied linear algebra.
Principal Component Analysis (PCA): Geometric View
[Figure: a data cloud, its principal directions, and the projected data.]
Principal Component Analysis (PCA): Derivation
• Consider a general data vector, $\mathbf{x}_k \in \mathbb{C}^{M\times 1}$, with the empirical (sample) covariance matrix defined as
$$\mathrm{cov}(\mathbf{x}_k) \stackrel{\mathrm{def}}{=} \mathbf{R}_x = \frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_k\mathbf{x}_k^H$$
• Also, if we define a matrix $\mathbf{X} \in \mathbb{C}^{N\times M}$: $\mathbf{X}^T = \left[\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N\right] \Longrightarrow \mathbf{R}_x = \frac{1}{N}\mathbf{X}^H\mathbf{X}$
• The Hermitian covariance matrix $\mathbf{R}_x$ admits the following eigenvalue decomposition: $\mathbf{Q}^H\mathbf{R}_x\mathbf{Q} = \mathbf{\Lambda}$
• The diagonal eigenvalue matrix, $\mathbf{\Lambda} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_M\}$, indicates the power of each component of $\mathbf{x}_k$.
• The matrix of eigenvectors, $\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_M]$, designates the principal directions of the data.
Principal Component Analysis (PCA): Derivation
• Suppose $\mathbf{x}_k$ is to be transformed into a vector, $\mathbf{U}_k \in \mathbb{C}^{M\times 1}$, using a linear transformation matrix $\mathbf{W}$, so that
$$\mathbf{U}_k = \mathbf{W}\mathbf{x}_k, \qquad \text{where}\quad \mathrm{cov}(\mathbf{U}_k) = \mathbf{I}.$$
• The PCA states that $\mathbf{W} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H$ can be obtained from the eigenvector and eigenvalue matrices.
• Proof:
$$\mathrm{cov}(\mathbf{U}_k) = \frac{1}{N}\sum_{k=1}^{N}\mathbf{U}_k\mathbf{U}_k^H = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H\left(\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_k\mathbf{x}_k^H\right)\mathbf{Q}\mathbf{\Lambda}^{-\frac{1}{2}} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H\mathbf{R}_x\mathbf{Q}\mathbf{\Lambda}^{-\frac{1}{2}} = \mathbf{I}.$$
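The proof can be verified numerically; a minimal sketch (real-valued data for simplicity, so $^H$ becomes $^T$) that builds $\mathbf{W} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H$ and checks that the transformed data has identity covariance:

```python
import numpy as np

# PCA whitening: W = Lambda^{-1/2} Q^T decorrelates the data and
# normalises each component to unit variance.
rng = np.random.default_rng(0)
N, M = 1000, 3
# Correlated data: mix independent sources through a random matrix
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M))

Rx = X.T @ X / N                       # sample covariance
lam, Q = np.linalg.eigh(Rx)            # Rx = Q diag(lam) Q^T
W = np.diag(lam ** -0.5) @ Q.T         # PCA whitening matrix
U = X @ W.T                            # U_k = W x_k for each row
assert np.allclose(U.T @ U / N, np.eye(M), atol=1e-8)
```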
Principal Component Analysis (PCA): Dimensionality Reduction
• To perform dimensionality reduction, the PCA can be applied to obtain a transformed data vector $\mathbf{U}_{r,k} \in \mathbb{C}^{r\times 1}$ with dimension $r < M$, as
$$\mathbf{U}_{r,k} = \mathbf{W}_r\mathbf{x}_k = \mathbf{\Lambda}_{1:r}^{-\frac{1}{2}}\mathbf{Q}_{1:r}^{H}\mathbf{x}_k$$
• $\mathbf{\Lambda}_{1:r} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ and $\mathbf{Q}_{1:r} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_r]$
• Note: $r$ corresponds to the $r$ largest eigenvalues in $\mathbf{\Lambda}$.
• The PCA selects the directions in which the data expresses maximal variance, that is, the directions of the principal eigenvectors of the data matrix.
• The PCA matrix $\mathbf{W}_r$ can be interpreted as a projection matrix, as we are unable to recover $\mathbf{x}_k$ from the "reduced" data vector $\mathbf{U}_{r,k}$.
• The PCA projects the data onto the axes which exhibit the $r$ largest variances.
Principal Component Analysis (PCA): Connections with the Singular Value Decomposition (SVD)
• Consider the SVD of the data matrix $\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$.
• The matrices $\mathbf{U}$ and $\mathbf{V}$ are unitary, i.e. $\mathbf{U}^H\mathbf{U} = \mathbf{I}$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}$.
• The covariance matrix $\mathbf{R}_x = \frac{1}{N}\mathbf{X}^H\mathbf{X} = \frac{1}{N}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^H$.
• Relating this to the eigenvalue decomposition: $\mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^H = \frac{1}{N}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^H$, i.e. $\mathbf{Q} = \mathbf{V}$ and $\mathbf{\Lambda} = \mathbf{\Sigma}^2/N$.
• So, the PCA matrix $\mathbf{W} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H = \sqrt{N}\,\mathbf{\Sigma}^{-1}\mathbf{V}^H$ can be obtained directly from the SVD of $\mathbf{X}$ → no need to compute $\mathbf{R}_x$.
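A minimal sketch of the same whitening computed directly from the SVD (the $\sqrt{N}$ factor follows from $\mathbf{\Lambda} = \mathbf{\Sigma}^2/N$ above):

```python
import numpy as np

# Obtain the PCA whitening matrix from the SVD of the data matrix,
# avoiding the explicit covariance computation.
rng = np.random.default_rng(0)
N, M = 1000, 3
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M))

U_svd, s, Vh = np.linalg.svd(X, full_matrices=False)
W = np.sqrt(N) * np.diag(1.0 / s) @ Vh      # W = sqrt(N) Sigma^{-1} V^H
U = X @ W.T                                 # whitened data
assert np.allclose(U.T @ U / N, np.eye(M), atol=1e-8)
```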
Principal components in data and noncircularity

Noncircularity and PCA

PCA examples, uncorrelated data

PCA examples, correlated data
Matrix Decomposition for Regression: Principal Component Regression (PCR)
Regression: predicting dependent variables, $\mathbf{Y} \in \mathbb{C}^{N\times L}$, from a linear combination of independent variables, $\mathbf{X} \in \mathbb{C}^{N\times M}$, through a set of coefficients $\mathbf{B} \in \mathbb{C}^{M\times L}$:
$$\mathbf{Y} = \mathbf{X}\mathbf{B}$$
The ordinary least squares (OLS) solution is given by
$$\mathbf{B}_{OLS} = \underbrace{(\mathbf{X}^H\mathbf{X})^{-1}\mathbf{X}^H}_{\text{Pseudo-inverse of }\mathbf{X}}\mathbf{Y}$$
Problem: if $\mathbf{X}$ is sub-rank then the matrix $\mathbf{X}^H\mathbf{X}$ is singular, and so the calculation of $(\mathbf{X}^H\mathbf{X})^{-1}$ becomes intractable.
This occurs in big data settings where the number of variables being measured is large and may contain redundant information.
Matrix Decomposition for Regression: Principal Component Regression (PCR)
• The PCA representation allows the separation of data into independent components.
• For sub-rank data, the number of important components will be less than the number of variables, and this "subspace" can be identified using PCA.
• Furthermore, the SVD representation allows a straightforward calculation of a generalised inverse as
$$\mathbf{B}^{+}_{PCR} = \underbrace{\mathbf{V}\mathbf{\Sigma}^{-1}\mathbf{U}^H}_{\text{Generalised inverse of }\mathbf{X}}\mathbf{Y}$$
• $\mathbf{\Sigma}^{-1}$ is a diagonal matrix with the reciprocals of the singular values of $\mathbf{X}$.
Hint: use the properties $\mathbf{U}^H = \mathbf{U}^{-1}$ and $\mathbf{V}^H = \mathbf{V}^{-1}$ to derive $\mathbf{B}^{+}_{PCR}$.
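A minimal sketch of PCR (the rank $r$ is assumed known; in practice it is chosen from the singular-value spectrum), using a truncated SVD so that the generalised inverse exists even when $\mathbf{X}$ is sub-rank:

```python
import numpy as np

# PCR via truncated SVD: invert only the r significant singular values,
# so a rank-deficient X is handled gracefully.
def pcr_fit(X, Y, r):
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U_r, s_r, Vh_r = U[:, :r], s[:r], Vh[:r]
    return Vh_r.conj().T @ np.diag(1.0 / s_r) @ U_r.conj().T @ Y

# Example: 5 measured variables but only rank 2 (redundant measurements)
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
X = Z @ rng.standard_normal((2, 5))        # sub-rank design matrix
Y = Z @ rng.standard_normal((2, 1))
B = pcr_fit(X, Y, r=2)                     # OLS would fail: X^H X singular
```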