Lecture 4: Modern Spectral Estimation
Danilo P. Mandic
Room: 813 Ext: 46271
Department of Electrical and Electronic Engineering, Imperial College London, UK
[email protected] URL: www.commsp.ee.ic.ac.uk/∼mandic
Adaptive SP & Machine Intelligence
Overview of Spectral Estimation Methods
Introduction to spectrum estimation and the relationship between the methods:

Non-Parametric Methods:
• Periodogram (using the data), with modifications: 1. windowing the data, 2. Bartlett's method, 3. Welch's method
• Blackman-Tukey method (using the ACF)

Parametric Methods:
• Maximum Entropy Method
• ARMA spectrum estimation
• Subspace methods: Pisarenko, MUSIC, Eigenvector Method, Minimum Norm
Periodogram Based Methods
The periodogram:
$$P_{per}(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-jn\omega}\right|^2 = \sum_{k=-N+1}^{N-1} \hat{r}(k)\,e^{-jk\omega}$$

Windowing → Modified Periodogram:
$$P_{mod}(\omega) = \frac{1}{NU}\left|\sum_{n=0}^{N-1} w(n)\,x(n)\,e^{-jn\omega}\right|^2$$

Averaging → Bartlett's Method:
$$P_{B}(\omega) = \frac{1}{N}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} x(n+iL)\,e^{-jn\omega}\right|^2$$

+ Overlapping windows → Welch's Method:
$$P_{W}(\omega) = \frac{1}{KLU}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} w(n)\,x(n+iD)\,e^{-jn\omega}\right|^2$$
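The equivalence of the two forms of the periodogram above can be checked numerically. Below is a minimal Python sketch (not from the lecture; the test signal and parameters are illustrative) that computes $P_{per}(\omega)$ both directly from the data and from the biased autocorrelation estimate:

```python
import numpy as np

# Verify that the periodogram computed from the data equals the Fourier
# transform of the biased autocorrelation estimate.
rng = np.random.default_rng(0)
N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 0.2 * n) + 0.5 * rng.standard_normal(N)

nfft = 512
w = 2 * np.pi * np.arange(nfft) / nfft

# Direct form: squared magnitude of the DFT, scaled by 1/N
P_direct = np.abs(np.fft.fft(x, nfft)) ** 2 / N

# ACF form: biased autocorrelation estimate r(k), k = -(N-1)..(N-1)
r = np.correlate(x, x, mode="full") / N
k = np.arange(-(N - 1), N)
P_acf = np.array([np.sum(r * np.exp(-1j * k * wi)) for wi in w]).real

assert np.allclose(P_direct, P_acf, atol=1e-8)
```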
Modified Periodogram
$$P_{per}(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-jn\omega}\right|^2 = \sum_{k=-N+1}^{N-1} \hat{r}(k)\,e^{-jk\omega}$$

[Figure: effect of the window on the periodogram, $P(\omega)$.]
Windowing mitigates the problem of spurious high-frequency components in the spectrum, i.e., it reduces the "edge effects" caused by the abrupt truncation of the data record.
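As an illustration, here is a minimal sketch of the modified periodogram (the Hamming window is an assumption; any window works). The normalisation $U = \frac{1}{N}\sum_n |w(n)|^2$ removes the power bias introduced by the window:

```python
import numpy as np

# Modified periodogram: P_mod(w) = (1/(N*U)) |sum w(n)x(n)e^{-jnw}|^2
def modified_periodogram(x, window, nfft=512):
    N = len(x)
    U = np.sum(window ** 2) / N          # window power normalisation
    X = np.fft.fft(window * x, nfft)
    return np.abs(X) ** 2 / (N * U)

N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 0.2 * n) + np.sin(2 * np.pi * 0.25 * n)
P = modified_periodogram(x, np.hamming(N))
```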
Bartlett's Method
The data record is divided into $K$ non-overlapping segments of length $L$ (so that $N = KL$); a periodogram is computed for each segment and the $K$ estimates are averaged:

$$P_{B}(\omega) = \frac{1}{N}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} x(n+iL)\,e^{-jn\omega}\right|^2$$

Averaging → Reduction in variance
Tradeoff: frequency resolution vs. variance reduction
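A minimal sketch of Bartlett's method (parameters illustrative): the record is reshaped into $K$ segments and the per-segment periodograms are averaged, trading resolution (segment length $L < N$) for a roughly $K$-fold variance reduction:

```python
import numpy as np

# Bartlett's method: average periodograms of K non-overlapping segments.
def bartlett_psd(x, L, nfft=512):
    K = len(x) // L
    segs = x[:K * L].reshape(K, L)
    P = np.abs(np.fft.fft(segs, nfft, axis=1)) ** 2 / L
    return P.mean(axis=0)        # averaging reduces the variance ~ 1/K
```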
Welch's Method
The segments are now allowed to overlap (successive segments are offset by $D \le L$ samples), and each segment is windowed before its periodogram is computed; the $K$ modified periodograms are then averaged:

$$P_{W}(\omega) = \frac{1}{KLU}\sum_{i=0}^{K-1}\left|\sum_{n=0}^{L-1} w(n)\,x(n+iD)\,e^{-jn\omega}\right|^2$$

Overlapping windows + averaging → achieves a good balance between resolution and variance.
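In practice this is rarely hand-coded; a minimal sketch (assuming SciPy is available) using `scipy.signal.welch`, which implements exactly this windowed, overlapped averaging, compared against the raw periodogram:

```python
import numpy as np
from scipy.signal import periodogram, welch

# The raw periodogram has high variance; Welch's overlapped, windowed
# averaging smooths it at the cost of frequency resolution.
rng = np.random.default_rng(0)
n = np.arange(1024)
x = np.sin(2 * np.pi * 0.2 * n) + rng.standard_normal(1024)

f_per, P_raw = periodogram(x)                      # single periodogram
f_w, P_welch = welch(x, nperseg=128, noverlap=64)  # L = 128, D = 64
```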
Blackman-Tukey Method
The periodogram can also be expressed as:
$$P_{per}(\omega) = \sum_{k=-N+1}^{N-1} \hat{r}(k)\,e^{-jk\omega}$$
However, the autocorrelation estimates at large lags are unreliable. Windowing the ACF keeps only the reliable lags, $M < N-1$:
$$P_{BT}(\omega) = \sum_{k=-M}^{M} w(k)\,\hat{r}(k)\,e^{-jk\omega}$$
Next: can we extrapolate the autocorrelation estimates for lags $|k| > M$?
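A minimal sketch of the Blackman-Tukey estimator (the triangular lag window is an assumption; any lag window works):

```python
import numpy as np

# Blackman-Tukey: window the biased ACF estimate with a length-(2M+1)
# lag window, then Fourier transform the windowed lags.
def blackman_tukey(x, M, nfft=512):
    N = len(x)
    r = np.correlate(x, x, mode="full") / N     # lags -(N-1)..(N-1)
    mid = N - 1
    lags = np.arange(-M, M + 1)
    w_lag = np.bartlett(2 * M + 1)              # triangular lag window
    r_w = w_lag * r[mid - M: mid + M + 1]
    omega = 2 * np.pi * np.arange(nfft) / nfft
    return np.real(np.exp(-1j * np.outer(omega, lags)) @ r_w)
```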
Maximum Entropy Method
How can we extrapolate the autocorrelation estimates while imposing the least amount of structure on the data?
Maximize the randomness → Maximize the entropy.
[Figure: three candidate autocorrelation sequences $r_x(k)$ (vs. lag $k$) and their corresponding power spectral densities (PSDs) $P_x(e^{j\omega})$, $-\pi \le \omega \le \pi$.]
Which one has the “flattest” PSD?
Maximum Entropy Method (MEM)
The entropy of a Gaussian random process with PSD $P_x(e^{j\omega})$ is
$$H = \frac{1}{4\pi}\int_{-\pi}^{\pi} \ln P_x(e^{j\omega})\,d\omega$$
Goal: find the extrapolated autocorrelation values that maximize this entropy, subject to matching the known lags. The result* is the all-pole spectrum
$$P_{MEM}(e^{j\omega}) = \frac{\epsilon_p}{\left|1 + \sum_{k=1}^{p} a_p(k)\,e^{-jk\omega}\right|^2}$$
with the coefficients $a_p(k)$ estimated using the Yule-Walker method.
*Refer to the handout for the full derivation.
The MEM spectrum is identical to the all-pole AR($p$) spectrum, although no assumptions were made about a model for the data (except Gaussianity).
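A minimal sketch of the resulting estimator (assumptions: the model order $p$ is known, real-valued data): solve the Yule-Walker equations for the AR coefficients, then evaluate the all-pole spectrum. Note the sign convention: here `a` holds the forward-predictor coefficients, so the denominator is $1 - \sum_k a_k e^{-jk\omega}$:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# MEM/AR spectrum via the Yule-Walker equations.
def mem_spectrum(x, p, nfft=512):
    N = len(x)
    r = np.correlate(x, x, mode="full")[N - 1:] / N     # r(0)..r(N-1)
    a = solve_toeplitz((r[:p], r[:p]), r[1:p + 1])      # Yule-Walker solve
    eps = r[0] - a @ r[1:p + 1]                         # prediction error power
    A = np.fft.fft(np.concatenate(([1.0], -a)), nfft)   # 1 - sum a_k e^{-jkw}
    return eps / np.abs(A) ** 2
```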
MEM, derivation

MEM, spectrum
Power spectra of ARMA processes
A white input $w(n)$, with a flat spectrum $P_w(\omega) = \sigma_w^2$, drives a rational filter to produce an output $y(n)$ with a shaped (e.g. bandpass) spectrum:
$$P_y(\omega) = \left|\frac{B_q(\omega)}{A_p(\omega)}\right|^2 P_w(\omega)$$
The ARMA($p,q$) difference equation
$$y(n) = -\sum_{k=1}^{p} a_k\,y(n-k) + \sum_{k=0}^{q} b_k\,w(n-k)$$
yields the spectrum
$$P_{ARMA}(\omega) = \frac{\left|\sum_{k=0}^{q} b_k\,e^{-jk\omega}\right|^2}{\left|1 + \sum_{k=1}^{p} a_k\,e^{-jk\omega}\right|^2}$$
Denominator → Autoregressive AR($p$); Numerator → Moving Average MA($q$).
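For a numerical check, the ARMA spectrum can be evaluated as the squared magnitude of the frequency response of the rational filter; a minimal sketch with illustrative (assumed) coefficients:

```python
import numpy as np
from scipy.signal import freqz

# Evaluate |B(w)|^2 / |A(w)|^2 for an example ARMA(1,1) process.
b = [1.0, 0.5]                   # MA part: b_0 + b_1 e^{-jw}
a = [1.0, -0.9]                  # AR part: 1 + a_1 e^{-jw}, with a_1 = -0.9
w, H = freqz(b, a, worN=512, whole=True)
P_arma = np.abs(H) ** 2          # PSD shape of the ARMA(1,1) process
```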
Recap: ARMA processes

Recap: Yule-Walker equations

Recap: ARMA processes

ARMA spectrum estimation

Complex AR modelling (from Lecture 2)

Complex AR modelling, simulations

Subspace Methods: Introduction
Consider a single complex sinusoid in white noise:
$$x(n) = A_1 e^{jn\omega_1} + w(n), \qquad A_1 = |A_1|e^{j\phi}, \qquad w(n) \sim \mathcal{N}(0, \sigma_w^2)$$
In vector form, with $\mathbf{x} = [x(0), x(1), \ldots, x(M-1)]^T$ and $\mathbf{e}_1 = [1, e^{j\omega_1}, \ldots, e^{j\omega_1(M-1)}]^T$:
$$\mathbf{x} = A_1\mathbf{e}_1 + \mathbf{w}$$
Autocorrelation matrix:
$$\mathbf{R}_{xx} = E(\mathbf{x}\mathbf{x}^H) = \underbrace{|A_1|^2\,\mathbf{e}_1\mathbf{e}_1^H}_{\mathbf{R}_s} + \underbrace{\sigma_w^2\,\mathbf{I}}_{\mathbf{R}_n}$$
Signal autocorrelation $\mathbf{R}_s$: rank 1, single non-zero eigenvalue $= M|A_1|^2$.
Noise autocorrelation $\mathbf{R}_n$: rank $M$, all eigenvalues $= \sigma_w^2$.
Decomposing the Autocorrelation Matrix
$\mathbf{R}_{xx}$ is Hermitian, so its eigenvectors are orthogonal; the remaining $M-1$ eigenvectors are orthogonal to $\mathbf{e}_1$.
Eigenvectors of $\mathbf{R}_{xx}$ = eigenvectors of $\mathbf{R}_s$ (adding $\sigma_w^2\mathbf{I}$ shifts every eigenvalue by $\sigma_w^2$ but leaves the eigenvectors unchanged).
Can we use the idea that the noise eigenvectors are orthogonal to $\mathbf{e}_1$ to somehow estimate the power spectrum?
Multiple Sinusoids
Consider two complex sinusoids in white noise:
$$x(n) = A_1 e^{jn\omega_1} + A_2 e^{jn\omega_2} + w(n)$$
so that $\mathbf{R}_s$ now has rank 2. The first 2 eigenvalues of $\mathbf{R}_{xx}$ contain the signal (plus noise) power; the remaining $M-2$ eigenvalues are due to the noise alone ($= \sigma_w^2$).
[Figure: the signal vectors $\mathbf{e}_1, \mathbf{e}_2$ and the eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$; $\mathbf{v}_1$ and $\mathbf{v}_2$ span the signal subspace.]
The signal and noise subspaces are orthogonal, but $\mathbf{e}_1$ and $\mathbf{e}_2$ are not the signal eigenvectors in this case (they span the signal subspace without being orthogonal to each other).
• Signal eigenvectors span a 2D subspace
• Noise eigenvectors span an $(M-2)$-dimensional subspace
Subspace Methods
Extending to $p$ sinusoids: $\mathbf{R}_s$ now has rank $p$, and the eigenvectors $\mathbf{v}_{p+1}, \ldots, \mathbf{v}_M$ span the noise subspace. Using the orthogonality of the noise eigenvectors to the signal vectors, $\mathbf{e}^H(\omega_i)\,\mathbf{v}_k = 0$ for $k = p+1, \ldots, M$, PSD estimation can be performed as:
$$\hat{P}(e^{j\omega}) = \frac{1}{\sum_{k=p+1}^{M}\alpha_k\left|\mathbf{e}^H(\omega)\,\mathbf{v}_k\right|^2}$$
which exhibits sharp peaks at the sinusoid frequencies. The methods differ in the weights $\alpha_k$ and in how much of the noise subspace is used:
• Pisarenko Harmonic Decomposition: uses only the minimum-eigenvalue eigenvector (with $M = p+1$)
• MUltiple SIgnal Classification (MUSIC): $\alpha_k = 1$ for all noise eigenvectors
• EigenVector Method: $\alpha_k = 1/\lambda_k$
• Minimum Norm Method: uses a single vector that lies in the noise subspace and has minimum norm
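A minimal sketch of one of these estimators, MUSIC (assumptions: the number of sinusoids $p$ is known, $\alpha_k = 1$, and the autocorrelation matrix is estimated from overlapping length-$M$ snapshots):

```python
import numpy as np

# MUSIC pseudospectrum: project steering vectors e(w) onto the noise
# subspace of the estimated M x M autocorrelation matrix.
def music(x, p, M=16, nfft=512):
    N = len(x)
    X = np.array([x[i:i + M] for i in range(N - M + 1)]).T  # snapshot columns
    Rxx = X @ X.conj().T / (N - M + 1)
    lam, V = np.linalg.eigh(Rxx)          # eigenvalues in ascending order
    Vn = V[:, :M - p]                     # noise subspace eigenvectors
    omega = 2 * np.pi * np.arange(nfft) / nfft
    E = np.exp(1j * np.outer(np.arange(M), omega))          # steering vectors
    denom = np.sum(np.abs(Vn.conj().T @ E) ** 2, axis=0)
    return omega, 1.0 / denom             # peaks at the sinusoid frequencies

# Example: two complex sinusoids in complex white noise
rng = np.random.default_rng(0)
n = np.arange(256)
x = (np.exp(1j * 0.4 * np.pi * n) + np.exp(1j * 0.6 * np.pi * n)
     + 0.5 * (rng.standard_normal(256) + 1j * rng.standard_normal(256)))
omega, P = music(x, p=2)
```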
Comparison of the 4 Subspace Methods
[Figure: estimated pseudospectra, magnitude (dB) vs. frequency (units of $\pi$), for the four methods: Pisarenko, MUSIC, EigenVector, and Minimum Norm.]
Pisarenko only needs a 5 x 5 correlation matrix
A 64 x 64 correlation matrix was used for the other methods.
Except for Pisarenko's method, all estimates are correct!
Overlay of 10 different realizations of 4 complex sinusoids in white noise.
Principal Components Spectral Estimation
$$\mathbf{R}_{xx} = \underbrace{\sum_{i=1}^{p}\lambda_i\,\mathbf{v}_i\mathbf{v}_i^H}_{\text{Signal}} + \underbrace{\sum_{i=p+1}^{M}\lambda_i\,\mathbf{v}_i\mathbf{v}_i^H}_{\text{Noise}}$$
Can we de-noise the signal by discarding the noise eigenvectors? In linear algebra terms: we impose a rank-$p$ constraint on $\mathbf{R}_{xx}$.
Principal component analysis (PCA) can be used with the Blackman-Tukey, maximum entropy, and AR spectrum estimation methods.
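In linear-algebra terms, the de-noising step is just an eigenvalue-truncated reconstruction of $\mathbf{R}_{xx}$; a minimal sketch:

```python
import numpy as np

# Rank-p approximation: keep only the p largest eigen-components of the
# autocorrelation matrix, discarding the noise eigenvectors.
def rank_p_denoise(Rxx, p):
    lam, V = np.linalg.eigh(Rxx)              # ascending eigenvalues
    lam_s, V_s = lam[-p:], V[:, -p:]          # signal subspace
    return (V_s * lam_s) @ V_s.conj().T       # sum_i lam_i v_i v_i^H
```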
Summary of the Different Methods
• "Numerator" methods: Periodogram, Blackman-Tukey, Moving Average (MA)
• "Denominator" methods: Autoregressive (AR), Max. Entropy
• Both numerator and denominator: ARMA
• Subspace methods (signal subspace / noise subspace): Pisarenko, MUSIC, Min. Norm, Eigenvector
Adaptive Signal Processing & Machine Intelligence
Principal Component Analysis
Danilo Mandic, room 813, ext: 46271
Department of Electrical and Electronic Engineering
Imperial College London, UK
[email protected], URL: www.commsp.ee.ic.ac.uk/∼mandic
What is it that a matrix does to a vector?
Ampli-twist: a matrix $\mathbf{A}$ which multiplies a vector $\mathbf{x}$ (i) stretches or shortens the vector and (ii) rotates the vector.
• $\mathbf{A}$: any general matrix
• $\mathbf{R}$: a rotation matrix ($\mathbf{R}^T = \mathbf{R}^{-1}$ and $\det\mathbf{R} = 1$)
• $\mathbf{E}$: eigenanalysis, $\mathbf{E}\mathbf{x} = \lambda\mathbf{x}$
• $\mathbf{P}$: a projection matrix

An example of a rotation matrix:
$$\mathbf{R} = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}$$

What can we say about the properties of the matrix $\mathbf{A}$, the matrix $\mathbf{E}$, and the projection matrix $\mathbf{P}$ (rank, invertibility, ...)? Is the projection matrix invertible?
The meaning of eigenanalysis
Let $\mathbf{A}$ be an $n\times n$ matrix, where $\mathbf{A}$ is a linear operator on vectors in $\mathbb{R}^n$, such that $\mathbf{A}\mathbf{x} = \mathbf{b}$.
An eigenvector of $\mathbf{A}$ is a vector $\mathbf{v} \in \mathbb{R}^n$ such that $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$, where $\lambda$ is called the corresponding eigenvalue.
Matrix $\mathbf{A}$ only changes the length of $\mathbf{v}$, not its direction!
Compare the eigenvalue equation $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ with the general equation $\mathbf{A}\mathbf{x} = \mathbf{b}$: an eigenvector is mapped to a scaled version of itself, whereas a general vector is mapped to an arbitrary new vector.
Motivation for Principal Component Analysis (PCA) (from material by Rasmus Bro)
And now, principal directions in data (from material by Rasmus Bro)
Clearly, a 2D representation provides a perfect separability between humans and monkeys, together with the principal directions in the data.
Principal Component Analysis (PCA): also known as the Karhunen-Loève transform
• Many signal processing, control and machine learning tasks employ multivariate data which often exhibit dependencies and redundancies.
• For example, it is often useful to reduce the dimensionality of a signal while maintaining the useful information.
• This reduces the computational complexity of any algorithm while preserving the physical meaning of the data.
• Besides dimensionality reduction, we often would like to transform the multi-channel data such that each channel is orthogonal to the others (the data covariance matrix is diagonal).
• We use the PCA to accomplish this goal → the PCA has been called one of the most valuable results from applied linear algebra.
Principal Component Analysis (PCA): Geometric View
[Figure: a data cloud, its principal directions, and the projected data.]
Principal Component Analysis (PCA): Derivation
• Consider a general data vector, $\mathbf{x}_k \in \mathbb{C}^{M\times 1}$, with the empirical (sample) covariance matrix defined as
$$\mathrm{cov}(\mathbf{x}_k) \stackrel{\mathrm{def}}{=} \mathbf{R}_x = \frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_k\mathbf{x}_k^H$$
• Also, if we define a matrix $\mathbf{X} \in \mathbb{C}^{N\times M}$: $\mathbf{X}^T = \left[\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N\right] \Longrightarrow \mathbf{R}_x = \frac{1}{N}\mathbf{X}^H\mathbf{X}$
• The Hermitian covariance matrix $\mathbf{R}_x$ admits the following eigenvalue decomposition: $\mathbf{Q}^H\mathbf{R}_x\mathbf{Q} = \mathbf{\Lambda}$
• The diagonal eigenvalue matrix, $\mathbf{\Lambda} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_M\}$, indicates the power of each component of $\mathbf{x}_k$.
• The matrix of eigenvectors, $\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_M]$, designates the principal directions of the data.
Principal Component Analysis (PCA): Derivation
• Suppose $\mathbf{x}_k$ is to be transformed into a vector, $\mathbf{U}_k \in \mathbb{C}^{M\times 1}$, using a linear transformation matrix $\mathbf{W}$, so that
$$\mathbf{U}_k = \mathbf{W}\mathbf{x}_k, \qquad \text{where}\quad \mathrm{cov}(\mathbf{U}_k) = \mathbf{I}.$$
• The PCA states that $\mathbf{W} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H$ can be obtained from the eigenvector and eigenvalue matrices.
• Proof:
$$\mathrm{cov}(\mathbf{U}_k) = \frac{1}{N}\sum_{k=1}^{N}\mathbf{U}_k\mathbf{U}_k^H = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H\left(\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_k\mathbf{x}_k^H\right)\mathbf{Q}\mathbf{\Lambda}^{-\frac{1}{2}} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H\mathbf{R}_x\mathbf{Q}\mathbf{\Lambda}^{-\frac{1}{2}} = \mathbf{I}.$$
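The proof can be verified numerically; a minimal sketch (real-valued data for simplicity, so $^H$ becomes $^T$) that builds $\mathbf{W} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H$ and checks that the transformed data has identity covariance:

```python
import numpy as np

# PCA whitening: W = Lambda^{-1/2} Q^T decorrelates the data and
# normalises each component to unit variance.
rng = np.random.default_rng(0)
N, M = 1000, 3
# Correlated data: mix independent sources through a random matrix
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M))

Rx = X.T @ X / N                       # sample covariance
lam, Q = np.linalg.eigh(Rx)            # Rx = Q diag(lam) Q^T
W = np.diag(lam ** -0.5) @ Q.T         # PCA whitening matrix
U = X @ W.T                            # U_k = W x_k for each row
assert np.allclose(U.T @ U / N, np.eye(M), atol=1e-8)
```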
Principal Component Analysis (PCA): Dimensionality Reduction
• To perform dimensionality reduction, the PCA can be applied to obtain a transformed data vector $\mathbf{U}_{r,k} \in \mathbb{C}^{r\times 1}$ with dimension $r < M$, as
$$\mathbf{U}_{r,k} = \mathbf{W}_r\mathbf{x}_k = \mathbf{\Lambda}_{1:r}^{-\frac{1}{2}}\mathbf{Q}_{1:r}^{H}\mathbf{x}_k$$
• $\mathbf{\Lambda}_{1:r} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ and $\mathbf{Q}_{1:r} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_r]$
• Note: $r$ corresponds to the $r$ largest eigenvalues in $\mathbf{\Lambda}$.
• The PCA selects the directions in which the data expresses maximal variance, that is, the directions of the principal eigenvectors of the data matrix.
• The PCA matrix $\mathbf{W}_r$ can be interpreted as a projection matrix, as we are unable to recover $\mathbf{x}_k$ from the "reduced" data vector $\mathbf{U}_{r,k}$.
• The PCA projects the data onto the axes which exhibit the $r$ largest variances.
Principal Component Analysis (PCA): Connections with the Singular Value Decomposition (SVD)
• Consider the SVD of the data matrix $\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$.
• The matrices $\mathbf{U}$ and $\mathbf{V}$ are unitary, i.e. $\mathbf{U}^H\mathbf{U} = \mathbf{I}$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}$.
• The covariance matrix $\mathbf{R}_x = \frac{1}{N}\mathbf{X}^H\mathbf{X} = \frac{1}{N}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^H$.
• Relating this to the eigenvalue decomposition: $\mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^H = \frac{1}{N}\mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^H$, i.e. $\mathbf{Q} = \mathbf{V}$ and $\mathbf{\Lambda} = \mathbf{\Sigma}^2/N$.
• So, the PCA matrix $\mathbf{W} = \mathbf{\Lambda}^{-\frac{1}{2}}\mathbf{Q}^H = \sqrt{N}\,\mathbf{\Sigma}^{-1}\mathbf{V}^H$ can be obtained directly from the SVD of $\mathbf{X}$ → no need to compute $\mathbf{R}_x$.
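A minimal sketch of the same whitening computed directly from the SVD (the $\sqrt{N}$ factor follows from $\mathbf{\Lambda} = \mathbf{\Sigma}^2/N$ above):

```python
import numpy as np

# Obtain the PCA whitening matrix from the SVD of the data matrix,
# avoiding the explicit covariance computation.
rng = np.random.default_rng(0)
N, M = 1000, 3
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M))

U_svd, s, Vh = np.linalg.svd(X, full_matrices=False)
W = np.sqrt(N) * np.diag(1.0 / s) @ Vh      # W = sqrt(N) Sigma^{-1} V^H
U = X @ W.T                                 # whitened data
assert np.allclose(U.T @ U / N, np.eye(M), atol=1e-8)
```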
Principal components in data and noncircularity

Noncircularity and PCA

PCA examples, uncorrelated data

PCA examples, correlated data
Matrix Decomposition for Regression: Principal Component Regression (PCR)
Regression: predicting dependent variables, $\mathbf{Y} \in \mathbb{C}^{N\times L}$, from a linear combination of independent variables, $\mathbf{X} \in \mathbb{C}^{N\times M}$, through a set of coefficients $\mathbf{B} \in \mathbb{C}^{M\times L}$:
$$\mathbf{Y} = \mathbf{X}\mathbf{B}$$
The ordinary least squares (OLS) solution is given by
$$\mathbf{B}_{OLS} = \underbrace{(\mathbf{X}^H\mathbf{X})^{-1}\mathbf{X}^H}_{\text{Pseudo-inverse of }\mathbf{X}}\mathbf{Y}$$
Problem: if $\mathbf{X}$ is sub-rank then the matrix $\mathbf{X}^H\mathbf{X}$ is singular, and so the calculation of $(\mathbf{X}^H\mathbf{X})^{-1}$ becomes intractable.
This occurs in big data settings where the number of variables being measured is large and may contain redundant information.
Matrix Decomposition for Regression: Principal Component Regression (PCR)
• The PCA representation allows the separation of data into independent components.
• For sub-rank data, the number of important components will be less than the number of variables, and this "subspace" can be identified using PCA.
• Furthermore, the SVD representation allows a straightforward calculation of a generalised inverse as
$$\mathbf{B}^{+}_{PCR} = \underbrace{\mathbf{V}\mathbf{\Sigma}^{-1}\mathbf{U}^H}_{\text{Generalised inverse of }\mathbf{X}}\mathbf{Y}$$
• $\mathbf{\Sigma}^{-1}$ is a diagonal matrix with the reciprocals of the singular values of $\mathbf{X}$.
Hint: use the properties $\mathbf{U}^H = \mathbf{U}^{-1}$ and $\mathbf{V}^H = \mathbf{V}^{-1}$ to derive $\mathbf{B}^{+}_{PCR}$.
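A minimal sketch of PCR (the rank $r$ is assumed known; in practice it is chosen from the singular-value spectrum), using a truncated SVD so that the generalised inverse exists even when $\mathbf{X}$ is sub-rank:

```python
import numpy as np

# PCR via truncated SVD: invert only the r significant singular values,
# so a rank-deficient X is handled gracefully.
def pcr_fit(X, Y, r):
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U_r, s_r, Vh_r = U[:, :r], s[:r], Vh[:r]
    return Vh_r.conj().T @ np.diag(1.0 / s_r) @ U_r.conj().T @ Y

# Example: 5 measured variables but only rank 2 (redundant measurements)
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
X = Z @ rng.standard_normal((2, 5))        # sub-rank design matrix
Y = Z @ rng.standard_normal((2, 1))
B = pcr_fit(X, Y, r=2)                     # OLS would fail: X^H X singular
```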