
Page 1:

Geometry and Invariance in Signal Processing

Steven T. Smith*

*MIT Lincoln Laboratory, Lexington, MA 02420; [email protected]. This work was sponsored by the United States Air Force under Air Force contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Page 2:

Covariance Matrix Estimation Bounds

Steven T. Smith

Page 3:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 4:

Applications That Use Covariance Matrix Estimation

Air and Ground Surveillance
• Space-time adaptive processing
• SAR/GMTI
• Tracking

Signals Intelligence
• Spectral analysis
• Superresolution

Robust Navigation
• Adaptive beamforming

Undersea Surveillance
• Adaptive beamforming
• Spectral analysis
• Tracking

Advanced Communications
• Adaptive beamforming
• Spectral analysis
• Speech

Algorithms and systems analysis for detection, location, and classification of difficult signals all rely on covariance-based methods.

Page 5:

What's Known About Covariance Matrix Estimation Quality?

• The sample covariance matrix (SCM) is the maximum-likelihood ("most likely") covariance matrix estimate
– The SCM is R̂ = K⁻¹XXᴴ (X is the N-by-K "data matrix"); the "sample support" is K samples
– The SCM is unbiased: E[R̂] = R
– The SCM is "efficient": Cov(R̂ − R) is as small as possible
– The SCM is a lousy estimate at low sample support and low SNRs; subspace and ad hoc methods like "diagonal loading" are necessary

[Figure: Reed-Mallett-Brennan-Kelly-Boroson detection loss: loss (dB, 0 to 6) vs sample support/N (1 to 20); the average SINR loss factor is (K−N+2)/(K+1). Figure: SCM eigenvalues (dB) vs index for K = 2N, the "deformed quarter-circle law".]
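A quick numerical sketch (mine, not from the slides) of the detection-loss curve above, assuming the (K−N+2)/(K+1) average-SINR-loss factor and an assumed N:

% RMB detection loss: average SINR loss factor rho = (K-N+2)/(K+1)
% for an N-degree-of-freedom adaptive filter trained with K samples.
N = 10;                                        % assumed number of adaptive DOF
ratio = 1:0.1:20;                              % sample support / N
K = ratio * N;
loss_dB = -10*log10((K - N + 2) ./ (K + 1));   % ~3 dB at K = 2N, as in the figure
plot(ratio, loss_dB);
xlabel('Sample support / N'); ylabel('Loss (dB)');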

Page 6:

Geometry is the Foundation of Signal Processing

Signal processing steps: Physical Modeling → Measurements → Filtering + Adaptation → Detection → Estimation → Tracking. Geometric objects arise at every step:

• Covariance matrices: Hermitian positive definite
• Signal subspaces: Euclidean space, Grassmann manifold, Stiefel manifold
• Scaling: magnitude, phase
• Statistical models: ƒ(z|θ), with a parameter space and Cramér-Rao bounds
• Spectral estimation: array manifolds
• Invariance testing

Page 7:

What's the Average Value of a Circle? (Uniform Distribution)

• On the circle S¹: does E[θ] = 0?
• Shouldn't E[θ] live on the circle? Yes! But where?
• What about this circle? Or this one? Should the embedding matter?

Page 8:

Some Literature (Very Partial List)

• Array manifolds / Spectral estimation– Schmidt ’79, Roy & Kailath ’89, Swindlehurst et al. ’92

• Maximum likelihood (geometric approach)– Amari ’85, Douglas & Amari ’97

• Phase-only nulling– Baird & Rassweiler '76, Steyskal '83, Hirasawa '88, Smith '94

• Structured covariance matrix estimation– Burg, Luenberger & Wenger '82, Fuhrmann & Barton '90, Barton & Smith '96

• Fourier analysis on homogeneous spaces (spheres etc.)– Healy ’95, Rockmore ’95

• Invariant hypothesis testing / Detection– Fisher ’53, Kay & Scharf ’84, Bose ’94, Scharf ’95

• Ambiguity functions– Rendas & Moura ’98

• Estimation bounds– Rao ’45, Gorman & Hero ’90, Smith ’00, ’05, Bhattacharya and Patrangenaru ’02

• Communications– Douglas ’00, Rahbar ’01, Zheng & Tse ’02, Cichocki, Amari & Georgiev ’02, Xavier ’02

• Pose estimation– Srivastava & Grenander '99, Ma et al. '01, Adler et al. '02, Srivastava & Klassen '02

Page 9:

Proving Wegener's Theory of Continental Drift

Do magnetic polarities here and here have the same statistical distribution?
– "Dispersion on a sphere" (Fisher, 1953)

Fisher's famous paper actually analyzed data from Iceland.

[Figures: paleogeographic reconstructions at 730 MYA and 65 MYA; from Margaret Hanson (U Cincinnati), Gary Glatzmaier (UCSC), www.ucmp.berkeley.edu/geology/tectonics.html, and www.itis-molinari.mi.it/Boundaries.html, based on Vine (1966).]

Page 10:

Outline

• Introduction

• Geometric background

– Spheres, subspaces, and covariance matrices

Non-Euclidean examples in signal processing

– Manifolds, derivatives, geodesics

• Nonlinear estimation theory

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 11:

What Is the Average of Two Points on a Circle?

• Average(θ1, θ2) = w1·θ1 + w2·θ2
– What does multiplication by a weight mean?
– What does addition mean?
• These operations only make sense if the circle "lives" in some Euclidean space
• Are there "intrinsic" or "natural" equivalents of these ideas so that all operations take place on the circle?
– w1·θ1 = some other point on the circle
– θ1 + θ2 = some other point on the circle
• Same questions for spheres and n-spheres

Page 12:

What Is the Average of Two Subspaces?
Different Manifold, Same Questions

• Average(Y1, Y2) = w1·Y1 + w2·Y2
– What do these operations mean?
– Intrinsic explanations required
– w1·Y1 = some other subspace
– Y1 + Y2 = some other subspace
• No obvious way to embed the space of subspaces Gn,p (the Grassmann manifold) in Euclidean space
– Y is an n-by-p matrix with orthonormal columns, but only the column span matters: YA and Y have the same column span
– The n-by-n projection matrix YYᵀ
– Neither gives a way to compute w1·Y1 + w2·Y2

[Figure: two subspaces Y1 and Y2 drawn in coordinates X1, X2, X3.]

Page 13:

Covariance Matrix Estimation

• Sample covariance matrix (SCM): R̂ = K⁻¹XXᵀ, with data matrix X
• What's the average value of the SCM?
– E[R̂] = ∫ R̂ ƒ(X|R) dX = R
– If w1·R1 + w2·R2 makes sense, then the integral makes sense
– May we treat covariance matrices as vectors?
• Question: What do you get when you subtract one covariance matrix from another?
• Answer: Not a covariance matrix!

  ( 2 0 )   ( 1 0 )   ( 1  0 )
  ( 0 1 ) − ( 0 2 ) = ( 0 −1 )

The covariance matrices are not a vector space; they form the cone of Hermitian positive-definite matrices: R is a covariance, and so is λR, λ > 0.
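The "not a covariance matrix" claim is a one-line check in Matlab, a sketch using the 2-by-2 example above:

% The difference of two covariance matrices need not be a covariance matrix:
R1 = [2 0; 0 1]; R2 = [1 0; 0 2];
eig(R1 - R2)     % eigenvalues -1 and 1: indefinite, so R1 - R2 is not a covariance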

Page 14:

Comparing Points on Manifolds

• Compare points using geodesic curves [the exponential map]
– Equate points on the manifold with tangent vectors at θ
• Average(θ1, θ2) = exp_θ(w1·exp_θ⁻¹θ1 + w2·exp_θ⁻¹θ2)
– The intrinsic average "lives" on the manifold
• Estimator bias b(θ) depends upon the choice of geodesics:

b(θ) = E[exp_θ⁻¹θ̂] ≠ exp_θ⁻¹E[θ̂]

[Figure: parameter manifold with estimator θ̂ and E[θ̂], geodesic curves, and the tangent plane at θ where exp_θ⁻¹ lives.]
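As a concrete sketch (my example, not the talk's): on the circle, exp and exp⁻¹ are just angle addition and wrapped angle subtraction, so the intrinsic average can be computed directly.

% Intrinsic (geodesic) average of two points on the circle S^1.
% exp_theta(v) = theta + v; exp_theta^{-1}(phi) = angle from theta to phi.
theta1 = 0.1; theta2 = 3*pi/4;     % two points (angles), assumed example
theta  = theta1;                   % base point for the exponential map
w1 = 0.5; w2 = 0.5;                % averaging weights
logmap = @(phi) angle(exp(1i*(phi - theta)));        % exp^{-1}: wrap to (-pi, pi]
avg = theta + w1*logmap(theta1) + w2*logmap(theta2); % exp of weighted tangent sum
avg = angle(exp(1i*avg))           % the average lives on the circle

In general the result depends on the base point; iterating the base point toward the average gives the intrinsic (Karcher) mean.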

Page 15:

Natural Geodesics on Quotient Manifolds

These manifolds are quotients of Lie groups:

Spheres = U(n)/U(n−1) = the part of U(n) that rotates the north pole. Geodesics are great circles; distances are measured in radians.

Subspaces = U(n)/(U(p) × U(n−p)) = the part of U(n) that doesn't give in-plane or co-plane rotations. Subspace geodesics:
Y(t) = (Y V cos(Σt) + U sin(Σt)) Vᴴ
distance = 2-norm of acos(singular values); distances are measured in radians.

Covariance matrices = Gl(n,C)/U(n) = the Hermitian part of the matrix polar decomposition. Covariance geodesics:
R(t) = R^(1/2) expm(R^(−1/2) D t R^(−1/2)) R^(1/2)
distance = 2-norm of log(eigenvalues); distances are measured in decibels. Compare to the flat geodesics R(t) = R + tD.
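The covariance geodesic and its distance translate directly into Matlab; a sketch with two assumed random covariance matrices:

% Natural geodesic and distance between covariance matrices R1 and R2.
A = randn(4); R1 = A*A' + eye(4);
B = randn(4); R2 = B*B' + eye(4);
d_nat = norm(log(eig(R2, R1)))         % 2-norm of log(generalized eigenvalues)
Rh = sqrtm(R1);
D  = Rh * logm(Rh \ R2 / Rh) * Rh;     % tangent direction chosen so R(1) = R2
Rt = @(t) Rh * expm(t * (Rh \ D / Rh)) * Rh;   % R(t) = R1^(1/2) expm(...) R1^(1/2)
norm(Rt(1) - R2, 'fro')                % ~0: the geodesic hits R2 at t = 1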

Page 16:

Derivatives

• The differential
(d/dt) ƒ(x(t)) = ∂ƒ/∂x1·dx1/dt + ∂ƒ/∂x2·dx2/dt + … = ∂ƒ/∂x·dx/dt
• The gradient
– The direction grad ƒ that solves the equation ⟨grad ƒ, dx/dt⟩ = ∂ƒ/∂x·dx/dt, i.e., grad ƒ = G⁻¹ ∂ƒ/∂x
– Same derivation as for the Wiener filter equation w = R⁻¹v
• The Hessian / covariant differentiation (think Cramér-Rao bound)
(d²/dt²) ƒ(x(t)) = ∇²ƒ(dx/dt, dx/dt)   (x(t) a geodesic)
∇²ƒ(dxi/dt, dxj/dt) = ∂²ƒ/∂xi∂xj − ∑ₖ Γᵏij ∂ƒ/∂xk, where the Γᵏij are the "curvature" terms

[Figure: a curve x(t) on a manifold with coordinates x1, x2.]
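A small sketch of the gradient rule grad ƒ = G⁻¹ ∂ƒ/∂x (my quadratic example), showing why it parallels the Wiener solution w = R⁻¹v:

% Gradient with respect to a non-Euclidean metric G: grad f = G^{-1} df/dx.
% For f(w) = v'*w - w'*R*w/2 with metric G = R, one natural-gradient step
% from w = 0 lands exactly on the Wiener solution w = R^{-1} v.
A = randn(5); R = A*A' + eye(5);   % covariance = the metric, assumed example
v = randn(5,1);
w = zeros(5,1);
df = v - R*w;                      % Euclidean partial derivatives at w
w  = w + R \ df;                   % one natural-gradient step
norm(w - R\v)                      % 0: matches the Wiener filter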

Page 17:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

– Intrinsic Cramér-Rao bounds

– Covariance matrix estimation

– Subspace estimation accuracy

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 18:

The Fisher Information Matrix (1922)

The covariance of a Gaussian estimate is inversely proportional to the negative mean Hessian of the log-likelihood function.
– "On the mathematical foundations of theoretical statistics" (Fisher, 1922)

Page 19:

Intrinsic Cramér-Rao Lower Bound: Unbiased Euclidean Case

C ≥ G⁻¹
(error covariance ≥ inverse Fisher information matrix)

The CRB looks like: C ≥ beamwidth²/SNR (the inverse-FIM term).

Unbiased: E[θ̂] = θ on the parameter space.

Page 20:

Intrinsic Cramér-Rao Lower Bound: Biased Euclidean Case

C ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ
(error covariance ≥ inverse Fisher information matrix, with the derivative of the bias vector b)

The CRB looks like: C ≥ beamwidth²/SNR (inverse FIM + bias term).

Biased: E[θ̂] = θ + b(θ) on the parameter space.

Page 21:

Intrinsic Cramér-Rao Lower Bound: Unbiased Riemannian Case

C ≥ G⁻¹ − (1/3)(Rm(G⁻¹)G⁻¹ + G⁻¹Rm(G⁻¹))
(error covariance ≥ inverse Fisher information matrix minus mean-Riemannian-curvature terms)

The CRB looks like: C ≥ (beamwidth²/SNR)·(1 − beamwidth²·curvature/SNR) + O(SNR⁻³)

• Inverse FIM term: really care about this term
• Local curvature term: an SNR⁻² term with Riemannian curvature; not sure that I care: an open question
• Higher-order terms: I know that I don't care; the CRB is an asymptotic bound

Unbiased: E[θ̂] = θ on the parameter manifold.

Page 22:

Intrinsic Cramér-Rao Lower Bound: Biased Riemannian Case

C ≥ Mb G⁻¹ Mbᵀ − (1/3)(Rm(C) Mb G⁻¹ Mbᵀ + Mb G⁻¹ Mbᵀ Rm(C))
Mb = I − (1/3)||b||²K(b) + ∇b

(Rm(C) is the mean Riemannian curvature; K(b) holds the sectional curvatures along the bias b and basis directions; ∇b is the covariant differential of the bias vector field b)

The CRB looks like: C ≥ (beamwidth²/SNR)·(1 − beamwidth²·curvature/SNR) + O(SNR⁻³)

• Inverse FIM + bias term: really care about this term
• Local curvature term: an SNR⁻² term with Riemannian curvature; not sure that I care: an open question
• Higher-order terms: I know that I don't care; the CRB is an asymptotic bound

Biased: E[θ̂] = θ + b(θ) on the parameter manifold.

Page 23:

Intrinsic Cramér-Rao Lower Bound

Error covariance ≥ ƒ(inverse Fisher information matrix)

• The Cramér-Rao lower bound is the smallest possible covariance of the estimation error
• The CRB depends upon the choice of geodesics

bias: b(θ) = E[exp_θ⁻¹θ̂] ≠ exp_θ⁻¹E[θ̂]

[Figure: parameter manifold with estimator θ̂, E[θ̂], the map exp_θ⁻¹, and the estimation-error covariance.]

Page 24:

Cramér-Rao Bound in Four Easy Steps
A new proof of the CRB [Euclidean case]

Fact 1. ℓ = log ƒ(z|θ), ℓ′ = ∂ℓ/∂θ, G = E[ℓ′ℓ′ᵀ]
Lemma 1. E[ℓ′] = 0 (differentiate the equality ∫ƒ(z|θ) dz = 1)
Fact 2. θ̂ a biased estimator of θ, E[θ̂] = θ + b(θ)
Lemma 2. E[(θ̂ − θ − b)ℓ′ᵀ] = I + ∂b/∂θ (differentiate Fact 2)
Theorem. E[(θ̂ − θ − b)(θ̂ − θ − b)ᵀ] ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ
Proof. Consider the covariance of the random variable v = (θ̂ − θ − b) − (I + ∂b/∂θ) G⁻¹ ℓ′. E[v] = 0 by Lemma 1 and Fact 2. By Lemma 2, E[vvᵀ] = E[(θ̂ − θ − b)(θ̂ − θ − b)ᵀ] − (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ ≥ 0. QED

Conclusion. E[(θ̂ − θ − b)(θ̂ − θ − b)ᵀ] ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ (the new part)
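The lemmas are easy to check numerically; a Monte Carlo sketch for a scalar Gaussian mean (my example):

% Check Lemma 1 (E[l'] = 0) and the CRB for z ~ Normal(theta, sigma^2),
% unbiased estimator thetahat = sample mean of K draws.
theta = 1; sigma = 2; K = 8; M = 1e5;
z = theta + sigma*randn(K, M);
score = sum(z - theta, 1) / sigma^2;   % l' summed over the K iid draws
mean(score)                            % ~0 (Lemma 1)
G = K / sigma^2;                       % Fisher information for K draws
var(mean(z, 1))                        % ~ sigma^2/K = G^{-1}: sample mean is efficient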

Page 25:

Intrinsic Efficiency: Unbiased Euclidean Case

Consider the random tangent vector:
v = θ̂ − θ − grad ℓ
where θ̂ − θ is the vector from θ to θ̂ and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff θ̂ = θ + grad ℓ

Page 26:

Intrinsic Efficiency: Biased Euclidean Case

Consider the random tangent vector:
v = θ̂ − θ − b(θ) − (I + ∂b/∂θ)(grad ℓ)
where θ̂ − θ is the vector from θ to θ̂, b is the bias vector, ∂b/∂θ is the derivative of the bias vector, and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff θ̂ = θ + b + (I + ∂b/∂θ)(grad ℓ)

Page 27:

Intrinsic Efficiency: Unbiased Riemannian Case

Consider the random tangent vector:
v = exp_θ⁻¹θ̂ − (I − (1/3)Rm(C))(grad ℓ)
where exp_θ⁻¹θ̂ is the tangent from θ to θ̂, Rm(C) is the mean Riemannian curvature of exp_θ⁻¹θ̂ and the basis, and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff exp_θ⁻¹θ̂ = (I − (1/3)Rm(C))(grad ℓ)

Page 28:

Intrinsic Efficiency: Biased Riemannian Case

Consider the random tangent vector:
v = exp_θ⁻¹θ̂ − b − (I − (1/3)||b||²K(b) − (1/3)Rm(C) + ∇b)(grad ℓ)
where exp_θ⁻¹θ̂ is the tangent from θ to θ̂, b is the bias vector, K(b) holds the sectional curvatures between b and the basis, Rm(C) is the mean Riemannian curvature of exp_θ⁻¹θ̂ and the basis, ∇b is the covariant differential of the bias vector field b, and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff exp_θ⁻¹θ̂ = b + (I − (1/3)||b||²K(b) − (1/3)Rm(C) + ∇b)(grad ℓ)

Page 29:

What Are the Sectional and Riemannian Curvatures? And Do They Matter?

• The bias term Mb depends upon ||b||²K(b)
– Small for biases small relative to (max |K|)^(−1/2)
• The covariance term Rm(C) equals E[R(exp_θ⁻¹θ̂, ·) exp_θ⁻¹θ̂]
– Small for errors small relative to (max |K|)^(−1/2): about 9 dB for covariance matrices, 1 radian for subspaces

Riemannian curvature (Z(t) is Z parallel-translated around the small parallelogram with sides tX, tY, −tX, −tY):
R(X,Y)Z = lim_{t→0} (Z − Z(t))/t²

Sectional curvature (A0 = πr² is the flat disc area, A the corresponding area on the manifold):
K = lim_{r→0} 12(A0 − A)/(r²A0)
K(X∧Y) = ⟨R(X,Y)Y, X⟩ / ||X∧Y||²
 = tr([X,Y]²)/4 ≤ 0 for covariances
 = −tr([X,Y]²)/2 ≥ 0 for subspaces
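The sign claims are easy to verify numerically; a sketch using random Hermitian tangents and the commutator formulas above (the subspace line reuses the same commutator just to illustrate the sign flip):

% Sectional curvature signs via the commutator formulas.
n = 4;
X = randn(n) + 1i*randn(n); X = (X + X')/2;   % random Hermitian tangent vectors
Y = randn(n) + 1i*randn(n); Y = (Y + Y')/2;
C = X*Y - Y*X;                                % commutator [X,Y] (skew-Hermitian)
K_cov =  real(trace(C^2))/4                   % <= 0, covariance-matrix formula
K_sub = -real(trace(C^2))/2                   % >= 0, subspace formula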

Page 30:

Metric Structure of a Statistical Model: Fisher Information Metric/Matrix

What is the distance between two distributions? What's the length of ds between ƒ(z|θ) and ƒ(z|θ+dθ)?

With coordinates (Rao '45; e.g., angle, Doppler, etc.)
• ds² = g11 dθ1 dθ1 + 2g12 dθ1 dθ2 + g22 dθ2 dθ2 + …
• gij = E[∂ℓ/∂θi · ∂ℓ/∂θj], ℓ(θ) = log ƒ(z|θ) = the log-likelihood
• G = [gij] = the Fisher information matrix (FIM)

Without coordinates (covariance, subspaces, etc.)
• g = E[dℓ dℓ] = the Fisher information metric

Statistical model S = { ƒ(z|θ) : θ in U }, with pdf ƒ(z|θ) and parameter space U.

Page 31:

The Fisher Information Metric and the Hessian

Fisher information metric: g = E[dℓ dℓ] = −E[∇²ℓ]

Why this fact is useful and important:
• Computationally convenient
– Second derivatives of many distributions are more tractable than squares of first derivatives
• Independent of the curvature of the parameter space
– The Fisher information matrix is independent of the arbitrary choice of affine connection and/or distance metric ("curvature" terms) for the parameter space

Recall that ∇²ƒ(dxi/dt, dxj/dt) = ∂²ƒ/∂xi∂xj − ∑ₖ Γᵏij ∂ƒ/∂xk, with the Γᵏij "curvature" terms.
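A Monte Carlo sketch of g = E[dℓ dℓ] = −E[∇²ℓ] for a unit-variance Gaussian mean (assumed example):

% Verify E[(dl/dtheta)^2] = -E[d2l/dtheta2] for z ~ Normal(theta, 1):
% l(theta) = -(z - theta)^2/2 + const, so dl = z - theta and d2l = -1.
theta = 0.7; M = 1e6;
z = theta + randn(1, M);
mean((z - theta).^2)        % E[(dl)^2] ~ 1
-mean(-ones(1, M))          % -E[d2l]   = 1: both equal the Fisher information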

Page 32:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

– Intrinsic Cramér-Rao bounds

– Covariance matrix estimation

– Subspace estimation accuracy

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 33:

Surprising and Useful Result!
SCM Is a Biased and Inefficient Estimator

• Sample covariance matrix (SCM): R̂ = K⁻¹XXᵀ, with data matrix X

Covariance matrices flat:
• Geodesics R(t) = R + t(R̂ − R)
• E_R[R̂] = exp_R ∫ exp_R⁻¹(R̂) ƒ(X|R) dX = R + ∫ (R̂ − R) ƒ(X|R) dX = R
• R̂ is an unbiased and efficient (i.e., achieves the CRB) estimate of R
• Doesn't account for the extra estimation loss at low sample support
No surprise here.

Covariance matrices curved:
• Geodesics R(t) = R^(1/2) e^(R^(−1/2) D t R^(−1/2)) R^(1/2)
• E_R[R̂] = e^(−β(N,K)) R ≠ R
• R̂ is a biased and inefficient (error larger than the CRB) estimate of R
• The bias term β(N,K) corresponds to the extra estimation loss at low sample support
Completely unexpected!
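The intrinsic bias is visible in simulation: on flat geodesics the SCM averages to R, but its log-determinant (the natural-metric quantity) is biased low. A Monte Carlo sketch with assumed complex Gaussian data:

% The SCM is flat-unbiased, E[Rhat] = R, yet E[log det Rhat] < log det R:
% the determinant (hence the eigenvalues) is biased in the curved geometry.
N = 6; K = 12; M = 2000; R = eye(N);
flat = zeros(N); ld = 0;
for m = 1:M
    X = (randn(N,K) + 1i*randn(N,K)) / sqrt(2);  % X ~ CN(0, I), so R = I
    Rhat = X*X' / K;
    flat = flat + Rhat/M;
    ld   = ld + real(log(det(Rhat)))/M;
end
norm(flat - R, 'fro')   % small: the flat (Euclidean) average recovers R
ld                      % clearly negative, although log det R = 0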

Page 34:

Sample Covariance Matrix Estimation: Covariance RMSE vs Sample Support

[Figure: covariance RMSE (dB) vs sample support/N, 6-by-6 Hermitian example, 1000 Monte Carlo trials. Curves: SCM (natural metric), biased natural CRB, unbiased natural CRB ≈ (10/log 10)·N/√K dB, SCM (flat metric), and flat unbiased CRB ≈ (10/log 10)·ƒ(R)/√K dB.]

An estimator θ̂ of θ is efficient (neglecting Riemannian curvature) iff:
exp_θ⁻¹θ̂ = b(θ) + (I − ||b||²K(b)/3 + ∇b) grad ℓ

Flat efficiency:
θ̂ = θ + b(θ) + (I + ∂b/∂θ) G⁻¹ (∂ℓ/∂θ)ᵀ

Is there a more efficient covariance estimator at low sample support? There is an ≈ 10 dB difference between the natural and flat curves.

Page 35:

CRBs for SCMs: Closed-Form Expressions

• Natural covariance metric/geodesics (a "whitened" covariance metric)
– distance(R̂, R) = norm(log(eig(R̂, R)))
– mean-square distance(R̂, R) ≥ N²/K + N·β(N,K)²
β(N,K) = N⁻¹(N·log K + N − ψ(K−N+1) + (K−N+1)ψ(K−N+2) + ψ(K+1) − (K+1)ψ(K+2))  [N-by-N Hermitian case]
β(N,K) = log(K/2) − N⁻¹ ∑ᵢ ψ((K−i+1)/2)  [N-by-N symmetric case]
where ψ = Γ′/Γ is the digamma function
– The natural covariance metric is invariant to R and complete

• Flat covariance metric/geodesics
– distance(R̂, R) = norm(R̂ − R, 'fro')
– mean-square distance(R̂, R) ≥ K⁻¹(∑ᵢ Rii² + 2∑_{i<j} Rii Rjj)  [Hermitian]
– mean-square distance(R̂, R) ≥ 2K⁻¹(∑_{i≤j} Rij² + ∑_{i<j} Rii Rjj)  [symmetric]
– The flat covariance metric is neither invariant nor complete
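A direct Matlab transcription of the Hermitian-case bias term as reconstructed above (a sketch; psi is Matlab's digamma function):

% beta(N,K): SCM bias term, N-by-N Hermitian case.
beta = @(N,K) (N*log(K) + N - psi(K-N+1) + (K-N+1)*psi(K-N+2) ...
               + psi(K+1) - (K+1)*psi(K+2)) / N;
beta(6, 12)     % example: the bias shrinks as the sample support K grows
beta(6, 120)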

Page 36:

Symmetric Space Geometry and SCM Bias

E_R[R̂] = e^(−β(N,K)) R, i.e., the bias vector field is B(R) = −β(N,K) R

Covariance matrices = Gl(n,C)/U(n) ≅ R₊ × Sl(n,C)/U(n)
(Gl(n,C)/U(n) = the Hermitian part of the matrix polar decomposition; the R₊ factor = the determinant; Sl(n,C)/U(n) = the unit-determinant covariance matrices)

• Only the SCM's determinant is biased
– Implies that the eigenvalues are biased
• The covariant differential and the curvature of B vanish
– Because B(R) = −βR
• Is there a connection with the symmetric space decomposition of the covariance matrices?
• Can something be said about the bias in other symmetric spaces, e.g., the Grassmann manifold?

Page 37:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

– Intrinsic Cramér-Rao bounds

– Covariance matrix estimation

– Subspace estimation accuracy

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 38:

Subspace Estimation Accuracy

• Standard subspace estimation method (SVD)
– [U,S,V] = svd(X,0) (Matlab notation)
– SVD-based subspace estimate: Yest = U(:,1:p)
• What is the error between this estimate and the truth?
– Natural subspace distance = (∑ principal angles²)^(1/2):
norm(acos(min(svd(orth(Yest)'*orth(Ytrue)),1)))
– Recall that the Fisher information metric is independent of this choice of error metric

[Figure: subspaces Ytrue and Yest separated by the subspace distance.]
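Putting the two snippets above together, a runnable sketch under an assumed signal-plus-noise model:

% SVD-based subspace estimation and the natural (principal-angle) distance.
n = 5; p = 2; K = 10; snr = 10^(21/10);   % matches the 5-by-2, 21 dB example
Ytrue = orth(randn(n, p));                % true subspace basis
X = sqrt(snr)*Ytrue*randn(p, K) + randn(n, K);   % snapshots: signal + noise
[U, S, V] = svd(X, 0);
Yest = U(:, 1:p);                         % SVD-based subspace estimate
dist = norm(acos(min(svd(orth(Yest)'*orth(Ytrue)), 1)))   % radians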

Page 39:

Subspace Accuracy vs SNR and Sample Support

[Figures: subspace accuracy (rad) vs SNR (dB), 5-by-2 example, 10 snapshots, 1000 Monte Carlo trials; and subspace accuracy (rad) vs sample support/2, 5-by-2 example, 21 dB SNR, 1000 Monte Carlo trials. Each compares SVD-based subspace estimation with the Cramér-Rao bound ≈ (pn − p²)^(1/2)/(K·SNR)^(1/2) rad (white subspace covariance).]

• The RMSE of the SVD-based estimator is a small constant fraction above the CRB for fixed sample support
• The RMSE of the SVD-based estimator approaches the CRB for large sample support
• SVD-based estimation provides nearly optimum performance
• Coordinate-free CRB analysis is required for this conclusion

Page 40:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

– The detection problem

– Invariant analysis via fiber integration

– Mean and variance CFAR normalizers

• Geometric optimization and filtering theory

• Summary and conclusions


Page 41:

The Detection Problem: Unknown Background

• Detector design: threshold data to detect signals in the presence of noisy backgrounds with a low false-alarm rate
• Problem: predict detection performance with unknown background mean* and variance*

[Figure: power (dB) vs time (µs) with the signal, the noise, the threshold, and a false alarm: (almost) everything you need to know about the detection problem. The starred quantities are the background mean* and variance*.]

Page 42:

Mean and Variance CFAR Normalizers

• The deflection ratio: x(z) = (z − μ̂)/σ̂
– The traditional method, the one most often encountered
– Minor nit: maps power-domain data onto the entire real line; easily handled with a remapping such as log(exp(z)+1) or (z+√(z²+1))/2
• The log-deflection ratio: x(z) = exp((log z − log μ̂)/σ̂_log)
– Decibel-space version of the deflection ratio
– Power-domain method
• The log-Gamma CFAR method: x(z) = −log Γ(ν, ν·z/μ̂)/Γ(ν)
– Maps chi-squared backgrounds with ν = μ̂²/σ̂² complex dof to exponential backgrounds (ν = 1); SNR → SNR/ν
– Power-domain method
• Can compute closed-form statistics of all of these
– Fiber integration using the mapping x(z), θ = (μ, σ)
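The three normalizers in Matlab, a sketch with assumed test-cell and background values; gammainc(·,·,'upper') is the regularized upper incomplete gamma, so the last line matches −log Γ(ν, νz/μ̂)/Γ(ν):

% Mean and variance CFAR normalizers for power-domain data z, given the
% background sample mean mu and sample standard deviation sig.
z = 4.0; mu = 1.0; sig = 0.5;                     % assumed values
x_dr = (z - mu) / sig;                            % deflection ratio
x_ld = (log(z) - log(mu)) / 0.4;                  % log-deflection (0.4 = assumed sigma_log)
nu   = mu^2 / sig^2;                              % effective complex dof
x_lg = -log(gammainc(nu*z/mu, nu, 'upper'));      % log-Gamma CFAR

For an exponential background (ν = 1) the log-Gamma normalizer reduces to z/μ̂, consistent with the slide's "maps to exponential backgrounds" remark.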

Page 43:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

– The detection problem

– Invariant analysis via fiber integration

– Mean and variance CFAR normalizers

• Geometric optimization and filtering theory

• Summary and conclusions

Page 44:

Fiber Integration: Interpretation for CFAR Analysis

ƒX(x0) = ∫_{x⁻¹(x0)} ƒZ(ζ) / det(∂x/∂y(ζ)) dy

The CFAR output X is a function of the input data Z: X = X(Z). The input data z yielding CFAR output x0 form the "fiber" above x0; fiber integration of the input statistics over the fiber above x0 gives the CFAR output statistics.

[Figure: mean-level CFAR block diagram: radar data → matched filter (MF) → |·|² → tapped delay line with test cell and selection logic → divide → CFAR output x.]
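A toy sanity check of the fiber idea (my example, not the talk's): for K i.i.d. unit-mean exponential inputs and the mapping x(z) = mean(z), integrating ƒZ over the fiber {z : mean(z) = x0} gives the Gamma(K, 1/K) density, which a Monte Carlo histogram should reproduce.

% Fiber-integration sanity check: pdf of x = mean(z), z_k iid Exp(1).
K = 10; M = 1e5;
z = -log(rand(K, M));                           % unit-mean exponential samples
x = mean(z, 1);
[counts, centers] = hist(x, 50);
empirical = counts / (M * (centers(2) - centers(1)));
analytic  = K^K * centers.^(K-1) .* exp(-K*centers) / gamma(K);   % Gamma(K,1/K)
semilogy(centers, empirical, 'o', centers, analytic, '-');
legend('Monte Carlo', 'fiber-integration result');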

Page 45:

Joint pdf of Sample Mean and Variance

• First two sample moments: m1 = (1/K)∑zₖ, m2 = (1/K)∑zₖ²
– Sample mean and variance: μ̂ = m1, σ̂² = m2 − m1²
– Mutually dependent random variables
– Will use this to our advantage to obtain finite integrals
• Apply fiber integration with the mapping (z1, z2, …, zK) → (m1, m2)
– Results in an integral on the (K−1)-simplex (1/K)∑zₖ = 1 (a fiber); another application of invariance to obtain a uniform density
– Easily performed by Monte Carlo integration with high relative accuracy

ƒ(m1, m2) = [Kᴷ N^(NK) / (Γ(K)Γ(N)ᴷ)] m1^(NK−3) e^(−NKm1) s′(m2/m1²)
s(m2) = (1/M)∑ ∏(zₖ)^(N−1)  (Monte Carlo evaluation)

Page 46:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

– The detection problem

– Invariant analysis via fiber integration

– Mean and variance CFAR normalizers

Deflection ratio: (z − μ̂)/σ̂; log-deflection ratio: (log z − log μ̂)/σ̂_log

Log-Gamma: −log Γ(ν, ν·z/μ̂)/Γ(ν)

• Geometric optimization and filtering theory

• Summary and conclusions

Page 47:

CFAR Distributions: Deflection Ratio and Log-Gamma

• Background statistics: chi-squared with N complex dof
• Sample support: K samples
• Signal model: nonfluctuating target with SNR S

Deflection ratio (m2 = 1 + ν⁻¹):
ƒdr(x) = [Kᴷ Γ(N(K+1)) / (Γ(K)Γ(N)^(K+1))] e^(−S) ∫₁^min(K, 1+1/(σx)) σ^(−1+ν/2) s′(m2) (xσ^(−1/2)+1)^(−1+ν) (xσ^(−1/2)+1+K)^(−N(K+1)) ₁F₁(N(K+1); N; S(xσ^(−1/2)+1)/(xσ^(−1/2)+1+K)) dm2

Log-Gamma CFAR (m2 = 1 + ν⁻¹; γ defined by −log Γ(ν,γ)/Γ(ν) = x):
ƒlg(x) = [Kᴷ Γ(N(K+1)) / (Γ(K)Γ(N)^(K+1))] e^(−S−x) ∫₁^K s′(m2) Γ(ν) ν^(−NK) (1+N/ν)^(−N(K+1)) e^γ ₁F₁(N(K+1); N; S/(1+N/ν)) dm2

Page 48:

CFAR Density Example with Monte Carlo Comparisons

[Figure: pdf (10⁻⁸ to 10², log scale) vs variate (0 to 150) for N = 2, K = 10, SNR = 9 dB; the deflection-ratio and log-Gamma densities, exact and Monte Carlo, agree.]

Page 49:

Receiver Operating Characteristics

[Figure: PD vs SNR (dB), N = 1, K = 20, PFA = 10⁻⁴; curves for the matched filter, mean-level CFAR, deflection ratio, log-Gamma CFAR, and log-deflection ratio.]

• Compare to mean-level CFAR (at PD = 50%): extra CFAR loss for variance normalization
• The deflection ratio has 0.5 dB CFAR loss
• The log-Gamma has 1.2 dB CFAR loss
• The log-deflection ratio has 4 dB CFAR loss
• For sample support K = 10, these losses are 0.7 dB, 1.5 dB, and 8 dB

Page 50:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 51:

Geometric Optimization

• Generalization of Euclidean optimization

• Reformulate classical algorithms

– Newton’s method

– Steepest descent

– Conjugate gradient method

• Perform the optimization on the constraint surface: lines → geodesics, etc. (Luenberger's ant-on-a-surface analogy)

Benefits:
• Natural description of the problem
• Unifying viewpoint for algorithms

Caveats:
• Computationally infeasible in general
• Group invariance highly desirable

Page 52:

Applied Optimization Problems

• Signal processing and detection (Y n-by-p)
H0: z = n (interference+noise hypothesis)
H1: z = av + n (signal-plus-interference+noise hypothesis)
R = covariance matrix = E[nnᴴ]
• Subspace tracking: maximize ƒ(Y) = tr YᴴRY such that YᴴY = I
• Eigenvector tracking: maximize ƒ(Y) = tr YᴴRYN such that YᴴY = I
• Local density approximation of Schrödinger's equation: maximize ƒ(Y) = tr YᴴHY + φ(Y) such that YᴴY = I, where H is the Hamiltonian

Bradbury & Fletcher '66, Alsén '71, Ruhe '74, Cullum '78, Parlett et al. '82, Comon & Golub '90, Yang & Kaveh '88, Yang '95, Edelman, Arias & Smith '98, many others

Page 53:

Common Structure Among Problems

• Constrained optimization (Y n-by-p, Θ p-by-p)
• Group invariance:
tr (YΘ)ᴴR(YΘ) = tr ΘᴴYᴴRYΘ = tr YᴴRYΘΘᴴ = tr YᴴRY
for any unitary matrix Θ
• Pertinent fields of study
– Numerical linear algebra
– Optimization
– Differential and Riemannian geometry
– Lie groups and Lie algebras
– Homogeneous and symmetric spaces
– Adaptive filtering

Exploit the natural structure of the problem.

Page 54:

Newton's Method

Quadratic convergence with the Newton step Δ = −(∇²ƒ)⁻¹(∇ƒ)

Euclidean: maximize ƒ(x), x in Rⁿ
Riemannian: maximize ƒ(x), x in M

Page 55:

Optimization on the Grassmann and Stiefel Manifolds

• Maximize ƒ(Y) = tr YᴴRY such that YᴴY = I (Y n-by-p)
– ƒ(YΘ) = ƒ(Y) for all unitary Θ

Page 56:

Newton's Method on the Grassmann Manifold

Generalized Rayleigh quotient: ƒ(Y) = tr YᴴRY, YᴴY = I

Newton step: solve for the tangent vector H:
(I − YYᴴ)RH − H(YᴴRY) = −(I − YYᴴ)RY;  YᴴH = 0
(the left side is the Hessian of ƒ applied to H, the right side is minus the gradient of ƒ, and YᴴH = 0 is the tangent-vector constraint)

• A Sylvester equation for H
• Asymptotically equivalent to RQI (cubic convergence)
• Solution methods
– Direct O(n³p) method uses the Ritz vectors of Y; computationally unattractive
– Linear conjugate gradient (truncated Newton approach) yields an O(n²p²) algorithm
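A minimal numerical sketch (assumed random data; requires Matlab's sylvester, R2014a+): solve the Sylvester equation for the tangent step H, then move along the Grassmann geodesic and watch tr(YᴴRY) converge.

n = 8; p = 2;
A = randn(n); R = A*A';                        % random symmetric positive-definite R
[Y, ~] = qr(randn(n, p), 0);                   % initial point with Y'*Y = I
for iter = 1:6
    P = eye(n) - Y*Y';                         % projector onto the complement of Y
    H = sylvester(P*R, -(Y'*R*Y), -P*R*Y);     % Newton step; Y'*H = 0 follows
    [U, S, V] = svd(H, 'econ');                % compact SVD of the step
    cs = diag(cos(diag(S))); sn = diag(sin(diag(S)));
    Y = (Y*V*cs + U*sn) * V';                  % geodesic step with t = 1
    fprintf('iter %d: tr(Y''RY) = %.12f\n', iter, trace(Y'*R*Y));
end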

Page 57:

Numerical Results

Trace Maximization on G5,3 [figure]

Page 58:

Interference Suppression

Rotating Phased Array Antenna

Problems:
• Maximize the signal-to-interference-plus-noise ratio
• Track the interference and/or signal subspace

Page 59:

Time Varying Adaptive Filter

Rotating Phased Array Antenna

Page 60:

The Subspace Tracking Equation

R(t) = covariance; Y(t) = principal invariant subspace. The covariance dynamics drive the subspace dynamics for H0 = dY/dt:

H0(YᴴRY) − (I − YYᴴ)RH0 = (I − YYᴴ)(dR/dt)Y;  YᴴH0 = 0
(YᴴH0 = 0 is the tangent-vector constraint)

• The new solution is "close" to the old solution
• Approaches to subspace tracking:
– Rank-one updates versus full-rank updates: Rnew = Rold + xxᴴ versus Rnew = Rold + dR/dt
– Algebraic versus geometric approaches
Algebraic (decomposition based): Fuhrmann '88, Comon & Golub '90, Yu '91, Moonen, Van Dooren & Vandewalle '92, Stewart '92, Champagne '94, Liu '94
Geometric (derivative based): Yang & Kaveh '88, Yang '95, Smith '93, Smith '96

Page 61:

Summary and Conclusions

• Geometric invariance ubiquitous in signal processing

– Geometric properties can be exploited for solutions and insight

• The Cramér-Rao bound with bias is generalized to arbitrary manifolds without intrinsic (prescribed) coordinates

– Estimator bias and efficiency depend upon geometry

– SCM biased and inefficient from an intrinsic perspective; the bias term corresponds to the known sample-support loss

• Derived formula bounding covariance and subspace accuracy

• Fiber integration applied to mean and variance CFAR analysis

– Previously unsolved performance analysis

– The deflection ratio outperforms the log-Gamma and log-deflection ratio tests (0.5 dB loss vs 1.2 dB vs 4 dB)

• Orthogonality constraints easily incorporated into many optimization problems (e.g., subspace tracking)

• Story incomplete—still the Age of Discovery

Page 62:

The End

Page 63:

Backups

Page 64:

Why Geometry for Signal Processing?

• The easy reasons

– Euclidean geometry and linear algebra used for just about everything in signal processing

Optimal Filtering theory / Least squares concepts

Detection and Estimation and Root-Mean-Square Error

Parseval’s theorem

• Abstract geometry is useful

– Use the right tool for the right job

Many objects in signal processing are non-Euclidean: the space of covariance matrices, the space of subspaces

An accurate geometrical description is insightful

– Useful for solving new problems [e.g., mean and variance CFAR]

• Abstract geometry is beautiful

– Differential and Riemannian geometry / Lie groups and Lie algebras / Homogeneous spaces

Page 65:

The Intrinsic Cramér-Rao Bound

C ≥ G⁻¹ = E[dℓ dℓ]⁻¹ (unbiased, neglecting curvature)
C ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ (biased, neglecting curvature)
C ≥ Mb G⁻¹ Mbᵀ + Rm(C) curvature terms (biased, with curvature)

CRB assumptions:
• The parameter space is an arbitrary manifold
• ℓ(θ) = log ƒ(z|θ) = the log-likelihood function
• g = E[dℓ dℓ] = the Fisher information metric
• Arbitrary coordinates θ = (θ1, θ2, …, θn)
• G = [gij] = the FIM w.r.t. arbitrary coordinates
• Any estimator θ̂ of θ with bias vector b(θ)
• Error covariance matrix C = E[(θ̂ − θ − b(θ))(θ̂ − θ − b(θ))ᵀ]
• Matrix inequality: A ≥ B iff A − B is positive semidefinite

Page 66:

Subspace Estimation

• Subspace estimation is a frequently encountered problem in signal processing
• Consider the MIMO problem: z = An1 + n0
– Estimate the channel matrix A from several measurements of z; the noisy inputs n1 and n0 are unknown
– Invariance: z is the same measurement under A → AM, n1 → M⁻¹n1 for an arbitrary invertible matrix M
• Only the column span (subspace) is unchanged by A → AM
– Can only measure the column span of A

• How does one make sense of the “error” or “bias” of the subspace estimate?

– Space of subspaces is non-Euclidean

Page 67:

Homogeneous Spaces. E.g.: Covariance Matrices

• Gl(n,C) = the Lie group of complex invertible matrices: the set of allowable "directions"
• U(n) = the Lie group of unitary matrices, ΘΘᴴ = I: the "directions" that don't get me anywhere
• Group action: R → MᴴRM: the rule for following "directions"
• The cone of Hermitian positive-definite matrices: R is a covariance, so is λR, λ > 0; the identity matrix I sits inside the cone
• The identity matrix is invariant to the group action by U(n): I → ΘᴴIΘ = I

Examples of the group action (Matlab-style matrices):
[1 2j; 0 3]' * [1 0; 0 1] * [1 2j; 0 3] = [1 2j; -2j 13]
[c -s; s c]' * [1 0; 0 1] * [c -s; s c] = [1 0; 0 1]

Covariance matrices = Gl(n,C)/U(n) = the part of Gl(n,C) that doesn't give invariance

Page 68:

Space of Subspaces: The Grassmann Manifold

• Y = a p-dimensional subspace in Cⁿ (a point of Gn,p)
• U(n) = the Lie group of unitary matrices
• U(p) = the Lie group of in-plane rotations
• U(n−p) = the Lie group of co-plane rotations
• Group action: Y → YΘ (rotation)
• The subspace Y is invariant to both in-plane and co-plane rotations

Subspaces = U(n)/(U(p) × U(n−p)) = the part of U(n) that doesn't give in-plane or co-plane rotations

[Figure: a subspace Y drawn in coordinates X1, X2, X3.]

Page 69:

Riemannian Manifolds

• Manifold: a space that locally looks like Rⁿ or Cⁿ
• Riemannian manifold: a manifold with the distance metric
ds² = g11 dx1 dx1 + 2g12 dx1 dx2 + g22 dx2 dx2 + … = dxᵀ G dx = ⟨dx, dx⟩
• Examples
– Sphere: x² + y² + z² = 1; dimension = 3 − 1 = 2
– Orthogonal matrices: ΘᵀΘ = I; dimension = n² − (1/2)n(n+1) = (1/2)n(n−1)

[Figures: a manifold with coordinates x1, x2 and a point x; a Riemannian manifold with nearby points x and x+dx separated by ds.]

Page 70:

Geodesics, a.k.a. the Exponential Map

• Geodesics are generalizations of straight lines
• They minimize the length between two points
• The geodesic equation ("exponential map"):
d²xk/dt² + ∑ᵢⱼ Γᵏij dxi/dt dxj/dt = 0
(the Γᵏij are the curvature terms)
• Geodesics on homogeneous spaces may very often be expressed as matrix exponentials

Sphere: great circles.
Covariance matrices: R(t) = R^(1/2) expm(R^(−1/2) D t R^(−1/2)) R^(1/2); distance = 2-norm of log(eigenvalues).
Subspaces: Y(t) = (Y V cos(Σt) + U sin(Σt)) Vᴴ; distance = 2-norm of acos(singular values).

Page 71:

Subspace Estimation Accuracy Bounds

• Statistical model for subspace estimation (real case)
– Estimate Y given data x = Yn1 + n0, with E[n1n1ᴴ] = R1 (unknown) and E[n0n0ᴴ] = R0 (known)
– X = (x1, x2, …, xK), a matrix of iid snapshots; R̂ = K⁻¹XXᴴ
– Gaussian pdf ƒ(X|R1, Y) = [exp(−(1/2) tr R̂R2⁻¹) / ((2π)ⁿ det R2)]ᴷ; R2 = Y R1 Yᴴ + R0
• Subspace estimation accuracy
– Estimate Y in the presence of the unknown "nuisance" parameters R1
– Subspace estimation Fisher information metric:
Derivatives of the subspace Y look like matrices Δ such that YᴴΔ = 0
Derivatives of the covariance R1 are Hermitian matrices D
g((D,Δ),(D,Δ)) = (1/2)K tr((ΔR1Yᴴ + YR1Δᴴ + YDYᴴ)R2⁻¹)²
– Derive accuracy bounds from the Fisher information metric/matrix; see paper for details

Page 72:

Context of Mean and Variance CFAR Analysis

• "Deflection ratio" (z − μ̂)/σ̂
– Nuttall, "Operating characteristics of log-normalizer for Weibull and log-normal inputs," NUSC TR 8075, 1987
First closed-form analysis of mean and variance CFAR
Assumes Gaussian power-domain statistics
Independent sample mean and variance
– SAR speckle reduction
• Weibull clutter, many citations
• This talk
– Closed-form analysis (single finite integral)
– More-or-less arbitrary power-domain statistics
– More-or-less arbitrary form of CFAR normalizer
E.g., the new log-Gamma CFAR: −log Γ(ν, ν·z/μ̂)/Γ(ν)
– Mutually dependent sample mean and variance

Page 73:

Detector Performance: Receiver Operating Characteristic (ROC) Curves

• PD depends upon the threshold and target SNR

• PFA depends upon the threshold

• ROC curves show the dependence (direct or implicit) of the PD on the target SNR and/or PFA

– PD vs PFA (fixed SNR—i.e., vary the threshold)

– PD vs SNR (fixed PFA—i.e., fix the threshold)

– SNR vs PFA (fixed PD—i.e., vary the threshold)

[Figures: probability density vs output value z showing ƒ(z|no target present), ƒ(z|target present), the threshold, PFA, and PD; and PD (%) vs SNR (dB) at PFA = 10⁻⁶ for the matched filter (4 looks), Swerling II (4 looks), and Swerling I (4 looks).]

Page 74:

Constant False Alarm Rate (CFAR) Thresholding

• Problem: Must know (or estimate) noise floor to set threshold

• Solution: Estimate noise floor using noise-only samples

– Adaptive thresholding

• CFAR thresholding: (test cell)/(noise floor estimate) > threshold
• The threshold depends upon the variance of the background

[Figure: power (dB) vs time (µs) with the signal, the noise floor, an absolute threshold, and a false alarm.]

Page 75:

CFAR Techniques

• Mean-Level CFAR

– Noise estimate = RMS of noise-only cells

– Optimum estimator under ideal assumptions

– Not robust to target in training cells, inhomogeneous clutter

• Greatest-Of Mean-Level CFAR

– Noise estimate = Greatest of left-hand and right-hand sides

– Robust to false alarms caused by clutter on either side of test cell

– Not robust to target in training cells

• Censored Greatest-Of Mean-Level CFAR

– Noise estimate = Remove M largest samples, then GO-MLCFAR

– Robust to M targets in training cells, inhomogeneous clutter

Page 76:

Mean-Level CFAR Performance: Nonfluctuating Target

z = ∑ᵢᴺ |a + nᵢ|² / (K⁻¹ ∑ₖᴷ |nₖ|²)  (noncentral F-distribution)

Kƒ(Kz) = (N+K+1 choose N+1) zᴺ (1+z)^(−N−K) e^(−θN/(1+z)) ₁F₁(−K; N; −θNz/(1+z))

PD(η, θ) = (η/(K+η))^(N+K−1) ∑ₖ₌₀ᴷ (N+K−1 choose N+k) (η/K)ᵏ G_(k+1)(θN/(1+η/K))

Nonfluctuating CFAR statistics, with Gₖ(z) ≝ Γ(k,z)/Γ(k), threshold η, and SNR θ.

[Figure: mean-level CFAR block diagram: radar data → MF → |·|² → tapped delay line with test cell and selection logic → compare → detection decision.]

Closed-form analysis provides the ROC curves of mean-level CFAR.

Page 77:

CFAR Loss: Nonfluctuating Target

• CFAR loss is the extra SNR required to achieve the same PD at fixed PFA
• Smaller CFAR loss for higher sample support and number of looks

[Figures: PD (%) vs SNR (dB), PFA = 10⁻⁶, nonfluctuating target, comparing NCI (4 looks) with mean-level CFAR at K = 1, 2, 5, and 10 samples per look; and CFAR loss (dB, relative to NCI) vs CFAR sample support per look at PD = 95%, PFA = 10⁻⁶, 4 looks, nonfluctuating, for N from 1 to 48.]

Page 78:

Fiber Integration*
*See the differential geometry literature, e.g., Abraham, Marsden & Ratiu

• Given the joint pdf ƒZ(ζ), where ζ is K-dimensional
– This will be either the joint background statistics or the joint distribution of the sample mean and variance
• Given the mapping x(ζ), where x is L-dimensional
• Want the pdf ƒX(x)
– This will be either the CFAR statistics or the joint distribution of the sample mean and variance
• If L = K, standard change of variables: ƒX(x) = ƒZ(ζ)/det(∂x/∂ζ)
– Very boring
• If L < K, fiber integration
– Very cool: invariant to the choice of coordinates and inner product

ƒX(x0) = ∫_{x⁻¹(x0)} ƒZ(ζ) / det(∂x/∂y(ζ)) dy   or   ƒX(x0) = ∫_{x⁻¹(x0)} ƒZ(ζ) / ||∂x/∂ζ|| dS

Page 79:

Geodesics on the Grassmann Manifold: Length-Minimizing Curves

Geodesic curve Y(t) starting at Y in direction H:
Y(t) = (Y V cos(Σt) + U sin(Σt)) Vᴴ
where UΣVᴴ := H (compact SVD of H), YᴴY = I, YᴴH = 0

• Maintains the Grassmann constraint Y(t)ᴴY(t) = I
• Computational complexity O(np²)
• An approximation using the QRD of Y + H is possible
• Perform the optimization line search along Y(t)
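The geodesic as a small function worth keeping around (a sketch meant to live in its own file; H must be a tangent vector, YᴴH = 0):

% Grassmann geodesic Y(t) from orthonormal Y in tangent direction H.
function Yt = grassmann_geodesic(Y, H, t)
    [U, S, V] = svd(H, 'econ');                          % compact SVD: H = U*S*V'
    s = diag(S);
    Yt = (Y*V*diag(cos(s*t)) + U*diag(sin(s*t))) * V';   % Y(0) = Y; Yt'*Yt = I
end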

Page 80:

RQI is Cubically Convergent: Sketch of Proof

Rayleigh quotient along a geodesic: r(θ) = λ cos²θ + μ sin²θ

• Newton's method is cubically convergent on even functions
• RQI and Newton have the same order:
Newton: εnext = ε − (1/2)·tan(2ε) = −(4/3)ε³ + O(ε⁵)
RQI: εnext = ε − tan⁻¹((1/2)·tan(2ε)) = −ε³ + O(ε⁵)

Page 81:

The Subspace Tracking Problem

Problem: determine the principal invariant subspace of a time-varying covariance R(t).

• The new solution is "close" to the old solution
• Approaches to subspace tracking:
– Rank-one updates versus full-rank updates: Rnew = Rold + xxᴴ versus Rnew = Rold + dR/dt
– Algebraic versus geometric approaches
Algebraic (decomposition based): Fuhrmann '88, Comon and Golub '90, Yu '91, Moonen, Van Dooren & Vandewalle '92, Stewart '92, Champagne '94, Liu '94
Geometric (derivative based): Yang & Kaveh '88, Yang '95, Smith '93, Smith '96
– Exploitation of the time dynamics of R(t): no direct use of dR/dt before; many authors use the structure of the xxᴴ update