
Page 1:

Geometry and Invariance in Signal Processing

Steven T. Smith*

*MIT Lincoln Laboratory, Lexington, MA 02420; [email protected]. This work was sponsored by the United States Air Force under Air Force contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Page 2:

Covariance Matrix Estimation Bounds

Steven T. Smith

Page 3:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 4:

Applications That Use Covariance Matrix Estimation

Air and Ground Surveillance
• Space-time adaptive processing
• SAR/GMTI
• Tracking

Signals Intelligence
• Spectral analysis
• Superresolution

Robust Navigation
• Adaptive beamforming

Undersea Surveillance
• Adaptive beamforming
• Spectral analysis
• Tracking

Advanced Communications
• Adaptive beamforming
• Spectral analysis
• Speech

Algorithms and systems analysis for detection, location, and classification of difficult signals all rely on covariance-based methods.

Page 5:

What's Known About Covariance Matrix Estimation Quality?

• The sample covariance matrix (SCM) is the maximum-likelihood ("most likely") covariance matrix estimate
– The SCM is R̂ = K⁻¹XXᴴ (X is the N-by-K "data matrix"); the "sample support" is K samples
– The SCM is unbiased: E[R̂] = R
– The SCM is "efficient": Cov(R̂ − R) is as small as possible
– The SCM is a lousy estimate at low sample support and low SNRs; subspace and ad hoc methods like "diagonal loading" are necessary

[Figure: Reed-Mallett-Brennan-Kelly-Boroson detection loss: loss (dB, 0 to 6) vs sample support/N (1 to 20); the average SINR loss factor is (K−N+2)/(K+1). Figure: SCM eigenvalues (dB) vs index for K = 2N, the "deformed quarter-circle law".]
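A quick numerical sketch (mine, not from the slides) of the detection-loss curve above, assuming the (K−N+2)/(K+1) average-SINR-loss factor and an assumed N:

% RMB detection loss: average SINR loss factor rho = (K-N+2)/(K+1)
% for an N-degree-of-freedom adaptive filter trained with K samples.
N = 10;                                        % assumed number of adaptive DOF
ratio = 1:0.1:20;                              % sample support / N
K = ratio * N;
loss_dB = -10*log10((K - N + 2) ./ (K + 1));   % ~3 dB at K = 2N, as in the figure
plot(ratio, loss_dB);
xlabel('Sample support / N'); ylabel('Loss (dB)');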

Page 6:

Geometry is the Foundation of Signal Processing

Signal processing steps: Physical Modeling → Measurements → Filtering + Adaptation → Detection → Estimation → Tracking. Geometric objects arise at every step:

• Covariance matrices: Hermitian positive definite
• Signal subspaces: Euclidean space, Grassmann manifold, Stiefel manifold
• Scaling: magnitude, phase
• Statistical models: ƒ(z|θ), with a parameter space and Cramér-Rao bounds
• Spectral estimation: array manifolds
• Invariance testing

Page 7:

What's the Average Value of a Circle? (Uniform Distribution)

• On the circle S¹: does E[θ] = 0?
• Shouldn't E[θ] live on the circle? Yes! But where?
• What about this circle? Or this one? Should the embedding matter?

Page 8:

Some Literature (Very Partial List)

• Array manifolds / Spectral estimation– Schmidt ’79, Roy & Kailath ’89, Swindlehurst et al. ’92

• Maximum likelihood (geometric approach)– Amari ’85, Douglas & Amari ’97

• Phase-only nulling– Baird & Rassweiler '76, Steyskal '83, Hirasawa '88, Smith '94

• Structured covariance matrix estimation– Burg, Luenberger & Wenger '82, Fuhrmann & Barton '90, Barton & Smith '96

• Fourier analysis on homogeneous spaces (spheres etc.)– Healy ’95, Rockmore ’95

• Invariant hypothesis testing / Detection– Fisher ’53, Kay & Scharf ’84, Bose ’94, Scharf ’95

• Ambiguity functions– Rendas & Moura ’98

• Estimation bounds– Rao ’45, Gorman & Hero ’90, Smith ’00, ’05, Bhattacharya and Patrangenaru ’02

• Communications– Douglas ’00, Rahbar ’01, Zheng & Tse ’02, Cichocki, Amari & Georgiev ’02, Xavier ’02

• Pose estimation– Srivastava & Grenander '99, Ma et al. '01, Adler et al. '02, Srivastava & Klassen '02

Page 9:

Proving Wegener's Theory of Continental Drift

Do magnetic polarities here and here have the same statistical distribution?
– "Dispersion on a sphere" (Fisher, 1953)

Fisher's famous paper actually analyzed data from Iceland.

[Figures: paleogeographic reconstructions at 730 MYA and 65 MYA; from Margaret Hanson (U Cincinnati), Gary Glatzmaier (UCSC), www.ucmp.berkeley.edu/geology/tectonics.html, and www.itis-molinari.mi.it/Boundaries.html, based on Vine (1966).]

Page 10:

Outline

• Introduction

• Geometric background

– Spheres, subspaces, and covariance matrices

Non-Euclidean examples in signal processing

– Manifolds, derivatives, geodesics

• Nonlinear estimation theory

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 11:

What Is the Average of Two Points on a Circle?

• Average(θ1, θ2) = w1·θ1 + w2·θ2
– What does multiplication by a weight mean?
– What does addition mean?
• These operations only make sense if the circle "lives" in some Euclidean space
• Are there "intrinsic" or "natural" equivalents of these ideas so that all operations take place on the circle?
– w1·θ1 = some other point on the circle
– θ1 + θ2 = some other point on the circle
• Same questions for spheres and n-spheres

Page 12:

What Is the Average of Two Subspaces?
Different Manifold, Same Questions

• Average(Y1, Y2) = w1·Y1 + w2·Y2
– What do these operations mean?
– Intrinsic explanations required
– w1·Y1 = some other subspace
– Y1 + Y2 = some other subspace
• No obvious way to embed the space of subspaces Gn,p (the Grassmann manifold) in Euclidean space
– Y is an n-by-p matrix with orthonormal columns, but only the column span matters: YA and Y have the same column span
– The n-by-n projection matrix YYᵀ
– Neither gives a way to compute w1·Y1 + w2·Y2

[Figure: two subspaces Y1 and Y2 drawn in coordinates X1, X2, X3.]

Page 13:

Covariance Matrix Estimation

• Sample covariance matrix (SCM): R̂ = K⁻¹XXᵀ, with data matrix X
• What's the average value of the SCM?
– E[R̂] = ∫ R̂ ƒ(X|R) dX = R
– If w1·R1 + w2·R2 makes sense, then the integral makes sense
– May we treat covariance matrices as vectors?
• Question: What do you get when you subtract one covariance matrix from another?
• Answer: Not a covariance matrix!

  ( 2 0 )   ( 1 0 )   ( 1  0 )
  ( 0 1 ) − ( 0 2 ) = ( 0 −1 )

The covariance matrices are not a vector space; they form the cone of Hermitian positive-definite matrices: R is a covariance, and so is λR, λ > 0.
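The "not a covariance matrix" claim is a one-line check in Matlab, a sketch using the 2-by-2 example above:

% The difference of two covariance matrices need not be a covariance matrix:
R1 = [2 0; 0 1]; R2 = [1 0; 0 2];
eig(R1 - R2)     % eigenvalues -1 and 1: indefinite, so R1 - R2 is not a covariance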

Page 14:

Comparing Points on Manifolds

• Compare points using geodesic curves [the exponential map]
– Equate points on the manifold with tangent vectors at θ
• Average(θ1, θ2) = exp_θ(w1·exp_θ⁻¹θ1 + w2·exp_θ⁻¹θ2)
– The intrinsic average "lives" on the manifold
• Estimator bias b(θ) depends upon the choice of geodesics:

b(θ) = E[exp_θ⁻¹θ̂] ≠ exp_θ⁻¹E[θ̂]

[Figure: parameter manifold with estimator θ̂ and E[θ̂], geodesic curves, and the tangent plane at θ where exp_θ⁻¹ lives.]
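As a concrete sketch (my example, not the talk's): on the circle, exp and exp⁻¹ are just angle addition and wrapped angle subtraction, so the intrinsic average can be computed directly.

% Intrinsic (geodesic) average of two points on the circle S^1.
% exp_theta(v) = theta + v; exp_theta^{-1}(phi) = angle from theta to phi.
theta1 = 0.1; theta2 = 3*pi/4;     % two points (angles), assumed example
theta  = theta1;                   % base point for the exponential map
w1 = 0.5; w2 = 0.5;                % averaging weights
logmap = @(phi) angle(exp(1i*(phi - theta)));        % exp^{-1}: wrap to (-pi, pi]
avg = theta + w1*logmap(theta1) + w2*logmap(theta2); % exp of weighted tangent sum
avg = angle(exp(1i*avg))           % the average lives on the circle

In general the result depends on the base point; iterating the base point toward the average gives the intrinsic (Karcher) mean.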

Page 15:

Natural Geodesics on Quotient Manifolds

These manifolds are quotients of Lie groups:

Spheres = U(n)/U(n−1) = the part of U(n) that rotates the north pole. Geodesics are great circles; distances are measured in radians.

Subspaces = U(n)/(U(p) × U(n−p)) = the part of U(n) that doesn't give in-plane or co-plane rotations. Subspace geodesics:
Y(t) = (Y V cos(Σt) + U sin(Σt)) Vᴴ
distance = 2-norm of acos(singular values); distances are measured in radians.

Covariance matrices = Gl(n,C)/U(n) = the Hermitian part of the matrix polar decomposition. Covariance geodesics:
R(t) = R^(1/2) expm(R^(−1/2) D t R^(−1/2)) R^(1/2)
distance = 2-norm of log(eigenvalues); distances are measured in decibels. Compare to the flat geodesics R(t) = R + tD.
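The covariance geodesic and its distance translate directly into Matlab; a sketch with two assumed random covariance matrices:

% Natural geodesic and distance between covariance matrices R1 and R2.
A = randn(4); R1 = A*A' + eye(4);
B = randn(4); R2 = B*B' + eye(4);
d_nat = norm(log(eig(R2, R1)))         % 2-norm of log(generalized eigenvalues)
Rh = sqrtm(R1);
D  = Rh * logm(Rh \ R2 / Rh) * Rh;     % tangent direction chosen so R(1) = R2
Rt = @(t) Rh * expm(t * (Rh \ D / Rh)) * Rh;   % R(t) = R1^(1/2) expm(...) R1^(1/2)
norm(Rt(1) - R2, 'fro')                % ~0: the geodesic hits R2 at t = 1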

Page 16:

Derivatives

• The differential
(d/dt) ƒ(x(t)) = ∂ƒ/∂x1·dx1/dt + ∂ƒ/∂x2·dx2/dt + … = ∂ƒ/∂x·dx/dt
• The gradient
– The direction grad ƒ that solves the equation ⟨grad ƒ, dx/dt⟩ = ∂ƒ/∂x·dx/dt, i.e., grad ƒ = G⁻¹ ∂ƒ/∂x
– Same derivation as for the Wiener filter equation w = R⁻¹v
• The Hessian / covariant differentiation (think Cramér-Rao bound)
(d²/dt²) ƒ(x(t)) = ∇²ƒ(dx/dt, dx/dt)   (x(t) a geodesic)
∇²ƒ(dxi/dt, dxj/dt) = ∂²ƒ/∂xi∂xj − ∑ₖ Γᵏij ∂ƒ/∂xk, where the Γᵏij are the "curvature" terms

[Figure: a curve x(t) on a manifold with coordinates x1, x2.]
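A small sketch of the gradient rule grad ƒ = G⁻¹ ∂ƒ/∂x (my quadratic example), showing why it parallels the Wiener solution w = R⁻¹v:

% Gradient with respect to a non-Euclidean metric G: grad f = G^{-1} df/dx.
% For f(w) = v'*w - w'*R*w/2 with metric G = R, one natural-gradient step
% from w = 0 lands exactly on the Wiener solution w = R^{-1} v.
A = randn(5); R = A*A' + eye(5);   % covariance = the metric, assumed example
v = randn(5,1);
w = zeros(5,1);
df = v - R*w;                      % Euclidean partial derivatives at w
w  = w + R \ df;                   % one natural-gradient step
norm(w - R\v)                      % 0: matches the Wiener filter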

Page 17:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

– Intrinsic Cramér-Rao bounds

– Covariance matrix estimation

– Subspace estimation accuracy

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 18:

The Fisher Information Matrix (1922)

The covariance of a Gaussian estimate is inversely proportional to the negative mean Hessian of the log-likelihood function.
– "On the mathematical foundations of theoretical statistics" (Fisher, 1922)

Page 19:

Intrinsic Cramér-Rao Lower Bound: Unbiased Euclidean Case

C ≥ G⁻¹
(error covariance ≥ inverse Fisher information matrix)

The CRB looks like: C ≥ beamwidth²/SNR (the inverse-FIM term).

Unbiased: E[θ̂] = θ on the parameter space.

Page 20:

Intrinsic Cramér-Rao Lower Bound: Biased Euclidean Case

C ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ
(error covariance ≥ inverse Fisher information matrix, with the derivative of the bias vector b)

The CRB looks like: C ≥ beamwidth²/SNR (inverse FIM + bias term).

Biased: E[θ̂] = θ + b(θ) on the parameter space.

Page 21:

Intrinsic Cramér-Rao Lower Bound: Unbiased Riemannian Case

C ≥ G⁻¹ − (1/3)(Rm(G⁻¹)G⁻¹ + G⁻¹Rm(G⁻¹))
(error covariance ≥ inverse Fisher information matrix minus mean-Riemannian-curvature terms)

The CRB looks like: C ≥ (beamwidth²/SNR)·(1 − beamwidth²·curvature/SNR) + O(SNR⁻³)

• Inverse FIM term: really care about this term
• Local curvature term: an SNR⁻² term with Riemannian curvature; not sure that I care: an open question
• Higher-order terms: I know that I don't care; the CRB is an asymptotic bound

Unbiased: E[θ̂] = θ on the parameter manifold.

Page 22:

Intrinsic Cramér-Rao Lower Bound: Biased Riemannian Case

C ≥ Mb G⁻¹ Mbᵀ − (1/3)(Rm(C) Mb G⁻¹ Mbᵀ + Mb G⁻¹ Mbᵀ Rm(C))
Mb = I − (1/3)||b||²K(b) + ∇b

(Rm(C) is the mean Riemannian curvature; K(b) holds the sectional curvatures along the bias b and basis directions; ∇b is the covariant differential of the bias vector field b)

The CRB looks like: C ≥ (beamwidth²/SNR)·(1 − beamwidth²·curvature/SNR) + O(SNR⁻³)

• Inverse FIM + bias term: really care about this term
• Local curvature term: an SNR⁻² term with Riemannian curvature; not sure that I care: an open question
• Higher-order terms: I know that I don't care; the CRB is an asymptotic bound

Biased: E[θ̂] = θ + b(θ) on the parameter manifold.

Page 23:

Intrinsic Cramér-Rao Lower Bound

Error covariance ≥ ƒ(inverse Fisher information matrix)

• The Cramér-Rao lower bound is the smallest possible covariance of the estimation error
• The CRB depends upon the choice of geodesics

bias: b(θ) = E[exp_θ⁻¹θ̂] ≠ exp_θ⁻¹E[θ̂]

[Figure: parameter manifold with estimator θ̂, E[θ̂], the map exp_θ⁻¹, and the estimation-error covariance.]

Page 24:

Cramér-Rao Bound in Four Easy Steps
A new proof of the CRB [Euclidean case]

Fact 1. ℓ = log ƒ(z|θ), ℓ′ = ∂ℓ/∂θ, G = E[ℓ′ℓ′ᵀ]
Lemma 1. E[ℓ′] = 0 (differentiate the equality ∫ƒ(z|θ) dz = 1)
Fact 2. θ̂ a biased estimator of θ, E[θ̂] = θ + b(θ)
Lemma 2. E[(θ̂ − θ − b)ℓ′ᵀ] = I + ∂b/∂θ (differentiate Fact 2)
Theorem. E[(θ̂ − θ − b)(θ̂ − θ − b)ᵀ] ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ
Proof. Consider the covariance of the random variable v = (θ̂ − θ − b) − (I + ∂b/∂θ) G⁻¹ ℓ′. E[v] = 0 by Lemma 1 and Fact 2. By Lemma 2, E[vvᵀ] = E[(θ̂ − θ − b)(θ̂ − θ − b)ᵀ] − (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ ≥ 0. QED

Conclusion. E[(θ̂ − θ − b)(θ̂ − θ − b)ᵀ] ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ (the new part)
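The lemmas are easy to check numerically; a Monte Carlo sketch for a scalar Gaussian mean (my example):

% Check Lemma 1 (E[l'] = 0) and the CRB for z ~ Normal(theta, sigma^2),
% unbiased estimator thetahat = sample mean of K draws.
theta = 1; sigma = 2; K = 8; M = 1e5;
z = theta + sigma*randn(K, M);
score = sum(z - theta, 1) / sigma^2;   % l' summed over the K iid draws
mean(score)                            % ~0 (Lemma 1)
G = K / sigma^2;                       % Fisher information for K draws
var(mean(z, 1))                        % ~ sigma^2/K = G^{-1}: sample mean is efficient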

Page 25:

Intrinsic Efficiency: Unbiased Euclidean Case

Consider the random tangent vector:
v = θ̂ − θ − grad ℓ
where θ̂ − θ is the vector from θ to θ̂ and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff θ̂ = θ + grad ℓ

Page 26:

Intrinsic Efficiency: Biased Euclidean Case

Consider the random tangent vector:
v = θ̂ − θ − b(θ) − (I + ∂b/∂θ)(grad ℓ)
where θ̂ − θ is the vector from θ to θ̂, b is the bias vector, ∂b/∂θ is the derivative of the bias vector, and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff θ̂ = θ + b + (I + ∂b/∂θ)(grad ℓ)

Page 27:

Intrinsic Efficiency: Unbiased Riemannian Case

Consider the random tangent vector:
v = exp_θ⁻¹θ̂ − (I − (1/3)Rm(C))(grad ℓ)
where exp_θ⁻¹θ̂ is the tangent from θ to θ̂, Rm(C) is the mean Riemannian curvature of exp_θ⁻¹θ̂ and the basis, and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff exp_θ⁻¹θ̂ = (I − (1/3)Rm(C))(grad ℓ)

Page 28:

Intrinsic Efficiency: Biased Riemannian Case

Consider the random tangent vector:
v = exp_θ⁻¹θ̂ − b − (I − (1/3)||b||²K(b) − (1/3)Rm(C) + ∇b)(grad ℓ)
where exp_θ⁻¹θ̂ is the tangent from θ to θ̂, b is the bias vector, K(b) holds the sectional curvatures between b and the basis, Rm(C) is the mean Riemannian curvature of exp_θ⁻¹θ̂ and the basis, ∇b is the covariant differential of the bias vector field b, and grad ℓ is the gradient (w.r.t. the FIM) of the log-likelihood ℓ = log ƒ.

Facts: E[v] = 0 and E[vvᵀ] ≥ 0 (this is the CRB)
Efficiency: the estimator achieves the CRB iff exp_θ⁻¹θ̂ = b + (I − (1/3)||b||²K(b) − (1/3)Rm(C) + ∇b)(grad ℓ)

Page 29:

What Are the Sectional and Riemannian Curvatures? And Do They Matter?

• The bias term Mb depends upon ||b||²K(b)
– Small for biases small relative to (max |K|)^(−1/2)
• The covariance term Rm(C) equals E[R(exp_θ⁻¹θ̂, ·) exp_θ⁻¹θ̂]
– Small for errors small relative to (max |K|)^(−1/2): about 9 dB for covariance matrices, 1 radian for subspaces

Riemannian curvature (Z(t) is Z parallel-translated around the small parallelogram with sides tX, tY, −tX, −tY):
R(X,Y)Z = lim_{t→0} (Z − Z(t))/t²

Sectional curvature (A0 = πr² is the flat disc area, A the corresponding area on the manifold):
K = lim_{r→0} 12(A0 − A)/(r²A0)
K(X∧Y) = ⟨R(X,Y)Y, X⟩ / ||X∧Y||²
 = tr([X,Y]²)/4 ≤ 0 for covariances
 = −tr([X,Y]²)/2 ≥ 0 for subspaces
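The sign claims are easy to verify numerically; a sketch using random Hermitian tangents and the commutator formulas above (the subspace line reuses the same commutator just to illustrate the sign flip):

% Sectional curvature signs via the commutator formulas.
n = 4;
X = randn(n) + 1i*randn(n); X = (X + X')/2;   % random Hermitian tangent vectors
Y = randn(n) + 1i*randn(n); Y = (Y + Y')/2;
C = X*Y - Y*X;                                % commutator [X,Y] (skew-Hermitian)
K_cov =  real(trace(C^2))/4                   % <= 0, covariance-matrix formula
K_sub = -real(trace(C^2))/2                   % >= 0, subspace formula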

Page 30:

Metric Structure of a Statistical Model: Fisher Information Metric/Matrix

What is the distance between two distributions? What's the length of ds between ƒ(z|θ) and ƒ(z|θ+dθ)?

With coordinates (Rao '45; e.g., angle, Doppler, etc.)
• ds² = g11 dθ1 dθ1 + 2g12 dθ1 dθ2 + g22 dθ2 dθ2 + …
• gij = E[∂ℓ/∂θi · ∂ℓ/∂θj], ℓ(θ) = log ƒ(z|θ) = the log-likelihood
• G = [gij] = the Fisher information matrix (FIM)

Without coordinates (covariance, subspaces, etc.)
• g = E[dℓ dℓ] = the Fisher information metric

Statistical model S = { ƒ(z|θ) : θ in U }, with pdf ƒ(z|θ) and parameter space U.

Page 31:

The Fisher Information Metric and the Hessian

Fisher information metric: g = E[dℓ dℓ] = −E[∇²ℓ]

Why this fact is useful and important:
• Computationally convenient
– Second derivatives of many distributions are more tractable than squares of first derivatives
• Independent of the curvature of the parameter space
– The Fisher information matrix is independent of the arbitrary choice of affine connection and/or distance metric ("curvature" terms) for the parameter space

Recall that ∇²ƒ(dxi/dt, dxj/dt) = ∂²ƒ/∂xi∂xj − ∑ₖ Γᵏij ∂ƒ/∂xk, with the Γᵏij "curvature" terms.
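A Monte Carlo sketch of g = E[dℓ dℓ] = −E[∇²ℓ] for a unit-variance Gaussian mean (assumed example):

% Verify E[(dl/dtheta)^2] = -E[d2l/dtheta2] for z ~ Normal(theta, 1):
% l(theta) = -(z - theta)^2/2 + const, so dl = z - theta and d2l = -1.
theta = 0.7; M = 1e6;
z = theta + randn(1, M);
mean((z - theta).^2)        % E[(dl)^2] ~ 1
-mean(-ones(1, M))          % -E[d2l]   = 1: both equal the Fisher information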

Page 32:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

– Intrinsic Cramér-Rao bounds

– Covariance matrix estimation

– Subspace estimation accuracy

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 33:

Surprising and Useful Result!
SCM Is a Biased and Inefficient Estimator

• Sample covariance matrix (SCM): R̂ = K⁻¹XXᵀ, with data matrix X

Covariance matrices flat:
• Geodesics R(t) = R + t(R̂ − R)
• E_R[R̂] = exp_R ∫ exp_R⁻¹(R̂) ƒ(X|R) dX = R + ∫ (R̂ − R) ƒ(X|R) dX = R
• R̂ is an unbiased and efficient (i.e., achieves the CRB) estimate of R
• Doesn't account for the extra estimation loss at low sample support
No surprise here.

Covariance matrices curved:
• Geodesics R(t) = R^(1/2) e^(R^(−1/2) D t R^(−1/2)) R^(1/2)
• E_R[R̂] = e^(−β(N,K)) R ≠ R
• R̂ is a biased and inefficient (error larger than the CRB) estimate of R
• The bias term β(N,K) corresponds to the extra estimation loss at low sample support
Completely unexpected!
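The intrinsic bias is visible in simulation: on flat geodesics the SCM averages to R, but its log-determinant (the natural-metric quantity) is biased low. A Monte Carlo sketch with assumed complex Gaussian data:

% The SCM is flat-unbiased, E[Rhat] = R, yet E[log det Rhat] < log det R:
% the determinant (hence the eigenvalues) is biased in the curved geometry.
N = 6; K = 12; M = 2000; R = eye(N);
flat = zeros(N); ld = 0;
for m = 1:M
    X = (randn(N,K) + 1i*randn(N,K)) / sqrt(2);  % X ~ CN(0, I), so R = I
    Rhat = X*X' / K;
    flat = flat + Rhat/M;
    ld   = ld + real(log(det(Rhat)))/M;
end
norm(flat - R, 'fro')   % small: the flat (Euclidean) average recovers R
ld                      % clearly negative, although log det R = 0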

Page 34:

Sample Covariance Matrix Estimation: Covariance RMSE vs Sample Support

[Figure: covariance RMSE (dB) vs sample support/N, 6-by-6 Hermitian example, 1000 Monte Carlo trials. Curves: SCM (natural metric), biased natural CRB, unbiased natural CRB ≈ (10/log 10)·N/√K dB, SCM (flat metric), and flat unbiased CRB ≈ (10/log 10)·ƒ(R)/√K dB.]

An estimator θ̂ of θ is efficient (neglecting Riemannian curvature) iff:
exp_θ⁻¹θ̂ = b(θ) + (I − ||b||²K(b)/3 + ∇b) grad ℓ

Flat efficiency:
θ̂ = θ + b(θ) + (I + ∂b/∂θ) G⁻¹ (∂ℓ/∂θ)ᵀ

Is there a more efficient covariance estimator at low sample support? There is an ≈ 10 dB difference between the natural and flat curves.

Page 35:

CRBs for SCMs: Closed-Form Expressions

• Natural covariance metric/geodesics (a "whitened" covariance metric)
– distance(R̂, R) = norm(log(eig(R̂, R)))
– mean-square distance(R̂, R) ≥ N²/K + N·β(N,K)²
β(N,K) = N⁻¹(N·log K + N − ψ(K−N+1) + (K−N+1)ψ(K−N+2) + ψ(K+1) − (K+1)ψ(K+2))  [N-by-N Hermitian case]
β(N,K) = log(K/2) − N⁻¹ ∑ᵢ ψ((K−i+1)/2)  [N-by-N symmetric case]
where ψ = Γ′/Γ is the digamma function
– The natural covariance metric is invariant to R and complete

• Flat covariance metric/geodesics
– distance(R̂, R) = norm(R̂ − R, 'fro')
– mean-square distance(R̂, R) ≥ K⁻¹(∑ᵢ Rii² + 2∑_{i<j} Rii Rjj)  [Hermitian]
– mean-square distance(R̂, R) ≥ 2K⁻¹(∑_{i≤j} Rij² + ∑_{i<j} Rii Rjj)  [symmetric]
– The flat covariance metric is neither invariant nor complete
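A direct Matlab transcription of the Hermitian-case bias term as reconstructed above (a sketch; psi is Matlab's digamma function):

% beta(N,K): SCM bias term, N-by-N Hermitian case.
beta = @(N,K) (N*log(K) + N - psi(K-N+1) + (K-N+1)*psi(K-N+2) ...
               + psi(K+1) - (K+1)*psi(K+2)) / N;
beta(6, 12)     % example: the bias shrinks as the sample support K grows
beta(6, 120)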

Page 36:

Symmetric Space Geometry and SCM Bias

E_R[R̂] = e^(−β(N,K)) R, i.e., the bias vector field is B(R) = −β(N,K) R

Covariance matrices = Gl(n,C)/U(n) ≅ R₊ × Sl(n,C)/U(n)
(Gl(n,C)/U(n) = the Hermitian part of the matrix polar decomposition; the R₊ factor = the determinant; Sl(n,C)/U(n) = the unit-determinant covariance matrices)

• Only the SCM's determinant is biased
– Implies that the eigenvalues are biased
• The covariant differential and the curvature of B vanish
– Because B(R) = −βR
• Is there a connection with the symmetric space decomposition of the covariance matrices?
• Can something be said about the bias in other symmetric spaces, e.g., the Grassmann manifold?

Page 37:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

– Intrinsic Cramér-Rao bounds

– Covariance matrix estimation

– Subspace estimation accuracy

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 38:

Subspace Estimation Accuracy

• Standard subspace estimation method (SVD)
– [U,S,V] = svd(X,0) (Matlab notation)
– SVD-based subspace estimate: Yest = U(:,1:p)
• What is the error between this estimate and the truth?
– Natural subspace distance = (∑ principal angles²)^(1/2):
norm(acos(min(svd(orth(Yest)'*orth(Ytrue)),1)))
– Recall that the Fisher information metric is independent of this choice of error metric

[Figure: subspaces Ytrue and Yest separated by the subspace distance.]
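Putting the two snippets above together, a runnable sketch under an assumed signal-plus-noise model:

% SVD-based subspace estimation and the natural (principal-angle) distance.
n = 5; p = 2; K = 10; snr = 10^(21/10);   % matches the 5-by-2, 21 dB example
Ytrue = orth(randn(n, p));                % true subspace basis
X = sqrt(snr)*Ytrue*randn(p, K) + randn(n, K);   % snapshots: signal + noise
[U, S, V] = svd(X, 0);
Yest = U(:, 1:p);                         % SVD-based subspace estimate
dist = norm(acos(min(svd(orth(Yest)'*orth(Ytrue)), 1)))   % radians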

Page 39:

Subspace Accuracy vs SNR and Sample Support

[Figures: subspace accuracy (rad) vs SNR (dB), 5-by-2 example, 10 snapshots, 1000 Monte Carlo trials; and subspace accuracy (rad) vs sample support/2, 5-by-2 example, 21 dB SNR, 1000 Monte Carlo trials. Each compares SVD-based subspace estimation with the Cramér-Rao bound ≈ (pn − p²)^(1/2)/(K·SNR)^(1/2) rad (white subspace covariance).]

• The RMSE of the SVD-based estimator is a small constant fraction above the CRB for fixed sample support
• The RMSE of the SVD-based estimator approaches the CRB for large sample support
• SVD-based estimation provides nearly optimum performance
• Coordinate-free CRB analysis is required for this conclusion

Page 40:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

– The detection problem

– Invariant analysis via fiber integration

– Mean and variance CFAR normalizers

• Geometric optimization and filtering theory

• Summary and conclusions


Page 41:

The Detection Problem: Unknown Background

• Detector design: threshold data to detect signals in the presence of noisy backgrounds with a low false-alarm rate
• Problem: predict detection performance with unknown background mean* and variance*

[Figure: power (dB) vs time (µs) with the signal, the noise, the threshold, and a false alarm: (almost) everything you need to know about the detection problem. The starred quantities are the background mean* and variance*.]

Page 42:

Mean and Variance CFAR Normalizers

• The deflection ratio: x(z) = (z − μ̂)/σ̂
– The traditional method, the one most often encountered
– Minor nit: maps power-domain data onto the entire real line; easily handled with a remapping such as log(exp(z)+1) or (z+√(z²+1))/2
• The log-deflection ratio: x(z) = exp((log z − log μ̂)/σ̂_log)
– Decibel-space version of the deflection ratio
– Power-domain method
• The log-Gamma CFAR method: x(z) = −log Γ(ν, ν·z/μ̂)/Γ(ν)
– Maps chi-squared backgrounds with ν = μ̂²/σ̂² complex dof to exponential backgrounds (ν = 1); SNR → SNR/ν
– Power-domain method
• Can compute closed-form statistics of all of these
– Fiber integration using the mapping x(z), θ = (μ, σ)
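The three normalizers in Matlab, a sketch with assumed test-cell and background values; gammainc(·,·,'upper') is the regularized upper incomplete gamma, so the last line matches −log Γ(ν, νz/μ̂)/Γ(ν):

% Mean and variance CFAR normalizers for power-domain data z, given the
% background sample mean mu and sample standard deviation sig.
z = 4.0; mu = 1.0; sig = 0.5;                     % assumed values
x_dr = (z - mu) / sig;                            % deflection ratio
x_ld = (log(z) - log(mu)) / 0.4;                  % log-deflection (0.4 = assumed sigma_log)
nu   = mu^2 / sig^2;                              % effective complex dof
x_lg = -log(gammainc(nu*z/mu, nu, 'upper'));      % log-Gamma CFAR

For an exponential background (ν = 1) the log-Gamma normalizer reduces to z/μ̂, consistent with the slide's "maps to exponential backgrounds" remark.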

Page 43:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

– The detection problem

– Invariant analysis via fiber integration

– Mean and variance CFAR normalizers

• Geometric optimization and filtering theory

• Summary and conclusions

Page 44:

Fiber Integration: Interpretation for CFAR Analysis

ƒX(x0) = ∫_{x⁻¹(x0)} ƒZ(ζ) / det(∂x/∂y(ζ)) dy

The CFAR output X is a function of the input data Z: X = X(Z). The input data z yielding CFAR output x0 form the "fiber" above x0; fiber integration of the input statistics over the fiber above x0 gives the CFAR output statistics.

[Figure: mean-level CFAR block diagram: radar data → matched filter (MF) → |·|² → tapped delay line with test cell and selection logic → divide → CFAR output x.]
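A toy sanity check of the fiber idea (my example, not the talk's): for K i.i.d. unit-mean exponential inputs and the mapping x(z) = mean(z), integrating ƒZ over the fiber {z : mean(z) = x0} gives the Gamma(K, 1/K) density, which a Monte Carlo histogram should reproduce.

% Fiber-integration sanity check: pdf of x = mean(z), z_k iid Exp(1).
K = 10; M = 1e5;
z = -log(rand(K, M));                           % unit-mean exponential samples
x = mean(z, 1);
[counts, centers] = hist(x, 50);
empirical = counts / (M * (centers(2) - centers(1)));
analytic  = K^K * centers.^(K-1) .* exp(-K*centers) / gamma(K);   % Gamma(K,1/K)
semilogy(centers, empirical, 'o', centers, analytic, '-');
legend('Monte Carlo', 'fiber-integration result');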

Page 45:

Joint pdf of Sample Mean and Variance

• First two sample moments: m1 = (1/K)∑zₖ, m2 = (1/K)∑zₖ²
– Sample mean and variance: μ̂ = m1, σ̂² = m2 − m1²
– Mutually dependent random variables
– Will use this to our advantage to obtain finite integrals
• Apply fiber integration with the mapping (z1, z2, …, zK) → (m1, m2)
– Results in an integral on the (K−1)-simplex (1/K)∑zₖ = 1 (a fiber); another application of invariance to obtain a uniform density
– Easily performed by Monte Carlo integration with high relative accuracy

ƒ(m1, m2) = [Kᴷ N^(NK) / (Γ(K)Γ(N)ᴷ)] m1^(NK−3) e^(−NKm1) s′(m2/m1²)
s(m2) = (1/M)∑ ∏(zₖ)^(N−1)  (Monte Carlo evaluation)

Page 46:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

– The detection problem

– Invariant analysis via fiber integration

– Mean and variance CFAR normalizers

Deflection ratio: (z − μ̂)/σ̂; log-deflection ratio: (log z − log μ̂)/σ̂_log

Log-Gamma: −log Γ(ν, ν·z/μ̂)/Γ(ν)

• Geometric optimization and filtering theory

• Summary and conclusions

Page 47:

CFAR Distributions: Deflection Ratio and Log-Gamma

• Background statistics: chi-squared with N complex dof
• Sample support: K samples
• Signal model: nonfluctuating target with SNR S

Deflection ratio (m2 = 1 + ν⁻¹):
ƒdr(x) = [Kᴷ Γ(N(K+1)) / (Γ(K)Γ(N)^(K+1))] e^(−S) ∫₁^min(K, 1+1/(σx)) σ^(−1+ν/2) s′(m2) (xσ^(−1/2)+1)^(−1+ν) (xσ^(−1/2)+1+K)^(−N(K+1)) ₁F₁(N(K+1); N; S(xσ^(−1/2)+1)/(xσ^(−1/2)+1+K)) dm2

Log-Gamma CFAR (m2 = 1 + ν⁻¹; γ defined by −log Γ(ν,γ)/Γ(ν) = x):
ƒlg(x) = [Kᴷ Γ(N(K+1)) / (Γ(K)Γ(N)^(K+1))] e^(−S−x) ∫₁^K s′(m2) Γ(ν) ν^(−NK) (1+N/ν)^(−N(K+1)) e^γ ₁F₁(N(K+1); N; S/(1+N/ν)) dm2

Page 48:

CFAR Density Example with Monte Carlo Comparisons

[Figure: pdf (10⁻⁸ to 10², log scale) vs variate (0 to 150) for N = 2, K = 10, SNR = 9 dB; the deflection-ratio and log-Gamma densities, exact and Monte Carlo, agree.]

Page 49:

Receiver Operating Characteristics

[Figure: PD vs SNR (dB), N = 1, K = 20, PFA = 10⁻⁴; curves for the matched filter, mean-level CFAR, deflection ratio, log-Gamma CFAR, and log-deflection ratio.]

• Compare to mean-level CFAR (at PD = 50%): extra CFAR loss for variance normalization
• The deflection ratio has 0.5 dB CFAR loss
• The log-Gamma has 1.2 dB CFAR loss
• The log-deflection ratio has 4 dB CFAR loss
• For sample support K = 10, these losses are 0.7 dB, 1.5 dB, and 8 dB

Page 50:

Outline

• Introduction

• Geometric background

• Nonlinear estimation theory

• Nonlinear detection theory

• Geometric optimization and filtering theory

• Summary and conclusions

Page 51:

Geometric Optimization

• Generalization of Euclidean optimization

• Reformulate classical algorithms

– Newton’s method

– Steepest descent

– Conjugate gradient method

• Perform the optimization on the constraint surface: lines → geodesics, etc. (Luenberger's ant-on-a-surface analogy)

Benefits:
• Natural description of the problem
• Unifying viewpoint for algorithms

Caveats:
• Computationally infeasible in general
• Group invariance highly desirable

Page 52:

Applied Optimization Problems

• Signal processing and detection (Y n-by-p)
H0: z = n (interference+noise hypothesis)
H1: z = av + n (signal-plus-interference+noise hypothesis)
R = covariance matrix = E[nnᴴ]
• Subspace tracking: maximize ƒ(Y) = tr YᴴRY such that YᴴY = I
• Eigenvector tracking: maximize ƒ(Y) = tr YᴴRYN such that YᴴY = I
• Local density approximation of Schrödinger's equation: maximize ƒ(Y) = tr YᴴHY + φ(Y) such that YᴴY = I, where H is the Hamiltonian

Bradbury & Fletcher '66, Alsén '71, Ruhe '74, Cullum '78, Parlett et al. '82, Comon & Golub '90, Yang & Kaveh '88, Yang '95, Edelman, Arias & Smith '98, many others

Page 53:

Common Structure Among Problems

• Constrained optimization (Y n-by-p, Θ p-by-p)
• Group invariance:
tr (YΘ)ᴴR(YΘ) = tr ΘᴴYᴴRYΘ = tr YᴴRYΘΘᴴ = tr YᴴRY
for any unitary matrix Θ
• Pertinent fields of study
– Numerical linear algebra
– Optimization
– Differential and Riemannian geometry
– Lie groups and Lie algebras
– Homogeneous and symmetric spaces
– Adaptive filtering

Exploit the natural structure of the problem.

Page 54:

Newton's Method

Quadratic convergence with the Newton step Δ = −(∇²ƒ)⁻¹(∇ƒ)

Euclidean: maximize ƒ(x), x in Rⁿ
Riemannian: maximize ƒ(x), x in M

Page 55:

Optimization on the Grassmann and Stiefel Manifolds

• Maximize ƒ(Y) = tr YᴴRY such that YᴴY = I (Y n-by-p)
– ƒ(YΘ) = ƒ(Y) for all unitary Θ

Page 56:

Newton's Method on the Grassmann Manifold

Generalized Rayleigh quotient: ƒ(Y) = tr YᴴRY, YᴴY = I

Newton step: solve for the tangent vector H:
(I − YYᴴ)RH − H(YᴴRY) = −(I − YYᴴ)RY;  YᴴH = 0
(the left side is the Hessian of ƒ applied to H, the right side is minus the gradient of ƒ, and YᴴH = 0 is the tangent-vector constraint)

• A Sylvester equation for H
• Asymptotically equivalent to RQI (cubic convergence)
• Solution methods
– Direct O(n³p) method uses the Ritz vectors of Y; computationally unattractive
– Linear conjugate gradient (truncated Newton approach) yields an O(n²p²) algorithm
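A minimal numerical sketch (assumed random data; requires Matlab's sylvester, R2014a+): solve the Sylvester equation for the tangent step H, then move along the Grassmann geodesic and watch tr(YᴴRY) converge.

n = 8; p = 2;
A = randn(n); R = A*A';                        % random symmetric positive-definite R
[Y, ~] = qr(randn(n, p), 0);                   % initial point with Y'*Y = I
for iter = 1:6
    P = eye(n) - Y*Y';                         % projector onto the complement of Y
    H = sylvester(P*R, -(Y'*R*Y), -P*R*Y);     % Newton step; Y'*H = 0 follows
    [U, S, V] = svd(H, 'econ');                % compact SVD of the step
    cs = diag(cos(diag(S))); sn = diag(sin(diag(S)));
    Y = (Y*V*cs + U*sn) * V';                  % geodesic step with t = 1
    fprintf('iter %d: tr(Y''RY) = %.12f\n', iter, trace(Y'*R*Y));
end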

Page 57:

Numerical Results

Trace Maximization on G5,3 [figure]

Page 58:

Interference Suppression

Rotating Phased Array Antenna

Problems:
• Maximize the signal-to-interference-plus-noise ratio
• Track the interference and/or signal subspace

Page 59:

Time Varying Adaptive Filter

Rotating Phased Array Antenna

Page 60:

The Subspace Tracking Equation

R(t) = covariance; Y(t) = principal invariant subspace. The covariance dynamics drive the subspace dynamics for H0 = dY/dt:

H0(YᴴRY) − (I − YYᴴ)RH0 = (I − YYᴴ)(dR/dt)Y;  YᴴH0 = 0
(YᴴH0 = 0 is the tangent-vector constraint)

• The new solution is "close" to the old solution
• Approaches to subspace tracking:
– Rank-one updates versus full-rank updates: Rnew = Rold + xxᴴ versus Rnew = Rold + dR/dt
– Algebraic versus geometric approaches
Algebraic (decomposition based): Fuhrmann '88, Comon & Golub '90, Yu '91, Moonen, Van Dooren & Vandewalle '92, Stewart '92, Champagne '94, Liu '94
Geometric (derivative based): Yang & Kaveh '88, Yang '95, Smith '93, Smith '96

Page 61:

Summary and Conclusions

• Geometric invariance ubiquitous in signal processing

– Geometric properties can be exploited for solutions and insight

• The Cramér-Rao bound with bias is generalized to arbitrary manifolds without intrinsic (prescribed) coordinates

– Estimator bias and efficiency depend upon geometry

– SCM biased and inefficient from an intrinsic perspective; the bias term corresponds to the known sample-support loss

• Derived formula bounding covariance and subspace accuracy

• Fiber integration applied to mean and variance CFAR analysis

– Previously unsolved performance analysis

– The deflection ratio outperforms the log-Gamma and log-deflection ratio tests (0.5 dB loss vs 1.2 dB vs 4 dB)

• Orthogonality constraints easily incorporated into many optimization problems (e.g., subspace tracking)

• Story incomplete—still the Age of Discovery

Page 62:

The End

Page 63:

Backups

Page 64:

Why Geometry for Signal Processing?

• The easy reasons

– Euclidean geometry and linear algebra used for just about everything in signal processing

Optimal Filtering theory / Least squares concepts

Detection and Estimation and Root-Mean-Square Error

Parseval’s theorem

• Abstract geometry is useful

– Use the right tool for the right job

Many objects in signal processing are non-Euclidean: the space of covariance matrices, the space of subspaces

An accurate geometrical description is insightful

– Useful for solving new problems [e.g., mean and variance CFAR]

• Abstract geometry is beautiful

– Differential and Riemannian geometry / Lie groups and Lie algebras / Homogeneous spaces

Page 65:

The Intrinsic Cramér-Rao Bound

C ≥ G⁻¹ = E[dℓ dℓ]⁻¹ (unbiased, neglecting curvature)
C ≥ (I + ∂b/∂θ) G⁻¹ (I + ∂b/∂θ)ᵀ (biased, neglecting curvature)
C ≥ Mb G⁻¹ Mbᵀ + Rm(C) curvature terms (biased, with curvature)

CRB assumptions:
• The parameter space is an arbitrary manifold
• ℓ(θ) = log ƒ(z|θ) = the log-likelihood function
• g = E[dℓ dℓ] = the Fisher information metric
• Arbitrary coordinates θ = (θ1, θ2, …, θn)
• G = [gij] = the FIM w.r.t. arbitrary coordinates
• Any estimator θ̂ of θ with bias vector b(θ)
• Error covariance matrix C = E[(θ̂ − θ − b(θ))(θ̂ − θ − b(θ))ᵀ]
• Matrix inequality: A ≥ B iff A − B is positive semidefinite

Page 66:

Subspace Estimation

• Subspace estimation is a frequently encountered problem in signal processing
• Consider the MIMO problem: z = An1 + n0
– Estimate the channel matrix A from several measurements of z; the noisy inputs n1 and n0 are unknown
– Invariance: z is the same measurement under A → AM, n1 → M⁻¹n1 for an arbitrary invertible matrix M
• Only the column span (subspace) is unchanged by A → AM
– Can only measure the column span of A

• How does one make sense of the “error” or “bias” of the subspace estimate?

– Space of subspaces is non-Euclidean

Page 67:

Homogeneous Spaces. E.g.: Covariance Matrices

• Gl(n,C) = the Lie group of complex invertible matrices: the set of allowable "directions"
• U(n) = the Lie group of unitary matrices, ΘΘᴴ = I: the "directions" that don't get me anywhere
• Group action: R → MᴴRM: the rule for following "directions"
• The cone of Hermitian positive-definite matrices: R is a covariance, so is λR, λ > 0; the identity matrix I sits inside the cone
• The identity matrix is invariant to the group action by U(n): I → ΘᴴIΘ = I

Examples of the group action (Matlab-style matrices):
[1 2j; 0 3]' * [1 0; 0 1] * [1 2j; 0 3] = [1 2j; -2j 13]
[c -s; s c]' * [1 0; 0 1] * [c -s; s c] = [1 0; 0 1]

Covariance matrices = Gl(n,C)/U(n) = the part of Gl(n,C) that doesn't give invariance

Page 68:

Space of Subspaces: The Grassmann Manifold

• Y = a p-dimensional subspace in Cⁿ (a point of Gn,p)
• U(n) = the Lie group of unitary matrices
• U(p) = the Lie group of in-plane rotations
• U(n−p) = the Lie group of co-plane rotations
• Group action: Y → YΘ (rotation)
• The subspace Y is invariant to both in-plane and co-plane rotations

Subspaces = U(n)/(U(p) × U(n−p)) = the part of U(n) that doesn't give in-plane or co-plane rotations

[Figure: a subspace Y drawn in coordinates X1, X2, X3.]

Page 69:

Riemannian Manifolds

• Manifold: a space that locally looks like Rⁿ or Cⁿ
• Riemannian manifold: a manifold with the distance metric
ds² = g11 dx1 dx1 + 2g12 dx1 dx2 + g22 dx2 dx2 + … = dxᵀ G dx = ⟨dx, dx⟩
• Examples
– Sphere: x² + y² + z² = 1; dimension = 3 − 1 = 2
– Orthogonal matrices: ΘᵀΘ = I; dimension = n² − (1/2)n(n+1) = (1/2)n(n−1)

[Figures: a manifold with coordinates x1, x2 and a point x; a Riemannian manifold with nearby points x and x+dx separated by ds.]

Page 70:

Geodesics, a.k.a. the Exponential Map

• Geodesics are generalizations of straight lines
• They minimize the length between two points
• The geodesic equation ("exponential map"):
d²xk/dt² + ∑ᵢⱼ Γᵏij dxi/dt dxj/dt = 0
(the Γᵏij are the curvature terms)
• Geodesics on homogeneous spaces may very often be expressed as matrix exponentials

Sphere: great circles.
Covariance matrices: R(t) = R^(1/2) expm(R^(−1/2) D t R^(−1/2)) R^(1/2); distance = 2-norm of log(eigenvalues).
Subspaces: Y(t) = (Y V cos(Σt) + U sin(Σt)) Vᴴ; distance = 2-norm of acos(singular values).

Page 71:

Subspace Estimation Accuracy Bounds

• Statistical model for subspace estimation (real case)
– Estimate Y given data x = Yn1 + n0, with E[n1n1ᴴ] = R1 (unknown) and E[n0n0ᴴ] = R0 (known)
– X = (x1, x2, …, xK), a matrix of iid snapshots; R̂ = K⁻¹XXᴴ
– Gaussian pdf ƒ(X|R1, Y) = [exp(−(1/2) tr R̂R2⁻¹) / ((2π)ⁿ det R2)]ᴷ; R2 = Y R1 Yᴴ + R0
• Subspace estimation accuracy
– Estimate Y in the presence of the unknown "nuisance" parameters R1
– Subspace estimation Fisher information metric:
Derivatives of the subspace Y look like matrices Δ such that YᴴΔ = 0
Derivatives of the covariance R1 are Hermitian matrices D
g((D,Δ),(D,Δ)) = (1/2)K tr((ΔR1Yᴴ + YR1Δᴴ + YDYᴴ)R2⁻¹)²
– Derive accuracy bounds from the Fisher information metric/matrix; see paper for details

Page 72:

Context of Mean and Variance CFAR Analysis

• "Deflection ratio" (z − μ̂)/σ̂
– Nuttall, "Operating characteristics of log-normalizer for Weibull and log-normal inputs," NUSC TR 8075, 1987
First closed-form analysis of mean and variance CFAR
Assumes Gaussian power-domain statistics
Independent sample mean and variance
– SAR speckle reduction
• Weibull clutter, many citations
• This talk
– Closed-form analysis (single finite integral)
– More-or-less arbitrary power-domain statistics
– More-or-less arbitrary form of CFAR normalizer
E.g., the new log-Gamma CFAR: −log Γ(ν, ν·z/μ̂)/Γ(ν)
– Mutually dependent sample mean and variance

Page 73:

Detector Performance: Receiver Operating Characteristic (ROC) Curves

• PD depends upon the threshold and target SNR

• PFA depends upon the threshold

• ROC curves show the dependence (direct or implicit) of the PD on the target SNR and/or PFA

– PD vs PFA (fixed SNR—i.e., vary the threshold)

– PD vs SNR (fixed PFA—i.e., fix the threshold)

– SNR vs PFA (fixed PD—i.e., vary the threshold)

[Figures: probability density vs output value z showing ƒ(z|no target present), ƒ(z|target present), the threshold, PFA, and PD; and PD (%) vs SNR (dB) at PFA = 10⁻⁶ for the matched filter (4 looks), Swerling II (4 looks), and Swerling I (4 looks).]

Page 74:

Constant False Alarm Rate (CFAR) Thresholding

• Problem: Must know (or estimate) noise floor to set threshold

• Solution: Estimate noise floor using noise-only samples

– Adaptive thresholding

• CFAR thresholding: (test cell)/(noise floor estimate) > threshold
• The threshold depends upon the variance of the background

[Figure: power (dB) vs time (µs) with the signal, the noise floor, an absolute threshold, and a false alarm.]

Page 75:

CFAR Techniques

• Mean-Level CFAR

– Noise estimate = RMS of noise-only cells

– Optimum estimator under ideal assumptions

– Not robust to target in training cells, inhomogeneous clutter

• Greatest-Of Mean-Level CFAR

– Noise estimate = Greatest of left-hand and right-hand sides

– Robust to false alarms caused by clutter on either side of test cell

– Not robust to target in training cells

• Censored Greatest-Of Mean-Level CFAR

– Noise estimate = Remove M largest samples, then GO-MLCFAR

– Robust to M targets in training cells, inhomogeneous clutter

Page 76:

Mean-Level CFAR Performance: Nonfluctuating Target

z = ∑ᵢᴺ |a + nᵢ|² / (K⁻¹ ∑ₖᴷ |nₖ|²)  (noncentral F-distribution)

Kƒ(Kz) = (N+K+1 choose N+1) zᴺ (1+z)^(−N−K) e^(−θN/(1+z)) ₁F₁(−K; N; −θNz/(1+z))

PD(η, θ) = (η/(K+η))^(N+K−1) ∑ₖ₌₀ᴷ (N+K−1 choose N+k) (η/K)ᵏ G_(k+1)(θN/(1+η/K))

Nonfluctuating CFAR statistics, with Gₖ(z) ≝ Γ(k,z)/Γ(k), threshold η, and SNR θ.

[Figure: mean-level CFAR block diagram: radar data → MF → |·|² → tapped delay line with test cell and selection logic → compare → detection decision.]

Closed-form analysis provides the ROC curves of mean-level CFAR.

Page 77:

CFAR Loss: Nonfluctuating Target

• CFAR loss is the extra SNR required to achieve the same PD at fixed PFA
• Smaller CFAR loss for higher sample support and number of looks

[Figures: PD (%) vs SNR (dB), PFA = 10⁻⁶, nonfluctuating target, comparing NCI (4 looks) with mean-level CFAR at K = 1, 2, 5, and 10 samples per look; and CFAR loss (dB, relative to NCI) vs CFAR sample support per look at PD = 95%, PFA = 10⁻⁶, 4 looks, nonfluctuating, for N from 1 to 48.]

Page 78:

Fiber Integration*
*See the differential geometry literature, e.g., Abraham, Marsden & Ratiu

• Given the joint pdf ƒZ(ζ), where ζ is K-dimensional
– This will be either the joint background statistics or the joint distribution of the sample mean and variance
• Given the mapping x(ζ), where x is L-dimensional
• Want the pdf ƒX(x)
– This will be either the CFAR statistics or the joint distribution of the sample mean and variance
• If L = K, standard change of variables: ƒX(x) = ƒZ(ζ)/det(∂x/∂ζ)
– Very boring
• If L < K, fiber integration
– Very cool: invariant to the choice of coordinates and inner product

ƒX(x0) = ∫_{x⁻¹(x0)} ƒZ(ζ) / det(∂x/∂y(ζ)) dy   or   ƒX(x0) = ∫_{x⁻¹(x0)} ƒZ(ζ) / ||∂x/∂ζ|| dS

Page 79:

Geodesics on the Grassmann Manifold: Length-Minimizing Curves

Geodesic curve Y(t) starting at Y in direction H:
Y(t) = (Y V cos(Σt) + U sin(Σt)) Vᴴ
where UΣVᴴ := H (compact SVD of H), YᴴY = I, YᴴH = 0

• Maintains the Grassmann constraint Y(t)ᴴY(t) = I
• Computational complexity O(np²)
• An approximation using the QRD of Y + H is possible
• Perform the optimization line search along Y(t)
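The geodesic as a small function worth keeping around (a sketch meant to live in its own file; H must be a tangent vector, YᴴH = 0):

% Grassmann geodesic Y(t) from orthonormal Y in tangent direction H.
function Yt = grassmann_geodesic(Y, H, t)
    [U, S, V] = svd(H, 'econ');                          % compact SVD: H = U*S*V'
    s = diag(S);
    Yt = (Y*V*diag(cos(s*t)) + U*diag(sin(s*t))) * V';   % Y(0) = Y; Yt'*Yt = I
end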

Page 80:

RQI is Cubically Convergent: Sketch of Proof

Rayleigh quotient along a geodesic: r(θ) = λ cos²θ + μ sin²θ

• Newton's method is cubically convergent on even functions
• RQI and Newton have the same order:
Newton: εnext = ε − (1/2)·tan(2ε) = −(4/3)ε³ + O(ε⁵)
RQI: εnext = ε − tan⁻¹((1/2)·tan(2ε)) = −ε³ + O(ε⁵)

Page 81:

The Subspace Tracking Problem

Problem: determine the principal invariant subspace of a time-varying covariance R(t).

• The new solution is "close" to the old solution
• Approaches to subspace tracking:
– Rank-one updates versus full-rank updates: Rnew = Rold + xxᴴ versus Rnew = Rold + dR/dt
– Algebraic versus geometric approaches
Algebraic (decomposition based): Fuhrmann '88, Comon and Golub '90, Yu '91, Moonen, Van Dooren & Vandewalle '92, Stewart '92, Champagne '94, Liu '94
Geometric (derivative based): Yang & Kaveh '88, Yang '95, Smith '93, Smith '96
– Exploitation of the time dynamics of R(t): no direct use of dR/dt before; many authors use the structure of the xxᴴ update