a6523 lecture 10 2013 - cornell...

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2013

Lecture 10 –  Estimators and their statistical properties (from previous handout)

•  Convergence and ergodicity

Some references:

1. Papoulis, Probability, Random Variables, and Stochastic Processes.

2. Scargle, Studies in Astronomical Time Series Analysis. I. Modeling Random Processes in the Time Domain, 1981, ApJS, 45, 1.

3. Scargle, Studies in Astronomical Time Series Analysis. II. Statistical Aspects of Spectral Analysis of Unevenly Spaced Data, 1982, ApJ, 263, 835.

4. Rutman, Characterization of Phase and Frequency Instabilities in Precision Frequency Sources, 1978, Proc. IEEE, 66, 1048.

5. Cordes & Downs, JPL Pulsar Timing Observations. III. Pulsar Rotation Fluctuations, 1985, ApJS, 59, 343.

--------------------------------------------------------------------------------------------------------

• Note: Scargle colloquium, Mar. 7: Adventures in Modern Time Series Analysis: From the Sun to the Crab Nebula and Beyond

Time Averages and Ergodicity:

In practice, when we study signals, we are forced to analyze individual realizations of random processes. We usually desire information about the ensemble-average properties of measurable quantities; we must derive this information from single realizations, and consequently we are usually compelled to make time averages.

Consider the following stochastic integral of a WSS random process X(t):

$$
Y = \frac{1}{T} \int_0^T dt\, X(t).
$$

This integral can be considered an estimator for the ensemble average $\langle X(t) \rangle$.

Note that Y is a random variable and has its own PDF.

Question: How good is Y as an estimator?

Related questions: What are its expected value and variance? Does it converge to $\langle X(t) \rangle$, and if so, how fast does it converge?


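We can show that $Y$ is an unbiased estimator:

$$
\langle Y \rangle
= \left\langle \frac{1}{T} \int_0^T dt\, X(t) \right\rangle
= \frac{1}{T} \int_0^T dt\, \langle X(t) \rangle
= \langle X \rangle\, \frac{1}{T} \int_0^T dt
= \langle X \rangle
$$

Therefore $Y$ has a mean equal to the mean of $X$. The question now is what the spread in $Y$ is: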


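To investigate the variation of $Y$ about $\langle Y \rangle$ we must look at the second moment:

$$
\langle Y^2 \rangle
= \left\langle \left[ \frac{1}{T} \int_0^T dt\, X(t) \right]^2 \right\rangle
= \left\langle T^{-2} \iint_0^T dt_1\, dt_2\, X(t_1) X(t_2) \right\rangle
= T^{-2} \iint_0^T dt_1\, dt_2\, \underbrace{\langle X(t_1) X(t_2) \rangle}_{R_X(t_1, t_2) = R_X(t_1 - t_2)}
$$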

We transform the variables using

$$
z = \tfrac{1}{2}(t_1 + t_2), \qquad \tau = t_1 - t_2,
$$

which satisfies $dz\, d\tau = dt_1\, dt_2$ (Jacobian = 1).


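The tedious part is to identify the limits of integration. The square $0 \le t_1, t_2 \le T$ in the $(t_1, t_2)$ plane transforms to a diamond in the $(z, \tau)$ plane, with $|\tau| \le T$ and $|\tau|/2 \le z \le T - |\tau|/2$.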

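This indicates that

$$
\langle Y^2 \rangle = T^{-2} \left[ \int_0^T d\tau\, R_X(\tau) \int_{\tau/2}^{T - \tau/2} dz
+ \int_{-T}^{0} d\tau\, R_X(\tau) \int_{-\tau/2}^{T + \tau/2} dz \right]
$$

$$
\langle Y^2 \rangle = \frac{1}{T} \int_{-T}^{T} d\tau\, R_X(\tau) \left( 1 - \frac{|\tau|}{T} \right).
$$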

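To get the variance of $Y$ we use the autocovariance $C_X(\tau) \equiv R_X(\tau) - \langle X \rangle^2$ to write

$$
\sigma_Y^2 \equiv \langle Y^2 \rangle - \langle Y \rangle^2 = \frac{1}{T} \int_{-T}^{T} d\tau\, C_X(\tau) \left( 1 - \frac{|\tau|}{T} \right)
$$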

This result is useful because it shows how the averaging process behaves as T → ∞.
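As a numerical sanity check of the variance expression, the sketch below evaluates the integral for an assumed exponential autocovariance $C_X(\tau) = \sigma_X^2 e^{-|\tau|/\tau_c}$ and compares it with the closed form obtained by doing the same integral by hand. The covariance model and the function names `sigma2_y_numeric` and `sigma2_y_closed` are illustrative choices, not part of the lecture.

```python
import numpy as np

def sigma2_y_numeric(T, sigma_x=1.0, tau_c=1.0, n=200_001):
    """Evaluate (1/T) * int_{-T}^{T} C_X(tau) (1 - |tau|/T) dtau on a grid."""
    tau = np.linspace(-T, T, n)                        # odd n puts tau = 0 on the grid
    c_x = sigma_x**2 * np.exp(-np.abs(tau) / tau_c)    # assumed autocovariance model
    integrand = c_x * (1.0 - np.abs(tau) / T)
    dtau = tau[1] - tau[0]
    # trapezoidal rule, written out explicitly
    integral = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dtau
    return integral / T

def sigma2_y_closed(T, sigma_x=1.0, tau_c=1.0):
    """Closed form of the same integral for the exponential model."""
    a = tau_c / T
    return 2.0 * sigma_x**2 * a * (1.0 - a * (1.0 - np.exp(-1.0 / a)))

for T in (1.0, 10.0, 100.0, 1000.0):
    # last column: large-T limit sigma_x^2 * (2 tau_c) / T, with sigma_x = tau_c = 1
    print(T, sigma2_y_numeric(T), sigma2_y_closed(T), 2.0 / T)
```

For this model the area of the normalized autocovariance is $2\tau_c$, so the last printed column is the large-$T$ limit that the other two columns should approach.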


Convergence:

Suppose the autocovariance of $X$, $C_X(\tau)$, has a finite width $W_X$.

We now define the autocorrelation width of the process:

Assume we can write

$$
C_X(\tau) = \sigma_X^2\, \rho_X(\tau)
$$

with normalization $\rho_X(0) = 1$.

Define the area of $\rho_X(\tau)$ as $W_X$:

$$
\int d\tau\, \rho_X(\tau) = W_X
$$


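Then

$$
\lim_{T \to \infty} \sigma_Y^2 = \lim_{T \to \infty} T^{-1} \int_{-T}^{T} d\tau\, C_X(\tau) \left( 1 - \frac{|\tau|}{T} \right)
$$

If $C_X(\tau) \to 0$ at lags $\ll T$:

$$
\lim_{T \to \infty} \sigma_Y^2 = \lim_{T \to \infty} T^{-1} \int_{-\infty}^{\infty} d\tau\, C_X(\tau)
= \lim_{T \to \infty} T^{-1} \underbrace{\sigma_X^2\, W_X}_{\text{constant}} = 0
$$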

Thus, $\sigma_Y \to 0$ as $T \to \infty$ and therefore $\lim_{T \to \infty} Y = \langle X \rangle$.

An infinite time average is equivalent to an ensemble average.

This is an example of ergodicity, where time averages of realization(s) of a random process converge to an ensemble average.


Comments:

1) Wide-sense stationarity was assumed for the example of $Y \to \langle X \rangle$.

2) Ergodicity of higher-order moments requires higher-order stationarity.

3) In the example above, for finite $T$ we have

$$
\sigma_Y \approx \sigma_X \left( \frac{W_X}{T} \right)^{1/2} \quad \text{for } T \gg W_X.
$$

This simply means that the error in the average is the error in $X$ divided by the square root of the number of independent samples in the time series, as we have seen before.

The meaning of the width $W_X$ of the autocorrelation function:

If $W_X \equiv$ correlation time for $X$ ⇒ independent samples of $X(t)$ are separated by a time $\approx W_X$

⇒ $T/W_X$ = number of independent samples of $X(t)$ in the time interval $[0, T]$

⇒ $\sigma_Y \approx \sigma_X N^{-1/2}$, $N$ = number of degrees of freedom in the estimate

4) We will find that the F.T. of a WSS r.p. is not so well behaved.
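The scaling in comment 3 can be checked with a small Monte Carlo experiment. The sketch below uses a discrete AR(1) process as a stand-in for a correlated WSS process; the AR(1) model, the parameter `phi`, and the helper name `time_average_std` are assumptions made for illustration. For AR(1) the normalized autocovariance is `phi**|k|`, whose sum over all lags, $(1+\phi)/(1-\phi)$ samples, plays the role of the width.

```python
import numpy as np

def time_average_std(n_samples, phi, n_real=2000, seed=0):
    """Ensemble std of the time average of AR(1) realizations of length n_samples."""
    rng = np.random.default_rng(seed)
    sigma_x = 1.0 / np.sqrt(1.0 - phi**2)        # stationary std for unit-variance noise
    x = rng.normal(scale=sigma_x, size=n_real)   # start every realization in the stationary state
    total = x.copy()
    for _ in range(n_samples - 1):
        x = phi * x + rng.normal(size=n_real)    # AR(1) update, all realizations at once
        total += x
    y = total / n_samples                        # one time average per realization
    return y.std(), sigma_x

phi = 0.9
w_x = (1.0 + phi) / (1.0 - phi)   # sum of phi**|k| over all lags (in samples)
for n in (500, 2000, 8000):
    sigma_y, sigma_x = time_average_std(n, phi)
    # measured spread of Y next to the prediction sigma_x * sqrt(W / T)
    print(n, sigma_y, sigma_x * np.sqrt(w_x / n))
```

The two printed columns should agree to within Monte Carlo scatter once the record is much longer than the correlation width.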

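The rate at which $Y \to \langle X \rangle$ depends on the autocorrelation width $W_X$ of the autocovariance (ACV) of $X$. To get $\sigma_Y^2 \sim \sigma_X^2 W_X / T \to 0$ for large $T$ we assumed that $C_X(\tau)$ drops to zero fast enough that its area is finite. Convergence itself needs less: if $C_X(\tau) \propto \tau^{-p}$ for large $\tau$, then $\sigma_Y^2 \to 0$ requires only $p > 0$, since for $p < 1$ and a fixed lower limit $\tau_1$

$$
\frac{1}{T} \int_{\tau_1}^{T} d\tau\, \tau^{-p} = \frac{1}{T} \left[ \frac{T^{1-p} - \tau_1^{1-p}}{1 - p} \right] \longrightarrow \frac{T^{-p}}{1 - p}.
$$

(For $p > 1$ the area is finite and the $\sigma_Y^2 \propto 1/T$ scaling above applies.)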

Examples of processes that do not converge: Random walks.
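A quick simulation illustrates the random-walk case. The sketch below builds discrete random walks from unit-variance Gaussian steps (an illustrative choice, as is the helper name `var_of_time_average`) and shows that the variance of the time average grows with the record length instead of shrinking.

```python
import numpy as np

rng = np.random.default_rng(1)
n_real, n_max = 2000, 2000
# each row is one random walk: the cumulative sum of unit-variance Gaussian steps
walks = np.cumsum(rng.normal(size=(n_real, n_max)), axis=1)

def var_of_time_average(n):
    """Ensemble variance of the time average over the first n samples of each walk."""
    return walks[:, :n].mean(axis=1).var()

for n in (500, 1000, 2000):
    # for this discrete walk the exact variance is (n + 1)(2n + 1) / (6n), roughly n / 3
    print(n, var_of_time_average(n), (n + 1) * (2 * n + 1) / (6 * n))
```

Quadrupling the record length roughly quadruples the variance of the average, so longer averaging makes the estimate worse, not better.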


Fourier Transform and Power Spectrum Estimate for a Stochastic Process

Another stochastic integral is the Fourier transform. As stated before, the FT of a WSS random process $x(t)$,

$$
X(f) = \int_{-\infty}^{\infty} dt\, x(t)\, e^{-i \omega t}
\qquad (\text{recall WSS} \Rightarrow \langle x^2(t) \rangle = \text{constant in } t),
$$

with $\omega = 2\pi f$, cannot exist because the integral diverges. Luckily, we need only consider windowed or finite integrals in order to model experimental situations:

$$
X_T(f) = \int_{-T}^{T} dt\, x(t)\, e^{-i \omega t}
$$

What should an estimator of the power spectrum $S_x(f)$ look like? Recall for deterministic functions that

$$
\int_{-\infty}^{\infty} dt\, f^*(t)\, f(t + \tau) \Leftrightarrow |F(f)|^2
$$

So we expect that an estimator (denoted with a caret) for the power spectrum of a process $x(t)$ would have the form

$$
\hat{S}_x(f) = \text{const}\, |X_T(f)|^2
$$

and an appropriate value of the constant is $1/2T$, so

$$
\hat{S}_x(f) = \frac{1}{2T} |X_T(f)|^2
$$

This ensures that the corresponding ACF has the correct units.


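It can be shown that the estimator satisfies an analog of the Wiener-Khinchin theorem (which applies to ensemble-average quantities, $C_x(\tau) \Leftrightarrow S_x(f)$). The autocorrelation estimate corresponding to $\hat{S}_x(f)$ is

$$
\hat{C}_x(\tau) = \frac{1}{2T} \int_{-\infty}^{\infty} df\, e^{i \omega \tau}\, |X_T(f)|^2
$$

$$
= \frac{1}{2T} \int_{-\infty}^{\infty} df\, e^{i \omega \tau} \iint_{-T}^{T} dt\, dt'\, x(t)\, x^*(t')\, e^{-i \omega (t - t')}
$$

$$
= \frac{1}{2T} \iint_{-T}^{T} dt\, dt'\, x(t)\, x^*(t') \underbrace{\int_{-\infty}^{\infty} df\, e^{i \omega (t' + \tau - t)}}_{\equiv\, \delta(t' + \tau - t)}
$$

$$
\hat{C}_x(\tau) = \frac{1}{2T} \int_{-T}^{T} dt\, x(t)\, x^*(t - \tau)
$$

so

$$
\hat{C}_x(0) = \frac{1}{2T} \int_{-T}^{T} dt\, |x(t)|^2
$$

= estimate for $\langle |x(t)|^2 \rangle$, as expected for $C_x(0)$.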
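The same relation can be checked for sampled data. The sketch below is a discrete analog under stated assumptions: `n` samples of a real sequence replace the interval of length $2T$, the normalization is `1/n` instead of $1/2T$, and the data are illustrative white noise. Zero-padding to length `2n` makes the circular correlation implied by the FFT equal to the ordinary lagged-product sum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256
x = rng.normal(size=n)            # one real-valued realization (illustrative data)

# zero-pad to 2n so the circular correlation implied by the FFT equals the linear one
power = np.abs(np.fft.fft(x, 2 * n)) ** 2
acf_from_spectrum = np.fft.ifft(power).real[:n] / n

# direct lagged products: (1/n) * sum_t x[t] * x[t - k] for lags k = 0 .. n-1
acf_direct = np.correlate(x, x, mode="full")[n - 1:] / n

print(np.allclose(acf_from_spectrum, acf_direct))
print(acf_from_spectrum[0], np.mean(x**2))   # zero lag is the mean-square estimate
```

The zero-lag value reproduces the sample mean square, which is the discrete counterpart of the statement about $C_x(0)$ above.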


How good an estimator is $\hat{S}_x(f)$ for the power spectrum $S_x(f)$?

Answer: terrible! Recall that $\hat{S}_x(f)$ is itself a random process (since $\hat{S}_x$ for fixed $f$ is a random variable). Thus, we may fairly ask what the convergence properties of $\hat{S}_x(f)$ are, just as we investigated $Y \equiv \frac{1}{T} \int_0^T dt\, X(t)$ and found that $\langle Y \rangle = \langle X \rangle$ and $\sigma_Y \to 0$ as $T \to \infty$, so long as the correlation function of $X(t)$ decayed sufficiently quickly.

Mean: $\langle \hat{S}_x(f) \rangle \to S_x(f)$ as $T \to \infty$, so $\hat{S}_x(f)$ converges in the mean.

Variance: but $\sigma^2_{\hat{S}}(f) \equiv \langle \hat{S}_x^2(f) \rangle - \langle \hat{S}_x(f) \rangle^2$ does not decay to zero as $T \to \infty$:

$$
\lim_{T \to \infty} \sigma^2_{\hat{S}}(f) \neq 0.
$$

Conclusion: The squared magnitude of a finite Fourier transform of a WSS process is a poor estimate for the power spectrum $S_x(f)$ (an ensemble-average quantity).

Why $\hat{S}_x(f)$ is a poor estimator:

1. $\hat{S}_x(f)$ is a $\chi^2_2$ r.v. in the limit where $W_x \ll T$ ⇒ $\sigma_{\hat{S}_x} / \langle \hat{S}_x \rangle \equiv 1$, independent of $T$.

2. From the point of view of number of degrees of freedom, the number of degrees of freedom in the data, $N_{\rm dof} \sim T / W_x$, may be large, but the number of degrees of freedom in the spectral estimate (per independent frequency bin of width $\Delta f \approx 1/2T$) is small.
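The lack of convergence is easy to see numerically. The sketch below uses white Gaussian noise (an illustrative choice, because its true spectrum is flat, so the scatter across frequency bins stands in for the ensemble scatter at one frequency) and measures the fractional error of the raw spectral estimate for several record lengths. The function name `periodogram_fractional_error` and the `1/n` normalization are assumptions for the discrete setting.

```python
import numpy as np

def periodogram_fractional_error(n, seed=3):
    """std/mean across frequency of the raw spectral estimate for white noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    s_hat = np.abs(np.fft.rfft(x)) ** 2 / n   # discrete analog of |X_T|^2 / (2T)
    s_hat = s_hat[1:-1]                       # drop DC and Nyquist bins (1 d.o.f. each)
    return s_hat.std() / s_hat.mean()

for n in (2**10, 2**14, 2**18):
    # the ratio stays near 1 no matter how long the record is
    print(n, periodogram_fractional_error(n))
```

Lengthening the record adds frequency bins but does not reduce the scatter in any one of them.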


Intuitive Approaches

We can see the same result while bypassing the brute-force details.

Method 1: Consider

$$
X_T(\omega) \equiv \underbrace{\int_{-T}^{T} dt\, x(t)\, e^{-i \omega t}}_{\text{infinite sum of random variables}}
$$

View this as the sum of many random variables. How many?

Let $W_x$ = autocorrelation width of $x(t)$, as in the discussion of the ergodicity of $Y = T^{-1} \int_0^T dt\, x(t)$.

By definition, $W_x$ is the time scale over which two samples $x(t_1)$ and $x(t_2)$ become independent.

Therefore $N = 2T / W_x \approx$ number of independent samples of $x(t)$.

If $N \gg 1$ then we invoke the Central Limit Theorem and say that $X_T(\omega)$ becomes a Gaussian random variable, with zero mean if $x(t)$ is zero mean.

Break $X_T(\omega)$ into real and imaginary parts:

$$
X_T(\omega) = R(\omega) + i\, I(\omega)
$$

It can be shown that $R(\omega)$ and $I(\omega)$ are uncorrelated; being Gaussian, they are therefore zero-mean, independent Gaussian r.v.'s.


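Proof (for real $x(t)$):

$$
R(\omega) = \int_{-T}^{T} dt\, x(t) \cos \omega t, \qquad
I(\omega) = -\int_{-T}^{T} dt\, x(t) \sin \omega t
$$

$$
\Longrightarrow\ \langle R(\omega)\, I(\omega) \rangle = -\iint dt_1\, dt_2\, \langle x(t_1)\, x(t_2) \rangle \underbrace{\cos \omega t_1 \sin \omega t_2}_{\frac{1}{2} [\sin \omega (t_1 + t_2) + \sin \omega (t_2 - t_1)]}
$$

$$
= -\frac{1}{2} \left[ \iint dt_1\, dt_2\, R_x(t_1 - t_2) \sin \omega (t_1 + t_2)
+ \iint dt_1\, dt_2\, \underbrace{R_x(t_1 - t_2)}_{\text{even}}\, \underbrace{\sin \omega (t_2 - t_1)}_{\text{odd}} \right].
$$

The first term integrates to zero for $W_x \ll T$, while the second term is the integral of the product of odd and even functions (of $t_1 - t_2$) over a symmetric range and therefore vanishes.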

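Therefore

$$
S_T(\omega) = \frac{1}{2T} |X_T(\omega)|^2 = \frac{1}{2T} \left[ R^2(\omega) + I^2(\omega) \right],
$$

where the sum of squares is of two independent Gaussian r.v.'s ($\equiv \chi^2_2$), implying that $S_T(\omega)$ is a $\chi^2_2$ r.v. with mean $\langle S_T(\omega) \rangle = S(\omega)$ = true power spectrum.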


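Let $p(\omega) \equiv S_T(\omega)$.

Then the PDF of $p(\omega)$ is

$$
f_p(p) = \frac{1}{\langle p \rangle}\, e^{-p / \langle p \rangle}\, U(p),
$$

where $U(p)$ is the unit step function.

It can be shown that $\langle p^2 \rangle = 2 \langle p \rangle^2$ ⇒ $\varepsilon = \sigma_p / \langle p \rangle = 1$, as before.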
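The second moment quoted above follows directly from the exponential PDF. A short worked version, using only the PDF of $p$ already given and the standard integral $\int_0^\infty du\, u^n e^{-u} = n!$:

```latex
\langle p^n \rangle
  = \int_0^\infty dp\, p^n\, \frac{1}{\langle p \rangle}\, e^{-p/\langle p \rangle}
  = \langle p \rangle^n \int_0^\infty du\, u^n e^{-u}
  = n!\, \langle p \rangle^n
  \qquad (u = p / \langle p \rangle)

\langle p^2 \rangle = 2 \langle p \rangle^2
\;\Rightarrow\;
\sigma_p^2 = \langle p^2 \rangle - \langle p \rangle^2 = \langle p \rangle^2
\;\Rightarrow\;
\varepsilon = \frac{\sigma_p}{\langle p \rangle} = 1
```

The fractional error is therefore 100% regardless of the value of $\langle p \rangle$, and hence regardless of $T$.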


Method 2: Another way of understanding why $S_T(\omega)$ does not converge as $T \to \infty$ is to consider the number of degrees of freedom in each independent frequency bin.

• Note that

$$
X_T(\omega) \equiv \int_{-T}^{T} dt\, x(t)\, e^{-i \omega t} = \int_{-\infty}^{\infty} dt\, x(t)\, W_T(t)\, e^{-i \omega t}
$$

where $W_T$ is a window function, $W_T(t) = 1$ for $|t| \le T$ and zero otherwise, or

$$
X_T(\omega) = X(\omega) * \frac{2 \sin \omega T}{\omega}
$$

As usual, multiplication by $W_T(t)$ in the time domain corresponds to convolution by $\sin \omega T / \omega$ in the frequency domain. A frequency cell or bin has a width $\Delta\omega \approx \pi / T$, or $\Delta f \approx 1/2T$.

• Let $W_t$ = correlation time in the time domain. Then the number of independent fluctuations in the time series is $N_t = 2T / W_t$.

• Let $W_\omega$ = width of spectrum (bandlimited). Then the number of frequency cells into which the variance is divided is

$$
N_\omega = \frac{W_\omega}{\Delta\omega} = \frac{T}{\pi}\, W_\omega
$$


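• The number of degrees of freedom (d.o.f.) per frequency cell is

$$
N_{\rm d.o.f.} = \frac{N_t}{N_\omega} \equiv \frac{\#\ \text{of independent data points}}{\#\ \text{of frequency cells}} = \frac{2T / W_t}{T W_\omega / \pi} = \frac{2\pi}{W_t W_\omega}.
$$

But the uncertainty principle ⇒ $W_t W_\omega \ge 2\pi$, so $N_{\rm d.o.f.} \approx 1$ for each part (real and imaginary) of the F.T.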

Interpretation:

As $T \to \infty$ more and more independent fluctuations contribute to the integral, but these are being spread into more and more frequency bins, so that the number of d.o.f. per bin remains the same and small ⇒ large errors.

$N_{\rm d.o.f.} \approx 1$ ⇒ $|X_T(\omega)|^2$ will have 2 d.o.f. per cell, as before.


Solution to the convergence problem: increase the number of degrees of freedom in a frequency cell.

The simplest approach to take is to average spectral estimates. Obtain spectra from $L$ realizations of length $2T$ and average; i.e. find $S_T(\omega)$ for a block of data, repeat for $L$ blocks of data, and average:

$$
S_{T,L}(\omega) = L^{-1} \sum_{j=1}^{L} S_{T;j}(\omega)
$$

$$
\text{error}: \quad \frac{\left[ {\rm Var}\left[ S_{T,L}(\omega) \right] \right]^{1/2}}{\langle S_{T,L}(\omega) \rangle} = L^{-1/2}
$$

A 10% error ⇒ $L = 100$.

(Best method if an unlimited amount of data is available.) We will talk about this and other methods later.
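The $L^{-1/2}$ behavior of block averaging can be checked with a short simulation. The sketch below assumes white Gaussian noise (flat true spectrum, so scatter across frequency bins measures the error) and independent blocks; the function name `averaged_fractional_error` and the block length are illustrative choices.

```python
import numpy as np

def averaged_fractional_error(n_blocks, block_len=1024, seed=4):
    """std/mean across frequency of a spectrum averaged over n_blocks blocks."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_blocks, block_len))                   # one row per data block
    s_blocks = np.abs(np.fft.rfft(x, axis=1)) ** 2 / block_len   # spectral estimate per block
    s_avg = s_blocks.mean(axis=0)[1:-1]                          # average blocks, drop DC and Nyquist
    return s_avg.std() / s_avg.mean()

for n_blocks in (1, 4, 25, 100):
    # measured fractional error next to the predicted L**-0.5
    print(n_blocks, averaged_fractional_error(n_blocks), n_blocks ** -0.5)
```

The price of the reduced scatter is frequency resolution: for a fixed total amount of data, each block is $L$ times shorter, so each frequency cell is $L$ times wider.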


Direct Calculation of the Mean and Variance of the Spectral Estimate

We will use continuous notation for now.

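The spectral estimator for a WSS process is

$$
S_T(\omega) \equiv \frac{1}{2T} |X_T(\omega)|^2 \quad \text{where} \quad X_T(\omega) = \int_{-T}^{T} dt\, x(t)\, e^{-i \omega t}.
$$

Properties of the estimator: As usual, we want to calculate the mean and variance of the estimator.

Mean:

The ensemble average is

$$
\langle S_T(\omega) \rangle = \left\langle \frac{1}{2T} |X_T(\omega)|^2 \right\rangle
= \left\langle \frac{1}{2T} \iint_{-T}^{T} dt_1\, dt_2\, x(t_1)\, x^*(t_2)\, e^{-i \omega (t_1 - t_2)} \right\rangle
= \frac{1}{2T} \iint_{-T}^{T} dt_1\, dt_2\, R_X(t_1 - t_2)\, e^{-i \omega (t_1 - t_2)}.
$$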
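The double integral can be reduced with the same change of variables used for the time average earlier in the lecture, $\tau = t_1 - t_2$ and $z = \frac{1}{2}(t_1 + t_2)$; the only difference is that the square now has side $2T$. The following is a sketch of that step, not taken from the handout:

```latex
\langle S_T(\omega) \rangle
  = \frac{1}{2T} \int_{-2T}^{2T} d\tau\, R_X(\tau)\, e^{-i \omega \tau}
    \int_{-T + |\tau|/2}^{\,T - |\tau|/2} dz
  = \int_{-2T}^{2T} d\tau\, R_X(\tau) \left( 1 - \frac{|\tau|}{2T} \right) e^{-i \omega \tau}
```

If $R_X(\tau)$ decays on a scale much shorter than $T$, the triangular weight is close to unity wherever $R_X$ is appreciable, and the right-hand side approaches the Fourier transform of $R_X(\tau)$. This is consistent with the earlier statement that the estimator converges in the mean.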
