
Chapter 8

Spectral Analysis

8.1 Some Fourier background

The background given here is extremely sketchy (to say the least); for a more thorough background the reader is referred, for example, to Priestley (1983), Chapter 4 and Fuller (1995), Chapter 3.

(i) Fourier transforms of finite sequences

It is straightforward to show (by using that $\sum_{j=1}^{n}\exp(i2\pi jk/n) = 0$ for $k \neq 0$) that if
$$d_k = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}x_j\exp(i2\pi jk/n),$$
then $\{x_r\}$ can be recovered by inverting this transformation:
$$x_r = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}d_k\exp(-i2\pi rk/n).$$
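As a quick numerical check of the inversion formula in (i), here is a minimal sketch (assuming numpy is available; the $1/\sqrt{n}$ normalisation follows the notes rather than numpy.fft's default):

```python
import numpy as np

n = 8
x = np.random.randn(n)                # a finite sequence {x_j}, j = 1, ..., n
jj = np.arange(1, n + 1)
kk = np.arange(1, n + 1)

# Forward transform: d_k = n^{-1/2} sum_j x_j exp(i 2 pi j k / n)
d = np.array([(x * np.exp(1j * 2 * np.pi * jj * k / n)).sum() for k in kk]) / np.sqrt(n)

# Inverse transform: x_r = n^{-1/2} sum_k d_k exp(-i 2 pi r k / n)
x_rec = np.array([(d * np.exp(-1j * 2 * np.pi * r * kk / n)).sum() for r in jj]) / np.sqrt(n)

print(np.allclose(x, x_rec.real))     # True: the original sequence is recovered
```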

(ii) Fourier sums and integrals

Of course the above only has meaning when $\{x_k\}$ is a finite sequence. However, suppose that $\{x_k\}$ is a sequence which belongs to $\ell_2$ (that is, $\sum_k x_k^2 < \infty$); then we can define the function
$$f(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}x_k\exp(ik\omega),$$
where $\int_0^{2\pi}f(\omega)^2 d\omega = \sum_k x_k^2$, and we can recover $\{x_k\}$ from $f(\omega)$. That is,
$$x_k = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi}f(\omega)\exp(-ik\omega)d\omega.$$

(iii) Convolutions. Let us suppose that the Fourier transform of the sequence $\{a_k\}$ is $A(\omega) = \frac{1}{\sqrt{2\pi}}\sum_k a_k\exp(ik\omega)$ and the Fourier transform of the sequence $\{b_k\}$ is $B(\omega) = \frac{1}{\sqrt{2\pi}}\sum_k b_k\exp(ik\omega)$. Then
$$\sum_{j=-\infty}^{\infty}a_jb_{k-j} = \int A(\omega)B(-\omega)\exp(-ik\omega)d\omega, \qquad \sum_{j=-\infty}^{\infty}a_jb_j\exp(ij\omega) = \int A(\lambda)B(\omega-\lambda)d\lambda. \qquad (8.1)$$

8.2 Motivation

To give a taster of the spectral representations below, let us consider the following example. Suppose that $\{X_t\}_{t=1}^{n}$ is a stationary time series. The Fourier transform of this sequence is
$$J_n(\omega_j) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_t\exp(it\omega_j),$$
where $\omega_j = 2\pi j/n$ (these are often called the fundamental frequencies). Using (i) above we see that
$$X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp(-it\omega_k). \qquad (8.2)$$

This is just the inverse Fourier transform; however, $J_n(\omega_j)$ has some interesting properties. Under certain conditions it can be shown that $\mathrm{cov}(J_n(\omega_s), J_n(\omega_t)) \approx 0$ if $s \neq t$. So in some sense (8.2) can be considered as the decomposition of $X_t$ in terms of frequencies whose amplitudes are uncorrelated. Now if we let $f_n(\omega) = n^{-1}E(|J_n(\omega)|^2)$ and take the above argument further, we see that
$$c(k) = \mathrm{cov}(X_t, X_{t+k}) = \frac{1}{n}\sum_{s=1}^{n}E(|J_n(\omega_s)|^2)\exp(it\omega_s - i(t+k)\omega_s) = \sum_{s=1}^{n}f_n(\omega_s)\exp(-ik\omega_s). \qquad (8.3)$$

For more details on this see Priestley (1983), Section 4.11 (pages 259-261). Note that the above can be considered as the eigen decomposition of the stationary covariance function, since
$$c(u,v) = c(u-v) = \sum_{s=1}^{n}f_n(\omega_s)\exp(iu\omega_s)\exp(-iv\omega_s),$$
where $\{\exp(it\omega_s)\}$ are the eigenfunctions and $\{f_n(\omega_s)\}$ the eigenvalues.

Of course the entire time series $\{X_t\}$ will have infinite length (and in general it will not belong to $\ell_2$), so it is natural to ask whether the above results can be generalised to $\{X_t\}$. The answer is yes, by replacing the sum in (8.3) by an integral to obtain
$$c(k) = \int_0^{2\pi}\exp(ik\omega)dF(\omega),$$
where $F(\omega)$ is a positive nondecreasing function. Comparing with (8.3), we observe that $f_n(\omega_k)$ is a positive function, thus its integral (the equivalent of $F(\omega)$) is positive and nondecreasing. Therefore heuristically we can suppose that $F(\omega) \approx \int_0^{\omega}f_n(\lambda)d\lambda$.

Moreover the analogue of (8.2) is
$$X_t = \int\exp(it\omega)dZ(\omega),$$
where $Z(\omega)$ is a right-continuous orthogonal increment process (that is, $E\big((Z(\omega_1)-Z(\omega_2))\overline{(Z(\omega_3)-Z(\omega_4))}\big) = 0$ when the intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ do not overlap) and $E(|Z(\omega)|^2) = F(\omega)$.

We give the proof of these results in the following section. We mention that a more detailed discussion of spectral analysis of time series is given in Priestley (1983), Chapters 4 and 6, Brockwell and Davis (1998), Chapters 4 and 10, Fuller (1995), Chapter 3, and Shumway and Stoffer (2006), Chapter 4. Many of these references also discuss tests for periodicity etc. (see also Quinn and Hannan (2001) for the estimation of frequencies).
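To make the heuristic above concrete, the following simulation sketch (the AR(1) model, its parameter and the chosen frequencies are illustrative assumptions, not part of the notes; numpy assumed) estimates the covariance between $J_n(\omega_s)$ and $J_n(\omega_t)$ at two distinct fundamental frequencies; it is close to zero, while $E|J_n(\omega_s)|^2$ is of order one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, phi = 128, 2000, 0.7           # assumed AR(1): X_t = phi X_{t-1} + e_t
t = np.arange(1, n + 1)
w = 2 * np.pi * np.arange(n) / n        # fundamental frequencies w_j = 2 pi j / n

def dft_at(x, omega):
    # J_n(omega) = n^{-1/2} sum_t X_t exp(i t omega)
    return (x * np.exp(1j * t * omega)).sum() / np.sqrt(n)

J_s = np.empty(reps, complex)
J_t = np.empty(reps, complex)
for r in range(reps):
    e = rng.standard_normal(n + 100)
    x = np.zeros(n + 100)
    for i in range(1, n + 100):
        x[i] = phi * x[i - 1] + e[i]
    x = x[100:]                          # discard burn-in so the series is roughly stationary
    J_s[r], J_t[r] = dft_at(x, w[5]), dft_at(x, w[20])

# Covariance between DFTs at two distinct fundamental frequencies: close to 0
cov_st = np.mean(J_s * np.conj(J_t)) - np.mean(J_s) * np.conj(np.mean(J_t))
print(abs(cov_st), np.mean(np.abs(J_s) ** 2))   # small versus O(1)
```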

8.3 Spectral representations

8.3.1 The spectral distribution

We first state a theorem which is very useful for checking positive definiteness of a sequence. See Brockwell and Davis (1998), Corollary 4.3.2 or Fuller (1995), Theorem 3.1.9.

To prove part of the result we use the fact that if a sequence $\{a_k\} \in \ell_2$, then $g(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}a_k\exp(ik\omega) \in L_2$ (by Parseval's theorem) and $a_k = \int_0^{2\pi}g(\omega)\exp(ik\omega)d\omega$, together with the following result.

Lemma 8.3.1 Suppose $\sum_{k=-\infty}^{\infty}|c(k)| < \infty$. Then we have
$$\frac{1}{n}\sum_{k=-(n-1)}^{(n-1)}|k\,c(k)| \to 0$$
as $n \to \infty$.

PROOF. The proof is straightforward in the case that $\sum_{k=-\infty}^{\infty}|kc(k)| < \infty$: in this case $\sum_{k=-(n-1)}^{(n-1)}\frac{|k|}{n}|c(k)| = O(\frac{1}{n})$. The proof is slightly more tricky in the case that only $\sum_{k=-\infty}^{\infty}|c(k)| < \infty$. First we note that since $\sum_{k=-\infty}^{\infty}|c(k)| < \infty$, for every $\varepsilon > 0$ there exists an $N_\varepsilon$ such that for all $n \geq N_\varepsilon$, $\sum_{|k|\geq n}|c(k)| < \varepsilon$. Let us suppose that $n > N_\varepsilon$; then we have the bound
$$\frac{1}{n}\sum_{k=-(n-1)}^{(n-1)}|kc(k)| \leq \frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{(N_\varepsilon-1)}|kc(k)| + \frac{1}{n}\sum_{|k|\geq N_\varepsilon}|kc(k)| \leq \frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{(N_\varepsilon-1)}|kc(k)| + \varepsilon.$$
Hence if we keep $N_\varepsilon$ fixed we see that $\frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{(N_\varepsilon-1)}|kc(k)| \to 0$ as $n\to\infty$. Since this is true for all $\varepsilon$ (for different thresholds $N_\varepsilon$) we obtain the required result.

Theorem 8.3.1 (The spectral density) Suppose the coefficients $\{c(k)\}$ are absolutely summable (that is, $\sum_k|c(k)| < \infty$). Then the sequence $\{c(k)\}$ is nonnegative definite if and only if the function $f(\omega)$, where
$$f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega),$$
is nonnegative. Moreover
$$c(k) = \int_0^{2\pi}\exp(ik\omega)f(\omega)d\omega. \qquad (8.4)$$

It is worth noting that $f$ is called the spectral density corresponding to the covariances $\{c(k)\}$.

PROOF. We first show that if $\{c(k)\}$ is a non-negative definite sequence, then $f(\omega)$ is a nonnegative function. We recall that since $\{c(k)\}$ is non-negative definite, for any sequence $x = (x_1,\ldots,x_n)$ (real or complex) we have $\sum_{s,t=1}^{n}x_s c(s-t)\overline{x}_t \geq 0$ (where $\overline{x}_t$ is the complex conjugate of $x_t$). Now we consider the above for the particular case $x = (\exp(i\omega),\ldots,\exp(in\omega))$. Define the function
$$f_n(\omega) = \frac{1}{2\pi n}\sum_{s,t=1}^{n}\exp(is\omega)c(s-t)\exp(-it\omega).$$
Clearly $f_n(\omega) \geq 0$. We note that $f_n(\omega)$ can be rewritten as
$$f_n(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{(n-1)}\Big(\frac{n-|k|}{n}\Big)c(k)\exp(ik\omega).$$

Comparing $f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)$ with $f_n(\omega)$ we see that
$$|f(\omega) - f_n(\omega)| \leq \frac{1}{2\pi}\Big|\sum_{|k|\geq n}c(k)\exp(ik\omega)\Big| + \frac{1}{2\pi}\Big|\sum_{k=-(n-1)}^{(n-1)}\frac{|k|}{n}c(k)\exp(ik\omega)\Big| := I_n + II_n.$$
Now since $\sum_{k=-\infty}^{\infty}|c(k)| < \infty$ it is clear that $I_n \to 0$ as $n\to\infty$. Using Lemma 8.3.1 we have $II_n \to 0$ as $n\to\infty$. Altogether the above implies
$$|f(\omega) - f_n(\omega)| \to 0 \quad \text{as } n\to\infty. \qquad (8.5)$$

Now it is clear that since for all $n$ the functions $f_n(\omega)$ are nonnegative, the limit $f$ must be nonnegative (if we suppose the contrary, then there must exist a subsequence of functions $f_{n_k}(\omega)$ which are not nonnegative, which is not true). Therefore we have shown that if $\{c(k)\}$ is a nonnegative definite sequence, then $f(\omega)$ is a nonnegative function.

We now show that if $f(\omega)$, defined by $\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)$, is a nonnegative function, then $\{c(k)\}$ is a nonnegative definite sequence. We first note that because $\{c(k)\} \in \ell_1$ it is also in $\ell_2$, hence we have that $c(k) = \int_0^{2\pi}f(\omega)\exp(ik\omega)d\omega$. Now we have
$$\sum_{s,t=1}^{n}x_s c(s-t)\overline{x}_t = \int_0^{2\pi}f(\omega)\Big(\sum_{s,t=1}^{n}x_s\exp(i(s-t)\omega)\overline{x}_t\Big)d\omega = \int_0^{2\pi}f(\omega)\Big|\sum_{s=1}^{n}x_s\exp(is\omega)\Big|^2 d\omega \geq 0.$$
Hence we obtain the desired result.

The above theorem is very useful. It basically gives a simple way to check whether a sequence $\{c(k)\}$ is non-negative definite or not (hence whether it is a covariance function - recall Theorem 1.1.1).

Example 8.3.1 Suppose we define the empirical covariances
$$c_n(k) = \begin{cases}\frac{1}{n}\sum_{t=1}^{n-|k|}X_tX_{t+|k|} & |k| \leq n-1,\\ 0 & \text{otherwise};\end{cases}$$
then $\{c_n(k)\}$ is a non-negative definite sequence. Therefore, using Lemma 1.1.1, there exists a stationary time series $\{Z_t\}$ which has the covariance $c_n(k)$.

To show that the sequence is non-negative definite we consider the Fourier transform of the sequence (the spectral density) and show that it is nonnegative. The Fourier transform of $\{c_n(k)\}$ is
$$\sum_{k=-(n-1)}^{(n-1)}\exp(ik\omega)c_n(k) = \sum_{k=-(n-1)}^{(n-1)}\exp(ik\omega)\frac{1}{n}\sum_{t=1}^{n-|k|}X_tX_{t+|k|} = \frac{1}{n}\Big|\sum_{t=1}^{n}X_t\exp(it\omega)\Big|^2 \geq 0.$$
Since this is nonnegative, $\{c_n(k)\}$ is a non-negative definite sequence.
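A small numerical illustration of Example 8.3.1 (a sketch assuming numpy; any observed series will do): the Fourier transform of the sample autocovariances equals $n^{-1}|\sum_t X_t e^{it\omega}|^2$ and is therefore nonnegative at every frequency.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.standard_normal(n)                # any observed (zero mean) series

def c_n(k):
    # biased sample autocovariance c_n(k) = n^{-1} sum_{t=1}^{n-|k|} X_t X_{t+|k|}
    k = abs(k)
    return (x[: n - k] * x[k:]).sum() / n if k <= n - 1 else 0.0

omega = 1.3                               # an arbitrary frequency
ks = np.arange(-(n - 1), n)
ft = np.sum([c_n(k) * np.exp(1j * k * omega) for k in ks])

t = np.arange(1, n + 1)
direct = np.abs((x * np.exp(1j * t * omega)).sum()) ** 2 / n

print(ft.real, direct)                    # equal and nonnegative
```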

We now state a useful result which relates the largest and smallest eigenvalues of the variance matrix of a stationary process to the largest and smallest values of the spectral density.

Lemma 8.3.2 Suppose that $\{X_k\}$ is a stationary process with covariance function $\{c(k)\}$ and spectral density $f(\omega)$. Let $\Sigma_n = \mathrm{var}(X_n)$, where $X_n = (X_1,\ldots,X_n)$. Suppose $\inf_\omega f(\omega) \geq m > 0$ and $\sup_\omega f(\omega) \leq M < \infty$. Then for all $n$ we have
$$\lambda_{\min}(\Sigma_n) \geq \inf_\omega f(\omega), \qquad \lambda_{\max}(\Sigma_n) \leq \sup_\omega f(\omega).$$

PROOF. Let $e_1$ be the eigenvector corresponding to the smallest eigenvalue of $\Sigma_n$. Then using $c(s-t) = \int f(\omega)\exp(i(s-t)\omega)d\omega$ we have
$$\lambda_{\min}(\Sigma_n) = e_1'\Sigma_n e_1 = \sum_{s,t=1}^{n}e_{s,1}c(s-t)\overline{e}_{t,1} = \int f(\omega)\sum_{s,t=1}^{n}e_{s,1}\exp(i(s-t)\omega)\overline{e}_{t,1}\,d\omega = \int_0^{2\pi}f(\omega)\Big|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)\Big|^2 d\omega \geq \inf_\omega f(\omega)\int_0^{2\pi}\Big|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)\Big|^2 d\omega = \inf_\omega f(\omega),$$
since $\int|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)|^2 d\omega = 1$. Using a similar method we can show that $\lambda_{\max}(\Sigma_n) \leq \sup_\omega f(\omega)$.

A consequence of the above result is that if the spectral density is bounded from above and bounded away from zero, then for every $n$ the matrix $\Sigma_n$ is non-singular and has bounded spectral norm.
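The following sketch illustrates Lemma 8.3.2 for an assumed AR(1) model with $c(k) = \sigma^2\phi^{|k|}/(1-\phi^2)$ and $f(\omega) = \sigma^2/(2\pi|1-\phi e^{i\omega}|^2)$ (numpy and scipy assumed). With the convention $c(k) = \int_0^{2\pi}e^{ik\omega}f(\omega)d\omega$, the quadratic-form argument of the proof yields the bounds with an extra factor of $2\pi$ from integrating over $[0,2\pi]$, which is what is checked below.

```python
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n = 0.5, 1.0, 200            # assumed AR(1): X_t = phi X_{t-1} + e_t
k = np.arange(n)
c = sigma2 * phi ** k / (1 - phi ** 2)    # autocovariances c(k)
Sigma = toeplitz(c)                       # Sigma_n = var(X_1, ..., X_n)

w = np.linspace(0, 2 * np.pi, 10000)
f = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * w)) ** 2)   # spectral density

eig = np.linalg.eigvalsh(Sigma)
# 2*pi*inf f <= lambda_min  and  lambda_max <= 2*pi*sup f
print(2 * np.pi * f.min(), eig.min(), eig.max(), 2 * np.pi * f.max())
```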

Lemma 8.3.3 Suppose the covariance $c(k)$ decays to zero as $k\to\infty$; then for all $n$, $\Sigma_n = \mathrm{var}(X_n)$ is a non-singular matrix. (Note we do not require that the covariances are absolutely summable.)

PROOF. See Brockwell and Davis (1998), Proposition 5.1.1.

Theorem 8.3.1 only holds when the sequence $\{c(k)\}$ is absolutely summable. Of course this may not always be the case. An example of an 'extreme' case is the time series $X_t = Z$. Clearly this is a stationary time series and its covariance is $c(k) = \mathrm{var}(Z)$ for all $k$. In this case the autocovariance sequence $\{c(k)\}$ is not absolutely summable, hence the representation of the covariance in Theorem 8.3.1 cannot be applied. The reason is that the Fourier transform of the constant sequence is not well defined (since it belongs to neither $\ell_1$ nor $\ell_2$).

However, we now show that Theorem 8.3.1 can be generalised to include all non-negative definite sequences and stationary processes, by considering the spectral distribution rather than the spectral density (we use the integral $\int g(x)dF(x)$; a definition is given in the Appendix).

Theorem 8.3.2 A sequence $\{c(k)\}$ is a non-negative definite sequence if and only if
$$c(k) = \int_0^{2\pi}\exp(ik\omega)dF(\omega), \qquad (8.6)$$
where $F(\omega)$ is a right-continuous (this means $F(x+h) \to F(x)$ as $h\downarrow 0$), non-decreasing, non-negative bounded function on $[0, 2\pi]$ (hence it has all the properties of a distribution function and can be considered as a distribution; it is usually called the spectral distribution). This representation is unique.

PROOF. We first show that if $\{c(k)\}$ is a non-negative definite sequence, then we can write $c(k) = \int_0^{2\pi}\exp(ik\omega)dF(\omega)$, where $F(\omega)$ is a distribution function. Had $\{c(k)\}$ been absolutely summable, then we could use Theorem 8.3.1 to write $c(k) = \int_0^{2\pi}\exp(ik\omega)dF(\omega)$, where $F(\omega) = \int_0^{\omega}f(\lambda)d\lambda$ and $f(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\lambda)$. By Theorem 8.3.1 we know that $f(\lambda)$ is nonnegative, hence $F(\omega)$ is a distribution, and we have the result.

In the case that $\{c(k)\}$ is not absolutely summable we cannot use this approach, but we adapt some of the ideas used to prove Theorem 8.3.1. As in the proof of Theorem 8.3.1, define the nonnegative function
$$f_n(\omega) = \frac{1}{2\pi n}\sum_{s,t=1}^{n}\exp(is\omega)c(s-t)\exp(-it\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{(n-1)}\Big(\frac{n-|k|}{n}\Big)c(k)\exp(ik\omega).$$

When $\{c(k)\}$ is not absolutely summable, the limit of $f_n(\omega)$ may no longer be well defined. To circumvent dealing with functions which may have awkward limits, we consider instead their integrals, which we will show are always distribution functions. Let us define the function $F_n(\omega)$ whose derivative is $f_n(\omega)$, that is
$$F_n(\omega) = \int_0^{\omega}f_n(\lambda)d\lambda, \qquad 0 \leq \omega \leq 2\pi.$$
Since $f_n(\lambda)$ is nonnegative, $F_n$ is a nondecreasing function, and it is bounded ($F_n(2\pi) = \int_0^{2\pi}f_n(\lambda)d\lambda \leq c(0)$). Hence $F_n$ satisfies all the properties of a distribution and can be treated as a distribution function. Now it is clear that for every $k$ we have

$$\int_0^{2\pi}\exp(ik\omega)dF_n(\omega) = \begin{cases}\big(1 - \frac{|k|}{n}\big)c(k) & |k| \leq n,\\ 0 & \text{otherwise}.\end{cases} \qquad (8.7)$$

If we let $d_{n,k} = \int_0^{2\pi}\exp(ik\omega)dF_n(\omega)$, we see that for every $k$, $d_{n,k} \to c(k)$ as $n\to\infty$. But we should ask what this tells us about the limit of the distributions $F_n$. Intuitively, the distributions $F_n$ should converge (weakly) to a function $F$, and this function should also be a distribution function (if the $F_n$ are all nondecreasing functions, then the limit must be nondecreasing). In fact this turns out to be the case by applying Helly's theorem (see Appendix). Roughly speaking, it states that given a sequence of distributions $\{G_k\}$ which are all bounded, there exists a distribution $G$ which is the limit of a subsequence $\{G_{k_i}\}$ (effectively this determines conditions for a sequence of functions to be compact), hence for every $h \in L_2$ we have $\int h(\omega)dG_{k_i}(\omega) \to \int h(\omega)dG(\omega)$.

We now apply this result to the sequence $\{F_n\}$. We observe that the distributions $F_n$ are all uniformly bounded (by $c(0)$). Therefore, applying Helly's theorem, there must exist a distribution $F$ which is the limit of a subsequence of $\{F_n\}$, that is, for every $h \in L_2$ we have
$$\int h(\omega)dF_{k_i}(\omega) \to \int h(\omega)dF(\omega), \qquad i\to\infty,$$
for some subsequence $\{F_{k_i}\}$. We now show that the above is true not only for a subsequence but for the full sequence $\{F_k\}$. We observe that $\{\exp(ik\omega)\}$ is a basis of $L_2$, and that the sequence $\{\int_0^{2\pi}\exp(ik\omega)dF_n(\omega)\}_n$ converges, for every $k$, to $c(k)$. Therefore for all $h \in L_2$ we have
$$\int h(\omega)dF_k(\omega) \to \int h(\omega)dF(\omega), \qquad k\to\infty,$$
for some distribution function $F$. Therefore, looking at the case $h(\omega) = \exp(ik\omega)$, we have
$$\int\exp(ik\omega)dF_k(\omega) \to \int\exp(ik\omega)dF(\omega), \qquad k\to\infty.$$
Since $\int\exp(ik\omega)dF_k(\omega) \to c(k)$ and $\int\exp(ik\omega)dF_k(\omega) \to \int\exp(ik\omega)dF(\omega)$, we have $c(k) = \int\exp(ik\omega)dF(\omega)$, where $F$ is a distribution.

To show that $\{c(k)\}$ is a non-negative definite sequence when $c(k)$ is defined as $c(k) = \int\exp(ik\omega)dF(\omega)$, we use the same method as given in the proof of Theorem 8.3.1.

Example 8.3.2 We now construct the spectral distribution for the time series $X_t = Z$. Let $F(\omega) = 0$ for $\omega < 0$ and $F(\omega) = \mathrm{var}(Z)$ for $\omega \geq 0$ (hence $F$ is a step function). Then we have
$$\mathrm{cov}(X_0, X_k) = \mathrm{var}(Z) = \int\exp(ik\omega)dF(\omega).$$

8.3.2 The spectral representation theorem

We now state the spectral representation theorem and give a rough outline of the proof.


Theorem 8.3.3 If $\{X_t\}$ is a second order stationary time series with mean zero and spectral distribution $F(\omega)$, then there exists a right-continuous, orthogonal increment process $\{Z(\omega)\}$ (that is, $E\big((Z(\omega_1)-Z(\omega_2))\overline{(Z(\omega_3)-Z(\omega_4))}\big) = 0$ when the intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ do not overlap) such that
$$X_t = \int_0^{2\pi}\exp(it\omega)dZ(\omega), \qquad (8.8)$$
where for $\omega_1 \geq \omega_2$, $E|Z(\omega_1)-Z(\omega_2)|^2 = F(\omega_1) - F(\omega_2)$ (noting that $F(0) = 0$). (One example of a right-continuous, orthogonal increment process is Brownian motion, though this is just one example, and usually $Z(\omega)$ will be far more general than Brownian motion.)

Heuristically we see that (8.8) is the decomposition of $X_t$ in terms of frequencies whose amplitudes are orthogonal. In other words, $X_t$ is decomposed in terms of frequencies $\exp(it\omega)$ which have the orthogonal amplitudes $dZ(\omega) \approx (Z(\omega+\delta) - Z(\omega))$.

Remark 8.3.1 Note that so far we have not defined the integral on the right hand side of (8.8); this is known as a stochastic integral. Unlike many deterministic functions (functions whose derivative exists), one cannot really suppose $dZ(\omega) \approx Z'(\omega)d\omega$, because usually a typical realisation of $Z(\omega)$ will not be smooth enough to differentiate. For example, it is well known that Brownian motion is quite 'rough'; that is, a typical realisation of Brownian motion satisfies $|B(t_1,\omega) - B(t_2,\omega)| \leq K(\omega)|t_1 - t_2|^{\gamma}$, where $\omega$ is a realisation and $\gamma \leq 1/2$, but in general $\gamma$ will not be larger. The integral $\int g(\omega)dZ(\omega)$ is well defined if it is defined as the limit (in the mean squared sense) of discrete sums. In other words, let $Z_n(\omega) = \sum_{k=1}^{n}Z(\omega_k)I_{[\omega_{n,k-1},\omega_{n,k})}(\omega)$ and
$$\int g(\omega)dZ_n(\omega) = \sum_{k=1}^{n}g(\omega_k)\big(Z(\omega_k) - Z(\omega_{k-1})\big);$$
then $\int g(\omega)dZ(\omega)$ is the mean squared limit of $\{\int g(\omega)dZ_n(\omega)\}_n$, that is, $E[\int g(\omega)dZ(\omega) - \int g(\omega)dZ_n(\omega)]^2 \to 0$. For a more precise explanation, see Priestley (1983), Sections 3.6.3 and 4.11, and Brockwell and Davis (1998), Section 4.7.

A very elegant explanation of the different proofs of the spectral representation theorem is given in Priestley (1983), Section 4.11. We now give a rough outline of the proof using the functional theory approach.

PROOF of the Spectral Representation Theorem. To prove the result we will define two Hilbert spaces $H_1$ and $H_2$, where $H_1$ contains deterministic functions and $H_2$ contains random variables. We will define what is known as an isomorphism (a one-to-one mapping which preserves the norm and is linear) between these two spaces.

Let $H_1$ be defined as the set of all functions $f$ with $\int_0^{2\pi}f^2(\omega)dF(\omega) < \infty$, and define the inner product on $H_1$ to be
$$\langle f, g\rangle = \int_0^{2\pi}f(x)\overline{g(x)}dF(x). \qquad (8.9)$$
We first note that the functions $\exp(ik\omega)$ belong to $H_1$; moreover they also span the space $H_1$. Hence if $f \in H_1$, then there exist coefficients $\{a_j\}$ such that $f(\omega) = \sum_j a_j\exp(ij\omega)$. Let $H_2$ be the space spanned by $\{X_t\}$, hence $H_2 = \mathrm{sp}(\{X_t\})$ (it is necessary to define the closure of this space, but we won't do so here), and the inner product is the covariance $\mathrm{cov}(Y, X)$.

Now let us define the mapping $T: H_1 \to H_2$,
$$T\Big(\sum_{j=1}^{n}a_j\exp(ij\omega)\Big) = \sum_{j=1}^{n}a_jX_j, \qquad (8.10)$$

for any $n$ (it is necessary to show that this can be extended to infinite $n$, but we won't do so here). We need to show that $T$ defines an isomorphism. We first observe that this mapping preserves the inner product. That is, suppose $f, g \in H_1$; then there exist $\{f_j\}$ and $\{g_j\}$ such that $f(\omega) = \sum_j f_j\exp(ij\omega)$ and $g(\omega) = \sum_j g_j\exp(ij\omega)$. Hence by the definition of $T$ in (8.10) we have
$$\langle Tf, Tg\rangle = \mathrm{cov}\Big(\sum_j f_jX_j, \sum_j g_jX_j\Big) = \sum_{j_1,j_2}f_{j_1}\overline{g}_{j_2}\mathrm{cov}(X_{j_1}, X_{j_2}) = \int_0^{2\pi}\Big(\sum_{j_1,j_2}f_{j_1}\overline{g}_{j_2}\exp(i(j_1-j_2)\omega)\Big)dF(\omega) = \int_0^{2\pi}f(x)\overline{g(x)}dF(x) = \langle f, g\rangle.$$
Hence $\langle Tf, Tg\rangle = \langle f, g\rangle$, so the inner product is preserved. To show that it is a one-to-one mapping see Brockwell and Davis (1998), Section 4.7. Altogether this means that $T$ defines an isomorphism between $H_1$ and $H_2$. Therefore every function in $H_1$ has a corresponding random variable in $H_2$ which displays many similar properties.

Since for all $\omega \in [0,2\pi]$ the indicator functions $I_{[0,\omega]}(x) \in H_1$, we can define the random function $\{Z(\omega); 0 \leq \omega \leq 2\pi\}$ by $Z(\omega) = T(I_{[0,\omega]})$. Now since the mapping $T$ is linear, we observe that $T(I_{[\omega_1,\omega_2]}) = Z(\omega_2) - Z(\omega_1)$. Moreover, since $T$ preserves the inner product, we have for any non-intersecting intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ that
$$E\big((Z(\omega_2)-Z(\omega_1))\overline{(Z(\omega_4)-Z(\omega_3))}\big) = \langle T(I_{[\omega_1,\omega_2]}), T(I_{[\omega_3,\omega_4]})\rangle = \langle I_{[\omega_1,\omega_2]}, I_{[\omega_3,\omega_4]}\rangle = \int I_{[\omega_1,\omega_2]}(x)I_{[\omega_3,\omega_4]}(x)dF(x) = 0.$$
Therefore by construction $\{Z(\omega); 0 \leq \omega \leq 2\pi\}$ is an orthogonal increment process, with
$$E\big(|Z(\omega_2)-Z(\omega_1)|^2\big) = \langle T(I_{[\omega_1,\omega_2]}), T(I_{[\omega_1,\omega_2]})\rangle = \langle I_{[\omega_1,\omega_2]}, I_{[\omega_1,\omega_2]}\rangle = \int_{\omega_1}^{\omega_2}dF(\omega) = F(\omega_2) - F(\omega_1).$$

Having defined the two spaces (which are isomorphic), the random function $\{Z(\omega); 0 \leq \omega \leq 2\pi\}$ and the functions $I_{[0,\omega]}(x)$, which have orthogonal increments, we can now prove the result. We note that for any function $g \in L_2$ we can write
$$g(\omega) = \int_0^{2\pi}g(s)dI(s-\omega),$$
where $I(s)$ is the step function with $I(s) = 0$ for $s < 0$ and $I(s) = 1$ for $s \geq 0$ (hence $dI(\omega-s) = \delta_\omega(s)ds$, where $\delta_\omega(s)$ is the Dirac delta function). We now consider the special case $g(s) = \exp(its)$ and apply the isomorphism $T$ to this:
$$T(\exp(it\omega)) = \int_0^{2\pi}\exp(its)\,dT(I(\omega - s)),$$
where the mapping goes inside the integral due to the linearity of the isomorphism. Now we observe that, as a function of $\omega$, $I(s-\omega) = I_{[0,s]}(\omega)$, and by the definition of $\{Z(\omega); 0 \leq \omega \leq 2\pi\}$ we have $T(I_{[0,s]}) = Z(s)$. Substituting this into the above gives
$$X_t = \int_0^{2\pi}\exp(its)dZ(s),$$
which gives the required result.

It is worth pointing out that in the above proof the exponential functions $\exp(ik\omega)$ do not necessarily play a unique role. Indeed it was the construction of the orthogonal random functions $\{Z(\omega)\}$ that was instrumental. The main idea of the proof was that there are functions $\{\phi_k(\omega)\}$ and a distribution $H$ such that all the covariances of the stochastic process $\{X_t\}$ can be written as
$$E(X_t\overline{X}_\tau) = c(t,\tau) = \int_0^{2\pi}\phi_t(\omega)\overline{\phi_\tau(\omega)}dH(\omega).$$

So long as this representation exists, we can define two spaces $H_1$ and $H_2$, where $\{\phi_k\}$ is the basis of the functional space $H_1$ (which contains all functions $f$ such that $\int f^2(\omega)dH(\omega) < \infty$) and $H_2$ is the random space defined by $\mathrm{sp}(\ldots, X_{-1}, X_0, X_1, \ldots)$. Now we can define an isomorphism $T: H_1 \to H_2$, where for all functions $f(\omega) = \sum_k f_k\phi_k(\omega) \in H_1$,
$$T(f) = \sum_k f_kX_k \in H_2.$$

An important example is $T(\phi_k) = X_k$. Now by using the same arguments as those in the proof above we have
$$X_t = \int\phi_t(\omega)dZ(\omega),$$
where $\{Z(\omega)\}$ are orthogonal random functions and $E|Z(\omega)|^2 = H(\omega)$. We state this result in the theorem below (see Priestley (1983), Section 4.11).

Theorem 8.3.4 (General orthogonal expansions) Let $\{X_t\}$ be a time series (not necessarily second order stationary) with covariance $E(X_t\overline{X}_s) = c(t,s)$. If there exists a sequence of functions $\{\phi_k(\cdot)\}$ which satisfy, for all $k$,
$$\int_0^{2\pi}|\phi_k(\omega)|^2 dH(\omega) < \infty,$$
and the covariance admits the representation
$$c(t,s) = \int_0^{2\pi}\phi_t(\omega)\overline{\phi_s(\omega)}dH(\omega), \qquad (8.11)$$
where $H$ is a distribution, then for all $t$ we have the representation
$$X_t = \int\phi_t(\omega)dZ(\omega), \qquad (8.12)$$
where $\{Z(\omega)\}$ are orthogonal random functions and $E|Z(\omega)|^2 = H(\omega)$. On the other hand, if $\{X_t\}$ has the representation (8.12), then $c(s,t)$ admits the representation (8.11).

Remark 8.3.2 We mention that the above representation applies to both stationary and nonstationary time series. What makes the exponential functions $\exp(ik\omega)$ special is that if a process is stationary then the representation of $c(k) := \mathrm{cov}(X_t, X_{t+k})$ in terms of exponentials is guaranteed:
$$c(k) = \int_0^{2\pi}\exp(ik\omega)dF(\omega). \qquad (8.13)$$
Therefore there always exists an orthogonal random function $Z(\omega)$ such that
$$X_t = \int\exp(it\omega)dZ(\omega).$$
Indeed, whenever the exponential basis is used in the definition of either the covariance or the process $X_t$, the resulting process will always be second order stationary.

We mention that it is not always guaranteed that for an arbitrary basis $\{\phi_t\}$ we can represent the covariance as in (8.11). However, (8.12) is a very useful starting point for characterising nonstationary processes.

8.3.3 The spectral densities of MA, AR and ARMA models

We obtain the spectral density function for MA($\infty$) processes. Using this we can easily obtain the spectral density for ARMA processes. Let us suppose that $X_t$ satisfies the representation
$$X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}, \qquad (8.14)$$
where $\{\varepsilon_t\}$ are iid random variables with mean zero and variance $\sigma^2$, and $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$. We recall that the covariance of the above is
$$c(k) = E(X_tX_{t+k}) = \sigma^2\sum_{j=-\infty}^{\infty}\psi_j\psi_{j+k}. \qquad (8.15)$$

Since $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$, it can be seen that
$$\sum_k|c(k)| \leq \sigma^2\sum_k\sum_{j=-\infty}^{\infty}|\psi_j|\cdot|\psi_{j+k}| < \infty.$$
Hence, by using Theorem 8.3.1, the spectral density function of $X_t$ is well defined. There are several ways to derive the spectral density of $X_t$: we can either use (8.15) and $f(\omega) = \frac{1}{2\pi}\sum_k c(k)\exp(ik\omega)$, or obtain the spectral representation of $X_t$ and derive $f(\omega)$ from the spectral representation. We prove the result using the latter method.

Since $\{\varepsilon_t\}$ are iid random variables, using Theorem 8.3.3 there exists an orthogonal random function $Z(\omega)$ such that
$$\varepsilon_t = \frac{1}{2\pi}\int_0^{2\pi}\exp(it\omega)dZ(\omega).$$
Since $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$, multiplying the above by $\varepsilon_t$, taking expectations and noting that due to the orthogonality of $Z(\omega)$ we have $E(dZ(\omega_1)\overline{dZ(\omega_2)}) = 0$ unless $\omega_1 = \omega_2$, we have that $E(|dZ(\omega)|^2) = 2\pi\sigma^2 d\omega$, hence $f_\varepsilon(\omega) = 2\pi\sigma^2$.

Using the above we can obtain the spectral representation of $X_t$:
$$X_t = \frac{1}{2\pi}\int_0^{2\pi}\Big(\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)\Big)\exp(it\omega)dZ(\omega).$$
Hence
$$X_t = \int_0^{2\pi}A(\omega)\exp(it\omega)dZ(\omega), \qquad (8.16)$$
where $A(\omega) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)$, noting that this is the unique spectral representation of $X_t$.

Now multiplying the above by $X_{t+k}$ and taking expectations gives

$$E(X_tX_{t+k}) = c(k) = \int_0^{2\pi}\int_0^{2\pi}A(\omega_1)A(-\omega_2)\exp(it\omega_1 - i(t+k)\omega_2)E(dZ(\omega_1)\overline{dZ(\omega_2)}).$$
Due to the orthogonality of $Z(\omega)$ we have $E(dZ(\omega_1)\overline{dZ(\omega_2)}) = 0$ unless $\omega_1 = \omega_2$; altogether this gives
$$E(X_tX_{t+k}) = c(k) = \int_0^{2\pi}|A(\omega)|^2\exp(-ik\omega)E(|dZ(\omega)|^2) = \int_0^{2\pi}2\pi\sigma^2|A(\omega)|^2\exp(-ik\omega)d\omega.$$

Comparing the above with (8.4) we see that the spectral density is $f(\omega) = 2\pi\sigma^2|A(\omega)|^2 = \frac{\sigma^2}{2\pi}|\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)|^2$. Therefore the spectral density function corresponding to the MA($\infty$) process defined in (8.14) is
$$f(\omega) = 2\pi\sigma^2|A(\omega)|^2 = \frac{\sigma^2}{2\pi}\Big|\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)\Big|^2.$$

Example 8.3.3 Let us suppose that $\{X_t\}$ is a stationary ARMA($p,q$) time series (not necessarily invertible or causal), where
$$X_t - \sum_{j=1}^{p}\phi_jX_{t-j} = \varepsilon_t + \sum_{j=1}^{q}\theta_j\varepsilon_{t-j},$$
and $\{\varepsilon_t\}$ are iid random variables with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$. Then the spectral density of $\{X_t\}$ is
$$f(\omega) = \frac{\sigma^2}{2\pi}\,\frac{|1 + \sum_{j=1}^{q}\theta_j\exp(ij\omega)|^2}{|1 - \sum_{j=1}^{p}\phi_j\exp(ij\omega)|^2}.$$
We note that because the ARMA spectral density is a ratio of trigonometric polynomials, it is known as a rational spectral density.
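The rational spectral density in Example 8.3.3 can be evaluated directly; here is a sketch assuming numpy, with illustrative ARMA(1,1) coefficients:

```python
import numpy as np

def arma_spectral_density(omega, phi, theta, sigma2=1.0):
    """f(w) = sigma^2/(2 pi) * |1 + sum_j theta_j e^{ijw}|^2 / |1 - sum_j phi_j e^{ijw}|^2."""
    omega = np.asarray(omega, dtype=float)
    num = 1 + sum(t * np.exp(1j * (j + 1) * omega) for j, t in enumerate(theta))
    den = 1 - sum(p * np.exp(1j * (j + 1) * omega) for j, p in enumerate(phi))
    return sigma2 / (2 * np.pi) * np.abs(num) ** 2 / np.abs(den) ** 2

w = np.linspace(0, np.pi, 5)
print(arma_spectral_density(w, phi=[0.5], theta=[0.3]))   # ARMA(1,1) example
```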

Definition 8.3.1 (The Cramer representation) We mention that the representation (8.16) of a stationary process is usually called the Cramer representation of a stationary process:
$$X_t = \int_0^{2\pi}A(\omega)\exp(it\omega)dZ(\omega),$$
where $\{Z(\omega): 0 \leq \omega \leq 2\pi\}$ are orthogonal random functions.

8.3.4 Higher order spectra

We recall that the covariance is a measure of linear dependence between two random variables. Higher order cumulants are a measure of higher order dependence. For example, the third order cumulant for the zero mean random variables $X_1, X_2, X_3$ is
$$\mathrm{cum}(X_1,X_2,X_3) = E(X_1X_2X_3),$$
and the fourth order cumulant for the zero mean random variables $X_1, X_2, X_3, X_4$ is
$$\mathrm{cum}(X_1,X_2,X_3,X_4) = E(X_1X_2X_3X_4) - E(X_1X_2)E(X_3X_4) - E(X_1X_3)E(X_2X_4) - E(X_1X_4)E(X_2X_3).$$
From the definition we see that if $X_1, X_2, X_3, X_4$ are independent then $\mathrm{cum}(X_1,X_2,X_3) = 0$ and $\mathrm{cum}(X_1,X_2,X_3,X_4) = 0$.

Moreover, if $X_1, X_2, X_3, X_4$ are Gaussian random variables then $\mathrm{cum}(X_1,X_2,X_3) = 0$ and $\mathrm{cum}(X_1,X_2,X_3,X_4) = 0$. Indeed all cumulants of order higher than two are zero. This comes from the fact that cumulants are the coefficients of the power series expansion of the logarithm of the moment generating function of $X_t$.

Since the spectral density is the Fourier transform of the covariance, it is natural to ask whether one can define the higher order spectra as the Fourier transforms of the higher order cumulants. This turns out to be the case, and the higher order spectra have several interesting properties.

Let us suppose that $\{X_t\}$ is a stationary time series (notice that we are now assuming strict stationarity and not just second order stationarity). Let $\mathrm{cum}(t,s) = \mathrm{cum}(X_0, X_t, X_s)$ and $\mathrm{cum}(t,s,r) = \mathrm{cum}(X_0, X_t, X_s, X_r)$ (noting that, like the covariance, the higher order cumulants are invariant to shifts). The third and fourth order spectra are defined as
$$f_3(\omega_1,\omega_2) = \sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}\mathrm{cum}(s,t)\exp(is\omega_1 + it\omega_2),$$
$$f_4(\omega_1,\omega_2,\omega_3) = \sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}\sum_{r=-\infty}^{\infty}\mathrm{cum}(s,t,r)\exp(is\omega_1 + it\omega_2 + ir\omega_3).$$

Example 8.3.4 (Third and fourth order spectra of a linear process) Let us suppose that $X_t$ satisfies
$$X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},$$
where $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$, $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^4) < \infty$. Let $A(\omega) = \sum_{j=-\infty}^{\infty}\psi_j\exp(ij\omega)$. Then it is straightforward to show that
$$f_3(\omega_1,\omega_2) = \kappa_3A(\omega_1)A(\omega_2)A(-\omega_1-\omega_2), \qquad f_4(\omega_1,\omega_2,\omega_3) = \kappa_4A(\omega_1)A(\omega_2)A(\omega_3)A(-\omega_1-\omega_2-\omega_3),$$
where $\kappa_3 = \mathrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t)$ and $\kappa_4 = \mathrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t,\varepsilon_t)$.

We see from the example that, unlike the spectral density, the higher order spectra are not necessarily positive or even real.

A review of higher order spectra can be found in Brillinger (2001). Higher order spectra have several applications, especially for nonlinear processes; see Subba Rao and Gabr (1984). We will consider one such application in a later chapter.

8.4 The Periodogram and the spectral density function

Our aim is to construct an estimator of the spectral density function $f(\omega)$ associated with the second order stationary process $\{X_t\}$.

8.4.1 The periodogram and its properties

Let us suppose we observe $\{X_t\}_{t=1}^{n}$, which are observations from a zero mean, second order stationary time series. Let us suppose that the autocovariance function is $\{c(k)\}$, where $c(k) = E(X_tX_{t+k})$ and $\sum_k|c(k)| < \infty$. We recall that we can estimate $c(k)$ using
$$c_n(k) = \frac{1}{n}\sum_{t=1}^{n-|k|}X_tX_{t+|k|}.$$

Given that the spectral density is
$$f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega),$$
a natural estimator of $f(\omega)$ is
$$I_X(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}c_n(k)\exp(ik\omega). \qquad (8.17)$$

We usually call $I_X(\omega)$ the periodogram. We will show that the periodogram has several nice properties that make it a suitable candidate for a spectral density estimator. The only problem is that the raw periodogram turns out to be an inconsistent estimator. However, with some modifications of the periodogram we can construct a good estimator of the spectral density.
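The periodogram can be computed either from the sample autocovariances as in (8.17) or from the squared modulus of the DFT (the identity (8.18) proved below). A minimal sketch assuming numpy, using the notes' $1/(2\pi)$ normalisation:

```python
import numpy as np

def periodogram(x, omega):
    """I_X(w) = (2 pi n)^{-1} |sum_t X_t e^{itw}|^2 (the notes' normalisation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1)
    return np.abs((x * np.exp(1j * t * omega)).sum()) ** 2 / (2 * np.pi * n)

def periodogram_from_acf(x, omega):
    """Equivalent form (8.17): (2 pi)^{-1} sum_{|k| < n} c_n(k) e^{ikw}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    total = 0.0 + 0.0j
    for k in range(-(n - 1), n):
        ck = (x[: n - abs(k)] * x[abs(k):]).sum() / n
        total += ck * np.exp(1j * k * omega)
    return total.real / (2 * np.pi)

x = np.random.default_rng(2).standard_normal(64)
w = 2 * np.pi * 7 / 64                      # a fundamental frequency
print(periodogram(x, w), periodogram_from_acf(x, w))   # the two forms agree
```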

Lemma 8.4.1 Suppose that $\{X_t\}$ is a second order stationary time series with $\sum_k|c(k)| < \infty$, and $I_X(\omega)$ is defined in (8.17). Then we have
$$I_X(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}c_n(k)\exp(ik\omega) = \frac{1}{2\pi n}\Big|\sum_{t=1}^{n}X_t\exp(it\omega)\Big|^2 \qquad (8.18)$$
and
$$\big|E(I_X(\omega)) - f(\omega)\big| \leq \frac{1}{2\pi}\Big(\sum_{|k|\geq n}|c(k)| + \sum_{|k|\leq n}\frac{|k|}{n}|c(k)|\Big) \to 0 \qquad (8.19)$$
as $n\to\infty$. Hence in the case that $\sum_{k=-\infty}^{\infty}|kc(k)| < \infty$ we have $|E(I_X(\omega)) - f(\omega)| = O(\frac{1}{n})$. Moreover $\mathrm{var}\big(\frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}X_t\exp(it\omega)\big) = E(I_X(\omega))$.

PROOF. (8.18) follows immediately from the definition of the periodogram. The first inequality in (8.19) is straightforward. To show that $\sum_{|k|\geq n}|c(k)| + \sum_{|k|\leq n}\frac{|k|}{n}|c(k)| \to 0$ as $n\to\infty$, we note that $\sum_k|c(k)| < \infty$, therefore $\sum_{|k|\geq n}|c(k)| \to 0$ as $n\to\infty$. To show that $\sum_{|k|\leq n}\frac{|k|}{n}|c(k)| \to 0$ as $n\to\infty$ we use $\sum_k|c(k)| < \infty$ and Lemma 8.3.1; thus we obtain the desired result.

We see from the above that the periodogram is both a non-negative function and an asymptotically unbiased estimator of the spectral density. Hence it has inherited several of the characteristics of the spectral density. However, a problem with the periodogram is that it is extremely erratic in its behaviour; in fact, in the limit it does not converge to the spectral density. Hence as an estimator of the spectral density it is inappropriate. We will demonstrate this in the following two propositions and later discuss why this is so and how it can be made into a consistent estimator.

We start by considering the periodogram of iid random variables.

Proposition 8.4.1 Suppose $\{\varepsilon_t\}_{t=1}^{n}$ are iid random variables with mean zero and variance $\sigma^2$. We define $J_\varepsilon(\omega) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\varepsilon_t\exp(it\omega)$ and $I_\varepsilon(\omega) = \frac{1}{2\pi n}|\sum_{t=1}^{n}\varepsilon_t\exp(it\omega)|^2$. Then we have
$$\begin{pmatrix}\Re(J_\varepsilon(\omega))\\ \Im(J_\varepsilon(\omega))\end{pmatrix} \xrightarrow{D} N\Big(0, \frac{\sigma^2}{2\pi}I_2\Big), \qquad (8.20)$$
and for any finite $m$
$$\big(J_\varepsilon(\omega_{k_1})', \ldots, J_\varepsilon(\omega_{k_m})'\big)' \xrightarrow{D} N\Big(0, \frac{\sigma^2}{2\pi}I_{2m}\Big). \qquad (8.21)$$
Furthermore, $I_\varepsilon(\omega)/\sigma^2$ is asymptotically $\chi^2(2)$ distributed (equivalently, exponentially distributed), $(I_\varepsilon(\omega_{k_1}), \ldots, I_\varepsilon(\omega_{k_m}))$ converges in distribution to a vector of independent random variables, and
$$\mathrm{cov}(I_\varepsilon(\omega_j), I_\varepsilon(\omega_k)) = \begin{cases}\frac{\kappa_4}{2\pi n} & j \neq k,\\ \frac{\kappa_4}{2\pi n} + \frac{2\kappa_2^2}{2\pi} & j = k,\end{cases} \qquad (8.22)$$
where $\omega_j = 2\pi j/n$ and $\omega_k = 2\pi k/n$ (and $k, j \neq 0, n$).

PROOF. We first show (8.20). We note that $\Re(J_\varepsilon(\omega_k)) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\alpha_{t,n}$ and $\Im(J_\varepsilon(\omega_k)) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\beta_{t,n}$, where $\alpha_{t,n} = \varepsilon_t\cos(2k\pi t/n)$ and $\beta_{t,n} = \varepsilon_t\sin(2k\pi t/n)$. These are weighted sums of iid random variables, hence $\{\alpha_{t,n}\}$ and $\{\beta_{t,n}\}$ are martingale differences. Therefore, to show asymptotic normality, we use the martingale central limit theorem together with the Cramer-Wold device. To show the result we need to verify the three conditions of the martingale CLT. First we consider the variances and the conditional variances:

$$\frac{1}{2\pi n}\sum_{t=1}^{n}E\big(|\alpha_{t,n}|^2\,\big|\,\varepsilon_{t-1},\varepsilon_{t-2},\ldots\big) = \frac{1}{2\pi n}\sum_{t=1}^{n}\cos(2k\pi t/n)^2\varepsilon_t^2 \xrightarrow{P} \sigma^2,$$
$$\frac{1}{2\pi n}\sum_{t=1}^{n}E\big(|\beta_{t,n}|^2\,\big|\,\varepsilon_{t-1},\varepsilon_{t-2},\ldots\big) = \frac{1}{2\pi n}\sum_{t=1}^{n}\sin(2k\pi t/n)^2\varepsilon_t^2 \xrightarrow{P} \sigma^2,$$
$$\frac{1}{2\pi n}\sum_{t=1}^{n}E\big(\alpha_{t,n}\beta_{t,n}\,\big|\,\varepsilon_{t-1},\varepsilon_{t-2},\ldots\big) = \frac{1}{2\pi n}\sum_{t=1}^{n}\cos(2k\pi t/n)\sin(2k\pi t/n)\varepsilon_t^2 \xrightarrow{P} E(\varepsilon_t^2)\,\frac{1}{4\pi n}\sum_{t=1}^{n}\sin(2\cdot 2k\pi t/n) = 0.$$

Finally we need to verify the Lindeberg condition; we only verify it for $\frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\alpha_{t,n}$, the same argument holding true for $\frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}\beta_{t,n}$. We note that for every $\epsilon > 0$ we have

$$\frac{1}{2\pi n}\sum_{t=1}^{n}E\big(|\alpha_{t,n}|^2I(|\alpha_{t,n}| \geq 2\pi\sqrt{n}\epsilon)\,\big|\,\varepsilon_{t-1},\varepsilon_{t-2},\ldots\big) = \frac{1}{2\pi n}\sum_{t=1}^{n}E\big(|\alpha_{t,n}|^2I(|\alpha_{t,n}| \geq 2\pi\sqrt{n}\epsilon)\big) \leq \frac{1}{2\pi n}\sum_{t=1}^{n}E\big(|\varepsilon_t|^2I(|\varepsilon_t| \geq 2\pi\sqrt{n}\epsilon)\big) = E\big(|\varepsilon_t|^2I(|\varepsilon_t| \geq 2\pi\sqrt{n}\epsilon)\big) \to 0$$
as $n\to\infty$, noting that the second to last inequality holds because $|\alpha_{t,n}| = |\cos(2k\pi t/n)\varepsilon_t| \leq |\varepsilon_t|$. Hence we have verified the Lindeberg condition and we obtain (8.20). The proof of (8.21) is similar, hence we omit the details. Since $I_\varepsilon(\omega) = \Re(J_\varepsilon(\omega))^2 + \Im(J_\varepsilon(\omega))^2$, from (8.20) $I_\varepsilon(\omega)$ is asymptotically the sum of two squared independent normals, which gives the stated $\chi^2(2)$ (exponential) limit.

To prove (8.22) we note that
$$\mathrm{cov}(I_\varepsilon(\omega_j), I_\varepsilon(\omega_k)) = \frac{1}{(2\pi)^2n^2}\sum_{k_1}\sum_{k_2}\exp(ik_1\omega_j - ik_2\omega_k)\sum_{t_1}\sum_{t_2}\mathrm{cov}(\varepsilon_{t_1}\varepsilon_{t_1+k_1}, \varepsilon_{t_2}\varepsilon_{t_2+k_2}).$$

We recall that
$$\mathrm{cov}(\varepsilon_{t_1}\varepsilon_{t_1+k_1}, \varepsilon_{t_2}\varepsilon_{t_2+k_2}) = \mathrm{cov}(\varepsilon_{t_1},\varepsilon_{t_2+k_2})\mathrm{cov}(\varepsilon_{t_2},\varepsilon_{t_1+k_1}) + \mathrm{cov}(\varepsilon_{t_1},\varepsilon_{t_2})\mathrm{cov}(\varepsilon_{t_1+k_1},\varepsilon_{t_2+k_2}) + \mathrm{cum}(\varepsilon_{t_1},\varepsilon_{t_1+k_1},\varepsilon_{t_2},\varepsilon_{t_2+k_2}).$$

We note that since $\{\varepsilon_t\}$ are iid random variables, for most $t_1, t_2, k_1$ and $k_2$ the above covariance is zero. The exceptions are when $t_1 = t_2$ and $k_1 = k_2$, or $t_1 = t_2$ and $k_1 = k_2 = 0$, or $t_1 - t_2 = k_1 = -k_2$. Counting all these combinations we have
$$\mathrm{cov}(I_\varepsilon(\omega_j), I_\varepsilon(\omega_k)) = \frac{2}{2\pi n^2}\sum_k\sum_t\exp(it(\omega_j-\omega_k))\kappa_2^2 + \frac{1}{2\pi n^2}\sum_t\kappa_4,$$
where $\kappa_2 = \mathrm{var}(\varepsilon_t)$ and $\kappa_4 = \mathrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t,\varepsilon_t)$. We note that for $j\neq k$, $\sum_t\exp(it(\omega_j-\omega_k)) = 0$, and for $j = k$, $\sum_t\exp(it(\omega_j-\omega_k)) = n$; substituting this into $\mathrm{cov}(I_\varepsilon(\omega_j), I_\varepsilon(\omega_k))$ gives the desired result.

We have seen that the periodogram for iid random variables does not converge to a constant and indeed its distribution is asymptotically exponential. This suggests that something similar holds true for linear processes. This is the case. In the following lemma we show that the periodogram of a general linear process $X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}$ is
$$I_X(\omega) = \Big|\sum_j\psi_j\exp(ij\omega)\Big|^2I_\varepsilon(\omega) + o_p(1) = \frac{2\pi}{\sigma^2}f(\omega)I_\varepsilon(\omega) + o_p(1),$$
where $f(\omega) = \frac{\sigma^2}{2\pi}|\sum_j\psi_j\exp(ij\omega)|^2$ is the spectral density of $X_t$.

Lemma 8.4.2 Let us suppose that $X_t$ satisfies $X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}$, where $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$ and $\{\varepsilon_t\}$ are iid random variables with mean zero and variance $\sigma^2$. Then we have
$$J_X(\omega) = \Big(\sum_j\psi_j\exp(ij\omega)\Big)J_\varepsilon(\omega) + Y_n(\omega), \qquad (8.23)$$
where $Y_n(\omega) = \frac{1}{\sqrt{n}}\sum_j\psi_j\exp(ij\omega)U_{n,j}$, with $U_{n,j} = \sum_{t=1-j}^{n-j}\exp(it\omega)\varepsilon_t - \sum_{t=1}^{n}\exp(it\omega)\varepsilon_t$ and $E(Y_n(\omega))^2 \leq \big(\frac{1}{n^{1/2}}\sum_{j=-\infty}^{\infty}|\psi_j|\min(|j|, n)^{1/2}\big)^2$. Furthermore
$$I_X(\omega) = \Big|\sum_j\psi_j\exp(ij\omega)\Big|^2|J_\varepsilon(\omega)|^2 + R_n(\omega), \qquad (8.24)$$
where $E(\sup_\omega|R_n(\omega)|) \to 0$ as $n\to\infty$. If in addition $E(\varepsilon_t^4) < \infty$ and $\sum_{j=-\infty}^{\infty}|j^{1/2}\psi_j| < \infty$, then $E(\sup_\omega|R_n(\omega)|^2) = O(n^{-1})$.

PROOF. See Priestley (1983), Theorem 6.2.1 or Brockwell and Davis (1998), Theorem 10.3.1.

Using the above we see that $I_X(\omega) \approx \frac{2\pi}{\sigma^2}f(\omega)I_\varepsilon(\omega)$. This suggests that most of the properties which apply to $I_\varepsilon(\omega)$ also apply to $I_X(\omega)$. Indeed, in the following theorem we show that the asymptotic distribution of $I_X(\omega)$ is exponential with mean and standard deviation $f(\omega)$.

By using the above result we now generalise Proposition 8.4.1 to linear processes.

Theorem 8.4.1 Suppose $X_t$ satisfies $X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}$, where $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$. Let $I_n(\omega)$ denote the periodogram associated with $X_1,\ldots,X_n$ and let $f(\cdot)$ be the spectral density. Then:

(i) If $f(\omega) > 0$ for all $\omega \in [0,2\pi]$ and $0 < \omega_1,\ldots,\omega_m < \pi$, then $\big(I_n(\omega_1)/f(\omega_1),\ldots,I_n(\omega_m)/f(\omega_m)\big)$ converges in distribution (as $n\to\infty$) to a vector of independent exponential distributions with mean one.

(ii) If in addition $E(\varepsilon_t^4) < \infty$ and $\sum_{j=-\infty}^{\infty}|j^{1/2}\psi_j| < \infty$, then for $\omega_j = \frac{2\pi j}{n}$ and $\omega_k = \frac{2\pi k}{n}$ we have
$$\mathrm{cov}(I(\omega_j), I(\omega_k)) = \begin{cases}2(2\pi)^2f(\omega_k)^2 + O(n^{-1/2}) & \omega_j = \omega_k = 0 \text{ or } \pi,\\ (2\pi)^2f(\omega_k)^2 + O(n^{-1/2}) & 0 < \omega_j = \omega_k < \pi,\\ O(n^{-1}) & \omega_j \neq \omega_k,\end{cases}$$
where the bound is uniform in $\omega_j$ and $\omega_k$.

PROOF. See Brockwell and Davis (1998), Theorem 10.3.2.

Remark 8.4.1 (Summary of properties of the periodogram for linear processes)

(i) The periodogram is nonnegative and is an asymptotically unbiased estimator of the spectral density (when $\sum_j|\psi_j| < \infty$).

(ii) Like the spectral density, it is symmetric about zero: $I_n(-\omega) = I_n(\omega)$.

(iii) At the fundamental frequencies the $I(\omega_j)$ are asymptotically uncorrelated.

(iv) If $0 < \omega < \pi$, $I(\omega)$ is asymptotically exponentially distributed with mean $f(\omega)$.

We see that the periodogram is extremely erratic and does not converge (in any sense) to the spectral density as $n\to\infty$. In the following section we discuss this further and consider modifications of the periodogram which lead to a consistent estimator.

8.4.2 Estimating the spectral density

There are several (pretty much equivalent) explanations as to why the raw periodogram is not a good estimator of the spectrum. Intuitively, the simplest explanation is that we have included too many covariance estimators in the estimation of $f(\omega)$. We see from (8.17) that the periodogram is the Fourier transform of the estimated covariances at $n$ different lags. Typically the variance of each covariance $c_n(k)$ is of order $O(n^{-1})$, hence roughly speaking the variance of $I_n(\omega)$ is the sum of $n$ such $O(n^{-1})$ variances, which leads to a variance of $O(1)$; this clearly does not converge to zero. This suggests that if we use $m$ ($m \ll n$) covariances in the estimation of $f(\omega)$ rather than all $n$ (where we let $m\to\infty$), we may reduce the variance of the estimator (at the cost of introducing some bias) and so obtain a good estimator of the spectral density. This indeed turns out to be the case.

Another way to approach the problem is from a nonparametric angle. We note from Theorem 8.4.1 that at the fundamental frequencies the $I(\omega_k)$ can be treated as uncorrelated random variables, with mean $f(\omega_k)$ and standard deviation proportional to $f(\omega_k)$. Therefore we can rewrite $I(\omega_k)$ as
$$I(\omega_k) = E(I(\omega_k)) + \big(I(\omega_k) - E(I(\omega_k))\big) \approx f(\omega_k) + f(\omega_k)U_k, \qquad k = 1,\ldots,n, \qquad (8.25)$$
where $\{U_k\}$ is approximately a mean zero, variance one sequence of uncorrelated random variables and $\omega_k = 2\pi k/n$. We note that (8.25) resembles the usual 'nonparametric function plus additive noise' model often considered in nonparametric statistics. This suggests that another way to estimate the spectral density is to use a locally weighted average of $I(\omega_k)$. Interestingly, both of the estimation methods mentioned above are practically the same method.

It is worth noting that Parzen (1957) first proposed a consistent method for estimating the spectral density. Furthermore, classical density estimation and spectral density estimation are very similar, and it was spectral density estimation which motivated methods for estimating the density function (one of the first papers on density estimation is Parzen (1962)).

Equation (8.25) motivates the following nonparametric estimator of f(ω).

$$\hat{f}_n(\omega_j) = \sum_k\frac{1}{bn}K\Big(\frac{j-k}{bn}\Big)I(\omega_k), \qquad (8.26)$$

where $K(\cdot)$ is a kernel which satisfies $\int K(x)dx = 1$ and $\int K(x)^2dx < \infty$. An example of $\hat{f}_n(\omega_j)$ is the local average about the frequency $\omega_j$:
$$\hat{f}_n(\omega_j) = \frac{1}{bn}\sum_{k=j-bn/2}^{j+bn/2}I(\omega_k).$$
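A minimal sketch of the local-average estimator above (assuming numpy; $bn$ denotes the assumed number of neighbouring fundamental frequencies averaged over):

```python
import numpy as np

def smoothed_periodogram(x, bn):
    """Local average of the periodogram over bn neighbouring fundamental frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1)
    w = 2 * np.pi * np.arange(n) / n
    I = np.array([np.abs((x * np.exp(1j * t * wj)).sum()) ** 2 for wj in w]) / (2 * np.pi * n)
    # f_hat(w_j) = (1/bn) sum_{k = j - bn/2}^{j + bn/2} I(w_k), indices taken modulo n
    half = bn // 2
    fhat = np.array([I[(j - half + np.arange(bn)) % n].mean() for j in range(n)])
    return w, fhat

x = np.random.default_rng(3).standard_normal(512)   # white noise: f(w) = 1/(2 pi)
w, fhat = smoothed_periodogram(x, bn=32)
print(fhat.mean(), 1 / (2 * np.pi))                 # the estimate hovers around 1/(2 pi)
```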

Theorem 8.4.2 Suppose $X_t$ satisfies $X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}$, where $\sum_{j=-\infty}^{\infty}|j^{1/2}\psi_j| < \infty$ and $E(\varepsilon_t^4) < \infty$. Let $\hat{f}_n(\omega)$ be the spectral estimator defined in (8.26). Then
$$E(\hat{f}_n(\omega_j)) \to f(\omega_j) \qquad (8.27)$$
and
$$\mathrm{var}(\hat{f}_n(\omega_j)) \approx \begin{cases}\frac{1}{bn}f(\omega_j)^2 & 0 < \omega_j < \pi,\\ \frac{2}{bn}f(\omega_j)^2 & \omega_j = 0 \text{ or } \pi,\end{cases} \qquad (8.28)$$
as $bn\to\infty$, $b\to 0$ and $n\to\infty$.

PROOF. The proofs of both (8.27) and (8.28) are based on the kernel $K(x/b)$ getting narrower as $b\to 0$, hence there is more localisation as the sample size grows (just like nonparametric regression). We note that since $f(\omega) = \frac{\sigma^2}{2\pi}|\sum_{j=-\infty}^{\infty}\psi_j\exp(ij\omega)|^2$, the spectral density $f$ is continuous in $\omega$.

To prove (8.27) we take expectations:
$$\big|E(\hat{f}_n(\omega_j)) - f(\omega_j)\big| = \Big|\sum_k\frac{1}{bn}K\Big(\frac{k}{bn}\Big)E\big(I(\omega_{j-k})\big) - f(\omega_j)\Big| \leq \sum_k\frac{1}{bn}\Big|K\Big(\frac{k}{bn}\Big)\Big|\,\big|E\big(I(\omega_{j-k})\big) - f(\omega_{j-k})\big| + \Big|\sum_k\frac{1}{bn}K\Big(\frac{k}{bn}\Big)\big(f(\omega_j) - f(\omega_{j-k})\big)\Big| := I + II.$$

Using Lemma 8.4.1 we have
$$I = \sum_k\frac{1}{bn}\Big|K\Big(\frac{k}{bn}\Big)\Big|\,\big|E\big(I(\omega_{j-k})\big) - f(\omega_{j-k})\big| \leq \Big(\frac{1}{bn}\sum_k\Big|K\Big(\frac{k}{bn}\Big)\Big|\Big)\times\Big(\sum_{|k|\geq n}|c(k)| + \sum_{|k|\leq n}\frac{|k|}{n}|c(k)|\Big) \to 0.$$

Now we consider
$$II = \Big|\sum_k\frac{1}{bn}K\Big(\frac{k}{bn}\Big)\big(f(\omega_j) - f(\omega_{j-k})\big)\Big| = O(b).$$
Since the spectral density $f(\cdot)$ is continuous, we have $II \to 0$ as $bn\to\infty$, $b\to 0$ and $n\to\infty$. The above two bounds give (8.27).

We will use Theorem 8.4.1 to prove (8.28). We first assume that $j \neq 0$ or $n$. Evaluating the variance using Theorem 8.4.1 we have
$$\mathrm{var}(\hat{f}_n(\omega_j)) = \sum_{k_1,k_2}\frac{1}{(bn)^2}K\Big(\frac{j-k_1}{bn}\Big)K\Big(\frac{j-k_2}{bn}\Big)\mathrm{cov}(I(\omega_{k_1}), I(\omega_{k_2})) = \sum_k\frac{1}{(bn)^2}K\Big(\frac{j-k}{bn}\Big)^2\mathrm{var}(I(\omega_k)) + O\Big(\frac{1}{n}\Big) = \sum_k\frac{1}{(bn)^2}K\Big(\frac{k}{bn}\Big)^2f(\omega_{j-k})^2 + O\Big(\frac{1}{n^{1/2}}\Big) \approx \frac{1}{bn}f(\omega_j)^2.$$
A similar proof can be used for the case $j = 0$ or $n$.

The above result means that the mean squared error of the estimator satisfies
$$E\big(\hat{f}_n(\omega_j) - f(\omega_j)\big)^2 \to 0$$
as $bn\to\infty$, $b\to 0$ and $n\to\infty$. Moreover
$$E\big(\hat{f}_n(\omega_j) - f(\omega_j)\big)^2 = O\Big(\frac{1}{bn}\Big) + \big(E(\hat{f}_n(\omega_j)) - f(\omega_j)\big)^2 = O\Big(\frac{1}{bn}\Big) + O\Big(\Big(\sum_{|k|\geq n}|c(k)| + \sum_{|k|\leq n}\frac{|k|}{n}|c(k)| + \frac{1}{b}\int K\Big(\frac{x}{b}\Big)\big(f(\omega_j) - f(\omega_j - x)\big)dx\Big)^2\Big) + O(b).$$
Hence the rate of convergence depends on the bias of the estimator, in particular on the rate of decay of the covariances.

There are several examples of kernels that one can use, and each has its own optimality property. An interesting discussion on this is given in Priestley (1983), Chapter 6.

As mentioned briefly above, we can also estimate the spectrum by truncating the number of covariances used. We recall that
$$I_X(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}c_n(k)\exp(ik\omega).$$

Hence a viable estimator of the spectrum is
$$\tilde{f}_n(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\lambda\Big(\frac{k}{M}\Big)c_n(k)\exp(ik\omega),$$
where $\lambda(\cdot)$ is a weight function which places very little weight (or no weight) on the covariances at large lags. A useful example, using the rectangular function $\lambda(x) = 1$ if $|x| \leq 1$ and zero otherwise, is
$$\tilde{f}_n(\omega) = \frac{1}{2\pi}\sum_{k=-M}^{M}c_n(k)\exp(ik\omega),$$

where $M \ll n$. Now $\tilde{f}_n(\omega)$ has similar properties to $\hat{f}_n(\omega)$. Indeed there is a close relationship between the two, which can be seen by using (8.1). Substituting $c_n(k) = \frac{1}{2\pi}\int_0^{2\pi}I_n(\theta)\exp(-ik\theta)d\theta$ into $\tilde{f}_n$ gives
$$\tilde{f}_n(\omega) = \frac{1}{(2\pi)^2}\int I_n(\theta)\sum_{k=-(n-1)}^{n-1}\lambda\Big(\frac{k}{M}\Big)\exp(ik(\omega-\theta))d\theta = \frac{1}{2\pi}\int I_n(\theta)K_M(\omega-\theta)d\theta,$$
where $K_M(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\lambda(\frac{k}{M})\exp(ik\omega)$. Now $K_M(\cdot)$ and $\frac{1}{bn}K(\frac{\cdot}{bn})$ (defined in (8.26)) are not necessarily the same function, but they share many of the same characteristics. In other words,

$$K_M(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\lambda\Big(\frac{k}{M}\Big)\exp(ik\omega) = \frac{M}{2\pi}\sum_{k=-(n-1)}^{n-1}\frac{1}{M}\lambda\Big(\frac{k}{M}\Big)\exp\Big(i\frac{k}{M}\cdot M\omega\Big) \approx MW(M\omega), \quad\text{where } W(\omega) = \frac{1}{2\pi}\int\lambda(x)\exp(ix\omega)dx.$$

Hence $K_M(\omega) \approx MW(M\omega)$, and therefore
$$\tilde{f}_n(\omega) \approx \frac{M}{2\pi}\int I_n(\theta)W(M(\omega-\theta))d\theta.$$

Comparing with $\hat{f}_n$ we see that $M$ plays the same role as $b^{-1}$. Furthermore, we observe that $\sum_k\frac{1}{bn}K(\frac{j-k}{bn})I(\omega_k)$ is a sum over about $bn$ of the $I(\omega_k)$. The equivalent statement for $K_M(\cdot)$ is that it has 'spectral' width $m = n/M$. In other words, since $\tilde{f}_n(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\lambda(\frac{k}{M})c_n(k)\exp(ik\omega) = \frac{M}{2\pi}\int I_n(\theta)W(M(\omega-\theta))d\theta$, it is effectively a weighted sum over about $m = n/M$ of the $I(\omega_k)$.
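A sketch of the truncated (lag-window) estimator with the rectangular weight $\lambda(x) = 1$ for $|x| \leq 1$ (assuming numpy; $M$ is the assumed truncation lag):

```python
import numpy as np

def lag_window_estimator(x, M, omega):
    """f~_n(w) = (2 pi)^{-1} sum_{|k| <= M} c_n(k) e^{ikw} (rectangular lag window)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    total = 0.0 + 0.0j
    for k in range(-M, M + 1):
        ck = (x[: n - abs(k)] * x[abs(k):]).sum() / n   # sample autocovariance c_n(k)
        total += ck * np.exp(1j * k * omega)
    return total.real / (2 * np.pi)

x = np.random.default_rng(4).standard_normal(512)
print(lag_window_estimator(x, M=20, omega=1.0))   # for white noise this is close to 1/(2 pi)
```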

Remark 8.4.2 (The distribution of the spectral density estimator) Using that the periodogram $I_n(\omega)/f(\omega)$ is asymptotically $\chi^2(2)$ distributed and uncorrelated at the fundamental frequencies, we can approximate the distribution of $\hat{f}_n(\omega)$. To obtain the distribution consider the example
$$\hat{f}_n(\omega_j) = \frac{1}{bn}\sum_{k=j-bn/2}^{j+bn/2}I(\omega_k).$$
Since the $I(\omega_k)/f(\omega_k)$ are approximately $\chi^2(2)$, and the sum $\sum_{k=j-bn/2}^{j+bn/2}I(\omega_k)$ is taken over a local neighbourhood of $\omega_j$, we have that $f(\omega_j)^{-1}\sum_{k=j-bn/2}^{j+bn/2}I(\omega_k)$ is approximately $\chi^2(2bn)$. Extending this argument to arbitrary kernels, we have that $bn\hat{f}_n(\omega_j)/f(\omega_j)$ is approximately $\chi^2(2bn)$. We note that when $bn$ is large, $\chi^2(2bn)$ is close to normal with mean $2bn$ and variance $4bn$. Hence $bn\hat{f}_n(\omega_j)/f(\omega_j)$ is approximately normal with mean $2bn$ and variance $4bn$; therefore
$$\sqrt{bn}\,\hat{f}_n(\omega_j) \approx N\big(2f(\omega_j),\, 4f(\omega_j)^2\big).$$
Using this approximation we can construct confidence intervals for $f(\omega_j)$.
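Using the $\chi^2(2bn)$ approximation described above, an approximate confidence interval for $f(\omega_j)$ can be sketched as follows (assuming scipy is available; it simply inverts the approximation $bn\hat{f}_n(\omega_j)/f(\omega_j) \approx \chi^2(2bn)$ stated above):

```python
from scipy.stats import chi2

def spectral_ci(fhat, bn, alpha=0.05):
    """CI for f(w_j) from the approximation bn * fhat / f(w_j) ~ chi^2(2 bn)."""
    lower = bn * fhat / chi2.ppf(1 - alpha / 2, df=2 * bn)
    upper = bn * fhat / chi2.ppf(alpha / 2, df=2 * bn)
    return lower, upper

print(spectral_ci(fhat=0.16, bn=32))   # illustrative values only
```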


8.5 The Whittle Likelihood

In Chapter 4 we considered various methods for estimating the parameters of an ARMA process. The most efficient method (when the errors are Gaussian) is the Gaussian maximum likelihood estimator. That estimator was defined in the time domain, but it is interesting to note that a very similar estimator, which is asymptotically equivalent to the GMLE, can be defined in the frequency domain. We first define the estimator, using heuristics to justify it. We then show how it is related to the GMLE (it is the frequency domain approximation of the time domain estimator).

First let us suppose that we observe $\{X_t\}_{t=1}^{n}$, where
$$X_t = \sum_{j=1}^{p}\phi_j^{(0)}X_{t-j} + \sum_{j=1}^{q}\theta_j^{(0)}\varepsilon_{t-j} + \varepsilon_t,$$
and $\{\varepsilon_t\}$ are iid random variables. As before we assume that $\phi_0 = \{\phi_j^{(0)}\}$ and $\theta_0 = \{\theta_j^{(0)}\}$ are such that the roots of the corresponding characteristic polynomials are greater in absolute value than $1+\delta$. Let us define the discrete Fourier transform
$$J_n(\omega) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}X_t\exp(it\omega).$$

We will consider $\omega$ at the fundamental frequencies $\omega_k = \frac{2\pi k}{n}$. As we mentioned in Section 8.2, we have
$$\mathrm{cov}(J_n(\omega_{k_1}), J_n(\omega_{k_2})) \approx \begin{cases}\mathrm{var}(J_n(\omega_{k_1})) \approx f(\omega_{k_1}) & k_1 = k_2,\\ 0 & k_1 \neq k_2.\end{cases}$$

Hence if the innovations are Gaussian then $J_n(\omega)$ is complex Gaussian and we have, approximately,
$$J_n = \begin{pmatrix}J_n(\omega_1)\\ \vdots\\ J_n(\omega_n)\end{pmatrix} \sim \mathcal{CN}\big(0, \mathrm{diag}(f(\omega_1),\ldots,f(\omega_n))\big).$$

Therefore, since $J_n$ is a normally distributed (complex) random vector with mean zero and diagonal variance matrix $\mathrm{diag}(f(\omega_1),\ldots,f(\omega_n))$, the log likelihood of $J_n$ is approximately proportional to $-L_n^w(\phi,\theta)$, where
$$L_n^w(\phi,\theta) = \sum_{k=1}^{n}\Big(\log f_{\phi,\theta}(\omega_k) + \frac{|J_n(\omega_k)|^2}{f_{\phi,\theta}(\omega_k)}\Big).$$

To estimate the parameters we choose the $\theta$ and $\phi$ which minimise the above criterion, that is
$$(\hat{\phi}_n^w, \hat{\theta}_n^w) = \arg\min_{\phi,\theta\in\Omega}L_n^w(\phi,\theta), \qquad (8.29)$$
where $\Omega$ consists of all parameters for which the roots of the characteristic polynomials have absolute value greater than $(1+\delta)$.
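A sketch of the Whittle estimator (8.29) for an assumed AR(1) model with spectral density $f_{\phi,\sigma^2}(\omega) = \sigma^2/(2\pi|1-\phi e^{i\omega}|^2)$ (numpy and scipy assumed; the simulated data and optimiser settings are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, phi_true = 512, 0.6
e = rng.standard_normal(n + 100)
x = np.zeros(n + 100)
for i in range(1, n + 100):
    x[i] = phi_true * x[i - 1] + e[i]
x = x[100:]                                             # simulated AR(1) data

t = np.arange(1, n + 1)
w = 2 * np.pi * np.arange(1, n) / n                     # fundamental frequencies (omit w = 0)
I = np.array([np.abs((x * np.exp(1j * t * wk)).sum()) ** 2 for wk in w]) / (2 * np.pi * n)

def whittle(par):
    phi, log_sigma2 = par
    sigma2 = np.exp(log_sigma2)
    f = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * w)) ** 2)   # AR(1) spectral density
    return np.sum(np.log(f) + I / f)                    # the criterion in (8.29)

res = minimize(whittle, x0=[0.0, 0.0], method="Nelder-Mead")
print(res.x[0], np.exp(res.x[1]))                       # estimates of phi and sigma^2
```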

Whittle (1962) showed that the above criterion is an approximation of the GMLE. The precise proof is quite complicated and uses several matrix approximations due to Grenander and Szego (1958). Instead we give a heuristic proof which is quite enlightening.


Remark 8.5.1 (Some properties of circulant matrices) (i) Let us define the $n$-dimensional circulant matrix
$$\Gamma = \begin{pmatrix}c(0) & c(1) & c(2) & \cdots & c(n-2) & c(n-1)\\ c(n-1) & c(0) & c(1) & \cdots & c(n-3) & c(n-2)\\ \vdots & & & \ddots & & \vdots\\ c(1) & c(2) & c(3) & \cdots & c(n-1) & c(0)\end{pmatrix}.$$
We see that the elements in each row are the same, just a rotation of each other.

The eigenvalues and eigenvectors of $\Gamma$ have interesting properties. Define $f_n(\omega) = \sum_{k=-(n-1)}^{(n-1)}c(k)\exp(ik\omega)$; then the eigenvalues are $f_n(\omega_j)$ with corresponding eigenvectors $e_j = (1, \exp(2\pi ij/n), \ldots, \exp(2\pi ij(n-1)/n))'$. Hence, letting $E = (e_1,\ldots,e_n)$, we can write $\Gamma$ as
$$\Gamma = E\Delta E^{-1}, \qquad (8.30)$$
where $\Delta = \mathrm{diag}(f_n(\omega_1),\ldots,f_n(\omega_n))$.

(ii) You may wonder about the relevance of circulant matrices to the current setting. However, the variance covariance matrix of a stationary process is
$$\mathrm{var}(X_n) = \Sigma = \begin{pmatrix}c(0) & c(1) & c(2) & \cdots & c(n-2) & c(n-1)\\ c(1) & c(0) & c(1) & \cdots & c(n-3) & c(n-2)\\ \vdots & & & \ddots & & \vdots\\ c(n-1) & c(n-2) & c(n-3) & \cdots & c(1) & c(0)\end{pmatrix}.$$
This is a Toeplitz matrix, and we observe that for large $n$ it is very close to the circulant matrix; the differences are in the endpoints of the matrix. Hence it can be shown that for large $n$
$$\Sigma \approx E\Delta E^{-1}, \qquad (8.31)$$
and $E^{-1} \approx \bar{E}$ (where $\bar{E}$ denotes the complex conjugate of $E$).
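The key circulant fact behind (8.30) — the vectors $e_j$ are eigenvectors and the eigenvalue is the Fourier transform of the first row — can be checked numerically (a minimal sketch assuming numpy; the decaying sequence used to build the matrix is illustrative):

```python
import numpy as np

n = 8
c = 0.5 ** np.arange(n)                       # c(0), ..., c(n-1), an illustrative decaying sequence

# Circulant matrix: Gamma[s, t] = c[(t - s) mod n] (each row is a rotation of the first)
Gamma = np.array([[c[(t - s) % n] for t in range(n)] for s in range(n)])

j = 3
wj = 2 * np.pi * j / n
e_j = np.exp(1j * wj * np.arange(n))          # e_j = (1, e^{i w_j}, ..., e^{i w_j (n-1)})'
lam = np.sum(c * np.exp(1j * wj * np.arange(n)))   # eigenvalue: sum_m c(m) e^{i m w_j}

print(np.allclose(Gamma @ e_j, lam * e_j))    # True: e_j is an eigenvector of Gamma
```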

We will use the results in the remark above to prove the lemma below. We first observe that the Gaussian maximum likelihood for the ARMA process can be written in terms of its spectrum (see (4.10)):
$$L_n(\phi,\theta) = \log|\Sigma(\phi,\theta)| + X_n'\Sigma(\phi,\theta)^{-1}X_n = \log|\Sigma(f_{\phi,\theta})| + X_n'\Sigma(f_{\phi,\theta})^{-1}X_n, \qquad (8.32)$$
where $\Sigma(f_{\phi,\theta})_{s,t} = \int f_{\phi,\theta}(\omega)\exp(i(s-t)\omega)d\omega$ and $X_n' = (X_1,\ldots,X_n)$. We now show that $L_n(\phi,\theta) \approx L_n^w(\phi,\theta)$.

Lemma 8.5.1 Suppose that $\{X_t\}$ is a stationary ARMA time series with absolutely summable covariances and $f_{\phi,\theta}(\omega)$ is the corresponding spectral density function. Then
$$\log|\Sigma(f_{\phi,\theta})| + X_n'\Sigma(f_{\phi,\theta})^{-1}X_n \approx \sum_{k=1}^{n}\Big(\log f_{\phi,\theta}(\omega_k) + \frac{|J_n(\omega_k)|^2}{f_{\phi,\theta}(\omega_k)}\Big)$$
for large $n$.

PROOF. We give a heuristic proof (details of the precise proof can be found in the remark below). Using (8.31) we see that $\Sigma(f_{\phi,\theta})$ can be approximately written in terms of the eigenvalues and eigenvectors of the circulant matrix associated with $\Sigma(f_{\phi,\theta})$, that is
$$\Sigma(f_{\phi,\theta}) \approx E\Delta(f_{\phi,\theta})\bar{E} \quad\text{and}\quad \Sigma(f_{\phi,\theta})^{-1} \approx E\Delta(f_{\phi,\theta})^{-1}\bar{E}, \qquad (8.33)$$
where $\Delta(f_{\phi,\theta}) = \mathrm{diag}(f^{(n)}_{\phi,\theta}(\omega_1),\ldots,f^{(n)}_{\phi,\theta}(\omega_n))$, $f^{(n)}_{\phi,\theta}(\omega) = \sum_{k=-(n-1)}^{(n-1)}c_{\phi,\theta}(k)\exp(ik\omega) \to f_{\phi,\theta}(\omega)$ and $\omega_k = 2\pi k/n$. A basic calculation gives
$$X_n'E = (J_n(\omega_1),\ldots,J_n(\omega_n)). \qquad (8.34)$$
Substituting (8.34) and (8.33) into (8.32) yields
$$\frac{1}{n}L_n(\phi,\theta) \approx \frac{1}{n}\sum_{k=1}^{n}\Big(\log f^{(n)}_{\phi,\theta}(\omega_k) + \frac{|J_n(\omega_k)|^2}{f_{\phi,\theta}(\omega_k)}\Big) = \frac{1}{n}L_n^w(\phi,\theta). \qquad (8.35)$$
Hence by using the approximation (8.33) we have derived the Whittle likelihood. This proof was first derived by Tata Subba Rao.

Remark 8.5.2 (A rough flavour of the proof) There are various ways to prove this result precisely. All of them show that the Toeplitz matrix can, in some sense, be approximated by a circulant matrix. This result uses Szego's identity (Grenander and Szego (1958)). Studying the Gaussian likelihood in (8.32), we note that the Whittle likelihood has a similar representation. That is,
$$L_n^{(w)}(\phi,\theta) = \sum_{k=1}^{n}\Big(\log f_{\phi,\theta}(\omega_k) + \frac{|J_n(\omega_k)|^2}{f_{\phi,\theta}(\omega_k)}\Big) = \sum_{k=1}^{n}\log f_{\phi,\theta}(\omega_k) + X_n'U(f^{-1}_{\phi,\theta})X_n,$$
where $U(f^{-1}_{\phi,\theta})_{s,t} = \int f_{\phi,\theta}(\omega)^{-1}\exp(i(s-t)\omega)d\omega$. Hence if we can show that $|U(f^{-1}_{\phi,\theta}) - \Sigma(f_{\phi,\theta})^{-1}| \to 0$ as $n\to\infty$ (and its derivatives with respect to $\phi$ and $\theta$ also converge), then we can show that $L_n^{(w)}(\phi,\theta)$ and $L_n(\phi,\theta)$ and their derivatives are asymptotically equivalent. Hence the GMLE and the Whittle estimator are asymptotically equivalent. The difficult part of the proof involves showing that $|U(f^{-1}_{\phi,\theta}) - \Sigma(f_{\phi,\theta})^{-1}| \to 0$ as $n\to\infty$. It is worth noting that Rainer Dahlhaus has extensively developed this area and considered several interesting generalisations (see, for example, Dahlhaus (1996), Dahlhaus (1997) and Dahlhaus (2000)).

We now show consistency of the estimator (without showing that it is equivalent to the GMLE). To simplify calculations we slightly modify the estimator and exchange the sum in (8.29) with an integral, to obtain
$$L_n^w(\phi,\theta) = \int_0^{2\pi}\Big(\log f_{\phi,\theta}(\omega) + \frac{|J_n(\omega)|^2}{f_{\phi,\theta}(\omega)}\Big)d\omega.$$

To estimate the parameters we choose the $\theta$ and $\phi$ which minimise the above criterion, that is
$$(\hat{\phi}_n^w, \hat{\theta}_n^w) = \arg\min_{\phi,\theta\in\Omega}L_n^w(\phi,\theta). \qquad (8.36)$$

Lemma 8.5.2 (Consistency) Let us suppose that $d_k(\theta,\phi) = \int f_{\phi,\theta}(\omega)^{-1}\exp(ik\omega)d\omega$ and $\sum_k|d_k(\theta,\phi)| < \infty$. Let $(\hat{\phi}_n^w, \hat{\theta}_n^w)$ be defined as in (8.36). Then we have
$$(\hat{\phi}_n^w, \hat{\theta}_n^w) \xrightarrow{P} (\phi_0, \theta_0).$$

PROOF. Recall that to show consistency we need to show pointwise convergence of $L_n^w$ and equicontinuity. First we show pointwise convergence, by evaluating the variance of $L_n^w$ and showing that it converges to zero as $n\to\infty$. We first note that, by using $d_k(\theta,\phi) = \int f_{\phi,\theta}(\omega)^{-1}\exp(ik\omega)d\omega$ and $I_n(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}c_n(k)\exp(ik\omega)$, we can write $L_n^w$ as
$$L_n^w(\phi,\theta) = \int_0^{2\pi}\log f_{\phi,\theta}(\omega)d\omega + \frac{1}{2\pi n}\sum_{r=-(n-1)}^{n-1}d_r(\theta,\phi)\sum_{k=1}^{n-|r|}X_kX_{k+r}.$$

Therefore taking the variance gives
$$\mathrm{var}(L_n^w(\phi,\theta)) = \frac{1}{(2\pi)^2n^2}\sum_{r_1,r_2=-(n-1)}^{n-1}d_{r_1}(\theta,\phi)d_{r_2}(\theta,\phi)\sum_{k_1=1}^{n-|r_1|}\sum_{k_2=1}^{n-|r_2|}\mathrm{cov}(X_{k_1}X_{k_1+r_1}, X_{k_2}X_{k_2+r_2}). \qquad (8.37)$$

We note that
$$\mathrm{cov}(X_{k_1}X_{k_1+r_1}, X_{k_2}X_{k_2+r_2}) = \mathrm{cov}(X_{k_1},X_{k_2})\mathrm{cov}(X_{k_1+r_1},X_{k_2+r_2}) + \mathrm{cov}(X_{k_1},X_{k_2+r_2})\mathrm{cov}(X_{k_1+r_1},X_{k_2}) + \mathrm{cum}(X_{k_1},X_{k_1+r_1},X_{k_2},X_{k_2+r_2}).$$

Now we note that $\sum_{k_2}|\mathrm{cum}(X_{k_1},X_{k_1+r_1},X_{k_2},X_{k_2+r_2})| < \infty$ and $\sum_{k_2}|\mathrm{cov}(X_{k_1},X_{k_2})| < \infty$; hence, substituting this and $\sum_k|d_k(\theta,\phi)| < \infty$ into (8.37), we have that
$$\mathrm{var}(L_n^w(\phi,\theta)) = O\Big(\frac{1}{n}\Big),$$
hence $\mathrm{var}(L_n^w(\phi,\theta)) \to 0$ as $n\to\infty$. Define

$$L^w(\phi,\theta) = E(L_n^w(\phi,\theta)) = \int_0^{2\pi}\Big(\log f_{\phi,\theta}(\omega) + \frac{f_{\phi_0,\theta_0}(\omega)}{f_{\phi,\theta}(\omega)}\Big)d\omega.$$

Hence, since $\mathrm{var}(L_n^w(\phi,\theta)) \to 0$, we have
$$L_n^w(\phi,\theta) \xrightarrow{P} L^w(\phi,\theta).$$

To show equicontinuity we apply the mean value theorem to $L_n^w$. We note that because the parameters $(\phi,\theta) \in \Theta$ have characteristic polynomials whose roots are greater than $(1+\delta)$ in absolute value, $f_{\phi,\theta}(\omega)$ is bounded away from zero (indeed there exists a $\delta^* > 0$ such that $\inf_{\omega,(\phi,\theta)\in\Theta}f_{\phi,\theta}(\omega) \geq \delta^*$). Hence it can be shown that there exists a random sequence $\{K_n\}$ such that $|L_n^w(\phi_1,\theta_1) - L_n^w(\phi_2,\theta_2)| \leq K_n\|(\phi_1-\phi_2, \theta_1-\theta_2)\|$ and $K_n$ converges almost surely to a finite constant as $n\to\infty$. Therefore $L_n^w$ is stochastically equicontinuous (and equicontinuous in probability). Since the parameter space $\Theta$ is compact, all three conditions in Section 5.6 are satisfied and we have consistency of the Whittle estimator.

We now state the asymptotic normality of the Whittle estimator and, in the following remark, show its relationship to the GMLE.

Theorem 8.5.1 Let us suppose that $d_k(\theta,\phi) = \int f_{\phi,\theta}(\omega)^{-1}\exp(ik\omega)d\omega$ and $\sum_k|d_k(\theta,\phi)| < \infty$. Let $(\hat{\phi}_n^w, \hat{\theta}_n^w)$ be defined as in (8.36). Then
$$\sqrt{n}\big((\hat{\phi}_n^w - \phi_0), (\hat{\theta}_n^w - \theta_0)\big) \xrightarrow{D} N(0, V^{-1} + V^{-1}WV^{-1}),$$
where
$$V = \frac{1}{2\pi}\int_0^{2\pi}\Big(\frac{\nabla f_{\phi_0,\theta_0}(\omega)}{f_{\phi_0,\theta_0}(\omega)^2}\Big)\Big(\frac{\nabla f_{\phi_0,\theta_0}(\omega)}{f_{\phi_0,\theta_0}(\omega)^2}\Big)'d\omega,$$
$$W = \frac{1}{2\pi}\int_0^{2\pi}\int_0^{2\pi}\big(\nabla f_{\phi_0,\theta_0}(\omega_1)^{-1}\big)\big(\nabla f_{\phi_0,\theta_0}(\omega_2)^{-1}\big)'f_{4,\theta_0,\phi_0}(\omega_1,-\omega_1,\omega_2)d\omega_1 d\omega_2,$$
and $f_{4,\theta_0,\phi_0}(\omega_1,\omega_2,\omega_3) = \kappa_4A(\omega_1)A(\omega_2)A(\omega_3)A(-\omega_1-\omega_2-\omega_3)$ is the fourth order spectrum corresponding to the ARMA process with $A(\omega) = \frac{\theta_0(\exp(i\omega))}{\phi_0(\exp(i\omega))}$.

PROOF. See, for example, Brockwell and Davis (1998), Chapter 10.8.

Remark 8.5.3 (i) It is interesting to note that in the case that $X_t$ comes from a linear time series (such as an ARMA process), then using $f_{4,\theta_0,\phi_0}(\omega_1,-\omega_1,\omega_2) = \kappa_4|A(\omega_1)|^2|A(\omega_2)|^2 = \frac{\kappa_4}{\kappa_2^2}f(\omega_1)f(\omega_2)$ (which holds for linear processes), we have
$$W = \frac{1}{2\pi}\int_0^{2\pi}\int_0^{2\pi}\big(\nabla f_{\phi_0,\theta_0}(\omega_1)^{-1}\big)\big(\nabla f_{\phi_0,\theta_0}(\omega_2)^{-1}\big)'f_{4,\theta_0,\phi_0}(\omega_1,-\omega_1,\omega_2)d\omega_1 d\omega_2 = \frac{\kappa_4}{\kappa_2^2}\Big(\frac{1}{2\pi}\int_0^{2\pi}\frac{\nabla f_{\phi_0,\theta_0}(\omega)}{f_{\phi_0,\theta_0}(\omega)^2}f_{\phi_0,\theta_0}(\omega)d\omega\Big)^2$$
$$= \frac{\kappa_4}{\kappa_2^2}\Big(\frac{1}{2\pi}\int_0^{2\pi}\frac{\nabla f_{\phi_0,\theta_0}(\omega)}{f_{\phi_0,\theta_0}(\omega)}d\omega\Big)^2 = \frac{\kappa_4}{\kappa_2^2}\Big(\frac{1}{2\pi}\int_0^{2\pi}\nabla\log f_{\phi_0,\theta_0}(\omega)d\omega\Big)^2 = \frac{\kappa_4}{\kappa_2^2}\Big(\frac{1}{2\pi}\nabla\int_0^{2\pi}\log f_{\phi_0,\theta_0}(\omega)d\omega\Big)^2 = \frac{\kappa_4}{\kappa_2^2}\Big(\frac{1}{2\pi}\nabla\, 2\pi\log\frac{\sigma^2}{2\pi}\Big)^2 = 0,$$
where we note that $\int_0^{2\pi}\log f_{\phi_0,\theta_0}(\omega)d\omega = 2\pi\log\frac{\sigma^2}{2\pi}$ by Kolmogorov's formula. Hence for linear processes the higher order cumulant plays no role, and the above theorem reduces to
$$\sqrt{n}\big((\hat{\phi}_n - \phi_0), (\hat{\theta}_n - \theta_0)\big) \xrightarrow{D} N(0, V^{-1}).$$

(ii) Since the GMLE and the Whittle likelihood are asymptotically equivalent, they should lead to the same asymptotic distributions. We recall that the GMLE has the asymptotic distribution $\sqrt{n}(\hat{\theta}_n - \theta_0, \hat{\phi}_n - \phi_0) \xrightarrow{D} N(0, \sigma_0^2\Lambda^{-1})$, where
$$\Lambda = \begin{pmatrix}E(U_tU_t') & E(V_tU_t')\\ E(U_tV_t') & E(V_tV_t')\end{pmatrix}$$
and $\{U_t\}$ and $\{V_t\}$ are autoregressive processes which satisfy $\phi_0(B)U_t = \varepsilon_t$ and $\theta_0(B)V_t = \varepsilon_t$. It can be shown that
$$\begin{pmatrix}E(U_tU_t') & E(V_tU_t')\\ E(U_tV_t') & E(V_tV_t')\end{pmatrix} = \frac{1}{2\pi}\int_0^{2\pi}\Big(\frac{\nabla f_{\phi_0,\theta_0}(\omega)}{f_{\phi_0,\theta_0}(\omega)^2}\Big)\Big(\frac{\nabla f_{\phi_0,\theta_0}(\omega)}{f_{\phi_0,\theta_0}(\omega)^2}\Big)'d\omega.$$