PENALIZED WHITTLE LIKELIHOOD ESTIMATION
OF SPECTRAL DENSITY FUNCTIONS
by
Yudi Pawitan and Finbarr O'Sullivan
TECHNICAL REPORT No. 238
August 1992
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Penalized Whittle Likelihood Estimation of
Spectral Density Functions. 1
Yudi Pawitan
Department of Statistics
University College, Dublin 4.
and
Finbarr O'Sullivan
Department of Statistics
University of Washington
Seattle, WA 98195.
Abstract
The penalized likelihood approach is not yet developed for time series problems, even though
it has been applied in a number of nonparametric function estimation problems.
We define a new class of nonparametric estimates of the spectral density of a stationary
time series as the maximizer of the Whittle likelihood with a roughness penalty. Implementation
using an iterative least squares procedure, with the log periodogram as starting value,
works very well in practice. We derive an unbiased estimate of the integrated squared error
and use it to choose a data dependent smoothing parameter. The procedure is illustrated
by simulated and real data examples. In larger simulations the new estimate is shown to
be more efficient than the smoothed log periodogram estimate. We indicate heuristically
why this is the case. Assuming some smoothness condition on the true spectrum, we also
show that the estimate achieves the same asymptotic error rate as that in nonparametric
regression and density estimation.
Keywords: stationary time series, iterative least squares, regularization, bandwidth selection.
1 Introduction
The penalized likelihood approach is not yet developed for time series problems, even though
it has been applied successfully in a number of nonparametric function estimation problems
(see, for example, Good & Gaskins, 1971; Silverman, 1982 and 1985; O'Sullivan,
1988). In time series analysis, the Whittle likelihood function (Whittle, 1962) has been
investigated extensively for parametric inference (e.g., Hannan, 1973; Davies, 1973) and
has been shown to possess some optimality property (Kulperger, 1985). The purpose of
this paper is to show the application of the penalized Whittle likelihood to the problem of
nonparametric spectral density estimation. Chow & Grenander (1985) proposed a similar approach
and proved some results, but here we show asymptotic rate results for the
integrated squared error and describe the computational issues in detail, especially with
regard to the data dependent choice of the smoothing parameter. We also compare the
new method with the standard periodogram and log periodogram smoothing.
The likelihood approach puts a number of spectral estimation methods into a coherent
framework. For example, the raw periodogram is the (unpenalized) maximum likelihood
estimate of the spectrum, while the usual window estimate is a local maximum likelihood
estimate in the sense of Tibshirani (1986); and the smoothed log periodogram
(SLP) (Cogburn & Davis, 1974; Wahba, 1980) will be shown to be the first iterate of the
approximated penalized maximum likelihood estimate (PMLE) of the log spectrum.
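We describe the method in Section 2, including finding a data dependent
smoothing parameter based on an unbiased risk estimate. Section 3 illustrates the
method with data examples and simulations, and shows a reason why the new estimate is
more efficient than the SLP. In Section 4 we show that the estimate achieves the
same asymptotic error rate as that in nonparametric regression and density estimation
when integrated squared error is used.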
2 Methods
Let $X_t$ be a stationary time series with mean $\mu$ observed at time $t = 0, \pm 1, \pm 2, \ldots$.
The second order properties of $X_t$ are equivalently described by the covariance function

    $$c_{xx}(k) = \mathrm{cov}(X_t, X_{t+k}), \quad k = 0, \pm 1, \pm 2, \ldots$$

and by the spectral density function

    $$f(x) = \sum_k c_{xx}(k) \exp(-2\pi i x k).$$

The problem is to estimate $f(x)$ from a finite sample $X_0, \ldots, X_{T-1}$. An estimate of $f$ is
useful for a description of the process that generated the data or, more importantly, it
may be needed as an input for other estimation problems, such as smoothing or filtering (see,
for example, Brillinger, 1981 or Shumway, 1988).
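As a small numerical illustration of the definition of the spectral density as the Fourier transform of the covariance function, the sketch below evaluates the sum for a hypothetical MA(1) process and compares it with the familiar closed form. The process, the coefficient `b = 0.6` and the function name are assumptions made for illustration; they do not come from the report.

```python
import numpy as np

def spectral_density_from_acov(acov, x):
    """Evaluate f(x) = sum_k c(k) exp(-2 pi i x k) for a symmetric,
    finitely supported autocovariance given at lags 0, 1, ..., q."""
    x = np.asarray(x, dtype=float)
    f = np.full(x.shape, acov[0], dtype=float)
    for k in range(1, len(acov)):
        # lags +k and -k combine into a cosine because c(k) = c(-k)
        f += 2.0 * acov[k] * np.cos(2.0 * np.pi * x * k)
    return f

# Hypothetical MA(1): X_t = e_t + b * e_{t-1} with Var(e_t) = 1,
# so c(0) = 1 + b^2, c(1) = b and c(k) = 0 for |k| > 1.
b = 0.6
acov = [1.0 + b**2, b]
x = np.linspace(-0.5, 0.5, 11)
f = spectral_density_from_acov(acov, x)

# Closed form for the same process: |1 + b exp(-2 pi i x)|^2
f_closed = np.abs(1.0 + b * np.exp(-2j * np.pi * x)) ** 2
```

The two arrays agree, which is a quick way to check sign and scaling conventions before working with the periodogram.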
The Whittle likelihood for stationary processes can be justified by the following heuristic
argument. One can also start with a Gaussian likelihood in the time domain (Whittle, 1962;
Chow & Grenander, 1985), but it is not necessary to assume Gaussianity of the process
itself (Hannan, 1973), in which case we are only exploiting the second order structure of
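the process. Define the discrete Fourier transform (DFT) at frequency $x$ as

    $$d(x) = \sum_t X_t \exp(-2\pi i x t), \qquad (1)$$

and the periodogram $I(x) = |d(x)|^2 / T$. Under Brillinger's mixing condition (Brillinger
1981, p. 26) we have

    $$T^{-1/2} d(x) \sim CN(0, f(x))$$

as $T \to \infty$, where $CN$ denotes the complex normal distribution, and the $T^{-1/2} d(x)$ are
asymptotically independent for distinct frequencies, taken to be at the Fourier
frequencies $x = k/T$. For convenience we write $x_k = k/T$ and $I_k = I(x_k)$. This leads to the
Whittle likelihood criterion

    $$L_T(f) = \frac{1}{T} \sum_k \{ \log f(x_k) + I_k / f(x_k) \}.$$

As motivation, we note that if $f_0$ is the true spectrum, then
$\int \{ \log f(x) + f_0(x)/f(x) \}\, dx$ is minimized at $f = f_0$, so $L_T$ is an appropriate
data fit criterion. The penalized estimate of $f$ is the minimizer of

    $$L_{T\lambda}(f) = L_T(f) + \lambda J(f) \qquad (2)$$

where $L_T(f)$ serves as a data fit criterion, $J(f)$ is a penalty function and $\lambda$ is a parameter
that controls the degree of smoothing: $\lambda = 0$ corresponds to the raw periodogram estimates
and $\lambda = \infty$ corresponds to the smoothest model determined by the choice of $J(f)$.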
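The periodogram and the data fit criterion can be sketched in a few lines. The form of the criterion used here, the average of $\theta_k + I_k \exp(-\theta_k)$ with $\theta = \log f$, is taken from the integral written in the appendix; the function names, the white noise sample and the seed are assumptions for illustration only.

```python
import numpy as np

def periodogram(x):
    """Periodogram I(k/T) = |d(k/T)|^2 / T at the Fourier frequencies
    k/T, k = 0, ..., T-1, with d the DFT of equation (1)."""
    x = np.asarray(x, dtype=float)
    d = np.fft.fft(x)            # d(k/T) = sum_t x_t exp(-2 pi i k t / T)
    return np.abs(d) ** 2 / x.size

def whittle_criterion(theta, I):
    """Average of theta_k + I_k exp(-theta_k), where theta = log f.
    Smaller values indicate a better fit to the periodogram."""
    theta = np.asarray(theta, dtype=float)
    return float(np.mean(theta + I * np.exp(-theta)))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)      # hypothetical unit-variance white noise
I = periodogram(x)

fit_raw = whittle_criterion(np.log(I), I)          # unpenalized minimizer
fit_flat = whittle_criterion(np.zeros(I.size), I)  # true log spectrum here
```

Because each term is minimized pointwise at $\theta_k = \log I_k$, `fit_raw` can never exceed `fit_flat`; this is the sense in which the raw periodogram is the unpenalized estimate. The sample is not mean-centered in this sketch, so the zero-frequency ordinate stays positive and its logarithm is finite.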
Consider the penalty function

    $$J(f) = \int \{ \theta^{(m)}(x) \}^2 \, dx$$

where $\theta(x) \equiv \log f(x)$. Note here that the log transform overcomes the positivity constraint
$f(x) > 0$. The smoothest model using this function is log polynomial of order $(m - 1)$.
Denote $x_k = k/T$, $\theta_k = \theta(x_k)$ and $\boldsymbol{\theta}$ to be the $T \times 1$ vector of $\theta_k$'s. Throughout we will use
the bold notation to represent $T \times 1$ vectors.
Iterative Least Squares for fixed $\lambda$
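Minimization of the objective function

    $$L_{T\lambda}(\boldsymbol{\theta}) = \frac{1}{T} \sum_k \{ \theta_k + I_k \exp(-\theta_k) \} + \lambda J(f) \qquad (3)$$

is accomplished by an iterative least squares method as follows (see McCullagh
& Nelder, p. 40). Given a current estimate $\boldsymbol{\theta}^0$, a quadratic approximation of the data
fit term around $\boldsymbol{\theta}^0$ leads to a least squares criterion in the working response

    $$z_k = \theta_k^0 + I_k \exp(-\theta_k^0) - 1, \qquad (5)$$

for $k = -[T/2]+1, \ldots, [T/2]$. In terms of Fourier coefficients the penalty function is

    $$J(f) \approx \sum_k k^{2m} |\epsilon_k|^2$$

where $\epsilon_k$ is the $k$'th Fourier coefficient of $\theta(x)$, also known as the cepstrum. So, approximately,

    $$L_{T\lambda} \approx \sum_{k=-[T/2]+1}^{[T/2]} |Z_k - \epsilon_k|^2 + \lambda \sum_{k=-[T/2]+1}^{[T/2]} k^{2m} |\epsilon_k|^2 \qquad (7)$$

where $Z_k$ is the $k$'th discrete Fourier transform of $\mathbf{z}$. Minimization of (7) is now trivial.
Given an initial estimate $\boldsymbol{\theta}^0$, $L_{T\lambda}$ is minimized by

    $$\theta_j^1 = \frac{1}{T} \sum_k (1 + \lambda k^{2m})^{-1} Z_k \exp(2\pi i j k / T), \quad j = 0, \pm 1, \ldots \qquad (8)$$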
The iteration continues by alternately recomputing (5) and (8) until convergence. Denote
$\hat{\boldsymbol{\theta}}$ to be the final estimate of the log spectrum.
The equation (8) can be written also as $\boldsymbol{\theta}^1 = S_{\lambda m} \mathbf{z}$, where $S_{\lambda m}$ represents a smoothing
matrix associated with $\lambda$ and $m$. Now it is worth noting that with an initial estimate
$\theta_k^0 = \log I_k + 1.3064313$, so that $z_k = \log I_k + 0.577216$, the first iteration gives exactly the
smoothed log periodogram estimate of the log spectrum as given by Wahba (1980). This
means that the first iterate is already a good estimate and the procedure typically converges
in fewer than four iterations, so for a given value of $\lambda$ the computational cost is $O(T \log T)$.
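The alternation between the working response (5) and the frequency-domain smoother (8) can be sketched with the FFT as follows. This is a minimal illustration rather than the authors' Fortran implementation: the function names, the stopping tolerance, the iteration cap and the constant test periodogram are all assumptions.

```python
import numpy as np

START_OFFSET = 1.3064313   # starting offset quoted in the text

def smooth(z, lam, m=2):
    """Linear smoother of equation (8): shrink the k'th DFT coefficient
    of z by 1 / (1 + lam * k^(2m)) and transform back."""
    T = z.size
    k = np.fft.fftfreq(T, d=1.0 / T)           # signed integer frequencies
    gain = 1.0 / (1.0 + lam * k ** (2 * m))
    return np.real(np.fft.ifft(gain * np.fft.fft(z)))

def pmle_log_spectrum(I, lam, m=2, max_iter=50, tol=1e-8):
    """Alternate the working response (5) and the smoother (8)."""
    theta = np.log(I) + START_OFFSET
    for _ in range(max_iter):
        z = theta + I * np.exp(-theta) - 1.0    # working response (5)
        theta_new = smooth(z, lam, m)           # update (8)
        done = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if done:
            break
    return theta

# Hypothetical check: a constant periodogram should give a constant fit.
I_const = np.full(16, 2.0)
theta_const = pmle_log_spectrum(I_const, lam=0.01)

# The starting offset turns the first working response into
# log I_k + 0.577216 (approximately Euler's constant).
first_shift = START_OFFSET + np.exp(-START_OFFSET) - 1.0
```

Each pass costs two FFTs, which is where the $O(T \log T)$ cost per value of $\lambda$ comes from. The tolerance used here is stricter than needed in practice, so the loop may run a few more passes than the "fewer than four" mentioned in the text.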
Choice of the smoothing parameter $\lambda$
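Cross-validation (leave-one-out) bandwidth selection has been used in spectral estimation
and in more general settings. Here we choose the smoothing parameter as the minimizer
of an unbiased estimate of the integrated squared error

    $$\mathrm{ISE}(\lambda) = \| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta} \|^2 = \| \hat{\boldsymbol{\theta}} \|^2 - 2 \sum_k \theta_k \hat{\theta}_k + \| \boldsymbol{\theta} \|^2. \qquad (9)$$

The first term on the right can be readily computed, and the last term does
not involve $\lambda$, so it suffices to estimate the second term. The mean of the second term can
be unbiasedly estimated as follows. At convergence the iterative procedure
satisfies $\hat{\boldsymbol{\theta}} = S_{\lambda m} \mathbf{z}$, and for $\lambda$ near the optimal value, as $T$ gets large,

    $$\mathbf{z} = \hat{\boldsymbol{\theta}} + \mathbf{I} \exp(-\hat{\boldsymbol{\theta}}) - 1 \approx \boldsymbol{\theta} + \mathbf{I} \exp(-\boldsymbol{\theta}) - 1, \qquad (10)$$

where $\boldsymbol{\theta}$ is the true log spectrum. The approximation (10) is in line with the linearization
lemmas proved in the appendix. So, we have

    $$E \sum_k \hat{\theta}_k z_k = \sum_k E \hat{\theta}_k \, E z_k + \mathrm{trace}\{ \mathrm{Cov}(\hat{\boldsymbol{\theta}}, \mathbf{z}) \} \approx \sum_k \theta_k E \hat{\theta}_k + \mathrm{trace}\, S_{\lambda m}. \qquad (11)$$

The trace can be shown to be equal to

    $$\sum_{k=-[T/2]+1}^{[T/2]} (1 + \lambda k^{2m})^{-1}.$$

Then, up to a constant term, the ISE can be unbiasedly estimated by

    $$\mathrm{URE}(\lambda, m) = \| \hat{\boldsymbol{\theta}} - \mathbf{z} \|^2 + 2 \Big\{ \sum_{k=-[T/2]+1}^{[T/2]} (1 + \lambda k^{2m})^{-1} \Big\}. \qquad (12)$$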
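Since $\boldsymbol{\theta}$ is not available the following fully automatic two-stage procedure is used: in the
first stage we choose the $\lambda$ that minimizes URE using $\mathbf{z} = \hat{\boldsymbol{\theta}} + \mathbf{I} \exp(-\hat{\boldsymbol{\theta}}) - 1$, and call the
resulting estimate the pilot estimate $\hat{\boldsymbol{\theta}}_p$. In the second stage we repeat the procedure using
$\mathbf{z} = \hat{\boldsymbol{\theta}}_p + \mathbf{I} \exp(-\hat{\boldsymbol{\theta}}_p) - 1$ to find the optimal estimate.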
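The unbiased risk estimate and the two-stage search can be sketched as below. Several choices here are assumptions rather than details from the report: the helper names, the fixed number of inner iterations, the grid end points, the simulated AR(1) sample, and in particular the reading of the second stage as "hold $\mathbf{z}$ fixed at its pilot value and minimize URE over a finer grid".

```python
import numpy as np

def gains(T, lam, m=2):
    """Diagonal of the smoother in the frequency domain."""
    k = np.fft.fftfreq(T, d=1.0 / T)
    return 1.0 / (1.0 + lam * k ** (2 * m))

def smooth(z, lam, m=2):
    return np.real(np.fft.ifft(gains(z.size, lam, m) * np.fft.fft(z)))

def working_response(theta, I):
    return theta + I * np.exp(-theta) - 1.0

def fit(I, lam, m=2, n_iter=10):
    """Iterative least squares with a fixed number of passes."""
    theta = np.log(I) + 1.3064313
    for _ in range(n_iter):
        theta = smooth(working_response(theta, I), lam, m)
    return theta

def ure(theta_hat, z, lam, m=2):
    """||theta_hat - z||^2 + 2 * trace(S), the risk estimate of the text."""
    return np.sum((theta_hat - z) ** 2) + 2.0 * np.sum(gains(z.size, lam, m))

def two_stage(I, coarse, fine, m=2):
    # Stage 1: coarse grid, each lambda scored at its own iterated fit.
    scores = []
    for lam in coarse:
        th = fit(I, lam, m)
        scores.append(ure(th, working_response(th, I), lam, m))
    lam_pilot = coarse[int(np.argmin(scores))]
    z_p = working_response(fit(I, lam_pilot, m), I)
    # Stage 2 (assumed reading): z fixed at the pilot value, finer grid.
    scores = [ure(smooth(z_p, lam, m), z_p, lam, m) for lam in fine]
    lam_best = fine[int(np.argmin(scores))]
    return lam_best, smooth(z_p, lam_best, m)

# Hypothetical AR(1) sample, only to exercise the code.
rng = np.random.default_rng(1)
T, burn = 128, 50
e = rng.standard_normal(T + burn)
x = np.zeros(T + burn)
for t in range(1, T + burn):
    x[t] = 0.5 * x[t - 1] + e[t]
x = x[burn:]
I = np.abs(np.fft.fft(x)) ** 2 / T

coarse = np.exp(np.linspace(-12.0, -1.0, 5))   # 5 values, as in the text
fine = np.exp(np.linspace(-12.0, -1.0, 50))    # 50 values, as in the text
lam_best, theta_hat = two_stage(I, coarse, fine)
```

The trace term needs no matrix: because the smoother is diagonal in the frequency domain, its trace is the sum of the gains. The test accompanying this sketch builds the explicit smoothing matrix for a small $T$ and checks that identity.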
3 Examples and Comparisons
In this section we illustrate the procedure using simulated AR(3) and MA(4)
processes [...]. The innovation process $e_t$ [...]
deviates are generated [...]
are set to 0 [...]
recursive equation above. [...]
to normality [...]
subsequent values are computed [...]
of $X_t$ are then [...]. We use $m = 2$ and
consider $\lambda$ between [...] and $e^{-1}$. A simple grid search is used to minimize URE over
the range of $\lambda$. Only 5 values of $\lambda$ are used for the first stage minimization of URE and 50
values are used in the second stage. The procedures are implemented in double precision in
Fortran and validated in S-plus. For $T =$ [...] it takes about 2 [...] or less to compute a
fully automatic estimate on a DEC 3100 workstation.
Figure 1(a) shows that the URE of PMLE follows the true ISE quite closely up to a
constant shift shown by the dashed line. We call the best estimate the one that minimizes
the ISE for the data at hand, and the automatic estimate the one that minimizes the
URE. In this case the automatic estimate is practically the same as the best estimate. The URE
and ISE of the SLP are shown for comparison; see Wahba (1980) for details. It is worth noting
that the ISE of PMLE is smaller than the ISE of SLP for near optimal smoothing, so, in
particular, the best PMLE has lower ISE than the best SLP. Figure 1(b) shows the log
spectrum of the AR(3) process with the automatic PMLE and SLP estimates. Figure 1(c)
for the MA(4) process shows similar patterns to those of 1(a). For these plots we have expressed
the smoothing parameter in terms of the effective bandwidth given by
    [...]

[...] periodogram at [...] two criteria coincide [...]
occurs as we cannot [...]
zero frequency. Setting it to zero through centering [...]
overcome [...]
For comparison we plot [...] computed as the [...]
same bandwidth ($L = 3$)

    $$\log \Big\{ \frac{1}{2L+1} \sum_{j=-L}^{L} I_{k+j} \Big\}.$$

We see that the [...] is closer to [...] in this case.
To compare the performance of PMLE with SLP and LSP we replicate the simulation
500 times, each time recording the best PMLE, SLP and LSP. The LSP estimate
of the log spectrum is computed as the log of the simple average periodogram above. The best
LSP for a given realization is the one whose $L$ minimizes the ISE in the log spectral
domain. This is performed for $T = 128, 256, 512$ and $1024$.
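The LSP estimate described above, the log of a simple $(2L+1)$-point average of the periodogram, can be sketched as follows. The periodic wrap-around at the ends of the frequency range, the function name and the toy periodogram are assumptions made for illustration.

```python
import numpy as np

def log_smoothed_periodogram(I, L):
    """Log of the simple average of I_{k-L}, ..., I_{k+L}.
    Indices wrap around because the periodogram is periodic in frequency."""
    I = np.asarray(I, dtype=float)
    total = np.zeros_like(I)
    for j in range(-L, L + 1):
        total += np.roll(I, -j)    # np.roll(I, -j)[k] == I[(k + j) mod T]
    return np.log(total / (2 * L + 1))

# Hypothetical periodogram ordinates, chosen so the averages are easy to read.
I = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
lsp = log_smoothed_periodogram(I, L=1)
```

With `L = 0` the function returns the raw log periodogram, so the single integer `L` plays the role that $\lambda$ plays for PMLE and SLP.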
Figure 3(a) shows that PMLE is better than SLP and LSP for the AR(3) process and
similar results are shown in Figure 3(c) for the MA(4) process. Moreover the estimates are
seen to be consistent as their ISE tends to zero as we increase the number of observations.
The advantage of the PMLE over SLP is retained when we compare the corresponding
automatic procedures, as shown in Figure 3(b) and 3(d) for the AR(3) and MA(4) processes
respectively.
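For each simulated realization we also computed the relative efficiency of PMLE against
SLP as the ratio of the ISE of SLP to the ISE of PMLE. Table 1 shows a median efficiency
of around 1.4 to 1.5 in favor of PMLE, both for the best and automatic estimates.
Heuristically, asymptotically we have the optimal $\hat{\boldsymbol{\theta}}_{PMLE} = S_{\lambda m} \mathbf{z}$ and the variance of $z_k$
is approximately 1 [...], whereas the SLP smooths $\log I_k + \gamma$, where $\gamma$ is Euler's
constant, whose variance is approximately $\pi^2/6 \approx 1.64$. This suggests
a relative efficiency [...]. Intuitively, we would expect [...] as we average the
squared error of the spectral estimate over frequency.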
                               T = 128   T = 256   T = 512   T = 1024
AR(3)
  Best PMLE vs Best SLP          1.54      1.40      1.45      1.57
  Auto PMLE vs Auto SLP          1.53      1.48      1.46      1.49
  Auto PMLE vs Best PMLE         0.82      0.83      0.84      0.87
MA(4)
  Best PMLE vs Best SLP          1.34      1.39      1.41      1.47
  Auto PMLE vs Auto SLP          1.30      1.44      1.45      1.48
  Auto PMLE vs Best PMLE         0.89      0.94      0.95      0.96
Table 1: PMLE is more efficient than SLP for best and automatic procedures, and the
unbiased risk estimation shows high efficiency. For each realization the relative efficiency
of A versus B is the ratio of the ISE of B to that of A. Each entry of the table is the median
efficiency computed over 500 simulations.

The unbiased risk estimation shows high efficiency in both
cases, with a positive trend as the sample size increases (see Table 1).
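The variance comparison behind the efficiency heuristic can be checked by simulation. The sketch assumes the standard large-sample approximation that $I_k / f(x_k)$ behaves like a standard exponential variable away from the end frequencies; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2024)
ratio = rng.exponential(1.0, size=200_000)    # stands in for I_k / f(x_k)

# Noise in the PMLE working response theta_k + I_k / f_k - 1
var_pmle_response = np.var(ratio - 1.0)       # close to 1

# Noise in the log periodogram, the quantity smoothed by the SLP
var_log_periodogram = np.var(np.log(ratio))   # close to pi^2 / 6
mean_log = np.mean(np.log(ratio))             # close to minus Euler's constant

variance_ratio = var_log_periodogram / var_pmle_response
```

The ratio comes out near $\pi^2/6 \approx 1.64$, somewhat above the median efficiencies of 1.3 to 1.6 in Table 1; the heuristic ignores bias, so exact agreement is not expected.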
4 Asymptotic results
In this section we apply the theory of Cox & O'Sullivan, hereafter abbreviated by CO, [...]
estimator. It is convenient to develop [...] for
[...] to [...]

[...] is a zero mean second order stationary process [...]
[...] a covariance summability condition [...]

    $$[\ldots] \le M \sum_k (1 + |k|^2)^{l+a} |c_{xx}(k)|^2 = M \| f_0 \|^2_{[\ldots]}$$

where $M$ is a constant and $\| f_0 \|$ [...] Sobolev norm of $f_0$, which is in turn bounded
by [...] $l + 1/2$. Conversely, if [...] $O(k^{-l-1})$ [...] as
$k \to \infty$ and $\| f_0 \|^2$ is bounded [...]
the covariance [...]. So, the smoothness condition
in C.3 is equivalent to a covariance summability of order $l > 1$.
Theorem 1 Under C.1-C.3, if there is an $a$ satisfying

    $$1/(2m) < a < (s/m - 1/(2m))/2 \qquad (13)$$

and a sequence of $\lambda_T \to 0$ satisfying

    $$T^{-1} \lambda_T^{-(2a + 1/m)} \to 0 \qquad (14)$$

then, with probability tending to one, $\hat{\theta}_{T\lambda}$ is well defined and for all $b \in [0, a]$

    [...] (15)

where $M$ is a constant independent of $\theta_0$, $b$, $\lambda_T$ and [...].

The theorem may be [...] to convergence characteristics of the spectral
density estimate. [...] letting $\lambda_T = O(T^{-2m/(2s+1)})$, then we can show that

    $$\| \hat{f}_{T\lambda} - f_0 \|^2 \le [\ldots] \, \| \hat{\theta}_{T\lambda} - \theta_0 \|_0^2 = O_p(T^{-2s/(2s+1)}). \qquad (17)$$

So, for example, if $s = 2$ the asymptotic upper bound is $O_p(T^{-4/5})$, which is the standard
rate associated with nonparametric regression and density estimation.
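The arithmetic behind the quoted rate can be checked with exact fractions. The sketch assumes the usual form of such bounds at $b = 0$: a bias term of order $\lambda^{s/m}$ and a variance term of order $T^{-1} \lambda^{-1/(2m)}$. That form is an assumption here, since the display (15) is not legible in this copy; the function name is also hypothetical.

```python
from fractions import Fraction

def rate_exponents(s, m):
    """Exponents of T when lambda_T = T^(-2m / (2s + 1)).

    Returns (lambda exponent, bias exponent, variance exponent), where the
    bias term is lambda^(s/m) and the variance term is
    T^(-1) * lambda^(-1/(2m)); both forms are assumed, not quoted."""
    lam_exp = Fraction(-2 * m, 2 * s + 1)
    bias_exp = lam_exp * Fraction(s, m)
    var_exp = Fraction(-1) + lam_exp * Fraction(-1, 2 * m)
    return lam_exp, bias_exp, var_exp

lam_exp, bias_exp, var_exp = rate_exponents(s=2, m=2)
```

With $s = 2$ and $m = 2$ both exponents equal $-4/5$, which is the $O_p(T^{-4/5})$ rate in the text; the two terms balance for any $s$ and $m$ under this choice of $\lambda_T$.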
APPENDIX: Proof of the theorem
The proof will follow from Theorems 3.1 and 3.2 of CO after verifying assumptions A.1-A.6
of that paper. Refer to CO for a detailed development of the normed spaces. Let $F_T$ be
the discrete uniform measure on the points $x_k = k/T$, for $k = -[T/2] + 1, \ldots, [T/2]$. Also
let $F$ be the uniform measure on $[-0.5, 0.5]$. Now we can write

    $$L_{T\lambda}(\theta) = \int \{ \theta(x) + I(x) e^{-\theta(x)} \} \, dF_T(x) + \lambda \langle \theta, W\theta \rangle$$
    $$L_{\lambda}(\theta) = \int \{ \theta(x) + f_0(x) e^{-\theta(x)} \} \, dF(x) + \lambda \langle \theta, W\theta \rangle,$$

where $I(x_k) = I_k$. The score vectors corresponding to $L_{T\lambda}$ and $L_\lambda$ are given by $Z_{T\lambda}$ and
$Z_\lambda$, where

    [...] (18)

The corresponding equation for $Z_\lambda$ is obtained by substituting $f_0$ for $I$ and $F$ for $F_T$ in
(18). Similarly, the second [...] $L(\theta)$ are [...] by [...] CO.
[...] are verified using similar arguments as in CO.
Lemma 1 [...] continuous functions $g$ on $[-0.5, 0.5]$, there is a constant $M$ independent of
$g$ such that
(i) $\{ \int g \, d(F_T - F) \}^2 \le$ [...], $1/(2m) < a < 1/m$,
(ii) $E\{ \int (I - f_0) g \, dF_T \}^2 \le$ [...]
(iii) $E\{ \int (I - f_0) g \, dF_T \}^2 \le M T^{-1} \{ \| g \|_0^2 + T^{-2ma} \| g \|_a^2 \}$
Proof: (i) is standard. For (ii) and (iii), with $x_k = k/T$,

    $$E\Big\{ \int (I - f_0) g \, dF_T \Big\}^2 = \frac{1}{T^2} \sum_k \sum_{k'} E\{ I_k - f_0(x_k) \}\{ I_{k'} - f_0(x_{k'}) \} g(x_k) g(x_{k'}).$$

Since $X_t$ is stationary and, by C.3, it satisfies a first order covariance summability condition,
then by Theorems 5.2.2 and 5.2.4 of Brillinger (1981, pages 123 and 125) we have for some
$0 < C < \infty$,

    $$E I_k = f_0(x_k) + O(T^{-1})$$
    $$\mathrm{Cov}(I_k, I_{k'}) = C \delta_{kk'} f_0(x_k)^2 + O(T^{-1})$$

uniformly in $k$ and $k'$. Then it follows [...]
The latter inequality [...]
then immediate and for part [...]
$> 1/2$. (ii) is [...]
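Lemma 2 [...]

Proof: For part (i) let $g(x) = e^{-\theta(x)} u(x) v(x)$, then

    $$\{ [D^2 L_T(\theta) - D^2 L(\theta)] u v \}^2 \le \Big\{ \int g \, d(F - F_T) \Big\}^2 + \Big\{ \frac{1}{T} \sum_k (I_k - f_0(x_k)) g(x_k) \Big\}^2$$
    $$\le M T^{-1} \| g \|_a^2 + \Big\{ \frac{1}{T} \sum_k (I_k - f_0(x_k)) g(x_k) \Big\}^2,$$

where we have used part (i) of Lemma 1. To analyze the second term, expand $g$ in terms
of its Fourier series with coefficients $g_\nu = \langle g, \phi_\nu \rangle$, where $\phi_\nu$ are trigonometric polynomials,
and apply the Cauchy-Schwarz inequality to obtain

    $$\Big\{ \frac{1}{T} \sum_k (I_k - f_0(x_k)) g(x_k) \Big\}^2 \le M \Big\{ \sum_\nu (1 + \gamma_\nu^a) g_\nu^2 \Big\} \times \Big\{ \sum_\nu (1 + \gamma_\nu^a)^{-1} \Big[ \frac{1}{T} \sum_k (I_k - f_0(x_k)) \phi_\nu(x_k) \Big]^2 \Big\}$$
    $$\le \frac{M}{T} \| g \|_a^2 \Big\{ \sum_\nu (1 + \gamma_\nu^a)^{-1} \Big[ \frac{1}{\sqrt{T}} \sum_k (I_k - f_0(x_k)) \phi_\nu(x_k) \Big]^2 \Big\},$$

where we have used $\sup_x \phi_\nu(x) \le 1$. So, using part (ii) of Lemma 1, the random variable in
the braces, call it $A_T$, is positive and

    $$E(A_T) \le M \sum_\nu (1 + \gamma_\nu^a)^{-1} < \infty$$

since $\gamma_\nu^a \approx \nu^{2am}$ and $a > 1/(2m)$. Hence [...]

[...] $= O_p(1)$ such that for [...] a constant $M$ [...] $b \le$ [...], $\lambda > 0$,
(i) $K_3(\lambda, b)^2 \le M \lambda^{-(b + 1/(2m))}$
(ii) $K_{2T}(\lambda, b)^2 \le A_T M T^{-1} \lambda^{-(a + b + 1/(2m))}$
(iii) $K_{3T}(\lambda, b)^2 \le A_T M \lambda^{-(b + 1/(2m))} (1 + T^{-1} \lambda^{-a})$.

Then we have the following

Lemma [...]

Proof: The results follow from Lemma 2.1 of CO and Lemma (3) above.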
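Lemma 4 If $\theta_\lambda \in$ [...] there is $M$ such that, with $1/(2m) < a^* <$ [...] $1/m$, [...]
$0 \le b \le$ [...] a finite $K$, then

    [...] $\le$ [...] $\{ T^{-1} \| g \|$ [...] $\| g \|_{a^*}^2 \lambda^{(s/m - a^*)}$ [...] $\}$

where [...] is independent of $g$.

Proof: [...] term is [...]. For the second term substitute [...]
to obtain

    [...] $\le M T^{-1} \{ T^{-1} \| g \|_{1/m}^2 + \lambda^{(s/m - a^*)} \| g \|_{a^*}^2 \}$.

Note that $T^{-1} \ge T^{-2a^*m}$ since $a^* > 1/(2m)$. The last term

    $$M T^{-1} \Big\{ \frac{1}{T} \sum_k g(x_k)^2 e^{-2\theta_\lambda(x_k)} \Big\} \le M T^{-1} \Big\{ \frac{1}{T} \sum_k g(x_k)^2 \Big\} \le M \{ \| g \|_0^2 + T^{-2ma^*} \| g \|_{a^*}^2 \}.$$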
Now define the following linear approximations for $\theta$ in the neighborhood $N_{\theta_0}$:

    $$\bar{\theta}_\lambda - \theta_0 = -G_\lambda(\theta_0)^{-1} Z_\lambda(\theta_0),$$
    $$\bar{\theta}_{T\lambda} - \theta_\lambda = -G_\lambda(\theta_\lambda)^{-1} Z_{T\lambda}(\theta_\lambda),$$

where $\theta_\lambda$ is in the neighborhood $N_{\theta_0}$. The following result can be proved using the same
arguments leading to Theorems 4.1 and 4.2 of CO. The assumptions A.5 and A.6 of CO are
then verified in view of Lemma 4.

Lemma 5 For $0 \le b \le$ [...] and some constant $M$ we have

    [...] $\| \theta_0 \|^2_{[\ldots]}$. (21)

If $\| \theta_\lambda - \theta_0 \|$ [...] is a constant, for $\lambda = \lambda_T$ as in the theorem [...]
References
[1] Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. San Francisco: Holden-Day.
[2] Beltrao, K. I. and Bloomfield, P. (1987). Determining the bandwidth of a kernel spectrum
estimate. J. Time Ser. Anal., 8: 21-38.
[3] Chow, Y. and Grenander, U. (1985). A sieve method for spectral density. Ann. Statist.,
13: 998-1010.
[4] Cogburn, R. and Davis, H. T. (1974). Periodic splines and spectral estimation. Ann.
Statist., 2: 1108-1126.
[5] Cox, D. D. and O'Sullivan, F. (1990). Asymptotic analysis of penalized likelihood-type
estimators. Ann. Statist., 18: 1676-1695.
[6] Cox, D. D. and O'Sullivan, F. (1989). Generalized nonparametric regression via penalized
likelihood. Tech. Report No. 170, Dep. Statist., U. Washington, Seattle.
[7] Davies, R. B. (1973). Asymptotic inference in stationary Gaussian time series. Adv.
Appl. Prob., 5: 469-497.
[8] Good, I. J. and Gaskins, R. A. (1971). Nonparametric roughness penalties for probability
densities. Biometrika, 58: 255-277.
[9] Hannan, E. J. (1973). The asymptotic theory of linear time-series models. J. Appl.
Prob., 10: 130-145.
[10] Hurvich, C. M. (1985). Data-driven choice of a spectrum estimate: extending the
applicability of cross-validation methods. J. Amer. Statist. Assoc., 80: 933-940.
[...] McCullagh, P. and Nelder, J. A. [...]. Generalized Linear Models. London: Chapman
and Hall.
[14] O'Sullivan, F. (1988). Fast computation of fully automated log-density and log-hazard
estimators. SIAM J. Sci. Statist. Comput., 9: 363-379.
[15] Shumway, R. H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs:
Prentice Hall.
[16] Silverman, B. W. (1982). On the estimation of a probability density function by the
maximum penalized likelihood method. Ann. Statist., 10: 795-810.
[17] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric
regression curve fitting. J. Roy. Statist. Soc., Ser. B, 47: 1-52.
[18] Wahba, G. (1980). Automatic smoothing of the log periodogram. J. Amer. Statist.
Assoc., 75: 122-132.
[19] Whittle, P. (1962). Gaussian estimation in stationary time series. Bull. Inst. Internat.
Statist., 39: 105-129.
[20] Wichmann, B. A. and Hill, I. D. (1982). An efficient and portable pseudo-random number
generator. Appl. Statist., 31: 188-190.
LIST OF FIGURES
Figure 1: [...] simulated AR(3) [...] are estimates [...] MA(4) [...] true integrated squared
error [...] a realization is [...] constant is expected to [...] true spectrum [...] process
[...] are similar to [...] process.
Figure 2: [...] $T$ [...] The scattered points [...]. Log of simple average periodogram
is shown by the wiggly dotted [...] of log spectrum [...] same bandwidth as [...]
are $I_k +$ [...]
Figure 3: The PMLE is better than the SLP or log of smoothed periodogram (LSP).
Note: P=PMLE, SL=SLP and LS=LSP. (a) Each boxplot summarizes 500 Monte
Carlo replications for the AR(3) process. For each replication we record the true ISE
of the best PMLE, SLP and LSP given the data; (b) The same as (a), except here
we summarize the true ISE of the automatic PMLE and SLP. We do not develop an
automatic procedure for LSP; (c) The same as (a) for the MA(4) process; (d) The
same as (b) for the MA(4) process.
[Figure 1, panels (a)-(d): AR(3) and MA(4). Horizontal axes: Effective Bandwidth and Frequency. Curve labels: True, PMLE, SLP.]
[Figure 2, panels (a)-(b). Horizontal axes: Effective Bandwidth and Frequency. Curve labels: LSP, PMLE, SLP.]
[Figure 3, panels (a)-(d): boxplots of ISE for P, SL and LS, grouped by sample size T.]