David R. Brillinger, Time Series: Data Analysis and Theory, 2001
TRANSCRIPT
Time Series
SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Ingram Olkin, Stanford University
Stanley Richardson, University of Edinburgh
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht
Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
*First time in print.
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Témam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Time Series: Data Analysis and Theory

David R. Brillinger
University of California at Berkeley
Berkeley, California

siam
Society for Industrial and Applied Mathematics
Philadelphia
Copyright © 2001 by the Society for Industrial and Applied Mathematics.
This SIAM edition is an unabridged republication of the work first published by Holden-Day, Inc., San Francisco, 1981.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.
Library of Congress Cataloging-in-Publication Data
Brillinger, David R.
Time series: data analysis and theory / David R. Brillinger.
p. cm. -- (Classics in applied mathematics ; 36)
"This SIAM edition is an unabridged republication of the work first published by Holden-Day, Inc., San Francisco, 1981" -- T.p. verso.
ISBN 0-89871-501-6 (pbk.)
1. Time-series analysis. 2. Fourier transformations. I. Title. II. Series.
QA280 .B74 2001
519.5'5--dc21
2001034170
Figure 1.1.3 reprinted with permission from E. W. Carpenter, "Explosion Seismology," Science, 147:363-373, 22 January 1965. Copyright 1965 by the American Association for the Advancement of Science.
siam is a registered trademark.
To My Family
CONTENTS
Preface to the Classics Edition
Preface to the Expanded Edition
Preface to the First Edition
1 The Nature of Time Series and Their Frequency Analysis
1.1 Introduction
1.2 A Reason for Harmonic Analysis
1.3 Mixing
1.4 Historical Development
1.5 The Uses of the Frequency Analysis
1.6 Inference on Time Series
1.7 Exercises
2 Foundations
2.1 Introduction
2.2 Stochastics
2.3 Cumulants
2.4 Stationarity
2.5 Second-Order Spectra
2.6 Cumulant Spectra of Order k
2.7 Filters
2.8 Invariance Properties of Cumulant Spectra
2.9 Examples of Stationary Time Series
2.10 Examples of Cumulant Spectra
2.11 The Functional and Stochastic Approaches to Time Series Analysis
2.12 Trends
2.13 Exercises
3 Analytic Properties of Fourier Transforms and Complex Matrices
3.1 Introduction
3.2 Fourier Series
3.3 Convergence Factors
3.4 Finite Fourier Transforms and Their Properties
3.5 The Fast Fourier Transform
3.6 Applications of Discrete Fourier Transforms
3.7 Complex Matrices and Their Extremal Values
3.8 Functions of Fourier Transforms
3.9 Spectral Representations in the Functional Approach to Time Series
3.10 Exercises
4 Stochastic Properties of Finite Fourier Transforms
4.1 Introduction
4.2 The Complex Normal Distribution
4.3 Stochastic Properties of the Finite Fourier Transform
4.4 Asymptotic Distribution of the Finite Fourier Transform
4.5 Probability 1 Bounds
4.6 The Cramér Representation
4.7 Principal Component Analysis and its Relation to the Cramér Representation
4.8 Exercises
5 The Estimation of Power Spectra
5.1 Power Spectra and Their Interpretation
5.2 The Periodogram
5.3 Further Aspects of the Periodogram
5.4 The Smoothed Periodogram
5.5 A General Class of Spectral Estimates
5.6 A Class of Consistent Estimates
5.7 Confidence Intervals
5.8 Bias and Prefiltering
5.9 Alternate Estimates
5.10 Estimating the Spectral Measure and Autocovariance Function
5.11 Departures from Assumptions
5.12 The Uses of Power Spectrum Analysis
5.13 Exercises
6 Analysis of a Linear Time Invariant Relation Between a Stochastic Series and Several Deterministic Series
6.1 Introduction
6.2 Least Squares and Regression Theory
6.3 Heuristic Construction of Estimates
6.4 A Form of Asymptotic Distribution
6.5 Expected Values of Estimates of the Transfer Function and Error Spectrum
6.6 Asymptotic Covariances of the Proposed Estimates
6.7 Asymptotic Normality of the Estimates
6.8 Estimating the Impulse Response
6.9 Confidence Regions
6.10 A Worked Example
6.11 Further Considerations
6.12 A Comparison of Three Estimates of the Impulse Response
6.13 Uses of the Proposed Technique
6.14 Exercises
7 Estimating the Second-Order Spectra of Vector-Valued Series
7.1 The Spectral Density Matrix and its Interpretation
7.2 Second-Order Periodograms
7.3 Estimating the Spectral Density Matrix by Smoothing
7.4 Consistent Estimates of the Spectral Density Matrix
7.5 Construction of Confidence Limits
7.6 The Estimation of Related Parameters
7.7 Further Considerations in the Estimation of Second-Order Spectra
7.8 A Worked Example
7.9 The Analysis of Series Collected in an Experimental Design
7.10 Exercises
8 Analysis of a Linear Time Invariant Relation Between Two Vector-Valued Stochastic Series
8.1 Introduction
8.2 Analogous Multivariate Results
8.3 Determination of an Optimum Linear Filter
8.4 Heuristic Interpretation of Parameters and Construction of Estimates
8.5 A Limiting Distribution for Estimates
8.6 A Class of Consistent Estimates
8.7 Second-Order Asymptotic Moments of the Estimates
8.8 Asymptotic Distribution of the Estimates
8.9 Confidence Regions for the Proposed Estimates
8.10 Estimation of the Filter Coefficients
8.11 Probability 1 Bounds
8.12 Further Considerations
8.13 Alternate Forms of Estimates
8.14 A Worked Example
8.15 Uses of the Analysis of this Chapter
8.16 Exercises
9 Principal Components in the Frequency Domain
9.1 Introduction
9.2 Principal Component Analysis of Vector-Valued Variates
9.3 The Principal Component Series
9.4 The Construction of Estimates and Asymptotic Properties
9.5 Further Aspects of Principal Components
9.6 A Worked Example
9.7 Exercises
10 The Canonical Analysis of Time Series
10.1 Introduction
10.2 The Canonical Analysis of Vector-Valued Variates
10.3 The Canonical Variate Series
10.4 The Construction of Estimates and Asymptotic Properties
10.5 Further Aspects of Canonical Variates
10.6 Exercises
Proofs of Theorems
References
Notation Index
Author Index
Subject Index
Addendum: Fourier Analysis of Stationary Processes
PREFACE TO THE CLASSICS EDITION
"One can FT anything—often meaningfully."—John W. Tukey
John Tukey made this remark after my book had been published, but it is surely the motif of the work of the book. In fact the preface of the original book states that

The reader will note that the various statistics presented are immediate functions of the discrete Fourier transforms of the observed values of the time series. Perhaps this is what characterizes the work of this book. The discrete Fourier transform is given such prominence because it has important empirical and mathematical properties. Also, following the work of Cooley and Tukey (1965), it may be computed rapidly.
The book was finished in mid-1972. The field has moved on from its place then. Some of the areas of particular development include the following.
I. Limit theorems for empirical Fourier transforms.
Many of the techniques based on the Fourier transform of a stretch of time series are founded on limit or approximation theorems. Examples may be found in Brillinger (1983). There have been developments to more abstract types of processes: see, for example, Brillinger (1982, 1991). One particular type of development concerns distributions with long tails; see Freedman and Lane (1981). Another type of extension concerns series that have so-called long memory. The large sample distribution of the Fourier transform values in this case is developed in Rosenblatt (1981), Yajima (1989), and Pham and Guegan (1994).
II. Tapering.
The idea of introducing convergence factors into a Fourier approximation has a long history. In the time series case, this is known as tapering. Surprising properties continue to be found; see Dahlhaus (1985).
III. Finite-dimensional parameter estimation.
Dzhaparidze (1986) develops in detail Whittle's method of Gaussian or approximate likelihood estimation. Brillinger (1985) generalizes this to the third-order case in the tool of bispectral fitting. Terdik (1999) develops theoretical properties of that procedure.
IV. Computation.
Time series researchers were astonished in the early 1980s to learn that the fast Fourier transform algorithms had been anticipated many years earlier by K. F. Gauss. The story is told in Heideman et al. (1985). There have since been extensions to the cases of a prime number of observations (see Anderson and Dillon (1996)) and to the case of unequally spaced time points (see Nguyen and Liu (1999)).
V. General methods and examples.
A number of applications to particular physical circumstances have been made of Fourier inference; see the paper by Brillinger (1999) and the book by Bloomfield (2000).
D. R. B.
Berkeley, California
December 2000
ANDERSON, C., and DILLON, M. (1996). "Rapid computation of the discrete Fourier transform." SIAM J. Sci. Comput. 17:913-919.
BLOOMFIELD, P. (2000). Fourier Analysis of Time Series: An Introduction. Second Edition. New York: Wiley.
BRILLINGER, D. R. (1982). "Asymptotic normality of finite Fourier transforms of stationary generalized processes." J. Multivariate Anal. 12:64-71.
BRILLINGER, D. R. (1983). "The finite Fourier transform of a stationary process." In Time Series in the Frequency Domain, Handbook of Statist. 3, Eds. D. R. Brillinger and P. R. Krishnaiah, pp. 21-37. Amsterdam: Elsevier.
BRILLINGER, D. R. (1985). "Fourier inference: some methods for the analysis of array and nongaussian series data." Water Resources Bulletin 21:743-756.
BRILLINGER, D. R. (1991). "Some asymptotics of finite Fourier transforms of a stationary p-adic process." J. Combin. Inform. System Sci. 16:155-169.
BRILLINGER, D. R. (1999). "Some examples of empirical Fourier analysis in scientific problems." In Asymptotics, Nonparametrics and Time Series, Ed. S. Ghosh, pp. 1-36. New York: Dekker.
DAHLHAUS, R. (1985). "A functional limit theorem for tapered empirical spectral functions." Stochastic Process. Appl. 19:135-149.
DZHAPARIDZE, K. (1986). Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series. New York: Springer.
FREEDMAN, D., and LANE, D. (1981). "The empirical distribution of the Fourier coefficients of a sequence of independent, identically distributed long-tailed random variables." Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 58:21-40.
HEIDEMAN, M. T., JOHNSON, D. H., and BURRUS, C. S. (1985). "Gauss and the history of the fast Fourier transform." Arch. Hist. Exact Sci. 34:265-277.
NGUYEN, N., and LIU, Q. H. (1999). "The regular Fourier matrices and nonuniform fast Fourier transforms." SIAM J. Sci. Comput. 21:283-293.
PHAM, D. T., and GUEGAN, D. (1994). "Asymptotic normality of the discrete Fourier transform of long memory series." Statist. Probab. Lett. 21:299-309.
ROSENBLATT, M. (1981). "Limit theorems for Fourier transforms of functionals of Gaussian sequences." Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 55:123-132.
TERDIK, G. (1999). Bilinear Stochastic Models and Related Problems of Nonlinear Time Series Analysis: A Frequency Domain Approach, Lecture Notes in Statist. 142. New York: Springer.
YAJIMA, Y. (1989). "A central limit theorem of Fourier transforms of strongly dependent stationary processes." J. Time Ser. Anal. 10:375-383.
PREFACE TO THE EXPANDED EDITION
The 1975 edition of Time Series: Data Analysis and Theory has been expanded to include the survey paper "Fourier Analysis of Stationary Processes." The intention of the first edition was to develop the many important properties and uses of the discrete Fourier transforms of the observed values of time series. The Addendum indicates the extension of the results to continuous series, spatial series, point processes and random Schwartz distributions. Extensions to higher-order spectra and nonlinear systems are also suggested.

The Preface to the 1975 edition promised a Volume Two devoted to the aforementioned extensions. The author found that there was so much existing material, and developments were taking place so rapidly in those areas, that whole volumes could be devoted to each. He chose to concentrate on research, rather than exposition.

From the letters that he has received the author is convinced that his intentions with the first edition have been successfully realized. He thanks those who wrote for doing so.
D. R. B.
PREFACE TO THE FIRST EDITION
The initial basis of this work was a series of lectures that I presented to the members of Department 1215 of Bell Telephone Laboratories, Murray Hill, New Jersey, during the summer of 1967. Ram Gnanadesikan of that Department encouraged me to write the lectures up in a formal manner. Many of the worked examples that are included were prepared that summer at the Laboratories using their GE 645 computer and associated graphical devices.

The lectures were given again, in a more elementary and heuristic manner, to graduate students in Statistics at the University of California, Berkeley, during the Winter and Spring Quarters of 1968 and later to graduate students in Statistics and Econometrics at the London School of Economics during the Lent Term, 1969. The final manuscript was completed in mid-1972. It is hoped that the references provided are near complete for the years before then.

I feel that the book will prove useful as a text for graduate level courses in time series analysis and also as a reference book for research workers interested in the frequency analysis of time series. Throughout, I have tried to set down precise definitions and assumptions whenever possible. This undertaking has the advantage of providing a firm foundation from which to reach for real-world applications. The results presented are generally far from the best possible; however, they have the advantage of flowing from a single important mixing condition that is set down early and gives continuity to the book.

Because exact results are simply not available, many of the theorems of the work are asymptotic in nature. The applied worker need not be put off by this. These theorems have been set down in the spirit that the indicated asymptotic moments and distributions may provide reasonable approximations to the desired finite sample results. Unfortunately not too much work has gone into checking the accuracy of the asymptotic results, but some references are given.

The reader will note that the various statistics presented are immediate functions of the discrete Fourier transforms of the observed values of the time series. Perhaps this is what characterizes the work of this book. The discrete Fourier transform is given such prominence because it has important empirical and mathematical properties. Also, following the work of Cooley and Tukey (1965), it may be computed rapidly. The definitions, procedures, techniques, and statistics discussed are, in many cases, simple extensions of existing multiple regression and multivariate analysis techniques. This pleasant state of affairs is indicative of the widely pervasive nature of the important statistical and data analytic procedures.
The work is split into two volumes. This volume is, in general, devoted to aspects of the linear analysis of stationary vector-valued time series. Volume Two, still in preparation, is concerned with nonlinear analysis and the extension of the results of this volume to stationary vector-valued continuous series, spatial series, and vector-valued point processes.

Dr. Colin Mallows of Bell Telephone Laboratories provided the author with detailed comments on a draft of this volume. Professor Ingram Olkin of Stanford University also commented on the earlier chapters of that draft. Mr. Jostein Lillestøl read through the galleys. Their suggestions were most helpful.

I learned time series analysis from John W. Tukey. I thank him now for all the help and encouragement he has provided.
D.R.B.
1
THE NATURE OF TIME SERIES AND THEIR FREQUENCY ANALYSIS
1.1 INTRODUCTION
In this work we will be concerned with the examination of r vector-valued functions

X(t) = [X_1(t), . . . , X_r(t)]'     t = 0, ±1, ±2, . . .

where X_j(t), j = 1, . . . , r is real-valued and t takes on the values 0, ±1, ±2, . . . . Such an entity of measurements will be referred to as an r vector-valued time series. The index t will often refer to the time of recording of the measurements.
An example of a vector-valued time series is the collection of mean monthly temperatures recorded at scattered locations. Figure 1.1.1 gives such a series for the locations listed in Table 1.1.1. Figure 1.1.2 indicates the positions of these locations. Such data may be found in World Weather Records (1965). This series was provided by J. M. Craddock, Meteorological Office, Bracknell. Another example of a vector-valued time series is the set of signals recorded by an array of seismometers in the aftermath of an earthquake or nuclear explosion. These signals are discussed in Keen et al. (1965) and Carpenter (1965). Figure 1.1.3 presents an example of such a record.
Figure 1.1.1 Monthly mean temperatures in °C at 14 stations for the years 1920-1930.
Table 1.1.1 Stations and Time Periods of Temperature Data Used in Worked Examples

Index   City         Period Available
1       Vienna       1780-1950
2       Berlin       1769-1950
3       Copenhagen   1798-1950
4       Prague       1775-1939
5       Stockholm    1756-1960
6       Budapest     1780-1947
7       De Bilt      1711-1960
8       Edinburgh    1764-1959
9       Greenwich    1763-1962
10      New Haven    1780-1950
11      Basel        1755-1957
12      Breslau      1792-1950
13      Vilna        1781-1938
14      Trondheim    1761-1946
Figure 1.1.2 Locations of the temperature stations (except New Haven, U.S.A.).
These examples are taken from the physical sciences; however, the social sciences also lead to the consideration of vector-valued time series. Figure 1.1.4 is a plot of exports from the United Kingdom separated by destination during the period 1958-1968. The techniques discussed in this work will sometimes be useful in the analysis of such a series although the results obtained are not generally conclusive due to a scarcity of data and departure from assumptions.

An inspection of the figures suggests that the individual component series are quite strongly interrelated. Much of our concern in this work will center on examining interrelations of component series. In addition there are situations in which we are interested in a single series on its own. For example, Singleton and Poulter (1967) were concerned with the call of a male killer whale and Godfrey (1965) was concerned with the quantity of cash held within the Federal Reserve System for the purpose of meeting interbank check-handling obligations each month. Figure 1.1.5 is a graph of the annual mean sunspot numbers for the period 1760-1965; see Waldmeier (1961). This series has often been considered by statisticians; see Yule (1927), Whittle (1954), Brillinger and Rosenblatt (1967b). Generally speaking it will be enough to consider single component series as particular cases
Figure 1.1.3 Signals recorded by an array of seismometers at the time of an event.
of vector-valued series corresponding to r = 1. However, it is typically much more informative if we carry out a vector analysis, and it is wise to search out series related to any single series and to include them in the analysis.
Figure 1.1.4 Value of United Kingdom exports by destination for 1958-1968.
Figure 1.1.5 Annual mean sunspot numbers for 1760-1965.
1.2 A REASON FOR HARMONIC ANALYSIS
The principal mathematical methodology we will employ in our analysis of time series is harmonic analysis. This is because of our decision to restrict consideration to series resulting from experiments not tied to a specific time origin or, in other words, experiments invariant with respect to translations of time. This implies, for example, that the proportion of the values X(t), t > u, falling in some interval I, should be approximately the same as the proportion of the values X(t), t > u + v, falling in I, for all v.

The typical physical experiment appears to possess, in large part, this sort of time invariance. Whether a physicist commenced to measure the force of gravity one day or the next does not seem to matter for most purposes. A cursory examination of the series of the previous section suggests: the temperature series of Figure 1.1.1 are reasonably stable in time; portions of the seismic series appear stable; the export series do not appear stationary; and the sunspot series appear possibly so. The behavior of the export series is typical of that of many socioeconomic series. Since people learn from the past and hence alter their behavior, series relating to them are not generally time invariant. Later we will discuss methods that may allow removing a stationary component from a nonstationary series; however, the techniques of this work are principally directed toward the analysis of series stable in time.
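The translation-invariance property described above can be checked numerically on a simulated series. The sketch below is not from the text: the AR(1) model, its coefficient, and the interval I are invented for illustration only.

```python
import numpy as np

# Simulate a stationary AR(1) series as a stand-in for a time-invariant
# experiment: X(t) = 0.6 X(t-1) + shock(t).
rng = np.random.default_rng(1)
T = 20000
x = np.empty(T)
x[0] = rng.standard_normal()
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

# Proportion of values falling in a fixed interval I = (-1, 1), computed
# over two different stretches of time; stationarity suggests these agree.
I_low, I_high = -1.0, 1.0
p1 = np.mean((x[:10000] > I_low) & (x[:10000] < I_high))
p2 = np.mean((x[10000:] > I_low) & (x[10000:] < I_high))
```

For a series that is stable in time, p1 and p2 agree to within sampling error; for the export series of Figure 1.1.4 such a comparison would fail.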
The requirement of elementary behavior under translations in time has certain analytic implications. Let f(t) be a real or complex-valued function defined for t = 0, ±1, . . . . If we require

f(t + u) = f(t)     t, u = 0, ±1, . . .     (1.2.1)

then clearly f(t) is constant. We must therefore be less stringent than expression (1.2.1) in searching for functions behaving simply under time translations. Let us require instead
f(t + u) = C_u f(t)     t, u = 0, ±1, . . .     (1.2.2)

Setting u = 1 and proceeding recursively gives

f(t) = C_1^t f(0)     t = 0, 1, 2, . . . .

In either case, if we write C_1 = exp (α), α real or complex, then we see that the general solution of expression (1.2.2) may be written

f(t) = f(0) exp (αt)

and that C_u = exp (αu). The bounded solutions of expression (1.2.2) are seen to occur for α = iλ, λ real, where i = √−1. In summary, if we look for functions behaving simply with respect to time translation, then we are led to the sinusoids exp (iλt), λ real; the parameter λ is called the frequency of the sinusoid. If in fact

f(t) = Σ_j c_j exp (iλ_j t)     (1.2.6)

then

f(t + u) = Σ_j C_j exp (iλ_j t)

with C_j = c_j exp (iλ_j u). In other words, if a function of interest is a sum of cosinusoids, then its behavior under translations is also easily described. We have, therefore, in the case of experiments leading to results that are deterministic functions, been led to functions that can be developed in the manner of (1.2.6). The study of such functions is the concern of harmonic or Fourier analysis; see Bochner (1959), Zygmund (1959), Hewitt and Ross (1963), Wiener (1933), Edwards (1967).

In Section 2.7 we will see that an important class of operations on time series, filters, is also most easily described and investigated through harmonic analysis.

With experiments that result in random or stochastic functions, X(t), time invariance leads us to investigate the class of experiments such that {X(t_1), . . . , X(t_k)} has the same probability structure as {X(t_1 + u), . . . , X(t_k + u)} for all u and t_1, . . . , t_k. The results of such experiments are called stationary stochastic processes; see Doob (1953), Wold (1938), and Khintchine (1934).

1.3 MIXING

A second important requirement that we will place upon the time series that we consider is that they have a short span of dependence. That is, the
measurements X(t) and X(s) are becoming unrelated or statistically independent of each other as t − s → ∞.

This requirement will later be set down in a formal manner with Assumptions 2.6.1 and 2.6.2(l). It allows us to define relevant population parameters and implies that various estimates of interest are asymptotically Gaussian in the manner of the central limit theorem.

Many series that are reasonably stationary appear to satisfy this sort of requirement; possibly because as time progresses they are subjected to random shocks, unrelated to what has gone before, and these random shocks eventually form the prime content of the series.
A requirement that a time series have a weak memory is generally referred to as a mixing assumption; see Rosenblatt (1956b).
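The short-memory idea can be illustrated numerically; the series built from random shocks below is an invented example, not one from the text.

```python
import numpy as np

# A series driven by independent random shocks: values far apart in time
# share few shocks, so their covariance falls toward zero with the lag.
rng = np.random.default_rng(2)
T = 50000
shocks = rng.standard_normal(T)
x = np.empty(T)
x[0] = shocks[0]
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + shocks[t]

def sample_autocov(x, s):
    """Sample covariance between X(t) and X(t + s)."""
    xc = x - x.mean()
    return np.mean(xc[:-s] * xc[s:])

# Covariance at lag 20 is a small fraction of the covariance at lag 1.
c1, c20 = sample_autocov(x, 1), sample_autocov(x, 20)
```

Here the lag-20 covariance is an order of magnitude smaller than the lag-1 covariance, the behavior the mixing assumption formalizes.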
1.4 HISTORICAL DEVELOPMENT
The basic tool that we will employ, in the analysis of time series, is the finite Fourier transform of an observed section of the series.

The taking of the Fourier transform of an empirical function was proposed as a means of searching for hidden periodicities in Stokes (1879). Schuster (1894), (1897), (1900), (1906a), (1906b), in order to avoid the annoyance of considering relative phases, proposed the consideration of the modulus-squared of the finite Fourier transform. He called this statistic the periodogram. His motivation was also the search for hidden periodicities.
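Schuster's statistic is easy to sketch for a synthetic series; the series, sample size, and the (2πT)⁻¹ normalization below are illustrative choices of ours, not prescriptions from the text.

```python
import numpy as np

# Synthetic series with a hidden periodicity at the Fourier frequency
# 2*pi*10/T, buried in noise.
T = 256
t = np.arange(T)
rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * 10 * t / T) + 0.5 * rng.standard_normal(T)

# Finite Fourier transform at the Fourier frequencies 2*pi*k/T.
d = np.fft.fft(x)

# Periodogram: modulus-squared of the finite Fourier transform,
# here scaled by (2*pi*T)^(-1).
I = (np.abs(d) ** 2) / (2 * np.pi * T)

# The largest ordinate over 0 < k <= T/2 recovers the hidden frequency.
k_hat = int(np.argmax(I[1 : T // 2 + 1])) + 1
```

The peak of the periodogram sits at k = 10, the frequency of the hidden cosine, which is exactly the use Schuster had in mind.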
The consideration of the periodogram for general stationary processes was initiated by Slutsky (1929, 1934). He developed many of the statistical properties of the periodogram under a normal assumption and a mixing assumption. Concurrently Wiener (1930) was proposing a very general form of harmonic analysis for time series and beginning a study of vector processes.

The use of harmonic analysis as a tool for the search of hidden periodicities was eventually replaced by its much more important use for inquiring into relations between series; see Wiener (1949) and Press and Tukey (1956). An important statistic in this case is the cross-periodogram, a product of the finite Fourier transforms of two series. It is inherent in Wiener (1930) and Goodman (1957); the term cross-periodogram appears in Whittle (1953).

The periodogram and cross-periodogram are second-order statistics and thus are especially important in the consideration of Gaussian processes. Higher order analogs are required for the consideration of various aspects of non-Gaussian series. The third-order periodogram, a product of three finite Fourier transforms, appears in Rosenblatt and Van Ness (1965), and the kth order periodogram, a product of k finite Fourier transforms, in Brillinger and Rosenblatt (1967a, b).
The instability of periodogram-type statistics is immediately apparent when they are calculated from empirical functions; see Kendall (1946), Wold (1965), and Chapter 5 of this text. This instability led Daniell (1946) to propose a numerical smoothing of the periodogram which has now become basic to most forms of frequency analysis.
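Daniell's idea can be sketched as a simple moving average over neighboring periodogram ordinates; the window width and circular wrap-around handling here are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def daniell_smooth(periodogram, m):
    """Average each ordinate with its m neighbors on either side
    (a moving average over 2*m + 1 Fourier frequencies), wrapping
    circularly since the periodogram is periodic in frequency."""
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    extended = np.concatenate([periodogram[-m:], periodogram, periodogram[:m]])
    return np.convolve(extended, kernel, mode="valid")

# Averaging damps the wild ordinate-to-ordinate fluctuations of the raw
# periodogram while preserving its overall level.
raw = np.array([1.0, 3.0, 2.0, 4.0, 0.0, 2.0])
smooth = daniell_smooth(raw, 1)
```

Each smoothed value is the mean of three adjacent raw ordinates, which reduces the variance of the estimate at the price of some bias, the trade-off taken up in Chapter 5.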
Papers and books, historically important in the development of the mathematical foundations of the harmonic analysis of time series, include: Slutsky (1929), Wiener (1930), Khintchine (1934), Wold (1938), Kolmogorov (1941a, b), Cramér (1942), Blanc-Lapierre and Fortet (1953), and Grenander (1951a).

Papers and books, historically important in the development of the empirical harmonic analysis of time series, include: Schuster (1894, 1898), Tukey (1949), Bartlett (1948), Blackman and Tukey (1958), Grenander and Rosenblatt (1957), Bartlett (1966), Hannan (1960), Stumpff (1937), and Chapman and Bartels (1951).

Wold (1965) is a bibliography of papers on time series analysis. Burkhardt (1904) and Wiener (1938) supply a summary of the very early work. Simpson (1966) and Robinson (1967) provide many computer programs useful in analyzing time series.
1.5 THE USES OF THE FREQUENCY ANALYSIS
This section contains a brief survey of some of the fields in which spectral analysis has been employed. There are three principal reasons for using spectral analysis in the cases to be presented: (i) to provide useful descriptive statistics, (ii) as a diagnostic tool to indicate which further analyses might be relevant, and (iii) to check postulated theoretical models. Generally, the success experienced with the technique seems to vary directly with the length of series available for analysis.

Physics If the spectral analysis of time series is viewed as the study of the individual frequency components of some time series of interest, then the first serious application of this technique may be regarded as having occurred in 1664 when Newton broke sunlight into its component parts by passing it through a prism. From this experiment has grown the subject of spectroscopy (Meggers (1946), McGucken (1970), and Kuhn (1962)), in which there is investigation of the distribution of the energy of a radiation field as a function of frequency. (This function will later be called a power spectrum.) Physicists have applied spectroscopy to identifying chemical elements, to determining the direction and rate of movement of celestial bodies, and to testing general relativity. The spectrum is an important parameter in the description of color; Wright (1958).
The frequency analysis of light is discussed in detail in Born and Wolf (1959); see also Schuster (1904), Wiener (1953), Jennison (1961), and Sears (1949).
Power spectra have been used frequently in the fields of turbulence and fluid mechanics; see Meecham and Siegel (1964), Kampé de Fériet (1954), Hopf (1952), Burgers (1948), Friedlander and Topper (1961), and Batchelor (1960). Here one typically sets up a model leading to a theoretical power spectrum and checks it empirically. Early references are given in Wiener (1930).
Electrical Engineering Electrical engineers have long been concerned with the problem of measuring the power in various frequency bands of some electromagnetic signal of interest. For example, see Pupin (1894), Wegel and Moore (1924), and Van der Pol (1930). Later, the invention of radar gave stimulus to the problem of signal detection, and frequency analysis proved a useful tool in its investigation; see Wiener (1949), Lee and Wiesner (1950), and Solodovnikov (1960). Frequency analysis is now firmly involved in the areas of coding, information theory, and communications; see Gabor (1946), Middleton (1960), and Pinsker (1964). In many of these problems, Maxwell's equations lead to an underlying model of some use.
Acoustics Frequency analysis has proved itself important in the field of acoustics. Here the power spectrum has generally played the role of a descriptive statistic. For example, see Crandall and Sacia (1924), Beranek (1954), and Majewski and Hollien (1967). An important device in this connection is the sound spectrograph, which permits the display of time-dependent spectra; see Fehr and McGahan (1967). Another interesting device is described in Noll (1964).
Geophysics Tukey (1965a) has given a detailed description and bibliography of the uses of frequency analysis in geophysics; see also Tukey (1965b), Kinosita (1964), Sato (1964), Smith et al. (1967), Labrouste (1934), Munk and MacDonald (1960), Ocean Wave Spectra (1963), Haubrich and MacKenzie (1965), and various authors (1966). A recent dramatic example involves the investigation of the structure of the moon by the frequency analysis of seismic signals resulting from man-made impacts on the moon; see Latham et al. (1970).
Other Engineering Harmonic analysis has been employed in many areas of engineering other than electrical: for example, in aeronautical engineering, Press and Tukey (1956), Takeda (1964); in naval engineering, Yamanouchi (1961), Kawashima (1964); in hydraulics, Nakamura and Murakami (1964); and in mechanical engineering, Nakamura (1964), Kaneshige (1964), Crandall (1958), Crandall (1963). Civil engineers find spectral techniques useful in understanding the responses of buildings to earthquakes.
Medicine A variety of medical data is collected in the form of time series; for example, electroencephalograms and electrocardiograms. References to the frequency analysis of such data include: Alberts et al. (1965), Bertrand and Lacape (1943), Gibbs and Grass (1947), Suhara and Suzuki (1964), and Yuzuriha (1960). The correlation analysis of EEGs is discussed in Barlow (1967); see also Wiener (1957, 1958).
Economics Two books, Granger (1964) and Fishman (1969), have appeared on the application of frequency analysis to economic time series. Other references include: Beveridge (1921), Beveridge (1922), Nerlove (1964), Cootner (1964), Fishman and Kiviat (1967), Burley (1969), and Brillinger and Hatanaka (1970). Bispectral analysis is employed in Godfrey (1965).
Biology Frequency analysis has been used to investigate the circadian rhythms present in the behavior of certain plants and animals; for example, see Aschoff (1965), Chance et al. (1967), and Richter (1967). Frequency analysis is also useful in constructing models for human hearing; see Mathews (1963).
Psychology A frequency analysis of data resulting from psychological tests is carried out in Abelson (1953).
Numerical Analysis Spectral analysis has been used to investigate the independence properties of pseudorandom numbers generated by various recursive schemes; see Jagerman (1963) and Coveyou and MacPherson (1967).
1.6 INFERENCE ON TIME SERIES
The purpose of this section is to record the following fact, which the reader will soon note for himself in proceeding through this work: the theory and techniques employed in the discussion of time series statistics are entirely elementary. The basic means of constructing estimates is the method of moments. Asymptotic theory is heavily relied upon to provide justifications. Much of what is presented is a second-order theory and is therefore most suitable for Gaussian processes. Sufficient statistics, maximum likelihood statistics, and other important concepts of statistical inference are only barely mentioned.
A few attempts have been made to bring the concepts and methods of current statistical theory to bear on stationary time series; see Bartlett (1966), Grenander (1950), Slepian (1954), and Whittle (1952). Likelihood ratios have been considered in Striebel (1959), Parzen (1963), and Gikhman and Skorokhod (1966). General frameworks for time series analysis have been described in Rao (1963), Stigum (1967), and Rao (1966); see also Hajek (1962), Whittle (1961), and Arato (1961).
It should be pointed out that historically there have been two rather distinct approaches to the analysis of time series: the frequency or harmonic approach and the time domain approach. This work is concerned with the former, while the latter is exemplified by the work of Mann and Wald (1943), Quenouille (1957), Durbin (1960), Whittle (1963), and Box and Jenkins (1970). The differences between these two analyses are discussed in Wold (1963). With the appearance of the fast Fourier transform algorithm, however, it may be more efficient to carry out computations in the frequency domain even when the time domain approach is adopted; see Section 3.6, for example.
1.7 EXERCISES
1.7.1 If f(·) is complex valued and f(t_1 + u_1, ..., t_k + u_k) = C_{u_1,...,u_k} f(t_1, ..., t_k) for t_j, u_j = 0, ±1, ±2, ..., j = 1, ..., k, prove that f(t_1, ..., t_k) = f(0, ..., 0) exp{Σ_j a_j t_j} for some a_1, ..., a_k. See Aczel (1969).
1.7.2 If f(t) is complex valued, continuous, and f(t + u) = C_u f(t) for −∞ < t, u < ∞, prove that f(t) = f(0) exp{at} for some a.
1.7.3 If f(t) is r vector-valued, with complex components, and f(t + u) = C_u f(t) for t, u = 0, ±1, ±2, ... and C_u an r × r matrix function, prove that f(t) = C_1^t f(0) if Det{f(0), ..., f(r − 1)} ≠ 0. See Doeblin (1938) and Kirchener (1967).

1.7.4 Let W(α), −∞ < α < ∞, be an absolutely integrable function satisfying ∫ W(α) dα = 1. Let f(α), −∞ < α < ∞, be a bounded function continuous at α = λ. Show that ε⁻¹ ∫ W[ε⁻¹(λ − α)] dα = 1 and that ε⁻¹ ∫ W[ε⁻¹(λ − α)] f(α) dα → f(λ) as ε → 0.
1.7.5 Prove that for
1.7.6 Let X_1, ..., X_r be independent random variables with EX_j = μ and var X_j = σ_j². Consider linear combinations Y = Σ_j a_j X_j with Σ_j a_j = 1, so that EY = μ. Prove that var Y is minimized by the choice a_j = σ_j⁻² / Σ_k σ_k⁻², j = 1, ..., r.
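A quick numerical check of this exercise (the particular variances below are arbitrary choices for the sketch) compares the inverse-variance weights against randomly drawn competing weight vectors:

```python
import numpy as np

# Check of Exercise 1.7.6: among weights a_j summing to 1, the inverse-variance
# choice a_j = sigma_j^-2 / sum_k sigma_k^-2 minimizes
# var Y = sum_j a_j^2 sigma_j^2 for independent X_j.
sigma2 = np.array([1.0, 4.0, 0.25])          # the sigma_j^2 (illustrative)
a_opt = (1 / sigma2) / np.sum(1 / sigma2)    # inverse-variance weights

def var_Y(a):
    return float(np.sum(a**2 * sigma2))

rng = np.random.default_rng(0)
for _ in range(1000):                        # random competitors with sum a_j = 1
    a = rng.random(3)
    a /= a.sum()
    assert var_Y(a_opt) <= var_Y(a) + 1e-12
print(a_opt, var_Y(a_opt))                   # optimal weights and minimal variance
```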
1.7.7 Prove that Σ_{t=0}^{T−1} exp{i 2π s t / T} = T if s = 0, ±T, ±2T, ... and = 0 for other integral values of s.
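This orthogonality relation, used repeatedly in later chapters, can be verified directly (T = 7 is an arbitrary illustration):

```python
import numpy as np

# Check of Exercise 1.7.7: sum_{t=0}^{T-1} exp(i 2 pi s t / T) equals T when s
# is an integral multiple of T and 0 for all other integers s.
T = 7
for s in range(-2 * T, 2 * T + 1):
    total = sum(np.exp(2j * np.pi * s * t / T) for t in range(T))
    expected = T if s % T == 0 else 0
    assert abs(total - expected) < 1e-9
print("identity verified for T =", T)
```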
1.7.8 If X is a real-valued random variable with finite second moment and θ is real valued, prove that E(X − θ)² = var X + (EX − θ)².
1.7.9 Let l denote the space of two-sided sequences x = {x_t; t = 0, ±1, ±2, ...}. Let 𝒜 denote an operation on l that is linear [𝒜(αx + βy) = α𝒜x + β𝒜y for α, β scalars and x, y ∈ l] and time invariant [𝒜y = Y if 𝒜x = X, y_t = x_{t+u}, Y_t = X_{t+u} for some u = 0, ±1, ±2, ...]. Prove that there exists a function A(λ) such that (𝒜x)_t = A(λ) x_t if x_t = exp{iλt}.
1.7.10 Consider a sequence c_0, c_1, c_2, ..., its partial sums S_T = Σ_{t=0}^{T} c_t, and the Cesàro means

σ_T = (S_0 + S_1 + ··· + S_T) / (T + 1).

If S_T → S, prove that σ_T → S (as T → ∞); see Knopp (1948).
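Taking the Cesàro means in the standard form σ_T = (S_0 + ··· + S_T)/(T + 1) (an assumption of this sketch), the convergence can be illustrated numerically:

```python
# Numeric illustration of Exercise 1.7.10: if the partial sums S_T of a
# sequence converge to S, their running averages sigma_T converge to S too.
c = [0.5**t for t in range(2000)]     # c_t = 2^-t, so S_T -> S = 2
partial, running = [], 0.0
for c_t in c:
    running += c_t
    partial.append(running)
sigma = [sum(partial[:T + 1]) / (T + 1) for T in (9, 99, 1999)]
print(partial[-1], sigma)             # partial sums and Cesaro means approach 2
```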
1.7.11 Let (X, Y) be a vector-valued random variable with Y real-valued and EY² < ∞. Prove that the φ(X) with Eφ(X)² < ∞ that minimizes E[Y − φ(X)]² is given by φ(X) = E{Y | X}.
1.7.12 Show that for n = 1, 2, ...
and from this show
1.7.13 Show that the identity

Σ_{k=m}^{n} u_k v_k = Σ_{k=m}^{n−1} U_k (v_k − v_{k+1}) + U_n v_n − U_{m−1} v_m

holds, where 0 ≤ m ≤ n, U_k = u_0 + ··· + u_k (k ≥ 0), U_{−1} = 0. (Abel's transformation)
1.7.14 (a) Let f(x), 0 ≤ x ≤ 1, be integrable and have an integrable derivative f^(1)(x). Show that

with [y] denoting the integral part of y.

(b) Let f^(k)(x), k = 0, 1, 2, ... denote the kth derivative of f(x). Suppose f^(k)(x), 0 ≤ x ≤ 1, is integrable for k = 0, 1, 2, ..., K. Show that

where B_k(y) denotes the kth Bernoulli polynomial. (Euler-Maclaurin)
2
FOUNDATIONS
2.1 INTRODUCTION
In this chapter we present portions of both the stochastic and deterministic approaches to the foundations of time series analysis. The assumptions made in either approach will be seen to lead to the definition of similar parameters of interest, and the implications for practice are generally the same. In fact it will be shown that the two approaches are equivalent in a certain sense. An important part of this chapter will be to develop the invariance properties of the parameters of interest for a class of transformations of the series called filters. Proofs of the theorems and lemmas are given at the end of the book.
The notation that will be adopted throughout this text includes boldface letters A, B, which denote matrices. If a matrix A has entries A_jk we sometimes indicate it by [A_jk]. Given an r × s matrix A, its s × r transpose is denoted by A^τ, and the matrix whose entries are the complex conjugates of those of A is denoted by Ā. Det A denotes the determinant of the square matrix A; the trace of A is indicated by tr A. |A| denotes the sum of the absolute values of the entries of A, and I the identity matrix. An r vector is an r × 1 matrix.
We denote the expected value of a random variable X by EX generally, and sometimes by ave X. This will reduce the possibility of confusion in certain expressions. We denote the variance of X by var X. If (X,Y) is a bivariate random variable, we denote the covariance of X with Y by cov [X,Y]. We signify the correlation of X with Y by cor [X,Y].
If z is a complex number, we indicate its real part by Re z and its imaginary part by Im z. We therefore have the representation

z = Re z + i Im z.

We denote the modulus of z, [(Re z)² + (Im z)²]^{1/2}, by |z| and its argument, tan⁻¹ {Im z / Re z}, by arg z. If x and y are real numbers, we will write x = y (mod a)
when the difference x − y is an integral multiple of a.

The following functions will prove useful in our work: the Kronecker delta, δ{a}, equal to 1 when a = 0 and to 0 otherwise, and the Kronecker comb, equal to 1 when its argument is an integral multiple of 2π and to 0 otherwise.
Likewise the following generalized functions will be useful: the Dirac delta function, δ(α), −∞ < α < ∞, with the property

∫ δ(α) f(α) dα = f(0)

for all functions f(α) continuous at 0, and the Dirac comb

η(α) = Σ_{j=−∞}^{∞} δ(α − 2πj),

−∞ < α < ∞, with the property

∫ η(α) f(α) dα = Σ_{j=−∞}^{∞} f(2πj)

for all suitable functions f(α). These last functions are discussed in Lighthill (1958), Papoulis (1962), and Edwards (1967). Exercise 1.7.4 suggests that ε⁻¹ W(ε⁻¹ α), for small ε, provides an approximate Dirac delta function.
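The closing remark can be illustrated numerically; the triangular kernel below is an arbitrary choice of W with unit integral:

```python
import numpy as np

# Numeric illustration: with W absolutely integrable and of unit integral
# (a triangular kernel is assumed here purely for the sketch),
# eps^-1 W(eps^-1 a) acts as an approximate Dirac delta, so the integral of
# eps^-1 W(eps^-1 a) f(a) da tends to f(0) as eps -> 0.
def W(a):
    return np.maximum(1.0 - np.abs(a), 0.0)      # triangle on [-1, 1], integral 1

f = np.cos                                        # test function with f(0) = 1
a = np.linspace(-1.0, 1.0, 200001)
da = a[1] - a[0]
for eps in (0.5, 0.1, 0.01):
    val = np.sum(W(a / eps) * f(a) / eps) * da    # Riemann approximation
    print(eps, val)                               # tends to f(0) = 1
```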
2.2 STOCHASTICS
On occasion it may make sense to think of a particular r vector-valued time series X(t) as being a member of an ensemble of vector time series which are generated by some random scheme. We can denote such an ensemble by
{X(t,θ); θ ∈ Θ, t = 0, ±1, ±2, ...}, where θ denotes a random variable taking values in Θ. If X(t,θ) is a measurable function of θ, then X(t,θ) is a random variable and we can talk of its finite dimensional distributions, given by relations such as
and we can consider functionals such as

and

if the integrals involved exist. Once a θ has been generated (in accordance with its probability distribution), the function X(t,θ), with θ fixed, will be described as a realization, trajectory, or sample path of the time series.

Since there will generally be no need to include θ specifically as an argument in X(t,θ), we will henceforth denote X(t,θ) by X(t). X(t) will be called a time series, stochastic process, or random function.
The interested reader may refer to Cramér and Leadbetter (1967), Yaglom (1962), or Doob (1953) for more details of the probabilistic foundations of time series. The function c_a(t), defined in (2.2.2), is called the mean function of the time series X_a(t). The function c_aa(t_1,t_2), as derived from (2.2.4), is called the (auto)covariance function of X_a(t), and c_ab(t_1,t_2), defined in (2.2.4), is called the cross-covariance function of X_a(t) with X_b(t). c_a(t) will exist if and only if ave |X_a(t)| < ∞. By the Schwarz inequality we have
is called the (auto)correlation function of X_a(t) and

is called the cross-correlation function of X_a(t_1) with X_b(t_2). We will say that the series X_a(t) and X_b(t) are orthogonal if c_ab(t_1,t_2) = 0 for all t_1, t_2.
2.3 CUMULANTS
Consider for the present an r variate random variable (Y_1, ..., Y_r) with ave |Y_j|^r < ∞, j = 1, ..., r, where the Y_j are real or complex.
Definition 2.3.1 The rth order joint cumulant, cum (Y_1, ..., Y_r), of (Y_1, ..., Y_r) is given by

cum (Y_1, ..., Y_r) = Σ (−1)^(p−1) (p − 1)! (E Π_{j∈ν_1} Y_j) ··· (E Π_{j∈ν_p} Y_j),

where the summation extends over all partitions (ν_1, ..., ν_p), p = 1, ..., r, of (1, ..., r).
An important special case of this definition occurs when Y_j = Y, j = 1, ..., r. The definition then gives the cumulant of order r of a univariate random variable.
Theorem 2.3.1 cum (Y_1, ..., Y_r) is given by the coefficient of (i)^r t_1 ··· t_r in the Taylor series expansion of log (ave exp{i Σ_{j=1}^r Y_j t_j}) about the origin.
This last is sometimes taken as the definition of cum (Y_1, ..., Y_r). Properties of cum (Y_1, ..., Y_r) include:

(i) cum (a_1 Y_1, ..., a_r Y_r) = a_1 ··· a_r cum (Y_1, ..., Y_r) for a_1, ..., a_r constant
(ii) cum (Y_1, ..., Y_r) is symmetric in its arguments
(iii) if any group of the Y's is independent of the remaining Y's, then cum (Y_1, ..., Y_r) = 0
(iv) for the random variable (Z_1, Y_1, ..., Y_r), cum (Y_1 + Z_1, Y_2, ..., Y_r) = cum (Y_1, Y_2, ..., Y_r) + cum (Z_1, Y_2, ..., Y_r)
(v) for μ constant and r = 2, 3, ..., cum (Y_1 + μ, Y_2, ..., Y_r) = cum (Y_1, ..., Y_r)
(vi) if the random variables (Y_1, ..., Y_r) and (Z_1, ..., Z_r) are independent, then cum (Y_1 + Z_1, ..., Y_r + Z_r) = cum (Y_1, ..., Y_r) + cum (Z_1, ..., Z_r)
(vii) cum Y_j = EY_j for j = 1, ..., r
(viii) cum (Y_j, Y_j) = var Y_j for j = 1, ..., r
(ix) cum (Y_j, Y_k) = cov (Y_j, Y_k) for j, k = 1, ..., r.
and a partition P_1 ∪ P_2 ∪ ··· ∪ P_M of its entries. We shall say that sets P_m′, P_m″ of the partition hook if there exist (i_1,j_1) ∈ P_m′ and (i_2,j_2) ∈ P_m″ such that i_1 = i_2. We shall say that the sets P_m′ and P_m″ communicate if there exists a sequence of sets P_m1 = P_m′, P_m2, ..., P_mN = P_m″ such that P_mn and P_mn+1 hook for n = 1, 2, ..., N − 1. A partition is said to be indecomposable if all sets communicate. If the rows of Table 2.3.4 are denoted R_1, ..., R_I, then a partition P_1 ··· P_M is indecomposable if and only if there exist no sets P_m1, ..., P_mN, (N < M), and rows R_i1, ..., R_iJ, (J < I), with
Cumulants will provide us with a means of defining parameters of interest, with useful measures of the joint statistical dependence of random variables (see (iii) above), and with a convenient tool for proving theorems. Cumulants have also been called semi-invariants and are discussed in Dressel (1940), Kendall and Stuart (1958), and Leonov and Shiryaev (1959).
A standard normal variate has characteristic function exp{−t²/2}. It follows from the theorem, therefore, that its cumulants of order greater than 2 are 0. Also, from (iii), all the joint cumulants of a collection of independent variates will be 0. Now a general multivariate normal is defined to be a vector of linear combinations of independent normal variates. It now follows from (i) and (vi) that all the cumulants of order greater than 2 are 0 for a multivariate normal.
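Definition 2.3.1 translates directly into code. The sketch below (an illustration, with sample means standing in for expectations, and helper names of my own) recovers properties (vii) and (viii) numerically:

```python
from math import factorial
import numpy as np

# Sample version of Definition 2.3.1: cum(Y_1, ..., Y_r) is the sum over all
# partitions (nu_1, ..., nu_p) of (1, ..., r) of
# (-1)^(p-1) (p-1)! prod_m E{prod_{j in nu_m} Y_j},
# with E replaced by sample means (a sketch, not the book's own code).
def partitions(items):
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):          # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part              # or give `first` its own block

def cum(*ys):
    total = 0.0
    for part in partitions(list(range(len(ys)))):
        p = len(part)
        term = (-1.0)**(p - 1) * factorial(p - 1)
        for block in part:
            term *= np.mean(np.prod([ys[j] for j in block], axis=0))
        total += term
    return total

y = np.array([1.0, 2.0, 4.0, 8.0])
print(cum(y))                # property (vii): cum Y = EY (here the sample mean)
print(cum(y, y), np.var(y))  # property (viii): cum(Y, Y) = var Y
```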
We will have frequent occasion to discuss the joint cumulants of polynomial functions of random variables. Before presenting expressions for the joint cumulants of such variates, we introduce some terminology due to Leonov and Shiryaev (1959). Consider a (not necessarily rectangular) two-way table
The next lemma indicates a result relating to indecomposable partitions.
Lemma 2.3.1 Consider a partition P_1 ··· P_M, M > 1, of Table 2.3.4. Given elements r_ij, s_m; j = 1, ..., J_i; i = 1, ..., I; m = 1, ..., M; define the function φ(r_ij) = s_m if (i,j) ∈ P_m. The partition is indecomposable if and only if the φ(r_ij1) − φ(r_ij2); 1 ≤ j_1, j_2 ≤ J_i, i = 1, ..., I generate all the elements of the set {s_m − s_m′; 1 ≤ m, m′ ≤ M} by additions and subtractions. Alternately, given elements t_i, i = 1, ..., I, define the function
This is a case of a result of Isserlis (1918).

We end this section with a definition extending that of the mean function and autocovariance function given in Section 2.2. Given the r vector-valued time series X(t), t = 0, ±1, ... with components X_a(t), a = 1, ..., r, and E|X_a(t)|^k < ∞, we define

c_{a_1,...,a_k}(t_1, ..., t_k) = cum (X_{a_1}(t_1), ..., X_{a_k}(t_k))

for a_1, ..., a_k = 1, ..., r and t_1, ..., t_k = 0, ±1, .... Such a function will be called a joint cumulant function of order k of the series X(t), t = 0, ±1, ....
ψ(r_ij) = t_i; j = 1, ..., J_i; i = 1, ..., I. The partition is indecomposable if and only if the ψ(r_ij) − ψ(r_i′j′); (i,j), (i′,j′) ∈ P_m, m = 1, ..., M generate all the elements of the set {t_i − t_i′; 1 ≤ i, i′ ≤ I} by addition and subtraction.

We remark that the set {t_i − t_i′; 1 ≤ i, i′ ≤ I} is generated by I − 1 independent differences, such as t_1 − t_I, ..., t_{I−1} − t_I. It follows that when the partition is indecomposable, we may find I − 1 independent differences among the ψ(r_ij) − ψ(r_i′j′); (i,j), (i′,j′) ∈ P_m, m = 1, ..., M.
Theorem 2.3.2 Consider a two-way array of random variables X_ij; j = 1, ..., J_i; i = 1, ..., I. Consider the I random variables

Y_i = Π_{j=1}^{J_i} X_ij, i = 1, ..., I.

The joint cumulant cum (Y_1, ..., Y_I) is then given by

Σ_ν cum (X_ij; (i,j) ∈ ν_1) ··· cum (X_ij; (i,j) ∈ ν_p),

where the summation is over all indecomposable partitions ν = ν_1 ∪ ··· ∪ ν_p of Table 2.3.4.
This theorem is a particular case of a result of Leonov and Shiryaev (1959).
We briefly mention an example of the use of this theorem. Let (X_1, ..., X_4) be a 4-variate normal random variable. Its cumulants of order greater than 2 will be 0. Suppose we wish cov {X_1 X_2, X_3 X_4}. Following the details of Theorem 2.3.2 we see that

cov {X_1 X_2, X_3 X_4} = c_13 c_24 + c_14 c_23 + EX_1 EX_3 c_24 + EX_1 EX_4 c_23 + EX_2 EX_3 c_14 + EX_2 EX_4 c_13,

where c_jk = cov {X_j, X_k}.
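The zero-mean case of this example is easily checked by simulation (the zero means and the particular covariance matrix below are assumptions of this sketch): for a zero-mean 4-variate normal, only the two cross-row pair partitions survive, giving cov{X_1 X_2, X_3 X_4} = c_13 c_24 + c_14 c_23.

```python
import numpy as np

# Monte Carlo check, zero-mean case: for a zero-mean 4-variate normal,
# cov{X1 X2, X3 X4} = c13 c24 + c14 c23, where c_jk = cov(X_j, X_k).
C = np.array([[1.0, 0.5, 0.3, 0.2],     # an illustrative covariance matrix
              [0.5, 1.0, 0.4, 0.1],
              [0.3, 0.4, 1.0, 0.6],
              [0.2, 0.1, 0.6, 1.0]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), C, size=400000)
lhs = np.cov(X[:, 0] * X[:, 1], X[:, 2] * X[:, 3])[0, 1]   # sample covariance
rhs = C[0, 2] * C[1, 3] + C[0, 3] * C[1, 2]                # pair-partition terms
print(lhs, rhs)                          # agree to within Monte Carlo error
```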
We indicate the r × r matrix-valued function with entries c_ab(u) by c_XX(u) and refer to it as the autocovariance function of the series X(t), t = 0, ±1, .... If we extend the definition of cov to vector-valued random variables X, Y by writing

then

for t, u = 0, ±1, ... and a, b = 1, ..., r.
We note that a strictly stationary series with finite second-order momentsis second-order stationary.
On occasion we write the covariance function of a second-order stationary series in an unsymmetric form as
2.4 STATIONARITY
An r vector-valued time series X(t), t = 0, ±1, ... is called strictly stationary when the whole family of its finite dimensional distributions is invariant under a common translation of the time arguments, that is, when the joint distribution of X_{a_1}(t_1 + t), ..., X_{a_k}(t_k + t) does not depend on t for t, t_1, ..., t_k = 0, ±1, ... and a_1, ..., a_k = 1, ..., r, k = 1, 2, ....
Examples of strictly stationary series include a series of independent identically distributed r vector-valued variates, ε(t), t = 0, ±1, ..., and a series that is a deterministic function of such variates, as

More examples of strictly stationary series will be given later.

In this section, and throughout this text, the time domain of the series is assumed to be t = 0, ±1, .... We remark that if I is any finite stretch of integers, then a series X(t), t ∈ I, that is relatively stationary over I may be extended to be strictly stationary over all the integers. (The stationary extension of series defined and relatively stationary over an interval is considered in Parthasarathy and Varadhan (1964).) The important thing, from the standpoint of practice, is that the series be approximately stationary over the time period of observation.
An r vector-valued series X(t), t = 0, ±1, ... is called second-order stationary or wide-sense stationary if
then we may define the autocovariance function of the series X(t) by

c_XX(u) = cov {X(t + u), X(t)}

for t, u = 0, ±1, ... in the second-order stationary case.

If the vector-valued series X(t), t = 0, ±1, ... is strictly stationary with E|X_j(t)|² < ∞, j = 1, ..., r, then

for t_1, ..., t_k, t = 0, ±1, .... In this case we will sometimes use the asymmetric notation

to remove the redundancy. This assumption of finite moments need not cause concern, for in practice all series available for analysis appear to be strictly bounded, |X_j(t)| < C, j = 1, ..., r, for some finite C, and so all moments exist.
2.5 SECOND-ORDER SPECTRA
Suppose that the series X(t), t = 0, ±1, ... is stationary and that, following the discussion of Section 1.3, its span of dependence is small in the sense that X_a(t) and X_b(t + u) become increasingly less dependent as |u| → ∞ for a, b = 1, ..., r. It is then reasonable to postulate that

Σ_{u=−∞}^{∞} |c_ab(u)| < ∞. (2.5.1)

In this case we define the second-order spectrum of the series X_a(t) with the series X_b(t) by

f_ab(λ) = (2π)⁻¹ Σ_{u=−∞}^{∞} c_ab(u) exp{−iλu}, −∞ < λ < ∞. (2.5.2)

Under the condition (2.5.1), f_ab(λ) is bounded and uniformly continuous. The fact that the components of X(t) are real-valued implies that f_ab(−λ) is the complex conjugate of f_ab(λ). Also an examination of expression (2.5.2) shows that f_ab(λ) has period 2π with respect to λ.
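As a concrete instance of definition (2.5.2), consider the moving-average series X(t) = ε(t) + θ ε(t − 1) with unit-variance white noise ε(t) (this example is an illustration of mine, not the text's):

```python
import numpy as np

# Second-order spectrum (2.5.2) for X(t) = e(t) + theta e(t-1), e(t) unit-
# variance white noise: c(0) = 1 + theta^2, c(+-1) = theta, c(u) = 0 otherwise,
# giving f(lambda) = (2 pi)^-1 (1 + theta^2 + 2 theta cos lambda).
theta = 0.6
c = {0: 1 + theta**2, 1: theta, -1: theta}

def f(lam):
    # direct evaluation of (2 pi)^-1 sum_u c(u) exp(-i lambda u)
    return sum(cu * np.exp(-1j * lam * u) for u, cu in c.items()).real / (2 * np.pi)

lam = 1.3
print(f(lam))                         # the power spectrum at frequency lam
print(f(lam + 2 * np.pi) - f(lam))    # ~0: period 2 pi
print(f(-lam) - f(lam))               # ~0: real series, and f is real here
```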
The real-valued parameter λ appearing in (2.5.2) is called the radian or angular frequency per unit time or, more briefly, the frequency. If b = a, then f_aa(λ) is called the power spectrum of the series X_a(t) at frequency λ. If b ≠ a, then f_ab(λ) is called the cross-spectrum of the series X_a(t) with the series X_b(t)
at frequency λ. We note that if X_a(t) = X_b(t), t = 0, ±1, ... with probability 1, then f_ab(λ), the cross-spectrum, is in fact the power spectrum f_aa(λ). Re f_ab(λ) is called the co-spectrum and Im f_ab(λ) is called the quadrature spectrum. φ_ab(λ) = arg f_ab(λ) is called the phase spectrum, while |f_ab(λ)| is called the amplitude spectrum.
Suppose that the autocovariance functions c_ab(u), u = 0, ±1, ... are collected together into the matrix-valued function c_XX(u), u = 0, ±1, ..., having c_ab(u) as the entry in the ath row and bth column. Suppose likewise that the second-order spectra f_ab(λ), −∞ < λ < ∞, are collected together into the matrix-valued function f_XX(λ), −∞ < λ < ∞, having f_ab(λ) as the entry in the ath row and bth column. Then the definition (2.5.2) may be written

f_XX(λ) = (2π)⁻¹ Σ_{u=−∞}^{∞} c_XX(u) exp{−iλu}. (2.5.4)
The r × r matrix-valued function f_XX(λ), −∞ < λ < ∞, is called the spectral density matrix of the series X(t), t = 0, ±1, .... Under the condition (2.5.1), the relation (2.5.4) may be inverted to obtain the representation

c_XX(u) = ∫_{−π}^{π} f_XX(λ) exp{iλu} dλ. (2.5.5)
In Theorem 2.5.1 we shall see that the matrix f_XX(λ) is Hermitian, non-negative definite; that is, f_XX(λ) equals the conjugate transpose of f_XX(λ), and ā^τ f_XX(λ) a ≥ 0 for all r vectors a with complex entries.
Theorem 2.5.1 Let X(t), t = 0, ±1, ... be a vector-valued series with autocovariance function c_XX(u) = cov {X(t + u), X(t)}, t, u = 0, ±1, ..., satisfying

Σ_{u=−∞}^{∞} |c_XX(u)| < ∞. (2.5.6)

Then the spectral density matrix

f_XX(λ) = (2π)⁻¹ Σ_{u=−∞}^{∞} c_XX(u) exp{−iλu}, −∞ < λ < ∞, (2.5.7)

is Hermitian, non-negative definite.
In the case r = 1, this implies that the power spectrum is real andnon-negative.
In the light of this theorem and the symmetry and periodicity properties indicated above, a power spectrum may be displayed as a non-negative function on the interval [0,π]. We will discuss the properties of power spectra in detail in Chapter 5.
In the case that the vector-valued series X(t), t = 0, ±1, ... has finite second-order moments, but does not necessarily satisfy some mixing condition of the character of (2.5.1), we can still obtain a spectral representation of the nature of (2.5.5). Specifically we have the following:
Theorem 2.5.2 Let X(t), t = 0, ±1, ... be a vector-valued series that is second-order stationary with finite autocovariance function c_XX(u) = cov {X(t + u), X(t)}, for t, u = 0, ±1, .... Then there exists an r × r matrix-valued function F_XX(λ), −π < λ ≤ π, whose entries are of bounded variation and whose increments are non-negative definite, such that

c_XX(u) = ∫_{−π}^{π} exp{iλu} dF_XX(λ). (2.5.8)
The representation (2.5.8) was obtained by Herglotz (1911) in the real-valued case and by Cramér (1942) in the vector-valued case.
for −∞ < λ_j < ∞, a_1, ..., a_k = 1, ..., r, k = 2, 3, .... We will extend the definition (2.6.2) to the case k = 1 by setting f_a = c_a = EX_a(t),
The function F_XX(λ) is called the spectral measure of the series X(t), t = 0, ±1, .... In the case that (2.5.1) holds, it is given by
This function is given by
In this case, we define the kth order cumulant spectrum, f_{a_1,...,a_k}(λ_1, ..., λ_{k−1}),
2.6 CUMULANT SPECTRA OF ORDER k
Suppose that the series X(t), t = 0, ±1, ... is stationary and that its span of dependence is small enough that
a = 1, ..., r. We will sometimes add a symbolic argument λ_k to the function of (2.6.2), writing f_{a_1,...,a_k}(λ_1, ..., λ_k), in order to maintain symmetry. λ_k may be taken to be related to the other λ_j by Σ_j λ_j ≡ 0 (mod 2π).
We note that f_{a_1,...,a_k}(λ_1, ..., λ_k) is generally complex-valued. It is also bounded and uniformly continuous in the manifold Σ_j λ_j ≡ 0 (mod 2π). We have the inverse relation

and in symmetric form

where

η(λ) = Σ_{j=−∞}^{∞} δ(λ − 2πj)

is the Dirac comb of (2.1.6).

We will frequently assume that our series satisfy
Assumption 2.6.1 X(t) is a strictly stationary r vector-valued series with components X_j(t), j = 1, ..., r, all of whose moments exist, and satisfying (2.6.1) for a_1, ..., a_k = 1, ..., r and k = 2, 3, ....
We note that all cumulant spectra, of all orders, exist for series satisfying Assumption 2.6.1. In the case of a Gaussian process, it amounts to nothing more than Σ_u |c_ab(u)| < ∞, a, b = 1, ..., r.
Cumulant spectra are defined and discussed in Shiryaev (1960), Leonov (1964), Brillinger (1965), and Brillinger and Rosenblatt (1967a, b). The idea of carrying out a Fourier analysis of the higher moments of a time series occurs in Blanc-Lapierre and Fortet (1953).
The third-order spectrum of a single series has been called the bispectrum; see Tukey (1959) and Hasselmann, Munk and MacDonald (1963). The fourth-order spectrum has been called the trispectrum.
On occasion we will find the following assumption useful.
Assumption 2.6.2(l) Given the r vector-valued stationary process X(t) with components X_j(t), j = 1, ..., r, there is an l ≥ 0 with

for j = 1, ..., k − 1 and any k-tuple a_1, ..., a_k when k = 2, 3, ....
This assumption implies, for l > 0, that well-separated (in time) values of the process are even less dependent than is implied by Assumption 2.6.1, the extent of dependence depending directly on l. Equation (2.6.6) implies that f_{a_1,...,a_k}(λ_1, ..., λ_k) has bounded and uniformly continuous derivatives of order ≤ l.
If instead of expressions (2.6.1) or (2.6.6) we assume only ave |X_a(t)|^k < ∞, a = 1, ..., r, then the f_{a_1,...,a_k}(λ_1, ..., λ_k) appearing in (2.6.4) are Schwartz distributions of order ≤ 2. These distributions, or generalized functions, are found in Schwartz (1957, 1959). In the case k = 2, Theorem 2.5.2 shows they are measures.
Several times in later chapters we will require a stronger assumption than the commonly used Assumption 2.6.1. It is the following:

Assumption 2.6.3 The r vector-valued series X(t), t = 0, ±1, ... satisfies Assumption 2.6.1. Also if

for z in a neighborhood of 0.

This assumption will allow us to obtain probability 1 bounds for various statistics of interest. If X(t), t = 0, ±1, ... is Gaussian, all that is required is that the covariance function be summable. Exercise 2.13.36 indicates the form of the assumption for another example of interest.

2.7 FILTERS

In the analysis of time series we often have occasion to apply some manipulatory operation. An important class of operations consists of those that are linear and time invariant. Specifically, consider an operation whose domain consists of r vector-valued series X(t), t = 0, ±1, ... and whose range consists of s vector-valued series Y(t), t = 0, ±1, .... We write

to indicate the action of the operation. The operation is linear if for series X_1(t), X_2(t), t = 0, ±1, ... in its domain and for constants α_1, α_2 we have

then
Next for given u let T^u X(t), t = 0, ±1, ... denote the series X(t + u), t = 0, ±1, .... The operation 𝒜 is time invariant if
We may now set down the definition: an operation 𝒜 carrying r vector-valued series into s vector-valued series and possessing the properties (2.7.2) and (2.7.3) is called an s × r linear filter.
The domain of an s × r linear filter may include r × r matrix-valued functions U(t), t = 0, ±1, .... Denote the columns of U(t) by U_j(t), j = 1, ..., r; we then define
The range of this extended operation is seen to consist of s X r matrix-valued functions.
An important property of filters is that they transform cosinusoids intocosinusoids. In particular we have
Lemma 2.7.1 Let 𝒜 be a linear time invariant operation whose domain includes the r × r matrix-valued series

exp{iλt} I,

t = 0, ±1, ...; −∞ < λ < ∞, where I is the r × r identity matrix. Then there is an s × r matrix A(λ) such that

𝒜 {exp{iλt} I} = exp{iλt} A(λ).
In other words a linear time invariant operation carries complex exponentials of frequency λ over into complex exponentials of the same frequency λ. The function A(λ) is called the transfer function of the operation. We see that A(λ + 2π) = A(λ).
An important class of s × r linear filters takes the form

Y(t) = Σ_{u=−∞}^{∞} a(u) X(t − u), (2.7.7)

t = 0, ±1, ..., where X(t) is an r vector-valued series, Y(t) is an s vector-valued series, and a(u), u = 0, ±1, ... is a sequence of s × r matrices satisfying

Σ_{u=−∞}^{∞} |a(u)| < ∞. (2.7.8)
We call such a filter an s × r summable filter and denote it by {a(u)}. The transfer function of the filter (2.7.7) is seen to be given by

A(λ) = Σ_{u=−∞}^{∞} a(u) exp{−iλu}. (2.7.9)
It is a uniformly continuous function of λ in view of (2.7.8). The function a(u), u = 0, ±1, ... is called the impulse response of the filter in view of the fact that if the domain of the filter is extended to include r × r matrix-valued series and we take the input series to be the impulse
then the output series is a(t), t = 0, ±1, ....

An s × r filter {a(u)} is said to be realizable if a(u) = 0 for u = −1, −2, −3, .... From (2.7.7) we see that such a filter has the form

Y(t) = Σ_{u=0}^{∞} a(u) X(t − u)

and so Y(t) only involves the values of the X series for present and past times. In this case the domain of A(λ) may be extended to be the region −∞ < Re λ < ∞, Im λ ≤ 0.
On occasion we may wish to apply a succession of filters to the sameseries. In this connection we have
Lemma 2.7.2 If {a_1(t)} and {a_2(t)} are s × r summable filters with transfer functions A_1(λ), A_2(λ), respectively, then {a_1(t) + a_2(t)} is an s × r summable filter with transfer function A_1(λ) + A_2(λ).

If {b_1(t)} is an r × q summable filter with transfer function B_1(λ) and {b_2(t)} is an s × r summable filter with transfer function B_2(λ), then {b_2 * b_1(t)}, the filter resulting from applying first {b_1(t)} followed by {b_2(t)}, is an s × q summable filter with transfer function B_2(λ)B_1(λ).
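The second half of the lemma is easily checked numerically for 1 × 1 filters (the impulse responses below are arbitrary illustrations):

```python
import numpy as np

# Check of Lemma 2.7.2 (second half) for 1 x 1 summable filters: the transfer
# function of the convolution b2 * b1 is the product B2(lambda) B1(lambda).
b1 = np.array([0.5, 0.3, 0.2])   # impulse response supported on u = 0, 1, 2
b2 = np.array([1.0, -0.4])       # impulse response supported on u = 0, 1

def transfer(b, lam):
    # A(lambda) = sum_u b(u) exp(-i lambda u)
    return sum(bu * np.exp(-1j * lam * u) for u, bu in enumerate(b))

b21 = np.convolve(b2, b1)        # time-domain composition b2 * b1
for lam in (0.0, 0.7, 2.0):
    print(transfer(b21, lam) - transfer(b2, lam) * transfer(b1, lam))  # ~0
```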
The second half of this lemma demonstrates the advantage of considering transfer functions as well as the time domain coefficients of a filter. The convolution expression

b_2 * b_1(t) = Σ_u b_2(t − u) b_1(u) (2.7.12)

takes the form of a multiplication in the frequency domain.

Let {a(t)} be an r × r summable filter. If an r × r filter {b(t)} exists such that
then {a(t)} is said to be nonsingular. The filter {b(t)} is called the inverse of {a(t)}. It exists if the matrix A(λ) is nonsingular for −∞ < λ < ∞; its transfer function is A(λ)⁻¹.
On occasion we will refer to an l summable filter. This is a summable filter satisfying the condition
Two examples of l summable filters follow. The operation indicated by

is an l summable filter, for all l, with coefficients

and transfer function

We will see the shape of this transfer function in Section 3.2. For M not too small, A(λ) is a function with its mass concentrated in the neighborhood of the frequencies λ = 0 (mod 2π). The general effect of this filter will be to smooth functions to which it is applied.
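Taking the smoothing filter to be the (2M + 1)-point running mean (an assumption of this sketch; it is the standard example of such a filter), its transfer function can be computed directly and compared with the Dirichlet-kernel closed form:

```python
import numpy as np

# Running mean Y(t) = (2M+1)^-1 sum_{|u| <= M} X(t - u): its transfer function
# is A(lambda) = (2M+1)^-1 sin((2M+1) lambda / 2) / sin(lambda / 2),
# a Dirichlet-type kernel with mass concentrated near lambda = 0 (mod 2 pi).
M = 10

def A(lam):
    # direct evaluation of (2M+1)^-1 sum_{|u| <= M} exp(-i lambda u)
    return sum(np.exp(-1j * lam * u) for u in range(-M, M + 1)).real / (2 * M + 1)

lam = 0.9
closed = np.sin((2 * M + 1) * lam / 2) / ((2 * M + 1) * np.sin(lam / 2))
print(A(lam) - closed)      # ~0: the direct sum matches the closed form
print(A(0.0), A(np.pi))     # 1 at lambda = 0; much smaller at lambda = pi
```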
Likewise the operation indicated by
is an l summable filter, for all l, with coefficients
and transfer function
This transfer function has most of its mass in the neighborhood of the frequencies λ = ±π, ±3π, .... The effect of this filter will be to remove the slowly varying part of a function and retain the rapidly varying part.
We will often be applying filters to stochastic series. In this connection we have
Lemma 2.7.3 If X(t) is a stationary r vector-valued series with E|X(t)| < ∞, and {a(t)} is an s × r summable filter, then
it is possible to define the output of such a filter as a limit in mean square. Specifically we have
Theorem 2.7.1 Let X(t), t = 0, ±1, ... be an r vector-valued series with absolutely summable autocovariance function. Let A(λ) be an s × r matrix-valued function satisfying (2.7.23). Set
t = 0, ±1, .... This exists with probability 1 and is an s vector-valued stationary series. If E|X(t)|^k < ∞, k > 0, then E|Y(t)|^k < ∞.
An important use of this lemma is in the derivation of additional stationary time series from stationary time series already under discussion. For example, if ε(t) is a sequence of independent identically distributed r vector variates and {a(t)} is an s × r filter, then the s vector-valued series
is a strictly stationary series. It is called a linear process.
Sometimes we will want to deal with a linear time invariant operation whose transfer function A(λ) is not necessarily the Fourier transform of an absolutely summable sequence. In the case that
u = 0, ±1, .... Then
exists for t = 0, ±1, ....
Results of this character are discussed in Rosenberg (1964) for the case inwhich the conditions of Theorem 2.5.2 are satisfied plus
Two 1 × 1 filters satisfying (2.7.23) will be of particular importance in our work. A 1 × 1 filter {a(u)} is said to be a band-pass filter, centered at the frequency λ_0 and with band-width 2Δ, if its transfer function has the form
in the domain −π < λ ≤ π. Typically Δ is small. If λ_0 = 0, the filter is called a low-pass filter. In the case that
for constants R_j, φ_j, k and the transfer function A(λ) is given by (2.7.26), we see that the filtered series is given by
with the summation extending over j such that |λ_j ± λ_0| ≤ Δ. In other words, components whose frequencies are near λ_0 remain unaffected, whereas other components are removed.
A second useful 1 × 1 filter is the Hilbert transform. Its transfer function is purely imaginary and given by −i sgn λ, that is
If the series X(t), t = 0, ±1, ... is given by (2.7.27), then the series resulting from the application of the filter with transfer function (2.7.29) is
The series that is the Hilbert transform of a series X(t) will be denoted X^H(t), t = 0, ±1, ....
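The transfer function −i sgn λ can be applied directly in the frequency domain. The sketch below approximates the Hilbert transform of a cosinusoid via the discrete Fourier transform (a computational device of our own, not a construction from the text) and recovers the expected sine:

```python
import numpy as np

# Applying the multiplier -i sgn(lambda) in the frequency domain
# approximates the Hilbert transform X^H(t).  For X(t) = cos(omega t)
# the result should be sin(omega t).

T = 1024
t = np.arange(T)
omega = 2 * np.pi * 50 / T            # an exact Fourier frequency
x = np.cos(omega * t)

freqs = np.fft.fftfreq(T)             # signed frequencies of the DFT bins
mult = -1j * np.sign(freqs)           # the transfer function -i sgn(lambda)
xh = np.real(np.fft.ifft(mult * np.fft.fft(x)))

assert np.allclose(xh, np.sin(omega * t), atol=1e-8)
```

Because ω was chosen to be an exact Fourier frequency, the circular transform introduces no leakage here.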
Lemma 2.7.4 indicates how the procedure of complex demodulation (see Tukey (1961)) may be used to obtain a band-pass filter centered at a general frequency λ_0 and the corresponding Hilbert transform from a low-pass filter.
In complex demodulation we first form the pair of real-valued series
for t = 0, ±1, ... and then the pair of series
where {a(t)} is a low-pass filter. The series W_1(t), W_2(t), −∞ < t < ∞, are called the complex demodulates of the series X(t), −∞ < t < ∞. Because {a(t)} is a low-pass filter, they will typically be substantially smoother than the series X(t), −∞ < t < ∞. If we further form the series
for −∞ < t < ∞, then the following lemma shows that the series V_1(t) is essentially a band-pass filtered version of the series X(t), while the series V_2(t) is essentially a band-pass filtered version of the series X^H(t).
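A minimal numerical sketch of the demodulate–smooth–remodulate procedure, with an assumed running-mean low-pass filter and illustrative frequencies of our own choosing, shows the remodulated series behaving as a band-pass filtered version of X(t):

```python
import numpy as np

# Complex demodulation at frequency lam0: multiply by exp(-i lam0 t),
# smooth with a low-pass (running-mean) filter, then remodulate.  The
# result keeps the component of X(t) near lam0 and removes the rest.

T, lam0, mu = 4096, 2 * np.pi * 0.10, 2 * np.pi * 0.30
t = np.arange(T)
x = np.cos(lam0 * t) + np.cos(mu * t)      # two well-separated components

w = x * np.exp(-1j * lam0 * t)             # unsmoothed complex demodulate
M = 50
kernel = np.full(2 * M + 1, 1.0 / (2 * M + 1))   # low-pass filter
w_smooth = np.convolve(w, kernel, mode="same")
v = np.real(2.0 * w_smooth * np.exp(1j * lam0 * t))

mid = slice(M, T - M)                      # avoid filter end effects
err = np.max(np.abs(v[mid] - np.cos(lam0 * t)[mid]))
assert err < 0.1                           # component near mu removed
```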
Lemma 2.7.4 Let {a(t)} be a filter with transfer function A(λ), −∞ < λ < ∞. The operation carrying the series X(t), −∞ < t < ∞, into the series V_1(t) of (2.7.33) is linear and time invariant with transfer function
The operation carrying the series X(t) into V_2(t) of (2.7.33) is linear and time invariant with transfer function
In the case that A(λ) is given by
for −π < λ ≤ π and Δ small, functions (2.7.34) and (2.7.35) are seen to have the forms
−π < λ, λ_0 < π and
for −π < λ, λ_0 < π.
Bunimovitch (1949), Oswald (1956), Dugundji (1958), and Deutsch (1962)
discuss the interpretation and use of the output of such filters.
for all (complex) α_1, ..., α_s, and so obtain the result of Theorem 2.5.1 — that the matrix f_XX(λ) is non-negative definite — from the case r = 1.
As power spectra are non-negative, we may conclude from (2.8.4) that
If s = 1, then the power spectrum of Y(t) is given by
where A(λ) is the transfer function of the filter.
Example 2.8.2 Let f_XX(λ) and f_YY(λ) signify the r × r and s × s matrices of second-order spectra of X(t) and Y(t), respectively. Then
Some cases of this theorem are of particular importance.
Example 2.8.1 Let X(t) and Y(t) be real-valued with power spectra f_XX(λ), f_YY(λ), respectively; then
2.8 INVARIANCE PROPERTIES OF CUMULANT SPECTRA
The principal parameters involved in our discussion of the frequency analysis of stationary time series are the cumulant spectra. At the same time we will often be applying filters to the series, or it will be the case that some filtering operation has already been applied. It is therefore important that we understand the effect of a filter on the cumulant spectra of stationary series. The effect is of an elementary algebraic nature.
Theorem 2.8.1 Let X(t) be an r vector series satisfying Assumption 2.6.1 and Y(t) = Σ_u a(t − u)X(u), where {a(t)} is an s × r summable filter. Y(t) satisfies Assumption 2.6.1. Its cumulant spectra
are given by
Example 2.8.3 If X(t), Y(t), t = 0, ±1, ... are both r vector-valued with Y related to X through
2.9 EXAMPLES OF STATIONARY TIME SERIES
The definition of, and several elementary examples of, a stationary time series were presented in Section 2.4. As stationary series are the basic entities of our analysis, it is of value to have as many examples as possible.
Example 2.9.1 (A Pure Noise Series) Let ε(t), t = 0, ±1, ... be a sequence of independent, identically distributed r vector-valued random variables. Such a series clearly forms a stationary time series.
Example 2.9.2 (Linear Process) Let ε(t), t = 0, ±1, ... be the r vector-valued pure noise series of the previous example. Let
then the cumulant spectra of Y(t) are given by
where B_j(λ) denotes the transfer function of the filter {b_j(u)}.
Later we will see that Examples 2.8.1 and 2.8.3 provide convenient means of interpreting the power spectrum, cross-spectrum, and higher order cumulant spectra.
where {a(u)} is an s × r summable filter. Following Lemma 2.7.3, this series is a stationary s vector-valued series.
If only a finite number of the a(u) in expression (2.9.1) are nonzero, then the series X(t) is referred to as a moving average process. If a(0), a(m) ≠ 0 and a(u) = 0 for u > m and u < 0, the process is said to be of order m.
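A short simulation (scalar case, with illustrative coefficients of our own choosing) checks the characteristic property of a moving average process of order m, namely that its autocovariance function vanishes for |u| > m (compare Exercise 2.13.26):

```python
import numpy as np

# MA(2) process X(t) = e(t) + 0.6 e(t-1) - 0.3 e(t-2) built from a
# pure noise series e(t); c_XX(u) should vanish for |u| > 2.

rng = np.random.default_rng(0)
a = np.array([1.0, 0.6, -0.3])            # order m = 2

e = rng.standard_normal(200_000)
x = np.convolve(e, a, mode="valid")       # a realization of the process

def acov(x, u):
    x = x - x.mean()
    return np.mean(x[:len(x) - u] * x[u:])

# theoretical c(0) = 1 + 0.36 + 0.09 = 1.45; c(3) = 0
assert abs(acov(x, 0) - 1.45) < 0.05
assert abs(acov(x, 3)) < 0.05             # zero beyond the order m = 2
```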
Example 2.9.3 (Cosinusoid) Suppose that X(t) is an r vector-valued series with components
where R_1, ..., R_r are constant, φ_1, ..., φ_{r−1} are uniform on (−π, π], and φ_1 + ··· + φ_r = 0. This series is stationary, because if any finite collection of values is considered and then the time points are all shifted by t, their structure is unchanged.
Example 2.9.4 (Stationary Gaussian Series) An r vector time series X(t), t = 0, ±1, ±2, ... is a Gaussian series if all of its finite dimensional distributions are multivariate Gaussian (normal). If EX(t) = μ and EX(t)X(u)^τ = R(t − u) for all t, u, then X(t) is stationary in this case, for the series is determined by its first- and second-order moment properties.
We note that if X(t) is a stationary r vector Gaussian series, then
for an s × r filter {a(t)} is a stationary s vector Gaussian series.
Extensive discussions of stationary Gaussian series are found in Blanc-Lapierre and Fortet (1965), Loève (1963), and Cramér and Leadbetter (1967).
Example 2.9.5 (Stationary Markov Processes) An r vector time series X(t), t = 0, ±1, ±2, ... is said to be an r vector Markov process if the conditional probability

Prob{X(t) ≤ X | X(s_1) = x_1, ..., X(s_n) = x_n, X(s) = x}   (2.9.4)

(for any s_1 < s_2 < ··· < s_n < s < t) is equal to the conditional probability
The function P(s, x, t, X) is called the transition probability function. It, and an initial probability Prob{X(0) ≤ x_0}, completely determine the probability law of the process. Extensive discussions of Markov processes and in particular stationary Markov processes may be found in Doob (1953), Dynkin (1960), Loève (1963), and Feller (1966).
A particularly important example is that of the Gaussian stationaryMarkov process. In the real-valued case, its autocorrelation function takes asimple form.
Lemma 2.9.1 If X(t), t = 0, ±1, ±2, ... is a nondegenerate real-valued, Gaussian, stationary Markov process, then its autocovariance function is given by c_XX(0)ρ^|u|, for some ρ, −1 < ρ < 1.
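The conclusion of Lemma 2.9.1 can be illustrated by simulating the Gaussian AR(1) scheme X(t) = ρX(t − 1) + ε(t), which is Markov; parameter values below are our own choices:

```python
import numpy as np

# For the stationary Gaussian AR(1) scheme X(t) = rho X(t-1) + e(t) the
# autocovariance should satisfy c_XX(u) = c_XX(0) rho^{|u|}.

rho = 0.7
rng = np.random.default_rng(1)
T = 200_000
e = rng.standard_normal(T)
x = np.empty(T)
x[0] = e[0] / np.sqrt(1 - rho**2)     # draw from the stationary law
for t in range(1, T):
    x[t] = rho * x[t - 1] + e[t]

def acov(x, u):
    x = x - x.mean()
    return np.mean(x[:len(x) - u] * x[u:])

c0 = acov(x, 0)                       # theoretical value 1/(1 - rho^2)
assert abs(acov(x, 3) / c0 - rho**3) < 0.02   # c(3)/c(0) = rho^3
```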
Another class of examples of real-valued stationary Markov processes is given in Wong (1963). Bernstein (1938) considers the generation of Markov processes as solutions of stochastic difference and differential equations.
An example of a stationary Markov r vector process is provided by X(t), the solution of
where ε(t) is an r vector pure noise series and a an r × r matrix with all eigenvalues less than 1 in absolute value.
Example 2.9.6 (Autoregressive Schemes) Equation (2.9.6) leads us to consider r vector processes X(t) that are generated by schemes of the form
where ε(t) is an r vector pure noise series and a(1), ..., a(m) are r × r matrices. If the roots of Det A(z) = 0 lie outside the unit circle, where
it can be shown (Section 3.8) that (2.9.7) has a stationary solution. Such an X(t) is referred to as an r vector-valued autoregressive process of order m.
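In the scalar case the root condition is easy to check numerically. The sketch below (illustrative coefficients of our own choosing) verifies it both for the roots of A(z) and, equivalently, for the eigenvalues of the companion matrix of the scheme:

```python
import numpy as np

# AR(2) scheme X(t) + a1 X(t-1) + a2 X(t-2) = e(t): stationarity
# requires the roots of A(z) = 1 + a1 z + a2 z^2 to lie outside the
# unit circle.

a1, a2 = -1.1, 0.3                    # assumed illustrative coefficients
roots = np.roots([a2, a1, 1.0])       # roots of a2 z^2 + a1 z + 1
assert np.all(np.abs(roots) > 1.0)    # a stationary solution exists

# equivalently, the companion matrix of X(t) = -a1 X(t-1) - a2 X(t-2)
# has all eigenvalues strictly inside the unit circle
companion = np.array([[-a1, -a2],
                      [1.0, 0.0]])
assert np.all(np.abs(np.linalg.eigvals(companion)) < 1.0)
```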
Example 2.9.7 (Mixed Moving Average and Autoregressive Process) On occasion we combine the moving average and autoregressive schemes. Consider the r vector-valued process X(t) satisfying
where ε(t) is an s vector-valued pure noise series, a(j), j = 1, ..., m are r × r matrices, and b(k), k = 1, ..., n are r × s matrices. If a stationary X(t) satisfying expression (2.9.9) exists, it is referred to as a mixed moving average autoregressive process of order (m, n).
If the roots of
lie outside the unit circle, then an X(t) satisfying (2.9.9) is in fact a linear process
where C(λ) = A(λ)^(−1)B(λ); see Section 3.8.
Example 2.9.8 (Functions of Stationary Series) If we have a stationary series (such as a pure noise series) already at hand and we form time invariant measurable functions of that series, then we have generated another stationary series. For example, suppose X(t) is a stationary series and Y(t) = Σ_u a(t − u)X(u) for some s × r filter {a(u)}. We have seen (Lemma 2.7.1) that under regularity conditions Y(t) is also stationary. Alternatively we can form a Y(t) through nonlinear functions as by
where U is a transformation that preserves probabilities and θ lies in the probability space; see Doob (1953) p. 509. We can often take θ in the unit interval; see Choksi (1966).
Unfortunately relations such as (2.9.12) and (2.9.13) generally are not easy to work with. Consequently investigators (Wiener (1958), Balakrishnan (1964), Shiryaev (1960), McShane (1963), and Meecham and Siegel (1964)) have turned to series generated by nonlinear relations of the form
in the hope of obtaining more reasonable results. Nisio (1960, 1961) has investigated Y(t) of the above form for the case in which X(t) is a pure noise Gaussian series. Meecham (1969) is concerned with the case where Y(t) is nearly Gaussian.
We will refer to expansions of the form of expression (2.9.14) as Volterra functional expansions; see Volterra (1959) and Brillinger (1970a).
In connection with Y(t) generated by expression (2.9.14), we have
Theorem 2.9.1 If the series X(t), t = 0, ±1, ... satisfies Assumption 2.6.1 and
with the a_j absolutely summable and L < ∞, then the series Y(t), t = 0, ±1, ... also satisfies Assumption 2.6.1.
We see, for example, that the series X(t)^2, t = 0, ±1, ... satisfies Assumption 2.6.1 when the series X(t) does. The theorem generalizes to r vector-valued series and in that case provides an extension of Lemma 2.7.3.
Example 2.9.9 (Solutions of Stochastic Difference and Differential Equations) We note that a literature is developing on stationary processes that satisfy random difference and differential equations; see, for example, Kampé de Fériet (1965), Itô and Nisio (1964), and Mortensen (1969).
for some measurable f[x_1, x_2]; see Rosenblatt (1964). In fact, in a real sense, all stationary functions are of the form of expression (2.9.12), f possibly having an infinite number of arguments. Any stationary time series, defined on a probability space, can be put in the form
In certain cases (see Itô and Nisio (1964)) the solution of a stochastic equation may be expressed in the form (2.9.14).
Example 2.9.10 (Solutions of Volterra Functional Relations) On occasion, we may be given Y(t) and wish to define X(t) as a series satisfying expression (2.9.14). This provides a model for frequency demultiplication and the appearance of lower order harmonics.
2.10 EXAMPLES OF CUMULANT SPECTRA
In this section we present examples of cumulant spectra of order k for a number of r vector-valued stationary time series of interest.
Example 2.10.1 (A Pure Noise Series) Suppose that ε(t) is an r vector pure noise series with components ε_a(t), a = 1, ..., r. Let
The result of this example may be combined with that of the previous example to obtain the spectra of moving average and autoregressive processes.
Example 2.10.2 (Stationary Gaussian Series) The characteristic function of a multivariate Gaussian variable, with mean vector μ and variance-covariance matrix Σ, is given by
We see from this that all cumulant functions of order greater than 2 must vanish for a Gaussian series, and therefore all cumulant spectra of order greater than 2 also vanish for such a series.
where {a(t)} is an s × r filter and ε(t) an r vector pure noise series. From Theorem 2.8.1 we have
Example 2.10.2 (A Linear Process) Suppose that
exist; then c_{a_1...a_k}(u_1, ..., u_{k−1}) = κ_{a_1...a_k} δ{u_1} ··· δ{u_{k−1}}, where δ{u} is the Kronecker delta. We see directly that
We see that cumulant spectra of order greater than 2, in some sense,measure the non-normality of a series.
Example 2.10.3 (Cosinusoids) Suppose that X(t) is an r vector process with components X_a(t) = R_a cos(ω_a t + φ_a), a = 1, ..., r, where R_a is constant, ω_1 + ··· + ω_r = 0 (mod 2π), and φ_1, ..., φ_{r−1} are independent and uniform on (−π, π] while φ_1 + ··· + φ_r = 0 (mod 2π). X(t) is stationary. We note that the members of any proper subset of φ_1, ..., φ_r are independent of each other and so joint cumulants involving such proper subsets vanish. Therefore

cum{X_1(t_1), ..., X_r(t_r)} = ave{X_1(t_1) × ··· × X_r(t_r)}
η(λ) was defined by (2.1.6); see also Exercise 2.13.33.
In the case that r = 1, the power spectrum of the series X(t) = R cos(ωt + φ) is seen to be
It has peaks at the frequencies λ = ±ω (mod 2π). This provides one of the reasons for calling λ the frequency. We see that ω/(2π) is the number of complete cycles the cosinusoid cos(ωt + φ) passes through when t increases by one unit. For this reason λ/(2π) is called the frequency in cycles per unit time. Its reciprocal, 2π/λ, is called the period. λ itself is the angular frequency in radians per unit time.
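A quick numerical check (sample length and frequency are our own illustrative choices) shows the discrete power estimate of a cosinusoid concentrating at λ = ±ω, and the period equalling 2π/ω:

```python
import numpy as np

# The power of X(t) = R cos(omega t + phi) concentrates at the
# frequencies lambda = +/- omega; omega/(2 pi) is the number of cycles
# per unit time and 2 pi / omega the period.

T = 1000
omega = 2 * np.pi * 100 / T            # 0.1 cycles per unit time
t = np.arange(T)
x = 2.0 * np.cos(omega * t + 0.3)

power = np.abs(np.fft.fft(x))**2 / T
peak = int(np.argmax(power[:T // 2]))
assert peak == 100                     # peak bin corresponds to omega
assert abs(2 * np.pi / omega - 10.0) < 1e-12   # period = 10 time units
```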
Example 2.10.4 (Volterra Functional Expansions) We return to Example 2.9.8 and have
Theorem 2.10.1 Let Y(t), t = 0, ±1, ... be given by (2.9.15) where Σ |a_j(u_1, ..., u_j)| < ∞, and
This is a function of t_1 − t_r, ..., t_{r−1} − t_r as ω_1 + ··· + ω_r = 0 (mod 2π). We have
and so
Then the kth order cumulant spectrum of the series Y(t), t = 0, ±1, ... is given by
where the outer sum is over all the indecomposable partitions {P_1, ..., P_M}, M = 1, 2, ... of Table 2.3.4.
We have used the symmetrical notation for the cumulant spectra in Equation (2.10.10). Theorem 2 in Shiryaev (1960) provides a related result.
2.11 THE FUNCTIONAL AND STOCHASTIC APPROACHES TO TIME SERIES ANALYSIS
Currently two different approaches are adopted by workers in time series: the stochastic approach and the functional approach. The former, generally adopted by probabilists and statisticians (Doob (1953) and Cramér and Leadbetter (1967)), is that described in Section 2.2. A given time series is regarded as being selected stochastically from an ensemble of possible series. We have a set Θ of r vector functions θ(t). After defining a probability measure on Θ, we obtain a random function X(t, θ), whose samples are the given functions θ(t). Alternatively, given X(t), we can set up an index θ = X(·) and take Θ to be the set of all θ. We then may set X(t, θ) = X(t, X(·)). In any case we find ourselves dealing with measure theory and probability spaces.
In the second approach, a given r vector time series is interpreted as a mathematical function and the basic ensemble of time functions takes the form {X(t, v) = X(t + v) | v = 0, ±1, ±2, ...}, where X(t) is the given r vector function. This approach is taken in Wiener (1930), for example, and is called generalized harmonic analysis.
The distinction, from the point of view of the theoretician, lies in the different mathematical tools required and the different limiting processes involved.
Suppose that X(t) has components X_a(t), a = 1, ..., r. In the functional approach we assume that limits of the form
exist. A form of stationarity obtains as
independently of v for v = 0, ±1, ±2, .... We now define a cross-covariance function by
If
we can define a second-order spectrum f_ab(λ) as in Section 2.5.
Suppose that the functions X_a(t), a = 1, ..., r, are such that
(i) for given real x_1, ..., x_k and t_1, ..., t_k the proportions, F^{(S,T)}_{a_1...a_k}(x_1, ..., x_k; t_1, ..., t_k), of t's in the interval [−S, T] such that
tends to a limit F_{a_1...a_k}(x_1, ..., x_k; t_1, ..., t_k) (at points of continuity of this function) as S, T → ∞, and
(ii) a compactness assumption such as
is satisfied for all S, T and some u > 0.
In this case the F_{a_1...a_k}(x_1, ..., x_k; t_1, ..., t_k) provide a consistent and symmetric family of finite dimensional distributions and so can be associated with some stochastic process by the Kolmogorov extension theorem; see Doob (1953). The limit in (i) depends only on the differences t_1 − t_k, ..., t_{k−1} − t_k and so the associated process is strictly stationary. If in (ii) we have u ≥ k and X(t) is the associated stationary process, then
and the association makes sense. X(t) will satisfy Assumption 2.6.1 if the cumulant-type functions derived from X(t) satisfy (2.6.1).
In other words, if the function (of the functional approach) satisfies certain regularity conditions, then there is a strictly stationary process whose analysis is equivalent.
Conversely, if X(t) is ergodic (metrically transitive), then with probability 1 any sample path satisfies the required limiting properties and can be taken as the basis for a functional approach.¹
In conclusion, we have
Theorem 2.11.1 If an r vector function satisfies (i) and (ii) above, then a stationary stochastic process can be associated with it having the same limiting properties. Alternatively, if a stationary process is ergodic, then with probability 1 any of its sample paths can be taken as the basis for a functional approach.
These two approaches are directly comparable to the two approaches to statistics through kollektivs (von Mises (1964)) and measurable functions (Doob (1953)); see also von Mises and Doob (1941).
The condition that X(t) be ergodic is not overly restrictive for our purposes, since it is ergodic when it satisfies Assumption 2.6.1 and is determined by its moments; see Leonov (1960). We note that a general stationary process is a mixture of ergodic processes (Rozanov (1967)), and the associated process obtained by the above procedure will correspond to some component of the mixture. The limits in (i) will exist with probability 1; however, they will generally be random variables.
Wold (1948) discusses relations between the functional and stochastic approaches in the case of second-order moments.
We note that the limits required in expressions (2.11.1) and (2.11.2) followunder certain conditions from the existence of the limits in (i); see Wintner(1932).
We will return to a discussion of the functional approach to time seriesanalysis in Section 3.9.
2.12 TRENDS
One simple form of departure from the assumption of stationarity is that the series X(t), t = 0, ±1, ... has the form
¹X(t) is ergodic if for any real-valued f[x] with ave |f[X(t)]| < ∞, with probability 1,
See Cramér and Leadbetter (1967), Wiener et al. (1967), Halmos (1956), Billingsley (1965), and Hopf (1937).
where the series ε(t), t = 0, ±1, ... is stationary, while m(t), t = 0, ±1, ... is a nonconstant deterministic function. If, in addition, m(t) does not satisfy conditions of the character of those of Section 2.11, then a harmonic analysis of X(t) is not directly available. Our method of analysis of such series will be to try to isolate the effects of m(t) and ε(t) for separate analysis.
If the function m(t), t = 0, ±1, ... varies slowly, it will be referred to as a trend. Many series occurring in practice appear to possess such a trend component. The series of United Kingdom exports graphed in Figure 1.1.4 appears to have this characteristic. In Section 5.11 we will discuss the estimation of trend functions of simple form.
2.13 EXERCISES
2.13.1 Let X(t) = cos(λt + θ), where θ has a uniform distribution on (−π, π]. Determine the finite dimensional distributions of the process, the mean function c_X(t), and the autocovariance function c_XX(t_1, t_2).
2.13.2 If (Y_1, ..., Y_r) is an r variate chance quantity for which cum(Y_{j_1}, ..., Y_{j_s}) exists, j_1, ..., j_s = 1, ..., r, and Z_k = Σ_j a_{kj} Y_j, k = 1, ..., s, prove that

cum(Z_{k_1}, ..., Z_{k_s}) = Σ_{j_1} ··· Σ_{j_s} a_{k_1 j_1} ··· a_{k_s j_s} cum(Y_{j_1}, ..., Y_{j_s}), k_1, ..., k_s = 1, ..., s.
2.13.3 Denote cum(Y_1[m_1 times], ..., Y_r[m_r times]) and cum(Z_1[n_1 times], ..., Z_s[n_s times]) by K_{m_1...m_r}(Y) and K_{n_1...n_s}(Z), respectively, and let K^[m](Y) and K^[n](Z), m = m_1 + ··· + m_r, n = n_1 + ··· + n_s, denote the vectors with these components. Denote the transformation of 2.13.2 by Z = AY, where A is an s × r matrix. Prove that K^[n](Z) = A^[n] K^[n](Y), where A^[n] is the nth symmetric Kronecker power of A; see Hua (1963) pp. 10, 100.
2.13.4 Determine the transfer function of the filter of (2.9.6).
2.13.5 Show that the power spectrum of the (wide sense stationary) series X(t) = R cos(ωt + φ), where R is a constant, ω is a random variable with continuous density function f(ω), and φ is an independent uniform variate on (−π, π], is given by
2.13.6 Prove that the transfer function of the r × r filter indicated by
has off-diagonal elements 0 and diagonal elements [sin((2N + 1)λ/2)] / [(2N + 1) sin(λ/2)].
2.13.7 Let Y_1(t) = Σ_u a_11(t − u)X_1(u), Y_2(t) = Σ_u a_22(t − u)X_2(u), where {X_1(t), X_2(t)} satisfies Assumption 2.6.1. Suppose the transfer functions
A_11(λ), A_22(λ) are not 0. Denote the second-order spectra of {X_1(t), X_2(t)} by f_jk(λ) and those of {Y_1(t), Y_2(t)} by g_jk(λ), j, k = 1, 2. Prove that
2.13.8 Prove that δ(x) = lim_{W→∞} W F(Wx), where ∫ |F(x)| dx < ∞ and ∫ F(x) dx = 1.
2.13.9 If X(t), Y(t) are statistically independent r vector series with cumulant spectra f_{a_1...a_k}(λ_1, ..., λ_k), g_{a_1...a_k}(λ_1, ..., λ_k), respectively, prove that the cumulant spectra of X(t) + Y(t) are given by f_{a_1...a_k}(λ_1, ..., λ_k) + g_{a_1...a_k}(λ_1, ..., λ_k).
2.13.10 If X(t) and a(t) are real-valued, Y(t) = Σ_u a(t − u)X(u), and X(t) has cumulant spectra f_{X...X}(λ_1, ..., λ_k), prove that the cumulant spectra of Y(t) are given by A(λ_1) ··· A(λ_k) f_{X...X}(λ_1, ..., λ_k).
2.13.11 Prove that f_{a_1...a_k}(λ_1, ..., λ_k) equals the complex conjugate of f_{a_1...a_k}(−λ_1, ..., −λ_k) for an r vector series with real-valued components.
2.13.12 If X(t) is a stationary Gaussian Markov r vector process with
prove that
and c_XX(u) = c_XX(−u)^τ, u < 0.
2.13.13 Prove that the power spectrum of a real-valued stationary Gaussian Markov process has the form σ² / [2π(1 + ρ² − 2ρ cos λ)], −π < λ ≤ π, −1 < ρ < 1.
2.13.14 Give an example to indicate that X(t) of Section 2.11 is not necessarily ergodic.
2.13.15 Let X^(N)(t), t = 0, ±1, ...; N = 1, 2, ... be a sequence of series satisfying Assumption 2.6.1. Suppose
for t, u_1, ..., u_{k−1} = 0, ±1, ...; N = 1, 2, ... where
Suppose, as N → ∞, all the finite dimensional distributions of the process X^(N)(t), t = 0, ±1, ... tend in distribution to those of a process X(t), t = 0, ±1, .... Show that
2.13.16 Show that the transfer function of the filter
vanishes at λ = ±ω. Discuss the effect of this filter on the series
2.13.17 Let X(t) = 1 for (2j − 1)² ≤ t ≤ (2j)² and
let X(t) = −1 for
Prove that X(t) satisfies the conditions of Section 2.11 and determine the associated stochastic process.
2.13.18 Let X(t) = R cos(ωt + φ) where R, ω, and φ are constants. Prove that X(t) satisfies the conditions of Section 2.11 and determine the associated stochastic process.
2.13.19 Let X(t), t = 0, ±1, ... and Y(t), t = 0, ±1, ... be independent series with mean 0 and power spectra f_XX(λ), f_YY(λ), respectively. Show that the power spectrum of the series X(t)Y(t), t = 0, ±1, ... is
2.13.20 Let X(t), t = 0, ±1, ... be a Gaussian series with mean 0 and power spectrum f_XX(λ). Show that the power spectrum of the series X(t)², t = 0, ±1, ... is
2.13.21 If X(t) is a real-valued series satisfying Assumption 2.6.1, prove directly that [X(t)]² also satisfies Assumption 2.6.1 and determine its cumulant spectra.
2.13.22 If X(t) satisfies Assumption 2.6.2(l) and Y(t) = Σ_u a(t − u)X(u) for a(u) an s × r filter with Σ_u |u|^l |a_jk(u)| < ∞, j = 1, ..., s, k = 1, ..., r, for some l, then Y(t) satisfies Assumption 2.6.2(l).
2.13.23 An s × r filter a(u) is said to have rank t if A(λ) has rank t for each λ. Prove that in this case a(u) is equivalent in effect to applying first a t × r filter and then an s × t filter.
2.13.24 If X(t) = Σ_{u=0}^∞ a(u)ε(t − u) with ε(t) an r vector pure noise series and {a(u)} an s × r summable filter, prove that f_XX(λ) may be written in the form Φ(e^{iλ}) Φ̄(e^{iλ})^τ, where Φ(z) is an s × r matrix-valued function with components analytic in the disc |z| ≤ 1.
2.13.25 If X(t) = Σ_{u=0}^∞ a(u)ε(t − u) with ε(u) a real-valued pure noise series and Σ a(u)² < ∞, prove that the kth order cumulant spectrum f_{X...X}(λ_1, ..., λ_k) has the form Φ(e^{iλ_1}) ··· Φ(e^{iλ_k}), λ_1 + ··· + λ_k = 0 (mod 2π), with Φ(z) analytic in the disc |z| ≤ 1.
2.13.26 If X(t) is a moving average process of order m, prove that c_XX(u) = 0 for |u| > m.
2.13.27 If we adopt the functional approach to time series analysis, demonstrate that Y(t) = Σ_u a(t − u)X(u), with Σ_u |a(u)| < ∞, defines a filter. Indicate the relation between the spectra of Y(t) and those of X(t).
2.13.28 Show that V_1(t), V_2(t) of (2.7.33) come from X(t) through filters with coefficients {a(u) cos λ_0 u}, {a(u) sin λ_0 u}, respectively.
2.13.29 Prove that δ(ax) = |a|^(−1) δ(x).
2.13.30 Let X(t) be a stationary r vector-valued series with X_j(t) = ρ_j X_j(t − 1) + ε_j(t), |ρ_j| < 1, j = 1, ..., r, where ε(t) is an r vector pure noise series. Prove that
where τ = min(t_1, ..., t_k), a = (a_1, ..., a_k) and κ_{a_1...a_k} = cum{ε_{a_1}(t), ..., ε_{a_k}(t)}.
Hint: Define I_XX^(T)(λ) as in the proof of Theorem 2.5.2. See Bochner (1959) p. 329 and Grenander (1954). Prove that ∫_{−π}^{π} A(α) I_XX^(T)(α) dα → ∫_{−π}^{π} A(α) dG_XX(α) for A(α) continuous on [−π, π] also.
2.13.31 Let Φ(T), T = 1, 2, ... be a sequence of positive numbers with the properties Φ(T) → ∞ and Φ(T + 1)/Φ(T) → 1 as T → ∞. Let X(t), t = 0, ±1, ... be an r vector-valued function with the property
for u = 0, ±1, .... Show that there exists an r × r matrix-valued function G_XX(λ), −π < λ ≤ π, such that
2.13.32 Let X(t), t = 0, ±1, ... be a vector-valued stationary series with cumulant spectra f_{a_1...a_k}(λ_1, ..., λ_k). Evaluate the cumulant spectra of the time-reversed series X(−t), t = 0, ±1, ....
2.13.33 Show that
for −∞ < λ < ∞. (The Poisson Summation Formula; see Edwards (1967).)
2.13.34 Show that the function c_XX(u) = 1 for |u| ≤ m and c_XX(u) = 0 otherwise cannot be an autocovariance function.
2.13.35 Let X_j(t), t = 0, ±1, ...; j = 0, ..., J − 1 be J independent realizations of a stationary process. Let
Show that Y(t), t = 0, ±1, ... is a stationary series. Show that its power spectrum is f_XX(λJ).
2.13.36 In the case that X(t), t = 0, ±1, ... is a linear process with
show that Assumption 2.6.3 is satisfied provided, for z in a neighborhood of 0,
2.13.37 A filter is called stable if it carries a bounded input series over into a bounded output series. Show that a summable filter is stable.
2.13.38 Let x(t), t = 0, ±1, ... be an autoregressive process of order 1. Let e(t), t = 0, ±1, ... be a pure noise series. Let X(t) = x(t) + e(t). Show that the series X(t), t = 0, ±1, ... is a mixed moving average autoregressive process of order (1, 1).
2.13.39 State and prove an extension of Theorem 2.9.1 in which both the series X(t) and Y(t) are vector-valued.
3
ANALYTIC PROPERTIES OF FOURIER TRANSFORMS AND COMPLEX MATRICES
3.1 INTRODUCTION
The principal analytic tool that we will employ with time series is the Fourier transform. In this chapter we present those portions of Fourier analysis that will be required for our discussion. All the functions considered in this chapter will be fixed, rather than stochastic. Stochastic properties of Fourier transforms will be considered in the next chapter.
Among the topics discussed here are the following: the degree of approximation of a function by the partial sums of its Fourier series; the improvement of this approximation by the insertion of convergence factors; the Fourier transform of a finite set of values; the rapid numerical evaluation of Fourier transforms; the spectrum of a matrix and its relation to the approximation of one matrix by another of reduced rank; mathematical properties of functions of Fourier transforms; and finally the spectral or harmonic representation of functions possessing a generalized harmonic analysis.
We begin by considering the Fourier series of a given function A(λ).
3.2 FOURIER SERIES
Let A(λ), −∞ < λ < ∞, be a complex-valued function of period 2π such that
The Fourier coefficients of A(λ) are given by
The function
is plotted in Figure 3.2.1 for the values n = 1, 3, 5, 10. We note that it fluctuates in sign and that it is concentrated in the neighborhood of α = 0, becoming more concentrated as n increases. Also
and the Fourier series of A(λ) is then given by
There is an extensive literature concerning Fourier series and Fourier coefficients; see Zygmund (1959) and Edwards (1967), for example. Much of this literature is concerned with the behavior of the partial sums
In this work we will have frequent occasion to examine the nearness of A^(n)(λ) to A(λ) for large n. Begin by noting, from expression (3.2.2) and Exercise 1.7.5, that
In consequence of these properties of the function (3.2.6) we see that A^(n)(λ) is a weighted average of the function A(λ − α), with weight concentrated in the neighborhood of α = 0. We would expect A^(n)(λ) to be near A(λ) for large n, if the function A(α) is not too irregular. In fact we can show that A^(n)(λ) tends to A(λ) as n → ∞ if, for example, A(α) is of bounded variation; see Edwards (1967) p. 150.
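The convergence of the partial sums A^(n)(λ) for a smooth function can be illustrated numerically; the example function e^{cos λ} and the discretization below are our own choices:

```python
import numpy as np

# Partial sums A^(n)(x) = sum_{|u| <= n} a(u) exp(i u x) of the Fourier
# series of the smooth 2 pi-periodic function A(x) = exp(cos x); the
# approximation error should shrink as n grows.

def A(lam):
    return np.exp(np.cos(lam))

K = 4096
lam = -np.pi + 2 * np.pi * np.arange(K) / K   # grid over (-pi, pi]

def coeff(u):
    # Fourier coefficient a(u) by the (exact-for-this-grid) rectangle rule
    return np.mean(A(lam) * np.exp(-1j * u * lam))

def partial_sum(x, n):
    return np.real(sum(coeff(u) * np.exp(1j * u * x)
                       for u in range(-n, n + 1)))

x0 = 1.0
err5 = abs(partial_sum(x0, 5) - A(x0))
err15 = abs(partial_sum(x0, 15) - A(x0))
assert err15 < err5 < 0.01            # rapid decay for a smooth function
```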
Under supplementary regularity conditions we can measure the rapidity of approach of A^(n)(λ) to A(λ) as n → ∞. Suppose
Figure 3.2.1 Plot of D_n(α) = sin((n + ½)α) / (2π sin(½α)).
This condition is tied up with the degree of smoothness of A(α). Under it, A(α) has bounded continuous derivatives of order ≤ k. We have therefore
and so
In summary, the degree of approximation of A(λ) by A^(n)(λ) is intimately related to the smoothness of A(λ).
We warn the reader that A^(n)(λ) need not necessarily approach A(λ) as n → ∞, even in the case that A(λ) is a bounded continuous function of λ; see Edwards (1967) p. 150, for example. However, the relationship between the two functions is well illustrated by (3.2.5). The behavior of A(λ) − A^(n)(λ) is especially disturbed in the neighborhood of discontinuities of A(λ). Gibbs' phenomenon, involving the nondiminishing overshooting of a functional value, can occur; see Hamming (1962) p. 295 or Edwards (1967) p. 172.
3.3 CONVERGENCE FACTORS
Fejér (1900, 1904) recognized that the partial sums of a Fourier series might be poor approximations of a function of interest even if the function were continuous. He therefore proposed that instead of the partial sum (3.2.4) we consider the sum

Using expression (3.2.2) and Exercise 1.7.12 we see that (3.3.1) may be written

The function

We will make use of the Landau o, O notations, writing a_n = o(β_n) when a_n/β_n → 0 as n → ∞ and writing a_n = O(β_n) when |a_n/β_n| is bounded for sufficiently large n.
is plotted in Figure 3.3.1 for n = 2, 4, 6, 11. It is seen to be non-negative, concentrated in the neighborhood of α = 0 and, following Exercise 1.7.12, such that

It is blunter than the function (3.2.6) of the previous section and has fewer ripples. This greater regularity leads to the convergence of (3.3.2) to A(λ) in the case that A(α) is a continuous function, in contrast to the behavior of (3.2.5); see Edwards (1967) p. 87. The insertion of the factors 1 − |u|/n in expression (3.3.1) has expanded the class of functions that may be reasonably represented by trigonometric series.

Figure 3.3.1 Plot of sin²(½nα)/(2πn sin²(½α)).
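The contrast between the Fejér kernel of Figure 3.3.1 and the Dirichlet kernel of Figure 3.2.1 can be checked numerically; a sketch (function names are ours):

```python
import math

def fejer_kernel(alpha: float, n: int) -> float:
    """Closed form sin^2(n alpha / 2) / (2 pi n sin^2(alpha / 2))."""
    s = math.sin(alpha / 2)
    if abs(s) < 1e-12:                             # removable singularity at alpha = 0
        return n / (2 * math.pi)
    return math.sin(n * alpha / 2) ** 2 / (2 * math.pi * n * s ** 2)

def fejer_sum(alpha: float, n: int) -> float:
    """The same kernel as (1/2 pi) sum_{|u|<n} (1 - |u|/n) exp(i u alpha)."""
    return (1 + 2 * sum((1 - u / n) * math.cos(u * alpha)
                        for u in range(1, n))) / (2 * math.pi)

n = 6
assert abs(fejer_kernel(1.3, n) - fejer_sum(1.3, n)) < 1e-10
# Unlike the Dirichlet kernel, the Fejer kernel never goes negative.
assert all(fejer_kernel(-math.pi + k * 0.01, n) >= 0 for k in range(629))
```

The non-negativity is exactly the "greater regularity" referred to above: a non-negative kernel averages A(λ − α) without the cancellation that produces ripples.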
In general we may consider expressions of the form

for some function h(x) with h(0) = 1 and h(x) = 0 for |x| > 1. The multiplier h(u/n) appearing in (3.3.5) is called a convergence factor; see, for example, Moore (1966). If we set

(3.3.5) may be written

indicating that (3.3.5) is a weighted average of the function of interest. A wide variety of convergence factors h(u/n) have been proposed. Some of these are listed in Table 3.3.1 along with their associated H^(n)(λ). The typical shape of h(u/n) involves a maximum of 1 at u = 0, followed by a steady decrease to 0 as |u| increases to n. Convergence factors have also been called data windows and tapers; see Tukey (1967).

The typical form of H^(n)(λ) is that of a blocky weight function in the neighborhood of 0 that becomes more concentrated as n → ∞. In fact it follows from (3.3.6) that

as we would expect. An examination of expression (3.3.7) suggests that for some purposes we may wish to choose H^(n)(λ) to be non-negative. The second and third entries of Table 3.3.1 possess this property. The function H^(n)(λ) has been called a frequency window and a kernel. From (3.3.7) we see that the nearness of (3.3.5) to A(λ) relates to the degree of concentration of the function H^(n)(α) about α = 0. Various measures of this concentration, or bandwidth, have been proposed. Press and Tukey (1956) suggested the half-power width given by α_L − α_U, where α_L and α_U are the first positive and negative α such that H^(n)(α) = H^(n)(0)/2. Grenander (1951) suggested the measure

This is the mean-squared error about 0 if H^(n)(α) is considered as a probability distribution on (−π, π). Parzen (1961) has suggested the measure
Table 3.3.1 Some Particular Convergence Factors

Authors:
Dirichlet [Edwards (1967)]
Fejér, Bartlett [Edwards (1967), Parzen (1963)]
de la Vallée-Poussin, Jackson, Parzen [Akhiezer (1956), Parzen (1961)]
Hamming, Tukey [Blackman and Tukey (1958)]
Bohman [Bohman (1960)]
Poisson [Edwards (1967)]
Riemann, Lanczos [Edwards (1967), Lanczos (1956)]
Gauss, Weierstrass [Akhiezer (1956)]
Cauchy, Abel, Poisson [Akhiezer (1956)]
Riesz, Bochner, Parzen [Bochner (1936), Parzen (1961)]
Tukey [Tukey (1967)]
This is the width of the rectangle of the same maximum height and area as H^(n)(α).

A measure that is particularly easy to handle is

Its properties include: if h(u) has second derivative h″(0) at u = 0, then

showing a connection with Grenander's measure (3.3.9). Alternately, if the kernel being employed is the convolution of kernels G^(n)(α), H^(n)(α), then we can show that

for large n. Finally, if

exists for some q > 0, as Parzen (1961) assumes, then

Table 3.3.2 gives the values of the bandwidth measure β_n^H and of 1/H^(n)(0) for the kernels of Table 3.3.1. The entries of this table give an indication of the relative asymptotic concentration of the various kernels.
The following theorem gives an alternate means of examining the asymp-totic degree of approximation.
Theorem 3.3.1 Suppose A(λ) has bounded derivatives of order ≤ P. Suppose

with

for some finite K, then

Expression (3.3.11) gives a useful indication of the manner in which the nearness of (3.3.5) to A(λ) depends on the convergence factors employed. If possible it should be arranged that

be 0 for p = 1, 2, .... If h(x) = h(−x), then this is the case for odd values of p. The requirement for even p is equivalent to requiring that h(x) be very flat near x = 0. The last function of Table 3.3.1 is notable in this respect.

In fact, the optimum h(u/n) will depend on the particular A(λ) of interest. A considerable mathematical theory has been developed concerning the best approximation of functions by trigonometric polynomials; see Akhiezer (1956) or Timan (1963), for example. Bohman (1960) and Akaike (1968) were concerned with the development of convergence factors appropriate for a broad class of functions; see also Timan (1962), Shapiro (1969), Hoff (1970), Butzer and Nessel (1971).

Wilkins (1948) indicates asymptotic expansions of the form of (3.3.18) that are valid under less restrictive conditions.

Table 3.3.2 Bandwidths of the Kernels

Kernel: Dirichlet; Fejér; de la Vallée-Poussin; Hamming; Bohman; Poisson; Riemann; Gauss; Cauchy; Riesz; Tukey

The preceding discussion leads us to consider a filter of, for example, the form

for some convergence factors h(u/n).
As an application of the discussion of this section we now turn to the problem of filter design. Suppose that we wish to determine time domain coefficients a(u), u = 0, ±1, ... of a filter with prespecified transfer function A(λ). The relation between a(u) and A(λ) is given by the expressions

The filter has the form

if X(t), t = 0, ±1, ... is the initial series. Generally a(u) does not vanish for large |u| and only a finite stretch of the X(t) series is available. These facts lead to difficulty in applying (3.3.22). We can consider the problem of determining a finite length filter with transfer function near A(λ). This may be formalized as the problem of determining multipliers h(u/n) so that

is near A(λ). This is the problem discussed above.

Suppose that we wish to approximate a low-pass filter with cut-off frequency Ω < π; that is, the desired transfer function is

for −π < λ < π. This filter has coefficients
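For the ideal low-pass filter the coefficients are a(0) = Ω/π and a(u) = sin(Ωu)/(πu) otherwise, and the effect of inserting convergence factors can be sketched numerically. Here the Fejér factors 1 − |u|/n stand in for any column of Table 3.3.1, and the tolerances are illustrative assumptions:

```python
import math

def lowpass_coeff(u: int, omega: float) -> float:
    """Ideal low-pass coefficients: a(0) = omega/pi, a(u) = sin(omega u)/(pi u)."""
    return omega / math.pi if u == 0 else math.sin(omega * u) / (math.pi * u)

def windowed_transfer(lam: float, omega: float, n: int) -> float:
    """A^(n)(lambda) = sum_{|u|<n} h(u/n) a(u) cos(u lambda) with Fejer factors."""
    return sum((1 - abs(u) / n) * lowpass_coeff(u, omega) * math.cos(u * lam)
               for u in range(-n + 1, n))

omega, n = math.pi / 2, 200
assert abs(windowed_transfer(0.5, omega, n) - 1) < 0.01   # passband value near 1
assert abs(windowed_transfer(2.5, omega, n)) < 0.01       # stopband value near 0
```

Because a(u) is even here, the achieved transfer function is real and the exponential in (3.3.23) reduces to a cosine.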
Figure 3.3.2 Transfer function of ideal Hilbert transform and approximations with variousfactors, n = 7.
Suppose alternately that we would like to realize numerically the Hilbert transform introduced in Section 2.7. Its transfer function is

The filter coefficients are therefore

Suppose n is odd. We are led therefore to consider filters of the form

Figure 3.3.2 indicates the imaginary part of the ideal A(λ) of (3.3.27) for 0 < λ < π/2 and the imaginary part of the A(λ) achieved by (3.3.29) with n = 7 for a variety of the convergence factors of Table 3.3.1. Because of the symmetries of the functions involved we need present only the functions for this restricted frequency range. The importance of inserting convergence factors is well demonstrated by these diagrams. We also see the manner in which different convergence factors can affect the result.

References concerned with the design of digital filters include: Kuo and Kaiser (1966), Wood (1968) and No. 3 of the IEEE Trans. Audio Electro. (1968). Goodman (1960) was concerned with the numerical realization of a Hilbert transform and of band-pass filters. In Section 3.6 we will discuss a means of rapidly evaluating the filtered series Y(t). Parzen (1963) discusses a variety of the topics of this section.
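A numerical sketch of the Hilbert-transform example follows. The coefficient formula a(u) = (1 − cos πu)/(πu), u ≠ 0, is the standard one for the transfer function −i sgn λ; the evaluation points and comparison constant are ours:

```python
import math

def hilbert_coeff(u: int) -> float:
    """a(u) = (1 - cos(pi u)) / (pi u): equals 2/(pi u) for odd u, 0 for even u."""
    return 0.0 if u == 0 else (1 - math.cos(math.pi * u)) / (math.pi * u)

def achieved_im(lam: float, n: int, taper: bool) -> float:
    """Imaginary part of sum_{|u|<=n} h(u/n) a(u) exp(-i u lam);
    the ideal value is -1 for 0 < lam < pi."""
    total = 0.0
    for u in range(-n, n + 1):
        h = 1 - abs(u) / (n + 1) if taper else 1.0   # Fejer factors vs raw truncation
        total -= h * hilbert_coeff(u) * math.sin(u * lam)
    return total

n = 7
raw = achieved_im(math.pi / 2, n, taper=False)
# At lam = pi/2 the truncated sum is -(4/pi)(1 - 1/3 + 1/5 - 1/7).
assert abs(raw + (4 / math.pi) * (1 - 1/3 + 1/5 - 1/7)) < 1e-9
assert abs(achieved_im(-1.0, n, True) + achieved_im(1.0, n, True)) < 1e-12  # odd symmetry
```

Plotting achieved_im over 0 < λ < π/2 for the various factors reproduces the qualitative behavior of Figure 3.3.2.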
3.4 FINITE FOURIER TRANSFORMS AND THEIR PROPERTIES
Given the sequence a(u), u = 0, ±1, ... our work in the previous sections has led us to consider expressions of the form

For fixed n, such an expression is called a finite Fourier transform of the sequence a(u), u = 0, ±1, ..., ±n. Such transforms will constitute the essential statistics of our analysis of time series.
Before proceeding further, it is worthwhile to alter our notation slightly and to consider the general case of a vector-valued sequence. Specifically we consider an r vector-valued sequence X(0), X(1), ..., X(T − 1), whose domain is 0, 1, ..., T − 1 rather than −n, −n + 1, ..., −1, 0, 1, ..., n. We define the finite Fourier transform of this sequence to be

In the case that T = 2n + 1, n being an integer, we may write

and thus we see that the only essential difference between definitions of the form (3.4.1) and the form (3.4.2) is a multiplier of modulus 1. Which definition is more convenient depends on the situation being discussed.

Among the properties of the definition (3.4.2) we note

Also, if the components of X(t) are real-valued, then

These two properties imply that, in the case of real-valued components, the principal domain of d_X^(T)(λ) may be taken to be 0 ≤ λ ≤ π. Continuing, we note that if X(t), Y(t), t = 0, ..., T − 1 are given and if α and β are scalars, then
On occasion we may wish to relate the finite Fourier transform of the convolution of two sequences to the Fourier transforms of the two sequences themselves. We have

Lemma 3.4.1 Let X(t), t = 0, ±1, ... be r vector-valued and uniformly bounded. Let a(t), t = 0, ±1, ... be s × r matrix-valued and such that

Set

Then there is a finite K such that
where
We see that the finite Fourier transform of a filtered series is approximately the product of the transfer function of the filter and the finite Fourier transform of the series. This result will later provide us with a useful means of realizing the digital filtering of a series of interest. See also Lemma 6.3.1.
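This approximation is easy to check numerically; a sketch, in which the three-point moving-average filter, series length, and error tolerance are illustrative assumptions:

```python
import cmath
import math
import random

def finite_ft(x, lam):
    """d^(T)(lambda) = sum_{t=0}^{T-1} x(t) exp(-i lambda t), as in (3.4.2)."""
    return sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))

random.seed(1)
T = 512
X = [random.gauss(0, 1) for _ in range(T)]
a = [0.25, 0.5, 0.25]                                # a short summable filter {a(u)}
Y = [sum(a[u] * X[t - u] for u in range(3) if t - u >= 0) for t in range(T)]

lam = 2 * math.pi * 100 / T
A = sum(a[u] * cmath.exp(-1j * lam * u) for u in range(3))   # transfer function A(lam)
# The edge-effect error stays O(1) while |d_X| typically grows like sqrt(T).
err = abs(finite_ft(Y, lam) - A * finite_ft(X, lam))
assert err < 5.0
```

The bounded error term is exactly the finite K of Lemma 3.4.1: it comes from the few filtered values near the ends of the stretch.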
We now indicate a few examples of finite Fourier transforms. For these cases it is simplest to take the symmetric definition

Example 1 (Constant) Suppose X(t) = 1, t = 0, ±1, ...; then expression (3.4.10) equals

This function was plotted in Figure 3.2.1 for 0 < λ < π. Notice that it has peaks at λ = 0, ±2π, ....

Example 2 (Cosinusoid) Suppose X(t) = exp{iωt}, t = 0, ±1, ... with ω real-valued; then (3.4.10) equals

This is the transform of Example 1 translated along by ω units. It has peaks at λ = ω, ω ± 2π, ....

Example 3 (Trigonometric Polynomial) Suppose X(t) = Σ_k ρ_k exp{iω_k t}. Clearly, from what has gone before, (3.4.10) equals

an expression with large amplitude at λ = ω_k ± 2πl, l = 0, ±1, ....

Example 4 (Monomials) Suppose X(t) = t^k, t = 0, ±1, ..., k a positive integer. Expression (3.4.10) becomes
This transform behaves like the derivatives of the transform of Example 1. Notice that it is concentrated in the neighborhood of λ = 0, ±2π, ... for large n.

A polynomial Σ_k a_k t^k will behave as a linear combination of functions of the form (3.4.14).
Example 5 (Monomial Amplitude Cosinusoid) Suppose X(t) = t^k exp{iωt}; then (3.4.10) is

This is the function of Example 4 translated along by ω frequency units.

The general nature of these results is the following: the Fourier transform of a function X(t) is concentrated in amplitude near λ = 0, ±2π, ... if X(t) is constant or slowly changing with t. It is concentrated near λ = ω, ω ± 2π, ... if X(t) is a cosinusoid of frequency ω or is a cosinusoid of frequency ω multiplied by a polynomial in t.

The transform (3.4.2) may be inverted by the integration

Alternatively it is seen to be inverted by the sum

The T r vectors d_X^(T)(2πs/T), s = 0, ..., T − 1, are sometimes referred to as the discrete Fourier transform of X(t), t = 0, ..., T − 1. We will discuss its numerical evaluation and properties in the next two sections.

The discrete Fourier transform may be written in matrix form. Let 𝒳 denote the r × T matrix whose columns are X(0), ..., X(T − 1) successively. Let 𝒟 denote the r × T matrix whose columns are d_X^(T)(2πs/T), s = 0, ..., T − 1. Let ℱ denote the T × T matrix with exp{−i(2πst/T)} in row s + 1 and column t + 1 for s, t = 0, ..., T − 1. Then we see that we have
The cases T = 1, 2, 3, 4 are seen to correspond to the respective matrices

General discussions of discrete and finite Fourier transforms are given in Stumpff (1937), Whittaker and Robinson (1944), Schoenberg (1950), Cooley, Lewis, and Welch (1967). Further properties are given in the exercises at the end of this chapter.
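The matrix ℱ is easy to generate for any T; a sketch:

```python
import cmath

def dft_matrix(T):
    """The T x T matrix with exp(-i 2 pi s t / T) in row s + 1, column t + 1."""
    return [[cmath.exp(-2j * cmath.pi * s * t / T) for t in range(T)]
            for s in range(T)]

# For T = 4 the entries are +-1 and +-i, as in the displayed matrices.
F4 = dft_matrix(4)
assert abs(F4[1][1] - (-1j)) < 1e-12
assert abs(F4[2][2] - 1) < 1e-12
```

For T = 1, 2 the entries are purely real (±1), which is one reason small-T transforms need fewer operations than the general count suggests.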
3.5 THE FAST FOURIER TRANSFORM
In this book the discrete Fourier transform will be the basic entity from which statistics of interest will be formed. It is therefore important to be able to calculate readily the discrete Fourier transform of a given set of numbers X(t), 0 ≤ t ≤ T − 1.

We have

and note that T² complex multiplications are required if we calculate the discrete Fourier transform directly from its definition. If T is composite (the product of several integers), then elementary procedures to reduce the required number of multiplications have often been employed; see Cooley et al (1967). Recently formal algorithms, which reduce the required number of multiplications to what must be a near minimum, have appeared; see Good (1958), Cooley and Tukey (1965), Gentleman and Sande (1966), Cooley et al (1967), Bergland (1967), Bingham et al (1967) and Brigham and Morrow (1967). For a formulation in terms of a composition series of a finite group see Posner (1968) and Cairns (1971).
We now indicate the form of these Fast Fourier Transform Algorithms, beginning with two elementary cases. The underlying idea is to reduce the calculation of the discrete Fourier transform of a long stretch of data to the calculation of successive Fourier transforms of shorter sets of data. We begin with

Theorem 3.5.1 Let T = T1T2, where T1 and T2 are integers; then

We note that j1T2 + j2 runs through all integers j, 0 ≤ j ≤ T − 1 for 0 ≤ j1 ≤ T1 − 1 and 0 ≤ j2 ≤ T2 − 1. We note that (T1 + T2)T1T2 complex multiplications are required in (3.5.2) to perform the discrete Fourier transforms of orders T1 and T2. Certain additional operations will be required to insert the terms exp{−i2πT⁻¹j2t1}.
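A sketch of the T = T1T2 reduction, using the index splitting t = t1 + T1t2 and j = j1T2 + j2 (one common convention; the book's indexing may differ in detail), checked against the direct transform:

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform: T^2 complex multiplications."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * jj * t / T) for t in range(T))
            for jj in range(T)]

def dft_two_factor(x, T1, T2):
    """Inner T2-point transforms, twiddle factors, then outer T1-point transforms."""
    T = T1 * T2
    out = [0j] * T
    for j2 in range(T2):
        # one inner T2-point transform for each t1
        inner = [sum(x[t1 + T1 * t2] * cmath.exp(-2j * cmath.pi * j2 * t2 / T2)
                     for t2 in range(T2)) for t1 in range(T1)]
        # the extra terms exp(-i 2 pi j2 t1 / T) noted above
        tw = [inner[t1] * cmath.exp(-2j * cmath.pi * j2 * t1 / T) for t1 in range(T1)]
        for j1 in range(T1):
            out[j1 * T2 + j2] = sum(tw[t1] * cmath.exp(-2j * cmath.pi * j1 * t1 / T1)
                                    for t1 in range(T1))
    return out

x = [complex(k % 5, -k % 3) for k in range(12)]
direct, fast = dft(x), dft_two_factor(x, 3, 4)
assert all(abs(a - b) < 1e-8 for a, b in zip(direct, fast))
```

Applying the same splitting recursively to the inner transform gives the general mixed-radix algorithm described below.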
A different algorithm is provided by the following theorem, in which we let X(t) denote the period T extension of X(0), ..., X(T − 1).

Theorem 3.5.2 Let T = T1T2, where T1 and T2 are relatively prime integers; then for j ≡ j1 (mod T1), j ≡ j2 (mod T2), 0 ≤ j1 ≤ T1 − 1, 0 ≤ j2 ≤ T2 − 1

The number of complex multiplications required is again (T1 + T2)T1T2. In this case we must determine, for each j, the j1 and j2 above and use this information to select the appropriate Fourier coefficient. Notice that the exp{−i2πT⁻¹j2t1} terms of (3.5.2) are absent and that the result is symmetric in T1 and T2. Good (1971) contrasts the two Fast Fourier Algorithms.
When we turn to the case in which T = T1 ··· Tk, for general k, with T1, ..., Tk integers, the extension of Theorem 3.5.1 is apparent. In (3.5.2), T2 is now composite and so the inner Fourier transform, with respect to t2, may be written in iterated form (in the form of (3.5.2) itself). Continuing in this way it is seen that the d_X^(T)(2πT⁻¹j), j = 0, ..., T − 1 may be derived by k successive discrete Fourier transforms of orders T1, ..., Tk in turn. The number of complex multiplications required is (T1 + ··· + Tk)T. Specific formulas for this case may be found in Bingham et al (1967).

The generalization of Theorem 3.5.2 is as follows:

Theorem 3.5.3 Let T = T1 ··· Tk, where T1, ..., Tk are relatively prime in pairs. Let j ≡ j_l (mod T_l), 0 ≤ j_l ≤ T_l − 1, l = 1, ..., k; then

(X(t) is here the periodic extension, with period T, of X(t).)

By way of explanation of this result, we note that the numbers t1T/T1 + ··· + tkT/Tk, when reduced mod T, run through all integers t, 0 ≤ t ≤ T − 1 for 0 ≤ t1 ≤ T1 − 1, ..., 0 ≤ tk ≤ Tk − 1. For each j we must determine the j1, ..., jk above, and select the appropriate Fourier coefficient from those that have been calculated. This may be done by setting up a table of the residues of j, 0 ≤ j ≤ T − 1.

The number of complex multiplications indicated in Theorem 3.5.3 is also (T1 + ··· + Tk)T. We see that we will obtain the greatest saving if the Tj are small. If T = 2^k, we see that essentially 2T log2 T multiplications are needed. At the end of Section 3.4, we gave the discrete Fourier transform for the cases T = 1, 2, 3, 4. Examination of the results shows that fewer than the indicated number of operations may be required, the cases T = 4 and T = 8 being particularly important. Additional gains can be achieved by taking note of the real nature of the X(t) or by transforming more than one series; see Cooley et al (1967) and Exercise 3.10.30.

It often occurs that T is not highly composite or one is interested in the values of d_X^(T)(λ) at frequencies other than those of the form 2πj/T, j = 0, ..., T − 1. If this is so, we can add S − T zeros to the X(t) values, choosing S > T to be highly composite. The transform d_X^(T)(λ) is now obtained for λ = 2πj/S, j = 0, 1, ..., S − 1.

Quite clearly we can combine the technique of Theorem 3.5.3, where the factors of T are relatively prime, with the previously indicated procedure for dealing with general factors. The number of extra multiplications by cosinusoids may be reduced in this way. See Hamming (1962), p. 74, for the case T = 12. A FORTRAN program for the mixed radix Fast Fourier Transform may be found in Singleton (1969).

In conclusion we remark that the Fast Fourier Transform is primarily an efficient numerical algorithm. Its use or nonuse does not affect the basis of statistical inference. Its effect has been to radically alter the calculations of empirical time series analysis.
3.6 APPLICATIONS OF DISCRETE FOURIER TRANSFORMS

Suppose the values X(t), Y(t), t = 0, ..., T − 1 are available. We will sometimes require the convolution

This occurrence suggests that we may be able to compute (3.6.1) by means of a discrete Fourier transform and so take advantage of the Fast Fourier Transform Algorithm. In fact we have

Lemma 3.6.1 Given X(t), Y(t), t = 0, ..., T − 1 and an integer S > T, the convolution (3.6.1) is given by

If

then we quickly see that the convolution (3.6.1) is the coefficient of exp{−iλu} in the trigonometric polynomial d_X^(T)(λ) d_Y^(T)(λ). It is therefore given by

In general (3.6.4) equals

We may obtain the desired values of the convolution from (3.6.4) by taking S large enough. If S is taken to be highly composite then the discrete Fourier transforms required in the direct evaluation of (3.6.4) may be rapidly calculated by means of the Fast Fourier Transform Algorithm of the previous section. Consequently the convolution (3.6.1) may well be more rapidly computed by this procedure than by using its definition (3.6.1) directly. This fact was noted by Sande; see Gentleman and Sande (1966), and also Stockham (1966). From (3.6.5) we see that for S − T < |u| ≤ T − 1,

expression (3.6.4) gives (3.6.1) plus some additional terms. For moderate values of |u| it will approximately equal (3.6.1). It can be obtained for all u by taking S ≥ 2T.

One situation in which one might require the convolution (3.6.1) is in the estimation of the moment function m_12(u) = E[X_1(t + u) X_2(t)] for some stationary bivariate series. An unbiased estimate of m_12(u) is provided by

equals

an expression of the form of (3.6.1). Exercise 3.10.7 indicates how the result of Lemma 3.6.1 might be modified to construct an estimate of c_12(u) = cov{X_1(t + u), X_2(t)}.

Another situation in which the result of Lemma 3.6.1 proves useful is in the calculation of the filtered values of Section 3.3:

given the values X(t), t = 0, ..., T − 1. Suppose the transfer function of the filter {a(u)} is A(λ). Then Lemmas 3.4.1 and 3.6.1 suggest that we form

These values should be near the desired filtered values. In fact by direct substitution we see that (3.6.8) equals

and so if a(u) falls off rapidly as |u| → ∞ and 0 ≤ t ≤ T − 1, expression (3.6.8) should be near (3.6.7). If S is taken to be highly composite then the calculations indicated in (3.6.8) may be reduced by means of the Fast Fourier Transform. We might introduce convergence factors.

We remark that Lemma 3.6.1 has the following extension:

Lemma 3.6.2 Given X_j(t), t = 0, ..., T − 1, j = 1, ..., r and an integer S > T the expression
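The computation suggested by Lemma 3.6.1 can be sketched for the cross-product Σ_t X(t + u)Y(t). The choice S = 2T and the direct quadratic-time transforms are for illustration only; in practice the Fast Fourier Transform of Section 3.5 would be used:

```python
import cmath

def dft(x):
    S = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * jj * t / S) for t in range(S))
            for jj in range(S)]

def cross_corr_via_dft(x, y):
    """c(u) = sum_t x(t + u) y(t), u = 0, ..., T-1, obtained by padding both
    series with S - T zeros (S = 2T) and multiplying discrete Fourier transforms."""
    T, S = len(x), 2 * len(x)
    dX = dft(x + [0.0] * (S - T))
    dY = dft(y + [0.0] * (S - T))
    prods = [a * b.conjugate() for a, b in zip(dX, dY)]
    return [sum(prods[jj] * cmath.exp(2j * cmath.pi * jj * u / S)
                for jj in range(S)).real / S for u in range(T)]

x = [1.0, 2.0, 0.0, -1.0, 3.0]
y = [0.5, -1.0, 2.0, 1.0, 0.0]
direct = [sum(x[t + u] * y[t] for t in range(len(x) - u)) for u in range(len(x))]
assert all(abs(a - b) < 1e-9 for a, b in zip(cross_corr_via_dft(x, y), direct))
```

With S ≥ 2T the wrap-around terms vanish for the lags shown, which is the role of the zero padding in the lemma.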
We conclude this section by indicating some uses of the finite Fourier transform. Suppose

then

following Example 2 of Section 3.4. By inspection we see that the amplitude of expression (3.6.13) is large for λ near ±ω and not otherwise, −π < λ < π. In consequence the finite Fourier transform (3.6.13) should prove useful in detecting the frequency of a cosinusoid of unknown frequency. This use was proposed in Stokes (1879).

We remark that if X(t) contains two unknown frequencies, say

then we may have difficulty resolving ω1 and ω2 if they are close to one another, for

This function will not have obvious peaks in amplitude at λ = ±ω1, ±ω2 if ω1 and ω2 are so close together that the ripples of the D_n functions interfere with one another. This difficulty may be reduced by tapering the X(t) series prior to forming the Fourier transform. Specifically consider

in the case of (3.6.14), where we have made use of (3.3.6). If the convergence factors h(u/n) are selected so that H^(n)(λ) is concentrated in some interval, say |λ| < Δ/n, then the amplitude of (3.6.16) should have obvious peaks if |ω1 − ω2| > 2Δ/n.

Other uses of the finite Fourier transform include: the evaluation of the latent values of a matrix of interest, see Lanczos (1955); the estimation of the mixing distribution of a compound distribution, see Medgyessy (1961); and the determination of the cumulative distribution function of a random variable from the characteristic function, see Bohman (1960).
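Stokes's use of the transform for frequency detection can be sketched as follows; the series length and the frequency are illustrative:

```python
import cmath
import math

T = 256
omega = 2 * math.pi * 37.0 / T            # a hypothetical unknown frequency
X = [math.cos(omega * t) for t in range(T)]

def amp(lam):
    """|d_X^(T)(lambda)|, large for lambda near +-omega and small elsewhere."""
    return abs(sum(X[t] * cmath.exp(-1j * lam * t) for t in range(T)))

# Search the Fourier frequencies 2 pi j / T, 0 < lambda < pi, for the peak.
peak_j = max(range(1, T // 2), key=lambda jj: amp(2 * math.pi * jj / T))
assert peak_j == 37
```

With two nearby frequencies the same search can fail, and tapering the series as in (3.6.16) narrows the effective frequency window.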
3.7 COMPLEX MATRICES AND THEIR EXTREMAL VALUES

We turn to a consideration of matrices whose entries are complex numbers and remark that the spectral density matrix introduced in Section 2.5 is an example of such a matrix. Begin with several definitions. If Z = [Z_jk] is a J × K matrix with the complex number Z_jk in the jth row and kth column, then we define Z̄ = [Z̄_jk] to be the matrix whose entries are the complex conjugates of the entries of Z. Let Zᵀ = [Z_kj] denote the transpose of Z. We then say that Z is Hermitian if Z̄ᵀ = Z. If Z is J × J Hermitian then we say that Z is non-negative definite if

for all complex scalars α_j, j = 1, ..., J. A square matrix Z is unitary if Z⁻¹ = Z̄ᵀ or equivalently ZZ̄ᵀ = I with I the identity matrix. The complex number μ is called a latent value or latent root of the J × J matrix Z if

where I is the identity of the same dimension as Z. Because Det(Z − μI) is a polynomial of order J in μ, the equation (3.7.2) has at most J distinct roots. It is a classic result (MacDuffee (1946)) that corresponding to any latent value μ there is always a J vector α such that

Such an α is called a latent vector of Z. If Z is Hermitian, then its latent values are real-valued; see MacDuffee (1946). We denote the jth largest of these by μ_j or μ_j(Z) for j = 1, ..., J. The corresponding latent vector is denoted by α_j or α_j(Z). The collection of latent values of a square matrix is called its spectrum. We will shortly discuss the connection between this spectrum and the previously defined second-order spectrum of a stationary series.

Given a matrix Z, we note that the matrices ZZ̄ᵀ, Z̄ᵀZ are always Hermitian and non-negative definite. Also, following Theorem 2.5.1, we note that if X(t), t = 0, ±1, ... is an r vector-valued stationary series with absolutely summable covariance function, then f_XX(λ), its spectral density matrix, is Hermitian and non-negative definite. We remark that if

is the matrix of the discrete Fourier transform considered in Section 3.4, then the matrix T^(−1/2)ℱ is unitary. Its latent values are given in Exercise 3.10.12.
It is sometimes useful to be able to reduce computations involving complex matrices to computations involving only real matrices. Lemma 3.7.1 below gives an important isomorphism between complex matrices and real matrices. We first set down the notation: if Z = [Z_jk] with Z_jk = Re Z_jk + i Im Z_jk, then

Lemma 3.7.1 To any J × K matrix Z with complex entries there corresponds a (2J) × (2K) matrix Z* with real entries such that

(i) if Z = X + Y, then Z* = X* + Y*
(ii) if Z = XY, then Z* = X*Y*
(iii) if Y = Z⁻¹, then Y* = (Z*)⁻¹
(iv) Det Z* = |Det Z|²
(v) if Z is Hermitian, then Z* is symmetric
(vi) if Z is unitary, then Z* is orthogonal
(vii) if the latent values and vectors of Z are μ_j, α_j, j = 1, ..., J, then those of Z* are, respectively,

In fact the correspondence of this lemma may be taken to be

providing the dimensions of the matrices appearing throughout the lemma are appropriate.

It is discussed in Wedderburn (1934), Lanczos (1956), Bellman (1960), Brenner (1961), Good (1963), and Goodman (1963). The correspondence is exceedingly useful for carrying out numerical computations involving matrices with complex-valued entries. However, Ehrlich (1970) suggests that we should stick to complex arithmetic when convenient.

Latent vectors and values are important in the construction of representations of matrices by more elementary matrices. In the case of a Hermitian matrix we have

Theorem 3.7.1 If H is a J × J Hermitian matrix, then

where μ_j is the jth latent value of H and U_j is the corresponding latent vector.

The theorem has the following:

Corollary 3.7.1 If H is J × J Hermitian, then it may be written UMŪᵀ where M = diag{μ_j, j = 1, ..., J} and U = [U_1 ··· U_J] is unitary. Also if H is non-negative definite, then μ_j ≥ 0, j = 1, ..., J.
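One concrete form of the complex-to-real isomorphism of Lemma 3.7.1 maps Z = Re Z + i Im Z to the block matrix [[Re Z, −Im Z], [Im Z, Re Z]]. The book's exact block layout may differ, but the multiplicative property (ii) can be checked directly:

```python
def to_real(Z):
    """Map a J x K complex matrix to the (2J) x (2K) real matrix
    [[Re Z, -Im Z], [Im Z, Re Z]] (one common form of the isomorphism)."""
    J, K = len(Z), len(Z[0])
    R = [[0.0] * (2 * K) for _ in range(2 * J)]
    for j in range(J):
        for k in range(K):
            R[j][k] = Z[j][k].real
            R[j][k + K] = -Z[j][k].imag
            R[j + J][k] = Z[j][k].imag
            R[j + J][k + K] = Z[j][k].real
    return R

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Property (ii): (XY)* = X* Y*.
X = [[1 + 2j, 0 - 1j], [3 + 0j, 2 - 2j]]
Y = [[0 + 1j, 1 + 1j], [2 - 1j, 0 + 0j]]
lhs = to_real(matmul(X, Y))
rhs = matmul(to_real(X), to_real(Y))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-9 for i in range(4) for j in range(4))
```

This is the 2×2 rotation-matrix representation of a complex number applied blockwise, which is why unitary matrices go to orthogonal ones.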
This theorem is sometimes known as the Spectral Theorem. In the case of matrices of arbitrary dimension we have

Theorem 3.7.2 If Z is J × K, then

where μ_j² is the jth latent value of ZZ̄ᵀ (or Z̄ᵀZ), U_j is the jth latent vector of ZZ̄ᵀ and V_j is the jth latent vector of Z̄ᵀZ, and it is understood μ_j ≥ 0.

The theorem has the following:

Corollary 3.7.2 If Z is J × K, then it may be written UMV̄ᵀ where the J × K M = diag{μ_j; j = 1, ..., J}, the J × J U = [U_1 ··· U_J] is unitary and the K × K V = [V_1 ··· V_K] is also unitary.

This theorem is given in Autonne (1915). Structure theorems for matrices are discussed in Wedderburn (1934) and Hua (1963); see also Schwerdtfeger (1960). The representation Z = UMV̄ᵀ is called the singular value decomposition of Z. A computer program for it is given in Businger and Golub (1969).

An important class of matrices, in the subject of time series analysis, is the class of finite Toeplitz matrices. We say that a matrix C = [C_jk] is finite Toeplitz if C_jk depends only on j − k, that is, C_jk = c(j − k) for some function c(·). These matrices are discussed in Widom (1965) where other references may be found. Finite Toeplitz matrices are important in time series analysis for the following reason: if X(t), t = 0, ±1, ... is a real-valued stationary series with autocovariance function c_XX(u), u = 0, ±1, ..., then the covariance matrix of the stretch X(t), t = 0, ..., T − 1 is a finite Toeplitz matrix with c_XX(j − k) in the jth row and kth column.

We will sometimes be interested in the latent roots and vectors of the covariance matrix of X(t), t = 0, ..., T − 1 for a stationary X(t). Various approximate results are available concerning these in the case of large T. Before indicating certain of these we first introduce an important class of
finite Toeplitz matrices. A square matrix Z = [Z_jk] is said to be a circulant of order T if Z_jk = z(k − j) for some function z(·) of period T, that is,

In connection with the latent values and vectors of a circulant we have

Theorem 3.7.3 Let Z = [z(k − j)] be a T × T circulant matrix; then its latent values are given by

and the corresponding latent vectors by

respectively.

The latent values are seen to provide the discrete Fourier transform of the sequence z(t), t = 0, ..., T − 1. The matrix of latent vectors is proportional to the matrix ℱ of Section 3.4. Theorem 3.7.3 may be found in Aitken (1954), Schoenberg (1950), Hamburger and Grimshaw (1951) p. 94, Good (1950), and Whittle (1951).
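Theorem 3.7.3 can be verified directly for a small circulant; a sketch (the sign convention in the exponent is one of the two possibilities for (3.7.11)-(3.7.12)):

```python
import cmath

T = 6
z = [2.0, 1.0, 0.0, -1.0, 0.5, 3.0]                          # z(.) of period T
Z = [[z[(k - j) % T] for k in range(T)] for j in range(T)]   # circulant Z_jk = z(k - j)

for s in range(T):
    # latent value: discrete Fourier transform of z at frequency 2 pi s / T
    mu = sum(z[t] * cmath.exp(2j * cmath.pi * s * t / T) for t in range(T))
    # latent vector: the corresponding column of the matrix F (up to normalization)
    v = [cmath.exp(2j * cmath.pi * s * j / T) for j in range(T)]
    Zv = [sum(Z[j][k] * v[k] for k in range(T)) for j in range(T)]
    assert all(abs(Zv[j] - mu * v[j]) < 1e-9 for j in range(T))
```

Every circulant thus shares the same latent vectors; only the latent values, the transform of z(·), change, which is what makes circulants a convenient stand-in for Toeplitz covariance matrices.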
Let us return to the discussion of a general square finite Toeplitz matrix C = [c(j − k)], j, k = 1, ..., T. Consider the related circulant matrix Z whose kth entry in the first row is c(1 − k) + c(1 − k + T), where we consider c(T) = 0. Following Theorem 3.7.3 the latent values of Z are

giving a discrete Fourier transform of the c(u), u = 0, ±1, ..., ±(T − 1). Let ℱ_T denote the T × T matrix whose columns are the vectors (3.7.12). Let M_T denote the diagonal matrix with corresponding entries μ_k(Z); then

and we may consider approximating C by ℱ_T M_T ℱ_T⁻¹ = Z. We have

giving us a bound on the difference between C and Z. This bound may be used to place bounds on the differences between the latent roots and vectors of C and Z. For example the Wielandt-Hoffman Theorem (Wilkinson (1965)) indicates that there is an ordering μ_{i1}(C), ..., μ_{iT}(C) of the latent roots μ_1(C), ..., μ_T(C) of C such that

If

then the latent roots of C are tending to be distributed like the values of the discrete Fourier transform of c(u), u = 0, ±1, ..., ±(T − 1) as T → ∞. A variety of results of this nature may be found in Grenander and Szegő (1958); see also Exercise 3.10.14. This sort of result indicates a connection between the power spectrum of a stationary time series (defined as the Fourier transform of its autocovariance function) and the spectrum (defined to be the collection of latent values) of the covariance matrix of long stretches of the series. We return to this in Section 4.7.

Results concerning the difference between the latent vectors of C and those of Z may be found in Gavurin (1957) and Davis and Kahan (1969). We remark that the above discussion may be extended to the case of vector-valued time series and block Toeplitz matrices; see Exercise 3.10.15.

The representation (3.7.9) is important in the approximation of a matrix by another matrix of reduced rank. We have the following:

Theorem 3.7.4 Let Z be J × K. Among J × K matrices A of rank L ≤ J, K

is minimized by

where μ_j, U_j, V_j are given in Theorem 3.7.2. The minimum achieved is μ²_{L+1}.
We see that we construct A from the terms in (3.7.9) corresponding to the L largest μ_j; see Okamoto (1969) for the case of real symmetric Z and A.

Corollary 3.7.4 The above choice of A also minimizes

for A of rank L ≤ J, K. The minimum achieved is

Results of the form of this corollary are given in Eckart and Young (1936), Kramer and Mathews (1956), and Rao (1965) for the case of real Z, A.

3.8 FUNCTIONS OF FOURIER TRANSFORMS

Let X(t), t = 0, ±1, ... be a vector-valued time series of interest. In order to discuss the statistical properties of certain series resulting from the application of operators to the series X(t), we must now develop several analytic results concerning functions of Fourier transforms. We begin with the following:

Definition 3.8.1 Let C denote the space of complex numbers. A complex-valued function f(z) defined for z = (z1, ..., zn) ∈ D, an open subset of Cⁿ, is holomorphic in D if each point w = (w1, ..., wn) ∈ D is contained in an open neighborhood U such that f(z) has a convergent power series expansion

for all z ∈ U.
A result that is sometimes useful in determining holomorphic functions is provided by

Theorem 3.8.1 Suppose F_j(y1, ..., ym; z1, ..., zn), j = 1, ..., m are holomorphic functions of m + n variables in a neighborhood of (u1, ..., um; v1, ..., vn) ∈ C^(m+n). If F_j(u1, ..., um; v1, ..., vn) = 0, j = 1, ..., m, while the determinant of the Jacobian matrix

is nonzero at (u1, ..., um; v1, ..., vn), then the equations

have a unique solution y_j = y_j(z1, ..., zn), j = 1, ..., m, which is holomorphic in a neighborhood of (v1, ..., vn).
This theorem may be found in Bochner and Martin (1948) p. 39. It im-plies, for example, that the zeros of a polynomial are holomorphic functionsof the coefficients of the polynomial in a region where the polynomial hasdistinct roots. It implies a fortiori that the latent values of a matrix areholomorphic functions of the elements of the matrix in a region of distinctlatent values; see Exercise 3.10.19.
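The latent-value statement can be probed numerically. In the sketch below (assumptions: numpy is available; the diagonal matrix and the perturbation direction are arbitrary choices), divided differences of the latent values stabilise as the step shrinks and match the first-order perturbation value, consistent with smooth dependence at a point of distinct latent values:

```python
import numpy as np

# A matrix with distinct latent values, and an arbitrary perturbation direction.
Z = np.diag([1.0, 2.0, 4.0])
rng = np.random.default_rng(2)
E = rng.standard_normal((3, 3))

def eigs(M):
    # Latent values, sorted; they stay real for small perturbations here.
    return np.sort(np.linalg.eigvals(M).real)

# Divided differences (mu_j(Z + eps E) - mu_j(Z)) / eps stabilise as eps shrinks.
d1 = (eigs(Z + 1e-5 * E) - eigs(Z)) / 1e-5
d2 = (eigs(Z + 1e-6 * E) - eigs(Z)) / 1e-6
assert np.allclose(d1, d2, atol=1e-3)

# First-order perturbation theory: since Z is diagonal with unit latent vectors,
# the derivative of mu_j in direction E is the diagonal entry E_jj.
assert np.allclose(d2, np.diag(E), atol=1e-3)
```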
Let V_+(l), l ≥ 0, denote the space of functions z(λ), −∞ < λ < ∞, that are Fourier transforms of the form

z(λ) = Σ_{u=0}^{∞} a(u) exp{−iλu}

with the a(u) real-valued and satisfying

Σ_{u=0}^{∞} |u|^l |a(u)| < ∞.   (3.8.5)

Under the condition (3.8.5) the domain of z(λ) may be extended to consist of complex λ with −∞ < Re λ < ∞, Im λ ≤ 0. We then have

Theorem 3.8.2  If z_j(λ) belongs to V_+(l), j = 1, ..., n and f(z_1, ..., z_n) is a holomorphic function in a neighborhood of the range of values {z_1(λ), ..., z_n(λ)}; −∞ < Re λ < ∞, Im λ ≤ 0, then f(z_1(λ), ..., z_n(λ)) also belongs to V_+(l).

This theorem may be deduced from results in Gelfand et al (1964). The first theorems of this nature were given by Wiener (1933) and Levy (1933).
As an example of the use of this theorem consider the following: let {a(u)}, u = 0, 1, 2, ... be an r × r realizable l-summable filter with transfer function A(λ) satisfying Det A(λ) ≠ 0, −∞ < Re λ < ∞, Im λ ≤ 0. This last condition implies that the entries of A(λ)^{−1} are holomorphic functions of the entries of A(λ) in a neighborhood of the range of A(λ); see Exercise 3.10.37. An application of Theorem 3.8.2 indicates that the entries of B(λ) = A(λ)^{−1} are in V_+(l) and so B(λ) is the transfer function of an r × r realizable l-summable filter {b(u)}, u = 0, 1, 2, .... In particular we see that if X(t), t = 0, ±1, ... is a stationary r vector-valued series with E|X(t)| < ∞, then the relation

Y(t) = Σ_{u=0}^{∞} a(u) X(t − u)

may, with probability 1, be inverted to give

X(t) = Σ_{u=0}^{∞} b(u) Y(t − u).

We remark that the condition Det A(λ) ≠ 0, −∞ < Re λ < ∞, Im λ ≤ 0 is equivalent to the condition that

Det [Σ_{u=0}^{∞} a(u) z^u]

has no roots in the unit disc |z| ≤ 1. In the case that Y(t) = ε(t), a pure noise series with finite mean, the above reasoning indicates that if

Det [I + a(1)z + ··· + a(m)z^m]   (3.8.10)

has no roots in the unit disc, then the autoregressive scheme

X(t) + a(1)X(t − 1) + ··· + a(m)X(t − m) = ε(t)   (3.8.11)

has, with probability 1, a stationary solution of the form

X(t) = Σ_{u=0}^{∞} b(u) ε(t − u)

with

Σ_{u=0}^{∞} |u|^l |b(u)| < ∞

for all l ≥ 0.

An alternate set of results of the above nature is sometimes useful. We set down
Definition 3.8.2  A complex-valued function f(z) defined for z = (z_1, ..., z_n) ∈ D, an open subset of C^n, is real holomorphic in D if each point w = (w_1, ..., w_n) ∈ D is contained in an open neighborhood U such that f(z) has a convergent power series expansion, where the series is now in powers of the z_j − w_j and their complex conjugates, for all z ∈ U.

We next introduce V(l), l ≥ 0, the space of functions z(λ), −∞ < λ < ∞, that are Fourier transforms of the form

z(λ) = Σ_{u=−∞}^{∞} a(u) exp{−iλu}

with the a(u) real-valued and satisfying

Σ_{u=−∞}^{∞} |u|^l |a(u)| < ∞.

We then have the following:

Theorem 3.8.3  If z_j(λ) belongs to V(l), j = 1, ..., n and f(z_1, ..., z_n) is a real holomorphic function in a neighborhood of the range of values {z_1(λ), ..., z_n(λ); −∞ < λ < ∞}, then f(z_1(λ), ..., z_n(λ)) also belongs to V(l).

This theorem again follows from the work of Gelfand et al (1964). Comparing this theorem with Theorem 3.8.2, we note that the required domain of regularity of f(·) is smaller here and its values are allowed to be more general.

As an application of this theorem: let {a(u)}, u = 0, ±1, ±2, ... be an r × r l-summable filter with transfer function A(λ) satisfying Det A(λ) ≠ 0, −∞ < λ < ∞. Then there exists an l-summable filter {b(u)}, u = 0, ±1, ... with transfer function B(λ) = A(λ)^{−1}. Or with the same notation, there exists an l-summable filter {c(u)}, u = 0, ±1, ... with transfer function C(λ) = (Ā(λ)^τ A(λ))^{−1}.

As an example of the joint use of Theorems 3.8.2 and 3.8.3 we mention the following result useful in the linear prediction of real-valued stationary series.

Theorem 3.8.4  Let X(t), t = 0, ±1, ... be a real-valued series with mean 0 and cov{X(t + u), X(t)} = c_XX(u), t, u = 0, ±1, .... Suppose

Σ_u |c_XX(u)| < ∞.   (3.8.17)

Suppose f_XX(λ) ≠ 0, −∞ < λ < ∞. Then we may write

X(t) = Σ_{u=0}^{∞} a(u) ε(t − u)

where the series

ε(t) = Σ_{u=0}^{∞} b(u) X(t − u)

has mean 0 and autocovariance function c_εε(u) = δ{u}. The coefficients satisfy

Σ_u |a(u)| < ∞,  Σ_u |b(u)| < ∞.   (3.8.20)

The {a(u)}, {b(u)} required here are determined somewhat indirectly. If

A(λ) = Σ_{u=0}^{∞} a(u) exp{−iλu},  B(λ) = Σ_{u=0}^{∞} b(u) exp{−iλu},

then we see that it is necessary to have

B(λ) = A(λ)^{−1} and 2π f_XX(λ) = |A(λ)|².

As (3.8.17) holds and f_XX(λ) does not vanish, we may write

log 2π f_XX(λ) = Σ_{u=−∞}^{∞} γ(u) exp{−iλu}

with

γ(u) = γ(−u) real-valued

and

Σ_u |γ(u)| < ∞

following Theorem 3.8.3. Expression (3.8.24) suggests defining

A(λ) = exp{γ(0)/2 + Σ_{u=1}^{∞} γ(u) exp{−iλu}}.

The corresponding {a(u)}, {b(u)} satisfy expression (3.8.20) following Theorem 3.8.2.

Theorems 3.8.2 and 3.8.3 have previously been used in a time series context in Hannan (1963). Arens and Calderon (1955) and Gelfand et al (1964) are general references to the theorems. Baxter (1963) develops an inequality, using these procedures, that may be useful in bounding the error of finite approximations to certain Fourier transforms.
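The passage from A(λ) to the inverse filter {b(u)} can be carried out numerically by sampling B(λ) = A(λ)^{−1} on a fine grid and inverting the discrete Fourier transform. A sketch (the scalar filter coefficients and the grid size are my own choices, not the text's):

```python
import numpy as np

# A short two-sided scalar filter a(u), u = -1, 0, 1, whose transfer function
# A(lambda) = sum_u a(u) e^{-i lambda u} stays away from 0 (values chosen for that).
a = {-1: 0.2, 0: 1.0, 1: 0.3}

N = 256                                   # frequency grid for the inversion
lam = 2 * np.pi * np.arange(N) / N
A = sum(c * np.exp(-1j * lam * u) for u, c in a.items())
assert np.min(np.abs(A)) > 0.4            # A(lambda) != 0 on the grid

# B(lambda) = A(lambda)^{-1} is again a transfer function; its coefficients
# b(u) are read off by the inverse discrete Fourier transform.
B = 1.0 / A
b = np.fft.ifft(B)                        # b[u], u = 0..N-1 (negative u wrapped)

# The coefficients decay rapidly (summability in practice) ...
assert np.abs(b[N // 2]) < 1e-8

# ... and {a(u)} convolved with {b(u)} gives the identity filter.
conv = np.fft.ifft(A * B)
assert np.isclose(conv[0].real, 1.0) and np.max(np.abs(conv[1:])) < 1e-10
```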
3.9 SPECTRAL REPRESENTATIONS IN THE FUNCTIONAL APPROACH TO TIME SERIES
In Section 2.7 we saw that the effect of linear time invariant operations on a time series X(t), t = 0, ±1, ... was easily illustrated if the series could be written as a sum of cosinusoids, that is, if for example

X(t) = Σ_j z(j) exp{iλ_j t},   (3.9.1)

the z(j) being r vectors. In this section we consider representations of a series X(t) that have the nature of expression (3.9.1), but apply to a broader class of time series: such representations will be called spectral representations. They have the general form

X(t) = ∫_{−π}^{π} exp{iλt} dZ_X(λ)

for some r vector-valued Z_X(λ). We begin with

Theorem 3.9.1  Let X(t), t = 0, ±1, ... be an r vector-valued function such that the limit (3.9.3) exists for t, u = 0, ±1, .... Then the limit (3.9.4), defining G_XX(λ), exists. Also there exists an r vector-valued Z_X(λ; s), −π < λ ≤ π, s = 0, ±1, ... such that

X(t + s) = ∫_{−π}^{π} exp{iλt} dZ_X(λ; s)   (3.9.5)

in the sense set down in the proof of the theorem. Z_X(λ; s) also satisfies
and
The matrix G_XX(λ) of (3.9.4) may be seen to be bounded, non-negative definite, nondecreasing as a function of λ, 0 ≤ λ ≤ π, and such that G_XX(−λ) = Ḡ_XX(λ)^τ. Exercise 2.13.31 indicates a related result.

Expression (3.9.5) provides a representation for X(t + s) as a sum of cosinusoids of differing phases and amplitudes. Suppose that {a(u)}, u = 0, ±1, ... is a filter whose coefficients vanish for sufficiently large |u|. Let A(λ) denote the transfer function of this filter. Then if we set

Y(t) = Σ_u a(u) X(t − u),

we see that the filtered series has the representation

Y(t + s) = ∫_{−π}^{π} exp{iλt} A(λ) dZ_X(λ; s).

The cosinusoids making up X(t + s) have become multiplied by the transfer function of the filter.

A version of Theorem 3.9.1 is given in Bass (1962a,b); however, the theorem itself follows from a representation theorem of Wold (1948).
An alternate form of spectral representation was given by Wiener (1930) and a discrete vector-valued version of his result is provided by

Theorem 3.9.2  Let X(t), t = 0, ±1, ... be an r vector-valued function satisfying (3.9.11). Then there exists an r vector-valued Z_X(λ), −π < λ ≤ π, with Z_X(π) − Z_X(−π) = X(0), such that

X(t) = ∫_{−π}^{π} exp{iλt} dZ_X(λ).   (3.9.12)

Expression (3.9.12) holds in the sense of the formal integration by parts

Expression (3.9.12) may clearly be used to illustrate the effect of linear filters on the series X(t).

Yet another means of obtaining a spectral representation for a fixed series X(t), t = 0, ±1, ... is to make use of the theory of Schwartz distributions; see Schwartz (1957, 1959) and Edwards (1967) Chap. 12. We will obtain a spectral representation for a stochastic series in Section 4.6. Bertrandias (1960, 1961) also considers the case of fixed series as does Heninger (1970).
3.10 EXERCISES
3.10.1 Suppose A(λ) = 1 for |λ ± ω| < Δ with Δ small and A(λ) = 0 otherwise for −π < λ < π. Show that

3.10.2 Let A(λ) denote the transfer function of a filter. Show that the filter leaves polynomials of degree k invariant if and only if A(0) = 1, A^{(j)}(0) = 0, 1 ≤ j ≤ k. (Here A^{(j)}(λ) denotes the jth derivative.) See Schoenberg (1946) and Brillinger (1965a).

3.10.3 If

with |H(α)| ≤ K(1 + |α|)^{−2}, show that H^{(T)}(λ) of (3.3.6) is given by
The function Z_X(λ) satisfies

If X(t) also satisfies expression (3.9.3) and G_XX(λ) is given by (3.9.4), then

at points of continuity of G_XX(λ), 0 ≤ λ ≤ π.

A theorem of Wiener (1933), p. 138, applies to show that expression (3.9.11) holds if
3.10.4 If F denotes the matrix with exp{−i2π(j − 1)(k − 1)/T} in row j, column k, 1 ≤ j, k ≤ T, show that F F̄^τ = T I and F⁴ = T² I.
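This exercise is easy to verify numerically for a small T (a sketch assuming numpy; T = 8 is an arbitrary choice):

```python
import numpy as np

T = 8
j, k = np.meshgrid(np.arange(T), np.arange(T), indexing="ij")
F = np.exp(-1j * 2 * np.pi * j * k / T)   # entry exp{-i 2 pi (j-1)(k-1)/T}, 0-indexed

# The rows are orthogonal with squared length T: F conj(F)^tau = T I.
assert np.allclose(F @ F.conj().T, T * np.eye(T))

# F^2 is T times a permutation, so F^4 = T^2 I.
F4 = np.linalg.matrix_power(F, 4)
assert np.allclose(F4, T**2 * np.eye(T))
```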
3.10.5 If D_n(λ) is given by expression (3.2.6), prove that (2n + 1)^{−1} D_n(λ) tends to η{λ}/2π as n → ∞.
3.10.6 Prove that expression (3.4.14) tends to
3.10.7 Let c_X^{(T)}, c_Y^{(T)} denote the means of the values X(t), Y(t), t = 0, ..., T − 1. Show that
3.10.8 Let x̃_j(t), t = 0, ±1, ... denote the period T extension of x_j(t), t = 0, ..., T − 1 for j = 1, ..., r. Show that the expression

is given by

3.10.9 Let

Show that d_Y^{(T)}(λ) = A(λ) d_X^{(T)}(λ), −∞ < λ < ∞.

3.10.10 Let μ^{(T)}(u_1, ..., u_{r−1}) denote expression (3.6.10). Show that
3.10.11 If W = Z^{−1}, show that

Re W = {Re Z + (Im Z)(Re Z)^{−1}(Im Z)}^{−1}

Im W = −(Re W)(Im Z)(Re Z)^{−1}.
3.10.12 Let F denote the matrix of Exercise 3.10.4. Show that its latent values are T^{1/2}, −iT^{1/2}, −T^{1/2}, iT^{1/2} with multiplicities [T/4] + 1, [(T + 1)/4], [(T + 2)/4], [(T + 3)/4] − 1 respectively. (Here [N] denotes the integral part of N.) See Lewis (1939).

3.10.13 If the Hermitian matrix Z has latent values μ_1, ..., μ_r and corresponding latent vectors U_1, ..., U_r, prove that the matrix Z − μ_1 U_1 Ū_1^τ has latent values 0, μ_2, ..., μ_r and latent vectors U_1, ..., U_r. Show how this result may be used to reduce the calculations required in determining the latent values and vectors of Z.
3.10.14 Use the inequality (3.7.16) to prove the following theorem: let c(u), u = 0, ±1, ... have Fourier transform f(λ), −∞ < λ < ∞ with Σ_u |u| |c(u)|² < ∞. Let C_T = [c(j − k)], j, k = 1, ..., T. If F[·] is a function with a uniformly bounded derivative on the range of f(λ), −∞ < λ < ∞, then

Theorems of this sort are given in Grenander and Szego (1958).

3.10.15 A (Tr) × (Tr) matrix Z is said to be a block circulant if it is made up of r × r matrices Z_jk = z(k − j) for some r × r matrix-valued function z(·) of period T. Prove that the latent values of Z are given by the latent values of

(*)  Σ_{t=0}^{T−1} z(t) exp{−i2πst/T},  s = 0, ..., T − 1,

and the corresponding latent vectors by

where the u_jk are the latent vectors of (*); see Friedman (1961). Indicate how this result may be used to determine the inverse of a block circulant matrix.
3.10.16 Let Z be a J × J Hermitian matrix. Show that

μ_j = min_D max_{Dx = 0} (x̄^τ Z x)/(x̄^τ x)

for x a J vector and D a matrix of rank ≤ j − 1 that has J rows. This is the Courant-Fischer Theorem and may be found in Bellman (1960).

3.10.17 If the pure noise series ε(t), t = 0, ±1, ... has moments of all orders, prove that the autoregressive scheme (3.8.11) has, with probability 1, a solution X(t) satisfying Assumption 2.6.1 provided that the polynomial (3.8.10) has no roots in the unit disc.
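A scalar (r = 1, m = 1) instance of the scheme (3.8.11) can be checked numerically. In this sketch (numpy and the value a = 0.6 are my own choices) the root of 1 + az lies outside the unit disc, the inverse coefficients b(u) = (−a)^u follow from the recursion of Exercise 3.10.22, and the truncated series Σ_u b(u)ε(t − u) satisfies the scheme to rounding error:

```python
import numpy as np

a = 0.6          # X(t) + a X(t-1) = eps(t); the root of 1 + a z is -1/a, outside |z| <= 1
U = 50           # number of inverse-filter coefficients retained

# From a(0)b(0) = 1 and a(0)b(u) + a(1)b(u-1) = 0: b(u) = (-a)^u.
b = (-a) ** np.arange(U)

rng = np.random.default_rng(1)
eps = rng.standard_normal(500)

# Candidate stationary solution X(t) = sum_u b(u) eps(t - u), truncated at U terms.
X = np.convolve(eps, b)[: len(eps)]

# Check the autoregressive scheme away from the start-up region.
resid = X[U:] + a * X[U - 1 : -1] - eps[U:]
assert np.max(np.abs(resid)) < 1e-10
```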
3.10.18 If A, B are r × r complex matrices and F(B; A) = BA, prove that the determinant of the Jacobian ∂F/∂B is given by (Det A)^r; see Deemer and Olkin (1951) and Khatri (1965a).

3.10.19 Let Z be an r × r complex matrix with distinct latent values μ_j, j = 1, ..., r. Prove that the μ_j are holomorphic functions of the entries of Z. Hint: Note that the μ_j are the solutions of the equation Det(Z − μI) = 0 and use Theorem 3.8.1; see Portmann (1960).

3.10.20 Let Z_0 be an r × r complex matrix with distinct latent values. Show that there exists a nonsingular Q whose entries are holomorphic functions of the entries of Z for all Z in a neighborhood of Z_0 and such that Q^{−1}ZQ is a diagonal matrix in the neighborhood; see Portmann (1960).

3.10.21 If Z_0, Z of Exercise 3.10.20 are Hermitian, then the columns of Q are orthogonal. Conclude that a unitary matrix, U, whose entries are real holomorphic functions of the entries of Z may be determined so that Ū^τZU is a diagonal matrix.

3.10.22 If {a(u)}, u = 0, ±1, ... is an r × r realizable filter and {b(u)} its inverse exists, prove that the b(u), u = 0, 1, ... are given by: a(0)b(0) = I, a(0)b(1) + a(1)b(0) = 0, a(0)b(2) + a(1)b(1) + a(2)b(0) = 0, ....
3.10.23 Prove Exercise 2.13.22 using the results of Section 3.8.
3.10.24 Let p(S) be a monotonically increasing function such that lim_{S→∞} p(S + 1)/p(S) = 1. Let X(t), t = 0, ±1, ... be a function such that

exists for t, u = 0, ±1, .... Indicate the form that Theorem 3.9.1 takes for such an X(t).

3.10.25 Adopt the notation of Theorem 3.9.1. If the moments m_{a_1...a_k}(u_1, ..., u_{k−1}) of expression (2.11.9) exist and are given by the Fourier-Stieltjes transforms of the functions M_{a_1...a_k}(λ_1, ..., λ_{k−1}), −π < λ_j ≤ π, prove that

3.10.26 Let Z be a J × J Hermitian matrix with ordered latent vectors x_1, ..., x_J. Show that

μ_j = max_x (x̄^τ Z x)/(x̄^τ x)

where the maximum is over x orthogonal to x_1, ..., x_{j−1}. Equality occurs for x = x_j.
3.10.27 Let A be an r × r Hermitian matrix with latent roots and vectors μ_j, V_j, j = 1, ..., r. Given φ mapping the real line into itself, the r × r matrix-valued function φ(A) is defined by

φ(A) = Σ_{j=1}^{r} φ(μ_j) V_j V̄_j^τ.

Show that φ(A)* = φ(A*).

3.10.28 Show that there exist constants K, L such that

for

3.10.29 Suppose the conditions of Theorem 3.3.1 are satisfied and in addition the Pth derivative of A(α) is continuous at α = λ. Show that the last expression of (3.3.18) may be replaced by

3.10.30 Let real-valued data X(t), Y(t), t = 0, ..., T − 1 be given. Set Z(t) = X(t) + iY(t). Show that

This exercise indicates how the Fourier transforms of two real-valued sets of data may be found with one application of a Fourier transform to a complex-valued set of data; Bingham et al (1967).
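The unscrambling formulas behind Exercise 3.10.30 are, in the usual convention, d_X(s) = ½{d_Z(s) + conj d_Z(T − s)} and d_Y(s) = (2i)^{−1}{d_Z(s) − conj d_Z(T − s)}. A numerical sketch (numpy and the record length are my own choices):

```python
import numpy as np

# Two real-valued records and the single complex record Z(t) = X(t) + i Y(t).
rng = np.random.default_rng(3)
T = 64
X, Y = rng.standard_normal(T), rng.standard_normal(T)
dZ = np.fft.fft(X + 1j * Y)

# Unscramble the two transforms from the one complex transform.
s = np.arange(T)
dZ_rev = np.conj(dZ[(-s) % T])            # conj d_Z(T - s), with s = 0 wrapping to 0
dX = (dZ + dZ_rev) / 2
dY = (dZ - dZ_rev) / (2j)

assert np.allclose(dX, np.fft.fft(X))
assert np.allclose(dY, np.fft.fft(Y))
```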
3.10.31 Prove that for S an integer
when a(u), A(λ) are related as in (3.2.2).
3.10.32 If a is an r vector and Z is an r X r Hermitian matrix, show that
where
3.10.33 With the notation of Corollary 3.7.2, set M⁺ = diag{μ_j⁺, j = 1, ..., J} where μ⁺ = 1/μ if μ ≠ 0, μ⁺ = 0 if μ = 0. Then the K × J matrix Z⁺ = VM⁺Ū^τ is called the generalized inverse of Z. Show that
(a) ZZ⁺Z = Z
(b) Z⁺ZZ⁺ = Z⁺
(c) ZZ⁺ is Hermitian
(d) Z⁺Z is Hermitian.

3.10.34 Show for S ≤ T that

3.10.35 If Δ^{(n)}(λ) is given by expression (3.2.4), show for m ≤ n that

3.10.36 Use the singular value decomposition to show that a J × K matrix A of rank L may be written A = BC, where B is J × L and C is L × K.

3.10.37 Let Z_0 be an r × r matrix with Det Z_0 ≠ 0. Show that the entries of Z^{−1} are holomorphic functions of the entries of Z in a neighborhood of Z_0.
4
STOCHASTIC PROPERTIES OF FINITE FOURIER TRANSFORMS
4.1 INTRODUCTION
Consider an r vector-valued sequence X(t), t = 0, ±1, .... In the previous chapter we considered various properties of the finite Fourier transform

in the case that X(t) was a fixed, nonstochastic function. In this chapter we present a variety of properties of d_X^{(T)}(λ) if X(t), t = 0, ±1, ... is a stationary time series. We will also consider asymptotic distributions, probability 1 bounds, and behavior under convolution, as well as develop the Cramer representation of X(t).

In previous chapters we have seen that Fourier transforms possess a wealth of valuable mathematical properties. For example, in Chapter 3 we saw that the discrete Fourier transform has the important numerical property of being rapidly computable by the Fast Fourier Transform Algorithm, while in this chapter we will see that it has useful and elementary statistical properties. For all of the reasons previously given, the Fourier transform is an obvious entity on which to base an analysis of a time series of interest.

However, before developing stochastic properties of the transform (4.1.1) we first define two types of complex-valued random variables. These variables will prove important in our development of the distributions of various time series statistics.
4.2 THE COMPLEX NORMAL DISTRIBUTION
If X is an r vector-valued random variable having real-valued components and having a multivariate normal distribution with mean μ_X and covariance matrix Σ_XX, write: X is N_r(μ_X, Σ_XX). Throughout this text we will often have to consider r vector-valued random variables X whose individual components are complex-valued. If, for such an X, the 2r vector-valued variate with real components

[Re X^τ, Im X^τ]^τ

is distributed as

N_{2r}([Re μ_X^τ, Im μ_X^τ]^τ, (1/2)[[Re Σ_XX, −Im Σ_XX], [Im Σ_XX, Re Σ_XX]])

for some r vector μ_X and r × r Hermitian non-negative definite Σ_XX, we will write: X is N_r^c(μ_X, Σ_XX). Then X is complex multivariate normal with mean μ_X and covariance matrix Σ_XX, which leads us to

E X = μ_X

and

E{(X − μ_X)(X̄ − μ̄_X)^τ} = Σ_XX.

We remark that within the class of complex vector-valued random variables whose real and imaginary parts have a joint multivariate normal distribution, the complex multivariate normals have the property that if (4.2.4) is diagonal, then the components of X are statistically independent; see Exercise 4.8.1. Various properties of the complex multivariate normal are given in Wooding (1956), Goodman (1963), James (1964), and in Exercises 4.8.1 to 4.8.3. We mention the properties: if Σ_XX is nonsingular, then the probability element of X is given by

π^{−r} (Det Σ_XX)^{−1} exp{−(x̄ − μ̄_X)^τ Σ_XX^{−1} (x − μ_X)} dx

for −∞ < Re x_j, Im x_j < ∞. And in the case r = 1, if X is N_1^c(μ_X, σ_XX), then Re X and Im X are independent N_1(Re μ_X, σ_XX/2) and N_1(Im μ_X, σ_XX/2), respectively.
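The r = 1 case is easy to simulate: draw Re X and Im X as independent normals with variance σ_XX/2 each. In the sketch below (numpy, the parameter values, and the sample size are arbitrary choices) the sample moments agree with E X = μ_X, E|X − μ_X|² = σ_XX, and with E{(X − μ_X)²} = 0, the last being a distinguishing property of the complex normal:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n = 1.0 + 2.0j, 4.0, 200_000

# X is N_1^c(mu, sigma2): independent real and imaginary parts, variance sigma2/2 each.
X = mu + np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

assert np.abs(X.mean() - mu) < 0.02
# E|X - mu|^2 = sigma2, split equally between the real and imaginary parts.
assert np.abs(np.mean(np.abs(X - mu) ** 2) - sigma2) < 0.05
assert np.abs(np.var(X.real) - sigma2 / 2) < 0.03
# The "relation" second moment E{(X - mu)^2} vanishes for the complex normal.
assert np.abs(np.mean((X - mu) ** 2)) < 0.05
```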
Turning to a different class of variates, suppose X_1, ..., X_n are independent N_r(0, Σ_XX) variates. Then the r × r matrix-valued random variable

W = Σ_{i=1}^{n} X_i X_i^τ

is said to have a Wishart distribution of dimension r and degrees of freedom n. We write: W is W_r(n, Σ_XX). If on the other hand X_1, ..., X_n are independent N_r^c(0, Σ_XX) variates, then the r × r matrix-valued random variable

W = Σ_{i=1}^{n} X_i X̄_i^τ

is said to have a complex Wishart distribution of dimension r and degrees of freedom n. In this case we write: W is W_r^c(n, Σ_XX). The complex Wishart distribution was introduced in Goodman (1963). Various of its properties are given in Exercises 4.8.4 to 4.8.8 and in Srivastava (1965), Gupta (1965), Kabe (1966, 1968), Saxena (1969), and Miller (1968, 1969). Its density function may be seen to be given by

{π^{r(r−1)/2} (n − 1)! ··· (n − r)!}^{−1} (Det Σ_XX)^{−n} (Det W)^{n−r} exp{−tr[Σ_XX^{−1} W]}

for n ≥ r and W > 0. Other properties include:

E W = n Σ_XX

and

The complex Wishart distribution will be useful in the development of approximations to the distributions of estimates of spectral density matrices.

In later sections of this text, we will require the concept of a sequence of variates being asymptotically normal. We will say that the r vector-valued sequence ξ_T, T = 1, 2, ... is asymptotically N_r(μ_T, Σ_T) if the sequence Σ_T^{−1/2}(ξ_T − μ_T) tends, in distribution, to N(0, I). We will also say that the r vector-valued sequence ξ_T, T = 1, 2, ... is asymptotically N_r^c(μ_T, Σ_T) if the sequence Σ_T^{−1/2}(ξ_T − μ_T) tends, in distribution, to N_r^c(0, I).

4.3 STOCHASTIC PROPERTIES OF THE FINITE FOURIER TRANSFORM

Consider the r vector-valued stationary series X(t), t = 0, ±1, .... In this section we will develop asymptotic expressions for the cumulants of the finite Fourier transform of an observed stretch of the series. In Section 3.3 we saw that certain benefits could result from the insertion of convergence factors into the direct definition of the finite Fourier transform. Now let us begin by inserting convergence factors here and then deducing the results for the simple Fourier transform as a particular case. We begin with

Assumption 4.3.1  h(u), −∞ < u < ∞ is bounded, is of bounded variation, and vanishes for |u| > 1.

Suppose h_a(u) satisfies this assumption for a = 1, ..., r. The finite Fourier transform we consider is defined by

d_a^{(T)}(λ) = Σ_t h_a(t/T) X_a(t) exp{−iλt},  a = 1, ..., r.

In the present context we will refer to the function h_a(t/T) as a taper or data window. The transform involves at most the values X(t), t = 0, ±1, ..., ±(T − 1) of the series. If h_a(u) = 0 for u < 0, then it involves only the values X(t), t = 0, ..., T − 1. This means that the asymptotic results we develop apply to either one-sided or two-sided statistics. If a segment of the series is missing, within the time period of observation, then the data available may be handled directly by taking h(t/T) to vanish throughout the missing segment. If the component series are observed over different time intervals, this is handled by having the h_a(t/T) nonzero over different time intervals.

Set

H_{a_1...a_k}(λ) = ∫ h_{a_1}(u) ··· h_{a_k}(u) exp{−iλu} du,

and if it is possible to apply the Poisson summation formula (Edwards (1967) p. 173), then we may write

Σ_t h_{a_1}(t/T) ··· h_{a_k}(t/T) exp{−iλt} = T Σ_j H_{a_1...a_k}(T(λ + 2πj)).   (4.3.2)

The discussion of convergence factors in Section 3.3 suggests that H_{a_1...a_k}(λ) will have substantial magnitude only for λ near 0. This implies that the function (4.3.2) will have substantial magnitude only for λ near some multiple of 2π.
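The concentration of H_{a...a}(λ) near 0 depends strongly on the smoothness of the taper. The sketch below (numpy; the cosine bell, the lengths, and the thresholds are my own choices, not the text's) compares the magnitude of Σ_t h(t/T)e^{−iλt} for the flat data window and a cosine bell: away from the main lobe the bell's sidelobes are far smaller:

```python
import numpy as np

T, pad = 128, 4096
t = np.arange(T)
flat = np.ones(T)                              # h(u) = 1 on [0, 1)
bell = 0.5 * (1 - np.cos(2 * np.pi * t / T))   # cosine bell h(u) = (1 - cos 2 pi u)/2

def mag(h):
    H = np.fft.fft(h, pad)                     # fine frequency grid via zero-padding
    return np.abs(H) / np.abs(H[0])            # normalise the peak at lambda = 0

# Frequencies beyond roughly 4 Fourier frequencies from the peak.
far = slice(4 * pad // T, pad // 2)

assert mag(bell)[far].max() < mag(flat)[far].max()
assert mag(bell)[far].max() < 0.01             # bell sidelobes below 1% of the peak
```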
If
We repeat the definition
and if
we also repeat the definition
Now we have
Theorem 4.3.1  Let X(t), t = 0, ±1, ... be a stationary r vector-valued series satisfying (4.3.6). Suppose h_a(u), −∞ < u < ∞, satisfies Assumption 4.3.1 for a = 1, ..., r. Then

The error term is uniform in λ_1, ..., λ_k.

If λ_1 + ··· + λ_k ≡ 0 (mod 2π), then

If λ_1 + ··· + λ_k ≢ 0 (mod 2π), then the cumulant will be of reduced order. Expression (4.3.9) suggests that we can base an estimate of the cumulant spectrum (4.3.7) on the d_{a_1}^{(T)}(λ_1), ..., d_{a_k}^{(T)}(λ_k) with λ_1 + ··· + λ_k ≡ 0 (mod 2π).

There are circumstances in which the error term of (4.3.8) is of smaller order of magnitude than o(T). Suppose, in place of (4.3.6), we have
then we can prove
Theorem 4.3.2  Let X(t), t = 0, ±1, ... be a stationary r vector-valued series satisfying (4.3.10). Suppose h_a(u), −∞ < u < ∞, satisfies Assumption 4.3.1 for a = 1, ..., r. Then
The error term is uniform in λ_1, ..., λ_k.

Qualitatively the results of Theorem 4.3.1 are the same as those of Theorem 4.3.2. However, this theorem suggests to us that decreasing the span of dependence of the series, as is the effect of expression (4.3.10) over (4.3.6), reduces the size of the asymptotic error term. Exercise 4.8.14 indicates that the error term may be further reduced by choosing the h_a(u) to have Fourier transforms rapidly falling off to 0 as |λ| increases.

The convergence factor

h(u) = 1 for 0 ≤ u ≤ 1,  h(u) = 0 otherwise,

is of special interest. In this case the Fourier transform is

d_X^{(T)}(λ) = Σ_{t=0}^{T−1} X(t) exp{−iλt}.

Also, from expression (4.3.2),

Σ_{t=0}^{T−1} exp{−iλt} = Δ^{(T)}(λ).

The function Δ^{(T)}(λ) has the properties: Δ^{(T)}(λ) = T for λ ≡ 0 (mod 2π), Δ^{(T)}(2πs/T) = 0 for s an integer with s ≢ 0 (mod T). Also |Δ^{(T)}(λ)| ≤ 1/|sin λ/2| and so Δ^{(T)}(λ) is of reduced magnitude for λ not near a multiple of 2π. Expression (4.3.11) here takes the form

cum{d_{a_1}^{(T)}(λ_1), ..., d_{a_k}^{(T)}(λ_k)} = (2π)^{k−1} Δ^{(T)}(λ_1 + ··· + λ_k) f_{a_1...a_k}(λ_1, ..., λ_{k−1}) + O(1).   (4.3.15)

This joint cumulant has substantial magnitude for λ_1 + ··· + λ_k near some multiple of 2π. Note that the first term on the right side of expression (4.3.15) vanishes for λ_j = 2πs_j/T, s_j an integer, if s_1 + ··· + s_k ≢ 0 (mod T).
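The stated properties of Δ^{(T)}(λ) are immediate to check numerically (a sketch; numpy and T = 16 are arbitrary choices):

```python
import numpy as np

T = 16
t = np.arange(T)

def Delta(lam):
    # Delta^{(T)}(lambda) = sum_{t=0}^{T-1} exp{-i lambda t}
    return np.sum(np.exp(-1j * lam * t))

# Delta^{(T)}(lambda) = T for lambda = 0 (mod 2 pi).
assert np.isclose(Delta(0.0), T) and np.isclose(Delta(4 * np.pi), T)

# Delta^{(T)}(2 pi s / T) = 0 for integer s not congruent to 0 (mod T).
for s in range(1, T):
    assert np.abs(Delta(2 * np.pi * s / T)) < 1e-9

# |Delta^{(T)}(lambda)| <= 1 / |sin(lambda / 2)| away from multiples of 2 pi.
for lam in np.linspace(0.1, 2 * np.pi - 0.1, 50):
    assert np.abs(Delta(lam)) <= 1.0 / np.abs(np.sin(lam / 2)) + 1e-9
```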
Expression (4.3.15) was developed in Brillinger and Rosenblatt (1967a); other references to this type of material include: Davis (1953), Root and Pitcher (1955), and Kawata (1960, 1966). Exercise 4.8.21 suggests that, on occasion, it may be more efficient to carry out the tapering through computations in the frequency domain.
4.4 ASYMPTOTIC DISTRIBUTION OF THE FINITE FOURIER TRANSFORM
In the previous section we developed asymptotic expressions for the joint cumulants of the finite Fourier transforms of a stationary time series. In this section we use these expressions to develop the limiting distribution of the transform. We set c_X = E X(t) and have

Theorem 4.4.1  Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let s_j^{(T)} be an integer with λ_j^{(T)} = 2πs_j^{(T)}/T → λ_j as T → ∞ for j = 1, ..., J. Suppose 2λ_j^{(T)}, λ_j^{(T)} ± λ_k^{(T)} ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Let
Then d_X^{(T)}(λ_j^{(T)}), j = 1, ..., J are asymptotically independent N_r^c(0, 2πT f_XX(λ_j)) variates, respectively. Also if λ = 0, ±2π, ..., d_X^{(T)}(λ) is asymptotically N_r(Tc_X, 2πT f_XX(λ)) independently of the previous variates, and if λ = ±π, ±3π, ..., d_X^{(T)}(λ) is asymptotically N_r(0, 2πT f_XX(λ)) independently of the previous variates.

In the case λ = 0,

d_X^{(T)}(0) = Σ_{t=0}^{T−1} X(t)

and the theorem is seen to provide a central limit theorem for the series X(t). Other central limit theorems for stationary series are given in Rosenblatt (1956, 1961), Leonov and Shiryaev (1960), Iosifescu and Theodorescu (1969) p. 22, and Philipp (1969). The asymptotic normality of Fourier coefficients themselves is investigated in Kawata (1965, 1966).

If the conditions of the theorem are satisfied and λ_j = λ, j = 1, ..., J, then we see that the d_X^{(T)}(λ_j^{(T)}), j = 1, ..., J are approximately a sample of size J from N_r^c(0, 2πT f_XX(λ)). This last remark will prove useful later in the development of estimates of f_XX(λ) and in the suggesting of approximate distributions for a variety of statistics of interest.
If the series X(t), t = 0, ±1, ... is tapered prior to evaluating its finite Fourier transform, then an alternate form of central limit theorem is available. It is

Theorem 4.4.2  Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Suppose 2λ_j, λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Let

d_a^{(T)}(λ) = Σ_t h_a(t/T) X_a(t) exp{−iλt},

where h_a(t) satisfies Assumption 4.3.1, a = 1, ..., r. Then the d_X^{(T)}(λ_j), λ_j ≢ 0 (mod 2π), j = 1, ..., J are asymptotically independent N_r^c(0, 2πT [H_{ab}(0) f_{ab}(λ_j)]) variates. Also if λ = 0, ±2π, ..., d_X^{(T)}(λ) is asymptotically N_r(T[c_a H_a(0)], 2πT[H_{ab}(0) f_{ab}(λ)]) independently of the previous variates and if λ = ±π, ±3π, ..., d_X^{(T)}(λ) is asymptotically N_r(0, 2πT[H_{ab}(0) f_{ab}(λ)]) independently of the previous variates.

If the same taper h(t) is applied to each of the components of X(t), then we see that the asymptotic covariance matrix of d_X^{(T)}(λ) has the form

2πT ∫ h(u)² du f_XX(λ).

Under additional regularity conditions on the h_a(t), a = 1, ..., r, we can obtain a theorem pertaining to sequences λ_j^{(T)} of frequencies tending to limits λ_j, j = 1, ..., J; see Brillinger (1970) and Exercise 4.8.20. The corresponding d_X^{(T)}(λ_j^{(T)}) will be asymptotically independent provided the λ_j^{(T)}, λ_k^{(T)} are not too near each other (mod 2π), for 1 ≤ j < k ≤ J. Exercise 4.8.23 gives the asymptotic behavior of Fourier transforms based on disjoint stretches of data.
Suppose that X(t), t = 0, ±1, ... is a real-valued stationary series whose power spectrum f_XX(λ) is near constant, equal to σ²/(2π) say, −∞ < λ < ∞. From Theorem 4.4.1 we might expect the values d_X^{(T)}(2πs/T), s = 1, ..., (T − 1)/2 to be approximately independent N_1^c(0, Tσ²) variates and a fortiori the values Re d_X^{(T)}(2πs/T), Im d_X^{(T)}(2πs/T), s = 1, ..., (T − 1)/2 to be approximately independent N_1(0, Tσ²/2) variates. We turn to a partial empirical examination of this conclusion.
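Before turning to the Vienna series, the conclusion can be rehearsed on simulated pure noise, for which the power spectrum is exactly constant. In this sketch (numpy, σ² = 2, and T = 1024 are arbitrary choices) the real and imaginary parts of d_X^{(T)}(2πs/T) behave like independent N_1(0, Tσ²/2) draws:

```python
import numpy as np

rng = np.random.default_rng(5)
T, sigma2 = 1024, 2.0
X = np.sqrt(sigma2) * rng.standard_normal(T)   # flat spectrum sigma^2 / (2 pi)

d = np.fft.fft(X)[1 : (T - 1) // 2]            # d_X^{(T)}(2 pi s / T), s = 1, 2, ...

# E |d|^2 = T sigma^2, split equally between real and imaginary parts.
assert np.abs(np.mean(np.abs(d) ** 2) / (T * sigma2) - 1.0) < 0.15
assert np.abs(np.var(d.real) / (T * sigma2 / 2) - 1.0) < 0.2
assert np.abs(np.var(d.imag) / (T * sigma2 / 2) - 1.0) < 0.2

# Real and imaginary parts are approximately uncorrelated.
corr = np.corrcoef(d.real, d.imag)[0, 1]
assert np.abs(corr) < 0.15
```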
Consider the series V(t), t = 0, 1, ... of mean monthly temperatures in Vienna for the period 1780-1950; this series, partially plotted in Figure 1.1.1, has a strong yearly periodic component. In an attempt to obtain a series with near constant power spectrum, we have reduced this periodic component by subtracting from each monthly value the average of the values for the same month across the whole stretch of data. Specifically we have formed the series

X(12k + j) = V(12k + j) − V̄_j,  V̄_j the average of the month-j values,

for j = 0, ..., 11 and k = 0, 1, .... We then evaluated the Fourier transform d_X^{(T)}(2πs/T), s = 1, ..., (T − 1)/2 taking T = 2048 = 2^11 so that the Fast Fourier Transform Algorithm could be used.

Figures 4.4.1 and 4.4.2 are normal probability plots of the values

Re d_X^{(T)}(2πs/T), Im d_X^{(T)}(2πs/T), s = 1, ..., (T − 1)/2,

respectively. The construction of such plots is described in Chernoff and Lieberman (1954). The estimated power spectrum of this series, given in Section 7.8, falls off slowly as λ increases and is approximately constant. If each of the variates has the same marginal normal distribution, the values should lie near straight lines. The plots obtained are essentially straight lines, with slight tailing off at the ends, suggesting that the conclusions of Theorem 4.4.1 are reasonable, at least for this series of values.

Figure 4.4.1 Normal probability plot of real part of discrete Fourier transform of seasonally adjusted Vienna mean monthly temperatures 1780-1950.
Figure 4.4.2 Normal probability plot of imaginary part of discrete Fourier transform of seasonally adjusted Vienna mean monthly temperatures 1780-1950.
The theorems in this section may provide a justification for the remark, often made in the communications theory literature, that the output of a narrow band-pass filter is approximately Gaussian; see Rosenblatt (1961). Consider the following narrow band-pass transfer function centered at λ_0

If a series X(t), t = 0, ±1, ... is taken as input to this filter, then expression (3.6.8), of the previous chapter, indicates that the output series of the filter will be approximately

Here s is the integral part of Tλ_0/(2π) and so 2πs/T ≈ λ_0. Theorem 4.4.1 now suggests that the variate (4.4.8) is asymptotically N(0, 4πT^{−1} f_XX(λ_0)) in the case λ_0 ≢ 0 (mod π). Related references to this result are Leonov and Shiryaev (1960), Picinbono (1959), and Rosenblatt (1956c).
Exercise 4.8.23 contains the useful result that finite Fourier transformsbased on successive stretches of data are asymptotically independent andidentically distributed in certain circumstances.
4.5 PROBABILITY 1 BOUNDS
It is sometimes useful to have a bound on the fluctuations of a finite Fourier transform

d_X^{(T)}(λ) = Σ_t h(t/T) X(t) exp{−iλt}   (4.5.1)

as a function of frequency λ and sample size T. In this connection we mention

Theorem 4.5.1  Let the real-valued series X(t), t = 0, ±1, ... satisfy Assumption 2.6.3 and have mean 0. Let h(t) satisfy Assumption 4.3.1. Let d_X^{(T)}(λ) be given by (4.5.1). Then

sup_λ |d_X^{(T)}(λ)| = O((T log T)^{1/2})   (4.5.2)

with probability 1.

This means that for K > 1, there is probability 1 that only finitely many of the events

occur. We see from expression (4.5.2) that under the indicated conditions, the Fourier transform has a rate of growth at most of order (T log T)^{1/2}. If X(t) is bounded by a constant, M say, then we have the elementary inequality

|d_X^{(T)}(λ)| ≤ M Σ_t |h(t/T)|,

giving a growth rate of order T. On the other hand, if we consider |d_X^{(T)}(λ)| for a single frequency λ, we are in the realm of the law of the iterated logarithm; see Maruyama (1949), Parthasarathy (1960), Philipp (1967), Iosifescu (1968), and Iosifescu and Theodorescu (1969). This law leads to a rate of growth of order (T log log T)^{1/2}; other results of the nature of (4.5.2) are given in Salem and Zygmund (1956), Whittle (1959), and Kahane (1968).
In the case that X(t), t = 0, ±1, ... is an r vector-valued stochastic series we have

Theorem 4.5.2  Let the r vector-valued X(t), t = 0, ±1, ... satisfy Assumption 2.6.3 and have mean 0. Let Y(t) be given by (4.5.7) where {a(u)} satisfies condition (4.5.8); then there is a finite L such that

with probability 1.
where
and
then there is a finite K such that
for some s × r matrix-valued filter {a(u)}. On occasion we will be interested in relating the finite Fourier transform of Y(t) to that of X(t). Lemma 3.4.1 indicates that if X(t), t = 0, ±1, ... is bounded and
An immediate implication of Theorem 4.5.1 is that, under the stated regularity conditions,

T^{−1} d_X^{(T)}(λ) → 0

with probability 1 as T → ∞. In particular, taking λ = 0 we see

T^{−1} Σ_t h(t/T) X(t) → 0

with probability 1 as T → ∞ — this last is the strong law of large numbers. Results similar to this are given in Wiener and Wintner (1941).

Turning to the development of a different class of asymptotic results, suppose the s vector-valued series Y(t), t = 0, ±1, ... is a filtered version of X(t), say
Expression (4.5.12) indicates a possible rate of growth for
with probability 1.
In the case that X(f), t — 0, ±1,... is a series of independent variates,Y(r) given by (4.5.7) is a linear process. Expressions (4.5.12) and (4.5.14)suggest how we can learn about the sampling properties of the Fouriertransform of a linear process from the sampling properties of the Fouriertransform of a series of independent variates. This simplification wasadopted by Bartlett (1966) Section 9.2.
On certain occasions it may be of interest to have a cruder bound on the growth of sup_λ |d_X^{(T)}(λ)| when the series X(t), t = 0, ±1, … satisfies the weaker Assumption 2.6.1.
Theorem 4.5.4 Let the real-valued series X(t), t = 0, ±1, … satisfy Assumption 2.6.1 and have mean 0. Let h(t) satisfy Assumption 4.3.1 and let d_X^{(T)}(λ) be given by (4.5.1). Then for any ε > 0,
to be of order (log T)^{1/2}. In Theorem 4.5.3 we will see that this rate of growth may be reduced to the order T^{-1/2}(log T)^{1/2} if the series are tapered prior to evaluating the Fourier transform.
Theorem 4.5.3 Let the r vector-valued X(t), t = 0, ±1, … satisfy Assumption 2.6.3 and have mean 0. Let Y(t) be given by (4.5.7) where {a(u)} satisfies (4.5.8). Let
where h(u) = 0 for |u| ≥ 1 and has a uniformly bounded derivative. Then there is a finite L such that
with probability 1 as T → ∞.
4.6 THE CRAMER REPRESENTATION
In Section 3.9 we developed two spectral representations for series involved in the functional approach to time series, while in this section we indicate a spectral representation in the stochastic approach. The representation is due to Cramér (1942).
to be the period 2π extension of the Dirac delta function. We may now state
Theorem 4.6.1 Let X(t), t = 0, ±1, … satisfy Assumption 2.6.1. Let Z_X^{(T)}(λ), −∞ < λ < ∞, be given by (4.6.4). Then there exists Z_X(λ), −∞ < λ < ∞, such that Z_X^{(T)}(λ) tends to Z_X(λ) in mean of order ν, for any ν > 0. Also Z_X(λ + 2π) = Z_X(λ), Z_X(λ) = conj{Z_X(−λ)} and
Suppose X(t), t = 0, ±1, … is an r vector-valued series. Consider the tapering function
giving the finite Fourier transform
This transform will provide the basis for the representation. Set
We see
if we understand
Define
for a_1, …, a_k = 1, …, r; k = 2, 3, ….
We may rewrite (4.6.7) in differential notation as
Expression (4.6.8) indicates that
where f_XX(λ) denotes the spectral density matrix of the series X(t). The increments of Z_X(λ) are orthogonal unless λ ≡ μ (mod 2π). Also joint cumulants of increments are negligible unless Σ_j λ_j ≡ 0 (mod 2π). The increments of Z_X(λ) mimic the behavior of d_X^{(T)}(λ) as given in Section 4.3.
In Theorem 4.6.2 we will need to consider a stochastic integral of the form
If
this integral exists when defined as
See Cramér and Leadbetter (1967), Section 5.3. We may now state the Cramér representation of the series X(t), t = 0, ±1, ….
Theorem 4.6.2 Under the conditions of Theorem 4.6.1
with probability 1, where Z_X(λ) satisfies the properties indicated in Theorem 4.6.1.
It is sometimes convenient to rewrite the representation (4.6.13) in a form involving variates with real-valued components. To this end set
These satisfy
and
If we make the substitutions
In differential notation the latter may be written
then from expression (4.6.8) we see that
where the summations extend over ε, η = ±1. In the case k = 2, these relations give
The Cramer representation (4.6.13) takes the form
in these new terms. The Cramér representation is especially useful for indicating the effect of
operations on series of interest. For example, consider the filtered series
where the series X(t) has Cramér representation (4.6.13). If
with
then
As an example of an application of (4.6.27) we remark that it, together with(4.6.9), gives the direct relation
of Section 2.8. Suppose the filter is a band-pass filter with transfer function, −π < λ ≤ π,
applied to each coordinate of the series X(t), t = 0, ±1, …. Suppose, as we may, that the Cramér representation of X(t) is written
Then the band-pass filtered series may be written
for small Δ. The effect of band-pass filtering is seen to be the lifting, from the Cramér representation, of cosinusoids of frequency near ±ω. For small Δ this series, Y(t), is sometimes called the component of frequency ω of X(t) and is denoted by X(t, ω), suppressing the dependence on Δ. By considering a bank of exhaustive and mutually exclusive band-pass filters with transfer functions such as
j = 0, 1, …, J where (2J + 1)Δ = 2π, we see that a series X(t), t = 0, ±1, … may be thought of as the sum of its individual frequency components,
We will see, later in this work, that many useful statistical procedures have the character of elementary procedures applied to the separate frequency components of a series of interest.
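The decomposition into frequency components can be sketched numerically. A minimal illustration, assuming numpy and a discrete-Fourier-transform band splitting in place of the book's continuous band-pass filters (the band count J, edges, and seed are arbitrary choices): partition the Fourier frequencies into exhaustive, mutually exclusive bands, inverse-transform each band, and check that the components sum back to the original series.

```python
import numpy as np

# Split a stretch X(0), ..., X(T-1) into frequency components by partitioning
# the discrete Fourier frequencies into disjoint bands (a sketch, not the
# book's filter bank).
rng = np.random.default_rng(0)
T = 256
x = rng.standard_normal(T)

d = np.fft.fft(x)                       # discrete Fourier transform
freqs = np.fft.fftfreq(T)               # frequencies in cycles/sample
J = 7
edges = np.linspace(0.0, 0.5, J + 1)    # band edges covering |freq| in [0, 1/2]

components = []
for j in range(J):
    mask = (np.abs(freqs) >= edges[j]) & (np.abs(freqs) < edges[j + 1])
    if j == J - 1:
        mask |= np.abs(freqs) == 0.5    # include the Nyquist frequency
    components.append(np.fft.ifft(d * mask).real)

# Exhaustive, mutually exclusive bands reconstruct the series exactly.
reconstruction = np.sum(components, axis=0)
max_err = np.max(np.abs(reconstruction - x))
```

Because the bands are disjoint and cover every Fourier frequency, `max_err` is at the level of floating-point rounding.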
Let us next consider the effect of forming the Hilbert transform of each component of X(t), t = 0, ±1, …. The transfer function of the Hilbert transform is
If we write the Cramér representation of X(t) in the form
then we quickly see
The cosinusoids of the representation have been shifted through a phase angle of π/2. In the case of X(t, ω), the component of frequency ω in the series X(t), we see from (4.6.36) that
and so
for example. Function (4.6.38) provides us with another interpretation of the differential dZ_X(ω) appearing in the Cramér representation.
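The π/2 phase shift can be checked directly. A sketch, assuming numpy and an FFT implementation of the Hilbert transfer function −i sgn(λ) (the test frequency is an arbitrary Fourier frequency so that the discrete result is exact):

```python
import numpy as np

# FFT-based Hilbert transform: multiply by -i for positive frequencies and
# +i for negative frequencies, so each cosinusoid is shifted by pi/2,
# i.e. cos(omega*t) -> sin(omega*t).
T = 400
t = np.arange(T)
omega = 2 * np.pi * 10 / T           # a Fourier frequency of the stretch
x = np.cos(omega * t)

d = np.fft.fft(x)
freqs = np.fft.fftfreq(T)
transfer = -1j * np.sign(freqs)      # transfer function of the Hilbert transform
hx = np.fft.ifft(d * transfer).real

max_err = np.max(np.abs(hx - np.sin(omega * t)))
```

The transformed series coincides with sin(ωt) to rounding error, exhibiting the phase shift described above.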
Next let us consider the covariance matrix of the 2r vector-valued series
Elementary calculations show that it is given by
in the case ω ≢ 0 (mod π) and by
in the case ω ≡ 0 (mod π). These results provide us with a useful interpretation of the real and imaginary parts of the spectral density matrix of a series of interest.
As another example of the use of the Cramer representation let us seewhat form a finite Fourier transform takes in terms of it. Suppose
for some tapering function h(u). By direct substitution we see that
4.7 PRINCIPAL COMPONENT ANALYSIS AND ITS RELATION TO THE CRAMER REPRESENTATION
Let Y be a J vector-valued random variable with covariance matrix Σ_YY. If the components of Y are intercorrelated and J > 2 or 3, then it is often difficult to understand the essential statistical nature of Y. Consider, therefore, the problem of obtaining a variate ζ more elementary than Y, yet containing most of the statistical information in Y. We will require ζ to have the form
where F_XX(λ) denotes the r × r matrix-valued function whose existence was indicated in Theorem 2.5.2. The integral representation now holds as an integral in mean square sense only; the proof of Theorem 3.9.1 may be modified to provide this result.
where
From what we have seen in previous discussions of tapering, for large values of T, the function H^{(T)}(λ − α) is concentrated in the neighborhood of λ ≡ α (mod 2π). Therefore, from (4.6.43), for large values of T, d_X^{(T)}(λ) is essentially getting at dZ_X(λ). As a final remark we mention that (4.6.8) and (4.6.43) imply
exactly. The latter may usefully be compared with the asymptotic expression(4.3.8).
In fact Cramér (1942) developed the representation (4.6.13) under the conditions of Theorem 2.5.2. In this more general case, the function Z_X(λ) satisfies
for some K × J matrix A with K < J. And we will formalize the requirement that ζ contain much of the statistical information in Y by requiring that it minimize
We have
where U_j is the jth latent vector of Σ_YY, j = 1, …, J. The minimum achieved is
The error caused by replacing Y by (4.7.7) is seen from (4.7.4) to depend on the magnitude of the latent roots with j > K. If K = J then the error is 0 and expression (4.7.8) is seen to provide a representation for Y in terms of uncorrelated variates ζ_j.
Theorem 4.7.1 Let Y be a J vector-valued variate with covariance matrix Σ_YY. The K × J matrix A that minimizes (4.7.2) for ζ of the form (4.7.1) is given by
where μ_j is the jth latent root of Σ_YY. The extremal B, ζ are given by
The individual components of ζ are called the principal components of Y. They are seen to have the form ζ_j = U_j^τ Y and to satisfy
The variate ζ, therefore, has a more elementary statistical structure than Y. The theorem has led us to consider approximating the J vector-valued Y by
and its jth component by
Principal components will be discussed in greater detail in Chapter 9. They were introduced by Hotelling (1933). Theorem 4.7.1 is essentially due to Kramer and Mathews (1956) and Rao (1964, 1965).
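The construction of Theorem 4.7.1 can be sketched numerically. The following illustration (numpy, the dimensions, and the seed are assumptions, not the book's example) extracts the K leading latent vectors of a sample covariance matrix and verifies that the resulting principal components are uncorrelated with variances equal to the latent roots:

```python
import numpy as np

# Principal components zeta = A Y, with A the K leading latent (eigen)
# vectors of the covariance matrix of Y; a sketch of Theorem 4.7.1.
rng = np.random.default_rng(1)
J, K, n = 6, 2, 20000
Y = rng.standard_normal((n, J)) @ rng.standard_normal((J, J))  # correlated variates

Sigma = np.cov(Y, rowvar=False)
mu, U = np.linalg.eigh(Sigma)            # latent roots (ascending) and vectors
order = np.argsort(mu)[::-1]
mu, U = mu[order], U[:, order]           # sort descending

A = U[:, :K].T                           # K x J matrix of leading latent vectors
zeta = Y @ A.T                           # the principal components
cov_zeta = np.cov(zeta, rowvar=False)    # should be diag(mu_1, ..., mu_K)

off_diag = np.max(np.abs(cov_zeta - np.diag(np.diag(cov_zeta))))
```

By construction cov(ζ) = A Σ_YY A^τ, which the leading latent vectors diagonalize, so `off_diag` is at rounding level and the diagonal entries reproduce the latent roots.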
We now turn to the case in which the variate Y refers to a stretch of values X(t), t = −T, …, T of some real-valued stationary time series. In this case
Following the discussion above, the principal components of the variate (4.7.9) will be based upon the latent vectors of the matrix (4.7.10). This matrix is finite Toeplitz and so, from Section 3.7, its latent values and vectors are approximately
and
respectively, for s = −T, …, T. The principal components of (4.7.9) are therefore approximately
If we refer back to expression (4.6.2), then we see that (4.7.13) is d_X^{(T)}(2πs/(2T + 1)), the finite Fourier transform on which the Cramér representation was based and which we have proposed be taken as a basic statistic in computations with an observed stretch of series.
Suppose X(t), t = 0, ±1, … has autocovariance function c_XX(u), u = 0, ±1, …, so
Following Theorem 4.7.1 we are led to approximate X(t) by
if
in some sense. Expression (4.7.15) is seen to be the Cramér representation of X(t). The Cramér representation therefore results from a limit of a principal component analysis of X(t), t = 0, ±1, ….
Craddock (1965) carried out an empirical principal component analysis of a covariance matrix resulting from a stretch of time series values. The principal components he obtained have the cosinusoidal nature of (4.7.13).
The collection of latent values of a matrix is sometimes referred to as its spectrum, which in the case of matrix (4.7.10) are seen, from (4.7.11), to equal approximately 2πf_XX(2πs/(2T + 1)), s = −T, …, T, where f_XX(λ) is the power spectrum of the series X(t), t = 0, ±1, …. We have therefore been led to an immediate relation between two different sorts of spectra.
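This relation between the matrix spectrum and the power spectrum is easy to check numerically. A sketch, assuming numpy and an MA(1) model X(t) = e(t) + θe(t−1) (my choice, not the book's), whose autocovariances vanish beyond lag 1 and whose power spectrum is f_XX(λ) = (1 + θ² + 2θ cos λ)/(2π):

```python
import numpy as np

# Latent values of the (2T+1) x (2T+1) Toeplitz matrix [c_XX(j - k)]
# compared with 2*pi*f_XX(2*pi*s/(2T+1)), s = -T, ..., T.
theta, T = 0.6, 200
n = 2 * T + 1
u = np.arange(n)
c = np.where(u == 0, 1 + theta**2, np.where(u == 1, theta, 0.0))  # c_XX(u)
idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = c[idx]                                   # Toeplitz covariance matrix

eigvals = np.sort(np.linalg.eigvalsh(Sigma))[::-1]

lam = 2 * np.pi * np.arange(-T, T + 1) / n
spec = np.sort(1 + theta**2 + 2 * theta * np.cos(lam))[::-1]   # = 2*pi*f_XX

max_rel_err = np.max(np.abs(eigvals - spec)) / spec[0]
```

The sorted latent values track the sorted spectrum samples with a discrepancy that shrinks as T grows, in line with the finite Toeplitz approximation of Section 3.7.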
4.8 EXERCISES
4.8.1 Let Y = U + iV where U and V are jointly multivariate normal. Let EY = μ and E(Y − μ)(Y − μ)^T = 0. Prove that the individual components of Y are statistically independent if E(Y − μ)(Y − μ)^τ is diagonal.
4.8.2 If Y is N_r^C(0, Σ), prove that AY is N_s^C(0, AΣA^τ) for any s × r matrix A. Conclude that if the entries of Y are independent N_1^C(0, σ²) variates and if A is r × r unitary, then the entries of AY are also independent N_1^C(0, σ²). Also conclude that the marginal distributions of a multivariate complex normal are complex normal.
4.8.3 If X is N_r^C(μ, Σ) and Im Σ = 0, show that Re X and Im X are statistically independent.
4.8.4 If W is distributed as W_r^C(n, Σ), prove that W_jj is distributed as Σ_jj χ²_{2n}/2.
4.8.5 If W is distributed as W_r^C(n, Σ), prove that EW = nΣ. Also prove that
E(W_jk − nΣ_jk)(W_lm − nΣ_lm) = nΣ_jm Σ_lk.
where the summation in expression (4.7.14) is over s corresponding to the K largest values of (4.7.11). If we take K = 2T + 1 and let T → ∞, then we would expect the value (4.7.14) to be very near Y. In fact, (4.7.14) tends to
4.8.6 If W is distributed as W_r^C(n, Σ), prove that Σ^{−1/2}WΣ^{−1/2} is distributed as W_r^C(n, I) if Σ is nonsingular.
4.8.7 Let Y be distributed as N_n^C(μ, σ²I) and let
if j = k, where (·) denotes a j × j permanent; see Goodman and Dubman (1969).
4.8.10 Let X(t), t = 0, ±1, … be a stationary process with finite moments and satisfying X(t + T) = X(t), t = 0, ±1, … for some positive integer T. (Such an X(t) is called a circular process.) Prove that
4.8.11 If X(t), t = 0, ±1, … is a stationary Gaussian series, prove that for k > 2
4.8.12 Let X(t), t = 0, ±1, … be an r vector-valued pure noise series with
where A_k is Hermitian of rank n_k. A necessary and sufficient condition for the forms Y^τ A_k Y to be distributed independently, with Y^τ A_k Y distributed as a scaled noncentral chi-squared σ²χ'²_{2n_k}/2, is that n_1 + ⋯ + n_K = n; see Brillinger (1973).
4.8.8 Let W be distributed as W_{r+s}^C(n, Σ). Let it be partitioned into
and equals
with W_11 r × r and W_22 s × s. Suppose that Σ is similarly partitioned. Prove that W_22 − W_21 W_11^{−1} W_12 is distributed as
If Σ_12 = 0, prove that W_21 W_11^{−1} W_12 is distributed as W_s^C(r, Σ_22) and is independent of W_22 − W_21 W_11^{−1} W_12.
4.8.9 Let Y be N_r^C(0, Σ). Prove that
If d_a^{(T)}(λ) is given by expression (4.3.13), prove that
4.8.13 Let X(t), t = 0, ±1, … be an r vector-valued stationary series satisfying expression (4.3.6). Let T = min_j T_j. If d_a^{(T)}(λ) is given by expression (4.3.13), show that
4.8.14 Suppose the conditions of Theorem 4.3.2 are satisfied. Suppose that H_a(λ) of (4.3.3) satisfies
for some finite K where ν > 2, a = 1, …, r. Then prove that
4.8.15 Let X(t), t = 0, ±1, … be an r vector-valued series. Suppose the stretch of values X(t), t = 0, …, T − 1 is given. Prove that d_X^{(T)}(2πs/T), s = 0, …, T/2 is a sufficient statistic.
4.8.16 Under the conditions of Theorem 4.6.1, prove that Z_X(λ) is continuous in mean of order ν for any ν > 0.
4.8.17 Let Y be a J vector-valued random variable with covariance matrix Σ_YY. Determine the linear combination α^τY, with α^τα = 1, that has maximum variance.
4.8.18 Making use of Exercise 3.10.15, generalize the discussion of Section 4.7 to the case of vector-valued series.
4.8.19 Under the conditions of Theorem 4.4.2, prove that if λ ≢ 0 (mod π), then arg{d_a^{(T)}(λ)} tends in distribution to a uniform variate on the interval (0, 2π) as T → ∞. What is the distribution if λ ≡ 0 (mod π)?
4.8.20 Suppose the conditions of Theorem 4.4.2 are satisfied. Suppose H_a(λ) of (4.3.3) satisfies
for some finite K where ν > 2, a = 1, …, r. Suppose λ_j^{(T)} → λ_j as T → ∞ with min_l T|λ_j^{(T)} − 2πl|, min_l T|λ_j^{(T)} ± λ_k^{(T)} − 2πl| → ∞ as T → ∞, 1 ≤ j < k ≤ J. Prove that the conclusions of Theorem 4.4.2 apply to d_X^{(T)}(λ_j^{(T)}), j = 1, …, J.
4.8.21 Let d_X^{(T)}(λ) = Σ_{t=0}^{T−1} X(t) exp{−iλt}. Show that
(a) Show that d_X^{(T)}(2πs/T) is distributed as N_1^C(0, Tσ²) for s an integer with 2πs/T ≢ 0 (mod π).
(b) Indicate the distribution of
4.8.23 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1. Let
where H_a^{(T)}(λ) is given by (4.3.2).
4.8.22 Let X(t), t = 0, …, T − 1 be a sequence of independent normal variates with mean 0 and variance σ². Let
(c) Indicate the distribution of arg
for −∞ < λ < ∞; l = 0, …, L − 1; a = 1, …, r. Show that d_X^{(V)}(λ, l) = [d_a^{(V)}(λ, l)], l = 0, …, L − 1 are asymptotically independent variates if λ ≢ 0 (mod π) and asymptotically N_r(0, 2πV[H_ab(0)f_ab(λ)]) variates if λ = ±π, ±3π, …, as V → ∞. Hint: This result follows directly from Theorem 4.4.2 with X(t) and the h_a(u) suitably redefined.
4.8.24 If X is N_r^C(0, Σ) and A is r × r Hermitian, show that X^τAX is distributed as ½Σ_{j=1}^r μ_j(u_j² + v_j²) where μ_1, …, μ_r are the latent values of
and u_1, …, u_r, v_1, …, v_r are independent N(0, 1) variates.
4.8.25 Let X_1, …, X_n be independent N_r^C(μ, Σ) variates. Show that μ̂ = Σ_j X_j/n and Σ̂ = Σ_j (X_j − μ̂)(X_j − μ̂)^τ/n are the maximum likelihood estimates of μ and Σ; see Giri (1965).
4.8.26 Show that a W_r^C(n, Σ) variate with Im Σ = 0 may be represented as ½{W_11 + W_22 + i(W_12 − W_21)} where
Conclude that the real part of such a complex Wishart is distributed as ½W_r(2n, Σ).
4.8.27 Use the density function (4.2.9) to show that W = W_r^C(n, I) may be represented as (X + iY)(X + iY)^τ where X, Y are lower triangular; X_jk, Y_jk, 1 ≤ k < j ≤ r are independent N_1(0, 1) variates and X_jj², Y_jj² are independent χ²_{n−j+1} variates.
4.8.28 Under the conditions of Exercise 4.8.25, show that D² = μ̂^τΣ̂^{−1}μ̂ is distributed as χ'²_{2r}(2nμ^τΣ^{−1}μ)/χ²_{2(n−r)}, where χ'²_{2r}(δ) denotes a noncentral chi-squared variate with 2r degrees of freedom and noncentrality parameter δ and χ²_{2(n−r)} denotes an independent central chi-squared with 2(n − r) degrees of freedom; see Giri (1965).
4.8.29 Under the conditions of Theorem 4.4.2, show that the asymptotic distribution of d_X^{(T)}(λ) is unaffected by the omission of any finite number of the X(t).
4.8.30 Let W be distributed as W_r^C(n, Σ). Show that
where K_ν is the modified Bessel function of the second kind and order ν; see Pearson et al (1929) and Wishart and Bartlett (1932).
(a) Show that the density of x = W_12 (with respect to Re x, Im x) is given by
4.8.33 Let W be distributed as
4.8.32 Let W be distributed as W_r^C(n, Σ). Show that the density function of x = W_11 is given by
all possible interchanges a_j ↔ b_j for j = 1, …, k − 1}, where P is as in the previous exercise. Show that the number of terms summed in all is 2^{k−1}(k − 1)!
where the summation is over all permutations, P, of the set {1, 2, …, k} with the property that P leaves no proper subset of {1, 2, …, k} invariant. Show that the number of such permutations is (k − 1)!
4.8.31 Let W be distributed as W_r^C(n, Σ). Show that
(b) Show that the density of y = Re W_12 is given by
in terms of the Cramér representation of the series X(t), t = 0, ±1, ….
4.8.36 (a) Let W be W_r(n, Σ) and α, β, γ, δ be r vectors. Show that
where α = Re ρ, β = Im ρ.
(c) Show that the density of z = Im W_12 is given by
where B(λ), 0 ≤ λ ≤ π, is a complex Brownian motion process satisfying cov{B(λ), B(μ)} = min{λ, μ} and B(−λ) = conj{B(λ)}.
4.8.35 Suppose the series Y(t), t = 0, ±1, … is given by (2.9.15). Show that it has the form
where I_0 is the modified Bessel function of the first kind and order 0.
All of the densities of this exercise were derived in Goodman (1957).
4.8.34 Let X(t), t = 0, ±1, … be a stationary Gaussian series with mean 0 and
power spectrum f_XX(λ), −∞ < λ < ∞. Show that the Cramér representation may be written
(d) Show that the density of φ = arg W_12 is given by
where φ_0 = arg ρ.
(e) Show that the density of w = |W_12| is given by
(b) Let W be W_r^C(n, Σ) and α, β, γ, δ complex r vectors. Show that
4.8.37 Let X(t), t = 0, ±1, … be a 0 mean, real-valued series satisfying Assumption 2.6.1. Let u be a non-negative integer. Then Theorem 2.9.1 indicates that the series Y(t) = X(t + u)X(t) also satisfies Assumption 2.6.1. Use this and Theorem 4.4.1 to show that
is asymptotically normal with mean c_XX(u) and variance
5
THE ESTIMATION OF POWER SPECTRA
We have seen in Section 2.5 that this power spectrum is non-negative, even, and of period 2π with respect to λ. This evenness and periodicity means that we may take the interval [0, π] as the fundamental domain of definition of f_XX(λ) if we wish.
then the power spectrum of the series X(t), t = 0, ±1, … at frequency λ is defined to be the Fourier transform
Suppose that the autocovariance function satisfies
and autocovariance function
5.1 POWER SPECTRA AND THEIR INTERPRETATION
Let X(t), t = 0, ±1, … be a real-valued time series with mean function
Under the condition (5.1.3), f_XX(λ) is a bounded, uniformly continuous function. Also the relation (5.1.4) may be inverted and the autocovariance function c_XX(u) expressed as
In particular setting u = 0 gives
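The u = 0 relation, the variance c_XX(0) as the integral of f_XX(λ) over (−π, π], can be checked numerically. A sketch assuming an AR(1) model X(t) = φX(t−1) + e(t) (my choice of example, not the text's), for which f_XX(λ) = σ²/(2π|1 − φe^{−iλ}|²) and c_XX(0) = σ²/(1 − φ²):

```python
import numpy as np

# Numerical check that the variance equals the integral of the power spectrum.
phi, sigma2 = 0.7, 1.0

lam = np.linspace(-np.pi, np.pi, 20001)
f = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * lam)) ** 2)

# Trapezoidal integration of f_XX over (-pi, pi]
variance = np.sum((f[1:] + f[:-1]) / 2 * np.diff(lam))
exact = sigma2 / (1 - phi ** 2)
err = abs(variance - exact)
```

The numerical integral reproduces σ²/(1 − φ²) to the accuracy of the quadrature grid.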
In Sections 2.8 and 4.6 we saw that the power spectrum transforms in an elementary manner if the series is filtered in a linear time invariant way. Specifically, suppose the series Y(t), t = 0, ±1, … results when the series X(t), t = 0, ±1, … is passed through a filter having transfer function A(λ), −∞ < λ < ∞. Then, from Example 2.8.1, the power spectrum of the series Y(t), t = 0, ±1, … is given by
We may use expressions (5.1.6) and (5.1.7) to see that
Expression (5.1.8) suggests one possible means of interpreting the power spectrum. Suppose we take for −π < α ≤ π and Δ small
and then extend A(α) outside of the interval (−π, π] periodically. This transfer function corresponds to a filter proportional to a band-pass filter; see Section 2.7. The output series Y(t), t = 0, ±1, … is therefore proportional to X(t, λ), the component of frequency λ in the series X(t), t = 0, ±1, …; see Section 4.6. From expressions (5.1.8) and (5.1.9) we now see that
This means that f_XX(λ) may be interpreted as proportional to the variance of X(t, λ), the component of frequency λ in the series X(t), t = 0, ±1, …. Incidentally, we remark that
This equals 0 if λ is farther than Δ from 0, ±2π, …, and so
Now if Y(t) is taken to be the voltage applied across the terminals of the simple electric circuit of Figure 5.1.1 containing a resistance of R = 1 ohm, then the instantaneous power dissipated is Y(t)². An examination of (5.1.12) now indicates that f_XX(λ) may be interpreted as the expected amount of power dissipated in a certain electric circuit by the component in X(t) of frequency λ. This example is the reason that f_XX(λ) is often referred to as a "power" spectrum.
Figure 5.1.1 An elementary electric circuit withvoltage Y(t) applied at time t.
Roberts and Bishop (1965) have discussed a simple vibratory system for illustrating the value of the power spectrum. It consists of a cylindrical brass tube with a jet of air blown across an open end; this may be thought of as X(t). The output signal is the pressure at the closed end of the tube, while the transfer function of this system is sketched in Figure 5.1.2. The peaks in the figure occur at frequencies
where l = length of the tube, c = velocity of sound, and n = 1, 2, …. The output of this system will have pressure proportional to
Figure 5.1.2 Approximate form of transfer function of system consisting of brass tube withair blown across one end.
where the λ_n are given by (5.1.13). A microphone at the closed end allows one to hear the output.
We conclude this section by presenting some examples of autocovariance functions and the corresponding power spectra, which are given in Figure 5.1.3. For example, if c_XX(u) is concentrated near u = 0, then f_XX(λ) is near constant. If c_XX(u) falls off slowly as u increases, then f_XX(λ) is concentrated near λ = 0, ±2π, …. If c_XX(u) oscillates about 0 as u increases, then f_XX(λ) has substantial mass away from λ ≡ 0 (mod 2π).
We now turn to the development of an estimate of f_XX(λ), −∞ < λ < ∞, and a variety of statistical properties of the proposed estimate. For additional discussion the reader may wish to consult certain of the following
Figure 5.1.3 Selected autocovariances, c_XX(u), and associated power spectra, f_XX(λ).
review papers concerning the estimation of power spectra: Tukey (1959a, b), Jenkins (1961), Parzen (1961), Priestley (1962a), Bingham et al (1967), and Cooley et al (1970).
5.2 THE PERIODOGRAM
Suppose X(t), t = 0, ±1, … is a stationary series with mean function c_X and power spectrum f_XX(λ), −∞ < λ < ∞. Suppose also that the values X(0), …, X(T − 1) are available and we are interested in estimating f_XX(λ). Then we first compute the finite Fourier transform
These distributions suggest a consideration of the statistic
as an estimate of f_XX(λ) in the case λ ≢ 0 (mod 2π). The statistic I_XX^{(T)}(λ) of (5.2.3) is called the second-order periodogram, or
more briefly periodogram, of the values X(0), …, X(T − 1). It was introduced by Schuster (1898) as a tool for the identification of hidden periodicities because in the case
I_XX^{(T)}(λ) has peaks at the frequencies λ ≡ ±ω_j (mod 2π). We note that I_XX^{(T)}(λ), given by (5.2.3), has the same symmetry, non-negativity, and periodicity properties as f_XX(λ).
Figure 5.2.1 is a plot of monthly rainfall in England for the period 1920
to 1930; the finite Fourier transform, d_X^{(T)}(λ), of values for 1780–1960 was
calculated using the Fast Fourier Transform Algorithm. The periodogram I_XX^{(T)}(λ) was then calculated and is given as Figure 5.2.2. It is seen to be a rather irregular function of λ. This irregularity is also apparent in Figures
Following Theorem 4.4.2, this variate is asymptotically
5.2.3 and 5.2.4 which are the lower and upper 100 periodogram ordinates of the series of mean monthly sunspot numbers (see Figure 1.1.5 for mean annual numbers). Other examples of periodograms are given in Wold (1965). In each of these examples I_XX^{(T)}(λ) is a very irregular function of λ despite the fact that f_XX(λ) is probably a regular function of λ. It appears that I_XX^{(T)}(λ) is an inefficient estimate of f_XX(λ) and so we turn to a consideration of alternate estimates. First, we will present some theorems relating to the statistical behavior of I_XX^{(T)}(λ) in an attempt to understand the source of the irregularity and so construct better estimates.
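The computation just described is easy to sketch. A minimal illustration assuming numpy (the cosine-plus-noise series, amplitude, and seed are my choices, standing in for Schuster's hidden-periodicity setting): compute d_X^{(T)}(2πs/T) by the fast Fourier transform and form the periodogram |d_X^{(T)}(λ)|²/(2πT).

```python
import numpy as np

# Periodogram of a series containing a hidden cosine at frequency 2*pi*s0/T.
rng = np.random.default_rng(2)
T = 1024
t = np.arange(T)
s0 = 100
x = 2.0 * np.cos(2 * np.pi * s0 * t / T) + rng.standard_normal(T)

d = np.fft.fft(x)                          # finite Fourier transform at 2*pi*s/T
I = np.abs(d) ** 2 / (2 * np.pi * T)       # periodogram ordinates

peak = np.argmax(I[1:T // 2]) + 1          # largest ordinate away from frequency 0
```

The ordinate at s = s0 is of order T/(2π) while the noise ordinates stay bounded, so the hidden periodicity stands out as a peak, consistent with the use described above.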
First consider the expected value of the periodogram. We have
Figure 5.2.2 Periodogram of composite rainfall series of England and Wales for the years 1789–1959. (Logarithmic plot)
Figure 5.2.1 Composite index of rainfall for England and Wales for the years 1920–1930.
Figure 5.2.3 Low frequency portion of log_10 periodogram of monthly mean sunspot numbers for the years 1750–1965.
Figure 5.2.4 High frequency portion of log_10 periodogram of monthly mean sunspot numbers for the years 1750–1965.
Theorem 5.2.1 Let X(t), t = 0, ±1, … be a time series with EX(t) = c_X, cov{X(t + u), X(t)} = c_XX(u), t, u = 0, ±1, …. Suppose
then
In the case that λ ≢ 0 (mod 2π), the final term in (5.2.6) is reduced in size and we see that EI_XX^{(T)}(λ) is essentially a weighted average of the power spectrum of interest with weight concentrated in the neighborhood of λ. In the limit we have
Corollary 5.2.1 Under the conditions of the theorem, I_XX^{(T)}(λ) is an asymptotically unbiased estimate of f_XX(λ) if λ ≢ 0 (mod 2π).
The next theorem gives a bound for the asymptotic bias of I_XX^{(T)}(λ).
Theorem 5.2.2 Under the conditions of Theorem 5.2.1 and if
The O(T^{−1}) term is uniform in λ.
we have
We remark that in the case λ = 2πs/T, s an integer ≢ 0 (mod T), the second term on the right side of expressions (5.2.6) and (5.2.8) drops out, leading to usefully simple results. Now a consideration of I_XX^{(T)}(λ) only for frequencies of the form 2πs/T, s an integer ≢ 0 (mod T), amounts to a consideration of I_{X−c_X,X−c_X}^{(T)}(λ), the periodogram of the sample values with the mean removed, because we have the identity
for s ≢ 0 (mod T). In view of the fact that the basic definition of a power spectrum is based on covariances and so is mean invariant, the restricted consideration of I_XX^{(T)}(2πs/T), s an integer ≢ 0 (mod T), seems reasonable. We will return to this case in Theorem 5.2.4 below.
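The mean-invariance identity at the Fourier frequencies can be confirmed directly, since Σ_t exp{−i2πst/T} = 0 for s ≢ 0 (mod T). A sketch assuming numpy (the constant 5.0 and the seed are arbitrary):

```python
import numpy as np

# Subtracting any constant leaves the periodogram unchanged at the
# frequencies 2*pi*s/T, s = 1, ..., T-1; only the s = 0 ordinate changes.
rng = np.random.default_rng(3)
T = 128
x = 5.0 + rng.standard_normal(T)           # series with nonzero mean

def periodogram(y):
    return np.abs(np.fft.fft(y)) ** 2 / (2 * np.pi * len(y))

I_raw = periodogram(x)
I_centered = periodogram(x - x.mean())

max_diff = np.max(np.abs(I_raw[1:] - I_centered[1:]))
```

`max_diff` is at rounding level, while the s = 0 ordinate of the raw series carries the large |c_X|² contribution that centering removes.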
We have seen in Sections 3.3 and 4.6 that advantages result from tapering observed values prior to computing their Fourier transform. We now turn to the construction of a modified periodogram that is appropriate for a series of tapered values. Suppose that we have formed
for some taper h(u) satisfying Assumption 4.3.1. Then Theorem 4.4.2 suggests that the distribution of d_X^{(T)}(λ) may be approximated by
in the case λ ≢ 0 (mod π). This suggests that we might consider the statistic
as an estimate of f_XX(λ) in the tapered case. We have replaced T ∫ h(t)² dt by the sum of the squares of the taper coefficients as this is easily computed. Suppose we set
efficients as this is easily computed. Suppose we set
and
If it is possible to apply the Poisson summation formula, then these two are connected by
and H^{(T)}(λ) is seen to have substantial magnitude for large T only if λ ≡ 0 (mod 2π). This observation will help us in interpreting expression (5.2.17). We can now state
Theorem 5.2.3 Let X(t), t = 0, ±1, … be a real-valued series satisfying the conditions of Theorem 5.2.1. Let h(u) satisfy Assumption 4.3.1. Let I_XX^{(T)}(λ) be given by (5.2.13). Then
In the case that λ ≢ 0 (mod 2π), the final term in (5.2.17) will be of reduced magnitude. The first term on the right side of (5.2.17) is seen to be a weighted average of the power spectrum of interest with weight concentrated in the neighborhood of λ and relative weight determined by the taper. This expression is usefully compared with expression (5.2.6) corresponding
to the nontapered case. If f_XX(α) has a substantial peak for α in the neighborhood of λ, then the expected value of I_XX^{(T)}(λ), given by (5.2.6) or (5.2.17), can differ quite substantially from f_XX(λ). The advantage of employing a taper is now apparent. It can be taken to have a shape to reduce the effect of neighboring peaks.
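The leakage-reducing effect of a taper can be sketched numerically. The example below assumes numpy and a cosine-bell (Hanning-type) taper, which is my illustrative choice rather than a taper singled out by the text; a pure cosine at a non-Fourier frequency plays the role of a sharp spectral peak.

```python
import numpy as np

# Compare the untapered and tapered periodograms of a cosine whose frequency
# falls between Fourier frequencies, so that the Dirichlet kernel's side
# lobes leak power to distant frequencies.
T = 512
t = np.arange(T)
lam0 = 2 * np.pi * (50.3 / T)              # NOT a Fourier frequency -> leakage
x = np.cos(lam0 * t)

h = 0.5 * (1 - np.cos(2 * np.pi * (t + 0.5) / T))   # cosine-bell taper

def periodogram(y, taper):
    d = np.fft.fft(y * taper)
    # normalize by the sum of squared taper coefficients, as in (5.2.13)
    return np.abs(d) ** 2 / (2 * np.pi * np.sum(taper ** 2))

I_plain = periodogram(x, np.ones(T))
I_taper = periodogram(x, h)

far = np.arange(100, T // 2)               # frequencies well away from the peak
leak_plain = np.sum(I_plain[far])
leak_taper = np.sum(I_taper[far])
```

The tapered periodogram carries orders of magnitude less spurious power far from the peak, reflecting the reduced side lobes of the taper's kernel.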
Continuing our investigation of the statistical properties of the periodogram as an estimate of the power spectrum, we find Theorem 5.2.4 describes the covariance structure of I_XX^{(T)}(λ) in the nontapered case and when λ is of the special form 2πs/T, s an integer.
Theorem 5.2.4 Let X(t), t = 0, ±1, … be a real-valued series satisfying Assumption 2.6.2(1). Let I_XX^{(T)}(λ) be given by expression (5.2.3). Let r, s be integers with r, s, r ± s ≢ 0 (mod T). Let μ = 2πr/T, λ = 2πs/T. Then
Given ε > 0, the O(T^{−1}) term is uniform in λ, μ deviating from all multiples of 2π by at least ε.
The O(T^{−1}) terms are uniform in λ, μ of the indicated form.
In connection with the conditions of this theorem, we remark that I_XX^{(T)}(2πr/T) = I_XX^{(T)}(2πs/T) if r + s or r − s ≡ 0 (mod T), so the estimates are then identical.
This theorem has a crucial implication for statistical practice. It suggests that no matter how large T is taken, the variance of I_XX^{(T)}(λ) will tend to remain at the level f_XX(λ)². If an estimate with a variance smaller than this is desired, it is not to be obtained by simply increasing the sample length and continuing to use the periodogram. The theorem also suggests a reason for the irregularity of Figures 5.2.2 to 5.2.4, namely, adjacent periodogram ordinates are seen to have small covariance relative to their variances. In fact we will see in Theorem 5.2.6 that distinct periodogram ordinates are asymptotically independent.
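This non-shrinking variance is simple to exhibit by simulation. A Monte Carlo sketch assuming numpy and Gaussian white noise with f_XX(λ) = σ²/(2π) (the seed, replication count, and sample sizes are arbitrary choices):

```python
import numpy as np

# For white noise, an interior periodogram ordinate is f_XX * chi-squared_2/2,
# so its variance stays near f_XX**2 however large T is taken.
rng = np.random.default_rng(4)
f_true = 1.0 / (2 * np.pi)          # power spectrum for sigma2 = 1

def ordinate_var(T, reps=1500):
    x = rng.standard_normal((reps, T))
    I = np.abs(np.fft.fft(x, axis=1)) ** 2 / (2 * np.pi * T)
    return I[:, T // 4].var()       # ordinate at the interior frequency pi/2

v_small = ordinate_var(128)
v_large = ordinate_var(2048)
```

Both Monte Carlo variances sit near f_XX(λ)² rather than near 0, so lengthening the record alone does not stabilize the periodogram.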
Theorem 5.2.5 describes the asymptotic covariance structure of the periodogram when λ is not necessarily of the form 2πs/T.
Theorem 5.2.5 Let X(t), t = 0, ±1, … be a real-valued series satisfying Assumption 2.6.2(1). Let I_XX^{(T)}(λ) be given by expression (5.2.3). Suppose λ, μ ≢ 0 (mod 2π). Then
for −∞ < λ < ∞. Then I_XX^{(T)}(λ_j^{(T)}), j = 1, …, J are asymptotically independent f_XX(λ_j)χ²_2/2 variates. Also if λ = ±π, ±3π, …, I_XX^{(T)}(λ) is asymptotically f_XX(λ)χ²_1, independently of the previous variates.
In Theorem 5.2.6, χ²_ν denotes a chi-squared variate with ν degrees of freedom. The particular case χ²_2/2 is an exponential variate with mean 1.
A practical implication of the theorem is that it may prove reasonable to approximate the distribution of a periodogram ordinate, I_XX^{(T)}(λ), by a multiple of a χ²_2 variate. Some empirical evidence for this assertion is provided by Figure 5.2.5 which is a two degree of freedom chi-squared probability plot of the values I_XX^{(T)}(2πs/T), s = T/4, …, T/2, for the series of mean monthly sunspot numbers. We have chosen these particular values of s because Figures 5.2.4 and 5.4.3 suggest that f_XX(λ) is approximately constant for the corresponding frequency interval. If the values graphed in a two degree of freedom chi-squared probability plot actually have a distribution that is a multiple of χ²_2, then the points plotted should tend to fall along a straight line. There is substantial evidence of this happening in Figure 5.2.5. Such plots are described in Wilk et al (1962).
Theorem 5.2.6 reinforces the suggestion, made in the discussion of Theorem 5.2.4, that the periodogram might prove an ineffective estimate of the power spectrum. For large T its distribution is approximately that of a multiple of a chi-squared variate with two degrees of freedom and hence is very unstable. In Section 5.4 we will turn to the problem of constructing estimates that are reasonably stable.
We remark that expression (5.2.20) is more informative than (5.2.19) in that it indicates the transition of cov{I_XX^{(T)}(λ), I_XX^{(T)}(μ)} into var I_XX^{(T)}(λ) as μ → λ. It also suggests the reason for the reduced covariance in the case that λ, μ have the particular forms 2πs/T, 2πr/T with s, r integers.
We now complete our investigation of the elementary asymptotic properties of the periodogram by indicating its asymptotic distribution under regularity conditions. Theorem 4.4.1 indicated the asymptotic normality of d_X^(T)(λ) for λ of the form 2πs/T, s an integer. An immediate application of this theorem gives
Theorem 5.2.6 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let s_j(T) be an integer with λ_j^(T) = 2πs_j(T)/T tending to λ_j as T → ∞ for j = 1, ..., J. Suppose 2λ_j^(T), λ_j^(T) ± λ_k^(T) ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J and T = 1, 2, .... Let
5.2 THE PERIODOGRAM 127
Figure 5.2.5 χ²₂ probability plot of the upper 500 periodogram ordinates of monthly mean sunspot numbers for the years 1750-1965.
The mean and variance of the asymptotic distribution of I_XX^(T)(2πs/T) are seen to be consistent with the large sample mean and variance of I_XX^(T)(2πs/T) given by expressions (5.2.8) and (5.2.18), respectively.
Theorem 5.2.6 does not describe the asymptotic distribution of I_XX^(T)(λ) when λ ≡ 0 (mod 2π). Theorem 4.4.1 indicates that the asymptotic distribution is f_XX(λ)χ²₁ when EX(t) = c_X = 0. In the case that c_X ≠ 0, Theorem 4.4.1 suggests approximating the large sample distribution by f_XX(λ)χ'²₁, where χ'²₁ denotes a noncentral chi-squared variate with one degree of freedom and noncentrality parameter |c_X|²T/(2πf_XX(λ)).
Turning to the tapered case we have
Theorem 5.2.7 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Suppose 2λ_j, λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Let h(u) satisfy Assumption 4.3.1. Let
for −∞ < λ < ∞. Then I_XX^(T)(λ_j), j = 1, ..., J are asymptotically independent f_XX(λ_j)χ²₂/2 variates. Also if λ = ±π, ±3π, ..., I_XX^(T)(λ) is asymptotically f_XX(λ)χ²₁, independently of the previous variates.
With the definition and limiting procedure adopted, the limiting distribution of I_XX^(T)(λ) is the same whether or not the series has been tapered. The hope is, however, that in large samples the tapered estimate will have less bias. A result extending Theorem 5.2.5 to tapered values in the case of 0 mean is
Theorem 5.2.8 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1) and having mean 0. Let h(u) satisfy Assumption 4.3.1 and let I_XX^(T)(λ) be given by (5.2.22). Then
5.3 FURTHER ASPECTS OF THE PERIODOGRAM
The power spectrum, f_XX(λ), of the series X(t), t = 0, ±1, ... was defined by
Here
The extent of dependence of I_XX^(T)(λ) and I_XX^(T)(μ) is seen to fall off as the function H₂^(T) falls off.
Bartlett (1950, 1966) developed expressions for the mean and covariance of the periodogram under regularity conditions; he also suggested approximating its distribution by a multiple of a chi-squared with two degrees of freedom. Other references to the material of this section include: Slutsky (1934), Grenander and Rosenblatt (1957), Kawata (1959), Hannan (1960), Akaike (1962b), Walker (1965), and Olshen (1967).
for −∞ < λ, μ < ∞. The error term is uniform in λ, μ.
where c_XX(u), u = 0, ±1, ... was the autocovariance function of the series. This suggests an alternate means of estimating f_XX(λ). We could first estimate c_XX(u) by an expression of the form
where
and then, taking note of (5.3.1), estimate f_XX(λ) by
If we substitute expression (5.3.2) into (5.3.4), we see that this estimate takes the form
that is, the periodogram of the deviations of the observed values from their mean. We noted in the discussion of Theorem 5.2.2 that
for λ of the form 2πs/T, s ≢ 0 (mod T), and so Theorems 5.2.4 and 5.2.6 in fact relate to estimates of the form (5.3.5).
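The equivalence just described can be checked numerically. In this sketch (the series and the frequency are illustrative; the sample autocovariance is taken as c^(T)(u) = T⁻¹ Σ_t (X(t+u) − X̄)(X(t) − X̄)), the periodogram of the mean-corrected values coincides exactly with (2π)⁻¹ Σ_{|u|<T} c^(T)(u) e^{−iλu}:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 256
x = rng.normal(size=T)
xc = x - x.mean()                      # mean-corrected values

# Sample autocovariances c(u) = T^{-1} sum_t (X(t+u) - mean)(X(t) - mean), 0 <= u < T
c = np.array([xc[u:] @ xc[:T - u] / T for u in range(T)])

lam = 2 * np.pi * 5 / T                # an arbitrary frequency 2*pi*s/T, s != 0 (mod T)

# Periodogram of the mean-corrected values
I = np.abs(np.sum(xc * np.exp(-1j * lam * np.arange(T)))) ** 2 / (2 * np.pi * T)

# Fourier transform of the sample autocovariances, using c(-u) = c(u)
u = np.arange(1, T)
f_hat = (c[0] + 2 * np.sum(c[1:] * np.cos(lam * u))) / (2 * np.pi)

assert np.isclose(I, f_hat)            # the two expressions agree identically
```

The agreement is an algebraic identity, not an approximation, which is why Theorems 5.2.4 and 5.2.6 apply to estimates of the form (5.3.5) directly.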
In the tapered case where
we see directly that
suggesting the consideration of the 0 mean statistic
where
in the tapered case. The expected value of this statistic is indicated in Exercise 5.13.22.
Turning to another aspect of the periodogram, we have seen that the periodogram ordinates I_XX^(T)(λ_j), j = 1, ..., J, are asymptotically independent for distinct λ_j, j = 1, ..., J. In Theorem 5.3.1 we will see that periodogram ordinates of the same frequency, but based on different stretches of data, are also asymptotically independent.
Theorem 5.3.1 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let h(u) satisfy Assumption 4.3.1 and vanish for u < 0. Let
We have therefore been led to consider the Fourier transform
based on mean-corrected values. We remark that, in terms of the Cramér representation of Section 4.6, this last may be written
showing the reduction of frequency components for λ near 0, ±2π, .... In the light of this discussion it now seems appropriate to base spectral estimates on the modified periodogram
for −∞ < λ < ∞, l = 0, ..., L − 1. Then, as T → ∞, the statistics I_XX^(T)(λ, l), l = 0, ..., L − 1, are asymptotically independent f_XX(λ)χ²₂/2 variates if λ ≢ 0 (mod π), and asymptotically independent f_XX(λ)χ²₁ variates if λ = ±π, ±3π, ....
This result will suggest a useful means of constructing spectral estimates later. It is interesting to note that we can obtain asymptotically independent periodogram values either by splitting the data into separate segments, as we do here, or by evaluating them at neighboring frequencies, as in Theorem 5.2.7.
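The segment-splitting idea can be sketched as follows (the stretch count and length are illustrative choices of mine): each of L disjoint stretches yields its own periodogram, the L values at a fixed frequency behave approximately as independent f_XX(λ)χ²₂/2 variates, and averaging them anticipates the stabler estimates of Section 5.4.

```python
import numpy as np

rng = np.random.default_rng(2)
L, V = 8, 256                           # L disjoint stretches of V observations each
x = rng.normal(size=L * V)              # white noise with f_XX = 1 / (2*pi)

# Periodogram of each stretch at the stretch's own Fourier frequencies 2*pi*s/V
stretches = x.reshape(L, V)
d = np.fft.fft(stretches, axis=1)
I = np.abs(d) ** 2 / (2 * np.pi * V)    # row l = stretch l, column s = frequency index

# Average the L (approximately independent) values at each frequency,
# omitting s = 0 and s = V/2 where lambda = 0 (mod pi)
f_hat = I[:, 1:V // 2].mean(axis=0)
err = abs(f_hat.mean() - 1 / (2 * np.pi))
```

For this flat-spectrum series, `f_hat` scatters about the true level 1/(2π) with variance reduced by the averaging over stretches.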
We conclude this section by indicating several probability 1 results relating to the periodogram. We begin by giving an almost sure bound for I_XX^(T)(λ) as a function of λ and T.
Theorem 5.3.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.3 and having mean 0. Let h(u) satisfy Assumption 4.3.1. Let
Then
with probability 1.
In words, the rate of growth of the periodogram is at most of order log T, uniformly in λ, under the indicated conditions. A practical implication of this is that the maximum deviation of I_XX^(T)(λ) from f_XX(λ) as a function of λ becomes arbitrarily large as T → ∞. This is yet another indication of the fact that I_XX^(T)(λ) is often an inappropriate estimate of f_XX(λ).
We now briefly investigate the effect of a linear time invariant operation on the periodogram. Suppose
for some filter {a(u)} satisfying
and having transfer function A(λ). Theorem 4.5.2 indicated that under regularity conditions

almost surely. Elementary algebra then indicates that

with probability 1, the error term being uniform in λ. In words, the effect of filtering on a periodogram is, approximately, multiplication by the modulus squared of the transfer function of the filter. This parallels the effect of filtering on the power spectrum as given in expression (5.1.7).

5.4 THE SMOOTHED PERIODOGRAM

In this section we make our first serious proposal for an estimate of the power spectrum. The discussion following Theorem 5.2.4 indicated that a critical disadvantage of the periodogram as an estimate of the power spectrum, f_XX(λ), was that its variance was approximately f_XX(λ)², under reasonable regularity conditions, even when based on a lengthy stretch of data. On many occasions we require an estimate of greater precision than this and feel that it must exist. In fact, Theorem 5.2.6 suggests a means of constructing an improved estimate.
Suppose s(T) is an integer with 2πs(T)/T near λ. Then Theorem 5.2.6 indicates that the (2m + 1) adjacent periodogram ordinates I_XX^(T)(2π[s(T) + j]/T), j = 0, ±1, ..., ±m are approximately independent f_XX(λ)χ²₂/2 variates, if 2[s(T) + j] ≢ 0 (mod T), j = 0, ±1, ..., ±m. These values may therefore provide (2m + 1) approximately independent estimates of f_XX(λ), which suggests an estimate having the form
that is, a simple average of the periodogram ordinates in the neighborhood of λ. A further examination of Theorem 5.2.6 suggests the consideration of the estimate
if λ = 0, ±2π, ±4π, ... or if λ = ±π, ±3π, ... and T is even, and
if λ = ±π, ±3π, ... and T is odd.
The estimate given by expressions (5.4.1) to (5.4.3) is seen to have the same non-negativity, periodicity, and symmetry properties as f_XX(λ) itself. It is based on the values d_X^(T)(2πs/T), s = 0, ..., T − 1 and so may be rapidly computed by the Fast Fourier Transform Algorithm if T happens to be highly composite. We will investigate its statistical properties shortly.
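A minimal sketch of the simple average, in the spirit of (5.4.1) (function and variable names are mine, and the flat-spectrum white noise series is illustrative): one FFT gives all the periodogram ordinates, after which estimates for a succession of m values cost almost nothing extra.

```python
import numpy as np

def smoothed_periodogram(x, m):
    """Average the 2m + 1 periodogram ordinates nearest each Fourier
    frequency (a simplified version of expression (5.4.1))."""
    T = len(x)
    d = np.fft.fft(x - np.mean(x))
    I = np.abs(d) ** 2 / (2 * np.pi * T)           # I(2*pi*s/T), s = 0, ..., T-1
    # Circular average: the periodogram has period T in the index s
    idx = (np.arange(T)[:, None] + np.arange(-m, m + 1)) % T
    return I[idx].mean(axis=1)

rng = np.random.default_rng(3)
T = 1024
x = rng.normal(size=T)                  # white noise: f_XX = 1 / (2*pi) everywhere

spread = {}
for m in (2, 5, 20):                    # a succession of m values
    f_hat = smoothed_periodogram(x, m)
    spread[m] = f_hat[T // 8: 3 * T // 8].std()    # variability away from 0 (mod pi)
# growing stability: for typical realizations the spread shrinks as m grows
```

The shrinking spread with increasing m mirrors the growing stability visible in Figure 5.4.3; the bias question raised below is what keeps m from being taken arbitrarily large.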
In preparation for Theorem 5.4.1 set
the Fejér kernel of Section 3.3. Then set
5.4 THE SMOOTHED PERIODOGRAM 133
and set
Figure 5.4.1 Plot of the kernel A_Tm(λ) for T = 11 and m = 0, 1, 2, 3.
for
Figure 5.4.2 Plot of the kernel B_Tm(λ) for T = 11 and m = 1, 2, 3.
Taking note of the properties of F_T(λ), indicated in Section 3.3, we see that A_Tm(λ), B_Tm(λ), and C_Tm(λ) are non-negative, have unit integral over the interval (−π, π), and have period 2π. They are concentrated principally in the interval (−2πm/T, 2πm/T) for −π < λ < π. A_Tm(λ) is plotted in Figure 5.4.1 for the values T = 11, m = 0, 1, 2, 3. It is seen to have an approximate rectangular shape, as was to be expected from the definition (5.4.5). B_Tm(λ) is plotted in Figure 5.4.2 for the values T = 11, m = 1, 2, 3. It is seen to have a shape similar to that of A_Tm(λ) except that in the immediate neighborhood of the origin it is near 0.
Turning to an investigation of the expected value of f_XX^(T)(λ) we have
Theorem 5.4.1 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = c_X and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose
Let f_XX^(T)(λ) be given by (5.4.1) to (5.4.3). Then
The expected value of f_XX^(T)(λ) is a weighted average of the power spectrum of interest, f_XX(α), with weight concentrated in a band of width 4πm/T about λ in the case λ ≢ 0 (mod 2π). In the case λ ≡ 0 (mod 2π), Ef_XX^(T)(λ) remains a weighted average of f_XX(α) with weight concentrated in the neighborhood of λ, with the difference that values of f_XX(α) in the immediate neighborhood of 0 are partially excluded. The latter is a reflection of the difficulty resulting from not knowing EX(t). If m is not too large compared to T and f_XX(α) is smooth, then Ef_XX^(T)(λ) can be expected to be near f_XX(λ) in both cases. A comparison of expressions (5.2.6) and (5.4.8) suggests that the bias of f_XX^(T)(λ) will generally be greater than that of I_XX^(T)(λ), as the integral extends over a greater essential range in the former case. We will make detailed remarks concerning the question of bias later.
The theorem has the following:
Corollary 5.4.1 Suppose in addition that λ − 2πs(T)/T = O(T⁻¹), m is constant with respect to T, and
and T is odd.
then
In the limit, f_XX^(T)(λ) is an asymptotically unbiased estimate of f_XX(λ).
In summary, with regard to its first moment, f_XX^(T)(λ) seems a reasonable estimate of f_XX(λ) provided that m is not too large with respect to T. The estimate seems reasonable in the case λ ≡ 0 (mod 2π) even if EX(t) is unknown. Turning to the second-order moment structure of this estimate we have
Theorem 5.4.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (5.4.1) to (5.4.3) with
In the case λ ≢ 0 (mod π), the effect of averaging 2m + 1 adjacent periodogram ordinates has been to produce an estimate whose asymptotic variance is 1/(2m + 1) times that of the periodogram. We may therefore contemplate choosing a value of m so large that an acceptable level of stability in the estimate is achieved. However, following the discussion of Theorem 5.4.1, note that the bias of the estimate f_XX^(T)(λ) may well increase as m is increased, and thus some compromise value of m will have to be selected.
The variance of f_XX^(T)(λ) in the case λ ≡ 0 (mod π) is seen to be approximately double that in the λ ≢ 0 (mod π) case. This reflects the fact that the estimate in the former case is based on approximately half as many independent statistics. The asymptotic distribution of f_XX^(T)(λ) under certain regularity conditions is indicated in the following:
Theorem 5.4.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (5.4.1) to (5.4.3) with 2πs(T)/T → λ as T → ∞. Suppose λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Then f_XX^(T)(λ₁), ..., f_XX^(T)(λ_J) are asymptotically independent, with f_XX^(T)(λ) asymptotically f_XX(λ)χ²_{4m+2}/(4m + 2) if λ ≢ 0 (mod π), and asymptotically f_XX(λ)χ²_{2m}/(2m) if λ ≡ 0 (mod π).
This theorem will prove especially useful when it comes time to suggest approximate confidence limits for f_XX(λ).
Figure 5.4.3 presents the logarithm to base 10 of f_XX^(T)(λ), given by (5.4.1) to (5.4.3), for the series of monthly sunspot numbers whose periodogram was given in Figures 5.2.3 and 5.2.4. The statistic f_XX^(T)(λ) is calculated for 0 ≤ λ ≤ π and m = 2, 5, 10, 20, and the growing stability of the estimate as m increases is immediately apparent. The figures suggest that f_XX(λ) has a lot of mass in the neighborhood of 0. This in turn suggests that neighboring values of the series tend to cluster together. An examination of the series itself (Figure 1.1.5) confirms this remark. The periodogram and the plots corresponding to m = 2, 5, 10 suggest a possible peak in the spectrum in the neighborhood of the frequency .015π. This frequency corresponds to the
Also
λ − 2πs(T)/T = O(T⁻¹). Suppose λ ± μ ≢ 0 (mod 2π) and that m does not depend on T. Then
Figure 5.4.3 Log₁₀ f_XX^(T)(λ) for monthly mean sunspot numbers for the years 1750-1965 with 2m + 1 periodogram ordinates averaged.
eleven-year solar cycle suggested by Schwabe in 1843; see Newton (1958). This peak has disappeared in the case m = 20, indicating that the bias of the estimate has become appreciable. Because this peak is of special interest, we have plotted f_XX^(T)(λ) in the case m = 2 on an expanded scale in Figure 5.4.4. In this figure there is an indication of a peak near the frequency .030π, that is, the first harmonic of .015π.
Figures 5.4.5 to 5.4.8 present the spectral estimate f_XX^(T)(λ) for the series of mean monthly rainfall whose periodogram was given as Figure 5.2.2. The statistic is calculated for m = 2, 5, 7, 10. Once again the increasing stability of the estimate as m increases is apparent. The substantial peak
Figure 5.4.4 Low frequency portion of log₁₀ f_XX^(T)(λ) for monthly mean sunspot numbers for the years 1750-1965 with five periodogram ordinates averaged.
Figure 5.4.5 f_XX^(T)(λ) of composite rainfall series of England and Wales for the years 1789-1959 with five periodogram ordinates averaged. (Logarithmic plot.)
Figure 5.4.6 f_XX^(T)(λ) of composite rainfall series of England and Wales for the years 1789-1959 with eleven periodogram ordinates averaged. (Logarithmic plot.)
Figure 5.4.7 f_XX^(T)(λ) of composite rainfall series of England and Wales for the years 1789-1959 with fifteen periodogram ordinates averaged. (Logarithmic plot.)
Figure 5.4.8 f_XX^(T)(λ) of composite rainfall series of England and Wales for the years 1789-1959 with twenty-one periodogram ordinates averaged. (Logarithmic plot.)
in the figures occurs at a frequency of one cycle per year, as would be expected in the light of the seasonal nature of the series. For other values of λ, f_XX^(T)(λ) is near constant, suggesting that the series is made up approximately of an annual component superimposed on a pure noise series.
Figure 5.4.9 presents some empirical evidence related to the validity of Theorem 5.4.3. It is a χ²₃₀ probability plot of the values f_XX^(T)(2πs/T), T = 2592, s = T/4, ..., (T/2) − 1, for the series of monthly sunspots. f_XX^(T)(λ) has been formed by smoothing 15 adjacent periodogram ordinates. If f_XX(λ) is near constant for π/2 < λ < π, as the estimated spectra suggest, and it is reasonable to approximate the distribution of f_XX^(T)(λ) by a multiple of χ²₃₀, as Theorem 5.4.3 suggests, then the plotted values should tend to fall along a straight line. In fact the bulk of the points plotted in Figure 5.4.9 appear to do this. However, there is definite curvature for the rightmost points. The direction of this curvature suggests that the actual distribution may have a shorter right-hand tail than that of a multiple of χ²₃₀.
Figure 5.4.9 χ²₃₀ probability plot of the upper 500 power spectrum estimates, when fifteen periodogram ordinates are averaged, of monthly mean sunspot numbers for the years 1750-1965.
where
Let s(T) be an integer such that 2πs(T)/T is near λ and 2[s(T) + j] ≢ 0 (mod T), j = 0, ±1, ..., ±m. Consider the estimate
5.5 A GENERAL CLASS OF SPECTRAL ESTIMATES
The spectral estimate of the previous section weights all periodogram ordinates in the neighborhood of λ equally. If f_XX(α) is near constant for α near λ, then this is undoubtedly a reasonable procedure; however, if f_XX(α) varies to an extent, then it is perhaps more reasonable to weight periodogram ordinates in the immediate neighborhood of λ more heavily than those at a distance. We proceed to construct an estimate that allows differential weighting.
Let W_j, j = 0, ±1, ..., ±m be weights satisfying
It is important to note how informative it is to have calculated f_XX^(T)(λ) not just for a single value of m, but rather for a succession of values. The figures for small values of m help in locating any nearly periodic components and their frequencies, while the figures for large values of m give exceedingly smooth curves that could prove useful in model fitting. In the case that the values I_XX^(T)(2πs/T), s = 1, 2, ... are available (perhaps calculated using the Fast Fourier Transform), it is an elementary matter to prepare estimates for a succession of values of m.
The suggestion that an improved spectral estimate might be obtained by smoothing the periodogram was made by Daniell (1946); see also Bartlett (1948b, 1966), the paper by Jones (1965), and the letter by Tick (1966). Bartlett (1950) made use of the χ² distribution for smoothed periodogram estimates.
Because of the shape of F_T(λ), both A_T(λ) and B_T(λ) will be weight functions principally concentrated in the interval (−2πm/T, 2πm/T) for −π < λ < π. B_T(λ) will differ from A_T(λ) in having negligible mass for −2π/T < λ < 2π/T. In the case of equal weights, A_T(λ) and B_T(λ) are rectangular in shape. In the general case, the shape of A_T(λ) will mimic that of W_j, j = −m, ..., 0, ..., m.
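A sketch of a differentially weighted estimate in the spirit of (5.5.2), with triangular weights as one illustrative choice of W_j concentrated at the centre (function and variable names are mine):

```python
import numpy as np

def weighted_spectral_estimate(x, W):
    """Weighted periodogram average sum_j W_j I(2*pi*(s+j)/T), with the
    weights W_{-m}, ..., W_m summing to 1 (cf. (5.5.1)-(5.5.2))."""
    W = np.asarray(W, dtype=float)
    assert np.isclose(W.sum(), 1.0) and len(W) % 2 == 1
    m = len(W) // 2
    T = len(x)
    d = np.fft.fft(x - np.mean(x))
    I = np.abs(d) ** 2 / (2 * np.pi * T)
    idx = (np.arange(T)[:, None] + np.arange(-m, m + 1)) % T
    return I[idx] @ W

# Triangular weights: heavier in the immediate neighborhood of lambda
m = 5
W = (m + 1 - np.abs(np.arange(-m, m + 1))).astype(float)
W /= W.sum()

rng = np.random.default_rng(4)
x = rng.normal(size=1024)                  # white noise, f_XX = 1 / (2*pi)
f_hat = weighted_spectral_estimate(x, W)
```

Uniform weights W_j = 1/(2m + 1) recover the simple average of Section 5.4; non-uniform weights trade some variance for possibly smaller bias when f_XX varies near λ.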
Turning to an investigation of the properties of this estimate we begin with
Theorem 5.5.1 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = c_X and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose
Set
In order to discuss the expected value of this estimate we must define certain functions. Set
Let f_XX^(T)(λ) be given by (5.5.2) to (5.5.4); then
and T is even
and T is odd.
The expected value of the estimate (5.5.2) to (5.5.4) differs from that of the estimate of Section 5.4 in the nature of the weighted average of the power
spectrum f_XX(α). Because we can affect the character of the weighted average by the choice of the W_j, we may well be able to produce an estimate with less bias than that of Section 5.4 in the case that f_XX(α) varies in the neighborhood of λ.
Corollary 5.5.1 Suppose, in addition to the assumptions of the theorem, λ − 2πs(T)/T = O(T⁻¹) and
then
In the limit, f_XX^(T)(λ) is an asymptotically unbiased estimate of f_XX(λ).
Turning to the second-order moment structure, we have
Theorem 5.5.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (5.5.2) to (5.5.4) with λ − 2πs(T)/T = O(T⁻¹). Suppose λ ± μ ≢ 0 (mod 2π); then
Also
The variance of the estimate is seen to be proportional to Σ_j W_j² for large T. We remark that

and so Σ_j W_j² is minimized, subject to Σ_j W_j = 1, by setting
It follows that the large sample variance of f_XX^(T)(λ) is minimized by taking it to be the estimate of Section 5.4. Following the discussion after Theorem
In the case W_j = 1/(2m + 1), this leads us back to the approximation suggested by Theorem 5.4.3. The approximation of the distribution of
and
or
and
5.5.1, it may well be the case that the estimate of Section 5.4 has greater bias than the estimate (5.5.2) involving well-chosen W_j.
Turning to an investigation of limiting distributions we have
Theorem 5.5.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (5.5.2) to (5.5.4) with 2πs(T)/T → λ as T → ∞. Suppose λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Then f_XX^(T)(λ₁), ..., f_XX^(T)(λ_J) are asymptotically independent, with f_XX^(T)(λ) asymptotically
The different chi-squared variates appearing are statistically independent.
The asymptotic distribution of f_XX^(T)(λ) is seen to be that of a weighted combination of independent chi-squared variates. It may prove difficult to use this as an approximating distribution in practice; however, a standard statistical procedure (see Satterthwaite (1941) and Box (1954)) is to approximate the distribution of such a variate by a multiple, θχ²_ν, of a chi-squared whose mean and degrees of freedom are determined by equating first- and second-order moments. Here we are led to set
power spectral estimates by a multiple of a chi-squared was suggested by Tukey (1949). Other approximations are considered in Freiberger and Grenander (1959), Slepian (1958), and Grenander et al. (1959).
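The moment matching can be carried out directly from the weights: with the normalization Σ_j W_j = 1, equating first- and second-order moments gives the equivalent degrees of freedom ν = 2(Σ_j W_j)² / Σ_j W_j². A small sketch (the weight choices are illustrative):

```python
import numpy as np

def equivalent_dof(W):
    """Satterthwaite-type degrees of freedom for the weighted estimate:
    nu = 2 (sum_j W_j)^2 / sum_j W_j^2."""
    W = np.asarray(W, dtype=float)
    return 2 * W.sum() ** 2 / (W ** 2).sum()

m = 5
uniform = np.full(2 * m + 1, 1 / (2 * m + 1))
tri = (m + 1 - np.abs(np.arange(-m, m + 1))).astype(float)
tri /= tri.sum()

# Uniform weights recover the 4m + 2 of Theorem 5.4.3; since they minimize
# sum W_j^2 subject to sum W_j = 1, any other weights give fewer dof.
assert np.isclose(equivalent_dof(uniform), 4 * m + 2)
assert equivalent_dof(tri) < 4 * m + 2
```

The resulting ν also feeds directly into the confidence interval construction of Section 5.7.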
In this section we have obtained a flexible estimate of the power spectrum by introducing a variable weighting scheme for periodogram ordinates. We have considered asymptotic properties of the estimate under a limiting procedure involving the weighting of a constant number, 2m + 1, of periodogram ordinates as T → ∞. For some purposes this procedure may suggest valid large sample approximations; for other purposes it might be better to allow m to increase with T. We turn to an investigation of this alternate limiting procedure in the next section, which will lead us to estimates that are asymptotically normal and consistent.
It is possible to employ different weights W_j or different m in separate intervals of the frequency domain if the character of f_XX(λ) differs in those intervals.
5.6 A CLASS OF CONSISTENT ESTIMATES
The class of estimates we consider in this section has the form
where W^(T)(α), −∞ < α < ∞, T = 1, 2, ... is a family of weight functions of period 2π whose mass is arranged so that the estimate (5.6.1) essentially involves a weighting of 2m_T + 1 periodogram ordinates in the neighborhood of λ. In order to obtain an estimate of diminishing variance as T → ∞, we will therefore require m_T → ∞, in contrast to the constant m of Section 5.5. Also the range of frequencies involved in the estimate (5.6.1) is 2π(2m_T + 1)/T, and so, in order to obtain an asymptotically unbiased estimate, we will require m_T/T → 0 as T → ∞. The estimate inherits the smoothness properties of W^(T)(α).
A convenient manner in which to construct the weight function W^(T), appearing in the estimate (5.6.1) and having the properties referred to, is to consider a sequence of scale parameters B_T, T = 1, 2, ... with the properties B_T > 0, B_T → 0, B_T T → ∞ as T → ∞ and to set
where W(β), −∞ < β < ∞, is a fixed function satisfying
Assumption 5.6.1 W(β), −∞ < β < ∞, is real-valued, even, of bounded variation
that help to explain its character. For large T, in view of (5.6.3), the sum of the weights appearing in (5.6.1) should be near 1. The worker may wish to alter (5.6.1) so that the sum of the weights is exactly 1. This alteration will have no effect on the asymptotic expressions given below.
Turning to a large sample investigation of the mean of f_XX^(T)(λ) we have
Theorem 5.6.1 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = c_X and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose
Let f_XX^(T)(λ) be given by (5.6.1) where W(β) satisfies Assumption 5.6.1. Then
and
If we choose W(β) to be 0 for |β| > 2π, then we see that the estimate (5.6.1) involves the weighting of the 2B_T T + 1 periodogram ordinates whose frequencies fall in the interval (λ − 2πB_T, λ + 2πB_T). In terms of the introduction to this section, the identification m_T = B_T T is made.
Because W^(T)(α) has period 2π, the same will be true of f_XX^(T)(λ). Likewise, because W^(T)(−α) = W^(T)(α), we will have f_XX^(T)(−λ) = f_XX^(T)(λ). The estimate (5.6.1) is not necessarily non-negative under Assumption 5.6.1; however if, in addition, we assume W(β) ≥ 0, then it will be the case that f_XX^(T)(λ) ≥ 0. Because of (5.6.3), ∫_{−π}^{π} W^(T)(α)dα = 1.
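A sketch of an estimate of the form (5.6.1)-(5.6.2) (all names are mine; the triangular weight function, vanishing for |β| > 2π, and the fixed bandwidth are illustrative choices standing in for a sequence B_T):

```python
import numpy as np

def kernel_estimate(x, B):
    """Weighted average of periodogram ordinates (cf. (5.6.1)), with weight
    function W^(T)(alpha) = W(alpha / B) / B for a bandwidth B = B_T."""
    T = len(x)
    d = np.fft.fft(x - np.mean(x))
    I = np.abs(d) ** 2 / (2 * np.pi * T)
    lam = 2 * np.pi * np.arange(T) / T

    def W(beta):
        # fixed, even, non-negative weight function of unit integral,
        # vanishing for |beta| > 2*pi (a triangular choice, for illustration)
        return np.where(np.abs(beta) <= 2 * np.pi,
                        (1 - np.abs(beta) / (2 * np.pi)) / (2 * np.pi), 0.0)

    f_hat = np.empty(T)
    for k in range(T):
        diff = (lam[k] - lam + np.pi) % (2 * np.pi) - np.pi   # reduce mod 2*pi
        f_hat[k] = (2 * np.pi / T) * np.sum(W(diff / B) / B * I)
    return f_hat

rng = np.random.default_rng(5)
T = 512
x = rng.normal(size=T)               # white noise, f_XX = 1 / (2*pi)
f_hat = kernel_estimate(x, B=0.1)    # in the theory B_T -> 0 while B_T * T -> infinity
```

With W ≥ 0 the estimate is non-negative, as noted above; shrinking B with growing T is what produces the consistency discussed below.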
In view of (5.6.2) we may set down the following alternate expressions for f_XX^(T)(λ),
The error terms are uniform in λ.
In Corollary 5.6.2 below we make use of the function
This is a periodic extension of the Kronecker delta function
The expected value of f_XX^(T)(λ) is seen to be a weighted average of the function f_XX(α), −∞ < α < ∞, with weight concentrated in an interval containing λ and of length proportional to B_T. We now have
Corollary 5.6.1 Under the conditions of the theorem and if B_T → 0 as T → ∞, f_XX^(T)(λ) is an asymptotically unbiased estimate of f_XX(λ), that is,
The property of being asymptotically unbiased was also possessed by the estimate of Section 5.5. Turning to second-order large sample properties of the estimate we have
Theorem 5.6.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (5.6.1) where W(β) satisfies Assumption 5.6.1. Then
Corollary 5.6.2 Under the conditions of Theorem 5.6.2 and if B_T T → ∞ as
In the case μ = λ, this corollary indicates
In either case the variance of f_XX^(T)(λ) is tending to 0 as B_T T → ∞. In Corollary 5.6.1 we saw that Ef_XX^(T)(λ) → f_XX(λ) as T → ∞ if B_T → 0. Therefore the estimate (5.6.1) has the property

under the conditions of Theorem 5.6.2 and if B_T → 0, B_T T → ∞ as T → ∞. Such an estimate is called consistent in mean square.
Notice that in expression (5.6.13) we have a doubling of variance at λ = 0, ±π, ±2π, .... Expression (5.6.9) is much more informative in this connection. It indicates that the transition between the usual asymptotic behavior and that at λ = 0, ±π, ±2π, ... takes place in intervals about these points whose length is of the order of magnitude of B_T.
We see from (5.6.12) that f_XX^(T)(λ), f_XX^(T)(μ) are asymptotically uncorrelated as T → ∞, provided λ − μ, λ + μ ≢ 0 (mod 2π). Turning to the asymptotic distribution of f_XX^(T)(λ), we have
Theorem 5.6.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (5.6.1) with W(β) satisfying Assumption 5.6.1. Suppose f_XX(λ_j) ≠ 0, j = 1, ..., J. Then f_XX^(T)(λ₁), ..., f_XX^(T)(λ_J) are asymptotically normal with covariance structure given by (5.6.12), as T → ∞ with B_T T → ∞, B_T → 0.
The estimate considered in Section 5.4 had an asymptotic distribution proportional to chi-squared under the assumption that we were smoothing a fixed number of periodogram ordinates. Here the number of periodogram ordinates being smoothed is increasing to ∞ with T and so it is not surprising that an asymptotic normal distribution results. One interesting implication of the theorem is that f_XX^(T)(λ) and f_XX^(T)(μ) are asymptotically independent if λ ± μ ≢ 0 (mod 2π). The theorem has the following:
Corollary 5.6.3 Under the conditions of Theorem 5.6.3 and if f_XX(λ) ≠ 0, log₁₀ f_XX^(T)(λ) is asymptotically normal with
var
This corollary suggests that the variance of log f_XX^(T)(λ) may not depend too strongly on the magnitude of f_XX(λ), nor generally on λ, for large T. Therefore, it is probably more sensible to plot the statistic log f_XX^(T)(λ), rather than f_XX^(T)(λ) itself. In fact this has been the standard engineering practice and is what is done for the various estimated spectra of this chapter.
Consistent estimates of the power spectrum were obtained by Grenander and Rosenblatt (1957) and Parzen (1957, 1958). The asymptotic mean and variance were considered by these authors and by Blackman and Tukey (1958). Asymptotic normality has been demonstrated by Rosenblatt (1959), Brillinger (1965b, 1968), Brillinger and Rosenblatt (1967a), Hannan (1970), and Anderson (1971) under various conditions. Jones (1962a) is also of interest.
In the case that the data have been tapered prior to forming a power spectral estimate, Theorem 5.6.3 takes the form
Theorem 5.6.4 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let h(t), −∞ < t < ∞, be a taper satisfying Assumption 4.3.1. Let W(α), −∞ < α < ∞, satisfy Assumption 5.6.1. Set
where
Set
Let B_T → 0, B_T T → ∞ as T → ∞. Then f_XX^(T)(λ₁), ..., f_XX^(T)(λ_J) are asymptotically jointly normal with
and
By comparison with expression (5.6.12), the limiting variance of the tapered estimate is seen to differ from that of the untapered estimate by the factor
By Schwarz's inequality this factor is ≥ 1. In the case where we employ a cosine taper extending over the first and last 10 percent of the data, its value is 1.116. It is hoped that in many situations the bias of the tapered estimate will be reduced so substantially as to more than compensate for this increase in variance. Table 3.3.1 gives some useful tapers.
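The factor in question is ∫₀¹ h(u)⁴ du / (∫₀¹ h(u)² du)², and the quoted value 1.116 can be verified numerically. The sketch below assumes the split-cosine form of the 10 percent taper (raised-cosine rises over the first and last tenth of the unit interval), which matches the cosine tapers of Section 3.3:

```python
import numpy as np

def cosine_taper(u, p=0.1):
    """Split-cosine taper: raised-cosine rise on (0, p), flat on (p, 1-p),
    raised-cosine fall on (1-p, 1)."""
    u = np.asarray(u, dtype=float)
    h = np.ones_like(u)
    left = u < p
    right = u > 1 - p
    h[left] = 0.5 * (1 - np.cos(np.pi * u[left] / p))
    h[right] = 0.5 * (1 - np.cos(np.pi * (1 - u[right]) / p))
    return h

u = (np.arange(100000) + 0.5) / 100000      # midpoint rule on (0, 1)
h = cosine_taper(u)

# Variance inflation of the tapered estimate relative to the untapered one
factor = np.mean(h ** 4) / np.mean(h ** 2) ** 2   # approximately 1.116
```

The exact value here is 0.8546875/0.875² ≈ 1.1163, in agreement with the text; the Schwarz inequality guarantees the factor is at least 1 for any taper.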
5.7 CONFIDENCE INTERVALS
In order to communicate an indication of the possible nearness of an estimate to a parameter, it is often desirable to provide a confidence interval for the parameter based on the estimate. The asymptotic distributions determined in the previous sections for the various spectral estimates may be used in this connection. We first set down some notation. Let z(α), χ²_ν(α) denote numbers such that
and
where z is a standard normal variate and χ²_ν a chi-squared variate with ν degrees of freedom.
Consider first the estimate of Section 5.4,
for 2πs(T)/T near λ ≢ 0 (mod π). Theorem 5.4.3 suggests approximating its distribution by f_XX(λ)χ²_{4m+2}/(4m + 2). This leads to the following 100γ percent confidence interval for f_XX(λ):

If we take logarithms, this interval becomes
The degrees of freedom and multipliers of chi-squared will be altered in the case λ ≡ 0 (mod π) in accordance with the details of Theorem 5.4.3.
In Figure 5.7.1 we have set 95 percent limits around the estimate, corresponding to m = 2, of Figure 5.4.4. We have inserted these limits in two manners. In the upper half of Figure 5.7.1 we have proceeded in accordance with expression (5.7.5). In the lower half, we have set the limits around a strongly smoothed spectral estimate; this procedure has the advantage of causing certain peaks to stand out.
In Section 5.5, we considered the estimate
involving a variable weighting of periodogram ordinates. Its asymptotic distribution was found to be that of a weighted sum of exponential variates. This last is generally not a convenient distribution to work with; however, in the discussion of Theorem 5.5.3 it was suggested that it be approximated by f_XX(λ)χ²_ν/ν where
in the case λ ≢ 0 (mod π). Taking this value of ν, we are led to the following 100γ percent confidence interval for log f_XX(λ),
If W_j = 1/(2m + 1), j = 0, ±1, ..., ±m, then the interval (5.7.8) is the same as the interval (5.7.5).
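In practice the interval amounts to two chi-squared quantile look-ups. A sketch using SciPy (the function name is mine; ν = 4m + 2 corresponds to the simple average of Section 5.4 at λ ≢ 0 (mod π)):

```python
import numpy as np
from scipy.stats import chi2

def log10_ci(f_hat, nu, gamma=0.95):
    """Approximate 100*gamma percent confidence interval for log10 f_XX(lambda),
    treating the estimate f_hat as f_XX(lambda) * chi^2_nu / nu."""
    lo = np.log10(nu * f_hat / chi2.ppf(0.5 * (1 + gamma), nu))
    hi = np.log10(nu * f_hat / chi2.ppf(0.5 * (1 - gamma), nu))
    return lo, hi

# With 2m + 1 = 5 ordinates averaged, Theorem 5.4.3 gives nu = 4m + 2 = 10
lo, hi = log10_ci(1.0, nu=10)
assert lo < 0.0 < hi          # the interval brackets the point estimate on the log scale
```

On the log scale the width of the interval depends only on ν, not on the estimate itself, which is one reason log-scale plots are convenient.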
If ν is large, then log₁₀(χ²_ν/ν) is approximately normal with mean 0 and variance 2(.4343)²/ν. The interval (5.7.8) is therefore approximately
Interval (5.7.9) leads us directly into the approximation suggested by the results of Section 5.6. The estimate considered there had the form
Figure 5.7.1 Two manners of setting 95 percent confidence limits about the power spectrum estimate of Figure 5.4.4.
the interval (5.7.11) is in essential agreement with the interval (5.7.9). The intervals (5.7.9) and (5.7.11) are relevant to the case λ ≢ 0 (mod π). If λ ≡ 0 (mod π), then the variance of the estimate is approximately doubled, indicating we should broaden the intervals by a factor of √2.
In the case that we believe f_XX(α) to be a very smooth function in some interval about λ, an ad hoc procedure is also available. We may estimate the variance of f_XX^(T)(λ) from the variation of f_XX^(T)(α) in the neighborhood of λ. For example, this might prove a reasonable procedure for the frequencies π/2 < λ < π in the case of the series of monthly mean sunspot numbers analyzed previously.
The confidence intervals constructed in this section apply to the spectral estimate at a single frequency $\lambda$. A proportion $1 - \gamma$ of the values may be expected to fall outside the limits. On occasion we may wish a confidence region valid for the whole frequency range. Woodroofe and Van Ness (1967) determined the asymptotic distribution of the variate
where $N_T \to \infty$ as $T \to \infty$. An approximate confidence region for $f_{XX}(\lambda)$, $0 < \lambda < \pi$, might be determined from this asymptotic distribution.
5.8 BIAS AND PREFILTERING
In this section we will carry out a more detailed analysis of the bias of the proposed estimates of a power spectrum. We will indicate how an elementary operation, called prefiltering, can often be used to reduce this bias. We begin by considering the periodogram of a series of tapered values. For convenience, assume $EX(t) = 0$, although the general conclusions reached will be relevant to the nonzero mean case as well.
Because
Corollary 5.6.3 leads us to set down the following $100\gamma$ percent confidence interval for $\log_{10} f_{XX}(\lambda)$,
Let
where $h(u)$ is a tapering function vanishing for $u < 0$, $u > 1$. The periodogram here is taken to be
If we define the kernel
where
then Theorem 5.2.3 indicates that
We proceed to examine this expected value in greater detail. Set
As we might expect from Section 3.3, we have
Theorem 5.8.1 Let $X(t)$, $t = 0, \pm 1, \ldots$ be a real-valued series with 0 mean and autocovariance function satisfying
Let the tapering function $h(u)$ be such that $k^{(T)}(u)$ given by (5.8.6) satisfies
Let $I_{XX}^{(T)}(\lambda)$ be given by (5.8.2); then
where $f_{XX}^{(p)}(\lambda)$ is the $p$th derivative of $f_{XX}(\lambda)$. The error term is uniform in $\lambda$.
From its definition, $k^{(T)}(u) = k^{(T)}(-u)$ and so the $k_p$ in (5.8.8) are 0 for odd $p$. The dominant bias term appearing in (5.8.9) is therefore
This term is seen to depend on both the kernel employed and the spectrum being estimated. We will want to choose a taper so that $k_2$ is small. In fact, if we use the definition (3.3.11) of bandwidth, then the bandwidth of the kernel $K^{(T)}(\alpha)$ is $\sqrt{|k_2|}/T$, also implying the desirability of small $|k_2|$. The bandwidth is an important parameter in determining the extent of bias. In real terms, the student will have difficulty in distinguishing (or resolving) peaks in the spectrum closer than $\sqrt{|k_2|}/T$ apart. This was apparent to an extent from Theorem 5.2.8, which indicated that the statistics $I_{XX}^{(T)}(\lambda)$ and $I_{XX}^{(T)}(\mu)$ were highly dependent for $\lambda$ near $\mu$. Expressions (5.8.9) and (5.8.10) do indicate that the bias will be reduced in the case that $f_{XX}(\alpha)$ is near constant in a neighborhood of $\lambda$; this remark will prove the basis for the operation of prefiltering to be discussed later.
Suppose next that the estimate
with $2\pi s(T)/T$ near $\lambda$ and
is considered. Because
the remarks following Theorem 5.8.1 are again relevant and imply that the bias of (5.8.11) will be reduced in the case that $k_2$ is small or $f_{XX}(\alpha)$ is near constant. An alternate way to look at this is to note, from (5.8.5), that
where the kernel
appearing in (5.8.14) has the shape of a function taking the value $W_j$ for $\alpha$ near $2\pi j/T$, $j = 0, \pm 1, \ldots, \pm m$. In crude terms, this kernel extends over an interval $m$ times broader than that of $K^{(T)}(\alpha)$ and so, if $f_{XX}(\alpha)$ is not constant,
the bias of (5.8.11) may be expected to be greater than that of $f_{XX}^{(T)}(\lambda)$. It will generally be difficult to resolve peaks in $f_{XX}(\lambda)$ nearer than $m\sqrt{|k_2|}/T$ with the statistic (5.8.11). The smoothing with weights $W_j$ has caused a loss in resolution relative to the estimate $I_{XX}^{(T)}(\lambda)$. It must be remembered, however, that the smoothing was introduced to increase the stability of the estimate, and it is hoped that the smoothed estimate will be better in some overall sense.
We now turn to a more detailed investigation of the consistent estimate introduced in Section 5.6. This estimate is given by
with $I_{XX}^{(T)}(\lambda)$ given by (5.8.2) and $W^{(T)}(\alpha)$ given by (5.6.2).
Theorem 5.8.2 Let $X(t)$, $t = 0, \pm 1, \ldots$ be a real-valued series with $EX(t) = 0$ and autocovariance function satisfying
for some $P \geq 1$. Let the tapering function $h(u)$ be such that $k^{(T)}(u)$ of (5.8.6) satisfies (5.8.8) for $|u| \leq T$. Let $f_{XX}^{(T)}(\lambda)$ be given by (5.8.16) where $W(\alpha)$ satisfies Assumption 5.6.1. Then
The error terms are uniform in $\lambda$.
From expression (5.8.18) we see that advantages accrue from tapering in this case as well. Expression (5.8.18) indicates that the expected value is given, approximately, by a weighted average with kernel $W^{(T)}(\alpha)$ of the power spectrum of interest. The bandwidth of this kernel is
and so is of order $O(B_T)$. In Corollary 5.8.2 we set
Now if $f_{XX}(\alpha)$ is constant, equal to $f_{XX}$, then (5.8.24) equals $f_{XX}$ exactly. This suggests that the nearer $f_{XX}(\alpha)$ is to being constant, the smaller the bias. Suppose that the series $X(t)$, $t = 0, \pm 1, \ldots$ is passed through a filter with transfer function $A(\lambda)$. Denote the filtered series by $Y(t)$, $t = 0, \pm 1, \ldots$. From Example 2.8.1, the power spectrum of this series is given by
with inverse relation
Let $f_{YY}^{(T)}(\lambda)$ be an estimate of the power spectrum of the series $Y(t)$. Relation (5.8.26) suggests the consideration of the statistic
Corollary 5.8.2 Suppose in addition to the conditions of Theorem 5.8.2
then
Because $W(\beta) = W(-\beta)$, the terms in (5.8.22) with $p$ odd drop out. We see that the bias, up to order $B_T^{P-1}$, may be eliminated by selecting a $W(\beta)$ such that $W_p = 0$ for $p = 1, \ldots, P - 1$. Clearly such a $W(\beta)$ must take on negative values somewhere, leading to complications in some situations. If $P = 3$, then (5.8.22) becomes
Now from expression (5.8.19) the bandwidth of the kernel $W^{(T)}(\alpha)$ is essentially $B_T\sqrt{W_2}$, and once again the bias is seen to depend directly on both the bandwidth of the kernel and the smoothness of $f_{XX}(\alpha)$ for $\alpha$ near $\lambda$.
The discussion of Section 3.3 gives some help with the question of which kernel $W^{(T)}(\alpha)$ to employ in the smoothing of the periodogram. Luckily this question can be made academic in large part by a judicious filtering of the data prior to estimating the power spectrum. We have seen that $Ef_{XX}^{(T)}(\lambda)$ is essentially given by
and proceed as above. In the case that the series $X(t)$ is approximately an autoregressive scheme of order $m$ (see Section 2.9), this must be a near optimum procedure. It seems to work well in other cases also.
A procedure of similar character, but not requiring any filtering of the data, is the following: if the series $Y(t)$ were obtained from the series $X(t)$ by filtering with transfer function $A(\lambda)$, then following (5.3.20) we have
then form the filtered series
and so
Had $A(\lambda)$ been chosen so that $|A(\alpha)|^2 f_{XX}(\alpha)$ were constant, then (5.8.28) would equal $f_{XX}(\lambda)$ exactly. This result suggests that in a case where $f_{XX}(\alpha)$ is not near constant, we should attempt to find a filter, with transfer function $A(\lambda)$, such that the filtered series $Y(t)$ has near constant power spectrum; then we should estimate this near constant power spectrum from a stretch of the series $Y(t)$; and finally, we take $|A(\lambda)|^{-2} f_{YY}^{(T)}(\lambda)$ as an estimate of $f_{XX}(\lambda)$. This procedure is called spectral estimation by prefiltering or prewhitening; it was proposed in Press and Tukey (1956). Typically the filter has been determined by ad hoc methods; however, one general procedure has been proposed by Parzen and Tukey. It is to determine the filter by fitting an autoregressive scheme to the data. Specifically, for some $m$, determine $a^{(T)}(1), \ldots, a^{(T)}(m)$ to minimize
as an estimate of $f_{XX}(\lambda)$. Following the discussion above, the expected value of this estimate essentially equals
The discussion above now suggests the following estimate of $f_{XX}(\lambda)$,
A similar situation holds if the ordinate $I_{XX}^{(T)}(2\pi S/T)$ is dropped from the estimate. Since the values $d_X^{(T)}(2\pi s/T)$, $s = 0, \ldots, S-1, S+1, \ldots, T/2$ are unaffected by whether or not a multiple of the series $\exp\{\pm i 2\pi St/T\}$, $t = 0, \ldots, T-1$ is subtracted, dropping $I_{XX}^{(T)}(2\pi S/T)$ is equivalent to forming the periodogram of the values $X(t)$, $t = 0, \ldots, T-1$ with the best fitting sinusoid of frequency $2\pi S/T$ removed. The idea of avoiding certain frequencies in the smoothing of the periodogram appears in Priestley (1962b), Bartlett (1967), and Brillinger and Rosenblatt (1967b).
Akaike (1962a) discusses certain aspects of prefiltering. We sometimeshave a good understanding of the character of the filter function, A(\), usedin a prefiltering and so are content to examine the estimated spectrum of thefiltered series Y(t) and not bother to divide it by |/4(X)|2.
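The Parzen-Tukey prescription described above (fit an autoregressive scheme, prewhiten, estimate the near-flat residual spectrum, then recolor by $|A(\lambda)|^{-2}$) can be sketched as follows. This is an illustrative numpy implementation under our own conventions: a Yule-Walker fit on the sample autocovariances and a simple moving-average smoothing of the periodogram. The function name and the defaults `m=2`, `span=11` are arbitrary choices, not the book's.

```python
import numpy as np

def prewhitened_spectrum(x, m=2, span=11):
    """Sketch of spectral estimation by prewhitening.

    1. Fit an autoregressive scheme of order m via the Yule-Walker
       equations on the sample autocovariances.
    2. Form the filtered series Y(t) = X(t) - sum_k a(k) X(t-k),
       whose spectrum should be nearer constant.
    3. Smooth the periodogram of Y and divide by |A(lambda)|^2, with
       A(lambda) = 1 - sum_k a(k) exp(-i k lambda).
    """
    x = np.asarray(x, float) - np.mean(x)
    T = len(x)
    # sample autocovariances c(0), ..., c(m)
    c = np.array([x[:T - u] @ x[u:] / T for u in range(m + 1)])
    toeplitz = np.array([[c[abs(j - k)] for k in range(m)] for j in range(m)])
    a = np.linalg.solve(toeplitz, c[1:m + 1])
    # prewhitened (residual) series
    y = x[m:] - sum(a[k] * x[m - 1 - k: T - 1 - k] for k in range(m))
    n = len(y)
    freqs = 2 * np.pi * np.arange(n // 2 + 1) / n
    I_yy = np.abs(np.fft.rfft(y)) ** 2 / (2 * np.pi * n)
    # simple moving-average smoothing of the residual periodogram
    f_yy = np.convolve(I_yy, np.ones(span) / span, mode="same")
    A = 1 - sum(a[k] * np.exp(-1j * (k + 1) * freqs) for k in range(m))
    return freqs, f_yy / np.abs(A) ** 2
```

For a series that really is close to an autoregressive scheme of order $m$, the residual periodogram is near flat and the final division restores the overall shape of the spectrum with little bias from the smoothing kernel.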
5.9 ALTERNATE ESTIMATES
Up until this point, the spectral estimates discussed have had the character of a weighted average of periodogram values at the particular frequencies $2\pi s/T$, $s = 0, \ldots, T-1$. This estimate is useful because these particular periodogram values may be rapidly calculated using the Fast Fourier Transform Algorithm of Section 3.5 if $T$ is highly composite, and in addition their joint large sample statistical behavior is elementary; see Theorems 4.4.1 and 5.2.6. In this section we turn to the consideration of certain other estimates.
where
where the function $A(\alpha)$ has been chosen in the hope that $|A(\alpha)|^2 f_{XX}(\alpha)$ is near constant. This estimate is based directly on the discrete Fourier transform of the values $X(t)$, $t = 0, \ldots, T-1$ and is seen to involve the smoothing of weighted periodogram ordinates. In an extreme situation where $f_{XX}(\alpha)$ appears to have a high peak near $2\pi S/T \doteq \lambda$, we may wish to take $A(2\pi S/T) = 0$. The sum in (5.8.33) now excludes the periodogram ordinate $I_{XX}^{(T)}(2\pi S/T)$ altogether. We remark that the ordinate $I_{XX}^{(T)}(0)$ is already missing from the estimate (5.8.16). Following the discussion of Theorem 5.2.2, this is equivalent to forming the periodogram of the mean adjusted values
The estimate considered in Section 5.6 has the specific form
where
If the discrete average in (5.9.1) is replaced by a continuous one, this estimate becomes
Now
If this is substituted into (5.9.2), then that estimate takes the form
where
The estimate (5.9.5) is of the general form investigated by Grenander (1951a), Grenander and Rosenblatt (1957), and Parzen (1957); it contains as particular cases the early estimates of Bartlett (1948b), Hamming and Tukey (1949), and Bartlett (1950). Estimate (5.9.5) was generally employed until the Fast Fourier Transform Algorithm came into common use.
In fact the estimates (5.9.1) and (5.9.2) are very much of the same character as well as nearly equal. For example, Exercise 5.13.15 shows that (5.9.5) may be written as the following discrete average of periodogram values
for any integer $S \geq 2T - 1$; see also Parzen (1957). The expression (5.9.7) requires twice as many periodogram values as (5.9.1). In the case that $S$ is
$-\pi < \alpha, \lambda \leq \pi$, $\Delta$ small, and
for some finite $L$ and $-\infty < \lambda < \infty$.
It is seen that, in the case that $B_T$ does not tend to 0 too quickly, the asymptotic behavior of the two estimates is essentially identical.
The discussion of the interpretation of power spectra given in Section 5.1 suggests a spectral estimate. Specifically, let $A(\alpha)$ denote the transfer function of a band-pass filter with the properties
highly composite, it may be rapidly computed by the Fast Fourier Transform of the series
or by computing $c_{XX}^{(T)}(u)$, $u = 0, \pm 1, \ldots$ using a Fast Fourier Transform as described in Exercise 3.10.7, and then evaluating expression (5.9.5), again using a Fast Fourier Transform. In the reverse direction, Exercise 5.13.15 shows that the estimate (5.9.1) may be written as the following continuous average of periodogram values
where
A uniform bound for the difference between the two estimates is provided by
Theorem 5.9.1 Let $W(\alpha)$, $-\infty < \alpha < \infty$, satisfy Assumption 5.6.1 and have a bounded derivative. Then
The estimate (5.9.15) therefore has similar form to estimate (5.4.1). In Theorem 5.3.1 we saw that periodogram ordinates of the same frequency, $\lambda \not\equiv 0 \pmod{\pi}$, but based on different stretches of data were asymptotically independent $f_{XX}(\lambda)\chi^2_2/2$ variates. This result suggests that
and so is approximately equal to
and approximately equals 0 otherwise. Using Parseval's formula
If $d_{X(\cdot,\lambda)}^{(T)}(2\pi s/T)$ denotes the discrete Fourier transform of the filtered values $X(t,\lambda)$, $t = 0, \ldots, T-1$ and $2\pi s(T)/T \doteq \lambda$, then
in the case $\lambda \not\equiv 0 \pmod{2\pi}$. In fact, it appears that this last is the first spectral estimate used in practice; see Pupin (1894), Wegel and Moore (1924), Blanc-Lapierre and Fortet (1953). It is the one generally employed in real-time or analog situations. Turning to a discussion of its character, we begin by supposing that
This suggests the consideration of the estimate
The construction of filters with such properties was discussed in Sections 2.7, 3.3, and 3.6. If $X(t,\lambda)$, $t = 0, \pm 1, \ldots$ denotes the output series of such a filter, then
we construct a spectral estimate by averaging the periodograms of different stretches of data. In fact we have
Theorem 5.9.2 Let $X(t)$, $t = 0, \pm 1, \ldots$ be a real-valued series satisfying Assumption 2.6.1. Let
where $T = LV$. Then $f_{XX}^{(T)}(\lambda)$ is asymptotically $f_{XX}(\lambda)\chi^2_{2L}/(2L)$ if $\lambda \not\equiv 0 \pmod{\pi}$ and asymptotically $f_{XX}(\lambda)\chi^2_L/L$ if $\lambda = \pm\pi, \pm 3\pi, \ldots$ as $V \to \infty$.
Bartlett (1948b, 1950) proposed the estimate (5.9.21); it is also discussed in Welch (1967) and Cooley, Lewis, and Welch (1970). This estimate has the advantage of requiring fewer calculations than other estimates, especially when $V$ is highly composite. In addition it allows us to examine the assumption of stationarity. Welch (1967) proposes the use of periodograms based on overlapping stretches of data. Akcasu (1961) and Welch (1961) considered spectral estimates based on the Fourier transform of the data. The result of this theorem may be used to construct approximate confidence limits for $f_{XX}(\lambda)$, if we think of the $I_{XX}^{(V)}(\lambda, l)$, $l = 0, \ldots, L-1$ as $L$ independent estimates of $f_{XX}(\lambda)$.
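The segment-averaging estimate (5.9.21) is simple to sketch in numpy: split the record into $L$ disjoint stretches of length $V = T/L$ and average their periodograms. The function name and interface below are ours; the frequencies returned are the usual $2\pi s/V$.

```python
import numpy as np

def bartlett_spectrum(x, L):
    """Average the periodograms of L disjoint stretches of length
    V = T // L.  Under the theorem quoted in the text the result is
    approximately f_XX(lambda) chi^2_{2L} / (2L) at frequencies that
    are not multiples of pi.
    """
    x = np.asarray(x, float)
    V = len(x) // L
    segments = x[:L * V].reshape(L, V)
    # periodogram of each stretch at the frequencies 2 pi s / V
    I = np.abs(np.fft.rfft(segments, axis=1)) ** 2 / (2 * np.pi * V)
    freqs = 2 * np.pi * np.arange(V // 2 + 1) / V
    return freqs, I.mean(axis=0)
```

For white noise of unit variance the power spectrum is the constant $1/(2\pi)$, which gives a quick check of the normalization; increasing $L$ trades frequency resolution for stability, exactly the bandwidth-stability trade-off discussed throughout the chapter.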
In the previous section it was suggested that an autoregressive scheme be fitted to the data in the course of estimating a power spectrum. Parzen (1964) suggested that we estimate the spectrum of the residual series $Y(t)$ for a succession of values $m$ and, when that estimate becomes nearly flat, we take
as the estimate of $f_{XX}(\lambda)$, where $A^{(T)}(\lambda)$ is the transfer function of the filter carrying the series over to the residual series. This procedure is clearly related to prefiltering. Certain of its statistical properties are considered in Kromer (1969), Akaike (1969a), and Section 8.10.
In the course of the work of this chapter we have seen the important manner in which the band-width parameter $m$, or $B_T$, affects the statistical behavior of the estimate. In fact, if we carried out some prewhitening of the
This is essentially the estimate (5.9.21) if $J = V$. In the case $H(x) = x^{-1}$, the estimate takes the form
series, the shape of the weight function appearing in the estimate appears unimportant. What is important is its band-width. We have expected the student to determine $m$, or $B_T$, from the desired statistical stability. If the desired stability was not clear, a succession of band-widths were to be employed. Leppink (1970) proposes that we estimate $B_T$ from the data and indicates an estimate; see also Pickands (1970). Daniels (1962) and Akaike (1968b) suggest procedures for modifying the estimate.
In the case that $X(t)$, $t = 0, \pm 1, \ldots$ is a 0 mean Gaussian series, an estimate of $f_{XX}(\lambda)$, based solely on the values
(where $\operatorname{sgn} X = 1$ if $X > 0$, $\operatorname{sgn} X = -1$ if $X < 0$), was proposed by Goldstein; see Rodemich (1966), and discussed in Hinich (1967), McNeil (1967), and Brillinger (1968). Rodemich (1966) also considered the problem of constructing estimates of $f_{XX}(\lambda)$ from the values of $X(t)$ grouped in a general way.
Estimates have been constructed by Jones (1962b) and Parzen (1963a) for the case in which certain values $X(t)$, $t = 0, \ldots, T-1$ are missing in a systematic manner. Brillinger (1972) considers estimation for the case in which the values $X(\tau_1), \ldots, X(\tau_n)$ are available, $\tau_1, \ldots, \tau_n$ being the times of events of some point process. Akaike (1960) examines the effect of observing $X(t)$ for $t$ near the values $0, 1, \ldots, T-1$ rather than exactly at these values; this has been called jittered sampling.
Pisarenko (1972) has proposed a flexible class of nonlinear estimates. Let the data be split into $L$ segments. Let $c_{XX}^{(T)}(u,l)$, $u = 0, \pm 1, \ldots$; $l = 0, \ldots, L-1$ denote the autocovariance estimate of segment $l$. Let $\mu_j^{(T)}$, $U_j^{(T)}$, $j = 1, \ldots, J$ denote the latent roots and vectors of $[L^{-1} \sum_l c_{XX}^{(T)}(j-k,\,l);\ j, k = 1, \ldots, J]$. Pisarenko suggests the following estimate of $f_{XX}(\lambda)$,
where $H(x)$, $0 < x < \infty$, is a strictly monotonic function with inverse $h(\cdot)$. He was motivated by the definition (3.10.27) of a function of a matrix. In the case $H(x) = x$, the estimate (5.9.24) may be written
with $[C_{jk}^{(T)}]$ the inverse of the matrix whose latent values were computed. The estimate (5.9.26) was suggested by Capon (1969) as having high resolution. Pisarenko (1972) argues that if $J, L \to \infty$ as $T \to \infty$, and if the series is normal, then the estimate (5.9.24) will be asymptotically normal with variance
Capon and Goodman (1970) suggest approximating the distribution of (5.9.26) by $f_{XX}(\lambda)\chi^2_{2L-2J+2}/(2L)$ if $\lambda \not\equiv 0 \pmod{\pi}$ and by $f_{XX}(\lambda)\chi^2_{L-J+1}/L$ if $\lambda = \pm\pi, \pm 3\pi, \ldots$.
Sometimes we are interested in fitting a parametric model for the power spectrum. A useful general means of doing this was proposed in Whittle (1951, 1952a, 1961). Some particular models are considered in Box and Jenkins (1970).
5.10 ESTIMATING THE SPECTRAL MEASURE AND AUTOCOVARIANCE FUNCTION
Let $X(t)$, $t = 0, \pm 1, \ldots$ denote a real-valued series with autocovariance function $c_{XX}(u)$, $u = 0, \pm 1, \ldots$ and spectral density $f_{XX}(\lambda)$, $-\infty < \lambda < \infty$. There are a variety of situations in which we would like to estimate the spectral measure
introduced in Section 2.5. There are also situations in which we would like to estimate the autocovariance function
itself, and situations in which we would like to estimate a broad-band spectral average of the form
$W(\alpha)$ being a weight function of period $2\pi$ concentrated near $\alpha \equiv 0 \pmod{2\pi}$. The parameters (5.10.1), (5.10.2), and (5.10.3) are all seen to be particular
cases of the general form
For this reason we turn to a brief investigation of estimates of the parameter (5.10.4) for given $A(\alpha)$. This problem was considered by Parzen (1957).
As a first estimate we consider the statistic
where IxxmM, — °° < X < <*>, js the periodogram of a stretch of valuesX(t), t = 0,. . . , T — 1. Taking a discrete average at the points 2*s/T allowsa possible use of the Fast Fourier Transform Algorithm in the course of thecalculations.
Setting
otherwise
amounts to proposing
as an estimate of $F_{XX}(\lambda)$. Taking
we see from Exercise 3.10.8 that we are proposing the circular autocovariance function
as an estimate of $c_{XX}(u)$. (Here $\tilde{X}(t)$, $t = 0, \pm 1, \ldots$ is the period $T$ extension of $X(t)$, $t = 0, \ldots, T-1$.) Taking
leads to our considering a spectral estimate of the form
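The circular autocovariance function just mentioned can be computed in $O(T \log T)$ operations, since it is the inverse discrete Fourier transform of the periodogram ordinates (compare Exercise 3.10.7). A small numpy sketch, under our own naming:

```python
import numpy as np

def circular_autocovariance(x):
    """Circular autocovariance of a stretch of T values:

        c~(u) = T^{-1} sum_t X~(t + u) X~(t),

    with X~ the period-T extension of the data.  By the correlation
    theorem this is the inverse DFT of |d(2 pi s / T)|^2 / T, where
    d is the discrete Fourier transform of the data.
    """
    x = np.asarray(x, float)
    T = len(x)
    d = np.fft.fft(x)
    return np.fft.ifft(np.abs(d) ** 2).real / T
```

Note that, unlike the ordinary sample autocovariance, this statistic wraps the series around: the lag-$u$ products include terms pairing the end of the record with its beginning.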
The statistic of Exercise 5.13.31 is sometimes used to test the hypothesis that a stationary Gaussian series has power spectrum $f_{XX}(\lambda)$. We have
Theorem 5.10.1 Let $X(t)$, $t = 0, \pm 1, \ldots$ be a real-valued series satisfying Assumption 2.6.2(1). Let $A_j(\alpha)$, $0 \leq \alpha \leq 2\pi$, be bounded and of bounded variation for $j = 1, \ldots, J$. Then
Also
Finally, $J^{(T)}(A_j)$, $j = 1, \ldots, J$ are asymptotically jointly normal with the above first- and second-order moment structure.
From expression (5.10.12) we see that $J^{(T)}(A_j)$ is an asymptotically unbiased estimate of $J(A_j)$. From the fact that its variance tends to 0 as $T \to \infty$, we see that it is also a consistent estimate.
In the case of estimating the spectral measure $F_{XX}(\lambda)$, taking $A(\alpha)$ to be (5.10.6), expression (5.10.13) gives
In the case of estimating the autocovariance function $c_{XX}(u)$, where $A(\alpha)$ is given by (5.10.8), expression (5.10.13) gives
In this case of a constant weight function, the spectral estimates $f_{XX}^{(T)}(\lambda)$ and $f_{XX}^{(T)}(\nu)$ are not asymptotically independent, as was the case for the weight functions considered earlier.
If estimates of $f_{XX}(\alpha)$ and $f_{XXXX}(\alpha,\beta,\gamma)$ are computed, then we may substitute them into expression (5.10.13) to obtain an estimate of $\operatorname{var} J^{(T)}(A_j)$. This estimate may be used, together with the asymptotic normality, to construct approximate confidence limits for the parameter.
In some situations the student may prefer to use the following estimate involving a continuous weighting,
where $c_{XX}^{(T)}(u)$ is the sample autocovariance function of (5.9.4) and
We see that the two estimates will be close for large $T$ and that their asymptotic distribution will be the same.
In the case of the broad-band spectral estimate of (5.10.10), expression (5.10.13) gives
For example, if $A(\alpha) = \exp\{iu\alpha\}$, this gives the sample autocovariance function $c_{XX}^{(T)}(u)$ itself, in contrast to the circular form obtained before.
The estimate (5.10.17) does not differ too much from the estimate (5.10.5). We have
Theorem 5.10.2 Let $A(\alpha)$, $0 \leq \alpha \leq 2\pi$, be bounded and of bounded variation. Let $X(t)$, $t = 0, \pm 1, \ldots$ satisfy Assumption 2.6.2(1). Then
The spectral measure estimate, $F_{XX}^{(T)}(\lambda)$, given by (5.10.7) is sometimes useful for detecting periodic components in a series and for examining the plausibility of a proposed model, especially that of pure noise. In Figure 5.10.1 we give $F_{XX}^{(T)}(\lambda)/F_{XX}^{(T)}(\pi)$, $0 < \lambda \leq \pi$, for the series of mean monthly sunspot numbers. The periodogram of this series was given in Section 5.2. The figure shows an exceedingly rapid increase at the lowest frequencies, followed by a steady increase to the value 1. We remark that if $f_{XX}(\lambda)$ were constant in a frequency band, then the increase of $F_{XX}(\lambda)$ would be linear in that frequency band. This does not appear to occur in Figure 5.10.1 except, possibly, at frequencies above $\pi/2$.
Figure 5.10.1 Plot of $F_{XX}^{(T)}(\lambda)/F_{XX}^{(T)}(\pi)$ for monthly mean sunspot numbers for the years 1750-1965.
The sample autocovariance function, $c_{XX}^{(T)}(u)$, $u = 0, \pm 1, \ldots$, of a series stretch also is often useful for examining the structure of a series. In Figures 5.10.2 and 5.10.3 we present portions of $c_{XX}^{(T)}(u)$ for the series of mean annual and mean monthly sunspot numbers, respectively. The most apparent character of these figures is the substantial correlation of values of the series that are multiples of approximately 10 years apart. The kink near lag 0 in Figure 5.10.3 suggests that measurement error is present in this data.
Asymptotic properties of estimates of the autocovariance function were considered by Slutsky (1934) in the case of a 0 mean Gaussian series. Bartlett
Figure 5.10.2 The autocovariance estimate, $c_{XX}^{(T)}(u)$, for annual mean sunspot numbers for the years 1750-1965.
Figure 5.10.3 The autocovariance estimate, $c_{XX}^{(T)}(u)$, for monthly mean sunspot numbers for the years 1750-1965.
(1946) developed the asymptotic second-order moment structure in the case of a 0 mean linear process. Asymptotic normality was considered in Walker (1954), Lomnicki and Zaremba (1957b, 1959), Parzen (1957), Rosenblatt (1962), and Anderson and Walker (1964). Akaike (1962a) remarked that it might sometimes be reasonable to consider $c_{XX}^{(T)}(u)$, $u = 0, \pm 1, \ldots$ as a second-order stationary time series with power spectrum $2\pi T^{-1} f_{XX}(\lambda)^2$. This corresponds to retaining only the second term on the right in (5.10.15). Brillinger (1969c) indicated two forms of convergence with probability 1 and discussed the weak convergence of the estimate to a Gaussian process.
5.11 DEPARTURES FROM ASSUMPTIONS
In this section we discuss the effects of certain elementary departures from the assumptions adopted so far in this chapter. Among the important assumptions adopted are
and
where $c_{XX}(u) = \operatorname{cov}\{X(t+u), X(t)\}$ for $t, u = 0, \pm 1, \ldots$. We first discuss a situation in which expression (5.11.2) is not satisfied.
Suppose that the series under consideration is
with $R_j$, $\omega_j$ constants, $\phi_j$ uniform on $(-\pi, \pi)$, $j = 1, \ldots, J$, and the series $\varepsilon(t)$ satisfying Assumption 2.6.1. The autocovariance function of the series (5.11.3) is quickly seen to be
and so condition (5.11.2) is not satisfied. We note that the spectral measure, $F_{XX}(\lambda)$, whose existence was demonstrated in Theorem 2.5.2, is given by
in this case where
This is essentially the procedure suggested by Schuster (1898) of using the periodogram as a tool for discovering hidden periodicities in a series. The result (5.11.10) suggests that we may estimate $f_{\varepsilon\varepsilon}(\lambda)$ by smoothing the periodogram $I_{XX}^{(T)}(\lambda)$, avoiding the ordinates at frequencies in the immediate neighborhoods of the $\omega_j$. If $\nu$ periodogram ordinates $I_{XX}^{(T)}(2\pi s/T)$ ($s$ an integer) are involved in a simple averaging to form an estimate, then it follows from Theorem 4.4.1 and expression (5.11.10) that this estimate will be asymptotically $f_{XX}(\lambda)\chi^2_{2\nu}/(2\nu)$ in the case $\lambda \not\equiv 0 \pmod{\pi}$, with similar results in the case $\lambda \equiv 0 \pmod{\pi}$.
Bartlett (1967) and Brillinger and Rosenblatt (1967b) discuss the above simple modification of periodogram smoothing that avoids peaks. It is clearly related to the technique of prefiltering discussed in Section 5.8.
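The modified smoothing just described, a simple average of periodogram ordinates that skips the bins nearest the suspected $\omega_j$, can be sketched as follows. The function name, the bin-index interface, and the default $m$ are our own choices for the illustration.

```python
import numpy as np

def smoothed_periodogram_avoiding(x, skip_bins, m=5):
    """Smooth the periodogram by a simple average of up to 2m+1
    ordinates, omitting ordinates whose bin index appears in
    skip_bins (frequencies near suspected periodic components).
    With nu ordinates actually averaged, the resulting estimate is
    approximately f(lambda) chi^2_{2 nu} / (2 nu).
    """
    x = np.asarray(x, float) - np.mean(x)
    T = len(x)
    I = np.abs(np.fft.rfft(x)) ** 2 / (2 * np.pi * T)
    S = len(I)
    keep = np.ones(S, bool)
    keep[list(skip_bins)] = False
    f = np.empty(S)
    for s in range(S):
        window = np.arange(max(0, s - m), min(S, s + m + 1))
        window = window[keep[window]]          # drop the avoided bins
        f[s] = I[window].mean() if window.size else np.nan
    return f
```

With the peak bins excluded, the estimate near a periodic component reflects the continuous part $f_{\varepsilon\varepsilon}(\lambda)$ of the mixed spectrum rather than the infinite peak.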
and $f_{\varepsilon\varepsilon}(\lambda)$ denotes the spectral density of the series $\varepsilon(t)$, $t = 0, \pm 1, \ldots$. The generalized derivative of expression (5.11.5) is
$0 \leq \lambda \leq \pi$, $\delta(\lambda)$ being the Dirac delta function. The function (5.11.7) has infinite peaks at the frequencies $\omega_j$, $j = 1, \ldots, J$, superimposed on a bounded continuous function, $f_{\varepsilon\varepsilon}(\lambda)$. A series of the character of expression (5.11.3) is said to have a mixed spectrum.
Turning to the analysis of such a series, we note from expression (5.11.3) that
where $\Delta^{(T)}(\lambda)$ is given in expression (4.3.14). Now the function $\Delta^{(T)}(\lambda)$ has large amplitude only for $\lambda \equiv 0 \pmod{2\pi}$. This means that
while
for $|\lambda \pm \omega_j| > \delta/T$, $-\pi < \lambda \leq \pi$. The result (5.11.9) suggests that we may estimate the $\omega_j$ by examining the periodogram $I_{XX}^{(T)}(\lambda)$ for substantial peaks. At such a peak we might estimate $R_j$ by
Other references include Hannan (1961b), Priestley (1962b, 1964), and Nicholls (1967). Albert (1964), Whittle (1952b), Hext (1966), and Walker (1971) consider the problem of constructing more precise estimates of the $\omega_j$ of (5.11.3).
We next turn to a situation in which the condition of constant mean
(5.11.1) is violated. Suppose in the manner of the trend model of Section 2.12
for $t = 0, \pm 1, \ldots$ with $\phi_1(t), \ldots, \phi_J(t)$ known fixed functions, $\theta_1, \ldots, \theta_J$ being unknown constants, and $\varepsilon(t)$, $t = 0, \pm 1, \ldots$ being an unobservable 0 mean series satisfying Assumption 2.6.1. This sort of model was considered in Grenander (1954). One means of handling it is to determine the least squares estimates $\theta_1^{(T)}, \ldots, \theta_J^{(T)}$ of $\theta_1, \ldots, \theta_J$ by minimizing
and then to estimate $f_{\varepsilon\varepsilon}(\lambda)$ from the residual series
$t = 0, \ldots, T-1$. We proceed to an investigation of the asymptotic properties of such a procedure. We set down an assumption concerning the functions $\phi_1(t), \ldots, \phi_J(t)$.
Assumption 5.11.1 Given the real-valued functions $\phi_j(t)$, $t = 0, \pm 1, \ldots$, $j = 1, \ldots, J$, there exists a sequence $N_T$, $T = 1, 2, \ldots$ with the properties $N_T \to \infty$, $N_{T+1}/N_T \to 1$ as $T \to \infty$ such that
As examples of functions satisfying this assumption we mention
and finite collections of estimates $f_{\varepsilon\varepsilon}^{(T)}(\lambda_1), \ldots, f_{\varepsilon\varepsilon}^{(T)}(\lambda_K)$ are asymptotically jointly normal.
Its covariance function satisfies
and it is asymptotically normal. If $B_T T \to \infty$ as $T \to \infty$, then the variate $f_{\varepsilon\varepsilon}^{(T)}(\lambda)$ is asymptotically independent of $\theta^{(T)}$ with mean
where $W^{(T)}(\alpha) = \sum_j B_T^{-1} W(B_T^{-1}[\alpha + 2\pi j])$ and $W(\alpha)$ satisfies Assumption 5.6.1. Then the variate $\theta^{(T)} = [\theta_1^{(T)} \cdots \theta_J^{(T)}]$ has mean $\theta = [\theta_1 \cdots \theta_J]$. Its covariance matrix satisfies
for constant $R_j$, $\omega_j$, $\phi_j$, $j = 1, \ldots, J$. We see directly that
taking $N_T = T$. Other examples are given in Grenander (1954). We suppose that $m_{jk}(u)$ is taken as the entry in row $j$ and column $k$ of the
$J \times J$ matrix $\mathbf{m}(u)$, $j, k = 1, \ldots, J$. It follows from Exercise 2.13.31 that there exists a $J \times J$ matrix-valued function $\mathbf{G}(\lambda)$, $-\pi < \lambda \leq \pi$, whose entries are of bounded variation, such that
for $u = 0, \pm 1, \ldots$. We may now state
Theorem 5.11.1 Let $\varepsilon(t)$, $t = 0, \pm 1, \ldots$ be a real-valued series satisfying Assumption 2.6.2(1), having 0 mean and power spectrum $f_{\varepsilon\varepsilon}(\lambda)$, $-\infty < \lambda < \infty$. Let $\phi_j(t)$, $j = 1, \ldots, J$, $t = 0, \pm 1, \ldots$ satisfy Assumption 5.11.1 with $\mathbf{m}(0)$ nonsingular. Let $X(t)$ be given by (5.11.12) for some constants $\theta_1, \ldots, \theta_J$. Let $\theta_1^{(T)}, \ldots, \theta_J^{(T)}$ be the least squares estimates of $\theta_1, \ldots, \theta_J$. Let $\varepsilon(t)$ be given by (5.11.14), and
Under the limiting procedure adopted, the asymptotic behavior of $f_{\varepsilon\varepsilon}^{(T)}(\lambda)$ is seen to be the same as that of an estimate based directly on the series $\varepsilon(t)$, $t = 0, \pm 1, \ldots$. We have already seen this in the case of a series of unknown mean, corresponding to $J = 1$, $\phi_1(t) = 1$, $\theta_1 = c_X$, $\theta_1^{(T)} = c_X^{(T)} = T^{-1} \sum_{t=0}^{T-1} X(t)$. The theorem has the following:
Corollary 5.11.1 Under the conditions of the theorem, $\theta^{(T)}$ and $f_{\varepsilon\varepsilon}^{(T)}(\lambda)$ are consistent estimates of $\theta$ and $f_{\varepsilon\varepsilon}(\lambda)$, respectively.
Other results of the character of this theorem will be presented in Chapter 6. Papers related to this problem include Grenander (1954), Rosenblatt (1956a), and Hannan (1968). Koopmans (1966) is also of interest. A common empirical procedure is to base a spectral estimate on the series of first differences, $\varepsilon(t) = X(t) - X(t-1)$, $t = 1, \ldots, T-1$. This has the effect of removing a linear trend directly. (See Exercise 3.10.2.)
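The first-difference device is one line of code; the point, illustrated below, is that a linear trend $a + bt$ differences to the constant $b$ and so no longer dominates the low-frequency end of a spectral estimate. (The function name is ours.)

```python
import numpy as np

def detrend_by_differencing(x):
    """First differences e(t) = X(t) - X(t-1), the common empirical
    device for removing a linear trend before spectral estimation.
    """
    x = np.asarray(x, float)
    return x[1:] - x[:-1]

# A linear trend a + b*t differences to the constant b:
t = np.arange(100)
e = detrend_by_differencing(2.0 + 0.5 * t)
```

Differencing also alters the spectrum of the stationary part (by the factor $|1 - e^{-i\lambda}|^2$), so for a final spectral estimate one may wish to divide this factor back out, as in the prefiltering discussion of Section 5.8.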
A departure of much more serious consequence than those considered so far in this section is one in which $\operatorname{cov}\{X(t+u), X(t)\}$ depends on both $t$ and $u$. Strictly speaking, a power spectrum is no longer well defined; see Loynes (1968). However, in the case that $\operatorname{cov}\{X(t+u), X(t)\}$ depends only weakly on $t$, we can sometimes garner important information from a stretch of series by spectral type calculations. We can proceed by forming spectral estimates of the types considered in this chapter, but based on segments of the data, rather than all the data, for which the assumption of stationarity does not appear to be too seriously violated. The particular spectral estimate which seems especially well suited to such calculations is the one constructed by averaging the squared output of a bank of band-pass filters (see Section 5.9). Papers discussing this approach include Priestley (1965) and Brillinger and Hatanaka (1969).
A departure from assumptions of an entirely different character is the following: suppose the series $X(t)$ is defined for all real numbers $t$, $-\infty < t < \infty$. (Until this point we have considered $X(t)$ defined for $t = 0, \pm 1, \ldots$.) Suppose
is defined for $-\infty < t, u < \infty$ and satisfies
then both
$-\infty < \lambda < \infty$, are defined. The function $f_{XX}(\lambda)$, $-\infty < \lambda < \infty$, is called the power spectrum of the discrete series $X(t)$, $t = 0, \pm 1, \ldots$, whereas $g_{XX}(\lambda)$, $-\infty < \lambda < \infty$, is called the power spectrum of the continuous series $X(t)$, $-\infty < t < \infty$. The spectrum $g_{XX}(\lambda)$ may be seen to have very much the same character, behavior, and interpretation as $f_{XX}(\lambda)$. The two spectra $f_{XX}(\lambda)$ and $g_{XX}(\lambda)$ are intimately related because from (5.11.26) we have
We see from expression (5.11.29) that a frequency $\lambda$ in the discrete series $X(t)$, $t = 0, \pm 1, \ldots$ relates to the frequencies $\lambda, \lambda \pm 2\pi, \ldots$ of the continuous series $X(t)$, $-\infty < t < \infty$. As $f_{XX}(\lambda) = f_{XX}(-\lambda)$, it also relates to the frequencies $-\lambda, -\lambda \pm 2\pi, \ldots$. For this reason the frequencies
and
5.11 DEPARTURES FROM ASSUMPTIONS 177
giving
for $u = 0, \pm 1, \ldots$. From (5.11.25)
have been called aliases by Tukey. It will be impossible to distinguish their individual character by means of $f_{XX}(\lambda)$ alone. As an example of the meaning of this, consider the series
$-\infty < t < \infty$, where $\phi$ is uniform on $(-\pi, \pi)$. Considering this continuous series we have
$-\infty < u < \infty$, and from the definition (5.11.26)
This function has infinite peaks at the frequencies $\lambda = \pm\omega + 2\pi j$, $j = 0, \pm 1, \ldots$ and so $\omega$ cannot be determined directly, but only said to be one of these frequencies. An implication, for practice, of this discussion is that if a power spectral estimate $f_{XX}^{(T)}(\lambda)$ is computed for $0 \leq \lambda \leq \pi$, and is found to have a peak at the frequency $\omega$, we cannot be sure which of the frequencies $\pm\omega + 2\pi j$, $j = 0, \pm 1, \ldots$ might be leading to the peak.
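The aliasing phenomenon is easy to demonstrate numerically: a cosine of frequency $\omega$ outside $(0, \pi)$, sampled at the integers, produces a periodogram peak at its alias in $[0, \pi]$, here $2\pi - \omega$. The value $\omega = 5.0$ is an arbitrary illustrative choice.

```python
import numpy as np

# A continuous cosine at frequency omega > pi, observed only at the
# integers t = 0, 1, ..., T-1, is indistinguishable from a cosine at
# its alias frequency in [0, pi].
omega = 5.0                        # outside (0, pi); pi < 5 < 2*pi
t = np.arange(1024)
x = np.cos(omega * t)

I = np.abs(np.fft.rfft(x)) ** 2    # periodogram (up to normalization)
freqs = 2 * np.pi * np.arange(len(I)) / len(x)
peak = freqs[np.argmax(I)]         # location of the periodogram peak
alias = 2 * np.pi - omega          # the alias of omega in [0, pi]
```

The peak lands near `alias`, not near $\omega$ itself; from the sampled record alone one can only say that the true frequency is one of $\pm\,\text{alias} + 2\pi j$, $j = 0, \pm 1, \ldots$.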
An example of such an occurrence with which the author was once concerned is the following: data were to be taken periodically on the number of electrons entering a conical horn on a spinning satellite of the Explorer series. The electron field being measured was highly directional in character, and so the data could be expected to contain a substantial periodic component whose period was that of the satellite's rotation. It was planned that the satellite would rotate at such a rate, and the data would be taken with such a time interval, that the frequency of rotation of the satellite would fall in the interval 0 < λ < π. Unfortunately the satellite ended up spinning substantially more rapidly than planned, and so the frequency of rotation fell outside the interval 0 < λ < π. The spectrum of the data was estimated and found to contain a substantial peak. It then had to be decided which of the aliased frequencies was the relevant one. This was possible on this occasion because optical information was available suggesting a crude value for the frequency.
Sometimes a prefiltering of the data can be carried out to reduce the difficulties of interpretation caused by aliasing. Suppose that the continuous time series X(t), −∞ < t < ∞, is band-pass filtered to a band of the sort [−πj − π, −πj], [πj, πj + π] prior to recording values at the times t = 0, ±1, .... In this case we see from (5.11.29) that each frequency in [0, π] of the sampled series can correspond to only one frequency of the continuous series, namely the one lying in the recorded band.
This function has infinite peaks at λ = ±ω. Now considering the function (5.11.32) for t = 0, ±1, ..., we have from (2.10.8) or (5.11.29)

and the interpretation is consequently simplified.

We conclude this section by indicating some of the effects of sampling the
series X(t), −∞ < t < ∞, at a general time spacing h > 0. The values recorded for analysis in the time interval [0, T) are now X(uh), u = 0, ..., U − 1, where U = T/h. If the series is stationary with cov{X(uh), X(0)} = c_XX(uh), u = 0, ±1, ..., and

∑_{u=−∞}^{∞} |c_XX(uh)| < ∞,

we define the power spectrum f_XX(λ), −∞ < λ < ∞, by

f_XX(λ) = (h/2π) ∑_{u=−∞}^{∞} c_XX(uh) exp{−iuhλ}

and have the inverse relation

c_XX(uh) = ∫_{−π/h}^{π/h} f_XX(λ) exp{iuhλ} dλ.

The power spectrum f_XX(λ) is seen to have period 2π/h. As f_XX(−λ) = f_XX(λ), its fundamental domain may be taken to be the interval [0, π/h]. The expression (5.11.29) is replaced by

f_XX(λ) = ∑_{j=−∞}^{∞} g_XX(λ + 2πj/h).

The upper limit of the interval [0, π/h], namely π/h, is called the Nyquist frequency or folding frequency. If the series X(t), −∞ < t < ∞, possesses no components with frequency greater than the Nyquist frequency, then

f_XX(λ) = g_XX(λ)

for |λ| ≤ π/h and no aliasing complications arise.

When we come to estimate f_XX(λ) from the stretch X(uh), u = 0, ..., U − 1, T = Uh, we define

I_XX^(U)(λ) = (h/2πU) |∑_{u=0}^{U−1} X(uh) exp{−iuhλ}|²

and proceed by smoothing I_XX^(U)(λ), for example.

The problem of aliasing was alluded to in the discussion of Beveridge (1922). Discussions of it were given in Press and Tukey (1956) and Blackman and Tukey (1958).

5.12 THE USES OF POWER SPECTRUM ANALYSIS

In Chapter 1 of this work we documented some of the various fields of applied research wherein the frequency analysis of time series had proven useful. In this section we indicate some examples of particular uses of the power spectrum.

A Descriptive Statistic Given a stretch of data X(t), t = 0, ..., T − 1, the function f_XX^(T)(λ) is often computed simply as a descriptive statistic. It condenses the data, but not too harshly. For stationary series its approximate sampling properties are elementary. Its form is often more elementary than that of the original record. It has been computed in the hope that an underlying mechanism generating the data will be suggested to the experimenter. Wiener (1957, 1958) discusses electroencephalograms in this manner. The spectrum f_XX^(T)(λ) has been calculated as a direct measure of the power, in watts, of the various frequency components of an electric signal; see Bode (1945) for example. In the study of color (see Wright (1958)), the power spectrum is estimated as a key characteristic of the color of an object.
We have seen that the power spectrum behaves in an elementary manner when a series is filtered. This has led Nerlove (1964) and Godfrey and Karreman (1967) to use it to display the effect of various procedures that have been proposed for the seasonal adjustment of economic time series. Cartwright (1967) used it to display the effect of tidal filters.
Some further references taken from a variety of fields include: Condit andGrum (1964), Haubrich (1965), Yamanouchi (1961), Manwell and Simon(1966), Plageman et al (1969).
Informal Testing and Discrimination The use of power spectra for testing and discrimination has followed their use as descriptive statistics. In the study of color, workers have noted that the spectra of objects of different color do seem to vary in a systematic way; see Wright (1958). Carpenter (1965) and Bullard (1966) question whether earthquakes and explosions have substantially different power spectra, in the hope that the two could be discriminated on the basis of spectra calculated from observed seismograms. Also, the spectra derived from the EEGs of healthy and neurologically ill patients have been compared in the hope of developing a diagnostic tool; see Bertrand and Lacape (1943), Wiener (1957), Yuzuriha (1960), Suhara and Suzuki (1964), Alberts et al. (1965), and Barlow (1967).
We have seen that the power spectrum of a white noise series is constant. The power spectrum has, therefore, been used on occasion as an informal test statistic for pure noise; see Granger and Morgenstern (1963) and Press and Tukey (1956), for example. It is especially useful if the alternative is some other form of stationary behavior. A common assumption of relationship between two series is that, up to an additive pure noise, one comes about from the other in some functional manner. After a functional form has been fit, its aptness can be measured by seeing how flat the estimated power spectrum of the residuals is; see also Macdonald and Ward (1963). The magnitude of this residual spectrum gives us a measure of the goodness of fit achieved. Frequency bands of poor fit are directly apparent.
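A crude numerical version of this informal whiteness check might average the residual periodogram over a few frequency bands and compare the levels; the band count and helper name below are illustrative, not from the text.

```python
import numpy as np

def band_averaged_spectrum(resid, n_bands=8):
    """Crude whiteness check: average the periodogram of the residuals
    over n_bands equal frequency bands on (0, pi).  For pure noise the
    band averages should all be near the common level sigma^2/(2*pi);
    bands standing well above the rest indicate frequencies of poor fit.
    """
    resid = np.asarray(resid, dtype=float)
    T = len(resid)
    d = np.fft.fft(resid - resid.mean())
    I = (np.abs(d) ** 2) / (2 * np.pi * T)   # periodogram of the residuals
    half = I[1:T // 2]                       # ordinates with frequencies in (0, pi)
    return np.array([b.mean() for b in np.array_split(half, n_bands)])
```

Roughly equal band averages are consistent with a pure-noise residual series; a markedly uneven profile points to the frequency bands where the fitted functional form is poor.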
A number of papers have examined economic theories through the spectrum; for example, Granger and Elliot (1968), Howrey (1968), and Sargent (1968).
Estimation Power spectra are of use in the estimation of parameters of interest. Sometimes we have an underlying model that leads to a functional form for the spectrum involving unknown parameters. The parameters may then be estimated from the experimental spectrum; see Whittle (1951, 1952a, 1961) and Ibragimov (1967). Many unknown parameters are involved if we want to fit a linear process to the data; see Ricker (1940) and Robinson (1967b). The spectrum is of use here. The shift of a peak in an observed spectrum from its standard position is used by astronomers to determine the direction of motion of a celestial body; see Bracewell (1965). The motion of a peak in a spectrum calculated on successive occasions was used by Munk and Snodgrass (1957) to determine the apparent presence of a storm in the Indian Ocean.
Search for Hidden Periodicities The original problem, leading to the definition of the second-order periodogram, was that of measuring the frequency of a (possibly) periodic phenomenon; see Schuster (1898). Peaks in f_XX^(T)(λ) do spring into immediate view, and their broadness gives a measure of the accuracy of determination of the underlying frequency. The determination of the dominant frequency of brain waves is an important step in the analysis of a patient with possible cerebral problems; see Gibbs and Grass (1947). Bryson and Dutton (1961) searched for the period of sunspots in tree-ring records.
Smoothing and Prediction The accurate measurement of power spectra is an important stage on the way to determining Kolmogorov-Wiener smoothing and predicting formulas; see Kolmogorov (1941a), Wiener (1949), Whittle (1963a). The problems of signal enhancement and the construction of optimum transmission forms for signals of harmonic nature (for example, human speech) fall into this area.
5.13 EXERCISES
5.13.1 If Y(t) = ∑_u a(t − u)X(u), while Ỹ(t) = ∑_u ã(t − u)X(u), where X(t) is stationary with mean 0, prove that
and hence, if I_XX^(T)(2πj/T) is smoothed across its whole domain, the value obtained is c_XX^(T)(0). (This result may be employed as a check on the numerical accuracy of the computations.) Indicate a similar result concerning c_XX^(T)(u).
5.13.3 Let X(t), t = 0, ±1, ±2, ... be a real-valued second-order stationary process with absolutely summable autocovariance function c_XX(u) and power spectrum f_XX(λ), −π < λ ≤ π. If f_XX(λ) ≠ 0, show that there exists a summable filter b(u) such that the series E(t) = ∑_u b(t − u)X(u) has constant power spectrum. Hint: Take the transfer function of b(u) to be [f_XX(λ)]^{−1/2} and use Theorem 3.8.3.
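The whitening construction of Exercise 5.13.3 can be illustrated numerically. Here the frequency-domain division is a circular approximation to applying the summable filter b(u); the AR(1) model, its parameter, and the seed are all invented for illustration.

```python
import numpy as np

# Sketch of the whitening filter of Exercise 5.13.3.  X(t) is an AR(1)
# series with parameter a, whose power spectrum is
#     f_XX(lam) = sigma^2 / (2*pi*|1 - a*exp(-i*lam)|^2).
# Dividing the discrete Fourier transform of the data by f_XX(lam)^(1/2)
# approximates applying the filter b(u) with transfer function
# f_XX(lam)^(-1/2); the filtered series should then have a roughly
# constant power spectrum.
rng = np.random.default_rng(1)
a, sigma, T = 0.8, 1.0, 4096
eps = sigma * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + eps[t]                 # simulate the AR(1) series
lam = 2.0 * np.pi * np.arange(T) / T
f_xx = sigma**2 / (2.0 * np.pi * np.abs(1.0 - a * np.exp(-1j * lam))**2)
e = np.fft.ifft(np.fft.fft(x) / np.sqrt(f_xx)).real   # whitened series
```

Band-averaging the periodograms of x and e shows the pronounced spectral slope of the AR(1) series flattened out in the whitened series.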
5.13.4 Prove that exp{−I_XX^(T)(λ)/f_XX(λ)} tends, in distribution, to a uniform variate on (0, 1) as T → ∞ under the conditions of Theorem 5.2.7.
5.13.5 Under the conditions of Theorem 5.2.7, prove that the statistics (πT)^{−1}[Re d_X^(T)(λ)]² and (πT)^{−1}[Im d_X^(T)(λ)]² tend, in distribution, to independent f_XX(λ)χ₁² variates.
5.13.6 Let J_XX^(T)(λ) be the smaller of the two statistics of the previous exercise and K_XX^(T)(λ) the larger. Under the previous conditions, prove that the interval [J_XX^(T)(λ), K_XX^(T)(λ)] provides an approximate 42 percent confidence interval for f_XX(λ). See Durbin's discussion of Hannan (1967b).
5.13.7 Prove that the result of Theorem 5.2.6 is exact, rather than asymptotic, if X(0), ..., X(T − 1) are mean 0, variance σ², independent normal variates.
5.13.8 Let W(α) = 0 for α < A, α > B, with B > A and A, B finite. Prove that the asymptotic variance given in (5.6.13) is minimized by setting W(α) = (B − A)^{−1} for A ≤ α ≤ B.
5.13.9 Prove, under regularity conditions, that the periodogram is a consistent estimate of f_XX(λ) if f_XX(λ) = 0.
5.13.10 Let Y(t) be a series with power spectrum f_YY(λ) and Z(t) an independent series with power spectrum f_ZZ(λ). Let X(t) = Y(t) for 0 ≤ t ≤ T/2 and X(t) = Z(t) for T/2 < t ≤ T − 1. Determine the approximate statistical properties of I_XX^(T)(λ).
5.13.14 Under the conditions of Theorem 5.6.2 prove that
5.13.2 Prove that
If A(α) has a bounded first derivative, then it equals O(1) uniformly in λ.

5.13.18 Suppose X(t) = R cos(ωt + φ) + ε(t), t = 0, ..., T − 1, where the ε(t) are independent N(0, σ²) variates. Show that the maximum likelihood estimate of ω is approximately the value of λ that maximizes I_XX^(T)(λ); see Walker (1969).
5.13.19 Let X(t), t = 0, ±1, ... be real-valued, satisfy Assumption 2.6.2(1), and have mean 0. Let W(α) satisfy Assumption 5.6.1. If B_T T → ∞ as T → ∞, show that
5.13.15 (a) Prove that
where D_{T−1}(α) is given by (5.9.10). Hint: Use expression (3.2.5).
(b) Prove that
then
5.13.17 Under Assumption 2.6.1, prove that if
5.13.20 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let c_X^(T) = T^{−1} ∑_{t=0}^{T−1} X(t). Show that √T(c_X^(T) − c_X) is asymptotically independent of √T(c_XX^(T)(u) − c_XX(u)), which is asymptotically normal with mean 0 and variance
5.13.16 Prove that
5.13.21 Under the conditions of Theorem 5.6.3, show that √T(c_X^(T) − c_X) and √(B_T T)[f_XX^(T)(λ) − Ef_XX^(T)(λ)] are asymptotically independent and normal.
5.13.22 Show that the expected value of the modified periodogram (5.3.13) is givenby
tends in distribution to a Student's t with 2m degrees of freedom. (This result may be used to set approximate confidence limits for c_X.)
5.13.26 Let X(t), t = 0, ±1, ... be a series with EX(t) = 0 and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose
and tends to f_XX(λ) for λ ≢ 0 (mod 2π).

5.13.23 Under the conditions of Theorem 5.2.6 prove that
for some finite K. Conclude that sup_λ |f_XX^(T)(λ) − Ef_XX^(T)(λ)| tends to 0 in probability if B_T T → ∞.
5.13.25 Under the conditions of Theorem 5.4.3 show that
Show that
Show that there exists a finite K such that
for u = 0, ±1, ... and T = 1, 2, ....

5.13.27 Let the real-valued series X(t), t = 0, ±1, ... be generated by the autoregressive scheme
5.13.24 Let f_XX^(T)(λ) be given by (5.6.1) with W(α) bounded. Suppose
for t = 0, ±1, ..., where ε(t) is a series of independent identically distributed random variables with mean 0 and finite fourth-order moment. Suppose all the roots of the equation
is asymptotically normal with EG_XX^(T)(λ) = λ + O(T^{−1}) and
5.13.32 Use Exercise 2.13.31 to show that (5.11.20) may be written
satisfy |z| > 1. Let a^(T)(1), ..., a^(T)(m) be the least squares estimates of a(1), ..., a(m). They minimize
Show that √T[a^(T)(1) − a(1), ..., a^(T)(m) − a(m)] tends in distribution to N_m(0, σ²[c_XX(j − k)]^{−1}) as T → ∞. This result is due to Mann and Wald (1943).
5.13.28 If a function g(x) on [0, 1] satisfies |g(x) − g(y)| ≤ G|x − y|^α for some α, 0 < α ≤ 1, show that
5.13.29 Show that f_XX^(T)(λ), given by (5.6.1), satisfies
5.13.30 Under the conditions of Theorem 5.2.1 and c_X = 0, show that
5.13.31 Under the conditions of Theorem 5.10.1, show that
5.13.33 With the notation of Section 5.11, show that f_XX(λ) ≥ g_XX(λ) for all λ.
6
ANALYSIS OF A LINEAR TIME INVARIANT RELATION BETWEEN
A STOCHASTIC SERIES AND SEVERAL DETERMINISTIC SERIES
6.1 INTRODUCTION
Let Y(t), ε(t), t = 0, ±1, ... be real-valued stochastic series and let X(t), t = 0, ±1, ... be an r vector-valued fixed series. Suppose μ is a constant and that {a(u)} is a 1 × r filter. In this chapter we shall be concerned with the investigation of relations that have the form

Y(t) = μ + ∑_u a(t − u)X(u) + ε(t),     (6.1.1)

t = 0, ±1, ....
We will assume that the error series ε(t) is stationary with mean 0 and power spectrum f_εε(λ). This power spectrum is called the error spectrum; it is seen to measure the extent to which the series Y(t) is determinable from the series X(t) by linear filtering. We will assume throughout this text that values of the dependent series Y(t) and values of the independent series X(t) are available for t = 0, ..., T − 1. Because Eε(t) = 0,

EY(t) = μ + ∑_u a(t − u)X(u).     (6.1.2)
That is, the expected value of Y(t) is a filtered version of X(t). Note from relation (6.1.2) that the series Y(t) is not generally stationary. However, for k > 1,

cum{Y(t₁), ..., Y(t_k)} = cum{ε(t₁), ..., ε(t_k)},

and so the cumulants of Y(t) of order greater than 1 are stationary.
The transfer function of the filter {a(u)} is given by

A(λ) = ∑_u a(u) exp{−iλu}.
Let us consider the behavior of this transfer function with respect to filterings of the series Y(t), X(t). Let {b(u)} be an r × r filter with inverse {c(u)}. Let {d(u)} be a 1 × 1 filter. Set
then
Set
and
The relation (6.1.1) now yields
where
That is, the relation between the filtered series Y₁(t), X₁(t), ε₁(t) has the same form as the relation (6.1.1). In terms of transfer functions, (6.1.11) may be written
or
We see that the transfer function relating Y(t) to X(t) may be determined from the transfer function relating Y₁(t) to X₁(t), provided the required inverses exist. We note in passing that similar relations exist even if Y₁(t) of (6.1.7) involves X through a term
for some 1 × r filter {e(u)}. These remarks will be especially important when we come to the problem of prefiltering the series prior to estimating A(λ).

Throughout this chapter we will consider the case of deterministic X(t) and real-valued stochastic Y(t). Brillinger (1969a) considers the model

Y(t) = μ + ∑_u a(t − u)X(u) + ε(t),     (6.1.15)

t = 0, ±1, ..., where X(t) is deterministic and Y(t), ε(t) are s vector-valued. In Chapter 8 the model (6.1.15) is considered with X(t) stochastic.

6.2 LEAST SQUARES AND REGRESSION THEORY

Two classical theorems form the basis of least squares and linear regression theory. The first is the Gauss-Markov Theorem or

Theorem 6.2.1 Let

Y = aX + ε,

where ε is a 1 × n matrix of random variables with Eε = 0, Eε^Tε = σ²I; a is a 1 × k matrix of unknown parameters; and X is a k × n matrix of known values. Then
(Y − aX)(Y − aX)^T

is minimized, for choice of a, by â = YX^T(XX^T)^{−1} if XX^T is nonsingular. The minimum achieved is Y(I − X^T(XX^T)^{−1}X)Y^T. Also Eâ = a, and the covariance matrix of â is given by E(â − a)^T(â − a) = σ²(XX^T)^{−1}; if σ̂² = (n − k)^{−1} Y(I − X^T(XX^T)^{−1}X)Y^T, then Eσ̂² = σ². In addition, â is the minimum variance linear unbiased estimate of a.
These results may be found in Kendall and Stuart (1961), Chapter 19, for example. The least squares estimate of a is â. Turning to distributional aspects of the above â and σ̂², we have
Theorem 6.2.2 If, in addition to the conditions of Theorem 6.2.1, the n components of ε have independent normal distributions, then â^T is N_k(a^T, σ²(XX^T)^{−1}), and σ̂² is σ²χ²_{n−k}/(n − k), independent of â.
which has a t_{n−k} distribution.

These results apply to real-valued random variables and parameters. In fact, in the majority of cases of concern to us in time series analysis we will require extensions to the complex-valued case. We have
Theorem 6.2.3 Let

Y = aX + ε,

where ε is a 1 × n matrix of complex-valued random variables with Eε = 0, Eε^Tε = 0, and Eε*^Tε = σ²I; a is a 1 × k matrix of unknown complex-valued parameters; X is a k × n matrix with known complex-valued entries; and Y is a 1 × n matrix of known complex-valued entries. (Throughout, * denotes complex conjugation, so M*^T is the conjugate transpose of M.) Then

(Y − aX)(Y − aX)*^T

is minimized, for choice of a, by â = YX*^T(XX*^T)^{−1} if XX*^T is nonsingular. The minimum achieved is Y(I − X*^T(XX*^T)^{−1}X)Y*^T. Also Eâ = a, E(â − a)^T(â − a) = 0, and E(â − a)*^T(â − a) = (XX*^T)^{−1}σ². If σ̂² = (n − k)^{−1} Y(I − X*^T(XX*^T)^{−1}X)Y*^T, then Eσ̂² = σ².

Turning to distributional aspects, we have
and so its distribution is determinable directly from the noncentral F. Suppose âⱼ and aⱼ denote the jth entries of â and a respectively, and c_jj denotes the jth diagonal entry of (XX^T)^{−1}. Then confidence intervals for aⱼ may be derived through the pivotal quantity
the squared sample multiple correlation coefficient. It may be seen that 0 ≤ R²_YX ≤ 1. Also from (6.2.3) we see that
is noncentral F with degrees of freedom k over n − k and noncentrality parameter aXX^Ta^T/σ². We see that the hypothesis a = 0 may be tested by noting that (6.2.3) has a central F_{k,n−k} distribution when the hypothesis holds. A related statistic is
It follows directly from Theorem 6.2.2 that
Theorem 6.2.4 If, in addition to the conditions of Theorem 6.2.3, the components of ε have independent complex normal N₁^C(0, σ²) distributions, then â^T is N_k^C(a^T, (XX*^T)^{−1}σ²) and σ̂² is σ²χ²_{2(n−k)}/[2(n − k)], independent of â.
We may conclude from this theorem that
is noncentral F with degrees of freedom 2k over 2(n − k) and noncentrality parameter aXX*^Ta*^T/σ². This statistic could be used to test the hypothesis a = 0. A related statistic is
the squared sample complex multiple correlation coefficient. It may be seen directly that 0 ≤ |R_YX|² ≤ 1. Also from (6.2.10) we see that
and so its distribution is determinable directly from the noncentral F. Theorems 6.2.3 and 6.2.4 above are indicated in Akaike (1965). Under the conditions of Theorem 6.2.4, Khatri (1965a) has shown that â and (n − k)σ̂²/n are the maximum likelihood estimates of a and σ².
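The complex least squares setup of Theorem 6.2.3 can be checked numerically. The model dimensions, parameter values, and noise level below are invented purely for illustration; X^H denotes the conjugate transpose.

```python
import numpy as np

# Numerical sketch of complex least squares (Theorem 6.2.3):
# Y (1 x n) = a X + e, with a (1 x k) and X (k x n) complex-valued.
# The minimizing value is a_hat = Y X^H (X X^H)^{-1}.
rng = np.random.default_rng(2)
k, n = 3, 200
X = rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))
a = np.array([1.0 - 2.0j, 0.5j, -1.0 + 0.0j])     # "true" parameter (illustrative)
e = 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
Y = a @ X + e
XH = X.conj().T
a_hat = Y @ XH @ np.linalg.inv(X @ XH)            # least squares estimate
resid = Y - a_hat @ X                              # residual series
sigma2_hat = (resid @ resid.conj()).real / (n - k)  # error variance estimate
```

The residuals are orthogonal to the rows of X by construction, the complex analogue of the usual normal equations.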
An important use of the estimate â is in predicting the expected value of y₀, the variate associated with a given x₀. In this connection we have
Theorem 6.2.5 Suppose the conditions of Theorem 6.2.4 are satisfied. Suppose also

y₀ = ax₀ + ε₀,

where ε₀ is independent of ε of (6.2.7). Let ŷ₀ = âx₀; then ŷ₀ is distributed as N₁^C(ax₀, σ²x₀*^T(XX*^T)^{−1}x₀) and is independent of σ̂².
On a variety of occasions we will wish to construct confidence regions for the entries of a of expression (6.2.7). In the real-valued case we saw that confidence intervals could be constructed through the quantity (6.2.6), which has a t distribution under the conditions of Theorem 6.2.2. In the present case complications arise because a has complex-valued entries.
Let âⱼ and aⱼ denote the jth entries of â and a respectively. Let c_jj denote the jth diagonal entry of (XX*^T)^{−1}. Let wⱼ denote
The quantity
has the form v^{−1/2}z, where z is N₁^C(0, 1) and v is independently χ²_{2(n−k)}/[2(n − k)]. Therefore
has an F_{2,2(n−k)} distribution. A 100β percent confidence region for Re aⱼ, Im aⱼ may thus be determined from the inequality
where F(β) denotes the upper 100β percent point of the F distribution. We note that this region has the form of a circle centered at Re âⱼ, Im âⱼ.
On other occasions it may be more relevant to set confidence intervals for |aⱼ| and arg aⱼ. One means of doing this is to derive a region algebraically from expression (6.2.16). Let
then (6.2.16) is, approximately, equivalent to the region
This region was presented in Goodman (1957) and Akaike and Yamanouchi(1962).
The region (6.2.18) is only approximate. An exact 100γ percent interval for |aⱼ| may be determined by noting that
is noncentral F with degrees of freedom 2 and 2(n − k) and noncentrality parameter |aⱼ|²/2. Tables of the power of the F test (see Pearson and Hartley (1951)) can now be used to construct an exact confidence interval for |aⱼ|. The charts in Fox (1956) may also be used.
Alternatively, we could use expression (6.2.15) to construct an approximate 100γ percent confidence interval by approximating its distribution by a central F with degrees of freedom
and 2(n − k). This approximation to the noncentral F is given in Abramowitz and Stegun (1964); see also Laubscher (1960).
In the case of φⱼ = arg aⱼ we can determine an exact 100δ percent confidence interval by noting that
has a t_{2(n−k)} distribution. It is interesting to note that this procedure is related to the Creasy-Fieller problem; see Fieller (1954) and Halperin (1967). The two exact confidence procedures suggested above are given in Groves and Hannan (1968).
If simultaneous intervals are required for several of the entries of a, then one can proceed through the complex generalization of the multivariate t; see Dunnett and Sobel (1954), Gupta (1963a), Kshirsagar (1961), and Dickey (1967) for a discussion of the multivariate t distribution. However, we content ourselves with defining the complex t distribution. Let z be distributed as N₁^C(0, 1) and independently let s² be distributed as χ²_n/n. Then t = z/s has a complex t distribution with n degrees of freedom. If u = Re t, v = Im t, then its density is given by

−∞ < u, v < ∞. A related reference is Hoyt (1947).
6.3 HEURISTIC CONSTRUCTION OF ESTIMATES
We can now construct estimates of the parameters of interest. Set
The model (6.1.1) then takes the form
The values X(t), t = 0, ..., T − 1 are available, and therefore we can calculate the finite Fourier transform
In the present situation, it is an r vector-valued statistic. Define
The approximate relation between d_Y^(T)(λ) and d_X^(T)(λ) is given by
Lemma 6.3.1 Suppose that |X(t)| ≤ M, t = 0, ±1, ..., and that
and
if −∞ < α < ∞ with |α − λ| ≤ LT^{−1}.
Let s(T) be an integer with 2πs(T)/T near λ. Suppose T is large. From expression (6.3.6)
for s = 0, ±1, ..., ±m, say. If ε(t) satisfies Assumption 2.6.1, then, following Theorem 4.4.1, the quantities d_ε^(T)(2π[s(T) + s]/T), s = 0, ±1, ..., ±m, are approximately N₁^C(0, 2πTf_εε(λ)) variates. Relation (6.3.7) is seen to have the form of a multiple regression relation involving complex-valued variates. Noting Theorem 6.2.3, we define
Suppose that the r × r matrix f_XX^(T)(λ) is nonsingular. We now estimate A(λ) by
and f_εε(λ) by
Theorem 6.2.4 suggests the approximating distributions N_r^C(A(λ)^T, (2m + 1)^{−1}f_εε(λ)f_XX^(T)(λ)^{−1}) for A^(T)(λ)^T and [2(2m + 1 − r)]^{−1}f_εε(λ)χ²_{2(2m+1−r)} for f_εε^(T)(λ).
In the next sections we generalize the estimates (6.3.12) and (6.3.13) andmake precise the suggested approximate distributions.
As an estimate of n we take
where c_Y^(T) and c_X^(T) are the sample means of the given Y and X values. We will find it convenient to use the statistic μ^(T) + A^(T)(0)c_X^(T) = c_Y^(T) in the statement of certain theorems below.
The heuristic approach given above is suggested in Akaike (1964, 1965), Duncan and Jones (1966), and Brillinger (1969a).
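The heuristic estimate A^(T)(λ) = f_YX^(T)(λ)f_XX^(T)(λ)^{−1} can be sketched in the scalar case r = 1. The filter, sample size, and m are invented for illustration, and np.roll gives a circular one-lag shift, a simplification of the model (6.1.1).

```python
import numpy as np

# Sketch of the heuristic estimate of Section 6.3: average cross- and
# auto-periodogram ordinates at the 2m+1 Fourier frequencies nearest
# lam, then set A_hat(lam) = f_YX / f_XX.  Here Y(t) = 2*X(t-1) + noise
# (circularly), whose transfer function is A(lam) = 2*exp(-i*lam).
rng = np.random.default_rng(3)
T, m = 1024, 7
x = rng.standard_normal(T)
y = 2.0 * np.roll(x, 1) + 0.1 * rng.standard_normal(T)
dx, dy = np.fft.fft(x), np.fft.fft(y)              # finite Fourier transforms
lam = np.pi / 3
s = round(lam * T / (2 * np.pi))                   # s(T): 2*pi*s/T near lam
idx = np.arange(s - m, s + m + 1)                  # the 2m+1 nearest ordinates
f_yx = np.mean(dy[idx] * np.conj(dx[idx])) / (2 * np.pi * T)
f_xx = np.mean(np.abs(dx[idx]) ** 2) / (2 * np.pi * T)
A_hat = f_yx / f_xx                                # estimate of A(lam)
A_true = 2.0 * np.exp(-1j * lam)                   # transfer function of the filter
```

With T = 1024 and m = 7 the estimate lands close to the true transfer function; widening m trades variance against bias from the variation of A(α) across the band.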
6.4 A FORM OF ASYMPTOTIC DISTRIBUTION
In this section we determine the asymptotic distribution of a class of elementary estimates, suggested by the heuristic arguments of Section 6.3, for the parameters A(λ) and f_εε(λ). The form and statistical properties of these estimates will depend on whether or not λ ≡ 0 (mod π). Consider three cases:
Case A λ satisfies λ ≢ 0 (mod π)
Case B λ satisfies λ ≡ 0 (mod 2π), or λ = ±π, ±3π, ... and T is even
Case C λ = ±π, ±3π, ... and T is odd.
Suppose s(T) is an integer with 2πs(T)/T near λ. (We will later require 2πs(T)/T → λ as T → ∞.) Suppose m is a non-negative integer. Let I_YX^(T)(λ) be given by (6.3.8). Define
and
with similar definitions for f_YY^(T)(λ) and f_XX^(T)(λ). These estimates are based on the discrete Fourier transforms of the data and so may be computed by a Fast Fourier Transform Algorithm.
As estimates of A(λ), f_εε(λ), μ we take
as an estimate of μ. A theorem indicating the behavior of the mean of A^(T)(λ) in the present situation is
Theorem 6.4.1 Let ε(t), t = 0, ±1, ... satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, ... be uniformly bounded. Let Y(t) be given by (6.1.1) where {a(u)} satisfies ∑ |u| |a(u)| < ∞. Let A^(T)(λ) be given by (6.4.4) where f_YX^(T)(λ) is given by (6.4.1). Then
with C(m,r) a constant given by
and finally take
in Case A where
for finite K. There are similar expressions in Cases B and C.
We note from expression (6.4.8) that EA^(T)(λ) is principally a matrix-weighted average of the transfer function A(α). In addition, the expression suggests that the larger f_XX^(T)(λ) is, the smaller the departure from the weighted average will be. From Theorem 6.4.1 we may conclude
Corollary 6.4.1 Under the conditions of Theorem 6.4.1, and if ||f_XX^(T)(λ)^{−1}|| is bounded as T → ∞, A^(T)(λ) is asymptotically unbiased.
Turning to an investigation of asymptotic distributions we have
Theorem 6.4.2 Suppose the conditions of Theorem 6.4.1 are satisfied. Suppose also that f_XX^(T)(λ) is nonsingular for T sufficiently large and that 2πs(T)/T → λ as T → ∞. Then A^(T)(λ)^T is asymptotically N_r^C(A(λ)^T, (2m + 1)^{−1}f_εε(λ)f_XX^(T)(λ)^{−1}) in Case A, and is asymptotically N_r(A(λ)^T, (2m)^{−1}f_εε(λ)f_XX^(T)(λ)^{−1}) in Cases B and C. Also g_εε^(T)(λ) tends to f_εε(λ)χ²_{2(2m+1−r)}/[2(2m + 1 − r)] in Case A and to f_εε(λ)χ²_{2m−r}/(2m − r) in Cases B and C. The limiting normal and χ² distributions are independent. Finally, μ^(T) + A^(T)(0)c_X^(T) is asymptotically N₁(μ + A(0)c_X^(T), 2πT^{−1}f_εε(0)), independently of A^(T)(λ), g_εε^(T)(λ), −∞ < λ < ∞.

In the case λ ≢ 0 (mod π), Theorem 6.4.2 suggests the approximation

In the case λ ≡ 0 (mod π), the theorem suggests the approximate variance 2f_εε(λ)²/(2m − r).

The limiting distributions of the estimates of the gain and phase may be determined through

Corollary 6.4.2 Under the conditions of Theorem 6.4.2, functions of A^(T)(λ), g_εε^(T)(λ), μ^(T) + A^(T)(0)c_X^(T) tend in distribution to the same functions based on the limiting variates of the theorem.

In Section 6.9 we will use Theorem 6.4.2 and its corollary to set up confidence regions for the parameters of interest. The statistic

is often of special interest, as it provides a measure of the strength of a linear time invariant relation between the series Y(t), t = 0, ±1, ... and the series X(t), t = 0, ±1, .... Its large sample distribution is indicated by

Theorem 6.4.3 Suppose the conditions of Theorem 6.4.1 are satisfied and suppose |R_YX^(T)(λ)|² is given by (6.4.11). Then, in Case A,

as T → ∞, where F is a noncentral F with degrees of freedom 2r over 2(2m + 1 − r) and noncentrality parameter A(λ)f_XX^(T)(λ)A(λ)*^T/f_εε(λ).

We will return to a discussion of this statistic in Chapter 8. The notation o_{a.s.}(1) means that the term tends to 0 with probability 1.

6.5 EXPECTED VALUES OF ESTIMATES OF THE TRANSFER FUNCTION AND ERROR SPECTRUM

We now turn to an investigation of the expected values of estimates of slightly more general form than those in the previous section. Suppose that we are interested in estimating the parameters of the model (6.1.1) given the values X(t), Y(t), t = 0, ..., T − 1. Let I_YX^(T)(λ) be given by (6.3.8), with similar definitions for I_YY^(T)(λ) and I_XX^(T)(λ). We will base our estimates on these statistics in the manner of (6.3.10); however, we will make our estimates more flexible by including a variable weighting of the terms in expression (6.3.10). Specifically, let W(α) be a weight function satisfying
Assumption 6.5.1 W(α), −∞ < α < ∞, is bounded, even, non-negative, equal to 0 for |α| > π, and such that
The principal restrictions introduced here on W(α), over those of Assumption 5.6.1, are the non-negativity and the finite support.
In order to reflect the notion that the weight function should become more concentrated as the sample size T tends to ∞, we introduce a band-width parameter B_T that depends on T. Also, in order that our estimate possess required symmetries, we extend the weight function periodically. We therefore define
The mass of W^(T)(α) is concentrated in intervals of width 2πB_T about α ≡ 0 (mod 2π) as T → ∞.
We now define
We see that W^(T)(α) is non-negative
and if B_T → 0 as T → ∞, then for T sufficiently large
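The bandwidth-rescaled, periodically extended weight can be constructed as follows. The particular W used (a raised-cosine window integrating to 1) and the wrapping range are illustrative choices consistent with Assumption 6.5.1, not the text's own formula.

```python
import numpy as np

def W(alpha):
    """A bounded, even, non-negative weight, vanishing off (-pi, pi) and
    integrating to 1 (a raised-cosine window, chosen for illustration)."""
    alpha = np.asarray(alpha, dtype=float)
    return np.where(np.abs(alpha) < np.pi,
                    (1 + np.cos(alpha)) / (2 * np.pi), 0.0)

def W_T(alpha, B_T, n_wrap=3):
    """Periodic extension of the rescaled weight (1/B_T) * W(alpha/B_T).

    The mass concentrates in intervals of width about 2*pi*B_T around
    alpha = 0 (mod 2*pi); a few wraps suffice since W has finite support.
    """
    alpha = np.asarray(alpha, dtype=float)
    total = np.zeros_like(alpha)
    for j in range(-n_wrap, n_wrap + 1):          # wrap over adjacent periods
        total += W((alpha + 2 * np.pi * j) / B_T) / B_T
    return total
```

For small B_T the rescaled weight still integrates to 1 over one period, which is what makes the resulting spectral estimates asymptotically unbiased as B_T → 0.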
As estimates of A(λ), f_εε(λ), μ we now take
and
respectively. If m is large, then definitions (6.5.9) and (6.4.5) are essentially equivalent. Because of the conditions placed on W(α), we see that

where the error terms are uniform in λ. Also, A^(T)(λ) and g_εε^(T)(λ) have period 2π, while g_εε^(T)(λ) is non-negative and symmetric about 0. Finally, μ^(T) is real-valued, as is its corresponding population parameter μ.

A statistic that will appear in our later investigations is |R_YX^(T)(λ)|², given by

It will be seen to be a form of multiple correlation coefficient and may be seen to be bounded by 0 and 1, and it will appear in an essential manner in estimates of the variances of our statistics.

We make one important assumption concerning the sequence of fixed (as opposed to random) values X(t), and that is

Assumption 6.5.2 X(t), t = 0, ±1, ... is uniformly bounded and, if f_XX^(T)(λ) is given by (6.5.5), there is a finite K such that

for all λ and T sufficiently large.

Turning to the investigation of the large sample behavior of A^(T)(λ), we have

Theorem 6.5.1 Let ε(t), t = 0, ±1, ... satisfy Assumption 2.6.2(1) and X(t), t = 0, ±1, ... satisfy Assumption 6.5.2. Let Y(t), t = 0, ±1, ... be given by (6.1.1) where {a(u)} satisfies ∑ |u| |a(u)| < ∞. Let W(α) satisfy Assumption 6.5.1. Let A^(T)(λ) be given by (6.5.8); then
We see that the expected value of A^(T)(λ) is essentially a (matrix) weighted average of the population function A(α), with weight concentrated in a neighborhood, of width 2πB_T, of λ. Because it is a matrix weighted average, an entanglement of the various components of A(α) has been introduced. If we wish to reduce the asymptotic bias, we should try to arrange for A(α) to be near constant in the neighborhood of λ. The weights in (6.5.14) depend on the values X(t), t = 0, ..., T − 1. It would be advantageous to make I_XX^(T)(α) near constant as well, and such that its off-diagonal elements are near 0. The final expression of (6.5.14) suggests that the asymptotic bias of A^(T)(λ) is generally of the order of the band-width B_T. We have
Corollary 6.5.1 Under the conditions of Theorem 6.5.1, and if B_T → 0 as T → ∞, A^(T)(λ) is an asymptotically unbiased estimate of A(λ).
Let the entries of A(λ) and A^(T)(λ) be denoted by A_j(λ) and A_j^(T)(λ), j = 1, ..., r, respectively. On occasion we may be interested in the real-valued gains

and the real-valued phases

These may be estimated by

and

Theorem 6.5.2 Under the conditions of Theorem 6.5.1,

and if A_j(λ) ≠ 0, then

(In this theorem, ave denotes an expected value derived in a term-by-term manner from a Taylor expansion; see Brillinger and Tukey (1964).)
Corollary 6.5.2 Under the conditions of Theorem 6.5.2 and if B_T → 0, B_T·T → ∞ as T → ∞, G_j^(T)(λ) is an asymptotically unbiased estimate of G_j(λ).
Turning to the case of g_εε^(T)(λ), our estimate of the error spectrum, we have
Theorem 6.5.3 Under the conditions of Theorem 6.5.1,
This result may be compared instructively with expression (5.8.22) in the case P = 1. In the limit we have
Corollary 6.5.3 Under the conditions of Theorem 6.5.3 and if B_T → 0, B_T·T → ∞ as T → ∞, g_εε^(T)(λ) is an asymptotically unbiased estimate of f_εε(λ).
In the case of μ^(T) we may prove
Theorem 6.5.4 Under the conditions of Theorem 6.5.1,
From Theorem 6.5.4 follows
Corollary 6.5.4 Under the conditions of Theorem 6.5.4 and if B_T → 0 as T → ∞, μ^(T) is an asymptotically unbiased estimate of μ.
6.6 ASYMPTOTIC COVARIANCES OF THE PROPOSED ESTIMATES
In order to be able to assess the precision of our estimates we require the form of their second-order moments. A statistic that will appear in these moments is defined by
This statistic has the same form as f_XX^(T)(λ) given by expression (6.5.5) except that the weight function W(α) has been replaced by W(α)². Typically the latter is more concentrated; however, in the case that W(α) = (2π)^(-1) for |α| ≤ π
In a variety of cases it may prove reasonable to approximate h_XX^(T)(λ) by f_XX^(T)(λ). This has the advantage of reducing the number of computations required. Note that if f_XX^(T)(λ) is bounded, then the same is true for h_XX^(T)(λ). Thus we may now state
Theorem 6.6.1 Let ε(t), t = 0, ±1, ... satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, ... satisfy Assumption 6.5.2. Let Y(t), t = 0, ±1, ... be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let W(α) satisfy Assumption 6.5.1. If B_T → 0 as T → ∞, then
In the case that (6.6.2) holds, the second expression of (6.6.3) has the form
an expression that may be estimated by
We see from expression (6.6.3) that the asymptotic variance of A^(T)(λ) is of order B_T^(-1) T^(-1) and so we have
Corollary 6.6.1 Under the conditions of the theorem and if B_T·T → ∞ as T → ∞, A^(T)(λ) is a consistent estimate of A(λ).
We also note, from (6.6.3), that A^(T)(λ) and A^(T)(μ) are asymptotically uncorrelated for λ ≠ μ (mod 2π).
In practice we will record real-valued statistics. The asymptotic covariance structure of Re A^(T)(λ), Im A^(T)(λ) is given in Exercise 6.14.22. Alternatively we may record G_j^(T)(λ), φ_j^(T)(λ), and so we now investigate their asymptotic covariances. We define Ψ_jk^(T)(λ) to be the entry in the jth row and kth column of the matrix
and
Expressions (6.6.11) and (6.6.12) should be compared with expressions (5.6.12) and (5.6.15). We see that under the indicated limiting processes the asymptotic behavior of the second-order moments of g_εε^(T)(λ) is the same as if g_εε^(T)(λ) were a power spectral estimate based directly on the values ε(t), t = 0, ..., T - 1.
In the case of μ^(T) + A^(T)(0) c_X^(T) = c_Y^(T) we have
Corollary 6.6.3 Under the conditions of Theorem 6.6.3 and if B_T·T → ∞ as T → ∞,
In the limit we have
Theorem 6.6.2 Under the conditions of Theorem 6.6.1 and if A_j(λ), A_k(λ) ≠ 0,
for j, k = 1, ..., r.
Note that the asymptotic covariance structure of log G_j^(T)(λ) is the same as that of φ_j^(T)(λ) except in the cases λ ≡ 0 (mod π). We can construct estimates of the covariances of Theorem 6.6.2 by substituting estimates for the unknowns A_j(λ), f_εε(λ). We note that log G_j^(T)(λ) and φ_k^(T)(μ) are asymptotically uncorrelated for all j, k and λ, μ.
Turning to the investigation of g_εε^(T)(λ), we have
Theorem 6.6.3 Under the conditions of Theorem 6.6.1,
Theorem 6.6.4 Under the conditions of Theorem 6.6.1,
We may use expression (6.6.15) below to obtain an expression for the large sample variance of μ^(T). (See Exercise 6.14.31.) This variance tends to 0 as T → ∞ and so we have
Corollary 6.6.4 Under the conditions of the theorem and if B_T·T → ∞ as T → ∞, μ^(T) is a consistent estimate of μ.
In the case of the joint behavior of A^(T)(λ), g_εε^(T)(λ), μ^(T) + A^(T)(0) c_X^(T) we have
Theorem 6.6.5 Under the conditions of Theorem 6.6.1
and
We see that g_εε^(T)(μ) is asymptotically uncorrelated with both A^(T)(λ) and μ^(T) + A^(T)(0) c_X^(T). Also A^(T)(λ) and μ^(T) + A^(T)(0) c_X^(T) are asymptotically uncorrelated.
In the case of the gains and phases we have
Theorem 6.6.6 Under the conditions of Theorem 6.6.1,
and
6.7 ASYMPTOTIC NORMALITY OF THE ESTIMATES
We next turn to an investigation of the asymptotic distributions of the estimates A^(T)(λ), g_εε^(T)(λ), μ^(T) under the limiting condition B_T·T → ∞ as T → ∞.
Theorem 6.7.1 Let ε(t), t = 0, ±1, ... satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, ... satisfy Assumption 6.5.2. Let Y(t), t = 0, ±1, ... be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let
if λ ≢ 0 (mod π), where Ψ^(T)(λ) is given by (6.6.6). This result will later be used to set confidence regions for A(λ).
Following a theorem of Mann and Wald (1943a), we may conclude
Corollary 6.7.1 Under the conditions of Theorem 6.7.1, log_e G_j^(T)(λ), φ_j^(T)(λ), g_εε^(T)(λ), c_Y^(T) = μ^(T) + A^(T)(0) c_X^(T) are asymptotically normal with covariance structure given by (6.6.7) to (6.6.10), (6.6.13), and (6.6.17) to (6.6.20) for j = 1, ..., r.
We note in particular that, under the indicated conditions, log G_j^(T)(λ) and φ_j^(T)(λ) are asymptotically independent.
The asymptotic distribution of A^(T)(λ) given in this theorem is the same as that of Theorem 6.4.2 once one makes the identification
The asymptotic distribution of g_εε^(T)(λ) is consistent with that of Theorem 6.4.2 in the case that (6.7.2) is large, for a χ² variate with a large number of degrees of freedom is near normal.
6.8 ESTIMATING THE IMPULSE RESPONSE
In previous sections we have considered the problem of estimating the transfer function A(λ). We now consider the problem of estimating the corresponding impulse response function {a(u)}. In terms of A(λ) it is given by
W(α) satisfy Assumption 6.5.1. If B_T → 0, B_T·T → ∞ as T → ∞, then A^(T)(λ_1), g_εε^(T)(λ_1), ..., A^(T)(λ_J), g_εε^(T)(λ_J) are asymptotically jointly normal with covariance structure given by (6.6.3), (6.6.11), and (6.6.14). Finally μ^(T) + A^(T)(0) c_X^(T) is asymptotically independent of these variates with variance (6.6.13).
We see from expression (6.6.14) that A^(T)(λ) and g_εε^(T)(μ) are asymptotically independent for all λ, μ under the above conditions. From (6.6.3) we see that A^(T)(λ) and A^(T)(μ) are asymptotically independent if λ - μ ≢ 0 (mod 2π). From Exercise 6.14.22 we see that Re A^(T)(λ) and Im A^(T)(λ) are asymptotically independent. All of these instances of asymptotic independence are in accord with the intuitive suggestions of Theorem 6.2.4.
The theorem indicates that A^(T)(λ) is asymptotically
Let A^(T)(λ) be an estimate of A(λ) of the form considered previously. Let P_T be a sequence of positive integers tending to ∞ with T. As an estimate of a(u) we consider
Note that because of the symmetry properties of A^(T)(λ), the range of summation in expression (6.8.2) may be reduced to 0 ≤ p ≤ (P_T - 1)/2 in terms of Im A^(T), Re A^(T). Also, the estimate has period P_T and so, for example
We may prove
Theorem 6.8.1 Let ε(t), t = 0, ±1, ... satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, ... satisfy Assumption 6.5.2. Let Y(t) be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let W(α) satisfy Assumption 6.5.1. Let a^(T)(u) be given by (6.8.2); then
We see that for large P_T and small B_T the expected value of the suggested estimate is primarily the desired a(u). A consequence of the theorem is
Corollary 6.8.1 Under the conditions of Theorem 6.8.1 and if B_T → 0, P_T → ∞ as T → ∞, a^(T)(u) is asymptotically unbiased.
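The inversion (6.8.2) is a finite inverse Fourier transform, so for a given grid of transfer function values it is essentially one line of numpy. This sketch (the naming is ours) treats the scalar case:

```python
import numpy as np

def impulse_response_estimate(A_vals):
    """a^(T)(u), u = 0, ..., P_T - 1, from A^(T)(2*pi*p/P_T), p = 0, ..., P_T - 1.
    np.fft.ifft computes (1/P_T) * sum_p A_p * exp(i 2*pi*p*u / P_T), which
    is exactly the sum (6.8.2); the estimate has period P_T in u."""
    a = np.fft.ifft(A_vals)
    # for a real-valued filter, A(-lam) = conj(A(lam)), so the result is real
    return np.real(a)
```

For a pure delayed proportionality, A(λ) = 0.8 exp(-i2λ), the inversion concentrates all the mass at u = 2, as it should.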
Next we turn to an investigation of the second-order moments of a^(T)(u). We have previously defined
and we now define
This expression is bounded under the conditions we have set down. We now have
Note from (6.8.6) that the asymptotic covariance matrix of a^(T)(u) does not depend on u. Also the asymptotic covariance matrix of a^(T)(u) with a^(T)(v) depends on the difference u - v, so in some sense the process a^(T)(u), u = 0, ±1, ... may be considered to be a covariance stationary time series. In the limit we have
Corollary 6.8.2 Under the conditions of Theorem 6.8.2 and if P_T B_T T → ∞ as T → ∞, a^(T)(u) is a consistent estimate of a(u).
Turning to the joint behavior of a^(T)(u) and g_εε^(T)(λ), we have
Theorem 6.8.3 Under the conditions of Theorem 6.8.1
6.9 CONFIDENCE REGIONS
The confidence regions that will be proposed in this section will be based on the asymptotic distributions obtained in Section 6.4. They will be constructed so as to be consistent with the asymptotic distributions of Section 6.7.
Theorem 6.8.2 Under the assumptions of Theorem 6.8.1 and if B_T ≤ P_T^(-1), B_T → 0 as T → ∞,
We see that a^(T)(u), g_εε^(T)(λ) are asymptotically uncorrelated for all u, λ. In the case of the limiting distribution we may prove
Theorem 6.8.4 Under the conditions of Theorem 6.8.1 and if P_T B_T → 0 as T → ∞, a^(T)(u_1), ..., a^(T)(u_J), g_εε^(T)(λ_1), ..., g_εε^(T)(λ_K) are asymptotically normal with covariance structure given by (6.6.10), (6.8.7), and (6.8.8).
In Theorem 6.8.2 we required B_T ≤ P_T^(-1). From expression (6.8.7) we see that we should take P_T B_T as large as possible. Setting P_T = B_T^(-1) seems a sensible procedure, for the asymptotic variance of a^(T)(u) is then of order T^(-1). However, in this case we are unable to identify its principal term from expression (6.8.7). In the case that P_T B_T → 0, the first term in (6.8.7) is the dominant one. Finally we may contrast the asymptotic order of this variance with that of A^(T)(λ), which was B_T^(-1) T^(-1).
Suppose estimates A^(T)(λ), μ^(T), g_εε^(T)(λ), a^(T)(u) have been constructed in the manner of Section 6.5 using a weight function W(α). A comparison of the asymptotic distributions obtained for A^(T)(λ) in Theorems 6.4.2 and 6.7.1 suggests that we set
Following Theorem 6.7.1 we then approximate the distribution of A^(T)(λ) by
and by
At the same time the distribution of g_εε^(T)(λ) is approximated by an independent
and by
A 100β percent confidence interval for f_εε(λ) is therefore provided by
in Case A, with similar intervals in Cases B and C. A confidence interval for log f_εε(λ) is algebraically deducible from (6.9.6).
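A numeric sketch of the chi-squared interval described here, in the spirit of (6.9.6): we assume the approximation g ~ f(λ) χ²_ν / ν for some equivalent degrees of freedom ν, and use the Wilson-Hilferty approximation to the chi-squared quantile so that only the standard library is needed. The function names and the choice of quantile approximation are ours.

```python
from statistics import NormalDist

def chi2_ppf(p, dof):
    """Wilson-Hilferty approximation to the chi-squared quantile;
    adequate for the moderate degrees of freedom arising here."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def error_spectrum_interval(g, dof, beta=0.95):
    """Approximate 100*beta percent interval for f(lambda), assuming the
    estimate g is distributed as f(lambda) * chi2_dof / dof."""
    lower = dof * g / chi2_ppf((1.0 + beta) / 2.0, dof)
    upper = dof * g / chi2_ppf((1.0 - beta) / 2.0, dof)
    return lower, upper
```

The interval is asymmetric about the estimate, as the interval for a variance must be; taking logarithms, as the text notes, yields an interval of constant width for log f(λ).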
If we let C_jj denote the jth diagonal entry of
and W_j denote C_jj g_εε^(T)(λ), then following the discussion of Section 6.2 a 100β percent confidence region for Re A_j^(T)(λ), Im A_j^(T)(λ) may be determined from the inequality
This region is considered in Akaike (1965) and Groves and Hannan (1968). If a 100β percent simultaneous confidence region is desired for all the A_j(λ), j = 1, ..., r, then following Exercise 6.14.17 we can consider the region
If we set
then the region (6.9.9) is approximately equivalent to the region
giving a simultaneous confidence region for the real-valued gains and phases. Regions of these forms are considered in Goodman (1965) and Bendat and Piersol (1966). The exact procedures based on (6.2.19) and (6.2.21) may also be of use in constructing separate intervals for |A_j(λ)| or φ_j(λ). They involve approximating the distribution of
by a noncentral F with degrees of freedom 2, 2(2m + 1 - r) and noncentrality |A_j(λ)|²/2 and on approximating the distribution of
by a t_{2(2m+1-r)} distribution and then finding intervals by algebra.
On occasion we might be interested in examining the hypothesis A(λ) = 0. This may be carried out by means of analogs of the statistics (6.2.9) and (6.2.10), namely
and
In the case A(λ) = 0, (6.9.14) is distributed asymptotically as F_{2,2(2m+1-r)} and the latter statistic as
respectively.
We now turn to the problem of setting confidence limits for the entries of a^(T)(u). The investigations of Section 6.8 suggest the evaluation of the statistic
Let A_jj^(T) signify the jth diagonal entry of A^(T). Theorem 6.8.4 now suggests
6.10 A WORKED EXAMPLE
As a first example we investigate relations between the series B(t) of monthly mean temperatures in Berlin and the series V(t) of monthly mean temperatures in Vienna. Because these series have such definite annual variation we first adjust them seasonally. We do this by evaluating the mean value for each month along the course of each series and then subtracting that mean value from the corresponding month's values. If Y(t) denotes the adjusted series for Berlin, then it is given by
Figure 6.10.1 Seasonally adjusted series of monthly mean temperatures in °C at Berlin for the years 1920-1930.
Figure 6.10.2 Seasonally adjusted series of monthly mean temperatures in °C at Vienna for the years 1920-1930.
as an approximate 100β percent confidence interval for a_j(u). Simultaneous regions for a_j(u_1), ..., a_j(u_J) may be constructed from (6.9.18) using Bonferroni's inequality; see Miller (1966).
j = 0, ..., 11; k = 0, ..., K - 1 and K = T/12. Let X(t) likewise denote the series of adjusted values for Vienna. These series are given in Figures 6.10.1 and 6.10.2 for 1920-1930. The original series are given in Figure 1.1.1.
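The seasonal adjustment just described, subtracting each calendar month's mean, might be coded as follows. This is a sketch with our own naming; it assumes the series starts in January and discards any incomplete final year.

```python
import numpy as np

def seasonally_adjust(series):
    """Y(12k + j) = B(12k + j) - (1/K) * sum over k of B(12k + j):
    subtract from each value the mean of its calendar month."""
    x = np.asarray(series, dtype=float)
    K = len(x) // 12                      # number of complete years
    x = x[:12 * K].reshape(K, 12)         # rows are years, columns are months
    return (x - x.mean(axis=0)).ravel()
```

By construction each month of the adjusted series has mean exactly 0 over the K years, so the deterministic annual cycle is removed before the spectral calculations.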
The period for which we take these temperature series is 1780-1950. We determine the various statistics in the manner of Section 6.4. In fact we take T = 2048 and so are able to evaluate the required discrete Fourier transforms by means of the Fast Fourier Transform Algorithm. In forming the statistics f_YY^(T)(λ), f_YX^(T)(λ), f_XX^(T)(λ) we take m = 10.
The results of the calculations are recorded in a series of figures. Figure 6.10.3 is a plot of log_10 f_YY^(T)(λ) and log_10 g_εε^(T)(λ), the first being the upper curve. If we use expressions (5.6.15) and (6.6.12) we find that the asymptotic standard errors of these values are both .095 for λ ≢ 0 (mod π). Figure 6.10.4 is a plot of Re A^(T)(λ), which fluctuates around the value .85; Figure 6.10.5 is a plot of Im A^(T)(λ), which fluctuates around 0; Figure 6.10.6 is a plot of G^(T)(λ), which fluctuates around .9; Figure 6.10.7 is a plot of φ^(T)(λ), which fluctuates around 0; Figure 6.10.8 is a plot of |R_YX^(T)(λ)|², which fluctuates around .7. Remember that this statistic is a measure of the degree to which Y is determinable from X in a linear manner. Figure 6.10.9 is a plot of a^(T)(u) for |u| ≤ 50. Following (6.8.7) the asymptotic standard error
Figure 6.10.3 An estimate of the power spectrum of Berlin temperatures and an estimate of the error spectrum after fitting Vienna temperatures for the years 1780-1950.
Figure 6.10.4 Re A^(T)(λ), an estimate of the real part of the transfer function for fitting Berlin temperatures by Vienna temperatures.
of this statistic is .009. The value of a^(T)(0) is .85. The other values are not significantly different from 0.
Our calculations appear to suggest the relation
where the power spectrum of ε(t) has the form of the lower curve in Figure 6.10.3. We fitted the instantaneous relation by least squares and found the simple regression coefficient of Y(t) on X(t) to be .81. If we assume the ε(t) are independent and identically distributed, then the estimated standard error of this last is .015. The estimated error variance is 1.57.
Figure 6.10.5 Im A^(T)(λ), an estimate of the imaginary part of the transfer function for fitting Berlin temperatures by Vienna temperatures.
As a second example of the techniques of this chapter we present the results of a frequency regression of the series of monthly mean temperatures recorded at Greenwich on the monthly mean temperatures recorded at the thirteen other locations listed in Table 1.1.1. We prefilter these series by removing monthly means and a linear trend. Figure 1.1.1 presents the original data here.
We form estimates in the manner of (6.4.1) to (6.4.5) with m = 57. The Fourier transforms required for these calculations were computed using a Fast Fourier Transform Algorithm with T = 2048. Figure 6.10.10 presents G_j^(T)(λ), φ_j^(T)(λ) for j = 1, ..., 13; Figure 6.10.11 presents log_10 g_εε^(T)(λ); Figure 6.10.12 presents |R_YX^(T)(λ)|² as defined by (6.4.11). The power spectrum of Greenwich is estimated in Figure 7.8.8.
Table 6.10.1 gives the results of an instantaneous multiple regression of the Greenwich series on the other thirteen series. The estimated error variance of this analysis is .269. The squared coefficient of multiple correlation of the analysis is .858.
The estimated gains, G_j^(T)(λ), appear to fluctuate about horizontal levels as functions of λ. The highest levels correspond to Edinburgh, Basle, and De Bilt respectively. From Table 6.10.1 these are the stations having the
Figure 6.10.6 G^(T)(λ), an estimate of the amplitude of the transfer function for fitting Berlin temperatures by Vienna temperatures.
Figure 6.10.7 φ^(T)(λ), an estimate of the phase of the transfer function for fitting Berlin temperatures by Vienna temperatures.
Figure 6.10.8 |R_YX^(T)(λ)|², an estimate of the coherence of Berlin and Vienna temperatures for the years 1780-1950.
Figure 6.10.9 a^(T)(u), an estimate of the filter coefficients for fitting Berlin temperatures by Vienna temperatures.
Figure 6.10.10 Estimated gains and phases for fitting seasonally adjusted Greenwich monthly mean temperatures by similar temperatures at thirteen other stations for the years 1780-1950.
Figure 6.10.11 log_10 g_εε^(T)(λ), the logarithm of the estimated error spectrum for fitting Greenwich temperatures by those at thirteen other stations.
Figure 6.10.12 |R_YX^(T)(λ)|², an estimate of the multiple coherence of Greenwich temperatures with those at thirteen other stations.
Table 6.10.1. Regression Coefficients of Greenwich on Other Stations

Location      Sample Regression Coefficient   Estimated Standard Error
Vienna        -.071                           .021
Berlin        -.125                           .023
Copenhagen     .152                           .022
Prague        -.040                           .010
Stockholm     -.041                           .016
Budapest      -.048                           .019
De Bilt        .469                           .022
Edinburgh      .305                           .014
New Haven      .053                           .009
Basle          .338                           .016
Breslau        .030                           .017
Vilna         -.024                           .009
Trondheim     -.010                           .013
largest sample regression coefficients, but in the order De Bilt, Basle, and Edinburgh. The estimated phases φ_j^(T)(λ) corresponding to these stations are each near constant at 0, suggesting there is no phase lead or lag and the relationship is instantaneous for these monthly values. As the estimated gains of the other stations decrease, the estimated phase function is seen to become more erratic. This was to be expected in view of expression (6.6.9) for the asymptotic variance of the phase estimate. Also, the estimated gain for New Haven, Conn., is least; this was to be expected in view of its great distance from Greenwich.
The estimated multiple coherence, |R_YX^(T)(λ)|², is seen to be near constant at the level .87. This is close to the value .858 obtained in the instantaneous multiple regression analysis. Finally, the estimated error spectrum, g_εε^(T)(λ), is seen to fall off steadily as λ increases.
6.11 FURTHER CONSIDERATIONS
We turn to an investigation of the nature of the dependence of the various results we have obtained on the independent series X(t). We first consider the bias of A^(T)(λ). From expressions (6.4.8) and (6.5.14) we see that the expected value of A^(T)(λ) is primarily a matrix weighted average of A(α) with weights depending on f_XX^(T)(α). From the form of the expressions (6.4.8) and (6.5.14) it would be advantageous if we could arrange that the function f_XX^(T)(α) be near constant in α and have off-diagonal terms near 0. Near 0 off-diagonal terms reduce the entanglement of the components of A(α). Continuing, an examination of the error term in (6.4.8) suggests that the weighted average term will dominate in the case that ||f_XX^(T)(λ)^(-1)|| is small, that is, f_XX^(T)(λ) is far from being singular.
Next we consider the asymptotic second-order properties of A^(T)(λ). Expression (6.6.4) and the results of Theorem 6.4.2 indicate that, in order to reduce the asymptotic variances of the entries of A^(T)(λ), if it is possible we should select an X(t), t = 0, ±1, ... such that the diagonal entries of f_XX^(T)(λ) are large. Suppose that f_jj^(T)(λ), j = 1, ..., r, the diagonal entries of f_XX^(T)(λ), are given. Exercise 6.14.18 suggests that approximately
and that equality is achieved in the case that the off-diagonal elements are 0. We are again led to try to arrange that the off-diagonal elements of f_XX^(T)(λ) be near 0 and that its diagonal elements be large.
An additional advantage accrues from near 0 off-diagonal elements. From (6.6.4) we see that if they are near 0, then the statistics A_j^(T)(λ), A_k^(T)(λ), 1 ≤ j < k ≤ r, will be nearly uncorrelated and asymptotically nearly independent. Their interpretation and approximate properties will be more elementary.
In order to obtain reasonable estimates of A(λ), -∞ < λ < ∞, we have been led to seek an X(t), t = 0, ±1, ... such that f_XX^(T)(α) is near constant in α, has off-diagonal terms near 0, and has large diagonal terms. We will see later that a choice of X(t) likely to lead to such an f_XX^(T)(α) is a realization of a pure noise process having independent components with large variance.
On the bulk of occasions we will be presented with X(t), t = 0, ±1, ... as a fait accompli; however, as we have seen in Section 6.1, we can alter certain of the properties of X(t) by filtering. We could evaluate
t = 0, ..., T - 1 for some r × r filter {c(u)} and then estimate the transfer function A_1(λ) relating Y(t) to X_1(t), t = 0, ±1, .... Let this estimate be A_1^(T)(λ). As an estimate of A(λ) we now consider
From (6.1.10) and (6.5.14)
suggesting that we should seek a filter C(λ) such that A(λ)C(λ)^(-1) does not vary much with λ. Applying such an operation is called prefiltering. It can be absolutely essential even in simple situations.
Consider a common relationship in which Y(t) is essentially a delayed version of X(t); specifically suppose
In terms of the previous discussion, we are led to prefilter the data using the transfer function C(λ) = exp(-iλv), that is, to carry out the spectral calculations with the series X_1(t) = X(t - v) instead of X(t). We would then estimate A(λ) here by exp(-iλv) f_{YX_1}^(T)(λ) / f_{X_1X_1}^(T)(λ). This procedure was suggested by Darzell and Pearson (1960) and Yamanouchi (1961); see Akaike and Yamanouchi (1962). In practice the lag v must be guessed at before performing these calculations. One suggestion is to take the lag that maximizes the magnitude of the cross-covariance of the series Y(t) and X(t).
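The suggestion of the last sentence, guessing the delay v as the lag that maximizes the magnitude of the sample cross-covariance, can be sketched directly. The function name and the max_lag parameter are our own.

```python
import numpy as np

def guess_lag(x, y, max_lag=50):
    """Return the lag u maximizing |c_YX(u)|, the sample cross-covariance
    between Y(t) and X(t - u), over |u| <= max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    T = len(x)
    best_u, best_c = 0, -1.0
    for u in range(-max_lag, max_lag + 1):
        if u >= 0:
            c = np.dot(y[u:], x[:T - u]) / T   # sum of Y(t) X(t - u) over t >= u
        else:
            c = np.dot(y[:T + u], x[-u:]) / T
        if abs(c) > best_c:
            best_u, best_c = u, abs(c)
    return best_u
```

For Y(t) = X(t - v) with X near white noise, the cross-covariance peaks sharply at u = v; the guessed lag is then used to align the series before the spectral calculations.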
In Section 7.7 we will discuss a useful procedure for prefiltering in the case of vector-valued X(t). It is based on using least squares to fit a preliminary time domain model and then carrying out the spectral analysis with the residuals of the fit.
We have so far considered means of improving the estimate A^(T)(λ). The other estimates g_εε^(T)(λ), μ^(T), a^(T)(u) are based on this estimate in an intimate manner. We would therefore expect any improvement in A^(T)(λ) to result in an improvement of these additional statistics. In general terms we feel that the nearer the relation between Y(t) and X(t), t = 0, ±1, ... is to multiple regression of Y(t) on X(t) with pure noise errors, the better the estimates will be. All prior knowledge should be used to shift the relation to one near this form.
for some v. In this case
and so for example
In the case that v is large, cos 2πsv/T fluctuates rapidly in sign as s varies. Because of the smoothing involved, expression (6.11.7) will be near 0, rather than the desired
A few comments on the computation of the statistics appear in order. The estimates have been based directly on the discrete Fourier transforms of the series involved. This was done to make their sampling properties more elementary. However, it will clearly make sense to save on computations by evaluating these Fourier transforms using the Fast Fourier Transform Algorithm. Another important simplification results from noting that the estimates of Section 6.4 can be determined directly from a standard multiple regression analysis involving real-valued variates. Consider the case λ ≢ 0 (mod π). Following the discussion of Section 6.3, the model (6.1.1) leads to the approximate relation
s = 0, ±1, ..., ±m for 2πs(T)/T near λ. In terms of real-valued quantities this may be written
s = 0, ±1, ..., ±m. Because the values Re d_ε^(T)(2π[s(T) + s]/T), Im d_ε^(T)(2π[s(T) + s]/T), s = 0, ±1, ..., ±m are approximately uncorrelated πT f_εε(λ) variates, (6.11.10) has the approximate form of a multiple regression analysis with regression coefficient matrix
and error variance πT f_εε(λ). Estimates of the parameters of interest will therefore drop out of a multiple regression analysis taking the Y matrix as
and the X matrix as
and so forth in (6.1.1).
6.12 A COMPARISON OF THREE ESTIMATES OF THE IMPULSE RESPONSE
Suppose the model of this chapter takes the more elementary form
which is of the form of (6.12.2) with expanded dimensions.
Estimates in the case λ ≡ 0 (mod π) follow in a similar manner.
We remark that the model (6.1.1) is of use even in the case that X(t), t = 0, ±1, ... is not vector valued. For example, if one wishes to investigate the possibility of a nonlinear relation between real-valued series Y(t) and X(t), t = 0, ±1, ..., one can consider setting
for (6.12.1) may be rewritten
for some finite m, n. In this case the dependence of Y(t) on the series X(t), t = 0, ±1, ... is of finite duration only. We turn to a comparison of three plausible estimates of the coefficients a(-m), ..., a(0), ..., a(n) that now suggest themselves. These are the estimate of Section 6.8, a least squares estimate, and an asymptotically efficient linear estimate.
We begin by noting that it is enough to consider a model of the simple form
We note that both of the estimates (6.12.4) and (6.12.9) are weighted averages of the A^(T)(2πp/P) values. This suggests that we should consider, as a further estimate, the best linear combination of these values. Now Exercise 6.14.11 and expression (6.12.5) indicate that this is given approximately by
Using (6.12.5), the covariance matrix of (6.12.9) will be approximately
The estimate of a of (6.12.2) suggested by the material of Section 6.8 is
From (6.6.4)
and so the covariance matrix of a_1^(T) is approximately
The particular form of the model (6.12.2) suggests that we should also consider the least squares estimate found by minimizing
with respect to μ and a. This estimate is
We may approximate this estimate by
6.13 USES OF THE PROPOSED TECHNIQUE
The statistics of the present chapter have been calculated by many researchers in different situations. These workers found themselves considering a series Y(t), t = 0, ±1, ... which appeared to come about from a series X(t), t = 0, ±1, ... in a linear time invariant manner. The latter is the principal implication of the model (6.1.1). These researchers calculated various of the statistics A^(T)(λ), G^(T)(λ), φ^(T)(λ), g_εε^(T)(λ), |R_YX^(T)(λ)|², a^(T)(u).
An important area of application has been in the field of geophysics. Robinson (1967a) discusses the plausibility of a linear time invariant model relating a seismic disturbance X(t) with Y(t) its recorded form at some station. Tukey (1959c) relates a seismic record at one station with the seismic record at another station. Other references to applications in seismology include Haubrich and MacKenzie (1965) and Pisarenko (1970). Turning to the field of oceanography, Hamon and Hannan (1963) and Groves and Hannan (1968) consider relations between sea level and pressure and wind stress at several stations. Groves and Zetler (1964) relate sea levels at San Francisco with those at Honolulu. Munk and Cartwright (1966) take X(t) to be a theoretically specified mathematical function while Y(t) is the series of tidal height. Kawashima (1964) considers the behavior of a boat on an ocean via cross-spectral analysis. Turning to the field of meteorology, Panofsky (1967) presents the results of spectral calculations for a variety of series including wind velocity and temperature. Madden (1964) considers certain electromagnetic data. Rodriguez-Iturbe and Yevjevich (1968) take Y(t) to be rainfall recorded at a number of stations in the U.S.A. and X(t) to be
with approximate covariance matrix
In view of the source of a_3^(T), the matrix differences (6.12.6) - (6.12.12) and (6.12.10) - (6.12.12) will both be non-negative definite. In the case that g_εε^(T)(λ) is near constant, as would be the case were the error series ε(t) white noise, and T not too small, formulas (6.12.9) and (6.12.11) indicate that the least squares estimate a_2^(T) will be near the "efficient" estimate a_3^(T). In the case that f_XX^(T)(λ)^(-1) g_εε^(T)(λ) is near constant, formulas (6.12.4) and (6.12.11) indicate that the estimate a_1^(T) will be near the estimate a_3^(T).
Hannan (1963b, 1967a, 1970) discusses the estimates a_1^(T), a_3^(T) in the case of stochastic X(t), t = 0, ±1, .... Grenander and Rosenblatt (1957), Rosenblatt (1959), and Hannan (1970) discuss the estimates a_2^(T), a_3^(T) in the case of fixed X(t), t = 0, ±1, ....
relative sunspot numbers. Brillinger (1969a) takes Y(t) to be monthly rainfall in Santa Fe, New Mexico, and X(t) to be monthly relative sunspot numbers.
Lee (1960) presents arguments to suggest that many electronic circuits behave in a linear time invariant manner. Akaike and Kaneshige (1964) take Y(t) to be the output of a nonlinear circuit, X(t) to be the input of the circuit, and evaluate certain of the statistics discussed in this chapter.
Goodman et al. (1961) discuss industrial applications of the techniques of this chapter, as do Jenkins (1963), Nakamura (1964), and Nakamura and Murakami (1964). Takeda (1964) uses cross-spectral analysis in an investigation of aircraft behavior.
As examples of applications in economics we mention the books by Granger (1964) and Fishman (1969). Nerlove (1964) uses cross-spectral analysis to investigate the effectiveness of various seasonal adjustment procedures. Naylor et al. (1967) examine the properties of a model of the textile industry.
Results discussed by Khatri (1965b) may be used to construct a test of the hypothesis Im A(λ) = 0, -∞ < λ < ∞, that is, a(u) = a(-u), u = 0, ±1, .... The latter would occur if the relation between Y(t) and X(t) were time reversible.
A number of interesting physical problems lead to a consideration of integral equations of the form
to be solved for f(t), given g(t), μ, b(t). A common means of solution is to set down a discrete approximation to the equation, such as
which is then solved by matrix inversion. Suppose that we rewrite expression (6.13.2) in the form
with the series ε(t) indicating the error resulting from having made a discrete approximation, with X(0) = μ + b(0), X(u) = b(u), u ≠ 0, and Y(t) = g(t). The system (6.13.3), which we have been considering throughout this chapter, suggests that another way of solving the system (6.13.1) is to use cross-spectral analysis and to take a^(T)(u), given by (6.8.2), as an approximation to the desired f(t).
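The frequency-domain idea behind this suggestion can be illustrated on the circular analogue of (6.13.2): when g is exactly a convolution of f with b, division of Fourier transforms recovers f. This is only a toy deconvolution sketch, not the cross-spectral estimate a^(T)(u) itself, and the small ridge eps is our own guard against division by near-zero transform values.

```python
import numpy as np

def deconvolve(g, b, eps=1e-8):
    """Recover f from g(t) = sum over u of f(u) b(t - u), treated as a
    circular convolution, by division in the frequency domain."""
    G = np.fft.fft(g)
    B = np.fft.fft(b)
    F = G * np.conj(B) / (np.abs(B) ** 2 + eps)   # regularized division
    return np.real(np.fft.ifft(F))
```

When the transform of b has small values at some frequencies, this naive division is unstable; the smoothing built into a^(T)(u) and the choice of eps play the role of regularization there.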
So far in this chapter we have placed principal emphasis on the estimation of A(λ) and a(u). However, we next mention a situation wherein the parameter of greatest interest is the error spectrum f_εε(λ). The model that has been under consideration is
with S \u\ \a(u)\ < oo. Suppose that we think of a(f), t = 0, ±1, . . . asrepresenting a transient signal of brief duration. Suppose we define
Expression (6.13.4) now takes the simpler form
The observed series, Y(t), is the sum of a series of interest, μ + ε(t), and a possibly undesirable transient series a(t). The procedures of this chapter provide a means of constructing an estimate of f_εε(λ), the power spectrum of interest. We simply form g_εε^(T)(λ), taking the observed values Y(t), t = 0, ..., T − 1 and X(t) as given by (6.13.5). This estimate should be sensible even when brief undesirable transients get superimposed on the series of interest. In the case that the asymptotic procedure of Section 6.4 is adopted, the distribution of g_εε^(T)(λ) is approximately a multiple of a chi-squared with 4m degrees of freedom. This is to be compared with the 4m + 2 degrees of freedom the direct estimate f_YY^(T)(λ) would have. Clearly not too much stability has been lost, in return for the gained robustness of the estimate.
6.14 EXERCISES
6.14.1 Let the conditions of Theorem 6.2.1 be satisfied, but with E{εε^τ} = σ^2 I replaced by E{εε^τ} = Σ. Show that E â = a as before, but now
6.14.2 Let the conditions of Theorem 6.2.1 be satisfied, but with E{εε^τ} = σ^2 I replaced by E{εε^τ} = σ^2 V. Prove that
is minimized by
Show that b is unbiased with covariance matrix σ^2 (XV^-1 X^τ)^-1. Show that the least squares estimate â = YX^τ(XX^τ)^-1 remains unbiased, but has covariance matrix σ^2 (XX^τ)^-1 XVX^τ (XX^τ)^-1.
6.14.3 In the notation of Theorem 6.2.3, prove that the unbiased, minimum variance linear estimate of α^τ a, for α a k vector, is given by α^τ â.
228 A STOCHASTIC SERIES AND SEVERAL DETERMINISTIC SERIES
6.14.4 Let w be a complex-valued random variable. Prove that
var |w| ≤ cov{w, w̄}.
6.14.5 In the notation of Theorem 6.2.3, let |R|^2 = âXX^τ â^τ / YY^τ. Prove that 0 ≤ |R|^2 ≤ 1. Under the conditions of Theorem 6.2.4 prove that (n − k)|R|^2 / [k(1 − |R|^2)] is distributed as noncentral F with degrees of freedom 2k and 2(n − k) and noncentrality parameter aXX^τ a^τ / σ^2.
6.14.6 Under the conditions of either Theorem 6.4.2 or Theorem 6.7.1, prove that φ^(T)(λ) is asymptotically uniform on (0, 2π] if A(λ) = 0.
6.14.7 Prove that the following is a consistent definition of asymptotic normality. A sequence of r vector-valued random variables X_n is asymptotically normal with mean θ_n + Ψ_n y and covariance matrix Σ_n = Ψ_n Σ Ψ_n^τ if Ψ_n^-1 (X_n − θ_n) tends, in distribution, to N(y, Σ), where θ_n is a sequence of r vectors and Ψ_n a sequence of nonsingular r × r matrices.
6.14.8 With the notation of Exercise 6.14.2, show that (XX^τ)^-1 XVX^τ (XX^τ)^-1 ≥ (XV^-1 X^τ)^-1. [A ≥ B here means A − B is non-negative definite.]
6.14.9 Show that f_YX^(T)(λ) of (6.5.6) may be written in the form
6.14.10 Let Ỹ(t), t = 0, ±1, ... denote the series whose finite Fourier transform is d_Y^(T)(λ) − A^(T)(λ) d_X^(T)(λ). Prove that I_ỸỸ^(T)(λ) = g_εε^(T)(λ), that is, the estimate of the error spectrum may be considered to be a power spectrum based on a series of residuals.
6.14.11 Let Y_j, j = 1, ..., J be 1 × r matrix-valued random variables with EY_j = β and E{(Y_j − β)^τ (Y_k − β)} = δ{j − k} V_j for 1 ≤ j ≤ k ≤ J. Prove that the best linear unbiased estimate of β is given by
Hint: Use Exercises 6.14.2 and 6.14.8. Exercise 1.7.6 is the case r = 1. Show that E{(β̂ − β)^τ (β̂ − β)} = [Σ_j V_j^-1]^-1.
6.14.12 Suppose the conditions of Theorem 6.2.4 hold. Prove that if two columns of the matrix X are orthogonal, then the corresponding entries of â are statistically independent.
where w(T)(u) is given by
and c_YX^(T)(u) is given by
6.14.13 Demonstrate that g_εε^(T)(λ), the estimate (6.4.5) of the error spectrum, is non-negative.
6.14.14 If |R_YX^(T)(λ)|^2 is defined by (6.4.11), show that it may be interpreted as the proportion of the sample power spectrum of the Y(t) values explained by the X(t) values.
6.14.15 Show that the statistics A^(T)(λ), g_εε^(T)(λ) do not depend on the values of the sample means c_X^(T), c_Y^(T).
6.14.16 Prove that f_YY^(T)(λ) ≥ g_εε^(T)(λ) with the definitions of Section 6.4.
6.14.17 Let α be a k vector. Under the conditions of Theorem 6.2.4 show that
provides a 100β percent multiple confidence region for all linear combinations of the entries of a. (This region is a complex analog of the Scheffé region; see Miller (1966) p. 49.)
6.14.18 We adopt the notation of Theorem 6.2.3. Let X_j denote the jth row of X and let X_j X_j^τ = C_j, j = 1, ..., k with C_1, ..., C_k given. Prove that
and that the minimum is achieved when X_j X_k^τ = 0, k ≠ j, that is, when the rows of X are orthogonal. (For the real case of this result see Rao (1965) p. 194.)
6.14.19 Let w be a N_1^c(μ, σ^2) variate and let R = |w|, ρ = |μ|, φ̂ = arg w, φ = arg μ. Prove that the density function of R is
where I_0(x) is the Bessel function of the first kind of order 0. Prove
for v > 0, where 1F1(a; b; x) is the confluent hypergeometric function. Evaluate ER if ρ = 0. Also prove that the density function of φ̂ is
See Middleton (1960) p. 417.
6.14.20 Let
where ε is an s × n matrix whose columns are independent N_s^c(0, Σ) variates, a is an s × r matrix of unknown complex parameters, x is an r × n matrix of known complex entries, and y is an s × n matrix of known complex variates. Let
where â = yx^τ(xx^τ)^-1, provided the indicated inverses exist.
6.14.22 Under the conditions of Theorem 6.6.1 prove that
and
Prove that vec â is N^c(vec a, Σ ⊗ (xx^τ)^-1) and that Σ̂ is independent of â and distributed as (n − r)^-1 W_s^c(n − r, Σ). The operations vec and ⊗ are defined in Section 8.2.
6.14.21 Let x and y be given s × n and r × n matrices respectively with complex entries. For given c × s C, r × u U, and c × u T, show that the s × r matrix a, constrained by CaU = T, that minimizes
is given by
6.14.23 Under the conditions of Theorem 6.4.2 prove that
tends to (2m + 1)^-1 f_εε(λ) χ^2_{2r} independently of g_εε^(T)(λ) for λ ≢ 0 (mod π). Develop a corresponding result for the case of λ ≡ 0 (mod π).
6.14.24 Suppose that in the formation of (6.4.2) one takes m = T − 1. Prove that the resulting A^(T)(λ) is
where c_Y^(T) and c_X^(T) are the sample means of the Y and X values. Relate this result to the multiple regression coefficient of Y(t) on X(t).
6.14.25 Under the conditions of Theorem 6.4.2 and if A(λ) = 0, prove that for λ ≢ 0 (mod π),
tends in distribution to
where F has an F distribution with degrees of freedom 2(2m + 1 − r) and 2r.
6.14.32 Suppose that we consider the full model (6.12.3), rather than the simpler form (6.12.2). Let [a_j^(T)(−m) ⋯ a_j^(T)(m)], j = 1, 2, 3, be the analogs here of the estimates â_1^(T), â_2^(T), â_3^(T) of Section 6.12. Show that the covariances cov{a_j^(T)(u), a_j^(T)(v)}, j = 1, 2, 3, are approximately B_T^-1 T^-1 2π ∫ W(α)^2 dα times
6.14.31 Under the conditions of Theorem 6.6.1 show that
6.14.26 Suppose that Y(t), ε(t), t = 0, ±1, ... are s vector-valued stochastic series. Let μ denote an s vector and a(t) denote an s × r matrix-valued function. Let X(t), t = 0, ±1, ... denote an r vector-valued fixed series. Suppose
Develop estimates A^(T)(λ) of the transfer function of {a(u)} and g_εε^(T)(λ) of the spectral density matrix of ε(t); see Brillinger (1969a).
6.14.27 Suppose that f_XX^(T)(λ) tends to f_XX(λ) uniformly in λ as T → ∞ and suppose that ‖f_XX(λ)‖, ‖f_XX(λ)^-1‖ < K, −∞ < λ < ∞, for finite K. Prove that Assumption 6.5.2 is satisfied.
6.14.28 Prove that f_XX^(T)(λ) as defined by (6.5.5) is non-negative definite if W(α) ≥ 0. Also prove that g_εε^(T)(λ) given by (6.5.9) is non-negative under this condition.
6.14.29 Let X_1(t) = Σ_u b(t − u)X(u) where {b(u)} is a summable r × r filter with transfer function B(λ). Suppose that B(λ) is nonsingular, −∞ < λ < ∞. Prove that X_1(t), t = 0, ±1, ... satisfies Assumption 6.5.2 if X(t), t = 0, ±1, ... satisfies Assumption 6.5.2.
6.14.30 Suppose Y(t) and X(t) are related as in (6.1.1). Suppose that X_j(t) is increased to X_j(t) + exp{iλt} and the remaining components of X(t) are held fixed. Discuss how this procedure is useful in the interpretation of A_j(λ).
7
ESTIMATING THE SECOND-ORDER SPECTRA OF VECTOR-VALUED SERIES
7.1 THE SPECTRAL DENSITY MATRIX AND ITS INTERPRETATION
In this chapter we extend the results of Chapter 5 to cover the case of the joint behavior of second-order statistics based on various components of a vector-valued stationary time series.
Let X(t), t = 0, ±1, ... be an r vector-valued series with component series X_a(t), t = 0, ±1, ... for a = 1, ..., r. Suppose
Indicate the individual entries of c_X by c_a, a = 1, ..., r, so c_a = EX_a(t) is the mean of the series X_a(t), t = 0, ±1, .... Denote the entry in row a, column b of c_XX(u) by c_ab(u), a, b = 1, ..., r, so c_ab(u) is the cross-covariance function of the series X_a(t) with the series X_b(t). Note that
Supposing
we may define f_XX(λ), the spectral density matrix at frequency λ of the series X(t), t = 0, ±1, ..., by
The definition of the spectral density matrix may be inverted to obtain
f_ab(λ), the entry in row a and column b of f_XX(λ), is seen to be the power spectrum of the series X_a(t) if a = b and to be the cross-spectrum of the series X_a(t) with the series X_b(t) if a ≠ b. f_XX(λ) has period 2π with respect to λ. Also, because the entries of c_XX(u) are real-valued
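As a numerical sketch of the definition f_XX(λ) = (2π)^-1 Σ_u c_XX(u) e^{-iλu}: the bivariate MA(1) model and the matrix Θ below are invented for illustration, and the Hermitian and 2π-periodicity properties noted in the text can be checked directly:

```python
import numpy as np

# Bivariate MA(1): X(t) = eps(t) + Theta eps(t-1), eps white with identity covariance.
# Its cross-covariances are c(0) = I + Theta Theta^T, c(1) = Theta, c(-1) = Theta^T.
Theta = np.array([[0.5, 0.2], [0.1, 0.3]])
c = {0: np.eye(2) + Theta @ Theta.T, 1: Theta, -1: Theta.T}

def f_XX(lam):
    # f_XX(lambda) = (2*pi)^-1 * sum_u c_XX(u) * exp(-i*lambda*u)
    return sum(c[u] * np.exp(-1j * lam * u) for u in (-1, 0, 1)) / (2 * np.pi)

lam = 0.7
F = f_XX(lam)
print(np.allclose(F, F.conj().T))               # Hermitian → True
print(np.allclose(F, f_XX(lam + 2 * np.pi)))    # period 2*pi → True
```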
for an s × r matrix-valued filter with transfer function
then the spectral density matrix of the series Y(t) is given by
Expressions (7.1.9) and (7.1.10) imply that the covariance matrix of the s vector-valued variate Y(t) is given by
With a goal of obtaining an interpretation of f_XX(λ) we consider the implication of this result for the 2r vector-valued filter with transfer function
from (7.1.3). The matrix f_XX(λ) is Hermitian from the last expression. These properties mean that the basic domain of definition of f_XX(λ) may be taken to be the interval [0, π]. We have already seen in Theorem 2.5.1 that f_XX(λ) is non-negative definite, f_XX(λ) ≥ 0, for −∞ < λ < ∞, extending the result that the power spectrum of a real-valued series is non-negative.
Example 2.8.2 shows the effect of filtering on the spectral density matrix.Suppose
234 ESTIMATING SECOND-ORDER SPECTRA
and = 0 for all other essentially different frequencies. (For (7.1.12) we are using the definition of Theorem 2.7.1 of the filter.) If Δ is small, the output of this filter is the 2r vector-valued series
involving the component of frequency λ discussed in Section 4.6. By inspection, expression (7.1.11) takes the approximate form
and the approximate form
Both approximations lead to the useful interpretation of Re f_XX(λ) as proportional to the covariance matrix of X(t, λ) (the component of frequency λ in X(t)), and the interpretation of Im f_XX(λ) as proportional to the cross-covariance matrix of X(t, λ) with its Hilbert transform X^H(t, λ). Re f_ab(λ), the co-spectrum of X_a(t) with X_b(t), is proportional to the covariance of the component of frequency λ in the series X_a(t) with the corresponding component in the series X_b(t). Im f_ab(λ), the quadrature spectrum, is proportional to the covariance of the Hilbert transform of the component of frequency λ in the series X_a(t) with the component of frequency λ in the series X_b(t). Being covariances, both of these are measures of degree of linear relationship.
When interpreting the spectral density matrix, f_XX(λ), it is also useful to recall the second-order properties of the Cramér representation. In Theorem 4.6.2 we saw that X(t) could be represented as
where the function Z_X(λ) is stochastic with the property
where η(·) is the 2π periodic extension of the Dirac delta function. From (7.1.17) it is apparent that f_XX(λ) may be interpreted as being proportional to the covariance matrix of the complex-valued differential dZ_X(λ). Both interpretations will later suggest plausible estimates for f_XX(λ).
7.2 SECOND-ORDER PERIODOGRAMS
Suppose that the stretch X(t), t = 0, ..., T − 1 of T consecutive values of an r vector-valued series is available for analysis and the series is stationary with mean function c_X and spectral density matrix f_XX(λ), −∞ < λ < ∞. Suppose also we are interested in estimating f_XX(λ). Consider basing an estimate on the finite Fourier transform
where h_a(t) is a tapering function vanishing for |t| sufficiently large, a = 1, ..., r. Following Theorem 4.4.2, this variate is asymptotically
where
and
These distributions suggest a consideration of the statistic
as an estimate of f_XX(λ) in the case λ ≠ 0, ±2π, .... The entries of I_XX^(T)(λ) are the second-order periodograms of the tapered values h_a(t/T)X_a(t), t = 0, ±1, .... This statistic is seen to have the same symmetry and periodicity properties as f_XX(λ). In connection with it we have
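A sketch of the matrix of second-order periodograms, I_XX^(T)(λ) = (2πT)^-1 d_X^(T)(λ) d_X^(T)(λ)*, with d the tapered finite Fourier transform. The white-noise data, the cosine taper, and the simple 2πT standardization (the text's tapered version standardizes using the taper) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
T, r = 256, 2
X = rng.standard_normal((T, r))
t = np.arange(T)
h = 0.5 - 0.5 * np.cos(2 * np.pi * (t + 0.5) / T)   # a cosine (Hanning-type) taper

def periodogram_matrix(lam):
    # d(lam) = sum_t h(t/T) X(t) exp(-i*lam*t), an r-vector
    d = ((h[:, None] * X) * np.exp(-1j * lam * t)[:, None]).sum(axis=0)
    # I(lam) = (2*pi*T)^-1 d(lam) d(lam)^*, a rank-one Hermitian r x r matrix
    return np.outer(d, d.conj()) / (2 * np.pi * T)

I = periodogram_matrix(2 * np.pi * 10 / T)
print(np.allclose(I, I.conj().T))   # Hermitian → True
```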
The character of the tapering function h_a(t/T) is such that its Fourier transform H_a^(T)(λ) is concentrated in the neighborhood of the frequencies λ ≡ 0, ±2π, ... for large T. It follows that in the case λ ≢ 0 (mod 2π), the final term in (7.2.7) will be of reduced magnitude for T large. The first term on the right side of (7.2.7) is seen to be a weighted average of the cross-spectrum f_ab of interest with weight concentrated in the neighborhood of λ and with relative weight determined by the tapers. In the limit we have
Corollary 7.2.1 Under the conditions of Theorem 7.2.1 and if ∫ h_a(u)h_b(u) du ≠ 0 for a, b = 1, ..., r
The estimate is asymptotically unbiased if λ ≢ 0 (mod 2π) or if c_X = 0. If c_a, c_b are far from 0, then substantial bias may be present in the estimate I_XX^(T)(λ) as shown by the term in c_a, c_b of (7.2.7). This effect may be reduced by subtracting an estimate of the mean of X(t) before forming the finite Fourier transform. We could consider the statistics
with
Theorem 7.2.1 Let X(t), t = 0, ±1, ... be an r vector-valued series with mean function EX(t) = c_X and cross-covariance function cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose
Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1 for a = 1, ..., r. Let I_XX^(T)(λ) be given by (7.2.5). Then
and then the estimate
The asymptotic form of the covariance of two entries of I_XX^(T)(λ) in the case that the series has mean 0 is indicated by
Theorem 7.2.2 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1). Let h_a(u), a = 1, ..., r, satisfy Assumption 4.3.1. Let I_XX^(T)(λ) be given by (7.2.5); then
where the remainder satisfies |R^(T)(λ, μ)| ≤ K_1 |H_a^(T)(λ)| |H_a^(T)(μ)| + K_2 |H_a^(T)(λ)| + K_3 |H_a^(T)(μ)| + K_4 for constants K_1, ..., K_4 and a = a_1, a_2, b_1, b_2, −∞ < λ, μ < ∞.
The statistical dependence of I_{a1 b1}^(T)(λ) and I_{a2 b2}^(T)(μ) is seen to fall off as the functions H_ab^(T) fall off. In the limit the theorem becomes
Corollary 7.2.2 Under the conditions of Theorem 7.2.2
In the case of untapered values, h_a(u) = 1 for 0 ≤ u < 1 and = 0 otherwise, Exercise 7.10.14 shows that we have
for frequencies λ, μ of the form 2πr/T, 2πs/T where r, s are integers with r, s ≢ 0 (mod T).
We complete the present discussion of the asymptotic properties of the matrix of second-order periodograms by indicating its asymptotic distribution.
Theorem 7.2.3 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(t), a = 1, ..., r, satisfy Assumption 4.3.1. Let
I_XX^(T)(λ) be given by (7.2.5). Suppose 2λ_j, λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Then I_XX^(T)(λ_j), j = 1, ..., J, are asymptotically independent W_r^c(1, f_XX(λ_j)) variates. Also if λ = ±π, ±3π, ..., then I_XX^(T)(λ) is asymptotically W_r(1, f_XX(λ)) independently of the previous variates.
The Wishart distribution was given in Section 4.2 with its density function and various properties. The limiting distribution of this theorem is seen to involve f_XX(λ) in a direct manner. However, being a Wishart with just 1 degree of freedom, the distribution is well spread out about f_XX(λ). Therefore I_XX^(T)(λ) cannot be considered a reasonable estimate.
It is interesting to note that the limiting distributions of Theorem 7.2.3 do not involve the particular tapering functions employed. In the limit the taper used does not matter; however, as expression (7.2.7) shows, the taper does affect the large sample bias before we actually get to the limit. Consequently, if there may be peaks close together in f_XX(λ), we should taper the data to improve the resolution.
The frequencies considered in Theorem 7.2.3 did not depend on T. The following theorem considers the asymptotic distribution in the case of a number of frequencies tending to λ as T → ∞. We revert to the untapered case in
Theorem 7.2.4 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let
for −∞ < λ < ∞. Let s_j(T) be an integer with λ_j(T) = 2πs_j(T)/T tending to λ_j as T → ∞ for j = 1, ..., J. Suppose 2λ_j(T), λ_j(T) ± λ_k(T) ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J and T sufficiently large. Then I_XX^(T)(λ_j(T)), j = 1, ..., J, are asymptotically independent W_r^c(1, f_XX(λ_j)), j = 1, ..., J. Also if λ = ±π, ±3π, ..., I_XX^(T)(λ) is asymptotically W_r(1, f_XX(λ)) independently of the previous variates.
The most important case of this theorem occurs when λ_j = λ for j = 1, ..., J. The theorem then indicates a source of J asymptotically independent estimates of f_XX(λ). The conclusions of this theorem were very much to be expected in light of Theorem 4.4.1, which indicated that the Σ_t X(t) exp{−itλ_j(T)}, j = 1, ..., J, are asymptotically independent N_r^c(0, 2πT f_XX(λ_j)) variates.
In order to avoid technical details we have made Theorem 7.2.4 refer to the untapered case. Exercise 4.8.20 and Brillinger (1970b) present results
applying to frequencies depending on T as well as to the tapered case. The essential requirement for asymptotic independence indicated by them is that the λ_j(T) − λ_k(T), 1 ≤ j < k ≤ J, do not tend to 0 too quickly.
In particular, Theorems 7.2.3 and 7.2.4 give the marginal distributions previously determined in Chapter 5 for a periodogram I_aa^(T)(λ).
The following theorem shows how we may construct L asymptotically independent estimates of f_XX(λ) in the case that the data have been tapered. We split the data into L disjoint segments of V observations, then taper and form a periodogram for each stretch.
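The splitting scheme can be sketched as follows; the white-noise data, the taper, and the normalization by the taper's sum of squares are illustrative choices, and the final averaging over segments anticipates Section 7.3:

```python
import numpy as np

rng = np.random.default_rng(1)
L, V, r = 8, 128, 2
X = rng.standard_normal((L * V, r))
v = np.arange(V)
h = 0.5 - 0.5 * np.cos(2 * np.pi * (v + 0.5) / V)   # taper supported on one stretch

def segment_estimate(lam):
    mats = []
    for l in range(L):
        seg = X[l * V:(l + 1) * V]                   # l-th disjoint stretch
        d = ((h[:, None] * seg) * np.exp(-1j * lam * v)[:, None]).sum(axis=0)
        mats.append(np.outer(d, d.conj()) / (2 * np.pi * np.sum(h ** 2)))
    return sum(mats) / L     # average of L approximately independent periodograms

F = segment_estimate(2 * np.pi * 12 / V)
print(F.shape)   # → (2, 2)
```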
Theorem 7.2.5 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1 and vanish for u < 0, u ≥ 1. Let
Figure 7.2.1 Periodogram of seasonally adjusted monthly mean temperatures at Berlin for the years 1780-1950. (Logarithmic plot.)
l = 0, ..., L − 1 where
Then the I_XX^(V)(λ, l), l = 0, ..., L − 1, are asymptotically independent
W_r^c(1, f_XX(λ)) variates if λ ≢ 0 (mod π) and asymptotically independent W_r(1, f_XX(λ)) variates if λ = ±π, ±3π, ..., as V → ∞.
Once again the limiting distribution is seen not to involve the tapers employed; however, the tapers certainly appeared in the standardization of d_X^(V)(λ, l) to form I_XX^(V)(λ, l).
Goodman (1963) introduced the complex Wishart distribution as an approximation for the distribution of spectral estimates in the case of vector-valued series. Brillinger (1969c) developed W_r^c(1, f_XX(λ)) as the limiting distribution of the matrix of second-order periodograms.
In Figures 7.2.1 to 7.2.5 we give the periodograms and cross-periodogram for a bivariate series of interest. The series X_1(t) is the seasonally adjusted series of mean monthly temperatures for Berlin (1780-1950). The series X_2(t) is the seasonally adjusted series of mean monthly temperatures for Vienna (1780-1950). Figures 7.2.1 and 7.2.2 give I_11^(T)(λ), I_22^(T)(λ), the periodograms of the series. The cross-periodogram is illustrated in the remaining figures which give Re I_12^(T)(λ), Im I_12^(T)(λ), arg I_12^(T)(λ) in turn. All of the figures are erratic, a characteristic consistent with Theorem 7.2.3, which suggested that second-order periodograms were not generally reasonable estimates of second-order spectra.
Figure 7.2.2 Periodogram of seasonally adjusted monthly mean temperatures at Vienna for the years 1780-1950. (Logarithmic plot.)
Figure 7.2.3 Real part of the cross-periodogram of temperatures at Berlin with those at Vienna.
Figure 7.2.4 Imaginary part of the cross-periodogram of temperatures at Berlin with those at Vienna.
Figure 7.2.5 Phase of the cross-periodogram of temperatures at Berlin with those at Vienna.
7.3 ESTIMATING THE SPECTRAL DENSITY MATRIX BY SMOOTHING
Theorem 7.2.4 suggests a means of constructing an estimate of f_XX(λ) with a degree of flexibility. If
then, from that theorem, for s(T) an integer with 2πs(T)/T near λ ≢ 0 (mod π), the distribution of the variates I_XX^(T)(2π[s(T) + s]/T), s = 0, ±1, ..., ±m, may be approximated by 2m + 1 independent W_r^c(1, f_XX(λ)) distributions. The preceding suggests the consideration of the estimate
A further examination of the results of the theorem suggests the form
and the form
The estimate given by (7.3.2) to (7.3.4) is seen to have the same symmetry and periodicity properties as f_XX(λ) and to be based on the values d_X^(T)(2πs/T), s ≢ 0 (mod T), of the discrete Fourier transform. In connection with it we have
Theorem 7.3.1 Let X(t), t = 0, ±1, ... be an r vector-valued series with mean function c_X and cross-covariance function c_XX(u) = cov{X(t + u), X(t)} for t, u = 0, ±1, .... Suppose
Let f_XX^(T)(λ) be given by (7.3.2) to (7.3.4). Then
and
and
The three weight functions appearing above are non-negative. The first has peaks at α = 0, ±2π, ±4π, ... and has width there of approximately 4πm/T. The second and third are also concentrated in intervals of approximate width 4πm/T about the frequencies α = 0, ±2π, ...; however, they dip at these particular frequencies. They are graphed in Figure 5.4.1 for T = 11. In any case, E f_XX^(T)(λ) should be near the desired f_XX(λ) in the case that f_XX(α) is near constant in a band of width 4πm/T about λ. In the limit we have
Corollary 7.3.1 Under the conditions of Theorem 7.3.1 and if 2πs(T)/T → λ as T → ∞
The estimate is asymptotically unbiased, as is clearly desirable. We next turn to a consideration of second-order properties.
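The estimate itself can be sketched as an average of the periodogram matrix over the 2m + 1 Fourier frequencies 2π(s(T) + s)/T, s = −m, ..., m; the white-noise data and the particular values of T, m below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
T, r, m = 1024, 2, 15
X = rng.standard_normal((T, r))
d = np.fft.fft(X, axis=0)          # d_X(2*pi*s/T), s = 0, ..., T-1

def smoothed_estimate(s0):
    # average of periodogram matrices over 2m + 1 neighboring Fourier frequencies
    mats = [np.outer(d[s], d[s].conj()) / (2 * np.pi * T)
            for s in range(s0 - m, s0 + m + 1)]
    return sum(mats) / (2 * m + 1)

F = smoothed_estimate(200)
print(np.allclose(F, F.conj().T))   # Hermitian, like f_XX → True
```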
Theorem 7.3.2 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (7.3.2) to (7.3.4) with λ − 2πs(T)/T = O(T^-1). Then
where
The second-order moments are seen to fall off in magnitude as m increases. By choice of m, the statistician has a means of reducing the asymptotic variability of the estimates to a desired level. The statistics are seen to be asymptotically uncorrelated in the case that λ ± μ ≢ 0 (mod 2π). In addition, expression (7.3.13) has a singularity at the frequencies λ, μ = 0, ±π, ±2π, .... This results from two things: not knowing the mean c_X and the fact that f_XX(λ) is real at these particular frequencies. We remark that the estimate f_XX^(T)(λ) is not consistent under the conditions of this theorem. However, in the next section we will develop a consistent estimate.
Turning to the development of a large sample approximation to the distribution of f_XX^(T)(λ), we may consider
Theorem 7.3.3 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (7.3.2) to (7.3.4) with 2πs(T)/T → λ as T → ∞. Then f_XX^(T)(λ) is asymptotically distributed as (2m + 1)^-1 W_r^c(2m + 1, f_XX(λ)) if λ ≢ 0 (mod π) and as (2m)^-1 W_r(2m, f_XX(λ)) if λ ≡ 0 (mod π). Also f_XX^(T)(λ_j), j = 1, ..., J, are asymptotically independent if λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J.
Asymptotically, the marginal distributions of the diagonal entries of f_XX^(T)(λ) are seen to be those obtained previously. The diagonal elements f_aa^(T)(λ) are asymptotically the scaled chi-squared variates of Theorem 5.4.3. The standardized off-diagonal elements asymptotically have the densities of Exercise 7.10.15.
The approximation of the distribution of f_XX^(T)(λ) by a complex Wishart distribution was suggested by Goodman (1963). Wahba (1968) considers the approximation in the case of a Gaussian series and m → ∞. Brillinger (1969c) considers the present case with mean 0.
Theorems of the same character as Theorems 7.3.1 to 7.3.3 may be developed in the case of tapered values if we proceed by splitting the data into L disjoint segments of V observations. Specifically we set
Following Theorem 7.2.5, the estimates I_XX^(V)(λ, l), l = 0, ..., L − 1, are asymptotically independent W_r^c(1, f_XX(λ)) variates if λ ≢ 0 (mod π) and W_r(1, f_XX(λ)) variates if λ = ±π, ±3π, .... This suggests a consideration of the estimate
where h_a(u), −∞ < u < ∞, vanishes for u < 0, u > 1. Next we set
In connection with the above we have
Theorem 7.3.4 Suppose the conditions of Theorem 7.3.1 are satisfied. Suppose also the functions h_a(u), a = 1, ..., r, satisfy Assumption 4.3.1, vanish for u < 0, u ≥ 1, and satisfy ∫ h_a(u)h_b(u) du ≠ 0. Let f_XX^(LV)(λ) be given by (7.3.16). Then
This theorem is an immediate consequence of Theorem 7.2.1 and its corollary. It is interesting to note that the weighted average of f_ab appearing in expression (7.3.17) is concentrated in an interval of width proportional to V^-1.
Theorem 7.3.5 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), a = 1, ..., r, satisfy Assumption 4.3.1, vanish for u < 0, u ≥ 1 and be such that ∫ h_a(u)h_b(u) du ≠ 0. Let f_XX^(LV)(λ) be given by (7.3.16). Then
The second-order moments are here reduced from those of (7.2.13) by the factor 1/L. The statistician may in many cases choose L large enough for his purposes. Finally we have
Theorem 7.3.6 Under the conditions of Theorem 7.2.5 and if f_XX^(LV)(λ) is given by (7.3.16), f_XX^(LV)(λ) is asymptotically L^-1 W_r^c(L, f_XX(λ)) if λ ≢ 0 (mod π) and asymptotically L^-1 W_r(L, f_XX(λ)) if λ = ±π, ±3π, ..., as V → ∞.
Again the Wishart distribution is suggested as an approximation for thedistribution of an estimate of the spectral density matrix. One difficulty with
We shall form an estimate of f_ab(λ) by taking a weighted average of this statistic, concentrating weight in a neighborhood of λ having width O(B_T) where B_T is a band-width parameter tending to 0 as T → ∞.
Let W_ab(α), −∞ < α < ∞, be a weight function satisfying
is computed. The corresponding second-order periodograms are thengiven by
the above estimation procedure is that it does not provide an estimate in the case of λ ≡ 0 (mod 2π). An estimate for this case may possibly be obtained by extrapolating estimates at nearby frequencies. Note also the estimate of Exercise 7.10.23.
Exercise 7.10.24 indicates the asymptotic distribution of the estimate
involving an unequal weighting of periodogram values.
7.4 CONSISTENT ESTIMATES OF THE SPECTRAL DENSITY MATRIX
The estimates of the previous section were not generally consistent; that is, f_XX^(T)(λ) typically did not tend in probability to f_XX(λ) as T → ∞. However, the estimates did involve a parameter (m or L) that affected their asymptotic variability. A consideration of the specific results obtained suggests that if we were to allow this parameter to tend to ∞ as T → ∞, then we might obtain a consistent estimate. In this section we shall see that this is in fact the case. The results to be obtained will not be important so much for the specific computations to be carried out, as for their suggestion of alternate plausible large sample approximations for the moments and distribution of the estimate.
Suppose the stretch X(t), t = 0, ..., T − 1 of an r vector-valued series is available for analysis. Suppose the discrete Fourier transform
Let B_T, T = 1, 2, ..., be a bounded sequence of non-negative scale parameters. As an estimate of f_ab(λ) consider
where
This estimate has the same symmetry and periodicity properties as does f_XX(λ) in the case that the functions W_ab(α) are even, W_ab(−α) = W_ab(α). In addition, if the matrix [W_ab(α)] is non-negative definite for all α, then f_XX^(T)(λ) will be non-negative definite as is f_XX(λ); see Exercise 7.10.26. We now set down
Theorem 7.4.1 Let X(t), t = 0, ±1, ... be an r vector-valued series with mean function EX(t) = c_X and covariance function cov{X(t + u), X(t)} = c_XX(u), for t, u = 0, ±1, .... Suppose
Let f_ab^(T)(λ) be given by (7.4.5) where W_ab(α) satisfies Assumption 5.6.1, a, b = 1, ..., r. Then
In view of the 2π period of I_ab^(T)(α), the estimate may be written
The estimate (7.4.4) is seen to weight periodogram values heavily at frequencies within O(B_T) of λ. This suggests that we will later require B_T → 0 as T → ∞.
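A sketch of such a weighted estimate for a single entry f_aa(λ), using an invented Gaussian-shaped weight function W and illustrative white-noise data; for such data the spectrum is flat at 1/(2π) ≈ 0.16:

```python
import numpy as np

# f_aa(lambda) ≈ (2*pi/T) * sum_s B_T^-1 W(B_T^-1 (lambda - 2*pi*s/T)) I_aa(2*pi*s/T),
# with W even and integrating to one.
rng = np.random.default_rng(3)
T = 2048
X = rng.standard_normal(T)
I = np.abs(np.fft.fft(X)) ** 2 / (2 * np.pi * T)   # periodogram at 2*pi*s/T
freqs = 2 * np.pi * np.arange(T) / T

def kernel_estimate(lam, B_T):
    # Gaussian-shaped weight of bandwidth B_T, centered at lam
    W = np.exp(-0.5 * ((lam - freqs) / B_T) ** 2) / (B_T * np.sqrt(2 * np.pi))
    return (2 * np.pi / T) * np.sum(W * I)

est = kernel_estimate(1.0, B_T=0.2)
print(est)   # for this white noise, near the flat level 1/(2*pi)
```

Shrinking B_T narrows the band of periodogram ordinates that receive appreciable weight, trading variance for resolution.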
As an estimate of f_XX(λ) we now take
If in addition
then
The error term is uniform in X.
Expressions (7.4.9) and (7.4.11) show that the expected value of the proposed estimate is a weighted average of f_ab(α), −∞ < α < ∞, with weight concentrated in a band of width O(B_T) about λ. In the case that B_T → 0 as T → ∞, the estimate is asymptotically unbiased. We may proceed as in Theorem 3.3.1 to develop the asymptotic bias of the estimate (7.4.5) as a function of B_T. Specifically we have
Theorem 7.4.2 Let f_ab(λ) have bounded derivatives of order ≤ P. Suppose
If P = 3, the above theorems and the fact that W(−α) = W(α) give
From this, and expression (7.4.13), we see that in connection with the bias of the estimate f_ab^(T)(λ) it is desirable that f_ab(α) be near constant in the neighborhood of λ, that B_T be small, and that ∫ α^p W(α) dα, p = 2, 4, ..., be small. The next theorem will show that we cannot take B_T too small if we wish the estimate to be consistent.
Theorem 7.4.3 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1). Let W_ab(α), −∞ < α < ∞, satisfy Assumption 5.6.1, a, b = 1, ..., r. Let f_ab^(T)(λ) be given by (7.4.5). Let B_T T → ∞. Then
for a_1, a_2, b_1, b_2 = 1, ..., r. The error term is uniform in λ, μ.
Given the character of the W^(T) functions, this covariance is seen to have greatest magnitude for λ ± μ ≡ 0 (mod 2π). The averages in (7.4.15) are approximately concentrated in a band of width O(B_T) about λ, μ, and so the covariance approximately equals
In the limit we have
Corollary 7.4.3 Under the conditions of Theorem 7.4.3 and if B_T → 0, B_T T → ∞ as T → ∞
We see that the second-order moments are O(B_T^-1 T^-1) and so tend to 0 as T → ∞. We have already seen that the estimate is asymptotically unbiased. It therefore follows that it is consistent. We see that estimates evaluated at frequencies λ, μ with λ ± μ ≢ 0 (mod 2π) are asymptotically uncorrelated.
The first statement of expression (7.4.15) may be used to give an expression for the large sample covariance in the case where B_T = 2π/T. Suppose W_ab(α) vanishes for |α| sufficiently large and λ = 2πs(T)/T with s(T) an integer. For large T, the estimate (7.4.4) then takes the form
The estimate (7.3.2) had this form with W_ab(s) = T/[2π(2m + 1)] for |s| ≤ m. Expression (7.4.16) may be seen to give the following approximate form for the covariance here
The results of Theorem 5.5.2 are particular cases of (7.4.19).
Expression (7.4.17) may be combined with expression (7.4.14) to obtain a form for the large sample mean squared error of f_ab^(T)(λ). Specifically, if λ ≢ 0 (mod π) it is
Exercise 7.10.30 indicates that B_T should be taken to fall off as T^{-1/5} if we wish to minimize this asymptotic mean-squared error.
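The T^{-1/5} rate can be seen by balancing the two terms of the mean squared error: the bias is of order B_T^2 for an even weight function, so the squared bias is of order B_T^4, while the variance is of order (B_T T)^{-1}. Writing, with unspecified constants c_1, c_2,

```latex
\mathrm{MSE}(B_T) \approx c_1 B_T^4 + c_2 (B_T T)^{-1},
\qquad
\frac{d}{dB_T}\,\mathrm{MSE} = 4 c_1 B_T^3 - c_2 B_T^{-2} T^{-1} = 0
\;\Longrightarrow\;
B_T = \Bigl(\frac{c_2}{4 c_1}\Bigr)^{1/5} T^{-1/5},
```

and with this choice the mean squared error itself is of order T^{-4/5}.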
Turning to the asymptotic distribution itself, we have
Theorem 7.4.4 Suppose the conditions of Theorem 7.4.1 and Assumption 2.6.1 are satisfied. Then f_XX^(T)(λ_1), ..., f_XX^(T)(λ_J) are asymptotically jointly normal with covariance structure given by (7.4.17) as T → ∞ with B_T T → ∞, B_T → 0.
An examination of expression (7.4.17) shows that the estimates f_XX^(T)(λ), f_XX^(T)(μ) are asymptotically independent if λ ± μ ≢ 0 (mod 2π). In the case that λ ≡ 0 (mod π), the estimate f_XX^(T)(λ) is real-valued and its limiting distribution is seen to be real normal.
In Section 7.3, taking an estimate to be the average of 2m + 1 periodogram ordinates, we obtained a Wishart with 2m + 1 degrees of freedom as the limiting distribution. That result is consistent with the result just obtained in Theorem 7.4.4. The estimate (7.4.4) is essentially a weighted average of periodogram ordinates at frequencies within O(B_T) of λ. There are O(B_T T) such ordinates, in contrast with the previous 2m + 1. Now the
Having formed an estimate in the manner of (7.4.4) or (7.4.18) we may con-sider approximating the distribution of that estimate by (2m + I)"1
Wrc(2m + l,fAr*(X)) if X ̂ 0 (mod *•) and by (2m)-iWr(2mfxx(X)) ifX = 0 (mod ?r) taking 2m + 1 to be given by (7.4.21).
Rosenblatt (1959) discussed the asymptotic first- and second-order mo-ment structure and the joint asymptotic distribution of consistent estimatesof second-order spectra. Parzen (1967c) was also concerned with the asymp-totic theory and certain empirical aspects. We end this section by remarkingthat we will develop the asymptotic distribution of spectral estimates basedon tapered data in Section 7.7.
252 ESTIMATING SECOND-ORDER SPECTRA
Wishart is approximately normal for large degrees of freedom. As we haveassumed BTT —> <», the two approximations are essentially the same. Wemay set up a formal equivalence between the approximations. Suppose thesame weight function is used in all the estimates, Wab(a) = W(a) fora, b = 1, . . . ,r. Comparing expression (7.4.16) with expression (7.3.13)suggests the identification
7.5 CONSTRUCTION OF CONFIDENCE LIMITS
Having determined certain limiting distributions for estimates, f_ab^{(T)}(λ), of second-order spectra we turn to a discussion of the use of these distributions in setting confidence limits for the parameter f_ab(λ). We begin with the estimate of Section 7.3. In the case of λ ≢ 0 (mod π), the estimate is given by
for s(T) an integer with 2πs(T)/T near λ. Its consideration resulted from Theorem 7.2.4 which suggested that the variates
might be considered to be 2m + 1 independent estimates of f_ab(λ). Having a number of approximately independent estimates of a parameter of interest, a means of setting approximate confidence limits is clear. Consider for example the case of θ = Re f_ab(λ). Set
by a Student's t distribution with 2m degrees of freedom. This leads to the following approximate 100β percent confidence interval for θ = Re f_ab(λ)
where t_ν(γ) denotes the 100γ percentile of Student's t distribution with ν degrees of freedom. In the case of λ ≡ 0 (mod π) we again proceed from Theorem 7.2.4.
By setting
Our estimate of θ is now
Set
Even when the basic variates are not normal, it has often proved reasonable statistical practice (see Chap. 31 in Kendall and Stuart (1961)) to approximate the distribution of a variate such as
for s = 0, ±1, …, ±m we may likewise obtain an approximate confidence interval for the quad-spectrum, Im f_ab(λ).
A closely related means of setting approximate confidence limits follows from Theorem 7.2.5. Here the statistics I_ab^{(l)}(λ), l = 1, …, L, for λ ≢ 0 (mod 2π) provide L approximately independent estimates of f_ab(λ). Proceeding as above, we set θ = Re f_ab(λ),
and set
We then approximate the distribution of
by a Student's t distribution with L − 1 degrees of freedom and thence obtain the desired limits. Similar steps lead to approximate limits in the case of the quad-spectrum, Im f_ab(λ).
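The Student-t recipe can be sketched in code. This is an illustration under the section's approximation — treating the L (or 2m + 1) values as independent estimates of Re f_ab(λ) — with hypothetical function and argument names; the t quantile is supplied by the caller rather than computed.

```python
import numpy as np

def t_interval_cospectrum(I_values, t_quantile):
    """Approximate confidence interval for Re f_ab(lambda) built from L
    approximately independent (cross-)periodogram values; the Student-t
    point with L - 1 degrees of freedom is supplied by the caller."""
    re_vals = np.real(np.asarray(I_values))
    L = len(re_vals)
    theta_hat = re_vals.mean()              # point estimate of Re f_ab(lambda)
    se = re_vals.std(ddof=1) / np.sqrt(L)   # standard error of the mean
    half = t_quantile * se
    return theta_hat - half, theta_hat + half
```

For L = 5 and an approximate 95 percent interval one would supply the 97.5 percent point of t with 4 degrees of freedom, about 2.776.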
The results of Theorem 7.4.4 and Exercise 7.10.8 suggest a different means of proceeding. Suppose λ ≢ 0 (mod π) and that the estimate f_ab^{(T)}(λ) is given by (7.4.4). Then the exercise suggests that the distribution of Re f_ab^{(T)}(λ) is approximately normal with mean Re f_ab(λ) and variance
Expression (7.5.13) can be estimated by
and the following approximate 100β percent confidence interval can be set down
where z(γ) denotes the 100γ percent point of the distribution of a standard normal variate. We may obtain an approximate interval for the quad-spectrum Im f_ab(λ) in a similar manner.
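A normal-theory version can be sketched similarly. The variance formula below is a common large-sample approximation for the real part of a cross-spectral estimate smoothed over n_ords = 2m + 1 ordinates; it is an assumption of this sketch, not a transcription of the book's expression (7.5.13).

```python
import numpy as np

def cospectrum_normal_interval(f_aa, f_bb, f_ab, n_ords, z=1.96):
    """Normal-theory interval for Re f_ab(lambda). The variance formula
    below is a common large-sample approximation for an estimate smoothed
    over n_ords = 2m + 1 ordinates -- an assumption of this sketch."""
    theta_hat = np.real(f_ab)
    var_hat = (f_aa * f_bb + np.real(f_ab) ** 2 - np.imag(f_ab) ** 2) / (2 * n_ords)
    half = z * np.sqrt(var_hat)
    return theta_hat - half, theta_hat + half
```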
Finally, we note that the approximations suggested in Freiberger (1963) may prove useful in constructing confidence intervals for Re f_ab(λ), Im f_ab(λ). Rosenblatt (1960) and Gyires (1961) relate to these approximations.
7.6 THE ESTIMATION OF RELATED PARAMETERS
Let X(t), t = 0, ±1, … denote an r vector-valued stationary series with covariance function c_XX(u), u = 0, ±1, … and spectral density matrix f_XX(λ), −∞ < λ < ∞. Sometimes we are interested in estimating parameters of the process having the form
for some function A(α) and a, b = 1, …, r. Examples of such a parameter include the covariance functions

and the spectral measures

a, b = 1, …, r. If I_ab^{(T)}(λ) indicates a periodogram of a stretch of data, then an obvious estimate of J_ab(A) is provided by

In connection with this estimate we have

Theorem 7.6.1 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let A_j(α), 0 ≤ α ≤ 2π, be of bounded variation for j = 1, …, J. Then

Finally, J_ab^{(T)}(A_j), j = 1, …, J; a, b = 1, …, r are asymptotically jointly normal with the above first- and second-order moment structure.

From Theorem 7.6.1, we see that J_ab^{(T)}(A_j) is an asymptotically unbiased and consistent estimate of J_ab(A_j). It is based on the discrete Fourier transform and so can possibly be computed taking advantage of the Fast Fourier Transform Algorithm. Were Assumption 2.6.2(1) adopted the error terms would be O(T^{-1}), O(T^{-2}) in the manner of Theorem 5.10.1.
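Such linear-functional estimates can be sketched directly. The following illustration forms J_ab^{(T)}(A) as a Riemann sum of A against the cross-periodogram over the Fourier frequencies; the 2πs/T grid and the (2πT)^{-1} periodogram normalization follow Section 7.2, while the function names are hypothetical.

```python
import numpy as np

def J_estimate(x_a, x_b, A):
    """Riemann sum of A against the cross-periodogram over the Fourier
    frequencies 2*pi*s/T, s = 0, ..., T-1."""
    T = len(x_a)
    d_a, d_b = np.fft.fft(x_a), np.fft.fft(x_b)
    I_ab = d_a * np.conj(d_b) / (2 * np.pi * T)   # cross-periodogram
    freqs = 2 * np.pi * np.arange(T) / T
    return (2 * np.pi / T) * np.sum(A(freqs) * I_ab)
```

Taking A(α) = exp(iuα) yields a circularly defined autocovariance estimate, and taking A to be the indicator of [0, λ] yields a spectral measure estimate; with A ≡ 1 and x_a = x_b the sum reduces, by Parseval's relation, to the sample mean square.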
In the case of the estimate

of the spectral measure, F_ab(λ), corresponding to A(α) = 1 for 0 ≤ α ≤ λ and = 0 otherwise, expression (7.6.7) gives

for 0 ≤ λ, μ ≤ π; a₁, b₁, a₂, b₂ = 1, …, r. We will return to the discussion of the convergence of F_ab^{(T)}(λ) later in this section. In the case of the estimate

of c_ab(u), corresponding to A(α) = exp{iuα} and with X̃(t) denoting the T-periodic extension of the sequence X(0), …, X(T − 1), expression (7.6.7) gives

Exercise 7.10.36 shows that the autocovariance estimate

is also asymptotically normal with the covariance structure (7.6.11).

It will sometimes be useful to consider the parameters

−∞ < λ < ∞; 1 ≤ a < b ≤ r. R_ab(λ) is called the coherency of the series X_a(t) with the series X_b(t) at frequency λ. Its modulus squared, |R_ab(λ)|², is called the coherence of the series X_a(t) with the series X_b(t) at frequency λ. The interpretation of the parameter R_ab(λ) will be considered in Chapter 8. It is a complex-valued analog of the coefficient of correlation. We may estimate it by

In the case that the spectral estimates are of the form considered in Section 7.4 we have

Theorem 7.6.2 Under the conditions of Theorem 7.4.3 and if R_ab^{(T)}(λ) is given by (7.6.14)

for a, b, c, d = 1, …, r. Also the variates R_ab^{(T)}(λ), a, b = 1, …, r are asymptotically jointly normal with covariance structure indicated by expression (7.6.16) where we have written R_ab for R_ab(λ), a, b = 1, …, r.
The asymptotic covariance structure of estimated correlation coefficients is presented in Pearson and Filon (1898), Hall (1927), and Hsu (1949) for the case of vector-valued variates with real components. We could clearly develop an alternate form of limiting distribution taking the estimate and limiting Wishart distributions of Theorem 7.3.3. This distribution is given by Fisher (1962) for the case of vector-valued variates with real components. The theorem has this useful corollary:
Corollary 7.6.2 Under the conditions of Theorem 7.6.2,
and, for given J, the variates R_ab^{(T)}(λ₁), …, R_ab^{(T)}(λ_J) are asymptotically jointly normal with covariance structure given by (7.6.18) for 1 ≤ a < b ≤ r.
In Section 8.5 we will discuss further aspects of the asymptotic distribution of |R_ab^{(T)}(λ)|², and in Section 8.9 we will discuss the construction of approximate confidence intervals for |R_ab(λ)|.
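The coherence computation can be sketched as follows, smoothing the periodograms and cross-periodogram with a flat (2m + 1)-point kernel before forming the ratio. The function names and the flat kernel are choices of this sketch; note that without smoothing (m = 0) the ratio is identically 1, so some smoothing is essential.

```python
import numpy as np

def sample_coherence(x_a, x_b, m):
    """|R_ab(lambda)|^2 from (2m+1)-point smoothed periodograms."""
    T = len(x_a)
    d_a, d_b = np.fft.fft(x_a), np.fft.fft(x_b)
    I_aa = np.abs(d_a) ** 2 / (2 * np.pi * T)
    I_bb = np.abs(d_b) ** 2 / (2 * np.pi * T)
    I_ab = d_a * np.conj(d_b) / (2 * np.pi * T)
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    def smooth(I):
        # circular smoothing over the 2*pi-periodic frequency grid
        return np.convolve(np.tile(I, 3), kernel, mode="same")[T:2 * T]
    f_aa, f_bb, f_ab = smooth(I_aa), smooth(I_bb), smooth(I_ab)
    return np.abs(f_ab) ** 2 / (f_aa * f_bb)
```

By the Cauchy-Schwarz inequality the returned values lie between 0 and 1, mirroring the interpretation of coherence as a squared correlation at frequency λ.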
Let D[0,π] signify the space of right continuous functions having left-hand limits. This space can be endowed with a metric which makes it complete and separable; see Billingsley (1968), Chap. 3. Let D^{r×r}[0,π] denote the space of r × r matrix-valued functions whose entries are complex-valued functions that are right continuous and have left-hand limits. This space is isomorphic with D^{2r²}[0,π] and may be endowed with a metric making it complete and separable. Continuing, if P_T, T = 1, 2, … denotes a sequence of probability measures on D^{r×r}[0,π], we shall say that the sequence converges weakly to a probability measure P on D^{r×r}[0,π] if
as T → ∞ for all real-valued bounded continuous functions, h, on D^{r×r}[0,π]. In this circumstance, if P_T is determined by the random element X_T and P is determined by the random element X, we shall also say that the sequence X_T, T = 1, 2, … converges in distribution to X.
The random function F_XX^{(T)}(λ), 0 ≤ λ ≤ π, clearly lies in D^{r×r}[0,π] as does the function √T[F_XX^{(T)}(λ) − F_XX(λ)]. We may now state
Theorem 7.6.3 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.2(1). Let F_XX^{(T)}(λ) be given by (7.6.8). Then the sequence of processes {√T[F_XX^{(T)}(λ) − F_XX(λ)]; 0 ≤ λ ≤ π} converges in distribution to an r × r matrix-valued Gaussian process {Y(λ); 0 ≤ λ ≤ π} with mean 0 and
for 0 ≤ λ, μ ≤ π and a₁, a₂, b₁, b₂ = 1, …, r.
We may use the results of Chapter 4 in Cramér and Leadbetter (1967) to see that the sample paths of the limit process {Y(λ); 0 ≤ λ ≤ π} are continuous with probability 1. In the case that the series X(t), t = 0, ±1, … is Gaussian, the fourth-order spectra are identically 0 and the covariance function (7.6.20) is simplified. In this case, by setting A₁(α) = 1 for μ₁ ≤ α ≤ λ₁ and A₂(α) = 1 for μ₂ ≤ α ≤ λ₂ and both = 0 otherwise, we see from (7.6.7) that
That is, the limiting Gaussian process has independent increments.

A key implication of Theorem 7.6.3 is that if h is a function on D^{r×r}[0,π] whose set of discontinuities has probability 0 with respect to the process {Y(λ); 0 ≤ λ ≤ π}, then h(√T[F_XX^{(T)}(·) − F_XX(·)]) converges in distribution to h(Y(·)); see Billingsley (1968), p. 31. The metric for D^{r×r}[0,π] used above is often not convenient. Luckily, as the limit process of the theorem is continuous, a result of M. L. Straf applies to indicate that if h is continuous in the norm

and the h(√T[F_XX^{(T)}(·) − F_XX(·)]) are (measurable) random variables, then h converges in distribution to h(Y(·)). For example this implies that

tends in distribution to

where Y_aa(λ) is a Gaussian process with 0 mean and

The estimate considered in the theorem has the disadvantage of being discontinuous even though the corresponding population parameter is continuous and indeed differentiable. A continuous estimate is provided by

It may be shown that the process

converges in distribution to a 0 mean Gaussian process with covariance function (7.6.20).

If r = 1 and the series X(t), t = 0, ±1, … is a 0 mean linear process, then Grenander and Rosenblatt (1957) demonstrated the weak convergence of the process
They also considered the weak convergence of the process
where f_XX^{(T)}(λ) is an estimate of the spectral density involving a weight function. The case of a 0 mean Gaussian process with square integrable spectral density was considered by Ibragimov (1963) and Malevich (1964, 1965). MacNeil (1971) considered the case of a 0 mean bivariate Gaussian process. Brillinger (1969c) considers the case of a 0 mean r vector-valued process satisfying Assumption 2.6.2(1) and shows convergence in a finer topology. Clevenson (1970) considered the weak convergence of the discontinuous process of the theorem in the case of a 0 mean Gaussian series.
7.7 FURTHER CONSIDERATIONS IN THE ESTIMATION OF SECOND-ORDER SPECTRA
We begin this section by developing the asymptotic distribution of a consistent estimate of the spectral density matrix based on tapered data. Suppose that we wish to estimate the spectral density matrix, f_XX(λ), of an r vector-valued series X(t), t = 0, ±1, … with mean function c_X. Let h_a(u), −∞ < u < ∞, denote a tapering function satisfying Assumption 4.3.1 for a = 1, …, r. Suppose that the tapered values h_a(t/T)X_a(t), t = 0, ±1, … are available for analysis. Suppose the mean function is estimated by
where
Let
We will base our estimate of f_XX(λ) on the Fourier transforms of mean-adjusted tapered values, specifically on
is an estimate of the cross-covariance function c_ab(u). Suppose W_ab(α), −∞ < α < ∞, a, b = 1, …, r are weight functions satisfying ∫ W_ab(α) dα = 1. In the present case involving arbitrary tapering functions, no particular advantage accrues from a smoothing of the periodogram values at the particular frequencies 2πs/T, s = 0, ±1, … . For this reason we consider the following estimate involving a continuous weighting
where
From (7.7.3) we see that expression (7.7.5) may be written
where
Following the discussion of Section 7.2, we next form the second-orderperiodograms
where the values B_T, T = 1, 2, … are positive and bounded. Using expression (7.7.7) we see that (7.7.9) may be written
where
We will require this function to satisfy the following:
Assumption 7.7.1 The function w(u), −∞ < u < ∞, is real-valued, bounded, symmetric, w(0) = 1, and such that
Following Schwarz's inequality this is ≥ 1 and so the limiting variance is increased by tapering. However, the hope is that there has been a sufficient reduction in bias to compensate for any increase in variance. We also have
Corollary 7.7.1 Under the conditions of Theorem 7.7.1, and if B_T → 0 as T → ∞, the estimate is asymptotically unbiased.
Historically, the first cross-spectral estimate widely considered had the form (7.7.10) (see Goodman (1957) and Rosenblatt (1959)), although tapering was not generally employed. Its asymptotic properties are seen to be
Exercise 3.10.7 shows the estimate (7.7.10) may be computed using a Fast Fourier Transform. We have
Theorem 7.7.1 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.2(1). Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1 for a = 1, …, r and be such that ∫ h_a(u)h_b(u) du ≠ 0. Let w_ab(u), −∞ < u < ∞, satisfy Assumption 7.7.1 for a, b = 1, …, r. Let B_T T → ∞ as T → ∞. Then
Also
Finally, the variates f_ab^{(T)}(λ₁), …, f_ab^{(T)}(λ_K) are asymptotically normal with the above covariance structure.
A comparison of expressions (7.7.14) and (7.4.17) shows that, asymptotically, the effect of tapering is to multiply the limiting variance by a factor
This factor equals 1 in the case of no tapering, that is, h_a(t) = 1 for 0 ≤ t < 1 and = 0 for other t. In the case that the same tapering function is used for all series, that is, h_a(t) = h(t) for a = 1, …, r, the factor becomes
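For a common taper h the variance-inflation factor reduces, in this reading, to the ratio ∫h⁴ du / (∫h² du)², which can be evaluated numerically; the numerical grid and function name are choices of this sketch.

```python
import numpy as np

def taper_variance_factor(h, grid=10000):
    """Numerically evaluate (integral of h^4) / (integral of h^2)^2
    over [0, 1); by the Schwarz inequality the factor is >= 1, with
    equality for the boxcar h = 1."""
    u = (np.arange(grid) + 0.5) / grid       # midpoint rule on [0, 1)
    hv = h(u)
    return np.mean(hv ** 4) / np.mean(hv ** 2) ** 2
```

For the full cosine taper h(u) = sin²(πu) the factor works out to 35/18 ≈ 1.94, so that taper roughly doubles the asymptotic variance — the hoped-for compensation being the bias reduction discussed in the text.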
or the average of the two. These estimation procedures have the advantage of allowing us to investigate whether or not the structure of the series is slowly evolving in time; see Brillinger and Hatanaka (1969). This type of estimate was suggested in Blanc-Lapierre and Fortet (1953). One useful means of forming the required series is through the technique of complex demodulation; see Section 2.7 and Brillinger (1964b).
Brillinger (1968) was concerned with estimating the cross-spectrum of a 0 mean bivariate Gaussian series from the values sgn X₁(t), sgn X₂(t), t = 0, …, T − 1. The asymptotic distribution of the estimate (7.7.10), without tapering, was derived.
On some occasions we may wish a measure of the extent to which f_ab^{(T)}(λ) may deviate from its expected value simultaneously as a function of λ and T. We begin by examining the behavior of the second-order periodograms. Theorem 4.5.1 indicated that for a 0 mean series and under regularity conditions
with probability 1. This gives us directly
Theorem 7.7.2 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.3 and having mean 0. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1, a = 1, …, r. Let I_XX^{(T)}(λ) be given by (7.2.5). Then
with probability 1 for a, b = 1, .. ., r.
essentially the same as those of the estimate of Section 7.4. It is investigated in Akaike and Yamanouchi (1962), Jenkins (1963a), Murthy (1963), and Granger (1964). Freiberger (1963) considers approximations to its distribution in the case that the series is bivariate Gaussian.
The discussion of Section 7.1 suggests an alternate class of estimates of the second-order spectrum f_ab(λ). Let Y_a(t) denote the series resulting from band-pass filtering the series X_a(t) with a filter having transfer function A(α) = 1 for |α ± λ| < Δ, and = 0 otherwise, −π < α, λ ≤ π. Consider estimating Re f_ab(λ) by
or the average of the two. Consider estimating Im f_ab(λ) by
Whittle (1959) determined a bound for the second-order periodogram that held in probability; see also Walker (1965). Parthasarathy (1960) found a probability 1 bound for the case of a single periodogram ordinate; he found that a single ordinate could grow at the rate log log T, rather than the log T of (7.7.20). Before turning to an investigation of the behavior of f_ab^{(T)}(λ) − Ef_ab^{(T)}(λ) we set down a further assumption of the character of Assumption 2.6.3 concerning the series X(t), t = 0, ±1, … .
Assumption 7.7.2 X(t), t = 0, ±1, … is an r vector-valued series satisfying Assumption 2.6.1. Also, with C_n given by (2.6.7),
This is finite for |Cz| < 1 and so Assumption 7.7.2 is satisfied in this case so long as Assumption 2.6.1 is satisfied. We may now set down
Theorem 7.7.3 X(t), t = 0, ±1, … is an r vector-valued series satisfying Assumption 7.7.2. h_a(u), −∞ < u < ∞, satisfies Assumption 4.3.1 for a = 1, …, r. The w_ab(u), −∞ < u < ∞, satisfy Assumption 7.7.1 and vanish for |u| sufficiently large, a, b = 1, …, r. f_ab^{(T)}(λ) is given by (7.7.10). Let η > 0 be given and such that Σ_T B_T^η < ∞. Then
Ihn" sup |/.*«-)(x) - £/«6(r)(A)|(ftT/log l/fir)1/2
T->co X
for z in a neighborhood of 0. In (7.7.21) the inner summation is over all indecomposable partitions ν = (ν₁, …, ν_P) of the table
with ν_p having n_p > 1 elements, p = 1, …, P.
In the case of a Gaussian series, C_n = 0 for n > 2 and the series of (7.7.21) becomes
with probability 1 for a, b = 1,. . . , r.
If Σ_u |u| |c_ab(u)| < ∞ and ∫ |α| |W_ab(α)| dα < ∞, then Theorem 3.3.1 and expression (7.7.13) show that
and so we can say
with probability 1, the error terms being uniform in λ. We see that in the case of Theorem 7.7.3, f_ab^{(T)}(λ) is a strongly consistent estimate of f_ab(λ). Woodroofe and Van Ness (1967) showed, under regularity conditions including X(t) being a linear process, that
in probability. The data is not tapered here. They also investigated thelimiting distribution of the maximum deviation.
The following cruder result may be developed under the weaker Assump-tion 2.6.1:
Theorem 7.7.4 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1. Let w_ab(u), −∞ < u < ∞, satisfy Assumption 7.7.1 and vanish for |u| sufficiently large, a, b = 1, …, r. Let f_ab^{(T)}(λ) be given by (7.7.10). Let B_T T → ∞, B_T → 0 as T → ∞. Then for any ε > 0,
in probability as T → ∞. If, in addition, Σ_T B_T^m < ∞ for some m > 0, then the event (7.7.28) occurs with probability 1 as T → ∞.
In Theorem 7.7.4 the multiplier (B_T T / log(1/B_T))^{1/2} of (7.7.27) has been replaced by the smaller (B_T T)^{1/2} B_T^ε.
If we wish to use the estimate (7.4.5) and are content with a result con-cerning the maximum over a discrete set of points, we have
Theorem 7.7.5 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let W_ab(α), −∞ < α < ∞, satisfy Assumption 5.6.1. Let f_ab^{(T)}(λ) be given by (7.4.5). Let B_T → 0 and P_T, B_T T → ∞ as T → ∞. Then for any ε > 0
in probability as T → ∞. If, in addition, Σ_T P_T^{−m} < ∞ for some m > 0, then the event (7.7.29) occurs with probability 1 as T → ∞.
In Section 5.8 we discussed the importance of prefiltering a stretch of data prior to forming a spectral estimate. Expression (7.7.13) again makes this clear. The expected value of f_ab^{(T)}(λ) is not generally f_ab(λ); rather it is a weighted average of f_ab(α), −∞ < α < ∞, with weight concentrated in the neighborhood of λ. If f_ab(α) has any substantial peaks or valleys, the weighted average could be far from f_ab(λ). In practice it appears to be the case that cross-spectra vary more substantially than power spectra. Consider a commonly occurring situation in which a series X₂(t) is essentially a delayed version of a series X₁(t), for example
Here, if v has any appreciable magnitude at all, the function f₂₁(λ) will be rapidly altering in sign as λ varies. Any weighted average of it, such as (7.7.13), will be near 0. We could well be led to conclude that there was no relation between the series, when in fact there was a strong linear relation. Akaike (1962) has suggested that a situation of this character be handled by delaying the series X₁(t) by approximately v time units. That is, we analyze the series [X₁(t − v*), X₂(t)], t = 0, ±1, … with v* near v, instead of the original stretch of series. This is a form of prefiltering. Akaike suggests that in practice one might determine v* as the lag where |c₂₁^{(T)}(u)| is greatest. If the estimated delay is anywhere near v at all, the cross-spectrum being estimated now should be a much less rapidly varying function.
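The alignment suggestion can be sketched directly: estimate v* as the lag maximizing the absolute sample cross-covariance, shift, and only then estimate the cross-spectrum of the aligned pair. Circular shifts and the function name are conveniences of this sketch, not the book's procedure in detail.

```python
import numpy as np

def align_by_peak_crosscovariance(x1, x2, max_lag):
    """Estimate the delay v* as the lag maximizing |c_21(u)|, then shift
    X1 by v* before cross-spectral estimation (Akaike's prefiltering
    suggestion, sketched with circular shifts for simplicity)."""
    x1 = np.asarray(x1, float) - np.mean(x1)
    x2 = np.asarray(x2, float) - np.mean(x2)
    T = len(x1)
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.array([np.dot(x2, np.roll(x1, u)) / T for u in lags])
    v_star = int(lags[np.argmax(np.abs(cc))])
    return np.roll(x1, v_star), x2, v_star
```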
In Section 5.8 it was suggested that a prewhitening filter be determined by fitting an autoregressive model to a time series of interest. Nettheim (1966) has suggested an analogous procedure in the estimation of a cross-spectrum. We fit a model such as
by least squares and estimate the cross-spectrum of the residuals with X₁(t). In the full r vector-valued situation we could determine r vectors a^{(T)}(1), …, a^{(T)}(m) to minimize
t = 0, ±1, … for constants a, v and ε(t) an error series orthogonal to the series X₁(t). Then the cross-spectrum is given by
We then form f_εε^{(T)}(λ), a spectral estimate based on the residuals
It follows that the population parameter and corresponding estimate will beessentially the same for all the frequencies
7.8 A WORKED EXAMPLE
For an example of the estimate developed in Section 7.3 we return to the series considered in Section 7.2. There X₁(t) was the seasonally adjusted series of mean monthly temperatures for Berlin (1780 to 1950) and X₂(t) was the seasonally adjusted series of mean monthly temperatures for Vienna (1780 to 1950). The periodograms and cross-periodogram for these data were given in Figures 7.2.1 to 7.2.4.
Figures 7.8.1 to 7.8.4 of the present section give f₁₁^{(T)}(λ), f₂₂^{(T)}(λ), Re f₁₂^{(T)}(λ), Im f₁₂^{(T)}(λ) using estimates of the form (5.4.1) and (7.3.2) with m = 10. If we consider log₁₀ power spectral estimates, expression (5.6.15) suggests that the standard errors are both approximately .095. It is interesting to contrast the forms of Re f₁₂^{(T)}(λ) and Im f₁₂^{(T)}(λ); Re f₁₂^{(T)}(λ) is
and then estimate f_XX(λ) by
where
Generally it is wise to use prior knowledge to suggest a statistical model for aseries of interest, to fit the model, and then to compute a spectral estimatebased on the residuals.
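The autoregressive prewhitening recipe — fit, estimate the residual spectrum, then recolor through the fitted transfer function — can be sketched as below. The least-squares fit and the recoloring identity f_XX(λ) = f_εε(λ)/|1 − Σ_j a_j e^{−ijλ}|² follow standard autoregressive theory; the function names and array shapes are choices of this sketch, not the book's notation.

```python
import numpy as np

def ar_prewhiten(x, order):
    """Least-squares AR(order) fit; returns coefficients a and residuals."""
    x = np.asarray(x, float) - np.mean(x)
    T = len(x)
    # columns are the lagged values x(t-1), ..., x(t-order)
    X = np.column_stack([x[order - j - 1:T - j - 1] for j in range(order)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, y - X @ a

def recolor(f_resid, a, freqs):
    """Recover f_xx from the residual spectrum via the AR transfer
    function: f_xx = f_resid / |1 - sum_j a_j exp(-i j freq)|^2."""
    j = np.arange(1, len(a) + 1)
    H = 1 - np.exp(-1j * np.outer(freqs, j)) @ a
    return f_resid / np.abs(H) ** 2
```

The residual series is closer to white noise, so its spectral estimate suffers less bias from the weighted averaging of (7.7.13), and the recoloring step restores the original scale.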
Nothing much remains to be said about the complication of aliasing after the discussion of Section 5.11. We simply note that the population parameter f_XX(λ) and its estimates both possess the periodicity and symmetry properties
If possible we should band-pass filter the series prior to digitization in order to essentially eliminate any frequency components that might cause confusion in the interpretation of the spectral estimate.
Figure 7.8.1 f₁₁^{(T)}(λ) for seasonally adjusted monthly mean temperatures at Berlin for the years 1780-1950 with 21 periodogram ordinates averaged. (Logarithmic plot.)
Figure 7.8.2 f₂₂^{(T)}(λ) for seasonally adjusted monthly mean temperatures at Vienna for the years 1780-1950 with 21 periodogram ordinates averaged. (Logarithmic plot.)
Figure 7.8.3 Re f₁₂^{(T)}(λ), estimate of the cospectrum of Berlin and Vienna temperatures for the years 1780-1950 with 21 periodogram ordinates averaged.
Figure 7.8.4 Im f₁₂^{(T)}(λ), estimate of the quad-spectrum of Berlin and Vienna temperatures for the years 1780-1950 with 21 periodogram ordinates averaged.
Figure 7.8.5 c₁₁^{(T)}(u), estimate of the autocovariance function of Berlin temperatures.
Figure 7.8.6 c₂₂^{(T)}(u), estimate of the autocovariance function of Vienna temperatures.
Figure 7.8.7 c₁₂^{(T)}(u), estimate of the cross-covariance function of Berlin and Vienna temperatures.
everywhere positive, of appreciable magnitude at several frequencies and approximately constant otherwise, while Im f₁₂^{(T)}(λ) simply fluctuates a little about the value 0 suggesting that Im f₁₂(λ) = 0. Other statistics for this example were given in Section 6.10.
For completeness we also give estimates of the auto- and cross-covariance functions of these two series. Figure 7.8.5 is an estimate of the autocovariance function of the series of Berlin mean monthly temperatures, with
seasonal effects removed. Likewise Figure 7.8.6 is an estimate of the autocovariance function of the Vienna series. Figure 7.8.7 is the function c₁₂^{(T)}(u) for u = 0, ±1, … .
Figure 7.8.8 Logarithm of estimated power spectrum of seasonally adjusted monthly meantemperatures at various stations, with 115 periodogram ordinates averaged.
Table 7.8.1 Covariance Matrix of the Temperature Series
Vienna 4.272
Berlin 3.438 4.333
Copenhagen 2.312 2.962 2.939
Prague 3.986 3.756 2.635 6.030
Stockholm 2.056 2.950 3.052 2.325 4.386
Budapest 3.808 3.132 2.047 3.558 1.843 4.040
DeBilt 2.665 3.209 2.315 2.960 2.170 2.261 3.073
Edinburgh .941 1.482 1.349 1.182 1.418 .627 1.509 2.050
New Haven .045 .288 .520 .076 .672 .009 .206 .404 2.939
Basel 3.099 3.051 1.946 3.212 1.576 2.776 2.747 1.179 .178 3.694
Breslau 3.868 4.227 2.868 4.100 2.805 3.646 3.053 1.139 .165 3.123 5.095
Vilna 3.126 3.623 2.795 3.152 3.349 2.993 2.392 .712 .057 1.962 3.911 6.502
Trondheim 1.230 2.165 2.358 1.496 3.312 .984 1.656 1.429 .594 .884 1.801 2.185 3.949
Greenwich 1.805 2.255 1.658 2.005 1.570 1.450 2.300 1.564 .440 2.310 2.059 1.261 1.255 2.355
Table 7.8.2 Sample Correlation Matrix of the Seasonally Adjusted Series
1 2 3 4 5 6 7 8 9 10 11 12 13
1 Vienna
2 Berlin .80
3 Copenhagen .65 .83
4 Prague .79 .73 .63
5 Stockholm .48 .68 .85 .45
6 Budapest .92 .75 .59 .72 .44
7 DeBilt .74 .88 .77 .69 .59 .64
8 Edinburgh .32 .50 .56 .34 .48 .22 .61
9 New Haven .01 .08 .18 .02 .19 .00 .07 .16
10 Basel .78 .76 .59 .68 .39 .72 .82 .43 .0511 Breslau .83 .90 .74 .74 .59 .80 .77 .36 .04 .7212 Vilna .59 .68 .64 .50 .63 .58 .54 .20 .01 .40 .6813 Trondheim .30 .52 .69 .31 .80 .25 .48 .50 .17 .23 .40 .4314 Greenwich .57 .71 .67 .53 .49 .47 .86 .72 .17 .78 .59 .32 .41
1 2 3 4 5 6 7 8 9 10 11 12 13
All of these figures are consistent with a hypothesis of an instantaneousrelation between the two series. (Instantaneous here means small time leador lag relative to an interval of one month, because the data is monthly.)
As a full vector-valued example we consider the series of mean monthly temperatures recorded at the stations listed in Table 1.1.1. The series were initially seasonally adjusted by removing monthly means. Table 7.8.1 gives c_XX^{(T)}(0), the estimated 0 lag autocovariance matrix. Table 7.8.2 gives the 0 lag correlations of the series. Except for the New Haven series the series are seen to be quite intercorrelated.
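The seasonal adjustment used for these series — removing the mean of each calendar month — can be sketched as follows; the function name is a convenience of this sketch, and the book's adjustment may differ in detail.

```python
import numpy as np

def seasonally_adjust(x, period=12):
    """Remove the mean of each phase of the seasonal cycle (each calendar
    month for monthly data)."""
    x = np.asarray(x, float)
    out = x.copy()
    for phase in range(period):
        out[phase::period] -= x[phase::period].mean()
    return out
```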
The spectral density matrix was estimated through a statistic of the form (7.3.2) with m = 57. Because there are so many second-order spectra we do not present all the estimates. Figure 7.8.8 gives the log₁₀ of the estimated power spectra. These are all seen to have essentially the same shape. Figure 7.8.9 gives the sample coherences, |R₁ⱼ^{(T)}(λ)|², taking X₁(t) to be the Greenwich series and letting j run across the remaining series. The horizontal line in each of the diagrams corresponds to the 0 lag correlation squared. The plots are seen to be vaguely constant, fluctuating about the horizontal line in each case. This last is suggestive of instantaneous dependence of the series, for if c_ab(u) = 0 for u ≠ 0, then |R_ab(λ)|² = |c_ab(0)|²/[c_aa(0)c_bb(0)] for −∞ < λ < ∞. The correlation is seen to be greatest for the De Bilt series followed by Basel. The correlation is lowest for New Haven, Conn., on the opposite side of the Atlantic.
Figure 7.8.9 Estimated coherences of seasonally adjusted Greenwich monthly meantemperatures with similar temperatures at 13 other stations for the years 1780-1950.
Figure 9.6.1 gives log₁₀ of the estimated power spectra for an estimate of the form (7.3.2) with m = 25. These curves are more variable, as is to be expected from the sampling theory developed in this chapter.
7.9 THE ANALYSIS OF SERIES COLLECTED IN AN EXPERIMENTAL DESIGN

On occasion the subscripts a = 1, …, r of an r vector-valued series X(t) = [X_a(t)], t = 0, ±1, … may have an inherent structure of their own, as in the case where the series have been collected in an experimental design. Consider for example the case of a balanced one-way classification, where K of the series fall into each of J classes. Here we would probably denote the series by X_jk(t), t = 0, ±1, …; k = 1, …, K; j = 1, …, J with r = JK. Such series would arise if we were making up J batches of sheeting and drawing K pieces of sheeting from each batch. If we were interested in the uniformity of the sheeting we could let t refer to position from an origin along a cross-section of the sheeting and let X_jk(t) denote the thickness at position t on sheet k selected from batch j. A model that might come to mind for this situation is

where μ is a constant; where α(t), t = 0, ±1, … is a 0 mean stationary series with power spectrum f_αα(λ); where the series β_j(t), t = 0, ±1, …, j = 1, …, J are 0 mean stationary series each with power spectrum f_ββ(λ), −∞ < λ < ∞; and where the series ε_jk(t), t = 0, ±1, …; k = 1, …, K; j = 1, …, J are 0 mean stationary series each with power spectrum f_εε(λ), −∞ < λ < ∞. The parameter μ relates to the mean thickness of the sheeting. Series α(t), t = 0, ±1, … is common to all the sheets and the series β_j(t), t = 0, ±1, … relates to the effect of the jth batch, if such an individual effect exists. It is common to all sheets selected from batch j. The series ε_jk(t), t = 0, ±1, … is an error series. Taking note of the language of the random effects model of experimental design (Scheffé (1959)) we might call f_αα(λ), f_ββ(λ), f_εε(λ) components of the power spectrum at frequency λ. Spectrum f_ββ(λ) might be called the between batch power spectrum at frequency λ and f_εε(λ) the within batch power spectrum at frequency λ.

Under the above assumptions, we note that EX_jk(t) = μ, t = 0, ±1, … and the series have power spectra and cross-spectra as follows:

and

The coherency between series corresponding to sheets selected from the same batch is seen to be
This might be called the intraclass coherency at frequency λ. The coherency between the series corresponding to sheets selected from different batches is seen to be f_αα(λ)/[f_αα(λ) + f_ββ(λ) + f_εε(λ)].
We might be interested in a measure of the extent to which sheets from the same batch are related at frequency λ. One such measure is the coherency (7.9.5). In the extreme case of α(t), β_j(t) identically 0, this measure is identically 0. In another extreme case where ε_jk(t) is identically 0, this measure is 1. We turn to the problem of estimating f_αα(λ), f_ββ(λ), and f_εε(λ).
From the model (7.9.1) we see that
where
with similar definitions for d_α^{(T)}, d_βj^{(T)}, d_εjk^{(T)}. From Theorem 4.4.2 the variate d_α^{(T)}(λ) is approximately N₁^C(0, 2πT f_αα(λ)), the variates d_βj^{(T)}(λ), j = 1, …, J are approximately independent N₁^C(0, 2πT f_ββ(λ)) variates for λ ≢ 0 (mod π), while the variates d_εjk^{(T)}(λ), k = 1, …, K, j = 1, …, J are approximately independent N₁^C(0, 2πT f_εε(λ)) variates for λ ≢ 0 (mod π). The model (7.9.6) therefore has the approximate form of the random effects model of analysis of variance in a balanced one-way classification; see Scheffé (1959). This suggests that we evaluate the statistic
and then estimate/.,(X) by
We estimate A/^X) + Jtt(\) by
and finally estimate JKfatt(\) + Kf^\) + /.,(X) by
in the case that X ̂ 0 (mod 2?r).
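As a rough numerical sketch of these estimates: the transcript omits the defining equations (7.9.7) to (7.9.11), so the within-class, between-class and grand-mean decomposition, the divisors, and the 2πT normalization below are standard balanced one-way ANOVA assumptions of ours and may differ in detail from the text's exact statistics; the function name is also hypothetical.

```python
import numpy as np

def anova_spectra(x, s):
    """Rough one-way spectral ANOVA at the Fourier frequency 2*pi*s/T.

    x has shape (J, K, T): K series in each of J classes, T time points.
    Returns rough estimates of f_ee, K f_bb + f_ee, and
    JK f_aa + K f_bb + f_ee built from within-class, between-class and
    grand-mean squares of the DFT values d_jk at frequency 2*pi*s/T.
    """
    J, K, T = x.shape
    d = np.fft.fft(x, axis=2)[:, :, s]      # d_jk(2*pi*s/T), shape (J, K)
    d_j = d.mean(axis=1)                    # class means d_j.
    d_bar = d.mean()                        # grand mean d_..
    norm = 2 * np.pi * T
    within = np.sum(np.abs(d - d_j[:, None]) ** 2) / (J * (K - 1)) / norm
    between = K * np.sum(np.abs(d_j - d_bar) ** 2) / (J - 1) / norm
    grand = J * K * np.abs(d_bar) ** 2 / norm
    return within, between, grand
```

For independent white-noise series all three quantities estimate (multiples of) the flat error spectrum, mirroring the random effects decomposition above.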
278 ESTIMATING SECOND-ORDER SPECTRA

Theorem 7.9.1 Let JK series X_jk(t), t = 0, ±1,...; k = 1,...,K; j = 1,...,J be given of the form (7.9.1) where μ is a constant, where α(t), β_j(t), ε_jk(t), t = 0, ±1,...; k = 1,...,K; j = 1,...,J are independent 0 mean series satisfying Assumption 2.6.1 and having power spectra f_αα(λ), f_ββ(λ), f_εε(λ) respectively. Let f_εε(T)(λ), Kf_ββ(T)(λ) + f_εε(T)(λ), JKf_αα(T)(λ) + Kf_ββ(T)(λ) + f_εε(T)(λ) be (7.9.9) to (7.9.11). Then if λ ≢ 0 (mod π), these statistics are asymptotically independent f_εε(λ)χ²_{2J(K−1)}/[2J(K−1)], [Kf_ββ(λ) + f_εε(λ)]χ²_{2J}/(2J), [JKf_αα(λ) + Kf_ββ(λ) + f_εε(λ)]χ²_2/2. Also for s_l(T) an integer with λ_l(T) = 2πs_l(T)/T → λ_l as T → ∞, with 2λ_l(T), λ_l(T) ± λ_m(T) ≢ 0 (mod 2π) for 1 ≤ l < m ≤ L and T sufficiently large, the statistics f_εε(T)(λ_l(T)), Kf_ββ(T)(λ_l(T)) + f_εε(T)(λ_l(T)), JKf_αα(T)(λ_l(T)) + Kf_ββ(T)(λ_l(T)) + f_εε(T)(λ_l(T)), l = 1,...,L are asymptotically independent.
It follows from Theorem 7.9.1 that the estimate

of f_ββ(λ) will be distributed asymptotically as the difference of two independent chi-squared variates. It also follows that the ratio

will be distributed asymptotically as

as T → ∞. This last result may be used to set approximate confidence intervals for the ratio of power spectra f_ββ(λ)/f_εε(λ).
We have seen previously that advantages accrue from the smoothing of periodogram type statistics. The same is true in the present context. For s(T) an integer with 2πs(T)/T near λ ≢ 0 (mod 2π) consider the statistics

and

It follows from Theorem 7.9.1 that these will be asymptotically distributed as independent f_εε(λ)χ²_{2J(K−1)(2m+1)}/[2J(K−1)(2m+1)] and [Kf_ββ(λ) + f_εε(λ)]χ²_{2J(2m+1)}/[2J(2m+1)] respectively.
The discussion of this section may clearly be extended to apply to time series collected in more complicated experimental designs. The calculations and asymptotic distributions will parallel those going along with a normal random effects model for the design concerned. Shumway (1971) considered the model

X_j(t) = s(t) + η_j(t)

j = 1,...,N, where s(t) is a fixed unknown signal and η_j(t) a random noise series. He suggests the consideration of F ratios computed in the frequency domain. Brillinger (1973) considers the model (7.9.1) also in the case that the series α(t), β_j(t) are fixed and in the case that a transient series is present.
7.10 EXERCISES
7.10.1 Given the series [X_1(t), X_2(t)], t = 0, ±1,... with absolutely summable cross-covariance function c_12(u) = cov[X_1(t + u), X_2(t)], t, u = 0, ±1,..., show that f_21(λ) = f_12(−λ).
7.10.2 Under the conditions of the previous exercise, show that the co-spectrum of the series X_1^H(t) (the Hilbert transform of X_1(t)) with the series X_2(t) is the quad-spectrum of the series X_1(t) with the series X_2(t).
7.10.3 Under the conditions of the first exercise, show that f_12(λ), −∞ < λ < ∞, is real-valued in the case that c_12(u) = c_21(u).
7.10.4 Suppose the auto- and cross-covariance functions of the stationary series [X_1(t), X_2(t)], t = 0, ±1,... are absolutely summable. Use the identity

to prove that
7.10.7 Let [X_1(t), X_2(t)], t = 0, ±1,... be a stationary series.
(a) If Y_1(t) = X_1(t) + X_2(t), Y_2(t) = X_1(t) − X_2(t), show how the co-spectrum, Re f_12(λ), may be estimated from the power spectra of Y_1(t) and Y_2(t).
(b) If Y_1(t) = X_1(t + 1) − X_1(t − 1), and Y_2(t) = X_2(t), show how the quad-spectrum, Im f_12(λ), may be estimated from the co-spectrum of Y_1(t) and Y_2(t).
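Part (a) rests on the polarization identity |d_1 + d_2|² − |d_1 − d_2|² = 4 Re(d_1 conj(d_2)) applied to the Fourier transforms, so the co-spectrum follows from two power spectra. A quick numerical check; the simulated series and the unsmoothed cross-periodogram are illustrative choices of ours, not the text's:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 4096
# Two correlated series: X2 is X1 plus independent noise.
x1 = rng.standard_normal(T)
x2 = 0.7 * x1 + rng.standard_normal(T)

def cross_spectrum(a, b):
    """Crude (unsmoothed) cross-periodogram at the Fourier frequencies."""
    da, db = np.fft.fft(a), np.fft.fft(b)
    return da * np.conj(db) / (2 * np.pi * T)

# Exercise 7.10.7(a): with Y1 = X1 + X2 and Y2 = X1 - X2,
#   f_Y1Y1 - f_Y2Y2 = 4 Re f_12,
# so the co-spectrum is recoverable from power spectra alone.
f_y1 = cross_spectrum(x1 + x2, x1 + x2).real
f_y2 = cross_spectrum(x1 - x2, x1 - x2).real
co_from_power = (f_y1 - f_y2) / 4
co_direct = cross_spectrum(x1, x2).real
assert np.allclose(co_from_power, co_direct)
```

The identity is exact frequency by frequency, so it holds for the raw periodograms as well as for any smoothed versions of them.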
7.10.8 Under the conditions of Theorem 7.4.4 show that
is asymptotically bivariate normal with variances
and covariance
7.10.9 Under the conditions of Theorem 7.4.4, show that |f_12(T)(λ)|, |f_12(T)(μ)| are asymptotically bivariate normal with covariance structure

7.10.10 Under the conditions of Theorem 7.4.4, show that φ_12(T)(λ) = arg f_12(T)(λ), φ_12(T)(μ) = arg f_12(T)(μ) are asymptotically bivariate normal with covariance structure given by

7.10.11 Under the condition Σ_u |c_12(u)| < ∞, show that the expected value of I_12(T)(λ) is

7.10.12 Let [X_1(t), X_2(t)], t = 0, ±1,... be a bivariate series satisfying Assumption 2.6.1. Let

for l = 1,...,L. Show that c_12(V)(l), l = 1,...,L are asymptotically independent

variates. Conclude that with T = LV

is a plausible estimate of f_12(0).

7.10.13 Show that the results of Theorems 7.2.3, 7.2.4, 7.2.5, and 7.3.3 are exact rather than asymptotic when [X_1(t), X_2(t)], t = 0, ±1,... is a sequence of independent identically distributed bivariate normal variates.
7.10.14 Let the series [X_1(t), X_2(t)], t = 0, ±1,... satisfy Assumption 2.6.2(1) and have mean 0. Then

where there are finite K, L such that
7.10.15 Suppose the conditions of Theorem 7.3.3 are satisfied. Let ρ = f_ab(λ)/√[f_aa(λ)f_bb(λ)], a ≠ b. Then x = f_ab(T)(λ)/√[f_aa(T)(λ)f_bb(T)(λ)] is asymptotically distributed with density function

and density function

Hint: Use Exercise 4.8.33.

7.10.16 Let c_XX(u), u = 0, ±1,... denote the autocovariance matrix of the stationary r vector-valued series X(t), t = 0, ±1,.... Show that the matrix c_XX(0) − c_XX(u)^τ c_XX(0)^{−1} c_XX(u) is non-negative definite for u = 0, ±1,....

7.10.17 Let f_XX(λ), −∞ < λ < ∞, denote the spectral density matrix of the stationary r vector-valued series X(t), t = 0, ±1,.... Show that Im f_XX(λ) = 0 for λ ≡ 0 (mod π).

7.10.18 Let the autocovariance function of Exercise 7.10.16 satisfy

Suppose Det f_XX(λ) ≠ 0, −∞ < λ < ∞. Show that there exists a summable r × r filter {a(u)} such that the series

7.10.19 Let X(t), t = 0, ±1,... be the vector-valued series of Example 2.9.7. Show that

7.10.20 Under the conditions of Theorem 4.5.2, show that there exists a finite L such that with probability 1
7.10.21 Show that Theorem 7.2.1 takes the form
in the case of untapered data.
7.10.23 Suppose the conditions of Theorem 7.2.5 are satisfied. Set

Show that this estimate is asymptotically (L − 1)^{−1} W_r^C(L − 1, f_XX(λ)) if λ ≢ 0 (mod π) and asymptotically (L − 1)^{−1} W_r(L − 1, f_XX(λ)) if λ ≡ 0 (mod π).

7.10.24 Consider the estimate

where 2πs(T)/T → λ ≢ 0 (mod π) and Σ_s W_s = 1. Under the conditions of Theorem 7.3.3, show that f_XX(T)(λ) is distributed asymptotically as

where the W_s, s = 0, ±1,..., ±m are independent W_r^C(1, f_XX(λ)) variates. Indicate the mean and covariance matrix of the limiting distribution.

7.10.25 Suppose the estimate

is used in the case of T even and λ = π. Under the conditions of Theorem 7.3.3, show that it is asymptotically (2m + 1)^{−1} W_r(2m + 1, f_XX(π)).

7.10.26 Show that the estimate (7.4.5) is non-negative definite if the matrix [W_ab(α)] is non-negative definite for −∞ < α < ∞. Hint: Use Schur's result that [A_ab B_ab] is non-negative definite if [A_ab], [B_ab] are; see Bellman (1960) p. 94.

7.10.27 Show that the matrix [f_ab(T)(λ)] of estimates (7.7.9) is non-negative definite if the matrix [W_ab(α)] is non-negative definite, −∞ < α < ∞, and if h_a(u) = h(u) for a = 1,...,r.

7.10.28 Under the conditions of Theorem 7.3.2, show that f_ab(T)(λ) is consistent if f_aa(λ) or f_bb(λ) = 0.

7.10.29 Under the conditions of Theorem 7.3.3, show that √T(c_X(T) − c_X) and f_XX(T)(0) are asymptotically independent N_r(0, 2πf_XX(0)) and (2m)^{−1} W_r(2m, f_XX(0)) respectively. If

conclude that this statistic, divided by r, is asymptotically F_{r,2m}. This result may be used to construct approximate confidence regions for c_X.

7.10.30 Under the conditions of Theorem 7.4.3, show that in order to minimize the mean-squared error E|f_ab(T)(λ) − f_ab(λ)|² asymptotically, one should have B_T = O(T^{−1/5}); see Bartlett (1966) p. 316.

7.10.31 Under the conditions of Theorem 7.6.2, prove that R_ab(T)(λ) and R_cd(T)(λ) are asymptotically independent if R_ab, R_ac, R_ad, R_bc, R_bd, R_ca are 0.

7.10.32 Show that in the case that the series X(t), t = 0, ±1,... is not necessarily Gaussian, the covariance (7.6.21) equals

7.10.33 If the series X(t), t = 0, ±1,... is stationary, real-valued and Gaussian, prove that the covariance structure of the limit process of Theorem 7.6.3 is the same as that of

where B(α) is Brownian motion on [0, π].

7.10.34 If the series X(t), t = 0, ±1,... is a real-valued white noise process with variance σ² and fourth cumulant κ_4, show that the limit process of Theorem 7.6.3 has covariance function
7.10.35 If X(t), t = 0, ±1,... is a real-valued linear process, under the conditions of Theorem 7.6.3 show that

7.10.36 Let X(t), t = 0, ±1,... be an r vector-valued series satisfying Assumption 2.6.1. Show that c_ab(T)(u) given by (7.6.10) and c_ab(T)(u) given by (7.6.12) have the same limiting normal distributions. See also Exercise 4.8.37.

7.10.37 Let

for −∞ < λ < ∞; a, b = 1,...,r. Show that the matrix I_XX(T)(λ) = [I_ab(T)(λ)] is non-negative definite.

7.10.38 With the notation of Section 7.9, show that the following identity holds

7.10.39 Let

l = 0,...,L − 1; a, b = 1,...,r. Under the conditions of Theorem 7.6.1 show that J_ab(V)(λ,l), l = 0,...,L − 1 are asymptotically independent normal with mean ∫ A(α) f_ab(α) dα as V → ∞. This result may be used to set approximate confidence limits for J_ab(A).

7.10.40 Let the series X(t), t = 0, ±1,... satisfy Assumption 2.6.2(1). Show that expression (7.2.14) holds with the O(T^{−1}), O(T^{−2}) terms uniform in r, s ≢ 0 (mod T).

7.10.41 Use the results of the previous exercise to show that, under the conditions of Theorem 7.4.3,

and

converges weakly to a Gaussian process whose covariance function does not involve the fourth-order spectrum of the series X(t).

7.10.42 Let X(t), t = 0, ±1,... satisfy Assumption 2.6.2(1). Let A(α) be of bounded variation. Let W(α) satisfy Assumption 6.4.1. Suppose P_T → ∞, with P_T B_T ≤ 1, P_T B_T T → ∞ as T → ∞. Let

Show that J_ab(P)(A) is asymptotically normal with

Hint: Use the previous exercise.
8
ANALYSIS OF A LINEAR TIME INVARIANT RELATION
BETWEEN TWO VECTOR-VALUED STOCHASTIC SERIES
8.1 INTRODUCTION
Consider an (r + s) vector-valued stationary series

t = 0, ±1,... with X(t) r vector-valued and Y(t) s vector-valued. We assume the series (8.1.1) satisfies Assumption 2.6.1 and we define the means
the covariances
and the second-order spectral densities
The problem we investigate in this chapter is the selection of an s vector μ and an s × r filter {a(u)} such that the value

is near the value Y(t) in some sense. We develop statistical properties of estimates of the desired μ, a(u) based on a sample of values X(t), Y(t), t = 0,...,T − 1. The problems considered in this chapter differ from those of Chapter 6 in that the independent series, X(t), t = 0, ±1,..., is taken to be stochastic rather than fixed.

In the next section we review a variety of results concerning analogous multivariate problems.

8.2 ANALOGOUS MULTIVARIATE RESULTS

We remind the reader of the ordering for Hermitian matrices given by

if the matrix A − B is non-negative definite. This ordering is discussed in Bellman (1960), Gelfand (1961), and Siotani (1967), for example. The inequality (8.2.1) implies, among other things, that

and

where μ_j(A), μ_j(B) denote the jth largest latent values of A, B, respectively. In the theorem below, when we talk of minimizing a Hermitian matrix-valued function A(θ) with respect to θ, we mean finding the value θ_0 such that

for all θ. A(θ_0) is called the minimum value of A(θ). We note that if θ_0 minimizes A(θ), then from (8.2.2) to (8.2.5) it also minimizes simultaneously the functionals Det A(θ), tr A(θ), A_jj(θ), and μ_j(A(θ)).

We next introduce some additional notation. Let Z be an arbitrary matrix with columns Z_1,...,Z_l. We use the notation

for the column vector obtained from Z by placing its columns under one another successively. Given arbitrary matrices U, V we define their Kronecker product, U ⊗ V, to be the block matrix

if V is J × K. An important relation connecting the two notations of this paragraph is
if the dimensions of the matrices that appear are appropriate; see Exercise 8.16.26. Neudecker (1968) and Nissen (1968) discuss statistical applications of these definitions.
We now turn to the consideration of (r + s) vector-valued random variables of the form

with X, r vector-valued and Y, s vector-valued. Suppose the variate (8.2.10) has mean

and covariance matrix

Consider the problem of choosing the s vector μ and s × r matrix a to minimize the s × s Hermitian matrix

We have

Theorem 8.2.1 Let an (r + s) vector-valued variate of the form (8.2.10), with mean (8.2.11) and covariance matrix (8.2.12), be given. Suppose Σ_XX is nonsingular. Then the μ and a minimizing (8.2.13) are given by

and

The minimum achieved is
We call a, given by (8.2.15), the regression coefficient of Y on X. The variate

is called the best linear predictor of Y based on X. From Theorem 8.2.1, we see that the μ and a values given also minimize the determinant, trace, diagonal entries, and latent values of the matrix (8.2.13). References to this theorem include: Whittle (1963a) Chap. 4, Goldberger (1964) p. 280, Rao (1965), and Khatri (1967). In the case s = 1, the square of the correlation coefficient of Y with the best linear predictor of Y is called the squared coefficient of multiple correlation. It is given by

In the case of vector-valued Y, the matrix Σ_YY^{−1/2} Σ_YX Σ_XX^{−1} Σ_XY Σ_YY^{−1/2} has been proposed. It will appear in our discussion of canonical correlations given in Chapter 10. Real-valued functions of it, such as trace and determinant, will sometimes be of use. The matrix appears in Khatri (1964). Tate (1966) makes remarks concerning multivariate analogs of the correlation coefficient; see also Williams (1967) and Hotelling (1936).
We may define an error variate by

This variate represents the residual after approximating Y by the best linear function of X. The covariance matrix of ε is given by

that is, the matrix (8.2.16). The covariance of ε_j with ε_k is called the partial covariance of Y_j with Y_k. It measures the linear relation of Y_j with Y_k after the linear effects of X have been removed. Similarly the correlation coefficient of ε_j with ε_k is called the partial correlation of Y_j with Y_k. These parameters are discussed in Kendall and Stuart (1961) Chap. 27, and Morrison (1967) Chap. 3.
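In matrix terms, the regression coefficient, error covariance and partial correlations of Theorem 8.2.1 can be sketched as follows; the function and variable names are ours, not the text's:

```python
import numpy as np

def regression_and_partials(sigma_xx, sigma_xy, sigma_yy):
    """Population regression of Y on X and the partial correlations.

    sigma_xx: r x r covariance of X; sigma_xy: r x s cross-covariance
    Sigma_XY; sigma_yy: s x s covariance of Y.  Implements
    a = Sigma_YX Sigma_XX^{-1} and the error covariance
    Sigma_YY - Sigma_YX Sigma_XX^{-1} Sigma_XY, then normalizes the
    error covariance to partial correlations.
    """
    a = sigma_xy.T @ np.linalg.inv(sigma_xx)       # regression coefficient
    sigma_ee = sigma_yy - a @ sigma_xy             # error (partial) covariance
    d = np.sqrt(np.diag(sigma_ee))
    partial_corr = sigma_ee / np.outer(d, d)       # partial correlations
    return a, sigma_ee, partial_corr
```

The off-diagonal entries of `partial_corr` are the partial correlations of Y_j with Y_k after the linear effects of X have been removed.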
In the case that the variate (8.2.10) has a multivariate normal distribution, the predictor suggested by Theorem 8.2.1 is best within a larger class of predictors.
Theorem 8.2.2 Suppose the variate (8.2.10) is multivariate normal with mean (8.2.11) and covariance matrix (8.2.12). Suppose Σ_XX is nonsingular. The s vector-valued function φ(X), with E{φ(X)^τ φ(X)} < ∞, that minimizes

is given by

The minimum achieved is

In the case that the variate has a normal distribution, the conditional distribution of Y given X is

and so we see that the partial correlation of Y_j with Y_k is the conditional correlation of Y_j with Y_k given X.

We turn to some details of the estimation of the parameters of the above theorems. Suppose that a sample of values

j = 1,...,n of the variate of Theorem 8.2.1 are available. For convenience assume μ_X = 0 and μ_Y = 0. Define the r × n matrix x and the s × n matrix y by

We may estimate the covariance matrix (8.2.12) by

and

The regression coefficient of Y on X may be estimated by

and the error matrix (8.2.20) may be estimated by

The reason for the divisor (n − r) rather than n will become apparent in the course of the statement of the next theorem. We have

Theorem 8.2.3 Suppose the values (8.2.24), j = 1,...,n, are a sample from a multivariate normal distribution with mean 0 and covariance matrix (8.2.12). Let â be given by (8.2.28) and Σ̂_εε by (8.2.29). Then for any (rs) vector α
is distributed as

and if n → ∞, â is asymptotically normal with these moments. Also Σ̂_εε is independent of â and distributed as (n − r)^{−1} W_s(n − r, Σ_εε). In the case s = 1, R̂_YX² = Σ̂_YX Σ̂_XX^{−1} Σ̂_XY / Σ̂_YY has density function
The function appearing in (8.2.32) is a generalized hypergeometric function; see Abramowitz and Stegun (1964). Percentage points and moments of R̂_YX² are given in Amos and Koopmans (1962), Ezekiel and Fox (1959) and Kramer (1963). Olkin and Pratt (1958) construct an unbiased estimate of R_YX². The distributions of further statistics may be determined from the fact that the matrix
is distributed as
The distribution of â is given in Kshirsagar (1961). Its density function is proportional to

This is a form of multivariate t distribution; see Dickey (1967).

Estimates of the partial correlations may be based on the entries of Σ̂_εε in a manner paralleling their definition. For example an estimate of the partial correlation of Y_j and Y_k with X held linearly constant is

with [Σ̂_εε]_jk denoting the entry in row j, column k of Σ̂_εε.
From the distribution of Σ̂_εε given in Theorem 8.2.3, we see that this expression is distributed as the sample correlation coefficient of ε_j with ε_k based on n − r observations. The density function of its square will be given by expression (8.2.32) with R_YX², R̂_YX², n, r replaced by R²_{YjYk·X}, R̂²_{YjYk·X}, n − r, 1, respectively. The large sample variance of this R̂² is approximately 4R²[1 − R²]²/n. The distribution of correlation coefficients developed in Fisher (1962) may be modified to obtain the joint distribution of all the partial correlations. The asymptotic joint covariance structure may be deduced from the results of Pearson and Filon (1898), Hall (1927), and Hsu (1949). Further results and approximations to the distributions of estimates of squared correlation coefficients are given in Kendall and Stuart (1961) p. 341, Gajjar (1967), Hodgson (1968), Alexander and Vok (1963), Giri (1965), and Gurland (1966).
There are complex variate analogs of the preceding theorems. For example:

Theorem 8.2.4 Let the (r + s) vector-valued variate

have complex entries, mean 0 and be such that

and

Suppose Σ_XX is nonsingular. Then the μ and a minimizing

are given by

The minimum achieved is
We call a, given by (8.2.40), the complex regression coefficient of Y on X. It is a consequence that the indicated μ, a also minimize the determinant, trace, and diagonal entries of (8.2.39). In the case s = 1 the minimum (8.2.41) may be written

where we define

This parameter is clearly an extension to the complex-valued case of the squared coefficient of multiple correlation. Because the minimum (8.2.41) must lie between Σ_YY and 0, it follows that 0 ≤ |R_YX|² ≤ 1, the value 1 occurring when the minimum is 0. On occasion we may wish to partition |R_YX|² into

and

where we have Σ_YX = Re Σ_YX + i Im Σ_YX. These expressions are measures of the degree of linear relation of Y with Re X and Im X respectively.
Returning now to the case of vector-valued Y, a direct measure of the degree of approximation of Y by a linear function of X is provided by the error variate

which has mean 0 and is such that

and

Analogs of the partial covariance and partial correlation may be based on the matrix (8.2.47) in an immediate manner.

Suppose now that a sample of values

of the variate of Theorem 8.2.4 are available. Define matrices x and y as in (8.2.25) and (8.2.26). We are led to construct the statistics
and

which leads us to

Theorem 8.2.5 Suppose values of the form (8.2.49), j = 1,...,n, are a sample from a complex multivariate normal distribution with mean 0 and covariance matrix (8.2.37). Let â be given by (8.2.51) and Σ̂_εε by (8.2.52). Then for any (rs) vector α

is distributed as

and if n → ∞, vec â is asymptotically N_rs^C(vec a, n^{−1} Σ_εε ⊗ Σ_XX^{−1}). Continuing, Σ̂_εε is independent of â and distributed as (n − r)^{−1} W_s^C(n − r, Σ_εε). Finally in the case s = 1 the density function of |R̂_YX|² = Σ̂_YX Σ̂_XX^{−1} Σ̂_XY / Σ̂_YY is

We note that the distribution of |R̂_YX|² in the complex case is identical with the real case distribution having twice the sample size and twice the X dimension. The heuristic approach described in Section 8.4 will suggest the reason for this occurrence. A useful consequence is that we may use tables and results derived for the real case. The density function (8.2.55) is given in Goodman (1963); see also James (1964) expression (112), and Khatri (1965a). In the case |R_YX|² = 0, expression (8.2.55) becomes
This is the same as the null distribution of (6.2.10) derived under the assumption of fixed X. Percentage points in this case may therefore be derived from F percentage points as they were in Chapter 6. Amos and Koopmans (1962) and Groves and Hannan (1968) provide a variety of non-null percentage points for |R̂_YX|².
Confidence regions for the entries of â may be constructed from expression (8.2.53) in the manner of Section 6.2.
By analogy with (8.2.34) the density function of â will be proportional to

Wahba (1966) determined this density in the case s = 1.

Sometimes it is of interest to consider the following complex analogs of the partial correlations

A natural estimate of these is provided by

We see from the distribution of Σ̂_εε given in Theorem 8.2.5 that this last is distributed as the sample complex correlation coefficient of ε_j with ε_k based on n − r observations. Its modulus-square will have density function (8.2.55) with the replacement of R_YX, R̂_YX, n, r by R_{YjYk·X}, R̂_{YjYk·X}, n − r, 1, respectively. The asymptotic covariances of pairs of these estimates may be deduced from expression (7.6.16).

8.3 DETERMINATION OF AN OPTIMUM LINEAR FILTER

We return to the notation of Section 8.1 and the problem of determining an s vector, μ, and an s × r filter, {a(u)}, so that

is close to Y(t). Suppose we measure closeness by the s × s Hermitian matrix

We then have
Theorem 8.3.1 Consider an (r + s) vector-valued second-order stationary time series of the form (8.1.1) with mean (8.1.2) and autocovariance function (8.1.3). Suppose c_XX(u), c_YY(u) are absolutely summable and suppose f_XX(λ), given by (8.1.4), is nonsingular, −∞ < λ < ∞. Then the μ and a(u) that minimize (8.3.2) are given by

and

where

The filter {a(u)} is absolutely summable. The minimum achieved is
A(λ), given by expression (8.3.5), is the transfer function of the s × r filter achieving the indicated minimum. We call A(λ) the complex regression coefficient of Y(t) on X(t) at frequency λ.
The s vector-valued series

where μ and a(u) are given in Theorem 8.3.1, is called the error series. It is seen to have 0 mean and spectral density matrix

This is called the error spectrum. We may write it in the form
and thus we are led to measure the linear association of Y(t) with X(t) by the s × s matrix

In the case that s = 1, (8.3.10) is called the multiple coherence of Y(t) with X(t) at frequency λ. We denote it by |R_YX(λ)|² and write

and see that |R_YX(λ)|² = 0 corresponds to the incoherent case in which X(t) does not reduce the error variance. The value |R_YX(λ)|² = 1 corresponds to the perfectly coherent case in which the error series is reduced to 0. The coefficient of multiple coherence was defined in Goodman (1963); see also Koopmans (1964a,b).
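For scalar Y the multiple coherence is the quadratic form f_YX(λ) f_XX(λ)^{−1} f_XY(λ) / f_YY(λ). A minimal sketch, with names of our own choosing:

```python
import numpy as np

def multiple_coherence(f_xx, f_xy, f_yy):
    """Multiple coherence |R_YX(lambda)|^2 of scalar Y with r vector X.

    f_xx: r x r spectral density matrix of X at frequency lambda;
    f_xy: length-r vector of cross-spectra f_XY(lambda);
    f_yy: power spectrum of Y at lambda.
    Computes f_YX f_XX^{-1} f_XY / f_YY, using f_YX = conj(f_XY)^T.
    """
    num = np.conj(f_xy) @ np.linalg.solve(f_xx, f_xy)
    return (num / f_yy).real
```

With valid spectral matrices the value lies in [0, 1]: it is 0 in the incoherent case and 1 in the perfectly coherent case described above.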
Returning to the case of general s we call the cross-spectrum between the ath and bth components of the error series, ε_a(t) and ε_b(t), the partial cross-spectrum of Y_a(t) with Y_b(t) after removing the linear effects of X(t). It is given by
This gives us an interpretation for the individual entries of a matrix-valuedcomplex regression coefficient.
(In the case r, s = 1, we define the coherency R_YX(λ) = f_YX(λ)/[f_XX(λ)f_YY(λ)]^{1/2}.) The multiple coherence satisfies the inequalities

(see Exercise 8.16.35) and measures the extent to which the real-valued Y(t) is determinable from the r vector-valued X(t) by linear time invariant operations. We write
−∞ < λ < ∞. We call the coherency of these components the partial coherency of Y_a(t) with Y_b(t) after removing the linear effects of X(t). It is given by
These last parameters are of use in determining the extent to which an apparent time invariant linear relation between the series Y_a(t) and Y_b(t) is due to the linear relation of each to a series X(t); see Gersch (1972). We can likewise define the partial complex regression coefficient of Y_a(t) on Y_b(t) after removing the linear effects of X(t) to be
As would have been expected from the situation in the real variate case, it turns out that expression (8.3.16) is the entry corresponding to Y_b(t) in the matrix-valued complex regression coefficient of Y_a(t) on the (r + 1) vector-valued series
The above parameters of the partial cross-spectral analysis of time series were introduced by Tick (1963) and Wonnacott; see Granger (1964) p. xiii. They are studied further in Koopmans (1964b), Goodman (1965), Akaike (1965), Parzen (1967c), and Jenkins and Watts (1968).
As an example of the values of these various parameters consider the model

where X(t) is r vector-valued, stationary with spectral density matrix f_XX(λ); ε(t) is s vector-valued, stationary, mean 0, with spectral density matrix f_εε(λ), and independent of X(t) at all lags; μ is an s vector; and {a(u)} is an absolutely summable s × r matrix-valued filter. We quickly see that the complex regression coefficient of Y(t) on X(t) is given by

Also

and so
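For this model the complex regression coefficient is just the transfer function of the filter {a(u)}, that is, the sum of a(u) exp(−iλu) over the lags u. A minimal sketch for scalar series (the function name is ours):

```python
import numpy as np

def transfer_function(a, lags, lam):
    """Transfer function A(lambda) = sum_u a(u) exp(-i*lambda*u).

    a: filter coefficients a(u); lags: the matching lags u;
    lam: frequency lambda.  For Y(t) = mu + sum_u a(u) X(t-u) + eps(t),
    this is the complex regression coefficient of Y(t) on X(t) at lambda.
    """
    return sum(c * np.exp(-1j * lam * u) for c, u in zip(a, lags))
```

For a pure delay filter a(3) = 2, for instance, the gain |A(λ)| is 2 at every frequency while the phase falls linearly in λ.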
In the case that the series (8.1.1) is Gaussian a direct interpretation may be placed on μ and a(u) of Theorem 8.3.1. We have

Theorem 8.3.2 Under the conditions of Theorem 8.3.1 and if the series (8.1.1) is Gaussian, μ and a(u) of (8.3.3) and (8.3.4) are given by
Also
General references to the previous development, in the case r, s = 1, include: Wiener (1949), Solodovnikov (1950), Koopmans (1964a), and Blackman (1965). There are a variety of connections between the approach of this section and that of Chapter 6. The principal difference in assumption is that the series X(t) is now stochastic rather than fixed. The model of Chapter 6 was

with μ constant, a(u) a summable filter, and ε(t) a 0 mean error series. Exercise 8.16.33 is to show that such a model holds under the conditions of Theorem 8.3.1.

We end this section with an example of the application of Theorem 8.3.1. Suppose that η(t) and Y(t) are independent s vector-valued, 0 mean stationary series. Suppose that the series X(t) is given by

The series Y(t) may be thought of as a signal immersed in a noise series η(t). Suppose that we wish to approximate Y(t) by a filtered version of X(t). The spectral density matrix of X(t) and Y(t) is given by

Following expression (8.3.5) the transfer function of the best linear filter for determining Y(t) from X(t) is given by

This A(λ) is called the matched filter for the signal Y(t) in the noise η(t). We see its general character is one of not passing the frequency components of X(t) in frequency intervals where f_ηη(λ) is very large relative to f_YY(λ), while the components are passed virtually unaltered in intervals where f_ηη(λ) is small relative to f_YY(λ). In the case s = 1, the parameter f_YY(λ)/f_ηη(λ) is called the signal to noise ratio at frequency λ.

8.4 HEURISTIC INTERPRETATION OF PARAMETERS AND CONSTRUCTION OF ESTIMATES

Let the (r + s) vector-valued series

t = 0, ±1,... satisfy Assumption 2.6.1 and suppose its values are available for t = 0,...,T − 1. We evaluate the finite Fourier transform of these values

−∞ < λ < ∞. Following Theorem 4.4.2, for large T, this variate will be distributed approximately as
Referring to the discussion of Theorem 8.2.4, we now see that A(λ), the complex regression coefficient of Y(t) on X(t) at frequency λ, may be interpreted, approximately, as the complex regression coefficient of d_Y(T)(λ) on d_X(T)(λ). It is therefore of use in the prediction of the value of d_Y(T)(λ) from that of d_X(T)(λ) in a linear manner. The error spectrum, f_εε(λ), is approximately proportional to the covariance matrix of the error variate of this prediction problem. Likewise the partial complex regression coefficient of Y_a(t) on Y_b(t) after removing the linear effects of X(t) is nearly the complex regression coefficient of d_Ya(T)(λ) on d_Yb(T)(λ) after removing the linear effects of d_X(T)(λ). Continuing, suppose s = 1. We see that |R_YX(λ)|², the multiple coherence of Y(t) with X(t) at frequency λ, may, following the discussion of Theorem 8.2.4, be interpreted as the complex analog of the squared coefficient of multiple correlation of d_Y(T)(λ) with d_X(T)(λ). Finally the partial coherency of Y_a(t) with Y_b(t) after removing the linear effects of X(t) may be interpreted as the complex analog of the partial correlation of d_Ya(T)(λ) with d_Yb(T)(λ) after removing the linear effects of d_X(T)(λ). In the case that the series (8.4.1) is Gaussian these partial parameters will be approximately conditional parameters given the value d_X(T)(λ).
Similar interpretations may be given in the case λ ≡ 0 (mod π). Real-valued statistics and distributions will be involved in this case.
Let us next turn to the construction of estimates of the various parameters. Suppose s(T) is an integer with 2πs(T)/T near λ, where we take λ ≢ 0 (mod π). Following Theorem 4.4.1, the values

s = 0, ±1,..., ±m will be approximately independent realizations of the variate (8.4.3). Following the discussion of Theorem 8.2.5, specifically expression (8.2.50), we can consider forming the statistics

and
in turn, the latter two being estimates of A(λ), f_εε(λ), respectively. Theorem 8.2.5 suggests approximations to the distributions of these statistics. In Section 8.6 we will make the definition (8.4.5) more flexible by including weights in the summation.
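The unweighted version of these statistics can be sketched as follows for scalar Y; the transcript omits (8.4.4) to (8.4.6), so the function name and the normalization below (which cancels in the ratio defining the estimate of A(λ)) are assumptions of ours:

```python
import numpy as np

def estimate_A(x, y, s, m):
    """Frequency-domain regression estimate of A(lambda) near 2*pi*s/T.

    x: array of shape (T, r), the series X(t); y: array of shape (T,),
    the scalar series Y(t).  Averages the 2m+1 periodogram ordinates at
    the frequencies 2*pi*(s+k)/T, k = -m,...,m, then forms the sample
    complex regression coefficient f_YX f_XX^{-1}.
    """
    T, r = x.shape
    dx = np.fft.fft(x, axis=0)          # d_X(T) at the Fourier frequencies
    dy = np.fft.fft(y)                  # d_Y(T)
    f_xx = np.zeros((r, r), dtype=complex)
    f_xy = np.zeros(r, dtype=complex)
    for k in range(s - m, s + m + 1):
        f_xx += np.outer(dx[k], np.conj(dx[k]))
        f_xy += dx[k] * np.conj(dy[k])
    # f_xy holds conj(f_YX), and f_xx is Hermitian, so A = conj(solve(...)).
    return np.conj(np.linalg.solve(f_xx, f_xy))
```

When Y(t) is exactly a filtered version of X(t), the estimate reproduces the filter's transfer function at the chosen frequency.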
Heuristic approaches to the linear analysis of multivariate series are given in Tick (1963), Akaike (1965), and Groves and Hannan (1968). A discussion of the parameters and estimates is given in Fishman (1969).
We may also provide an interpretation of the parameters of Section 8.3 by means of the frequency components X(t,λ), Y(t,λ), t = 0, ±1,... and their Hilbert transforms X^H(t,λ), Y^H(t,λ), t = 0, ±1,.... From the discussion of Section 7.1 we see that the covariance matrix of the variate

is, approximately, proportional to

Now

and so

We now see that Re A(λ) may be interpreted as the coefficient of X(t,λ) in the regression of Y(t,λ) on

Likewise we see that Im A(λ) may be interpreted as the coefficient of X^H(t,λ) in the same regression.

The covariance matrix of the error variate of this regression analysis is

If s = 1, then the squared coefficient of multiple correlation of the regression of Y(t,λ) on the variate (8.4.12) is

We see that the coefficient of multiple coherence may be interpreted as the squared coefficient of multiple correlation of Y(t,λ) with expression (8.4.12). We see, therefore, that the real parts of the partial coherencies may be interpreted as partial correlations involved in the regression of Y(t,λ) on the variate (8.4.12). Similar considerations indicate that the imaginary parts may be interpreted as partial correlations of the regression of Y^H(t,λ) on (8.4.12).

We end this section with a discussion of some useful parameters. The entries of A(λ) are generally complex-valued. In practice we may wish to deal with the real-valued Re A_ab(λ), Im A_ab(λ), or the real-valued modulus G_ab(λ) = |A_ab(λ)| and argument φ_ab(λ) = arg A_ab(λ). Consider the case r, s = 1. G(λ) = |A(λ)| is called the gain of Y(t) over X(t) at frequency λ. The function G(λ) is non-negative and we see that
and
If
then
Expression (8.4.18) suggests the source of the term gain. We see that the amplitude of the component of frequency λ in X(t) is multiplied by G(λ) in the case of Y(t).
In the example Y(t) = aX(t − u), we see that

The gain here has the nature of the absolute value of a regression coefficient and is constant with respect to λ.
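A quick numerical check of the constant gain, and of the group delay discussed shortly, for Y(t) = aX(t − u) with a < 0; the particular values of a, u and the frequency grid are illustrative only:

```python
import numpy as np

# For Y(t) = a X(t - u), A(lambda) = a exp(-i*lambda*u): the gain
# |A(lambda)| is constant at |a|, and the phase arg A(lambda) has slope
# -u in lambda, so the group delay -d(phase)/d(lambda) equals u
# whatever the sign of a.
a, u = -1.5, 3
lam = np.linspace(0.1, 0.5, 5)
A = a * np.exp(-1j * lam * u)

gain = np.abs(A)
assert np.allclose(gain, 1.5)          # constant gain |a|

phase = np.unwrap(np.angle(A))         # phase on a continuous branch
slope = np.diff(phase) / np.diff(lam)
assert np.allclose(slope, -u)          # group delay is u
```

Note that the phase itself shifts by π when a changes sign, but its slope, and hence the group delay, does not.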
The function <£(X) = arg A(\) is called the phase between Y(f) and X(t) at
8.4 HEURISTIC INTERPRETATION OF PARAMETERS 303
frequency λ. The fundamental range of values of φ(λ) is the interval (−π,π]. Because f_XX(λ) ≠ 0, φ(λ) is given by
We see
and so φ(0) = 0. Also
Suppose
In terms of the Cramér representations
and so φ(λ) may be interpreted as the angle between the component of frequency λ in X(t) and the corresponding component in Y(t).
If, for example, Y(t) = aX(t − u), we see
Figure 8.4.1 φ(λ), phase angle corresponding to delay of u time units when a > 0.
Figure 8.4.2 φ(λ), phase angle corresponding to delay of u time units when a < 0.
8.5 A LIMITING DISTRIBUTION FOR ESTIMATES
In this section we determine the limiting distribution of the estimates constructed in the previous section under the conditions T → ∞, but m fixed. Let
Set
and
These two functions are plotted in Figures 8.4.1 and 8.4.2 respectively, taking (−π,π] as the fundamental range of values for φ(λ).
On occasion the function
is more easily interpreted. It is called the group delay of Y(t) over X(t) at frequency λ.
In the case of the example, we see that the group delay is u for all values of a. That is, it is the amount that Y(t) is delayed with respect to X(t).
We note that the group delay is defined uniquely, whereas φ(λ) is defined only up to an arbitrary multiple of 2π.
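As a hedged numerical illustration (the delay u and the frequency grid are hypothetical), the group delay −dφ(λ)/dλ may be approximated by differencing the unwrapped phase; for the delay example it returns u for either sign of a:

```python
import numpy as np

# Sketch: approximate the group delay -d phi(lambda)/d lambda numerically.
# For the delay example Y(t) = a X(t - u) it equals u, whatever the sign of a.
u = 4                                             # hypothetical delay, in time units
lam = np.linspace(0.01, 3.0, 300)

for a in (1.5, -1.5):
    phase = np.angle(a * np.exp(-1j * lam * u))   # principal values, with 2*pi jumps
    phase = np.unwrap(phase)                      # remove the jumps
    group_delay = -np.gradient(phase, lam)        # -d phi / d lambda
    assert np.allclose(group_delay, u)            # equals u at every frequency
```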
and so
Let m be a non-negative integer and s(T), T = 1, 2, . . . a sequence of integers with 2πs(T)/T → λ as T → ∞. In the manner of Section 7.3, set
has the limiting distribution F_{2,2(2m+1−r)} in the case λ ≢ 0 (mod π). Similar results hold in the case λ ≡ 0 (mod π).
We conclude from Exercise 4.8.8 that under the conditions of Theorem 8.5.1, g_εε^(T)(λ) is asymptotically (2m + 1 − r)^{-1} W_s(2m + 1 − r, f_εε(λ)) if
We now construct the estimates
where
We notice that if m is large, then C(m,r) ≈ 1 and definition (8.5.6) is simplified. We also form
We now state:
Theorem 8.5.1 Let the (r + s) vector-valued series (8.5.1) satisfy Assumption 2.6.1 and have spectral density matrix (8.5.2). Let (8.5.2) be estimated by (8.5.4) where m, s(T) are integers with 2πs(T)/T → λ as T → ∞. Let
be distributed as (2m + 1)^{-1} W_{r+s}(2m + 1, f_ZZ(λ)) if λ ≢ 0 (mod π), as (2m)^{-1} W_{r+s}(2m, f_ZZ(λ)) if λ ≡ 0 (mod π). Then A^(T)(λ) − A(λ), g_εε^(T)(λ) tend in distribution to W_YX W_XX^{-1}, W_εε = C(m,r)(W_YY − W_YX W_XX^{-1} W_XY) respectively. Also R_{YaYb·X}^(T)(λ) tends to W_{εaεb}/[W_{εaεa} W_{εbεb}]^{1/2}, a, b = 1, . . . , s, and if s = 1, |R_YX^(T)(λ)|² tends to W_YX W_XX^{-1} W_XY / W_YY.
The density function of the limiting distribution of A^(T)(λ) is deducible from (8.2.57) and (8.2.34). This was given in Wahba (1966) for the case s = 1, λ ≢ 0 (mod π). A more useful result comes from noting that for any s vector α
λ ≢ 0 (mod π), asymptotically (2m − r)^{-1} W_s(2m − r, f_εε(λ)) if λ ≡ 0 (mod π). It is also asymptotically independent of A^(T)(λ). We note, from Theorem 7.3.3, that the asymptotic distribution of g_εε^(T)(λ) has the nature of the asymptotic distribution of a spectral estimate based directly on the values ε(t), t = 0, . . . , T − 1, with the parameter 2m in that case replaced by 2m − r in the present case.
The partial coherencies R_{YaYb·X}^(T)(λ), a, b = 1, . . . , s are based directly on the matrix g_εε^(T)(λ). We conclude from the above remarks that under the conditions of Theorem 8.5.1, their asymptotic distribution will be that of unconditional coherencies with the parameter 2m replaced by 2m − r. In the case of vector-valued normal variates this result was noted by Fisher (1924). The distribution for a single R_{YaYb·X}^(T)(λ) is given by (8.2.32) and (8.2.55) with r = 1.
Turning to the asymptotic distribution of the coefficient of multiple coherence in the case s = 1, set |R_YX|² = |R_YX(λ)|², |R_YX^(T)|² = |R_YX^(T)(λ)|². Then the limiting distribution of |R_YX^(T)(λ)|² will be given by (8.2.55) with n = 2m + 1 if λ ≢ 0 (mod π), by (8.2.32) with n = 2m if λ ≡ 0 (mod π).
Goodman (1963) suggested the above limiting distribution for the coherence. See also Goodman (1965), Khatri (1965), and Groves and Hannan (1968). Enochson and Goodman (1965) investigate the accuracy of approximating the distribution of tanh^{-1} |R_YX^(T)(λ)| by a normal distribution with mean
and variance 1/[2(2m − r)]. The approximation seems reasonable.
8.6 A CLASS OF CONSISTENT ESTIMATES
In this section we develop a general class of estimates of the parameters that have been defined in Section 8.3. Suppose the values
t = 0, . . . , T − 1 are available. Define d_X^(T)(λ), d_Y^(T)(λ), −∞ < λ < ∞, in
the manner of (8.4.2). Define the matrix of cross-periodograms
−∞ < λ < ∞, with similar definitions for I_XX^(T)(λ), I_YY^(T)(λ). Let W(α) be a weight function satisfying Assumption 5.4.1.
We now estimate
the matrix of second-order spectra by
having taken note of the heuristic estimate (8.4.5). We estimate A(λ) by
The typical entry, A_ab^(T)(λ), of A^(T)(λ) is generally complex-valued. On occasion we may wish to consider its amplitude G_ab^(T)(λ) and its argument φ_ab^(T)(λ). Based on this estimate we take
and
for a = 1, . . . , s and b = 1, . . . , r. We estimate the error spectral density matrix f_εε(λ) by
We estimate the partial coherency R_{YaYb·X}(λ) by
In the case s = 1 we estimate |R_YX(λ)|², the multiple coherence of Y(t) with X(t), by
−∞ < λ < ∞. The various estimates are seen to be sample analogs of corresponding population definitions.
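The construction of these sample analogs can be sketched, for r = s = 1, along the following lines. This is an illustrative computation, not the text's exact estimate: the smoothing simply averages 2m + 1 neighbouring periodogram ordinates (in the spirit of (8.5.4)) rather than applying a general weight function, and the simulated series are hypothetical.

```python
import numpy as np

# Sketch of the r = s = 1 sample analogs of Section 8.6: smooth the
# (cross-)periodograms, then form the complex regression coefficient,
# gain, phase, coherence, and error spectrum.
rng = np.random.default_rng(0)
T, m, a, u = 4096, 8, 2.0, 3
X = rng.standard_normal(T)
Y = a * np.roll(X, u) + 0.5 * rng.standard_normal(T)     # Y(t) = a X(t-u) + noise

dX, dY = np.fft.fft(X), np.fft.fft(Y)
IXX = np.abs(dX) ** 2 / (2 * np.pi * T)                  # periodograms
IYY = np.abs(dY) ** 2 / (2 * np.pi * T)
IYX = dY * np.conj(dX) / (2 * np.pi * T)                 # cross-periodogram

def smooth(I, m):
    # average 2m+1 neighbouring ordinates, wrapping around circularly
    k = np.ones(2 * m + 1) / (2 * m + 1)
    return np.convolve(np.concatenate([I[-m:], I, I[:m]]), k, mode="valid")

fXX, fYY, fYX = smooth(IXX, m), smooth(IYY, m), smooth(IYX, m)
A_hat = fYX / fXX                                        # complex regression coefficient
gain, phase = np.abs(A_hat), np.angle(A_hat)             # gain and phase estimates
coh = np.abs(fYX) ** 2 / (fXX * fYY)                     # coherence estimate
g_err = fYY - np.abs(fYX) ** 2 / fXX                     # error spectrum estimate
```

By the Cauchy–Schwarz inequality the smoothed coherence estimate lies in [0,1], and the error spectrum estimate is non-negative, mirroring the population quantities.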
Turning to the asymptotic first-order moments of the various statistics we have
Theorem 8.6.1 Let the (r + s) vector-valued series (8.6.1) satisfy Assumption 2.6.2(1) and have spectral density matrix (8.6.3). Suppose f_XX(λ) is nonsingular. Let W(α) satisfy Assumption 5.6.1. Suppose the statistics A^(T)(λ), φ_ab^(T)(λ), G_ab^(T)(λ), g_εε^(T)(λ), R_{YaYb·X}^(T)(λ) are given by (8.6.5) to (8.6.9). Then
if B_T → 0, B_T T → ∞ as T → ∞
and
We see that, in each case, the asymptotic means of the various statistics are nonlinear matrix weighted averages of the population values of interest. The asymptotic bias will therefore depend on how nearly constant these averages are in the neighborhood of λ. In the limit we have
Corollary 8.6.1 Under the conditions of Theorem 8.6.1
and in the case s = 1
The various estimates are asymptotically unbiased in an extended sense. We can develop expansions in powers of B_T of the asymptotic means; see Exercise 8.16.25. The important thing that we note from such expressions is
that the nearer the derivatives of the population second-order spectra are to 0, the less the asymptotic bias. Nettheim (1966) expanded in powers of B_T^{-1}T^{-1} in the Gaussian case.
Estimates of the parameters under consideration were investigated in Goodman (1965), Akaike (1965), Wahba (1966), Parzen (1967), and Jenkins and Watts (1968). The case r, s = 1 was considered in Goodman (1957), Tukey (1959a,b), Akaike and Yamanouchi (1962), Jenkins (1963a,b), Akaike (1964), Granger (1964), and Parzen (1964).
8.7 SECOND-ORDER ASYMPTOTIC MOMENTS OF THE ESTIMATES
We now turn to the development of certain second-order properties of the statistics of the previous section.
Theorem 8.7.1 Under the conditions of Theorem 8.6.1 and if f_XX(α) is nonsingular in a neighborhood of λ or μ, then
To consider various aspects of these results, let Ψ(λ) denote the matrix f_XX(λ)^{-1}; then from (8.7.1) and the perturbation expansions given in Exercise 8.16.24, we conclude
for a, c = 1, . . . , s; b, d = 1, . . . , r.
Let us use the notation X'_b to denote the set of X_d, d = 1, . . . , r excluding X_b. Then we have from Exercise 8.16.37
We also have
and so
From the standpoint of variability we see, from (8.7.10), that the estimate A_ab^(T)(λ) will be best if the multiple coherence of Y_a(t) with X(t) is near 1 and if the multiple coherence of X_b(t) with X_1(t), . . . , X_{b−1}(t), X_{b+1}(t), . . . , X_r(t) is near 0.
Turning to a consideration of the estimated gain and phase we first note the relations
and
We have from expressions (8.7.6) to (8.7.8), (8.7.11), and (8.7.13) the following:
and
We see that the variability of log G_ab^(T)(λ), φ_ab^(T)(λ) will be small if the partial coherence of Y_a(t) with X_b(t) after removing the linear effects of X_1(t), . . . , X_{b−1}(t), X_{b+1}(t), . . . , X_r(t) is near 1. In the case that r, s = 1, the
partial coherence in expressions (8.7.14) and (8.7.15) is replaced by the bivariate coherence |R_YX(λ)|².
We note that if λ ± μ ≢ 0 (mod 2π), then the asymptotic covariance structure of the log gain and phase is identical.
Turning to the estimated error spectral density matrix we note, from (8.7.2) and (7.4.17), that the second-order asymptotic behavior of g_εε^(T)(λ) is exactly the same as if it were a direct spectral estimate f_εε^(T)(λ) based on the values ε(t), t = 0, . . . , T − 1.
We note from (8.7.3) that the asymptotic behavior of estimated partial coherencies is the same as that of the estimated coherencies of an s vector-valued series whose population coherencies are the partial coherencies R_{YaYb·X}(λ), a, b = 1, . . . , s. Taking a = c, b = d we may deduce from (8.7.3) in the manner of Corollary 7.6.2 that
The asymptotic covariance structure of |R_{YaYb·X}^(T)(λ)|² is seen to be the same for all values of s, r. An examination of expression (8.7.16) suggests the consideration of the variance stabilizing transformation
whose behavior is indicated in Table 8.7.1 and Figure 8.7.1. We see that values of |R| near 0 are not changed much, while values near 1 are greatly increased. Now
Figure 8.7.1 Graph of the transformation y = tanh^{-1} x.
Table 8.7.1 Values of the Hyperbolic Tangent
x        tanh^{-1} x
.00 .0000
.05 .0500
.10 .1003
.15 .1511
.20 .2027
.25 .2554
.30 .3095
.35 .3654
.40 .4236
.45 .4847
.50 .5493
.55 .6184
.60 .6931
.65 .7753
.70 .8673
.75 .9730
.80 1.0986
.85 1.2562
.90 1.4722
.95 1.8318
1.00 ∞
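The entries of the table are simply values of tanh^{-1} x = ½ log((1 + x)/(1 − x)), and may be reproduced as follows (the transform diverges as x → 1):

```python
import numpy as np

# Reproduce Table 8.7.1: tanh^{-1} x for x = .00, .05, ..., .95.
x = np.arange(0.0, 1.0, 0.05)
table = np.round(np.arctanh(x), 4)

assert np.isclose(table[10], 0.5493)   # x = .50
assert np.isclose(table[18], 1.4722)   # x = .90
```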
In the case s = 1, the partial coherence is the multiple coherence |R_YX(λ)|², and its estimate becomes the estimate |R_YX^(T)(λ)|². It follows that expressions (8.7.18) and (8.7.19) are valid for |R_YX^(T)(λ)|² as well. Enochson and Goodman (1965) have investigated the effect of this transformation and have suggested the approximations
Were the estimate of Section 8.5 employed, then n = 2m + 1.
Parzen (1967) derived the asymptotic mean and variance of A_ab^(T)(λ), log G_ab^(T)(λ), and φ_ab^(T)(λ) in the case s = 1. Jenkins and Watts (1968), pp. 484, 492, indicated the asymptotic covariance structure of A^(T)(λ) and |R_YX^(T)(λ)|². In the case r, s = 1 Jenkins (1963a) derived the asymptotic variances of the phase, gain, and coherence.
8.8 ASYMPTOTIC DISTRIBUTION OF THE ESTIMATES
We now indicate limiting distributions for the statistics of interest. We begin with
Theorem 8.8.1 Under the conditions of Theorem 8.6.1 and if f_XX(λ(l)) is nonsingular for l = 1, . . . , L, the estimates A^(T)(λ(l)), g_εε^(T)(λ(l)), R_{YaYb·X}^(T)(λ(l)), a, b = 1, . . . , s are asymptotically normally distributed with covariance structure given by (8.7.1) to (8.7.3). A^(T)(λ) and g_εε^(T)(λ) are asymptotically independent.
This theorem will be of use in constructing confidence regions of interest. We conclude from Theorem 8.8.1 and expression (8.7.1) that if λ ≢ 0 (mod π), then vec A^(T)(λ) is asymptotically
where
It follows from Exercise 4.8.2 that the individual entries of A^(T)(λ) will be asymptotically complex normal, as conjectured in Parzen (1967). Theorem 8.8.1 has the following:
Corollary 8.8.1 Under the conditions of Theorem 8.8.1, functions of A^(T)(λ), g_εε^(T)(λ), R_{YaYb·X}^(T)(λ) with nonsingular first derivative will be asymptotically normal.
In particular, we may conclude that log G_ab^(T)(λ) will be asymptotically normal with variance
will be asymptotically normal with variance
and log G_ab^(T)(λ), φ_ab^(T)(λ) will be asymptotically independent, a = 1, . . . , s; b = 1, . . . , r; and if s = 1, tanh^{-1} |R_YX^(T)(λ)| will be asymptotically normal with variance (8.8.5) as well. Experience with variance stabilizing transformations (see Kendall and Stuart (1966) p. 93) suggests that the transformed variate may be more nearly normal than the untransformed one. We will use the transformed variate to set confidence intervals for the population coherence in the next section.
We note that the limiting distribution of A^(T)(λ) given in Theorem 8.8.1 is consistent with that of Theorem 8.5.1 for large m, if we make the identification
8.9 CONFIDENCE REGIONS FOR THE PROPOSED ESTIMATES
The asymptotic distributions derived in the previous section may be used to construct confidence regions for the parameters of interest. Throughout this section we make the identification (8.8.6).
We begin by constructing an approximate confidence region for A_ab(λ). Suppose λ ≢ 0 (mod π). Expression (8.5.11) leads us to approximate the distribution of
and b = 1, . . . , r. Also tanh^{-1} |R_{YaYb·X}^(T)(λ)| will be asymptotically normal with variance
The distributions of the other variates are also consistent, since the Wishart distribution is near the normal when the degrees of freedom are large.
by F_{2,2(2m+1−r)} where Ψ^(T)(λ) = f_XX^(T)(λ)^{-1}. This approximation may be manipulated in the manner of Section 6.9 to obtain a confidence region for either {Re A_ab(λ), Im A_ab(λ)} or {log G_ab(λ), φ_ab(λ)}. In the case λ ≡ 0 (mod π) we approximate the distribution of (8.9.1) by F_{1,2m−r}.
If we let A_a^(T)(λ), A_a(λ) denote the ath row of A^(T)(λ), A(λ) respectively, then a confidence region for A_a(λ) may be obtained by approximating the distribution of
by F_{2r,2(2m+1−r)} in the case λ ≢ 0 (mod π). Exercise 6.14.17 indicates a means to construct approximate multiple confidence regions for all linear combinations of the entries of A_a(λ). This leads us to a consideration of the 100β percent region of the form
Figure 8.9.1 Confidence intervals of size 80 percent for the coherence, indexed by the number of periodograms averaged.
b = 1, . . . , r in the case λ ≢ 0 (mod π). This last may be converted directly into a simultaneous region for φ_ab(λ), log G_ab(λ), b = 1, . . . , r in the manner of expression (6.9.11).
Turning to a consideration of f_εε(λ) we note that the parameters f_{εaεb}(λ), 1 ≤ a ≤ b ≤ s are algebraically equivalent to the parameters f_{εaεa}(λ), a = 1, . . . , s; R_{YaYb·X}(λ), 1 ≤ a < b ≤ s. We will indicate confidence intervals for these.
Theorem 8.5.1 leads us to approximate the distribution of g_{εaεa}^(T)(λ)/f_{εaεa}(λ)
by χ²_{2(2m+1−r)}/{2(2m + 1 − r)} if λ ≢ 0 (mod π), by χ²_{2m−r}/{2m − r} if
λ ≡ 0 (mod π). Confidence intervals for f_{εaεa}(λ) may be obtained from these approximations in the manner of expression (5.7.5).
In the case of a single R_{YaYb·X}(λ), Theorem 8.8.1 leads us to consider the 100(1 − α) percent confidence interval
Figure 8.9.2 Confidence intervals of size 90 percent for the coherence, indexed by the number of periodograms averaged.
Alternately we could consult the tables of Alexander and Vok (1963).
The setting of confidence regions of the sort considered in this section is carried out in Goodman (1965), Enochson and Goodman (1965), Akaike (1965), and Groves and Hannan (1968). In the case |R_YX(λ)|² = 0, λ ≢ 0 (mod π), the approximate 100α percent point of |R_YX^(T)(λ)|² is given by the elementary expression 1 − (1 − α)^{1/2m}; see Exercise 8.16.22.
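A sketch of the resulting interval computation for a coherence, carried out on the variance-stabilized scale: the standard error 1/√(2(2m − r)) below follows the Enochson–Goodman normal approximation quoted earlier, and should be treated as illustrative rather than as the text's exact prescription.

```python
import math

# Hedged sketch: transform the estimated coherence by tanh^{-1}, attach
# +/- z * sigma on the transformed scale, and map back with tanh.
def coherence_interval(R2_hat, m, r=1, z=1.96):
    z_hat = math.atanh(math.sqrt(R2_hat))        # variance-stabilized estimate
    sigma = 1.0 / math.sqrt(2.0 * (2 * m - r))   # illustrative standard error
    lo = math.tanh(max(z_hat - z * sigma, 0.0))  # back-transform the end points
    hi = math.tanh(z_hat + z * sigma)
    return lo ** 2, hi ** 2                      # interval for |R|^2

lo, hi = coherence_interval(0.6, m=10)
assert 0.0 <= lo < 0.6 < hi <= 1.0
```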
Alternately we could consult the tables of Amos and Koopmans (1962) for the distribution of the complex analog of the coefficient of correlation with the sample size reduced by r, or use Figures 8.9.1 and 8.9.2 prepared from that reference.
In the case of a multiple coherence, we can consider the approximate 100(1 − α) percent confidence interval
8.10 ESTIMATION OF THE FILTER COEFFICIENTS
Suppose that the (r + s) vector-valued series (8.1.1) satisfies
t = 0, ±1, . . . where ε(t), t = 0, ±1, . . . is a stationary series independent of the series X(t). Theorem 8.3.1 leads us to consider the time domain coefficients
where A(λ) = f_YX(λ) f_XX(λ)^{-1}.
Suppose now that A^(T)(λ) is an estimate of A(λ) of the form considered previously in this chapter. We can consider estimating a(u) by the statistic
where P_T is a sequence of integers tending to ∞ as T → ∞. We would expect the distribution of a^(T)(u) to be centered near
After the discussion following Theorem 7.4.2, the latter will be near
in the case that the population parameters f_YX(α), f_XX(α) do not vary much in intervals of length O(B_T). Expression (8.10.5) may be written
which is near the desired a(u) in the case that the filter coefficients fall off to 0 sufficiently rapidly. These remarks suggest that, if anything, the procedure of prefiltering will be especially necessary in the present context.
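The statistic (8.10.3) is essentially an inverse discrete Fourier transform of A^(T)(λ) over the P_T frequencies 2πp/P_T. The sketch below uses the exact transfer function of a pure delay as a stand-in for an estimate A^(T)(λ); the values a, u, and P are hypothetical.

```python
import numpy as np

# Sketch in the spirit of (8.10.3): given A(lambda) at the P frequencies
# 2*pi*p/P, the inverse DFT recovers the time-domain coefficients a(u).
P, a, u = 64, 1.5, 5
lam = 2 * np.pi * np.arange(P) / P
A = a * np.exp(-1j * lam * u)        # A(lambda) for the pure delay Y(t) = a X(t-u)

a_u = np.fft.ifft(A)                 # (1/P) sum_p A(2 pi p/P) e^{i 2 pi p u / P}
a_u = np.real_if_close(a_u)

assert np.isclose(a_u[u], a)         # all the mass concentrates at lag u
assert np.allclose(np.delete(a_u, u), 0.0, atol=1e-12)
```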
Turning next to second-order moment considerations, expression (8.7.1) suggests that
provided P_T is not too large. In fact we have
Theorem 8.10.1 Let the (r + s) vector-valued series (8.1.1) satisfy (8.10.1) where the series X(t), ε(t) satisfy Assumption 2.6.2(1) and are independent. Suppose f_XX(λ) is nonsingular and has a bounded second derivative. Let W(α) satisfy Assumption 6.4.1. Let A^(T)(λ) be given by (8.6.5) and a^(T)(u) by (8.10.3) for u = 0, ±1, . . . . Suppose P_T → ∞ with P_T B_T ≤ 1, P_T^{1+ε} B_T^{-1} T^{-1} → 0 for an ε > 0. Then a^(T)(u_1), . . . , a^(T)(u_J) are asymptotically jointly normal with means given by (8.10.4) and covariances given by (8.10.7).
We note that, to first order, the asymptotic covariance matrix of vec a^(T)(u) does not depend on u. We may consider estimating it by
where g_εε^(T)(λ) is given by (8.6.8). If we let Ψ^(T)(λ) denote f_XX^(T)(λ)^{-1} and set
we consider the problem of estimating γ, a, and f_εε(λ), −∞ < λ < ∞. The model (8.10.12) is broader than might be thought on initial reflection. For example, consider a model
then we can set down the following approximate 100(1 − α) percent confidence interval for a_jk(u):
If one sets P_T = B_T^{-1}, then the asymptotic variance is of order T^{-1}.
Hannan (1967a) considered the estimation of the a(u) in the case that a(u) = 0 for |u| sufficiently large and for a linear process error series ε(t), t = 0, ±1, . . . . Wahba (1966, 1969) considers the Gaussian case with fixed P.
It is of interest to consider also least squares estimates of the a(u), u = 0, ±1, . . . obtained by minimizing the sum of squares
for some p, q ≥ 0. We approach the investigation of these estimates through a consideration of the model
for t = 0, ±1, . . . . Here we assume that γ is an unknown s vector; a is an unknown s × r matrix; X(t), t = 0, ±1, . . . is an observable stationary r vector-valued series; and ε(t), t = 0, ±1, . . . an unobservable 0 mean stationary s vector-valued error series having spectral density matrix f_εε(λ), −∞ < λ < ∞. The series Y(t), t = 0, ±1, . . . is assumed observable. Given a stretch of values
for t = 0, ±1, . . . where X(t), t = 0, ±1, . . . is a stationary r′ vector-valued series and the series ε(t), t = 0, ±1, . . . is an independent stationary series. This model may be rewritten in the form (8.10.12) with the definitions
and
in which e(t), t = 0, ±1, . . . is a 0 mean white noise process. The results below may therefore be used to obtain estimates and the asymptotic properties of those estimates for the models (8.10.14) and (8.10.17).
Given the stretch of values (8.10.13), the least squares estimates γ^(T), a^(T)
of γ and a are given by
Theorem 8.10.2 Let the s vector-valued series Y(t), t = 0, ±1, . . . satisfy (8.10.12) where X(t), t = 0, ±1, . . . is a 0 mean r vector-valued series satisfying Assumption 2.6.1, having autocovariance function c_XX(u), u = 0, ±1, . . . and spectral density matrix f_XX(λ), −∞ < λ < ∞; ε(t), t = 0, ±1, . . . is an independent s vector-valued series satisfying Assumption 2.6.1, having spectral density matrix f_εε(λ) = [f_ab(λ)]; and γ, a are s × 1 and s × r matrices. Let γ^(T), a^(T) be given by (8.10.18) and (8.10.19). Let f_εε^(T)(λ) = [f_ab^(T)(λ)] be given by (8.10.20) where W(α), −∞ < α < ∞, satisfies Assumption 5.6.1 and B_T T → ∞ as T → ∞. Then γ^(T) is asymptotically N_s(γ, T^{-1}2πf_εε(0)); vec a^(T) is asymptotically independent N_{sr}(vec a, 2πT^{-1}∫ f_εε(α) ⊗ [c_XX(0)^{-1} f_XX(α) c_XX(0)^{-1}] dα). Also g_εε^(T)(λ) is asymptotically independent normal with
for t = 0, ±1, . . . . These last matrices have the dimensions s × r′(p + q − 1) and r′(p + q − 1) × 1 respectively. A particular case of the model (8.10.14) is the autoregressive scheme
and
As an estimate of f_εε(λ), we could consider
where ε^(T)(t) is the residual series given by
In connection with these estimates we have
In the case that ε(t), t = 0, ±1, . . . is a white noise series with spectral density matrix f_εε(λ) = (2π)^{-1}Σ, −∞ < λ < ∞, Theorem 8.10.2 indicates that vec [a^(T)(−p), . . . , a^(T)(q)] is asymptotically normal with mean vec [a(−p), . . . , a(q)] and covariance matrix T^{-1}Σ ⊗ c_XX(0)^{-1}. This gives the asymptotic distribution of the least squares estimates of the parameters of an autoregressive scheme. We considered corresponding results in the case of fixed X(t) in Section 6.12. We could also have considered here an analog of the "best" linear estimate (6.12.11).
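The least squares estimates just discussed can be sketched for s = r = 1 by regressing Y(t) on a constant and lagged values of X(t). The data below are simulated, and the lag range 0, . . . , q is an assumption of the illustration.

```python
import numpy as np

# Illustrative least squares fit of the lagged-regression model
# Y(t) = gamma + sum_u a(u) X(t-u) + eps(t), with lags u = 0, ..., q-1.
rng = np.random.default_rng(1)
T, gamma = 2000, 0.7
a_true = np.array([0.5, -0.3, 0.2])              # a(0), a(1), a(2)
X = rng.standard_normal(T)
eps = 0.1 * rng.standard_normal(T)
Y = gamma + sum(c * np.roll(X, u) for u, c in enumerate(a_true)) + eps

q = len(a_true)
# design matrix: constant column plus lagged X values, dropping start-up rows
D = np.column_stack([np.ones(T - q)] + [X[np.arange(q, T) - u] for u in range(q)])
coef, *_ = np.linalg.lstsq(D, Y[q:], rcond=None)

assert abs(coef[0] - gamma) < 0.05               # intercept estimate
assert np.allclose(coef[1:], a_true, atol=0.05)  # lag coefficient estimates
```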
8.11 PROBABILITY 1 BOUNDS
In Section 7.7 we derived a probability 1 bound for the deviations of a spectral estimate from its expected value as T → ∞. That result may be used to develop a bound for the deviation of A^(T)(λ) from {E f_YX^(T)(λ)}{E f_XX^(T)(λ)}^{-1}. We may also bound the deviation of A^(T)(λ) from A(λ) and of the other statistics considered from their corresponding population parameters. Specifically, we have
and
The asymptotic distribution of g_εε^(T)(λ) is seen to be the same as that of f_εε^(T)(λ), the variate based directly on the error series ε(t), t = 0, ±1, . . . . In the case of the model (8.10.14) the limiting distributions are seen to involve the parameters
and
Theorem 8.11.1 Let the (r + s) vector-valued series (8.1.1) satisfy Assumption 2.6.1. Let the conditions of Theorem 8.6.1 be satisfied. Let D_T = (B_T T)^{1/2} B_T^ε for some ε > 0. Suppose Σ_T B_T^m < ∞ for some m > 0. Then
8.12 FURTHER CONSIDERATIONS
The statistics discussed in this chapter are generally complex-valued. Thus, if we have computer programs that handle complex-valued quantities there will be no difficulty. However, since this is often not the case, it is worth noting that the statistics may all be evaluated using programs based on real-valued quantities. For example, consider the estimate of the complex regression coefficient:
This gives
if one uses the operation of Section 3.7. Taking the first s rows of the result gives
almost surely as T → ∞. In addition
We conclude from this theorem that if B_T, D_T^{-1} → 0 as T → ∞, then the various statistics are strongly consistent estimates of their corresponding population parameters.
almost surely as T → ∞ for −∞ < λ < ∞, j = 1, . . . , s; k = 1, . . . , r. The error terms are uniform in λ.
is observed, where η(t), t = 0, ±1, . . . is an error series independent of 𝒳(t). The problem of estimating γ, {a(u)} in a situation such as this is a problem of errors in variables. Considerable literature exists concerning this problem for series not serially correlated; see Durbin (1954) and Kendall and Stuart (1961), for example. If the series involved are stationary then we may write
a set of equations that involves only real-valued quantities. The principal complication introduced by this reduction is a doubling of the dimension of the X variate. Exercise 3.10.11 indicates an identity that we could use in an alternate approach to equation (8.12.1).
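The doubling of dimension can be made concrete: a complex least squares system may be solved in purely real arithmetic by stacking real and imaginary parts. A minimal sketch, with hypothetical dimensions:

```python
import numpy as np

# A complex system Y = X a (least squares) can be solved with real arithmetic
# by doubling the dimension:
#   [Re Y]   [Re X  -Im X] [Re a]
#   [Im Y] = [Im X   Re X] [Im a]
rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
a = rng.standard_normal(p) + 1j * rng.standard_normal(p)
Y = X @ a

Xr = np.block([[X.real, -X.imag], [X.imag, X.real]])   # 2n x 2p real design
Yr = np.concatenate([Y.real, Y.imag])                  # 2n real responses
sol, *_ = np.linalg.lstsq(Xr, Yr, rcond=None)
a_real_route = sol[:p] + 1j * sol[p:]                  # reassemble the complex solution

assert np.allclose(a_real_route, a)                    # matches the complex answer
```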
Likewise we may set down sample parallels of expressions (8.4.13) and (8.4.14) to determine the error spectral density, partial coherency, and multiple coherence statistics.
We next mention that there are interesting frequency domain analogs of the important problems of errors in variables and of systems of simultaneous equations.
Suppose that a series Y(t), t = 0, ±1, . . . is given by
where the r vector-valued series 𝒳(t), t = 0, ±1, . . . is not observed directly and where ε(t), t = 0, ±1, . . . is an error series independent of the series 𝒳(t). Suppose, however, that the series
with the variates approximately uncorrelated for distinct s. Because of this weak correlation we can now consider applying the various classical procedures for approaching the problem of errors in variables. The solution of the problem (8.12.4)–(8.12.5) will involve a separate errors in variables solution for each of a number of frequencies λ lying in [0,π].
Perhaps the nicest results occur when an r vector-valued instrumental series Z(t), t = 0, ±1, . . . is available for analysis as well as the series Y(t), X(t). This is a series that is correlated with the series 𝒳(t), t = 0, ±1, . . . , but uncorrelated with the series ε(t) and η(t). In the stationary case we have, from expressions (8.12.5) and (8.12.4),
The statistic
now suggests itself as an estimate of A(λ). Hannan (1963a) and Parzen (1967b) are references related to this procedure. Akaike (1966) suggests a procedure useful when the series η(t) is Gaussian, but the series 𝒳(t) is not.
A variety of models in econometrics lead to systems of simultaneous equations taking the form
where Y(t), ε(t) are s vector-valued series and Z(t) is an r vector-valued series independent of the series ε(t); see Malinvaud (1964). A model of the form (8.12.10) is called a structural equation system. It is exceedingly general, becoming, for example, an autoregressive scheme in one case and a linear system
with the series X(t), ε(t) correlated in another. This correlation may be due to the presence of feed-back loops in the system. The econometrician is often interested in the estimation of the coefficients of a single equation of the system (8.12.10), and a variety of procedures for doing this have now been proposed (Malinvaud (1964)) in the case that the series are not serially correlated.
In the stationary case we can consider setting down the expression
for 2πs/T near λ, with the variates approximately uncorrelated for distinct s. It is now apparent that complex analogs of the various econometric estimation procedures may be applied to the system (8.12.12) in order to estimate coefficients of interest. The character of this procedure involves analyzing a system of simultaneous equations separately in a number of narrow frequency bands. Brillinger and Hatanaka (1969) set down the system (8.12.10) and recommend a frequency analysis of it. Akaike (1969) and Priestley (1969) consider the problem of estimation in a system when feed-back is present.
In fact, as Durbin (1954) remarks, the errors in variables model (8.12.4) and (8.12.5) with instrumental series Z(t) may be considered within the simultaneous equation framework. We simply write the model in the form
and look on the pair Y(t), X(t) as being the Y(t) of (8.12.10).
8.13 ALTERNATE FORMS OF ESTIMATES
The estimates that we have constructed of the gain, phase, and coherence have in each case been the sample analog of the population definition. For example, we defined
and then constructed the estimate
On some occasions it may prove advantageous not to proceed in such a direct manner.
For example, expressions (8.6.11) and (8.6.13) indicate that asymptotic bias occurs for G^(T)(λ) if the spectra f_YX(α) and f_XX(α) are not flat for α near λ. This suggests that if possible we should prefilter X(t) and Y(t) to obtain series for which the second-order spectra are near constant. The gain relating these filtered series should be estimated and an estimate of G(λ) then constructed.
In another vein, expression (8.7.14) indicated that
This suggests that in situations where |R_YX(λ)|² is near constant with respect to λ, we could consider carrying out a further smoothing and estimate log G(λ) by
for some N, Δ_T, where it is supposed that G^(T)(α) has been constructed in the manner of Section 8.6.
We note in passing the possibility suggested by (8.4.18) of estimating G(λ)² by
Exercise 8.16.12 indicates that this is not generally a reasonable procedure.
We have proposed
as an estimate of the phase, φ(λ). Expression (8.6.12) indicates that ave φ^(T)(λ) is principally a nonlinear average of the phase with unequal weights. This occurrence leads us, when possible, to prefilter the series prior to estimating the phase in order to obtain a flatter cross-spectrum.
Alternately we could consider nonlinear estimates that are not as affected by variation in weights. For example, we could consider an estimate of the form
or of the form
The fact that the phase angle is only defined up to an arbitrary multiple of 2π means we must be careful in the determination of the value of arg f_YX^(T)(λ + nΔ_T) when forming (8.13.8).
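One concrete way of choosing the branches consistently is to accumulate the principal values so that successive ordinates differ by less than π, as numpy's unwrap does; a sketch, with a hypothetical delay:

```python
import numpy as np

# Sketch: resolve the 2*pi indeterminacy before combining neighbouring
# phase values.  For a pure delay the unwrapped phase is -lambda * u.
u = 6
lam = np.linspace(0.05, 3.0, 400)
raw = np.angle(np.exp(-1j * lam * u))     # principal values in (-pi, pi]
unwrapped = np.unwrap(raw)                # continuous version, approx. -lam * u

assert np.allclose(unwrapped, -lam * u)   # the jumps of 2*pi are removed
```

The method presumes the grid is fine enough that the true phase changes by less than π between neighbouring ordinates.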
This indetermination also leads to complications in the pictorial display of φ^(T)(λ). If either φ(λ) is changing rapidly or var φ^(T)(λ) is large, then an extremely erratic picture can result. For example, Figure 7.2.5 is a plot of the estimated phase angle between the series of seasonally adjusted mean monthly temperatures at Berlin and Vienna determined from the cross-periodogram. It is difficult to interpret this graph because when the phase takes a small jump of the form π − ε to π + ε, φ^(T)(λ) when plotted in the range (−π,π] moves from π − ε to −π + ε. One means of reducing the impact of this effect is to plot each phase twice, taking its two values in the interval (−2π,2π]. If the true phase is near π, then an especially improved picture is obtained. For example, Figure 8.13.1 is the estimated phase when 15 periodograms are averaged between seasonally adjusted monthly Berlin temperatures and the negative of seasonally adjusted monthly Vienna temperatures, taking the range of φ^(T)(λ) to be (−π,π]. If this range is increased to (−2π,2π], as suggested, Figure 8.13.2 results. J. W. Tukey has proposed making a plot on the range [0,π] using different symbols or lines for phases whose principal values are in [0,π] from those whose values are in (π,2π]. If this is done for the Berlin–Vienna data, then Figure 8.13.3 is obtained.
Figure 8.13.1 φ^(T)(λ), the estimated phase angle between seasonally adjusted mean monthly Berlin temperatures and the negative of seasonally adjusted mean monthly Vienna temperatures. (15 periodograms averaged in estimation.)
Figure 8.13.2 Another manner of plotting the data of Figure 8.13.1. (The range of φ^(T)(λ) is taken to be [−2π, 2π].)
Figure 8.13.3 Another manner of plotting the data of Figure 8.13.1. (The heavy line corresponds to φ^(T)(λ) in [π, 2π].)
Figure 8.13.4 |R_YX^(T)(λ)|², estimated coherence of seasonally adjusted mean monthly Berlin temperatures and seasonally adjusted mean monthly Vienna temperatures, plotted against frequency in cycles per month. (15 periodograms averaged in estimation.)
Another procedure is to plot an estimate of the group delay, expression (8.4.27); then the difficulty over arbitrary multiples of 2π does not arise. Generally speaking, it appears to be the case that the best form of plot depends on the φ(λ) at hand.
We next turn to alternate estimates of the coherence. The bias of |R_YX^(T)(λ)|² may be reduced if we carry out a prefiltering of the series, estimate the coherence of the filtered series, and then algebraically deduce an estimate of the desired coherence.
Alternatively we can take note of the variance stabilizing properties of the tanh^{-1} transformation and, by analogy with expression (8.13.4), consider as an estimate of tanh^{-1} |R_YX(λ)|:
Figure 8.13.5 Coherence estimate based on the form (8.13.9) with m = 5, N = 2 for the Berlin and Vienna temperature series.
We note that the effect of the tanh^{-1} transformation is to increase values of |R_YX^(T)(α)| that are near 1 while retaining the values near 0. High coherences are therefore weighted more heavily if we form (8.13.9). Figure
330 TWO VECTOR-VALUED STOCHASTIC SERIES
8.13.4 is a plot of |R_YX^(T)(λ)|², for the previously mentioned Berlin and Vienna series, based on second-order spectra of the form (8.5.4) with m = 7. Figure 8.13.5 results from expression (8.13.9), basing |R_YX^(T)(α)| on second-order spectra of the form (8.5.4) with m = 5 and then taking N = 2. The estimates in the two pictures therefore have comparable bandwidth and stability. It is apparent that the peaks of Figure 8.13.5 are less jagged than those of Figure 8.13.4. The nonlinear combination of correlation coefficients is considered in Fisher and Mackenzie (1922). See also Rao (1965) p. 365.
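The tanh⁻¹ step discussed above can be sketched as follows. The coherence estimator here (periodograms averaged over non-overlapping blocks, with our own block length) is an illustrative stand-in for the book's estimate, not expression (8.13.9) itself.

```python
import numpy as np

def coherence_estimate(x, y, block=128):
    """|R_YX(lambda)|^2 from periodograms averaged over non-overlapping
    blocks.  (With a single block the estimate is identically 1 -- the
    point of Exercise 8.16.11 -- so averaging is essential.)"""
    nblocks = len(x) // block
    fxx = np.zeros(block // 2 + 1)
    fyy = np.zeros(block // 2 + 1)
    fyx = np.zeros(block // 2 + 1, dtype=complex)
    for b in range(nblocks):
        dx = np.fft.rfft(x[b * block:(b + 1) * block])
        dy = np.fft.rfft(y[b * block:(b + 1) * block])
        fxx += np.abs(dx) ** 2
        fyy += np.abs(dy) ** 2
        fyx += dy * np.conj(dx)
    return np.abs(fyx) ** 2 / (fxx * fyy)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = x + 0.3 * rng.standard_normal(4096)    # strongly coherent pair
coh2 = coherence_estimate(x, y)
z = np.arctanh(np.sqrt(coh2))              # tanh^-1 |R|: stretches values
                                           # near 1, leaves those near 0 alone
```

Note that `np.arctanh(0.1)` is about 0.100, essentially unchanged, while `np.arctanh(0.99)` exceeds 2.6 — the stretching near 1 described in the text.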
Tick (1967) argues that it may well be the case that |R_YX(α)|² is near constant, whereas f_YX(α) is not. (This would be the case if Y(t) = X(t − u) for large u.) He is then led to propose estimates of the form
in the case that |R_YX(α)|² is near constant, but the second-order spectra are not.
Jones (1969) considered the maximum likelihood estimation of |R_YX(λ)|² from the marginal distribution of f_XX^(T)(λ), f_YY^(T)(λ), and |f_YX^(T)(λ)|², deriving the latter from the limiting distribution of Theorem 8.5.1.
The importance of using some form of prefiltering, prior to the estimation of the parameters of this chapter, cannot be overemphasized. We saw, in Section 7.7, the need to do this when we estimated the cross-spectrum of two series. A fortiori we should do it when estimating the complex regression coefficient, coherency, and error spectrum. Akaike and Yamanouchi (1962) and Tick (1967) put forth compelling reasons for prefiltering. In particular there appear to be a variety of physical examples in which straightforward data processing leads to a coherency estimate that is near 0 when for physical reasons the population value is not. Techniques of prefiltering are discussed in Section 7.7, the simplest being to lag one series relative to the other.
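The delay effect warned about above can be demonstrated directly. In the sketch below (series, delay, and smoothing span are our own illustrative choices), Y(t) = X(t − u) with u large; the naive frequency-smoothed coherence estimate is driven well below its population value of 1 because the phase factor e^{−iuλ} rotates rapidly within each smoothing band, while lagging one series first recovers coherence 1.

```python
import numpy as np

def smoothed_coherence(x, y, m=15):
    """Coherence estimated by smoothing the periodogram over 2m+1
    adjacent Fourier frequencies (one full data stretch, no blocking)."""
    dx, dy = np.fft.rfft(x), np.fft.rfft(y)
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    fxx = np.convolve(np.abs(dx) ** 2, kernel, mode="same")
    fyy = np.convolve(np.abs(dy) ** 2, kernel, mode="same")
    fyx = np.convolve(dy * np.conj(dx), kernel, mode="same")
    return np.abs(fyx) ** 2 / (fxx * fyy)

rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
u = 40
y = np.roll(x, u)                    # Y(t) = X(t - u), circularly, u "large"
naive = smoothed_coherence(x, y)     # phase e^{-iu*lambda} spins within each
                                     # smoothing band, so the averaged
                                     # cross-spectrum nearly cancels
aligned = smoothed_coherence(x, np.roll(y, -u))   # lag one series first
```

Here `aligned` is identically 1 while `naive` sits far below it — the simplest prefiltering, lagging one series, at work.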
8.14 A WORKED EXAMPLE
As a worked example of the suggested calculations in the case r, s = 1 we refer the reader to the Berlin and Vienna monthly temperature series previously considered in Chapters 6 and 7. The spectra and cross-spectrum of this series are presented as Figures 7.8.1 to 7.8.4. The estimates are equal to those in expression (8.5.4) with m = 10. Figure 6.10.3 gives g_εε^(T)(λ), Figure 6.10.4 gives Re A^(T)(λ), Figure 6.10.5 gives Im A^(T)(λ), Figure 6.10.6 gives G^(T)(λ), Figure 6.10.7 gives φ^(T)(λ), and Figure 6.10.8 gives |R_YX^(T)(λ)|². Finally Figure 6.10.9 gives a^(T)(u). The estimated standard errors of these various statistics are given in Section 6.10.
As a worked example in the case r = 13 and s = 1 we refer the reader back to Section 6.10 where the results of a frequency analysis of the sort under study are presented: the series Y(t) refers to the seasonally adjusted monthly mean temperatures at Greenwich, England and X(t) refers to seasonally adjusted mean monthly temperatures at 13 other stations. Figure 6.10.10 gives the gains, G_a^(T)(λ), and phases, φ_a^(T)(λ). Figure 6.10.11 gives the error spectrum, log10 g_εε^(T)(λ). Figure 6.10.12 gives the multiple coherence, |R_YX^(T)(λ)|².
8.15 USES OF THE ANALYSIS OF THIS CHAPTER
The uses that the techniques of this chapter have been put to are intimately entwined with the uses of the analysis of Chapter 6. We have already noted that many of the statistics of the present chapter are the same as statistics of Chapter 6; however, the principal difference in assumption between the chapters is that in the present chapter the series X(t), t = 0, ±1, . . . is taken as stochastic, whereas in Chapter 6 it was taken as fixed. In consequence, the statistical properties developed in this chapter refer to averages across the space of all realizations of X(t), whereas those of Chapter 6 refer to the particular realization at hand.
One area in which researchers have tended to assume X(t) stochastic is the statistical theory of filtering and prediction. See Wiener (1949), Solodovnikov (1960), Lee (1960), Whittle (1963a), and Robinson (1967b) for example. The optimum predictors developed work best across the space of all realizations of X(t) that may come to hand, and the statistical properties of empirical predictors refer to this broad population.
The reader may look back to Section 6.10 for a listing of situations in which the various statistics of this chapter have been calculated. In fact the authors of the papers listed typically introduced the statistics in terms of stochastic X(t). Brillinger and Hatanaka (1970) and Gersch (1972) estimate partial coherences and spectra.
The choice of whether to make X(t) fixed or stochastic is clearly tied up with the choice of population to which we wish to extend inferences based on a given sample. Luckily, as we have seen, the practical details of the two situations are not too different if the sample is large.
8.16 EXERCISES
8.16.1 Under the conditions of Theorem 8.2.2 and if s = 1, prove that φ(X) is the function with finite second moment having maximum correlation with Y; see Rao (1965) p. 221, and Brillinger (1966a).
8.16.2 Under the conditions of Theorem 8.2.2 prove that the conditional distribution of Y given X is multivariate normal with mean (8.2.14) and covariance matrix (8.2.16).
8.16.3 Let A_YX(λ) denote the complex regression coefficient of Y(t) on the series X(t) and A_XY(λ) denote the complex regression coefficient of X(t) on the series Y(t) in the case s, r = 1. Show that A_YX(λ) A_XY(λ) = |R_YX(λ)|². Hence, note that A_XY(λ) = A_YX(λ)⁻¹ only if the coherence between X(t) and Y(t) is 1.
8.16.4 If A(λ), the complex regression coefficient of Y(t) on the series X(t), is constant for all λ, show that it is equal to the ordinary regression coefficient of Y(t) on X(t).
8.16.5 Let ε(t), t = 0, ±1, . . . be a white noise process, that is, a second-order stationary process with constant power spectrum. Suppose X(t) = Σ_u b(t − u)ε(u), Y(t) = Σ_u c(t − u)ε(u). Determine A(λ), φ(λ), G(λ), R_YX(λ), and |R_YX(λ)|².
8.16.6 Under the conditions of Theorem 8.3.1 and if s = 1, prove that |R_YX(λ)|² = 1, −∞ < λ < ∞, if and only if Y(t) is a linear filtered version of X(t).
8.16.7 Under the conditions of Theorem 8.3.1 and if s, r = 1, determine the coherency between Y(t) and its best linear predictor based on X(t). Also determine the coherency between the error series ε(t) and X(t).
8.16.8 Under the conditions of Theorem 8.3.1 and if s, r = 1, prove that R_YX(λ), |R_YX(λ)|² are the Fourier transforms of absolutely summable functions if f_XX(λ), f_YY(λ) ≠ 0, −∞ < λ < ∞.
8.16.9 If Y(t) = X^H(t), the Hilbert transform of X(t), prove that φ(λ) = π/2. Find φ(λ) if Y(t) = X^H(t − u) for some integer u.
8.16.10 Prove that
in the case r, s = 1.
8.16.11 Prove |I_YX^(T)(λ)|² = I_XX^(T)(λ) I_YY^(T)(λ), and so |I_YX^(T)(λ)|² / [I_XX^(T)(λ) I_YY^(T)(λ)] is not a reasonable estimate of |R_YX(λ)|² in the case r, s = 1.
8.16.12 Prove that
and so [I_YY^(T)(λ)/I_XX^(T)(λ)]^(1/2) is not generally a reasonable estimate of G(λ) in the case r, s = 1.
8.16 EXERCISES 333
8.16.13 Discuss the reason why X(t) and Y(t) may have coherence 1 and yet it is not the case that |R_YX^(T)(λ)|² = 1.
8.16.14 Suppose that we estimate the spectral density matrix, f_ZZ(λ), by the second expression of (8.5.4) with m = T − 1. Show that
Discuss the effect on these expressions if Y(t) had previously been lagged u time units with respect to X(t).
8.16.15 Under the conditions of Section 8.6, if W(α) ≥ 0, and if r, s = 1, prove that |R_YX^(T)(λ)|² ≤ 1.
8.16.16 Under the conditions of Theorem 8.7.1 and r, s = 1, prove that
8.16.17 Under the conditions of Theorem 8.7.1, and r, s = 1 except that f_YX(λ) = 0, show that
8.16.18 Under the conditions of Theorem 8.8.1, and r, s = 1 except that f_YX(λ) = 0, show that φ^(T)(λ) is asymptotically uniformly distributed on (−π,π].
8.16.19 Develop a sample analog of the error series ε(t) and of expression (8.3.8).
8.16.20 Under the conditions of Theorem 8.7.1 and r, s = 1, show that
8.16.21 Under the conditions of Theorem 8.2.3, show that the conditional variance of the sample squared coefficient of multiple correlation given the X values is approximately
Contrast this with the unconditional value 4R_YX²(1 − R_YX²)/n; see Hooper (1958).
8.16.25 Let the bivariate time series [X(t), Y(t)] satisfy Assumption 2.6.2(3). Let W(α) satisfy Assumption 5.6.1 and (5.8.21) with P = 3. Suppose the remaining conditions of Theorem 8.6.1 are satisfied; then
8.16.22 For the random variable whose density is given by expression (8.2.56) show that E|R̂_YX|² = r/n and
for 0 < x < 1; see Abramowitz and Stegun (1964) p. 944. In the case r = 1, this leads to the simple expression x = 1 − (1 − α)^(1/(n−1)) for the 100α percent point of |R̂_YX|².
8.16.23 For a real-valued series Y(t) and a vector-valued series X(t), show that the multiple coherence is unaltered by nonsingular linear filtering of the series separately.
8.16.24 Show that the following perturbation expansions are valid for small α, β,
where f″ denotes the second derivative.
8.16.29 Prove that the partial correlation of Y₁ with Y₂ after removing the linear effects of X does not involve any covariances based on Y_j, j > 2.
8.16.30 Prove that a given by (8.2.15) maximizes the squared vector correlation coefficient
8.16.33 Under the conditions of Theorem 8.3.1, prove that there exist μ, absolutely summable {a(u)}, and a second-order stationary series ε(t) that is orthogonal to X(t) and has absolutely summable autocovariance function, such that Y(t) = μ + Σ_u a(t − u)X(u) + ε(t).
8.16.34 Let the series of Theorem 8.3.1 be an m-dependent process, that is, such that values of the process more than m time units apart are statistically independent. Show that a(u) = 0 for |u| > m.
8.16.35 Under the conditions of Theorem 8.3.1, prove that |R_{Y·X}(λ)|² ≤ 1. If s = 1, prove that |R_YX(λ)|² ≤ 1.
8.16.36 Prove that in the case s = 1
8.16.26 Prove that
if the dimensions of the matrices are appropriate.
8.16.27 In connection with the matrix just after (8.2.18), prove that
8.16.28 Given the error variate (8.2.19), under the conditions of Theorem 8.2.1, prove:
8.16.31 Under the conditions of Theorem 8.2.1 and if the s × s Σ ≠ 0, determine μ and a that minimize
8.16.32 Let X(t), t = 0, ±1, . . . be an r vector-valued autoregressive process of order m. Prove that the partial covariance function
vanishes for u > m.
8.16.37 Show that the inverse of the matrix (8.2.47) of partial covariances is the s × s lower diagonal block of the inverse of the covariance matrix (8.2.37).
8.16.38 If s = 1, determine the coherency between Y(t) and the best linear predictor based on the series X(t), t = 0, ±1, . . . .
8.16.39 Prove that
8.16.40 Let ρ_YX(0)² denote the instantaneous squared multiple correlation of Y(t) with X(t). Show that
8.16.41 Under the conditions of Theorem 8.3.2, prove that the conditional spectral density matrix of Y(t) given the series X(t), t = 0, ±1, . . . is
8.16.42 Suppose the weight function W(α) used in forming the estimate (8.6.4) is non-negative. Show that |R_{Y·X}^(T)(λ)|², |R_YX^(T)(λ)|² ≤ 1.
8.16.43 Suppose the conditions of Theorem 8.5.1 are satisfied. Suppose f_{Y_a X_b · X}(λ) = 0. Show that the asymptotic distribution of φ_ab^(T)(λ) is the uniform distribution on (−π,π).
8.16.44 Let the conditions of Theorem 8.3.1 be satisfied. Show that the complex regression coefficient of the real-valued series Y_a(t) on the series X(t) is the same as the a-th row of the complex regression coefficient of the s vector-valued series Y(t) on the series X(t) for a = 1, . . . , s. Discuss the implications of this result.
8.16.45 Under the conditions of Theorem 8.2.1, show that a = Σ_YX Σ_XX⁻¹ maximizes Σ_{Y,aX} (Σ_{aX,aX})⁻¹ Σ_{aX,Y}.
8.16.46 Let W be distributed as W_r^c(n, Σ). Show that vec W has covariance matrix n Σᵀ ⊗ Σ.
8.16.47 (a) If W is distributed as W_r(n, Σ) show that
(b) If W is distributed as W_r^c(n, Σ) show that
See Wahba (1966).
9
PRINCIPAL COMPONENTS IN THE FREQUENCY DOMAIN
9.1 INTRODUCTION
In the previous chapter we considered the problem of approximating a stationary series by a linear filtered version of another stationary series. In this chapter we investigate the problem of approximating a series by a filtered version of itself, but restraining the filter to have reduced rank.
Specifically, consider the r vector-valued series X(t), t = 0, ±1, . . . with mean
absolutely summable autocovariance function
and spectral density matrix
Suppose we are interested in transmitting the values of the X(t) series from one location to another; however, only q ≤ r channels are available for the transmission. Imagine forming the series
is small. We might view the problem as that of determining a q vector-valued series
ζ(t) that contains much of the information in X(t). Here, we note that Bowley (1920) once remarked "Index numbers are used to measure the change in some quantity which we cannot observe directly, which we know to have a definite influence on many other quantities which we can so observe, tending to increase all, or diminish all, while this influence is concealed by the action of many causes affecting the separate quantities in various ways." Perhaps ζ(t) above plays the role of an index number series following some hidden series influencing X(t). As we have described in its derivation, the above series ζ(t) is the q vector-valued series that is best for getting back X(t) through linear time invariant operations.
Alternatively suppose we define the error series ε(t) by
with {b(u)} a q × r matrix-valued filter, transmitting the series ζ(t) over the q available channels and then, on receipt of this series, forming
as an estimate of X(t) for some r vector-valued μ and r × q filter {c(u)}. In this chapter we will be concerned with the choice of μ and the filters {b(u)}, {c(u)} so that X*(t) is near X(t).
The relation between X*(t) − μ and X(t) is of linear time invariant form with transfer function
where B(λ), C(λ) indicate the transfer functions of {b(u)}, {c(u)} respectively. We now see that the problem posed is that of determining an r × r matrix A(λ) of reduced rank so that the difference
and then write
Then X(t) is represented as a filtered version of a series ζ(t) of reduced dimension plus an error series. A situation in which we might wish to set down such a model is the following: let ζ(t) represent the impulse series of q earthquakes occurring simultaneously at various locations; let X(t) represent the signals received by r seismometers; and let c(u) represent the transmission effects of the earth on the earthquakes. Seismologists are interested in investigating the series ζ(t), t = 0, ±1, . . . ; see for example Ricker (1940) and Robinson (1967b).
An underlying thread of these problems is the approximation of a series of interest by a related series of lower dimension. In Section 9.2 we review some aspects of the classical principal component analysis of vector-valued variates.
9.2 PRINCIPAL COMPONENT ANALYSIS OF VECTOR-VALUED VARIATES
Let X be an r vector-valued random variable with mean μ_X and covariance matrix Σ_XX. Consider the problem of determining the r vector μ, the q × r matrix B, and the r × q matrix C to minimize simultaneously all the latent roots of the symmetric matrix
When we determine these values it will follow, as we mentioned in Section 8.2, that they also minimize monotonic functions of the latent roots of (9.2.1) such as the trace, determinant, and diagonal entries.
Because any r × r matrix A of rank q ≤ r may be written in the form CB with B, q × r, and C, r × q (Exercise 3.10.36), we are also determining A of rank ≤ q to minimize the latent values of
We now state
Theorem 9.2.1 Let X be an r vector-valued variate with EX = μ_X, E{(X − μ_X)(X − μ_X)ᵀ} = Σ_XX. The r × 1 μ, q × r B, and r × q C that minimize simultaneously all latent values of (9.2.1) are given by
where V_j is the j-th latent vector of Σ_XX, j = 1, . . . , r. If μ_j indicates the corresponding latent root, then the matrix (9.2.1) corresponding to these values is
and
The principal components of X are seen to provide linear combinations of the entries of X that are uncorrelated. We could have characterized the j-th principal component as the linear combination ζ_j = αᵀX, with αᵀα = 1, which has maximum variance and is uncorrelated with ζ_k, k < j (see Hotelling (1933), Anderson (1957) Chap. 11, Rao (1964, 1965), and Morrison (1967) Chap. 7); however, the above approach fits in better with our later work.
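The rank-q construction of Theorem 9.2.1 can be sketched numerically; the function below (names and the use of numpy's symmetric eigendecomposition are our own) forms B from the leading latent vectors of the sample covariance matrix and takes C = Bᵀ.

```python
import numpy as np

def principal_component_fit(X, q):
    """Best rank-q approximation mu + C B (X - mu) in the sense of
    Theorem 9.2.1, with the rows of B the latent vectors of the sample
    covariance matrix belonging to its q largest latent roots, C = B^T."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    roots, vecs = np.linalg.eigh(sigma)        # ascending order
    order = np.argsort(roots)[::-1]
    roots = roots[order]
    B = vecs[:, order[:q]].T                   # q x r
    zeta = (X - mu) @ B.T                      # the q principal components
    X_hat = mu + zeta @ B                      # rank-q reconstruction
    return roots, zeta, X_hat

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])
roots, zeta, X_hat = principal_component_fit(X, q=2)
```

The sample principal components come out uncorrelated with variances equal to the leading latent roots, and the trace of the residual covariance equals the sum of the discarded roots — the trace-minimization property cited from Kramer and Mathews (1956) and Rao (1964, 1965).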
We next review details of the estimation of the above parameters. For convenience assume μ_X = 0; then μ of expression (9.2.5) is 0. Suppose that a sample of values X_j, j = 1, . . . , n of the variate of Theorem 9.2.1 is available. Define the r × n matrix x by
Theorem 9.2.1 is a particular case of one proved by Okamoto and Kanazawa (1968); see also Okamoto (1969). The fact that the above B, C, μ minimize the trace of (9.2.1) was proved by Kramer and Mathews (1956), Rao (1964, 1965), and Darroch (1965).
The variate
is called the j-th principal component of X, j = 1, . . . , r. In connection with the principal components we have
Corollary 9.2.1 Under the conditions of Theorem 9.2.1
Estimate the covariance matrix Σ_XX by
We may now estimate μ_j by μ̂_j, the j-th largest latent root of Σ̂_XX, and estimate V_j by V̂_j, the corresponding latent vector of Σ̂_XX. We have
Theorem 9.2.2 Suppose the values X_j, j = 1, . . . , n are a sample from N_r(0, Σ_XX). Suppose the latent roots μ_j, j = 1, . . . , r of Σ_XX are distinct. Then the variate {μ̂_j, V̂_j; j = 1, . . . , r} is asymptotically normal with {μ̂_j; j = 1, . . . , r} asymptotically independent of {V̂_j; j = 1, . . . , r}. The asymptotic moments are given by
log_e here denotes the natural logarithm. James (1964) has derived the exact distribution of μ̂₁, . . . , μ̂_r under the conditions of the theorem. This distribution turns out to depend only on μ₁, . . . , μ_r. James has also obtained asymptotic expressions for the likelihood function of μ̂₁, . . . , μ̂_r more detailed than that indicated by the theorem; see James (1964), Anderson (1965), and James (1966). Dempster (1969), p. 303, indicates the exact distribution of vectors dual to V̂₁, . . . , V̂_r. Tumura (1965) derives a distribution equivalent to that of V̂₁, . . . , V̂_r. Chambers (1967) indicates further cumulants of the asymptotic distribution for distributions having finite moments. These cumulants may be used to construct Cornish-Fisher approximations to the distributions. Because the μ̂_j have the approximate form of sample variances it may prove reasonable to approximate their distributions by scaled χ² distributions, for example, to take μ̂_j to be μ_j χ²_n / n. Madansky and Olkin (1969) indicate approximate confidence bounds for the collection μ₁, . . . , μ_r; see also Mallows (1961). We could clearly use Tukey's jack-knife procedure (Brillinger (1964c, 1966b)) to obtain approximate confidence regions for the latent roots and vectors.
Sugiyama (1966) determines the distribution of the largest root and corresponding vector. Krishnaiah and Waikar (1970) give the joint distribution of several roots. Golub (1969) discusses the computations involved in the present situation. Izenman (1972) finds the asymptotic distribution of
This theorem was derived by Girshick (1939). Anderson (1963) developed the limiting distribution in the case that the latent roots of Σ_XX are not all distinct. Expression (9.2.13) implies the useful result
in the normal case.
In our work with time series we will require complex variate analogs of the above results. We begin with
Theorem 9.2.3 Let X be an r vector-valued variate with EX = μ_X, E{(X − μ_X)(X − μ_X)*ᵀ} = Σ_XX, E{(X − μ_X)(X − μ_X)ᵀ} = 0. The r × 1 μ, q × r B, and r × q C that simultaneously minimize all the latent values of
where V_j is the j-th latent vector of Σ_XX, j = 1, . . . , r. If μ_j denotes the corresponding latent root, then the extreme value of (9.2.17) is
We note that as the matrix Σ_XX is Hermitian non-negative definite, the μ_j will be non-negative. The degree of approximation achieved depends directly on how near the μ_j, j > q, are to 0. Note that we have been led to approximate X by
where
We have previously seen a related result in Theorem 4.7.1.
Theorem 9.2.3 leads us to consider the variates ζ_j = V̄_jᵀX, j = 1, . . . , r. These are called the principal components of X. In the case that X is N_r^c(μ_X, Σ_XX), we see that ζ₁, . . . , ζ_r are independent N₁^c(0, μ_j), j = 1, . . . , r, variates.
Now we will estimate these parameters. Let X_j, j = 1, . . . , n be a sample from N_r^c(0, Σ_XX) and define x by expression (9.2.9). Then we estimate Σ_XX by
This matrix has a complex Wishart distribution. We signify its latent roots and vectors by μ̂_j, V̂_j respectively, j = 1, . . . , r. The matrix Σ̂_XX is Hermitian non-negative definite; therefore the μ̂_j will be non-negative. We have
Theorem 9.2.4 Suppose the values X₁, . . . , X_n are a sample from N_r^c(0, Σ_XX). Suppose the latent roots of Σ_XX are distinct. Then the variate {μ̂_j, V̂_j; j = 1, . . . , r} is asymptotically normal with {μ̂_j; j = 1, . . . , r} asymptotically independent of {V̂_j; j = 1, . . . , r}. The asymptotic moments are given by
and
Theorem 9.2.4 results from two facts: the indicated latent roots and vectors are differentiable functions of the entries of Σ̂_XX, and Σ̂_XX is asymptotically normal as n → ∞; see Gupta (1965).
We see from expression (9.2.27) that
Also by analogy with the real-valued case we might consider approximating the distribution of μ̂_j by
The approximation in expression (9.2.31) would be especially good if the off-diagonal elements of Σ_XX were small, and if the diagonal elements were quite different. James (1964) has given the exact distribution of μ̂₁, . . . , μ̂_r in the complex normal case. Expression (9.2.29) with j = k indicates that the asymptotic distribution of the V̂_j is complex normal. Also from (9.2.28) we see that the sampling variability of the V̂_j will be high if some of the μ_j are nearly equal.
Theorem 9.3.1 Let X(t), t = 0, ±1, . . . be an r vector-valued second-order stationary series with mean c_X, absolutely summable autocovariance function c_XX(u), and spectral density matrix f_XX(λ), −∞ < λ < ∞. Then the μ, {b(u)}, {c(u)} that minimize (9.3.3) are given by
Here V_j(λ) denotes the j-th latent vector of f_XX(λ), j = 1, . . . , r. If μ_j(λ) denotes the corresponding latent root, j = 1, . . . , r, then the minimum obtained is
9.3 THE PRINCIPAL COMPONENT SERIES
We return to the problem of determining the r vector μ, the q × r filter {b(u)}, and the r × q filter {c(u)} so that if
then the r vector-valued series
is small. If we measure the size of this series by
we have
and
where
and
which has rank ≤ q. Now let the series X(t), t = 0, ±1, . . . have Cramér representation
then the series ζ(t) corresponding to the extremal choice has the form
with B(λ) given by (9.3.7). The j-th component, ζ_j(t), is given by
This series is called the j-th principal component series of X(t). In connection with the principal component series we have
Theorem 9.3.2 Under the conditions of Theorem 9.3.1, the j-th principal component series, ζ_j(t), has power spectrum μ_j(λ), −∞ < λ < ∞. Also ζ_j(t) and ζ_k(t), j ≠ k, have coherency 0 at all frequencies.
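The theorem can be illustrated numerically: estimate f_XX(λ) at each frequency, eigendecompose it, and read off the latent roots as the estimated power spectra of the principal component series. The estimator below (block-averaged matrix periodograms, with our own block length and normalization) is a sketch, not the book's exact formula.

```python
import numpy as np

def spectral_pca(x):
    """Estimate the spectral density matrix f_XX(lambda) by averaging
    matrix periodograms over non-overlapping blocks, then return its
    latent roots (descending) and latent vectors at each frequency."""
    block = 128
    T, r = x.shape
    nblocks = T // block
    nfreq = block // 2 + 1
    f = np.zeros((nfreq, r, r), dtype=complex)
    for b in range(nblocks):
        d = np.fft.rfft(x[b * block:(b + 1) * block], axis=0)
        f += d[:, :, None] * np.conj(d[:, None, :])
    f /= 2 * np.pi * block * nblocks
    roots = np.empty((nfreq, r))
    vecs = np.empty((nfreq, r, r), dtype=complex)
    for k in range(nfreq):
        w, v = np.linalg.eigh(f[k])            # Hermitian eigenproblem
        roots[k] = w[::-1]                     # mu_1(lam) >= ... >= mu_r(lam)
        vecs[k] = v[:, ::-1].T                 # row j: j-th latent vector
    return roots, vecs

# three series sharing one strong common component
rng = np.random.default_rng(3)
s = rng.standard_normal(4096)
x = np.stack([s + 0.2 * rng.standard_normal(4096) for _ in range(3)], axis=1)
roots, vecs = spectral_pca(x)
```

With one dominant common component, the first latent root `roots[:, 0]` carries almost all the power at every frequency — the situation in which a single principal component series summarizes the r series well.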
The series ζ(t) has spectral density matrix
Let X*(t), t = 0, ±1, . . . denote the best approximant series as given in Theorem 9.3.1. We define the error series by
In terms of the Cramér representation this series has the form
We see that ε(t) has mean 0 and spectral density matrix
The latent roots and vectors of this matrix are not generally related in any elementary manner to those of f_XX(λ). However, one case in which there is a convenient relation is when the matrix D(λ) is unitary. In this case
The degree of approximation of X(t) by X*(t) is therefore directly related to how near the μ_j(λ), j > q, are to 0, −∞ < λ < ∞. We also see that both the cross-spectral matrix between ε(t) and ζ(t) and the cross-spectral matrix between ε(t) and X*(t) are identically 0.
We next mention a few algebraic properties of the principal componentseries. Because
we have
while
Also because
we see
and
Unfortunately the principal component series do not generally transform in an elementary manner when the series X(t) is filtered. Specifically, suppose
for some r × r filter {d(u)} with transfer function D(λ). The spectral density matrix of the series Y(t) is
while
We may derive certain regularity properties of the filters {b(u)}, {c(u)} of Theorem 9.3.1 under additional conditions. We have
Theorem 9.3.3 Suppose the conditions of Theorem 9.3.1 are satisfied. Also, suppose
for some P ≥ 0 and suppose that the latent roots of f_XX(λ) are distinct. Then {b(u)} and {c(u)} given in Theorem 9.3.1 satisfy
and
In qualitative terms, the weaker the time dependence of the series X(t), the more rapidly the filter coefficients fall off to 0 as |u| → ∞. With reference to the covariance functions of the principal component series and the error series we have
Corollary 9.3.3 Under the conditions of Theorem 9.3.3
and
The principal component series might have been introduced in an alternate manner to that of Theorem 9.3.1. We have
Theorem 9.3.4 Suppose the conditions of Theorem 9.3.1 are satisfied. ζ_j(t), t = 0, ±1, . . . given by (9.3.13) is the real-valued series of the form
(with the 1 × r B_j(λ) satisfying B_j(λ) B̄_j(λ)ᵀ = 1) that has maximum variance and coherency 0 with ζ_k(t), k < j, j = 1, . . . , r. The maximum variance achieved by ζ_j(t) is
This approach was adopted in Brillinger (1964a) and Goodman (1967); it provides a recursive, rather than direct, definition of the principal component series.
The principal component series satisfy stronger optimality properties of the nature of those of Theorem 9.2.3. For convenience, assume EX(t) = 0 in the theorem below.
Theorem 9.3.5 Let X(t), t = 0, ±1, . . . be an r vector-valued series with mean 0, absolutely summable autocovariance function, and spectral density matrix f_XX(λ), −∞ < λ < ∞. Then the q × r {b(u)} and r × q {c(u)} that minimize the j-th latent root of the spectral density matrix of the series
where
are given by (9.3.5), (9.3.6). The j-th extremal latent root is μ_{j+q}(λ).
The latent roots and vectors of spectral density matrices appear in the work of Wiener (1930), Whittle (1953), Pinsker (1964), Koopmans (1964b), and Rozanov (1967). Another related result is Lemma 11 of Dunford and Schwartz (1963), p. 1341.
9.4 THE CONSTRUCTION OF ESTIMATES AND ASYMPTOTIC PROPERTIES
Suppose that we have a stretch, X(t), t = 0, . . . , T − 1, of an r vector-valued series X(t) with spectral density matrix f_XX(λ), and we wish to construct estimates of the latent roots and vectors μ_j(λ), V_j(λ), j = 1, . . . , r of this matrix. An obvious way of proceeding is to construct an estimate f_XX^(T)(λ) of the spectral density matrix and to estimate μ_j(λ), V_j(λ) by the corresponding latent root and vector of f_XX^(T)(λ), j = 1, . . . , r. We turn to an investigation of certain of the statistical properties of estimates constructed in this way.
In Chapter 7 we discussed procedures for forming estimates of a spectral density matrix and the asymptotic properties of these estimates. One estimate discussed had the form
where I_XX^(T)(α) was the matrix of second-order periodograms
W(α) being concentrated in the neighborhood of α = 0 and B_T, T = 1, 2, . . . a sequence of non-negative bandwidth parameters. We may now state
Theorem 9.4.1 Let X(t), t = 0, ±1, . . . be an r vector-valued series satisfying Assumption 2.6.2(1). Let μ_j^(T)(λ), V_j^(T)(λ), j = 1, . . . , r be the latent roots and vectors of the matrix
and W^(T)(α) was a weight function of the form
Theorem 9.4.1 suggests that for large values of B_T T, the distributions of the latent roots and vectors μ_j^(T)(λ), V_j^(T)(λ) will be centered at the corresponding latent roots and vectors of the matrix average (9.4.4). If in addition B_T → 0 as T → ∞, then clearly
and
The latent roots and vectors of (9.4.4) will be near the desired μ_j(λ), V_j(λ) in the case that f_XX(α), −∞ < α < ∞, is near constant. This suggests once again the importance of prefiltering the data in order to obtain near constant spectra prior to estimating parameters of interest. Some aspects of the relation between μ_j^(T)(λ), V_j^(T)(λ) and μ_j(λ), V_j(λ) are indicated in the following:
Let f_XX^(T)(λ) be given by (9.4.1) where W(α) satisfies Assumption 5.6.1. Let μ_j^(T)(λ), V_j^(T)(λ), j = 1, . . . , r, be the latent roots and vectors of f_XX^(T)(λ). If B_T T → ∞ as T → ∞, then
If, in addition, the latent roots of f_XX(λ) are distinct, then
and
Theorems 9.4.1 and 9.4.2 indicate that the asymptotic biases of the estimates μ_j^(T)(λ), V_j^(T)(λ) depend in an intimate manner on the bandwidth B_T appearing in the weight function W^(T)(α) and on the smoothness of the population spectral density f_XX(α) for α in the neighborhood of λ.
Turning to an investigation of the asymptotic distribution of the μ_j^(T)(λ), V_j^(T)(λ) we have
Theorem 9.4.3 Under the conditions of Theorem 9.4.1 and if the latent roots of f_XX(λ_m) are distinct, m = 1, . . . , M, the variates μ_j^(T)(λ_m), V_j^(T)(λ_m), j = 1, . . . , r, m = 1, . . . , M are asymptotically jointly normal with asymptotic covariance structure
Theorem 9.4.2 Let the r × r spectral density matrix f_XX(λ) be given by
where
and
Suppose the latent roots μ_j(λ), j = 1, . . . , r, of f_XX(λ) are distinct. Let B_T → 0 as T → ∞; then
Let W^(T)(α) be given by (9.4.3) where W(α) = W(−α) and
The limiting expressions appearing in Theorem 9.4.3 parallel those of Theorems 9.2.2 and 9.2.4. The asymptotic independence indicated for variates at frequencies λ_m, λ_n with λ_m ± λ_n ≢ 0 (mod 2π) was expected due to the corresponding asymptotic independence of f_XX^(T)(λ_m), f_XX^(T)(λ_n). The asymptotic independence of the different latent roots and vectors was perhaps unexpected.
Expression (9.4.15) implies that
var log10 μ_j^(T)(λ) ≈ B_T⁻¹ T⁻¹ (log10 e)² 2π ∫ W(α)² dα   if λ ≢ 0 (mod π)
                    ≈ B_T⁻¹ T⁻¹ (log10 e)² 4π ∫ W(α)² dα   if λ ≡ 0 (mod π).
(9.4.18)
This last is of identical character with the corresponding result, (5.6.15), for the variance of the logarithm of a power spectrum estimate. It was anticipated due to the interpretation, given in Theorem 9.3.2, of μ_j(λ) as the power spectrum of the j-th principal component series. Expression (9.4.18) suggests that we should take log μ_j^(T)(λ) as the basic statistic rather than μ_j^(T)(λ).
An alternate form of limiting distribution results if we consider the spectral estimate of Section 7.3
and
In Theorem 7.3.3 we saw that this estimate was distributed asymptotically as (2m + 1)⁻¹ W_r^c(2m + 1, f_XX(λ)), (2m)⁻¹ W_r(2m, f_XX(λ)), and (2m)⁻¹ W_r(2m, f_XX(λ)) as T → ∞ in the three cases. This result leads us directly to
Theorem 9.4.4 Let X(t), t = 0, ±1, . . . be an r vector-valued series satisfying Assumption 2.6.1. Let m be fixed and [2πs(T)/T] → λ as T → ∞. Let μ_j^(T)(λ), V_j^(T)(λ), j = 1, . . . , r be the latent roots and vectors of the matrix (9.4.19). Then they tend, in distribution, to the latent roots and vectors of a (2m + 1)⁻¹ W_r^c(2m + 1, f_XX(λ)) variate if λ ≢ 0 (mod π) and of a (2m)⁻¹ W_r(2m, f_XX(λ)) variate if λ ≡ 0 (mod π). Estimates at frequencies λ_n, n = 1, . . . , N with λ_n ± λ_n′ ≢ 0 (mod 2π) are asymptotically independent.
The distribution of the latent roots of matrices with real or complex Wishart distributions has been given in James (1964).
The distributions obtained in Theorems 9.4.3 and 9.4.4 are not inconsistent. If, as in Sections 5.7 and 7.4, we make the identification
and m is large, then, as Theorems 9.2.2 and 9.2.4 imply, the latent roots and vectors are approximately normal with the appropriate first- and second-order moment structure.
The results developed in this section may be used to set approximate confidence limits for the μ_j(λ), V_pj(λ), j, p = 1, ..., r. For example, the result of Theorem 9.4.3 and the discussion of Section 5.7 suggest the following approximate 100γ percent confidence interval for log₁₀ μ_j(λ):
At the same time, the result of Exercise 9.7.5 suggests that it might prove reasonable to approximate the distribution of
This approximation might then be used to determine confidence regions for
in the manner of Section 6.2. Much of the material of this section was presented in Brillinger (1969d).
9.5 FURTHER ASPECTS OF PRINCIPAL COMPONENTS
The principal component series introduced in Section 9.3 may be interpreted in terms of the usual principal components of multivariate analysis. Given the r vector-valued stationary series X(t), t = 0, ±1, ... with spectral density matrix f_XX(λ), let X(t,λ) denote the component of frequency λ of X(t) (see Section 4.6). Then, see Sections 4.6 and 7.1, the 2r vector-valued variate, with real-valued entries
has covariance matrix proportional to
A standard principal component analysis of the variate (9.5.1) would lead us to consider the latent roots and vectors of (9.5.2). From Lemma 3.7.1 these are given by
j = 1, ..., r, where μ_j(λ), V_j(λ), j = 1, ..., r are the latent roots and vectors of f_XX(λ) and appear in Theorem 9.3.1. We see therefore that a frequency domain principal component analysis of a stationary series X(t) is a standard principal component analysis carried out on the individual frequency components of X(t) and their Hilbert transforms.
A variety of uses suggest themselves for the sort of procedures discussed in Section 9.3. To begin, as in the introduction of this chapter, we may be interested in transmitting an r vector-valued series over a reduced number, q < r, of communication channels. Theorem 9.3.1 indicates one solution to this problem. Alternately we may be interested in examining a succession of real-valued series providing the information in a series of interest in a useful manner. This is often the case when the value of r is large. Theorem 9.3.4 suggests, in such a situation, the consideration of the series corresponding to the largest latent roots, followed by the consideration of the series corresponding to the second largest latent roots, and so on.
At the other extreme, we may consider the series corresponding to the smallest latent roots. Suppose we feel the series X(t), t = 0, ±1, ... may satisfy some linear time invariant identity of the form
where b(u) is 1 × r and unknown and K is constant. Thus
and it is reasonable to take b(u) to correspond to the rth principal component series derived from the smallest latent roots. This is an extension of a suggestion of Bartlett (1948a) concerning the multivariate case.
On another occasion, we may be concerned with some form of factor analytic model such as
t = 0, ±1, ... where the q vector-valued series ζ(t), t = 0, ±1, ... represents q "hidden" factor series and the r × q filter {c(u)} represents the loadings of the factors. We may wish to determine the ζ(t), t = 0, ±1, ... as being the essence of X(t) in some sense. The procedures of Section 9.3 suggest one means of doing this. In the case that the series are not autocorrelated, the procedure reduces to the factor analysis used so often by psychometricians; see Horst (1966). They generally interpret the individual principal components and try to make the interpretation easier by rotating (or transforming linearly) the most important components. In the present time series situation, the problem of interpretation is greatly complicated by the fact that if V_j(λ) is a standardized latent vector corresponding to a latent root μ_j(λ), then so is α_j(λ)V_j(λ) for any α_j(λ) with modulus 1.
Another complication relates to the fact that the latent roots and vectors of a spectral density matrix are not invariant under linear filtering of the series. Hence, the series with greater variability end up weighted more heavily in the principal components. If the series are not recorded in comparable scales, difficulties arise. One means of reducing these complications is to carry out the computations on the estimated matrix of coherencies, [R_jk^(T)(λ)], rather than on the matrix of spectral densities.
We conclude this section by reminding the reader that we saw, in Section 4.7, that the Cramér representation resulted from a form of principal component analysis carried out in the time domain. Other time domain principal component analyses appear in the work of Craddock (1965), Hannan (1961a), Stone (1947), Yaglom (1965), and Craddock and Flood (1969).
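A minimal numerical sketch of the frequency domain principal component analysis described above, at a single frequency; the Hermitian spectral density matrix is an artificial example, not an estimate from data.

```python
import numpy as np

# Sketch of a frequency domain principal component analysis at one frequency:
# the component transfer functions are the latent vectors of the Hermitian
# spectral density matrix f_XX(lambda).  The matrix below is artificial.
rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 8)) + 1j * rng.normal(size=(3, 8))
f_xx = Z @ Z.conj().T / 8.0                    # Hermitian, non-negative definite

mu, V = np.linalg.eigh(f_xx)                   # ascending latent roots
mu, V = mu[::-1], V[:, ::-1]                   # largest root first

# mu[0] is the power spectrum of the first principal component series at
# this frequency; V[:, 0] is its transfer function, defined only up to a
# multiplier of modulus 1.
assert np.allclose(V @ np.diag(mu) @ V.conj().T, f_xx)

# To reduce scale effects one may instead analyze the coherency matrix.
d = np.sqrt(np.diag(f_xx).real)
R = f_xx / np.outer(d, d)
assert np.allclose(np.diag(R).real, 1.0)
```

The closing lines illustrate the remedy suggested above for incomparable scales: carrying the computation out on the coherency matrix, whose diagonal entries are 1.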
9.6 A WORKED EXAMPLE
We consider the estimation of the coefficients of the principal component series for the 14 vector-valued series of monthly mean temperatures at the one American and 13 European stations indicated in Chapter 1. In the discussion of Theorem 9.4.2 we saw that the estimates μ_j^(T)(λ), V_j^(T)(λ) could be substantially biased if the spectral density matrix were far from constant with respect to λ. For this reason the series were prefiltered initially by removing the seasonal effects. Figure 9.6.1 presents estimates of the power spectra of the seasonally adjusted series, taking an estimate of the form (9.4.19) with m = 25.
Figure 9.6.2 gives log₁₀ μ_j^(T)(λ), j = 1, ..., 14. The μ_j^(T)(λ) are the latent roots of the estimated spectral density matrix f_XX^(T)(λ). In fact, because of the unavailability of a computer program evaluating the latent roots and vectors of a complex Hermitian matrix, the μ_j^(T)(λ) and the V_j^(T)(λ) were derived, making use of Lemma 3.7.1, from the following matrix with real-valued entries
The curves of Figure 9.6.2 are seen to fall off as λ increases in much the same manner as the power spectra appearing in Figure 9.6.1. Following expressions (9.4.18) and (9.4.20), the standard error of these estimates is approximately
Figure 9.6.1 Logarithm of estimated power spectrum of seasonally adjusted monthly mean temperatures at various stations with 51 periodogram ordinates averaged.
Figure 9.6.2 Logarithm of estimate of the power spectrum for the principal component series.
Figures 9.6.3 and 9.6.4 give the estimated gain and phase, |V_pj^(T)(λ)| and arg V_pj^(T)(λ), for the first two principal components. For the first component, the gains are surprisingly constant with respect to λ. They are not near 0 except in the case of New Haven. The phases take on values near 0 or π/2, simultaneously for most series. In interpreting the latter we must remember that the latent vectors are determined only up to an arbitrary multiplier of modulus 1. This is why 4 dots stand out in most of the plots. It appears that the first principal component series is essentially proportional to the average of the 13 European series, with no time lags involved. The gains and phases of the second component series are seen to be much more erratic and not at all easy to interpret. The gain for New Haven is noticeably large for λ near 0. The discussion at the end of Section 9.4 and Exercise 9.7.7 suggest two possible means of constructing approximate standard errors for the estimates.
Figure 9.6.3 Estimated gains and phases for the first principal component series.
Figure 9.6.4 Estimated gains and phases for the second principal component series.
Table 9.6.1 gives log₁₀ of the latent values of the matrix c_XX^(T)(0) of Table 7.8.1. Table 9.6.2 gives the corresponding latent vectors. In view of the apparent character of the first principal component series, suggested above, it makes sense to consider these quantities. An examination of Table 9.6.2 suggests that the first vector corresponds to a simple average of the 13 series obtained from the 14 by excluding New Haven. The second vector appears to correspond to New Haven primarily.
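The device used in this section for avoiding a complex eigenroutine can be sketched as follows; assuming, per Lemma 3.7.1, that the real-valued matrix in question is the block matrix [[Re f, −Im f], [Im f, Re f]], each latent root of the Hermitian matrix appears twice among the roots of the real matrix.

```python
import numpy as np

# Sketch: the latent roots of a complex Hermitian matrix f can be obtained
# from the real symmetric 2r x 2r matrix [[Re f, -Im f], [Im f, Re f]]
# (Lemma 3.7.1); each root appears with multiplicity two.  f is artificial.
rng = np.random.default_rng(1)
Z = rng.normal(size=(4, 10)) + 1j * rng.normal(size=(4, 10))
f = Z @ Z.conj().T / 10.0                       # Hermitian

A, B = f.real, f.imag
M = np.block([[A, -B], [B, A]])                 # real symmetric 2r x 2r

roots_complex = np.sort(np.linalg.eigvalsh(f))
roots_real = np.sort(np.linalg.eigvalsh(M))

# The 2r real roots are the r complex roots, each doubled.
assert np.allclose(roots_real, np.repeat(roots_complex, 2))
```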
Table 9.6.1 log₁₀ Latent Values of the Temperature Series
1.591  1.025  .852  .781  .369  .267  .164  .009  −.121  −.276  −.345  −.511  −.520  −.670

Table 9.6.2 Latent Vectors of the Temperature Series
[The numerical entries of Table 9.6.2 are garbled in this transcription and are not reproduced.]

9.7 EXERCISES
9.7.1 Let μ_j(λ), j = 1, ..., r denote the latent roots of the r × r non-negative definite matrix f_XX(λ). Let μ_j^(T)(λ), j = 1, ..., r denote the latent roots of the matrix
9.7.2 Suppose the conditions of Theorem 9.3.1 are satisfied. Suppose c_XX(u) = 0 for u ≠ 0. Show that the {b(u)}, {c(u)} given in the theorem satisfy b(u), c(u) = 0 for u ≠ 0.
9.7.3 Under the conditions of Theorem 9.3.1, show that the coherency of the series X_j(t) and ζ_k(t) is
9.7.4 For the variates of Theorems 9.2.2 and 9.2.4 show that Eμ̂_j = μ_j + O(n^{-1/2}),
9.7.5 Under the conditions of Theorem 9.4.3, show that V_pj^(T)(λ) is asymptotically N₁^C(V_pj(λ), σ_T²) where
9.7.6 Suppose that the data are tapered, with tapering function h(t/T), prior to calculating the estimates of Theorem 9.4.3. Under the conditions of that theorem, show that the asymptotic covariances (9.4.15) and (9.4.17) are multiplied by ∫h(t)⁴ dt / [∫h(t)² dt]².
9.7.7 Under the conditions of Theorem 9.4.3, show that log |V_pj^(T)(λ)| and arg {V_pj^(T)(λ)} are asymptotically distributed as independent N(log |V_pj(λ)|, (1/2)σ_T²|V_pj(λ)|⁻²) and N(arg {V_pj(λ)}, (1/2)σ_T²|V_pj(λ)|⁻²) variates respectively, where σ_T² is given in Exercise 9.7.5.
9.7.8 (a) Show that if in the estimate (9.4.19) we smooth across the whole frequency domain, the proposed analysis reduces to a standard principal component analysis of the sample covariance matrix c_XX^(T)(0).
(b) Let the series X(t), t = 0, ±1, ... be Gaussian and satisfy Assumption 2.6.2(1). Let μ_j, V_j, j = 1, ..., r denote the latent roots and vectors of c_XX(0). Suppose the roots are distinct. Let μ̂_j, V̂_j, j = 1, ..., r indicate the latent roots and vectors of c_XX^(T)(0). Use (7.6.11) and the expansions of the proof of Theorem 9.2.4 to show that the μ̂_j, V̂_j, j = 1, ..., r are asymptotically jointly normal with
10
THE CANONICAL ANALYSIS OF TIME SERIES
10.1 INTRODUCTION
In this chapter we consider the problem of approximating one stationary time series by a filtered version of a second series where the filter employed has reduced rank. Specifically, consider the (r + s) vector-valued stationary series
t = 0, ±1, ... with X(t) r vector-valued and Y(t) s vector-valued. Suppose we are interested in reducing the series X(t) to be q vector-valued, forming, for example, the series
t = 0, ±1, ... with {b(u)} a q × r matrix-valued filter, and suppose we wish to do this so that the s vector-valued series
is near Y(t) for some s vector μ and s × q filter {c(u)}. If the series Y(t) were identical with X(t), then we would have the problem discussed in the previous chapter, whose solution led to a principal component analysis of the spectral density matrix. If q = min(r,s), then we are not requiring any real reduction in dimension and we have the multiple regression problem discussed in Chapter 8.
The relation connecting Y*(t) − μ to X(t) is linear and time invariant with transfer function
where B(λ) and C(λ) are the transfer functions of {b(u)} and {c(u)}, respectively. Note that under the indicated requirements the matrix A(λ) has rank ≤ q. Conversely, if it were known that A(λ) had rank ≤ q, then we could find a q × r B(λ) and an s × q C(λ) so that relation (10.1.4) holds. The problem indicated is approximating Y(t) by a filtered version of X(t) where the filter employed has rank ≤ q.
In the next section we discuss an analog of this problem for vector-valued variates. A general reference to the work of this chapter is Brillinger (1969d).
10.2 THE CANONICAL ANALYSIS OF VECTOR-VALUED VARIATES
Let
be an (r + s) vector-valued variate with X r vector-valued and Y s vector-valued. Suppose the mean of (10.2.1) is
and its covariance matrix is
Consider the problem of determining the s vector μ, the q × r matrix B and the s × q matrix C so that the variate
is small. Let us measure the size of this variate by the real number
for some symmetric positive definite Γ. We have
Theorem 10.2.1 Let an (r + s) vector-valued variate of the form (10.2.1) with mean (10.2.2) and covariance matrix (10.2.3) be given. Suppose Σ_XX, Γ are nonsingular. Then the s × 1 μ, q × r B and s × q C, q ≤ r, s, that minimize (10.2.5) are given by
where V_j is the jth latent vector of Γ^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Γ^{-1/2}, j = 1, ..., s. If μ_j denotes the corresponding latent root, j = 1, ..., s, then the minimum obtained is
and
with
and
is
The case Γ = I is of particular importance. Then we are led to evaluate the latent roots and vectors of the matrix Σ_YX Σ_XX^{-1}Σ_XY. If μ_j and V_j denote these, then the covariance matrix of the error series
If we take q = r, then we are led to the multiple regression results of Theorem 8.2.1. If s = r and Y = X, then we are led to the principal component results of Theorem 9.2.1. A result related to Theorem 10.2.1 is given in Rao (1965) p. 505.
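A numerical sketch of Theorem 10.2.1 in the case Γ = I, on simulated data (the dimensions, sample, and variable names are illustrative): the best rank-q summary uses the top q latent vectors of Σ_YX Σ_XX^{-1}Σ_XY, and the minimum of the criterion exceeds the full regression error by the sum of the discarded latent roots.

```python
import numpy as np

# Reduced-rank approximation of Y by C B X with Gamma = I, using sample
# (uncentered) second-moment matrices from simulated data.
rng = np.random.default_rng(2)
r, s, q, n = 5, 4, 2, 2000
X = rng.normal(size=(n, r))
Y = X @ rng.normal(size=(r, s)) + 0.5 * rng.normal(size=(n, s))

Sxx = X.T @ X / n
Sxy = X.T @ Y / n
Syx = Sxy.T
Syy = Y.T @ Y / n

mu, V = np.linalg.eigh(Syx @ np.linalg.solve(Sxx, Sxy))
order = np.argsort(mu)[::-1]
mu, V = mu[order], V[:, order]

Vq = V[:, :q]                                   # top q latent vectors
C = Vq                                          # s x q
B = Vq.T @ Syx @ np.linalg.inv(Sxx)             # q x r

# minimum of the criterion: full regression error plus discarded roots
err_full = np.trace(Syy - Syx @ np.linalg.solve(Sxx, Sxy))
err_rank_q = err_full + mu[q:].sum()

resid = Y - X @ (C @ B).T
emp = np.trace(resid.T @ resid / n)
assert np.isclose(emp, err_rank_q)
```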
A problem closely related to this theorem is that of determining the q × 1 μ, q × r D, and q × s E so that the q vector-valued variate
is small. This problem leads us to
Theorem 10.2.2 Let an (r + s) vector-valued variate of the form (10.2.1) with mean (10.2.2) and covariance matrix (10.2.3) be given. Suppose Σ_XX and Σ_YY are nonsingular. The q × 1 μ, q × r D, and q × s E with EΣ_YY Eᵀ = I, DΣ_XX Dᵀ = I that minimize
are given by
and
where V_j denotes the jth latent vector of Σ_YY^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Σ_YY^{-1/2} and where U_j denotes the jth latent vector of Σ_XX^{-1/2}Σ_XY Σ_YY^{-1}Σ_YX Σ_XX^{-1/2}. If μ_j denotes the jth latent root of either matrix, then the minimum achieved is
We see that the covariance matrix of the variate
is given by
with α_j and β_j proportional to Σ_XX^{-1/2}U_j and Σ_YY^{-1/2}V_j, respectively. The coefficients of the canonical variates satisfy
We note that the standardization α_jᵀΣ_XX α_j = 1, β_kᵀΣ_YY β_k = 1 is sometimes adopted. However, sampling properties of the empirical variates are simplified by adopting (10.2.25). We define
This result leads us to define the canonical variates
We standardize them so that
and
Corollary 10.2.2 Under the conditions of Theorem 10.2.2
and
The value ρ_j = μ_j^{1/2} is called the jth canonical correlation in view of (10.2.28). We note that the variates introduced in this theorem could alternately have been deduced by setting Γ = Σ_YY in Theorem 10.2.1.
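As a sketch of this definition, the canonical correlations can be computed from the blocks of a covariance matrix in two equivalent ways: as the square roots of the latent roots of Σ_YY^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Σ_YY^{-1/2}, or as the singular values of the whitened cross-covariance. The simulated data are illustrative.

```python
import numpy as np

# Canonical correlations from sample covariance blocks (simulated data).
rng = np.random.default_rng(3)
r, s, n = 3, 2, 4000
X = rng.normal(size=(n, r))
Y = X[:, :2] @ np.array([[0.8, 0.0], [0.0, 0.3]]) + rng.normal(size=(n, s))

C = np.cov(np.hstack([X, Y]).T)
Sxx, Sxy = C[:r, :r], C[:r, r:]
Syx, Syy = C[r:, :r], C[r:, r:]

# route 1: singular values of the whitened cross-covariance
Lx = np.linalg.cholesky(Sxx)
Ly = np.linalg.cholesky(Syy)
K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
rho_svd = np.linalg.svd(K, compute_uv=False)

# route 2: latent roots mu_j, with rho_j = mu_j^{1/2}
M = np.linalg.solve(Ly, Syx) @ np.linalg.solve(Sxx, Sxy) @ np.linalg.inv(Ly).T
mu = np.sort(np.linalg.eigvalsh(M))[::-1]
rho = np.sqrt(np.clip(mu, 0.0, None))
assert np.allclose(rho, rho_svd)
```

The two routes agree because the matrix in route 2 is K transposed times K, written with a Cholesky square root in place of Σ_YY^{-1/2}, which leaves the latent roots unchanged.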
Canonical variates were introduced by Hotelling (1936) as linear combinations of the entries of X and Y that have extremal correlation. Related references include: Obukhov (1938, 1940), Anderson (1957), Morrison (1967), Rao (1965), and Kendall and Stuart (1966). In the case that the variate (10.2.1) is Gaussian, the first canonical variate is extremal within a broader class of variates; see Lancaster (1966). Canonical variates are useful: in studying relations between two vector-valued variates (Hotelling (1936)), in discriminating between several populations (Glahn (1968), Dempster (1969) p. 186, and Kshirsagar (1971)), in searching for common factors (Rao (1965) p. 496), in predicting variables from other variables (Dempster (1969) p. 176, Glahn (1968)), and in the analysis of systems of linear equations (Hooper (1959) and Hannan (1967c)).
Let us consider certain aspects of the estimation of the above parameters. Assume, for convenience, that μ_X and μ_Y = 0. Suppose that a sample of values
j = 1, ..., n of the variate of Theorem 10.2.2 is available. As an estimate of (10.2.3) we take
We then determine estimates of μ_j, α_j, β_j from the equations
and
with the standardizations
Below we set
in order to obtain
Theorem 10.2.3 Suppose the values (10.2.30) are a sample of size n from
Suppose r ≥ s and suppose the latent roots μ_j, j = 1, ..., s are distinct. Then the variate {μ̂_j, α̂_j, β̂_j; j = 1, ..., s} is asymptotically normal with {μ̂_j; j = 1, ..., s} asymptotically independent of {α̂_j, β̂_j; j = 1, ..., s}. The asymptotic moments are given by
The asymptotic variances of the statistics may now be estimated by substituting estimates for the parameters appearing in expressions (10.2.41) to (10.2.44). In the case of μ̂_j we note that
and so it is simpler to consider the transformed variate tanh⁻¹ μ̂_j^{1/2}. In practice it is probably most sensible to estimate the asymptotic second-order moments by means of the jack-knife procedure; see Brillinger (1964c, 1966b).
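A minimal sketch of the jack-knife suggestion applied to the variance-stabilized statistic tanh⁻¹ μ̂₁^{1/2}; the helper first_canonical_corr and the simulated sample are my own illustrative constructions.

```python
import numpy as np

# Delete-one jack-knife estimate of the sampling variance of the
# transformed first canonical correlation (simulated data).
def first_canonical_corr(X, Y):
    C = np.cov(np.hstack([X, Y]).T)
    r = X.shape[1]
    Sxx, Sxy, Syy = C[:r, :r], C[:r, r:], C[r:, r:]
    K = (np.linalg.solve(np.linalg.cholesky(Sxx), Sxy)
         @ np.linalg.inv(np.linalg.cholesky(Syy)).T)
    return np.linalg.svd(K, compute_uv=False)[0]

rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 2))
Y = 0.6 * X[:, :1] + rng.normal(size=(n, 1))

z_full = np.arctanh(first_canonical_corr(X, Y))
pseudo = []
for i in range(n):
    keep = np.arange(n) != i
    z_i = np.arctanh(first_canonical_corr(X[keep], Y[keep]))
    pseudo.append(n * z_full - (n - 1) * z_i)   # jack-knife pseudo-values
pseudo = np.array(pseudo)
var_jack = pseudo.var(ddof=1) / n               # variance estimate for z_full
assert var_jack > 0
```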
If s = 1 we note that the canonical correlation squared, ρ₁² = μ₁, is the squared coefficient of multiple correlation discussed in Section 8.2.
The asymptotic covariance of μ̂_j with μ̂_k was derived in Hotelling (1936); Hsu (1941) found the asymptotic distribution; Lawley (1959) derived higher cumulants; Chambers (1966) derived further terms in the expansion of the asymptotic means; Dempster (1966) considered the problem of bias reduction; Hooper (1958) derived the asymptotic covariance structure under an assumption of fixed X_j, j = 1, ..., n; the exact distribution of the sample canonical correlations depends only upon the population canonical correlations and has been given in Constantine (1963) and James (1964). The distribution of the vectors was found in Tumura (1965). Golub (1969) discusses the computations involved. Izenman (1972) finds the asymptotic distribution of an estimate for CB of (10.2.4) in the normal case.
We will require complex analogs of the previous results. Consider the (r + s) vector-valued variate
with complex entries. Suppose it has mean
covariance matrix
and
We then have
Theorem 10.2.4 Let an (r + s) vector-valued variate of the form (10.2.46) with mean (10.2.47) and covariance matrix (10.2.48) be given. Suppose Σ_XX, Γ are nonsingular with Γ > 0. Then the s × 1 μ, q × r B and s × q C, q ≤ r, s that minimize
are given by
and
where V_j is the jth latent vector of Γ^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Γ^{-1/2}. If μ_j indicates the corresponding latent root, the minimum obtained is
are given by
and
We note that the V_j are arbitrary to the extent of a complex multiplier of modulus 1. Next we have
Theorem 10.2.5 Let an (r + s) vector-valued variate of the form (10.2.46) with mean (10.2.47) and covariance matrix (10.2.48) be given. Suppose Σ_XX, Σ_YY are nonsingular. Then the q × 1 μ, q × r D, q × s E with EΣ_YY E′ = I, DΣ_XX D′ = I, that minimize
where V_j signifies the jth latent vector of Σ_YY^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Σ_YY^{-1/2}, and where U_j signifies the jth latent vector of Σ_XX^{-1/2}Σ_XY Σ_YY^{-1}Σ_YX Σ_XX^{-1/2}.
As in the real case, we are led to consider the variates
where α_j and β_j are proportional to Σ_XX^{-1/2}U_j and Σ_YY^{-1/2}V_j, respectively. We standardize them so that
Thus we have
Corollary 10.2.5 Under the conditions of Theorem 10.2.5
If we let μ_j denote the jth latent root of Σ_YY^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Σ_YY^{-1/2}, then it appears as
for j = 1, ..., min(r,s). We call the variates ζ_j, ω_j, j = 1, ..., min(r,s) the jth pair of canonical variates. The coefficient ρ_j = μ_j^{1/2} ≥ 0 is called the jth canonical correlation coefficient. We set ρ_j = 0 for j > min(r,s) and we take a determination of α_j and β_j so that
Canonical variates of complex-valued random variables appear in Pinsker (1964) p. 134.
Suppose now μ_X and μ_Y = 0 and a sample of values
j = 1, ..., n of the variate of Theorem 10.2.5 is available. As an estimate of expression (10.2.48) we take
We then determine estimates of μ_j, α_j, β_j from the equations
and
with the normalizations
and
In Theorem 10.2.6 we set
and
Theorem 10.2.6 Suppose the values (10.2.69) are a sample of size n from
Suppose r ≥ s and suppose the latent roots μ_j, j = 1, ..., s are distinct. Then the variate {μ̂_j, α̂_j, β̂_j; j = 1, ..., s} is asymptotically normal with {μ̂_j; j = 1, ..., s} asymptotically independent of {α̂_j, β̂_j; j = 1, ..., s}. The asymptotic moments are given by
We note that the asymptotic distribution of the variate
is complex normal, j = 1, ..., s. We also note that
James (1964) gives the exact distribution of the μ̂_j, j = 1, ..., s in this complex case.
10.3 THE CANONICAL VARIATE SERIES
Consider the problem, referred to in the introduction of this chapter, of determining the s vector μ, the q × r filter {b(u)} and the s × q filter {c(u)} so that if
then
is near Y(t), t = 0, ±1, .... Suppose we measure the degree of nearness by
which may be written
in the case that EY(t) = EY*(t). We have
Theorem 10.3.1 Let
t = 0, ±1, ... be an (r + s) vector-valued, second-order stationary series with mean
absolutely summable autocovariance function and spectral density matrix
−∞ < λ < ∞. Suppose f_XX(λ) is nonsingular. Then, for given q ≤ r, s, the s × 1 μ, q × r {b(u)} and s × q {c(u)} that minimize (10.3.3) are given by
and
where
and
Here V_j(λ) denotes the jth latent vector of the matrix f_YX(λ)f_XX(λ)^{-1}f_XY(λ), j = 1, ..., s. If μ_j(λ) denotes the corresponding latent root, j = 1, ..., s, then the minimum achieved is
The previous theorem has led us to consider the latent roots and vectors of certain matrices based on the spectral density matrix of a given series. Theorem 10.3.1 is seen to provide a generalization of Theorems 8.3.1 and 9.3.1, which correspond to taking q = s and Y(t) = X(t) with probability 1, respectively.
We see that the error series
has mean 0 and spectral density matrix
for −∞ < λ < ∞; this spectral density matrix is the sum of two parts of different character. The first part
appeared in Section 8.3 as the error spectral density matrix resulting from regressing Y(t) on the series X(t), t = 0, ±1, .... It represents a lower bound beyond which no improvement in degree of approximation is possible by choice of q, and also measures the inherent degree of linear approximation of Y(t) by the series X(t), t = 0, ±1, .... The second part
will be small, for given q, in the case that the latent roots μ_j(λ), j > q, are small. As a function of q it decreases with increasing q and becomes 0 when q ≥ r or s.
The criterion (10.3.3) has the property of weighting the various components of Y(t) equally. This may not be desirable in the case that the different components have substantially unequal variances or a complicated correlation structure. For some purposes it may be more reasonable to minimize a criterion such as
In this case we have
Corollary 10.3.1 Under the conditions of Theorem 10.3.1, expression (10.3.18) is minimized by the {b(u)} and {c(u)} of the theorem now based on V_j(λ), j = 1, ..., s, the latent vectors of f_YY(λ)^{-1/2}f_YX(λ)f_XX(λ)^{-1}f_XY(λ)f_YY(λ)^{-1/2}.
The procedure suggested by this corollary has the advantage of being invariant under nonsingular filtering of the series involved; see Exercise 10.6.5. The latent vectors of the matrix of this corollary essentially appear in the following:
Theorem 10.3.2 Suppose the conditions of Theorem 10.3.1 are satisfied. The real-valued series ζ_j(t), η_j(t), t = 0, ±1, ... of the form
and
with the standardizations A_j(λ)ᵀA_j(λ) = 1, B_j(λ)ᵀB_j(λ) = 1, having maximum coherence, |R_{ζ_jη_j}(λ)|², and coherence 0 with the series ζ_k(t), η_k(t), t = 0, ±1, ..., k < j, j = 1, ..., min(r,s), are given by the solutions of the equations:
and
j = 1, ..., min(r,s), where μ₁(λ) ≥ μ₂(λ) ≥ ⋯. The maximum coherence achieved is μ_j(λ), j = 1, ..., min(r,s).
The solutions of the equations (10.3.21) and (10.3.22) are intimately connected to the latent roots and vectors of the matrix of Corollary 10.3.1, which satisfy
This last gives
and
allowing us to identify μ_j(λ) as ρ_j(λ) and to take A_j(λ) and B_j(λ) proportional to f_XX(λ)^{-1}f_XY(λ)f_YY(λ)^{-1/2}V_j(λ) and f_YY(λ)^{-1/2}V_j(λ), respectively.
Theorem 10.3.2 has the advantage, over Corollary 10.3.1, of treating the series X(t) and Y(t) symmetrically. The pair ζ_j(t) and η_j(t), t = 0, ±1, ... of the theorem is called the jth pair of canonical series. Their coherence, μ_j(λ), is called the jth canonical coherence. They could also have been introduced through an analog of Theorem 10.2.4.
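At a fixed frequency the canonical coherences may be computed as the latent roots of the matrix of Corollary 10.3.1; the joint spectral density matrix below is an artificial Hermitian example, built non-negative definite so that the coherences fall in [0, 1].

```python
import numpy as np

# Canonical coherences at one frequency from an (r+s) x (r+s) spectral
# density matrix: latent roots of f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2}.
rng = np.random.default_rng(5)
r, s, m = 3, 2, 12
Z = rng.normal(size=(r + s, m)) + 1j * rng.normal(size=(r + s, m))
f = Z @ Z.conj().T / m                       # Hermitian, non-negative definite
fxx, fxy = f[:r, :r], f[:r, r:]
fyx, fyy = f[r:, :r], f[r:, r:]

# Cholesky square root in place of fyy^{-1/2}; latent roots are unchanged.
Ly = np.linalg.cholesky(fyy)
M = np.linalg.solve(Ly, fyx) @ np.linalg.solve(fxx, fxy) @ np.linalg.inv(Ly).conj().T
mu = np.sort(np.real(np.linalg.eigvalsh(M)))[::-1]   # canonical coherences

assert np.all(mu >= -1e-10) and np.all(mu <= 1 + 1e-10)
```

In a full analysis one would repeat this computation at each frequency of an estimated cross-spectral matrix, giving the canonical coherence functions μ_j(λ).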
In the case that the autocovariance functions involved fall off rapidly as |u| → ∞, the filter coefficients appearing will similarly fall off. Specifically we have
Theorem 10.3.3 Suppose the conditions of Theorem 10.3.1 are satisfied and in addition
and
for some P ≥ 0 and suppose that the latent roots of f_YX(λ)f_XX(λ)^{-1}f_XY(λ) are distinct. Then the b(u), c(u) given in Theorem 10.3.1 satisfy
and
Likewise the autocovariance function of the error series ε(t), t = 0, ±1, ... satisfies
The following theorem provides a related result sometimes useful in simplifying the structure of a series under consideration.
Theorem 10.3.4 Suppose the conditions of Theorem 10.3.1 are satisfied and in addition
and
for some P ≥ 0 and suppose the latent roots μ₁(λ), ..., μ_s(λ) of f_YY(λ)^{-1/2}f_YX(λ)f_XX(λ)^{-1}f_XY(λ)f_YY(λ)^{-1/2} are distinct and nonzero. Then there exist r × r and s × s filters {a(u)} and {b(u)} satisfying
and
such that the series
has spectral density matrix
Pinsker (1964) indicates that we can filter a stationary series in order to obtain a spectral density matrix of the form of (10.3.36).
10.4 THE CONSTRUCTION OF ESTIMATES AND ASYMPTOTIC PROPERTIES
Suppose that we have a stretch
of the (r + s) vector-valued stationary series
with spectral density matrix
and that we wish to construct estimates of the latent roots and transfer functions, μ_j(λ), A_j(λ), B_j(λ), j = 1, 2, ..., described in Theorem 10.3.2. An obvious means in which to proceed is to construct an estimate
of the matrix (10.4.3) and then to determine estimates as solutions of the equations
Now let us investigate the statistical properties of estimates constructed in this way.
Suppose we take
as the estimate (10.4.4) where
and
for some weight function W(α). Then we have
Theorem 10.4.1 Let the (r + s) vector-valued series (10.4.2) satisfy Assumption 2.6.2(1). Let V_j^(T)(λ), R_j^(T)(λ), S_j^(T)(λ) be the solutions of the system of equations:
and
with the standardizations
where
Let (10.4.4) be given by (10.4.8) where W(α) satisfies Assumption 5.6.1. Let μ_j^(T)(λ), A_j^(T)(λ), B_j^(T)(λ) be given by (10.4.5) and (10.4.6). If B_T T → ∞ as T → ∞, then
If, in addition, the latent roots of f_YY(λ)^{-1/2}f_YX(λ)f_XX(λ)^{-1}f_XY(λ)f_YY(λ)^{-1/2} are distinct, then
and
and
Theorem 10.4.1 suggests the importance of prefiltering. The distributions of μ_j^(T)(λ), A_j^(T)(λ), B_j^(T)(λ) are centered at the solutions of the equations (10.4.11) and (10.4.12). These equations will be near the desired (10.3.21) and (10.3.22) only when the weighted average (10.4.13) is near (10.4.3). The latter is more likely to be the case when the series have been prefiltered adequately.
Turning to an investigation of the asymptotic distribution of μ_j^(T)(λ) and A_j^(T)(λ), B_j^(T)(λ) we have
Theorem 10.4.2 Under the conditions of Theorem 10.4.1, and if the μ_j(λ_m), j = 1, ..., min(r,s) are distinct for m = 1, ..., M, the variates μ_j^(T)(λ_m), A_j^(T)(λ_m), B_j^(T)(λ_m), j = 1, 2, ..., m = 1, ..., M are asymptotically jointly normal with asymptotic covariance structure
and, suppressing the dependence of population parameters on λ_m,
with analogous expressions for cov {A_j^(T)(λ_m), B_k^(T)(λ_n)}, cov {B_j^(T)(λ_m), B_k^(T)(λ_n)} deducible from (10.2.84) to (10.2.87).
Expression (10.4.20) implies that
in addition to which tanh⁻¹ √μ_j^(T)(λ) will be asymptotically normal. These results may be used to construct approximate confidence limits for the canonical coherences.
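A sketch of such an approximate confidence limit, working on the tanh⁻¹ scale and transforming back; the point estimate, standard error, and 95 percent normal quantile are illustrative values, not results from the text.

```python
import numpy as np

# Approximate confidence interval for a canonical coherence, using the
# asymptotic normality of tanh^{-1} of its square root.  mu_hat and se_z
# are assumed illustrative values.
mu_hat = 0.55                 # estimated canonical coherence at lambda
se_z = 0.12                   # asymptotic standard error on the z-scale

z = np.arctanh(np.sqrt(mu_hat))
lo, hi = z - 1.96 * se_z, z + 1.96 * se_z
ci = (np.tanh(max(lo, 0.0)) ** 2, np.tanh(hi) ** 2)   # back-transformed
assert 0.0 <= ci[0] <= mu_hat <= ci[1] <= 1.0
```

Because tanh maps into (−1, 1), the back-transformed interval automatically respects the constraint that a coherence lies between 0 and 1.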
An alternate form of limiting distribution results if we consider the spectral estimate (8.5.4) corresponding to a simple average of a fixed number of periodogram ordinates.
Theorem 10.4.3 Let the (r + s) vector-valued series (10.4.2) satisfy Assumption 2.6.1 and have spectral density matrix (10.4.3). Let this matrix be estimated by (8.5.4) where m, s(T) are integers with 2πs(T)/T → λ as T → ∞. Then let
be distributed as (2m + 1)⁻¹W^C_{r+s}(2m + 1, (10.4.3)) if λ ≢ 0 (mod π), and as (2m)⁻¹W_{r+s}(2m, (10.4.3)) if λ ≡ 0 (mod π). Then, as T → ∞, μ_j^(T)(λ), A_j^(T)(λ), B_j^(T)(λ) tend in distribution to the distribution of μ̂_j, Â_j, B̂_j, the solutions of the equations
and
Constantine (1963) and James (1964) give the distribution of the μ̂_j, j = 1, 2, ....
The distributions obtained in Theorems 10.4.2 and 10.4.3 are not inconsistent. If, as in Section 5.7, we make the identification
and m is large, then, as Theorems 10.2.3 and 10.2.6 imply, the μ_j^(T)(λ), A_j^(T)(λ) and B_j^(T)(λ) are asymptotically normal with the appropriate first- and second-order moment structure.
10.5 FURTHER ASPECTS OF CANONICAL VARIATES
We begin by interpreting the canonical series introduced in this chapter in terms of the usual canonical variates of vector-valued variables with real-valued components. Let X(t,λ) and Y(t,λ) signify the components of frequency λ of the series X(t), t = 0, ±1, ... and Y(t), t = 0, ±1, ..., respectively. Then (see Sections 4.6, 7.1) the 2(r + s) vector-valued variate
has covariance matrix proportional to
A standard canonical correlation analysis of the variate (10.5.1) would thus lead us to consider latent roots and vectors based on (10.5.2), specifically the roots and vectors of
Following Lemma 3.7.1, these are essentially the roots and vectors of
Σ_YY^{-1/2}Σ_YX Σ_XX^{-1}Σ_XY Σ_YY^{-1/2}. In summary, we saw that a frequency domain canonical analysis of the series
may be considered to be a standard canonical correlation analysis carried out on the individual frequency components of the series X(t) and Y(t) and their Hilbert transforms.
Alternately we can view the variates appearing in Theorem 10.4.3 as resulting from a canonical correlation analysis carried out on complex-valued variates of the sort considered in Theorem 10.2.6. Specifically, Theorem 4.4.1 suggests that for s(T) an integer with 2πs(T)/T ≐ λ ≢ 0 (mod π), the values
are approximately a sample of size (2m +1) from
The discussion preceding Theorem 10.2.6 now leads us to the calculation of variates of the sort considered in Theorem 10.4.3.
We remark that the student who has available a computer program for the canonical correlation analysis of real-valued quantities may make use of the real-valued correspondence discussed above in order to compute estimates of the coefficients, rather than writing a new program specific to the complex case.
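As a sketch of that correspondence, one can check numerically that a canonical correlation routine applied to the real block representation [[Re S, −Im S], [Im S, Re S]] of the complex second-order quantities recovers each complex canonical correlation twice. The data and the small helper below are illustrative, assuming the proper (circularly symmetric) case in which this block form is proportional to the covariance of the stacked real and imaginary parts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated complex "frequency components": n samples of an (r+s)-vector.
n, r, s = 2000, 3, 2
A = rng.standard_normal((r + s, r + s)) + 1j * rng.standard_normal((r + s, r + s))
Z = (rng.standard_normal((n, r + s)) + 1j * rng.standard_normal((n, r + s))) @ A.T
X, Y = Z[:, :r], Z[:, r:]

def canon_corr_sq(Sxx, Sxy, Syy):
    """Squared canonical correlations: eigenvalues of Syy^-1 Syx Sxx^-1 Sxy."""
    M = np.linalg.solve(Syy, Sxy.conj().T) @ np.linalg.solve(Sxx, Sxy)
    return np.sort(np.linalg.eigvals(M).real)[::-1]

# Complex-variate analysis.
Sxx = X.conj().T @ X / n
Syy = Y.conj().T @ Y / n
Sxy = X.conj().T @ Y / n
rho_complex = canon_corr_sq(Sxx, Sxy, Syy)

# Real correspondence: replace each complex matrix S by [[Re S, -Im S], [Im S, Re S]].
def realify(S):
    return np.block([[S.real, -S.imag], [S.imag, S.real]])

rho_real = canon_corr_sq(realify(Sxx), realify(Sxy), realify(Syy))

# The real-valued analysis returns each complex canonical correlation twice.
assert np.allclose(rho_real, np.repeat(rho_complex, 2), atol=1e-6)
```

The doubling reflects the two real dimensions (component and Hilbert transform) carried by each complex coordinate.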
Further statistics we may wish to calculate in the present context include:
and
for u = 0, ±1, . . . , where A_j^(T)(λ), B_j^(T)(λ) are given by the solutions of (10.4.5) and (10.4.6). These statistics are estimates of the time domain coefficients of the canonical series.
By analogy with what is done in multivariate analysis, we may wish to form certain real-valued measures of the association of the series X(t) and Y(t), such as Wilks' Λ statistic
390 THE CANONICAL ANALYSIS OF TIME SERIES
the vector alienation coefficient
Sample estimates of these coefficients would be of use in estimating the degree of association of the series X(t) and Y(t) at frequency λ.
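Such measures are simple functions of the squared canonical correlations. A sketch of the classical determinant identity, assuming the usual identification of the vector alienation coefficient with det S / (det S_XX · det S_YY) = Π_j(1 − ρ_j²), as in Hotelling's treatment (the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, s = 5000, 3, 2
Z = rng.standard_normal((n, r + s)) @ rng.standard_normal((r + s, r + s))
X, Y = Z[:, :r], Z[:, r:]

S = np.cov(Z.T)                          # full (r+s) x (r+s) sample covariance
Sxx, Sxy = S[:r, :r], S[:r, r:]
Syx, Syy = S[r:, :r], S[r:, r:]

# Squared canonical correlations: eigenvalues of Syy^-1 Syx Sxx^-1 Sxy.
rho2 = np.linalg.eigvals(np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Sxy)).real

# Determinant identity: det(S) / (det(Sxx) det(Syy)) = prod_j (1 - rho_j^2),
# by the Schur-complement factorization det(S) = det(Sxx) det(Syy - Syx Sxx^-1 Sxy).
alienation = np.linalg.det(S) / (np.linalg.det(Sxx) * np.linalg.det(Syy))
assert np.allclose(alienation, np.prod(1 - rho2))
```

The identity is algebraic, so it holds to machine precision for any nonsingular sample covariance.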
Miyata (1970) includes an example of an empirical canonical analysis of some oceanographic series.
10.6 EXERCISES
10.6.1 Show that if in Theorem 10.2.1 we set q = r, then we obtain the multiple regression results of Theorem 8.2.1.
10.6.2 If Γ is taken to be Σ_YY in Theorem 10.2.1, show that the criterion (10.2.5) is invariant under nonsingular linear transformations of Y.
10.6.3 Under the conditions of Theorem 10.3.1, prove that μ₁(λ) = |R_YX(λ)|² if s = 1.
10.6.4 Under the conditions of Theorem 10.3.2, prove |μ_j(λ)| ≤ 1.
10.6.5 Under the conditions of Theorem 10.3.1, prove that the canonical coherences μ_j(λ), j = 1, 2, . . . are invariant under nonsingular filterings of the series X(t), t = 0, ±1, . . . or the series Y(t), t = 0, ±1, . . . .
10.6.6 Suppose the conditions of Theorem 10.3.1 are satisfied. Suppose also that c_XX(u), c_XY(u), c_YY(u) = 0 for u ≠ 0. Show that {b(u)}, {c(u)} given in the theorem satisfy b(u), c(u) = 0 for u ≠ 0.
10.6.7 Demonstrate that the coherence |R_YX(λ)|² can be interpreted as the largest squared canonical correlation of the variate {X(t,λ), X(t,λ)^H} with the variate {Y(t,λ), Y(t,λ)^H}.
10.6.8 Prove that if in the estimate (8.5.4), used in Theorem 10.4.3, we smooth across the whole frequency domain, then the proposed analysis reduces to a standard canonical correlation analysis of the sample covariance matrix
or the vector correlation coefficient
10.6.9 Suppose the data are tapered with tapering function h(t/T) prior to calculating the estimates of Theorem 10.4.2. Under the conditions of the theorem, prove that the asymptotic covariances appearing become multiplied by ∫h(t)⁴ dt / [∫h(t)² dt]².
10.6.10 Suppose that there exist J groups of r vector-valued observations with K observations in each group. Suppose the vectors have complex entries and
(a) Show that the linear discriminant functions, β^τ Y, providing the extrema of the ratio β^τ S₁ β / β^τ S₀ β (of between to within group sums of squares) are solutions of the determinantal equation
for some ν.
(b) Define a J − 1 vector-valued indicator variable X = [X_j] with X_j = 1 if Y is in the jth group and equal to 0 otherwise, j = 1, . . . , J − 1. Show that the analysis above is equivalent to a canonical correlation analysis of the values
See Glahn (1968).
(c) Indicate extensions of these results to the case of stationary time series Y_jk(t), t = 0, ±1, . . . .
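The equivalence asserted in parts (a) and (b) can be checked numerically for real-valued data: the discriminant ratios ν solving the determinantal equation and the squared canonical correlations ρ² of Y with the dummy indicators are linked by ρ² = ν/(1 + ν). A sketch with simulated Gaussian groups (the sizes and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
J, K, r = 3, 50, 4                        # J groups, K observations each, r-vectors
means = rng.standard_normal((J, r)) * 2
Y = np.vstack([means[j] + rng.standard_normal((K, r)) for j in range(J)])
labels = np.repeat(np.arange(J), K)

# Within- and between-group sums of squares S0, S1.
grand = Y.mean(0)
S0 = sum((Y[labels == j] - Y[labels == j].mean(0)).T
         @ (Y[labels == j] - Y[labels == j].mean(0)) for j in range(J))
S1 = sum(K * np.outer(Y[labels == j].mean(0) - grand,
                      Y[labels == j].mean(0) - grand) for j in range(J))

# Discriminant ratios: the J-1 nonzero eigenvalues nu of S0^-1 S1.
nu = np.sort(np.linalg.eigvals(np.linalg.solve(S0, S1)).real)[::-1][:J - 1]

# Canonical correlation analysis of Y with (J-1) 0/1 group indicators, as in (b).
X = (labels[:, None] == np.arange(J - 1)).astype(float)
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx, Syy, Sxy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc
rho2 = np.sort(np.linalg.eigvals(
    np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)).real)[::-1]

# Equivalence of the two eigenproblems: rho^2 = nu / (1 + nu).
assert np.allclose(rho2, nu / (1 + nu))
```

The identity follows because the centered total scatter is S₀ + S₁ and the projection of Y on the centered indicators reproduces the between-group scatter exactly.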
PROOFS OF THEOREMS
PROOFS FOR CHAPTER 2
Proof of Theorem 2.3.1 Directly by identification of the indicated coefficient.

Proof of Lemma 2.3.1 First the "only if" part. If the partition is not indecomposable, then, following (2.3.5), the ψ(r_{ij1}) − ψ(r_{ij2}); 1 ≤ j1 ≤ j2 ≤ J_i; i = i1, . . . , iQ generate only the values s_{m′} − s_{m″}; m′, m″ = m1, . . . , mN. There is no way the values s_{m′} − s_{m″}; m′ = m1, . . . , mN and m″ ≠ m1, . . . , mN can be generated.
Next the "if" part. Suppose the ^//i) — <t4.rij^l 1 ̂ ji ^ h ^ Ji\i = 1,. . ., / generate the .smi — smi-, 1 ^ mi 7* mi ^ M. It follows thateach pair Pm>, Pm" of the partition communicate otherwise sm> — sm» wouldnot have been generated. The indecomposability is thus shown.
We next demonstrate the alternative formulation. If the partition is not indecomposable, then, following (2.3.5), the ψ(r_{ij}) − ψ(r_{i′j′}); (i,j), (i′,j′) ∈ P_m; m = m1, . . . , mN generate only the values t_i − t_{i′}; i, i′ = i1, . . . , iQ. There is no way the values t_i − t_{i′}; i = i1, . . . , iQ; i′ ≠ i1, . . . , iQ can be generated.
On the other hand, if the ψ(r_{ij}) − ψ(r_{i′j′}) generate all the t_i − t_{i′}, then there must be some sequence of the P_m beginning with an i and ending with an i′ and so all sets communicate.

Proof of Theorem 2.3.2 We will proceed by induction on j. From Theorem 2.3.1 we see that
E Y₁ ··· Y_j − Σ′ D_{μ1} ··· D_{μP}    (*)
with Σ′ extending over partitions with P ≥ 2. We see that the terms coming from decomposable partitions are subtracted out in the above expression, yielding the stated result.

Proof of Theorem 2.5.1 We will write A ≥ 0 to mean that the matrix A is non-negative definite. The fact that f_XX(λ) is Hermitian follows from (2.5.7) and c_XX(u)^τ = c_XX(−u).
Next suppose EX(t) = 0, as we may. Consider
where C_ν = cum(X_{a1}, . . . , X_{am}) when ν = (a1, . . . , am) (the a's being pairs of integers) and the sum extends over all partitions of the set {(m,n) | m = 1, . . . , j and n = 1, . . . , k_m}.
From (*) and (**) we see
where D_μ = cum(Y_{a1}, . . . , Y_{am}) when μ = (a1, . . . , am) and the sum extends over all partitions (μ1, . . . , μP) of (1, . . . , j). Also
that is, it is a Cesàro mean of the series for f_XX(λ). By assumption this series is convergent, and so Exercise 1.7.10 implies
and the latter must therefore be ≥ 0.

Proof of Theorem 2.5.2 Suppose EX(t) = 0. Set
By construction I_XX^(T)(λ) ≥ 0 and so therefore E I_XX^(T)(λ) ≥ 0. We have
lies between 0 and c_XX(0). By an extension of Helly's selection theorem this sequence will contain a subsequence converging weakly to a matrix-valued measure at all continuity points of the limit. Suppose the limit of such a convergent subsequence, F_XX^(T)(λ), is F_XX(λ). By approximating the integrals involved by finite sums we can see that
for u = 0, ±1, . . . , ±T. Now the sequence of matrix-valued measures
In addition, from (**) it tends to c_XX(u). This gives (2.5.8). Expression (2.5.9) follows from the usual inversion formula for Fourier-Stieltjes transforms. The increments of F_XX(λ) are ≥ 0 by construction.

Proof of Lemma 2.7.1 We have
using the properties of linearity and time invariance. Setting t = 0 it follows that
and we have (2.7.6) with A(λ) = a[e_λ](0).

Proof of Lemma 2.7.2 The properties indicated are standard results concerning the Fourier transforms of absolutely summable sequences; see Zygmund (1968), for example.

Proof of Lemma 2.7.3 We note that
It follows that
and so Σ_u a(t − u)X(u) is finite with probability 1. The stationarity of Y(t) follows from the fact that the operation is time invariant.
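The defining property used in the proof of Lemma 2.7.1, that a linear time invariant operation carries e^{iλt} into A(λ)e^{iλt} with A(λ) = Σ_u a(u)e^{-iλu}, can be checked directly for a finite summable filter. A minimal sketch (the filter coefficients and frequency are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.standard_normal(7)            # filter coefficients a(0), ..., a(6)
lam = 0.9                             # test frequency

# Complex exponential input X(t) = e^{i lam t}; the filter output
# Y(t) = sum_u a(u) X(t - u) factorizes exactly as A(lam) e^{i lam t}.
t = np.arange(50)
u = np.arange(len(a))
Y = np.array([np.sum(a * np.exp(1j * lam * (tt - u))) for tt in t])

A = np.sum(a * np.exp(-1j * lam * u))     # transfer function A(lam)
assert np.allclose(Y, A * np.exp(1j * lam * t))
```

The factorization is an exact algebraic identity, which is why complex exponentials diagonalize time invariant filters.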
Continuing, we have
completing the proof. •
Proof of Lemma 2.7.4 Set

for some finite K. The latter tends to 0 as T, T′ → ∞ in view of (2.7.23). The sequence Y_T(t), T = 1, 2, . . . is therefore Cauchy and so the limit (2.7.25) exists by completeness.
Proof of Theorem 2.7.1 Set
showing that the operations are indeed linear and time invariant. Now
from which (2.7.34) and (2.7.35) follow.
Proof of Theorem 2.8.1 The series X(t) is r vector-valued, strictly stationary, with cumulant spectra f_{a1···ak}(λ1, . . . , λk). Y(t) = Σ_{u=−∞}^{∞} a(t − u)X(u), where the a(u) are the coefficients of an s × r filter and Σ_{u=−∞}^{∞} |a_ij(u)| < ∞, i = 1, . . . , s, j = 1, . . . , r. From Lemma 2.7.3, Y(t) is also strictly stationary and its cumulant functions exist. Indicate these by
and the interchange of averaging and summation needed in going from (*) to (**) above is justified. Now b_{j1···jk}(v1, . . . , v_{k−1}) is a sum of convolutions of absolutely summable functions and is, therefore, absolutely summable. We see that Y(t) satisfies Assumption 2.6.1. Expression (2.8.1) follows on taking the Fourier transform of the cumulant function of Y(t) and noting that it is a sum of convolutions.

Proof of Lemma 2.9.1 If X(t) is Markov and Gaussian, then X(s + t) − E{X(s + t) | X(s)}, t > 0, s ≥ 0, is independent of X(0). Therefore cov{X(0), [X(s + t) − E{X(s + t) | X(s)}]} = 0 and, since E{X(s + t) | X(s)} = K + X(s)c_XX(t)/c_XX(0), K a constant, we have
In view of the absolute summability of the a_ij(u), a_{j1···jk}(u1, . . . , u_{k−1}) is absolutely summable by Fubini's theorem. Therefore
From this we see that
The proof is completed by noting that c_XX(t) = c_XX(−t).

Proof of Theorem 2.9.1 We begin by noting that under the stated assumptions Y(t) exists, with probability 1, since
Consider
where
The cumulant involving the X's is, from Theorem 2.3.2, the sum over indecomposable partitions of products of joint cumulants in the X(t_i − u_ij), say
we see that the cumulants of the series Y(t) are absolutely summable.
Proof of Theorem 2.10.1 We anticipate our development somewhat in this proof. In Section 4.6 we will see that we can write
Because the series is stationary, the cumulants will be functions of the differences t_i − t_{i′} − u_ij + u_{i′j′}. Following Lemma 2.3.1, J − 1 of the differences t_i − t_J will be independent. Suppose that these are t_1 − t_J, . . . , t_{J−1} − t_J.
Setting t_J = 0 we now see that
where g is absolutely summable as a function of its arguments. Making the change of variables
where
Substituting into (2.9.15) shows that
Now using Theorem 2.3.2 and the expressions set down above gives the desired expression (2.10.10).
PROOFS FOR CHAPTER 3
Proof of Theorem 3.3.1 We quickly see the first relation of (3.3.18). It may be rewritten as
from which the second part of (3.3.18) follows.

Proof of Lemma 3.4.1 We have
where |ε^(T)(λ)| ≤ L Σ_u |a(u)| · |u| for some finite L because the components of X(t) are bounded.

Proof of Theorem 3.5.1 The theorem follows directly from the substitutions j = j1 T2 + j2, t = t1 + t2 T1 and the fact that exp{−i2πk} = 1 for k an integer.

Proof of Theorem 3.5.2 See the proof of Theorem 3.5.3.

Proof of Theorem 3.5.3 We first note that the integers
when reduced mod T run through the integers t, 0 ≤ t ≤ T − 1. We see that there are T1 ··· Tk = T possible values for (*), each of which is an integer. Suppose that two of these, when reduced mod T, are equal, that is
for some integer /. This means
The left side of this equation is not divisible by T1, whereas the right side is. We have a contradiction and so the values (*) are identical with the integers t = 0, . . . , T − 1. The theorem now follows on substituting
and reducing mod T.
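The counting argument just given is the Chinese remainder theorem underlying the prime-factor form of the fast Fourier transform. A quick check for two relatively prime factors (the particular factors 8 and 9 are illustrative):

```python
T1, T2 = 8, 9                  # relatively prime factors, T = T1 * T2
T = T1 * T2

# The integers t1*T2 + t2*T1, reduced mod T, run through 0, ..., T-1 exactly
# once when T1 and T2 are relatively prime: this is the index permutation of
# the argument above.
vals = sorted((t1 * T2 + t2 * T1) % T for t1 in range(T1) for t2 in range(T2))
assert vals == list(range(T))

# With a common factor the map collides: gcd(6, 9) = 3, so fewer than 54 values.
vals_bad = {(t1 * 9 + t2 * 6) % 54 for t1 in range(6) for t2 in range(9)}
assert len(vals_bad) < 54
```

The failure in the non-coprime case is exactly the divisibility contradiction used in the proof.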
has rank at most j + L − 1. By inspection we see that this minimum is achieved by the matrix A of (3.7.19), completing the proof.

Proof of Theorem 3.8.1 See Bochner and Martin (1948) p. 39.
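The Courant-Fischer argument above is, in matrix terms, the statement that the truncated singular value expansion is the best approximation of given rank in spectral norm, with minimized error equal to the next singular value. A numerical spot check (the sizes and rank are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
Z = rng.standard_normal((6, 5))
U, mu, Vt = np.linalg.svd(Z)
j = 2

# Truncated singular value expansion: rank-j minimizer of ||Z - A|| (spectral norm).
A = U[:, :j] @ np.diag(mu[:j]) @ Vt[:j, :]
err = np.linalg.norm(Z - A, 2)
assert np.isclose(err, mu[j])              # error equals the (j+1)th singular value

# No other rank-j candidate does better (spot check against random rank-j matrices).
for _ in range(200):
    B = rng.standard_normal((6, j)) @ rng.standard_normal((j, 5))
    assert np.linalg.norm(Z - B, 2) >= err - 1e-9
```

The second loop cannot fail for any rank-j matrix, random or not, by the extremal characterization of singular values.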
Proof of Lemma 3.6.1 See the proof of Lemma 3.6.2.

Proof of Lemma 3.6.2 If we make the substitution
then expression (3.6.11) becomes
This last gives (3.6.5) in the case r = 2. It is also seen to give (3.6.10) in the case |u_j| ≤ S − T.

Proof of Lemma 3.7.1 The results of this lemma follow directly once the correspondence (3.7.7) has been set up.

Proof of Theorem 3.7.1 See Bellman (1960).

Proof of Theorem 3.7.2 The matrix Z̄^τ Z is non-negative definite and Hermitian. It therefore has latent values μ_j² for some μ_j ≥ 0. Take V to be the associated matrix of latent vectors satisfying Z̄^τ Z V = V D where D = diag{μ_j²}. Let M = diag{μ_j} be s × r. Take U such that U M = Z V. We see that U is unitary and composed of the latent vectors of Z Z̄^τ. The proof is now complete.

Proof of Theorem 3.7.3 Let μ* denote expression (3.7.11) and α* denote expression (3.7.12). We quickly check that Z α* = μ* α*.

Proof of Theorem 3.7.4 Set B = Z − A. By the Courant-Fischer theorem (see Bellman (1960) and Exercise 3.10.16),
where D is any (j − 1) × J matrix and x is any J vector. Therefore
because the matrix
where E(λ) is a spectral family of projection operators on H. This family has the properties E(λ)E(μ) = E(μ)E(λ) = E(min{λ, μ}), E(−π) = 0, E(π) = I, and E(λ) is continuous from the right. Also, for Y1(t), Y2(t) in H, ⟨E(λ)Y1, Y2⟩ is of bounded variation and
Proof of Theorem 3.8.2 The space K⁺(l) is a commutative normed ring; see Gelfand et al (1964). The space 𝔐 of maximal ideals of this ring is homeomorphic with the strip −π < Re λ ≤ π, Im λ ≥ 0, in such a way that if M ∈ 𝔐 and λ are corresponding elements then
These are the functions of K⁺(l). The stated result now follows from Theorem 1, Gelfand et al (1964) p. 82. The result may also be proved directly.

Proof of Theorem 3.8.3 The space V(l) is a commutative normed ring. Its space of maximal ideals is homeomorphic with the interval (−π, π] through the correspondence
for M in the space of maximal ideals. The x(M) are the elements of V(l). The theorem now follows from Theorem 1, Gelfand et al (1964) p. 82.

Proof of Theorem 3.9.1 Consider the space consisting of finite linear combinations of X_j(t + s), j = 1, . . . , r and t = 0, ±1, . . . . An inner product may be introduced into this space by the definition
The space is then a pre-Hilbert space. It may be completed to obtain a Hilbert space H. There is a unitary operator, U, on H such that
Following Stone's theorem (see Riesz and Nagy (1955)), the operator U has a spectral representation
If we define Z_j(λ; s) = E(λ)X_j(s), we see that
in the sense of (3.9.6). Also from (*) above
Bochner's theorem now indicates that G_jk(λ) defined by (3.9.4) is given by ⟨Z_j(λ; s), Z_k(λ; s)⟩. The remaining parts of the theorem follow from the properties of E(λ).

Proof of Theorem 3.9.2 Set
In view of (3.9.11), there exists Z*(X) such that
Now take an equivalent version of Z(X) with the property= X(0). We see that
and
where m_jk(u) is given by (3.9.3). Now, because
the uniqueness theorem for Fourier-Stieltjes transforms gives (3.9.15).
PROOFS FOR CHAPTER 4
Before proving Theorems 4.3.1 and 4.3.2 we first set down some lemmas.
Lemma P4.1 If h_a(u) satisfies Assumption 4.3.1 and if h_a^(T)(t) = h_a(t/T) for a = 1, . . . , r, then
for some finite K.
Proof The expression in question is
for some finite L. Suppose for convenience u_a > 0. (The other cases are handled similarly.) The expression is now
as desired.
Lemma P4.2 The cumulant of interest in Theorems 4.3.1 and 4.3.2 is given by
where S = 2(T - 1) and
for some finite K.
Proof The cumulant has the form
Using Lemma P4.1 this equals
where ε_T has the indicated bound.
Lemma P4.3 Under the condition (4.3.6), ε_T = o(T) as T → ∞.
Proof
Now T⁻¹(|u1| + ··· + |u_{k−1}|) → 0 as T → ∞. Because of (4.3.6) we may now use the dominated convergence theorem to see that T⁻¹|ε_T| → 0 as T → ∞.
Lemma P4.4 Under the condition (4.3.10), ε_T = O(1).
Proof Immediate.

Proof of Theorem 4.3.1 Immediate from Lemmas P4.2, P4.3 and the fact that

Proof of Theorem 4.3.2 Immediate from Lemmas P4.2, P4.4 and the fact that

since (4.3.10) holds.
The following lemma will be needed in the course of the proof of Theorem 4.4.1.
Lemma P4.5 Let Y^(T), T = 1, 2, . . . be a sequence of r vector-valued random variables, with complex components, such that all cumulants of the variate [Y_1^(T), Ȳ_1^(T), . . . , Y_r^(T), Ȳ_r^(T)] exist and tend to the corresponding cumulants of a variate [Y_1, Ȳ_1, . . . , Y_r, Ȳ_r] that is determined by its moments. Then Y^(T) tends in distribution to a variate having components Y_1, . . . , Y_r.
Proof All convergent subsequences of the sequence of cdf's of Y^(T) tend to cdf's with the given moments. By assumption there is only one cdf with these moments and we have the indicated result.
Proof of Theorem 4.4.1 We begin by noting that
We therefore see that the first cumulant of d_a^(T)(λ_j^(T)) behaves in the manner required by the theorem.
Next we note, from Theorem 4.3.1, that
The latter tends to 0 if λ_j^(T) ± λ_k^(T) ≢ 0 (mod 2π). It tends to 2π f_ab(±λ_j) if λ_j^(T) ≡ ±λ_k^(T) (mod 2π). This indicates that the second-order cumulant behavior required by the theorem holds.
Finally, again from Theorem 4.3.1,
This last tends to 0 as T → ∞ if k > 2 because Δ^(T)(·) is O(T).
Putting the above results together, we see that the cumulants of the variates at issue, and the conjugates of those variates, tend to the cumulants of a normal distribution. The conclusion of the theorem now follows from the lemma, since the normal distribution is determined by its moments.
Before proving Theorem 4.4.2 we must state a lemma:
Lemma P4.6 Let h_a(t) satisfy Assumption 4.3.1, a = 1, . . . , r, and let H_a^(T)(λ) be given by (4.3.2). Then if λ ≢ 0 (mod 2π)

for some finite K.
Proof Suppose, for convenience, that h(t) is nonzero only if 0 ≤ t < T. Using Exercise 1.7.13, we see that
if we use the lemma required in the proof of Theorem 4.3.2.
Proof of Theorem 4.4.2 We proceed as in the proof of Theorem 4.4.1.
using Lemma P4.6 and Lemma P4.1.Next from Theorem 4.3.1,
This tends to 0 if λ_j ± λ_k ≢ 0 (mod 2π) following Lemma P4.6. It tends to 2π{∫ h_a(t)h_b(t)dt} f_ab(±λ_j) = 2π H_ab(0) f_ab(±λ_j) if ±λ_j ≡ ±λ_k (mod 2π).
Finally
This tends to 0 for k > 2 as H_a^(T)(λ) = O(T), and the proof of the theorem follows as before.
To prove Theorem 4.5.1, we proceed via a sequence of lemmas. Set
H₂ = ∫ h(t)² dt, σ_T² = var Re d_X^(T)(λ) = ¼ ∫ |H^(T)(λ − α) + H^(T)(−λ − α)|² f_XX(α) dα.
Lemma P4.7 Under the conditions of Theorem 4.5.1, for given λ, ε and a sufficiently small,
Proof From the first expression of the proof of Lemma P4.2, we see
where L = sup |h(u)| and C_k is given by (2.6.7). Therefore
The indicated expression now follows on taking a sufficiently small.
Corollary Under the conditions of Lemma P4.7,
Lemma P4.8 Let λ_r = 2πr/R, r = 0, . . . , R − 1 for some integer R > 6πT. Then

Proof This follows immediately from Lemma 2.1 of Woodroofe and Van Ness (1967); see also Theorem 7.28, Zygmund (1968) Chap. 10.
Lemma P4.9 Under the conditions of Theorem 4.5.1
Proof The indicated expected value is
giving the result because the sum runs over R = exp {log R\ points and
Lemma P4.10 Given ε, δ > 0, let a² = 2(1 + ε)(2 + δ)T(log T)H₂ sup_λ f_XX(λ). Under the conditions of Theorem 4.5.1,
for some K.
Proof The probability is ≤ exp{−aα} · 2 exp{log R}
This last is ≤ K T^{−ε−δ} after the indicated choice of a.
Corollary Under the conditions of Theorem 4.5.1
with probability 1.
Proof From the Borel-Cantelli lemma (see Loeve (1965)) and the fact that ε, δ above are arbitrary.
Proof of Theorem 4.5.1 We can develop a corollary, similar to the last one, for Im d_X^(T)(λ). The theorem then follows from the fact that
Proof of Theorem 4.5.2 We prove Theorem 4.5.3 below. The proof of Theorem 4.5.2 is similar, with the key inequality of the first lemma below replaced by
To prove Theorem 4.5.3, we proceed via a sequence of lemmas.
Lemma P4.11 Suppose h(u) has a uniformly bounded derivative and finite support. Let
Then
Therefore
In absolute value this is
for some
Proof
for some finite M, with L denoting a bound for the derivative of h(u).
Lemma P4.12 For a sufficiently small there is a finite L such that
Proof From the previous lemma
for |a| sufficiently small and some finite L.
Lemma P4.13 Let λ_r = 2πr/R, r = 0, . . . , R − 1 for some integer R > 12πT; then there is a finite N such that
for some K. Now £(r)(A) may be written
The first term here is a trigonometric polynomial of order 2T. From Lemma 2.1 of Woodroofe and Van Ness (1967) we therefore have
The latter and (*) now give the indicated inequality.
Lemma P4.14 Under the conditions of Theorem 4.5.3,
Proof Immediate from Lemma P4.12 and the fact that the sup runs overR points.
Lemma P4.15 Given δ > 0, let a² = 4L(2 + δ) log T / T; then under the conditions of Theorem 4.5.3,
for some finite K.Proof Set R = T log T and
The probability is then
Corollary Under the conditions of Theorem 4.5.3
for some finite K with probability 1.
Proof of Theorem 4.5.3 The result follows from Theorem 4.5.1, the previ-ous corollary and Lemma P4.13.
Proof of Theorem 4.5.4 Exercise 3.10.34(b) gives
Let k be a positive integer. Holder's inequality gives
for some finite K following Exercise 3.10.28. It follows from Theorem 2.3.2 and (*) in the proof of Lemma P4.7 that
for some finite M and so
for some finite N. This gives
As S J1-̂ *.-!) < co for k sufficiently large, we have the result of the theorem.
To prove Theorem 4.6.1, we first indicate a lemma.
Lemma P4.16 Suppose X(t), t = 0, ±1, . . . satisfies Assumption 2.6.1. Let
then
Proof The cumulant may be written as (2ir)~k times
if we substitute for the cumulant function. The limit indicated in the lemma now results once we note that
where η(·) is the periodic extension of the Dirac delta function; see Exercise 2.13.33.
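The behavior driving this limit is easy to see numerically: the kernel Δ^(T)(λ) = Σ_{t=0}^{T−1} e^{−iλt} is of order T on the lattice λ ≡ 0 (mod 2π) and remains bounded off it, so after division by T it picks out the periodic delta lattice. A sketch (the test frequencies are arbitrary):

```python
import numpy as np

T = 512
lams = np.array([0.0, 0.5, 1.0, 2.0])

# Delta^{(T)}(lam) = sum_{t=0}^{T-1} e^{-i lam t}: equals T at lam = 0 (mod 2 pi)
# and is bounded by 1/|sin(lam/2)| otherwise.
D = np.array([np.sum(np.exp(-1j * lam * np.arange(T))) for lam in lams])

assert np.isclose(np.abs(D[0]) / T, 1.0)      # on the lattice: order T
assert np.all(np.abs(D[1:]) / T < 0.02)       # off the lattice: O(1/T) after scaling
```

The off-lattice bound comes from the closed form |Δ^(T)(λ)| = |sin(Tλ/2)/sin(λ/2)|.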
Proof The cumulant is a sum of terms of the form
In view of the lemma above these all tend to ± the same limit. The sum therefore tends to 0.
Corollary
Proof The moment may be written as a sum of cumulants of the form of those appearing in the previous corollary. Each of these cumulants tends to 0, giving the result.
Proof of Theorem 4.6.1 From the last corollary above we see that the sequence Z_a^(T)(λ), T = 1, 2, . . . is a Cauchy sequence in the space L_v for any v > 0. Because this space is complete, the sequence has a limit Z_a(λ) in the space.
To complete the proof we note that expression (4.6.7) follows from Lemma P4.16 above.
Proof of Theorem 4.6.2 Set
From this we see that
Also
then
In a similar manner we may show that
From these last two we see
and so
with probability 1, t = 0, ±1, . . . , giving the desired representation.

Proof of Theorem 4.7.1 We may write
The latter is clearly minimized with respect to B by setting
Now
Following Corollary 3.7.4 this is minimized by setting
in the notation of the theorem. This gives the required result.
PROOFS FOR CHAPTER 5
Proof of Theorem 5.2.1 We have
Expression (5.2.6) now follows after the substitution
Proof of Theorem 5.2.2 Proceeding as in the proof of Theorem 4.3.2, we see that (5.2.7) implies
and (5.2.8) follows from (*) immediately above.Proof of Theorem 5.2.3 We begin by noting that
Next we have
Expression (5.2.17) now follows from the fact that
Proof of Theorem 5.2.4 See the proof of Theorem 5.2.5 given immediately below.

Proof of Theorem 5.2.5 From Theorem 4.3.2 we have

giving the indicated result.
We now set down a result that will be needed in the next proof and other proofs throughout this work.
Theorem P5.1 Let the sequence of r vector-valued random variables X_T, T = 1, 2, . . . tend in distribution to the distribution of a random variable X. Let g : Rʳ → Rˢ be an s vector-valued measurable function whose
discontinuities have X probability 0. Then the sequence of s vector-valued variables g(X_T), T = 1, 2, . . . tends in distribution to the distribution of g(X).
Proof See Mann and Wald (1943a) and Theorem 5.1 of Billingsley (1968).
A related theorem that will also be needed later is
Theorem P5.2 Let the sequence of r vector-valued random variables √T(Y_T − μ), T = 1, 2, . . . tend in distribution to N(0, Σ). Let g : Rʳ → Rˢ be an s vector-valued function differentiable in a neighborhood of μ and having s × r Jacobian matrix J at μ. Then √T(g(Y_T) − g(μ)) tends in distribution to N(0, J Σ J^τ) as T → ∞.
Proof See Mann and Wald (1943) and Rao (1965) p. 321.
Corollary P5.2 (The Real-Valued Case) Let √T(Y_T − μ), T = 1, 2, . . . tend in distribution to N(0, σ²). Let g : R → R have derivative g′ in a neighborhood of μ. Then √T(g(Y_T) − g(μ)) → N(0, [g′(μ)]² σ²) as T → ∞.
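This corollary can be checked by simulation. A minimal sketch, drawing each sample mean directly from its exact normal distribution for speed, with g = log as an arbitrary smooth choice (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, T, reps = 2.0, 1.5, 2000, 20000

# reps independent sample means of T iid(mu, sigma^2) observations;
# by the CLT, Ybar_T ~ N(mu, sigma^2 / T), drawn here exactly for speed.
Ybar = mu + sigma / np.sqrt(T) * rng.standard_normal(reps)

# Delta method: sqrt(T)(g(Ybar) - g(mu)) approx N(0, [g'(mu)]^2 sigma^2)
# with g = log, so g'(mu) = 1/mu.
Z = np.sqrt(T) * (np.log(Ybar) - np.log(mu))
target_sd = sigma / mu

assert abs(Z.mean()) < 0.05
assert abs(Z.std() / target_sd - 1) < 0.05
```

The residual bias of order T^{-1/2} (from the second derivative of g) is well inside the tolerances used.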
Proof of Theorem 5.2.6 Theorem 4.4.1 indicates that Re d_X^(T)(λ_j^(T)), Im d_X^(T)(λ_j^(T)) are asymptotically independent N(0, πT f_XX(λ_j)) variates. It follows from Theorem P5.1 that
is asymptotically f_XX(λ_j) χ₂²/2. The asymptotic independence for different values of j follows in the same manner from the asymptotic independence of the d_X^(T)(λ_j^(T)), j = 1, . . . , J.

Proof of Theorem 5.2.7 This theorem follows from Theorem 4.4.2 as Theorem 5.2.6 followed from Theorem 4.4.1.

Proof of Theorem 5.2.8 From Theorem 4.3.2
The indicated result follows as
Proof of Theorem 5.3.1 This theorem is an immediate consequence of Exercise 4.8.23.

Proof of Theorem 5.3.2 Follows directly from Theorem 4.5.1 and the definition of I_XX^(T)(λ).

Proof of Theorem 5.4.1 This theorem follows directly from expression (5.2.6) of Theorem 5.2.1 and the definitions of A^(T)(λ), B^(T)(λ), C^(T)(λ).
The corollary follows from Theorem 5.2.2.
Proof of Theorem 5.4.2 This follows from Theorem 5.2.4.
Proof of Theorem 5.4.3 Follows from Theorem 5.2.6.
Proof of Theorem 5.5.1 This theorem follows from expression (5.2.6) of Theorem 5.2.1 and the definitions of A^(T)(λ), B^(T)(λ), C^(T)(λ). The corollary follows from Theorem 5.2.2.
Proof of Theorem 5.5.2 From Theorem 5.2.4.
Proof of Theorem 5.5.3 From Theorem 5.2.6 and Theorem P5.1.
The following lemma will be required in the course of the proofs of several theorems.
Lemma P5.1 If a function g(x) has finite total variation, V, on [0,1], then
Proof See Polya and Szego (1925) p. 37; a related reference is Cargo (1966). If g is differentiable, the right side may be replaced by ∫ |g′(x)| dx / n.
Further results are given as Exercises 1.7.14 and 5.13.28.
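Lemma P5.1's bound is easy to check numerically. A sketch for a differentiable g, using the exact integral and total variation of sin 3x on [0, 1] (the choice of g is arbitrary):

```python
import numpy as np

g = lambda x: np.sin(3 * x)
integral = (1 - np.cos(3)) / 3      # exact value of int_0^1 sin(3x) dx
V = 2 - np.sin(3)                   # total variation: rise to 1 at x = pi/6, then fall

# |n^-1 sum_{j=1}^n g(j/n) - int_0^1 g| <= V / n, for each n.
for n in (10, 100, 1000):
    riemann = np.mean(g(np.arange(1, n + 1) / n))
    assert abs(riemann - integral) <= V / n
```

The observed error decays like 1/n with a constant well below V, consistent with the sharper differentiable-case bound.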
Proof of Theorem 5.6.1 The first expression in (5.6.7) follows directly from expression (5.2.8) and the definition (5.6.1).
If we use the lemma above to approximate the sum appearing by an integral, then we see that
giving the final expression in (5.6.7).
Proof of Theorem 5.6.2 Using Theorem 5.2.5, the indicated covariance isgiven by
giving the indicated first expression. The second expression follows from this on replacing the sum by an integral, making use of Lemma P5.1.

Proof of Theorem 5.6.3 See the proof of Theorem 7.4.4.

Proof of Corollary 5.6.3 This follows from Theorem 5.6.3 and Corollary P5.2.
Proof of Theorem 5.8.2 The first expression in (5.8.18) follows directly from the definition of f^(T)(λ) and expression (5.8.9). The second expression of (5.8.18) follows from the first, neglecting terms after the first, and Lemma P5.1.
Proof of Corollary 5.5.2 This follows after we substitute the Taylor expansion
Proof of Theorem 5.6.4 See the proof of Theorem 7.7.1.

Proof of Theorem 5.8.1

in view of (5.8.7) and because |k^(T)(u)| ≤ 1. This in turn equals
giving the desired result because
into the second expression of (5.8.18).
Proof of Theorem 5.9.1 We write X′ for X − c_X^(T) below. Now

The indicated result now follows as
giving
and finally
Proof of Theorem 5.9.2 Follows directly from Theorem 5.3.1.
Proof of Theorem 5.10.1 From Theorem 5.2.2
for s ≢ 0 (mod T) and s an integer. This gives the first part of (5.10.12). The second part follows from Lemma P5.1.
Continuing, from Theorem 4.3.2
Taking note of the linear restrictions introduced by the Δ^(T) functions, we see that the dominant term in this cumulant is of order T^{−L+1}.
Now, when the variates T^{1/2} J^(T)(λ_j), j = 1, . . . , J, are considered, we see that their joint cumulants of order greater than 2 all tend to 0. It follows that these variates are asymptotically normal.

Proof of Theorem 5.10.2 We proceed as in the proof of Theorem 5.9.1.

Proof of Theorem 5.11.1 In order to avoid cumbersome algebraic detail, we present a proof only in the case J = 1. The general J case follows in a similar manner.
The model is
the inner sum being over all indecomposable partitions of the table
giving expression (5.10.13).
Turning to the higher order cumulants we have
and the least squares estimate
Because Eε(t) = 0, we see from the latter expression that E θ̂^(T) = θ. Also
It follows by the bounded convergence criterion that
At the same time
and so
as indicated in (5.11.20). In the case of higher order cumulants we see
in view of the second condition of Assumption 5.11.1. It follows that
as T → ∞ for L > 2 and so θ̂^(T) is asymptotically normal, as indicated in the statement of the theorem.
We next consider the statistical behavior of f_εε^(T)(λ). As
we have
Now
showing that the asymptotic distribution of f_εε^(T)(λ) is the same as that of the estimate based on the true errors given in Theorem 5.6.3. (o_p(1) denotes a variate tending to 0 in probability.)
The asymptotic independence of θ̂^(T) and f_εε^(T)(λ₁), . . . , f_εε^(T)(λ_K) follows from a consideration of joint asymptotic cumulants.
PROOFS FOR CHAPTER 6
Proofs of Theorems 6.2.1 and 6.2.2 These are classical results. Proofs may be found in Chapter 19, Kendall and Stuart (1961), for example.

Proofs of Theorems 6.2.3 and 6.2.4 These results follow from Theorems 6.2.1 and 6.2.2 when we rewrite (6.2.7) in the form
This is a model of the form considered in those theorems.
for some finite N. Therefore
for some finite N' and so
giving (5.11.21) from Theorem 5.6.1. It also follows from these inequalitiesthat
for some finite M, M' while
Proof of Theorem 6.2.5 Follows directly from the properties of θ̂ indicated in Theorem 6.2.4.

Proof of Lemma 6.3.1 We have

where, because the components of X(t) are bounded, |ε^(T)(β)| ≤ 4M² Σ_u |a(u)|. The last part follows directly.
In the proofs below we will require the following lemma.
Lemma P6.1 Given a 1 X M matrix P and an r X M matrix Q we have
Proof We begin by noting the matrix form of Schwarz's inequality
(This follows from the minimum achieved in Theorem 6.2.1.) This implies
Now
and the result follows.
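The matrix Schwarz inequality of Lemma P6.1 is the statement that projecting a row vector onto the row space of Q cannot increase its squared length. A numerical spot check with arbitrary complex matrices (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
M, r = 12, 4
P = rng.standard_normal((1, M)) + 1j * rng.standard_normal((1, M))
Q = rng.standard_normal((r, M)) + 1j * rng.standard_normal((r, M))

# P Qbar^T (Q Qbar^T)^{-1} Q Pbar^T <= P Pbar^T: the left side is the squared
# length of the projection of P onto the row space of Q.
lhs = (P @ Q.conj().T @ np.linalg.inv(Q @ Q.conj().T) @ Q @ P.conj().T).real.item()
rhs = (P @ P.conj().T).real.item()
assert 0 <= lhs <= rhs + 1e-12
```

Equality holds exactly when P lies in the row space of Q, which is the minimum-achieving configuration invoked in the proof.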
Proof of Theorem 6.4.1 Because Eε(t) = 0, we have E A^(T)(λ) = A(λ) and

where, following Lemma P6.1,
from which (6.4.9) follows.
Before proving Theorem 6.4.2 we state a lemma which is a slight extension of a result of Billingsley (1966).
Lemma P6.2 Let Z^(T) be a sequence of q vectors tending in distribution to N_q^C(0, I) as T → ∞. Let U^(T) be a sequence of q × q unitary matrices. Then U^(T) Z^(T) also tends in distribution to N_q^C(0, I).
Proof Consider any subsequence of Z^(T), say Z^(T′). Because the group of unitary matrices is compact (see Weyl (1946)), U^(T′) has a convergent subsequence, say U^(T″), tending to U. Now, by Theorem P5.1, U^(T″) Z^(T″) tends in distribution to U N_q^C(0, I) = N_q^C(0, I). Therefore any subsequence of U^(T) Z^(T) has a subsequence tending in distribution to N_q^C(0, I) and so U^(T) Z^(T) must tend to N_q^C(0, I).
Proof of Theorem 6.4.2 Consider λ of Case A to begin. From Lemma 6.3.1
Let U^(T) indicate a (2m + 1) × (2m + 1) unitary matrix whose first r columns are the matrix U₁^(T) = D̄_X^τ [D_X D̄_X^τ]^{−1/2}. Write U^(T) = [U₁^(T) U₂^(T)]. Applying U^(T) to the matrix equation above gives
The first r columns of the latter give
The remaining columns give
Because U^(T) is unitary we have
s = 0, ±1, . . . , ±m, where the error term O(1) is uniform in s because A(λ) has a uniformly bounded first derivative and ‖d_X^(T)(α)‖ = O(T). (The equations above may be compared with (6.3.7).) Now let D_Y denote the 1 × (2m + 1) matrix whose columns are the values (2πT)^{−1/2} d_Y^(T)(2π[s(T) + s]/T), s = 0, ±1, . . . , ±m, with similar definitions for D_X and D_ε. The equations above now take the form
and so
where O_p(1) denotes a variate bounded in probability.
Now Theorem 4.4.1 applies, indicating that because the series ε(t), t = 0, ±1, . . . satisfies Assumption 2.6.1, D_ε tends to N_{2m+1}^C(0, f_εε(λ)I). Therefore f_εε(λ)^{−1/2} D_ε tends to N_{2m+1}^C(0, I). Lemma P6.2 applies, indicating that f_εε(λ)^{−1/2} (D_ε U^(T))^τ also tends to N_{2m+1}^C(0, I) and so (D_ε U^(T))^τ tends to N_{2m+1}^C(0, f_εε(λ)I). The indicated asymptotic behavior of A^(T)(λ) and g_εε^(T)(λ) now follows from the representations obtained for them above.
If λ is of Case B or Case C, then the above form of argument goes through with the unitary matrix replaced by an orthogonal one. The behavior of μ^(T) follows from its dependence on A^(T)(0).
We need the following lemma.
Lemma P6.3 (Skorokhod (1956)) Let V^(T), T = 1, 2, . . . be a sequence of vector-valued random variables tending in distribution to a random variable V. Then, moving to an equivalent probability structure, we may write

where Z is N_q^C(0, I) and so

and U^(T) Z is N_q^C(0, I) for all T.
Proof of Theorem 6.4.3 The last lemma shows that we may write
where the ζ_s are independent N₁^C(0, 2πT f_εε(λ)) variates. Let λ_s = 2π[s(T) + s]/T. We may make the substitution d_Y^(T)(λ_s) = A(λ)d_X^(T)(λ_s) + ζ_s + o_{a.s.}(√T). We have the sum of squares identity,
The terms appearing are quadratic forms of ranks 2m + 1, r, 2m + I — rrespectively in the {•„ plus terms tending to 0 with probability 1. Exercise4.8.7 applies to indicate that the first term on the right here may be written
This lemma provides us with another proof of Lemma P6.2. We may write
424 PROOFS OF THEOREMS
f_εε(λ) χ²_{2r}(A(λ) f_XX^(T)(λ) Ā(λ)^τ / f_εε(λ))/2 + o_a.s.(1), while the second term may be written f_εε(λ) χ²_{2(2m+1−r)}/2 + o_a.s.(1) with the χ² variates independent. Expression (6.4.12) now follows by elementary algebra.
Proof of Theorem 6.5.1 Let R(t) = Σ_u a(t − u)X(u). Now, because Eε(t) = 0, we have
Let d_R^(T)(β) = A(β) d_X^(T)(β) + ε^(T)(β). From Lemma 6.3.1, ε^(T)(β) is uniformly bounded. By substitution we therefore have
where, following Lemma P6.1,
for some finite K, where we use the facts that ε^(T)(β) is bounded and that W(β) is non-negative. The first part of (6.5.14) now follows from Assumption 6.5.2. Turning to the second part: suppose 0 ≤ λ < 2π. The region in which W^(T) is nonzero is |λ − (2πs/T)| ≤ B_T π. In this region, A(2πs/T) = A(λ) + O(B_T) because under the given assumptions A(β) has a uniformly bounded first derivative. The proof of the theorem is now completed by the substitution of this last into the first expression of (6.5.14).
Proof of Theorem 6.5.2 To begin we note that
giving the first part of (6.5.19). The second part follows from (6.5.14) and the fact that |a + ε| = |a| + O(ε).
To prove the first parts of (6.5.20) and (6.5.21) we use the Taylor series expansions
from (6.6.3), and so
Because the series ε(t), t = 0, ±1, ... is unobservable, these variates are unobservable. However, we will see that the statistics of interest are elementary functions of these variates. Continuing to set up a notation, let [d_X^(T)(λ)]_k denote the kth entry of d_X^(T)(λ), with a similar notation for the entries of f_XX^(T)(λ). We have
Lemma P6.4 If f_XX^(T)(λ) is uniformly bounded, then
PROOFS FOR CHAPTER 6 425
taking ζ + ε = A_j^(T)(λ), ζ = EA_j^(T)(λ) and using (6.6.3). To prove the second parts, we again use these expansions; however, this time with ζ + ε = EA_j^(T)(λ), ζ = A_j(λ) and using (6.5.14).
Before developing the remaining proofs of this section we must first set down some notation and prove some lemmas. We define
The error term is uniform in λ.
Proof We have
from which the result follows.
Lemma P6.5 If f_XX^(T)(λ) is uniformly bounded, then
The error term is uniform in λ, μ.
426 PROOFS OF THEOREMS
Proof By virtue of Schwarz's inequality the absolute value of the expression at issue is
giving the desired result.
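The bound just obtained rests on the Cauchy–Schwarz inequality in the form (a standard statement; the functions g, h are placeholders, not the book's notation):

```latex
\Bigl|\sum_{s} g(s)\,\overline{h(s)}\Bigr|^{2}
\;\le\;
\Bigl(\sum_{s} |g(s)|^{2}\Bigr)\Bigl(\sum_{s} |h(s)|^{2}\Bigr).
```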
Lemma P6.6 Under the conditions of Theorem 6.5.1,
Proof We begin by noting that
if we use Lemma 6.3.1. Because
The cumulant in question is given by
The cumulant appearing in the last expression has principal term
where Σ_j p_j = 2N and t_1, ..., t_2N is a permutation of (s_1, −s_1, ..., s_N, −s_N) corresponding to an indecomposable partition. We have p ≤ n. We now use Lemmas P6.4, P6.5 to eliminate the summations on q and r and see that the principal term in A is
giving the indicated result for L + M > 1. The other expressions follow in a similar manner.
These estimates of the order of the joint cumulants are sufficient for certain purposes; however, they are too crude in the second-order case. In greater detail in that case we have,
Lemma P6.7 Under the conditions of Theorem 6.5.1,
Proof Consider the second of these expressions. Following the first expression of the proof of Lemma P6.4, the required covariance is
The other covariances also follow from the expression of Lemma P6.4, the fact that f_εε(λ) has a uniformly bounded derivative and the fact that the support of W^(T)(α) is |α| ≤ B_T π.
In the lemma below we let C_T = B_T + T^{-1/2}.
Lemma P6.8 Let R(t) = Σ_u a(t − u)X(u). Under the assumptions of Theorem 6.5.1,
Proof We derived the first expression in the course of the proof of (6.5.14). The second is immediate. For the third we note
from which the indicated result follows. For the next
for finite K and L following Assumption 6.5.2. For the final statement we note that
and the result follows from the earlier expressions of the lemma.
Proof of Theorem 6.5.3 From Lemma P6.8, we see that
from Lemma P6.7. From Theorem 5.6.1, we see that
and we have the indicated result.
Proof of Theorem 6.5.4 From (6.3.2) and Lemma 6.3.1 we see
This gives
Therefore
using (6.5.14). The result now follows because, under the indicated boundedness of X(t), t = 0, ±1, ..., c_X^(T) is uniformly bounded.
Proof of Theorem 6.6.1 Directly from the definition of A^(T)(λ), we see that
and (6.6.3) follows from the first expression of Lemma P6.7.
Proof of Theorem 6.6.2 As in the proof of Theorem 6.5.2, we have the Taylor expansions
The desired covariances now follow from these expansions and (6.6.3).
Proof of Theorem 6.6.3 From Lemma P6.8 we see
From Lemma P6.6 we see that the remainder term is
The indicated result now follows from (5.6.12).
Proof of Theorem 6.6.4 From (6.3.2) and Lemma 6.3.1 we see
Expression (6.6.13) now follows from Theorem 4.3.1.
Proof of Theorem 6.6.5 The first covariance required is
if we use the representation of Lemma P6.8 and Lemma P6.6. The second covariance follows from the representation of μ^(T) given in the proof of Theorem 6.6.4 and from Lemmas P6.6 and P6.8. The final covariance follows likewise.
Proof of Theorem 6.7.1 We prove the first part of this theorem by evaluating joint cumulants of order greater than 2 of A^(T)(μ), g_εε^(T)(ν), and proving that, when appropriately standardized, these joint cumulants tend to 0. From Lemma P6.6 we see that
and these each tend to 0 as T → ∞. The second part of the theorem follows similarly by evaluating joint cumulants.
Proof of Theorem 6.8.1 From (6.8.2) we see that
where the error terms are uniform. This gives the first part of (6.8.4); the second part follows algebraically.
Proof of Theorem 6.8.2 We begin by examining (6.6.3) and noting that
in this case. Now from (6.8.2) we see that
for p ≠ q, 1 ≤ p, q ≤ P_T − 1 because B_T ≤ P_T^{-1} and so
from (6.6.3), giving (6.8.7).
Proof of Theorem 6.8.3 This follows as did the proof of Theorem 6.8.2; however, we use (6.6.14) rather than (6.6.3).
Proof of Theorem 6.8.4 We prove that the standardized joint cumulants of order greater than 2 of the variates of the theorem tend to 0. We have
where we use Lemma P6.6 and also the remark at the end of its proof to eliminate one of the summations on p. The cumulant is seen to tend to 0 because P_T B_T → 0 as T → ∞.
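Several proofs in this appendix conclude by the method of cumulants (Lemma P4.5). The underlying principle may be sketched as follows (a standard statement; the variates Z_j^(T) are illustrative notation, not the book's):

```latex
% If the means and covariances of the Z_j^(T) converge while all joint
% cumulants of fixed order greater than 2 vanish asymptotically,
\operatorname{cum}\bigl(Z^{(T)}_{j_1},\ldots,Z^{(T)}_{j_k}\bigr)\;\longrightarrow\;0
\qquad\text{for every fixed } k>2,
% then (Z_1^{(T)},\ldots,Z_J^{(T)}) is asymptotically normal with the
% limiting mean and covariance structure.
```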
PROOFS FOR CHAPTER 7
Proof of Theorem 7.2.1
and so
We also have
from the Cramér representation. It follows from this that
Finally, from Parseval's formula
and we have (7.2.7).
Proof of Corollary 7.2.1 Suppose h_a(u) = 0 for u < 0. (The general case follows by writing h_a as a function vanishing for u < 0 plus a function vanishing for u ≥ 0.) Now
using the Abel transformation of Exercise 1.7.13. If V_a denotes the variation of h_a(u) we see that
At the same time
(*) and (**) show that the term in c_a c_b tends to 0 as T → ∞ if λ ≢ 0 (mod 2π) or if c_a or c_b = 0. Next consider
We split the region of integration here into the regions |α| < δ and |α| ≥ δ. In the first region, |f_ab(λ − α) − f_ab(λ)| may be made arbitrarily small by choice of δ as f_ab is continuous. Also there
In the second region, f_ab(λ − α) − f_ab(λ) is bounded and
from (*). It therefore follows from (**) that (***) tends to 0 as T → ∞.
Proof of Theorem 7.2.2 From Theorem 4.3.2
This gives the required result once we note that H_ab^(T)(λ) = O(T).
Proof of Corollary 7.2.2 We simply consider in turn the cases λ ± μ ≡ 0 (mod 2π) and λ ± μ ≢ 0 (mod 2π).
Proof of Theorem 7.2.3 Theorem 4.4.2 indicates that d_X^(T)(λ_1), ..., d_X^(T)(λ_J) are asymptotically independent N_r^C(0, 2πT[H_ab(0) f_ab(λ_j)]) variates. Theorem P5.1 now indicates that
j = 1, ..., J are asymptotically independent W_r^C(1, f_XX(λ_j)) variates. The conclusions of the theorem now follow as
Proof of Theorem 7.2.4 This follows from Theorem 4.4.1 as Theorem 7.2.3 followed from Theorem 4.4.2.
Proof of Theorem 7.2.5 This follows directly from Exercise 4.8.23 and Theorem P5.1.
Proof of Theorem 7.3.1 From Exercise 7.10.21,
for r an integer ≢ 0 (mod T). If λ ≢ 0 (mod π), this gives
Ef_XX^(T)(λ)
for (2πr/T) − λ = O(T^{-1}) gives (7.3.13) in the case λ ≢ 0 (mod π) as 2m + 1 terms of the estimates match up, while the other terms have covariance O(T^{-1}). Turning to the case λ ≡ 0 (mod π), from Exercise 7.10.22(b) and the fact that m terms match up, the covariance is given by
giving (7.3.6). If λ ≡ 0 (mod 2π) or λ = ±π, ±3π, ... with T even, then
giving (7.3.7). If λ = ±π, ±3π, ... with T odd, then
giving (7.3.8).
Proof of Corollary 7.3.1 As f_XX(α) is a uniformly continuous function of α, expression (*) of the above proof tends to f_XX(λ) as T → ∞ if 2πr/T → λ. This gives the indicated result.
Proof of Theorem 7.3.2 If r, s are integers with 2πr/T, 2πs/T ≢ 0 (mod 2π), Exercise 7.10.22(a) gives
This together with the fact that
and we check that this can be written in the manner (7.3.13).
Proof of Theorem 7.3.3 This theorem follows directly from Theorem 7.2.4 and Theorem P5.1.
Proof of Theorem 7.3.4 This follows directly from Theorem 7.2.1 and its corollary.
Proof of Theorem 7.3.5 The pseudo tapers
for l, m = 0, ..., L − 1. The general expression of the proof of Theorem 7.2.2, with appropriate redefinition, now shows that
This now gives (7.3.18).
Proof of Theorem 7.3.6 Follows directly from Theorem 7.2.5 and Theorem P5.1.
The second term on the right side here may be made arbitrarily small by splitting the range of summation into a segment where |(2πs/T) − λ| < δ implies |f_ab(2πs/T) − f_ab(λ)| < ε and a remainder where Σ W^(T)(λ − (2πs/T)) tends to 0 and |f_ab(2πs/T) − f_ab(λ)| is bounded. This completes the proof of (7.4.9).
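The splitting argument just described may be sketched as the bound (an illustrative form, with W^(T) and f_ab as in the surrounding proof):

```latex
\Bigl|\sum_{s} W^{(T)}\!\Bigl(\lambda-\frac{2\pi s}{T}\Bigr)
\Bigl[f_{ab}\Bigl(\frac{2\pi s}{T}\Bigr)-f_{ab}(\lambda)\Bigr]\Bigr|
\;\le\;
\varepsilon\sum_{s}\Bigl|W^{(T)}\!\Bigl(\lambda-\frac{2\pi s}{T}\Bigr)\Bigr|
\;+\;
2\,\sup_{\alpha}\bigl|f_{ab}(\alpha)\bigr|
\sum_{|2\pi s/T-\lambda|\ge\delta}\Bigl|W^{(T)}\!\Bigl(\lambda-\frac{2\pi s}{T}\Bigr)\Bigr| .
```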
have the property
Proof of Theorem 7.4.1 From Theorem 7.2.1
for s = 1, ..., T − 1. This gives the first part of (7.4.9). Beginning the proof of the second part, the right side of the above expression has the form
where o(1) is uniform in s, from Exercise 1.7.10. Using this we see
Turning to (7.4.11), from Theorem 4.3.2
with the error term uniform in s. This gives the first part of (7.4.11). The second follows from Lemma P5.1.
Proof of Theorem 7.4.2 By Taylor series expansion of f_ab(λ − B_T α) as a function of α.
Proof of Theorem 7.4.3 From expression (7.2.14)
r, s = 1, ..., T − 1 with the error term uniform in r, s. This gives
giving the first part of (7.4.15). The second part follows from Lemma P5.1.
Proof of Corollary 7.4.3 This follows directly from the final part of expression (7.4.15).
Proof of Theorem 7.4.4 We have already investigated the asymptotic first- and second-order moment structure of the estimates. We will complete the proof of asymptotic joint normality by showing that all standardized joint cumulants of order greater than 2 tend to 0 as T → ∞ under the indicated conditions.
We have
In the discussion below set r_k1 = s_k, r_k2 = −s_k, k = 1, ..., K. Also neglect the subscripts a_1, ..., a_K, b_1, ..., b_K as they play no essential role. From Theorems 2.3.2 and 4.3.2 it follows that the cumulant in this last expression is given by
and n_t denotes the number of elements in ν_t. The cumulant (*) therefore has the form
where the summation extends over all indecomposable partitions ν = {ν_1, ..., ν_P} of the table
The effect of the Δ^(T) functions is to introduce q linear restraints if q < K and q − 1 if q = K. We write this number as q − [q/K]. (Here [·] denotes "integral part.") It follows that (*) is of order
It follows that
is of order B_T^{−K/2+1} T^{−K/2+1} and so tends to 0 as T → ∞ for K > 2. The desired result now follows from Lemma P4.5.
Proof of Theorem 7.6.1 From Theorem 4.3.1
with the error term uniform in s. This gives the first part of (7.6.6) directly. The second part follows from Lemma P5.1. Continuing, from Theorem 4.3.2
the inner sum being over all indecomposable partitions of the table
giving expression (7.6.7).
Turning to the higher order cumulants we have, neglecting subscripts,
Taking note of the linear restrictions introduced by the Δ^(T) functions we see that the dominant term in this cumulant is of order T^{−L+1}.
Now, when the variates T^{1/2} f_ab^(T)(A_j), j = 1, ..., J, a, b = 1, ..., r are
considered, we see that their joint cumulants of order greater than 2 all tend to 0. It now follows from Lemma P4.5 that the variates are asymptotically normal as indicated.
Proof of Theorem 7.6.2 We use the Taylor series expansion
to derive (7.6.15) and (7.6.16) from (7.4.13) and (7.4.17) using theorems of Brillinger and Tukey (1964). The indicated asymptotic normality follows from Theorem 7.4.4 and Theorem P5.2.
Proof of Theorem 7.6.3 We have already seen in Theorem 7.6.1 that the finite dimensional distributions converge as required. We also see that
uniformly in λ and so it is enough to consider the process Y^(T)(λ) = √T [F_XX^(T)(λ) − EF_XX^(T)(λ)], 0 ≤ λ ≤ π. We have therefore to show that the sequence of probability measures is tight. It follows from Problem 6, p. 41 of Billingsley (1968) that we need show tightness only for the marginal probability distributions. Following Theorem 15.6 of Billingsley (1968) this will be the case if
We see directly that
From the proof of Theorem 7.6.1 we see that all the second-order moments of the variates Y_ab^(T)(λ) − Y_ab^(T)(λ_1), Y_ab^(T)(λ_2) − Y_ab^(T)(λ) and their conjugates are ≤ L|λ_2 − λ_1| for some finite L. We have therefore only to consider
these domains of summation are disjoint, the cumulants on the right side are of reduced order; in fact expression (*) is seen to be
when EU, EV = 0. This gives the desired result.
Before proving Theorem 7.7.1 we remark that, as the estimate is translation invariant, we may act as if EX(t) = 0. We set down some lemmas showing that mean correction has no asymptotic effect in the case that EX(t) = 0.
Lemma P7.1 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1) and having mean 0. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1, a = 1, ..., r. Let c_ab^(T)(u) be given by (7.7.8) and
Then
uniformly in u.
Now from Theorems 5.2.3 and 5.2.8, as c_a = 0,
Also from the arguments of those theorems
uniformly in u. It follows that
giving the desired result.
Lemma P7.2 Suppose the conditions of the theorem are satisfied. Suppose EX(t) = 0 and
then
uniformly in λ.
Proof This follows directly from Lemma P7.1 and the fact that
Proof of Theorem 7.7.1 Lemma P7.2 shows that the asymptotics of f_ab^(T)(λ)
are essentially the same as those of g_ab^(T)(λ). We begin by considering Eg_ab^(T)(λ). Now
where
From Theorem 4.3.2
and so
giving (7.7.13). Next, from Theorem 7.2.2
We next show that
uniformly in α. As
we may write (**) as
where from Lemma P4.1
for some finite H. A similar result holds for the second term of the integral. The covariance being evaluated thus has the form
and the desired (7.7.14) follows.
Finally, we consider the magnitude of the joint cumulants of order K. We neglect the subscripts a, b henceforth. We have
where the summation is over all indecomposable partitions ν = (ν_1, ..., ν_P) of the table
As the partition is indecomposable, in each set ν_p of the partition we may find an element t_p*, so that none of t_j − t_p*, j ∈ ν_p, p = 1, ..., P is a t_{2l−1} − t_{2l}, l = 1, 2, ..., L. Define 2L − P new variables u_1, ..., u_{2L−P} as the nonzero t_j − t_p*. The cumulant (*) is now bounded by
In the next to last expression, C_n is given by (2.6.7) and n_j denotes the number of elements in the jth set of the partition ν. We see that the standardized joint cumulant
cum {(B_T T)^{1/2} g^(T)(λ_1), ..., (B_T T)^{1/2} g^(T)(λ_L)}
for L > 2, tends to 0 as T → ∞. This means that the variates g_ab^(T)(λ_1), ..., g_ab^(T)(λ_L) are asymptotically normal with the moment structure of the
for some finite M, where α_1, ..., α_2L are selected from 1, ..., 2L and β_1, ..., β_2L are selected from 1, ..., P. Defining φ(t_j) = t_p*, j ∈ ν_p, we apply Lemma 2.3.1 to see that there are P − 1 linearly independent differences among the t_{β_1}* − t_{β_2}*, ..., t_{β_{2L−1}}* − t_{β_{2L}}*. Making a final change of variables
we see that the cumulant (*) is bounded by
theorem. From Lemma P7.2 the same is true of the f^(T) and we have the theorem.
Proof of Corollary 7.7.1 Immediate from (7.7.13).
Proof of Theorem 7.7.2 Follows directly from Theorem 4.5.1.
Proof of Theorem 7.7.3 We prove this theorem by means of a sequence of lemmas paralleling those used in the proof of Theorem 4.5.1. Following Lemma P7.2 it is enough to consider g_ab^(T)(λ) corresponding to the 0 mean case. In the lemmas below we use the notation
Lemma P7.3 Under the conditions of Theorem 7.7.3, for given λ, ε and α sufficiently small
Proof In the course of the proof of Theorem 7.7.1 we saw that
for some finite M. Therefore
The indicated expression now follows from (7.7.21) on taking |α| sufficiently small and the fact that, from (7.7.14),
In the discussion below let
Corollary Under the conditions of Theorem 7.7.3, for given β
for T sufficiently large.
Lemma P7.4 Let λ_r = 2πr/R, r = 0, ..., R − 1 for some integer R, then
for some finite K.
From Exercise 3.10.28 the final integral here is O(a^{1/2k}). From the proof of Theorem 7.7.1, E|f_ab^(T)(α) − Ef_ab^(T)(α)|^{2k} = O(B_T^{−k} T^{−k}). This gives E[B_T^{1/2} T^{1/2} sup_λ |f_ab^(T)(λ) − Ef_ab^(T)(λ)|]^{2k} = O(B_T^{−1}). Taking k sufficiently large gives the two results of the theorem.
Proof of Theorem 7.7.5 We have
Proof We first note that because w(u) is 0 for sufficiently large |u|, g_ab^(T)(λ) is an entire function of order ≤ K B_T^{−1}. The inequality of Lemma P7.4 now follows in the manner of Corollary 2.1 in Woodroofe and Van Ness (1967) using Bernstein's inequality for entire functions of finite order (see Timan (1963)).
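Bernstein's inequality, in the form relevant to a trigonometric polynomial g of degree n, may be stated as (a standard result; see Timan (1963)):

```latex
\sup_{\lambda}\,\bigl|g'(\lambda)\bigr|\;\le\;n\,\sup_{\lambda}\,\bigl|g(\lambda)\bigr| .
```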
Lemma P7.5 For T sufficiently large
for some finite N.
The proof of the theorem is now completed by developing similar lemmas for Im g_ab^(T)(λ) and applying the Borel–Cantelli lemma.
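The Borel–Cantelli lemma enters in its usual form (a standard statement; A_T denotes the exceedance event of interest):

```latex
\sum_{T=1}^{\infty} P(A_T) \;<\; \infty
\quad\Longrightarrow\quad
P\bigl(A_T \text{ occurs for infinitely many } T\bigr) \;=\; 0 .
```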
Proof of Theorem 7.7.4 Suppose w_ab(u) = 0 for |u| > 1. Then f_ab^(T)(λ) is a trigonometric polynomial of degree B_T^{−1} = n. Exercise 3.10.35(b) gives
for positive integers k. From the proof of Theorem 7.4.4
uniformly in λ. This gives
PROOFS FOR CHAPTER 8
Proof of Theorem 8.2.1 We may write
Taking k sufficiently large gives the two results of the theorem.
Proof of Theorem 7.9.1 From Lemma P6.3 and Theorem 4.4.2 we may write
where η is N_1^C(0, f_αα(λ)), θ_j, j = 1, ..., J are independent N_1^C(0, f_ββ(λ)) and ζ_jk, j = 1, ..., J, k = 1, ..., K are independent N_1^C(0, f_εε(λ)). It follows that
By evaluating covariances, we see that the ζ_jk − ζ_j·, the ζ_j· − ζ·· + θ_j − θ·, and the ζ·· + θ· + η are statistically independent. This implies that the statistics of the theorem are asymptotically independent.
We have the identity
Exercise 4.8.7 applies, indicating that Σ |ζ_jk − ζ_j·|² is distributed as f_εε(λ) χ²_{2J(K−1)}/2. We also have the identity
and Exercise 4.8.7 again applies to indicate that Σ |ζ_j· − ζ·· + θ_j − θ·|² is distributed as [f_ββ(λ) + K^{−1} f_εε(λ)] χ²_{2(J−1)}/2. Finally, |ζ·· + θ· + η|² is distributed as [f_αα(λ) + J^{−1} f_ββ(λ) + J^{−1} K^{−1} f_εε(λ)] χ²_2/2. This completes the proof of the theorem.
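The identities appealed to are the usual two-way decomposition of the ζ_jk; with dot subscripts denoting the indicated averages they may be sketched as (a reconstruction of the standard form, not the book's own display):

```latex
\sum_{j,k} |\zeta_{jk}|^{2}
= \sum_{j,k} |\zeta_{jk}-\zeta_{j\cdot}|^{2}
+ K\sum_{j} |\zeta_{j\cdot}-\zeta_{\cdot\cdot}|^{2}
+ JK\,|\zeta_{\cdot\cdot}|^{2},
\qquad
\zeta_{j\cdot}=K^{-1}\sum_{k}\zeta_{jk},\quad
\zeta_{\cdot\cdot}=(JK)^{-1}\sum_{j,k}\zeta_{jk}.
```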
with equality achieved by the choices (8.2.14) and (8.2.15).
Before proving Theorem 8.2.2, we state a lemma of independent interest.
Lemma P8.1 Suppose the conditions of Theorem 8.2.1 are satisfied. The s vector-valued function φ(X), with Eφ(X)^τ φ(X) < ∞, minimizing
is given by the conditional expected value
Proof We may write (*) as
with equality achieved by the indicated φ(X).
Proof of Theorem 8.2.2 If the variate (8.2.10) is normal, it is a classical result, given in Anderson (1957) for example, that
and the theorem follows from Lemma P8.1.
Proof of Theorem 8.2.3 We prove Theorem 8.2.5 below; this theorem follows in a similar manner.
Proof of Theorem 8.2.4 This follows directly as did the proof of Theorem 8.2.1.
Proof of Theorem 8.2.5 Let x, y denote the matrices (8.2.25) and (8.2.26) respectively. We may write
with a = Σ_YX Σ_XX^{−1} and e = y − ax. The columns of e are independent N_s^C(0, Σ_εε) variates. Also e is independent of x.
For fixed x it therefore follows from Exercise 6.12.20 that vec(â − a) is distributed as N_rs^C(0, Σ_εε ⊗ (x x^τ)^{−1}) and Σ̂_εε is independently (n − r)^{−1} W_s^C(n − r, Σ_εε). For fixed x the distribution of (8.2.53) is therefore the one indicated. As this distribution does not depend on x, it is also the unconditional distribution. Next E{â | x} = a and so Eâ = a as desired. Also
As E(x x^τ)^{−1} = (n − r)^{−1} Σ_XX^{−1} (see Exercise 8.16.47) and cov{E{â | x}} = 0 (since E{â | x} = a), we have (8.2.54). The asymptotic normality of â follows from the joint asymptotic normality of the entries of y x^τ and x x^τ and the fact that â is a differentiable function of those entries, using Theorem P5.2.
It remains to demonstrate the independence of â and Σ̂_εε. In terms of probability density functions we may write
from the conditional independence indicated above. It follows that
and the proof is completed.
Proof of Theorem 8.3.1 Let A(λ) be the transfer function of {a(u)}. We shall see that it is well defined. We may write expression (8.3.2) in the form
with equality under the choices (8.3.3) and (8.3.5). The fact that A(λ) given by (8.3.5) is the Fourier transform of an absolutely summable function follows from Theorem 3.8.3 and the fact that f_XX(λ) is nonsingular, −∞ < λ < ∞.
Proof of Theorem 8.3.2 We have seen that we can write
with Eε(t) = 0, cov{X(t + u), ε(t)} = 0 for all u. Because the series are jointly normal this 0 covariance implies that X(t + u) and ε(t) are statistically independent for all u. We have, therefore,
giving the required (8.3.21) and (8.3.22).
Proof of Theorem 8.5.1 This follows from Theorem 7.3.3 and Theorem P5.1.
Proof of Theorem 8.6.1 Under the indicated assumptions it follows from Theorem 7.4.1 that
Covariances, to first asymptotic order, coming out of the perturbation expansions will therefore be the same as those based on the variate (*). From Theorem 8.2.5 we can now say that
as the limiting distribution of Theorem 8.2.5 is complex normal. Also here
here. From Theorem 7.4.3 we can say that
here. In the case that λ + μ ≡ 0 (mod 2π) and λ ≢ 0 (mod 2π) we can say
The statistics Â_jk^(T), φ̂_jk^(T), Ĝ_jk^(T), R_Yk·X^(T), |R_YX^(T)|² are each differentiable functions of f_XX^(T)(λ), f_XY^(T)(λ), f_YY^(T)(λ). The indicated expressions now follow from a theorem of Brillinger and Tukey (1964).
Proof of Corollary 8.6.1 This follows directly from the expressions (8.6.11) to (8.6.15) and the convergence theorem of Exercise 1.7.4.
Proof of Theorem 8.7.1 A^(T)(λ), g_εε^(T)(λ), R^(T)(λ), |R^(T)(λ)|² are all differentiable functions of the entries of f_XX^(T)(λ), f_YX^(T)(λ), f_YY^(T)(λ) and so perturbation expansions such as
may be set down and used with Theorem 7.4.3 to deduce the indicated asymptotic covariances. In fact it is much more convenient to take advantage of the results of Section 8.2 to deduce the form of the covariances.
We begin by noting, from Corollary 7.4.3, that the covariances of variates at frequencies λ, μ are o(B_T^{−1} T^{−1}) unless λ − μ or λ + μ ≡ 0 (mod 2π).
Suppose λ − μ ≡ 0 (mod 2π) and λ ≢ 0 (mod 2π). The asymptotic covariance structure of
is seen to be the same as that of
where
In the case that λ, μ ≡ 0 (mod 2π), the statistics are real-valued and we must make use of Theorem 8.2.3 instead. We see that here
This completes the development of expressions (8.7.1) and (8.7.2). Expressions (8.7.3) and (8.7.4) follow from Theorems 8.2.5 and 7.6.2.
Proof of Theorem 8.8.1 This follows from the remarks made at the beginning of the proof of Theorem 8.7.1, Theorem 7.4.4 and Theorem P5.2.
The asymptotic independence of A^(T) and g_εε^(T) follows from their negligible covariance indicated in Theorem 8.2.5.
Before proving Theorem 8.10.1 it will be convenient to set down some notation and a lemma. If λ_p = 2πp/P_T, p = 0, ..., P_T − 1, we define
We can now state
Lemma P8.1 Under the conditions of Theorem 8.10.1,
for any δ > 0.
Proof We have the identity
The norm of the right side here is bounded by
and is O_p(P_T B_T^{−1/2} T^{−1/2}) uniformly in p. This gives the lemma.
for any ε > 0. It follows that
if γ ≤ β with γ ≠ 0. From Theorem 7.7.5
will be made up of two parts, a term involving only second-order spectra and a term involving fourth-order spectra.
From our investigation of A^(T)(λ), we can say that the contribution of the term in second-order spectra to cov{vec ε̂_p, vec ε̂_q} is asymptotically
Proof of Theorem 8.10.1 We must investigate the asymptotic behavior of the
We begin by noting that E vec ε̂_p = 0. Next, because P_T B_T ≤ 1 and W(α) vanishes for |α| > π, Exercise 7.10.41 takes the form
It follows that the covariance matrix of the variate
Suppose we denote the term in cov{vec ε̂_p, vec ε̂_q} that involves fourth-order spectra by (2π/T)V_pq. Because of the model Y(t) = μ + Σ a(t − u)X(u) + ε(t), with the series ε(t) independent of the series X(t), the corresponding terms in cov{vec â_p, vec â_q} and cov{vec â_p, vec ε̂_q} will be
It follows that their contribution to cov{vec ε̂_p, vec ε̂_q} will be
as A(λ_p) ≅ A_p B_p^{−1}. We may deduce from all this that
Exercise 7.10.42 may next be invoked to conclude that P_T^{−1} Σ exp{iλ_p u} vec ε̂_p is asymptotically normal. Putting this together we have the desired result.
Proof of Theorem 8.10.2 By substitution
therefore
This last may be rewritten
From Exercise 7.10.36, c_εX^(T)(0) is asymptotically normal with mean 0 and
and so
This gives the indicated asymptotic distribution for vec â^(T) as c_XX^(T)(0)^{−1}
tends to c_XX(0)^{−1} in probability. Because EX(t) = 0, c_X^(T) = o_p(1) and (*) shows that √T(μ̂^(T) − μ) = √T c_ε^(T) + o_p(1), giving the indicated limiting distribution for μ̂^(T) from Theorem 4.4.1. The asymptotic independence of â^(T) and μ̂^(T) follows from the asymptotic independence of c_ε^(T) and c_εX^(T)(0):
Continuing
It follows that
PROOFS FOR CHAPTER 9
Proof of Theorem 9.2.1 We prove Theorem 9.2.3 below; Theorem 9.2.1 follows in a similar manner.
Proof of Theorem 9.2.2 We prove Theorem 9.2.4 below; Theorem 9.2.2 follows in a similar manner.
Proof of Theorem 9.2.3 We prove that the jth latent root of (9.2.17) is ≥ μ_{q+j}, with equality achieved by the indicated μ, B, C.
In view of the previously determined asymptotic distributions of f_XX^(T)(λ), f_Xε^(T)(λ) we have from the last expression
giving the indicated asymptotic distribution for f_εε^(T)(λ).
Before proving Theorem 8.11.1 we first set down a lemma.
Lemma P8.2 Let X_T, T = 1, 2, ... be a sequence of vector-valued random variables, y a constant vector and a_T, T = 1, 2, ... a sequence of constants tending to 0 with T. Suppose
with probability 1. Let f(x) have a continuous first derivative in a neighborhood of y with |f′(y)| ≠ 0. Then
with probability 1.
Proof With probability 1, X_T will be in the indicated neighborhood of y for all large T. Take it as being there. Next, because f(x) has a first derivative we have
for some ξ in the neighborhood. Because of the continuity of f′(x), f′(ξ) becomes arbitrarily close to f′(y) as T → ∞ and we have the indicated result.
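The expansion being used is the mean value form (a sketch; in the vector case f′ denotes the derivative evaluated along the segment joining X_T and y):

```latex
f(X_T)\;=\;f(y)\;+\;f'(\xi)\,\bigl(X_T-y\bigr),
\qquad \xi \text{ between } X_T \text{ and } y .
```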
Proof of Theorem 8.11.1 The theorem follows from Lemma P8.2, Theorem 7.7.3, and Theorem 7.4.2.
From Theorem 8.2.4
because the matrix has rank q + j − 1 at most. We quickly check that
the indicated μ, B, C lead to a matrix (9.2.17) of the form (9.2.21). Now equality in the above inequalities is achieved by the indicated choices because the jth latent root of (9.2.21) is μ_{q+j}.
We have here presented a complex version of the arguments of Okamoto and Kanazawa (1968).
Proof of Theorem 9.2.4 We have the Taylor series expansions
See Wilkinson (1965) p. 68. We see that Σ̂_XX is asymptotically normal with mean Σ_XX and
This implies the useful result of Exercise 4.8.36(b)
for r vectors α, β, γ, δ. The indicated asymptotic moments now follow
where
The matrix D has rank ≤ q. Now
where L is (i − 1) × r. This is
directly from (*) and (**) using these expressions. For example,
We see that the second term is minimized if we minimize
where V_j(λ) is the jth latent vector of f_XX(λ)^{1/2} and a fortiori of f_XX(λ). The indicated B(λ), C(λ) are now seen to achieve the desired minimization.
Proof of Theorem 9.3.2 The cross-spectrum of ζ_j(t) with ζ_k(t) is given by
giving the indicated results.
Proof of Theorem 9.3.3 Because the latent roots of f_XX(λ) are simple for all λ, its latent roots and vectors will be real holomorphic functions of its entries; see Exercises 3.10.19 to 3.10.21.
giving the indicated covariances because, as the V_j are latent vectors,
The asymptotic normality follows from the asymptotic normality of f_XX^(T) and Theorem P5.2.
Proof of Theorem 9.3.1 We may write (9.3.3) as
where A(α) = C(α)B(α). We may make the first term 0 by setting
for each α with A(α) of rank ≤ q. From Theorem 3.7.4 we see that we should take
Expressions (9.3.29) and (9.3.30) now follow from Theorem 3.8.3. Expressions (9.3.31) and (9.3.32) follow directly from these and from expression (9.3.28).
Proof of Theorem 9.3.4 The desired B_j(λ) must be some linear combination of the V_k(λ)^τ, k = 1, ..., say
expression (9.4.5) now follows. Expressions (9.4.6) and (9.4.7) result from the following Taylor series expansions set down in the course of the proof of Theorem 9.2.4:
The desired series is orthogonal to ζ_k(t), k < j and so it must have G_jk(λ) = 0 for k < j. The variance of (9.3.33) may be written
with Σ_k |G_jk(λ)|² = 1. This variance is clearly maximized by taking
and we have the result.
Proof of Theorem 9.3.5 The spectral density matrix of (9.3.35) is given by
where A(λ) = C(λ)B(λ). We see from Theorem 9.2.3 that the latent roots of the latter are minimized by the indicated B(λ), C(λ).
Proof of Theorem 9.4.1 From the Wielandt–Hoffman theorem (see Wilkinson (1965))
Also from Theorems 7.4.1 and 7.4.3
As
Proof of Theorem 9.4.2 Expressions (9.4.13) and (9.4.14) follow from the following expressions given in the proof of Theorem 9.2.4:
under the given conditions.
Proof of Theorem 9.4.3 This follows from the expressions of the proof of Theorem 9.4.1 in the manner of the proof of Theorem 9.2.4.
Proof of Theorem 9.4.4 The latent roots and vectors of a matrix are continuous functions of its entries. This theorem consequently follows from Theorem 7.3.3 and Theorem P5.1.
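The Wielandt–Hoffman theorem invoked in the proof of Theorem 9.4.1 may be stated, for a Hermitian matrix A with Hermitian perturbation E and latent roots μ_j(·) taken in decreasing order, as (a standard form):

```latex
\sum_{j}\bigl(\mu_j(A+E)-\mu_j(A)\bigr)^{2}
\;\le\;
\|E\|_{F}^{2}\;=\;\sum_{j,k}|E_{jk}|^{2}.
```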
The minimum achieved is seen to be as stated.
and the result (7.4.13)
PROOFS FOR CHAPTER 10
Proof of Theorem 10.2.1 Let A = CB, and write (10.2.5) as
From Theorem 3.7.4 this is minimized by setting
or
with U^τ U = I. Now, the latent roots that appear are maximized by taking the columns of U to be the first q latent vectors of Σ_YY^{−1/2} Σ_YX Σ_XX^{−1} Σ_XY Σ_YY^{−1/2}; see Bellman (1960) p. 117. The theorem follows directly.
Proof of Theorem 10.2.3 This follows as the proof of Theorem 10.2.6 given below.
Proof of Theorem 10.2.4 This follows as did the proof of Theorem 10.2.1.
Proof of Theorem 10.2.5 This follows as did the proof of Theorem 10.2.2.
Proof of Theorem 10.2.6 Let Δ_XX = Σ̂_XX − Σ_XX with a similar definition for Δ_XY, Δ_YY. Proceeding in the manner of Wilkinson (1965) p. 68 or Dempster (1966) we have the expansions
Proof of Theorem 10.2.2 First take E as fixed. Then Theorem 10.2.1 indicates that the minimum with respect to μ and D is
Let U = EΣ^{1/2}; then write
where
and
Using the expression developed in the course of the proof of Theorem 9.2.4 we see that
if j = k, l = m and equals 0 otherwise. Similarly
if j = m, l = k and equals 0 otherwise.
Continuing
if j = m, l = k and so on. The expansions above and these moments now give the indicated first- and second-order asymptotic moments. The asymptotic normality follows from the asymptotic normality of the Σ̂_XX, Σ̂_XY, and Σ̂_YY and the fact that the latent roots and vectors are differentiable functions of these matrices through Theorem P5.2.
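The Wilkinson-type expansions used in this proof and in that of Theorem 9.2.4 have the standard first-order form for a Hermitian matrix Σ with simple latent roots μ_j and unit latent vectors V_j (a sketch; Δ is the perturbation, and conjugate transposes are understood in the complex case):

```latex
\mu_j(\Sigma+\Delta)=\mu_j+V_j^{\tau}\,\Delta\,V_j+O\bigl(\|\Delta\|^{2}\bigr),
\qquad
V_j(\Sigma+\Delta)=V_j+\sum_{k\neq j}\frac{V_k^{\tau}\,\Delta\,V_j}{\mu_j-\mu_k}\,V_k+O\bigl(\|\Delta\|^{2}\bigr).
```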
Proof of Theorem 10.3.1 The expression (10.3.3) may be written
and we see that we should choose μ so that EY(t) = EY*(t). Now
It therefore follows from Corollary 3.7.4 that expression (10.3.3) is minimized by the indicated B(a) and C(a).
Proof of Corollary 10.3.1 This result follows from an application of Theorem 10.3.1 to the transformed variate
noting, for example, that
for this series.
Proof of Theorem 10.3.2 We are interested in the coherence
having defined
for B′(λ) orthogonal to V_1(λ), . . . , V_{j−1}(λ), the first j − 1 latent vectors of f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2}, by Exercise 3.10.26. Expression (10.3.25) indicates that B_j(λ) is as indicated in the theorem; that A_j(λ) achieves equality follows by inspection.
Proof of Theorem 10.3.3 Because the latent roots of f_YX f_XX^{-1} f_XY are simple for all λ, its latent roots and vectors are real holomorphic functions of the entries; see Exercises 3.10.19 to 3.10.21. Expressions (10.3.28) and (10.3.29) now follow from Theorem 3.8.3. Expression (10.3.30) follows from (10.3.26) to (10.3.29).
Proof of Theorem 10.3.4 Because the latent roots of f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2} are simple for all λ, its latent roots and vectors are real holomorphic functions of its entries; see Exercises 3.10.19 to 3.10.21. Expressions (10.3.33) and (10.3.34) now follow from Theorem 3.8.3. That the spectral density is (10.3.36) either follows from Theorem 10.3.1 or by direct computation.
Proof of Theorem 10.4.1 This follows as did the proof of Theorem 9.4.1, with the exception that the perturbation expansions of the proof of Theorem 10.2.6 are now used.
Proof of Theorem 10.4.2 This follows from the above perturbation expansions in the manner of the proof of Theorem 10.2.6.
Proof of Theorem 10.4.3 The μ_j, A_j, and B_j are continuous functions of the entries of (10.4.25). The theorem consequently follows from Theorem 7.3.3 and Theorem P5.1.
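A numerical aside (mine, not the text's): at a fixed frequency the canonical coherences of Theorem 10.3.2 are the latent roots of f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2}, and the Schwarz inequality confines them to [0, 1]. The sketch below builds an arbitrary Hermitian nonnegative definite spectral matrix and checks this bound:

```python
import numpy as np

# Numerical aside (mine, not the book's): canonical coherences at a fixed
# frequency are the latent roots of f_yy^{-1/2} f_yx f_xx^{-1} f_xy f_yy^{-1/2};
# being squared correlations they must lie in [0, 1], the Schwarz bound.
rng = np.random.default_rng(2)
Z = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
f = Z @ Z.conj().T                      # Hermitian, nonnegative definite "spectral matrix"
fxx, fxy = f[:4, :4], f[:4, 4:]
fyx, fyy = f[4:, :4], f[4:, 4:]
W = np.linalg.inv(np.linalg.cholesky(fyy))   # whitening square root of fyy^{-1}
M = W @ fyx @ np.linalg.inv(fxx) @ fxy @ W.conj().T
coh = np.linalg.eigvalsh(M)             # canonical coherences, ascending
assert np.all(coh >= -1e-9) and np.all(coh <= 1 + 1e-9)
```

The upper bound is the Schur-complement form of the Schwarz inequality: f_YY − f_YX f_XX^{-1} f_XY is nonnegative definite, so I − M is as well.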
By Schwarz's inequality the coherency is
REFERENCES
ABELSON, R. (1953). Spectral analysis and the study of individual differences. Ph.D.Thesis, Princeton University.
ABRAMOWITZ, M., and STEGUN, I. A. (1964). Handbook of Mathematical Functions. Washington: National Bureau of Standards.
ACZÉL, J. (1969). On Applications and Theory of Functional Equations. Basel: Birkhäuser.
AITKEN, A. C. (1954). Determinants and Matrices. London: Oliver and Boyd.
AKAIKE, H. (1960). "Effect of timing-error on the power spectrum of sampled data." Ann. Inst. Statist. Math. 11:145-165.
AKAIKE, H. (1962a). "Undamped oscillation of the sample autocovariance function and the effect of prewhitening operation." Ann. Inst. Statist. Math. 13:127-144.
AKAIKE, H. (1962b). "On the design of lag windows for the estimation of spectra." Ann. Inst. Statist. Math. 14:1-21.
AKAIKE, H. (1964). "Statistical measurement of frequency response function." Ann. Inst. Statist. Math., Supp. III. 15:5-17.
AKAIKE, H. (1965). "On the statistical estimation of the frequency response function of a system having multiple input." Ann. Inst. Statist. Math. 17:185-210.
AKAIKE, H. (1966). "On the use of a non-Gaussian process in the identification of a linear dynamic system." Ann. Inst. Statist. Math. 18:269-276.
AKAIKE, H. (1968a). "Low pass filter design." Ann. Inst. Statist. Math. 20:271-298.
AKAIKE, H. (1968b). "On the use of an index of bias in the estimation of power spectra." Ann. Inst. Statist. Math. 20:55-69.
AKAIKE, H. (1969a). "A method of statistical investigation of discrete time parameter linear systems." Ann. Inst. Statist. Math. 21:225-242.
AKAIKE, H. (1969b). "Fitting autoregressive models for prediction." Ann. Inst. Statist. Math. 21:243-247.
AKAIKE, H., and KANESHIGE, I. (1964). "An analysis of statistical response of backlash." Ann. Inst. Statist. Math., Supp. III. 15:99-102.
AKAIKE, H., and YAMANOUCHI, Y. (1962). "On the statistical estimation of frequency response function." Ann. Inst. Statist. Math. 14:23-56.
AKCASU, A. Z. (1961). "Measurement of noise power spectra by Fourier analysis."J. Appl. Physics. 32:565-568.
AKHIEZER, N. I. (1956). Theory of Approximation. New York: Ungar.
ALBERT, A. (1964). "On estimating the frequency of a sinusoid in the presence of noise." Ann. Math. Statist. 35:1403.
ALBERTS, W. W., WRIGHT, L. E., and FEINSTEIN, B. (1965). "Physiological mechanisms of tremor and rigidity in Parkinsonism." Confin. Neurol. 26:318-327.
ALEXANDER, M. J., and VOK, C. A. (1963). Tables of the cumulative distribution of sample multiple coherence. Res. Rep. 63-67. Rocketdyne Division, North American Aviation Inc.
AMOS, D. E., and KOOPMANS, L. H. (1962). Tables of the distribution of the coefficient of coherence for stationary bivariate Gaussian processes. Sandia Corporation Monograph SCR-483.
ANDERSON, G. A. (1965). "An asymptotic expansion for the distribution of the latent roots of the estimated covariance matrix." Ann. Math. Statist. 36:1153-1173.
ANDERSON, T. W. (1957). An Introduction to Multivariate Statistical Analysis. New York: Wiley.
ANDERSON, T. W. (1963). "Asymptotic theory for principal component analysis."Ann. Math. Statist. 34:122-148.
ANDERSON, T. W. (1971). Statistical Analysis of Time Series. New York: Wiley.
ANDERSON, T. W., and WALKER, A. M. (1964). "On the asymptotic distribution of the autocorrelations of a sample from a linear stochastic process." Ann. Math. Statist. 35:1296-1303.
ARATO, M. (1961). "Sufficient statistics of stationary Gaussian processes."Theory Prob. Appl. 6:199-201.
ARENS, R., and CALDERÓN, A. P. (1955). "Analytic functions of several Banach algebra elements." Ann. Math. 62:204-216.
ASCHOFF, J. (1965). Circadian Clocks. Amsterdam: North Holland.
AUTONNE, L. (1915). "Sur les matrices hypohermitiennes et sur les matrices unitaires." Ann. Univ. Lyon. 38:1-77.
BALAKRISHNAN, A. V. (1964). "A general theory of nonlinear estimation problems in control systems." J. Math. Anal. App. 8:4-30.
BARLOW, J. S. (1967). "Correlation analysis of EEG-tremor relationships in man." In Recent Advances in Clinical Neurophysiology, Electroenceph. Clin. Neurophysiol., Suppl. 25:167-177.
BARTLETT, M. S. (1946). "On the theoretical specification of sampling properties of auto-correlated time series." J. Roy. Statist. Soc., Suppl. 8:27-41.
BARTLETT, M. S. (1948a). "A note on the statistical estimation of supply and demand relations from time series." Econometrica. 16:323-329.
BARTLETT, M. S. (1948b). "Smoothing periodograms from time series with continuous spectra." Nature. 161:686-687.
BARTLETT, M. S. (1950). "Periodogram analysis and continuous spectra."Biometrika. 37:1-16.
BARTLETT, M. S. (1966). An Introduction to Stochastic Processes, 2nd ed. Cambridge: Cambridge Univ. Press.
BARTLETT, M. S. (1967). "Some remarks on the analysis of time series." Biometrika. 50:25-38.
BASS, J. (1962a). "Transformées de Fourier des fonctions pseudo-aléatoires." C. R. Acad. Sci. 254:3072.
BASS, J. (1962b). Les Fonctions Pseudo-aléatoires. Paris: Gauthier-Villars.
BATCHELOR, G. K. (1960). The Theory of Homogeneous Turbulence. Cambridge: Cambridge Univ. Press.
BAXTER, G. (1963). "A norm inequality for a finite section Wiener-Hopf equation." Ill. J. Math. 7:97-103.
BELLMAN, R. (1960). Introduction to Matrix Analysis. New York: McGraw-Hill.
BENDAT, J. S., and PIERSOL, A. (1966). Measurement and Analysis of Random Data. New York: Wiley.
BERANEK, L. L. (1954). Acoustics. New York: McGraw-Hill.
BERGLAND, G. D. (1967). "The fast Fourier transform recursive equations for arbitrary length records." Math. Comp. 21:236-238.
BERNSTEIN, S. (1938). "Équations différentielles stochastiques." Act. Sci. Ind. 738:5-31.
BERTRAND, J., and LACAPE, R. S. (1943). Théorie de l'Électro-encéphalogramme. Paris: G. Doin.
BERTRANDIAS, J. B. (1960). "Sur le produit de deux fonctions pseudo-aléatoires." C. R. Acad. Sci. 250:263.
BERTRANDIAS, J. B. (1961). "Sur l'analyse harmonique généralisée des fonctions pseudo-aléatoires." C. R. Acad. Sci. 253:2829.
BEVERIDGE, W. H. (1921). "Weather and harvest cycles." Econ. J. 31:429.
BEVERIDGE, W. H. (1922). "Wheat prices and rainfall in Western Europe." J. Roy. Statist. Soc. 85:412-459.
BILLINGSLEY, P. (1965). Ergodic Theory and Information. New York: Wiley.
BILLINGSLEY, P. (1966). "Convergence of types in k-space." Zeit. Wahrschein. 5:175-179.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. New York: Wiley.
BINGHAM, C., GODFREY, M. D., and TUKEY, J. W. (1967). "Modern techniques in power spectrum estimation." IEEE Trans. Audio Electroacoust. AU-15:56-66.
BLACKMAN, R. B. (1965). Linear Data Smoothing and Prediction in Theory and Practice. Reading, Mass.: Addison-Wesley.
BLACKMAN, R. B., and TUKEY, J. W. (1958). "The measurement of power spectra from the point of view of communications engineering." Bell Syst. Tech. J. 37:183-282, 485-569.
BLANC-LAPIERRE, A., and FORTET, R. (1953). Théorie des Fonctions Aléatoires. Paris: Masson.
BLANC-LAPIERRE, A., and FORTET, R. (1965). Theory of Random Functions.New York: Gordon and Breach. Translation of 1953 French edition.
BOCHNER, S. (1936). "Summation of multiple Fourier series by spherical means." Trans. Amer. Math. Soc. 40:175-207.
BOCHNER, S. (1959). Lectures on Fourier Integrals. Princeton: Princeton Univ. Press.
BOCHNER, S., and MARTIN, W. T. (1948). Several Complex Variables. Princeton: Princeton Univ. Press.
BODE, H. W. (1945). Network Analysis and Feedback Amplifier Design. New York: Van Nostrand.
BOHMAN, H. (1960). "Approximate Fourier analysis of distribution functions."Ark. Mat. 4:99-157.
BORN, M., and WOLF, E. (1959). Principles of Optics. London: Pergamon.
BOWLEY, A. L. (1920). Elements of Statistics. London: King.
BOX, G. E. P. (1954). "Some theorems on quadratic forms applied in the study of analysis of variance problems." Ann. Math. Statist. 25:290-302.
BOX, G. E. P., and JENKINS, G. M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.
BRACEWELL, R. (1965). The Fourier Transform and its Applications. New York: McGraw-Hill.
BRENNER, J. L. (1961). "Expanded matrices from matrices with complex elements." SIAM Review. 3:165-166.
BRIGHAM, E. O., and MORROW, R. E. (1967). "The fast Fourier transform." IEEE Spectrum. 4:63-70.
BRILLINGER, D. R. (1964a). "The generalization of the techniques of factor analysis, canonical correlation and principal components to stationary time series." Invited paper at Royal Statistical Society Conference in Cardiff, Wales. Sept. 29-Oct. 1.
BRILLINGER, D. R. (1964b). "A technique for estimating the spectral density matrix of two signals." Proc. I.E.E.E. 52:103-104.
BRILLINGER, D. R. (1964c). "The asymptotic behavior of Tukey's general method of setting approximate confidence limits (the jackknife) when applied to maximum likelihood estimates." Rev. Inter. Statis. Inst. 32:202-206.
BRILLINGER, D. R. (1965a). "A property of low-pass filters." SIAM Review. 7:65-67.
BRILLINGER, D. R. (1965b). "An introduction to polyspectra." Ann. Math. Statist. 36:1351-1374.
BRILLINGER, D. R. (1966a). "An extremal property of the conditional expectation." Biometrika. 53:594-595.
BRILLINGER, D. R. (1966b). "The application of the jackknife to the analysis of sample surveys." Commentary. 8:74-80.
BRILLINGER, D. R. (1968). "Estimation of the cross-spectrum of a stationary bivariate Gaussian process from its zeros." J. Roy. Statist. Soc., B. 30:145-159.
BRILLINGER, D. R. (1969a). "A search for a relationship between monthly sunspot numbers and certain climatic series." Bull. ISI. 43:293-306.
BRILLINGER, D. R. (1969b). "The calculation of cumulants via conditioning." Ann. Inst. Statist. Math. 21:215-218.
BRILLINGER, D. R. (1969c). "Asymptotic properties of spectral estimates of second-order." Biometrika. 56:375-390.
REFERENCES 465
BRILLINGER, D. R. (1969d). "The canonical analysis of stationary time series." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 331-350. New York: Academic.
BRILLINGER, D. R. (1970a). "The identification of polynomial systems by means of higher order spectra." J. Sound Vib. 12:301-313.
BRILLINGER, D. R. (1970b). "The frequency analysis of relations between stationary spatial series." Proc. Twelfth Bien. Sem. Canadian Math. Congr., Ed. R. Pyke, pp. 39-81. Montreal: Can. Math. Congr.
BRILLINGER, D. R. (1972). "The spectral analysis of stationary interval functions." In Proc. Seventh Berkeley Symp. Prob. Statist., Eds. L. LeCam, J. Neyman, and E. L. Scott, pp. 483-513. Berkeley: Univ. of California Press.
BRILLINGER, D. R. (1973). "The analysis of time series collected in an experimental design." Multivariate Analysis — III, Ed. P. R. Krishnaiah, pp. 241-256. New York: Academic.
BRILLINGER, D. R., and HATANAKA, M. (1969). "An harmonic analysis of nonstationary multivariate economic processes." Econometrica. 35:131-141.
BRILLINGER, D. R., and HATANAKA, M. (1970). "A permanent income hypothesis relating to the aggregate demand for money (an application of spectral and moving spectral analysis)." Economic Studies Quart. 21:44-71.
BRILLINGER, D. R., and ROSENBLATT, M. (1967a). "Asymptotic theory of k-th order spectra." In Spectral Analysis of Time Series, Ed. B. Harris, pp. 153-188. New York: Wiley.
BRILLINGER, D. R., and ROSENBLATT, M. (1967b). "Computation and interpretation of k-th order spectra." In Spectral Analysis of Time Series, Ed. B. Harris, pp. 189-232. New York: Wiley.
BRILLINGER, D. R., and TUKEY, J. W. (1964). Asymptotic variances, moments, cumulants and other average values. Unpublished manuscript.
BRYSON, R. A., and DUTTON, J. A. (1961). "Some aspects of the variance spectra of tree rings and varves." Ann. New York Acad. Sci. 95:580-604.
BULLARD, E. (1966). "The detection of underground explosions." Sci. Am. 215:19.
BUNIMOVITCH, V. I. (1949). "The fluctuation process as a vibration with random amplitude and phase." J. Tech. Phys. (USSR) 19:1237-1259.
BURGERS, J. M. (1948). "Spectral analysis of an irregular function." Proc. Acad. Sci. Amsterdam. 51:1073.
BURKHARDT, H. (1904). "Trigonometrische Reihen und Integrale." Enzykl. Math. Wiss. 2:825-1354.
BURLEY, S. P. (1969). "A spectral analysis of the Australian business cycle." Austral. Econ. Papers. 8:193-128.
BUSINGER, P. A., and GOLUB, G. H. (1969). "Singular value decomposition of a complex matrix." Comm. ACM. 12:564-565.
BUTZER, P. L., and NESSEL, R. J. (1971). Fourier Analysis and Approximations, Vol. 1. New York: Academic.
CAIRNS, T. W. (1971). "On the fast Fourier transform on a finite Abelian group." IEEE Trans. Computers. C-20:569-571.
CAPON, J. (1969). "High resolution frequency wavenumber spectral analysis." Proc. I.E.E.E. 57:1408-1418.
CAPON, J., and GOODMAN, N. R. (1970). "Probability distributions for estimators of the frequency wavenumber spectrum." Proc. I.E.E.E. 58:1785-1786.
CARGO, G. T. (1966). "Some extensions of the integral test." Amer. Math. Monthly. 73:521-525.
CARPENTER, E. W. (1965). "Explosions seismology." Science. 147:363-373.
CARTWRIGHT, D. E. (1967). "Time series analysis of tides and similar motions of the sea surface." J. Appl. Prob. 4:103-112.
CHAMBERS, J. M. (1966). Some methods of asymptotic approximation in multivariate statistical analysis. Ph.D. Thesis, Harvard University.
CHAMBERS, J. M. (1967). "On methods of asymptotic approximation for multivariate distributions." Biometrika. 54:367-384.
CHANCE, B., PYE, K., and HIGGINS, J. (1967). "Waveform generation by enzymatic oscillators." IEEE Spectrum. 4:79-86.
CHAPMAN, S., and BARTELS, J. (1951). Geomagnetism, Vol. 2. Oxford: Oxford Univ. Press.
CHERNOFF, H., and LIEBERMAN, G. J. (1954). "Use of normal probability paper." J. Amer. Statist. Assoc. 49:778-785.
CHOKSI, J. R. (1966). "Unitary operators induced by measure preserving transformations." J. Math. and Mech. 16:83-100.
CHOW, G. C. (1966). "A theorem on least squares and vector correlation in multivariate linear regression." J. Amer. Statist. Assoc. 61:413-414.
CLEVENSON, M. L. (1970). Asymptotically efficient estimates of the parameters of a moving average time series. Ph.D. Thesis, Stanford University.
CONDIT, H. R., and GRUM, F. (1964). "Spectral energy distribution of daylight." J. Optical Soc. Amer. 54:937-944.
CONSTANTINE, A. G. (1963). "Some noncentral distributions in multivariate analysis." Ann. Math. Statist. 34:1270-1285.
COOLEY, J. W., LEWIS, P. A. W., and WELCH, P. D. (1967a). "Historical notes on the fast Fourier transform." IEEE Trans. on Audio and Electroacoustics. AU-15:76-79.
COOLEY, J. W., LEWIS, P. A. W., and WELCH, P. D. (1967b). The fast Fourier transform algorithm and its applications. IBM Memorandum RC 1743.
COOLEY, J. W., LEWIS, P. A. W., and WELCH, P. D. (1970). "The application of the Fast Fourier Transform Algorithm to the estimation of spectra and cross-spectra." J. Sound Vib. 12:339-352.
COOLEY, J. W., and TUKEY, J. W. (1965). "An algorithm for the machine calculation of complex Fourier series." Math. Comp. 19:297-301.
COOTNER, P. H. (1964). The Random Character of Stock Market Prices. Cambridge: MIT Press.
COVEYOU, R. R., and MACPHERSON, R. D. (1967). "Fourier analysis of uniform random number generators." J. Assoc. Comp. Mach. 14:100-119.
CRADDOCK, J. M. (1965). "The analysis of meteorological time series for use in forecasting." Statistician. 15:167-190.
CRADDOCK, J. M., and FLOOD, C. R. (1969). "Eigenvectors for representing the 500 mb geopotential surface over the Northern Hemisphere." Quart. J. Roy. Met. Soc. 95:576-593.
CRAMÉR, H. (1939). "On the representation of functions by certain Fourier integrals." Trans. Amer. Math. Soc. 46:191-201.
CRAMÉR, H. (1942). "On harmonic analysis in certain functional spaces." Arkiv Math. Astr. Fysik. 28:1-7.
CRAMÉR, H., and LEADBETTER, M. R. (1967). Stationary and Related Stochastic Processes. New York: Wiley.
CRANDALL, I. B. (1958). Random Vibration, I. Cambridge: MIT Press.
CRANDALL, I. B. (1963). Random Vibration, II. Cambridge: MIT Press.
CRANDALL, I. B., and SACIA, C. F. (1924). "A dynamical study of the vowel sounds." Bell Syst. Tech. J. 3:232-237.
DANIELL, P. J. (1946). "Discussion of paper by M. S. Bartlett." J. Roy. Statist. Soc., Suppl. 8:27.
DANIELS, H. E. (1962). "The estimation of spectral densities." J. Roy. Statist. Soc., B. 24:185-198.
DARROCH, J. N. (1965). "An optimal property of principal components." Ann. Math. Statist. 36:1579-1582.
DARZELL, J. F., and PIERSON, W. J., Jr. (1960). The apparent loss of coherency in vector Gaussian processes due to computational procedures with applications to ship motions and random seas. Report of Dept. of Meteorology and Oceanography, New York University.
DAVIS, C., and KAHAN, W. M. (1969). "Some new bounds on perturbation of subspaces." Bull. Amer. Math. Soc. 75:863-868.
DAVIS, R. C. (1953). "On the Fourier expansion of stationary random processes." Proc. Amer. Math. Soc. 24:564-569.
DEEMER, W. L., and OLKIN, I. (1951). "The Jacobians of certain matrix transformations." Biometrika. 38:345-367.
DEMPSTER, A. P. (1966). "Estimation in multivariate analysis." In Multivariate Analysis, Ed. P. R. Krishnaiah, pp. 315-334. New York: Academic.
DEMPSTER, A. P. (1969). Continuous Multivariate Analysis. Reading: Addison-Wesley.
DEUTSCH, R. (1962). Nonlinear Transformations of Random Processes. Englewood Cliffs: Prentice-Hall.
DICKEY, J. M. (1967). "Matricvariate generalizations of the multivariate t distribution and the inverted multivariate t distribution." Ann. Math. Statist. 38:511-519.
DOEBLIN, W. (1938). "Sur l'équation matricielle A(t + s) = A(t)A(s) et ses applications aux probabilités en chaîne." Bull. Sci. Math. 62:21-32.
DOOB, J. L. (1953). Stochastic Processes. New York: Wiley.
DRAPER, N. R., and SMITH, H. (1966). Applied Regression Analysis. New York: Wiley.
DRESSEL, P. L. (1940). "Semi-invariants and their estimates." Ann. Math. Statist. 11:33-57.
DUGUNDJI, J. (1958). "Envelopes and pre-envelopes of real waveforms." IRE Trans. Inf. Theory. IT-4:53-57.
DUNCAN, D. B., and JONES, R. H. (1966). "Multiple regression with stationary errors." J. Amer. Statist. Assoc. 61:917-928.
DUNFORD, N., and SCHWARTZ, J. T. (1963). Linear Operators, Part II. New York: Wiley, Interscience.
DUNNETT, C. W., and SOBEL, M. (1954). "A bivariate generalization of Student's t-distribution, with tables for certain special cases." Biometrika. 41:153-169.
DURBIN, J. (1954). "Errors in variables." Rev. Inter. Statist. Inst. 22:23-32.
DURBIN, J. (1960). "Estimation of parameters in time series regression models." J. Roy. Statist. Soc., B. 22:139-153.
DYNKIN, E. B. (1960). Theory of Markov Processes. London: Pergamon.
ECKART, C., and YOUNG, G. (1936). "On the approximation of one matrix by another of lower rank." Psychometrika. 1:211-218.
ECONOMIC TRENDS (1968). No. 178. London: Central Statistical Office.
EDWARDS, R. E. (1967). Fourier Series: A Modern Introduction, Vols. I, II. New York: Holt, Rinehart and Winston.
EHRLICH, L. W. (1970). "Complex matrix inversion versus real." Comm. A.C.M. 13:561-562.
ENOCHSON, L. D., and GOODMAN, N. R. (1965). Gaussian approximations to the distribution of sample coherence. Tech. Rep. AFFDL-TR-65-57, Wright-Patterson Air Force Base.
EZEKIEL, M. A., and FOX, C. A. (1959). Methods of Correlation and Regression Analysis. New York: Wiley.
FEHR, U., and MCGAHAN, L. C. (1967). "Analog systems for analyzing infrasonic signals monitored in field experimentation." J. Acoust. Soc. Amer. 42:1001-1007.
FEJÉR, L. (1900). "Sur les fonctions bornées et intégrables." C. R. Acad. Sci. (Paris) 131:984-987.
FEJÉR, L. (1904). "Untersuchungen über Fouriersche Reihen." Mat. Ann. 58:501-569.
FELLER, W. (1966). Introduction to Probability Theory and its Applications, Vol. 2. New York: Wiley.
FIELLER, E. C. (1954). "Some problems in interval estimation." J. Roy. Statist. Soc., B. 16:175-185.
FISHER, R. A. (1928). "The general sampling distribution of the multiple correlation coefficient." Proc. Roy. Soc. 121:654-673.
FISHER, R. A. (1962). "The simultaneous distribution of correlation coefficients." Sankhya A. 24:1-8.
FISHER, R. A., and MACKENZIE, W. A. (1922). "The correlation of weekly rainfall" (with discussion). J. Roy. Met. Soc. 48:234-245.
FISHMAN, G. S. (1969). Spectral Methods in Econometrics. Cambridge: Harvard Univ. Press.
FISHMAN, G. S., and KIVIAT, P. J. (1967). "Spectral analysis of time series generated by simulation models." Management Science. 13:525-557.
FOX, M. (1956). "Charts of the power of the F-test." Ann. Math. Statist. 27:484-497.
FREIBERGER, W. (1963). "Approximate distributions of cross-spectral estimates for Gaussian processes." In Time Series Analysis, Ed. M. Rosenblatt, pp. 244-259. New York: Wiley.
FREIBERGER, W., and GRENANDER, U. (1959). "Approximate distributions of noise power measurements." Quart. Appl. Math. 17:271-283.
FRIEDLANDER, S. K., and TOPPER, L. (1961). Turbulence: Classic Papers on Statistical Theory. New York: Wiley Interscience.
FRIEDMAN, B. (1961). "Eigenvalues of composite matrices." Proc. Camb. Philos. Soc. 57:37-49.
GABOR, D. (1946). "Theory of communication." J. Inst. Elec. Engrs. 93:429-457.
GAJJAR, A. V. (1967). "Limiting distributions of certain transformations of multiple correlation coefficient." Metron. 26:189-193.
GAVURIN, M. K. (1957). "Approximate determination of eigenvalues and the theory of perturbations." Uspehi Mat. Nauk. 12:173-175.
GELFAND, I., RAIKOV, D., and SHILOV, G. (1964). Commutative Normed Rings. New York: Chelsea.
GENTLEMAN, W. M., and SANDE, G. (1966). "Fast Fourier transforms — for fun and profit." AFIPS. 1966 Fall Joint Computer Conference. 28:563-578. Washington: Spartan.
GERSCH, W. (1972). "Causality or driving in electrophysiological signal analysis." J. Math. Bioscience. 14:177-196.
GIBBS, F. A., and GRASS, A. M. (1947). "Frequency analysis of electroencephalograms." Science. 105:132-134.
GIKMAN, I. I., and SKOROKHOD, A. V. (1966). "On the densities of probability measures in function spaces." Russian Math. Surveys. 21:83-156.
GINZBURG, J. P. (1964). "The factorization of analytic matrix functions."Soviet Math. 5:1510-1514.
GIRI, N. (1965). "On the complex analogues of T² and R² tests." Ann. Math. Statist. 36:664-670.
GIRSHICK, M. A. (1939). "On the sampling theory of roots of determinantal equations." Ann. Math. Statist. 10:203-224.
GLAHN, H. R. (1968). "Canonical correlation and its relationship to discriminant analysis and multiple regression." J. Atmos. Sci. 25:23-31.
GODFREY, M. D. (1965). "An exploratory study of the bispectrum of an economic time series." Applied Statistics. 14:48-69.
GODFREY, M. D., and KARREMAN, H. F. (1967). "A spectrum analysis of seasonal adjustment." In Essays in Mathematical Economics, Ed. M. Shubik, pp. 367-421. Princeton: Princeton Univ. Press.
GOLDBERGER, A. S. (1964). Econometric Theory. New York: Wiley.
GOLUB, G. H. (1969). "Matrix decompositions and statistical calculations." In Statistical Computation, Eds. R. C. Milton, J. A. Nelder, pp. 365-397. New York: Academic.
GOOD, I. J. (1950). "On the inversion of circulant matrices." Biometrika. 37:185-186.
GOOD, I. J. (1958). "The interaction algorithm and practical Fourier series." J. Roy. Stat. Soc., B. 20:361-372. Addendum (1960), 22:372-375.
GOOD, I. J. (1963). "Weighted covariance for detecting the direction of a Gaussian source." In Time Series Analysis, Ed. M. Rosenblatt, pp. 447-470. New York: Wiley.
GOOD, I. J. (1971). "The relationship between two fast Fourier transforms." IEEE Trans. Computers. C-20:310-317.
GOODMAN, N. R. (1957). On the joint estimation of the spectra, cospectrum and quadrature spectrum of a two-dimensional stationary Gaussian process. Ph.D. Thesis, Princeton University.
GOODMAN, N. R. (1960). "Measuring amplitude and phase." J. Franklin Inst.270:437-450.
GOODMAN, N. R. (1963). "Statistical analysis based upon a certain multivariate complex Gaussian distribution (an introduction)." Ann. Math. Statist. 34:152-177.
GOODMAN, N. R. (1965). Measurement of matrix frequency response functions and multiple coherence functions. Research and Technology Division, AFSC, AFFDL TR 65-56, Wright-Patterson AFB, Ohio.
GOODMAN, N. R. (1967). Eigenvalues and eigenvectors of spectral density matrices. Seismic Data Lab. Report 179.
GOODMAN, N. R., and DUBMAN, M. R. (1969). "Theory of time-varying spectral analysis and complex Wishart matrix processes." In Multivariate Analysis II, Ed. P. R. Krishnaiah, pp. 351-366. New York: Academic.
GOODMAN, N. R., KATZ, S., KRAMER, B. H., and KUO, M. T. (1961). "Frequency response from stationary noise: two case histories." Technometrics. 3:245-268.
GORMAN, D., and ZABORSZKY, J. (1966). "Functional expansion in state space and the s domain." IEEE Trans. Aut. Control. AC-11:498-505.
GRANGER, C. W. J. (1964). Spectral Analysis of Economic Time Series. Princeton: Princeton Univ. Press.
GRANGER, C. W. J., and ELLIOTT, C. M. (1968). "A fresh look at wheat prices and markets in the eighteenth century." Economic History Review. 20:257-265.
GRANGER, C. W. J., and HUGHES, A. O. (1968). "Spectral analysis of short series — a simulation study." J. Roy. Statist. Soc., A. 131:83-99.
GRANGER, C. W. J., and MORGENSTERN, O. (1963). "Spectral analysis of stock market prices." Kyklos. 16:1-27.
GRENANDER, U. (1950). "Stochastic processes and statistical inference." Ark. Mat. 1:195-277.
GRENANDER, U. (1951a). "On empirical spectral analysis of stochastic processes." Ark. Mat. 1:503-531.
GRENANDER, U. (1951b). "On Toeplitz forms and stationary processes." Ark. Mat. 1:551-571.
GRENANDER, U. (1954). "On the estimation of regression coefficients in the case of an autocorrelated disturbance." Ann. Math. Statist. 25:252-272.
GRENANDER, U., POLLAK, H. O., and SLEPIAN, D. (1959). "The distribution of quadratic forms in normal variates: a small sample theory with applications to spectral analysis." J. Soc. Indust. Appl. Math. 7:374-401.
GRENANDER, U., and ROSENBLATT, M. (1953). "Statistical spectral analysisof time series arising from stochastic processes." Ann. Math. Stat. 24:537-558.
GRENANDER, U., and ROSENBLATT, M. (1957). Statistical Analysis of Stationary Time Series. New York: Wiley.
GRENANDER, U., and SZEGŐ, G. (1958). Toeplitz Forms and Their Applications. Berkeley: Univ. of Cal. Press.
GROVES, G. W., and HANNAN, E. J. (1968). "Time series regression of sea level on weather." Rev. Geophysics. 6:129-174.
GROVES, G. W., and ZETLER, B. D. (1964). "The cross-spectrum of sea level at San Francisco and Honolulu." J. Marine Res. 22:269-275.
GUPTA, R. P. (1965). "Asymptotic theory for principal component analysis in the complex case." J. Indian Statist. Assoc. 3:97-106.
GUPTA, S. S. (1963a). "Probability integrals of multivariate normal and multivariate t." Ann. Math. Statist. 34:792-828.
GUPTA, S. S. (1963b). "Bibliography on the multivariate normal integrals and related topics." Ann. Math. Statist. 34:829-838.
GURLAND, J. (1966). "Further consideration of the distribution of the multiple correlation coefficient." Ann. Math. Statist. 37:1418.
GYIRES, B. (1961). "Über die Spuren der verallgemeinerten Toeplitzschen Matrize." Publ. Math. Debrecen. 8:93-116.
HÁJEK, J. (1962). "On linear statistical problems in stochastic processes." Czech. Math. J. 12:404-443.
HALL, P. (1927). "Multiple and partial correlation coefficients." Biometrika. 19:100-109.
HALMOS, P. R. (1956). Lectures on Ergodic Theory. Tokyo: Math. Soc. Japan.
HALPERIN, M. (1967). "A generalisation of Fieller's theorem to the ratio of complex parameters." J. Roy. Statist. Soc., B. 29:126-131.
HAMBURGER, H., and GRIMSHAW, M. E. (1951). Linear Transformations in n-dimensional Vector Space. Cambridge: Cambridge Univ. Press.
HAMMING, R. W. (1962). Numerical Methods for Scientists and Engineers. New York: McGraw-Hill.
HAMMING, R. W., and TUKEY, J. W. (1949). Measuring noise color. Bell Telephone Laboratories Memorandum.
HAMON, B. V., and HANNAN, E. J. (1963). "Estimating relations between time series." J. Geophys. Res. 68:6033-6041.
HANNAN, E. J. (1960). Time Series Analysis. London: Methuen.
HANNAN, E. J. (1961a). "The general theory of canonical correlation and its relation to functional analysis." J. Aust. Math. Soc. 2:229-242.
HANNAN, E. J. (1961b). "Testing for a jump in the spectral function." J. Roy. Statist. Soc., B. 23:394-404.
HANNAN, E. J. (1963a). "Regression for time series with errors of measurement." Biometrika. 50:293-302.
HANNAN, E. J. (1963b). "Regression for time series." In Time Series Analysis, Ed. M. Rosenblatt, pp. 17-37. New York: Wiley.
HANNAN, E. J. (1965). "The estimation of relationships involving distributed lags." Econometrica. 33:206-224.
HANNAN, E. J. (1967a). "The estimation of a lagged regression relation." Biometrika. 54:409-418.
HANNAN, E. J. (1967b). "Fourier methods and random processes." Bull. Inter. Statist. Inst. 42:475-494.
HANNAN, E. J. (1967c). "Canonical correlation and multiple equation systems in economics." Econometrica. 35:123-138.
HANNAN, E. J. (1968). "Least squares efficiency for vector time series." J. Roy. Statist. Soc., B. 30:490-498.
HANNAN, E. J. (1970). Multiple Time Series. New York: Wiley.
HASSELMAN, K., MUNK, W., and MACDONALD, G. (1963). "Bispectrum of ocean waves." In Time Series Analysis, Ed. M. Rosenblatt, pp. 125-139. New York: Wiley.
HAUBRICH, R. A. (1965). "Earth noise, 5 to 500 millicycles per second. 1. Spectral stationarity, normality, nonlinearity." J. Geophys. Res. 70:1415-1427.
HAUBRICH, R. A., and MACKENZIE, G. S. (1965). "Earth noise, 5 to 500 millicycles per second. 2. Reaction of the earth to ocean and atmosphere." J. Geophys. Res. 70:1429-1440.
HENNINGER, J. (1970). "Functions of bounded mean square and generalized Fourier-Stieltjes transforms." Can. J. Math. 22:1016-1034.
HERGLOTZ, G. (1911). "Über Potenzreihen mit positivem reellem Teil im Einheitskreis." Sitzgsber. Sächs. Akad. Wiss. 63:501-511.
HEWITT, E., and ROSS, K. A. (1963). Abstract Harmonic Analysis. Berlin: Springer.
HEXT, G. R. (1966). A new approach to time series with mixed spectra. Ph.D. Thesis, Stanford University.
HINICH, M. (1967). "Estimation of spectra after hard clipping of Gaussian processes." Technometrics. 9:391-400.
HODGSON, V. (1968). "On the sampling distribution of the multiple correlation coefficient." Ann. Math. Statist. 39:307.
HOFF, J. C. (1970). "Approximation with kernels of finite oscillations, I. Convergence." J. Approx. Theory. 3:213-228.
HOOPER, J. W. (1958). "The sampling variance of correlation coefficients underassumptions of fixed and mixed variates." Biometrika. 45:471-477.
HOOPER, J. W. (1959). "Simultaneous equations and canonical correlationtheory." Econometrica. 27:245-256.
HOPF, E. (1937). Ergodentheorie. Berlin: Springer.HOPF, E. (1952). "Statistical hydromechanics and functional calculus." J. Rat.
Mech. Anal. 1:87-123.HORST, P. (1965). Factor Analysis of Data Matrices. New York: Holt, Rinehart
and Winston.HOTELLING, H. (1933). "Analysis of a complex of statistical variables into
principal components." /. Educ. Psych. 24:417-441, 498-520.HOTELLING, H. (1936). "Relations between two sets of variates." Biometrika.
28:321-377.HOWREY, E. P. (1968). "A spectrum analysis of the long-swing hypothesis."
Int. Econ. Rev. 9:228-252.HOYT, R. S. (1947). "Probability functions for the modulus and angle of the normal
complex variate." Bell System Tech. J. 26:318-359.HSU, P. L. (1941). "On the limiting distribution of canonical correlations." Bio-
metrika. 33:38-45.HSU, P. L. (1949). "The limiting distribution of functions of sample means and
application to testing hypotheses." In Proc. Berkeley Symp. Math. Statist.Prob., Ed. J. Neyman, pp. 359-401. Berkeley: Univ. of Cal. Press.
HUA, L. K. (1963). Harmonic Analysis of Functions of Several Variables in ClassicalDomains. Providence: American Math. Society.
IBRAGIMOV, I. A. (1963). "On estimation of the spectral function of a stationary Gaussian process." Theory Prob. Appl. 8:366-401.
IBRAGIMOV, I. A. (1967). "On maximum likelihood estimation of parameters of the spectral density of stationary time series." Theory Prob. Appl. 12:115-119.
IOSIFESCU, M. (1968). "The law of the iterated logarithm for a class of dependent random variables." Theory Prob. Appl. 13:304-313.
IOSIFESCU, M., and THEODORESCU, R. (1969). Random Processes and Learning. Berlin: Springer.
ISSERLIS, L. (1918). "On a formula for the product moment coefficient of any order of a normal frequency distribution in any number of variables." Biometrika. 12:134-139.
ITO, K., and NISIO, M. (1964). "On stationary solutions of a stochastic differential equation." J. Math. Kyoto. 4:1-75.
IZENMAN, A. J. (1972). Reduced rank regression for the multivariate linear model. Ph.D. Thesis, University of California, Berkeley.
JAGERMAN, D. L. (1963). "The autocorrelation function of a sequence uniformly distributed modulo 1." Ann. Math. Statist. 34:1243-1252.
JAMES, A. T. (1964). "Distributions of matrix variates and latent roots derived from normal samples." Ann. Math. Statist. 35:475-501.
JAMES, A. T. (1966). "Inference on latent roots by calculation of hypergeometric functions of matrix argument." In Multivariate Analysis, Ed. P. R. Krishnaiah, pp. 209-235. New York: Academic.
JENKINS, G. M. (1961). "General considerations in the analysis of spectra." Technometrics. 3:133-166.
JENKINS, G. M. (1963a). "Cross-spectral analysis and the estimation of linear open loop transfer functions." In Time Series Analysis, Ed. M. Rosenblatt, pp. 267-278. New York: Wiley.
JENKINS, G. M. (1963b). "An example of the estimation of a linear open-loop transfer function." Technometrics. 5:227-245.
JENKINS, G. M., and WATTS, D. G. (1968). Spectrum Analysis and Its Applications. San Francisco: Holden-Day.
JENNISON, R. C. (1961). Fourier Transforms and Convolutions for the Experimentalist. London: Pergamon.
JONES, R. H. (1962a). "Spectral estimates and their distributions, II." Skand. Aktuartidskr. 45:135-153.
JONES, R. H. (1962b). "Spectral analysis with regularly missed observations." Ann. Math. Statist. 33:455-461.
JONES, R. H. (1965). "A reappraisal of the periodogram in spectral analysis." Technometrics. 7:531-542.
JONES, R. H. (1969). "Phase free estimation of coherence." Ann. Math. Statist. 40:540-548.
KABE, D. G. (1966). "Complex analogues of some classical non-central multivariate distributions." Austral. J. Statist. 8:99-103.
KABE, D. G. (1968a). "On the distribution of the regression coefficient matrix of a normal distribution." Austral. J. Statist. 10:21-23.
KABE, D. G. (1968b). "Some aspects of analysis of variance and covariance theory for a certain multivariate complex Gaussian distribution." Metrika. 13:86-97.
KAHANE, J. (1968). Some Random Series of Functions. Lexington: Heath.
KAMPE de FERIET, J. (1954). "Introduction to the statistical theory of turbulence." J. Soc. Ind. Appl. Math. 2:244-271.
KAMPE de FERIET, J. (1965). "Random integrals of differential equations." In Lectures on Modern Mathematics, Ed. T. L. Saaty, 3:277-321. New York: Wiley.
KANESHIGE, I. (1964). "Frequency response of an automobile engine mounting." Ann. Inst. Stat. Math., Suppl. 3:49-58.
KAWASHIMA, R. (1964). "On the response function for the rolling motion of a fishing boat on ocean waves." Ann. Inst. Stat. Math., Suppl. 3:33-40.
KAWATA, T. (1959). "Some convergence theorems for stationary stochastic processes." Ann. Math. Statist. 30:1192-1214.
KAWATA, T. (1960). "The Fourier series of some stochastic processes." Japanese J. Math. 29:16-25.
KAWATA, T. (1965). "Sur la série de Fourier d'un processus stochastique stationnaire." C. R. Acad. Sci. (Paris). 260:5453-5455.
KAWATA, T. (1966). "On the Fourier series of a stationary stochastic process." Zeit. Wahrschein. 6:224-245.
KEEN, C. G., MONTGOMERY, J., MOWAT, W. M. H., and PLATT, D. C. (1965). "British seismometer array recording systems." J. Br. Instn. Radio Engrs. 30:219.
KENDALL, M. (1946). Contributions to the Study of Oscillatory Time Series. Cambridge: Cambridge Univ. Press.
KENDALL, M. G., and STUART, A. (1958). The Advanced Theory of Statistics, Vol. I. London: Griffin.
KENDALL, M. G., and STUART, A. (1961). The Advanced Theory of Statistics, Vol. II. London: Griffin.
KENDALL, M. G., and STUART, A. (1968). The Advanced Theory of Statistics, Vol. III. London: Griffin.
KHATRI, C. G. (1964). "Distribution of the 'generalised' multiple correlation matrix in the dual case." Ann. Math. Statist. 35:1801-1806.
KHATRI, C. G. (1965a). "Classical statistical analysis based on a certain multivariate complex Gaussian distribution." Ann. Math. Statist. 36:98-114.
KHATRI, C. G. (1965b). "A test for reality of a covariance matrix in a certain complex Gaussian distribution." Ann. Math. Statist. 36:115-119.
KHATRI, C. G. (1967). "A theorem on least squares in multivariate linear regression." J. Amer. Statist. Assoc. 62:1494-1495.
KHINTCHINE, A. (1934). "Korrelationstheorie der stationären Prozesse." Math. Annalen. 109:604-615.
KINOSITA, K. (1964). "On the behaviour of tsunami in a tidal river." Ann. Inst. Stat. Math., Suppl. 3:78-88.
KIRCHENER, R. B. (1967). "An explicit formula for exp At." Amer. Math. Monthly. 74:1200-1203.
KNOPP, K. (1948). Theory and Application of Infinite Series. New York: Hafner.
KOLMOGOROV, A. N. (1941a). "Interpolation und Extrapolation von stationären zufälligen Folgen." Bull. Acad. Sci. de l'U.R.S.S. 5:3-14.
KOLMOGOROV, A. N. (1941b). "Stationary sequences in Hilbert space." (In Russian.) Bull. Moscow State U. Math. 2:1-40. [Reprinted in Spanish in Trab. Estad. 4:55-73, 243-270.]
KOOPMANS, L. H. (1964a). "On the coefficient of coherence for weakly stationary stochastic processes." Ann. Math. Statist. 35:532-549.
KOOPMANS, L. H. (1964b). "On the multivariate analysis of weakly stationary stochastic processes." Ann. Math. Statist. 35:1765-1780.
KOOPMANS, L. H. (1966). "A note on the estimation of amplitude spectra for stochastic processes with quasi-linear residuals." J. Amer. Statist. Assoc. 61:397-402.
KRAMER, H. P., and MATHEWS, M. V. (1956). "A linear coding for transmitting a set of correlated signals." IRE Trans. Inf. Theo. IT-2:41-46.
KRAMER, K. H. (1963). "Tables for constructing confidence limits on the multiple correlation coefficient." J. Amer. Statist. Assoc. 58:1082-1085.
KRISHNAIAH, P. R., and WAIKAR, V. B. (1970). Exact joint distributions of few roots of a class of random matrices. Report ARL 70-0345. Aerospace Res. Labs.
KROMER, R. E. (1969). Asymptotic properties of the autoregressive spectral estimator. Ph.D. Thesis, Stanford University.
KSHIRSAGAR, A. M. (1961). "Some extensions of the multivariate t-distribution and the multivariate generalization of the distribution of the regression coefficient." Proc. Camb. Philos. Soc. 57:80-85.
KSHIRSAGAR, A. M. (1971). "Goodness of fit of a discriminant function from the vector space of dummy variables." J. Roy. Statist. Soc., B. 33:111-116.
KUHN, H. G. (1962). Atomic Spectra. London: Longmans.
KUO, F. F., and KAISER, J. F. (1966). System Analysis by Digital Computer. New York: Wiley.
LABROUSTE, M. H. (1934). "L'analyse des séismogrammes." Mémorial des Sciences Physiques, Vol. 26. Paris: Gauthier-Villars.
LAMPERTI, J. (1962). "On convergence of stochastic processes." Trans. Amer. Math. Soc. 104:430-435.
LANCASTER, H. O. (1966). "Kolmogorov's remark on the Hotelling canonical correlations." Biometrika. 53:585-588.
LANCZOS, C. (1955). "Spectroscopic eigenvalue analysis." J. Wash. Acad. Sci. 45:315-323.
LANCZOS, C. (1956). Applied Analysis. Englewood Cliffs: Prentice-Hall.
LATHAM, G., et al. (1970). "Seismic data from man-made impacts on the moon." Science. 170:620-626.
LAUBSCHER, N. F. (1960). "Normalizing the noncentral t and F distributions." Ann. Math. Statist. 31:1105-1112.
LAWLEY, D. N. (1959). "Tests of significance in canonical analysis." Biometrika. 46:59-66.
LEE, Y. W. (1960). Statistical Theory of Communication. New York: Wiley.
LEE, Y. W., and WIESNER, J. B. (1950). "Correlation functions and communication applications." Electronics. 23:86-92.
LEONOV, V. P. (1960). "The use of the characteristic functional and semi-invariants in the ergodic theory of stationary processes." Soviet Math. 1:878-881.
LEONOV, V. P. (1964). Some Applications of Higher-order Semi-invariants to the Theory of Stationary Random Processes (in Russian). Moscow: Izdatelstvo Nauka.
LEONOV, V. P., and SHIRYAEV, A. N. (1959). "On a method of calculation of semi-invariants." Theory Prob. Appl. 4:319-329.
LEONOV, V. P., and SHIRYAEV, A. N. (1960). "Some problems in the spectral theory of higher moments, II." Theory Prob. Appl. 5:460-464.
LEPPINK, G. J. (1970). "Efficient estimators in spectral analysis." In Proc. Twelfth Biennial Seminar Can. Math. Cong., Ed. R. Pyke, pp. 83-87. Montreal: Can. Math. Cong.
LÉVY, P. (1933). "Sur la convergence absolue des séries de Fourier." C. R. Acad. Sci. Paris. 196:463-464.
LEWIS, F. A. (1939). "Problem 3824." Amer. Math. Monthly. 46:304-305.
LIGHTHILL, M. J. (1958). An Introduction to Fourier Analysis and Generalized Functions. Cambridge: Cambridge Univ. Press.
LOEVE, M. (1963). Probability Theory. Princeton: Van Nostrand.
LOMNICKI, Z. A., and ZAREMBA, S. K. (1957a). "On estimating the spectral density function of a stochastic process." J. Roy. Statist. Soc., B. 19:13-37.
LOMNICKI, Z. A., and ZAREMBA, S. K. (1957b). "On some moments and distributions occurring in the theory of linear stochastic processes, I." Mh. Math. 61:318-358.
LOMNICKI, Z. A., and ZAREMBA, S. K. (1959). "On some moments and distributions occurring in the theory of linear stochastic processes, II." Mh. Math. 63:128-168.
LOYNES, R. M. (1968). "On the concept of the spectrum for non-stationary processes." J. Roy. Statist. Soc., B. 30:1-30.
MACDONALD, N. J., and WARD, F. (1963). "The prediction of geomagnetic disturbance indices. 1. The elimination of internally predictable variations." J. Geophys. Res. 68:3351-3373.
MACDUFFEE, C. C. (1946). The Theory of Matrices. New York: Chelsea.
MACNEIL, I. B. (1971). "Limit processes for co-spectral and quadrature spectral distribution functions." Ann. Math. Statist. 42:81-96.
MADANSKY, A., and OLKIN, I. (1969). "Approximate confidence regions for constraint parameters." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 261-286. New York: Academic.
MADDEN, T. (1964). "Spectral, cross-spectral and bispectral analysis of low frequency electromagnetic data." In Natural Electromagnetic Phenomena Below 30 kc/s, Ed. D. F. Bleil, pp. 429-450. New York: Wiley.
MAJEWSKI, W., and HOLLIEN, H. (1967). "Formant frequency regions of Polish vowels." J. Acoust. Soc. Amer. 42:1031-1037.
MALEVICH, T. L. (1964). "The asymptotic behavior of an estimate for the spectral function of a stationary Gaussian process." Theory Prob. Appl. 9:350-353.
MALEVICH, T. L. (1965). "Some properties of the estimators of the spectrum of a stationary process." Theory Prob. Appl. 10:447-465.
MALINVAUD, E. (1964). Statistical Methods of Econometrics. Amsterdam: North-Holland.
MALLOWS, C. L. (1961). "Latent vectors of random symmetric matrices." Biometrika. 48:133-149.
MANN, H. B., and WALD, A. (1943a). "On stochastic limit and order relationships." Ann. Math. Statist. 14:217-226.
MANN, H. B., and WALD, A. (1943b). "On the statistical treatment of linear stochastic difference equations." Econometrica. 11:173-220.
MANWELL, T., and SIMON, M. (1966). "Spectral density of the possibly random fluctuations of 3 C 273." Nature. 212:1224-1225.
MARUYAMA, G. (1949). "The harmonic analysis of stationary stochastic processes." Mem. Fac. Sci. Kyusyu Univ. Ser. A. 4:45-106.
MATHEWS, M. V. (1963). "Signal detection models for human auditory perception." In Time Series Analysis, Ed. M. Rosenblatt, pp. 349-361. New York: Wiley.
MCGUCKEN, W. (1970). Nineteenth Century Spectroscopy. Baltimore: Johns Hopkins.
MCNEIL, D. R. (1967). "Estimating the covariance and spectral density functions from a clipped stationary time series." J. Roy. Statist. Soc., B. 29:180-195.
MCSHANE, E. J. (1963). "Integrals devised for special purposes." Bull. Amer. Math. Soc. 69:597-627.
MEDGYESSY, P. (1961). Decomposition of Superpositions of Distribution Functions. Budapest: Hungar. Acad. Sci.
MEECHAM, W. C. (1969). "Stochastic representation of nearly-Gaussian nonlinear processes." J. Statist. Physics. 1:25-40.
MEECHAM, W. C., and SIEGEL, A. (1964). "Wiener-Hermite expansion in model turbulence at large Reynolds numbers." Physics Fluids. 7:1178-1190.
MEGGERS, W. F. (1946). "Spectroscopy, past, present and future." J. Opt. Soc. Amer. 36:431-448.
MIDDLETON, D. (1960). Statistical Communication Theory. New York: McGraw-Hill.
MILLER, K. S. (1968). "Moments of complex Gaussian processes." Proc. IEEE. 56:83-84.
MILLER, K. S. (1969). "Complex Gaussian processes." SIAM Rev. 11:544-567.
MILLER, R. G. (1966). Simultaneous Statistical Inference. New York: McGraw-Hill.
MIYATA, M. (1970). "Complex generalization of canonical correlation and its application to sea level study." J. Marine Res. 28:202-214.
MOORE, C. N. (1966). Summable Series and Convergence Factors. New York: Dover.
MORAN, J. M., et al. (1968). "The 18-cm flux of the unresolved component of 3 C 273." Astrophysical J. 151:L99-L101.
MORRISON, D. F. (1967). Multivariate Statistical Methods. New York: McGraw-Hill.
MORTENSEN, R. E. (1969). "Mathematical problems of modeling stochastic non-linear dynamic systems." J. Statist. Physics. 1:271-296.
MUNK, W. H., and CARTWRIGHT, D. E. (1966). "Tidal spectroscopy and prediction." Phil. Trans., A. 259:533-581.
MUNK, W. H., and MACDONALD, G. J. F. (1960). The Rotation of the Earth. Cambridge: Cambridge Univ. Press.
MUNK, W. H., and SNODGRASS, F. E. (1957). "Measurements of southern swell at Guadalupe Island." Deep-Sea Research. 4:272-286.
MURTHY, V. K. (1963). "Estimation of the cross-spectrum." Ann. Math. Statist. 34:1012-1021.
NAKAMURA, I. (1964). "Relation between superelevation and car rolling." Ann. Inst. Stat. Math., Suppl. 3:41-48.
NAKAMURA, H., and MURAKAMI, S. (1964). "Resonance characteristic of the hydraulic system of a water power plant." Ann. Inst. Stat. Math., Suppl. 3:65-70.
NAYLOR, T. H., WALLACE, W. H., and SASSER, W. E. (1967). "A computer simulation model of the textile industry." J. Amer. Stat. Assoc. 62:1338-1364.
NERLOVE, M. (1964). "Spectral analysis of seasonal adjustment procedures." Econometrica. 32:241-286.
NETTHEIM, N. (1966). The estimation of coherence. Technical Report, Statistics Department, Stanford University.
NEUDECKER, H. (1968). "The Kronecker matrix product and some of its applications in econometrics." Statistica Neerlandica. 22:69-82.
NEWTON, H. W. (1958). The Face of the Sun. London: Penguin.
NICHOLLS, D. F. (1967). "Estimation of the spectral density function when testing for a jump in the spectrum." Austral. J. Statist. 9:103-108.
NISIO, M. (1960). "On polynomial approximation for strictly stationary processes." J. Math. Soc. Japan. 12:207-226.
NISIO, M. (1961). "Remarks on the canonical representation of strictly stationary processes." J. Math. Kyoto. 1:129-146.
NISSEN, D. H. (1968). "A note on the variance of a matrix." Econometrica. 36:603-604.
NOLL, A. M. (1964). "Short-time spectrum and 'cepstrum' techniques for vocal-pitch detection." J. Acoust. Soc. Amer. 36:296-302.
OBUKHOV, A. M. (1938). "Normally correlated vectors." Izv. Akad. Nauk SSR, Section on Mathematics. 3:339-370.
OBUKHOV, A. M. (1940). "Correlation theory of vectors." Uchen. Zap. Moscow State Univ. Mathematics Section. 45:73-92.
OCEAN WAVE SPECTRA (1963). National Academy of Sciences. Englewood Cliffs: Prentice-Hall.
OKAMOTO, M. (1969). "Optimality of principal components." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 673-686. New York: Academic.
OKAMOTO, M., and KANAZAWA, M. (1968). "Minimization of eigenvalues of a matrix and optimality of principal components." Ann. Math. Statist. 39:859-863.
OLKIN, I., and PRATT, J. W. (1958). "Unbiased estimation of certain correlation coefficients." Ann. Math. Statist. 29:201-210.
OLSHEN, R. A. (1967). "Asymptotic properties of the periodogram of a discrete stationary process." J. Appl. Prob. 4:508-528.
OSWALD, J. R. V. (1956). "Theory of analytic bandlimited signals applied to carrier systems." IRE Trans. Circuit Theory. CT-3:244-251.
PANOFSKY, H. A. (1967). "Meteorological applications of cross-spectrum analysis." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 109-132. New York: Wiley.
PAPOULIS, A. (1962). The Fourier Integral and its Applications. New York: McGraw-Hill.
PARTHASARATHY, K. R. (1960). "On the estimation of the spectrum of a stationary stochastic process." Ann. Math. Statist. 31:568-573.
PARTHASARATHY, K. R., and VARADHAN, S. R. S. (1964). "Extension of stationary stochastic processes." Theory Prob. Appl. 9:65-71.
PARZEN, E. (1957). "On consistent estimates of the spectrum of a stationary time series." Ann. Math. Statist. 28:329-348.
PARZEN, E. (1958). "On asymptotically efficient consistent estimates of the spectral density function of a stationary time series." J. Roy. Statist. Soc., B. 20:303-322.
PARZEN, E. (1961). "Mathematical considerations in the estimation of spectra." Technometrics. 3:167-190.
PARZEN, E. (1963a). "On spectral analysis with missing observations and amplitude modulation." Sankhya. A. 25:180-189.
PARZEN, E. (1963b). "Notes on Fourier analysis and spectral windows." Included in Parzen (1967a).
PARZEN, E. (1963c). "Probability density functionals and reproducing kernel Hilbert spaces." In Time Series Analysis, Ed. M. Rosenblatt, pp. 155-169. New York: Wiley.
PARZEN, E. (1964). "An approach to empirical time series analysis." Radio Science. 68D:937-951.
PARZEN, E. (1967a). Time Series Analysis Papers. San Francisco: Holden-Day.
PARZEN, E. (1967b). "Time series analysis for models of signals plus white noise." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 233-257. New York: Wiley.
PARZEN, E. (1967c). "On empirical multiple time series analysis." In Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1, Eds. L. Le Cam and J. Neyman, pp. 305-340. Berkeley: Univ. of Cal. Press.
PARZEN, E. (1969). "Multiple time series modelling." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 389-409. New York: Academic.
PEARSON, E. S., and HARTLEY, H. O. (1951). "Charts of the power function for analysis of variance tests derived from the non-central F distribution." Biometrika. 38:112-130.
PEARSON, K., and FILON, L. N. G. (1898). "Mathematical contributions to the theory of evolution. IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation." Phil. Trans., A. 191:229-311.
PEARSON, K., JEFFERY, G. B., and ELDERTON, E. M. (1929). "On the coefficient of the first product moment coefficient in samples drawn from an indefinitely large normal population." Biometrika. 21:164-201.
PHILIPP, W. (1967). "Das Gesetz vom iterierten Logarithmus für stark mischende stationäre Prozesse." Zeit. Wahrschein. 8:204-209.
PHILIPP, W. (1969). "The central limit problem for mixing sequences of random variables." Zeit. Wahrschein. verw. Gebiet. 12:155-171.
PICINBONO, B. (1959). "Tendance vers le caractère gaussien par filtrage sélectif." C. R. Acad. Sci. Paris. 248:2280.
PICKLANDS, J. (1970). "Spectral estimation with random truncation." Ann. Math. Statist. 41:44-58.
PINSKER, M. S. (1964). Information and Information Stability of Random Variables and Processes. San Francisco: Holden-Day.
PISARENKO, V. F. (1970). "Statistical estimates of amplitude and phase corrections." Geophys. J. Roy. Astron. Soc. 20:89-98.
PISARENKO, V. F. (1972). "On the estimation of spectra by means of non-linear functions of the covariance matrix." Geophys. J. Roy. Astron. Soc. 28:511-531.
PLAGEMANN, S. H., FELDMAN, V. A., and GRIBBIN, J. R. (1969). "Power spectrum analysis of the emission-line redshift distribution of quasi-stellar and related objects." Nature. 224:875-876.
POLYA, G., and SZEGO, G. (1925). Aufgaben und Lehrsätze aus der Analysis I. Berlin: Springer.
PORTMANN, W. O. (1960). "Hausdorff-analytic functions of matrices." Proc. Amer. Math. Soc. 11:97-101.
POSNER, E. C. (1968). "Combinatorial structures in planetary reconnaissance." In Error Correcting Codes, Ed. H. B. Mann, pp. 15-47. New York: Wiley.
PRESS, H., and TUKEY, J. W. (1956). Power spectral methods of analysis and their application to problems in airplane dynamics. Bell Telephone System Monograph 2606.
PRIESTLEY, M. B. (1962a). "Basic considerations in the estimation of spectra." Technometrics. 4:551-564.
PRIESTLEY, M. B. (1962b). "The analysis of stationary processes with mixed spectra." J. Roy. Statist. Soc., B. 24:511-529.
PRIESTLEY, M. B. (1964). "Estimation of the spectral density function in the presence of harmonic components." J. Roy. Statist. Soc., B. 26:123-132.
PRIESTLEY, M. B. (1965). "Evolutionary spectra and non-stationary processes." J. Roy. Statist. Soc., B. 27:204-237.
PRIESTLEY, M. B. (1969). "Estimation of transfer functions in closed loop stochastic systems." Automatica. 5:623-632.
PUPIN, M. I. (1894). "Resonance analysis of alternating and polyphase currents." Trans. A.I.E.E. 9:523.
QUENOUILLE, M. H. (1957). The Analysis of Multiple Time Series. London: Griffin.
RAO, C. R. (1964). "The use and interpretation of principal component analysis in applied research." Sankhya, A. 26:329-358.
RAO, C. R. (1965). Linear Statistical Inference and Its Applications. New York: Wiley.
RAO, M. M. (1960). "Estimation by periodogram." Trabajos Estadistica. 11:123-137.
RAO, M. M. (1963). "Inference in stochastic processes, I." Teor. Verojatnost. i Primenen. 8:282-298.
RAO, M. M. (1966). "Inference in stochastic processes, II." Zeit. Wahrschein. 5:317-335.
RAO, S. T. (1967). "On the cross-periodogram of a stationary Gaussian vector process." Ann. Math. Statist. 38:593-597.
RICHTER, C. P. (1967). "Biological clocks in medicine and psychiatry." Proc. Nat. Acad. Sci. 46:1506-1530.
RICKER, N. (1940). "The form and nature of seismic waves and the structure of seismograms." Geophysics. 5:348-366.
RIESZ, F., and NAGY, B. Sz. (1955). Lessons in Functional Analysis. New York: Ungar.
ROBERTS, J. B., and BISHOP, R. E. D. (1965). "A simple illustration of spectral density analysis." J. Sound Vib. 2:37-41.
ROBINSON, E. A. (1967a). Multichannel Time Series Analysis with Digital Computer Programs. San Francisco: Holden-Day.
ROBINSON, E. A. (1967b). Statistical Communication and Detection with Special Reference to Digital Data Processing of Radar and Seismic Signals. London: Griffin.
RODEMICH, E. R. (1966). "Spectral estimates using nonlinear functions." Ann. Math. Statist. 37:1237-1256.
RODRIGUEZ-ITURBE, I., and YEVJEVICH, V. (1968). The investigation of relationship between hydrologic time series and sunspot numbers. Hydrology Paper No. 26. Fort Collins: Colorado State University.
ROOT, W. L., and PITCHER, T. S. (1955). "On the Fourier expansion of random functions." Ann. Math. Statist. 26:313-318.
ROSENBERG, M. (1964). "The square-integrability of matrix-valued functions with respect to a non-negative Hermitian measure." Duke Math. J. 31:291-298.
ROSENBLATT, M. (1956a). "On estimation of regression coefficients of a vector-valued time series with a stationary disturbance." Ann. Math. Statist. 27:99-121.
ROSENBLATT, M. (1956b). "On some regression problems in time series analysis." In Proc. Third Berkeley Symp. Math. Statist. Prob., Vol. 1, Ed. J. Neyman, pp. 165-186. Berkeley: Univ. of Cal. Press.
ROSENBLATT, M. (1956c). "A central limit theorem and a strong mixing condition." Proc. Nat. Acad. Sci. (U.S.A.). 42:43-47.
ROSENBLATT, M. (1959). "Statistical analysis of stochastic processes with stationary residuals." In Probability and Statistics, Ed. U. Grenander, pp. 246-275. New York: Wiley.
ROSENBLATT, M. (1960). "Asymptotic distribution of the eigenvalues of block Toeplitz matrices." Bull. Amer. Math. Soc. 66:320-321.
ROSENBLATT, M. (1961). "Some comments on narrow band-pass filters." Quart. Appl. Math. 18:387-393.
ROSENBLATT, M. (1962). "Asymptotic behavior of eigenvalues of Toeplitz forms." J. Math. Mech. 11:941-950.
ROSENBLATT, M. (1964). "Some nonlinear problems arising in the study of random processes." Radio Science. 68D:933-936.
ROSENBLATT, M., and VAN NESS, J. S. (1965). "Estimation of the bispectrum." Ann. Math. Statist. 36:1120-1136.
ROZANOV, Yu. A. (1967). Stationary Random Processes. San Francisco: Holden-Day.
SALEM, R., and ZYGMUND, A. (1956). "A note on random trigonometric polynomials." In Proc. Third Berkeley Symp. Math. Statist. Prob., Ed. J. Neyman, pp. 243-246. Berkeley: Univ. of Cal. Press.
SARGENT, T. J. (1968). "Interest rates in the nineteen-fifties." Rev. Econ. Stat. 50:164-172.
SATO, H. (1964). "The measurement of transfer characteristic of ground-structure systems using micro tremor." Ann. Inst. Stat. Math., Suppl. 3:71-78.
SATTERTHWAITE, F. E. (1941). "Synthesis of variance." Psychometrika. 6:309-316.
SAXENA, A. K. (1969). "Classification into two multivariate complex normal distributions with different covariance matrices." J. Ind. Statist. Assoc. 7:158-161.
SCHEFFE, H. (1959). The Analysis of Variance. New York: Wiley.
SCHOENBERG, I. J. (1946). "Contributions to the problem of approximation of equidistant data by analytic functions." Quart. Appl. Math. 4:45-87, 112-141.
SCHOENBERG, I. J. (1950). "The finite Fourier series and elementary geometry." Amer. Math. Monthly. 57:390-404.
SCHUSTER, A. (1894). "On interference phenomena." Phil. Mag. 37:509-545.
SCHUSTER, A. (1897). "On lunar and solar periodicities of earthquakes." Proc. Roy. Soc. 61:455-465.
SCHUSTER, A. (1898). "On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena." Terr. Magn. 3:13-41.
SCHUSTER, A. (1900). "The periodogram of magnetic declination as obtained from the records of the Greenwich Observatory during the years 1871-1895." Camb. Phil. Trans. 18:107-135.
SCHUSTER, A. (1904). The Theory of Optics. London: Cambridge Univ. Press.
SCHUSTER, A. (1906a). "The periodogram and its optical analogy." Proc. Roy. Soc. 77:137-140.
SCHUSTER, A. (1906b). "On the periodicities of sunspots." Philos. Trans. Roy. Soc., A. 206:69-100.
SCHWARTZ, L. (1957). Théorie des Distributions, Vol. I. Paris: Hermann.
SCHWARTZ, L. (1959). Théorie des Distributions, Vol. II. Paris: Hermann.
SCHWERDTFEGER, H. (1960). "Direct proof of Lanczos's decomposition theorem." Amer. Math. Mon. 67:856-860.
SEARS, F. W. (1949). Optics. Reading: Addison-Wesley.
SHAPIRO, H. S. (1969). Smoothing and Approximation of Functions. New York: Van Nostrand.
SHIRYAEV, A. N. (1960). "Some problems in the spectral theory of higher-order moments, I." Theory Prob. Appl. 5:265-284.
SHIRYAEV, A. N. (1963). "On conditions for ergodicity of stationary processes in terms of higher order moments." Theory Prob. Appl. 8:436-439.
SHUMWAY, R. H. (1971). "On detecting a signal in N stationarily correlated noise series." Technometrics. 13:499-519.
SIMPSON, S. M. (1966). Time Series Computations in FORTRAN and FAP. Reading: Addison-Wesley.
SINGLETON, R. C. (1969). "An algorithm for computing the mixed radix fast Fourier transform." IEEE Trans. Audio Elec. AU-17:93-103.
SINGLETON, R. C., and POULTER, T. C. (1967). "Spectral analysis of the call of the male killer whale." IEEE Trans. on Audio and Electroacoustics. AU-15:104-113.
SIOTANI, M. (1967). "Some applications of Loewner's ordering of symmetric matrices." Ann. Inst. Statist. Math. 19:245-259.
SKOROKHOD, A. V. (1956). "Limit theorems for stochastic processes." Theory Prob. Appl. 1:261-290.
SLEPIAN, D. (1954). "Estimation of signal parameters in the presence of noise." Trans. I.R.E. PGIT-3:82-87.
SLEPIAN, D. (1958). "Fluctuations of random noise power." Bell Syst. Tech. J. 37:163-184.
SLUTSKY, E. (1929). "Sur l'extension de la théorie de périodogrammes aux suites des quantités dépendantes." Comptes Rendus. 189:722-733.
SLUTSKY, E. (1934). "Alcuni applicazioni di coefficienti di Fourier al analizo di sequenze eventuali coerenti stazionarii." Giorn. d. Istituto Italiano degli Attuari. 5:435-482.
SMITH, E. J., HOLZER, R. E., MCLEOD, M. G., and RUSSELL, C. T. (1967). "Magnetic noise in the magnetosheath in the frequency range 3-300 Hz." J. Geophys. Res. 72:4803-4813.
SOLODOVNIKOV, V. V. (1960). Introduction to the Statistical Dynamics of Automatic Control Systems. New York: Dover.
SRIVASTAVA, M. S. (1965). "On the complex Wishart distribution." Ann. Math. Statist. 36:313-315.
STIGUM, B. P. (1967). "A decision theoretic approach to time series analysis." Ann. Inst. Statist. Math. 19:207-243.
STOCKHAM, T. G., Jr. (1966). "High speed convolution and correlation." Proc. Spring Joint Comput. Conf. 28:229-233.
STOKES, G. G. (1879). Proc. Roy. Soc. 122:303.
STONE, R. (1947). "On the interdependence of blocks of transactions." J. Roy. Statist. Soc., B. 9:1-32.
STRIEBEL, C. (1959). "Densities for stochastic processes." Ann. Math. Statist. 30:559-567.
STUMPFF, K. (1937). Grundlagen und Methoden der Periodenforschung. Berlin: Springer.
STUMPFF, K. (1939). Tafeln und Aufgaben zur Harmonischen Analyse und Periodogrammrechnung. Berlin: Springer.
SUGIYAMA, G. (1966). "On the distribution of the largest latent root and corresponding latent vector for principal component analysis." Ann. Math. Statist. 37:995-1001.
SUHARA, K., and SUZUKI, H. (1964). "Some results of EEG analysis by analog type analyzers and finer examinations by a digital computer." Ann. Inst. Statist. Math., Suppl. 3:89-98.
TAKEDA, S. (1964). "Experimental studies on the airplane response to the side gusts." Ann. Inst. Statist. Math., Suppl. 3:59-64.
TATE, R. F. (1966). "Conditional-normal regression models." J. Amer. Statist. Assoc. 61:477-489.
TICK, L. J. (1963). "Conditional spectra, linear systems and coherency." In Time Series Analysis, Ed. M. Rosenblatt, pp. 197-203. New York: Wiley.
TICK, L. J. (1966). "Letter to the Editor." Technometrics. 8:559-561.
TICK, L. J. (1967). "Estimation of coherency." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 133-152. New York: Wiley.
TIMAN, M. F. (1962). "Some linear summation processes for the summation of Fourier series and best approximation." Soviet Math. 3:1102-1105.
TIMAN, A. F. (1963). Theory of Approximation of Functions of a Real Variable. New York: Macmillan.
TUKEY, J. W. (1949). "The sampling theory of power spectrum estimates." Proc. on Applications of Autocorrelation Analysis to Physical Problems. NAVEXOS-P-735, pp. 47-67. Washington, D.C.: Office of Naval Research, Dept. of the Navy.
TUKEY, J. W. (1959a). "An introduction to the measurement of spectra." In Probability and Statistics, Ed. U. Grenander, pp. 300-330. New York: Wiley.
TUKEY, J. W. (1959b). "The estimation of power spectra and related quantities." In On Numerical Approximation, pp. 389-411. Madison: Univ. of Wisconsin Press.
TUKEY, J. W. (1959c). "Equalization and pulse shaping techniques applied to the determination of initial sense of Rayleigh waves." In The Need of Fundamental Research in Seismology, Appendix 9, pp. 60-129. Washington: U.S. Department of State.
TUKEY, J. W. (1961). "Discussion, emphasizing the connection between analysis of variance and spectrum analysis." Technometrics. 3:1-29.
TUKEY, J. W. (1965a). "Uses of numerical spectrum analysis in geophysics." Bull. I.S.I. 35 Session. 267-307.
TUKEY, J. W. (1965b). "Data analysis and the frontiers of geophysics." Science. 148:1283-1289.
TUKEY, J. W. (1967). "An introduction to the calculations of numerical spectrum analysis." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 25-46. New York: Wiley.
TUMURA, Y. (1965). "The distributions of latent roots and vectors." TRU Mathematics. 1:1-16.
VAN DER POL, B. (1930). "Frequency modulation." Proc. Inst. Radio Eng. 18:227.
VARIOUS AUTHORS (1966). "A discussion on recent advances in the technique of seismic recording and analysis." Proc. Roy. Soc. 290:288-476.
VOLTERRA, V. (1959). Theory of Functionals and of Integral and Integro-differential Equations. New York: Dover.
VON MISES, R. (1964). Mathematical Theory of Probability and Statistics. New York: Academic.
VON MISES, R., and DOOB, J. L. (1941). "Discussion of papers on probability theory." Ann. Math. Statist. 12:215-217.
WAHBA, G. (1966). Cross spectral distribution theory for mixed spectra and estimation of prediction filter coefficients. Ph.D. Thesis, Stanford University.
WAHBA, G. (1968). "On the distribution of some statistics useful in the analysis of jointly stationary time series." Ann. Math. Statist. 39:1849-1862.
WAHBA, G. (1969). "Estimation of the coefficients in a distributed lag model." Econometrica. 37:398-407.
WALDMEIER, M. (1961). The Sunspot Activity in the Years 1610-1960. Zurich: Schulthess.
WALKER, A. M. (1954). "The asymptotic distribution of serial correlation coefficients for autoregressive processes with dependent residuals." Proc. Camb. Philos. Soc. 50:60-64.
WALKER, A. M. (1965). "Some asymptotic results for the periodogram of a stationary time series." J. Austral. Math. Soc. 5:107-128.
WALKER, A. M. (1971). "On the estimation of a harmonic component in a time series with stationary residuals." Biometrika. 58:21-36.
WEDDERBURN, J. H. M. (1934). Lectures on Matrices. New York: Amer. Math. Soc.
WEGEL, R. L., and MOORE, C. R. (1924). "An electrical frequency analyzer." Bell Syst. Tech. J. 3:299-323.
WELCH, P. D. (1961). "A direct digital method of power spectrum estimation." IBM J. Res. Dev. 5:141-156.
WELCH, P. D. (1967). "The use of the fast Fourier transform for estimation of spectra: a method based on time averaging over short, modified periodograms." IEEE Trans. Electr. Acoust. AU-15:70.
WEYL, H. (1946). Classical Groups. Princeton: Princeton Univ. Press.
WHITTAKER, E. T., and ROBINSON, G. (1944). The Calculus of Observations. Cambridge: Cambridge Univ. Press.
WHITTLE, P. (1951). Hypothesis Testing in Time Series Analysis. Uppsala: Almqvist.
WHITTLE, P. (1952a). "Some results in time series analysis." Skand. Aktuar. 35:48-60.
WHITTLE, P. (1952b). "The simultaneous estimation of a time series' harmonic and covariance structure." Trab. Estad. 3:43-57.
WHITTLE, P. (1953). "The analysis of multiple stationary time series." J. Roy. Statist. Soc., B. 15:125-139.
WHITTLE, P. (1954). "A statistical investigation of sunspot observations with special reference to H. Alfvén's sunspot model." Astrophys. J. 120:251-260.
WHITTLE, P. (1959). "Sur la distribution du maximum d'un polynôme trigonométrique à coefficients aléatoires." Colloques Internationaux du Centre National de la Recherche Scientifique. 87:173-184.
WHITTLE, P. (1961). "Gaussian estimation in stationary time series." Bull. Int. Statist. Inst. 39:105-130.
WHITTLE, P. (1963a). Prediction and Regulation. London: English Universities Press.
WHITTLE, P. (1963b). "On the fitting of multivariate auto-regressions and the approximate canonical factorization of a spectral density matrix." Biometrika. 50:129-134.
WIDOM, H. (1965). "Toeplitz matrices." In Studies in Real and Complex Analysis, Ed. I. I. Hirschman, Jr., pp. 179-209. Englewood Cliffs: Prentice-Hall.
WIENER, N. (1930). "Generalized harmonic analysis." Acta Math. 55:117-258.
WIENER, N. (1933). The Fourier Integral and Certain of its Applications. Cambridge: Cambridge Univ. Press.
WIENER, N. (1938). "The historical background of harmonic analysis." Amer. Math. Soc. Semicentennial Pub. 2:56-68.
WIENER, N. (1949). The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications. New York: Wiley.
WIENER, N. (1953). "Optics and the theory of stochastic processes." J. Opt. Soc. Amer. 43:225-228.
WIENER, N. (1957). "Rhythms in physiology with particular reference to encephalography." Proc. Rud. Virchow Med. Soc. in New York. 16:109-124.
WIENER, N. (1958). Non-linear Problems in Random Theory. Cambridge: MIT Press.
WIENER, N., SIEGEL, A., RANKIN, B., and MARTIN, W. T. (1967). Differential Space, Quantum Systems and Prediction. Cambridge: MIT Press.
WIENER, N., and WINTNER, A. (1941). "On the ergodic dynamics of almost periodic systems." Amer. J. Math. 63:794-824.
WILK, M. B., GNANADESIKAN, R., and HUYETT, M. J. (1962). "Probability plots for the gamma distribution." Technometrics. 4:1-20.
WILKINS, J. E. (1948). "A note on the general summability of functions." Ann. Math. 49:189-199.
WILKINSON, J. H. (1965). The Algebraic Eigenvalue Problem. Oxford: Oxford Univ. Press.
WILLIAMS, E. J. (1967). "The analysis of association among many variates." J. Roy. Statist. Soc., B. 29:199-242.
WINTNER, A. (1932). "Remarks on the ergodic theorem of Birkhoff." Proc. Nat. Acad. Sci. (U.S.A.). 18:248-251.
WISHART, J. (1931). "The mean and second moment coefficient of the multiple correlation coefficient in samples from a normal population." Biometrika. 22:353-361.
WISHART, J., and BARTLETT, M. S. (1932). "The distribution of second order moment statistics in a normal system." Proc. Camb. Philos. Soc. 28:455-459.
WOLD, H. O. A. (1948). "On prediction in stationary time series." Ann. Math. Statist. 19:558-567.
WOLD, H. O. A. (1954). A Study in the Analysis of Stationary Time Series, 2nd ed. Uppsala: Almqvist and Wiksells.
WOLD, H. O. A. (1963). "Forecasting by the chain principle." In Time Series Analysis, Ed. M. Rosenblatt, pp. 471-497. New York: Wiley.
WOLD, H. O. A. (1965). Bibliography on Time Series and Stochastic Processes. London: Oliver and Boyd.
WONG, E. (1964). "The construction of a class of stationary Markov processes." Proc. Symp. Applied Math. 16:264-276. Providence: Amer. Math. Soc.
WOOD, L. C. (1968). "A review of digital pass filtering." Rev. Geophysics. 6:73-98.
WOODING, R. A. (1956). "The multivariate distribution of complex normal variates." Biometrika. 43:212-215.
WOODROOFE, M. B., and VAN NESS, J. W. (1967). "The maximum deviation of sample spectral densities." Ann. Math. Statist. 38:1558-1570.
WORLD WEATHER RECORDS. Smithsonian Miscellaneous Collections, Vol. 79 (1927), Vol. 90 (1934), Vol. 105 (1947). Smithsonian Inst., Washington.
WORLD WEATHER RECORDS. 1941-1950 (1959) and 1951-1960 (1965). U.S. Weather Bureau, Washington, D.C.
WRIGHT, W. D. (1906). The Measurement of Colour. New York: Macmillan.
YAGLOM, A. M. (1962). An Introduction to the Theory of Stationary Random Functions. Englewood Cliffs: Prentice-Hall.
YAGLOM, A. M. (1965). "Stationary Gaussian processes satisfying the strong mixing condition and best predictable functional." In Bernoulli, Bayes, Laplace, Ed. J. Neyman and L. M. LeCam, pp. 241-252. New York: Springer.
YAMANOUCHI, Y. (1961). "On the analysis of the ship oscillations among waves—I, II, III." J. Soc. Naval Arch. (Japan). 109:169-183; 110:19-29; 111:103-115.
YULE, G. U. (1927). "On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers." Phil. Trans. Roy. Soc., A. 226:267-298.
YUZURIHA, T. (1960). "The autocorrelation curves of schizophrenic brain waves and the power spectra." Psych. Neurol. Jap. 62:911-924.
ZYGMUND, A. (1959). Trigonometric Series. Cambridge: Cambridge Univ. Press.
ZYGMUND, A. (1968). Trigonometric Series, Vols. I, II. Cambridge: Cambridge Univ. Press.
NOTATION INDEX
a(u), 29
a^(T)(u), 317
A(λ), 29, 296
A^(T)(λ), 300, 305, 307, 323
A_T(λ), 143
A_m^T(λ), 132, 244
ave X, 16
ave, 199
arg z, 17

B_T, 146
B_T(λ), 143
B_m^T(λ), 133, 244

c_0, 22
c_ab^(T), 236, 260
c_X, 94, 116, 232
c_X^(T), 83, 123, 129, 150, 160, 183, 184
c_ab(u), 22, 42
c_XX(u), 23, 24, 116, 232
c_XX^(T)(u), 161, 182, 256
c̃_XX^(T)(u), 167, 168
c_XX^(T)(u, t), 165
c_ab^(T)(u), 256
C_T(λ), 143
C_m^T(λ), 133, 244
cov, 203
cov {X, Y}, 16, 19, 90
cov {X, Y}, 22
cor {X, Y}, 16
cum (Y_1, ..., Y_r), 19
c_{a_1...a_k}(t_1, ..., t_{k-1}), 23, 92
c_{a_1...a_k}(t_1, ..., t_k), 21

D_n(α), 50, 162
D[0, π], 258
d_X^(T)(λ), 61, 91, 120, 123, 235
d_X^(V)(λ, t), 239
Det A, 16

EX, 16

f_YX(λ), 297
f_{a_1...a_k}(λ_1, ..., λ_{k-1}), 25, 92
f_{a_1...a_k}(λ_1, ..., λ_k), 26
f_a, 25
f_ab(λ), 23
f_ab^(T)(λ), 248, 261
f_XX(λ), 24, 116, 233
f_XX^(T)(λ), 132, 142, 146, 147, 150, 164, 242, 243, 248, 282
f_YX^(T)(λ), 194
f_XX^[ ](λ), 155
f̃_ab^(T)(λ), 256
F_XX(λ), 25, 166
F_XX^(T)(λ), 167, 168
F_{ν_1,ν_2}, 191

G(λ), 302, 325
g_XX(λ), 177
g_ab^(T)(λ), 195, 300

H^(T)(λ), 124
H_2^(T)(λ), 128
H_{a_1...a_k}^(T)(λ), 91

1, 8
I, 16
I_XX^(T)(λ), 120, 182
I_XX^(T)(λ), 235
I_XX^(T)(λ, t), 164
I_XX^(V)(λ, t), 239
Im z, 17
Im Z, 71

J^(T)(A), 167
J_ab(A), 254
J_ab^(T)(A), 255

k^(T)(u), 155
K^(T)(u), 155

lim, 98, 131

m_ab(u), 41, 68, 174
m_XX(u), 47, 80, 175
m_XX^(T)(u), 115, 181

N(μ, Σ), 89
N^c(μ, Σ), 89

o(·), 52
O(·), 52
O_p(1), 423
o_a.s.(1), 196

R_YX^2, 289
R̂_YX^2, 189, 291
|R_YX|^2, 293
|R̂_YX|^2, 294
R_ab(λ), 256
R_ab^(T)(λ), 257
R_YX(λ), 297
|R_YX(λ)|^2, 296
|R_YX^(T)(λ)|^2, 196, 198, 305, 307
R_ab·X(λ), 297
Re z, 17
Re Z, 71

sgn λ, 32, 165

T, translation, 28
b, 291
t^(T), 253
t_c, 294, 305
tr A, 16

V(t), 78
V_+(t), 76
vec, 230, 287
var X, 16, 19
vār, 149, 202

W^(T)(α), 146
W_ab^(T)(u), 248
W_r(n, Σ), 90
W_r^c(n, Σ), 90

X(t, ω), 104, 163

α_j(Z), 70
β_{j,u}, 56
δ(u), 17
δ{u}, 17
Δ^(T)(u), 86, 93
η(u), 17
η(α), 17, 26, 47, 101
μ_j(Z), 70, 84, 85, 287
ξ(λ), 86
φ_ab(λ), 24
φ(λ), 303
χ_ν^2, 126
χ_ν^2(α), 151
χ'_ν^2, 113, 127

⊗, Kronecker product, 230, 288
≥, 228, 287
≈, approximately congruent to, 120, 124
τ, transpose, 16, 70
¯, complex conjugate, 16, 70
| |, absolute value of a matrix, 16
[jk], matrix, 16, 70
*, convolution, 29
+, generalized inverse, 87
≡, congruent to, 17
[ ], integral part, 15, 84
R, real matrix, 71
| |, modulus, 17
‖ ‖, matrix norm, 74
H, Hilbert transform, 32, 105
^, periodic extension, 65, 66, 167
~, associated process, 42
AUTHOR INDEX
Abel, N. H., 15, 55
Abelson, R., 12
Abramowitz, M., 191, 291, 334
Aczél, J., 13
Aitken, A. C., 73
Akaike, H., 57, 128, 160, 164, 165, 172, 190, 191, 194, 207, 221, 226, 263, 266, 298, 301, 309, 317, 324, 330
Akcasu, A. Z., 164
Akhiezer, N. I., 55, 57
Albert, A., 174
Alberts, W. W., 12, 180
Alexander, M. J., 292, 317
Amos, D. E., 291, 295, 317
Anderson, G. A., 341
Anderson, T. W., 150, 340, 341, 372
Arato, M., 12
Arens, R., 79
Aschoff, J., 12
Autonne, L., 72
Balakrishnan, A. V., 38
Barlow, J. S., 12, 180
Bartels, J., 10
Bartlett, M. S., 10, 12, 55, 100, 113, 128, 142, 160, 161, 164, 170, 173, 283, 354
Bass, J., 81
Batchelor, G. K., 11
Baxter, G., 79
Bellman, R., 71, 84, 283, 287, 399, 458
Bendat, J. S., 208
Beranek, L. L., 11
Bergland, G. D., 64
Bernstein, S., 36, 445
Bertrand, J., 12, 180
Bertrandias, J. B., 82
Bessel, F. W., 113, 114, 229
Beveridge, W. H., 12, 179
Billingsley, P., 43, 258, 259, 421, 439
Bingham, C., 64, 66, 86, 120
Bishop, R. E. D., 118
Blackman, R. B., 55, 150, 179, 298
Blanc-Lapierre, A., 10, 26, 36, 163, 263
Bochner, S., 8, 47, 55, 76, 399, 401
Bode, H. W., 180
Bohman, H., 55, 57, 69
Bonferroni, C. E., 209
Borel, E., 406
Born, M., 11
Bowley, A. L., 338
Box, G. E. P., 13, 145, 166
Bracewell, R., 181
Brenner, J. L., 71
Brigham, E. O., 64
Brillinger, D. R., 5, 9, 12, 26, 38, 82, 94, 95, 110, 150, 160, 165, 172, 173, 176, 188, 194, 199, 226, 231, 238, 240, 245, 260, 263, 279, 324, 332, 341, 343, 348, 353, 368, 439, 449
Bryson, R. A., 181
Bullard, E., 180
Bunimovitch, V. I., 33
Burgers, J. M., 11
Burkhardt, H., 10
Burley, S. P., 12
Businger, P. A., 72
Butzer, P. L., 57
Cairns, T. W., 64
Calderón, A. P., 79
Cantelli, F. P., 407
Capon, J., 166
Cargo, G. T., 415
Carpenter, E. W., 1, 180
Cartwright, D. E., 180, 225
Cauchy, A. L., 55, 57, 395, 411
Chambers, J. M., 341, 374
Chance, B., 12
Chapman, S., 10
Chernoff, H., 96
Choksi, J. R., 38
Clevenson, M. L., 260
Condit, H. R., 180
Constantine, A. G., 374, 388
Cooley, J. W., 64, 66, 120, 164
Cootner, P. H., 12
Cornish, E. A., 341
Courant, R., 84, 399
Coveyou, R. R., 12
Craddock, J. M., 1, 109, 354
Cramér, H., 10, 18, 25, 41, 43, 100, 102-106, 108, 109, 114, 234, 258, 345, 354, 432
Crandall, I. B., 11
Creasy, M. A., 192
Daniell, P. J., 10, 142
Daniels, H. E., 165
Darroch, J. N., 340
Darzell, J. F., 221
Davis, C., 74, 94
Davis, R. C., 94
Deemer, W. L., 85
Dempster, A. P., 341, 372, 374, 458
Deutsch, R., 33
Dickey, J. M., 192, 291
Dirac, P. A. M., 17, 26, 101, 173, 235, 410
Dirichlet, P. G. L., 55, 57
Doeblin, W., 13
Doob, J. L., 8, 18, 38, 41, 42, 43
Dressel, P. L., 20
Dubman, M. R., 110
Dugundji, J., 33
Duncan, D. B., 194
Dunford, N., 348
Dunnett, C. W., 192
Durbin, J., 13, 182, 323, 324
Dutton, J. A., 181
Dynkin, E. B., 36
Eckart, C., 75
Edwards, R. E., 8, 17, 47, 50, 52, 53, 55, 82, 91
Ehrlich, L. W., 71
Elderton, E. M., 113
Elliot, C. M., 180
Enochson, L. D., 120, 306, 312, 317
Euler, L., 15
Ezekiel, M. A., 291
Fehr, U., 11
Feinstein, B., 12
Fejér, L., 52, 55, 57
Feldman, V. A., 180
Feller, W., 36
Fieller, E. C., 192
Filon, L. N. G., 257, 292
Fischer, E., 84, 399
Fisher, R. A., 257, 292, 330, 341
Fishman, G. S., 12, 226, 301
Flood, C. R., 355
Fortet, R., 10, 26, 36, 163, 263
Fourier, J. J., 8, 9, 31, 49, 50, 52, 60-70, 73-75, 78, 79, 88, 91, 93-101, 105, 123, 130, 132, 142, 160-163, 167, 194, 210, 212, 221, 222, 228, 235, 247, 255, 260, 262, 299, 332, 394, 396, 401
Fox, C., 291
Fox, M., 191
Freiberger, W., 254, 263
Friedlander, S. K., 11
Friedman, B., 84
Fubini, G., 396
Gabor, D., 11
Gajjar, A. V., 292
Gauss, C. F., 55, 57, 188
Gavurin, M. K., 74
Gelfand, I., 76, 78, 79, 287, 400
Gentleman, W. M., 64, 67
Gersch, W., 297
Gibbs, F. A., 12, 52, 181
Gikman, I. I., 12
Giri, N., 112, 113, 292
Girshick, M. A., 341
Glahn, H. R., 372, 391
Gnanadesikan, R., 126
Godfrey, M. D., 5, 180
Goldberger, A. S., 289
Goldstein, R. M., 165
Golub, G. H., 72, 341, 374
Good, I. J., 64, 65, 71, 73
Goodman, N. R., 9, 60, 71, 89, 90, 110, 114, 166, 191, 208, 226, 240, 245, 262, 294, 297, 298, 306, 309, 312, 317, 348
Granger, C. W. J., 12, 180, 226, 263, 298, 309
Grass, A. M., 12, 181
Grenander, U., 10, 12, 47, 54, 56, 84, 128, 146, 150, 161, 174, 175, 176, 225, 229
Gribbon, J. R., 180
Grimshaw, M. E., 73
Groves, G. W., 192, 207, 225, 295, 301, 306, 317
Grum, F., 180
Gupta, R. P., 343
Gupta, S. S., 90, 192, 343
Gurland, J., 292
Gyires, B., 254
Hájek, J., 12
Hall, P., 257, 292
Halmos, P. R., 43
Halperin, M., 192
Hamburger, H., 73
Hamming, R. W., 52, 55, 57, 66, 161
Hamon, B. V., 225
Hannan, E. J., 10, 79, 128, 150, 174, 176, 182, 192, 207, 225, 295, 301, 306, 317, 319, 324, 354, 372
Hartley, H. O., 191
Hasselman, K., 26
Hatanaka, M., 12, 176, 263, 324
Haubrich, R. A., 11, 180, 225
Helly, E., 394
Heninger, J., 82
Herglotz, G., 25
Hewitt, E., 8
Hext, G. R., 174
Higgins, J., 12
Hilbert, D., 32, 60, 104, 234, 301, 353, 389, 400
Hinich, M., 165
Hodgson, V., 292
Hoff, J. C., 57
Hoffman, K., 74, 456
Hölder, O., 409
Hollien, H., 11
Holzer, R. E., 11
Hooper, J. W., 334, 372, 374
Hopf, E., 11, 43
Horst, P., 354
Hotelling, H., 108, 289, 340, 372, 374
Howrey, E. P., 180
Hoyt, R. S., 192
Hsu, P. L., 257, 292, 374
Hua, L. K., 44, 72
Huyett, M. J., 126
Ibragimov, I. A., 181, 260
Iosifescu, M., 94, 98
Isserlis, L., 21
Ito, K., 38, 39
Izenman, A. J., 341, 374
Jackson, D., 55
Jagerman, D. L., 12
James, A. T., 89, 294, 341, 343, 352, 374, 379, 388
Jeffery, G. B., 113
Jenkins, G. M., 13, 120, 166, 226, 263, 298, 309, 313
Jennison, R. C., 11
Jones, R. H., 142, 150, 165, 194, 330
Kabe, D. G., 90
Kahan, W. M., 74
Kahane, J., 98
Kampé de Fériet, J., 11, 38
Kanazawa, M., 340, 454
Kaneshige, I., 11, 226
Karreman, H. F., 180
Katz, S., 226
Kawashima, R., 11, 225
Kawata, T., 94, 128
Keen, C. G., 1
Kendall, M. G., 10, 20, 188, 253, 289, 292, 314, 323, 372, 420
Khatri, C. G., 85, 190, 226, 289, 294, 306
Khintchine, A., 8, 10
Kinosita, K., 11
Kirchener, R. B., 13
Kiviat, P. J., 12
Knopp, K., 14
Kolmogorov, A. N., 10, 42, 181
Koopmans, L. H., 176, 291, 295, 297, 298, 317, 348
Kramer, B. H., 226
Kramer, H. P., 75, 108, 340
Kramer, K. H., 291
Krishnaiah, P. R., 341
Kromer, R. E., 164
Kronecker, L., 17, 44, 148, 288
Kshirsagar, A. M., 192, 291, 372
Kuhn, H. G., 10
Kuo, F. F., 60
Kuo, M. T., 226
Labrouste, M. H., 11
Lacape, R. S., 12, 180
Lancaster, H. O., 372
Lanczos, C., 55, 69, 71
Landau, E., 52
Latham, G., 11
Laubscher, N. F., 191
Lawley, D. N., 374
Leadbetter, M. R., 18, 41, 43, 102, 258
Lee, Y. W., 11, 226, 331
Leonov, V. P., 20, 21, 26, 43, 94, 97
Leppink, G. J., 165
Lévy, P., 76
Lewis, F. A., 84
Lewis, P. A. W., 64, 164
Lieberman, G. J., 96
Lighthill, M. J., 17
Loève, M., 36, 407
Lomnicki, Z. A., 172
Loynes, R. M., 176
MacDonald, G. J. F., 11, 26
MacDonald, N. J., 180
MacDuffee, C. C., 70
MacKenzie, G. S., 11
MacKenzie, W. A., 11, 225, 330
MacLaurin, C., 15
MacNeil, I. B., 260
MacPherson, R. D., 12
Madansky, A., 341
Madden, T., 225
Majewski, W., 11
Malevich, T. L., 260
Malinvaud, E., 324
Mallows, C. L., 341
Mann, H. B., 13, 185, 204, 414
Manwell, T., 180
Markov, A. A., 36, 45, 188, 396
Martin, W. T., 43, 76, 399
Maruyama, G., 98
Mathews, M. V., 12, 75, 108, 340
Maxwell, J. C., 11
McGahan, L. C., 11
McGucken, W., 10
McLeod, M. G., 11
McNeil, D. R., 165
McShane, E. J., 38
Medgyessy, P., 69
Meecham, W. C., 11, 38
Meggers, W. F., 10
Middleton, D., 11, 229
Miller, K. S., 90
Miller, R. G., 209, 229
Miyata, M., 390
Montgomery, J., 1
Moore, C. N., 11, 163
Moore, C. R., 54
Morgenstern, O., 180
Morrison, D. F., 289, 340, 372
Morrow, R. E., 64
Mortensen, R. E., 38
Mowat, W. M. H., 1
Munk, W. H., 11, 26, 181, 225
Murakimi, S., 11, 226
Murthy, V. K., 263
Nagy, B. Sz., 400
Nakamura, H., 11, 226
Naylor, T. H., 226
Nerlove, M., 12, 180, 226
Nessel, R. J., 57
Nettheim, N., 266, 309
Neudecker, H., 288
Newton, H. W., 138
Newton, I., 10
Nicholls, D. F., 174
Nisio, M., 38, 39
Nissen, D. H., 288
Noll, A. M., 11
Nyquist, H., 179
Obukhov, A. M., 372
Okamoto, M., 75, 340, 454
Olkin, I., 85, 291, 341
Olshen, R. A., 128
Oswald, J. R. V., 33
Panofsky, H. A., 225
Papoulis, A., 17
Parseval-Deschenes, M. A., 163, 432
Parthasarathy, K. R., 22, 98, 264
Parzen, E., 12, 54-56, 60, 120, 150, 159, 161, 164, 165, 167, 252, 298, 309, 313, 324
Pearson, E. S., 191
Pearson, K., 113, 257, 292
Philipp, W., 94, 98
Picinbono, B., 97
Pickands, J., 165
Piersol, A., 208
Pierson, W. J., Jr., 221
Pinsker, M. S., 11, 348, 377, 384
Pisarenko, V. F., 165, 166, 225
Pitcher, T. S., 94
Plagemann, S. H., 180
Platt, D. C., 1
Poisson, S. D., 47, 55, 57, 91, 124
Pollak, H. O., 146
Pólya, G., 415
Portmann, W. O., 85
Posner, E. C., 64
Poulter, T. C., 5
Pratt, J. W., 291
Press, H., 9, 11, 54, 159, 179, 180
Priestley, M. B., 160, 174, 176, 324
Pupin, M. I., 11, 163
Pye, K., 12
Quenouille, M. H., 13
Raikov, D., 76, 78, 79, 287, 400
Rankin, B., 43
Rao, C. R., 75, 108, 229, 289, 330, 332, 340, 372, 414
Rao, M. M., 12
Richter, C. P., 12
Ricker, N., 181, 338
Riemann, B., 55, 57
Riesz, F., 55, 57, 400
Roberts, J. B., 118
Robinson, E. A., 10, 181, 225, 331, 338
Robinson, G., 64
Rodemich, E. R., 165
Rodriguez-Iturbe, I., 225
Root, W. L., 94
Rosenberg, M., 31
Rosenblatt, M., 5, 9, 26, 38, 94, 97, 128, 150, 160, 161, 173, 176, 225, 252, 254, 259, 262
Ross, K. A., 8
Rozanov, Yu. A., 43, 348
Russel, C. T., 11
Sacia, C. F., 11
Salem, R., 98
Sande, G., 64, 67
Sargent, T. J., 180
Sasser, W. E., 226
Sato, H., 11
Satterthwaite, F. E., 145
Scheffé, H., 229, 276, 277
Schoenberg, I. J., 64, 73, 82
Schur, I., 283
Schuster, A., 9, 10, 11, 173, 181
Schwabe, H. S., 138
Schwartz, J. T., 348
Schwartz, L., 27, 82
Schwarz, H. A., 18, 151, 262, 421, 426, 460
Schwerdtfeger, H., 72
Sears, F. W., 11
Shapiro, H. S., 57
Shilov, G., 76, 78, 79, 287, 400
Shiryaev, A. N., 20, 21, 26, 38, 41, 94, 97
Shumway, R. H., 279
Siegel, A., 11, 38
Simon, M., 180
Simpson, S. M., 10
Singleton, R. C., 5, 66
Siotani, M., 287
Skorokhod, A. V., 12, 423
Slepian, D., 12, 146
Slutsky, E., 9, 10, 128, 170
Smith, E. J., 11
Snodgrass, F. E., 181
Sobel, M., 192
Solodovnikov, V. V., 11, 298, 331
Srivastava, M. S., 90
Stegun, I. A., 191, 291, 334
Stieltjes, T. J., 394, 401
Stigum, B. P., 12
Stockham, T. G., Jr., 67
Stokes, G. G., 9, 69
Stone, R., 354, 400
Straf, M. L., 259
Striebel, C., 12
Stuart, A., 20, 188, 253, 289, 292, 314, 323, 372, 420
Student, 253, 254
Stumpff, K., 10, 64
Sugiyama, G., 341
Suhara, K., 12, 180
Suzuki, H., 12, 180
Szegő, G., 84, 415
Takeda, S., 11, 226
Tate, R. F., 289
Taylor, B., 199, 416, 424, 439
Theodorescu, R., 94, 98
Tick, L. J., 142, 298, 301, 330
Timan, A. F., 57, 445
Toeplitz, O., 72, 73, 74, 108
Topper, L., 11
Tukey, J. W., viii, 9, 10, 11, 26, 32, 54, 55, 57, 65, 120, 146, 150, 159, 161, 177, 179, 180, 199, 225, 309, 329, 341, 439, 449
Tumura, Y., 341, 374
de la Vallée-Poussin, C., 55, 57
Van der Pol, B., 11
Van Ness, J. S., 9, 154, 265, 406, 408, 445
Varadhan, S. R. S., 22
Vok, C. A., 292, 317
Volterra, V., 38
Von Mises, R., 43
Wahba, G., 245, 295, 305, 309, 319, 336
Waikar, V. B., 341
Wald, A., 13, 185, 204, 414
Waldmeier, M., 5
Walker, A. M., 128, 172, 174, 183, 264
Wallace, W. H., 226
Ward, F., 180
Watts, D. G., 298, 309, 313
Wedderburn, J. H. M., 71, 72
Wegel, R. L., 11, 163
Weierstrass, K., 55
Welch, P. D., 64, 164
Weyl, H., 422
Whittaker, E. T., 64
Whittle, P., 5, 9, 12, 13, 73, 98, 166, 174, 181, 264, 289, 331, 348
Wielandt, H., 74, 456
Wiener, N., 8, 9, 10, 11, 12, 38, 41, 43, 76, 81, 82, 99, 180, 181, 298, 331, 348
Wiesner, J. B., 11
Wilk, M. B., 126
Wilkins, J. E., 57
Wilkinson, J. H., 74, 456, 458
Wilks, S. S., 389
Williams, E. J., 289
Wintner, A., 43, 99
Wishart, J., 90, 113, 238, 240, 245, 246, 251, 252, 342, 352
Wold, H. O. A., 8, 10, 13, 43, 81, 121
Wolfe, E., 11
Wong, E., 36
Wonnacott, T., 298
Wood, L. C., 60
Wooding, R. A., 89
Woodroofe, M. B., 154, 265, 406, 408, 445
Wright, W. D., 10, 180
Wright, W. W., 12
Yaglom, A. M., 18, 354
Yamanouchi, Y., 11, 180, 191, 221, 263, 309, 330
Yevjevich, V., 225
Young, G., 75
Yule, G. U., 5
Yuzuriha, T., 12, 180
Zaremba, S. K., 172
Zetler, B. D., 225
Zygmund, A., 8, 50, 98, 394, 406
SUBJECT INDEX
Acoustics, 11
Adjustment, seasonal, 180, 209
Algorithm, Fast Fourier, 13, 65, 67, 88, 132, 160, 167, 212, 222, 255
Alias, 177
Aliasing, 267
Analysis, canonical, 368, 391
  cross-spectral, 225, 226
  factor, 354
  Fourier, 8, 26
  frequency, 10, 11, 12, 34, 179
  generalized harmonic, 41
  harmonic, 7, 8, 10
  multiple regression, 222
  power spectral, 179
  principal component, 366, 367
  regression, 301
  spectral, 10
Approach, functional, 41, 43, 80, 100
  stochastic, 41, 43, 100
Argument, 17
Bandwidth, 32, 54, 57, 157, 158, 164, 165, 350
Bias, 154, 158
Biology, 12
Bispectrum, 26
Classification, balanced one-way, 276
Coefficient, canonical correlation, 376
  complex regression, 292, 296, 300, 322, 330, 332, 336
  filter, 317
  Fourier, 50
  partial complex regression, 297
  regression, 289
  squared sample multiple correlation, 189
  vector alienation, 390
  vector correlation, 335, 390
Coherence, 214, 257, 275, 325, 329, 332, 333, 382
  canonical, 382, 390
  multiple, 219, 296, 302, 307, 310, 312, 317, 331, 334
  partial, 311, 312
Coherency, 257, 297, 330, 347, 364
  intraclass, 277
  partial, 297, 300, 302, 306, 311
Color, 180
Comb, Dirac, 17, 26, 101
  Kronecker, 17
Communicate, 20
Component, frequency, 104, 117, 353
  principal, 106, 107, 108, 337, 339, 340, 342
Consistent, 149, 168, 176, 182
Convergence, in distribution, 258
  weak, 258
Convolution, 61, 67
Correlation, 16, 289, 330, 332
  canonical, 289, 372
  conditional, 290
  multiple, 289, 293, 300, 302, 333, 336
  partial, 289, 291, 293, 295, 302, 335
Cosinusoid, 8, 28, 35, 40, 62, 81, 104
Covariance, 16
  partial, 289, 293, 335, 336
Cross-periodogram, 9, 306, 327
Cross-spectrum, 23, 233
  partial, 297
Cumulant, 19, 341
Decomposition, singular value, 72, 87
Delay, group, 304, 329
Delta, Dirac, 17, 100
  Kronecker, 17, 148
Demodulate, complex, 33
Demodulation, complex, 32, 47
Design, experimental, 276
  filter, 58
Determinant, 16
Discrimination, 180, 391
Distribution, asymptotic, 88
  complex normal, 89, 109
  complex Wishart, 90, 342
  finite dimensional, 18
  normal, 89, 332
  Schwartz, 82
  Student's t, 253
  uniform, 336
  Wishart, 90, 314
Domain, frequency, 13, 94, 337
  time, 13
Economics, 12
Electroencephalogram (EEG), 180
Engineering, electrical, 11
Equation, integral, 226
  stochastic difference, 38
  stochastic differential, 38
Equations, simultaneous, 323, 324
Errors, in variables, 323
Estimate, best linear, 321
  consistent, 146, 306
  least squares, 174, 185, 188, 321
  maximum likelihood, 112, 183, 190, 330
  nonlinear, 326
  spectral, 130, 142
  spectral measure, 170
Estimation, spectral by prefiltering, 159
Expansion, perturbation, 334
  power series, 75, 77
  Taylor, 199
  Volterra functional, 38, 40
Expected value, 16
Extension, period T, 65, 66, 83
Factor, convergence, 52, 54, 55, 90
Filter, 8, 16, 27, 337, 344
  band-pass, 32, 97, 104, 117, 162, 176
  digital, 60
  inverse, 30
  linear, 28
  low-pass, 32, 58
  summable, 30
  matched, 299
  nonsingular, 30
  optimum linear, 295, 299
  realizable, 29, 78, 85
  stable, 48
  summable, 29
Formula, Euler-MacLaurin, 15
  Poisson summation, 47, 91, 124
FORTRAN, 66
Frequency, 8, 40
  angular, 23
  folding, 179
  Nyquist, 179
  radian, 23
  unknown, 69
Function, autocorrelation, 18
  autocovariance, 18, 22, 119, 166
  Bessel, 113, 114, 229
  characteristic, 39, 69
  circular autocovariance, 167
  cross-correlation, 18
  cross-covariance, 18, 221, 232
  fixed, 88
  generalized, 17, 27
  holomorphic, 75, 85
  hypergeometric, 229
  joint cumulant, 21
  linear discriminant, 391
  matrix-valued, 86
  mean, 18
  measurable, 43
  random, 8, 18
  real holomorphic, 77, 85
  sample autocovariance, 169
  stochastic, 8
Function (continued)
  transfer, 28, 187, 196, 345
  transition probability, 36
Gain, 302, 310, 325, 358, 361
Geophysics, 11, 225
Group, finite, 64
Harmonic, first, 138
Hook, 20
Identity, 16
Inequality, Schwarz, 18, 151, 262
Inference, 12
Integral, stochastic, 102
Interval, confidence, 151
Inverse, generalized, 87
Isomorphism, 71
Kernel, 54, 155
  Fejér, 132
Law, iterated logarithm, 98
Law of large numbers, 99
Least squares, 188, 221
Limits, confidence, 252, 352, 387
Loop, feed-back, 324
Matrix, 16, 70
  block circulant, 84
  circulant, 73
  complex, 70
  conditional spectral density, 336
  error spectral density, 307, 381
  finite Toeplitz, 72, 108
  Hermitian, 70, 71, 287
  Jacobian, 75, 85
  non-negative definite, 70, 287
  spectral density, 24, 233, 242, 247, 333
  unitary, 70, 71, 346
Mean, Cesàro, 14
  sample, 83
Measure, probability, 41
  spectral, 25, 166, 168
Medicine, 12
Meteorology, 225
Mixing, 8, 9
Model, parametric, 166
Modulus, 17
Monomial, 62, 63
Motion, Brownian, 283
Normal, asymptotically, 90, 228, 340
  complex multivariate, 89, 313
Notation, Landau, 52
Numbers, sunspot, 5, 127, 137, 138, 141, 153, 170, 171
Oceanography, 225, 390
Operation, linear, 27
  time invariant, 28, 80
Order (m, M), 37
Orthogonal, 18
Part, integral, 84
Partition, indecomposable, 20
Path, sample, 18
Periodicity, hidden, 9, 173, 181
Periodogram, 9, 120, 128
  cross-, 9
  kth order, 9
  second-order, 120, 235
  smoothed, 131
  third-order, 9
Permanent, 110
Phase, 302, 310, 325, 358, 361
Phenomenon, Gibbs', 52
Physics, 10
Plot, chi-squared probability, 126, 141
  normal probability, 96, 97
Polynomial, Bernoulli, 15
  trigonometric, 57, 62
Power, instantaneous, 118
Prediction, 300
  linear, 78, 181, 331
Predictor, best linear, 289, 331, 332, 336
Prefiltering, 154, 220, 318, 329, 330, 349
Prewhitening (prefiltering), 159, 266
Probability, initial, 36
Probability 1, 43, 98
Procedure, jack-knife, 374
Process, autoregressive (autoregressive scheme), 335
  Brownian motion, 114
  circular, 110
  ergodic, 43, 45
  Gaussian, 9, 284
  linear, 31, 35, 39, 100, 319
  m dependent, 335
  Markov, 36, 45
  mixed moving average and autoregressive, 37
  point, 165
  white noise, 332
Product, Kronecker, 288
Program, computer, 66, 322, 389
Psychology, 12
Psychometrician, 354
Rainfall, English and Welsh, 121, 122, 139, 140
Rank, 46
Ratio, signal to noise, 299
Realization, 18, 331
Region, confidence, 154, 206, 314
  multiple confidence, 229
Regression, 188, 367
Representation, Cramér, 100, 102, 106, 234, 345, 354
  spectral, 25, 80, 81
Resolution, 166
Response, impulse, 29, 204, 223
Root, latent (latent value), 69, 70, 76, 84, 107, 165, 339-341, 343, 366, 378
Sampling, jittered, 165
Scheme, autoregressive, 37, 77, 84, 159, 164, 184, 320, 321, 324
Seismology, 1, 225, 338
Semi-invariant (cumulant), 20
Series, canonical variate, 379, 382
  continuous, 177
  dependent, 186
  deterministic, 186
  discrete, 177
  error, 186, 345, 369
  fixed, 82, 186, 231
  Fourier, 49
  Gaussian, 36, 110, 165, 283, 298, 324, 366
  independent, 186
  index number, 338
  instrumental, 323, 324
  principal component, 344-348, 351, 353, 357
  pure noise, 35, 39, 141, 180
  residual, 174
  stationary, 35
  stationary Gaussian, 36, 39, 167
  stochastic, 82, 186
  time, 1, 18
  trigonometric, 53
  white noise (white noise process), 321, 332
Signal, 299
Sinusoid, 8
Smoothing, 181
Spacing, general time, 178
Spectrum, 70, 74, 109
  amplitude, 24
  co-, 24, 234, 279
  cross-, 23, 233
  cumulant, 25, 34, 39, 92
  error, 186, 196, 227, 296, 300, 330
  mixed, 173
  phase, 24
  power, 11, 23, 74, 116-119, 177, 179
  quadrature, 24, 234, 279
  residual, 180
  second-order, 23, 232, 260
Stationary, 22
Stationary, 8
  second-order, 22
  strictly, 22, 42
  wide sense, 22
Statistic, descriptive, 10, 179
  sufficient, 12, 111
  Wilks' Λ, 389
Stochastic process, 8, 18
Stochastics, 17
Sum, partial, 52
System, structural equation, 324
Taper, 54, 91, 124, 150, 156, 364
  cosine, 151
Temperatures
  Basel, 2, 271-275, 355-365
  Berlin, 2, 209-219, 239-242, 267-275, 327-330, 355-365
  Breslau, 2, 271-275, 355-365
  Budapest, 2, 271-275, 355-365
  Copenhagen, 2, 271-275, 355-365
  De Bilt, 2, 271-275, 355-365
  Edinburgh, 2, 271-275, 355-365
  Greenwich, 2, 271-275, 355-365
  New Haven, 2, 271-275, 355-365
  Prague, 2, 271-275, 355-365
  Stockholm, 2, 271-275, 355-365
  Trondheim, 2, 271-275, 355-365
  Vienna, 2, 95-97, 209-219, 240-242, 268-275, 327-330, 355-365
  Vilna, 2, 271-275, 355-365
Testing, informal, 180
Theorem, central limit, 94, 95
  Gauss-Markov, 188
  Kolmogorov extension, 42
Theorem (continued)
  Spectral, 72
  Wielandt-Hoffman, 74
Trajectory, 18
Transform, discrete Fourier, 63, 67, 70, 73, 221
  fast Fourier, 64, 68, 120, 142, 162, 262
  finite Fourier, 9, 60, 69, 88, 90, 94, 105, 235, 299
  Fourier, 49, 75, 123, 164, 332
  Fourier-Stieltjes, 85
  Hilbert, 32, 59, 60, 104, 234, 301, 353, 389
Transformation, Abel, 15
  variance stabilizing, 150, 311, 314, 329
Transient, 227
Transitive, metrically, 43
Transpose, 16, 70
Trend, 43, 44, 174, 176
Trispectrum, 26
Turbulence, 11
Value, extremal, 70
  latent, 69, 70, 76, 84, 107, 287, 342
Variance, 16
Variate, canonical, 371, 372, 376, 377, 388
  chi-squared, 126, 145
  complex t, 192
  error, 289, 301
  exponential, 126
  F, 189
  multivariate t, 192, 291
  noncentral chi-squared, 127
  noncentral F, 196, 228
  normal, 20, 112, 289, 332
  t, 184
  uniform, 111, 182, 333
Vector, 16
  latent, 70, 165, 339, 340, 342, 344, 366
Window, data, 54, 91
  frequency, 54
This invited paper is one of a series planned on topics of general interest. - The Editor.
Manuscript received June 7, 1974; revised August 13, 1974. This paper was prepared while the author was a Miller Research Professor and was supported by NSF under Grant GP-31411.
The author is with the Department of Statistics, University of California, Berkeley, Calif. 94720.
I. INTRODUCTION
THE FOURIER analysis of data has a long history, dating back to Stokes [1] and Schuster [2], for example. It has been done by means of arithmetical formulas (Whittaker and Robinson [3], Cooley and Tukey [4]), by means of a mechanical device (Michelson [5]), and by means of real-time filters (Newton [6], Pupin [7]). It has been carried out on discrete data, such as monthly rainfall in the Ohio valley (Moore [8]), on continuous data, such as radiated light (Michelson [5]), on vector-valued data, such as vertical and horizontal components of wind speed (Panofsky and McCormick [9]), on spatial data, such as satellite photographs (Leese and Epstein [10]), on point processes, such as the times at which vehicles pass a position on a road (Bartlett [11]), and on
ADDENDUMFourier Analysis of Stationary Processes
Reprinted with permission from Proceedings of theIEEE, Volume 62, No. 12, December 1974. Copyright ©1974—The Institute of Electrical and Electronics Engi-neers, Inc.
Abstract-Tim papet begins with a description of some of the impor-tant procedures of the Fourier analysis of real-valued stationary discretetime series. These procedures include the estimation of the power spec-trum, the fitting of finite parameter models, and the identification oflinear time invariant systems. Among the results emphasized is the onethat the large sample statistical properties of the Fourier transform aresimpler than those of the series itself. The procedures are next gen-eralized to apply to the cases of vector-valued series, multidimensionaltime series or spatial series, point processes, random measures, andfinally to stationary random Schwartz distributions. It is seen that therelevant Fourier transforms are evaluated by different formulas in thesefurther cases, but that the same constructions are carried out after theirevaluation and the same statistical results hold. Such generalizationsare of interest because of current work in the fields of picture process-ing and pulse-code modulation.
502 ADDENDUM
point processes in space, such as the positions of pine trees in afield (Bartlett [12]). It has even been carried out on thelogarithm of a Fourier transform (Oppenheim et al [ 13]) andon the logarithm of a power spectrum estimate (Bogert et al.[14]).
The summary statistic examined has been: the Fourier transform itself (Stokes [1]), the modulus of the transform (Schuster [2]), the smoothed modulus squared (Bartlett [15]), the smoothed product of two transforms (Jones [16]), and the smoothed product of three transforms (Hasselman et al. [17]).
The summary statistics are evaluated in an attempt to measure population parameters of interest. Foremost among these parameters is the power spectrum. This parameter was initially defined for real-valued time phenomena (Wiener [18]). In recent years it has been defined and shown useful for spatial series, point processes, and random measures as well. Our development in this paper is such that the definitions set down and mathematics employed are virtually the same for all of these cases.
Our method of approach to the topic is to present first an extensive discussion of the Fourier analysis of real-valued discrete-time series, emphasizing those aspects that extend directly to the cases of vector-valued series, of continuous spatial series, of point processes, and finally of random distributions. We then present extensions to the processes just indicated. Throughout, we indicate aspects of the analysis that are peculiar to the particular process under consideration. We also mention higher order spectra and nonlinear systems. Wold [19] provides a bibliography of papers on time series analysis written prior to 1960. Brillinger [20] presents a detailed description of the Fourier analysis of vector-valued discrete-time series.
We now indicate several reasons that suggest why Fourier analysis has proved so useful in the analysis of time series.

II. WHY THE FOURIER TRANSFORM?

Several arguments can be advanced as to why the Fourier transform has proved so useful in the analysis of empirical functions. For one thing, many experiments of interest have the property that their essential character is not changed by moderate translations in time or space. Random functions produced by such experiments are called stationary. (A definition of this term is given later.) Let us begin by looking for a class of functions that behave simply under translation. If, for example, we wish

f(t + u) = C_u f(t),  t, u = 0, 1, 2, ...

with C_1 ≠ 0, then by recursion f(t) = C_1^t f(0) for t > 0, and so f(t) = f(0) exp {αt} for α = ln C_1. If f(t) is to be bounded, then α = iλ, for i = √−1 and λ real. We have been led to the functions exp {iλt}. Fourier analysis is concerned with such functions and their linear combinations.
On the other hand, we might note that many of the operations we would like to apply to empirical functions are linear and translation invariant, that is, such that if X_1(t) → Y_1(t) and X_2(t) → Y_2(t), then α_1 X_1(t) + α_2 X_2(t) → α_1 Y_1(t) + α_2 Y_2(t), and if X(t) → Y(t), then X(t − u) → Y(t − u). Such operations are called linear filters. It follows from these conditions that if X(t) = exp {iλt} → Y_λ(t), then

X(t + u) = exp {iλu} X(t) → exp {iλu} Y_λ(t) = Y_λ(t + u).

Setting u = t, t = 0 gives Y_λ(t) = exp {iλt} Y_λ(0). In summary, exp {iλt}, the complex exponential of frequency λ, is carried over into a simple multiple of itself by a linear filter. A(λ) = Y_λ(0) is called the transfer function of the filter. If the function X(t) is a Fourier transform, X(t) = ∫ exp {iαt} x(α) dα, then from the linearity (and some continuity) X(t) → ∫ exp {iαt} A(α) x(α) dα. We see that the effect of a linear filter is easily described for a function that is a Fourier transform.
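The eigenfunction property just described can be checked numerically. The sketch below (plain Python; the filter coefficients are an arbitrary illustrative choice, not from the text) applies a finite summation filter to the complex exponential of frequency λ and confirms that the output is A(λ) times the input:

```python
import cmath

lam = 0.7                      # frequency λ (illustrative value)
a = {0: 0.5, 1: 0.3, 2: 0.2}   # hypothetical filter coefficients a(u)

def A(lam):
    """Transfer function A(λ) = Σ_u a(u) exp{-iλu}."""
    return sum(c * cmath.exp(-1j * lam * u) for u, c in a.items())

def Y(t):
    """Output of the filter when the input is X(t) = exp{iλt}."""
    return sum(c * cmath.exp(1j * lam * (t - u)) for u, c in a.items())

# The complex exponential is carried into a simple multiple of itself.
for t in (0, 3, 10):
    assert abs(Y(t) - A(lam) * cmath.exp(1j * lam * t)) < 1e-12
```

The assertion holds at every t because Y(t) = Σ_u a(u) exp{iλ(t − u)} factors as exp{iλt} A(λ).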
In the following sections, we will see another reason for dealing with the Fourier transforms of empirical functions, namely, in the case that the functions are realizations of a stationary process, the large sample statistical properties of the transforms are simpler than the properties of the functions themselves.

Finally, we mention that with the discovery of fast Fourier transform algorithms (Cooley and Tukey [4]), the transforms may often be computed exceedingly rapidly.
III. STATIONARY REAL-VALUED DISCRETE-TIME SERIES

Suppose that we are interested in analyzing T real-valued measurements made at the equispaced times t = 0, ..., T − 1. Suppose that we are prepared to model these measurements by the corresponding values of a realization of a stationary discrete-time series X(t), t = 0, ±1, ±2, .... Important parameters of such a series include its mean

c_X = E X(t)   (1)

giving the average level about which the values of the series are distributed, and its autocovariance function

c_XX(u) = cov {X(t + u), X(t)},  u = 0, ±1, ±2, ...   (2)
providing a measure of the degree of dependence of values of the process |u| time units apart. (These parameters do not depend on t because of the assumed stationarity of the series.) In many cases of interest the series is mixing, that is, such that values well separated in time are only weakly dependent in a formal statistical sense to be described later. Suppose, in particular, that c_XX(u) → 0 sufficiently rapidly as |u| → ∞ for

f_XX(λ) = (2π)^{−1} Σ_{u=−∞}^{∞} c_XX(u) exp {−iλu},  −∞ < λ < ∞   (3)

to be defined. The parameter f_XX(λ) is called the power spectrum of the series X(t) at frequency λ. It is symmetric about 0 and has period 2π. The definition (3) may be inverted to obtain the representation

c_XX(u) = ∫_{−π}^{π} exp {iλu} f_XX(λ) dλ   (4)

of the autocovariance function in terms of the power spectrum.
If the series X(t) is passed through the linear filter

Y(t) = Σ_u a(u) X(t − u)

with well-defined transfer function

A(λ) = Σ_u a(u) exp {−iλu}

then we can check that

c_YY(u) = Σ_v Σ_w a(v) a(w) c_XX(u − v + w)   (5)

and, by taking Fourier transforms, that

f_YY(λ) = |A(λ)|² f_XX(λ)   (6)

under some regularity conditions. Expression (6), the frequency domain description of linear filtering, is seen to be much nicer than (5), the time-domain description.
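Relations (5) and (6) can be verified numerically in a simple case. The sketch below (illustrative coefficients; the input autocovariance is that of white noise, so f_XX = 1/2π) computes c_YY(u) from (5), transforms it as in (3), and checks the result against |A(λ)|² f_XX(λ):

```python
import cmath, math

a = {0: 1.0, 1: -0.6}                      # hypothetical filter coefficients a(u)
c_xx = lambda u: 1.0 if u == 0 else 0.0    # white-noise autocovariance

def c_yy(u):
    """Output autocovariance via (5)."""
    return sum(a[v] * a[w] * c_xx(u - v + w) for v in a for w in a)

def A(lam):
    """Transfer function A(λ) = Σ a(u) exp{-iλu}."""
    return sum(c * cmath.exp(-1j * lam * u) for u, c in a.items())

def f_yy(lam):
    """Output spectrum via (3); here c_YY vanishes for |u| > 1."""
    s = sum(c_yy(u) * cmath.exp(-1j * lam * u) for u in (-1, 0, 1))
    return s.real / (2 * math.pi)

# Frequency-domain description (6) agrees with the time-domain route (5)+(3).
for lam in (0.0, 0.5, 2.0):
    assert abs(f_yy(lam) - abs(A(lam)) ** 2 / (2 * math.pi)) < 1e-12
```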
Expressions (4) and (6) may be combined to obtain an interpretation of the power spectrum at frequency λ. Suppose that we consider a narrow band-pass filter at frequency λ having transfer function

A(α) = 1 for |α − λ| ≤ Δ or |α + λ| ≤ Δ; A(α) = 0 otherwise   (7)

with Δ small. Then the variance of the output series Y(t) of the filter is given by

var Y(t) = ∫_{−π}^{π} |A(α)|² f_XX(α) dα ≈ 4Δ f_XX(λ).

In words, the power spectrum of the series X(t) at frequency λ is proportional to the variance of the output of a narrow band-pass filter of frequency λ. In the case that λ ≠ 0, ±2π, ±4π, ... the mean of the output series is 0 and the variance of the output series is the same as its mean-squared value. Expression (7) shows incidentally that the power spectrum is nonnegative.

We mention, in connection with the representation (4), that Khintchine [21] shows that for X(t) a stationary discrete time series with finite second order moments, we necessarily have

c_XX(u) = ∫_{−π}^{π} exp {iλu} dF_XX(λ)   (8)

where F_XX(λ) is a monotonic nondecreasing function. F_XX(λ) is called the spectral measure. Its derivative is the power spectrum. Going along with (8), Cramér [22] demonstrated that the series itself has a Fourier representation

X(t) = ∫_{−π}^{π} exp {iλt} dZ_X(λ)   (9)

where Z_X(λ) is a random function with the properties

E dZ_X(λ) = η(λ) c_X dλ   (10)

cov {dZ_X(λ), dZ_X(μ)} = η(λ − μ) dF_XX(λ) dμ.   (11)

(In these last expressions, if δ(λ) is the Dirac delta function, then η(λ) = Σ_j δ(λ − 2πj) is the Kronecker comb. Also expression (11) concerns the covariance of two complex-valued variates. Such a covariance is defined by cov {X, Y} = E{(X − EX)(Y − EY)*}, with * denoting complex conjugation.) Expression (9) writes the series X(t) as a Fourier transform. We can see that if the series X(t) is passed through a linear filter with transfer function A(λ), then the output series has Fourier representation

Y(t) = ∫_{−π}^{π} exp {iλt} A(λ) dZ_X(λ).
In Section XV, we will see that the first and second-order relations (10), (11) may be extended to kth order relations with the definition of kth order spectra.
IV. THE FINITE FOURIER TRANSFORM

Let the values of the series X(t) be available for t = 0, 1, 2, ..., T − 1 where T is an integer. The finite Fourier transform of this stretch of series is defined to be

d_X^(T)(λ) = Σ_{t=0}^{T−1} X(t) exp {−iλt}.   (12)

A number of interpretations may be given for this variate. For example, suppose we take a linear filter with transfer function concentrated at the frequency λ, namely A(α) = δ(α − λ). The corresponding time domain coefficients of this filter are

a(u) = (2π)^{−1} ∫_{−π}^{π} δ(α − λ) exp {iαu} dα = (2π)^{−1} exp {iλu}.

The output of this filter, applied to the observed stretch of data, is the series

(2π)^{−1} Σ_{u=0}^{T−1} exp {iλ(t − u)} X(u) = (2π)^{−1} exp {iλt} d_X^(T)(λ).

These remarks show that the finite Fourier transform may be interpreted as, essentially, the result of narrow band-pass filtering the series.

Before presenting a second interpretation, we first remark that the sample covariance of pairs of values X(t), Y(t), t = 0, 1, ..., T − 1 is given by T^{−1} Σ X(t) Y(t), when the Y(t) values have 0 mean. This quantity is a measure of the degree of linear relationship of the X(t) and Y(t) values. The finite Fourier transform is essentially, then, the sample covariance between the X(t) values and the complex exponential of frequency λ. It provides some measure of the degree of linear relationship of the series X(t) and phenomena of exact frequency λ.
In the case that λ = 0, the finite Fourier transform (12) is the sample sum. The central limit theorem indicates conditions under which a sum of random variables is asymptotically normal as the sample size grows to ∞. Likewise, there are theorems indicating that d_X^(T)(λ) is asymptotically normal as T → ∞. Before indicating some aspects of these theorems we set down a definition. A complex-valued variate w is called complex normal with mean 0 and variance σ² when its real and imaginary parts are independent normal variates with mean 0 and variance σ²/2. The density function of w is proportional to exp {−|w|²/σ²}. The variate |w|² is exponential with mean σ² in this case.
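The defining sum (12) can be evaluated directly; the short sketch below (plain Python, illustrative data) also checks the remark that at λ = 0 the transform reduces to the sample sum:

```python
import cmath

def finite_fourier(x, lam):
    """d_X^(T)(λ) = Σ_{t=0}^{T-1} X(t) exp{-iλt}, as in (12)."""
    return sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))

x = [4.0, 1.0, 3.0, 2.0]          # an arbitrary illustrative stretch of data
# At λ = 0 the finite Fourier transform is the sample sum.
assert abs(finite_fourier(x, 0.0) - sum(x)) < 1e-12
```

In practice the transform would be computed at many frequencies at once with a fast Fourier transform algorithm, as noted in Section II.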
In the case that the series X(t) is stationary, with finite second-order moments, and mixing (that is, well-separated values are only weakly dependent) the finite Fourier transform has the following useful asymptotic properties as T → ∞:

a) d_X^(T)(0) − T c_X is asymptotically normal with mean 0 and variance 2πT f_XX(0);

b) for λ ≠ 0, ±π, ±2π, ..., d_X^(T)(λ) is asymptotically complex normal with mean 0 and variance 2πT f_XX(λ);

c) for s_j(T), j = 1, ..., J integers with λ_j(T) = 2π s_j(T)/T → λ ≠ 0, ±π, ±2π, ..., the variates d_X^(T)(λ_1(T)), ..., d_X^(T)(λ_J(T)) are asymptotically independent complex normals with mean 0 and variance 2πT f_XX(λ);

d) for λ ≠ 0, ±π, ±2π, ... and U = T/J an integer, the variates

d_j^(U)(λ) = Σ_{t=jU}^{(j+1)U−1} X(t) exp {−iλt},  j = 0, ..., J − 1

are asymptotically independent complex normals with mean 0 and variance 2πU f_XX(λ).
These results are developed in Brillinger [20]. Related results are given in Section XV and proved in the Appendix. Other references include: Leonov and Shiryaev [23], Picinbono [24], Rosenblatt [25], Brillinger [26], Hannan and Thomson [27]. We have seen that exp {iλt} d_X^(T)(λ) may be interpreted as the result of narrow band-pass filtering the series X(t). It follows that the preceding result b) is consistent with the "engineering folk" theorem to the effect that narrow band-pass noise is approximately Gaussian.
Result a) suggests estimating the mean c_X by

ĉ_X = T^{−1} d_X^(T)(0) = T^{−1} Σ_{t=0}^{T−1} X(t)

and approximating the distribution of this estimate by a normal distribution with mean c_X and variance 2π f_XX(0)/T. Result b) suggests estimating the power spectrum f_XX(λ) by the periodogram

I_XX^(T)(λ) = (2πT)^{−1} |d_X^(T)(λ)|²   (13)

in the case λ ≠ 0, ±2π, .... We will say more about this statistic later. It is interesting to note, from c) and d), that asymptotically independent statistics with mean 0 and variance proportional to the power spectrum at frequency λ may be obtained by either computing the Fourier transform at particular distinct frequencies near λ or by computing them at the frequency λ but based on different time domains. We warn the reader that the results a)-d) are asymptotic. They are to be evaluated in the sense that they might prove reasonable approximations in practice when the domain of observation is large and when values of the series well separated in the domain are only weakly dependent.

On a variety of occasions we will taper the data before computing its Fourier transform. This means that we take a data window φ^(T)(t), vanishing for t < 0, t > T − 1, and compute the transform

d_X^(T)(λ) = Σ_{t=0}^{T−1} φ^(T)(t) X(t) exp {−iλt}   (14)

for selected values of λ. One intention of tapering is to reduce the interference of neighboring frequency components. If

Φ^(T)(α) = Σ_t φ^(T)(t) exp {−iαt}   (15)

then the Cramér representation (9) shows that (14) may be written

∫_{−π}^{π} Φ^(T)(λ − α) dZ_X(α).

From what we have just said, we will want to choose φ^(T)(t) so that Φ^(T)(α) is concentrated near α = 0, ±2π, .... (One convenient choice of φ^(T)(t) takes the form φ(t/T) where φ(u) = 0 for u < 0, u > 1.) The asymptotic effect of tapering may be seen to be to replace the variance in b) by 2π Σ_t φ^(T)(t)² f_XX(λ).
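A common concrete taper of the form φ(t/T) is the split cosine bell; the sketch below constructs it (the 20 percent tapered proportion is an arbitrary illustrative choice, not one prescribed in the text) and computes the factor Σ_t φ^(T)(t)² that enters the modified variance:

```python
import math

def cosine_taper(T, p=0.2):
    """Split-cosine-bell data window: the proportion p of points at each
    end is brought smoothly to zero; the middle is left untouched."""
    m = int(p * T)
    w = []
    for t in range(T):
        if t < m:
            w.append(0.5 * (1 - math.cos(math.pi * (t + 0.5) / m)))
        elif t >= T - m:
            w.append(0.5 * (1 - math.cos(math.pi * (T - t - 0.5) / m)))
        else:
            w.append(1.0)
    return w

w = cosine_taper(100)
assert w[0] < 0.01 and w[50] == 1.0   # smooth at the ends, flat in the middle
# The variance in result b) is rescaled by 2π Σ φ(t)² in place of 2πT.
scale = sum(v * v for v in w)
assert 0 < scale < 100
```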
Hannan and Thomson [27] investigate the asymptotic distribution of the Fourier transform of tapered data in a case where f_XX(λ) depends on T in a particular manner. The hope is to obtain better approximations to the distribution.
V. ESTIMATION OF THE POWER SPECTRUM

In the previous section, we mentioned the periodogram, I_XX^(T)(λ), as a possible estimate of the power spectrum f_XX(λ) in the case that λ ≠ 0, ±2π, .... If result b) holds true, then I_XX^(T)(λ), being a continuous function of d_X^(T)(λ), will be distributed asymptotically as |w|², where w is a complex normal variate with mean 0 and variance f_XX(λ). That is, I_XX^(T)(λ) will be distributed asymptotically as an exponential variate with mean f_XX(λ). From the practical standpoint this is interesting, but not satisfactory. It suggests that no matter how large the sample size T is, the variate I_XX^(T)(λ) will tend to be distributed about f_XX(λ) with an appreciable scatter. Luckily, results c) and d) suggest means around this difficulty. Following c), the variates I_XX^(T)(λ_j(T)), j = 1, ..., J are distributed asymptotically as independent exponential variates with mean f_XX(λ). Their average

f̂_XX^(T)(λ) = J^{−1} Σ_{j=1}^{J} I_XX^(T)(λ_j(T))   (16)

will be distributed asymptotically as the average of J independent exponential variates having mean f_XX(λ). That is, it will be distributed as

f_XX(λ) χ²_{2J} / (2J)   (17)

where χ²_{2J} denotes a chi-squared variate with 2J degrees of freedom. The variance of the variate (17) is

J^{−1} f_XX(λ)² = U T^{−1} f_XX(λ)²   (18)

if U = T/J. By choice of J the experimenter can seek to obtain an estimate of which the sampling fluctuations are small enough for his needs. From the standpoint of practice, it seems to be useful to compute the estimate (16) for a number of values of J. This allows us to tailor the choice of J to the situation at hand and even to use different values of J for different frequency ranges. Result d) suggests our consideration of the estimate

f̂_XX^(T)(λ) = J^{−1} Σ_{j=0}^{J−1} (2πU)^{−1} |d_j^(U)(λ)|².   (19)
It too will have the asymptotic distribution (17) with variance(18).
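Estimate (16) can be sketched directly from the definitions. The code below (plain Python; the helper names and the choice of neighboring frequencies are illustrative) averages J periodogram ordinates at Fourier frequencies near λ and, as a sanity check on the periodogram itself, verifies the Parseval-type identity Σ_s I_XX^(T)(2πs/T) = Σ_t X(t)²/(2π):

```python
import cmath, math

def periodogram(x, lam):
    """I_XX^(T)(λ) = (2πT)^{-1} |d_X^(T)(λ)|², as in (13)."""
    T = len(x)
    d = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(d) ** 2 / (2 * math.pi * T)

def smoothed_estimate(x, lam, J):
    """Average of J periodogram ordinates at Fourier frequencies 2πs/T
    nearest λ, in the manner of estimate (16); s = 0 is avoided."""
    T = len(x)
    s0 = max(1, round(lam * T / (2 * math.pi)) - J // 2)
    return sum(periodogram(x, 2 * math.pi * (s0 + j) / T) for j in range(J)) / J

x = [1.0, -2.0, 0.5, 3.0, -1.0, 0.25, 2.0, -0.5]   # illustrative data
T = len(x)
total = sum(periodogram(x, 2 * math.pi * s / T) for s in range(T))
assert abs(total - sum(v * v for v in x) / (2 * math.pi)) < 1e-9
assert smoothed_estimate(x, 1.0, 3) >= 0.0
```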
We must note that it is not sensible to take J in (16) and (19) arbitrarily large as the preceding arguments might have suggested. It may be seen from (15) that

E I_XX^(T)(λ) = ∫_{−π}^{π} F^(T)(λ − α) f_XX(α) dα + c_X² F^(T)(λ)   (20)

where

F^(T)(α) = (2πT)^{−1} |Σ_{t=0}^{T−1} exp {−iαt}|² = (2πT)^{−1} sin²(Tα/2)/sin²(α/2)

is the Fejér kernel. This kernel, or frequency window, is nonnegative, integrates to 1, and has most of its mass in the interval (−2π/T, 2π/T). The term in c_X may be neglected for λ ≠ 0, ±2π, ... and T large. From (16) and (20) we now see that

E f̂_XX^(T)(λ) ≈ ∫_{−π}^{π} [J^{−1} Σ_{j=1}^{J} F^(T)(λ_j(T) − α)] f_XX(α) dα.   (21)

If we are averaging J periodogram values at frequencies 2π/T apart and centered at λ, then the bandwidth of the kernel of (21) will be approximately 4πJ/T. If J is large and f_XX(α) varies substantially in the interval −2πJ/T < α − λ < 2πJ/T, then the value of (21) can be very far from the desired f_XX(λ). In practice we will seek to have J large so that the estimate is reasonably stable, but not so large that it has appreciable bias. This same remark applies to the estimate (19). Parzen [28] constructed a class of estimates such that E f̂_XX^(T)(λ) → f_XX(λ) and var f̂_XX^(T)(λ) → 0. These estimates have an asymptotic distribution that is normal, rather than chi-squared, Rosenblatt [29]. Using the notation preceding, these estimates correspond to having J depend on T in such a way that J_T → ∞, but J_T/T → 0 as T → ∞.

Estimates of the power spectrum have proved useful: i) as simple descriptive statistics; ii) in informal testing and discrimination; iii) in the estimation of unknown parameters; and iv) in the search for hidden periodicities. As an example of i), we mention their use in the description of the color of an object, Wright [30]. In connection with ii) we mention the estimation of the spectrum of the seismic record of an event in an attempt to see if the event was an earthquake or a nuclear explosion, Carpenter [31], Lampert et al. [32]. In case iii), we mention that Munk and MacDonald [33] derived estimates of the fundamental parameters of the rotation of the Earth from the periodogram. Turning to iv), we remind the reader that the original problem that led to the definition of the power spectrum was that of the search for hidden periodicities. As a modern example, we mention the examination of spectral estimates for the periods of the fundamental vibrations of the Earth, MacDonald and Ness [34].
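As an illustration of use iv), the search for a hidden periodicity amounts to locating the maximum of the periodogram. In the sketch below (an illustrative construction, not from the text), a cosine at the Fourier frequency 2π·10/T plus a small deterministic perturbation is recovered exactly:

```python
import cmath, math

def periodogram(x, lam):
    """I_XX^(T)(λ) = (2πT)^{-1} |d_X^(T)(λ)|²."""
    T = len(x)
    d = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(d) ** 2 / (2 * math.pi * T)

T = 128
true_freq = 2 * math.pi * 10 / T
# A "hidden" cosine of known frequency plus a small deterministic perturbation.
x = [math.cos(true_freq * t) + 0.05 * ((-1) ** t) for t in range(T)]
# The periodogram peaks at the Fourier frequency of the cosine.
s_hat = max(range(1, T // 2), key=lambda s: periodogram(x, 2 * math.pi * s / T))
assert s_hat == 10
```

Because the cosine sits exactly on a Fourier frequency, its energy does not leak to the other ordinates, so the maximizer is recovered without ambiguity.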
VI. OTHER ESTIMATES OF THE POWER SPECTRUM

We begin by mentioning minor modifications that can be made to the estimates of Section V. The periodograms of (16) may be computed at frequencies other than those of the form 2πs/T, s an integer, and they may be weighted unequally. The periodograms of the estimate (19) may be based on overlapping stretches of data. The asymptotic distributions are not so simple when these modifications are made, but the estimate is often improved. The estimate (19) has another interpretation. We saw in Section IV that exp {iλt} d_j^(U)(λ) might be interpreted as the output of a narrow band-pass filter centered at λ. This suggests that (19) is essentially the first power spectral estimate widely employed in practice, the average of the squared output of a narrow band-pass filter (Wegel and Moore [35]). We next turn to a discussion of some spectral estimates of quite different character.

We saw in Section III that if the series X(t) was passed through a linear filter with transfer function A(λ), then the output series Y(t) had power spectrum given by f_YY(λ) = |A(λ)|² f_XX(λ). In Section V, we saw that the estimates (16), (19) could have substantial bias were there appreciable variation in the value of the population power spectrum. These remarks suggest a means of constructing an improved estimate, namely: we use our knowledge of the situation at hand to devise a filter, with transfer function A(λ), such that the output series Y(t) has spectrum nearer to being constant. We then estimate the power spectrum of the filtered series in the manner of Section V and take |A(λ)|^{−2} f̂_YY(λ) as our estimate of f_XX(λ). This procedure is called spectral estimation by prewhitening and is due to Tukey (see Panofsky and McCormick [9]). We mention that in many situations we will be content to just examine f̂_YY(λ). This would be necessary were A(λ) = 0.
One useful means of determining an A(λ) is to fit an autoregressive scheme to the data by least squares. That is, for some K, choose â(1), ..., â(K) to minimize

Σ_t [X(t) + â(1) X(t − 1) + ··· + â(K) X(t − K)]²

where the summation extends over the available data. In this case Â(λ) = 1 + â(1) exp {−iλ} + ··· + â(K) exp {−iλK}. An algorithm for efficient computation of the â(u) is given in Wiener [36, p. 136]. This procedure should prove especially effective when the series X(t) is near to being an autoregressive
scheme of order K. Related procedures are discussed in Grenander and Rosenblatt [37, p. 270], Parzen [38], Lacoss [39], and Burg [40]. Berk [41] discusses the asymptotic distribution of the estimate |Â(λ)|^{−2} (2πT)^{−1} Σ_t |X(t) + â(1) X(t − 1) + ··· + â(K) X(t − K)|². Its asymptotic variance is shown to be (18) with U = 2K.
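A minimal version of the least-squares autoregressive fit can be written from the sample autocovariances via the Levinson recursion (one standard route to the â(u); the paper cites Wiener [36] for an efficient algorithm). The AR(1) simulation at the end is purely illustrative:

```python
import random

def sample_acov(x, u):
    """Sample autocovariance c_XX(u) about the sample mean."""
    T = len(x)
    m = sum(x) / T
    return sum((x[t] - m) * (x[t + u] - m) for t in range(T - u)) / T

def ar_fit(x, K):
    """Fit X(t) + a(1)X(t-1) + ... + a(K)X(t-K) = e(t) by solving the
    Yule-Walker equations with the Levinson recursion; returns the a(k)
    and the innovation variance estimate."""
    r = [sample_acov(x, u) for u in range(K + 1)]
    a, v = [0.0] * (K + 1), r[0]
    for k in range(1, K + 1):
        ref = -(r[k] + sum(a[j] * r[k - j] for j in range(1, k))) / v
        new = a[:]
        new[k] = ref
        for j in range(1, k):
            new[j] = a[j] + ref * a[k - j]
        a, v = new, v * (1 - ref * ref)
    return a[1:], v

# Illustration on a simulated AR(1) series X(t) = 0.6 X(t-1) + e(t).
random.seed(0)
x, prev = [], 0.0
for _ in range(4000):
    prev = 0.6 * prev + random.gauss(0.0, 1.0)
    x.append(prev)
a, v = ar_fit(x, 1)
assert abs(a[0] + 0.6) < 0.1   # a(1) estimates -0.6 in this sign convention
```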
Pisarenko [42] has proposed a broad class of estimates including the high resolution estimate of Capon [43] as a particular case. Suppose Σ̂ is an estimate of the covariance matrix of the variate

[X(t), X(t + 1), ..., X(t + U − 1)]^τ

determined from the sample values X(0), ..., X(T − 1). Suppose μ̂_u, α̂_u, u = 1, ..., U are the latent roots and vectors of Σ̂. Suppose H(μ), 0 < μ < ∞, is a strictly monotonic function with inverse h(·). Pisarenko proposed the estimate formed by applying h to a weighted combination of the values H(μ̂_u), the weights being derived from the latent vectors α̂_u evaluated at the frequency λ.
He presents an argument indicating that the asymptotic variance of this estimate is also (18). The hope is that it is less biased. Its character is that of a nonlinear average of periodogram values, in contrast to the simple average of (16) and (19). The estimates (16) and (19) essentially correspond to the case H(μ) = μ. The high resolution estimate of Capon [43] corresponds to H(μ) = μ^{−1}.

The autoregressive estimate, the high-resolution estimate, and the Pisarenko estimates are not likely to be better than an ordinary spectral estimate involving steps of prewhitening, tapering, naive spectral estimation, and recoloring. They are probably better than a naive spectral estimate for a series that is a sum of sine waves and noise.
VII. FINITE PARAMETER MODELS

Sometimes a situation arises in which we feel that the form of the power spectrum is known except for the value of a finite dimensional parameter θ. For example, existing theory may suggest that the series X(t) is generated by the mixed moving average autoregressive scheme

X(t) + a(1) X(t − 1) + ··· + a(K) X(t − K) = ε(t) + b(1) ε(t − 1) + ··· + b(L) ε(t − L)   (23)

where K, L are nonnegative integers and ε(t) is a series of independent variates with mean 0 and variance σ². The power spectrum of this series is

f_XX(λ; θ) = (σ²/2π) |1 + b(1) exp {−iλ} + ··· + b(L) exp {−iλL}|² / |1 + a(1) exp {−iλ} + ··· + a(K) exp {−iλK}|²   (24)

with θ = (σ², a(1), ..., a(K), b(1), ..., b(L)). A number of procedures have been suggested for estimating the parameters of the model (23), see Hannan [44] and Anderson [45], for example.

The following procedure is useful in situations more general than the above. It is a slight modification of a procedure of Whittle [46]. Choose as an estimate of θ the value that maximizes

−Σ_{0<s<T/2} [log f_XX(2πs/T; θ) + I_XX^(T)(2πs/T) / f_XX(2πs/T; θ)].   (25)

Expression (25) is the likelihood corresponding to the assumption that the periodogram values I_XX^(T)(2πs/T), 0 < s < T/2, are independent exponential variates with means f_XX(2πs/T; θ), 0 < s < T/2, respectively. Under regularity conditions we can show that this estimate, θ̂, is asymptotically normal with mean θ and covariance matrix 2πT^{−1} A^{−1}(A + B) A^{−1}, where A and B are matrices involving ∇f_XX(λ; θ), the gradient vector with respect to θ, and f_XXXX, the 4th order cumulant spectrum (see Section XV).

We may carry out the maximization of (25) by a number of computer algorithms, see the discussion in Chambers [47]. In [48], we used the method of scoring. Other papers investigating estimates of this type are Whittle [49], Walker [50], and Dzaparidze [51].

The power spectrum itself may now be estimated by f_XX(λ; θ̂). This estimate will be asymptotically normal with mean f_XX(λ; θ) and variance 2πT^{−1} ∇f_XX(λ; θ)^τ A^{−1}(A + B) A^{−1} ∇f_XX(λ; θ), following the preceding asymptotic normal distribution for θ̂. In the case that we model the series by an
autoregressive scheme and proceed in the same way, the estimate f_XX(λ; θ̂) has the character of the autoregressive estimate of the previous section.
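The maximization of (25) can be sketched for the AR(1) case, θ = (a(1), σ²), with σ² profiled out in closed form and a(1) found by a crude grid search (the grid and the simulated series are illustrative assumptions, not part of the original procedure):

```python
import cmath, math, random

def periodogram(x, lam):
    T = len(x)
    d = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(d) ** 2 / (2 * math.pi * T)

def f_ar1(lam, a, s2):
    """AR(1) spectrum (24): f_XX(λ; θ) = (σ²/2π) |1 + a exp{-iλ}|^{-2}."""
    return s2 / (2 * math.pi * abs(1 + a * cmath.exp(-1j * lam)) ** 2)

def whittle_ar1(x, grid):
    """Maximize (25) over a grid of AR(1) coefficients a; for fixed a the
    maximizing σ² is the mean of 2π |1 + a exp{-iλ}|² I(λ)."""
    T = len(x)
    lams = [2 * math.pi * s / T for s in range(1, T // 2)]
    I = [periodogram(x, l) for l in lams]
    best = None
    for a in grid:
        s2 = sum(2 * math.pi * abs(1 + a * cmath.exp(-1j * l)) ** 2 * i
                 for l, i in zip(lams, I)) / len(lams)
        ll = -sum(math.log(f_ar1(l, a, s2)) + i / f_ar1(l, a, s2)
                  for l, i in zip(lams, I))
        if best is None or ll > best[0]:
            best = (ll, a, s2)
    return best[1], best[2]

a_true = -0.5                  # X(t) + a_true X(t-1) = e(t)
random.seed(1)
x, prev = [], 0.0
for _ in range(512):
    prev = -a_true * prev + random.gauss(0.0, 1.0)
    x.append(prev)
a_hat, s2_hat = whittle_ar1(x, [g / 20 - 0.95 for g in range(39)])
assert abs(a_hat - a_true) < 0.15
```

In practice one would use a scoring or quasi-Newton algorithm rather than a grid, as the text notes.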
VIII. LINEAR MODELS

In some circumstances we may find ourselves considering a linear time invariant model of the form

X(t) = μ + Σ_u a(t − u) S(u) + ε(t)   (26)

where the values X(t), S(t), t = 0, 1, ..., T − 1 are given, ε(t) is an unknown stationary error series with mean 0 and power spectrum f_εε(λ), the a(u) are unknown coefficients, μ is an unknown parameter, and S(t) is a fixed function. For example, we might consider the linear trend model

X(t) = μ + αt + ε(t)   (27)

with μ and α unknown, and be interested in estimating f_εε(λ). Or we might have taken S(t) to be the input series to a linear filter with unknown impulse-response function a(u), u = 0, ±1, ..., in an attempt to identify the system, that is, to estimate the transfer function A(λ) = Σ a(u) exp {−iλu} and the a(u). The model (26) for the series X(t) differs in an important way from the previous models of this paper. The series X(t) is not generally stationary, because EX(t) = μ + Σ a(t − u) S(u).
Estimates of the preceding parameters may be constructed as follows: define

d_X^(T)(λ) = Σ_{t=0}^{T−1} X(t) exp {−iλt}

with similar definitions for d_S^(T)(λ), d_ε^(T)(λ). Then (26) leads to the approximate relationship

d_X^(T)(λ_j(T)) ≈ A(λ_j(T)) d_S^(T)(λ_j(T)) + d_ε^(T)(λ_j(T))   (28)

for j = 1, ..., J. Following b) of Section IV, the d_ε^(T)(λ_j(T)) are, for large T, approximately independent complex normal variates with mean 0 and variance 2πT f_εε(λ). The approximate model (28) is seen to take the form of linear regression. The results of linear least-squares theory now suggest our consideration of the estimates

Â^(T)(λ) = f̂_XS^(T)(λ) / f̂_SS^(T)(λ)   (29)

where

f̂_XS^(T)(λ) = J^{−1} Σ_{j=1}^{J} (2πT)^{−1} d_X^(T)(λ_j(T)) d_S^(T)(λ_j(T))*

with similar definitions for f̂_SS^(T), f̂_XX^(T), f̂_εε^(T). The impulse response could be estimated by an expression such as

â^(T)(u) = P^{−1} Σ_{p=0}^{P−1} Â^(T)(2πp/P) exp {i2πpu/P}

for some integer P. In some circumstances it may be appropriate to taper the data prior to computing the Fourier transform. In others it might make sense to base the Fourier transforms on disjoint stretches of data in the manner of d) of Section IV.

Under regularity conditions the estimate Â^(T)(λ) may be shown to be asymptotically complex normal with mean A(λ) and variance J^{−1} f_εε(λ) f̂_SS^(T)(λ)^{−1} (see [20]). The degree of fit of the model (26) at frequency λ may be measured by the sample coherence function

|R̂_XS^(T)(λ)|² = |f̂_XS^(T)(λ)|² / [f̂_XX^(T)(λ) f̂_SS^(T)(λ)]

satisfying

0 ≤ |R̂_XS^(T)(λ)|² ≤ 1.

This function provides a time series analog of the squared coefficient of correlation of two variates (see Koopmans [52]).
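Estimate (29) can be sketched as a ratio of averaged cross- and auto-periodogram terms. In the degenerate noise-free case X(t) = 2 S(t), the estimate returns A(λ) = 2 exactly; the input series below is an arbitrary illustrative choice:

```python
import cmath, math

def dft(x, lam):
    return sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))

def transfer_estimate(x, s, lam, J):
    """Estimate A(λ) as in (29): ratio of averaged cross- and
    auto-periodograms over J Fourier frequencies nearest λ."""
    T = len(x)
    s0 = max(1, round(lam * T / (2 * math.pi)) - J // 2)
    num = den = 0.0
    for j in range(J):
        l = 2 * math.pi * (s0 + j) / T
        dx, ds = dft(x, l), dft(s, l)
        num += dx * ds.conjugate()
        den += abs(ds) ** 2
    return num / den

# With X(t) = 2 S(t) exactly, the estimate recovers A(λ) = 2 at any frequency.
s = [math.sin(0.3 * t) + 0.1 * ((-1) ** t) for t in range(64)]
x = [2.0 * v for v in s]
A_hat = transfer_estimate(x, s, 1.0, 4)
assert abs(A_hat - 2.0) < 1e-9
```

With noise present, Â^(T)(λ) is no longer exact, and its scatter is governed by the asymptotic variance stated above.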
The procedure of prefiltering is often essential in the estimation of the parameters of the model (26). Consider a common relationship in which the series X(t) is essentially a delayed version of the series S(t), namely

X(t) = α S(t − v) + ε(t)

for some v. In this case

A(λ) = α exp {−iλv}

and

f̂_XS^(T)(λ) = α J^{−1} Σ_{j=1}^{J} exp {−iλ_j(T) v} I_SS^(T)(λ_j(T)) + f̂_εS^(T)(λ).   (30)

If v is large, the complex exponential fluctuates rapidly about 0 as j changes, and the first term on the right-hand side of (30) may be near 0 instead of the desired α exp {−iλv} f̂_SS^(T)(λ). A useful prefiltering for this situation is to estimate v by v̂, the lag that maximizes the magnitude of the sample cross-covariance function, and then to carry out the spectral computations on the data X(t), S(t − v̂), see Akaike and Yamanouchi [53] and Tick [54]. In general, one should prefilter the X(t) series or the S(t) series or both, so that the relationship between the filtered series is as near to being instantaneous as is possible.
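The delay-estimation step of this prefiltering recipe is easy to sketch: compute the sample cross-covariance function and take v̂ to be the lag of largest magnitude (the simulated series below is an illustrative assumption):

```python
import random

def cross_cov(x, s, u):
    """Sample cross-covariance of X(t) and S(t - u), means removed."""
    T = len(x)
    mx, ms = sum(x) / T, sum(s) / T
    terms = [(x[t] - mx) * (s[t - u] - ms) for t in range(T) if 0 <= t - u < T]
    return sum(terms) / T

def estimate_delay(x, s, max_lag):
    """v-hat: the lag maximizing |c_XS(u)|, as suggested for prefiltering."""
    return max(range(-max_lag, max_lag + 1), key=lambda u: abs(cross_cov(x, s, u)))

random.seed(2)
s = [random.gauss(0.0, 1.0) for _ in range(200)]
x = [s[t - 3] if t >= 3 else 0.0 for t in range(200)]   # X(t) = S(t - 3)
assert estimate_delay(x, s, 10) == 3
```

One would then realign the data to X(t), S(t − v̂) before forming the cross-spectral estimates.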
The most important use of the calculations we have described is in the identification of linear systems. It used to be the case that the transfer function of a linear system was estimated by probing the system with pure sine waves in a succession of experiments. Expression (29) shows, however, that we can estimate the transfer function, for all λ, by simply employing a single input series S(t) such that f̂_SS^(T)(λ) ≠ 0.

In some situations we may have reason to believe that the system (26) is realizable, that is, a(u) = 0 for u < 0. The factorization techniques of Wiener [36] may be paralleled on the data in order to obtain estimates of A(λ), a(u) appropriate to this case, see Bhansali [55]. In Section IX, we will discuss a model like (26), but for the case of stochastic S(t).
Another useful linear model is

X(t) = Σ_{k=1}^{K} θ_k φ_k(t) + ε(t)

with φ_1(t), ..., φ_K(t) given functions and θ_1, ..., θ_K unknown. The estimation of these unknowns and f_εε(λ) is considered in Hannan [44] and Anderson [45]. This model allows us to handle trends and seasonal effects.

Yet another useful model is

X(t) = μ + Σ_{k=1}^{K} ρ_k cos (θ_k t + α_k) + ε(t)

with μ, ρ_1, θ_1, α_1, ..., ρ_K, θ_K, α_K unknown. The estimation of these unknowns and f_εε(λ) is considered in Whittle [49]. It allows us to handle hidden periodicities.
IX. VECTOR-VALUED CONTINUOUS SPATIAL SERIES

In this section we move on from a consideration of real-valued discrete time series to series with a more complicated domain, namely p-dimensional Euclidean space, and with a more complicated range, namely r-dimensional Euclidean space. This step will allow us to consider data such as: that received by an array of antennas or seismometers, picture or TV data, holographic data, or a turbulent field.

Provided we set down our notation judiciously, the changes
involved are not dramatic. The notation that we shall adopt includes the following: boldface letters such as X, a, A will denote vectors and matrices. A^τ will denote the transpose of a matrix A, tr A will denote its trace, det A will denote its determinant. EX will denote the vector whose entries are the expected values of the corresponding entries of the vector-valued variate X. cov {X, Y} = E{(X − EX)(Y − EY)*^τ} will denote the covariance matrix of the two vector-valued variates X, Y (that may have complex entries), with * denoting complex conjugation. t, u, λ will lie in p-dimensional Euclidean space, R^p, with

tλ = t_1 λ_1 + ··· + t_p λ_p.

The limits of integrals will be from −∞ to ∞, unless indicated otherwise.

We will proceed by paralleling the development of Sections III and IV. Suppose that we are interested in analyzing measurements made simultaneously on r series of interest at location t, for all locations in some subset of the hypercube 0 ≤ t_1, ..., t_p ≤ T. Suppose that we are prepared to model the measurements by the corresponding values of a realization of an r vector-valued stationary continuous spatial series X(t), t ∈ R^p. We define the mean

c_X = EX(t),

the autocovariance function

c_XX(u) = cov {X(t + u), X(t)},

and the spectral density matrix

f_XX(λ) = (2π)^{−p} ∫ c_XX(u) exp {−iuλ} du   (31)

in the case that the integral exists. (The integral will exist when well-separated values of the series are sufficiently weakly dependent.) The inverse of the relationship (31) is

c_XX(u) = ∫ exp {iuλ} f_XX(λ) dλ.   (32)
Let

Y(t) = ∫ a(t − u) X(u) du

be a linear filter carrying the r vector-valued series X(t) into the s vector-valued series Y(t). Let

A(λ) = ∫ a(u) exp {−iuλ} du

denote the transfer function of this filter. Then the spectral density matrix of the series Y(t) may be seen to be

f_YY(λ) = A(λ) f_XX(λ) A(λ)*^τ.   (33)

As in Section III, expressions (32) and (33) may be combined to see that the entry in row j, column k of the matrix f_XX(λ) may be interpreted as the covariance of the series resulting from passing the jth and kth components of X(t) through narrow band-pass filters with transfer functions A(α) = δ(α − λ).

The series has a Cramér representation

X(t) = ∫ exp {itλ} dZ_X(λ)

where Z_X(λ) is an r vector-valued random function with the properties

E dZ_X(λ) = δ(λ) c_X dλ

cov {dZ_X(λ), dZ_X(μ)} = δ(λ − μ) f_XX(λ) dλ dμ.

If Y(t) is the filtered version of X(t), then it has Cramér representation

Y(t) = ∫ exp {itλ} A(λ) dZ_X(λ).
We turn to a discussion of useful computations when values of the series X(t) are available for t in some subset of the hypercube 0 ≤ t_1, ..., t_p ≤ T. Let φ^(T)(t) be a data window whose support (that is, the region of locations where φ^(T)(t) ≠ 0) is the region of observation of X(t). (We might take φ^(T)(t) of the form φ(t/T) where φ(t) = 0 outside 0 ≤ t_1, ..., t_p ≤ 1.) We consider the Fourier transform

d_X^(T)(λ) = ∫ φ^(T)(t) X(t) exp {−itλ} dt

based on the observed sample values.
Before indicating an approximate large sample distribution for d_X^(T)(λ), we must first define the complex multivariate normal distribution and the complex Wishart distribution. We say that a vector-valued variate X, with complex entries, is multivariate complex normal with mean 0 and covariance matrix Σ when it has probability density proportional to exp {−X*^τ Σ^{−1} X}. We shall say that a matrix-valued variate is complex Wishart with n degrees of freedom and parameter Σ when it has the form X_1 X_1*^τ + ··· + X_n X_n*^τ, where X_1, ..., X_n are independent multivariate complex normal variates with mean 0 and covariance matrix Σ. In the one dimensional case, the complex Wishart with n degrees of freedom is a multiple of a chi-squared variate with 2n degrees of freedom.
In the case that well-separated values of the series X(t) areonly weakly dependent, the d^\\) have useful asymptoticproperties as T -*• °°. These include:
a') d_X^(T)(0) is asymptotically multivariate normal with mean ∫φ^(T)(t) dt c_X and covariance matrix (2π)^p ∫φ^(T)(t)^2 dt f_XX(0);

b') for λ ≠ 0, d_X^(T)(λ) is asymptotically multivariate complex normal with mean 0 and covariance matrix (2π)^p ∫φ^(T)(t)^2 dt f_XX(λ);

c') for λ^j(T) → λ ≠ 0, with λ^j(T) − λ^k(T) not tending to 0 too rapidly for 1 ≤ j < k ≤ J, the variates d_X^(T)(λ^1(T)), …, d_X^(T)(λ^J(T)) are asymptotically independent multivariate complex normal with mean 0 and covariance matrix (2π)^p ∫φ^(T)(t)^2 dt f_XX(λ);

d') if the data windows φ_j^(T)(t), j = 1, …, J, have disjoint supports, then the corresponding Fourier transforms d_X^(T)(λ; j), j = 1, …, J, are asymptotically independent multivariate complex normal with mean 0 and respective covariance matrices (2π)^p ∫φ_j^(T)(t)^2 dt f_XX(λ), j = 1, …, J.
Specific conditions under which these results hold are given in Section XV. A proof is given in the Appendix.
Results a'), b') are forms of the central limit theorem. In result d') the Fourier transforms are based on values of X(t) over disjoint domains. It is interesting to note, from c') and d'), that asymptotically independent statistics may be obtained
by either taking the Fourier transform at distinct frequencies or at the same frequency, but over disjoint domains.
Result a') suggests estimating the mean c_X by

Result b') suggests the consideration of the periodogram matrix

as an estimate of f_XX(λ) when λ ≠ 0. From b') its asymptotic distribution is complex Wishart with 1 degree of freedom and parameter f_XX(λ). This estimate is often inappropriate because of its instability and singularity. Result c') suggests the consideration of the estimate

where J is chosen large enough to obtain acceptable stability, but not so large that the estimate becomes overly biased. From c') the asymptotic distribution of the estimate (37) is complex Wishart with J degrees of freedom and parameter f_XX(λ). In the case r = 1 this asymptotic distribution is that of f_XX(λ)χ²_2J/2J. Result d') suggests the consideration of the periodogram matrices

j = 1, …, J as estimates of f_XX(λ), λ ≠ 0. The estimate

will have as asymptotic distribution J^{-1} times a complex Wishart with J degrees of freedom and parameter f_XX(λ), following result d'). We could clearly modify the estimates (37), (39) by using a finer spacing of frequencies and by averaging periodograms based on data over nondisjoint domains. The exact asymptotic distributions will not be so simple in these cases.

The method of fitting finite parameter models, described in
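The periodogram matrix and the frequency-averaged estimate discussed above can be sketched as follows (Python; the restriction to p = 1, the boxcar data window, and the choice J = 11 are illustrative assumptions):

```python
import numpy as np

def periodogram_matrix(x, lam):
    """I_XX^(T)(lam) = (2 pi T)^(-1) d(lam) d(lam)* for rows x(t) of an r-vector series."""
    T = x.shape[0]
    t = np.arange(T)
    d = (x * np.exp(-1j * lam * t)[:, None]).sum(axis=0)
    return np.outer(d, d.conj()) / (2 * np.pi * T)

def smoothed_estimate(x, lam, J=11):
    """Average of periodograms at J distinct Fourier frequencies near lam."""
    T = x.shape[0]
    k0 = int(round(lam * T / (2 * np.pi)))
    ks = [k for k in range(k0 - J // 2, k0 + J // 2 + 1) if k != 0]
    return sum(periodogram_matrix(x, 2 * np.pi * k / T) for k in ks) / len(ks)

rng = np.random.default_rng(2)
x = rng.standard_normal((512, 2))       # white noise: true f_XX(lam) = I/(2 pi)
f_hat = smoothed_estimate(x, np.pi / 4)
```

The single periodogram is a rank-one matrix (hence singular for r > 1); averaging over J nearby frequencies produces the stabler, generally nonsingular estimate.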
Section VII, extends directly to this vector-valued situation. Result b') suggests the replacement of the likelihood function (25) by
in this new case, for some large values S_1, …, S_p such that there is little power left beyond the cutoff frequency (2πS_1/T, …, 2πS_p/T). Suppose that θ̂ is the value of θ leading to the maximum of (40). Under regularity conditions, we can show that θ̂ is asymptotically normal with mean θ and covariance matrix 2πT^{-1}A^{-1}(A + B)A^{-1} where, if A_jk, B_jk are the entries in row j, column k of A, B,
with C_ab,j(α) the entry in row a, column b of
In a number of situations we find ourselves led to consider an (r + s) vector-valued series,
satisfying a linear model of the form
for some s vector μ and some s × r matrix-valued function a(u). The model says that the average level of the series X(t) at position t, given the series S(t), is a linear filtered version of the series S(t). If (41) is a stationary series and if A(λ) is the transfer function of the filter a(u), then (42) implies
If we define the error series e(t) by
then the degree of fit of the model (42) may be measured by the error spectral density
The relationships (43)-(45) suggest the estimates
respectively. The asymptotic distributions of these statistics are given in [26].
If there is a possibility that the matrix f_SS^(T)(λ) might become nearly singular, then we would be better off replacing the estimate (46) by a frequency domain analog of the ridge regression estimate (Hoerl and Kennard [56], Hunt [57]), such as
for some k > 0 and I the identity matrix. This estimate introduces further bias, over what was already present, but it is hoped that its increased stability more than accounts for this. In some circumstances we might choose k to depend on λ and to be matrix-valued.
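A minimal sketch of the transfer function estimate and of its ridge analog, assuming smoothed cross- and auto-spectral estimates f_xs and f_ss are already in hand at one frequency (the numerical values are purely illustrative):

```python
import numpy as np

def transfer_estimate(f_xs, f_ss, k=0.0):
    """A_hat = f_xs (f_ss + k I)^(-1); k = 0 gives the plain estimate,
    k > 0 the frequency-domain ridge analog."""
    return f_xs @ np.linalg.inv(f_ss + k * np.eye(f_ss.shape[0]))

f_ss = np.array([[1.0, 0.999], [0.999, 1.0]])   # nearly singular auto-spectral matrix
f_xs = np.array([[1.0, 1.0]])                   # cross-spectral row, illustrative
a_plain = transfer_estimate(f_xs, f_ss)
a_ridge = transfer_estimate(f_xs, f_ss, k=0.1)
```

Adding kI inflates the small eigenvalues of f_ss before inversion, trading some bias for stability, exactly as in ordinary ridge regression.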
X. ADDITIONAL RESULTS IN THE SPATIAL SERIES CASE

The results of the previous section have not taken any essential notice of the fact that the argument t of the random function under consideration is multidimensional. We now indicate some new results pertinent to the multidimensional character.

In some situations, we may be prepared to assume that the series X(t), t ∈ R^p, is isotropic, that is, the autocovariance function c_XX(u) = cov{X(t + u), X(t)} is a function of |u| only. In this case the spectral density matrix f_XX(λ) is also rotationally symmetric, depending only on |λ|. In fact (see Bochner and Chandrasekharan [58, p. 69])
where J_k(·) is the Bessel function of the first kind of order k. The relationship (50) may be inverted as follows,

The simplified character of f_XX(λ) in the isotropic case makes its estimation and display much simpler. We can estimate it by an expression such as

where the λ^j(T) are distinct, but with |λ^j(T)| near |λ|. There are many more λ^j(T) with |λ^j(T)| near |λ| than there are λ^j(T) with λ^j(T) near λ. It follows that we generally obtain a much better estimate of the spectrum in this case than in the general case. Also the number of λ^j(T) with |λ^j(T)| near |λ| increases as |λ| increases. It follows that the estimate formed will generally be more stable for the frequencies with |λ| large. Examples of power spectra estimated in this manner may be found in Mannos [59].

Another thing that can occur in the general p dimensional case is the definition of marginal processes and marginal spectra. We are presently considering processes X(t_1, …, t_p). Suppose that for some n, 1 ≤ n < p, we are interested in the process with t_{n+1}, …, t_p fixed, say at 0, …, 0. By inspection we see that the marginal process X(t_1, …, t_n, 0, …, 0) has autocovariance function c_XX(u_1, …, u_n, 0, …, 0). The spectral density matrix of the marginal process is, therefore,

We see that we obtain the spectral density of the marginal process by integrating the complete spectral density. The same remark applies to the Cramér representation for
Vector-valued series with multidimensional domain are discussed in Hannan [44] and Brillinger [26].
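The isotropic estimate described above, averaging the periodogram over all frequencies of nearly equal modulus, might be sketched as follows for p = 2 (Python; the grid size and annulus width are illustrative assumptions):

```python
import numpy as np

def isotropic_estimate(x, modulus, width=0.2):
    """Average the 2-d periodogram over frequencies lam with ||lam| - modulus| < width."""
    T = x.shape[0]
    per = np.abs(np.fft.fft2(x))**2 / ((2 * np.pi)**2 * x.size)
    freqs = 2 * np.pi * np.fft.fftfreq(T)          # grid of Fourier frequencies
    l1, l2 = np.meshgrid(freqs, freqs, indexing="ij")
    mod = np.hypot(l1, l2)
    mask = (np.abs(mod - modulus) < width) & (mod > 0)
    return per[mask].mean(), int(mask.sum())

rng = np.random.default_rng(3)
x = rng.standard_normal((64, 64))
f_lo, n_lo = isotropic_estimate(x, 0.5)
f_hi, n_hi = isotropic_estimate(x, 2.0)
```

The returned count illustrates the remark in the text: more frequencies λ^j(T) have |λ^j(T)| near |λ| when |λ| is large, so the estimate is stabler there.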
XI. ADDITIONAL RESULTS IN THE VECTOR CASE

In the case that the series X(t) is r vector-valued with r > 1, we can describe analogs of the classical procedures of multivariate analysis including, for example: i) partial correlation, ii) principal component analysis, iii) canonical correlation analysis, iv) cluster analysis, v) discriminant analysis, vi) multivariate analysis of variance, and vii) simultaneous equations. These analogs proceed from c') or d') of the earlier section. The procedures listed are often developed for samples from multivariate normal distributions. We obtain the time series procedure by identifying the d_X^(T)(λ^j(T)), j = 1, …, J, or d_X^(T)(λ; j), j = 0, …, J − 1, with independent multivariate normals having mean 0 and covariance matrix (2π)^p ∫φ^(T)(t)^2 dt f_XX(λ) and substituting into the formulas developed for the classical situation. For example, stationary time series analogs of correlation coefficients are provided by the
the coherency at frequency λ of the jth component with the kth component of X(t), where f_jk(λ) is the entry in row j, column k of f_XX(λ) and d_j^(T)(λ) is the entry in row j of d_X^(T)(λ) for j, k = 1, …, r. The parameter R_jk(λ) satisfies 0 ≤ |R_jk(λ)| ≤ 1 and is seen to provide a measure of the degree of linear relationship of the series X_j(t) with the series X_k(t) at frequency λ. Its modulus squared, |R_jk(λ)|^2, is called the coherence. It may be estimated by
where f_jk^(T)(λ) is an estimate of f_jk(λ). As time series papers on corresponding multivariate topics,
we mention in case i) Tick [60], Granger [61], Goodman [62], Bendat and Piersol [63], Groves and Hannan [64], and Gersch [65]; in case ii) Goodman [66], Brillinger [67], [20], and Priestley et al. [68]; in case iii) Brillinger [67], [20],
Miyata [69], and Priestley et al. [68]; in case iv) Liggett [70]; in case v) Brillinger [20]; in case vi) Brillinger [71]; in case vii) Brillinger and Hatanaka [72], and Hannan and Terrell [73].
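The coherence estimate described above can be sketched as follows (Python; the restriction to p = 1, the boxcar window, and the signal-plus-noise pair used for illustration are assumptions):

```python
import numpy as np

def spectral_matrix(x, lam, J=11):
    """Smoothed periodogram estimate of f_XX(lam); p = 1, boxcar window."""
    T = x.shape[0]
    t = np.arange(T)
    k0 = int(round(lam * T / (2 * np.pi)))
    ks = [k for k in range(k0 - J // 2, k0 + J // 2 + 1) if k != 0]
    f = np.zeros((x.shape[1], x.shape[1]), dtype=complex)
    for k in ks:
        d = (x * np.exp(-2j * np.pi * k * t / T)[:, None]).sum(axis=0)
        f += np.outer(d, d.conj())
    return f / (2 * np.pi * T * len(ks))

def coherence(x, j, k, lam):
    """|R_jk(lam)|^2 estimated as |f_jk|^2 / (f_jj f_kk)."""
    f = spectral_matrix(x, lam)
    return np.abs(f[j, k])**2 / (f[j, j].real * f[k, k].real)

rng = np.random.default_rng(4)
s = rng.standard_normal(1024)
x = np.column_stack([s + 0.1 * rng.standard_normal(1024),
                     s + 0.1 * rng.standard_normal(1024)])  # strongly related pair
coh = coherence(x, 0, 1, np.pi / 3)
```

Because the smoothed matrix is an average of rank-one nonnegative matrices, the estimated coherence automatically lies in [0, 1]; smoothing over J > 1 frequencies is essential, since a single periodogram gives coherence identically 1.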
Instead of reviewing each of the time series analogs we content ourselves by indicating a form of discriminant analysis that can be carried out in the time series situation. Suppose that a segment of the r vector-valued series X(t) is available and that its spectral density matrix may be any one of f_i(λ), i = 1, …, I. Suppose that we wish to construct a rule for assigning X(t) to one of the f_i(λ).
In the case of a variate U coming from one of I multivariate normal populations with mean 0 and covariance matrix Σ_i, i = 1, …, I, a common discrimination procedure is to define a discriminant score

for the ith population and then to assign the observation U to the population for which the discriminant score has the highest value (see Rao [74, p. 488]). The discriminant score is essentially the logarithm of the probability density of the ith population.
Result b') suggests a time series analog for this procedure. If the spectral density of the series X(t) is f_i(λ), the log density of d_X^(T)(λ) is essentially
This provides a discriminant score for each frequency λ. A more stable score would be provided by the smoothed version

with f_XX^(T)(λ) given by (37) or (39). These scores could be plotted against λ for i = 1, …, I in order to carry out the required discrimination. In the case that the f_i(λ) are unknown, their values could be replaced by estimates in (52).
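At a single frequency, a score of this form might be computed as follows (Python sketch; the two candidate spectral density matrices and the estimate are hypothetical values):

```python
import numpy as np

def discriminant_score(f_hat, f_i):
    """Essential log Gaussian density: -log det f_i - tr{ f_hat f_i^(-1) }."""
    _, logdet = np.linalg.slogdet(f_i)
    return -logdet - np.trace(f_hat @ np.linalg.inv(f_i)).real

f_1 = np.eye(2)            # hypothetical population spectra at one frequency
f_2 = 4.0 * np.eye(2)
f_hat = 1.1 * np.eye(2)    # estimated spectral matrix, close to the first population
scores = [discriminant_score(f_hat, f) for f in (f_1, f_2)]
best = int(np.argmax(scores))   # assign to the highest-scoring population
```

Summing such scores over a band of frequencies gives the smoothed version indicated in the text.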
XII. ADDITIONAL RESULTS IN THE CONTINUOUS CASE

In Section IX, we changed to a continuous domain in contrast to the discrete domain we began with in Section III. In many problems, we must deal with both sorts of domains because, while the phenomenon of interest may correspond to a continuous domain, observational and computational considerations may force us to deal with the values of the process for a discrete domain. This occurrence gives rise to the complication of aliasing. Let Z denote the set of integers 0, ±1, …. Suppose X(t), t ∈ R^p, is a stationary continuous spatial series with spectral density matrix f_XX(λ) and Cramér representation
Suppose X(t) is observable only for t ∈ Z^p. For these values of t

This is the Cramér representation of a discrete series with spectral density matrix

We see that if the series X(t) is observable only for t ∈ Z^p, then there is no way of untangling the frequencies

These frequencies are called the aliases of the fundamental frequency λ.

XIII. STATIONARY POINT PROCESSES

A variety of problems, such as those of traffic systems, queues, nerve pulses, shot noise, impulse noise, and the microscopic theory of gases, lead us to data that have the character of times or positions in space at which certain events have occurred. We turn now to indicating how the formulas we have presented so far in this paper must be modified to apply to data of this new character.

Suppose that we are recording the positions in p-dimensional Euclidean space at which events of r distinct types occur. For j = 1, …, r let X_j(t) = X_j(t_1, …, t_p) denote the number of events of the jth type that occur in the hypercube (0, t_1] × ⋯ × (0, t_p]. Let dX_j(t) denote the number that occur in the small hypercube (t_1, t_1 + dt_1] × ⋯ × (t_p, t_p + dt_p]. Suppose that joint distributions of variates such as dX(t^1), …, dX(t^k) are unaffected by simple translation of t^1, …, t^k; we then say that X(t) is a stationary point process.

Stationary point process analogs of definitions set down previously include

c_X is called the mean intensity of the process,
This last refers to an (r + s) vector-valued point process. It says that the instantaneous intensity of the series X(t) at position t, given the location of all the points of the process S(u), is a linear translation invariant function of the process S(u). The locations of the points of X(t) are affected by where the points of S(u) are located. We may define here a stationary random measure dε(t) by
The change in going from the case of spatial series to the case of point processes is seen to be the replacement of X(t) dt by dX(t). In the case that well-separated increments of the process are only weakly dependent, the results a')-d') of Section IX hold without further redefinition.
We next indicate some statistics that it is useful to calculate when the process X(t) has been observed over some region. The Fourier transform is now
for the data window φ^(T)(t) whose support corresponds to the domain of observation. If r = 1 and points occur at the positions τ_1, τ_2, …, then this last has the form
We may compute Fourier transforms for different domains, in which case we define
References to the theory of stationary point processes include: Cox and Lewis [75], Brillinger [76], Daley and Vere-Jones [77], and Fisher [78]. We remark that the material of this section applies equally to the case in which dX(t) is a general stationary random measure; for example, with p, r = 1, we might take dX(t) to be the amount of energy released by earthquakes in the time interval (t, t + dt). In the next section we indicate some results that do take note of the specific character of a point process.
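For r = 1, the point process Fourier transform reduces to the sum of complex exponentials over the observed point positions; a Python sketch (the indicator data window and the Poisson example are illustrative assumptions):

```python
import numpy as np

def point_process_ft(taus, lam, T):
    """d_X^(T)(lam) = sum_j exp(-i lam tau_j) over the points tau_j in (0, T]."""
    taus = np.asarray(taus, dtype=float)
    taus = taus[(taus > 0) & (taus <= T)]   # indicator data window on (0, T]
    return np.sum(np.exp(-1j * lam * taus))

rng = np.random.default_rng(5)
T = 100.0
taus = np.cumsum(rng.exponential(1.0, size=200))  # Poisson process, intensity 1
d0 = point_process_ft(taus, 0.0, T)               # at lam = 0: the observed count
```

At frequency 0 the transform is simply the number of points observed, which is the quantity entering result a') for point processes.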
XIV. NEW THINGS IN THE POINT PROCESS CASE

In the case of a point process, the parameters c_X, C_XX(u) have interpretations further to their definitions (53), (54). Suppose that the process is orderly, that is, the probability that a small region contains more than one point is very small. Then, for small dt

c_j dt = E dX_j(t) = Pr[there is an event of type j in (t, t + dt]].

It follows that c_j may be interpreted as the intensity with which points of type j are occurring. Likewise, for u ≠ 0
It follows that
In the case that the processes X_j(t) and X_k(t) are independent, expression (62) is equal to c_j c_k du dt.
If the derivative c_jk(u) = dC_jk(u)/du exists for u ≠ 0 it is called the cross-covariance density of the two processes in the case j ≠ k and the autocovariance density in the case j = k. For many processes
and so the power spectrum of the process X_j(t) is given by

For a Poisson process c_jj(u) = 0 and so f_jj(λ) = (2π)^{-p} c_j. The parameter (2π)^p f_XX(0)/c_X is useful in the classification of real-valued point processes. From a')
It follows that, for large T, (2π)^p f_XX(0)/c_X is the ratio of the variance of the number of points in the hypercube (0, T]^p for the process X(t) to the variance of the number of points in the same hypercube for a Poisson process with the same intensity c_X. For this reason we say that the process X(t) is under-dispersed or clustered if the ratio is greater than 1 and over-dispersed if the ratio is less than 1.
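This classification by the variance-to-mean ratio of counts can be illustrated by simulation (Python sketch; the particular clustering mechanism below is a hypothetical choice, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)

def counts(point_fn, n_reps, T):
    """Number of points falling in (0, T] over repeated realizations."""
    out = []
    for _ in range(n_reps):
        p = point_fn(T)
        out.append(np.count_nonzero((p > 0) & (p <= T)))
    return np.array(out)

def poisson_points(T):
    return np.cumsum(rng.exponential(1.0, size=int(3 * T)))

def clustered_points(T):
    # hypothetical cluster process: Poisson parents (rate 1/3), 3 offspring each
    parents = np.cumsum(rng.exponential(3.0, size=int(T)))
    return np.repeat(parents, 3) + rng.normal(0, 0.1, size=3 * len(parents))

T = 50.0
c_p = counts(poisson_points, 400, T)
c_c = counts(clustered_points, 400, T)
ratio_pois = c_p.var() / c_p.mean()   # near 1 for a Poisson process
ratio_clus = c_c.var() / c_c.mean()   # well above 1 for the clustered process
```

Both processes have the same mean intensity, but the clustered one shows a count variance several times the Poisson value.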
The estimation procedure described in Section VII for models with a finite number of parameters is especially useful in the point process case as, typically, convenient time domain estimation procedures do not exist at all. Results of applying such a procedure are indicated in [79].
XV. STATIONARY RANDOM SCHWARTZ DISTRIBUTIONS

In this section, we present the theory of Schwartz distributions (or generalized functions) needed to develop properties of the Fourier transforms of random Schwartz distributions. These last are important as they contain the processes discussed so far in this paper as particular cases. In addition they contain other interesting processes as particular cases, such as processes whose components are a combination of the processes discussed so far and the processes with stationary increments that are useful in the study of turbulence; see Yaglom [80]. A further advantage of this abstract approach is that the assumptions needed to develop results are cut back to essentials. References to the theory of Schwartz distributions include Schwartz [81] and Papoulis [82].
Let D denote the space of infinitely differentiable functions on R^p with compact support. Let S denote the space of infinitely differentiable functions on R^p with rapid decrease, that is, such that if φ^(q)(t) denotes a derivative of order q then
A continuous linear functional on D is called a Schwartz distribution or generalized function. The Dirac delta function that we have been using throughout the paper is an example. A continuous linear functional on S is called a tempered distribution.
Suppose now that a random experiment is being carried out, the possible results of which are continuous linear maps X from D to L^2(P), the space of square integrable functions for a probability measure P. Suppose that r of these maps are collected into an r vector, X(φ). We call X(φ) an r vector-valued random Schwartz distribution. It is possible to talk about
things such as E X(φ), cov{X(φ), X(ψ)} in this case. An important family of transformations on D consists of the shifts S^u defined by S^u φ(t) = φ(t + u), t, u ∈ R^p. The random Schwartz distribution is called wide-sense stationary when

for all u ∈ R^p and φ, ψ ∈ D. It is called strictly stationary when all the distributions of finite numbers of values are invariant under the shifts.
Let us denote the convolution of two functions φ, ψ ∈ D by

and the Fourier transform of a function in S by the corresponding capital letter
then we can set down the following theorem.

Theorem 1 (Ito [83], Yaglom [80]): If X(φ), φ ∈ D, is a wide-sense stationary random Schwartz distribution, then
and
where c_X is an r vector, c_XX(·) is an r × r matrix of tempered distributions, F_XX(λ) is a nonnegative matrix-valued measure satisfying

for some nonnegative integer k, and finally Z_X(λ) is a random function satisfying
The spatial series of Section IX is a random Schwartz distribution corresponding to the functional

for φ ∈ D. The representations indicated in that section may be deduced from the results of Theorem 1. It may be shown that k of (67) may be taken to be 0 for this case.
The stationary point process of Section XIII is likewise a random Schwartz distribution corresponding to the functional

for φ ∈ D. The representations of Section XIII may be deduced from Theorem 1. It may be shown that k of (67) may be taken to be 2 for this case.
Gelfand and Vilenkin [84] is a general reference to the theory of random Schwartz distributions. Theorem 1 is proved there.
A linear model that extends those of (42) and (58) to the present situation is one in which the (r + s) vector-valued stationary random Schwartz distribution

satisfies

In the case that the spectral measure is differentiable this last implies that

suggesting that the system may be identified if the spectral density may be estimated. We next set down a mixing assumption, before constructing such an estimate and determining its asymptotic properties.

Given k variates X_1, …, X_k let cum{X_1, …, X_k} denote their joint cumulant or semi-invariant. Cumulants are defined
and discussed in Kendall and Stuart [85] and Brillinger [20]. They are the elementary functions of the moments of the variates that vanish when the variates are independent. As such they provide measures of the degree of dependence of variates. We will make use of
Assumption 1: X(φ) is a stationary random Schwartz distribution with the property that for φ_1, …, φ_k ∈ S and

for some finite m_1, …, m_{k-1}, L_k.

In the case that the spectral measure F_XX(λ) is differentiable,
relation (65) corresponds to the case k = 2 of (72). The character of Assumption 1 is one of limiting the size of the cumulants of the functionals of the process X(φ). It will be shown in the Appendix that it is a form of weak dependence requirement for functionals of the process that are far apart in t. The function f_{a_1...a_k}(λ^1, …, λ^{k-1}) appearing in (72) is called a cumulant spectrum of order k; see Brillinger [86] and the references therein. From (66) we see that it is also given by
The fact that it only depends on k − 1 arguments results from the assumed stationarity of the process.
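The joint cumulants just described are elementary functions of moments; for k = 2 and k = 3 they might be computed as follows (Python sketch applying the moment formulas to the empirical distribution of small illustrative arrays):

```python
import numpy as np

def cum2(x, y):
    """cum{X, Y} = E XY - E X E Y (under the empirical distribution)."""
    return np.mean(x * y) - np.mean(x) * np.mean(y)

def cum3(x, y, z):
    """Third-order joint cumulant in terms of moments."""
    return (np.mean(x * y * z)
            - np.mean(x) * np.mean(y * z)
            - np.mean(y) * np.mean(x * z)
            - np.mean(z) * np.mean(x * y)
            + 2 * np.mean(x) * np.mean(y) * np.mean(z))

x = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([0.0, 1.0, 0.0, 1.0])   # empirically independent of x by design
```

With this product design the cumulants involving both x and y vanish exactly, illustrating the remark that cumulants vanish for independent variates.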
Let φ^(T)(t) = φ(t/T) with φ ∈ D. As an analog of the Fourier transforms of Sections IX and XIII we now define
for the stationary random Schwartz distribution X(φ). We can now state the following theorem.
Theorem 2: If Assumption 1 is satisfied, if d_X^(T)(λ) is given by (74), and if T|λ^j(T) − λ^k(T)| → ∞ for 1 ≤ j < k ≤ J, then a')-d') of Section IX hold.
This theorem is proved in the Appendix. It provides a justification for the estimation procedures suggested in the paper
and for the large sample approximations suggested for the distributions of the estimates.
We end this section by mentioning that a point process with events at positions τ_k, k = 1, 2, … may be represented by the generalized function

the sampled function of Section III may be represented by the generalized function

and that a point process with associated variate S may be represented by
see Beutler and Leneman [87]. Matheron [92] discusses theuse of random Schwartz distributions in the smoothing of maps.
XVI. HIGHER ORDER SPECTRA AND NONLINEAR SYSTEMS

In the previous section we have introduced the higher order cumulant spectra of stationary random Schwartz distributions. In this section we will briefly discuss the use of such spectra and how they may be estimated.
In the case that the process under consideration is Gaussian, the cumulant spectra of order greater than two are identically 0. In the non-Gaussian case, the higher order spectra provide us with important information concerning the distribution of the process. For example, were the process real-valued Poisson on the line with intensity c_X, then the cumulant spectrum of order k would be constant, equal to c_X(2π)^{1-k}. Were the process the result of passing a series of independent identically distributed variates through a filter with transfer function A(λ), then the cumulant spectrum of order k would be proportional to
Such hypotheses might be checked by estimating higher cumulant spectra.
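As a concrete instance, a naive third-order (k = 3) periodogram for a real-valued series may be formed from products of Fourier transforms at frequencies summing to 0 (Python sketch; the normalization follows the k = 2 pattern and is an assumption, as are the example series):

```python
import numpy as np

def dft(x, lam):
    t = np.arange(len(x))
    return np.sum(x * np.exp(-1j * lam * t))

def third_order_periodogram(x, l1, l2):
    """d(l1) d(l2) conj(d(l1 + l2)) / ((2 pi)^2 T): the frequencies
    (l1, l2, -(l1 + l2)) sum to 0, as a cumulant spectrum requires."""
    T = len(x)
    return dft(x, l1) * dft(x, l2) * np.conj(dft(x, l1 + l2)) / ((2 * np.pi)**2 * T)

rng = np.random.default_rng(7)
g = rng.standard_normal(512)   # Gaussian series: third-order spectrum is 0
ng = g**2 - 1.0                # skewed, non-Gaussian series
i_g = third_order_periodogram(g, 0.3, 0.7)
i_ng = third_order_periodogram(ng, 0.3, 0.7)
```

Like the second-order periodogram, this raw statistic is unstable and would be smoothed over neighboring frequency pairs before use.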
An important use of higher order spectra is in the identification of polynomial systems such as those discussed in Wiener [88], Brillinger [86], and Halme [89]. Tick [90] shows that if S(t) is a stationary real-valued Gaussian series, if ε(t) is an independent stationary series, and if the series X(t) is given by
then

where

and f_SSX(λ, μ) is a third-order cumulant spectrum. It follows that both the linear transfer function A(λ) and the bitransfer function B(λ, μ) of the system may be estimated, from estimates of second- and third-order spectra, following the probing of the system by a single Gaussian series. References to the identification of systems of order greater than 2, and to the case of non-Gaussian S(t), are given in [86].

We turn to the problem of constructing an estimate of a kth order cumulant spectrum. In the course of the proof of Theorem 2 given in the Appendix, we will see that

Suppose that no proper subset of λ^1, …, λ^k sums to 0. It then follows from the principal relation connecting moments and cumulants that

provided λ^1 + ⋯ + λ^k = 0. This last suggests the use of the kth order periodogram

as a naive estimate of the spectrum f_{a_1...a_k}(λ^1, …, λ^{k-1}) provided that no proper subset of λ^1, …, λ^{k-1} sums to 0. From what we have seen in the case k = 2 this estimate will be unstable. It follows that we should in fact construct an estimate by smoothing the periodogram (76) over (k − 1)-tuples of frequencies in the neighborhood of λ^1, …, λ^{k-1}, but such that no proper subset of the (k − 1)-tuple sums to 0. Details of this construction are given in Brillinger and Rosenblatt [91] for the discrete time case. We could equally well have constructed an estimate using the Fourier transforms d_X^(T)(λ; j) based on disjoint domains.

APPENDIX

We begin by providing a motivation for Assumption 1 of Section XV. Suppose that

is continuous in each of its arguments. Being a continuous multilinear functional it can be written

where c_{a_1...a_k} is a Schwartz distribution on D(R^{pk}), from the Schwartz nuclear theorem. If the process is stationary this distribution satisfies

It follows that it has the form
for φ ∈ D(R^{pk}), where C is a distribution on D(R^{p(k-1)}). Now consider the case in which the process X(φ) has the property that

when the supports of φ_1, …, φ_{k-1} are farther away from that of φ_k than some number ρ. This means that the distribution C has compact support. By the Schwartz-Paley-Wiener theorem, C is, therefore, the Fourier transform of a function of slow growth, say f_{a_1...a_k}(λ^1, …, λ^{k-1}), and we may write the relation (72). In the case that values of the process X(φ) at a distance from each other are only weakly dependent, we can expect the cumulant to be small and the representation (72) to hold with (73) satisfied.

Proof of Theorem 2: We see from (66) and (73)

It follows from this last that the standardized joint cumulants of order greater than 2 tend to 0 and so the Fourier transforms are asymptotically normal.
REFERENCES

[1] G. G. Stokes, "Note on searching for periodicities," Proc. Roy. Soc., vol. 29, p. 122, 1879.
[2] A. Schuster, "The periodogram of magnetic declination," Cambridge Phil. Soc., vol. 18, p. 18, 1899.
[3] E. T. Whittaker and A. Robinson, The Calculus of Observations. Cambridge, England: Cambridge Univ. Press, 1944.
[4] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1965.
[5] A. A. Michelson, Light Waves and Their Uses. Chicago, Ill.: Univ. Chicago Press, 1907.
[6] I. Newton, Opticks. London, England: W. Innys, 1730.
[7] M. I. Pupin, "Resonance analysis of alternating and polyphase currents," Trans. AIEE, vol. 9, p. 523, 1894.
[8] H. L. Moore, Economic Cycles: Their Law and Cause. New York: Macmillan, 1914.
[9] H. A. Panofsky and R. A. McCormick, "Properties of spectra of atmospheric turbulence at 100 metres," Quart. J. Roy. Meteorol. Soc., vol. 80, pp. 546-564, 1954.
[10] J. A. Leese and E. S. Epstein, "Application of two-dimensional spectral analysis to the quantification of satellite cloud photographs," J. Appl. Meteorol., vol. 2, pp. 629-644, 1963.
[11] M. S. Bartlett, "The spectral analysis of point processes," J. Roy. Stat. Soc., vol. B 25, pp. 264-296, 1963.
[12] ---, "The spectral analysis of two dimensional point processes," Biometrika, vol. 51, pp. 299-311, 1964.
[13] A. V. Oppenheim, R. W. Schafer, and T. G. Stockham, Jr., "Nonlinear filtering of multiplied and convolved signals," Proc. IEEE, vol. 56, pp. 1264-1291, 1968.
[14] B. P. Bogert, M. J. R. Healy, and J. W. Tukey, "The quefrency alanysis of time series for echoes: cepstrum, pseudo-covariance, cross-cepstrum and saphe cracking," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, pp. 209-243, 1963.
[15] M. S. Bartlett, "Periodogram analysis and continuous spectra," Biometrika, vol. 37, pp. 1-16, 1950.
[16] R. H. Jones, "A reappraisal of the periodogram in spectral analysis," Technometrics, vol. 7, pp. 531-542, 1965.
[17] K. Hasselmann, W. Munk, and G. J. F. MacDonald, "Bispectra of ocean waves," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, pp. 125-139, 1963.
[18] N. Wiener, "Generalized harmonic analysis," Acta Math., vol. 55, pp. 117-258, 1930.
[19] H. Wold, Bibliography on Time Series and Stochastic Processes. London, England: Oliver and Boyd, 1965.
[20] D. R. Brillinger, Time Series: Data Analysis and Theory. New York: Holt, Rinehart and Winston, 1974.
[21] A. Ya. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse," Math. Ann., vol. 109, pp. 604-615, 1934.
[22] H. Cramér, "On harmonic analysis in certain functional spaces," Ark. Mat. Astron. Fys., vol. 28B, pp. 1-7, 1942.
[23] V. P. Leonov and A. N. Shiryaev, "Some problems in the spectral theory of higher moments, II," Theory Prob. Appl. (USSR), vol. 5, pp. 460-464, 1960.
[24] B. Picinbono, "Tendance vers le caractère gaussien par filtrage sélectif," C. R. Acad. Sci. Paris, vol. 248, p. 2280, 1959.
[25] M. Rosenblatt, "Some comments on narrow band-pass filters," Quart. Appl. Math., vol. 18, pp. 387-393, 1961.
[26] D. R. Brillinger, "The frequency analysis of relations between stationary spatial series," in Proc. 12th Biennial Seminar of the Canadian Math. Congress, R. Pyke, Ed. Montreal, P.Q., Canada: Can. Math. Congr., pp. 39-81, 1970.
[27] E. J. Hannan and P. J. Thomson, "Spectral inference over narrow bands," J. Appl. Prob., vol. 8, pp. 157-169, 1971.
[28] E. Parzen, "On consistent estimates of the spectrum of a stationary time series," Ann. Math. Statist., vol. 28, pp. 329-348, 1957.
[29] M. Rosenblatt, "Statistical analysis of stochastic processes with stationary residuals," in Probability and Statistics, U. Grenander, Ed. New York: Wiley, pp. 246-275, 1959.
[30] W. D. Wright, The Measurement of Color. New York: Macmillan, 1958.
[31] E. W. Carpenter, "Explosions seismology," Science, vol. 147, pp. 363-373, 1967.
[32] D. G. Lambert, E. A. Flinn, and C. B. Archambeau, "A comparative study of the elastic wave radiation from earthquakes and underground explosions," Geophys. J. Roy. Astron. Soc., vol. 29, pp. 403-432, 1972.
[33] W. H. Munk and G. J. F. MacDonald, Rotation of the Earth. Cambridge, England: Cambridge Univ. Press, 1960.
[34] G. J. F. MacDonald and N. Ness, "A study of the free oscillations of the Earth," J. Geophys. Res., vol. 66, pp. 1865-1911, 1961.
[35] R. L. Wegel and C. R. Moore, "An electrical frequency analyzer," Bell Syst. Tech. J., vol. 3, pp. 299-323, 1924.
[36] N. Wiener, Time Series. Cambridge, Mass.: M.I.T. Press, 1964.
[37] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series. New York: Wiley, 1957.
[38] E. Parzen, "An approach to empirical time series analysis," Radio Sci., vol. 68D, pp. 551-565, 1964.
[39] R. T. Lacoss, "Data adaptive spectral analysis methods," Geophysics, vol. 36, pp. 661-675, 1971.
[40] J. P. Burg, "The relationship between maximum entropy spectra and maximum likelihood spectra," Geophysics, vol. 37, pp. 375-376, 1972.
[41] K. N. Berk, "Consistent autoregressive spectral estimates," Ann. Stat., vol. 2, pp. 489-502, 1974.
[42] V. E. Pisarenko, "On the estimation of spectra by means of nonlinear functions of the covariance matrix," Geophys. J. Roy. Astron. Soc., vol. 28, pp. 511-531, 1972.
[43] J. Capon, "Investigation of long-period noise at the large aperture seismic array," J. Geophys. Res., vol. 74, pp. 3182-3194, 1969.
[44] E. J. Hannan, Multiple Time Series. New York: Wiley, 1970.
[45] T. W. Anderson, The Statistical Analysis of Time Series. New York: Wiley, 1971.
[46] P. Whittle, "Estimation and information in stationary time series," Ark. Mat. Astron. Fys., vol. 2, pp. 423-434, 1953.
[47] J. M. Chambers, "Fitting nonlinear models: numerical techniques," Biometrika, vol. 60, pp. 1-14, 1973.
[48] D. R. Brillinger, "An empirical investigation of the Chandler wobble and two proposed excitation processes," Bull. Int. Stat. Inst., vol. 39, pp. 413-434, 1973.
[49] P. Whittle, "Gaussian estimation in stationary time series," Bull. Int. Stat. Inst., vol. 33, pp. 105-130, 1961.
[50] A. M. Walker, "Asymptotic properties of least-squares estimates of parameters of the spectrum of a stationary nondeterministic time-series," J. Australian Math. Soc., vol. 4, pp. 363-384, 1964.
[51] K. O. Dzaparidze, "A new method in estimating spectrum parameters of a stationary regular time series," Teor. Veroyat. Ee Primen., vol. 19, p. 130, 1974.
[52] L. H. Koopmans, "On the coefficient of coherence for weakly stationary stochastic processes," Ann. Math. Stat., vol. 35, pp. 532-549, 1964.
[53] H. Akaike and Y. Yamanouchi, "On the statistical estimation of frequency response function," Ann. Inst. Stat. Math., vol. 14, pp. 23-56, 1962.
[54] L. J. Tick, "Estimation of coherency," in Advanced Seminar on Spectral Analysis of Time Series, B. Harris, Ed. New York: Wiley, 1967, pp. 133-152.
FOURIER ANALYSIS OF STATIONARY PROCESSES 539
[55] R. J. Bhansali, "Estimation of the Wiener filter," in Contributed Papers 39th Session Int. Stat. Inst., vol. 1, pp. 82-88, 1973.
[56] A. E. Hoerl and R. W. Kennard, "Ridge regression: biased estimation for nonorthogonal problems," Technometrics, vol. 12, pp. 55-67, 1970.
[57] B. R. Hunt, "Biased estimation for nonparametric identification of linear systems," Math. Biosci., vol. 10, pp. 215-237, 1971.
[58] S. Bochner and K. Chandrasekharan, Fourier Transforms. Princeton, N.J.: Princeton Univ. Press, 1949.
[59] J. Mannos, "A class of fidelity criteria for the encoding of visual images," Ph.D. dissertation, Univ. California, Berkeley, 1972.
[60] L. J. Tick, "Conditional spectra, linear systems and coherency," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, pp. 197-203, 1963.
[61] C. W. J. Granger, Spectral Analysis of Economic Time Series. Princeton, N.J.: Princeton Univ. Press, 1964.
[62] N. R. Goodman, "Measurement of matrix frequency response functions and multiple coherence functions," Air Force Dynamics Lab., Wright Patterson AFB, Ohio, Tech. Rep. AFFDL-TR-65-56, 1965.
[63] J. S. Bendat and A. Piersol, Measurement and Analysis of Random Data. New York: Wiley, 1966.
[64] G. W. Groves and E. J. Hannan, "Time series regression of sea level on weather," Rev. Geophys., vol. 6, pp. 129-174, 1968.
[65] W. Gersch, "Causality or driving in electrophysiological signal analysis," Math. Biosci., vol. 14, pp. 177-196, 1972.
[66] N. R. Goodman, "Eigenvalues and eigenvectors of spectral density matrices," Tech. Rep. 179, Seismic Data Lab., Teledyne, Inc., 1967.
[67] D. R. Brillinger, "The canonical analysis of time series," in Multivariate Analysis—II, P. R. Krishnaiah, Ed. New York: Academic, pp. 331-350, 1970.
[68] M. B. Priestley, T. Subba Rao, and H. Tong, "Identification of the structure of multivariable stochastic systems," in Multivariate Analysis—III, P. R. Krishnaiah, Ed. New York: Academic, pp. 351-368, 1973.
[69] M. Miyata, "Complex generalization of canonical correlation and its application to a sea-level study," J. Marine Res., vol. 28, pp. 202-214, 1970.
[70] W. S. Liggett, Jr., "Passive sonar: Fitting models to multiple time series," paper presented at NATO Advanced Study Institute on Signal Processing, Loughborough, U.K., 1972.
[71] D. R. Brillinger, "The analysis of time series collected in an experimental design," in Multivariate Analysis—III, P. R. Krishnaiah, Ed. New York: Academic, pp. 241-256, 1973.
[72] D. R. Brillinger and M. Hatanaka, "An harmonic analysis of nonstationary multivariate economic processes," Econometrica, vol. 35, pp. 131-141, 1969.
[73] E. J. Hannan and R. D. Terrell, "Multiple equation systems with stationary errors," Econometrica, vol. 41, pp. 299-320, 1973.
[74] C. R. Rao, Linear Statistical Inference and its Applications. New York: Wiley, 1965.
[75] D. R. Cox and P. A. W. Lewis, The Statistical Analysis of Series of Events. London, England: Methuen, 1966.
[76] D. R. Brillinger, "The spectral analysis of stationary interval functions," in Proc. 6th Berkeley Symp. Math. Stat. Prob., vol. 1, L. M. Le Cam, J. Neyman, and E. L. Scott, Eds. Berkeley, Calif.: Univ. California Press, pp. 483-513, 1972.
[77] D. J. Daley and D. Vere-Jones, "A summary of the theory of point processes," in Stochastic Point Processes, P. A. W. Lewis, Ed. New York: Wiley, pp. 299-383, 1972.
[78] L. Fisher, "A survey of the mathematical theory of multidimensional point processes," in Stochastic Point Processes, P. A. W. Lewis, Ed. New York: Wiley, pp. 468-513, 1972.
[79] A. G. Hawkes and L. Adamopoulos, "Cluster models for earthquakes—regional comparisons," Bull. Int. Stat. Inst., vol. 39, pp. 454-460, 1973.
[80] A. M. Yaglom, "Some classes of random fields in n-dimensional space related to stationary random processes," Theory Prob. Appl. (USSR), vol. 2, pp. 273-322, 1959.
[81] L. Schwartz, Theorie des Distributions, Vols. 1, 2. Paris, France: Hermann, 1957.
[82] A. Papoulis, The Fourier Integral and Its Applications. New York: McGraw-Hill, 1962.
[83] K. Ito, "Stationary random distributions," Mem. Col. Sci. Univ. Kyoto A, vol. 28, pp. 209-223, 1954.
[84] I. M. Gelfand and N. Ya. Vilenkin, Generalized Functions, vol. 4. New York: Academic, 1964.
[85] M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, vol. 1. London, England: Griffin, 1958.
[86] D. R. Brillinger, "The identification of polynomial systems by means of higher-order spectra," J. Sound Vibration, vol. 12, pp. 301-313, 1970.
[87] F. J. Beutler and O. A. Z. Leneman, "On the statistics of random pulse processes," Inform. Contr., vol. 18, pp. 326-341, 1971.
[88] N. Wiener, Nonlinear Problems in Random Theory. Cambridge, Mass.: M.I.T. Press, 1958.
[89] A. Halme, "Polynomial operators for nonlinear systems analysis," Acta Polytech. Scandinavica, no. 24, 1972.
[90] L. J. Tick, "The estimation of the transfer functions of quadratic systems," Technometrics, vol. 3, pp. 563-567, 1961.
[91] D. R. Brillinger and M. Rosenblatt, "Asymptotic theory of kth order spectra," in Advanced Seminar on the Spectral Analysis of Time Series, B. Harris, Ed. New York: Wiley, pp. 153-188, 1967.
[92] G. Matheron, "The intrinsic random functions and their applications," Adv. Appl. Prob., vol. 5, pp. 439-468, 1973.