
HAL Id: tel-00422376 (https://tel.archives-ouvertes.fr/tel-00422376)

Submitted on 6 Oct 2009

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Analysis of stationary and non-stationary long memory processes: estimation, applications and forecast

Zhiping Lu

To cite this version: Zhiping Lu. Analysis of stationary and non-stationary long memory processes: estimation, applications and forecast. Mathematics [math]. École normale supérieure de Cachan - ENS Cachan, 2009. English. tel-00422376


DOCTORAL THESIS OF THE ECOLE NORMALE SUPERIEURE DE CACHAN

Presented by LU Zhiping

to obtain the degree of

DOCTOR OF THE ECOLE NORMALE SUPERIEURE DE CACHAN

Field: MATHEMATICS - MATHEMATICAL FINANCE AND APPLIED STATISTICS

Thesis subject: Analyse des Processus Longue Mémoire Stationnaires et Non-stationnaires : Estimations, Applications et Prévisions (Analysis of Stationary and Non-stationary Long Memory Processes: Estimation, Applications and Forecasts)

Thesis presented and defended at Cachan on 2 June 2009 before a jury composed of:

Yves Meyer, Professor Emeritus at the Ecole Normale Supérieure de Cachan (President)
Gilles Dufrénot, Professor at the Université d'Aix-Marseille II (Referee)
Rongming WANG, Professor at East China Normal University (Referee)
Dominique Guégan, Professor at the Université Paris 1 (Thesis Advisor)
Feng ZHOU, Professor at East China Normal University (Thesis Advisor)

Laboratoire C.E.S. Centre d'Economie de la Sorbonne (ENS CACHAN, UMR CNRS 8533)

61, avenue du Président Wilson - 94235 CACHAN CEDEX (France)


ECOLE NORMALE SUPERIEURE DE CACHAN
EAST CHINA NORMAL UNIVERSITY

ANALYSIS OF STATIONARY AND NON-STATIONARY LONG MEMORY PROCESSES: ESTIMATION, APPLICATIONS AND FORECAST

University: Ecole Normale Supérieure de Cachan and East China Normal University
Department: CES de l'Université Paris 1 and Mathematics Department
Major: Applied Mathematics
Subject: Mathematical Finance and Statistical Applications
Tutor: Prof. Dominique GUEGAN and Prof. ZHOU Feng
Author: LU Zhiping

2009.03


Acknowledgement

This thesis would not have been possible without the encouragement and guidance of many individuals. Since I am in the cooperation program between East China Normal University and Ecole Normale Supérieure de Cachan, I would like to show my sincere thanks to many people and many organizations.

I would like to express my very deep gratitude to my French supervisor, Professor Dominique GUEGAN, who guided me at each and every stage of this thesis. She always listened to me patiently, shared her insight into applied statistics, and encouraged me to solve questions independently. She clearly answered all my questions and helped me better understand the theory. I value her constructive criticism of my work and writing. She showed me that working hard in academia can be rewarding and satisfying. I am looking forward to more collaborations in the future.

I would like to thank Professor Feng ZHOU, my Chinese supervisor, for helping me personally and professionally all these years, my graduate studies included. He taught me how to do research, how to enlarge the scope of my academic knowledge, how to live and enrich my life in France, and much more. He is always available to help me. In particular, when I was ill, he was very concerned and gave me a lot of useful advice. I really appreciate his kindness and precious help! I could not have asked for a better mentor and professor!

I am quite grateful to Professor Yves Meyer. He accepted to be the president of the jury, which is a great honor for me. During the last several years, he has always been very kind and has helped me greatly. Without his signatures, I could not have registered at ENS Cachan for these three years.

I would like to thank Professor Gilles Dufrénot, who agreed to serve as a referee of my thesis. I am quite grateful for his help.

I also wish to express my thanks to Professor Rongming WANG, who agreed to be the other referee of the thesis and to come all the way from Shanghai to Paris for my defense.

I would like to thank Laurent Ferrara, one of my professors and friends. He taught me a lot over the last three years. The cooperation with him has been very pleasant and instructive. His kindness and encouragement gave me much motivation for my research.


I would like to thank Professor Dong YE and his wife, and Professor Xiaonan MA and his wife, for their friendly and warm help and suggestions.

I also would like to thank the faculty and staff. I am very pleased to work in the laboratory of CES (Centre d'Economie de la Sorbonne) of the Université Paris 1. I am quite grateful for the help of the professors of Université Paris 1 during my stay in Paris, such as Pr. Phillipe BICH, Pr. Pascal Gourdel, Pr. Jean-Phillipe MEDECIN, Mme Marie-Lou and M. Cuong Le Van. In particular, I would like to show my sincere thanks to Pr. Phillipe BICH; being his teaching assistant was really pleasant and interesting. The professors of the Ecole Normale Supérieure de Cachan were also very kind to me. For example, Mme Christine ROSE, the secretary of the EDSP (Ecole Doctorale de Science et de Pratique), was always patient and kind in answering my questions, and Mme Vidale, the secretary of Students' life, helped me a lot when I applied for the "titre de séjour" in France. What is more, the professors of East China Normal University are always with me when I need any help. So I would like to thank M. Wenhui YOU, M. Haisheng LI, M. Zhibin LI, Ms. Yunhua QIAN, Ms. Xiaoling LIU, Ms. Ying'e HUANG, Ms. Yujun QIN, Ms. Jie XU, etc.

I would like to thank my fellow students and my friends, Qinying TAO and Beijia ZHU; these two have helped me a lot in the last year. I would like to show them the sincerest gratitude for everything they have done for me: taking care of me when I was ill, accompanying me to the laboratory every day, helping me check the first draft of the thesis, always being so considerate, and much more. I am also very grateful to M. Chunyuan ZHOU, Ms. Qiuying LU, Ms. Ting WU, Ms. Xiaoju NI, M. Hua YI, M. Tong WU, M. Haibin ZHAO, Ms. Na LI, Ms. Hua REN, M. Jie XU, M. Sanjun ZHANG, Ms. Yeqin ZHAO, M. Zhongwei TANG, M. Chun LI, Ms. Keguang CHENG, M. Chenjiang ZHU, M. Liang WANG, M. Jianxiao YANG, M. Jianxin YANG, M. Guang WU, M. Xiaopeng HE, Ms. Yaxin PENG, M. Rui HUANG, M. Lei WU, Ms. Tong LI, etc.

My deepest gratitude goes to my parents, my sister, my parents-in-law and my husband for their total and unwavering support during my studies. They taught me by their love and living example that all things are possible with hard work and discipline. Most of all, they taught me to live humbly and sincerely.

Thanks to everyone who has given me their valuable time, skills and enthusiasm during these last years!

Finally, I would like to show my thanks to the organizations that awarded me scholarships during the last three years: East China Normal University (the ECNU scholarship), Egide of France (the Eiffel Doctorat scholarship) and the China Scholarship Council (the Chinese government scholarship). Their financial aid was also important for the completion of this thesis.

LU Zhiping
2009.3


Abstract

In this thesis, we consider two classes of long memory processes: the stationary long memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties, estimation methods, forecasting methods and statistical tests.

Stationary long memory processes have been extensively studied over the past decades. It has been shown that some long memory processes have self-similarity properties, which are important for parameter estimation. We review the self-similarity properties of continuous-time and discrete-time long memory processes. We establish propositions showing that a stationary long memory process is asymptotically second-order self-similar, while a stationary short memory process is not asymptotically second-order self-similar. We then extend these results to specific long memory processes such as the k-factor GARMA processes and the k-factor GIGARCH processes. We also investigate the self-similarity properties of some heteroscedastic models and of processes with switches and jumps.

We review parameter estimation methods for stationary long memory processes, including parametric methods (for example, maximum likelihood estimation and approximate maximum likelihood estimation) and semiparametric methods (for example, the GPH method, the Whittle method and the Robinson method). The consistency and asymptotic normality of the estimators are also investigated.

Testing the fractionally integrated order of seasonal and non-seasonal unit roots of a stochastic stationary long memory process is quite important for economic and financial time series modeling. The widely used Robinson test (1994) is applied to various well-known long memory models. Via Monte Carlo experiments, we study and compare the performance of this test for several sample sizes, which provides a good reference for practitioners who want to apply Robinson's test.

In practice, seasonality and time-varying long-range dependence can often be observed, so that some kind of non-stationarity exists inside economic and financial data sets. To take this kind of phenomenon into account, we review the existing non-stationary processes and we propose a new class of non-stationary stochastic process: the locally stationary k-factor Gegenbauer process. We describe a procedure for consistently estimating the time-varying parameters with the help of the discrete wavelet packet transform (DWPT). The consistency and asymptotic normality of the estimates are proved. The robustness of the algorithm is investigated through a simulation study.

We also propose a forecasting method for this new class of non-stationary long memory processes. Applications and forecasts based on the error correction term in the error correction model of the Nikkei Stock Average 225 (NSA 225) index and on the West Texas Intermediate (WTI) crude oil price follow.


KEY WORDS: Discrete wavelet packet transform, Gegenbauer process, Long memory processes, Monte Carlo simulations, Nikkei Stock Average 225 index, Non-stationarity, Ordinary least squares estimation, Seasonality, Self-similarity, Test.


Résumé

In this thesis, we consider two types of long memory processes: stationary and non-stationary. We devote ourselves to the study of their statistical properties, estimation methods, forecasting methods and statistical tests.

Stationary long memory processes have been widely studied over recent decades. It has been shown that long memory processes have self-similarity properties which are important for parameter estimation. We review the self-similarity properties of continuous-time and discrete-time long memory processes. We propose two propositions showing that long memory processes are asymptotically second-order self-similar, while short memory processes are not asymptotically second-order self-similar. We then study the self-similarity of specific long memory processes such as the k-factor GARMA processes and the k-factor GIGARCH processes. We also study the self-similarity properties of heteroscedastic models and of processes with jumps.

We review parameter estimation methods for long memory processes: parametric methods (for example, maximum likelihood and pseudo-maximum likelihood estimation) and semiparametric methods (for example, the GPH method, the Whittle method and the Robinson method). The consistency and asymptotic normality of these estimators are also studied.

Testing the fractional order of integration of the seasonal and non-seasonal unit roots of stationary long memory processes is very important for modeling economic and financial series. The widely used Robinson (1994) test is applied to various well-known long memory models. Using Monte Carlo methods, we study and compare the performance of this test for several sample sizes. This work is important for practitioners who want to use Robinson's test.

In practice, when dealing with financial and economic data, seasonality and time-varying dependence can often be observed, so that a kind of non-stationarity exists in financial data. In order to take this kind of phenomenon into account, we review non-stationary processes and propose a new class of stochastic processes: the locally stationary k-factor Gegenbauer processes. We propose a procedure for estimating the parameter function using the discrete wavelet packet transform (DWPT). The robustness of the algorithm is studied through simulations.

We also propose forecasting methods for this new class of non-stationary long memory processes. We give applications to the error correction term of the fractional cointegration analysis of the Nikkei Stock Average 225 index, and we study world crude oil prices.

JEL Classification: C12, C13, C14, C15, C22, C63, G15.


Contents

Acknowledgement
List of Tables
List of Figures
1 Introduction
2 Some Probabilistic Properties of Stationary Processes
   2.1 Introduction of Stationary Processes
      2.1.1 Short Memory Processes
      2.1.2 Long Memory Processes
   2.2 Self-similar Properties for Stationary Processes
      2.2.1 Concepts of Self-similarity
      2.2.2 Continuous-time Self-similar Processes
      2.2.3 Discrete-time Self-similar Processes
      2.2.4 Examples of Self-similar Processes in Continuous Time
      2.2.5 Examples of Self-similar Processes in Discrete Time
      2.2.6 Summary of the Self-similar Processes
3 Wavelet Techniques for Time Series Analysis
   3.1 Introduction of the Time-frequency Representations
      3.1.1 Fourier Transform
      3.1.2 Short-Time Fourier Transform
      3.1.3 Wavelet Transform
   3.2 Properties of the Wavelet Transform
      3.2.1 Continuous Wavelet Functions
      3.2.2 Continuous versus Discrete Wavelet Transform
   3.3 Discrete Wavelet Filters
      3.3.1 Haar Wavelets
      3.3.2 Daubechies Wavelets
      3.3.3 Minimum Bandwidth Discrete-time Wavelets
   3.4 Discrete Wavelet Transform (DWT)
      3.4.1 Implementation of the DWT: Pyramid Algorithm
      3.4.2 Multiresolution Analysis
   3.5 Maximal Overlap Discrete Wavelet Transform (MODWT)
      3.5.1 Definition and Implementation of MODWT
      3.5.2 Multiresolution Analysis
   3.6 Discrete Wavelet Packet Transform (DWPT)
   3.7 Maximal Overlap Discrete Wavelet Packet Transform (MODWPT)
4 Estimation Methods for Stationary Long Memory Processes: A Review
   4.1 ARFIMA Processes
      4.1.1 Parametric Estimators
      4.1.2 Semiparametric Estimators
      4.1.3 Wavelet Estimators
   4.2 Seasonal and/or Cyclical Long Memory (SCLM) Models
      4.2.1 Estimation for the k-factor Gegenbauer ARMA Processes
      4.2.2 Estimation for the Models with Fixed Seasonal Periodicity
   4.3 Seasonal and/or Cyclical Asymmetric Long Memory (SCALM) Models
5 Estimation and Forecast for Non-stationary Long Memory Processes
   5.1 Fractional Integrated Processes with a Constant Long Memory Parameter
   5.2 Locally Stationary ARFIMA Processes
   5.3 Locally Stationary k-factor Gegenbauer Processes
      5.3.1 Procedure for Estimating di(t)
      5.3.2 Estimation Procedure
      5.3.3 Procedure for Estimating di(t) (i = 1, ..., k)
      5.3.4 Consistency for Estimates di(t) (i = 1, ..., k)
   5.4 Simulation Experiments
   5.5 Forecast for Non-stationary Processes
6 Applications
   6.1 Nikkei Stock Average 225 Index Data
      6.1.1 Data Set
      6.1.2 Modeling
      6.1.3 Forecast
   6.2 WTI Oil Data
      6.2.1 Fitting by Stationary Model: AR(1)+FI(d) Model
      6.2.2 Fitting by Stationary Model: AR(2)+FI(d) Model
      6.2.3 Fitting by Non-stationary Model Using Wavelet Method
      6.2.4 Forecast
   6.3 Conclusion
7 Testing the Fractional Order of Long Memory Processes
   7.1 Unit Root Test for Autoregressive Moving Average Processes
   7.2 Unit Root Test for Fractional Integrated Processes
8 Conclusion
   8.1 Overview of the Contribution
   8.2 Possible Directions for Future Research
A The Well-definedness of the Locally Stationary k-factor Gegenbauer Processes
Bibliography


List of Tables

5.1 Estimation of Gegenbauer frequencies, bias and RMSE of (y0,t)t, (y1,t)t, (y2,t)t, (y3,t)t, (y4,t)t, (y5,t)t.
6.1 Relative results of the h-step-ahead predictions of the error correction term in the ECM of the NSA 225 index data, using the locally stationary 1-factor Gegenbauer model (parameter function smoothed by the spline method).
6.2 Relative results of the h-step-ahead predictions of the error correction term in the ECM of the NSA 225 index data, using the locally stationary Gegenbauer model (parameter function smoothed by the loess method).
6.3 Relative results of the h-step-ahead predictions of Zt of the WTI oil price data, using the AR(1)+FI(d) model.
6.4 Relative results of the h-step-ahead predictions of Zt of the WTI oil price data, using the AR(2)+FI(d) model.
6.5 Relative results of the h-step-ahead predictions of the WTI oil price data, using the locally stationary Gegenbauer model (parameter function smoothed by the loess method).
7.1 Robinson test for the model (1 − B)^0.3 Xt = εt, where εt is a strong white noise.
7.2 Robinson test for the model (1 − B^4)^0.3 Xt = εt, where εt is a strong white noise.
7.3 Robinson test for the model (1 − B^12)^0.3 Xt = εt, where εt is a strong white noise.
7.4 Robinson test for the model (1 − B)^0.3 (1 − B^4)^0.4 Xt = εt with d1 = 0.3, where εt is a strong white noise.
7.5 Robinson test for the model (1 − B)^0.3 (1 − B^12)^0.4 Xt = εt, where εt is a strong white noise.
7.6 Robinson test for the model (1 − 2νB + B^2)^0.15 Xt = (1 + B)^0.3 Xt = εt, where εt is a strong white noise and ν = −1.
7.7 Robinson test for the model (1 − 2νB + B^2)^0.1 Xt = (1 + B)^0.2 Xt = εt, where εt is a strong white noise and ν = −1.
7.8 Robinson test for the model (1 − 2νB + B^2)^0.3 Xt = εt, where εt is a strong white noise and ν = cos(π/3).
7.9 Robinson test for the model (1 − 2ν1B + B^2)^0.3 (1 − 2ν2B + B^2)^0.4 Xt = εt, where εt is a strong white noise, ν1 = cos(π/3) and ν2 = cos(…/6).
7.10 Robinson test for the model (1 − 2ν1B + B^2)^0.2 (1 − 2ν2B + B^2)^0.3 (1 − 2ν3B + B^2)^0.4 Xt = εt, where εt is a strong white noise, ν1 = cos(π/6), ν2 = cos(π/2) and ν3 = cos(…/3).
7.11 Robinson test for the model (1 − B)^0.3 Xt = εt, where εt is a GARCH(1,1) noise.
7.12 Robinson test for the model (1 − B^4)^0.3 Xt = εt, where εt is a GARCH(1,1) noise.
7.13 Robinson test for the model (1 − B^12)^0.3 Xt = εt, where εt is a GARCH(1,1) noise.
7.14 Robinson test for the model (1 − B)^0.3 (1 − B^4)^0.4 Xt = εt, where εt is a GARCH(1,1) noise.
7.15 Robinson test for the model (1 − B)^0.3 (1 − B^12)^0.4 Xt = εt, where εt is a GARCH(1,1) noise.
7.16 Robinson test for the model (1 − 2νB + B^2)^0.15 Xt = (1 + B)^0.3 Xt = εt, where εt is a GARCH(1,1) noise and ν = −1.
7.17 Robinson test for the model (1 − 2νB + B^2)^0.1 Xt = (1 + B)^0.2 Xt = εt, where εt is a GARCH(1,1) noise and ν = −1.
7.18 Robinson test for the model (1 − 2νB + B^2)^0.3 Xt = εt, where εt is a GARCH(1,1) noise and ν = cos(π/3).
7.19 Robinson test for the model (1 − 2ν1B + B^2)^0.3 (1 − 2ν2B + B^2)^0.4 Xt = εt, where εt is a GARCH(1,1) noise, ν1 = cos(π/3) and ν2 = cos(…/6).
7.20 Robinson test for the model (1 − 2ν1B + B^2)^0.2 (1 − 2ν2B + B^2)^0.3 (1 − 2ν3B + B^2)^0.4 Xt = εt, where εt is a GARCH(1,1) noise, ν1 = cos(π/6), ν2 = cos(π/2) and ν3 = cos(…/3).


List of Figures

1.1 604 daily observations of the Nasdaq-100 index
1.2 The log-returns of the Nasdaq-100 index
2.1 ACF of a simulated short memory process
2.2 ACF of a simulated long memory process
2.3 Spectrum of a simulated long memory process (FI(d) process)
2.4 Spectrum of a simulated long memory process (Gegenbauer process)
2.5 Spectrum of a simulated long memory process (2-factor Gegenbauer process)
2.6 Spectrum of a simulated long memory process (Hassler model)
4.1 ACF of a simulated seasonal long memory process
5.1-5.24 For each simulated series (yi,t)t, i = 0, ..., 5: sample path, ACF, spectrum, and di(t) smoothed by the spline and loess methods
6.1 Trajectory of the NSA 225 index (02/01/1989 - 13/09/2004)
6.2 Error correction term (Zt)t
6.3 Multi-resolution analysis of (Zt)t (J = 6)
6.4 d(t) smoothed by the spline and loess methods
6.5-6.11 NSA: h-step-ahead forecasts (h = 1, ..., 7) for d(t) smoothed by the spline method
6.12-6.18 h-step-ahead forecasts (h = 1, ..., 7) for the error term in the ECM of the NSA 225 index (smoothed by the spline method)
6.19-6.25 NSA: h-step-ahead forecasts (h = 1, ..., 7) for d(t) smoothed by the loess method
6.26-6.32 h-step-ahead forecasts (h = 1, ..., 7) for the error term in the model of the NSA 225 index (smoothed by the loess method)
6.33 WTI: sample path, spectrum, ACF and PACF of (Xt)t
6.34 WTI: sample path, spectrum, ACF and PACF of (Ut)t
6.35 WTI: fit of Zt by an AR(1) model
6.36 WTI: AR(1)+FI(d), sample path, spectrum, ACF and PACF of the residual (εt)t of the AR(1) term
6.37 WTI: AR(1)+FI(d), sample path, spectrum, ACF and PACF of the volatility (ε²t)t
6.38 WTI: AR(1)+FI(d), residual (νt)t of the FI(d) term
6.39 WTI: fit of Zt by an AR(2) model
6.40 WTI: AR(2)+FI(d), sample path, spectrum, ACF and PACF of the residual (εt)t of the AR(2) term
6.41 WTI: AR(2)+FI(d), sample path, spectrum, ACF and PACF of the volatility (ε²t)t
6.42 WTI: AR(2)+FI(d), residuals (νt)t of the FI(d) term
6.43 WTI: multiresolution analysis of (Xt)t
6.44 WTI: sample path, spectrum, ACF and PACF of (Zt)t
6.45 WTI: estimated parameter (dt)t
6.46 WTI: sample path, spectrum, ACF and PACF of (Zt)t after differencing
6.47 WTI: estimated parameter (dt)t after differencing
6.48-6.61 WTI: h-step-ahead predictions (h = 1, ..., 7) of (ε²t)t and of (Zt)t for the AR(1)+FI(d) model
6.62-6.75 WTI: h-step-ahead predictions (h = 1, ..., 7) of (ε²t)t and of (Zt)t for the AR(2)+FI(d) model
6.76-6.89 WTI: h-step-ahead forecasts (h = 1, ..., 7) of the long memory parameter and of the WTI oil price data (smoothed by the loess method)


Chapter 1

Introduction

In our endeavours to understand the changing world around us, observations of one kind or another are frequently made sequentially over time. The record of sunspots is a classic example, which may be traced back to 28 B.C.

A time series is a set of observations Xt, each one being recorded at a specified time t. A discrete time series is one in which the set T0 of times at which observations are made is a discrete set, as is the case, for example, when observations are made at fixed time intervals. Continuous time series are obtained when observations are recorded continuously over some time interval. The most important objective in our study of a time series is to uncover the law governing its generation.

Definition 1.0.1. A stochastic process is a family of random variables {Xt, t ∈ T} defined on a probability space (Ω, F, P), where T is a set of time points.

In this thesis, we consider data coming in the form of a univariate stochastic process. The identification of processes with strong correlation between observations far apart in time or space (so-called long memory or long range dependence) is now widespread in many diverse fields and disciplines, and it is the main interest of this thesis.

The past several decades have witnessed an increasing interest in fractionally integrated processes as a convenient way of describing the long memory properties of many time series. In the pioneering papers on long memory, Granger and Joyeux (1980) and Hosking (1981) proposed the fractionally integrated ARMA (ARFIMA) process, which extends the conventional ARIMA model using a differencing operator (I − B)^d, where d is allowed to take any non-integer value. Since then, ARFIMA processes have been fitted to many time series, have provided reliable long-term inferences, and have become one of the most popular parametric long memory models in the literature. A singularity, or pole, can be observed at the zero frequency in the spectrum of an ARFIMA process.
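For intuition, the binomial expansion (I − B)^{−d} = Σ_{j≥0} ψ_j B^j, with ψ_0 = 1 and ψ_j = ψ_{j−1}(j − 1 + d)/j, gives a direct way to simulate such a process. The following sketch is our own illustration (the function names, truncation and burn-in lengths are arbitrary choices, not taken from the thesis):

```python
import numpy as np

def fi_weights(d, n):
    """MA(infinity) weights of (I - B)^(-d): psi_0 = 1,
    psi_j = psi_{j-1} * (j - 1 + d) / j."""
    psi = np.ones(n)
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

def simulate_fi(d, n, burn=500, seed=None):
    """Simulate an ARFIMA(0, d, 0) series by filtering Gaussian white
    noise with the (truncated) fractional-integration weights."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    psi = fi_weights(d, n + burn)
    x = np.convolve(eps, psi)[:n + burn]   # x_t = sum_j psi_j * eps_{t-j}
    return x[burn:]                        # drop the burn-in segment

x = simulate_fi(d=0.3, n=1000, seed=0)     # slowly decaying ACF, pole at 0
```

For 0 < d < 1/2 the autocorrelations of the simulated series decay hyperbolically, which is the long memory signature discussed above.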

In empirical studies, fractional integration and long memory have been found relevant in many areas of macroeconomics and finance. Many studies provide a theoretical motivation for fractional integration and long memory: models based on aggregation have been suggested by Robinson (1976) and Granger and Joyeux (1980), error duration models by Parke (1999), and regime switching models by Diebold and Inoue (2001). Other examples of applications are Diebold and Rudebusch (1989, 1991) and Sowell (1992b) for various GDP measures, Gil-Alana and Robinson (1997) for the extended Nelson-Plosser data set, Hassler and Wolters (1995) and Baillie et al. (1996) for inflation data, Diebold et al. (1991) and Baillie (1996) for real exchange rate data, and Andersen, Bollerslev, Diebold and Ebens (2001) and Andersen, Bollerslev, Diebold and Labys (2001) for financial volatility series.

One extension of the ARFIMA process is the Gegenbauer ARMA (GARMA) model proposed by Gray et al. (1989), in which the operator (I − 2νB + B^2)^d is used to characterize seasonal long memory with period 2π/λ, where λ = cos^{−1}(ν) is called the Gegenbauer frequency. In comparison with the ARFIMA model, the long memory property of a GARMA process is reflected by its spectral density f(ω) being unbounded around the Gegenbauer frequency λ:

f(ω) ∼ |ω − λ|^{−2d} as ω → λ,

for 0 < d < 1/2, where x ∼ y means that x/y tends to 1. Thus, a GARMA process has only one singular point in its spectral density. The GARMA model was further generalized to allow multiple poles in the spectrum, which leads to the k-factor GARMA models (see Giraitis and Leipus, 1995; Woodward et al., 1998).
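To make the pole concrete: the spectral density of a GARMA(0, d, 0) process can be written f(ω) = (σ²/2π){4(cos ω − ν)²}^{−d}, which diverges at λ = cos^{−1}(ν). A small sketch, ours and with illustrative parameter values:

```python
import numpy as np

def gegenbauer_spectrum(omega, d, nu, sigma2=1.0):
    """Spectral density of a Gegenbauer (GARMA(0, d, 0)) process:
    f(omega) = sigma2 / (2*pi) * (4 * (cos(omega) - nu)**2)**(-d).
    It is unbounded at the Gegenbauer frequency lambda = arccos(nu)."""
    return sigma2 / (2 * np.pi) * (4.0 * (np.cos(omega) - nu) ** 2) ** (-d)

nu, d = 0.5, 0.3
lam = np.arccos(nu)                           # Gegenbauer frequency, here pi/3
omega = np.linspace(0.01, np.pi - 0.01, 512)
f = gegenbauer_spectrum(omega, d, nu)
# f diverges as omega -> lam, matching f(omega) ~ |omega - lam|^(-2d)
```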

Another extension was proposed by Porter-Hudak (1990), who generalized the ARFIMA models to seasonal fractionally integrated ARMA (SARFIMA) models by incorporating the operator (I − B^s)^d with a known period s, with a successful application to the U.S. monetary aggregates. In contrast to the ARFIMA model, the spectral density of a SARFIMA process has singularities not only at the zero frequency but also at the multiples of the frequency 2π/s.

More generally, Robinson (1994) proposed the seasonal and/or cyclical long memory (SCLM) model, which is a generalization of most long memory processes. It permits singularities at zero and non-zero frequencies at the same time.

In this thesis, we devote great effort to the study of long memory processes. Briefly speaking, the thesis is composed of two main parts. The first part of our work is concerned with the modeling and probabilistic properties of stationary long memory processes. The second part concentrates on the study of non-stationary long memory processes by wavelet methods.

In the first part of our work, we review self-similarity concepts and properties for continuous-time and discrete-time processes. We particularly focus on the relationship between long range dependence and self-similarity for discrete-time processes, and we present some new results for these latter models. Our major contribution here is the proof that if a process is both covariance stationary and long memory, then it is asymptotically second-order self-similar. Under Gaussianity, these processes are all asymptotically self-similar. We apply this result to processes such as the k-factor GARMA processes and the k-factor GIGARCH processes. We also prove that if a process is covariance stationary and short memory, then it is not asymptotically self-similar. Thus, the usual linear ARMA processes, the GARCH processes, the models with switches and the models with breaks are not asymptotically self-similar, whereas we show that under appropriate assumptions and hypotheses the SETAR models can be asymptotically self-similar. Moreover, processes with breaks and jumps, although theoretically short memory, behave empirically as long memory processes, with a sample autocorrelation function decaying hyperbolically to zero. Consequently, they are not asymptotically second-order self-similar in theory, while they appear so empirically. Since this empirical long memory behavior is often referred to as "spurious long memory behavior", we propose a new notion to describe this phenomenon: the "spurious asymptotically second-order self-similar" behavior. This part of the work has been written as a working paper (Document de Travail du Centre d'Economie de la Sorbonne de l'Université Paris 1–2007.55) and was presented at the seminar "Monnaie, Banque, Finance et Assurance" of University Paris 1 in June 2007. See Chapter 2 for more details.
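For reference, the standard definition behind these statements (stated here in its usual textbook form; the thesis's exact formulation may differ slightly) says that a covariance-stationary process is asymptotically second-order self-similar with Hurst index H when the autocorrelations of its block averages converge to those of fractional Gaussian noise:

```latex
% Aggregated (block-mean) series over blocks of length m:
%   X^{(m)}_k = (1/m) \sum_{t=(k-1)m+1}^{km} X_t .
% X is asymptotically second-order self-similar with Hurst index H if,
% for every lag k >= 1,
\rho^{(m)}(k) \;\longrightarrow\;
\frac{1}{2}\Bigl[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\Bigr]
\quad\text{as } m \to \infty ,
% i.e. the autocorrelations of X^{(m)} converge to those of fractional
% Gaussian noise; long memory corresponds to 1/2 < H < 1, with H = d + 1/2.
```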

Furthermore, to give an overall picture of parameter estimation methods for seasonal and/or cyclical long memory processes, we review in Chapter 4 the parametric methods based on the maximum likelihood function in the time and frequency domains, the semiparametric methods based on the Geweke and Porter-Hudak (GPH) method in the frequency domain, etc. We also briefly discuss the asymptotic behavior of the estimators.
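For concreteness, here is a minimal version of the GPH log-periodogram regression (our own sketch; the bandwidth m ≈ √n and the demeaning are conventional choices, not the thesis's):

```python
import numpy as np

def gph_estimate(x, m=None):
    """Geweke-Porter-Hudak estimate of the memory parameter d: regress
    log I(omega_j) on -log(4 sin^2(omega_j / 2)) over the first m
    Fourier frequencies; the OLS slope estimates d."""
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))
    dft = np.fft.fft(x - np.mean(x))
    j = np.arange(1, m + 1)
    omega = 2 * np.pi * j / n                    # Fourier frequencies
    I = np.abs(dft[1:m + 1]) ** 2 / (2 * np.pi * n)   # periodogram
    y = np.log(I)
    reg = -np.log(4 * np.sin(omega / 2) ** 2)    # regressor with slope d
    reg_c = reg - reg.mean()
    return np.sum(reg_c * (y - y.mean())) / np.sum(reg_c ** 2)

d_hat = gph_estimate(np.random.default_rng(0).standard_normal(1024))  # ~0
```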

We are also quite interested in statistical tests for long memory processes. To fit these models to real data sets, it is fundamental to detect long memory behavior through statistical tests. Working with macroeconomic data sets means dealing with rather small sample sizes, generally fewer than 1000 points, so our aim is to know the accuracy of such a test for a finite sample size. Our major contribution here is to assess the rate of convergence of Robinson's test using Monte Carlo simulations. Thus, in the second part of our work, we focus on the Robinson (1994) test of the long memory parameters. Robinson (1994) investigated a general model in order to test whether the data stem from a stationary or a non-stationary process, under uncorrelated and weakly correlated innovations (εt)t. We recall the Robinson test and carry out Monte Carlo simulations using a grid-search method in order to study the finite-sample behavior of the test for several long memory models under strong white noise and GARCH innovations: the fractionally integrated model, the k-factor Gegenbauer model (k = 1, 2, 3, 4), rigid Hassler models, flexible Hassler models, etc. We compare the different finite-sample behaviors across seasonal and/or cyclical long memory processes, which provides a useful reference in practice. We find that the sample size is crucial for the accuracy of the test. From another point of view, the Robinson test can serve as a parameter estimation method via grid search. This part of the work is described in Chapter 7; it was presented at the seminar "Monnaie, Banque, Finance et Assurance" of University Paris 1 in June 2008 and at the International Symposium on Forecasting 2008 in Nice, France. The paper is in revision for the journal Computational Statistics and Data Analysis.
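The structure of such a Monte Carlo experiment can be sketched as follows. This is our own skeleton: the test statistic is passed in as a callable (Robinson's (1994) LM statistic itself is not reproduced here), and the sample size, number of replications and critical value are illustrative; a grid search over candidate values of d reuses the same machinery.

```python
import numpy as np

def rejection_rate(test_stat, d0=0.3, n=300, n_rep=1000, crit=1.96, seed=0):
    """Simulate (I - B)^(-d0) eps_t repeatedly and record how often
    |test_stat(x)| exceeds the critical value (size under H0, power
    otherwise). `test_stat` maps a series to a scalar statistic."""
    rng = np.random.default_rng(seed)
    psi = np.ones(n)                       # truncated MA(infinity) weights
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d0) / j
    hits = 0
    for _ in range(n_rep):
        eps = rng.standard_normal(2 * n)
        x = np.convolve(eps, psi)[:2 * n][n:]   # burn-in of n points
        hits += abs(test_stat(x)) > crit
    return hits / n_rep
```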

Stationarity has usually been regarded as a valid assumption for series of short duration. However, such an assumption is rapidly losing its credibility given the enormous databases maintained by firms and organizations on a large variety of subjects, such as geophysics, oceanography, meteorology, speech and economics. By means of the autocorrelation function (ACF) (with autocorrelations on the y axis and the time lags on the x axis) it is possible to detect non-stationarity of a time series with respect to the mean level.

In practice, there are many stochastic processes which are not stationary. Sometimes, transformations can make them stationary. An example of a non-stationary time series is given in Figure 1.1, which represents the values of the Nasdaq-100 index from January 4, 1999 through January 4, 2009. This index includes one hundred of the largest non-financial domestic and international companies listed on the Nasdaq National Market. It is clear that this process contains a trend, which can be removed by computing the first-order difference of the logarithm of the Nasdaq-100 series (the log-return index); see Figure 1.2. The resulting zero-mean process still contains valuable information for the analyst, in terms of the volatility of the time series.

Figure 1.1: 604 daily observations of the Nasdaq-100 index
Figure 1.2: The log-returns of the Nasdaq-100 index

However, some time series cannot suitably be made stationary. An example of a non-stationary time series is a record of readings of the atmospheric temperature measured every 10 seconds with some random errors that have a constant distribution with zero mean. At any given time point the mean of the readings is equal to the true temperature; on the other hand, the mean value itself changes with time, as the true temperature varies with time. This strongly motivates us to investigate estimation methods for the long memory parameters of non-stationary processes.

To our knowledge, the existing non-stationarity of long memory time series is characterized by two kinds of parameters. One is the constant parameter d ∈ (1/2, 1) in the fractionally integrated model (I − B)^d Xt = εt. The other is the time-varying parameter function d(t) ∈ (−1/2, 1/2) in the generalized fractionally integrated model (I − B)^{d(t)} Xt = εt considered by Jensen (2000), Whitcher and Jensen (2000), Cavanaugh et al. (2002), etc.
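Operationally, a path with time-varying d(t) can be generated by filtering the noise at each time t with the weights of (I − B)^{−d(t/n)}, with d defined on rescaled time u = t/n. The sketch below is our own illustration; the truncation length and the linear d(·) are arbitrary choices:

```python
import numpy as np

def simulate_tv_fi(d_fun, n, trunc=200, seed=None):
    """Locally stationary FI process: at each t, filter white noise with
    the MA(infinity) weights of (I - B)^(-d(t/n)), truncated at `trunc` lags."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + trunc)
    x = np.empty(n)
    for t in range(n):
        d = d_fun(t / n)                   # rescaled-time parameter d(u)
        psi = np.ones(trunc)
        for j in range(1, trunc):
            psi[j] = psi[j - 1] * (j - 1 + d) / j
        # x_t = sum_{j < trunc} psi_j * eps_{t-j}, eps indices offset by trunc
        x[t] = psi @ eps[t + trunc - np.arange(trunc)]
    return x

y = simulate_tv_fi(lambda u: 0.1 + 0.3 * u, n=1000, seed=1)
```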

It is necessary to extend the concept of long memory to the non-stationary framework (see Cheung and Lai, 1993; Maynard and Phillips, 2001; Phillips, 2005). In Chapter 5, for non-stationary processes, we first review the previous work on the estimation of the long memory parameter, in particular for the fractionally integrated model. The methods reviewed are the GPH method, the local Whittle method, the exact local Whittle method, the fully extended local Whittle method, the Whittle pseudo-maximum likelihood method and the wavelet-based local Whittle method. We also discuss the corresponding consistency and normality of the estimators. Hurvich and Ray (1995) argued by simulation that the GPH estimator is consistent only when d < 1, and Kim and Phillips (1999) showed this theoretically. In the same context, Velasco (1999a) showed the consistency and asymptotic normality of the Robinson (1995a) estimator for d ∈ [1/2, 3/4]. To solve the non-consistency problem, Hurvich and Ray (1995) and Velasco (1999a) suggested the use of data tapering, which was first proposed by Cooley and Tukey (1965) and discussed by Cooley et al. (1967) and Jones (1971). This technique has also been used by many authors, such as Hurvich and Chen (2000), Giraitis and Robinson (2003), Sibbertsen (2004) and Olhede et al. (2004), among many others. For any value of d, Velasco (1999a) showed that if the tapering order p is greater than or equal to [d + 1/2] + 1, the estimator is consistent and asymptotically normal.
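A minimal sketch of the mechanics of tapering, assuming the simplest cosine-bell window (Velasco's results rely on higher-order tapers for larger d; this only illustrates the idea):

```python
import numpy as np

def tapered_periodogram(x, m):
    """Periodogram of a cosine-bell (Hanning) tapered series at the
    first m Fourier frequencies; the taper damps the leakage that
    biases log-periodogram regression when d is large."""
    n = len(x)
    h = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n) / n))   # cosine bell
    dft = np.fft.fft(h * (x - x.mean()))
    j = np.arange(1, m + 1)
    # normalize by the taper's energy instead of n
    I = np.abs(dft[1:m + 1]) ** 2 / (2 * np.pi * np.sum(h ** 2))
    return 2 * np.pi * j / n, I
```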

On the other hand, in reality, many naturally occurring phenomena show a slow drift in their periodicities and self-similarities when observed over a long enough time period. The changes in the cycles and self-similarities are often so gradual that, over a short span, the process behaves as a stationary process. This information on the drift of the cycles and self-similarities can provide a great deal of insight into the dynamics of an evolving process. The analysis of non-stationary economic time series with time-varying parameters has been going on for some time. Among the long memory processes with time-varying parameter functions, one important model is the locally stationary ARFIMA model, which has been studied by Jensen (1992a), Whitcher and Jensen (2000) and Cavanaugh et al. (2002), for instance. In fact, this model can be regarded as an extension of the stationary ARFIMA model which permits the long memory parameter to evolve with time. These authors all used wavelet techniques based on semiparametric methods for estimating the parameters. Whitcher and Jensen (2000) improved the wavelet-based GPH method (Jensen, 1999) using the concept of the cone of influence. Cavanaugh et al. (2002) studied the self-similarity property of the locally stationary ARFIMA model, built an approximate log-linear relationship, and estimated the time-varying parameters, an approach which has proved quite efficient. Concerning the estimation methods for this class of processes, Brock (2000) discussed the need for time-frequency analysis of economic time series when he brought up the work on the interface between ecology and economics.

The Fourier transform, with its frequency-localized basis functions, is known to be suitable for analyzing stationary time series. In contrast, since the wavelet transform is localized both in time and in frequency, it has the ability to capture the characteristics of events that are local in time, which makes it an ideal tool for studying non-stationary or transient time series with time-varying characteristics. Yves Meyer may be considered the founder of this mathematical subject, which we call wavelet analysis. Of course, Meyer's profound contribution to wavelet analysis goes far beyond being a pioneer of this new mathematical field: for the past ten years, he has been totally committed to its development, not only by building the mathematical foundation, but also by actively promoting the field as an interdisciplinary area of research. In fact, a number of concepts such as non-stationarity, multiresolution and approximate decorrelation emerge from wavelet filters, as they provide a natural platform to deal with the time-varying characteristics found in most real world time series, so that the assumption of stationarity may be avoided. Hence, a transform that decomposes a process into different time horizons (scales) is appealing, as it differences seasonalities, reveals structural breaks and volatility clusters, and identifies local and global dynamic properties of a process at these time scales. Moreover, wavelet filters provide a convenient way of dissolving the correlation structure of a process across time scales: the wavelet coefficients at one level are not (much) associated with coefficients at different scales or within their scale. This is useful when performing tasks such as simulation, estimation and testing, since it is always easier to deal with an uncorrelated process than with one of unknown correlation structure.

One of the most important tasks of this thesis is to study non-stationary time series models with seasonalities which permit the long memory parameters to evolve with time, motivated by the idea of Dahlhaus (1997). As an extension of the k-factor GARMA model and of the fractionally integrated model, a new class of models (the locally stationary k-factor Gegenbauer process) is proposed:

$$\prod_{i=1}^{k}(I - 2\nu_i B + B^2)^{d_i(t)} y_t = \varepsilon_t,$$

where $\varepsilon_t$ is a Gaussian white noise. For appropriate values of $d_i(t)$, this model can be regarded locally as a stationary process with constant parameters, that is to say, it is piecewise stationary. Based on this understanding, we develop a wavelet-based estimation procedure with the help of the DWPT. After evenly partitioning the time interval, we locate the DWPT coefficients on each subinterval according to the "Heisenberg box" principle. We then calculate the wavelet variances on each subinterval, which give an approximate estimate of the spectrum, and we apply ordinary least squares regressions on the subintervals. What we obtain are local estimates on the subintervals. The first and last estimates are omitted before smoothing, in order to avoid boundary effects. We adopt the spline smoothing method and the loess smoothing method to get the smoothed curves. Furthermore, we prove the consistency and asymptotic normality of the estimates.
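To make the procedure concrete, the following minimal Python sketch illustrates the blockwise wavelet log-regression idea in a deliberately simplified form: it uses the ordinary DWT (via the pywt package) rather than the DWPT with Heisenberg-box alignment used in the thesis, assumes a pure fractionally integrated spectrum so that the level-$j$ detail-coefficient variance scales roughly as $2^{2dj}$, and omits the boundary trimming and spline/loess smoothing steps. The function name, block count and wavelet choice are illustrative, not those of the thesis.

```python
import numpy as np
import pywt

def local_wavelet_d(x, n_blocks=8, wavelet='db4'):
    """Blockwise wavelet-OLS sketch of a time-varying memory parameter d(t).

    On each subinterval, regress log2 of the detail-coefficient variance on
    the decomposition level j; for a spectrum ~ |lambda|^{-2d} the slope is
    approximately 2d.
    """
    x = np.asarray(x, dtype=float)
    block = len(x) // n_blocks
    d_hat = []
    for b in range(n_blocks):
        seg = x[b * block:(b + 1) * block]
        J = pywt.dwt_max_level(len(seg), wavelet)
        coeffs = pywt.wavedec(seg, wavelet, level=J)
        # coeffs[0] is the approximation; coeffs[1] is the coarsest detail (level J)
        levels = np.arange(J, 0, -1)
        logvar = np.log2([np.mean(c ** 2) for c in coeffs[1:]])
        slope, _ = np.polyfit(levels, logvar, 1)   # slope ~ 2 d on this block
        d_hat.append(slope / 2)
    return np.array(d_hat)   # one local estimate per subinterval
```

Each returned value plays the role of one local estimate of $d$ on a subinterval; in the thesis the first and last estimates are discarded and the remaining ones are smoothed.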


The robustness of the estimates is verified by a Monte Carlo simulation study. We simulate six kinds of elementary functions as the parameter functions: constant, linear, quadratic, cubic, exponential and logarithmic functions. We present the smoothed curves for the estimated long memory parameter functions, with comparison to other authors' estimation results, and we also report the mean bias and the mean RMSE of the estimation for the simulated series. To our knowledge, there exists almost no literature on forecasting methods for non-stationary processes. For the new non-stationary processes proposed in this thesis, we develop a corresponding forecast method based on the estimations. According to the model, the most critical point is to forecast the long memory parameter function, from which we obtain the forecast of the original series.

This part of the work is presented in Chapter 5 and has been written up as a working paper (Document de Travail du Centre d'Economie de la Sorbonne de l'Université Paris 1–2009.15). It will be submitted to a high-level statistics journal.

In the application part, we apply the new non-stationary model to the error correction term of the cointegration model of the Nikkei Stock Average 225 index and to the WTI oil price data, using the wavelet-based estimation procedure that we propose, and we make h-step-ahead forecasts of the long memory parameter functions. Then we make h-step-ahead forecasts of the error correction terms. The forecasts are also evaluated by the bias and the root mean square error. We briefly discuss the applicability of the new model and the new estimation method.

The remainder of this thesis is organized as follows. In Chapter 2, we introduce some probabilistic properties of stationary processes. Chapter 3 presents the wavelet techniques for time series analysis. In Chapter 4, we review the estimation methods for stationary long memory processes. Chapter 5 is concerned with the estimation methods for non-stationary processes. In Chapter 6, we present applications to financial data series and energy data series. Chapter 7 concentrates on tests for the fractional order of long memory processes. Chapter 8 concludes.


Chapter 2

Some Probabilistic Properties of Stationary Processes

2.1 Introduction of Stationary Processes

Let $X_t$ denote a real-valued random variable representing the observation made at time $t$. We confine our study to observations made at regular time intervals and, without loss of generality, we assume that the basic time interval is of duration one unit of time.

Now we present some important features of time series with time-invariant properties.

Definition 2.1.1. The time series $(X_t)_t$ is said to be strictly stationary if, for any $t_1, t_2, \cdots, t_n \in \mathbb{Z}$, any $k \in \mathbb{Z}$, and $n = 1, 2, \cdots$,
$$F_{X_{t_1}, X_{t_2}, \cdots, X_{t_n}}(x_1, \cdots, x_n) = F_{X_{t_1+k}, X_{t_2+k}, \cdots, X_{t_n+k}}(x_1, \cdots, x_n), \qquad (2.1)$$
where $F$ denotes the joint distribution function of the set of random variables which appear as suffices.

The term "weakly stationary", "second-order stationary", "covariance stationary" or "wide-sense stationary" is used to describe the theoretically less restricted situation in which

E(Xt1) = E(Xt1+k), Cov(Xt1 , Xt2) = Cov(Xt1+k, Xt2+k), (2.2)

for all t1, t2, k ∈ Z, the covariances being assumed to exist. Strict stationarity impliesweak stationarity provided that V ar(Xt) exists. And these two concepts are equivalentunder Gaussianity.

In practice, to obtain stationarity, we sometimes need simple transformations, such as taking differences of consecutive observations, subtracting a polynomial or trigonometric trend, etc.

Consider a covariance stationary time series $(X_t)_t$ with finite variance. It follows from relationship (2.2) that $\mathrm{Cov}(X_{t_1}, X_{t_2})$ is simply a function of $|t_1 - t_2|$. This function is called the autocovariance function of $(X_t)_t$ at lag $(t_1 - t_2)$, denoted $\gamma_{t_2 - t_1}$. The ratio $\gamma_\tau / \gamma_0$ ($\tau \in \mathbb{Z}$) is called the autocorrelation function of $(X_t)_t$ at lag $\tau$, denoted $\rho_\tau$.

For a scalar covariance stationary process $(X_t)_t$, if we assume absolute continuity of the spectral distribution function, then there is a spectral density $f(\lambda)$ ($-\pi < \lambda \leq \pi$) such that the autocovariance satisfies
$$\gamma_j = E[(X_1 - E(X_1))(X_{1+j} - E(X_1))] = \int_{-\pi}^{\pi} \cos(j\lambda) f(\lambda)\, d\lambda.$$

In the analysis of stationary time series, the behavior of the spectral density around the zero frequency is often of interest, as it is an important characteristic for identifying the model.

2.1.1 Short Memory Processes

Definition 2.1.2. A covariance stationary process $(X_t)_t$ is called a short memory (or short range dependence) process if its autocorrelation function $\rho_k$ satisfies
$$\sum_{k=0}^{\infty} |\rho_k| < \infty.$$

Thus the autoregressive moving average (ARMA) processes and the autoregressive conditional heteroscedastic (ARCH) processes are the classical short memory processes. In Figure 2.1 we observe the sample autocorrelation function of a simulated short memory process.
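As a quick illustration (a minimal Python sketch assuming the statsmodels package; the parameter values are arbitrary), one can simulate an ARMA process and check that its sample autocorrelations die out geometrically:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

# ARMA(1,1) with AR coefficient 0.6 and MA coefficient 0.3
# (lag polynomials are specified with their leading 1)
proc = ArmaProcess(ar=np.array([1.0, -0.6]), ma=np.array([1.0, 0.3]))
x = proc.generate_sample(nsample=5000)

print(acf(x, nlags=20))   # decays geometrically: the autocorrelations are summable
```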

2.1.2 Long Memory Processes

One of the earliest studies mentioning that observed time series may exhibit long range dependence is the pioneering work of Hurst (1951). While looking at time series from the physical sciences (rainfall, tree rings, river levels, etc.), he noticed that his R/S statistic, on a logarithmic scale, was randomly scattered around a line with slope $H > \frac{1}{2}$ for large sample sizes. For a stationary process with short range dependence, the R/S statistic should be proportional to $k^{1/2}$ for large $k$. Hurst's discovery of slopes proportional to $k^H$, with $H > 1/2$, was in direct contradiction to the theory of such processes at the time. This discovery is known as the "Hurst effect" and $H$ is the "Hurst parameter".

Mandelbrot and co-workers (Mandelbrot and Van Ness, 1968; Mandelbrot and Wallis, 1969) showed that the Hurst effect may be modeled by fractional Gaussian noise with self-similarity parameter $0 < H < 1$. This process exhibits stationary long memory dynamics when $1/2 < H < 1$ and reduces to white noise when $H = \frac{1}{2}$. The spectrum of fractional Gaussian noise may be derived from an appropriately filtered spectral density function of fractional Brownian motion and evaluated via numerical integration, although finite-term approximations may be used in practice. We will instead focus our attention on a convenient class of time series models known as fractional differencing processes.


Since the work of Granger and Joyeux (1980) and Hosking (1981), long memory processes have been widely studied. There exist different characterizations of the concept of long memory, in the time domain or in the spectral domain.

We say that a function $h$ changes slowly to infinity (or to zero) if it satisfies condition H: for all $a \in \mathbb{R}$, $h(ax)/h(x) \to 1$ as $x \to \infty$ (or $x \to 0$).

Definition 2.1.3. A covariance stationary process $(X_t)_t$ is called a long memory process if its autocorrelation function $\rho_k$ behaves like a power function decaying to zero hyperbolically:
$$\rho_k \sim C_\rho(k) \cdot k^{-\alpha}, \quad \text{as } k \to \infty, \quad 0 < \alpha < 1, \qquad (2.3)$$
where $\sim$ denotes asymptotic equivalence and $C_\rho(k)$ is a function which changes slowly to infinity, satisfying condition H.

This definition is concerned with the concept of long memory behavior in the time domain: the decay rate of the autocorrelations is very slow, and the autocorrelation series is absolutely divergent, i.e. $\sum_{k=0}^{\infty} \rho_k = \infty$. Remark that if we set $H = 1 - \frac{\alpha}{2}$, the long memory behavior of Definition 2.1.3 occurs when $\frac{1}{2} < H < 1$. Figure 2.2 shows the sample autocorrelation function (ACF) of a simulated long memory process; a very slow decay of the ACF is clearly displayed.
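A hedged numerical illustration: the Python sketch below simulates an FI(d) series by truncating the MA($\infty$) expansion $(1-B)^{-d}\varepsilon_t = \sum_j \psi_j \varepsilon_{t-j}$, with $\psi_0 = 1$ and $\psi_j = \psi_{j-1}(j-1+d)/j$, and inspects the slow decay of the sample ACF (roughly $k^{2d-1}$). The truncation length and seed are arbitrary choices.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def sim_fid(n, d, trunc=2000, seed=0):
    """Approximate FI(d) sample from a truncated MA(inf) representation."""
    rng = np.random.default_rng(seed)
    psi = np.empty(trunc + 1)
    psi[0] = 1.0
    for j in range(1, trunc + 1):
        psi[j] = psi[j - 1] * (j - 1 + d) / j     # MA coefficients of (1-B)^{-d}
    eps = rng.standard_normal(n + trunc)
    return np.convolve(eps, psi, mode='valid')    # exactly n observations

x = sim_fid(10000, d=0.3)
rho = acf(x, nlags=50)
print(rho[10], rho[50])   # decays slowly, roughly like k^{2d-1} = k^{-0.4}
```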

Through the Fourier transform, the covariance function is connected to the spectral density. In the spectral domain, we define the long memory behavior, characterized by the rate of explosion at low frequencies, as follows:

Definition 2.1.4. A covariance stationary process $(X_t)_t$ is called a long memory process if its spectral density function $f$ satisfies
$$f(\lambda) \sim C_f(\lambda) \cdot \lambda^{-2d}, \quad \text{as } \lambda \to 0^+, \quad 0 < d < \frac{1}{2}, \qquad (2.4)$$
where $C_f(\lambda)$ is a function which changes slowly to zero at frequency zero, satisfying condition H.

For $d \geq \frac{1}{2}$, a function behaving like $\lambda^{-2d}$ as $\lambda \to 0^+$ is not integrable, so covariance stationarity cannot be obtained, while the case $d > -\frac{1}{2}$ corresponds to an invertibility condition in parametric models with property (2.4). In fact, property (2.4) is also useful for modeling stochastic processes by semi-parametric methods. Figure 2.3 illustrates the spectrum of a simulated long memory process; an explosion can be seen at the zero frequency.
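Definition 2.1.4 underlies the classical log-periodogram (GPH) regression: near the origin, $\log f(\lambda) \approx c - 2d \log \lambda$, so regressing the log-periodogram on $\log(4\sin^2(\lambda_j/2))$ at the first $m$ Fourier frequencies yields $\hat{d}$ as minus the slope. A minimal Python sketch (the bandwidth $m = \sqrt{n}$ is a common but arbitrary convention):

```python
import numpy as np

def gph_estimate(x, m=None):
    """Log-periodogram regression estimate of the memory parameter d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))                      # number of low frequencies used
    lam = 2 * np.pi * np.arange(1, m + 1) / n    # Fourier frequencies near zero
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    reg = np.log(4 * np.sin(lam / 2) ** 2)
    slope, _ = np.polyfit(reg, np.log(I), 1)
    return -slope                                 # estimate of d
```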

It should be pointed out that Definition 2.1.3 and Definition 2.1.4 are given in an asymptotic context near the zero frequency: they indicate a singularity of the spectral density at the zero frequency.

More generally, a process is said to exhibit long memory behavior if its spectral density is unbounded at a finite number of frequencies in $[0, \pi]$, whereas if the process exhibits short memory behavior, its spectral density is bounded on the whole interval $[0, \pi]$.

Definition 2.1.5. A covariance stationary process $(X_t)_t$ is called a long memory process if its spectral density function $f$ has the following property: there exists $\lambda_0 \in [0, \pi]$ such that $f$ is unbounded at $\lambda_0$.

Figure 2.4 shows the spectrum of a simulated long memory process (a Gegenbauer process) with an explosion at the frequency $\lambda_0 = \frac{2\pi}{3}$. Figure 2.5 shows the spectrum of a 2-factor Gegenbauer process, with two explosions in the spectrum, while in Figure 2.6 we can observe three explosions in the spectrum of the Hassler model.

[Figure 2.1: ACF of a simulated short memory process]

[Figure 2.2: ACF of a simulated long memory process]

2.2 Self-similar Properties for Stationary Processes

2.2.1 Concepts of Self-similarity

In the past decades there has been a growing interest in studying self-similar processes and asymptotically self-similar processes, which were first introduced theoretically through the work of Kolmogorov (1941). These processes are typically used to model random phenomena with long range dependence, which informally means significant correlations across arbitrarily large time scales.


[Figure 2.3: Spectrum of a simulated long memory process (FI(d) process)]

[Figure 2.4: Spectrum of a simulated long memory process (Gegenbauer process)]

[Figure 2.5: Spectrum of a simulated long memory process (2-factor Gegenbauer process)]

[Figure 2.6: Spectrum of a simulated long memory process (Hassler model)]


The concept of self-similarity has been considered by many authors, mainly in the context of continuous time. Many applications can be found in physics, geophysics, hydrology, finance, economics, communications and the "1/f noises".

Self-similarity is a pervasive characteristic describing phenomena that exhibit certain forms of long range dependence. It provides an elegant explanation and interpretation of an empirical law commonly referred to as the Hurst effect. The parameter $H$ introduced previously is the so-called Hurst parameter, designed to capture the degree of self-similarity: it measures the deviation of a self-similar process from the Brownian motion, whose Hurst parameter is $H = \frac{1}{2}$.

A priori, there is no evident link between the notion of self-similarity and long memory (see Beran (1994)). Generally speaking, the concept of long memory is not equivalent to that of self-similarity or asymptotic self-similarity: for a self-similar or asymptotically self-similar process, we cannot directly decide whether it is long memory or not. For example, Brownian motion is self-similar but not long memory, while fractional Brownian motion is an H-self-similar process with stationary increments (for specific values of H) which at the same time exhibits long memory behavior. Thus the relationship between these two properties needs to be carefully identified.

There exist several different definitions of self-similarity and asymptotic self-similarity. In the context of stochastic processes, self-similarity is defined in terms of the distribution of the processes. We will consider these concepts respectively for continuous-time and discrete-time processes.

2.2.2 Continuous-time Self-similar Processes

Definitions of Self-similarity

Definition 2.2.1. A real-valued stochastic process $Z = \{Z(t)\}_{t \in \mathbb{R}}$ is an $H$-self-similar process (or self-similar process with index $H > 0$) if, for any $a > 0$,
$$\{Z(at)\}_{t \in \mathbb{R}} =^d \{a^H Z(t)\}_{t \in \mathbb{R}}. \qquad (2.5)$$
We call such a model an H-ss process. Here $=^d$ means equality of all finite-dimensional distributions.

Definition 2.2.2. A process $Z = \{Z(t)\}_{t \in \mathbb{R}}$ is an $H$-self-similar process with stationary increments (or self-similar process with index $H > 0$ and with stationary increments) if, for the H-ss process $\{Z_t\}_{t \in \mathbb{R}}$ and all $h \in \mathbb{Z}$,
$$\{Z(t + h) - Z(h)\}_{t \in \mathbb{R}} =^d \{Z(t) - Z(0)\}_{t \in \mathbb{R}}. \qquad (2.6)$$
We call such a model an H-sssi process.


Properties of Self-similar Processes

Let $\{Z_t\}_{t \in \mathbb{R}}$ be an H-sssi process with $0 \leq H < 1$. We introduce the increment process $(X_t)_{t \in \mathbb{Z}}$ defined by $X_t = Z_t - Z_{t-1}$, for all $t$. Denote by $\gamma_Z(\cdot)$ the covariance function of the process $\{Z_t\}_{t \in \mathbb{R}}$ and by $\sigma_Z^2$ its variance. Then we have the following properties; see Beran (1994) and Taqqu et al. (1997):

1. $Z(0) = 0$ almost surely (a.s.).

2. $-Z(t) =^d Z(-t)$.

3. Suppose $Z(t)$ is a non-trivial H-sssi process with $H > 0$:

(i) If $E[|Z(1)|^\gamma] < \infty$ for some $\gamma < 1$, then $H < 1/\gamma$;

(ii) If $E[|Z(1)|] < \infty$, then $H < 1$;

(iii) If $E[|Z(1)|] < \infty$ and $0 < H < 1$, then $E[Z(t)] = 0$;

(iv) If $E[|Z(1)|] < \infty$ and $H = 1$, then $Z(t) = tZ(1)$ a.s.

4. $E[(Z_t - Z_s)^2] = \sigma_Z^2 (t-s)^{2H}$ and $\gamma_Z(t, s) = \frac{1}{2}\sigma_Z^2 \left[t^{2H} - (t-s)^{2H} + s^{2H}\right]$.

5. Let $\bar{X}$ be the sample mean of the process $(X_t)_t$; then we have some empirical properties concerning $\bar{X}$: $\bar{X} = n^{H-1}(Z_1 - Z_0)$ and $\mathrm{Var}(\bar{X}) = \sigma_Z^2 n^{2H-2}$.

6. If $(X_t)_t$ is a Gaussian process with mean $\mu$ and variance $\sigma_X^2$, then $n^{1-H}(\bar{X} - \mu)/\sigma_X$ is a standard normal random variable.

7. Among continuous-time processes, the H-ss process and the H-sssi process are the most frequently studied. Definition 2.2.1 indicates that self-similar processes are stochastic processes that are invariant in distribution under appropriate scaling of time and space. If an H-ss process is not covariance stationary, we consider its increments to obtain some stationarity. On the other hand, an H-ss process cannot be strictly stationary.

8. For the H-ss process $\{Z(t)\}_{t \in \mathbb{R}}$, the process $Y(t) = e^{-tH} Z(e^t)$ ($t \in \mathbb{R}$) is stationary. Conversely, if $\{Y(t)\}_{t \in \mathbb{R}}$ is stationary, then the process $Z(t) = t^H Y(\ln t)$ ($t > 0$) is H-ss.

9. When $\frac{1}{2} < H < 1$, the process $(X_t)_{t \in \mathbb{Z}}$ exhibits long memory behavior; when $H = \frac{1}{2}$, it is uncorrelated; when $0 < H < \frac{1}{2}$, it exhibits short memory.


2.2.3 Discrete-time Self-similar Processes

For a covariance stationary time series $(X_t)_{t \in \mathbb{Z}}$, if its covariance exists and its autocorrelation function $\rho_k$ has the asymptotic behavior $\lim_{k \to \infty} \rho_k = 0$, then the parameter $H$ belongs to the interval $(0, 1)$.

Let $(X_t)_t$ be a covariance stationary time series. Denote by
$$X_t^{(m)} = \frac{1}{m} \sum_{k=(t-1)m+1}^{tm} X_k, \quad t = 1, 2, \ldots, \qquad (2.7)$$
the corresponding aggregated sequence with level of aggregation $m$ ($> 1$), and by $\rho_k^{(m)}$ the autocorrelation function of the process $(X_t^{(m)})_{t \in \mathbb{Z}}$.

Definitions of Self-similarity

Definition 2.2.3. A strictly stationary stochastic process $(X_t)_t$ is exactly self-similar (or asymptotically self-similar) if, for all $t$,
$$X_t =^d m^{1-H} X_t^{(m)} \qquad (2.8)$$
holds for all $m$ (or as $m \to \infty$), where $(X_t^{(m)})_t$ is the aggregated sequence defined in Equation (2.7) and $1/2 < H < 1$.
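Definition 2.2.3 suggests a simple diagnostic, the aggregated variance method: since $\mathrm{Var}(X^{(m)}) \approx \sigma^2 m^{2H-2}$ for a self-similar process, the slope of $\log \mathrm{Var}(X^{(m)})$ against $\log m$ estimates $2H - 2$. A minimal Python sketch (the grid of aggregation levels is an arbitrary choice):

```python
import numpy as np

def aggregated_variance_H(x, ms=(2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log Var(X^(m)) versus log m (slope = 2H - 2)."""
    x = np.asarray(x, dtype=float)
    log_m, log_v = [], []
    for m in ms:
        k = len(x) // m
        xm = x[:k * m].reshape(k, m).mean(axis=1)   # aggregated series X^(m), Eq. (2.7)
        log_m.append(np.log(m))
        log_v.append(np.log(xm.var()))
    slope, _ = np.polyfit(log_m, log_v, 1)
    return 1 + slope / 2
```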

Another characterization of self-similarity is based on second-order moments as follows:

Definition 2.2.4. Let $(X_t)_t$ be a covariance stationary process.

1. The process $(X_t)_t$ is called exactly second-order self-similar, or s.o.s.s., if $m^{1-H} X_t^{(m)}$ has the same autocorrelation as $X_t$, for all $m$ and for all $t$. Thus,
$$\mathrm{Var}(X^{(m)}) = \mathrm{Var}(X)\, m^{2H-2} \qquad (2.9)$$
and
$$\rho_k^{(m)} = \rho_k, \qquad (2.10)$$
where $1/2 < H < 1$, $m > 1$, $k = 0, 1, 2, \cdots$, and $\rho_k \sim C k^{2H-2}$ as $k \to \infty$.

2. The process $(X_t)_t$ is called asymptotically second-order self-similar, or a.s.o.s.s., if
$$\lim_{m \to \infty} \rho_k^{(m)} = \frac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \quad \forall k > 0.$$

We denote in the following
$$g_H(k) = \frac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \quad k > 0. \qquad (2.11)$$

The notion of exact (asymptotic) self-similarity concerns all the finite-dimensional distributions of a strictly stationary process, while the notion of exact (asymptotic) second-order self-similarity concerns only the variance and autocorrelation function of a covariance stationary process. In fact, under the Gaussian framework, exact second-order self-similarity is equivalent to exact self-similarity and, likewise, asymptotic second-order self-similarity is equivalent to asymptotic self-similarity.


Properties of s.o.s.s. and a.s.o.s.s. Time Series

Let $(X_t)_t$ be a covariance stationary exactly second-order self-similar process. Denote by $\rho_k$ its autocorrelation function, by $f_X(\lambda)$ its spectral density function and by $\sigma_X^2$ its variance. Then we have the following properties:

1. The autocorrelation function of the s.o.s.s. process $(X_t)_t$ is such that
$$\rho_k = g_H(k), \quad \forall k > 0, \qquad (2.12)$$
where $g_H(k)$ is introduced in Equation (2.11).

2. For $1/2 < H < 1$, the s.o.s.s. process $(X_t)_t$ exhibits long memory behavior, with its autocorrelation function decaying hyperbolically: $\rho_k \sim H(2H-1)k^{2H-2}$ as $k \to \infty$. Moreover, its spectral density function explodes at the origin, with $f_X(\lambda) \sim |\lambda|^{1-2H}$ as $\lambda \to 0$.

3. The spectral density of the process $(X_t)_t$ verifies
$$f_X(\lambda) = 2C_f(1 - \cos\lambda) \sum_{i=-\infty}^{\infty} |2\pi i + \lambda|^{-2H-1} = C_f |\lambda|^{1-2H} + O(|\lambda|^{\min(3-2H,\,2)}), \qquad (2.13)$$
where $\lambda \in [-\pi, \pi]$, $1/2 < H < 1$ and $C_f = \sigma_X^2 (2\pi)^{-1} \sin(\pi H)\Gamma(2H+1)$.

4. There exist some equivalences between the previous properties. Condition (2.9) is equivalent to condition (2.12), and both are equivalent to condition (2.13). From condition (2.9) we can deduce condition (2.10), but not vice versa (see Taqqu et al. (1997) for more details).

It should be pointed out that the above Property 1 and Property 3 can also be used as definitions of second-order self-similar processes. With these properties, we obtain the following results, which present the relationship between self-similarity and long memory behaviors.

Lemma 2.2.5. Let $(X_t)_t$ be a covariance stationary process. If this process is short memory as defined in Definition 2.1.2, then it is not asymptotically second-order self-similar.

Proof. For a short memory process $(X_t)_t$, the corresponding autocorrelation function decays exponentially to zero, so it does not satisfy Equation (2.12), which means that the process $(X_t)_t$ is not asymptotically second-order self-similar.

Lemma 2.2.6. Let $(X_t)_t$ be a covariance stationary long memory process with $\frac{1}{2} < H < 1$; then this process is asymptotically second-order self-similar. Furthermore, under Gaussianity, the process is asymptotically self-similar.

Proof. For a covariance stationary long memory process $(X_t)_t$, the autocorrelation function decays hyperbolically, i.e.
$$\lim_{k \to \infty} \frac{\rho_k}{k^{2H-2}} = c, \quad \frac{1}{2} < H < 1.$$


According to the results of Tsybakov and Georganas (1997), we deduce that
$$\lim_{m \to \infty} \rho_k^{(m)} = g_H(k), \quad \forall k = 1, 2, \cdots.$$
Thus the process is asymptotically second-order self-similar following Definition 2.2.4.

From Lemma 2.2.6, the following result is straightforward:

Lemma 2.2.7. Let $(X_t)_t$ be a covariance stationary long memory process with $\frac{1}{2} < H < 1$. If the spectral density of the process blows up at the origin, then the process is asymptotically second-order self-similar. Under Gaussianity, it is asymptotically self-similar.

2.2.4 Examples of Self-similar Processes in Continuous Time

In this part, we present some continuous-time processes and study their self-similar properties and long range dependence. We consider three cases: Gaussian H-sssi models, non-Gaussian H-sssi models and multi-fractal processes.

Gaussian H-sssi Models

The canonical Gaussian H-sssi model is the fractional Brownian motion, which is unique up to a scaling constant. We first recall the definition of Brownian motion.

Definition 2.2.8. Let $B(t)$ be a stochastic process with continuous sample paths such that, $\forall t \in \mathbb{R}$,

(i) $B(0) = 0$ a.s.;
(ii) $B(t)$ has independent increments;
(iii) for each $t$, $B(t)$ has a Gaussian distribution with mean zero and variance $t$;
(iv) $E[B(t) - B(s)] = 0$.

Then $B(t)$ is called the (standard) Brownian motion.

According to the definition, we can remark that the Brownian motion is a $\frac{1}{2}$-ss process. However, since the increments of the Brownian motion are independent, it is not a long memory process.

Among continuous-time processes, another important model is the fractional Brownian motion, which is the canonical example of Gaussian H-sssi processes.

Definition 2.2.9. Let $B(t)$ be the standard Brownian motion and $0 < H < 1$. Define $B_H(t)$ by the stochastic integral
$$B_H(t) = \int \omega_H(t, u)\, dB(u),$$
where the convergence of the integral is in $L^2$-norm with respect to the Lebesgue measure on the real numbers, and the weight function $\omega_H$ satisfies the following conditions:

• $\omega_H(t, u) = 0$, for $t \leq u$;

• $\omega_H(t, u) = (t - u)^{H - \frac{1}{2}}$, for $0 \leq u < t$;

• $\omega_H(t, u) = (t - u)^{H - \frac{1}{2}} - (-u)^{H - \frac{1}{2}}$, for $u < 0$.

Then $B_H(t)$ is called the fractional Brownian motion (fBm) with self-similarity parameter $H$.

Now we recall some interesting properties concerning the fractional Brownian motion:

1. The fBm is the unique Gaussian H-sssi process, up to a scaling constant.

2. The covariance function $\gamma(k)$ ($k \in \mathbb{Z}$) of the fBm is proportional to $|k|^{2H-2}$ as $k \to \infty$.

3. When $0 < H < 1$, the fBm exhibits long memory behavior, since its spectral density $f(\omega)$ ($-\pi < \omega < \pi$) is proportional to $\omega^{-2H-1}$ as $\omega \to 0$.

4. We can divide the class of fBm into three subclasses: anti-persistent processes ($0 < H < \frac{1}{2}$), the random walk ($H = \frac{1}{2}$) and persistent processes ($\frac{1}{2} < H < 1$).

5. To check whether a process $\{X(t)\}_{t \in \mathbb{R}}$ is an fBm, we need to verify the following items:

(i) It is a Gaussian process with mean 0 and $X(0) = 0$;

(ii) $E[X^2(t)] = \sigma^2 |t|^{2H}$, for some $\sigma > 0$ and $0 < H < 1$;

(iii) The process $\{X(t)\}_{t \in \mathbb{R}}$ has stationary increments.

6. We often consider the fBm as a generalization of the Brownian motion; when $H = 1$, we get the degenerate case.

Since the fBm is an H-sssi process, it builds a bridge between continuous-time processes and discrete-time processes.
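For illustration, here is a short Python sketch that simulates fBm exactly on an integer grid: it draws fractional Gaussian noise from the Cholesky factor of its Toeplitz covariance $\gamma(k) = \frac{1}{2}(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})$ and cumulates the increments. This $O(n^2)$ construction is only practical for moderate $n$; circulant-embedding (Davies-Harte) methods are the usual fast alternative.

```python
import numpy as np

def fgn(n, H, seed=0):
    """Exact fractional Gaussian noise via a Cholesky factorization (O(n^2))."""
    rng = np.random.default_rng(seed)
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H))      # autocovariance of fGn
    Sigma = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    return np.linalg.cholesky(Sigma) @ rng.standard_normal(n)

increments = fgn(1000, H=0.8)
fbm_path = np.cumsum(increments)   # fBm sampled on an integer grid
```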

Non-Gaussian H-sssi Models

In contrast to the uniqueness of the Gaussian H-sssi process, there is an infinite number of non-Gaussian H-sssi models, among which there is a large class of models with infinite variance: the α-stable processes (with $0 < \alpha < 2$). If $\alpha = 2$, they reduce to Gaussian processes.

For α-stable H-sssi processes with 0 < α < 2, we have the following properties:

1. For different $\alpha \in (0, 2)$, the parameter $H$ lies in different intervals:

• $H \in (0, \frac{1}{\alpha}]$, if $0 < \alpha < 1$;

• $H \in (0, 1]$, if $1 < \alpha < 2$.


2. When $0 < \alpha < 2$ and $H \neq \frac{1}{\alpha}$, one of the most commonly studied $\alpha$-stable H-sssi processes is the linear fractional stable motion (or linear fractional Lévy motion). When $0 < \alpha < 2$ and $H = \frac{1}{\alpha}$, we get the $\alpha$-stable Lévy motion.

3. When $0 < \alpha < 1$, the unique non-degenerate $\alpha$-stable $\frac{1}{\alpha}$-sssi process is the $\alpha$-stable Lévy motion. When $1 \leq \alpha < 2$, there is no uniqueness any more. For example, when $\alpha = 1$, let $X$ be a 1-stable random variable; then the process $Y(t) = t \cdot X$ ($\forall t \in \mathbb{R}$) is a 1-sssi process.

4. When $1 < \alpha < 2$, the log-fractional stable motion is a $\frac{1}{\alpha}$-sssi process.

5. There exist many other standard families of $\alpha$-stable H-sssi processes; for simplicity, we concentrate on the symmetric case. The three famous symmetric $\alpha$-stable (SαS) H-sssi processes are:

• the linear fractional stable motions;

• the real harmonizable fractional stable motions;

• the sub-Gaussian fractional motions.

All three models reduce to the fractional Brownian motion when $\alpha = 2$. The corresponding increment processes are called linear fractional stable noise, real harmonizable fractional stable noise and sub-Gaussian fractional noise.

6. Comparing the fBm and the $\alpha$-stable Lévy processes, we find that the self-similarity property can have quite different origins:

• It can arise from strong dependence between increments in the absence of high variability (e.g. fBm). This mechanism for self-similarity is called the "Joseph effect".

• It can arise from high variability, with increments that are independent and heavy-tailed ($\alpha$-stable Lévy process). This mechanism for self-similarity is called the "Noah effect".

7. The relationship between self-similar processes, Gaussian processes and Lévy processes can be characterized as follows:

• A self-similar and Gaussian process is the fractional Brownian motion;

• A self-similar and Lévy process is the so-called α-stable process;

• A Gaussian and Lévy process corresponds to the Brownian motion with drift;

• The Brownian motion is a particular case of all three kinds of processes.


Multi-fractal Processes

The notion of self-similarity coincides with the original definition of fractality for geometric objects: the concept of fractality comes from fractal geometry to describe self-similar processes. We can therefore generalize the notion of self-similarity to multi-fractality.

Definition 2.2.10. A random process $\{X(t)\}_{t > 0}$ is called a multi-fractal process if, for any $a > 0$, there exists a random function $M(a)$ such that $X(at) =^d M(a)X(t)$.

An exactly self-similar process is a degenerate example of a multi-fractal process, which can be called a mono-fractal process.

2.2.5 Examples of Self-similar Processes in Discrete Time

In this part, we investigate the classical discrete-time processes and their self-similarity properties. The study concerns the fGn, the k-factor GARMA process, the k-factor GIGARCH process, processes with switches, processes with breaks and processes with threshold.

• Fractional Gaussian Noise (fGn)

Definition 2.2.11. A process $(X_t)_{t \in \mathbb{Z}}$ is called a fractional Gaussian noise, or fGn, if it satisfies, for all $t \in \mathbb{Z}$, $X_t = B_H(t) - B_H(t-1)$, where $\{B_H(t)\}_{t \in \mathbb{R}}$ is a fractional Brownian motion as introduced in Definition 2.2.9.

Thus, the fractional Gaussian noise (fGn) is defined as the increment sequence of the fBm. It has the following properties:

1. The fractional Gaussian noise is an exactly self-similar stationary Gaussian process with zero mean.

2. Since the fractional Brownian motion is the unique H-sssi Gaussian process, the fGn is also the unique stationary Gaussian process which is exactly self-similar. The uniqueness is up to a scaling constant.

• k-factor GARMA Processes

Definition 2.2.12. A process $(X_t)_t$ is called a k-factor Gegenbauer autoregressive moving average process, or a k-factor GARMA process, if it has the following representation:
$$\Phi(B) \prod_{i=1}^{k} (I - 2\nu_i B + B^2)^{d_i} X_t = \Theta(B)\varepsilon_t, \qquad (2.14)$$
where $k$ is a finite integer, $|\nu_i| \leq 1$ for all $i = 1, \cdots, k$, $(\varepsilon_t)_t$ is a white noise with zero mean and variance $\sigma_\varepsilon^2$, $\Phi(B) = I - \phi_1 B - \cdots - \phi_p B^p$, $\Theta(B) = I - \theta_1 B - \cdots - \theta_q B^q$, $d_i \in \mathbb{R}$, and $B$ is the backshift operator satisfying $BX_t = X_{t-1}$.


Notice that the frequencies $\lambda_i = \cos^{-1}(\nu_i)$ ($i = 1, \cdots, k$) are called Gegenbauer frequencies. This model was first studied by Woodward et al. (1998) and Giraitis and Leipus (1995).
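A one-factor Gegenbauer process (without ARMA part) can be simulated from the expansion $(1 - 2\nu B + B^2)^{-d} = \sum_{j \geq 0} C_j^{(d)}(\nu) B^j$, where the $C_j^{(d)}(\nu)$ are Gegenbauer polynomial coefficients satisfying the standard three-term recursion used below. A hedged Python sketch (the truncation length and parameter values are arbitrary choices):

```python
import numpy as np

def gegenbauer_coefficients(d, nu, trunc):
    """Coefficients C_j in (1 - 2 nu B + B^2)^(-d) = sum_j C_j B^j."""
    c = np.empty(trunc + 1)
    c[0] = 1.0
    c[1] = 2.0 * d * nu
    for j in range(2, trunc + 1):
        c[j] = (2 * nu * (j + d - 1) / j) * c[j - 1] \
               - ((j + 2 * d - 2) / j) * c[j - 2]
    return c

def sim_gegenbauer(n, d=0.3, nu=0.5, trunc=5000, seed=0):
    """Truncated MA(inf) simulation of a 1-factor Gegenbauer process."""
    rng = np.random.default_rng(seed)
    c = gegenbauer_coefficients(d, nu, trunc)
    eps = rng.standard_normal(n + trunc)
    return np.convolve(eps, c, mode='valid')
```

The spectrum of such a simulated series peaks at the Gegenbauer frequency $\lambda = \cos^{-1}(\nu)$, as in Figure 2.4.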

In the following, we recall the conditions for a covariance stationary k-factor GARMA process to exhibit long memory, in order to apply the results of the lemmas obtained above.

Lemma 2.2.13. A k-factor GARMA process is covariance stationary and exhibits long memory behavior when the $\nu_i$ are distinct, all the roots of the polynomials $\Phi(B)$ and $\Theta(B)$ are distinct and outside the unit circle, and (i) $0 < d_i < \frac{1}{2}$ and $|\nu_i| < 1$, or (ii) $0 < d_i < \frac{1}{4}$ and $|\nu_i| = 1$, for $i = 1, \cdots, k$.

Proposition 2.2.14. Let $(X_t)_t$ be a k-factor GARMA process; then, under the conditions of Lemma 2.2.13, it is asymptotically second-order self-similar. Furthermore, under Gaussianity, it is asymptotically self-similar.

Proof. A k-factor GARMA process is covariance stationary and long memory under the conditions of Lemma 2.2.13. Thus, according to Lemma 2.2.6, a k-factor GARMA process is asymptotically second-order self-similar; under Gaussianity, it is asymptotically self-similar.

We note that there are many particular cases of the k-factor GARMA process, described as follows:

1. When $k = 1$ in Equation (2.14), we get the Gegenbauer ARMA (GARMA) process introduced by Gray et al. (1989). This model contains the Gegenbauer process (GI(d)) when there are no short memory terms.

2. When $k = 1$ and $\nu = 1$ in Equation (2.14), we get the fractionally integrated ARMA (ARFIMA) process introduced by Granger and Joyeux (1980) and Hosking (1981). This model contains the fractionally integrated process (FI(d)) when there are no short memory terms.

As a consequence, these models, being particular cases of the k-factor GARMA process, are covariance stationary and asymptotically self-similar, as stated in Proposition 2.2.14.

• Heteroscedastic Processes

GARCH Processes and Related Processes

Definition 2.2.15. A process $(X_t)_t$ is a generalized autoregressive conditional heteroscedastic process of orders $p$ ($\geq 0$) and $q$ ($> 0$), or GARCH(p,q), if it has the representation $X_t = \sigma_t \varepsilon_t$ with
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i X_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 = \alpha_0 + a(B)X_t^2 + b(B)\sigma_t^2, \qquad (2.15)$$
where $\alpha_0 > 0$, $\alpha_i \geq 0$ ($i = 1, \cdots, q$), $\beta_j \geq 0$ ($j = 1, \cdots, p$), $\varepsilon_t \sim IID(0, 1)$, $\varepsilon_t$ is independent of $X_{t-k}$, $k \geq 1$, for all $t$, $B$ is the backshift operator, and $a(B)$ and $b(B)$ are polynomials in $B$ of orders $q$ and $p$ respectively.

This model was introduced by Bollerslev (1986). If $\beta_j = 0$ in (2.15), we get an ARCH process (Engle, 1982). If $\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j = 1$, we get an IGARCH process (Bollerslev, 1988).
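A minimal simulation sketch in Python of a covariance stationary GARCH(1,1); the parameter values are arbitrary and satisfy $a(1) + b(1) = \alpha_1 + \beta_1 < 1$:

```python
import numpy as np

def sim_garch11(n, alpha0=0.05, alpha1=0.10, beta1=0.85, seed=0):
    """X_t = sigma_t*eps_t with sigma_t^2 = alpha0 + alpha1*X_{t-1}^2 + beta1*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    sig2 = np.full(n, alpha0 / (1 - alpha1 - beta1))  # start at the unconditional variance
    for t in range(1, n):
        sig2[t] = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sig2[t - 1]
        x[t] = np.sqrt(sig2[t]) * eps[t]
    return x

x = sim_garch11(5000)
# x itself is (approximately) uncorrelated, while x**2 is strongly autocorrelated
```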

Lemma 2.2.16. The GARCH(p,q) process defined in Definition 2.2.15 is covariance stationary, with $E(X_t) = 0$, $\mathrm{Var}(X_t) = \alpha_0(1 - a(1) - b(1))^{-1}$ and $\mathrm{Cov}(X_t, X_s) = 0$ for $t \neq s$, if and only if $a(1) + b(1) < 1$.

See Bollerslev (1986) for more details.

Proposition 2.2.17. Let $(X_t)_t$ be the GARCH process introduced in Definition 2.2.15. If $a(1) + b(1) < 1$, then it is not asymptotically second-order self-similar.

Proof. If $a(1) + b(1) < 1$, then according to Lemma 2.2.16 the GARCH model is covariance stationary and short memory. Thus, following Lemma 2.2.5, it is not asymptotically second-order self-similar.

The conclusion also holds for particular GARCH models such as the ARCH and IGARCH models.

k-factor GIGARCH Processes

Definition 2.2.18. A process $(X_t)_t$ is called a k-factor Gegenbauer integrated generalized autoregressive conditional heteroscedastic process, or a k-factor GIGARCH process, if it has the following representation:
$$\Phi(B) \prod_{i=1}^{k} (I - 2\nu_i B + B^2)^{d_i} (X_t - \mu) = \Theta(B)\varepsilon_t, \qquad (2.16)$$
where $\varepsilon_t = \xi_t \sigma_t$ with $(\xi_t)_{t \in \mathbb{Z}}$ a white noise process with unit variance and zero mean, $\sigma_t^2 = a_0 + \sum_{i=1}^{r} a_i \varepsilon_{t-i}^2 + \sum_{j=1}^{s} b_j \sigma_{t-j}^2$, $\mu$ the mean of the process $(X_t)_t$, $\Phi(B)$ and $\Theta(B)$ polynomials in $B$ of orders $p$ and $q$ respectively, $B$ the backshift operator satisfying $BX_t = X_{t-1}$, $d = (d_1, \ldots, d_k)$ the memory parameters and $\nu = (\nu_1, \ldots, \nu_k)$ the frequency location parameters, with $d_i \in \mathbb{R}$ and $|\nu_i| \leq 1$, $i = 1, \cdots, k$.

Notice that the frequencies $\lambda_i = \cos^{-1}(\nu_i)$ ($i = 1, \ldots, k$) are called Gegenbauer frequencies.


Lemma 2.2.19. A k-factor GIGARCH process $(X_t)_{t \in \mathbb{Z}}$ is covariance stationary when the following hypotheses are satisfied:

(H0): $a_0 > 0$, $a_1, \ldots, a_r, b_1, \ldots, b_s \geq 0$ and $\sum_{i=1}^{r} a_i + \sum_{i=1}^{s} b_i < 1$;

(H1): $|d_i| < \frac{1}{2}$ if $|\nu_i| < 1$, and $|d_i| < \frac{1}{4}$ if $\nu_i = 1$, for $i = 1, \ldots, k$;

(H2): all the roots of $\Phi(B) = 0$ and $\Theta(B) = 0$ lie outside the unit circle and there is no common root;

(H3): $E(\varepsilon_t^4) < \infty$;

(H4): the $\nu_i$ ($i = 1, \ldots, k$) are supposed to be known.

Lemma 2.2.20. A covariance stationary k-factor GIGARCH process is long memory when $|\nu_i| < 1$ and $0 < d_i < \frac{1}{2}$. The asymptotic behaviors of the spectral density and the autocorrelation function are as follows: $f_X(\lambda) \sim c(\lambda)|\lambda - \lambda_j|^{-2d}$ as $\lambda \to \lambda_j$, $j = 1, \ldots, k$, and $\rho_k \sim k^{2d-1}\cos(k\lambda_j)$ as $k \to \infty$, where $\lambda_j = \cos^{-1}(\nu_j)$ is the Gegenbauer frequency.

See Guégan (2003) for details.

Proposition 2.2.21. Let $(X_t)_t$ be a k-factor GIGARCH process; it is asymptotically second-order self-similar under the conditions of Lemma 2.2.19 and Lemma 2.2.20. Furthermore, under Gaussianity, it is asymptotically self-similar.

Proof. A k-factor GIGARCH process is covariance stationary and exhibits long memory under the conditions of Lemma 2.2.19 and Lemma 2.2.20; thus, according to Lemma 2.2.6, a k-factor GIGARCH process is asymptotically second-order self-similar. Under Gaussianity, it is asymptotically self-similar.

The k-factor GIGARCH processes exhibit both heteroscedasticity and long memory characteristics.

• Processes with Switches and Jumps

Structural breaks have been observed in many economic and financial time series. Many models have been proposed in order to capture the existence of structural changes and complex dynamic patterns.

Let $(X_t)_t$ be a process whose recursive scheme is
$$X_t = \mu_{s_t} + \varepsilon_t, \qquad (2.17)$$
where $(\mu_{s_t})_t$ is a process that we specify below and $(\varepsilon_t)_t$ is a strong white noise, independent of $(\mu_{s_t})_t$. With respect to the process $(\mu_{s_t})_t$, we distinguish two cases:


1. $(\mu_{s_t})_t$ depends on a hidden ergodic Markov chain $(s_t)_t$;

2. if $(\mu_{s_t})_t = (\mu_t)_t$, then we assume that this process depends on a probability $p$.

The first class of models includes "models with switches" and the second class includes "models with breaks". If these processes are covariance stationary, most of them have been proved to be short memory processes. Nevertheless, some of them empirically exhibit a kind of long memory behavior when we observe the corresponding behavior of the sample autocorrelation function. Thus, for these processes, studying self-similarity is more complex than for the classes of processes already discussed.

Processes with Switches and Breaks

We consider a two-state Markov switching model $(X_t)_t$ defined by the following equations:
$$X_t = \mu_{s_t} + \phi_{s_t} X_{t-1} + \sigma_{s_t} \varepsilon_t, \qquad (2.18)$$
where $\mu_i$, $\phi_i$ and $\sigma_i$ ($i = 1, 2$) are real parameters, the $\sigma_i$ are positive, and $(\varepsilon_t)_t$ is a strong white noise with mean $m \in \mathbb{R}$ and variance $\sigma^2 \in \mathbb{R}_+^*$, independent of the hidden ergodic Markov chain $(s_t)_t$, which is characterized by its transition probabilities $p_{ij}$, defined by
$$P[s_t = j \mid s_{t-1} = i] = p_{ij}, \qquad (2.19)$$
with $0 \leq p_{ij} \leq 1$ and $\sum_{j=1}^{2} p_{ij} = 1$ ($i = 1, 2$). Thus the transition matrix is
$$P = \begin{pmatrix} p_{11} & 1 - p_{22} \\ 1 - p_{11} & p_{22} \end{pmatrix}.$$

Proposition 2.2.22. Let $(X_t)_t$ be a process defined by Equation (2.18) and Equation (2.19). If $\max_{i=1,2}\{p_{i1}|\phi_1|^2 + p_{i2}|\phi_2|^2\} < 1$, the two-state Markov switching model is covariance stationary and also short memory; thus it is not asymptotically second-order self-similar.

Proof. Under the above conditions, the process $(X_t)_t$ defined by Equation (2.18) and Equation (2.19) is covariance stationary (Yang, 2000). Under stationarity, this model is theoretically known to be short memory. Thus, following Lemma 2.2.5, it is not asymptotically second-order self-similar.

Actually, there are many other interesting models with switches contained in Equation (2.17), for example:

– the mean regime switching model $X_t = \mu_{s_t} + \varepsilon_t$;

– the mean-variance regime switching model $X_t = \mu_{s_t} + \sigma_{s_t}\varepsilon_t$;

– the sign model $X_t = \mathrm{sign}(X_{t-1}) + \varepsilon_t$, where $\varepsilon_t \sim N(0, \sigma^2)$.

For all of these models with switches, if they are covariance stationary, then they are theoretically short memory processes, although empirically they may exhibit long memory behavior. Likewise, they are not asymptotically second-order self-similar.

Now we consider processes with breaks. Assume the process $(X_t)_t$ is defined by
$$X_t = \mu_t + \varepsilon_t, \qquad (2.20)$$
where the process $(\mu_{s_t})_t = (\mu_t)_t$ depends on a probability $p$. Different dynamics of the process $(\mu_t)_t$ correspond to different break models, for example: the binomial model, the random walk model with a Bernoulli process, the STOPBREAK model, the stationary random level shift model, the mean-plus-noise model, etc. Under stationarity, these models are short memory and are not asymptotically second-order self-similar. However, empirically, the sample autocorrelation function of the Markov switching process decreases hyperbolically towards zero, thus exhibiting long memory behavior. According to Lemma 2.2.6, such a process empirically appears asymptotically second-order self-similar. We will call this kind of behavior a "spurious asymptotically second-order self-similar" behavior. Similarly, when the models with breaks and switches introduced above are covariance stationary, they are also theoretically short memory processes, although empirically they exhibit long memory behavior. Therefore, under stationarity, they are not asymptotically second-order self-similar, but present some "spurious asymptotic second-order self-similar" behavior.
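The spurious long memory effect is easy to reproduce numerically. The Python sketch below simulates a mean-switching model of type (2.17) with a very persistent two-state Markov chain (all parameter values are arbitrary); its sample ACF typically decays very slowly even though the model is theoretically short memory:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def sim_mean_switching(n, mu=(0.0, 2.0), p_stay=(0.99, 0.99), seed=0):
    """Two-state mean switching model X_t = mu_{s_t} + eps_t, as in Eq. (2.17)."""
    rng = np.random.default_rng(seed)
    s = np.zeros(n, dtype=int)
    for t in range(1, n):
        stay = rng.random() < p_stay[s[t - 1]]      # persistent Markov chain
        s[t] = s[t - 1] if stay else 1 - s[t - 1]
    return np.asarray(mu)[s] + rng.standard_normal(n)

x = sim_mean_switching(10000)
print(acf(x, nlags=100)[[1, 10, 50, 100]])  # decays much more slowly than geometrically
```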

Processes with Threshold

Consider the general form of processes with threshold:
$$X_t = f(X_{t-1})\left(1 - G(X_{t-d}, \gamma, c)\right) + g(X_{t-1})G(X_{t-d}, \gamma, c) + \varepsilon_t, \qquad (2.21)$$
where the functions $f$ and $g$ can be any linear or nonlinear functions of the past values of $X_t$ or $\varepsilon_t$, the process $(\varepsilon_t)_t$ is a strong white noise, and $G$ is an indicator function or a continuous function. Depending on the threshold $c$ and the position of the random variable $X_{t-d}$ with respect to this threshold, the process $(X_t)_t$ covers different models, for example SETAR models, STAR models, etc.

Lemma 2.2.23. Consider the threshold process defined in Equation (2.21). If the functions $f$ and $g$ correspond to short memory processes, then under covariance stationarity the model (2.21) is short memory.

Proposition 2.2.24. For the process with threshold defined in Equation (2.21), if the functions $f$ and $g$ correspond to short memory processes, then the model (2.21) is not asymptotically second-order self-similar.

Proof. According to Lemma 2.2.23, if the functions $f$ and $g$ correspond to short memory processes, the process (2.21) is short memory. Thus, following Lemma 2.2.5, the process (2.21) is not asymptotically second-order self-similar.


Now we consider the long memory SETAR model, defined as follows:
$$X_t = (1 - B)^{-d}\varepsilon_t^{(1)} I_t(X_{t-d} \leq c) + \varepsilon_t^{(2)}\left[1 - I_t(X_{t-d} \leq c)\right], \qquad (2.22)$$
with the assumptions:

(H5): The process $(\varepsilon_t^{(i)})_t$ ($i = 1, 2$) is a sequence of independent identically distributed random variables.

(H6): The long memory parameter $d$ is such that $0 < d < 1/2$, so that in regime 1 the process is invertible and stationary.

Lemma 2.2.25. Under the assumptions (H5) and (H6), the process $(X_t)_t$ defined in Equation (2.22) is globally covariance stationary. Asymptotically, its spectral density function is such that $f_X(\omega) \sim C\omega^{-2d}$ as $\omega \to 0$, where $C$ is a positive constant, and its autocovariance function verifies $\gamma_X(h) \sim \frac{\Gamma(1-2d)}{\Gamma(d)\Gamma(1-d)} h^{2d-1}$ as $h \to \infty$.

Proposition 2.2.26. Under the assumptions (H5) and (H6), the covariance stationary SETAR model defined in Equation (2.22) is asymptotically second-order self-similar. Furthermore, under Gaussianity, it is asymptotically self-similar.

Proof. Under the assumptions (H5) and (H6), following Lemma 2.2.25, the covariance stationary model defined in Equation (2.22) is a long memory process. According to Lemma 2.2.6, it is asymptotically second-order self-similar; furthermore, under Gaussianity, it is asymptotically self-similar.

Dufrénot et al. (2005a, 2005b, 2008) have applied the long memory SETAR models to real data.

2.2.6 Summary of Self-similar Processes

In the previous part, we discussed the self-similarity properties of continuous-time and discrete-time processes. In the continuous-time framework, the fractional Brownian motion is the unique Gaussian H-sssi process; we reviewed the non-Gaussian H-sssi models, and we then studied the multi-fractal process as one of the generalizations of self-similar processes. In the discrete-time framework, we proposed two important lemmas which reveal the relationship between self-similarity properties and long/short memory behaviors: we proved that a covariance stationary long memory process is asymptotically second-order self-similar, while a covariance stationary short memory process is not. We then applied these two lemmas to several stochastic processes: the ARFIMA models, the k-factor GARMA models, the GARCH models, the k-factor GIGARCH models, processes with switches, processes with breaks, processes with threshold, etc.

Now, Guégan (2007) has proved that, if a process is not globally stationary but only locally stationary, meaning for instance that the means on two subsets of the process are different (as in model (2.18)), then the sample autocorrelation function decreases hyperbolically towards zero. According to the definition of long memory processes in the time domain, such a process appears globally long memory. The arising question is the existence of self-similarity for this process. It seems that, even if it is not globally covariance stationary, it can appear globally asymptotically second-order self-similar. Thus, we suggest in that case that the process has a spurious asymptotically second-order self-similar behavior.

We summarize the previous discussions in a graph to give an intuitive overall idea of the main concepts discussed in the previous sections:

[Diagram: overview of the main concepts, relating Brownian motion (Bm), Bm with drift, Lévy processes, α-stable processes, fBm, and H-ss/H-sssi processes in continuous time to Gaussian processes, covariance stationary long memory processes ($\frac{1}{2} < H < 1$), s.o.s.s processes ($\rho(k) = g_H(k)$) and a.s.o.s.s/a.s.s processes in discrete time.]

In this graph, H-ss stands for H-self-similar process; H-sssi for H-self-similar process with stationary increments; a.s.s for asymptotically self-similar; a.s.o.s.s for asymptotically second-order self-similar; s.o.s.s for second-order self-similar; ACF for a long memory process defined in the sense of the autocorrelation function; Bm for Brownian motion; fBm for fractional Brownian motion; α-stable for α-stable process; and Lévy for Lévy process.


Chapter 3

Wavelet Techniques for Time Series Analysis

The discrete wavelet transform (DWT) has several appealing qualities that make it a useful tool for time series analysis. Although developed originally with geophysical applications in mind (Goupillaud et al., 1984), the DWT and its variants have found a home in a wide variety of disciplines: geology, atmospheric science, turbulence, applied mathematics, etc. The engineering community quickly realized that common techniques in signal processing were closely related and developed the framework of multiresolution analysis (Mallat, 1989). There are numerous references over the past decades with respect to wavelet analysis. Introductory texts include Ogden (1997), Vidakovic (1999), and Percival and Walden (2000) from a statistical perspective; Vetterli and Kovacevic (1995), Burrus et al. (1998), and Mallat (1998) from an engineering perspective; and Strang and Nguyen (1996) and Chui (1992, 1997) from a more mathematical perspective.

This chapter provides a brief discussion of time-frequency representations, including the classical Fourier transform and the short-time Fourier transform, followed by a detailed introduction to the wavelet transform. This review serves the work in Chapter 4 and Chapter 5 and makes this thesis self-contained.

3.1 Introduction of the Time-frequency Representations

3.1.1 Fourier Transform

The discrete Fourier transform (DFT) may be derived from a variety of perspectives (e.g. approximating the Fourier transform of a function, approximating the Fourier coefficients of a function) (Briggs and Henson, 1995). We prefer to take the viewpoint of approximating a discretely sampled time series $(X_t)_t$ via a linear combination of sines and cosines. Each of these sines and cosines is itself a function of frequency, and therefore the DFT may be seen as a decomposition on a frequency-by-frequency basis. The Fourier basis functions (sines and cosines) are very appealing when representing a stationary time series.


The Fourier transform is an alternative representation of the original time series: it summarizes the information in the data as a function of frequency and therefore does not preserve information in time. This is the opposite of how we observed the original time series, where no frequency resolution was provided. When observing the time series in the time domain, we have complete time resolution and no frequency resolution, whereas the opposite is true after performing the Fourier transform.

3.1.2 Short-Time Fourier Transform

Gabor (1946) recognized that the Fourier transform goes too far by trading away all time resolution for frequency resolution, and attempted to achieve a balance between time and frequency by sliding a window across the time series and taking the Fourier transform of the windowed series. This is known as the Gabor transform or short-time Fourier transform (STFT). The resulting expansion is a function of two parameters, frequency and time shift. The key property is that the window size is fixed with respect to frequency, which produces a rectangular partitioning of the time-frequency plane.

The choice of window is very important for the performance of the STFT in practice. Since the STFT simply applies the Fourier transform to pieces of the time series of interest, it will not be able to resolve events that appear within the width of the window; in this case, the lack of time resolution of the Fourier transform is still present. In general, one cannot achieve simultaneous time and frequency resolution because of the Heisenberg uncertainty principle. In the field of particle physics, an elementary particle does not simultaneously have a precise position and momentum: the better one determines the position of the particle, the less precisely the momentum is known at that time, and vice versa. For signal processing, this rule translates into the fact that a signal cannot simultaneously have a precise location in time and a precise frequency. An interesting discussion of the Heisenberg uncertainty principle in signal processing can be found in Hubbard (1996) and Mallat (1989).

3.1.3 Wavelet Transform

What is needed to overcome the fixed time-frequency partition is a new set of basis functions. The wavelet transform utilizes a basic function (called the mother wavelet), then dilates and translates it to capture features that are local in time and local in frequency. The resulting time-frequency partition corresponding to the wavelet transform is long in time when capturing low-frequency events, thus having good frequency resolution for these events, and long in frequency when capturing high-frequency events, thus having good time resolution for these events. Consequently, the wavelet transform intelligently adapts itself to capture features across a wide range of times and frequencies.

The Fourier transform, and also the STFT, is a function of frequency, while the wavelet transform is a function of scale. The scale in a wavelet transform is indeed related to frequency: loosely speaking, scale is inversely proportional to a frequency interval. If the scale parameter increases, then the wavelet basis is


• stretched in the time domain,

• shrunk in the frequency domain, and

• shifted toward lower frequencies.

Conversely, a decrease in the scale parameter

• reduces the time support,

• increases the number of frequencies captured, and

• shifts toward higher frequencies.

3.2 Properties of the Wavelet Transform

Although the Fourier transform, the STFT and the wavelet transform are all alternative representations, each transform is well suited to certain types of applications. Wavelets, with their ability to quantify events in both time and scale, are well suited to study a wide range of time series. We introduce some features of wavelets and wavelet transforms before discussing the discrete wavelet transform.

3.2.1 Continuous Wavelet Functions

A wavelet $\psi(t)$ is simply a function of time $t$ that obeys a basic rule, known as the wavelet admissibility condition:
$$C_\psi = \int_0^\infty \frac{|\Psi(f)|}{f}\, df < \infty, \qquad (3.1)$$
where $\Psi(f)$ is the Fourier transform, a function of frequency $f$, of $\psi(t)$. This condition ensures that $\Psi(f)$ goes to zero quickly as $f \to 0$ (Grossmann and Morlet, 1984; Mallat, 1998). In fact, to guarantee that $C_\psi < \infty$ we must impose $\Psi(0) = 0$, which is equivalent to
$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0. \qquad (3.2)$$

A second condition imposed on a wavelet function is unit energy, that is,
$$\int_{-\infty}^{\infty} |\psi(t)|^2\, dt = 1. \qquad (3.3)$$

By satisfying both Equation (3.2) and Equation (3.3), the wavelet function must have nonzero entries, but all departures from zero must cancel out. The classic example of a continuous-time wavelet function is the Morlet wavelet. The Mexican hat wavelet is an example of a symmetric wavelet function; see Percival and Walden (2000) for more details.

The continuous wavelet transform (CWT) is a function of two variables W (u, s) and

30

Page 52: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

is obtained by simply projecting the function of interest x(t) onto a particular wavelet ψvia

W (u, s) =

∫ ∞

−∞x(t)ψu,s(t)dt, (3.4)

whereψu,s(t) =

1√sψ(

t− u

s)

is the translated (by u) and dilated (by s) version of the original wavelet function. Theresulting wavelet coefficients are a function of these two parameters, location and scale,even though the original function is only a function of one parameter. By applying shiftedand translated versions of the mother wavelet to a function of interest, we are breakingdown the complicated structure present in the function into simpler components. This iscalled analyzing and decomposing the function. If a wavelet satisfies the admissibilitycondition (Equation (3.1)), then an inverse operation may be performed to produce thefunction from its wavelet coefficients; that is,

x(t) =1

∫ ∞

0

∫ ∞

−∞W (u, s)ψu,s(t)du

ds

s2.

This is called synthesizing and reconstructing the function. A key property of wavelettransforms is their ability to decompose and perfectly reconstruct a square-integrablefunction.

3.2.2 Continuous versus Discrete Wavelet TransformAs mentioned previously, the CWT is a function of two parameters and therefore containsa high amount of extra (redundant) information when analyzing a function. Enough in-formation would be available to easily detect this discontinuity if one were to sample onlya portion of the CWT. Thus we reduce our task from analyzing an image (the CWT) withcontinuous parameters u and s to viewing a small number of scales with a varying num-ber of wavelet coefficients at each scale. This is the discrete wavelet transform (DWT).Although the DWT may be derived without referring to the CWT, we may review it asa "discretization" of the CWT through sampling specific wavelet coefficients (Vidakovic,1999). A critical sampling of the CWT is obtained via

s = 2−j and u = k2−j,

where j and k are integers representing the set of discrete translations and discrete dila-tions. A critical sampling defines the resolution of the DWT in both time and frequency.We use the term critical sampling to denote the minimum number of coefficients sampledfrom the CWT to ensure that all information present in the original function is retained bythe wavelet coefficients. The CWT takes on values at every point on the time-frequencyplane. The DWT, on the other hand, only takes on values at very few points and thewavelets that follow these values ψj,k(t) = 2j/2ψ(2jt− k) (for all integers j, k), producean orthogonal basis. If we select a sequence of dyadic scales and instead use all integertranslations:

s = 2−j and u = k,

31

Page 53: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

then we arrive at the maximal overlap DWT (MODWT).

Over the past decades, both the DWT and MODWT have been utilized in a variety offields and under a variety of names. With respect to economics and finance, J.B. Ram-sey and co-authors first introduced wavelets into the mainstream literature. Ramsey andZhang (1997) performed a time-frequency analysis of foreign exchange rates (DeutscheMark versus U.S. Dollar and Japanese yen versus U.S. Dollar) using wavelets. Theyfound that wavelet analysis succinctly captured a variety of non-stationary events in theseries. Ramsey and Lampart (1998a, b) decomposed economic variables across severalwavelet scales in order to identify different relationships between money and income, andbetween consumption and income. See Ramsey (1999) for a review article on wavelets ineconomics and finance. Stengos and Sun (2001) designed a consistent specification testfor a regression model based on wavelet estimation, and Fan (2000) proposed a waveletestimator of a partial linear model by regressing boundary independent DWT coefficientsof the dependent variable on the corresponding DWT coefficients of the regressors inlinear part of the model across several scales.

3.3 Discrete Wavelet FiltersBefore formulating the DWT or MODWT, we discuss the discrete wavelet filters avail-able to us. Fundamental properties of the continuous wavelet functions (filters), such asintegration to zero and unit energy (Equation (3.2) and (3.3)), have discrete counterparts.

Let hl = (h0, · · · , hL−1) be a finite length discrete wavelet filter such that it integrates(sums) to zero

L−1∑

l=0

hl = 0 (3.5)

and have unit energyL−1∑

l=0

h2l = 1. (3.6)

In addition to Equation (3.5) and (3.6), the wavelet (or high-pass) filter hl is orthogonalto its even shifts; that is,

L−1∑

l=0

hlhl+2n = 0, for all nonzero integers n. (3.7)

This comes from the relationship between the DWT and CWT via a critical sampling.To construct the orthonormal matrix that defines the DWT, wavelet coefficients cannotinteract with one another. Equations (3.6) and (3.7) may be succinctly expressed in thefrequency domain via the squared gain function

H(f) +H(f +1

2) = 2, for all f. (3.8)

32

Page 54: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

The natural object to complement a high-pass filter is a low-pass (scaling) filter whosesquared gain function monotonically increases as f → 0. By applying both hl and gl to anobserved time series, we separate high-frequency oscillations from low-frequency ones.We will denote a low-pass filter as gl = (g0, · · · , gL−1), and its transfer and squared gainfunction are given by G(f) and G(f), respectively. For all the wavelets considered here,the low-pass filter coefficients are determined by the "quadrature mirror relationship":

gl = (−1)l+1hL−1−l, for l = 0, · · · , L− 1. (3.9)

The inverse relationship is given by hl = (−1)lgL−1−l. Using Equation (3.9) allows us torelate the transfer function for gl and hl via

G(f) =L−1∑

l=0

gle−i2πf = e−i2πf(L−1)H(f − 1

2),

and, thus the squared gain function follows

G(f) = |e−2πf(L−1)H(f − 1

2)|2 = |H(

1

2− f)|2 = H(

1

2− f). (3.10)

Finally, a band-pass filter has a squared gain function that covers an interval of frequencies

and then decays to zero as f → 0 and f → 1

2. We may construct a band-pass filter by

recursively applying a combination of low-pass and high-pass filters.

3.3.1 Haar WaveletsThe first wavelet filter, the Haar wavelet (Haar, 1910), remained in relatively obscurityuntil the convergence of several disciplines to form what we now know in a broad senseas wavelet methodology. It is a filter of length L = 2 that can be succinctly defined by itsscaling (low-pass) filter coefficients

g0 = g1 =1√2,

or equivalently by its wavelet (high-pass) filter coefficients h0 =1√2

and hl = − 1√2

through the inverse quadrature mirror relationship.

The Haar wavelet is special since it is the only symmetric compactly supported orthonor-mal wavelet (Daubechies, 1992). It is also useful for presenting the basic properties sharedby all the wavelet filters. Although the Haar wavelet filter is easy to visualize and imple-ment, it is inadequate for most real-world applications in that it is a poor approximation toan ideal band-pass filter. An ideal band-pass filter is proportional to one inside the desiredfrequency interval and zero at all other frequencies.

33

Page 55: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

3.3.2 Daubechies WaveletsThe Daubechies wavelet filters represent a collection of wavelets that improve the frequency-domain characteristics of the Haar wavelet and may be still interpreted as generalizeddifferences of adjacent averages (Daubechies, 1992). Daubechies derived these waveletsfrom the criteria of a compactly supported function with the maximum number of van-ishing moments. In general, there are no explicit time-domain formulae for this class forwavelet filters (when possible, filter coefficients will be provided). Daubechies first chosean extremal phase factorization, whose resulting wavelet we denote by D(L) where L isthe length of the filter. An alternative factorization leads to the least asymmetric class ofwavelets, which we denote by LA(L). Shann and Yen (1999) provided exact values forboth the extremal phase and least asymmetric wavelets of length L ∈ 8, 10. Longerextremal phase and least asymmetric wavelet filters do not have a closed form and havebeen tabulated by, for example, Daubechies (1992) and Percival and Walden (2000).

For Daubechies wavelets, the number of vanishing moments is half the filter length, thusthe Haar wavelet has a single vanishing moment, the D(4) wavelet has two vanishingmoments, and the D(8) and LA(8) wavelets both have four vanishing moments. Oneimplication of this property is that longer wavelet filters may produce stationary waveletcoefficients vectors from "higher degree" non-stationary stochastic processes.

3.3.3 Minimum Bandwidth Discrete-time WaveletsThe minimum-bandwidth discrete-time (MBDT) wavelets are a new class of orthonormalwavelet filters that were developed by Morris and Peravali (1999). They are generated viaan iterative optimization of the spectral factorization procedure. This results in a familyof filters that have similar values to those of the Daubechies wavelets, but are obtainedthrough a completely iterative procedure. Hence, although Daubechies wavelets haveclosed form expressions for the squared gain functions, the MBDT wavelets do not.

What MBDT wavelets offer is an improved approximation to an ideal band-pass filterfor length L ≥ 8. The MBDT wavelets offer superior frequency domain propertiesto Daubechies wavelets given a filter of the same length. From a statistical point ofview, band-pass filtering a time series is important to produce approximately uncorre-lated wavelet coefficients for processes with quite general spectra. The MBDT waveletsoffer improved frequency resolution to that of the Daubechies family of wavelets. Theyare most similar to the Daubechies extremal phase family of wavelets.

3.4 Discrete Wavelet Transform (DWT)These days, data availability is becoming less and less of a problem. For instance, most ofthe exchanges and especially those that trade electronically would gladly provide tick-by-tick data to interested parties. Data vendors have themselves improved their data struc-tures and provide their users with tools to collect data from over-the-counter (OTC) mar-kets. Data vendors like Reuters, for instance, transmit more than 275,000 prices per day

34

Page 56: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

for foreign exchange spot rates alone. With such massive amounts of financial data beingcollected at any given time, the discrete wavelet transform is computationally cheaper rel-ative to the continuous wavelet transform. Furthermore, economic and financial data areinherently discrete, and thus we consider discrete transformations.

The DWT is an alternative to the Fourier transform for analyzing a time series. It pro-vides wavelet coefficients that are local in both time and frequency. The DWT may bethought of as a critical sampling of the CWT–that is, it uses the least number of coeffi-cients possible. The DWT possesses key attributes such as approximately decorrelatingcertain processes and efficiently captures important features of a process in a limited num-ber of coefficients.

There are a variety of ways to express the basic DWT. As we have seen from the def-inition of the CWT, wavelet coefficients were obtained from projecting (or correlating)portions of a time series with translated and dilated versions of the wavelet function. Wealso note that projection is closely related to convolution and we introduce filtering con-cepts for discrete series. Here it is most straightforward to introduce the DWT through asimple matrix operation.

Let x be a dyadic length vector (N = 2J ) of observations. The length N vector ofdiscrete wavelet coefficients w is obtained via

w = Wx,

where W is an N × N orthonormal matrix defining the DWT, composed of the waveletand scaling filter coefficients arranged on a row-by-row basis. The vector of waveletcoefficients may be organized into J + 1 vectors,

w = [w1,w2, · · · ,wJ ,vJ ], (3.11)

where wj is a length N/2j (j = 1, 2, · · · , J) vector of wavelet coefficients associatedwith changes on a scale of length λj = 2j−1 and wJ is a length N/2J vector of scalingcoefficients associated with averages on a scale of length 2J = 2λJ .

3.4.1 Implementation of the DWT: Pyramid AlgorithmIn practice, the DWT is implemented via pyramid algorithm (Mallat, 1989) that, startingwith the data xt, filters a series using h1 and g1, subsamples both filter outputs to half theiroriginal lengths, keeps the subsampled output from the h1 filter as wavelet coefficients,and then repeats the above filtering operations on the subsampled output from the g1 filter.

For each iteration of pyramid algorithm, we require three objects: the data vector x, thewavelet filter hl, and the scaling filter gl. Assuming those quantities are passed into theprogram, the first iteration of the pyramid algorithm begins by filtering (convolving) thedata with each filter to obtain the following wavelet and scaling coefficients:

w1,t =L−1∑

l=0

hlx2t+1−l mod N and v1,t =L−1∑

l=0

glx2t+1−l mod N ,

35

Page 57: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

where t = 0, 1, · · · , N/2−1 (this may be combined into the same loop for computationalefficiency). Note that the downsampling operation has been included in the filtering stepthrough the subscript of xt. The N length vector of observations has been high-pass andlow-pass filtered to obtain N/2 coefficients associated with this information. The secondstep of the pyramid algorithm starts by defining the "data" to be the scaling coefficientsv1 from the first iteration and apply the filtering operations as above to obtain the secondlevel of wavelet and scaling coefficients :

w2,t =L−1∑

l=0

hlv1,2t+1−l mod N and v2,t =L−1∑

l=0

glv1,2t+1−l mod N ,

t = 0, 1, · · · , N/4 − 1. Keeping all vectors of wavelet coefficients, and the final level ofscaling coefficients, we have the following length N decomposition w = [w1,w2,v2]

T .After the third iteration of the pyramid algorithm, where we apply filtering operations tov2, the decomposition now looks like w = [w1,w2,w3,v3]

T . The procedure may berepeated up to J times where J = log2(N) and gives the vector of wavelet coefficients inEquations (3.11).

Inverting the DWT is achieved through upsampling the final level of wavelet and scal-ing coefficients, convolving them with their respective filters (wavelet for wavelet andscaling for scaling) and adding up to the two filtered vectors. Starting with the final levelof the DWT, upsampling the vectors wJ and vJ will result in two new vectors:

w0J = [0 wJ,0]

T and v0J = [0 vJ,0]

T .

The level J − 1 vector of scaling coefficients vJ−1 is given by

vJ−1,t =L−1∑

l=0

hlw0J,t+l mod 2 +

L−1∑

l=0

glv0J,t+l mod 2,

t = 0, 1. Notice that the length of vJ−1 is twice that of vJ , as to be expected. The nextstep of reconstruction involves upsampling to produce

w0J−1 = [0 wJ−1,0 0 wJ−1,1]

T and v0J−1 = [0 vJ−1,0 0 vJ−1,1]

T ,

and the level J − 2 vector of scaling coefficients vJ−2 is given by

vJ−2,t =L−1∑

l=0

hlw0J−1,t+l mod 4 +

L−1∑

l=0

glv0J−1,t+l mod 4,

t = 0, 1, 2, 3. This procedure may be repeated until the first level of wavelet and scalingcoefficients have been upsampled and combined to produce the original vector of obser-vations; that is,

xt =L−1∑

l=0

hlw01,t+l mod N +

L−1∑

l=0

glv01,t+l mod N ,

t = 0, 1, · · · , N − 1.

36

Page 58: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

3.4.2 Multiresolution AnalysisUsing the DWT, we may formulate an additive decomposition of a series of observations.Let dj = WT

j wj for j = 1, · · · , J , define the j-th level wavelet detail associated withchanges in x at scale λj . The wavelet coefficients wj = Wjx represent the portion of thewavelet synthesis attributable to scale λj . For a length N = 2J vector of observations, thefinal wavelet detail dJ+1 = vT

J vJ is equal to the sample mean of the observations.

A multiresolution analysis (MRA) may now be defined via

xt =J+1∑j=1

dj,t, t = 0, · · · , N − 1. (3.12)

That is, for each observation xt is a linear combination of wavelet detail coefficients.

Let sj =J+1∑

k=j+1

dk define the j-th level wavelet smooth for 0 ≤ j ≤ J , where sJ+1 is

defined to be a vector of zeros. Whereas the wavelet detail dj is associated with variationsat a particular scale, sj , is a cumulative sum of these variations and will be smoother

and smoother as j increases. In fact x − sj =

j∑

k=1

dj so that only lower-scale details

(high-frequency features) will be apparent. The jth level wavelet rough characterizes the

remaining lower-scale details through rj =

j∑

k=1

dk for 1 ≤ j ≤ J +1, where r0 is defined

to be a vector of zeros. A vector of observations may thus be decomposed through awavelet smooth and rough via

x = sj + rj,

for all j. The terminology "detail" and "smooth" were used by Percival and Walden (2000)to describe additive decompositions from Fourier and wavelet transforms.

3.5 Maximal Overlap Discrete Wavelet Transform (MODWT)An alternative wavelet transform–the maximal overlap discrete wavelet transform (MODWT)–is computed without subsampling the filtered output. The MODWT gives up orthogonal-ity in order to gain features the DWT does not possess. A consequence of this is that thewavelet and scaling coefficients must be rescaled in order to retain the variance preservingproperty of the DWT.

Although not an orthonormal transform, the MODWT has several advantages over theDWT such as translation invariance, approximation of a zero-phase filtering operationand easy computation for any sample size (Percival and Walden, 2000). The MODWTmay be interpreted as applying the rescaled wavelet filters of the DWT to the vector butnot decimating (downsampling) the output after filtering.

37

Page 59: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

The following properties are important in distinguishing the MODWT from the DWT(Percival and Mofjeld, 1997):

1. The MODWT can handle any sample size N , while the Jp-th order partial DWTrestricts the sample size to a multiple of 2Jp .

2. The detail and smooth coefficients of a MODWT multiresolution analysis are as-sociated with zero phase filters. This means events that feature in the original timeseries may be properly aligned with features in the multiresolution analysis.

3. The MODWT is invariant to circularly shifting the original time series. Hence,shifting the time series by an integer unit will shift the MODWT wavelet and scalingcoefficients the same amount. This property does not hold for the DWT.

4. While both the DWT and MODWT can perform an analysis of variance on a timeseries, the MODWT wavelet variance estimator is asymptotically more efficientthan the same estimator based on the DWT (Percival, 1995).

The MODWT goes by several names in the statistical and engineering literature, such as,the "stationary DWT" (Nason and Silverman, 1995), "translation invariant DWT" (Coif-man and Donoho, 1995; Liang and Parks, 1996), and "time-invariant DWT" (Pesquet etal., 1996).

3.5.1 Definition and Implementation of MODWTLet x be an arbitrary length N vector of observations. The length (J + 1)N vector ofMODWT coefficients w is a (J + 1)N ×N matrix defining the MODWT. The vector ofMODWT coefficients may be organized into J + 1 vectors via

w = [w1, w2, · · · , wJ , vJ ]T , (3.13)

where wj is a length N/2j vector coefficients associated with changes on a scale of lengthλj = 2j−1 and vJ is a length N/2J vector of scaling coefficients associated with averageson a scale of length 2J = 2λJ , just as with the DWT. For time series of dyadic length (N =2J), the MODWT may be subsampled and rescaled to obtain DWT wavelet coefficientsvia

wj,t = 2j/2wj,2j(t+1)−1, t = 0, · · · , N/2j − 1,

and DWT scaling coefficients via

vJ,t = 2J/2vJ,2J (t+1)−1, t = 0, · · · , N/2J − 1.

3.5.2 Multiresolution AnalysisAn analogous MRA to that of the DWT may be performed utilizing the MODWT via

xt =J+1∑j=1

dj,t, t = 0, · · · , N − 1,

38

Page 60: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

where dj,t is the t-th element of dj = WTj wj for j = 1, · · · , J . We may also define

respectively the MODWT-based wavelet smooths and roughs to be

sJ,t =J+1∑

k=j+1

dk,t and rj,t =

j∑

k=1

dk,t, t = 0, · · · , N − 1.

A key feature of an MRA using the MODWT is that the wavelet details and smoothare associated with zero-phase filters. Thus, interesting features in the wavelet detailsand smooth may be perfectly aligned with the original time series. This attribute is notavailable through the DWT since it subsamples the output of its filtering operations.

3.6 Discrete Wavelet Packet Transform (DWPT)The DWT has a very specific band-pass filtering structure that partitions the spectrum oflong memory process finer and finer as f → 0 (i.e. where the spectrum is unbounded)which is described in detail in the next section. This is done through a succession offiltering and downsampling operations. In order to exploit the approximate decorrelationproperty for long memory processes with seasonality, we need to generalize the partitionscheme of the DWT.

This is easily obtained by performing the discrete wavelet packet transform (DWPT) onthe process, see for example, Wickerhauser (1994) and Percival and Walden (2000). In-stead of one particular filtering sequence, the DWPT executes all possible filtering combi-nations to obtain a wavelet packet tree, denoted by T . Let T = (j, n)|j = 0, · · · , J ; n =0, · · · , 2j−1 be the collection of all doublets (j, n) that form the indices of the nodes of awavelet packet tree. An orthonormal basis B ⊂ T is obtained when a collection of DWPTcoefficients is chosen, whose ideal band-pass frequencies are disjoint and cover [0, 1/2].The set B is simply a collection of doublets (j, n) that corresponds to an orthonormal ba-sis. Ramsey and Zhang (1996) used a similar, but more extensive, wave-form dictionaryto analyze the Standard and Poor’s 500 stock index. They found that this decompositionbrought out the intermittent nature of the stock market index, that of intense bursts ofactivity across a wide frequency range followed by periods of relative quiet.

Let x be a dyadic length vector (N = 2J) of observations, then the length N vectorof DWPT coefficients wB is obtained via

wB = WBx,

where WB is an N × N orthonormal matrix defined by the orthonormal basis B. All(J + 1)N wavelet packet coefficients may be computed by constructing an overcompletematrix WT and applying it to the vector of observations; that is, wT = WT x, where WTis an (J + 1)N ×N matrix.

Constructing the matrices WB or WT involves a minor amount of book-keeping in or-der to retain the sequency ordering of the wavelet packet filters. Let h0, · · · , hL−1 be the

39

Page 61: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

unit scale wavelet (high-pass) filter coefficients from a Daubechies compactly supportedwavelet family of even length L, with scaling (low-pass) coefficients computed via thequadrature mirror relationship. Now we define

un,l =

gl, if n mod 4 = 0 or 3;hl, if n mod 4 = 1 or 2,

to be the appropriate filter at a given node of the wavelet packet tree. The wavelet packetfilters un,l preserve the ordering of the DWPT by increasing frequency.

Let h0, · · · , hL−1 be the unit scale wavelet (high-pass) filter. Thus, the scaling (low-pass)coefficients may be computed via the quadrature mirror relationship:

gl = (−1)l+1hL−l−1, l = 0, 1, · · · , L− 1.

Instead of one particular filtering sequence, the DWPT executes all possible filtering com-binations to construct a wavelet packet tree, denoted by T = (j, n)|j = 0, · · · , J−1; n =0, · · · , 2j − 1.

The DWPT coefficients are then calculated using the pyramid algorithm of filtering anddownsampling (Mallat, 1999). Denote Wj, n, K the K−th element of length Nj(= N/2j),corresponding to the wavelet coefficient vector Wj, n, (j, n) ∈ T with W0, 0 = x. Giventhe DWPT coefficients Wj−1, [n

2], K , where [ · ] represents the "integer part" operator, then

the coefficient Wj, n, K is calculated by

Wj, n, K ≡Lj−1∑

l=0

un, lWj−1, [n2], 2K+1−l mod Nj−1

, K = 0, 1, · · · , Nj − 1, (3.14)

where Lj = (2j − 1)(L− 1) + 1 is the length of a level j wavelet filter.

An adaptive orthonormal basis B ⊂ T is obtained when a collection of DWPT coeffi-cients is retained such that band-pass frequencies are disjoint and cover the frequency

interval [0,1

2] (Percival and Walden, 2000; Gençay et al., 2001).

As with the DWT, the DWPT is most efficiently computed using a pyramid algorithm(Wickerhauser, 1994). The algorithm has O(N log N) operations, like the fast Fouriertransform (FFT).

There is no longer a convenient interpretation of wavelet packet coefficient vectors withdifferences at various scales. Instead, the vector wj,n is associated with the frequency

interval λj,n = [n

2j+1,n + 1

2j+1]. The DWPT coefficient vectors corresponding to the DWT

coefficients are given by wj,1 at each scale.

40

Page 62: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

3.7 Maximal Overlap Discrete Wavelet Packet Transform(MODWPT)

Definition of the maximal overlap DWPT (MODWPT) is straightforward and follows di-rectly from the DWPT. The rescaled wavelet packet filter is defined to be un,l = un,l/2

1/2.Using un,l instead of the filter un,l and not subsampling the filter output produces theMODWPT coefficients. Hence the vector of MODWPT coefficients wj,n is computedrecursively given wj−1, n

2via

wj,n,t =

Lj−1∑

l=0

un,lwj−1,[n2],t−2j−1l mod N , t = 0, 1, · · · , N − 1.

Each vector of MODWPT coefficients has length N (to begin the recursion simply definew0,0 = x). This formulation leads to the efficient computation using a pyramid-typealgorithm (Percival and Walden, 2000). As with the DWPT, the MODWPT is an energypreserving transform and we may define the decomposition of energy at each level of thetransform via

‖ x ‖2=2j−1∑n=0

‖ wj,n ‖2, for j = 1, · · · , J.

This corresponds to the basis Bj = (j, n)|n = 0, · · · , 2j − 1. Given a particular levelj of the transform, we may also reconstruct x by projecting the MODWPT coefficientsback onto their rescaled filter coefficients via

xt =2j−1∑n=0

Lj−1∑

l=0

uj,n,lwj,n,t+l modN , t = 0, · · · , N − 1, (3.15)

where Lj = (2j − 1)(L− 1) + 1 is the length of a level j wavelet filter.

Let dj,n = (dj,n,0, dj,n,1,··· ,dj,n,N−1) be the MODWPT detail associated with the frequency

interval λj,n, then the t−th element of dj,n is given by

dj,n,t =

Lj−1∑

l=0

uj,n,lwj,n,t+l mod N , t = 0, · · · , N − 1,

and an additive decomposition in Equation (3.15) may be rewritten as

x =∑

(j,n)∈Bdj,n

for any orthonormal basis B. These details are associated with zero-phase filters, just likethe MODWT, and therefore line up perfectly with the features in the original time seriesx at the same time.

41

Page 63: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

Chapter 4

Estimation Methods for StationaryLong Memory Processes: A Review

Long memory models have been used by several authors to model data with persistentautocorrelations. The earlier model was introduced by Mandelbrot and Van Ness (1968)and Mandelbrot (1971) to formalize Hurst’s empirical findings using cumulative river flow

data. Let Yt =

∫ t

∞(t − s)H− 1

2 dB(s), where B(s) is Brownian motion and H ∈ (0, 1).

Then Xt = Yt − Yt−1 is a simple fractional Gaussian noise, which is designed to accountfor the long term behavior of time series. The second long memory model was proposedby Granger and Joyeux (1980) and Hosking (1981) – the fractional integrated autoregres-sive moving average (ARFIMA) models with an infinite peak in the spectrum at f = 0.Geweke and Porter-Hudak (1983) proved that the definitions of fractional Gaussian noiseand integrated (or fractionally differenced) series are equivalent.

A simple generalization of the ARFIMA model is the Gegenbauer process (Gray et al.,1989) or a seasonal persistent process (Andel, 1986). More generally, Woodward etal. (1998) proposed the k-factor Gegenbauer and k-factor Gegenbauer ARMA process,which can model long term periodic behavior associated with several peaks in frequencies

in [0,1

2].

In order to allow for different persistence parameters across different frequencies, weconsider the general class of fractionally integrated zero mean processes (Xt)t, referredto as Seasonal and/or Cyclical Long Memory (SCLM henceforth) processes, defined bythe following equation:

F (B)Xt = (I −B)d0

k−1∏i=1

(I − 2νiB + B2)di(I + B)dkXt = εt, (4.1)

where B is the backshift operator. For i = 1, · · · , k − 1, νi = cosλi, λi being any fre-quency between 0 and π. For i = 0, 1, · · · k and di is such that: |di| < 1/2, implyingthus that the spectral density is unbounded at λi. Moreover, (εt)t is an innovation processto be specified. This model has been first discussed by Robinson (1994) in order to test

42

Page 64: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

whether the data stems from a stationary or a non-stationary process, under uncorrelatedand weakly correlated innovations (εt)t.

Without loss of generality, we assume that (Xt)t described by Equation (4.1)is a zeromean process and for the moment we assume that (Xt)t is a stationary process. Theprocess nests all the specific long memory processes introduced from the nineties in theliterature to take both the seasonal and/or cyclical behaviors and the long memory com-ponents of the data into account. From model (4.1), we can derive in a stationary settinga lot of models as follows whose interest in macroeconomics is recognized.

• If di = 0 (i = 1, · · · , k), we get the FI(d) (Fractionally Integrated) process if (εt)t

is a white noise:(I −B)dXt = εt, (4.2)

proposed by Granger and Joyeux (1980) and Hosking (1981). If we assume that(εt)t follows a GARCH noise, we get the FIGARCH model (fractionally integratedand GARCH), see Baillie, Bollerslev and Mikkelsen (1996) or reference. This classof models permits to take into account the existence of an infinite cycle, as well asthe spectral density’s typical shape of macroeconomics data, namely an explosionfor the very low frequencies.

• In order to model a fixed seasonal periodicity s, supposed to be even, we generallyuse the following representation:

(I −Bs)dXt = εt. (4.3)

For instance, if s = 4, the expression (4.3) becomes:

(I −B4)dXt = (I −B)d(I + B2)d(I + B)dXt = εt. (4.4)

This filter was introduced by Porter-Hudak (1990), called the rigid filter and can beconsidered as the particular case of Hassler’s flexible filter (1994): (I − B)d1(I +B2)d2(I + B)d3Xt = εt. It is motivated by factorizing I − B4 according to its unitroots, allowing to model stationary fractional seasonalities. The spectral density ofthe process (4.3) is unbounded for the three seasonal frequencies 0, π/2, π. Thisrepresentation is useful for quarterly data sets (with s = 4) and if we deal withmonthly data we consider the same model using s = 12.

• It may happen that we observe an explosion at the zero frequency as well as at anyfrequency between ]0, π]. This means that an infinite cycle is mixed with anotherseasonality. In that case, we use the following model for (Xt)t:

(I −B)d1(I −Bs)d2Xt = εt. (4.5)

This model was introduced by Porter-Hudak (1990). The parameter d1 correspondsto the persistence associated to the infinite cycle and d2 is the persistence associatedto the fixed seasonality.

43

Page 65: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

• In the presence of explosions at k frequencies in the spectral density, we use a modelcharacterizing these k persistent periodicities. It is the k-factor Gegenbauer processgiven by:

k∏i=1

(I − 2νiB + B2)diXt = εt, (4.6)

with, for i = 1, . . . , k, νi = cos(λi), the frequencies λi = cos−1(νi) being theGegenbauer frequencies or the G- frequencies. When (εt)t is a white noise, thismodel was introduced by Giraitis and Leipus (1995) and Woodward, Cheng andGray (1998). When (εt)t follows a GARCH process, (Xt)t is called a GIGARCHprocess, introduced by Guégan (2000, 2003).

• Inside the spectral density, we can observe k explosions as well as an explosion atthe zero frequency, then the previous model becomes:

(I −B)d

k∏i=1

(I − 2νiB + B2)diXt = εt. (4.7)

There are a number of estimators of a long memory process’ long memory parameterwhen the parameter is assumed to hold constant over the entire data set. We make areview in the following.

4.1 ARFIMA ProcessesA model that has long range dependence and is frequently used in modeling long mem-ory is the fractionally integrated autoregressive moving average (ARFIMA) model. TheARFIMA model succinctly captures the slowly decaying autocovariance function of a

long memory process with fractional difference or long memory parameter d = H − 1

2,

where H is the Hurst parameter. By letting the difference parameter take non-integervalues, the ARFIMA model permits to model complex long-run behavior in a more par-simonious manner.

There are several estimation methods for ARFIMA models, including parametric, semi-parametric (in frequency domain and in time domain), and wavelet methods. Nielsenand Frederiksen (2005) studied and carried out the finite sample comparison of differ-ent estimation methods. They found that among the parametric method, the frequencydomain maximum likelihood procedure is superior with respect to both bias and RMSE.And the bias of parametric time domain procedures is alleviated when larger sample sizesare considered. Among the semiparametric (frequency domain and wavelet) methods, thebias-reduced log-periodogram regression (Andrews and Guggenberger, 2003) and localpolynomial Whittle estimators (Andrews and Sun, 2004) outperform the correctly speci-fied time domain parametric domain methods. Furthermore, when the methods are veryheavily biased owing to contamination from short-run dynamics, these estimators show a

44

Page 66: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

much lower bias at the expense of an increase in the RMSE. Finally, if there is not suf-ficient trimming of scales, the wavelet-based method are heavily biased when short-rundynamics is introduced.

We first briefly describe the ARFIMA model and provide an introduction to the esti-mation methods. We do not present all the mathematical assumptions underlying eachestimation procedure but rather describe the methods and applicability in general and alsobriefly discuss and compare the asymptotic distributions of the various estimators.

Granger and Joyeux (1980) and Hosking (1981) introduced an ARFIMA(p, d, q) pro-cess if its d-th difference is a stationary and invertible ARMA(p, q) process. Here d may

be any real number such that −1

2< d <

1

2(to ensure the stationarity and invertibility).

More precisely, Xt is an ARFIMA(p, d, q) if

Φ(B)(I −B)d(Xt − ν) = Θ(B)εt, (4.8)

where Φ(z) = 1− φ1z− · · · − φpzp and Θ(z) = 1 + θ1z + · · ·+ θqz

q are polynomials oforder p and q, respectively, with roots outside the unit circle, εt is Gaussian white noisewith the variance σ2.

If d > −1

2, the process is invertible and possesses a linear (Wold) representation, and

if d <1

2, it is covariance stationary. If d = 0, the spectral density is bounded at the origin

and the process has only weak dependence (short memory). Furthermore, the parameterd determines the (long) memory of the process. If d > 0, the process is said to havelong memory, since the autocorrelations die out at a hyperbolic rate (and indeed are nolonger absolutely summable), in contrast to the much faster exponential rate in the weakdependence case, whereas if d < 0, the process is said to be antipersistent (Mandelbrot,

1982) and has mostly negative autocorrelations. The case 0 ≤ d ≤ 1

2has been proved

particularly relevant for many applications in finance and economics.

The autocorrelation function of the process (4.8) satisfies

ρk ∼ cρk2d−1, 0 < cρ < ∞ as k →∞, (4.9)

which decays at a hyperbolic rate, see Granger and Joyeux (1980) and Hosking (1981).Equivalently, the behavior of the autocorrelations at large lags can be stated in the fre-quency domain at small frequencies.

Thus, defining the spectral density function of Xt, fX(λ), as

γk =

∫ π

−π

fX(λ)eiλkdλ, (4.10)

45

Page 67: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

where γk is the k-th autocovariance of Xt, it can be shown that the spectral density of theARFIMA(p, d, q) process (4.8) is given by

fX(λ) =σ2

2π|1− eiλ|−2d |Θ(eiλ)|2

|Φ(eiλ)|2 =σ2

2π(2 sin λ/2)−2d |Θ(eiλ)|2

|Φ(eiλ)|2 . (4.11)

Thus the approximation (4.9) can be restated in the frequency domain as follows (seeGranger and Joyeux (1980), Hosking (1981), or Beran (1994))

fX(λ) ∼ g|λ|−2d, 0 < g < ∞, as λ → 0. (4.12)

4.1.1 Parametric EstimatorsFour different parametric maximum likelihood estimators (MLE) are described in the fol-lowing: the exact time domain MLE, the modified profile likelihood estimator, the con-ditional time domain MLE, and the frequency domain MLE. The time domain estimatorsare based on the likelihood function of the ARFIMA(p,d,q) model with or without condi-tioning on initial observations, and the frequency domain estimator is based on Whittle’sapproximation to the likelihood function in the frequency domain.

Maximum Likelihood in the Time Domain

As for the estimation of the factional differencing parameter in ARFIMA model, in timedomain, the often used maximum likelihood method are the exact Gaussian maximumlikelihood estimator (Sowell, 1992b), the modified likelihood estimator (Cox and Reid,1987; An and Bloomfield, 1993), the conditional maximum likelihood estimator or con-ditional sum of squares estimator (Chung et Baillie, 1993; Beran, 1995; Tanaka, 1999;Nielsen, 2004), etc.

The exact Gaussian maximum likelihood objective function for the model (4.8) is (when

−1

2< d <

1

2):

LE(d, φ, θ, σ2, µ) = −T

2log |Ω| − 1

2(Y − µl)′Ω−1(Y − µl), (4.13)

where l = (1, · · · , 1)′, X = (X1, · · · , XT )′, φ and θ are the parameters of Φ(B) andΘ(B), µ is the mean of X , and Ω is the variance matrix of X , which is a complicatedfunction of d and the remaining parameters of the model. Sowell (1992a) derived an ef-ficient procedure for solving this function in terms of hypergeometric functions, but animportant limitation is that the roots of the autoregressive polynomial cannot be multiple.

Gathering the parameters in the vector γ = (d, φ, θ, σ2, µ)′, the exact maximum like-lihood (EML) estimator is obtained by maximizing the likelihood function (4.13) withrespect to γ. Sowell (1992a) showed that the EML estimator of d is

√T -consistent and

asymptotically normal, i.e.,√

T (dEML − d) →d N(0, (π2/6− C)−1), (4.14)

46

Page 68: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

where C = 0 when p = q = 0 and C > 0 otherwise. The variance of the EML estimatormay be derived as the (1, 1) element of the inverse of the matrix

1

∫ 2π

0

∂ log fX(λ)

∂γ

∂ log fX(λ)

∂λdλ.

Although the time and frequency domain maximum likelihood estimators are asymptoti-cally equivalent, their finite sample properties differ. Sowell (1992b) showed that the timedomain estimator has better finite sample properties than the frequency domain estimatorwhen the mean of the process is known. However, Cheung and Diebold (1994) showedthat the finite sample efficiency of the discrete Whittle frequency domain MLE relative totime domain EML rised dramatically when the mean is unknown and has to be estimated.

The modified profile likelihood (MPL) estimator is based on a correction of the param-eters of interest (here d, Φ, Θ) for the second-order effects due to nuisance parameters(here σ2, µ). Thus the idea is to reduce the bias by applying a transformation that makes(d, Φ, Θ) orthogonal to (σ2, µ), see Cox and Reid (1987) and An and Bloomfield (1993).The modified profile log-likelihood function is given as (without constants)

LM(d, φ, θ; µ) = −(1

2− 1

T) log |R|−1

2log(l′R−1l)−(

T − 3

2) log[T−1(X−µl)′R−1(X−µl)],

(4.15)where R = Ω/σ2 and µ = (l′R−1l)−1l′R−1X . The asymptotic distribution of the MPLestimator is unchanged compared to the EML estimator on which it is based, and hence italso satisfies the asymptotic distribution (4.14).

Imposing the initialization Xt = 0(t ≤ 0), the model (4.8) is valid for any value of dand is a type II fractional process in the terminology of Marinucci and Robinson (1999).The objective function considered by Chung and Baillie (1993), Beran (1995), Tanaka(1999), and Nielsen (2004) is

Lc(d, φ, θ, µ) = −T

2log[

T∑t=1

(Φ(B)

Θ(B)(I −B)d(Xt − µ))2], (4.16)

and we call the estimator that maximizes Equation (4.16) the conditional maximum like-lihood (CML) estimator. Maximizing LC is equivalent to minimizing the usual (condi-tional) sum of squares, and hence this estimator is also referred to as the CSS estimatorby some authors, e.g., Chung and Baillie (1993) and Beran (1995). The CML estimatorhas the same asymptotic distribution (4.14) as the EML estimator for any value of d andis computationally much less demanding.

The parametric estimators are asymptotically efficient in the classical sense when themodel is Gaussian and correctly specified.

Maximum Likelihood in the Frequency Domain

An alternative approximate MLE of the ARFIMA(p, d, q) model follows the idea ofWhittle (1951), who noted that for stationary models the covariance matrix Ω can be

47

Page 69: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

diagonalized by transforming the model into the frequency domain. Fox and Taqqu (1986)

showed that (when d ∈ (−1

2,1

2)) the log likelihood can be approximated by

LF (d, φ, θ, σ2) = −[T/2]∑j=1

[log fX(λj) +I(λj)

fX(λj)], (4.17)

where λj = 2πj/T are the Fourier frequencies, I(λ) =1

2πT|

T∑t=1

Xteitλ|2 is the peri-

odogram of Xt, fX(λ) is the spectral density of Xt given in Equation (4.11).

The approximate frequency domain maximum likelihood (FML) estimator is definedas the maximizer of Equation (4.17) and was proposed by Fox and Taqqu (1986), whoalso proposed a continuously integrated version of (4.17). Dahlhaus (1989) also assumedGaussianity and considered the exact likelihood function in the frequency domain. Notethat the FML estimator is invariant to the presence of a nonzero mean, i.e., µ 6= 0, sincej = 0 (the zero frequency) is left out of the summation in (4.17). The FML estimatorhas the same asymptotic normality, i.e.,

√T -consistency and asymptotic normality, and

when the process is Gaussian, asymptotic efficiency. Finally, Giraitis and Surgailis (1990)relaxed the Gaussianity assumption and analyze the Whittle estimate for linear processes,showing that it is

√T -consistent and asymptotically normal but no longer efficient, while

Hosoya (1997) extended the previous analysis to multivariate framework.

4.1.2 Semiparametric EstimatorsAfter the seminal papers of Granger and Joyeux (1980) and Hosking (1981), fractionalintegration processes (FI(d)) have attracted the attention of many statisticians and econo-metricians. These long-range dependence processes give more flexibility to empiricalresearch than the classical FI(0) and FI(1) classes of processes. For 0 < d < 1/2, theyare stationary with hyperbolic decay of the autocorrelation function and they exhibit longmemory or long-range dependence. To estimate d, we usually use the semiparametricmethods developed by Geweke and Porter-Hudak (1983) (henceforth referred to GPH).Agiakloglou et al. (1993) showed that this estimator has a large bias. Reisen (1994),Robinson (1994, 1995a,b) and Lobato and Robinson (1996) give some estimators with asmall bias.

The semiparametric frequency domain estimators are based on the approximation (4.12)to the spectral density. Two classes of semiparametric estimators have become very pop-ular in empirical work, the log-periodogram regression method suggested by Geweke andPorter-Hudak (1983) and the local Whittle approach suggested by Künsch (1987). Someearlier work on the (adjusted) rescaled range, or "R/S" statistic, by Hurst (1951) andMandelbrot and Wallis (1969), or its modified version allows for weak dependence by Lo(1991).

The semiparametric estimators enjoy robustness to short-run dynamics, since they use

48

Page 70: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

only information from the periodogram ordinates in the vicinity of the origin. Indeed,the short-run dynamics in the model, i.e., the autoregressive and moving average polyno-mials Φ(·) and Θ(·) in ARFIMA model (4.8), does not even have to be specified. Thedrawback is that only

√m-consistency is achieved, where m = m(T ) is a user-chosen

bandwidth parameter, in comparison to√

T -consistency (and efficiency) in the parametriccase. Thus the semiparametric approach is much less efficient than the parametric one,sine it requires at least

m

T→ 0.

Log-periodogram Regression

Probably the most commonly applied semiparametric estimator is the log-periodogramregression (LPR) estimator introduced by Geweke and Porter-Hudak (1983) and analyzedin detail by Robinson (1995b). Taking logs in Equation (4.12) and inserting sample quan-tities, we get the approximate regression relationship

log(I(λj)) = constant− 2d log(λj) + error. (4.18)

The LPR estimator is defined as the OLS estimator in the regression (4.18) using j =1, · · · ,m, where m = m(T ) is a bandwidth number which tends to infinity as T → ∞but at a slower rate than T . Note that the estimator is invariant to a non-zero mean, sincej = 0 is left out of the regression.

Under suitable regularity conditions, including Xt being Gaussian (later relaxed by Ve-lasco (2000)) and a restriction on the bandwidth, Robinson (1995b) derived the asymp-totically normal limit distribution for the LPR estimator when d is in the stationary and

invertible range (−1

2,1

2). The proof by Robinson (1995b) also employed trimming of the

very low frequencies as suggested by Künsch (1986), but following recent research, e.g.,Hurvich et al. (1998) and the original suggestion of Geweke ad Porter-Hudak (1983),the trimming is not necessary and has been largely ignored in empirical work. Kimand Phillips (1999) and Velasco (1999b) demonstrated that the range of consistency is

d ∈ (−1

2, 1] and the range of asymptotic normality is d ∈ (−1

2,3

4). The limiting distribu-

tion of the LPR estimator for d ∈ (−1

2,1

2) is given by Robinson (1995a) as

√m(dR − d) →d N(0,

π2

24). (4.19)

There exist also some other estimators based on LPR estimator, the bias reduced log-periodogram regression (BRLPR) estimator by Agiakloglou et al. (1993) and Andrewsand Gugggenberger (2003) and the pooled log-periodogram regression (PLPR) estimatorby Shimotsu and Phillips (2002); see Nielsen and Frederisksen (2005) for more details.

Local Whittle Approach

The other class of semiparametric frequency domain estimators we consider follows thelocal Whittle approach suggestd by Künsch (1987). The local Whittle (LW) estimator was

49

Page 71: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

analyzed by Robinson (1995a) (who called it a Gaussian semiparametric estimator) andis attractive because of its likelihood interpretation, nice asymptotic properties and verymild assumptions. The LW estimator is defined as the maximizer of the (local Whittlelikelihood) function

Q(g, d) = − 1

m

m∑j=1

[log(gλ−2dj ) +

I(λj)

gλ−2dj

]. (4.20)

One drawback compared to log-periodogram estimation is that numerical optimization isneeded. However, the assumptions underlying this estimator are weaker than those of the

LPR estimator, and Robinson (1995a) showed that when d ∈ (−1

2,1

2),

√m(dLW − d) →d N(0,

1

4). (4.21)

Thus the asymptotic distribution is extremely simple, facilitating easy asymptotic infer-ence, and in particular the estimator is more efficient than the LPR estimator. The rangesof consistency and asymptotic normality for the LW estimator have been shown by Ve-lasco (1999a) and by Phillips and Shimotsu (2004) to be the same as those of the LPRestimator.

An exact local Whittle (ELW) estimator has been proposed by Shimotsu and Phillips(2005) that avoids some of the approximations in the deviation of the LW estimator andis valid for any value of d. The ELW estimator replaces the objective function (4.20) bythe function

QE(g, d) = − 1

m

m∑j=1

[log(gλ−2dj ) +

I∆dX(λj)

g], (4.22)

where I∆dX(λ) =1

2πT|

T∑t=1

(∆dXt)eitλ|2 is the periodogram of ∆dXt. The ELW estima-

tor satisfies (4.21) for any value of d and is thus not confined to any particular range of dvalues, but it is confined to zero-mean processes.

In addition, Shimotsu (2002) proposed the feasible ELW (FELW) estimator. Andrewsand Sun (2004) proposed a generalization of the local Whittle estimator–the local poly-

nomial Whittle (LPW) estimator of d for d ∈ (−1

2,1

2).

For both the log-periodogram regression method and the local Whittle approach, thechoice of bandwidth parameter m is very important. Results on optimal (mean squarederror minimizing) choice of bandwidth for the log-periodogram regression have been de-rived by Hurvich et al. (1998), and results for the local Whittle approach have beenderived by Henry and Robinson (1996). In both cases, the optimal bandwidth is found tobe a multiple of T 0.8, where the multiplicative constant depends on the smoothness of thespectral density near the origin, i.e., on the short-run dynamics of the process.

50

Page 72: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

4.1.3 Wavelet EstimatorsIn the latter part of the twenties century, several disciplines came to the realization of thenaturally occurring phenomena (river flow, atmospheric patterns, telecommunications,astronomy, financial markets, etc.) which exhibit correlations that do not decay at a suffi-ciently fast rate. This means that observations separated by long periods of time still ex-hibit significant correlation. These time series are known as long memory or long rangedependent processes and require different approaches to modeling than short memorytime series. At first, Fourier-based methods dominated the literature in terms of identi-fying and fitting models to long memory processes. Both least squares and maximumlikelihood procedures have been established for estimating the model parameters in thecase of long range dependence. By generalizing the concept of a long memory process toa long memory process with seasonality, one may apply these estimation procedures to amuch broader class of time series models.

Wavelets have shown great promise in handling long memory and a combination of short-memory and long-memory processes. By performing a wavelet decomposition to an ob-served time series, one makes an implicit assumption about the underlying nature of theprocess. Specifically, there is a hierarchical structure present in the data so that informa-tion in the time series is evolving at different time horizons (scales) and different magni-tudes. When decomposed appropriately, one may more easily view the individual factorsthat make up the (potentially) complicated process. More precisely, the correlation matrixof the decomposition is essentially block diagonal. Large values on the diagonal repre-sent the information at a particular level of the decomposition and near zero values on theoff-diagonal denote little interaction between elements. This concept was discussed byBrock (2000).

By design, the wavelet’s strength rests in its ability to localize simultaneously processes intime and scale. At high scales, the wavelet has a small centralized time support enabling itto focus on short-lived time phenomena like a singularity point. At low scales, the wavelethas a large time support allowing it to identify long periodic behavior. By moving fromlow to high scales, the wavelet zooms in on the behavior of a process at a particular pointin time, identifying singularities, jumps, and cusps. Alternatively, the wavelet can zoomout to reveal the long, smooth features of a series. In practice, the Haar and Daubechies(1988) wavelets are most commonly applied in the literature.

In the study of the estimation of long memory parameters, there exists mainly two meth-ods through wavelet techniques, the ordinary least squares (OLS) method and the maxi-mum likelihood method. These methods are applicable due to the ability of the DWT todecorrelate long memory processes.

For the fractional integrated ARMA (ARFIMA) model introduced by Granger and Joyeux(1980) and Hosking (1981), an explosion can be found at the zero frequency in its spec-trum. Using the expression of the spectral density of the ARFIMA process, Geweke andPorter-Hudak (1983) proposed a semiparametric method using the log regression on the

51

Page 73: Analysis of stationary and non-stationary long memory ... · memory processes and the non-stationary long memory processes. We are devoted to the study of their probabilistic properties,

periodogram in order to estimate the long memory parameter d. We refer to this estimatorthe GPH estimator. Thereafter, several improvements have been suggested such as usingan alternative spectral estimator. In the wavelet framework, Jensen (1999) introduced awavelet-based estimator of d using the OLS regression. The main technique used here isthat the wavelet coefficients’ variance is an approximate estimation of the spectrum. Thuswe can obtain a log linear regression between the wavelet coefficients’ variance and thescale. This approach is known for its simplicity in operation and application, but the largevariance of the estimator posed a problem. Thus we need to pay attention for the adoptionof this method.

The other method often used by practitioners is wavelet-based approximate maximum likelihood estimation, which has been investigated by McCoy and Walden (1996) and Jensen (1999a, 2000). This method produces much smaller mean squared errors than the wavelet-based OLS method. To avoid the complexity of computing the exact likelihood, a wavelet-based approximate maximum likelihood estimator is obtained by replacing the covariance matrix of the process with an approximation based on the DWT.

Let $X_t$ be a stationary and invertible FI(d) ($-\frac{1}{2} < d < \frac{1}{2}$) process with zero mean whose $d$-th order backward difference is as follows:
$$(I - B)^d X_t = \sum_{k=0}^{\infty} \binom{d}{k} (-1)^k X_{t-k} = \varepsilon_t, \qquad (4.23)$$
where $\binom{d}{k} = \frac{d!}{k!(d-k)!}$ and $\varepsilon_t$ is a white noise process with variance $\sigma^2_\varepsilon$.

The autocovariance sequence (ACVS) of $X_t$ is defined to be
$$\gamma_\tau = E(X_t X_{t-\tau}) = \frac{\sigma^2_\varepsilon (-1)^\tau \Gamma(1-2d)}{\Gamma(1+\tau-d)\,\Gamma(1-\tau-d)},$$
which means the variance is given by
$$Var(X_t) = \gamma_0 = \frac{\sigma^2_\varepsilon \Gamma(1-2d)}{[\Gamma(1-d)]^2}.$$

The spectral density function (SDF) of $X_t$ is
$$S_X(f) = \frac{\sigma^2_\varepsilon}{|2\sin(\pi f)|^{2d}}, \quad \text{for } -\frac{1}{2} < f < \frac{1}{2}. \qquad (4.24)$$
Consequently, $S_X(f) \propto f^{-2d}$ approximately as $f \to 0$, and thus the SDF is approximately linear on the log scale. When $0 < d < \frac{1}{2}$, the SDF has an asymptote at frequency zero, in which case the process exhibits slowly decaying autocovariances and provides a simple example of a long memory process.


We explore the output of the DWT when applied to an FI(d) process. Emphasis will be on the spectral properties of the DWT coefficients. The ability of the DWT to decorrelate time series such as the FI(d) process, producing wavelet coefficients for a given scale which are approximately uncorrelated, is well known; see, for example, Tewfik and Kim (1992), McCoy and Walden (1996) and Wornell (1996). The DWT produces wavelet coefficients with flat spectra, which therefore exhibit little correlation within scales, and there is also very little cross-correlation between scales. This property is used to simulate FI(d) processes and to estimate the long memory parameter d in practice. Ramsey (1998) demonstrated this fact empirically to validate previous results with respect to performing regressions at different time scales.

There is an important property worth mentioning. The band-pass variance $B_j$ for an FI(d) process, with spectrum given in Equation (4.24), in the frequency interval $[-2^{-j}, -2^{-j-1}) \cup (2^{-j-1}, 2^{-j}]$ is
$$B_j = 2 \cdot 4^{-d} \int_{1/2^{j+1}}^{1/2^j} \frac{\sigma^2_\varepsilon}{|\sin(\pi f)|^{2d}}\, df. \qquad (4.25)$$
Replace the true SDF on each frequency interval with a constant $S_j = S_j(f)$, for all $f$, such that the band-pass variances are equal. This step assumes the SDF is slowly varying across the frequency interval. Integrating the constant spectrum over $[-2^{-j}, -2^{-j-1}) \cup (2^{-j-1}, 2^{-j}]$ gives
$$\int_{1/2^{j+1}}^{1/2^j} S_j \, df = S_j 2^{-j-1}.$$
Equating this to the band-pass variance gives
$$2 S_j 2^{-j-1} = B_j \implies S_j = 2^j B_j. \qquad (4.26)$$
The variance of the wavelet coefficients $w_{j,t}$ is therefore given by $S_j$ through the band-pass nature of the DWT.
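As a concrete illustration, the following minimal Python sketch (all function names are our own, not from the text) computes the band-pass variance $B_j$ of Equation (4.25) by numerical integration of the FI(d) SDF (4.24), and the flat level $S_j = 2^j B_j$ of Equation (4.26):

```python
import numpy as np
from scipy.integrate import quad

def fi_sdf(f, d, sigma2_eps=1.0):
    """SDF of an FI(d) process, Equation (4.24)."""
    return sigma2_eps / np.abs(2.0 * np.sin(np.pi * f)) ** (2.0 * d)

def bandpass_variance(j, d, sigma2_eps=1.0):
    """B_j of Equation (4.25): twice the SDF integral over (2^{-j-1}, 2^{-j}]."""
    val, _ = quad(fi_sdf, 2.0 ** -(j + 1), 2.0 ** -j, args=(d, sigma2_eps))
    return 2.0 * val

def flat_level(j, d, sigma2_eps=1.0):
    """S_j = 2^j B_j, the constant spectrum level of Equation (4.26),
    i.e. the approximate variance of the level-j DWT coefficients."""
    return 2.0 ** j * bandpass_variance(j, d, sigma2_eps)

if __name__ == "__main__":
    for j in range(1, 7):            # for d > 0, S_j grows roughly like 2^(2dj)
        print(j, round(flat_level(j, d=0.3), 4))
```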

A popular time-domain technique for simulating time series is based on the Levinson-Durbin recursions (Hosking, 1984). This time-domain method is exact, but the Levinson-Durbin recursions require $O(N^2)$ operations and become unwieldy for larger sample sizes. Davies and Harte (1987) described a method for simulating certain stationary Gaussian time series of length $N$ with known autocovariances $\gamma_0, \gamma_1, \cdots, \gamma_{N-1}$. The method is based on the Fourier transform and is described in Chan and Wood (1994) and Beran (1994). This method is exact for short memory processes, like the typical ARMA time series models, but is approximate when used to generate FI(d) processes (Percival, 1993). Even so, the Fourier-based method is very efficient and produces realizations of FI(d) processes with good statistical properties. Recently, Parke (1999) proposed a simulation procedure representing the FI(d) process via an error duration model. A DWT-based method for generating realizations of FI(d) processes was proposed by McCoy and Walden (1996).
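A sketch of the Davies-Harte (circulant embedding) method for FI(d) is given below; it is our own transcription of the standard algorithm, with the closed-form autocovariances obtained from the ACVS above through the usual ratio recursion $\gamma_\tau = \gamma_{\tau-1}(\tau-1+d)/(\tau-d)$.

```python
import numpy as np
from scipy.special import gammaln

def fi_acvs(N, d, sigma2_eps=1.0):
    """Autocovariances gamma_0, ..., gamma_N of an FI(d) process:
    gamma_0 = sigma^2 Gamma(1-2d)/Gamma(1-d)^2, then the ratio recursion."""
    g = np.empty(N + 1)
    g[0] = sigma2_eps * np.exp(gammaln(1.0 - 2.0 * d) - 2.0 * gammaln(1.0 - d))
    for tau in range(1, N + 1):
        g[tau] = g[tau - 1] * (tau - 1.0 + d) / (tau - d)
    return g

def davies_harte_fi(N, d, sigma2_eps=1.0, seed=None):
    """Simulate N values of a Gaussian FI(d) process by circulant embedding."""
    rng = np.random.default_rng(seed)
    g = fi_acvs(N, d, sigma2_eps)
    c = np.concatenate([g, g[-2:0:-1]])        # circulant first row, length 2N
    lam = np.fft.fft(c).real                   # eigenvalues of the embedding
    if np.any(lam < 0):
        raise ValueError("circulant embedding is not nonnegative definite")
    M = 2 * N
    a = np.zeros(M, dtype=complex)             # Hermitian-symmetric Gaussian weights
    a[0] = np.sqrt(lam[0]) * rng.standard_normal()
    a[N] = np.sqrt(lam[N]) * rng.standard_normal()
    z = rng.standard_normal((2, N - 1))
    a[1:N] = np.sqrt(lam[1:N] / 2.0) * (z[0] + 1j * z[1])
    a[N + 1:] = np.conj(a[1:N][::-1])
    return np.fft.fft(a).real[:N] / np.sqrt(M) # first N values form the sample
```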


Wavelet Ordinary Least Squares (OLS) Estimator

Using the logarithmic decay of the autocovariance function of a long memory process, Jensen (1999) showed that a log-linear relationship (suggested by McCoy and Walden, 1996; Johnstone and Silverman, 1997) exists between the variance of the wavelet coefficients of the long memory process and their scale, which can be used to estimate d by least squares regression. Leaving out high-level wavelet coefficients yields robustness to short-run dynamics, similarly to the LPR estimator (see McCoy and Walden, 1996; Tse et al., 2002).

The approximate linear relationship between the periodogram and the Fourier frequencies (on a log-log scale) has been known for a long time. Geweke and Porter-Hudak (1983) first proposed regressing the logarithm of the periodogram on the logarithm of the SDF to estimate the fractional differencing parameter d (we refer to this as the GPH estimator). Although very popular, the GPH estimator suffers from the poor asymptotic properties of the periodogram. Improvements to the GPH estimator have been suggested, such as restricting the number of frequencies used in the regression or using an alternative spectral estimator (smoothed or multitaper). A wavelet-based estimator of d was introduced by Jensen (1999b) using ordinary least squares (OLS) regression. Tkacz (2000) applied this technique to nominal interest rates in the United States and Canada.

For a vector of observations $y$, the OLS model is formulated via $y = X\beta + e$, where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{1,1} \\ 1 & x_{2,1} \\ \vdots & \vdots \\ 1 & x_{J,1} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$
are the length-$J$ vector of dependent observations, the $J \times 2$ model matrix and the length-2 parameter vector, respectively. The final vector $e = [e_1, e_2, \cdots, e_J]^T$ is the column of model errors with $E(e) = 0$ and $Var(e) = \sigma^2_e I_J$. The OLS estimator $\hat{\beta}$ of $\beta$ is
$$\hat{\beta} = (X^T X)^{-1} X^T y, \qquad (4.27)$$
and the covariance matrix of $\hat{\beta}$ is given by
$$\Sigma_{\hat{\beta}} = \sigma^2_e (X^T X)^{-1}. \qquad (4.28)$$

We are interested in estimating the slope parameter $\beta_1$ when the dependent observations are $y_j = \log(\sigma^2_x(\lambda_j))$ and the independent observations are $x_{j,1} = \log(\lambda_j)$, for $j = 1, \cdots, J$. There is a relationship between the wavelet variance $\sigma^2_x(\lambda_j)$ and the scale $\lambda_j$ (on a log-log scale) for FI(d) processes. Jensen (1999b) proved that $\sigma^2_x(\lambda_j) \to C \lambda_j^{2d-1}$ as $j \to \infty$ for FI(d) processes with $-\frac{1}{2} \le d \le \frac{1}{2}$. A reasonable regression model is therefore
$$\log(\sigma^2_x(\lambda_j)) = \beta_0 + \beta_1 \log(\lambda_j) + e_j, \qquad (4.29)$$


where $\beta_1 = 2d - 1$. Defining $y_j = \log(\hat{\sigma}^2_x(\lambda_j))$ and $x_{j,1} = \log(\lambda_j)$ for $j = 1, \cdots, J$, the OLS estimator of $\beta_1$ is given by the second element of $\hat{\beta}$ in Equation (4.27), or explicitly,
$$\hat{\beta}_1 = \frac{\sum_{j=1}^{J} [\log(\lambda_j) - \overline{\log(\lambda_j)}]\, \log(\hat{\sigma}^2_x(\lambda_j))}{\sum_{j=1}^{J} [\log(\lambda_j) - \overline{\log(\lambda_j)}]^2},$$
where $\overline{\log(\lambda_j)}$ is the sample mean of the $\log(\lambda_j)$. Hence the OLS estimator of the fractional differencing parameter is $\hat{d} = (\hat{\beta}_1 + 1)/2$. The variance of the OLS estimator is
$$Var(\hat{\beta}_1) = \frac{\sigma^2_e}{\sum_{j=1}^{J} [\log(\lambda_j) - \overline{\log(\lambda_j)}]^2},$$
where the estimated variance of the model errors is given by (most easily expressed in matrix notation)
$$\hat{\sigma}^2_e = \frac{(y - X\hat{\beta})^T (y - X\hat{\beta})}{J - 2}.$$
Basic properties of the variance tell us that $Var(\hat{d}) = \frac{1}{4} Var(\hat{\beta}_1)$. Jensen (2000) used the wavelet transform to decompose the variance of a long memory process and to develop an alternative to the frequency-domain estimators of the long memory parameter, but only for globally stationary long memory processes.

In particular, Jensen (1999) showed that for the fractionally integrated noise process $X_t$ satisfying $(I - B)^d X_t = \varepsilon_t$, when $d \in (-\frac{1}{2}, \frac{1}{2})$, the wavelet coefficient $\omega_{j,k}$ has the following asymptotic behavior:
$$\omega_{j,k} \to_d N(0, \sigma^2 2^{-2jd}) \quad \text{as } j \to 0. \qquad (4.30)$$
If we define the variance of $\omega_{j,k}$ as $R(j)$, the intuitive log-linear relationship
$$\log R(j) = \log \sigma^2 - d \log 2^{2j} \qquad (4.31)$$
arises. To estimate $d$ through Equation (4.31), an estimate of the variance is required. Jensen (1999) proposed
$$\hat{R}(j) = 2^{-j} \sum_{k=0}^{2^j - 1} \omega^2_{j,k}, \quad j = 0, \cdots, p-1, \qquad (4.32)$$
and the relationship (4.31) thus gives rise to the regression
$$\log \hat{R}(j) = \text{constant} - d \log 2^{2j} + \text{error}, \quad j = J, \cdots, p-1-K, \qquad (4.33)$$
which can be estimated by ordinary least squares, yielding the wavelet OLS (WOLS) estimator. The WOLS estimator is consistent and asymptotically normal when $d \in (-\frac{1}{2}, \frac{1}{2})$; see Jensen (1999). The trimming of the lowest $J$ scales was suggested by Jensen (1999) to avoid boundary effects, and the trimming of the highest $K$ scales was suggested by McCoy and Walden (1996) and Tse et al. (2002), since (4.30) is valid for small $j$ only.
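A minimal sketch of this WOLS recipe, using a hand-rolled Haar DWT for concreteness (Jensen's framework allows more general wavelets); the indexing follows Jensen's convention, in which level $j$ carries $2^j$ coefficients, and $d$ is recovered from the slope of $\log \hat{R}(j)$ on $j$ as $\hat{d} = -\text{slope}/(2\log 2)$:

```python
import numpy as np

def haar_details(x):
    """Full Haar DWT pyramid of a length-2**p series; returns a dict mapping
    Jensen's level j (carrying 2**j coefficients) to the detail coefficients."""
    a = np.asarray(x, dtype=float)
    p = int(np.log2(len(a)))
    details, j = {}, p - 1
    while len(a) > 1:
        details[j] = (a[0::2] - a[1::2]) / np.sqrt(2.0)  # wavelet coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)           # scaling coefficients
        j -= 1
    return details

def wols(x, J=1, K=1):
    """Wavelet OLS estimate of d via regression (4.33), trimming the lowest J
    and highest K levels as suggested in the text."""
    details = haar_details(x)
    p = int(np.log2(len(x)))
    js = np.arange(J, p - K)                              # j = J, ..., p-1-K
    logR = [np.log(np.mean(details[j] ** 2)) for j in js] # log R-hat(j), Eq. (4.32)
    slope = np.polyfit(js, np.asarray(logR), 1)[0]        # slope = -2 d log 2
    return -slope / (2.0 * np.log(2.0))

# e.g. wols(davies_harte_fi(4096, d=0.3)), reusing the simulation sketch above
```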

The wavelet coefficients' variance is a regularization of the spectrum (Percival, 1995; McCoy and Walden, 1996), which is a quite useful property for estimation theory. The wavelet coefficients' variance decomposes the variance of the series across different scales: the scales that contribute the most to the series' variance are associated with the wavelet coefficients with the largest variance. Hence, the wavelet coefficients' sample variance provides a more intuitive parametric estimate of its population variance than the non-parametric periodogram does of the power spectrum. More importantly, whereas the periodogram is an inconsistent estimator of the spectrum, the wavelet coefficients' sample variance is a consistent estimator of the population variance, which enables the wavelet OLS estimator to be a consistent estimator of the fractional differencing parameter.

The use of periodogram regressions appears simple but rather ad hoc. Nevertheless, Yajima (1985) extended these results to show the strong consistency of this least squares estimator for $d$ in the open interval $(-\frac{1}{2}, \frac{1}{2})$ and the asymptotic normality of the least squares estimator for $d$ in $[0, \frac{1}{4})$. In addition, he gave results on the rates of convergence for the estimator when $d \in [\frac{1}{4}, \frac{1}{2})$. The main problem of the least squares technique is its large variance.

Maximum Likelihood Estimation in Scale and Space (Wavelet MLE)

Other researchers (e.g. Hipel and McLeod, 1978; Hosking, 1981) suggested using maximum likelihood (ML) techniques on the basis of asymptotic efficiency. Nevertheless, it is unclear how the long-term and short-term parameters can be distinguished (i.e. identified) in the time domain. Identification can be done very simply in the frequency domain, despite the general equivalence that exists between spectral and time-domain frameworks. Hosking (1981) proposed an iterative ML technique that first estimated d using ML techniques and then identified the ARMA structure from a residual series differenced by d. Clearly this estimate of d would be inconsistent.

The wavelet-based maximum likelihood estimation procedure has been investigated by McCoy and Walden (1996) and Jensen (1999a, 2000). Although ordinary least squares estimation is popular because of its simplicity to program and compute, it produces much larger mean square errors than the maximum likelihood method. The wavelet-based maximum likelihood method presented here overcomes the difficulty of computing the exact likelihood by replacing the covariance matrix of the process with an approximation based on the DWT. This is possible due to the ability of the DWT to decorrelate long memory processes.

More concretely, let $x$ be a length $N (= 2^J)$ FI(d) process with mean zero and covariance matrix given by $\Sigma_x$. Then its likelihood can be written as
$$L(d, \sigma^2_\varepsilon \mid x) = (2\pi)^{-N/2} |\Sigma_x|^{-1/2} \exp\left[-\frac{1}{2} x^T \Sigma_x^{-1} x\right]; \qquad (4.34)$$

see Brockwell and Davis (1991) for reference. The quantity $|\Sigma_x|$ is the determinant of $\Sigma_x$. The maximum likelihood estimators (MLEs) of the parameters $d$ and $\sigma^2_\varepsilon$ are those quantities that maximize Equation (4.34). To avoid the difficulties in computing the exact MLE, we use the approximate decorrelation of the DWT as applied to the FI(d) process; that is,
$$\Sigma_x \approx \tilde{\Sigma}_x = W^T \Omega_N W,$$
where $W$ is the orthonormal matrix defining the DWT and $\Omega_N$ is a diagonal matrix containing the variances of the DWT coefficients computed from the FI(d) process; that is,
$$\Omega_N = \mathrm{diag}(\underbrace{S_1, \cdots, S_1}_{N/2}, \underbrace{S_2, \cdots, S_2}_{N/4}, \cdots, \underbrace{S_j, \cdots, S_j}_{N/2^j}, \cdots, S_J, S_{J+1}).$$

The approximate likelihood function is now
$$\tilde{L}(d, \sigma^2_\varepsilon \mid x) = (2\pi)^{-N/2} |\tilde{\Sigma}_x|^{-1/2} \exp\left[-\frac{1}{2} x^T \tilde{\Sigma}_x^{-1} x\right].$$
Hence, we want to find the values of $d$ and $\sigma^2_\varepsilon$ that minimize the log-likelihood function
$$\mathcal{L}(d, \sigma^2_\varepsilon \mid x) = -2 \log(\tilde{L}(d, \sigma^2_\varepsilon \mid x)) - N \log(2\pi) = \log(|\tilde{\Sigma}_x|) + x^T \tilde{\Sigma}_x^{-1} x.$$

Recall that the variance of the scale-$\lambda_j$ DWT coefficients is given by $S_j$. We note that $S_j$ depends on two parameters related to the FI(d) process: the fractional differencing parameter $d$ and the variance $\sigma^2_\varepsilon$. Let $S_j(d, \sigma^2_\varepsilon) = \sigma^2_\varepsilon S'_j(d)$. Through the properties of diagonal and orthonormal matrices, the approximate log-likelihood function may be rewritten as
$$\mathcal{L}(d, \sigma^2_\varepsilon \mid x) = N \log(\sigma^2_\varepsilon) + \log(S'_{J+1}(d)) + \sum_{j=1}^{J} N_j \log(S'_j(d)) + \frac{1}{\sigma^2_\varepsilon}\left[\frac{v_J^T v_J}{S'_{J+1}(d)} + \sum_{j=1}^{J} \frac{w_j^T w_j}{S'_j(d)}\right]. \qquad (4.35)$$
Differentiating Equation (4.35) with respect to $\sigma^2_\varepsilon$ and setting the result to zero, the MLE of $\sigma^2_\varepsilon$ is found to be
$$\hat{\sigma}^2_\varepsilon = \frac{1}{N}\left[\frac{v_J^T v_J}{S'_{J+1}(d)} + \sum_{j=1}^{J} \frac{w_j^T w_j}{S'_j(d)}\right].$$
Substituting $\hat{\sigma}^2_\varepsilon$ into Equation (4.35), we obtain the reduced log-likelihood
$$\mathcal{L}(d \mid x) = N \log(\hat{\sigma}^2_\varepsilon) + \log(S'_{J+1}(d)) + \sum_{j=1}^{J} N_j \log(S'_j(d)).$$
The reduced log-likelihood is now a function of only the fractional differencing parameter $d \in (-1/2, 1/2)$.
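A hedged Python sketch of this reduced-likelihood approach follows. To keep it short, it uses a Haar DWT and, as a simplification not made in the text, drops the scaling-coefficient term $v_J^T v_J / S'_{J+1}(d)$ (whose band integral is delicate for $d > 0$); the band shapes $S'_j(d)$ come from Equations (4.25)-(4.26) with unit innovation variance.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def haar_detail_levels(x):
    """Haar detail coefficients; list element j-1 is level j (j = 1 finest,
    with N/2**j coefficients at level j)."""
    a = np.asarray(x, dtype=float)
    out = []
    while len(a) > 1:
        out.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return out

def band_shape(j, d):
    """S'_j(d): level-j band-pass variance for unit sigma^2, Eqs (4.25)-(4.26)."""
    val, _ = quad(lambda f: np.abs(2.0 * np.sin(np.pi * f)) ** (-2.0 * d),
                  2.0 ** -(j + 1), 2.0 ** -j)
    return 2.0 ** (j + 1) * val

def reduced_loglik(d, details):
    """Profile log-likelihood in d after maximizing over sigma^2_eps,
    ignoring the scaling-coefficient term as a simplification."""
    S = np.array([band_shape(j + 1, d) for j in range(len(details))])
    Nj = np.array([w.size for w in details])
    sigma2 = sum(np.sum(w ** 2) / s for w, s in zip(details, S)) / Nj.sum()
    return Nj.sum() * np.log(sigma2) + np.sum(Nj * np.log(S))

def wavelet_mle(x):
    details = haar_detail_levels(x)
    res = minimize_scalar(lambda d: reduced_loglik(d, details),
                          bounds=(-0.49, 0.49), method="bounded")
    return res.x
```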


This maximum likelihood (ML) procedure may be extended to allow for short memory (ARMA) components in the time series model, producing an ARFIMA(p,d,q) model given by
$$\Phi(B)(I - B)^d X_t = \Theta(B) \varepsilon_t,$$
where $\Phi(B)$ and $\Theta(B)$ are $p$ and $q$ degree polynomials in the backshift operator with autoregressive (AR) parameters $\Phi = (\phi_1, \phi_2, \cdots, \phi_p)$ and moving average (MA) parameters $\Theta = (\theta_1, \theta_2, \cdots, \theta_q)$, respectively. The approximate log-likelihood has been investigated by Jensen (1999a), who compared an approximate wavelet-based ML procedure to the approximate frequency-domain ML procedure of Fox and Taqqu (1986).

An alternative to the (approximate) ML estimators described above is the approximate wavelet ML (AWML) estimator. Following the arguments of McCoy and Walden (1996) and Johnstone and Silverman (1997), see also Jensen (1998, 2000), we assume that the asymptotic behavior (4.30) is satisfied, where $\sigma^2$ depends on other parameters of the model but does not vary with $j$.

The DWT provides a simple and effective method for approximately diagonalizing the variance/covariance matrix of the original process. It follows that, ignoring wavelet coefficients with $j > p-1-K$, the approximate wavelet likelihood function is given by
$$L_W(d, \sigma^2) = -\frac{1}{2} \sum_{j=0}^{p-1-K} \left[(2^j - 1) \log(\sigma^2 2^{-2jd}) + \sum_{k=0}^{2^j-1} \frac{\omega^2_{j,k}}{\sigma^2 2^{-2jd}}\right] \qquad (4.36)$$
and the WML estimator is obtained by maximizing $L_W$. Since the relationship (4.30) is only valid for small $j$, we follow McCoy and Walden (1996) and Tse et al. (2002) and leave out the $K$ largest scales in the likelihood function (4.36), to achieve robustness to the possible presence of short-run dynamics in the same sense as the semiparametric frequency-domain estimators.

McCoy and Walden (1996) and Jensen (1999a) have both provided an approximate maximum likelihood (AML) estimator of the fractional difference parameter for long memory time series models.

4.2 Seasonal and/or Cyclical Long Memory (SCLM) Models

The potential value of a fractionally differenced model for a series, such as the monetary aggregates, is indicated by the sample autocorrelation function (ACF) of the first difference of the series. If a seasonally fractionally differenced model is appropriate, this ACF displays a hyperbolic decay at the seasonal lags, rather than the slow linear decay characteristic of the conventional seasonal differencing models; see Figure 4.1 for an intuitive idea. For this class of model, two types of long memory parameter estimation methods are often used: a semiparametric one, based on the expression of the log-periodogram, and a pseudo maximum likelihood one, based on the Whittle likelihood.

Figure 4.1: The ACF of a simulated seasonal long memory process (horizontal axis: lag, 0 to 35; vertical axis: ACF, -0.2 to 1.0).

One of the most difficult problems of the semiparametric method for the generalized long memory process with seasonalities is the choice of the trimming number l and the choice of the bandwidth m. The semiparametric method first requires the estimation of the Gegenbauer frequencies, and we use as estimates the values at which the periodogram is maximal. The semiparametric estimate has the great advantage of being easily computed, but it nevertheless possesses a slow convergence rate. Thus, this estimate must be used carefully in the case of small sample sizes, although statistical techniques, such as smoothing and tapering, can greatly improve the performance of the semiparametric estimate in such cases. Palma and Chan (2005) studied the asymptotic behavior of the estimated parameters of model (4.1) using the pseudo-maximum likelihood method.


In practice, parameter estimation for statistical long memory models having a spectral density with singularities is done in two steps (Gray et al., 1989; Chung, 1996a, b; Woodward et al., 1998). The first step consists of a grid-search procedure to estimate the frequencies at which the spectral density is unbounded; in the second step, the memory parameter d is estimated by a classical parametric method. In particular, in the case of the Gegenbauer process, Yajima (1996) proposed to estimate first the frequency of unbounded spectral density by maximizing the periodogram. Other authors considered a simultaneous global estimation of the whole set of parameters (Giraitis and Leipus, 1995; Ferrara, 2000). The 2-step method of Yajima (1996) and the simultaneous method have the great advantage of avoiding a grid-search procedure, which is very time-consuming. The simultaneous method can be generalized to the case of a spectral density with several singularities (see Ferrara, 2000).

The common techniques for estimating the long memory parameter of an ARFIMA model have been extended to seasonal models, including log-periodogram and semiparametric analysis (Arteche and Robinson, 2000). Ferrara (2000) used Whittle maximum likelihood estimation to estimate the parameters of k-factor Gegenbauer processes. Whitcher (2004) applied approximate maximum likelihood estimation to the case of long memory processes with seasonality, utilizing the DWPT under a particular orthonormal basis B to approximately diagonalize the variance/covariance matrix of the Gegenbauer process.

Recall the seasonal and/or cyclical long memory process (4.1) proposed by Robinson (1994):
$$(I - B)^{d_0} \prod_{i=1}^{k-1} (I - 2\nu_i B + B^2)^{d_i} (I + B)^{d_k} X_t = \varepsilon_t.$$
In the following we consider estimation methods for particular cases of this general model.

4.2.1 Estimation for the k-factor Gegenbauer ARMA Processes

Hosking (1981) noted that taking a fractional power of a second-order polynomial makes it possible to describe long-term structures of periodic shape. Thus, Gray et al. (1989) proposed the so-called Gegenbauer ARMA (GARMA) model, possessing a cyclical and persistent structure. Then Woodward et al. (1998) and Giraitis and Leipus (1995) extended the GARMA model to the case in which the series has k cyclical persistent components, that is, the k-factor Gegenbauer ARMA (k-factor GARMA) model. It is a particular case of the SCLM model.

Both time and frequency domain techniques have been established for the simulation of FI(d) processes. The partitioning of the time-frequency plane by the DWT makes it a natural alternative to the discrete Fourier transform for decomposing long memory processes. However, the DWT cannot be adapted to the general SDF of the Gegenbauer processes. Instead, we may make use of the DWPT to produce the least correlated wavelet coefficients.

We consider first of all the stationary Gegenbauer process $(X_t)_t$ with a single singularity on the interval $[0, \pi]$ in the spectrum. It contains only one persistent cyclical component. Let $X_t$ be a stochastic process such that
$$(I - 2\nu B + B^2)^d X_t = \varepsilon_t, \qquad (4.37)$$
where $\varepsilon_t$ is Gaussian white noise; then $X_t$ is a Gegenbauer (or seasonal persistent) process, a simple example of the k-factor Gegenbauer ARMA (GARMA) process, which permits seasonalities and also takes short memory terms into account.

Gray et al. (1989) showed that $X_t$ is stationary and invertible for $|\nu| = 1$ and $-1/4 < d < 1/4$, or $|\nu| < 1$ and $-1/2 < d < 1/2$. When $\nu = 1$, Equation (4.37) becomes $(I - B)^{2d} X_t = \varepsilon_t$, which is an FI(d) process with parameter $2d$. Clearly, the definition of the Gegenbauer process also includes the FI(d) process.

Equation (4.37) may be rewritten as an infinite moving-average process via
$$X_t = \sum_{j=0}^{\infty} C_j(d, \nu)\, \varepsilon_{t-j},$$
where the $C_j(d, \nu)$ are the Gegenbauer coefficients (Rainville, 1960), which can be calculated by the following recursion formula:
$$C_0(d, \nu) = 1, \qquad C_1(d, \nu) = 2d\nu,$$
$$C_j(d, \nu) = 2\nu \left(\frac{d-1}{j} + 1\right) C_{j-1}(d, \nu) - \left(2\,\frac{d-1}{j} + 1\right) C_{j-2}(d, \nu).$$
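The recursion translates directly into code. As an illustration (ours, not from the text), one can also simulate the Gegenbauer process approximately by truncating the MA($\infty$) representation; the truncation point trades accuracy near the Gegenbauer frequency against computational cost.

```python
import numpy as np

def gegenbauer_coeffs(n, d, nu):
    """C_0, ..., C_n via the recursion C_0 = 1, C_1 = 2 d nu,
    C_j = 2 nu ((d-1)/j + 1) C_{j-1} - (2(d-1)/j + 1) C_{j-2}."""
    c = np.empty(n + 1)
    c[0] = 1.0
    if n >= 1:
        c[1] = 2.0 * d * nu
    for j in range(2, n + 1):
        c[j] = (2.0 * nu * ((d - 1.0) / j + 1.0) * c[j - 1]
                - (2.0 * (d - 1.0) / j + 1.0) * c[j - 2])
    return c

def simulate_gegenbauer(N, d, nu, sigma_eps=1.0, trunc=2000, seed=None):
    """Approximate simulation of (4.37) by truncating X_t = sum_j C_j eps_{t-j}."""
    rng = np.random.default_rng(seed)
    c = gegenbauer_coeffs(trunc, d, nu)
    eps = sigma_eps * rng.standard_normal(N + trunc)
    return np.convolve(eps, c, mode="valid")[:N]   # discrete MA filtering
```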

The SDF of $X_t$ is given by
$$S_X(f) = \frac{\sigma^2_\varepsilon}{\{2|\cos(f) - \nu|\}^{2d}}, \quad \text{for } -\frac{\pi}{2} < f < \frac{\pi}{2}, \qquad (4.38)$$
so that $S_X(f)$ becomes unbounded at the frequency $f_G = \cos^{-1}(\nu)$, sometimes called the Gegenbauer frequency.

The autocovariance sequence (ACVS) of a Gegenbauer process may be expressed via the Fourier transform of its SDF; that is,
$$\gamma_\tau = \int_{-1/2}^{1/2} S_X(f) \cos(f\tau)\, df.$$
Gray et al. (1994) showed that the autocorrelation sequence of a Gegenbauer process satisfies
$$\rho_\tau \sim \tau^{2d-1} \cos(f_G \tau), \quad \text{as } \tau \to \infty. \qquad (4.39)$$


An obvious extension of the Gegenbauer process is to allow multiple singularities in the SDF of the process. Woodward et al. (1998) and Giraitis and Leipus (1995) considered the zero mean k-factor Gegenbauer process given by
$$\prod_{i=1}^{k} (I - 2\nu_i B + B^2)^{d_i} X_t = \varepsilon_t,$$
exhibiting k asymptotes located at the frequencies $f_i = \cos^{-1}(\nu_i)$ $(i = 1, \cdots, k)$ in its spectrum
$$S^{(k)}_X(f) = \sigma^2_\varepsilon \prod_{i=1}^{k} \{2|\cos(f) - \nu_i|\}^{-2d_i}, \quad |f| < \frac{\pi}{2}. \qquad (4.40)$$

Using this model allows one to incorporate several observed oscillations, such as an annual frequency and its harmonics. For generalized long memory processes possessing singularities at non-zero frequencies in the spectrum, the DWT cannot be adapted to decorrelate the general SDF.

Whitcher (2004) analyzed the feasibility of OLS regression for Gegenbauer processes. In fact, he established a linear relationship (on a log-log scale) between the DWPT variance and the long memory parameter, using the spectral density function of the Gegenbauer process; it is indeed a frequency-based semiparametric method. He showed that the wavelet-based OLS estimate of the fractional differencing parameter for the Gegenbauer process exhibits reasonable bias and MSE characteristics in simulation studies; however, if a non-adaptive orthonormal basis is used, the wavelet packet OLS estimator is heavily biased, since such a basis mimics the frequency-domain estimators of the SDF. Thus, he suggested that the wavelet-based OLS estimate should be restricted in its use in practice.

On the other side, Whitcher (2004) proposed an approximate maximum likelihood estimation (AMLE) for the Gegenbauer process, using the wavelet-based OLS estimate as the initial value. He utilized the DWPT under a particular choice of orthonormal basis to approximately diagonalize the covariance matrix of a Gegenbauer process. Moreover, this maximum likelihood method can also be applied to k-factor Gegenbauer ARMA processes possessing short memory terms. Using the wavelet-based AMLE, he found that the bias and RMSE are largely improved.

• Ordinary Least Squares (OLS) Estimation of Gegenbauer Processes
Porter-Hudak (1990) extended the non-seasonal estimation technique developed in Geweke and Porter-Hudak (1983) to the fractionally differenced seasonal model and offered some preliminary sampling evidence as to its efficacy.

The observed linear relationship (on a log-log scale) between the wavelet variance $\sigma^2_x(\lambda_j)$ and the scale $\lambda_j$ provided the impetus to use OLS estimation for the fractional differencing parameter d of the FI(d) process. Actually, an analogous log-linear relationship also exists for the Gegenbauer process.


By applying the logarithmic transform to both sides of Equation (4.38), we get
$$\log S_X(f) = -2d \log\{2|\cos(f) - \nu|\}. \qquad (4.41)$$
This suggests a simple linear regression of $\log \hat{S}_X(f)$ on $\log\{2|\cos(f) - \nu|\}$ in order to estimate the fractional differencing parameter d, where $\hat{S}_X(f)$ is an estimate of the true spectrum at each frequency (e.g. the periodogram or a multitaper spectrum estimator). Arteche and Robinson (2000) suggested a simple modification to Equation (4.41), which consists of replacing the explicit parametric form of the SDF with just the frequency, yielding
$$\log S_X(f) \approx -2d \log\{2|f - \cos^{-1}\nu|\}. \qquad (4.42)$$

Thus, our variables for the OLS regression $y = X\beta + e$ based on Equation (4.41) are
$$y = \begin{pmatrix} \log \hat{S}_X(f_1) \\ \log \hat{S}_X(f_2) \\ \vdots \\ \log \hat{S}_X(f_k) \end{pmatrix}, \quad X = \begin{pmatrix} 1 & \log 2|\cos(f_1) - \nu| \\ 1 & \log 2|\cos(f_2) - \nu| \\ \vdots & \vdots \\ 1 & \log 2|\cos(f_k) - \nu| \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$
and model errors $e = [e_1, e_2, \cdots, e_J]^T$. The OLS estimator of $\beta_1$ is the second element of $\hat{\beta} = (X^T X)^{-1} X^T y$; the transformation $\hat{d}_{OLS} = -\hat{\beta}_1/2$ then gives the estimated fractional differencing parameter. To utilize the regression based on Equation (4.42), simply construct the model matrix $X$ using $\log 2|f - \cos^{-1}\nu|$ instead of $\log 2|\cos(f) - \nu|$. Additional modifications to this log-periodogram regression scheme may be found in Arteche and Robinson (2000). Robinson (1995) replaced the parametric form of the SDF with the frequency, as in Equation (4.42), and found it to work quite well for long memory processes.
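The sketch below is a minimal version of this log-periodogram regression with ν treated as known (so $f_G = \arccos\nu$ is fixed); the bandwidth m and trimming l are illustrative tuning choices of ours, echoing the trimming discussion earlier in this section.

```python
import numpy as np

def lp_gegenbauer_d(x, nu, m=50, l=1):
    """Estimate d by regressing log I(f_k) on log 2|cos(f_k) - nu| (Eq. (4.41))
    over m Fourier ordinates on each side of f_G, trimming the l nearest ones."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    I = np.abs(np.fft.rfft(x - x.mean())) ** 2 / (2.0 * np.pi * T)  # periodogram
    f = 2.0 * np.pi * np.arange(len(I)) / T                         # f_k in [0, pi]
    kG = int(round(np.arccos(nu) * T / (2.0 * np.pi)))              # ordinate at f_G
    idx = np.r_[kG - m - l:kG - l + 1, kG + l:kG + m + 1]
    idx = idx[(idx > 0) & (idx < len(I))]                           # keep valid ordinates
    reg = np.log(2.0 * np.abs(np.cos(f[idx]) - nu))
    slope = np.polyfit(reg, np.log(I[idx]), 1)[0]                   # slope = -2d
    return -slope / 2.0
```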

We now extend these results to formulate a wavelet packet variance estimator of the fractional difference parameter of the Gegenbauer process. Since the variance of the wavelet packet coefficients is an estimate of the true spectrum, we obtain the following relationship:
$$\log \sigma^2(\lambda_{j,n}) = -2d \log\{2|\cos(\mu_{j,n}) - \nu|\}, \qquad (4.43)$$
where $\mu_{j,n}$ is the midpoint of the frequency interval $\lambda_{j,n}$. Strictly speaking, the $(j,n)$-th wavelet packet variance covers the entire interval of frequencies $\lambda_{j,n}$, but it suffices to represent this interval by its midpoint here. As in Equation (4.41), the slope from a simple linear regression of $\log \hat{\sigma}^2(\lambda_{j,n})$ on $\log\{2|\cos(\mu_{j,n}) - \nu|\}$, appropriately normalized, provides an estimate of the fractional differencing parameter. Simplifying Equation (4.43) to just the frequencies, not the full SDF, yields
$$\log \sigma^2(\lambda_{j,n}) = -2d \log\{2|\mu_{j,n} - \cos^{-1}\nu|\}. \qquad (4.44)$$


For the wavelet packet variance, the variables for the OLS regression $y = X\beta + e$ based on Equation (4.43) are
$$y = \begin{pmatrix} \log \hat{\sigma}^2_X(\lambda_{j_1,n_1}) \\ \log \hat{\sigma}^2_X(\lambda_{j_2,n_2}) \\ \vdots \\ \log \hat{\sigma}^2_X(\lambda_{j_J,n_J}) \end{pmatrix}, \quad X = \begin{pmatrix} 1 & \log 2|\cos(\mu_{j_1,n_1}) - \nu| \\ 1 & \log 2|\cos(\mu_{j_2,n_2}) - \nu| \\ \vdots & \vdots \\ 1 & \log 2|\cos(\mu_{j_J,n_J}) - \nu| \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},$$
where the pairs $(j_i, n_i)$ run over the elements of the chosen orthonormal basis, and the model errors are $e = [e_1, e_2, \cdots, e_J]^T$. As with the log-periodogram estimator, the OLS estimator of $\beta_1$ is the second element of $\hat{\beta} = (X^T X)^{-1} X^T y$, and the transformation $\hat{d}_{OLS} = -\hat{\beta}_1/2$ gives the estimated fractional difference parameter with $Var(\hat{d}_{OLS}) = \frac{1}{4} Var(\hat{\beta}_1)$. To utilize the regression based on Equation (4.44), we simply construct the model matrix $X$ using $\log 2|\mu_{j,n} - \cos^{-1}\nu|$ instead of $\log 2|\cos(\mu_{j,n}) - \nu|$.
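Given the wavelet packet variances and band midpoints (however obtained), the fit of Equation (4.44) itself is a two-line regression; the sketch below takes those inputs as precomputed arrays, since a full DWPT implementation is beyond this illustration.

```python
import numpy as np

def wp_ols_d(mu, var_hat, nu):
    """OLS fit of Equation (4.44): log wavelet packet variance on
    log 2|mu_{j,n} - arccos(nu)|; returns d_hat = -beta1/2."""
    reg = np.log(2.0 * np.abs(np.asarray(mu, dtype=float) - np.arccos(nu)))
    beta1 = np.polyfit(reg, np.log(np.asarray(var_hat, dtype=float)), 1)[0]
    return -beta1 / 2.0
```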

Whereas the OLS estimate of the fractional differencing parameter for long memory processes has been shown to exhibit reasonable bias and MSE characteristics in simulation studies, Whitcher (2000) showed that the log-periodogram estimator using Equation (4.42) is heavily biased, and the wavelet packet OLS estimate also performs poorly when using a non-adaptive orthonormal basis, since such a basis mimics frequency-domain estimators of the SDF. When utilizing an adaptive orthonormal basis, the wavelet packet OLS estimate of the fractional differencing parameter outperforms both the log-periodogram and non-adaptive wavelet packet estimates. In practice, the OLS estimator $\hat{d}_{OLS}$ for the Gegenbauer process should be restricted to serving as input to a maximum likelihood procedure and not be regarded as a viable estimator on its own.

• Approximate Maximum Likelihood Estimation of Gegenbauer Processes
We have provided an approximate maximum likelihood estimator of the fractional differencing parameter of the long memory process. The DWT provides a simple and effective method for approximately diagonalizing the covariance matrix of the original process. We extend these results to the case of the Gegenbauer process, where two parameters d and ν define the SDF. The key point is to utilize the DWPT under a particular choice of orthonormal basis B to approximately diagonalize the covariance matrix of a Gegenbauer process.

Let $x$ be a realization of a zero mean stationary Gegenbauer process with unknown parameters $d$, $\nu$ and $\sigma^2_\varepsilon > 0$. Recall that the likelihood function of $x$, under the assumption of multivariate Gaussianity, is given by
$$L(d, \nu, \sigma^2_\varepsilon \mid x) = (2\pi)^{-N/2} |\Sigma_x|^{-1/2} \exp\left[-\frac{1}{2} x^T \Sigma_x^{-1} x\right]. \qquad (4.45)$$

The MLEs of the parameters $d$, $\nu$, and $\sigma^2_\varepsilon$ are those quantities that maximize Equation (4.45). We avoid the difficulties in computing the exact MLEs by using the approximate decorrelation of the DWPT as applied to the Gegenbauer process; that is,
$$\Sigma_x \approx \tilde{\Sigma}_x = W_\mathcal{B}^T \Omega_N W_\mathcal{B},$$
where $W_\mathcal{B}$ is an $N \times N$ orthonormal matrix defining the DWPT through the basis $\mathcal{B}$ and $\Omega_N$ is a diagonal matrix containing the band-pass variances associated with $(j,n) \in \mathcal{B}$. The approximate likelihood function is now
$$\tilde{L}(d, \nu, \sigma^2_\varepsilon \mid x) = (2\pi)^{-N/2} |\tilde{\Sigma}_x|^{-1/2} \exp\left[-\frac{1}{2} x^T \tilde{\Sigma}_x^{-1} x\right].$$

Hence we want to find the values of $d$, $\nu$ and $\sigma^2_\varepsilon$ that minimize the log-likelihood function
$$\mathcal{L}(d, \nu, \sigma^2_\varepsilon \mid x) = -2 \log(\tilde{L}(d, \nu, \sigma^2_\varepsilon \mid x)) - N \log(2\pi) = \log(|\tilde{\Sigma}_x|) + x^T \tilde{\Sigma}_x^{-1} x.$$
We know that the variance of the DWPT coefficients associated with the frequency interval $\lambda_{j,n}$ is given by $S_{j,n}$. We note that $S_{j,n}$ depends on three parameters related to the Gegenbauer process: the fractional difference parameter $d$, the Gegenbauer frequency $f_G = \cos^{-1}(\nu)$, and the variance $\sigma^2_\varepsilon$. Let $S_{j,n}(d, \nu, \sigma^2_\varepsilon) = \sigma^2_\varepsilon S'_{j,n}(d, \nu)$. Using the properties of diagonal and orthonormal matrices, the approximate log-likelihood function may be rewritten as
$$\mathcal{L}(d, \nu, \sigma^2_\varepsilon \mid x) = N \log(\sigma^2_\varepsilon) + \sum_{(j,n) \in \mathcal{B}} N_j \log(S'_{j,n}(d, \nu)) + \frac{1}{\sigma^2_\varepsilon} \sum_{(j,n) \in \mathcal{B}} \frac{w_{j,n}^T w_{j,n}}{S'_{j,n}(d, \nu)}. \qquad (4.46)$$
Differentiating Equation (4.46) with respect to $\sigma^2_\varepsilon$ and setting the result to zero, the MLE of $\sigma^2_\varepsilon$ is found to be
$$\hat{\sigma}^2_\varepsilon = \frac{1}{N} \sum_{(j,n) \in \mathcal{B}} \frac{w_{j,n}^T w_{j,n}}{S'_{j,n}(d, \nu)}.$$
Substituting $\hat{\sigma}^2_\varepsilon$ into Equation (4.46), we obtain the reduced log-likelihood function as follows:
$$\mathcal{L}(d, \nu \mid x) = N \log(\hat{\sigma}^2_\varepsilon) + \sum_{(j,n) \in \mathcal{B}} N_j \log(S'_{j,n}(d, \nu)).$$
This estimation procedure differs from the frequency-based semiparametric estimator of Arteche and Robinson (2000) by simultaneously determining MLEs for both the fractional difference parameter and the Gegenbauer frequency.

The estimation procedure outlined here has assumed only one singularity in the spectrum of $X_t$. It is common to observe a fundamental frequency, say the annual cycle, and several harmonics, such as cycles of two per year, and so on. These may be included in the ML procedure by using the spectrum of a k-factor Gegenbauer process. The likelihood function would then be a function of $d = (d_1, d_2, \cdots, d_k)$, $\Phi = (\phi_1, \phi_2, \cdots, \phi_k)$ and $\sigma^2_\varepsilon$. Long memory may also be incorporated into the modeling by allowing one of the Gegenbauer frequencies to be zero. The fractional difference parameter associated with this zero frequency would be constrained via $|d| \le 1/4$, and its relation to the fractional differencing parameter of an FI(d) process is $2d$. Finally, short memory may also be included by adding AR or MA terms to the spectrum of the model. Thus ML estimation of the parameters covers a wide variety of time series models involving long memory with seasonality.

The spectral density of the k-factor GARMA process is as follows:
$$S_X(f) = \sigma^2_\varepsilon \frac{|\Theta(e^{if})|^2}{|\Phi(e^{if})|^2} \prod_{i=1}^{k} \left|4 \sin\left(\frac{f+f_i}{2}\right) \sin\left(\frac{f-f_i}{2}\right)\right|^{-2d_i}, \quad \text{for } f \in [0, 2\pi), \qquad (4.47)$$
where $f_i = \cos^{-1}(\nu_i)$ $(i = 1, \cdots, k)$ are the k Gegenbauer frequencies. We have the following approximation of the spectral density:
$$S_X(f) \sim C|f - f_i|^{-2d^*_i}, \quad \text{as } f \to f_i, \qquad (4.48)$$
where $C$ is a strictly positive finite constant and
$$d^*_i = \begin{cases} 2d_i, & \text{if } f_i \in \{0, \pi\}, \\ d_i, & \text{if } 0 < f_i < \pi. \end{cases}$$

Arteche (1998) carried out the generalized Robinson method with the trimming technique to estimate the $d_i$ $(i = 1, \cdots, k)$ and proved the asymptotic behavior of the estimates.

For the k-factor GARMA model, the estimation method proposed by Chung (1996a, b) is based on the minimization of the conditional sum of squared residuals (CSS). However, parameter estimation for this class of processes is delicate. Chung (1996a, b) showed that the estimator of ν obtained by CSS minimization converges at a greater speed than the other parameters. This rules out the use of gradient-based methods on the whole set of parameters. It is therefore advisable to use an alternative method based on an incremental search (or grid-search). This method is very slow if the grid-search covers the whole of $[-1, 1]$; it is more effective to restrict the search interval to a neighborhood of the frequencies corresponding to the strongest values of the periodogram. Chung demonstrated that the distribution of the estimators of d, as well as that of a possible ARMA structure obtained by minimization of the CSS function, is asymptotically normal. Parameter significance can be tested by a Student test.

Recall that the Whittle estimator of $\theta = (d, \nu, \sigma^2_\varepsilon)$ is obtained by minimizing the following approximation of the log-likelihood function (see Beran, 1994):
$$L_W(X, \theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left(\log(f_X(\lambda, \theta)) + \frac{I_T(\lambda)}{f_X(\lambda, \theta)}\right) d\lambda, \qquad (4.49)$$
where $f_X(\lambda, \theta)$ is the spectral density of the k-factor Gegenbauer process $(X_t)_t$ generating the data, and $I_T(\lambda)$ is the periodogram defined by
$$I_T(\lambda) = \frac{1}{2\pi T} \left| \sum_{t=1}^{T} e^{i\lambda t} (X_t - \bar{X}_T) \right|^2,$$
where $\bar{X}_T$ is the empirical mean of the process, equal to zero in our case.
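For a one-factor Gegenbauer process, the Whittle contrast (4.49) can be discretized over the Fourier frequencies with $\sigma^2_\varepsilon$ profiled out; the following sketch is a bare-bones illustration under our own parameterization and tuning, not the implementation used in the thesis.

```python
import numpy as np
from scipy.optimize import minimize

def whittle_gegenbauer(x, d0=0.2, nu0=0.5):
    """Discretized Whittle estimation of (d, nu) for a one-factor Gegenbauer
    process; sigma^2_eps enters the SDF as a scale factor and is profiled out."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    I = np.abs(np.fft.rfft(x - x.mean())) ** 2 / (2.0 * np.pi * T)  # periodogram
    lam = 2.0 * np.pi * np.arange(len(I)) / T
    lam, I = lam[1:], I[1:]                                         # drop frequency zero

    def contrast(theta):
        d, nu = theta
        if not (-0.45 < d < 0.45 and -0.99 < nu < 0.99):
            return np.inf                                           # crude box constraint
        shape = (2.0 * np.abs(np.cos(lam) - nu)) ** (-2.0 * d)      # SDF up to sigma^2
        sigma2 = np.mean(I / shape)                                 # profiled scale factor
        f = sigma2 * shape
        return np.sum(np.log(f) + I / f)

    return minimize(contrast, x0=[d0, nu0], method="Nelder-Mead").x
```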

Under some classical conditions inherent to the Whittle estimation procedure, Giraitis and Leipus (1995) proved the strong consistency of the Whittle estimator $\hat{\theta}_T$, using a result of Hannan (1973). The limiting distribution of the Whittle estimator of the Gegenbauer process was established by Diongue and Guégan (2004): the asymptotic Gaussian distribution of the Whittle estimator was proved under some non-restrictive conditions, by extending the proof of Yajima (1985) to the vectorial case.

It is worth noting that the convergence rate of this pseudo maximum likelihood estimate is greater than that of the semiparametric estimate, while the convergence rate of the conditional sum of squares (CSS) estimate of the Gegenbauer frequency is $O(T^{-1})$. Whitcher (2004) proposed a semiparametric method based on the wavelet transform to estimate the parameters of Gegenbauer processes. Taking the logarithm of the SDF of the Gegenbauer process yields
$$\log S_X(f) = -2d \log\{2|\cos(f) - \nu|\},$$
so a simple linear regression of $\log \hat{S}_X(f)$ provides an estimate of the long memory parameter d, where $\hat{S}_X(f)$ is an estimate of the true spectrum at each frequency.

4.2.2 Estimation for the Models with Fixed Seasonal Periodicity

Consider the simple model
$$(I - B^s)^d X_t = \varepsilon_t, \qquad (4.50)$$
where d is the fractional differencing parameter and lies inside the interval $(-1/2, 1/2)$, $\varepsilon_t$ is a white noise process, and s is the seasonal periodicity (e.g. $s = 12$ for monthly series). The model in (4.50) is a direct analogue of the simple fractionally differenced model (4.23). The generalization of (4.50) to an autoregressive moving average (ARMA) model with a fractionally differenced seasonal component is
$$\Phi(B)(I - B^s)^d X_t = \Theta(B) \varepsilon_t, \qquad (4.51)$$

where $\Phi(B)$ and $\Theta(B)$ are autoregressive and moving average polynomials (each conceivably including seasonal components).

Porter-Hudak (1990) extended the results of Geweke and Porter-Hudak (1983) to the model (4.51), considering the spectral density of the model (4.50),
$$S_X(f) = \frac{\sigma^2}{2\pi} (2(1 - \cos(sf)))^{-d}. \qquad (4.52)$$
The natural extension of the non-seasonal technique given in Geweke and Porter-Hudak (1983) is a regression of the log periodogram, $\log I(\pi j/T)$, around the seasonal harmonics $\pi s j/T$ $(j = 1, 2, \cdots, g(T))$, for some choice of $g(T)$ such that $g(T)/T \to 0$ as $T \to \infty$:
$$\log I(\pi j/T) = \phi_0 - d \log(2(1 - \cos(\pi j s/T))), \quad j = 1, 2, \cdots, g(T). \qquad (4.53)$$
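A direct transcription of the regression (4.53) into Python is given below; the bandwidth choice $g(T) = \sqrt{T}$ is only one common option satisfying $g(T)/T \to 0$, not a prescription from the text.

```python
import numpy as np

def seasonal_gph(x, s, g=None):
    """Seasonal GPH: regress log I(pi j/T) on log(2(1 - cos(pi j s/T))),
    for j = 1, ..., g(T), Equation (4.53); returns the estimate of d."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    g = int(np.sqrt(T)) if g is None else g
    j = np.arange(1, g + 1)
    lam = np.pi * j / T
    # periodogram evaluated directly at the ordinates pi j / T
    dft = np.exp(-1j * np.outer(lam, np.arange(T))) @ (x - x.mean())
    I = np.abs(dft) ** 2 / (2.0 * np.pi * T)
    reg = np.log(2.0 * (1.0 - np.cos(np.pi * j * s / T)))
    slope = np.polyfit(reg, np.log(I), 1)[0]      # slope = -d in (4.53)
    return -slope
```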


Consider a more general seasonal process. Porter-Hudak (1990) introduced the following model, taking into account the presence of an infinite cycle as well as a given seasonality with a certain persistence:
$$\Phi(B)(I - B)^{d_1}(I - B^s)^{d_2} X_t = \Theta(B) \varepsilon_t, \qquad (4.54)$$
whose spectral density is given by, for $-\pi \le f \le \pi$:
$$S_X(f) = \frac{\sigma^2}{2\pi} (2\sin(fs/2))^{-2d_2} (2\sin(f/2))^{-2d_1}. \qquad (4.55)$$
The parameter estimation is generally done with the Geweke and Porter-Hudak (1983) method, or GPH method. The two fractional differencing parameters $d_1$ and $d_2$ could be estimated in a multivariate regression setting, by considering the following regression:
$$\log I(\pi j/T) = \phi_0 - d_1 \log(2(1 - \cos(\pi j/T))) - d_2 \log(2(1 - \cos(\pi j s/T))), \quad j = 1, 2, \cdots, g(T). \qquad (4.56)$$
Unfortunately, this latter expression is asymptotically equivalent to $\log I(\pi j/T) = \phi_0 - d_1 \log(\pi j/T) - d_2 \log(\pi j s/T)$, i.e. $\log I(\pi j/T) = \phi - (d_1 + d_2) \log(\pi j/T)$. Hence $d_1$ and $d_2$ cannot be identified separately (see Porter-Hudak, 1990).

4.3 Seasonal and/or Cyclical Asymmetric Long Memory (SCALM) Models

By adding asymmetry, Arteche and Robinson (2000) introduced the Seasonal and/or Cyclical Asymmetric Long Memory model, or SCALM. They consider a semiparametric approach using a spectral density defined in the following way:
$$f(\lambda) \sim C_1 |\lambda - \omega|^{-2d_1} \quad \text{as } \lambda \to \omega^+, \qquad (4.57)$$
$$f(\lambda) \sim C_2 |\lambda - \omega|^{-2d_2} \quad \text{as } \lambda \to \omega^-, \qquad (4.58)$$
where $\omega \in (0, \pi]$ and, for $i = 1, 2$,
$$0 < C_i < \infty, \quad |d_i| < \frac{1}{2}, \qquad (4.59)$$
permitting
$$d_1 \ne d_2 \quad \text{and/or} \quad C_1 \ne C_2. \qquad (4.60)$$
Arteche and Robinson (2000) proposed estimates of $d_1$ and $d_2$ based on a trimming approach of the periodogram. A complementary approach to parameter estimation is considered by Olhede, McCoy and Stephens (2004).


Chapter 5

Estimation and Forecast for Non-stationary Long Memory Processes

Stationarity has always played an important role in time series analysis. One reason is that for stationary processes there exists a rich and elegant theory which allows for a detailed investigation of the different methods used in statistical inference. As a consequence, practitioners often try to use stationary methods even when the data clearly show non-stationary behavior, e.g. by taking differences or by looking at different segments separately.

However, in some situations, the assumption that real world processes exhibit a constant long memory structure may not be reasonable. Time-varying long memory characteristics have been hypothesized or observed in telecommunications networks, physiological signals, seismic measurements, etc. To better capture the non-stationary behavior associated with market collapses, political upheavals and news announcements, we need to study non-stationary models with time-varying parameters.

It is difficult to develop a general non-stationary theory. On the one hand, an asymptotic theory is needed, since, for example, an investigation of a maximum likelihood estimate for fixed sample size is much too complicated and will not lead to any satisfactory results. On the other hand, a classical asymptotic theory assuming that more and more observations of the future become available does not make sense, since future observations of a general non-stationary process do not necessarily contain any information about the structure at present.

It is well known that the Fourier transform, with its localized frequency basis functions and spectral representation, is well equipped to deal with stationary time series, but its use in a non-stationary setting is not recommended. Wavelet analysis is more appropriate for analyzing the time-varying behavior of non-stationary time series because it is well localized both in time and scale. Compared with the literature on wavelet applications to stationary long memory processes, the wavelet-based work on non-stationary processes is much sparser. Here, in order to provide a consistent and robust procedure for estimating the long memory parameters of non-stationary processes, we focus on wavelet techniques.


Summarizing the time-varying nature through a constant long memory parameter, also known as the Hurst coefficient $H$, yields a stationary model that does not capture the non-stationary behavior. Gonçalvès and Abry (1997) estimated a local scaling exponent for a continuous-time, multifractal Brownian motion process characterized by a time-varying Hurst coefficient $H(t)$. Their method relies on the multiple-window scalogram (squared magnitude of the continuous wavelet transform) to estimate $H(t)$. This involves constructing non-standard wavelets to compute the scalograms, which may hinder practical implementation.

As far as we know, the existing work concerning non-stationary processes is mostly concentrated on the fractional integrated process $(I - B)^d X_t = \varepsilon_t$ with a constant differencing parameter $d \in (\frac{1}{2}, 1)$ and on the fractional model $(I - B)^{d(t)} X_t = \varepsilon_t$ with a time-varying parameter function $d(t) \in (-\frac{1}{2}, \frac{1}{2})$.

For the first case, many authors adopted estimation methods based on the Whittle method; see for example Abadir et al. (2007), Beran and Terrin (1996), Moulines et al. (1998), Phillips and Shimotsu (2000, 2004, 2006) and Velasco and Robinson (2000). For the second case, the possibility that the long memory parameter d is not constant over time is an interesting generalization of the usual FI(d) process. Veitch and Abry (1999) developed a testing procedure for the time constancy of d, while Whitcher and Jensen (2000) proposed an OLS estimator for a non-stationary FI(d) process, and Jensen and Whitcher (2000) applied the OLS-based estimator to a year of high-frequency foreign exchange rates. Parameter estimation for non-stationary long memory time series models, through OLS or maximum likelihood, is in its infancy and should benefit greatly from wavelet-based methods.

In the literature, in order to estimate the fractional differencing parameter of processes whose spectrum has a singularity only at zero, authors perform the orthonormal discrete wavelet transform (DWT) to decorrelate the original time series (Jensen, 1999a, b; Cavanaugh et al., 2002). The band-pass structure of the DWT partitions the spectrum finer and finer as the frequency tends to 0, where the spectrum is unbounded. It is performed through a succession of low-pass and high-pass filtering operations.

For long memory processes with seasonal terms, the spectral density function can explode at any frequency between 0 and π. We apply the discrete wavelet packet transform (DWPT) instead of the DWT, which permits approximate decorrelation of the spectrum of the process. To realize this approximate decorrelation, we resort to the minimum-bandwidth discrete-time (MBDT) wavelets of length L (denoted MB(L)), introduced by Morris and Peravali (1999). They permit approximate decorrelation within each band-pass filter and allow choosing an adaptive orthonormal basis (Whitcher, 2004).

Since there exists an OLS regression (4.43) yielding the long memory parameter of the Gegenbauer process, we will try to utilize this method in the estimation of non-stationary long memory processes with seasonalities; see Chapter 5 for details. The approximate likelihood estimation method for non-stationary long memory processes with seasonalities is still under consideration.

5.1 Fractional Integrated Processes with a Constant Long Memory Parameter

Moulines et al. (2008) considered a time series with memory parameter $d \in \mathbb{R}$. This time series is either stationary or can be made stationary by differencing a finite number of times. They proposed a wavelet-based semiparametric pseudo maximum likelihood estimator, the local Whittle wavelet estimator of the memory parameter d. They also showed that the estimator is consistent and rate optimal if the process is linear, and asymptotically normal if the process is Gaussian.

In fact, the study of the estimation of long memory parameters in non-stationary long memory processes is mostly focused on the fractional integrated process proposed by Granger and Joyeux (1980) and Hosking (1981), which accounts for the existence of non-stationarity jointly with the presence of persistence.

Consider the model
$$(I - B)^d X_t = \varepsilon_t, \quad |d| \ge \frac{1}{2}. \qquad (5.1)$$

Such fractional integrated processes have been studied by Beran and Terrin (1996), Velasco and Robinson (2000), Shimotsu and Phillips (2000, 2004, 2006), Abadir et al. (2007) and Moulines et al. (2008), for instance. In these works, the authors applied the Geweke-Porter-Hudak (GPH) method, the local Whittle method, the exact local Whittle method, the fully extended local Whittle method, the Whittle pseudo maximum likelihood method and the wavelet-based local Whittle method for estimating the parameter.

Considering the fractional integrated process (5.1), Phillips and Shimotsu (2001) found that for the local Whittle (LW) estimator of non-stationary fractional integrated processes ($|d| \ge \frac{1}{2}$), the asymptotic theory is discontinuous at $d = \frac{3}{4}$ and $d = 1$. Thus it is awkward to use the LW estimator because of non-normal limit theory, and the estimator is inconsistent when $d > 1$. Hence the LW estimator is not a good general-purpose estimator when d may take values in the non-stationary zone beyond $\frac{3}{4}$. Shimotsu and Phillips (2002) proposed the exact local Whittle method for the estimation of fractional integration, which is applicable to stationary and non-stationary processes alike; this estimator is shown to be consistent and to follow the $N(0, \frac{1}{4})$ limit distribution for all values of d.

When $d \in (-\frac{3}{2}, \infty)$, Abadir et al. (2007) investigated the properties of the fully extended local Whittle (FELW) estimator, which is applicable not only in the traditional cases but also for nonlinear and non-Gaussian processes. They showed that the estimator is consistent and has good asymptotic behavior.

Whittle pseudo-maximum likelihood estimates of the parameters of stationary time series have been found to be consistent and asymptotically normal in the presence of long range dependence. Generalizing the definition of the long memory parameter d, Velasco and Robinson (2000) extended these results to include possibly non-stationary ($0.5 \le d < 1$) and anti-persistent ($-0.5 < d < 0$) observations. Using adequate data tapers, this estimation technique can be applied to any degree of non-stationarity $d \ge 0.5$ without prior knowledge of the memory of the series.

5.2 Locally Stationary ARFIMA Processes

For non-stationary long memory processes, some authors studied processes with time-varying long memory parameters instead of a constant one. For instance, a locally stationary ARFIMA model is studied in Jensen (1999a) and Whitcher and Jensen (2000), who introduced the operator $(I - B)^{d(t)}$. And Cavanaugh et al. (2002) investigated self-similar processes, for example the time-varying fractional Brownian motion and the fractional Gaussian noise with a time-varying parameter. These authors developed estimation procedures based on wavelet techniques, succeeding in capturing the local changes in the series.

Existing frequency-domain estimators of the long memory parameter depend on Fourier transforms localized in frequency. These estimators are incapable of addressing any time-varying long memory behavior (see Geweke and Porter-Hudak, 1983, and Fox and Taqqu, 1986, for two of the most popular frequency-domain long memory estimators). By definition, the statistical properties of a non-stationary process are a function of time and scale. Since the wavelet is localized in time, it is feasible to focus on a period where the statistical properties of the non-stationary process are relatively stable and not affected by observations with differing statistical properties. Hence, whereas Fourier analysis is suited to the study of stationary processes, wavelet analysis is more suitable for the study of non-stationary processes.

Whitcher and Jensen (2000) introduced a stochastic process $X_{t,T}$ given by
$$X_{t,T} = (I - B)^{-d(t)} \frac{\Theta(B)}{\Phi(B)} \varepsilon_t \equiv \sum_{j=0}^{\infty} \frac{\Gamma[j + d(t)]}{\Gamma(j+1)\,\Gamma[d(t)]} \Psi_j \varepsilon_{t-j}, \qquad (5.2)$$
where $d(t) \in (-\frac{1}{2}, \frac{1}{2})$ is the time-varying fractional differencing parameter and $\varepsilon_t$ is a sequence of mean zero normal (Gaussian) random variables with variance $\sigma^2_\varepsilon$. Here, $B$ denotes the lag (backshift) operator, that is, $X_{t-j,T} = B^j X_{t,T}$, and $\Gamma(\cdot)$ is the gamma function. The functions $\Theta(B)$ and $\Phi(B)$ are respectively $p$ and $q$ order polynomials in the lag operator $B$, each with roots outside the unit circle, and the $\Psi_j$ solve $\sum_j \Psi_j z^j = \Theta(z)/\Phi(z)$. Called a locally stationary ARFIMA model, this long memory time series model is a member of the non-stationary class of processes known as locally stationary processes (Dahlhaus, 1996).

Actually, we can rewrite the model (5.2) in the following form:
$$\Phi(B)(I - B)^{d(t)} X_{t,T} = \Theta(B) \varepsilon_t. \qquad (5.3)$$
This process can be regarded as an extension of the ARFIMA model allowing the long memory parameter to evolve over time.

The time-varying spectral density function of $X_{t,T}$ is given by $S(u, \omega) \sim \omega^{-2d(u)}$, where $u = t/T$. If $d(u) > 0$, $S(u, \omega)$ is smooth for frequencies close to zero but unbounded at $\omega = 0$; in other words, the energy of $X_{t,T}$ is concentrated at the frequencies associated with long-term cycles. If $d(u) < 0$, then $S(u, 0) = 0$ and $X_{t,T}$ is a locally stationary series that is anti-persistent. As a result of the time-varying long memory parameter, $X_{t,T}$ will be smoother, with less variation in its amplitude, during time periods where $d(u) > 0$, and will have large fluctuations in its value when $d(u) < 0$.

The short memory parameters found in $\Theta(B)$ and $\Phi(B)$ could also be modeled as functions of t. Since these parameters only affect the short-run dynamics of the process and our main interest is the estimation of the long memory parameters, we set the short memory parameters to zero.

Whitcher and Jensen (2000) extended Jensen's (1999a) wavelet-based ordinary least squares (OLS) estimator of the long memory parameter d to d(t), using the cone of influence to determine the location in time of the long memory estimate. They introduced an estimation procedure for the fractional differencing parameter function of a particular model, the locally stationary long memory process. Instead of arbitrarily partitioning the data, they allowed the support of the central portion of the wavelet filter to determine a scale-dependent window for computing the local wavelet variance. The estimator is calculated via ordinary least squares regression applied to the local wavelet variances. The wavelet-based estimator of the local long memory parameter is demonstrated on vertical ocean shear data.
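A deliberately crude sliding-block version of this idea is sketched below (it is ours; Whitcher and Jensen instead use the central support of the wavelet filter to define scale-dependent windows): the Haar WOLS regression of Section 4.1.3 is simply re-run on each block, and the estimate is attributed to the block center.

```python
import numpy as np

def _wols_haar(x, J=1, K=1):
    """Haar wavelet OLS estimate of d on one block (regression (4.33))."""
    a = np.asarray(x, dtype=float)
    p = int(np.log2(len(a)))
    level, js, logR = p - 1, [], []        # Jensen's level j has 2**j coefficients
    while len(a) > 1:
        w = (a[0::2] - a[1::2]) / np.sqrt(2.0)
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
        if J <= level <= p - 1 - K:
            js.append(level)
            logR.append(np.log(np.mean(w ** 2)))
        level -= 1
    return -np.polyfit(js, logR, 1)[0] / (2.0 * np.log(2.0))

def rolling_d(x, window=512, step=64):
    """Block-wise approximation to d(t): one WOLS estimate per sliding window."""
    centers, dhat = [], []
    for s0 in range(0, len(x) - window + 1, step):
        centers.append(s0 + window // 2)
        dhat.append(_wols_haar(x[s0:s0 + window]))
    return np.array(centers), np.array(dhat)
```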

Besides, Cavanaugh et al. (2002) considered the model (5.3) as a locally self-similar process with scaling function $H(t) = d(t) + \frac{1}{2}$, since many phenomena exhibit self-similar patterns that change as the phenomenon itself evolves. They proved that the model (5.3) is a locally self-similar process; then, by applying the discrete wavelet transform and partitioning the time interval, they built an approximate log-linear relationship between the wavelet coefficients and the scale and carried out the OLS regression on each partitioned subinterval, thus obtaining local estimates of the scaling parameter function. After smoothing, they obtained the shape of the parameter function, and they also proved the consistency of the estimator. Several practical applications are presented, for example to vertical ocean shear measurements, yearly water levels of the Nile river and Ethernet network traffic measurements, which illustrate that their wavelet-based method can quantify time-dependent self-similarity patterns arising in actual temporal and spatial series.

5.3 Locally Stationary k-factor Gegenbauer Processes

Long memory processes have been extensively studied over the past decades. When we deal with financial and economic data, seasonality and time-varying long-range dependence can often be observed, and thus some kind of non-stationarity can exist inside financial data sets. To take this kind of phenomenon into account, we propose a new class of stochastic processes: the locally stationary k-factor Gegenbauer process. We describe a procedure for estimating the time-varying parameters consistently by applying the discrete wavelet packet transform (DWPT). The robustness of the algorithm is investigated through a simulation study. We also propose a forecast method for this new non-stationary model.

Working with the existence of non-stationarity does not mean that we observe explosions. Due to the existence of seasonalities inside data sets, the k-factor GARMA model has been extensively applied in economics and finance, as justified by Diebold and Rudebusch (1989), Sowell (1992), Gil-Alana and Robinson (2001), Carlin and Dempster (1989), Porter-Hudak (1990), Ray (1993), Franses and Ooms (1997), Arteche and Robinson (2000), Arteche (2003), Darné et al. (2004), Ferrara and Guégan (2001a, b) and Gil-Alana and Hualde (2008), to name just a few.

Most applications within this framework assume that the data sets are stationary. In practice, however, series cannot always be made stationary by transformation, and sometimes it makes no sense to render them stationary. In general, the assumption that the underlying process is mean-reverting can be useful in practice; examples can be found in ecology, for instance in Whitcher and Jensen (2000) and Cavanaugh et al. (2002).

In this part, we extend the k-factor Gegenbauer model by assuming that the parameters $d_i$ evolve with time. We thus introduce a new, locally stationary model taking into account both the presence of persistence and the existence of seasonalities, and we provide an estimation procedure adapted to this new class of models. We consider the following model:

$$\prod_{i=1}^{k} (I - 2\cos\lambda_i\, B + B^2)^{d_i(t)}\, y_t = \varepsilon_t, \qquad (5.4)$$

where $\varepsilon_t$ is a Gaussian white noise. In this model the long memory parameters are time-varying, so the model is non-stationary; it can be regarded as a piecewise stationary process. In the spectrum of this model, we can also observe explosions at non-zero frequencies, which indicate the seasonality.

We now specify why the locally stationary k-factor Gegenbauer model is well-defined, and the conditions which ensure local stationarity. First of all, we remark that the operator $\prod_{i=1}^{k}(I - 2\cos\lambda_i\, B + B^2)^{d_i(t)}$ defines the fixed frequencies $\lambda_i$ of the spectrum, characterized by the time-varying long memory parameters $d_i(t)$ ($1 \le i \le k$). We can expand this operator in the following way:

$$\prod_{i=1}^{k} (I - 2\cos\lambda_i\, B + B^2)^{d_i(t)} = \sum_{n=0}^{\infty} \pi_n(t)\, B^n, \qquad (5.5)$$

where the coefficients $\pi_n(t)$ verify

$$\pi_n(t) = \sum_{\substack{0 \le j_1, \cdots, j_k \le n \\ j_1 + \cdots + j_k = n}} c^{(-d_1(t))}_{j_1}(\cos\lambda_1) \cdots c^{(-d_k(t))}_{j_k}(\cos\lambda_k), \qquad (5.6)$$

in which the $c^{(d(t))}_{k}(x)$ are the orthogonal Gegenbauer (or ultraspherical) polynomials defined on $[-1, 1]$. We then get the following proposition, which gives the existence and local stationarity conditions for the locally stationary k-factor Gegenbauer process.

Proposition 5.3.1. Let $d_j(t)$ ($j = 1, \cdots, k$) be regular, nonzero functions satisfying the condition

$$|d_j(t)| < \begin{cases} \dfrac{1}{2}, & \text{if } 0 < \lambda_j < \pi, \\[4pt] \dfrac{1}{4}, & \text{if } \lambda_j = 0 \text{ or } \pi. \end{cases} \qquad (5.7)$$

Then there exists a unique solution of the locally stationary k-factor Gegenbauer model, with the following representation:

$$y(t) = \sum_{n=0}^{\infty} \psi_n(t)\, \varepsilon(t - n). \qquad (5.8)$$

The coefficients $\psi_n(t)$ verify, for any given time $t$, as $n \to \infty$:

$$\psi_n(t) = 2 \sum_{k:\, 0 < \lambda_k < \pi} D(k,t)\, \frac{\Gamma(n + d_k(t))}{\Gamma(n+1)\,\Gamma(d_k(t))} \cos(\lambda_k n + \nu_k) + \sum_{k:\, \lambda_k = 0 \text{ or } \pi} D(k,t)\, \frac{\Gamma(n + 2d_k(t))}{\Gamma(n+1)\,\Gamma(2d_k(t))} \cos(\lambda_k n) + O\!\left(n^{-2 + \max\{d_1^*(t), \cdots, d_k^*(t)\}}\right), \qquad (5.9)$$

where

$$d_k^*(t) = \begin{cases} d_k(t), & \text{if } 0 < \lambda_k < \pi, \\ 2 d_k(t), & \text{if } \lambda_k = 0 \text{ or } \pi, \end{cases} \qquad \nu_k(t) = \lambda_k \sum_{j=1}^{k} d_j(t) - \pi \sum_{j=1}^{k-1} d_j(t) - \frac{d_k(t)\,\pi}{2},$$

and

$$D(k,t) = \begin{cases} |2\sin\lambda_k|^{-d_k(t)} \displaystyle\prod_{j \neq k} |2(\cos\lambda_k - \cos\lambda_j)|^{-d_j(t)}, & \text{if } 0 < \lambda_k < \pi, \\[6pt] \displaystyle\prod_{j \neq k} |2(\cos\lambda_k - \cos\lambda_j)|^{-d_j(t)}, & \text{if } \lambda_k = 0 \text{ or } \pi. \end{cases}$$

The proof of the proposition is given in the Appendix.

This proposition ensures that the locally stationary k-factor Gegenbauer process is well-defined, since every $L^2$ function can be approximated by regular functions. For the locally stationary k-factor Gegenbauer process, the parameters $d_i(t)$ evolve with time, but at each fixed time the $d_i(t)$ are constant; locally, the model therefore corresponds to the stationary k-factor Gegenbauer process.
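To make this local picture concrete, the MA coefficients of a single Gegenbauer factor $(I - 2\nu B + B^2)^{-d}$, with $d$ frozen at a given time, can be computed by the classical three-term recursion of the Gegenbauer polynomials that generate the expansions (5.5) and (5.8). The following R sketch illustrates this (the function name gegenbauer.coef is ours, not part of any package):

```r
# MA coefficients of (I - 2*nu*B + B^2)^(-d): these are the Gegenbauer
# polynomials C_n^{(d)}(nu), generated by
#   C_0 = 1,  C_1 = 2*d*nu,
#   n*C_n = 2*nu*(n + d - 1)*C_{n-1} - (n + 2*d - 2)*C_{n-2}.
gegenbauer.coef <- function(d, nu, n.max) {
  psi <- numeric(n.max + 1)
  psi[1] <- 1
  if (n.max >= 1) psi[2] <- 2 * d * nu
  if (n.max >= 2) {
    for (n in 2:n.max) {
      psi[n + 1] <- (2 * nu * (n + d - 1) * psi[n] -
                       (n + 2 * d - 2) * psi[n - 1]) / n
    }
  }
  psi
}
```

With $d$ frozen at $d(t)$, gegenbauer.coef(d, nu, n.max) gives a truncation of the coefficients $\psi_n(t)$ of (5.8) for a 1-factor process, while gegenbauer.coef(-d, nu, n.max) gives the coefficients $\pi_n(t)$ of (5.5).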

5.3.1 Procedure for Estimating di(t)

In the stationary case, semiparametric estimation methods for long memory models with seasonalities have been developed by Robinson (1995), Chung (1996a, b), Arteche and Robinson (2000) and Diongue et al. (2004).

In this part, we develop a wavelet-based procedure to estimate the time-varying long memory parameters $d_i(t)$. We investigate the properties of this new model using the discrete wavelet packet transform (DWPT), which approximately decorrelates the spectrum of the process. We first establish an approximate log-linear relationship between the time-varying wavelet variance of the DWPT coefficients and the time-varying long memory parameter $d_i(t)$; we then apply the OLS regression method locally to obtain local estimates of the time-varying parameters. Our method may be regarded as an extension of the log-linear regression techniques proposed by Whitcher (2004), and also of the estimation technique for locally self-similar parameters proposed by Cavanaugh et al. (2002).

5.3.2 Estimation Procedure

First, we assume that the sample size is dyadic ($N = 2^J$); otherwise we repeat the last data value as many times as needed to reach such a sample size.

In the first step, we restrict ourselves to a locally stationary 1-factor Gegenbauer model and assume that we observe $y_t$ ($t = 1, \cdots, N$) such that:

$$(I - 2\nu B + B^2)^{d(t)}\, y_t = \varepsilon_t, \qquad (5.10)$$

$(\varepsilon_t)_t$ being a Gaussian white noise. We assume that the time-varying fractional differencing parameter satisfies $|d(t)| < 1/2$; $B$ is the backshift operator, $y_{t-j} = B^j y_t$. In order to estimate $d(t)$ and to provide an asymptotic theory, we need to let $N$ tend to infinity. To avoid instability of $d(t)$, we suppose that we observe $d(t)$ on a finer grid (rescaling $d(t)$ on $[0, 1]$), i.e. that we observe $(y_{t,N})$ such that:

$$(I - 2\nu B + B^2)^{d(t/N)}\, y_{t,N} = \varepsilon_t. \qquad (5.11)$$

Letting $N$ tend to infinity then means that the sample $y_{1,N}, \cdots, y_{N,N}$ contains more and more observations for each value of $d(t)$.

We now characterize this locally stationary process through its spectral density; in the presence of seasonalities, this tool is more informative than the autocovariance function.

The stochastic process defined in (5.10) is a Gegenbauer process that is locally stationary in the sense of Dahlhaus (1996a), with realizations of length $N$. Its spectral density is such that:

$$f_N(\lambda) = \sigma^2_{\varepsilon_t}\, \frac{1}{(2|\cos\lambda - \nu|)^{2d(t/N)}}, \qquad -\frac{\pi}{2} < \lambda < \frac{\pi}{2}. \qquad (5.12)$$
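For illustration, the time-varying spectral density (5.12) can be evaluated in R as follows (a sketch; the function name and the example parameter function are ours):

```r
# Time-varying spectral density (5.12) of the locally stationary
# 1-factor Gegenbauer process; d.fun is d(.) on rescaled time [0, 1]
f.gegenbauer <- function(lambda, u, nu, d.fun, sigma2 = 1) {
  sigma2 / (2 * abs(cos(lambda) - nu))^(2 * d.fun(u))
}

lambda <- seq(-pi / 2, pi / 2, length.out = 513)
f.mid  <- f.gegenbauer(lambda, u = 0.5, nu = cos(pi / 3),
                       d.fun = function(u) 0.2 * u + 0.1)
# f.mid explodes around lambda = +/- acos(nu) = +/- pi/3 when d(u) > 0
```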

As the process $y_{t,N}$ is non-stationary, increasing the number of observations by measuring new realizations of the process tells us nothing about its behavior at the beginning of the period. We therefore fix the time period, and liken the increase of $N$ to measuring the process at higher and higher levels of resolution on this fixed time interval.

The spectral density of the process (5.11) is an even, $2\pi$-periodic function that is uniformly Lipschitz continuous in $t/N \in [0, 1]$. The time-varying spectral density function satisfies:

$$f_N(\lambda) \sim (2|\lambda - \cos^{-1}\nu|)^{-2d(t/N)}, \qquad \text{as } \lambda \to \cos^{-1}\nu.$$

Then, if $d(t/N) > 0$, $f_N(\lambda)$ is smooth for frequencies away from $\cos^{-1}\nu$, but is unbounded as $\lambda \to \cos^{-1}\nu$; in other words, the behavior of $y_{t,N}$ is concentrated at the frequency associated with the seasonality. This behavior can be extended to the case of several explosions inside the spectral density. This means that, on the interval $[0, N]$, we observe the locally stationary k-factor Gegenbauer process:

$$\prod_{i=1}^{k} (I - 2\nu_i B + B^2)^{d_i(t/N)}\, y_{t,N} = \varepsilon_t, \qquad (5.13)$$

where $\varepsilon_t$ is a Gaussian white noise, with the time-varying spectral density:

$$f_N(\lambda) = \sigma^2_{\varepsilon_t} \prod_{i=1}^{k} (2|\cos\lambda - \nu_i|)^{-2d_i(t/N)}. \qquad (5.14)$$

For the locally stationary 1-factor Gegenbauer process defined in (5.11), the time-varying spectral density is given by Equation (5.12). Applying the logarithmic transform to both sides of Equation (5.12), we get

$$\log f_N(\lambda) = C - 2 d\!\left(\tfrac{t}{N}\right) \log 2|\cos\lambda - \nu|, \qquad -\frac{\pi}{2} < \lambda < \frac{\pi}{2}. \qquad (5.15)$$


This suggests a simple regression of $\log f_N(\lambda)$ on $\log 2|\cos\lambda - \nu|$ to estimate the fractional differencing parameter function $d(t/N)$. For convenience, we approximate (5.15) by the following relationship:

$$\log f_N(\lambda) \approx C - 2 d\!\left(\tfrac{t}{N}\right) \log 2|\lambda - \cos^{-1}(\nu)|, \qquad -\frac{1}{2} < \lambda < \frac{1}{2}. \qquad (5.16)$$

We then partition the time interval $[0, 1)$ into $2^l$ ($0 < l < J - 1$) non-overlapping subintervals as follows:

$$I_h = [h 2^{-l}, (h+1) 2^{-l}), \qquad h = 0, \cdots, 2^l - 1.$$

On each subinterval, we suppose that the time-varying parameter $d(t/N)$ is locally constant, i.e. that the process $(y_{t,N})_t$ is locally stationary on each subinterval $I_h$. Since the time-varying wavelet variance provides an estimate of the spectral density function, the logarithmic transformation of the variance of the wavelet coefficients yields the following log-linear relationship, for $i = 1, \cdots, k$:

$$\log \sigma_i^2(\lambda_{j,n}, t) = \alpha_i(t) + \beta_i(t) \log 2|\cos\mu_{j,n} - \nu_i|, \qquad (5.17)$$

where $\sigma^2_i(\lambda_{j,n}, t)$ is the variance of the DWPT coefficients $W_{j,n}$ associated, at time $t$, with the frequency interval $\lambda_{j,n} = \left(\frac{n}{2^{j+1}}, \frac{n+1}{2^{j+1}}\right]$ (where $n = 0, \cdots, 2^j - 1$; $j = 0, \cdots, J - 1$); $\mu_{j,n}$ is the midpoint of the interval $\lambda_{j,n}$; and $\beta_i(t)$ is the slope of the log-linear relationship at time $t$. We set $d_i(t) = -\beta_i(t)/2$. We now apply locally the following approximation of Equation (5.17) in order to estimate $d_i(t)$:

$$\log \sigma_i^2(\lambda_{j,n}, t) = \alpha_i(t) + \beta_i(t) \log 2|\mu_{j,n} - \cos^{-1}(\nu_i)| + u_i(t), \qquad (5.18)$$

where $u_i(t)$ is a sequence of correlated random variables (we follow the methodology of Arteche and Robinson (2000)).

5.3.3 Procedure for Estimating $d_i(t)$ ($i = 1, \cdots, k$)

Using the previous filtering, we present a general procedure for estimating the time-varying parameter functions $d_i(t)$ ($i = 1, \cdots, k$) of the model (5.13). We assume that the sample size is dyadic ($N = 2^J$). The steps for estimating $d_i(t)$ are detailed below; an R sketch of the core regression step (steps 5 to 9) is given after the list.

1. We first detect the Gegenbauer frequency $\lambda_1$ corresponding to the highest explosion in the periodogram, and set $\nu_1 = \cos(\lambda_1)$. This frequency is kept fixed throughout the procedure.

2. We compute the DWPT coefficient vectors $W_{j,n}$ of length $N_j$ through formula (3.14), where $j = 0, \cdots, J - 1$ and $n = 0, \cdots, 2^j - 1$.


3. We associate to the vector $W_{j,n}$ an adaptive orthonormal basis $\mathcal{B}$, such that the squared gain function of the wavelet filter associated with $W_{j,n}$ is sufficiently small at the Gegenbauer frequency. Practically, we define $\mathcal{U}_{j,n}(f) = |U_{j,n}(f)|^2$ to be the squared gain function of the wavelet packet filter $u_{j,n,l}$, where $U_{j,n}(f)$ is the discrete Fourier transform (DFT) of

$$u_{j,n,l} = \sum_{k=0}^{L-1} u_{n,k}\, u_{j-1, [n/2],\, l - 2^{j-1} k}, \qquad l = 0, \cdots, L_j - 1,$$

with $u_{1,0,l} = g_l$, $u_{1,1,l} = h_l$ and $L_j = (2^j - 1)(L - 1) + 1$, where $g_l$ and $h_l$ are the scaling filter and the wavelet filter defined as before.

4. The basis selection procedure consists in selecting the combination of wavelet basis functions such that $\mathcal{U}_{j,n}(f_1) < \epsilon$ for some $\epsilon > 0$ at the minimum level $j$. However, the method of basis selection is not unique, and neither is the basis. We apply white noise tests, such as the portmanteau test, to determine the best adaptive orthonormal basis, i.e. the one that decorrelates the observed time series.

5. We partition the sampling interval $[0, 1)$ into $2^l$ non-overlapping subintervals of equal length, where $l$ is an integer chosen such that $0 < l < J - 1$; $l$ depends on the length of the data and on the required precision. The $2^l$ subintervals are

$$I_h = [h 2^{-l}, (h+1) 2^{-l}), \qquad h = 0, \cdots, 2^l - 1.$$

6. We locate the DWPT coefficients $W_{j,n,K}$ on each subinterval $I_h$. In order to construct the local estimates of the time-varying long memory parameter $d_1(t)$, we proceed according to the Heisenberg uncertainty principle: every DWPT coefficient is mapped to a rectangle (Heisenberg box) in the time-frequency plane, the boxes completely covering the plane.

7. Since the DWPT coefficient vector $W_{j,n} = (W_{j,n,K})_K$ corresponds to the frequency interval $\lambda_{j,n} = \left(\frac{n}{2^{j+1}}, \frac{n+1}{2^{j+1}}\right]$, each of its coefficients covers a time interval of width $2^j/N$ on the time-frequency plane, while the vector $W_{j,n}$ has length $N_j$. Therefore, we partition the elements of each vector $W_{j,n}$ into blocks of equal length $N_j/2^l = 2^{J-j-l}$ and attach them sequentially to the subintervals $I_h$.

8. On each subinterval $I_h$ ($h = 0, \cdots, 2^l - 1$), we consider the bivariate collection of data

$$\left\{\left(\log 2|\mu_{j,n} - \cos^{-1}(\nu_1)|,\; \log \sigma_1^2(\lambda_{j,n})\right) \mid 0 \le n \le 2^j - 1;\; 0 \le j \le J - 1\right\},$$

and we use the approximate log-linear relationship (5.18). On each subinterval $I_h$ we carry out the ordinary least squares (OLS) regression to get a local estimate of the slope $\beta_1(t)$. We thus obtain $2^l$ local estimates of $\beta_1(t)$ and, since $d_1(t) = -\beta_1(t)/2$, $2^l$ local estimates of $d_1(t)$.


9. We omit the first and the last estimates to avoid boundary effects. We associate with the interval $I_h$ the time index given by its midpoint, $2^{-l-1}(2h+1)$, and we smooth the $2^l$ estimated points by two local polynomial methods: the spline method and the loess (locally weighted scatter plot smoothing) method. We thus obtain two smoothed curves $\hat{d}_1(t)$ from the local estimates, which approximate the true parameter curve.

10. Steps 1-9 yield the estimates $\hat{d}_1(t)$ and $\hat{\nu}_1$ for the corresponding Gegenbauer frequency $\lambda_1$.

11. We now proceed in the same way to estimate the other Gegenbauer frequencies and their corresponding time-varying long memory parameters. First we compute $y^1_{t,N} := (I - 2\hat{\nu}_1 B + B^2)^{\hat{d}_1(t)} y_{t,N}$, where $\hat{d}_1(t)$ and $\hat{\nu}_1$ were obtained in the previous steps. We need to interpolate some points so that the vector $\hat{d}_1(t)$ has length $N$, since the number of points on the smoothed curve is smaller than $N$ when, for instance, the loess smoothing is used.

12. We repeat steps 1 to 9 on the vector $y^1_{t,N}$, obtaining the estimates $\hat{d}_2(t)$ and $\hat{\nu}_2$ associated with the frequency $\lambda_2$.

13. We proceed in the same way for the other Gegenbauer frequencies until the $(k+1)$-th stage, at which the filtered series is the white noise $(\varepsilon_t)_t$.

At the end, no peak remains in the periodogram, and we have $k$ pairs of estimates of the Gegenbauer frequencies and parameter functions.
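The core of steps 5 to 9, the local log-variance regression, can be sketched in R as follows. The sketch assumes that, after the basis selection of step 4, the retained packets have been mapped to a given subinterval $I_h$ and stored as a list whose elements carry the coefficients w, the level j and the band index n (for instance extracted from the output of waveslim::dwpt); frequencies are measured in cycles, so the Gegenbauer frequency is acos(ν)/(2π). The helper name estimate.d.local is ours:

```r
library(waveslim)  # dwpt() is used upstream to produce the packet coefficients

# Local estimate of d on one subinterval I_h from the retained packets.
# Each element of `packets` is assumed to be list(w = DWPT coefficients
# falling in I_h, j = level, n = band index).
estimate.d.local <- function(packets, nu) {
  f.G <- acos(nu) / (2 * pi)                 # Gegenbauer frequency in cycles
  x <- sapply(packets, function(p)
    log(2 * abs((2 * p$n + 1) / 2^(p$j + 2) - f.G)))  # midpoint of lambda_{j,n}
  y <- sapply(packets, function(p) log(mean(p$w^2)))  # local wavelet variance
  beta <- coef(lm(y ~ x))[2]                 # slope of the regression (5.18)
  unname(-beta / 2)                          # d = -beta / 2
}

# Whiteness check used in the basis selection of step 4, e.g.:
# Box.test(w, lag = 20, type = "Ljung-Box")$p.value > 0.01
```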

5.3.4 Consistency of the Estimates $\hat{d}_i(t)$ ($i = 1, \cdots, k$)

In this subsection, we study the properties of the estimates $\hat{d}_i(t)$ ($i = 1, \cdots, k$). To get $\hat{d}_i(t)$, we established above a linear regression between the variance of the DWPT wavelet coefficients $W_{j,n,t}$ and the long memory parameters $d_i(t)$. Similar approaches have been developed, in a stationary setting, by Geweke and Porter-Hudak (1983), Robinson (1995), Hurvich and Beltrao (1993) and Arteche (1998), who obtained the consistency of the constant long memory parameter $d_i$. We follow here the same method.

We introduced the spectral density of the process $(y_{t,N})_t$ in (5.14). We will assume that the assumptions A1-A2 and A4-A5 introduced in Arteche (1998) hold for $f_N(\lambda)$ defined in (5.14). Assumptions A1 and A2 specify the local behavior of the spectrum; assumption A4 corresponds to the "trimming" condition first introduced in Robinson (1995). Under these assumptions, and in the case of a 1-factor stationary Gegenbauer model, the asymptotic normality of the long memory parameter estimate is obtained.

Lemma 5.3.2. Consider the Gegenbauer model $(I - 2\nu_1 B + B^2)^{d_1} y_t = \varepsilon_t$ under the previous assumptions A1-A2 and A4-A5, and let $\hat{d}_1$ be the least squares estimate of $d_1$ obtained from the following regression:

$$\log I(\omega + \lambda_j) = c + d_1(-2\log \lambda_j) + u_j, \qquad j = l+1, \cdots, m, \qquad (5.19)$$

where $c = \log C - \eta$, $\eta = 0.5772\cdots$ is Euler's constant, $u_j = \log\!\left(\frac{I(\omega + \lambda_j)}{C \lambda_j^{-2 d_1}}\right) + \eta$, the $\lambda_j = \frac{2\pi j}{n}$ are the Fourier frequencies and $I(\lambda)$ is the periodogram. Then

$$2\sqrt{m}\,(\hat{d}_1 - d_1) \rightarrow_d N\!\left(0, \frac{\pi^2}{6}\right).$$

This lemma was proved by Arteche (1998).

Now consider the locally stationary 1-factor Gegenbauer process $(I - 2\nu B + B^2)^{d_1(t)} y_t = \varepsilon_t$. The parameter $d_1(t)$ has been estimated locally on the sequence of subintervals $I_h$ ($h = 0, \cdots, 2^l - 1$), treating the process locally as stationary, so that

$$\forall h, \qquad \sqrt{m}\,(\hat{\beta}_1(h) - \beta_1(h)) \rightarrow_d N\!\left(0, \frac{\pi^2}{6}\right), \qquad h = 0, 1, \cdots, 2^l - 1.$$

Since $d_1(t) = -\beta_1(t)/2$, we have

$$\forall h, \qquad \sqrt{m}\,(\hat{d}_1(h) - d_1(h)) \rightarrow_d N\!\left(0, \frac{\pi^2}{24}\right), \qquad h = 0, 1, \cdots, 2^l - 1.$$

In order to get a smoothed curve for $d_1(t)$, we have smoothed the $2^l$ independent estimates $(\hat{d}_1(0), \cdots, \hat{d}_1(2^l - 1))$ using two local polynomial methods, the spline method and the loess method; that is, there exist a set of basis functions $\omega_h(t)$ and a constant $C$ such that $\sum_{h=0}^{2^l-1} \omega_h(t)^2 = C < \infty$ and

$$\hat{d}_1(t) = \sum_{h=0}^{2^l-1} \omega_h(t)\, \hat{d}_1(h).$$

Thus $E[\hat{d}_1(t) - d_1(t)] = \sum_{h=0}^{2^l-1} \omega_h(t)\, E[\hat{d}_1(h) - d_1(h)] = 0$ and, by the independence of the local estimates,

$$\mathrm{Var}[\hat{d}_1(t)] = \mathrm{Var}\!\left[\sum_{h=0}^{2^l-1} \omega_h(t)\, \hat{d}_1(h)\right] = \sum_{h=0}^{2^l-1} \omega_h(t)^2\, \mathrm{Var}[\hat{d}_1(h)] = C\, \frac{\pi^2}{24} \equiv C_1.$$

Assuming that $N$ tends to infinity, so that $l$ tends to infinity, we can apply the central limit theorem and get the following result:

$$\sqrt{m}\,(\hat{d}_1(t) - d_1(t)) \rightarrow_d N(0, C_1).$$

This result remains true for all the time-varying parameters $d_i(t)$ ($i = 1, \cdots, k$) of a locally stationary k-factor Gegenbauer process.


5.4 Simulation Experiments

In this section, we carry out Monte Carlo simulations to establish the robustness, in finite samples, of the wavelet-based estimation of the parameter function $d_i(t)$. We focus on the model (5.10), with $(\varepsilon_t)_t$ a Gaussian noise:

$$(I - 2\nu B + B^2)^{d(t)}\, y(t) = \varepsilon(t). \qquad (5.20)$$

We consider the following constant, linear, quadratic, cubic, exponential and logarithmic functions $d(t)$:

1. d(t) is constant: $d_0(t) = 0.3$;

2. d(t) is linear: $d_1(t) = 0.2t + 0.1$;

3. d(t) is quadratic: $d_2(t) = 0.3(t - 0.5)^2 + 0.1$;

4. d(t) is cubic: $d_3(t) = 0.4(t - 0.5)^3 + 0.3$;

5. d(t) is exponential: $d_4(t) = 0.01 \exp(2t) + 0.3$;

6. d(t) is logarithmic: $d_5(t) = 0.1(\log(10t + 0.5) + 1.0)$.

For convenience, we assume that the observed data points $[y_1, \cdots, y_N]^T$ are equally spaced and rescaled to the time interval $[0, 1)$ through the transformation $t_i = \frac{i-1}{N}$ (where $i = 1, \cdots, N = 2^J$). In our examples we use $N = 4096 = 2^{12}$ ($J = 12$) and the Gegenbauer frequency $\cos^{-1}\nu = \lambda_G = \frac{\pi}{3}$.

For the estimation procedure, we use the MB(16) wavelet filter ($L = 16$) and choose the adaptive orthonormal basis using the portmanteau test with $p = 0.01$. We partition the sampling interval $[0, 1)$ into $2^6 = 64$ subintervals ($l = 6$) and get 64 local estimates of $d(t)$. Finally, we smooth the estimates using two local polynomial methods, the spline method and the loess method. We replicate the simulations 100 times for each locally stationary 1-factor GARMA process (5.10) with the parameter functions $d(t)$ listed above. The code is written in R, using the package "waveslim", and was run under Mac OS X 10.5.1 (Leopard).
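As an indication of how such trajectories can be generated, the following sketch freezes, at each date $t$, the truncated MA filter at $d(t/N)$, in line with the piecewise stationary reading of the model; simulate.lsg is our own name, and gegenbauer.coef is the recursion sketched in Section 5.3:

```r
# Simulate the locally stationary 1-factor process (5.20): at each t
# the MA(n.max) filter is frozen at d(t/N) and applied to the noise.
simulate.lsg <- function(N, nu, d.fun, n.max = 200) {
  eps <- rnorm(N + n.max)
  y <- numeric(N)
  for (t in 1:N) {
    psi <- gegenbauer.coef(d.fun(t / N), nu, n.max)
    y[t] <- sum(psi * eps[(t + n.max):t])  # psi_0*e_t + psi_1*e_{t-1} + ...
  }
  y
}

set.seed(1)
y1 <- simulate.lsg(N = 4096, nu = cos(pi / 3),
                   d.fun = function(u) 0.2 * u + 0.1)  # the linear d_1(t)
```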

We denote by $(y_{0,t})_t, \cdots, (y_{5,t})_t$ the process (5.10) with, respectively, the constant, linear, quadratic, cubic, exponential and logarithmic parameter functions listed above. For each process we provide the trajectory, the autocorrelation function, the spectrum, and the estimated curves smoothed by the spline and loess methods together with the true function $d(t)$; see Figures 5.1 to 5.24 for details.

In Table 5.1, numerical results for the six models $(y_{0,t})_t, \cdots, (y_{5,t})_t$ are provided, based on 100 replications. We give the mean of the estimated Gegenbauer frequencies and the mean bias and RMSE of $\hat{d}(t)$ over the 100 simulations. We deduce that:


Model      cos^{-1}(ν)   Bias of d(t)          RMSE of d(t)
y_{0,t}    0.333403      spline: -0.159373     spline: 0.174109
                         loess:  -0.120514     loess:  0.127925
y_{1,t}    0.33262       spline: -0.115436     spline: 0.126721
                         loess:  -0.089052     loess:  0.095458
y_{2,t}    0.335110      spline: -0.006068     spline: 0.039389
                         loess:  -0.001749     loess:  0.032570
y_{3,t}    0.333198      spline: -0.166090     spline: 0.178187
                         loess:  -0.123020     loess:  0.130845
y_{4,t}    0.333235      spline: -0.169756     spline: 0.185885
                         loess:  -0.122901     loess:  0.131963
y_{5,t}    0.333284      spline: -0.171724     spline: 0.185447
                         loess:  -0.133765     loess:  0.138645

Table 5.1: Estimation of Gegenbauer frequencies, bias and RMSE for $(y_{0,t})_t, (y_{1,t})_t, (y_{2,t})_t, (y_{3,t})_t, (y_{4,t})_t, (y_{5,t})_t$.

1. Each estimated curve approximates the general shape of the time-varying parameter function. The curve rebuilt with the loess smoothing appears better than the one rebuilt with the spline smoothing.

2. The estimates of the Gegenbauer frequencies have quite small bias. The small values of the bias and RMSE of the estimated parameter suggest that our algorithm is robust. Comparing the two smoothing methods, we find that in most cases the loess method performs slightly better than the spline method.

[Figure 5.1: Sample path of $(y_{0,t})_t$]

[Figure 5.2: ACF of $(y_{0,t})_t$]

[Figure 5.3: Spectrum of $(y_{0,t})_t$]

[Figure 5.4: $d_0(t)$ (smoothed by spline and loess method) for $(y_{0,t})_t$]

[Figure 5.5: Sample path of $(y_{1,t})_t$]

[Figure 5.6: ACF of $(y_{1,t})_t$]

[Figure 5.7: Spectrum of $(y_{1,t})_t$]

5.5 Forecast for Non-stationary Processes

There is a large literature on forecasting stationary processes; see, for example, Geweke and Porter-Hudak (1983), Barkoulas and Baum (1997), Noakes et al. (1988), Ray (1993a), Smith and Yadav (1994), Crato and Ray (1996) and Brodsky and Hurvich (1999).

For forecasting non-stationary processes, on the other hand, there is to our knowledge almost no literature to refer to. Since we have proposed a new non-stationary process together with a consistent and robust wavelet-based estimation procedure, it is necessary to provide a forecasting method for this new model.

The best prediction at horizon h for a time series $y_t$ is understood in the sense of the minimum mean squared error (MSE). The forecast provided by the least squares predictor is given by the following expression:

$$\hat{y}_t(h) = E(y_{t+h} \mid I_t), \qquad \text{where } I_t = \sigma(y_s,\, s \le t).$$

We assess the predictive ability of the model by considering the root mean square error (RMSE) of prediction, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{l=1}^{h} (y_{t+l} - \hat{y}_t(l))^2},$$

where $h$ is the forecast horizon and $\hat{y}_t(l)$ is the predicted value of $y_{t+l}$; see Priestley (1981) for reference.
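In R, these two criteria are immediate (actual holding $y_{t+1}, \cdots, y_{t+h}$ and forecast holding $\hat{y}_t(1), \cdots, \hat{y}_t(h)$):

```r
# Forecast criteria over a horizon h
bias <- function(actual, forecast) mean(forecast - actual)
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
```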


[Figure 5.8: $d_1(t)$ (smoothed by spline and loess method) for $(y_{1,t})_t$]

[Figure 5.9: Sample path of $(y_{2,t})_t$]

[Figure 5.10: ACF of $(y_{2,t})_t$]

[Figure 5.11: Spectrum of $(y_{2,t})_t$]

Now we consider the forecast of non-stationary processes, in particular of the locally stationary k-factor Gegenbauer process that we have proposed:

$$\prod_{i=1}^{k} (I - 2\cos\lambda_i\, B + B^2)^{d_i(t)}\, y_t = \varepsilon(t), \qquad (5.21)$$

where $(\varepsilon_t)_t$ is a white noise with variance $\sigma^2_\varepsilon$.

Our strategy for predicting the locally stationary k-factor Gegenbauer process is as follows:

1. Make the n-step-ahead forecasts of the long memory parameter functions, $\hat{d}_t(h) = E(d(t+h) \mid I_t)$, where $I_t = \sigma(d(s),\, s \le t)$. We start from the estimates of $d_i(t)$ obtained by the wavelet-based algorithm and smoothed by the spline and loess methods; the prediction is thus carried out on the estimated polynomial curves.

2. Compute the forecasts of $y_t$ using Equations (5.5) and (5.6).

In detail, suppose that the series $y_1, \cdots, y_T$ has been modeled by the locally stationary k-factor Gegenbauer process (5.21), with estimated time-varying parameters $\hat{d}_i(t)$, $t = 1, \cdots, T$. First we make the n-step-ahead forecasts $\hat{d}_i(T+h)$ ($h = 1, \cdots, n$) of the time-varying long memory parameters. One smoothed curve is obtained by the cubic smoothing spline method: we predict the smoothing spline fit at new points, the predicted fit being linear beyond the original data. The other smoothed curve is obtained by the loess method, which locally fits a polynomial surface determined by one or more numerical predictors.

[Figure 5.12: $d_2(t)$ (smoothed by spline and loess method) for $(y_{2,t})_t$]

[Figure 5.13: Sample path of $(y_{3,t})_t$]

[Figure 5.14: ACF of $(y_{3,t})_t$]

[Figure 5.15: Spectrum of $(y_{3,t})_t$]

According to the results of the Monte Carlo experiments, a degree of smoothing of 0.75 appears to give the best estimation behavior for the loess method, compared with the other degrees of smoothing, and a few iterations of an M-estimation procedure with Tukey's biweight are used to produce the forecast of $d(t)$. We then obtain the forecasts $\hat{y}_{T+h}$ ($h = 1, \cdots, n$) using the orthogonal Gegenbauer coefficients. To assess the quality of the forecasts, we measure the corresponding bias and RMSE.
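This extrapolation step can be sketched in R as follows (forecast.d is our own name; predictions from smooth.spline are linear beyond the original data, and loess with family = "symmetric" performs the few iterations of M-estimation with Tukey's biweight mentioned above, surface = "direct" being required for predict() to extrapolate):

```r
# h-step-ahead extrapolation of the estimated parameter curve d(t);
# t.obs and d.hat are the rescaled time points and the local estimates
forecast.d <- function(t.obs, d.hat, h, N) {
  t.new <- (N + 1:h) / N                   # future rescaled time points
  sp <- smooth.spline(t.obs, d.hat)
  lo <- loess(d.hat ~ t.obs, span = 0.75, family = "symmetric",
              control = loess.control(surface = "direct"))
  list(spline = predict(sp, t.new)$y,
       loess  = predict(lo, newdata = data.frame(t.obs = t.new)))
}
```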

[Figure 5.16: $d_3(t)$ (smoothed by spline and loess method) for $(y_{3,t})_t$]

[Figure 5.17: Sample path of $(y_{4,t})_t$]

[Figure 5.18: ACF of $(y_{4,t})_t$]

[Figure 5.19: Spectrum of $(y_{4,t})_t$]

[Figure 5.20: $d_4(t)$ (smoothed by spline and loess method) for $(y_{4,t})_t$]

[Figure 5.21: Sample path of $(y_{5,t})_t$]

[Figure 5.22: ACF of $(y_{5,t})_t$]

[Figure 5.23: Spectrum of $(y_{5,t})_t$]

[Figure 5.24: $d_5(t)$ (smoothed by spline and loess method) for $(y_{5,t})_t$]

Chapter 6

Applications

This chapter presents applications to financial and energy data of the estimation theory for stationary and non-stationary long memory processes. In particular, for the two series considered, we apply the new non-stationary model and the wavelet-based estimation method proposed in Chapter 5, and we then make predictions.

6.1 Nikkei Stock Average 225 Index Data

6.1.1 Data Set

Many authors have studied applications to stock markets, for example Dufrénot et al. (2005a) and Ferrara and Guégan (2001a). In this section we consider the Nikkei Stock Average 225 (NSA 225) spot index and futures price, which correspond to 4096 daily observations of the spot index and the futures price of the NSA 225, covering the period from January 2nd, 1989 to September 13th, 2004. Daily closing values of the spot index and the settlement prices of the futures contracts are used. The regular futures contracts mature in March, June, September and December. For further details on the futures price series, we refer to Lien and Tse (1999). The data sets are available from Thomson Datastream.

6.1.2 Modeling

Figure 6.1 represents the daily spot index and daily futures prices from January 2nd, 1989 to September 13th, 2004. Let $(S_t)_t$ denote the logarithm of the spot price and $(F_t)_t$ the logarithm of the futures price. Lien and Tse (1999) assumed that $(S_t)_t$ and $(F_t)_t$ were both integrated of order one and modeled the relationship between them using an error correction model (ECM), proposed by Engle and Granger (1987). Current prices are affected by past prices and by the error correction term, and the authors used the following relationship:

$$\Delta S_t = \phi_0 + \sum_{i=1}^{p} \phi_i \Delta S_{t-i} + \sum_{j=1}^{q} \psi_j \Delta F_{t-j} + \gamma Z_{t-1} + \varepsilon^S_t, \qquad (6.1)$$

where $(Z_t)_t$ is such that $Z_t = F_t - S_t$, for $t = 1, \cdots, T$. Figure 6.2 represents the error correction term $(Z_t)_t$, the difference between the log futures price and the log spot price.
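For illustration, the ECM regression (6.1) with $p = q = 1$ can be fitted by ordinary least squares as follows (a sketch; S and F are assumed to hold the log spot and log futures series):

```r
# ECM (6.1) with p = q = 1:
#   dS_t = phi0 + phi1*dS_{t-1} + psi1*dF_{t-1} + gamma*Z_{t-1} + eps_t
n  <- length(S)
dS <- diff(S)                  # Delta S_t for t = 2..n
dF <- diff(F)
Z  <- F - S                    # error correction term Z_t
ecm <- lm(dS[-1] ~ dS[-(n - 1)] + dF[-(n - 1)] + Z[2:(n - 1)])
summary(ecm)                   # phi0, phi1, psi1, gamma
```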

Our aim is to model the error correction term $(Z_t)_t$ using the non-stationary model and the method developed in Chapter 5. Lien and Tse (1999) and Ferrara and Guégan (2001b) have already considered this problem using stationary models on shorter periods (from January 1989 to August 1997 and from May 1992 to August 1996, respectively). In the ECM of Lien and Tse (1999), the spot and futures prices are integrated of order one but the basis (the difference between the futures price and the spot index) is fractionally integrated, whereas Ferrara and Guégan (2001b) modeled the basis in the ECM by a stationary Gegenbauer process, which proved more efficient than the modeling proposed by Lien and Tse (1999).

However, we consider here a much longer period, over which the series is not necessarily globally stationary. Since we observe volatility in $(Z_t)_t$, it seems appropriate to model the series by a non-stationary model with time-varying parameters. In the following, we propose to model the series using the locally stationary k-factor GARMA process.

In the first step, we use wavelet multiresolution analysis to remove the time-varying mean. For this purpose, we apply the maximal overlap discrete wavelet transform (MODWT) with $J = 6$ and a Daubechies least asymmetric LA(8) wavelet filter, and perform the multiresolution analysis shown in Figure 6.3. The wavelet details $D_1, \cdots, D_6$ have zero mean, while the wavelet smooth $S_6$, associated with the low-frequency band $[0, 1/64]$, captures the trend of the series. To remove the time-dependent mean, we discard the wavelet smooth and sum the six wavelet details, obtaining the residuals $Z_t - S_{6,t}$, where $S_{6,t}$ is the wavelet smooth. The residuals thus retain the periodicity of the original data.
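This detrending step corresponds, for instance, to the following sketch with the waveslim package (assuming Z holds the error correction series, padded to a dyadic length):

```r
library(waveslim)

# MODWT multiresolution analysis, LA(8) filter, J = 6
mra.Z <- mra(Z, wf = "la8", J = 6, method = "modwt")
S6    <- mra.Z[["S6"]]  # wavelet smooth: the time-varying mean
resid <- Z - S6         # equivalently D1 + ... + D6: detrended series
```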

In the second step, we estimate the Gegenbauer frequency $\lambda_G = 0.015$, which corresponds to the highest explosion in the periodogram. We then apply the DWPT to the residuals $Z_t - S_{6,t}$, choose the orthonormal basis, locate the DWPT coefficients on the 64 evenly partitioned subintervals, compute the variance locally, and carry out the approximate OLS regression on each subinterval. Finally, we smooth the local estimates of the long memory parameter by the spline and loess methods, leaving out the first and last estimated points to avoid boundary effects.

We thus fit the residuals by the following model:

$$(I - 2 \times 0.995\, B + B^2)^{d(t)} (Z_t - S_{6,t}) = \varepsilon_t, \qquad (6.2)$$

where $d(t)$ is the estimated curve presented in Figure 6.4, $\nu = 0.995 = \cos(2\pi\lambda_G)$, and $S_{6,t}$ is the wavelet smooth obtained from the multiresolution analysis of Figure 6.3. In Figure 6.4, the thin solid curve is the estimated parameter function smoothed by the spline method, and the thick solid curve is the one smoothed by the loess method.


[Figure 6.1: Trajectory of NSA 225 index (02/01/1989-13/09/2004)]

Figure 6.4 also shows, as dashed and dotted lines, the estimates obtained with two other semi-parametric methods, the Robinson (1995) method and the Whittle (1951) method, which treat the parameter function in model (6.2) as a constant. Comparing our result with that of Ferrara and Guégan (2001b) over the same period, we obtain similar behavior, while our result captures the local changes more precisely.

The new model and method developed in Chapter 5 therefore extend earlier work by capturing local characteristics, which indicates the advantages of our methodology. On a short period it may be reasonable to assume stationarity of the series; on a longer period, however, global stationarity is not always satisfied, and it is more appropriate to consider non-stationary processes with time-varying parameters and to work locally.


[Figure 6.2: Error correction term $(Z_t)_t$]

[Figure 6.3: Multiresolution analysis of $(Z_t)_t$ ($J = 6$)]

[Figure 6.4: $d(t)$ smoothed by spline and loess method]

        h = 1      h = 2      h = 3      h = 4      h = 5      h = 6      h = 7
Bias    23.30081   23.06577   23.26697   22.83360   22.88856   23.30048   23.20274
RMSE    0.465407   0.3278092  0.2697851  0.2288531  0.2071427  0.1911432  0.1750827

Table 6.1: Results of the h-step-ahead predictions of the error correction term in the ECM of the NSA 225 index, using the locally stationary 1-factor Gegenbauer model (parameter function smoothed by the spline method).

6.1.3 Forecast

In the previous part, we modeled the error correction term in the ECM of the NSA 225 index by a locally stationary 1-factor Gegenbauer process. We now turn to forecasting with this model. According to the forecasting theory described in Chapter 5, the critical point is to forecast the time-varying long memory parameter function $d(t)$.

Following the strategy described in Chapter 5, we first make the h-step-ahead ($h = 1, \cdots, 7$) forecasts of $d(t)$, smoothed by the spline method and by the loess method. Once the h-step-ahead predictions of $d(t)$ are obtained, we can deduce the h-step-ahead predictions of $Z_t$. We present below the prediction results according to the smoothing method used for the estimated long memory parameter function.

• The first case: the parameter function is obtained by spline smoothing.

We first make the h-step-ahead forecasts of $d(t)$ ($h = 1, \cdots, 7$); the predictions of $d(t)$ are shown in Figures 6.5 to 6.11. We then use these results to forecast the error correction term $Z_t$; Figures 6.12 to 6.18 show the corresponding predictions of $Z_t$ for $h = 1, \cdots, 7$.

From the graphs, there seem to be some differences between the true trajectory (see Figure 6.2) and the predicted one; to our understanding, this is partly due to the different scales of the graphs. For a clearer picture we refer to the forecast criteria: Table 6.1 presents the bias and RMSE of the h-step-ahead predictions. Although $Z_t$ is a long series, of length 4096, the bias and the RMSE are quite small, which is satisfying.

• The second case: the parameter function is smoothed by the loess method.

We first make the h-step-ahead forecasts of $d(t)$ ($h = 1, \cdots, 7$); the predictions of $d(t)$ are shown in Figures 6.19 to 6.25. We then use these results to forecast the error correction term $Z_t$, following the strategy described in the previous chapter.

[Figure 6.5: NSA: 1-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.6: NSA: 2-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.7: NSA: 3-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.8: NSA: 4-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.9: NSA: 5-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.10: NSA: 6-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.11: NSA: 7-step-ahead forecast for $d(t)$ smoothed by spline method]

[Figure 6.12: 1-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

[Figure 6.13: 2-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

[Figure 6.14: 3-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

[Figure 6.15: 4-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

[Figure 6.16: 5-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

[Figure 6.17: 6-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

[Figure 6.18: 7-step-ahead forecast for the error term in the ECM of NSA 225 index (smoothed by spline method)]

        h = 1      h = 2      h = 3      h = 4      h = 5      h = 6      h = 7
Bias    21.30018   21.20002   21.2284    20.79552   21.42539   21.4057    21.49394
RMSE    0.4323617  0.3029903  0.2462552  0.2110108  0.1936811  0.1756666  0.1647264

Table 6.2: Results of the h-step-ahead predictions of the error correction term in the ECM of the NSA 225 index, using the locally stationary Gegenbauer model (parameter function smoothed by the loess method).

The corresponding predictions of $Z_t$ can be observed in Figures 6.26 to 6.32.

Similarly, there seem to be some differences between the true trajectory (see Figure 6.2) and the predicted one, which we again attribute partly to the different scales of the graphs. For a clearer picture we refer to the two forecast criteria: Table 6.2 presents the bias and RMSE of the h-step-ahead predictions. The bias and the RMSE are not too large in comparison with the length of the series $Z_t$, which is satisfying.

From the numerical point of view, the forecasts based on the loess smoothing are slightly better than those based on the spline smoothing, which is consistent with the conjecture drawn from the simulation experiments on estimation.

[Figure 6.19: NSA: 1-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.20: NSA: 2-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.21: NSA: 3-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.22: NSA: 4-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.23: NSA: 5-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.24: NSA: 6-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.25: NSA: 7-step-ahead forecast for $d(t)$ smoothed by loess method]

[Figure 6.26: 1-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

[Figure 6.27: 2-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

[Figure 6.28: 3-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

[Figure 6.29: 4-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

[Figure 6.30: 5-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

[Figure 6.31: 6-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

[Figure 6.32: 7-step-ahead forecast for the error term in the model of NSA 225 index (smoothed by loess method)]

6.2 WTI Oil Data

Crude oil prices behave like those of many other commodities, with wide price swings in times of shortage or oversupply, so predicting the oil price is quite important. West Texas Intermediate (WTI), also known as Texas Light Sweet, is a type of crude oil used as a benchmark in oil pricing and is the underlying commodity of the New York Mercantile Exchange's oil futures contracts. This oil type is often referenced in North American news reports about oil prices, alongside North Sea Brent Crude.

Historical price data for WTI can be found on a web site maintained by the Energy Information Administration, Department of Energy, where it is listed as WTI, Cushing, Oklahoma. The typical price difference per barrel is about $1 more than Brent and $2 more than the OPEC basket.

Denote by $X_t$ the WTI oil price. The length of the data we consider is 5110. We observe that $X_t$ is not stationary (see Figure 6.33). The moments of $X_t$ are as follows: mean $= 23.76919$, variance $= 100.1522$, skewness $= 1.993175$, kurtosis $= 7.363735$. We carry out the augmented Dickey-Fuller test and find that the process $X_t$ is not stationary. We therefore have two choices for the modeling.
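These diagnostics correspond, for example, to the following R commands (a sketch; adf.test comes from the tseries package and X is assumed to hold the WTI price series):

```r
library(tseries)

c(mean = mean(X), variance = var(X))  # sample moments of X_t
adf.test(X)                           # augmented Dickey-Fuller test
U <- diff(X)                          # U_t = X_t - X_{t-1}
Z <- U - mean(U)                      # de-meaned differences
```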

First, we can apply stationary models. To obtain stationarity, we difference $X_t$, setting $U_t = X_t - X_{t-1}$ (see Figure 6.34 for details on $U_t$). In view of the ACF and PACF of $U_t$, we de-mean the series, setting $Z_t = U_t - \mathrm{mean}(U_t)$, and model $Z_t$ by autoregressive models. The other choice is to carry out the modeling in the non-stationary setting, using the new non-stationary model and estimation algorithm proposed in Chapter 5.

6.2.1 Fitting by a Stationary Model: AR(1)+FI(d)

If we choose to model $Z_t$ by an AR(1) model, we get the following equation:

$$Z_t = a_1 Z_{t-1} + \varepsilon_t,$$

where $a_1 = -0.04144$. Figure 6.35 shows how the AR(1) model fits $Z_t$. Given this fit, we turn to the residuals $(\varepsilon_t)_t$; the related information is exhibited in Figure 6.36, and the residuals look like a noise. We therefore study the squared residuals to investigate the volatility of the oil price (see Figure 6.37 for information on $(\varepsilon_t^2)_t$). Since there is an explosion near the zero frequency in its spectrum, we fit the volatility $(\varepsilon_t^2)_t$ by a fractionally integrated model FI(d):

$$(I - B)^d\, \varepsilon_t^2 = \nu_t,$$

where $d = 0.1234373$. From Figure 6.38, we do not find long memory behavior in the residual $\nu_t$; however, $\nu_t$ is not exactly a white noise. Since this residual is somewhat complicated, we simply treat it as a Gaussian white noise.

Thus, we model $Z_t$ as follows:

$$\begin{cases} Z_t = a_1 Z_{t-1} + \varepsilon_t, \\ (I - B)^d\, \varepsilon_t^2 = \nu_t, \end{cases} \qquad (6.3)$$

where $a_1 = -0.0414$, $d = 0.1234373$ and $\nu_t$ is a Gaussian white noise with mean zero.
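Model (6.3) can be fitted in two stages, for example as follows (a sketch; fdGPH from the fracdiff package implements the GPH log-periodogram estimator used for d):

```r
library(fracdiff)

fit.ar1 <- arima(Z, order = c(1, 0, 0), include.mean = FALSE)
eps     <- residuals(fit.ar1)          # AR(1) residuals
d.hat   <- fdGPH(eps^2)$d              # long memory in the volatility
```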

Graphically, however, this model does not seem to fit the series $(Z_t)_t$ very well.

6.2.2 Fitting by a Stationary Model: AR(2)+FI(d)

If we model $Z_t$ by an AR(2) model, we get the following equation:

$$Z_t = a_1 Z_{t-1} + a_2 Z_{t-2} + \varepsilon_t,$$

where $a_1 = -0.0422963$ and $a_2 = -0.02060408$. Figure 6.39 shows how $Z_t$ is fitted by the AR(2) model. Given this fit, we turn to the residuals $(\varepsilon_t)_t$, whose related information is exhibited in Figure 6.40; they look like a noise. We therefore study the squared residuals to investigate the volatility of the oil price (see Figure 6.41 for information on $(\varepsilon_t^2)_t$). Since there is an explosion near the zero frequency in the spectrum, we fit the volatility $(\varepsilon_t^2)_t$ by a fractionally integrated model FI(d):

$$(I - B)^d\, \varepsilon_t^2 = \nu_t,$$


[Figure 6.33: WTI: sample path, spectrum, ACF and PACF of $(X_t)_t$]

where $d = 0.1230429$, estimated by the GPH method. The residual $\nu_t$ exhibits no long memory behavior (see Figure 6.42); it can be regarded as a kind of noise, although not exactly a white noise. For simplicity, we treat it as a Gaussian white noise.

Thus, we model $Z_t$ as follows:

$$\begin{cases} Z_t = a_1 Z_{t-1} + a_2 Z_{t-2} + \varepsilon_t, \\ (I - B)^d\, \varepsilon_t^2 = \nu_t, \end{cases} \qquad (6.4)$$

where $a_1 = -0.0422963$, $a_2 = -0.02060408$, $d = 0.1230429$ and $\nu_t$ is a Gaussian white noise with zero mean.

Similarly, from the graphs, this model does not seem to fit the series $Z_t$ very well.

[Figure 6.34: WTI: sample path, spectrum, ACF and PACF of $(U_t)_t$]

[Figure 6.35: WTI: fit of $Z_t$ by the AR(1) model]

Time

epsi

0 1000 3000 5000

−10

−5

05

0.0 0.1 0.2 0.3 0.4 0.5

5e−

045e

−02

frequency

spec

trum

Series: xRaw Periodogram

bandwidth = 5.64e−05

0 10 20 300.

00.

40.

8

Lag

AC

F

Series epsi

0 5 10 20 30

−0.

040.

00

Lag

Par

tial A

CF

Series epsi

Figure 6.36: WTI: AR(1)+FI(d), The sample path, spectrum, ACF and PACF of the resid-ual (εt)t of the AR(1) term

[Figure 6.37: WTI: AR(1)+FI(d), the sample path, spectrum, ACF and PACF of the volatility (ε_t^2)_t]

[Figure 6.38: WTI: AR(1)+FI(d), the residual (ν_t)_t of the FI(d) term]

[Figure 6.39: WTI: Fit of Z_t by the AR(2) model]

[Figure 6.40: WTI: AR(2)+FI(d), the sample path, spectrum, ACF and PACF of the residual (ε_t)_t of the AR(2) term]

[Figure 6.41: WTI: AR(2)+FI(d), the sample path, spectrum, ACF and PACF of the volatility (ε_t^2)_t]

[Figure 6.42: WTI: AR(2)+FI(d), the residuals (ν_t)_t of the FI(d) term]


6.2.3 Fitting by Non-stationary Model Using Wavelet Method

First we detrend X_t through multiresolution analysis (MRA), applying the MODWT to X_t with the Daubechies least asymmetric "LA8" wavelet filter, a decomposition depth of 6, and the decomposed vector assumed to be periodic on its defined interval. That is to say, Z_t = X_t − S_{6,t}, where S_{6,t} is the wavelet smooth of the multiresolution analysis presented in Figure 6.43. We investigate the properties of Z_t in Figure 6.44. Observing the behavior of the ACF and the spectrum of Z_t, we decide to model Z_t by a locally stationary 1-factor Gegenbauer process with a time-varying parameter function:

(I − 2νB + B²)^{d(t)} Z_t = ε_t.

We estimate the Gegenbauer frequency as the frequency of the maximum of the periodogram and get λ_G = 0.01049805. Then we apply the wavelet-based algorithm proposed in Chapter 5: we partition the time interval evenly, carry out the OLS regression using DWPT wavelet coefficients on each subinterval, and smooth the estimates by the spline and loess methods.
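The detrending step can be sketched in R with the waveslim package, which provides the MODWT-based multiresolution analysis used here; the call below mirrors the stated choices (LA(8) filter, 6 levels, periodic boundary), while the periodogram-based estimate of the Gegenbauer frequency uses base R. This is a minimal sketch, not the exact code of the thesis.

library(waveslim)

# MODWT multiresolution analysis: x = D1 + ... + D6 + S6
detrend_mra <- function(x) {
  m  <- mra(x, wf = "la8", J = 6, method = "modwt", boundary = "periodic")
  s6 <- m[["S6"]]            # wavelet smooth (trend)
  list(z = x - s6, trend = s6)
}

# Gegenbauer frequency: location of the periodogram maximum
gegenbauer_freq <- function(z) {
  sp <- spec.pgram(z, taper = 0, detrend = FALSE, plot = FALSE)
  sp$freq[which.max(sp$spec)]
}

# Usage (x assumed to be the raw WTI series):
# dt   <- detrend_mra(x)
# lamG <- gegenbauer_freq(dt$z)   # about 0.0105 for the WTI data
# nu   <- cos(2 * pi * lamG)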

Thus we model X_t as follows:

X_t = Z_t + S_{6,t},
(I − 2νB + B²)^{d(t)} Z_t = ε_t,        (6.5)

where ν = cos(2πλ_G) = 0.9978253, ε_t is a Gaussian white noise with zero mean, and the estimate of d(t) is presented in Figure 6.45; the thin curve is smoothed by the spline method, the thick curve by the loess method.
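For simulation or forecasting under model (6.5), the Gegenbauer filter (I − 2νB + B²)^{−d} can be expanded in the Gegenbauer polynomials, which obey the classical recursion ψ_0 = 1, ψ_1 = 2dν, ψ_j = 2ν((d − 1)/j + 1)ψ_{j−1} − (2(d − 1)/j + 1)ψ_{j−2}; with a time-varying d(t) one recomputes the (truncated) filter at each t. The sketch below is a minimal base-R illustration under these assumptions, with an arbitrary truncation lag J.

# Truncated MA coefficients of (I - 2 nu B + B^2)^(-d)
gegenbauer_coef <- function(d, nu, J = 500) {
  psi <- numeric(J + 1)
  psi[1] <- 1                 # psi_0
  psi[2] <- 2 * d * nu        # psi_1
  for (j in 2:J)
    psi[j + 1] <- 2 * nu * ((d - 1) / j + 1) * psi[j] -
                  (2 * (d - 1) / j + 1) * psi[j - 1]
  psi
}

# Locally stationary simulation: refresh the filter as d(t) varies
simulate_ls_gegenbauer <- function(d_fun, nu, n, J = 500) {
  e <- rnorm(n + J)
  sapply(1:n, function(t) {
    psi <- gegenbauer_coef(d_fun(t / n), nu, J)
    sum(psi * e[(t + J):t])   # psi_j * e_{t-j}, j = 0..J
  })
}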

Observing Figure 6.45 carefully, we find that the estimated function d(t) does not lie within the definition interval (−1/2, 1/2) for either smoothing method, which is awkward. To solve this problem, we difference the series so that the parameter function satisfies its definition interval. After differencing, we investigate the properties of Z_t in Figure 6.46. We thus have the following model:

X_t = Z_t + S_{6,t},
(I − 2νB + B²)^{d(t)+1} Z_t = ε_t,        (6.6)

where ν = cos(2πλ_G) = 0.9978253, ε_t is a Gaussian white noise with zero mean, and the estimate of d(t) is presented in Figure 6.47; the thin curve is smoothed by the spline method, the thick curve by the loess method.

Now the estimated parameter function smoothed by the loess method lies well inside the definition interval, while part of the function smoothed by the spline method still lies outside the interval (−1/2, 1/2).
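The smoothing and range check can be sketched as follows in base R, assuming d_hat holds the raw OLS estimates of d(t) on a grid of normalized times u in [0, 1]; smooth.spline and loess are the two smoothers named above, and the final test flags estimates outside (−1/2, 1/2), the admissible range of the locally stationary Gegenbauer parameter. The function name and span value are illustrative.

# d_hat: raw subinterval estimates of d(t); u: normalized time grid in [0, 1]
check_d_range <- function(u, d_hat, span = 0.5) {
  d_spline <- predict(smooth.spline(u, d_hat), u)$y      # spline smoothing
  d_loess  <- predict(loess(d_hat ~ u, span = span))     # loess smoothing
  inside   <- function(d) all(d > -0.5 & d < 0.5)        # definition interval
  list(spline_ok = inside(d_spline),
       loess_ok  = inside(d_loess),
       d_spline  = d_spline, d_loess = d_loess)
}

# If neither curve lies in (-1/2, 1/2), difference Z_t once and re-estimate,
# as in model (6.6): the fitted exponent then corresponds to d(t) + 1.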

[Figure 6.43: WTI: The multiresolution analysis of (X_t)_t]

[Figure 6.44: WTI: The sample path, spectrum, ACF and PACF of (Z_t)_t]

[Figure 6.45: WTI: The estimated parameter (d_t)_t]

[Figure 6.46: WTI: The sample path, spectrum, ACF and PACF of (Z_t)_t after differencing]

[Figure 6.47: WTI: The estimated parameter (d_t)_t after differencing]


6.2.4 Forecast

In this section, we use the previous estimation results to forecast the WTI oil price data.

Forecast for the Model Fitted by the AR(1)+FI(d) Model

The key step here is to forecast the volatility ε_t^2. We regard (ν_t)_t as a Gaussian white noise with zero mean and generate the volatility by

ε_t^2 = (I − B)^{−d} ν_t.

Figures 6.48 (h = 1), 6.50 (h = 2), 6.52 (h = 3), 6.54 (h = 4), 6.56 (h = 5), 6.58 (h = 6) and 6.60 (h = 7) show the h-step-ahead forecasts (h = 1, ..., 7) for the volatility. We then run the autoregression to obtain the h-step-ahead predictions of Z_t, presented in Figures 6.49 (h = 1), 6.51 (h = 2), 6.53 (h = 3), 6.55 (h = 4), 6.57 (h = 5), 6.59 (h = 6) and 6.61 (h = 7). In Table 6.3, we evaluate the forecast behavior by the bias and the RMSE.

Observing the graphs, we find that the forecasts for (ε_t^2)_t and (Z_t)_t are not far from the mean, but they do not capture the local volatilities very well, and the bias and the RMSE are somewhat large.
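A fractionally integrated forecast can be sketched from the AR(∞) representation of (I − B)^d, whose coefficients satisfy the recursion π_0 = 1, π_j = π_{j−1}(j − 1 − d)/j, so that x_t = −Σ_{j≥1} π_j x_{t−j} + ν_t. The base-R sketch below iterates this relation to produce h-step-ahead predictions of the (demeaned) squared residuals; the truncation lag and the function name are illustrative assumptions.

# h-step-ahead forecast of an FI(d) series via its truncated AR representation
fi_forecast <- function(x, d, h = 7, trunc = 1000) {
  J <- min(trunc, length(x))
  pcoef <- numeric(J); pcoef[1] <- -d          # pi_1 = -d
  for (j in 2:J) pcoef[j] <- pcoef[j - 1] * (j - 1 - d) / j
  xx <- x
  for (i in 1:h) {                             # iterate one step at a time
    past <- rev(tail(xx, J))                   # x_t, x_{t-1}, ...
    xx   <- c(xx, -sum(pcoef * past))          # forecast of nu is zero
  }
  tail(xx, h)
}

# Usage: forecast the demeaned squared residuals, then add the mean back
# eps2c <- res$eps2 - mean(res$eps2)
# pred  <- fi_forecast(eps2c, d = 0.1234373, h = 7) + mean(res$eps2)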

[Figure 6.48: WTI: The 1-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.49: WTI: The 1-step-ahead prediction of Z_t using the AR(1)+FI(d) model]

[Figure 6.50: WTI: The 2-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.51: WTI: The 2-step-ahead prediction of (Z_t)_t with the AR(1)+FI(d) model]
[Figure 6.52: WTI: The 3-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.53: WTI: The 3-step-ahead prediction of Z_t using the AR(1)+FI(d) model]

        h = 1     h = 2     h = 3     h = 4     h = 5     h = 6     h = 7
Bias    6489.46   6503.977  6451.939  6451.207  6377.762  6431.554  6509.753
RMSE    105.2755  74.59163  60.41483  52.38439  46.2088   42.57117  39.73764

Table 6.3: Results of the h-step-ahead predictions of Z_t for the WTI oil price data using the AR(1)+FI(d) model.


[Figure 6.54: WTI: The 4-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.55: WTI: The 4-step-ahead prediction of (Z_t)_t with the AR(1)+FI(d) model]
[Figure 6.56: WTI: The 5-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.57: WTI: The 5-step-ahead prediction of Z_t using the AR(1)+FI(d) model]

[Figure 6.58: WTI: The 6-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.59: WTI: The 6-step-ahead prediction of (Z_t)_t with the AR(1)+FI(d) model]
[Figure 6.60: WTI: The 7-step-ahead prediction of (ε_t^2)_t with the AR(1)+FI(d) model]
[Figure 6.61: WTI: The 7-step-ahead prediction of Z_t using the AR(1)+FI(d) model]

        h = 1     h = 2    h = 3     h = 4     h = 5     h = 6     h = 7
Bias    6427.122  6396.929 6493.92   6455.806  6425.974  6517.23   6412.771
RMSE    103.7263  73.4362  60.59561  52.3697   46.75349  43.07525  39.19955

Table 6.4: Results of the h-step-ahead predictions of Z_t for the WTI oil price data using the AR(2)+FI(d) model.

Forecast for the Model Fitted by the AR(2)+FI(d) Model

As before, the key step is to forecast the volatility ε_t^2. We regard (ν_t)_t as a Gaussian white noise with zero mean and generate the volatility by

ε_t^2 = (I − B)^{−d} ν_t.

Figures 6.62 (h = 1), 6.64 (h = 2), 6.66 (h = 3), 6.68 (h = 4), 6.70 (h = 5), 6.72 (h = 6) and 6.74 (h = 7) show the h-step-ahead forecasts (h = 1, ..., 7) for the volatility. We then run the autoregression to obtain the h-step-ahead predictions of Z_t, presented in Figures 6.63 (h = 1), 6.65 (h = 2), 6.67 (h = 3), 6.69 (h = 4), 6.71 (h = 5), 6.73 (h = 6) and 6.75 (h = 7). In Table 6.4, we evaluate the forecast behavior by the bias and the RMSE.

Similarly, the forecasts for (ε_t^2)_t and (Z_t)_t are not far from the mean, but they do not capture the local volatilities very well, and the bias and the RMSE are somewhat large.

[Figure 6.62: WTI: The 1-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.63: WTI: The 1-step-ahead prediction of Z_t using the AR(2)+FI(d) model]

[Figure 6.64: WTI: The 2-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.65: WTI: The 2-step-ahead prediction of (Z_t)_t with the AR(2)+FI(d) model]
[Figure 6.66: WTI: The 3-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.67: WTI: The 3-step-ahead prediction of Z_t using the AR(2)+FI(d) model]

[Figure 6.68: WTI: The 4-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.69: WTI: The 4-step-ahead prediction of Z_t using the AR(2)+FI(d) model]
[Figure 6.70: WTI: The 5-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.71: WTI: The 5-step-ahead prediction of Z_t using the AR(2)+FI(d) model]

[Figure 6.72: WTI: The 6-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.73: WTI: The 6-step-ahead prediction of Z_t using the AR(2)+FI(d) model]
[Figure 6.74: WTI: The 7-step-ahead prediction of (ε_t^2)_t with the AR(2)+FI(d) model]
[Figure 6.75: WTI: The 7-step-ahead prediction of Z_t using the AR(2)+FI(d) model]

        h = 1    h = 2    h = 3    h = 4     h = 5    h = 6    h = 7
Bias    6608.34  6983.16  9299.67  6620.821  6360.43  6342.72  8446.96
RMSE    135.25   102.40   110.10   68.47     56.24    53.12    65.91

Table 6.5: Results of the h-step-ahead predictions of the WTI oil price data using the locally stationary Gegenbauer model (parameter function smoothed by the loess method).

Forecast for the Model Fitted by the Locally Stationary Gegenbauer Model

Observing the estimated parameter function, we find that without differencing the original data, most of the estimated parameter functions (smoothed by the spline and loess methods) lie outside the definition interval of the locally stationary Gegenbauer model.

To overcome this, we difference the data. The estimate smoothed by the loess method then lies well inside the definition interval of the locally stationary Gegenbauer process, while the estimate smoothed by the spline method still does not lie entirely inside it. In the following, we make the predictions using the parameter function smoothed by the loess method, satisfying the model (6.6).

We first make the h-step-ahead forecasts for d(t) (h = 1, ..., 7), where the parameter function is smoothed by the loess method. The predictions for d(t) appear in Figures 6.76 (h = 1), 6.78 (h = 2), 6.80 (h = 3), 6.82 (h = 4), 6.84 (h = 5), 6.86 (h = 6) and 6.88 (h = 7). We then use these results to forecast the error correction term Z_t, following the strategy described in the previous chapter; all the predictions are presented in Figures 6.77, 6.79, 6.81, 6.83, 6.85, 6.87 and 6.89.

From the graphs, the forecasts of Z_t are quite good, capturing well the local changes in the trajectory. For a clearer assessment, we refer to the two forecast criteria: Table 6.5 presents the bias and the RMSE of the h-step-ahead predictions. They are not so large and thus acceptable, which indicates that the forecast based on model (6.6) is effective.

Had we made the forecast based on model (6.5), the result would have been poor, because the range of the estimate is [0, 1], which is not permitted by the definition of the locally stationary k-factor Gegenbauer process (see Proposition 5.3.1). Thus, if we forecast using the estimate from the locally stationary model without any prior treatment, the result is not good. However, if we notice this problem before prediction, we can difference the series so that the parameter function lies well inside the definition interval. That is why we obtained good forecast results based on model (6.6).


[Figure 6.76: 1-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.77: 1-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]
[Figure 6.78: 2-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.79: 2-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]

[Figure 6.80: 3-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.81: 3-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]
[Figure 6.82: 4-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.83: 4-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]

[Figure 6.84: 5-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.85: 5-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]
[Figure 6.86: 6-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.87: 6-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]

[Figure 6.88: 7-step-ahead forecast of the long memory parameter for the WTI data (smoothed by the loess method)]
[Figure 6.89: 7-step-ahead forecast of the WTI oil price data (smoothed by the loess method)]

6.3 Conclusion

In practice, we must pay attention to the applicability of our model, particularly the definition interval of the parameter function.

If the estimated curve of the long memory parameter function lies well inside the definition interval of the locally stationary k-factor Gegenbauer process, we can carry out the forecast and the result is quite satisfying, as in the case of the NSA225 index data. Otherwise, if the estimated curve lies outside the definition interval, we have to transform the data before modeling.

As the WTI oil price data show, if we apply the locally stationary 1-factor Gegenbauer process directly, the estimated function is not well located in the definition interval of the model, which leads to poor prediction results. After suitable differencing, however, the estimated parameter function smoothed by the loess method lies well inside the definition interval, and both the graphical and the numerical results are good.


Chapter 7

Testing the Fractional Order of Long Memory Processes

The usefulness of fractionally integrated processes has been pointed out in recent years for bringing various strong persistence effects into modeling. Macroeconomic work on the modeling and forecasting of economic activity time series has made extensive use of the fractional alternative through long memory processes. For example, Carlin and Dempster (1989) considered the monthly unemployment rate of US males; Porter-Hudak (1990) dealt with the US money supply and monetary aggregates; and Ray (1993) proposed models for monthly IBM revenue data. Monthly UK inflation rates have been considered by Franses and Ooms (1997), Arteche and Robinson (2000) and Arteche (2003). Other applications dealt with time series on consumer goods (Darné, Guiraud and Terraza, 2004), public transportation (Ferrara and Guégan, 2000), exchange rates (Ferrara and Guégan, 2001a), spot prices (Ferrara and Guégan, 2001b) and electricity prices (Diongue and Guégan, 2004).

For all those applications, a specific fractionally integrated process has been proposed by the researchers. Generally the choice of the process corresponds to a specific problem, and no comparison has been carried out between different types of long memory processes. For example, some papers used the classical fractionally integrated process introduced by Granger and Joyeux (1980) and Hosking (1981), while others focused on generalized long memory processes or seasonal long memory processes designed to take cyclical or seasonal components with persistence into account.

As we have seen, the properties of a time series depend on its order of integration, d, that is, on the presence of unit roots. It is important to have techniques available to determine the actual form of non-stationarity and to distinguish between stochastic and deterministic trends where possible. There is a large literature on unit root testing.

The need to test economic theories which imply random walks has stimulated a large literature on the unit root distribution (see Dickey and Fuller (1979, 1981), Evans and Savin (1981, 1984), Sargan and Bhargava (1983), Phillips (1987)). One facet of the unit root literature has concerned weakening the assumption of IID errors. In particular,


Phillips (1987) showed that the unit root distribution can be used to test for a random walk if the errors satisfy a strong mixing condition. Unfortunately, this condition may not be justified for some economic time series. For example, dependence greater than that allowed in Phillips (1987) is permitted by fractionally integrated models, which extend the ARIMA(p, d, q) model to real values of d. Furthermore, studies of fractional integration (Granger and Joyeux (1981), Geweke and Porter-Hudak (1983)) have concluded that some economic time series possess fractional unit roots.

7.1 Unit Root Test for Autoregressive Moving Average Processes

Recently, methods for detecting unit roots in autoregressive and autoregressive moving average time series have been proposed. The presence of a unit root indicates that the time series is not stationary but that differencing will reduce it to stationarity. The tests proposed to date require specifying the number of autoregressive and moving average coefficients in the model.

A good survey may be found in Dickey et al. (1986), among others. Consider the simple AR(1) model:

X_t = φX_{t−1} + ε_t, (7.1)

where X_0 = 0 and the innovation ε_t is a white noise sequence with constant variance. We can regress X_t on X_{t−1} and then use the standard t-statistic to test the null hypothesis H_0: φ = φ_0. The problem arises because we do not know a priori whether the model is stationary. If |φ| < 1, the AR(1) model is stationary, the ordinary least squares (OLS) estimator of φ, φ_OLS, equals the maximum likelihood estimator under normality, and it follows a normal asymptotic distribution. Furthermore, the statistic

t_φ = (φ_OLS − φ_0) / s_φ,

where s_φ is the estimated standard deviation of φ_OLS, follows an asymptotic N(0, 1) distribution. For small samples, this statistic is distributed approximately as a Student's t with (T − 1) degrees of freedom. Nevertheless, when φ = 1, this result does not hold: the OLS estimator of φ is biased downwards, and the t-statistic under the unit-root null hypothesis does not have a Student's t distribution, even in the limit as the sample size becomes infinite. The AR(1) model (7.1) can be rewritten by subtracting X_{t−1} from both sides of the equation:

X_t − X_{t−1} = (φ − 1)X_{t−1} + ε_t, (7.2)

or

ΔX_t = ρX_{t−1} + ε_t, (7.3)

where ρ = φ − 1. The relevant unit root hypothesis is ρ = 0, and the alternative is one-sided:

H_a: ρ < 0,


since ρ > 0 corresponds to explosive time series models. Dickey (1976) tabulated the percentiles of this statistic under the unit root null hypothesis. The null H_0 of a unit root is rejected when the value of the statistic is lower than the critical value. This statistic, denoted by τ, is called the Dickey-Fuller statistic, and its critical values are published in Fuller (1976).
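The Dickey-Fuller regression (7.3) is easy to reproduce; the R sketch below computes the τ statistic by OLS on a simulated random walk. This is a minimal sketch: in practice one compares τ with the Dickey-Fuller critical values (or uses a packaged implementation such as tseries::adf.test, which also handles the augmented versions).

# Dickey-Fuller tau statistic from regression (7.3): dX_t = rho * X_{t-1} + e_t
df_tau <- function(x) {
  dx   <- diff(x)
  xlag <- x[-length(x)]
  fit  <- lm(dx ~ 0 + xlag)          # no constant: model (7.3)
  summary(fit)$coefficients["xlag", "t value"]
}

set.seed(42)
x <- cumsum(rnorm(500))              # random walk: the null is true
df_tau(x)                            # compare with the DF critical value (about -1.95 at 5%)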

Up to now it has been shown how to test the null hypothesis of a random walk (one unit root) against the alternative of a zero-mean stationary process. For economic time series, it may be of interest to consider alternative hypotheses including stationarity around a constant and/or a linear trend. This is achieved by introducing these terms into model (7.3):

ΔX_t = α + ρX_{t−1} + ε_t, (7.4)

ΔX_t = α + βt + ρX_{t−1} + ε_t. (7.5)

The unit root null hypothesis is simply H_0: ρ = 0 in both models (7.4) and (7.5). Dickey and Fuller tabulated the critical values for the corresponding statistics, denoted by τ_µ and τ_τ respectively. It should be noted that under the null hypothesis model (7.5) becomes a random walk plus drift, a hypothesis that frequently arises in economic applications.

The augmented Dickey-Fuller (ADF) test tests the null hypothesis that a time series X_t is FI(1) against the alternative that it is FI(0), assuming that the dynamics in the data have an ARMA structure. Hassler and Wolters (1994) found that the ADF test against fractional alternatives loses considerable power when augmented terms are added. In contrast, Krämer and Dittmann (1998) showed that this test is consistent if the order of the autoregression does not tend to infinity too fast.

7.2 Unit Root Test for Fractionally Integrated Processes

The class of fractionally integrated processes, denoted FI(d), where the order of integration d is extended to any real number, has proved very useful in capturing the persistence properties of many long memory processes: Baillie (1996), Beran (1994), and Granger and Joyeux (1980). In general, unit root tests are consistent when the alternative is an FI(d) process, but their power turns out to be quite low (see Diebold and Rudebusch (1991), Schmidt and Lee (1996)). This lack of power has motivated the development of new testing approaches that take this type of alternative explicitly into consideration. The growing literature on this subject can basically be classified into two strands. First, there are Wald-type tests that, by working under the alternative hypothesis, provide point estimates of the memory parameter and build confidence intervals around it. Secondly, there are Lagrange Multiplier (LM) tests whose statistics are evaluated under the corresponding null hypothesis. Within the first group, there is a very large number of rather heterogeneous contributions: parametric and semiparametric methods of estimating d both in the frequency and in the time domain (see, inter


alia, Geweke and Porter-Hudak (1983), Fox and Taqqu (1986), Sowell (1992), Robinson (1992)). However, most of them lack power when used for testing purposes. On the one hand, semiparametric techniques tend to yield large confidence intervals that include the null hypothesis too often. On the other hand, although parametric methods generally present narrower confidence intervals, the precision with which the parameters are estimated hinges on the correct specification of the model (see Hauser, Potscher and Reschenhofer (1999)). Within the second group, Robinson (1994) and Tanaka (1999) have proposed useful LM tests in the frequency and the time domain, respectively. A distinctive feature of both approaches is that, in contrast to the classical unit root tests, whose asymptotic distributions are nonstandard and require case-by-case numerical tabulation, they have standard asymptotic distributions. Robinson (1994) attributed this different limit behavior to the use of an explicit autoregressive (AR) alternative in the classical unit root testing approach. Nonetheless, despite the advantage of a standard limit distribution, a possible shortcoming of the LM approach is that, by working under the null hypothesis, it does not yield any direct information about the correct long memory parameter d when the null is rejected.

To overcome that drawback, Dolado et al. (2002) proposed a simple Wald-type test in the time domain that has acceptable power properties and, as a by-product of its implementation, provides information about the values of d under the alternative hypothesis. It is a generalization of the well-known Dickey-Fuller (D-F) test, originally developed for the case of FI(1) versus FI(0), to the more general case of FI(d_0) versus FI(d_1) with d_1 < d_0; they therefore referred to it as the Fractional Dickey-Fuller (FD-F) test. When d_0 = 1, the proposed test statistics are based on the OLS estimator, or its t-ratio, of the coefficient on Δ^{d_1}X_{t−1} in a regression of ΔX_t on Δ^{d_1}X_{t−1} and, possibly, some lags of ΔX_t. When d_1 is not taken to be known a priori, a pre-estimate of d_1 is needed to implement the test. The choice of any T^{1/2}-consistent estimator of d_1 ∈ [0, 1) suffices to make the test feasible while achieving asymptotic normality.
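For the case d_0 = 1, the FD-F regression described above can be sketched in R as follows, using fracdiff::diffseries to compute Δ^{d_1}X_t. The helper name and the use of the GPH estimate as the pre-estimate of d_1 are illustrative assumptions; the t-ratio on the lagged fractional difference is the FD-F statistic.

library(fracdiff)

# FD-F regression: Delta X_t on Delta^{d1} X_{t-1}   (case d0 = 1)
fdf_stat <- function(x, d1 = NULL) {
  if (is.null(d1)) d1 <- min(max(fdGPH(diff(x))$d + 1, 0), 0.99)  # pre-estimate of d1
  dx  <- diff(x)                       # Delta X_t
  z   <- diffseries(x, d1)             # Delta^{d1} X_t
  zl  <- z[-length(z)]                 # Delta^{d1} X_{t-1}, aligned with dx
  fit <- lm(dx ~ 0 + zl)
  c(d1 = d1, t_ratio = summary(fit)$coefficients["zl", "t value"])
}

# Usage: under the null X_t is FI(1); a significantly negative t-ratio
# favors the FI(d1) alternative with d1 < 1.
# set.seed(1); x <- cumsum(rnorm(1000)); fdf_stat(x)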

Another well-known test was proposed by Robinson (1994), who investigated a general model in order to test whether the data stem from a stationary or a non-stationary process, under uncorrelated and weakly correlated innovations (ε_t)_t. The process described by the following equation nests all the specific long memory processes generally used in applications:

F(B)X_t = (I − B)^{d_0+θ_0} ∏_{i=1}^{k−1} (I − 2ν_iB + B²)^{d_i+θ_i} (I + B)^{d_k+θ_k} X_t = ε_t, (7.6)

where B is the backshift operator; for i = 1, ..., k − 1, ν_i = cos λ_i, λ_i being any frequency between 0 and π; for i = 0, 1, ..., k, θ_i belongs to [−1, 1] and d_i is such that |d_i| < 1/2, implying that the spectral density is unbounded at λ_i. Moreover, (ε_t)_t is an innovation process to be specified.

To fit these models to real data sets, it is fundamental to detect long memory behavior through statistical tests. Here, we investigate the Robinson (1994) test. The properties


of this test have been proved in an asymptotic setting. Working with macroeconomic data sets means dealing with rather small sample sizes, generally below 1 000 points. Thus it is crucial for practitioners to know the accuracy of this test for a finite sample size. We assess the rate of convergence of Robinson's test using Monte Carlo simulations.

From a practical point of view, before implementing a fractional process on real data, it is strongly recommended to carry out a statistical test to show evidence of persistence in the data. In this respect, the test of Robinson (1994) has proved very useful for testing stationarity of many SCLM processes (see Gil-Alana, 2001, 2006). This test also permits testing the integration order at various frequencies and does not require estimating the long memory parameters, since the series have a short memory behavior under the null hypothesis. The test can also be used to assess the degree of persistence of the memory parameter through its null hypothesis. However, one of the major drawbacks in empirical macroeconomics is the rather small amount of data available to practitioners. For example, in the industrialized countries, the broadest measure of economic activity released by the quarterly national accounts of the statistical institutes, namely GDP, is generally available only since 1970. Thus, in most cases, fewer than 160 data points are available to carry out the analysis on a quarterly basis. It therefore appears crucial to study the finite sample behavior of the statistical procedures carried out at each step of the analysis. We propose a simulation experiment to determine the possible application of the Robinson (1994) test in finite samples. Indeed, the original results were proved in an asymptotic setting, and we need to know how the test works empirically. We study the convergence of the test according to the fractional process used to generate simulated data, whether the innovation process is uncorrelated or weakly correlated.

We briefly describe the Robinson (1994) test, a Lagrange Multiplier test of unit roots and other fractional hypotheses when the roots are located at any frequency on the interval [0, π]. The test is derived via the score principle, and its asymptotic critical values follow the Chi-squared distribution. Let (Y_t)_t be a stochastic process such that:

Y_t = β′Z_t + X_t, (7.7)

where (Z_t)_t is a k × 1 observable vector, β an unknown k × 1 vector, and (X_t)_t a process following Equation (7.6). In the sequel, we assume that β = 0 and that (ε_t)_t is either a strong white noise or a GARCH(1,1) noise.

Robinson (1994) worked with the general model (7.6) for a fixed d and tested the assumption

H_0: θ = (θ_0, ..., θ_k)′ = 0,

against the alternative

H_a: θ ≠ 0.

The test statistic is defined by:

R = (T / σ^4) (a^2 / A), (7.8)


where T is the length of the raw time series and

σ^2 = (2π/T) Σ*_j I_ε(λ_j).

Here I_ε(λ_j) is the periodogram of ε_t, with ε_t = F(B)Y_t and F(B) given in Equation (7.6). Moreover:

A = (2/T) Σ*_j ψ(λ_j) ψ(λ_j)′,

and

a = (−2π/T) Σ*_j ψ(λ_j) I_ε(λ_j),

where Σ*_j denotes the sum over λ_j = 2πj/T ∈ M = {λ : −π < λ < π, λ ∉ (ρ_l − η, ρ_l + η)}, the ρ_l being the distinct poles of ψ(λ) on (−π, π] and η a given positive constant. Finally, we have:

ψ(λ_j) = (ψ_l(λ_j)),

with

ψ_l(λ_j) = δ_{0l} log|2 sin(λ_j/2)| + δ_{kl} log(2 cos(λ_j/2)) + Σ_{i=1}^{k−1} δ_{il} log|2(cos λ_j − cos λ_i)|,

for l = 0, 1, ..., k, where δ_{il} = 1 if i = l and 0 otherwise.

Under stationarity conditions, Robinson (1994) established that:

R →_d χ²_{k+1},

where k + 1 = dim(θ). If χ²_{k+1} denotes the Chi-squared distribution with k + 1 degrees of freedom, then χ²_{k+1,α} denotes its quantile at a given level α. As soon as R > χ²_{k+1,α}, we reject H_0 at risk α.

Under the null, the test retains the long memory parameter corresponding to the greatest p-value of the Chi-squared test. We accept the null hypothesis if the p-value is greater than the significance level and reject it otherwise. The test thus provides a method for testing the long memory parameters, and its properties can be exploited to estimate the parameters using a Monte Carlo simulation, which provides the mean, bias and RMSE over a suitable number of replications.
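For the simplest case of model (7.6) with a single pole at frequency zero (k = 0), the quantities σ^2, A, a and R above can be computed directly; the R sketch below mirrors those formulas, using fracdiff::diffseries for the null filtering ε_t = (I − B)^{d_0}Y_t and summing over the Fourier frequencies away from the pole. It is a minimal sketch under these assumptions; normalization and trimming conventions should be checked against Robinson (1994) before any serious use.

library(fracdiff)

# Robinson (1994) statistic for H0: d = d0 in (1 - B)^(d0 + theta) Y_t = eps_t
robinson_stat <- function(y, d0) {
  eps <- diffseries(y, d0)                    # residuals under the null
  T   <- length(eps)
  j   <- 1:(T - 1)                            # Fourier frequencies, pole at 0 excluded
  lam <- 2 * pi * j / T
  I   <- (Mod(fft(eps))^2 / (2 * pi * T))[j + 1]   # periodogram I_eps(lambda_j)
  psi <- log(abs(2 * sin(lam / 2)))           # psi_0(lambda_j) for the zero-frequency pole
  sig2 <- (2 * pi / T) * sum(I)
  A    <- (2 / T) * sum(psi^2)
  a    <- (-2 * pi / T) * sum(psi * I)
  R    <- (T / sig2^2) * a^2 / A
  c(R = R, p_value = pchisq(R, df = 1, lower.tail = FALSE))
}

# Usage: simulate under the null and check the size of the test
# set.seed(1); y <- fracdiff.sim(2000, d = 0.3)$series
# robinson_stat(y, d0 = 0.3)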

Monte Carlo Experiment

In this section we carry out Monte Carlo experiments for several models derived from Equation (7.6), using different sample sizes and replications.


Under H_0, we simulate the different models using first a strong Gaussian white noise (ε_t)_t with zero mean and unit variance, and second a GARCH(1,1) noise. In the latter case,

ε_t = √(h_t) ξ_t,    h_t = a_0 + a_1 ε²_{t−1} + b_1 h_{t−1},

with (ξ_t)_t a sequence of i.i.d. Gaussian random variables with zero mean and unit variance, a_0 = 1, a_1 = 0.15 and b_1 = 0.8.
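The GARCH(1,1) innovations can be generated directly from this recursion; the sketch below is a straightforward base-R implementation with the stated parameter values (a_0 = 1, a_1 = 0.15, b_1 = 0.8). The burn-in period and the starting value at the unconditional variance are practical assumptions, not specifications from the thesis.

# GARCH(1,1) noise: eps_t = sqrt(h_t) xi_t, h_t = a0 + a1 eps_{t-1}^2 + b1 h_{t-1}
garch11_noise <- function(n, a0 = 1, a1 = 0.15, b1 = 0.8, burn = 500) {
  N   <- n + burn
  xi  <- rnorm(N)
  eps <- numeric(N)
  h   <- numeric(N)
  h[1]   <- a0 / (1 - a1 - b1)        # unconditional variance as starting value
  eps[1] <- sqrt(h[1]) * xi[1]
  for (t in 2:N) {
    h[t]   <- a0 + a1 * eps[t - 1]^2 + b1 * h[t - 1]
    eps[t] <- sqrt(h[t]) * xi[t]
  }
  eps[(burn + 1):N]                   # drop the burn-in
}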

We consider nine models: one model (4.2), two models (4.3), two models (4.5) and four models (4.6). For the models (4.3), we use s = 4 and s = 12; we then mix the possible explosion at frequency zero with explosions at fixed seasonalities, taking s = 4 and s = 12. For the Gegenbauer models, we detail the results with respect to the location of the frequencies and the number of explosions inside the spectral density. When there is only one factor in the model, the true value of the long memory parameter is d = 0.3 (except for the model (1 + B)^d X_t = ε_t, for which we use d = 0.2); with two factors, we use d_1 = 0.3 and d_2 = 0.4; with three factors, we use d_1 = 0.2, d_2 = 0.3 and d_3 = 0.4.

We consider several sample sizes T from 100 to 3000. We do not report results beyond 3000 because we intend to apply the method to macroeconomic data sets, whose sizes are generally smaller. In all cases, we use three replication sizes, TT = 100, 1000, 5000. We only present the results for TT = 100, because the results are quite similar for TT = 1000 and TT = 5000; they are available upon request.

The code was run under Mac OS X 10.5.1 Leopard and written in the R language; the pseudo-random numbers are generated by the command rnorm(). In the tables, the notation d represents the mean of the TT realizations (d_1, ..., d_TT) possessing the greatest p-value of the test, and n represents the percentage of times that we obtain the true value for all the long memory parameters involved in the model.
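One cell of this experiment can be sketched as below, reusing the robinson_stat function from the previous sketch; the candidate grid for d and the selection of the value with the greatest p-value follow the procedure described above, while the grid step and the exact-hit tolerance are illustrative assumptions.

# One Monte Carlo cell: model (1 - B)^0.3 X_t = eps_t, strong white noise
monte_carlo_cell <- function(T = 1000, TT = 100, d_true = 0.3,
                             grid = seq(0, 0.45, by = 0.001)) {
  d_best <- replicate(TT, {
    y  <- fracdiff.sim(T, d = d_true)$series
    pv <- sapply(grid, function(d0) robinson_stat(y, d0)["p_value"])
    grid[which.max(pv)]              # d with the greatest p-value
  })
  c(d_bar = mean(d_best),                        # the "d" column of the tables
    n = 100 * mean(abs(d_best - d_true) < 5e-4)) # the "n" column (% exact hits)
}

# monte_carlo_cell(T = 1000)  # compare with the T = 1000 column of Table 7.1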

We find that for models with only one term, like models (4.2) and (4.6) with k = 1, the test performs correctly for sample sizes greater than 900. For the models (4.3), however, although there is only one parameter to test, the test does not perform well; the performance becomes correct for sample sizes of 2000 and 3000. The same holds when we simulate models with several factors, like the models (4.5) and (4.6) with k ≥ 2. The more explosions inside the spectral density, the worse the test's performance. We never obtained convergence of the test for the model (4.7) as soon as k > 3. From a general point of view, as expected, the performance of the test increases with the sample size.

The main results are the following:

1. First, we assume that the noise (εt)t is a strong white noise in all the models:

• For the model (4.2), d = 0.3 when T reaches 3000.

• For the models (4.3), d = 0.3 for s = 4 when T reaches 3000, and the test does not converge when we use s = 12.


• For the models (4.5), d = 0.3, for s = 4 and s = 12, when T reaches 3000.

• For the 1-factor model (4.6):
(a) if ν = −1, d = 0.201 when T reaches 700;
(b) if ν = cos(π/3), d = 0.3 when T reaches 2000.

• For a 2-factor model (4.6), d_1 = 0.3 and d_2 = 0.4 when T reaches 3000.

• For a 3-factor model (4.6), d_1 = 0.2, d_2 = 0.3 and d_3 = 0.4 when T reaches 2000.

2. Second, we assume that the noise (εt)t is a GARCH(1,1) noise for all the models:

• For the model (4.2), d = 0.299 when T reaches 3000.

• For the model (4.3), d = 0.3 for s = 4 when T reaches 3000, and the test does not converge when we use s = 12.

• For the models (4.5), the test does not converge.

• For the 1-factor model (4.6):
(a) if ν = −1, d = 0.206 when T reaches 1000;
(b) if ν = cos(π/3), d = 0.3 when T reaches 2000.

• For a 2-factor model (4.6), d_1 = 0.3 and d_2 = 0.4 when T reaches 3000.

• For a 3-factor model (4.6), d_1 = 0.2, d_2 = 0.3 and d_3 = 0.4 when T reaches 3000.

In the presence of an infinite cycle, when we simulate the models (4.4) and (4.5), the comparison of the performances shows that convergence is slower, and sometimes the test does not converge at all. We also observe that the test does not converge for model (4.3) with s = 12. In any case, the convergence of the test is very slow for all the models we use. As soon as there is more than one explosion, we need almost 1000 observations to be sure of attaining the correct estimated value on average. With more than one explosion inside the spectral density, it appears difficult to use the test for samples of size smaller than 3000. The results are quite similar whichever noise we use for the simulations, a strong white noise or a GARCH noise.

Conclusion

In this part, we evaluated the performance of the Robinson (1994) test on several simulated SCLM models. We showed that the sample size is crucial for the accuracy of the test. The use of this test is mainly recommended when we observe only one explosion in the spectral density and have at least 500 points. If there is more than one explosion inside the spectral density, the test does not provide accurate information when the sample size is less than 3000. This latter result raises concern regarding applications of the Robinson test to seasonal macroeconomic data.


T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    37     52     67     79     80     85     88     90     94     98     100    100
d    0.248  0.26   0.283  0.291  0.288  0.289  0.29   0.296  0.302  0.3    0.3    0.3

Table 7.1: Test for model (1 − B)^0.3 X_t = ε_t where ε_t is a strong white noise.

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    7      23     43     50     58     63     72     77     85     92     99     100
d    0.092  0.187  0.235  0.25   0.258  0.264  0.276  0.277  0.285  0.292  0.299  0.3

Table 7.2: Test for model (1 − B^4)^0.3 X_t = ε_t where ε_t is a strong white noise.

T    100    200    300    400    500    600    700    800    900    1000    2000   3000
n    0      0      0      0      3      3      4      10     18     26      78     98
d    0.0    0.005  0.05   0.088  0.148  0.18   0.191  0.204  0.217  0.2223  0.278  0.298

Table 7.3: Test for model (1 − B^12)^0.3 X_t = ε_t where ε_t is a strong white noise.

T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     3      14     17     26     28     38     50     52     51     63     87     92
d_1   0.279  0.314  0.296  0.307  0.305  0.297  0.299  0.307  0.299  0.3    0.3    0.302
d_2   0.154  0.287  0.332  0.338  0.346  0.36   0.372  0.376  0.377  0.386  0.396  0.399

Table 7.4: Test for model (1 − B)^0.3 (1 − B^4)^0.4 X_t = ε_t, with d_1 = 0.3, where ε_t is a strong white noise.


T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     0      0      3      1      4      7      10     16     20     27     71     92
d_1   0.299  0.283  0.304  0.303  0.304  0.309  0.298  0.306  0.302  0.307  0.3    0.301
d_2   0.001  0.019  0.137  0.175  0.245  0.272  0.284  0.303  0.322  0.329  0.373  0.394

Table 7.5: Test for model (1 − B)^0.3 (1 − B^12)^0.4 X_t = ε_t where ε_t is a strong white noise.

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    34     59     65     78     78     91     89     97     94     98     100    100
d    0.246  0.286  0.285  0.292  0.286  0.293  0.297  0.299  0.3    0.3    0.3    0.3

Table 7.6: Test for model (1 − 2νB + B²)^0.15 X_t = (1 + B)^0.3 X_t = ε_t where ε_t is a strong white noise and ν = −1.

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    32     55     65     74     75     90     87     90     97     94     99     100
d    0.159  0.17   0.179  0.19   0.189  0.196  0.201  0.194  0.201  0.196  0.201  0.2

Table 7.7: Test for model (1 − 2νB + B²)^0.1 X_t = (1 + B)^0.2 X_t = ε_t where ε_t is a strong white noise and ν = −1.

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    18     42     60     69     78     91     94     94     98     95     100    100
d    0.171  0.235  0.266  0.272  0.278  0.293  0.294  0.296  0.298  0.297  0.3    0.3

Table 7.8: Test for model (1 − 2νB + B²)^0.3 X_t = ε_t where ε_t is a strong white noise, ν = cos(π/3).

T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     0      7      23     29     32     47     52     56     69     67     94     100
d_1   0.068  0.152  0.192  0.225  0.231  0.255  0.254  0.267  0.277  0.28   0.295  0.3
d_2   0.223  0.29   0.322  0.343  0.351  0.364  0.372  0.378  0.388  0.389  0.398  0.4

Table 7.9: Test for model (1 − 2ν_1B + B²)^0.3 (1 − 2ν_2B + B²)^0.4 X_t = ε_t where ε_t is a strong white noise, ν_1 = cos(π/3), ν_2 = cos(5π/6).


T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     0      4      8      21     33     50     58     63     77     73     100    100
d_1   0.023  0.085  0.129  0.142  0.156  0.173  0.176  0.18   0.192  0.187  0.2    0.2
d_2   0.16   0.229  0.245  0.25   0.271  0.278  0.284  0.285  0.295  0.286  0.3    0.3
d_3   0.133  0.247  0.306  0.33   0.353  0.369  0.369  0.372  0.388  0.385  0.4    0.4

Table 7.10: Test for model (1 − 2ν_1B + B²)^0.2 (1 − 2ν_2B + B²)^0.3 (1 − 2ν_3B + B²)^0.4 X_t = ε_t where ε_t is a strong white noise, ν_1 = cos(π/6), ν_2 = cos(π/2), ν_3 = cos(2π/3).

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    34     50     56     63     72     76     80     81     83     84     99     99
d    0.249  0.278  0.279  0.281  0.292  0.291  0.296  0.291  0.301  0.298  0.299  0.299

Table 7.11: Test for model (1 − B)^0.3 X_t = ε_t where ε_t is GARCH(1,1).

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    6      24     38     46     43     73     64     72     81     72     91     98
d    0.112  0.193  0.235  0.251  0.244  0.274  0.27   0.274  0.281  0.272  0.291  0.3

Table 7.12: Test for model (1 − B^4)^0.3 X_t = ε_t where ε_t is GARCH(1,1).


T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    0      0      1      0      7      9      12     13     20     29     75     94
d    0.0    0.005  0.06   0.11   0.156  0.182  0.192  0.204  0.218  0.229  0.275  0.294

Table 7.13: Test for model (1 − B^12)^0.3 X_t = ε_t where ε_t is GARCH(1,1).

T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     0      4      14     20     22     32     36     49     50     54     83     86
d_1   0.283  0.312  0.297  0.292  0.312  0.289  0.304  0.3    0.3    0.298  0.306  0.299
d_2   0.152  0.257  0.319  0.335  0.34   0.369  0.372  0.378  0.381  0.385  0.393  0.395

Table 7.14: Test for model (1 − B)^0.3 (1 − B^4)^0.4 X_t = ε_t where ε_t is GARCH(1,1).

T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     0      0      0      2      5      9      8      13     22     20     63     81
d_1   0.294  0.294  0.28   0.302  0.298  0.302  0.299  0.305  0.303  0.305  0.304  0.304
d_2   0      0.02   0.117  0.177  0.249  0.278  0.285  0.298  0.314  0.323  0.37   0.386

Table 7.15: Test for model (1 − B)^0.3 (1 − B^12)^0.4 X_t = ε_t where ε_t is GARCH(1,1).

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    31     49     59     66     71     76     84     85     86     86     94     100
d    0.232  0.273  0.294  0.284  0.291  0.298  0.304  0.299  0.299  0.3    0.3    0.3

Table 7.16: Test for model (1 − 2νB + B²)^0.15 X_t = (1 + B)^0.3 X_t = ε_t where ε_t is GARCH(1,1) and ν = −1.


T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    36     54     65     61     67     73     82     87     88     86     94     99
d    0.145  0.166  0.19   0.192  0.185  0.193  0.192  0.195  0.198  0.206  0.202  0.199

Table 7.17: Test for model (1 − 2νB + B²)^0.1 X_t = (1 + B)^0.2 X_t = ε_t where ε_t is GARCH(1,1) and ν = −1.

T    100    200    300    400    500    600    700    800    900    1000   2000   3000
n    20     43     57     65     82     81     79     91     84     89     100    100
d    0.181  0.24   0.269  0.277  0.283  0.286  0.289  0.291  0.29   0.293  0.3    0.3

Table 7.18: Test for model (1 − 2νB + B²)^0.3 X_t = ε_t where ε_t is GARCH(1,1), ν = cos(π/3).

T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     2      8      17     18     39     39     51     52     50     63     85     100
d_1   0.062  0.138  0.196  0.21   0.249  0.262  0.25   0.265  0.266  0.272  0.3    0.3
d_2   0.218  0.27   0.331  0.341  0.362  0.368  0.367  0.374  0.383  0.394  0.4    0.4

Table 7.19: Test for model (1 − 2ν_1B + B²)^0.3 (1 − 2ν_2B + B²)^0.4 X_t = ε_t where ε_t is GARCH(1,1), ν_1 = cos(π/3), ν_2 = cos(5π/6).

T     100    200    300    400    500    600    700    800    900    1000   2000   3000
n     0      5      10     10     23     41     38     51     62     58     95     97
d_1   0.023  0.097  0.138  0.144  0.156  0.167  0.172  0.177  0.187  0.177  0.198  0.195
d_2   0.167  0.224  0.249  0.266  0.281  0.285  0.277  0.281  0.284  0.283  0.302  0.3
d_3   0.161  0.26   0.328  0.33   0.346  0.36   0.365  0.369  0.384  0.378  0.397  0.4

Table 7.20: Test for model (1 − 2ν_1B + B²)^0.2 (1 − 2ν_2B + B²)^0.3 (1 − 2ν_3B + B²)^0.4 X_t = ε_t where ε_t is GARCH(1,1), ν_1 = cos(π/6), ν_2 = cos(π/2), ν_3 = cos(2π/3).


Chapter 8

Conclusion

In this thesis, we have studied stochastic long memory processes. Two classes of processes have been considered: stationary long memory processes and non-stationary long memory processes. We have investigated their probabilistic properties, parameter estimation methods, statistical tests and forecast methods, followed by applications to financial and energy data.

8.1 Overview of the Contribution

The contributions of this thesis on stationary processes consist of the following parts.

• Self-similarity Properties: In Chapter 2, we studied the probabilistic properties of stationary processes, focusing on their self-similarity properties. In the continuous-time framework, we reviewed three classes of processes: H-self-similar processes (e.g. Brownian motion), Gaussian H-self-similar processes with stationary increments (fractional Brownian motion), and non-Gaussian H-self-similar processes with stationary increments (e.g. α-stable processes with 0 < α < 2); we also reviewed multifractional processes. In the discrete-time framework, we reviewed the different definitions of self-similarity (exactly self-similar, asymptotically self-similar, exactly second-order self-similar, asymptotically second-order self-similar) and clarified the relationships among these definitions. Since the concepts of self-similarity and long/short memory are not equivalent, we proposed two propositions which link long range dependence and self-similarity. We proved that a stationary long memory process is asymptotically second-order self-similar, while a stationary short memory process is not asymptotically second-order self-similar. Moreover, under Gaussianity, a stationary long memory process is asymptotically self-similar, and a stationary short memory process is not asymptotically self-similar. We then applied these results to classical discrete-time processes, for example fractional Gaussian noise, k-factor GARMA processes, heteroscedastic processes, and processes with switches and jumps (a numerical illustration of the aggregated-variance signature of these results follows this list).

• Estimation: In Chapter 4, we reviewed the estimation methods for stationary seasonal and/or cyclical long memory (SCLM) processes (Robinson, 1994). First, we considered the ARFIMA models. After briefly describing the model, we recalled for this simple and classical long memory model the estimation methods, including parametric, semiparametric (in the frequency domain and in the time domain) and wavelet methods. We did not present all the mathematical assumptions underlying the estimation procedures but rather described the methods and their applicability, and we also discussed the asymptotic distributions of the estimators. For the parametric methods, we studied four maximum likelihood estimators (MLE): the exact time domain MLE, the modified profile likelihood estimator, the conditional time domain MLE and the frequency domain MLE. For the semiparametric estimators, we looked at the log-periodogram regression method (sketched after this list) and the local Whittle approach, and we also mentioned some modified versions of these estimators. For the wavelet methods, we focused on the wavelet-based ordinary least squares (OLS) estimator and the wavelet maximum likelihood estimator, and we discussed the advantages and disadvantages of these estimators. Second, we considered the estimation methods for models with seasonality. We studied the k-factor Gegenbauer ARMA processes and their OLS estimator, approximate maximum likelihood estimator, conditional sum of squared residuals (CSS) estimator, wavelet-based OLS estimator and wavelet-based approximate maximum likelihood estimator. We also took a brief look at the estimation methods for models with fixed seasonal periodicity, which are particular cases of the SCLM model; the rigid Hassler models and the flexible Hassler models both fall into this category. For completeness, we also briefly mentioned the seasonal and/or cyclical asymmetric long memory processes and the corresponding estimation methods.

• Unit Root Test: In Chapter 7, we first recalled the well-known unit root tests for ARMA processes, such as the Dickey-Fuller test and the augmented Dickey-Fuller test. We mainly focused on one of the unit root tests, the Robinson (1994) test, and evaluated its performance on several simulated SCLM models. We showed that the sample size is crucial for the accuracy of the test: its use seems mainly advisable when the spectral density exhibits only one explosion and at least 500 observations are available. These results raise concerns about applications of the Robinson test to seasonal macroeconomic data.
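Two of the points above can be made concrete with small numerical sketches in Python. Both rest on stated assumptions (an FI(d) series simulated by truncated fractional differencing; a GPH bandwidth of T^0.5) and are illustrations, not the procedures used in the thesis. The first shows the aggregated-variance signature of asymptotic second-order self-similarity: for a long memory process with parameter d, the variance of the m-aggregated mean decays like m^(2d−1) rather than m^(−1).

    import numpy as np

    def fi_weights(d, n):
        # MA(infinity) coefficients of (1 - B)^(-d): psi_j = psi_{j-1}*(j-1+d)/j.
        psi = np.ones(n)
        for j in range(1, n):
            psi[j] = psi[j - 1] * (j - 1 + d) / j
        return psi

    rng = np.random.default_rng(1)
    d, T, trunc = 0.3, 20000, 5000
    eps = rng.standard_normal(T + trunc)
    X = np.convolve(eps, fi_weights(d, trunc))[trunc:trunc + T]

    for m in (1, 10, 100, 1000):
        means = X[: (T // m) * m].reshape(-1, m).mean(axis=1)
        print(m, means.var())   # decays roughly like m**(2*d - 1) = m**(-0.4)

The second sketches the log-periodogram (GPH) regression: regress the log periodogram at the first m Fourier frequencies on -log(4 sin^2(lambda/2)); the slope estimates d.

    def gph(x, power=0.5):
        # Bandwidth m = T**power is a common illustrative choice.
        T = len(x)
        m = int(T ** power)
        lam = 2.0 * np.pi * np.arange(1, m + 1) / T
        I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2.0 * np.pi * T)
        reg = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
        return np.polyfit(reg, np.log(I), 1)[0]   # slope = estimate of d

    print(gph(X))   # should be close to d = 0.3 for the series above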

Our contributions on non-stationary processes are as follows:

• Modeling: In Chapter 5, we first recalled two kinds of non-stationary fractionally integrated processes: the fractionally integrated processes with a constant long memory parameter |d| > 1/2, and the generalized fractionally integrated processes with a time-varying parameter. Then we proposed a new non-stationary process which permits the existence of time-varying persistence and seasonality at the same time: the locally stationary k-factor Gegenbauer process. This new process can be regarded as an extension of the stationary k-factor Gegenbauer process allowing a time-varying parameter; on the other hand, it can be considered as an extension of the generalized fractionally integrated process permitting seasonalities (a simulation sketch follows this list).

• Estimation and Forecast: In Chapter 5, we first reviewed the estimation methods for non-stationary fractionally integrated processes and their asymptotic behavior. Then, for the new locally stationary k-factor Gegenbauer process, we proposed a new semiparametric estimation procedure using wavelets (its underlying regression is sketched after this list), which is proved to be consistent and asymptotically normal. Through Monte Carlo simulation experiments, we studied the robustness of the estimator; the results are satisfying, with small bias and small root mean square errors. We then presented the forecast method for non-stationary long memory processes.

• Applications: Chapter 6 is concerned with applications to financial data and energy data. For the error correction term in the ECM of the Nikkei 225 index, we applied to the series the new non-stationary model and the wavelet-based algorithm proposed in Chapter 5. Compared with other authors' work, the estimation of our model better captures the local characteristics of the parameter function, and the forecast based on our estimation is also quite satisfying, with small bias and RMSE. For the world crude oil data, we modeled the WTI price series by several different models: autoregressive models with fractionally integrated noise, and the locally stationary 1-factor Gegenbauer process. When modeling with the locally stationary 1-factor Gegenbauer model, we need to pay attention to the range of the definition interval; if needed, we transform the series so that the estimated parameter function lies within the interval (−1/2, 1/2). We can then obtain satisfying forecast results.
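The two ingredients above can be illustrated with short sketches (Python; the illustrative assumptions are flagged in the comments). The first simulates a locally stationary 1-factor Gegenbauer process by letting the memory parameter vary smoothly in rescaled time; the linear d(u) below is an arbitrary illustrative choice, not one from the thesis.

    import numpy as np

    def gegenbauer_weights(d, nu, n):
        # psi_j of (1 - 2*nu*B + B^2)^(-d) by the Gegenbauer recursion.
        psi = np.zeros(n)
        psi[0], psi[1] = 1.0, 2.0 * d * nu
        for j in range(2, n):
            psi[j] = (2.0 * nu * (1.0 + (d - 1.0) / j) * psi[j - 1]
                      - (1.0 + 2.0 * (d - 1.0) / j) * psi[j - 2])
        return psi

    rng = np.random.default_rng(2)
    T, trunc, nu = 500, 1000, np.cos(np.pi / 3)
    eps = rng.standard_normal(T + trunc)
    X = np.zeros(T)
    for t in range(T):
        d_t = 0.1 + 0.3 * t / T   # d(u), u = t/T: illustrative linear path
        psi = gegenbauer_weights(d_t, nu, trunc)
        X[t] = psi @ eps[t + trunc - np.arange(trunc)]   # sum_j psi_j eps_{t-j}

The second shows the wavelet-OLS idea in its stationary form: for a long memory series the detail coefficients at octave j have variance of order 2^(2jd), so the slope of log2(variance) on the octave estimates 2d. The estimator proposed in Chapter 5 applies this regression locally in time; PyWavelets is an assumed dependency here.

    import pywt   # PyWavelets

    def wavelet_ols_d(x, wavelet="db4", levels=6):
        # Requires len(x) well above 2**levels.
        details = pywt.wavedec(x, wavelet, level=levels)[1:]  # coarsest first
        octaves = np.arange(levels, 0, -1)
        logvar = np.log2([np.var(c) for c in details])
        return np.polyfit(octaves, logvar, 1)[0] / 2.0   # estimate of d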

8.2 Possible Directions for Future Research

Some challenging problems remain for future research.

The first problem is to provide an approximate maximum likelihood estimation (AMLE) method for the locally stationary k-factor Gegenbauer processes. For the generalized fractionally integrated model there generally exist two wavelet-based estimation methods: the wavelet-based OLS method and the wavelet-based AMLE method. In this thesis, we have built the wavelet-based OLS method for the locally stationary k-factor Gegenbauer processes; we would therefore like to establish a wavelet-based AMLE method for this new non-stationary model. Because of technical difficulties, we have not yet solved this problem.

The second problem is to consider the locally stationary k-factor Gegenbauer processes with short memory terms. What we consider in this thesis are the locally stationary k-factor Gegenbauer processes, which are particular cases of the locally stationary k-factor GARMA processes. It would be of great interest to estimate the ARMA coefficients together with the long memory parameter functions, but we are not sure whether the presence of the ARMA term would contaminate the estimation.

Another idea is to propose a new model, the locally stationary seasonal and/or cyclical long memory process, which permits the long memory parameters of the SCLM model to vary with time. The spectrum would be somewhat complicated, and the wavelet-based estimation even more difficult, but such a general model would be quite interesting to investigate.


Appendix A

The Well-definedness of the Locally Stationary k-factor Gegenbauer Processes

In this appendix, we give the proof of Proposition 5.3.1. For more details and the classical theorem underlying Darboux's method, we refer to Giraitis and Leipus (1995).

PROOF of Proposition 5.3.1.

Proof. We prove it through Darboux's method. Let $\lambda_{-j} = -\lambda_j$ and $d_{-j} = d_j$, $j = 1, \dots, m$, and consider the function

$$U_{d_1(t),\dots,d_m(t)}(z) = \prod_{j=1}^{m} (1 - e^{i\lambda_j} z)^{d_j(t)} (1 - e^{-i\lambda_j} z)^{d_j(t)}$$

with singular points $z_j = e^{i\lambda_j}$, $j = \pm 1, \dots, \pm m$, on the unit circle $|z| = 1$. Denote $I(k) = \{ j : 1 \le |j| \le m, \ \lambda_j \neq \lambda_{-k} \bmod 2\pi \}$. Thus $U_{d_1(t),\dots,d_m(t)}(z) = e_k(t)\, h_k(1 - z z_{-k}, t)$ in the neighborhood of the point $z_k$ ($1 \le |k| \le m$), where

$$e_k(t) = \prod_{j \in I(k)} \Big(1 - \frac{z_j}{z_k}\Big)^{d_j(t)}, \qquad h_k(z, t) = z^{d_k^*(t)} \prod_{j \in I(k)} \Big(1 - \frac{z_j z}{z_k}\Big)^{d_j(t)}.$$

Expanding $h_k(z, t)$ in a power series about $z = 0$, we obtain

$$e_k(t)\, h_k(z, t) = \sum_{\nu=0}^{\infty} c_\nu^{(k)}(t)\, z^{d_k^*(t)+\nu},$$

where

$$c_\nu^{(k)}(t) = e_k(t) \sum_{(s)} \prod_{j \in I(k)} \binom{d_j^*(t)}{s_j} \left( \frac{z_j/z_k}{1 - z_j/z_k} \right)^{s_j}$$

and the sum $\sum_{(s)}$ is taken over all integers $0 \le s_j \le \nu$, $j \in I(k)$, such that $\sum_{j \in I(k)} s_j = \nu$. Then, by Darboux's method (see Giraitis and Leipus, 1995), the following general expansion for the weights $\pi_n(t)$ in (5.6) can be obtained:

$$\pi_n(t) = \sum_{\nu=0}^{p-1} \sum_{k=1}^{m} c_\nu^{(k)} \binom{d_k^*(t)+\nu}{n} + O\big(n^{-p-\min\{d_1^*(t),\dots,d_m^*(t)\}-1}\big), \qquad (A.1)$$

where

$$c_\nu^{(k)} = \begin{cases} 2\,\operatorname{Re}\big(c_\nu^{(k)}(t)\,(-e^{-i\lambda_k})^n\big), & \text{if } 0 < \lambda_k < \pi, \\ \operatorname{Re}\big(c_\nu^{(k)}(t)\,(-e^{-i\lambda_k})^n\big), & \text{if } \lambda_k = 0 \text{ or } \pi. \end{cases}$$

If we stop the expansion (A.1) at the term $\nu = 0$ ($p = 1$), we obtain

$$\pi_n(t) = \sum_{k=1}^{m} \binom{d_k^*(t)}{n}\, c_0^{(k)} + O\big(n^{-2-\min\{d_1^*(t),\dots,d_m^*(t)\}}\big), \qquad n \to \infty.$$

Then the symmetry leads to the asymptotic expansion (5.9). So the application of the linear filter $(\psi_j, \ j \in \mathbb{Z})$ to a stationary white noise $\varepsilon(t)$ gives a well-defined process

$$X(t) = \nabla_{\lambda_1,\dots,\lambda_m}^{-d_1(t),\dots,-d_m(t)}\, \varepsilon(t) = \sum_{n=0}^{\infty} \psi_n(t)\, \varepsilon(t-n),$$

which is the solution of Equation (5.13). According to condition (5.7) and the asymptotic expansion (5.9), we get $\sum_{n=0}^{\infty} \psi_n^2(t) < \infty$ and $\sum_{n=0}^{\infty} \pi_n^2(t) < \infty$. Therefore we obtain the uniqueness.


Bibliography

[1] Abadir, K.M., W. Distaso and L. Giraitis (2007). Nonstationarity-extended local Whittle estimation. J. Econometrics, 141, 2, 1353-1384.

[2] Abry, P., P. Flandrin, M.S. Taqqu and D. Veitch (2000). Wavelets for the analysis, estimation and synthesis of scaling data. In: K. Park and W. Willinger (eds.), Self-similar Network Traffic and Performance Evaluation. Wiley, New York.

[3] Abry, P., P. Flandrin, M.S. Taqqu and D. Veitch (2001). Self-similarity and long-range dependence through the wavelet lens. In: P. Doukhan, G. Oppenheim and M.S. Taqqu (eds.), Long-Range Dependence: Theory and Applications. Birkhäuser, Boston.

[4] Abry, P. and F. Sellan (1996). The wavelet-based synthesis for fractional Brownian motion proposed by F. Sellan and Y. Meyer: Remarks and fast implementation. Applied and Computational Harmonic Analysis, 3, 377-383.

[5] Agiakloglou, C., P. Newbold and M. Wohar (1993). Bias in an estimator of the fractional difference parameter. J. Time Series Analysis, 14, 235-246.

[6] An, S. and P. Bloomfield (1993). Cox and Reid's modification in regression models with correlated errors. Technical Report, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, USA.

[7] Andel, J. (1986). Long memory time series models. Kybernetika, 22, 105-123.

[8] Andersen, T.G., T. Bollerslev, F.X. Diebold and H. Ebens (2001). The distribution of realized stock return volatility. J. Financial Economics, 61, 1, 43-76.

[9] Andersen, T.G., T. Bollerslev, F.X. Diebold and P. Labys (2001). The Distribution of Realized Exchange Rate Volatility. J. American Statistical Association, 96, 453, 42-55.

[10] Andrews, D.W.K. and P. Guggenberger (2003). A bias-reduced log-periodogram regression estimator for the long-memory parameter. Econometrica, 71, 2, 675-712.

[11] Andrews, D.W.K. and Y. Sun (2004). Adaptive Local Polynomial Whittle Estimation of Long-Range Dependence. Econometrica, 72, 2, 569-614.

[12] Arteche, J. (1998). Log-periodogram regression in seasonal/cyclical long memory time series. Working paper, University of the Basque Country (UPV-EHU), November.


[13] Arteche, J. and P.M. Robinson (2000). Semiparametric inference in seasonal and cyclical long memory processes. J. Time Series Analysis, 21, 1-25.

[14] Arteche, J. (2003). Semi-parametric robust tests on seasonal or cyclical long memory time series. J. Time Series Analysis, 23, 251-285.

[15] Baillie, R.T. (1996). Long memory processes and fractional integration in econometrics. J. Econometrics, 73, 1, 5-59.

[16] Baillie, R.T., T. Bollerslev and H.O. Mikkelsen (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. J. Econometrics, 74, 3-30.

[17] Baillie, R.T. and H. Chung (2001). Estimation of GARCH models from the autocorrelations of the squares of a process. J. Time Series Analysis, 22, 631-650.

[18] Bardet, J.M., G. Lang, E. Moulines and P. Soulier (2000). Wavelet estimator of long-range dependent processes. Statistical Inference for Stochastic Processes, 3, 85-99.

[19] Bardet, J.M., G. Lang, G. Oppenheim, A. Philippe, S. Stoev and M.S. Taqqu (2001). Semi-parametric estimation of the long-range dependence parameter: A survey. In: P. Doukhan, G. Oppenheim and M.S. Taqqu (eds.), Long-Range Dependence: Theory and Applications. Birkhäuser, Boston.

[20] Barkoulas, J.T. and C.F. Baum (1997). Fractional differencing modeling and forecasting of eurocurrency. J. Financial Research, 20, 3, 355-372.

[21] Beran, J. (1994). Statistics for long memory processes. Chapman and Hall, New York.

[22] Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models. J. Royal Statistical Society, 57, 659-672.

[23] Beran, J. and N. Terrin (1996). Testing for a change of the long-memory parameter. Biometrika, 83, 3, 627-638.

[24] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics, 31, 307-327.

[25] Bollerslev, T. (1988). On the correlation structure for the generalized autoregressive conditional heteroskedastic process. J. Time Series Analysis, 9, 121-131.

[26] Briggs, W.L. and V.E. Henson (1995). The DFT: an owner's manual for the discrete Fourier transform. Society for Industrial and Applied Mathematics, Philadelphia.

[27] Brockwell, P.J. and R.A. Davis (2002). Introduction to Time Series and Forecasting. Springer Texts in Statistics, Springer.


[28] Brodsky, J. and C.M. Hurvich (1999). Multi-step forecasting for long-memory processes. J. Forecasting, 18, 1, 59-75.

[29] Burrus, C.S., R.A. Gopinath and H. Guo (1998). Introduction to wavelets and wavelet transforms. Prentice Hall, Upper Saddle River, NJ.

[30] Carlin, J.B. and A.P. Dempster (1989). Sensitivity analysis of seasonal adjustments: Empirical case studies. J. American Statistical Association, 84, 6-20.

[31] Cavanaugh, J.E., Y. Wang and J.W. Davis (2002). Locally self-similar processes and their wavelet analysis. Chapter 3 in Handbook of Statistics 21: Stochastic Processes: Modelling and Simulation, D.N. Shanbhag and C.R. Rao (eds.). Elsevier Science, Amsterdam, The Netherlands.

[32] Chan, G. and A.T.A. Wood (1994). Simulation of stationary Gaussian processes in [0,1]^d. J. Computational and Graphical Statistics, 3, 409-432.

[33] Chen, G., P. Hall and D.S. Poskitt (1995). Periodogram-based estimators of fractal properties. The Annals of Statistics, 23, 1684-1711.

[34] Cheung, Y.W. and F.X. Diebold (1994). On maximum-likelihood estimation of the differencing parameter of fractionally-integrated noise with unknown mean. J. Econometrics, 62, 301-316.

[35] Cheung, Y.W. and K.S. Lai (1993). A search for long memory in international stock market returns. J. International Money and Finance, 14, 4, 597-615.

[36] Chui, C.K. (1992). An Introduction to Wavelets. New York: Academic Press.

[37] Chui, C.K. (1997). Wavelets: A Mathematical Tool for Signal Analysis. Society for Industrial and Applied Mathematics.

[38] Chung, C.F. (1996a). Estimating a generalized long memory process. J. Econometrics, 73, 1, 237-259.

[39] Chung, C.F. (1996b). A generalized fractionally integrated autoregressive moving-average process. J. Time Series Analysis, 17, 2, 111-140.

[40] Chung, C.F. and R.T. Baillie (1993). Small sample bias in conditional sum-of-squares estimators of fractionally integrated ARMA models. Empirical Economics, 18, 791-806.

[41] Coifman, R.R. and D. Donoho (1995). Time-invariant wavelet de-noising. In Antoniadis and Oppenheim, 125-150.

[42] Collet, J. and D. Guégan (2004). Another characterization of the long memory behavior. Note de Recherche IDHE-MORA No. 01-2004, ENS Cachan, France.

[43] Comte, F. (1996). Simulation and estimation of long memory continuous time models. J. Time Series Analysis, 17, 19-36.


[44] Constantine, A.G. and P. Hall (1994). Characterizing surface smoothness via estimation of effective fractal dimension. J. Royal Statistical Society B, 56, 97-113.

[45] Cooley, J.W. and J.W. Tukey (1965). An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation, 19, 90, 297-301.

[46] Cooley, J.W., P.A.W. Lewis and P.D. Welch (1967). Historical notes on the fast Fourier transform. Proceedings of the IEEE, 55, 10, 1675-1677.

[47] Cox, D.R. (1984). Long-range dependence: A review. In: Statistics: An Appraisal, Proceedings 50th Anniversary Conference, Iowa State Statistical Laboratory, Iowa State University Press, 55-74.

[48] Cox, D.R. and N. Reid (1987). Approximations to Noncentral Distributions. The Canadian Journal of Statistics, 15, 2.

[49] Cox, D.R. and N. Reid (1992). A note on the difference between profile and modified profile likelihood. Biometrika, 79, 2, 408-411.

[50] Cox, D.R. and N. Reid (1993). A note on the calculation of adjusted profile likelihood. J. Royal Statistical Society, Series B.

[51] Crato, N. and B.K. Ray (1999). Model selection and forecasting for long-range dependent processes. J. Forecasting, 15, 107-125.

[52] Dahlhaus, R. (1989). Efficient parameter estimation for self-similar processes. Annals of Statistics, 17, 1749-1766.

[53] Dahlhaus, R. (1996a). On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Processes and their Applications, 62, 139-168.

[54] Dahlhaus, R. (1996b). Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In: P.M. Robinson and M. Rosenblatt (eds.), Athens Conference on Applied Probability and Time Series Analysis, 2. Springer, New York.

[55] Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Annals of Statistics, 25, 1, 1-37.

[56] Darné, O., V. Guiraud and M. Terraza (2004). Forecast of the seasonal fractional integrated series. J. Forecasting, 23, 1-17.

[57] Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 41, 909-996.

[58] Daubechies, I. (1992). Ten Lectures on Wavelets. Volume 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: Society for Industrial and Applied Mathematics.

[59] Davidson, J. (2001). Moment and memory properties of linear conditional heteroscedasticity models. Working paper, Cardiff University, UK.


[60] Davies, R.B. and D.S. Harte (1987). Tests for Hurst effect. Biometrika, 74, 1, 95-101.

[61] Dickey, D.A. (1976). Estimation and hypothesis testing in nonstationary time series. PhD dissertation, Iowa State University.

[62] Dickey, D.A. and W.A. Fuller (1979). Distribution of the Estimator for Autoregressive Time Series with a Unit Root. J. American Statistical Association, 74, 427-431.

[63] Dickey, D.A. and W.A. Fuller (1981). Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root. Econometrica, 49, 1057-1072.

[64] Dickey, D.A., W.R. Bell and R.B. Miller (1986). Unit roots in time series models: tests and implications. The American Statistician, 40, 1, 12-26.

[65] Diebold, F.X. and A. Inoue (2001). Long memory and regime switching. J. Econometrics, 105, 1, 131-159.

[66] Diebold, F.X., S. Husted and M. Rush (1991). Real Exchange Rates under the Gold Standard. The Journal of Political Economy, 99, 6, 1252-1271.

[67] Diebold, F.X. and G.D. Rudebusch (1989). Long memory and persistence in aggregate output. J. Monetary Economics, 24, 2, 189-209.

[68] Diebold, F.X. and G.D. Rudebusch (1991). Forecasting output with the composite leading index: a real-time analysis. J. American Statistical Association, 86, 415.

[69] Ding, Z. and C.W.J. Granger (1996). Modeling volatility persistence of speculative returns: A new approach. J. Econometrics, 73, 185-215.

[70] Diongue, A.K., D. Guégan and B. Vignal (2007). The stationary seasonal hyperbolic asymmetric power ARCH model. Statistics and Probability Letters, 77, 1158-1169.

[71] Diongue, A.K., D. Guégan and B. Vignal (2009). Forecasting electricity spot market prices with a k-factor GIGARCH process. Applied Energy, 36, 505-510.

[72] Diongue, A.K., D. Guégan and B. Vignal (2004). A k-factor GIGARCH process: estimation and application on electricity market spot prices. IEEE Proceedings of the 8th International Conference on Probability Methods Applied to Power Systems, Iowa State University, Ames, Iowa, 1-7.

[73] Dolado, J.J., J. Gonzalo and L. Mayoral (2002). A fractional Dickey-Fuller test for unit roots. Econometrica, 70, 5, 1963-2006.

[74] Dufrénot, G., D. Guégan and A. Péguin-Feissolle (2005a). Long-memory dynamics in a SETAR model. Applications to stock markets. J. International Financial Markets, Institutions and Money, 15, 391-406.

[75] Dufrénot, G., D. Guégan and A. Péguin-Feissolle (2005b). Modeling squared returns using a SETAR model with long-memory dynamics. Economics Letters, 86, 237-243.


[76] Dufrénot, G., D. Guégan and A. Péguin-Feissolle (2008). Changing regime volatility: a fractionally integrated SETAR model. Applied Financial Economics, 18, 519-526.

[77] Embrechts, P. and M. Maejima (2000). An introduction to the theory of self-similar stochastic processes. International Journal of Modern Physics B, 14, 1399-1420.

[78] Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987-1007.

[79] Engle, R.F. and T. Bollerslev (1986). Modeling the persistence of conditional variance. Econometric Reviews, 5, 1-50.

[80] Engle, R.F. and C.W.J. Granger (1987). Co-integration and error correction: Representation, estimation and testing. Econometrica, 55, 251-276.

[81] Engle, R.F. and B.S. Yoo (1987). Forecasting and testing in cointegrated systems. J. Econometrics, 35, 143-159.

[82] Evans, G.B.A. and N.E. Savin (1981). Testing for Unit Roots: 1. Econometrica, 49, 753-779.

[83] Evans, G.B.A. and N.E. Savin (1984). Testing for Unit Roots: 2. Econometrica, 52, 1241-1269.

[84] Fan, J. (2000). Prospects of nonparametric modeling. J. American Statistical Association, 95, 452, 1296-1300.

[85] Fan, J. and Q. Yao (2005). Nonlinear time series: Nonparametric and parametric methods. Springer Science+Business Media, Inc.

[86] Ferrara, L. (2000). Processus longue mémoire généralisés: estimation, prévision et applications. Thèse de doctorat, Université Paris XIII, France.

[87] Ferrara, L. and D. Guégan (2001a). Forecasting with k-factor Gegenbauer processes: Theory and applications. J. Forecasting, 20, 581-601.

[88] Ferrara, L. and D. Guégan (2001b). Comparison of parameter estimation methods in cyclical long memory time series. In: C. Dunis, J. Moody and A. Timmermann (eds.), Developments in Forecast Combination and Portfolio Choice, Wiley, New York, Chapter 8.

[89] Fox, R. and M.S. Taqqu (1986). Large-sample properties of parameter estimates for strongly dependent stationary Gaussian series. Annals of Statistics, 14, 517-532.

[90] Franses, P.H. and M. Ooms (1997). A periodic long memory model for quarterly UK inflation. International Journal of Forecasting, 13, 117-126.

[91] Fuller, W.A. (1976). Introduction to statistical time series. New York, John Wiley and Sons.


[92] Gençay, R., B. Selçuk and B. Whitcher (2001). An introduction to wavelets and other filtering methods in finance and economics (1st ed.). Academic Press.

[93] Geweke, J. and S. Porter-Hudak (1983). The estimation and application of long memory time series models. J. Time Series Analysis, 4, 221-238.

[94] Gil-Alana, L.A. (2001). A fractionally integrated exponential spectral model for the UK unemployment. J. Forecasting, 20, 329-340.

[95] Gil-Alana, L.A. (2006). Testing seasonality in the context of fractionally integrated processes. Annales d'Economie et de Statistique, 81, 69-91.

[96] Gil-Alana, L.A. and P.M. Robinson (1997). Testing of unit root and other nonstationary hypotheses in macroeconomic time series. J. Econometrics, 80, 2, 241-268.

[97] Gil-Alana, L.A. and P.M. Robinson (2001). Testing of seasonal fractional integration in UK and Japanese consumption and income. J. Applied Econometrics, 16, 2, 95-114.

[98] Giraitis, L. and R. Leipus (1995). A generalized fractionally differencing approach in long memory modelling. Lithuanian Mathematical Journal, 35, 65-81.

[99] Giraitis, L. and P.M. Robinson (2003). Edgeworth Expansions for Semiparametric Whittle Estimation of Long Memory. The Annals of Statistics, 31, 4, 1325-1375.

[100] Giraitis, L. and D. Surgailis (1990). A central limit theorem for quadratic forms in strongly dependent linear variables and its application to asymptotic normality of Whittle's estimate. Probability Theory and Related Fields, 86, 87-104.

[101] Goncalves, P. and P. Abry (1997). Multiple-window wavelet transform and local scaling exponent estimation. In: 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97), 5, 3433-3436.

[102] Goupillaud, P., A. Grossmann and J. Morlet (1984). Cycle-octave and related transforms in seismic signal analysis. Geoexploration, 23, 1, 85-102.

[103] Granger, C.W.J. and R. Joyeux (1980). An introduction to long memory time series models and fractional differencing. J. Time Series Analysis, 1, 15-29.

[104] Granger, C.W.J. and T. Teräsvirta (1999). A simple nonlinear time series model with misleading linear properties. Economics Letters, 62, 161-165.

[105] Gray, H.L., N.F. Zhang and W.A. Woodward (1989). On generalized fractional processes. J. Time Series Analysis, 10, 3, 233-257.

[106] Gray, H.L., N.F. Zhang and W.A. Woodward (1994). On generalized fractional processes - a correction. J. Time Series Analysis, 15, 5, 561-562.

[107] Green, P.J. and B.W. Silverman (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall.


[108] Grossmann, A. and J. Morlet (1984). Decomposition of functions into wavelets of constant shape, and related transforms. University of Bielefeld.

[109] Guégan, D. (2000). A new model: the k-factor GIGARCH process. J. Signal Processing, 4, 265-271.

[110] Guégan, D. (2003). A prospective study of the k-factor Gegenbauer process with heteroscedastic errors and an application to inflation rates. Finance India, 17, 1-21.

[111] Guégan, D. (2005). How can we define the concept of long memory? An econometric survey. Econometric Reviews, 24, 2.

[112] Guégan, D. (2009). Global and local stationary modelling in finance: theory and empirical evidence. In revision for Economic Review.

[113] Guégan, D. and Z.P. Lu (2007). A note on self-similarity for discrete time series. CES Working Paper 2007.55, Université Paris 1, France.

[114] Hall, P., H.L. Koul and B.A. Turlach (1997). Note on convergence rates of semiparametric estimators of dependence index. The Annals of Statistics, 25, 1725-1739.

[115] Haar, A. (1910). Zur Theorie der orthogonalen Funktionensysteme. (German) Mathematische Annalen, 69, 3, 331-371.

[116] Hassler, U. (1994). Misspecification of long memory seasonal time series. J. Time Series Analysis, 15, 19-30.

[117] Hassler, U. and J. Wolters (1994). On the power of unit root tests against fractional alternatives. Economics Letters, 45, 1-5.

[118] Hauser, M.A., B.M. Potscher and E. Reschenhofer (1999). Measuring persistence in aggregate output: ARMA models, fractionally integrated ARMA models and nonparametric procedures. Empirical Economics, 24, 2, 243-269.

[119] Henry, M. and P.M. Robinson (1996). Bandwidth choice in Gaussian semiparametric estimation of long range dependence. In: P.M. Robinson and M. Rosenblatt (eds.), Athens Conference on Applied Probability and Time Series Analysis, Volume II: Time Series Analysis, In Memory of E.J. Hannan, Springer, New York, 220-232.

[120] Hipel, K.W. and A.I. McLeod (1978). Preservation of the rescaled adjusted range, Part 2: Simulation studies using Box-Jenkins models. Water Resources Research, 14, 509-516.

[121] Hosking, J.R.M. (1981). Fractional differencing. Biometrika, 68, 1, 165-176.

[122] Hosoya, Y. (1997). A limit theory for long-range dependence and statistical inference on related models. Annals of Statistics, 25, 105-137.

[123] Hubbard, B.B. (1996). The world according to wavelets. AK Peters, Ltd., 2nd revised and updated edition.


[124] Hurst, H.E. (1951). Long term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 776-808.

[125] Hurvich, C.M. and K.I. Beltrao (1993). Asymptotics for the low-frequency ordinates of the periodogram of a long-memory time series. J. Time Series Analysis, 14, 5, 455-472.

[126] Hurvich, C.M. and K.I. Beltrao (1994). Automatic Semiparametric Estimation of the Memory Parameter of a Long-Memory Time Series. J. Time Series Analysis, 15, 285-302.

[127] Hurvich, C.M. and W.W. Chen (2000). An efficient taper for potentially overdifferenced long-memory time series. J. Time Series Analysis, 21, 155-180.

[128] Hurvich, C.M., R.S. Deo and J. Brodsky (1998). The mean squared error of Geweke and Porter-Hudak's estimator of the memory parameter of a long memory time series. J. Time Series Analysis, 19, 19-46.

[129] Hurvich, C.M. and B.K. Ray (1995). Estimation of the memory parameter for nonstationary or noninvertible fractionally integrated processes. J. Time Series Analysis, 16, 17-42.

[130] Jensen, M.J. (1999a). An approximate wavelet MLE of short and long memory parameters. Studies in Nonlinear Dynamics and Econometrics, 3, 4, 239-253.

[131] Jensen, M.J. (1999b). Using wavelets to obtain a consistent ordinary least squares estimator of the long-memory parameter. J. Forecasting, 18, 1, 17-32.

[132] Jensen, M.J. (2000). An alternative maximum likelihood estimator of long-memory processes using compactly supported wavelets. J. Economic Dynamics and Control, 24, 3, 361-387.

[133] Jensen, M.J. and B. Whitcher (2000). Time-varying long-memory in volatility: detection and estimation with wavelets. Working Paper, Department of Economics, University of Missouri, Columbia.

[134] Jones, R.H. (1971). Spectrum estimation with missing observations. Annals of the Institute of Statistical Mathematics, 23, 1, 387-398.

[135] Johnstone, I.M. and B.W. Silverman (1997). Wavelet threshold estimators for data with correlated noise. J. Royal Statistical Society, Series B, 59, 319-351.

[136] Karanasos, M., Z. Psaradakis and M. Sola (2004). On the autocorrelation properties of long-memory GARCH processes. J. Time Series Analysis, 25, 265-281.

[137] Kent, J.T. and A.T.A. Wood (1997). Estimating the fractal dimension of a locally self-similar Gaussian process by using increments. J. Royal Statistical Society B, 59, 679-700.


[138] Kim, C.S. and P.C.B. Phillips (1999). Modified log periodogram regression. Mimeo, Yale University.

[139] Kolmogorov, A.N. (1941). On degeneration of isotropic turbulence in an incompressible viscous liquid. Dokl. Akad. Nauk SSSR, 30, 301.

[140] Kolmogorov, A.N. (1961). Local structure of turbulence in fluid for very large Reynolds numbers. Transl. in Turbulence, S.K. Friedlander and L. Topper (eds.), Interscience Publishers, New York, 151-155.

[141] Kolmogorov, A.N. (1991). The Local Structure of Turbulence in Incompressible Viscous Fluid for Very Large Reynolds Numbers. Proceedings: Mathematical and Physical Sciences, 434, 1890, Turbulence and Stochastic Process: Kolmogorov's Ideas 50 Years On (Jul. 8, 1991), 9-13.

[142] Kozhemyak, A. (2006). Modélisation de séries financières à l'aide de processus invariants d'échelle. Application à la prédiction du risque. Thèse de doctorat, École Polytechnique, France.

[143] Krämer, W. and I. Dittmann (1998). Fractional integration and the augmented Dickey-Fuller test. Economics Letters, 61, 269-272.

[144] Künsch, H.R. (1986). Discrimination between monotonic trends and long-range dependence. J. Applied Probability, 23, 1025-1030.

[145] Künsch, H.R. (1987). Statistical aspects of self-similar processes. In: Y. Prokhorov and V.V. Sazanov (eds.), Proceedings of the First World Congress of the Bernoulli Society, VNU Science Press, Utrecht, 67-74.

[146] Leland, W.E., M.S. Taqqu, W. Willinger and D.V. Wilson (1994). On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2, 1 (Feb.), 1-15.

[147] Liang, J. and T.W. Parks (1996). A translation-invariant wavelet representation algorithm with applications. IEEE Transactions on Signal Processing, 44, 2, 225-232.

[148] Lien, D. and Y.K. Tse (1999). Forecasting the Nikkei spot index with fractional cointegration. J. Forecasting, 18, 259-273.

[149] Lo, A.W. (1991). Long term memory in stock market prices. Econometrica, 59, 1279-1313.

[150] Lobato, I. and P.M. Robinson (1996). Averaged periodogram estimation of long memory. J. Econometrics, 73, 1, 303-324.

[151] López-Ardao, J.C., C. López-García, A. Suárez-González, M. Fernández-Veiga and R.F. Rodríguez-Rubio (2000). On the use of self-similar processes in network simulation. ACM Transactions on Modeling and Computer Simulation, 10, 125-151.


[152] Mallat, S. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 7, 674-693.

[153] Mallat, S. (1999). A wavelet tour of signal processing (2nd ed.). Academic Press.

[154] Mallat, S. and W.L. Hwang (1992). Singularity detection and processing with wavelets. IEEE Transactions on Information Theory, 38, 2.

[155] Mallat, S. and D. Zhong (1992). Characterization of signals from multiscale edges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 7.

[156] Mandelbrot, B.B. and J.W. Van Ness (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review, 10, 422-437.

[157] Mandelbrot, B.B. and J.R. Wallis (1969). Computer experiments with fractional Gaussian noises. Water Resources Research, 5, 228-267.

[158] Maynard, A. and P.C.B. Phillips (2001). Rethinking an old empirical puzzle: econometric evidence on the forward discount anomaly. J. Applied Econometrics, 16, 6, 671-708.

[159] McCoy, E.J. and D.A. Stephens (2004). Bayesian time series analysis of periodic behaviour and spectral structure. International Journal of Forecasting, 20, 4, 713-730.

[160] McCoy, E.J. and A.T. Walden (1996). Wavelet analysis and synthesis of stationary long-memory processes. J. Computational and Graphical Statistics, 5, 1, 26-56.

[161] Meyer, Y. (1991). Wavelets and applications. Proceedings of the International Congress of Mathematicians, I, II, 1619-1626.

[162] Meyer, Y. (1992). Wavelets and operators. Cambridge Studies in Advanced Mathematics, vol. 37, Cambridge University Press, Cambridge.

[163] Morris, J.M. and R. Peravali (1999). Minimum-bandwidth discrete-time wavelets. Signal Processing, 76, 2, 181-193.

[164] Moulines, E., F. Roueff and M.S. Taqqu (2008). A wavelet Whittle estimator of the memory parameter of a nonstationary Gaussian time series. The Annals of Statistics, 36, 4, 1925-1956.

[165] Nason, G.P. and B.W. Silverman (1995). The stationary wavelet transform and some statistical applications. In Antoniadis and Oppenheim, 281-300.

[166] Nelson, D.B. (1990). Stationarity and persistence in the GARCH(1,1) model. Econometric Theory, 6, 318-334.

[167] Nielsen, M.Ø. (2004). Efficient likelihood inference in nonstationary univariate models. Econometric Theory, 20, 116-146.


[168] Nielsen, M.Ø. and P.H. Frederiksen (2005). Finite sample comparison of parametric, semiparametric, and wavelet estimators of fractional integration. Econometric Reviews, 24, 405-443.

[169] Noakes, D.J., K.W. Hipel, A.I. McLeod, C. Jimenez and C. Yakowitz (1988). Forecasting annual geophysical time series. International Journal of Forecasting, 4, 103-115.

[170] Ogden, R.T. (1997). Essential Wavelets for Statistical Applications and Data Analysis. Boston: Birkhäuser.

[171] Olhede, S.C., E.J. McCoy and D.A. Stephens (2004). Large-sample properties of the periodogram estimator of seasonally persistent processes. Biometrika, 91, 3, 613-628.

[172] Palma, W. (2000). Long-Memory Time Series: Theory and Methods. Wiley Series in Probability and Statistics.

[173] Palma, W. and N.H. Chan (2005). Efficient Estimation of Seasonal Long-Range-Dependent Processes. J. Time Series Analysis, 26, 6, 863-892.

[174] Palma, W. and M. Zevallos (2004). Analysis of the correlation structure of square time series. J. Time Series Analysis, 25, 529-550.

[175] Parke, W.R. (1999). What is Fractional Integration? The Review of Economics and Statistics, 81, 4, 632-638.

[176] Percival, D.B. (1993). Simulating Gaussian random processes with specified spectra. Computing Science and Statistics, 24, 534-538.

[177] Percival, D.B. (1995). On estimation of the wavelet variance. Biometrika, 82, 3, 619-631.

[178] Percival, D.B. and H.O. Mofjeld (1997). Analysis of Subtidal Coastal Sea Level Fluctuations Using Wavelets. J. American Statistical Association, 92, 868-880.

[179] Percival, D.B. and A.T. Walden (2000). Wavelet methods for time series analysis. Cambridge University Press.

[180] Pesquet, J.C., H. Krim and H. Carfantan (1996). Time-invariant orthonormal wavelet representations. IEEE Transactions on Signal Processing, 44, 8, 1964-1970.

[181] Phillips, P.C.B. (1987). Time Series Regression with a Unit Root. Econometrica, 55, 277-301.

[182] Phillips, P.C.B. (2007). Unit root log periodogram regression. J. Econometrics, 138, 1, 104-124.

[183] Phillips, P.C.B. and K. Shimotsu (2000). Modified Local Whittle Estimation of the Memory Parameter in the Nonstationary Case. Cowles Foundation Discussion Paper No. 1265, Yale University.


[184] Phillips, P.C.B. and K. Shimotsu (2004). Local Whittle estimation in nonstationary and unit root cases. Annals of Statistics, 32, 656-692.

[185] Phillips, P.C.B. and K. Shimotsu (2006). Local Whittle estimation of fractional integration and some of its variants. J. Econometrics, 130, 2, 209-233.

[186] Porter-Hudak, S. (1990). An application of the seasonally fractionally differenced model to the monetary aggregates. J. American Statistical Association, 85, 338-344.

[187] Priestley, M.B. (1981). Spectral Analysis and Time Series. Academic Press, London.

[188] Rainville, E.D. (1960). Special functions. The Macmillan Company, New York.

[189] Ramsey, J.B. (1998). Regression over time scale decomposition: a sampling analysis of distributional properties. Economic Systems Research, 11, 163-183.

[190] Ramsey, J.B. (1999). The contribution of wavelets to the analysis of economic and financial data. Philosophical Transactions of the Royal Society.

[191] Ramsey, J.B. (2002). Wavelets in Economics and Finance: Past and Future. Studies in Nonlinear Dynamics and Econometrics, 6, 3, 1.

[192] Ramsey, J.B. and C. Lampart (1998a). The Decomposition of Economic Relationships by Time Scale Using Wavelets: Expenditure and Income. Studies in Nonlinear Dynamics and Econometrics, 3, 1, 2.

[193] Ramsey, J.B. and C. Lampart (1998b). Decomposition of economic relationships by time scale using wavelets. Macroeconomic Dynamics, 2, 49-71.

[194] Ramsey, J.B. and Z. Zhang (1997). The analysis of foreign exchange data using waveform dictionaries. J. Empirical Finance, 4, 4, 341-372.

[195] Ray, B.K. (1993a). Modelling long memory processes for optimal long range prediction. J. Time Series Analysis, 14, 511-526.

[196] Ray, B.K. (1993b). Long-range forecasting of IBM product revenues using a seasonal fractionally differenced ARMA model. International Journal of Forecasting, 9, 255-269.

[197] Reisen, V.A. (1994). Estimation of the fractional difference parameter in the ARFIMA(p,d,q) model using the smoothed periodogram. J. Time Series Analysis, 15, 1, 335-350.

[198] Robinson, P.M. (1976). Instrumental Variables Estimation of Differential Equations. Econometrica, 44, 4, 765-776.

[199] Robinson, P.M. (1992). Log-periodogram regression for time series with long range dependence. Unpublished manuscript.


[200] Robinson, P.M. (1994). Efficient tests of nonstationary hypotheses. J. American Statistical Association, 89, 428, 1420-1437.

[201] Robinson, P.M. (1995). Log-periodogram regression of time series with long range dependence. Annals of Statistics, 23, 1048-1072.

[202] Said, E.S. and D.A. Dickey (1984). Testing for unit roots in ARMA(p,q) models with unknown p and q. Biometrika, 71, 599-607.

[203] Samorodnitsky, G. and M.S. Taqqu (1994). Stable non-Gaussian random processes: stochastic models with infinite variance. Chapman and Hall, New York.

[204] Sargan, J.D. and A. Bhargava (1983). Testing Residuals from Least Squares Regression for Being Generated By a Gaussian Random Walk. Econometrica, 51, 153-174.

[205] Schmidt, P. and J. Lee (1996). A modification of the Schmidt-Phillips unit root test. Economics Letters, 36, 3, 285-289.

[206] Shann, W.C. and C.C. Yen (1999). On the exact values of orthonormal scaling coefficients of lengths 8 and 10. Applied and Computational Harmonic Analysis, 6, 1, 109-112.

[207] Shimotsu, K. (2002). Exact local Whittle estimation of fractional integration with unknown mean and time trend. Department of Economics Discussion Paper No. 543, University of Essex.

[208] Shimotsu, K. and P.C.B. Phillips (2002). Pooled log periodogram regression. J. Time Series Analysis, 23, 57-93.

[209] Shimotsu, K. and P.C.B. Phillips (2005). Exact local Whittle estimation of fractional integration. The Annals of Statistics, 33, 4, 1890-1933.

[210] Sibbertsen, P. (2004). Long memory versus structural breaks: An overview. Statistical Papers, 45, 4.

[211] Smith, J. and S. Yadav (1994). Forecasting cost incurred from unit differencing fractionally integrated processes. International Journal of Forecasting, 10, 507-514.

[212] Sowell, F. (1989). A decomposition of block Toeplitz matrices with applications to vector time series. Technical Report, GSIA, Carnegie Mellon University.

[213] Sowell, F. (1992a). Modeling long-run behavior with the fractional ARIMA model. J. Monetary Economics, 29, 2, 277-302.

[214] Sowell, F. (1992b). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. J. Econometrics, 53, 165-188.

[215] Strang, G. and T. Nguyen (1996). Wavelets and filter banks. Wellesley-Cambridge Press, Wellesley, MA.


[216] Stengos, T. and Y. Sun (2001). A consistent model specification test for a regression function based on nonparametric wavelet estimation. Econometric Reviews, 20, 1, 41-60.

[217] Tanaka, K. (1999). The nonstationary fractional unit root. Econometric Theory, 15, 509-582.

[218] Taqqu, M.S., V. Teverovsky and W. Willinger (1995). Estimators for long-range dependence: An empirical study. Fractals, 3, 785-798.

[219] Taqqu, M.S., V. Teverovsky and W. Willinger (1997). Is network traffic self-similar or multifractal? Fractals, 5, 63-73.

[220] Taylor, C.C. and S.J. Taylor (1991). Estimating the dimension of a fractal. J. Royal Statistical Society B, 53, 353-364.

[221] Tewfik, A.H. and M. Kim (1992). Correlation structure of the discrete wavelet coefficients of fractional Brownian motion. IEEE Transactions on Information Theory, 38, 2, 904-909.

[222] Tkacz, G. (2002). Estimating the fractional order of integration of interest rates using a wavelet OLS estimator. Technical Report 2000-5, Department of Monetary and Financial Analysis, Bank of Canada.

[223] Tse, Y.K., V.V. Ahn and Q. Tieng (2002). Maximum likelihood estimation of the fractional differencing parameter in an ARFIMA model using wavelets. Mathematics and Computers in Simulation, 59, 153-161.

[224] Tsybakov, B. and N.D. Georganas (1997). On the self-similar traffic in ATM queues: definitions, overflow probability bound, and cell delay distribution. IEEE/ACM Transactions on Networking, 5, 3, 397-409.

[225] Velasco, C. (1999a). Non-stationary log periodogram regression. J. Econometrics, 91, 325-371.

[226] Velasco, C. (1999b). Gaussian semiparametric estimation of non-stationary time series. J. Time Series Analysis, 20, 1, 87-127.

[227] Velasco, C. and P.M. Robinson (2000). Whittle pseudo-maximum likelihood estimation for nonstationary time series. J. American Statistical Association, 95, 452, 1229-1243.

[228] Vetterli, M. and J. Kovacevic (1995). Wavelets and subband coding. Prentice Hall, Englewood Cliffs, NJ.

[229] Vidakovic, B. (1998). Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. J. American Statistical Association, 93, 173-179.

[230] Vidakovic, B. (1999). Statistical modeling by wavelets. New York: Wiley.


[231] Vidakovic, B. and P. Müller (1999). An introduction to wavelets. In: Bayesian Inference in Wavelet-Based Models, Müller and Vidakovic (eds.), Springer-Verlag, Lecture Notes in Statistics, 141, 1-18.

[232] Veitch, D. and P. Abry (1999). A wavelet-based joint estimator of the parameters of long-range dependence. IEEE Transactions on Information Theory, 45, 3, 878-897.

[233] Wang, R. (1998). Some properties of sums of independent random sets. Northeastern Math. Journal, 14, 2, 203-210.

[234] Wang, R. and Z. Wang (1997). Set-Valued Stationary Processes. J. Multivariate Analysis, 63, 1, 180-198.

[235] Whitcher, B. and M.J. Jensen (2000). Wavelet estimation of a local long memory parameter. Exploration Geophysics, 31, 94-103.

[236] Whitcher, B. (2001). Simulating Gaussian stationary processes with unbounded spectra. J. Computational and Graphical Statistics, 10, 1, 112-134.

[237] Whitcher, B. (2004). Wavelet-based estimation procedures for seasonal long memory models. Technometrics, 46, 2, 225-238.

[238] Whittle, P. (1951). Hypothesis testing in time series analysis. Hafner, New York.

[239] Wickerhauser, M.V. (1996). Adapted Wavelet Analysis from Theory to Software. A.K. Peters, Wellesley, MA.

[240] Woodward, W.A., Q.C. Cheng and H.L. Gray (1998). A k-factor GARMA long-memory model. J. Time Series Analysis, 19, 5, 485-504.

[241] Wornell, G.W. (1996). Signal processing with fractals: a wavelet based approach. Prentice Hall, Englewood Cliffs, New Jersey.

[242] Yajima, Y. (1985). On estimation of long-memory time series models. Australian and New Zealand Journal of Statistics, 27, 3, 303-320.

[243] Yang, M. (2000). Some properties of vector autoregressive processes with Markov-switching coefficients. Econometric Theory, 16, 23-43.
