econ 623 econometrics ii topic 5: non-stationary time ... · econ 623 econometrics ii topic 5:...

Econ 623 Econometrics II

Topic 5: Non-stationary Time Series

1 Types of non-stationary time series models oftenfound in economics

� Deterministic trend (trend stationary):

Xt = f(t) + "t;

where t is the time trend and "t represents a stationary error term (hence both the

mean and the variance of "t are constant. Usually we assume "t � ARMA(p; q)

with mean 0 and variance �2)

� f(t) is a deterministic function of time and hence the model is called the

deterministic trend model.

�Why non-stationary? E(Xt) = f(t)(depends on t, hence not a constant);

V ar(Xt) = �2(constant)

� Since "t is stationary, Xt is �uctuating around f(t), the deterministic trend.

Xt has no obvious tendency for the amplitude of the �uctuations to increase

or decrease since V ar(Xt) is a constant. (Xt could increase or decrease,

however.)

�Xt� f(t) is stationary. For this reason a deterministic trend model is some-

times called a trend stationary model.

�Assume "t = �(L)et; et � iid(0; �2e): If we label et the innovation or the shock

to the system, the innovation must have a transient, diminishing e¤ect on

X. Why?

1

0 20 40 60 80 100

46

812

16

0 20 40 60 80 100

020

6010

0

0 20 40 60 80 100

040

8012

0

2

0 50 100 150 200

02

46

810

12

0 50 100 150 200

01

23

4

3

0 200 400 600 800 1000

20

10

010

random walk

0 200 400 600 800 1000

020

4060

80

4

Example: Xt = �0 + �1t + �(L)et; with �(L) = 1 + �L + �2L2 + � � � and

j�j < 1

So Xt = �0 + �1t+ "t; "t = �"t�1 + et

"t = �(L)et measures the deviation of the series from trend in period t. We

wish to examine the e¤ect of an innovation et on "t; "t+1; :::

"t = et + �et�1 + �2et�2 + :::

which gives @"t+s@et

= �s (impulse response)

Therefore, when there is a shock to the economy, a deterministic trend model

implies the shock has a transient e¤ect. Sooner or later the system will be

back to the deterministic trend.

� If f(t) = c1 + c2t, we have a linear trend model. It is widely used.

� If f(t) = Aert, we have an exponential growth curve

� If f(t) = c1 + c2t+ c3t2, we have a quadratic trend model

� If f(t) = 1k+abt

; we have a logistic curve

5

� Stochastic trend (unit root or di¤erence stationarity):

Xt = �+Xt�1 + "t

where "t is a stationary process (hence both the mean and the variance of "t are

constant. Usually we assume "t � ARMA(p; q) with mean 0 and variance �2)

�Another representation: (1� L)Xt = �+ "t

� Solving 1 � L = 0 for L; we have L = 1, justifying the terminology �unit

root�.

�Pure random walk: Xt = Xt�1 + et; etiid

~N(0; �2e)

1. Xt =Pt�1

j=0 et�j if we assume X0 = 0 with probability 1

2. E(Xt) = 0; V ar(Xt) = t�2e !1

3. Non-stationarity. Xt is wandering around and can be anywhere.

6

�Random walk with a drift: Xt = �+Xt�1 + et; etiid� N(0; �2e)

1. Xt = t�+Pt�1

j=0 et�j if we assume X0 = 0 with probability 1

2. E(Xt) = t�!1; V ar(Xt) = t�2e !1

3. Non-stationarity.

4. As E(Xt) = t�, a stochastic trend (or unit root) model could behave

similar to a model with a linear deterministic trend.

� In a random walk model, Xt = t�+Pt

j=0 et�j. If we label et�j the innovation

or the shock to the system, the innovation has a permanent e¤ect on Xt

because@Xt

@et�j= 1; 8 j > 0:

This is true for all models with a unit root. Therefore, when there is a shock

to the economy, a unit root model implies that the shock has a permanent

e¤ect. The system will begin with a new level every time.

� In a trend-stationary model

Xt = �0 + �1t+ �Xt�1 + et; j�j < 1;

we have@Xt

@et�j= �j ! 0:

So the innovation has a transient e¤ect on Xt

7

�A deterministic trend model and a stochastic trend (or unit root) model

could behave similar to each other. However, knowing whether non-stationarity

in the data is due to a deterministic trend or a stochastic trend (or unit

root) would seem to be a very important question in economics. For ex-

ample, macroeconomists are very interested in knowing whether economic

recessions have permanent consequences for the level of future GDP, or in-

stead represent temporary downturns with the lost output eventually made

up during the recovery.

� If (1� L)Xt = "t � ARMA(p; q); we say Xt � ARIMA(p; 1; q) or Xt is an

I(1) process.

� If (1�L)dXt = "t � ARMA(p; q); we say Xt � ARIMA(p; d; q) or Xt is an

I(d) process.

8

� Why linear time trends and unit roots?

�Why linear trends? Indeed most GDP series, such as many economic and

�nancial series seem to involve an exponential trend rather than a linear

trend (apart from a stochastic trend). Suppose this is true, then we have

Yt = exp(�t)

However, if we take the natural log of this exponential trend function, we

will have,

lnYt = �t

Thus, it is common to take logs of the data before attempting to describe

them. After we take logs, most economic and �nancial time series exhibit a

linear trend. This is why researchers often take the logarithmic transforma-

tion before doing analysis.

9

�Why unit roots? After taking logs, Suppose a time series follows a unit root

rather than a linear trend, we have (1�L) lnYt = "t; where "t is a stationary

process, such as ARMA(p,q). However, (1�L) lnYt = ln(Yt=Yt�1) = ln(1+Yt�Yt�1Yt�1

) � Yt�Yt�1Yt�1

: So the rate of growth of the series Yt is stationary. For

example, if Yt represents CPI and lnYt is a I(1) process, then (1 � L) lnYt

represent in�ation rate and is a stationary process.

� Other forms of nonstationarity: structural break in mean, structural break in

variance.

10

2 Processes with Deterministic Trends

� Model with a simple linear trend

Xt = �t+ et; t = 1; :::; T:

� Let the OLS estimator of � be b�T , ie,b�T = PXttP

t2= � +

PettPt2:

� If et � N(0; �2), then

b�T � N

��;

�2Pt2

�= N

��;

6�2

T (T + 1)(2T + 1)

�:

This is the exact small-sample normal distribution. The corresponding t statisticpT (T + 1)(2T + 1)b�Tp

6b�2Thas exact small-sample t distribution, where b�2T = 1

T�1P�

Xt � b�T t�.

11

� If etiid� (0; �2) and has �nite fourth moment, we have to �nd the asymptotic

distributions for b�T .� Note that

T 3=2�b�T � �

�=T�1=2

Pett=T

T�3Pt2

:

� Also note that fett=Tg is a MDS with (1) �2t = E(e2t t2=T 2) = �2t2=T 2 and

1T

P�2t2=T 2 ! �2=3; (2) Ejett=T j4 < 1; (3) 1

T

Pe2t t

2=T 2p! �2=3. The reason

for (3) to hold is because

E

�1

T

Xe2t t

2=T 2 � 1

T

X�2t

�2= E

�1

T

X�e2t � �2

�t2=T 2

�2=1

T 6E�e2t � �2

�2Xt4 ! 0:

� Applying CLT for MDS to fett=Tg, we get T�1=2Pett=T

d! N(0; �2=3). Since

T�3Pt2 ! 1=3, T 3=2

�b�T � ��

d! N(0; 3�2).

� Consider the t statistic:pT (T + 1)(2T + 1)

�b�T � ��

p6b�2T =

pT (T + 1)(2T + 1)T 3=2

�b�T � ��

p6b�2T d! N (0; 1) :

.

12

� Model with a simple linear trend

Xt = �+ �t+ et:

� If etiid� (0; �2) and has �nite fourth moment, then�

T�1=2Pet

T�1=2P(t=T ) et

�d! N

��00

�; �2

�1 1

212

13

��:

� Need to �nd the asymptotic distributions for b�T and b�T and the rate of conver-gence is di¤erent."

T 1=2 (b�T � �)

T 3=2�b�T � �

� # d! N

�00

�; �2

�1 1

212

13

��1!:

� The corresponding t statistics converge to N (0; 1).

13

3 Unit Root Tests and A Stationarity Test

� The theory of ARMA method relies on the assumption of stationarity.

� The assumption of stationarity is too strong for many macroeconomic time series.

For example, many macroeconomic time series involve trend (both deterministic

trends and stochastic trends �unit root).

� Tests for a unit root. Various test statistics are often used in practice: Augmented

Dickey-Fuller (ADF) test, Phillips and Perron (PP) test.

� Test for stationarity: Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.

� Three possible alternative hypotheses for the unit root tests

1. no constant and no trend: Xt = �Xt�1 + et; j�j < 1

2. constant but no trend: Xt = �+ �Xt�1 + et; j�j < 1

3. constant and trend: Xt = �+ �Xt�1 + �t+ et; j�j < 1

14

� First consider the AR(1) process, Xt = �Xt�1 + et; utiid� N(0; �2)

� Stationarity requires j�j < 1

� If j�j = 1; Xt is nonstationary and has a unit root

� Test H0 : � = 1 against H1 : j�j < 1

� How to estimate �? OLS estimation, ie, b� = PXtXt�1PX2t�1

= � +PXt�1etPX2t�1. Stock

(1994) shows that the OLS estimator of � is superconsistent:

� What is the distribution or asymptotic distribution of b�� ?

� Recall from the standard model, y = X� + e

� If X is non-stochastic, b� � � � N(0; �2(X0X)�1)

� If X is stochastic and stationary with some restrictions on autocovariance,pT (b� � �)

d! N(0; �2Q�1) where Q = p lim X0XT

15

3.1 Classical Asymptotic Theory for Covariance StationaryProcess

� Consider the AR(1) process, Xt = �Xt�1 + et; et � iidN(0; �2) with j�j < 1

� For this model E(Xt) = 0, V ar(Xt) =�2

1��2 � �2X , j = �jV ar(Xt).

� What is the asymptotic property of X = 1T

PTt=1Xt?

� We haveE(X) = 0, V ar(X) = E(X2) = V ar(Xt)

T

�1 + 2T�1

T�+ 2T�2

T�2 + � � �+ 2 1

T�T�1

�!

0. So X r2! 0) Xp! 0

� Also, TV ar(X)!P1

j=�1 j, which is called the long-run variance.

� The consistency is generally true for any covariance stationary process with con-

ditionP1

j=0 j jj <1.

� Martingale Di¤erence Sequence (MDS): E(XtjXt�1; :::; X1) = 0 for all t

� MDS is serially uncorrelated but not necessarily independent.

� MDS does not need to be stationary.

16

� Theorem (White, 1984): Let fXtg be a MDS with XT =1T

PTt=1Xt Suppose

that (a) E(X2t ) = �2t > 0 with 1

T

PTt=1 �

2t ! �2 > 0, (b) EjXtjr < 1 for some

r > 2 and all t, and (c) 1T

PTt=1X

2t

p! �2. ThenpTXT

d! N(0; �2).

� Theorem (Andersen, 1971): Let

Xt =P1

t=0 j"t�j

where f"tg is a sequence of iid variables with E("2t ) < 1 andP1

j=0 j jj < 1.

Let j be the jth order autocovariance. Then

pTXT

d! N(0;P1

j=�1 j)

� To know more about asymptotic theory for linear processes, read Phillips and

Solo (1991).

17

3.2 Dickey-Fuller Test

� Under H0, however, we cannot claimpT (b�� ) d! N(0; �2Q�1) because Xt�1 is

not stationary.

� If Xt = �Xt�1 + et; and H0 : � = 1; we have b�� 1 = PXt�1etPX2t�1

� Consider X2t = (Xt�1 + et)

2 =) Xt�1et =12(X2

t � X2t�1 � e2t ) =)

PXt�1et =

12(X2

T �X20 �

Pe2t )

� Let X0 = 0;PXt�1et =

12(X2

T �Pe2t )

� etiid� (0; �2) =) e2t � iid with E(e2t ) = �2: By LLN, 1

T

Pe2t

p! �2

� XT = X0 +PT

t=1 et =Pet

a� N(0; T�2) =) XTpT�

d! N(0; 1) =) X2T

T�2d! �2(1)

� 1T�2

PXt�1et =

12(X2T

T�2�

Pe2t

T�2)d! 1

2(�2(1) � 1)

� What is the limit behavior ofPX2t�1? We need to introduce the Brownian Mo-

tion.

18

3.2.1 Brownian Motion

� Consider Xt = Xt�1 + et; X0 = 0; et � iidN(0; 1) =) Xt =Pt

j=1 ej � N(0; t)

� 8t > s;Xt �Xs =Pt

j=1 ej �Ps

j=1 ej = es+1 + :::+ et � N(0; (t� s))

� Also note that, Xt �Xs is indept of Xr �Xq if t > s > r > q

� What happen if we go from the discrete case to the continuous case? We have

the standard Brownian Motion (BM). It is denoted by B(t)

� A stochastic process Bt over time is a standard Brownian motion if for small

time interval �, the change in Bt;�Bt(� Bt+��Bt), follows the following prop-

erties:

1. �Bt = ep� where e � N(0; 1)

2. �Bt and �Bs for changes over any non-overlapping short intervals are in-

dependent

� The above two properties imply the following:

1. E(�Bt) = 0

2. Var(�Bt) = �

3. �(�Bt) =p�

19

� In general we have

BT �B0 =

NXi=1

eip�

� Due to the above, we have

E(BT �B0) = 0

Var(BT �B0) = T

�(BT �B0) =pT

20

� Therefore, for the Brownian motion, results for the mean and variance in small

intervals also apply to large intervals.

� When �! 0, we write the Brownian motion as:

dBt = ep�

where dBt should be interpreted as the change in the Brownian motion over an

arbitrarily small time interval.

21

� Sample paths of the Brownian motion are continuous everywhere but not smooth

anywhere.

� Derivative of the Brownian motion does not exist anywhere and the process is

in�nitely �kinky�.

� A Brownian motion is nonstationary as Var(BT ) drifts up with t. The transition

density is

BT+�jBT � N(BT ;�):

� A Brownian motion is also called a Wiener process. It has no drift. Now we

extend it to the generalized Wiener process.

� We can de�ne et = e1t + e2t; e1t; e2t � N(0; 1=2); yt+ 12= yt + e1t

� De�nition of the BM: W (t) = �B(t)

22

� Note:

1. B(0) = 0

2. B(t) � N(0; t) or B(t) is a Gaussian process

3. E(B(t+ h)�B(t)) = 0; E(B(t+ h)�B(t))2 = jhj

4. E(B(t)B(s)) = minft; sg

5. If t > s > r > q;B(t) � B(s) is indept of B(r) � B(q); ie, B(t) has indept

increment

6. Since B(t) is a Gaussian process, it is completely speci�ed by its covariance

matrix

7. If B(t) is a BM, then so is B(t+ a)�B(a); ��1B(�2t);�B(t)

8. The sample path of BM is continuous in time with probability 1

9. No point di¤erentiable

23

� Return to the unit root test. Xt = X0+Pt

j=1 ej: De�ne pt =Pt

j=1 ej = pt�1+ et

� Change the time index from T = f1; :::; Tg to t with 0 � t � 1

� De�ne YT (t) = 1�pTp[Tt]; if

j�1T� t � j

T; j = 1; :::; T: [Tt] = integer part of Tt

� For instance, YT (1) = 1�pTpT

� YT (t) =

8<:0 if 0 � t � 1

Tp1�pT

if 1T� t � 2

Tpj�1�pT

if j�1T� t � j

T

� Two important theorems

1. Functional CLT: YT (�)d! B(�)

2. Continuous mapping theorem: if f is a continuous function, f(YT (t))d!

f(B(t))

24

� Note that YT (1)d! N(0; 1), YT (1=2)

d! N(0; 1=2), YT (r)d! N(0; r), 8r 2 [0; 1].

� 8r1; r2 2 [0; 1], r2 > r1, YT (Tr2)� YT (Tr2)d! N(0; r2 � r1).

�PT

t=1Xt =PT

t=1 pt�1 +PT

t=1 et if X0 = 0.

�R j=T(j�1)=T YT (t)dt =

pj�1�TpT=)

Ppj�1 = �T

pTPT

j=1

R j=T(j�1)=T YT (t)dt = �T

pTR 10YT (t)dt

�PT

t=1Xt = �TpTR 10YT (t)dt+

pTPejpT=)

PTt=1Xt

TpT= �

R 10YT (t)dt+

1T

PejpT

� By FCLT and CMT,PTt=1Xt

TpT

d! �R 10B(t)dt

25

�PT

t=1X2t�1 = X2

0+ � � �+X2T�1 = X2

1+ � � �+X2T�1 =

PTt=1X

2t �X2

T =PT

t=1 p2t�p2T

� Recall p2j�1 = �2T 2R j=Tj�1=T Y

2T (t)dt =)

Pp2j�1 = �2T 2

R 10Y 2T (t)dt and p2T =

�2TY 2T (1)

� Therefore,PTt=1X

2t�1

T 2d! �2

R 10B2(t)dt

� Under H0; T (b�� )d! 1=2(�2(1)�1)R 1

0 B2(t)dt

� Phillips (1987) proved the above result under very general conditions.

� Note:

1. This limiting distribution is non-standard

2. The numerator and denominator are not independent.

3. It is called the Dickey-Fuller distribution since Dickey and Fuller (1976) uses

Monte-Carlo simulation to �nd the critical values of�2(1)�1

2R 10 B

2(t)dtand tabulate

them.

26

1. T�1=2PT

t=1 etd! �B(1);

PTt=1 et � Op(T

1=2)

2. T�1PT

t=1 pt�1etd! 1

2�2 fB2(1)� 1g ;

PTt=1 pt�1et � Op(T )

3. T�3=2PT

t=1 tetd! �B(1)� �

R 10W (r)dr;

PTt=1 tet � Op(T

3=2)

4. T�3=2PT

t=1 pt�1d! �

R 10B(r)dr;

PTt=1 pt�1 � Op(T

3=2)

5. T�2PT

t=1 p2t�1

d! �2R 10B2(r)dr;

PTt=1 p

2t�1 � Op(T

2)

6. T�5=2PT

t=1 tpt�1d! �

R 10rB(r)dr;

PTt=1 tpt�1 � Op(T

5=2)

7. T�3PT

t=1 tp2t�1

d! �2R 10rB2(r)dr;

PTt=1 tp

2t�1 � Op(T

3)

8.PT

t=1 t = T (T + 1)=2 � O(T 2)

27

3.2.2 Case 1 (no drift or trend in the regression):

� In case 1, we test�H0 : Xt = Xt�1 + etH1 : Xt = �Xt�1 + et with j�j < 1

� This is equivalent to�H0 : �Xt = etH1 : �Xt = �Xt�1 + et with � < 0

� We estimate the following model using the OLS method:

�Xt = �Xt�1 + et; etiid� (0; �2e):

28

� The OLS estimator of � is de�ned by b�T , which is biased but consistent.� The test statistic (known as theDF Z test or the coe¢ cient statistic) is de�ned

as T b�T . Under H0, T b�T d! 1=2(�2(1)�1)R 10 B

2(r)dr.

� 1=2(�2(1)�1)R 10 B

2(r)dris not a normal distribution. See Table B.5 for the critical values of

this distribution. The �nite sample distribution of b�T is not normal. See TableB.5 for the critical values for di¤erent sample sizes.

� Alternatively, we can use t-statistic b�Tbse(b�T ) . Under H0, it can be shown that

b�Tbse(b�T ) = T b�T �T�2PX2t�1�1=2

b�T d!1=2

��2(1) � 1

�hR 10B2(r)dr

i1=2 :Although it is called the Dickey-Fuller t-statistic, it is no longer a t distribution.

See Table B.6 for the critical values of this distribution and the critical values of

the �nite sample distribution for di¤erent sample sizes.

29

3.2.3 Case 2 (constant but no trend in the regression):

� We test�H0 : �Xt = etH1 : �Xt = �+ �Xt�1 + et with � < 0

� In case 2, we estimate the following model using the OLS method:

�Xt = �+ �Xt�1 + et; etiid� N(0; �2e)

� The DF Z test statistic is T b�T : Under H0, T b�T d! 1=2(�2(1)�1)�B(1)

R 10 B(r)drR 1

0 B2(r)dr�(

R 10 B(r)dr)

2 . See

Table B.5 for the critical values of this distribution and the critical values of the

�nite sample distribution for di¤erent sample sizes.

� Proof of the Asymptotic Distribution:

The OLS estimates are� b�Tb�T�=

�T

PXt�1P

Xt�1PX2t�1

��1 � PetP

Xt�1et

�=

�Op(T ) Op(T

3=2)Op(T

3=2) Op(T2)

��1 �Op(T

1=2)Op(T )

�:

30

Let g(T ) = diag(T 1=2; T ), then�T 1=2b�TT b�T

�= g(T )

�T

PXt�1P

Xt�1PX2t�1

��1 � PetP

Xt�1et

�=

�g�1(T )

�T

PXt�1P

Xt�1PX2t�1

�g�1(T )

��1g�1(T )

� PetP

Xyt�1et

�=

�1 T�3=2

PXt�1

T�3=2PXt�1 T�2

PX2t�1

��1 �T�1=2

Pet

T�1PXt�1et

�d!"

1 �R 10B(r)dr

�R 10B(r)dr �2

R 10B2(r)dr

#�1 ��B(1)

12�2 fB2(1)� 1g

�:

� Alternatively, we can use t-statistic b�Tbse(b�T ) d! 1=2(�2(1)�1)�B(1)

R 10 B(r)dr�R 1

0 B2(r)dr�(

R 10 B(r)dr)

2�1=2 under H0.

Although it is called the Dickey-Fuller t-statistic, it is no longer a t distribution.

See Table B.6 for the critical values of this distribution and the critical values of

the �nite sample distribution for di¤erent sample sizes.

31

3.2.4 Case 2b (H0: Random Walk with Drift):

� What if the true model is a random walk with drift, ie, Xt = � + Xt�1 + et =

X0 + t�+Pt

j=1 ej? (� 6= 0)

�PXt�1 = TX0 + T (T � 1)�=2 +

Ppt�1 � Op(T

2)

� T�2PXt�1

p! �=2, T�3PX2t�1

p! �2=3,PXt�1et � Op(T

3=2)

� Let g(T ) = diag(T 1=2; T 3=2), then�T 1=2b�TT 3=2b�T

�=

�g�1(T )

�T

PXt�1P

Xt�1PX2t�1

�g�1(T )

��1 �T�1=2

Pet

T�3=2PXt�1et

�

� The �rst term converges to�

1 �=2�=2 �2=3

��1. Since T�3=2

PXt�1et

p! T�3=2P�(t�

1)et, the second term converges to�T�1=2

Pet

T�3=2PXt�1et

�d! N

��00

�; �2

�1 �=2�=2 �2=3

��:

And hence, �T 1=2b�TT 3=2b�T

�d! N

�00

�; �2

�1 �=2�=2 �2=3

��1!

32

3.2.5 Case 3 (constant and trend in the regression):

� We test�H0 : �Xt = etH1 : �Xt = �+ �Xt�1 + �t+ et with � < 0

or�H0 : �Xt = �+ etH1 : �Xt = �+ �Xt�1 + �t+ et with � < 0

� In this case, we estimate the following model using the OLS method:

Xt = �+ �Xt�1 + �t+ et; etiid� N(0; �2e)

� Xt = �(1��)+�(Xt�1��(t� 1))+ (�+��)t+ et = ��+�pt�1+ ��t+ et, where

pt�1 = Xt�1 � �(t� 1) is a pure random walk if � = 1 (ie � = 0) and � = 0.

33

� Let g(T ) = diag(T 1=2; T; T 3=2), then2664T 1=2b��T

T�b�T � 1�

T 3=2�b��T � �

�3775 =

8<:g�1(T )24 T

Ppt�1

PtP

pt�1Pp2t�1

Ptpt�1P

tPtpt�1

Pt2

35 g�1(T )9=;�1 24 T�1=2

Pet

T�1Ppt�1et

T�3=2Ptet

35d!

24 1 �RBdr 1=2

�RB(r)dr �2

RB2dr �

RrBdr

1=2 �RrBdr 1=3

35�1 24 �B(1)12�2 [B2(1)� 1]

��B(1)�

RBdr

�35

=

24 � 0 00 1 00 0 �

3524 1RBdr 1=2R

BdrRB2dr

RrBdr

1=2RrBdr 1=3

35�1 24 B(1)12[B2(1)� 1]

B(1)�RBdr

35 :� So the aymptotic distribution of b�T (and hence b�T ) is independent of � and �.

34

� The Dickey-Fuller Z test statistic is T b�T : Under H0, the asymptotic distribution

is not a normal distribution. See Table B.5 for the critical values of the asymptotic

distribution distribution and the critical values of the �nite sample distribution

for di¤erent sample sizes.

� Alternatively, we can use the Dickey-Fuller t-statistic b�Tbse(b�T ) whose asymptoticdistribution distribution is no longer a t distribution. See Table B.6 for the

critical values of this distribution for di¤erent sample sizes.

Test Statistic 1% 2.5% 5% 10%Case 1 -2.56 -2.34 -1.94 -1.62Case 2 -3.43 -3.12 -2.86 -2.57Case 3 -3.96 -3.66 -3.41 -3.13

When the sample size is 100, the critical values for the Dickey-Fuller t-statistic

are given in the table below

Test Statistic 1% 2.5% 5% 10%Case 1 -2.60 -2.24 -1.95 -1.61Case 2 -3.51 -3.17 -2.89 -2.58Case 3 -4.04 -3.73 -3.45 -3.15

35

3.2.6 Which case to use in practice?

� Use Case 3 test for a series with an obvious trend, such as GDP

� Use Case 2 test for a series without an obvious trend, such as interest rates. For

variables such as interest rate we should use Case 2 rather than Case 1 since if

the data were to be described by a stationary process, surely the process would

have a positive mean. However, if you strongly believe a series has a zero mean

when it is stationary, use Case 1.

36

3.3 Augmented DF (ADF) Test

� The model in the null hypothesis considered in the DF test is a class of highly

restrictive unit root models.

� Suppose the model in the null hypothesis is Xt = Xt�1 + ut; ut = �ut�1 + et.

Unless � = 0, the DF test is not applicable.

� The above model implies the following AR(2) model for Xt,

Xt = (1 + �)Xt�1 � �Xt�2 + et = Xt�1 + ��Xt�1 + et

� In general, if

Xt = �1Xt�1 + �2Xt�2 + :::+ �pXt�p + et; et � iid(0; �2e); (3.1)

then the DF test is not applicable.

� Model (3.1) can be represented by

Xt = �Xt�1 + �1�Xt�1 + :::+ �p�1�Xt�p+1 + et; et � iid(0; �2e);

where � =Pp

i=1 �i

� So if � = 1; fXtg is a unit root process; if j�j < 1; fXtg is a stationary process.

37

� In Case 2, the ADF test is based on the OLS estimator of � from the following

regression model,

�Xt = �+ �Xt�1 + �1�Xt�1 + :::+ �p�1�Xt�p+1 + et; et � iid(0; �2e)

� In Case 3, the ADF test is based on the OLS estimator of � from the following

regression model,

�Xt = �+ �t+ �Xt�1 + �1�Xt�1 + :::+ �p�1�Xt�p+1 + et; et � iid(0; �2e)

� Based on the OLS estimator of �; the ADF test statistic, b�Tbse(b�T ) , can be used. Ithas the same asymptotic distribution as the DF t statistic.

38

3.4 Phillips-Perron (PP) Test

� Xt = Xt�1 + ut;�(L)ut = �(L)et; et � iidN(0; �2e); so ut is an ARMA process.

Phillips and Perron (1988) proposed a nonparametric method of controlling for

possible serial correlation in ut.

� There are two PP test statistics. One of them (so-called PP Z test) is the

analogue of T b�T : The other (so-called PP t test) is the analogue ofb�Tbse(b�T ) . PP

Z test has the same sampling distribution as T b�T under all three cases in theprevious section. PP t test has the same sampling distribution as

b�Tbse(b�T ) under allthree cases in the previous section.

� In Case 2, both PP tests are based on the OLS estimator of � from the following

regression model,

�Xt = �+ �Xt�1 + "t; where "t � ARMA

� In this case, the PP Z test statistic is de�ned by T b�T � 12(T 2b�2b�T � s2T )(�2 � 0);

where s2T = (T � 2)�1P(Xt � b�T � b�TXt�1)

2; � = ��(1); 0 = E(u2t )

� In Case 3, both PP tests are based on the OLS estimator of � from the following

regression model,

�Xt = �+ �t+ �Xt�1 + "t; where "t � ARMA

39

3.5 Kwiatkowski, Phillips, Schmidt, and Shin (KPSS, 1992)Test

� Unlike the tests we have thus far introduced, KPSS test is a test for stationarity or

trend-stationarity, that is, it tests for the null of stationarity or trend stationarity

against the alternative of a unit root.

� It is assumed that one can decompose Xt into the sum of a deterministic trend,

a random walk and a stationary error

Xt = �t+ rt + ut (3.2)

rt = rt�1 + et (3.3)

where etiid� N(0; �2e). The initial value r0 is assumed to be �xed.

� If �2e = 0, Xt is trend-stationary. If �2e > 0, Xt is a non-stationary.

40

� KPSS test is a one-sided Lagrange Multiplier (LM) test

� The KPSS statistic is based on the residuals from the OLS regression of Xt on

the exogenous variables Zt = 1 or (1; t):

Xt = �0Zt + �t

� The LM statistic is be de�ned as: PTt=1 S(t)

2

T 2f0;

where f0 is an estimator of the long-run variance and S(t) is a cumulative residual

function:

S(t) =tXs=1

�s

� Critical values for the KPSS test statistic are:

Level of Signi�cance 10% 5% 2.5% 1%Crit value (case 2) 0.347 0.463 0.574 0.739Crit value (case 3) 0.119 0.146 0.176 0.216

41

� If the LM test statistic is larger than the critical value, we have to reject the null

of stationarity (or trend-stationarity).

� In general �t is not a white noise. As a result, the long run variance is di¤erent

from the short run variance. One way to estimate the long run variance is to

nonparametrically estimate the spectral density at frequency zero, that is, com-

pute a weighted sum of the autocovariances, with the weights being de�ned by a

kernel function. For example, a nonparametric estimate based on Bartlett kernel

is

f0 =T�1X

h=�(T�1)

(h)K(h=l);

where l is a bandwidth parameter (which acts as a truncation lag in the covariance

weighting), and (h) is the h-th sample autocovariance of the residuals �, de�ned

as,

(h) =TX

t=h+1

�t�t�h=T;

and K is the Barlett kernel function de�ned by

K(x) =

� 1� jxj if jxj � 1

0 otherwise

42

4 Revisit to Box-Jenkins

� If the null hypothesis cannot be rejected, di¤erence the data (ie, Xt�Xt�1 = ut)

to achieve stationarity.

� In the case where both a stochastic trend and a linear deterministic trend are

involved, the �rst di¤erence transformation leads to a stationary process.

� Test for a unit root in the di¤erenced data. If the null hypothesis cannot be

rejected, di¤erence the di¤erenced data until stationarity is achieved.

� ARMA(p; q)+one unit root =ARIMA(p; 1; q)

� ARMA(p; q)+d unit roots =ARIMA(p; d; q)

5 Limitations of Box-Jenkins

� The Data Generating Process has to be time-invariant. This assumption can be

too restrictive for an economy undergoing a period of transition and for data

covering a long sample period.

� What if the data is explosive, ie � > 1.

� What if a nonlinear deterministic trend is involved.

� What if the data follow a trend stationary process.

� Unit root tests and model selection are done in two separate steps.

43

6 Model Selection

6.1 Why use model selection criteria

� Select p; q in the ARMA model.

� Select lag order, the trend order and other regressors.

6.2 Conventional model selection criteria

� 1�R2

� Minimize 1� �R2

� Minimize AIC (Akaike information criterion, Akaike, 1969). AIC= �2lT+ 2k

T,

where k is the number of parameters, l is the log-likelihood value, and T is the

sample size

� Minimize BIC (Bayesian information criterion, Schwarz, 1978): BIC= �2lT+

kTlnT

44

6.3 Model selection criteria recently developed

� In AIC/BIC, the penalty depends only on the number of regressors while in PIC

it depends not only on the number of regressors but also on the type of regressors

� In the regression context with the normality assumption (ie, Y = X� + "),

PIC= �2lT+ln(jX0X=�2K j)

T, where �2K is a consistent estimate of the residual variance

of a reference model. Usually the reference model is chosen to be the biggest

plausible model.

45

Number of Parameters

Crit

eria

5 10 15

0.4

0.5

0.6

0.7

0.8

1Adj R SquareAICBICPIC

1Adj R SquareAICBICPIC

6.4 Which criterion to use in practice

� Comparison of these criteria

46

� BIC and PIC are consistent. This means in large samples they will select the

true model

� AIC in not consistent. In large samples it tends to choose a model with too many

lags

� PIC is applicable to ARIMA+Trend processes

� AIC and BIC are provided by most packages when a regression is run.

� A Gauss library COINT developed by Ouliaris and Phillips calculates PIC

� In the context of stationary ARMA models, Phillips and Ploberger (1994) found

that PIC is superior to BIC using simulated data.

� PIC is found in Schi¤ and Phillips (2000) to be superior to BIC in a class of

AR+Trend models for forecasting New Zealand GDP

47

7 Cointegration

� De�nition: two variables (Yt; Xt) is said to be cointegrated if each of them taken

individually is I(1) ( ie non-stationary with a unit root), while some linear com-

binations of them (say Yt � �Xt) is stationary (ie I(0)).

� If (Yt; Xt) is cointegrated, the regression model Yt = � + �Xt + et is called a

cointegration relation, where et is I(0).

� (1;��) is called the cointegrating vector. It is easy to see any multiple of it is

also a cointegrating vector. So normalisation is important.

� Cointegration is a long run equilibrium relationship, that is the manner in which

the two variables drift upward together.

48

� Since et is I(0), any deviation from the equilibrium will come back to the equi-

librium in the long run.

� Cointegration and common trend:

� cointegrated variables share common trends

� Suppose both Yt and Xt have a stochastic trend

�Yt = �Y t + "Y t; where �Y t is I(1) ie �Y t = �Y t�1 + !Y t and "Y t is stationary

Xt = �Xt+"Xt; where �Xt is I(1) ie �Xt = �Xt�1+!Xt and "Y t is stationary

�Now if Yt and Xt are cointegrated, then there must exist a parameter � such

that Yt � �Xt is stationary, ie, �Y t + "Y t � �(�Xt + "Xt) = �Y t � ��Xt +

"Y t � �"Xt is stationary. Since "Y t � �"Xt is stationary, for Yt � �Xt to be

stationary, �Y t � ��Xt must be 0. Therefore �Y t = ��Xt

�The equality thus necessarily requires that Yt and Xt must have a common

stochastic trend

49

� The concept of cointegration is originally due to Granger (1981) but popularised

by Engle and Granger (1987, Econometrica)

� Estimation of cointegration systems:

1. OLS estimator is not only consistent but also �super-consistent�. This

means the estimator converges to its limiting distribution at a faster rate

than the estimator in the stationary case.

2. Super-consistency is also true in the cointegration model involved the auto-

correlation.

50

� Test for cointegration (between Yt and Xt)

Basic Idea:

1. Test Yt and Xt for I(1)

2. Obtain the �tted residual bet from a regression of Yt on Xt (di¤erent cointe-

grating regression models lead to di¤erent cases)

3. Use the ADF or PP test to test for a unit root in bet. Since the test is basedon bet, not et, we cannot use the Dickey-Fuller tables as before. We must usemodi�ed critical values.

4. If H0 is rejected Yt and Xt are cointegrated. If H0 cannot be rejected, the

regression of Yt on Xt is spurious.

51

� Case 2 (constant and no trend in the regression):

�We estimate the following model (with the constant term but without trend)

using the OLS method

Yt = �+X 0t� + et

�The ADF t-statistic or the PP t-statistic can be used. The critical values

for the ADF t-statistic or the PP t-statistic are given in the table below.

� Case 3 (constant and trend in the regression):

�We estimate the following model (with the constant term and the trend)

using the OLS method:

Yt = �+X 0t� + �t+ et

�The ADF t-statistic or the PP t-statistic can be used. The critical values

for the ADF t-statistic or the PP t-statistic are given in the table below.

� Which case to use in practice?

�Use Case 3 test if either Yt or Xt or both involve a possible trend, such as

GDP

�Use Case 2 test if neither Yt nor Xt involve a possible trend, such as interest

rates and exchange rates.

k 2 3 4 5 6case 2 3 2 3 2 3 2 3 2 31% -3.90 -4.32 -4.29 -4.66 -4.64 -4.97 -4.96 -5.25 -5.25 -5.525% -3.34 -3.78 -3.74 -4.12 -4.10 -4.43 -4.42 -4.72 -4.71 -4.9810% -3.04 -3.50 -3.45 -3.84 -3.81 -4.15 -4.13 -4.43 -4.42 -4.70

52

8 Spurious Regression

� Yt = a+ bZt + ut

� Classical approach requires that both Yt and Zt and hence ut are stationary

� Now consider Yt = Yt�1 + "1t, Zt = Zt�1 + "2t; where "jtiid� (0; �2j )

� Let Y0 = Z0 = 0; we have Yt =Pt

j=1 "1j, Zt =Pt

j=1 "2j

� If "1t and "2t are indept, the regression is a spurious regression.

� Why spurious?

� Using Monte Carlo simulations Granger and Newbold (1974) �nd that t statistic

rejects the null hypothesis b = 0 far more often than it should and tends to reject

it more and more frequently the larger is the sample size.

� Using asymptotic theory Phillips (1986) �nds that t p! +1, hence it will reject

the null hypothesis b = 0 all the time asymptotically. Also he shows R2p! 1

� Another type of spurious regression: Yt = a+ bt+ ut and Yt = Yt�1 + "t

� Note:

1. If Yt and Zt are stationary, the classical theory applies

2. If Yt and Zt are integrated with di¤erent orders, the regression is meaningless

3. If Yt and Zt are integrated with the same order, but ut is integrated with

the same order, we have a spurious regression

4. If Yt and Zt are integrated with the same order, but ut is integrated with a

lower order, we have a cointegration

53

econ 623 econometrics ii topic 5: non-stationary time ... · econ 623 econometrics ii topic 5:...

Documents