Econ 623 Econometrics II
Topic 5: Non-stationary Time Series
1 Types of non-stationary time series models often found in economics
• Deterministic trend (trend stationary):
$$X_t = f(t) + \varepsilon_t,$$
where $t$ is the time trend and $\varepsilon_t$ is a stationary error term (so both the mean and the variance of $\varepsilon_t$ are constant). Usually we assume $\varepsilon_t \sim ARMA(p, q)$ with mean 0 and variance $\sigma^2$.
- $f(t)$ is a deterministic function of time, hence the name deterministic trend model.
- Why non-stationary? $E(X_t) = f(t)$ depends on $t$ and so is not constant; $Var(X_t) = \sigma^2$ is constant.
- Since $\varepsilon_t$ is stationary, $X_t$ fluctuates around the deterministic trend $f(t)$. The amplitude of the fluctuations has no obvious tendency to increase or decrease because $Var(X_t)$ is constant. (The level of $X_t$ could increase or decrease, however.)
- $X_t - f(t)$ is stationary. For this reason a deterministic trend model is sometimes called a trend stationary model.
- Assume $\varepsilon_t = \psi(L)e_t$, $e_t \sim iid(0, \sigma_e^2)$. If we label $e_t$ the innovation or the shock to the system, the innovation must have a transient, diminishing effect on $X$. Why?
[Figures: time-series plots of example sample paths (horizontal axes running to 100, 200 and 1000 observations); one panel is titled "random walk".]
Example: $X_t = \beta_0 + \beta_1 t + \psi(L)e_t$, with $\psi(L) = 1 + \phi L + \phi^2 L^2 + \cdots$ and $|\phi| < 1$.
So $X_t = \beta_0 + \beta_1 t + \varepsilon_t$, $\varepsilon_t = \phi\varepsilon_{t-1} + e_t$.
$\varepsilon_t = \psi(L)e_t$ measures the deviation of the series from trend in period $t$. We wish to examine the effect of an innovation $e_t$ on $\varepsilon_t, \varepsilon_{t+1}, \ldots$:
$$\varepsilon_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots,$$
which gives $\partial\varepsilon_{t+s}/\partial e_t = \phi^s$ (the impulse response), and $\phi^s \to 0$ as $s \to \infty$.
Therefore, when there is a shock to the economy, a deterministic trend model implies the shock has a transient effect. Sooner or later the system will be back on the deterministic trend.
• If $f(t) = c_1 + c_2 t$, we have a linear trend model. It is widely used.
• If $f(t) = Ae^{rt}$, we have an exponential growth curve.
• If $f(t) = c_1 + c_2 t + c_3 t^2$, we have a quadratic trend model.
• If $f(t) = \dfrac{1}{k + ab^t}$, we have a logistic curve.
• Stochastic trend (unit root or difference stationarity):
$$X_t = \mu + X_{t-1} + \varepsilon_t,$$
where $\varepsilon_t$ is a stationary process (so both the mean and the variance of $\varepsilon_t$ are constant). Usually we assume $\varepsilon_t \sim ARMA(p, q)$ with mean 0 and variance $\sigma^2$.
- Another representation: $(1 - L)X_t = \mu + \varepsilon_t$.
- Solving $1 - L = 0$ for $L$ gives $L = 1$, which justifies the terminology "unit root".
- Pure random walk: $X_t = X_{t-1} + e_t$, $e_t \overset{iid}{\sim} N(0, \sigma_e^2)$.
1. $X_t = \sum_{j=0}^{t-1} e_{t-j}$ if we assume $X_0 = 0$ with probability 1.
2. $E(X_t) = 0$; $Var(X_t) = t\sigma_e^2 \to \infty$.
3. Non-stationarity: $X_t$ wanders around and can be anywhere.
- Random walk with a drift: $X_t = \mu + X_{t-1} + e_t$, $e_t \overset{iid}{\sim} N(0, \sigma_e^2)$.
1. $X_t = t\mu + \sum_{j=0}^{t-1} e_{t-j}$ if we assume $X_0 = 0$ with probability 1.
2. $E(X_t) = t\mu \to \infty$ (for $\mu > 0$); $Var(X_t) = t\sigma_e^2 \to \infty$.
3. Non-stationarity.
4. As $E(X_t) = t\mu$, a stochastic trend (or unit root) model can behave similarly to a model with a linear deterministic trend.
- In a random walk model, $X_t = t\mu + \sum_{j=0}^{t-1} e_{t-j}$. If we label $e_{t-j}$ the innovation or the shock to the system, the innovation has a permanent effect on $X_t$ because
$$\frac{\partial X_t}{\partial e_{t-j}} = 1 \quad \text{for all } 0 \le j \le t-1.$$
This is true for all models with a unit root. Therefore, when there is a shock to the economy, a unit root model implies that the shock has a permanent effect. After each shock the system continues from a new level.
- In a trend-stationary model
$$X_t = \beta_0 + \beta_1 t + \phi X_{t-1} + e_t, \quad |\phi| < 1,$$
we have
$$\frac{\partial X_t}{\partial e_{t-j}} = \phi^j \to 0 \quad \text{as } j \to \infty.$$
So the innovation has a transient effect on $X_t$.
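The contrast between transient and permanent shocks can be made concrete with a small sketch. It traces the effect of a unit shock through an AR(1) recursion; the function name `impulse_response_ar1` and the value 0.5 for the autoregressive coefficient are illustrative assumptions, not part of the notes.

```python
def impulse_response_ar1(phi, horizon):
    """Effect of a unit shock e_t on the series s periods later, s = 0..horizon,
    when the series follows x_t = phi * x_{t-1} + e_t."""
    response, effect = [], 1.0
    for _ in range(horizon + 1):
        response.append(effect)
        effect *= phi          # each period the shock is scaled by phi
    return response

transient = impulse_response_ar1(0.5, 10)   # |phi| < 1: trend-stationary case
permanent = impulse_response_ar1(1.0, 10)   # phi = 1: unit root case
print(transient[:4])  # [1.0, 0.5, 0.25, 0.125]
print(permanent[:4])  # [1.0, 1.0, 1.0, 1.0]
```

With a coefficient below one in absolute value the response decays geometrically, while with a unit root it stays at 1 at every horizon.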
- A deterministic trend model and a stochastic trend (or unit root) model can behave similarly to each other. However, knowing whether non-stationarity in the data is due to a deterministic trend or a stochastic trend (unit root) is a very important question in economics. For example, macroeconomists are very interested in knowing whether economic recessions have permanent consequences for the level of future GDP, or instead represent temporary downturns with the lost output eventually made up during the recovery.
• If $(1 - L)X_t = \varepsilon_t \sim ARMA(p, q)$, we say $X_t \sim ARIMA(p, 1, q)$ or $X_t$ is an $I(1)$ process.
• If $(1 - L)^d X_t = \varepsilon_t \sim ARMA(p, q)$, we say $X_t \sim ARIMA(p, d, q)$ or $X_t$ is an $I(d)$ process.
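As a rough illustration of the operator $(1 - L)^d$, the sketch below builds an I(1) and an I(2) series by cumulating a short list of made-up shocks and then recovers the shocks by differencing. The helper names `difference` and `cumulate` and the shock values are assumptions chosen for the example.

```python
def difference(series, d=1):
    """Apply the difference operator (1 - L) d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out

def cumulate(series):
    """Running sum: turns shocks into a series with one more unit root."""
    total, out = 0.0, []
    for value in series:
        total += value
        out.append(total)
    return out

shocks = [0.5, -1.0, 0.25, 2.0, -0.75]     # hypothetical innovations
level_i1 = cumulate(shocks)                # one unit root
level_i2 = cumulate(level_i1)              # two unit roots
print(difference(level_i1, 1))             # [-1.0, 0.25, 2.0, -0.75]
print(difference(level_i2, 2))             # [0.25, 2.0, -0.75]
```

Each round of differencing removes one unit root and costs one observation at the start of the sample.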
• Why linear time trends and unit roots?
- Why linear trends? Indeed GDP, like many economic and financial series, seems to involve an exponential trend rather than a linear trend (apart from a stochastic trend). Suppose this is true; then we have
$$Y_t = \exp(\delta t).$$
However, if we take the natural log of this exponential trend function, we have
$$\ln Y_t = \delta t.$$
Thus it is common to take logs of the data before attempting to describe them. After taking logs, most economic and financial time series exhibit a linear trend. This is why researchers often apply the logarithmic transformation before doing analysis.
- Why unit roots? After taking logs, suppose a time series follows a unit root process rather than a linear trend, so that $(1 - L)\ln Y_t = \varepsilon_t$, where $\varepsilon_t$ is a stationary process such as ARMA(p, q). However,
$$(1 - L)\ln Y_t = \ln\left(\frac{Y_t}{Y_{t-1}}\right) = \ln\left(1 + \frac{Y_t - Y_{t-1}}{Y_{t-1}}\right) \approx \frac{Y_t - Y_{t-1}}{Y_{t-1}}.$$
So the rate of growth of the series $Y_t$ is stationary. For example, if $Y_t$ represents the CPI and $\ln Y_t$ is an $I(1)$ process, then $(1 - L)\ln Y_t$ represents the inflation rate and is a stationary process.
• Other forms of nonstationarity: structural break in mean, structural break in variance.
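The approximation between the log difference and the growth rate is easy to check numerically. The sketch below uses a made-up price index; the variable `cpi` and its values are hypothetical.

```python
import math

def log_difference(y):
    """(1 - L) ln Y_t for t = 2..T."""
    return [math.log(y[i] / y[i - 1]) for i in range(1, len(y))]

def growth_rate(y):
    """(Y_t - Y_{t-1}) / Y_{t-1} for t = 2..T."""
    return [(y[i] - y[i - 1]) / y[i - 1] for i in range(1, len(y))]

cpi = [100.0, 102.0, 104.5, 105.0]   # hypothetical price index
for g, dl in zip(growth_rate(cpi), log_difference(cpi)):
    print(f"growth = {g:.4f}, log difference = {dl:.4f}")
```

The two columns agree closely when period-to-period changes are small, and the log difference never exceeds the growth rate because $\ln(1 + x) \le x$.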
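2 Processes with Deterministic Trends
• Model with a simple linear trend:
$$X_t = \beta t + e_t, \quad t = 1, \ldots, T.$$
• Let the OLS estimator of $\beta$ be $\hat\beta_T$, ie,
$$\hat\beta_T = \frac{\sum X_t t}{\sum t^2} = \beta + \frac{\sum e_t t}{\sum t^2}.$$
• If $e_t \sim N(0, \sigma^2)$, then
$$\hat\beta_T \sim N\left(\beta, \frac{\sigma^2}{\sum t^2}\right) = N\left(\beta, \frac{6\sigma^2}{T(T+1)(2T+1)}\right).$$
This is the exact small-sample normal distribution. The corresponding t statistic
$$\frac{\sqrt{T(T+1)(2T+1)}\,(\hat\beta_T - \beta)}{\sqrt{6\hat\sigma_T^2}}$$
has an exact small-sample t distribution, where $\hat\sigma_T^2 = \frac{1}{T-1}\sum (X_t - \hat\beta_T t)^2$.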
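• If $e_t \overset{iid}{\sim} (0, \sigma^2)$ and has finite fourth moment, we have to find the asymptotic distribution of $\hat\beta_T$.
• Note that
$$T^{3/2}(\hat\beta_T - \beta) = \frac{T^{-1/2}\sum e_t t/T}{T^{-3}\sum t^2}.$$
• Also note that $\{e_t t/T\}$ is an MDS with
(1) $\sigma_t^2 = E(e_t^2 t^2/T^2) = \sigma^2 t^2/T^2$ and $\frac{1}{T}\sum \sigma^2 t^2/T^2 \to \sigma^2/3$;
(2) $E|e_t t/T|^4 < \infty$;
(3) $\frac{1}{T}\sum e_t^2 t^2/T^2 \xrightarrow{p} \sigma^2/3$.
Condition (3) holds because
$$E\left(\frac{1}{T}\sum e_t^2 t^2/T^2 - \frac{1}{T}\sum \sigma_t^2\right)^2 = E\left(\frac{1}{T}\sum (e_t^2 - \sigma^2)\,t^2/T^2\right)^2 = \frac{1}{T^6}E(e_t^2 - \sigma^2)^2 \sum t^4 \to 0.$$
• Applying the CLT for MDS to $\{e_t t/T\}$, we get $T^{-1/2}\sum e_t t/T \xrightarrow{d} N(0, \sigma^2/3)$. Since $T^{-3}\sum t^2 \to 1/3$, $T^{3/2}(\hat\beta_T - \beta) \xrightarrow{d} N(0, 3\sigma^2)$.
• Consider the t statistic:
$$\frac{\sqrt{T(T+1)(2T+1)}\,(\hat\beta_T - \beta)}{\sqrt{6\hat\sigma_T^2}} = \frac{\sqrt{T(T+1)(2T+1)}}{T^{3/2}} \cdot \frac{T^{3/2}(\hat\beta_T - \beta)}{\sqrt{6\hat\sigma_T^2}} \xrightarrow{d} N(0, 1).$$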
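The estimator and its scaling can be sketched in a few lines. This is a minimal simulation under stated assumptions: the values of `T`, `beta` and `sigma`, the seed, and the helper name `trend_slope_ols` are all illustrative choices.

```python
import random

def trend_slope_ols(x):
    """OLS slope in X_t = beta * t + e_t, t = 1..T (no intercept)."""
    T = len(x)
    sum_xt = sum(x[t - 1] * t for t in range(1, T + 1))
    sum_t2 = sum(t * t for t in range(1, T + 1))
    return sum_xt / sum_t2

random.seed(0)
T, beta, sigma = 200, 0.3, 1.0                  # hypothetical design
e = [random.gauss(0.0, sigma) for _ in range(T)]
x = [beta * t + e[t - 1] for t in range(1, T + 1)]

beta_hat = trend_slope_ols(x)
sum_t2 = T * (T + 1) * (2 * T + 1) // 6         # closed form for sum of t^2
std_err = (sigma ** 2 / sum_t2) ** 0.5          # exact sd of beta_hat under normal errors
print(beta_hat, std_err, T ** 1.5 * (beta_hat - beta))
```

The standard deviation of the slope shrinks at rate $T^{-3/2}$, which is why the estimation error is multiplied by $T^{3/2}$ rather than $T^{1/2}$ to obtain a non-degenerate limit.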
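• Model with a simple linear trend and an intercept:
$$X_t = \alpha + \beta t + e_t.$$
• If $e_t \overset{iid}{\sim} (0, \sigma^2)$ and has finite fourth moment, then
$$\begin{bmatrix} T^{-1/2}\sum e_t \\ T^{-1/2}\sum (t/T)\,e_t \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2\begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}\right).$$
• We need the asymptotic distributions of $\hat\alpha_T$ and $\hat\beta_T$; their rates of convergence differ:
$$\begin{bmatrix} T^{1/2}(\hat\alpha_T - \alpha) \\ T^{3/2}(\hat\beta_T - \beta) \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2\begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}^{-1}\right).$$
• The corresponding t statistics converge to $N(0, 1)$.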
3 Unit Root Tests and a Stationarity Test
• The theory of ARMA methods relies on the assumption of stationarity.
• The assumption of stationarity is too strong for many macroeconomic time series. For example, many macroeconomic time series involve trends (both deterministic trends and stochastic trends, ie unit roots).
• Tests for a unit root. Various test statistics are often used in practice: the Augmented Dickey-Fuller (ADF) test and the Phillips and Perron (PP) test.
• Test for stationarity: the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
• Three possible alternative hypotheses for the unit root tests:
1. no constant and no trend: $X_t = \rho X_{t-1} + e_t$, $|\rho| < 1$
2. constant but no trend: $X_t = \alpha + \rho X_{t-1} + e_t$, $|\rho| < 1$
3. constant and trend: $X_t = \alpha + \rho X_{t-1} + \delta t + e_t$, $|\rho| < 1$
• First consider the AR(1) process, $X_t = \rho X_{t-1} + e_t$, $e_t \overset{iid}{\sim} N(0, \sigma^2)$.
• Stationarity requires $|\rho| < 1$.
• If $|\rho| = 1$, $X_t$ is nonstationary and has a unit root.
• Test $H_0: \rho = 1$ against $H_1: |\rho| < 1$.
• How to estimate $\rho$? By OLS, ie,
$$\hat\rho = \frac{\sum X_t X_{t-1}}{\sum X_{t-1}^2} = \rho + \frac{\sum X_{t-1}e_t}{\sum X_{t-1}^2}.$$
Stock (1994) shows that the OLS estimator of $\rho$ is superconsistent.
• What is the distribution or asymptotic distribution of $\hat\rho - \rho$?
• Recall the standard model, $y = X\beta + e$:
- If $X$ is non-stochastic, $\hat\beta - \beta \sim N(0, \sigma^2(X'X)^{-1})$.
- If $X$ is stochastic and stationary with some restrictions on the autocovariances, $\sqrt{T}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$, where $Q = \mathrm{plim}\, X'X/T$.
3.1 Classical Asymptotic Theory for Covariance Stationary Process
• Consider the AR(1) process, $X_t = \rho X_{t-1} + e_t$, $e_t \sim iid\,N(0, \sigma^2)$, with $|\rho| < 1$.
• For this model $E(X_t) = 0$, $Var(X_t) = \dfrac{\sigma^2}{1 - \rho^2} \equiv \sigma_X^2$, and $\gamma_j = \rho^j\, Var(X_t)$.
• What is the asymptotic property of $\bar X = \frac{1}{T}\sum_{t=1}^T X_t$?
• We have $E(\bar X) = 0$ and
$$Var(\bar X) = E(\bar X^2) = \frac{Var(X_t)}{T}\left(1 + 2\frac{T-1}{T}\rho + 2\frac{T-2}{T}\rho^2 + \cdots + 2\frac{1}{T}\rho^{T-1}\right) \to 0.$$
So $\bar X \to 0$ in mean square, which implies $\bar X \xrightarrow{p} 0$.
• Also, $T\,Var(\bar X) \to \sum_{j=-\infty}^{\infty} \gamma_j$, which is called the long-run variance.
• The consistency is generally true for any covariance stationary process satisfying $\sum_{j=0}^{\infty} |\gamma_j| < \infty$.
• Martingale Difference Sequence (MDS): $E(X_t \mid X_{t-1}, \ldots, X_1) = 0$ for all $t$.
• An MDS is serially uncorrelated but not necessarily independent.
• An MDS does not need to be stationary.
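• Theorem (White, 1984): Let $\{X_t\}$ be an MDS with $\bar X_T = \frac{1}{T}\sum_{t=1}^T X_t$. Suppose that (a) $E(X_t^2) = \sigma_t^2 > 0$ with $\frac{1}{T}\sum_{t=1}^T \sigma_t^2 \to \sigma^2 > 0$, (b) $E|X_t|^r < \infty$ for some $r > 2$ and all $t$, and (c) $\frac{1}{T}\sum_{t=1}^T X_t^2 \xrightarrow{p} \sigma^2$. Then $\sqrt{T}\,\bar X_T \xrightarrow{d} N(0, \sigma^2)$.
• Theorem (Anderson, 1971): Let
$$X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$
where $\{\varepsilon_t\}$ is a sequence of iid variables with $E(\varepsilon_t^2) < \infty$ and $\sum_{j=0}^{\infty} |\psi_j| < \infty$. Let $\gamma_j$ be the $j$th order autocovariance. Then
$$\sqrt{T}\,\bar X_T \xrightarrow{d} N\left(0, \sum_{j=-\infty}^{\infty} \gamma_j\right).$$
• To know more about asymptotic theory for linear processes, read Phillips and Solo (1991).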
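3.2 Dickey-Fuller Test
• Under $H_0$, however, we cannot claim $\sqrt{T}(\hat\rho - \rho) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$ because $X_{t-1}$ is not stationary.
• If $X_t = \rho X_{t-1} + e_t$ and $H_0: \rho = 1$, we have $\hat\rho - 1 = \dfrac{\sum X_{t-1}e_t}{\sum X_{t-1}^2}$.
• Consider $X_t^2 = (X_{t-1} + e_t)^2 \Rightarrow X_{t-1}e_t = \frac{1}{2}(X_t^2 - X_{t-1}^2 - e_t^2) \Rightarrow \sum X_{t-1}e_t = \frac{1}{2}(X_T^2 - X_0^2 - \sum e_t^2)$.
• Let $X_0 = 0$; then $\sum X_{t-1}e_t = \frac{1}{2}(X_T^2 - \sum e_t^2)$.
• $e_t \overset{iid}{\sim} (0, \sigma^2) \Rightarrow e_t^2$ is iid with $E(e_t^2) = \sigma^2$. By the LLN, $\frac{1}{T}\sum e_t^2 \xrightarrow{p} \sigma^2$.
• $X_T = X_0 + \sum_{t=1}^T e_t = \sum e_t \overset{a}{\sim} N(0, T\sigma^2) \Rightarrow \dfrac{X_T}{\sqrt{T}\sigma} \xrightarrow{d} N(0, 1) \Rightarrow \dfrac{X_T^2}{T\sigma^2} \xrightarrow{d} \chi^2(1)$.
• $\dfrac{1}{T\sigma^2}\sum X_{t-1}e_t = \dfrac{1}{2}\left(\dfrac{X_T^2}{T\sigma^2} - \dfrac{\sum e_t^2}{T\sigma^2}\right) \xrightarrow{d} \dfrac{1}{2}(\chi^2(1) - 1)$.
• What is the limit behavior of $\sum X_{t-1}^2$? We need to introduce Brownian motion.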
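The algebraic identity for the numerator holds path by path, not just in the limit, so it can be checked on any simulated random walk. The sketch below assumes standard normal shocks, a sample size of 500 and a fixed seed; none of these choices come from the notes.

```python
import random

random.seed(1)
T = 500
e = [random.gauss(0.0, 1.0) for _ in range(T)]

# Random walk with X_0 = 0: x[t] holds X_t for t = 0..T.
x = [0.0]
for shock in e:
    x.append(x[-1] + shock)

lhs = sum(x[t - 1] * e[t - 1] for t in range(1, T + 1))   # sum of X_{t-1} e_t
rhs = 0.5 * (x[T] ** 2 - sum(s * s for s in e))           # (X_T^2 - sum e_t^2) / 2
rho_hat = (sum(x[t] * x[t - 1] for t in range(1, T + 1))
           / sum(x[t - 1] ** 2 for t in range(1, T + 1)))
print(lhs, rhs, T * (rho_hat - 1.0))
```

The first two printed numbers agree up to floating-point rounding; the third is one draw of the scaled estimation error whose limit distribution is derived in the following slides.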
3.2.1 Brownian Motion
• Consider $X_t = X_{t-1} + e_t$, $X_0 = 0$, $e_t \sim iid\,N(0, 1) \Rightarrow X_t = \sum_{j=1}^t e_j \sim N(0, t)$.
• For all $t > s$, $X_t - X_s = \sum_{j=1}^t e_j - \sum_{j=1}^s e_j = e_{s+1} + \cdots + e_t \sim N(0, t - s)$.
• Also note that $X_t - X_s$ is independent of $X_r - X_q$ if $t > s > r > q$.
• What happens if we go from the discrete case to the continuous case? We obtain the standard Brownian motion (BM), denoted by $B(t)$.
• A stochastic process $B_t$ over time is a standard Brownian motion if, for a small time interval $\Delta$, the change in $B_t$, $\Delta B_t\,(\equiv B_{t+\Delta} - B_t)$, has the following properties:
1. $\Delta B_t = e\sqrt{\Delta}$, where $e \sim N(0, 1)$.
2. $\Delta B_t$ and $\Delta B_s$ for changes over any non-overlapping short intervals are independent.
• The above two properties imply the following:
1. $E(\Delta B_t) = 0$
2. $Var(\Delta B_t) = \Delta$
3. $\sigma(\Delta B_t) = \sqrt{\Delta}$
• In general we have
$$B_T - B_0 = \sum_{i=1}^{N} e_i\sqrt{\Delta}.$$
• Due to the above, we have
$$E(B_T - B_0) = 0, \quad Var(B_T - B_0) = T, \quad \sigma(B_T - B_0) = \sqrt{T}.$$
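The step from the sum of increments to a variance of $T$ can be spelled out as a short worked derivation. It assumes, as the construction suggests but the slide does not state explicitly, that the interval $[0, T]$ is split into $N = T/\Delta$ pieces of length $\Delta$ and that the $e_i$ are independent $N(0, 1)$ draws.

```latex
\begin{aligned}
E(B_T - B_0)   &= \sum_{i=1}^{N} \sqrt{\Delta}\, E(e_i) = 0, \\
Var(B_T - B_0) &= \sum_{i=1}^{N} \Delta\, Var(e_i) = N\Delta = \frac{T}{\Delta}\,\Delta = T, \\
\sigma(B_T - B_0) &= \sqrt{T}.
\end{aligned}
```

The cross terms vanish because increments over non-overlapping intervals are independent, so the result does not depend on how fine the partition is.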
• Therefore, for the Brownian motion, results for the mean and variance in small intervals also apply to large intervals.
• When $\Delta \to 0$, we write the Brownian motion as
$$dB_t = e\sqrt{dt},$$
where $dB_t$ should be interpreted as the change in the Brownian motion over an arbitrarily small time interval.
• Sample paths of the Brownian motion are continuous everywhere but not smooth anywhere.
• The derivative of the Brownian motion does not exist anywhere and the process is infinitely "kinky".
• A Brownian motion is nonstationary as $Var(B_T)$ drifts up with $T$. The transition density is
$$B_{T+\Delta} \mid B_T \sim N(B_T, \Delta).$$
• A Brownian motion is also called a Wiener process. It has no drift. Now we extend it to the generalized Wiener process.
• We can define $e_t = e_{1t} + e_{2t}$, $e_{1t}, e_{2t} \sim N(0, 1/2)$; $y_{t+\frac{1}{2}} = y_t + e_{1t}$.
• Definition of the BM with variance $\sigma^2$: $W(t) = \sigma B(t)$.
• Note:
1. $B(0) = 0$
2. $B(t) \sim N(0, t)$, ie $B(t)$ is a Gaussian process
3. $E(B(t+h) - B(t)) = 0$, $E(B(t+h) - B(t))^2 = |h|$
4. $E(B(t)B(s)) = \min\{t, s\}$
5. If $t > s > r > q$, $B(t) - B(s)$ is independent of $B(r) - B(q)$, ie $B(t)$ has independent increments
6. Since $B(t)$ is a Gaussian process, it is completely specified by its covariance function
7. If $B(t)$ is a BM, then so are $B(t+a) - B(a)$, $\lambda^{-1}B(\lambda^2 t)$ and $-B(t)$
8. The sample path of a BM is continuous in time with probability 1
9. The sample path is differentiable at no point
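• Return to the unit root test. $X_t = X_0 + \sum_{j=1}^t e_j$. Define $p_t = \sum_{j=1}^t e_j = p_{t-1} + e_t$.
• Change the time index from $\{1, \ldots, T\}$ to $t$ with $0 \le t \le 1$.
• Define $Y_T(t) = \dfrac{1}{\sigma\sqrt{T}}\,p_{[Tt]}$ if $\dfrac{j-1}{T} \le t < \dfrac{j}{T}$, $j = 1, \ldots, T$, where $[Tt]$ is the integer part of $Tt$.
• For instance, $Y_T(1) = \dfrac{1}{\sigma\sqrt{T}}\,p_T$.
• $$Y_T(t) = \begin{cases} 0 & \text{if } 0 \le t < \frac{1}{T} \\ \dfrac{p_1}{\sigma\sqrt{T}} & \text{if } \frac{1}{T} \le t < \frac{2}{T} \\ \dfrac{p_{j-1}}{\sigma\sqrt{T}} & \text{if } \frac{j-1}{T} \le t < \frac{j}{T} \end{cases}$$
• Two important theorems:
1. Functional CLT: $Y_T(\cdot) \xrightarrow{d} B(\cdot)$
2. Continuous mapping theorem: if $f$ is a continuous function, $f(Y_T(t)) \xrightarrow{d} f(B(t))$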
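• Note that $Y_T(1) \xrightarrow{d} N(0, 1)$, $Y_T(1/2) \xrightarrow{d} N(0, 1/2)$, and $Y_T(r) \xrightarrow{d} N(0, r)$ for all $r \in [0, 1]$.
• For all $r_1, r_2 \in [0, 1]$ with $r_2 > r_1$, $Y_T(r_2) - Y_T(r_1) \xrightarrow{d} N(0, r_2 - r_1)$.
• $\sum_{t=1}^T X_t = \sum_{t=1}^T p_{t-1} + \sum_{t=1}^T e_t$ if $X_0 = 0$.
• $\displaystyle\int_{(j-1)/T}^{j/T} Y_T(t)\,dt = \frac{p_{j-1}}{\sigma T\sqrt{T}} \Rightarrow \sum p_{j-1} = \sigma T\sqrt{T}\sum_{j=1}^T \int_{(j-1)/T}^{j/T} Y_T(t)\,dt = \sigma T\sqrt{T}\int_0^1 Y_T(t)\,dt$.
• $\displaystyle\sum_{t=1}^T X_t = \sigma T\sqrt{T}\int_0^1 Y_T(t)\,dt + \sqrt{T}\,\frac{\sum e_j}{\sqrt{T}} \Rightarrow \frac{\sum_{t=1}^T X_t}{T\sqrt{T}} = \sigma\int_0^1 Y_T(t)\,dt + \frac{1}{T}\frac{\sum e_j}{\sqrt{T}}$.
• By the FCLT and CMT, $\dfrac{\sum_{t=1}^T X_t}{T\sqrt{T}} \xrightarrow{d} \sigma\displaystyle\int_0^1 B(t)\,dt$.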
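• $\sum_{t=1}^T X_{t-1}^2 = X_0^2 + \cdots + X_{T-1}^2 = X_1^2 + \cdots + X_{T-1}^2 = \sum_{t=1}^T X_t^2 - X_T^2 = \sum_{t=1}^T p_t^2 - p_T^2$.
• Recall $p_{j-1}^2 = \sigma^2 T^2 \displaystyle\int_{(j-1)/T}^{j/T} Y_T^2(t)\,dt \Rightarrow \sum p_{j-1}^2 = \sigma^2 T^2 \int_0^1 Y_T^2(t)\,dt$, and $p_T^2 = \sigma^2 T\,Y_T^2(1)$.
• Therefore, $\dfrac{\sum_{t=1}^T X_{t-1}^2}{T^2} \xrightarrow{d} \sigma^2 \displaystyle\int_0^1 B^2(t)\,dt$.
• Under $H_0$,
$$T(\hat\rho - \rho) \xrightarrow{d} \frac{\frac{1}{2}(\chi^2(1) - 1)}{\int_0^1 B^2(t)\,dt}.$$
• Phillips (1987) proved the above result under very general conditions.
• Note:
1. This limiting distribution is non-standard.
2. The numerator and denominator are not independent (the $\chi^2(1)$ variable is $B^2(1)$).
3. It is called the Dickey-Fuller distribution since Dickey and Fuller (1976) used Monte Carlo simulation to find the critical values of $\dfrac{\chi^2(1) - 1}{2\int_0^1 B^2(t)\,dt}$ and tabulated them.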
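Because the limit distribution has no closed form, its quantiles are found by simulation. The following is a minimal Monte Carlo sketch of that idea, assuming standard normal shocks, $T = 100$ and 2000 replications; the function name `df_coefficient_stat` and these settings are illustrative, and the output is not meant to reproduce the published tables.

```python
import random

def df_coefficient_stat(T, rng):
    """Simulate a driftless random walk with X_0 = 0 and return T * (rho_hat - 1)."""
    x_prev, num, den = 0.0, 0.0, 0.0
    for _ in range(T):
        shock = rng.gauss(0.0, 1.0)
        num += x_prev * shock          # accumulates sum of X_{t-1} e_t
        den += x_prev * x_prev         # accumulates sum of X_{t-1}^2
        x_prev += shock
    return T * num / den

rng = random.Random(42)
stats = sorted(df_coefficient_stat(100, rng) for _ in range(2000))
q05 = stats[int(0.05 * len(stats))]                 # empirical 5% quantile
share_negative = sum(s < 0 for s in stats) / len(stats)
print(q05, share_negative)
```

The simulated distribution is skewed to the left: well over half of the draws are negative, which is why critical values for unit root tests sit further in the left tail than normal critical values.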
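1. $T^{-1/2}\sum_{t=1}^T e_t \xrightarrow{d} \sigma B(1)$; $\sum_{t=1}^T e_t = O_p(T^{1/2})$
2. $T^{-1}\sum_{t=1}^T p_{t-1}e_t \xrightarrow{d} \frac{1}{2}\sigma^2\{B^2(1) - 1\}$; $\sum_{t=1}^T p_{t-1}e_t = O_p(T)$
3. $T^{-3/2}\sum_{t=1}^T t\,e_t \xrightarrow{d} \sigma B(1) - \sigma\int_0^1 B(r)\,dr$; $\sum_{t=1}^T t\,e_t = O_p(T^{3/2})$
4. $T^{-3/2}\sum_{t=1}^T p_{t-1} \xrightarrow{d} \sigma\int_0^1 B(r)\,dr$; $\sum_{t=1}^T p_{t-1} = O_p(T^{3/2})$
5. $T^{-2}\sum_{t=1}^T p_{t-1}^2 \xrightarrow{d} \sigma^2\int_0^1 B^2(r)\,dr$; $\sum_{t=1}^T p_{t-1}^2 = O_p(T^2)$
6. $T^{-5/2}\sum_{t=1}^T t\,p_{t-1} \xrightarrow{d} \sigma\int_0^1 rB(r)\,dr$; $\sum_{t=1}^T t\,p_{t-1} = O_p(T^{5/2})$
7. $T^{-3}\sum_{t=1}^T t\,p_{t-1}^2 \xrightarrow{d} \sigma^2\int_0^1 rB^2(r)\,dr$; $\sum_{t=1}^T t\,p_{t-1}^2 = O_p(T^3)$
8. $\sum_{t=1}^T t = T(T+1)/2 = O(T^2)$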
3.2.2 Case 1 (no drift or trend in the regression):
• In Case 1, we test
$$H_0: X_t = X_{t-1} + e_t \quad \text{against} \quad H_1: X_t = \rho X_{t-1} + e_t \text{ with } |\rho| < 1.$$
• This is equivalent to
$$H_0: \Delta X_t = e_t \quad \text{against} \quad H_1: \Delta X_t = \beta X_{t-1} + e_t \text{ with } \beta < 0.$$
• We estimate the following model by OLS:
$$\Delta X_t = \beta X_{t-1} + e_t, \quad e_t \overset{iid}{\sim} (0, \sigma_e^2).$$
• The OLS estimator of $\beta$ is denoted by $\hat\beta_T$; it is biased but consistent.
• The test statistic (known as the DF Z test or the coefficient statistic) is defined as $T\hat\beta_T$. Under $H_0$,
$$T\hat\beta_T \xrightarrow{d} \frac{\frac{1}{2}(\chi^2(1) - 1)}{\int_0^1 B^2(r)\,dr}.$$
• This limit is not a normal distribution. See Table B.5 for the critical values of this distribution. The finite sample distribution of $\hat\beta_T$ is not normal either. See Table B.5 for the critical values for different sample sizes.
• Alternatively, we can use the t-statistic $\hat\beta_T/\widehat{se}(\hat\beta_T)$. Under $H_0$, it can be shown that
$$\frac{\hat\beta_T}{\widehat{se}(\hat\beta_T)} = \frac{T\hat\beta_T\left(T^{-2}\sum X_{t-1}^2\right)^{1/2}}{\hat\sigma_T} \xrightarrow{d} \frac{\frac{1}{2}(\chi^2(1) - 1)}{\left[\int_0^1 B^2(r)\,dr\right]^{1/2}}.$$
Although it is called the Dickey-Fuller t-statistic, it no longer has a t distribution. See Table B.6 for the critical values of this distribution and the critical values of the finite sample distribution for different sample sizes.
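3.2.3 Case 2 (constant but no trend in the regression):
• We test
$$H_0: \Delta X_t = e_t \quad \text{against} \quad H_1: \Delta X_t = \alpha + \beta X_{t-1} + e_t \text{ with } \beta < 0.$$
• In Case 2, we estimate the following model by OLS:
$$\Delta X_t = \alpha + \beta X_{t-1} + e_t, \quad e_t \overset{iid}{\sim} N(0, \sigma_e^2).$$
• The DF Z test statistic is $T\hat\beta_T$. Under $H_0$,
$$T\hat\beta_T \xrightarrow{d} \frac{\frac{1}{2}(\chi^2(1) - 1) - B(1)\int_0^1 B(r)\,dr}{\int_0^1 B^2(r)\,dr - \left(\int_0^1 B(r)\,dr\right)^2}.$$
See Table B.5 for the critical values of this distribution and the critical values of the finite sample distribution for different sample sizes.
• Proof of the asymptotic distribution: the OLS estimates are
$$\begin{bmatrix} \hat\alpha_T \\ \hat\beta_T \end{bmatrix} = \begin{bmatrix} T & \sum X_{t-1} \\ \sum X_{t-1} & \sum X_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum e_t \\ \sum X_{t-1}e_t \end{bmatrix} = \begin{bmatrix} O_p(T) & O_p(T^{3/2}) \\ O_p(T^{3/2}) & O_p(T^2) \end{bmatrix}^{-1} \begin{bmatrix} O_p(T^{1/2}) \\ O_p(T) \end{bmatrix}.$$
Let $g(T) = \mathrm{diag}(T^{1/2}, T)$; then
$$\begin{bmatrix} T^{1/2}\hat\alpha_T \\ T\hat\beta_T \end{bmatrix} = g(T)\begin{bmatrix} T & \sum X_{t-1} \\ \sum X_{t-1} & \sum X_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum e_t \\ \sum X_{t-1}e_t \end{bmatrix}$$
$$= \left\{ g^{-1}(T)\begin{bmatrix} T & \sum X_{t-1} \\ \sum X_{t-1} & \sum X_{t-1}^2 \end{bmatrix} g^{-1}(T) \right\}^{-1} g^{-1}(T)\begin{bmatrix} \sum e_t \\ \sum X_{t-1}e_t \end{bmatrix}$$
$$= \begin{bmatrix} 1 & T^{-3/2}\sum X_{t-1} \\ T^{-3/2}\sum X_{t-1} & T^{-2}\sum X_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} T^{-1/2}\sum e_t \\ T^{-1}\sum X_{t-1}e_t \end{bmatrix}$$
$$\xrightarrow{d} \begin{bmatrix} 1 & \sigma\int_0^1 B(r)\,dr \\ \sigma\int_0^1 B(r)\,dr & \sigma^2\int_0^1 B^2(r)\,dr \end{bmatrix}^{-1} \begin{bmatrix} \sigma B(1) \\ \frac{1}{2}\sigma^2\{B^2(1) - 1\} \end{bmatrix}.$$
• Alternatively, we can use the t-statistic. Under $H_0$,
$$\frac{\hat\beta_T}{\widehat{se}(\hat\beta_T)} \xrightarrow{d} \frac{\frac{1}{2}(\chi^2(1) - 1) - B(1)\int_0^1 B(r)\,dr}{\left[\int_0^1 B^2(r)\,dr - \left(\int_0^1 B(r)\,dr\right)^2\right]^{1/2}}.$$
Although it is called the Dickey-Fuller t-statistic, it no longer has a t distribution. See Table B.6 for the critical values of this distribution and the critical values of the finite sample distribution for different sample sizes.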
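3.2.4 Case 2b (H0: Random Walk with Drift):
• What if the true model is a random walk with drift, ie, $X_t = \alpha + X_{t-1} + e_t = X_0 + t\alpha + \sum_{j=1}^t e_j$ with $\alpha \ne 0$?
• $\sum X_{t-1} = TX_0 + T(T-1)\alpha/2 + \sum p_{t-1} = O_p(T^2)$.
• $T^{-2}\sum X_{t-1} \xrightarrow{p} \alpha/2$, $T^{-3}\sum X_{t-1}^2 \xrightarrow{p} \alpha^2/3$, and $\sum X_{t-1}e_t = O_p(T^{3/2})$.
• Let $g(T) = \mathrm{diag}(T^{1/2}, T^{3/2})$; then
$$\begin{bmatrix} T^{1/2}(\hat\alpha_T - \alpha) \\ T^{3/2}\hat\beta_T \end{bmatrix} = \left\{ g^{-1}(T)\begin{bmatrix} T & \sum X_{t-1} \\ \sum X_{t-1} & \sum X_{t-1}^2 \end{bmatrix} g^{-1}(T) \right\}^{-1} \begin{bmatrix} T^{-1/2}\sum e_t \\ T^{-3/2}\sum X_{t-1}e_t \end{bmatrix}.$$
• The first term converges to $\begin{bmatrix} 1 & \alpha/2 \\ \alpha/2 & \alpha^2/3 \end{bmatrix}^{-1}$. Since $T^{-3/2}\sum X_{t-1}e_t$ is asymptotically equivalent to $T^{-3/2}\sum \alpha(t-1)e_t$, the second term satisfies
$$\begin{bmatrix} T^{-1/2}\sum e_t \\ T^{-3/2}\sum X_{t-1}e_t \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2\begin{bmatrix} 1 & \alpha/2 \\ \alpha/2 & \alpha^2/3 \end{bmatrix}\right).$$
And hence,
$$\begin{bmatrix} T^{1/2}(\hat\alpha_T - \alpha) \\ T^{3/2}\hat\beta_T \end{bmatrix} \xrightarrow{d} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2\begin{bmatrix} 1 & \alpha/2 \\ \alpha/2 & \alpha^2/3 \end{bmatrix}^{-1}\right).$$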
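3.2.5 Case 3 (constant and trend in the regression):
• We test
$$H_0: \Delta X_t = e_t \quad \text{against} \quad H_1: \Delta X_t = \alpha + \beta X_{t-1} + \delta t + e_t \text{ with } \beta < 0,$$
or
$$H_0: \Delta X_t = \alpha + e_t \quad \text{against} \quad H_1: \Delta X_t = \alpha + \beta X_{t-1} + \delta t + e_t \text{ with } \beta < 0.$$
• In this case, we estimate the following model by OLS:
$$X_t = \alpha + \rho X_{t-1} + \delta t + e_t, \quad e_t \overset{iid}{\sim} N(0, \sigma_e^2).$$
• $X_t = \alpha(1 - \rho) + \rho(X_{t-1} - \alpha(t-1)) + (\delta + \rho\alpha)t + e_t = \alpha^* + \rho\,p_{t-1} + \delta^* t + e_t$, where $p_{t-1} = X_{t-1} - \alpha(t-1)$ is a pure random walk if $\rho = 1$ (ie $\beta = 0$) and $\delta = 0$.
• Let $g(T) = \mathrm{diag}(T^{1/2}, T, T^{3/2})$; then
$$\begin{bmatrix} T^{1/2}\hat\alpha^*_T \\ T(\hat\rho_T - 1) \\ T^{3/2}(\hat\delta^*_T - \delta^*) \end{bmatrix} = \left\{ g^{-1}(T)\begin{bmatrix} T & \sum p_{t-1} & \sum t \\ \sum p_{t-1} & \sum p_{t-1}^2 & \sum t\,p_{t-1} \\ \sum t & \sum t\,p_{t-1} & \sum t^2 \end{bmatrix} g^{-1}(T) \right\}^{-1} \begin{bmatrix} T^{-1/2}\sum e_t \\ T^{-1}\sum p_{t-1}e_t \\ T^{-3/2}\sum t\,e_t \end{bmatrix}$$
$$\xrightarrow{d} \begin{bmatrix} 1 & \sigma\int B\,dr & 1/2 \\ \sigma\int B\,dr & \sigma^2\int B^2\,dr & \sigma\int rB\,dr \\ 1/2 & \sigma\int rB\,dr & 1/3 \end{bmatrix}^{-1} \begin{bmatrix} \sigma B(1) \\ \frac{1}{2}\sigma^2[B^2(1) - 1] \\ \sigma\left(B(1) - \int B\,dr\right) \end{bmatrix}$$
$$= \begin{bmatrix} \sigma & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \sigma \end{bmatrix} \begin{bmatrix} 1 & \int B\,dr & 1/2 \\ \int B\,dr & \int B^2\,dr & \int rB\,dr \\ 1/2 & \int rB\,dr & 1/3 \end{bmatrix}^{-1} \begin{bmatrix} B(1) \\ \frac{1}{2}[B^2(1) - 1] \\ B(1) - \int B\,dr \end{bmatrix},$$
where all integrals run over $r \in [0, 1]$.
• So the asymptotic distribution of $\hat\rho_T$ (and hence $\hat\beta_T$) is independent of $\sigma$ and $\alpha$.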
• The Dickey-Fuller Z test statistic is $T\hat\beta_T$. Under $H_0$, the asymptotic distribution is not a normal distribution. See Table B.5 for the critical values of the asymptotic distribution and the critical values of the finite sample distribution for different sample sizes.
• Alternatively, we can use the Dickey-Fuller t-statistic $\hat\beta_T/\widehat{se}(\hat\beta_T)$, whose asymptotic distribution is no longer a t distribution. See Table B.6 for the critical values of this distribution for different sample sizes.
The asymptotic critical values for the Dickey-Fuller t-statistic are:

Test statistic    1%      2.5%    5%      10%
Case 1           -2.56   -2.34   -1.94   -1.62
Case 2           -3.43   -3.12   -2.86   -2.57
Case 3           -3.96   -3.66   -3.41   -3.13

When the sample size is 100, the critical values for the Dickey-Fuller t-statistic are given in the table below:

Test statistic    1%      2.5%    5%      10%
Case 1           -2.60   -2.24   -1.95   -1.61
Case 2           -3.51   -3.17   -2.89   -2.58
Case 3           -4.04   -3.73   -3.45   -3.15
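To show how a tabulated critical value is used, here is a minimal sketch of the Case 2 Dickey-Fuller t-statistic computed by hand from the OLS regression of $\Delta X_t$ on a constant and $X_{t-1}$. The function name `df_t_stat_case2`, the simulated series and the seed are assumptions made for illustration; in practice a statistical package would normally be used.

```python
import random

def df_t_stat_case2(x):
    """Dickey-Fuller t-statistic from the OLS regression
    dX_t = alpha + beta * X_{t-1} + e_t (Case 2)."""
    lagged = x[:-1]
    diff = [x[t] - x[t - 1] for t in range(1, len(x))]
    n = len(diff)
    mean_lag = sum(lagged) / n
    mean_diff = sum(diff) / n
    sxx = sum((v - mean_lag) ** 2 for v in lagged)
    sxy = sum((v - mean_lag) * (d - mean_diff) for v, d in zip(lagged, diff))
    beta_hat = sxy / sxx
    alpha_hat = mean_diff - beta_hat * mean_lag
    rss = sum((d - alpha_hat - beta_hat * v) ** 2 for v, d in zip(lagged, diff))
    s2 = rss / (n - 2)                      # residual variance
    return beta_hat / (s2 / sxx) ** 0.5     # beta_hat divided by its standard error

rng = random.Random(7)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(201)]     # stationary series
random_walk = [0.0]
for _ in range(200):
    random_walk.append(random_walk[-1] + rng.gauss(0.0, 1.0))

CRIT_5PCT_CASE2 = -2.86   # asymptotic 5% value for Case 2 from the table above
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    t = df_t_stat_case2(series)
    verdict = "reject unit root" if t < CRIT_5PCT_CASE2 else "do not reject"
    print(name, round(t, 2), verdict)
```

The test is one-sided: the unit root null is rejected only when the statistic falls below the (negative) critical value.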
3.2.6 Which case to use in practice?
• Use the Case 3 test for a series with an obvious trend, such as GDP.
• Use the Case 2 test for a series without an obvious trend, such as interest rates. For variables such as interest rates we should use Case 2 rather than Case 1 since, if the data were described by a stationary process, the process would surely have a positive mean. However, if you strongly believe a series has a zero mean when it is stationary, use Case 1.
3.3 Augmented DF (ADF) Test
• The model in the null hypothesis considered in the DF test is a class of highly restrictive unit root models.
• Suppose the model in the null hypothesis is $X_t = X_{t-1} + u_t$, $u_t = \phi u_{t-1} + e_t$. Unless $\phi = 0$, the DF test is not applicable.
• The above model implies the following AR(2) model for $X_t$:
$$X_t = (1 + \phi)X_{t-1} - \phi X_{t-2} + e_t = X_{t-1} + \phi\Delta X_{t-1} + e_t.$$
• In general, if
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + e_t, \quad e_t \sim iid(0, \sigma_e^2), \qquad (3.1)$$
then the DF test is not applicable.
• Model (3.1) can be represented by
$$X_t = \rho X_{t-1} + \zeta_1\Delta X_{t-1} + \cdots + \zeta_{p-1}\Delta X_{t-p+1} + e_t, \quad e_t \sim iid(0, \sigma_e^2),$$
where $\rho = \sum_{i=1}^p \phi_i$.
• So if $\rho = 1$, $\{X_t\}$ is a unit root process; if $|\rho| < 1$, $\{X_t\}$ is a stationary process.
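The reparameterisation of the AR(p) model can be sketched as a small mapping from the AR coefficients to the coefficients of the augmented form. The formula used for the coefficients on the lagged differences, minus the sum of the higher-order AR coefficients, is the standard one but is not written out in the notes; the function name `adf_form` and the AR(3) coefficients are hypothetical.

```python
def adf_form(phis):
    """Map AR(p) coefficients phi_1..phi_p to (rho, [zeta_1..zeta_{p-1}]) in
    X_t = rho X_{t-1} + zeta_1 dX_{t-1} + ... + zeta_{p-1} dX_{t-p+1} + e_t,
    using rho = sum of phi_i and zeta_j = -(phi_{j+1} + ... + phi_p)."""
    rho = sum(phis)
    zetas = [-sum(phis[j:]) for j in range(1, len(phis))]
    return rho, zetas

phis = [0.5, 0.25, 0.25]            # hypothetical AR(3); coefficients sum to 1
rho, zetas = adf_form(phis)
print(rho, zetas)                   # 1.0 [-0.5, -0.25]

# Both forms give the same one-step prediction (shock set to zero).
x1, x2, x3 = 4.0, 2.0, 1.0          # X_{t-1}, X_{t-2}, X_{t-3}
ar_pred = phis[0] * x1 + phis[1] * x2 + phis[2] * x3
adf_pred = rho * x1 + zetas[0] * (x1 - x2) + zetas[1] * (x2 - x3)
print(ar_pred, adf_pred)            # 2.75 2.75
```

For the AR(2) example in the text, the same mapping applied to the coefficients $(1 + \phi, -\phi)$ returns $\rho = 1$ and a single lagged-difference coefficient equal to $\phi$.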
• In Case 2, the ADF test is based on the OLS estimator of $\beta$ from the following regression model:
$$\Delta X_t = \alpha + \beta X_{t-1} + \zeta_1\Delta X_{t-1} + \cdots + \zeta_{p-1}\Delta X_{t-p+1} + e_t, \quad e_t \sim iid(0, \sigma_e^2).$$
• In Case 3, the ADF test is based on the OLS estimator of $\beta$ from the following regression model:
$$\Delta X_t = \alpha + \delta t + \beta X_{t-1} + \zeta_1\Delta X_{t-1} + \cdots + \zeta_{p-1}\Delta X_{t-p+1} + e_t, \quad e_t \sim iid(0, \sigma_e^2).$$
• Based on the OLS estimator of $\beta$, the ADF test statistic, $\hat\beta_T/\widehat{se}(\hat\beta_T)$, can be used. It has the same asymptotic distribution as the DF t statistic.
3.4 Phillips-Perron (PP) Test
• $X_t = X_{t-1} + u_t$, $\phi(L)u_t = \theta(L)e_t$, $e_t \sim iid\,N(0, \sigma_e^2)$, so $u_t$ is an ARMA process. Phillips and Perron (1988) proposed a nonparametric method of controlling for possible serial correlation in $u_t$.
• There are two PP test statistics. One of them (the so-called PP Z test) is the analogue of $T\hat\beta_T$. The other (the so-called PP t test) is the analogue of $\hat\beta_T/\widehat{se}(\hat\beta_T)$. The PP Z test has the same sampling distribution as $T\hat\beta_T$ under all three cases in the previous section. The PP t test has the same sampling distribution as $\hat\beta_T/\widehat{se}(\hat\beta_T)$ under all three cases in the previous section.
• In Case 2, both PP tests are based on the OLS estimator of $\beta$ from the following regression model:
$$\Delta X_t = \alpha + \beta X_{t-1} + \varepsilon_t, \quad \text{where } \varepsilon_t \sim ARMA.$$
• In this case, the PP Z test statistic is defined by
$$T\hat\beta_T - \frac{1}{2}\left(T^2\hat\sigma^2_{\hat\beta_T} \div s_T^2\right)(\lambda^2 - \gamma_0),$$
where $s_T^2 = (T-2)^{-1}\sum (X_t - \hat\alpha_T - \hat\rho_T X_{t-1})^2$ with $\hat\rho_T = 1 + \hat\beta_T$, $\lambda = \sigma\psi(1)$, and $\gamma_0 = E(u_t^2)$.
• In Case 3, both PP tests are based on the OLS estimator of $\beta$ from the following regression model:
$$\Delta X_t = \alpha + \delta t + \beta X_{t-1} + \varepsilon_t, \quad \text{where } \varepsilon_t \sim ARMA.$$
3.5 Kwiatkowski, Phillips, Schmidt, and Shin (KPSS, 1992) Test
• Unlike the tests introduced so far, the KPSS test is a test for stationarity or trend-stationarity; that is, it tests the null of stationarity or trend stationarity against the alternative of a unit root.
• It is assumed that one can decompose $X_t$ into the sum of a deterministic trend, a random walk and a stationary error:
$$X_t = \delta t + r_t + u_t \qquad (3.2)$$
$$r_t = r_{t-1} + e_t \qquad (3.3)$$
where $e_t \overset{iid}{\sim} N(0, \sigma_e^2)$. The initial value $r_0$ is assumed to be fixed.
• If $\sigma_e^2 = 0$, $X_t$ is trend-stationary. If $\sigma_e^2 > 0$, $X_t$ is non-stationary.
• The KPSS test is a one-sided Lagrange Multiplier (LM) test.
• The KPSS statistic is based on the residuals from the OLS regression of $X_t$ on the exogenous variables $Z_t = 1$ or $(1, t)$:
$$X_t = \delta' Z_t + \eta_t.$$
• The LM statistic is defined as
$$LM = \frac{\sum_{t=1}^T S(t)^2}{T^2 f_0},$$
where $f_0$ is an estimator of the long-run variance and $S(t)$ is a cumulative residual function:
$$S(t) = \sum_{s=1}^t \eta_s.$$
• Critical values for the KPSS test statistic are:

Level of significance    10%     5%      2.5%    1%
Crit value (case 2)      0.347   0.463   0.574   0.739
Crit value (case 3)      0.119   0.146   0.176   0.216
• If the LM test statistic is larger than the critical value, we reject the null of stationarity (or trend-stationarity).
• In general $\eta_t$ is not white noise. As a result, the long-run variance is different from the short-run variance. One way to estimate the long-run variance is to nonparametrically estimate the spectral density at frequency zero, that is, compute a weighted sum of the autocovariances, with the weights defined by a kernel function. For example, a nonparametric estimate based on the Bartlett kernel is
$$f_0 = \sum_{h=-(T-1)}^{T-1} \hat\gamma(h)\,K(h/l),$$
where $l$ is a bandwidth parameter (which acts as a truncation lag in the covariance weighting), and $\hat\gamma(h)$ is the $h$-th sample autocovariance of the residuals $\eta$, defined as
$$\hat\gamma(h) = \sum_{t=h+1}^{T} \eta_t\eta_{t-h}/T,$$
and $K$ is the Bartlett kernel function defined by
$$K(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
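The pieces of the KPSS statistic fit together as in the following sketch for the level-stationarity case ($Z_t = 1$), where the OLS residuals are simply deviations from the sample mean. The function names and the four-point toy series are assumptions for illustration; with bandwidth 1 the Bartlett weights vanish for every non-zero lag, which makes the result easy to verify by hand.

```python
def bartlett_kernel(x):
    """K(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0

def kpss_level_stat(x, bandwidth):
    """KPSS LM statistic with Z_t = 1 and a Bartlett long-run variance
    estimate; `bandwidth` is the parameter l and must be positive."""
    T = len(x)
    mean = sum(x) / T
    resid = [v - mean for v in x]                 # OLS residuals on a constant

    def gamma(h):                                 # h-th sample autocovariance
        return sum(resid[t] * resid[t - h] for t in range(h, T)) / T

    # gamma(-h) = gamma(h) and K is symmetric, so lags h >= 1 count twice.
    f0 = gamma(0) + 2.0 * sum(
        bartlett_kernel(h / bandwidth) * gamma(h) for h in range(1, T))
    partial, sum_s2 = 0.0, 0.0
    for r in resid:                               # S(t) = cumulative residual sum
        partial += r
        sum_s2 += partial * partial
    return sum_s2 / (T * T * f0)

print(kpss_level_stat([1.0, 2.0, 3.0, 4.0], bandwidth=1))   # 0.425
```

In the toy example the residuals are -1.5, -0.5, 0.5, 1.5, the cumulative sums are -1.5, -2, -1.5, 0, and the long-run variance estimate reduces to the sample variance 1.25, giving 8.5 / (16 x 1.25) = 0.425.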
4 Revisit to Box-Jenkins
• If the null hypothesis of a unit root cannot be rejected, difference the data (ie, $X_t - X_{t-1} = u_t$) to achieve stationarity.
• In the case where both a stochastic trend and a linear deterministic trend are involved, the first difference transformation leads to a stationary process.
• Test for a unit root in the differenced data. If the null hypothesis cannot be rejected, difference the differenced data until stationarity is achieved.
• ARMA(p, q) + one unit root = ARIMA(p, 1, q)
• ARMA(p, q) + d unit roots = ARIMA(p, d, q)
5 Limitations of Box-Jenkins
• The data generating process has to be time-invariant. This assumption can be too restrictive for an economy undergoing a period of transition and for data covering a long sample period.
• What if the data are explosive, ie $\rho > 1$?
• What if a nonlinear deterministic trend is involved?
• What if the data follow a trend stationary process?
• Unit root tests and model selection are done in two separate steps.
6 Model Selection
6.1 Why use model selection criteria
• Select $p, q$ in the ARMA model.
• Select the lag order, the trend order and other regressors.
6.2 Conventional model selection criteria
• Minimize $1 - R^2$
• Minimize $1 - \bar R^2$
• Minimize AIC (Akaike information criterion, Akaike, 1969): $AIC = \dfrac{-2l}{T} + \dfrac{2k}{T}$, where $k$ is the number of parameters, $l$ is the log-likelihood value, and $T$ is the sample size.
• Minimize BIC (Bayesian information criterion, Schwarz, 1978): $BIC = \dfrac{-2l}{T} + \dfrac{k}{T}\ln T$.
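The two criteria differ only in the penalty term, which the sketch below makes explicit. The candidate models, their parameter counts and their log-likelihood values are invented for illustration.

```python
import math

def aic(loglik, k, T):
    """AIC = -2 l / T + 2 k / T."""
    return -2.0 * loglik / T + 2.0 * k / T

def bic(loglik, k, T):
    """BIC = -2 l / T + k ln(T) / T."""
    return -2.0 * loglik / T + k * math.log(T) / T

# Hypothetical fits: (number of parameters, maximised log-likelihood), T = 50.
T = 50
candidates = {"AR(1)": (2, -104.0), "AR(2)": (3, -100.0), "AR(3)": (4, -98.9)}
for name, (k, loglik) in candidates.items():
    print(name, round(aic(loglik, k, T), 4), round(bic(loglik, k, T), 4))

best_aic = min(candidates, key=lambda m: aic(candidates[m][1], candidates[m][0], T))
best_bic = min(candidates, key=lambda m: bic(candidates[m][1], candidates[m][0], T))
print(best_aic, best_bic)   # AR(3) AR(2)
```

Because $\ln T > 2$ once $T \ge 8$, BIC penalises each extra parameter more heavily than AIC, so in this made-up example AIC prefers the larger model while BIC prefers the smaller one.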
6.3 Model selection criteria recently developed
• In AIC/BIC, the penalty depends only on the number of regressors, while in PIC it depends not only on the number of regressors but also on the type of regressors.
• In the regression context with the normality assumption (ie, $Y = X\beta + \varepsilon$), $PIC = \dfrac{-2l}{T} + \dfrac{\ln(|X'X/\hat\sigma_K^2|)}{T}$, where $\hat\sigma_K^2$ is a consistent estimate of the residual variance of a reference model. Usually the reference model is chosen to be the biggest plausible model.
[Figure: model selection criteria (1 - adjusted R-square, AIC, BIC, PIC) plotted against the number of parameters.]
6.4 Which criterion to use in practice
• Comparison of these criteria:
• BIC and PIC are consistent. This means that in large samples they will select the true model.
• AIC is not consistent. In large samples it tends to choose a model with too many lags.
• PIC is applicable to ARIMA+Trend processes.
• AIC and BIC are provided by most packages when a regression is run.
• A Gauss library, COINT, developed by Ouliaris and Phillips, calculates PIC.
• In the context of stationary ARMA models, Phillips and Ploberger (1994) found that PIC is superior to BIC using simulated data.
• PIC is found in Schiff and Phillips (2000) to be superior to BIC in a class of AR+Trend models for forecasting New Zealand GDP.
7 Cointegration
• Definition: two variables $(Y_t, X_t)$ are said to be cointegrated if each of them taken individually is $I(1)$ (ie non-stationary with a unit root), while some linear combination of them (say $Y_t - \beta X_t$) is stationary (ie $I(0)$).
• If $(Y_t, X_t)$ is cointegrated, the regression model $Y_t = \alpha + \beta X_t + e_t$ is called a cointegration relation, where $e_t$ is $I(0)$.
• $(1, -\beta)$ is called the cointegrating vector. It is easy to see that any multiple of it is also a cointegrating vector, so normalisation is important.
• Cointegration is a long-run equilibrium relationship, that is, the manner in which the two variables drift upward together.
• Since $e_t$ is $I(0)$, any deviation from the equilibrium will come back to the equilibrium in the long run.
• Cointegration and common trend:
- Cointegrated variables share common trends.
- Suppose both $Y_t$ and $X_t$ have a stochastic trend.
- $Y_t = \mu_{Yt} + \varepsilon_{Yt}$, where $\mu_{Yt}$ is $I(1)$, ie $\mu_{Yt} = \mu_{Y,t-1} + \omega_{Yt}$, and $\varepsilon_{Yt}$ is stationary;
$X_t = \mu_{Xt} + \varepsilon_{Xt}$, where $\mu_{Xt}$ is $I(1)$, ie $\mu_{Xt} = \mu_{X,t-1} + \omega_{Xt}$, and $\varepsilon_{Xt}$ is stationary.
- Now if $Y_t$ and $X_t$ are cointegrated, then there must exist a parameter $\beta$ such that $Y_t - \beta X_t$ is stationary, ie, $\mu_{Yt} + \varepsilon_{Yt} - \beta(\mu_{Xt} + \varepsilon_{Xt}) = \mu_{Yt} - \beta\mu_{Xt} + \varepsilon_{Yt} - \beta\varepsilon_{Xt}$ is stationary. Since $\varepsilon_{Yt} - \beta\varepsilon_{Xt}$ is stationary, for $Y_t - \beta X_t$ to be stationary, $\mu_{Yt} - \beta\mu_{Xt}$ must be 0. Therefore $\mu_{Yt} = \beta\mu_{Xt}$.
- This equality requires that $Y_t$ and $X_t$ have a common stochastic trend.
• The concept of cointegration is originally due to Granger (1981) but was popularised by Engle and Granger (1987, Econometrica).
• Estimation of cointegration systems:
1. The OLS estimator is not only consistent but also "super-consistent". This means the estimator converges to its limiting distribution at a faster rate than the estimator in the stationary case.
2. Super-consistency also holds when the errors of the cointegration model are autocorrelated.
• Test for cointegration (between $Y_t$ and $X_t$)
Basic idea:
1. Test $Y_t$ and $X_t$ for $I(1)$.
2. Obtain the fitted residual $\hat e_t$ from a regression of $Y_t$ on $X_t$ (different cointegrating regression models lead to different cases).
3. Use the ADF or PP test to test for a unit root in $\hat e_t$. Since the test is based on $\hat e_t$, not $e_t$, we cannot use the Dickey-Fuller tables as before. We must use modified critical values.
4. If $H_0$ is rejected, $Y_t$ and $X_t$ are cointegrated. If $H_0$ cannot be rejected, the regression of $Y_t$ on $X_t$ is spurious.
• Case 2 (constant and no trend in the regression):
- We estimate the following model (with the constant term but without trend) by OLS:
$$Y_t = \alpha + X_t'\beta + e_t.$$
- The ADF t-statistic or the PP t-statistic can be used. The critical values for the ADF t-statistic or the PP t-statistic are given in the table below.
• Case 3 (constant and trend in the regression):
- We estimate the following model (with the constant term and the trend) by OLS:
$$Y_t = \alpha + X_t'\beta + \delta t + e_t.$$
- The ADF t-statistic or the PP t-statistic can be used. The critical values for the ADF t-statistic or the PP t-statistic are given in the table below.
• Which case to use in practice?
- Use the Case 3 test if either $Y_t$ or $X_t$ or both involve a possible trend, such as GDP.
- Use the Case 2 test if neither $Y_t$ nor $X_t$ involves a possible trend, such as interest rates and exchange rates.

k          2              3              4              5              6
case     2      3       2      3       2      3       2      3       2      3
1%     -3.90  -4.32   -4.29  -4.66   -4.64  -4.97   -4.96  -5.25   -5.25  -5.52
5%     -3.34  -3.78   -3.74  -4.12   -4.10  -4.43   -4.42  -4.72   -4.71  -4.98
10%    -3.04  -3.50   -3.45  -3.84   -3.81  -4.15   -4.13  -4.43   -4.42  -4.70
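The residual-based procedure can be sketched on simulated data. The example below is a simplification under stated assumptions: the pair is cointegrated by construction with made-up coefficients, the unit root test on the residuals is a plain Dickey-Fuller regression without lagged differences (rather than a full ADF or PP test), and the helper names are hypothetical.

```python
import random

def ols_line(y, x):
    """Intercept and slope from the OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def df_t_no_constant(u):
    """t-statistic on beta in dU_t = beta * U_{t-1} + error (no constant)."""
    lagged = u[:-1]
    diff = [u[t] - u[t - 1] for t in range(1, len(u))]
    sxx = sum(v * v for v in lagged)
    beta_hat = sum(v * d for v, d in zip(lagged, diff)) / sxx
    rss = sum((d - beta_hat * v) ** 2 for v, d in zip(lagged, diff))
    s2 = rss / (len(diff) - 1)
    return beta_hat / (s2 / sxx) ** 0.5

rng = random.Random(3)
x = [0.0]
for _ in range(299):
    x.append(x[-1] + rng.gauss(0.0, 1.0))              # I(1) regressor
y = [1.0 + 2.0 * v + rng.gauss(0.0, 1.0) for v in x]   # cointegrated with x by construction

a_hat, b_hat = ols_line(y, x)                          # step 2: cointegrating regression
resid = [yy - a_hat - b_hat * xx for yy, xx in zip(y, x)]
t_stat = df_t_no_constant(resid)                       # step 3: unit root test on residuals
CRIT_5PCT = -3.34     # k = 2, Case 2, 5% value from the table above
print(round(b_hat, 3), round(t_stat, 2), t_stat < CRIT_5PCT)
```

Note that the statistic is compared with the modified critical value from the cointegration table, not with the ordinary Dickey-Fuller value, because the residuals come from an estimated regression.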
8 Spurious Regression
• $Y_t = a + bZ_t + u_t$
• The classical approach requires that both $Y_t$ and $Z_t$, and hence $u_t$, are stationary.
• Now consider $Y_t = Y_{t-1} + \varepsilon_{1t}$, $Z_t = Z_{t-1} + \varepsilon_{2t}$, where $\varepsilon_{jt} \overset{iid}{\sim} (0, \sigma_j^2)$.
• Let $Y_0 = Z_0 = 0$; we have $Y_t = \sum_{j=1}^t \varepsilon_{1j}$, $Z_t = \sum_{j=1}^t \varepsilon_{2j}$.
• If $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent, the regression is a spurious regression.
• Why spurious?
• Using Monte Carlo simulations, Granger and Newbold (1974) find that the t statistic rejects the null hypothesis $b = 0$ far more often than it should and tends to reject it more and more frequently the larger the sample size.
• Using asymptotic theory, Phillips (1986) finds that $t \xrightarrow{p} +\infty$, hence it will reject the null hypothesis $b = 0$ all the time asymptotically. Also he shows $R^2 \xrightarrow{p} 1$.
• Another type of spurious regression: $Y_t = a + bt + u_t$ when $Y_t = Y_{t-1} + \varepsilon_t$.
• Note:
1. If $Y_t$ and $Z_t$ are stationary, the classical theory applies.
2. If $Y_t$ and $Z_t$ are integrated of different orders, the regression is meaningless.
3. If $Y_t$ and $Z_t$ are integrated of the same order and $u_t$ is integrated of that same order, we have a spurious regression.
4. If $Y_t$ and $Z_t$ are integrated of the same order but $u_t$ is integrated of a lower order, we have cointegration.
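The over-rejection found in the Monte Carlo studies is easy to reproduce in a small experiment. The sketch below regresses one simulated random walk on another, independent one and records how often the conventional t-test rejects $b = 0$ at the nominal 5% level; the sample size, number of replications, seed and helper names are illustrative assumptions.

```python
import random

def slope_t_stat(y, z):
    """Conventional OLS t-statistic for b in Y_t = a + b Z_t + u_t."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((v - mz) ** 2 for v in z)
    b_hat = sum((v - mz) * (w - my) for v, w in zip(z, y)) / szz
    a_hat = my - b_hat * mz
    rss = sum((w - a_hat - b_hat * v) ** 2 for v, w in zip(z, y))
    return b_hat / ((rss / (n - 2)) / szz) ** 0.5

def random_walk(T, rng):
    """Driftless random walk of length T with standard normal shocks."""
    path, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(11)
reps, T = 300, 100
rejections = sum(
    abs(slope_t_stat(random_walk(T, rng), random_walk(T, rng))) > 1.96
    for _ in range(reps))
print(rejections / reps)   # nominal size would be about 0.05
```

Although the two series are unrelated by construction, the rejection frequency is many times the nominal level, which is the sense in which the regression is spurious.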