01 Stationary time series
Andrius Buteikis, [email protected]
http://web.vu.lt/mif/a.buteikis/


Page 1

01 Stationary time series

Andrius Buteikis, [email protected]
http://web.vu.lt/mif/a.buteikis/

Page 2

Introduction

All time series may be divided into two broad classes: stationary and non-stationary.

I Stationary process - a random process with a constant mean, variance and covariance. Examples of stationary time series:

[Figure: three stationary series plotted against Time - "WN, mean = 0" (x1), "MA(3), mean = 5" (x2), "AR(1), mean = 5" (x3), each over 200 observations]

The three example processes fluctuate around their constant mean values. From the graphs, the fluctuations of the first two series appear constant, while for the third this is less apparent.
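These three series could be simulated in R roughly as follows (a minimal sketch; the seed and the MA/AR coefficients are illustrative assumptions, not taken from the slides):

set.seed(123)
x1 <- rnorm(200)                                           # WN, mean = 0
x2 <- 5 + arima.sim(list(ma = c(0.5, 0.4, 0.3)), n = 200)  # MA(3), mean = 5
x3 <- 5 + arima.sim(list(ar = 0.6), n = 200)               # AR(1), mean = 5
par(mfrow = c(3, 1))
plot.ts(x1, main = "WN, mean = 0")
plot.ts(x2, main = "MA(3), mean = 5")
plot.ts(x3, main = "AR(1), mean = 5")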

Page 3

If we were to examine a longer time period of the last time series:

[Figure: the "AR(1), mean = 5" series (x3) plotted over 200 observations and again over 400 observations]

We can see that the fluctuations are indeed around a constant mean and the variance does not appear to change throughout the period.

Page 4

Some non-stationary time series examples:
I Yt = t + εt, where εt ~ N(0, 1);
I Yt = εt · t, where εt ~ N(0, σ²);
I Yt = ∑_{j=1}^t Zj, where each independent variable Zj is either 1 or −1, with a 50% probability for either value.

The reasons for their non-stationarity are as follows:
I The first time series is not stationary because its mean is not constant: EYt = t depends on t;
I The second time series is not stationary because its variance is not constant: Var(Yt) = t² · σ² depends on t. However, EYt = 0 · t = 0 is constant;
I The third time series is not stationary because, even though EYt = ∑_{j=1}^t (0.5 − 0.5) = 0, the variance Var(Yt) = E(Yt²) − (E(Yt))² = E(Yt²) = t, where:

E(Yt²) = ∑_{j=1}^t E(Zj²) + 2 ∑_{j≠k} E(Zj Zk) = t · (0.5 · 1² + 0.5 · (−1)²) = t

(the cross terms vanish because the Zj are independent with zero mean, so E(Zj Zk) = E(Zj)E(Zk) = 0 for j ≠ k).

Page 5

The sample data graphs are provided below:

[Figure: sample paths of the three series plotted against Index - "non stationary in mean" (ns1), "non stationary in variance" (ns2), "no clear tendency" (ns3)]
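These sample paths could be generated along the following lines (a sketch; the seed and σ = 1 are illustrative assumptions):

set.seed(1)
n   <- 50
ns1 <- 1:n + rnorm(n)                               # Y_t = t + e_t: mean grows with t
ns2 <- rnorm(n) * (1:n)                             # Y_t = e_t * t: variance grows with t
ns3 <- cumsum(sample(c(-1, 1), n, replace = TRUE))  # random walk of +/-1 steps
par(mfrow = c(1, 3))
plot(ns1, type = "l", main = "non stationary in mean")
plot(ns2, type = "l", main = "non stationary in variance")
plot(ns3, type = "l", main = "no clear tendency")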

Page 6

I White noise (WN) - a stationary process of uncorrelated (sometimes we may demand the stronger property of independence) random variables with zero mean and constant variance. White noise is a model of an absolutely chaotic process of uncorrelated observations - it is a process that immediately forgets its past.

How can we know which of the previous three stationary graphs are not WN? Two functions help us determine this:
I ACF - autocorrelation function;
I PACF - partial autocorrelation function.
If all the bars (except the 0th in the ACF) are within the blue band, the stationary process is WN.

Page 7

[Figure: sample ACF (top row) and sample PACF (bottom row) of the WN, MA(3) and AR(1) series]

The 95% confidence intervals are calculated from:

qnorm(p = c(0.025, 0.975))/sqrt(n)

(more details on the confidence interval calculation are provided later in these slides)

Page 8

par(mfrow = c(1, 2))
set.seed(10)
n = 50
x0 <- rnorm(n)                # simulate a white noise sample
acf(x0)
abline(h = qnorm(c(0.025, 0.975))/sqrt(n), col = "red")   # 95% WN bounds
pacf(x0)
abline(h = qnorm(c(0.025, 0.975))/sqrt(n), col = "red")

[Figure: sample ACF and PACF of the simulated series x0, with the red 95% confidence bounds]

To decide whether a time series is stationary, examine its graph.

To decide whether a stationary time series is WN, examine its ACF and PACF.

Page 9

Covariance-Stationary Time Series

I In cross-sectional data, different observations were assumed to be uncorrelated;
I In time series, we require that there be some dynamics, some persistence, some way in which the present is linked to the past, and the future to the present. Having historical data would then allow us to forecast the future.

If we want to forecast a series, at a minimum we would like its mean and covariance structure to be stable over time. In that case, we say that the series is covariance stationary. There are two requirements for this to be true:

1. The mean of the series is stable over time: EYt = µ;
2. The covariance structure is stable over time.

In general, the (auto)covariance between Yt and Yt−τ is:

γ(t, τ) = Cov(Yt, Yt−τ) = E(Yt − µ)(Yt−τ − µ)

If the covariance structure is stable, then the covariance depends on τ but not on t: γ(t, τ) = γ(τ). Note: γ(0) = Cov(Yt, Yt) = Var(Yt) < ∞.

Page 10

Remark

When observing/measuring time series we obtain numbers y1, ..., yT, which are a realization of random variables Y1, ..., YT. Using probabilistic concepts, we can give a more precise definition of a (weak) stationary series:
I If EYt = µ, the process is called mean-stationary;
I If Var(Yt) = σ² < ∞, the process is called variance-stationary;
I If γ(t, τ) = γ(τ), the process is called covariance-stationary.
In other words, a time series Yt is stationary if its mean, variance and covariance do not depend on t.

If at least one of the three requirements is not met, then the process is non-stationary.

Page 11

Since we often work with the (auto)correlation between Yt and Yt−τ rather than the (auto)covariance (because correlations are easier to interpret), we calculate the autocorrelation function (ACF):

ρ(τ) = Cov(Yt, Yt−τ)/√(Var(Yt)Var(Yt−τ)) = γ(τ)/γ(0)

Note: ρ(0) = 1, |ρ(τ)| ≤ 1.

The partial autocorrelation function (PACF) measures the association between Yt and Yt−k after accounting for the lags in between:

p(k) = βk, where Yt = α + β1Yt−1 + ... + βkYt−k + εt

Page 12

The autocorrelation coefficient at lag k, rk, is asymptotically normally distributed, and its variance can be approximated by Var(rk) ≈ 1/T (where T is the number of observations).

As such, we want to create lower and upper 95% confidence bounds for the normal distribution N(0, 1/T), whose standard deviation is 1/√T.

The 95% confidence interval (of a WN time series) is:

∆ = 0 ± 1.96/√T

In general, the critical value of a standard normal distribution and its confidence interval can be found in these steps:

I Compute α = (1 − Q)/2, where Q is the confidence level;
I To express the critical value as a z-score, find the z_{1−α} value.

For example, if Q = 0.95, then α = 0.025, and the standard normal distribution's 1 − α = 0.975 quantile is z_{0.975} ≈ 1.96.

Page 13

White Noise

White noise processes are the fundamental building blocks of all stationary time series.

We denote it as εt ~ WN(0, σ²) - a zero-mean, constant-variance and serially uncorrelated (ρ(t, τ) = 0 for τ > 0 and any t) random variable process.

Sometimes we demand the stronger property of independence.

From the definition it follows that:
I E(εt) = 0;
I Var(εt) = σ² < ∞;
I γ(t, τ) = E(εt − Eεt)(εt−τ − Eεt−τ) = E(εt εt−τ), where:

E(εt εt−τ) = σ², if τ = 0; 0, if τ ≠ 0

Page 14

Checking if a process is stationary

Let us check whether Yt = εt + β1εt−1, where εt ~ WN(0, σ²), is stationary:

1. EYt = E(εt + β1εt−1) = 0 + β1 · 0 = 0;
2. Var(Yt) = Var(εt + β1εt−1) = σ² + β1²σ² = σ²(1 + β1²);
3. The autocovariance for τ > 0:

γ(t, τ) = E(YtYt−τ) = E(εt + β1εt−1)(εt−τ + β1εt−τ−1)
= Eεtεt−τ + β1Eεtεt−τ−1 + β1Eεt−1εt−τ + β1²Eεt−1εt−τ−1
= β1Eεt−1εt−τ = β1σ², if τ = 1; 0, if τ > 1

None of these characteristics depend on t, which means that the process is stationary. This process has a very short memory (i.e. if Yt and Yt+τ are separated by more than one time period, they are uncorrelated). On the other hand, this process is not WN.

Page 15

The Lag Operator

The lag operator L is used to lag a time series: LYt = Yt−1. Similarly: L²Yt = L(LYt) = L(Yt−1) = Yt−2, etc. In general, we can write:

L^p Yt = Yt−p

Typically, we operate on a time series with a polynomial in the lag operator. A lag operator polynomial of degree m is:

B(L) = β0 + β1L + β2L² + ... + βmL^m

For example, if B(L) = 1 + 0.9L − 0.6L², then:

B(L)Yt = Yt + 0.9Yt−1 − 0.6Yt−2

A well-known operator, the first-difference operator ∆, is a first-order polynomial in the lag operator: ∆Yt = Yt − Yt−1 = (1 − L)Yt, i.e. B(L) = 1 − L.

We can also write an infinite-order lag operator polynomial as:

B(L) = β0 + β1L + β2L² + ... = ∑_{j=0}^∞ βjL^j
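As a sketch, the example polynomial B(L) = 1 + 0.9L − 0.6L² can be applied to a series in R with stats::filter (the simulated input series is an illustrative assumption):

y  <- as.numeric(arima.sim(list(ar = 0.5), n = 10))
By <- stats::filter(y, filter = c(1, 0.9, -0.6), method = "convolution", sides = 1)
# By[t] = y[t] + 0.9*y[t-1] - 0.6*y[t-2]; the first two values are NA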

Page 16

The General Linear Process

Wold's representation theorem points to the appropriate model for stationary processes.

Wold's Representation Theorem

Let Yt be any zero-mean covariance-stationary process. Then we can write it as:

Yt = B(L)εt = ∑_{j=0}^∞ βjεt−j,   εt ~ WN(0, σ²)

where β0 = 1 and ∑_{j=0}^∞ βj² < ∞. On the other hand, any process of the above form is stationary.

I If β1 = β2 = ... = 0 (and β0 ≠ 0), this corresponds to a WN process. This shows once again that WN is a stationary process.
I If βk = φ^k, then since 1 + φ + φ² + ... = 1/(1 − φ) < ∞ when |φ| < 1, the process Yt = εt + φεt−1 + φ²εt−2 + ... is a stationary process.

Page 17

I In Wold's theorem, we assumed a zero mean, though this is not as restrictive as it may seem. Whenever you see Yt, you can analyse the process Yt − µ, so that the process is expressed in deviations from its mean. The deviation from the mean has a zero mean by construction. So, there is no loss of generality in analyzing zero-mean processes.

I Wold's representation theorem points to the importance of models with infinite distributed (weighted) lags. Although infinite distributed lag models seem to be of little immediate practical use, since they contain infinitely many parameters, this is not always the case.
I As an example from the previous slide, we may have βk = φ^k in the infinite polynomial B(L), which involves only a single (unknown) parameter.

Page 18

Estimation and Inference for the Mean, ACF and PACF

Suppose we have a data sample of a stationary time series, but we do not know the true model that generated the data (we only know that it was a polynomial B(L)), nor the mean, ACF or PACF associated with the model.

We want to use the data to estimate the mean, ACF and PACF, which we might use to help us decide on a suitable model to fit to the data.

Sample Mean

The mean of a stationary series is EYt = µ. A fundamental principle of estimation, called the analog principle, suggests that we develop estimators by replacing expectations with sample averages. Thus, our estimator of the population mean, given a sample of size T, is the sample mean:

Ȳ = (1/T) ∑_{t=1}^T Yt

Typically, we are not interested in estimating the mean itself, but it is needed for estimating the autocorrelation function.

Page 19

Sample Autocorrelations

The autocorrelation at displacement, or lag, τ for the covariance stationary series Yt is:

ρ(τ) = E(Yt − µ)(Yt−τ − µ)/E(Yt − µ)²

Application of the analog principle yields a natural estimator of ρ(τ):

ρ̂(τ) = [(1/T) ∑_{t=1}^T (Yt − Ȳ)(Yt−τ − Ȳ)] / [(1/T) ∑_{t=1}^T (Yt − Ȳ)²]

This estimator is called the sample autocorrelation function (sample ACF).

Page 20

Checking whether the autocorrelations are statistically significantly different from zero

It is often of interest to assess whether a series is reasonably approximated as white noise, i.e. whether all of its autocorrelations are zero in population.

If a series is white noise, then the sample autocorrelations ρ̂(τ), τ = 1, ..., K, in large samples are independent and have the N(0, 1/T) distribution.

Thus, if the series is WN, ~95% of the sample autocorrelations should fall in the interval ±1.96/√T.

Exactly the same holds for both the sample ACF and the sample PACF. We typically plot the sample ACF and sample PACF along with their error bands.

The aforementioned error bands provide 95% confidence bounds for only the sample autocorrelations taken one at a time.

Page 21

Ljung-Box Test

We are often interested in whether a series is white noise, i.e. whether all its autocorrelations are jointly zero. Because of the finite sample size, we can only test a finite number of autocorrelations. We want to test:

H0: ρ(1) = 0, ρ(2) = 0, ..., ρ(k) = 0

Under the null hypothesis, the Ljung-Box statistic:

Q = T(T + 2) ∑_{τ=1}^k ρ̂²(τ)/(T − τ)

is approximately distributed as a χ²_k random variable.

To test the null hypothesis, we calculate the p-value = P(χ²_k > Q): if the p-value < 0.05, we reject the null hypothesis H0 and conclude that Yt is not white noise.

Page 22

Example: Canadian employment data

We will illustrate these ideas by examining quarterly Canadian employment index data. The data is seasonally adjusted and displays no trend (more on what this means in a later lecture); however, it does appear to be highly serially correlated.

suppressPackageStartupMessages(require("forecast"))
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "caemp.txt"
caemp <- read.csv(url(paste0(txt1, txt2)), header = TRUE, as.is = TRUE)
caemp <- ts(caemp, start = c(1960, 1), freq = 4)
tsdisplay(caemp)

Page 23

[Figure: the caemp series (1960-1995) together with its sample ACF and PACF]

I The sample ACF values are large and display a slow one-sided decay;
I The sample PACF values are large at first, but are statistically negligible beyond displacement τ = 2.

Page 24

We shall once again test the WN hypothesis, this time using the Ljung-Box test statistic.

Box.test(caemp, lag = 1, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 127.73, df = 1, p-value < 2.2e-16

With p < 0.05, we reject the null hypothesis H0: ρ(1) = 0.

Box.test(caemp, lag = 2, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 240.45, df = 2, p-value < 2.2e-16

With p < 0.05, we reject the null hypothesis H0: ρ(1) = 0, ρ(2) = 0, and so on. We can see that the time series is not WN.

We will now present a few more examples of stationary processes.

Page 25

Moving-Average (MA) Models

I Finite-order moving-average processes are approximations to the Wold representation (which is an infinite-order moving average process).
I The variation in time series, one way or another, is driven by shocks of various sorts. This suggests the possibility of modelling time series directly as distributed lags of current and past shocks, i.e. as moving-average processes.

Page 26

The MA(1) Process

The first-order moving average, or MA(1), process is:

Yt = εt + θεt−1 = (1 + θL)εt,   −∞ < θ < ∞,   εt ~ WN(0, σ²)

Defining characteristic of an MA process: the current value of the observed series is expressed as a function of current and lagged unobservable shocks εt.

Whatever the value of θ (as long as |θ| < ∞), the MA(1) is always a stationary process, and:
I E(Yt) = E(εt) + θE(εt−1) = 0;
I Var(Yt) = Var(εt) + θ²Var(εt−1) = (1 + θ²)σ²;
I ρ(τ) = 1, if τ = 0; θ/(1 + θ²), if τ = 1; 0, otherwise.

Key feature of MA(1): the (sample) ACF has a sharp cutoff beyond τ = 1.

Page 27

We can write the MA(1) another way. Since:

Yt = (1 + θL)εt ⇒ εt = (1/(1 + θL))Yt

recalling the formula for a geometric series, if |θ| < 1:

εt = (1 − θL + θ²L² − θ³L³ + ...)Yt = Yt − θYt−1 + θ²Yt−2 − θ³Yt−3 + ...

and we can express Yt as an infinite AR, AR(∞), process:

Yt = θYt−1 − θ²Yt−2 + θ³Yt−3 − ... + εt = ∑_{j=1}^∞ (−1)^{j+1} θ^j Yt−j + εt

Recalling the definition of the PACF, we have that for an MA(1) process it decays gradually to zero. Furthermore:
I If θ < 0, the pattern of decay will be one-sided;
I If 0 < θ < 1, the pattern of decay will be oscillating.

Page 28

An example of how the sample ACF and PACF might look for some MA(1) processes:

[Figure: sample ACF and PACF for an MA(1) with θ = 0.5 and an MA(1) with θ = −0.5]
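Plots like these could be reproduced in R along the following lines (a sketch; the seed and sample size are illustrative assumptions):

set.seed(1)
y_pos <- arima.sim(list(ma = 0.5), n = 200)
y_neg <- arima.sim(list(ma = -0.5), n = 200)
par(mfrow = c(2, 2))
acf(y_pos, main = "MA(1) with theta = 0.5")
pacf(y_pos, main = "MA(1) with theta = 0.5")
acf(y_neg, main = "MA(1) with theta = -0.5")
pacf(y_neg, main = "MA(1) with theta = -0.5")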

Page 29

The MA(q) Process

We will now consider the general finite-order moving average process of order q, MA(q):

Yt = εt + θ1εt−1 + ... + θqεt−q = Θ(L)εt,   −∞ < θi < ∞,   εt ~ WN(0, σ²)

where

Θ(L) = 1 + θ1L + ... + θqL^q

is the qth-order lag polynomial. The MA(q) process is a generalization of the MA(1) process. Compared to the MA(1), the MA(q) can capture richer dynamic patterns, which can be used for improved forecasting.

The properties of an MA(q) process parallel those of an MA(1) process in all respects:
I The finite-order MA(q) process is covariance stationary for any values of its parameters (|θj| < ∞, j = 1, ..., q);
I In the MA(q) case, all autocorrelations in the ACF beyond displacement q are 0 (a distinctive property of the MA process);
I The PACF of the MA(q) decays gradually, in accordance with the infinite autoregressive representation, similar to the MA(1): Yt = a1Yt−1 + a2Yt−2 + ... + εt (with certain conditions on the aj).

Page 30

An example of how the sample ACF and PACF might look for a specific MA(3) process:

[Figure: sample ACF and PACF for an MA(3) with θ1 = 1.2, θ2 = 0.65, θ3 = −0.35]

The ACF cuts off beyond τ = 3 and the PACF decays gradually.
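The theoretical ACF and PACF of this MA(3) can also be computed directly with R's ARMAacf:

round(ARMAacf(ma = c(1.2, 0.65, -0.35), lag.max = 8), 3)               # zero beyond lag 3
round(ARMAacf(ma = c(1.2, 0.65, -0.35), lag.max = 8, pacf = TRUE), 3)  # gradual decay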

Page 31

Autoregressive (AR) Models

I The autoregressive process is also a natural approximation to the Wold representation.
I We have seen that, under certain conditions, a moving-average process has an autoregressive representation.
I Consequently, an autoregressive process is, in a sense, the same as a moving-average process.

Page 32

The AR(1) Process

The first-order autoregressive, or AR(1), process is:

Yt = φYt−1 + εt,   εt ~ WN(0, σ²)

or:

(1 − φL)Yt = εt ⇒ Yt = (1/(1 − φL))εt

Note the special interpretation of the errors, or disturbances, or shocks εt in time series theory: in contrast to regression theory, where they were understood as the summary of all unobserved X's, they are now treated as economic effects which have developed in period t.

As we will see when analyzing the ACF, the AR(1) model is capable of capturing much more persistent dynamics (depending on its parameter value) than the MA(1) model, which has a very short memory regardless of its parameter value.

Page 33

Recall that a finite-order moving-average process is always covariance stationary, but certain conditions must be satisfied for the AR(1) to be stationary. The AR(1) process can be rewritten as:

Yt = (1/(1 − φL))εt = (1 + φL + φ²L² + ...)εt = εt + φεt−1 + φ²εt−2 + ...

This Wold moving-average representation for Y converges if |φ| < 1, thus:

AR(1) is stationary if |φ| < 1

Equivalently, the condition for covariance stationarity is that the root z1 of the autoregressive lag operator polynomial (i.e. 1 − φz1 = 0 ⇔ z1 = 1/φ) be greater than 1 in absolute value (a similar condition on the roots is important in the AR(p) case).

We can also get the above equation by recursively applying the AR(1) equation to get the infinite MA process:

Yt = φYt−1 + εt = φ(φYt−2 + εt−1) + εt = εt + φεt−1 + φ²Yt−2 = ... = ∑_{j=0}^∞ φ^j εt−j

Page 34

From the moving-average representation of the covariance stationary AR(1) process:
I E(Yt) = E(εt + φεt−1 + φ²εt−2 + ...) = 0;
I Var(Yt) = Var(εt) + φ²Var(εt−1) + ... = σ²/(1 − φ²).

Or, alternatively: when |φ| < 1, the process is stationary, i.e. EYt = m, therefore EYt = φEYt−1 + Eεt ⇒ m = φm + 0 ⇒ m = 0.

This allows us to easily estimate the mean of the generalized AR(1) process: if Yt = α + φYt−1 + εt, then m = α/(1 − φ).

The correlogram (ACF & PACF) of AR(1) is in a sense symmetric to that of MA(1):
I ρ(τ) = φ^τ, τ = 0, 1, 2, ... - the ACF decays exponentially;
I p(τ) = φ, if τ = 1; 0, if τ > 1 - the PACF cuts off abruptly.

Page 35

An example of how the sample ACF and PACF might look for an AR(1) process:

[Figure: sample ACF and PACF for an AR(1) with φ = 0.85]

Page 36

The AR(p) Process

The general pth-order autoregressive process, AR(p), is:

Yt = φ1Yt−1 + φ2Yt−2 + ... + φpYt−p + εt,   εt ~ WN(0, σ²)

In lag operator form, we write:

Φ(L)Yt = (1 − φ1L − φ2L² − ... − φpL^p)Yt = εt

Similar to the AR(1) case, the AR(p) process is covariance stationary if and only if all the roots zi of the autoregressive lag operator polynomial Φ(z) are outside the complex unit circle:

1 − φ1z − φ2z² − ... − φpz^p = 0 ⇒ |zi| > 1

So:

AR(p) is stationary if all the roots satisfy |zi| > 1

For a quick check of stationarity, use the following rule of thumb: if ∑_{i=1}^p φi ≥ 1, the process isn't stationary. (Note: this is a rule of thumb, not a guarantee.)
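In R, the roots can be checked numerically with polyroot; a sketch for an assumed AR(2) with φ1 = 0.5 and φ2 = 0.3 (illustrative values):

phi   <- c(0.5, 0.3)
roots <- polyroot(c(1, -phi))  # roots of 1 - 0.5z - 0.3z^2
abs(roots)                     # stationary if all moduli exceed 1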

Page 37

In the covariance stationary case, we can write the process in the infinite moving average, MA(∞), form:

Yt = (1/Φ(L))εt

I The ACF of the general AR(p) process decays gradually as the lag increases;
I The PACF of the general AR(p) process has a sharp cutoff at displacement p.

Page 38

An example of how the sample ACF and PACF might look for an AR(2) process Yt = 1.5Yt−1 − 0.9Yt−2 + εt:

[Figure: sample ACF and PACF for an AR(2) with φ1 = 1.5, φ2 = −0.9]

The corresponding lag operator polynomial is 1 − 1.5L + 0.9L², with two complex conjugate roots z1,2 = 0.83 ± 0.65i, |z1,2| = √(0.83² + 0.65²) = 1.05423 > 1 - thus the process is stationary.

The ACF for an AR(2) is:

ρ(τ) = 1, if τ = 0; φ1/(1 − φ2), if τ = 1; φ1ρ(τ − 1) + φ2ρ(τ − 2), if τ = 2, 3, ...

Because the roots are complex, the ACF oscillates, and because the roots are close to the unit circle, the oscillation damps slowly.
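Both the root moduli and the damped oscillation of the ACF can be verified in R:

Mod(polyroot(c(1, -1.5, 0.9)))                      # both moduli are about 1.054 > 1
round(ARMAacf(ar = c(1.5, -0.9), lag.max = 12), 2)  # slowly damping, oscillating ACF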

Page 39

Stationarity and Invertibility

The AR(p) is a generalization of the AR(1) strategy for approximating the Wold representation. The moving-average representation associated with the stationary AR(p) process:

Yt = (1/Φ(L))εt, where 1/Φ(L) = ∑_{j=0}^∞ ψjL^j, ψ0 = 1

depends on p parameters only. This gives us the infinite process from Wold's Representation Theorem:

Yt = ∑_{j=0}^∞ ψjεt−j

which is known as the infinite moving-average process, MA(∞). Because the AR process is stationary, ∑_{j=0}^∞ ψj² < ∞ and Yt takes finite values.

Thus, a stationary AR process can be rewritten as an MA(∞) process.
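R's ARMAtoMA computes the ψj weights of this MA(∞) representation; e.g. for an AR(1) with φ = 0.85 the weights are simply 0.85^j:

ARMAtoMA(ar = 0.85, lag.max = 5)  # 0.85, 0.7225, 0.6141, ...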

Page 40

Stationarity and Invertibility

In some cases the AR form of a stationary process is preferred to the MA form. Just as we can write an AR process as an MA(∞), we can write an MA process as an AR(∞). The necessary definition says that an MA process is called invertible if it can be expressed as an AR process. So, the MA(q) process:

Yt = εt + θ1εt−1 + ... + θqεt−q = Θ(L)εt,   −∞ < θi < ∞,   εt ~ WN(0, σ²)

is invertible if all the roots xi of Θ(x) = 1 + θ1x + ... + θqx^q lie outside the unit circle:

1 + θ1x + ... + θqx^q = 0 ⇒ |xi| > 1

Page 41

Stationarity and Invertibility

Then we can write the process as:

εt = (1/Θ(L))Yt, where 1/Θ(L) = ∑_{j=0}^∞ πjL^j, π0 = 1

εt = ∑_{j=0}^∞ πjYt−j = Yt + ∑_{j=1}^∞ πjYt−j

which gives us the infinite-order autoregressive process, AR(∞):

Yt = −∑_{j=1}^∞ πjYt−j + εt

Because the MA process is invertible, the infinite series converges to a finite value.

For example, the MA(1) of the form Yt = εt − εt−1 is not invertible, since 1 − x = 0 ⇒ x = 1.

Page 42

Causality

A process Yt is causal, or a causal function of εt, if Yt can be expressed in terms of the current and past values of εt.

So, by definition:
I a stationary AR(p) process is causal;
I any MA(q) process is causal.

Note, for an AR(1) process Yt = φYt−1 + εt, εt ~ WN(0, σ²):
I If |φ| < 1, then Yt is causal, because a stationary AR(1) process can be expressed in terms of its shocks: Yt = (1/(1 − φL))εt;
I If |φ| > 1, then Yt is non-causal.

Page 43

To recap:
I The AR(p) process is always invertible (it contains no MA terms);
I The MA(q) process is invertible if all the roots of θ(x) = 1 + θ1x + ... + θqx^q lie outside the unit circle;
I An invertible MA(q) process can be rewritten as an AR(∞) process;
I The MA(q) process is always stationary (it contains no AR terms);
I The AR(p) process is stationary if all the roots of φ(z) = 1 − φ1z − ... − φpz^p lie outside the unit circle;
I A stationary AR(p) process can be rewritten as an MA(∞) process;
I A stationary AR(p) process is causal;
I Any MA(q) process is causal.

Page 44

Autoregressive Moving-Average (ARMA) Models

AR and MA models are often combined in attempts to obtain better approximations to the Wold representation. This results in the ARMA(p,q) process. The motivation for using ARMA models is as follows:
I If the random shock that drives an AR process is itself an MA process, then we obtain an ARMA process;
I ARMA processes arise from aggregation - sums of AR processes, sums of AR and MA processes;
I AR processes observed subject to measurement error also turn out to be ARMA processes.

Page 45

ARMA(1,1) process

The simplest ARMA process that is not a pure AR or a pure MA is the ARMA(1,1) process:

Yt = φYt−1 + εt + θεt−1,   εt ~ WN(0, σ²)

or, in lag operator form:

(1 − φL)Yt = (1 + θL)εt

where:
1. |φ| < 1 is required for stationarity;
2. |θ| < 1 is required for invertibility.

If the covariance stationarity condition is satisfied, then we have the MA representation:

Yt = ((1 + θL)/(1 − φL))εt = εt + b1εt−1 + b2εt−2 + ...

which is an infinite distributed lag of current and past innovations.

Similarly, we can rewrite it in the infinite AR form:

Yt + a1Yt−1 + a2Yt−2 + ... = ((1 − φL)/(1 + θL))Yt = εt

Page 46

ARMA(p,q) process

A natural generalization of the ARMA(1,1) is the ARMA(p,q) process, which allows for multiple moving-average and autoregressive lags. We can write it as:

Yt = φ1Yt−1 + ... + φpYt−p + εt + θ1εt−1 + ... + θqεt−q,   εt ~ WN(0, σ²)

or:

Φ(L)Yt = Θ(L)εt

I If all the roots of Φ(L) are outside the unit circle, then the process is stationary and has a convergent infinite moving-average representation: Yt = (Θ(L)/Φ(L))εt;
I If all the roots of Θ(L) are outside the unit circle, then the process is invertible and can be expressed as the convergent infinite autoregression: (Φ(L)/Θ(L))Yt = εt.

Page 47

An example of an ARMA(1,1) process: Yt = 0.85Yt−1 + εt + 0.5εt−1:

[Figure: sample ACF and PACF for an ARMA(1,1) with φ = 0.85, θ = 0.5]

Page 48

Choosing between AR, MA and ARMA (Part I: Order Selection)

ARMA models are often both highly accurate and highly parsimonious.

I In a particular situation, for example, it might take an AR(5) model to achieve the same approximation accuracy as could be obtained with an ARMA(1,1), but the AR(5) has five parameters to be estimated, whereas the ARMA(1,1) has only two.

The rule to determine the number of AR and MA terms:
- AR(p): the ACF declines, PACF = 0 if τ > p;
- MA(q): ACF = 0 if τ > q, the PACF declines;
- ARMA(p,q): both the ACF and PACF decline.

I Generally, when choosing a model order for ARMA, we choose the order that results in the smallest BIC. Note that there are alternative information criteria: the AIC, which penalizes additional parameters less heavily than the BIC, and the AICc, which corrects the AIC for small sample sizes relative to the number of parameters.
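Since the slides already use the forecast package, the order search itself could be sketched with auto.arima (the simulated series is an illustrative assumption):

library(forecast)
set.seed(1)
y <- arima.sim(list(ar = 0.85, ma = 0.5), n = 200)
auto.arima(y, d = 0, seasonal = FALSE, ic = "bic")  # picks the (p, q) pair minimizing BIC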

Page 49

Stationarity and Invertibility in Model Specification

By inverting and truncating the appropriate backshift operator function (i.e. either φ(L) or θ(L)):

I a stationary AR(p) process can be approximated with arbitrary precision by truncating its infinite MA representation at some high order MA(q*);
I an invertible MA(q) process can be approximated with arbitrary precision by truncating its infinite AR representation at some high order AR(p*);
I a stationary and invertible ARMA(p,q) process can be closely approximated by either a high-order AR or a high-order MA process.

Page 50

Consequently:

I a higher-order AR(p) process can be well approximated by a lower-order ARMA(p*, q*), where p* + q* < p;
I a higher-order MA(q) process can be well approximated by a lower-order ARMA(p*, q*), where p* + q* < q.

In empirical applications, where the data sample may be small, the correlation structure (ACF and PACF) may be such that obtaining a good model with a pure AR (or pure MA) requires a high order p (or high order q).

On the other hand, an approximation with a lower-order ARMA (e.g. with p, q ∈ {0, 1, 2}) may be reasonable for a given series.

Based on the ARMA definition, we can write ARMA(p, 0) = AR(p) and ARMA(0, q) = MA(q). This idea leads to universal functions in R and Python, which allow estimation of AR/MA/ARMA models with various lag orders.

Page 51

Choosing between AR, MA and ARMA (Part II: Diagnostic Checking)

While the AIC/AICc/BIC is a good indicator of the adequacy of the model, another important part is the model residuals.

Remember that one of the primary assumptions about the model is that the shocks (i.e. residuals) are white noise: εt ~ WN(0, σ²).

If this does not hold for the selected model, then the stationarity conditions may not hold and we need to specify a different model. Consequently:

The Ljung-Box test is commonly used to check whether the residuals of an ARMA model have no autocorrelation. In such cases, the degrees of freedom need to be adjusted to reflect the parameter estimation. For example, if ε̂t are the residuals of an ARMA(p,q) model, we want to test H0: ρε(1) = 0, ..., ρε(k) = 0. Then, under the null hypothesis, the statistic:

Q = T(T + 2) ∑_{τ=1}^k ρ̂ε²(τ)/(T − τ)

follows a χ² distribution with k − p − q degrees of freedom, so H0 is rejected at significance level α if Q exceeds the critical value χ²_{1−α, k−p−q}.
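In R, this degrees-of-freedom adjustment is the fitdf argument of Box.test; a sketch for the residuals of an ARMA(1,1) fitted to caemp (the model order and lag choice are illustrative assumptions):

fit <- arima(caemp, order = c(1, 0, 1))
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)  # df = 10 - p - q = 8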

Page 52

Estimation

Autoregressive process parameter estimation

Let's say we want to estimate the parameter of our AR(1) process:

Yt = φYt−1 + εt

I The OLS estimator of φ for the AR(1) case:

φ̂ = ∑_{t=1}^T Yt Yt−1 / ∑_{t=1}^T (Yt−1)²

I The Yule-Walker estimator of φ for AR(1) can be calculated by multiplying Yt = φYt−1 + εt by Yt−1 and taking expectations. We get the equation:

γ(1) = φγ(0)

Recall that γ(τ) is the covariance between Yt and Yt−τ.

Page 53

For the AR(p) case, we would need p different equations, i.e.:

γ(k) = φ1γ(k − 1) + ... + φpγ(k − p), k = 1, ..., p

Moving-average process parameter estimation

Let's say we want to estimate the parameter of our invertible MA(1) process (i.e. |θ| < 1):

Yt = εt + θεt−1 ⇒ εt = Yt − θYt−1 + θ²Yt−2 − ...

Let S(θ) = ∑_{t=1}^T εt² with ε0 = 0. We can find the parameter θ by minimizing S(θ).

ARMA process parameter estimation

For the ARMA(1,1), Yt = φYt−1 + εt + θεt−1, we would need to minimize S(θ, φ) = ∑_{t=1}^T εt² with ε0 = Y0 = 0.

For the ARMA(p,q), we would need to minimize S(θ, φ) by setting εk = Yk = 0 for k ≤ 0.

We can also estimate the parameters using the maximum likelihood method.

Page 54

Forecasting: The General Idea

So far we have thought of the information set as containing the available past history of the series, ΩT = {YT, YT−1, ...}, where we imagine the history as having begun in the infinite past. Based on that information set, we want to find the optimal forecast of Y at some future time T + h.

If Yt is a stationary process, then the forecast tends to the process mean as h increases. Therefore, the forecast is only interesting for several small values of h.

The basic idea of the forecasting method is always the same: write out the process for the future time period T + h and project it on what is known at time T, when the forecast is made. We denote the forecast by YT+h|T, h ≥ 1.

Point forecasts can be calculated using the following three steps:

1. If needed, expand the equation so that Yt is on the left-hand side and all other terms are on the right;
2. Rewrite the equation by replacing t with T + h;
3. On the right-hand side of the equation, replace future observations by their forecasts, future errors (εT+j, 0 < j ≤ h) by zero, and past errors by the corresponding residuals ε̂t, t ≤ T.

Page 55

Forecasting an MA(q) process

Consider, for example, an MA(1) process:

Yt = µ + εt + θεt−1,   εt ~ WN(0, σ²)

We then calculate the forecasts for periods T + 1, ..., T + h as:

YT+1 = µ + εT+1 + θεT ⇒ YT+1|T = µ + 0 + θεT
YT+2 = µ + εT+2 + θεT+1 ⇒ YT+2|T = µ + 0 + 0
...
YT+h = µ + εT+h + θεT+h−1 ⇒ YT+h|T = µ

The forecast generalizes to the MA(q) case as follows:
I The forecast quickly approaches the (sample) mean of the process and, for h ≥ q + 1, coincides with it;
I As h increases, the accuracy of the forecast diminishes up to the moment h = q + 1, whereupon it becomes constant.

Page 56

An example of an MA(1) process: Yt = εt + 0.5εt−1:

[Figure: "Forecasts from ARIMA(0,0,1) with zero mean"]
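A figure like this can be produced with the forecast package (a sketch; the simulated series and horizon are illustrative assumptions):

library(forecast)
set.seed(1)
y   <- arima.sim(list(ma = 0.5), n = 100)
fit <- Arima(y, order = c(0, 0, 1), include.mean = FALSE)
plot(forecast(fit, h = 20))  # the forecast flattens at the mean after h = 1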

Page 57

Forecasting an AR(p) process

Consider, for example, an AR(1) process:

Yt = φYt−1 + εt,   εt ~ WN(0, σ²)

We then calculate the forecasts for periods T + 1, ..., T + h as:

YT+1 = φYT + εT+1 ⇒ YT+1|T = φYT + 0
YT+2 = φYT+1 + εT+2 ⇒ YT+2|T = φYT+1|T + 0 = φ²YT
...
YT+h = φYT+h−1 + εT+h ⇒ YT+h|T = φYT+h−1|T + 0 = φ^h YT

The forecast generalizes to the AR(p) case as follows:
I As h increases, the forecast tends to the (sample) mean exponentially fast, but never reaches it.

Page 58

An example of an AR(1) process: Yt = 0.85Yt−1 + εt :

[Figure: "Forecasts from ARIMA(1,0,0) with zero mean"]

Page 59

Forecasting an ARMA(p,q) process

Consider, for example, an ARMA(1,1) process:

Yt = φYt−1 + εt + θεt−1,   εt ~ WN(0, σ²)

We then calculate the forecasts for periods T + 1, ..., T + h as:

YT+1 = φYT + εT+1 + θεT ⇒ YT+1|T = φYT + 0 + θεT
YT+2 = φYT+1 + εT+2 + θεT+1 ⇒ YT+2|T = φYT+1|T + 0 + 0 = φ²YT + φθεT
...
YT+h = φYT+h−1 + εT+h + θεT+h−1 ⇒ YT+h|T = φ^h YT + φ^{h−1}θεT

The forecast generalizes to the ARMA(p,q) case as follows:
I Similar to the AR(p) process, the ARMA(p,q) forecast tends to the average, but never reaches it.

Page 60

An example of an ARMA(1,1) process: Yt = 0.85Yt−1 + εt + 0.5εt−1:

[Figure: "Forecasts from ARIMA(1,0,1) with zero mean"]

- The forecast YT+h|T of an MA(q) process reaches its average after h = q steps (i.e. for h ≥ q + 1) and then does not change anymore;
- The forecast YT+h|T of an AR(p) or ARMA(p,q) process tends to the average, but never reaches it. The speed of convergence depends on the coefficients.

Page 61

ARMA models and interpretation

ARMA models are atheoretic models (i.e. not concerned with (economic) theory). We select an appropriate model purely based on the information criteria and residual diagnostics. The goal is usually to get an adequate ARMA model for forecasting.

Nevertheless, there are a couple of ways to examine the coefficients of the model.

Short-run and long-run coefficients

If we look at the MA(1) and AR(1) models separately (and apply Wold's decomposition theorem to the AR(1) model):
I MA(1): Yt = εt + θ1εt−1;
I AR(1): Yt = φ1Yt−1 + εt = εt + φ1εt−1 + φ1²εt−2 + ...

We see that in the AR model the lagged terms of Y can be expressed as an infinite sum of the past values of ε, whereas in the MA model the number of past error terms depends on the model's lag order.

To generalize for an ARMA(p,q) model, the error terms εt, ..., εt−q explain the short-term influence of the past, whereas Yt−1, ..., Yt−p explain the long-term influence.

Note: invertibility is a restriction in the software used to estimate the coefficients of models with MA terms. It is not something that we check for in the data analysis separately (unlike stationarity).

Page 62

Interpreting ARMA coefficients as regression parameters

Consider an AR(2) model for the inflation rate πt:

πt = α + φ1πt−1 + φ2πt−2 + εt

If we were to attempt to interpret the coefficients, as we would in a cross-sectional regression, we might say that "inflation today depends on the level of inflation yesterday and on the level of inflation the day before yesterday". The problem with this type of interpretation is two-fold:
I it would be harder to describe for higher-order lags, even more so if we include lags of ε;
I when we interpret regression coefficients, we make use of the ceteris paribus condition. This is much harder to do here, since a unit increase in inflation the day before yesterday would also affect the inflation rate yesterday.

Page 63

Impulse-Response Functions

Instead of attempting to interpret the estimated coefficients, which are often too difficult to interpret in ARMA models, it is better to try to understand the dynamics of the system itself. This can be done in two ways:
I By looking at the forecast dynamics (remember the differences between AR and MA model forecasts);
I By looking at the impulse-response function, or time path, associated with the model.

Before examining impulse-response functions, we define:
I Momentum - the tendency to continue moving in the same direction. The momentum effect can offset the force of regression (convergence) toward the mean and can allow a variable to move away from its historical mean for some time, but not indefinitely;
I Persistence - a persistent variable will hang around where it is and converge only slowly to its historical mean.

The impulse-response function allows us to ask the question: suppose that a variable is at its historical mean and it receives a temporary one-unit shock in a single period. How will the variable respond in future periods?

Page 64

Example

Consider an MA(1):

Yt = εt + θ1εt−1

Assume that a unit shock arrives at t = 0 (we can do this equivalently at any other moment, for example t = T), so that ε0 = σ. The effect of this shock on Y, as t → ∞, is then:

t = 0: σ
t = 1: θ1 × σ
t = 2: 0
...
t = h: 0

The shock completely disappears after two periods. This means that if we have a momentary shock in an MA(1), it only lasts for two periods.

In practical applications, this could be used in analysing a temporary shock to advertising, expenditure, an interest rate and so on.

Page 65

If we have σ = 1 and θ1 = 0.5, then the IRF is:

[Figure: IRF of the MA(1) with σ = 1, θ1 = 0.5 - Response plotted against Time]
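The IRF of an ARMA model is simply the sequence of ψ weights of its MA(∞) representation, so in R it can be sketched as:

irf <- c(1, ARMAtoMA(ma = 0.5, lag.max = 4))  # psi_0 = 1, psi_1 = 0.5, zeros afterwards
plot(irf, type = "h", xlab = "Time", ylab = "Response")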

Page 66

Example

AR(1):

Yt = φ1Yt−1 + εt

Assume that a unit shock arrives at t = 0, so that ε0 = σ. The effect of this shock on Y, as t → ∞, is then:

t = 0: σ
t = 1: φ1 × σ
t = 2: φ1² × σ
...
t = h: φ1^h × σ

Note: if we rewrite the stationary AR(1) process as an MA(∞):

Yt = ∑_{j=0}^∞ φ1^j εt−j

we see that the unit shock to ε0 has the same effect as in the AR(1) specification.

Page 67

If we have σ = 1 and φ1 = 0.5, then the IRF is:

[Figure: IRF of the AR(1) with σ = 1, φ1 = 0.5 - Response plotted against Time]

P.S. We are examining momentary shocks, but we could similarly examine the effects of a permanent shock (we leave this for later lectures).

Page 68

Example

ARMA(2,1):

Yt = φ1Yt−1 + φ2Yt−2 + εt + θ1εt−1

Assuming that the model has no constant, it may be easier to set Y0 = ε0 and examine how the value of Y changes when εj = 0, ∀j > 0. Then a unit shock ε0 = σ will have the following effect on Y, as t → ∞:

Y0 = σ
Y1 = φ1 × Y0 + θ1 × σ
Y2 = φ1 × Y1 + φ2 × Y0
...
Yh = φ1 × Yh−1 + φ2 × Yh−2

Note that this is similar to how the forecasts are calculated. However, instead of using YT, the beginning period is set to Y0 = σ. If we were to have a constant term, then we would need to set Y0 = α + σ.

Sometimes, for convenience (and depending on the model complexity), it may be easier to set σ = 1 and then re-scale the resulting IRF values for various different initial shocks σ.

Page 69

For example, a unit shock to Yt = 0.3Yt−1 − 0.1Yt−2 + εt + 0.05εt−1, εt ~ WN(0, 1), results in the following IRF:

[Figure: IRF of the ARMA(2,1) - Response plotted against Time]
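This IRF could be computed with the same ψ-weight approach:

irf <- c(1, ARMAtoMA(ar = c(0.3, -0.1), ma = 0.05, lag.max = 6))  # matches the recursion above
plot(irf, type = "h", xlab = "Time", ylab = "Response")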