
Page 1: Chapter 3 Prediction and model selection

Introduction to time series (2008) 1

Chapter 3

Prediction and model selection

Page 2: Chapter 3 Prediction and model selection

Chapter 3. Contents.

3.1. Properties of MMSE of prediction.

3.2. The computation of ARIMA forecasts.

3.3. Interpreting the forecasts from ARIMA models.

3.4. Prediction confidence intervals.

3.5. Forecast updating.

3.6. Model selection criteria.

Page 3: Chapter 3 Prediction and model selection

• We assume an observed sample Z_T = (z_1, z_2, …, z_T)'.

• We want to generate predictions of the future values z_{T+1}, z_{T+2}, …, z_{T+k} given the observations.

• T is the forecast origin and k the forecast horizon.

Page 4: Chapter 3 Prediction and model selection


Three components

• 1. Estimation of new values: prediction.

• 2. Measure of the uncertainty: prediction intervals.

• 3. Arrival of new data: updating.

Page 5: Chapter 3 Prediction and model selection


Chapter 3. Prediction and model selection.

3.1. Properties of MMSE of prediction.

 

Page 6: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• Prediction by the conditional expectation.

• We have T observations (Z_T) of a zero-mean stationary time series, and we want to forecast the value of z_{T+k}.

• In order to compare alternative forecasting procedures, we need a criterion of optimality.

Page 7: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• Minimum Mean Square Error Forecasts (MMSEF). Forecasts that minimize this criterion can be computed as follows.

• Let g_T(k) be the forecast we want to generate; this forecast must minimize

MSE(z_{T+k}, g) = E[z_{T+k} − g_T(k)]²

– where the expected value is taken over the joint distribution of z_{T+k} and Z_T.

Page 8: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• Using the well-known property E(y) = E_x[E_{y|x}(y)],

• we obtain

MSE(z_{T+k}|Z_T) = E[z²_{T+k}|Z_T] − 2 g_T(k) E[z_{T+k}|Z_T] + g_T(k)²

Page 9: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• and taking the derivative with respect to g_T(k), we obtain

g_T(k) = E[z_{T+k}|Z_T] = ẑ_T(k)

• This result indicates that, conditioning on the observed sample, the MMSEF is obtained by computing the conditional expectation of the random variable given the available information.

Page 10: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• Linear predictions.

• Conditional expectations can be, in some cases, difficult to compute.

– We restrict our search to forecasting functions that are linear functions of the observations.

• General equation for a linear predictor:

ẑ_T(k) = b_{k0} z_T + … + b_{k,T−1} z_1 = b_k' Z_T

Page 11: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• Calling MSEL the mean square error of a linear forecast,

MSEL(z_{T+k}|Z_T) = E[z_{T+k} − b_k' Z_T]²

• minimizing this expression with respect to the parameters, we have

E[(z_{T+k} − b_k' Z_T) Z_T] = 0

Page 12: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• This implies that the best linear forecast must be such that the forecast error is uncorrelated with the set of observed variables.

• This property suggests the interpretation of the linear predictor as a projection.

Page 13: Chapter 3 Prediction and model selection

Properties of MMSE of prediction

• That is, finding the coefficients of the best linear predictor is equivalent to regressing z_{T+k} on Z_T,

• then

b_k = Γ_T⁻¹ γ_k

• where Γ_T is the covariance matrix of Z_T and γ_k is the covariance vector between z_{T+k} and Z_T.
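As a numerical check, the projection b_k = Γ_T⁻¹ γ_k can be evaluated for an AR(1) process, where the answer is known in closed form (ẑ_T(k) = φ^k z_T). A minimal sketch; the parameter values are illustrative assumptions, and Z_T is ordered with the most recent observation first:

```python
import numpy as np

phi, sigma2, T, k = 0.8, 1.0, 10, 3   # illustrative values

def gamma(h):
    # AR(1) autocovariance: gamma(h) = sigma2 * phi^|h| / (1 - phi^2)
    return sigma2 * phi ** np.abs(h) / (1 - phi ** 2)

idx = np.arange(T)
Gamma = gamma(idx[:, None] - idx[None, :])   # Gamma_T, covariance matrix of Z_T
gamma_k = gamma(k + idx)                     # gamma_k, cov(z_{T+k}, Z_T)
b = np.linalg.solve(Gamma, gamma_k)          # b_k = Gamma_T^{-1} gamma_k

expected = np.zeros(T)
expected[0] = phi ** k                       # AR(1): z_hat_T(k) = phi^k z_T
print(np.allclose(b, expected))              # True
```

All the weight falls on the last observation, as the Markov structure of the AR(1) predicts.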

Page 14: Chapter 3 Prediction and model selection


Chapter 3. Prediction and model selection.

3.2. The computation of ARIMA forecasts.

 

Page 15: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• Suppose we want to forecast a time series that follows an ARIMA(p,d,q) model. First, we will assume that the parameters are known and the prediction horizon is one (k = 1):

z_{T+1} = φ₁ z_T + … + φ_h z_{T+1−h} + a_{T+1} − θ₁ a_T − … − θ_q a_{T+1−q}

• where h = p + d.

Page 16: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• The one-step-ahead forecast will be

ẑ_T(1) = E[z_{T+1}|Z_T]

• and because the expected values of the observed sample data and of the past errors are the values themselves, and the only unknown is a_{T+1},

ẑ_T(1) = φ₁ z_T + … + φ_h z_{T+1−h} − θ₁ a_T − … − θ_q a_{T+1−q}

Page 17: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• Therefore, the one-step prediction error is

a_{T+1} = z_{T+1} − ẑ_T(1)

– remember this is considering that the parameters are known, and therefore the innovations are also known, because we can compute them recursively from the observations.

Page 18: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• Multiple-steps-ahead forecast:

ẑ_T(k) = φ₁ ẑ_T(k−1) + … + φ_h ẑ_T(k−h) + â_T(k) − θ₁ â_T(k−1) − … − θ_q â_T(k−q)

• where

ẑ_T(j) = E[z_{T+j}|Z_T], j = 1, 2, …, k (and ẑ_T(j) = z_{T+j} for j ≤ 0)

â_T(j) = E[a_{T+j}|Z_T], j = 1, 2, …, k (= 0 for j > 0, and = a_{T+j} for j ≤ 0)
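The recursion above can be sketched for an ARMA(1,1) with known parameters: the innovations are first recovered recursively from the observations, then future a's are set to zero. The data values and the startup convention a_1 = 0 are illustrative assumptions:

```python
phi, theta = 0.7, 0.4                     # assumed known parameters
z = [1.0, 0.5, 1.2, 0.8]                  # illustrative sample z_1..z_T

# innovations computed recursively: a_t = z_t - phi z_{t-1} + theta a_{t-1}
a = [0.0]                                 # a_1 = 0 (a common startup assumption)
for t in range(1, len(z)):
    a.append(z[t] - phi * z[t - 1] + theta * a[-1])

# forecasts: future a's are replaced by 0; past z's and a's by their values
K = 5
zhat = []
for k in range(1, K + 1):
    prev_z = zhat[-1] if k > 1 else z[-1]          # z_hat_T(k-1), or z_T at k = 1
    ma_term = -theta * a[-1] if k == 1 else 0.0    # theta a_T only enters at k = 1
    zhat.append(phi * prev_z + ma_term)

print([round(v, 4) for v in zhat])
```

Note how the MA part contributes only at the first step; from k = 2 on the forecasts decay geometrically through the AR coefficient.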

Page 19: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• This expression has two parts:

– The first one, which depends on the AR coefficients, will determine the form of the long-run forecast (eventual forecast equation).

– The second one, which depends on the moving average coefficients, will disappear for k > q.

Page 20: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• AR(1) model:

ẑ_T(1) = φ z_T
ẑ_T(2) = φ ẑ_T(1) = φ² z_T
…
ẑ_T(k) = φ ẑ_T(k−1) = φ^k z_T

• For large k, the term φ^k z_T → 0, and therefore the long-run forecast (for any ARMA(p,q)) will go to the mean of the process.

Page 21: Chapter 3 Prediction and model selection

The computation of ARIMA forecasts

• Random walk with constant: z_t = c + z_{t−1} + a_t

ẑ_T(1) = c + z_T
ẑ_T(2) = c + ẑ_T(1) = 2c + z_T
…
ẑ_T(k) = c + ẑ_T(k−1) = kc + z_T

• The forecasts follow a straight line with slope c. If c = 0, all forecasts are equal to the last observed value.
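The two cases can be contrasted numerically; a minimal sketch with illustrative values for φ, z_T and c:

```python
phi, zT, c, K = 0.6, 2.0, 0.5, 10   # illustrative assumptions

# AR(1): z_hat_T(k) = phi^k z_T -> 0 (the zero mean) as k grows
ar1 = [phi ** k * zT for k in range(1, K + 1)]

# random walk with constant: z_hat_T(k) = k c + z_T, a line with slope c
rw = [k * c + zT for k in range(1, K + 1)]

print(round(ar1[-1], 4))   # already close to the zero mean
print(rw[-1])              # 2.0 + 10 * 0.5 = 7.0
```

The AR(1) path flattens toward the mean; the random-walk path keeps climbing by c per step.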

Page 22: Chapter 3 Prediction and model selection


Chapter 3. Prediction and model selection.

3.3. Interpreting the forecasts from ARIMA models.

 

Page 23: Chapter 3 Prediction and model selection

Interpretation of the forecasts

Nonseasonal models.

• The eventual forecast function of a nonseasonal ARIMA model verifies, for k > q,

φ(B)(∇^d ẑ_T(k) − μ) = 0

• where μ = mean(∇^d z_t), the mean of the stationary transformation.

Page 24: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• Espasa and Peña (1995) proved that the general solution of this equation can be written as

ẑ_T(k) = P_T(k) + t_T(k),   k > max(q − p − d, 0)

• where the permanent component P_T(k) is a polynomial of order d,

Page 25: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• and the transitory component t_T(k) satisfies

φ(B) t_T(k) = 0

• The permanent component will be given by

P_T(k) = β₀^(T) + β₁^(T) k + … + β_d^(T) k^d

• with β_d = μ/d! determined by the mean of the stationary process.

Page 26: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• whereas the rest of the parameters β_i^(T) depend on the initial values and change with the forecast origin.

• Examples:

d = 0:  P_T(k) = β₀^(T)
d = 1:  P_T(k) = β₀^(T) + μ k
d = 2:  P_T(k) = β₀^(T) + β₁^(T) k + (μ/2) k²

Page 27: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• 1. For d = 0, the forecast will be constant for all horizons.

• 2. For d = 1, we obtain a deterministic linear trend with slope μ; if μ = 0, then the permanent component is just a constant.

• 3. For d = 2, the solution is a quadratic trend with the leading term determined by μ/2. If μ = 0, the equation reduces to a linear trend, but now the slope depends on the origin of the forecast.

Page 28: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• In summary, the long-run forecast from an ARIMA model is the mean if the series is stationary, and a polynomial for nonstationary models.

– In this last case, the leading coefficient of the polynomial is a constant determined by the mean when the mean is different from zero, whereas it depends on the forecast origin (adaptive) when the mean is zero.

Page 29: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• Transitory component. It can be given by

t_T(k) = Σ_{i=1}^{p} A_i G_i^k

• where the G_i⁻¹ are the roots of the AR polynomial and the A_i are coefficients depending on the forecast origin.

Page 30: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• Example. Consider the model

(1 − φB)∇z_t = a_t

• then G₁ = φ, and the forecasts must have the form

ẑ_T(k) = c_T + A φ^k

• where c_T, the constant that appears as the solution of ∇P_T(k) = 0, and A, the constant in the transitory equation, must

Page 31: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• be determined from the initial conditions, and can be obtained from

ẑ_T(1) = c_T + Aφ  = z_T + φ(z_T − z_{T−1})
ẑ_T(2) = c_T + Aφ² = z_T + (φ + φ²)(z_T − z_{T−1})

• The solution of these two equations is

c_T = z_T + φ(z_T − z_{T−1}) / (1 − φ)

Page 32: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• and

A = −φ(z_T − z_{T−1}) / (1 − φ)

• These results indicate that the forecasts slowly approach the long-run forecast c_T.

• Note that as A φ^k goes to zero, the adjustment made by the transitory component decreases exponentially (the sign pattern of the adjustment depends on the sign of φ).
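A quick numerical check that the eventual forecast function c_T + Aφ^k reproduces the direct recursion ẑ_T(k) = (1 + φ)ẑ_T(k−1) − φẑ_T(k−2) implied by (1 − φB)∇ẑ_T(k) = 0. The value of φ and the last two observations are illustrative assumptions:

```python
phi, zT, zT1 = 0.5, 3.0, 2.0              # zT1 stands for z_{T-1}
d = zT - zT1
cT = zT + phi * d / (1 - phi)             # long-run forecast
A = -phi * d / (1 - phi)                  # transitory coefficient

# direct recursion, started from z_{T-1}, z_T
f = [zT1, zT]
for _ in range(10):
    f.append((1 + phi) * f[-1] - phi * f[-2])
recursion = f[2:]

closed_form = [cT + A * phi ** k for k in range(1, 11)]
print(all(abs(x - y) < 1e-9 for x, y in zip(recursion, closed_form)))  # True
```

With these numbers c_T = 4 and A = −1, so the forecasts rise from 3.5 toward the limit 4 at an exponential rate governed by φ.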

Page 33: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• Seasonal models. For seasonal processes the forecasts will satisfy the equation

Φ(B^s) φ(B) (∇_s^D ∇^d ẑ_T(k) − μ) = 0

• Let us assume that D = 1; then the seasonal difference factorizes as

(1 − B^s) = (1 + B + B² + … + B^{s−1})(1 − B) = S(B)(1 − B)

Page 34: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• and therefore

Φ(B^s) φ(B) (S(B) ∇^{d+1} ẑ_T(k) − μ) = 0

• which has the property that the operators involved do not share roots in common. The solution is given by

ẑ_T(k) = T_T(k) + E_T(k) + t_T(k)

Page 35: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• The permanent component has been split into two terms: the trend component, which satisfies

∇^{d+1} T_T(k) = μ/s

• and the seasonal component, which satisfies

S(B) E_T(k) = 0

Page 36: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• Finally, the transitory component, which will die out for large horizons, satisfies

Φ(B^s) φ(B) t_T(k) = 0

• The trend component has the same form as for nonseasonal data, but the order is d + 1, and therefore the leading coefficient is

β_{d+1} = μ / (s (d+1)!)

Page 37: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• The seasonal component will be given by

S(B) E_T(k) = 0,  with E_T(j) = E_T(j + s) and Σ_{j=1}^{s} E_T(j) = 0

• that is, the solution of this equation is a function of period s whose values sum to zero over each s consecutive lags. The coefficients are called seasonal coefficients and depend on the forecasting origin.

Page 38: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• Example: the airline model

∇∇₁₂ z_t = (1 − θB)(1 − ΘB¹²) a_t

• The equation of the forecast is

ẑ_T(k) = ẑ_T(k−1) + ẑ_T(k−12) − ẑ_T(k−13) − θ â_T(k−1) − Θ â_T(k−12) + θΘ â_T(k−13)

Page 39: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• This equation can be written as

ẑ_T(k) = β̂₀^(T) + β̂₁^(T) k + Ŝ_k^(T)

• that is, a linear trend plus a seasonal component, with coefficients that change over time. In order to determine the parameters, we need 13 initial conditions:

ẑ_T(j) = β̂₀^(T) + β̂₁^(T) j + Ŝ_j^(T),   j = 1, 2, …, 13

Page 40: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• With Ŝ_j^(T) = Ŝ_{j−12}^(T), we obtain that the slope is

β̂₁^(T) = (ẑ_T(13) − ẑ_T(1)) / 12

• and calling

z̄_T = (1/12) Σ_{j=1}^{12} ẑ_T(j) = β̂₀^(T) + (13/2) β̂₁^(T)

Page 41: Chapter 3 Prediction and model selection

Interpretation of the forecasts

• we have that

β̂₀^(T) = z̄_T − (13/2) β̂₁^(T)

• The seasonal coefficients are

Ŝ_j^(T) = ẑ_T(j) − β̂₀^(T) − β̂₁^(T) j

• and will be given by the deviations of the forecasts from the trend component.
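The recovery of the slope, intercept and seasonal coefficients from 13 initial forecasts can be sketched as follows. The forecast values are made up for illustration; the check exploits the fact that the 12 seasonal coefficients sum to zero by construction:

```python
import numpy as np

# hypothetical forecasts z_hat_T(1), ..., z_hat_T(13) of an airline-type model
zhat = np.array([10.2, 9.8, 11.0, 12.1, 12.5, 13.4,
                 14.8, 14.6, 12.9, 11.7, 10.9, 11.5, 11.4])

b1 = (zhat[12] - zhat[0]) / 12       # slope, since S_13 = S_1
zbar = zhat[:12].mean()
b0 = zbar - 6.5 * b1                 # intercept, since (1 + ... + 12)/12 = 13/2
S = [zhat[j - 1] - b0 - b1 * j for j in range(1, 13)]   # seasonal coefficients

print(round(b1, 4))                  # slope of the eventual linear trend
print(abs(sum(S)) < 1e-9)            # True: seasonal coefficients sum to zero
```

Any set of 13 forecasts yields seasonal coefficients summing to zero, because the identity Σ Ŝ_j = 12 z̄_T − 12 β̂₀ − 78 β̂₁ = 0 holds by construction of β̂₀.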

Page 42: Chapter 3 Prediction and model selection

Prediction confidence intervals

• Known parameter values. Let us write the process in MA(∞) form,

z_t = ψ(B) a_t

• then we can write

z_{T+k} = Σ_{j=0}^{∞} ψ_j a_{T+k−j}

• and taking expected values conditional on the data,

ẑ_T(k) = Σ_{j=0}^{∞} ψ_{k+j} a_{T−j}

Page 43: Chapter 3 Prediction and model selection

Prediction confidence intervals

• The forecast error is

e_T(k) = z_{T+k} − ẑ_T(k) = a_{T+k} + ψ₁ a_{T+k−1} + … + ψ_{k−1} a_{T+1}

• with variance

Var(e_T(k)) = σ² (1 + ψ₁² + … + ψ²_{k−1})

• This equation indicates that the uncertainty of the long-run forecast is different for stationary and nonstationary models.

Page 44: Chapter 3 Prediction and model selection

Prediction confidence intervals

• For a stationary model the series of variances converges, since ψ_k → 0 as k → ∞.

• For an AR(1) model, for instance,

Var(e_T(k)) → σ² / (1 − φ²)

• The long-run forecast goes to the mean, and the uncertainty is finite.

Page 45: Chapter 3 Prediction and model selection


Prediction confidence intervals

• When the model is nonstationary, the variance of the forecast grows without bounds. This means that we cannot make useful long run forecasts.

• If the distribution of the forecast error is known, we can compute confidence intervals for the forecast.

Page 46: Chapter 3 Prediction and model selection

Prediction confidence intervals

• Assuming normality, the 95% confidence interval for the random variable z_{T+k} is

ẑ_T(k) ± 1.96 σ (1 + ψ₁² + … + ψ²_{k−1})^{1/2}

• We may also need the covariances between forecast errors; for h > 0,

cov(e_T(i), e_T(i+h)) = σ² Σ_{j=0}^{i−1} ψ_j ψ_{h+j}
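A sketch of the interval half-width for an AR(1), where ψ_j = φ^j; the values of φ and σ are illustrative assumptions. It also shows the width converging to the stationary limit 1.96 σ / (1 − φ²)^{1/2}:

```python
import math

phi, sigma, K = 0.8, 1.0, 50   # illustrative values

def half_width(k):
    # 1.96 * sigma * (1 + psi_1^2 + ... + psi_{k-1}^2)^{1/2}, psi_j = phi^j
    s = sum(phi ** (2 * j) for j in range(k))
    return 1.96 * sigma * math.sqrt(s)

print(round(half_width(1), 3))                         # 1.96 at one step ahead
print(round(half_width(K), 3))                         # near the limit by k = 50
print(round(1.96 * sigma / math.sqrt(1 - phi ** 2), 3))  # stationary limit
```

For a nonstationary model the sum of squared ψ weights would diverge instead, so the bands widen without bound, matching the remark on the next slide.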

Page 47: Chapter 3 Prediction and model selection

Prediction confidence intervals

• Unknown parameter values. It can be shown that the uncertainty introduced in the forecast by this additional source is small for moderate sample sizes, and can be ignored in practice.

• Suppose an AR(1) model, with forecast

ẑ_T(1) = φ̂ z_T

Page 48: Chapter 3 Prediction and model selection

Prediction confidence intervals

• The theoretical forecast error is e_T(1) = a_{T+1}; it is related to the observed forecast error

e*_T(1) = z_{T+1} − φ̂ z_T

• by

e*_T(1) = e_T(1) + (φ − φ̂) z_T

• assuming that z_T is fixed, and using that

Var(φ̂) ≈ σ² / Σ_{t=1}^{n} z_t²

Page 49: Chapter 3 Prediction and model selection

Prediction confidence intervals

• we have that

Var(e*_T(1)) = σ² (1 + z_T² / Σ_{t=1}^{n} z_t²)

• This equation indicates that the forecast error has two components:

– the uncertainty due to the random behavior of the observation;

– the parameter uncertainty, because the parameters are estimated from the sample (of order 1/n, so it can be ignored for large n).

Page 50: Chapter 3 Prediction and model selection

Chapter 3. Prediction and model selection.

3.5. Forecast updating.

 

Page 51: Chapter 3 Prediction and model selection

Forecast updating

• Computing updated forecasts. When a new observation z_{T+1} becomes available, note that

ẑ_{T+1}(k) = ψ_k a_{T+1} + ψ_{k+1} a_T + ψ_{k+2} a_{T−1} + …
ẑ_T(k+1) = ψ_{k+1} a_T + ψ_{k+2} a_{T−1} + …

• which leads to

ẑ_{T+1}(k) = ẑ_T(k+1) + ψ_k a_{T+1}

Page 52: Chapter 3 Prediction and model selection

Forecast updating

• where a_{T+1} = z_{T+1} − ẑ_T(1) is the one-step-ahead forecast error,

• and so the forecasts are updated by

ẑ_{T+1}(k) = ẑ_T(k+1) + ψ_k a_{T+1}

– Note that the forecasts are updated by adding to the previous forecast a fraction of the last observed forecast error; the ψ weights are the coefficients for forecast updating.
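The updating rule can be verified on an AR(1), where ψ_k = φ^k and every quantity has a closed form: ẑ_T(k) = φ^k z_T and a_{T+1} = z_{T+1} − φ z_T. The numbers are illustrative assumptions:

```python
phi, zT, z_new = 0.7, 1.5, 0.9   # z_new plays the role of z_{T+1}

a_new = z_new - phi * zT         # one-step-ahead forecast error a_{T+1}
for k in range(1, 6):
    updated = phi ** (k + 1) * zT + phi ** k * a_new  # z_hat_T(k+1) + psi_k a_{T+1}
    direct = phi ** k * z_new                          # forecast from origin T+1
    assert abs(updated - direct) < 1e-12
print("updating identity holds")
```

The identity holds exactly here because φ^k z_{T+1} = φ^{k+1} z_T + φ^k (z_{T+1} − φ z_T) is an algebraic rearrangement.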

Page 53: Chapter 3 Prediction and model selection

Forecast updating

• Testing model stability (Box and Tiao, 1976). If the model is correct, the statistic

Q(h) = Σ_{j=1}^{h} â²_{T+j} / σ²

follows a χ²_h distribution.

• Because we need an estimate of the variance, in practice we use

Q*(h) = (Σ_{j=1}^{h} â²_{T+j} / h) / σ̂²  ~  F_{h, n−p−q}
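A sketch of the statistic with estimated variance. The post-sample one-step errors and σ̂² below are made-up numbers; in practice the computed Q*(h) would be compared with a critical value from the F(h, n−p−q) distribution:

```python
# hypothetical post-sample one-step-ahead forecast errors a_hat_{T+1..T+h}
a_new = [0.9, -1.4, 0.3, 1.1, -0.2]
sigma2_hat = 1.0                       # assumed in-sample variance estimate

h = len(a_new)
Q_star = (sum(e ** 2 for e in a_new) / h) / sigma2_hat
print(round(Q_star, 3))                # compare with an F(h, n-p-q) critical value
```

A value of Q*(h) well above the critical value signals that the post-sample errors are too large for the fitted model, i.e., model instability.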

Page 54: Chapter 3 Prediction and model selection


Chapter 3. Prediction and model selection.

3.6. Model selection criteria.

 

Page 55: Chapter 3 Prediction and model selection

Model selection criteria.

• The FPE and AIC criteria. Suppose we want to select the order of an AR(p) model in such a way that the out-of-sample one-step-ahead prediction mean square error is minimized. This MSE is given by

MSE(z_{T+1}) = E[z_{T+1} − φ̂' Z_p]²

• with

Z_p = (z_T, …, z_{T−p+1})'

Page 56: Chapter 3 Prediction and model selection

Model selection criteria.

• The forecast error can be decomposed as

e_{T+1} = (z_{T+1} − φ' Z_p) + (φ − φ̂)' Z_p

• and so

MSE(z_{T+1}) = σ² + E[(φ̂ − φ)' Z_p Z_p' (φ̂ − φ)]

• which decomposes the forecast error as the sum of the variable uncertainty and the parameter uncertainty.

Page 57: Chapter 3 Prediction and model selection

Model selection criteria.

• This expectation can be approximated by

MSE(z_{T+1}) ≈ σ² (1 + p/n)

• An unbiased estimate of σ² is σ̂² n/(n − p). Inserting it, we obtain an estimate of the out-of-sample forecast error; minimizing this value implies that the order p must be chosen by minimizing the FPE criterion.

Page 58: Chapter 3 Prediction and model selection

Model selection criteria.

• The Final Prediction Error (FPE) combines fit with parsimony, due to the penalty introduced by the term (n+p)/(n−p):

FPE = σ̂² (n+p)/(n−p)

Page 59: Chapter 3 Prediction and model selection

Model selection criteria.

• An equivalent form of this criterion is

log FPE = log σ̂² + log(1 + p/n) − log(1 − p/n) ≈ log σ̂² + 2p/n

• Multiplying by n, we obtain the AIC criterion

AIC = n log σ̂² + 2p

– It tends to overestimate the number of parameters.

Page 60: Chapter 3 Prediction and model selection

Model selection criteria.

• Bayesian Information Criterion (BIC):

BIC = n log σ̂² + p log(n)

• In this criterion the penalty is greater than in AIC, so BIC tends to select simpler models.

• In terms of the deviance, for an ARMA(p,q) model,

AIC = deviance + 2(p + q)
BIC = deviance + (p + q) log(n)
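The three criteria can be compared in a short experiment: fit AR(p) by least squares for several p and pick the minimizer. Everything here (the simulated AR(2) data, sample size, and candidate orders) is an illustrative assumption, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
z = np.zeros(n)
for t in range(2, n):                    # simulate an AR(2) series
    z[t] = 0.6 * z[t - 1] - 0.3 * z[t - 2] + rng.standard_normal()

def fit_ar(p):
    # OLS fit of an AR(p); returns the residual variance estimate sigma_hat^2
    Y = z[p:]
    X = np.column_stack([z[p - j:n - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return np.mean(resid ** 2)

crit = {}
for p in range(1, 6):
    s2 = fit_ar(p)
    crit[p] = {"FPE": s2 * (n + p) / (n - p),
               "AIC": n * np.log(s2) + 2 * p,
               "BIC": n * np.log(s2) + p * np.log(n)}

print("order chosen by BIC:", min(crit, key=lambda p: crit[p]["BIC"]))
```

Note the penalty gap: BIC − AIC = p(log n − 2), positive whenever n > e², which is why BIC leans toward simpler models.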