Energy Infrastructure Planning: Forecasting

Carlos Abad

November 7, 2014



Who am I?

PhD student in the IEOR department

Advisors: Prof. Vijay Modi and Prof. Garud Iyengar

Research:

Robust control algorithms for solar micro-grids

Control, signal detection, and forecasting methods for managing DR programs


References

Hyndman, R. J. & Athanasopoulos, G. (2013) Forecasting: Principles and Practice.

www.otexts.org/fpp/

R package fpp

Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Time series data

A time series consists of a sequence of observations collected over time.

We will assume the time periods are equally spaced.

Time series examples:

Hourly electricity demand
Daily maximum temperature
Weekly wind generation
Monthly rainfall

Forecasting is estimating how the sequence of observations will continue into the future.


Time series in R

Main package used in this course:

> library(fpp)

This loads:

some data for use in examples and exercises
the forecast package (forecasting functions)
the tseries package (a few time series functions)
the fma package (lots of time series data)
the expsmooth package (more time series data)
the lmtest package (some regression functions)


Time series in R

Other packages:

> library(xts)

Order time series by timestamp
Nicer plots
Easier time aggregation


Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Notation

y_t: observed value at time t

y_{T+h|T}: forecast for time T + h, made at time T using historical information up to time T


Some simple forecasting methods

Average method

Forecast of all future values is equal to the mean of the historical data {y_1, ..., y_T}.
Forecasts: y_{T+h|T} = ȳ = (y_1 + ... + y_T)/T

Naive method (for time series only)

Forecasts equal the last observed value.
Forecasts: y_{T+h|T} = y_T.
A consequence of the efficient market hypothesis.

Seasonal naive method

Forecasts equal the last value from the same season.
Forecasts: y_{T+h|T} = y_{T+h−km}, where m = seasonal period and k = ⌊(h − 1)/m⌋ + 1.
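The slides call these methods through R's fpp package; as a language-neutral illustration (the function names below are my own, not from the slides), the three formulas translate directly to Python:

```python
def mean_forecast(y, h):
    """Average method: every forecast equals the historical mean of y_1..y_T."""
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    """Naive method: every forecast equals the last observed value y_T."""
    return [y[-1]] * h

def seasonal_naive_forecast(y, h, m):
    """Seasonal naive: y_{T+h|T} = y_{T+h-km} with k = floor((h-1)/m) + 1,
    i.e. the last observed value from the same season (m = seasonal period)."""
    T = len(y)
    forecasts = []
    for step in range(1, h + 1):
        k = (step - 1) // m + 1
        forecasts.append(y[T + step - k * m - 1])  # 0-based index of y_{T+step-km}
    return forecasts
```

For quarterly data (m = 4), forecasting five steps ahead from two observed years repeats the final observed year and then starts it again.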


Drift method

Forecasts equal the last value plus the average historical change.

Forecasts:

y_{T+h|T} = y_T + (h/(T − 1)) Σ_{t=2}^{T} (y_t − y_{t−1})
          = y_T + (h/(T − 1)) (y_T − y_1).

Equivalent to extrapolating a line drawn between the first and last observations.
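Because the sum of changes telescopes to y_T − y_1, the drift forecast needs only the first and last observations. A minimal Python sketch (my own helper, not from the slides):

```python
def drift_forecast(y, h):
    """Drift method: y_{T+h|T} = y_T + h * (y_T - y_1) / (T - 1).
    The telescoping sum of changes means only the endpoints matter."""
    T = len(y)
    slope = (y[-1] - y[0]) / (T - 1)  # average change per period
    return [y[-1] + step * slope for step in range(1, h + 1)]
```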


Some simple forecasting methods

Mean: meanf(x, h=20)

Naive: naive(x, h=20) or rwf(x, h=20)

Seasonal naive: snaive(x, h=20)

Drift: rwf(x, drift=TRUE, h=20)


Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Forecasting residuals

Residuals in forecasting: the difference between an observed value and its forecast based on all previous observations: e_t = y_t − y_{t|t−1}.

Assumptions:

1. {e_t} are uncorrelated. If they aren't, there is information left in the residuals that should be used in computing forecasts.
2. {e_t} have mean zero. If they don't, the forecasts are biased.

Useful properties (for prediction intervals):

3. {e_t} have constant variance.
4. {e_t} are normally distributed.
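As an illustration of e_t = y_t − y_{t|t−1} (a hypothetical helper, here using the naive method so that each forecast sees only past data):

```python
def naive_forecast(y, h):
    """Naive method: forecast equals the last observed value."""
    return [y[-1]] * h

def one_step_residuals(y, forecast_fn=naive_forecast):
    """e_t = y_t - yhat_{t|t-1}: each forecast is made from y_1..y_{t-1} only."""
    return [y[t] - forecast_fn(y[:t], 1)[0] for t in range(1, len(y))]

# For the naive method the residuals are the first differences, so a
# clearly nonzero residual mean suggests biased forecasts (assumption 2).
res = one_step_residuals([2, 4, 3, 5])
```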


Measures of forecast accuracy

Let y_t denote the t-th observation and y_{t|t−1} its forecast based on all previous data, where t = 1, ..., T. Then the following measures are useful.

MAE = T^{−1} Σ_{t=1}^{T} |y_t − y_{t|t−1}|

MSE = T^{−1} Σ_{t=1}^{T} (y_t − y_{t|t−1})^2

RMSE = √( T^{−1} Σ_{t=1}^{T} (y_t − y_{t|t−1})^2 )

MAPE = 100 T^{−1} Σ_{t=1}^{T} |y_t − y_{t|t−1}| / |y_t|

MAE, MSE, and RMSE are all scale dependent.

MAPE is scale independent, but is only sensible if y_t ≫ 0 for all t and y has a natural zero.
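These four measures are simple to compute directly. A Python sketch of the formulas above (the dictionary interface is my own choice, not the slides'):

```python
import math

def accuracy_measures(actual, forecast):
    """MAE, MSE, RMSE (scale dependent) and MAPE (a percentage) as defined above."""
    errors = [a - f for a, f in zip(actual, forecast)]
    T = len(errors)
    mae = sum(abs(e) for e in errors) / T
    mse = sum(e * e for e in errors) / T
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAPE assumes every y_t is well away from zero
        "MAPE": 100 / T * sum(abs(e / a) for e, a in zip(errors, actual)),
    }
```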


Measures of forecast accuracy

Mean Absolute Scaled Error:

MASE = T^{−1} Σ_{t=1}^{T} |y_t − y_{t|t−1}| / Q

where Q is a stable measure of the scale of the time series {y_t}.

For non-seasonal time series,

Q = (T − 1)^{−1} Σ_{t=2}^{T} |y_t − y_{t−1}|

works well. Then MASE is equivalent to MAE relative to a naive method.

For seasonal time series,

Q = (T − m)^{−1} Σ_{t=m+1}^{T} |y_t − y_{t−m}|

works well. Then MASE is equivalent to MAE relative to a seasonal naive method.
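A hedged Python sketch of MASE (here computed on a hold-out set with Q taken from the training data; m = 1 gives the non-seasonal scaling, m = seasonal period gives the seasonal one):

```python
def mase(actual, forecast, train, m=1):
    """Mean Absolute Scaled Error: MAE divided by Q, the in-sample MAE of
    the (seasonal) naive method on the training series. m=1 is non-seasonal."""
    q = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    abs_errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(abs_errors) / len(abs_errors) / q
```

Values below 1 mean the forecasts beat the naive benchmark on average.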

Training and test sets

Available data are split chronologically:

Training set (e.g., first 80%) | Test set (e.g., final 20%)

The test set must not be used for any aspect of model development or calculation of forecasts.

Forecast accuracy is based only on the test set.
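The split is chronological rather than random, since future values must not leak into training. A minimal sketch (the 80/20 fractions are just the example values from the slide):

```python
def train_test_split(y, train_frac=0.8):
    """Chronological split: the final block of observations becomes the test
    set and is used only to measure forecast accuracy, never for fitting."""
    cut = int(len(y) * train_frac)
    return y[:cut], y[cut:]
```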

Forecasting using R Evaluating forecast accuracy 24

Training and test sets

Available data

Training set Test set(e.g., 80%) (e.g., 20%)

The test set must not be used for any aspect ofmodel development or calculation of forecasts.

Forecast accuracy is based only on the test set.

Forecasting using R Evaluating forecast accuracy 24

Beware of over-fitting

A model which fits the data well does not necessarily forecast well.

A perfect fit can always be obtained by using a model with enough parameters. (Compare R².)

Over-fitting a model to data is as bad as failing to identify the systematic pattern in the data.

These problems can be overcome by measuring true out-of-sample forecast accuracy: the data are divided into a "training" set and a "test" set. The training set is used to estimate parameters, forecasts are made for the test set, and accuracy measures are computed for the test-set errors only.


Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Time series graphics

Time plots (R command: plot or plot.ts)

Seasonal plots (R command: seasonplot)

Seasonal subseries plots (R command: monthplot)

Lag plots (R command: lag.plot)

ACF plots (R command: Acf)


Seasonal plots

Data are plotted against the individual "seasons" in which the data were observed. (In this case a "season" is a month.)

Something like a time plot, except that the data from each season are overlapped.

Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.

In R: seasonplot


Seasonal subseries plots

Data for each season are collected together in a time plot as separate time series.

Enables the underlying seasonal pattern to be seen clearly, and changes in seasonality over time to be visualized.

In R: monthplot


Time series patterns

A trend pattern exists when there is a long-term increase or decrease in the data.

A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).

A cyclic pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).


Seasonal or cyclic?

Differences between seasonal and cyclic patterns:

A seasonal pattern has constant length; a cyclic pattern has variable length.

The average length of a cycle is longer than the length of the seasonal pattern.

The magnitude of a cycle is more variable than the magnitude of the seasonal pattern.

The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.


Time series patterns

[Figure: Australian electricity production, in GWh, plotted against year (1980–1995).]

Time series patterns

[Figure: Australian clay brick production, in million units, plotted against year (1960–1990).]

Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

A stationary series is:

roughly horizontal
of constant variance
without patterns predictable in the long term


Stationary?

[Figure: the Dow Jones index plotted against day (about days 0–300, index values near 3600–3900).]

Stationary?

[Figure: the daily change in the Dow Jones index plotted against day (about days 0–300).]

Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

Transformations help to stabilize the variance.

For ARIMA modelling, we also need to stabilize the mean.
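One standard way to stabilize the mean is differencing, which anticipates the ARIMA discussion; the helper below is illustrative, not from the slides:

```python
def difference(y, lag=1):
    """Differencing y_t - y_{t-lag}: lag=1 removes a trend in the mean;
    lag=m removes a stable seasonal pattern of period m."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]
```

A series with a growing trend, e.g. [1, 3, 6, 10], becomes the roughly level [2, 3, 4] after first differencing.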

Non-stationarity in the mean

Identifying non-stationary series:

Use a time plot.

The ACF of stationary data drops to zero relatively quickly.

The ACF of non-stationary data decreases slowly.

For non-stationary data, the value of r_1 is often large and positive.

Autocorrelation

Covariance and correlation: measure the extent of the linear relationship between two variables (y and x).

Autocovariance and autocorrelation: measure the linear relationship between lagged values of a time series y.

We measure the relationship between y_t and y_{t−1}, y_t and y_{t−2}, y_t and y_{t−3}, and so on.


Autocorrelation

We denote the sample autocovariance at lag k by c_k and the sample autocorrelation at lag k by r_k. Then define

c_k = (1/T) Σ_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ)   and   r_k = c_k / c_0

r_1 indicates how successive values of y relate to each other.

r_2 indicates how y values two periods apart relate to each other.

r_k is almost the same as the sample correlation between y_t and y_{t−k}.
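The definitions of c_k and r_k translate directly into code; a hedged Python sketch (my own helper, not the slides' Acf):

```python
def autocorrelation(y, k):
    """Sample autocorrelation r_k = c_k / c_0, where
    c_k = (1/T) * sum_{t=k+1..T} (y_t - ybar) * (y_{t-k} - ybar)."""
    T = len(y)
    ybar = sum(y) / T
    c0 = sum((v - ybar) ** 2 for v in y) / T
    ck = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T)) / T
    return ck / c0
```

A strictly alternating series has r_1 close to −1, and r_0 is always exactly 1.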


Recognizing seasonality in a time series

If there is seasonality, the ACF at the seasonal lag (e.g., 12 for monthly data) will be large and positive.

For seasonal monthly data, a large ACF value will be seen at lag 12 and possibly also at lags 24, 36, ...

For seasonal quarterly data, a large ACF value will be seen at lag 4 and possibly also at lags 8, 12, ...

Forecasting using R Autocorrelation 31

Recognizing seasonality in a time series

If there is seasonality, the ACF at the seasonal lag(e.g., 12 for monthly data) will be large andpositive.

For seasonal monthly data, a large ACF valuewill be seen at lag 12 and possibly also at lags24, 36, . . .

For seasonal quarterly data, a large ACF valuewill be seen at lag 4 and possibly also at lags 8,12, . . .

Forecasting using R Autocorrelation 31

Example: White noise

[Figure: time plot of a simulated white noise series, 50 observations, values roughly between −3 and 2.]

White noise data is uncorrelated across time with zero mean and constant variance. (Technically, we require independence as well.)

Think of white noise as completely uninteresting with no predictable patterns.

Example: White noise

r_1 = 0.013
r_2 = −0.163
r_3 = 0.163
r_4 = −0.259
r_5 = −0.198
r_6 = 0.064
r_7 = −0.139
r_8 = −0.032
r_9 = 0.199
r_10 = −0.240

Sample autocorrelations for the white noise series. For uncorrelated data, we would expect each autocorrelation to be close to zero.

[Figure: ACF plot of the white noise series for lags 1 to 15.]

Sampling distribution of autocorrelations

The sampling distribution of r_k for white noise data is asymptotically N(0, 1/T).

95% of all r_k for white noise must lie within ±1.96/√T.
If this is not the case, the series is probably not white noise.
It is common to plot lines at ±1.96/√T when plotting the ACF. These are the critical values.

Autocorrelation

[Figure: ACF plot of the white noise series, lags 1 to 15, with critical-value lines at ±0.28.]

Example: T = 50, so the critical values are at ±1.96/√50 = ±0.28. All autocorrelation coefficients lie within these limits, confirming that the data are white noise. (More precisely, the data cannot be distinguished from white noise.)
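The T = 50 example can be reproduced in R. A minimal sketch, assuming a freshly simulated white noise series (the seed and lag count are arbitrary choices): it computes the ±1.96/√T critical values and counts how many sample autocorrelations fall inside them.

```r
# Simulate white noise with n = 50 and compare the sample ACF against
# the +/- 1.96/sqrt(n) critical values from the sampling distribution.
set.seed(123)
n <- 50
x <- rnorm(n)                                     # white noise
r <- acf(x, lag.max = 15, plot = FALSE)$acf[-1]   # drop the lag-0 value of 1
crit <- 1.96 / sqrt(n)                            # about 0.28 for n = 50
inside <- abs(r) < crit
mean(inside)   # for true white noise, most lags lie inside the bounds
```

By construction about 5% of lags are expected to fall outside the bounds even for genuine white noise, so an isolated exceedance is not evidence against whiteness.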

ACF of residuals

We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren't, then there is information left in the residuals that should be used in computing forecasts.

So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method. We expect these to look like white noise.

Dow-Jones naive forecasts revisited

ŷ_{t|t−1} = y_{t−1}
e_t = y_t − y_{t−1}
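The diagnostic above can be sketched in R. This assumes a simulated random walk standing in for the Dow-Jones series (not the actual data): for naive forecasts the residuals are e_t = y_t − y_{t−1}, and if the method has captured the dynamics their ACF should look like white noise.

```r
# Residual diagnostic for the naive method: residuals are first differences.
set.seed(42)
y <- cumsum(rnorm(300))   # simulated random walk (stand-in for Dow-Jones)
e <- diff(y)              # naive-forecast residuals e_t = y_t - y_{t-1}
r <- acf(e, lag.max = 10, plot = FALSE)$acf[-1]
crit <- 1.96 / sqrt(length(e))
mean(abs(r) < crit)       # most residual autocorrelations should be inside
```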

Non-stationarity in the mean

Identifying non-stationary series:

Use the time plot.
The ACF of stationary data drops to zero relatively quickly.
The ACF of non-stationary data decreases slowly.
For non-stationary data, the value of r_1 is often large and positive.

Differencing

Differencing helps to stabilize the mean.

The differenced series is the change between each observation in the original series: y′_t = y_t − y_{t−1}.

The differenced series will have only T − 1 values, since it is not possible to calculate a difference y′_1 for the first observation.

Second-order differencing

Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:

y″_t = y′_t − y′_{t−1}
     = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2})
     = y_t − 2y_{t−1} + y_{t−2}.

y″_t will have T − 2 values.

In practice, it is almost never necessary to go beyond second-order differences.
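In R both operations are handled by diff(). A small worked example on a toy series of T = 6 values (the numbers are arbitrary), confirming the T − 1 and T − 2 lengths noted above:

```r
# First and second differences with diff(); note the lost observations.
y  <- c(3, 5, 9, 8, 12, 15)       # toy series, T = 6
d1 <- diff(y)                     # y'_t  = y_t - y_{t-1}        (T - 1 values)
d2 <- diff(y, differences = 2)    # y''_t = y_t - 2y_{t-1} + y_{t-2} (T - 2 values)
d1   # 2  4 -1  4  3
d2   # 2 -5  5 -1
```

Checking one term by hand: y″_3 = 9 − 2·5 + 3 = 2, matching the first entry of d2.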

Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year:

y′_t = y_t − y_{t−m}

where m is the number of seasons per year.

For monthly data, m = 12.
For quarterly data, m = 4.

Seasonal differencing

When both seasonal and first differences are applied:

It makes no difference which is done first; the result will be the same.

If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.

It is important that, if differencing is used, the differences are interpretable.
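The claim that the order of differencing does not matter is easy to verify numerically. A minimal sketch, assuming four "years" of simulated monthly data (m = 12; the series itself is arbitrary noise):

```r
# Seasonal and first differencing commute: both orders give
# y_t - y_{t-1} - y_{t-12} + y_{t-13}.
set.seed(7)
y <- rnorm(48)                    # four years of "monthly" data, m = 12
a <- diff(diff(y, lag = 12))      # seasonal difference, then first difference
b <- diff(diff(y), lag = 12)      # first difference, then seasonal difference
all.equal(a, b)
```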

Interpretation of differencing

First differences are the change between one observation and the next.
Seasonal differences are the change from one year to the next.

But taking lag-3 differences of yearly data, for example, results in a model which cannot be sensibly interpreted.

Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Autoregressive models

Autoregressive (AR) models:

y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + · · · + φ_p y_{t−p} + e_t,

where e_t is white noise. This is a multiple regression with lagged values of y_t as predictors.

[Figure: time plots of simulated AR(1) and AR(2) series, 100 observations each.]

AR(1) model

y_t = c + φ_1 y_{t−1} + e_t

When φ_1 = 0, y_t is equivalent to white noise.
When φ_1 = 1 and c = 0, y_t is equivalent to a random walk.
When φ_1 = 1 and c ≠ 0, y_t is equivalent to a random walk with drift.
When φ_1 < 0, y_t tends to oscillate between positive and negative values.
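These special cases can be simulated in R. Note one practical point: arima.sim() requires a stationary model (|φ_1| < 1), so the random-walk cases must be built directly from cumulative sums of white noise. The seed, sample size, and coefficient values are illustrative choices.

```r
# The four AR(1) regimes from the slide, simulated from the same shocks.
set.seed(1)
e   <- rnorm(200)
wn  <- e                        # phi_1 = 0: white noise
rw  <- cumsum(e)                # phi_1 = 1, c = 0: random walk
rwd <- cumsum(0.5 + e)          # phi_1 = 1, c = 0.5: random walk with drift
osc <- as.numeric(arima.sim(list(ar = -0.8), n = 200))  # phi_1 < 0: oscillates
```

The oscillating case shows up in the ACF: for φ_1 < 0, r_1 is negative.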

Moving Average (MA) models

Moving Average (MA) models:

y_t = c + e_t + θ_1 e_{t−1} + θ_2 e_{t−2} + · · · + θ_q e_{t−q},

where e_t is white noise. This is a multiple regression with past errors as predictors. Don't confuse this with moving average smoothing!

[Figure: time plots of simulated MA(1) and MA(2) series, 100 observations each.]
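MA series are also simulated with arima.sim(). A sketch with illustrative θ values; it also uses ARMAacf() to show the defining property of an MA(1) process, that its theoretical autocorrelation is exactly zero beyond lag 1.

```r
# Simulated MA(1) and MA(2) series, plus the theoretical MA(1) ACF.
set.seed(2)
ma1 <- arima.sim(list(ma = 0.8), n = 100)
ma2 <- arima.sim(list(ma = c(-1, 0.8)), n = 100)
th  <- ARMAacf(ma = 0.8, lag.max = 3)  # theoretical ACF: zero beyond lag 1
```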

ARIMA models

Autoregressive Moving Average models:

y_t = c + φ_1 y_{t−1} + · · · + φ_p y_{t−p} + θ_1 e_{t−1} + · · · + θ_q e_{t−q} + e_t.

Predictors include both lagged values of y_t and lagged errors.
ARMA models can be used for a huge range of stationary time series.
They model the short-term dynamics.
An ARMA model applied to differenced data is an ARIMA model.

ARIMA models

Autoregressive Integrated Moving Average models: ARIMA(p, d, q)

AR: p = order of the autoregressive part
I:  d = degree of first differencing involved
MA: q = order of the moving average part

White noise model: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0) with no constant
Random walk with drift: ARIMA(0,1,0) with constant
AR(p): ARIMA(p,0,0)
MA(q): ARIMA(0,0,q)
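The special cases map directly onto the order argument of stats::arima(). A sketch on simulated data (the AR coefficient 0.5 and sample sizes are illustrative):

```r
# ARIMA(p,d,q) special cases expressed with stats::arima().
set.seed(3)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 300))
fit_ar1 <- arima(y, order = c(1, 0, 0))   # AR(1)  = ARIMA(1,0,0)
fit_ma1 <- arima(y, order = c(0, 0, 1))   # MA(1)  = ARIMA(0,0,1)
rw <- cumsum(rnorm(300))
fit_rw  <- arima(rw, order = c(0, 1, 0))  # random walk = ARIMA(0,1,0), no constant
coef(fit_ar1)[["ar1"]]                    # should be near the true value 0.5
```

Note that arima() drops the constant automatically once d > 0, matching the "with no constant" random walk above.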

Understanding ARIMA models

If c = 0 and d = 0, the long-term forecasts will go to zero.
If c = 0 and d = 1, the long-term forecasts will go to a non-zero constant.
If c = 0 and d = 2, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 0, the long-term forecasts will go to the mean of the data.
If c ≠ 0 and d = 1, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 2, the long-term forecasts will follow a quadratic trend.

ACF and PACF plots

Recall that the k-th autocorrelation r_k measures the linear relationship between y_t and y_{t−k}.

Now, if y_t and y_{t−1} are correlated, then y_t and y_{t−2} must be correlated.

What is the correlation between y_t and y_{t−2} after removing the correlation between y_t and y_{t−1}?

α_k: the k-th partial autocorrelation.
α_k: the linear relationship between y_t and y_{t−k} after removing the effects of lags 1, 2, . . . , k − 1.
α_k is the estimate of φ_k in the autoregression model

y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + . . . + φ_k y_{t−k} + e_t.
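The "estimate of φ_k in an AR(k) regression" characterization can be checked against R's pacf(). A sketch on a long simulated AR(2) series (coefficients 0.5 and 0.3 are illustrative); the two estimators differ slightly in finite samples, so only approximate agreement is expected.

```r
# alpha_2 via an explicit AR(2) regression vs. the built-in pacf().
set.seed(4)
y <- as.numeric(arima.sim(list(ar = c(0.5, 0.3)), n = 2000))
n <- length(y)
fit <- lm(y[3:n] ~ y[2:(n - 1)] + y[1:(n - 2)])   # regress y_t on y_{t-1}, y_{t-2}
alpha2_reg  <- coef(fit)[[3]]                      # estimate of phi_2
alpha2_pacf <- pacf(y, lag.max = 2, plot = FALSE)$acf[2]
```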

ACF and PACF plots

If the data follow an ARIMA(p, d, 0) or an ARIMA(0, d, q) model, ACF and PACF plots can help to determine the value of p or q.

If both p and q are positive, ACF and PACF plots are not helpful.

The data may follow an ARIMA(p, d, 0) model if:
the ACF is exponentially decaying or sinusoidal;
there is a significant spike at lag p in the PACF, but none beyond lag p.

The data may follow an ARIMA(0, d, q) model if:
the PACF is exponentially decaying or sinusoidal;
there is a significant spike at lag q in the ACF, but none beyond lag q.
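The AR identification pattern can be seen on simulated data. A sketch assuming an AR(2) process with illustrative coefficients: the PACF should show significant spikes at lags 1 and 2 only, while the ACF decays gradually.

```r
# PACF cutoff at lag p = 2 for a simulated AR(2) series.
set.seed(5)
y <- as.numeric(arima.sim(list(ar = c(1.2, -0.5)), n = 2000))
p <- pacf(y, lag.max = 10, plot = FALSE)$acf
crit <- 1.96 / sqrt(length(y))
which(abs(p) > crit)   # typically just lags 1 and 2 for this model
```

Because each partial autocorrelation is itself a noisy estimate, an occasional small spike beyond lag 2 can appear by chance; the pattern, not any single bar, is what identifies p.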

Akaike's Information Criterion

AIC = −2 log(Likelihood) + 2p

where p is the number of estimated parameters in the model.

Minimizing the AIC gives the best model for prediction.

AIC corrected (for small-sample bias):

AICc = AIC + 2(p + 1)(p + 2) / (n − p)

Schwarz' Bayesian IC:

BIC = AIC + p(log(n) − 2)

Akaike's Information Criterion

The value of AIC/AICc/BIC is given in the R output.

AIC does not have much meaning by itself. It is only useful in comparison to the AIC value for another model fitted to the same data set.

Consider several models with AIC values close to the minimum.

A difference in AIC values of 2 or less is not regarded as substantial, and you may choose the simpler but non-optimal model.

AIC can be negative.
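The "only useful in comparison" point can be illustrated in R. A sketch on simulated data (the true model, an AR(1) with φ_1 = 0.8, is an illustrative choice): the correctly specified model should achieve the lower AIC when both are fitted to the same series.

```r
# Comparing AIC across two models fitted to the same data.
set.seed(6)
y <- as.numeric(arima.sim(list(ar = 0.8), n = 500))
aic_ar1 <- AIC(arima(y, order = c(1, 0, 0)))   # correctly specified
aic_ma1 <- AIC(arima(y, order = c(0, 0, 1)))   # deliberately mis-specified
aic_ar1 < aic_ma1   # the AR(1) fit should win
```

Either AIC value in isolation says nothing; only the difference between them is informative.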

Backshift notation

A very useful notational device is the backward shift operator, B, which is used as follows:

B y_t = y_{t-1}.

In other words, B, operating on y_t, has the effect of shifting the data back one period. Two applications of B to y_t shift the data back two periods:

B(B y_t) = B^2 y_t = y_{t-2}.

For monthly data, if we wish to shift attention to "the same month last year," then B^12 is used, and the notation is B^12 y_t = y_{t-12}.

Forecasting using R Backshift notation 3

Backshift notation

First difference: 1 - B.

Double difference: (1 - B)^2.

dth-order difference: (1 - B)^d y_t.

Seasonal difference: 1 - B^m.

Seasonal difference followed by a first difference: (1 - B)(1 - B^m).

Multiply terms together to see the combined effect:

(1 - B)(1 - B^m) y_t = (1 - B - B^m + B^{m+1}) y_t = y_t - y_{t-1} - y_{t-m} + y_{t-m-1}.

Forecasting using R Backshift notation 4
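The combined-effect identity above can be checked numerically. The following is a small illustrative sketch (my own, not the lecture's R code) verifying that applying a seasonal difference and then a first difference gives exactly y_t - y_{t-1} - y_{t-m} + y_{t-m-1}:

```python
# Sketch: (1 - B)(1 - B^m) applied step by step vs. the expanded form.

def first_diff(y):
    """Apply (1 - B): y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def seasonal_diff(y, m):
    """Apply (1 - B^m): y_t - y_{t-m}."""
    return [y[t] - y[t - m] for t in range(m, len(y))]

# Quarterly-style example series (m = 4)
y = [10, 12, 14, 16, 11, 13, 15, 17, 12, 14, 16, 18]
m = 4

combined = first_diff(seasonal_diff(y, m))

# Direct expansion: y_t - y_{t-1} - y_{t-m} + y_{t-m-1}
direct = [y[t] - y[t - 1] - y[t - m] + y[t - m - 1]
          for t in range(m + 1, len(y))]

print(combined == direct)  # True: the two computations agree
```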

Backshift notation for ARIMA

ARMA model:

y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + e_t + θ_1 e_{t-1} + ... + θ_q e_{t-q}
    = c + φ_1 B y_t + ... + φ_p B^p y_t + e_t + θ_1 B e_t + ... + θ_q B^q e_t

or, more compactly,

φ(B) y_t = c + θ(B) e_t,

where φ(B) = 1 - φ_1 B - ... - φ_p B^p
and θ(B) = 1 + θ_1 B + ... + θ_q B^q.

ARIMA(1,1,1) model:

(1 - φ_1 B)(1 - B) y_t = c + (1 + θ_1 B) e_t,

where (1 - φ_1 B) is the AR(1) factor, (1 - B) is the first difference, and (1 + θ_1 B) is the MA(1) factor.

Forecasting using R Backshift notation 5
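Multiplying out the operator polynomial in B turns the factored ARIMA form back into an explicit difference equation. A minimal sketch (mine, with an assumed value of φ for illustration):

```python
# Sketch: expand the ARIMA(1,1,1) left-hand operator (1 - phi*B)(1 - B)
# by polynomial multiplication in B.

def polymul(a, b):
    """Multiply two polynomials in B given as coefficient lists
    [c0, c1, ...] meaning c0 + c1*B + c2*B^2 + ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

phi = 0.5               # illustrative AR(1) coefficient (assumption)
ar = [1.0, -phi]        # (1 - phi*B)
diff = [1.0, -1.0]      # (1 - B), the first difference

lhs = polymul(ar, diff)
print(lhs)  # [1.0, -1.5, 0.5]
# i.e. y_t - 1.5*y_{t-1} + 0.5*y_{t-2} = c + e_t + theta*e_{t-1}
```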

Seasonal ARIMA models

ARIMA (p,d,q)(P,D,Q)_m

where (p,d,q) is the non-seasonal part of the model, (P,D,Q)_m is the seasonal part, and m = number of periods per season.

Forecasting using R Seasonal ARIMA models 7

Seasonal ARIMA models

E.g., ARIMA(1,1,1)(1,1,1)_4 model (without constant):

(1 - φ_1 B)(1 - Φ_1 B^4)(1 - B)(1 - B^4) y_t = (1 + θ_1 B)(1 + Θ_1 B^4) e_t,

where (1 - φ_1 B) is the non-seasonal AR(1) factor, (1 - Φ_1 B^4) the seasonal AR(1) factor, (1 - B) the non-seasonal difference, (1 - B^4) the seasonal difference, (1 + θ_1 B) the non-seasonal MA(1) factor, and (1 + Θ_1 B^4) the seasonal MA(1) factor.

Forecasting using R Seasonal ARIMA models 8

ACF and PACF plots

The seasonal part of an AR or MA model will be seen in the seasonal lags of the ACF and PACF.

An ARIMA(0,0,0)(1,0,0)_12 model will show:

Exponential decay in the seasonal lags of the ACF: 12, 24, 36, ...
A single significant spike at lag 12 in the PACF.

An ARIMA(0,0,0)(0,0,1)_12 model will show:

Exponential decay in the seasonal lags of the PACF: 12, 24, 36, ...
A single significant spike at lag 12 in the ACF.
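The seasonal-spike pattern is easy to see in a simulation. A hedged sketch (my own pure-Python code; in R you would use Acf() on a simulated series), assuming a seasonal AR coefficient Φ = 0.8 for illustration:

```python
# Simulate y_t = Phi * y_{t-12} + e_t and inspect the sample ACF:
# seasonal lags 12, 24 decay roughly as Phi, Phi^2; lag 6 stays near 0.

import random

def acf(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    cov = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return cov / var

random.seed(1)
Phi = 0.8
y = [random.gauss(0, 1) for _ in range(12)]   # burn-in seed values
for _ in range(3000):
    y.append(Phi * y[-12] + random.gauss(0, 1))

print(round(acf(y, 12), 2), round(acf(y, 24), 2), round(acf(y, 6), 2))
```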

Regression with ARIMA errors

Regression models:

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t

y_t is modeled as a function of k explanatory variables x_{1,t}, ..., x_{k,t}.
Usually, we assume that n_t is white noise.
Now we want to allow n_t to be autocorrelated.

Example: n_t ~ ARIMA(1,1,1)

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t,
where (1 - φ_1 B)(1 - B) n_t = (1 - θ_1 B) e_t

and e_t is white noise.

Forecasting using R Regression with ARIMA errors 3

Residuals and errors

Example: n_t ~ ARIMA(1,1,1)

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t,
where (1 - φ_1 B)(1 - B) n_t = (1 - θ_1 B) e_t

Be careful to distinguish n_t from e_t: n_t are the "errors" and e_t are the "residuals". In ordinary regression, n_t is assumed to be white noise, and so n_t = e_t.

After differencing all variables:

y'_t = b_1 x'_{1,t} + ... + b_k x'_{k,t} + n'_t.

This is now a regression with ARMA(1,1) errors.

Forecasting using R Regression with ARIMA errors 4

Regression with ARIMA errors

Any regression with an ARIMA error can be rewritten as a regression with an ARMA error by differencing all variables with the same differencing operator as in the ARIMA model.

Original data:

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t,
where φ(B)(1 - B)^d n_t = θ(B) e_t

After differencing all variables:

y'_t = b_1 x'_{1,t} + ... + b_k x'_{k,t} + n'_t,
where φ(B) n'_t = θ(B) e_t

and y'_t = (1 - B)^d y_t, etc.

Forecasting using R Regression with ARIMA errors 5
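The "difference everything with the same operator" idea can be sketched numerically. This is my own minimal illustration (the data-generating values are assumptions, and a real analysis would use R's Arima()/auto.arima() with xreg): with an integrated (random-walk) error in levels, first-differencing both y and x leaves a stationary error, and plain least squares on the differenced data recovers the slope.

```python
# Sketch: regression with a unit-root error; difference, then OLS.

import random

random.seed(0)
n = 500
x = [0.0]
eta = [0.0]                                    # error n_t with a unit root
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 1))       # integrated regressor
    eta.append(eta[-1] + random.gauss(0, 0.5)) # integrated error

b1 = 2.0                                       # true slope (assumption)
y = [b1 * xi + ei for xi, ei in zip(x, eta)]

# Difference everything with the same operator (1 - B)
dy = [y[t] - y[t - 1] for t in range(1, n)]
dx = [x[t] - x[t - 1] for t in range(1, n)]

# OLS slope (no intercept) on the differenced data
b1_hat = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(round(b1_hat, 1))  # close to the true slope 2.0
```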

Modeling procedure

Problems with OLS and autocorrelated errors:

1. OLS is no longer the best way to compute coefficients, as it does not take account of the time relationships in the data.

2. Standard errors of coefficients are incorrect, most likely too small. This invalidates tests and prediction intervals.

The second problem is more serious because it can lead to misleading results. If the standard errors obtained using OLS are too small, some explanatory variables may appear to be significant when, in fact, they are not. This is known as "spurious regression."

Forecasting using R Regression with ARIMA errors 6

Modeling procedure

Estimation only works when all predictor variables are deterministic or stationary and the errors are stationary.

So difference stochastic variables as required until all variables appear stationary. Then fit a model with ARMA errors.

auto.arima() will handle order selection and differencing (but it only checks that the errors are stationary).

Forecasting using R Regression with ARIMA errors 7

Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Time series decomposition

Y_t = f(S_t, T_t, E_t)

where Y_t = data at period t,
S_t = seasonal component at period t,
T_t = trend-cycle component at period t,
E_t = remainder (or irregular or error) component at period t.

Additive decomposition: Y_t = S_t + T_t + E_t.
Multiplicative decomposition: Y_t = S_t × T_t × E_t.

Forecasting using R Time series decomposition 21
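A classical additive decomposition can be sketched in a few lines. This is my own minimal implementation for illustration (R's decompose() does this properly, with better edge handling): trend via a centered moving average, seasonal component via period averages of the detrended series, remainder as what is left over.

```python
# Sketch of classical additive decomposition: Y_t = S_t + T_t + E_t.

def decompose_additive(y, m):
    n = len(y)
    half = m // 2
    # Centered moving average of order m (a 2xm MA when m is even)
    trend = [None] * n
    for t in range(half, n - half):
        if m % 2 == 0:
            s = (0.5 * y[t - half] + sum(y[t - half + 1:t + half])
                 + 0.5 * y[t + half])
        else:
            s = sum(y[t - half:t + half + 1])
        trend[t] = s / m
    # Average detrended value for each season, centered to sum to zero
    detr = [y[t] - trend[t] for t in range(n) if trend[t] is not None]
    buckets = [[] for _ in range(m)]
    for i, v in enumerate(detr):
        buckets[(i + half) % m].append(v)   # first valid t is t = half
    seas = [sum(b) / len(b) for b in buckets]
    mean_s = sum(seas) / m
    seas = [s - mean_s for s in seas]
    seasonal = [seas[t % m] for t in range(n)]
    remainder = [y[t] - trend[t] - seasonal[t] if trend[t] is not None
                 else None for t in range(n)]
    return trend, seasonal, remainder

# Quarterly series with a known linear trend and seasonal pattern
m = 4
pattern = [3.0, -1.0, -2.0, 0.0]
y = [t + pattern[t % m] for t in range(24)]
trend, seasonal, remainder = decompose_additive(y, m)
print([round(s, 2) for s in seasonal[:4]])  # recovers [3.0, -1.0, -2.0, 0.0]
```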

Time series decomposition

An additive model is appropriate if the magnitude of the seasonal fluctuations does not vary with the level.

If the seasonal fluctuations are proportional to the level of the series, then a multiplicative model is appropriate.

Multiplicative decomposition is more prevalent with economic series.

Logs turn a multiplicative relationship into an additive relationship:

Y_t = S_t × T_t × E_t  ⇒  log Y_t = log S_t + log T_t + log E_t.

Forecasting using R Time series decomposition 22

Seasonal adjustment

A useful by-product of decomposition: an easy way to calculate seasonally adjusted data.

Additive decomposition: seasonally adjusted data given by

Y_t - S_t = T_t + E_t

Multiplicative decomposition: seasonally adjusted data given by

Y_t / S_t = T_t × E_t

Forecasting using R Seasonal adjustment 42

Forecasting and decomposition

Forecast the seasonal component by repeating the last year.

Forecast the seasonally adjusted data using a non-seasonal time series method, e.g.:

Holt's method (next topic)
Random walk with drift model

Combine forecasts of the seasonal component with forecasts of the seasonally adjusted data to get forecasts of the original data.

Sometimes a decomposition is useful just for understanding the data before building a separate forecasting model.

Forecasting using R Forecasting and decomposition 46

Simple methods

Random walk forecasts:

y_{T+1|T} = y_T

Average forecasts:

y_{T+1|T} = (1/T) Σ_{t=1}^{T} y_t

We want something in between that weights the most recent data more highly. Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially.

Forecasting using R Simple exponential smoothing 3
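The two baselines are one-liners. A small sketch (mine, not the lecture's code; in R these are naive() and meanf()):

```python
# The two extremes that exponential smoothing interpolates between.

def naive_forecast(y):
    """Random walk forecast: y_{T+1|T} = y_T."""
    return y[-1]

def mean_forecast(y):
    """Average forecast: y_{T+1|T} = (1/T) * sum(y_t)."""
    return sum(y) / len(y)

y = [12.0, 14.0, 13.0, 15.0, 16.0]
print(naive_forecast(y))  # 16.0 (all weight on the last observation)
print(mean_forecast(y))   # 14.0 (equal weight on every observation)
```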

Simple Exponential Smoothing

Forecast equation:

y_{T+1|T} = α y_T + α(1-α) y_{T-1} + α(1-α)^2 y_{T-2} + · · · ,

where 0 ≤ α ≤ 1.

Weights assigned to observations:

Observation   α = 0.2        α = 0.4        α = 0.6        α = 0.8
y_T           0.2            0.4            0.6            0.8
y_{T-1}       0.16           0.24           0.24           0.16
y_{T-2}       0.128          0.144          0.096          0.032
y_{T-3}       0.1024         0.0864         0.0384         0.0064
y_{T-4}       (0.2)(0.8)^4   (0.4)(0.6)^4   (0.6)(0.4)^4   (0.8)(0.2)^4
y_{T-5}       (0.2)(0.8)^5   (0.4)(0.6)^5   (0.6)(0.4)^5   (0.8)(0.2)^5

Forecasting using R Simple exponential smoothing 4

Simple Exponential Smoothing

Weighted average form:

y_{t+1|t} = α y_t + (1-α) y_{t|t-1}

for t = 1, ..., T, where 0 ≤ α ≤ 1 is the smoothing parameter.

The process has to start somewhere, so we let the first forecast of y_1 be denoted by ℓ_0. Then

y_{2|1} = α y_1 + (1-α) ℓ_0
y_{3|2} = α y_2 + (1-α) y_{2|1}
y_{4|3} = α y_3 + (1-α) y_{3|2}
...
y_{T+1|T} = α y_T + (1-α) y_{T|T-1}

Forecasting using R Simple exponential smoothing 5
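The recursion above is a few lines of code. A minimal sketch (my own implementation; R's ses()/ets() estimate α and ℓ_0 rather than taking them as given), also checking that the recursion matches the explicit exponentially weighted average:

```python
# Simple exponential smoothing by the weighted-average recursion.

def ses_forecast(y, alpha, l0):
    """One-step forecast y_{T+1|T} via
    y_{t+1|t} = alpha*y_t + (1-alpha)*y_{t|t-1}, starting at l0."""
    f = l0
    for obs in y:
        f = alpha * obs + (1 - alpha) * f
    return f

y = [10.0, 12.0, 11.0, 13.0]

# alpha = 1 reproduces the random-walk (naive) forecast
print(ses_forecast(y, 1.0, 0.0))  # 13.0

# Equivalent explicit form:
# sum_j alpha*(1-alpha)^j * y_{T-j}  +  (1-alpha)^T * l0
alpha, l0 = 0.4, 10.0
T = len(y)
explicit = sum(alpha * (1 - alpha) ** j * y[T - 1 - j] for j in range(T)) \
           + (1 - alpha) ** T * l0
print(abs(ses_forecast(y, alpha, l0) - explicit) < 1e-12)  # True
```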

Simple Exponential Smoothing

y_{t+1|t} = α y_t + (1-α) y_{t|t-1}

Substituting each equation into the following equation:

y_{3|2} = α y_2 + (1-α) y_{2|1}
        = α y_2 + (1-α) [α y_1 + (1-α) ℓ_0]
        = α y_2 + α(1-α) y_1 + (1-α)^2 ℓ_0

y_{4|3} = α y_3 + (1-α) [α y_2 + α(1-α) y_1 + (1-α)^2 ℓ_0]
        = α y_3 + α(1-α) y_2 + α(1-α)^2 y_1 + (1-α)^3 ℓ_0
...
y_{T+1|T} = α y_T + α(1-α) y_{T-1} + α(1-α)^2 y_{T-2} + · · · + (1-α)^T ℓ_0

Exponentially weighted average:

y_{T+1|T} = Σ_{j=0}^{T-1} α(1-α)^j y_{T-j} + (1-α)^T ℓ_0

Forecasting using R Simple exponential smoothing 6

Simple exponential smoothing

Initialization

The last term in the weighted moving average is (1-α)^T ℓ_0.

So the value of ℓ_0 plays a role in all subsequent forecasts.

Its weight is small unless α is close to zero or T is small.

It is common to set ℓ_0 = y_1, but it is better to treat it as a parameter, along with α.

Forecasting using R Simple exponential smoothing 7

Simple exponential smoothing

Optimization

We can choose α and ℓ_0 by minimizing the MSE of the one-step forecasts:

MSE = (1 / (T − 1)) Σ_{t=2}^{T} (y_t − ŷ_{t|t−1})^2

Unlike regression, there is no closed-form solution, so we use numerical optimization.
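In R, ses() performs this estimation internally. As an illustrative sketch of the idea, here is a crude grid search in Python (a real implementation would use a proper numerical optimizer; both function names are hypothetical):

```python
def ses_mse(y, alpha, level0):
    # MSE of one-step forecasts: mean of (y_t - yhat_{t|t-1})^2, t = 2..T.
    yhat = alpha * y[0] + (1 - alpha) * level0   # yhat_{2|1}
    sse = 0.0
    for obs in y[1:]:
        sse += (obs - yhat) ** 2
        yhat = alpha * obs + (1 - alpha) * yhat
    return sse / (len(y) - 1)

def fit_ses(y, steps=100):
    # Crude search: a grid over alpha, a few starting values for l_0.
    candidates = [(a / steps, l0)
                  for a in range(steps + 1)
                  for l0 in (y[0], min(y), max(y), sum(y) / len(y))]
    return min(candidates, key=lambda c: ses_mse(y, *c))

y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0]
alpha, level0 = fit_ses(y)
```

Because the MSE surface in (α, ℓ_0) can be flat near the optimum, different optimizers may return slightly different parameter estimates with nearly identical fit.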

Simple exponential smoothing

Multi-step forecasts

ŷ_{T+h|T} = ŷ_{T+1|T},    h = 2, 3, …

A “flat” forecast function.
Remember, a forecast is an estimated mean of a future value.
So with no trend, no seasonality, and no other patterns, the forecasts are constant.
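In code, the flat forecast function amounts to computing the final one-step forecast and repeating it; a small sketch (the helper name is hypothetical):

```python
def ses_multi_step(y, alpha, level0, h):
    # Run the SES recursion to get yhat_{T+1|T} ...
    yhat = level0
    for obs in y:
        yhat = alpha * obs + (1 - alpha) * yhat
    # ... then hold it flat: yhat_{T+h|T} = yhat_{T+1|T} for all h >= 1.
    return [yhat] * h

print(ses_multi_step([12.0, 14.0, 13.0], alpha=0.8, level0=12.0, h=4))
```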
