Energy Infrastructure Planning: Forecasting

Carlos Abad

November 7, 2014



Who am I?

PhD student in the IEOR department

Advisors: Prof. Vijay Modi and Prof. Garud Iyengar

Research:

Robust control algorithms for solar micro-grids

Control, signal detection, and forecasting methods for managing DR programs


References

Hyndman, R. J. & Athanasopoulos, G. (2013) Forecasting: Principles and Practice.

www.otexts.org/fpp/

R package fpp

Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Time series data

A time series consists of a sequence of observations collected over time.

We will assume the time periods are equally spaced.

Time series examples:

Hourly electricity demand
Daily maximum temperature
Weekly wind generation
Monthly rainfall

Forecasting is estimating how the sequence of observations will continue into the future.


Time series in R

Main package used in this course:

> library(fpp)

This loads:

some data for use in examples and exercises
the forecast package (forecasting functions)
the tseries package (a few time series functions)
the fma package (lots of time series data)
the expsmooth package (more time series data)
the lmtest package (some regression functions)


Time series in R

Other packages:

> library(xts)

Order time series by timestamp
Nicer plots
Easier time aggregation


Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Notation

y_t: observed value at time t

y_{T+h|T}: forecast for time T + h, made at time T using historical information up to time T


Some simple forecasting methods

Average method

Forecast of all future values is equal to the mean of the historical data {y_1, ..., y_T}.
Forecasts: y_{T+h|T} = ȳ = (y_1 + ... + y_T)/T

Naive method (for time series only)

Forecasts equal the last observed value.
Forecasts: y_{T+h|T} = y_T.
A consequence of the efficient market hypothesis.

Seasonal naive method

Forecasts equal the last value from the same season.
Forecasts: y_{T+h|T} = y_{T+h−km}, where m = seasonal period and k = ⌊(h − 1)/m⌋ + 1.
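The slides call these methods through R's fpp package; as a language-neutral illustration (the function names below are my own, not from the slides), the three formulas translate directly to Python:

```python
def mean_forecast(y, h):
    """Average method: every forecast equals the historical mean of y_1..y_T."""
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    """Naive method: every forecast equals the last observed value y_T."""
    return [y[-1]] * h

def seasonal_naive_forecast(y, h, m):
    """Seasonal naive: y_{T+h|T} = y_{T+h-km} with k = floor((h-1)/m) + 1,
    i.e. the last observed value from the same season (m = seasonal period)."""
    T = len(y)
    forecasts = []
    for step in range(1, h + 1):
        k = (step - 1) // m + 1
        forecasts.append(y[T + step - k * m - 1])  # 0-based index of y_{T+step-km}
    return forecasts
```

For quarterly data (m = 4), forecasting five steps ahead from two observed years repeats the final observed year and then starts it again.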


Drift method

Forecasts equal the last value plus the average historical change.

Forecasts:

y_{T+h|T} = y_T + (h/(T − 1)) Σ_{t=2}^{T} (y_t − y_{t−1})
          = y_T + (h/(T − 1)) (y_T − y_1).

Equivalent to extrapolating a line drawn between the first and last observations.
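Because the sum of changes telescopes to y_T − y_1, the drift forecast needs only the first and last observations. A minimal Python sketch (my own helper, not from the slides):

```python
def drift_forecast(y, h):
    """Drift method: y_{T+h|T} = y_T + h * (y_T - y_1) / (T - 1).
    The telescoping sum of changes means only the endpoints matter."""
    T = len(y)
    slope = (y[-1] - y[0]) / (T - 1)  # average change per period
    return [y[-1] + step * slope for step in range(1, h + 1)]
```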


Some simple forecasting methods

Mean: meanf(x, h=20)

Naive: naive(x, h=20) or rwf(x, h=20)

Seasonal naive: snaive(x, h=20)

Drift: rwf(x, drift=TRUE, h=20)


Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Forecasting residuals

Residuals in forecasting: the difference between an observed value and its forecast based on all previous observations: e_t = y_t − y_{t|t−1}.

Assumptions:

1. {e_t} are uncorrelated. If they aren't, there is information left in the residuals that should be used in computing forecasts.
2. {e_t} have mean zero. If they don't, the forecasts are biased.

Useful properties (for prediction intervals):

3. {e_t} have constant variance.
4. {e_t} are normally distributed.
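As an illustration of e_t = y_t − y_{t|t−1} (a hypothetical helper, here using the naive method so that each forecast sees only past data):

```python
def naive_forecast(y, h):
    """Naive method: forecast equals the last observed value."""
    return [y[-1]] * h

def one_step_residuals(y, forecast_fn=naive_forecast):
    """e_t = y_t - yhat_{t|t-1}: each forecast is made from y_1..y_{t-1} only."""
    return [y[t] - forecast_fn(y[:t], 1)[0] for t in range(1, len(y))]

# For the naive method the residuals are the first differences, so a
# clearly nonzero residual mean suggests biased forecasts (assumption 2).
res = one_step_residuals([2, 4, 3, 5])
```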


Measures of forecast accuracy

Let y_t denote the t-th observation and y_{t|t−1} its forecast based on all previous data, where t = 1, ..., T. Then the following measures are useful.

MAE = T^{−1} Σ_{t=1}^{T} |y_t − y_{t|t−1}|

MSE = T^{−1} Σ_{t=1}^{T} (y_t − y_{t|t−1})^2

RMSE = √( T^{−1} Σ_{t=1}^{T} (y_t − y_{t|t−1})^2 )

MAPE = 100 T^{−1} Σ_{t=1}^{T} |y_t − y_{t|t−1}| / |y_t|

MAE, MSE, and RMSE are all scale dependent.

MAPE is scale independent, but is only sensible if y_t ≫ 0 for all t and y has a natural zero.
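These four measures are simple to compute directly. A Python sketch of the formulas above (the dictionary interface is my own choice, not the slides'):

```python
import math

def accuracy_measures(actual, forecast):
    """MAE, MSE, RMSE (scale dependent) and MAPE (a percentage) as defined above."""
    errors = [a - f for a, f in zip(actual, forecast)]
    T = len(errors)
    mae = sum(abs(e) for e in errors) / T
    mse = sum(e * e for e in errors) / T
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAPE assumes every y_t is well away from zero
        "MAPE": 100 / T * sum(abs(e / a) for e, a in zip(errors, actual)),
    }
```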


Measures of forecast accuracy

Mean Absolute Scaled Error:

MASE = T^{−1} Σ_{t=1}^{T} |y_t − y_{t|t−1}| / Q

where Q is a stable measure of the scale of the time series {y_t}.

For non-seasonal time series,

Q = (T − 1)^{−1} Σ_{t=2}^{T} |y_t − y_{t−1}|

works well. Then MASE is equivalent to MAE relative to a naive method.

For seasonal time series,

Q = (T − m)^{−1} Σ_{t=m+1}^{T} |y_t − y_{t−m}|

works well. Then MASE is equivalent to MAE relative to a seasonal naive method.
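A hedged Python sketch of MASE (here computed on a hold-out set with Q taken from the training data; m = 1 gives the non-seasonal scaling, m = seasonal period gives the seasonal one):

```python
def mase(actual, forecast, train, m=1):
    """Mean Absolute Scaled Error: MAE divided by Q, the in-sample MAE of
    the (seasonal) naive method on the training series. m=1 is non-seasonal."""
    q = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    abs_errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(abs_errors) / len(abs_errors) / q
```

Values below 1 mean the forecasts beat the naive benchmark on average.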

Training and test sets

Available data are split chronologically:

Training set (e.g., first 80%) | Test set (e.g., final 20%)

The test set must not be used for any aspect of model development or calculation of forecasts.

Forecast accuracy is based only on the test set.
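The split is chronological rather than random, since future values must not leak into training. A minimal sketch (the 80/20 fractions are just the example values from the slide):

```python
def train_test_split(y, train_frac=0.8):
    """Chronological split: the final block of observations becomes the test
    set and is used only to measure forecast accuracy, never for fitting."""
    cut = int(len(y) * train_frac)
    return y[:cut], y[cut:]
```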

Forecasting using R Evaluating forecast accuracy 24

Training and test sets

Available data

Training set Test set(e.g., 80%) (e.g., 20%)

The test set must not be used for any aspect ofmodel development or calculation of forecasts.

Forecast accuracy is based only on the test set.

Forecasting using R Evaluating forecast accuracy 24

Beware of over-fitting

A model which fits the data well does not necessarily forecast well.

A perfect fit can always be obtained by using a model with enough parameters. (Compare R².)

Over-fitting a model to data is as bad as failing to identify the systematic pattern in the data.

These problems can be overcome by measuring true out-of-sample forecast accuracy: the data are divided into a "training" set and a "test" set. The training set is used to estimate parameters, forecasts are made for the test set, and accuracy measures are computed for the test-set errors only.


Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Time series graphics

Time plots (R command: plot or plot.ts)

Seasonal plots (R command: seasonplot)

Seasonal subseries plots (R command: monthplot)

Lag plots (R command: lag.plot)

ACF plots (R command: Acf)


Seasonal plots

Data are plotted against the individual "seasons" in which the data were observed. (In this case a "season" is a month.)

Something like a time plot, except that the data from each season are overlapped.

Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.

In R: seasonplot


Seasonal subseries plots

Data for each season are collected together in a time plot as separate time series.

Enables the underlying seasonal pattern to be seen clearly, and changes in seasonality over time to be visualized.

In R: monthplot


Time series patterns

A trend pattern exists when there is a long-term increase or decrease in the data.

A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).

A cyclic pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).


Seasonal or cyclic?

Differences between seasonal and cyclic patterns:

A seasonal pattern has constant length; a cyclic pattern has variable length.

The average length of a cycle is longer than the length of the seasonal pattern.

The magnitude of a cycle is more variable than the magnitude of the seasonal pattern.

The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.


Time series patterns

[Figure: Australian electricity production, in GWh, plotted against year (1980–1995).]

Time series patterns

[Figure: Australian clay brick production, in million units, plotted against year (1960–1990).]

Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

A stationary series is:

roughly horizontal
of constant variance
without patterns predictable in the long term


Stationary?

[Figure: the Dow Jones index plotted against day (about days 0–300, index values near 3600–3900).]

Stationary?

[Figure: the daily change in the Dow Jones index plotted against day (about days 0–300).]

Stationarity

Definition: If {y_t} is a stationary time series, then for all s, the distribution of (y_t, ..., y_{t+s}) does not depend on t.

Transformations help to stabilize the variance.

For ARIMA modelling, we also need to stabilize the mean.
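One standard way to stabilize the mean is differencing, which anticipates the ARIMA discussion; the helper below is illustrative, not from the slides:

```python
def difference(y, lag=1):
    """Differencing y_t - y_{t-lag}: lag=1 removes a trend in the mean;
    lag=m removes a stable seasonal pattern of period m."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]
```

A series with a growing trend, e.g. [1, 3, 6, 10], becomes the roughly level [2, 3, 4] after first differencing.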

Non-stationarity in the mean

Identifying non-stationary series:

Use a time plot.

The ACF of stationary data drops to zero relatively quickly.

The ACF of non-stationary data decreases slowly.

For non-stationary data, the value of r_1 is often large and positive.

Autocorrelation

Covariance and correlation: measure the extent of the linear relationship between two variables (y and x).

Autocovariance and autocorrelation: measure the linear relationship between lagged values of a time series y.

We measure the relationship between y_t and y_{t−1}, y_t and y_{t−2}, y_t and y_{t−3}, and so on.


Autocorrelation

We denote the sample autocovariance at lag k by c_k and the sample autocorrelation at lag k by r_k. Then define

c_k = (1/T) Σ_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ)   and   r_k = c_k / c_0

r_1 indicates how successive values of y relate to each other.

r_2 indicates how y values two periods apart relate to each other.

r_k is almost the same as the sample correlation between y_t and y_{t−k}.
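The definitions of c_k and r_k translate directly into code; a hedged Python sketch (my own helper, not the slides' Acf):

```python
def autocorrelation(y, k):
    """Sample autocorrelation r_k = c_k / c_0, where
    c_k = (1/T) * sum_{t=k+1..T} (y_t - ybar) * (y_{t-k} - ybar)."""
    T = len(y)
    ybar = sum(y) / T
    c0 = sum((v - ybar) ** 2 for v in y) / T
    ck = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T)) / T
    return ck / c0
```

A strictly alternating series has r_1 close to −1, and r_0 is always exactly 1.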


Recognizing seasonality in a time series

If there is seasonality, the ACF at the seasonal lag (e.g., 12 for monthly data) will be large and positive.

For seasonal monthly data, a large ACF value will be seen at lag 12 and possibly also at lags 24, 36, ...

For seasonal quarterly data, a large ACF value will be seen at lag 4 and possibly also at lags 8, 12, ...

Forecasting using R Autocorrelation 31

Recognizing seasonality in a time series

If there is seasonality, the ACF at the seasonal lag(e.g., 12 for monthly data) will be large andpositive.

For seasonal monthly data, a large ACF valuewill be seen at lag 12 and possibly also at lags24, 36, . . .

For seasonal quarterly data, a large ACF valuewill be seen at lag 4 and possibly also at lags 8,12, . . .

Forecasting using R Autocorrelation 31

Example: White noise

[Figure: time plot of a simulated white noise series, 50 observations, values roughly between −3 and 2.]

White noise data is uncorrelated across time with zero mean and constant variance. (Technically, we require independence as well.)

Think of white noise as completely uninteresting with no predictable patterns.

Example: White noise

r_1 = 0.013
r_2 = −0.163
r_3 = 0.163
r_4 = −0.259
r_5 = −0.198
r_6 = 0.064
r_7 = −0.139
r_8 = −0.032
r_9 = 0.199
r_10 = −0.240

Sample autocorrelations for the white noise series. For uncorrelated data, we would expect each autocorrelation to be close to zero.

[Figure: ACF plot of the white noise series for lags 1 to 15.]

Sampling distribution of autocorrelations

The sampling distribution of r_k for white noise data is asymptotically N(0, 1/T).

95% of all r_k for white noise must lie within ±1.96/√T.
If this is not the case, the series is probably not white noise.
It is common to plot lines at ±1.96/√T when plotting the ACF. These are the critical values.

Autocorrelation

[Figure: ACF plot of the white noise series, lags 1 to 15, with critical-value lines at ±0.28.]

Example: T = 50, so the critical values are at ±1.96/√50 = ±0.28. All autocorrelation coefficients lie within these limits, confirming that the data are white noise. (More precisely, the data cannot be distinguished from white noise.)
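The T = 50 example can be reproduced in R. A minimal sketch, assuming a freshly simulated white noise series (the seed and lag count are arbitrary choices): it computes the ±1.96/√T critical values and counts how many sample autocorrelations fall inside them.

```r
# Simulate white noise with n = 50 and compare the sample ACF against
# the +/- 1.96/sqrt(n) critical values from the sampling distribution.
set.seed(123)
n <- 50
x <- rnorm(n)                                     # white noise
r <- acf(x, lag.max = 15, plot = FALSE)$acf[-1]   # drop the lag-0 value of 1
crit <- 1.96 / sqrt(n)                            # about 0.28 for n = 50
inside <- abs(r) < crit
mean(inside)   # for true white noise, most lags lie inside the bounds
```

By construction about 5% of lags are expected to fall outside the bounds even for genuine white noise, so an isolated exceedance is not evidence against whiteness.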

ACF of residuals

We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren't, then there is information left in the residuals that should be used in computing forecasts.

So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method. We expect these to look like white noise.

Dow-Jones naive forecasts revisited

ŷ_{t|t−1} = y_{t−1}
e_t = y_t − y_{t−1}
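The diagnostic above can be sketched in R. This assumes a simulated random walk standing in for the Dow-Jones series (not the actual data): for naive forecasts the residuals are e_t = y_t − y_{t−1}, and if the method has captured the dynamics their ACF should look like white noise.

```r
# Residual diagnostic for the naive method: residuals are first differences.
set.seed(42)
y <- cumsum(rnorm(300))   # simulated random walk (stand-in for Dow-Jones)
e <- diff(y)              # naive-forecast residuals e_t = y_t - y_{t-1}
r <- acf(e, lag.max = 10, plot = FALSE)$acf[-1]
crit <- 1.96 / sqrt(length(e))
mean(abs(r) < crit)       # most residual autocorrelations should be inside
```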

Non-stationarity in the mean

Identifying non-stationary series:

Use the time plot.
The ACF of stationary data drops to zero relatively quickly.
The ACF of non-stationary data decreases slowly.
For non-stationary data, the value of r_1 is often large and positive.

Differencing

Differencing helps to stabilize the mean.

The differenced series is the change between each observation in the original series: y′_t = y_t − y_{t−1}.

The differenced series will have only T − 1 values, since it is not possible to calculate a difference y′_1 for the first observation.

Second-order differencing

Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:

y″_t = y′_t − y′_{t−1}
     = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2})
     = y_t − 2y_{t−1} + y_{t−2}.

y″_t will have T − 2 values.

In practice, it is almost never necessary to go beyond second-order differences.
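In R both operations are handled by diff(). A small worked example on a toy series of T = 6 values (the numbers are arbitrary), confirming the T − 1 and T − 2 lengths noted above:

```r
# First and second differences with diff(); note the lost observations.
y  <- c(3, 5, 9, 8, 12, 15)       # toy series, T = 6
d1 <- diff(y)                     # y'_t  = y_t - y_{t-1}        (T - 1 values)
d2 <- diff(y, differences = 2)    # y''_t = y_t - 2y_{t-1} + y_{t-2} (T - 2 values)
d1   # 2  4 -1  4  3
d2   # 2 -5  5 -1
```

Checking one term by hand: y″_3 = 9 − 2·5 + 3 = 2, matching the first entry of d2.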

Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year:

y′_t = y_t − y_{t−m}

where m is the number of seasons per year.

For monthly data, m = 12.
For quarterly data, m = 4.

Seasonal differencing

When both seasonal and first differences are applied:

It makes no difference which is done first; the result will be the same.

If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.

It is important that, if differencing is used, the differences are interpretable.
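The claim that the order of differencing does not matter is easy to verify numerically. A minimal sketch, assuming four "years" of simulated monthly data (m = 12; the series itself is arbitrary noise):

```r
# Seasonal and first differencing commute: both orders give
# y_t - y_{t-1} - y_{t-12} + y_{t-13}.
set.seed(7)
y <- rnorm(48)                    # four years of "monthly" data, m = 12
a <- diff(diff(y, lag = 12))      # seasonal difference, then first difference
b <- diff(diff(y), lag = 12)      # first difference, then seasonal difference
all.equal(a, b)
```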

Interpretation of differencing

First differences are the change between one observation and the next.
Seasonal differences are the change from one year to the next.

But taking lag-3 differences of yearly data, for example, results in a model which cannot be sensibly interpreted.

Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Autoregressive models

Autoregressive (AR) models:

y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + · · · + φ_p y_{t−p} + e_t,

where e_t is white noise. This is a multiple regression with lagged values of y_t as predictors.

[Figure: time plots of simulated AR(1) and AR(2) series, 100 observations each.]

AR(1) model

y_t = c + φ_1 y_{t−1} + e_t

When φ_1 = 0, y_t is equivalent to white noise.
When φ_1 = 1 and c = 0, y_t is equivalent to a random walk.
When φ_1 = 1 and c ≠ 0, y_t is equivalent to a random walk with drift.
When φ_1 < 0, y_t tends to oscillate between positive and negative values.
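These special cases can be simulated in R. Note one practical point: arima.sim() requires a stationary model (|φ_1| < 1), so the random-walk cases must be built directly from cumulative sums of white noise. The seed, sample size, and coefficient values are illustrative choices.

```r
# The four AR(1) regimes from the slide, simulated from the same shocks.
set.seed(1)
e   <- rnorm(200)
wn  <- e                        # phi_1 = 0: white noise
rw  <- cumsum(e)                # phi_1 = 1, c = 0: random walk
rwd <- cumsum(0.5 + e)          # phi_1 = 1, c = 0.5: random walk with drift
osc <- as.numeric(arima.sim(list(ar = -0.8), n = 200))  # phi_1 < 0: oscillates
```

The oscillating case shows up in the ACF: for φ_1 < 0, r_1 is negative.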

Moving Average (MA) models

Moving Average (MA) models:

y_t = c + e_t + θ_1 e_{t−1} + θ_2 e_{t−2} + · · · + θ_q e_{t−q},

where e_t is white noise. This is a multiple regression with past errors as predictors. Don't confuse this with moving average smoothing!

[Figure: time plots of simulated MA(1) and MA(2) series, 100 observations each.]
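MA series are also simulated with arima.sim(). A sketch with illustrative θ values; it also uses ARMAacf() to show the defining property of an MA(1) process, that its theoretical autocorrelation is exactly zero beyond lag 1.

```r
# Simulated MA(1) and MA(2) series, plus the theoretical MA(1) ACF.
set.seed(2)
ma1 <- arima.sim(list(ma = 0.8), n = 100)
ma2 <- arima.sim(list(ma = c(-1, 0.8)), n = 100)
th  <- ARMAacf(ma = 0.8, lag.max = 3)  # theoretical ACF: zero beyond lag 1
```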

ARIMA models

Autoregressive Moving Average models:

y_t = c + φ_1 y_{t−1} + · · · + φ_p y_{t−p} + θ_1 e_{t−1} + · · · + θ_q e_{t−q} + e_t.

Predictors include both lagged values of y_t and lagged errors.
ARMA models can be used for a huge range of stationary time series.
They model the short-term dynamics.
An ARMA model applied to differenced data is an ARIMA model.

ARIMA models

Autoregressive Integrated Moving Average models: ARIMA(p, d, q)

AR: p = order of the autoregressive part
I:  d = degree of first differencing involved
MA: q = order of the moving average part

White noise model: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0) with no constant
Random walk with drift: ARIMA(0,1,0) with constant
AR(p): ARIMA(p,0,0)
MA(q): ARIMA(0,0,q)
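The special cases map directly onto the order argument of stats::arima(). A sketch on simulated data (the AR coefficient 0.5 and sample sizes are illustrative):

```r
# ARIMA(p,d,q) special cases expressed with stats::arima().
set.seed(3)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 300))
fit_ar1 <- arima(y, order = c(1, 0, 0))   # AR(1)  = ARIMA(1,0,0)
fit_ma1 <- arima(y, order = c(0, 0, 1))   # MA(1)  = ARIMA(0,0,1)
rw <- cumsum(rnorm(300))
fit_rw  <- arima(rw, order = c(0, 1, 0))  # random walk = ARIMA(0,1,0), no constant
coef(fit_ar1)[["ar1"]]                    # should be near the true value 0.5
```

Note that arima() drops the constant automatically once d > 0, matching the "with no constant" random walk above.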

Understanding ARIMA models

If c = 0 and d = 0, the long-term forecasts will go to zero.
If c = 0 and d = 1, the long-term forecasts will go to a non-zero constant.
If c = 0 and d = 2, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 0, the long-term forecasts will go to the mean of the data.
If c ≠ 0 and d = 1, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 2, the long-term forecasts will follow a quadratic trend.

ACF and PACF plots

Recall that the k-th autocorrelation r_k measures the linear relationship between y_t and y_{t−k}.

Now, if y_t and y_{t−1} are correlated, then y_t and y_{t−2} must be correlated.

What is the correlation between y_t and y_{t−2} after removing the correlation between y_t and y_{t−1}?

α_k: the k-th partial autocorrelation.
α_k: the linear relationship between y_t and y_{t−k} after removing the effects of lags 1, 2, . . . , k − 1.
α_k is the estimate of φ_k in the autoregression model

y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + . . . + φ_k y_{t−k} + e_t.
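The "estimate of φ_k in an AR(k) regression" characterization can be checked against R's pacf(). A sketch on a long simulated AR(2) series (coefficients 0.5 and 0.3 are illustrative); the two estimators differ slightly in finite samples, so only approximate agreement is expected.

```r
# alpha_2 via an explicit AR(2) regression vs. the built-in pacf().
set.seed(4)
y <- as.numeric(arima.sim(list(ar = c(0.5, 0.3)), n = 2000))
n <- length(y)
fit <- lm(y[3:n] ~ y[2:(n - 1)] + y[1:(n - 2)])   # regress y_t on y_{t-1}, y_{t-2}
alpha2_reg  <- coef(fit)[[3]]                      # estimate of phi_2
alpha2_pacf <- pacf(y, lag.max = 2, plot = FALSE)$acf[2]
```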

ACF and PACF plots

If the data follow an ARIMA(p, d, 0) or an ARIMA(0, d, q) model, ACF and PACF plots can help to determine the value of p or q.

If both p and q are positive, ACF and PACF plots are not helpful.

The data may follow an ARIMA(p, d, 0) model if:
the ACF is exponentially decaying or sinusoidal;
there is a significant spike at lag p in the PACF, but none beyond lag p.

The data may follow an ARIMA(0, d, q) model if:
the PACF is exponentially decaying or sinusoidal;
there is a significant spike at lag q in the ACF, but none beyond lag q.
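The AR identification pattern can be seen on simulated data. A sketch assuming an AR(2) process with illustrative coefficients: the PACF should show significant spikes at lags 1 and 2 only, while the ACF decays gradually.

```r
# PACF cutoff at lag p = 2 for a simulated AR(2) series.
set.seed(5)
y <- as.numeric(arima.sim(list(ar = c(1.2, -0.5)), n = 2000))
p <- pacf(y, lag.max = 10, plot = FALSE)$acf
crit <- 1.96 / sqrt(length(y))
which(abs(p) > crit)   # typically just lags 1 and 2 for this model
```

Because each partial autocorrelation is itself a noisy estimate, an occasional small spike beyond lag 2 can appear by chance; the pattern, not any single bar, is what identifies p.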

Akaike's Information Criterion

AIC = −2 log(Likelihood) + 2p

where p is the number of estimated parameters in the model.

Minimizing the AIC gives the best model for prediction.

AIC corrected (for small-sample bias):

AICc = AIC + 2(p + 1)(p + 2) / (n − p)

Schwarz' Bayesian IC:

BIC = AIC + p(log(n) − 2)

Akaike's Information Criterion

The value of AIC/AICc/BIC is given in the R output.

AIC does not have much meaning by itself. It is only useful in comparison to the AIC value for another model fitted to the same data set.

Consider several models with AIC values close to the minimum.

A difference in AIC values of 2 or less is not regarded as substantial, and you may choose the simpler but non-optimal model.

AIC can be negative.
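The "only useful in comparison" point can be illustrated in R. A sketch on simulated data (the true model, an AR(1) with φ_1 = 0.8, is an illustrative choice): the correctly specified model should achieve the lower AIC when both are fitted to the same series.

```r
# Comparing AIC across two models fitted to the same data.
set.seed(6)
y <- as.numeric(arima.sim(list(ar = 0.8), n = 500))
aic_ar1 <- AIC(arima(y, order = c(1, 0, 0)))   # correctly specified
aic_ma1 <- AIC(arima(y, order = c(0, 0, 1)))   # deliberately mis-specified
aic_ar1 < aic_ma1   # the AR(1) fit should win
```

Either AIC value in isolation says nothing; only the difference between them is informative.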

Backshift notation

A very useful notational device is the backward shift operator, B, which is used as follows:

B y_t = y_{t-1}.

In other words, B, operating on y_t, has the effect of shifting the data back one period. Two applications of B to y_t shift the data back two periods:

B(B y_t) = B^2 y_t = y_{t-2}.

For monthly data, if we wish to shift attention to "the same month last year," then B^12 is used, and the notation is B^12 y_t = y_{t-12}.

Forecasting using R Backshift notation 3

Backshift notation

First difference: 1 - B.

Double difference: (1 - B)^2.

dth-order difference: (1 - B)^d y_t.

Seasonal difference: 1 - B^m.

Seasonal difference followed by a first difference: (1 - B)(1 - B^m).

Multiply terms together to see the combined effect:

(1 - B)(1 - B^m) y_t = (1 - B - B^m + B^{m+1}) y_t = y_t - y_{t-1} - y_{t-m} + y_{t-m-1}.

Forecasting using R Backshift notation 4
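The combined-effect identity above can be checked numerically. The following is a small illustrative sketch (my own, not the lecture's R code) verifying that applying a seasonal difference and then a first difference gives exactly y_t - y_{t-1} - y_{t-m} + y_{t-m-1}:

```python
# Sketch: (1 - B)(1 - B^m) applied step by step vs. the expanded form.

def first_diff(y):
    """Apply (1 - B): y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def seasonal_diff(y, m):
    """Apply (1 - B^m): y_t - y_{t-m}."""
    return [y[t] - y[t - m] for t in range(m, len(y))]

# Quarterly-style example series (m = 4)
y = [10, 12, 14, 16, 11, 13, 15, 17, 12, 14, 16, 18]
m = 4

combined = first_diff(seasonal_diff(y, m))

# Direct expansion: y_t - y_{t-1} - y_{t-m} + y_{t-m-1}
direct = [y[t] - y[t - 1] - y[t - m] + y[t - m - 1]
          for t in range(m + 1, len(y))]

print(combined == direct)  # True: the two computations agree
```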

Backshift notation for ARIMA

ARMA model:

y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + e_t + θ_1 e_{t-1} + ... + θ_q e_{t-q}
    = c + φ_1 B y_t + ... + φ_p B^p y_t + e_t + θ_1 B e_t + ... + θ_q B^q e_t

or, more compactly,

φ(B) y_t = c + θ(B) e_t,

where φ(B) = 1 - φ_1 B - ... - φ_p B^p
and θ(B) = 1 + θ_1 B + ... + θ_q B^q.

ARIMA(1,1,1) model:

(1 - φ_1 B)(1 - B) y_t = c + (1 + θ_1 B) e_t,

where (1 - φ_1 B) is the AR(1) factor, (1 - B) is the first difference, and (1 + θ_1 B) is the MA(1) factor.

Forecasting using R Backshift notation 5
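Multiplying out the operator polynomial in B turns the factored ARIMA form back into an explicit difference equation. A minimal sketch (mine, with an assumed value of φ for illustration):

```python
# Sketch: expand the ARIMA(1,1,1) left-hand operator (1 - phi*B)(1 - B)
# by polynomial multiplication in B.

def polymul(a, b):
    """Multiply two polynomials in B given as coefficient lists
    [c0, c1, ...] meaning c0 + c1*B + c2*B^2 + ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

phi = 0.5               # illustrative AR(1) coefficient (assumption)
ar = [1.0, -phi]        # (1 - phi*B)
diff = [1.0, -1.0]      # (1 - B), the first difference

lhs = polymul(ar, diff)
print(lhs)  # [1.0, -1.5, 0.5]
# i.e. y_t - 1.5*y_{t-1} + 0.5*y_{t-2} = c + e_t + theta*e_{t-1}
```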

Seasonal ARIMA models

ARIMA (p,d,q)(P,D,Q)_m

where (p,d,q) is the non-seasonal part of the model, (P,D,Q)_m is the seasonal part, and m = number of periods per season.

Forecasting using R Seasonal ARIMA models 7

Seasonal ARIMA models

E.g., ARIMA(1,1,1)(1,1,1)_4 model (without constant):

(1 - φ_1 B)(1 - Φ_1 B^4)(1 - B)(1 - B^4) y_t = (1 + θ_1 B)(1 + Θ_1 B^4) e_t,

where (1 - φ_1 B) is the non-seasonal AR(1) factor, (1 - Φ_1 B^4) the seasonal AR(1) factor, (1 - B) the non-seasonal difference, (1 - B^4) the seasonal difference, (1 + θ_1 B) the non-seasonal MA(1) factor, and (1 + Θ_1 B^4) the seasonal MA(1) factor.

Forecasting using R Seasonal ARIMA models 8

ACF and PACF plots

The seasonal part of an AR or MA model will be seen in the seasonal lags of the ACF and PACF.

An ARIMA(0,0,0)(1,0,0)_12 model will show:

Exponential decay in the seasonal lags of the ACF: 12, 24, 36, ...
A single significant spike at lag 12 in the PACF.

An ARIMA(0,0,0)(0,0,1)_12 model will show:

Exponential decay in the seasonal lags of the PACF: 12, 24, 36, ...
A single significant spike at lag 12 in the ACF.
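The seasonal-spike pattern is easy to see in a simulation. A hedged sketch (my own pure-Python code; in R you would use Acf() on a simulated series), assuming a seasonal AR coefficient Φ = 0.8 for illustration:

```python
# Simulate y_t = Phi * y_{t-12} + e_t and inspect the sample ACF:
# seasonal lags 12, 24 decay roughly as Phi, Phi^2; lag 6 stays near 0.

import random

def acf(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    cov = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return cov / var

random.seed(1)
Phi = 0.8
y = [random.gauss(0, 1) for _ in range(12)]   # burn-in seed values
for _ in range(3000):
    y.append(Phi * y[-12] + random.gauss(0, 1))

print(round(acf(y, 12), 2), round(acf(y, 24), 2), round(acf(y, 6), 2))
```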

Regression with ARIMA errors

Regression models:

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t

y_t is modeled as a function of k explanatory variables x_{1,t}, ..., x_{k,t}.
Usually, we assume that n_t is white noise.
Now we want to allow n_t to be autocorrelated.

Example: n_t ~ ARIMA(1,1,1)

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t,
where (1 - φ_1 B)(1 - B) n_t = (1 - θ_1 B) e_t

and e_t is white noise.

Forecasting using R Regression with ARIMA errors 3

Residuals and errors

Example: n_t ~ ARIMA(1,1,1)

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t,
where (1 - φ_1 B)(1 - B) n_t = (1 - θ_1 B) e_t

Be careful to distinguish n_t from e_t: n_t are the "errors" and e_t are the "residuals". In ordinary regression, n_t is assumed to be white noise, and so n_t = e_t.

After differencing all variables:

y'_t = b_1 x'_{1,t} + ... + b_k x'_{k,t} + n'_t.

This is now a regression with ARMA(1,1) errors.

Forecasting using R Regression with ARIMA errors 4

Regression with ARIMA errors

Any regression with an ARIMA error can be rewritten as a regression with an ARMA error by differencing all variables with the same differencing operator as in the ARIMA model.

Original data:

y_t = b_0 + b_1 x_{1,t} + ... + b_k x_{k,t} + n_t,
where φ(B)(1 - B)^d n_t = θ(B) e_t

After differencing all variables:

y'_t = b_1 x'_{1,t} + ... + b_k x'_{k,t} + n'_t,
where φ(B) n'_t = θ(B) e_t

and y'_t = (1 - B)^d y_t, etc.

Forecasting using R Regression with ARIMA errors 5
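The "difference everything with the same operator" idea can be sketched numerically. This is my own minimal illustration (the data-generating values are assumptions, and a real analysis would use R's Arima()/auto.arima() with xreg): with an integrated (random-walk) error in levels, first-differencing both y and x leaves a stationary error, and plain least squares on the differenced data recovers the slope.

```python
# Sketch: regression with a unit-root error; difference, then OLS.

import random

random.seed(0)
n = 500
x = [0.0]
eta = [0.0]                                    # error n_t with a unit root
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 1))       # integrated regressor
    eta.append(eta[-1] + random.gauss(0, 0.5)) # integrated error

b1 = 2.0                                       # true slope (assumption)
y = [b1 * xi + ei for xi, ei in zip(x, eta)]

# Difference everything with the same operator (1 - B)
dy = [y[t] - y[t - 1] for t in range(1, n)]
dx = [x[t] - x[t - 1] for t in range(1, n)]

# OLS slope (no intercept) on the differenced data
b1_hat = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(round(b1_hat, 1))  # close to the true slope 2.0
```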

Modeling procedure

Problems with OLS and autocorrelated errors:

1. OLS is no longer the best way to compute coefficients, as it does not take account of the time relationships in the data.

2. Standard errors of coefficients are incorrect, most likely too small. This invalidates tests and prediction intervals.

The second problem is more serious because it can lead to misleading results. If the standard errors obtained using OLS are too small, some explanatory variables may appear to be significant when, in fact, they are not. This is known as "spurious regression."

Forecasting using R Regression with ARIMA errors 6

Modeling procedure

Estimation only works when all predictor variables are deterministic or stationary and the errors are stationary.

So difference stochastic variables as required until all variables appear stationary. Then fit a model with ARMA errors.

auto.arima() will handle order selection and differencing (but it only checks that the errors are stationary).

Forecasting using R Regression with ARIMA errors 7

Outline

1 Time series in R

2 Simple forecasting methods

3 Measuring forecast accuracy

4 Seasonality and stationarity

5 ARIMA forecasting

6 Exponential smoothing

Time series decomposition

Y_t = f(S_t, T_t, E_t)

where Y_t = data at period t,
S_t = seasonal component at period t,
T_t = trend-cycle component at period t,
E_t = remainder (or irregular or error) component at period t.

Additive decomposition: Y_t = S_t + T_t + E_t.
Multiplicative decomposition: Y_t = S_t × T_t × E_t.

Forecasting using R Time series decomposition 21
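A classical additive decomposition can be sketched in a few lines. This is my own minimal implementation for illustration (R's decompose() does this properly, with better edge handling): trend via a centered moving average, seasonal component via period averages of the detrended series, remainder as what is left over.

```python
# Sketch of classical additive decomposition: Y_t = S_t + T_t + E_t.

def decompose_additive(y, m):
    n = len(y)
    half = m // 2
    # Centered moving average of order m (a 2xm MA when m is even)
    trend = [None] * n
    for t in range(half, n - half):
        if m % 2 == 0:
            s = (0.5 * y[t - half] + sum(y[t - half + 1:t + half])
                 + 0.5 * y[t + half])
        else:
            s = sum(y[t - half:t + half + 1])
        trend[t] = s / m
    # Average detrended value for each season, centered to sum to zero
    detr = [y[t] - trend[t] for t in range(n) if trend[t] is not None]
    buckets = [[] for _ in range(m)]
    for i, v in enumerate(detr):
        buckets[(i + half) % m].append(v)   # first valid t is t = half
    seas = [sum(b) / len(b) for b in buckets]
    mean_s = sum(seas) / m
    seas = [s - mean_s for s in seas]
    seasonal = [seas[t % m] for t in range(n)]
    remainder = [y[t] - trend[t] - seasonal[t] if trend[t] is not None
                 else None for t in range(n)]
    return trend, seasonal, remainder

# Quarterly series with a known linear trend and seasonal pattern
m = 4
pattern = [3.0, -1.0, -2.0, 0.0]
y = [t + pattern[t % m] for t in range(24)]
trend, seasonal, remainder = decompose_additive(y, m)
print([round(s, 2) for s in seasonal[:4]])  # recovers [3.0, -1.0, -2.0, 0.0]
```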

Time series decomposition

An additive model is appropriate if the magnitude of the seasonal fluctuations does not vary with the level.

If the seasonal fluctuations are proportional to the level of the series, then a multiplicative model is appropriate.

Multiplicative decomposition is more prevalent with economic series.

Logs turn a multiplicative relationship into an additive relationship:

Y_t = S_t × T_t × E_t  ⇒  log Y_t = log S_t + log T_t + log E_t.

Forecasting using R Time series decomposition 22

Seasonal adjustment

A useful by-product of decomposition: an easy way to calculate seasonally adjusted data.

Additive decomposition: seasonally adjusted data given by

Y_t - S_t = T_t + E_t

Multiplicative decomposition: seasonally adjusted data given by

Y_t / S_t = T_t × E_t

Forecasting using R Seasonal adjustment 42

Forecasting and decomposition

Forecast the seasonal component by repeating the last year.

Forecast the seasonally adjusted data using a non-seasonal time series method, e.g.:

Holt's method (next topic)
Random walk with drift model

Combine forecasts of the seasonal component with forecasts of the seasonally adjusted data to get forecasts of the original data.

Sometimes a decomposition is useful just for understanding the data before building a separate forecasting model.

Forecasting using R Forecasting and decomposition 46

Simple methods

Random walk forecasts:

y_{T+1|T} = y_T

Average forecasts:

y_{T+1|T} = (1/T) Σ_{t=1}^{T} y_t

We want something in between that weights the most recent data more highly. Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially.

Forecasting using R Simple exponential smoothing 3
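The two baselines are one-liners. A small sketch (mine, not the lecture's code; in R these are naive() and meanf()):

```python
# The two extremes that exponential smoothing interpolates between.

def naive_forecast(y):
    """Random walk forecast: y_{T+1|T} = y_T."""
    return y[-1]

def mean_forecast(y):
    """Average forecast: y_{T+1|T} = (1/T) * sum(y_t)."""
    return sum(y) / len(y)

y = [12.0, 14.0, 13.0, 15.0, 16.0]
print(naive_forecast(y))  # 16.0 (all weight on the last observation)
print(mean_forecast(y))   # 14.0 (equal weight on every observation)
```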

Simple Exponential Smoothing

Forecast equation:

y_{T+1|T} = α y_T + α(1-α) y_{T-1} + α(1-α)^2 y_{T-2} + · · · ,

where 0 ≤ α ≤ 1.

Weights assigned to observations:

Observation   α = 0.2        α = 0.4        α = 0.6        α = 0.8
y_T           0.2            0.4            0.6            0.8
y_{T-1}       0.16           0.24           0.24           0.16
y_{T-2}       0.128          0.144          0.096          0.032
y_{T-3}       0.1024         0.0864         0.0384         0.0064
y_{T-4}       (0.2)(0.8)^4   (0.4)(0.6)^4   (0.6)(0.4)^4   (0.8)(0.2)^4
y_{T-5}       (0.2)(0.8)^5   (0.4)(0.6)^5   (0.6)(0.4)^5   (0.8)(0.2)^5

Forecasting using R Simple exponential smoothing 4

Simple Exponential Smoothing

Weighted average form:

y_{t+1|t} = α y_t + (1-α) y_{t|t-1}

for t = 1, ..., T, where 0 ≤ α ≤ 1 is the smoothing parameter.

The process has to start somewhere, so we let the first forecast of y_1 be denoted by ℓ_0. Then

y_{2|1} = α y_1 + (1-α) ℓ_0
y_{3|2} = α y_2 + (1-α) y_{2|1}
y_{4|3} = α y_3 + (1-α) y_{3|2}
...
y_{T+1|T} = α y_T + (1-α) y_{T|T-1}

Forecasting using R Simple exponential smoothing 5
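The recursion above is a few lines of code. A minimal sketch (my own implementation; R's ses()/ets() estimate α and ℓ_0 rather than taking them as given), also checking that the recursion matches the explicit exponentially weighted average:

```python
# Simple exponential smoothing by the weighted-average recursion.

def ses_forecast(y, alpha, l0):
    """One-step forecast y_{T+1|T} via
    y_{t+1|t} = alpha*y_t + (1-alpha)*y_{t|t-1}, starting at l0."""
    f = l0
    for obs in y:
        f = alpha * obs + (1 - alpha) * f
    return f

y = [10.0, 12.0, 11.0, 13.0]

# alpha = 1 reproduces the random-walk (naive) forecast
print(ses_forecast(y, 1.0, 0.0))  # 13.0

# Equivalent explicit form:
# sum_j alpha*(1-alpha)^j * y_{T-j}  +  (1-alpha)^T * l0
alpha, l0 = 0.4, 10.0
T = len(y)
explicit = sum(alpha * (1 - alpha) ** j * y[T - 1 - j] for j in range(T)) \
           + (1 - alpha) ** T * l0
print(abs(ses_forecast(y, alpha, l0) - explicit) < 1e-12)  # True
```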

Simple Exponential Smoothing

y_{t+1|t} = α y_t + (1-α) y_{t|t-1}

Substituting each equation into the following equation:

y_{3|2} = α y_2 + (1-α) y_{2|1}
        = α y_2 + (1-α) [α y_1 + (1-α) ℓ_0]
        = α y_2 + α(1-α) y_1 + (1-α)^2 ℓ_0

y_{4|3} = α y_3 + (1-α) [α y_2 + α(1-α) y_1 + (1-α)^2 ℓ_0]
        = α y_3 + α(1-α) y_2 + α(1-α)^2 y_1 + (1-α)^3 ℓ_0
...
y_{T+1|T} = α y_T + α(1-α) y_{T-1} + α(1-α)^2 y_{T-2} + · · · + (1-α)^T ℓ_0

Exponentially weighted average:

y_{T+1|T} = Σ_{j=0}^{T-1} α(1-α)^j y_{T-j} + (1-α)^T ℓ_0

Forecasting using R Simple exponential smoothing 6

Simple exponential smoothing

Initialization

The last term in the weighted moving average is (1-α)^T ℓ_0.

So the value of ℓ_0 plays a role in all subsequent forecasts.

Its weight is small unless α is close to zero or T is small.

It is common to set ℓ_0 = y_1, but it is better to treat it as a parameter, along with α.

Forecasting using R Simple exponential smoothing 7

Simple exponential smoothing

Optimization

We can choose α and ℓ_0 by minimizing the MSE of the one-step forecasts:

MSE = (1 / (T − 1)) Σ_{t=2}^{T} (y_t − ŷ_{t|t−1})^2

Unlike regression, there is no closed-form solution, so we use numerical optimization.
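In R, ses() performs this estimation internally. As an illustrative sketch of the idea, here is a crude grid search in Python (a real implementation would use a proper numerical optimizer; both function names are hypothetical):

```python
def ses_mse(y, alpha, level0):
    # MSE of one-step forecasts: mean of (y_t - yhat_{t|t-1})^2, t = 2..T.
    yhat = alpha * y[0] + (1 - alpha) * level0   # yhat_{2|1}
    sse = 0.0
    for obs in y[1:]:
        sse += (obs - yhat) ** 2
        yhat = alpha * obs + (1 - alpha) * yhat
    return sse / (len(y) - 1)

def fit_ses(y, steps=100):
    # Crude search: a grid over alpha, a few starting values for l_0.
    candidates = [(a / steps, l0)
                  for a in range(steps + 1)
                  for l0 in (y[0], min(y), max(y), sum(y) / len(y))]
    return min(candidates, key=lambda c: ses_mse(y, *c))

y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0]
alpha, level0 = fit_ses(y)
```

Because the MSE surface in (α, ℓ_0) can be flat near the optimum, different optimizers may return slightly different parameter estimates with nearly identical fit.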

Simple exponential smoothing

Multi-step forecasts

ŷ_{T+h|T} = ŷ_{T+1|T},    h = 2, 3, …

A “flat” forecast function.
Remember, a forecast is an estimated mean of a future value.
So with no trend, no seasonality, and no other patterns, the forecasts are constant.
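In code, the flat forecast function amounts to computing the final one-step forecast and repeating it; a small sketch (the helper name is hypothetical):

```python
def ses_multi_step(y, alpha, level0, h):
    # Run the SES recursion to get yhat_{T+1|T} ...
    yhat = level0
    for obs in y:
        yhat = alpha * obs + (1 - alpha) * yhat
    # ... then hold it flat: yhat_{T+h|T} = yhat_{T+1|T} for all h >= 1.
    return [yhat] * h

print(ses_multi_step([12.0, 14.0, 13.0], alpha=0.8, level0=12.0, h=4))
```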
