1
Stat 5100 Handout #12.d – SAS: ARIMA Dependence Structures (Unit 7)
Example 1: 57 daily ‘overshorts’ from an underground gasoline tank in Colorado; see
Brockwell and Davis, Introduction to Time Series and Forecasting, Example 3.2.8;
overshort for day t is
Zt = (amount of fuel at end of day t)
– (amount of fuel at end of day t-1)
– (amount of fuel delivered during day t)
+ (amount of fuel sold during day t)
(With no measurement error and no tank leaks, Zt=0)
/* Define options */
ods html image_dpi=300 style=journal;
data Oshort; input X @@; cards; 78 -58 53 -65 13 -6 -16 -14 3 -72 89 -48 -14 32 56 -86 -66 50 26
59 -47 -83 2 -1 124 -106 113 -76 -47 -32 39 -30 6 -73 18 2 -24 23
-38 91 -56 -58 1 14 -4 77 -127 97 10 -28 -17 23 -2 48 -131 65 -17
;
data Oshort; set Oshort;
day = _n_;
run;
/* Graphical representation; check stationarity */
proc sgplot data=Oshort noautolegend;
scatter y=X x=day /
markerattrs=(symbol=CIRCLEFILLED size=2pt);
series y=X x=day / lineattrs=(pattern=solid);
xaxis label='Day';
yaxis label='Overshort';
title1 'Daily Overshorts';
run;
2
/* Investigate potential dependence structures */
proc arima data=Oshort;
identify var=X nlag=10;
title1 'Look at SAC: MA(1)';
run;
Look at SAC: MA(1)
The ARIMA Procedure
Name of Variable = X
Mean of Working Series -4.03509
Standard Deviation 58.44414
Number of Observations 57
Autocorrelation Check for White Noise
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 20.25 6 0.0025 -0.504 0.122 -0.212 0.080 0.019 0.116
3
/* Now fit specifically as an MA(1) model */
proc arima data=Oshort;
identify var=X nlag=10;
estimate q=1 method=uls plot;
title1 'MA(1) model fit to Overshort data';
run;
MA(1) model fit to Overshort data
Unconditional Least Squares Estimation
Parameter Estimate Standard Error t Value Approx
Pr > |t|
Lag
MU -5.12443 0.35073 -14.61 <.0001 0
MA1,1 0.99999 0.26992 3.70 0.0005 1
Constant Estimate -5.12443
Variance Estimate 1996.541
Std Error Estimate 44.68267
Autocorrelation Check of Residuals
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 4.82 5 0.4379 0.119 0.131 -0.054 0.102 0.130 0.123
12 13.18 11 0.2817 -0.090 0.079 -0.210 -0.161 -0.178 -0.041
18 29.47 17 0.0304 0.098 -0.141 -0.273 -0.173 -0.207 -0.151
24 32.94 23 0.0821 -0.084 -0.071 0.068 -0.057 -0.086 0.095
Model for variable X
Estimated Mean -5.12443
Moving Average Factors
Factor 1: 1 - 0.99999 B**(1)
4
Example 2: General Electric’s gross investment (in millions of dollars) for years 1935 –
1954. Originally presented in Grunfeld, Y. (1958), "The Determinants of Corporate
Investment," Ph.D. dissertation, University of Chicago; discussed in Boot, J.C.G. (1960),
"Investment Demand: An Empirical Contribution to the Aggregation Problem,"
International Economic Review, 1, 3-30. See also Damodar N. Gujarati, Basic
Econometrics, Third Edition, 1995, McGraw-Hill, [1995, pp. 522-525].
data GE; input year GEinv @@; cards; 1935 33.1 1936 45.0 1937 77.2 1938 44.6 1939 48.1
1940 74.4 1941 113.0 1942 91.9 1943 61.3 1944 56.8
1945 93.6 1946 159.9 1947 147.2 1948 146.3 1949 98.3
1950 93.5 1951 135.2 1952 157.3 1953 179.5 1954 189.6
;
proc sgplot data=GE noautolegend;
scatter y=GEinv x=year /
markerattrs=(symbol=CIRCLEFILLED size=2pt);
series y=GEinv x=year / lineattrs=(pattern=solid);
xaxis label='Year';
yaxis label='GE gross investment (millions)';
title1 'GE gross investment';
run;
5
/* Make stationary */
proc reg data=GE noprint;
model GEinv=year; output out=a1 r=resid;
title1 'simple regression on time';
proc sgplot data=a1 noautolegend;
scatter y=resid x=year /
markerattrs=(symbol=CIRCLEFILLED size=2pt);
series y=resid x=year / lineattrs=(pattern=solid);
xaxis label='Year'; yaxis label='Residual';
title1 'GE gross investment after accounting for time';
run;
6
data GE; set GE; logGEinv=log(GEinv);
proc reg data=GE noprint;
model logGEinv=year; output out=a2 r=resid;
title1 'simple regression on time, using log';
proc sgplot data=a2 noautolegend;
scatter y=resid x=year /
markerattrs=(symbol=CIRCLEFILLED size=2pt);
series y=resid x=year / lineattrs=(pattern=solid);
xaxis label='Year'; yaxis label='Residual';
title1 'GE gross investment after accounting for time,
using log';
run;
/* Investigate potential dependence structures */
data newuse; set a2;
Z = resid;
proc arima data=newuse;
identify var=Z nlag=12 ;
title1 'Look at SPAC: AR(2)';
run;
Look at SPAC: AR(2)
The ARIMA Procedure
Autocorrelation Check for White Noise
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 20.42 6 0.0023 0.290 -0.517 -0.535 -0.070 0.310 0.225
12 21.33 12 0.0457 -0.030 -0.127 -0.040 -0.016 0.040 -0.049
7
/* Now fit specifically as an AR(2) model */
proc arima data=newuse;
identify var=logGEinv crosscorr=(year) nlag=12;
estimate p=2 input=(year) method=uls plot;
title1 'AR(2) model fit to log of GE data';
run;
AR(2) model fit to log of GE data
Unconditional Least Squares Estimation
Parameter Estimate Standard Error t Value Approx
Pr > |t|
Lag Variable Shift
MU -135.17006 14.84188 -9.11 <.0001 0 logGEinv 0
AR1,1 0.51014 0.18639 2.74 0.0146 1 logGEinv 0
AR1,2 -0.71635 0.17516 -4.09 0.0009 2 logGEinv 0
NUM1 0.07183 0.0076327 9.41 <.0001 0 year 0
8
Constant Estimate -163.042
Variance Estimate 0.044281
Std Error Estimate 0.210431
Autocorrelation Check of Residuals
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 3.11 4 0.5395 -0.176 -0.019 -0.086 -0.269 0.018 0.078
12 9.44 10 0.4910 0.122 -0.032 0.094 -0.343 0.065 -0.026
18 14.23 16 0.5815 -0.140 0.189 -0.037 0.005 0.074 -0.004
Autoregressive Factors
Factor 1: 1 - 0.51014 B**(1) + 0.71635 B**(2)
9
Example 3: Gas price data – annual averages 1976 – 2012. Variable ‘price’ is U.S.
annual average cents per gallon for unleaded (source: U.S. Energy Information
Administration); variable ‘infl76’ is inflation (relative buying power) with 1976 as base
(source: U.S. Bureau of Labor Statistics).
data gas; input year price infl76 @@; cards;
1976 61.4 100 1977 65.6 107 1978 67.0 115 1979 90.3 128
1980 124.5 145 1981 137.8 160 1982 129.6 170 1983 124.1 175
1984 121.2 183 1985 120.2 189 1986 92.7 193 1987 94.8 200
1988 94.6 208 1989 102.1 218 1990 116.4 230 1991 114.0 239
1992 112.7 247 1993 110.8 254 1994 111.2 260 1995 114.7 268
1996 123.1 276 1997 123.4 282 1998 105.9 286 1999 116.5 293
2000 151.0 303 2001 146.1 311 2002 135.8 316 2003 159.1 323
2004 188.0 332 2005 229.5 343 2006 258.9 354 2007 280.1 364
2008 326.6 378 2009 235.0 377 2010 278.2 383 2011 352.1 395
2012 361.8 404 2013 350.1 409 2014 335.0 411
; /* Note: 2014 data current through 03.03.14 */
/* Look at plots */
proc sgplot data=gas;
series x=year y=price / lineattrs=(pattern=solid);
series x=year y=infl76 / lineattrs=(pattern=dash) y2axis;
xaxis label='Year';
yaxis label='Annual Average Price of Unleaded';
y2axis label='Buying Power of 1976 Dollar';
title1 'Gas Price Data';
run;
10
/* Just out of curiousity, look at prices,
adjusted for inflation (to 2014 dollars) */
data gas; set gas;
priceNow = price / infl76 * 411 / 100;
proc sgplot data=gas;
series y=priceNow x=year / lineattrs=(pattern=solid);
yaxis label='Price of Unleaded (2014 Dollars)';
xaxis label='Year';
refline 2007 / axis=x;
title1 'Gas Price Data, adjusted for inflation';
run;
/* For demonstration purposes only,
restrict attention to pre-2007 */
data gas7; set gas;
if year < 2007;
run;
/* Make stationary */
proc reg data=gas7 noprint;
model price=infl76;
output out=out1 r=resid;
run;
/* Check stationarity and potential dependence structure */
proc arima data=out1;
identify var=resid nlag=10;
title1 'Look at SAC and SPAC plots';
run;
11
Look at SAC and SPAC plots
Autocorrelation Check for White Noise
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 23.80 6 0.0006 0.706 0.377 0.185 0.080 0.032 -0.079
/* Try some models */
proc arima data=out1;
identify var=resid nlag=10;
estimate p=1 method=uls plot;
title1 'AR(1) model fit to gas data';
run;
12
AR(1) model fit to gas data
Unconditional Least Squares Estimation
Parameter Estimate Standard Error t Value Approx
Pr > |t|
Lag
MU 25.72666 89.82249 0.29 0.7766 0
AR1,1 0.96923 0.11054 8.77 <.0001 1
Constant Estimate 0.791484
Variance Estimate 241.7949
Std Error Estimate 15.54976
Autocorrelation Check of Residuals
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 6.14 5 0.2932 0.416 -0.027 -0.002 0.036 0.061 -0.022
12 8.59 11 0.6592 -0.143 -0.112 0.023 0.116 -0.031 -0.074
13
proc arima data=out1;
identify var=resid nlag=10;
estimate q=1 method=uls plot;
title1 'MA(1) model fit to gas data';
run;
MA(1) model fit to gas data
Unconditional Least Squares Estimation
Parameter Estimate Standard Error t Value Approx
Pr > |t|
Lag
MU 0.75142 6.17848 0.12 0.9040 0
MA1,1 -0.99999 0.37396 -2.67 0.0122 1
Constant Estimate 0.751422
Variance Estimate 305.3926
Std Error Estimate 17.47549
Autocorrelation Check of Residuals
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 13.57 5 0.0186 0.446 0.402 0.038 0.134 -0.086 0.044
12 22.07 11 0.0239 -0.247 -0.119 -0.184 -0.030 -0.231 -0.130
18 37.36 17 0.0030 -0.246 -0.159 -0.188 -0.110 -0.237 -0.200
24 50.28 23 0.0008 -0.147 -0.047 0.049 0.097 0.153 0.212
14
proc arima data=out1;
identify var=resid nlag=10;
estimate p=1 q=1 method=uls plot;
title1 'ARMA(1,1) model fit to gas data';
run;
ARMA(1,1) model fit to gas data
Unconditional Least Squares Estimation
Parameter Estimate Standard Error t Value Approx
Pr > |t|
Lag
MU 11.41352 27.31787 0.42 0.6793 0
MA1,1 -0.65140 0.15763 -4.13 0.0003 1
AR1,1 0.85942 0.15923 5.40 <.0001 1
Constant Estimate 1.604542
Variance Estimate 179.4121
Std Error Estimate 13.39448
15
Autocorrelation Check of Residuals
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 0.18 4 0.9961 0.024 0.052 -0.005 0.032 0.027 0.003
12 2.30 10 0.9935 -0.099 -0.082 -0.038 0.122 -0.097 -0.048
18 8.11 16 0.9457 -0.044 -0.183 0.054 0.092 -0.148 -0.129
24 11.84 22 0.9607 -0.138 0.033 -0.064 -0.094 0.001 0.063
16
/* Now fit specifically as an ARMA(1,1) model */
proc arima data=gas7;
identify var=price crosscorr=(infl76) nlag=10;
estimate p=1 q=1 input=(infl76) method=uls plot;
title1 'ARMA(1,1) model fit to gas data';
title2 '(with inflation as predictor)';
run;
ARMA(1,1) model fit to gas data
(with inflation as predictor)
Unconditional Least Squares Estimation
Parameter Estimate Standard
Error
t Value Approx
Pr > |t|
Lag Variable Shift
MU -88.09278 116.63110 -0.76 0.4566 0 price 0
MA1,1 -0.56741 0.16480 -3.44 0.0019 1 price 0
AR1,1 0.95756 0.07686 12.46 <.0001 1 price 0
NUM1 1.02118 0.42680 2.39 0.0239 0 infl76 0
17
Example 3, continued: Gas price data – now look at differencing and forecasting
/* Still restricting attention to pre-2007 */
/* Need to add rows for 'future' predictor variables */
data temp; input year @@; cards;
2007 2008 2009 2010 2011 2012 2013 2014
;
data gas7; set gas7 temp;
run;
/* Look at differencing */
data gas7; set gas7;
Z = price - lag(price);
proc sgplot data=gas7;
series y=Z x=year;
title1 'Gas prices: first differences';
run;
/* Add year and year^2 terms to remove curvilinearity */
data gas7; set gas7;
year1 = year-1975;
year2 = (year-1975)**2;
proc reg data=gas7 noprint;
model Z = year1 year2;
output out=out1 r=resid;
proc sgplot data=out1;
series y=resid x=year;
title1 'Gas prices after differencing and time effects';
run;
18
/* Check stationarity and potential dependence structure */
proc arima data=out1;
identify var=resid nlag=12;
title1 'Look at SAC and SPAC plots';
title2 '(after differencing and time effects');
run;
Look at SAC and SPAC plots
(after differencing and time effects)
Autocorrelation Check for White Noise
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 10.21 6 0.1159 0.161 -0.425 -0.197 0.083 0.053 -0.188
12 13.19 12 0.3554 -0.170 -0.024 0.109 0.144 0.004 -0.062
19
/* Fit tentative model: ARIMA(2,1,0) with two covariates */
proc arima data = gas7;
identify var = price (1) crosscorr = (year1 year2)
nlag=12 ;
estimate p = 2 q = 0 input = (year1 year2)
method = uls plot;
forecast lead=6 alpha=.10 noprint out=fout1;
title1 'Tentative model: ARIMA(2,1,0)';
run;
Tentative model: ARIMA(2,1,0)
Unconditional Least Squares Estimation
Parameter Estimate Standard Error t Value Approx
Pr > |t|
Lag Variable Shift
MU 24.60820 6.99701 3.52 0.0017 0 price 0
AR1,1 0.23270 0.17614 1.32 0.1984 1 price 0
AR1,2 -0.52555 0.17708 -2.97 0.0065 2 price 0
NUM1 -3.70280 0.96942 -3.82 0.0008 0 year1 0
NUM2 0.12424 0.02869 4.33 0.0002 0 year2 0
20
Constant Estimate 31.81473
Variance Estimate 148.1016
Std Error Estimate 12.1697
Autocorrelation Check of Residuals
To Lag Chi-Square DF Pr > ChiSq Autocorrelations
6 3.28 4 0.5128 -0.023 -0.018 -0.012 -0.214 -0.084 -0.179
12 5.31 10 0.8696 -0.109 -0.068 0.058 0.140 -0.043 -0.054
18 10.73 16 0.8258 0.067 -0.120 0.162 0.173 -0.082 0.031
24 16.66 22 0.7819 -0.114 0.080 -0.079 -0.135 -0.012 0.091
21
/* Note that since year wasn't a predictor,
it won't appear in fout1;
so we'll need to add it 'by hand' before making the plot
*/
data fout1; set fout1;
year = _n_ + 1975;
if year=2007 then price=280.1;
if year=2008 then price=326.6;
if year=2009 then price=235.0;
if year=2010 then price=278.2;
if year=2011 then price=352.1;
if year=2012 then price=361.8;
if year=2013 then price=350.1;
if year=2014 then price=335.0;
run;
proc sgplot data=fout1;
series x=year y=price /
lineattrs=(pattern=solid thickness=5);
series x=year y=forecast / lineattrs=(pattern=solid);
series x=year y=l90 / lineattrs=(pattern=dash);
series x=year y=u90 / lineattrs=(pattern=dash);
refline 2007 / axis=x;
xaxis label='Year';
yaxis label='Gas Price';
title1 'Pre-2007 gas data: ARIMA(2,1,0)';
title2 'Forecast with 90 percent confidence intervals';
run;