
ARIMA and Neural Networks. An application to the real GNP growth rate and

the unemployment rate of U.S.A.

Eleftherios Giovanis

Abstract

This paper examines the estimation and forecasting performance of ARIMA models in comparison with some of the most popular and common neural network models. Specifically, we provide the estimation results of the AR-GRNN (generalized regression neural network), the AR-RBF (radial basis function) and the AR-MLP (multilayer perceptron). We show that the neural network models outperform the ARIMA forecasts. We find that the best model in the case of real US GNP is the AR-GRNN, while for the US unemployment rate it is the AR-MLP.

Keywords: ARIMA; Radial basis function; Multilayer perceptron; Generalized regression

neural networks; stationarity; unit root

1 Introduction

Artificial neural networks are computational networks that attempt to simulate the nerve cells, or neurons, of the biological nervous system of humans or animals (Graupe, 2007). The difference between neural networks and other estimation and approximation methods is that neural networks include hidden layers, in which the input variables or data are transformed by special functions, such as the logistic or the negative exponential, among others. With these hidden layers and synaptic functions, the approach can prove very efficient for modeling and estimating nonlinear processes (McNelis, 2005). In this paper we deal with two macroeconomic series, which are characterized by trend and cyclicality.


Aryal and Yao-Wu (2003) applied an MLP network with three hidden layers to forecast the Chinese construction industry and compared the forecasting performance of the MLP networks with that of ARIMA. They found that the RMSE of the MLP estimation is 49 percent lower than the ARIMA counterpart. Maasoumi et al. (1996) applied a back-propagation ANN model to forecast, among other series, GDP and the unemployment rate. The network they apply is a single-hidden-layer feedforward network. Swanson and White (1997a, 1997b) applied neural networks to forecast nine seasonally adjusted US macroeconomic time series and found that neural networks generally outperform the linear models. Tkacz and Hu (1999) applied neural networks to forecast Canadian GDP growth at the 4-quarter horizon and found that the improvement in forecast accuracy is statistically significant, while the performance at the 1-quarter horizon is poor. They also found that the best neural network models outperform the best linear models by 15 to 19 percent at the 4-quarter horizon. Tkacz (2001) found that neural networks produce lower forecasting errors for the yearly growth rate of real Canadian GDP relative to linear and univariate models.

2 Data

The data are quarterly series of the real gross national product (GNP) and the unemployment rate for the US economy over the period 1948-2006. The data have been obtained from the Federal Reserve Bank of St. Louis.


3 Methodology

a. Autoregressive moving average

The first model we estimate is the ARMA model, whose process (Gujarati, 2004) is defined as

$y_t = \theta + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q}$    (1)

This is the ARMA(p,q) process. If the series are not stationary in levels, which means that they are not I(0), then we have to estimate an ARIMA(p,d,q) process (Gujarati, 2004).
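As an illustration, a minimal sketch of fitting such a process in Python with statsmodels is given below; the file name and column are hypothetical placeholders, not the paper's actual data handling.

```python
# Minimal sketch: fitting an ARIMA(p,d,q) process with statsmodels.
# "gnp_growth.csv" and the "growth" column are hypothetical placeholders.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

y = pd.read_csv("gnp_growth.csv", index_col=0, parse_dates=True)["growth"]

# ARMA(1,0) on a level-stationary series is order=(1, 0, 0);
# an ARIMA(2,1,3) on a series needing one difference is order=(2, 1, 3).
fit = ARIMA(y, order=(1, 0, 0)).fit()
print(fit.summary())

# Out-of-sample forecast, e.g. five quarters ahead (2007:Q1-2008:Q1).
print(fit.forecast(steps=5))
```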

b. Generalized Regression Neural Networks

The GRNN is defined as

$E[y \mid x] = \dfrac{\int_{-\infty}^{\infty} y\, g(x,y)\, dy}{\int_{-\infty}^{\infty} g(x,y)\, dy}$    (2)

where $E[y \mid x]$ is the expected value of y given x and g(x,y) is the Parzen probability density estimator. If the value of g(x,y) is unknown, then it can be estimated from a sample of observations of x and y.

The predicted output obtained by the GRNN is

$\hat{y}(x) = \dfrac{\sum_{i=1}^{n} y_i \exp\!\left(-\dfrac{\|x - x_i\|^2}{2\sigma^2}\right)}{\sum_{i=1}^{n} \exp\!\left(-\dfrac{\|x - x_i\|^2}{2\sigma^2}\right)}$    (3)
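A minimal NumPy sketch of this prediction rule, assuming training arrays x_train and y_train and a user-chosen smoothing parameter sigma, is:

```python
# Minimal sketch of the GRNN prediction rule in equation (3).
# x_train (n x p), y_train (n,) and sigma are illustrative assumptions.
import numpy as np

def grnn_predict(x, x_train, y_train, sigma):
    """Kernel-weighted average of the training targets, as in equation (3)."""
    d2 = np.sum((x_train - x) ** 2, axis=1)        # ||x - x_i||^2
    theta = np.exp(-d2 / (2.0 * sigma ** 2))       # pattern-layer outputs
    return np.dot(theta, y_train) / np.sum(theta)  # weighted sum / simple sum
```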

Usually the GRNN consists of four layers. In the first layer, the input layer, the synaptic and activation functions are linear. In the second layer, the pattern layer, the synaptic function is radial and the activation function is the negative exponential. The third layer, the summation layer, has linear synaptic and activation functions, like the first layer. The last layer, the output layer, has a division as its synaptic function and a linear activation function. More specifically, the input layer receives the input vector X and distributes the data to the pattern layer. Each neuron in the pattern layer generates an output

$\theta_i = \exp\!\left(-\dfrac{\|x - x_i\|^2}{2\sigma^2}\right)$

and presents the result to the summation layer. In this layer the numerator and denominator neurons compute the weighted and simple sums based on the values of w and θ: the numerator is $S_j = \sum_i w_{ij}\theta_i$ and the denominator is $S_d = \sum_i \theta_i$. In the output layer the outputs are computed as $Y_j = S_j / S_d$. We must mention that the hidden layer consists of 24 units. The smoothing rate for GNP is set at 0.01 and for the unemployment rate at 0.05, based on the lowest training and testing errors. In our case we propose the AR-GRNN model (Li et al., 2007), in which the output is the vector of data $y_t$ and the inputs are the lagged data $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$. So the general form of the AR-GRNN is defined as

$y_t = F(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$    (4)

where F is a function produced by the GRNN network. In the case of unemployment, however, we consider the first differences, because the series is probably not stationary, as the KPSS test indicates, so we apply the following AR(p) function:

$\Delta y_t = F(\Delta y_{t-1}, \Delta y_{t-2}, \ldots, \Delta y_{t-p})$    (5)

We apply relations (4) and (5) for all the neural network models; specifically, we apply an AR(1) for GNP and an AR(2) for the first differences of the unemployment rate. The technique we adopt is the following. Suppose that we have quarterly output data for a period, e.g. 1948:Q1-2006:Q4, which is the variable $y_t$. If we have an AR(1), then we obtain $y_{t-1}$, which is the output data with one lag. But this lag refers again to the same data for the period 1948:Q2-2007:Q1, which means that we do not discard the last observation but roll it forward to the next period. The same process is followed for the AR(2). So in this paper we estimate over the period 1948:Q1-2006:Q4 and then make the forecast for the period 2007:Q1-2008:Q1. This definition is applied also to the other two neural network models. Figure 1 presents a general GRNN architecture. In all neural network estimations the training sample is set to the period 1948:Q1-1990:Q4 and the testing sample to the period 1991:Q1-2006:Q4.
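A minimal sketch of this lag construction (the function and array names are illustrative) is:

```python
# Minimal sketch of building AR(p) input-output pairs for the neural models:
# the target is y_t and the inputs are the lagged values y_{t-1}, ..., y_{t-p}.
# `series` is an illustrative 1-D array of quarterly observations.
import numpy as np

def make_ar_inputs(series, p):
    """Return (X, y) where row t of X holds [y_{t-1}, ..., y_{t-p}]."""
    X = np.column_stack([series[p - k: len(series) - k] for k in range(1, p + 1)])
    y = series[p:]
    return X, y
```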

[Figure 1. General GRNN architecture: input layer (X1 ... Xk), pattern layer, summation layer with numerator and denominator units, output layer (Y1 ... YJ).]


c. Radial Basis Function

The radial basis function is defined (Bishop, 1995) as

$y_k(x) = \sum_{j=1}^{M} w_{kj}\,\phi_j(x) + w_{k0}$    (6)

where $w_{kj}$ are the weights, $w_{k0}$ are the biases and $\phi_j(x)$ can be estimated by

$\phi_j(x) = \exp\!\left(-\dfrac{\|x - \mu_j\|^2}{2\sigma_j^2}\right)$    (7)

The RBF consists of three layers: the input layer, whose synaptic and activation functions are linear; the hidden layer, where the synaptic and activation functions are radial and negative exponential respectively; and the output layer, which has linear synaptic and activation functions, as in the case of the input layer. The hidden layer in the RBF estimation has 11 units. The radial parameter for GNP and for the unemployment rate has been set at 50, based on the lowest training and testing errors, as in the case of the GRNN estimation. The definition of the AR-RBF function is applied as in the AR-GRNN case. We present a general RBF illustration in figure 2.
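A minimal sketch of the forward pass in equations (6)-(7), with illustrative centers, widths, weights and bias (in practice fitted on the training sample), is:

```python
# Minimal sketch of the RBF forward pass in equations (6)-(7).
# centers (mu_j), sigmas (sigma_j), weights (w_kj) and bias (w_k0) are
# illustrative assumptions; in practice they are fitted on the training data.
import numpy as np

def rbf_forward(x, centers, sigmas, weights, bias):
    """y(x) = sum_j w_j * phi_j(x) + w_0 with Gaussian basis functions."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # ||x - mu_j||^2
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))   # equation (7)
    return np.dot(weights, phi) + bias        # equation (6)
```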

d. Multilayer perceptron

The last model we estimate is the multilayer perceptron (MLP), which has two differences in relation to the RBF (McNelis, 2005). First, the RBF has at most one hidden layer, while the MLP can have more. Second, the activation function in the RBF computes the Euclidean distance between the signal from the input vector and the center of that unit, while the MLP computes the inner product of the inputs and the weights for that unit.

The first layer, the input layer, in the MLP has linear synaptic and activation functions, as does the last layer, the output layer. The hidden layers, which in our case are three, have linear synaptic functions and hyperbolic activation functions. For networks with binary units it has been shown that an MLP with one hidden layer suffices, but in our case we have continuous variables, so we prefer three hidden layers.

[Figure 2. General RBF architecture: input x, weights, radial basis functions, linear weights, output.]

In the first phase the back-propagation method is applied. Each layer consists of units that receive input from the units of the layer directly below and send their output to the units of the layer directly above. The $N_i$ inputs are fed into the first layer of $N_{h,1}$ hidden units (Krose & Smagt, 1996). The mathematics of the back-propagation method is as follows.

The activation of unit k for pattern t is a function of its net input,

$y_k^t = f(s_k^t)$    (1)

where

$s_k^t = \sum_j w_{jk}\, y_j^t + \theta_k$    (2)

To get the delta rule we must set

$\Delta_t w_{jk} = -\gamma\, \dfrac{\partial E^t}{\partial w_{jk}}$    (3)

The error measure $E^t$ is defined as the total squared error for pattern t at the output units:

$E^t = \dfrac{1}{2} \sum_{i=1}^{N_o} (d_i^t - y_i^t)^2$    (4)

where $d_i^t$ is the desired output for unit i and pattern t. Then we can write, by the chain rule,

$\dfrac{\partial E^t}{\partial w_{jk}} = \dfrac{\partial E^t}{\partial s_k^t}\, \dfrac{\partial s_k^t}{\partial w_{jk}}$    (5)

By equation (2) we find that the second factor on the right-hand side of equation (5) is equal to

$\dfrac{\partial s_k^t}{\partial w_{jk}} = y_j^t$    (6)

and we define the first factor as

$\delta_k^t = -\dfrac{\partial E^t}{\partial s_k^t}$    (7)

so equation (3) can be written as

$\Delta_t w_{jk} = \gamma\, \delta_k^t\, y_j^t$    (8)

Then, to compute $\delta_k^t$, we write the partial derivative, by applying the chain rule, as the product of two factors. One factor in relation (9) reflects the change in the error as a function of the output of the unit, while the other reflects the change in the output as a function of changes in the input. Relation (9) is defined as

$\delta_k^t = -\dfrac{\partial E^t}{\partial s_k^t} = -\dfrac{\partial E^t}{\partial y_k^t}\, \dfrac{\partial y_k^t}{\partial s_k^t}$    (9)

We know that the second factor of (9) is

$\dfrac{\partial y_k^t}{\partial s_k^t} = f'(s_k^t)$    (10)

which is the derivative of the function f for the kth unit. For the computation of the first factor we first assume that k = i is an output unit. Then we have

$\dfrac{\partial E^t}{\partial y_i^t} = -(d_i^t - y_i^t)$    (11)

and therefore

$\delta_i^t = (d_i^t - y_i^t)\, f'(s_i^t)$    (12)

for any output unit i. Second, if k = h is a hidden unit and not an output unit, then the error measure can be written as a function of the net inputs from the hidden to the output layer, and we use the chain rule:

$\dfrac{\partial E^t}{\partial y_h^t} = \sum_{i=1}^{N_o} \dfrac{\partial E^t}{\partial s_i^t}\, \dfrac{\partial s_i^t}{\partial y_h^t} = \sum_{i=1}^{N_o} \dfrac{\partial E^t}{\partial s_i^t}\, w_{hi} = -\sum_{i=1}^{N_o} \delta_i^t\, w_{hi}$    (13)

$\delta_h^t = f'(s_h^t) \sum_{i=1}^{N_o} \delta_i^t\, w_{hi}$    (14)

In the first phase we use the back-propagation method. In the second phase we use the Levenberg-Marquardt algorithm (Bishop, 1995). Suppose that we have the error function

$E = \dfrac{1}{2} \sum_n \varepsilon_n^2$    (15)

where $\varepsilon_n$ is the error for the nth pattern. We set $W_A$ as the old weight space and $W_B$ as the new weight space. Then we can expand the error vector ε to first order in a Taylor series:

$\varepsilon(W_B) = \varepsilon(W_A) + Z\,(W_B - W_A)$    (16)

where Z is a matrix defined as

$(Z)_{ni} = \dfrac{\partial \varepsilon_n}{\partial w_i}$    (17)

So the error function (15) can be written as

$E = \dfrac{1}{2}\, \|\varepsilon(W_A) + Z\,(W_B - W_A)\|^2$    (18)
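For concreteness, a minimal NumPy sketch of one back-propagation update for a single hidden layer with a tanh activation, implementing equations (8), (12) and (14), is given below; the weight shapes and learning rate are illustrative, and the paper's actual network has three hidden layers.

```python
# Minimal sketch of one back-propagation step, equations (8), (12), (14),
# for a single hidden layer with tanh activation; shapes and gamma illustrative.
import numpy as np

def backprop_step(x, d, W1, W2, gamma=0.01):
    # Forward pass: net inputs s and activations y, equations (1)-(2).
    s_h = W1 @ x
    y_h = np.tanh(s_h)
    s_o = W2 @ y_h
    y_o = np.tanh(s_o)
    # Output deltas: delta_i = (d_i - y_i) f'(s_i), equation (12).
    delta_o = (d - y_o) * (1.0 - y_o ** 2)
    # Hidden deltas: delta_h = f'(s_h) sum_i delta_i w_hi, equation (14).
    delta_h = (1.0 - y_h ** 2) * (W2.T @ delta_o)
    # Delta rule: Delta w_jk = gamma * delta_k * y_j, equation (8).
    W2 += gamma * np.outer(delta_o, y_h)
    W1 += gamma * np.outer(delta_h, x)
    return W1, W2
```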


In this paper we estimate an MLP network with three hidden layers and three units in each of them. The learning rate is set at 0.01 and the momentum at 0.3. In the first phase the number of epochs is 100 and in the second phase it is 500. The AR-MLP is defined as in the other two neural network models, the AR-GRNN and the AR-RBF. Figure 3 presents a general MLP illustration with three hidden layers.

[Figure 3. MLP architecture with three hidden layers: N_i inputs, hidden layers N_{h,1}, N_{h,2}, N_{h,3}, and N_o outputs.]

We will also apply unit root tests to examine whether the series are I(0) or not, in other words whether they are stationary in levels or only in first differences and above. We apply these tests to determine whether we have an ARMA(p,q) or an ARIMA(p,d,q) process. We apply two tests, the DF GLS (Greene, 2003) and the KPSS (Kwiatkowski et al., 1992) tests. For the DF GLS test we examine the regression with constant and trend,

$y_t = \alpha + \delta t + \phi\, y_{t-1} + \varepsilon_t$    (19)

and we test the hypotheses

H0: φ = 1, δ = 0  =>  y_t ~ I(1) with drift
H1: |φ| < 1  =>  y_t ~ I(0) with deterministic time trend

which means that if we do not reject the null hypothesis the series is non-stationary in levels, so it is I(1), whereas if we reject the null hypothesis the series is stationary, I(0). For the KPSS test we have the hypotheses

H0: stationary

H1: non-stationary

The KPSS test is based on the residuals from the OLS regression of $y_t$ on the exogenous variables. Specifically, it is

$y_t = \alpha + \beta t + \gamma Z_t$    (20)

where $Z_t$ is a random walk. If γ equals zero, then the process is level-stationary if β = 0 and trend-stationary if β ≠ 0. Let $e_t$ denote the OLS residuals, $e_t = y_t - \hat{\alpha} - \hat{\beta} t$. The KPSS statistic is

$KPSS = \dfrac{\sum_{t=1}^{T} S_t^2}{T^2\, \hat{\sigma}^2}$

where $S_t = \sum_{s=1}^{t} e_s$ are the partial sums of the residuals and $\hat{\sigma}^2$ is the long-run variance estimator

$\hat{\sigma}^2 = \dfrac{1}{T} \sum_{t=1}^{T} e_t^2 + \dfrac{2}{T} \sum_{\tau=1}^{l} \left(1 - \dfrac{\tau}{l+1}\right) \sum_{t=\tau+1}^{T} e_t\, e_{t-\tau}$
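As a minimal sketch, these tests can be run in Python with statsmodels; note that adfuller below is the plain ADF test, since the DF GLS variant used here is not in statsmodels (it is available, for example, in the arch package), and `y` is the illustrative series from the earlier sketch.

```python
# Minimal sketch of unit root testing with statsmodels (ADF and KPSS).
# adfuller is the plain ADF test; the paper's DF GLS variant is available
# elsewhere (e.g. the arch package). `y` is an illustrative series.
from statsmodels.tsa.stattools import adfuller, kpss

adf_stat, adf_p, *_ = adfuller(y, regression="ct")           # H0: unit root
kpss_stat, kpss_p, _, kpss_crit = kpss(y, regression="ct")   # H0: stationary
print(f"ADF:  stat={adf_stat:.3f}, p={adf_p:.3f}")
print(f"KPSS: stat={kpss_stat:.3f}, p={kpss_p:.3f}, critical={kpss_crit}")
```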

To compare the forecasting performance of the models we examine, we apply two statistical measures, the RMSE (root mean squared error) and the MAE (mean absolute error).
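For reference, with actual values $y_t$, forecasts $\hat{y}_t$ and n forecast periods, these measures are defined as

$RMSE = \sqrt{\dfrac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}, \qquad MAE = \dfrac{1}{n}\sum_{t=1}^{n}|y_t - \hat{y}_t|$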


4 Results

Table 1

Unit root tests for real GNP and unemployment rate of USA

Test      Series      Statistic           Critical values
DF GLS    GNP         t-stat   -10.466    -3.46 (1%), -2.92 (5%), -2.62 (10%)
DF GLS    Un. rate    t-stat   -2.43      -3.46 (1%), -2.92 (5%), -2.62 (10%)
KPSS      GNP         LM-stat   0.0299     0.216 (1%), 0.146 (5%), 0.119 (10%)
KPSS      Un. rate    LM-stat   0.2375     0.216 (1%), 0.146 (5%), 0.119 (10%)

From table 1 we conclude that real GNP is I(0), i.e. stationary in levels, according to both tests. For the unemployment rate we conclude from the KPSS test that the series is I(1); its first difference is stationary, as we can see from table 2.

Table 2

KPSS unit root test for first difference of unemployment rate

Test    Series      LM-stat    Critical values
KPSS    Un. rate    0.0338     0.216 (1%), 0.146 (5%), 0.119 (10%)

According to the three information criteria, Akaike, Hannan-Quinn and Schwarz, we have an ARMA(1,0) process for GNP and an ARIMA(2,1,3) for the unemployment rate. So we apply an AR(1) for the three neural networks in the case of GNP and an AR(2) for the unemployment rate. From table 3 we conclude that the neural network modeling is better, with the AR-GRNN having the lowest RMSE and MAE, so we prefer neural networks for the forecasting of the real GNP of the USA. Specifically, we find that the forecasting RMSE of the neural network models is 7 to 17 percent lower than the ARIMA counterpart and the MAE is 9 to 22 percent lower than the MAE of ARIMA.

Table 3

Forecasting comparison between ARIMA and neural networks for the real GNP of USA for the period

2007:Q1-2008:Q1

Model RMSE MAE

ARMA(1,0) 0.554 0.502

GRNN 0.460 0.393

RBF 0.500 0.433

MLP 0.515 0.455

In table 4 the conclusions are almost the same as those for GNP. The neural network modeling is again more reliable, and these models present lower RMSE and MAE than the ARIMA(2,1,3). In particular, the AR-MLP and then the AR-GRNN are the best models. In the case of unemployment, the RMSE and MAE of the neural networks are respectively 45 to 62 and 56 to 67 percent lower than the ARIMA counterparts. In table 5 we present the actual values of real US GNP and the predicted values generated by the four models.

Table 4

Forecasting comparison between ARIMA and neural networks for the unemployment rate of USA for the period 2007:Q1-2008:Q1

Model           RMSE     MAE
ARIMA(2,1,3)    0.217    0.202
GRNN            0.107    0.089
RBF             0.120    0.084
MLP             0.081    0.066


Table 5

Forecasting values for GNP with the four models

Period     Actual    ARMA(1,0)    GRNN     RBF      MLP
2007:Q1    0.164     0.76379      0.860    0.513    0.729
2007:Q2    0.983     0.80734      0.390    0.941    0.893
2007:Q3    1.411     0.82153      1.001    0.695    0.936
2007:Q4    0.462     0.82615      0.209    0.974    0.790
2008:Q1    0.044     0.82765      0.060    0.643    0.864

Table 6

Forecasting values for the unemployment rate with ARIMA(2,1,3) and neural networks

Period     Actual    ARIMA(2,1,3)    GRNN      RBF       MLP
2007:Q1     0.567     0.390           0.659     0.629     0.550
2007:Q2    -0.367    -0.176          -0.300    -0.328    -0.280
2007:Q3     0.234     0.391           0.197     0.269     0.384
2007:Q4    -0.100    -0.232           0.100     0.155    -0.144
2008:Q1     0.700     0.342           0.749     0.722     0.667

In table 6 we present the actual and predicted first differences of US unemployment with the ARIMA(2,1,3) and the three neural network models. In figure 4 we present the forecasts for US real GNP during the period 2007:Q1-2008:Q1, while figure 5 presents the forecasting results for US unemployment over the same period.

[Figure 4. Actual against forecast values for US real GNP in the period 2007:Q1-2008:Q1 with: (a) ARMA(1,0), (b) GRNN, (c) RBF and (d) MLP.]

[Figure 5. Actual against forecast values for US unemployment first differences in the period 2007:Q1-2008:Q1 with: (a) ARIMA(2,1,3), (b) GRNN, (c) RBF and (d) MLP.]


5 Conclusion

We examined the forecasting performance of the traditional time series method, the ARIMA process, in comparison with three neural network models. We proposed three of the most common models: the generalized regression neural network (GRNN), the radial basis function (RBF) and the multilayer perceptron (MLP). We used the autoregressive (AR) form of these neural models, which means that the input data are simply the output data with time lags. We configured the AR(p) order as indicated by the unit root tests and the information criteria, giving an AR(1) for the real gross national product (GNP) and an AR(2) for the unemployment rate of the US economy. We showed that all the neural models outperform the ARIMA process, so we conclude that traditional time series and econometric methods are not always the best or the only choice; we should look out for more sophisticated modeling, such as neural network modeling, which is able to capture non-linear processes with great success.

REFERENCES

Aryal, R.D. & Yao-Wu, W. (2003). Neural network forecasting of the production level of the Chinese construction industry. Journal of Comparative International Management, 29, 319-33.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition, pp. 164-170, 290-291. Oxford: Clarendon Press.

Graupe, D. (2007). Principles of Artificial Neural Networks, 2nd Edition, p. 1. USA: World Scientific Publishing.

Greene, W.H. (2003). Econometric Analysis, Fifth Edition, pp. 637-640. New Jersey: Pearson Education.

Gujarati, D. (2004). Basic Econometrics, Fourth Edition, pp. 839-840. USA: McGraw-Hill.

Krose, B. & van der Smagt, P. (1996). An Introduction to Neural Networks, Eighth Edition, pp. 33-37. The University of Amsterdam.

Kwiatkowski, D., Phillips, P.C.B., Schmidt, P. & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54, 159-178.

Li, W., Luo, Y., Zhu, Q., Liu, J. & Le, J. (2007). Applications of AR*-GRNN model for financial time series forecasting. Neural Computing & Applications. London: Springer.

Maasoumi, E., Khotanzad, A. & Abaye, A. (1996). Artificial neural networks for some macroeconomic series: a first report. Econometric Reviews, 13(1), 105-122.

McNelis, P.D. (2005). Neural Networks in Finance: Gaining Predictive Edge in the Market, p. 21. USA: Elsevier Academic Press.

Swanson, N.R. & White, H. (1997a). A model selection approach to real-time macroeconomic forecasting using linear models and artificial neural networks. Review of Economics and Statistics, 79, 540-50.

Swanson, N.R. & White, H. (1997b). Forecasting economic time series using adaptive versus non-adaptive and linear versus nonlinear econometric models. International Journal of Forecasting, 13, 439-61.

Tkacz, G. & Hu, S. (1999). Forecasting GDP Growth Using Artificial Neural Networks. Working Paper 99-3, Bank of Canada.

Tkacz, G. (2001). Neural network forecasting of Canadian GDP growth. International Journal of Forecasting, 17, 57-69.