

Kartikeya Jha, Balaji Ponnu & Shriniwas S. Arkatkar

Time Series Analysis: A Contemporary Approach to Traffic Volume Forecasting

Date of Submission: 31st July 2012
Word Count: (4326 words + 12 × 250) = 7326 words

Authors:

Kartikeya Jha
Undergraduate Student, Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Mailing Address: Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Telephone No.: 09789856265
Email: [email protected]

Balaji Ponnu
Former Post Graduate Student, Transportation Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Mailing Address: Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Telephone No.: 07598231953
Email: [email protected]

Shriniwas S. Arkatkar*
Assistant Professor, Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Mailing Address: Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Telephone No.: 08058321357
Fax: +91-01596-244183
E-mail: [email protected]

*Corresponding Author: Shriniwas S. Arkatkar ([email protected])


Abstract

Traffic forecasting is the process of estimating the number of future users of different transportation facilities, in terms of number of vehicles or people. It is an indispensable element of transportation planning and engineering. For developing nations, assessing the accuracy of different forecasting techniques is imperative for meaningful allocation of scarce resources like land, labor and money. This work reviews the cardinal issues that surround this essential and challenging field of study. Stress has been laid on Time Series Analysis, a relatively contemporary approach to forecasting, especially in the field of transportation engineering. This method has been used to predict the total vehicular population in India using data sets spanning an increasing number of years in separate analyses, and the accuracy of each analysis has been evaluated against actual vehicular population figures. For this, the Box & Jenkins methodology has been adopted and the analysis has been done using the Auto-Regressive Integrated Moving Average (ARIMA) approach. Further, to highlight the increasing effectiveness of this method with rich data, an analysis has been done with AADT data from PeMS, Caltrans, US. The study reveals the potential of Time Series Analysis as a sound forecasting tool in times to come. The forecasting error of this method has been found to be significantly lower than that of other traditional methods. This analysis provides insight into the choice of the method best suited for forecasting vehicular population in India and other developing countries.


1.0 INTRODUCTION

1.1 Traffic Forecasting
In essence, traffic volume forecasting is the process of estimating the number of vehicles or people that are likely to use different transportation facilities in the future. For instance, a forecast may estimate the number of vehicles on a planned road or bridge, the expected ridership on a railway/metro line, the number of passengers visiting an airport, or the expected future traffic levels for the whole country. The process begins with the collection of data on current traffic. Depending on the specific requirements of the analysis, this traffic data is combined with other known data, such as population and economic growth rates, employment rate, trip rates and travel costs, to develop a traffic demand model for the current situation. Combining this model with predicted data for population, employment etc. yields estimates of future traffic, typically for each segment of the transportation infrastructure in question, e.g., for each roadway segment or railway station that falls within the scope of the facility.

1.2 Need For Traffic Forecasting
Knowledge of future traffic flow is an essential input in the planning, implementation and development of a transportation system. It also helps in its operation, management and control (6). It is required to start the planning and/or development phase of any major transportation project. Being the first step in defining the scope and geometry of such projects, forecasting sometimes even reveals whether a project is needed at all. Forecasting is necessary for relevant economic analysis (11). It can also be used for other purposes such as corridor planning, systems planning, air quality analysis, safety analysis and other such special projects. Inaccuracies in traffic volume forecasts are responsible for the additional costs associated with over- and under-design (17). The costs associated with an under-designed project arise when an additional project must make up for the original inadequacies. Extra materials, labor, and additional right-of-way acquisition add to the cost of an over-designed project. The accuracy of traffic forecasting depends mainly on the size of the average daily traffic. In general, the smaller the average daily traffic, the larger the error in traffic forecasting. The major reasons for these errors can be:

• Changing traffic patterns in the future, specifically the induced demand effect (3),(4) and the rebound effect (7),

• Traffic impacts due to development, mainly due to changes in land use patterns (16),

• Unforeseen and unaccounted socio-economic changes (5),

• Construction of new roads, diversions etc.

2.0 LITERATURE REVIEW
The literature review for this work comprises a study of the available literature on methods previously used for traffic forecasting, their challenges and scope for improvement, followed by a study of more recent, contemporary approaches to forecasting, especially Time Series Analysis. In the Indian context, past research has mainly concentrated on Trend Line Analysis (9),(10), in which traffic volume levels for the country are predicted using a linear relationship between the country's Gross National Product (GNP) and the total vehicular population. Along the same lines, a project feasibility report on 6-laning of NH-2 from Delhi to Agra, prepared by CES for NHDP (15), elaborates a combination of Trip Generation models and Trend Line Analysis, using NSDP (Net State Domestic Product) instead of GNP for the different corridors within the scope of the project. More contemporary research focuses mainly on Time Series Analysis. While Bhar & Sharma (1) deal with the applications and nuances of Time Series Analysis (exemplified using the software SPSS), Nihan & Holmesland (12) stress the basics of Time Series Modeling. An approximate nearest neighbor nonparametric regression method has been discussed by Oswald et al. (13). Although a number of methods can be adopted for traffic volume forecasting depending on the specific situation at hand, for this analysis one of the more recent approaches, Time Series Analysis, was chosen for comparison with other traditional methods.

3.0 OBJECTIVE AND SCOPE
This paper attempts to highlight the usefulness of Time Series (TS) Analysis in traffic forecasting by underlining the lower estimation errors found with this method compared to two other methods: Trend Line Analysis and Econometric Regression Analysis (8). This exercise is only a first step in developing insight into the choice of the best-suited method, especially under Indian conditions, for estimating future traffic levels in the country, which, as discussed, is imperative in many respects. Due to data availability constraints, the present analysis has been done for the total vehicular population in India, with a view to informing the choice of appropriate estimation methods at the specific project level as well. The primary data has been taken from “Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)”, Special Publication 45, Indian Roads Congress, New Delhi, 1996 (18), and is reproduced in Table 1 for ready reference. To gauge the data requirement of the TS method, analysis was carried out with 15, 20, 25, 30 and 35 years' traffic data and the respective estimation errors were calculated. As suggested by Box & Jenkins (2), ideally at least 50 observations are required for performing Time Series Analysis. Taking this into account, TS analysis was also done on Average Annual Daily Traffic (AADT) data sourced from the Performance Measurement System (PeMS), DOT, California, US, for a location in District 7 on Interstate-10(W) (data shown in Table 2). This analysis further established the potential of Time Series Analysis as a promising alternative to traditional methods of forecasting. Overall, the paper attempts to gauge the suitability of the Time Series forecasting technique for traffic volume prediction. Given rich and varied data, this analysis can be extended to produce a better understanding of the method and its application to project-level studies as well. Further, multivariate Time Series Modeling can be explored for even better results if data availability meets the high requirements of TS analysis.

4.0 METHODOLOGY OF ANALYSIS
4.1 Methods adopted
This work deals mainly with the Time Series Analysis method of forecasting. The results obtained by this method have been compared with those obtained from two other methods: Trend Line Analysis, where future traffic volume is predicted from a linear relationship between vehicular population and Gross National Product (GNP); and Econometric Regression Analysis, where traffic demand is treated as dependent on chosen economic/demographic variables (8). A brief description of the TS method is given below.
Time Series Analysis: A time series is a set of observations ordered in time. This analysis deals with observations collected over equally spaced, discrete time intervals. When, as in this case, observations are made for only one variable over time, the series is called a univariate time series. The fundamental assumption of any Time Series Analysis is that some aspects of the past pattern will continue to affect future values.

TABLE 1 Data Table for Total Vehicular Population (1951-1996)

Year  Total Vehicular Population  Year  Total Vehicular Population

1951 1516079 1974 13109888

1952 1646669 1975 14359564

1953 1833692 1976 15717431

1954 1967710 1977 17553280

1955 2158496 1978 19303907

1956 2403603 1979 21083477

1957 2605137 1980 23418452

1958 2865091 1981 26138616

1959 3129076 1982 28846935

1960 3452840 1983 32056201

1961 3778488 1984 35530913

1962 4173044 1985 39429002

1963 4630391 1986 38349721

1964 5042291 1987 45492645

1965 5574485 1988 53073160

1966 6172690 1989 60827580

1967 6786859 1990 68944375

1968 7466313 1991 74641916

1969 8219423 1992 80487495

1970 9049346 1993 86298645

1971 10014079 1994 92274138

1972 11028301 1995 100337963

1973 11918799 1996 108336195

Source: “Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)”, Special Publication 45, Indian Roads Congress, New Delhi, 1996.

TABLE 2 Monthly AADT Data for Location 'Lark Ellen' on I-10(W), California

Month  Mainline (ML) AADT  Month  Mainline (ML) AADT  Month  Mainline (ML) AADT

Jul-00 117018 Feb-04 120744 Sep-07 116075

Aug-00 117170 Mar-04 120914 Oct-07 116076

Sep-00 117113 Apr-04 120896 Nov-07 115953

Oct-00 117099 May-04 120710 Dec-07 115650

Nov-00 117339 Jun-04 120672 Jan-08 115364

Dec-00 117462 Jul-04 120258 Feb-08 115341

Jan-01 117725 Aug-04 119920 Mar-08 115093

Feb-01 118230 Sep-04 119160 Apr-08 115009

Mar-01 118441 Oct-04 118394 May-08 115176

Apr-01 118147 Nov-04 118376 Jun-08 115351

May-01 118070 Dec-04 121533 Jul-08 115630

Jun-01 118020 Jan-05 124428 Aug-08 115670

Jul-01 117952 Feb-05 125687 Sep-08 115698

Aug-01 118524 Mar-05 125992 Oct-08 115596

Sep-01 119371 Apr-05 126534 Nov-08 115441

Oct-01 119828 May-05 126209 Dec-08 115472

Nov-01 119624 Jun-05 126098 Jan-09 115743

Dec-01 119469 Jul-05 125727 Feb-09 116064

Jan-02 119249 Aug-05 125454 Mar-09 116292

Feb-02 118846 Sep-05 125454 Apr-09 116346

Mar-02 118790 Oct-05 125431 May-09 116031

Apr-02 118425 Nov-05 125496 Jun-09 115870

May-02 118615 Dec-05 125283 Jul-09 115563

Jun-02 118910 Jan-06 125427 Aug-09 115462

Jul-02 119169 Feb-06 125349 Sep-09 115122

Aug-02 119426 Mar-06 125197 Oct-09 115156

Sep-02 119412 Apr-06 124803 Nov-09 115486

Oct-02 119364 May-06 124623 Dec-09 115616

Nov-02 119450 Jun-06 123341 Jan-10 114961

Dec-02 119397 Jul-06 122467 Feb-10 114377

Jan-03 119689 Aug-06 121037 Mar-10 114107

Feb-03 119904 Sep-06 120781 Apr-10 113668

Mar-03 120019 Oct-06 120445 May-10 113356

Apr-03 120451 Nov-06 119571 Jun-10 112959

May-03 120665 Dec-06 118978 Jul-10 112688

Jun-03 120594 Jan-07 118558 Aug-10 112579

Jul-03 120750 Feb-07 117770 Sep-10 112617

Aug-03 120717 Mar-07 117690 Oct-10 112282

Sep-03 120831 Apr-07 117817 Nov-10 111810

Oct-03 121107 May-07 117860 Dec-10 111340

Nov-03 120832 Jun-07 117484 Jan-11 111574

Dec-03 120931 Jul-07 116822 Feb-11 111668

Jan-04 120731 Aug-07 116424 Mar-11 111834

Data Source: http://www.pems.dot.ca.gov (PeMS, Caltrans) accessed on 02.04.2012

Values of a variable occurring prior to the current observation are called lag values. The primary difference between time series models and other types of models is that lag values of the target variable are used as predictor variables, whereas traditional models use other variables as predictors, and the concept of a lag value doesn't apply because the observations don't represent a chronological sequence. A time series is deterministic if its future behavior can be exactly predicted from its past behavior; otherwise it is statistical, and its future behavior can be predicted only in probabilistic terms. Time series techniques can be used to develop highly accurate and inexpensive short-term forecasts. The Box & Jenkins methodology (14) has been adopted and the analysis has been done using the ARIMA approach. The main rationale for using the Box & Jenkins technique is that it has been shown to produce relatively accurate forecasts. Comparative studies conducted by Naylor et al. (1972) and Nelson (1973) show that the Box & Jenkins model, although simpler, was more effective than contemporary econometric models. The basic limitation of this approach is its high data requirement. In traffic forecasting, it demands rich, reasonably accurate data spanning a long time frame, so that there are enough data points to model the situation appropriately. Since this is not always possible in the Indian context, there is considerable scope for improvement if this approach is to be used to good effect in the future.

4.2 Analysis
As remarked before, two sets of Time Series analyses have been performed here. They are discussed in sequence below.

4.2.1 Analysis With IRC Data: For the first case, the data has been taken from “Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)”, Special Publication 45, Indian Roads Congress, New Delhi, 1996 (Table 1). For univariate time series analysis, data from the years 1951-1985 (35 years) has been used, and estimation has been done for the target year 1996. The Box & Jenkins methodology (2) has been used and the ARIMA technique has been adopted for analysis. The modeling has been performed in STATA. The following brief definitions will enable a better understanding of Time Series Analysis and of the reasons behind the selection of particular models:

Box and Jenkins Methodology: The original Box-Jenkins modelling procedure involved an iterative three-stage process of model selection, parameter estimation and model checking. As applied here, it is broken into the following five broad steps:

Checking for stationarity and transforming the data set such that the assumption of stationarity is reasonable: A stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space. Consequently, parameters such as the mean and variance, if they exist, also do not change over time or position (Figure 1). The Dickey-Fuller and Phillips-Perron tests are performed to confirm stationarity of the data used. A stationarized series is relatively easy to predict, since its statistical properties are assumed to be the same in the future as they have been in the past. The predictions for the stationarized series can then be untransformed, by reversing whatever mathematical transformations were previously used, to obtain predictions for the original series. Another reason for stationarizing a time series is to obtain meaningful sample statistics such as means, variances, and correlations with other variables. Such statistics are useful as descriptors of future behavior only if the series is stationary. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. And if the mean and variance of a series are not well-defined, then neither are its correlations with other variables.
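The differencing and "untransforming" steps described above can be sketched in a few lines of Python. This is only an illustration on the first eight values of Table 1; the helper names `difference` and `undifference` are hypothetical, and the sketch does not implement the Dickey-Fuller or Phillips-Perron tests used in the paper.

```python
def difference(series, order=1):
    """Difference a series `order` times (each pass shortens it by one)."""
    out = list(series)
    for _ in range(order):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def undifference(diffs, first_value):
    """Invert one round of differencing, given the first value of the original level."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Total vehicular population, 1951-1958 (first eight rows of Table 1).
tvp = [1516079, 1646669, 1833692, 1967710, 2158496, 2403603, 2605137, 2865091]

d1 = difference(tvp, order=1)  # year-on-year increments
d2 = difference(tvp, order=2)  # change in the increments (second difference)

# Reverse the two differencing steps to return to the original scale.
restored = undifference(undifference(d2, d1[0]), tvp[0])
```

The same reversal is what turns a forecast of the twice-differenced series back into a forecast of vehicular population: forecast increments are cumulated twice, starting from the last observed values.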

Identification of the parameters of the model: ARIMA stands for Auto-Regressive Integrated Moving Average process. Lags of the differenced series appearing in the forecasting equation are called "auto-regressive" terms, lags of the forecast errors are called "moving average" terms, and a time series which needs to be differenced to be made stationary is said to be an "integrated" version of a stationary series. An ARMA model predicts the value of the target variable as a linear function of lag values (the auto-regressive part) plus an effect from recent random shock values (the moving average part). To determine the order of the AR and MA processes, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are studied. An autoregressive process is a function of lagged dependent variables, and a moving average process is a function of lagged error terms. An autocorrelation is the correlation between the target variable and lag values of the same variable. Correlation values range from -1 to +1. A value of +1 indicates that the two variables move together perfectly; a value of -1 indicates that they move in exactly opposite directions. When building a time series model, it is important to include lag values that have large, positive autocorrelation values. Sometimes it is also useful to include those that have large negative autocorrelations. The partial autocorrelation is the autocorrelation of time series observations separated by a lag of k time units with the effects of the intervening observations eliminated. The grey region in the ACF and PACF plots (Figure 2) marks two standard deviations (an approximate 95% confidence interval) on either side of zero. If an autocorrelation or partial autocorrelation bar extends beyond the grey region, the correlation should be considered significant.
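As a rough illustration of how the spikes in an ACF plot are judged, the sketch below computes sample autocorrelations and flags the lags that fall outside a two-standard-deviation band. It assumes the simple band ±2/sqrt(n); the ACF plot in Figure 2 uses Bartlett's formula instead, so this is a simplification, and the function names are hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    band = 2.0 / math.sqrt(len(x))  # assumed band: two standard errors, se = 1/sqrt(n)
    r = acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > band]


# Toy series that alternates in sign, so neighbouring values are strongly
# negatively correlated and values two steps apart strongly positively.
toy = [1, -1] * 10
r = acf(toy, 3)
flagged = significant_lags(toy, 3)
```

For the 15- and 20-year data sets described later, the analogous check found few or no lags outside the band, which is why those analyses were not taken further.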

Estimation of the parameters: A model can be estimated in two different ways: full Maximum Likelihood Estimation and conditional Maximum Likelihood Estimation. The first uses numerical optimization techniques for estimation, while the latter amounts to OLS regression. This analysis follows full Maximum Likelihood Estimation. The best models are selected on the basis of minimum AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), which penalize the number of parameters and thus reward parsimony.
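The two criteria can be written as AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where ln L is the maximized log-likelihood, k the number of estimated parameters and n the number of observations. The sketch below applies these standard formulas; the values n = 33 (35 years less two observations lost to differencing) and k = p + q + 2 (AR and MA coefficients plus a constant and the error variance) are assumptions not stated in the paper, although they appear consistent with the AIC and BIC columns of Table 5.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_likelihood


def log_likelihood_from_aic(aic_value, k):
    """Back out ln L from a reported AIC (inverse of aic())."""
    return (2 * k - aic_value) / 2.0


N_OBS = 33  # assumption: 35 annual values minus 2 lost to double differencing

# ARIMA(7,2,3) in Table 5: AIC = 882.6; assumed k = 7 + 3 + 2 = 12.
ll_723 = log_likelihood_from_aic(882.6, k=12)
bic_723 = bic(ll_723, k=12, n=N_OBS)

# ARIMA(3,2,3) in Table 5: AIC = 882.4; assumed k = 3 + 3 + 2 = 8.
ll_323 = log_likelihood_from_aic(882.4, k=8)
bic_323 = bic(ll_323, k=8, n=N_OBS)
```

Because ln(33) is larger than 2, BIC penalizes each extra parameter more heavily than AIC does, which is why the larger models in Table 5 show a wider gap between the two columns.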

Performing diagnostic checks: If the model is correctly specified, the residuals of the model should be uncorrelated; in other words, they should be white noise. One way to test this is to compute a Portmanteau test statistic (23), also called the White Noise Test, whose null hypothesis is the absence of serial correlation or predictability. If the computed Q exceeds the value from the χ² table for some specified significance level, the null hypothesis that the series of autocorrelations represents a random series is rejected at that level. The p-value gives the probability of exceeding the computed Q, given a random series of residuals (22). Thus random residuals give a small Q and a high p-value. Results are considered better when the value of this probability is closer to 1.
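A minimal sketch of the white noise check follows. It assumes the Ljung-Box form of the Portmanteau statistic, Q = n(n+2) Σ r_k²/(n-k); the paper does not say which variant was used. Rather than computing p-values, it compares Q with the two 5% critical values quoted later in this paper (23.7 for 14 lags and 55.76 for 40 lags).

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag (assumed variant)."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(
            (residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n)
        ) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# 5% chi-square critical values as quoted in this paper, keyed by number of lags.
CRITICAL_5PCT = {14: 23.7, 40: 55.76}


def looks_like_white_noise(q, lags):
    """True when Q is below the critical value, i.e. the null is not rejected."""
    return q < CRITICAL_5PCT[lags]


# Strongly autocorrelated toy residuals produce a large Q even at a single lag.
toy_q = ljung_box_q([1, -1] * 10, max_lag=1)

# Q values reported in the paper for ARIMA(7,2,3) and ARIMA(2,1,2).
irc_ok = looks_like_white_noise(3.3464, 14)
pems_ok = looks_like_white_noise(30.9877, 40)
```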

Forecasting: Two kinds of forecasting can be done using an ARIMA model: static and dynamic. The static forecast, or simple one-step-ahead forecast, predicts only a single time period ahead at a time. A dynamic forecast, on the other hand, is used for forecasting over a longer horizon. For the purpose of forecasting, the period 1986-1996 has been set aside as the 'forecasting window'. The observed values in this window are then compared with the forecasted values to calculate the root mean square error (RMSE) of the forecast.

To assess the data requirement of TS analysis, vehicular population data for an increasing number of years were tried out in turn. In all these cases, estimation has been done for the target year 1996. The respective observations have been compiled below.

With 15 years' data (1971-1985): No significant spikes were observed in the ACF and PACF plots. All points lay well inside the grey region in both plots. This indicated that the available data was insufficient to carry out TS Analysis.

With 20 years' data (1966-1985): Although no significant spikes were observed in the ACF plot, the PACF plot showed some spikes going outside the grey region. In the ACF plot, the spike at lag 3 just touched the boundary of the grey region. Since relying on this was not considered dependable, analysis with this set of data was also not possible.

With 25 years' data (1961-1985): Both the ACF and PACF plots showed consistently significant spikes. The models investigated during analysis are shown in Table 3.

TABLE 3 TS Test Statistics for Various Prospective Models with 25 Years' Data

Model          P (White Noise)   AIC      BIC      RMSE
ARIMA(3,2,3)   0.788             631.55   640.63   4370355
ARIMA(5,2,3)   0.884             633.43   644.78   4171331
ARIMA(6,2,3)   0.883             635.09   647.58   3885872

On the basis of the above results, ARIMA(6,2,3) was selected for forecasting.

With 30 years' data (1956-1985): The prospective models selected for analysis, along with the relevant parameters, are given in Table 4.

TABLE 4 TS Test Statistics for Various Prospective Models with 30 Years' Data

Model          P (White Noise)   AIC      BIC      RMSE
ARIMA(3,2,3)   0.844             759.88   770.54   3974921
ARIMA(4,2,3)   0.837             761.82   773.81   3974921
ARIMA(5,2,3)   0.944             757.04   770.36   2808914
ARIMA(6,2,3)   0.998             752.68   766.00   1476482

On the basis of the results obtained, ARIMA(6,2,3) was considered best suited for estimation.

With 35 years' data (1951-1985): The available data had to be differenced twice to achieve stationarity (a prerequisite for Time Series Analysis) (Figure 1). The Dickey-Fuller and Phillips-Perron tests were conducted to confirm stationarity. Results obtained for various models using this set of data are given in Table 5.

FIGURE 1 (a) Non-Stationary Data (b) Stationary Data
[Figure: (a) TVP_millions versus time, 1950-2000; (b) ddTVP_millions versus time, 1950-2000.]

FIGURE 2 (a) ACF Plot (b) PACF Plot
[Figure: (a) autocorrelations of ddTVP versus lag (0-15), with Bartlett's formula for MA(q) 95% confidence bands; (b) partial autocorrelations of ddTVP versus lag (0-15), with 95% confidence bands [se = 1/sqrt(n)].]

TABLE 5 TS Test Statistics for Various Prospective Models with 35 Years' Data

Model          P (White Noise)   AIC     BIC     RMSE
ARIMA(3,2,3)   0.932             882.4   894.4   2572936
ARIMA(4,2,3)   0.832             889.6   903.1   3674235
ARIMA(5,2,3)   0.906             885.1   900.0   1603122
ARIMA(6,2,3)   0.909             886.8   903.3   1593738
ARIMA(7,2,3)   0.998             882.6   900.6   1374773

The ARIMA regression parameters for model ARIMA(7,2,3) are shown in Table 6.

TABLE 6 ARIMA (7,2,3)

The estimated coefficients (Column 2 in Table 6) should be significantly different from zero. The significance of the AR and MA coefficients can be evaluated by comparing the estimated parameters with their standard errors (Column 3): the magnitude of the standard error should be less than that of the coefficient itself. In this case, the parameters were not significant at the 5% level of significance, as the 95% confidence intervals (the last two columns in Table 6) included zero. When the confidence level was lowered, some of the lags were found to be significant at the 75% confidence level. This could be due to the less than adequate amount of data, the veracity of the data source, and the method of data collection behind the data used for TS Analysis in this case. The Portmanteau test for white noise gave a Portmanteau (Q) statistic of 3.3464, which is less than the critical value of 23.7 at the 5% level of significance. Also, Prob > χ²(14) is 0.9983, which is very close to 1. Considering these results and their significance, ARIMA(7,2,3) was considered best suited for modeling in this case.
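The confidence interval check described above can be illustrated as follows. The coefficient and standard error are hypothetical (the values of Table 6 are not reproduced in this text), and a normal approximation is assumed for the interval.

```python
from statistics import NormalDist


def confidence_interval(coef, std_err, level=0.95):
    """Two-sided interval coef ± z * std_err under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return coef - z * std_err, coef + z * std_err


def is_significant(coef, std_err, level=0.95):
    """A coefficient is significant when its interval excludes zero."""
    low, high = confidence_interval(coef, std_err, level)
    return not (low <= 0.0 <= high)


# Hypothetical AR coefficient whose standard error is smaller than the
# coefficient itself, yet not small enough for the 95% interval to exclude zero.
coef, std_err = 0.50, 0.35

sig_95 = is_significant(coef, std_err, level=0.95)
sig_75 = is_significant(coef, std_err, level=0.75)
```

With these illustrative numbers the coefficient fails the 95% check but passes at 75%, which mirrors the pattern reported for some lags of ARIMA(7,2,3).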

4.2.2 Analysis With PeMS Data: PeMS is an Archived Data User Service (ADUS) that provides over ten years of data for historical analysis. In the raw AADT data available on the PeMS website, the column labeled "Arithmetic Mean" is the average of all daily flows over one year. Each row shows this value for a year; so if the row starts at 4/1/2009 (in mm/dd/yyyy format), the value shown is the arithmetic mean (the simple average) of daily traffic volumes from 4/1/2009 to 3/31/2010. The next row, which starts at 5/1/2009, shows the arithmetic mean from 5/1/2009 to 4/30/2010, and so on. This data has been studied for the location called Lark Ellen (34.4 miles along I-10W). AADT data for this location is given in Table 2. The choice of location is based on the following three criteria:
i. The location should fall somewhere midway along the length of I-10, which itself is 46.8 miles long in District 7,
ii. Preferably, mainline data should be taken into analysis, and
iii. The location for which data is available for the longest duration should be selected.
For this set of analysis, monthly AADT data from Jul. 2000 to Dec. 2008 were taken to estimate the AADT for Mar. 2011 (27 data points ahead), the most recent point in time for which data was available. Important parameters for some prospective models are shown in Table 7.
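The year-long averaging behind each PeMS "Arithmetic Mean" row, and the count of 27 forecast steps, can be sketched as below. The daily flows are synthetic and the function names are hypothetical; the sketch only mirrors the description given above, not the actual PeMS computation.

```python
from datetime import date, timedelta


def year_mean_from(daily_flows, start):
    """Mean daily flow from `start` up to the day before the same date next year."""
    end = date(start.year + 1, start.month, start.day)  # exclusive upper bound
    n_days = (end - start).days
    values = [daily_flows[start + timedelta(days=i)] for i in range(n_days)]
    return sum(values) / len(values)


def months_ahead(from_year, from_month, to_year, to_month):
    """Number of monthly steps between two (year, month) points."""
    return (to_year - from_year) * 12 + (to_month - from_month)


# Synthetic daily flows: 100 vehicles/day until 31 Mar 2010, then 465 in Apr 2010.
flows = {}
day = date(2009, 4, 1)
while day <= date(2010, 4, 30):
    flows[day] = 465 if day >= date(2010, 4, 1) else 100
    day += timedelta(days=1)

row_apr = year_mean_from(flows, date(2009, 4, 1))  # covers 4/1/2009 to 3/31/2010
row_may = year_mean_from(flows, date(2009, 5, 1))  # covers 5/1/2009 to 4/30/2010

# Last observation used for fitting: Dec 2008; target month: Mar 2011.
horizon = months_ahead(2008, 12, 2011, 3)
```

Because consecutive rows share eleven months of underlying daily data, the monthly AADT series is very smooth, which is visible in Table 2.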

TABLE 7 TS Parameters for Prospective Models for Lark Ellen

Model           P (White Noise)   AIC       BIC       RMSE
ARIMA (1,1,1)   0.703             1523.31   1533.77   948.004
ARIMA (1,1,2)   0.847             1523.86   1536.94   953.260
ARIMA (2,1,2)   0.846             1523.05   1538.74   948.004
ARIMA (2,1,1)   0.825             1524.07   1537.15   956.340

Table 8 shows statistics for ARIMA (2,1,2), which has been found to be best suited for modeling this situation based on the results of these tests. As can be seen from the last two columns in Table 8, lag 1 for AR and both lags 1 and 2 for MA are significant at the 5% level of significance, since the 95% confidence intervals for these lags exclude zero. The Portmanteau test for white noise gives a Portmanteau (Q) statistic of 30.9877, which is less than the critical value of 55.76 at the 5% level of significance. Prob > χ²(40) is 0.8459. Figure 3 shows the actual and predicted AADT values for Lark Ellen after analysis with ARIMA (2, 1, 2).
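The distinction between static and dynamic forecasts from Section 4.2.1, and the RMSE used to compare models, can be illustrated with a toy AR(1) model. The coefficient `phi` and the observations are made up for illustration; this is not the fitted ARIMA (2,1,2) model.

```python
import math


def static_forecasts(observed, phi, c=0.0):
    """One-step-ahead forecasts: each uses the actual previous observation."""
    return [c + phi * prev for prev in observed[:-1]]


def dynamic_forecasts(last_observed, phi, steps, c=0.0):
    """Multi-step forecasts: each feeds on the previous forecast, not on data."""
    out = []
    prev = last_observed
    for _ in range(steps):
        prev = c + phi * prev
        out.append(prev)
    return out


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


observed = [10.0, 6.0, 3.0, 2.0]  # made-up series
phi = 0.5                         # made-up AR(1) coefficient

static = static_forecasts(observed, phi)              # forecasts for observed[1:]
dynamic = dynamic_forecasts(observed[0], phi, steps=3)

rmse_static = rmse(observed[1:], static)
rmse_dynamic = rmse(observed[1:], dynamic)
```

In this toy case the dynamic forecast drifts further from the data than the static one because its errors compound over the horizon, which is the usual price of forecasting many steps ahead.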

TABLE 8 Regression Parameters for ARIMA (2, 1, 2)

FIGURE 3 Actual and Predicted AADT for Lark Ellen
[Figure: AADT and y prediction, dyn(m(2008m12)), versus t, 2000m1-2012m1; vertical axis 110000-130000.]

5.0 RESULTS
The results of the analyses carried out using IRC data for a varying (and increasing) number of years are summarized in Table 9. The estimated and actual values are for the target year 1996.

TABLE 9 Results Obtained With Different Sets of Data

Data Used               Model          Estimated Value   Actual Value   Error (%)
1961-1985 (25 years)    ARIMA(6,2,3)   97736195          108336195      -9.78
1956-1985 (30 years)    ARIMA(6,2,3)   110117248         108336195      1.644
1951-1985 (35 years)    ARIMA(7,2,3)   109951968         108336195      1.491

Kartikeya Jha, Balaji Ponnu & Shriniwas S. Arkatkar 14

A negative error signifies underestimation, i.e. the actual traffic volume was greater than the one predicted. A positive error shows that the value predicted by the analysis was more than the actual value. As can be noted, the TS analysis was possible only with data for 25 years or more (analysis with 15 and 20 years' data could not be run due to data insufficiency). Further, the accuracy of TS analysis improved markedly with an increasing number of years: the absolute error fell from 9.78% with 25 years of data to 1.644% with 30 years and 1.491% with 35 years.

For the analysis carried out with AADT data from PeMS, DOT, California, ARIMA (2,1,2) predicts an AADT value of 114959 for Mar. 2011 while the actual value is 111834, resulting in an overestimation error of 2.794%. The RMSE values in this case were significantly lower than in the previous set of analyses. This highlights the fact that the accuracy of estimation by Time Series Analysis improves considerably with the amount (richness) of data available.
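The error figures quoted in Table 9 and in the PeMS result follow from the sign convention described above (negative for underestimation, positive for overestimation). A minimal sketch of that arithmetic is given below; the helper name `percent_error` and the dictionary labels are illustrative assumptions, while the estimated and actual values are copied from the text.

```python
def percent_error(estimated: float, actual: float) -> float:
    """Signed forecast error in percent; a negative value means underestimation."""
    return (estimated - actual) / actual * 100.0


# (estimated, actual) pairs taken from Table 9 and from the PeMS result in the text
forecasts = {
    "IRC 1961-1985, ARIMA(6,2,3)": (97736195, 108336195),
    "IRC 1956-1985, ARIMA(6,2,3)": (110117248, 108336195),
    "IRC 1951-1985, ARIMA(7,2,3)": (109951968, 108336195),
    "PeMS AADT, ARIMA(2,1,2)": (114959, 111834),
}

errors = {
    label: percent_error(estimated, actual)
    for label, (estimated, actual) in forecasts.items()
}

for label, err in errors.items():
    direction = "overestimation" if err > 0 else "underestimation"
    print(f"{label}: {err:+.3f}% ({direction})")
```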

The corresponding errors obtained from two alternative approaches, Trend Line Analysis and Econometric Regression Analysis, were found to be 93.57% and 6.202% respectively (8). As is clear from these results, the error from Time Series Analysis is considerably lower than that from the other two methods, which underlines its usefulness as a forecasting technique in the future.

6.0 CONCLUSIONS
Identification, investigation and implementation of appropriate traffic forecasting techniques are imperative for meaningful and sustainable allocation of scarce resources like land, labor and funds in developing nations. Time Series Analysis can be a promising way to address the problem of overestimation of future traffic levels, a trend generally observed when forecasting with other traditional techniques. This method has long been in use for short-term forecasting in finance and econometrics, and an understanding of its use in transportation engineering must be developed. The time frame for accurate forecasts by this method (which in this paper is 11 years into the future) can be further investigated. As was realized during this analysis, at least 30 data points were required for acceptable results from forecasting. If the limitation of the large and rich data requirement of this method is overcome by implementation of proper technology then, in agreement with the findings of other researchers, it should contribute favorably towards accurate traffic forecasting in times to come. Use of multivariate Time Series methods, and of ARCH and GARCH (Generalized Auto Regressive Conditional Heteroskedasticity) processes, incorporating factors like change in land use patterns and a few relevant economic indicators, may produce even more accurate results.

Acknowledgement
The authors would like to thank Jane Berner of Caltrans for providing access to the PeMS data and for her useful suggestions and illustrations about the database. They would also like to thank Miss Nishita Sinha, M.A., Economics, JNU, New Delhi for her useful insights and contributions.

References
[1] Bhar, L. M., and V. K. Sharma. Time Series Analysis, Indian Agricultural Statistics Research Institute, New Delhi, pp. 1-15.
[2] Box, G. E. P., and G. M. Jenkins. Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day, 1976.
[3] Cervero, R., Are Induced Traffic Studies Inducing Bad Investments? ACCESS, 22, 2003, pp. 22-27.


[4] Cervero, R., and M. Hansen. Induced Travel Demand and Induced Road Investment: A Simultaneous Equation Analysis, Journal of Transport Economics and Policy, Vol. 36, No. 3, 2002, pp. 469-490.
[5] Clark, S., Traffic Prediction using Multivariate Nonparametric Regression, Journal of Transportation Engineering, ASCE, Vol. 129, No. 2, 2003, pp. 161-168.
[6] Dhingra, S. L. et al., Application of time series techniques for forecasting truck traffic attracted by the Bombay metropolitan region, Journal of Advanced Transportation, Vol. 27, No. 3, 1993, pp. 227–249.
[7] Hymel, K. M. et al., Induced Demand and Rebound Effects in Road Transport, Transportation Research Part B: Methodological, Vol. 44, No. 10, 2010, pp. 1220–1241.
[8] Jha, Kartikeya and Shriniwas Arkatkar. "Traffic Forecasting Techniques for Projecting Vehicular Population in India", National Conference and Workshop on Recent Advances in Traffic Engineering, SVNIT Surat, India, 2012, pp. 30-36.
[9] Kadiyali, L. R., Road Transport Demand Forecast for 2000 AD, Journal of the Indian Roads Congress, Vol. 384, No. 48(3), 1987.
[10] Kadiyali, L. R., and T. V. Shashikala. Road Transport Demand Forecast for 2000 AD Revisited and Demand Forecast for 2021, Journal of the Indian Roads Congress, Vol. 557, 2009, pp. 235-237.
[11] Matas, Anna et al., Demand Forecasting in the Evaluation of Projects, Working Paper in Economic Evaluation of Transportation Projects, 2009, pp. 1-31.
[12] Nihan, N. L., and K. O. Holmesland. Use of the Box and Jenkins Time Series Technique in Traffic Forecasting, Transportation, Vol. 9, No. 2, 1980, pp. 125–143.
[13] Oswald, R. K. et al., Traffic Flow Forecasting Using Approximate Nearest Neighbor Nonparametric Regression, Research Report no. UVACTS-15-13-7, Center for Transportation Studies at the University of Virginia, 2001.
[14] Pankratz, A., Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, John Wiley & Sons, New York, 1983.
[15] Project: Feasibility for 6-laning of NH-2 from Delhi-Agra Project on DBFO pattern under NHDP Phase V, Consulting Engineering Services, 2007, Chapter 3.
[16] Ramsey, S., Of Mice and Elephants, ITE Journal, Vol. 75, No. 9, 2005, p. 38.
[17] Skamris, M. K., and B. Flyvbjerg. Inaccuracy of Traffic Forecasts and Cost Estimates on Large Transport Projects, Transport Policy, Vol. 4, No. 3, 1997, pp. 141-146.
[18] “Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)”, Special Publication 45, Indian Roads Congress, New Delhi, 1996.

Websites
[19] http://www.pems.dot.ca.gov accessed on 29.03.2012
[20] http://www.ssc.wisc.edu/~bhansen/390/stata.pdf
[21] http://www.ccsr.ac.uk/publications/teaching/mlr.pdf
[22] http://dss.princeton.edu/training/TS101.pdf
[23] http://www.stat.tamu.edu/~jnewton/stat626/topics/topics/topic13.pdf