Kartikeya Jha, Balaji Ponnu & Shriniwas S. Arkatkar 1
Time Series Analysis: A Contemporary Approach to Traffic Volume Forecasting
Date of Submission: 31st July 2012.
Word Count: (4326 words + 12 * 250) = 7326 words
Authors:
Kartikeya Jha
Undergraduate Student, Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Mailing Address: Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Telephone No.: 09789856265
Email: [email protected]

Balaji Ponnu
Former Post Graduate Student, Transportation Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Mailing Address: Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Telephone No.: 07598231953
Email: [email protected]

Shriniwas S. Arkatkar*
Assistant Professor, Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Mailing Address: Civil Engineering Department, Birla Institute of Technology & Science, Pilani-333031, Rajasthan, India
Telephone No.: 08058321357
Fax: +91-01596-244183
E-mail: [email protected]

*Corresponding Author: Shriniwas S. Arkatkar ([email protected])
Abstract
Traffic forecasting is the process of estimating the number of users of different transportation facilities in the future in terms of number of vehicles or people. It is an indispensable element of transportation planning and engineering. For developing nations, assessing the accuracy of different forecasting techniques is imperative for meaningful allocation of scarce resources like land, labor and money. This work reviews the cardinal issues that surround this essential and challenging field of study. Stress has been laid on Time Series Analysis, a relatively contemporary approach to forecasting, especially in the field of transportation engineering. This method has been used to predict the total vehicular population in India using data sets spanning an increasing number of years in separate analyses, and the accuracy of each analysis has been evaluated against actual vehicular population figures. For this, the Box & Jenkins methodology has been adopted and analysis has been done using the Auto-Regressive Integrated Moving Average (ARIMA) approach. Further, to highlight the increasing effectiveness of this method with rich data, analysis has been done with AADT data from PeMS, Caltrans, US. The study reveals the potential of Time Series Analysis as a sound forecasting tool in times to come. The error in forecasting using this method has been found to be significantly lower than that from other traditional methods. This analysis provides an insight into the choice of a method best suited for forecasting vehicular population in India and other developing countries.
1.0 INTRODUCTION
1.1 Traffic Forecasting
In essence, traffic volume forecasting is the process of estimating the number of vehicles or people that are likely to use different transportation facilities in the future. For instance, a forecast may estimate the number of vehicles on a planned road or bridge, the expected ridership on a railway/metro line, the number of passengers visiting an airport, or the expected future traffic levels for the whole country. This process begins with the collection of data on current traffic. Depending upon the specific requirement of analysis, this traffic data is combined with other known data, such as population and economic growth rates, employment rate, trip rates, travel costs etc., to develop a traffic demand model for the current situation. Combining this with predicted data for population, employment etc. results in estimates of future traffic, typically estimated for each segment of the transportation infrastructure in question, e.g., for each roadway segment or railway station that falls within the scope of the facility.
1.2 Need For Traffic Forecasting
Knowledge of future traffic flow is an essential input in the planning, implementation and development of a transportation system. It also helps in its operation, management and control (6). It is required to start the planning and/or development phase of any major transportation project. Being the first step in defining the scope and geometry of such projects, forecasting sometimes even helps determine whether a project is needed at all. Forecasting is necessary for the relevant economic analysis (11). It can also be used for other purposes such as corridor planning, systems planning, air quality analysis, safety analysis and other such special projects. Inaccuracies in traffic volume forecasts are responsible for the additional costs associated with over- and under-design (17). The costs associated with an under-designed project arise when an additional project must make up for the original inadequacies. Extra materials, labor, and additional right-of-way acquisition add to the cost of an over-designed project. The accuracy of traffic forecasting depends mainly on the size of average daily traffic. In general, the smaller the average daily traffic, the larger the error in traffic forecasting. The major reasons for these errors can be:
• Changing traffic patterns in the future, specifically the induced demand effect (3),(4) and the rebound effect (7),
• Traffic impacts due to development, chiefly due to changes in land use patterns (16),
• Unforeseen and unaccounted socio-economic changes (5),
• Construction of new roads, diversions etc.

2.0 LITERATURE REVIEW
The literature review for this work comprises a study of the available literature on the methods previously used for traffic forecasting, their challenges and scope for improvement, followed by a study of more recent, contemporary approaches to forecasting, especially with reference to Time Series Analysis. In the Indian context, past research work has mainly concentrated on Trend Line Analysis (9),(10). Here the traffic volume levels for the country have been predicted using a linear relationship between a country's Gross National Product (GNP) and the total vehicular population. Along the same lines, a project feasibility report on 6-laning of NH-2 from Delhi to Agra prepared by CES for NHDP (15) elaborates a combination of Trip Generation models and Trend Line Analysis using NSDP (Net State Domestic Product) instead of GNP for the different corridors lying in the scope of this project. Study of more contemporary areas of research focuses mainly on Time Series Analysis. While Bhar & Sharma (1) deal with the applications and nuances of Time Series Analysis (exemplified with the use of the software SPSS), Nihan & Holmesland (12) stress the basics of Time Series Modeling. The approximate nearest neighbor nonparametric regression method has been discussed by Oswald et al. (13). Although a number of methods can be adopted for traffic volume forecasting depending on the specific situation at hand, for this analysis one of the more recent approaches, Time Series Analysis, was chosen for a comparative analysis with other traditional methods.

3.0 OBJECTIVE AND SCOPE
This paper attempts to highlight the usefulness of Time Series (TS) Analysis in traffic forecasting by underlining the lower estimation errors found with this method when compared to two other methods: Trend Line Analysis and Econometric Regression Analysis (8). This exercise is only the first step in developing an insight into the choice of the best-suited method, especially with respect to Indian conditions, to estimate future traffic levels in the country, which, as has been discussed, is imperative from many aspects. Due to data availability constraints, the present analysis has been done for the total vehicular population in India, to enable the choice of appropriate methods for estimation at the specific project level also. The primary data used has been cited from "Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)", Special Publication 45, Indian Roads Congress, New Delhi, 1996 (18). This has been reproduced in Table 1 for ready reference. To gauge the extent of the data requirement of the TS method, analysis was carried out with 15, 20, 25, 30 and 35 years' traffic data and the respective errors in estimation were calculated. As suggested by Box & Jenkins (2), ideally at least 50 observations are required for performing Time Series Analysis. Taking this into account, TS analysis was also done on Average Annual Daily Traffic (AADT) data sourced from the Performance Measurement System (PeMS), DOT, California, US for a location in district 7 on Interstate-10(W) (data shown in Table 2). This analysis further established the potential of Time Series Analysis as a promising alternative to traditional methods of forecasting. Overall, the paper attempts to gauge the suitability of the Time Series forecasting technique for traffic volume prediction. Given rich and varied data availability, this analysis can be extended to produce a better understanding of this method and its application to project level studies as well. Further, multivariate Time Series Modeling can be explored for even better results if data availability meets the high requirements of TS analysis.

4.0 METHODOLOGY OF ANALYSIS
4.1 Methods adopted
This work deals mainly with the Time Series Analysis method for forecasting. At the same time, the results obtained by this method have been compared with those obtained from two other methods: Trend Line Analysis, where future traffic volume is predicted based on a linear relationship between traffic population and Gross National Product (GNP); and Econometric Regression Analysis, where traffic demand is seen as being dependent on chosen economic/demographic variables (8). A brief description of the TS method is given below:
Time Series Analysis: A time series is a set of observations ordered in time. This analysis deals with observations that are collected over equally spaced, discrete time intervals. When, as in this case, observations are made for only one variable over time, it is called a univariate time series. The fundamental assumption of any Time Series Analysis is that some aspects of the past pattern will continue to affect future values.
TABLE 1 Data Table for Total Vehicular Population (1951-1996)
Year Total Vehicular Population Year Total Vehicular Population
1951 1516079 1974 13109888
1952 1646669 1975 14359564
1953 1833692 1976 15717431
1954 1967710 1977 17553280
1955 2158496 1978 19303907
1956 2403603 1979 21083477
1957 2605137 1980 23418452
1958 2865091 1981 26138616
1959 3129076 1982 28846935
1960 3452840 1983 32056201
1961 3778488 1984 35530913
1962 4173044 1985 39429002
1963 4630391 1986 38349721
1964 5042291 1987 45492645
1965 5574485 1988 53073160
1966 6172690 1989 60827580
1967 6786859 1990 68944375
1968 7466313 1991 74641916
1969 8219423 1992 80487495
1970 9049346 1993 86298645
1971 10014079 1994 92274138
1972 11028301 1995 100337963
1973 11918799 1996 108336195
Source: "Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)", Special Publication 45, Indian Roads Congress, New Delhi, 1996.

TABLE 2 Monthly AADT Data for Location ‘Lark Ellen’ on I-10(W), California
Month Mainline (ML) AADT Month Mainline (ML) AADT Month Mainline (ML) AADT
Jul-00 117018 Feb-04 120744 Sep-07 116075
Aug-00 117170 Mar-04 120914 Oct-07 116076
Sep-00 117113 Apr-04 120896 Nov-07 115953
Oct-00 117099 May-04 120710 Dec-07 115650
Nov-00 117339 Jun-04 120672 Jan-08 115364
Dec-00 117462 Jul-04 120258 Feb-08 115341
Jan-01 117725 Aug-04 119920 Mar-08 115093
Feb-01 118230 Sep-04 119160 Apr-08 115009
Mar-01 118441 Oct-04 118394 May-08 115176
Apr-01 118147 Nov-04 118376 Jun-08 115351
May-01 118070 Dec-04 121533 Jul-08 115630
Jun-01 118020 Jan-05 124428 Aug-08 115670
Jul-01 117952 Feb-05 125687 Sep-08 115698
Aug-01 118524 Mar-05 125992 Oct-08 115596
Sep-01 119371 Apr-05 126534 Nov-08 115441
Oct-01 119828 May-05 126209 Dec-08 115472
Nov-01 119624 Jun-05 126098 Jan-09 115743
Dec-01 119469 Jul-05 125727 Feb-09 116064
Jan-02 119249 Aug-05 125454 Mar-09 116292
Feb-02 118846 Sep-05 125454 Apr-09 116346
Mar-02 118790 Oct-05 125431 May-09 116031
Apr-02 118425 Nov-05 125496 Jun-09 115870
May-02 118615 Dec-05 125283 Jul-09 115563
Jun-02 118910 Jan-06 125427 Aug-09 115462
Jul-02 119169 Feb-06 125349 Sep-09 115122
Aug-02 119426 Mar-06 125197 Oct-09 115156
Sep-02 119412 Apr-06 124803 Nov-09 115486
Oct-02 119364 May-06 124623 Dec-09 115616
Nov-02 119450 Jun-06 123341 Jan-10 114961
Dec-02 119397 Jul-06 122467 Feb-10 114377
Jan-03 119689 Aug-06 121037 Mar-10 114107
Feb-03 119904 Sep-06 120781 Apr-10 113668
Mar-03 120019 Oct-06 120445 May-10 113356
Apr-03 120451 Nov-06 119571 Jun-10 112959
May-03 120665 Dec-06 118978 Jul-10 112688
Jun-03 120594 Jan-07 118558 Aug-10 112579
Jul-03 120750 Feb-07 117770 Sep-10 112617
Aug-03 120717 Mar-07 117690 Oct-10 112282
Sep-03 120831 Apr-07 117817 Nov-10 111810
Oct-03 121107 May-07 117860 Dec-10 111340
Nov-03 120832 Jun-07 117484 Jan-11 111574
Dec-03 120931 Jul-07 116822 Feb-11 111668
Jan-04 120731 Aug-07 116424 Mar-11 111834
Data Source: http://www.pems.dot.ca.gov (PeMS, Caltrans) accessed on 02.04.2012

Values of a variable occurring prior to the current observation are called lag values. The primary difference between time series models and other types of models is that lag values of the target variable are used as predictor variables, whereas traditional models use other variables as predictors, and the concept of a lag value doesn't apply because the observations don't represent a chronological sequence. A time series is deterministic if its future behavior can be exactly predicted from its past behavior. Otherwise the time series is statistical. The future behavior of a statistical time series can be predicted only in probabilistic terms. Time series techniques can be used to develop highly accurate and inexpensive short term forecasts. The Box & Jenkins methodology (14) has been adopted and analysis has been done using the ARIMA approach. The main rationale behind using the Box & Jenkins technique is that it has been shown to produce relatively accurate forecasts. The results from comparative studies conducted by Naylor et al. (1972) and Nelson (1973) show that the Box & Jenkins model, although simpler, was more effective than other contemporary econometric models. The basic limitation of this approach is its high data requirement. In the case of traffic forecasting, it demands rich, reasonably accurate data spanning a long time frame so that there are enough data points to model the situation appropriately. Since this is not always possible in the Indian context, there is considerable scope for improvement if this approach is to be used to good effect in the future.

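As an illustration of how lag values serve as predictors, the following minimal Python sketch arranges the first few observations of Table 1 into predictor/target rows. The helper name `make_lagged` and the choice of two lags are assumptions made for illustration only; they are not taken from the analysis reported here.

```python
def make_lagged(series, n_lags):
    """Return (predictors, target) rows, where the predictors are the
    n_lags values immediately preceding each target, most recent first."""
    rows = []
    for t in range(n_lags, len(series)):
        predictors = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append((predictors, series[t]))
    return rows

# Total vehicular population, 1951-1955 (Table 1).
tvp = [1516079, 1646669, 1833692, 1967710, 2158496]

rows = make_lagged(tvp, n_lags=2)
for predictors, target in rows:
    print(predictors, "->", target)
```

Each row pairs a year's value with the two values that preceded it, which is the form in which an auto-regressive model consumes the series.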
4.2 Analysis
As remarked before, two sets of Time Series analyses have been performed here. Both are discussed in sequence below.

4.2.1 Analysis With IRC Data: For the first case, the data used has been cited from "Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)", Special Publication 45, Indian Roads Congress, New Delhi, 1996 (Table 1). For univariate time series analysis, data from the years 1951-1985 (35 years) has been used. The estimation has been done for the target year 1996. The Box & Jenkins methodology (2) has been used and the ARIMA technique has been adopted for analysis. The modeling has been performed in STATA. The following brief definitions will enable a better understanding of the Time Series Analysis and the reasons behind the selection of particular models:
Box and Jenkins Methodology- The original Box-Jenkins modelling procedure involved an iterative three-stage process of model selection, parameter estimation and model checking. The five broad steps include the following:
Checking for stationarity and transforming the data set such that the assumption of stationarity is reasonable: A stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space. Consequently, parameters such as the mean and variance, if they exist, also do not change over time or position (Figure 1). Dickey-Fuller and Phillips-Perron tests are performed to confirm stationarity of the data used. A stationarized series is relatively easy to predict, since one simply predicts that its statistical properties will be the same in the future as they have been in the past. The predictions for the stationarized series can then be untransformed, by reversing whatever mathematical transformations were previously used, to obtain predictions for the original series. Another reason for trying to stationarize a time series is to be able to obtain meaningful sample statistics such as means, variances, and correlations with other variables. Such statistics are useful as descriptors of future behavior only if the series is stationary. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. And if the mean and variance of a series are not well-defined, then neither are its correlations with other variables.
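The differencing used to stationarize a series can be sketched in a few lines of Python. The snippet below applies first and second differences to the 1951-1960 values of Table 1. The helper `difference` is a hypothetical name introduced here, and the snippet only illustrates the transformation; it does not implement the Dickey-Fuller or Phillips-Perron tests.

```python
def difference(series, order=1):
    """Apply `order` rounds of first differencing: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Total vehicular population, 1951-1960 (Table 1).
tvp = [1516079, 1646669, 1833692, 1967710, 2158496,
       2403603, 2605137, 2865091, 3129076, 3452840]

d1 = difference(tvp, order=1)   # year-on-year increments
d2 = difference(tvp, order=2)   # change in the increments
print(d1)
print(d2)
```

Each round of differencing shortens the series by one observation, so a 35-year series differenced twice leaves 33 points for model fitting.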
Identification of the parameters of the model: ARIMA stands for Auto-Regressive Integrated Moving Average process. Lags of the differenced series appearing in the forecasting equation are called "auto-regressive" terms, lags of the forecast errors are called "moving average" terms, and a time series which needs to be differenced to be made stationary is said to be an "integrated" version of a stationary series. An ARMA model predicts the value of the target variable as a linear function of lag values (the auto-regressive part) plus an effect from recent random shock values (the moving average part). To get the order of the AR and MA processes, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are studied. An autoregressive process is a function of lagged dependent variables and a moving average process is a function of lagged error terms. An autocorrelation is the correlation between the target variable and lag values of the same variable. Correlation values range from -1 to +1. A value of +1 indicates that the two variables move together perfectly; a value of -1 indicates that they move in exactly opposite directions. When building a time series model, it is important to include lag values that have large, positive autocorrelation values. Sometimes it is also useful to include those that have large negative autocorrelations. The partial autocorrelation is the autocorrelation of time series observations separated by a lag of k time units with the effects of the intervening observations eliminated. The grey region in the ACF and PACF plots (Figure 2) marks the band within two standard deviations (an approximate 95% confidence interval) of zero. If an autocorrelation or partial autocorrelation bar extends beyond this region, the correlation should be considered significant.
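A minimal Python sketch of the sample autocorrelation and the two-standard-deviation check described above is given here. It assumes a flat band of 2/sqrt(n), which is a simplification: the ACF plot in Figure 2 uses Bartlett's formula, whose bands widen with lag. The function names and the demonstration series are hypothetical.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def significant_lags(series, max_lag):
    """Lags whose |r_k| exceeds the approximate band 2 / sqrt(n)."""
    bound = 2.0 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]

# Hypothetical, strongly alternating series used only to exercise the code.
demo = [1, -1, 1, -1, 1, -1, 1, -1]
print(acf(demo, 2))
print(significant_lags(demo, 2))
```

With very short series the band 2/sqrt(n) is wide, which is consistent with the observation later in this paper that no significant spikes could be found with only 15 years of data.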
Estimation of the parameters: There are two different ways a model can be estimated: Maximum Likelihood Estimation and conditional Maximum Likelihood Estimation. The first uses numerical optimization techniques for estimation and the latter is OLS regression. This analysis follows full Maximum Likelihood Estimation. The best models are selected on the basis of minimum AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), both of which reward parsimony of the model.
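For reference, the two criteria are commonly defined as AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the maximized likelihood, k the number of estimated parameters and n the number of observations. The Python sketch below encodes these definitions. The parameter count (AR and MA coefficients plus a constant and the error variance) and the sample size (observations remaining after differencing) are assumptions about how the software counts them, not statements from this paper.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_likelihood

# Assumed parameter count for ARIMA(3,2,3): 3 AR + 3 MA + constant + variance.
k = 3 + 3 + 2
# Assumed sample size: 25 yearly values differenced twice.
n = 25 - 2

# BIC - AIC = k * (ln n - 2), which does not depend on the likelihood.
gap = k * (math.log(n) - 2)
print(round(gap, 2))
```

Under these assumptions the gap evaluates to about 9.08, which agrees with the difference between the BIC (640.63) and AIC (631.55) reported for ARIMA(3,2,3) in Table 3.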
Performing diagnostic checks: If the model is correctly specified, the residuals of the model should be uncorrelated. In other words, the residuals should be white noise. One way to test this is to compute a Portmanteau Test statistic (23). This is also called the White Noise Test. A passing result indicates absence of serial correlation or predictability in the residuals. If the computed Q exceeds the value from the χ² table for some specified significance level, the null hypothesis that the series of autocorrelations represents a random series is rejected at that level. The p-value gives the probability of exceeding the computed Q, given a random series of residuals (22). Thus random residuals give a small Q and a high p-value. Results are considered better when the value of this probability is closer to 1.
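The comparison of Q against a χ² critical value can be sketched in Python as follows. The sketch assumes the Ljung-Box form of the portmanteau statistic, Q = n(n + 2) Σ r_k² / (n - k); the function name and the demonstration residuals are hypothetical, and the critical value 5.991 is the standard χ² value for 2 degrees of freedom at the 5% level.

```python
def portmanteau_q(residuals, max_lag):
    """Ljung-Box form: Q = n (n + 2) * sum over k of r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Hypothetical residuals with obvious serial correlation.
demo = [1, -1, 1, -1, 1, -1, 1, -1]
q = portmanteau_q(demo, max_lag=2)

CHI2_CRIT_2DF_5PCT = 5.991  # chi-square critical value, 2 d.f., 5% level
verdict = "reject white noise" if q > CHI2_CRIT_2DF_5PCT else "consistent with white noise"
print(q, verdict)
```

For a well-specified model the residuals should produce a Q well below the critical value, the opposite of this deliberately correlated example.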
Forecasting: There are two kinds of forecasting that can be done using an ARIMA model: static forecasting and dynamic forecasting. The static forecast, or simple one-step-ahead forecast, forecasts only a single time period ahead at a time. The dynamic forecast, on the other hand, is used for forecasting over a longer horizon. For the purpose of forecasting, the period from 1986-1996 has been set aside as the 'forecasting window'. The observed values in this window are then compared with the forecasted values to calculate the root mean square error (RMSE) of the forecast.
To assess the extent of data requirement for TS analysis, traffic population data for an increasing number of years were first tried out for analysis. In all these cases, estimation has been done for the target year 1996. The respective observations are compiled below.
With 15 years' data (1971-1985): No significant spikes were observed in the ACF and PACF plots. All the points were located well inside the grey region for both plots. This indicated that the available data was insufficient to carry out TS Analysis.
With 20 years' data (1966-1985): Although no significant spikes were observed in the ACF plot, the PACF plot showed some spikes going outside the grey region. For the ACF, the spike at point 3 just touched the boundary of the grey region. Since relying on this borderline spike was not considered reliable, analysis with this set of data was also not possible.
With 25 years' data (1961-1985): Both the ACF and PACF plots showed consistently significant spikes. The models investigated during analysis are shown in Table 3.

Table 3 TS Test Statistics for Various Prospective Models With 25 Years’ Data
Model P (White Noise) AIC BIC RMSE
ARIMA(3,2,3) 0.788 631.55 640.63 4370355
ARIMA(5,2,3) 0.884 633.43 644.78 4171331
ARIMA(6,2,3) 0.883 635.09 647.58 3885872
On the basis of the above results, ARIMA(6,2,3) was selected for forecasting.
With 30 years' data (1956-1985): The prospective models selected for analysis, along with the relevant parameters, are given in Table 4.
Table 4 TS Test Statistics for Various Prospective Models with 30 Years’ Data
Model P (White Noise) AIC BIC RMSE
ARIMA(3,2,3) 0.844 759.88 770.54 3974921
ARIMA(4,2,3) 0.837 761.82 773.81 3974921
ARIMA(5,2,3) 0.944 757.04 770.36 2808914
ARIMA(6,2,3) 0.998 752.68 766.00 1476482
On the basis of the results obtained, ARIMA(6,2,3) was considered best suited for estimation.
With 35 years' data (1951-1985): The available data had to be differenced twice to achieve stationarity (a prerequisite for Time Series Analysis) (Figure 1). The Dickey-Fuller and Phillips-Perron tests were conducted to confirm stationarity. Results obtained for various models using this set of data are given in Table 5.

[Figure 1: panel (a) plots TVP_millions against time (1950 to 2000); panel (b) plots ddTVP_millions against time (1950 to 2000).]
FIGURE 1 (a) Non-Stationary Data (b) Stationary Data

[Figure 2: panel (a) plots autocorrelations of ddTVP against lag (0 to 15), with Bartlett's formula for MA(q) 95% confidence bands; panel (b) plots partial autocorrelations of ddTVP against lag (0 to 15), with 95% confidence bands, se = 1/sqrt(n).]
FIGURE 2 (a) ACF Plot (b) PACF Plot

Table 5 TS Test Statistics for Various Prospective Models with 35 Years’ Data
Model P (White Noise) AIC BIC RMSE
ARIMA(3,2,3) 0.932 882.4 894.4 2572936
ARIMA(4,2,3) 0.832 889.6 903.1 3674235
ARIMA(5,2,3) 0.906 885.1 900.0 1603122
ARIMA(6,2,3) 0.909 886.8 903.3 1593738
ARIMA(7,2,3) 0.998 882.6 900.6 1374773

The ARIMA regression parameters for model ARIMA(7,2,3) are shown in Table 6.
TABLE 6 ARIMA (7,2,3)

The estimated coefficients (Column 2 in Table 6) should be significantly different (distant) from zero. Significance of the AR and MA coefficients can be evaluated by comparing the estimated parameters with their standard errors (Column 3). The magnitude of the standard error should be less than that of the coefficient itself. In this case, the parameters were not found to be significant at the 5% level of significance, as the 95% confidence intervals (the last two columns in Table 6) included zero. When the confidence level was lowered, some of the lags were found to be significant at the 75% confidence level. This could have been caused by the less than adequate amount of data, the veracity of the data source and the method of data collection adopted for the data used for TS Analysis in this case. The Portmanteau test for white noise gave a Portmanteau (Q) statistic of 3.3464, which is less than the critical value of 23.7 at the 5% level of significance. Also, Probability > χ²(14) is 0.9983, which is very close to 1. Considering these results and their significance, ARIMA (7, 2, 3) was considered best suited for modeling in this case.
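The significance check described above, whether a confidence interval includes zero, can be sketched in Python using a normal approximation. The coefficient and standard error below are hypothetical and are not values from Table 6; the function names are likewise introduced only for illustration.

```python
from statistics import NormalDist

def confidence_interval(coef, std_err, level=0.95):
    """Normal-approximation interval: coef +/- z * std_err."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z * std_err, coef + z * std_err

def is_significant(coef, std_err, level=0.95):
    """True when the interval at the given level excludes zero."""
    low, high = confidence_interval(coef, std_err, level)
    return not (low <= 0.0 <= high)

# Hypothetical coefficient and standard error (not values from Table 6).
coef, std_err = 0.40, 0.30
print(is_significant(coef, std_err, level=0.95))
print(is_significant(coef, std_err, level=0.75))
```

A coefficient whose standard error is smaller than its magnitude can still fail the 95% check, as here, while passing at a lower confidence level such as 75%.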

4.2.2 Analysis With PeMS Data: PeMS is an Archived Data User Service (ADUS) that provides over ten years of data for historical analysis. In the raw AADT data available on the PeMS website, the column labelled "Arithmetic Mean" is the average of all daily flows. Each row shows this value for a year; so if the row starts at 4/1/2009 (in mm/dd/yyyy format), the value shown is the arithmetic mean (the simple average) of daily traffic volumes from 4/1/2009 to 3/31/2010. The next row, which starts at 5/1/2009, shows the arithmetic mean from 5/1/2009 to 4/30/2010, and so on. This data has been studied for the location called Lark Ellen (34.4 miles along I-10W). AADT data for this location is given in Table 2. The choice of location is based on the following three criteria:
i. The location should fall somewhere midway along the length of I-10, which itself is 46.8 miles long in district 7,
ii. Preferably, mainline data should be taken into analysis, and
iii. The location selected should be the one for which data is available for the longest duration.
For this set of analysis, monthly AADT data from Jul. 2000 to Dec. 2008 were taken to estimate the AADT for March 2011 (27 data points ahead in the future), the most recent point of time for which data was available. Important parameters for some prospective models are shown in Table 7.

TABLE 7 TS Parameters for prospective models for Lark Ellen
Model P (White Noise) AIC BIC RMSE
ARIMA (1,1,1) 0.703 1523.31 1533.77 948.004
ARIMA (1,1,2) 0.847 1523.86 1536.94 953.260
ARIMA (2,1,2) 0.846 1523.05 1538.74 948.004
ARIMA (2,1,1) 0.825 1524.07 1537.15 956.340

Table 8 shows statistics for ARIMA (2,1,2), which was found to be best suited for modeling this situation based on the results of these tests. As can be seen from the last two columns of Table 8, lag 1 for AR and both lags 1 and 2 for MA are significant at the 5% level of significance, since the 95% confidence intervals for these lags lie far from zero. The Portmanteau test for white noise gives a Portmanteau (Q) statistic of 30.9877, which is less than the critical value of 55.76 at the 5% level of significance. Probability > χ²(40) is 0.8459. Figure 3 shows the actual and predicted AADT values for Lark Ellen after analysis with ARIMA (2, 1, 2).
TABLE 8 Regression Parameters for ARIMA (2, 1, 2)

[Figure 3: AADT (vertical axis, 110000 to 130000) against time t (2000m1 to 2012m1); plotted series: AADT and y prediction, dyn(m(2008m12)).]
FIGURE 3 Actual and Predicted AADT for Lark Ellen
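
The legend entry dyn(m(2008m12)) in Figure 3 appears to indicate a dynamic prediction starting in December 2008. The difference between static and dynamic forecasting can be sketched with a toy AR(1) recursion in Python. The coefficients `c` and `phi` and the short input series are hypothetical and chosen only to make the arithmetic easy to follow; they are not the fitted ARIMA (2,1,2) parameters.

```python
def static_forecast(actual, c, phi):
    """One-step-ahead forecasts: each prediction uses the previous OBSERVED value."""
    return [c + phi * prev for prev in actual[:-1]]

def dynamic_forecast(last_observed, c, phi, steps):
    """Multi-step forecasts: each prediction feeds on the previous PREDICTION."""
    preds, value = [], last_observed
    for _ in range(steps):
        value = c + phi * value
        preds.append(value)
    return preds

actual = [10.0, 8.0, 6.0]  # hypothetical observations
static_preds = static_forecast(actual, c=1.0, phi=0.5)
dynamic_preds = dynamic_forecast(actual[0], c=1.0, phi=0.5, steps=3)
print(static_preds)
print(dynamic_preds)
```

Because a dynamic forecast never sees new observations, its errors can accumulate over the horizon, which is why the length of the forecasting window matters when judging accuracy.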
5.0 RESULTS
The results of the analyses carried out using IRC data for a varying (and increasing) number of years are summarized in Table 9. The estimated and actual values are for the target year 1996.

TABLE 9 Results Obtained With Different Sets of Data
Data Used Model Estimated Value Actual Value Error (%)
1961-1985(25 years) ARIMA(6,2,3) 97736195 108336195 -9.78
1956-1985(30 years) ARIMA(6,2,3) 110117248 108336195 1.644
1951-1985(35 years) ARIMA(7,2,3) 109951968 108336195 1.491

A negative error signifies underestimation, i.e. the actual traffic volume was greater than the one predicted. A positive error shows that the value predicted by analysis was more than the actual value. As can be noted, TS analysis was possible only with data for 25 years or more (analysis with 15 and 20 years' data could not be run due to data insufficiency). Further, the accuracy of TS analysis improved markedly with an increasing number of years.
For the analysis carried out with AADT data from PeMS, DOT, California, ARIMA (2,1,2) predicts an AADT value of 114959 for Mar. 2011 while the actual value is 111834, resulting in an overestimation error of 2.794%. The RMSE values in this case are significantly lower than in the previous set of analysis. This highlights the fact that the accuracy of estimation by Time Series Analysis improves drastically with increasing amount (richness) of data available.
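The percentage errors quoted above follow from the estimated and actual values by simple arithmetic, which the short Python sketch below reproduces. The function name and the dictionary layout are illustrative choices; the numbers are those reported in Table 9 and in the preceding paragraph.

```python
def percent_error(estimated, actual):
    """Signed error in percent; negative means underestimation."""
    return 100.0 * (estimated - actual) / actual

cases = {
    "IRC 1961-1985, ARIMA(6,2,3)": (97736195, 108336195),
    "IRC 1956-1985, ARIMA(6,2,3)": (110117248, 108336195),
    "IRC 1951-1985, ARIMA(7,2,3)": (109951968, 108336195),
    "PeMS Lark Ellen, ARIMA(2,1,2)": (114959, 111834),
}

errors = {name: percent_error(est, act) for name, (est, act) in cases.items()}
for name, err in errors.items():
    print(f"{name}: {err:+.3f}%")
```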
The corresponding errors obtained from the two alternative approaches, Trend Line Analysis and Econometric Regression Analysis, were found to be 93.57% and 6.202% respectively (8). As is clear from these results, the error from Time Series Analysis is considerably lower than that from the other two methods, underlining its usefulness as a forecasting technique in the future.
6.0 CONCLUSIONS
Identification, investigation and implementation of appropriate traffic forecasting techniques are imperative to meaningful and sustainable allocation of scarce resources like land, labor and funds for developing nations. Time Series Analysis can be a promising remedy for the problem of overestimation of future traffic levels, a trend generally observed when forecasting with other traditional techniques. This method has long been used for short-term forecasting in finance and econometrics, and an understanding of its use in transportation engineering must be developed. The time frame for accurate forecasts by this method (which in this paper is 11 years into the future) can be further investigated. As was realized during this analysis, at least 30 data points were required for acceptable results from forecasting. If this method's need for large, rich data sets is met by implementation of proper technology then, in agreement with the findings of other researchers, it should contribute favorably towards accurate traffic forecasting in times to come. Use of multivariate Time Series methods like GARCH (Generalized Autoregressive Conditional Heteroskedasticity) and ARCH processes, incorporating factors like changes in land use patterns and a few relevant economic indicators, may produce even more accurate results.
Acknowledgement
The authors would like to thank Jane Berner of Caltrans for providing access to the PeMS data and for her useful suggestions and illustrations about the database. They would also like to thank Miss Nishita Sinha, M.A., Economics, JNU, New Delhi for her useful insights and contributions.
References
[1] Bhar, L. M., and V. K. Sharma. Time Series Analysis, Indian Agricultural Statistics Research Institute, New Delhi, pp. 1-15.
[2] Box, G. E. P., and G. M. Jenkins. Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day, 1976.
[3] Cervero, R., Are Induced Traffic Studies Inducing Bad Investments? ACCESS, 22, 2003, pp. 22-27.
[4] Cervero, R., and M. Hansen. Induced Travel Demand and Induced Road Investment: A Simultaneous Equation Analysis, Journal of Transport Economics and Policy, Vol. 36, No. 3, 2002, pp. 469-490.
[5] Clark, S., Traffic Prediction using Multivariate Nonparametric Regression, Journal of Transportation Engineering, ASCE, Vol. 129, No. 2, 2003, pp. 161-168.
[6] Dhingra, S. L. et al., Application of time series techniques for forecasting truck traffic attracted by the Bombay metropolitan region, Journal of Advanced Transportation, Vol. 27, No. 3, 1993, pp. 227–249.
[7] Hymel, K. M. et al., Induced Demand and Rebound Effects in Road Transport, Transportation Research Part B: Methodological, Vol. 44, No. 10, 2010, pp. 1220–1241.
[8] Jha, Kartikeya and Shriniwas Arkatkar. "Traffic Forecasting Techniques for Projecting Vehicular Population in India", National Conference and Workshop on Recent Advances in Traffic Engineering, SVNIT Surat, India, 2012, pp. 30-36.
[9] Kadiyali, L. R., Road Transport Demand Forecast for 2000 AD, Journal of the Indian Roads Congress, Vol. 384, No. 48(3), 1987.
[10] Kadiyali, L. R., and T. V. Shashikala. Road Transport Demand Forecast for 2000 AD Revisited and Demand Forecast for 2021, Journal of the Indian Roads Congress, Vol. 557, 2009, pp. 235-237.
[11] Matas, Anna et al., Demand Forecasting in the Evaluation of Projects, Working Paper in Economic Evaluation of Transportation Projects, 2009, pp. 1-31.
[12] Nihan, N. L., and K. O. Holmesland. Use of the Box and Jenkins Time Series Technique in Traffic Forecasting, Transportation, Vol. 9, No. 2, 1980, pp. 125–143.
[13] Oswald, R. K. et al., Traffic Flow Forecasting Using Approximate Nearest Neighbor Nonparametric Regression, Research Report no. UVACTS-15-13-7, Center for Transportation Studies at the University of Virginia, 2001.
[14] Pankratz, A., Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, John Wiley & Sons, New York, 1983.
[15] Project: Feasibility for 6-laning of NH-2 from Delhi-Agra Project on DBFO pattern under NHDP Phase V, Consulting Engineering Services, 2007, Chapter 3.
[16] Ramsey, S., Of Mice and Elephants, ITE Journal, Vol. 75, No. 9, 2005, p. 38.
[17] Skamris, M. K., and B. Flyvbjerg. Inaccuracy of Traffic Forecasts and Cost Estimates on Large Transport Projects, Transport Policy, Vol. 4, No. 3, 1997, pp. 141-146.
[18] “Time Series Data on Road Transport Passenger and Freight Movement (1951-1991)”, Special Publication 45, Indian Roads Congress, New Delhi, 1996.
Websites
[19] http://www.pems.dot.ca.gov accessed on 29.03.2012
[20] http://www.ssc.wisc.edu/~bhansen/390/stata.pdf
[21] http://www.ccsr.ac.uk/publications/teaching/mlr.pdf
[22] http://dss.princeton.edu/training/TS101.pdf
[23] http://www.stat.tamu.edu/~jnewton/stat626/topics/topics/topic13.pdf