
Link Travel Time Prediction using Origin Destination Matrix and Neural Network Model

AYOBAMI EPHRAIM ADEWALE∗

University of [email protected]

May 28, 2018

Abstract

Origin Destination (OD) matrices have been used over the years by transport agencies to understand and meet transportation demands, but with the increase in the world's population, previous methods such as the Gravity Model, statistical models and Equilibrium models have become inefficient in their OD matrix predictions. The aim of this paper is to describe and take advantage of recent advances in deep learning to predict bus travel times by using an OD matrix and a Neural Network.

I. Introduction

The birth of the public bus transportation system has contributed immensely to the growth of cities such as New York in the United States of America, Tallinn in Estonia, Beijing in China and Munich in Germany. It has been used to reduce traffic congestion, reduce the emission of greenhouse gases that are harmful to the environment, improve access to opportunities within connected cities, boost economic growth and, lastly, improve quality of life in general. However, most people are still reluctant to take public buses and thus prefer to go around in their private vehicles. This is quite understandable because the system is easily affected by weather, traffic signals, traffic fluctuations, peak hours and road incidents, which often lead to delays in set schedules and irregularities in journey times and bus arrival times. In some countries, the difference between the set schedule and the actual arrival times can be up to ten minutes or more. This not only affects the plans of the commuters but also reflects on economic growth. Based on this, there has been increasing demand for scientific techniques to solve these lingering problems.

∗keywords: OD matrix, Deep learning, Neural Networks, ITS

In this article, we intend to adopt the concept of the Origin Destination matrix and Deep learning techniques in the prediction of trip travel times in public transport. The resulting OD matrix will provide a detailed picture of the distribution of trip travel times in a region and can be used by transport agencies to plan transportation needs.

The remainder of this paper is organized as follows: section 2 reviews the literature, section 3 describes the Deep Learning techniques, section 4 discusses the case study and the last section covers the results and conclusion.

II. Literature Review

In [1], Remya et al. took advantage of the computing abilities of the Artificial Neural Network (ANN), which have been proven in various fields such as pattern recognition and fields related to prediction and estimation. ANN was used to develop a new technique that would result in a more accurate O-D matrix estimation. The authors divided the OD matrix estimation problem into three parts: shortest path estimation, selection of links and training of the neural network. Dijkstra's algorithm was used for shortest path selection, and the selection of appropriate links was based on a few assumptions and constraints. This was done because the availability of multiple links with unequal cost between any pair of zones often affects the computation and accuracy of the OD matrix estimation. To minimize the deviation between the model outputs and the target values, the Levenberg-Marquardt algorithm was used to train the neural network and the performance was measured using Mean Squared Error (MSE). The result showed that the neural network model fits well in the analyzed scenarios, but this was subject to several assumptions and constraints which might not give an accurate OD matrix estimate in a different scenario.

Distributed System Seminar • April 2018

In the second paper [2], Daehyon Kim and Yohan Chang adopted a type of ANN called a multi-layer feed-forward network, trained with a backward propagation model, to solve the O-D estimation problem using link traffic counts. The result discussed in the paper showed that the backward propagation model provided better estimation accuracy than the popularly adopted equilibrium-based O-D estimation. The authors mentioned that the model discussed is also suitable for the real-time dynamic O-D estimation problem and conjectured that it might be more reliable when applied to large, complicated road networks with missing and noisy link data. The issues with the method discussed in this paper can be divided into two. First, the assumptions made based on the result of the model must be validated and verified by doing the same test on a large, complicated road network. Second, the result of a neural network model is more accurate when it is trained with a large data set, and there is a limit to the size of data that can be acquired through link traffic counts. This means that a more accurate OD matrix estimate might have been obtained if the test had made use of a large GPS dataset.

In [3], Gusri Yaldi et al. took an interesting approach by testing the performance of three training algorithms for NN models when used to generate an OD matrix estimate; the aim was to find the algorithm that would generate the most accurate OD matrix estimate. The NN model was trained with the Backward Propagation algorithm, the Variable Learning Rate algorithm and the Levenberg-Marquardt (LM) algorithm. The trained network was then used to predict an OD matrix for a new data set which had not been used in the training process. The experiment was carried out 30 times and the performance of the three algorithms was compared by looking at the Root Mean Square Error (RMSE) between the observed and the estimated trip numbers. The result of the experiment showed that the NN model trained with LM performed better than the models trained with the other two algorithms. The authors also mentioned that there are other factors that can affect the performance of the NN model, such as the type of normalization method used. The experiment made use of work trip data based on a home interview survey in Padang City, West Sumatra, Indonesia, and it would be interesting to see the output when the test is done with a big data set such as a GPS bus data set, because NN models give better output when trained on a big data set.

III. Neural Network Model

The Neural Network is a concept that was inspired by the operation of the brain and, so far, it has been successful in solving prediction and estimation problems, mainly because of its ability to find complex non-linear relationships. It makes use of neurons, known as nodes, and signals are sent from one neuron to another just as happens inside the brain. Each neuron in the network processes the signal received from other neurons through a function known as the activation function and produces an output which is either forwarded to another neuron, displayed as the final output or returned back into the network for further processing.



The neurons in the network are arranged in layers, and information (or signals, as it is called) is exchanged between layers. The number of layers and the type of connection between them determine the architecture of the NN. For example, figure 1 below shows the smallest possible neural network structure, called the perceptron. It has two layers: the input layer with two input neurons and the output layer with one neuron.

Figure 1: Perceptron

The input neurons read the two features of the dataset into the network and the neuron in the output layer applies an activation function to the inputs. The output neuron c performs the simplest output function on a and b: it multiplies the values by randomly chosen weights, adds a bias to the weighted sum and applies the activation function to the result.

$$c = f\left(\sum_{n=1}^{2} w_n x_n + \theta\right)$$

Where:
$x_n$ is the value of the $n$th input
$w_n$ is the weight of the $n$th input
$\theta$ is the bias
$f$ is the activation function
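The perceptron output can be illustrated with a minimal Python sketch. The step activation and the hand-picked weights and bias below are assumptions chosen for illustration (they realise a logical AND of the two inputs); they are not values taken from this study, where the weights start out random and are learned.

```python
from typing import Callable, Sequence


def step(v: float) -> float:
    """Threshold activation: 1 if the weighted sum is non-negative, else 0."""
    return 1.0 if v >= 0.0 else 0.0


def perceptron(x: Sequence[float], w: Sequence[float], theta: float,
               f: Callable[[float], float] = step) -> float:
    """Compute c = f(sum_n w_n * x_n + theta) for a single output neuron."""
    weighted_sum = sum(w_n * x_n for w_n, x_n in zip(w, x))
    return f(weighted_sum + theta)


# Illustrative (not learned) parameters: the neuron fires only when a = b = 1.
AND_WEIGHTS = (1.0, 1.0)
AND_BIAS = -1.5

and_outputs = [perceptron((a, b), AND_WEIGHTS, AND_BIAS)
               for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Because the decision depends only on which side of the line $w_1 a + w_2 b + \theta = 0$ the input falls, a single neuron of this kind can separate AND but not, for example, XOR.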

Perceptrons are very limited in what they can represent; they can only be used for representing linearly separable functions. When the relationship between the input data and the output becomes more complex, the number of layers in the Neural Network is increased. Figure 2 shows a type of neural network called the Multi-layer perceptron (MLP) neural network, with three layers: an input layer, a hidden layer and an output layer. This type of architecture is good at identifying patterns and trends in data, for example in pattern recognition problems and time series problems.

Figure 2: MLP Architecture

In this paper we focus on one type of multi-layer perceptron neural network, the backward propagation network, which is a form of deep neural network. In this network, the weight for each input is initialized and the bias is added as discussed above. The computation of each layer is passed forward to the next layer until it reaches the output layer. The result at the output layer is compared with the target and, if the result differs from the target by a large margin, the error is propagated back to the previous layers in the network so as to adjust the previously used weights and biases.

The backward propagation algorithm tries to find the minimum of a function by repeatedly stepping in the direction of the negative of the slope of the function (for a maximum, the positive slope is followed instead) [7]. Mathematically, the error at the output layer is:

$$E = \frac{1}{2}\sum_{i} (O_i - t_i)^2 \qquad \text{(MSE)}$$

and the algorithm tries to minimize $E$ by following its gradient with respect to each weight:

$$\frac{\partial E}{\partial w_{ij}^{k}}$$

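Using the gradient descent strategy, the error minimization problem becomes an application of the chain rule of differentiation:

$$\frac{\partial E(n)}{\partial w_{ij}(n)} = \frac{\partial E(n)}{\partial e_j(n)} \, \frac{\partial e_j(n)}{\partial y_j(n)} \, \frac{\partial y_j(n)}{\partial v_j(n)} \, \frac{\partial v_j(n)}{\partial w_{ij}(n)} \qquad [7]$$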

After simplifying, the rule for updating the weights at each node becomes:

$$\Delta w_{ij} = \eta \, \delta_j(n) \, y_i(n)$$
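The simplification can be spelled out for an output neuron. The definitions of $v_j$, $y_j$ and $e_j$ below are not stated in the text; they are the usual conventions for this notation and are assumed here, as is the activation function symbol $f$.

```latex
% Assumed definitions (not given in the text):
%   v_j(n) = \sum_i w_{ij}(n) y_i(n)     weighted input of neuron j
%   y_j(n) = f(v_j(n))                   output of neuron j
%   e_j(n) = t_j(n) - y_j(n)             error at output neuron j
%   E(n)   = \tfrac{1}{2} \sum_j e_j(n)^2
\begin{align*}
\frac{\partial E(n)}{\partial e_j(n)} &= e_j(n), &
\frac{\partial e_j(n)}{\partial y_j(n)} &= -1, \\
\frac{\partial y_j(n)}{\partial v_j(n)} &= f'(v_j(n)), &
\frac{\partial v_j(n)}{\partial w_{ij}(n)} &= y_i(n), \\
\frac{\partial E(n)}{\partial w_{ij}(n)} &= -\,e_j(n)\, f'(v_j(n))\, y_i(n), \\
\Delta w_{ij} &= -\eta\, \frac{\partial E(n)}{\partial w_{ij}(n)}
  = \eta\, \delta_j(n)\, y_i(n),
  \qquad \delta_j(n) = e_j(n)\, f'(v_j(n)).
\end{align*}
```

Under these assumptions $\eta$ is the learning rate and $\delta_j(n)$ is the local error signal of neuron $j$; for a hidden neuron, $\delta_j(n)$ is instead obtained from the error signals of the neurons it feeds.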

Basically, backward propagation is broken down into the following steps:

1. Feed-forward computation through the network

2. Compute the error

3. Backward propagation to the output layer based on the error

4. Backward propagation to the hidden layer

5. Adjust the weights and repeat from 1.
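The five steps above can be sketched in plain Python for a network with one hidden layer. This is a minimal illustration under stated assumptions, not the implementation used in this study: the sigmoid hidden activation, the linear output neuron, the per-sample (online) updates, the toy data and all names (`train`, `SAMPLES`, `eta`) are chosen here for the example.

```python
import math
import random


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def train(samples, hidden=3, eta=0.1, epochs=2000, seed=0):
    """Train a tiny MLP (sigmoid hidden layer, linear output) online.

    Returns the summed error E = 1/2 * e^2 observed in each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(hidden)]
    b_hid = [0.0] * hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b_out = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in samples:
            # 1. Feed-forward computation through the network.
            y_hid = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                     for ws, b in zip(w_hid, b_hid)]
            o = sum(w * y for w, y in zip(w_out, y_hid)) + b_out
            # 2. Compute the error at the output.
            e = t - o
            total += 0.5 * e * e
            # 3. Error signal of the output neuron (linear: derivative is 1).
            delta_out = e
            # 4. Error signals of the hidden neurons (sigmoid' = y * (1 - y)),
            #    computed with the output weights from before the update.
            delta_hid = [delta_out * w * y * (1.0 - y)
                         for w, y in zip(w_out, y_hid)]
            # 5. Adjust weights: delta_w_ij = eta * delta_j * y_i.
            w_out = [w + eta * delta_out * y for w, y in zip(w_out, y_hid)]
            b_out += eta * delta_out
            for j in range(hidden):
                w_hid[j] = [w + eta * delta_hid[j] * xi
                            for w, xi in zip(w_hid[j], x)]
                b_hid[j] += eta * delta_hid[j]
        history.append(total)
    return history


# Toy data (assumed for illustration): the target is the sum of the inputs.
SAMPLES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
           ((1.0, 0.0), 1.0), ((1.0, 1.0), 2.0)]
history = train(SAMPLES)
```

Plotting `history` against the epoch number gives the usual learning curve; the error in the last epoch should be well below that of the first.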

A Neural Network can be used to solve the OD matrix prediction problem [3]. In this paper we try to model an NN architecture that can be used to predict an OD matrix based on historical data.

While trying to solve an OD matrix prediction problem with NN, [3] compared different NN training algorithms: the Levenberg-Marquardt (LM), backward propagation and Variable Learning Rate (VLR) algorithms. The result showed that LM outperformed the other training algorithms. However, in this study, the AdaGrad algorithm is the algorithm of choice because it has been shown to work well for sparse datasets.
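For reference, the AdaGrad update rule can be sketched as follows. This is the standard textbook form of the algorithm, shown on a toy quadratic; the function names, the learning rate `eta`, the constant `eps` and the toy objective are assumptions for illustration and are not the settings used in this study.

```python
import math
from typing import Callable, List, Sequence


def adagrad(grad: Callable[[Sequence[float]], List[float]],
            w0: Sequence[float], eta: float = 0.5,
            eps: float = 1e-8, steps: int = 500) -> List[float]:
    """Minimise a function with AdaGrad, given its gradient.

    Each parameter gets its own effective step size: eta divided by the
    square root of the running sum of that parameter's squared gradients.
    """
    w = list(w0)
    g_sq = [0.0] * len(w)  # accumulated squared gradients, per parameter
    for _ in range(steps):
        g = grad(w)
        for k, g_k in enumerate(g):
            g_sq[k] += g_k * g_k
            w[k] -= eta * g_k / (math.sqrt(g_sq[k]) + eps)
    return w


def toy_grad(w: Sequence[float]) -> List[float]:
    """Gradient of (w0 - 3)^2 + 10 * (w1 + 1)^2, minimised at (3, -1)."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_star = adagrad(toy_grad, [0.0, 0.0])
```

Parameters that rarely receive a gradient (as happens with one-hot encoded inputs, where most entries are zero) accumulate a small `g_sq` and therefore keep a comparatively large step size, which is the usual argument for AdaGrad on sparse data.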

Details                              Value
Length                               19 km
Bus stops (one direction)            59
Journey time (excluding peak time)   60 minutes
Journey Id                           2

Figure 3: Details for Route 46A

The output of the model will be tested by computing the Root Mean Square Error (RMSE) between the expected value and the estimated value.

$$\mathrm{RMSE} = \sqrt{\frac{\sum \left(tt_{ij} - Tt_{ij}\right)^2}{z}} \qquad [3]$$

Where $tt_{ij}$ = the observed journey time from origin zone i to destination zone j for testing data.
$Tt_{ij}$ = the estimated journey time from origin zone i to destination zone j for testing data.
$z$ = the number of ij pairs.
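As a small worked example of the RMSE, the following Python sketch computes it for three origin-destination pairs. The function name and the journey times are made up for illustration.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean square error over z paired journey times."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    z = len(observed)
    squared = sum((tt - Tt) ** 2 for tt, Tt in zip(observed, estimated))
    return math.sqrt(squared / z)


# Illustrative journey times in minutes: errors of -2, +2 and -3 minutes.
example = rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Here the squared errors are 4, 4 and 9, so the result is the square root of 17/3, roughly 2.4 minutes.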

IV. Data Collection

The dataset used for this study was obtained from the web page of Dublin City Council's traffic control, and the data for the month of November was selected from the dataset. Route 46A was chosen because it is a busy route that gives commuters direct connectivity to both commercial and residential areas.

i. Data Description

In the dataset, each bus produces data every 20 seconds, which is sent to the monitoring system. The data sent consists of the following:

1. The time-stamp of the event, that is, the time the event was sent to the monitoring system.

2. The current location of the bus in latitude and longitude.

3. An identifier for the bus, termed the vehicle journey id.

4. An identifier for the journey, termed the journey id.

5. The journey pattern, which shows whether the current journey is north bound or south bound.

6. The bus stop id, which relates the bus to a stop on its journey. In this dataset, every event sent by a bus has a bus stop identifier even when the bus is not currently at the bus stop.

ii. Data Cleaning and Preprocessing

Before fitting the data into the neural network model, I had to prepare the data for the model by cleaning the data and extracting the OD matrix from the pre-processed dataset. The first step of preprocessing the data was to eliminate erroneous points or noise from the data. I grouped the dataset into journeys and point filtering was done on each unique journey.

1. The dataset for line 46A comes with four journey patterns: 046A0001, 046A1001, 401001 and 400001. While analyzing the dataset, only 046A0001 and 046A1001 were found to occur frequently, and they were also lengthy (covering up to 19 km). Based on this, journeys belonging to 401001 and 400001 were eliminated from the dataset.

2. Next, I eliminated all journeys that had all their data points inside a 100 m by 100 m square. An example can be found in the figure below.

3. Next, I eliminated all journeys that had data points with large time jumps between them. Each bus is expected to send an update about its location to the AVL (automatic vehicle location) system every 20 seconds, and delays are possible due to obstruction etc. Thus, journeys with a gap of more than 3 minutes between updates sent to the AVL were eliminated from the dataset.

4. Lastly, for each journey in the dataset, I marked all stops made by the bus at different bus stops throughout the journey life span; journeys with fewer than 8 marked stops were marked as bad journeys and dropped from the dataset.
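The last three filters can be sketched in Python as below. This is an illustrative sketch, not the code used in the study: the `Point` record, the function names, the axis-aligned bounding-box test and the metres-per-degree approximation are all assumptions, and the number of marked stops is taken as an input because the stop-marking procedure itself is not detailed here.

```python
import math
from typing import List, NamedTuple


class Point(NamedTuple):
    t: float    # time-stamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def within_square(points: List[Point], side_m: float = 100.0) -> bool:
    """True if all points fit in an axis-aligned side_m x side_m box."""
    lats = [p.lat for p in points]
    lons = [p.lon for p in points]
    m_per_deg_lat = 111_320.0  # approximate metres per degree of latitude
    mean_lat = math.radians(sum(lats) / len(lats))
    height = (max(lats) - min(lats)) * m_per_deg_lat
    width = (max(lons) - min(lons)) * m_per_deg_lat * math.cos(mean_lat)
    return height <= side_m and width <= side_m


def has_time_jump(points: List[Point], max_gap_s: float = 180.0) -> bool:
    """True if two consecutive updates are more than max_gap_s apart."""
    times = sorted(p.t for p in points)
    return any(b - a > max_gap_s for a, b in zip(times, times[1:]))


def keep_journey(points: List[Point], marked_stops: int,
                 min_stops: int = 8) -> bool:
    """Apply the stationary, time-jump and minimum-stops filters."""
    if within_square(points) or has_time_jump(points):
        return False
    return marked_stops >= min_stops


# Illustrative journeys: one moving bus, one parked bus, one with a gap.
moving = [Point(20.0 * i, 53.30 + 0.001 * i, -6.20 - 0.001 * i)
          for i in range(10)]
parked = [Point(20.0 * i, 53.30, -6.20) for i in range(10)]
gapped = moving[:5] + [Point(p.t + 300.0, p.lat, p.lon) for p in moving[5:]]
```

With these inputs, `keep_journey(moving, 10)` keeps the journey, while the parked journey, the journey with a 320 second gap and a moving journey with only 5 marked stops are all rejected.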

The calculated Travel Time and Journey Travel Time were both taken as the baseline for the prediction model.

iii. Matrix Extraction

The OD matrix for each journey in the already filtered dataset was extracted by calculating the journey time between stops in the journey.

$$JT_{ab} = T_b - T_a$$

where $JT_{ab}$ is the journey time between stop A and stop B,
$T_a$ is the time stamp of the bus at stop A, and
$T_b$ is the time stamp of the bus after getting to stop B.

The table in figure 4 describes the OD matrix extracted from the dataset for each journey. In total, 1195 OD matrices were extracted from the dataset.

Origin   A      B      ...   Z
A        JTaa   JTab   ...   JTaz
B        0      JTbb   ...   JTbz
...      ...    ...    ...   ...
Z        0      0      ...   JTzz

Figure 4: Origin Destination Matrix
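A minimal Python sketch of this extraction step is given below. It assumes that each journey has already been reduced to one arrival time-stamp per stop, in visiting order, and it takes the journey time from an earlier stop to a later stop as the later arrival time minus the earlier one. The function name, the stop labels and the times are illustrative, and the diagonal is simply left at zero here since how the diagonal entries were obtained is not described.

```python
from typing import List, Sequence, Tuple


def od_matrix(arrivals: Sequence[Tuple[str, float]]
              ) -> Tuple[List[str], List[List[float]]]:
    """Build an upper-triangular journey-time matrix for one journey.

    arrivals: (stop_id, arrival time-stamp in seconds) in visiting order.
    Entry [a][b] with a < b is the journey time from stop a to stop b;
    entries below the diagonal stay 0, as in the matrix layout above.
    """
    stops = [stop for stop, _ in arrivals]
    times = [t for _, t in arrivals]
    n = len(stops)
    matrix = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            matrix[a][b] = times[b] - times[a]
    return stops, matrix


# Illustrative journey with three stops.
stops, jt = od_matrix([("A", 0.0), ("B", 120.0), ("C", 300.0)])
```

For this journey the matrix holds 120 seconds for A to B, 300 seconds for A to C and 180 seconds for B to C.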

V. Data Analysis and Input Variable Declaration

To determine the correlation between the journey times and other variables in the dataset, it was necessary to analyze the filtered data. By plotting the average journey time of trips in the dataset for both the north bound and south bound journeys, some correlations were determined. From the plots in figures 5 and 6, we can see that the journey time depends on the hour of the day at which the journey started and on the day of the week. It is also logical that the journey time depends on the distance between the stops in the journey.

Therefore, from this analysis, the input variables considered for the model were the distance between stops, the time of the day, the day of the week and the two stops considered. From these, the journey time was predicted. Since the day of the week and the stops are categorical data, they were represented using One-Hot encoding.
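One-Hot encoding of the categorical inputs can be sketched as follows. The helper names, the feature order and the stop identifier "4320" are hypothetical, and how the encoded features were mapped onto the input neurons of the network is not specified in this paper.

```python
from typing import List, Sequence

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Return a vector with a single 1.0 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]


def encode_sample(distance_km: float, hour: int, day: str,
                  origin: str, destination: str,
                  stops: Sequence[str]) -> List[float]:
    """Concatenate numeric inputs with one-hot encoded day and stops."""
    return ([distance_km, float(hour)]
            + one_hot(day, DAYS)
            + one_hot(origin, stops)
            + one_hot(destination, stops))


# Illustrative stop list; "4320" is a made-up stop id.
STOPS = ["2039", "807", "4320"]
features = encode_sample(1.2, 16, "Fri", "2039", "807", STOPS)
```

With three known stops the resulting vector has 2 + 7 + 3 + 3 = 15 entries; on a real route with 59 stops the stop encodings dominate the input size and are mostly zeros.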

Figure 5: North bound Journey

Figure 6: South bound Journey

Furthermore, the travel time distribution for the south bound journey from Dun Laoghaire (Stop 2039) to Phoenix Park (Stop 807) on weekdays was analyzed. Figure 7 shows the distribution, and from the plot it can be seen that the travel times in the dataset follow a normal distribution curve. The left half of the plot represents the non-peak periods while the right half of the plot represents the travel time distribution at peak periods.

Figure 7: Travel Time distribution

It can also be deduced that the average travel time in the dataset is 80 minutes and that, for any particular trip starting at Stop 2039 with destination Stop 807, there is a 68% probability that its travel time will be within 10 minutes of the average travel time.
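The 68% figure corresponds to the probability mass within one standard deviation of the mean of a normal distribution. The short Python check below assumes a standard deviation of 10 minutes; that value is inferred from the statement above rather than reported in the dataset analysis.

```python
import math


def prob_within(std: float, half_width: float) -> float:
    """P(|X - mean| <= half_width) for a normally distributed X."""
    return math.erf(half_width / (std * math.sqrt(2.0)))


# Assumed standard deviation of 10 minutes, interval of +/- 10 minutes.
p = prob_within(std=10.0, half_width=10.0)
```

Under the same assumption, widening the interval to 20 minutes on either side of the 80 minute average would cover about 95% of trips.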

VI. Preliminary Result

As discussed in the previous section, a neural network was modeled and AdaGrad is the learning algorithm of choice. The number of input neurons used is 6, which is the same as the number of independent variables in the dataset, while the number of neurons used in the output layer is 1, which is the dimension of the travel time T to be predicted. To determine the number of hidden layers to use and the number of neurons in each layer, we conducted an experiment to obtain the best combination of values; these values are presented in table 1.

Furthermore, in order to improve the prediction accuracy of the model, the dataset was divided into two classes according to their estimated travel time before being used as an input to the neural network. Each class of the dataset is passed into a different neural network; however, the two networks have the same configuration, highlighted in table 1. When prediction is done, the results of the two networks are then combined. Figure 8 gives a pictorial description of how this division was done.

Details               Value
Input Layer           6 neurons
First hidden Layer    12 neurons
Second hidden Layer   50 neurons
Output Layer          1 neuron
Epoch                 1000
Optimizer             adaGrad

Table 1: NN configuration

Figure 8: Dataset Class division
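The division of the dataset between two networks can be sketched as a simple routing step. Everything named below is hypothetical: the threshold value, the function name and the two stand-in models (plain functions instead of trained networks) are assumptions for illustration, since the actual class boundary is not stated here.

```python
from typing import Callable, List, Sequence, Tuple

# (input features, estimated travel time in minutes)
Sample = Tuple[Sequence[float], float]
Model = Callable[[Sequence[float]], float]


def predict_by_class(samples: Sequence[Sample], short_model: Model,
                     long_model: Model, threshold_min: float) -> List[float]:
    """Route each sample to one of two models and merge the predictions.

    Samples whose estimated travel time is below threshold_min go to
    short_model, the rest to long_model; the output keeps input order.
    """
    predictions = []
    for features, estimated_tt in samples:
        model = short_model if estimated_tt < threshold_min else long_model
        predictions.append(model(features))
    return predictions


# Stand-in models that return a constant, to show the routing only.
demo = predict_by_class(
    [((0.4,), 2.0), ((7.5,), 30.0), ((0.9,), 4.0)],
    short_model=lambda features: 1.0,
    long_model=lambda features: 2.0,
    threshold_min=10.0,
)
```

In this sketch the first and third samples are handled by the short-trip model and the second by the long-trip model, and the combined result preserves the original order of the samples.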

i. Model Evaluation

The performance of the model was evaluated in terms of accuracy by using the Root Mean Square Error (RMSE). That is, the predicted travel time (TT) was tested against the ground truth value, which is the estimated travel time discussed in chapter three.

$$\mathrm{RMSE} = \sqrt{\frac{\sum \left(tt^{j}_{a,b} - TT^{j}_{a,b}\right)^2}{z}}$$

Where $tt^{j}_{a,b}$ = the observed (ground truth) travel time from stop a to stop b for a journey j in testing data.
$TT^{j}_{a,b}$ = the predicted travel time from stop a to stop b for a journey j in testing data.
$z$ = the total number of links in the journey.

Figure 9 presents the behavior of the model when used to predict a single journey outside peak hours. It can be seen that the difference between the predicted and the estimated travel time of a link is at most approximately one minute. That is, in most cases the model predicted the travel time accurately, while in the remaining cases the difference between the predicted and the estimated value is less than one minute.

Figure 9: Predicted link travel time for a journey outside peak hour

Figure 10 shows the result of using the model to predict a journey inside a peak hour (16:00 is an example of a rush hour period, as shown in the analysis done in chapter 3). It can be seen that the model is not affected by the peak hour, as it almost accurately predicted the irregular travel times that might occur at that hour.

Figure 10: Predicted link travel time for a journey inside peak hour



Figure 11 shows the RMSE observed for both peak hours and non-peak hours on a typical weekday. The figure shows that the model performed better for peak hours than for normal hours, while on Fridays the RMSE is larger than on the other days.

Figure 11: RMSE error observed by day for non-peak and peak hours

For long jumps, Figure 12 presents the RMSE observed per day. As expected, the RMSE for long jumps is larger than that for short jumps; however, the prediction accuracy, at 88%, is still better than that for short jumps. The difference between the predicted and the ground truth values is often between 3 and 4 minutes, which is acceptable given the type of data used and the factors considered. The model is also able to capture both non-peak and peak periods, as shown in the plot. The error margin between peak hours and non-peak hours on Sunday is very low for long trips compared to that for short trips.

Figure 12: Long jumps measured RMSE

VII. Conclusion

The result showed that the NN model discussed is capable of making near-accurate predictions when the number of stops between the origin and the destination is at least 4. However, if the travel time between two or three stops is between 1 and 3 minutes, the model gives poor predictions, with a prediction accuracy close to 70%. This is because the variability for short trips is often caused by time spent at traffic lights at intersections, which is often unpredictable. To set better expectations of the model for short trips, the predicted travel time can be given in the form of a lower and upper bound pair.

However, given a bigger dataset of two to three months, the NN model might be capable of making more accurate predictions.

References

[1] Remya K P and Samson Mathew (2013). “OD Matrix Estimation from Link Counts Using Artificial Neural Network.” International Journal of Scientific and Engineering Research.

[2] Daehyon Kim and Yohan Chang (2011). “Neural Network-based O-D Matrix Estimation from Link Traffic Counts.”

[3] Gusri Yaldi, Michael A P Taylor and Wen Long Yue. “Forecasting origin-destination matrices by using neural network approach: A comparison of testing performance between back propagation, variable learning rate and levenberg-marquardt algorithms.” http://atrf.info/papers/2011/2011_Yaldi_Taylor_Yue.pdf

[4] Jean Damascène Mazimpaka and Sabine Timpf. “How They Move Reveals What Is Happening: Understanding the Dynamics of Big Events from Human Mobility Pattern.” International Journal of GeoInformation, January 2017.

[5] Manojit Nandi. “Density Based Clustering.” https://blog.dominodatalab.com/topology-and-density-based-clustering/

[6] Jing Gao. “Clustering: Density Based Method.” https://blog.dominodatalab.com/topology-and-density-based-clustering/ Lecture Note for State University of New York College, Buffalo.

[7] Abhishek Kar. “Stock Prediction using Neural Network.” Department of Computer Science and Engineering, IIT Kanpur.
