
Improving Bus Predictions Using Machine Learning

Ron Mok, Software Development Manager at TransLink
New Westminster, British Columbia, Canada

TransLink Quick Facts

• Regional transportation authority for Metro Vancouver
• Canada’s largest transit service area: 1,800 km² (700 mi²)
• Serving a population of 2.5M residents

TransLink Quick Facts - Fleet

SkyTrain: 326 train cars that serve 53 stations along 79 km (49 mi) of rapid transit. Longest rapid transit system in Canada and one of the longest fully automated driverless systems in the world.

SeaBus: 3 passenger-only ferries.

Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network in 2018.

When Is The Next Bus?

• Next Bus SMS
• Next Bus Web App
• Google Maps
• Other 3rd party apps

How Were Predictions Being Made?

• Predictions were based on comparing Automatic Vehicle Location (AVL) data to the schedule and shifting the schedule by the difference.
• The AVL dataset provides real-time GPS data for buses. This data is sent whenever a bus leaves or passes a bus stop, or every 90 seconds, whichever comes first.
• These predictions are susceptible to changing weather and traffic conditions because they do not account for them.
• This is what we called our RTTI (Real Time Transit Information) system.
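The schedule-shifting idea above can be illustrated with a minimal sketch. The function name, the data layout and the sample times are assumptions made for illustration; this is not the RTTI implementation.

```python
from datetime import datetime

def shift_schedule(scheduled_times, last_stop_index, observed_time):
    """Shift the remaining scheduled times by the deviation seen at the last reported stop.

    Hypothetical helper: scheduled_times holds one scheduled departure per stop,
    and observed_time is the AVL-reported departure at stop last_stop_index.
    """
    deviation = observed_time - scheduled_times[last_stop_index]
    return [t + deviation for t in scheduled_times[last_stop_index + 1:]]

# Made-up schedule for three stops.
schedule = [
    datetime(2019, 1, 7, 8, 0),
    datetime(2019, 1, 7, 8, 5),
    datetime(2019, 1, 7, 8, 12),
]

# AVL reports that the bus left the first stop 2 minutes late.
predictions = shift_schedule(schedule, 0, datetime(2019, 1, 7, 8, 2))
```

Every downstream stop inherits the same 2-minute delay, which is why this approach cannot react to weather or traffic further along the route.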

What Is The Problem?

• We have real-time next bus predictions, but they’re not always accurate.


Our Approach – Machine Learning

Some Examples….

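[Diagram: a bus travels from Stop A to Stop B. The Run Time Model predicts the unknown run time R1 between the two stops, and the Dwell Time Model predicts the unknown dwell time D1 at Stop B.]

Predicted Departure Time @ Stop B = R1 + D1 + Additional Factors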

Weather

• The Dark Sky API is used to acquire real-time and forecast weather data for the next 4 hours.
• Weather depends on the location of the segment or bus stop, so we broke the bus transit system into 13 geographical regions and pulled data for each one.

Weather

• The Dark Sky API offers a full collection of weather conditions. We used the following parameters in our models:

➢ Chance of precipitation

➢ Pressure

➢ Intensity of precipitation

➢ Wind speed

➢ Temperature

➢ Wind bearing

➢ Apparent temperature

➢ Cloud cover

➢ Dew point

➢ UV index

➢ Humidity

➢ Visibility
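A minimal sketch of how the twelve parameters above could be turned into a fixed-order feature vector for one region. The key names follow the Dark Sky data point naming as an assumption; the helper, the missing-value handling and the sample values are hypothetical.

```python
# Assumed key names (Dark Sky style) for the twelve parameters listed above.
WEATHER_FIELDS = [
    "precipProbability", "precipIntensity", "temperature", "apparentTemperature",
    "dewPoint", "humidity", "pressure", "windSpeed", "windBearing",
    "cloudCover", "uvIndex", "visibility",
]

def weather_features(observation, missing=float("nan")):
    """Return the weather values in a fixed order, filling absent fields with `missing`."""
    return [float(observation.get(name, missing)) for name in WEATHER_FIELDS]

# Made-up observation for one of the 13 regions; several fields are absent.
sample = {
    "precipProbability": 0.8,
    "precipIntensity": 1.2,
    "temperature": 6.5,
    "windSpeed": 4.0,
}
features = weather_features(sample)
```

Keeping the order fixed means the same vector layout can be fed to every model, whichever region the observation came from.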

ML Details – Model Level Predictions

• The ML model we chose was Extreme Gradient Boosting (XGBoost) because of its fast training time and high performance.
• Each bus stop and segment in the transit system has its own set of machine learning models, provided that there is enough data to train on (at least 500 instances of dwell/run events).
• Because we create and train a different set of XGBoost models for every segment and bus stop in our system, we end up with over 18,000 different sets of models.
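The "enough data to train on" rule can be sketched as follows. The event layout, key format and function name are assumptions; only the 500-event threshold comes from the slide, and the actual model fitting is left out.

```python
from collections import defaultdict

MIN_EVENTS = 500  # minimum number of dwell/run events quoted on the slide

def trainable_groups(events, min_events=MIN_EVENTS):
    """Group (key, duration_seconds) events and keep only keys with enough history.

    Hypothetical layout: key identifies a segment or stop, duration_seconds is
    the observed run or dwell time for one event.
    """
    grouped = defaultdict(list)
    for key, duration in events:
        grouped[key].append(duration)
    return {key: durations for key, durations in grouped.items()
            if len(durations) >= min_events}

# Made-up history: one segment with 600 run events, one stop with only 40 dwell events.
events = [(("segment", "A-B"), 120.0)] * 600 + [(("stop", "B"), 25.0)] * 40
groups = trainable_groups(events)

# Each surviving key would then get its own model set, for example an XGBoost
# regressor fitted on that key's events (not shown here).
```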

ML Details – Bus Level Predictions

• With the ML models generating prediction times for individual segments and stops (model-level predictions), we need an algorithm that aggregates these predictions into stop arrival/departure times for particular buses (bus-level predictions).
• Every time new information about a bus location or weather is received in our system, these bus-level predictions are recalculated using the latest relevant model-level predictions.

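Bus Level Predictions

[Diagram: a route with stops A, B, C, D, E and F. The Run Time ML Models predict the run times R1 to R5 for the five segments between consecutive stops, and the Dwell Time ML Models predict the dwell times D1 to D5.]

Predicted Departure Time @ Stop F = R1 + D1 + R2 + D2 + R3 + D3 + R4 + D4 + R5 + D5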
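The aggregation of run and dwell predictions into a departure time at Stop F can be sketched as a running sum. The function name and the sample durations are made up, and starting from the departure time at Stop A is an assumption, since the slide shows only the summed terms.

```python
from datetime import datetime, timedelta
from itertools import accumulate

def departure_predictions(start_departure, run_seconds, dwell_seconds):
    """Return the predicted departure time at each downstream stop.

    run_seconds[i] is the predicted run time of segment i and dwell_seconds[i]
    the predicted dwell time at the stop that ends segment i.
    """
    if len(run_seconds) != len(dwell_seconds):
        raise ValueError("expected one dwell prediction per run prediction")
    offsets = accumulate(r + d for r, d in zip(run_seconds, dwell_seconds))
    return [start_departure + timedelta(seconds=s) for s in offsets]

runs = [120, 90, 150, 60, 100]    # R1..R5 in seconds (made-up values)
dwells = [20, 30, 15, 25, 20]     # D1..D5 in seconds (made-up values)

# Assumed departure from Stop A at 08:00:00; results cover stops B to F.
times = departure_predictions(datetime(2019, 1, 7, 8, 0, 0), runs, dwells)
```

The last element is the prediction for Stop F; recomputing the list whenever a new location or weather update arrives mirrors the recalculation step described above.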

Infrastructure Considerations

Speed
➢ Departure time predictions must be generated and served to customers in a timely manner, as predictions become stale within minutes.
➢ The implementation must be able to generate predictions every 90 seconds for up to a total of 1,200 buses and their upcoming stops.

Storage
➢ Reporting database was growing at a rate of roughly 6 TB a year.

Scaling
➢ With each bus line having an average of 30 segments and 30 stops, the system must be able to generate over 20,000 predictions per minute. This varies based on peak/non-peak times.
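A back-of-envelope check shows one way a figure of this order can arise. The assumption that each bus needs one prediction per upcoming stop, capped at the 30-stop average, is ours and not stated on the slide; the bus count and refresh interval are from the slide.

```python
BUSES = 1200            # maximum number of buses, from the slide
REFRESH_SECONDS = 90    # prediction refresh interval, from the slide
UPCOMING_STOPS = 30     # assumption: one prediction per upcoming stop, up to the 30-stop average

predictions_per_cycle = BUSES * UPCOMING_STOPS
predictions_per_minute = predictions_per_cycle * 60 / REFRESH_SECONDS
print(predictions_per_minute)
```

Under these assumptions the peak load is 24,000 predictions per minute, which is consistent with the "over 20,000" requirement.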

Infrastructure Implementation

Microsoft Azure cloud platform

➢ Easy to deploy resources, e.g. VMs, DBs

➢ Supports containers to help with scaling and deployment

➢ Technical support and consultation readily available

➢ Matched the team skillset

Pilot Routes

13 pilot routes chosen based on the following factors:

➢ Trip length

➢ Number of stops

➢ Frequency

➢ Weekday services

➢ Road types

➢ Vehicle types

➢ Bus rapid transit

➢ Weather sensitivity

➢ Bus type

Results

[Chart. Series: Machine Learning, Scheduled, RTTI, Blended, Actual. X-axis: Time Of Day.]

Average Error
Old: +/- 3 min 13 sec
New: +/- 1 min 41 sec
47.8% less error!

Average Unexpected Wait Time Per Boarding
Old: +/- 3 min 27 sec
New: +/- 1 min 23 sec
60% less waiting per boarding!

Operational Support & Maintenance

Cloud cost reduction

➢ Optimize code to reduce VM usage

➢ Kubernetes to auto-scale VM cluster based on need

➢ Cheap cold-storage options for data archiving

➢ Reserved instances

➢ Turn off/de-allocate unused resources


Drift Detection

➢ As with any ML model, predictions from our models will drift over time. The system must be able to automatically detect when models drift beyond an acceptable threshold and trigger model retraining with more recent data.
➢ With over 18,000 sets of machine learning models, models should be under version control; model lineage information must be traceable to allow future investigation and reproduction of models if required.
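A minimal sketch of a threshold-based drift check, assuming drift is measured as growth in mean absolute error against a baseline recorded at training time. The metric, the 1.25 tolerance factor and all names are assumptions, not the production rule.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual durations (seconds)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def has_drifted(baseline_mae, recent_predicted, recent_actual, tolerance=1.25):
    """True when the recent error exceeds the baseline by more than the tolerance factor."""
    recent_mae = mean_absolute_error(recent_predicted, recent_actual)
    return recent_mae > baseline_mae * tolerance

baseline = 60.0                      # made-up MAE recorded when the model was trained
recent_predicted = [100, 200, 300]   # made-up recent predictions (seconds)
recent_actual = [190, 110, 400]      # made-up observed values (seconds)

# Recent errors are 90, 90 and 100 seconds, well above the 75-second limit.
needs_retraining = has_drifted(baseline, recent_predicted, recent_actual)
```

Running a check like this per model set, on a schedule, is one way to decide which of the 18,000+ sets should be retrained.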


Drift Detection Workflow

[Flowchart. Nodes: Drift Detection System, Training Data, Testing Data, Train Model, Score Model, Approved? (Yes / No), Promote Model To Production.]
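The train, score and approve steps of the workflow can be sketched as a simple promotion gate. The stand-in trainer (a mean predictor), the MAE-based approval rule and the handling of a rejected candidate (production stays unchanged) are all assumptions; the slide only names the steps.

```python
def train_mean_model(training_durations):
    """Stand-in trainer: the 'model' always predicts the mean training duration."""
    mean = sum(training_durations) / len(training_durations)
    return lambda: mean

def score(model, testing_durations):
    """Mean absolute error of the model on held-out testing data."""
    prediction = model()
    return sum(abs(prediction - d) for d in testing_durations) / len(testing_durations)

def retrain_and_maybe_promote(production, training, testing, max_mae):
    """Train a candidate, score it, and promote it only if it is approved."""
    candidate = train_mean_model(training)
    approved = score(candidate, testing) <= max_mae
    return (candidate if approved else production), approved

production = lambda: 100.0          # made-up model currently in production
training = [118.0, 122.0, 120.0]    # made-up training data (seconds)
testing = [119.0, 121.0]            # made-up testing data (seconds)

model, approved = retrain_and_maybe_promote(production, training, testing, max_mae=5.0)
```

Keeping the training and testing data that produced each promoted model is what makes the lineage traceable later.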

Architecture

Highlights:
➢ The aggregator takes the weather and bus vehicle monitoring (VM) info to generate bus-level predictions
➢ Predictions queue “smooths” out the # of predictions
➢ Bus Schedule DB serves predictions to the customer
➢ Drift monitoring system ensures models are kept accurate

Project Timeline

Timeline axis: Q2 2018, Q3 2018, Q4 2018, Q1 2019, Q2 2019, Q3 2019, Q4 2019

➢ Q2 2018: Microsoft Proof-Of-Concept on a single bus route
➢ Q4 2018: Design a more powerful ML algorithm
➢ Q1 2019: Implement ML predictions for 13 pilot routes
➢ Q2 2019: Include additional features to ML models
➢ Q3 2019: Implement ML predictions for all remaining bus routes
➢ Q4 2019: Implement drift detection and automated model creation & cost optimization

What’s Next?

• Improve ML algorithm with additional factors:

➢ Traffic

➢ Bus driver

➢ Special events

• Continue cost optimizations

• Monitor & improve data quality

Is Machine Learning Right For You?

• Do you have a lot of data?

• Is the data cleaned and validated?

• Do you have a business objective that is measurable with data?

• Does your team have the correct skills (programming & stats)?

• Is the necessary computing power & storage available to you?

• Do you have the ability to do drift detection?

If you answered “no” to any of these questions, then you’re not ready for ML.

If you answered “yes” to all of these questions, then ML might be right for you!

Thank You!

https://www.translink.ca
