Improving Bus Predictions Using Machine Learning
Ron Mok, Software Development Manager at TransLink
New Westminster, British Columbia, Canada
TransLink Quick Facts
• Regional transportation authority for Metro Vancouver
• Canada’s largest transit service area: 1,800 km² (700 mi²)
• Serving a population of 2.5M residents
TransLink Quick Facts - Fleet
SkyTrain: 326 train cars that serve 53 stations along 79 km (49 mi) of rapid transit. The longest rapid transit system in Canada and one of the longest fully automated driverless systems in the world.
SeaBus: 3 passenger-only ferries.
Bus: 200+ bus routes with a fleet of 1,500+ vehicles, and 262 million bus trips across our network in 2018.
When Is The Next Bus?
• Next Bus SMS
• Next Bus Web App
• Google Maps
• Other 3rd party apps
How Were Predictions Being Made?
• Predictions were based on comparing Automatic Vehicle Location (AVL) data to the schedule and shifting the schedule by the difference.
• The AVL dataset provides real-time GPS data for buses. This data is sent whenever a bus leaves or passes a bus stop, or every 90 seconds, whichever comes first.
• These predictions were susceptible to changing weather and traffic conditions because they did not account for either.
• This is what we called our RTTI (Real Time Transit Information) system.
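A minimal sketch of the schedule-shift approach described above, assuming times are in seconds since midnight and that the latest AVL report gives the bus's actual time at its most recent stop. The function name and parameters are hypothetical, not RTTI's actual code.

```python
from typing import Dict, List


def schedule_shift_predictions(
    scheduled: Dict[str, int],
    stop_order: List[str],
    last_stop: str,
    actual_time_at_last_stop: int,
) -> Dict[str, int]:
    """Shift the remaining schedule by the bus's current deviation.

    Hypothetical sketch: times are seconds since midnight.
    """
    # Deviation from schedule at the most recent AVL report (positive = late).
    delay = actual_time_at_last_stop - scheduled[last_stop]
    # Every downstream stop inherits the same delay; weather and traffic are ignored.
    remaining = stop_order[stop_order.index(last_stop) + 1:]
    return {stop: scheduled[stop] + delay for stop in remaining}


schedule = {"A": 28800, "B": 29100, "C": 29400}  # 08:00, 08:05, 08:10
print(schedule_shift_predictions(schedule, ["A", "B", "C"], "A", 28920))
# {'B': 29220, 'C': 29520}
```

Because the delay is carried forward unchanged, a bus running into worsening traffic or weather keeps receiving the same shift until the next AVL report arrives.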
What Is The Problem?
• We have real-time next bus predictions, but they’re not always accurate.
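Our Approach – Machine Learning
Some examples…
[Figure: a bus travels from Stop A to Stop B. The Run Time Model predicts the unknown run time R1 from A to B, and the Dwell Time Model predicts the unknown dwell time D1 at B; both models also take additional factors as input.]
Predicted Departure Time @ Stop B = departure from Stop A + R1 + D1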
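The stop-level composition above can be sketched as follows. The two toy model functions are hypothetical stand-ins for the trained run time and dwell time models (the talk does not give their form); only the way R1 and D1 are added to the departure from Stop A follows the slide. Times are in seconds.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def predict_departure_at_b(
    departure_from_a: float,
    features: Features,
    run_time_model: Callable[[Features], float],
    dwell_time_model: Callable[[Features], float],
) -> float:
    """Predicted departure at Stop B = departure from Stop A + R1 + D1."""
    r1 = run_time_model(features)    # predicted run time from A to B
    d1 = dwell_time_model(features)  # predicted dwell time at B
    return departure_from_a + r1 + d1


# Hypothetical stand-in models: rain slows the bus and lengthens boarding.
def toy_run_time(f: Features) -> float:
    return 120.0 + 30.0 * f["precip_intensity"]


def toy_dwell_time(f: Features) -> float:
    return 20.0 + 10.0 * f["precip_intensity"]


print(predict_departure_at_b(28800.0, {"precip_intensity": 1.0}, toy_run_time, toy_dwell_time))
# 28980.0
```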
Weather
• The Dark Sky API is used to acquire real-time and forecast weather data for the next 4 hours.
• Weather depends on the location of the segment or bus stop, so we broke the bus transit system into 13 geographical regions and pulled data for each one.
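One way to map a stop or segment to its weather region is a nearest-centroid lookup, sketched below. The talk does not say how the 13 regions are defined, so the three region names and centroid coordinates here are hypothetical placeholders, and nearest-centroid assignment is an assumed technique.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]

# Hypothetical region centroids; the real system uses 13 regions whose
# boundaries are not given in the talk.
REGION_CENTROIDS: Dict[str, LatLon] = {
    "downtown": (49.2827, -123.1207),
    "north_shore": (49.3200, -123.0724),
    "surrey": (49.1913, -122.8490),
}


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def region_for(point: LatLon) -> str:
    """Assign a stop (or segment midpoint) to the nearest region centroid."""
    return min(REGION_CENTROIDS, key=lambda r: haversine_km(point, REGION_CENTROIDS[r]))


print(region_for((49.2850, -123.1150)))  # downtown
```

Fetching weather once per region rather than once per stop keeps the number of weather requests per refresh equal to the number of regions.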
Weather
• The Dark Sky API offers a full collection of weather conditions. We used the following parameters in our models:
➢ Chance of precipitation
➢ Intensity of precipitation
➢ Temperature
➢ Apparent temperature
➢ Dew point
➢ Humidity
➢ Pressure
➢ Wind speed
➢ Wind bearing
➢ Cloud cover
➢ UV index
➢ Visibility
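A sketch of turning one weather data point into a fixed-order feature vector for the models. The keys are the Dark Sky data-point field names corresponding to the twelve parameters above. The Dark Sky API has since been retired, so this sketch works on an already-fetched dictionary instead of calling the service; filling missing fields with NaN is an assumption.

```python
import math
from typing import Dict, List

# Dark Sky data-point fields corresponding to the twelve parameters on the slide.
WEATHER_FIELDS: List[str] = [
    "precipProbability", "precipIntensity", "temperature",
    "apparentTemperature", "dewPoint", "humidity",
    "pressure", "windSpeed", "windBearing",
    "cloudCover", "uvIndex", "visibility",
]


def weather_features(data_point: Dict[str, float]) -> List[float]:
    """Return the weather features in a fixed order.

    Missing fields become NaN, which XGBoost treats as missing values by default.
    """
    return [float(data_point.get(field, math.nan)) for field in WEATHER_FIELDS]


sample = {"temperature": 8.5, "humidity": 0.91, "precipProbability": 0.6}  # made-up values
print(weather_features(sample)[:3])  # [0.6, nan, 8.5]
```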
ML Details – Model Level Predictions
• The ML model we chose was Extreme Gradient Boosting (XGBoost) because of its fast training time and high predictive performance.
• Each bus stop and segment in the transit system has its own set of machine learning models, provided that there is enough data to train on (at least 500 dwell/run events).
• Because we create and train a different set of XGBoost models for every single segment and bus stop in our system, we end up with over 18,000 different sets of models.
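The per-location training rule can be sketched as grouping dwell/run events by segment or stop and fitting a model only where at least 500 events exist. For simplicity this sketch trains one model per location, and `fit` is a placeholder; in the system described it would train an XGBoost regressor. The event layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

MIN_EVENTS = 500  # minimum dwell/run events needed to train (from the slide)

# Hypothetical event layout: (segment_or_stop_id, features, observed_seconds)
Event = Tuple[str, List[float], float]


def train_per_location(
    events: List[Event],
    fit: Callable[[List[List[float]], List[float]], object],
) -> Dict[str, object]:
    """Train one model per segment/stop that has at least MIN_EVENTS events.

    `fit` is a placeholder; with XGBoost it could be, for example,
    `lambda X, y: xgboost.XGBRegressor().fit(X, y)`.
    """
    grouped: Dict[str, Tuple[List[List[float]], List[float]]] = defaultdict(lambda: ([], []))
    for location, features, observed in events:
        X, y = grouped[location]
        X.append(features)
        y.append(observed)
    return {
        location: fit(X, y)
        for location, (X, y) in grouped.items()
        if len(y) >= MIN_EVENTS  # locations with too little data get no model
    }


# Usage with a trivial mean-of-targets "model" in place of XGBoost:
events = [("seg-1", [0.0], 60.0)] * 500 + [("stop-9", [0.0], 20.0)] * 10
models = train_per_location(events, lambda X, y: sum(y) / len(y))
print(models)  # {'seg-1': 60.0}
```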
ML Details – Bus Level Predictions
• With the ML models generating prediction times for individual segments and stops (model-level predictions), we needed an algorithm that aggregates these predictions into stop arrival/departure times for particular buses (bus-level predictions).
• Every time new information about a bus location or the weather is received in our system, these bus-level predictions are recalculated using the latest relevant model-level predictions.
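Bus Level Predictions
[Figure: a bus route through stops A, B, C, D, E, F. The Run Time ML Models predict run times R1 to R5 for segments A→B through E→F; the Dwell Time ML Models predict dwell times D1 to D5 at stops B through F.]
Predicted Departure Time @ Stop F = departure from Stop A + R1 + D1 + R2 + D2 + R3 + D3 + R4 + D4 + R5 + D5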
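The aggregation on this slide amounts to a running sum over the route: each upcoming stop's departure is the previous departure plus the next run time and dwell time. A minimal sketch, assuming the model-level predictions are already available as seconds; recalculating after a new AVL report or weather update just means calling it again with fresh inputs. The function name and input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def bus_level_predictions(
    last_departure: float,
    upcoming: List[Tuple[str, float, float]],
) -> Dict[str, float]:
    """Chain model-level predictions into departure times for each upcoming stop.

    `upcoming` lists (stop_id, run_time_to_stop, dwell_time_at_stop) in route
    order, i.e. (B, R1, D1), (C, R2, D2), ...; times are in seconds.
    """
    predictions: Dict[str, float] = {}
    t = last_departure
    for stop_id, run_time, dwell_time in upcoming:
        t += run_time + dwell_time  # arrive after R_i, depart after D_i
        predictions[stop_id] = t
    return predictions


upcoming = [(stop, 120.0, 20.0) for stop in "BCDEF"]  # made-up model outputs
print(bus_level_predictions(28800.0, upcoming))
# {'B': 28940.0, 'C': 29080.0, 'D': 29220.0, 'E': 29360.0, 'F': 29500.0}
```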
Infrastructure Considerations
Speed
➢ Departure time predictions must be generated and served to customers quickly, as predictions become stale within minutes.
➢ The implementation must be able to generate predictions every 90 seconds for up to 1,200 buses in total and their upcoming stops.
Storage
➢ The reporting database was growing at a rate of roughly 6 TB a year.
Scaling
➢ With each bus line having an average of 30 segments and 30 stops, the system must be able to generate over 20,000 predictions per minute. The load varies between peak and off-peak times.
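A back-of-envelope check of the scaling figure, under stated assumptions not given in the talk: one run-time prediction per upcoming segment and one dwell-time prediction per upcoming stop, every bus refreshed every 90 seconds, and a typical bus halfway along a 30-stop line (15 stops and 15 segments still ahead).

```python
def predictions_per_minute(buses: int, upcoming_stops: int,
                           upcoming_segments: int, refresh_seconds: float) -> float:
    """Model-level predictions needed per minute if each bus is refreshed every
    `refresh_seconds` (assumed: one prediction per upcoming stop and segment)."""
    per_refresh = buses * (upcoming_stops + upcoming_segments)
    return per_refresh * 60.0 / refresh_seconds


print(predictions_per_minute(1200, 15, 15, 90))  # 24000.0
```

With all 30 stops and 30 segments still ahead, the same formula gives 48,000 per minute, so "over 20,000" is consistent with these assumptions.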
Infrastructure Implementation
Microsoft Azure cloud platform
➢ Easy to deploy resources, e.g. VMs, DBs
➢ Supports containers to help with scaling and deployment
➢ Technical support and consultation readily available
➢ Matched the team’s skill set
Pilot Routes
13 pilot routes were chosen based on the following factors:
➢ Trip length
➢ Number of stops
➢ Frequency
➢ Weekday services
➢ Road types
➢ Vehicle types
➢ Bus rapid transit
➢ Weather sensitivity
➢ Bus type
Results
[Figure: Machine Learning, Scheduled, RTTI, and Blended predictions compared with Actual departure times across the time of day.]
Average Error
• Old: +/- 3 min 13 sec
• New: +/- 1 min 41 sec
47.8% less error!
Average Unexpected Wait Time Per Boarding
• Old: +/- 3 min 27 sec
• New: +/- 1 min 23 sec
60% less waiting per boarding!
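The headline percentages are relative reductions, (old − new) / old. Recomputing them from the displayed values, which are rounded to whole seconds, gives 47.7% and 59.9%; the slide's 47.8% and 60% fall within the range that this rounding allows.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return 60 * minutes + seconds


def percent_reduction(old: float, new: float) -> float:
    """Relative reduction from old to new, as a percentage."""
    return 100.0 * (old - new) / old


error_cut = percent_reduction(to_seconds(3, 13), to_seconds(1, 41))
wait_cut = percent_reduction(to_seconds(3, 27), to_seconds(1, 23))
print(f"{error_cut:.1f}% less error, {wait_cut:.1f}% less waiting")
# 47.7% less error, 59.9% less waiting
```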
Operational Support & Maintenance
Cloud cost reduction
➢ Optimize code to reduce VM usage
➢ Kubernetes to auto-scale the VM cluster based on need
➢ Cheap cold-storage options for data archiving
➢ Reserved instances
➢ Turn off/deallocate unused resources
Operational Support & Maintenance
Drift Detection
➢ As with any ML model, predictions from our models will drift over time. The system must be able to automatically detect when models drift beyond an acceptable threshold and trigger model retraining with more recent data.
➢ With over 18,000 sets of machine learning models, models should be version-controlled; model lineage information must be traceable to allow future investigation and reproduction of models if required.
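A minimal drift check, assuming drift is measured as the mean absolute error (MAE) of recent predictions against observed times and compared with the model's baseline MAE. The metric, the 20% tolerance, and the function names are all hypothetical; the talk only says models are retrained when drift exceeds an acceptable threshold.

```python
from statistics import mean
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def needs_retraining(baseline_mae: float, predicted: Sequence[float],
                     actual: Sequence[float], tolerance: float = 0.2) -> bool:
    """Flag drift when recent MAE exceeds the baseline by more than `tolerance`
    (an assumed 20% by default), which would trigger retraining."""
    recent_mae = mean_absolute_error(predicted, actual)
    return recent_mae > baseline_mae * (1.0 + tolerance)


# Recent errors of 30, 40 and 50 s average 40 s, above 30 s * 1.2 = 36 s.
print(needs_retraining(30.0, [100, 200, 300], [130, 240, 350]))  # True
```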
Operational Support & Maintenance
Drift Detection Workflow
[Figure: the Drift Detection System triggers retraining. Train Model (using Training Data) → Score Model (using Testing Data) → Approved? Yes → Promote Model To Production; No → the model is not promoted.]
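A sketch of the "Approved?" gate together with the lineage record the previous slide calls for. The schema, the data fingerprint, and the approval rule (the candidate must score at least as well as the production model on the testing data) are assumptions for illustration, not the system's actual design.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ModelRecord:
    """Lineage metadata kept for every candidate model (hypothetical schema)."""
    location_id: str   # segment or stop the model belongs to
    version: int
    train_start: date  # training-data window, so the model can be reproduced
    train_end: date
    data_hash: str     # fingerprint of the exact training rows
    test_mae: float    # score on the held-out testing data


def fingerprint(rows: list) -> str:
    """Short, stable fingerprint of the training rows."""
    return hashlib.sha256(repr(rows).encode()).hexdigest()[:12]


def promote_if_approved(candidate: ModelRecord,
                        production: Optional[ModelRecord]) -> ModelRecord:
    """Approve the candidate only if it scores at least as well as production."""
    if production is None or candidate.test_mae <= production.test_mae:
        return candidate   # "Yes": promote model to production
    return production      # "No": keep the current production model


prod = ModelRecord("seg-1", 1, date(2019, 1, 1), date(2019, 3, 31), "a1b2c3d4e5f6", 35.0)
cand = ModelRecord("seg-1", 2, date(2019, 4, 1), date(2019, 6, 30), fingerprint([[1, 2]]), 30.0)
print(promote_if_approved(cand, prod).version)  # 2
```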
Architecture
Highlights:
➢ The aggregator takes the weather and bus vehicle monitoring (VM) info to generate bus-level predictions
➢ The predictions queue smooths out spikes in the number of predictions
➢ The Bus Schedule DB serves predictions to the customer
➢ The drift monitoring system ensures models are kept accurate
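A sketch of how a predictions queue can smooth bursts: the aggregator enqueues whatever it produces, and a consumer (for example, the writer to the Bus Schedule DB) drains a bounded batch per tick. The fixed batch size, the class, and its methods are hypothetical; the talk only says the queue smooths out the number of predictions.

```python
from collections import deque
from typing import Deque, List, Tuple

Prediction = Tuple[str, str, float]  # (bus_id, stop_id, predicted_departure)


class PredictionQueue:
    """Buffer bursts of predictions and release them at a steady rate.

    Hypothetical sketch: a fixed batch size per drain call is assumed.
    """

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._items: Deque[Prediction] = deque()

    def put_many(self, predictions: List[Prediction]) -> None:
        self._items.extend(predictions)

    def drain(self) -> List[Prediction]:
        """Release at most `batch_size` predictions (e.g. one call per second)."""
        count = min(self.batch_size, len(self._items))
        return [self._items.popleft() for _ in range(count)]


q = PredictionQueue(batch_size=100)
q.put_many([("bus-1", f"stop-{i}", float(i)) for i in range(250)])  # a burst of 250
print([len(q.drain()) for _ in range(4)])  # [100, 100, 50, 0]
```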
Project Timeline
• Q2 2018: Microsoft Proof-Of-Concept on a single bus route
• Q4 2018: Design a more powerful ML algorithm
• Q1 2019: Implement ML predictions for 13 pilot routes
• Q2 2019: Include additional features in the ML models
• Q3 2019: Implement ML predictions for all remaining bus routes
• Q4 2019: Implement drift detection, automated model creation & cost optimization
What’s Next?
• Improve the ML algorithm with additional factors:
➢ Traffic
➢ Bus driver
➢ Special events
• Continue cost optimizations
• Monitor & improve data quality
Is Machine Learning Right For You?
• Do you have a lot of data?
• Is the data cleaned and validated?
• Do you have a business objective that is measurable with data?
• Does your team have the correct skills (programming & stats)?
• Is the necessary computing power & storage available to you?
• Do you have the ability to do drift detection?
If you answered “no” to any of these questions then you’re not ready for ML.
If you answered “yes” to all of these questions, then ML might be right for you!