![Page 1: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/1.jpg)
Improving Bus Predictions Using Machine Learning
Ron Mok, Software Development Manager at TransLink
New Westminster, British Columbia, Canada
![Page 2: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/2.jpg)
TransLink Quick Facts
• Regional transportation authority for Metro Vancouver
• Canada’s largest transit service area: 1,800 km² (700 mi²)
• Serving a population of 2.5M residents
![Page 3: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/3.jpg)
TransLink Quick Facts - Fleet
SkyTrain: 326 train cars that serve 53 stations along
79 km (49 mi) of rapid transit. Longest rapid transit system
in Canada and one of the longest fully automated
driverless systems in the world.
SeaBus: 3 passenger-only ferries.
Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262
million bus trips across our network in 2018.
![Page 4: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/4.jpg)
When Is The Next Bus?
• Next Bus SMS
• Next Bus Web App
• Google Maps
• Other 3rd party apps
![Page 5: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/5.jpg)
How Were Predictions Being Made?
• Predictions were based on comparing Automatic Vehicle Location (AVL)
data to the schedule and shifting the schedule based on the difference.
• The AVL dataset provides real-time GPS data for buses. This data is
sent whenever a bus leaves or passes by a bus stop, or every 90
seconds, whichever comes first.
• These predictions are susceptible to changing weather and traffic
conditions because they do not account for them.
• This is what we called our RTTI (Real Time Transit Information) system.
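The schedule-shifting approach described on this slide can be sketched as follows. This is a minimal illustration, not the RTTI implementation: the function name `shift_schedule` and the sample times are hypothetical, and it assumes the deviation measured at the last AVL report is applied unchanged to every upcoming stop.

```python
from datetime import datetime

def shift_schedule(scheduled_times, scheduled_at_last_stop, observed_at_last_stop):
    """Shift every remaining scheduled time by the bus's current deviation.

    The deviation is the difference between when the AVL data says the bus
    was at its last stop and when the schedule said it should have been there.
    """
    deviation = observed_at_last_stop - scheduled_at_last_stop
    return [t + deviation for t in scheduled_times]

# Hypothetical example: the bus left its last stop 3 minutes late,
# so each downstream stop is predicted 3 minutes later than scheduled.
scheduled_last = datetime(2019, 1, 7, 8, 0)
observed_last = datetime(2019, 1, 7, 8, 3)
upcoming = [datetime(2019, 1, 7, 8, 5), datetime(2019, 1, 7, 8, 11)]
predictions = shift_schedule(upcoming, scheduled_last, observed_last)
```

Because the shift is constant, nothing in this calculation reacts to weather or traffic further down the route, which is the weakness the slide points out.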
![Page 6: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/6.jpg)
What Is The Problem?
• We have real-time next bus predictions, but they’re not always accurate.
![Page 7: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/7.jpg)
What Is The Problem?
![Page 8: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/8.jpg)
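Our Approach – Machine Learning
Some Examples…
[Diagram: a bus travels from Stop A to Stop B; the run time R1 and the dwell time D1 are the unknowns, predicted by the Run Time Model and the Dwell Time Model respectively.]
Predicted Departure Time @ Stop B = R1 + D1 + Additional Factors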
![Page 9: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/9.jpg)
Weather
• The Dark Sky API is used to acquire real-time and forecast
weather data for the next 4 hours.
• Weather depends on the location of the segment or bus stop, so
we broke the bus transit system into 13 geographical regions and
pulled data for each one.
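One way to attach regional weather to a stop or segment is a nearest-region lookup, sketched below. The slides do not say how stops are mapped to the 13 regions, so the nearest-centroid rule is an assumption, and the region names and coordinates are illustrative placeholders (only three regions are shown).

```python
import math

# Hypothetical region centroids (latitude, longitude); the real system
# uses 13 regions whose boundaries are not given in the slides.
REGION_CENTROIDS = {
    "region_a": (49.28, -123.12),
    "region_b": (49.19, -122.85),
    "region_c": (49.32, -123.07),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def region_for(lat, lon):
    """Return the name of the region whose centroid is closest to the point."""
    return min(
        REGION_CENTROIDS,
        key=lambda name: haversine_km(lat, lon, *REGION_CENTROIDS[name]),
    )

stop_region = region_for(49.20, -122.86)
```

With a lookup like this, weather only needs to be fetched once per region and can then be shared by every stop and segment in that region.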
![Page 10: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/10.jpg)
Weather
• The Dark Sky API offers a full collection of weather
conditions. We used the following parameters in our models:
➢ Chance of precipitation
➢ Pressure
➢ Intensity of precipitation
➢ Wind speed
➢ Temperature
➢ Wind bearing
➢ Apparent temperature
➢ Cloud cover
➢ Dew point
➢ UV index
➢ Humidity
➢ Visibility
![Page 11: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/11.jpg)
ML Details – Model Level Predictions
• The ML model we chose was Extreme Gradient Boosting (XGBoost)
because of its fast training time and high performance.
• Each bus stop and segment in the transit system has its own set of
machine learning models, provided that there is enough data to train
on (at least 500 instances of dwell/run events).
• As we create and train a different set of XGBoost models for every
single segment and bus stop in our system, we end up with over
18,000 different sets of models.
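The per-stop, per-segment training rule above can be sketched as a grouping step with a minimum-data threshold. This is only an outline under stated assumptions: `train_mean_model` is a hypothetical stand-in that predicts the historical mean, used here in place of an XGBoost regressor so the example runs with the standard library alone, and the event keys are made up.

```python
from collections import defaultdict
from statistics import mean

MIN_EVENTS = 500  # minimum dwell/run events required, as stated on the slide

def train_mean_model(durations):
    """Stand-in trainer (NOT XGBoost): always predicts the historical mean."""
    average = mean(durations)
    return lambda features=None: average

def train_all(events, train_fn=train_mean_model, min_events=MIN_EVENTS):
    """Train one model per stop or segment that has enough history.

    events: iterable of (key, duration_seconds) pairs, where key identifies
    a stop (dwell events) or a segment (run events).
    """
    grouped = defaultdict(list)
    for key, duration in events:
        grouped[key].append(duration)
    return {
        key: train_fn(durations)
        for key, durations in grouped.items()
        if len(durations) >= min_events
    }

# Hypothetical data: one segment with 600 run events, one stop with only 100.
events = [(("run", "A->B"), 60.0 + (i % 5)) for i in range(600)]
events += [(("dwell", "B"), 20.0) for _ in range(100)]
models = train_all(events)
```

Here only the segment gets a model; the stop with 100 events falls below the threshold and is skipped.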
![Page 12: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/12.jpg)
ML Details – Bus Level Predictions
• With the ML models generating prediction times for individual
segments and stops (model-level predictions), we need to create an
algorithm that aggregates these predictions together to create stop
arrival/departure times for particular buses (bus-level predictions).
• Every time new information about a bus location or weather is
received in our system, these bus-level predictions are recalculated
using the latest relevant model-level predictions.
![Page 13: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/13.jpg)
Bus Level Predictions
[Diagram: stops A through F, with run times R1–R5 from the Run Time ML Models and dwell times D1–D5 from the Dwell Time ML Models.]
Predicted Departure Time @ Stop F = R1 + D1 + R2 + D2 + R3 + D3 + R4 + D4 + R5 + D5
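The aggregation on this slide amounts to a running sum of run and dwell predictions. The sketch below assumes a known starting time at Stop A and that each dwell time belongs to the stop at the end of the matching run; both are assumptions for illustration, as are the function name and the sample durations.

```python
from datetime import datetime, timedelta
from itertools import accumulate

def departure_predictions(start_time, run_seconds, dwell_seconds):
    """Predict the departure time at each downstream stop.

    run_seconds[i] is the predicted run time for segment i and
    dwell_seconds[i] is the predicted dwell time at the stop ending it.
    """
    if len(run_seconds) != len(dwell_seconds):
        raise ValueError("need one dwell prediction per run prediction")
    steps = [run + dwell for run, dwell in zip(run_seconds, dwell_seconds)]
    return [start_time + timedelta(seconds=total) for total in accumulate(steps)]

# Hypothetical model-level predictions for stops B..F (R1..R5 and D1..D5).
runs = [120, 90, 150, 60, 180]
dwells = [30, 20, 40, 15, 25]
predictions = departure_predictions(datetime(2019, 1, 7, 8, 0), runs, dwells)
stop_f_departure = predictions[-1]
```

Rerunning this function whenever a new bus location or weather update arrives gives the recalculation behaviour described for bus-level predictions.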
![Page 14: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/14.jpg)
Infrastructure Considerations
Speed
➢ Departure time predictions must be generated and served to customers in a
timely manner, as predictions become stale in minutes.
➢ The implementation method must be able to generate predictions every 90
seconds for up to a total of 1,200 buses and their upcoming stops.
Storage
➢ Reporting database was growing at a rate of roughly 6 TB a year.
Scaling
➢ With each bus line having an average of 30 segments and 30 stops, the
system must be able to generate over 20,000 predictions per minute. This
varies based on peak/non-peak times.
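A back-of-the-envelope check of the scaling figure, using only the numbers on this slide. How the slide arrives at "over 20,000" is not stated; the calculation below assumes every bus is refreshed once per 90-second cycle and counts either one prediction per stop or one per stop plus one per segment.

```python
BUSES = 1200            # maximum buses in service, from the slide
STOPS_PER_LINE = 30     # average stops per bus line, from the slide
SEGMENTS_PER_LINE = 30  # average segments per bus line, from the slide
REFRESH_SECONDS = 90    # prediction refresh interval, from the slide

def predictions_per_minute(buses, predictions_per_bus, refresh_seconds):
    """Predictions generated per minute if every bus refreshes each cycle."""
    return buses * predictions_per_bus * 60 / refresh_seconds

# One bus-level prediction per stop: 24,000 per minute.
stop_level = predictions_per_minute(BUSES, STOPS_PER_LINE, REFRESH_SECONDS)

# One model-level prediction per stop and per segment: 48,000 per minute.
model_level = predictions_per_minute(
    BUSES, STOPS_PER_LINE + SEGMENTS_PER_LINE, REFRESH_SECONDS
)
```

Both figures are peak-load upper bounds under these assumptions; fewer buses in service off-peak lowers them proportionally.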
![Page 15: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/15.jpg)
Infrastructure Implementation
Microsoft Azure cloud platform
➢ Easy to deploy resources, e.g. VMs, DBs
➢ Supports containers to help with scaling and deployment
➢ Technical support and consultation readily available
➢ Matched the team skillset
![Page 16: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/16.jpg)
Pilot Routes
13 pilot routes chosen based on the following factors:
➢ Trip length
➢ Number of stops
➢ Frequency
➢ Weekday services
➢ Road types
➢ Vehicle types
➢ Bus rapid transit
➢ Weather sensitivity
➢ Bus type
![Page 17: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/17.jpg)
Results
[Chart: Machine Learning, Scheduled, RTTI, Blended, and Actual series plotted against Time Of Day]
![Page 18: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/18.jpg)
Results
![Page 19: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/19.jpg)
Results
Average Error
➢ Old: +/- 3 min 13 sec
➢ New: +/- 1 min 41 sec
➢ 47.8% less error!
Average Unexpected Wait Time Per Boarding
➢ Old: +/- 3 min 27 sec
➢ New: +/- 1 min 23 sec
➢ 60% less waiting per boarding!
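The percentage improvements can be reproduced approximately from the Old and New values on this slide. The sketch below works from the values as displayed, rounded to the second; the slide's own percentages were presumably computed from unrounded measurements, which is an assumption and would explain the small difference in the first figure.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' value from the slide into seconds."""
    return minutes * 60 + seconds

def percent_reduction(old, new):
    """Relative improvement of new over old, in percent."""
    return (old - new) / old * 100

# Average error: 3 min 13 sec down to 1 min 41 sec, about 47.7%.
error_reduction = percent_reduction(to_seconds(3, 13), to_seconds(1, 41))

# Average unexpected wait per boarding: 3 min 27 sec down to 1 min 23 sec, about 59.9%.
wait_reduction = percent_reduction(to_seconds(3, 27), to_seconds(1, 23))
```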
![Page 20: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/20.jpg)
Operational Support & Maintenance
Cloud cost reduction
➢ Optimize code to reduce VM usage
➢ Kubernetes to auto-scale VM cluster based on need
➢ Cheap cold-storage options for data archiving
➢ Reserved instances
➢ Turn off/de-allocate unused resources
![Page 21: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/21.jpg)
Operational Support & Maintenance
Drift Detection
➢ Like any ML model, predictions from our models will drift over time. The
system must have the ability to automatically detect when models drift beyond
an acceptable threshold and trigger model retraining with more recent data.
➢ With over 18,000 sets of machine learning models, models should have
version control; model lineage information must be traceable to allow future
investigations and reproduction of models if required.
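A threshold-based drift check of the kind described above might look like the following sketch. The slides do not specify the drift metric, window, or threshold, so the rolling mean absolute error, the class name `DriftMonitor`, and all numbers here are assumptions for illustration.

```python
from collections import deque

class DriftMonitor:
    """Tracks recent prediction errors for one model and flags drift."""

    def __init__(self, threshold_seconds, window=200):
        self.threshold = threshold_seconds
        self.errors = deque(maxlen=window)  # keeps only the latest errors

    def record(self, predicted_seconds, actual_seconds):
        """Store the absolute error of one prediction."""
        self.errors.append(abs(predicted_seconds - actual_seconds))

    def mean_abs_error(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_retraining(self):
        """True once the window is full and its mean error exceeds the threshold."""
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.mean_abs_error() > self.threshold

# Hypothetical usage with a tiny window so the behaviour is easy to follow.
monitor = DriftMonitor(threshold_seconds=60, window=3)
for predicted, actual in [(100, 110), (100, 120), (100, 130)]:
    monitor.record(predicted, actual)
before = monitor.needs_retraining()   # mean error is 20 s, below threshold

for predicted, actual in [(100, 300), (100, 320)]:
    monitor.record(predicted, actual)
after = monitor.needs_retraining()    # window is now 30, 200, 220 -> mean 150 s
```

With one monitor per model, a positive result would be the signal that triggers retraining on more recent data.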
![Page 22: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/22.jpg)
Operational Support & Maintenance
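Drift Detection Workflow
[Workflow diagram: the Drift Detection System triggers Train Model (using Training Data), followed by Score Model (using Testing Data) and an "Approved?" decision. Yes: Promote Model To Production. No: the model is not promoted.]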
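The train, score, approve, and promote steps of this workflow can be sketched as below. The approval rule is not given in the slides; here a candidate is approved only if it beats the current production model on the testing data, which is an assumption, as are the function names and the simple list used as a version registry.

```python
from statistics import mean

def mean_abs_error(model, rows):
    """Score a model on (features, actual_seconds) rows; lower is better."""
    return mean(abs(model(features) - actual) for features, actual in rows)

def retrain_and_maybe_promote(train_fn, training_data, testing_data,
                              production_model, registry):
    """Train a candidate, score it, and promote it only if approved."""
    candidate = train_fn(training_data)
    candidate_score = mean_abs_error(candidate, testing_data)
    production_score = mean_abs_error(production_model, testing_data)
    approved = candidate_score < production_score  # assumed approval rule
    # Record every candidate so model lineage stays traceable.
    registry.append({
        "version": len(registry) + 1,
        "score": candidate_score,
        "promoted": approved,
    })
    return (candidate if approved else production_model), approved

def train_constant(rows):
    """Hypothetical stand-in trainer: predicts the mean of the training targets."""
    average = mean(actual for _, actual in rows)
    return lambda features: average

registry = []
model, approved = retrain_and_maybe_promote(
    train_constant,
    training_data=[(None, 70), (None, 74)],
    testing_data=[(None, 71), (None, 73)],
    production_model=lambda features: 60,
    registry=registry,
)
```

In this example the candidate predicts 72 seconds and scores better than the stale production model, so it is promoted and logged as version 1.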
![Page 23: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/23.jpg)
Architecture
Highlights:
➢ The aggregator takes the weather
and bus vehicle monitoring (VM) info
to generate bus-level predictions
➢ Predictions queue “smooths” out the
number of predictions
➢ Bus Schedule DB serves predictions
to the customer
➢ Drift monitoring system ensures
models are kept accurate
![Page 24: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/24.jpg)
Project Timeline (Q2 2018 to Q4 2019)
➢ Q2 2018: Microsoft Proof-Of-Concept on a single bus route
➢ Q4 2018: Design a more powerful ML algorithm
➢ Q1 2019: Implement ML predictions for 13 pilot routes
➢ Q2 2019: Include additional features to ML models
➢ Q3 2019: Implement ML predictions for all remaining bus routes
➢ Q4 2019: Implement drift detection and automated model creation & cost optimization
![Page 25: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/25.jpg)
What’s Next?
• Improve ML algorithm with additional factors:
➢ Traffic
➢ Bus driver
➢ Special events
• Continue cost optimizations
• Monitor & improve data quality
![Page 26: Improving Bus Predictions Using Machine Learning · SeaBus: 3 passenger-only ferries. Bus: 200+ bus routes with a fleet of 1,500+ vehicles. 262 million bus trips across our network](https://reader034.vdocument.in/reader034/viewer/2022051603/5ff1280f1b56580d73156a61/html5/thumbnails/26.jpg)
Is Machine Learning Right For You?
• Do you have a lot of data?
• Is the data cleaned and validated?
• Do you have a business objective that is measurable with data?
• Does your team have the correct skills (programming & stats)?
• Is the necessary computing power & storage available to you?
• Do you have the ability to do drift detection?
If you answered “no” to any of these questions, then you’re not ready for ML.
If you answered “yes” to all of these questions, then ML might be right for you!