From research prototype to a data-driven product
Roman Prokofyev
linkedin.com/in/rprokofyev
6th Swiss Data Science Conference, Bern. 14.06.2019
Agenda
● Intro to FAIRTIQ
● Location data challenges
● Data annotation
● Quality assurance
Intro to FAIRTIQ
FAIRTIQ reduces the hurdles to public transport: A valid ticket with one click
• You get a valid ticket with one swipe
• With another click at the end of your journey, you are automatically charged the lowest possible fare for the route traveled
Ticket is valid in the whole tariff community
Journey is charged after check-out
Data collection
What is collected?
Accuracy
How it is collected
Uses mostly GPS
Fuses multiple sensors: WiFi, Cell, seldom GPS
It’s unknown what sensor was used
Uniqueness of location data
Uniqueness of geo data
Precision of location data: outliers
Precision of location data: vehicles
Train Bus
Integrity of location data
Time gaps
Log
Challenges
● Uniqueness: data is never the same on the same path; different vehicles
● Outliers: because not only GPS is used
● Time gaps: because of underground sections or OS throttling
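Two of these challenges, outliers and time gaps, can be illustrated with a small check over a location log. The following is only a minimal sketch, not FAIRTIQ's method: the `Fix` record, the 60-second gap threshold and the 70 m/s speed limit are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Fix:
    """One location sample (hypothetical record layout)."""
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes in metres."""
    earth_radius_m = 6371000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(h))


def find_time_gaps(fixes, max_gap_s=60.0):
    """Return (start, end) timestamps of pauses longer than max_gap_s (assumed threshold)."""
    return [(p.t, q.t) for p, q in zip(fixes, fixes[1:]) if q.t - p.t > max_gap_s]


def drop_speed_outliers(fixes, max_speed_mps=70.0):
    """Drop fixes that imply an implausible speed relative to the last kept fix."""
    kept = list(fixes[:1])
    for f in fixes[1:]:
        dt = f.t - kept[-1].t
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        if haversine_m(kept[-1], f) / dt <= max_speed_mps:
            kept.append(f)
    return kept


# Made-up log: four fixes near Bern plus one that jumps tens of kilometres in 10 s.
log = [
    Fix(0, 46.9480, 7.4390),
    Fix(10, 46.9481, 7.4391),
    Fix(20, 47.0500, 8.3100),   # implausible jump
    Fix(30, 46.9482, 7.4392),
    Fix(200, 46.9483, 7.4393),  # follows a long pause
]
gaps = find_time_gaps(log)
cleaned = drop_speed_outliers(log)
```

This simple filter trusts the first fix; if the first sample is itself an outlier, a real pipeline would need a more robust starting point.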
Data annotation
What to annotate?
Modes
What to annotate?
Stations
Trains
Time
FAIRTIQ Annotations
Annotate only stations, the rest is derived automatically
Semi-automatic annotations
Stations + Timetable data → Automatic annotation
Stations:
Luzern
Bern
Bern, Bahnhof
Bern, Kursaal
Automatic annotation:
Luzern 12:00
Train
Bern 13:04
Bern, Bahnhof 13:08
Tram
Bern, Kursaal 13:18
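The idea of deriving modes and times from annotated stations plus timetable data can be sketched as a lookup. This is only an illustration under assumptions: the `TIMETABLE` structure and the `annotate` function are hypothetical, the station names and times are taken from the slide, and a real system would also have to choose among many departures on the same connection.

```python
# Hypothetical timetable: one entry per direct connection.
TIMETABLE = [
    {"from": "Luzern", "to": "Bern", "mode": "Train", "dep": "12:00", "arr": "13:04"},
    {"from": "Bern, Bahnhof", "to": "Bern, Kursaal", "mode": "Tram", "dep": "13:08", "arr": "13:18"},
]

# The only manual input: the ordered list of annotated stations.
STATIONS = ["Luzern", "Bern", "Bern, Bahnhof", "Bern, Kursaal"]


def annotate(stations, timetable):
    """Derive one leg per consecutive station pair from the timetable."""
    legs = []
    for origin, dest in zip(stations, stations[1:]):
        match = next(
            (c for c in timetable if c["from"] == origin and c["to"] == dest),
            None,
        )
        if match is None:
            # No vehicle connects these stops; assumed here to be a transfer on foot.
            legs.append({"from": origin, "to": dest, "mode": None, "dep": None, "arr": None})
        else:
            legs.append(dict(match))
    return legs


legs = annotate(STATIONS, TIMETABLE)
for leg in legs:
    print(leg["from"], "->", leg["to"], leg["mode"], leg["dep"], leg["arr"])
```

The annotator only supplies `STATIONS`; the vehicle type and the departure and arrival times come from the timetable.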
Quality assurance
Metrics
Most common metrics: Precision, Recall.
Ground truth: A B C D
System output A B C D: P = 1.0, R = 1.0
System output A B C: P = 1.0, R = 0.75
System output A B C D E: P = 0.8, R = 1.0
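The three cases can be reproduced with the standard set-based definitions of precision and recall. The function name `precision_recall` is chosen here for illustration; the sequences are the ones shown on the slide.

```python
def precision_recall(truth, output):
    """Set-based precision and recall of a system output against the ground truth."""
    truth_set, output_set = set(truth), set(output)
    true_positives = len(truth_set & output_set)
    precision = true_positives / len(output_set) if output_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


ground_truth = ["A", "B", "C", "D"]
exact = precision_recall(ground_truth, ["A", "B", "C", "D"])
missing_one = precision_recall(ground_truth, ["A", "B", "C"])
extra_one = precision_recall(ground_truth, ["A", "B", "C", "D", "E"])
```

A missing element lowers recall only, while an extra element lowers precision only.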
Precision/Recall drawbacks
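The metrics treat elements as unordered sets
Ground truth: A B C D E
A system output with the same elements in a different order (for example with C out of place) still scores P = 1.0, R = 1.0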
Sequence alignment
Edit distance
Ground truth: A B C D E
System output A B C D E: Pseq = 1.0, Rseq = 1.0
System output A C B D E: 1 insertion + 1 deletion, Pseq = 0.8, Rseq = 0.8
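One plausible way to obtain these numbers is through the longest common subsequence (LCS), which corresponds to an edit distance that allows only insertions and deletions. The slides do not spell out the definition, so the formulas below (Pseq = LCS / output length, Rseq = LCS / ground truth length) are an assumption that happens to reproduce the values shown.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(a, b):
    """Number of insertions plus deletions needed to turn a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def seq_precision_recall(truth, output):
    """Order-aware precision and recall (assumed LCS-based definition)."""
    matched = lcs_length(truth, output)
    precision = matched / len(output) if output else 0.0
    recall = matched / len(truth) if truth else 0.0
    return precision, recall


truth = list("ABCDE")
same_order = seq_precision_recall(truth, list("ABCDE"))
swapped = seq_precision_recall(truth, list("ACBDE"))
edits = indel_distance(truth, list("ACBDE"))
```

Unlike the set-based metrics, the swapped output is penalised: only four of the five elements can be aligned in order.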
Key takeaways
● Be aware of the location data challenges
● Know what to annotate and what to automate
● Know what to measure
Thank you for your attention
linkedin.com/in/rprokofyev