From research prototype to a data-driven product Roman Prokofyev linkedin.com/in/rprokofyev 6th Swiss Data Science Conference, Bern. 14.06.2019

From research prototype to a data-driven product

Roman Prokofyev

linkedin.com/in/rprokofyev

6th Swiss Data Science Conference, Bern. 14.06.2019

Agenda

● Intro to FAIRTIQ
● Location data challenges
● Data annotation
● Quality assurance

Intro to FAIRTIQ

FAIRTIQ reduces the hurdles to public transport: A valid ticket with one click

• You get a valid ticket with one swipe
• With another click at the end of your journey, you are automatically charged the lowest possible fare for the route traveled

Ticket is valid in the whole tariff community

Journey is charged after check-out

Data collection

What is collected?

Accuracy

How it is collected

Uses mostly GPS

Fuses multiple sensors: WiFi, Cell, seldom GPS

It’s unknown what sensor was used

Uniqueness of location data

Uniqueness of location data

Uniqueness of geo data

Precision of location data: outliers

Precision of location data: vehicles

Train Bus

Integrity of location data

Time gaps

Log

Challenges

Uniqueness Outliers Time gaps

data is never the same on the same path

different vehicles

because not only GPS is used

because underground or OS throttling
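
The outlier and time-gap challenges can be illustrated with a minimal sketch that scans a location trace and flags both. This is not FAIRTIQ's pipeline: the `Fix` record, the `flag_issues` helper, the thresholds (60 s for a gap, 70 m/s for an implausible speed) and the demo coordinates are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Fix:
    """One location sample (hypothetical record layout)."""
    t: float    # seconds since check-in
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))


def flag_issues(trace, max_gap_s=60.0, max_speed_mps=70.0):
    """Return (time_gaps, outliers) as lists of indices into `trace`.

    Index i is a time gap if fix i arrives more than max_gap_s after fix i-1.
    Index i is an outlier if the speed implied by the distance from the last
    accepted fix exceeds max_speed_mps. Both thresholds are assumptions.
    """
    if not trace:
        return [], []
    gaps, outliers = [], []
    last_good = trace[0]  # last fix not flagged as an outlier
    prev = trace[0]       # previous fix in log order
    for i in range(1, len(trace)):
        cur = trace[i]
        if cur.t - prev.t > max_gap_s:
            gaps.append(i)
        dt = cur.t - last_good.t
        speed = haversine_m(last_good, cur) / dt if dt > 0 else float("inf")
        if speed > max_speed_mps:
            outliers.append(i)
        else:
            last_good = cur
        prev = cur
    return gaps, outliers


# Made-up trace: index 2 jumps about 11 km in 10 s, index 4 follows a 170 s silence.
demo_trace = [
    Fix(0, 46.9480, 7.4474),
    Fix(10, 46.9485, 7.4480),
    Fix(20, 47.0500, 7.4486),
    Fix(30, 46.9495, 7.4492),
    Fix(200, 46.9600, 7.4600),
]
demo_gaps, demo_outliers = flag_issues(demo_trace)
```

Measuring speed against the last accepted fix, rather than the immediately preceding one, keeps a single bad sample from also condemning the good sample that follows it.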

Data annotation

What to annotate?

Modes

What to annotate?

Stations

Trains

Time

FAIRTIQ Annotations

Annotate only stations, the rest is derived automatically

Semi-automatic annotations

Stations

Luzern
Bern
Bern, Bahnhof
Bern, Kursaal

Timetable data

Automatic annotation

Luzern 12:00 → Train → Bern 13:04
Bern, Bahnhof 13:08 → Tram → Bern, Kursaal 13:18
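
One way to read this slide is as a lookup: the annotator supplies only the station sequence, and vehicles and times are filled in from timetable data. The sketch below follows that reading under stated assumptions. The `Connection` record, the `derive_legs` helper, the toy `TIMETABLE` (which holds only the two connections shown on the slide) and the 11:55 check-in time are hypothetical; the real system's matching logic is not described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One timetable entry (hypothetical layout)."""
    origin: str
    destination: str
    mode: str
    departure: str  # zero-padded "HH:MM", so string comparison orders by time
    arrival: str    # zero-padded "HH:MM"


# Toy timetable containing only the two connections from the slide example.
TIMETABLE = [
    Connection("Luzern", "Bern", "Train", "12:00", "13:04"),
    Connection("Bern, Bahnhof", "Bern, Kursaal", "Tram", "13:08", "13:18"),
]


def derive_legs(stations, check_in, timetable=TIMETABLE):
    """Derive vehicle legs from an annotated station sequence.

    For each consecutive station pair, pick the earliest connection that
    departs at or after the current time. Pairs without any connection
    (for example a walk between two nearby stops) are skipped.
    """
    legs = []
    now = check_in
    for origin, destination in zip(stations, stations[1:]):
        candidates = [
            c for c in timetable
            if c.origin == origin and c.destination == destination and c.departure >= now
        ]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: c.departure)
        legs.append(best)
        now = best.arrival  # the next leg cannot start before this one ends
    return legs


# The annotator provides only the stations; 11:55 is an assumed check-in time.
annotated = ["Luzern", "Bern", "Bern, Bahnhof", "Bern, Kursaal"]
legs = derive_legs(annotated, "11:55")
summary = [(leg.mode, leg.departure, leg.arrival) for leg in legs]
```

With the toy data, `summary` contains the train leg 12:00 to 13:04 and the tram leg 13:08 to 13:18, matching the automatic annotation on the slide; the Bern to Bern, Bahnhof pair has no timetable entry and is treated as a transfer.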

Quality assurance

Metrics

Most common metrics: Precision, Recall.

Ground truth: A B C D

System output A B C: P = 1.0, R = 0.75
System output A B C D: P = 1.0, R = 1.0
System output A B C D E: P = 0.8, R = 1.0
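
The figures on this slide follow from the usual set-based definitions, which can be checked with a short sketch. It assumes the ground truth in the example is the station sequence A B C D; the `precision_recall` helper is a name introduced here for illustration.

```python
def precision_recall(ground_truth, system_output):
    """Set-based precision and recall over station labels.

    Precision: share of output stations that are in the ground truth.
    Recall: share of ground-truth stations that appear in the output.
    """
    truth, output = set(ground_truth), set(system_output)
    hits = len(truth & output)
    precision = hits / len(output) if output else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


truth = ["A", "B", "C", "D"]
missing_one = precision_recall(truth, ["A", "B", "C"])          # one station missed
exact = precision_recall(truth, ["A", "B", "C", "D"])           # perfect match
extra_one = precision_recall(truth, ["A", "B", "C", "D", "E"])  # one spurious station
```

A missed station lowers only recall, and a spurious station lowers only precision, which is why the two metrics are reported together.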

Precision/Recall drawbacks

The metrics treat elements as unordered sets: a system output that contains the stations A, B, C, D, E in the wrong order still scores P = 1.0, R = 1.0.

Sequence alignment

Edit Distance

Ground truth: A B C D E

System output A B C D E: Pseq = 1.0, Rseq = 1.0
System output A C B D E: Pseq = 0.8, Rseq = 0.8 (1 insertion + 1 deletion)
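
The talk does not spell out how Pseq and Rseq are computed. One definition that reproduces the numbers on this slide counts only the stations that can be aligned in order, that is, the longest common subsequence (LCS), and divides by the length of the output (Pseq) or of the ground truth (Rseq). The sketch below implements that assumed definition; the function names are introduced here for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def sequence_precision_recall(ground_truth, system_output):
    """Order-aware precision and recall (assumed LCS-based definition).

    Also returns the edit operations that turn the output into the ground
    truth when only insertions and deletions are allowed.
    """
    matched = lcs_length(ground_truth, system_output)
    deletions = len(system_output) - matched   # output stations to drop
    insertions = len(ground_truth) - matched   # ground-truth stations to add
    p_seq = matched / len(system_output) if system_output else 0.0
    r_seq = matched / len(ground_truth) if ground_truth else 0.0
    return p_seq, r_seq, insertions, deletions


truth = list("ABCDE")
in_order = sequence_precision_recall(truth, list("ABCDE"))
swapped = sequence_precision_recall(truth, list("ACBDE"))
```

For the swapped output only four of five stations align in order, so both scores drop to 0.8, whereas the set-based metrics would still report 1.0 for the same output.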

Key takeaways

Be aware of the location data challenges

Know what to annotate and what to automate

Know what to measure

Thank you for your attention

linkedin.com/in/rprokofyev