Post on 20-Jun-2020


From research prototype to a data-driven product

Roman Prokofyev

linkedin.com/in/rprokofyev

6th Swiss Data Science Conference, Bern. 14.06.2019

Agenda

● Intro to FAIRTIQ
● Location data challenges
● Data annotation
● Quality assurance

Intro to FAIRTIQ

FAIRTIQ reduces the hurdles to public transport: A valid ticket with one click

• You get a valid ticket with one swipe
• With another click at the end of your journey, you are automatically charged the lowest possible fare for the route traveled

Ticket is valid in the whole tariff community

Journey is charged after check-out

Data collection

What is collected?


Accuracy

How it is collected

Uses mostly GPS

Fuses multiple sensors: WiFi, Cell, seldom GPS

It’s unknown what sensor was used

Uniqueness of location data


Uniqueness of geo data


Precision of location data: outliers


Precision of location data: vehicles

Train, Bus

Integrity of location data

Time gaps

Log

Challenges

Uniqueness, Outliers, Time gaps

data is never the same on the same path

different vehicles

because not only GPS is used

because underground or OS throttling
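
The slides name outliers and time gaps as challenges but do not show how they are detected. A minimal sketch, assuming a trace of timestamped fixes: flag gaps between consecutive fixes and fixes whose implied speed is implausible. The `Fix` type, both function names, the thresholds and the sample coordinates are hypothetical, not taken from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    """One location sample (hypothetical structure)."""
    t: float    # seconds since the start of the journey
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes in metres."""
    radius_m = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))


def find_time_gaps(fixes, max_gap_s=60.0):
    """Return (start, end) timestamps of pauses longer than max_gap_s."""
    return [(p.t, q.t) for p, q in zip(fixes, fixes[1:]) if q.t - p.t > max_gap_s]


def find_speed_outliers(fixes, max_speed_mps=70.0):
    """Return indices of fixes reached at an implausible speed.

    Naive on purpose: the fix right after a jump is flagged as well,
    because returning to the real path also implies a huge speed.
    """
    flagged = []
    for i in range(1, len(fixes)):
        dt = fixes[i].t - fixes[i - 1].t
        if dt > 0 and haversine_m(fixes[i - 1], fixes[i]) / dt > max_speed_mps:
            flagged.append(i)
    return flagged


# Made-up trace: one jump of roughly 11 km at index 2, then a 170 s pause.
trace = [
    Fix(0, 46.9480, 7.4474),
    Fix(10, 46.9481, 7.4476),
    Fix(20, 47.0500, 7.4478),
    Fix(30, 46.9483, 7.4480),
    Fix(200, 46.9490, 7.4490),
]
print(find_speed_outliers(trace))
print(find_time_gaps(trace))
```

A speed threshold is only one possible heuristic; since the sensor behind each fix is unknown, a real pipeline would likely need something more robust.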

Data annotation

What to annotate?


Modes

What to annotate?

Stations

Trains

Time

FAIRTIQ Annotations


Annotate only stations, the rest is derived automatically

Semi-automatic annotations


Stations: Luzern; Bern; Bern, Bahnhof; Bern, Kursaal

Timetable data

Automatic annotation:

Luzern 12:00
Train
Bern 13:04

Bern, Bahnhof 13:08
Tram
Bern, Kursaal 13:18
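
The idea of annotating only the stations and deriving modes and times from timetable data can be sketched as a lookup over consecutive station pairs. This is an illustration under stated assumptions, not FAIRTIQ's implementation: `Connection`, `Leg` and `annotate_journey` are hypothetical names, and the timetable entries other than the four times shown on the slide are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Connection:
    """One timetable entry (hypothetical structure)."""
    origin: str
    destination: str
    mode: str
    departure: str  # zero-padded "HH:MM", so string comparison orders times
    arrival: str


@dataclass(frozen=True)
class Leg:
    origin: str
    destination: str
    mode: Optional[str]       # None when no timetable connection matches
    departure: Optional[str]
    arrival: Optional[str]


def annotate_journey(stations: List[str], timetable: List[Connection],
                     start_time: str) -> List[Leg]:
    """Derive mode and times for each pair of consecutive annotated stations."""
    legs = []
    now = start_time
    for origin, destination in zip(stations, stations[1:]):
        candidates = [
            c for c in timetable
            if c.origin == origin and c.destination == destination
            and c.departure >= now
        ]
        if candidates:
            best = min(candidates, key=lambda c: c.departure)
            legs.append(Leg(origin, destination, best.mode, best.departure, best.arrival))
            now = best.arrival
        else:
            # No vehicle found: treated here as a transfer between stops.
            legs.append(Leg(origin, destination, None, None, None))
    return legs


timetable = [
    Connection("Luzern", "Bern", "Train", "12:00", "13:04"),
    Connection("Luzern", "Bern", "Train", "13:00", "14:04"),                # made up
    Connection("Bern, Bahnhof", "Bern, Kursaal", "Tram", "12:58", "13:08"),  # made up
    Connection("Bern, Bahnhof", "Bern, Kursaal", "Tram", "13:08", "13:18"),
]
stations = ["Luzern", "Bern", "Bern, Bahnhof", "Bern, Kursaal"]
legs = annotate_journey(stations, timetable, start_time="12:00")
for leg in legs:
    print(leg)
```

Picking the earliest departure after the previous arrival is a simplification; a real matcher would presumably also use the recorded location timestamps.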

Quality assurance

Metrics


Most common metrics: Precision, Recall.

Ground truth: A B C D

System output: A B C D (P = 1.0, R = 1.0)
System output: A B C (P = 1.0, R = 0.75)
System output: A B C D E (P = 0.8, R = 1.0)
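
The precision and recall figures on this slide follow from the standard set-based definitions. A minimal sketch, with `precision_recall` as an illustrative name and the letters standing in for stations:

```python
def precision_recall(truth, output):
    """Set-based precision and recall of a system output against ground truth."""
    truth_set, output_set = set(truth), set(output)
    true_positives = len(truth_set & output_set)
    precision = true_positives / len(output_set) if output_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


ground_truth = ["A", "B", "C", "D"]
print(precision_recall(ground_truth, ["A", "B", "C", "D"]))       # (1.0, 1.0)
print(precision_recall(ground_truth, ["A", "B", "C"]))            # (1.0, 0.75)
print(precision_recall(ground_truth, ["A", "B", "C", "D", "E"]))  # (0.8, 1.0)
```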

Precision/Recall drawbacks

A B C D E

The metrics treat elements as unordered sets: an output with C out of order still scores P = 1.0, R = 1.0

Sequence alignment

Edit Distance

A B C D E
A C B D E

1 insertion + 1 deletion: Pseq = 0.8, Rseq = 0.8

Exact match: Pseq = 1.0, Rseq = 1.0
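
The slide does not spell out how Pseq and Rseq are computed. One definition that reproduces the 0.8 / 0.8 figures, offered here as an assumption, counts only the elements that survive an order-preserving alignment (the longest common subsequence) and divides by the output length for Pseq and by the ground-truth length for Rseq. The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_precision_recall(truth, output):
    """Order-aware precision and recall based on aligned elements (assumed definition)."""
    aligned = lcs_length(truth, output)
    precision = aligned / len(output) if output else 0.0
    recall = aligned / len(truth) if truth else 0.0
    return precision, recall


truth = list("ABCDE")
print(sequence_precision_recall(truth, list("ABCDE")))  # (1.0, 1.0)
print(sequence_precision_recall(truth, list("ACBDE")))  # (0.8, 0.8)
```

With B and C swapped, only four of the five elements can be aligned in order, which corresponds to the one insertion plus one deletion on the slide; the set-based metrics would report 1.0 for both.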

Key takeaways

● Be aware of the location data challenges
● Know what to annotate and what to automate
● Know what to measure

Thank you for your attention

linkedin.com/in/rprokofyev
