Pivotal Digital Transformation Forum: Data Science

Data Science: Bridging the Gap Between Data Generation and Data Comprehension

Dr Carsten Riggselsen, Principal Data Scientist, Pivotal

Upload: pivotal

Post on 16-Apr-2017



2 © Copyright 2015 Pivotal. All rights reserved.

Analyzing data is nothing new


“Their Data” “Our Data” “My Data”

“Data”

“The Data”

“Data (Big)”


“Data” vs. “Data-Driven”

Deploy analytic apps and automation at scale

Store any type and size of data

Discover insights

Create analytics algorithms


Data Science

Product Management

Product Design

Engineering

Continuous Improvement

Data Science


Isolated Data Science

I don’t think (Big) Data is valuable, it’s a hype – prove me wrong.

We do BI and stuff already. Data Science is a hype – prove me wrong.


Data Science

Product Management

Product Design

Engineering

Continuous Improvement


“Mere” convenience through Apps

Automate mundane or tedious tasks

Present information at a glance in an app

User Interaction with the app

Consistency and unbiasedness

24-7 availability

Scalability

Platform independence

Easy Provisioning


Smart Apps – Data Science Powered

Combining/linking data sources/streams across areas and domains

There is an element of prediction involved based on accumulated data/info

Inferring (ab)normal patterns, e.g., profiling users, usage patterns

There is an element of root-cause identification involved


DS-Cheat-Sheet - Is it a SMART App?

•  Can past knowledge potentially improve on how to inform or act in the future?

•  Is past knowledge based on data/info from different domains?

•  Do you need to affect outcomes in real-time?

•  Are (ab)normal patterns to be inferred?

•  Is the reason or cause for an action or a pattern unclear yet an important thing to know?

•  Is the solution highly personalised?

•  Is “crowdsourcing” knowledge (data/information) beneficial?


The Car Unlock Button – Press it!


“Siri or OK Google – unlock my car… UnnnLoooock my Caaaar…”

“OK – I will unlock your house”


SMART Unlock

Access to your Calendar/Agenda

Infer where/when you usually go by car

Awareness of Bank Holidays etc.

Knows where you parked your car

Knows where you are (GPS)
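
The deck lists the signals for SMART Unlock but does not say how they are combined. As a minimal sketch under stated assumptions, the Python below offers an unlock when the phone's GPS position is close to the parked car and a trip is expected, either from a calendar entry or from a habitual departure hour on a day that is not a bank holiday. The function names, the 30 m radius and the hour-based habit model are hypothetical and not taken from the talk.

```python
import math
from datetime import date, datetime
from typing import Set, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def should_offer_unlock(
    user_pos: Tuple[float, float],      # where you are (GPS)
    car_pos: Tuple[float, float],       # where you parked your car
    now: datetime,
    usual_departure_hours: Set[int],    # hypothetical habit model: hours you usually drive
    bank_holidays: Set[date],
    has_calendar_trip: bool,            # calendar/agenda shows an upcoming trip
    max_distance_m: float = 30.0,       # hypothetical proximity radius
) -> bool:
    """Offer to unlock only when the user is near the car and a trip is plausible."""
    near_car = haversine_m(user_pos, car_pos) <= max_distance_m
    habitual = now.hour in usual_departure_hours and now.date() not in bank_holidays
    return near_car and (has_calendar_trip or habitual)


if __name__ == "__main__":
    car = (48.1770, 11.5560)
    me = (48.1771, 11.5560)
    print(should_offer_unlock(me, car, datetime(2015, 10, 19, 8, 0), {8}, set(), False))
```

A real system would presumably learn the habit model from accumulated data rather than use a fixed set of hours; the fixed set only keeps the sketch short.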


The Car-Unlock Experience

Works, Efficient, Convenient, Smart

I unlocked your car!


Examples


Obstruction Duration Prediction

•  Predict duration of road incidents in London

•  Android app developed on top of the model

•  http://ds-demo-transport.cfapps.io


REALTIME DASHBOARD Driving Prediction

https://youtu.be/5gySgGWJMHA


Time to Delivery

•  Three sub problems
   –  Time to delivery estimate
   –  Time slot availability
   –  Courier scheduling

•  Courier scheduling and time to delivery estimate may have mutual feedback

Logistics Comp. Logistics Comp.


Telco: Protecting Minors - Age Prediction

Estimate age of the customer based on their calling habits

Can distinguish minors with an accuracy of >80%

•  Call records from March-Aug 2014

•  Corresponds to ~3TB data

•  Attributes are
   –  Calling party ID
   –  Called party ID
   –  Date
   –  Time
   –  Duration at start/end
   –  Location
   –  Type of call and bearer
   –  TAC
   –  Data

CDR, CRM Data

| Feature | Importance | Observation |
|---|---|---|
| Calls (holidays-schooltime) | 0.08-0.06 | Minors call less in school holiday |
| Average call length | 0.07 | Minors make shorter calls |
| Call timing (night-day) | 0.07-0.03 | Minors call more at nighttime |
| Number of phone uses | 0.05 | Minors use the phones less |
| Percentage of text use | 0.05 | Minors text less |
| Number of contacts | 0.05 | Minors less likely to have 1 contact |
| Percentage of calls to minors | 0.04 | Minors call other minors more |
| Percentage of voice use | 0.04 | Typical |
| Caller-Callee ratio | 0.04 | Minors receive more calls than make |
| Fri/Sat/Thurs ratio | 0.04-0.03 | Minors call more at weekends |
| Number of locations | 0.04 | Minors more likely to have 2 locs |
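
The deck does not show how these features were derived from the call records. As a rough sketch only, a few of them (average call length, night-time share, text share, number of contacts) could be computed per subscriber as below. The `CallRecord` fields, the function name and the 22:00-06:00 definition of "night" are assumptions made for illustration, not the schema or rules used in the project.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    """Hypothetical, simplified call detail record."""
    calling_party: str
    called_party: str
    hour: int          # hour of day, 0-23
    duration_s: float  # 0 for texts
    kind: str          # "voice" or "text"


def subscriber_features(records: List[CallRecord], subscriber: str) -> Dict[str, float]:
    """Derive a few per-subscriber features from that subscriber's outgoing records."""
    outgoing = [r for r in records if r.calling_party == subscriber]
    if not outgoing:
        raise ValueError(f"no outgoing records for {subscriber!r}")
    voice = [r for r in outgoing if r.kind == "voice"]
    texts = [r for r in outgoing if r.kind == "text"]
    # Assumption: "night" means 22:00 up to, but not including, 06:00.
    night = [r for r in outgoing if r.hour >= 22 or r.hour < 6]
    return {
        "average_call_length_s": (
            sum(r.duration_s for r in voice) / len(voice) if voice else 0.0
        ),
        "night_share": len(night) / len(outgoing),
        "text_share": len(texts) / len(outgoing),
        "number_of_contacts": float(len({r.called_party for r in outgoing})),
    }


# Made-up records, for illustration only.
records = [
    CallRecord("A", "B", 23, 60.0, "voice"),
    CallRecord("A", "C", 15, 120.0, "voice"),
    CallRecord("A", "B", 16, 0.0, "text"),
    CallRecord("A", "B", 2, 0.0, "text"),
    CallRecord("B", "A", 10, 30.0, "voice"),
]
features = subscriber_features(records, "A")
print(features)
```

Features of this kind would then be fed to a classifier, which is where importance values like those in the table come from; the choice of classifier is not stated in the deck.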


Internal Transaction Fraud Detection

Beyond signatures

Beyond simple metrics for thresholding

Beyond manual engineering of rules

Monitor each and every entity in its environmental context


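Mixture of Experts Metaphor

[Figure: a UserID and its data are passed to the experts; the experts analyze and vote (2, 5, 3, 3); the overall vote is determined (3.25).]

$$S(\mathrm{id}) = w_1 \cdot M_1(\mathrm{id}) + \dots + w_j \cdot M_j(\mathrm{id}) \quad \text{s.t.} \quad \sum_i w_i = 1$$

Weights are a measure of “importance” for model expert j. Initially uniform across all experts.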
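
The weighted vote can be written out in a few lines. The sketch below is illustrative only: the function and expert names are hypothetical, and it covers just the scoring step S(id) with uniform initial weights, not how the weights are later updated (the deck does not describe that).

```python
from typing import Dict, List


def uniform_weights(expert_names: List[str]) -> Dict[str, float]:
    """Initial weights: equal importance for every expert, summing to 1."""
    n = len(expert_names)
    return {name: 1.0 / n for name in expert_names}


def combined_score(expert_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """S(id) = w_1 * M_1(id) + ... + w_j * M_j(id) for a single user id."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * score for name, score in expert_scores.items())


# Votes of four experts for one user id, using the numbers on the slide: 2, 5, 3, 3.
votes = {"expert_1": 2.0, "expert_2": 5.0, "expert_3": 3.0, "expert_4": 3.0}
weights = uniform_weights(list(votes))
score = combined_score(votes, weights)
print(score)  # 3.25 with uniform weights
```

With uniform weights the score is simply the mean of the votes, which matches the 3.25 shown on the slide for votes of 2, 5, 3 and 3.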


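Anomalous User Behavior Comparison

Mean anomaly scores per model, with the number (#) and share (%) of users in each group

| | | Transaction Anomaly | SoD Risk | Terminated Employees | CDHDR Access Anomaly | VPN Access Anomaly | Cluster Outlier | Total Score | # | % |
|---|---|---|---|---|---|---|---|---|---|---|
| Reg B | Red | 0.6 | 0.6 | 0.1 | 0.2 | 0.1 | 0.6 | 2.3 | 26 | 0.3% |
| Reg B | Amber | 0.4 | 0.5 | 0.1 | 0.1 | 0.1 | 0.6 | 1.7 | 73 | 0.8% |
| Reg B | Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | 8,765 | 98.9% |
| Reg A | Red | 0.1 | - | - | 1.0 | 0.4 | 0.9 | 2.4 | 1 | 0.01% |
| Reg A | Amber | 0.4 | 0.2 | 0.0 | 0.1 | 0.2 | 0.7 | 1.7 | 25 | 0.4% |
| Reg A | Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.2 | 6,853 | 99.6% |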


Conclusions

Add SMARTness to your app by leveraging data

Don’t think of Data Science in an isolated fashion

Move beyond POCs on Big Data

Start with a minimal viable product/solution

Get the right platform and resources in place

Collaborate and interact

Digital Transformation Forum

Disrupt or Be Disrupted 19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH