pivotal digital transformation forum: data science
TRANSCRIPT
Data Science: Bridging the Gap Between Data Generation and Data Comprehension
Dr Carsten Riggselsen Principal Data Scientist Pivotal
3 © Copyright 2015 Pivotal. All rights reserved.
“Their Data” “Our Data” “My Data”
“Data”
“The Data”
“Data (Big)”
“Data” vs. “Data-Driven”
Deploy analytic apps and automation at scale
Store any type and size of data
Discover insights
Create analytics algorithms
Data Science
Product Management
Product Design
Engineering
Continuous Improvement
Data Science
Isolated Data Science
“I don’t think (Big) Data is valuable, it’s a hype – prove me wrong.”
“We do BI and stuff already. Data Science is a hype – prove me wrong.”
Data Science
Product Management
Product Design
Engineering
Continuous Improvement
“Mere” convenience through Apps
Automate mundane or tedious tasks
Present information at a glance in an app
User Interaction with the app
Consistency and unbiasedness
24-7 availability
Scalability
Platform independence
Easy Provisioning
Smart Apps – Data Science Powered
Combining/linking data sources/streams across areas and domains
There is an element of prediction involved based on accumulated data/info
Inferring (ab)normal patterns, e.g., profiling users, usage patterns
There is an element of root-cause identification involved
DS-Cheat-Sheet - Is it a SMART App?
• Can past knowledge potentially improve on how to inform or act in the future?
• Is past knowledge based on data/info from different domains?
• Do you need to affect outcomes in real-time?
• Are (ab)normal patterns to be inferred?
• Is the reason or cause for an action or a pattern unclear yet an important thing to know?
• Is the solution highly personalised?
• Is “crowdsourcing” knowledge (data/information) beneficial?
“Siri or OK Google – unlock my car… UnnnLoooock my Caaaar…”
“OK – I will unlock your house”
SMART Unlock
Access to your Calendar/Agenda
Infer where/when you usually go by car
Awareness of Bank Holidays etc.
Knows where you parked your car
Knows where you are (GPS)
The Car-Unlock Experience
Works, Efficient, Convenient, Smart
“I unlocked your car!”
Obstruction Duration Prediction
• Predict duration of road incidents in London
• Android app developed on top of the model
• http://ds-demo-transport.cfapps.io
REAL-TIME DASHBOARD: Driving Prediction
https://youtu.be/5gySgGWJMHA
Time to Delivery
• Three sub-problems
  – Time to delivery estimate
  – Time slot availability
  – Courier scheduling
• Courier scheduling and time to delivery estimate may have mutual feedback
Logistics Comp. Logistics Comp.
Telco: Protecting Minors - Age Prediction
Estimate age of the customer based on their calling habits
Can distinguish minors with an accuracy of >80%
• Call records from March-Aug 2014
• Corresponds to ~3TB data
• Attributes are
  – Calling party ID
  – Called party ID
  – Date
  – Time
  – Duration at start/end
  – Location
  – Type of call and bearer
  – TAC
  – Data
CDR and CRM Data

Feature                         Importance  Observation
Calls (holidays-schooltime)     0.08-0.06   Minors call less in school holidays
Average call length             0.07        Minors make shorter calls
Call timing (night-day)         0.07-0.03   Minors call more at nighttime
Number of phone uses            0.05        Minors use the phones less
Percentage of text use          0.05        Minors text less
Number of contacts              0.05        Minors less likely to have 1 contact
Percentage of calls to minors   0.04        Minors call other minors more
Percentage of voice use         0.04        Typical
Caller-Callee ratio             0.04        Minors receive more calls than they make
Fri/Sat/Thurs ratio             0.04-0.03   Minors call more at weekends
Number of locations             0.04        Minors more likely to have 2 locations
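
A minimal sketch of how a few of the listed features (average call length, night-time share of calls, number of contacts) could be derived from one subscriber's call records. The record fields `called_id`, `start` and `duration_s`, the 22:00 to 06:00 definition of "night", and the sample records are all assumptions for illustration; the slides do not specify how the features were actually computed.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical definition of "night": 22:00 up to (not including) 06:00.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


def is_night(ts: datetime) -> bool:
    """Return True if the call started during the assumed night window."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def call_features(calls: List[dict]) -> Dict[str, float]:
    """Aggregate one subscriber's outgoing calls into a few features."""
    if not calls:
        raise ValueError("need at least one call record")
    n = len(calls)
    return {
        # Mean call duration in seconds.
        "avg_call_length_s": sum(c["duration_s"] for c in calls) / n,
        # Fraction of calls that started at night.
        "night_share": sum(is_night(c["start"]) for c in calls) / n,
        # Number of distinct called parties.
        "n_contacts": float(len({c["called_id"] for c in calls})),
    }


# Made-up records for illustration only (field names are hypothetical).
calls = [
    {"called_id": "B", "start": datetime(2014, 3, 3, 23, 15), "duration_s": 60},
    {"called_id": "C", "start": datetime(2014, 3, 4, 15, 0), "duration_s": 120},
    {"called_id": "B", "start": datetime(2014, 3, 5, 2, 30), "duration_s": 30},
    {"called_id": "D", "start": datetime(2014, 3, 6, 10, 0), "duration_s": 190},
]
features = call_features(calls)
print(features)
```

Per-subscriber feature vectors of this kind would then be passed to a classifier; the importance values in the table come from the model used in the project, which this sketch does not reproduce.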
Internal Transaction Fraud Detection
Beyond signatures
Beyond simple metrics for thresholding
Beyond manual engineering of rules
Monitor each and every entity in its environmental context
Mixture of Experts Metaphor
UserID and Data; Experts analyze; Overall vote is determined
Figure: expert votes 2, 5, 3, 3; overall vote 3.25
$$S(id) = w_1 \cdot M_1(id) + \dots + w_j \cdot M_j(id) \quad \text{s.t.} \quad \sum_i w_i = 1$$
Weights are a measure of “importance” for model expert j. Initially uniform across all experts.
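
The weighted vote above can be sketched in a few lines. This is an illustrative sketch only: the function name `mixture_score` and the non-uniform weights in the second call are assumptions, not taken from the slides. With uniform weights, the four expert votes shown in the figure (2, 5, 3, 3) give the overall vote of 3.25.

```python
from typing import Optional, Sequence


def mixture_score(expert_scores: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Weighted vote S(id) = w_1*M_1(id) + ... + w_j*M_j(id), weights summing to 1."""
    n = len(expert_scores)
    if n == 0:
        raise ValueError("need at least one expert score")
    if weights is None:
        # Initially uniform across all experts.
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("one weight per expert is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, expert_scores))


votes = [2, 5, 3, 3]  # expert votes from the figure
print(mixture_score(votes))  # uniform weights -> 3.25

# Hypothetical re-weighting that trusts the second expert more.
print(mixture_score(votes, [0.1, 0.7, 0.1, 0.1]))
```

Raising the weight of one expert pulls the overall vote toward that expert's score, which is how the "importance" of a model shows up in the final result.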
Anomalous User Behavior Comparison
Mean Anomaly Scores and Users (# and %)

              Transaction Anomaly  SoD Risk  Terminated Employees  CDHDR Access Anomaly  VPN Access Anomaly  Cluster Outlier  Total Score  Users #  Users %
Reg B  Red    0.6                  0.6       0.1                   0.2                   0.1                 0.6              2.3          26       0.3%
       Amber  0.4                  0.5       0.1                   0.1                   0.1                 0.6              1.7          73       0.8%
       Green  0.0                  0.0       0.0                   0.0                   0.1                 0.0              0.1          8,765    98.9%
Reg A  Red    0.1                  -         -                     1.0                   0.4                 0.9              2.4          1        0.01%
       Amber  0.4                  0.2       0.0                   0.1                   0.2                 0.7              1.7          25       0.4%
       Green  0.0                  0.0       0.0                   0.0                   0.1                 0.0              0.2          6,853    99.6%
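
A minimal sketch of how per-model anomaly scores could be summed into a total score and mapped to a Red/Amber/Green band. The cut-offs of 2.0 and 1.0 are hypothetical values chosen only so that totals like those in the table fall into the expected bands; the slides do not state how the bands were actually assigned. The example user reuses the Reg B "Red" mean scores as if they belonged to a single user.

```python
from typing import Dict

# Hypothetical cut-offs, chosen only for illustration.
RED_THRESHOLD = 2.0
AMBER_THRESHOLD = 1.0


def total_score(model_scores: Dict[str, float]) -> float:
    """Sum the anomaly scores of the individual models for one user."""
    return sum(model_scores.values())


def rag_band(total: float) -> str:
    """Map a total anomaly score to a Red/Amber/Green band."""
    if total >= RED_THRESHOLD:
        return "Red"
    if total >= AMBER_THRESHOLD:
        return "Amber"
    return "Green"


# Illustrative user whose scores equal the Reg B "Red" means from the table.
user = {
    "transaction_anomaly": 0.6,
    "sod_risk": 0.6,
    "terminated_employee": 0.1,
    "cdhdr_access_anomaly": 0.2,
    "vpn_access_anomaly": 0.1,
    "cluster_outlier": 0.6,
}
print(rag_band(total_score(user)))
```

In practice the thresholds would be tuned so that only a small share of users lands in Red and Amber, as in the table, where more than 98% of users are Green.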
Conclusions
Add SMARTness to your app by leveraging data
Don’t think of Data Science in an isolated fashion
Move beyond POCs on Big Data
Start with a minimal viable product/solution
Get the right platform and resources in place
Collaborate and interact