Operationalizing Data Science: The Right Architecture and Tools
© Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0
Megha Agarwal, Data Scientist, November 7, 2017
Today’s Speaker
Megha Agarwal, Data Scientist, Pivotal
Megha has been with Pivotal for two years, helping clients ranging from startups to Fortune 500 companies identify and deliver value through their data. She focuses on developing smart apps by applying machine learning and statistics.
Before joining Pivotal, she worked in a credit risk department, identifying potential delinquency and fraud patterns. She holds a Master's degree in Machine Learning and High-Performance Computing (HPC) from the University of Bristol.
Operationalizing Data Science: Common Pitfalls
1. Predictive Insights are Insufficient
2. Lack of Business Process Integration
3. Pace of Insight Generation Mismatch
4. Inability to Act on Perishable Insights
5. Failing to Learn from Past Experience
Operationalizing Data Science: A Strategy for Success
1. Prescriptive Insights
2. Business Process Integration
3. Right Insight, Right Time
4. Software Automation
5. Close the Analytics Loop
Our products are smart from the start
Disciplines: Data Science, Product Management, Product Design, Engineering
Qualities: Meets user needs, Easy to use, Smart
First version of the product → Continuous Improvement
● No missed opportunities: laying the data foundation from the start allows us to easily add smart features.
● Iterative without losing the bigger picture: customers expect apps to be personalised. The iterative process allows the product to learn and improve over time.
MVM (Minimum Viable Model) & Continuous Deployment of DS Models
Model lifecycle: Scoping → Data Review → Feature Engineering → Feature Review → Model Building → Model Evaluation → Operationalization → User Feedback
Practices: Pair Programming, Retros, Test Driven Development, Continuous Integration / API First, Pivotal Tracker, Standups
Digital Messaging App
Customer: A multinational banking and financial services company
Problem: A digital messaging app that provides customers with relevant information about their finances at the appropriate time
Improving customer banking experience:
● Unusual Direct Debits (UDD)
● Scheduled Direct Debits
● Future Insufficient Funds
Model Nuances
● Online learning model
● Personalised (one model per customer and direct debit (DD) company)
● ~18GB of data flowing in every day
Data Exploration
● Begin the exploration with end-to-end wiring discussions with the developers
● Explore the direct debit transactions, identifying how the overall population behaves
● Minimum Viable Model (MVM): Median Deviation from Mean
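The deck names the MVM only as "Median Deviation from Mean". Below is a minimal sketch of fitting one baseline per customer and DD company from historical direct debits, assuming the statistic means the median of the absolute deviations of past amounts from their mean. The record layout, `build_baselines`, and the sample amounts are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def median_deviation_from_mean(amounts):
    """Median of |x - mean| over past amounts (our reading of the deck's MVM statistic)."""
    mu = mean(amounts)
    return median(abs(x - mu) for x in amounts)


def build_baselines(history):
    """Fit one baseline per (customer, DD company) pair from historical direct debits."""
    grouped = defaultdict(list)
    for customer_id, company, amount in history:
        grouped[(customer_id, company)].append(amount)
    return {
        key: {"mean": mean(a), "mdev": median_deviation_from_mean(a), "n": len(a)}
        for key, a in grouped.items()
    }


# Hypothetical history records: (customer_id, DD company, amount)
history = [("c1", "energy_co", x) for x in [50.0, 52.0, 48.0, 51.0, 49.0, 50.0]]
history += [("c1", "gym", 30.0)] * 12
baselines = build_baselines(history)
print(baselines[("c1", "energy_co")])  # {'mean': 50.0, 'mdev': 1.0, 'n': 6}
```

A median-based spread is less sensitive to a single outlying payment than a standard deviation, which is one plausible reason to prefer it for a first model. A fixed-amount DD (the hypothetical `gym` entry) gets a spread of 0.0, so any scoring rule built on it needs a floor.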
UDD Micro-services Pipeline
● Daily Transactions (~18GB of transaction data per day)
● Customer Information
● Parse, enrich, filter transaction data
● DS Microservice to create, score and update UDD models
● UDD Alert
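The parse, enrich and filter stages can be sketched as chained Python generators, so a day's transactions stream through without being loaded into memory at once. This is a sketch under assumptions: the CSV layout (`customer_id,company,amount,tx_type`), the `"DD"` type code, the customer-profile dict, and all function names are hypothetical. In the deck each stage is its own microservice rather than a function call.

```python
import csv
import io


def parse(lines):
    """Parse raw CSV rows; assumed (hypothetical) layout: customer_id,company,amount,tx_type."""
    for customer_id, company, amount, tx_type in csv.reader(lines):
        yield {"customer_id": customer_id, "company": company,
               "amount": float(amount), "tx_type": tx_type}


def enrich(txs, customers):
    """Attach the customer profile; unknown customers get an empty dict."""
    for tx in txs:
        yield {**tx, "customer": customers.get(tx["customer_id"], {})}


def filter_direct_debits(txs):
    """Keep direct debits (hypothetical type code "DD") for customers with a profile."""
    for tx in txs:
        if tx["tx_type"] == "DD" and tx["customer"]:
            yield tx


# Hypothetical sample: a DD for a known customer, a card payment, a DD for an unknown customer
raw = io.StringIO("c1,energy_co,51.0,DD\nc1,coffee_shop,3.20,CARD\nc9,gym,30.0,DD\n")
customers = {"c1": {"opted_in": True}}
batch = list(filter_direct_debits(enrich(parse(raw), customers)))
print(len(batch), batch[0]["company"])  # prints: 1 energy_co
```

Because every stage consumes and yields one record at a time, the same functions could sit behind a queue consumer in a microservice without changing their logic.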
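DS Python Micro-service
● Historical Direct Debits (12M): for each DD that a customer has, compute the Mean and the Median Deviation from Mean, and store them in the Model Repo
● New Direct Debit: retrieve the required DD model for the customer from the Model Repo
● Stable? No → Update model
● Stable? Yes → Within Limit? Yes → Update model; Within Limit? No → flag as an unusual direct debit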
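One reading of the decision flow in this diagram, as a minimal sketch: a model is kept per (customer, DD company) key and updated as each new direct debit arrives, which is one way to realise the online, personalised model described under Model Nuances. The thresholds (`MIN_HISTORY`, `K`, `MDEV_FLOOR`, `WINDOW`), the definition of "stable" as having enough payments, and the dict standing in for the Model Repo are all assumptions, not details from the deck.

```python
from dataclasses import dataclass, field
from statistics import mean, median

# All four thresholds are hypothetical choices, not values from the deck.
MIN_HISTORY = 6    # payments needed before a model counts as "stable"
K = 3.0            # allowed multiples of the spread before an amount is "unusual"
MDEV_FLOOR = 1.0   # minimum spread, so fixed-amount DDs do not get a zero limit
WINDOW = 12        # keep only the most recent payments (roughly 12 monthly DDs)


@dataclass
class DDModel:
    """Per-(customer, DD company) model: the recent amounts it was fitted on."""
    amounts: list = field(default_factory=list)

    def is_stable(self):
        return len(self.amounts) >= MIN_HISTORY

    def within_limit(self, amount):
        mu = mean(self.amounts)
        mdev = median(abs(x - mu) for x in self.amounts)  # median deviation from mean
        return abs(amount - mu) <= K * max(mdev, MDEV_FLOOR)

    def update(self, amount):
        self.amounts = (self.amounts + [amount])[-WINDOW:]


def score(repo, customer_id, company, amount):
    """Return True if this new direct debit should raise a UDD alert."""
    model = repo.setdefault((customer_id, company), DDModel())  # retrieve (or create) the model
    if model.is_stable() and not model.within_limit(amount):
        return True          # Stable? Yes, Within Limit? No: alert, model left unchanged
    model.update(amount)     # Stable? No, or Within Limit? Yes: update the model
    return False


repo = {}  # stands in for the Model Repo
for amt in [50.0, 52.0, 48.0, 51.0, 49.0, 50.0]:
    score(repo, "c1", "energy_co", amt)         # warm-up: not yet stable, no alerts
print(score(repo, "c1", "energy_co", 51.5))    # False: within limit
print(score(repo, "c1", "energy_co", 95.0))    # True: unusual direct debit
```

This sketch leaves the model unchanged when it raises an alert, following the branches as read from the diagram. The trade-off is that a genuine price rise would keep alerting until the model is refitted, so updating on every payment is a reasonable alternative.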
Key Takeaways
● Product Team vs. Siloed Data Science Team
● User Centric
● Extreme Programming practices can be applied to DS and help ship features faster
● It’s important to have an MVM up and running in production rather than waiting for the perfect model
Other Resources
● Scoring as a service
● Operationalising DS Models on Pivotal Stack
● API First for Data Science
● Pairing for Data Scientists
● Test Driven Development for Data Science
● Continuous Integration for Data Science