Operationalizing Data Science: The Right Architecture and Tools

© Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0

Megha Agarwal, Data Scientist, November 7, 2017

Operationalizing Data Science: The Right Tools and Architecture

Upload: pivotal

Post on 24-Jan-2018


TRANSCRIPT


Today’s Speaker

Megha Agarwal, Data Scientist, Pivotal

Megha has been with Pivotal for two years, helping clients ranging from startups to Fortune 500 companies identify and deliver value through their data. She focuses on developing smart apps by applying machine learning and statistics.

Prior to Pivotal, she worked in a credit risk department, identifying potential delinquency and fraud patterns. She holds a Master's in Machine Learning and High Performance Computing (HPC) from the University of Bristol.

Operationalizing Data Science: Common Pitfalls

1. Predictive Insights are Insufficient

2. Lack of Business Process Integration

3. Pace of Insight Generation Mismatch

4. Inability to Act on Perishable Insights

5. Failing to Learn from Past Experience

Operationalizing Data Science: A Strategy for Success

1. Prescriptive Insights

2. Business Process Integration

3. Right Insight, Right Time

4. Software Automation

5. Close the Analytics Loop

The Right Tools and Architecture

OPERATIONALIZING DATA SCIENCE

Ingredients


Data Science

Product Management

Product Design

Engineering

Continuous Improvement

Process

Our practices are based on Lean and Extreme Programming (XP).

First version of the product:

●  Meets user needs

●  Easy to use

●  Smart

No missed opportunities: laying the data foundation from the start allows us to easily add smart features.

Iterative without losing the bigger picture: customers expect apps to be personalised. The iterative process allows the product to learn and improve over time.


Our products are smart from the start

MVM (Minimum Viable Model) & Continuous Deployment of DS Models

Scoping → Data Review → Feature Engineering → Feature Review → Model Building → Model Evaluation → Operationalization → User Feedback

Pair Programming

Retros

Test Driven Development

Continuous Integration / API First

Pivotal Tracker

Standups

Pairing



CI & TDD


Standups


Retros

Let’s Bake

Digital Messaging App

Customer: A multinational banking and financial services company


Problem: Build a digital messaging app that gives customers relevant information about their finances at the appropriate time

Design + Data

●  User Centric

●  Persona Identification

●  Product Features

Improving customer banking experience

●  Unusual Direct Debits

●  Scheduled Direct Debits

●  Future Insufficient Funds


Unusual Direct Debit Workflow

Model Nuances

●  Online Learning Model

●  Personalised (each customer, DD company)

●  ~18GB of data flowing in every day

Data Exploration

●  Begin the exploration with end-to-end wiring discussions with the developers

●  Explore the direct debit transactions to identify how the overall population behaves

●  Minimum Viable Model: Median Deviation from Mean
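The Minimum Viable Model above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: "Median Deviation from Mean" is read here as the median of the absolute deviations of a customer's past amounts for one DD company from their mean, and the threshold `k` and the function names `fit_mvm` and `is_unusual` are hypothetical, not taken from the talk.

```python
from statistics import mean, median


def fit_mvm(amounts):
    """Fit a per-(customer, DD company) minimum viable model.

    Assumption: "Median Deviation from Mean" is interpreted as the median
    of |amount - mean| over the historical direct debit amounts.
    """
    mu = mean(amounts)
    mdm = median([abs(a - mu) for a in amounts])
    return mu, mdm


def is_unusual(amount, mu, mdm, k=3.0):
    """Flag a new amount whose distance from the mean exceeds k * MDM.

    k=3.0 is a hypothetical threshold. If MDM is 0 (e.g. a fixed
    subscription), any change in the amount is flagged.
    """
    return abs(amount - mu) > k * mdm


# Usage with made-up amounts for one customer and one DD company
mu, mdm = fit_mvm([40, 45, 50, 55, 60])
print(mu, mdm)                  # 50 5
print(is_unusual(80, mu, mdm))  # True
print(is_unusual(58, mu, mdm))  # False
```

Using the median of the deviations rather than a standard deviation keeps a single past spike from inflating the limit, which suits a first model that has to work before much tuning is done.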

Putting into Production

UDD (Unusual Direct Debit) Micro-services Pipeline

Daily Transactions (~18GB per day) + Customer Information → Parse, enrich, filter transaction data → DS Microservice to create, score and update UDD models → UDD Alert
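The "parse, enrich, filter" stage can be pictured as a chain of Python generators that streams records through one at a time instead of loading a whole day's data into memory. This is a sketch only: the CSV layout (`customer_id,payee,amount,txn_type`), the `"DD"` type code, the `segment` field used for enrichment and all function names are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Transaction:
    customer_id: str
    payee: str
    amount: float
    txn_type: str
    segment: str = ""  # hypothetical field filled in by the enrich step


def parse(lines: Iterable[str]) -> Iterator[Transaction]:
    """Parse hypothetical CSV lines of the form customer_id,payee,amount,txn_type."""
    for line in lines:
        customer_id, payee, amount, txn_type = line.strip().split(",")
        yield Transaction(customer_id, payee, float(amount), txn_type)


def enrich(txns: Iterable[Transaction], customers: Dict[str, str]) -> Iterator[Transaction]:
    """Attach customer information (here just a segment label) to each transaction."""
    for t in txns:
        t.segment = customers.get(t.customer_id, "unknown")
        yield t


def filter_direct_debits(txns: Iterable[Transaction]) -> Iterator[Transaction]:
    """Keep only direct debits, the input the UDD models need."""
    return (t for t in txns if t.txn_type == "DD")


def run_pipeline(lines: Iterable[str], customers: Dict[str, str]) -> Iterator[Transaction]:
    """Lazily chain parse -> enrich -> filter so records are processed one at a time."""
    return filter_direct_debits(enrich(parse(lines), customers))


# Usage with made-up records
lines = ["c1,GymCo,30.00,DD", "c1,Shop,12.50,CARD", "c2,Energy,80.00,DD"]
for t in run_pipeline(lines, {"c1": "retail"}):
    print(t.customer_id, t.payee, t.amount, t.segment)
# c1 GymCo 30.0 retail
# c2 Energy 80.0 unknown
```

The order follows the slide's "parse, enrich, filter"; moving the filter ahead of the enrichment would avoid looking up customer details for transactions that are discarded anyway.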


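DS Python Micro-service

●  Historical Direct Debits (12M): for each DD that a customer has, compute the Mean and the Median Deviation from Mean, and store the model in the Model Repo

●  New Direct Debit: retrieve the required DD model for the customer, then:

Stable? No → Update model

Stable? Yes → Within Limit? Yes → Update model

Stable? Yes → Within Limit? No → UDD Alert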
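The decision flow of the DS Python micro-service might look roughly like the sketch below. It is an illustration under stated assumptions, not the team's code: the Model Repo is an in-memory dictionary keyed by (customer, DD company); "stable" is taken to mean a hypothetical minimum number of past observations (`MIN_HISTORY`); "within limit" reuses a mean and median-deviation-from-mean check with a hypothetical multiplier `K`; and an amount outside the limit is flagged without being added to the model, which is one possible reading of the Yes/No branches.

```python
from collections import defaultdict, deque
from statistics import mean, median

MIN_HISTORY = 6  # hypothetical: past DDs needed before a model counts as "stable"
K = 3.0          # hypothetical width of the "within limit" band, in units of MDM
WINDOW = 12      # hypothetical: most recent DD amounts kept per model


class ModelRepo:
    """In-memory stand-in for the Model Repo, keyed by (customer, DD company)."""

    def __init__(self):
        self._history = defaultdict(lambda: deque(maxlen=WINDOW))

    def get(self, customer_id, company):
        return self._history[(customer_id, company)]


def score_and_update(repo, customer_id, company, amount):
    """Return True if the new direct debit is unusual; otherwise update the model."""
    history = repo.get(customer_id, company)
    if len(history) >= MIN_HISTORY:                 # Stable? Yes
        mu = mean(history)
        mdm = median([abs(a - mu) for a in history])
        if abs(amount - mu) > K * mdm:              # Within Limit? No
            return True                             # raise a UDD alert
    history.append(amount)                          # Update model (online step)
    return False


# Usage with made-up amounts
repo = ModelRepo()
for amt in [40, 45, 50, 55, 60, 50]:
    score_and_update(repo, "c1", "GymCo", amt)      # not yet stable: learn only
print(score_and_update(repo, "c1", "GymCo", 200))   # True
print(score_and_update(repo, "c1", "GymCo", 52))    # False
```

Keeping flagged amounts out of the model stops a one-off spike from widening the limit; the trade-off is that a genuine, permanent price change keeps alerting until the model is refit, so a real service would need a policy for that case.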

[Figure: Usual Direct Debits vs. Unusual Direct Debits, plotted for User 1, User 2, User 3 and User 4]

Key Takeaways

●  Product Team vs Siloed Data Science Team

●  User Centric

●  Extreme Programming practices can be applied to data science, and they help ship features faster

●  It’s important to have an MVM up and running in production rather than waiting for the perfect model

Other Resources

●  Scoring as a service

●  Operationalising DS Models on Pivotal Stack

●  API First for Data Science

●  Pairing for Data Scientists

●  Test Driven Development for Data Science

●  Continuous Integration for Data Science

Transforming How The World Builds Software
