2018 predictive analytics symposium - soa · 2018-09-07 · 2018 predictive analytics symposium ....

2018 Predictive Analytics Symposium Session 33: Commercializing a Data Science Model as Application Programming Interface (API) or Batch Service SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer

2018 Predictive Analytics Symposium

Session 33: Commercializing a Data Science Model as Application Programming Interface (API) or Batch Service

SOA Antitrust Compliance Guidelines

SOA Presentation Disclaimer

Jeffrey Heaton, Ph.D. and Ed Deuser

September 2018

Commercializing a Data Science Model as API or Batch Service

Agenda

Intro

Operational Readiness

Model Methodology

Partnerships

Example

Intro

Presenters

Jeffrey Heaton, Ph.D. – Lead Data Scientist - RGA

Ed Deuser – Technical Architect and Developer - RGA

RGA Reinsurance Company

The security of experience. The power of innovation.

www.rgare.com

Ed Deuser is a Technical Architect with RGA Reinsurance Company. In this role, Ed is responsible for technical solutions that support RGA’s global business units, including Valuation, Financial Solutions, Underwriting, and Global Research, Development and Analytics. He also served as the technical lead for B3i, the Blockchain Insurance Industry Initiative, and guides other digital objectives for RGA. In addition to his experience in the insurance sector, Ed has worked in financial services, government and law enforcement. Accomplished in the emerging field of distributed ledger technology, Ed has participated in RGA-sponsored hackathons as a coach and was part of the winning team at the Office of the National Coordinator (ONC) for Health Information Technology’s first-ever hackathon. Ed received his Bachelor of Science in Information Systems from the University of Missouri–St. Louis. His article “From R Studio to Real-Time Operations,” which he co-authored with RGA Lead Data Scientist Jeff Heaton, was published in the December 2017 issue of the Society of Actuaries’ Predictive Analytics and Futurism Section newsletter.

Jeff Heaton is a lead data scientist at Reinsurance Group of America (RGA), an adjunct instructor for the Sever Institute at Washington University, and the author of several books about artificial intelligence. Jeff holds a Master of Information Management (MIM) from Washington University and a Ph.D. in computer science from Nova Southeastern University. Over twenty years of experience in all aspects of software development allows Jeff to bridge the gap between complex data science problems and proven software development. Working primarily with the Python, R, Java/C#, and JavaScript programming languages, he leverages frameworks such as TensorFlow, Scikit-Learn, NumPy, and Theano to implement deep learning, random forests, gradient boosting machines, support vector machines, t-SNE, and generalized linear models (GLM). Jeff holds numerous certifications and credentials, such as the Johns Hopkins Data Science certification, Fellow of the Life Management Institute (FLMI), ACM Upsilon Pi Epsilon (UPE), and a senior membership with IEEE. He has published his research through peer-reviewed papers with the Journal of Machine Learning Research and IEEE.

Science is good, but how do my customers use it?

Operational Readiness

Operational Readiness

Readiness occurs throughout the project, most importantly when it starts.

End User Journey – Contract and Service Level Agreement (SLA)

Security is the first and last thing we think of.

Agreed-on patterns of use:
• Batch
• Real Time
• Web

[Figure labels: Project Execution, Workload Reality, Project Inception, Project at Risk, Project Failure]

Contract Management

Clear Expectation Management in Contractual Terms

End User Journey and Expectations

Standard service level agreement (SLA) as basis

End user's journey to a delivered service level agreement (SLA)

Threat Modeling
• How could it be compromised?
• How do we protect compromised sections?

Logging, Monitoring and Alerting
• Forensic logging of the item to be protected and where it is housed.
• Monitor and alert on suspicious activities and logs.

Pen Testing
• Contract with someone to ensure the item is protected.

Security in Depth

Should be the first and last thing we think of

“According to Microsoft, the potential cost of cyber-crime to the global community is a mind-boggling $500 billion, and a data breach will cost the average company about $3.8 million.”

API in English Please

API stands for Application Programming Interface.

What is an API?

[Diagram: Cohort – 100 (Id, gender, conditions) → API → Compute Score → API → Cohort – 100 (Id, gender, conditions, score)]
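The request and response shape in the diagram can be sketched as a plain function: records go in with id, gender, and conditions, and come back with a score attached. This is only an illustration. The field names, the `compute_score` rule, and the sample cohort are assumptions made for the sketch, not the presenters' model.

```python
from typing import Dict, List


def compute_score(member: Dict) -> float:
    # Hypothetical placeholder rule: a base of 1.0 plus 0.5 per condition.
    return 1.0 + 0.5 * len(member["conditions"])


def score_cohort(cohort: List[Dict]) -> List[Dict]:
    """Return a copy of each record with a 'score' field added."""
    return [{**member, "score": compute_score(member)} for member in cohort]


# Made-up input cohort (the slide uses a cohort of 100; two records shown here).
cohort = [
    {"id": 1, "gender": "F", "conditions": ["diabetes"]},
    {"id": 2, "gender": "M", "conditions": []},
]
scored = score_cohort(cohort)
print(scored)
```

The caller never sees how the score is computed; it only relies on the agreed input and output fields, which is the point of putting an API in front of the model.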

Model Development Methodology

Model Development Methodology

• Model Scoping and Business Understanding
• Data Understanding
• Data Discovery and Enrichment
• Model Fitting / Validation
• Model Deployment

Input Format for Model

Clients tend to vary the format of input data during model development.

Columns provided might change.

Column names might change.

Date formats may not be consistent.

For an automated API, this format must become consistent.

For an API, data input must be very standardized
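One way to picture the standardization step is a small normalizer that maps varying column names onto one schema and rewrites dates as ISO 8601. The sketch below uses only the Python standard library. The alias table, the accepted date formats, and the sample deliveries are all hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from client column names to one canonical schema.
COLUMN_ALIASES = {
    "id": "id", "member_id": "id",
    "gender": "gender", "sex": "gender",
    "dob": "birth_date", "birth_date": "birth_date",
}
# Assumed date formats, tried in order; an ambiguous date takes the first match.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, or raise if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


def standardize(csv_text: str) -> list:
    """Parse CSV text and return rows keyed by canonical column names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for name, value in raw.items():
            key = COLUMN_ALIASES.get(name.strip().lower())
            if key is None:
                raise ValueError(f"Unexpected column: {name!r}")
            row[key] = value.strip()
        row["birth_date"] = normalize_date(row["birth_date"])
        rows.append(row)
    return rows


# Two made-up deliveries of the same record with different headers and date styles.
first_delivery = "Member_ID,Sex,DOB\n17,F,03/15/1980\n"
second_delivery = "id,gender,birth_date\n17,F,1980-03-15\n"
print(standardize(first_delivery) == standardize(second_delivery))
```

For an automated API it is usually safer to reject unknown columns and unparseable dates outright, as this sketch does, than to guess.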

Use Excel as a Tool, Not a Format

Excel is a powerful data exploration tool for rapid analysis.

However, Excel can be a problematic data exchange format.
• Inability to specify export encoding (UTF-8, Unicode, etc.).
• Excel often mangles input by inferring data formats, such as treating SNOMED codes as numbers.
• Different tools generate Excel files differently.
• Many more ways to confuse automated imports with Excel than with CSV.

For tabular data, we prefer CSV (UTF-8)
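The risk with inferred numeric types can be shown with a short sketch: read a UTF-8 CSV with every field kept as text, then compare that with what happens when a long code passes through a floating-point number. The sample row is made up, and the 18-digit value is used here only as an example of a long SNOMED-style identifier.

```python
import csv
import io

# A made-up one-row export, encoded as UTF-8.
csv_bytes = "id,name,snomed_code\n1,José,900000000000207008\n".encode("utf-8")

# Decode explicitly as UTF-8; csv.DictReader keeps every field as a string.
reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
row = next(reader)

code_as_text = row["snomed_code"]
# Simulate a tool that infers "number" and stores the code as a double.
code_as_number = int(float(code_as_text))

print(code_as_text)
print(code_as_number == int(code_as_text))  # the round trip does not preserve the code
```

Keeping identifiers as text end to end avoids this class of error regardless of which tool produced the file.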

Input Format for Model

Input from the client is usually in JSON, XML, or CSV format.

For real-time APIs we prefer JSON/XML
• JSON and XML provide a hierarchical view of data.
• JSON and XML do not always easily fit into Excel.

For batch, we generally prefer CSV (sometimes Excel)
• CSV and Excel both store data in tabular format.

JSON, CSV, or XML?

The XML Format

Verbose and Hierarchical
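The slide's original example is not in this transcript. As a stand-in, here is a hypothetical cohort payload in XML, parsed with Python's standard `xml.etree.ElementTree`. The element and attribute names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: one cohort member with nested conditions.
xml_text = """
<cohort>
  <member id="1">
    <gender>F</gender>
    <conditions>
      <condition>diabetes</condition>
      <condition>hypertension</condition>
    </conditions>
  </member>
</cohort>
"""

root = ET.fromstring(xml_text)
members = [
    {
        "id": int(m.get("id")),
        "gender": m.findtext("gender"),
        "conditions": [c.text for c in m.iter("condition")],
    }
    for m in root.findall("member")
]
print(members)
```

Each value needs an opening and a closing tag, which is where the verbosity comes from; the nesting of `conditions` inside `member` is the hierarchy.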

The JSON Format

Concise and JavaScript-like
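The slide's original example is not in this transcript. As a stand-in, here is a hypothetical one-member cohort expressed in JSON and parsed with Python's standard `json` module. The field names are assumptions chosen for illustration.

```python
import json

# Hypothetical payload: one cohort member with a nested list of conditions.
json_text = """
{"cohort": [{"id": 1, "gender": "F", "conditions": ["diabetes", "hypertension"]}]}
"""

payload = json.loads(json_text)
members = payload["cohort"]
print(members[0]["conditions"])
```

Objects and arrays map directly onto dictionaries and lists, so the hierarchy survives parsing without any extra mapping code.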

Data Discovery and Enrichment

Client input data usually will not contain all necessary information for a model.
• If the identity of the individual is known (PII), we might augment with:
  o 3rd party marketing data on the individual.
  o 3rd party credit data on the individual.
• If the identity of the individual is unknown (PII-less), we might augment with:
  o RGA severity scores for drugs or medical diagnoses.
  o RGA mortality tables.

Augmenting the input data with additional data sources
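For the PII-less case, enrichment can be as simple as a lookup against a reference table. The sketch below assumes a hypothetical severity table keyed by condition name. The values are placeholders invented for the example and are not RGA severity scores.

```python
from typing import Dict

# Hypothetical lookup table; placeholder values only.
SEVERITY_BY_CONDITION = {"diabetes": 2.0, "hypertension": 1.5}
DEFAULT_SEVERITY = 0.0  # assumed fallback for conditions missing from the table


def enrich(member: Dict) -> Dict:
    """Return a copy of the record with derived severity features added."""
    severities = [
        SEVERITY_BY_CONDITION.get(condition, DEFAULT_SEVERITY)
        for condition in member["conditions"]
    ]
    return {
        **member,
        "max_severity": max(severities, default=DEFAULT_SEVERITY),
        "condition_count": len(severities),
    }


print(enrich({"id": 1, "conditions": ["diabetes", "asthma"]}))
```

How to treat conditions that are missing from the reference table (default, reject, or flag) is a modeling decision; the default used here is only one option.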

Model Fitting

Model fitting is where a data scientist trains a model based on data.

Fitting is usually a very manual process that can go on for days, weeks, or months.

The final output from fitting is a model that can be deployed for client use.

Teaching a model from data
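To make "the final output from fitting is a model that can be deployed" concrete, the sketch below fits a deliberately tiny model (a one-variable least-squares line) and hands it off as a JSON artifact. Real fitting would use a proper library and far more data. The training data, the model form, and the JSON handoff are all assumptions made for illustration.

```python
import json


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def predict(model, x):
    """Score one input with the fitted parameters."""
    return model["intercept"] + model["slope"] * x


# Made-up training data: number of conditions vs. an observed score.
xs = [0, 1, 2, 3]
ys = [1.0, 1.5, 2.0, 2.5]

model = fit_line(xs, ys)
artifact = json.dumps(model)     # what fitting hands over to deployment
deployed = json.loads(artifact)  # what the scoring service would load
print(predict(deployed, 4))
```

The scoring side only needs the artifact and `predict`; none of the exploratory work that produced the parameters has to ship with it.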

Model Deployment

How will your model be used?
• Will the model be used directly by individual human users?
• Will the model be integrated into a system developed by the client’s IT?
• Will the model be used as part of a client’s mobile application?
• Will the model score files that users or the client upload?

Manual steps from fitting must be automated.

Input data must be checked for errors.

Making your model available to clients
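Checking input for errors usually means validating each record before it reaches the model and returning every problem found, so the client can fix a request in one pass. A minimal sketch follows. The field names and the allowed gender codes are hypothetical.

```python
from typing import Dict, List

ALLOWED_GENDERS = {"F", "M", "U"}  # hypothetical code set


def validate_member(member: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not isinstance(member.get("id"), int):
        errors.append("id must be an integer")
    if member.get("gender") not in ALLOWED_GENDERS:
        errors.append("gender must be one of F, M, U")
    conditions = member.get("conditions")
    if not isinstance(conditions, list) or not all(
        isinstance(c, str) for c in conditions
    ):
        errors.append("conditions must be a list of strings")
    return errors


print(validate_member({"id": 1, "gender": "F", "conditions": []}))
print(validate_member({"id": "x", "gender": "?", "conditions": "diabetes"}))
```

During manual fitting a data scientist would spot and repair such problems by eye; in an automated service the same checks have to be written down explicitly.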

Personally Identifiable Information (PII) and Data Retention

Some input data contains PII; some does not.

Some clients request us to retain no data.

We prefer to keep some data.

We usually do not store PII data on the model side.

What data should we retain? (and where)
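One way to keep some data without storing PII on the model side is to drop identifying fields before anything is written to the retention store. The sketch below assumes a hypothetical list of PII field names; which fields count as PII is a legal and contractual question, not something code can decide.

```python
from typing import Dict

# Hypothetical set of fields treated as PII for this example.
PII_FIELDS = {"name", "ssn", "birth_date", "address"}


def retention_record(request: Dict, score: float) -> Dict:
    """Build the record to retain: non-PII inputs plus the score returned."""
    kept = {key: value for key, value in request.items() if key not in PII_FIELDS}
    kept["score"] = score
    return kept


request = {"name": "Jane Doe", "gender": "F", "conditions": ["diabetes"]}
print(retention_record(request, 1.5))
```

For clients who ask that no data be retained, the same function can simply not be called, which keeps the retention decision in one place.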

Ongoing Model Validation

Client data distributions can change over time.

Baseline truth can change.

Models must be evaluated over time to ensure they remain relevant.

Calibration is an ongoing process.

Keeping the model relevant
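A common way to watch for changing input distributions is the Population Stability Index (PSI), which compares the share of records per bin between a baseline sample and a current sample. The slides do not name a specific metric, so PSI is an illustrative choice here. The bin edges, the sample ages, and the `eps` floor are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values per bin [edges[i], edges[i+1]); the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)  # values outside the edges are ignored
    return [c / total for c in counts]


def psi(baseline, current, edges, eps: float = 1e-6) -> float:
    """Population Stability Index: sum of (q - p) * ln(q / p) over bins."""
    p = [max(x, eps) for x in proportions(baseline, edges)]  # floor avoids log(0)
    q = [max(x, eps) for x in proportions(current, edges)]
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up applicant ages at fitting time and some time later.
edges = [20, 40, 60, 80]
baseline_ages = [25, 35, 45, 55]
current_ages = [45, 55, 65, 75]

print(psi(baseline_ages, baseline_ages, edges))  # identical samples give 0.0
print(psi(baseline_ages, current_ages, edges) > 0)
```

What PSI value should trigger recalibration is a judgment call for the model owner; the metric only makes the shift visible.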

Partnerships

Know your strengths

Types of partnerships:

• Internal: partnering with different parts of your organization

• External: e.g., staff augmentation or a client partner (e.g., RGA)

Partnerships in Place to Ensure Success

Questions to ask:

• Do you have data scientists in your organization?

• Are you experienced in cloud deployments?

• Can you sustain the DevOps practice?

• Do you understand where your attack vectors are?

Example Commercialization

Commercialization Example

Example models

SwaggerHub: create the API definition first (what's on the menu).

Upload the API definition to API Gateway on AWS.

Pre-templated Node.js Lambda to compute the score on a cohort.
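The presenters' template is a Node.js Lambda; the sketch below shows the same idea in Python purely for illustration. It assumes API Gateway's Lambda proxy integration, where the request body arrives as a string in `event["body"]` and the function returns a dictionary with `statusCode`, `headers`, and `body`. The payload shape and the scoring rule are hypothetical.

```python
import json


def compute_score(member):
    # Hypothetical placeholder rule, not the presenters' model.
    return 1.0 + 0.5 * len(member.get("conditions", []))


def handler(event, context):
    """Lambda-style entry point: parse the cohort, score it, return JSON."""
    try:
        cohort = json.loads(event["body"])["cohort"]
    except (KeyError, TypeError, ValueError):
        # Missing body, malformed JSON, or no 'cohort' key.
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "expected a JSON body with a 'cohort' list"}),
        }
    scored = [{**member, "score": compute_score(member)} for member in cohort]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"cohort": scored}),
    }


# Local smoke test with a made-up event; no AWS account is needed to run this.
event = {"body": json.dumps({"cohort": [{"id": 1, "gender": "F", "conditions": ["diabetes"]}]})}
response = handler(event, None)
print(response["statusCode"], response["body"])
```

Because the handler is an ordinary function, it can be exercised locally with sample events before it is deployed behind the gateway.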

Questions

Appendix

Resources to use for creating your own API

Disclaimer:

The resources provided are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of Reinsurance Group of America, its cosponsors, or its committees. Reinsurance Group of America does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented. The above resources do not provide all recommended security measures. Because appropriate security measures are not provided, use these resources freely but at your own risk.

https://github.com/eddeuser2017/commercialize_api
