IBM Machine Learning on z/OS - neodbug · 2019-12-08

© 2017 IBM Corporation

Nigel Slinger – [email protected] · Teja Karra – [email protected] · Luan Bui – [email protected]

Agenda

§ What is Machine Learning?
- Value of Machine Learning
- Machine Learning 101
- Standard process and challenges today with z/OS

§ What is IBM Machine Learning for z/OS?
- Overview
- Architecture flow
- Feature highlights

§ Summary

WHAT IS MACHINE LEARNING?

What is Machine Learning?

Computers that …
- Learn without being explicitly programmed
- Grow and change when exposed to new data
- Deliver personalized and optimized customer interactions
- Identify patterns not readily foreseen by humans
- Build models of behavior from those patterns

Achieving Business Values with Machine Learning

Identify suspicious behavior, predict and prevent threats and fraud – continually reduce business risks and costs

Product recommendation, next purchase prediction, targeted offers – an individually tailored shopping experience

Learn and predict weather patterns and energy production from renewable sources, and integrate them into the grid more effectively

Detect and understand life-threatening medical conditions and design ever more effective treatment programs

Churn analysis helps identify the cause of churn and implement effective strategies for retention

Machine Learning…
• Constantly learns and adapts
• Avoids making the same mistakes
• Faster, deeper, improved insights

Resulting in…
✓ Smarter business outcomes
✓ Lower business risks and costs
✓ New business opportunities

Types of Machine Learning

§ Supervised
- A model is trained from a set of known input data to a set of target outputs
- Input
  • Numbers (stock prices)
  • Text (car failure description)
  • Images
  • Audio files
  • And more
- Output
  • Class
    • Yes/no
    • Buy/not-buy
    • Car failure types
  • Number
    • House price
    • Annual energy in kWh

y = f(x), where x is the input, f is the function, and y is the output
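The relationship y = f(x) can be made concrete with a small sketch. Assuming a "Number" output such as a house price, the block below learns f from known input/output pairs using ordinary least squares on a single input. The data values and the helper name `fit_line` are made up for illustration; this is not how IBM Machine Learning for z/OS trains models.

```python
# Hypothetical training data: house size (input x) and price (target output y).
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training" produces the parameters of the function f.
slope, intercept = fit_line(sizes, prices)


def f(x):
    """The learned function: maps a new input to a predicted output."""
    return slope * x + intercept


predicted_price = f(90.0)
```

Once f is learned, it can be applied to inputs that were never in the training set, which is what scoring means later in the deck.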

Types of Machine Learning

§ Unsupervised
- No structure is given
- There is no expected target output
- Unsupervised learning can be used to find hidden patterns in the data itself

Types of Machine Learning

§ Imagine a basket of fruits
- Supervised learning – you know about fruits
  • We train the machine to recognize various fruits
  • One input maps to a target output: Apple, Cherry, Lemon, Banana
- Unsupervised learning – you have no knowledge of any fruit
  • We group the fruit by COLOR
    • RED – apple, cherry
    • YELLOW – lemon, banana
  • Then we group the fruit by COLOR and SIZE
    • RED and BIG – apple
    • RED and SMALL – cherry
    • YELLOW and BIG – banana
    • YELLOW and SMALL – lemon
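The fruit-basket grouping can be sketched in a few lines. The block below groups items purely by observed attributes (color, then color and size) without ever using the fruit name as a target. It is a deliberately simple stand-in for a real clustering algorithm, and the helper name `group_by` is an assumption made for this example.

```python
from collections import defaultdict

# Attribute values follow the slide; the name is carried along only for display.
fruits = [
    {"name": "apple", "color": "RED", "size": "BIG"},
    {"name": "cherry", "color": "RED", "size": "SMALL"},
    {"name": "banana", "color": "YELLOW", "size": "BIG"},
    {"name": "lemon", "color": "YELLOW", "size": "SMALL"},
]


def group_by(items, keys):
    """Group item names by the values of the given attributes."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[k] for k in keys)].append(item["name"])
    return dict(groups)


by_color = group_by(fruits, ["color"])
by_color_and_size = group_by(fruits, ["color", "size"])
```

Adding the second attribute splits each color group further, which mirrors how richer features let unsupervised methods separate the data more finely.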

Types of Machine Learning

§ Classification
- Data points are labeled and are used to predict a category
- Two-class vs. multi-class
- Example:
  • Fraud detection (fraud vs. non-fraud)
  • Spam email detection (spam vs. non-spam)

§ Regression
- A value is predicted
- Example:
  • Stock price prediction
  • House price prediction

§ Clustering
- Data points are not labeled
- The goal is to group data into clusters to better organize the data

Feature vs. Label

§ A feature is a piece of information that might be useful for prediction
- Example: predict the churn probability of a customer

§ Labeled data is the desired output data
- Example: CHURN_LABEL false representing a churn sample

[Figure: table columns annotated "NOT a feature", "Feature", "Feature", "Feature"]
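A minimal sketch of the feature/label distinction follows. Only `CHURN_LABEL` comes from the slide; the other column names (`CUSTOMER_ID`, `AGE`, `MONTHLY_SPEND`, `SUPPORT_CALLS`) and all values are hypothetical. The identifier column is assumed to be the "NOT a feature" column because it carries no predictive information.

```python
# Hypothetical customer records; only CHURN_LABEL is a name taken from the slide.
rows = [
    {"CUSTOMER_ID": 1001, "AGE": 34, "MONTHLY_SPEND": 52.0,
     "SUPPORT_CALLS": 1, "CHURN_LABEL": False},
    {"CUSTOMER_ID": 1002, "AGE": 58, "MONTHLY_SPEND": 19.5,
     "SUPPORT_CALLS": 6, "CHURN_LABEL": True},
]

NOT_FEATURES = {"CUSTOMER_ID"}  # assumed: identifiers are excluded from training
LABEL = "CHURN_LABEL"


def split_features_and_label(row):
    """Separate the inputs used for prediction from the desired output."""
    features = {k: v for k, v in row.items()
                if k != LABEL and k not in NOT_FEATURES}
    return features, row[LABEL]


# X holds one feature dict per customer, y holds the matching labels.
X, y = zip(*(split_features_and_label(r) for r in rows))
```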

Training, Deployment, Scoring

A TrainOps (DevOps) story

[Diagram: a Dev side (Data Scientist/Data Analyst) that trains a model from labeled examples through feature engineering and training; a Deploy step; and an Ops side (operational system) that applies feature engineering and scoring to new data with the deployed model to produce predicted data]

The (incomplete) machine learning process
Takes significant development, deployment, and management efforts

[Diagram: Ingest Data → Extract Features → Train Model → Deploy Model → Make Predictions, with Human Intervention for: Choose Best Model, Identify Model Degradation, Prediction and Scoring, Manage Deployments]

Challenge of Machine Learning on z/OS

§ Mainframe data has to be ETLed for training

§ Every ETL over a physical network is a potential security exposure

§ Every ETL is a new copy of the data:
- Data rapidly becomes out of date
- Expensive to maintain multiple copies of the same set of data
- Analytics is ineffective if performed on old data

§ Not possible to implement online scoring within native z/OS online transactions

WHAT IS IBM MACHINE LEARNING FOR Z/OS

IBM Machine Learning for z/OS v1.1

The (complete) machine learning process
IBM Turns Machine Learning into Learning Machines

[Diagram: Ingest Data → Extract Features → Train Model → Deploy Model → Make Predictions]

• End-to-end platform for machine learning tasks on the mainframe
• Create better models faster with Cognitive Assistant for Data Scientists
• Continuous monitoring of the model performance to guide model retraining
• RESTful API for online scoring within transactions

IBM Machine Learning for z/OS

Graphical User Interface

IBM Machine Learning for z/OS V1.1

The machine learning workflow
- Training on z with z/OS data – DB2, IMS, VSAM, IDAA, etc.
- Online scoring on z – REST API

» Training and scoring on z using Spark for z/OS as the backend data processor
» Customers can train the models that best fit their business with the data on the mainframe
» Customers can deploy the ML models and perform online scoring within transactions on the mainframe

IBM Machine Learning for z/OS V1.1

» Product GA was announced on March 17, 2017
» It has a two-tier architecture
  » Components on z/OS – MLz scoring service, various Spark ML libraries, and the CADS/HPO library
  » Components on Linux/x86 – running as Docker images
    » The Docker images are deployed through Kubernetes
    » They contain: authentication token, metadata service, deployment service, ingestion service, transformation service, pipeline service
» Uses DB2 for z/OS as the database to store the metadata for the models, model deployment information, and evaluation information

IBM Machine Learning for z/OS V1.1

» Prerequisites
  » On Linux x86
    » x86 64-bit system with 8 cores, 32 GB RAM, and 250 GB disk space (3 Linux x86 systems recommended for high availability)
    » Red Hat Enterprise Linux Server 7.2 or later
    » Docker Engine
    » Kubernetes (one master and two workers)
  » On z/OS
    » z/OS 2.1
    » DB2 for z/OS V10 or later
    » LDAP
    » IBM 64-bit SDK for z/OS
    » Spark for z/OS 2.0.2
  » Hardware requirements
    » Supported machines: z13, z13s, EC12
    » We recommend a minimum of 4 zIIP processors and 1 general CP, with a minimum of 100 GB memory for the LPAR
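As an illustration of how the Linux x86 sizing above might be checked before installation, the sketch below compares a node description against the stated minimums (8 cores, 32 GB RAM, 250 GB disk). The dictionary layout, the function name `unmet_requirements`, and the sample nodes are assumptions made for this example; they are not part of any IBM installer.

```python
# Minimums taken from the prerequisite list: 8 cores, 32 GB RAM, 250 GB disk.
LINUX_X86_MINIMUM = {"cores": 8, "ram_gb": 32, "disk_gb": 250}


def unmet_requirements(node, minimum=LINUX_X86_MINIMUM):
    """Return the names of resources where the node falls short of the minimum."""
    return [name for name, required in minimum.items()
            if node.get(name, 0) < required]


# Hypothetical node descriptions, for illustration only.
ok_node = {"cores": 16, "ram_gb": 64, "disk_gb": 500}
small_node = {"cores": 4, "ram_gb": 32, "disk_gb": 100}

ok_gaps = unmet_requirements(ok_node)        # nothing missing
small_gaps = unmet_requirements(small_node)  # cores and disk fall short
```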

IBM Machine Learning for z/OS Architecture

[Architecture diagram. Legend: bundled software, IBM MLz components, pre-requisite software. Platforms: Linux (Kubernetes/Docker) and z/OS. Labeled components: IBM Machine Learning UI (Jupyter Notebook UI / Visual Model Builder; Model Management / Model Deployment / Monitoring); Application Cluster (Ingestion service, Transformation service, Pipeline service); Auth Service; Metadata Service; Deployment Service (Feedback/Monitoring); CouchDB (NoSQL Metadata); Jupyter server; Jupyter Kernel Gateway; Apache Toree; z/OS Spark Cluster (Ingestion lib, Transformation lib, Pipeline lib); CADS/HPO lib; MDSS driver; DB2 JDBC driver; Scoring service; z/OS Spark in Local Mode; z/OS Liberty; zLDAP; LDBM; RACF (optional); DB2z (Service Metadata, ML models); DB2; IMS; VSAM; SMF]

Feature Highlights

- CADS (Cognitive Assistant for Data Scientists) library, IBM's value add
- Integrated notebook with flexible APIs
  • Supported language: Scala
- Integrated Brunel visualization library
- Visual Model Builder
- Model Management
  • Model
  • Deployment
  • Evaluation
- RESTful API for online scoring within applications
- Model feedback and monitoring
- Administration Dashboard

Feature Highlights – CADS

§ What is CADS?
- Cognitive Assistant for Data Scientists, which helps select the best-fit algorithm for training

§ Why do Data Scientists need CADS?
- Many algorithms exist for classification/regression tasks: SVM, Decision Trees/Forests, Naïve Bayes, Logistic Regression, Linear Regression, etc.
- Selecting the best algorithm carries a substantial cost in user and compute time
  • The user spends time trying various learners
  • The computational cost of training a single SVM can exceed 24h
  • Selection is commonly based on data scientist bias and experience

Feature Highlights – CADS

§ Minimize the amount of data to be considered to make an informed selection of the most suitable learner

§ Given a data set, try to select the best approach by directly considering part of the actual data

[Diagram: Training Data feeding candidate learners (Logistic Regression, Random Forest, Decision Tree, …), with two allocations labeled 500]
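The idea of choosing a learner from only part of the data can be sketched as follows. This is not the CADS algorithm, whose internals the slides do not describe; it is a simplified stand-in that trains two toy learners on a small slice of a synthetic data set and keeps whichever scores best on a second small slice. The data set, both learners, and the name `select_learner` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical 1-D data set: the label is True when x >= 50.
data = [(x, x >= 50) for x in range(100)]


def train_majority(sample):
    """Baseline learner: always predict the most common label in the sample."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda x: majority


def train_stump(sample):
    """Threshold learner: predict True when x >= t, with t chosen on the sample."""
    def correct_at(t):
        return sum((x >= t) == label for x, label in sample)
    best_t = max((x for x, _ in sample), key=correct_at)
    return lambda x: x >= best_t


def accuracy(model, rows):
    return sum(model(x) == label for x, label in rows) / len(rows)


def select_learner(learners, train_part, validation_part):
    """Train every candidate on a small part of the data and keep the best scorer."""
    scores = {name: accuracy(train(train_part), validation_part)
              for name, train in learners.items()}
    return max(scores, key=scores.get), scores


learners = {"majority": train_majority, "stump": train_stump}
# Only 20 of the 100 rows are touched: 10 for training, 10 for validation.
best_name, scores = select_learner(learners, data[::10], data[5::10])
```

The fixed stride used to pick the slices keeps the example deterministic; a practical selector would more likely draw random samples and grow the allocation for promising learners.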

Feature Highlights – Integrated Notebook Interface with flexible APIs

[Screenshots: Ingest data from DB2z table; Data transformation and training]

Feature Highlights – Brunel Visualization Tool

§ What is Brunel?
- Data Scientists use visualization tools to help them understand data distribution. Brunel is one of the tools commonly used by Data Scientists
- It is an open source tool sponsored by IBM

§ Import the Brunel library
- If external network connectivity is not an issue:
  • %AddJar -magic https://brunelvis.org/jar/spark-kernel-brunel-all-2.2.jar
- If external network connectivity is a concern, use the integrated Brunel visualization jar from the installed MLz path on the z system, which adds the Brunel jar from the local copy:
  • %AddJar -magic file:///your-local-scoring-path/iml-library/brunel/spark-kernel-brunel-all-2.2.jar

Brunel example

Feature Highlights – Visual Model Builder

§ Unlocks the world of Data Science to non-Data Scientists

§Allows Data Scientists to be more productive

Feature Highlights – Visual Model Builder, the guided Machine Learning Interface

[Screenshots: Ingest data and transform; Training and evaluation]

Feature Highlights – Model Management

[Screenshots: Manage model, create deployment; Manage deployment]

Feature Highlights – Evaluation

Feature Highlights – Easily consumable RESTful API for online Scoring within Application Code

RESTful API for online scoring and prediction
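To give a feel for what calling a scoring REST API from application code could look like, here is a sketch that builds (but does not send) an HTTP POST with a JSON record. The slides do not document the actual endpoint, payload shape, or authentication scheme, so the URL path, the `record` field, the bearer-token header, and every name below are hypothetical placeholders.

```python
import json
import urllib.request


def build_scoring_request(base_url, deployment_id, token, record):
    """Build (but do not send) a POST request carrying one record to score."""
    url = f"{base_url}/deployments/{deployment_id}/score"  # hypothetical path
    body = json.dumps({"record": record}).encode("utf-8")   # hypothetical payload
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
    )


# All values are placeholders for illustration.
request = build_scoring_request(
    "https://mlz.example.com", "churn-v1", "TOKEN",
    {"AGE": 34, "MONTHLY_SPEND": 52.0},
)
```

In a real application the request would be sent with `urllib.request.urlopen(request)` and the JSON response parsed for the prediction; the product documentation is the authority for the real endpoint and fields.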

Feature Highlights – Model feedback and monitoring

– Feedback and Continuous Monitoring
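A feedback loop of this kind can be illustrated with a simple rule. The sketch below flags a model for retraining when its recent accuracy on feedback data falls noticeably below a baseline. The threshold rule, the parameter names, and the sample history are assumptions made for this example; the slides do not state which metric or rule the product uses.

```python
def needs_retraining(accuracy_history, baseline, tolerance=0.05, window=3):
    """Flag deterioration when the mean of the last `window` scores drops
    more than `tolerance` below the baseline accuracy."""
    if len(accuracy_history) < window:
        return False  # not enough feedback yet to judge
    recent = accuracy_history[-window:]
    return sum(recent) / window < baseline - tolerance


# Hypothetical accuracy measured on successive batches of feedback data.
history = [0.91, 0.90, 0.89, 0.84, 0.82, 0.80]

early_alert = needs_retraining(history[:3], baseline=0.90)  # still healthy
late_alert = needs_retraining(history, baseline=0.90)       # has degraded
```

Averaging over a window rather than reacting to a single batch reduces false alarms caused by one unusually hard batch.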

Feature Highlights – Administration Dashboard

Kernel Management

SUMMARY

IBM Machine Learning for z/OS – faster time to value

Differentiating Value: Create better models in less time
§ Rapidly optimize the algorithm that best fits the data and business scenario
  Capability: Cognitive Assistant for Data Scientists (CADS)
§ Provide optimal parameters for any given model
  Capability: Hyper Parameter Optimization (HPO)

Differentiating Value: Simplify model creation
§ Wizards make it easy for users to create and train a model
  Capability: DSX Pipeline User Interface

Differentiating Value: Improve models over time
§ Monitor model performance with feedback data and performance history
§ Notification of model performance deterioration for more efficient retraining
  Capability: Continuous Monitoring and Feedback Loop

Differentiating Value: Easily integrate with existing tools and applications
§ Ease collaboration across users (e.g., Data Scientists and App Developers)
  Capability: Modern RESTful APIs

Differentiating Value: Simplify model management
§ Easily manage thousands of models in an enterprise environment
  Capability: Single UI for Deployment