ibm machine learning on z/os - neodbug · 2019-12-08 · challenge of machine learning on z/os...

© 2017 IBM Corporation

IBM Machine Learning on z/OS

Nigel Slinger – [email protected]
Karra – [email protected]
Bui – [email protected]


Agenda

§ What is Machine Learning?
- Value of Machine Learning
- Machine Learning 101
- Standard process and challenges today with z/OS

§ What is IBM Machine Learning for z/OS?
- Overview
- Architecture flow
- Feature highlights

§ Summary


WHAT IS MACHINE LEARNING?


What is Machine Learning?

Computers that …
• Learn without being explicitly programmed
• Grow and change when exposed to new data

Identify patterns not readily foreseen by humans

Build models of behavior from those patterns

Deliver personalized and optimized customer interactions


Achieving Business Value with Machine Learning

Identify suspicious behavior, predict and prevent threats / fraud – continually reduce business risks and costs

Product recommendation, next-purchase prediction, targeted offers – an individually tailored shopping experience

Learn and predict weather patterns and energy production from renewable sources, and integrate that energy into the grid more effectively

Detect and understand life-threatening medical conditions, and design ever more effective treatment programs

Churn analysis helps identify the causes of churn and implement effective retention strategies

Machine Learning…
• Constantly learns and adapts
• Avoids making the same mistakes
• Delivers faster, deeper, improved insights

Resulting in…
✓ Smarter business outcomes
✓ Lower business risks and costs
✓ New business opportunities


Types of Machine Learning

§ Supervised
- A model is trained to map a set of known input data to a set of target outputs
- Input
  • Numbers (stock prices)
  • Text (car failure description)
  • Images
  • Audio files
  • And more
- Output
  • Class
    • Yes/no
    • Buy/not-buy
    • Car failure types
  • Number
    • House price
    • Annual energy in kWh
- The model learns a function $f$ that maps an input $x$ to an output $y$: $y = f(x)$



§ Unsupervised
- No structure is given
- There is no expected target output
- Unsupervised learning can be used to find hidden patterns in the data itself


§ Imagine a basket of fruits
- Supervised learning – you know about fruits
  • We train the machine to recognize various fruits (apple, cherry, lemon, banana)
  • Each input maps to a target output
- Unsupervised learning – you have no knowledge of any fruit
  • We group the fruit by COLOR
    • RED – apple, cherry
    • YELLOW – lemon, banana
  • Then we group the fruit by COLOR and SIZE
    • RED and BIG – apple
    • RED and SMALL – cherry
    • YELLOW and BIG – banana
    • YELLOW and SMALL – lemon
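The two grouping steps of the fruit example can be sketched with Scala's standard collections. The fruit names and their color/size attributes are taken from the example; the `Fruit` record type is a hypothetical helper. Note that here the grouping keys are chosen by hand, whereas an actual unsupervised algorithm (such as clustering) would discover the groups from the data.

```scala
// Hypothetical record type; attribute values follow the fruit example
final case class Fruit(name: String, color: String, size: String)

object FruitGrouping {
  val basket: List[Fruit] = List(
    Fruit("apple", "RED", "BIG"),
    Fruit("cherry", "RED", "SMALL"),
    Fruit("banana", "YELLOW", "BIG"),
    Fruit("lemon", "YELLOW", "SMALL")
  )

  // Step 1: group by a single attribute (COLOR)
  def byColor(fruits: List[Fruit]): Map[String, List[String]] =
    fruits.groupBy(_.color).map { case (color, fs) => color -> fs.map(_.name).sorted }

  // Step 2: group by a composite key (COLOR, SIZE)
  def byColorAndSize(fruits: List[Fruit]): Map[(String, String), List[String]] =
    fruits.groupBy(f => (f.color, f.size)).map { case (key, fs) => key -> fs.map(_.name).sorted }

  def main(args: Array[String]): Unit = {
    // RED -> List(apple, cherry), YELLOW -> List(banana, lemon); map order may vary
    println(byColor(basket))
    println(byColorAndSize(basket))
  }
}
```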



§ Classification
- Labeled data points are used to predict a category
- Two-class vs. multi-class
- Examples:
  • Fraud detection (fraud vs. non-fraud)
  • Spam email detection (spam vs. non-spam)
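A minimal classification sketch, assuming purely numeric features: it learns one centroid (mean feature vector) per label from labeled examples and predicts the label of the nearest centroid. Nearest-centroid is used here only because it fits in a few lines; it is not the algorithm MLz uses. Because it keeps one centroid per label, the same code handles two-class and multi-class problems. The transaction features in the usage note (amount in thousands, foreign-card flag) are hypothetical.

```scala
object NearestCentroid {
  type Features = Vector[Double]

  // Training: compute the mean feature vector for each label
  def train(examples: Seq[(Features, String)]): Map[String, Features] =
    examples.groupBy(_._2).map { case (label, rows) =>
      val vectors = rows.map(_._1)
      val dims = vectors.head.length
      val mean = Vector.tabulate(dims)(i => vectors.map(_(i)).sum / vectors.size)
      label -> mean
    }

  private def squaredDistance(a: Features, b: Features): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Prediction: the label whose centroid is closest to x
  def predict(centroids: Map[String, Features], x: Features): String =
    centroids.minBy { case (_, c) => squaredDistance(x, c) }._1
}
```

For example, training on three small domestic transactions labeled "non-fraud" and two large foreign ones labeled "fraud" makes `predict(model, Vector(5.0, 1.0))` return "fraud".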

§ Regression
- A numeric value is predicted
- Examples:
  • Stock price prediction
  • House price prediction
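For regression the model outputs a number. A minimal sketch, assuming a single numeric feature, is ordinary least squares for $y = a + b\,x$. The house-size and price values in the usage note are made up for illustration (size in units of 100 m², price in thousands).

```scala
object SimpleLinearRegression {
  // Fit y = intercept + slope * x by ordinary least squares; returns (intercept, slope)
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two (x, y) pairs")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx != 0.0, "x values must not all be equal")
    val slope = sxy / sxx
    (meanY - slope * meanX, slope)
  }

  def predict(model: (Double, Double), x: Double): Double = model._1 + model._2 * x
}
```

With sizes 1, 2, 3 and prices 150, 250, 350, the fit gives intercept 50 and slope 100, so a size of 4 is predicted at 450.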

§ Clustering
- Data points are not labeled
- The goal is to group the data into clusters to better organize it
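Clustering works on unlabeled points. A minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data, assuming k is given and the initial centers are simply the first k distinct points; production implementations such as Spark ML's `KMeans` add smarter initialization (k-means||) and multi-dimensional support.

```scala
object KMeans1D {
  // Lloyd's algorithm on 1-D points; returns the sorted final centers
  def fit(points: Seq[Double], k: Int, maxIter: Int = 100): Vector[Double] = {
    require(k >= 1 && points.distinct.size >= k, "need at least k distinct points")
    var centers = points.distinct.take(k).toVector
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      // Assignment step: index of the nearest center for every point
      val assignment = points.map(p => centers.indices.minBy(i => math.abs(p - centers(i))))
      // Update step: move each center to the mean of its points (keep it if it has none)
      val updated = centers.indices.map { i =>
        val members = points.zip(assignment).collect { case (p, a) if a == i => p }
        if (members.isEmpty) centers(i) else members.sum / members.size
      }.toVector
      changed = updated != centers
      centers = updated
      iter += 1
    }
    centers.sorted
  }
}
```

On the points 1.0, 1.5, 2.0, 10.0, 10.5, 11.0 with k = 2, the centers converge to 1.5 and 10.5.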



Feature vs. Label

§ A feature is a piece of information that might be useful for prediction
- Example: predicting the churn probability of a customer

§ Labeled data is the desired output data
- Example: CHURN_LABEL, where the value false represents a churn sample

[Figure: sample customer table with one column marked "NOT a feature" and the other columns marked "Feature"]
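The split between features, the label, and columns that are neither can be made explicit in code. In this sketch of a hypothetical churn record, only CHURN_LABEL comes from the slide; the other column names are assumptions, and the identifier column plays the role of the "NOT a feature" column. The label is encoded as a Double because Spark ML expects a numeric label column.

```scala
// Hypothetical churn record; only CHURN_LABEL (churnLabel) comes from the slide
final case class CustomerRecord(
  customerId: String,      // identifier: NOT a feature
  age: Int,                // feature
  monthlyCharges: Double,  // feature
  supportCalls: Int,       // feature
  churnLabel: Boolean      // label (CHURN_LABEL): the desired output
)

object FeatureLabelSplit {
  // Features: inputs that might help predict churn (the identifier is excluded)
  def features(r: CustomerRecord): Vector[Double] =
    Vector(r.age.toDouble, r.monthlyCharges, r.supportCalls.toDouble)

  // Label encoded numerically: true -> 1.0, false -> 0.0
  def label(r: CustomerRecord): Double = if (r.churnLabel) 1.0 else 0.0

  // One (features, label) pair for training
  def toTrainingExample(r: CustomerRecord): (Vector[Double], Double) = (features(r), label(r))
}
```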


A TrainOps (DevOps) story: Training, Deployment, Scoring

[Figure: Dev side (data scientist/data analyst): labeled examples → feature engineering → training → model. The model is deployed to the Ops side (operational system): new data → feature engineering → scoring with the model → predicted data.]
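A key point of this story is that the feature engineering applied to labeled examples at training time must be applied identically to new data at scoring time. The sketch below assumes a hypothetical spend-ratio feature and a one-feature threshold model: the Dev side trains and "deploys" a model into an in-memory map, and the Ops side scores new records through the same `featurize` function. The map is only a stand-in for a deployment service, not the MLz API.

```scala
import scala.collection.mutable

object TrainDeployScore {
  // Hypothetical raw input record
  final case class RawRecord(monthlySpend: Double, averageSpend: Double)

  // Shared feature engineering: used by BOTH training (Dev) and scoring (Ops)
  def featurize(r: RawRecord): Double =
    if (r.averageSpend == 0.0) 0.0 else r.monthlySpend / r.averageSpend

  // A one-feature threshold model
  final case class ThresholdModel(threshold: Double, positiveBelow: Boolean) {
    def predict(feature: Double): Boolean =
      if (positiveBelow) feature < threshold else feature >= threshold
  }

  // Training: threshold halfway between the mean feature values of the two classes
  def train(labeled: Seq[(RawRecord, Boolean)]): ThresholdModel = {
    val (pos, neg) = labeled.partition(_._2)
    require(pos.nonEmpty && neg.nonEmpty, "need examples of both classes")
    def meanFeature(xs: Seq[(RawRecord, Boolean)]): Double = xs.map(e => featurize(e._1)).sum / xs.size
    val (mPos, mNeg) = (meanFeature(pos), meanFeature(neg))
    ThresholdModel((mPos + mNeg) / 2, positiveBelow = mPos < mNeg)
  }

  // Stand-in for a deployment service: deployment name -> model
  private val deployments = mutable.Map.empty[String, ThresholdModel]

  def deploy(name: String, model: ThresholdModel): Unit = deployments.update(name, model)

  // Scoring: look up the deployed model and apply the SAME featurize step to new data
  def score(name: String, r: RawRecord): Option[Boolean] =
    deployments.get(name).map(m => m.predict(featurize(r)))
}
```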


The (incomplete) machine learning process

[Figure: Ingest data → Extract features → Train model → Deploy model → Make predictions, surrounded by additional tasks: human intervention, choose best model, identify model degradation, prediction and scoring, manage deployments]

Takes significant development, deployment, and management effort


Challenge of Machine Learning on z/OS

§ Mainframe data has to be ETLed (extracted, transformed, and loaded) before training

§ Every ETL over a physical network is a potential security exposure

§ Every ETL creates a new copy of the data:
- Data rapidly becomes out of date
- Maintaining multiple copies of the same data is expensive
- Analytics is ineffective if performed on old data

§ Online scoring cannot be implemented within native z/OS online transactions


WHAT IS IBM MACHINE LEARNING FOR Z/OS



The (complete) machine learning process: IBM turns Machine Learning into Learning Machines

[Figure: Ingest data → Extract features → Train model → Deploy model → Make predictions, covered end to end by IBM Machine Learning for z/OS v1.1]

• End-to-end platform for machine learning tasks on the mainframe
• Create better models faster with the Cognitive Assistant for Data Scientists
• Continuous monitoring of model performance to guide model retraining
• RESTful API for online scoring within transactions


IBM Machine Learning for z/OS

Graphical User Interface
• The machine learning workflow
• Online scoring on z – REST API
• Training on z/OS data – DB2, IMS, VSAM, IDAA, etc.

IBM Machine Learning for z/OS V1.1

» Training and scoring on z, using Spark for z/OS as the back-end data processor
» Customers can train the models that best fit their business using the data on the mainframe
» Customers can deploy the ML models and perform online scoring within transactions on the mainframe

» We announced product GA on March 17, 2017
» It has a two-tier architecture:
  » Components on z/OS – MLz scoring service, various Spark ML libraries, and the CADS/HPO library
  » Components on Linux/x86 – run as Docker images, deployed through Kubernetes. They include the authentication (token) service, metadata service, deployment service, ingestion service, transformation service, and pipeline service
  » DB2 for z/OS is used as the database that stores the metadata for models, model deployments, and evaluations



» Prerequisites
  » On Linux x86
    » x86 64-bit system with 8 cores, 32 GB RAM, and 250 GB disk space (three Linux x86 systems are recommended for high availability)
    » Red Hat Enterprise Linux Server 7.2 or later
    » Docker Engine
    » Kubernetes (one master and two workers)
  » On z/OS
    » z/OS 2.1
    » DB2 for z/OS V10 or later
    » LDAP
    » IBM 64-bit SDK for z/OS
    » Spark for z/OS 2.0.2
  » Hardware requirements
    » Supported machines: z13, z13s, zEC12
    » We recommend a minimum of 4 zIIP processors and 1 general CP, with a minimum of 100 GB of memory for the LPAR


IBM Machine Learning for z/OS Architecture

[Figure: components spread across a Linux tier (Kubernetes/Docker) and a z/OS tier. Components shown: IBM Machine Learning UI (Jupyter Notebook UI / Visual Model Builder; Model Management / Model Deployment / Monitoring); application cluster with ingestion, transformation, and pipeline services; metadata service; deployment service (feedback/monitoring); auth service; Jupyter server; Jupyter Kernel Gateway; Apache Toree; CouchDB (NoSQL metadata); LDBM; z/OS Spark cluster with ingestion, transformation, and pipeline libraries; CADS/HPO library; z/OS Spark in local mode; scoring service; z/OS Liberty; zLDAP; RACF (optional); DB2 for z/OS holding service metadata and ML models; DB2 JDBC driver; MDSS driver; IMS; VSAM; SMF. Legend: bundled software, IBM MLz components, prerequisite software.]


Feature Highlights

- CADS (Cognitive Assistant for Data Scientists) library, IBM's value add
- Integrated notebook with flexible APIs
  • Supported language: Scala
- Integrated Brunel visualization library
- Visual Model Builder
- Model management
  • Models
  • Deployments
  • Evaluations
- RESTful API for online scoring within applications
- Model feedback and monitoring
- Administration dashboard


Feature Highlights – CADS

§ What is CADS?
- The Cognitive Assistant for Data Scientists, which helps select the best-fit algorithm for training

§ Why do data scientists need CADS?
- There are many algorithms for classification/regression tasks: SVM, decision trees/forests, naïve Bayes, logistic regression, linear regression, etc.
- Selecting the best algorithm has a substantial cost in user and compute time
  • The user spends time trying various learners
  • The computational cost of training a single SVM can exceed 24 hours
  • Selection is commonly based on the data scientist's bias and experience


Feature Highlights – CADS

§ Minimize the amount of data that must be considered to make an informed selection of the most suitable learner

§ Given a data set, try to select the best approach by directly considering part of the actual data

[Figure: samples of the training data (figure labels: 500, 500) fed to candidate learners such as logistic regression, random forest, decision tree, …]
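The idea of judging learners on part of the data can be sketched as follows. This is not IBM's CADS algorithm, whose internals the slides do not describe; it is a simplified stand-in, assuming each candidate learner is a function from training examples to a predictor. Every candidate is trained on only a small sample, scored on a holdout set, and the candidate with the highest holdout accuracy wins. The two toy learners and the sample size are hypothetical.

```scala
object LearnerSelection {
  type Example = (Double, Int)    // (single feature, class label)
  type Predictor = Double => Int
  type Learner = Seq[Example] => Predictor

  // Toy learner 1: always predict the most frequent training label
  val majority: Learner = train => {
    val top = train.groupBy(_._2).maxBy(_._2.size)._1
    (_: Double) => top
  }

  // Toy learner 2: 1-nearest neighbour on the single feature
  val nearestNeighbour: Learner = train =>
    (x: Double) => train.minBy(e => math.abs(e._1 - x))._2

  def accuracy(p: Predictor, holdout: Seq[Example]): Double =
    holdout.count(e => p(e._1) == e._2).toDouble / holdout.size

  // Train every candidate on only the first sampleSize examples and keep the best on the holdout
  def select(candidates: Map[String, Learner], data: Seq[Example],
             holdout: Seq[Example], sampleSize: Int): (String, Double) = {
    val sample = data.take(sampleSize)
    candidates.map { case (name, learner) => name -> accuracy(learner(sample), holdout) }.maxBy(_._2)
  }
}
```

The design trade-off is the one the slide describes: a smaller sample makes each comparison cheaper, at the risk of picking a learner that only looks best on little data.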


Feature Highlights – Integrated Notebook Interface with flexible APIs

[Screenshots: ingest data from a DB2 for z/OS table; data transformation and training]


Feature Highlights – Brunel Visualization Tool

§ What is Brunel?
- Data scientists use visualization tools to help them understand data distributions. Brunel is one of the tools commonly used by data scientists
- Brunel is an open-source tool sponsored by IBM

§ Import the Brunel library
- If external network connectivity is not an issue, add the Brunel jar from the web:
  %AddJar -magic https://brunelvis.org/jar/spark-kernel-brunel-all-2.2.jar
- If external network connectivity is a concern, use the integrated Brunel visualization jar from the MLz installation path on the z system, adding the Brunel jar from the local copy:
  %AddJar -magic file:///your-local-scoring-path/iml-library/brunel/spark-kernel-brunel-all-2.2.jar


Brunel example


Feature Highlights – Visual Model Builder

§ Opens up the world of data science to non-data scientists

§ Allows data scientists to be more productive


Feature Highlights – Visual Model Builder, the guided Machine Learning Interface

[Screenshots: ingest data and transform; training and evaluation]


Feature Highlights – Model Management

[Screenshots: manage model, create deployment; manage deployment]


Feature Highlights – Evaluation


Feature Highlights – Easily consumable RESTful API for online Scoring within Application Code

RESTful API for online scoring and prediction
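From application code, online scoring amounts to an HTTP POST of the input record to a deployment's scoring endpoint. The slides do not show the MLz endpoint URL, authentication header, or payload schema, so the URL, bearer token, and the `fields`/`values` JSON layout below are hypothetical placeholders. The sketch only builds the request with the standard `java.net.http` client (Java 11+) and does not send it.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.net.http.HttpRequest.BodyPublishers
import java.time.Duration

object ScoringRequest {
  // Hypothetical JSON payload; field names and layout are placeholders, not the MLz schema
  def payload(fields: Seq[String], values: Seq[Any]): String = {
    def json(v: Any): String = v match {
      case s: String => "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
      case other     => other.toString
    }
    val f = fields.map(json).mkString("[", ",", "]")
    val vs = values.map(json).mkString("[", ",", "]")
    s"""{"fields":$f,"values":[$vs]}"""
  }

  // Build (but do not send) a POST request to a hypothetical scoring endpoint
  def build(endpoint: String, token: String, body: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(endpoint))
      .timeout(Duration.ofSeconds(5))                // scoring runs inside a transaction, so bound the wait
      .header("Content-Type", "application/json")
      .header("Authorization", "Bearer " + token)    // hypothetical auth scheme
      .POST(BodyPublishers.ofString(body))
      .build()
}
```

Sending it would use `java.net.http.HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a real endpoint.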


Feature Highlights – Model feedback and monitoring

– Feedback and Continuous Monitoring
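Continuous monitoring boils down to re-evaluating the deployed model on feedback data (records whose true outcome is now known) and flagging a drop in quality. A minimal sketch, assuming accuracy as the metric and a hypothetical rule that flags any feedback batch falling more than `maxDrop` below the accuracy recorded at deployment; the product's own evaluation metrics and thresholds are not shown in the slides.

```scala
object ModelMonitor {
  final case class Evaluation(batch: Int, accuracy: Double)

  // Accuracy of the model's predictions against feedback labels
  def accuracy(predicted: Seq[Boolean], actual: Seq[Boolean]): Double = {
    require(predicted.size == actual.size && predicted.nonEmpty, "need matching, non-empty batches")
    predicted.zip(actual).count { case (p, a) => p == a }.toDouble / predicted.size
  }

  // Degradation: current accuracy is more than maxDrop below the baseline
  def degraded(baseline: Double, current: Double, maxDrop: Double = 0.05): Boolean =
    baseline - current > maxDrop

  // Performance history: evaluate each (predicted, actual) feedback batch and
  // return the batches whose accuracy should trigger retraining
  def batchesNeedingRetraining(baseline: Double,
                               batches: Seq[(Seq[Boolean], Seq[Boolean])],
                               maxDrop: Double = 0.05): Seq[Evaluation] =
    batches.zipWithIndex
      .map { case ((pred, act), i) => Evaluation(i, accuracy(pred, act)) }
      .filter(e => degraded(baseline, e.accuracy, maxDrop))
}
```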


Feature Highlights – Administration Dashboard



Kernel Management


SUMMARY


IBM Machine Learning for z/OS – faster time to value

Differentiating value and capability

§ Create better models in less time
- Rapidly optimize the algorithm that best fits the data and business scenario
  Capability: Cognitive Assistant for Data Scientists (CADS)
- Provide optimal parameters for any given model
  Capability: Hyper Parameter Optimization (HPO)

§ Simplify model creation
- Wizards make it easy for users to create and train a model
  Capability: DSX Pipeline User Interface

§ Improve models over time
- Monitor model performance with feedback data and performance history
- Notification of model performance deterioration for more efficient retraining
  Capability: Continuous Monitoring and Feedback Loop

§ Easily integrate with existing tools and applications
- Ease collaboration across users (e.g., data scientists and app developers)
  Capability: Modern RESTful APIs

§ Simplify model management
- Easily manage thousands of models in an enterprise environment
  Capability: Single UI for Deployment
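Hyper Parameter Optimization (HPO) searches for the parameter values that give the best validation score. The slides do not say which search strategy the HPO library uses; the sketch below shows the simplest one, exhaustive grid search. The parameter names and the scoring function in the usage note are hypothetical stand-ins for "train with these parameters and return validation accuracy".

```scala
object GridSearch {
  // One candidate configuration: parameter name -> value
  type Params = Map[String, Double]

  // Cartesian product of all parameter value lists
  def grid(space: Map[String, Seq[Double]]): Seq[Params] =
    space.toSeq.foldLeft(Seq(Map.empty[String, Double])) { case (acc, (name, values)) =>
      for (partial <- acc; v <- values) yield partial + (name -> v)
    }

  // Evaluate every configuration and return the best one with its score (higher is better)
  def search(space: Map[String, Seq[Double]], score: Params => Double): (Params, Double) =
    grid(space).map(p => p -> score(p)).maxBy(_._2)
}
```

Grid search cost grows multiplicatively with each added parameter (3 values for each of 2 parameters already means 9 training runs), which is why smarter strategies are preferred when each training run is expensive.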

