ibm machine learning on z/os - neodbug · 2019-12-08 · challenge of machine learning on z/os...
© 2017 IBM Corporation
IBM Machine Learning on z/OS
Nigel Slinger – [email protected]
Karra – [email protected]
Bui – [email protected]
Agenda
§ What is Machine Learning?
- Value of Machine Learning
- Machine Learning 101
- Standard process and Challenges today with z/OS
§ What is IBM Machine Learning for z/OS
- Overview
- Architecture flow
- Feature highlights
§ Summary
WHAT IS MACHINE LEARNING?
What is Machine Learning?
Computers that …
• Learn without being explicitly programmed
• Grow and change when exposed to new data
• Deliver personalized and optimized customer interactions
• Identify patterns not readily foreseen by humans
• Build models of behavior from those patterns
Achieving Business Values with Machine Learning
Identify suspicious behavior, predict and prevent threats / fraud – continually reduce business risks and costs
Product recommendation, next purchase prediction, targeted offers – individual tailored shopping experience.
Learn, predict weather patterns and energy production from renewable sources and integrate into grid more effectively
Detect and understand life-threatening medical conditions and design ever more effective treatment programs
Churn analysis helps identify the cause of the churn and implement effective strategies for retention.
Machine Learning…
• Constantly learns and adapts
• Avoids making the same mistakes
• Faster, deeper, improved insights
Resulting in…
• Smarter business outcomes
• Lower business risks and costs
• New business opportunities
Types of Machine Learning
§ Supervised
- A model is trained from a set of known input data to a set of target output
- Input
• Numbers (stock prices)
• Text (car failure description)
• Images
• Audio files
• And more
- Output
• Class: Yes/no, Buy/not-buy, Car failure types
• Number: House price, Annual energy in kWh
y = f(x), where x is the input, f is the function, and y is the output
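As an illustration of y = f(x), the following is a minimal sketch of supervised learning with a "Number" output: it fits a straight line to known (input, output) pairs and then applies the learned function to a new input. The data values and the helper name fit_line are made up for this example and are not part of IBM Machine Learning for z/OS.

```python
# Minimal sketch: learn f in y = f(x) from known (input, output) pairs.
# All data values below are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

sizes = [1.0, 2.0, 3.0, 4.0]    # input x (hypothetical house size units)
prices = [3.0, 5.0, 7.0, 9.0]   # known target output y (hypothetical price units)

slope, intercept = fit_line(sizes, prices)

def f(x):
    """The learned function: predicts a number for a new input."""
    return slope * x + intercept

print(f(5.0))
```

The training step only sees inputs paired with known outputs; the prediction for 5.0 is for an input that was not in the training set.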
Types of Machine Learning
§ Unsupervised
- No structure is given
- There is no expected target output
- Unsupervised learning can be used to find hidden patterns in the data itself
Types of Machine Learning
§ Imagine a basket of fruits
- Supervised learning – you know about fruits
• We train the machine to recognize various fruits
• One input maps to a target output: Apple, Cherry, Lemon, Banana
- Unsupervised learning – you have no knowledge of any fruit
• We group the fruit by COLOR
  • RED – apple, cherry
  • YELLOW – lemon, banana
• Then we group the fruit by COLOR and SIZE
  • RED and BIG – apple
  • RED and SMALL – cherry
  • YELLOW and BIG – banana
  • YELLOW and SMALL – lemon
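The fruit basket grouping can be sketched in a few lines. This is only an illustration of grouping by observed attributes, not a real clustering algorithm; the field names and the helper group_by are assumptions for the example, and the fruit names are kept only so the groups are readable (an unsupervised learner would not know them).

```python
from collections import defaultdict

# Hypothetical basket: each fruit is described by observable attributes.
fruits = [
    {"name": "apple", "color": "red", "size": "big"},
    {"name": "cherry", "color": "red", "size": "small"},
    {"name": "banana", "color": "yellow", "size": "big"},
    {"name": "lemon", "color": "yellow", "size": "small"},
]

def group_by(items, keys):
    """Group items by the values of the given attributes (no labels used)."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[k] for k in keys)].append(item["name"])
    return dict(groups)

by_color = group_by(fruits, ["color"])
by_color_and_size = group_by(fruits, ["color", "size"])

print(by_color)
print(by_color_and_size)
```

Grouping by COLOR alone leaves two fruits in each group; adding SIZE separates every fruit into its own group, as on the slide.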
Types of Machine Learning
§ Classification
- Data points are labeled and are used to predict a category
- Two-class vs multi-class
- Example:
• Fraud detection (fraud vs non-fraud)
• Spam email detection (spam vs non-spam)
§ Regression
- A value is being predicted
- Example:
• Stock price prediction
• House price prediction
§ Clustering
- Data points are not labeled
- Goal is to group data into clusters to better organize the data
Feature vs. Label
§ A feature is a piece of information that might be useful for prediction
- Example: predict the churn probability of a customer
§ Labeled data is the desired output data
- Example: CHURN_LABEL false representing a churn sample
[Figure: sample data with columns marked "NOT a feature", "Feature", "Feature", "Feature"]
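A minimal sketch of separating features from the label in one training record follows. Only CHURN_LABEL comes from the slide; the other column names, the values, and the choice of CUSTOMER_ID as the "not a feature" column are assumptions made for illustration.

```python
# Hypothetical customer record; only CHURN_LABEL is named in the slide.
record = {
    "CUSTOMER_ID": 1001,      # assumed identifier column: not a feature
    "AGE": 42,                # assumed feature
    "MONTHLY_SPEND": 79.5,    # assumed feature
    "SUPPORT_CALLS": 3,       # assumed feature
    "CHURN_LABEL": False,     # label: the desired output
}

LABEL = "CHURN_LABEL"
NON_FEATURES = {"CUSTOMER_ID"}

def split_features_label(row):
    """Return (features, label): drop non-feature columns, pull out the label."""
    features = {k: v for k, v in row.items()
                if k != LABEL and k not in NON_FEATURES}
    return features, row[LABEL]

features, label = split_features_label(record)
print(features, label)
```

An identifier is a typical non-feature because it is unique per row and carries no pattern a model could generalize from.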
Training, Deployment, Scoring
A TrainOps (DevOps) story
[Figure: Dev (Data Scientist / Data analyst), training a model: Labeled examples → Feature Engineering → Training → Model. Deploy. Ops (Operational system), scoring: New data → Feature Engineering → Scoring with the Model → Predicted data.]
The (incomplete) machine learning process
Takes significant development, deployment, and management efforts
[Figure: Ingest Data → Extract Features → Train Model → Deploy Model → Make Predictions. Additional labels: Human Intervention, Choose Best Model, Identify Model Degradation, Prediction And Scoring, Manage Deployments.]
Challenge of Machine Learning on z/OS
§ Mainframe data has to be ETLed for training
§ Every ETL over a physical network is a potential security exposure
§ Every ETL is a new copy of the data:
- Data rapidly becomes out of date
- Expensive to maintain multiple copies of the same set of data
- Analytics is ineffective if performed on old data
§ Not possible to implement online scoring within native z/OS online transactions
WHAT IS IBM MACHINE LEARNING FOR Z/OS
The (complete) machine learning process
IBM Turns Machine Learning into Learning Machines
IBM Machine Learning for z/OS v1.1
[Figure: Ingest Data → Extract Features → Train Model → Deploy Model → Make Predictions]
• End-to-End platform for machine learning tasks on the mainframe
• Create better models faster with Cognitive Assistant for Data Scientists
• Continuous monitoring of the model performance to guide model retraining
• RESTful API for online scoring within transactions
IBM Machine Learning for z/OS
Graphic User Interface
IBM Machine Learning for z/OS
Graphic User Interface
IBM Machine Learning for z/OS V1.1
The machine learning workflow
Online scoring on z – REST API
Training on z with z/OS data – DB2, IMS, VSAM, IDAA etc.
» Training and scoring on z using Spark for z/OS as the backend data processor
» Customers can train models that best fit their business with the data on the mainframe
» Customers can deploy the ML models and perform online scoring within transactions on the mainframe
IBM Machine Learning for z/OS V1.1
» Product GA was announced on March 17, 2017
» It has a two-tier architecture
  » Components on z/OS – MLz scoring service, various SPARK ML libraries and CADS/HPO library
  » Components on Linux/x86 – running from Docker images
    » The Docker images are deployed through Kubernetes
    » They contain: authentication token, metadata service, deployment service, ingestion service, transformation service, pipeline service
» Uses DB2 for z/OS as the database to store the metadata information for the models, model deployment information, and evaluation information.
IBM Machine Learning for z/OS V1.1
» Pre-req
  » On Linux x86
    » x86 64-bit system with 8 cores, 32 GB RAM and 250 GB disk space (3 Linux x86 systems recommended for high availability coverage)
    » Red Hat Enterprise Linux Server 7.2 or later
    » Docker Engine
    » Kubernetes (one master and 2 workers)
  » On z/OS
    » z/OS 2.1
    » DB2 for z/OS V10 or later
    » LDAP
    » IBM 64-bit SDK for z/OS
    » SPARK for z/OS 2.0.2
» Hardware requirements
  » Supported machines: z13, z13s, EC12
  » We recommend a minimum of 4 zIIP processors and 1 general CP, with a minimum of 100 GB memory for the LPAR
IBM Machine Learning for z/OS Architecture
[Figure: architecture diagram spanning Linux (Kubernetes/Docker) and z/OS. Legend: Bundled software, IBM MLz components, Pre-requisite software. Component labels (grouping approximate): IBM Machine Learning UI (Jupyter Notebook UI / Visual Model Builder, Model Management / Model Deployment / Monitoring); Application Cluster (Ingestion service, Transformation service, Pipeline service); Auth Service; Metadata Service; Deployment Service (Feedback/Monitoring); Jupyter server; Jupyter Kernel Gateway; Apache Toree; CouchDB (NoSQL Metadata); z/OS Spark Cluster (Ingestion lib, Transformation lib, Pipeline lib, CADS/HPO lib, MDSS driver, DB2 JDBC driver); z/OS Spark In Local Mode; Scoring service; z/OS Liberty; zLDAP; LDBM; RACF (optional); DB2z (Service Metadata, ML models); DB2, IMS, VSAM, SMF.]
Feature Highlights
- CADS (Cognitive Assistant for Data Scientist) library, IBM’s value add
- Integrated notebook with flexible APIs
• Supported Language: Scala
- Integrated Brunel Visualization library
- Visual Model Builder
- Model Management
• Model
• Deployment
• Evaluation
- RESTful API for online Scoring within Application
- Model feedback and monitoring
- Administration Dashboard
Feature Highlights – CADS
§ What is CADS?
- Cognitive Assistant for Data Scientist, which helps select the best-fit algorithm for training
§ Why do Data Scientists need CADS?
- Many algorithms for classification/regression tasks: SVM, Decision Trees/Forests, Naïve Bayes, Logistic Regression, Linear Regression, etc.
- Substantial cost in user and compute time to select the best algorithm
• User spends time on trying various learners
• Computational cost for training a single SVM can exceed 24h
• Selection commonly based on data scientist bias and experience
Feature Highlights – CADS
§ Minimize the amount of data to be considered to make an informed selection of the most suitable learner
§ Given a data set, try to select the best approach by directly considering part of the actual data
[Figure: Training Data evaluated with Logistic Regression, Random Forest, Decision Tree, …; portions labeled 500, 500]
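The idea of choosing a learner from only part of the data can be sketched as follows. This is not the CADS algorithm, whose internals the slides do not describe; it is a toy stand-in that trains each candidate on a small portion of the data, scores it on a held-out slice of that portion, and keeps the best. The two candidate learners, the synthetic data, and the function names are all assumptions for illustration.

```python
from collections import Counter

def train_majority(rows):
    """Baseline learner: always predict the most common training label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label

def train_stump(rows):
    """One-feature threshold rule: predict 1 when x > t, with t picked on the training rows."""
    best_t, best_err = None, None
    for t, _ in rows:
        err = sum((x > t) != (y == 1) for x, y in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x > best_t)

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def pick_learner(learners, rows, sample_size, train_percent=70):
    """Train and validate every candidate on only the first sample_size rows."""
    sample = rows[:sample_size]
    cut = len(sample) * train_percent // 100
    train, valid = sample[:cut], sample[cut:]
    scores = {name: accuracy(fit(train), valid) for name, fit in learners.items()}
    return max(scores, key=scores.get), scores

# Synthetic, deterministically shuffled data: label is 1 when the value exceeds 50.
rows = [((i * 37) % 100, int((i * 37) % 100 > 50)) for i in range(100)]

learners = {"majority": train_majority, "stump": train_stump}
best, scores = pick_learner(learners, rows, sample_size=40)
print(best, scores)
```

Only 40 of the 100 rows are touched during selection; the chosen learner could then be retrained on the full data set.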
Feature Highlights – Integrated Notebook Interface with flexible APIs
[Screenshots: Ingest data from DB2z table; Data transformation and training]
Feature Highlights – Brunel Visualization Tool
§ What is Brunel?
- Data Scientists use visualization tools to help them understand data distribution. Brunel is one of the tools commonly used by Data Scientists
- This is an open source tool sponsored by IBM
§ Import Brunel library
- If external network connectivity is not an issue:
• %AddJar -magic https://brunelvis.org/jar/spark-kernel-brunel-all-2.2.jar
- If external network connectivity is a concern, use the integrated Brunel visualization jar from the installed path on MLz on the z system (this adds the Brunel jar from the local copy):
• %AddJar -magic file:///your-local-scoring-path/iml-library/brunel/spark-kernel-brunel-all-2.2.jar
Brunel example
Feature Highlights – Visual Model Builder
§ Unlocks the world of Data Science to non-Data Scientists
§Allows Data Scientists to be more productive
Feature Highlights – Visual Model Builder, the guided Machine Learning Interface
[Screenshots: Ingest data and transform; Training and evaluation]
Feature Highlights – Model Management
[Screenshots: Manage model, create deployment; Manage deployment]
Feature Highlights – Evaluation
Feature Highlights – Easily consumable RESTful API for online Scoring within Application Code
RESTful API for online scoring and prediction
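To give a feel for what calling a scoring REST API from application code involves, here is a minimal sketch that only builds the HTTP request and does not send it. The host name, URL path, payload shape, bearer-token header, and field names are all hypothetical placeholders; the slides do not show the actual endpoint or request format of IBM Machine Learning for z/OS.

```python
import json
import urllib.request

def build_scoring_request(base_url, deployment_id, token, record):
    """Build (but do not send) a POST request carrying one record to score.

    The URL layout and JSON body are illustrative assumptions, not the product's API.
    """
    url = f"{base_url}/deployments/{deployment_id}/score"    # hypothetical path
    body = json.dumps({"record": record}).encode("utf-8")    # hypothetical payload shape
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",                  # hypothetical auth scheme
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

request = build_scoring_request(
    base_url="https://mlz.example.com",   # placeholder host
    deployment_id="churn-model",          # placeholder deployment name
    token="PLACEHOLDER_TOKEN",
    record={"AGE": 42, "MONTHLY_SPEND": 79.5, "SUPPORT_CALLS": 3},
)
print(request.get_method(), request.full_url)
```

In a real application the request would be sent with urllib.request.urlopen (or an HTTP client of choice) and the JSON response parsed for the prediction.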
Feature Highlights – Model feedback and monitoring
– Feedback and Continuous Monitoring
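Feedback-based monitoring can be sketched as comparing recent model accuracy, measured on feedback data, against a baseline and flagging the model for retraining when it drops too far. The rule, the threshold values, and the function names below are assumptions for illustration; the slides do not specify how the product evaluates deterioration.

```python
def accuracy(predictions, feedback):
    """Share of predictions that match the actual outcomes reported as feedback."""
    return sum(p == f for p, f in zip(predictions, feedback)) / len(feedback)

def needs_retraining(accuracy_history, baseline, tolerance=0.05, window=3):
    """Flag deterioration when the mean of the last `window` accuracies
    falls more than `tolerance` below the baseline (hypothetical rule)."""
    if len(accuracy_history) < window:
        return False
    recent = accuracy_history[-window:]
    return sum(recent) / window < baseline - tolerance

# Hypothetical performance history, one accuracy value per evaluation period.
stable_history = [0.92, 0.91, 0.90]
degrading_history = [0.92, 0.80, 0.78, 0.75]

print(needs_retraining(stable_history, baseline=0.92))
print(needs_retraining(degrading_history, baseline=0.92))
```

Averaging over a window rather than reacting to a single period keeps one noisy evaluation from triggering a retrain.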
Feature Highlights – Administration Dashboard
Feature Highlights – Administration Dashboard
Feature Highlights – Administration Dashboard
Kernel Management
Feature Highlights – Administration Dashboard
SUMMARY
IBM Machine Learning for z/OS – faster time to value
Differentiating Value: Create better models in less time
§ Rapidly optimize the algorithm that best fits the data and business scenario
Capability: Cognitive Assistant for Data Scientists (CADS)
§ Provide optimal parameters for any given model
Capability: Hyper Parameter Optimization (HPO)
Differentiating Value: Simplify model creation
§ Wizards make it easy for users to create and train a model
Capability: DSX Pipeline User Interface
Differentiating Value: Improve models over time
§ Monitor model performance with feedback data and performance history
§ Notification of model performance deterioration for more efficient retraining
Capability: Continuous Monitoring and Feedback Loop
Differentiating Value: Easily integrate with existing tools and applications
§ Ease collaboration across users (e.g., Data Scientists and App Developers)
Capability: Modern RESTful APIs
Differentiating Value: Simplify model management
§ Easily manage thousands of models in an enterprise environment
Capability: Single UI for Deployment