Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM at MLconf SF...
TRANSCRIPT
© 2016 IBM Corporation. IBM Confidential
From ML Algorithms To Learning Machines (+ Optimization)
Jean-François Puget, 11/11/2016, @JFPuget
The Machine Learning Workflow
• 25 years ago, academic topic
Data → ML algorithm → ? → publication
The Machine Learning Workflow
• Perception now
Data → ??? → ML Algorithm → ??? → $$$
The Machine Learning Workflow
• Simple!
Data → Data Scientist → ML Algorithm → Model → $$$
R, Sklearn, Spark ML, Deep Learning, GBM (xgboost), vw, H2O, …
The Machine Learning Workflow
• Focus on missing pieces
Data → ??? → ML Algorithm → ??? → $$$
The Machine Learning Workflow
• Not that simple
Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
• Creating examples
• Automating DS work
• Choosing best model
• Scalable deployment
• Models lose accuracy
The gap between data scientists and operations is incredible
[Diagram labels: Training: Labeled examples, Data prep, Algorithm, Model. Deploy: DevOps. Scoring: New data, Data prep, Scoring, Model, Predicted data.]
For each ML toolkit we need model serialization + a scalable scoring engine.
We are building that for Spark ML.
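The serialization + scoring split can be sketched in a few lines. This is only an illustration under stated assumptions, not the Spark ML work mentioned on the slide: the one-feature linear model, the JSON payload layout, and the function names `train_linear_model`, `serialize_model`, and `load_scorer` are all hypothetical.

```python
import json


def train_linear_model(xs, ys):
    """Training side: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"kind": "linear", "slope": slope, "intercept": mean_y - slope * mean_x}


def serialize_model(model):
    """Hand-off artifact: a toolkit-independent description of the model."""
    return json.dumps(model, sort_keys=True)


def load_scorer(payload):
    """Scoring side: rebuild a prediction function from the payload alone."""
    model = json.loads(payload)
    if model.get("kind") != "linear":
        raise ValueError("unsupported model kind")
    slope, intercept = model["slope"], model["intercept"]
    return lambda x: slope * x + intercept


# Data scientist side: train on labeled examples and export the model.
payload = serialize_model(
    train_linear_model([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
)

# Operations side: score new data without access to the training code path.
score = load_scorer(payload)
predictions = [score(x) for x in [4.0, 5.0]]
```

The point of the split is that the scoring engine depends only on the serialized payload, so it can be scaled and deployed independently of the training environment.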
The Machine Learning Workflow
• Not that simple
Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
• Creating examples
• Automating DS work
• Choosing best model
• Models lose accuracy
Cognitive Assistant for Data Scientists
• Objective:
  • Bring automation into key areas of large-scale data analysis tasks
  • Overcome “analytic decision overload” for data scientists
• Current CADS system
  • Automated selection, composition, configuration, training, and deployment of modeling pipelines for supervised data mining tasks that leverages:
    • AI/learning- and planning-based principled exploration of analytic choices
    • Cross-platform analytic deployments (e.g., R, Spark, Python, SPSS) on Big Data platforms and Cloud
• What is next…
  • Automation of more parts of the data scientist's workflow (e.g. automated feature engineering)
  • Extend for other problems, data types, scale and user requirements (e.g., unstructured data, Deep Learning)
  • Self-learning and adaptation
  • Build first-ever conversational data science system with CADS + Watson QA
IBM Research
SystemML
[Architecture diagram, top to bottom: Language: DML (Declarative Machine Learning Language) scripts; Compiler; Runtime: In-Memory Single Node (scale-up), Hadoop or Spark Cluster (scale-out). Date labels: since 2010, since 2012, since 2015.]
Example DML script: Linear Regression Conjugate Gradient
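For readers unfamiliar with the algorithm named on the slide, here is a minimal sketch of linear regression solved by conjugate gradient on the normal equations. It is written in plain Python, not DML, and is not the SystemML script; the helper names `matvec`, `dot`, and `linreg_cg` and the toy data are assumptions made for illustration.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linreg_cg(X, y, max_iter=100, tol=1e-12):
    """Solve the normal equations (X^T X) w = X^T y with conjugate gradient."""
    n_features = len(X[0])
    Xt = [list(col) for col in zip(*X)]

    def normal_matvec(v):
        # Computes X^T (X v) without ever forming X^T X explicitly.
        return matvec(Xt, matvec(X, v))

    w = [0.0] * n_features
    r = matvec(Xt, y)          # residual of the normal equations at w = 0
    p = list(r)                # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:
            break
        Ap = normal_matvec(p)
        alpha = rs_old / dot(p, Ap)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


# Toy data generated from y = 1 + 2 * x; the first column is the intercept term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
weights = linreg_cg(X, y)
```

Because each iteration only needs matrix-vector products with X and its transpose, the same structure maps naturally onto either a single node or a cluster runtime.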
The Machine Learning Workflow
• Pain points
Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
• Creating examples
• Models lose accuracy
The Machine Learning Workflow
• Feedback loop
Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
Prediction accuracy monitoring: collect predictions vs actuals
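Collecting predictions against actuals can be as simple as a rolling window. The sketch below is one hypothetical way to do it: the class name `AccuracyMonitor`, the window size, and the retraining threshold are assumptions, not part of any IBM product.

```python
from collections import deque


class AccuracyMonitor:
    """Track rolling accuracy of predictions once the actual outcomes arrive."""

    def __init__(self, window=100, threshold=0.8):
        # Only the most recent `window` outcomes are kept.
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        self.window.append(predicted == actual)

    def accuracy(self):
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def needs_retraining(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

# The rolling accuracy has dropped below the threshold, which would
# feed back into the workflow as a signal to retrain the model.
retrain = monitor.needs_retraining()
```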
What about Watson and cognitive computing?
Cognitive = Natural language processing + Machine Learning + …
Machine Learning and Mathematical Optimization
• Most ML algorithms solve an optimization problem: find parameters for a given model family that minimize the sum of
  • A loss function (prediction error)
  • A regularization term (favors simpler models)
• Optimization algorithms: local methods
  • Stochastic gradient descent, conjugate gradient, L-BFGS, …
  • Scale to large numbers of examples
  • Embarrassingly parallel
  • Can get stuck in local minima
  • Have a hard time coping with additional constraints on the optimization problem
• Mathematical optimization (e.g. CPLEX)
  • Can find the global optimum
  • Can deal with constraints, e.g. L0 norm
  • Limited in scale
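The local-minimum issue is easy to reproduce. The sketch below uses plain (non-stochastic) gradient descent on a one-dimensional non-convex function chosen purely for illustration; the function, learning rate, and starting points are assumptions. Depending on where it starts, the same local method ends in a different minimum.

```python
def f(x):
    """Toy non-convex objective with one local and one global minimum."""
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0, learning_rate=0.01, n_steps=2000):
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(n_steps):
        x -= learning_rate * grad_f(x)
    return x


x_from_right = gradient_descent(2.0)    # settles in the local minimum (x > 0)
x_from_left = gradient_descent(-2.0)    # settles in the global minimum (x < 0)

# Same algorithm, same objective, different answers:
gap = f(x_from_right) - f(x_from_left)
```

A global method would return the left-hand minimum regardless of the starting point, at the price of more computation per problem.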
Classical ML Algorithms implemented with mathematical optimization models
• Linear models: LASSO, Ridge Classifier, Elastic Net, Hinge loss, Hinge-squared loss
• Support Vector Machines: Primal, Dual linear, Dual RBF, Hinge models
• Decision Forests: Decision trees vote (preliminary work)
• Multi-label problems: Using 1-vs-rest method
• Alternating Least Squares: Application to Collaborative Filtering (recommendations)
LASSO
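For reference, the LASSO can be written as the following optimization problem. The notation is an assumption, since the slide's own formula did not survive in the transcript: X is the n × p matrix of examples, y the vector of targets, w the coefficient vector, and α > 0 the regularization weight. The 1/(2n) scaling matches the objective documented for scikit-learn's `Lasso`.

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{2n} \lVert Xw - y \rVert_2^2 \;+\; \alpha \lVert w \rVert_1
```

The first term is the loss (prediction error) and the second is the regularization term, which pushes coefficients to exactly zero.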
Compressive Sensing
Image reconstruction with and without bounds on the pixel value
[Figure panels: Original; Lasso (sklearn); Constrained Lasso (CPLEX). Plot: distribution of pixel values.]
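To make the effect of bounds concrete, here is a small sketch of a Lasso with box constraints on the coefficients. It uses projected coordinate descent in plain Python, which is a swapped-in technique and not the CPLEX model from the talk; the function names, the tiny dataset, and the [0, 1] "pixel" range are assumptions.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def bounded_lasso(X, y, alpha, lower, upper, n_sweeps=200):
    """Minimize 0.5*||Xw - y||^2 + alpha*||w||_1 subject to lower <= w_j <= upper."""
    n_samples, n_features = len(X), len(X[0])
    w = [min(max(0.0, lower), upper)] * n_features
    residual = [
        y[i] - sum(X[i][j] * w[j] for j in range(n_features))
        for i in range(n_samples)
    ]
    col_sq = [sum(X[i][j] ** 2 for i in range(n_samples)) for j in range(n_features)]
    for _ in range(n_sweeps):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w_j itself.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n_samples)
            )
            # 1-D minimizer of the penalized problem, then projection onto the box.
            unconstrained = soft_threshold(rho, alpha) / col_sq[j]
            new_wj = min(max(unconstrained, lower), upper)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n_samples):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.2, -0.2, 1.0]   # noisy measurements of "pixels" that should lie in [0, 1]

unbounded = bounded_lasso(X, y, 0.01, float("-inf"), float("inf"))
bounded = bounded_lasso(X, y, 0.01, 0.0, 1.0)
```

Without bounds the fit follows the noise outside the valid pixel range; with bounds every coefficient stays inside [0, 1].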
Matrix factorization
• Used in recommendation systems
• User profiles × movie profiles = observed interactions
Alternating Least Squares with additional constraints (Hugues Juille)
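As a rough illustration of the idea, the sketch below runs alternating least squares on a rank-1 factorization with a non-negativity constraint on the factors. The transcript does not say which constraints were used in the talk, so the constraint, the rank, the function name `als_rank1_nonneg`, and the toy ratings are all assumptions.

```python
def als_rank1_nonneg(ratings, n_users, n_items, n_iters=200):
    """Fit rating(i, j) ~ u[i] * v[j] on observed entries, with u, v >= 0.

    `ratings` maps (user, item) pairs to observed values.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(n_iters):
        # Fix item factors: each user factor is a 1-D least-squares problem,
        # and clipping at zero gives the exact constrained minimizer.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j), r in ratings.items() if a == i)
            den = sum(v[j] ** 2 for (a, j), r in ratings.items() if a == i)
            if den > 0:
                u[i] = max(0.0, num / den)
        # Fix user factors and update item factors the same way.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b), r in ratings.items() if b == j)
            den = sum(u[i] ** 2 for (i, b), r in ratings.items() if b == j)
            if den > 0:
                v[j] = max(0.0, num / den)
    return u, v


# Observed interactions; (user 0, item 2) and (user 1, item 1) are missing.
ratings = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 2): 6.0}
u, v = als_rank1_nonneg(ratings, n_users=2, n_items=3)

# Predicted value for an unobserved (user, item) pair.
predicted_missing = u[0] * v[2]
```

With more than one latent factor, each update becomes a small constrained least-squares problem per user or per item, which is where an optimization solver becomes useful.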
References
IBM Watson Machine Learning: http://datascience.ibm.com/registration/stepone
SystemML: https://systemml.apache.org/
CADS: ICML 2014
CPLEX-learn Contributors: Jean-Francois Puget, Paul Shaw, Vincent Beraudier, Pierre Bonami, Daniel Junglas, Hugues Juille, Renaud Dumeur, Viu Long Kong, Philippe Couronne