Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM at MLconf SF...

© 2016 IBM Corporation. IBM Confidential

From ML Algorithms To Learning Machines (+ Optimization)

Jean-François Puget, 11/11/2016, @JFPuget


The Machine Learning Workflow
• 25 years ago, academic topic

Data → ML algorithm → ? → publication


The Machine Learning Workflow
• Perception now

Data → ??? → ML Algorithm → ??? → $$$


The Machine Learning Workflow
• Simple!

Data Data Scientist

ML Algorithm Model $$$

R, Sklearn, Spark ML, Deep Learning, GBM (xgboost), vw, H2O, …


The Machine Learning Workflow
• Focus on missing pieces

Data → ??? → ML Algorithm → ??? → $$$


The Machine Learning Workflow
• Not that simple

Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$

• Choosing best model
• Models lose accuracy
• Scalable deployment
• Creating examples
• Automating DS work


The gap between data scientists and operations is incredible


[Diagram: labeled examples go through data prep and a training algorithm to produce a model; DevOps deploy the model; new data goes through data prep and scoring with the deployed model to produce predicted data.]

For each ML toolkit we need model serialization + scalable scoring engine.
We are building that for Spark ML.
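The serialization and scoring split can be sketched in a few lines. This is a minimal illustration only, not the Spark ML work mentioned on the slide: the toy threshold model, the JSON layout, and the function names `train`, `serialize`, and `load_scorer` are all assumptions made for the example.

```python
import json


def train(xs, ys):
    """Toy training step (hypothetical): threshold midway between class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return {"model": "threshold", "threshold": threshold}


def serialize(params):
    """Data-science side: turn the fitted parameters into a portable string."""
    return json.dumps(params)


def load_scorer(blob):
    """Operations side: rebuild a scoring function from the serialized model."""
    params = json.loads(blob)
    threshold = params["threshold"]

    def score(xs):
        return [1 if x > threshold else 0 for x in xs]

    return score


blob = serialize(train([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]))
score = load_scorer(blob)
predictions = score([0.5, 7.5])
```

The scoring side needs only the serialized string, not the training code or data, which is what lets the two sides be deployed and scaled separately.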


The Machine Learning Workflow
• Not that simple

• Choosing best model
• Models lose accuracy
• Creating examples
• Automating DS work


Cognitive Assistant for Data Scientists

• Objective:
  • Bring automation into key areas of large-scale data analysis tasks
  • Overcome “analytic decision overload” for Data Scientists

• Current CADS System
  • Automated selection, composition, configuration, training, and deployment of modeling pipelines for supervised data mining tasks that leverages:
    • AI/Learning and Planning based principled exploration of analytic choices
    • Cross-platform analytic deployments (e.g., R, Spark, Python, SPSS) on Big Data platforms and Cloud

• What is next
  • Automation of more parts of the Data Scientist's workflow (e.g., automated feature engineering)
  • Extend for other problems, data types, scale and user requirements (e.g., unstructured data, Deep Learning)
  • Self-Learning and Adaptation
  • Build first-ever conversational data science system with CADS + Watson QA

IBM Research


SystemML

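[Diagram: SystemML architecture. DML scripts (e.g. Linear Regression Conjugate Gradient), written in DML (Declarative Machine Learning Language), pass through the Language, Compiler, and Runtime layers and run either on an in-memory single node (scale-up) or on a Hadoop or Spark cluster (scale-out). Timeline labels: since 2010, since 2012, since 2015.]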


The Machine Learning Workflow
• Pain points

• Models lose accuracy
• Creating examples


The Machine Learning Workflow
• Feedback loop

Prediction accuracy monitoring: collect predictions vs actuals
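As a rough sketch of what collecting predictions against actuals could look like, the class below keeps a sliding window of outcomes and flags the model when accuracy drops. The class name `AccuracyMonitor`, the window size, and the alert threshold are hypothetical choices for illustration, not part of any IBM product described in the talk.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` (prediction, actual) pairs."""

    def __init__(self, window=100, alert_below=0.8):
        self.hits = deque(maxlen=window)  # True where prediction matched actual
        self.alert_below = alert_below

    def record(self, predicted, actual):
        self.hits.append(predicted == actual)

    def accuracy(self):
        return sum(self.hits) / len(self.hits) if self.hits else None

    def needs_retraining(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below


monitor = AccuracyMonitor(window=4, alert_below=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0)]:
    monitor.record(predicted, actual)
```

Because the window only holds the latest outcomes, a model that was accurate at deployment but has since drifted shows up as a falling windowed accuracy, which can trigger retraining.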


What about Watson and cognitive computing?

Cognitive = Natural language processing + Machine Learning + …


Machine Learning and Mathematical Optimization

Most ML algorithms solve an optimization problem: find parameters for a given model family that minimize the sum of
• a loss function (prediction error)
• a model-complexity penalty (regularization, which favors simpler models)
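One common way to write this problem is shown below. The notation is an assumption made here, not taken from the slides: $w$ denotes the model parameters, $f_w$ the model, $\ell$ the per-example loss, $R$ the regularizer, and $\lambda \ge 0$ the weight that trades prediction error against simplicity.

```latex
\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_w(x_i),\, y_i\bigr) \;+\; \lambda\, R(w)
```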

Optimization algorithms: local methods
• Stochastic gradient descent, conjugate gradient, L-BFGS, …
• Scale to large number of examples
• Embarrassingly parallel
• Can be stuck in local minima
• Hard time coping with additional constraints on the optimization problem

Mathematical optimization (e.g. CPLEX)
• Can find global optimum
• Can deal with constraints, e.g. L0 norm
• Limited in scale
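To make the local-method side concrete, here is a small sketch of stochastic gradient descent on a one-feature linear model with an L2 penalty. The function name `sgd_ridge`, the learning rate, and the toy data are assumptions chosen for the example; it only illustrates why each step touches a single example, which is what lets the method scale with the number of examples.

```python
import random


def sgd_ridge(xs, ys, lam=0.01, lr=0.02, epochs=500, seed=0):
    """Fit y ~ w*x + b by SGD on squared error plus an L2 penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            err = w * xs[i] + b - ys[i]
            # Gradient of 0.5*err**2 + 0.5*lam*w**2 for this single example.
            w -= lr * (err * xs[i] + lam * w)
            b -= lr * err
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
w, b = sgd_ridge([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
```

On this convex toy problem the iterates settle close to slope 2 and intercept 1, slightly shrunk by the penalty. Adding a hard constraint on `w` would require changing the update rule itself, which is the difficulty the slide attributes to local methods.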


Classical ML Algorithms implemented with mathematical optimization models

• Linear models: LASSO, Ridge Classifier, Elastic Net, Hinge loss, Hinge-squared loss
• Support Vector Machines: Primal, Dual linear, Dual RBF, Hinge models
• Decision Forests: Decision trees vote (preliminary work)
• Multi-label problems: Using 1-vs-rest method
• Alternating Least Squares: Application to Collaborative Filtering (recommendations)

LASSO
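For reference, the LASSO objective in the scaling that scikit-learn documents for its `Lasso` estimator is given below. The symbols are assumptions made here: $X$ is the $n \times d$ data matrix, $y$ the targets, $w$ the coefficients, and $\lambda$ the regularization weight (called `alpha` in scikit-learn). Whether the slide used exactly this scaling is not recoverable from the extracted text.

```latex
\min_{w} \; \frac{1}{2n} \,\lVert Xw - y \rVert_2^2 \;+\; \lambda \,\lVert w \rVert_1
```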


Compressive Sensing

Image reconstruction with and without bounds on the pixel value

[Figure panels: Original; Lasso (sklearn); ConstrainedLasso (CPLEX). Plot: distribution of pixel values.]
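The effect of bounding the coefficients can be illustrated without a solver. The sketch below is not the CPLEX model from the talk: it swaps in proximal gradient descent (ISTA) with a clipping step, which handles simple box bounds because each coordinate's update is a one-dimensional convex problem. The function names, the step size, and the tiny data set are assumptions for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def bounded_lasso(X, y, lam, lower, upper, step=1.0, iters=200):
    """Minimize (1/(2n))*||Xw - y||^2 + lam*||w||_1 subject to lower <= w <= upper.

    `step` must not exceed n / (largest eigenvalue of X^T X).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
        # Gradient step, then soft-threshold (L1), then clip to the box bounds.
        w = [
            min(upper, max(lower, soft_threshold(w[j] - step * grad[j], step * lam)))
            for j in range(d)
        ]
    return w


X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, -0.05]
inf = float("inf")
unbounded = bounded_lasso(X, y, lam=0.1, lower=-inf, upper=inf)
bounded = bounded_lasso(X, y, lam=0.1, lower=0.0, upper=1.0)
```

In the image setting, the bounds play the role of the valid pixel range: the plain LASSO solution may leave that range, while the bounded one cannot.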


Matrix factorization

• Used in recommendation systems
• User profiles × movie profiles = observed interactions


Alternating Least Squares with additional constraints (Hugues Juille)
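A rank-1 version of alternating least squares fits in a few lines and shows the alternating structure: fix the movie profiles and solve for each user in closed form, then swap. The slides do not say which additional constraints were used, so the non-negativity clipping below is purely an illustrative assumption, as are the function name `als_rank1` and the toy ratings.

```python
def als_rank1(ratings, n_users, n_items, lam=0.001, iters=100):
    """Rank-1 ALS on observed entries only.

    ratings maps (user, item) -> observed value. Each update is the
    closed-form ridge solution for one profile, clipped at zero
    (non-negativity is a hypothetical example of an extra constraint).
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        for i in range(n_users):
            num = sum(r * v[j] for (a, j), r in ratings.items() if a == i)
            den = sum(v[j] ** 2 for (a, j) in ratings if a == i) + lam
            u[i] = max(0.0, num / den)
        for j in range(n_items):
            num = sum(r * u[i] for (i, b), r in ratings.items() if b == j)
            den = sum(u[i] ** 2 for (i, b) in ratings if b == j) + lam
            v[j] = max(0.0, num / den)
    return u, v


# Observed entries of the rank-1 matrix [[1, 2, 3], [2, 4, 6]];
# the entry (1, 2) = 6 is held out.
observed = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0, (1, 0): 2.0, (1, 1): 4.0}
u, v = als_rank1(observed, n_users=2, n_items=3)
predicted_missing = u[1] * v[2]
```

The product of the learned user and item profiles fills in the held-out entry, which is how the factorization is used to recommend items a user has not interacted with yet.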


References

IBM Watson Machine Learning: http://datascience.ibm.com/registration/stepone

SystemML: https://systemml.apache.org/

CADS: ICML 2014, http://www.cs.toronto.edu/~horst/cogrobo/papers/CADS.pdf

CPLEX-learn Contributors: Jean-Francois Puget, Paul Shaw, Vincent Beraudier, Pierre Bonami, Daniel Junglas, Hugues Juille, Renaud Dumeur, Viu Long Kong, Philippe Couronne
