Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM at MLconf SF...

© 2016 IBM Corporation IBM Confidential From ML Algorithms To Learning Machines (+ Optimization) Jean-François Puget 11/11/2016 @JFPuget

Upload: mlconf

Post on 09-Jan-2017

TRANSCRIPT

Page 1: Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM at MLconf SF 2016

© 2016 IBM Corporation. IBM Confidential

From ML Algorithms To Learning Machines (+ Optimization)

Jean-François Puget, 11/11/2016, @JFPuget

Page 2

• 25 years ago, academic topic
• The Machine Learning Workflow

Data → ML algorithm → ? → publication

Page 3

• Perception now
• The Machine Learning Workflow

Data → ??? → ML Algorithm → ??? → $$$

Page 4

• Simple!
• The Machine Learning Workflow

Data Data Scientist

ML Algorithm Model $$$

R, Sklearn, Spark ML, Deep Learning, GBM (xgboost), vw, H2O, …

Page 5

• Focus on missing pieces
• The Machine Learning Workflow

Data → ??? → ML Algorithm → ??? → $$$

Page 6

• Not that simple
• The Machine Learning Workflow

Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$

• Choosing best model
• Models lose accuracy
• Scalable deployment
• Creating examples
• Automating DS work

Page 7

The gap between data scientists and operations is incredible

Page 8

Training: Labeled examples → Data prep → Algorithm → Model

Deploy / DevOps

Scoring: New data → Data prep → Model → Predicted data

For each ML toolkit we need model serialization + a scalable scoring engine.
We are building that for Spark ML.
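The two requirements named on this slide, model serialization and a scoring engine, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-feature linear model, the JSON payload layout, and the function names `train_linear`, `serialize`, and `load_scorer` are hypothetical and do not describe the Spark ML work mentioned in the talk.

```python
import json


def train_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (1-D data)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"kind": "linear", "slope": slope, "intercept": mean_y - slope * mean_x}


def serialize(model):
    """Turn the trained model into a portable text payload (hypothetical format)."""
    return json.dumps(model)


def load_scorer(payload):
    """Rebuild a scoring function from a payload, independent of training code."""
    model = json.loads(payload)
    if model["kind"] != "linear":
        raise ValueError("unsupported model kind: " + str(model["kind"]))
    slope, intercept = model["slope"], model["intercept"]

    def score(batch):
        return [slope * x + intercept for x in batch]

    return score


# Training side: fit on labeled examples and export the model.
payload = serialize(train_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))

# Scoring side: only the payload crosses the boundary to operations.
score = load_scorer(payload)
predictions = score([4.0, 5.0])
```

The point of the split is that the scoring side needs only the payload, not the training toolkit, which is what lets it be scaled and operated separately.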

Page 9

• Not that simple
• The Machine Learning Workflow

Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$

• Choosing best model
• Models lose accuracy
• Creating examples
• Automating DS work

Page 10

Cognitive Assistant for Data Scientists

• Objective:
  • Bring automation into key areas of large-scale data analysis tasks
  • Overcome “analytic decision overload” for Data Scientists

• Current CADS System
  • Automated selection, composition, configuration, training, and deployment of modeling pipelines for supervised data mining tasks that leverages:
    • AI/Learning and Planning based principled exploration of analytic choices
    • Cross-platform analytic deployments (e.g., R, Spark, Python, SPSS) on Big Data platforms Cloud

• What is next…
  • Automation of more parts of the Data Scientist's workflow (e.g. automated feature engineering)
  • Extend for other problems, data types, scale and user requirements (e.g., unstructured data, Deep Learning)
  • Self-Learning and Adaptation
  • Build first-ever conversational data science system with CADS + Watson QA

IBM Research

Page 11

SystemML (IBM Research)

Language: DML (Declarative Machine Learning Language), DML Scripts

Compiler

Runtime: In-Memory Single Node (scale-up); Hadoop or Spark Cluster (scale-out)

since 2010, since 2012, since 2015

Linear Regression Conjugate Gradient
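The slide names linear regression solved by conjugate gradient as a SystemML example. The DML script itself is not reproduced in the transcript, so the following is only a plain-Python sketch of the same idea, assuming the usual formulation: solve the normal equations (X^T X + reg * I) w = X^T y without ever forming X^T X. The function name `linreg_cg` and its parameters are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(rows, v):
    return [dot(row, v) for row in rows]


def linreg_cg(X, y, reg=0.0, tol=1e-12, max_iter=100):
    """Solve (X^T X + reg * I) w = X^T y with the conjugate gradient method."""
    columns = list(zip(*X))  # columns of X, i.e. rows of X^T

    def apply_normal_matrix(w):
        # Computes X^T (X w) + reg * w using two matrix-vector products.
        Xw = matvec(X, w)
        return [dot(col, Xw) + reg * wi for col, wi in zip(columns, w)]

    b = [dot(col, y) for col in columns]  # right-hand side X^T y
    w = [0.0] * len(columns)
    r = b[:]                              # residual b - A w for w = 0
    p = r[:]                              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:                  # squared residual norm small enough
            break
        Ap = apply_normal_matrix(p)
        alpha = rs_old / dot(p, Ap)       # exact step length along p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


# Toy data generated from y = 1 + 2 * x; the first column is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
weights = linreg_cg(X, y)
```

On this toy problem `weights` should come out close to intercept 1 and slope 2. Because the method only needs matrix-vector products, the same loop can run on one node or on a cluster, depending on how those products are executed.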

Page 12

• Pain points
• The Machine Learning Workflow

Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$

• Models lose accuracy
• Creating examples

Page 13

• Feedback loop
• The Machine Learning Workflow

Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$

Prediction accuracy monitoring: collect predictions vs actuals
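Collecting predictions against actuals can be sketched as a rolling accuracy monitor. This is a minimal illustration, not the mechanism used in the talk: the class name `AccuracyMonitor`, the sliding window, and the retraining threshold are all assumptions.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent (prediction, actual) pairs."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # old outcomes fall off automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0)]:
    monitor.record(predicted, actual)

# Only the last four pairs are kept; one of them is a correct prediction.
rolling_accuracy = monitor.accuracy()
```

A signal like `needs_retraining()` is what closes the feedback loop: it sends the workflow back to data preparation and training once deployed accuracy drifts.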

Page 14

Cognitive = Natural language processing + Machine Learning + …

What about Watson and cognitive computing?

Page 15

Machine Learning and Mathematical Optimization

Most ML algorithms solve an optimization problem: find parameters for a given model family that minimize
• Loss function (prediction error)
• Regularization term (model simplicity)

Optimization algorithms: local methods
• Stochastic gradient descent, conjugate gradient, LBFGS, …
• Scale to large number of examples
• Embarrassingly parallel
• Can be stuck in local minima
• Hard time coping with additional constraints on the optimization problem

Mathematical optimization (e.g. CPLEX)
• Can find global optimum
• Can deal with constraints, e.g. L0 norm
• Limited in scale
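One common way to write the problem described on this slide is shown below. The notation is assumed here and does not come from the slide: θ are the model parameters, f_θ the model, ℓ the loss, R the regularizer, and λ the weight that trades prediction error against simplicity.

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, f_{\theta}(x_i)\bigr) + \lambda \, R(\theta)
```

Local methods follow gradients of this objective, while a mathematical optimization solver can additionally take explicit constraints on θ.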

Page 16

Classical ML Algorithms implemented with mathematical optimization models

• Linear models: LASSO, Ridge Classifier, Elastic Net, Hinge loss, Hinge-squared loss
• Support Vector Machines: Primal, Dual linear, Dual RBF, Hinge models
• Decision Forests: Decision trees vote (preliminary work)
• Multi-label problems: Using 1-vs-rest method
• Alternating Least Squares: Application to Collaborative Filtering (recommendations)

LASSO
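The content of the LASSO panel is not preserved in the transcript. As a stand-in, here is a small sketch of the LASSO objective, (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, minimized by coordinate descent with soft-thresholding. This is a local method chosen for brevity, not the CPLEX-based model from the talk, and the names `lasso_cd` and `soft_threshold` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero feature cannot help
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w


# Two orthogonal features: the first is strongly predictive, the second barely.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
coefficients = lasso_cd(X, y, alpha=0.1)
```

With this data the weak second coefficient is driven exactly to zero while the first is only shrunk, which is the sparsity effect the L1 penalty is used for.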

Page 17

Compressive Sensing

Image reconstruction with and without bounds on the pixel value

Panels: Original, Lasso (sklearn), Constrained Lasso (CPLEX)

Distribution of pixel values
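The difference between the two reconstructions can be written as a Lasso problem with box constraints on the pixels. The exact model used in the talk is not given, so the notation below is assumed: A is the measurement matrix, b the measurements, x the image pixels, and l and u the lower and upper pixel bounds.

```latex
\min_{x} \; \frac{1}{2} \lVert A x - b \rVert_2^2 + \lambda \lVert x \rVert_1
\quad \text{subject to} \quad l \le x_j \le u \;\; \text{for all pixels } j
```

Dropping the bound constraints gives the plain Lasso reconstruction.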

Page 18

Matrix factorization

• Used in recommendation systems
• User profiles x movie profiles = observed interactions
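A standard way to state this factorization as an optimization problem is sketched below. The notation is assumed, not taken from the slide: r_ij is an observed interaction between user i and movie j, Ω is the set of observed pairs, u_i and v_j are the user and movie profile vectors, and λ is a regularization weight.

```latex
\min_{U, V} \; \sum_{(i,j) \in \Omega} \bigl(r_{ij} - u_i^{\top} v_j\bigr)^2
+ \lambda \Bigl( \sum_i \lVert u_i \rVert_2^2 + \sum_j \lVert v_j \rVert_2^2 \Bigr)
```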

Page 19

Alternating Least Squares with additional constraints (Hugues Juille)
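The slide does not say which additional constraints were used. The sketch below assumes non-negativity of the factors as one plausible example and restricts the factorization to rank 1, where each alternating step has a closed form and clipping at zero gives the exact constrained solution for that coordinate. The function name `als_rank1` and all parameters are hypothetical.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, n_iters=100, nonneg=True):
    """Rank-1 alternating least squares on observed entries only.

    ratings maps (user, item) to an observed value; the model predicts
    u[user] * v[item]. With nonneg=True every factor is kept >= 0.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(n_iters):
        # Fix the item factors and solve a 1-D least squares problem per user.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j), r in ratings.items() if a == i)
            den = sum(v[j] ** 2 for (a, j) in ratings if a == i) + reg
            u[i] = max(0.0, num / den) if nonneg else num / den
        # Fix the user factors and solve a 1-D least squares problem per item.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b), r in ratings.items() if b == j)
            den = sum(u[i] ** 2 for (i, b) in ratings if b == j) + reg
            v[j] = max(0.0, num / den) if nonneg else num / den
    return u, v


# Observed entries of the rank-1 matrix [[1, 2, 3], [2, 4, 6]];
# the entry for user 1, item 2 (true value 6) is held out.
observed = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0, (1, 0): 2.0, (1, 1): 4.0}
user_factors, item_factors = als_rank1(observed, n_users=2, n_items=3)
predicted_missing = user_factors[1] * item_factors[2]
```

Here `predicted_missing` should land close to the held-out value 6, slightly shrunk by the regularization. Constraints that couple several factors at once do not reduce to clipping like this, which is where a general-purpose solver becomes attractive.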

Page 20

References

IBM Watson Machine Learning: http://datascience.ibm.com/registration/stepone

SystemML: https://systemml.apache.org/

CADS: ICML 2014

CPLEX-learn Contributors: Jean-Francois Puget, Paul Shaw, Vincent Beraudier, Pierre Bonami, Daniel Junglas, Hugues Juille, Renaud Dumeur, Viu Long Kong, Philippe Couronne