statistical learning: some principles and applicationsadeline.e-samson.org › wp-content ›...

24
Statistical learning: some principles and applications Adeline LECLERCQ SAMSON Massih-Reza AMINI A. Samson Grenoble, 7/06/2018 1 / 24

Upload: others

Post on 04-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

Statistical learning: some principles and applications

Adeline LECLERCQ SAMSON

Massih-Reza AMINI

A. Samson Grenoble, 7/06/2018 1 / 24

Page 2: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

Challenges in Statistical Learning

Some vocabulary in statistical learning1. Learning a decision rule

I Prediction of an (labeled) outcome based on observed variablesI Prediction of the outcome from new observations

2. ClusteringI Creation of groups of similar individuals / objects / variablesI Research of patterns

3. Learning a model from the dataI Physical / biological modelsI Estimation of the parameters, shape of the model

4. Learning associations, statistical testsI Correlation between variablesI Interactions in a network

5. Data visualizationI Exploration of the data, outliers detection, errorsI Reduction of the dimension

A. Samson Grenoble, 7/06/2018 2 / 24

Page 3: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

Challenges in Statistical Learning

Challenges in statistical learning

High dimensionI A lot of variables per individualI Aim: learn the effect of all these variables even when the number of

individuals / units remains small

Repeated measuresI Repeated trialsI Longitudinal measurements: several measures in timeI Aim: learn the variability in the process and the evolution in time

Structured dataI Connected objects: a function measured with high frequencyI Functional dataI Aim: learn a function (infinite dimension) and not only some parameters

A. Samson Grenoble, 7/06/2018 3 / 24

Page 4: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

Challenges in Statistical Learning

Main approaches in statistical learning

Parametric versus Non Parametric

Non parametricI No prior knowledge of a model, of a relationship between variablesI Objectives: learn the model, the distribution of the variables, the network

ParametricI Models with meaningful parameters: chemical interactions, physical laws,

biological transformationI Prior models: linear regression, reliabilityI Objectives: numerical optimization of a criteria

Challenges

Parsimony: high dimension but maybe few significant signals

Optimization: new technics to optimize complex criteria/space

Distributed calcul: scalability of the solution

A. Samson Grenoble, 7/06/2018 4 / 24

Page 5: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

1. Decision rule

1. Decision rule, classification

Supervised learning: class labelsare provided

Aim: learn a classifier to predictclass labels of novel data

Statistical toolsI Logistic regression

(parametric)I K-nearest neighbors

(non-parametric)I Decision tree

(non-parametric)

A. Samson Grenoble, 7/06/2018 5 / 24

Page 6: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

1. Decision rule

Decision tree

V1 >= 0.5

V2 >= 0.75

V2 >= 0.67

V1 < 0.2

V2 >= 0.25

1126 3 1 3 2 0

214 355 10 10 3 0

33 2 152 2 3 0

43 5 2 125 0 0

54 1 1 1 98 0

61 0 2 1 3 64

yes no

V1 >= 0.5

V2 >= 0.75

V2 >= 0.87

V2 < 0.85

V1 < 0.75

V2 >= 0.85

V1 < 0.88

V1 < 0.84

V2 < 0.022

V1 >= 0.93

V2 >= 0.019

V1 >= 0.69

V1 < 0.67

V2 < 0.015

V2 >= 0.67

V1 < 0.2

V1 >= 0.2

V1 < 0.2

V1 < 0.13

V1 >= 0.13

V1 >= 0.099

V1 < 0.1

V2 >= 0.25

V1 >= 0.46

V1 < 0.47

V1 < 0.47

178 0 0 1 1 0

144 1 0 2 0 0

14 0 0 0 0 0

20 1 0 0 0 0

20 1 0 0 0 0

30 0 1 0 0 0

50 0 0 0 1 0

11 0 0 0 0 0

11 0 0 0 0 0

20 7 0 0 0 0

20 2 0 0 0 0

30 0 1 0 0 0

30 0 1 0 0 0

212 346 8 10 3 0

33 2 152 2 3 0

30 0 1 0 0 0

40 0 0 2 0 0

11 0 0 0 0 0

20 2 0 0 0 0

40 1 0 10 0 0

41 2 1 64 0 0

41 0 0 49 0 0

11 0 0 0 0 0

40 0 0 1 0 0

50 1 0 0 8 0

53 0 1 0 90 0

61 0 2 1 3 64

yes no

A. Samson Grenoble, 7/06/2018 6 / 24

Page 7: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

1. Decision rule

K-nearest neighbors

A. Samson Grenoble, 7/06/2018 7 / 24

Page 8: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

1. Decision rule

Examples

Advanced personalized medicine

I Predict the best treatment from the knowledge of biomarkers measured at aninitial clinical visit

I Logistic regression and decision tree

A. Samson Grenoble, 7/06/2018 8 / 24

Page 9: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

1. Decision rule

Examples

Manufacturing industryI High production costsI Some non conformity at the end of the

production processI Large number of sensors

I Decision tree to predict early in the production process which piece is likely tobe not conform

I Reduce the proportion of non conformityA. Samson Grenoble, 7/06/2018 9 / 24

Page 10: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

2. Clustering

2. Clustering

Unsupervised learning: no classlabel is given

Aim

I Creation of groups of similarindividuals / objects /variables;

I Understanding the structureunderlying the data

Statistical toolsI K-means (non-parametric)I Mixture model (parametric)I Bi-clustering, Stochastic

Block Model (parametric)

A. Samson Grenoble, 7/06/2018 10 / 24

Page 11: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

2. Clustering

Examples

Consumption curvesI A curve per consumerI Prediction of the future consumption

I Clustering and identification of profiles by mixture model of functional dataI Prediction based on these clusters

A. Samson Grenoble, 7/06/2018 11 / 24

Page 12: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

2. Clustering

Examples

Consumption curvesI A curve per consumerI Prediction of the future consumption

I Clustering and identification of profiles by mixture model of functional dataI Prediction based on these clusters

A. Samson Grenoble, 7/06/2018 12 / 24

Page 13: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

2. Clustering

Bi-clustering

A. Samson Grenoble, 7/06/2018 13 / 24

Page 14: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

2. Clustering

Stochastic block model

A. Samson Grenoble, 7/06/2018 14 / 24

Page 15: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

2. Clustering

Examples

Autonomous VehicleI Precision of the positionI Sequence of images

I Segmentation of the images by Stochastic Block ModelI Prediction of the position

A. Samson Grenoble, 7/06/2018 15 / 24

Page 16: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

3. Learning a model

3. Learning a model

AimI Fit the data with a modelI Regression model

Statistical toolsI Differential equationsI Point processI Estimation of the parameters

(parametric)I Maximum likelihoodI BayesianI Penalization with high

dimensionI Estimation of the model

(non-parametric)I Splines, Wavelets, Fourier

A. Samson Grenoble, 7/06/2018 16 / 24

Page 17: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

3. Learning a model

Examples

Immunotherapy

I Efficacy of the treatmentI Optimization of the treatment, in terms of dose and time to treatment

A. Samson Grenoble, 7/06/2018 17 / 24

Page 18: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

3. Learning a model

Examples

Maintenance, reliabilityI Maintenance optimizationI Imperfect maintenance

I Virtual age modeling, point processesI Optimization of the next preventive maintenance

A. Samson Grenoble, 7/06/2018 18 / 24

Page 19: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

4. Learning associations

4. Learning associations, statistical tests

AimI Correlation between

variables,I Learning communities

and networks

Statistical toolsI Correlation tests,

multiple testsI Graphical models

A. Samson Grenoble, 7/06/2018 19 / 24

Page 20: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

4. Learning associations

Examples

GenomicsI Effect of the pollution on epigenetics

and baby growth

I Associations tests and multiple testingI Mediation to infer causality

A. Samson Grenoble, 7/06/2018 20 / 24

Page 21: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

4. Learning associations

Examples

NeurosciencesI Understanding the connexions in the brainI Longitudinal, functional data through EEG, MEG data

A. Samson Grenoble, 7/06/2018 21 / 24

Page 22: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

4. Learning associations

Statistical learning frameworks

Open source development frameworks available

Possible to use in industry

A. Samson Grenoble, 7/06/2018 22 / 24

Page 23: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

4. Learning associations

Take-Home message

Core-idea of statistical learningI Variety of (industrial) problemsI Variety of statistical questionsI Variety of learning approachesI Machine learning: decision rule, clusteringI Statistical learning: model learning, association/correlation

A strategy that is effective across different disciplinesI HealthI EnvironmentI EnergyI MarketingI Manufacturing industry

A. Samson Grenoble, 7/06/2018 23 / 24

Page 24: Statistical learning: some principles and applicationsadeline.e-samson.org › wp-content › uploads › 2018 › 06 › datascience… · Challenges in Statistical Learning Some

4. Learning associations

Some directions of ongoing research

Structured data, connected objects

Social networks, interaction graph

High dimension

Optimization with constraints

Distributed computation

A. Samson Grenoble, 7/06/2018 24 / 24