statistical learning: some principles and applicationsadeline.e-samson.org › wp-content ›...

Post on 04-Jul-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Statistical learning: some principles and applications

Adeline LECLERCQ SAMSON

Massih-Reza AMINI

A. Samson Grenoble, 7/06/2018 1 / 24

Challenges in Statistical Learning

Some vocabulary in statistical learning1. Learning a decision rule

I Prediction of an (labeled) outcome based on observed variablesI Prediction of the outcome from new observations

2. ClusteringI Creation of groups of similar individuals / objects / variablesI Research of patterns

3. Learning a model from the dataI Physical / biological modelsI Estimation of the parameters, shape of the model

4. Learning associations, statistical testsI Correlation between variablesI Interactions in a network

5. Data visualizationI Exploration of the data, outliers detection, errorsI Reduction of the dimension

A. Samson Grenoble, 7/06/2018 2 / 24

Challenges in Statistical Learning

Challenges in statistical learning

High dimensionI A lot of variables per individualI Aim: learn the effect of all these variables even when the number of

individuals / units remains small

Repeated measuresI Repeated trialsI Longitudinal measurements: several measures in timeI Aim: learn the variability in the process and the evolution in time

Structured dataI Connected objects: a function measured with high frequencyI Functional dataI Aim: learn a function (infinite dimension) and not only some parameters

A. Samson Grenoble, 7/06/2018 3 / 24

Challenges in Statistical Learning

Main approaches in statistical learning

Parametric versus Non Parametric

Non parametricI No prior knowledge of a model, of a relationship between variablesI Objectives: learn the model, the distribution of the variables, the network

ParametricI Models with meaningful parameters: chemical interactions, physical laws,

biological transformationI Prior models: linear regression, reliabilityI Objectives: numerical optimization of a criteria

Challenges

Parsimony: high dimension but maybe few significant signals

Optimization: new technics to optimize complex criteria/space

Distributed calcul: scalability of the solution

A. Samson Grenoble, 7/06/2018 4 / 24

1. Decision rule

1. Decision rule, classification

Supervised learning: class labelsare provided

Aim: learn a classifier to predictclass labels of novel data

Statistical toolsI Logistic regression

(parametric)I K-nearest neighbors

(non-parametric)I Decision tree

(non-parametric)

A. Samson Grenoble, 7/06/2018 5 / 24

1. Decision rule

Decision tree

V1 >= 0.5

V2 >= 0.75

V2 >= 0.67

V1 < 0.2

V2 >= 0.25

1126 3 1 3 2 0

214 355 10 10 3 0

33 2 152 2 3 0

43 5 2 125 0 0

54 1 1 1 98 0

61 0 2 1 3 64

yes no

V1 >= 0.5

V2 >= 0.75

V2 >= 0.87

V2 < 0.85

V1 < 0.75

V2 >= 0.85

V1 < 0.88

V1 < 0.84

V2 < 0.022

V1 >= 0.93

V2 >= 0.019

V1 >= 0.69

V1 < 0.67

V2 < 0.015

V2 >= 0.67

V1 < 0.2

V1 >= 0.2

V1 < 0.2

V1 < 0.13

V1 >= 0.13

V1 >= 0.099

V1 < 0.1

V2 >= 0.25

V1 >= 0.46

V1 < 0.47

V1 < 0.47

178 0 0 1 1 0

144 1 0 2 0 0

14 0 0 0 0 0

20 1 0 0 0 0

20 1 0 0 0 0

30 0 1 0 0 0

50 0 0 0 1 0

11 0 0 0 0 0

11 0 0 0 0 0

20 7 0 0 0 0

20 2 0 0 0 0

30 0 1 0 0 0

30 0 1 0 0 0

212 346 8 10 3 0

33 2 152 2 3 0

30 0 1 0 0 0

40 0 0 2 0 0

11 0 0 0 0 0

20 2 0 0 0 0

40 1 0 10 0 0

41 2 1 64 0 0

41 0 0 49 0 0

11 0 0 0 0 0

40 0 0 1 0 0

50 1 0 0 8 0

53 0 1 0 90 0

61 0 2 1 3 64

yes no

A. Samson Grenoble, 7/06/2018 6 / 24

1. Decision rule

K-nearest neighbors

A. Samson Grenoble, 7/06/2018 7 / 24

1. Decision rule

Examples

Advanced personalized medicine

I Predict the best treatment from the knowledge of biomarkers measured at aninitial clinical visit

I Logistic regression and decision tree

A. Samson Grenoble, 7/06/2018 8 / 24

1. Decision rule

Examples

Manufacturing industryI High production costsI Some non conformity at the end of the

production processI Large number of sensors

I Decision tree to predict early in the production process which piece is likely tobe not conform

I Reduce the proportion of non conformityA. Samson Grenoble, 7/06/2018 9 / 24

2. Clustering

2. Clustering

Unsupervised learning: no classlabel is given

Aim

I Creation of groups of similarindividuals / objects /variables;

I Understanding the structureunderlying the data

Statistical toolsI K-means (non-parametric)I Mixture model (parametric)I Bi-clustering, Stochastic

Block Model (parametric)

A. Samson Grenoble, 7/06/2018 10 / 24

2. Clustering

Examples

Consumption curvesI A curve per consumerI Prediction of the future consumption

I Clustering and identification of profiles by mixture model of functional dataI Prediction based on these clusters

A. Samson Grenoble, 7/06/2018 11 / 24

2. Clustering

Examples

Consumption curvesI A curve per consumerI Prediction of the future consumption

I Clustering and identification of profiles by mixture model of functional dataI Prediction based on these clusters

A. Samson Grenoble, 7/06/2018 12 / 24

2. Clustering

Bi-clustering

A. Samson Grenoble, 7/06/2018 13 / 24

2. Clustering

Stochastic block model

A. Samson Grenoble, 7/06/2018 14 / 24

2. Clustering

Examples

Autonomous VehicleI Precision of the positionI Sequence of images

I Segmentation of the images by Stochastic Block ModelI Prediction of the position

A. Samson Grenoble, 7/06/2018 15 / 24

3. Learning a model

3. Learning a model

AimI Fit the data with a modelI Regression model

Statistical toolsI Differential equationsI Point processI Estimation of the parameters

(parametric)I Maximum likelihoodI BayesianI Penalization with high

dimensionI Estimation of the model

(non-parametric)I Splines, Wavelets, Fourier

A. Samson Grenoble, 7/06/2018 16 / 24

3. Learning a model

Examples

Immunotherapy

I Efficacy of the treatmentI Optimization of the treatment, in terms of dose and time to treatment

A. Samson Grenoble, 7/06/2018 17 / 24

3. Learning a model

Examples

Maintenance, reliabilityI Maintenance optimizationI Imperfect maintenance

I Virtual age modeling, point processesI Optimization of the next preventive maintenance

A. Samson Grenoble, 7/06/2018 18 / 24

4. Learning associations

4. Learning associations, statistical tests

AimI Correlation between

variables,I Learning communities

and networks

Statistical toolsI Correlation tests,

multiple testsI Graphical models

A. Samson Grenoble, 7/06/2018 19 / 24

4. Learning associations

Examples

GenomicsI Effect of the pollution on epigenetics

and baby growth

I Associations tests and multiple testingI Mediation to infer causality

A. Samson Grenoble, 7/06/2018 20 / 24

4. Learning associations

Examples

NeurosciencesI Understanding the connexions in the brainI Longitudinal, functional data through EEG, MEG data

A. Samson Grenoble, 7/06/2018 21 / 24

4. Learning associations

Statistical learning frameworks

Open source development frameworks available

Possible to use in industry

A. Samson Grenoble, 7/06/2018 22 / 24

4. Learning associations

Take-Home message

Core-idea of statistical learningI Variety of (industrial) problemsI Variety of statistical questionsI Variety of learning approachesI Machine learning: decision rule, clusteringI Statistical learning: model learning, association/correlation

A strategy that is effective across different disciplinesI HealthI EnvironmentI EnergyI MarketingI Manufacturing industry

A. Samson Grenoble, 7/06/2018 23 / 24

4. Learning associations

Some directions of ongoing research

Structured data, connected objects

Social networks, interaction graph

High dimension

Optimization with constraints

Distributed computation

A. Samson Grenoble, 7/06/2018 24 / 24

top related