Julie Campagna, PhD student, Angers, France

Aurélie Davranche, Associate Professor, Angers, France

Comparing CART and Random Forest for the monitoring of wetland vegetation with multispectral data

cnrs.fr

IBS-DR Biometry Workshop, Würzburg University, 9 October 2015

Workshop Würzburg 10/2015

SUMMARY

Decision trees : generalities

CART and Random Forest presentation

How CART works

How Random Forest works

Example of application : remote sensing

DECISION TREES

Method of classification (or regression)

Non-parametric method

Can deal with large amounts of data

Splits the samples repeatedly to obtain classes that are as homogeneous as possible

Existing splitting criteria :

Gini index : CART

Chi-square automatic interaction detection : CHAID

Shannon entropy : C5.0
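
As a minimal sketch of two of the criteria listed above, the following Python snippet computes the Gini index and the Shannon entropy of a node from its class labels. The class names and node contents are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def class_frequencies(labels):
    """Return the relative frequency of each class present in the node."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini index: 1 minus the sum of squared class frequencies."""
    return 1.0 - sum(f * f for f in class_frequencies(labels))


def shannon_entropy(labels):
    """Shannon entropy in bits: minus the sum of f * log2(f)."""
    return -sum(f * math.log2(f) for f in class_frequencies(labels))


# Hypothetical nodes: one pure, one evenly mixed between two classes.
pure_node = ["reed"] * 8
mixed_node = ["reed"] * 4 + ["water"] * 4

print("pure :", gini(pure_node), shannon_entropy(pure_node))
print("mixed:", gini(mixed_node), shannon_entropy(mixed_node))
```

Both measures are zero for a pure node and reach their two-class maximum (0.5 for Gini, 1 bit for entropy) when the classes are evenly mixed, which is why either can be used to search for homogeneous splits.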

COMPARISON OF CART AND RANDOM FOREST

Two decision tree methods developed mainly by Breiman et al.

CART came first, in 1984

Random Forest followed in 2001

Varied applications : biology, medicine, remote sensing, …

Deal with large numbers of samples and variables

Not disturbed by extreme values or by unnecessary variables

CART : FUNDAMENTALS

CART uses the Gini criterion to split the training sample :

$$I = 1 - \sum_{i=1}^{n} f_i^2$$

n : number of classes to predict

f_i : frequency of class i in the node

Dichotomous (binary) partitioning

A decision rule appears at each split
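
The dichotomous partitioning can be sketched as a search for the threshold that minimises the weighted Gini index of the two child nodes. This is an illustrative sketch for a single numeric variable, not the rpart implementation; the reflectance values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class frequencies."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one variable.

    Returns (None, parent_gini) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical reflectance values with presence (1) / absence (2) labels.
reflectance = [0.10, 0.12, 0.15, 0.30, 0.35, 0.40]
presence = [1, 1, 1, 2, 2, 2]
threshold, impurity = best_split(reflectance, presence)
print(threshold, impurity)
```

The returned threshold is the decision rule of the node ("reflectance <= threshold"); a full tree repeats the search over every variable and then recursively within each child node.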

CART : IMPROVEMENT

Choosing the final tree

• 75% of the sample for training and 25% for validation

• 10-fold cross-validation

• (Esposito et al., CV-1SE)

Workflow : sample -> 75% training / 25% validation -> 10-fold cross-validation on the training part (minimal error) -> final tree -> accuracy assessed on the validation part
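
The choice of the final tree from cross-validation results can be sketched as follows. The slide mentions both the minimal cross-validated error and the CV-1SE (one-standard-error) rule without saying which one was finally applied, so the sketch shows both. The table of tree sizes and errors is hypothetical.

```python
def select_tree(cv_table, rule="min"):
    """Pick a tree size from cross-validation results.

    cv_table: list of (n_leaves, cv_error, cv_standard_error) tuples.
    rule="min": tree with the minimal cross-validated error.
    rule="1se": smallest tree whose error is within one standard error
                of the minimal error.
    """
    best = min(cv_table, key=lambda row: row[1])
    if rule == "min":
        return best[0]
    if rule == "1se":
        limit = best[1] + best[2]
        candidates = [row for row in cv_table if row[1] <= limit]
        return min(candidates, key=lambda row: row[0])[0]
    raise ValueError("rule must be 'min' or '1se'")


# Hypothetical 10-fold cross-validation results for nested pruned trees.
cv_results = [
    (1, 0.50, 0.05),
    (3, 0.30, 0.04),
    (5, 0.25, 0.04),
    (9, 0.24, 0.04),
    (15, 0.26, 0.04),
]
print(select_tree(cv_results, "min"), select_tree(cv_results, "1se"))
```

With these made-up numbers the minimal-error rule keeps the 9-leaf tree, while the 1-SE rule prefers the simpler 5-leaf tree whose error is statistically indistinguishable from the minimum.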

CART : PRUNING RESULT

CART : PARAMETERS

CART was implemented in R using the rpart package

Presence = « 1 » ; absence = « 2 »

Unbalanced sample

Optimal « prior » parameter : found through iterative runs of the algorithm
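
To illustrate why the « prior » parameter matters for an unbalanced sample, the sketch below computes the class probabilities of a node under CART-style altered priors, where the probability of a class in a node is proportional to prior × (class count in the node / class count in the whole sample). In rpart, priors are passed through `parms = list(prior = ...)`. The class totals (66 and 1831) come from the confusion matrices later in the deck; the node content and the equal-prior setting are hypothetical, and the iterative search for the optimal prior is not reproduced here.

```python
def node_class_probabilities(node_counts, class_totals, priors):
    """Class probabilities in a node under altered priors.

    p(class | node) is proportional to
    priors[class] * node_counts[class] / class_totals[class].
    """
    weights = {
        c: priors[c] * node_counts[c] / class_totals[c]
        for c in node_counts
    }
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


class_totals = {1: 66, 2: 1831}  # presence / absence sample sizes
node_counts = {1: 10, 2: 40}     # hypothetical content of one node

n = sum(class_totals.values())
proportional = {c: class_totals[c] / n for c in class_totals}  # data frequencies
equal = {1: 0.5, 2: 0.5}                                       # hypothetical prior

p_prop = node_class_probabilities(node_counts, class_totals, proportional)
p_equal = node_class_probabilities(node_counts, class_totals, equal)
print(p_prop[1], p_equal[1])
```

With priors proportional to the class sizes the node is labelled absence (presence probability 0.2), whereas equal priors give the rare presence class most of the weight, so the same node would be labelled presence.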

RANDOM FOREST : GENERAL OPERATION

RF grows many classification trees

To classify a new sample, its input vector is passed down each of the trees in the forest.

Each tree gives a classification : we say the tree “votes” for that class.

The forest chooses the classification having the most votes (over all the trees in the forest).
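
The voting step can be sketched in a few lines. The list of tree predictions is hypothetical; in practice each entry would come from one tree of the forest.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """Return the class with the most votes and the vote count per class."""
    votes = Counter(tree_predictions)
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)


# Hypothetical votes of 7 trees for one pixel (1 = presence, 2 = absence).
winner, counts = forest_vote([1, 2, 1, 1, 2, 1, 2])
print(winner, counts)
```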

RANDOM FOREST : STEP ONE

For each tree, a bootstrap sample is drawn randomly with replacement : about 2/3 of the samples form the training set and the remaining 1/3 is kept for validation (Out Of Bag, OOB)

At each node, a random subset of the variables is considered (generally sqrt(number of variables))
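
The two sources of randomness can be sketched as follows. Drawing n samples with replacement leaves roughly one third of them out of the bag, and the square root of 49 variables gives 7 candidates per node. The sample size of 1897 (66 + 1831) is taken from the confusion matrices later in the deck and is an assumption about what was fed to the forest; the seed is arbitrary.

```python
import math
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample with replacement and return (in_bag, oob) indices."""
    in_bag = rng.choices(range(n_samples), k=n_samples)
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def candidate_variables(n_variables, rng):
    """Pick floor(sqrt(n_variables)) distinct variables to test at one node."""
    mtry = math.isqrt(n_variables)
    return rng.sample(range(n_variables), mtry)


rng = random.Random(0)                    # arbitrary seed for reproducibility
in_bag, oob = bootstrap_split(1897, rng)  # assumed sample size
oob_fraction = len(oob) / 1897
variables = candidate_variables(49, rng)  # 49 descriptive variables
print(round(oob_fraction, 3), variables)
```

The OOB samples of each tree are never seen during its training, which is what allows the forest to report an OOB error without a separate validation set.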

RANDOM FOREST : STEP TWO, FOREST CONSTRUCTION

RANDOM FOREST : PARAMETERS

Cannot deal with unbalanced samples directly

Two ways to adjust the data :

Up-sampling based on the size of the largest class

Down-sampling based on the size of the smallest class
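
The two adjustments can be sketched as a single resampling helper. This is a generic illustration, not the procedure used by the authors; the function name, the toy data and the seed are hypothetical.

```python
import random
from collections import defaultdict


def resample(samples, labels, mode, seed=0):
    """Balance classes by up-sampling or down-sampling.

    mode="up":   every class is replicated (with replacement) up to the
                 size of the largest class.
    mode="down": every class is reduced (without replacement) to the
                 size of the smallest class.
    Returns a shuffled list of (sample, label) pairs.
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    balanced = []
    for label, items in by_class.items():
        if mode == "up":
            chosen = items + rng.choices(items, k=target - len(items))
        else:
            chosen = rng.sample(items, target)
        balanced.extend((sample, label) for sample in chosen)
    rng.shuffle(balanced)
    return balanced


# Toy unbalanced data: 3 presence samples (1) and 12 absence samples (2).
toy_samples = list(range(15))
toy_labels = [1] * 3 + [2] * 12
up = resample(toy_samples, toy_labels, "up")
down = resample(toy_samples, toy_labels, "down")
print(len(up), len(down))
```

Up-sampling keeps every original sample but duplicates the rare class, while down-sampling discards part of the majority class; the confusion matrices later in the deck report both variants as RF_Up and RF_Down.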

EXAMPLE OF APPLICATION : REMOTE SENSING

Satellite images are useful for monitoring wetland environments

In this case we used a high spatial resolution image (WorldView-2) of the Camargue, in the south of France.

Needs :

Mapping the vegetation

Creating a method that is easy to apply without knowledge of remote sensing or R programming

SAMPLE

21 land-cover classes from field data

49 descriptive variables : reflectance values from the spectral bands and multispectral indices

[Figure : class size for each class, axis from 0 to 180]
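
As an example of what a multispectral index looks like, the sketch below computes the normalised difference vegetation index (NDVI) from near-infrared and red reflectance. The slides do not list which indices were among the 49 variables, so NDVI is an assumption chosen only because it is a widely used index; the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on pixels with no signal
    return (nir - red) / total


# Hypothetical reflectance values for a vegetated pixel and a water pixel.
vegetation = ndvi(nir=0.50, red=0.10)
water = ndvi(nir=0.02, red=0.05)
print(vegetation, water)
```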

EXAMPLE OF APPLICATION : REMOTE SENSING

Classification of Salicornia fruticosa

EXAMPLE OF APPLICATION : REMOTE SENSING

Mapping results : Sarcocornia fruticosa

EXAMPLE OF APPLICATION : REMOTE SENSING

Confusion matrices (columns : reference map ; rows : produced map, i.e. the classification) :

CART                            Class 1    Class 2    Overall accuracy

Training     Class 1                 49         65
             Class 2                  1       1316
             Total                   50       1381    0.953878407
             Omission error        0.02   0.047067

Validation   Class 1                 16         22
             Class 2                  0        428
             Total                   16        450    0.9527897
             Omission error           0   0.048889

Total        Class 1                 65         87
             Class 2                  1       1744
             Total                   66       1831    0.953610965
             Omission error       0.015      0.047

RF                              Class 1    Class 2    Overall accuracy    OOB error

RF_Up        Class 1                858          9
             Class 2                  0       1822
             Total                  858       1831    0.991               0.26%
             Omission error           0    0.00494

RF_Down      Class 1                 59         50
             Class 2                  7       1781
             Total                   66       1831    0.97                3%
             Omission error        0.11      0.027

The classification accuracy values are close.
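
The overall accuracy and the omission errors can be recomputed from a confusion matrix as sketched below, using the CART « Total » figures from this slide (rows : classified class ; columns : reference class). The function name is hypothetical.

```python
def accuracy_metrics(matrix):
    """Return (overall_accuracy, omission_errors) for a square confusion matrix.

    matrix[i][j] = number of samples classified as class i+1 whose
    reference class is j+1. The omission error of a class is the share
    of its reference samples that were assigned to another class.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    column_totals = [
        sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)
    ]
    omission = [
        1 - matrix[j][j] / column_totals[j] for j in range(n_classes)
    ]
    return correct / total, omission


# CART, total sample, as reported on the slide.
cart_total = [
    [65, 87],    # classified as class 1
    [1, 1744],   # classified as class 2
]
overall, omission = accuracy_metrics(cart_total)
print(round(overall, 9), [round(e, 3) for e in omission])
```

The result agrees with the values reported on the slide: 1809 of 1897 samples are correctly classified, and the omission errors are 1/66 for class 1 and 87/1831 for class 2.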

EXAMPLE OF APPLICATION : REMOTE SENSING

The difference in overall accuracy between CART and Random Forest is very small (around 1.5%), and both results are good.

CART provides an explicit model, whereas the Random Forest model is implicit

An explicit model can be reused on a new dataset or on another image of the same date without repeating all the modelling steps : it is easier to use without specific knowledge
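
What "explicit" means in practice can be sketched as follows: a pruned CART tree reduces to a handful of threshold rules that can be copied into any tool and applied to new pixels without refitting. The variable names (`index_a`, `band_b`) and the thresholds are entirely hypothetical; they are not the rules obtained by the authors.

```python
def classify_pixel(pixel):
    """Hypothetical pruned tree written as plain rules.

    Returns 1 for presence and 2 for absence.
    """
    if pixel["index_a"] <= 0.25:  # hypothetical threshold on a spectral index
        return 2
    if pixel["band_b"] > 0.40:    # hypothetical threshold on a band reflectance
        return 2
    return 1


# Applying the same rules to new, hypothetical pixels.
new_pixels = [
    {"index_a": 0.10, "band_b": 0.20},
    {"index_a": 0.60, "band_b": 0.55},
    {"index_a": 0.60, "band_b": 0.30},
]
predictions = [classify_pixel(p) for p in new_pixels]
print(predictions)
```

A Random Forest has no such compact form: applying it to a new image requires the fitted forest object itself, since the decision is spread over the votes of many trees.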

CONCLUSION AND DISCUSSION

On the same dataset, and with all parameters tuned for CART, we obtain results that are not significantly different from those of Random Forest

Both models need some parameter adjustments to be able to deal with unbalanced samples

CART can generate an explicit model, whereas Random Forest cannot

Both algorithms also make it possible to identify the important variables

THANK YOU FOR YOUR ATTENTION !