TRANSCRIPT
Understanding Random Forests: From Theory to Practice
Gilles Louppe
Université de Liège, Belgium
1 / 40
Motivation
2 / 40
Objective
From a set of measurements,
learn a model
to predict and understand a phenomenon.
3 / 40
Running example
From physicochemical properties (alcohol, acidity, sulphates, ...),
learn a model
to predict wine taste preferences (from 0 to 10).
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis, Modeling wine
preferences by data mining from physicochemical properties, 2009.
4 / 40
Outline
1 Motivation
2 Growing decision trees and random forests (review of the state of the art, minor contributions)
3 Interpreting random forests (major contributions : theory)
4 Implementing and accelerating random forests (major contributions : practice)
5 Conclusions
5 / 40
Supervised learning
• The inputs are random variables X = (X1, ..., Xp) ;
• The output is a random variable Y .
• Data comes as a finite learning set
L = {(xi , yi) | i = 0, . . . , N − 1},
where xi ∈ X = X1 × ... × Xp and yi ∈ Y are randomly drawn from PX,Y .
E.g., (xi , yi) = ((color = red, alcohol = 12, ...), score = 6)
• The goal is to find a model ϕL : X → Y minimizing
Err(ϕL) = EX,Y {L(Y , ϕL(X ))}.
6 / 40
Performance evaluation
Classification
• Symbolic output (e.g., Y = {yes, no})
• Zero-one loss
L(Y , ϕL(X )) = 1(Y ≠ ϕL(X ))
Regression
• Numerical output (e.g., Y = R)
• Squared error loss
L(Y , ϕL(X )) = (Y − ϕL(X ))²
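The two losses above can be computed in a couple of lines. A minimal sketch with NumPy, on made-up predictions:

```python
import numpy as np

# Hypothetical true scores vs. predictions, just to illustrate the two losses.
y_true = np.array([6, 5, 7, 5])
y_pred = np.array([6, 5, 6, 4])

zero_one = np.mean(y_true != y_pred)       # fraction of misclassified samples
squared = np.mean((y_true - y_pred) ** 2)  # mean squared error

print(zero_one)  # 0.5
print(squared)   # 0.5
```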
7 / 40
Divide and conquer
[Figure: training samples scattered in the (X1, X2) input space.]
8 / 40
Divide and conquer
[Figure: the input space split at X1 = 0.7.]
8 / 40
Divide and conquer
[Figure: the input space split at X1 = 0.7 and then at X2 = 0.5.]
8 / 40
Decision trees
[Figure: the partitioned input space and the corresponding tree. The root t1 tests X1 ≤ 0.7 (left branch ≤, right branch >) ; one of its children, t2, tests X2 ≤ 0.5 ; t3, t4, t5 are leaf nodes. A sample x is routed to a leaf, which outputs p(Y = c | X = x).]

t ∈ ϕ : nodes of the tree ϕ
Xt : split variable at t
vt ∈ R : split threshold at t
ϕ(x) = arg max_{c ∈ Y} p(Y = c | X = x)
9 / 40
Learning from data (CART)
function BuildDecisionTree(L)
    Create node t from the learning sample Lt = L
    if the stopping criterion is met for t then
        yt = some constant value
    else
        Find the split on Lt that maximizes impurity decrease :
            s∗ = arg max_{s ∈ Q} ∆i(s, t)
        Partition Lt into LtL ∪ LtR according to s∗
        tL = BuildDecisionTree(LtL)
        tR = BuildDecisionTree(LtR)
    end if
    return t
end function
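The recursion above can be sketched concretely. A minimal regression version with squared-error impurity; the names (build_tree, predict) are illustrative and this is not the scikit-learn implementation:

```python
import numpy as np

def build_tree(X, y, min_samples=5):
    # Stopping criterion: node too small or pure -> leaf labelled with the mean.
    if len(y) < min_samples or np.all(y == y[0]):
        return {"leaf": True, "value": float(np.mean(y))}
    best = None
    parent = np.var(y) * len(y)  # total squared-error impurity at the node
    for j in range(X.shape[1]):
        for v in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= v], y[X[:, j] > v]
            # Impurity decrease of the candidate split s = (j, v).
            dec = parent - (np.var(left) * len(left) + np.var(right) * len(right))
            if best is None or dec > best[0]:
                best = (dec, j, v)
    if best is None:  # no valid split (all feature values identical)
        return {"leaf": True, "value": float(np.mean(y))}
    _, j, v = best
    m = X[:, j] <= v
    return {"leaf": False, "var": j, "thr": float(v),
            "left": build_tree(X[m], y[m], min_samples),
            "right": build_tree(X[~m], y[~m], min_samples)}

def predict(tree, x):
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["var"]] <= tree["thr"] else tree["right"]
    return tree["value"]

# Toy usage: a step function is recovered by a single split.
X = np.arange(20.0).reshape(-1, 1)
y = (X[:, 0] >= 10).astype(float)
tree = build_tree(X, y)
print(predict(tree, [3.0]), predict(tree, [15.0]))  # 0.0 1.0
```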
10 / 40
Back to our example
alcohol <= 10.625
├── vol. acidity <= 0.237
│   ├── y = 5.915
│   └── y = 5.382
└── alcohol <= 11.741
    ├── vol. acidity <= 0.442
    │   ├── y = 6.131
    │   └── y = 5.557
    └── y = 6.516
11 / 40
Bias-variance decomposition
Theorem. For the squared error loss, the bias-variance decomposition of the expected generalization error at X = x is
EL{Err(ϕL(x))} = noise(x)+bias2(x)+var(x)
where
noise(x) = Err(ϕB(x)),
bias2(x) = (ϕB(x) − EL{ϕL(x)})2,
var(x) = EL{(EL{ϕL(x)}−ϕL(x))2}.
[Figure: the distribution of ϕL(x) across learning sets L, centred on EL{ϕL(x)}, together with the Bayes model ϕB(x) ; the gaps illustrate noise(x), bias²(x) and var(x).]
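The three terms can be estimated numerically by redrawing the learning set many times. A Monte-Carlo sketch on a made-up 1-D problem (y = sin(x) + Gaussian noise) with a deliberately high-variance model (1-nearest neighbour); all names and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin      # the Bayes model phi_B (noise-free target)
sigma = 0.3     # noise standard deviation
x0 = 1.0        # point at which we decompose the error

preds = []
for _ in range(2000):                # redraw the learning set L many times
    X = rng.uniform(0.0, 2 * np.pi, 30)
    y = f(X) + rng.normal(0.0, sigma, 30)
    preds.append(y[np.argmin(np.abs(X - x0))])  # phi_L(x0) for this L

preds = np.array(preds)
noise = sigma ** 2                    # Err(phi_B(x0))
bias2 = (f(x0) - preds.mean()) ** 2   # squared bias at x0
var = preds.var()                     # variance at x0
print(round(noise, 3), round(bias2, 3), round(var, 3))
```

As expected for a nearest-neighbour model, the squared bias is tiny while the variance dominates; their sum approximates EL{Err(ϕL(x0))}.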
12 / 40
Diagnosing the generalization error of a decision tree
• (Residual error : Lowest achievable error, independent of ϕL.)
• Bias : Decision trees usually have low bias.
• Variance : They often suffer from high variance.
• Solution : Combine the predictions of several randomized trees into a single model.
13 / 40
Random forests
[Figure: an ensemble ψ of M trees ϕ1, ..., ϕM. Each tree outputs its estimate p_ϕm(Y = c | X = x) ; the ensemble averages them into p_ψ(Y = c | X = x).]
Randomization :
• Bootstrap samples + random selection of K ≤ p split variables } Random Forests
• Random selection of K ≤ p split variables + random selection of the threshold } Extra-Trees
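Both randomization schemes are available off the shelf. A sketch with scikit-learn (assuming it is installed): RandomForestRegressor draws bootstrap samples and K = max_features random variables per split, while ExtraTreesRegressor additionally randomizes the thresholds.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=300, n_features=10, random_state=0)

rf = RandomForestRegressor(n_estimators=50, max_features="sqrt", random_state=0)
et = ExtraTreesRegressor(n_estimators=50, max_features="sqrt", random_state=0)

for model in (rf, et):
    model.fit(X, y)
    print(type(model).__name__, round(model.score(X, y), 3))
```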
14 / 40
Bias-variance decomposition (cont.)
Theorem. For the squared error loss, the bias-variance decomposition of the expected generalization error EL{Err(ψL,θ1,...,θM (x))} at X = x of an ensemble of M randomized models ϕL,θm is

EL{Err(ψL,θ1,...,θM (x))} = noise(x) + bias²(x) + var(x),

where

noise(x) = Err(ϕB(x)),
bias²(x) = (ϕB(x) − EL,θ{ϕL,θ(x)})²,
var(x) = ρ(x)σ²L,θ(x) + ((1 − ρ(x))/M) σ²L,θ(x),

and where ρ(x) is the Pearson correlation coefficient between the predictions of two randomized trees built on the same learning set.
15 / 40
Diagnosing the generalization error of random forests
• Bias : Identical to the bias of a single randomized tree.
• Variance : var(x) = ρ(x)σ²L,θ(x) + ((1 − ρ(x))/M) σ²L,θ(x).
As M → ∞, var(x) → ρ(x)σ²L,θ(x).
The stronger the randomization, ρ(x) → 0 and var(x) → 0.
The weaker the randomization, ρ(x) → 1 and var(x) → σ²L,θ(x).
Bias-variance trade-off. Randomization increases bias but makes it possible to reduce the variance of the corresponding ensemble model. The crux of the problem is to find the right trade-off.
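A numeric illustration of the variance formula var(x) = ρ(x)σ² + ((1 − ρ(x))/M)σ², for made-up values of the correlation ρ and ensemble size M (with σ² = 1):

```python
sigma2 = 1.0
for rho in (0.0, 0.5, 1.0):
    for M in (1, 10, 1000):
        var = rho * sigma2 + (1 - rho) / M * sigma2
        print(f"rho={rho} M={M} var={var:.3f}")
# With strong randomization (rho -> 0) the variance vanishes as M grows;
# with no randomization (rho -> 1) averaging does not help at all.
```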
16 / 40
Back to our example
Method          Trees   MSE
CART                1   1.055
Random Forest      50   0.517
Extra-Trees        50   0.507
Combining several randomized trees indeed works better !
17 / 40
Outline
1 Motivation
2 Growing decision trees and random forests
3 Interpreting random forests
4 Implementing and accelerating random forests
5 Conclusions
18 / 40
Variable importances
• Interpretability can be recovered through variable importances
• Two main importance measures :
- The mean decrease of impurity (MDI) : summing total impurity reductions at all tree nodes where the variable appears (Breiman et al., 1984) ;
- The mean decrease of accuracy (MDA) : measuring accuracy reduction on out-of-bag samples when the values of the variable are randomly permuted (Breiman, 2001).
• We focus here on MDI because :
- It is faster to compute ;
- It does not require bootstrap sampling ;
- In practice, it correlates well with the MDA measure.
19 / 40
Mean decrease of impurity
[Figure: an ensemble of M trees ϕ1, ϕ2, ..., ϕM.]

Importance of variable Xj for an ensemble of M trees ϕm is :

Imp(Xj) = (1/M) Σ_{m=1}^{M} Σ_{t ∈ ϕm} 1(jt = j) [ p(t) ∆i(t) ],

where jt denotes the variable used at node t, p(t) = Nt/N and ∆i(t) is the impurity reduction at node t :

∆i(t) = i(t) − (NtL/Nt) i(tL) − (NtR/Nt) i(tR)
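In scikit-learn (assuming it is available), the MDI importances of a fitted forest are exposed as feature_importances_, normalized to sum to 1:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Iris stands in for the Wine data, purely for illustration.
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(load_iris().feature_names, forest.feature_importances_):
    print(f"{name}: {imp:.3f}")  # one MDI score per input variable
```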
20 / 40
Back to our example
MDI scores as computed from a forest of 1000 fully developed trees on the Wine dataset (Random Forest, default parameters).
[Bar chart: MDI scores per variable, on a 0.00-0.30 scale. Listed in increasing order of importance: color, fixed acidity, citric acid, density, chlorides, pH, residual sugar, total sulfur dioxide, sulphates, free sulfur dioxide, volatile acidity, alcohol.]
21 / 40
What does it mean ?
• MDI works well, but it is not well understood theoretically ;
• We would like to better characterize it and derive its main properties from this characterization.
• Working assumptions :
All variables are discrete ;Multi-way splits a la C4.5 (i.e., one branch per value) ;Shannon entropy as impurity measure :
i(t) = −∑c
Nt,c
Ntlog
Nt,c
Nt
Totally randomized trees (RF with K = 1) ;Asymptotic conditions : N →∞, M →∞.
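Totally randomized trees can be emulated in scikit-learn (assuming it is installed) with ExtraTreesClassifier(max_features=1): at each node a single variable is drawn at random, together with a random threshold. A sketch on illustrative data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)
# max_features=1 corresponds to K = 1 in the notation above.
trt = ExtraTreesClassifier(n_estimators=100, max_features=1,
                           random_state=0).fit(X, y)
print(trt.feature_importances_)  # MDI scores under (near) total randomization
```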
22 / 40
Result 1 : Three-level decomposition (Louppe et al., 2013)
Theorem. Variable importances provide a three-level decomposition of the information jointly provided by all the input variables about the output, accounting for all interaction terms in a fair and exhaustive way.

(i) Decomposition of the information jointly provided by all input variables about the output, in terms of the MDI importance of each input variable :

I(X1, ..., Xp ; Y) = Σ_{j=1}^{p} Imp(Xj)

(ii)-(iii) Decomposition along the degrees k of interaction with the other variables, and along all interaction terms B of a given degree k :

Imp(Xj) = Σ_{k=0}^{p−1} (1/C_p^k) (1/(p − k)) Σ_{B ∈ P_k(V^{−j})} I(Xj ; Y | B)

E.g. : p = 3,
Imp(X1) = (1/3) I(X1;Y) + (1/6) (I(X1;Y|X2) + I(X1;Y|X3)) + (1/3) I(X1;Y|X2,X3)
23 / 40
Illustration : 7-segment display (Breiman et al., 1984)
y x1 x2 x3 x4 x5 x6 x7
0 1 1 1 0 1 1 1
1 0 0 1 0 0 1 0
2 1 0 1 1 1 0 1
3 1 0 1 1 0 1 1
4 0 1 1 1 0 1 0
5 1 1 0 1 0 1 1
6 1 1 0 1 1 1 1
7 1 0 1 0 0 1 0
8 1 1 1 1 1 1 1
9 1 1 1 1 0 1 1
24 / 40
Illustration : 7-segment display (Breiman et al., 1984)
Imp(Xj) = Σ_{k=0}^{p−1} (1/C_p^k) (1/(p − k)) Σ_{B ∈ P_k(V^{−j})} I(Xj ; Y | B)

Var   Imp
X1    0.412
X2    0.581
X3    0.531
X4    0.542
X5    0.656
X6    0.225
X7    0.372
Σ     3.321

[Figure: per-variable decomposition of the importances along the interaction degree k = 0, ..., 6.]

24 / 40
Result 2 : Irrelevant variables (Louppe et al., 2013)
Theorem. Variable importances depend only on the relevantvariables.
Theorem. A variable Xj is irrelevant if and only if Imp(Xj) = 0.
⇒ The importance of a relevant variable is insensitive to the addition or the removal of irrelevant variables.
Definition (Kohavi & John, 1997). A variable X is irrelevant (to Y with respect to V )
if, for all B ⊆ V , I(X ;Y |B) = 0. A variable is relevant if it is not irrelevant.
25 / 40
Relaxing assumptions
When trees are not totally random...
• There can be relevant variables with zero importances (due to masking effects).
• The importance of relevant variables can be influenced by the number of irrelevant variables.
When the learning set is finite...
• Importances are biased towards variables of high cardinality.
• This effect can be minimized by collecting impurity terms measured from large enough samples only.
When splits are not multiway...
• i(t) does not actually measure the mutual information.
26 / 40
Back to our example
MDI scores as computed from a forest of 1000 fixed-depth trees on the Wine dataset (Extra-Trees, K = 1, max depth = 5).
[Bar chart: MDI scores per variable, on a 0.00-0.30 scale. Listed in increasing order of importance: pH, residual sugar, fixed acidity, sulphates, free sulfur dioxide, citric acid, chlorides, total sulfur dioxide, density, color, volatile acidity, alcohol.]
Taking into account (some of) the biases results in quite a different story !
27 / 40
Outline
1 Motivation
2 Growing decision trees and random forests
3 Interpreting random forests
4 Implementing and accelerating random forests
5 Conclusions
28 / 40
Implementation (Pedregosa et al., 2011 ; Buitinck et al., 2013)
Scikit-Learn
• Open source machine learning library for Python
• Classical and well-established algorithms
• Emphasis on code quality and usability

A long team effort

Time for building a Random Forest (relative to version 0.10) :
0.10 → 1.00, 0.11 → 0.99, 0.12 → 0.98, 0.13 → 0.33, 0.14 → 0.11, 0.15 → 0.04
29 / 40
Implementation overview
• Modular implementation, designed with a strict separation of concerns :
- Builders : for building and connecting nodes into a tree ;
- Splitters : for finding a split ;
- Criteria : for evaluating the goodness of a split ;
- Tree : dedicated data structure.
• Efficient algorithmic formulation [see Louppe, 2014] :
- Dedicated sorting procedure ;
- Efficient evaluation of consecutive splits.
• Close-to-the-metal, carefully coded implementation :
2300+ lines of Python, 3000+ lines of Cython, 1700+ lines of tests.

# But we kept it stupid simple for users!
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
30 / 40
A winning strategy
The Scikit-Learn implementation proves to be one of the fastest among all libraries and programming languages.
[Bar chart: fit time in seconds.]

Scikit-Learn-RF (Python, Cython)       203.01
Scikit-Learn-ETs (Python, Cython)      211.53
OpenCV-RF (C++)                       4464.65
OpenCV-ETs (C++)                      3342.83
OK3-RF (C)                            1518.14
OK3-ETs (C)                           1711.94
Weka-RF (Java)                        1027.91
R-RF (randomForest ; R, Fortran)     13427.06
Orange-RF (Python)                   10941.72
31 / 40
Computational complexity (Louppe, 2014)
Average time complexity
CART            Θ(pN log² N)
Random Forest   Θ(MKÑ log² Ñ)
Extra-Trees     Θ(MKN log N)

• N : number of samples in L
• p : number of input variables
• K : the number of variables randomly drawn at each node
• Ñ = 0.632N (the effective number of distinct samples in a bootstrap replicate).
32 / 40
Improving scalability through randomization
Motivation
• Randomization and averaging make it possible to improve accuracy by reducing variance.
• As a nice side-effect, the resulting algorithms are fast and embarrassingly parallel.
• Why not purposely exploit randomization to make the algorithm even more scalable (and at least as accurate) ?
Problem
• Assume a supervised learning problem of Ns samples defined over Nf features, and T computing nodes, each with a memory capacity limited to Mmax, with Mmax ≪ Ns × Nf.
• How to best exploit the memory constraint to obtain the most accurate model, as quickly as possible ?
33 / 40
A straightforward solution : Random Patches (Louppe et al., 2012)
[Figure: a random patch, i.e. a random sub-matrix of the data (X, Y).]

1. Draw a subsample r of psNs random examples, with pfNf random features.
2. Build a base estimator on r.
3. Repeat 1-2 for a number T of estimators.
4. Aggregate the predictions by voting.

ps and pf are two meta-parameters that should be selected
• such that psNs × pfNf ≤ Mmax
• to optimize accuracy
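Random Patches can be sketched with scikit-learn's BaggingClassifier (assuming it is installed; its default base estimator is a decision tree): max_samples plays the role of ps and max_features that of pf. Illustrative data and values:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)
# Each of the T = 50 trees sees a random patch of half the samples
# and half the features.
patches = BaggingClassifier(n_estimators=50, max_samples=0.5,
                            max_features=0.5, random_state=0).fit(X, y)
print(round(patches.score(X, y), 3))
```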
34 / 40
Impact of memory constraint
[Figure: accuracy as a function of the memory constraint (0.1 to 0.5) for RP-ET, RP-DT, ET and RF ; accuracies range from about 0.80 to 0.94.]
35 / 40
Lessons learned from subsampling
• Training each estimator on the whole data is (often) useless. The size of the random patches can be reduced without (significant) loss in accuracy.
• As a result, both memory consumption and training time can be reduced, at low cost.
• With strong memory constraints, RP can exploit data better than the other methods.
• Sampling features is critical to improve accuracy. Sampling the examples only is often ineffective.
36 / 40
Outline
1 Motivation
2 Growing decision trees and random forests
3 Interpreting random forests
4 Implementing and accelerating random forests
5 Conclusions
37 / 40
Opening the black box
• Random forests constitute one of the most robust and effective machine learning algorithms for many problems.
• While simple in design and easy to use, random forests remain however :
- hard to analyze theoretically,
- non-trivial to interpret,
- difficult to implement properly.
• Through an in-depth re-assessment of the method, this dissertation has proposed original contributions on these issues.
38 / 40
Future works
Variable importances
• Theoretical characterization of variable importances in a finite setting.
• (Re-analysis of) empirical studies based on variable importances, in light of the results and conclusions of the thesis.
• Study of variable importances in boosting.
Subsampling
• Finer study of subsampling statistical mechanisms.
• Smart sampling.
39 / 40
Questions ?
40 / 40