
Page 1

Decision and regression tree ensemble methods and their applications in automatic learning

Louis Wehenkel

Department of Electrical Engineering and Computer Science
University of Liège - Institut Montefiore

ORBEL Symposium - March 16, 2005

Louis Wehenkel Extremely Randomised Trees et al. (1/56)

Page 2

Part I

Ensembles of extremely randomised trees
  ◮ Motivation(s)
  ◮ Extra-Trees algorithm
  ◮ Characterisation(s)

Tree-based batch mode reinforcement learning
  ◮ Problem setting
  ◮ Proposed solution
  ◮ Illustration

Pixel-based image classification
  ◮ Problem setting
  ◮ Proposed solution
  ◮ Some results
  ◮ Further refinements

Page 3

Supervised learning algorithm (Batch Mode)

◮ Inputs: learning sample ls of (x, y) observations (ls ∈ (X × Y)*)

◮ Output: a model f_A^ls ∈ F_A ⊂ Y^X (decision tree, MLP, ...)

[Figure: learning — a decision tree grown from the sample; internal nodes test attributes X1 ... X8 against thresholds (e.g. X1 < 31), and leaves predict the classes C1, C2, C3]

◮ Objectives:
  ◮ maximise accuracy on independent observations
  ◮ interpretability, scalability

Page 4

Induction of single decision/regression trees (Reminder)

◮ Algorithm development (1960-1995)

◮ Top-down growing of trees by recursive partitioning
  ◮ local optimisation of a split score (variance, entropy)

◮ Bottom-up pruning to prevent over-fitting
  ◮ global optimisation of complexity vs accuracy (bias/variance tradeoff)

◮ Characterisation
  ◮ Highly scalable algorithm
  ◮ Interpretable models (rules)
  ◮ Robustness: irrelevant variables, scaling, outliers
  ◮ Expected accuracy often low (high variance)

◮ Many variants and extensions
  ◮ C4.5, CART, ID3, ...
  ◮ oblique, fuzzy, hybrid, ...
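The local split-score optimisation above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the author's code) of the CART-style search over all attributes and cut-points with a variance score, as used for regression trees:

```python
import numpy as np

def variance_reduction(y, mask):
    # split score: parent variance minus the weighted variance of the two children
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    return np.var(y) - (len(left) * np.var(left) + len(right) * np.var(right)) / len(y)

def best_split(X, y):
    # local optimisation: scan every attribute and every candidate cut-point
    best = (None, None, -np.inf)
    for i in range(X.shape[1]):
        for cut in np.unique(X[:, i])[1:]:
            score = variance_reduction(y, X[:, i] < cut)
            if score > best[2]:
                best = (i, cut, score)
    return best

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 10.0, 10.0])
print(best_split(X, y))  # the best split is X0 < 2, which removes all variance
```

Growing a tree then just applies this search recursively to the two sub-samples until a stopping condition holds.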

Page 5

Bias/variance decomposition (of average error)

Accuracy of models produced by an algorithm in a given context

◮ Assume a problem (inputs X, outputs Y, relation P(X, Y)) and a sampling scheme (e.g. fixed size, LS ~ P^N(X, Y)).

◮ Take a model error function (e.g. Err_{f,Y} ≡ E_{X,Y}{(f(X) − Y)²}) and evaluate the expected error of algorithm A (i.e. Err_{A,Y} ≡ E_{LS}{Err_{f_A^ls,Y}}).

◮ We have Err_{A,Y} − Err_{B,Y} = Bias²_A + Var_A, where
  ◮ B is the best possible model (here, B(·) ≡ E{Y | ·})
  ◮ Bias²_A = Err_{f̄_A,B} (f̄_A is the average model)
  ◮ Var_A = Err_{A,f̄_A} (dependence of the model on the sample)

Page 6

Ensembles of trees (How?/Why?)

◮ Perturb and Combine paradigm (1990-2005)

◮ Build several trees (e.g. 100, by randomisation)

◮ Combine trees by voting, averaging. . . (i.e. aggregation)

◮ Characterisation
  ◮ Can preserve scalability (+ trivially parallel)
  ◮ Does not preserve interpretability
  ◮ Can preserve robustness (irrelevant variables, scaling, outliers)
  ◮ Can improve accuracy significantly

◮ Many generic variants (Bagging, Stacking, Boosting, . . . )

◮ Non-generic variants: (Random Forests, Random Subspace, . . . )

Page 7

Extra-Trees: learning algorithm

[Figure: an ensemble of five trees T1 ... T5]

◮ Ensemble of trees T1, T2, ..., T_T (generated independently)

◮ Random splitting (choice of variable and cut-point)

◮ Trees are fully developed (perfect fit on ls)

◮ Ultra-fast (complexity of order √n · N log N)

(Presentation based on [Geu02, GEW04])
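The method is available in scikit-learn as ExtraTreesClassifier. A quick sketch on a synthetic problem (the data set and parameter values are illustrative), showing how T and K map onto n_estimators and max_features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# synthetic problem; T = 100 trees, K = sqrt(n) attributes screened per split
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0)
et.fit(X, y)

# trees are fully developed by default, hence the perfect fit on the sample
print("training accuracy:", et.score(X, y))
```

The perfect training-set fit reflects the slide's point: each fully developed tree reproduces ls exactly, and accuracy on independent data comes from the aggregation.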

Page 8

Extra-Trees: prediction algorithm

◮ Aggregation (majority vote or averaging)

[Figure: the query point is dropped through each of the trees T1 ... T5; their class votes are aggregated and the majority class, here C2, is returned]

Page 9

Extra-Trees splitting algorithm (for numerical attributes)

Given a node of a tree and a sample S corresponding to it

◮ Select K attributes {X1, ..., XK} at random;

◮ For each Xi (draw a split at random):
  ◮ let x^S_{i,min} and x^S_{i,max} be the min and max values of Xi in S;
  ◮ draw a cut-point x_{i,c} uniformly in ]x^S_{i,min}, x^S_{i,max}];
  ◮ let t_i = [X_i < x_{i,c}].

◮ Return the split t* = arg max_{t_i} Score(t_i, S).

NB: the node becomes a LEAF
◮ if |S| < n_min;
◮ if all attributes are constant in S;
◮ if the output is constant in S.
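The splitting step can be sketched as follows. This is an illustrative assumption-laden sketch (names are made up, and a variance-reduction Score is used, as for regression; classification would use an information score instead):

```python
import numpy as np

rng = np.random.default_rng(1)

def score(mask, y):
    # regression score: variance reduction achieved by the candidate split
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    return np.var(y) - (len(left) * np.var(left) + len(right) * np.var(right)) / len(y)

def extra_trees_split(X, y, K):
    # pick K attributes at random, draw ONE random cut-point for each,
    # and return the best of the K candidate splits
    candidates = []
    for i in rng.choice(X.shape[1], size=K, replace=False):
        lo, hi = X[:, i].min(), X[:, i].max()
        if lo == hi:
            continue                      # constant attribute: no split possible
        cut = rng.uniform(lo, hi)         # cut-point drawn in ]x_min, x_max]
        candidates.append((score(X[:, i] < cut, y), i, cut))
    return max(candidates) if candidates else None

X = rng.random((100, 5))
y = (X[:, 0] > 0.5).astype(float)         # output driven by attribute 0
s, i, cut = extra_trees_split(X, y, K=3)
print(i, cut)
```

Note the contrast with the CART search: only one cut-point per attribute is ever scored, which is what makes the procedure both very fast and strongly randomised.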

Page 10

Geometric properties (of Single Trees)

[Figure: a single tree (ST) fit to a 1-D regression problem on [0, 1]; the true function and the learning sample are shown]

A single fully developed CART tree.

Page 11

Geometric properties (of Tree Bagging models)

[Figure: a Tree Bagging (TB) model fit to the same 1-D regression problem; the true function and the learning sample are shown]

With T = 100 trees in the ensemble.

Page 12

Geometric properties (of Tree Bagging models)

[Figure: a Tree Bagging (TB) model fit to the same 1-D regression problem; the true function and the learning sample are shown]

With T = 1000 trees in the ensemble.

Page 13

Geometric properties (of Extra-Trees models)

[Figure: an Extra-Trees (ET) model fit to the same 1-D regression problem; the true function and the learning sample are shown]

With T = 100 trees in the ensemble.

Page 14

Geometric properties (of Extra-Trees models)

[Figure: an Extra-Trees (ET) model fit to the same 1-D regression problem; the true function and the learning sample are shown]

With T = 1000 trees in the ensemble.

Page 15

Parameters (of the Extra-Trees learning algorithm)

Averaging strength T

[Figure: error as a function of the number of trees T (0–100) on Waveform and Friedman1, comparing PST, TB, RFd, ETd and ET1]

Page 16

Parameters (of the Extra-Trees learning algorithm)

Attribute selection strength K (w.r.t. irrelevant variables)

[Figure: error as a function of K on Two-Norm and Ring-Norm, with and without added irrelevant attributes]

Page 17

Bias/variance tradeoff (of the Extra-Trees models)

[Figure: bias/variance decompositions of ST, TB, RS*, RF*, ET* and ET1 on Friedman1, Pumadyn-32nm, Census-16H, Waveform, Two-Norm and Ring-Norm]

Page 18

Bias/variance tradeoff (of the Extra-Trees learning algorithm)

Effect of attribute selection strength K

[Figure: square error, bias and variance as a function of K on Friedman1 and Waveform]

Page 19

Extra-Trees: variants of setting K

Automatic tuning of K
◮ by (10-fold) cross-validation
◮ on a (large enough) independent test sample

Default settings
◮ K = √n in classification
◮ K = n in regression (n = number of variables)

Totally randomised trees
◮ correspond to K = 1
◮ splits (attribute and cut-point) chosen totally at random
◮ ultra-fast "non-supervised" learning algorithm
◮ tree structures independent of the output values
◮ akin to KNN or kernel-based methods
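The cross-validation tuning of K can be done, for instance, with scikit-learn, where max_features plays the role of K (in recent versions its classification default is "sqrt", matching K = √n above). A sketch on a synthetic problem, with all data and grid values being illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=16, random_state=0)

# 10-fold cross-validation over K = 1 (totally randomised trees) ... K = n
search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    param_grid={"max_features": [1, 4, 8, 16]},
    cv=10,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```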

Page 20

Ensembles of extremely randomised trees
  ◮ Motivation(s)
  ◮ Extra-Trees algorithm
  ◮ Characterisation(s)

Tree-based batch mode reinforcement learning
  ◮ Problem setting
  ◮ Proposed solution
  ◮ Illustration

Pixel-based image classification
  ◮ Problem setting
  ◮ Proposed solution
  ◮ Some results
  ◮ Further refinements

Page 21

Optimal control problem (stochastic, discrete-time, infinite horizon)

x_{t+1} = f(x_t, u_t, w_t) (stochastic dynamics, w_t ~ P_w(w_t | x_t, u_t))

r_t = r(x_t, u_t, w_t) (real-valued reward signal, bounded over X × U × W)

γ (discount factor ∈ [0, 1))

μ(·) : X → U (closed-loop, stationary control policy)

J^μ_h(x) = E{ Σ_{t=0}^{h−1} γ^t r(x_t, μ(x_t), w_t) | x_0 = x } (finite horizon return)

J^μ_∞(x) = lim_{h→∞} J^μ_h(x) (infinite horizon return)

Optimal infinite horizon control policy: the μ*_∞(·) that maximises J^μ_∞(x) for all x.

(Presentation based on [EGW03, EGW05])

Page 22

Batch mode reinforcement learning problem

Suppose that instead of the system model (f(·, ·, ·), r(·, ·, ·), P_w(·|·, ·)), the only information we have is a (finite) sample F of four-tuples:

F = {(x_t^i, u_t^i, r_t^i, x_{t+1}^i), i = 1, ..., #F}.

Each four-tuple corresponds to a system transition.

The objective of batch mode RL is to determine an approximation μ̂(·) of μ*_∞(·) from the sole knowledge of F.

(Many one-step episodes: the x_t^i distributed independently)
(One single episode with many steps: x_{t+1}^i = x_t^{i+1})
(In general: several multi-step episodes)

Page 23

Q-function iteration to solve Bellman equation

Idea: μ*_∞(·) can be obtained as the limit of a sequence of optimal finite horizon (time-varying) policies.

Define a sequence of value functions Q_h and policies μ*_h(t, x) by:

Q_0(x, u) ≡ 0
Q_h(x, u) = E_{w|x,u}{ r(x, u, w) + γ max_{u'} Q_{h−1}(f(x, u, w), u') } (∀h ∈ N)
μ*_h(t, x) = arg max_u Q_{h−t}(x, u) (∀h ∈ N, ∀t = 0, ..., h − 1)

NB: these sequences converge (Q_h → Q_∞ in sup norm, and μ*_h(t, x) → μ*_∞(x) as h → ∞)

Alternative view: (Bellman equation)

Q_∞(x, u) = E_{w|x,u}{ r(x, u, w) + γ max_{u'} Q_∞(f(x, u, w), u') }
μ*_∞(x) = arg max_u Q_∞(x, u)

Page 24

Fitted Q iteration algorithm

Idea 1: replace the expectation operator E_{w|x,u} by an average over the sample.
Idea 2: represent Q_h by a model, to interpolate from the samples.
Supervised learning (regression) does both in a single step.

◮ Inputs:
  ◮ a set F of four-tuples ((x_t^i, u_t^i, r_t^i, x_{t+1}^i), i = 1, ..., #F)
  ◮ a regression algorithm A (A : ls → f_A^ls)

◮ Initialisation: Q_0(x, u) ≡ 0

◮ Iteration: (for h = 1, 2, ...)
  ◮ Training set construction: (∀i = 1, ..., #F)
      x_i = (x_t^i, u_t^i);
      y_i = r_t^i + γ max_u Q_{h−1}(x_{t+1}^i, u)
  ◮ Q-function fitting: Q_h = A(ls) where ls = ((x_1, y_1), ..., (x_#F, y_#F))
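The iteration above can be sketched compactly. The toy chain MDP below, the regressor parameters and the exploration scheme are illustrative assumptions (not the slides' power-system experiment); only the fitted Q iteration structure is the point, with Extra-Trees regression as the algorithm A:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# toy chain MDP: states 0..4, actions u in {-1, +1},
# reward 1 whenever the transition reaches state 4, discount 0.9
GAMMA = 0.9

def step(x, u):
    x2 = int(np.clip(x + u, 0, 4))
    return x2, float(x2 == 4)

# sample F of four-tuples (x, u, r, x') covering every transition repeatedly
F = [(x, u) + step(x, u) for x in range(5) for u in (-1, 1)] * 20
X = np.array([(x, u) for x, u, r, x2 in F], dtype=float)
R = np.array([r for x, u, r, x2 in F])
X2 = np.array([x2 for x, u, r, x2 in F], dtype=float)

Q = None
for h in range(10):                      # fitted Q iteration
    if Q is None:
        y = R                            # Q_0 = 0, so y_i = r_i
    else:
        nxt = np.maximum(                # max over u' of Q_{h-1}(x', u')
            Q.predict(np.c_[X2, np.full(len(X2), -1.0)]),
            Q.predict(np.c_[X2, np.full(len(X2), 1.0)]))
        y = R + GAMMA * nxt
    Q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

# greedy policy: the learnt Q-function moves right everywhere before the goal
policy = [max((-1, 1), key=lambda u: Q.predict([[x, u]])[0]) for x in range(4)]
print(policy)
```

Each pass builds the regression training set from the four-tuples and refits the model, exactly mirroring the two bullets of the iteration step.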

Page 25

Coupling with tree-based models

Use tree-based regression as supervised learning algorithm

◮ Tree-based methods: ‘non-divergence’ to infinity

◮ Kernel-based methods: ‘convergence’ (when h → ∞)

◮ Tree structures frozen for h > h0 ⇒ kernel-based method

Solves at the same time

◮ System identification (implicitly)

◮ State-space discretisation (and curse-of-dimensionality)

◮ Bellman equation (iteratively and approximately)

Generality of the framework

◮ No strong hypotheses on f, r (discrete, continuous, high-dimensional)

◮ Minimum-time problems (define r(x, u, w) = 1_Goal(f(x, u, w)))

◮ Stabilisation problems (define r(x, u, w) = ||f(x, u, w) − x_ref||)

Page 26

Illustration - Electric power system stabilisation


Figure: Four-machine test system

◮ Use of a simulator + 1000 random episodes (60 s, Δt = 50 ms)

◮ 5-dimensional X × U space; F contains 1,100,000 four-tuples.

◮ "Reward": power oscillations and loss of stability (γ = 0.95)

◮ Policy learned by fitted Q-function iteration (h = 100) with Extra-Trees (T = 50; K = 5; n_min = 2)

Page 27

Electric power system stabilisation

[Figure: rotor angle δ13 (deg) vs time (s), comparing the uncontrolled response, a classical controller and the RL controller (local signal)]

Figure: system responses to a 100 ms, self-clearing short circuit

Page 28

Electric power system stabilisation

[Figure: rotor angle δ13 (deg) vs time (s), uncontrolled response vs RL control (local signal)]

Figure: 100 ms short circuit cleared by opening a line

Page 29

Electric power system stabilisation

[Figure: rotor angle δ13 (deg) vs time (s) for RL with the local signal only, with local and remote signals (no delay), and with local and remote signals (50 ms delay)]

Figure: Local vs remote signals with/without communication delay

Page 30

Ensembles of extremely randomised trees
  ◮ Motivation(s)
  ◮ Extra-Trees algorithm
  ◮ Characterisation(s)

Tree-based batch mode reinforcement learning
  ◮ Problem setting
  ◮ Proposed solution
  ◮ Illustration

Pixel-based image classification
  ◮ Problem setting
  ◮ Proposed solution
  ◮ Some results
  ◮ Further refinements

Page 31

Generic pixel-based image classification

Challenge:

Create a robust image classification algorithm by the sole use of supervised learning on the low-level, pixel-based representation of the images.

Question:

How to inject invariance (scale, translation, orientation) in a generic way into a supervised learning algorithm?

NB: this work mainly used Extra-Trees, but other supervised learners could also be used (e.g. SVMs, KNN, ...).

(Presentation based on [MGPW04, MGPW05])

Page 32

Examples

◮ Hand written digit recognition (0, 1, 2, ..., 9)

◮ Face classification (Jim, Jane, John, ...)

Page 33

Examples

◮ Texture classification (Metal, Bricks, Flowers, Seeds, ...)

Page 34

Examples

◮ Object recognition (Cup X, Bottle Y, Fruit Z, ...)

Page 35

Principle of solution (global)

◮ Learning sample of N pre-classified images,

  ls = {(a^i, c^i), i = 1, ..., N}

  a^i: vector of pixel values of the entire image
  c^i: image class

Page 36

Principle of solution (local)

[Figure: sub-windows extracted from training images of classes C1, C2 and C3]

Learning sample of N_w sub-windows (size w × w, pre-classified),

ls = {(a^i, c^i), i = 1, ..., N_w}

a^i: vector of pixel values of the sub-window
c^i: class of the mother image (from which the window was extracted)
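The construction of this sub-window learning sample can be sketched as follows (array sizes, names and the single-image setting are illustrative assumptions; a real setup would loop over all pre-classified training images):

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_subwindows(image, label, w, n):
    """Draw n random w-by-w sub-windows from one image; each sub-window
    inherits the class of the mother image it was extracted from."""
    H, W = image.shape
    sample = []
    for _ in range(n):
        r = rng.integers(0, H - w + 1)
        c = rng.integers(0, W - w + 1)
        sample.append((image[r:r + w, c:c + w].ravel(), label))
    return sample

image = rng.random((28, 28))              # one grey-level training image
ls = extract_subwindows(image, "C1", w=16, n=100)
print(len(ls), ls[0][0].shape)            # 100 (attribute vector, class) pairs
```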

Page 37

Local approach: prediction

[Figure: at prediction time, each sub-window of the query image is dropped through the trees T1 ... T5; the per-class vote vectors of all sub-windows are summed, and the image is assigned the class with the largest total, here C2]
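The aggregation rule amounts to a sum followed by an arg-max (the probability vectors below are made-up numbers, purely to show the mechanics):

```python
import numpy as np

# per-sub-window class-probability vectors returned by the tree ensemble
# (three sub-windows, three classes C1, C2, C3 -- values are illustrative)
votes = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.4, 0.3, 0.3],
])
total = votes.sum(axis=0)                 # sum the votes over all sub-windows
predicted = total.argmax()                # index 1, i.e. class C2
print(total, predicted)
```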

Page 38

Datasets and protocols

Dataset     # images   # base attributes       # classes   N_w       w
MNIST       70000      784 (28 × 28 × 1)       10          360,000   24
ORL         400        10304 (92 × 112 × 1)    40          120,000   20
COIL-100    7200       3072 (32 × 32 × 3)      100         120,000   16
OUTEX       864        49152 (128 × 128 × 3)   54          120,000   4

◮ MNIST: LS = 60000 images ; TS = 10000 images

◮ ORL: stratified cross-validation: 10 random splits, LS = 360; TS = 40

◮ COIL-100: LS = 1800 images ; TS = 5400 images (36 images per object)

◮ OUTEX: LS = 432 images (8 images per texture) ; TS = 432 images (8 images per texture)

Page 39

A few results: accuracy (error rates)

DBs        Extra-Trees     Extra-Trees with sub-windows   State-of-the-art
MNIST      3.26%           2.63%                          0.5% [DKN04]
ORL        4.56% ± 1.43    1.66% ± 1.08                   2.0% [Rav04]
COIL-100   1.96%           0.37%                          0.1% [OM02]
OUTEX      65.05%          2.78%                          0.2% [MPV02]

(Sub-window sizes: MNIST 24 × 24, ORL 20 × 20, COIL-100 16 × 16, OUTEX 4 × 4)

Page 40

A few results: CPU times

◮ Learning stage: depends on parameters
  MNIST: 6 h, ORL: 37 s, COIL-100: 1 h, OUTEX: 11 min

◮ Prediction: depends on parameters and sub-window sampling
  ◮ Exhaustive (all sub-windows):
    MNIST: 2 ms, ORL: 354 ms, COIL-100: 14 ms, OUTEX: 800 ms
  ◮ Random subset of sub-windows:
    MNIST: 1 ms, ORL: 10 ms, COIL-100: 5 ms, OUTEX: 33 ms

Page 41

Sub-windows of random size (robustness w.r.t. scale)

◮ Extraction of sub-windows of random size

◮ Rescaling to standard size

[Figure: sub-windows of random size are extracted from the training images of classes C1, C2, C3 and rescaled to a standard size]


Sub-windows of random size and orientation (more robustness)

◮ Extraction of sub-windows of random size
◮ + Random rotation
◮ Rescaling to standard size

(Figure: randomly rotated sub-windows of random size from images of classes C1, C2, C3, rescaled to a standard size and labelled with the class of their parent image.)
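The extraction step can be sketched as follows — a minimal stdlib-only illustration, not the authors' implementation; `extract_subwindow`, the nearest-neighbour rescaling, and the restriction of rotations to multiples of 90° are all simplifying assumptions:

```python
import random

def extract_subwindow(img, out_size, rng):
    """Cut a square sub-window of random size and position from a
    grayscale image (list of rows), rotate it by a random multiple of
    90 degrees, and rescale it to out_size x out_size with
    nearest-neighbour interpolation."""
    h, w = len(img), len(img[0])
    side = rng.randint(1, min(h, w))            # random sub-window size
    top = rng.randint(0, h - side)              # random position
    left = rng.randint(0, w - side)
    win = [row[left:left + side] for row in img[top:top + side]]
    for _ in range(rng.randint(0, 3)):          # random 90-degree rotation
        win = [list(r) for r in zip(*win[::-1])]
    # nearest-neighbour rescaling to the standard size
    return [[win[i * side // out_size][j * side // out_size]
             for j in range(out_size)] for i in range(out_size)]

rng = random.Random(0)
image = [[(r * 10 + c) for c in range(28)] for r in range(28)]
sw = extract_subwindow(image, 16, rng)
print(len(sw), len(sw[0]))  # 16 16
```

Every rescaled sub-window becomes one labelled training example for the tree ensemble, which is what buys the robustness to scale and orientation.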


Attribute importance measures (global approach)

Compute the information quantity (Shannon) brought by each pixel in each tree, and average over the trees.

(Importance maps shown for ORL (faces), MNIST (all digits), MNIST (0 vs 8).)
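A minimal sketch of this averaging, assuming a purely illustrative nested-dict tree representation (`pixel`, `gain`, `left`, `right`) rather than the real data structures:

```python
from collections import defaultdict

def pixel_importances(trees):
    """Sum, for each pixel, the (Shannon) information quantity of the
    test nodes that split on it, then average over the trees."""
    totals = defaultdict(float)

    def walk(node):
        if node is None or "pixel" not in node:
            return
        totals[node["pixel"]] += node["gain"]
        walk(node.get("left"))
        walk(node.get("right"))

    for t in trees:
        walk(t)
    return {p: g / len(trees) for p, g in totals.items()}

# two tiny hand-made trees (hypothetical values, for illustration only)
t1 = {"pixel": 5, "gain": 0.5,
      "left": {"pixel": 9, "gain": 0.25}, "right": None}
t2 = {"pixel": 5, "gain": 0.25}
print(pixel_importances([t1, t2]))  # {5: 0.375, 9: 0.125}
```

Rendering these averaged scores back onto the image grid gives the global importance maps shown above.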


Part II

Proteomics biomarker identification
  Problem setting
  Proposed solution
  Application to inflammatory diseases

Industrial (real-world) applications
  Steel-mill control
  Emergency control of power systems
  Failure analysis of manufacturing process
  SCADA system data mining

References


Supervised learning based methodology

(Pipeline: data → pre-processing (discretization, peak selection) → machine learning with cross-validation → model and importance ranking of attributes → biomarker selection → biomarkers.)

(Presentation based on [GFd+04])
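The selection step at the end of the pipeline could be sketched as below — a simplified, hypothetical version in which `importances`, the peak names, and the `error_of` callback stand in for the real cross-validated pipeline:

```python
def select_biomarkers(importances, error_of, sigma):
    """Rank attributes by decreasing importance, then grow the top-N
    subset, keeping the smallest N whose (cross-validated) error is
    still improving by more than sigma -- a simplified selection rule."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    best_err, best_n = float("inf"), 0
    for n in range(1, len(ranked) + 1):
        err = error_of(ranked[:n])
        if err < best_err - sigma:
            best_err, best_n = err, n
    return ranked[:best_n], best_err

# toy error model: error drops sharply once the two truly informative
# peaks are included (peak names are made up for the example)
imp = {"m2870": 0.9, "m4530": 0.7, "m1100": 0.1, "m6400": 0.05}
err = lambda subset: {1: 30.0, 2: 12.0, 3: 11.5, 4: 11.4}[len(subset)]
print(select_biomarkers(imp, err, sigma=1.0))  # (['m2870', 'm4530'], 12.0)
```

The `sigma` margin plays the role of the error bars in the accuracy-vs-N curves shown later: adding more biomarkers stops once the gain is within noise.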


RA and IBD

RA: Early diagnosis of Rheumatoid Arthritis

IBD Better understanding of Inflammatory Bowel Diseases

Datasets collected at University Hospital of Liege.

              Patients            Number of attributes
Dataset   #target  #others   Raw     p = .3%  p = .5%  p = 1%   Peaks
RA           68      138    15445     1026      626      319     136
IBD         240      240    13799     1086      664      338     152

Toolbox: Single trees, Tree Bagging, Tree Boosting, Random Forests, Extra-Trees


Biomarker identification

(Plots for the RA and IBD datasets: error rate Err (%) as a function of the number N of selected biomarkers, comparing the tree-based importance ranking with a random order and a p-values ranking; Err ± sigma shown.)

Figure: Variation of accuracy with number of biomarkers (Tree Boosting)


Graphical visualisation of biomarker identification (RA)


Steel-mill control (ULg, PEPITe, ARCELOR)

◮ Development of a friction model, taking into account steel quality and temperature
◮ Improve pre-setting of the steel-mill controller
◮ Reduce waste


Wide area control of power systems (ULg, PEPITe, Hydro-Quebec)

◮ Improve emergency control scheme
◮ Churchill-Falls power plant
◮ Reduce probability of blackout
◮ Reduce over/under-tripping
◮ Adjust load shedding scheme
◮ Database generation
◮ 10,000 real-time snapshots sampled (several years)
◮ Massive time-domain simulations

◮ New rules in operation

◮ Methodology has been adopted


Failure analysis of manufacturing process (PEPITe, Valeo)

Problem

◮ Car reflector manufacturing line
◮ High, unexplained defect rate
◮ 40 process parameters (T, H, pH, flow, ...) measured every 5 minutes

Approach

◮ Two-month data collection period
◮ Database of 10,000 × 40 measurements
◮ Data mining using the PEPITo software
◮ Identification of the root cause
◮ Defect rate reduced by 20%


SCADA system data mining (PEPITe, AREVA, TENNET)

Challenges faced by TENNET (South NL subsystem)

◮ Minimise exchanges of reactive power
◮ Formalise operators' actions
◮ Discover optimal network states
◮ Optimise forecasting of industrial loads
◮ Decide on network upgrades effectively
◮ Justify long-term planning decisions
◮ Validation of the state estimator

Goal of this project: show the value of Data Mining with respect to these challenges.

Based on 6 months of 15-minute sampling of 3200 data points

◮ 900 status, 2200 analog, 100 calculated

◮ Database: 16000 rows, 3200 columns


References

T. Deselaers, D. Keysers, and H. Ney. Classification error rate for quantitative evaluation of content-based image retrieval systems. In Proceedings of the 7th International Conference on Pattern Recognition, 2004.

D. Ernst, P. Geurts, and L. Wehenkel. Iteratively extending time horizon reinforcement learning. In Proceedings of the 14th European Conference on Machine Learning, 2003.

D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. To appear in Journal of Machine Learning Research, 2005.

P. Geurts. Contributions to decision tree induction: bias/variance tradeoff and time series classification. PhD thesis, Department of Electrical Engineering and Computer Science, University of Liege, May 2002.


P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Submitted for publication, 2004.

P. Geurts, M. Fillet, D. de Seny, M.-A. Meuwis, M.-P. Merville, and L. Wehenkel. Proteomic mass spectra classification using decision tree based ensemble methods. Submitted for publication, 2004.

R. Maree, P. Geurts, J. Piater, and L. Wehenkel. A generic approach for image classification based on decision tree ensembles and local sub-windows. In Proceedings of the 6th Asian Conference on Computer Vision, 2004.

R. Maree, P. Geurts, J. Piater, and L. Wehenkel. Random subwindows for robust image classification. To appear in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2005.


T. Maenpaa, M. Pietikainen, and J. Viertola. Separating color and pattern information for color texture discrimination. In Proceedings of the 16th International Conference on Pattern Recognition, 2002.

S. Obdrzalek and J. Matas. Object recognition using local affine frames on distinguished regions. In Electronic Proceedings of the 13th British Machine Vision Conference, 2002.

S. Ravela. Shaping receptive fields for affine invariance. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
