researcharticle forest pruning based on branch...

Research ArticleForest Pruning Based on Branch Importance

Xiangkui Jiang1 Chang-anWu2 and Huaping Guo2

1School of Automation Xirsquoan University of Posts and Telecommunication Xirsquoan Shaanxi 710121 China2School of Computer and Information Technology Xinyang Normal University Xinyang Henan 464000 China

Correspondence should be addressed to Huaping Guo hpguo_cm163com

Received 19 January 2017 Revised 29 March 2017 Accepted 30 April 2017 Published 1 June 2017

Academic Editor Michael Schmuker

Copyright copy 2017 Xiangkui Jiang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

A forest is an ensemble with decision trees as membersThis paper proposes a novel strategy to pruning forest to enhance ensemblegeneralization ability and reduce ensemble size Unlike conventional ensemble pruning approaches the proposed method tries toevaluate the importance of branches of trees with respect to the whole ensemble using a novel proposed metric called importancegain The importance of a branch is designed by considering ensemble accuracy and the diversity of ensemble members and thusthe metric reasonably evaluates how much improvement of the ensemble accuracy can be achieved when a branch is pruned Ourexperiments show that the proposed method can significantly reduce ensemble size and improve ensemble accuracy no matterwhether ensembles are constructed by a certain algorithm such as bagging or obtained by an ensemble selection algorithm nomatter whether each decision tree is pruned or unpruned

1 Introduction

Ensemble learning is a very important research topic inmachine learning and data mining The basic heuristic is tocreate a set of learners and aggregate the prediction of eachlearner for classifying examples Many approaches such asbagging [1] boosting [2] andCOPEN [3] have been proposedto create ensembles and the key to the success of theseapproaches is that base learners are accurate and diverse [4]

Ensemble methods have been applied to many applica-tions such as image detection [5ndash7] and imbalanced learn-ing problem [8] However an important drawback existingin ensemble learning approaches is that they try to trainunnecessarily large ensembles Large ensembles need a largememory for storing the bases learners and much responsetime for prediction Besides large ensemble may reduce itsgeneralization ability instead of increasing the performance[9] Therefore a lot of researches to tackle this problemhave been carried out and the researches mainly focus onensemble selection selecting a subset of ensemble membersfor prediction such as ordered-based ensemble selectionmethods [10ndash12] and greedy heuristic based ensemble selec-tionmethods [13ndash21]The research results indicate that awell-designed ensemble selection method can reduce ensemblesize and improve ensemble accuracy

Besides ensemble selection we can prune an ensemblethrough the following two approaches if ensemble membersare decision trees (1) pruning individual members separatelyand combining the pruned members together for predictionand (2) repeatedly pruning individual members by consid-ering the overall performance of the ensemble For the firststrategy many decision tree pruning methods such as thoseused inCART [22] andC45 [23] have been studiedAlthoughpruning can simplify model structure whether pruning canimprove model accuracy is still a controversial topic inmachine learning [24] The second strategy coincides withthe expectation of improving model generalization abilityglobally However this method has not been extensivelystudied This paper focuses on this strategy and names thestrategy as forest pruning (FP)

The major job of forest pruning is to define an effectivemetric evaluating the importance of a certain branch of treesTraditionalmetrics can not be applied to forest pruning sincethese metrics just consider the influence on a single decisiontree when a branch is pruned Therefore we need a newmetric for pruning forest Our contributions in this paper areas follows

(i) Introduce a new ensemble pruning strategy to prunedecision tree based ensemble

HindawiComputational Intelligence and NeuroscienceVolume 2017 Article ID 3162571 11 pageshttpsdoiorg10115520173162571

2 Computational Intelligence and Neuroscience

(ii) propose a novel metric to measure the improvementof forest performance when a certain node grows intoa subtree

(iii) present a new ensemble pruning algorithm withthe proposed metric to prune a decision tree basedensemble The ensemble can be learned by a cer-tain algorithm or obtained by some ensemble selec-tion method Each decision tree can be pruned orunpruned

Experimental results show that the proposed methodcan significantly reduce the ensemble size and improve itsaccuracy This result indicates that the metric proposed inthis paper reasonably measures the influence on ensembleaccuracy when a certain node grows into a subtree

The rest of this paper is structured as follows Section 2provides a survey of ensemble of decision trees Section 3presents the formal description of forest trimming and themotivation of this study by an example Section 4 introduces anew forest pruning algorithm Section 5 reports and analyzesexperimental results and we conclude the paper with simpleremark and future work in Section 6

2 Forests

A forest is an ensemble whose members are learned bydecision tree learningmethod Two approaches are oftenusedto train a forest traditional approaches and the methodsspecially designed for forests

Bagging [1] and boosting [2] are the two most oftenused traditional methods to build forests Bagging takesbootstrap samples of objects and trains a tree on each sampleThe classifier votes are combined by majority voting Insome implementations classifiers produce estimates of theposterior probabilities for the classes These probabilities areaveraged across the classifiers and the most probable classis assigned called ldquoaveragerdquo or ldquomeanrdquo aggregation of theoutputs Bagging with average aggregation is implemented inWeka and used in the experiments in this paper Since eachindividual classifier is trained on a bootstrap sample the datadistribution seen during training is similar to the originaldistribution Thus the individual classifiers in a baggingensemble have relatively high classification accuracy Thefactor encouraging diversity between these classifiers is theproportion of different examples in the training set Boostingis a family of methods and Adaboost is the most prominentmember The idea is to boost the performance of a ldquoweakrdquoclassifier (can be decision tree) by using it within an ensemblestructure The classifiers in the ensemble are added one ata time so that each subsequent classifier is trained on datawhich have been ldquohardrdquo for the previous ensemble membersA set of weights is maintained across the objects in the dataset so that objects that have been difficult to classify acquiremore weight forcing subsequent classifiers to focus on them

Random forest [25] and rotation forest [26] are twoimportant approaches specially designed for building forestsRandom forest is a variant version of bagging The forest isbuilt again on bootstrap samples The difference lies in theconstruction of the decision treeThe feature to split a node isselected as the best feature among a set of119872 randomly chosen

features where 119872 is a parameter of the algorithm This smallalteration appeared to be a winning heuristic in that diversitywas introduced without much compromising the accuracy ofthe individual classifiers Rotation forest randomly splits thefeature set into 119870 subsets (119870 is a parameter of the algorithm)and Principal Component Analysis (PCA) [27] is applied toeach subset All principal components are retained in orderto preserve the variability information in the data Thus 119870axis rotations take place to form the new features and rotationforest building a tree using all training set in the new spacedefined by a given new feature space

3 Problem Description and Motivation

31 Problem Description Let 119863 = (119909119894 119910119894) | 119894 = 1 2 119873be a data set and let 119865 = 1198791 119879119872 be an ensemble withdecision tree 119879119894 learning from 119863 Denote by V isin 119879119905 a nodein tree 119879 and by 119864(V) isin 119863 the set of the examples reaching Vfrom the root of119879 root(119879) Suppose each node V isin 119879 containsa vector (119901V

1 119901V2 119901V

119870) where 119901V119896 is the proportion of the

examples in 119864(V) associated with label 119896 If V isin 119879119894 is a leaf andx119894 isin 119864(V) the prediction of 119879119895 on x119894 is

119879119895 (x119894) = argmax119896

119901V119896 (1)

Similarly for each example x119895 to be classified ensemble 119865returns a vector (1199011198951 1199011198952 119901119895119870) indicating that x119895 belongsto label 119896 with probability 119901119895119896 where

119901119895119896 =1119872119872

sum119895=1

119901(119894)119895119896 119896 = 1 2 119870 (2)

The prediction of 119865 on x119895 is 119865(x119895) = argmax119896119901119895119896Now our problem is given a forest 119865 with 119872 decision

trees how to prune each tree to reduce 119865rsquos size and improveits accuracy where 119865 is either constructed by some algorithmor obtained by some ensemble selection method

32 Motivation First let us look at an example which showsthe possibility that forest trimming can improve ensembleaccuracy

Example 1 Let 119865 = 1198790 1198791 1198799 be a forest with tendecision trees where 1198791 is shown in Figure 1 Suppose that119901V1 = 060 119901V

2 = 040 119901V11 = 100 119901V1

2 = 000 119901V21 = 020

and 119901V22 = 080 Let ten examples x0 x1 x9 reach node V

where x0 x5 associate with label 1 and x6 x9 associatewith label 2 Assume examples x0 x1 x4 reach leaf nodeV1 and x5 x9 reach leaf node V1

Obviously for 1198790 we can not prune the children of nodeV since treating V as a leaf would lead to more examplesincorrectly classified by 1198790

Assume that 119865rsquos predictions on x0 x1 x9 are as fol-lows

11990101 = 06511990111 = 070

Computational Intelligence and Neuroscience 3

Root

1 2

Figure 1 Decision tree 1198790 V is a test node and V1 and V2 are two leaves

11990121 = 07011990131 = 06511990141 = 08011990151 = 04911990161 = 03011990171 = 01911990181 = 02011990191 = 03011990102 = 03511990112 = 03011990122 = 03011990132 = 03511990142 = 02011990152 = 05111990162 = 07011990172 = 08111990182 = 08011990192 = 070

(3)

where119901119895119896 is the probability of x119895 associatedwith label 119896 From119865rsquos predictions shown above we have that x6 is incorrectlyclassified by 119865 Update 1198790 to 11987910158400 by pruning Vrsquos children andupdate 119865 to 1198651015840 = 11987910158400 1198791 1198799 A simple calculation tells usthat for the ten examples 1198651015840 returns

11990101 = 06111990111 = 06511990121 = 065

11990131 = 06511990141 = 07511990151 = 05211990161 = 03311990171 = 02211990181 = 02311990191 = 03311990102 = 04011990112 = 03511990122 = 03511990132 = 03511990142 = 02511990152 = 04811990162 = 06711990172 = 07811990182 = 07711990192 = 067

(4)

It is easy to see that 1198651015840 correctly classifies all of the tenexamples

This example shows that if a single decision tree is consid-ered maybe it should not be pruned any more However forthe forest as a whole it is still possible to prune some branchesof the decision tree and this pruning will probably improvethe ensemble accuracy instead of reducing it

Although the example above is constructed by us similarcases can be seen everywhere when we study ensemblesfurther It is this observation that motivates us to study foresttrimming methods However more efforts are needed to


turn possibility into feasibility Further discussions about thisproblem will be presented in the next section

4 Forest Pruning Based onBranch Importance

41 The Proposed Metric and Algorithm Idea To avoid trap-ping in detail too early we assume that 119868(V 119865 x119895) has beendefined which is the importance of node V when forest119865 classifies example x119895 If x119895 notin 119864(V) then 119868(V 119865 x119895) =0 Otherwise the details of the definition of 119868(V 119865 x119895) arepresented in Section 32

Let 119879 isin 119865 be a tree and let V isin 119879 be a node The impor-tance of V with respect to forest 119865 is defined as

119868 (V 119865) = sumx119895isin1198631015840

119868 (V 119865 x119895) = sumx119895isin119864(V)

119868 (V 119865 x119895) (5)

where 1198631015840 is a pruning set and 119864(V) is the set of the example in1198631015840 reaching node V from root(119879) 119868(V 119865) reflects the impactof node V on 119865rsquos accuracy

Let 119871(V) be the set of leaf nodes of branch(V) the branch(subtree) with V as the root The contribution of branch(V) to119865 is defined as

119868 (branch (V) 119865) = sumV1015840isin119871(V)

119868 (V1015840 119865) (6)

which is the sum of the importance of leaves in branch(V)Let V isin 119879 be a nonterminal node The importance gain

of V to 119865 is defined by the importance difference betweenbranch(V) and node V that is

IG (V 119865) = 119868 (branch (V)) minus 119868 (V 119865) (7)

IG(V 119865) can be considered as the importance gain ofbranch(V) and its value reflects how much improvementof the ensemble accuracy is achieved when V grows into asubtree If IG(V 119865) gt 0 then this expansion is helpful toimprove 119865rsquos accuracy Otherwise it is unhelpful to improveor even reduce 119865rsquos accuracy

The idea of the proposed method of pruning ensembleof decision trees is as follows For each nonterminal nodeV in each tree 119879 calculate its importance gain IG(V 119865) onthe pruning set If IG(V 119865) is smaller than a threshold prunebranch(V) and treat V as a leafThis procedure continues untilall decision trees can not be pruned

Before presenting the specific details of the proposedalgorithmwe introduce how to calculate 119868(V 119865 x119895) in the nextsubsection

42 Con(V 119865 x119895) Calculation Let ℎ be a classifier and let119878 be an ensemble Partalas et al [28 29] identified that theprediction of ℎ and 119878 on an example x119895 can be categorizedinto four cases (1) 119890119905119891 ℎ(x119894) = 119910119894 and 119878(x119894) = 119910119894 (2) 119890119905119905 ℎ(x119894) =119910119894 and 119878(x119894) = 119910119894 (3) 119890119891119905 ℎ(x119894) = 119910119894 and 119878(x119894) = 119910119894 (4) 119890119891119905ℎ(x119894) = 119910119894 and 119878(x119894) = 119910119894 They concluded that considering allfour cases is crucial to design ensemble diversity metrics

Based on the four cases above Lu et al [11] introduced ametric IC(119895)119894 to evaluate the contribution of the 119894th classifier

to 119878 when 119878 classifies the 119895th instance Partalas et al [28 29]introduced ameasure calledUncertaintyWeighted AccuracyUWA119863(ℎ 119878 x119895) to evaluate ℎrsquos contribution when 119878 classifiesexample x119895

Similar to the discussion above we define119890119905119891 (V)

= x119895 | x119895 isin 119864 (V) and 119879 (x119895) = 119910119895 and 119865 (x119895) = 119910119895

119890119905119905 (V)


119890119891119905 (V)


119890119891119891 (V)


(8)

In the following discussions we assume that V isin 119879 andx119895 isin 119864(V) Let 119891119898 and 119891119904 be the subscripts of the largestelement and the second largest element in 1199011198951 119901119895119870respectively Obviously 119891119898 is the label of x119895 predicted byensemble 119865 Similarly let 119905119898 = argmax119896(119901V

1 119901V119870) If V is

a leaf node then 119905119898 is the label of x119895 predicted by decisiontree 119879 Otherwise 119905119898 is the label of x119895 predicted by 1198791015840 where1198791015840 is the decision tree obtained from 119879 by pruning branch(V)For simplicity we call 119905119898 the label of x119895 predicted by node Vand say node V correctly classifies x119895 if 119905119898 = 119910119895

We define 119868(V 119865 x119895) based on the four cases in formula(8) respectively If x119895 isin 119890119905119891(V) or x119895 isin 119890119905119905(V) thenCon(V 119865 x119895) ge 0 since V correctly classifies x119895 OtherwiseCon(V 119865 x119895) lt 0 since V incorrectly classifies x119895

For x119895 isin 119890119905119891(V) Con(V 119865 x119895) is defined as

119868 (V 119865 x119895) =119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119905119898 + 1119872) (9)

where 119872 is the number of base classifiers in 119865 Here 119905119898 = 119910119895and 119891119898 = 119910119895 then 119901V

119905119898ge 119901V119891119898

119901119895119891119898 ge 119901119895119905119898 and thus0 le Con(V 119865 x119895) le 1 Since 119901V

119891119898is the contribution of node V

to the probability that 119865 correctly predicates x119895 belonging toclass 119905119898 while 119901V

119891119898is the contribution of node V to 119901119895119891119898 the

probability that 119865 incorrectly predicates x119895 belongs to class119891119898 (119901V

119905119898minus 119901V119891119898

)119872 can be considered as the net importanceof node V when 119865 classifies x119895 119901119895119891119898 minus 119901119895119905119898 is the weight ofVrsquos net contribution which reflects the importance of nodeV for classifying x119895 correctly The constant 1119872 is to avoid119901119895119891119898 minus 119901119895119905119898 being zero or too small


119868 (V 119865 x119895) =119901V119905119898

minus 119901V119905119904

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (10)

Here 0 le Con(V 119865 x119895) le 1 In this case both V and119865 correctly classify x119895 We treat (119901V

119905119898minus 119901V119905119904)119872 as the net


Input pruning set 1198631015840 forest 119865 = 1198791 1198792 119879119898 where 119879119894 is a decision treeOutput pruned forest 119865Method(1) for each x119895 isin 1198631015840(2) Evaluate 119901119895119896 1 le 119896 le 119870(3) for each 119879119894 isin 119865 do(4) for each node V in 119879119894 do(5) 119868V larr 0(6) for each x119895 isin 1198631015840 do(7) 119902119895119896 larr 119901119894119895119896 1 le 119896 le 119870(8) Let 119875 be the path along which x119895 travels(9) for each node V isin 119875(10) 119868V larr 119868V + 119868(V 119865 x119895)(11) PruningTree(root(119879119894))(12) for each x119895 isin 1198631015840(13) 119903119895119896 larr 119901119894119895119896 1 le 119896 le 119870(14) 119901119895119896 larr 119901119895119896 minus 119902119895119896119872 + 119903119895119896119872Procedure PruningTree(V)(1) if V is not a leaf then(2) IG larr 119868V(3) 119868br(V) larr 0(4) for each child 119888 of V(5) PruningTree(c)(6) 119868br(V) larr 119868br(V) + 119868br(119888)(7) IG = 119868br(V) minus 119868V(8) if IG lt 120575 then(9) Prune subtree(V) and set V to be a leaf

Algorithm 1 The procedure of forest pruning

contribution of node V to119865 and x119895 and119901119895119891119898minus119901119895119891119904 as theweightof Vrsquos net contribution


119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (11)

It is easy to prove minus1 le Con(V 119865 x119895) le 0This case is opposedto the first case In this case we treat minus(119901V

119905119898minus 119901V119891119898

)119872 as thenet contribution of node V to 119865 and x119895 and 119901119895119891119898 minus 119901119895119891119904 as theweight of Vrsquos net contribution


119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119910119895

119872(119901119895119891119898 minus 119901119895119910119895 + 1119872) (12)

where 119910119895 isin 1 119870 is the label of x119895 minus1 le Con(V 119865 x119895)le 0 In this case both V and 119865 incorrectly classify x119895 namely119905119898 = 119910119895 and 119891119898 = 119910119895 We treat minus(119901V

119905119898minus 119901V119910119895

)119872 as the netcontribution of node V to 119865 and x119895 and 119901119895119891119898 minus 119901119895119910119895 as theweight of Vrsquos net contribution

43 Algorithm The specific details of forest pruning (FP) areshown in Algorithm 1 where

1198631015840 is a pruning set containing 119899 instances

119901119895119896 is the probability that ensemble119865 predicts x119895 isin 1198631015840associated with label 119896119901119894119895119896 is the probability that current tree 119879119894 predicts x119895 isin1198631015840 associated with label 119896119868V is a variant associated with node V to save Vrsquosimportance119868br(V) is a variant associated with node V to save thecontribution of branch(V)

FP first calculates the probability of119865rsquos prediction on eachinstance x119895 (lines (1)sim(2)) Then it iteratively deals with eachdecision tree 119879119894 (lines (3)sim(14)) Lines (4)sim(10) calculate theimportance of each node V isin 119879119894 where 119868(V 119865 x119895) in line (10)is calculated using one of the equations (9)sim(12) based onthe four cases in equation (8) Line (11) calls PruningTree(V)to recursively prune 119879119894 Since forest 119865 has been changedafter pruning 119879119894 we adjust 119865rsquos prediction in lines (12)ndash(14)Lines (3)ndash(14) can be repeated many times until all decisiontrees can not be pruned Experimental results show thatforest performance is stable after this iteration is executed 2times

The recursive procedure PruningTree(V) adopts abottom-up fashion to prune the decision tree with V asthe root After pruning branch(V) (subtree(V)) 119868V saves thesum of the importance of leaf nodes in branch(V) Then119868(branch(V) 119865) is equal to the sum of importance of thetree with V as root The essence of using 119879119894rsquos root to call


PruningTree is to travel 119879119894 If current node V is a nonleafthe procedure calculates Vrsquos importance gain IG saves into119868V the importance sum of the leaves of branch(V) (lines(2)sim(7)) and determines pruning branch(V) or not based onthe difference between CG and the threshold value 120575 (lines(8)sim(9))

44 Discussion Suppose pruning set1198631015840 contains 119899 instancesforest119865 contains119872decision trees and119889max is the depth of thedeepest decision tree in 119865 Let |119879119894| be the number of nodesin decision tree 119879119894 and 119905max = max1le119894le119872(|119879119894|) The runningtime of FP is dominated by the loop from lines (4) to (19)The loop from lines (5) to (7) traverses 119879119894 which is can bedone in 119874(119905max) the loop from lines (8) to (14) searches apath of 119879119894 for each instance in 1198631015840 which is complexity of119874(119899119889max) the main operation of PruningTree(root(119879119894)) is acomplete traversal of 119879119894 whose running time is 119874(119905max) theloop from lines (16) to (18) scans a linear list of length 119899 in119874(119899) Since 119905max ≪ 119899119889max we conclude the running timeof FP is 119874(119899119872119889max) Therefore FP is a very efficient forestpruning algorithm

Unlike traditional metrics such as those used by CART[22] and C45 [23] the proposed measure uses a global eval-uation Indeed this measure involves the prediction valuesthat result from a majority voting of the whole ensembleThus the proposed measure is based on not only individualprediction properties of ensemble members but also thecomplementarity of classifiers

From equations (9) (10) (11) and (12) our proposedmeasure takes into account both the correctness of predic-tions of current classifier and the predictions of ensembleand the measure deliberately favors classifiers with a betterperformance in classifying the samples on which the ensem-ble does not work well Besides the measure considers notonly the correctness of classifiers but also the diversity ofensemble members Therefore using the proposed measureto prune an ensemble leads to significantly better accuracyresults

5 Experiments

51 Experimental Setup 19 data sets of which the details areshown in Table 1 are randomly selected from UCI repertory[30] where Size Attrs and Cls are the size attributenumber and class number of each data set respectively Wedesign four experiments to study the performance of theproposed method (forest pruning FP)

(i) The first experiment studies FPrsquos performance versusthe times of running FP Here four data sets thatis autos balance-scale German-credit and pima areselected as the representatives and each data set israndomly divided into three subsets with equal sizewhere one is used as the training set one as thepruning set and the other one as the testing setWe repeat 50 independent trials on each data setTherefore a total of 300 trials of experiments areconducted

Table 1 The details of data sets used in this paper

Data set Attrs Size ClsAustralian 226 70 24Autos 205 26 6Backache 180 33 2Balance-scale 625 5 3Breast-cancer 268 10 2Cars 1728 7 4Credit-rating 690 16 2German-credit 1000 21 2Ecoli 336 8 8Hayes-roth 160 5 4Heart-c 303 14 5Horse-colic 368 24 2Ionosphere 351 35 2Iris 150 5 3Lymph 148 19 4Page-blocks 5473 11 5Pima 768 9 2prnn-fglass 214 10 6Vote 439 17 2

(ii) The second experiment is to evaluate FPrsquos perfor-mance versus FLrsquos size (number of base classifiers)The experimental setup of data sets is the same as thefirst experiment

(iii) The third experiment aims to evaluate FPrsquos perfor-mance on pruning ensemble constructed by bagging[1] and random forest [26] Here tenfold cross-validation is employed each data set is divided intotenfold [31 32] For each one the other ninefold is totrain model and the current one is to test the trainedmodelWe repeat 10 times the tenfold cross-validationand thus 100 models are constructed on each dataset Here we set the training set as the pruning setBesides algorithm rank is used to further test theperformance of algorithms [31ndash33] on a data set thebest performing algorithm gets the rank of 10 thesecond best performing algorithmgets the rank of 20and so on In case of ties average ranks are assigned

(iv) The last experiment is to evaluate FPrsquos performance onpruning the subensemble obtained by ensemble selec-tion method EPIC [11] is selected as the candidate ofensemble selection methods The original ensembleis a library with 200 base classifiers and the size ofsubsembles is 30 The setup of data sets is the same asthe third experiment

In the experiments bagging is used to train originalensemble and the base classifier is J48 which is a Javaimplementation of C45 [23] from Weka [34] In the thirdexperiment random forest is also used to build forest In thelast three experiments we run FP two times


0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500

Autos Balance-scale German-credit Pima(a)

0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74

German-credit 0 2 4 6 8 10

745

75

755

PimaBalance-scale

(b)

Figure 2 Results on data sets (a) Forest size (node number) versus the times of running FP (b) Forest accuracy versus the times of runningFP

52 Experimental Results The first experiment is to inves-tigate the relationship of the performance of the proposedmethod (FP) and the times of running FP In each trial wefirst use bagging to learn 30 unpruned decision trees as aforest and then iteratively run lines (3)sim(14) of FP manytimes to trim the forest More experimental setup refers toSection 51 The corresponding results are shown in Figure 2where the top four subfigures are the variation trend offorest nodes number with the iteration number increasingand the bottom four are the variation trend of ensembleaccuracy Figure 2 shows that FP significantly reduces forestssize (almost 40sim60of original ensemble) and significantlyimproves their accuracy However the performance of FP isalmost stable after two iterations Therefore we set iterationnumber to be 2 in the following experiments

The second experiment aims at investigating the per-formance of FP on pruning forests with different scalesThe number of decision trees grows gradually from 10 to200 More experimental setup refers to Section 51 Theexperimental results are shown in Figure 3 where the topfour subfigures are the comparison between pruned andunpruned ensembles with the growth of the number ofdecision trees and the bottom four are the comparisonof ensemble accuracy As shown in Figure 3 for eachdata set the rate of forest nodes pruned by FP keepsstable and forests accuracy improved by FP is also basi-cally unchanged no matter how many decision trees areconstructed

The third experiment is to evaluate the performance of FPon pruning the ensemble constructed by ensemble learningmethodThe setup details are shown in Section 51 Tables 2 34 and 5 show the experimental results of comparedmethods

respectively where Table 2 reports themean accuracy and theranks of algorithms Table 3 reports the average ranks usingnonparameter Friedman test [32] (using STACWeb Platform[33]) Table 4 reports the comparing results using post hocwith Bonferroni-Dunn (using STAC Web Platform [33]) of005 significance level and Table 5 reports the mean nodenumber and standard deviations Standard deviations are notprovided in Table 2 for clarity The column of ldquoFPrdquo of Table 2is the results of pruned forest and ldquobaggingrdquo and ldquorandomforestrdquo are the results of unpruned forests constructed bybagging and random forest respectively In Tables 3 and 4Alg1 Alg2 Alg3 Alg4 Alg5 and Alg6 indicate PF pruningbagging with unpruned C45 bagging with unpruned C45PF pruning bagging with pruned C45 bagging with prunedC45 PF pruning random forest and random forest FromTable 2 FP significantly improves ensemble accuracy in mostof the 19 data sets nomatter whether the individual classifiersare pruned or unpruned no matter whether the ensemble isconstructed by bagging or random forest Besides Table 2shows that the ranks of FP always take place of best threemethods in these data sets Tables 3 and 4 validate the resultsin Table 2 where Table 3 shows that the average rank of PFis much small than other methods and Table 4 shows thatcompared with other methods PF shows significant betterperformance Table 5 shows FP is significantly smaller thanbagging and random forest nomatter whether the individualclassifier is pruned or not

The last experiment is to evaluate the performance ofFP on pruning subensembles selected by ensemble selectionmethod EPIC Table 6 shows the results on the 19 data setswhere left and right are the accuracy and size respectivelyAs shown in Table 6 FP can further significantly improve the


0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76

Autos Balance-scale German-credit Pima(b)

Figure 3 Results on data sets (a) Forest size (node number) versus the number of decision trees (b) Forest accuracy versus the number ofdecision trees Solid curves and dash curves represent the performance of FP and bagging respectively

Table 2 The accuracy of FP bagging and random forest ∙ represents that FP outperforms bagging in pairwise t-tests at 95 significancelevel and denotes that FP is outperformed by bagging

Dataset Unpruned C45 Pruned C45 PF RFPF Bagging PF Bagging

Australian 8714 (20) 8609 (50)∙ 8680 (30) 8586 (60)∙ 8721 (10) 8614 (40)∙Autos 7440 (20) 7330 (40)∙ 7420 (30) 7320 (50)∙ 7472 (10) 7310 (60)∙Backache 8507 (30) 8317 (55)∙ 8589 (10) 8317 (55)∙ 8521 (20) 8322 (40)∙Balance-scale 7889 (30) 7507 (60)∙ 7979 (10) 7664 (40)∙ 7965 (20) 7632 (50)∙Breast-cancer 6998 (20) 6710 (50)∙ 6997 (30) 6658 (60)∙ 7011 (10) 6888 (40)∙Cars 8651 (40) 8678 (20) 8688 (10) 8628 (50) 8655 (30) 8611 (60)Credit-rating 8644 (20) 8554 (40)∙ 8634 (30) 8543 (50)∙ 8682 (10) 8542 (60)∙German-credit 7533 (10) 7383 (40)∙ 7486 (30) 7311 (60)∙ 7522 (20) 7318 (50)∙Ecoli 8447 (20) 8332 (60)∙ 8420 (30) 8340 (50)∙ 8452 (10) 8389 (40)∙Hayes-roth 7875 (30) 7863 (50) 7877 (10) 7631 (60)∙ 7876 (20) 7777 (40)Heart-c 8094 (20) 8034 (50) 8101 (10) 8027 (60) 8090 (30) 8087 (40)Horse-colic 8452 (10) 8329 (60)∙ 8433 (20) 8342 (50)∙ 8431 (30) 8399 (40)Ionosphere 9399 (10) 9393 (20) 9359 (60) 9371 (40) 9387 (30) 9356 (50)Iris 9355 (60) 9424 (40) 9452 (30) 9453 (20) 9421 (50) 9462 (10)Lymphography 8381 (50) 8343 (60) 8455 (20) 8453 (30) 8438 (40) 8482 (10)Page-blocks 9703 (45) 9704 (25) 9704 (25) 9706 (10) 9703 (45) 9701 (60)Pima 7509 (30) 7427 (40)∙ 7546 (10) 7406 (50)∙ 7543 (20) 7321 (60)∙prnn-fglass 7814 (40) 7846 (10) 7762 (60) 7784 (50) 7818 (30) 7832 (20)Vote 9577 (10) 9513 (60)∙ 9567 (30) 9533 (40) 9572 (20) 9531 (50)


Table 3 The ranks of algorithms using Friedman test where Alg1 Alg2 Alg3 Alg4 Alg5 and Alg6 indicate PF pruning bagging withunpruned C45 bagging with unpruned C45 PF pruning bagging with pruned C45 bagging with pruned C45 PF pruning random forestand random forest

Algorithm Alg5 Alg3 Alg1 Alg2 Alg6 Alg4Ranks 239 250 271 432 442 466

Table 4 The testing results using post hoc Alg1 Alg2 Alg3 Alg4 Alg5 and Alg6 indicate PF pruning bagging with unpruned C45 baggingwith unpruned C45 PF pruning bagging with pruned C45 bagging with pruned C45 PF pruning random forest and random forest

Comparison Statistic 119901 valueAlg1 versus Alg2 264469 004088Alg3 versus Alg4 355515 000189Alg5 versus Alg6 333837 001264

Table 5 The size (node number) of PF and bagging ∙ denotes that the size of PF is significantly smaller than the corresponding comparingmethod

Dataset Unpruned C45 Pruned C45 PF-RF RFPF Bagging PF Bagging

Australian 444082 plusmn 22324 595006 plusmn 21053∙ 219471 plusmn 9965 289788 plusmn 9866∙ 198967 plusmn 9965 265388 plusmn 9961∙Autos 113483 plusmn 19345 181319 plusmn 18349∙ 98782 plusmn 19822 152332 plusmn 19322∙ 95426 plusmn 19822 142912 plusmn 18221∙Backache 116279 plusmn 9658 159280 plusmn 7597∙ 51877 plusmn 4049 76424 plusmn 3778∙ 52274 plusmn 4049 78923 plusmn 4562∙Balance-scale 345852 plusmn 7455 462058 plusmn 7820∙ 300044 plusmn 7176 376260 plusmn 6555∙ 296744 plusmn 7176 376319 plusmn 7946∙Breast-cancer 216464 plusmn 15641 319420 plusmn 14495∙ 84396 plusmn 12944 118933 plusmn 15408∙ 88666 plusmn 12944 101121 plusmn 14892∙Cars 174168 plusmn 6059 209220 plusmn 14495∙ 156911 plusmn 5755 183491 plusmn 4680∙ 142132 plusmn 5665 189992 plusmn 6888∙Credit-rating 437065 plusmn 21927 594051 plusmn 22351∙ 216811 plusmn 12151 290440 plusmn 9973∙ 201521 plusmn 14058 265040 plusmn 10213∙German-credit 927075 plusmn 19762 1146419 plusmn 16863∙ 441011 plusmn 11494 542160 plusmn 10724∙ 431154 plusmn 12468 534060 plusmn 21748∙Ecoli 136662 plusmn 6168 173652 plusmn 6491∙ 130430 plusmn 5439 161102 plusmn 5631∙ 132430 plusmn 5442 182002 plusmn 8874∙Hayes-roth 49865 plusmn 2899 69758 plusmn 4087∙ 27230 plusmn 4511 30848 plusmn 5386∙ 26424 plusmn 4646 29948 plusmn 6384∙Heart-c 150346 plusmn 6547 194694 plusmn 6252∙ 64789 plusmn 10215 97493 plusmn 12983∙ 64789 plusmn 10215 103293 plusmn 11157∙Horse-colic 230767 plusmn 10699 362523 plusmn 11663∙ 68429 plusmn 10635 97493 plusmn 12983∙ 64789 plusmn 10215 74325 plusmn 12043∙Ionosphere 55249 plusmn 6141 68043 plusmn 6995∙ 52183 plusmn 5801 63473 plusmn 6444∙ 54258 plusmn 9602 66584 plusmn 6644∙Iris 16846 plusmn 11112 22266 plusmn 15042∙ 14452 plusmn 9726 19184 plusmn 13312∙ 13324 plusmn 9832 21255 plusmn 12947∙Lymphography 108987 plusmn 6716 139437 plusmn 6185∙ 71162 plusmn 3761 85644 plusmn 3083∙ 72453 plusmn 3761 92433 plusmn 5078∙Page-blocks 142005 plusmn 27851 218745 plusmn 55502∙ 139411 plusmn 60006 209293 plusmn 40379∙ 140111 plusmn 58803 213440 plusmn 53497∙Pima 220241 plusmn 67418 277677 plusmn 85295∙ 202119 plusmn 69802 248164 plusmn 74719∙ 192767 plusmn 62527 252143 plusmn 69982∙prnn-fglass 121998 plusmn 3985 139862 plusmn 3629∙ 114520 plusmn 3976 126928 plusmn 3552∙ 109818 plusmn 3426 131405 plusmn 6097∙Vote 30306 plusmn 12400 52780 plusmn 22505∙ 17404 plusmn 7761 27600 plusmn 12746∙ 18214 plusmn 7621 28833 plusmn 11376∙

accuracy of subensembles selected by EPIC and reduce thesize of the subensembles

6 Conclusion

An ensemble with decision trees is also called forest Thispaper proposes a novel ensemble pruning method calledforest pruning (FP) FP prunes treesrsquo branches based on theproposed metric called branch importance which indicatesthe importance of a branch (or a node) with respect to thewhole ensemble In this way FP achieves reducing ensemblesize and improving the ensemble accuracy

The experimental results on 19 data sets show that FPsignificantly reduces forest size and improves its accuracyin most of the data sets no matter whether the forestsare the ensembles constructed by some algorithm or the

subensembles selected by some ensemble selection methodno matter whether each forest member is a pruned decisiontree or an unpruned one

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work is in part supported by the National NaturalScience Foundation of China (Grant nos 61501393 and61402393) in part by Project of Science and Technol-ogy Department of Henan Province (nos 162102210310172102210454 and 152102210129) in part by Academics


Table 6 The performance of FP on pruning subensemble obtained by FP on bagging ∙ represents that FP is significantly better (or smaller)than EPIC in pairwise t-tests at 95 significance level and denotes that FP is significantly worse (or larger) than EPIC

Dataset Error rate SizePF EPIC PF EIPC

Australian 8683 plusmn 372 8622 plusmn 369∙ 244750 plusmn 12393 324616 plusmn 11607∙Autos 8483 plusmn 446 8211 plusmn 589∙ 70801 plusmn 5455 93144 plusmn 5116∙Backache 8483 plusmn 446 8211 plusmn 589∙ 70801 plusmn 5455 93144 plusmn 5116∙Balance-scale 7974 plusmn 369 7857 plusmn 382∙ 327776 plusmn 8507 403082 plusmn 9467∙Breast-cancer 7026 plusmn 724 6716 plusmn 836∙ 84396 plusmn 12944 118933 plusmn 15408∙Cars 8702 plusmn 506 8683 plusmn 504 17832 plusmn 6044 202281 plusmn 5319∙Credit-rating 8613 plusmn 392 8561 plusmn 395∙ 241460 plusmn 12366 322625 plusmn 13146∙German-credit 7498 plusmn 363 7313 plusmn 400∙ 441011 plusmn 11494 600728 plusmn 12430∙Ecoli 8377 plusmn 596 8324 plusmn 598∙ 149886 plusmn 6227 180626 plusmn 7098∙Hayes-roth 7875 plusmn 957 7681 plusmn 916∙ 27509 plusmn 4790 31132 plusmn 5705∙Heart-c 8121 plusmn 637 7999 plusmn 665∙ 123014 plusmn 5480 151057 plusmn 5256∙Horse-colic 8453 plusmn 530 8380 plusmn 611∙ 94007 plusmn 6664 133760 plusmn 7573∙Ionosphere 9390 plusmn 405 9402 plusmn 383 59063 plusmn 6562 70679 plusmn 7317∙Iris 9447 plusmn 511 9447 plusmn 502 15258 plusmn 10804 19780 plusmn 14131∙Lymphography 8165 plusmn 945 8146 plusmn 939 85842 plusmn 4650 102267 plusmn 3968∙Page-blocks 9702 plusmn 074 9707 plusmn 069 139663 plusmn 23703 208689 plusmn 39910∙Pima 7492 plusmn 394 7403 plusmn 358∙ 239195 plusmn 76416 291031 plusmn 93670∙prnn-fglass 7813 plusmn 806 7799 plusmn 844 128014 plusmn 4385 141084 plusmn 3959∙Vote 9570 plusmn 286 9533 plusmn 297 17736 plusmn 8610 28162 plusmn 14060∙

Propulsion Technology Transfer projects of Xirsquoan Scienceand Technology Bureau [CXY1516(6)] and in part by NanhuScholars Program for Young Scholars of XYNU

References

[1] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[2] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997

[3] D Zhang S Chen Z Zhou and Q Yang ldquoConstraint projec-tions for ensemble learningrdquo in Proceedings of the 23rd AAAIConference on Artificial Intelligence (AAAI rsquo08) pp 758ndash763Chicago Ill USA July 2008

[4] T G Dietterich ldquoEnsemble methods in machine learningrdquoin Proceedings of the 1st International Workshop on MultipleClassifier Systems pp 1ndash15 Cagliari Italy June 2000

[5] Z Zhou Y Wang Q J Wu C N Yang and X Sun ldquoEffectiveand effcient global context verifcation for image copy detectionrdquoIEEE Transactions on Information Forensics and Security vol 12no 1 pp 48ndash63 2017

[6] Z Xia X Wang L Zhang Z Qin X Sun and K Ren ldquoAprivacy-preserving and copy-deterrence content-based imageretrieval scheme in cloud computingrdquo IEEE Transactions onInformation Forensics and Security vol 11 no 11 pp 2594ndash26082016

[7] Z Zhou C-N Yang B Chen X Sun Q Liu and Q M J WuldquoEffective and efficient image copy detection with resistanceto arbitrary rotationrdquo IEICE Transactions on information andsystems vol E99-D no 6 pp 1531ndash1540 2016

[8] W M Zhi H P Guo M Fan and Y D Ye ldquoInstance-basedensemble pruning for imbalanced learningrdquo Intelligent DataAnalysis vol 19 no 4 pp 779ndash794 2015

[9] Z-H Zhou J Wu and W Tang ldquoEnsembling neural networksmany could be better than allrdquoArtificial Intelligence vol 137 no1-2 pp 239ndash263 2002

[10] G Martinez-Munoz D Hernandez-Lobato and A Suarez ldquoAnanalysis of ensemble pruning techniques based on orderedaggregationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 31 no 2 pp 245ndash259 2009

[11] Z Lu X D Wu X Q Zhu and J Bongard ldquoEnsemble pruningvia individual contribution orderingrdquo in Proceedings of the 16thACM SIGKDD International Conference on Knowledge Discov-ery and Data Mining (KDD rsquo10) pp 871ndash880 Washington DCUSA July 2010

[12] L Guo and S Boukir ldquoMargin-based ordered aggregation forensemble pruningrdquo Pattern Recognition Letters vol 34 no 6pp 603ndash609 2013

[13] Y Liu and X Yao ldquoEnsemble learning via negative correlationrdquoNeural Networks vol 12 no 10 pp 1399ndash1404 1999

[14] B Krawczyk and M Wozniak ldquoUntrained weighted classifiercombinationwith embedded ensemble pruningrdquoNeurocomput-ing vol 196 pp 14ndash22 2016

[15] C Qian Y Yu and Z H Zhou ldquoPareto ensemble pruningrdquoin Proceedings of the 29th AAAI Conference on Artificial Intel-ligence pp 2935ndash2941 Austin Tex USA January 2015

[16] R E Banfield L O Hall K W Bowyer and W P KegelmeyerldquoEnsemble diversity measures and their application to thin-ningrdquo Information Fusion vol 6 no 1 pp 49ndash62 2005

[17] W M Zhi H P Guo and M Fan ldquoEnergy-based metric forensemble selectionrdquo in In Proceedings of 14th Asia-Pacific WebConference vol 7235 pp 306ndash317 Springer Berlin HeidelbergKunming China April 2012


[18] Q Dai and M L Li ldquoIntroducing randomness into greedyensemble pruning algorithmsrdquo Applied Intelligence vol 42 no3 pp 406ndash429 2015

[19] I Partalas G Tsoumakas and I Vlahavas ldquoA Study on greedyalgorithms for ensemble pruningrdquo Tech Rep TR-LPIS-360-12 Dept of Informatics Aristotle University of ThessalonikiGreece 2012

[20] D D Margineantu and T G Dietterich ldquoPruning adaptiveboostingrdquo in Proceedings of the 14th International Conference onMachine Learning pp 211ndash218 Nashville Tenn September 1997

[21] Q Dai T Zhang and N Liu ldquoA new reverse reduce-errorensemble pruning algorithmrdquo Applied Soft Computing vol 28pp 237ndash249 2015

[22] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees Wadsworth InternationalGroup Belmont Calif USA 1984

[23] J R Quinlan C45 Programs for Machine Learning MorganKaufmann San Francisco Calif USA 1993

[24] G I Webb ldquoFurther experimental evidence against the utilityof occamrsquos razorrdquo Journal of Artificial Intelligence Research vol4 pp 397ndash417 1996

[25] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[26] J J Rodrıguez L I Kuncheva and C J Alonso ldquoRotationforest a new classifier ensemble methodrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 28 no 10 pp1619ndash1630 2006

[27] C Yuan X Sun andR Lv ldquoFingerprint liveness detection basedon multi-scale LPQ and PCArdquo China Communications vol 13no 7 pp 60ndash65 2016

[28] I Partalas G Tsoumakas and I P Vlahavas ldquoFocused ensem-ble selection a diversity-based method for greedy ensembleselectionrdquo in In Proceeding of the 18th European Conference onArtificial Intelligence pp 117ndash121 Patras Greece July 2008

[29] I Partalas G Tsoumakas and I Vlahavas ldquoAn ensembleuncertainty aware measure for directed hill climbing ensemblepruningrdquo Machine Learning vol 81 no 3 pp 257ndash282 2010

[30] A Asuncion and D Newman ldquoUCI machine learning reposi-toryrdquo 2007

[31] J Demsar ldquoStatistical comparisons of classifiers over multipledata setsrdquo The Journal of Machine Learning Research vol 6 pp1ndash30 2006

[32] S Garcıa and F Herrera ldquoAn extension on lsquostatistical com-parisons of classifiers over multiple data setsrsquo for all pairwisecomparisonsrdquo Journal of Machine Learning Research vol 9 pp2677ndash2694 2008

[33] I Rodrıguez-Fdez A Canosa M Mucientes and A BugarınldquoSTAC a web platform for the comparison of algorithmsusing statistical testsrdquo in Proceedings of the IEEE InternationalConference on Fuzzy Systems pp 1ndash8 Istanbul Turkey August2015

[34] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Distributed Sensor Networks


Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014


ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014


Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of



Computational Intelligence and Neuroscience

Industrial EngineeringJournal of


Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in



(ii) propose a novel metric to measure the improvementof forest performance when a certain node grows intoa subtree

(iii) present a new ensemble pruning algorithm withthe proposed metric to prune a decision tree basedensemble The ensemble can be learned by a cer-tain algorithm or obtained by some ensemble selec-tion method Each decision tree can be pruned orunpruned

Experimental results show that the proposed methodcan significantly reduce the ensemble size and improve itsaccuracy This result indicates that the metric proposed inthis paper reasonably measures the influence on ensembleaccuracy when a certain node grows into a subtree

The rest of this paper is structured as follows Section 2provides a survey of ensemble of decision trees Section 3presents the formal description of forest trimming and themotivation of this study by an example Section 4 introduces anew forest pruning algorithm Section 5 reports and analyzesexperimental results and we conclude the paper with simpleremark and future work in Section 6

2 Forests

A forest is an ensemble whose members are learned bydecision tree learningmethod Two approaches are oftenusedto train a forest traditional approaches and the methodsspecially designed for forests

Bagging [1] and boosting [2] are the two most oftenused traditional methods to build forests Bagging takesbootstrap samples of objects and trains a tree on each sampleThe classifier votes are combined by majority voting Insome implementations classifiers produce estimates of theposterior probabilities for the classes These probabilities areaveraged across the classifiers and the most probable classis assigned called ldquoaveragerdquo or ldquomeanrdquo aggregation of theoutputs Bagging with average aggregation is implemented inWeka and used in the experiments in this paper Since eachindividual classifier is trained on a bootstrap sample the datadistribution seen during training is similar to the originaldistribution Thus the individual classifiers in a baggingensemble have relatively high classification accuracy Thefactor encouraging diversity between these classifiers is theproportion of different examples in the training set Boostingis a family of methods and Adaboost is the most prominentmember The idea is to boost the performance of a ldquoweakrdquoclassifier (can be decision tree) by using it within an ensemblestructure The classifiers in the ensemble are added one ata time so that each subsequent classifier is trained on datawhich have been ldquohardrdquo for the previous ensemble membersA set of weights is maintained across the objects in the dataset so that objects that have been difficult to classify acquiremore weight forcing subsequent classifiers to focus on them

Random forest [25] and rotation forest [26] are twoimportant approaches specially designed for building forestsRandom forest is a variant version of bagging The forest isbuilt again on bootstrap samples The difference lies in theconstruction of the decision treeThe feature to split a node isselected as the best feature among a set of119872 randomly chosen

features where 119872 is a parameter of the algorithm This smallalteration appeared to be a winning heuristic in that diversitywas introduced without much compromising the accuracy ofthe individual classifiers Rotation forest randomly splits thefeature set into 119870 subsets (119870 is a parameter of the algorithm)and Principal Component Analysis (PCA) [27] is applied toeach subset All principal components are retained in orderto preserve the variability information in the data Thus 119870axis rotations take place to form the new features and rotationforest building a tree using all training set in the new spacedefined by a given new feature space

3 Problem Description and Motivation

31 Problem Description Let 119863 = (119909119894 119910119894) | 119894 = 1 2 119873be a data set and let 119865 = 1198791 119879119872 be an ensemble withdecision tree 119879119894 learning from 119863 Denote by V isin 119879119905 a nodein tree 119879 and by 119864(V) isin 119863 the set of the examples reaching Vfrom the root of119879 root(119879) Suppose each node V isin 119879 containsa vector (119901V

1 119901V2 119901V

119870) where 119901V119896 is the proportion of the

examples in 119864(V) associated with label 119896 If V isin 119879119894 is a leaf andx119894 isin 119864(V) the prediction of 119879119895 on x119894 is

119879119895 (x119894) = argmax119896

119901V119896 (1)

Similarly for each example x119895 to be classified ensemble 119865returns a vector (1199011198951 1199011198952 119901119895119870) indicating that x119895 belongsto label 119896 with probability 119901119895119896 where

119901119895119896 =1119872119872

sum119895=1

119901(119894)119895119896 119896 = 1 2 119870 (2)

The prediction of 119865 on x119895 is 119865(x119895) = argmax119896119901119895119896Now our problem is given a forest 119865 with 119872 decision

trees how to prune each tree to reduce 119865rsquos size and improveits accuracy where 119865 is either constructed by some algorithmor obtained by some ensemble selection method

32 Motivation First let us look at an example which showsthe possibility that forest trimming can improve ensembleaccuracy

Example 1 Let 119865 = 1198790 1198791 1198799 be a forest with tendecision trees where 1198791 is shown in Figure 1 Suppose that119901V1 = 060 119901V

2 = 040 119901V11 = 100 119901V1

2 = 000 119901V21 = 020

and 119901V22 = 080 Let ten examples x0 x1 x9 reach node V

where x0 x5 associate with label 1 and x6 x9 associatewith label 2 Assume examples x0 x1 x4 reach leaf nodeV1 and x5 x9 reach leaf node V1

Obviously for 1198790 we can not prune the children of nodeV since treating V as a leaf would lead to more examplesincorrectly classified by 1198790

Assume that 119865rsquos predictions on x0 x1 x9 are as fol-lows

11990101 = 06511990111 = 070


Root

1 2


11990121 = 07011990131 = 06511990141 = 08011990151 = 04911990161 = 03011990171 = 01911990181 = 02011990191 = 03011990102 = 03511990112 = 03011990122 = 03011990132 = 03511990142 = 02011990152 = 05111990162 = 07011990172 = 08111990182 = 08011990192 = 070

(3)


11990101 = 06111990111 = 06511990121 = 065

11990131 = 06511990141 = 07511990151 = 05211990161 = 03311990171 = 02211990181 = 02311990191 = 03311990102 = 04011990112 = 03511990122 = 03511990132 = 03511990142 = 02511990152 = 04811990162 = 06711990172 = 07811990182 = 07711990192 = 067

(4)









119868 (V 119865) = sumx119895isin1198631015840

119868 (V 119865 x119895) = sumx119895isin119864(V)

119868 (V 119865 x119895) (5)




119868 (V1015840 119865) (6)












119890119905119905 (V)


119890119891119905 (V)


119890119891119891 (V)


(8)


1 119901V119870) If V is




119868 (V 119865 x119895) =119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119905119898 + 1119872) (9)


119905119898ge 119901V119891119898






119905119898minus 119901V119891119898



119868 (V 119865 x119895) =119901V119905119898

minus 119901V119905119904

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (10)


119905119898minus 119901V119905119904)119872 as the net






119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (11)


119905119898minus 119901V119891119898



119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119910119895

119872(119901119895119891119898 minus 119901119895119910119895 + 1119872) (12)


119905119898minus 119901V119910119895












5 Experiments










0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500


0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74


745

75

755

PimaBalance-scale

(b)








0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in




Root

1 2


11990121 = 07011990131 = 06511990141 = 08011990151 = 04911990161 = 03011990171 = 01911990181 = 02011990191 = 03011990102 = 03511990112 = 03011990122 = 03011990132 = 03511990142 = 02011990152 = 05111990162 = 07011990172 = 08111990182 = 08011990192 = 070

(3)


11990101 = 06111990111 = 06511990121 = 065

11990131 = 06511990141 = 07511990151 = 05211990161 = 03311990171 = 02211990181 = 02311990191 = 03311990102 = 04011990112 = 03511990122 = 03511990132 = 03511990142 = 02511990152 = 04811990162 = 06711990172 = 07811990182 = 07711990192 = 067

(4)









119868 (V 119865) = sumx119895isin1198631015840

119868 (V 119865 x119895) = sumx119895isin119864(V)

119868 (V 119865 x119895) (5)




119868 (V1015840 119865) (6)












119890119905119905 (V)


119890119891119905 (V)


119890119891119891 (V)


(8)


1 119901V119870) If V is




119868 (V 119865 x119895) =119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119905119898 + 1119872) (9)


119905119898ge 119901V119891119898






119905119898minus 119901V119891119898



119868 (V 119865 x119895) =119901V119905119898

minus 119901V119905119904

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (10)


119905119898minus 119901V119905119904)119872 as the net






119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (11)


119905119898minus 119901V119891119898



119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119910119895

119872(119901119895119891119898 minus 119901119895119910119895 + 1119872) (12)


119905119898minus 119901V119910119895












5 Experiments










0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500


0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74


745

75

755

PimaBalance-scale

(b)








0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in








119868 (V 119865) = sumx119895isin1198631015840

119868 (V 119865 x119895) = sumx119895isin119864(V)

119868 (V 119865 x119895) (5)




119868 (V1015840 119865) (6)












119890119905119905 (V)


119890119891119905 (V)


119890119891119891 (V)


(8)


1 119901V119870) If V is




119868 (V 119865 x119895) =119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119905119898 + 1119872) (9)


119905119898ge 119901V119891119898






119905119898minus 119901V119891119898



119868 (V 119865 x119895) =119901V119905119898

minus 119901V119905119904

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (10)


119905119898minus 119901V119905119904)119872 as the net






119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (11)


119905119898minus 119901V119891119898



119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119910119895

119872(119901119895119891119898 minus 119901119895119910119895 + 1119872) (12)


119905119898minus 119901V119910119895












5 Experiments










0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500


0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74


745

75

755

PimaBalance-scale

(b)








0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in








119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119891119898

119872(119901119895119891119898 minus 119901119895119891119904 + 1119872) (11)


119905119898minus 119901V119891119898



119868 (V 119865 x119895) = minus119901V119905119898

minus 119901V119910119895

119872(119901119895119891119898 minus 119901119895119910119895 + 1119872) (12)


119905119898minus 119901V119910119895












5 Experiments










0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500


0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74


745

75

755

PimaBalance-scale

(b)








0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in








5 Experiments










0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500


0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74


745

75

755

PimaBalance-scale

(b)








0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in




0 2 4 6 8 10500

1000

1500

Size

0 2 4 6 8 10500

1000

1500

0 2 4 6 8 101000

2000

3000

4000

0 2 4 6 8 100

500

1000

1500


0 2 4 6 8 1067

675

68

685

69

Autos

Accu

racy

()

0 2 4 6 8 10815

82

825

83

835

0 2 4 6 8 1071

72

73

74


745

75

755

PimaBalance-scale

(b)








0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in




0 100 2000

5000

10000

Size

0 100 2000

5000

10000

0 100 2000

1

2

3

0 100 2000

5000

10000times10

4


0 100 20066

67

68

69

70

Accu

racy

()

0 100 20080

81

82

83

84

0 100 20071

72

73

74

0 100 20074

745

75

755

76















6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in












6 Conclusion






Acknowledgments







References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in








References











































Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in




























Advances in

FuzzySystems


Volume 2014












Journal of



Advances in

Multimedia


Biomedical Imaging


Advances in


RoboticsJournal of










Advances in



researcharticle forest pruning based on branch...

Documents