[IEEE 2008 8th International Conference on Hybrid Intelligent Systems (HIS) - Barcelona, Spain (2008.09.10-2008.09.12)] 2008 Eighth International Conference on Hybrid Intelligent Systems
TRANSCRIPT
Evolving Sets of Symbolic Classifiers into a Single Symbolic Classifier using Genetic Algorithms
Flavia Cristina Bernardini
ADDLabs, Fluminense Federal University
Campus of Boa Viagem, Niteroi, RJ, Brazil

Ronaldo C. Prati and Maria Carolina Monard
Institute of Mathematics and Computer Science
University of Sao Paulo, Campus of Sao Carlos
P.O. Box 668, Sao Carlos, SP, Brazil
{prati,mcmonard}@icmc.usp.br
Abstract
For a given data set, different learning algorithms typically provide different classifiers. Although it is possible to simply select the most successful classifier, the less successful classifiers could contain potentially valuable information that may be wasted. This work proposes GAESC, an algorithm for evolving a set of classifiers into a single symbolic classifier using genetic algorithms. Individuals are formed by rules collected from symbolic classifiers and from classification association rules. Experimental results on three data sets from UCI show that GAESC outperforms the single symbolic classifiers in terms of classification error rate.
1 Introduction
The aim of symbolic supervised machine learning is to construct classifiers having good classification performance on unseen cases. Furthermore, these classifiers should be easily interpreted by humans. As different learning algorithms usually provide different classifiers for the same task, and as it is not possible to determine beforehand which one will perform best on this task, numerous learning algorithms can be used and the most successful constructed classifier selected. However, the less successful classifiers could have potentially valuable information for classification that may be wasted. Indeed, it has been observed that the cases misclassified by different learning algorithms do not necessarily overlap [10], indicating that different classifiers could offer complementary information.
One solution to exploit this complementary information from the less successful classifiers is to somehow combine the output of multiple classifiers by voting, an approach known in machine learning as ensemble construction [2, 5]. Although this approach might enhance classification performance, it comes with a price, as the multiple classifiers must be kept so that the ensemble can combine their decisions to classify new instances.
In [3] we proposed several methods for constructing ensembles of symbolic classifiers, which were implemented and tested on data sets from UCI. However, as all the classifiers which compose the ensemble should be kept and used to classify any new instance, and these classifications should be pooled, the classification process is rather complex. These problems motivated us to develop this work. The main idea is to use genetic algorithms (GAs) [8] that, instead of combining classifiers into an ensemble, evolve symbolic classifiers into a single symbolic classifier. In this GA, individuals can be formed by pieces of other symbolic classifiers (rules) as well as other rules provided by a domain expert and/or classification association rules [12] having the class value as the consequent. This addition may improve the diversity of the GA population, so that the GA has more room to find a good solution.
One of the main advantages of our proposal is related to the codification of individuals, which enables the use of well-known genetic algorithm operations. Besides, as this codification allows classifiers induced by other symbolic learning algorithms to be included in the initial population, our proposal also takes advantage of this initialization as an informed guess for a good solution. Furthermore, as the same set of rules might provide different classifiers (depending on how they are combined to classify new instances), we evaluated the proposed algorithm with different rule classification methods and configurations.
This paper is organized as follows: Section 2 introduces the definitions and notation used in this paper. Section 3 describes the use of genetic algorithms for the induction of symbolic classifiers. Section 4 describes GAESC, the proposed algorithm. Section 5 presents the experimental results and Section 6 concludes the paper.
Eighth International Conference on Hybrid Intelligent Systems
978-0-7695-3326-1/08 $25.00 © 2008 IEEE
DOI 10.1109/HIS.2008.158
2 Definitions and Notation
A training data set T is a set of N classified instances {(x1, y1), ..., (xN, yN)} for some unknown function y = f(x). The instances xi are typically vectors of the form (xi1, xi2, ..., xim) whose components are discrete or real values, called features or attributes. Thus, xij denotes the value of the j-th feature Xj of the example xi. For classification purposes, the yi values refer to a discrete set of NCl classes, i.e. yi ∈ {C1, C2, ..., CNCl}. Given a set T of training examples, a learning algorithm induces a classifier h, which is a hypothesis about the true unknown function f. Given new x values, h predicts the corresponding y values.

In this work we consider symbolic classifiers whose description can be represented by a set of NR unordered rules, i.e. h = {R1, R2, ..., RNR}. The term unordered means that a rule can be interpreted in isolation, i.e., without taking into account other rules in the set.
A rule R assumes the form if B then H, or symbolically B → H, where H stands for the head, or rule conclusion, and B for the body, or rule condition. The body consists of a disjunction of conjunctions of feature tests in the form Xi op Value, where Xi is a feature name, op is an operator in the set {=, ≠, <, ≤, >, ≥} and Value is a valid value for feature Xi. In a classification rule, the head H assumes the form class = Ci, where Ci ∈ {C1, ..., CNCl}.

Given a rule R = B → H and a set of examples T, let B ⊂ T be the set of examples that satisfy B and H ⊂ T be the set of examples that satisfy H. Also, let B̄ and H̄ be their respective complements. Examples that satisfy both B and H are correctly covered by R, and belong to the set B ∩ H. Instances satisfying B but not H are incorrectly covered by R, and belong to the set B ∩ H̄. On the other hand, instances that do not satisfy B are not covered by the rule, and belong to the set B̄. One way to assess the quality of a rule is by computing its contingency matrix [11], as shown in Table 1. Denoting the cardinality of a set A as a, i.e. a = |A|, b and h in Table 1 denote the number of instances in the sets B and H respectively, i.e. b = |B| and h = |H|, and so forth (hb = |H ∩ B|, etc.). From the contingency matrix of R numerous rule quality measures can be calculated [11]. In this work we use Acc(R) = hb/b, Lap(R) = (hb + 1)/(b + NCl) and Cov(R) = b.
Table 1. Contingency matrix for B → H

         B      B̄
    H    hb     hb̄     h
    H̄    h̄b     h̄b̄     h̄
         b      b̄      n
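To make these measures concrete, the following small sketch (our own illustration, not the authors' code; the function names are ours) computes the rule quality measures defined above from the contingency counts:

```python
# Rule quality measures from the contingency matrix of a rule B -> H.
# Formulas follow the paper: Acc(R) = hb/b, Lap(R) = (hb + 1)/(b + NCl),
# Cov(R) = b. Function names are illustrative, not from the paper.

def rule_accuracy(hb: int, b: int) -> float:
    """Fraction of the instances covered by B that also satisfy H."""
    return hb / b if b else 0.0

def rule_laplace(hb: int, b: int, n_classes: int) -> float:
    """Laplace-corrected accuracy, smoothing rules with small coverage."""
    return (hb + 1) / (b + n_classes)

def rule_coverage(b: int) -> int:
    """Number of instances satisfying the rule body B."""
    return b

# Example: a rule covering 10 instances, 8 of them correctly, 2 classes.
print(rule_accuracy(8, 10))    # 0.8
print(rule_laplace(8, 10, 2))  # (8 + 1) / (10 + 2) = 0.75
```

Note how the Laplace correction pulls the accuracy of low-coverage rules towards the uniform class prior, which is why it is often preferred when ranking rules that cover few examples.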
As more than one rule with different consequents may cover an instance, a set of rules can be used in different ways to constitute a classifier. One of them, called the Single Rule classification method (SR), consists in choosing, from all fired rules, the best rule according to some rule quality criterion (for instance, a rule quality measure) to classify a given instance. Another method, called the Multiple Rule classification method (MR), considers the combination of all fired rules. In this work, MR has been implemented as follows: let MRCi be the set of fired rules having the same class Ci. For each MRCi, the quality measure of the disjunction of these rules is computed, and the instance is classified according to the best computed quality criterion among the MRCi.

Finally, the performance of the rule set as a classifier (using either SR or MR to classify examples) can be assessed in a confusion matrix by comparing the classification given by the classifier with the true classification of the example in the test set. For a two-class problem, generally called positive (+) and negative (−), this confusion matrix is shown in Table 2, where TP (TN) is the number of positive (negative) instances correctly classified, and FP (FN) is the number of negative (positive) instances incorrectly classified. As with rules, numerous classifier quality measures based on the confusion matrix can be calculated. In this work we used Acc(h) = (TP + TN)/N; F1(h) = 2TP/(2TP + FN + FP); and Prec(h) = TP/(TP + FP).
Table 2. Confusion matrix for binary classification

    Class     Predicted +    Predicted −
    True +    TP             FN
    True −    FP             TN
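The SR and MR classification methods described above can be sketched as follows. This is our own illustration under simplifying assumptions: `Rule`, `fires`, and the per-class scoring are stand-ins, and MR is approximated here by summing the qualities of the fired rules of each class rather than scoring the disjunction as a whole, as the paper does.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    cls: str                       # consequent: class = cls
    fires: Callable[[dict], bool]  # does the body B cover the instance?
    quality: float                 # e.g. Acc(R) or Lap(R)

def classify_sr(rules: List[Rule], x: dict) -> str:
    """Single Rule (SR): the best fired rule decides the class."""
    fired = [r for r in rules if r.fires(x)]
    return max(fired, key=lambda r: r.quality).cls

def classify_mr(rules: List[Rule], x: dict) -> str:
    """Multiple Rule (MR): score each class over all its fired rules
    (here, naively, by summing rule qualities) and pick the best class."""
    scores: Dict[str, float] = defaultdict(float)
    for r in rules:
        if r.fires(x):
            scores[r.cls] += r.quality
    return max(scores, key=scores.get)

rules = [
    Rule("pos", lambda x: x["a"] > 0, 0.9),
    Rule("neg", lambda x: x["b"] > 0, 0.6),
    Rule("neg", lambda x: x["a"] > 0, 0.5),
]
print(classify_sr(rules, {"a": 1, "b": 1}))  # pos (single best rule wins)
print(classify_mr(rules, {"a": 1, "b": 1}))  # neg (0.6 + 0.5 > 0.9)
```

The toy example shows why SR and MR can disagree: one strong rule wins under SR, while several weaker rules for another class can outvote it under MR.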
3 Genetic Algorithms and Symbolic Learning
Genetic algorithms provide an approach to learning using a global search based on abstractions of the processes of Darwinian evolution. A GA maintains a population of individuals, where each individual is a candidate solution to a given problem and is evaluated by a fitness function that measures its quality for solving the problem considered [8]. GAs generate new individuals (offspring) by repeatedly mutating and recombining parts of the best current individuals in the population. At each iteration (generation) the current population is updated by replacing some fraction of the population by the best offspring. Thus, the better the quality of an individual, the higher the probability that parts of its candidate solution will be passed on to later generations of individuals. This iterative process executes repeatedly until the best possible individual is found or another stop criterion is satisfied. Many stop criteria can be used. Two criteria often used are (1) executing a pre-defined number of generations or (2) waiting for convergence of the fitness function of the individuals, i.e., it remains the same after a fixed number of generations.
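Criterion (2) can be sketched as a small helper (our own illustration; the function name and the strict "no improvement" comparison are assumptions):

```python
def make_convergence_check(n_gen: int):
    """Return a callable implementing a convergence stop criterion:
    stop when the tracked fitness has not improved for n_gen generations."""
    history = []
    def should_stop(fitness: float) -> bool:
        history.append(fitness)
        if len(history) <= n_gen:
            return False  # not enough generations observed yet
        # stop if nothing in the last n_gen generations beats the
        # fitness recorded just before that window
        return max(history[-n_gen:]) <= history[-n_gen - 1]
    return should_stop

check = make_convergence_check(3)
print([check(f) for f in [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]])
# [False, False, False, False, False, True]
```

The same helper works for criterion-(3)-style checks by feeding it the mean population fitness instead of the best individual's fitness.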
Given a problem, it is necessary to find a suitable representation for individuals and a fitness function for the GA. In this work, each individual consists of a set of rules which represents a symbolic classifier, and the fitness function should be a measure of the classifier's quality.
In symbolic learning, the two following approaches have been proposed by the GA community to represent individuals, as described in [7]: Pittsburgh [4, 13], where each individual codifies a set of knowledge rules; and Michigan, where each individual codifies a single knowledge rule. As our interest is to find classifiers with good predictive power, the interaction between the rules that compose the classifier is more important than each rule individually; thus the Pittsburgh approach seems more natural. The implemented GA is described next.
4 GAESC — A Genetic Algorithm for Evolving Symbolic Classifiers
The GA proposed in this work follows the same steps as the classical GA [8] and is described by Algorithm 1, where Cinit represents the set of classifiers already constructed by the user.

Individuals codification: As already mentioned, each individual is a symbolic classifier represented by a set of rules coded using the Pittsburgh approach, and using a knowledge rule base set W containing W rules (Figure 1), where each rule, all different, is uniquely identified by a number. Observe that the rules that participate in the initial classifiers in Cinit must be in W. Besides the rules from Cinit, and in order to add diversity to the evolution process, W can be loaded with rules constructed by domain specialist(s), as well as rules constructed using, for example, a classification association rule algorithm such as Apriori1, among others. Figure 2 shows the codification of an individual composed of 5 rules from W.
The initial individuals, except the ones that belong to Cinit, are initialized with Nrules rules. Unless provided by the user, the value of Nrules is calculated as the mean number of rules of the classifiers in Cinit. Note that the individuals' size may change at each generation. The implemented operators are described next.

Selection operator: Individuals are selected for the next population according to their fitness function. The better
1In this work, only classification association rules are used, i.e., rules generated by Apriori where the right-hand side refers only to the class attribute. In our experiments, to discretize the continuous attributes, we used the unsupervised discretization algorithm implemented in the Weka toolkit, with the maximum number of bins set to 10.
    01 if ... then ...
    02 if ... then ...
    ...
    W  if ... then ...

Figure 1. Knowledge Rule Base

    01 10 15 09 07

Figure 2. Example of a coded individual
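The codification is simple enough to sketch directly: the knowledge rule base W maps rule ids to rules, and an individual is just a list of distinct rule ids, as in Figure 2. This is an illustrative toy (the rule texts are hypothetical):

```python
# Toy knowledge rule base: rule id -> rule (rule contents are hypothetical)
W = {i: f"if ... then class = C{i % 2}" for i in range(1, 31)}

# The individual of Figure 2: a list of distinct rule ids drawn from W
individual = [1, 10, 15, 9, 7]

def decode(individual, W):
    """Recover the rule set (classifier) encoded by an individual."""
    return [W[rid] for rid in individual]

assert len(set(individual)) == len(individual)  # rules must all be distinct
print(len(decode(individual, W)))               # 5
```

Because individuals store only rule ids, the standard GA operators below can manipulate them as plain variable-length integer lists.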
the fitness function of an individual, the higher the probability of this individual being selected for the next generation. Elitism was also used: in each new generation, the best individual from the previous one is maintained.

Crossover operator: Two individuals (parents) are chosen. Consider, for example, that parent1 = 07 02 11 04 08 03 01 05 09 and parent2 = 05 23 10 06 12 09 20 15, and that one position is randomly selected in each of them. Let these be positions 3 and 5 respectively, which divide each individual into two sets of rules: parent1 = 07 02 11 | 04 08 03 01 05 09 and parent2 = 05 23 10 06 12 | 09 20 15. Afterwards, they are crossed over, generating the following two offspring: offspring1 = 07 02 11 09 20 15 and offspring2 = 05 23 10 06 12 04 08 03 01 09. In this way, the individuals' size may change from one generation to the other, thus enabling GAESC to find individuals with the appropriate number of rules. Duplicated rules are removed.

Mutation operator: A rule is randomly selected from an individual and replaced (without repetition) by a randomly selected rule from the knowledge rule base W. The mutation operator may also be applied to offspring in the same generation.

Fitness functions: Since the individuals are classifiers, fitness functions should evaluate their behavior on a test data set. In GAESC, the measures used to evaluate classifiers are called HQ. The classifier evaluation measures implemented are HQAcc(h) = Acc(h), HQPrec(h) = Prec(h), HQF1(h) = F1(h), HQAC(h) = Acc(h) × mean(Cov(Ri)), ∀Ri ∈ h, and HQPC(h) = Prec(h) × mean(Cov(Ri)), ∀Ri ∈ h.
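The crossover and mutation operators can be sketched as follows (our own rendering; the helper names are ours, and the cut points, normally random, are passed explicitly so the crossover example from the text can be reproduced):

```python
import random

def crossover(p1, p2, c1, c2):
    """One-point crossover at cut points c1 and c2 (randomly chosen in
    the GA); offspring may change length; duplicated rules are removed."""
    dedup = lambda ids: list(dict.fromkeys(ids))  # keep first occurrences
    return dedup(p1[:c1] + p2[c2:]), dedup(p2[:c2] + p1[c1:])

def mutate(ind, W_ids, rng=random):
    """Replace a randomly chosen rule by a rule from W not already present."""
    out = list(ind)
    pos = rng.randrange(len(out))
    out[pos] = rng.choice([r for r in W_ids if r not in out])
    return out

# The example from the text, with cut points 3 and 5:
p1 = [7, 2, 11, 4, 8, 3, 1, 5, 9]
p2 = [5, 23, 10, 6, 12, 9, 20, 15]
s1, s2 = crossover(p1, p2, 3, 5)
print(s1)  # [7, 2, 11, 9, 20, 15]
print(s2)  # [5, 23, 10, 6, 12, 4, 8, 3, 1, 9]
```

Note that the second offspring initially contains rule 05 twice (once from each parent); the duplicate-removal step reduces it to the list shown in the text.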
Furthermore, as explained in Section 2, classification by a rule set can be carried out using the single (SR) or multiple (MR) rule classification methods. Besides the three rule measures stated earlier, an index PE, which subtracts precision from error, defined by PE(R) = Err(R) − Prec(R), is also used.
The SR methods implemented are SRAcc, SRLap, SRCov and SRPE, which classify an example using the best covering rule according to Acc(R), Lap(R), Cov(R) and PE(R), respectively. The MR methods proposed are MRAcc, MRLap, MRCov and MRPE.
Algorithm 1 GAESC algorithm.
Require: S = {(x1, y1), ..., (xN, yN)}: set of examples which will be used to construct the training, test and validation sets;
    Cinit = {ha, hb, ..., hq}: set of initial classifiers;
    W = {R1, ..., RW}: knowledge rule base;
    Nind: number of individuals in the population;
    crossover, selection and mutation parameters;
    f(h): fitness function used to evolve individuals
 1: Generate initial population Pcurrent = {h1, ..., hNind} where Cinit ⊂ Pcurrent;
 2: for all h ∈ Pcurrent do
 3:     Calculate f(h) using the training set;
 4: end for
 5: while stop criterion not satisfied do
 6:     h = best_individual(Pcurrent);
 7:     Pnew = h + (Nind − 1) individuals selected from Pcurrent;
 8:     Apply the crossover operator to pairs of individuals selected from Pnew;
 9:     Apply the mutation operator to individuals selected from Pnew;
10:     Evaluate f on the modified individuals of Pnew using the training set;
11:     Pcurrent = Pnew;
12: end while
13: h = best_individual(Pcurrent);
14: h* = Post-process(h); {prune h using the validation set}
15: return h*; {classifier evolved by the GA}

The fitness function of GAESC can be set as a combination of two of the described methods — one HQ method combined with one MR or SR method. In other words, the possible fitness functions the user can choose from are MRAccHQAC, MRCovHQAcc, SRLapHQF1, and so forth.

Stop criteria: Three criteria were implemented:
1. A maximum number of generations specified by the user;

2. A convergence method: given a number of generations Ngen, if the evaluation function does not improve in the last Ngen generations, the GA stops;

3. Similar to the previous one, but GAESC stops if the mean fitness of all the individuals does not change in the last Ngen generations.
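Putting the pieces together, the overall loop of Algorithm 1 can be rendered as a compressed sketch. This is our own simplified reading, not the authors' implementation: fitness-proportional selection is assumed, stop criterion 1 (a fixed generation budget) is used, and the validation-set post-processing step is omitted. Parameter names and defaults (pm = 0.01, pc = 0.40) follow the paper.

```python
import random

def gaesc(W_ids, C_init, f, n_ind=15, n_gen=10, pm=0.01, pc=0.40, seed=0):
    """Simplified sketch of the GAESC loop (Algorithm 1)."""
    rng = random.Random(seed)
    # Nrules default: mean number of rules of the classifiers in Cinit
    n_rules = max(1, sum(len(h) for h in C_init) // len(C_init))
    pop = [list(h) for h in C_init]
    while len(pop) < n_ind:                      # random initial individuals
        pop.append(rng.sample(W_ids, n_rules))
    for _ in range(n_gen):                       # stop criterion 1: budget
        best = max(pop, key=f)                   # elitism: keep the best
        new = [list(best)] + [
            list(h) for h in rng.choices(        # fitness-proportional
                pop, weights=[f(h) + 1e-9 for h in pop], k=n_ind - 1)]
        for i in range(1, n_ind - 1, 2):         # crossover on pairs
            a, b = new[i], new[i + 1]
            if rng.random() < pc and len(a) > 1 and len(b) > 1:
                c1, c2 = rng.randrange(1, len(a)), rng.randrange(1, len(b))
                new[i] = list(dict.fromkeys(a[:c1] + b[c2:]))
                new[i + 1] = list(dict.fromkeys(b[:c2] + a[c1:]))
        for h in new[1:]:                        # mutation (elite untouched)
            if rng.random() < pm:
                free = [r for r in W_ids if r not in h]
                if free:
                    h[rng.randrange(len(h))] = rng.choice(free)
        pop = new
    return max(pop, key=f)

# Toy run: W has 30 rules; fitness rewards even rule ids (hypothetical).
W_ids = list(range(1, 31))
C_init = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
f = lambda h: sum(r % 2 == 0 for r in h) / len(h)
best = gaesc(W_ids, C_init, f)
```

The returned individual is a list of distinct rule ids from W; in GAESC proper, f(h) would be one of the HQ/SR/MR combinations above, evaluated on held-out data.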
5 Experiments and Results
In order to evaluate the proposed genetic algorithm, we carried out a series of experiments using three data sets from UCI [9] — Autos, Balance2 and Heart. Table 3 describes the main characteristics of the data sets: number of instances (# Inst.); number of features (continuous, discrete) (# Feat.); class distribution (Class %); error rate of the classifier which always predicts the most prevalent class (Maj. Err.); presence/absence of unknown values (Unk. Val?); and number of conflicting/duplicated instances (# InstCD). The behavior of GAESC was evaluated varying the fitness function and the stop criteria.
Data set   # Inst.   # Feat. (cont., disc.)   Class (%)                        Maj. Err.          Unk. Val?   # InstCD
Autos      205       25 (15,10)               safe (44.88%), risky (52.12%)    44.88% in risky    Y           0 (0.00%)
Balance    576       4 (0,4)                  L (46.08%), R (53.92%)           53.92% in L        N           0 (0.00%)
Heart      270       13 (5,8)                 1 (55.56%), 2 (44.44%)           44.44% in 1        Y           0 (0.00%)

Table 3. Data sets description
Experiments were evaluated using 10-fold cross-validation. In each fold, the training partition was used to construct classification association rules using Apriori (continuous features were discretized in advance), and three classifiers were constructed using CN2, C4.5 and C4.5rules. All the induced rules were used to compose the knowledge rule base. The initial population consisted of the three classifiers induced by CN2 (unordered rules), C4.5 and C4.5rules (Cinit), as well as classifiers constructed using randomly selected rules from the knowledge rule base. The mutation and crossover parameters were set as pm = 0.01 and pc = 0.40, respectively. The individuals' number of rules in the initial population (except the ones in Cinit) was set as the mean number of rules of the three classifiers in Cinit.
In order to test the convergence of the algorithm, three different stopping criteria configurations were used:

Maximum number of generations: number of executed generations — 10; number of individuals in the population — 15;

Convergence of the best individual: the fitness function of the best individual does not improve in 10 generations; number of individuals in the population — 15;

Convergence of the population: the fitness function of all of the individuals does not improve in 25 iterations; number of individuals in the population — 50.
2The original data set, named Balance-Scale, was modified by removing the instances from class B, since this class comprises only 7.84% of the total number of examples and introduces the additional problem of class imbalance [1].
We use the error rate, averaged over the 10 cross-validation test sets, as the metric to evaluate our approach. In order to analyze whether there are differences among the methods, we ran the Friedman test3. To run the tests, each combination of stopping criteria and fitness function is considered as an independent execution of the algorithm. Due to lack of space, only the results of these tests are reported here4. The Friedman test was run with four different null hypotheses: (H1) that the results obtained using the three convergence criteria are comparable considering all results; (H2) that the performance of using either single rule (SR) or multiple rule (MR) classification methods is comparable considering all results; (H3) that the performance of using Acc(R), Cov(R), Lap(R) or PE(R) as rule classification measures is comparable considering all results; (H4) that the performance of using HQAC(h), HQAcc(h), HQF1(h), HQPC(h) or HQPrec(h) as hypothesis quality measures is comparable considering all results. When a null hypothesis is rejected by the Friedman test, we can proceed with a post-hoc test to detect which differences among the methods are significant. We ran the Nemenyi multiple comparison with a control test to point out whether there are significant differences among the methods involved in the experimental evaluation.
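For readers unfamiliar with the test, the Friedman statistic can be computed from scratch over a results table (rows: data-set/configuration pairs; columns: methods). The sketch below is our own illustration with toy numbers, using the standard chi-square form of the statistic over average ranks (ties get mean ranks):

```python
def friedman_statistic(results):
    """Friedman chi-square statistic and average ranks for a table of
    error rates (lower is better; rank 1 = best method in a row)."""
    n, k = len(results), len(results[0])
    avg_rank = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:  # assign mean ranks to groups of tied values
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            r = (i + j) / 2 + 1
            for t in range(i, j + 1):
                ranks[order[t]] = r
            i = j + 1
        for j in range(k):
            avg_rank[j] += ranks[j] / n
    chi2 = (12 * n / (k * (k + 1))) * (
        sum(r * r for r in avg_rank) - k * (k + 1) ** 2 / 4)
    return chi2, avg_rank

# Toy table: 4 runs, 3 methods, method 1 always best, method 3 always worst.
chi2, ranks = friedman_statistic([[0.10, 0.20, 0.30]] * 4)
print(chi2, ranks)  # 8.0 [1.0, 2.0, 3.0]
```

If the statistic exceeds the critical chi-square value (here, at the 95% level with k − 1 degrees of freedom), the null hypothesis of equal performance is rejected and a post-hoc test such as Nemenyi's can be applied to the average ranks.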
The Friedman tests rejected the four null hypotheses at the 95% confidence level. Figures 3 to 6 show the results of the Nemenyi tests. For each graph, the thicker line above the graph marks the interval of one critical difference (see [6]) that two methods should exceed to be statistically different from each other. Groups of algorithms that are not significantly different are also grouped by a line.
As shown in Figure 3, convergence of the population (the mean fitness over the last 25 iterations) is statistically better than the others, and convergence of the best individual is better than running 10 generations only.
(Diagram: methods ranked best to worst — Convergence of the population; Convergence of the best individual; Maximum number of generations.)

Figure 3. Critical difference diagram for hypothesis H1: do all convergence criteria behave the same?
Figure 4 shows that using multiple rule (MR) evaluation is significantly better than using SR to classify an instance. Regarding the rule classification measure, Figure 5 shows no statistically significant difference among Acc(R), Lap(R) and PE(R).

3The Friedman test is the nonparametric equivalent of the repeated-measures ANOVA. See [6] for a thorough discussion regarding statistical tests in machine learning research.

4All tabulated results can be found at http://www.icmc.usp.br/~prati/HIS2008.
(Diagram: Rule combination (MR) ranks better than Single rule (SR).)

Figure 4. Critical difference diagram for hypothesis H2: do single rule (SR) and multiple rule (MR) evaluation behave the same?
(Diagram: Lap, Acc and PE are grouped together; Cov ranks last.)

Figure 5. Critical difference diagram for hypothesis H3: do Acc(R), Cov(R), Lap(R) and PE(R) behave the same as classification measures?
Finally, Figure 6 shows the comparison of HQAC(h), HQAcc(h), HQF1(h), HQPC(h) and HQPrec(h) as hypothesis quality measures. As can be seen in this figure, there are no significant differences between HQAcc(h) and HQF1(h), or among HQAC(h), HQPC(h) and HQPrec(h), although the first two are better than the latter three.
(Diagram: HQAcc and HQF1 grouped as the better pair; HQPC, HQPrec and HQAC grouped behind them.)

Figure 6. Critical difference diagram for hypothesis H4: do HQAC(h), HQAcc(h), HQF1(h), HQPC(h) and HQPrec(h) behave the same as hypothesis quality measures?
As the results using convergence of the population over the last 25 iterations and multiple rule classification (MR) were statistically better than the others, we constrain further analysis to them. We also considered only MRAcc(R), MRLap(R) and MRPE(R) as classification methods and HQAcc(h) and HQF1(h) as hypothesis quality measures, as the results do not show statistical differences among them. We compare these methods with respect to the number of rules as well as the error rate of the classifiers induced by C4.5, C4.5rules and CN2.
Table 4 shows the error rate of the three inducers, C4.5, C4.5rules and CN2, and of the selected GAESC configurations. Numbers in parentheses indicate the standard error.

            autos          balance        heart
C4.5        8.69 (2.95)    28.48 (1.61)   22.96 (3.16)
C4.5rules   9.14 (3.22)    15.12 (1.71)   21.11 (2.28)
CN2         12.14 (3.29)   15.11 (1.55)   22.22 (2.82)
MRAccHQAcc  5.40 (1.39)    4.88 (0.81)    9.63 (2.42)
MRAccHQF1   4.88 (1.78)    6.26 (1.59)    10.00 (1.24)
MRLapHQAcc  6.88 (1.97)    5.38 (0.98)    11.85 (2.46)
MRLapHQF1   4.88 (0.71)    5.04 (1.12)    12.96 (2.08)
MRPEHQAcc   4.43 (1.14)    9.19 (1.71)    8.89 (0.99)
MRPEHQF1    6.38 (1.67)    7.11 (1.44)    9.63 (2.36)

Table 4. Error rate

            autos          balance        heart
C4.5        6.00 (6.77)    36.20 (4.13)   25.10 (4.38)
C4.5rules   8.70 (1.25)    34.60 (4.45)   14.20 (1.03)
CN2         33.40 (2.12)   200.60 (6.11)  44.40 (2.63)
MRAccHQAcc  26.00 (38.92)  33.30 (9.74)   93.20 (93.11)
MRAccHQF1   28.00 (28.40)  34.10 (11.40)  65.10 (49.93)
MRLapHQAcc  11.60 (7.35)   29.80 (5.94)   13.50 (8.24)
MRLapHQF1   19.10 (15.54)  32.10 (8.80)   42.40 (27.76)
MRPEHQAcc   21.60 (53.03)  36.30 (10.59)  90.40 (103.55)
MRPEHQF1    21.90 (12.44)  38.90 (9.16)   35.20 (27.66)

Table 5. Comparison of the number of rules

As can be seen from Table 4, GAESC greatly outperforms all these inducers on all data sets. Regarding the number of rules, Table 5 compares these three algorithms with the proposed methods. In general, C4.5 and C4.5rules produced smaller rule sets for the autos and heart data sets, although our approach compares favorably to the number of rules generated by CN2 for the autos data set, and in three cases out of six for the heart data set. On the other hand, the number of rules generated by GAESC on the balance data set is comparable to C4.5 and C4.5rules, and much lower than the number of rules generated by CN2.
These results show the suitability of our approach. GAESC was able to achieve considerably lower error rates when compared to C4.5, C4.5rules and CN2, with a slight increase in the rule set sizes, keeping them most of the time between the sizes of the classifiers induced by C4.5 and CN2.
6 Conclusions
This paper presented GAESC, a genetic algorithm that evolves a single symbolic classifier from classifiers consisting of rules coming from a knowledge rule base. This knowledge base is formed by rules from symbolic learning algorithms, as well as associative classification rules. Each individual (classifier) is a rule set formed by a subset of rules from the knowledge rule base.
GAESC was evaluated on three data sets from UCI using different configurations of the fitness function. This fitness function was formed by a combination of classification methods and rule set quality measures. We also evaluated different convergence criteria. Individuals from the best configurations were compared to the symbolic classifiers induced by C4.5, C4.5rules and CN2. Results show that GAESC outperforms these symbolic classifiers in terms of classification error rates, while keeping the individuals' rule sets at reasonable sizes compared to them.
Acknowledgments: This research was supported by the Brazilian research councils CNPq and FAPESP. The authors would like to thank Alex A. Freitas (University of Kent, UK) and Luis Correia (Universidade de Lisboa, Portugal) for their helpful discussions about this work, and the anonymous referees for their insightful comments.
References
[1] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations, 6(1):20–29, 2004.

[2] E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36(1/2):105–139, 1999.

[3] F. C. Bernardini, M. C. Monard, and R. C. Prati. Constructing ensembles of symbolic classifiers. Int. J. Hybrid Intelligent Systems, 3(3):159–167, 2006.

[4] K. A. De Jong, W. Spears, and D. Gordon. Using genetic algorithms for concept learning. Machine Learning, 13:161–188, 1993.

[5] T. G. Dietterich. Ensemble methods in machine learning. In First International Workshop on Multiple Classifier Systems, LNCS, volume 1857, pages 1–15, New York, 2000.

[6] J. Demsar. Statistical comparisons of classifiers over multiple data sets. J. Machine Learning Research, 7:1–30, 2006.

[7] A. A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer Verlag, 2002.

[8] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[9] S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases, 1998.

[10] J. Kittler, M. Hatef, R. P. Duin, and J. Matas. On combining classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(3):226–239, 1998.

[11] N. Lavrac, P. Flach, and B. Zupan. Rule evaluation measures: a unifying view. In Proc. 9th Inter. Workshop on Inductive Logic Programming, LNAI, volume 1634, pages 174–185, 1999.

[12] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In KDD, pages 80–86, 1998.

[13] C. Setzkorn and R. C. Paton. MERBIS – a multi-objective evolutionary rule base induction system. Technical Report ULCS-03-016, Department of Computer Science, University of Liverpool, Liverpool, U.K., 2003.