


Evolving Sets of Symbolic Classifiers into a Single Symbolic Classifier using Genetic Algorithms

Flavia Cristina Bernardini
ADDLabs, Fluminense Federal University
Campus of Boa Viagem, Niterói, RJ, Brazil
[email protected]

Ronaldo C. Prati    Maria Carolina Monard
Institute of Mathematics and Computer Science
University of São Paulo, Campus of São Carlos
P.O. Box 668, São Carlos, SP, Brazil
{prati,mcmonard}@icmc.usp.br

Abstract

For a given data set, different learning algorithms typically provide different classifiers. Although it is possible to simply select the most successful classifier, the less successful classifiers could contain potentially valuable information that may be wasted. This work proposes GAESC, an algorithm for evolving a set of classifiers into a single symbolic classifier using genetic algorithms. Individuals are formed by rules collected from symbolic classifiers and from classification association rules. Experimental results on three data sets from UCI show that GAESC outperforms the single symbolic classifiers in terms of classification error rate.

1 Introduction

The aim of symbolic supervised machine learning is to construct classifiers having good classification performance on unseen cases. Furthermore, these classifiers should be easily interpreted by humans. As different learning algorithms usually provide different classifiers for the same task, and as it is not possible to determine beforehand which one will perform best on this task, numerous learning algorithms can be used and the most successful constructed classifier selected. However, the less successful classifiers could contain potentially valuable information for classification that may be wasted. Indeed, it has been observed that the cases misclassified by different learning algorithms do not necessarily overlap [10], indicating that different classifiers could offer complementary information.

One solution to use this complementary information from the less successful classifiers is to somehow combine, by voting, the output of multiple classifiers, an approach known in machine learning as ensemble construction [2, 5]. Although this approach might enhance classification performance, it comes with a price: the multiple classifiers must be stored so that the ensemble can combine their decisions to classify new instances.

In [3] we proposed several methods for constructing ensembles of symbolic classifiers, which were implemented and tested on data sets from UCI. However, as all the classifiers which compose the ensemble must be kept and used to classify any new instance, and these classifications must be pooled, the classification process is rather complex. These problems motivated this work. The main idea is to use genetic algorithms (GAs) [8] that, instead of combining classifiers into an ensemble, evolve symbolic classifiers into a single symbolic classifier. In this GA, individuals can be formed by pieces of other symbolic classifiers (rules) as well as other rules provided by a domain expert and/or classification association rules [12] having the class value as the consequent. This addition may improve the diversity of the GA population, giving the GA more room to find a good solution.

One of the main advantages of our proposal is the codification of individuals, which enables the use of well-known genetic algorithm operations. Moreover, as this codification allows classifiers induced by other symbolic learning algorithms to be encoded in the initial population, our proposal also takes advantage of this initialization as an informed guess for a good solution. Furthermore, as the same set of rules might provide different classifiers (depending on how the rules are combined to classify new instances), we evaluated the proposed algorithm with different rule classification methods and configurations.

This paper is organized as follows: Section 2 introduces the definitions and notation used in this paper. Section 3 describes the use of genetic algorithms for the induction of symbolic classifiers. Section 4 describes GAESC, the proposed algorithm. Section 5 presents the experimental results and Section 6 concludes the paper.



2 Definitions and Notation

A training data set T is a set of N classified instances {(x1, y1), ..., (xN, yN)} for some unknown function y = f(x). The xi instances are typically vectors of the form (xi1, xi2, ..., xim) whose components are discrete or real values, called features or attributes. Thus, xij denotes the value of the j-th feature Xj of the example xi. For classification purposes, the yi values refer to a discrete set of NCl classes, i.e. yi ∈ {C1, C2, ..., CNCl}. Given a set T of training examples, a learning algorithm induces a classifier h, which is a hypothesis about the true unknown function f. Given new x values, h predicts the corresponding y values.

In this work we consider symbolic classifiers whose description can be represented by a set of NR unordered rules, i.e. h = {R1, R2, ..., RNR}. The term unordered means that a rule can be interpreted in isolation, i.e., without taking into account other rules in the set.

A rule R assumes the form if B then H, or symbolically B → H, where H stands for the head, or rule conclusion, and B for the body, or rule condition. The body consists of a disjunction of conjunctions of feature tests of the form Xi op Value, where Xi is a feature name, op is an operator in the set {=, ≠, <, ≤, >, ≥} and Value is a valid value for feature Xi. In a classification rule, the head H assumes the form class = Ci, where Ci ∈ {C1, ..., CNCl}.

Given a rule R = B → H and a set of examples T, let B ⊂ T be the set of examples that satisfy B, and H ⊂ T be the set of examples that satisfy H. Also, let B̄ and H̄ be their respective complements. Examples that satisfy both B and H are correctly covered by R, and belong to the set B ∩ H. Instances satisfying B but not H are incorrectly covered by R, and belong to the set B ∩ H̄. Instances that do not satisfy B are not covered by the rule, and belong to the set B̄. One way to assess the quality of a rule is by computing its contingency matrix [11], as shown in Table 1. Denoting the cardinality of a set A by a, i.e. a = |A|, the symbols b and h in Table 1 denote the number of instances in the sets B and H respectively, i.e. b = |B| and h = |H|, and so forth (e.g., bh = |B ∩ H|). From the contingency matrix of R numerous rule quality measures can be calculated [11]. In this work we use Acc(R) = bh/b, Lap(R) = (bh + 1)/(b + NCl) and Cov(R) = b.

Table 1. Contingency matrix for B → H

              B         B̄
    H        bh        b̄h        h
    H̄       bh̄       b̄h̄       h̄
              b         b̄         n
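To make these measures concrete, the following Python sketch (ours, not part of the paper; the Rule representation and helper names are illustrative) computes the contingency counts of a rule on a data set and derives Acc(R), Lap(R) and Cov(R):

from dataclasses import dataclass

@dataclass
class Rule:
    # body: list of (feature_index, operator, value) tests, e.g. (0, "<=", 3.5)
    body: list
    head: str  # predicted class label, i.e. the rule head "class = Ci"

    def covers(self, x):
        ops = {"==": lambda a, b: a == b, "!=": lambda a, b: a != b,
               "<": lambda a, b: a < b, "<=": lambda a, b: a <= b,
               ">": lambda a, b: a > b, ">=": lambda a, b: a >= b}
        return all(ops[op](x[i], v) for (i, op, v) in self.body)

def contingency(rule, X, y):
    """Return (bh, b): # covered and correctly classified, # covered."""
    covered = [yi for xi, yi in zip(X, y) if rule.covers(xi)]
    bh = sum(1 for yi in covered if yi == rule.head)
    return bh, len(covered)

def acc(rule, X, y):                      # Acc(R) = bh / b
    bh, b = contingency(rule, X, y)
    return bh / b if b else 0.0

def lap(rule, X, y, n_classes):           # Lap(R) = (bh + 1) / (b + NCl)
    bh, b = contingency(rule, X, y)
    return (bh + 1) / (b + n_classes)

def cov(rule, X, y):                      # Cov(R) = b
    _, b = contingency(rule, X, y)
    return b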

As more than one rule with different consequents may cover an instance, a set of rules can be used in different ways to constitute a classifier. One of them, called the Single Rule classification method (SR), consists in choosing from all fired rules the best rule, according to some rule quality criterion (for instance, a rule quality measure), to classify a given instance. Another method, called the Multiple Rule classification method (MR), considers the combination of all fired rules. In this work, MR has been implemented as follows: let MRCi be the set of fired rules having the same class Ci. For each MRCi, the quality measure of the disjunction of these rules is computed, and the instance is classified according to the best computed quality value among the MRCi.

Finally, the performance of the rule set as a classifier (using either SR or MR to classify examples) can be assessed in a confusion matrix by comparing the classification given by the classifier with the true classification of the example in the test set. For a two-class problem, with classes generally called positive (+) and negative (−), this confusion matrix is shown in Table 2, where TP (TN) is the number of positive (negative) instances correctly classified, and FP (FN) is the number of negative (positive) instances incorrectly classified. As with rules, numerous classifier quality measures based on the confusion matrix can be calculated. In this work we used Acc(h) = (TP + TN)/N; F1(h) = 2TP/(2TP + FN + FP); and Prec(h) = TP/(TP + FP).

Table 2. Confusion matrix for binary classification

    Class     Predicted +    Predicted −
    True +        TP             FN
    True −        FP             TN
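As an illustration of the two classification methods, here is a minimal Python sketch (our reading of the description above, reusing the Rule class and acc function from the previous sketch; scoring the disjunction of an MR group by pooling the examples covered by any of its rules is our assumption):

from collections import defaultdict

def classify_sr(rules, x, X, y, quality=acc):
    """Single Rule (SR): among all fired rules, pick the best one."""
    fired = [r for r in rules if r.covers(x)]
    if not fired:
        return None  # no rule covers x; a default class could be used instead
    best = max(fired, key=lambda r: quality(r, X, y))
    return best.head

def classify_mr(rules, x, X, y):
    """Multiple Rule (MR): score the disjunction of fired rules per class."""
    groups = defaultdict(list)
    for r in rules:
        if r.covers(x):
            groups[r.head].append(r)
    if not groups:
        return None
    def disjunction_quality(class_rules):
        # accuracy of the disjunction: pool examples covered by any rule
        covered = [(xi, yi) for xi, yi in zip(X, y)
                   if any(r.covers(xi) for r in class_rules)]
        correct = sum(1 for _, yi in covered if yi == class_rules[0].head)
        return correct / len(covered) if covered else 0.0
    return max(groups, key=lambda c: disjunction_quality(groups[c]))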

3 Genetic Algorithms and Symbolic Learning

Genetic algorithms provide an approach to learning that performs a global search based on abstractions of the processes of Darwinian evolution. A GA maintains a population of individuals, where each individual is a candidate solution to a given problem and is evaluated by a fitness function that measures its quality as a solution to the problem considered [8]. GAs generate new individuals (offspring) by repeatedly mutating and recombining parts of the best current individuals in the population. At each iteration (generation) the current population is updated by replacing some fraction of the population with the best offspring. Thus, the better the quality of an individual, the higher the probability that parts of its candidate solution will be passed on to later generations of individuals. This iterative process executes repeatedly until the best possible individual is found or another stop criterion is satisfied. Many stop criteria can be used. Two criteria often used are (1) executing a pre-defined number of generations, or (2) waiting for convergence of the fitness function of the individuals, which remains the same after a fixed number of generations.

Given a problem, it is necessary to find a suitable representation for individuals and a fitness function for the GA. In this work, each individual consists of a set of rules which represents a symbolic classifier, and the fitness function should be a measure of the classifier's quality.

In symbolic learning, the two following approaches have been proposed by the GA community to represent individuals, as described in [7]: Pittsburgh [4, 13], where each individual codifies a set of knowledge rules; and Michigan, where each individual codifies a single knowledge rule. As our interest is to find classifiers with good predictive power, the interaction among the rules that compose the classifier is more important than each rule individually; thus the Pittsburgh approach seems more natural. The implemented GA is described next.

4 GAESC — A Genetic Algorithm for Evolving Symbolic Classifiers

The GA proposed in this work follows the same steps as the classical GA [8] and is described by Algorithm 1, where Cinit represents the set of classifiers already constructed by the user.

Individuals codification: As already mentioned, each individual is a symbolic classifier represented by a set of rules coded using the Pittsburgh approach, drawing on a knowledge rule base set W containing W rules (Figure 1), where each rule, all different, is uniquely identified by a number. Observe that the rules that participate in the initial classifiers in Cinit must be in W. Besides the rules from Cinit, and in order to add diversity to the evolution process, W can be loaded with rules constructed by domain specialist(s), as well as rules constructed using, for example, a classification association rule algorithm such as Apriori¹, among others. Figure 2 shows the codification of an individual composed of 5 rules from W.

The initial individuals, except the ones that belong to Cinit, are initialized with Nrules rules. Unless provided by the user, the value of Nrules is calculated as the mean number of rules of the classifiers in Cinit. Note that the individuals' size may change at each generation. The implemented operators are described next.

¹ In this work, only classification association rules, i.e., rules generated by Apriori where the right-hand side refers only to the class attribute, are used. In our experiments, to discretize the attributes, we used the unsupervised discretization algorithm implemented in the Weka toolkit, with the maximum number of bins set to 10.

Figure 1. Knowledge rule base W: a list of W rules, each uniquely numbered (01: if ... then ...; 02: if ... then ...; ...; W: if ... then ...).

Figure 2. Example of a coded individual: the sequence of rule identifiers 01 10 15 09 07.
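A minimal sketch of this encoding (ours; the toy rule base stands in for rules induced by CN2, C4.5, C4.5rules and Apriori, and reuses the Rule class from the Section 2 sketch):

import random

# Hypothetical knowledge rule base W: rule id -> Rule. In GAESC these
# rules come from the initial classifiers and from classification
# association rules; here we build toy single-test rules instead.
W = {i: Rule(body=[(0, "<=", i / 10.0)], head="C1" if i % 2 else "C2")
     for i in range(1, 31)}

def random_individual(n_rules):
    """An individual (classifier) is a list of distinct rule ids from W."""
    return random.sample(sorted(W), n_rules)

individual = random_individual(5)   # e.g. [1, 10, 15, 9, 7], cf. Figure 2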

Selection operator: The individuals are selected for the next population according to their fitness: the better the fitness of an individual, the higher the probability of this individual being selected for the next generation. Elitism was also used: in each new generation, the best individual from the previous one is maintained.

Crossover operator: Two individuals (parents) are chosen. Consider, for example, parent1 = 07 02 11 04 08 03 01 05 09 and parent2 = 05 23 10 06 12 09 20 15, and that one position is randomly selected in each of them, say positions 3 and 5 respectively, dividing each individual into two sets of rules: parent1 = 07 02 11 | 04 08 03 01 05 09 and parent2 = 05 23 10 06 12 | 09 20 15. They are then crossed over, generating the following two offspring: son1 = 07 02 11 09 20 15 and son2 = 05 23 10 06 12 04 08 03 01 09. In this way, the individuals' size may change from one generation to the next, enabling GAESC to find individuals with an appropriate number of rules. Duplicated rules are removed.

Mutation operator: A rule is randomly selected from an individual and is replaced (without replacement) by a randomly selected rule from the knowledge rule base W. The mutation operator may also be applied to offspring within the same generation.

Fitness functions: Since the individuals are classifiers, fitness functions should evaluate their behavior on a test data set. In GAESC, the measures used to evaluate classifiers are called HQ. The implemented classifier evaluation measures are HQAcc(h) = Acc(h), HQPrec(h) = Prec(h), HQF1(h) = F1(h), HQAC(h) = Acc(h) × mean(Cov(Ri)), ∀Ri ∈ h, and HQPC(h) = Prec(h) × mean(Cov(Ri)), ∀Ri ∈ h.
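Both operators are straightforward on this encoding. A sketch (ours; duplicate removal keeps the first occurrence, which reproduces son2 in the example above):

import random

def crossover(parent1, parent2):
    """One-point crossover with independent cut points, as in the example:
    cuts 3 and 5 give son1 = p1[:3] + p2[5:] and son2 = p2[:5] + p1[3:]."""
    c1 = random.randint(1, len(parent1) - 1)
    c2 = random.randint(1, len(parent2) - 1)
    son1 = parent1[:c1] + parent2[c2:]
    son2 = parent2[:c2] + parent1[c1:]
    dedup = lambda ids: list(dict.fromkeys(ids))  # remove duplicated rules
    return dedup(son1), dedup(son2)

def mutate(individual, W):
    """Replace one randomly chosen rule by a rule from W not yet present."""
    pos = random.randrange(len(individual))
    candidates = [i for i in W if i not in individual]
    if candidates:
        individual = individual[:]        # do not modify the parent in place
        individual[pos] = random.choice(candidates)
    return individual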

Furthermore, as explained in Section 2, classification by a rule set can be carried out using the single (SR) or multiple (MR) rule classification methods. Besides the three rule measures stated earlier, an index PE, which subtracts precision from error and is defined as PE(R) = Err(R) − Prec(R), is used.

The SR methods implemented are SRAcc, SRLap,SRCov and SRPE , which classify an example using thebest covering rule according to Acc(R), Lap(R), Cov(R)and PE(R), respectively. The MR methods proposed areMRAcc, MRLap, MRCov and MRPE .

The fitness function of GAESC can be set as a combination of two of the described methods: one HQ measure combined with one MR or SR classification method. In other words, the possible fitness functions the user can choose from are MRAccHQAC, MRCovHQAcc, SRLapHQF1, and so forth.
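For instance, the combination MRAccHQAcc could be sketched as follows (ours, composing the earlier snippets; the individual is evaluated on the training set, as in Algorithm 1 below):

def fitness_mracc_hqacc(individual, X, y):
    """Hypothetical MRAccHQAcc fitness: classify every training example with
    the MR method (groups scored by Acc) and score the classifier by Acc(h)."""
    rules = [W[i] for i in individual]
    predictions = [classify_mr(rules, xi, X, y) for xi in X]
    return sum(p == yi for p, yi in zip(predictions, y)) / len(y)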


Algorithm 1 GAESC algorithm.
Require: S = {(x1, y1), ..., (xN, yN)}: set of examples used to construct the training, test and validation sets;
         Cinit = {ha, hb, ..., hq}: initial classifiers;
         W = {R1, ..., RW}: knowledge rule base;
         Nind: number of individuals in the population;
         crossover, selection and mutation parameters;
         f(h): fitness function to evolve individuals.
 1: Generate initial population Pcurrent = {h1, ..., hNind}, where Cinit ⊂ Pcurrent;
 2: for all h ∈ Pcurrent do
 3:     Calculate f(h) using the training set;
 4: end for
 5: while stop criterion not satisfied do
 6:     h = best_individual(Pcurrent);
 7:     Pnew = {h} ∪ (Nind − 1 individuals selected from Pcurrent);
 8:     Apply the crossover operator to pairs of individuals selected from Pnew;
 9:     Apply the mutation operator to individuals selected from Pnew;
10:     Evaluate f on the modified individuals of Pnew using the training set;
11:     Pcurrent = Pnew;
12: end while
13: h = best_individual(Pcurrent);
14: h* = Post-process(h);  {prune h using the validation set}
15: return h*;  {classifier evolved by the GA}
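The loop translates almost line-for-line into Python. A compact sketch (ours, reusing random_individual, crossover and mutate from the previous snippets; a fixed number of generations stands in for the stop criterion, and post-processing on the validation set is omitted):

import random

def fitness_selection(population, k, fitness):
    """Fitness-proportional (roulette-wheel) selection of k individuals."""
    weights = [max(fitness(h), 1e-9) for h in population]
    return [ind[:] for ind in random.choices(population, weights=weights, k=k)]

def gaesc(c_init, W, n_ind, fitness, p_cross=0.40, p_mut=0.01,
          n_generations=10, n_rules=None):
    """Sketch of Algorithm 1. `fitness` maps an individual (a list of rule
    ids) to a number computed on the training set by the caller."""
    if n_rules is None:  # default: mean size of the initial classifiers
        n_rules = round(sum(len(h) for h in c_init) / len(c_init))
    population = list(c_init) + [random_individual(n_rules)
                                 for _ in range(n_ind - len(c_init))]
    for _ in range(n_generations):
        best = max(population, key=fitness)            # elitism
        new_pop = [best] + fitness_selection(population, n_ind - 1, fitness)
        for i in range(1, len(new_pop) - 1, 2):        # crossover on pairs
            if random.random() < p_cross:
                new_pop[i], new_pop[i + 1] = crossover(new_pop[i], new_pop[i + 1])
        for i in range(1, len(new_pop)):               # mutation, sparing the elite
            if random.random() < p_mut:
                new_pop[i] = mutate(new_pop[i], W)
        population = new_pop
    return max(population, key=fitness)

A caller would supply the fitness as, for example, fitness=lambda ind: fitness_mracc_hqacc(ind, X_train, y_train).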

Stop criteria: Three criteria were implemented:

1. A maximum number of generations, specified by the user;

2. A convergence criterion: given a number of generations Ngen, if the fitness of the best individual does not improve over the last Ngen generations, the GA stops;

3. Similar to the previous one, but GAESC stops if the mean fitness of all individuals does not change over the last Ngen generations.
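Criteria 2 and 3 reduce to a sliding-window test on a per-generation history of fitness values (best fitness for criterion 2, mean fitness for criterion 3). A possible sketch (ours):

def converged(history, n_gen, tol=0.0):
    """True if the tracked fitness value (one entry per generation) has not
    improved by more than tol over the last n_gen generations."""
    if len(history) <= n_gen:
        return False
    return max(history[-n_gen:]) - history[-n_gen - 1] <= tol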

5 Experiments and Results

In order to evaluate the proposed genetic algorithm, we carried out a series of experiments using three data sets from UCI [9]: Autos, Balance² and Heart. Table 3 describes the main characteristics of the data sets: number of instances (# Inst.); number of features (continuous, discrete) (# Feat.); class distribution (Class %); error rate of the classifier which always predicts the most prevalent class (Maj. Err.); presence/absence of unknown values (Unk. Val?); and number of conflicting/duplicated instances (# InstCD). The behavior of GAESC was evaluated by varying the fitness function and the stop criteria.

Data set   # Inst.   # Feat. (cont.,disc.)   Class (%)                        Maj. Err.         Unk. Val?   # InstCD
Autos      205       25 (15,10)              safe (44.88%), risky (55.12%)    44.88% in risky   Y           0 (0.00%)
Balance    576       4 (0,4)                 L (46.08%), R (53.92%)           46.08% in L       N           0 (0.00%)
Heart      270       13 (5,8)                1 (55.56%), 2 (44.44%)           44.44% in 1       Y           0 (0.00%)

Table 3. Data sets description

Experiments were evaluated using 10-fold cross-validation. In each fold, the remaining training data was used to construct classification association rules using Apriori (continuous features were discretized in advance), and three classifiers were constructed using CN2, C4.5 and C4.5rules. All the induced rules were used to compose the knowledge rule base. The initial population consisted of the three classifiers induced by CN2 (unordered rules), C4.5 and C4.5rules (Cinit), as well as classifiers constructed using randomly selected rules from the knowledge rule base. The mutation and crossover parameters were set to pm = 0.01 and pc = 0.40, respectively. The number of rules of the individuals in the initial population (except the ones in Cinit) was set to the mean number of rules of the three classifiers in Cinit.

In order to test the convergence of the algorithm, threedifferent stopping criteria configurations were used:

Maximum number of generations: number of executed generations: 10; number of individuals in the population: 15;

Convergence of the best individual: the fitness of the best individual does not improve in 10 generations; number of individuals in the population: 15;

Convergence of the population: the fitness of all individuals does not improve in 25 iterations; number of individuals in the population: 50.

² The original data set, named Balance-Scale, was modified by removing the instances from class B, since this class comprises only 7.84% of the total number of examples and introduces the additional problem of class imbalance [1].



We use the error rate, averaged over the 10 cross-validation test sets, as the metric to evaluate our approach. In order to analyze whether there are differences among the methods, we ran the Friedman test³. To run the tests, each combination of stopping criteria and fitness function is considered as an independent execution of the algorithm. Due to lack of space, only the results of these tests are reported here⁴. The Friedman test was run with four different null hypotheses: (H1) that the results obtained using the three convergence criteria are comparable considering all results; (H2) that the performance of using either single rule (SR) or multiple rule (MR) classification methods is comparable considering all results; (H3) that the performance of using Acc(R), Cov(R), Lap(R) or PE(R) as rule classification measures is comparable considering all results; and (H4) that the performance of using HQAC(h), HQAcc(h), HQF1(h), HQPC(h) or HQPrec(h) as hypothesis quality measures is comparable considering all results. When a null hypothesis is rejected by the Friedman test, we can proceed with a post-hoc test to detect which differences among the methods are significant. We ran the Nemenyi multiple comparison with a control test to point out whether there are significant differences among the methods involved in the experimental evaluation.
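For readers who want to reproduce this kind of analysis, a Python sketch (our illustration, not the authors' tooling; scipy provides the Friedman test and the third-party scikit-posthocs package a Nemenyi post-hoc test; the toy arrays stand in for per-execution error rates):

import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# rows: independent executions (stop criterion x fitness function),
# columns: the methods being compared (toy numbers)
errors = np.array([[0.054, 0.069, 0.044],
                   [0.049, 0.049, 0.064],
                   [0.063, 0.054, 0.071],
                   [0.100, 0.119, 0.096]])

stat, p = friedmanchisquare(*errors.T)
print(f"Friedman statistic={stat:.3f}, p-value={p:.3f}")
if p < 0.05:  # null hypothesis rejected at the 95% confidence level
    print(sp.posthoc_nemenyi_friedman(errors))  # pairwise post-hoc p-values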

The Friedman tests rejected the four null hypotheses at the 95% confidence level. Figures 3 to 6 show the results of the Nemenyi tests. In each diagram, the thicker line above the graph marks the interval of one critical difference (see [6]) that two methods must exceed to be statistically different from each other. Groups of methods that are not significantly different are connected by a line.

As shown in Figure 3, convergence considering the mean fitness of the population over the last 25 iterations is statistically better than the other criteria, and convergence of the best individual is better than running only 10 generations.

Figure 3. Critical difference diagram for hypothesis H1 (do all convergence criteria behave the same?): Convergence of the population vs. Convergence of the best individual vs. Maximum number of generations.

Figure 4 shows that using multiple rule (MR) evaluation is significantly better than using single rule (SR) evaluation to classify an instance.

³ The Friedman test is the nonparametric equivalent of the repeated-measures ANOVA. See [6] for a thorough discussion of statistical tests in machine learning research.

⁴ All tabulated results can be found at http://www.icmc.usp.br/~prati/HIS2008.

Regarding the rule classification measure, Figure 5 shows no statistically significant difference among Acc(R), Lap(R) and PE(R).

Figure 4. Critical difference diagram for hypothesis H2 (do single rule (SR) and multiple rule (MR) evaluation behave the same?): Rule combination vs. Single rule.

Figure 5. Critical difference diagram for hypothesis H3 (do Acc(R), Cov(R), Lap(R) and PE(R) behave the same as classification measures?): Acc, Lap, PE, Cov.

Finally, Figure 6 shows the comparison of HQAC(h), HQAcc(h), HQF1(h), HQPC(h) and HQPrec(h) as hypothesis quality measures. As can be seen in this figure, there are no significant differences between HQAcc(h) and HQF1(h), or among HQAC(h), HQPC(h) and HQPrec(h), although the first two are better than the latter three.

Figure 6. Critical difference diagram for hypothesis H4 (do HQAC(h), HQAcc(h), HQF1(h), HQPC(h) and HQPrec(h) behave the same as hypothesis quality measures?): HQAcc, HQF1, HQPC, HQPrec, HQAC.

As the results using the average convergence of the population over the last 25 iterations and multiple rule classification (MR) were statistically better than the others, we constrain further analysis to them. We also considered only MRAcc(R), MRLap(R) and MRPE(R) as classification methods and HQAcc(h) and HQF1(h) as hypothesis quality measures, as the results show no statistical differences among them. We compare these methods with respect to the number of rules as well as the error rate of the classifiers induced by C4.5, C4.5rules and CN2.

Table 4 shows the error rates of the three inducers, C4.5, C4.5rules and CN2, and of the selected GAESC configurations. Numbers in parentheses indicate standard errors.


                autos          balance        heart
C4.5            8.69 (2.95)    28.48 (1.61)   22.96 (3.16)
C4.5rules       9.14 (3.22)    15.12 (1.71)   21.11 (2.28)
CN2            12.14 (3.29)    15.11 (1.55)   22.22 (2.82)
MRAccHQAcc      5.40 (1.39)     4.88 (0.81)    9.63 (2.42)
MRAccHQF1       4.88 (1.78)     6.26 (1.59)   10.00 (1.24)
MRLapHQAcc      6.88 (1.97)     5.38 (0.98)   11.85 (2.46)
MRLapHQF1       4.88 (0.71)     5.04 (1.12)   12.96 (2.08)
MRPEHQAcc       4.43 (1.14)     9.19 (1.71)    8.89 (0.99)
MRPEHQF1        6.38 (1.67)     7.11 (1.44)    9.63 (2.36)

Table 4. Error rate

                autos           balance         heart
C4.5            6.00 (6.77)     36.20 (4.13)    25.10 (4.38)
C4.5rules       8.70 (1.25)     34.60 (4.45)    14.20 (1.03)
CN2            33.40 (2.12)    200.60 (6.11)    44.40 (2.63)
MRAccHQAcc     26.00 (38.92)    33.30 (9.74)    93.20 (93.11)
MRAccHQF1      28.00 (28.40)    34.10 (11.40)   65.10 (49.93)
MRLapHQAcc     11.60 (7.35)     29.80 (5.94)    13.50 (8.24)
MRLapHQF1      19.10 (15.54)    32.10 (8.80)    42.40 (27.76)
MRPEHQAcc      21.60 (53.03)    36.30 (10.59)   90.40 (103.55)
MRPEHQF1       21.90 (12.44)    38.90 (9.16)    35.20 (27.66)

Table 5. Comparison of the number of rules

As can be seen from Table 4, GAESC greatly outperforms all these inducers on all data sets. Regarding the number of rules, Table 5 shows a comparison of these three algorithms with the proposed methods. In general, C4.5 and C4.5rules produced smaller rule sets for the autos and heart data sets, although our approach compares favorably to the number of rules generated by CN2 for the autos data set, and in three cases out of six for the heart data set. On the other hand, the number of rules generated by GAESC on the balance data set is comparable to C4.5 and C4.5rules, and much lower than the number of rules generated by CN2.

These results show the suitability of our approach. GAESC was able to achieve considerably lower error rates when compared to C4.5, C4.5rules and CN2, with only a slight increase in rule set sizes, keeping them most of the time between the sizes of the classifiers induced by C4.5 and CN2.

6 Conclusions

This paper presented GAESC, a genetic algorithm that evolves a single symbolic classifier from classifiers consisting of rules coming from a knowledge rule base. This knowledge base is formed by rules from symbolic learning algorithms, as well as associative classification rules. Each individual (classifier) is a rule set formed by a subset of rules from the knowledge rule base.

GAESC was evaluated on three data sets from UCI using different configurations of the fitness function. This fitness function was formed by a combination of classification methods and rule set quality measures. We also evaluated different convergence criteria. Individuals from the best configurations were compared to the symbolic classifiers induced by C4.5, C4.5rules and CN2. Results show that GAESC outperforms these symbolic classifiers in terms of classification error rate, while keeping the individuals' rule sets at reasonable sizes compared to them.

Acknowledgments: This research was supported by the Brazilian research councils CNPq and FAPESP. The authors would like to thank Alex A. Freitas (University of Kent, UK) and Luis Correia (Universidade de Lisboa, Portugal) for their helpful discussions about this work, and the anonymous referees for their insightful comments.

References

[1] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations, 6(1):20–29, 2004.

[2] E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36(1/2):105–139, 1999.

[3] F. C. Bernardini, M. C. Monard, and R. C. Prati. Constructing ensembles of symbolic classifiers. Int. J. Hybrid Intelligent Systems, 3(3):159–167, 2006.

[4] K. A. De Jong, W. Spears, and D. Gordon. Using genetic algorithms for concept learning. Machine Learning, 13:161–188, 1993.

[5] T. G. Dietterich. Ensemble methods in machine learning. In First International Workshop on Multiple Classifier Systems, LNCS volume 1857, pages 1–15, New York, 2000.

[6] J. Demšar. Statistical comparisons of classifiers over multiple data sets. J. Machine Learning Research, 7:1–30, 2006.

[7] A. A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer Verlag, 2002.

[8] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.

[9] S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases, 1998.

[10] J. Kittler, M. Hatef, R. P. Duin, and J. Matas. On combining classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(3):226–239, 1998.

[11] N. Lavrač, P. Flach, and B. Zupan. Rule evaluation measures: a unifying view. In Proc. 9th Inter. Workshop on Inductive Logic Programming, LNAI volume 1634, pages 174–185, 1999.

[12] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In KDD, pages 80–86, 1998.

[13] C. Setzkorn and R. C. Paton. MERBIS – a multi-objective evolutionary rule base induction system. Technical Report ULCS-03-016, Department of Computer Science, University of Liverpool, Liverpool, U.K., 2003.
