Modeling XCS in Class Imbalances: Population Size
and Parameter Settings
Albert Orriols-Puig1,2 David E. Goldberg2
Kumara Sastry2 Ester Bernadó-Mansilla1
1 Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
2 Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering,
University of Illinois at Urbana-Champaign
Illinois Genetic Algorithms Laboratory and Group of Research in Intelligent Systems, GECCO’07
Framework
[Diagram: Domain, Learner, Data model. Labels: information based on experience; knowledge extraction; new instance; predicted output]
Information consisting of:
– Examples
– Counter-examples
In real-world domains, typically:
– Higher cost to obtain examples of the concept to be learnt
– So, the distribution of examples in the training dataset is usually imbalanced
Applications:
– Fraud detection
– Medical diagnosis of rare illnesses
– Detection of oil spills in satellite images
Framework
Do learners suffer from class imbalances?
Learner: minimize the global error on the training set
$\text{error} = \frac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{number of examples}}$
– Biased towards the overwhelming (majority) class
– Maximization of the majority class accuracy, to the detriment of the minority class
And what about incremental learning?
– Sampling instances of the minority class less frequently
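The bias of the global error can be illustrated with a small sketch. The learner below is hypothetical (it always predicts the majority class) and the example counts are illustrative assumptions, not figures from the slides.

```python
from typing import Tuple


def global_error(errors_c1: int, errors_c2: int, num_examples: int) -> float:
    """Global error: errors on both classes divided by the number of examples."""
    return (errors_c1 + errors_c2) / num_examples


def always_majority_errors(num_majority: int, num_minority: int) -> Tuple[int, int]:
    """Errors per class (majority, minority) for a hypothetical learner
    that always predicts the majority class."""
    return 0, num_minority


if __name__ == "__main__":
    # Illustrative counts only: 990 majority vs. 10 minority examples (ir = 99).
    num_majority, num_minority = 990, 10
    e_maj, e_min = always_majority_errors(num_majority, num_minority)
    err = global_error(e_maj, e_min, num_majority + num_minority)
    minority_accuracy = 1 - e_min / num_minority
    print(f"global error = {err:.3f}, minority accuracy = {minority_accuracy:.1f}")
```

With these counts the global error is only 1% although no minority class example is classified correctly, which is the bias the slide points at.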
Aim
Facetwise analysis of XCS for class imbalances
How XCS can create rules for the minority class
When XCS will remove these rules
Population size bound with respect to the imbalance ratio
Up to which imbalance ratio can XCS learn from the minority class?
Outline
1. Description of XCS
2. Facetwise Analysis
3. Design of test Problems
4. XCS on the one-bit Problem
5. Analysis of Deviations
6. Results
7. Conclusions
Description of XCS
[Diagram: XCS learning cycle]
– Environment: problem instance
– Population [P]: classifiers 1, 2, 3, 4, 5, 6, …, each storing C A P ε F num as ts exp
– Match set generation: Match Set [M] (e.g. classifiers 1, 3, 5, 6, …)
– Prediction Array: c1 c2 … cn
– Selected action (random action); Action Set [A] (e.g. classifiers 1, 3, 5, 6, …)
– In single-step tasks: REWARD 1000/0
– Classifier parameters update
– Genetic Algorithm: selection, reproduction, mutation
– Deletion
[Diagram: match set generation per class]
– Minority class instance: Population [P], match set generation, Match Set [M]: starved niches
– Majority class instance: Population [P], match set generation, Match Set [M]: nourished niches
Problem niche: the schema defines the relevant attributes for a particular problem niche. E.g.: 10**1*
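Match set generation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Classifier` type and function names are hypothetical, and the don't-care symbol is written `#` (the schema 10**1* becomes the condition `10##1#`).

```python
from typing import List, NamedTuple


class Classifier(NamedTuple):
    """Hypothetical, stripped-down classifier: only condition and action."""
    condition: str  # ternary string over {'0', '1', '#'}, '#' = don't care
    action: int


def matches(condition: str, instance: str) -> bool:
    """A condition matches when every specified bit equals the input bit."""
    return len(condition) == len(instance) and all(
        c == "#" or c == x for c, x in zip(condition, instance)
    )


def match_set(population: List[Classifier], instance: str) -> List[Classifier]:
    """Match set [M]: all classifiers in [P] whose condition matches the instance."""
    return [cl for cl in population if matches(cl.condition, instance)]


if __name__ == "__main__":
    population = [
        Classifier("10##1#", 1),  # representative of the niche 10**1*
        Classifier("0#####", 0),
        Classifier("######", 0),  # overgeneral: matches every instance
    ]
    for cl in match_set(population, "100110"):
        print(cl)
```

A niche whose instances are rarely sampled rarely produces a match set, so its classifiers are rarely updated or reproduced; that is the sense in which the niche is starved.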
Facetwise Analysis
Study XCS capabilities to provide representatives of starved niches:
– Population covering
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
Derive a bound on the population size to guarantee that XCS will learn starved niches
Starting point: theory developed for XCS
– (Butz, Kovacs, Lanzi & Wilson, 04): Model of generalization pressures of XCS
– (Butz, Goldberg & Lanzi, 04): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 07): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design
Facetwise Analysis
Assumptions
– Problems consisting of n classes
– One class sampled with a lower frequency: minority class
– Probability of sampling an instance of the minority class:
$P_s(\text{min}) = \frac{1}{1 + ir}$
$ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
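The sampling probability above is straightforward to compute and to check by simulation. The sketch below is only an illustration; function names and the seed are assumptions.

```python
import random


def p_sample_minority(ir: float) -> float:
    """Ps(min) = 1 / (1 + ir)."""
    return 1.0 / (1.0 + ir)


def empirical_minority_fraction(ir: float, num_samples: int, seed: int = 0) -> float:
    """Fraction of minority class draws in a simulated stream of instances."""
    rng = random.Random(seed)
    p_min = p_sample_minority(ir)
    hits = sum(rng.random() < p_min for _ in range(num_samples))
    return hits / num_samples


if __name__ == "__main__":
    for ir in (1, 9, 64, 1024):
        print(f"ir = {ir:5d}  Ps(min) = {p_sample_minority(ir):.5f}")
    print("simulated, ir = 9:", empirical_minority_fraction(9, 20000))
```

For ir = 1 the classes are balanced and Ps(min) = 0.5; as ir grows, minority class instances become proportionally rarer in the training stream.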
Facetwise Analysis
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
– Population size bound
Population Initialization
Covering procedure
– Covering: generalize over the input with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 01)
Would covering be triggered on minority class instances?
– Probability that one instance is covered by at least one rule (Butz et al., 01), as a function of:
  • Input length
  • Population size
  • Population specificity (initially 1 – P#)
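A minimal sketch of this probability, under stated assumptions: rules and instances are random and independent, each condition bit is specified with probability equal to the population specificity, and a specified bit matches a random input bit with probability 1/2. Under these assumptions a single rule matches with probability ((2 − s)/2)^l. This is a sketch of the kind of expression cited from Butz et al., not a transcription of the slide; function names are hypothetical.

```python
def p_rule_matches(specificity: float, length: int) -> float:
    """Probability that one random rule matches one random instance.

    Each bit matches with probability 1 - specificity / 2.
    """
    return ((2.0 - specificity) / 2.0) ** length


def p_instance_covered(specificity: float, length: int, pop_size: int) -> float:
    """Probability that at least one of pop_size independent rules matches."""
    return 1.0 - (1.0 - p_rule_matches(specificity, length)) ** pop_size


def p_covering_triggered(specificity: float, length: int, pop_size: int) -> float:
    """Probability that no rule matches, so covering would be applied."""
    return 1.0 - p_instance_covered(specificity, length, pop_size)


if __name__ == "__main__":
    p_hash = 0.6                     # P# as in the configuration slide
    specificity = 1.0 - p_hash       # initial population specificity
    for pop_size in (10, 100, 1000):
        print(pop_size, p_covering_triggered(specificity, 20, pop_size))
```

The more general the initial rules (higher P#) and the larger the population, the less likely it is that covering is applied to a newly sampled minority class instance.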
Population Initialization
[Figure: probability of applying covering on the first minority class instance, l = 20]
Creation of Representatives of Starved Niches
Assumptions
– Covering has not provided any representative of starved niches
– Simplified model: only mutation is considered
How can we generate representatives of starved niches?
– In the population there are:
  • Representatives of nourished niches
  • Overgeneral classifiers
– By correctly specifying all the bits of the schema that represents the starved niche
Creation of Representatives of Starved Niches
Summing up:
– Time to get the first representative of a starved niche
– Time to extinction
n: number of classes
μ: Mutation probability
km: Order of the schema
Bounding the Population Size
Population size bound to guarantee that there will be representatives of starved niches
– Require that:
– Bound:
n: number of classes
μ: Mutation probability
km: Order of the schema
Bounding the Population Size
Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
– Consider θGA = 0
– We require that the best representative of a starved niche receive a genetic event before being removed
– Population size bound:
n: number of classes
ir: Imbalance ratio
Design of Test Problems
One-bit problem
– Only two schemas of order one: 0***** and 1*****
– Example: 000110 : 0 (class = value of the left-most bit; condition length l)
Parity problem
– The k bits of parity form a single building block
– Example: 01001010 : 1 (class = number of 1s mod 2 over the k relevant bits; condition length l)
Undersampling instances of the class labeled as 1:
$P_s(\text{min}) = \frac{1}{1 + ir}$
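The two test problems and the undersampling of class 1 can be sketched as below. Names are hypothetical, and two points are assumptions rather than details from the slides: the k relevant parity bits are taken to be the left-most ones, and an instance of the chosen class is drawn by rejection sampling.

```python
import random
from typing import Callable, Tuple


def one_bit_class(instance: str) -> int:
    """One-bit problem: the class is the value of the left-most bit."""
    return int(instance[0])


def parity_class(instance: str, k: int) -> int:
    """Parity problem: number of 1s among the k relevant bits, mod 2.

    Assumption: the k relevant bits are the left-most ones.
    """
    return instance[:k].count("1") % 2


def sample_instance(
    length: int,
    ir: float,
    label_fn: Callable[[str], int],
    rng: random.Random,
    minority_class: int = 1,
) -> Tuple[str, int]:
    """Draw one labelled instance, undersampling the minority class.

    The class is chosen first with Ps(min) = 1 / (1 + ir); random bit
    strings are then drawn until one of that class is found.
    """
    want_minority = rng.random() < 1.0 / (1.0 + ir)
    while True:
        instance = "".join(rng.choice("01") for _ in range(length))
        label = label_fn(instance)
        if (label == minority_class) == want_minority:
            return instance, label


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample_instance(6, 9, one_bit_class, rng))
    print(sample_instance(8, 9, lambda s: parity_class(s, 3), rng))
```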
XCS on the one-bit Problem
XCS configuration
– α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
– selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir
Evaluation of the results:
– Minimum population size to achieve: TP rate * TN rate > 95%
– Results are averages over 25 seeds
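The evaluation criterion, TP rate * TN rate, can be computed as in the sketch below. Treating the minority class (labeled 1) as the positive class is an assumption; the function name is hypothetical.

```python
from typing import Sequence


def tp_tn_product(
    true_labels: Sequence[int], predicted_labels: Sequence[int], positive: int = 1
) -> float:
    """Product of the true positive rate and the true negative rate."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    tn_rate = tn / (tn + fp) if (tn + fp) else 0.0
    return tp_rate * tn_rate


if __name__ == "__main__":
    truth = [0] * 99 + [1]
    always_majority = [0] * 100
    print(tp_tn_product(truth, always_majority))  # the minority class is never predicted
    print(tp_tn_product(truth, truth))            # perfect classification
```

Unlike the global error, the product drops to zero as soon as one of the two classes is never predicted correctly, so a majority-only solution cannot pass the 95% threshold.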
XCS on the one-bit Problem
N remains constant up to ir = 64
N increases linearly from ir=64 to ir=256
N increases exponentially from ir=256 to ir=1024
Higher ir could not be solved
Analysis of the Deviations
Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches.
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.
Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub>ir
Stabilizing the population before testing
– Overgeneral classifiers are poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.
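The subsumption argument can be checked with a short calculation. If minority class instances are sampled with probability 1/(1 + ir), the number of majority class instances seen before the first minority class instance is geometric with mean ir. The sketch below assumes, as a simplification, that the overgeneral classifier takes part in every step; names are hypothetical.

```python
import random


def expected_majority_run(ir: float) -> float:
    """Expected number of majority instances before the first minority one.

    With p = 1 / (1 + ir), the mean of the geometric count is (1 - p) / p = ir.
    """
    p_min = 1.0 / (1.0 + ir)
    return (1.0 - p_min) / p_min


def simulate_majority_run(ir: float, rng: random.Random) -> int:
    """Count majority class draws until the first minority class draw."""
    p_min = 1.0 / (1.0 + ir)
    count = 0
    while rng.random() >= p_min:
        count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    ir = 64
    runs = [simulate_majority_run(ir, rng) for _ in range(10000)]
    print("expected:", expected_majority_run(ir))
    print("simulated mean:", sum(runs) / len(runs))
```

An overgeneral majority class classifier can therefore look accurate for about ir updates, which is why an experience threshold for subsumption below ir risks letting it subsume correct classifiers.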
XCS+PCM in the one-bit Problem
N remains constant up to ir = 128
For higher ir, N slightly increases
We only have to guarantee that a representative of the starved niche will be created
XCS+PCM in the Parity Problem
Building blocks of size 3 need to be processed
Empirical results agree with the theory
Population size bound to guarantee that a representative of the niche will receive a genetic event
Conclusions and Further Work
We derived models that analyzed the representatives of starved niches provided by covering and mutation
A population size bound was derived
We saw that the empirical observations met the theory if four aspects were considered:
– Niched mutation
– as initialization
– Subsumption
– Stabilization of the population
XCS is really robust to class imbalances
Further analysis of the covering operator
Motivation
And what about incremental learning?
Sampling instances of the minority class less frequently
This influences the mechanisms of XCS (Orriols & Bernadó, 2006)
Analysis of the Deviations
Niched Mutation vs. Free Mutation
– Classifiers can only be created if minority class instances are sampled
Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.
Analysis of the Deviations
Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub>ir
Stabilizing the population before testing
– Overgeneral classifiers are poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.
We gather all these little tweaks in XCS+PCM