Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu
NIGEL 2006 Llorà, X., Sastry, K., and Goldberg, D. 2
Motivation and Early Work
• Can we apply Wilson’s ideas for evolving rule sets formed only by maximally accurate and general rules in Pittsburgh LCS?
• Previous multi-objective approaches:
  Bottom up (Bernadó, 2002)
  • Panmictic populations
  • Multimodal optimization (sharing/crowding for niche formation)
  Top down (Llorà, Goldberg, Traus, & Bernadó, 2003)
  • Explicitly addresses accuracy and generality
  • Uses them to push toward and produce compact rule sets
• The compact classifier system (CCS) is rooted in the bottom-up approach.
Maximally Accurate and General Rules
• Accuracy and generality can be computed as

  γ(r) = (n_t+(r) + n_t−(r)) / n_t          α(r) = n_t+(r) / n_m(r)

  (n_t: total training examples; n_m(r): examples matched by r; n_t+(r) and n_t−(r): matched examples classified correctly and incorrectly)
• Fitness should combine accuracy and generality:

  f(r) = α(r) · γ(r)^ν    (ν weights the pressure toward generality)
• Such measures can be applied either to rules or to rule sets.
• The CCS uses this fitness and a compact genetic algorithm (cGA) to evolve such rules.
• One cGA run provides one rule.
• Multiple rules are required to form a rule set.
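The fitness above can be sketched for the 3-input multiplexer used later in the deck. This is a minimal sketch, assuming the reconstructed definitions of accuracy α and generality γ; the exponent `nu` and the rule representation over {0, 1, #} for rules predicting class 1 are assumptions, not the paper's exact code.

```python
from itertools import product

def mux3(bits):
    # 3-input multiplexer: one address bit selects one of two data bits
    a, d0, d1 = bits
    return d1 if a else d0

def matches(rule, x):
    # rule is a string over {'0', '1', '#'}; '#' matches anything
    return all(r == '#' or r == str(b) for r, b in zip(rule, x))

def fitness(rule, examples, nu=1.0):
    """f(r) = alpha(r) * gamma(r)**nu (exponent nu is an assumption)."""
    n_t = len(examples)
    matched = [(x, y) for x, y in examples if matches(rule, x)]
    n_m = len(matched)
    n_plus = sum(1 for x, y in matched if y == 1)  # rule predicts class 1
    gamma = n_m / n_t                     # generality: fraction of examples matched
    alpha = n_plus / n_m if n_m else 0.0  # accuracy among matched examples
    return alpha * gamma ** nu

examples = [(x, mux3(x)) for x in product((0, 1), repeat=3)]
# '01#': address 0 selects d0 = 1, so every match is class 1 (fully accurate)
print(fitness('01#', examples))  # alpha = 1.0, gamma = 2/8 -> 0.25
```

A maximally general, maximally accurate rule keeps α = 1 while pushing γ as high as possible.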
The cGA Can Make It
• Rules may be obtained by optimizing

  f(r) = α(r) · γ(r)^ν
• The basic cGA scheme:
1. Initialization (p_xi^0 = 0.5 for every gene)
2. Model sampling (two individuals are generated)
3. Evaluation (f(r))
4. Selection (tournament selection)
5. Probabilistic model update
6. Repeat steps 2–5 until the termination criteria are met
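The six steps above can be condensed into a short sketch. This is a minimal, hypothetical implementation: OneMax stands in for f(r), and the name `pop_equiv` (the 1/N update step of a simulated population of size N) is an assumption.

```python
import random

def cga(fitness, n_bits, pop_equiv=30, steps=3000, seed=0):
    """Minimal compact GA: probability vector + pairwise tournament."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                               # step 1: p_xi^0 = 0.5
    for _ in range(steps):
        a = [int(rng.random() < pi) for pi in p]     # step 2: sample two
        b = [int(rng.random() < pi) for pi in p]     #         individuals
        # steps 3-4: evaluate and select the tournament winner
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):                      # step 5: model update
            if winner[i] != loser[i]:
                p[i] += (1 / pop_equiv) if winner[i] else -(1 / pop_equiv)
                p[i] = min(1.0, max(0.0, p[i]))
    return p

# OneMax stand-in for f(r): the model should drift toward all ones
model = cga(sum, n_bits=8)
print([round(pi, 2) for pi in model])
```

One run converges the probability vector toward a single solution, which is why one cGA run yields one rule.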
cGA Model Perturbation
• Facilitate the evolution of different rules
• Explore the frequency of appearance of each optimal rule
• Initial model perturbation:

  p_xi^0 = 0.5 + U(−0.4, 0.4)
• Experiments using the 3-input multiplexer
• 1,000 independent runs
• Visualize the pair-wise relations of the genes
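A minimal sketch of the perturbed initialization, assuming each gene's initial probability is drawn independently (function name `perturbed_init` is hypothetical):

```python
import random

def perturbed_init(n_bits, delta=0.4, seed=None):
    """Perturbed initial model: p_xi^0 = 0.5 + U(-delta, +delta)."""
    rng = random.Random(seed)
    return [0.5 + rng.uniform(-delta, delta) for _ in range(n_bits)]

p0 = perturbed_init(6, seed=1)
print(all(0.1 <= pi <= 0.9 for pi in p0))  # stays within [0.1, 0.9]
```

Starting each run from a different random model biases different runs toward different optimal rules.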
But One Rule Is Not Enough
• Model perturbation in the cGA evolves different rules
• The goal: evolve a population of rules that solve the problem together
• The fitness measure (f(r)) can also be applied to rule sets
• Two mechanisms:
  Spawning populations until the solution is met
  Fusing populations when they represent the same rule
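The two mechanisms can be sketched as follows for the 3-input multiplexer. This is a toy sketch: `one_run` is a stand-in oracle for one perturbed-cGA run (returning one optimal class-1 rule at random), and "fusing" is modeled simply as set membership.

```python
import random
from itertools import product

OPTIMAL = ['01#', '1#1']  # maximally general class-1 rules of the 3-input mux

def one_run(rng):
    # stand-in for one perturbed-cGA run: yields one optimal rule at random
    return rng.choice(OPTIMAL)

def matches(rule, x):
    return all(r == '#' or r == str(b) for r, b in zip(rule, x))

def covered(rules, positives):
    # the rule set "solves" the problem when every positive example matches
    return all(any(matches(r, x) for r in rules) for x in positives)

rng = random.Random(3)
positives = [x for x in product((0, 1), repeat=3)
             if (x[2] if x[0] else x[1]) == 1]  # mux output is 1
rule_set = set()                               # fusing: identical rules merge
while not covered(rule_set, positives):        # spawning: run until solved
    rule_set.add(one_run(rng))
print(sorted(rule_set))                        # ['01#', '1#1']
```

The loop keeps spawning runs until the fused set of rules jointly covers the problem.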
Spawning and Fusing Populations
Experiments & Scalability
• Analysis using multiplexer problems (3-, 6-, and 11-input)
• The number of rules in [O] grows exponentially: 2^(k+1) maximally general rules for k address bits
  Assume equal probability of hitting each rule (binomial model)
  The number of runs needed to obtain all the rules in [O] grows exponentially
• cGA success rate as a function of problem size:
  3-input: 97%
  6-input: 73.93%
  11-input: 43.03%
• Scalability over 10,000 independent runs
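Under the equal-probability model, the expected number of runs needed to collect every rule in [O] is the classic coupon-collector expectation n·(1 + 1/2 + … + 1/n). A small sketch, assuming |[O]| = 2^(k+1) for k address bits:

```python
from fractions import Fraction

def expected_runs(n_rules):
    """Coupon-collector expectation: n * (1 + 1/2 + ... + 1/n)."""
    return float(n_rules * sum(Fraction(1, i) for i in range(1, n_rules + 1)))

# assumed [O] sizes: 2**(k+1) maximally general rules for k address bits
for k, name in [(1, '3-input'), (2, '6-input'), (3, '11-input')]:
    n = 2 ** (k + 1)
    print(f'{name}: {n} rules, ~{expected_runs(n):.1f} expected runs')
```

Even before accounting for the falling per-run success rate, the required number of runs grows faster than the rule count itself.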
Scalability of CCS
So?
• Open questions:
  Multiple runs are not an option.
  Could the poor cGA scalability be the result of the existence of linkage?
• The χ-ary extended compact classifier system (χeCCS) needs to:
  Perform linkage learning to improve the scalability of the rule learning process.
  Evolve [O] in a single run (rule niching?).
• The χeCCS answer:
Use the extended compact genetic algorithm (Harik, 1999)
Rule niching via restricted tournament replacement (Harik, 1995)
Extended Compact Genetic Algorithm
• A probabilistic model-building GA (Harik, 1999)
Builds models of good solutions as linkage groups
• Key idea:
Good probability distribution → Linkage learning
• Key components:
Representation: Marginal product model (MPM)
• Marginal distribution of a gene partition
Quality: Minimum description length (MDL)
• Occam’s razor principle
• All things being equal, simpler models are better
Search Method: Greedy heuristic search
Marginal Product Model (MPM)
• Partition variables into clusters
• Product of marginal distributions on a partition of genes
• Gene partition maps to linkage groups
Example: genes x1 … xl partitioned into triples
  MPM: [1, 2, 3], [4, 5, 6], …, [l−2, l−1, l]
  Each group stores its marginal distribution, e.g. {p000, p001, p010, p011, p100, p101, p110, p111}
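A minimal sketch of evaluating an MPM: the probability of an individual is the product of the marginal probabilities of its sub-configurations, one per linkage group. The partition and the marginal tables below are illustrative assumptions.

```python
def mpm_probability(partition, marginals, x):
    """P(x) under a marginal product model: product over linkage groups."""
    p = 1.0
    for group, table in zip(partition, marginals):
        key = tuple(x[i] for i in group)  # the group's sub-configuration
        p *= table[key]
    return p

# two linkage groups over four genes; tables hold the marginal probabilities
partition = [(0, 1), (2, 3)]
marginals = [
    {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1},
    {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
]
print(mpm_probability(partition, marginals, (1, 1, 0, 1)))  # 0.4 * 0.25
```

Sampling from an MPM draws each group's sub-configuration independently, which is exactly why the partition encodes linkage.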
Minimum Description Length Metric
• Hypothesis: for an optimal model, model size and error are minimal
• Model complexity, Cm
# of bits required to store all marginal probabilities
• Compressed population complexity, Cp
Entropy of the marginal distribution over all partitions
• MDL metric, Cc = Cm + Cp
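The two terms can be computed directly from a selected population. A sketch under common ecGA assumptions: Cm = log2(N+1) · Σ(2^k − 1) bits for the marginals of each group of size k, and Cp = N · Σ H(group marginals); the toy population is invented.

```python
from math import log2

def mdl(population, partition):
    """Cc = Cm + Cp for a binary population under a gene partition."""
    n = len(population)
    c_m = c_p = 0.0
    for group in partition:
        counts = {}
        for ind in population:
            key = tuple(ind[i] for i in group)
            counts[key] = counts.get(key, 0) + 1
        c_m += log2(n + 1) * (2 ** len(group) - 1)  # bits to store marginals
        for c in counts.values():                   # entropy of the marginal
            c_p -= n * (c / n) * log2(c / n)
    return c_m + c_p

pop = [(0, 0, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
# genes 0 and 1 always co-vary, so merging them lowers the metric
print(mdl(pop, [(0,), (1,), (2,)]), mdl(pop, [(0, 1), (2,)]))
```

Merging correlated genes shrinks Cp faster than it grows Cm, so the metric rewards capturing real linkage and nothing more.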
Building an Optimal MPM
1. Assume independent genes ([1], [2], …, [l])
2. Compute the MDL metric, Cc
3. Consider all combinations of two-subset merges
   E.g., {([1,2], [3], …, [l]), ([1,3], [2], …, [l]), …, ([1], [2], …, [l−1, l])}
4. Compute the MDL metric for all candidate models
5. Select the candidate with minimum MDL, Cc′
6. If Cc′ < Cc, accept the model and go to step 2
7. Else, the current model is optimal
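The steps above can be sketched as a greedy merge loop. This is a self-contained toy (the MDL metric is the Cm + Cp form from the previous slide; the test population is invented), not the authors' implementation.

```python
from math import log2

def mdl(population, partition):
    n = len(population)
    score = 0.0
    for group in partition:
        counts = {}
        for ind in population:
            key = tuple(ind[i] for i in group)
            counts[key] = counts.get(key, 0) + 1
        score += log2(n + 1) * (2 ** len(group) - 1)        # Cm term
        score -= sum(c * log2(c / n) for c in counts.values())  # Cp term
    return score

def greedy_mpm(population, n_genes):
    """Start from independent genes; repeatedly take the pairwise merge
    that most lowers the MDL metric; stop when no merge improves it."""
    model = [(i,) for i in range(n_genes)]
    best = mdl(population, model)
    while len(model) > 1:
        candidates = [
            (mdl(population, [g for g in model if g not in (a, b)] + [a + b]), a, b)
            for i, a in enumerate(model) for b in model[i + 1:]
        ]
        score, a, b = min(candidates)
        if score >= best:
            break                       # current model is optimal
        model = [g for g in model if g not in (a, b)] + [a + b]
        best = score
    return model

# genes 0-2 always co-vary; gene 3 is independent
pop = [(0, 0, 1, 0), (1, 1, 0, 1), (0, 0, 1, 1), (1, 1, 0, 0)] * 4
print(sorted(sorted(g) for g in greedy_mpm(pop, 4)))
```

On this population the search merges the three linked genes into one group and leaves the independent gene alone.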
Extended Compact Genetic Algorithm
1. Initialize the population (usually random initialization)
2. Evaluate the fitness of the individuals
3. Select promising solutions (e.g., tournament selection)
4. Build the probabilistic model
5. Optimize structure & parameters to best fit the selected individuals
   • Automatic identification of sub-structures
6. Sample the model to create new candidate solutions
   • Effective exchange of building blocks
7. Repeat steps 2–6 until some convergence criterion is met
Models built by eCGA
• Use model-building procedure of extended compact GA
Partition genes into (mutually) independent groups
Start with the lowest complexity model
Search for a least-complex, most-accurate model
Model Structure                                                    Metric
[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11]     1.0000
[X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11]        0.9933
[X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11]           0.9819
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11]              0.9644
⋮
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11]                       0.9273
⋮
[X0X1X2X3] [X4X5X6X7] [X8X9X10X11]                                0.8895
Modifying ecGA for Rule Learning
• Rules are described using a χ-ary alphabet {0, 1, #}.
• χeCCS uses a χ-ary version of ecGA (Sastry & Goldberg, 2003; de la Osa, Sastry, & Lobo, 2006).
• Maximally general and maximally accurate rules may be obtained using:

  f(r) = α(r) · γ(r)^ν

• Multiple rules need to be maintained in a single run → niching
  We need an efficient niching method that does not adversely affect the quality of the probabilistic models.
  Restricted tournament replacement (Harik, 1995)
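Restricted tournament replacement can be sketched in a few lines: a new offspring competes only against the most similar of a few randomly drawn individuals, so fit individuals in other niches survive. A toy sketch with an invented population (the window equals the population size only to make the demo deterministic):

```python
import random

def rtr_insert(population, offspring, fitness, window, rng=random):
    """Restricted tournament replacement (niching): the offspring replaces
    the most similar of `window` sampled individuals, and only if fitter."""
    idx = rng.sample(range(len(population)), window)
    closest = min(idx, key=lambda i: sum(a != b
                                         for a, b in zip(population[i], offspring)))
    if fitness(offspring) > fitness(population[closest]):
        population[closest] = offspring  # replace only its nearest neighbour

pop = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
rtr_insert(pop, (0, 0, 1, 1), fitness=sum, window=len(pop),
           rng=random.Random(0))
print(pop)
```

The offspring displaces its nearest neighbour (0, 0, 0, 1) but leaves the fitter all-ones niche intact, which is exactly the behaviour needed to keep several rules alive in one run.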
Experiments
• Goals:
1. Is linkage learning useful for solving the multiplexer problem using Pittsburgh LCS?
2. How far can we push it?
• Multiplexer problems
  The address bits determine which input to use
  There is an underlying structure, isn't there?
• The largest multiplexer solved so far using Pittsburgh approaches (11-input):
  Matches all the examples
  No linkage learning available
• We borrowed the population sizing theory for ecGA.
χeCCS Models for Different Multiplexers
(Figure: building block size increases with the multiplexer size)
χeCCS Scalability
• Follows the facet-wise theory:
1. Grows exponentially with the number of address bits (building block size)
2. Grows quadratically with the problem size
Conclusions
• The χeCCS builds on competent GAs
• The facet-wise models from GA theory hold
• The χeCCS is able to:
1. Perform linkage learning to improve the scalability of the rule learning process.
2. Evolve [O] in a single run.
• The χeCCS shows the need for linkage learning in Pittsburgh LCS to effectively solve multiplexer problems.
• χeCCS solved the 20-input, 37-input, and 70-input multiplexer problems for the first time using Pittsburgh LCS.