
Page 1

Linkage Learning for Pittsburgh LCS: Making Problems Tractable (NIGEL 2006)

Xavier Llorà, Kumara Sastry, & David E. Goldberg

Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign

{xllora,kumara,deg}@illigal.ge.uiuc.edu

Page 2

Motivation and Early Work

• Can we apply Wilson’s ideas for evolving rule sets formed only by maximally accurate and general rules in Pittsburgh LCS?

• Previous multi-objective approaches:

Bottom up (Bernadó, 2002)

• Panmictic populations

• Multimodal optimization (sharing/crowding for niche formation)

Top down (Llorà, Goldberg, Traus, and Bernadó, 2003)

• Explicitly addresses accuracy and generality

• Uses them to push toward and produce compact rule sets

• The compact classifier system (CCS) is rooted in the bottom-up approach.

Page 3

Maximally Accurate and General Rules

• Accuracy and generality can be computed as

α(r) = (n_t^+(r) + n_t^-(r)) / n_t

ν(r) = n_t^+(r) / n_m

• Fitness should combine accuracy and generality

f(r) = α(r) · ν(r)^γ

• Such a measure can be applied either to individual rules or to rule sets (see the sketch at the end of this slide).

• The CCS uses this fitness and a compact genetic algorithm (cGA) to evolve such rules.

• One cGA run provides one rule.

• Multiple rules are required to form a rule set.
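A minimal sketch of how this fitness could be computed for a single rule, assuming rules are strings over {0, 1, #}, examples are binary strings, a rule predicts the positive class whenever it matches, n_m is the number of matched examples, and γ is a free parameter; the helper names and defaults are illustrative, not the talk's exact implementation.

```python
def matches(rule, example):
    """A rule over {0, 1, #} matches a binary example if every non-# position agrees."""
    return all(r == '#' or r == e for r, e in zip(rule, example))

def fitness(rule, data, gamma=1.0):
    """f(r) = alpha(r) * nu(r)**gamma over data, a list of (bitstring, label) pairs."""
    n_t = len(data)
    n_pos = sum(1 for x, y in data if y == 1 and matches(rule, x))      # n_t^+(r): matched positives (assumed meaning)
    n_neg = sum(1 for x, y in data if y == 0 and not matches(rule, x))  # n_t^-(r): negatives correctly left unmatched (assumed)
    n_m = sum(1 for x, _ in data if matches(rule, x))                   # n_m: matched examples (assumed meaning)
    alpha = (n_pos + n_neg) / n_t            # accuracy over the whole training set
    nu = n_pos / n_m if n_m else 0.0         # generality term from the slide
    return alpha * nu ** gamma

# e.g., rule "1#1" on the complete 3-input multiplexer training set
data = [("000", 0), ("001", 0), ("010", 1), ("011", 1),
        ("100", 0), ("101", 1), ("110", 0), ("111", 1)]
print(fitness("1#1", data))   # -> 0.75
```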

Page 4

The cGA Can Make It

• Rules may be obtained by optimizing

f(r) = α(r) · ν(r)^γ

• The basic cGA scheme (a minimal sketch follows at the end of this slide):

1. Initialization

2. Model sampling (two individuals are generated)

3. Evaluation (f(r))

4. Selection (tournament selection)

5. Probabilistic model update

6. Repeat steps 2-5 until termination criteria are met

p_{x_i}^0 = 0.5
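A minimal sketch of the basic cGA loop above, assuming a binary encoding, a generic fitness callback, and a virtual population size n; the termination test and default values are illustrative choices rather than the talk's settings. In the CCS, one such run would be decoded into a single rule (the decoding is not shown here).

```python
import random

def cga(fitness, l, n=100, max_iters=10_000):
    """Compact GA sketch: evolve a probability vector p over l binary genes."""
    p = [0.5] * l                                            # 1. initialization: p_xi^0 = 0.5
    for _ in range(max_iters):
        a = [int(random.random() < pi) for pi in p]          # 2. model sampling: two individuals
        b = [int(random.random() < pi) for pi in p]
        fa, fb = fitness(a), fitness(b)                      # 3. evaluation
        winner, loser = (a, b) if fa >= fb else (b, a)       # 4. selection (binary tournament)
        for i in range(l):                                   # 5. probabilistic model update
            if winner[i] != loser[i]:
                p[i] += 1.0 / n if winner[i] == 1 else -1.0 / n
                p[i] = min(1.0, max(0.0, p[i]))
        if all(pi < 0.05 or pi > 0.95 for pi in p):          # 6. (relaxed) convergence test
            break
    return [int(pi > 0.5) for pi in p]                       # most probable individual
```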

Page 5

cGA Model Perturbation

• Facilitate the evolution of different rules

• Explore the frequency of appearance of each optimal rule

• Initial model perturbation (a small snippet follows at the end of this slide)

p_{x_i}^0 = 0.5 + U(−0.4, 0.4)

• Experiments using the 3-input multiplexer

• 1,000 independent runs

• Visualize the pair-wise relations of the genes
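A one-line rendering of the perturbed initialization above; the chromosome length is an arbitrary illustrative value.

```python
import random

l = 6                                                      # chromosome length (illustrative)
p0 = [0.5 + random.uniform(-0.4, 0.4) for _ in range(l)]   # each p_xi^0 lands in [0.1, 0.9]
```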

Page 6

But One Rule Is Not Enough

• Model perturbation in the cGA evolves different rules

• The goal: evolve a population of rules that solve the problem together

• The fitness measure f(r) can also be applied to rule sets

• Two mechanisms (a hedged sketch follows below):

Spawn new populations until the solution is met

Fuse populations when they represent the same rule
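A heavily hedged sketch of the spawn-and-fuse loop described above; run_cga (one perturbed cGA run returning a rule), solves (does the current rule set solve the problem?), and the use of rule equality as the fusing test are placeholders for illustration, not the CCS's exact procedure.

```python
def compact_classifier_system(run_cga, solves, max_populations=50):
    """Spawn perturbed cGA populations until the evolved rule set solves the
    problem; fuse (here: discard) a new population whose rule duplicates an
    existing one."""
    rule_set = []
    for _ in range(max_populations):
        rule = run_cga()                 # one perturbed cGA run -> one candidate rule
        if rule not in rule_set:         # populations representing the same rule are fused
            rule_set.append(rule)
        if solves(rule_set):             # stop once the rule set solves the problem
            return rule_set
    return rule_set
```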

Page 7

Spawning and Fusing Populations

Page 8

Experiments & Scalability

• Analysis using multiplexer problems (3-, 6-, and 11-input)

• The number of rules in [O] grows exponentially: it grows as 2^i, where i is the number of inputs.

Assume an equal probability of hitting each rule (binomial model).

The number of runs needed to obtain all the rules in [O] grows exponentially (an illustrative estimate follows at the end of this slide).

• cGA success rate as a function of the problem size:

3-input: 97%

6-input: 73.93%

11-input: 43.03%

• Scalability over 10,000 independent runs
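An illustrative back-of-the-envelope estimate of why multiple runs scale poorly, using a coupon-collector argument in place of the talk's exact binomial model: assume each successful run hits one of the rules in [O] uniformly at random, with the per-run success rates reported above.

```python
def expected_runs(n_rules, p_success):
    """Coupon-collector estimate of independent runs needed to obtain every
    one of n_rules optimal rules, when a run succeeds with probability
    p_success and then hits any single rule uniformly at random."""
    harmonic = sum(1.0 / k for k in range(1, n_rules + 1))
    return n_rules * harmonic / p_success

# Using the slide's model of 2**i rules in [O] for an i-input multiplexer
for inputs, p in [(3, 0.97), (6, 0.7393), (11, 0.4303)]:
    print(inputs, round(expected_runs(2 ** inputs, p)))
```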

Page 9

Scalability of CCS

Page 10

So?

• Open questions:

Multiple runs are not an option.

Could the poor cGA scalability be the result of the existence of linkage?

• The χ-ary extended compact classifier system (χeCCS) needs to provide answers to:

Perform linkage learning to improve the scalability of the rule learning process.

Evolve [O] in a single run (rule niching?).

• The χeCCS answer:

Use the extended compact genetic algorithm (Harik, 1999)

Rule niching via restricted tournament replacement (Harik, 1995)

Page 11

Extended Compact Genetic Algorithm

• A probabilistic model-building GA (Harik, 1999)

Builds models of good solutions as linkage groups

• Key idea:

Good probability distribution → Linkage learning

• Key components:

Representation: Marginal product model (MPM)

• Marginal distribution of a gene partition

Quality: Minimum description length (MDL)

• Occam’s razor principle

• All things being equal, simpler models are better

Search Method: Greedy heuristic search

Page 12

Marginal Product Model (MPM)

• Partition variables into clusters

• Product of marginal distributions on a partition of genes

• Gene partition maps to linkage groups

Genes: x1 x2 x3 | x4 x5 x6 | … | x_{l-2} x_{l-1} x_l

Marginal table for a group such as [1, 2, 3]: {p000, p001, p010, p100, p011, p101, p110, p111}

MPM: [1, 2, 3], [4, 5, 6], …, [l-2, l-1, l] (a sketch of this data structure follows below)
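A minimal sketch of the MPM as a data structure, assuming it is stored as a gene partition plus one joint frequency table per linkage group and sampled by drawing each group independently; the representation is an illustration, not eCGA's internal one.

```python
import random
from collections import Counter

def build_mpm(population, partition):
    """Estimate one joint marginal table per linkage group.
    population: list of gene sequences; partition: list of index groups."""
    total = len(population)
    return [{cfg: c / total
             for cfg, c in Counter(tuple(ind[i] for i in group)
                                   for ind in population).items()}
            for group in partition]

def sample_mpm(model, partition, l):
    """Sample one individual by drawing each linkage group independently."""
    child = [None] * l
    for group, table in zip(partition, model):
        cfgs, probs = zip(*table.items())
        for i, v in zip(group, random.choices(cfgs, weights=probs)[0]):
            child[i] = v
    return child
```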

Page 13

Minimum Description Length Metric

• Hypothesis: For an optimal model

Model size and error are minimal

• Model complexity, Cm

# of bits required to store all marginal probabilities

• Compressed population complexity, Cp

Entropy of the marginal distribution over all partitions

• MDL metric, Cc = Cm + Cp (one common formulation is sketched below)
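One common formulation of the two MDL terms, sketched under the assumption of χ-ary genes and the per-group frequency tables from the previous sketch; the exact constants used in the talk may differ.

```python
from math import log2

def mdl_metric(model, partition, n, chi=2):
    """Cc = Cm + Cp for a marginal product model over chi-ary genes (n = population size)."""
    # Model complexity Cm: bits to store every independent marginal frequency.
    c_m = log2(n + 1) * sum(chi ** len(group) - 1 for group in partition)
    # Compressed population complexity Cp: population size times the entropy
    # of each group's marginal distribution.
    c_p = n * sum(-p * log2(p) for table in model for p in table.values() if p > 0)
    return c_m + c_p
```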

Page 14

Building an Optimal MPM

• Assume independent genes ([1],[2],…,[l])

• Compute MDL metric, Cc

• All combinations of two-subset merges

• E.g., {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), ([1],[2],…,[l-1,l])}

• Compute the MDL metric for all candidate models

• Select the candidate with the minimum MDL metric, Cc′

• If Cc′ < Cc, accept the merged model and go to step 2

• Else, the current model is optimal (the greedy loop is sketched below)
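A sketch of the greedy search just described, reusing the hypothetical build_mpm and mdl_metric helpers from the previous slides: start from independent genes, try every pairwise merge, and keep the best merge while the MDL metric keeps improving.

```python
def greedy_mpm_search(population, l, n, chi=2):
    """Greedy MPM structure search guided by the MDL metric Cc."""
    partition = [[i] for i in range(l)]                    # 1. assume independent genes
    best = mdl_metric(build_mpm(population, partition), partition, n, chi)   # 2. compute Cc
    while len(partition) > 1:
        candidates = []
        for i in range(len(partition)):                    # 3. all two-subset merges
            for j in range(i + 1, len(partition)):
                merged = [g for k, g in enumerate(partition) if k not in (i, j)]
                merged.append(partition[i] + partition[j])
                score = mdl_metric(build_mpm(population, merged), merged, n, chi)  # 4. score candidates
                candidates.append((score, merged))
        score, merged = min(candidates, key=lambda c: c[0])  # 5. minimum-MDL candidate
        if score < best:                                     # 6. accept and iterate
            best, partition = score, merged
        else:
            break                                            # 7. current model is optimal
    return partition
```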

Page 15

Extended Compact Genetic Algorithm

• Initialize the population (usually random initialization)

• Evaluate the fitness of individuals

• Select promising solutions (e.g., tournament selection)

• Build the probabilistic model

• Optimize structure & parameters to best fit selected individuals

• Automatic identification of sub-structures

• Sample the model to create new candidate solutions

• Effective exchange of building blocks

• Repeat the evaluation, selection, model-building, and sampling steps until some convergence criterion is met (a compact sketch of this loop follows below)
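A compact sketch of the loop above, wiring tournament selection to the hypothetical greedy_mpm_search, build_mpm, and sample_mpm helpers sketched on the previous slides; population size, tournament size, and generation count are illustrative.

```python
import random

def ecga(fitness, l, pop_size=500, tournament=8, generations=100, chi=2):
    """Extended compact GA sketch: evaluate, select, build an MPM, sample."""
    population = [[random.randrange(chi) for _ in range(l)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]                     # evaluate
        selected = [max(random.sample(scored, tournament), key=lambda s: s[0])[1]
                    for _ in range(pop_size)]                                    # tournament selection
        partition = greedy_mpm_search(selected, l, len(selected), chi)           # linkage groups
        model = build_mpm(selected, partition)                                   # model parameters
        population = [sample_mpm(model, partition, l) for _ in range(pop_size)]  # new candidates
    return max(population, key=fitness)
```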

Page 16

Models built by eCGA

• Use model-building procedure of extended compact GA

Partition genes into (mutually) independent groups

Start with the lowest complexity model

Search for a least-complex, most-accurate model

Model structure                                                    Metric
[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11]      1.0000
[X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11]         0.9933
[X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11]            0.9819
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11]               0.9644
…
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11]                        0.9273
…
[X0X1X2X3] [X4X5X6X7] [X8X9X10X11]                                 0.8895

Page 17

Modifying ecGA for Rule Learning

• Rules are described using a χ-ary alphabet {0, 1, #}.

• χeCCS uses a χ-ary version of ecGA (Sastry and Goldberg, 2003; de la Osa, Sastry, and Lobo, 2006).

• Maximally general and maximally accurate rules may be obtained using:

f(r) = α(r) · ν(r)^γ

• Needs to maintain multiple rules in a run → niching

We need an efficient niching method that does not adversely affect the quality of the probabilistic models.

Restricted tournament replacement (Harik, 1995) (a sketch follows below)
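A sketch of restricted tournament replacement as the niching step, assuming rules are sequences over {0, 1, #} compared with a Hamming-style distance; the window size w is an illustrative choice.

```python
import random

def rtr_insert(population, fitnesses, offspring, offspring_fitness, w=20):
    """Restricted tournament replacement: the offspring competes only against
    the most similar of w randomly chosen population members, which preserves
    niches around different rules."""
    def distance(a, b):                                  # positions where two rules differ
        return sum(x != y for x, y in zip(a, b))
    window = random.sample(range(len(population)), min(w, len(population)))
    closest = min(window, key=lambda i: distance(population[i], offspring))
    if offspring_fitness > fitnesses[closest]:           # replace only if the offspring wins
        population[closest] = offspring
        fitnesses[closest] = offspring_fitness
```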

Page 18

Experiments

• Goals:

1. Is linkage learning useful to solve the multiplexer problem using Pittsburgh LCS?

2. How far can we push it?

• Multiplexer problems (see the sketch at the end of this slide):

The address bits determine which input bit to use

There is an underlying structure, isn’t there?

• The largest multiplexer solved so far using Pittsburgh approaches (11-input):

Match all the examples

No linkage learning available

• We borrowed the population sizing theory for ecGA.
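For reference, a small sketch of the Boolean multiplexer used throughout these experiments: the first k address bits select which of the remaining 2^k data bits is returned as the class (so the 3-, 6-, and 11-input problems have 1, 2, and 3 address bits).

```python
def multiplexer(bits, k):
    """k-address-bit Boolean multiplexer over k + 2**k input bits."""
    assert len(bits) == k + 2 ** k
    address = int("".join(str(b) for b in bits[:k]), 2)   # address bits pick a data bit
    return bits[k + address]

# e.g., the 6-input multiplexer (k = 2): address "10" selects data bit 2
print(multiplexer([1, 0, 0, 0, 1, 0], k=2))   # -> 1
```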

Page 19

χeCCS Models for Different Multiplexers

[Figure: models built by χeCCS for the different multiplexers; building block size increases]

Page 20

χeCCS Scalability

• Follows the facet-wise theory:

1. Grows exponentially with the number of address bits (the building block size)

2. Grows quadratically with the problem size

Page 21

Conclusions

• The χeCCS builds on competent GAs

• The facet-wise models from GA theory hold

• The χeCCS is able to:

1. Perform linkage learning to improve the scalability of the rule learning process.

2. Evolve [O] in a single run.

• The χeCCS shows the need for linkage learning in Pittsburgh LCS to effectively solve multiplexer problems.

• The χeCCS solved the 20-input, 37-input, and 70-input multiplexer problems for the first time using a Pittsburgh LCS.

Page 22

Linkage Learning for Pittsburgh LCS: Making Problems Tractable

Xavier Llorà, Kumara Sastry, & David E. Goldberg

Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign

{xllora,kumara,deg}@illigal.ge.uiuc.edu