Practicalities: CITS4404 Artificial Intelligence & Adaptive Systems
TRANSCRIPT
Issues in applying CI techniques
• Global optimisation, some definitions
• Fitness progression
• Generality
• Role of domain knowledge
• Niching and speciation
• Memetic optimisation
• Multi-objective problems
• Co-evolution
• Constraints
• Noisy and dynamic problems
• Offline vs online learning
• Supervised, reinforcement, unsupervised learning
• Experimental methodology, performance measures
• Parameter tuning and/or control
Global optimisation
• Given a function f and a set of solutions S, search for x* ∈ S such that ∀x ∈ S: f(x*) beats f(x)
• The graph below depicts a simple 1D fitness landscape

[Figure: a simple 1D fitness landscape, marking the global optimum x*, several local optima, and three basins of attraction. Source: http://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html]
Definitions
• The function f giving the fitness for each member of S is called the fitness landscape
• The best solution wrt f is called the global optimum
  • Note that there may be multiple (equal) global optima
• Non-global optima that are better than other “similar” solutions are called local optima
• The part of the landscape dominated by an optimum is called its basin of attraction
• Diversity refers to the distribution of a set of solutions over the fitness landscape
  • Diversity-preservation techniques try to ensure a good distribution
Heuristic optimisation
• CI techniques are heuristic optimisers, also known as generate-and-test optimisers
  • Search procedures that use rules (inspired by nature) to decide which solution(s) to try next
• The simplest heuristic optimisers are hill-climbers
  • Given one solution, generate similar solutions, and keep the best of them
• A hill-climber is guaranteed to find a local optimum
  • It can exploit its basin of attraction, but it lacks the ability to explore the entire landscape properly
• CI techniques use populations and other tricks to promote both exploration and exploitation
  • More about this later under memetic algorithms
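The hill-climber described above can be written in a few lines. This is a minimal sketch for a 1D real-valued landscape; the step size, iteration count, and example landscape are illustrative choices, not part of the slides.

```python
import random

def hill_climb(f, x0, step=0.1, iterations=1000, rng=None):
    """Stochastic hill-climber: repeatedly generate a similar solution
    (a small random perturbation) and keep the better of the two."""
    rng = rng or random.Random(0)
    x, fx = x0, f(x0)
    for _ in range(iterations):
        neighbour = x + rng.uniform(-step, step)
        f_n = f(neighbour)
        if f_n >= fx:            # keep the best of the two solutions
            x, fx = neighbour, f_n
    return x, fx

# On a unimodal landscape peaking at x = 2, the climber exploits the
# basin it starts in and converges towards the peak.
best_x, best_f = hill_climb(lambda x: -(x - 2) ** 2, 0.0)
```

Started inside a different basin of a multi-modal landscape, the same procedure would converge to that basin's local optimum instead, which is exactly the exploration weakness noted above.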
Fitness progression
• CI techniques are anytime algorithms
• The fitness of the best known solution will improve over time, but usually under the law of diminishing returns
  • The convergence rate (the rate of improvement) tends to fall over time
• Several shorter runs may be better than one long run

[Figure: best fitness in population vs time (number of generations); most of the progress occurs in the 1st half of the run, with much smaller gains in the 2nd half]
Smart initialisation
• Initial solutions can be generated randomly, or using domain knowledge, often called smart initialisation
• This can improve both results and time performance, but it may also introduce bias into the search

[Figure: best fitness in population vs time (number of generations); T marks the time a randomly-initialised run needs to reach the equivalent fitness of “smart” initialisation]
CI vs problem-specific methods
• CI techniques are general-purpose
  • Robust techniques giving good performance over a range of problems

[Figure: performance plotted over the scale of “all” problems; a special problem-tailored method peaks sharply at one problem P, random search performs uniformly poorly, and a computational intelligence approach performs well across a wide range of problems]
Domain knowledge
• Performance can sometimes be improved by incorporating “expertise” into the process

[Figure: performance plotted over the scale of “all” problems for four evolutionary algorithms, EA 1 to EA 4, incorporating different amounts of domain knowledge, with a specific problem P marked]
Contd.
• Too little domain knowledge makes the search space bigger and can make the search inefficient
  • cf. EA 1
• Too much domain knowledge can exclude novel solutions
  • cf. EA 4
• But care must be taken!
  • “If you tell the system what the solution looks like, that’s what it’ll give you!” [R.L. While]
• But this can all be highly non-obvious…
• Most interesting problems are multi-modal
• Sometimes we want to discover more than just the global optimum
  • i.e. we want to discover y* and z*, as well as x*
  • This might be important to offer extra robustness
  • Often it is hard for a fitness function to capture everything
Niching and speciation

[Figure: a multi-modal fitness landscape with three peaks, marked x*, y*, and z*]
Contd.
• Each basin of attraction is called a niche or sometimes a species
• Niching can be achieved in two broad ways
  • Implicit niching is achieved by modifying the solution representation
  • Explicit niching is achieved by promoting dissimilar solutions, or penalising similar solutions
• Both techniques rely on having some distance metric between solutions
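One common form of explicit niching is fitness sharing, sketched below. This is an illustration only: the triangular sharing function and the niche radius `sigma` are conventional choices, not something the slides prescribe.

```python
def shared_fitness(population, fitness, distance, sigma=1.0):
    """Explicit niching via fitness sharing: divide each solution's raw
    fitness by its niche count, penalising clusters of similar solutions."""
    shared = []
    for x in population:
        # Niche count: how crowded x's neighbourhood is, computed with
        # the distance metric that niching relies on. Solutions farther
        # apart than sigma contribute nothing.
        niche = sum(max(0.0, 1.0 - distance(x, y) / sigma) for y in population)
        shared.append(fitness(x) / niche)
    return shared
```

With equal raw fitness, an isolated solution keeps its full fitness while two near-identical solutions each have theirs roughly halved, which pushes selection to maintain solutions in several niches at once.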
Memetic algorithms
• CI techniques are good at exploration – finding high peaks in the fitness landscape – but are less good at exploitation of those peaks
• Memetic algorithms combine CI with some local-search technique that is good at exploitation
  • AKA Baldwinian, Lamarckian, or cultural algorithms
• Hill-climbing is the classic example
  • CI finds the best basin of attraction, then hill-climbing climbs the peak
• The two techniques can be applied in series, or in parallel
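One Lamarckian memetic generation might look like the sketch below. Everything concrete here is an assumption for illustration: a real-valued encoding, a Gaussian mutation for the exploration phase, and a short hill-climb for the exploitation phase.

```python
import random

def memetic_step(population, f, rng, step=0.1, climbs=20):
    """One memetic generation: a large mutation explores, then a short
    hill-climb exploits the basin reached; the refined child replaces its
    parent if fitter (Lamarckian: the result of learning is inherited)."""
    new_pop = []
    for x in population:
        child = x + rng.gauss(0.0, 1.0)          # exploration: big jump
        for _ in range(climbs):                  # exploitation: small steps
            n = child + rng.uniform(-step, step)
            if f(n) >= f(child):
                child = n
        new_pop.append(max(x, child, key=f))     # keep the better of the two
    return new_pop
```

In a Baldwinian variant the hill-climbed fitness would guide selection but the unrefined child would be kept; the sketch above applies the two techniques in series within each generation.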
Multi-objective optimisation
• In many problems, solutions are assessed wrt several criteria
  • e.g. speed vs safety vs price for car designs
• Fitness is now a vector, not a scalar, which complicates selection
  • Vectors have only a partial ordering, rather than a total ordering
• The “solution” to a multi-objective problem is a set of solutions offering different trade-offs between the objectives
• It is important to make no a priori assumptions about trade-off weights or the shape of the solution set
Contd.
• Two objectives f1 and f2, both being maximised
• Each solution is plotted by its values in the objectives
• X dominates Y if it is better in all objectives
  • e.g. in the figure, P dominates Q, while A and B are non-dominated
• The rank of X is the number of solutions that dominate X
• Selection is based on ranks

[Figure: solutions plotted in f1–f2 objective space, annotated with ranks from 0 to 4; the rank-0 solutions form the non-dominated front]
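The dominance relation and the rank-based selection criterion can be written down directly. A sketch, using the common definition of dominance (at least as good in every objective and strictly better in at least one, a slight refinement of the slide's wording), with both objectives maximised.

```python
def dominates(x, y):
    """True if x dominates y: x is at least as good as y in every
    objective and strictly better in at least one (maximisation)."""
    return (all(a >= b for a, b in zip(x, y)) and
            any(a > b for a, b in zip(x, y)))

def ranks(points):
    """Rank of a point = number of points that dominate it.
    Rank-0 points are non-dominated: the current trade-off front."""
    return [sum(dominates(q, p) for q in points) for p in points]
```

Selection can then prefer lower ranks, which needs no a priori weighting of the objectives.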
Co-evolution
• Fitness is sometimes assessed by the interactions between solutions, rather than in isolation
  • e.g. build a team which is the best in the AFL
• The above is an example of competitive co-evolution
  • Aim for an “arms race” between solutions to drive improvement through parallel adaptation
• The alternative is co-operative co-evolution
  • Decompose a problem into simpler sub-tasks (layered learning), and combine the sub-solutions to solve the original problem
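Interaction-based fitness in competitive co-evolution can be sketched as a round-robin tournament. A toy illustration: the assumption that `contest(x, y)` returns 1 when x beats y is mine, not the slides'.

```python
def competitive_fitness(population, contest):
    """Competitive co-evolution: a solution's fitness is its number of
    wins in contests against every other member of the population, so
    fitness is relative to the current population, not absolute."""
    return [sum(contest(x, y) for j, y in enumerate(population) if j != i)
            for i, x in enumerate(population)]
```

Because fitness is relative, improvements in one solution raise the bar for all the others, which is the mechanism behind the intended "arms race".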
Constraints
• A constraint is a requirement placed on a solution, as opposed to a measure of quality
  • e.g. the length of this component must be less than X, or the power of that component must be more than Y
• A feasible solution is one that satisfies all constraints
  • An infeasible solution fails at least one constraint

[Figure: the space of all solutions, containing several disjoint feasible regions]
Contd.
• There are many different constraint-handling techniques
  • Separatist: consider objectives and constraints separately
  • Purist: discard all infeasible solutions when they arise
  • Repair: repair infeasible solutions when they arise
  • Penalty: modify the fitness function to penalise infeasible solutions
  • MOOP: add an extra objective function that measures “degree of infeasibility”
  • Hybrid: some combination of the above
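The penalty technique in particular is easy to sketch. Hedged: the additive violation measure and the fixed penalty weight below are conventional choices, and tuning that weight is itself a parameter-tuning problem.

```python
def penalised_fitness(fitness, constraints, penalty=1000.0):
    """Penalty constraint handling: subtract a weighted sum of constraint
    violations from the raw fitness. Each constraint function returns its
    amount of violation, 0.0 when satisfied."""
    def f(x):
        violation = sum(c(x) for c in constraints)
        return fitness(x) - penalty * violation
    return f

# e.g. maximise f(x) = x subject to x <= 3
# (the violation is how far x exceeds 3)
g = penalised_fitness(lambda x: x, [lambda x: max(0.0, x - 3.0)])
```

Feasible solutions keep their raw fitness, while infeasible ones are steeply penalised but remain in the population, so the search can still pass through infeasible regions.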
Noisy problems
• A noisy fitness function arises when the fitness calculations aren’t perfect

[Figure: a fitness landscape plus a noise landscape gives a noisy fitness landscape]
Contd.
• The algorithm can now only estimate performance
• Again, this complicates selection
  • Bad solutions might get lucky and survive
  • Good solutions might get unlucky and die
• These behaviours cause undesirable long-term effects
  • The learning rate is reduced
  • Learning may not be retained
• The usual approach to this is resampling
  • Evaluate the fitness multiple times and average the results
  • But how many times is sufficient?
• A second common approach is to try to bound the error
  • Basically to assume the error won’t exceed a certain magnitude
Dynamic problems
• With some problems, the fitness landscape changes over time, maybe due to
  • Temporal effects
  • External factors
  • System adaptation
• The system needs to adapt to this change
  • Requires online learning

[Image source: http://www.natural-selection.com]
Offline vs online learning
• Offline learning is where a system learns before use
  • The strategy is fixed once training is completed
  • Requires comprehensive training data
  • Only feasible in well-understood environments
• Online learning is where a system learns while in use
  • The strategy is adapted from each instance encountered
  • Initial decisions are made from incomplete training data
  • Usually much greater time-pressure to improve
Learning paradigms
• Supervised learning is (offline) training by comparing a system’s responses to expected responses in training data
• Reinforcement learning is (online) training using feedback from the environment to assess the quality of responses
• Unsupervised learning is (offline) training with no training data
  • No real question is presented
  • The system looks for patterns in the data
  • Question: tell me about this data
Unsupervised learning
• Question: tell me about this data

Supervised learning
• Question: what is an apple?
Experimental methodology
• CI techniques are stochastic
  • Their results are non-deterministic
• Thus we should never draw conclusions from a single run
  • Always perform a “large” number of runs
  • Assess results using statistical measures
  • Assess significance using statistical tests
• When comparing algorithms, it is crucial to make all comparisons fair
  • Give each the same amount of resource
  • Use the same performance measures
  • Try different computation limits
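The multiple-runs discipline can be sketched as below. Assumptions for illustration: the algorithm under test accepts a seeded random generator and returns its best fitness, and all names are hypothetical.

```python
import random
import statistics

def summarise_runs(algorithm, runs=30):
    """Run a stochastic algorithm many times with different seeds and
    report the mean and standard deviation of its results, rather than
    drawing conclusions from any single run."""
    results = [algorithm(random.Random(seed)) for seed in range(runs)]
    return statistics.mean(results), statistics.stdev(results)
```

Comparing two algorithms then means comparing the two result distributions, ideally with a statistical significance test rather than just the two means.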
Performance measures
• Offline performance measures
  • Effectiveness (algorithm quality)
    • Success rate (percentage of “good” runs)
    • Mean best fitness at termination
  • Efficiency (algorithm speed)
    • CPU time to “completion”
    • Number of solutions evaluated
• Online performance measures
  • Population distribution
  • Fitness distribution
  • Improvement rate
Parameter tuning
• How do we decide on the various constants in a run of the system?
  • e.g. bigger population, or more generations?
  • How big should mutations be?
• This can be difficult!
  • Sub-optimal values can seriously degrade performance
  • Choosing good values can take significant time
  • Exhaustive search is usually impractical
  • Good values may become bad during a run
Parameter control
• Can we get the system to choose parameters automatically?
  • i.e. allow settings to vary during the run
• Three main alternatives are used
  • Deterministic: change parameters according to some pre-determined schedule, e.g. based on the passage of time
  • Adaptive: change parameters according to some measure of the search progress
  • Self-adaptive: encode the “scope of change” into the solution representation in some way
• One important goal is to reduce the prevalence of “magic numbers” in the system
  • Still, finding good settings is not easy
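Of the three alternatives, a deterministic schedule is the simplest to sketch. Illustrative only: a linear decay of a mutation rate over the run, with arbitrary start and end values.

```python
def mutation_rate(generation, total_generations, start=0.5, end=0.01):
    """Deterministic parameter control: decay the mutation rate linearly
    with the passage of time, from broad exploration early in the run
    to fine-grained exploitation late in the run."""
    frac = generation / max(1, total_generations - 1)
    return start + (end - start) * frac
```

An adaptive scheme would instead compute the rate from a measure of search progress (e.g. raising it when the best fitness stagnates), and a self-adaptive scheme would store the rate inside each solution and let it evolve alongside the solution itself.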