Computed Prediction: So far, so good. What now?
DESCRIPTION
Pier Luca Lanzi talks at NIGEL 2006 about computed prediction.
TRANSCRIPT
[Slide 1]
Computed Prediction: So far, so good. What now?
Pier Luca Lanzi
Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA
[Slide 2]
RL
[Slide 3]
What is the problem?
[Diagram: agent-environment loop. The agent observes state s_t and takes action a_t; the environment returns reward r_{t+1} and next state s_{t+1}.]
Compute a value function Q(s_t, a_t) mapping state-action pairs into expected future payoffs
How much future reward is received when action a_t is performed in state s_t?
What is the expected payoff for s_t and a_t?
GOAL: maximize the amount of reward received in the long run
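The value-function idea above can be sketched with a minimal tabular Q-learning update. This is a standard RL algorithm used purely for illustration; the talk does not prescribe this particular learner, and the toy two-state chain below is an assumption of this sketch:

```python
from collections import defaultdict

# Q(s, a): expected future payoff for taking action a in state s.
Q = defaultdict(float)

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy usage: a two-state chain where action 1 in state 0 pays off.
actions = [0, 1]
for _ in range(200):
    q_update(0, 1, 1.0, 1, actions)  # rewarded transition
    q_update(0, 0, 0.0, 0, actions)  # unrewarded self-loop
```

After these repeated updates, Q(0, 1) approaches the payoff 1.0 and dominates Q(0, 0), which is exactly the "expected payoff for s_t and a_t" the slide describes.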
[Slide 4]
Example: The Mountain Car
Task: drive an underpowered car up a steep mountain road
a_t ∈ {acc. left, acc. right, no acc.}
s_t = (position, velocity)
r_t = 0 when the goal is reached, -1 otherwise
[Figure: the mountain with the GOAL at the top of the slope, and the value function Q(s_t, a_t)]
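A minimal sketch of the mountain-car dynamics, using the classic constants from the RL literature; the exact parameters used in the talk's experiments are an assumption here:

```python
import math

def mountain_car_step(position, velocity, action):
    """One step of the classic mountain-car dynamics.
    action: 0 = acc. left, 1 = no acc., 2 = acc. right."""
    velocity += 0.001 * (action - 1) - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))       # velocity bounds
    position += velocity
    if position < -1.2:                              # left wall stops the car
        position, velocity = -1.2, 0.0
    done = position >= 0.5                           # goal near the hilltop
    reward = 0.0 if done else -1.0                   # as on the slide
    return position, velocity, reward, done
```

The car is underpowered because the acceleration term (0.001) is smaller than the gravity term (0.0025), so full throttle alone cannot climb the slope; the agent must first rock backwards to build momentum.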
[Slide 5]
What are the issues?
Exact representation is infeasible; approximation is mandatory
The function is unknown; it is learnt online from experience
The unknown payoff function is learnt while we also try to approximate it
The approximator works on intermediate estimates, but it also tries to provide information for the learning
Convergence is not guaranteed
[Slide 6]
Classifiers
[Slide 7]
Learning Classifier Systems
Solve reinforcement learning problems
Represent the payoff function Q(s_t, a_t) as a population of rules, the classifiers
Classifiers are evolved while Q(s_t, a_t) is learnt online
[Slide 8]
What is a classifier?
IF condition C is true for input s THEN the payoff of action a is p
[Figure: payoff surface for action A; the condition C(s) = l ≤ s ≤ u covers the interval [l, u], over which the classifier predicts the constant payoff p]
Trade-off: general conditions covering large portions of the problem space vs. accurate approximations
Generalization depends on how well conditions can partition the problem space
What is the best representation for the problem?
Several representations have been developed to improve generalization
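The interval-condition rule above might be sketched as follows. The class and function names are illustrative, and a real LCS such as XCS would weight predictions by classifier fitness rather than averaging them uniformly:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """IF l <= s <= u THEN the payoff of `action` is `p` (a constant)."""
    l: float
    u: float
    action: int
    p: float

    def matches(self, s: float) -> bool:
        return self.l <= s <= self.u

def predict(population, s, action):
    """Prediction of the matching classifiers for state s and an action
    (uniform average here; fitness-weighted in a real LCS)."""
    preds = [c.p for c in population if c.action == action and c.matches(s)]
    return sum(preds) / len(preds) if preds else None
```

A general condition (wide [l, u]) covers more of the problem space but forces one constant p over the whole interval, which is the generalization/accuracy trade-off the slide points at.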
[Slide 9]
What is computed prediction?
Replace the prediction p by a parametrized function p(x, w)
[Figure: payoff landscape of action A; over the condition C(s) = l ≤ s ≤ u, the prediction is the line p(x, w) = w_0 + x w_1]
IF condition C is true for input s THEN the value of action a is p(x, w)
Which representation? Which type of approximation?
[Slide 10]
Computed Prediction: Linear Approximation
Each classifier has a vector of parameters w; classifier prediction is computed as
    p(x, w) = Σ_i w_i x_i    (with x_0 a constant input)
Classifier weights are updated using the Widrow-Hoff update,
    w_i ← w_i + (η / |x|²) (P − p(x, w)) x_i
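The linear prediction and its Widrow-Hoff update can be sketched as below; the normalization by |x|² follows the XCSF formulation, while η and the sample payoffs are illustrative choices:

```python
def linear_prediction(w, x):
    """p(x, w) = sum_i w_i * x_i, where x[0] is a constant input x0."""
    return sum(wi * xi for wi, xi in zip(w, x))

def widrow_hoff_update(w, x, target, eta=0.2):
    """Normalized Widrow-Hoff (delta) rule:
    w_i <- w_i + eta / |x|^2 * (P - p(x, w)) * x_i."""
    error = target - linear_prediction(w, x)
    norm = sum(xi * xi for xi in x)
    return [wi + eta / norm * error * xi for wi, xi in zip(w, x)]

# Usage: fit p(x, w) = w0 + w1*s to payoffs from the line 3 + 2*s.
w = [0.0, 0.0]
for _ in range(500):
    for s in (0.0, 0.5, 1.0):
        w = widrow_hoff_update(w, [1.0, s], 3.0 + 2.0 * s)
```

Because the payoff here is exactly linear, the weights converge to w = (3, 2); inside an LCS each classifier runs this update only on the inputs its condition matches.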
[Slide 11]
Summary
[Slide 12]
Typical RL approach: What is the best approximator? (GOAL: learn the payoff function)
Typical LCS approach: What is the best representation for the problem?
What are the differences?
[Diagram: the space of methods spanned by a representation axis (intervals, messy coding, symbols, hulls, ellipsoids, 0/1/#) and an approximator axis (gradient descent, radial basis, NNs, tile coding). Computed-prediction combinations plotted include: Boolean representation with sigmoid prediction; Boolean representation with neural prediction (O'Hara & Bull 2004); real intervals with neural prediction; convex hulls with linear prediction.]
[Slide 13]
To represent or to approximate?
Powerful representations allow the solution of difficult problems with basic approximators
Powerful approximators may make the choice of the representation less critical
Experiment:
Consider a very powerful approximator that we know can solve a certain RL problem
Use it to compute classifier prediction in an LCS, and apply the LCS to solve the same problem
Does genetic search still provide an advantage?
[Slide 14]
Computed Prediction with Tile Coding
A powerful approximator developed in the reinforcement learning community
Tile coding can solve the mountain car problem, given an adequate parameter setting
Classifier prediction is computed using tile coding; each tile coder has a different parameter setting
When using tile coding to compute classifier prediction, one classifier can solve the whole problem
What should we expect?
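A minimal 1-D tile coder, as a sketch of the kind of approximator meant here; the number of tilings, tile width, and offsets are illustrative choices, not the talk's settings:

```python
def active_tiles(s, n_tilings=4, tiles_per_dim=8, lo=0.0, hi=1.0):
    """Indices of the one active tile in each of several offset tilings
    over a 1-D input in [lo, hi]."""
    width = (hi - lo) / tiles_per_dim
    tiles = []
    for t in range(n_tilings):
        offset = t * width / n_tilings           # each tiling is shifted
        idx = int((s - lo + offset) / width)
        idx = min(idx, tiles_per_dim)            # overflow tile at the edge
        tiles.append(t * (tiles_per_dim + 1) + idx)
    return tiles

def tc_predict(weights, tiles):
    """Prediction is the sum of the weights of the active tiles."""
    return sum(weights[i] for i in tiles)

def tc_update(weights, tiles, target, alpha=0.1):
    """LMS update spread evenly over the active tiles."""
    error = (target - tc_predict(weights, tiles)) / len(tiles)
    for i in tiles:
        weights[i] += alpha * error

# Usage: learn the payoff y = 10*s at a few sample points.
weights = [0.0] * (4 * 9)    # n_tilings * (tiles_per_dim + 1) weights
for _ in range(100):
    for s in (0.1, 0.3, 0.5, 0.7, 0.9):
        tc_update(weights, active_tiles(s), 10.0 * s)
```

The overlapping, offset tilings give a resolution finer than any single tiling's tile width, which is why tile coding alone can represent the mountain-car payoff surface when its parameters are chosen well.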
[Slide 15]
The performance?
Computed prediction can perform as well as the approximator with the most adequate configuration
The evolution of a population of classifiers provides advantages over a single approximator
Even if the same approximator alone might solve the whole problem
[Slide 16]
How do parameters evolve?
[Slide 17]
What now?
[Slide 18]
What now?
[Diagram: the problem spanned by a representation axis and an approximator axis — which representation? which approximator?]
Which approximator? Let evolution decide!
Population of classifiers using different approximators to compute prediction
The genetic algorithm selects the best approximators for each problem subspace
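As a toy illustration of the idea, not Lanzi's actual system: below, a greedy pick of the lowest-error approximator on a subspace stands in for the GA's accuracy-based selection pressure, with two illustrative candidate approximators (constant and linear):

```python
# Candidate approximator "factories": each takes (x, y) samples from one
# problem subspace and returns a fitted prediction function.
def constant_fit(points):
    mean = sum(y for _, y in points) / len(points)
    return lambda x: mean

def linear_fit(points):
    # Least-squares line through the points.
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    w0 = (sy - w1 * sx) / n
    return lambda x: w0 + w1 * x

def best_approximator(points, candidates):
    """Pick the candidate with the lowest squared error on this subspace —
    a greedy stand-in for the GA's accuracy-based selection."""
    def sse(f):
        return sum((f(x) - y) ** 2 for x, y in points)
    return min(candidates, key=lambda make: sse(make(points)))
```

On a flat subspace the cheap constant approximator already suffices, while a sloped subspace demands the linear one; evolution applying this kind of pressure per subspace is what lets heterogeneous populations match a single powerful approximator.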
[Slide 19]
Evolving the best approximator
[Slide 20]
What next?
[Diagram: the problem spanned by a representation axis and an approximator axis — which representation? which approximator?]
Which approximator? Let evolution decide!
Population of classifiers using different approximators to compute prediction
Even if the same approximator alone might solve the whole problem
[Slide 21]
Evolving Heterogeneous Approximators
[Plot comparing a population of heterogeneous approximators against the single most powerful approximator]
[Slide 22]
What next?
Allow different representations in the same population
Let evolution select the most adequate representation for each problem subspace
Then, let different representations and different approximators evolve all together
(Probably already done for Boolean conditions)
[Slide 23]
Acknowledgements
Daniele Loiacono, Matteo Zanini, and all the current and former members of IlliGAL
[Slide 24]
Thank you! Any questions?