Machine Learning and ILP for Multi-Agent Systems
Daniel Kudenko & Dimitar Kazakov
Department of Computer Science
University of York, UK
ACAI-01, Prague, July 2001
Why Learning Agents?
Agent designers are not able to foresee all situations that the agent will encounter.
To display full autonomy, agents need to learn from and adapt to novel environments.
Learning is a crucial part of intelligence.
A Brief History
Machine Learning: Disembodied ML → Single-Agent Learning → Multiple Single-Agent Learners → Social Multi-Agent Learners
Agents: Single-Agent System → Multiple Single-Agent System → Social Multi-Agent System
Outline
Principles of Machine Learning (ML)
ML for Single Agents
ML for Multi-Agent Systems
Inductive Logic Programming for Agents
What is Machine Learning?
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Mitchell 97]
Example: T = “play tennis”, E = “playing matches”, P = “score”
Types of Learning
Inductive Learning (Supervised Learning)
Reinforcement Learning
Discovery (Unsupervised Learning)
Inductive Learning
[An inductive learning] system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge. [Michalski et al. 98]
Inductive Learning
Examples of Category C1, Examples of Category C2, ..., Examples of Category Cn → Inductive Learning System → Hypothesis (Procedure to Classify New Examples)
Inductive Learning Example
Training examples:
Ammo: low, Monster: near, Light: good, Category: shoot
Ammo: low, Monster: far, Light: medium, Category: ¬shoot
Ammo: high, Monster: far, Light: good, Category: shoot
Output of the Inductive Learning System:
If (Ammo = high) and (Light ∈ {medium, good}) then shoot; ...
Performance Measure
Classification accuracy on unseen test set.
Alternatively: measure that incorporates cost of false-positives and false-negatives (e.g. recall/precision).
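The performance measures above can be sketched in a few lines. This is a minimal illustration, assuming a made-up test set in the attribute format of the shooting example and treating the single rule shown earlier as the whole hypothesis (the original hypothesis has further rules that are elided). The names `TEST_SET`, `hypothesis` and `evaluate` are hypothetical.

```python
# Hypothetical unseen test set: (ammo, monster, light, true_category).
TEST_SET = [
    ("high", "near", "good", "shoot"),
    ("high", "far", "medium", "shoot"),
    ("low", "near", "good", "shoot"),
    ("low", "far", "medium", "no_shoot"),
    ("high", "far", "poor", "no_shoot"),
    ("low", "far", "good", "no_shoot"),
]


def hypothesis(ammo, monster, light):
    """Only the one rule visible in the example; all other cases default to no_shoot."""
    if ammo == "high" and light in {"medium", "good"}:
        return "shoot"
    return "no_shoot"


def evaluate(test_set, positive="shoot"):
    """Return (accuracy, precision, recall) of `hypothesis` on the test set."""
    tp = fp = fn = tn = 0
    for ammo, monster, light, truth in test_set:
        predicted = hypothesis(ammo, monster, light)
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # false positive: shot when it should not have
        elif truth == positive:
            fn += 1  # false negative: failed to shoot when it should have
        else:
            tn += 1
    accuracy = (tp + tn) / len(test_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


accuracy, precision, recall = evaluate(TEST_SET)
```

On this invented data the rule never shoots wrongly but misses one case, so precision is higher than recall; which of the two matters more depends on the relative cost of the two error types.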
Where’s the knowledge?
Example (or Object) language
Hypothesis (or Concept) language
Learning bias
Background knowledge
Example Language
Feature-value vectors, logic programs.
Which features are used to represent examples (e.g., ammunition left)?
For agents: which features of the environment are fed to the agent (or the learning module)?
Constructive Induction: automatic feature selection, construction, and generation.
Hypothesis Language
Decision trees, neural networks, logic programs, …
Further restrictions may be imposed, e.g., depth of decision trees, form of clauses.
Choice of hypothesis language influences choice of learning methods and vice versa.
Learning bias
Preference relation between legal hypotheses.
Accuracy on training set.
Hypothesis with zero error on training data is not necessarily the best (noise!).
Occam’s razor: the simpler hypothesis is the better one.
Inductive Learning
No “real” learning without language or learning bias.
IL is search through space of hypotheses guided by bias.
Quality of hypothesis depends on proper distribution of training examples.
Inductive Learning for Agents
What is the target concept (i.e., categories)?
Example: do(a), ¬do(a) for specific action a.
Real-valued categories/actions can be discretized.
Where does the training data come from and what form does it take?
Batch vs Incremental Learning
Batch Learning: collect a set of training examples and compute hypothesis.
Incremental Learning: update hypothesis with each new training example.
Incremental learning more suited for agents.
Batch Learning for Agents
When should (re-)computation of hypothesis take place?
Example: after experienced accuracy of hypothesis drops below threshold.
Which training examples should be used?
Example: sequences of actions that led to success.
Eager vs. Lazy learning
Eager learning: commit to hypothesis computed after training.
Lazy learning: store all encountered examples and perform classification based on this database (e.g. nearest neighbour).
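A lazy learner of the kind just described can be sketched as follows. This is only an illustration: the class name `NearestNeighbour`, the use of Hamming distance over symbolic attributes, and the tie-breaking rule (earliest stored example wins) are assumptions, and the stored examples are the three from the shooting example.

```python
class NearestNeighbour:
    """Lazy learner: stores every example and defers all work to classification time."""

    def __init__(self):
        self.memory = []  # list of (feature tuple, category)

    def observe(self, features, category):
        # "Learning" is just storing the example.
        self.memory.append((tuple(features), category))

    def classify(self, features):
        if not self.memory:
            raise ValueError("no examples stored yet")

        def distance(stored):
            # Hamming distance: number of attributes that differ.
            return sum(a != b for a, b in zip(stored, features))

        # min() returns the first closest example, so ties go to the oldest one.
        _, category = min(self.memory, key=lambda example: distance(example[0]))
        return category


learner = NearestNeighbour()
learner.observe(("low", "near", "good"), "shoot")
learner.observe(("low", "far", "medium"), "no_shoot")
learner.observe(("high", "far", "good"), "shoot")
```

Storing an example costs almost nothing, while every call to `classify` scans the whole memory, which is the trade-off between eager and lazy learning.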
Active Learning
Learner decides which training data to receive (i.e. generates training examples and uses oracle to classify them).
Closed Loop ML: learner suggests hypothesis and verifies it experimentally. If hypothesis is rejected, the collected data gives rise to a new hypothesis.
Black-Box vs. White-Box
Black-Box Learning: Interpretation of the learning result is unclear to a user.
White-Box Learning: Creates (symbolic) structures that are comprehensible.
Reinforcement Learning
Agent learns from environmental feedback indicating the benefit of states.
No explicit teacher required.
Learning target: optimal policy (i.e., state-action mapping).
Optimality measure: e.g., cumulative discounted reward.
Q Learning
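Value of a state: discounted cumulative reward $V^{\pi}(s_t) = \sum_{i \ge 0} \gamma^{i}\, r(s_{t+i}, a_{t+i})$
$0 \le \gamma < 1$ is a discount factor ($\gamma = 0$ means that only immediate reward is considered). $r(s_{t+i}, a_{t+i})$ is the reward determined by performing actions specified by policy $\pi$.
$Q(s,a) = r(s,a) + \gamma V^{*}(\delta(s,a))$, where $\delta(s,a)$ is the state that results from performing action $a$ in state $s$ and $V^{*}$ is the value function of the optimal policy.
Optimal Policy: $\pi^{*}(s) = \arg\max_{a} Q(s,a)$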
Q Learning
Initialize all Q(s,a) to 0
In some state s choose some action a. Let s’ be the resulting state.
Update Q:
$Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$
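The update loop above can be sketched on a toy problem. This is a minimal illustration under stated assumptions: the four-state corridor, its reward of 1 for reaching the rightmost state, the purely random exploration and the names `step`, `q_learning` and `greedy_policy` are all invented here. The environment is deterministic, so the update overwrites the old value without a learning rate.

```python
import random

GAMMA = 0.9          # discount factor (assumed value)
N_STATES = 4         # corridor states 0..3; state 3 is the goal
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Deterministic transition and reward of the hypothetical corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    # Initialize all Q(s,a) to 0.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)      # start anywhere except the goal
        while s != N_STATES - 1:
            a = rng.choice(ACTIONS)          # pure random exploration
            s_next, r = step(s, a)
            # Q(s,a) <- r + gamma * max_a' Q(s',a')
            q[(s, a)] = r + GAMMA * max(q[(s_next, a2)] for a2 in ACTIONS)
            s = s_next
    return q


def greedy_policy(q):
    """Pick argmax_a Q(s,a) in every non-goal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


q_table = q_learning()
policy = greedy_policy(q_table)
```

Because every state-action pair is visited many times, the table settles on values that decay by a factor of `GAMMA` per step away from the goal, and the greedy policy moves right everywhere.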
Q Learning
Guaranteed convergence towards optimum (state-action pairs have to be visited infinitely often).
Exploration strategy can speed up convergence.
Basic Q Learning does not generalize: replace state-action table with function approximation (e.g. neural net) in order to handle unseen states.
Pros and Cons of RL
+ Clearly suited to agents acting and exploring an environment.
+ Simple.
- Engineering of suitable reward function may be tricky.
- May take a long time to converge.
- Learning result may not be transparent (depending on representation of Q function).
Combination of IL and RL
Relational reinforcement learning [Dzeroski et al. 98]: leads to more general Q function representation that may still be applicable even if the goals or environment change.
Explanation-based learning and RL [Dietterich and Flann, 95].
More ILP and RL: see later.
Unsupervised Learning
Acquisition of “useful” or “interesting” patterns in input data.
Usefulness and interestingness are based on agent’s internal bias.
Agent does not receive any external feedback.
Discovered concepts are expected to improve agent performance on future tasks.
Learning and Verification
Need to guarantee agent safety.
Pre-deployment verification for non-learning agents.
What to do with learning agents?
Learning and Verification [Gordon ’00]
Verification after each self-modification step.
Problem: Time-consuming.
Solution 1: use property-preserving learning operators.
Solution 2: use learning operators which permit quick (partial) re-verification.
Learning and Verification
What to do if verification fails?
Repair (multi)-agent plan.
Choose different learning operator.
Learning in Multi-Agent Systems
Classification
Social Awareness
Communication
Role Learning
Distributed Learning
Types of Multi-Agent Learning [Weiss & Dillenbourg 99]
Multiplied Learning: No interference in the learning process by other agents (except for exchange of training data or outputs).
Divided Learning: Division of learning task on functional level.
Interacting Learning: cooperation beyond the pure exchange of data.
Social Awareness
Awareness of the existence of other agents and (possibly) knowledge about their behavior.
Not necessary to achieve near optimal MAS behavior: rock sample collection [Steels 89].
Can it degrade performance?
Levels of Social Awareness [Vidal&Durfee 97]
0-level agent: no knowledge about existence of other agents.
1-level agent: recognizes that other agents exist; models other agents as 0-level.
2-level agent: has some knowledge about other agents and their behavior; models other agents as 1-level agents.
k-level agent: models other agents as (k-1)-level.
Social Awareness and Q Learning
0-level agents already learn implicitly about other agents.
[Mundhe and Sen, 00]: study of two Q learning agents up to level 2.
Two 1-level agents display slowest and least effective learning (worse than two 0-level agents).
Agent models and Q Learning
$Q: S \times A^{n} \to \mathbb{R}$, where $n$ is the number of agents.
If other agents’ actions are not observable, need assumption for actions of other agents.
Pessimistic assumption: given an agent’s action choice, other agents will minimize reward.
Optimistic assumption: other agents will maximize reward.
Agent Models and Q Learning
Pessimistic Assumption leads to overly cautious behavior.
Optimistic Assumption guarantees convergence towards optimum [Lauer & Riedmiller ‘00].
If knowledge of other agents’ behavior is available, Q value update can be based on probabilistic computation [Claus and Boutilier ‘98]. But: no guarantee of optimality.
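The difference between the two assumptions can be illustrated on a single-state cooperative game. This is a sketch, not the algorithm of the cited papers: the reward table, the class `IndependentLearner` and the use of `max`/`min` over observed rewards as the optimistic/pessimistic value are assumptions made for the example. Each learner stores values for its own actions only and never sees the other agent's action.

```python
# Hypothetical joint reward for (action of agent 1, action of agent 2).
REWARD = {
    ("a", "a"): 10.0, ("a", "b"): -5.0,
    ("b", "a"): 0.0,  ("b", "b"): 2.0,
}
ACTIONS = ("a", "b")


class IndependentLearner:
    """Keeps one value per OWN action; `combine` encodes the assumption."""

    def __init__(self, combine):
        self.combine = combine  # max = optimistic, min = pessimistic
        self.q = {}

    def update(self, action, reward):
        old = self.q.get(action)
        self.q[action] = reward if old is None else self.combine(old, reward)

    def best_action(self):
        return max(self.q, key=self.q.get)


# Two views of agent 1, fed with exactly the same experience.
optimist = IndependentLearner(max)
pessimist = IndependentLearner(min)

# Sweep all joint actions once, so every outcome has been experienced.
for a1 in ACTIONS:
    for a2 in ACTIONS:
        r = REWARD[(a1, a2)]
        optimist.update(a1, r)
        pessimist.update(a1, r)
```

In this invented game the optimist settles on action `a`, which is part of the best joint action, while the pessimist prefers the safer `b` and gives up the high joint reward.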
Q Learning and Communication [Tan 93]
Types of communication:
– Sharing sensation
– Sharing or merging policies
– Sharing episodes
Results:
– Communication generally helps
– Extra sensory information may hurt
Role Learning
Often useful for agents to specialize in specific roles for joint tasks.
Pre-defined roles: reduce flexibility, often not easy to define optimal distribution, may be expensive.
How to learn roles?
[Prasad et al. 96]: learn optimal distribution of pre-defined roles.
Q Learning of roles
[Crites&Barto 98]: elevator domain; regular Q learning; no specialization achieved (but highly efficient behavior).
[Ono&Fukumoto 96]: Hunter-Prey domain, specialization achieved with greatest mass merging strategy.
Q Learning of Roles [Balch 99]
Three types of reward function: local performance-based, local shaped, global.
Global reward supports specialization.
Local reward supports emergence of homogeneous behaviors.
Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging).
Heterogeneity measure: social entropy.
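One simple way to put a number on team heterogeneity is Shannon entropy over the proportions of agents in each behavioural group. The sketch below assumes that reading of social entropy; the function name `social_entropy` and the role labels are hypothetical, and the measure in the cited work may differ in detail.

```python
from collections import Counter
from math import log2


def social_entropy(roles):
    """Shannon entropy (in bits) of the distribution of agents over roles."""
    counts = Counter(roles)
    total = len(roles)
    # Sum of p * log2(1/p), with p = count / total for each role.
    return sum((c / total) * log2(total / c) for c in counts.values())


homogeneous_team = ["forager"] * 4
split_team = ["forager", "forager", "guard", "guard"]
specialised_team = ["forager", "guard", "scout", "carrier"]
```

A team in which all agents behave alike scores 0, and the score grows as agents spread evenly over more distinct roles.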
Distributed Learning
Motivation: Agents learning a global hypothesis from local observations.
Application of MAS techniques to (inductive) learning.
Applications: Distributed Data Mining [Provost & Kolluri ‘99], Robotic Soccer.
Distributed Data Mining
[Provost& Hennessy 96]: Individual learners see only subset of all training examples and compute a set of local rules based on these.
Local rules are evaluated by other learners based on their data.
Only rules with good evaluation are carried over to the global hypothesis.
Bibliography
[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[Michalski et al. 98] R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.
[Dietterich & Flann 95] T. Dietterich and N. Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
[Dzeroski et al. 98] S. Dzeroski, L. DeRaedt, and H. Blockeel. Relational Reinforcement Learning. In: Proceedings of the Eighth International Conference on Inductive Logic Programming ILP-98. Springer, 1998.
[Gordon 00] D. Gordon: Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.
[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is ‘Multi’ in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning. Cognitive and Computational Approaches. Pergamon Press, 1999.
[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 workshop on Multiagent Learning, 1997.
[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.
[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI 98.
[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference in Machine Learning, 2000.
[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In: Proceedings of the Tenth International Conference on Machine Learning, 1993.
[Prasad et al. 96] M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.
[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. Proceedings of the First International Conference on Multi-Agent Systems, 1996.
[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.
[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With other Agents, 1999.
[Provost & Kolluri 99] F. Provost and V. Kolluri. "A Survey of Methods for Scaling Up Inductive Algorithms." Data Mining and Knowledge Discovery 3, 1999.
[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling up: Distributed Machine Learning with Cooperation. AAAI 96, 1996.
B R E A K
Machine Learning and ILP for MAS: Part II
Integration of ML and Agents
ILP and its potential for MAS
Agent Applications of ILP
Learning, Natural Selection and Language
From Machine Learning to Learning Agents
[Diagram: a spectrum running from "Machine Learning: learning as the only goal" to "Learning Agent(s): learning as one of many goals", with Classic Machine Learning, Active Learning and Closed Loop Machine Learning placed along it.]
Integrating Machine Learning into the Agent Architecture
Time constraints on learning
Synchronisation between agents’ actions
Learning and Recall
Time Constraints on Learning
Machine Learning alone:
– predictive accuracy matters, time doesn’t (just a price to pay)
ML in Agents:
– Soft deadlines: resources must be shared with other activities (perception, planning, control)
– Hard deadlines: imposed by environment: Make up your mind now! (or they’ll eat you)
Doing Eager vs. Lazy Learning under Time Pressure
Eager Learning
– Theories typically more compact…
– …and faster to use
– Takes more time to learn: do it when the agent is idle
Lazy Learning
– Knowledge acquired at (almost) no cost
– May be much slower when a test example comes
“Clear-cut” vs. Any-time Learning
Consider two types of algorithms:
Running a prescribed number of steps guarantees finding a solution
– can use worst case complexity analysis to find an upper bound on the execution time
Any-time algorithms
– a longer run may result in a better solution
– don’t know an optimal solution when they see one
– example: Genetic Algorithms
– policies: halt learning to meet hard deadlines or when cost outweighs expected improvements of accuracy
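The any-time idea can be sketched as a search that always holds a best-so-far answer and can be interrupted at any step. This is an illustration only: random sampling stands in for a real learner such as a genetic algorithm, a step budget stands in for a wall-clock deadline, and the names `anytime_search`, `run_with_budget` and `score` are hypothetical.

```python
import random


def score(x):
    """Hypothetical quality of candidate x (higher is better; best at x = 7)."""
    return -(x - 7) ** 2


def anytime_search(score_fn, candidates, seed=0):
    """Yield the best (candidate, score) found so far after every step."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    while True:
        candidate = rng.choice(candidates)
        value = score_fn(candidate)
        if value > best_score:
            best, best_score = candidate, value
        yield best, best_score


def run_with_budget(search, max_steps):
    """Halt the search when the step budget (the 'deadline') is used up."""
    result = None
    for steps_done, result in enumerate(search, start=1):
        if steps_done >= max_steps:
            break
    return result


CANDIDATES = list(range(100))
hurried = run_with_budget(anytime_search(score, CANDIDATES), max_steps=5)
patient = run_with_budget(anytime_search(score, CANDIDATES), max_steps=500)
```

With the same seed the longer run extends the shorter one, so its answer can only be as good or better; the search itself has no way of telling whether it has already seen the optimum.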
Time Constraints on Learning in Simulated Environments
Consider various cases:
– Unlimited time for learning
– Upper bound on time for learning
– Learning in real time
Gradually tightening the constraints makes integration easier
Not limited to simulations: real-world problems have similar setting
– e.g., various types of auctions
Synchronisation and Time Constraints
Time constraints: unlimited time, upper bound, real time.
1-move-per-round, batch update: Logic-based MAS for conflict simulations (Kudenko, Alonso)
1-move-per-round, immediate update: The York MA Environment (Kazakov et al.)
Asynchronous: Multi-agent Progol (Muggleton)
Learning and Recall
Agent must strike a balance between:
Learning, which updates the model of the world
Recall, which applies existing model of the world to other tasks
Learning and Recall (2)
Update sensory information
Recall current model of world to choose and carry out an action
Observe the action outcome
Learn new model of the world
Learning and Recall (3)
Update sensory information
Recall current model of world to choose and carry out an action
Learn new model of the world
• In theory, the two can run in parallel
• In practice, must share limited resources
Learning and Recall (4)
Possible strategies:
Parallel learning and recall at all times
Mutually exclusive learning and recall
– After incremental, eager learning, examples are discarded…
– …or kept if batch or lazy learning used
Cheap on-the-fly learning (preprocessing), off-line computationally expensive learning
– reduce raw information, change object language
– analogy with human learning and the role of sleep
Machine Learning Revisited
ML can be seen as the task of:
– taking a set of observations represented in a given object/data language and
– representing (the information in) that set in another language called concept/hypothesis language.
A side effect of this step: the ability to deal with unseen observations.
Object and Concept Language
Object Language: (x,y,+/-).
Concept Language: any ellipse (5 param.)
[Figure: positive (+) and negative (-) examples plotted in the x-y plane.]
Machine Learning Biases
The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.
The preference bias allows us to decide between two hypotheses if they both classify the training data equally.
The search bias defines the order in which hypotheses will be considered.
– Important if one does not search the whole hypothesis space.
Preference Bias, Search Bias & Version Space
Version space: the subset of hypotheses that have zero training error.
[Figure: positive (+) and negative (-) examples, with the most specific concept and the most general concept marked.]
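A version space can be computed by brute force when the hypothesis language is tiny. The sketch below assumes a deliberately simple concept language, thresholds of the form "x >= t" over integers, rather than ellipses; the data and the names `consistent` and `version_space` are invented for the illustration.

```python
# Hypothetical one-dimensional training set: (x, is_positive).
EXAMPLES = [(2, False), (3, False), (6, True), (8, True)]


def consistent(threshold, examples):
    """True if the hypothesis 'x >= threshold' has zero training error."""
    return all((x >= threshold) == label for x, label in examples)


def version_space(examples, thresholds):
    """All hypotheses in the (language-biased) space with zero training error."""
    return [t for t in thresholds if consistent(t, examples)]


space = version_space(EXAMPLES, range(0, 11))
most_general = min(space)    # lowest threshold: covers the most instances
most_specific = max(space)   # highest threshold: covers the fewest instances
```

Every threshold in `space` classifies the training data equally well, so choosing among them is exactly the job of the preference bias.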
Inductive Logic Programming
Based on three pillars:
Logic Programming (LP) to represent data and concepts (i.e., object and concept language)
Background Knowledge to extend the concept language
Induction as learning method
LP as ILP Object Language
A subset of First Order Predicate Logic (FOPL) called Logic Programming.
Often limited to ground facts, i.e., propositional logic (cf. ID3 etc.).
In the latter case, data can be represented as a single table.
ILP Object Language Example
Good bargain cars

model     mileage  price  y/n  ILP representation
BMW Z3    50,000   £5000  +    gbc(z3,50000,5000).
Audi V8   30,000   £4000  +    gbc(v8,30000,4000).
Fiat Uno  90,000   £3000  -    :- gbc(uno,90000,3000).
LP as ILP Concept Language
The concept language of ILP is relations expressed as Horn clauses, e.g.:
equal(X,X).
greater(X,Y) :- X > Y.
Cf. propositional logic representation:
(arg1=1 & arg2=1) or (arg1=2 & arg2=2) ...
– Tedious for finite domains and impossible otherwise.
Most often there is one target predicate (concept) only.
– exceptions exist, e.g., Progol 5.
Modes in ILP
Used to distinguish between
– input attributes (mode +)
– output attributes (mode -)
of the predicate learned.
Mode # used to describe attributes that must contain a constant in the predicate definition.
E.g., use mode car_type(+,+,#) to learn
car_type(Doors,Roof,sports_car) :- Doors =< 2, Roof = convertible.
Modes in ILP
Used to distinguish between
– input attributes (mode +)
– output attributes (mode -)
of the predicate learned.
Mode # used to describe attributes that must contain a constant in the predicate definition.
E.g., use mode car_type(-,-,#) to learn
car_type(Doors,Roof,sports_car) :- (Doors = 1 ; Doors = 2), Roof = convertible.
Types in ILP
Specify the range for each argument
User-defined types represented as unary predicates:
colour(blue). colour(red). colour(black).
Built-in types also provided:
nat/1, real/1, any/1 in Progol.
These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.
ILP Types and Modes: Example
Good bargain cars: ILP representation (Progol)
Mode declaration: modeh(1,gbc(+model,+mileage,+price))?

model     mileage  price  y/n  ILP representation
BMW Z3    50,000   5000   +    gbc(z3,50000,5000).
Audi V8   30,000   4000   +    gbc(v8,30000,4000).
Fiat Uno  90,000   3000   -    :- gbc(uno,90000,3000).
Positive Only Learning
A way of dealing with domains where no negative examples are available.
– Learn the concept of non-self-destructive actions.
The trivial definition “Anything belongs to the target concept” looks all right!
Trick: generate random examples and treat them as negative.
– Requires generative type definitions.
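The trick of manufacturing negatives can be sketched as follows. The sketch assumes that each argument type is a small finite set that can be enumerated (the "generative" requirement), and that randomly drawn tuples not seen among the positives are treated as negative. The action/object domains and the function name `pseudo_negatives` are hypothetical.

```python
import random
from itertools import product

# Generative type definitions: each type can enumerate its values.
ACTION_TYPE = ["eat", "drink", "touch"]
OBJECT_TYPE = ["apple", "water", "fire"]

# Hypothetical positive examples of a non-self-destructive action.
POSITIVES = [("eat", "apple"), ("drink", "water")]


def pseudo_negatives(positives, type_domains, how_many, seed=0):
    """Draw random well-typed examples that are not known positives."""
    rng = random.Random(seed)
    known = set(positives)
    candidates = [t for t in product(*type_domains) if t not in known]
    return rng.sample(candidates, min(how_many, len(candidates)))


negatives = pseudo_negatives(POSITIVES, [ACTION_TYPE, OBJECT_TYPE], how_many=4)
```

Some of the generated tuples may in fact belong to the target concept, so the resulting negatives are noisy; that noise is the price paid for ruling out the trivial "anything goes" hypothesis.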
Background Knowledge
Only very simple math. relations, such as identity and “greater than” used so far:
equal(X,X).
greater(X,Y) :- X > Y.
These can also be easily hard-wired in the concept language of propositional learners.
ILP’s big advantage: one can extend the concept language with user-defined concepts or background knowledge.
Background Knowledge (2)
The use of certain BK predicates may be a necessary condition for learning the right hypothesis.
Redundant or irrelevant BK slows down the learning.
Example
BK: prod(Miles,Price,Threshold) :- Miles * Price < Threshold.
Modes: modeh(1,gbc(#model,+miles,+price))?
       modeb(1,prod(+miles,+price,#threshold))?
Th: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).
Choice of Background Knowledge
In an ideal world one should start from a complete model of the background knowledge of the target population. In practice, even with the most intensive anthropological studies, such a model is impossible to achieve. We do not even know what it is that we know ourselves. The best that can be achieved is a study of the directly relevant background knowledge, though it is only when a solution is identified that one can know what is or is not relevant.
The Critical Villager, Eric Dudley
ILP Preference Bias
Typically a trade-off between generality and complexity:
– cover as many positive examples (and as few negative ones) as you can…
– …with as simple a theory as possible
Some ILP learners allow the users to specify their own preference bias.
Induction in ILP
Bottom-up (least general generalisation)
– Map a term into a variable
– Drop a literal from the clause body
Top-down (refinement operator)
– Instantiate a variable
– Add a literal to the clause body
Mixed techniques (e.g., Progol)
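The bottom-up step "map a term into a variable" can be illustrated with the least general generalisation of two ground atoms. The sketch below is restricted to flat atoms whose arguments are constants (no nested terms, no clause bodies), which is a simplification; the tuple encoding of atoms and the function name `lgg` are assumptions made for the example.

```python
def lgg(atom1, atom2):
    """Least general generalisation of two flat atoms.

    Atoms are tuples such as ("p", "b", "a") for p(b,a). Returns None when
    the predicate symbols or arities differ.
    """
    if atom1[0] != atom2[0] or len(atom1) != len(atom2):
        return None
    variables = {}  # maps a pair of differing terms to its variable
    args = []
    for term1, term2 in zip(atom1[1:], atom2[1:]):
        if term1 == term2:
            args.append(term1)  # identical terms are kept as they are
        else:
            # The same pair of terms is always replaced by the same variable.
            name = variables.setdefault((term1, term2), f"X{len(variables)}")
            args.append(name)
    return (atom1[0], *args)


fully_general = lgg(("p", "b", "a"), ("p", "f", "g"))
keeps_constant = lgg(("p", "b", "a"), ("p", "c", "a"))
shared_variable = lgg(("p", "b", "b"), ("p", "c", "c"))
```

Reusing one variable for a repeated pair of terms is what keeps the result least general: `p(b,b)` and `p(c,c)` generalise to `p(X0,X0)` rather than to the weaker `p(X0,X1)`.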
Example of Induction
p(X,Y).
p(b,a) :- q(b).
p(X,a).
p(X,Y) :- q(X).
BK:
q(b).
q(c).
Training examples:
p(b,a).
p(f,g).
:- p(i,j).
Induction in Progol
For each training example
– Find the most general theory (clause) ⊤
– Find the most specific theory (clause) ⊥
– Search the space in between in a top-down fashion:
⊤ = p(X,Y)
p(X,a).
p(X,Y) :- q(X)
⊥ = p(X,a) :- q(X).
Summary of ILP Basics
Symbolic
Eager
Knowledge-oriented (white-box) learner
Complex, flexible hypothesis space
Based on Induction
Learning Pure Logic Programs vs. Decision Lists
Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.
Decision lists: the concept language includes the predicate cut (!).
The use of decision lists can make for simpler (more concise) theories.
Decision List Example
%action(Cat,ObservedAnimal,Action).
action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.
action(Cat,Animal,run):-dog(Animal),!.
action(Cat,Animal,stay).
Updating Decision Lists with Exceptions
action(Cat,caesar,run):- !.
action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.
action(Cat,Animal,run):-dog(Animal),!.
action(Cat,Animal,stay).
Updating Decision Lists with Exceptions
Could be very beneficial in agents when immediate updating of the agent’s knowledge is important: just add the exception at the top of the list.
Computationally inexpensive – does not need to modify the rest of the list.
Exceptions could be compiled into rules when agent is inactive.
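The first-match behaviour of a decision list, and the cheapness of adding an exception at the top, can be mimicked outside Prolog. The sketch below is a Python analogue, not a translation of the learner: the ownership facts, the cat named `tom` and the helper names are hypothetical, and returning on the first matching rule plays the role of the cut.

```python
# Hypothetical background facts.
OWNER = {"tom": "richard", "caesar": "richard", "rex": "richard", "blackie": "daniel"}
DOGS = {"caesar", "rex", "blackie"}


def same_owner_dog(cat, animal):
    return animal in DOGS and OWNER.get(animal) == OWNER.get(cat)


def any_dog(cat, animal):
    return animal in DOGS


def anything(cat, animal):
    return True


# Ordered rules: (condition, action). Order matters, as in a decision list.
decision_list = [(same_owner_dog, "stay"), (any_dog, "run"), (anything, "stay")]


def action(cat, animal, rules):
    for condition, act in rules:
        if condition(cat, animal):
            return act  # first match wins, like a clause ending in a cut
    raise LookupError("no rule applies")


def add_exception(rules, animal_name, act):
    """Put an exception at the top; the rest of the list is left untouched."""
    rules.insert(0, (lambda cat, animal: animal == animal_name, act))


before = action("tom", "caesar", decision_list)
add_exception(decision_list, "caesar", "run")
after = action("tom", "caesar", decision_list)
```

Adding the exception is a single list insertion, which is why it suits an agent that must update its knowledge immediately and can postpone any restructuring until it is idle.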
Replacing Exceptions with Rules: Before
action(Cat,caesar,run):- !.
action(Cat,rex,run):- !.
action(Cat,rusty,run):- !.
action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.
…
Replacing Exceptions with Rules: After
action(Cat,Animal,run):- dog(Animal), owner(richard,Animal),!.
action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.
…
Eager ILP vs. Analogical Prediction
Eager Learning: learn theory, dispose of observations.
Lazy Learning:
– keep all observations
– compare new with old ones to classify
– no explanation provided.
Analogical Prediction (Muggleton, Bain ‘98)
– Combines the often higher accuracy of lazy learning with an intelligible, explicit hypothesis typical for ILP
– Constructs a local theory for each new observation that is consistent with the largest number of training examples.
Analogical Prediction Example
owner(richard,caesar).
action(Cat,caesar,run).
owner(richard,rex).
action(Cat,rex,run).
owner(daniel,blackie).
action(Cat,blackie,stay).
owner(richard,rusty).
action(Cat,rusty,?).
Analogical Prediction Example
owner(richard,caesar).
action(Cat,caesar,run).
owner(richard,rex).
action(Cat,rex,run).
owner(daniel,blackie).
action(Cat,blackie,stay).
owner(richard,rusty).
action(Cat,Dog,run):-
owner(richard,Dog).
Timing Analysis of Theories Learned with ILP
The more training examples, the more accurate the theory…
…but how long does it take to produce an answer?
No theoretical work on the subject so far
Experiment shows nontrivial behaviour (reminiscent of the phase transitions observed in SAT learning).
Timing Analysis of ILP Theories: Example
Kazakov, PhD Thesis:
• left: simple theory with low coverage; succeeds or quickly fails → high speed
• middle: medium coverage, fragmentary theory, lots of backtracking → low speed
• right: general theory with high coverage; less backtracking → high speed
Agent Applications of ILP
Relational Reinforcement Learning (Džeroski, De Raedt, Driessens)
– combines reinforcement learning with ILP
– generalises over previous experience and goals (Q-table) to produce logical decision trees
– results can be used to address new situations
Don’t miss the next talk (~11:40–13:10h)!
Agent Applications of ILP
ILP for Verification and Validation of MAS (Jacob, Driessens, De Raedt)
– Also uses FOPL decision trees
– Observes agents’ behaviour and represents it as a logical decision tree
– The rules in the decision tree can be compared with the designers’ intentions
– Test domain: RoboCup
Agent Applications of ILP
Reid & Ryan 2000:
– ILP used to help hierarchical reinforcement learning
– ILP constructs high-level features that help discriminate between (state,action) transitions with non-deterministic behaviour
Agent Applications of ILP
Matsui et al. 2000:
– Proposed an ILP agent that avoids actions which will probably fail to achieve the goal.
– Application domain: RoboCup
Alonso & Kudenko ‘99:
– ILP and EBL for conflict simulations.
The York MA Environment
Species of 2D agents competing for renewable, limited resources.
Agents have simple hard-coded behaviour based on the notion of drives.
Each agent can optionally have an ILP (Progol) mind – a separate process receiving observations and suggesting actions.
Allows the values of inherited features to be selected through natural selection.
The York MA Environment
ILP hasn’t been used in experiments yet (to come soon).
A number of experiments using inheritance studied Kinship-driven Altruism among Agents.
The start-up project was sponsored by Microsoft.
Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.
Learning and Natural Selection
In learning, search is trivial, choosing the right bias is hard.
But the choice of learning bias is always external to the learner!
To find the best suited bias one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.
Darwinian vs. Lamarckian Evolution
Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.
The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment.
What is passed on to the offspring is useful, but very general.
Darwinian vs. Lamarckian Evolution (2)
Lamarckian Evolution: individual experience acquired in life can be inherited.
Not the case in nature.
Doesn’t mean we can’t use it.
The inherited concepts may be too specific and not of general importance.
Learning and Language
Language uses concepts which are
– specific enough to be useful to most/all speakers of that language
– general enough to correspond to shared experience (otherwise, how would one know what the other is talking about!)
The concepts of a language serve as a learning bias which is “inherited” not in genes but through education.
Communication and Learning
Language
– helps one learn (in addition to inherited biases)
– allows one to communicate knowledge.
Distinguish between
– Knowledge: things that one can explain to another by means of a language.
– Skills: the rest; they require individual learning and cannot be communicated.
If watching was enough to learn, the dog would have become a butcher. Bulgarian proverb.
Communication and Learning (2)
In NLP, forgetting [examples] may be harmful (van den Bosch et al.)
An expert is someone who does not think anymore – he knows. Frank Lloyd Wright.
It may be difficult to communicate what one has learned because of
– Limited bandwidth (for lazy learning)
– The absence of appropriate concepts in the language (for black-box learning)
Communication and Learning (3)
In a society of communicating agents, less accurate white-box learning may be better than more accurate but expensive learning that cannot be communicated, since the reduced performance could be outweighed by the much lower cost of learning.
Our Current Research
Inductive Bias Selection (Shane Greenaway)
Role Learning (Spiros Kapetanakis)
Inductive Learning for Games (Alex Champandard)
Machine Learning of Natural Language in MAS (Mark Bartlett)
The End