Machine Learning and ILP for Multi-Agent Systems

Daniel Kudenko & Dimitar Kazakov

Department of Computer Science

University of York, UK

ACAI-01, Prague, July 2001

Why Learning Agents?

Agent designers are not able to foresee all situations that the agent will encounter.

To display full autonomy, agents need to learn from and adapt to novel environments.

Learning is a crucial part of intelligence.

A Brief History

[Diagram: parallel development of the two fields]

Machine Learning: Disembodied ML, Single-Agent Learning, Multiple Single-Agent Learners, Social Multi-Agent Learners

Agents: Single-Agent System, Multiple Single-Agent System, Social Multi-Agent System

Outline

Principles of Machine Learning (ML)

ML for Single Agents

ML for Multi-Agent Systems

Inductive Logic Programming for Agents

What is Machine Learning?

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. [Mitchell 97]

Example: T = “play tennis”, E = “playing matches”, P = “score”

Types of Learning

Inductive Learning (Supervised Learning)

Reinforcement Learning

Discovery (Unsupervised Learning)

Inductive Learning

[An inductive learning] system aims at determining a description of a given concept from a set of concept examples provided by the teacher and from background knowledge. [Michalski et al. 98]

Inductive Learning

[Diagram: Examples of Categories C1, C2, …, Cn → Inductive Learning System → Hypothesis (Procedure to Classify New Examples)]

Inductive Learning Example

Training examples:

Ammo: low, Monster: near, Light: good, Category: shoot

Ammo: low, Monster: far, Light: medium, Category: ¬shoot

Ammo: high, Monster: far, Light: good, Category: shoot

Output of the Inductive Learning System:

If (Ammo = high) and (Light ∈ {medium, good}) then shoot; …
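As a small illustration of the example above, the following sketch encodes the three training examples as feature-value dictionaries and checks which of them the single rule shown on the slide covers. The names `EXAMPLES`, `rule_shoot` and `covered` are hypothetical, and the label `not_shoot` stands in for ¬shoot.

```python
# Training examples from the slide, as feature-value dictionaries.
EXAMPLES = [
    {"ammo": "low", "monster": "near", "light": "good", "category": "shoot"},
    {"ammo": "low", "monster": "far", "light": "medium", "category": "not_shoot"},
    {"ammo": "high", "monster": "far", "light": "good", "category": "shoot"},
]


def rule_shoot(example):
    """The rule shown on the slide: Ammo = high and Light in {medium, good}."""
    return example["ammo"] == "high" and example["light"] in {"medium", "good"}


def covered(examples, rule):
    """Return the examples on which the rule fires."""
    return [e for e in examples if rule(e)]


fired = covered(EXAMPLES, rule_shoot)
# The rule fires on the third example only. The first 'shoot' example would
# have to be covered by one of the further rules elided on the slide.
print(len(fired), fired[0]["category"])
```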

Performance Measure

Classification accuracy on unseen test set.

Alternatively: measure that incorporates cost of false-positives and false-negatives (e.g. recall/precision).
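The measures named above can be computed from the counts of true and false positives and negatives. A minimal sketch, assuming a two-class problem with `shoot` as the positive class (the function names are illustrative):

```python
def confusion_counts(true_labels, predicted_labels, positive="shoot"):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all test examples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of actual positives that are found."""
    return tp / (tp + fn) if tp + fn else 0.0
```

Accuracy treats both kinds of error alike, whereas precision penalises false positives and recall penalises false negatives.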

Where’s the knowledge?

Example (or Object) language

Hypothesis (or Concept) language

Learning bias

Background knowledge

Example Language

Feature-value vectors, logic programs.

Which features are used to represent examples (e.g., ammunition left)?

For agents: which features of the environment are fed to the agent (or the learning module)?

Constructive Induction: automatic feature selection, construction, and generation.

Hypothesis Language

Decision trees, neural networks, logic programs, …

Further restrictions may be imposed, e.g., depth of decision trees, form of clauses.

Choice of hypothesis language influences choice of learning methods and vice versa.

Learning bias

Preference relation between legal hypotheses.

Accuracy on training set.

Hypothesis with zero error on training data is not necessarily the best (noise!).

Occam’s razor: the simpler hypothesis is the better one.

Inductive Learning

No “real” learning without language or learning bias.

IL is search through space of hypotheses guided by bias.

Quality of hypothesis depends on proper distribution of training examples.

Inductive Learning for Agents

What is the target concept (i.e., categories)?

Example: do(a), ¬do(a) for specific action a.

Real-valued categories/actions can be discretized.

Where does the training data come from and what form does it take?

Batch vs Incremental Learning

Batch Learning: collect a set of training examples and compute hypothesis.

Incremental Learning: update hypothesis with each new training example.

Incremental learning more suited for agents.

Batch Learning for Agents

When should (re-)computation of hypothesis take place?

Example: after experienced accuracy of hypothesis drops below threshold.

Which training examples should be used?

Example: sequences of actions that led to success.

Eager vs. Lazy learning

Eager learning: commit to hypothesis computed after training.

Lazy learning: store all encountered examples and perform classification based on this database (e.g. nearest neighbour).
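A minimal sketch of lazy learning, assuming categorical features and a simple mismatch count as the distance (both are assumptions, as is the class name `NearestNeighbour`). Nothing is computed when examples arrive; all the work happens at classification time.

```python
class NearestNeighbour:
    """Lazy learner: stores every example and classifies by the closest one."""

    def __init__(self):
        self.memory = []  # list of (features, category) pairs

    def observe(self, features, category):
        # "Learning" is just storing the example.
        self.memory.append((dict(features), category))

    def classify(self, features):
        if not self.memory:
            raise ValueError("no examples stored yet")

        def distance(stored):
            # Number of features on which the two examples disagree.
            return sum(stored.get(key) != value for key, value in features.items())

        _, category = min(self.memory, key=lambda item: distance(item[0]))
        return category
```

The cost of classification grows with the number of stored examples, which is the price paid for cheap learning.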

Active Learning

Learner decides which training data to receive (i.e. generates training examples and uses oracle to classify them).

Closed Loop ML: learner suggests hypothesis and verifies it experimentally. If hypothesis is rejected, the collected data gives rise to a new hypothesis.

Black-Box vs. White-Box

Black-Box Learning: Interpretation of the learning result is unclear to a user.

White-Box Learning: Creates (symbolic) structures that are comprehensible.

Reinforcement Learning

Agent learns from environmental feedback indicating the benefit of states.

No explicit teacher required.

Learning target: optimal policy (i.e., state-action mapping).

Optimality measure: e.g., cumulative discounted reward.

Q Learning

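Value of a state under policy $\pi$: discounted cumulative reward

$V^{\pi}(s_t) = \sum_{i \ge 0} \gamma^{i}\, r(s_{t+i}, a_{t+i})$

$0 \le \gamma < 1$ is a discount factor ($\gamma = 0$ means that only immediate reward is considered). $r(s_{t+i}, a_{t+i})$ is the reward determined by performing the actions specified by policy $\pi$.

$Q(s,a) = r(s,a) + \gamma\, V^{*}(\delta(s,a))$, where $\delta(s,a)$ is the state that results from performing action $a$ in state $s$.

Optimal Policy: $\pi^{*}(s) = \arg\max_{a} Q(s,a)$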

Q Learning

Initialize all Q(s,a) to 0

In some state s choose some action a. Let s’ be the resulting state.

Update Q:

$Q(s,a) = r + \gamma \max_{a'} Q(s',a')$
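The update step can be sketched on a toy problem. The code below assumes a deterministic four-state corridor (states 0 to 3, reward 1 for entering state 3, which ends the episode), a discount factor of 0.9 and purely random exploration. The world, the constants and all names are assumptions made for illustration.

```python
import random

GAMMA = 0.9                  # discount factor
N_STATES = 4                 # states 0..3; state 3 is the goal and ends an episode
ACTIONS = ("left", "right")


def step(state, action):
    """Deterministic corridor: move one cell, reward 1 on entering the goal."""
    if action == "left":
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    # Initialize all Q(s,a) to 0.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)            # random exploration
            next_state, reward = step(state, action)
            # Update rule for a deterministic world.
            q[(state, action)] = reward + GAMMA * max(
                q[(next_state, a)] for a in ACTIONS
            )
            state = next_state
    return q


def greedy_policy(q):
    """Pick argmax_a Q(s,a) in every non-goal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

In this corridor the learned values for moving right approach 0.81, 0.9 and 1 for states 0, 1 and 2, so the greedy policy heads straight for the goal.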

Q Learning

Guaranteed convergence towards optimum (state-action pairs have to be visited infinitely often).

Exploration strategy can speed up convergence.

Basic Q Learning does not generalize: replace state-action table with function approximation (e.g. neural net) in order to handle unseen states.

Pros and Cons of RL

+ Clearly suited to agents acting and exploring an environment.

+ Simple.

- Engineering of suitable reward function may be tricky.

- May take a long time to converge.

- Learning result may not be transparent (depending on representation of Q function).

Combination of IL and RL

Relational reinforcement learning [Dzeroski et al. 98]: leads to more general Q function representation that may still be applicable even if the goals or environment change.

Explanation-based learning and RL [Dietterich and Flann, 95].

More ILP and RL: see later.

Unsupervised Learning

Acquisition of “useful” or “interesting” patterns in input data.

Usefulness and interestingness are based on agent’s internal bias.

Agent does not receive any external feedback.

Discovered concepts are expected to improve agent performance on future tasks.

Learning and Verification

Need to guarantee agent safety.

Pre-deployment verification for non-learning agents.

What to do with learning agents?

Learning and Verification [Gordon ’00]

Verification after each self-modification step.

Problem: Time-consuming.

Solution 1: use property-preserving learning operators.

Solution 2: use learning operators which permit quick (partial) re-verification.

Learning and Verification

What to do if verification fails?

Repair (multi)-agent plan.

Choose different learning operator.

Learning in Multi-Agent Systems

Classification

Social Awareness

Communication

Role Learning

Distributed Learning

Types of Multi-Agent Learning [Weiss & Dillenbourg 99]

Multiplied Learning: No interference in the learning process by other agents (except for exchange of training data or outputs).

Divided Learning: Division of learning task on functional level.

Interacting Learning: cooperation beyond the pure exchange of data.

Social Awareness

Awareness of the existence of other agents and (possibly) knowledge about their behavior.

Not necessary to achieve near optimal MAS behavior: rock sample collection [Steels 89].

Can it degrade performance?

Levels of Social Awareness [Vidal&Durfee 97]

0-level agent: no knowledge about existence of other agents.

1-level agent: recognizes that other agents exist; models other agents as 0-level.

2-level agent: has some knowledge about the behavior of other agents; models other agents as 1-level agents.

k-level agent: models other agents as (k-1)-level.

Social Awareness and Q Learning

0-level agents already learn implicitly about other agents.

[Mundhe and Sen, 00]: study of two Q learning agents up to level 2.

Two 1-level agents display slowest and least effective learning (worse than two 0-level agents).

Agent models and Q Learning

$Q: S \times A^n \to R$, where n is the number of agents.

If other agent’s actions are not observable, need assumption for actions of other agents.

Pessimistic assumption: given an agent’s action choice other agents will minimize reward.

Optimistic assumption: other agents will maximize reward.
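The two assumptions can be sketched for a single state and two agents. The joint-action values below are made up for illustration, and `choose_action` is a hypothetical helper, not taken from any of the cited papers.

```python
def choose_action(q_joint, own_actions, other_actions, assumption):
    """Pick an own action from joint-action values under the given assumption.

    q_joint maps (own_action, other_action) to a Q value for one fixed state.
    """
    if assumption == "optimistic":
        aggregate = max   # the other agent is assumed to maximize reward
    elif assumption == "pessimistic":
        aggregate = min   # the other agent is assumed to minimize reward
    else:
        raise ValueError("assumption must be 'optimistic' or 'pessimistic'")

    def value(own):
        return aggregate(q_joint[(own, other)] for other in other_actions)

    return max(own_actions, key=value)


# Illustrative joint-action values for one state (assumed numbers).
Q_JOINT = {
    ("a", "x"): 10.0, ("a", "y"): -10.0,
    ("b", "x"): 2.0, ("b", "y"): 3.0,
}
```

With these numbers the optimistic agent picks `a`, hoping for the payoff of 10, while the pessimistic agent settles for `b`, whose worst case is 2.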

Agent Models and Q Learning

Pessimistic Assumption leads to overly cautious behavior.

Optimistic Assumption guarantees convergence towards optimum [Lauer & Riedmiller ‘00].

If knowledge of other agent’s behavior available, Q value update can be based on probabilistic computation [Claus and Boutilier ‘98]. But: no guarantee of optimality.

Q Learning and Communication [Tan 93]

Types of communication:

– Sharing sensation

– Sharing or merging policies

– Sharing episodes

Results:

– Communication generally helps

– Extra sensory information may hurt

Role Learning

Often useful for agents to specialize in specific roles for joint tasks.

Pre-defined roles: reduce flexibility, often not easy to define optimal distribution, may be expensive.

How to learn roles?

[Prasad et al. 96]: learn optimal distribution of pre-defined roles.

Q Learning of roles

[Crites&Barto 98]: elevator domain; regular Q learning; no specialization achieved (but highly efficient behavior).

[Ono&Fukumoto 96]: Hunter-Prey domain, specialization achieved with greatest mass merging strategy.

Q Learning of Roles [Balch 99]

Three types of reward function: local performance-based, local shaped, global.

Global reward supports specialization.

Local reward supports emergence of homogeneous behaviors.

Some domains benefit from learning team heterogeneity (e.g., robotic soccer), others do not (e.g., multi-robot foraging).

Heterogeneity measure: social entropy.
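As a rough sketch of such a heterogeneity measure, the code below computes the Shannon entropy of the distribution of agents over behavioural groups. It assumes each agent has already been assigned to a group; how groups are formed, and any hierarchical refinement of the measure, are outside this sketch. The function name is illustrative.

```python
import math
from collections import Counter


def social_entropy(roles):
    """Shannon entropy (in bits) of the distribution of agents over roles."""
    if not roles:
        raise ValueError("need at least one agent")
    total = len(roles)
    counts = Counter(roles)
    # Sum of p * log2(1/p) over all roles that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A homogeneous team scores 0, and the value grows as the agents spread evenly over more distinct roles.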

Distributed Learning

Motivation: Agents learning a global hypothesis from local observations.

Application of MAS techniques to (inductive) learning.

Applications: Distributed Data Mining [Provost & Kolluri ‘99], Robotic Soccer.

Distributed Data Mining

[Provost & Hennessy 96]: Individual learners see only a subset of all training examples and compute a set of local rules based on these.

Local rules are evaluated by other learners based on their data.

Only rules with good evaluation are carried over to the global hypothesis.
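The three steps above can be sketched in a heavily simplified form. The sketch assumes rules of the shape "feature = value implies shoot", a local learner that proposes every such rule consistent with its own data, and an acceptance threshold on accuracy over the other learners' data. These are assumptions for illustration, not the method of the cited paper.

```python
POSITIVE = "shoot"


def matching_labels(rule, examples):
    """Labels of the examples to which the rule (feature, value) applies."""
    feature, value = rule
    return [label for features, label in examples if features.get(feature) == value]


def local_rules(examples):
    """Propose rules that are consistent with this learner's own examples."""
    rules = set()
    for features, _ in examples:
        for rule in features.items():
            if all(label == POSITIVE for label in matching_labels(rule, examples)):
                rules.add(rule)
    return rules


def rule_accuracy(rule, examples):
    """Accuracy of the rule on the examples it applies to, or None if none."""
    labels = matching_labels(rule, examples)
    if not labels:
        return None
    return sum(label == POSITIVE for label in labels) / len(labels)


def global_hypothesis(partitions, threshold=0.9):
    """Keep local rules that every other learner evaluates well on its data."""
    accepted = set()
    for i, local in enumerate(partitions):
        others = [p for j, p in enumerate(partitions) if j != i]
        for rule in local_rules(local):
            scores = [rule_accuracy(rule, other) for other in others]
            # A learner with no applicable data does not veto the rule.
            if all(s is None or s >= threshold for s in scores):
                accepted.add(rule)
    return accepted
```

A rule that looks perfect on one subset, such as "light = good" below in the first partition, can be rejected once another learner's data contradicts it.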

Bibliography

[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.

[Michalski et al. 98] R.S. Michalski, I. Bratko, M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.

[Dietterich&Flann 95] T. Dietterich and N.Flann. Explanation-based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.

[Dzeroski et al. 98] S. Dzeroski, L. DeRaedt, and H. Blockeel. Relational Reinforcement Learning. In: Proceedings of the Eighth International Conference on Inductive Logic Programming ILP-98. Springer, 1998.

[Gordon 00] D. Gordon: Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.

[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is ‘Multi’ in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning. Cognitive and Computational Approaches. Pergamon Press, 1999.

[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 workshop on Multiagent Learning, 1997.

[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. Proceedings of the Fourth International Conference on Multiagent Systems, IEEE Press, 2000.

[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI 98.

[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference in Machine Learning, 2000.

[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In: Proceedings of the Tenth International Conference on Machine Learning, 1993.

[Prasad et al. 96] M.V.N. Prasad, S.E. Lander and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.

[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. Proceedings of the First International Conference on Multi-Agent Systems, 1996.

[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.

[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With other Agents, 1999.

[Provost & Kolluri 99] F. Provost and V. Kolluri. "A Survey of Methods for Scaling Up Inductive Algorithms." Data Mining and Knowledge Discovery 3, 1999.

[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling up: Distributed Machine Learning with Cooperation. AAAI 96, 1996.

B R E A K

Machine Learning and ILP for MAS: Part II

Integration of ML and Agents

ILP and its potential for MAS

Agent Applications of ILP

Learning, Natural Selection and Language

From Machine Learning to Learning Agents

[Diagram: a spectrum from learning as the only goal to learning as one of many goals]

Machine Learning (learning as the only goal): Classic Machine Learning, Active Learning, Closed Loop Machine Learning

Learning Agent(s): learning as one of many goals

Integrating Machine Learning into the Agent Architecture

Time constraints on learning

Synchronisation between agents’ actions

Learning and Recall

Time Constraints on Learning

Machine Learning alone:

– predictive accuracy matters, time doesn’t (just a price to pay)

ML in Agents

– Soft deadlines: resources must be shared with other activities (perception, planning, control)

– Hard deadlines: imposed by environment: Make up your mind now! (or they’ll eat you)

Doing Eager vs. Lazy Learning under Time Pressure

Eager Learning

– Theories typically more compact…

– …and faster to use

– Takes more time to learn – do it when the agent is idle

Lazy Learning

– Knowledge acquired at (almost) no cost

– May be much slower when a test example …

“Clear-cut” vs. Any-time Learning

Consider two types of algorithms:

Running a prescribed number of steps guarantees finding a solution

– can use worst case complexity analysis to find an upper bound on the execution time

Any-time algorithms

– a longer run may result in a better solution

– don’t know an optimal solution when they see one

– example: Genetic Algorithms

– policies: halt learning to meet hard deadlines or when cost outweighs expected improvements of accuracy

Time Constraints on Learning in Simulated Environments

Consider various cases:

– Unlimited time for learning

– Upper bound on time for learning

– Learning in real time

Gradually tightening the constraints makes integration easier

Not limited to simulations: real-world problems have similar setting

– e.g., various types of auctions

Synchronisation and Time Constraints

[Table: synchronisation mode against time constraints (unlimited time, upper bound, real time); the column placement of each system is not recoverable]

1-move-per-round, batch update: Logic-based MAS for conflict simulations (Kudenko, Alonso)

1-move-per-round, immediate update: The York MA Environment (Kazakov et al.)

Asynchronous: Multi-agent Progol (Muggleton)

Learning and Recall

Agent must strike a balance between:

Learning, which updates the model of the world

Recall, which applies existing model of the world to other tasks

Learning and Recall (2)

[Diagram: cycle] Update sensory information → Recall current model of world to choose and carry out an action → Observe the action outcome → Learn new model of the world → (repeat)

• In theory, the two can run in parallel

• In practice, must share limited resources

Possible strategies:

Parallel learning and recall at all times

Mutually exclusive learning and recall

– After incremental, eager learning, examples are discarded…

– …or kept if batch or lazy learning used

Cheap on-the-fly learning (preprocessing), off-line computationally expensive learning

– reduce raw information, change object language

– analogy with human learning and the role of sleep

Machine Learning Revisited

ML can be seen as the task of:

– taking a set of observations represented in a given object/data language and

– representing (the information in) that set in another language called concept/hypothesis language.

A side effect of this step – the ability to deal with unseen observations.

Object and Concept Language

Object Language: (x,y,+/-).

Concept Language: any ellipse (5 param.)

Machine Learning Biases

The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned.

The preference bias allows us to decide between two hypotheses if they both classify the training data equally.

The search bias defines the order in which hypotheses will be considered.

– Important if one does not search the whole hypothesis space.

Preference Bias, Search Bias & Version Space

Version space: the subset of hypotheses that have zero training error.

[Figure: the version space, bounded by the most specific concept and the most general concept]

Inductive Logic Programming

Based on three pillars:

Logic Programming (LP) to represent data and concepts (i.e., object and concept language)

Background Knowledge to extend the concept language

Induction as learning method

LP as ILP Object Language

A subset of First Order Predicate Logic (FOPL) called Logic Programming.

Often limited to ground facts, i.e., propositional logic (cf. ID3 etc.).

In the latter case, data can be represented as a single table.

ILP Object Language Example

Good bargain cars and their ILP representation:

model     mileage  price  y/n  ILP representation
BMW Z3    50,000   £5000  +    gbc(z3,50000,5000).
Audi V8   30,000   £4000  +    gbc(v8,30000,4000).
Fiat Uno  90,000   £3000  -    :- gbc(uno,90000,3000).

LP as ILP Concept Language

The concept language of ILP is relations expressed as Horn clauses, e.g.:

equal(X,X).

greater(X,Y) :- X > Y.

Cf. propositional logic representation: (arg1=1 & arg2=1) or (arg1=2 & arg2=2) ...

– Tedious for finite domains and impossible otherwise.

Most often there is one target predicate (concept) only.

– exceptions exist, e.g., Progol 5.

Modes in ILP

Used to distinguish between

– input attributes (mode +)

– output attributes (mode -)

of the predicate learned.

Mode # is used to describe attributes that must contain a constant in the predicate definition.

E.g., use mode car_type(+,+,#) to learn

car_type(Doors,Roof,sports_car) :- Doors =< 2, Roof = convertible.

Modes in ILP

E.g., use mode car_type(-,-,#) to learn

car_type(Doors,Roof,sports_car) :- (Doors = 1 ; Doors = 2), Roof = convertible.

Types in ILP

Specify the range for each argument

User-defined types represented as unary predicates:

colour(blue). colour(red). colour(black).

Built-in types also provided: nat/1, real/1, any/1 in Progol.

These definitions may or may not be generative: colour(X) instantiates X, nat(X) does not.

ILP Types and Modes: Example

Good bargain cars and their ILP representation (Progol):

modeh(1,gbc(+model,+mileage,+price))?

model     mileage  price  y/n  ILP representation
BMW Z3    50,000   5000   +    gbc(z3,50000,5000).
Audi V8   30,000   4000   +    gbc(v8,30000,4000).
Fiat Uno  90,000   3000   -    :- gbc(uno,90000,3000).

Positive Only Learning

A way of dealing with domains where no negative examples are available.

– Learn the concept of non-self-destructive actions.

The trivial definition “Anything belongs to the target concept” looks all right!

Trick: generate random examples and treat them as negative.

– Requires generative type definitions.

Background Knowledge

Only very simple math. relations, such as identity and “greater than” used so far:

equal(X,X).

greater(X,Y) :- X > Y.

These can also be easily hard-wired in the concept language of propositional learners.

ILP’s big advantage: one can extend the concept language with user-defined concepts or background knowledge.

Background Knowledge (2)

The use of certain BK predicates may be a necessary condition for learning the right hypothesis.

Redundant or irrelevant BK slows down the learning.

Example

BK: prod(Miles,Price,Threshold) :- Miles * Price < Threshold.

Modes:

modeh(1,gbc(#model,+miles,+price))?

modeb(1,prod(+miles,+price,#threshold))?

Th: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001).
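The learned clause can be checked by hand. The sketch below mirrors the background predicate and the theory as two small functions (a transliteration for illustration, not Progol output). The threshold 250000001 is the smallest integer that the product of the positive z3 example, 50000 × 5000 = 250000000, stays strictly below.

```python
def prod(miles, price, threshold):
    """Background knowledge: prod(Miles,Price,Threshold) :- Miles * Price < Threshold."""
    return miles * price < threshold


def gbc(model, miles, price):
    """Theory: gbc(z3,Miles,Price) :- prod(Miles,Price,250000001)."""
    return model == "z3" and prod(miles, price, 250_000_001)
```

Because the head mode is `#model`, the clause mentions the constant `z3`, so it says nothing about the other models in the table.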

Choice of Background Knowledge

In an ideal world one should start from a complete model of the background knowledge of the target population. In practice, even with the most intensive anthropological studies, such a model is impossible to achieve. We do not even know what it is that we know ourselves. The best that can be achieved is a study of the directly relevant background knowledge, though it is only when a solution is identified that one can know what is or is not relevant.

The Critical Villager, Eric Dudley

ILP Preference Bias

Typically a trade-off between generality and complexity:

– cover as many positive examples (and as few negative ones) as you can…

– …with as simple a theory as possible

Some ILP learners allow the users to specify their own preference bias.

Induction in ILP

Bottom-up (least general generalisation)

– Map a term into a variable

– Drop a literal from the clause body

Top-down (refinement operator)

– Instantiate a variable

– Add a literal to the clause body

Mixed techniques (e.g., Progol)

Example of Induction

[Diagram: clauses ordered from most general to most specific]

p(X,Y).

p(X,a).    p(X,Y) :- q(X).

p(b,a) :- q(b).

q(b). q(c).

Training examples:

p(b,a). p(f,g). :- p(i,j).

Induction in Progol

For each training example

– Find the most general theory (clause) ⊤

– Find the most specific theory (clause) ⊥

– Search the space in between in a top-down fashion:

⊤ = p(X,Y)

p(X,a).    p(X,Y) :- q(X)

⊥ = p(X,a) :- q(X).

Summary of ILP Basics

Symbolic

Eager

Knowledge-oriented (white-box) learner

Complex, flexible hypothesis space

Based on Induction

Learning Pure Logic Programs vs. Decision Lists

Pure logic programs: the order of clauses is irrelevant, and they must not contradict each other.

Decision lists: the concept language includes the predicate cut (!).

The use of decision lists can make for simpler (more concise) theories.

Decision List Example

%action(Cat,ObservedAnimal,Action).

action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.

action(Cat,Animal,run):-dog(Animal),!.

action(Cat,Animal,stay).

Updating Decision Lists with Exceptions

action(Cat,caesar,run):- !.

action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.

action(Cat,Animal,run):-dog(Animal),!.

action(Cat,Animal,stay).

Updating Decision Lists with Exceptions

Could be very beneficial in agents when immediate updating of the agent’s knowledge is important: just add the exception at the top of the list.

Computationally inexpensive – does not need to modify the rest of the list.

Exceptions could be compiled into rules when agent is inactive.
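The same idea can be sketched outside Prolog as an ordered list of (condition, action) pairs in which the first matching rule wins, which is the role the cut plays in the clauses above. The knowledge-base layout and all names here are assumptions made for illustration.

```python
def make_decision_list():
    """Rules mirroring the cat example; kb holds 'dogs' and an 'owner' mapping."""
    def same_owner_dog(cat, animal, kb):
        owner = kb["owner"].get(animal)
        return animal in kb["dogs"] and owner is not None and owner == kb["owner"].get(cat)

    def any_dog(cat, animal, kb):
        return animal in kb["dogs"]

    def default(cat, animal, kb):
        return True

    return [(same_owner_dog, "stay"), (any_dog, "run"), (default, "stay")]


def action(rules, cat, animal, kb):
    """Return the action of the first rule whose condition holds."""
    for condition, chosen in rules:
        if condition(cat, animal, kb):
            return chosen
    raise LookupError("decision list has no default rule")


def add_exception(rules, animal_name, chosen):
    """Put an exception for one named animal at the top of the list."""
    rules.insert(0, (lambda cat, animal, kb: animal == animal_name, chosen))
```

Adding an exception is a single insertion at the head of the list; the remaining rules are left untouched, which is why the update is cheap.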

Replacing Exceptions with Rules: Before

action(Cat,caesar,run):- !.

action(Cat,rex,run):- !.

action(Cat,rusty,run):- !.

action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.

Replacing Exceptions with Rules: After

action(Cat,Animal,run):- dog(Animal), owner(richard,Animal),!.

action(Cat,Animal,stay):-dog(Animal),owner(Owner,Animal),owner(Owner,Cat),!.

Eager ILP vs. Analogical Prediction

Eager Learning: learn theory, dispose of observations.

Lazy Learning:

– keep all observations

– compare new with old ones to classify

– no explanation provided.

Analogical Prediction (Muggleton, Bain ‘98)

– Combines the often higher accuracy of lazy learning with an intelligible, explicit hypothesis typical for ILP

– Constructs a local theory for each new observation that is consistent with the largest number of training examples.

Analogical Prediction Example

owner(richard,caesar).

action(Cat,caesar,run).

owner(richard,rex).

action(Cat,rex,run).

owner(daniel,blackie).

action(Cat,blackie,stay).

owner(richard,rusty).

action(Cat,rusty,?).

Analogical Prediction Example

owner(richard,caesar).

action(Cat,caesar,run).

owner(richard,rex).

action(Cat,rex,run).

owner(daniel,blackie).

action(Cat,blackie,stay).

owner(richard,rusty).

action(Cat,Dog,run) :- owner(richard,Dog).

Timing Analysis of Theories Learned with ILP

The more training examples, the more accurate the theory…

…but how long does it take to produce an answer?

No theoretical work on the subject so far

Experiment shows nontrivial behaviour (reminiscent of the phase transitions observed in SAT learning).

Timing Analysis of ILP Theories: Example

Kazakov, PhD Thesis:

• left: simple theory with low coverage; succeeds or quickly fails → high speed

• middle: medium coverage, fragmentary theory, lots of backtracking → low speed

• right: general theory with high coverage; less backtracking → high speed

Agent Applications of ILP

Relational Reinforcement Learning (Džeroski, De Raedt, Driessens)

combines reinforcement learning with ILP

generalises over previous experience and goals (Q-table) to produce logical decision trees

results can be used to address new situations

Don’t miss the next talk (~11:40–13:10h)!

ILP for Verification and Validation of MAS (Jacob, Driessens, De Raedt)

Also uses FOPL decision trees

Observes agents’ behaviour and represents it as a logical decision tree

The rules in the decision tree can be compared with the designers’ intentions

Test domain: RoboCup

Reid & Ryan 2000:

ILP used to help hierarchical reinforcement learning

ILP constructs high-level features that help discriminate between (state,action) transitions with non-deterministic behaviour

Matsui et al. 2000:

Proposed an ILP agent that avoids actions which will probably fail to achieve the goal.

Application domain: RoboCup

Alonso & Kudenko ‘99: ILP and EBL for conflict simulations.

Species of 2D agents competing for renewable, limited resources.

Agents have simple hard-coded behaviour based on the notion of drives.

Each agent can optionally have an ILP (Progol) mind – a separate process receiving observations and suggesting actions.

Allows the values of inherited features to be selected through natural selection.

ILP hasn’t been used in experiments yet (to come soon).

A number of experiments using inheritance studied Kinship-driven Altruism among Agents.

The start-up project sponsored by Microsoft.

Undergraduate students involved so far: Lee Mallabone, Steve Routledge, John Barton.

Learning and Natural Selection

In learning, search is trivial, choosing the right bias is hard.

But, the choice of learning bias is always external to the learner !

To find the best suited bias one could combine arbitrary choices of bias with evolution and natural selection of the fittest individuals.

Darwinian vs. Lamarckian Evolution

Darwinian evolution: nothing learned by the individual is encoded in the genes and passed on to the offspring.

The Baldwin effect: learning abilities (good biases) are selected in evolution because they give the individual a better chance in a dynamic environment.

What is passed on to the offspring is useful, but very general.

Darwinian vs. Lamarckian Evolution (2)

Lamarckian Evolution: individual experience acquired in life can be inherited.

Not the case in nature.

Doesn’t mean we can’t use it.

The inherited concepts may be too specific and not of general importance.

Learning and Language

Language uses concepts which are

– specific enough to be useful to most/all speakers of that language

– general enough to correspond to shared experience (otherwise, how would one know what the other is talking about!)

The concepts of a language serve as a learning bias which is “inherited” not in genes but through education.

Communication and Learning

Language

– helps one learn (in addition to inherited biases)

– allows knowledge to be communicated.

Distinguish between

– Knowledge: things that one can explain to another by means of a language.

– Skills: the rest; they require individual learning and cannot be communicated.

If watching was enough to learn, the dog would have become a butcher. Bulgarian proverb.

Communication and Learning (2)

In NLP, forgetting [examples] may be harmful (van den Bosch et al.)

An expert is someone who does not think anymore – he knows. Frank Lloyd Wright.

It may be difficult to communicate what one has learned because of

– Limited bandwidth (for lazy learning)

– The absence of appropriate concepts in the language (for black-box learning)

Communication and Learning (3)

In a society of communicating agents, less accurate white-box learning may be better than more accurate but expensive learning that cannot be communicated, since the reduced performance could be outweighed by the much lower cost of learning.

Our Current Research

Inductive Bias Selection (Shane Greenaway)

Role Learning (Spiros Kapetanakis)

Inductive Learning for Games (Alex Champandard)

Machine Learning of Natural Language in MAS (Mark Bartlett)

The End
