Lecture about Agents that Learn
• 3rd April 2000
• INT4/2I1235
Agenda
• Introduction
• Centralized learning vs decentralized learning
• Credit Assignment Problem
• Learning and Activity Coordination
• Learning about and from other agents
• Learning and Communication
• Summary
Introduction
• Today's topic
• Who is the lecturer
• Why do we have this lecture
Today's topic
• How do agents learn?
• What are the benefits of learning agents?
• Learning in isolation, or in cooperation?
Who is the lecturer
• Johan Kummeneje
• Doctoral Student
• RoboCup, Social Decisions, and Java
Why do we have this lecture
• Beats me... You tell me.
• Take 2 minutes to think about why this is interesting, and then I will ask 2 or 3 of you what you think.
Centralized vs Decentralized
• Introduction
• The Degree of Decentralization
• Interaction-specific features
• Involvement-specific features
• Goal-specific features
• The learning method
• The learning feedback
Introduction
• A learning process comprises planning, inference, decision steps, etc.
• Centralized learning, or isolated learning
• Decentralized learning, or interactive learning
The Degree of Decentralization
• Distributedness
• Parallelism
Interaction-specific features
• Level of interaction (from "simple" observation to complex negotiations and dialogues)
• Persistence of interaction (short to long)
• Frequency (low to high)
• Pattern (unstructured to hierarchical)
• Variability (fixed to dynamic)
Involvement-specific features
• Relevance to the learning process
• Role in the learning process
• Generalist vs specialist
Goal-specific features
• Improvement (individual vs social)
• Conflicting vs compatible goals
The learning method
• Rote learning ("korvstoppning", Swedish for cramming)
• Instruction and advice taking
• Examples and practice (Learning by Doing, Baden-Powell)
• Analogy
• Discovery
The learning effort required increases from top to bottom.
The learning feedback
• Supervised (the feedback tells which action is the best)
• Reinforcement (the feedback gives the utility of the action taken, which the learner tries to maximize)
• Unsupervised (no explicit feedback)
Credit Assignment Problem
• Inter-agent CAP (how to divide credit among the different agents)
• Intra-agent CAP (how to divide credit among the different actions performed within an agent)
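The two levels of the problem can be illustrated with a minimal sketch. It assumes, hypothetically, that each agent's contribution is known as a number and that more recent actions deserve more credit; the function names, the proportional split, and the `decay` factor are illustrative choices and not something the lecture prescribes.

```python
def split_between_agents(total_credit, contributions):
    """Inter-agent CAP: share credit in proportion to each agent's contribution."""
    total = sum(contributions.values())
    return {agent: total_credit * c / total for agent, c in contributions.items()}


def split_between_actions(agent_credit, actions, decay=0.5):
    """Intra-agent CAP: give later actions more credit (assumed geometric decay)."""
    # The most recent action gets weight 1, the one before it `decay`, and so on.
    weights = [decay ** (len(actions) - 1 - i) for i in range(len(actions))]
    norm = sum(weights)
    return [(a, agent_credit * w / norm) for a, w in zip(actions, weights)]


# Hypothetical example: two agents share a reward of 12 after scoring a goal.
per_agent = split_between_agents(12.0, {"striker": 2.0, "midfielder": 1.0})
per_action = split_between_actions(per_agent["striker"], ["dribble", "pass", "shoot"])
```

In a real system the contributions are exactly what is unknown, which is why credit assignment is a problem and not a bookkeeping step.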
Learning and Activity Coordination
• Introduction
• Reinforcement Learning
  – Q-Learning and Learning Classifier Systems
• Isolated, Concurrent Reinforcement Learners
• Interactive Reinforcement Learning of Coordination
  – ACE and AGE
Introduction
• Activity coordination
• Adaptation to differences in the coordination process
• Effectively utilizing opportunities and avoiding pitfalls
Reinforcement Learning
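• Optimise the feedback (reinforcement)
• Modeled by a Markov decision process
• $\langle S, A, P, r \rangle$, where $S$ is the set of states, $A$ the set of actions, $P : S \times S \times A \to [0, 1]$ the transition probabilities, and $r$ the reward function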
Q-Learning
• When feedback is received, update the Q-value
• $Q(s,a) \leftarrow (1-\beta)\,Q(s,a) + \beta\,\bigl(R + \gamma \max_{a'} Q(s',a')\bigr)$
• where $\beta$ is a small constant called the learning rate and $\gamma$ is the discount factor
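The update rule on this slide can be written as a few lines of tabular code. This is a minimal sketch: the dictionary-based Q-table, the state and action names, and the default values for the learning rate and the discount are assumptions made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Best value obtainable from the next state, over all available actions.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + discount * best_next
    # Blend the old estimate with the new target, weighted by the learning rate.
    q[(state, action)] = (1 - learning_rate) * q[(state, action)] + learning_rate * target
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0
actions = ["left", "right"]
first = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
second = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
```

With these values the estimate for ("s0", "right") moves from 0 to 0.1 and then to 0.19, creeping toward the reward because the learning rate is small.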
Learning Classifier Systems
• A classifier is a (condition, action) pair
• Strength of the classifier at a given time: S(c, a)
• At each time step a classifier is chosen from a match set (the classifiers whose condition matches the current environment)
• Feedback is received and S is modified accordingly
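One cycle of such a system can be sketched as follows. The `#` wildcard in conditions, the greedy choice of the strongest classifier, and the update rule that moves the strength a small step toward the feedback are all simplifying assumptions; the slide only says that a classifier is chosen from the match set and that its strength is modified by the feedback.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str   # pattern over the environment message; '#' matches any symbol
    action: str
    strength: float = 1.0


def matches(condition, message):
    """A condition matches if every symbol is equal or a wildcard."""
    return len(condition) == len(message) and all(
        c in ("#", m) for c, m in zip(condition, message))


def choose(classifiers, message):
    """Pick the strongest classifier from the match set (greedy, for simplicity)."""
    match_set = [c for c in classifiers if matches(c.condition, message)]
    return max(match_set, key=lambda c: c.strength)  # raises if nothing matches


def reinforce(classifier, reward, rate=0.2):
    """Move the strength a small step toward the received feedback (assumed rule)."""
    classifier.strength = (1 - rate) * classifier.strength + rate * reward


rules = [Classifier("1#", "advance"),
         Classifier("10", "retreat", 2.0),
         Classifier("0#", "wait")]
chosen = choose(rules, "10")   # match set: "1#" and "10"; "10" is stronger
reinforce(chosen, reward=0.0)  # a bad outcome weakens the chosen classifier
```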
Isolated, Concurrent Reinforcement Learners
• Agent coupling
• Agent relationships
• Feedback timing
• Optimal behaviour combinations
• CIRL (concurrent isolated reinforcement learning)
• No modelling of other agents
• In cooperative situations, complementary policies can be developed
• Adapts to similar situations
Interactive Reinforcement Learning of Coordination
• Eliminates incompatible actions
• Agents can observe the set of considered actions of other agents
• Two different alternatives are ACE and AGE
Action Estimate Algorithm (ACE)
• Each agent calculates the set of actions it can perform
• For each of these actions, the agent calculates the goal relevance (GR)
• For every action with a GR above a threshold, the agent calculates and announces a bid with a risk factor and a noise term:
• $B(S) = (\alpha + \beta)\,E(S)$, where $\alpha$ is the risk factor, $\beta$ the noise term, and $E(S)$ the estimated goal relevance
• Incompatible actions are removed, and the action with the highest bid is then executed
• The feedback increases the probability that successful actions are performed in the future
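The bidding and elimination steps can be sketched as below. This is an illustration under assumptions: the dictionary layout, the explicit set of incompatible pairs, the rule that each agent executes at most one action, and the uniform noise are hypothetical choices, and the feedback step that adjusts the estimates is left out.

```python
import random


def ace_select(estimates, incompatible, risk=1.0, noise_scale=0.0, rng=random):
    """Select mutually compatible actions by descending bid.

    estimates: {(agent, action): E}, the goal-relevance estimates
    incompatible: set of frozenset pairs of (agent, action) that cannot co-occur
    """
    # Bid = (risk factor + noise term) * estimate, as on the slide.
    bids = {k: (risk + rng.uniform(-noise_scale, noise_scale)) * e
            for k, e in estimates.items()}
    selected = []
    remaining = set(bids)
    while remaining:
        best = max(remaining, key=lambda k: bids[k])
        selected.append(best)
        # Drop the winner, the same agent's other actions (assumption: one
        # action per agent), and every action incompatible with the winner.
        remaining = {k for k in remaining
                     if k != best and k[0] != best[0]
                     and frozenset((k, best)) not in incompatible}
    return selected


estimates = {("a1", "push"): 0.9, ("a1", "wait"): 0.2,
             ("a2", "pull"): 0.8, ("a2", "lift"): 0.5}
incompatible = {frozenset({("a1", "push"), ("a2", "pull")})}
chosen = ace_select(estimates, incompatible)
```

Here agent a2's best action loses out because it conflicts with the higher bid of a1, so a2 falls back to its second choice.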
Action Group Estimate Algorithm (AGE)
• All applicable actions from all agents are collected into all possible activity contexts, in each of which all actions are mutually compatible.
• Using the same bidding strategy as ACE, the activity context with the highest sum of bids is chosen for execution.
• Credit assignment is dependent on the actions performed and the relevance of the action.
• Requires more computational effort than ACE.
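The selection of an activity context can be sketched by brute-force enumeration, which also makes the extra computational effort visible. The data layout, the option for an agent to stay idle, and the explicit set of incompatible pairs are assumptions made for this illustration; the bids are taken as given.

```python
from itertools import combinations, product


def age_select(bids, incompatible):
    """Return the compatible activity context with the highest sum of bids.

    bids: {agent: {action: bid}}
    incompatible: set of frozenset pairs of (agent, action)
    """
    agents = sorted(bids)
    # Each agent contributes one action or stays idle (None).
    options = [[(agent, a) for a in bids[agent]] + [None] for agent in agents]
    best_context, best_total = (), float("-inf")
    for combo in product(*options):
        context = tuple(x for x in combo if x is not None)
        # Skip contexts that contain any incompatible pair of actions.
        if any(frozenset(pair) in incompatible for pair in combinations(context, 2)):
            continue
        total = sum(bids[agent][action] for agent, action in context)
        if total > best_total:
            best_context, best_total = context, total
    return best_context, best_total


bids = {"a1": {"push": 0.9, "wait": 0.2}, "a2": {"pull": 0.8, "lift": 0.5}}
incompatible = {frozenset({("a1", "push"), ("a2", "pull")})}
context, total = age_select(bids, incompatible)
```

The number of candidate contexts grows as the product of the agents' option counts, which is one way to see why evaluating whole groups costs more than evaluating single actions.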
Learning about and from other agents
• Introduction
• Learning Organizational Roles
• Learning in Market Environments
Introduction
• Learning to improve the individual performance
• At the expense of other agents
• Anticipatory agents, RMM (Recursive Modeling Method)
Learning Organizational Roles
• Agents learn roles so that they complement each other better.
• Each agent can take one of a set of roles (one at a time) and must choose the most appropriate one (minimising costs).
• Roles are rated by f(U, P, C, Potential): utility, probability, cost, and potential
Learning in Market Environments
• Agents sell/buy information from each other.
• 0-level agents do not model other agents
• 1-level agents model other agents as 0-level agents
• 2-level agents model other agents as 1-level agents
Learning and Communication
• Introduction
• Reducing Communication by Learning
• Improving Learning by Communication
Introduction
• Learning to communicate
• Communicating as learning
• What to communicate?
• When to communicate?
• With whom to communicate?
• How to communicate?
Reducing Communication by Learning
• Learning about the abilities of other agents.
• Learning which agents to ask, instead of broadcasting
• Problem similarities
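Learning which agents to ask can be sketched as simple bookkeeping over past requests. The class name, the success-rate ranking, and the fallback to asking everyone when nothing is known yet are hypothetical choices for illustration, not a method given in the lecture.

```python
from collections import defaultdict


class AddressBook:
    """Track which agents have answered which kind of request successfully."""

    def __init__(self):
        self.successes = defaultdict(int)  # (agent, task_type) -> successful answers
        self.attempts = defaultdict(int)   # (agent, task_type) -> requests sent

    def record(self, agent, task_type, success):
        self.attempts[(agent, task_type)] += 1
        if success:
            self.successes[(agent, task_type)] += 1

    def who_to_ask(self, task_type, agents, limit=1):
        """Return the most promising agents, or everyone if nothing is known."""
        known = [a for a in agents if self.attempts[(a, task_type)] > 0]
        if not known:
            return list(agents)  # no experience yet: broadcast

        def rate(a):
            return self.successes[(a, task_type)] / self.attempts[(a, task_type)]

        return sorted(known, key=rate, reverse=True)[:limit]


book = AddressBook()
agents = ["a", "b", "c"]
before = book.who_to_ask("route", agents)  # nothing learned yet: ask everyone
book.record("b", "route", success=True)
book.record("a", "route", success=False)
after = book.who_to_ask("route", agents)   # experience narrows the request down
```

Every request that goes to one learned addressee in place of a broadcast saves messages, at the price of possibly missing an untried agent that would have answered better.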
Improving Learning by Communication
• Communicating beliefs and pieces of information
• Explanation
• Ontologies
• Finding out complex relationships between different agents and actions
Summary
• We have seen the focus move from isolated (individual, centralized) learning to a more diverse flora of learning.
• Besides the standard (older) ML methods, some new ML algorithms have been proposed.
• Agents learn to improve communication and cooperation.
Further reading
• Peter Stone, Ph.D. thesis
• Weiss (course material), chapter 6
• Russell and Norvig, Artificial Intelligence: A Modern Approach
• THE END