Learning in Multi-agent Systems
Zhai Yuqing (yqzhai@seu.edu.cn)
TRANSCRIPT
Outline
• Agent Learning
• Multi-agent learning
• Reinforcement Learning & Multi-agent Reinforcement Learning
Agent Learning
Why Learning?
• Learning is essential for unknown environments, i.e., when the designer lacks omniscience
• Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write everything down
• Learning modifies the agent's decision mechanisms to improve performance
Why Learning?
• It is difficult to hand-design behaviours that act optimally (or even close to it)
• Agents can optimize themselves using reinforcement learning
– Not learning new concepts (or behaviours); given a set of states and actions, the agent can find the best policy
– Is this just adaptive control?
• Learning can be done on-line and continuously throughout the lifetime of the agent, adapting to (slowly) changing situations
Learning
[Diagram: a learning algorithm produces a policy; the policy emits actions into the world/state, which returns observations/sensations and rewards to the learning algorithm.]
Learning to act in the world
[Diagram: as above, but the environment now also contains other agents (possibly learning themselves); where the agent's "world" ends and the other agents begin is unclear.]
Learning Agent Architecture
• A learning agent can be thought of as containing a performance element that decides what actions to take and a learning element that modifies the performance element so that it makes better decisions
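This split can be sketched in code; a minimal illustration (all names hypothetical), with the learning element nudging the performance element's action preferences based on feedback:

```python
# Minimal learning-agent skeleton: a performance element chooses actions,
# and a learning element modifies it so decisions improve. Names are illustrative.

class PerformanceElement:
    def __init__(self):
        # action preferences per percept, adjusted by the learning element
        self.preferences = {}

    def act(self, percept):
        # pick the best-known action for this percept (default: action 0)
        options = self.preferences.get(percept, {0: 0.0})
        return max(options, key=options.get)

class LearningElement:
    def update(self, perf, percept, action, feedback):
        # nudge the preference for (percept, action) toward the feedback signal
        options = perf.preferences.setdefault(percept, {action: 0.0})
        old = options.get(action, 0.0)
        options[action] = old + 0.5 * (feedback - old)

perf, learner = PerformanceElement(), LearningElement()
learner.update(perf, "wall_ahead", 1, feedback=1.0)   # action 1 worked well
learner.update(perf, "wall_ahead", 0, feedback=-1.0)  # action 0 worked badly
print(perf.act("wall_ahead"))  # → 1
```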
Multi-agent Learning
Learning in Multiagent Systems
• Intersection of DAI and ML
• Why bring them together?
– There is a strong need to equip multiagent systems with learning abilities
– The extended view of ML as multiagent learning is qualitatively different from traditional ML and can lead to novel ML techniques and algorithms
Multi-Agent Learning Problem
• Each agent tries to solve its learning problem while the other agents in the environment are also trying to solve their own learning problems: a challenging source of non-stationarity
• Main scenarios: (1) cooperative; (2) self-interested (many deep issues swept under the rug)
• An agent may know very little about other agents:
– payoffs may be unknown
– learning algorithms may be unknown
• Traditional method of solution: game theory (which uses several questionable assumptions)
Multi-agent Learning Problem
• Each agent tries to solve its own learning problem, while other agents in the environment try to solve their own learning problems
• Larger state space
– Might have to include the state of other robots in one's own state
• Problems of multi-agent RL:
– All of the problems from the single-agent case
– Other agents are unpredictable or non-stationary
– Should reinforcement be local or global?
– Was the robot trying to achieve its goal, or reacting to other robots, when it performed a good action?
Learning in Multi-Agent Systems
• No doubt learning is of great importance for MAS!
• Challenge:
– The multi-agent learning problem: the optimal policy changes, because other agents are learning too
– Can we have a unifying framework in which this learning can be understood?
• Challenging MAS domains:
– Robotic soccer
– Traffic
– Robotic rescue
– Trading agents, e-commerce
– Automated driving
General Characterization
• Principal categories of learning
• The features in which learning approaches may differ
• The fundamental learning problem known as the credit-assignment problem
Principal Categories
• Centralized learning (isolated learning)
– Learning executed by a single agent, with no interaction with other agents
– Several centralized learners may try to attain different or identical goals at the same time
Principal Categories
• Decentralized learning (interactive learning)
– Several agents are engaged in the same learning process
– Several groups of agents may try to attain different or identical learning goals at the same time
• A single agent may be involved in several centralized/decentralized learning processes at the same time
Learning and Activity Coordination
• Previous research on coordination focused on off-line design of behavioral rules, negotiation protocols, etc…
• Agents operating in open, dynamic environments must be able to adapt to changing demands and opportunities
• How can agents learn to appropriately coordinate their activities?
Learning about and from Other Agents
• Agents learn to improve their individual performance
• Better capitalize on available opportunities by predicting the behavior of other agents (preferences, strategies, intentions, etc.)
Learning Organizational Roles
• Assume agents have the capability of playing one of several roles in a situation
• Agents need to learn role assignments to effectively complement each other
Learning Organizational Roles
• The framework maintains Utility, Probability and Cost (UPC) estimates of a role adopted in a particular situation:
– Utility: the worth of the desired final state if the agent adopts the given role in the current situation
– Probability: the likelihood of reaching a successful final state (given the role/situation)
– Cost: the associated computational cost incurred
– Potential: the usefulness of a role in discovering pertinent global information
Learning Organizational Roles: Theoretical Framework
• $S^k, R^k$ – the sets of situations and roles for agent $k$
• An agent maintains vectors of UPC estimates, one per situation/role pair
• During the learning phase, the agent rates a role $r$ in situation $s$ by combining the component measures through a function $f$, choosing $r$ with probability

$$\Pr(r) = \frac{f(U_{sr}, P_{sr}, C_{sr}, Potential_{sr})}{\sum_{j \in R^k} f(U_{sj}, P_{sj}, C_{sj}, Potential_{sj})}$$
Learning Organizational Roles: Theoretical Framework
• After the learning phase is over, the role to be played in situation $s$ is:

$$r = \arg\max_{j \in R^k} f(U_{sj}, P_{sj}, C_{sj}, Potential_{sj})$$

• UPC values are learned using reinforcement learning
• The UPC estimates after $n$ updates are written $\hat{U}^{n}_{sr}, \hat{P}^{n}_{sr}, \hat{Potential}^{n}_{sr}$
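The rating-and-selection scheme above can be sketched as follows, assuming an illustrative combining function f and made-up UPC values (the framework itself leaves f open):

```python
# Rating roles in the UPC framework (sketch). Each candidate role j in the
# current situation has estimates (U, P, C, Potential); f combines them.

def f(U, P, C, Potential):
    # illustrative combining function: expected utility minus cost, plus potential
    return U * P - C + Potential

# made-up UPC estimates for three candidate roles in the current situation
roles = {
    "bidder":  dict(U=10.0, P=0.6, C=2.0, Potential=0.5),
    "manager": dict(U=8.0,  P=0.9, C=1.0, Potential=0.2),
    "idle":    dict(U=1.0,  P=1.0, C=0.0, Potential=0.0),
}

scores = {j: f(**upc) for j, upc in roles.items()}
total = sum(scores.values())

# during the learning phase: play role r with probability f(...) / sum_j f(...)
probs = {j: s / total for j, s in scores.items()}

# after the learning phase: play the argmax role
best = max(scores, key=scores.get)
print(best)  # → manager
```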
Learning Organizational Roles: Updating the Utility
• $S$ – the situations encountered between the time of adopting role $r$ in situation $s$ and reaching a final state $F$ with utility $U_F$
• The utility values for all roles chosen in each of the situations in $S$ are updated (with learning rate $\alpha$):

$$\hat{U}^{n+1}_{sr} = (1-\alpha)\,\hat{U}^{n}_{sr} + \alpha\, U_F$$
Learning Organizational Roles: Updating the Probability
• $O : S \to [0,1]$ – returns 1 if the given final state is successful
• The update rule for the probability:

$$\hat{P}^{n+1}_{sr} = (1-\alpha)\,\hat{P}^{n}_{sr} + \alpha\, O(F)$$
Learning Organizational Roles: Updating the Potential
• $Conf(S)$ – returns 1 if, on the path to the final state, conflicts are detected and resolved by information exchange
• The update rule for the potential:

$$\hat{Potential}^{n+1}_{sr} = (1-\alpha)\,\hat{Potential}^{n}_{sr} + \alpha\, Conf(S)$$
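All three update rules share the same exponential-averaging form; a sketch, assuming a learning rate α = 0.1 and made-up current estimates:

```python
# UPC estimate updates (sketch): every rule is the same exponential average
# new = (1 - alpha) * old + alpha * target, with an assumed learning rate.

ALPHA = 0.1

def upc_update(old, target, alpha=ALPHA):
    return (1 - alpha) * old + alpha * target

U_hat, P_hat, Pot_hat = 5.0, 0.5, 0.2  # current estimates for (s, r)

U_F  = 10.0  # utility of the final state F that was reached
O_F  = 1.0   # O(F) = 1: the final state was successful
conf = 0.0   # Conf(S) = 0: no conflicts were resolved by information exchange

U_hat   = upc_update(U_hat, U_F)     # 5.0  -> 5.5
P_hat   = upc_update(P_hat, O_F)     # 0.5  -> 0.55
Pot_hat = upc_update(Pot_hat, conf)  # 0.2  -> 0.18
```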
Learning to Exploit an Opponent: Model-Based Approach
• The most prominent approach in AI for developing playing strategies is the minimax algorithm
– It assumes that the opponent will choose the worst (for us) move
• An accurate model of the opponent can be used to develop better strategies
Learning to Exploit an Opponent: Model-Based Approach
• The main problem of RL is its slow convergence
• The model-based approach tries to reduce the number of interaction examples needed for learning
• Perform a deeper analysis of past interaction experience
Model-Based Approach
• The learning process is split into two separate stages:
– Infer a model of the other agent based on past experience
– Utilize the learned model to design an effective interaction strategy for the future
Reducing Communication by Learning
• Learning is a method for reducing the communication load among agents
• Consider the contract-net approach:
– Broadcasting of task announcements is assumed
– Scalability problems arise when the number of managers/tasks increases
Reducing Communication in Contract-Net
• A flexible learning-based mechanism called addressee learning
• Enables agents to acquire knowledge about the other agents' task-solving abilities
• Tasks may then be assigned more directly
Reducing Communication in Contract-Net
• Case-based reasoning is used for knowledge acquisition and refinement
• Humans often solve problems using solutions that worked well for similar problems
• Construct cases – problem-solution pairs
Case-Based Reasoning in Contract Net
• Each agent maintains its own case base
• A case consists of:
– A task specification
– Information about which agent has already solved the task, and the quality of the solution
• A similarity measure for tasks is needed; a task is specified by attribute/value pairs:

$$T_i = \langle (A_{i1}, V_{i1}), \ldots, (A_{im_i}, V_{im_i}) \rangle$$
Case-Based Reasoning in Contract Net
• The distance $Dist(A_{ir}, A_{js})$ between two attributes is domain-specific
• The similarity $Similar(T_i, T_j)$ between two tasks $T_i$ and $T_j$ is computed by aggregating the attribute distances $Dist(A_{ir}, A_{js})$ over the attribute pairs $r, s$
• For a task $T_i$, the set of similar tasks is:

$$S(T_i) = \{\, T_j : Similar(T_i, T_j) \ge 0.85 \,\}$$
Case-Based Reasoning in Contract Net
• An agent has to assign task $T_i$ to another agent
• It selects the most appropriate agents by computing their suitability:

$$Suit(A, T_i) = \frac{1}{|S(T_i)|} \sum_{T_j \in S(T_i)} Perform(A, T_j)$$
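The whole lookup can be sketched as follows, with a toy attribute distance and an assumed aggregation for the similarity (one minus the mean attribute distance); the case base and agent names are made up:

```python
# Addressee learning via CBR (sketch): find past tasks similar to the new
# task, then rate each agent by its average solution quality on those tasks.

def dist(v1, v2):
    # toy domain-specific attribute distance in [0, 1]
    return 0.0 if v1 == v2 else 1.0

def similar(task_i, task_j):
    # assumed aggregation: 1 - mean attribute distance over shared attributes
    keys = set(task_i) & set(task_j)
    if not keys:
        return 0.0
    return 1.0 - sum(dist(task_i[k], task_j[k]) for k in keys) / len(keys)

# case base: (task attributes, agent that solved it, solution quality)
cases = [
    ({"type": "route", "size": "large"}, "A1", 0.9),
    ({"type": "route", "size": "small"}, "A1", 0.7),
    ({"type": "plan",  "size": "large"}, "A2", 0.8),
]

def suitability(agent, task, threshold=0.85):
    # restrict to cases similar enough to the task; Suit = mean performance
    s = [q for t, a, q in cases if a == agent and similar(task, t) >= threshold]
    return sum(s) / len(s) if s else 0.0

task = {"type": "route", "size": "large"}
print(suitability("A1", task))  # → 0.9 (only the exact match clears 0.85)
```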
Improving Learning by Communication
• Two forms of improving learning by communication are distinguished:
– Learning based on low-level communication (e.g. exchanging missing information)
– Learning based on high-level communication (e.g. mutual explanation)
Improving Learning by Communication
• Example: the predator-prey domain
– Predators are Q-learners
– Each predator has limited visual perception
– They exchange sensor data (low-level communication)
– Experiments show that this clearly leads to improved learning results
Knowledge exchange in MAS
• More sophisticated implementations provide knowledge-exchange capabilities
– Agents exchange the strongest rules they have learned
– Multi-agent Mutual Learning (MAML)
Some Open Questions…
• What are the unique requirements and conditions for multiagent learning?
• Do centralized and decentralized learning qualitatively differ from each other?
• Development of theoretical foundations of decentralized learning
• Applications of multiagent learning in complex real-world environments
Reinforcement Learning & Multi-agent Reinforcement Learning
Reinforcement Learning Approach
Overview:
• TD [Sutton 88], Q-learning [Watkins 92]: the agent can estimate a model of the state-transition probabilities of E (the environment), if E has fixed state-transition probabilities (i.e., E is an MDP).
• Profit sharing [Grefenstette 88]: the agent can estimate a model of the state-transition probabilities of E even if E does not have fixed state-transition probabilities.
• cf. Dynamic programming: the agent needs to have a perfect model of the state-transition probabilities of E.
Feature: the reward is not given immediately after the agent's action; usually it is given only after the goal is achieved. This delayed reward is the only clue for the agent's learning.
[Diagram: the agent receives input and reward from the environment E; a state recognizer and an action selector, backed by a look-up table W(S, a) maintained by a learner, produce the action sent back to the environment.]
Reinforcement Learning Scenario
[Diagram: at each step the agent observes state s_t and reward r_t from the environment, emits action a_t, and receives the next state s_{t+1} and reward r_{t+1}.]
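This agent-environment loop can be sketched with a stub environment (the chain world, its rewards, and the random placeholder policy are invented for illustration):

```python
import random

# Agent-environment interaction loop (sketch): at step t the agent sees state
# s_t and reward r_t, picks action a_t, and gets back s_{t+1} and r_{t+1}.

class StubEnv:
    # toy chain environment: states 0..4, actions move -1/+1, reward 1 at state 4
    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward

random.seed(0)
env = StubEnv()
state, total = 0, 0.0
for t in range(20):
    action = random.choice([-1, +1])   # placeholder policy: act at random
    state, reward = env.step(action)   # environment returns s_{t+1}, r_{t+1}
    total += reward                    # delayed reward accumulates over time
```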
Example (with discount factor γ = 0.9 and immediate reward 0):

Q(s, a_red) = 0 + 0.9 × 81 = 72.9
Q(s, a_green) = 0 + 0.9 × 100 = 90
Q(s, a_blue) = 0 + 0.9 × 100 = 90
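These values are instances of the one-step Q backup Q(s, a) = r + γ · max_{a'} Q(s', a'); a quick check, assuming γ = 0.9 and immediate reward 0:

```python
# One-step Q backup: Q(s, a) = r + gamma * max_a' Q(s', a'), with r = 0 and
# gamma = 0.9, reproducing the example values above.

GAMMA = 0.9

def q_backup(reward, next_qs, gamma=GAMMA):
    return reward + gamma * max(next_qs)

print(round(q_backup(0, [81]), 1))       # Q(s, a_red)   → 72.9
print(round(q_backup(0, [100, 81]), 1))  # Q(s, a_green) → 90.0
print(round(q_backup(0, [100]), 1))      # Q(s, a_blue)  → 90.0
```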
Multi-agent RL
• Basic idea
– Combine the learning process in an unknown environment with the interactive decision process of multiple agents
– There is no single utility function to optimize
– Each agent has a different objective, and its payoff is determined by the joint action of multiple agents
Challenges in Multi-agent RL
• Curse of dimensionality
– The number of parameters to be learned increases dramatically with the number of agents
• Partial observability
– The states and actions of the other agents, which an agent requires to make decisions, are not fully observable
– Inter-agent communication is usually costly
– Note: partially observable Markov decision processes (POMDPs) have been used to model partial observability in probabilistic AI
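The curse of dimensionality is concrete: with n agents each having |A| actions, a joint-action table needs |A|^n entries per state, versus |A| per agent for independent learners. A quick illustration:

```python
# Joint-action blow-up: a table over joint actions grows as |A|**n with the
# number of agents n, while independent learners keep only |A| entries each.

num_actions = 5  # |A|: actions per agent

for n in (1, 2, 4, 8):
    joint = num_actions ** n        # entries per state for a joint-action table
    independent = num_actions * n   # entries per state across independent learners
    print(f"n={n}: joint={joint}, independent={independent}")
```

For n = 8 agents the joint table already needs 390625 entries per state, against 40 for the independent learners.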
The Multi-Agent Reinforcement Learning (MARL) Model
• Multiple selfish agents in a stationary dynamic environment.
• Environment modeled as a Stochastic (a.k.a. Markov) Game (SG or MG).
• Transition and Payoff are functions of all agents’ actions.
The MARL Model (cont.)
• Transition probabilities and payoffs are initially unknown to agent.
• Agent’s goal – maximize return.
Typical Multi-agent RL methods
• Value-iteration learning [Sutton and Barto]
– Based on different concepts of equilibrium from game theory
• Min-max solution-based learning algorithm in zero-sum stochastic games [Littman]
• Nash equilibrium-based learning algorithm [Wellman]
– Extends Littman's algorithm to general-sum games
• Correlated equilibrium-based learning algorithm [Hall]
– Considers the possibility of action correlation among agents
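Littman's minimax-Q evaluates each stage game by its minimax value; the sketch below computes only the pure-strategy security levels of a zero-sum matrix game (the real algorithm maximizes over mixed strategies via a linear program):

```python
# Security (maximin) values of a zero-sum matrix game over PURE strategies.
# Minimax-Q actually optimizes over MIXED strategies with linear programming;
# this pure-strategy version is a simplification for illustration.

# payoff[i][j]: row player's payoff when row plays i and column plays j
payoff = [
    [3, -1],
    [0,  2],
]

# row player: assume the column player picks the worst response to each row
row_security = max(min(row) for row in payoff)

# column player (who minimizes): best column against the row player's worst case
cols = list(zip(*payoff))
col_security = min(max(col) for col in cols)

print(row_security, col_security)  # → 0 2
```

Here the two security levels differ (0 vs. 2), so the game has no pure saddle point; this is exactly why minimax-Q must consider mixed strategies.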
Typical Multi-agent RL methods
• Multiple-person decision theory-based
– Assume that each agent plays a best response against stationary opponents
– Require the joint action of agents to converge to a Nash equilibrium in self-play
– Learn quickly while losing and slowly while winning
– Learn a best response when opponents are stationary; otherwise move towards equilibrium
Typical Multi-agent RL methods
• Integrating RL with coordination learning
– Joint-action learner
• Independent learner
– Ignores the existence of other agents
– Just applies RL in the classic sense
Typical Multi-agent RL methods
• Hierarchical multi-agent RL
– Each agent is given an initial hierarchical decomposition of the overall task
– Cooperative subtasks are defined as those subtasks in which coordination among agents has a significant effect on the performance of the overall task
– Cooperative subtasks are usually defined at the highest level(s) of the hierarchy
MAL Foundation
• The game-theoretic concepts of stochastic games and Nash equilibria
• Learning algorithms use stochastic games as a natural extension of Markov decision processes (MDPs) to multiple agents
– Equilibria learners
• Nash-Q, Minimax-Q
• Friend-or-Foe-Q
• Gradient-ascent learners
– Best-response learners
Multiagent Q-learning desiderata
• "Performs well" against arbitrarily adapting other agents: a best response is probably impossible
• Doesn't need a correct model of the other agents' learning algorithms
– But modeling is fair game
• Doesn't need to know the other agents' payoffs
• Estimates the other agents' strategies from observation; does not assume game-theoretic play
• No assumption of a stationary outcome: the population may never reach equilibrium, and agents may never stop adapting
• Self-play: convergence to a repeated-game Nash equilibrium would be nice but is not necessary (it is unreasonable to seek convergence to a one-shot Nash equilibrium)
Finding Nash equilibrium
• A game-theoretic approach that presupposes complete knowledge of the reward structure of the underlying game by all the agents
– Each agent calculates an equilibrium using mathematical programming
– Supposes that the other agents are rational
Potential applications of MARL
• E-commerce – agents buying and selling over the internet.
• Autonomous computing, e.g., automatic fault recovery.
• Exploration of environments that are inaccessible to humans: bottom of oceans, space, etc…
The End