multi-agent reinforcement learning - game theory polimi · 2019-12-03 · introduction to...
TRANSCRIPT
Multi-Agent Reinforcement Learning: An Overview

Marcello Restelli

November 12, 2014
Outline

1 Introduction to Multi-Agent Reinforcement Learning
  Reinforcement Learning
  MARL vs RL
  MARL vs Game Theory

2 MARL algorithms
  Best-Response Learning
  Equilibrium Learners
    Team Games
    Zero-sum Games
    General-sum Games
Game Theory in Computer Science

At the intersection of game theory and computer science:
Computing Solution Concepts
Compact Game Representations
Mechanism Design
Multi-agent Learning
Some Naming Conventions

Player = Agent
Payoff = Reward
Value = Utility
Matrix = Strategic form = Normal form
Strategy = Policy
Pure strategy = Deterministic policy
Mixed strategy = Stochastic policy
What is Multi-Agent Learning?

A difficult question... one we will try to answer in these slides.

It involves:
Multiple agents
Self-interest
Concurrent learning

It is strictly related to:
Game Theory
Reinforcement Learning
Multi-agent Systems

"If multi-agent is the answer, what is the question?" (Shoham et al., 2002-2007)
"Multi-agent learning is not the answer, it is the question!" (Stone, 2007)
Which Applications?

Distributed vehicle regulation
Air traffic control
Network management and routing
Electricity distribution management
Supply chains
Job scheduling
Computer games
Multi-agent Learning and RL

We are interested in learning in situations where multiple decision makers repeatedly interact.
Among the different machine learning paradigms, reinforcement learning is the most suited to approach such problems.
We will mainly focus on multi-agent RL, even if other (game-theoretic) learning approaches will be mentioned:
Fictitious play
No-regret learning
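As a taste of the game-theoretic approaches just named, here is a minimal sketch of fictitious play on matching pennies. The game, payoffs, and loop details are illustrative assumptions, not from the slides: each player repeatedly best-responds to the opponent's empirical action frequencies.

```python
A = [[1, -1], [-1, 1]]               # row player's payoffs; zero-sum game

def best_response(payoff_rows, opp_freq):
    """Index of the action with highest expected payoff against opp_freq."""
    values = [sum(p * f for p, f in zip(row, opp_freq)) for row in payoff_rows]
    return max(range(len(values)), key=values.__getitem__)

# Column player's payoffs: -A transposed (rows indexed by her own action).
B_T = [[-A[i][j] for i in range(2)] for j in range(2)]

counts = [[1, 1], [1, 1]]            # smoothed empirical action counts
for t in range(5000):
    freq0 = [c / sum(counts[0]) for c in counts[0]]
    freq1 = [c / sum(counts[1]) for c in counts[1]]
    counts[0][best_response(A, freq1)] += 1   # row best-responds to column
    counts[1][best_response(B_T, freq0)] += 1 # column best-responds to row

freqs = [c / sum(counts[0]) for c in counts[0]]
# In zero-sum games the empirical frequencies converge to an equilibrium;
# here they approach the mixed Nash equilibrium (0.5, 0.5).
```

Individual play cycles deterministically, but the empirical frequencies settle near the equilibrium, which is exactly the sense in which fictitious play "learns".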
History of RL

Psychology, trial and error:
Pavlov (1903): classical conditioning
Thorndike (1905): law of effect
Minsky (1961): credit-assignment problem

Optimal control:
Bellman (1957): dynamic programming
Howard (1960): policy iteration

Reinforcement learning:
Samuel (1956): checkers
Sutton & Barto (1984): temporal difference
Watkins (1989): Q-learning
Tesauro (1992): TD-Gammon
Littman (1994): minimax-Q
The Agent-Environment Interface

The agent interacts at discrete time steps t = 0, 1, 2, ...
Full observability: the agent directly observes the environment state.
Formally, this is a Markov Decision Process (MDP).
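The interface above can be sketched as a simple loop; the two-state toy environment and the random placeholder policy are assumptions for illustration, not from the slides. At each discrete step t the agent observes state s_t, picks action a_t, and receives reward r_{t+1} together with the next state s_{t+1}.

```python
import random

class ToyEnv:
    """Two states {0, 1}: action 1 toggles the state, action 0 keeps it.
    Being in state 1 yields reward +1. Full observability: the agent
    sees the true state directly."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        if action == 1:
            self.state = 1 - self.state
        return self.state, (1 if self.state == 1 else 0)

env = ToyEnv()
state, total_reward = env.state, 0
for t in range(100):                  # discrete time steps t = 0, 1, 2, ...
    action = random.choice([0, 1])    # placeholder policy: act at random
    state, reward = env.step(action)  # observe s_{t+1} and r_{t+1}
    total_reward += reward
```

A learning agent would replace the random choice with a policy improved from the observed rewards.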
![Page 43: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/43.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Markov Decision Processes

An MDP is formalized as a 4-tuple ⟨S, A, P, R⟩:
  S: set of states (what the agent knows; complete observability)
  A: set of actions (what the agent can do; it may depend on the state)
  P: state transition model, P : S × A × S → [0, 1]
  R: reward function, R : S × A × S → ℝ
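A minimal sketch of the 4-tuple in code; the two-state, two-action example is an assumption for illustration.

```python
# A minimal MDP as the 4-tuple <S, A, P, R> from the slide.
S = ["s0", "s1"]
A = ["left", "right"]

# P[s][a] maps each successor state s' to P(s' | s, a); each row sums to 1.
P = {
    "s0": {"left": {"s0": 1.0}, "right": {"s1": 0.9, "s0": 0.1}},
    "s1": {"left": {"s0": 1.0}, "right": {"s1": 1.0}},
}

# R(s, a, s'): reward for the transition, here depending only on s'.
def R(s, a, s_next):
    return 1.0 if s_next == "s1" else 0.0

# Sanity check: every transition distribution is a valid probability
# distribution over S.
for s in S:
    for a in A:
        assert abs(sum(P[s][a].values()) - 1.0) < 1e-9
print("valid MDP")
```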
Markov Assumption

Let s_t be a random variable for the state at time t:
  P(s_t | a_{t-1}, s_{t-1}, ..., a_0, s_0) = P(s_t | a_{t-1}, s_{t-1})
The Markov property is a special kind of conditional independence: the future is independent of the past given the current state.
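The conditional-independence statement can be made concrete: under a Markov transition model, two histories that end in the same state-action pair induce the same next-state distribution. The toy transition table below is an assumption for illustration.

```python
# Toy Markov transition model: P[(s, a)] is the distribution over s_t
# given only (s_{t-1}, a_{t-1}).
P = {("A", "go"): {"A": 0.2, "B": 0.8},
     ("B", "go"): {"A": 0.5, "B": 0.5}}

def next_state_dist(history):
    """history: list of (state, action) pairs; only the last one matters."""
    s, a = history[-1]
    return P[(s, a)]

h1 = [("B", "go"), ("A", "go")]  # a longer past, ending in (A, go)
h2 = [("A", "go")]               # a different past, same ending
assert next_state_dist(h1) == next_state_dist(h2)
print("future depends only on the current state")
```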
The Goal: a Policy

Find a policy that maximizes some cumulative function of the rewards.
What is a policy?
  a mapping from states to distributions over actions
  deterministic vs stochastic
  stationary vs non-stationary
Cost criteria:
  finite horizon
  infinite horizon:
    average reward
    discounted reward: R_t = Σ_{k=0}^{∞} γ^k r_{t+k}
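The discounted return R_t = Σ_{k≥0} γ^k r_{t+k} can be computed directly for a (truncated) reward sequence; the rewards and the value of γ below are example choices.

```python
# Discounted return over a finite reward sequence r_t, r_{t+1}, ...
def discounted_return(rewards, gamma):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0 + 0.25 = 1.25
```

With γ < 1 the infinite sum converges for bounded rewards, which is why the discounted criterion is the standard choice for infinite-horizon problems.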
Value Functions

MDP + stationary policy ⇒ Markov chain
Given a policy π, it is possible to define the utility of each state: Policy Evaluation
Value function (Bellman equation):
  V^π(s) = Σ_{a∈A} π(a|s) Σ_{s'∈S} P(s'|s,a) (R(s,a,s') + γ V^π(s'))
For control purposes, rather than the value of each state, it is easier to consider the value of each action in each state.
Action-value function (Bellman equation):
  Q^π(s,a) = Σ_{s'∈S} P(s'|s,a) (R(s,a,s') + γ V^π(s'))
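Policy evaluation can be carried out by repeatedly applying the Bellman expectation backup until V^π stops changing. A minimal sketch, assuming a two-state MDP with a uniform stochastic policy and a reward that depends only on (s, a) (all illustrative choices):

```python
# Iterative policy evaluation:
#   V(s) <- sum_a pi(a|s) sum_s' P(s'|s,a) (R(s,a,s') + gamma V(s'))
S = ["s0", "s1"]
A = ["stay", "move"]
P = {("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 1.0, ("s1", "move"): 0.0}
pi = {s: {a: 0.5 for a in A} for s in S}   # uniform stochastic policy
gamma = 0.9

V = {s: 0.0 for s in S}
for _ in range(1000):                      # repeat the backup to convergence
    V = {s: sum(pi[s][a] * (R[(s, a)] +
                gamma * sum(p * V[sn] for sn, p in P[(s, a)].items()))
                for a in A)
         for s in S}
print(V)  # by symmetry, V(s0) = V(s1) = 0.5 / (1 - 0.9) = 5
```

Since the backup is a γ-contraction, the iterates converge to the unique fixed point of the Bellman expectation equation.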
Optimal Value Functions

Optimal Bellman equation (Bellman, 1957):
  V*(s) = max_a Σ_{s'∈S} P(s'|s,a) (R(s,a,s') + γ V*(s'))
  Q*(s,a) = Σ_{s'∈S} P(s'|s,a) (R(s,a,s') + γ max_{a'} Q*(s',a'))
For each MDP there is at least one deterministic optimal policy.
All optimal policies share the same value function V*.
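The optimal Bellman equation suggests value iteration: apply the max-backup until convergence, then read off a greedy deterministic policy. A sketch on the same illustrative two-state MDP used above (an assumption, not from the slides):

```python
# Value iteration: V(s) <- max_a sum_s' P(s'|s,a) (R(s,a,s') + gamma V(s'))
S = ["s0", "s1"]
A = ["stay", "move"]
P = {("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 1.0, ("s1", "move"): 0.0}
gamma = 0.9

V = {s: 0.0 for s in S}
for _ in range(1000):
    V = {s: max(R[(s, a)] + gamma * sum(p * V[sn]
                for sn, p in P[(s, a)].items())
                for a in A)
         for s in S}

# Greedy (deterministic) optimal policy extracted from V*.
policy = {s: max(A, key=lambda a: R[(s, a)] +
                 gamma * sum(p * V[sn] for sn, p in P[(s, a)].items()))
          for s in S}
print(V, policy)  # V* = 10 in both states; move from s0, stay in s1
```

The greedy extraction at the end illustrates the slide's claim: a deterministic optimal policy always exists, and it can be read directly from V*.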
Solving an MDP

Policy search:
  brute-force search is infeasible (|A|^|S| deterministic policies)
  policy gradient and stochastic optimization approaches
Dynamic Programming (DP):
  Value Iteration
  Policy Iteration
Linear Programming (LP):
  LP worst-case convergence guarantees are better than those of DP methods
  LP methods become impractical at a much smaller number of states than DP methods do
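Of the DP methods listed, Policy Iteration alternates evaluation and greedy improvement until the policy stops changing. A sketch on the same illustrative two-state MDP (the MDP and the iteration counts are assumptions):

```python
# Policy iteration: evaluate the current policy, improve greedily, repeat
# until the policy is stable (and therefore optimal).
S = ["s0", "s1"]
A = ["stay", "move"]
P = {("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 1.0, ("s1", "move"): 0.0}
gamma = 0.9

def q(s, a, V):
    return R[(s, a)] + gamma * sum(p * V[sn] for sn, p in P[(s, a)].items())

policy = {s: "stay" for s in S}           # arbitrary initial policy
while True:
    V = {s: 0.0 for s in S}               # evaluate the current policy
    for _ in range(1000):
        V = {s: q(s, policy[s], V) for s in S}
    new_policy = {s: max(A, key=lambda a: q(s, a, V)) for s in S}  # improve
    if new_policy == policy:              # stable policy -> optimal
        break
    policy = new_policy
print(policy)  # move from s0, stay in s1
```

With a finite number of deterministic policies and a strict improvement at every non-terminal step, the loop is guaranteed to terminate; on this toy MDP it needs only two iterations.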
Dynamic Programming
Dynamic Programming (DP) is a collection of algorithms for solving problems that exhibit optimal substructure
When the transition model and the reward function are known, (offline) DP algorithms can be used to solve MDP problems
- Requires complete knowledge of the model
- Computationally expensive

RL algorithms have been derived from DP algorithms
RL vs DP
RL methods are used when the transition model or the reward function is unknown
Through repeated interactions, the agent estimates the utility of each state
Two approaches:
- Model-based
- Model-free (e.g., Q-learning)
Q-learning (Watkins,’89)
Q-learning is the most popular RL algorithm

    Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α (r_t + γ max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t))

or, equivalently,

    Q_{t+1}(s_t, a_t) = (1 − α) Q_t(s_t, a_t) + α (r_t + γ max_a Q_t(s_{t+1}, a))

- Off-policy TD algorithm
- Simple to implement
- If all state-action pairs are tried infinitely often and the learning rate decreases appropriately, it converges to the optimal solution
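The update rule above can be sketched as tabular Q-learning with ε-greedy exploration; the minimal environment interface (`reset`, `step`, `actions`) is an assumption made for this example, not part of the lecture:

```python
import random
from collections import defaultdict

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning. `env` is assumed to expose:
    reset() -> state, step(a) -> (next_state, reward, done), actions(s) -> list."""
    Q = defaultdict(float)  # Q[(s, a)], defaults to 0.0
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            acts = env.actions(s)
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)
            # off-policy TD update: bootstrap on max_a' Q(s', a')
            if done:
                target = r
            else:
                target = r + gamma * max(Q[(s_next, a_)] for a_ in env.actions(s_next))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```

Because the update bootstraps on the greedy value max_a' Q(s', a') rather than the action actually taken, the algorithm is off-policy: it learns the optimal Q-function while following an exploratory behavior policy.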
Advanced Topics in RL
- High-dimensional problems
- Continuous MDPs
- Partially observable MDPs
- Multi-objective MDPs
- Inverse RL
- Transfer of knowledge
- Exploration vs exploitation
- Multi-agent learning
Exploration vs Exploitation
To accumulate high rewards, an agent needs to exploit actions that have been tried in the past and are known to be effective...
... but it has to explore other actions in order to improve
The dilemma is that both exploration and exploitation are necessary
Many techniques have been studied:
- ε-greedy
- Boltzmann: π(s,a) = e^{Q(s,a)/T} / Σ_{a'∈A} e^{Q(s,a')/T}
- More efficient techniques (Multi-Armed Bandits)
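The two classic strategies above can be sketched in a few lines; this is an illustrative implementation, with the max-subtraction in the softmax being a standard numerical-stability trick rather than part of the formula:

```python
import math
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def boltzmann_probs(q_values, temperature):
    """Softmax policy pi(a) proportional to exp(Q(a)/T).
    T -> 0 approaches greedy selection; T -> infinity approaches uniform."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating for stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

Unlike ε-greedy, which explores uniformly, the Boltzmann policy concentrates exploration on actions with nearly-greedy value estimates, and the temperature T can be annealed over time to shift from exploration to exploitation.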
Outline
1 Introduction to Multi-Agent Reinforcement Learning
  - Reinforcement Learning
  - MARL vs RL
  - MARL vs Game Theory
2 MARL algorithms
  - Best-Response Learning
  - Equilibrium Learners
    - Team Games
    - Zero-sum Games
    - General-sum Games
How can RL be extended to MAS?

RL research is mainly focused on single-agent learning
We need to extend the MDP framework in order to consider other agents with possibly different reward functions
So we will need to resort to game-theoretic concepts
Multi-agent vs Single-agent
When do we have a MAL problem?
- When there are multiple concurrent learners
- More precisely, when some agents' policies depend on other agents' past actions

MAL is much more difficult than single-agent learning. Why?
- Problem dimensions typically grow with the number of agents
- Non-stationarity
- "Optimal" policies can be stochastic
- Learning cannot be separated from teaching
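The non-stationarity issue can be made concrete with a minimal sketch (an illustration, not an algorithm from the lecture): two independent Q-learners in a repeated 2x2 coordination game, where each agent treats the other as part of the environment, so each agent's effective reward for an action drifts as the other adapts. The payoff matrix and parameters are assumptions for the example:

```python
import random

# 2x2 coordination (team) game: both agents get 1 if they match, 0 otherwise.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]

def independent_q_learners(n_steps=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent Q-learners on a repeated matrix game.
    Each agent keeps a Q-value per own action only, so from its point of view
    the reward for an action is non-stationary: it shifts as the other adapts."""
    rng = random.Random(seed)
    Q1, Q2 = [0.0, 0.0], [0.0, 0.0]

    def act(Q):
        # epsilon-greedy over the agent's own two actions
        if rng.random() < epsilon:
            return rng.randrange(2)
        return 0 if Q[0] >= Q[1] else 1

    for _ in range(n_steps):
        a1, a2 = act(Q1), act(Q2)
        r = PAYOFF[a1][a2]  # common reward: a team game
        Q1[a1] += alpha * (r - Q1[a1])
        Q2[a2] += alpha * (r - Q2[a2])
    return Q1, Q2
```

In this simple team game the two learners typically lock onto a matching action pair; in general, though, independent learning carries no such guarantee, which is exactly why the equilibrium-based algorithms in the next part are needed.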
![Page 113: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/113.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Multi-agent vs Single-agent
When do we have a MAL problem?When there are multiple concurrent learnersActually, when some agent’s policies depend on otheragents’ past actions
MAL is much more difficult than SAL. Why?Problem dimensions typically grow with the number ofagentsNon-stationarity“Optimal” policies can be stochasticLearning cannot be separated from teaching
![Page 114: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/114.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Multi-agent vs Single-agent
When do we have a MAL problem?When there are multiple concurrent learnersActually, when some agent’s policies depend on otheragents’ past actions
MAL is much more difficult than SAL. Why?Problem dimensions typically grow with the number ofagentsNon-stationarity“Optimal” policies can be stochasticLearning cannot be separated from teaching
![Page 115: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/115.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Multi-agent vs Single-agent
When do we have a MAL problem?When there are multiple concurrent learnersActually, when some agent’s policies depend on otheragents’ past actions
MAL is much more difficult than SAL. Why?Problem dimensions typically grow with the number ofagentsNon-stationarity“Optimal” policies can be stochasticLearning cannot be separated from teaching
![Page 116: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/116.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Multi-agent vs Single-agent
When do we have a MAL problem?When there are multiple concurrent learnersActually, when some agent’s policies depend on otheragents’ past actions
MAL is much more difficult than SAL. Why?Problem dimensions typically grow with the number ofagentsNon-stationarity“Optimal” policies can be stochasticLearning cannot be separated from teaching
![Page 117: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/117.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Multi-agent vs Single-agent
When do we have a MAL problem?When there are multiple concurrent learnersActually, when some agent’s policies depend on otheragents’ past actions
MAL is much more difficult than SAL. Why?Problem dimensions typically grow with the number ofagentsNon-stationarity“Optimal” policies can be stochasticLearning cannot be separated from teaching
Multi-agent vs Single-agent

What is the goal? What do the agents have to learn?
It depends on the learning strategy adopted by the other agents:
- Best response
- Equilibrium

No learning procedure is optimal against all possible opponent behaviors:
- Self-play
- Targeted optimality

Desirable properties for learning strategies:
- Safety
- Rationality
- Universal consistency / no-regret
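As a concrete illustration of the no-regret property, here is a minimal sketch of regret matching in self-play on rock-paper-scissors — a standard no-regret procedure, chosen for illustration rather than taken from the slides. Each player plays actions with probability proportional to positive cumulative regret; the time-averaged strategies approach the mixed equilibrium (1/3, 1/3, 1/3).

```python
import random

# Payoff of player 0 in rock-paper-scissors: U[my_action][opp_action].
# The game is zero-sum, so player 1's payoff is the negation.
U = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def regret_matching_policy(regrets):
    """Play each action with probability proportional to its positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in pos]

def sample(probs):
    r, acc = random.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a
    return len(probs) - 1

random.seed(0)
regrets = [[0.0] * 3 for _ in range(2)]   # cumulative regrets per player
avg = [[0.0] * 3 for _ in range(2)]       # cumulative strategy weights
T = 20000
for _ in range(T):
    pol = [regret_matching_policy(regrets[i]) for i in range(2)]
    acts = [sample(pol[0]), sample(pol[1])]
    for i in range(2):
        opp = acts[1 - i]
        played = U[acts[i]][opp] if i == 0 else -U[opp][acts[i]]
        for a in range(3):
            ua = U[a][opp] if i == 0 else -U[opp][a]
            regrets[i][a] += ua - played     # regret for not having played a
            avg[i][a] += pol[i][a]

avg_strategy = [w / T for w in avg[0]]
print(avg_strategy)  # each entry close to 1/3
```

The guarantee behind this procedure is exactly universal consistency: average regret vanishes against any opponent behavior, and in two-player zero-sum self-play the average strategies converge to minimax play.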
Outline

1 Introduction to Multi-Agent Reinforcement Learning
- Reinforcement Learning
- MARL vs RL
- MARL vs Game Theory

2 MARL algorithms
- Best-Response Learning
- Equilibrium Learners
  - Team Games
  - Zero-sum Games
  - General-sum Games
Matrix Games

A matrix (or strategic) game is a tuple ⟨n, A, R⟩:
- n: number of players
- A: joint action space, where Ai is the set of actions of player i
- R: vector of reward functions, where Ri is the reward function of player i

Matrix games are one-shot games, but learning requires repeated interactions:
- Repeated games
- Stochastic games
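A two-player matrix game can be sketched directly as a payoff table indexed by joint actions. The sketch below uses the prisoner's dilemma with standard textbook payoffs (an assumed example, not given on the slides) and checks which joint actions are mutual best responses, i.e., pure Nash equilibria.

```python
from itertools import product

# A two-player matrix game <n, A, R>: n = 2, A = ACTIONS x ACTIONS,
# R[(a0, a1)] = (reward of player 0, reward of player 1).
# Stage game: the prisoner's dilemma (C = cooperate, D = defect).
ACTIONS = ["C", "D"]
R = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}

def best_response(player, opponent_action):
    """Action maximizing `player`'s reward against a fixed opponent action."""
    def reward(a):
        joint = (a, opponent_action) if player == 0 else (opponent_action, a)
        return R[joint][player]
    return max(ACTIONS, key=reward)

def pure_nash_equilibria():
    """Joint actions where each action is a best response to the other."""
    return [
        (a0, a1) for a0, a1 in product(ACTIONS, ACTIONS)
        if best_response(0, a1) == a0 and best_response(1, a0) == a1
    ]

print(pure_nash_equilibria())  # [('D', 'D')]: mutual defection is the unique pure NE
```

Note that this is a one-shot analysis: the equilibrium ignores any history, which is exactly why repeated and stochastic games are needed to study learning.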
A Special Case: Repeated Games

In repeated games, the same one-shot game (called the stage game) is played repeatedly.
- E.g., the iterated prisoner's dilemma
- Infinitely vs finitely repeated games
- A repeated game is really an extensive-form game

Subgame-perfect (SP) equilibria:
- One SP equilibrium is to repeatedly play some Nash equilibrium of the stage game (a stationary strategy)
- Are other equilibria possible?
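The extensive-form nature of repeated games means strategies can condition on history. A minimal simulation of the iterated prisoner's dilemma, using standard payoffs and the well-known tit-for-tat rule as illustrative assumptions, shows how a history-dependent strategy differs from stationary play:

```python
# Iterated prisoner's dilemma with history-dependent strategies.
# R[(a0, a1)] = (reward of player 0, reward of player 1).
R = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
     ("D", "C"): (0, -3), ("D", "D"): (-2, -2)}

def tit_for_tat(my_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]

def always_defect(my_history, opp_history):
    """The stationary strategy that repeats the stage-game Nash action."""
    return "D"

def play(strategy0, strategy1, rounds):
    h0, h1, totals = [], [], [0, 0]
    for _ in range(rounds):
        a0, a1 = strategy0(h0, h1), strategy1(h1, h0)
        r0, r1 = R[(a0, a1)]
        totals[0] += r0
        totals[1] += r1
        h0.append(a0)
        h1.append(a1)
    return totals

print(play(tit_for_tat, tit_for_tat, 10))    # [-10, -10]: mutual cooperation
print(play(tit_for_tat, always_defect, 10))  # [-21, -18]: exploited only once
```

Two tit-for-tat players sustain cooperation and each earn -10 over ten rounds, strictly better than the -20 from repeating the stage-game equilibrium (D, D) — evidence that non-stationary equilibria of the repeated game can exist.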
Folk Theorem

The folk theorem characterizes attainable payoffs, not strategy profiles. Informally:

"In an infinitely repeated game, the set of average rewards attainable in equilibrium is precisely the set of payoff pairs attainable under mixed strategies in the single stage game, with the constraint on the mixed strategies that each player's payoff is at least the amount he would receive if the other players adopted minimax strategies against him."
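The constraint in the statement is each player's minimax (security) level — the threat point enforced by punishment. A small sketch, again using the standard prisoner's dilemma payoffs as an assumed example and restricting the opponent to pure punishment actions for simplicity, computes these levels:

```python
# The folk theorem's threat point: each player's minimax level in the stage game.
ACTIONS = ["C", "D"]
R = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
     ("D", "C"): (0, -3), ("D", "D"): (-2, -2)}

def reward(player, a0, a1):
    return R[(a0, a1)][player]

def minimax_value(player):
    """Worst payoff the opponent can force, given that the player
    best-responds (pure punishment strategies only, for simplicity)."""
    forced = []
    for punish in ACTIONS:  # opponent commits to a punishment action
        if player == 0:
            best = max(reward(0, a, punish) for a in ACTIONS)
        else:
            best = max(reward(1, punish, a) for a in ACTIONS)
        forced.append(best)
    return min(forced)

print(minimax_value(0), minimax_value(1))  # -2 -2
```

Each player can be held to -2 (the opponent defects forever, and defecting back is the best response). Mutual cooperation yields (-1, -1), which dominates this threat point, so by the folk theorem the cooperative average payoff is attainable in equilibrium of the infinitely repeated game.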
MDP + Matrix = Stochastic Games (SGs)

A stochastic (or Markov) game is a tuple ⟨n, S, A, P, R⟩:
- n: number of players
- S: set of states
- A: joint action space, A1 × · · · × An
- P: state transition model
- R: vector of reward functions, one for each agent

An SG extends the MDP to multiple agents. SGs with one state are repeated games; SGs with one agent are MDPs.
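The tuple ⟨n, S, A, P, R⟩ maps directly onto a simulator. The sketch below is a minimal, assumed implementation — the tiny two-state game it builds is invented for illustration, not taken from the slides:

```python
import random
from dataclasses import dataclass

# A minimal stochastic game <n, S, A, P, R> as a simulator.
@dataclass
class StochasticGame:
    n: int        # number of players
    S: list       # states
    A: list       # A[i] = action set of player i
    P: dict       # P[(s, joint_action)] = {next_state: probability}
    R: dict       # R[(s, joint_action)] = reward vector, one entry per player
    state: object = None

    def step(self, joint_action):
        """Return the reward vector and sample the next state from P."""
        key = (self.state, joint_action)
        rewards = self.R[key]
        dist = self.P[key]
        self.state = random.choices(list(dist), list(dist.values()))[0]
        return rewards, self.state

states = ["s0", "s1"]
joint = [(a0, a1) for a0 in "ab" for a1 in "ab"]
game = StochasticGame(
    n=2, S=states, A=[["a", "b"], ["a", "b"]],
    # Coordinating on ("a", "a") tends to move the game to s1, where both
    # agents are rewarded; any other joint action resets to s0.
    P={(s, ja): ({"s1": 0.9, "s0": 0.1} if ja == ("a", "a") else {"s0": 1.0})
       for s in states for ja in joint},
    R={(s, ja): ((1.0, 1.0) if s == "s1" else (0.0, 0.0))
       for s in states for ja in joint},
    state="s0",
)

random.seed(0)
total = [0.0, 0.0]
for _ in range(100):
    rewards, _ = game.step(("a", "a"))
    total = [t + r for t, r in zip(total, rewards)]
print(total)  # cumulative rewards for both players over 100 steps
```

Restricting S to a single state recovers a repeated game (P becomes trivial), while setting n = 1 recovers an ordinary MDP.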
![Page 153: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/153.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
MDP + Matrix = Stochastic Games (SGs)
A stochastic (or Markov) game is a tuple: 〈n,S,A,P,R〉n: number of playersS: set of statesA: joint action space, A1 × · · · × AnP: state transition modelR: vector of reward functions, one for each agent
SG extends MDP to multiple agentsSGs with one state are called repeated games
![Page 154: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/154.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
MDP + Matrix = Stochastic Games (SGs)
A stochastic (or Markov) game is a tuple: 〈n,S,A,P,R〉n: number of playersS: set of statesA: joint action space, A1 × · · · × AnP: state transition modelR: vector of reward functions, one for each agent
SG extends MDP to multiple agentsSGs with one state are called repeated games
Strategies in SGs

Let ht = (s0, a0, s1, a1, . . . , st−1, at−1, st) denote a history of t stages of a stochastic game.
The space of possible strategies is huge, but there are interesting restrictions:
  Behavioral strategy: returns the probability of playing an action given a history ht
  Markov strategy: a behavioral strategy in which the distribution over actions depends only on the current state
  Stationary strategy: a time-independent Markov strategy
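The three restrictions can be contrasted with minimal (deterministic, for simplicity) examples. The state names and decision rules below are our own illustrations, not from the slides:

```python
# Each strategy maps a history h_t = (s0, a0, ..., s_t) to an action.

def behavioral_strategy(history):
    """May depend on the whole history: defect forever after any past 'D'."""
    return "D" if "D" in history else "C"

def markov_strategy(history, t):
    """Depends only on the current state s_t and the stage index t."""
    s_t = history[-1]
    return "C" if (s_t == "safe" and t % 2 == 0) else "D"

def stationary_strategy(history):
    """Depends only on the current state, independently of time."""
    policy = {"safe": "C", "risky": "D"}   # one fixed rule per state
    return policy[history[-1]]
```

Note the strict nesting: every stationary strategy is Markov, and every Markov strategy is behavioral, but not conversely.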
Equilibria in SGs

Markov-perfect equilibrium: a profile of Markov strategies that yields a Nash equilibrium in every proper subgame.
Every n-player, general-sum, discounted-reward stochastic game has a Markov-perfect equilibrium.
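For a single-state SG (a matrix game), the underlying Nash condition — no player gains by a unilateral deviation — can be checked directly. A minimal sketch for pure-strategy profiles (the example payoffs are ours):

```python
def is_nash(payoffs, profile, action_sets):
    """Check whether `profile` (one pure action per player) is a Nash
    equilibrium of the matrix game `payoffs[joint_action] -> rewards`."""
    n = len(profile)
    for i in range(n):
        current = payoffs[tuple(profile)][i]
        for a in action_sets[i]:
            deviation = list(profile)
            deviation[i] = a
            if payoffs[tuple(deviation)][i] > current:
                return False   # player i has a profitable deviation
    return True

# Prisoner's Dilemma: (D, D) is the unique pure Nash equilibrium.
pd = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
      ("D", "C"): (0, -3),  ("D", "D"): (-2, -2)}
acts = [["C", "D"], ["C", "D"]]
assert is_nash(pd, ("D", "D"), acts)       # mutual defection is Nash
assert not is_nash(pd, ("C", "C"), acts)   # cooperation is not
```

A Markov-perfect equilibrium generalizes this check to every state of the SG, using continuation values in place of one-shot payoffs.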
Stochastic Games: Example

Reward:
  Goal reached: +100
  Collision: −1
  Otherwise: 0

Some solutions: (shown in figures omitted from the transcript)
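The reward scheme above can be written as a joint reward function over the agents' positions. The grid coordinates and the agents-on-a-grid setting are our assumption, since the slides' figure is not reproduced in the transcript:

```python
def joint_reward(positions, goals):
    """Per-agent reward for one step: +100 on reaching one's own goal,
    -1 if two agents collide (occupy the same cell), 0 otherwise."""
    rewards = []
    collided = len(set(positions)) < len(positions)
    for pos, goal in zip(positions, goals):
        if pos == goal:
            rewards.append(100)
        elif collided:
            rewards.append(-1)
        else:
            rewards.append(0)
    return rewards

# Two agents on a grid (coordinates are illustrative):
print(joint_reward([(0, 0), (0, 0)], [(2, 2), (0, 2)]))  # collision -> [-1, -1]
print(joint_reward([(2, 2), (0, 2)], [(2, 2), (0, 2)]))  # both at goal -> [100, 100]
```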
Summary: Problems

                          States
  Agents        Single          Multiple
  Single        Optimization    MDP
  Multiple      Matrix Game     Stochastic Game
Summary: Learning

                          States
  Agents        Single                        Multiple
  Single        Multi-Armed Bandit            Reinforcement Learning
  Multiple      Learning in Repeated Games    Multi-Agent Learning
Learning vs Game Theory

Game theory predicts which strategies rational players will play. Unfortunately, in multi-agent learning, many agents are not able to behave rationally:
  the problem is unknown
  real-time constraints
  humans

In some problems, a non-equilibrium strategy is appropriate if one expects the others to play non-equilibrium strategies.
Why Learning?

If the game is known, the agent wants to learn the strategies employed by the other agents.
If the game is unknown, the agent also wants to learn the structure of the game:
  unknown payoffs
  unknown transition probabilities

Observability: do the agents see each other's actions, and/or each other's payoffs?
Desired Properties in MAL

Rationality: play a best response against stationary opponents
Convergence: play a Nash equilibrium in self-play
Safety: do no worse than the minimax strategy
Targeted optimality: approximate a best response against memory-bounded opponents
Cooperate and compromise: an agent must offer and accept compromises

Many algorithms have been proposed that exhibit some of these properties (WoLF, GIGA-WoLF, AWESOME, M-Qubed, . . . )
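The safety property refers to the game's minimax (security) value. For intuition, here is a sketch computing the pure-strategy maximin value of a zero-sum matrix game — a lower bound on the true, mixed-strategy security level; the example payoff matrix is ours:

```python
def pure_maximin(M):
    """Row player's pure-strategy security level: the best payoff it can
    guarantee when the column player responds as harshly as possible."""
    return max(min(row) for row in M)

# Rock-paper-scissors payoffs for the row player (win +1, lose -1, tie 0):
rps = [[0, -1,  1],
       [1,  0, -1],
       [-1, 1,  0]]
print(pure_maximin(rps))  # -> -1: every pure strategy can be exploited
```

The mixed-strategy value of rock-paper-scissors is 0 (uniform play), which is why safety is stated with respect to the minimax *strategy* — generally a mixed one, computable by linear programming — rather than any pure action.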
![Page 192: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/192.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Desired Properties in MAL
Rationality: play best-response against stationaryopponentsConvergence: play a Nash equilibrium in self-playSafety: no worse than minimax strategyTargeted-optimality: approximate best-responseagainst memory-bounded opponentsCooperate and compromise: an agent must offer andaccept compromisesThere are a lot of algorithms that have been proposedshowing some of these properties (WoLF, GIGA-WoLF,AWESOME, M-Qbed, . . . )
![Page 193: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/193.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Desired Properties in MAL
Rationality: play best-response against stationaryopponentsConvergence: play a Nash equilibrium in self-playSafety: no worse than minimax strategyTargeted-optimality: approximate best-responseagainst memory-bounded opponentsCooperate and compromise: an agent must offer andaccept compromisesThere are a lot of algorithms that have been proposedshowing some of these properties (WoLF, GIGA-WoLF,AWESOME, M-Qbed, . . . )
![Page 194: Multi-Agent Reinforcement Learning - Game Theory Polimi · 2019-12-03 · Introduction to Multi-Agent Reinforcement Learning Reinforcement Learning MARL vs RL MARL vs Game Theory](https://reader033.vdocument.in/reader033/viewer/2022050515/5f9eec74ef67ee64900a6e45/html5/thumbnails/194.jpg)
MARL
MarcelloRestelli
Introduction toMulti-AgentReinforcementLearningReinforcementLearning
MARL vs RL
MARL vs GameTheory
MARLalgorithmsBest-ResponseLearning
Equilibrium Learners
Team Games
Zero-sum Games
General-sumGames
Desired Properties in MAL
Rationality: play best-response against stationaryopponentsConvergence: play a Nash equilibrium in self-playSafety: no worse than minimax strategyTargeted-optimality: approximate best-responseagainst memory-bounded opponentsCooperate and compromise: an agent must offer andaccept compromisesThere are a lot of algorithms that have been proposedshowing some of these properties (WoLF, GIGA-WoLF,AWESOME, M-Qbed, . . . )
Taxonomy of MARL Algorithms: Task Type

- Fully cooperative
  - Static: JAL, FMQ
  - Dynamic: Team-Q, Distributed-Q, OAL
- Fully competitive
  - Minimax-Q
- Mixed
  - Static: Fictitious Play, MetaStrategy, IGA, WoLF-IGA, GIGA, GIGA-WoLF, AWESOME, Hyper-Q
  - Dynamic: Single-agent RL, Nash-Q, CE-Q, Asymmetric-Q, NSCP, WoLF-PHC, PD-WoLF, EXORL
Taxonomy of MARL Algorithms: Field of Origin

[Slide diagram: MARL algorithms arranged between three fields of origin — Temporal-Difference RL, Game Theory, and Direct Policy Search — including single-agent RL, JAL, Distributed-Q, EXORL, Hyper-Q, FMQ, CE-Q, Nash-Q, Team-Q, minimax-Q, NSCP, Asymmetric-Q, OAL, Fictitious Play, AWESOME, MetaStrategy, WoLF-PHC, PD-WoLF, IGA, WoLF-IGA, GIGA, and GIGA-WoLF, each placed according to the fields it draws on.]
Equilibrium or not?

Why focus on equilibria?
- An equilibrium identifies conditions under which learning can or should stop
- It is easier to play an equilibrium than to keep computing

Why not focus on equilibria?
- A Nash equilibrium strategy has no prescriptive force
- There may be multiple equilibria
- Using an oracle to uniquely select an equilibrium is "cheating"
- The opponent may not wish to play an equilibrium
- Computing a Nash equilibrium of a large game can be intractable
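One side of the intractability point is worth making concrete: *verifying* a candidate Nash equilibrium is cheap (check that no pure-action deviation helps either player), while *finding* one in a large game is hard. A minimal sketch for 2×2 bimatrix games, using matching pennies as the illustrative example:

```python
# Sketch: checking whether a mixed profile is a Nash equilibrium of a 2x2
# bimatrix game. No pure deviation may beat the current expected payoff
# (it suffices to check pure deviations, since some best response is pure).

def expected_payoff(payoff, p, q):
    # p, q: mixed strategies (probability lists) for row and column players
    return sum(p[i] * q[j] * payoff[i][j]
               for i in range(len(p)) for j in range(len(q)))

def is_nash(A, B, p, q, eps=1e-9):
    """A: row player's payoffs, B: column player's payoffs (2x2 only)."""
    v_row = expected_payoff(A, p, q)
    v_col = expected_payoff(B, p, q)
    pures = ([1.0, 0.0], [0.0, 1.0])
    row_ok = all(expected_payoff(A, e, q) <= v_row + eps for e in pures)
    col_ok = all(expected_payoff(B, p, e) <= v_col + eps for e in pures)
    return row_ok and col_ok

# Matching pennies: uniform mixing is the unique Nash equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(is_nash(A, B, [0.5, 0.5], [0.5, 0.5]))  # True
print(is_nash(A, B, [1.0, 0.0], [0.5, 0.5]))  # False
```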
Independent Learners

Typical conditions for Independent Learning (IL):
- An agent is unaware of the existence of other agents
- It cannot identify other agents' actions, or has no reason to believe that other agents are acting strategically

Independent learners try to learn best responses.

Advantages:
- Straightforward application of single-agent techniques
- Scales with the number of agents

Disadvantages:
- Convergence guarantees from the single-agent setting are lost
- No explicit means for coordination

[Slide figure: traffic]
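The loss of convergence guarantees can be sketched in a few lines: two independent Q-learners in a repeated 2×2 team game, each keeping a Q-table over its own actions only and treating the other agent as part of a non-stationary environment. The reward matrix and hyperparameters below are illustrative, not from the slides:

```python
import random

# Sketch: two independent Q-learners in a repeated 2x2 team game.
# Each agent's Q-table ranges over its OWN actions only; the other agent
# is absorbed into the environment, which therefore looks non-stationary.

REWARD = [[10, 0],   # shared reward for joint action (a0, a1)
          [0, 10]]

def eps_greedy(table, eps):
    if random.random() < eps:
        return random.randrange(2)
    return max((0, 1), key=lambda a: table[a])

def run(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]            # one Q-table per agent
    for _ in range(episodes):
        a0, a1 = eps_greedy(q[0], eps), eps_greedy(q[1], eps)
        r = REWARD[a0][a1]                   # both agents observe the same reward
        q[0][a0] += alpha * (r - q[0][a0])   # standard Q-update, stateless game
        q[1][a1] += alpha * (r - q[1][a1])
    return q

q = run()
# Greedy joint play after learning; coordination on (0,0) or (1,1) is not guaranteed.
print(max((0, 1), key=lambda a: q[0][a]), max((0, 1), key=lambda a: q[1][a]))
```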
Independent Reinforcement Learners

- Q-learning [Watkins '92]
- Learning Automata [Narendra '74, Wheeler '86]
- WoLF-PHC [Bowling '01]
- FAQ-learning [Kaisers '10]
- RESQ-learning [Hennes '10]
Joint Action Learners

A joint action learner (JAL) is an agent that learns Q-values for joint actions. To estimate the opponents' actions, empirical distributions can be used (as in fictitious play):

$$f_i(a_{-i}) = \prod_{j \neq i} \phi_j(a_j)$$

The expected value of an individual action is the sum of the joint Q-values, weighted by the estimated probability of the associated complementary joint action profiles:

$$EV(a_i) = \sum_{a_{-i} \in A_{-i}} Q(a_i \cup a_{-i}) \, f_i(a_{-i})$$
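The expected-value computation above can be sketched directly: each opponent model φ_j is an empirical action frequency, f_i is their product, and EV(a_i) averages the joint-action Q-values under f_i. The action names and Q-values below are illustrative, not from the slides:

```python
from itertools import product

# Sketch of the JAL expected-value computation: opponent models phi_j are
# empirical action frequencies, and EV(a_i) weights each joint Q-value by
# the product of the opponents' estimated action probabilities.

def opponent_model(counts):
    """Empirical distribution phi_j from observed action counts."""
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def prod_prob(models, a_minus_i):
    """f_i(a_-i): product of each opponent's probability of its component."""
    p = 1.0
    for m, a in zip(models, a_minus_i):
        p *= m[a]
    return p

def expected_values(Q, models, my_actions):
    """Q maps a full joint action (my action, then one per opponent) to a
    learned value; models is one empirical distribution per opponent."""
    opp_actions = [list(m) for m in models]
    return {
        ai: sum(Q[(ai,) + a_minus_i] * prod_prob(models, a_minus_i)
                for a_minus_i in product(*opp_actions))
        for ai in my_actions
    }

# One opponent, seen playing L three times and R once:
phi = opponent_model({"L": 3, "R": 1})            # {'L': 0.75, 'R': 0.25}
Q = {("up", "L"): 4.0, ("up", "R"): 0.0,
     ("down", "L"): 1.0, ("down", "R"): 5.0}
print(expected_values(Q, [phi], ["up", "down"]))  # up: 3.0, down: 2.0
```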
Team Games

Team games are fully cooperative games in which all the agents share the same reward function.
If learning is centralized, it is actually single-agent learning with multiple actuators.
Multi-agent learning arises in distributed problems.
Coordination Equilibria

In a coordination equilibrium all the agents achieve their maximum possible payoff.
If π_1, π_2, ..., π_n are in coordination equilibrium, then

    Σ_{a_1,...,a_n} π_1(s, a_1) · ... · π_n(s, a_n) · Q_i(s, a_1, ..., a_n) = max_{a_1,...,a_n} Q_i(s, a_1, ..., a_n)

for all 1 ≤ i ≤ n and all states s.
If a game has a coordination equilibrium, then it has a deterministic coordination equilibrium.
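In a team game the last claim is easy to check: since every agent shares the payoff, any joint action maximizing the common payoff gives every agent its maximum, and is therefore a deterministic coordination equilibrium. A sketch with a hypothetical common-payoff matrix:

```python
import numpy as np

# Illustrative 2-agent common payoff (hypothetical values).
payoff = np.array([[10, 0],
                   [0, 10]])

# The argmax joint action, played deterministically, achieves the
# maximum payoff for both agents: a deterministic coordination equilibrium.
r, c = np.unravel_index(payoff.argmax(), payoff.shape)
print((r, c), payoff[r, c])  # (0, 0) 10
```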
Independent vs Joint Action Learners
Example: Coordination game

        a0   a1
  b0    10    0
  b1     0   10

The agents use Boltzmann exploration.
Both are able to converge to one of the optimal strategies.
JALs can distinguish the Q-values of different joint actions.
The difference in performance is small, due to the exploration strategy.
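The Boltzmann (softmax) exploration used by both learner types can be sketched as follows; the temperature is typically annealed from high (near-uniform exploration) to low (near-greedy exploitation). The values below are illustrative:

```python
import numpy as np

def boltzmann_probs(q_values, temperature):
    # Softmax over Q-values: high temperature -> near-uniform exploration,
    # low temperature -> near-greedy exploitation.
    prefs = np.asarray(q_values) / temperature
    prefs -= prefs.max()            # shift for numerical stability
    expd = np.exp(prefs)
    return expd / expd.sum()

print(boltzmann_probs([10.0, 0.0], temperature=100.0))  # near-uniform
print(boltzmann_probs([10.0, 0.0], temperature=0.5))    # near-greedy
```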
Independent vs Joint Action Learners
Example: Penalty game

        a0   a1   a2
  b0    10    0    k
  b1     0    2    0
  b2     k    0   10

Considering k < 0, the game has 3 pure equilibria.
Suppose the penalty is k = -100.
Both ILs and JALs will converge to the self-confirming equilibrium <a1, b1>.
The magnitude of the penalty k influences the probability of convergence to the optimal joint strategy.
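The three pure equilibria can be verified mechanically: in a common-payoff game, a joint action is a pure equilibrium iff neither agent can improve by deviating unilaterally, i.e. it is a maximum of both its row and its column. A sketch (the helper name `pure_equilibria` is ours):

```python
import numpy as np

def pure_equilibria(payoff):
    # A cell is a pure equilibrium of a common-payoff game iff it is
    # maximal in its row (agent 1's deviations) and column (agent 2's).
    eq = []
    n_rows, n_cols = payoff.shape
    for r in range(n_rows):
        for c in range(n_cols):
            if (payoff[r, c] >= payoff[r, :].max()
                    and payoff[r, c] >= payoff[:, c].max()):
                eq.append((r, c))
    return eq

k = -100
payoff = np.array([[10, 0, k],
                   [0, 2, 0],
                   [k, 0, 10]])
print(pure_equilibria(payoff))  # [(0, 0), (1, 1), (2, 2)]
```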
Independent vs Joint Action Learners
Example: Climbing game

        a0   a1   a2
  b0    11  -30    0
  b1   -30    7    6
  b2     0    0    5

Agents start playing <a2, b2>.
Agents converge to <a1, b1>.
Convergence to pure equilibria is almost sure.
Independent vs Joint Action Learners
Sufficient conditions

The learning rate α decreases over time, such that Σ_t α_t is divergent and Σ_t α_t² is convergent.
Each agent samples each of its actions infinitely often.
The probability P_t^i(a) of agent i choosing action a is nonzero.
Agents eventually become full exploiters with probability one:

    lim_{t→∞} P_t^i(X_t) = 0

where X_t is a random variable denoting the event that (f_i, g_i) prescribe a sub-optimal action.

Let E_t be a random variable denoting the probability of a (deterministic) equilibrium strategy profile being played at time t. Then, for both ILs and JALs, for any δ, ε > 0, there is a T(δ, ε) such that

    Pr(|E_t - 1| < ε) > 1 - δ

for all t > T(δ, ε).
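The first condition is the usual Robbins-Monro requirement on the step size; for example, α_t = 1/t satisfies it (Σ 1/t diverges like ln T, while Σ 1/t² converges to π²/6). A quick numerical check of the partial sums:

```python
# alpha_t = 1/t: the first partial sum keeps growing (~ ln T),
# the second approaches pi^2/6 ≈ 1.644934.
def partial_sums(T):
    s1 = sum(1.0 / t for t in range(1, T + 1))
    s2 = sum(1.0 / t ** 2 for t in range(1, T + 1))
    return s1, s2

for T in (100, 10_000, 1_000_000):
    s1, s2 = partial_sums(T)
    print(T, round(s1, 3), round(s2, 6))
```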
Independent vs Joint Action Learners
Myopic Heuristics

Neither ILs nor JALs ensure convergence to an optimal equilibrium.
There is no hope with ILs, but JALs with a different exploration strategy...

Myopic heuristics:
Optimistic Boltzmann (OB): for agent i and action a_i ∈ A_i, let MaxQ(a_i) = max_{Π_-i} Q(Π_-i, a_i). Choose actions with Boltzmann exploration (another exploration strategy would suffice), using MaxQ(a_i) as the value of a_i.
Weighted OB (WOB): explore with Boltzmann, using the factors MaxQ(a_i) · Pr_i(optimal match Π_-i for a_i).
Combined: let C(a_i) = ρ MaxQ(a_i) + (1 - ρ) EV(a_i), for some 0 ≤ ρ ≤ 1. Choose actions with Boltzmann exploration, using C(a_i) as the value of a_i.
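On the penalty game, the OB and Combined values for agent 1 follow directly from its joint Q-table; a sketch where the uniform opponent model and ρ = 0.9 are our illustrative assumptions:

```python
import numpy as np

# Penalty game with k = -100: agent 1's joint Q-values, Q[a_i, a_-i].
Q = np.array([[10.0, 0.0, -100.0],
              [0.0, 2.0, 0.0],
              [-100.0, 0.0, 10.0]])
phi = np.array([1 / 3, 1 / 3, 1 / 3])  # assumed uniform opponent model

max_q = Q.max(axis=1)   # optimistic value MaxQ(a_i)
ev = Q @ phi            # expected value EV(a_i)
rho = 0.9               # illustrative weight
combined = rho * max_q + (1 - rho) * ev

print(max_q)     # [ 10.   2.  10.]
print(combined)  # optimism pulls a0/a2 above the safe action a1
```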
Independent vs Joint Action Learners
Myopic heuristics: Penalty game

[Figure: empirical results of the myopic heuristics on the penalty game; plot not recoverable from the transcript]
Distributed Q-learning [Lauer and Riedmiller, '01]

Applies to deterministic cooperative SGs with non-negative reward functions.
Update rule:

    Q_0(s, a) = 0
    Q_{k+1}(s, a) = max(Q_k(s, a), R(s, a) + γ max_{a'∈A} Q_k(s', a'))

This optimistic algorithm learns distributed Q-tables, provided that all state-action pairs occur infinitely often.
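A minimal sketch of the optimistic update for a single agent's table (state and action names are illustrative): estimates start at zero and never decrease, which is sound only because rewards are non-negative and transitions deterministic.

```python
gamma = 0.9
actions = [0, 1]

def distributed_update(Q, s, a, r, s_next):
    # Optimistic max-based update: Q maps (state, own action) -> value
    # and an estimate is only ever raised, never lowered.
    target = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = max(Q.get((s, a), 0.0), target)

Q = {}
distributed_update(Q, 's0', 0, 10.0, 'terminal')
print(Q[('s0', 0)])  # 10.0  (terminal successor has value 0)
distributed_update(Q, 's0', 0, 5.0, 'terminal')  # lower return observed
print(Q[('s0', 0)])  # still 10.0: the estimate never decreases
```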
Distributed Q-learning
Example: Climbing game

        a0   a1   a2
  b0    11  -30    0
  b1   -30    7    6
  b2     0    0    5

Distributed Q-tables (Q2 is over agent 2's actions b0, b1, b2):

              a0/b0   a1/b1   a2/b2
  Q1(s0, ·)     11       7       6
  Q2(s0, ·)     11       7       5
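In this deterministic team-game setting, the limit of each agent's table is the optimistic projection of the joint payoff, i.e. the maximum over the other agent's actions, which reproduces the tables above:

```python
import numpy as np

# Climbing game payoffs: rows = agent 2's actions b, cols = agent 1's actions a.
payoff = np.array([[11, -30, 0],
                   [-30, 7, 6],
                   [0, 0, 5]])

# Each agent's distributed Q-table converges to the max of the joint
# payoff over the other agent's actions (optimistic projection).
Q1 = payoff.max(axis=0)  # agent 1: max over b for each a
Q2 = payoff.max(axis=1)  # agent 2: max over a for each b
print(Q1)  # [11  7  6]
print(Q2)  # [11  7  5]
```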
Distributed Q-learning
Example: Penalty Game

Payoff matrix:

          a0    a1    a2
    b0    10     0     k
    b1     0     2     0
    b2     k     0    10

Distributed Q-tables:

               a0    a1    a2
    Q1(s0,·)   10     2    10
    Q2(s0,·)   10     2    10

This requires an additional coordination mechanism between the agents: update the current policy only when an improvement to the Q-value occurs.
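Why the extra mechanism is needed can be seen numerically: for a penalty k < 0 the optimistic Q-values tie at both optima, so independently greedy agents risk miscoordinating onto the penalty. A sketch (k = -50 is an illustrative choice):

```python
k = -50  # illustrative penalty value
payoff = [[10, 0,  k],
          [ 0, 2,  0],
          [ k, 0, 10]]  # rows b0..b2 (agent 2), columns a0..a2 (agent 1)

Q1 = [max(payoff[b][a] for b in range(3)) for a in range(3)]
Q2 = [max(payoff[b][a] for a in range(3)) for b in range(3)]
print(Q1, Q2)  # [10, 2, 10] [10, 2, 10] -- greedy ties at both optima

# If agent 1 greedily picks a0 while agent 2 greedily picks b2:
print(payoff[2][0])  # -50 -- miscoordination hits the penalty
```

The Q-tables alone cannot tell (a0, b0) apart from (a2, b2), which is why the agents must additionally commit to a policy and change it only on a Q-value improvement.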
Distributed Q-learning
Stochastic environments

Distributed Q-learning works fine in deterministic cooperative environments, but its extension to stochastic environments is problematic. The main difficulty is that Q-values are affected by two kinds of uncertainty:

the behavior of the other agents
the stochasticity of the environment

Distinguishing these two sources of uncertainty is a key point in multi-agent learning.
Team Q-learning [Littman, '01]

Requires observing the actions of the other agents. Update rule:

    Q1(s, a1, …, an) ← (1 − α) Q1(s, a1, …, an) + α ( r1 + γ max_{a'1, …, a'n} Q1(s', a'1, …, a'n) )

It does not use an opponent model.
In a team game, team Q-learning converges to the optimal Q-function with probability one.
If the limit equilibrium is unique and the agent follows a GLIE policy, it also converges in behavior with probability one.
The main problem is selecting an equilibrium when there are multiple coordination equilibria.
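The update above is ordinary Q-learning applied to a single table over joint actions. A minimal sketch (the environment interface, action sets, and constants are illustrative assumptions):

```python
from collections import defaultdict
from itertools import product

ALPHA, GAMMA = 0.1, 0.9  # illustrative learning rate and discount

def team_q_update(Q, s, joint_a, r, s_next, action_sets):
    """One team Q-learning step. joint_a is a tuple (a1, ..., an);
    the max ranges over all joint actions available in the next state."""
    best_next = max(Q[(s_next, a)] for a in product(*action_sets))
    Q[(s, joint_a)] = (1 - ALPHA) * Q[(s, joint_a)] + ALPHA * (r + GAMMA * best_next)

Q = defaultdict(float)  # unseen entries start at 0
team_q_update(Q, 's0', ('a0', 'b1'), 1.0, 's1', [('a0', 'a1'), ('b0', 'b1')])
print(Q[('s0', ('a0', 'b1'))])  # 0.1 = (1 - 0.1)*0 + 0.1*(1 + 0.9*0)
```

Note the cost of this simplicity: the table grows with the product of all agents' action sets, and every agent must see the full joint action at each step.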
Zero-sum Games

Consider 2 players:

R1(i, j) = M(i, j)
R2(i, j) = −M(i, j)
player 1 is the maximizer
player 2 is the minimizer

Examples: matching pennies, rock-paper-scissors
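In code, the zero-sum convention means one matrix defines both reward functions. A rock-paper-scissors sketch (the win/lose/draw payoffs are the standard convention, assumed here):

```python
# Rock-paper-scissors from player 1's (the maximizer's) point of view.
M = [[ 0, -1,  1],   # rock     vs rock, paper, scissors
     [ 1,  0, -1],   # paper
     [-1,  1,  0]]   # scissors

def R1(i, j): return M[i][j]     # player 1: maximizer
def R2(i, j): return -M[i][j]    # player 2: minimizer

# The defining property: the two rewards always sum to zero.
print(all(R1(i, j) + R2(i, j) == 0 for i in range(3) for j in range(3)))  # True
```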
Minimax-Q [Littman, '94]

In MDPs, a stationary, deterministic, and undominated optimal policy always exists.
In MGs, the performance of a policy depends on the opponent's policy, so policies cannot be evaluated without that context.
Game theory provides a new definition of optimality: an optimal policy performs best in its worst case.
At least one optimal policy exists; it may not be deterministic, because the agent is uncertain of its opponent's move.
Minimax-Q
Learning the Optimal Policy

Q-learning:

    Q(s, a) ← (1 − α) Q(s, a) + α ( r + γ V(s') )
    V(s) = max_a Q(s, a)

Minimax-Q learning:

    Q(s, a, o) ← (1 − α) Q(s, a, o) + α ( r_{s,a,o} + γ V(s') )
    π(s, ·) ← argmax_{π'(s,·)} min_{o'} ∑_{a'} π'(s, a') · Q(s, a', o')
    V(s) ← min_{o'} ∑_{a'} π(s, a') · Q(s, a', o')
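The max–min step over π'(s, ·) is a linear program in general. For intuition, here is a coarse grid search over the mixing probability in one state of matching pennies (illustrative only; a real implementation would use an LP solver):

```python
# Q-values for one state of matching pennies:
# rows = own action a, columns = opponent action o (maximizer's payoffs).
M = [[ 1, -1],
     [-1,  1]]

def worst_case(p):
    """Value of mixing (p, 1-p) over the two actions against the
    worst-case opponent action o'."""
    return min(p * M[0][o] + (1 - p) * M[1][o] for o in (0, 1))

best_p = max((i / 100 for i in range(101)), key=worst_case)
print(best_p, worst_case(best_p))        # 0.5 0.0 -- the optimal policy is mixed
print(worst_case(0.0), worst_case(1.0))  # -1.0 -1.0 -- pure policies are exploitable
```

The uniform mixture guarantees value 0 against any opponent, while every deterministic policy can be held to -1: this is exactly why minimax-Q must optimize over stochastic policies.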
Minimax-Q
Considerations

In a two-player zero-sum multi-agent environment, an agent following minimax Q-learning converges to the optimal Q-function with probability one. Furthermore, if it follows a GLIE policy and the limit equilibrium is unique, it also converges in behavior with probability one.
In zero-sum SGs, even if the limit equilibrium is not unique, it converges to a policy that achieves the optimal value regardless of its opponent (safety).
Minimax Q-learning achieves the largest value possible in the absence of knowledge of the opponent's policy.
Minimax-Q is quite slow to converge compared with Q-learning (but the latter learns only deterministic policies).
Can we extend this approach to general-sum SGs?

Yes and no:
- Nash-Q learning is such an extension
- However, it has much worse computational and theoretical properties
Nash-Q [Hu & Wellman, '98-'03]

$\mathrm{Nash}Q^i_t(s') = \pi^1(s') \cdot \ldots \cdot \pi^n(s') \cdot Q^i_t(s')$

Each agent needs to maintain the Q-functions of all the other agents.
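As a concrete reading of the operator, for n = 2 the product π¹(s')·π²(s')·Qⁱ_t(s') is just the expected value of agent i's stage-game Q-matrix under the joint equilibrium policy. A minimal sketch (the matrix and the equilibrium policies below are hypothetical):

```python
import numpy as np

# Stage-game Q-matrix of agent i in state s': rows = agent 1's actions, cols = agent 2's.
Q_i = np.array([[4.0, 0.0],
                [0.0, 2.0]])

# Hypothetical mixed equilibrium policies of the two agents in s'.
pi1 = np.array([0.25, 0.75])
pi2 = np.array([0.5, 0.5])

# NashQ^i(s') = sum over joint actions of pi1(a1) * pi2(a2) * Q_i(a1, a2)
nash_q = pi1 @ Q_i @ pi2
print(nash_q)   # 1.25
```

The expensive part, hidden in this sketch, is obtaining π¹ and π² — a Nash equilibrium of the stage game — at every update.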
Nash-Q: Complexity

- Space requirements: $n \cdot |S| \cdot |A|^n$
- The algorithm's running time is dominated by the calculation of Nash equilibria
- (By contrast, the minimax operator can be computed in polynomial time via linear programming)
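To make the $n \cdot |S| \cdot |A|^n$ figure concrete (the sizes below are arbitrary, chosen only for illustration): each agent stores a Q-function per agent, indexed by state and *joint* action, so memory grows exponentially in the number of agents.

```python
n, num_states, num_actions = 3, 100, 5   # hypothetical problem sizes

# n maintained Q-functions, each with one entry per (state, joint action).
entries = n * num_states * num_actions ** n
print(entries)   # 37500
```

Doubling the per-agent action set to 10 would already multiply this by $2^3 = 8$.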
Nash-Q: Convergence

Assumptions:
- Every state and joint action are visited infinitely often
- Learning rates suitably decrease
- One of the following holds:
  - Every stage game $(Q^1_t(s), \ldots, Q^n_t(s))$, for all $t$ and $s$, has a global optimal point, and the agents' payoffs in this equilibrium are used to update their Q-functions
  - Every stage game $(Q^1_t(s), \ldots, Q^n_t(s))$, for all $t$ and $s$, has a saddle point, and the agents' payoffs in this equilibrium are used to update their Q-functions

Theorem. Under these assumptions, the sequence $Q_t = (Q^1_t, \ldots, Q^n_t)$, updated by

$Q^k_{t+1}(s, a^1, \ldots, a^n) = (1 - \alpha_t)\, Q^k_t(s, a^1, \ldots, a^n) + \alpha_t \left( r^k_t + \gamma\, \pi^1(s') \cdot \ldots \cdot \pi^n(s')\, Q^k_t(s') \right) \quad \text{for } k = 1, \ldots, n,$

where $(\pi^1(s'), \ldots, \pi^n(s'))$ is the appropriate type of Nash equilibrium solution for the stage game $(Q^1_t(s'), \ldots, Q^n_t(s'))$, converges to the Nash Q-value $Q^* = (Q^{1*}, \ldots, Q^{n*})$.
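The update in the theorem can be sketched for two agents as follows. This is a minimal illustration, not a full implementation: the function name and data layout are my own, and the equilibrium policies at $s'$ are assumed to be given (computing them is the hard, expensive step).

```python
import numpy as np

def nash_q_update(Q, k, s, joint_a, r_k, s_next, pis, alpha, gamma):
    """One Nash-Q update of agent k's Q-table (two-agent case).

    Q[k][s] is agent k's stage-game matrix in state s (a1 x a2);
    pis = (pi1, pi2) is a Nash equilibrium of the stage game at s_next.
    """
    pi1, pi2 = pis
    nash_value = pi1 @ Q[k][s_next] @ pi2       # NashQ^k_t(s')
    a1, a2 = joint_a
    Q[k][s][a1, a2] = ((1 - alpha) * Q[k][s][a1, a2]
                       + alpha * (r_k + gamma * nash_value))
    return Q[k][s][a1, a2]

# Two states, two actions each, two agents; all-zero initial tables.
Q = [[np.zeros((2, 2)) for _ in range(2)] for _ in range(2)]
new_q = nash_q_update(Q, k=0, s=0, joint_a=(0, 1), r_k=1.0, s_next=1,
                      pis=(np.array([0.5, 0.5]), np.array([0.5, 0.5])),
                      alpha=0.1, gamma=0.9)
print(new_q)   # 0.1
```

Note that each agent runs this update for every maintained Q-function (its own and the other agents'), using the observed joint action and rewards.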
Nash-Q: Convergence Result Analysis

- The third assumption is very strong: it is unlikely that the stage games encountered during learning keep satisfying it
- The global-optimum assumption implies full cooperation between agents
- The saddle-point assumption implies no cooperation between agents
- Nonetheless, the algorithm empirically converges even when the assumptions are violated
- This suggests that the assumptions may be relaxed, at least for some classes of games
Friend-or-Foe [Littman, '01]

- Friend-or-Foe Q-learning (FFQ) aims at removing the requirements on the intermediate Q-values during learning
- The idea is to let the algorithm know what kind of opponent to expect:
  - friend: coordination equilibrium
    $\mathrm{Nash}_1(s, Q_1, Q_2) = \max_{a_1 \in A_1,\, a_2 \in A_2} Q_1(s, a_1, a_2)$
  - foe: adversarial equilibrium
    $\mathrm{Nash}_1(s, Q_1, Q_2) = \max_{\pi \in \Pi(A_1)} \min_{a_2 \in A_2} \sum_{a_1 \in A_1} \pi(a_1)\, Q_1(s, a_1, a_2)$
- In FFQ the learner maintains only a Q-function for itself
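The two operators above can be sketched on a single stage-game matrix. This is an illustration under my own naming, using the standard linear-programming formulation of the maximin (foe) value; the matching-pennies matrix is a hypothetical example, not from the slides.

```python
import numpy as np
from scipy.optimize import linprog

def friend_value(Q1):
    # Coordination equilibrium: max over joint actions.
    return Q1.max()

def foe_value(Q1):
    # Adversarial equilibrium: max over mixed pi of min over opponent
    # actions (maximin), as a linear program. Variables: [pi(a1)..., v].
    n1, n2 = Q1.shape
    c = np.zeros(n1 + 1)
    c[-1] = -1.0                                  # maximize v
    A_ub = np.hstack([-Q1.T, np.ones((n2, 1))])   # v - pi @ Q1[:, a2] <= 0
    b_ub = np.zeros(n2)
    A_eq = np.zeros((1, n1 + 1))
    A_eq[0, :n1] = 1.0                            # pi sums to 1
    bounds = [(0, 1)] * n1 + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, np.array([1.0]), bounds=bounds)
    return res.x[-1]

pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(friend_value(pennies))   # optimistic: only right if the opponent cooperates
print(foe_value(pennies))      # guaranteed against any opponent (game value)
```

Friend-Q's pure max is trivially cheap; Foe-Q pays one small LP per update, exactly as minimax-Q does.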
Friend-or-Foe Q-learning: Convergence Results

- Friend-or-Foe Q-learning converges
- In general, the values learned by FFQ will not converge to those of any Nash equilibrium
- There are some special cases (independently of the opponent's behavior):
  - Foe-Q learns values for a Nash equilibrium policy if the game has an adversarial equilibrium
  - Friend-Q learns values for a Nash equilibrium policy if the game has a coordination equilibrium
- Foe-Q learns a Q-function whose corresponding policy achieves at least the learned values, regardless of the opponent's selected policy