The Minority Game: Individual and Social Learning
DESCRIPTION
Artificial Intelligence meets Evolutionary Learning. Reinforcement Learning through Evolutionary Game Theory. Social Learning in the Minority Game.
TRANSCRIPT
The Minority Game: Individual and Social Learning
under the supervision of
Dr. Matthijs Ruijgrok
Stathis Grigoropoulos
MSc Mathematics, Scientific Computing
March 27, 2014
Overview
1 Introduction
2 Evolutionary Game Theory (EGT): Rationality, Learning, EGT - Nash Equilibrium
3 Revision Protocol
4 Minority Game: Stage Game, Congestion Game, Individual Learning, Social Learning
5 Conclusion
Introduction
• Learning has been given much attention in the Artificial Intelligence (AI) and Game Theory (GT) disciplines, as it is the key to intelligent and rational behavior.
• What types of learning processes can we identify in a context with many interacting agents?
• What mathematics can we use to model learning?
• What do the agents learn to play? What are the learning outcomes?
EGT - Rationality
• Classic Game Theory (GT) assumes perfect rationality.
• The players know all the details of the game, including each other’s preferences over all possible outcomes.
• The richness and depth of the analysis is tremendous.
• However, it can handle only a small number of heterogeneous agents.
• Humans do not always play rationally:
• They lack game details and player information.
• They follow “rules of thumb”.
EGT - Rationality
• How do you choose the route back home to avoid congestion?
EGT - Learning
• Evolutionary Game Theory (EGT), motivated by biology, imagines the game to be repeatedly played by boundedly rational players randomly drawn from a population.
• The players make no assumptions about the other players’ strategies.
• Each agent has a predefined strategy for the game (like a “rule of thumb”).
• Agents have the opportunity to adapt through an evolutionary process over time and change their behavior.
• The process can be reproduction or imitation ⇒ Learning!
EGT - Learning
• Individual Learning
• In individual learning, success and failure directly influence agent choices and behavior.
• From an AI learning perspective, individual learning is interpreted as various types of reinforcement learning.
EGT - Learning
• Social Learning
• Social Learning occurs in cases where the success and failure of other players influence choice probabilities.
• Social Learning ⇒ Social Imitation.
EGT - Nash Equilibrium
• Thankfully, we still have the Nash Equilibrium.
EGT - Nash Equilibrium
• We consider a game played by a single population, where agents play equivalent roles.
• Let there be N players, each of whom takes a pure strategy from the set S = {1, ..., n}.
• We call population state x an element of the simplex X = {x ∈ R^n_+ : Σ_{j∈S} x_j = 1}, with x_j the fraction of agents playing strategy j.
• A population game is identified by a continuous vector-valued function that maps population states to payoffs, i.e. F : X → R^n.
• The payoff of strategy i when the population state is x is described by the scalar F_i(x).
EGT - Nash Equilibrium
Population state x* is a Nash Equilibrium of F when no agent can benefit by unilaterally switching from strategy i to strategy j. Specifically, x* is a NE if and only if:

F_i(x*) ≥ F_j(x*)  ∀ j ∈ S, ∀ i ∈ S such that x*_i > 0.
Revision Protocol
• In the population games introduced, agents are matched randomly and play their strategies, producing their respective payoffs.
• However, population games can also cover cases where all the players take part in the game.
• The foundations of population learning dynamics are built upon a notion called a revision protocol.
• It describes how individuals choose to update their strategy.

Definition. A revision protocol is a map ρ : R^n × X → R^{n×n} that takes as input payoff vectors π and population states x and returns non-negative matrices as output.
Revision Protocol - Social Learning
• Commonly, Social Imitation is modeled through the revision protocol called the proportional imitation protocol, ρ_ij(π, x) = x_j [π_j − π_i]_+ [7].
• Simply, imitate “the first man you meet on the street” with a probability proportional to your score difference.
• This protocol generates the well-studied replicator dynamic

ẋ_i = x_i F̂_i(x),  (1)

with F̂_i(x) = F_i(x) − F̄(x) the excess payoff and F̄(x) = Σ_{i∈S} x_i F_i(x) the average payoff [7].
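As a sanity check, the replicator dynamic (1) is easy to integrate numerically. The sketch below uses a hypothetical two-strategy congestion-style payoff function of my own (not one from the talk), where a strategy pays less the more of the population uses it:

```python
import numpy as np

def replicator_step(x, F, dt=0.01):
    """One Euler step of the replicator dynamic: growth of x_i is
    proportional to its excess payoff F_i(x) - Fbar(x)."""
    payoffs = F(x)
    avg = x @ payoffs                      # average payoff Fbar(x)
    return x + dt * x * (payoffs - avg)    # excess payoff drives growth

# Hypothetical 2-strategy payoffs: strategy i pays 1 - x_i.
F = lambda x: 1.0 - x

x = np.array([0.9, 0.1])
for _ in range(5000):
    x = replicator_step(x, F)
print(x)  # approaches the even split [0.5, 0.5], the game's NE
```

Note that each Euler step preserves the simplex exactly, since the increments sum to zero.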
Revision Protocol - Replicator Equation
• The replicator equation is a famous deterministic dynamic in EGT.
• The percentage growth rate ẋ_i/x_i of each strategy in use is equal to its excess payoff.
• The attracting rest points of the replicator equation are Nash Equilibria (NE) of the game [9].
• Boundedly rational agents discover NE through learning.
Revision Protocol - Individual Learning
• Suppose we have n populations playing an n-player game.
• At each round, a random agent from each population is selected and together they play the game.
• The pairwise proportional imitation revision protocol is ρ_hk(π, x^i) = x^i_k [π_k − π_h]_+, for each population i ∈ N with population state x^i.
• We get the multi-population replicator dynamics [7]:

ẋ^i_h = x^i_h F̂_h(x^i),  ∀h ∈ S, ∀i ∈ N.  (2)
Revision Protocol - Individual Learning
• Each population represents a player.
• Each agent within a population represents a voice or opinion inside that player.
• Through the imitation process, the voices achieving higher scores get reinforced ⇒ Reinforcement Learning.
Minority Game
• The Minority Game is a famous Congestion Game.
• Congestion games model instances where the payoff of each player depends on the choice of resources along with the number of players choosing the same resource [5].
• Route choice in a road network.
• Selfish packet routing on the Internet.
• Congestion games are Potential Games.
• There exists a single scalar-valued function that characterizes the game [6, p. 53].
Minority Game
• Consider an odd population of N agents competing in a repeated one-shot game (N = 2k + 1, k ≥ 1) where communication is not allowed.
• At each time step (round) t of the game, every agent has to choose one of two possible actions, α_i(t) ∈ {−1, 1}.
• The minority choice wins the round: all the winning agents are rewarded one point, and the rest receive nothing.
• By construction, the MG is a negative-sum game, as the winners are always fewer than the losers.
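The round just described fits in a few lines of code. A minimal sketch (the function name is my own):

```python
def mg_payoffs(actions):
    """One round of the Minority Game for an odd number of agents.

    actions: list of choices in {-1, +1}.
    Agents on the minority side score 1 point; everyone else scores 0.
    """
    s = sum(actions)                      # N is odd, so s is never 0
    return [1 if a * s < 0 else 0 for a in actions]

# N = 3: the lone -1 player is the minority and wins the round.
print(mg_payoffs([1, 1, -1]))  # [0, 0, 1]
```

With winners always fewer than losers, the total payoff per round is at most (N − 1)/2, illustrating the negative-sum property.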
Minority Game
Do Individual and Social Learning lead to different learning outcomes?
Minority Game - Stage Game
Definition. Define the Minority stage game as the one-shot strategic game Γ = ⟨N, ∆, π_i⟩ consisting of:
• N players indexed by i ∈ {1, ..., 2k + 1}, k ∈ N, so N = {1, ..., 2k + 1},
• a finite set of strategies A_i = {−1, 1} indexed by α, where α_i denotes the strategy of player i,
• the set of mixed strategies of player i, denoted by ∆(A_i), and
• a payoff function π_i : A_i × A_{−i} → {0, 1}, where A_{−i} = ∏_{j≠i} A_j. More formally,

π_i = 1 if −α_i Σ_{j=1}^{N} α_j ≥ 0, and π_i = 0 otherwise.  (3)
Minority Game - Stage Game
• Some details...
• We say agent i has mixed strategy α_i:
• We think of a probability p ∈ [0, 1] that agent i plays action {1},
• and probability (1 − p) that he plays action {−1}.
• Pure strategy: p = 0 or p = 1.
• Mixed strategy profile α:
• All agents’ mixed strategies collected in a vector ⇒ Population State.
• The agents playing a mixed strategy are called mixers.
Minority Game - Stage Game
• The “inversion” symmetry π_i(−α_i, −α_{−i}) = π_i(α_i, α_{−i}) ⇒ the two actions are a priori equivalent [2].
We have three general types of Nash equilibria, namely:
• Pure Strategy Nash Equilibria, i.e., when all players play a pure strategy.
• Symmetric Mixed Strategy Nash Equilibria, that is, when all agents choose the same mixed strategy.
• Asymmetric Mixed Strategy Nash Equilibria, specifically, NE where some players choose a pure strategy and the rest a mixed strategy.
Minority Game - Stage Game
Proposition. The number of pure strategy Nash Equilibria in the stage game of the original MG is 2·C(N, (N−1)/2).

Proof. The binomial coefficient is the number of ways (N−1)/2 different players can be chosen out of the set of N players to form the minority side “−1”; the factor 2 arises from the same count for the case where the minority side is “1”. ♦
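For small N the proposition can be confirmed by exhaustive search over all 2^N pure profiles; the helpers below (names are my own) test every unilateral deviation:

```python
from itertools import product
from math import comb

def payoff(profile, i):
    """Minority-game payoff of player i in a pure action profile."""
    s = sum(profile)
    return 1 if profile[i] * s < 0 else 0

def is_pure_nash(profile):
    """True if no player gains by unilaterally flipping their action."""
    for i in range(len(profile)):
        dev = list(profile)
        dev[i] = -dev[i]
        if payoff(dev, i) > payoff(profile, i):
            return False
    return True

N = 5
count = sum(is_pure_nash(p) for p in product([-1, 1], repeat=N))
print(count, 2 * comb(N, (N - 1) // 2))  # 20 20
```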
Minority Game - Stage Game
Lemma. Let α ∈ ×_{i∈N} ∆(A_i) be a Nash equilibrium with a non-empty set of mixers. Then all mixers use the same mixed strategy [10, 1].

Asymmetric Mixed Strategy Nash Equilibria:
• with one mixer,
• with more than one mixer.

Unique Symmetric Mixed NE:
• All players choose each of the two actions with probability p = 1/2.
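That the symmetric profile is an equilibrium can be checked by exact enumeration: when the opponents mix at p = 1/2, both actions earn the same expected payoff, so no unilateral deviation helps. A small sketch (the function name is mine):

```python
from itertools import product

def expected_payoff(my_action, n_others, p=0.5):
    """Exact expected MG payoff of my_action when each of n_others
    opponents independently plays +1 with probability p."""
    total = 0.0
    for others in product([-1, 1], repeat=n_others):
        prob = 1.0
        for a in others:
            prob *= p if a == 1 else (1 - p)
        s = my_action + sum(others)
        total += prob * (1 if my_action * s < 0 else 0)
    return total

# N = 5 players: at p = 1/2 the two actions are exactly equivalent.
print(expected_payoff(+1, 4), expected_payoff(-1, 4))  # 0.3125 0.3125
```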
Minority Game - Congestion Game
The two available choices of the agents can represent two distinct resources in a Congestion Game. A congestion model (N, M, (A_i)_{i∈N}, (c_j)_{j∈M}) is described as follows:
• N the set of players.
• M = {1, ..., m} the set of resources.
• A_i the set of strategies of player i, where each a_i ∈ A_i is a non-empty set of resources.
• For j ∈ M, c_j ∈ R^n denotes the vector of benefits, where c_{jk} is the cost (or payoff) to each user of resource j if there are exactly k players using that resource.
• A population state is a vector a = α = (α_1, ..., α_{2k+1}), a point in the polyhedron ∆(A) of mixed strategy profiles.
Minority Game - Congestion Game
• The overall cost function for player i [8, p. 174]:

C_i = Σ_{j∈a_i} c_j(σ_j(a)) = −π_i(a).  (4)

Definition. A function P : A → R is a potential of the game G if ∀a ∈ A, ∀a_i, b_i ∈ A_i:

u_i(b_i, a_{−i}) − u_i(a_i, a_{−i}) = P(b_i, a_{−i}) − P(a_i, a_{−i}) [3].

• Skipping a lot of nice math...

Proposition. An exact potential of the MG is the sum of the payoffs of all N = 2k + 1 players. Therefore,

P(a) = Σ_{i=1}^{N} u_i(a)  ∀a ∈ A.  (5)
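The exact-potential property (the deviator's payoff change equals the change in P for every unilateral deviation) can be verified mechanically for a small instance; a brute-force check for N = 3 (a sketch, helper names mine):

```python
from itertools import product

def payoff(profile, i):
    s = sum(profile)
    return 1 if profile[i] * s < 0 else 0

def P(profile):
    """Candidate exact potential: the sum of all players' payoffs, eq. (5)."""
    return sum(payoff(profile, i) for i in range(len(profile)))

# For every pure profile and every unilateral deviation, compare the
# deviator's payoff change with the change in P.
N = 3
exact = all(
    payoff(dev, i) - payoff(prof, i) == P(dev) - P(prof)
    for prof in product([-1, 1], repeat=N)
    for i in range(N)
    for dev in [tuple(-a if j == i else a for j, a in enumerate(prof))]
)
print(exact)  # True
```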
Minority Game - Individual Learning
• Let N = {1, ..., 2k + 1} be a set of populations, with each population representing a player i in the MG.
• A population state is a vector a = α = (α_1, ..., α_{2k+1}), a point in the polyhedron ∆(A) of mixed strategy profiles.
• Each component α_i is a point in the simplex ∆(A_i), denoting the proportion of agents programmed to play the pure strategy a_i ∈ A_i.
• Agents, one from each population, are continuously drawn uniformly at random from these populations to play the MG.
• Time, indexed by t, is continuous.

∀i ∈ N, ∀a_i ∈ A_i:

α̇_i(a_i) = α_i(a_i) (u_i(a_i, α_{−i}) − u_i(α_i, α_{−i})).  (6)
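For the three-player example analyzed later, dynamics (6) reduce to three coupled ODEs in the probabilities x, y, z of playing +1: one can check that player 1's payoff advantage of +1 over −1 is (1−y)(1−z) − yz, and symmetrically for the others. A crude Euler integration of this reduced system (my own sketch, not code from the talk):

```python
def step(x, y, z, dt=0.01):
    """Euler step of the replicator dynamics (6) for the 3-player MG,
    where x, y, z are the players' probabilities of playing +1."""
    dx = x * (1 - x) * ((1 - y) * (1 - z) - y * z)  # u1(+1) - u1(-1)
    dy = y * (1 - y) * ((1 - z) * (1 - x) - z * x)
    dz = z * (1 - z) * ((1 - x) * (1 - y) - x * y)
    return x + dt * dx, y + dt * dy, z + dt * dz

x, y, z = 0.9, 0.2, 0.4          # an arbitrary interior starting point
for _ in range(20000):
    x, y, z = step(x, y, z)
# Players 1 and 2 become (almost) pure and opposite; player 3 stays
# mixed: a Nash equilibrium with one mixer.
print(round(x, 3), round(y, 3))  # 1.0 0.0
```

Along the trajectory the potential U = x + y + z − xy − yz − zx rises toward its maximum value 1.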
Minority Game - Individual Learning
• A population state α ∈ ×_{i∈N} ∆(A_i) is a stationary state of the replicator dynamics (6) when α̇_i(a_i) = 0 ∀i ∈ N, ∀a_i ∈ A_i.
• The set T of stationary states can be partitioned into three subsets [1]:
T1: the connected set of Nash equilibria with at most one mixer,
T2: Nash equilibria with more than one mixer, and
T3: non-equilibrium profiles of the type (l, r, λ), where l and r players play the pure strategies −1 and 1 respectively, l + r ≤ 2k + 1, and if l + r < 2k + 1 the remaining players mix with probability λ ∈ (0, 1).
Minority Game - Individual Learning
• Lyapunov stability
• Asymptotic stability
Minority Game - Individual Learning
Concretely,
• A population state α ∈ ×_{i∈N} ∆(A_i) is Lyapunov stable if every neighborhood B of α contains a neighborhood B₀ of α such that ψ(t, α₀) ∈ B for every α₀ ∈ B₀ ∩ ×_{i∈N} ∆(A_i) and t ≥ 0.
• A stationary state is asymptotically stable if it is Lyapunov stable and, in addition, there exists a neighborhood B* with lim_{t→∞} ψ(t, α₀) = α ∀α₀ ∈ B* ∩ ×_{i∈N} ∆(A_i).
Minority Game - Individual Learning
Using definition (4) of the potential function in equations (6):

∀i ∈ N, ∀a_i ∈ A_i:

α̇_i(a_i) = α_i(a_i) (U(a_i, α_{−i}) − U(α_i, α_{−i})).  (7)

Proposition. The potential function U of the minority game is a Lyapunov function for the replicator dynamic: for each trajectory (α(t))_{t∈[0,∞)}, we have dU/dt ≥ 0. Equality holds at the stationary states [1].

Proposition. The collection of Nash equilibria with at most one mixer in T1 is asymptotically stable under the replicator dynamics. Moreover, stationary states in T2 and T3 are not Lyapunov stable [1].
Minority Game - Individual Learning
• The Nash Equilibria with at most one mixer are global maxima of the potential function U.
• The potential function is the sum of the utilities of the players ⇒ Maximization of Utility.
• The symmetric NE of the MG is not Lyapunov stable.
Minority Game - Individual Learning
Three Players in the MG: We define x, y, z ∈ [0, 1] as the probabilities for players 1, 2, 3 respectively to play strategy a_i = 1.

A3 = −1:
A1\A2 | −1  | 1
−1    | 000 | 010
1     | 100 | 001

A3 = 1:
A1\A2 | −1  | 1
−1    | 001 | 100
1     | 010 | 000

Table: Payoff matrix of the three-player MG. A1, A2 and A3 denote agents 1, 2, 3 respectively, with actions {−1, 1}. The utility matrix is split into two submatrices using agent A3’s action as a divider. The payoffs are listed in agent order, e.g. entry 010 means payoff 0 for A1, 1 for A2 and 0 for A3.
Minority Game - Individual Learning
U(x, y, z) = u_1 + u_2 + u_3 = x + y + z − xy − yz − zx  (8)

max U(x, y, z) = 1  (9)
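A quick grid search over the cube of mixed strategies confirms (9), and shows that the maximum is attained at the profiles where two players are pure and opposite while the third is arbitrary (a sketch, not code from the talk):

```python
import numpy as np

def U(x, y, z):
    """Sum of the three players' expected payoffs, eq. (8)."""
    return x + y + z - x * y - y * z - z * x

# Evaluate U on a 101 x 101 x 101 grid over [0,1]^3.
g = np.linspace(0.0, 1.0, 101)
X, Y, Z = np.meshgrid(g, g, g)
vals = U(X, Y, Z)
print(round(float(vals.max()), 9))  # 1.0
# Attained e.g. at (x, y, z) = (1, 0, anything): one free mixer.
print(round(U(1.0, 0.0, 0.7), 9))   # 1.0
```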
Minority Game - Individual Learning
Minority Game - Individual Learning
• Individual Learning is a “utilitarian” solution.
• The sum of the utilities is a Lyapunov function and is maximized under the replicator dynamics.
• The result can be generalized to congestion games.
• What about Social Learning?
Minority Game - Social Learning
• No mathematical model was attempted.
• Agent-based simulations were performed.
• NetLogo - Java [11], [4].
Minority Game - Social Learning
We define
• Attendance:

Att(t) = Σ_{i=1}^{N} α_i(t)  (10)

• |Att(t)| = 1 ⇒ Nash Equilibrium!
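In the simulations the attendance is the natural observable. A sketch contrasting random play with a pure-equilibrium split (the seed and sizes are my own choices, not those of the talk):

```python
import random

def attendance(actions):
    """Att(t) of eq. (10): the sum of the agents' +/-1 actions."""
    return sum(actions)

random.seed(0)
N = 99
# Independent coin flips: |Att| fluctuates on the order of sqrt(N).
random_att = [abs(attendance([random.choice([-1, 1]) for _ in range(N)]))
              for _ in range(200)]
print(max(random_att) > 1)  # True: random play overshoots the equilibrium

# A 49/50 split is a pure Nash equilibrium and pins |Att(t)| at 1.
print(abs(attendance([-1] * 49 + [1] * 50)))  # 1
```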
Minority Game - Social Learning
Figure: The algorithm
Minority Game - Social Learning
Parameter      Value
Agents         99
Imitators      3
Review Rounds  3

Table: Parameters for the experiment using a single population of players.
Minority Game - Social Learning
• Not a Nash Equilibrium!
Minority Game - Social Learning
Figure: Histogram of strategies at stationary state, N = 99.
Minority Game - Social Learning
• Strategies always come as pairs.
Minority Game - Social Learning
Minority Game - Social vs Individual Learning
Conclusion
• Yes! Different learning algorithms do lead to different outcomes.
• Individual Learning in the MG is robust and efficient; it cares about maximization of utility.
• Individual Learning can lead to great differences in agent performance.
• Social Learning is a more “egalitarian” solution: agents are equal in terms of score.
• Social Learning does not converge to a Nash Equilibrium.
• Optimization ⇒ Individual Learning.
• Social Analysis ⇒ Social Learning.
Thank You! Questions?
[1] W. Kets and M. Voorneveld. Congestion, Equilibrium and Learning: the Minority Game. Discussion paper, Tilburg University, 2007.
[2] M. Marsili, D. Challet, and R. Zecchina. Exact solution of a modified El Farol’s bar problem: Efficiency and the role of market impact. Physica A: Statistical Mechanics and its Applications, 280:522–553, June 2000.
[3] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.
[4] Steven F. Railsback, Steven L. Lytinen, and Stephen K. Jackson. Agent-based simulation platforms: Review and development recommendations. Simulation, 82(9):609–623, 2006.
[5] Robert W. Rosenthal. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2(1):65–67, 1973.
[6] W. H. Sandholm. Population Games and Evolutionary Dynamics. Economic Learning and Social Evolution. MIT Press, 2010.
[7] William H. Sandholm. Stochastic evolutionary game dynamics: Foundations, deterministic approximation, and equilibrium selection. American Mathematical Society, 201(1):1–1, 2010.
[8] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.
[9] J. W. Weibull. Evolutionary Game Theory. MIT Press, 1997.
[10] Duncan Whitehead. The El Farol Bar Problem Revisited: Reinforcement Learning in a Potential Game. ESE Discussion Papers 186, Edinburgh School of Economics, University of Edinburgh, September 2008.
[11] U. Wilensky. NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, 1999.