The Minority Game: Individual and Social Learning
DESCRIPTION
Artificial Intelligence meets Evolutionary Learning. Reinforcement Learning through Evolutionary Game Theory. Social Learning in the Minority Game.
TRANSCRIPT
The Minority Game: Individual and Social Learning
under the supervision of
Dr. Matthijs Ruijgrok
Stathis Grigoropoulos
MSc Mathematics, Scientific Computing
March 27, 2014
Overview
1 Introduction
2 Evolutionary Game Theory (EGT): Rationality, Learning, EGT - Nash Equilibrium
3 Revision Protocol
4 Minority Game: Stage Game, Congestion Game, Individual Learning, Social Learning
5 Conclusion
Introduction
• Learning has been given much attention in the Artificial Intelligence (AI) and Game Theory (GT) disciplines, as it is the key to intelligent and rational behavior.
• What types of learning processes can we identify in a context with many interacting agents?
• What mathematics can we use to model learning?
• What do the agents learn to play? What are the learning outcomes?
EGT - Rationality
• Classic Game Theory (GT) assumes perfect rationality.
• The players know all the details of the game, including each other’s preferences over all possible outcomes.
• The richness and depth of the analysis is tremendous.
• However, it can handle only a small number of heterogeneous agents.
• Humans do not always play rationally:
• They lack game details and player information.
• They follow “rules of thumb”.
EGT - Rationality
• How do you choose the route back home to avoid congestion?
EGT - Learning
• Evolutionary Game Theory (EGT), motivated by biology, imagines the game to be repeatedly played by boundedly rational players randomly drawn from a population.
• The players make no assumptions about the other players’ strategies.
• Each agent has a predefined strategy for the game (like a “rule of thumb”).
• Agents have the opportunity to adapt through an evolutionary process over time and change their behavior.
• The process can be reproduction or imitation ⇒ Learning!
EGT - Learning
• Individual Learning
• In individual learning, success and failure directly influence agent choices and behavior.
• From an AI learning perspective, individual learning is interpreted as various types of reinforcement learning.
EGT - Learning
• Social Learning
• Social Learning occurs in cases where the success and failure of other players influence choice probabilities.
• Social Learning ⇒ Social Imitation.
EGT - Nash Equilibrium
• Thankfully, we still have the Nash Equilibrium.
EGT - Nash Equilibrium
• We consider a game played by a single population, where agents play equivalent roles.
• Let there be N players, each of whom takes a pure strategy from the set S = {1, ..., n}.
• We call population state x an element of the simplex X = {x ∈ R^n_+ : Σ_{j∈S} x_j = 1}, with x_j the fraction of agents playing strategy j.
• A population game is identified by a continuous vector-valued function that maps population states to payoffs, i.e. F : X → R^n.
• The payoff of strategy i when the population state is x is described by the scalar F_i(x).
EGT - Nash Equilibrium
Population state x* is a Nash Equilibrium of F when no agent can benefit by unilaterally switching from strategy i to strategy j. Specifically, x* is a NE if and only if:

F_i(x*) ≥ F_j(x*)  ∀ j ∈ S, ∀ i ∈ S such that x*_i > 0.
Revision Protocol
• In the population games introduced, agents are matched randomly and play their strategies, producing their respective payoffs.
• However, population games can also cover cases where all the players take part in the game.
• The foundations of population learning dynamics are built upon a notion called a revision protocol.
• It describes how individuals choose to update their strategy.

Definition. A revision protocol is a map ρ : R^n × X → R^{n×n} that takes as input payoff vectors π and population states x and returns non-negative matrices as output.
Revision Protocol - Social Learning
• Commonly, Social Imitation is modeled through the revision protocol called the proportional imitation protocol, ρ_ij(π, x) = x_j [π_j − π_i]_+ [7].
• Simply, imitate “the first man you meet on the street” with a probability proportional to your score difference.
• This protocol generates the well-studied replicator dynamic

ẋ_i = x_i F̂_i(x),  (1)

with F̂_i(x) = F_i(x) − F̄(x) the excess payoff and F̄(x) = Σ_{i∈S} x_i F_i(x) the average payoff [7].
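As a sanity check, the replicator dynamic (1) is easy to integrate numerically. The sketch below uses a hypothetical two-strategy congestion-style payoff function of my own (not one from the talk), where a strategy pays less the more of the population uses it:

```python
import numpy as np

def replicator_step(x, F, dt=0.01):
    """One Euler step of the replicator dynamic: growth of x_i is
    proportional to its excess payoff F_i(x) - Fbar(x)."""
    payoffs = F(x)
    avg = x @ payoffs                      # average payoff Fbar(x)
    return x + dt * x * (payoffs - avg)    # excess payoff drives growth

# Hypothetical 2-strategy payoffs: strategy i pays 1 - x_i.
F = lambda x: 1.0 - x

x = np.array([0.9, 0.1])
for _ in range(5000):
    x = replicator_step(x, F)
print(x)  # approaches the even split [0.5, 0.5], the game's NE
```

Note that each Euler step preserves the simplex exactly, since the increments sum to zero.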
Revision Protocol - Replicator Equation
• The replicator equation is a famous deterministic dynamic in EGT.
• The percentage growth rate ẋ_i/x_i of each strategy in use is equal to its excess payoff.
• The attracting rest points of the replicator equation are Nash Equilibria (NE) of the game [9].
• Boundedly rational agents discover NE through learning.
Revision Protocol - Individual Learning
• Suppose we have n populations playing an n-player game.
• At each round, a random agent from each population is selected and together they play the game.
• The pairwise proportional imitation revision protocol is ρ_hk(π, x^i) = x^i_k [π_k − π_h]_+, for each population i ∈ N with population state x^i.
• We get the multi-population replicator dynamics [7]:

ẋ^i_h = x^i_h F̂_h(x^i),  ∀h ∈ S, ∀i ∈ N.  (2)
Revision Protocol - Individual Learning
• Each population represents a player.
• Each agent within a population represents a voice or opinion inside that player.
• Through the imitation process, the voices achieving higher scores get reinforced ⇒ Reinforcement Learning.
Minority Game
• The Minority Game is a famous Congestion Game.
• Congestion games model instances where the payoff of each player depends on the choice of resources along with the number of players choosing the same resource [5].
• Route choice in a road network.
• Selfish packet routing on the Internet.
• Congestion games are Potential Games.
• There exists a single scalar-valued function that characterizes the game [6, p. 53].
Minority Game
• Consider an odd population of N agents competing in a repeated one-shot game (N = 2k + 1, k ≥ 1) where communication is not allowed.
• At each time step (round) t of the game, every agent has to choose one of two possible actions, α_i(t) ∈ {−1, 1}.
• The minority choice wins the round: all the winning agents are rewarded one point, and the rest receive nothing.
• By construction, the MG is a negative-sum game, as the winners are always fewer than the losers.
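The round just described fits in a few lines of code. A minimal sketch (the function name is my own):

```python
def mg_payoffs(actions):
    """One round of the Minority Game for an odd number of agents.

    actions: list of choices in {-1, +1}.
    Agents on the minority side score 1 point; everyone else scores 0.
    """
    s = sum(actions)                      # N is odd, so s is never 0
    return [1 if a * s < 0 else 0 for a in actions]

# N = 3: the lone -1 player is the minority and wins the round.
print(mg_payoffs([1, 1, -1]))  # [0, 0, 1]
```

With winners always fewer than losers, the total payoff per round is at most (N − 1)/2, illustrating the negative-sum property.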
Minority Game
Do Individual and Social Learning lead to different learning outcomes?
Minority Game - Stage Game
Definition. Define the Minority stage game as the one-shot strategic game Γ = ⟨N, ∆, π_i⟩ consisting of:
• N players indexed by i ∈ {1, ..., 2k + 1}, k ∈ N, so N = {1, ..., 2k + 1},
• a finite set of strategies A_i = {−1, 1} indexed by α, where α_i denotes the strategy of player i,
• the set of mixed strategies of player i, denoted by ∆(A_i), and
• a payoff function π_i : A_i × A_{−i} → {0, 1}, where A_{−i} = ∏_{j≠i} A_j. More formally,

π_i = 1 if −α_i Σ_{j=1}^{N} α_j ≥ 0, and π_i = 0 otherwise.  (3)
Minority Game - Stage Game
• Some details...
• We say agent i has mixed strategy α_i:
• We think of a probability p ∈ [0, 1] that agent i plays action {1},
• and probability (1 − p) that he plays action {−1}.
• Pure strategy: p = 0 or p = 1.
• Mixed strategy profile α:
• All agents’ mixed strategies collected in a vector ⇒ Population State.
• The agents playing a mixed strategy are called mixers.
Minority Game - Stage Game
• The “inversion” symmetry π_i(−α_i, −α_{−i}) = π_i(α_i, α_{−i}) ⇒ the two actions are a priori equivalent [2].
We have three general types of Nash equilibria, namely:
• Pure Strategy Nash Equilibria, i.e., when all players play a pure strategy.
• Symmetric Mixed Strategy Nash Equilibria, that is, when all agents choose the same mixed strategy.
• Asymmetric Mixed Strategy Nash Equilibria, specifically, NE where some players choose a pure strategy and the rest a mixed strategy.
Minority Game - Stage Game
Proposition. The number of pure strategy Nash Equilibria in the stage game of the original MG is 2·C(N, (N−1)/2).

Proof. The binomial coefficient is the number of ways (N−1)/2 different players can be chosen out of the set of N players to form the minority side “−1”; the factor 2 arises from the same count for the case where the minority side is “1”. ♦
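For small N the proposition can be confirmed by exhaustive search over all 2^N pure profiles; the helpers below (names are my own) test every unilateral deviation:

```python
from itertools import product
from math import comb

def payoff(profile, i):
    """Minority-game payoff of player i in a pure action profile."""
    s = sum(profile)
    return 1 if profile[i] * s < 0 else 0

def is_pure_nash(profile):
    """True if no player gains by unilaterally flipping their action."""
    for i in range(len(profile)):
        dev = list(profile)
        dev[i] = -dev[i]
        if payoff(dev, i) > payoff(profile, i):
            return False
    return True

N = 5
count = sum(is_pure_nash(p) for p in product([-1, 1], repeat=N))
print(count, 2 * comb(N, (N - 1) // 2))  # 20 20
```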
Minority Game - Stage Game
Lemma. Let α ∈ ×_{i∈N} ∆(A_i) be a Nash equilibrium with a non-empty set of mixers. Then all mixers use the same mixed strategy [10, 1].

Asymmetric Mixed Strategy Nash Equilibria:
• with one mixer,
• with more than one mixer.

Unique Symmetric Mixed NE:
• All players choose each of the two actions with probability p = 1/2.
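That the symmetric profile is an equilibrium can be checked by exact enumeration: when the opponents mix at p = 1/2, both actions earn the same expected payoff, so no unilateral deviation helps. A small sketch (the function name is mine):

```python
from itertools import product

def expected_payoff(my_action, n_others, p=0.5):
    """Exact expected MG payoff of my_action when each of n_others
    opponents independently plays +1 with probability p."""
    total = 0.0
    for others in product([-1, 1], repeat=n_others):
        prob = 1.0
        for a in others:
            prob *= p if a == 1 else (1 - p)
        s = my_action + sum(others)
        total += prob * (1 if my_action * s < 0 else 0)
    return total

# N = 5 players: at p = 1/2 the two actions are exactly equivalent.
print(expected_payoff(+1, 4), expected_payoff(-1, 4))  # 0.3125 0.3125
```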
Minority Game - Congestion Game
The two available choices of the agents can represent two distinct resources in a Congestion Game. A congestion model (N, M, (A_i)_{i∈N}, (c_j)_{j∈M}) is described as follows:
• N the set of players.
• M = {1, ..., m} the set of resources.
• A_i the set of strategies of player i, where each a_i ∈ A_i is a non-empty set of resources.
• For j ∈ M, c_j ∈ R^n denotes the vector of benefits, where c_{jk} is the cost (or payoff) to each user of resource j if there are exactly k players using that resource.
• A population state is a vector a = α = (α_1, ..., α_{2k+1}), a point in the polyhedron ∆(A) of mixed strategy profiles.
Minority Game - Congestion Game
• The overall cost function for player i [8, p. 174]:

C_i = Σ_{j∈a_i} c_j(σ_j(a)) = −π_i(a).  (4)

Definition. A function P : A → R is a potential of the game G if ∀a ∈ A, ∀a_i, b_i ∈ A_i:

u_i(b_i, a_{−i}) − u_i(a_i, a_{−i}) = P(b_i, a_{−i}) − P(a_i, a_{−i}) [3].

• Skipping a lot of nice math...

Proposition. An exact potential of the MG is the sum of the payoffs of all N = 2k + 1 players. Therefore,

P(a) = Σ_{i=1}^{N} u_i(a)  ∀a ∈ A.  (5)
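The exact-potential property (the deviator's payoff change equals the change in P for every unilateral deviation) can be verified mechanically for a small instance; a brute-force check for N = 3 (a sketch, helper names mine):

```python
from itertools import product

def payoff(profile, i):
    s = sum(profile)
    return 1 if profile[i] * s < 0 else 0

def P(profile):
    """Candidate exact potential: the sum of all players' payoffs, eq. (5)."""
    return sum(payoff(profile, i) for i in range(len(profile)))

# For every pure profile and every unilateral deviation, compare the
# deviator's payoff change with the change in P.
N = 3
exact = all(
    payoff(dev, i) - payoff(prof, i) == P(dev) - P(prof)
    for prof in product([-1, 1], repeat=N)
    for i in range(N)
    for dev in [tuple(-a if j == i else a for j, a in enumerate(prof))]
)
print(exact)  # True
```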
Minority Game - Individual Learning
• Let N = {1, ..., 2k + 1} be a set of populations, with each population representing a player i in the MG.
• A population state is a vector a = α = (α_1, ..., α_{2k+1}), a point in the polyhedron ∆(A) of mixed strategy profiles.
• Each component α_i is a point in the simplex ∆(A_i), denoting the proportion of agents programmed to play the pure strategy a_i ∈ A_i.
• Agents, one from each population, are continuously drawn uniformly at random from these populations to play the MG.
• Time, indexed by t, is continuous.

∀i ∈ N, ∀a_i ∈ A_i:

α̇_i(a_i) = α_i(a_i) (u_i(a_i, α_{−i}) − u_i(α_i, α_{−i})).  (6)
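For the three-player example analyzed later, dynamics (6) reduce to three coupled ODEs in the probabilities x, y, z of playing +1: one can check that player 1's payoff advantage of +1 over −1 is (1−y)(1−z) − yz, and symmetrically for the others. A crude Euler integration of this reduced system (my own sketch, not code from the talk):

```python
def step(x, y, z, dt=0.01):
    """Euler step of the replicator dynamics (6) for the 3-player MG,
    where x, y, z are the players' probabilities of playing +1."""
    dx = x * (1 - x) * ((1 - y) * (1 - z) - y * z)  # u1(+1) - u1(-1)
    dy = y * (1 - y) * ((1 - z) * (1 - x) - z * x)
    dz = z * (1 - z) * ((1 - x) * (1 - y) - x * y)
    return x + dt * dx, y + dt * dy, z + dt * dz

x, y, z = 0.9, 0.2, 0.4          # an arbitrary interior starting point
for _ in range(20000):
    x, y, z = step(x, y, z)
# Players 1 and 2 become (almost) pure and opposite; player 3 stays
# mixed: a Nash equilibrium with one mixer.
print(round(x, 3), round(y, 3))  # 1.0 0.0
```

Along the trajectory the potential U = x + y + z − xy − yz − zx rises toward its maximum value 1.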
Minority Game - Individual Learning
• A population state α ∈ ×_{i∈N} ∆(A_i) is a stationary state of the replicator dynamics (6) when α̇_i(a_i) = 0 ∀i ∈ N, ∀a_i ∈ A_i.
• The set T of stationary states can be partitioned into three subsets [1]:
T1: the connected set of Nash equilibria with at most one mixer,
T2: Nash equilibria with more than one mixer, and
T3: non-equilibrium profiles of the type (l, r, λ), where l and r players play the pure strategies −1 and 1 respectively, l + r ≤ 2k + 1, and if l + r < 2k + 1 the remaining players mix with probability λ ∈ (0, 1).
Minority Game - Individual Learning
• Lyapunov stability
• Asymptotic stability
Minority Game - Individual Learning
Concretely,
• A population state α ∈ ×_{i∈N} ∆(A_i) is Lyapunov stable if every neighborhood B of α contains a neighborhood B₀ of α such that ψ(t, α₀) ∈ B for every α₀ ∈ B₀ ∩ ×_{i∈N} ∆(A_i) and t ≥ 0.
• A stationary state is asymptotically stable if it is Lyapunov stable and, in addition, there exists a neighborhood B* with lim_{t→∞} ψ(t, α₀) = α ∀α₀ ∈ B* ∩ ×_{i∈N} ∆(A_i).
Minority Game - Individual Learning
Using definition (4) of the potential function in equations (6):

∀i ∈ N, ∀a_i ∈ A_i:

α̇_i(a_i) = α_i(a_i) (U(a_i, α_{−i}) − U(α_i, α_{−i})).  (7)

Proposition. The potential function U of the minority game is a Lyapunov function for the replicator dynamic: for each trajectory (α(t))_{t∈[0,∞)}, we have dU/dt ≥ 0. Equality holds at the stationary states [1].

Proposition. The collection of Nash equilibria with at most one mixer in T1 is asymptotically stable under the replicator dynamics. Moreover, stationary states in T2 and T3 are not Lyapunov stable [1].
Minority Game - Individual Learning
• The Nash Equilibria with at most one mixer are global maxima of the potential function U.
• The potential function is the sum of the utilities of the players ⇒ Maximization of Utility.
• The symmetric NE of the MG is not Lyapunov stable.
Minority Game - Individual Learning
Three Players in the MG: We define x, y, z ∈ [0, 1] as the probabilities for players 1, 2, 3 respectively to play strategy a_i = 1.

A3 = −1:
A1\A2 | −1  | 1
−1    | 000 | 010
1     | 100 | 001

A3 = 1:
A1\A2 | −1  | 1
−1    | 001 | 100
1     | 010 | 000

Table: Payoff matrix of the three-player MG. A1, A2 and A3 denote agents 1, 2, 3 respectively, with actions {−1, 1}. The utility matrix is split into two submatrices using agent A3’s action as a divider. The payoffs are listed in agent order, e.g. entry 010 means payoff 0 for A1, 1 for A2 and 0 for A3.
Minority Game - Individual Learning
U(x, y, z) = u_1 + u_2 + u_3 = x + y + z − xy − yz − zx  (8)

max U(x, y, z) = 1  (9)
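A quick grid search over the cube of mixed strategies confirms (9), and shows that the maximum is attained at the profiles where two players are pure and opposite while the third is arbitrary (a sketch, not code from the talk):

```python
import numpy as np

def U(x, y, z):
    """Sum of the three players' expected payoffs, eq. (8)."""
    return x + y + z - x * y - y * z - z * x

# Evaluate U on a 101 x 101 x 101 grid over [0,1]^3.
g = np.linspace(0.0, 1.0, 101)
X, Y, Z = np.meshgrid(g, g, g)
vals = U(X, Y, Z)
print(round(float(vals.max()), 9))  # 1.0
# Attained e.g. at (x, y, z) = (1, 0, anything): one free mixer.
print(round(U(1.0, 0.0, 0.7), 9))   # 1.0
```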
Minority Game - Individual Learning
Minority Game - Individual Learning
• Individual Learning is a “utilitarian” solution.
• The sum of the utilities is a Lyapunov function and is maximized under the replicator dynamics.
• The result can be generalized to congestion games.
• What about Social Learning?
Minority Game - Social Learning
• No mathematical model was attempted.
• Agent-based simulations were performed.
• NetLogo - Java [11], [4].
Minority Game - Social Learning
We define
• Attendance:

Att(t) = Σ_{i=1}^{N} α_i(t)  (10)

• |Att(t)| = 1 ⇒ Nash Equilibrium!
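In the simulations the attendance is the natural observable. A sketch contrasting random play with a pure-equilibrium split (the seed and sizes are my own choices, not those of the talk):

```python
import random

def attendance(actions):
    """Att(t) of eq. (10): the sum of the agents' +/-1 actions."""
    return sum(actions)

random.seed(0)
N = 99
# Independent coin flips: |Att| fluctuates on the order of sqrt(N).
random_att = [abs(attendance([random.choice([-1, 1]) for _ in range(N)]))
              for _ in range(200)]
print(max(random_att) > 1)  # True: random play overshoots the equilibrium

# A 49/50 split is a pure Nash equilibrium and pins |Att(t)| at 1.
print(abs(attendance([-1] * 49 + [1] * 50)))  # 1
```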
Minority Game - Social Learning
Figure: The algorithm
Minority Game - Social Learning
Parameter      Value
Agents         99
Imitators      3
Review Rounds  3

Table: Parameters for the experiment using a single population of players.
Minority Game - Social Learning
• Not a Nash Equilibrium!
Minority Game - Social Learning
Figure: Histogram of strategies at stationary state, N = 99.
Minority Game - Social Learning
• Strategies always come as pairs.
Minority Game - Social Learning
Minority Game - Social vs Individual Learning
Conclusion
• Yes! Different learning algorithms do lead to different outcomes.
• Individual Learning in the MG is robust and efficient; it cares about maximization of utility.
• Individual Learning can lead to great differences in agent performance.
• Social Learning is a more “egalitarian” solution: agents are equal in terms of score.
• Social Learning does not converge to a Nash Equilibrium.
• Optimization ⇒ Individual Learning.
• Social Analysis ⇒ Social Learning.
Thank You! Questions?
[1] W. Kets and M. Voorneveld. Congestion, Equilibrium and Learning: the Minority Game. Discussion paper, Tilburg University, 2007.
[2] M. Marsili, D. Challet, and R. Zecchina. Exact solution of a modified El Farol’s bar problem: Efficiency and the role of market impact. Physica A: Statistical Mechanics and its Applications, 280:522–553, June 2000.
[3] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.
[4] Steven F. Railsback, Steven L. Lytinen, and Stephen K. Jackson. Agent-based simulation platforms: Review and development recommendations. Simulation, 82(9):609–623, 2006.
[5] Robert W. Rosenthal. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2(1):65–67, 1973.
[6] W. H. Sandholm. Population Games and Evolutionary Dynamics. Economic Learning and Social Evolution. MIT Press, 2010.
[7] William H. Sandholm. Stochastic evolutionary game dynamics: Foundations, deterministic approximation, and equilibrium selection. American Mathematical Society, 201(1):1–1, 2010.
[8] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.
[9] J. W. Weibull. Evolutionary Game Theory. MIT Press, 1997.
[10] Duncan Whitehead. The El Farol Bar Problem Revisited: Reinforcement Learning in a Potential Game. ESE Discussion Papers 186, Edinburgh School of Economics, University of Edinburgh, September 2008.
[11] U. Wilensky. NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, 1999.