
Analytical Solutions of N-Person Games

MIKLOS N. SZILAGYI

Department of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona 85721

Received May 27, 2011; revised September 25, 2011; accepted November 1, 2011

The possibility of analytical solutions of N-person games is presented. A simple formula provides valuable information about the outcomes of such games with linear payoff functions and Pavlovian agents. Experiments performed with our simulation tool for the multiagent stag hunt dilemma game are presented. For the case of Pavlovian agents the game has nontrivial but remarkably regular solutions. If both payoff functions are linear and the real solutions of Eq. (2) are both positive, then the analytical solutions are remarkably accurate. © 2012 Wiley Periodicals, Inc. Complexity 17: 54–62, 2012

Key Words: agent-based simulation; N-person games; stag hunt dilemma

1. INTRODUCTION

We will consider N-person games in which each participant has a choice between two actions. The participants can be individuals, collectives of persons, organizations, or even computer programs. We simply call them agents.

Usually these two actions are cooperating with each other for the common good or defecting (following their selfish short-term interests). As a result of its choice, each agent receives a reward or punishment (payoff) that is dependent on its choice as well as the choices of all the others. Their decisions to cooperate or defect will accumulate over time to produce a result that will determine the success or failure of the given artificial society.

Our agent-based simulation tool developed for social and economic experiments with a large number of decision-makers operating in a stochastic environment [1] makes it possible to simulate any iterated N-person game with a wide range of user-defined parameters. It is a genuine multiagent tool and it is quite different from programs analyzing repeated two-person games. It is suitable for an unlimited number of agents with various personalities. We were able to perform interesting nontrivial experiments with this tool [2–8].

The agents are described as stochastic learning cellular automata, i.e., as combinations of cellular automata [9, 10] and stochastic learning automata [11, 12]. The cellular automaton format describes the environment in which the agents interact. In our model, this environment is not limited to the agents' immediate neighbors. The number of neighborhood layers around each agent and the agent's location determine the number of its neighbors. The depth of agent A's neighborhood is defined as the maximum distance, in three orthogonal directions, that agent B can be from agent A and still be in its neighborhood. An agent at the edge or in the corner of the available space has fewer neighbors than one in the middle. The neighborhood may extend to the entire array of agents. In this case the agents interact with all other agents simultaneously.
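As an illustration (our own sketch, not the published tool), the neighborhood definition can be read as a maximum coordinate distance on the two-dimensional array:

```python
def neighbors(row, col, depth, n_rows, n_cols):
    """All cells within `depth` layers of (row, col) on an n_rows x n_cols grid."""
    cells = []
    for r in range(max(0, row - depth), min(n_rows, row + depth + 1)):
        for c in range(max(0, col - depth), min(n_cols, col + depth + 1)):
            if (r, c) != (row, col):
                cells.append((r, c))
    return cells

# A depth-1 neighborhood has 8 members in the interior of a 10 x 10 grid,
# but only 3 in a corner, as described above.
print(len(neighbors(5, 5, 1, 10, 10)), len(neighbors(0, 0, 1, 10, 10)))
```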


As it is a realistic simulation model of any N-person game, the user is able to define parameters such as the size and shape of the simulation environment, the payoff functions, updating schemes for subsequent actions, personalities of the agents, and the definition of the neighborhood.

Our simulation environment is a two-dimensional array of the participating agents. The aggregate cooperation proportion changes over subsequent iterations. At each iteration, every agent chooses an action according to the payoff received for its previous action. The updating occurs simultaneously for all agents.

In an iterative game the aggregate cooperation proportion changes in time, i.e., over subsequent iterations. The agents take actions according to probabilities updated on the basis of the reward/penalty received for their previous actions and of their personalities. The updating scheme may be different for different agents. This means that agents with completely different personalities can be allowed to interact with each other in the same experiment. Agents with various personalities and various initial states and actions can be placed anywhere in the array. The response of the environment is influenced by the actions of all participating agents.

The updated probabilities lead to new decisions by the agents that are rewarded/penalized by the environment. With each iteration, the software tool draws the array of agents in a window on the computer's screen, with each agent in the array colored according to its most recent action. The experimenter can view and record the evolution of the society of agents as it changes in time. After a certain number of iterations the proportion of cooperators either stabilizes at a constant value or oscillates around such a value.
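A toy stand-in for the drawing step (our illustration; the real tool colors cells in a window rather than printing characters):

```python
import numpy as np

rng = np.random.default_rng(1)

def render(actions):
    """Print a 2D boolean action array, 'C' for cooperate, 'D' for defect."""
    for row in actions:
        print("".join("C" if a else "D" for a in row))

# A random 6 x 6 initial configuration with cooperation ratio 0.5
actions = rng.random((6, 6)) < 0.5
render(actions)
print("cooperation proportion:", actions.mean())
```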

When everything else is fixed, the payoff (reward/penalty) functions determine the game. The payoff functions are given as two curves: one (C) for a cooperator and another (D) for a defector. The payoff to each agent depends on its choice, on the distribution of other players among cooperators and defectors, and also on the properties of the environment. The payoff curves are functions of the ratio of cooperators to the total number of neighbors (Figure 1). The freedom of using arbitrary functions for the determination of the reward/penalty system makes it possible to simulate a wide range of games and other social situations, including those where the two curves intersect each other.

There is an infinite variety of payoff curves. In addition, stochastic factors can be specified to represent stochastic responses from the environment. Zero stochastic factors mean a deterministic environment. Even in the almost trivial case when both payoff curves are straight lines and the stochastic factors are both zero, four parameters specify the environment. Attempts to describe it with a single variable [13, 14] are certainly too simplistic. The relative position of the two payoff curves with respect to each other does not always determine the outcome of the game. Ordinal preference is not enough to represent the payoff functions: the actual amounts of reward and punishment may be as important as the relative situation of the two curves.

The horizontal axis x in Figure 1 represents the number of cooperators related to the total number of agents. We will assume that the payoffs are linear functions of this ratio for both choices and that the game is uniform, i.e., the payoff functions are the same for all agents.

Point P corresponds to the punishment when all agents defect, R is the reward when all agents cooperate, T is the temptation to defect when everybody else cooperates, and S is the sucker's payoff for cooperating when everyone else defects. C(0) and D(1) are impossible by definition, but we will follow the generally accepted notation by extending both lines to the full range of 0 ≤ x ≤ 1 and denoting C(0) = S and D(1) = T, which makes it simpler to define the payoff functions. For a large number of agents this extension is not even noticeable.

We connect point S with point R by a straight line (the cooperator's payoff function C) and point P with point T (the defector's payoff function D). Thus the payoff to each agent depends on its choice and on the distribution of other players among cooperators and defectors.
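Because C and D are the straight lines through (0, S), (1, R) and (0, P), (1, T), respectively, the four values S, P, T, and R fully determine a game with linear payoffs. A minimal sketch (the helper linear_payoffs is ours, for illustration):

```python
def linear_payoffs(S, P, T, R):
    """Linear payoff lines: C(0) = S, C(1) = R and D(0) = P, D(1) = T."""
    C = lambda x: S + (R - S) * x
    D = lambda x: P + (T - P) * x
    return C, D

# Figure 1 corresponds to S = -2, P = -1, T = 1, R = 2 (S < P < T < R):
C, D = linear_payoffs(-2, -1, 1, 2)
assert (C(0.0), C(1.0), D(0.0), D(1.0)) == (-2.0, 2.0, -1.0, 1.0)
# i.e., C(x) = -2 + 4x and D(x) = -1 + 2x, the stag hunt of Figure 1
```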

There are 4! = 24 different orderings of the values of P, R, S, and T. Each of them represents a different type of game. For the payoff functions shown in Figure 1, we have

S < P < T < R,    (1)

which represents the N-person stag hunt dilemma, also known as the assurance game, coordination game, and trust dilemma. The two payoff functions intersect each other in this case.

FIGURE 1

Payoff (reward/penalty) functions for cooperators (C) and defectors (D). The horizontal axis (x) represents the ratio of the number of cooperators to the total number of agents; the vertical axis is the reward/penalty provided by the environment. In this figure, C(x) = −2 + 4x and D(x) = −1 + 2x (stag hunt dilemma).

It is a social dilemma, because R > S and T > P simultaneously [15]. The two-person stag hunt dilemma and its applications have been studied in the literature [16]. Pacheco et al. published a mathematical analysis of the N-person case in 2009 [17], but without using any payoff functions.

We assume that the agents are distributed in and fully occupy a finite two-dimensional space, the updates are simultaneous, the agents have no goals, know nothing about each other, and they cannot refuse participation in any iteration. This restriction leaves the problems of payoff curves, neighborhood, and personalities open for investigation.

The outcome of the game strongly depends on the personalities of the agents. We use the term personality in the sense of decision heuristics (repeated-game strategies), to represent the fact that different agents react differently to the same stimulus from their environment. For example, agents with short-term rationality will always choose defection; benevolent agents will ignore their short-term interests and will all cooperate, etc.

This is one of the most important characteristics of the game. Personalities of the agents may represent genetic as well as cultural differences between them. The psychological literature on the impact of personalities in social dilemmas is summarized in [18].

Personalities are usually neglected in the literature. We have considered N-person Prisoner's Dilemmas with various personalities of the participating agents. Different agents may have quite different personalities in the same experiment [2]. The agents' personalities may also change in time based on the influences of other agents. We used five personality components to represent human behavior in [19].

In the present work we investigate analytical solutions for N-person games with crossing payoff functions and Pavlovian agents.

2. ANALYTICAL SOLUTION

One of the simplest and most important personality profiles is the Pavlovian with a linear updating scheme. The agent's probability of choosing the previously chosen action again changes by an amount proportional to its reward or penalty for its previous action (the coefficient of proportionality is called the learning rate). Of course, the probabilities always remain in the interval between 0 and 1.
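In code, this updating scheme is a one-line rule; the sketch below is ours and assumes a single, fixed learning rate:

```python
def pavlovian_update(p_repeat, payoff, learning_rate=0.1):
    """Shift the probability of repeating the last action in proportion to
    the payoff that action earned, clamped to the interval [0, 1]."""
    return min(1.0, max(0.0, p_repeat + learning_rate * payoff))
```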

These agents are primitive enough not to know anything about their rational choices, but they have enough intelligence to learn a behavior according to Thorndike's law: if an action is followed by a satisfactory state of affairs, then the tendency of the agent to produce that particular action is reinforced [20]. This law was confirmed by Pavlov's experiments.

Pavlovian agents' behavior is determined by stochastic learning rules that provide more powerful and realistic results than the deterministic rules usually used in cellular automata. Stochastic learning means that behavior is not determined but only shaped by its consequences, i.e., an action of the agent will be more probable but still not certain after a favorable response from the environment.

Kraines and Kraines [21], Macy [22], Flache and Hegselmann [23], and others used such agents for the investigation of iterated two-person games.

Pavlovian solutions can be predicted for any situation. We have developed an algorithm that accurately predicts the final aggregate outcome for any combination of Pavlovian agents and any payoff functions [5]. The predictions are exact for an infinite number of agents, but the experimental results of the simulation approximate the predictions very closely even for a few hundred agents.

An even more convenient approach is to use an analytical formula for the prediction of the solutions. Let us assume that in a society of N Pavlovian agents the neighborhood is the entire collective of agents, the ratio of cooperators is x, and the ratio of defectors is (1 − x) at a certain time. We have shown [3] that when the cooperators receive the same total payoff as the defectors, i.e.,

x C(x) = (1 − x) D(x),    (2)

an equilibrium occurs. This may happen if C(x) and D(x) are either both negative or both positive. In the first case, a stable equilibrium was observed; in the second case, an unstable equilibrium occurred.
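With linear payoffs C(x) = c0 + c1 x and D(x) = d0 + d1 x, Eq. (2) expands to the quadratic (c1 + d1) x² + (c0 + d0 − d1) x − d0 = 0, which the next paragraph analyzes. A minimal numerical sketch (the helper equilibria is our illustration, not part of the published tool):

```python
import numpy as np

def equilibria(c0, c1, d0, d1):
    """Roots of x*C(x) = (1 - x)*D(x) for C(x) = c0 + c1*x, D(x) = d0 + d1*x.

    Expanding both sides gives (c1 + d1)*x**2 + (c0 + d0 - d1)*x - d0 = 0.
    """
    return np.roots([c1 + d1, c0 + d0 - d1, -d0])

# Figure 1 payoffs C(x) = -2 + 4x, D(x) = -1 + 2x:
print(np.sort(equilibria(-2, 4, -1, 2)))   # [0.3333..., 0.5], i.e., 1/3 and 1/2
```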

In the case of linear payoff functions the equilibrium equation is quadratic. If its solutions are real, they are x1 (stable attractor) and x2 (unstable repulsor). When the initial cooperation ratio is below x2, the solution of the game converges toward x1 as an oscillation, while it stabilizes exactly when the initial cooperation ratio is above x2. The latter case does not result in the aggregate cooperation proportion converging to 1, as one would expect. This is because, for an individual agent that started off as a defector, there is always some likelihood that the agent will continue to defect. This probability is initially small but continues to increase if the agent is always rewarded for defecting. If the number of agents is sufficiently large, then there will be some agents that continue to defect until their cooperation probability reaches zero due to the successive rewards they have received, and these agents will defect forever. In the case of complex solutions, Eq. (2) does not give any information about the game.
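The behavior just described is easy to reproduce with a minimal re-implementation. The sketch below is ours, not the original simulation tool; the population size, the fixed learning rate of 0.01, and initializing every agent's cooperation probability at x0 are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(C, D, n_agents=10_000, x0=0.49, learning_rate=0.01, n_iter=200):
    """Iterated N-person game with Pavlovian agents; the neighborhood is the
    entire collective, so every payoff depends only on the global ratio x."""
    p_coop = np.full(n_agents, x0)            # probability of cooperating
    action = rng.random(n_agents) < p_coop    # True = cooperate
    trajectory = []
    for _ in range(n_iter):
        x = action.mean()                     # current cooperation ratio
        trajectory.append(x)
        payoff = np.where(action, C(x), D(x))
        # Reinforce the action just taken in proportion to its payoff;
        # for defectors this moves the cooperation probability the other way.
        p_coop += np.where(action, 1.0, -1.0) * learning_rate * payoff
        p_coop = np.clip(p_coop, 0.0, 1.0)
        action = rng.random(n_agents) < p_coop
    return trajectory

# Stag hunt of Figure 1: x1 = 1/3 (attractor), x2 = 1/2 (repulsor)
C = lambda x: -2 + 4 * x
D = lambda x: -1 + 2 * x
for x0 in (0.51, 0.49, 0.00):
    print(x0, "->", round(simulate(C, D, x0=x0)[-1], 3))
```

Qualitatively, runs started below x2 should drift toward the attractor near x1, while a run started above x2 should stabilize at a value below 1, mirroring Figure 2.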


Naturally, the results are strongly dependent on the payoff functions. In the case of Pavlovian agents the relative situation of the two payoff curves with respect to each other does not determine the outcome of the game. It is equally important to know the actual values of the payoffs.

We have performed numerous experiments with our simulation tool for N-person games with crossing payoff functions. There are 12 different such games [6]. For the sake of space economy we will show the results using the example of the N-person stag hunt dilemma only. When the agents have Pavlovian personalities, the following cases are possible for the application of Eq. (2):

a. Both curves are positive for any value of x. In this case only the unstable equilibrium exists and the solution of the game depends on the value of this equilibrium and on the initial ratio of cooperators. When the initial cooperation ratio is below x2, the solution of the game stabilizes at a lower value between zero and x2. When the initial cooperation ratio is above x2, the final stable ratio has a higher value between x2 and 1.

b. Both C(x) and D(x) are negative for all values of x. In this case only the stable equilibrium exists and the solution of the game always converges to x1.

c. The C(x) curve is entirely positive but D(x) changes sign from negative to positive as the value of x grows, or the D(x) curve is entirely positive and C(x) changes sign. The situation is similar to case (a). The only difference is that in this case the region where both C(x) and D(x) are positive may be too narrow to produce a solution according to Eq. (2).

d. The C(x) curve is entirely negative but D(x) changes sign from negative to positive as the value of x grows, or the D(x) curve is entirely negative but C(x) changes sign. The situation is similar to case (b). However, the region where both C(x) and D(x) are negative may be too narrow to produce a solution according to Eq. (2).

e. Both C(x) and D(x) change sign. This is the most interesting case: both equilibria exist and Eq. (2) always works.
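For linear payoff functions, the applicable case can be read off from the signs of C and D at the two ends of 0 ≤ x ≤ 1. A small classifier along these lines (our own sketch; a zero endpoint is lumped with the non-negative branch):

```python
def classify(C, D):
    """Assign a pair of linear payoff curves to one of the cases (a)-(e)
    from the signs of the curves at x = 0 and x = 1."""
    def pattern(f):
        lo, hi = f(0.0), f(1.0)
        if lo >= 0 and hi >= 0:
            return "+"
        if lo <= 0 and hi <= 0:
            return "-"
        return "sign change"
    pc, pd = pattern(C), pattern(D)
    if pc == "+" and pd == "+":
        return "a"
    if pc == "-" and pd == "-":
        return "b"
    if "+" in (pc, pd):
        return "c"
    if "-" in (pc, pd):
        return "d"
    return "e"

print(classify(lambda x: -2 + 4 * x, lambda x: -1 + 2 * x))   # "e" (Figure 1)
print(classify(lambda x: -1 + 4 * x, lambda x: 2 * x))        # "c" (Figure 3)
```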

For the payoff functions of Figure 1 the solutions are x1 = 1/3 (stable attractor) and x2 = 1/2 (unstable repulsor). This corresponds to case (e) (see the discussion above). Simulation results are shown in Figure 2. The graphs represent the proportions of cooperating agents as functions of the number of iterations for different initial cooperation ratios x0. We see that Eq. (2) works perfectly. After about 60 iterations the trajectories start to oscillate around x1 for any value of x0 in the interval 0 ≤ x0 < x2. If x0 > x2, the stable solution appears at about the 30th iteration at a value below 1.

Let us start moving both payoff functions up together. If C = −1 + 4x and D = 2x, the solutions of Eq. (2) are x1 = 0 and x2 = 1/2. This is case (c). Figure 3 shows the simulation results. In this case the trajectories are always stable and they are different from the previous case. In the interval 0 ≤ x0 < x3 the solutions of the game are zero as required by Eq. (2). The value of x3 is about 0.45. However, in the region x3 < x0 < x2 the solutions are below x3 but well above zero and they depend on the value of x0. If x0 > x2, the stable solutions are now much further from total cooperation than in the previous case.

FIGURE 2

Evolution of the game for the case when all agents are Pavlovian, Figure 1 gives the payoff curves, and the neighborhood is the entire collective of agents. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.51, 0.49, and 0.00, respectively.

FIGURE 3

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −1 + 4x and D(x) = 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.60, 0.55, 0.51, 0.49, 0.48, 0.45, and 0.40, respectively.


Moving further in the same direction, we find the trajectories shown in Figure 4 when C = −0.8 + 4x and D = 0.2 + 2x. In this case x1 = −0.067 and x2 = 1/2. The negative value of x1 points to the fact that in this case there is no region in the interval 0 ≤ x ≤ 1 where both functions would be simultaneously negative (by definition x is always positive). This is still case (c) and the simulation results are similar to the previous case.

Figure 5 refers to the case when C = 4x and D = 1 + 2x. Now x1 = −1/3 and x2 = 1/2. The behavior of the trajectories follows the previous trend even more strongly. When we move to C = 1 + 4x and D = 2 + 2x, we have x1 = −2/3 and x2 = 1/2. The behavior of the trajectories corresponds to case (a). Equation (2) is of little use in this case (Figure 6).

Let us now reverse the direction and move both payoff functions down together. When C = −3 + 4x and D = −2 + 2x, the solutions are x1 = 1/2 and x2 = 2/3. This is case (d). Figure 7 shows that the trajectories strictly follow Eq. (2). When x0 > x2, all agents cooperate in this case.

FIGURE 4

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −0.8 + 4x and D(x) = 0.2 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.55, 0.49, 0.48, and 0.45, respectively.

FIGURE 5

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = 4x and D(x) = 1 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.55, 0.49, 0.48, 0.45, 0.40, 0.35, and 0.30, respectively.

FIGURE 6

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = 1 + 4x and D(x) = 2 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.55, 0.49, 0.48, 0.45, 0.40, 0.35, 0.30, 0.25, and 0.20, respectively.

FIGURE 7

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −3 + 4x and D(x) = −2 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.67, 0.66, and 0.00, respectively.



The next step is C = −4 + 4x and D = −3 + 2x. The solutions are x1 = 1/2 and x2 = 1. This is the limit of case (b) because C = 0 at x = 1. In this case only the stable equilibrium exists and the solutions all converge to x1, independent of the value of x0 (Figure 8).

Going one step further, we arrive at C = −5 + 4x and D = −4 + 2x. The solutions are x1 = 1/2 and x2 = 4/3, reflecting the fact that this is case (b). There is no region in the interval 0 ≤ x ≤ 1 where both functions would be simultaneously positive. The trajectories strictly follow Eq. (2) for any value of x0 (Figure 9).

If we jump down to C = −10 + 4x and D = −9 + 2x, the solutions are x1 = 1/2 and x2 = 3. We observe the emergence of wild oscillations of the trajectories during the first ten iterations, after which the trajectories settle down to x1 for any value of x0 (Figure 10).
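All of the equilibria quoted in this section follow from the same quadratic form of Eq. (2); a quick numerical check (our sketch, reusing the expansion given earlier):

```python
import numpy as np

# (c0, c1, d0, d1) for C(x) = c0 + c1*x and D(x) = d0 + d1*x, as in the text
cases = [(-2, 4, -1, 2), (-1, 4, 0, 2), (-0.8, 4, 0.2, 2), (0, 4, 1, 2),
         (1, 4, 2, 2), (-3, 4, -2, 2), (-4, 4, -3, 2), (-5, 4, -4, 2),
         (-10, 4, -9, 2)]
for c0, c1, d0, d1 in cases:
    # roots of (c1 + d1) x^2 + (c0 + d0 - d1) x - d0 = 0, i.e., Eq. (2)
    x1, x2 = np.sort(np.roots([c1 + d1, c0 + d0 - d1, -d0]))
    print(f"C = {c0} + {c1}x, D = {d0} + {d1}x:  roots {x1:.3f}, {x2:.3f}")
```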

FIGURE 8

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −4 + 4x and D(x) = −3 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.99 and 0.00, respectively.

FIGURE 9

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −5 + 4x and D(x) = −4 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 1.00 and 0.00, respectively.

FIGURE 10

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −10 + 4x and D(x) = −9 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 1.00 and 0.00, respectively.

FIGURE 11

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 2x and D(x) = −1 + x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.99 and 0.00, respectively.


Let us now change the slopes of both payoff functions simultaneously by varying the value of b in the following formulae:

C(x) = −2 + 2bx,  D(x) = −1 + bx.    (3)

The solutions of Eq. (2) are simply 1/3 and 1/b in this case.

If b < 3, then x1 = 1/3 and x2 = 1/b. Let us choose b = 1, which is the limiting case of this game. This is case (b). All trajectories converge to x1 (Figure 11).

If b = 3, the two solutions coincide: x1 = x2 = 1/3. This is case (e). Three trajectories are shown in Figure 12.

If b > 3, then x1 = 1/b and x2 = 1/3. Let us choose b = 5. This is again case (e), but with only a small region where both payoff functions are negative. Therefore, the trajectories shown in Figure 13 are now similar to those of Figures 3 and 4.
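Substituting Eq. (3) into Eq. (2) shows where these two roots come from; the quadratic factors neatly (a short derivation of ours):

```latex
x(-2 + 2bx) = (1 - x)(-1 + bx)
\;\Longleftrightarrow\; 3bx^{2} - (3 + b)x + 1 = 0
\;\Longleftrightarrow\; (3x - 1)(bx - 1) = 0 ,
```

so the roots are x = 1/3 and x = 1/b for any b > 0.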

FIGURE 12

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 6x and D(x) = −1 + 3x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.40, 0.32, and 0.00, respectively.

FIGURE 13

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 10x and D(x) = −1 + 5x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.40, 0.32, 0.25, and 0.00, respectively.

FIGURE 14

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 4x and D(x) = −1.1 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.80, 0.60, 0.40, 0.20, and 0.00, respectively.

FIGURE 15

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 4x and D(x) = −1.07 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.45, 0.43, and 0.00, respectively.



Finally, we will change the separation of the two payoff functions. We keep the C function unchanged as C = −2 + 4x and vary the D function according to the formula D = −2 + k + 2x. The discriminant of Eq. (2) is then equal to k² + 12k − 12, which is negative when k < 0.928; the roots of Eq. (2) are then both complex. When k = 0, the C curve is entirely above the D curve and perfect cooperation occurs at any value of x0. The trajectories for k = 0.9 are shown in Figure 14. After a hundred iterations, cooperation is approached but not reached.

At k = 0.93 the two solutions are x1 = 0.41 and x2 = 0.44. This is case (e). The trajectories start to follow Eq. (2) (Figure 15).

k = 1 corresponds to the original situation (Figure 2). At k = 1.5 the two solutions are x1 = 0.14 and x2 = 0.61. This is still case (e). The trajectories follow Eq. (2) (Figure 16). Finally, at k = 2 the two solutions are x1 = 0 and x2 = 2/3. This is case (c) and the trajectories behave similarly to those in Figure 3 (Figure 17).
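The threshold 0.928 is simply the positive root of the discriminant. Substituting C(x) = −2 + 4x and D(x) = −2 + k + 2x into Eq. (2) gives (our derivation):

```latex
6x^{2} - (6 - k)x + (2 - k) = 0 ,
\qquad
\Delta = (6 - k)^{2} - 24(2 - k) = k^{2} + 12k - 12 ,
```

and Δ = 0 at k = −6 + 4√3 ≈ 0.928; for smaller k the roots are complex, for larger k they are real.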

When the neighborhood is only a finite number of layers deep, each agent has fewer neighbors whose behavior can influence its reward/penalty. In this case, the trajectories do not strictly follow the predictions of Eq. (2) but have similar tendencies.

To summarize, we can say that if both payoff functions are linear and the real solutions of Eq. (2) are both positive, then the predictions of Eq. (2) are almost always valid. This simple formula provides valuable information about the outcomes of N-person games with linear payoff functions and Pavlovian agents.

FIGURE 16

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 4x and D(x) = −0.5 + 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.62, 0.60, and 0.00, respectively.

FIGURE 17

Evolution of the game for the case when all agents are Pavlovian, the neighborhood is the entire collective of agents, and the payoff curves are C(x) = −2 + 4x and D(x) = 2x. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation ratios from top to bottom curves are 0.80, 0.70, 0.67, and 0.66, respectively.

REFERENCES

1. Szilagyi, M.N.; Szilagyi, Z.C. A tool for simulated social experiments. Simulation 2000, 74, 4–10.
2. Szilagyi, M.N. Solutions to Realistic Prisoner's Dilemma Games. Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference, TA12/2, 841–846.
3. Szilagyi, M.N. An investigation of N-person prisoners' dilemmas. Complex Systems 2003, 14, 155–174.
4. Szilagyi, M.N. Simulation of multi-agent prisoners' dilemmas. Systems Anal Model Simulat 2003, 43, 829–846.
5. Szilagyi, M.N.; Szilagyi, Z.C. Nontrivial solutions to the N-person prisoners' dilemma. Systems Res Behav Sci 2002, 19, 281–290.
6. Szilagyi, M.N.; Somogyi, I. Agent-based simulation of N-person games with crossing payoff functions. Complex Systems 2008, 17, 427–439.
7. Szilagyi, M.N.; Somogyi, I. Agent-based simulation of an N-person game with parabolic payoff functions. Complexity 2010, 15, 50–60.
8. Szilagyi, M.N.; Somogyi, I. A systematic analysis of the N-person chicken game. Complexity 2010, 15, 56–62.
9. Wolfram, S. Cellular Automata and Complexity; Addison-Wesley: Boston, MA, 1994.
10. Hegselmann, R.; Flache, A. Understanding complex social dynamics: A plea for cellular automata based modelling. J Artif Soc Soc Simulat 1998, 1, 3.


11. Narendra, K.S.; Thathachar, M.A.L. Learning Automata (An Introduction); Prentice Hall: Englewood Cliffs, NJ, 1989.
12. Flache, A.; Macy, M.W. Weakness of strong ties: Collective action failure in a highly cohesive group. J Math Sociol 1996, 21, 3–28.
13. Komorita, S.S. A model of the n-person dilemma-type game. J Exp Soc Psychol 1976, 12, 357–373.
14. Nowak, M.A.; May, R.M. Evolutionary games and spatial chaos. Nature 1992, 359, 826–829.
15. Poundstone, W. Prisoner's Dilemma; Doubleday: New York, 1992.
16. Skyrms, B. The Stag Hunt and the Evolution of Social Structure; Cambridge University Press: Cambridge, UK, 2004.
17. Pacheco, J.M.; Santos, F.C.; Souza, M.O.; Skyrms, B. Evolutionary dynamics of collective action in N-person stag hunt dilemmas. Proc R Soc 2009, B276, 315–321.
18. Komorita, S.S.; Parks, C.D. Social Dilemmas; Brown & Benchmark: Madison, WI, 1994.
19. Szilagyi, M.N.; Jallo, M.D. Standing ovation: An attempt to simulate human personalities. Systems Res Behav Sci 2006, 23, 825–838.
20. Thorndike, E.L. Animal Intelligence; Hafner: Darien, CT, 1911.
21. Kraines, D.; Kraines, V. Pavlov and the prisoner's dilemma. Theory Decision 1989, 26, 47–79.
22. Macy, M.W. Pavlov and the evolution of cooperation: An experimental test. Soc Psychol Quart 1995, 58, 74–87.
23. Flache, A.; Hegselmann, R. Rationality vs. learning in the evolution of solidarity networks: A theoretical comparison. Comput Math Organ Theory 1999, 5, 97–127.
