

Physica A 378 (2007) 110–117


Cooperation in networked populations of selfish adaptive agents: Sensitivity to network structure☆

László Gulyás a,b,*

a Department of History and Philosophy of Science, Loránd Eötvös University, 1117 Budapest, Pázmány Péter sétány 1/c, Hungary
b AITIA International Inc., 1039 Budapest, Czetz János utca 48-50, Hungary

Available online 15 December 2006

☆ The partial support of the GVOP-3.2.2-2004.07-005/3.0 (ELTE Informatics Cooperative Research and Education Center) grant of the Hungarian Government is gratefully acknowledged.
* Corresponding author. Department of History and Philosophy of Science, Loránd Eötvös University, 1117 Budapest, Pázmány Péter sétány 1/c, Hungary.
E-mail address: [email protected]. URL: http://hps.elte.hu/~gulya/.
doi:10.1016/j.physa.2006.11.063

Abstract

This paper investigates the adaptation of cooperating strategies in an iterated prisoner's dilemma (IPD) game with individually learning agents, subject to the structure of the interaction network. In particular, we study how cooperation or defection comes to dominate the population on Watts–Strogatz networks, under varying average path lengths. Our results are in good agreement with previous works on discrete choice dynamics on networks, but are in stark contrast with results from the evolution of cooperation literature. We argue that the latter is due to the different adaptation method used (i.e., adaptive learning instead of 'evolutionary' strategy switching).

© 2006 Elsevier B.V. All rights reserved.

Keywords: Cooperation; Network topology; Iterated prisoner’s dilemma; Adaptive learning

1. Introduction

Trust and cooperation among agents in a multi-agent system have several interpretations, including shared preferences, work toward a common goal, dependable expertise and opinions, altruistic behavior, etc. Yet, in most fields of the social sciences, trust is understood in the strategic context, i.e., as the (missing) tendency of the agent to take unilateral advantage over its partners. The classic framework to explore such situations is the (iterated) prisoner's dilemma game (IPD). Following the series of works by Axelrod, this framework is now ubiquitous in many disciplines and forms the basis of many models studying a variety of problems [1]. Within this framework, the success of altruistic cooperation is often identified with the success of the tit-for-tat (TFT) strategy, which cooperates in the first round and then reciprocates the moves of its partner, against the all-defective (ALLD) strategy, which exploits its opponents in all rounds (i.e., never cooperates).

The role of interaction topology (i.e., network structure) has also been studied in IPD settings. In the context of their evolutionary model, Cohen, Axelrod and Riolo (CAR) found that context preservation (i.e., a static network), and not network structure, was the key to the survival (and domination) of cooperative behavior (i.e., agents playing TFT) [2,3]. When network dynamics was allowed, ALLD agents took over the world. While this result has important and appealing interpretations in the social disciplines, it is in stark contrast with what is known about information/behavior cascades and adaptation in complex (social) networks. In the latter field, it is established that the average path length of the network plays an important role in the spreading of information and behavior.

This paper revisits the question of network dependence in IPD games with the aim of resolving the apparent contradiction discussed above. It departs from the classic CAR model by assuming individual learning instead of evolutionary adaptation. Studying this altered model on static Watts–Strogatz networks, it shows that context preservation is not always sufficient for achieving a cooperative regime. Furthermore, and more importantly, the results suggest that the performance of the TFT strategy may depend on the (static) network structure, eliminating the contradiction above.

2. Evolutionary iterated prisoner’s dilemma games on static networks

In the classic prisoner's dilemma (PD) game, two agents face the same binary choice: each has the option to either cooperate (C) or to defect (D), without knowing the choice of the partner. The payoff matrix of this game is shown in Table 1, with the condition that $T > R > P > S$ and $2R > S + T$.¹ That is, choosing D always has a higher expected payoff than choosing C, yet the combined payoff of the two agents is best when both of them cooperate. The deficiency of defection becomes more pronounced when the partners play consecutive games, as the difference $2R - (S + T) > 0$ accumulates. This makes the number of games (iterations, $n_i$) important. IPD games ($n_i > 1$) are interesting if the agents have a memory of the last $n_m < n_i$ actions taken by their partner [4].

In this paper we join the body of work studying one-step memory ($n_m = 1$) deterministic strategies, which are generally described by a sequence of three actions, $A_I$, $A_C$, $A_D$: the action taken initially (when no information about the partner is available), in response to a cooperative action, and in response to a defective action, respectively. Among the eight possible strategies, we focus on the four studied in Cohen et al. [2], as listed in Table 2.²
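To make the setup concrete, the following minimal Python sketch encodes the payoff matrix and the four one-step memory strategies of Table 2 (the function and variable names are our own illustration, not from the paper):

```python
# Prisoner's dilemma payoffs (T > R > P > S, 2R > S + T); standard values of footnote 1.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

# One-step memory strategies as (A_I, A_C, A_D): initial action,
# response to cooperation, response to defection.
STRATEGIES = {'ALLC': ('C', 'C', 'C'), 'TFT': ('C', 'C', 'D'),
              'ATFT': ('D', 'D', 'C'), 'ALLD': ('D', 'D', 'D')}

def act(strategy, partner_last):
    """Return the strategy's action given the partner's last move (None in round 1)."""
    a_i, a_c, a_d = STRATEGIES[strategy]
    if partner_last is None:
        return a_i
    return a_c if partner_last == 'C' else a_d

def play_ipd(s1, s2, n_i=4):
    """Play an n_i-round IPD between strategies s1 and s2; return total payoffs."""
    last1 = last2 = None
    total1 = total2 = 0
    for _ in range(n_i):
        m1, m2 = act(s1, last2), act(s2, last1)
        p1, p2 = PAYOFF[(m1, m2)]
        total1, total2 = total1 + p1, total2 + p2
        last1, last2 = m1, m2
    return total1, total2
```

For example, play_ipd('TFT', 'ALLD') returns (3, 8): TFT is exploited once (S = 0 versus T = 5), and the remaining three rounds yield mutual punishment P = 1.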

2.1. Networked iterated prisoner’s dilemma games

The evolution of cooperation is studied in a 'society' $A$ of more than two agents ($|A| > 2$). In this context, agents playing IPD games have to be 'paired', i.e., the interaction topology of the society must be specified. The most general formulation of the topology is a graph ($G = (A, E)$, where $E \subseteq A \times A$): the nodes are the agents and the links represent pairs playing IPD games. Each agent plays an IPD game with all of its neighbors $E(a)$ independently, but simultaneously. (For simplicity, we assume an undirected network.) Agents are assigned a random initial strategy drawn from a uniform distribution.

At the end of each round (i.e., after each agent has played with all of its neighbors), the agents have the option to change their strategies. In the evolutionary family of models, this is done by observing the profitability of the strategies applied by the agent's neighbors [2,3]. Profitability is expressed as the average payoff per game achieved in the current round by the neighbor in question. Let $PO_a^t$ denote the payoff collected, and $n_a^t$ the number of IPD games played by agent $a$ in round $t$. Then the profitability of agent $a$ at time $t$ is expressed as $P_a^t = PO_a^t / n_a^t$.

The evolutionary adaptation rule assumes that at the end of the round the agents copy the strategy of their most profitable neighbor. If $s_a^t$ stands for the strategy played by agent $a$ in round $t$, then this adaptation rule can be expressed as

$s_a^{t+1} = s^t_{\operatorname{argmax}_{i \in E(a)} P_i^t}$.  (1)

¹ Henceforth, we adopt the following standard values: $T = 5$, $R = 3$, $P = 1$, $S = 0$.
² This is a discretized version of the model discussed by Cohen et al. Their results apply to a wider set of strategies than discussed here.


Table 2
The four one-step memory strategies studied

Name (abbreviation)        Action sequence ($A_I$, $A_C$, $A_D$)
Always cooperates (ALLC)   C, C, C
Tit-for-tat (TFT)          C, C, D
Anti-tit-for-tat (ATFT)    D, D, C
Always defects (ALLD)      D, D, D

Table 1
The payoff matrix of the prisoner's dilemma game, $T > R > P > S$, $2R > S + T$

                       Player 2 cooperates            Player 2 defects
Player 1 cooperates    R (reward), R (reward)         S (sucker), T (temptation)
Player 1 defects       T (temptation), S (sucker)     P (punishment), P (punishment)

Each cell lists Player 1's payoff, Player 2's payoff.


If different strategies score the same maximum value, the agent makes a random choice among them. Notice that this adaptation rule assumes that the agents have the means to discover, ex post, the strategies played by their opponents. However, this does not mean that they have the same capability during the IPD game.
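The evolutionary adaptation rule can then be sketched as follows, reusing play_ipd from the block above (the dict-based representation and helper names are our assumptions; agents are assumed to be indexed by comparable ids and to have at least one neighbor, as on the $z = 4$ networks studied here):

```python
import random

STRATEGY_NAMES = ['ALLC', 'TFT', 'ATFT', 'ALLD']

def random_strategies(agents):
    """Initial strategies drawn from a uniform distribution."""
    return {a: random.choice(STRATEGY_NAMES) for a in agents}

def evolutionary_round(strategy, neighbors, n_i=4):
    """One round of the evolutionary model: every edge plays an n_i-round IPD
    game, then each agent copies the strategy of its most profitable neighbor
    (Eq. (1)), breaking ties at random."""
    payoff = {a: 0.0 for a in strategy}
    games = {a: 0 for a in strategy}
    for a in strategy:
        for b in neighbors[a]:
            if a < b:  # play each undirected edge exactly once
                pa, pb = play_ipd(strategy[a], strategy[b], n_i)
                payoff[a] += pa; payoff[b] += pb
                games[a] += 1; games[b] += 1
    # Profitability P_a^t = PO_a^t / n_a^t: average payoff per game this round.
    profit = {a: payoff[a] / games[a] for a in strategy}
    new_strategy = {}
    for a in strategy:
        best = max(profit[b] for b in neighbors[a])
        new_strategy[a] = random.choice(
            [strategy[b] for b in neighbors[a] if profit[b] == best])
    return new_strategy
```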

2.2. The evolution of cooperation

Cohen et al. study how cooperation (cf. trust) can evolve spontaneously in a population of selfish agents, using the networked IPD framework with the evolutionary adaptation rule. They explore various interaction network topologies under the constraints of a fixed average number of neighbors, $z = 4$, and $n_i = 4$.

In such models, all monopolistic configurations (i.e., where the entire population follows the same strategy) are trivially stable, but numerical experiments show that, starting from a randomized initial configuration, the two prevalent outcomes are 100% ALLD and 100% TFT. That is, cooperation (i.e., an all-TFT configuration) can indeed emerge spontaneously. More importantly, this is a rather common outcome, and the key to it appears to be context preservation. In other words, the exact structure of the network is not important as long as the interaction topology is static.³

Fig. 1 demonstrates the typical time trajectory of the model on static networks. Clearly, ALLD and TFT are the major contenders, the former showing a much better start. However, there is a tipping point after which TFT is unbeatable.

2.3. Evolutionary dynamics on Watts–Strogatz networks

In the following we focus on Watts–Strogatz networks, which start from a regular D-dimensional lattice with periodic boundary conditions and a k-neighborhood (i.e., the agents are connected to their neighbors at most k steps away on the lattice). Then, with probability pRew, each connection is 'rewired', i.e., replaced by a random link subject to the following constraints: neither self-loops nor multiple links between the same two nodes are allowed [5]. In particular, we study two-dimensional Watts–Strogatz networks ($D = 2$) with 1-neighborhood ($k = 1$). The motivation for the latter is to keep the average number of neighbors ($z = 4$) consistent with the values studied in the Cohen et al. papers [2,3].

³ Admittedly, these results show a certain dependency on the relation among the exact values of $T$, $S$, $P$, $R$, $n_i$ and $z$.
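A minimal sketch of this network construction, under the same illustrative conventions as the previous blocks (a side × side torus with $k = 1$, so $z = 4$):

```python
import random

def watts_strogatz_2d(side, p_rew):
    """2D Watts-Strogatz graph on a side x side torus with k = 1 neighborhood
    (z = 4); each lattice edge is rewired with probability p_rew, disallowing
    self-loops and duplicate links [5]."""
    n = side * side
    node = lambda x, y: (x % side) * side + (y % side)
    edges = set()
    for x in range(side):
        for y in range(side):
            a = node(x, y)
            # Right and down neighbors enumerate each torus edge exactly once.
            for b in (node(x + 1, y), node(x, y + 1)):
                edges.add((min(a, b), max(a, b)))
    for e in list(edges):
        if random.random() < p_rew:
            a, _ = e
            edges.remove(e)
            while True:  # keep one endpoint, redraw the other
                d = random.randrange(n)
                new = (min(a, d), max(a, d))
                if d != a and new not in edges:
                    edges.add(new)
                    break
    return edges

def to_neighbors(edges, n):
    """Adjacency lists E(a), as expected by evolutionary_round above."""
    nb = {a: [] for a in range(n)}
    for a, b in edges:
        nb[a].append(b); nb[b].append(a)
    return nb
```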

Fig. 2. Evolutionary adaptation's independence of network structure on Watts–Strogatz networks after 10³ rounds (Q versus pRew for system sizes N = 100–10 000). (Numerical replications. The points are averages over 10 network instances for each parameter value and 10 different runs for each network. The values for the N = 100 case show an averaging effect: all individual simulations converge to either 0 or 1.0.)

Fig. 1. Example run of the evolutionary model on an N = 256 torus. The time series show the ratio of agents playing each strategy (#ALLD/N, #ATFT/N, #TFT/N, #ALLC/N) versus simulated time.


Fig. 2 demonstrates that the evolution of cooperation outcome is independent of the network structure on Watts–Strogatz graphs under the evolutionary adaptation method. On the horizontal axis, pRew is varied from 0 to 1. For each value, 10 different random instances of the appropriate Watts–Strogatz network class are created, and each of them is tested with 10 different random initial configurations. After 10³ iterations, we count the number of agents playing the non-cooperative ALLD strategy and normalize the result by the system size. The vertical axis of Fig. 2 shows the average of the values obtained from the 10 × 10 simulations, defined as

$Q = \left\langle \frac{1}{N} \sum_{a \in A} \chi\left(s_a^{1000} = \mathrm{ALLD}\right) \right\rangle$,  (2)

where $\chi(\cdot)$ yields 1 for true statements and 0 otherwise.
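In code, the quantity averaged in Eq. (2) is simply the ALLD fraction at round 1000; a one-line sketch under the dict representation used above:

```python
def alld_fraction(strategy):
    """Fraction of agents playing ALLD (the term averaged in Eq. (2))."""
    return sum(s == 'ALLD' for s in strategy.values()) / len(strategy)

# Q averages this fraction over the 10 x 10 (network instance, initial
# configuration) replications for each pRew value.
```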


3. Individually inductive adaptation

The strong network independence of the emergence-of-cooperation result carries a strong message in the context of cooperation and trust in society. However, it is in stark contrast with the growing body of results in the network dynamics literature, where discrete choice dynamics were found to depend on network topology, and especially on the average path length $l$ of the network [6–9,12]. Our working hypothesis is that the difference is caused by the sharp, step-like nature of adaptation in the evolutionary setting: agents switch from ALLD to TFT after a single round of underperformance. In order to investigate this hypothesis, we designed a model using an individually inductive adaptation rule that 'smooths out' the strategy change.

For the sake of the new adaptation rule, we need to redefine our basic model. The agents of our model will continue to play one of the four one-step memory ($n_m = 1$) strategies as above in each round. However, they will probabilistically pick one of these strategies at the beginning of each round and play this fixed strategy against all of their opponents.⁴ Let $p_a^t(s)$ denote the probability that agent $a$ plays strategy $s \in \{\mathrm{ALLC}, \mathrm{TFT}, \mathrm{ATFT}, \mathrm{ALLD}\}$ at time $t$. Naturally, $\sum_s p_a^t(s) = 1$ for all $a \in A$ and $t \geq 0$.

In order to approximate the random initial strategy configuration of the evolutionary adaptation model, agents are assigned a special initial strategy probability distribution. In this distribution, one strategy has a definite advantage ($p \geq 0.25$, set to 0.3 in our experiments) and the complementary probability is distributed equally among the three remaining strategies (see Table 3). The random initial configuration is created in such a way that an approximately equal number of agents will have a bias for each strategy.
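A sketch of this biased initialization (the exact assignment scheme is our assumption; the paper only requires that approximately equal numbers of agents be biased toward each strategy):

```python
import random

def biased_initial(agents, p=0.3):
    """Assign each agent a probability vector with weight p on one 'favored'
    strategy and (1 - p)/3 on each of the other three (Table 3); favored
    strategies are spread evenly over the shuffled population."""
    strategies = ['ALLC', 'TFT', 'ATFT', 'ALLD']
    agents = list(agents)
    random.shuffle(agents)
    dist = {}
    for i, a in enumerate(agents):
        fav = strategies[i % 4]
        dist[a] = {s: p if s == fav else (1 - p) / 3 for s in strategies}
    return dist
```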

At the end of the round, agents have the option to adapt to their more successful neighbors, just like in the evolutionary case. However, instead of copying their opponents directly, the agents only change their probabilities for the four strategies. They increase the weight of the strategy of their best performing neighbor by a constant $c_A$, and the weights of the strategies are then renormalized to form a probability distribution again. That is, if $s_M^t = s^t_{\operatorname{argmax}_{i \in E(a)} P_i^t}$ stands for the best strategy in the neighborhood, then the new strategy probability distribution of agent $a$ will be the following:

$p_a^{t+1}(s) = \begin{cases} \dfrac{p_a^t(s) + c_A}{1 + c_A}, & s = s_M^t, \\[1ex] \dfrac{p_a^t(s)}{1 + c_A}, & s \neq s_M^t. \end{cases}$  (3)

Notice that this adaptation rule has a nice convergence property when subjected to repeated reinforcement, as demonstrated in Fig. 3.
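The update of Eq. (3) and its convergence under repeated reinforcement can be sketched as follows (names are ours):

```python
def reinforce(p, best, c_a=0.1):
    """Eq. (3): add weight c_a to the best neighbor's strategy, then renormalize."""
    return {s: (w + c_a if s == best else w) / (1 + c_a) for s, w in p.items()}

# Repeated reinforcement of one strategy drives its probability toward 1,
# the convergence behavior shown in Fig. 3.
p = {'ALLC': 0.3, 'TFT': 0.7 / 3, 'ATFT': 0.7 / 3, 'ALLD': 0.7 / 3}
for _ in range(40):
    p = reinforce(p, 'TFT')
print(round(p['TFT'], 3))  # 0.983 -- approaching 1.0
```

The fixed point is easy to see: $p^* = (p^* + c_A)/(1 + c_A)$ holds only at $p^* = 1$.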

3.1. Sensitivity to network structure

In contrast to the model with the evolutionary adaptation rule, when agents follow individually inductive adaptation the aggregate outcome is highly dependent on network structure, in much the same way as other discrete choice dynamics on networks. Cooperation (i.e., TFT) is successful when agents in the network are 'far apart', while on small worlds (i.e., when the average agent-to-agent distance $l$ scales with $\log N$) ALLD takes over.

In Fig. 4 we observe a phase shift from the TFT-dominated large worlds to the ALLD-dominated small worlds as the rewiring probability pRew of the Watts–Strogatz network increases. The Q ratio of ALLD domination scales as a linear function $f(N) = aN + b$ of the system size. The inset illustrates the collapse of $l$ as a function of pRew.

3.2. Sensitivity to learning speed

The results with the individually inductive adaptation model suggest that context preservation, in itself, is not sufficient to explain the ubiquitous occurrence of cooperation in real systems. Furthermore, they seem to imply that cooperation requires large network diameters, while it is known that essentially all social networks are small worlds [10]. However, individually inductive adaptation is sensitive not only to network structure, but also to adaptation speed.

⁴ Therefore, the proposed scheme is not a mixed strategy.

Fig. 3. The convergence property of the strategy adaptation rule for $c_A = 0.1$: the probability of two strategies versus the number of reinforcing steps (1–40), where one of the strategies is repeatedly reinforced.

Table 3
Initial distribution of strategy probabilities for the four strategy classes of the evolutionary adaptation model (p is a model parameter)

Bias           $p_a^0(\mathrm{ALLC})$   $p_a^0(\mathrm{TFT})$   $p_a^0(\mathrm{ATFT})$   $p_a^0(\mathrm{ALLD})$
Former ALLC    p              (1 - p)/3      (1 - p)/3      (1 - p)/3
Former TFT     (1 - p)/3      p              (1 - p)/3      (1 - p)/3
Former ATFT    (1 - p)/3      (1 - p)/3      p              (1 - p)/3
Former ALLD    (1 - p)/3      (1 - p)/3      (1 - p)/3      p



Fig. 5 shows the Q dominance of ALLD outcomes as a function of $c_A$ and pRew for N = 400. (We obtained similar results for other system sizes as well.) Clearly, quicker adaptation yields higher levels of cooperation. Moreover, there appears to be a threshold above which cooperation emerges independently of network diameter.

4. Conclusions and related works

The emergence of cooperation in iterated prisoner's dilemma games has been the subject of intensive research over several decades. Yet, the study of the effect of interaction (network) topology is a relatively novel direction, started by the seminal work of Cohen et al. [2–5]. As discussed earlier, they found that cooperation depends not on the exact network structure, but on its stability. We argued that this result is in contrast with what is found in essentially all models of discrete choice dynamics on networks. Therefore, we proposed a modified model, where evolutionary adaptation is replaced by individually inductive strategy updates. Despite recent interest in dynamic networks, we limited our study to static networks, in order to focus on the context dependency issue emphasized by Cohen et al. Furthermore, it has been shown that heterogeneity in the number of connections the agents have may play a significant role in the emerging level of cooperation, subject to the additivity of the utility model [1,11]. Even though the models discussed in this paper use non-additive utilities, we took the conservative approach and compared the models on networks with near-homogeneous degree distributions and with similar values of $z$.


Fig. 5. Sensitivity to learning speed (N = 400). Numerical results showing the Q dominance of ALLD outcomes after 10³ rounds versus $c_A$ (0.05–0.9) and pRew. Averages over 10 network instances for each pRew and 10 different runs for each network per parameter value.

Fig. 4. Sensitivity to network structure with the individually inductive adaptation rule ($c_A = 0.1$). Numerical results showing the Q dominance of ALLD outcomes after 10³ rounds, scaled by the linear function $f(N) = aN + b$ with $a = 5 \times 10^{-4}$, $b = 7.8$ (see text); Q/f(N) versus pRew for N = 100–10 000. Inset: average path length $\langle l \rangle$ versus pRew in Watts–Strogatz networks. (Numerical results. Averages over 10 network instances for each parameter value.)


Using numerical simulations of the proposed new model, we found that with the 'smoother' adaptation rule, the emergence of cooperation is indeed dependent on network structure. In particular, TFT is only successful in 'large worlds', i.e., when the network diameter increases faster than the logarithm of the system size. We also found that the shift from TFT domination to ALLD prevalence scales as a linear function $f(N)$ of the number of agents. Moreover, sensitivity to network structure is found to depend on the learning speed ($c_A$) as well, such that sufficiently fast individual learning can even overcome network dependence. This suggests that the spontaneous emergence of cooperative behavior observed in the typically small worlds of real social networks may be a consequence of the specific abilities of human learning. This is an interesting theoretical direction, which we intend to explore further.

References

[1] G. Szabó, G. Fáth, Evolutionary games on graphs, arXiv:cond-mat/0607344.
[2] M.D. Cohen, R.L. Riolo, R. Axelrod, The role of social structure in the maintenance of cooperative regimes, Rationality Soc. 13 (2001) 5–32.
[3] R. Axelrod, R.L. Riolo, M.D. Cohen, Beyond geography: cooperation with persistent links in the absence of clustered neighborhoods, Pers. Soc. Psychol. Rev. 6 (2002) 341–346.
[4] F. Schweitzer, R. Mach, H. Mühlenbein, Agents with heterogeneous strategies interacting in a spatial IPD, in: T. Lux, S. Reitz, E. Samanidou (Eds.), Nonlinear Dynamics and Heterogeneous Interacting Agents, Lecture Notes in Economics and Mathematical Systems, vol. 550, Springer, Berlin, 2005, pp. 87–102.
[5] D. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness, Princeton University Press, Princeton, NJ, 1999.
[6] L. Gulyás, E.R. Dugundji, Discrete choice on networks: an agent-based approach, North American Association for Computational Social and Organizational Science Conference (NAACSOS 2003), Omni William Penn, Pittsburgh, PA, June 22–25, 2003.
[7] E.R. Dugundji, L. Gulyás, Socio-dynamic discrete choice on networks in space: impacts of agent heterogeneity on emergent outcomes, Workshop on Agent-based Models of Market Dynamics and Consumer Behaviour, Institute of Advanced Studies, University of Surrey, January 17–18, 2006.
[8] C.P. Herrero, Ising model in small-world networks, Phys. Rev. E 65 (2002) 066110.
[9] T. Nikoletopoulos, A.C.C. Coolen, I. Pérez Castillo, N.S. Skantzos, J.P.L. Hatchett, B. Wemmenhove, Replicated transfer matrix analysis of Ising spin models on 'small world' lattices, J. Phys. A 37 (2004) 6455–6475.
[10] M. Newman, Models of the small world: a review, J. Stat. Phys. 101 (2000) 819–841.
[11] F.C. Santos, J.M. Pacheco, T. Lenaerts, Evolutionary dynamics of social dilemmas in structured heterogeneous populations, Proc. Natl. Acad. Sci. 103 (9) (2006) 3490–3494.
[12] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1998) 440–442.