
Concepts of Game Theory I

2

What are Multi-Agent Systems?

[Figure: agents in an environment, with labels: organisational relationship, interaction, agent, environment, spheres of influence]

3

A Multi-Agent System Contains:
A number of agents that
• interact through communication
• are able to act in an environment
• have different “spheres of influence” (which may coincide)
• will be linked by other (organisational) relationships.

4

Utilities of agents (1)
• Assume that we have just two agents: AG = {i, j}
• Agents are assumed to be self-interested:
– They have preferences over environmental states

5

Utilities of agents (2)
• Assume that there is a set of “outcomes” that agents have preferences over: Ω = {ω1, ω2, …}
– Example: odd-or-even game (alternative to head-or-tail): Ω = {(0,0),…,(0,5),(1,0),…,(1,5),…,(5,0),…,(5,5)}
• These preferences are captured by utility functions: ui : Ω → ℝ, uj : Ω → ℝ
– Example: odd-or-even game (alternative to head-or-tail):
ueven((0,0)) = 1, ueven((0,1)) = 0, ueven((0,2)) = 1, …
uodd((0,0)) = 0, uodd((0,1)) = 1, uodd((0,2)) = 0, …
Or, more simply,
ueven((m,n)) = 1 if m + n is an even number; otherwise 0
uodd((m,n)) = 0 if m + n is an even number; otherwise 1
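The odd-or-even utilities above can be written out as a small Python sketch. The names OUTCOMES, u_even and u_odd are illustrative choices, not part of the slides, and the outcomes are taken to be pairs of integers from 0 to 5, as in the example set.

```python
from itertools import product

# Outcome set: every pair (m, n) with m and n between 0 and 5.
OUTCOMES = list(product(range(6), repeat=2))

def u_even(outcome):
    """Utility of the 'even' player: 1 if m + n is even, otherwise 0."""
    m, n = outcome
    return 1 if (m + n) % 2 == 0 else 0

def u_odd(outcome):
    """Utility of the 'odd' player: 0 if m + n is even, otherwise 1."""
    m, n = outcome
    return 0 if (m + n) % 2 == 0 else 1

print(len(OUTCOMES))                    # 36
print(u_even((0, 2)), u_odd((0, 2)))    # 1 0
```

Exactly one of the two players gets utility 1 in every outcome, so the two utilities always sum to 1.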

6

Utilities of agents (3)
• Utility functions lead to preference orderings over outcomes:
ω ≽i ω′ means ui(ω) ≥ ui(ω′)
ω ≽j ω′ means uj(ω) ≥ uj(ω′)
• But, what is utility?
• In some domains, utility is analogous to money; e.g. we could have a relationship like this:
[Figure: graph relating money and utility]

7

Agent Encounters
• To investigate agent encounters we need a model of the environment in which agents act:
– agents simultaneously choose an action to perform;
– the actions they select will result in an outcome ω ∈ Ω;
– the actual outcome depends on the combination of actions.
• Assume each agent has just two possible actions it can perform:
– C (“cooperate”)
– D (“defect”).

8

The State Transformer Function
• Let’s formalise environment behaviour as: τ : Aci × Acj → Ω
(the first argument is the action of agent i, the second the action of agent j)
• Some possibilities:
– Environment is sensitive to the actions of both agents:
τ(D,D) = ω1, τ(D,C) = ω2, τ(C,D) = ω3, τ(C,C) = ω4
– Neither agent has influence on the environment:
τ(D,D) = τ(D,C) = τ(C,D) = τ(C,C) = ω1
– The environment is controlled by agent j:
τ(D,D) = ω1, τ(D,C) = ω2, τ(C,D) = ω1, τ(C,C) = ω2
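The three environments can be sketched as lookup tables. This is a minimal Python illustration, assuming each key is the pair (action of i, action of j) and writing the outcomes as the strings "w1" to "w4". The helper influences is a hypothetical name introduced here to test whether an agent can change the outcome on its own.

```python
ACTIONS = ("C", "D")

# Environment sensitive to the actions of both agents.
tau_both = {("D", "D"): "w1", ("D", "C"): "w2", ("C", "D"): "w3", ("C", "C"): "w4"}

# Neither agent has influence: every action pair leads to the same outcome.
tau_none = {(a_i, a_j): "w1" for a_i in ACTIONS for a_j in ACTIONS}

# Environment controlled by agent j: only the second action matters.
tau_j = {("D", "D"): "w1", ("D", "C"): "w2", ("C", "D"): "w1", ("C", "C"): "w2"}

def influences(tau, agent):
    """True if the agent can change the outcome while the other agent's action is fixed."""
    for fixed in ACTIONS:
        if agent == "i":
            outcomes = {tau[(a, fixed)] for a in ACTIONS}
        else:
            outcomes = {tau[(fixed, a)] for a in ACTIONS}
        if len(outcomes) > 1:
            return True
    return False

print(influences(tau_both, "i"), influences(tau_both, "j"))  # True True
print(influences(tau_none, "i"), influences(tau_none, "j"))  # False False
print(influences(tau_j, "i"), influences(tau_j, "j"))        # False True
```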

9

Rational Action (1)
• Suppose an environment in which both agents can influence the outcome, with these utility functions:
ui(ω1) = 1, ui(ω2) = 1, ui(ω3) = 4, ui(ω4) = 4
uj(ω1) = 1, uj(ω2) = 4, uj(ω3) = 1, uj(ω4) = 4
• Including choices made by the agents:
ui(τ(D,D)) = 1, ui(τ(D,C)) = 1, ui(τ(C,D)) = 4, ui(τ(C,C)) = 4
uj(τ(D,D)) = 1, uj(τ(D,C)) = 4, uj(τ(C,D)) = 1, uj(τ(C,C)) = 4

10

Rational Action (2)
• Then, the preferences of agent i are:
(C,C) ≽i (C,D) ≻i (D,C) ≽i (D,D)
• “C” is the rational choice for i:
– Agent i prefers outcomes that arise through C over all outcomes that arise through D.
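A short Python check of agent i's preferences, using the utilities from Rational Action (1). Keys are assumed to be (action of i, action of j). The helper names are illustrative, and reading the ordering as weak, strict, weak is an assumption that is consistent with those utilities.

```python
# Utilities of agent i over action pairs (action of i, action of j).
u_i = {("D", "D"): 1, ("D", "C"): 1, ("C", "D"): 4, ("C", "C"): 4}

def weakly_prefers(u, w1, w2):
    """w1 is at least as good as w2."""
    return u[w1] >= u[w2]

def strictly_prefers(u, w1, w2):
    """w1 is strictly better than w2."""
    return u[w1] > u[w2]

# Assumed reading of the ordering: (C,C) >= (C,D) > (D,C) >= (D,D).
chain_holds = (
    weakly_prefers(u_i, ("C", "C"), ("C", "D"))
    and strictly_prefers(u_i, ("C", "D"), ("D", "C"))
    and weakly_prefers(u_i, ("D", "C"), ("D", "D"))
)

# Utilities i can obtain by playing C, and by playing D.
via_c = [u for (a_i, _), u in u_i.items() if a_i == "C"]
via_d = [u for (a_i, _), u in u_i.items() if a_i == "D"]

# C is the rational choice if its worst case beats the best case of D.
c_is_rational = min(via_c) > max(via_d)

print(chain_holds, c_is_rational)  # True True
```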

11

Pay-off Matrices
• We can characterise this scenario (& similar scenarios) as a pay-off matrix:

                 i defects        i cooperates
j defects        ui = 1, uj = 1   ui = 4, uj = 1
j cooperates     ui = 1, uj = 4   ui = 4, uj = 4

• Agent i is the column player
• Agent j is the row player
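One possible way to hold such a pay-off matrix in code is a nested dictionary indexed first by the row player's action and then by the column player's action. This is only a sketch: the name payoff and the layout are illustrative assumptions, and the utilities are those of the Rational Action scenario.

```python
ACTIONS = ("D", "C")

# Utilities over action pairs (action of i, action of j).
u_i = {("D", "D"): 1, ("D", "C"): 1, ("C", "D"): 4, ("C", "C"): 4}
u_j = {("D", "D"): 1, ("D", "C"): 4, ("C", "D"): 1, ("C", "C"): 4}

# payoff[a_j][a_i] = (utility of i, utility of j):
# rows are agent j's actions, columns are agent i's actions.
payoff = {
    a_j: {a_i: (u_i[(a_i, a_j)], u_j[(a_i, a_j)]) for a_i in ACTIONS}
    for a_j in ACTIONS
}

for a_j in ACTIONS:
    print(a_j, [payoff[a_j][a_i] for a_i in ACTIONS])
# D [(1, 1), (4, 1)]
# C [(1, 4), (4, 4)]
```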

12

Dominant Strategies
• Given any particular strategy s (either C or D) for agent i, there will be a number of possible outcomes
• s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2
• A rational agent will never play a strategy that is dominated by another strategy
– However, there isn’t always a unique strategy that dominates all other strategies…
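The dominance test stated above can be sketched in Python. The function dominates is a hypothetical helper, and “preferred over” is read here as strictly higher utility, which is an assumption. Utilities are keyed by (action of i, action of j) and taken from the Rational Action scenario.

```python
ACTIONS = ("C", "D")

def dominates(u, s1, s2, agent):
    """True if every outcome reachable with s1 is strictly better for the
    agent than every outcome reachable with s2 (assumed strict reading)."""
    def reachable(s):
        if agent == "i":
            return [u[(s, other)] for other in ACTIONS]
        return [u[(other, s)] for other in ACTIONS]
    # Worst case of s1 must beat the best case of s2.
    return min(reachable(s1)) > max(reachable(s2))

u_i = {("D", "D"): 1, ("D", "C"): 1, ("C", "D"): 4, ("C", "C"): 4}
u_j = {("D", "D"): 1, ("D", "C"): 4, ("C", "D"): 1, ("C", "C"): 4}

print(dominates(u_i, "C", "D", "i"))  # True
print(dominates(u_j, "C", "D", "j"))  # True
print(dominates(u_i, "D", "C", "i"))  # False
```

Comparing the worst case of s1 with the best case of s2 is equivalent to comparing every pair of outcomes, and keeps the check short.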

13

Nash Equilibrium
• Two strategies s1 and s2 are in Nash Equilibrium if:
– under the assumption that agent i plays s1, agent j can do no better than play s2; and
– under the assumption that agent j plays s2, agent i can do no better than play s1.
• Neither agent has any incentive to deviate from a Nash equilibrium!
• Unfortunately:
– Not every interaction has a Nash equilibrium (in pure strategies)
– Some interactions have more than one Nash equilibrium…

[Photo: John Forbes Nash, Jr., http://www.math.princeton.edu/jfnj/]
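The two Nash conditions can be checked by brute force over all action pairs. The following Python sketch covers pure strategies only. The function name and the two small example games (a matching game and a coordination game over the actions "H" and "T") are hypothetical illustrations, not taken from the slides.

```python
from itertools import product

def pure_nash_equilibria(u_i, u_j, actions):
    """Return all pairs (a_i, a_j) from which neither agent gains by deviating alone."""
    equilibria = []
    for a_i, a_j in product(actions, repeat=2):
        i_ok = all(u_i[(a_i, a_j)] >= u_i[(alt, a_j)] for alt in actions)
        j_ok = all(u_j[(a_i, a_j)] >= u_j[(a_i, alt)] for alt in actions)
        if i_ok and j_ok:
            equilibria.append((a_i, a_j))
    return equilibria

ACTIONS = ("H", "T")
PAIRS = list(product(ACTIONS, repeat=2))

# Hypothetical matching game: i wins when the actions match, j wins otherwise.
match_i = {p: 1 if p[0] == p[1] else 0 for p in PAIRS}
match_j = {p: 1 - match_i[p] for p in PAIRS}

# Hypothetical coordination game: both agents win when the actions match.
coord = {p: 1 if p[0] == p[1] else 0 for p in PAIRS}

print(pure_nash_equilibria(match_i, match_j, ACTIONS))  # []
print(pure_nash_equilibria(coord, coord, ACTIONS))      # [('H', 'H'), ('T', 'T')]
```

The first game illustrates an interaction without a pure-strategy equilibrium, the second one with more than one.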

14

Competitive and Zero-Sum Interactions
• When preferences of agents are diametrically opposed we have strictly competitive scenarios
• Zero-sum encounters have utilities which sum to zero:
∀ω ∈ Ω, ui(ω) + uj(ω) = 0
– Zero sum implies strictly competitive
• Zero-sum encounters in real life are very rare
– However, people tend to act in many scenarios as if they were zero sum.
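Both properties can be tested mechanically. In this Python sketch, “strictly competitive” is read as: i strictly prefers ω to ω′ exactly when j strictly prefers ω′ to ω, which is an assumed formalisation of “diametrically opposed”. The utilities u_even and u_odd follow the odd-or-even example. The rescaled versions v_even and v_odd (win = 1, lose = -1) are hypothetical.

```python
from itertools import product

OUTCOMES = list(product(range(6), repeat=2))

def is_zero_sum(u_i, u_j, outcomes):
    """True if the two utilities sum to zero in every outcome."""
    return all(u_i(w) + u_j(w) == 0 for w in outcomes)

def is_strictly_competitive(u_i, u_j, outcomes):
    """True if i strictly prefers a to b exactly when j strictly prefers b to a."""
    return all(
        (u_i(a) > u_i(b)) == (u_j(b) > u_j(a))
        for a, b in product(outcomes, repeat=2)
    )

def u_even(w):
    return 1 if sum(w) % 2 == 0 else 0

def u_odd(w):
    return 1 - u_even(w)

def v_even(w):
    # Hypothetical rescaling: a win is worth 1, a loss -1.
    return 1 if sum(w) % 2 == 0 else -1

def v_odd(w):
    return -v_even(w)

print(is_zero_sum(u_even, u_odd, OUTCOMES))              # False
print(is_strictly_competitive(u_even, u_odd, OUTCOMES))  # True
print(is_zero_sum(v_even, v_odd, OUTCOMES))              # True
```

With utilities of 1 and 0 the odd-or-even game is strictly competitive but sums to 1 rather than 0. Only the rescaled version is zero-sum in the literal sense.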

15

The Prisoner’s Dilemma
• Two people are collectively charged with a crime
– Held in separate cells
– No way of meeting or communicating
• They are told that:
– if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years;
– if both confess, both will be jailed for two years;
– if neither confesses, both will be jailed for one year.

[Photo: Albert W. Tucker]

16

The Prisoner’s Dilemma Pay-Off Matrix
• Defect = confess; Cooperate = not confess
• Numbers in pay-off matrix are not years in jail
• They capture how good an outcome is for the agents
– The shorter the jail term, the better
• The utilities thus are:
ui(D,D) = 2, ui(D,C) = 5, ui(C,D) = 0, ui(C,C) = 3
uj(D,D) = 2, uj(D,C) = 0, uj(C,D) = 5, uj(C,C) = 3
• The preferences are:
(D,C) ≻i (C,C) ≻i (D,D) ≻i (C,D)
(C,D) ≻j (C,C) ≻j (D,D) ≻j (D,C)

17

The Prisoner’s Dilemma Pay-Off Matrix

                 i defects        i cooperates
j defects        ui = 2, uj = 2   ui = 0, uj = 5
j cooperates     ui = 5, uj = 0   ui = 3, uj = 3

Defect = confess
Coop = not confess

• Top left: both defect, both get utility 2 (two years in jail each).
• Top right: i cooperates and j defects, i gets the sucker’s pay-off of 0, while j gets 5.
– Bottom left is the opposite
• Bottom right: reward for mutual cooperation (utility 3 each).
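The dilemma can be made explicit with a short Python sketch over the utilities listed for the Prisoner’s Dilemma. Keys are assumed to be (action of i, action of j), and the helper names are illustrative. The code computes each agent’s best response to every action of the other, then the action pairs that are mutual best responses.

```python
ACTIONS = ("C", "D")

# Utilities over action pairs (action of i, action of j).
u_i = {("D", "D"): 2, ("D", "C"): 5, ("C", "D"): 0, ("C", "C"): 3}
u_j = {("D", "D"): 2, ("D", "C"): 0, ("C", "D"): 5, ("C", "C"): 3}

def best_responses_i(a_j):
    """Actions of i that maximise u_i when j plays a_j."""
    best = max(u_i[(a, a_j)] for a in ACTIONS)
    return {a for a in ACTIONS if u_i[(a, a_j)] == best}

def best_responses_j(a_i):
    """Actions of j that maximise u_j when i plays a_i."""
    best = max(u_j[(a_i, a)] for a in ACTIONS)
    return {a for a in ACTIONS if u_j[(a_i, a)] == best}

# Action pairs in which each action is a best response to the other.
equilibria = [
    (a_i, a_j)
    for a_i in ACTIONS
    for a_j in ACTIONS
    if a_i in best_responses_i(a_j) and a_j in best_responses_j(a_i)
]

print(best_responses_i("C"), best_responses_i("D"))  # {'D'} {'D'}
print(equilibria)                                    # [('D', 'D')]
print(u_i[("C", "C")] > u_i[("D", "D")])             # True
```

Defecting is the best response to either action of the other agent, so (D,D) is the only pair of mutual best responses, even though both agents would be better off with (C,C).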
