slide 1 of 24 bayesian games matthew h. henry november 10, 2004 references 1.axlerod, robert. 1987....

Bayesian Games

Matthew H. Henry

November 10, 2004

References

1. Axelrod, Robert. 1987. “The evolution of strategies in iterated prisoner’s dilemma.” Genetic Algorithms and Simulated Annealing. (ed. D. Davis) London: Pitman, pp. 32-43.

2. Gibbons, Robert. 1992. Game Theory for Applied Economists. Princeton, New Jersey: Princeton University Press.

3. Harsanyi, John C. 1967. “Games with Incomplete Information Played by Bayesian Players, Parts I, II and III.” Management Science 14:159-182, 320-334, 486-502.

4. Sigmund, Karl. 1993. Games of Life – Explorations in Ecology, Evolution, and Behaviour. Oxford, England: Oxford University Press.

Outline

• Static Games with Bayesian Players

– Example: Scalping Tickets

– Nash Equilibria for Matrix Games with Incomplete Information: Generals

– Nash Equilibria for Games with Asymmetric Information: Cournot Model

– Nash Equilibria for Games with Continuous Type Space: Auction

• Dynamic Games with Bayesian Players

– Perfect Bayesian Equilibrium for Games with Incomplete or Imperfect Information

– Example: 3-Player Game Tree

• Signaling Games

– Perfect Bayesian Equilibrium for Signaling Games

– Example: Job Market Signaling

Static Games with Incomplete Information

• Static games

– Players move simultaneously

– No observation of opponent move history

• Games with incomplete information

– One or more players lack full information regarding the payoff functions and strategies available

– We shall limit the information deficit to the player state (or type) knowledge

• Player type implies (and is implied by) payoff function

• Matrix games will have a unique payoff matrix for each player type match-up

Example: Scalping Tickets

• For example, consider a scenario in which you and the Cavalier are each

scalping tickets for beer money before the UVa-Miami football game

• For every discrete round of the game, each player assumes one of two types and can take one of two actions (stand in one of two locations)

– Types: Buyer or Seller

– Locations: in front of Durty Nellie’s Pub or at the Fry’s Spring Garage

• You know your own type (buying or selling), and you believe that the Cavalier is buying with probability p and selling with probability 1-p

• Four payoff matrices for the four possible type match-ups

• Choose a spot to maximize profit (Durty Nellie’s or Fry’s Spring Garage) based on your type and your best guess of the Cavalier’s type

A Better Example from Harsanyi

• Consider two Generals A and B

– A seeks to maximize (maxmin) payoff and B seeks to minimize (minmax) payoff

– Fixed action profiles: (a1, a2) and (b1, b2)

– Each leads an army which assumes one of two states: Strong or Weak

• This yields four possible match-ups – (AS, BS), (AS, BW), (AW, BS), (AW, BW) –

with corresponding payoff matrices, each having its own Nash equilibrium:

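Payoffs to A (rows are A's actions a1 and a2; columns are B's actions b1 and b2):

(AS, BS)      b1     b2
   a1          2      5
   a2         -1     20

(AS, BW)      b1     b2
   a1        -24    -36
   a2          0     24

(AW, BS)      b1     b2
   a1         28     15
   a2         40      4

(AW, BW)      b1     b2
   a1         12     20
   a2          2     13

[Harsanyi]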
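The equilibrium of each match-up matrix can be checked mechanically. The following is a minimal sketch, assuming each matrix lists A's actions a1, a2 as rows and B's actions b1, b2 as columns, with A maximizing and B minimizing. The names `MATCHUPS` and `saddle_points` are illustrative and not part of the original slides.

```python
# Payoffs to General A for each match-up (rows: a1, a2; columns: b1, b2).
MATCHUPS = {
    ("AS", "BS"): [[2, 5], [-1, 20]],
    ("AS", "BW"): [[-24, -36], [0, 24]],
    ("AW", "BS"): [[28, 15], [40, 4]],
    ("AW", "BW"): [[12, 20], [2, 13]],
}


def saddle_points(matrix):
    """Return (row, col, value) for every pure-strategy saddle point.

    A maximizes over rows and B minimizes over columns, so an entry is a
    saddle point when it is the minimum of its row and the maximum of its
    column.
    """
    points = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            column = [matrix[k][c] for k in range(len(matrix))]
            if value == min(row) and value == max(column):
                points.append((r, c, value))
    return points


for matchup, matrix in MATCHUPS.items():
    for r, c, value in saddle_points(matrix):
        print(matchup, f"a{r + 1}", f"b{c + 1}", value)
```

Under these assumptions each match-up has exactly one pure-strategy saddle point, and the equilibrium actions differ from one match-up to the next, which is why the players' beliefs about types matter.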

Bayesian Players

• Each player knows his own state and estimates his opponent’s state

• Each player has a pure strategy for every possible match-up

• Each player forms a strategy based on the expected payoff

• To continue the example given by Harsanyi, consider the following

probabilities of occurrence for the four possible match-ups:

          BS      BW
AS       4/10    1/10
AW       2/10    3/10

Bayesian Nash Equilibrium

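Weighting each match-up matrix by its probability yields the following expected-payoff matrix for A, which has a single pure-strategy Nash equilibrium at A = (AS a2, AW a1), B = (BS b1, BW b1):

                 BS b1, BW b1   BS b1, BW b2   BS b2, BW b1   BS b2, BW b2
AS a1, AW a1          7.6            8.8            6.2            7.4
AS a1, AW a2          7.0            9.1            1.0            3.1
AS a2, AW a1          8.8           13.6           14.6           19.4
AS a2, AW a2          8.2           13.9            9.4           15.1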

Example calculation: Bayesian Nash Equilibrium payoff = (.4)(-1) + (.1)(0) + (.2)(28) + (.3)(12) = 8.8
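The expected-payoff matrix and its equilibrium can be recomputed from the match-up payoffs and probabilities. This is a sketch under stated assumptions: rows of each match-up matrix are a1, a2, columns are b1, b2, payoffs are to A, and the match-up probabilities are 0.4, 0.1, 0.2 and 0.3 for (AS, BS), (AS, BW), (AW, BS) and (AW, BW). All function and variable names are illustrative.

```python
from itertools import product

# Payoffs to A for each (A type, B type); rows are a1, a2 and columns are b1, b2.
PAYOFF = {
    ("S", "S"): [[2, 5], [-1, 20]],
    ("S", "W"): [[-24, -36], [0, 24]],
    ("W", "S"): [[28, 15], [40, 4]],
    ("W", "W"): [[12, 20], [2, 13]],
}
# Probability of each (A type, B type) match-up.
PROB = {("S", "S"): 0.4, ("S", "W"): 0.1, ("W", "S"): 0.2, ("W", "W"): 0.3}

# A type-contingent strategy is (action index if Strong, action index if Weak).
STRATEGIES = list(product((0, 1), repeat=2))


def expected_payoff(strategy_a, strategy_b):
    """Expected payoff to A when both players use type-contingent strategies."""
    total = 0.0
    for (type_a, type_b), prob in PROB.items():
        action_a = strategy_a[0] if type_a == "S" else strategy_a[1]
        action_b = strategy_b[0] if type_b == "S" else strategy_b[1]
        total += prob * PAYOFF[(type_a, type_b)][action_a][action_b]
    return total


def pure_equilibria(tol=1e-9):
    """Saddle points of the 4 x 4 expected-payoff matrix (A maximizes, B minimizes)."""
    found = []
    for sa in STRATEGIES:
        for sb in STRATEGIES:
            value = expected_payoff(sa, sb)
            a_cannot_gain = all(
                expected_payoff(alt, sb) <= value + tol for alt in STRATEGIES
            )
            b_cannot_gain = all(
                expected_payoff(sa, alt) >= value - tol for alt in STRATEGIES
            )
            if a_cannot_gain and b_cannot_gain:
                found.append((sa, sb, round(value, 6)))
    return found


print(pure_equilibria())
```

The single equilibrium found has A playing index 1 (a2) when Strong and index 0 (a1) when Weak, B playing index 0 (b1) in both states, and an expected payoff of 8.8, matching the example calculation.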

Interpretation of Bayesian Nash Equilibrium

• Player A takes action a2 if Strong and a1 if Weak.

• Player B takes action b1 irrespective of state.

• The equilibrium emerges from the known probabilities of each possible match-up

• Nash optimal: each player's strategy is a best response (in a Bayesian sense) to the strategy of his opponent

• Note that each player has a pure state-dependent strategy

(However, an outside observer could interpret it as a mixed strategy, with

Nature playing the part of a third indifferent player who randomly chooses

states for players A and B according to fixed probability distributions)

Static Bayesian Game #2: Cournot Model

• Consider a Cournot model comprising two firms, A and B, producing the same commodity to satisfy market demand D.

• The commodity price on the market is given by

\[ P = \begin{cases} D - q_A - q_B & \text{if } q_A + q_B < D \\ 0 & \text{otherwise} \end{cases} \]

• Firm A’s cost of producing the commodity is cAqA

– cA is the marginal cost

– qA is the quantity that Firm A produces

• Firm B’s cost of producing the commodity is

– cB1qB with probability p

– cB2qB with probability (1-p)

• Each player's state is defined by its marginal cost

• Each firm seeks to maximize its profit by anticipating the market price

Cournot Model and Asymmetric Information

• Firm B knows its state and Firm A’s state

• Firm A knows its own marginal cost but can only estimate Firm B’s state

• Each firm knows the other’s degree of knowledge

• Gibbons calls this a Bayesian game with asymmetric information

• Firm A chooses the optimal quantity qA to produce:

\[ \max_{q_A} \; p\,[D - q_A - q_B(c_{B1}) - c_A]\,q_A + (1-p)\,[D - q_A - q_B(c_{B2}) - c_A]\,q_A \]

• Firm B chooses the optimal quantity qB to produce

For cB1:

\[ \max_{q_{B1}} \; [D - q_A - q_{B1} - c_{B1}]\,q_{B1}, \quad \text{where } q_{B1} = q_B(c_{B1}) \]

For cB2:

\[ \max_{q_{B2}} \; [D - q_A - q_{B2} - c_{B2}]\,q_{B2}, \quad \text{where } q_{B2} = q_B(c_{B2}) \]

Analytical Solution: Bayesian Nash Equilibrium

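System of equations (first-order conditions):

\[ q_{B1} = \frac{D - q_A - c_{B1}}{2} \quad \text{(Firm B's solution if it is in state 1)} \]

\[ q_{B2} = \frac{D - q_A - c_{B2}}{2} \quad \text{(Firm B's solution if it is in state 2)} \]

\[ q_A = \frac{p\,(D - q_{B1} - c_A) + (1-p)\,(D - q_{B2} - c_A)}{2} \]

Solutions:

\[ q_A = \frac{D - 2c_A + p\,c_{B1} + (1-p)\,c_{B2}}{3} \]

\[ q_{B1} = \frac{D - 2c_{B1} + c_A}{3} + \frac{(1-p)\,(c_{B1} - c_{B2})}{6} \]

\[ q_{B2} = \frac{D - 2c_{B2} + c_A}{3} - \frac{p\,(c_{B1} - c_{B2})}{6} \]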
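The closed-form quantities can be cross-checked by iterating the three first-order conditions until they settle. This is a sketch assuming an interior equilibrium (all quantities positive and total output below D). The parameter values in the usage lines are hypothetical, chosen only for illustration, and the function names are not from the slides.

```python
def cournot_bayes_nash(D, c_A, c_B1, c_B2, p):
    """Closed-form interior equilibrium quantities (q_A, q_B1, q_B2)."""
    q_A = (D - 2 * c_A + p * c_B1 + (1 - p) * c_B2) / 3
    q_B1 = (D - 2 * c_B1 + c_A) / 3 + (1 - p) * (c_B1 - c_B2) / 6
    q_B2 = (D - 2 * c_B2 + c_A) / 3 - p * (c_B1 - c_B2) / 6
    return q_A, q_B1, q_B2


def best_response_iteration(D, c_A, c_B1, c_B2, p, rounds=200):
    """Repeatedly apply each firm's first-order condition, starting from zero output."""
    q_A = q_B1 = q_B2 = 0.0
    for _ in range(rounds):
        # Firm B knows its own cost, so it has one best response per state.
        q_B1 = max(0.0, (D - q_A - c_B1) / 2)
        q_B2 = max(0.0, (D - q_A - c_B2) / 2)
        # Firm A responds to the probability-weighted mix of B's two outputs.
        q_A = max(0.0, (p * (D - q_B1 - c_A) + (1 - p) * (D - q_B2 - c_A)) / 2)
    return q_A, q_B1, q_B2


# Hypothetical parameters: demand 10, c_A = 2, c_B1 = 3, c_B2 = 1, p = 0.5.
closed_form = cournot_bayes_nash(10, 2, 3, 1, 0.5)
iterated = best_response_iteration(10, 2, 3, 1, 0.5)
print(closed_form)
print(iterated)
```

For these parameters the two approaches agree, with Firm B producing more when its marginal cost is low (state 2) than when it is high (state 1).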

Bayesian Game with Continuous Type Space: Auction

• Consider an auction comprising two bidders and one item

• Players offer bids b1 and b2 for the item

• b1, b2 ∈ [0, 1]

• Each bidder values the item at v1 or v2, with payoff v1 – p or v2 – p, respectively, to the winning bidder, where p is the price paid (the winner's own bid)

• v1, v2 ∈ [0, 1]

• Bidder i chooses his bid to solve

\[ \max_{b_i} \; (v_i - b_i)\,P(b_i > b_j) + \tfrac{1}{2}\,(v_i - b_i)\,P(b_i = b_j) \]

Note: The latter term in this utility function applies only when bids are offered in fixed increments. For bids from the continuous set [0,1], this term is zero.

Linear Equilibrium

• We simplify the search for equilibrium by limiting the solution to the linear form

bi(vi) = ai + civi

• This does not limit the player action spaces to linear strategies, but simply looks for a

linear equilibrium solution

• We can assume that a player i will neither bid above the expected highest bid nor

below the lowest expected bid of player j

• Therefore, aj ≤ bi ≤ aj + cj, since vj is a uniformly distributed random variable on [0,1]

Linear Equilibrium

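• This gives us:

\[ P(b_i > b_j) = P(b_i > a_j + c_j v_j) = P\!\left(v_j < \frac{b_i - a_j}{c_j}\right) = \frac{b_i - a_j}{c_j} \]

so that bidder i's best response is

\[ b_i(v_i) = \begin{cases} \dfrac{v_i + a_j}{2} & \text{for } v_i \ge a_j \\ a_j & \text{otherwise} \end{cases} \]

and similarly

\[ b_j(v_j) = \begin{cases} \dfrac{v_j + a_i}{2} & \text{for } v_j \ge a_i \\ a_i & \text{otherwise} \end{cases} \]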
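The step from the win probability to the best response is a single first-order condition. This is a sketch, assuming continuous bids (so ties have probability zero), cj > 0, and a bid inside the interval [aj, aj + cj]:

```latex
\begin{align*}
\max_{b_i}\; & (v_i - b_i)\,P(b_i > b_j) = (v_i - b_i)\,\frac{b_i - a_j}{c_j} \\
\frac{\partial}{\partial b_i}:\quad & \frac{(v_i - b_i) - (b_i - a_j)}{c_j} = 0
\;\Longrightarrow\; b_i = \frac{v_i + a_j}{2}
\end{align*}
```

When vi < aj the unconstrained solution falls below player j's lowest bid, so bidder i bids aj instead.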

Linear Equilibrium

• Since we are looking for a linear solution, ai and aj ≤ 0: values between zero and one would yield a non-linear solution, and values of one or greater would yield an infeasible solution, since neither bidder will offer more than he values the item.

• With ai, aj ≤ 0, each best response reduces to bi = (vi + aj)/2, so ci = cj = 1/2. Thus, since the bids must be non-negative, ai = aj = 0, and the solution is that each bidder will offer one half his valuation of the item.
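The half-valuation result can be sanity-checked numerically: if the rival bids half of a uniformly distributed valuation, a grid search over one's own bids should land on half of one's own valuation. This is a rough sketch, assuming continuous bids (ties ignored) and a rival valuation uniform on [0, 1]. The function names and grid size are illustrative.

```python
def expected_payoff(bid, value, rival_slope=0.5):
    """Expected payoff of `bid` when the rival bids rival_slope * v_j, v_j ~ U[0, 1]."""
    # P(bid > rival_slope * v_j) = P(v_j < bid / rival_slope), clipped to [0, 1].
    win_probability = min(max(bid / rival_slope, 0.0), 1.0)
    return (value - bid) * win_probability


def best_bid(value, grid_size=10_001):
    """Bid on an evenly spaced grid over [0, 1] with the highest expected payoff."""
    bids = [k / (grid_size - 1) for k in range(grid_size)]
    return max(bids, key=lambda b: expected_payoff(b, value))


for v in (0.2, 0.5, 0.9):
    print(v, best_bid(v))
```

Up to the grid resolution, the best bid found for each valuation is half of that valuation.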

Dynamic Games with Bayesian Players

• Dynamic games with incomplete or imperfect information

– Players move after observing the actions taken by their opponents.

– Recall from the initial discussion on static games that information incompleteness

implied an information deficit with respect to an opponent’s type or state

– Information imperfection implies that each successive player’s move is based on

complete information about the state of the other players but flawed information

about the state of the game; i.e., the play history on the part of his opponents

• These games require a new solution concept: perfect Bayesian equilibrium

Perfect Bayesian Equilibrium

Gibbons gives the following four requirements for a perfect Bayesian equilibrium:

1. For each game turn, the moving player must have a belief about the state of the game,

i.e. the play history to that point, in the form of a probability distribution over the set of

the possible game sub-states at that point.

2. Given their beliefs, the players’ strategies must be sequentially rational. Note: An example of a strategy that is not sequentially rational (but is effective under some circumstances) is tit-for-tat in repeated prisoner’s dilemma games. [Axelrod, Sigmund]

3. At each game state on the equilibrium path, beliefs are formed by observation-driven Bayes’ rule and players’ equilibrium strategies. (For a given equilibrium in a sequential game, a game state is on the equilibrium path if it will be reached with positive probability when the game is played according to equilibrium strategies. Otherwise, the state is off the equilibrium path.)

4. For game states off the equilibrium path, beliefs are formed by Bayes’ rule and players’

equilibrium strategies where possible.

Simple Example

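Consider the following 3-player game tree. Each set of nodes corresponding to outcomes associated with any particular player’s move represents a possible game state.

[Figure: game tree. Player 1 chooses A (ending the game) or D; Player 2 then chooses L or R; Player 3, without observing Player 2's move, chooses L’ or R’, holding belief p that Player 2 chose L and 1-p that he chose R.]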

R1. This requirement is relevant for P3 only since

if P1 chooses A, the game is over, and thus

P2 has only to believe that he is in state D if

he has a turn. Player 3 must conclude that p

= 1 since R is dominated by L for player 2.

R2. Given this belief, Player 3 must choose R’.

R3. This requirement is satisfied by R1.

R4. This requirement is trivially satisfied since

there are no states off the equilibrium path.

Thus, the equilibrium (D,L,R’) can be confirmed

by inspection.

Signaling Games

• Games of two players with incomplete information about the opponent’s type

• One player is the Sender, one is the Receiver.

• Nature draws a type for the Sender according to a probability distribution on the set of feasible types.

• The Sender observes his type and sends a message based on that type. The sender can follow pooling, separating or hybrid strategies.

– A pooling Sender transmits the same message regardless of type.

– A separating Sender always transmits a different message for each type.

• The Receiver observes the message but not the type and chooses an action.

• Payoffs to the Sender and Receiver are each a function of Sender type, message and Receiver action.

Requirements for Perfect Bayesian Equilibrium in Signaling Games

1. After observing the Sender’s message, the Receiver must have a belief about

the Sender’s type in the form of a probability distribution conditional upon the

message transmitted.

2R. For each message observed, the Receiver’s action must maximize the Receiver’s expected payoff, given the belief about the Sender’s type.

2S. For each type determined by Nature, the Sender’s message must maximize his

expected payoff, given the Receiver’s strategy, defined as the set of actions to

be taken as functions of the message transmitted.

3. For each message transmittable by the Sender, if there exists a sender type

such that the message is optimal for that type, then the Receiver’s belief about

the Sender’s type must be derivable from Bayes’ rule and the Sender’s

strategy.
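Requirement 3 amounts to a Bayes' rule update of the prior over Sender types, given the Sender's strategy. This is a minimal sketch: the function name, the dictionary layout, and the example prior of 0.3 on the High type are all assumptions made for illustration.

```python
def receiver_belief(prior, sender_strategy, message):
    """Posterior P(type | message) by Bayes' rule.

    prior: dict mapping type -> prior probability
    sender_strategy: dict mapping type -> dict of message -> probability of sending it
    Returns None when no type sends the message with positive probability,
    since Bayes' rule then places no restriction on the belief.
    """
    joint = {t: prior[t] * sender_strategy[t].get(message, 0.0) for t in prior}
    total = sum(joint.values())
    if total == 0:
        return None
    return {t: joint[t] / total for t in joint}


prior = {"High": 0.3, "Low": 0.7}  # hypothetical prior

# Separating strategy: each type sends its own message.
separating = {"High": {"m1": 1.0}, "Low": {"m2": 1.0}}
print(receiver_belief(prior, separating, "m1"))

# Pooling strategy: both types send the same message.
pooling = {"High": {"m1": 1.0}, "Low": {"m1": 1.0}}
print(receiver_belief(prior, pooling, "m1"))
print(receiver_belief(prior, pooling, "m2"))
```

Under the separating strategy the message reveals the type exactly. Under the pooling strategy the posterior equals the prior, and the unsent message leaves the belief undetermined by Bayes' rule.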

Example: Job Market Signaling

• Nature determines a worker’s (the Sender) productive ability, which can be either High

or Low. The probability that his ability is High is q.

• The worker observes his ability and chooses a level of education (his message to

potential employers).

• The hiring market (the Receiver) observes the worker’s level of education and, based on

a belief about the worker’s ability, offers a wage (Receiver’s action).

• Payoff to the worker is W – C(a, e), where W is the wage offered, C is the cost (financial

+ intellectual difficulty) of attaining a particular level of education as a function of

ability a and education level e. Presumably, the cost of attaining a higher level of

education for a Low ability worker is relatively high due to the additional intellectual

difficulty sustained by the worker in its pursuit.

• Payoff to the hiring market is P(a, e) – W, where P is the level of productivity supplied

by the worker as a function of ability and education level.

Complete Information Solution

[Figure: productivity lines P(L,e) and P(H,e), indifference curves IL and IH, and the complete-information education levels e*(L) and e*(H) marked on the education axis.]

Note that the marginal cost of education is higher for a Low ability worker. He would therefore require a relatively higher salary to justify pursuing a higher education, hence the steeper indifference curve.

The productivity lines are found from the Nash solution W(e) = P(a,e), in which the market, presumed to be competitive and therefore devoid of excess profit, offers a wage equal to the expected level of productivity.

Pooling Equilibria and the Power of Envy

• Suppose now that the hiring market has incomplete information about the

worker’s type and only observes the level of education attained by the

workers.

• Suppose further that a Low ability worker is envious of a High ability

worker’s salary and decides to attempt to masquerade as a High ability worker

by getting a more advanced degree.

• This constitutes a pooling strategy since the worker will attempt to signal to

the hiring market that he is of High ability irrespective of type.

Note, this is only rational if the following inequality holds:

W*(H) - C[L,e*(H)] > W*(L) – C[L,e*(L)]
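The inequality is easy to evaluate once wages and costs are given. The sketch below uses purely hypothetical numbers, and its function and argument names are illustrative only.

```python
def low_type_prefers_mimicking(wage_high, wage_low, cost_at_high_edu, cost_at_low_edu):
    """True when W*(H) - C[L, e*(H)] > W*(L) - C[L, e*(L)].

    Both costs are the Low ability worker's costs, evaluated at the High
    type's and the Low type's complete-information education levels.
    """
    return wage_high - cost_at_high_edu > wage_low - cost_at_low_edu


# Hypothetical values: mimicking pays when the extra education is cheap enough.
print(low_type_prefers_mimicking(100, 60, 30, 5))  # compares 70 with 55
print(low_type_prefers_mimicking(100, 60, 50, 5))  # compares 50 with 55
```

In the first case the wage gain outweighs the Low ability worker's extra education cost, so masquerading is rational. In the second it does not.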

Masquerading Workers with Pooling Strategies

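[Figure: productivity lines P(L,e) and P(H,e), the expected productivity line qP(H,e) + (1-q)P(L,e), and education levels e*(L) and e*p marked on the education axis.]

Here the Nash equilibrium sets the wage at wp, where the expected productivity line intersects both indifference curves.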
