mit and james orlin © 2003 1 game theory 2-person 0-sum (or constant sum) game theory 2-person game...

MIT and James Orlin © 2003

1

Game Theory

2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)


2

2-person 0-sum game theory

Person R chooses a row: either 1, 2, or 3

Person C chooses a column: either 1, 2, or 3

This matrix is the payoff matrix for player R. (And player C gets the negative.)

2 -1 0

1 0 -2

-2 1 2

e.g., R chooses row 3; C chooses column 1

1

R gets 1; C gets –1 (zero sum)


3

Some more examples of payoffs

R chooses 2, C chooses 3

2 -1 0

1 0 -2

-2 1 2

R chooses row 3; C chooses column 3

0

R gets -2; C gets +2 (zero sum)

R gets 0; C gets 0 (zero sum)

0

-2


4

Next: 2 volunteers

Player R puts out 1, 2 or 3 fingers

2 -1 0

1 0 -2

-2 1 2

R tries to maximize his or her total

C tries to minimize R’s total.

Player C simultaneously puts out 1, 2, or 3 fingers

We will run the game for 5 trials.


5

Next: Play the game with your partner(If you don’t have one, then watch)

Player R puts out 1, 2 or 3 fingers

2 -1 0

1 0 -2

-2 1 2

R tries to maximize his or her total

C tries to minimize R’s total.

Player C simultaneously puts out 1, 2, or 3 fingers

We will run the game for 5 trials.


6

Who has the advantage: R or C?

2 -1 0

1 0 -2

-2 1 2

Suppose that R and C are both brilliant players and they play a VERY LONG TIME.

Will R’s payoff be positive in the long run, or will it be negative, or will it converge to 0?

We will find a lower and upper bound on the payoff to R using linear programming.


7

Computing a lower bound

2 -1 0

1 0 -2

-2 1 2

Suppose that player R must announce his or her strategy in advance of C making a choice.

A strategy that consists of selecting the same row over and over again is a “pure strategy.”R can guarantee a payoff of at least –1.

If R is forced to announce a row, then what row will R select?

2 -1 0


8

Computing a lower bound on R’s payoff

2 -1 0

1 0 -2

-2 1 2

Suppose we permit R to choose a random strategy.

The column player makes the choice after hearing the strategy, but before seeing the flip of the coin.

Suppose R will flip a coin, and choose row 1 if Heads, and choose row 3 if tails.


9

What is player’s C best response?

2 -1 0

1 0 -2

-2 1 2

What would C’s best response be?

If C knows R’s random strategy, then C can determine the expected payoff for each column chosen.

Prob.

.5

0

.5

-.5 .5 0Expected

Payoff

So, with a random strategy R can get at least -.5


10

Suppose that R randomizes between row 1 and row 2.

2 -1 0

1 0 -2

-2 1 2

What would C’s best response be?

Prob.

.5

.5

0

0 0 1Expected

Payoff

So, with a random strategy R can get at least 0.


11

What is R’s best random strategy?

2 -1 0

1 0 -2

-2 1 2

Prob.

x1

x2

x3

A: -2 x1 + 2 x2 + x3

AExpected

PayoffB

B: x1 - x2

C

C: 2 x1 - 2 x3

x1 + x2 + x3 = 1

C will choose the column that is the minimum of A, B, and C.


12

Finding the min of 2 numbers as an optimization problem.

Let z = min (x, y)

Then z is the optimum solution value to the following LP:

maximize z

subject to z x

z y


13

R’s best strategy, as an LP

2 -1 0

1 0 -2

-2 1 2 x1

x2

x3

A: PA = -2 x1 + 2 x2 + x3

A B

B: PB = x1 - x2

C

C: PC = 2 x1 - 2 x3

Expected Payoff

Maximize min (PA, PB, PC)

x1 + x2 + x3 = 1

x1 , x2 , x3 0


14

R’s best strategy, as an LP

2 -1 0

1 0 -2

-2 1 2 x1

x2

x3

A: z -2 x1 + 2 x2 + x3

A B

B: z x1 - x2

C

C: z 2 x1 - 2 x3

Expected Payoff

Maximize z (the payoff to x)

x1 + x2 + x3 = 1

x1 , x2 , x3 0

2-person0-sumgame


15

The Row Player’s LP, in general

a11 a12 a13 x1

x2

xn

a1m

a21 a22 a23 a2m

an1 an2 an3 anm

…

…

…

…

Pj: z a1j x1 + a2j x2 +… + anjxn for all j

P1 P2 P3

x1 + x2 + … + xn = 1

xj 0 for all j

Expected Payoff

Maximize z (the payoff to x)

Pm…


16

Here is the optimal random strategy for R.

2 -1 0

1 0 -2

-2 1 2

The optimal payoff to R is 1/9.

Prob.

7/18

5/18

1/3

1/9 1/9 1/9Expected

Payoff

So, with a random strategy R guarantees obtaining at least 1/9.

This strategy is a lower bound on what R can obtain if he or she takes into account what C is doing.


17

We can obtain a lower bound for C (and an upper bound for R) in the same manner.

2 -1 0

1 0 -2

-2 1 2

If C announced this random strategy, R would select row 1.

Exp.payoff

1/2

0

-1/4

1/4 1/2 1/4Prob.

So, with this random strategy C guarantees that R obtains at most 1/2.

If C chooses a random strategy, it will give an upper bound on what R can obtain.


18

Exercise, what is the best bound for C?

2 -1 0

1 0 -2

-2 1 2

With your partner, write the LP whose solution gives an optimal strategy for the column player.

Exp.payoff

?

?

?

y1 y2 y3Prob.


19

The best strategy for the column player

2 -1 0

1 0 -2

-2 1 2

Exp.payoff

1/9

1/9

1/9

1/3 5/9 1/9Prob.

So, with this random strategy R gets only 1/9.


20

Note: the payoff is the same, whether the Row Player goes first or the Column Player

2 -1 0

1 0 -2

-2 1 2

Exp.payoff

1/9

1/9

1/9

1/3 5/9 1/9Prob.

For 2-person 0-sum games, the maximum payoff that R can guarantee by choosing a random strategy is the minimum payoff to R that C can guarantee by choosing a random strategy.

So, the optimal average payoff to the game is 1/9, assuming that both players play optimally.


21

Constant sum games

5 1 2 4 3 3

4 2 3 3 1 5

0 6 3 3 5 1

Column Player C

RowPlayerR

5 1

R chooses 2, C chooses 1, Payoff is 5 to RPayoff is 1 to C


22

Constant sum games

5 1 2 4 3 3

4 2 3 3 1 5

0 6 3 3 5 1

Column Player C

RowPlayerR

5 1

R chooses 2, C chooses 1, Payoff is 5 to RPayoff is 1 to C

Here the total payoff is 6.

For C, maximizing payoff is the same as minimizing payoff to R.


23

Constant sum games

5 1 2 4 3 3

4 2 3 3 1 5

0 6 3 3 5 1

Column Player C

RowPlayerR

5 1

This becomes equivalent to 0-sum games

Here the total payoff is 6.

For C, maximizing payoff is the same as minimizing payoff to R.


24

2-person 0-sum games in general.

Let x denote a random strategy for R, with value z(x) and let y denote a random strategy for C with value v(y).

z(x) v(y) for all x, y The optimum x* can be obtained by

solving an LP. So can the optimum y* . z(x*) = v(y*)


25

More on 2-person 0-sum games

In principle, R can do as well with a random fixed strategy as by carefully varying a strategy over time

Duality for game theory was discovered by Von Neumann and Morgenstern (predates LP duality).

The idea of randomizing strategy permeates strategic gaming.

Applications to games: football, baseball and more


26

Non-Constant sum 2-person games

-20 0 -1 -1

-5 -5 0 -20

Prisoner 2

Prisoner 1

confessdon’t confess

confess

don’t confess

If Prisoner 1 announces a strategy, we assume that Prisoner 2 chooses optimally.


27

Remarks on strategies for Prisoner 2

-20 0 -1 -1

-5 -5 0 -20

Prisoner 2

Prisoner 1


confess

don’t confess

Suppose Prisoner 1 confesses

Then the optimum strategy for Prisoner 2 is to confess.


28


-20 0 -1 -1

-5 -5 0 -20

Prisoner 2

Prisoner 1


confess

don’t confess

Suppose Prisoner 1 does not confess



29


-20 0 -1 -1

-5 -5 0 -20

Prisoner 2

Prisoner 1


confess

don’t confess

Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not.

1/2

1/2


30

Remarks on a mixed strategy

-2.5 -10.5

Prisoner 2

Prisoner 1


confess

don’t confess

Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not.



31

Consider the following pure strategy:Prisoner 1: confessesPrisoner 2: confesses

A strategy is called a Nash equilibrium if neither player can benefit by a unilateral change in strategy.

The above strategy is a Nash equilibrium


32

The optimal strategy for Prisoner 2 is to confess, regardless of what Prisoner 1 does.

Similarly, the optimum strategy for Prisoner 1 is to confess, regardless of what Prisoner 2 does.

It is rare that an optimal strategy does not depend on the strategy of the opponent.


33

Non-Constant sum 2-person games

-20 0 -1 -1

-5 -5 0 -20

Prisoner 2

Prisoner 1


confess

don’t confess

Suppose the game is repeated 100 times with participants. The Nash equilibrium is to always confess. But in practice …


34

Another 2-person game

-2 2 2 -2

2 -1 -2 1

Column Player C

Row Player R

By a strategy for the row player, we mean a mixed strategy, such as choosing row 1 with probability 2/3 and choosing row 2 with probability 1/3.


35

A Nash Equilibrium

-2 2 2 -2

2 -1 -2 1

Column Player C

Row Player R

2/3

1/3

1/2 1/2

Let’s fix R’s strategy and look at C


36

A Nash Equilibrium

-2 2 2 -2

2 -1 -2 1

Column Player C

Row Player R

2/3

1/3

1/2 1/2

Let’s fix R’s strategy and look at C


37

C’s perspective

0 0

Column Player C

Row Player R

1/2 1/2

Let’s fix C’s strategy and look at R

Note: any strategy for C gives the same expected return, which is 0. So C cannot benefit by changing strategy


38

R’s Perspective

-2 2 2 -2

2 -1 -2 1

Column Player C

Row Player R

2/3

1/3

1/2 1/2



39

R’s Perspective

0

0

Column Player C

Row Player R

2/3

1/3


Note: any strategy for R gives the same expected return, which is 0. So R cannot benefit by changing strategy


40

Theorem. For two person game theory, there always exists a Nash equilibrium.

To find a Nash equilibrium, one can use optimization theory, but it is not a linear program. It is more difficult.


41

Summary

2-person constant sum game theory– mini-max solution– randomized strategies

2-person game theory– Prisoner’s dilemma– Nash equilibrium


42

A 2-dimensional view of game theory

The following slides show how to solve 2-person 0-sum game theory when there are two strategies per player

If R goes first and decides on a strategy of choosing row 1 with probability p and row 2 with probability 1-p, then the strategy for C is easily determined.

So R can determine the payoff as a function of p, and then choose p to maximize the payoff.

These slides were not presented in lecture.


43

What is the optimal strategy for R?

2 1

0 4

Here is the payoff if C picks column 1

0 1

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4

probability of selecting row 1

payoff

Graph the payoff as a function of the probability p that R selects row 1


44

More on payoffs as a function of p

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4

probability of selecting row 10 1

2 1

0 4


payoff


45

What is the optimal strategy for C as a function of p?

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4

C picks column 1

C picks column 2

payoff

1/5


46

Here is the payoff to R as a function of p

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4

C picks column 1C picks column 2payoff

1/5


47

What is the optimal strategy for R? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5

The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.


48

What is the optimal strategy for C?

2 1

0 4

Here is the payoff if R picks row 1

0 1

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4

probability of selecting col. 1

payoff

Graph the payoff as a function of the probability q that C selects column 1


49

More on payoffs as a function of q

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4

probability of selecting col 10 1

2 1

0 4Here is the payoff if R picks row 2

payoff


50

What is the optimal strategy for C as a function of q?

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4

R picks row 1

R picks row 2

payoff

q = 3/5


51

Here is the payoff to R as a function of q, the probability that C chooses Column 1.

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4

payoff

q = 3/5

R picks row 1

R picks row 2


52

What is the optimal strategy for C? payoff to R is 4(1-q) if q < 3/5 payoff to R is 2q + (1-q) if q > 3/5

The column player C can guarantee the minimum payoff to R when q = 3/5, and the payoff is 8/5 to R.


53

What is the optimal strategy for C? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5



54

What is the optimal strategy for R?

3 -3

-1 4


0 1

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4

probability of selecting row 1

payoff

Graph the payoff as a function of the probability p that R selects row 1


55

More on payoffs as a function of p

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4


payoff

3 -3

-1 4


56

What is the optimal strategy for C as a function of p?

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4C picks column 1 C picks column 2

payoff

2 1

0 4

3 -3

-1 4

p = 6/11


57

Here is the payoff to R as a function of p

1

4

3

2

0

-1-2

-3

-4

1

4

3

2

0

-1-2

-3

-4


2 1

0 4

C picks column 1

C picks column 2

payoff

2 1

0 4

3 -3

-1 4

p = 6/11


58

What is the optimal strategy for R? payoff to R is 7p -3 if p < 6/11 payoff to R is 3 - 4p if p > 6/11


mit and james orlin © 2003 1 game theory 2-person 0-sum (or constant sum) game theory 2-person game...

Documents