mit and james orlin © 2003 1 game theory 2-person 0-sum (or constant sum) game theory 2-person game...
TRANSCRIPT
MIT and James Orlin © 2003
1
Game Theory
2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)
MIT and James Orlin © 2003
2
2-person 0-sum game theory
Person R chooses a row: either 1, 2, or 3
Person C chooses a column: either 1, 2, or 3
This matrix is the payoff matrix for player R. (And player C gets the negative.)
2 -1 0
1 0 -2
-2 1 2
e.g., R chooses row 3; C chooses column 1
1
R gets 1; C gets –1 (zero sum)
MIT and James Orlin © 2003
3
Some more examples of payoffs
R chooses 2, C chooses 3
2 -1 0
1 0 -2
-2 1 2
R chooses row 3; C chooses column 3
0
R gets -2; C gets +2 (zero sum)
R gets 0; C gets 0 (zero sum)
0
-2
MIT and James Orlin © 2003
4
Next: 2 volunteers
Player R puts out 1, 2 or 3 fingers
2 -1 0
1 0 -2
-2 1 2
R tries to maximize his or her total
C tries to minimize R’s total.
Player C simultaneously puts out 1, 2, or 3 fingers
We will run the game for 5 trials.
MIT and James Orlin © 2003
5
Next: Play the game with your partner(If you don’t have one, then watch)
Player R puts out 1, 2 or 3 fingers
2 -1 0
1 0 -2
-2 1 2
R tries to maximize his or her total
C tries to minimize R’s total.
Player C simultaneously puts out 1, 2, or 3 fingers
We will run the game for 5 trials.
MIT and James Orlin © 2003
6
Who has the advantage: R or C?
2 -1 0
1 0 -2
-2 1 2
Suppose that R and C are both brilliant players and they play a VERY LONG TIME.
Will R’s payoff be positive in the long run, or will it be negative, or will it converge to 0?
We will find a lower and upper bound on the payoff to R using linear programming.
MIT and James Orlin © 2003
7
Computing a lower bound
2 -1 0
1 0 -2
-2 1 2
Suppose that player R must announce his or her strategy in advance of C making a choice.
A strategy that consists of selecting the same row over and over again is a “pure strategy.”R can guarantee a payoff of at least –1.
If R is forced to announce a row, then what row will R select?
2 -1 0
MIT and James Orlin © 2003
8
Computing a lower bound on R’s payoff
2 -1 0
1 0 -2
-2 1 2
Suppose we permit R to choose a random strategy.
The column player makes the choice after hearing the strategy, but before seeing the flip of the coin.
Suppose R will flip a coin, and choose row 1 if Heads, and choose row 3 if tails.
MIT and James Orlin © 2003
9
What is player’s C best response?
2 -1 0
1 0 -2
-2 1 2
What would C’s best response be?
If C knows R’s random strategy, then C can determine the expected payoff for each column chosen.
Prob.
.5
0
.5
-.5 .5 0Expected
Payoff
So, with a random strategy R can get at least -.5
MIT and James Orlin © 2003
10
Suppose that R randomizes between row 1 and row 2.
2 -1 0
1 0 -2
-2 1 2
What would C’s best response be?
Prob.
.5
.5
0
0 0 1Expected
Payoff
So, with a random strategy R can get at least 0.
MIT and James Orlin © 2003
11
What is R’s best random strategy?
2 -1 0
1 0 -2
-2 1 2
Prob.
x1
x2
x3
A: -2 x1 + 2 x2 + x3
AExpected
PayoffB
B: x1 - x2
C
C: 2 x1 - 2 x3
x1 + x2 + x3 = 1
C will choose the column that is the minimum of A, B, and C.
MIT and James Orlin © 2003
12
Finding the min of 2 numbers as an optimization problem.
Let z = min (x, y)
Then z is the optimum solution value to the following LP:
maximize z
subject to z x
z y
MIT and James Orlin © 2003
13
R’s best strategy, as an LP
2 -1 0
1 0 -2
-2 1 2 x1
x2
x3
A: PA = -2 x1 + 2 x2 + x3
A B
B: PB = x1 - x2
C
C: PC = 2 x1 - 2 x3
Expected Payoff
Maximize min (PA, PB, PC)
x1 + x2 + x3 = 1
x1 , x2 , x3 0
MIT and James Orlin © 2003
14
R’s best strategy, as an LP
2 -1 0
1 0 -2
-2 1 2 x1
x2
x3
A: z -2 x1 + 2 x2 + x3
A B
B: z x1 - x2
C
C: z 2 x1 - 2 x3
Expected Payoff
Maximize z (the payoff to x)
x1 + x2 + x3 = 1
x1 , x2 , x3 0
2-person0-sumgame
MIT and James Orlin © 2003
15
The Row Player’s LP, in general
a11 a12 a13 x1
x2
xn
a1m
a21 a22 a23 a2m
an1 an2 an3 anm
…
…
…
…
Pj: z a1j x1 + a2j x2 +… + anjxn for all j
P1 P2 P3
x1 + x2 + … + xn = 1
xj 0 for all j
Expected Payoff
Maximize z (the payoff to x)
Pm…
MIT and James Orlin © 2003
16
Here is the optimal random strategy for R.
2 -1 0
1 0 -2
-2 1 2
The optimal payoff to R is 1/9.
Prob.
7/18
5/18
1/3
1/9 1/9 1/9Expected
Payoff
So, with a random strategy R guarantees obtaining at least 1/9.
This strategy is a lower bound on what R can obtain if he or she takes into account what C is doing.
MIT and James Orlin © 2003
17
We can obtain a lower bound for C (and an upper bound for R) in the same manner.
2 -1 0
1 0 -2
-2 1 2
If C announced this random strategy, R would select row 1.
Exp.payoff
1/2
0
-1/4
1/4 1/2 1/4Prob.
So, with this random strategy C guarantees that R obtains at most 1/2.
If C chooses a random strategy, it will give an upper bound on what R can obtain.
MIT and James Orlin © 2003
18
Exercise, what is the best bound for C?
2 -1 0
1 0 -2
-2 1 2
With your partner, write the LP whose solution gives an optimal strategy for the column player.
Exp.payoff
?
?
?
y1 y2 y3Prob.
MIT and James Orlin © 2003
19
The best strategy for the column player
2 -1 0
1 0 -2
-2 1 2
Exp.payoff
1/9
1/9
1/9
1/3 5/9 1/9Prob.
So, with this random strategy R gets only 1/9.
MIT and James Orlin © 2003
20
Note: the payoff is the same, whether the Row Player goes first or the Column Player
2 -1 0
1 0 -2
-2 1 2
Exp.payoff
1/9
1/9
1/9
1/3 5/9 1/9Prob.
For 2-person 0-sum games, the maximum payoff that R can guarantee by choosing a random strategy is the minimum payoff to R that C can guarantee by choosing a random strategy.
So, the optimal average payoff to the game is 1/9, assuming that both players play optimally.
MIT and James Orlin © 2003
21
Constant sum games
5 1 2 4 3 3
4 2 3 3 1 5
0 6 3 3 5 1
Column Player C
RowPlayerR
5 1
R chooses 2, C chooses 1, Payoff is 5 to RPayoff is 1 to C
MIT and James Orlin © 2003
22
Constant sum games
5 1 2 4 3 3
4 2 3 3 1 5
0 6 3 3 5 1
Column Player C
RowPlayerR
5 1
R chooses 2, C chooses 1, Payoff is 5 to RPayoff is 1 to C
Here the total payoff is 6.
For C, maximizing payoff is the same as minimizing payoff to R.
MIT and James Orlin © 2003
23
Constant sum games
5 1 2 4 3 3
4 2 3 3 1 5
0 6 3 3 5 1
Column Player C
RowPlayerR
5 1
This becomes equivalent to 0-sum games
Here the total payoff is 6.
For C, maximizing payoff is the same as minimizing payoff to R.
MIT and James Orlin © 2003
24
2-person 0-sum games in general.
Let x denote a random strategy for R, with value z(x) and let y denote a random strategy for C with value v(y).
z(x) v(y) for all x, y The optimum x* can be obtained by
solving an LP. So can the optimum y* . z(x*) = v(y*)
MIT and James Orlin © 2003
25
More on 2-person 0-sum games
In principle, R can do as well with a random fixed strategy as by carefully varying a strategy over time
Duality for game theory was discovered by Von Neumann and Morgenstern (predates LP duality).
The idea of randomizing strategy permeates strategic gaming.
Applications to games: football, baseball and more
MIT and James Orlin © 2003
26
Non-Constant sum 2-person games
-20 0 -1 -1
-5 -5 0 -20
Prisoner 2
Prisoner 1
confessdon’t confess
confess
don’t confess
If Prisoner 1 announces a strategy, we assume that Prisoner 2 chooses optimally.
MIT and James Orlin © 2003
27
Remarks on strategies for Prisoner 2
-20 0 -1 -1
-5 -5 0 -20
Prisoner 2
Prisoner 1
confessdon’t confess
confess
don’t confess
Suppose Prisoner 1 confesses
Then the optimum strategy for Prisoner 2 is to confess.
MIT and James Orlin © 2003
28
Remarks on strategies for Prisoner 2
-20 0 -1 -1
-5 -5 0 -20
Prisoner 2
Prisoner 1
confessdon’t confess
confess
don’t confess
Suppose Prisoner 1 does not confess
Then the optimum strategy for Prisoner 2 is to confess.
MIT and James Orlin © 2003
29
Remarks on strategies for Prisoner 2
-20 0 -1 -1
-5 -5 0 -20
Prisoner 2
Prisoner 1
confessdon’t confess
confess
don’t confess
Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not.
1/2
1/2
MIT and James Orlin © 2003
30
Remarks on a mixed strategy
-2.5 -10.5
Prisoner 2
Prisoner 1
confessdon’t confess
confess
don’t confess
Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not.
Then the optimum strategy for Prisoner 2 is to confess.
MIT and James Orlin © 2003
31
Consider the following pure strategy:Prisoner 1: confessesPrisoner 2: confesses
A strategy is called a Nash equilibrium if neither player can benefit by a unilateral change in strategy.
The above strategy is a Nash equilibrium
MIT and James Orlin © 2003
32
The optimal strategy for Prisoner 2 is to confess, regardless of what Prisoner 1 does.
Similarly, the optimum strategy for Prisoner 1 is to confess, regardless of what Prisoner 2 does.
It is rare that an optimal strategy does not depend on the strategy of the opponent.
MIT and James Orlin © 2003
33
Non-Constant sum 2-person games
-20 0 -1 -1
-5 -5 0 -20
Prisoner 2
Prisoner 1
confessdon’t confess
confess
don’t confess
Suppose the game is repeated 100 times with participants. The Nash equilibrium is to always confess. But in practice …
MIT and James Orlin © 2003
34
Another 2-person game
-2 2 2 -2
2 -1 -2 1
Column Player C
Row Player R
By a strategy for the row player, we mean a mixed strategy, such as choosing row 1 with probability 2/3 and choosing row 2 with probability 1/3.
MIT and James Orlin © 2003
35
A Nash Equilibrium
-2 2 2 -2
2 -1 -2 1
Column Player C
Row Player R
2/3
1/3
1/2 1/2
Let’s fix R’s strategy and look at C
MIT and James Orlin © 2003
36
A Nash Equilibrium
-2 2 2 -2
2 -1 -2 1
Column Player C
Row Player R
2/3
1/3
1/2 1/2
Let’s fix R’s strategy and look at C
MIT and James Orlin © 2003
37
C’s perspective
0 0
Column Player C
Row Player R
1/2 1/2
Let’s fix C’s strategy and look at R
Note: any strategy for C gives the same expected return, which is 0. So C cannot benefit by changing strategy
MIT and James Orlin © 2003
38
R’s Perspective
-2 2 2 -2
2 -1 -2 1
Column Player C
Row Player R
2/3
1/3
1/2 1/2
Let’s fix C’s strategy and look at R
MIT and James Orlin © 2003
39
R’s Perspective
0
0
Column Player C
Row Player R
2/3
1/3
Let’s fix C’s strategy and look at R
Note: any strategy for R gives the same expected return, which is 0. So R cannot benefit by changing strategy
MIT and James Orlin © 2003
40
Theorem. For two person game theory, there always exists a Nash equilibrium.
To find a Nash equilibrium, one can use optimization theory, but it is not a linear program. It is more difficult.
MIT and James Orlin © 2003
41
Summary
2-person constant sum game theory– mini-max solution– randomized strategies
2-person game theory– Prisoner’s dilemma– Nash equilibrium
MIT and James Orlin © 2003
42
A 2-dimensional view of game theory
The following slides show how to solve 2-person 0-sum game theory when there are two strategies per player
If R goes first and decides on a strategy of choosing row 1 with probability p and row 2 with probability 1-p, then the strategy for C is easily determined.
So R can determine the payoff as a function of p, and then choose p to maximize the payoff.
These slides were not presented in lecture.
MIT and James Orlin © 2003
43
What is the optimal strategy for R?
2 1
0 4
Here is the payoff if C picks column 1
0 1
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 1
payoff
Graph the payoff as a function of the probability p that R selects row 1
MIT and James Orlin © 2003
44
More on payoffs as a function of p
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
Here is the payoff if C picks column 2
payoff
MIT and James Orlin © 2003
45
What is the optimal strategy for C as a function of p?
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
C picks column 1
C picks column 2
payoff
1/5
MIT and James Orlin © 2003
46
Here is the payoff to R as a function of p
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
C picks column 1C picks column 2payoff
1/5
MIT and James Orlin © 2003
47
What is the optimal strategy for R? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5
The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.
MIT and James Orlin © 2003
48
What is the optimal strategy for C?
2 1
0 4
Here is the payoff if R picks row 1
0 1
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting col. 1
payoff
Graph the payoff as a function of the probability q that C selects column 1
MIT and James Orlin © 2003
49
More on payoffs as a function of q
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting col 10 1
2 1
0 4Here is the payoff if R picks row 2
payoff
MIT and James Orlin © 2003
50
What is the optimal strategy for C as a function of q?
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
R picks row 1
R picks row 2
payoff
q = 3/5
MIT and James Orlin © 2003
51
Here is the payoff to R as a function of q, the probability that C chooses Column 1.
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
payoff
q = 3/5
R picks row 1
R picks row 2
MIT and James Orlin © 2003
52
What is the optimal strategy for C? payoff to R is 4(1-q) if q < 3/5 payoff to R is 2q + (1-q) if q > 3/5
The column player C can guarantee the minimum payoff to R when q = 3/5, and the payoff is 8/5 to R.
MIT and James Orlin © 2003
53
What is the optimal strategy for C? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5
The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.
MIT and James Orlin © 2003
54
What is the optimal strategy for R?
3 -3
-1 4
Here is the payoff if C picks column 1
0 1
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 1
payoff
Graph the payoff as a function of the probability p that R selects row 1
MIT and James Orlin © 2003
55
More on payoffs as a function of p
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
Here is the payoff if C picks column 2
payoff
3 -3
-1 4
MIT and James Orlin © 2003
56
What is the optimal strategy for C as a function of p?
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4C picks column 1 C picks column 2
payoff
2 1
0 4
3 -3
-1 4
p = 6/11
MIT and James Orlin © 2003
57
Here is the payoff to R as a function of p
1
4
3
2
0
-1-2
-3
-4
1
4
3
2
0
-1-2
-3
-4
probability of selecting row 10 1
2 1
0 4
C picks column 1
C picks column 2
payoff
2 1
0 4
3 -3
-1 4
p = 6/11
MIT and James Orlin © 2003
58
What is the optimal strategy for R? payoff to R is 7p -3 if p < 6/11 payoff to R is 3 - 4p if p > 6/11
The maximum payoff is when p = 6/11, and the payoff is 9/11 to R.