csc304 lecture 5 game theory : zero-sum games, the ...that meeting with the express purpose of...

Post on 06-Aug-2021

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

CSC304 Lecture 5

Game Theory : Zero-Sum Games,

The Minimax Theorem

CSC304 - Nisarg Shah 1

Recap

CSC304 - Nisarg Shah 2

โ€ข Last lectureโžข Cost-sharing games

o Price of anarchy (PoA) can be ๐‘›

o Price of stability (PoS) is ๐‘‚(log ๐‘›)

โžข Potential functions and pure Nash equilibria

โžข Congestion games

โžข Braessโ€™ paradox

โžข Updated (slightly more detailed) slides

โ€ข Assignment 1 to be posted

โ€ข Volunteer note-taker

Zero-Sum Games

CSC304 - Nisarg Shah 3

โ€ข Total reward constant in all outcomes (w.l.o.g. 0)โžข Common term: โ€œzero-sum situationโ€

โžข Psychology literature: โ€œzero-sum thinkingโ€

โžข โ€œStrictly competitive gamesโ€

โ€ข Focus on two-player zero-sum games (2p-zs)โžข โ€œThe more I win, the more you loseโ€

Zero-Sum Games

CSC304 - Nisarg Shah 4

SamJohn Stay Silent Betray

Stay Silent (-1 , -1) (-3 , 0)

Betray (0 , -3) (-2 , -2)

Non-zero-sum game: Prisonerโ€™s dilemma

Zero-sum game: Rock-Paper-Scissor

P1P2 Rock Paper Scissor

Rock (0 , 0) (-1 , 1) (1 , -1)

Paper (1 , -1) (0 , 0) (-1 , 1)

Scissor (-1 , 1) (1 , -1) (0 , 0)

Zero-Sum Games

CSC304 - Nisarg Shah 5

โ€ข Why are they interesting?โžข Most games we play are zero-sum: chess, tic-tac-toe,

rock-paper-scissor, โ€ฆ

โžข (win, lose), (lose, win), (draw, draw)

โžข (1, -1), (-1, 1), (0, 0)

โ€ข Why are they technically interesting?โžข Relation between the rewards of P1 and P2

โžข P1 maximizes his reward

โžข P2 maximizes his reward = minimizes reward of P1

Zero-Sum Games

CSC304 - Nisarg Shah 6

โ€ข Reward for P2 = - Reward for P1

โžข Only need a single matrix ๐ด : reward for P1

โžข P1 wants to maximize, P2 wants to minimize

P1P2 Rock Paper Scissor

Rock 0 -1 1

Paper 1 0 -1

Scissor -1 1 0

Rewards in Matrix Form

CSC304 - Nisarg Shah 7

โ€ข Say P1 uses mixed strategy ๐‘ฅ1 = (๐‘ฅ1,1, ๐‘ฅ1,2, โ€ฆ )โžข What are the rewards for P1 corresponding to different

possible actions of P2?

๐‘ ๐‘—

๐‘ฅ1,1

๐‘ฅ1,2

๐‘ฅ1,3

.

.

.

Rewards in Matrix Form

CSC304 - Nisarg Shah 8

โ€ข Say P1 uses mixed strategy ๐‘ฅ1 = (๐‘ฅ1,1, ๐‘ฅ1,2, โ€ฆ )โžข What are the rewards for P1 corresponding to different

possible actions of P2?

๐‘ ๐‘—

๐‘ฅ1,1, ๐‘ฅ1,2, ๐‘ฅ1,3, โ€ฆ โˆ—

โ– Reward for P1 when P2

chooses sj = ๐‘ฅ1๐‘‡ โˆ— ๐ด

๐‘—

Rewards in Matrix Form

CSC304 - Nisarg Shah 9

โ€ข Reward for P1 whenโ€ฆโžข P1 uses mixed strategy ๐‘ฅ1โžข P2 uses mixed strategy ๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด

1, ๐‘ฅ1

๐‘‡ โˆ— ๐ด2, ๐‘ฅ1

๐‘‡ โˆ— ๐ด3โ€ฆ โˆ—

๐‘ฅ2,1๐‘ฅ2,2๐‘ฅ2,3โ‹ฎ

= ๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2

CSC304 - Nisarg Shah 10

How would the two players actdo in this zero-sum game?

John von Neumann, 1928

Maximin Strategy

CSC304 - Nisarg Shah 11

โ€ข Worst-case thinking by P1โ€ฆโžข If I choose mixed strategy ๐‘ฅ1โ€ฆ

โžข P2 would choose ๐‘ฅ2 to minimize my reward (i.e., maximize his reward)

โžข Let me choose ๐‘ฅ1 to maximize this โ€œworst-case rewardโ€

๐‘‰1โˆ— = max

๐‘ฅ1min๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2

Maximin Strategy

CSC304 - Nisarg Shah 12

๐‘‰1โˆ— = max

๐‘ฅ1min๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2

โ€ข ๐‘‰1โˆ— : maximin value of P1

โ€ข ๐‘ฅ1โˆ— (maximizer) : maximin strategy of P1

โ€ข โ€œBy playing ๐‘ฅ1โˆ—, I guarantee myself at least ๐‘‰1

โˆ—โ€

โ€ข But if P1 โ†’ ๐‘ฅ1โˆ—, P2โ€™s best response โ†’ เทœ๐‘ฅ2

โžข Will ๐‘ฅ1โˆ— be the best response to เทœ๐‘ฅ2?

Maximin vs Minimax

CSC304 - Nisarg Shah 13

Player 1

Choose my strategy to maximize my reward, worst-case over P2โ€™s response

๐‘‰1โˆ— = max

๐‘ฅ1min๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2

Player 2

Choose my strategy to minimize P1โ€™s reward, worst-case over P1โ€™s response

๐‘‰2โˆ— = min

๐‘ฅ2max๐‘ฅ1

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2

Question: Relation between ๐‘‰1โˆ— and ๐‘‰2

โˆ—?

๐‘ฅ1โˆ— ๐‘ฅ2

โˆ—

Maximin vs Minimax

CSC304 - Nisarg Shah 14

๐‘‰1โˆ— = max

๐‘ฅ1min๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2 ๐‘‰2

โˆ— = min๐‘ฅ2

max๐‘ฅ1

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2

โ€ข What if (P1,P2) play (x1โˆ— , x2

โˆ—)?โžข P1 must get at least ๐‘‰1

โˆ— (ensured by P1)

โžข P1 must get at most ๐‘‰2โˆ— (ensured by P2)

โžข ๐‘‰1โˆ— โ‰ค ๐‘‰2

โˆ—

๐‘ฅ1โˆ— ๐‘ฅ2

โˆ—

The Minimax Theorem

CSC304 - Nisarg Shah 16

โ€ข Jon von Neumann [1928]

โ€ข Theorem: For any 2p-zs game,

โžข ๐‘‰1โˆ— = ๐‘‰2

โˆ— = ๐‘‰โˆ— (called the minimax value of the game)

โžข Set of Nash equilibria =

{ x1โˆ— , x2

โˆ— โˆถ x1โˆ— = maximin for P1, x2

โˆ— = minimax for P2}

โ€ข Corollary: ๐‘ฅ1โˆ— is best response to ๐‘ฅ2

โˆ— and vice-versa.

The Minimax Theorem

CSC304 - Nisarg Shah 17

โ€ข Jon von Neumann [1928]

โ€œAs far as I can see, there could be no theory of games โ€ฆ without that theorem โ€ฆ

I thought there was nothing worth publishing until the Minimax Theorem was provedโ€

โ€ข An unequivocal way to โ€œsolveโ€ zero-sum gamesโžข Optimal strategies for P1 and P2 (up to ties)

โžข Optimal rewards for P1 and P2 under a rational play

Proof of the Minimax Theorem

CSC304 - Nisarg Shah 18

โ€ข Simpler proof using Nashโ€™s theoremโžข But predates Nashโ€™s theorem

โ€ข Suppose ๐‘ฅ1, ๐‘ฅ2 is a NE

โ€ข P1 gets value ๐‘ฃ = ๐‘ฅ1๐‘‡๐ด ๐‘ฅ2

โ€ข ๐‘ฅ1 is best response for P1 : ๐‘ฃ = max๐‘ฅ1 ๐‘ฅ1๐‘‡๐ด ๐‘ฅ2

โ€ข ๐‘ฅ2 is best response for P2 : ๐‘ฃ = min๐‘ฅ2 ๐‘ฅ1๐‘‡๐ด ๐‘ฅ2

Proof of the Minimax Theorem

CSC304 - Nisarg Shah 19

โ€ข But we already saw ๐‘‰1โˆ— โ‰ค ๐‘‰2

โˆ—

โžข ๐‘‰1โˆ— = ๐‘‰2

โˆ—

max๐‘ฅ1

๐‘ฅ1๐‘‡๐ด ๐‘ฅ2 = ๐‘ฃ = min

๐‘ฅ2๐‘ฅ1

๐‘‡๐ด ๐‘ฅ2

โ‰ค max๐‘ฅ1

min๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2 = ๐‘‰1

โˆ—

๐‘‰2โˆ— = min

๐‘ฅ2max๐‘ฅ1

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2 โ‰ค

Proof of the Minimax Theorem

CSC304 - Nisarg Shah 20

โ€ข When ( ๐‘ฅ1, ๐‘ฅ2) is a NE, ๐‘ฅ1 and ๐‘ฅ2 must be maximin and minimax strategies for P1 and P2, respectively.

โ€ข The reverse direction is also easy to prove.

max๐‘ฅ1

๐‘ฅ1๐‘‡๐ด ๐‘ฅ2 = ๐‘ฃ = max

๐‘ฅ2๐‘ฅ1

๐‘‡๐ด ๐‘ฅ2

= max๐‘ฅ1

min๐‘ฅ2

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2 = ๐‘‰1

โˆ—

๐‘‰2โˆ— = min

๐‘ฅ2max๐‘ฅ1

๐‘ฅ1๐‘‡ โˆ— ๐ด โˆ— ๐‘ฅ2 =

Computing Nash Equilibria

CSC304 - Nisarg Shah 21

โ€ข Can I practically compute a maximin strategy (and thus a Nash equilibrium of the game)?

โ€ข Wasnโ€™t it computationally hard even for 2-player games?

โ€ข For 2p-zs games, a Nash equilibrium can be computed in polynomial time using linear programming.

โžข Polynomial in #actions of the two players: ๐‘š1 and ๐‘š2

Computing Nash Equilibria

CSC304 - Nisarg Shah 22

Maximize ๐‘ฃ

Subject to

๐‘ฅ1๐‘‡ ๐ด

๐‘—โ‰ฅ ๐‘ฃ, ๐‘— โˆˆ 1,โ€ฆ ,๐‘š2

๐‘ฅ1 1 +โ‹ฏ+ ๐‘ฅ1 ๐‘š1 = 1

๐‘ฅ1 ๐‘– โ‰ฅ 0, ๐‘– โˆˆ {1,โ€ฆ ,๐‘š1}

Minimax Theorem in Real Life?

CSC304 - Nisarg Shah 23

โ€ข If you were to play a 2-player zero-sum game (say, as player 1), would you always play a maximin strategy?

โ€ข What if you were convinced your opponent is an idiot?

โ€ข What if you start playing the maximin strategy, but observe that your opponent is not best responding?

Minimax Theorem in Real Life?

CSC304 - Nisarg Shah 24

KickerGoalie L R

L 0.58 0.95

R 0.93 0.70

KickerMaximize ๐‘ฃSubject to0.58๐‘๐ฟ + 0.93๐‘๐‘… โ‰ฅ ๐‘ฃ0.95๐‘๐ฟ + 0.70๐‘๐‘… โ‰ฅ ๐‘ฃ๐‘๐ฟ + ๐‘๐‘… = 1

๐‘๐ฟ โ‰ฅ 0, ๐‘๐‘… โ‰ฅ 0

GoalieMinimize ๐‘ฃSubject to0.58๐‘ž๐ฟ + 0.95๐‘ž๐‘… โ‰ค ๐‘ฃ0.93๐‘ž๐ฟ + 0.70๐‘ž๐‘… โ‰ค ๐‘ฃ๐‘ž๐ฟ + ๐‘ž๐‘… = 1

๐‘ž๐ฟ โ‰ฅ 0, ๐‘ž๐‘… โ‰ฅ 0

Minimax Theorem in Real Life?

CSC304 - Nisarg Shah 25

KickerGoalie L R

L 0.58 0.95

R 0.93 0.70

KickerMaximin:๐‘๐ฟ = 0.38, ๐‘๐‘… = 0.62

Reality:๐‘๐ฟ = 0.40, ๐‘๐‘… = 0.60

GoalieMaximin:๐‘ž๐ฟ = 0.42, ๐‘ž๐‘… = 0.58

Reality:๐‘๐ฟ = 0.423, ๐‘ž๐‘… = 0.577

Some evidence that people may play minimax strategies.

Minimax Theorem

CSC304 - Nisarg Shah 26

โ€ข We proved it using Nashโ€™s theoremโžข Cheating. Typically, Nashโ€™s theorem (for

the special case of 2p-zs games) is proved using the minimax theorem.

โ€ข Useful for proving Yaoโ€™s principle, which provides lower bound for randomized algorithms

โ€ข Equivalent to linear programming duality

John von Neumann

George Dantzig

von Neumann and Dantzig

CSC304 - Nisarg Shah 27

George Dantzig loves to tell the story of his meeting with John von Neumann on October 3, 1947 at the Institute for Advanced Study at Princeton. Dantzig went to that meeting with the express purpose of describing the linear programming problem to von Neumann and asking him to suggest a computational procedure. He was actually looking for methods to benchmark the simplex method. Instead, he got a 90-minute lecture on Farkas Lemma and Duality (Dantzig's notes of this session formed the source of the modern perspective on linear programming duality). Not wanting Dantzig to be completely amazed, von Neumann admitted:

"I don't want you to think that I am pulling all this out of my sleeve like a magician. I have recently completed a book with Morgenstern on the theory of games. What I am doing is conjecturing that the two problems are equivalent. The theory that I am outlining is an analogue to the one we have developed for games.โ€œ

- (Chandru & Rao, 1999)

top related