A Tutorial on Game Theory
Daniel B. Neill, Carnegie Mellon University, April 2004
(Slides hosted at University of Hawaii: nreed/ics606/lectures/gametheorytut.pdf)



A Tutorial on Game Theory

Daniel B. Neill, Carnegie Mellon University

April 2004

Outline
• Lecture 1:
  – Basics of rational decision theory.
  – Games in normal form; pure and mixed Nash equilibria.
  – Games in extensive form; backward induction.
• Lecture 2:
  – Bayesian games.
  – Auctions and negotiation.
  – Game theory vs. game playing.
• Lecture 3:
  – Evolution and learning in games.


What is game theory?

• Game theory can be defined as "the study of rational decision-making in situations of conflict and/or cooperation."
  – What is a decision?
  – What is a rational decision?
  – What do we mean by conflict and cooperation?

What is game theory? (2)

• The study of rational decision-making in situations of conflict and/or cooperation.

• Decision: a player's choice of what action to take, given some information about the state of the world.
  – The consequences of a player's decision will be a function of his action, the actions of other players (if applicable), and the current state.


What is game theory? (3)

• The study of rational decision-making in situations of conflict and/or cooperation.

• Rational: a rational player will choose the action which he expects to give the best consequences, where "best" is according to his set of preferences.
  – For example, people typically prefer more money to less money, or pleasure to pain.

Rational decisions

• A decision maker is assumed to have a fixed range of alternatives to choose from, and his choice influences the outcome of the situation.

• Each possible outcome is associated with a real number: its utility. This can be subjective (how much the outcome is desired) or objective (how good the outcome actually is for the player).

• In any case, the basic assumption of game theory is: A rational player will make the decision that maximizes his expected utility.


Types of decision situation

• Decision making under certainty: would you prefer to be paid $100 or punched in the nose?
  – Consequences C(A) of each action A are known.
  – A rational agent chooses the action with the highest utility U(C(A)).
  – For most people, U(paid $100) > U(punched in nose), so they would choose the former.

Note that we are only considering the rationality of actions, not preferences: a person who prefers a punch in the nose can still be rational under our definition!

Types of decision situation (2)
• Decision making under risk: would you wager $100 on the flip of a fair coin?
  – For each action, a probability distribution over possible consequences P(C | A) is known.
  – A rational agent chooses the action with the highest expected utility, ΣC P(C | A) U(C).
  – For most people, ½ U(gain $100) + ½ U(lose $100) < 0, so they would not take this wager.

Money is not utility, since most people are "risk-averse"!


Types of decision situation (3)
• Decision making under uncertainty: would you rather go to the movies or to the beach?
  – Agents are assumed to have a subjective probability distribution over possible states of nature P(S).
  – The consequence of an action is assumed to be a deterministic function C(A, S) of the action A and the state S.
  – A rational agent chooses the action with the highest subjective expected utility, ΣS P(S) U(C(A, S)).

Types of decision situation (4)
• Decision making under uncertainty: would you rather go to the movies or to the beach?
  – I believe there is a 40% chance that it will rain.
  – I will enjoy a movie whether it rains or not: U(C(movie, sun)) = U(C(movie, rain)) = 1.
  – I will not enjoy the beach if it is rainy, but I will have a great time if it is sunny: U(C(beach, rain)) = -1, U(C(beach, sun)) = 2.
  – SEU(movie) = 1.
  – SEU(beach) = .4(-1) + .6(2) = .8. I'm going to the movies!
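The SEU arithmetic above is easy to check mechanically. A minimal sketch (the dictionary representation is an illustrative choice, not from the slides; the probabilities and utilities are the ones above):

```python
# Subjective expected utility for the movies-vs-beach example.
P = {"rain": 0.4, "sun": 0.6}                      # subjective P(S)
U = {("movie", "rain"): 1, ("movie", "sun"): 1,    # utilities U(C(A, S))
     ("beach", "rain"): -1, ("beach", "sun"): 2}

def seu(action):
    """SEU(A) = sum over states S of P(S) * U(C(A, S))."""
    return sum(P[s] * U[(action, s)] for s in P)

best = max(["movie", "beach"], key=seu)
print(best)  # movie  (SEU(movie) = 1.0 > SEU(beach) = 0.8)
```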


What is game theory? (4)

• The study of rational decision-making in situations of conflict and/or cooperation.

• Game theory typically deals with situations in which multiple decision-makers interact: the consequences for each player are affected not only by his choice but also by choices of other players.

• In zero sum games (ex. chess), one player’s gain is the other’s loss, while in non-zero sum games (ex. business agreements), it is possible for both players to simultaneously gain or lose.

Zero sum games

• Examined in depth by Von Neumann and Morgenstern in the 1920s-1940s.

• Most important result: the minimax theorem, which states that under common assumptions of rationality, each player will make the choice that maximizes his minimum expected utility.

• This choice may be a pure strategy (always making the same choice) or a mixed strategy (a random choice between pure strategies).


Zero sum example (payoffs to P1 / P2):

                 P2 chooses a   P2 chooses b
  P1 chooses A      -7 / 7         -4 / 4
  P1 chooses B       2 / -2        -1 / 1

Solution: P1 always prefers B; P2 (knowing this) prefers Bb to Ba.
The value of the game is -1!

Mixed strategies
• Now consider a simple "soccer" game:
  – There are two players, the kicker and the goalie.
  – The kicker can shoot the ball either left or right; the goalie can dive either left or right.
  – If the goalie chooses correctly, he blocks the ball; if the goalie chooses wrong, it's a goal!
• What should each player do? (Payoffs kicker / goalie:)

                     Goalie dives L   Goalie dives R
  Kicker shoots L        -1 / 1           1 / -1
  Kicker shoots R         1 / -1         -1 / 1


Non-zero sum games
• Can be cooperative (players can make enforceable agreements: "I'll cooperate if you do") or non-cooperative (no prior agreements can be enforced).
  – In non-cooperative games, an agreement must be self-enforcing, in that players have no incentive to deviate from it.
• Most important concept: Nash Equilibrium.
  – A combination of strategy choices such that no player can increase his utility by changing strategies.
  – Nash's Thm: every finite game has at least one NE (possibly in mixed strategies).

Non-zero sum example (payoffs to P1 / P2):

                 P2 chooses a   P2 chooses b
  P1 chooses A      4 / 4          0 / 2
  P1 chooses B      2 / 0          1 / 1

Aa is a Nash equilibrium: each player gets 4 at Aa, but only 2 if he plays B instead!
Are there any other Nash equilibria for this game?


Formal definitions
• A game in normal form consists of:
  – A list of players i = 1…n.
  – A finite set of strategies Si for each player i.
  – A utility function ui for each player i, where ui : (S1 × S2 × … × Sn) → R. (ui maps a combination of players' pure strategies to the payoff for player i.)
• Normal form gives no indication of the order of players' moves; for this we need extensive form (more on this later). For now, assume that all players choose strategies simultaneously!

Formal definitions (2)
• A Nash equilibrium is a set of strategies s1 ∈ S1, …, sn ∈ Sn, such that for each player i:
  si = arg maxs ui(s1, …, si-1, s, si+1, …, sn)
• In other words, each player's strategy si is a strategy s that maximizes his payoff, given the other players' strategies; no player can do better by switching strategies.


Formal definitions (3)
• A mixed strategy σi is a probability distribution over player i's pure strategies si.
  – For example, if A and B are the pure strategies for P1, then σ1 might be (¼ A, ¾ B).
• Then the utility is ui(σ1, …, σn) = Σs∈S ui(s1, …, sn) Πj=1…n Prσj(sj).
• Nash equilibrium in mixed strategies: a set of mixed strategies σ1 … σn, such that for each player i:
  σi = arg maxσ ui(σ1, …, σi-1, σ, σi+1, …, σn)
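The mixed-strategy utility formula translates directly into code for the two-player case. A sketch (the dictionary representation is an illustrative choice, not from the slides), using the soccer game from earlier as the example:

```python
from itertools import product

# u[(s1, s2)] = (payoff to P1, payoff to P2): a two-player normal-form game.
# Example: the kicker/goalie game (L = shoot/dive left, R = right).
u = {("L", "L"): (-1, 1), ("L", "R"): (1, -1),
     ("R", "L"): (1, -1), ("R", "R"): (-1, 1)}

def mixed_utility(u, sigma1, sigma2):
    """u_i(sigma) = sum over pure profiles s of u_i(s) * prod_j Pr_{sigma_j}(s_j)."""
    totals = [0.0, 0.0]
    for s1, s2 in product(sigma1, sigma2):
        p = sigma1[s1] * sigma2[s2]        # probability of this pure profile
        totals[0] += p * u[(s1, s2)][0]
        totals[1] += p * u[(s1, s2)][1]
    return tuple(totals)

print(mixed_utility(u, {"L": 0.5, "R": 0.5}, {"L": 0.5, "R": 0.5}))  # (0.0, 0.0)
```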

Computing Nash equilibria
• A strategy si is strictly dominated if there exists another strategy ti in Si such that player i always scores better with ti than with si.
• Assumption: a rational player will never play a strictly dominated strategy!
• Moreover, since the other players know this, we can iteratively delete strictly dominated strategies:
  – the order of deletion doesn't matter;
  – we will never delete a NE;
  – if, after we do this, there is only one combination of strategies left, this is the unique NE of the game.


Strict domination example (payoffs P1 / P2):

        a       b       c
  A   3 / 3   4 / 1   2 / 2
  B   1 / 3   2 / 4   6 / 8
  C   2 / 4   3 / 6   7 / 0

  – C strictly dominates B for P1: delete row B.
  – Then a strictly dominates c for P2: delete column c.
  – Then A strictly dominates C for P1: delete row C.
  – Then a strictly dominates b for P2: delete column b.

Thus Aa is the unique Nash equilibrium!


If both players are rational, and this is common knowledge, Aa will be the outcome of the game!
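Iterated deletion of strictly dominated strategies is straightforward to mechanize for two-player games. A sketch, run on the 3×3 example above (the payoff entries are as decoded from the slide's matrix):

```python
# Iterated elimination of strictly dominated strategies (two players).
# u[(r, c)] = (payoff to P1, payoff to P2).
u = {("A","a"): (3,3), ("A","b"): (4,1), ("A","c"): (2,2),
     ("B","a"): (1,3), ("B","b"): (2,4), ("B","c"): (6,8),
     ("C","a"): (2,4), ("C","b"): (3,6), ("C","c"): (7,0)}
rows, cols = ["A", "B", "C"], ["a", "b", "c"]

def dominated(strats, others, payoff):
    """Return some strategy strictly dominated by another, or None."""
    for s in strats:
        for t in strats:
            if t != s and all(payoff(t, o) > payoff(s, o) for o in others):
                return s
    return None

while True:
    r = dominated(rows, cols, lambda s, o: u[(s, o)][0])  # P1's strategies
    if r: rows.remove(r); continue
    c = dominated(cols, rows, lambda s, o: u[(o, s)][1])  # P2's strategies
    if c: cols.remove(c); continue
    break

print(rows, cols)  # ['A'] ['a']  -> Aa is the unique Nash equilibrium
```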


Computing Nash equilibria (2)
• Strict domination is great… when it applies.
• But for some games, no strategies can be eliminated by strict domination. What to do now?
• We can check whether each combination of pure strategies is a Nash equilibrium:
  – Can any player do better by switching strategies? If not, then it's a NE.

A nice example from Andrew (payoffs P1 / P2): where is the Nash equilibrium?

        a       b       c
  A   0 / 4   4 / 0   5 / 3
  B   4 / 0   0 / 4   5 / 3
  C   3 / 5   3 / 5   6 / 6

Computing Nash equilibria (3)
• Here's a neat little trick for finding pure strategy NE in two player games:
  – For each column, color the box(es) with maximum payoff to P1 red.
  – For each row, color the box(es) with maximum payoff to P2 blue.
  – The Nash equilibria are the set of squares colored both red and blue (purple in our picture).

(Same example game as above.)
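The coloring trick is just "mark each player's best responses and intersect the two sets." A sketch on Andrew's 3×3 example (payoffs as decoded from the slide's matrix):

```python
# Pure Nash equilibria = cells that are simultaneously a best response
# for P1 within their column ("red") and for P2 within their row ("blue").
u = {("A","a"): (0,4), ("A","b"): (4,0), ("A","c"): (5,3),
     ("B","a"): (4,0), ("B","b"): (0,4), ("B","c"): (5,3),
     ("C","a"): (3,5), ("C","b"): (3,5), ("C","c"): (6,6)}
rows, cols = ["A", "B", "C"], ["a", "b", "c"]

red  = {(r, c) for c in cols for r in rows
        if u[(r, c)][0] == max(u[(r2, c)][0] for r2 in rows)}
blue = {(r, c) for r in rows for c in cols
        if u[(r, c)][1] == max(u[(r, c2)][1] for c2 in cols)}

print(sorted(red & blue))  # [('C', 'c')]
```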


Computing Nash equilibria (4)
• What if there are no pure strategy equilibria?
  – By Nash's Theorem, a mixed strategy equilibrium must exist.
• For a mixed strategy equilibrium, each player chooses a mixture of strategies that makes the other players indifferent between the strategies over which they are mixing.

        a       b
  A   0 / 4   4 / 0
  B   4 / 0   0 / 4

• If P1 chooses (½ A, ½ B), P2 is indifferent between a and b, and if P2 chooses (½ a, ½ b), P1 is indifferent between A and B.
• Thus ((½ A, ½ B), (½ a, ½ b)) is a NE.

This game can be thought of as a variant on "matching pennies," where the winner gets four points and the loser none.

Computing Nash equilibria (5)
• What about this game?

        a       b
  A   4 / 4   0 / 3
  B   2 / 0   1 / 1

• Aa and Bb are both pure strategy NE. Are there any mixed NE?
• Assume there is a mixed strategy NE with P1(A) = x and P2(a) = y.
• For P1 to be indifferent between A and B: 4y + 0(1-y) = 2y + 1(1-y) ⇒ y = ⅓.
• For P2 to be indifferent between a and b: 4x + 0(1-x) = 3x + 1(1-x) ⇒ x = ½.

Mixed strategy NE: ((½ A + ½ B), (⅓ a + ⅔ b)).
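The two indifference equations are linear, so a general 2×2 solver takes only a few lines. A sketch (the 0/1 index convention is an illustrative choice; it assumes a fully mixed equilibrium exists), checked against the game above:

```python
from fractions import Fraction

def mixed_2x2(u):
    """Mixed NE of a 2x2 game: each player mixes to make the *opponent*
    indifferent. u[(i, j)] = (P1 payoff, P2 payoff), i, j in {0, 1}.
    Assumes an interior (fully mixed) equilibrium exists."""
    a = {k: Fraction(v[0]) for k, v in u.items()}   # P1's payoffs
    b = {k: Fraction(v[1]) for k, v in u.items()}   # P2's payoffs
    # y = P(col 0) makes P1 indifferent between rows 0 and 1:
    y = (a[1,1] - a[0,1]) / ((a[0,0] - a[1,0]) + (a[1,1] - a[0,1]))
    # x = P(row 0) makes P2 indifferent between cols 0 and 1:
    x = (b[1,1] - b[1,0]) / ((b[0,0] - b[0,1]) + (b[1,1] - b[1,0]))
    return x, y

# The game above: A = row 0, B = row 1, a = col 0, b = col 1.
u = {(0,0): (4,4), (0,1): (0,3), (1,0): (2,0), (1,1): (1,1)}
print(mixed_2x2(u))  # (Fraction(1, 2), Fraction(1, 3))
```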


Extensive form games
• An extensive form game is a "game tree": a rooted tree where each non-terminal node represents a choice that a player must make, and each terminal node gives payoffs for all players.
• For example, in this 3-player game, first P1 must choose between L and R.
  – Assume he chooses R: then P3 must choose between X and Y.
  – Assume P3 chooses Y: then player 1 scores 3, players 2 and 3 score 5, and the game ends.

The tree (payoff triples are P1, P2, P3):
  P1 chooses L or R.
  After L, P2 (node a) chooses A, B, or C: A → (1, 0, 1), B → (0, 0, 8), C → (7, 3, 4).
  After R, P3 chooses X or Y: Y → (3, 5, 5) and the game ends.
  After X, P2 (node b) chooses U or D: U → (9, 6, 1), D → (0, 4, 6).

Solving extensive form games
• We use a procedure called "backward induction," reasoning backward from the terminal nodes of the tree.
  – At node a (P2's choice among A, B, and C), P2 would maximize his utility by choosing C, scoring 3 points instead of 0.
  – At node b (P2's choice between U and D), P2 would choose U.
  – Now, what would P3 choose at his decision node?


Solving extensive form games (2)
• The solution depends on common knowledge of rationality:
  – Since P3 knows P2 is rational and will choose U at node b, P3 knows he will only get 1 if he chooses X. Thus he chooses Y instead.
  – Now, P1 knows he will get 7 if he chooses L, or 3 if he chooses R. Thus he chooses L.
  – The value of the game is (7, 3, 4).
• Backward induction gives a unique solution to any extensive form game w/ perfect information (see below) and no ties in payoffs.
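Backward induction is a short recursion over the tree. A sketch using the 3-player example (leaf payoffs as decoded from the slides; the nested-tuple encoding is an illustrative choice):

```python
# A node is either a payoff tuple (leaf) or (player, {action: subtree}).
# Payoff triples give (u1, u2, u3); players are numbered 0, 1, 2.
tree = (0, {"L": (1, {"A": (1, 0, 1), "B": (0, 0, 8), "C": (7, 3, 4)}),
            "R": (2, {"X": (1, {"U": (9, 6, 1), "D": (0, 4, 6)}),
                      "Y": (3, 5, 5)})})

def backward_induction(node):
    """Return the payoff vector reached under common knowledge of rationality."""
    if not isinstance(node[1], dict):
        return node                       # leaf: a payoff vector
    player, children = node
    # The player to move picks the child whose solved value is best for him.
    return max((backward_induction(c) for c in children.values()),
               key=lambda payoff: payoff[player])

print(backward_induction(tree))  # (7, 3, 4)
```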

Games with imperfect information
• In games with perfect information, at each node in the tree, the player knows exactly where in the tree he is.
• In games with imperfect information, this may not be true.
• For example, in this game, if P1 chooses L or M, P2 must choose U or D, without knowing whether he is at node a or node b.
  – Nodes a and b are part of the same "information set."
• Games with imperfect information tend to be much harder to solve!

The tree (payoffs P1 / P2):
  P1 chooses L, M, or R; R ends the game with payoffs (k, 3).
  After L (node a), P2 chooses U → 4 / 1 or D → 1 / 4.
  After M (node b), P2 chooses U → 1 / 4 or D → 4 / 1.

What should P1 choose if k = 3? And if k = 2?


Strategies in extensive form games
• A pure strategy si for player i consists of a choice for each of player i's information sets.
  – In a game with perfect information, each information set consists of a single decision node.
  – In the 3-player game above, P1 has 2 strategies: (L) and (R). P2 has 6 strategies: (A, U), (A, D), (B, U), (B, D), (C, U), and (C, D).
  – Mixed strategies σi are defined by randomizing over pure strategies as before; pure and mixed Nash equilibria are defined as above.

Transforming games from extensive to normal form
The imperfect-information game above with k = 2, written in normal form (payoffs P1 / P2):

        U       D
  L   4 / 1   1 / 4
  M   1 / 4   4 / 1
  R   2 / 3   2 / 3

This game has no pure Nash equilibria; its mixed equilibrium is (½ L + ½ M), (½ U + ½ D).


Transforming games from extensive to normal form (2)
The same game with k = 3:

        U       D
  L   4 / 1   1 / 4
  M   1 / 4   4 / 1
  R   3 / 3   3 / 3

This game has no pure Nash equilibria; its mixed equilibria are (R, (pU + (1-p)D)) for ⅓ ≤ p ≤ ⅔.

Transforming games from extensive to normal form (3)
Now suppose nodes a and b are separate information sets, so P2's strategies specify a choice at each node: e.g. Du means "play D at node a, u at node b." Payoffs P1 / P2:

        Uu      Ud      Du      Dd
  L   4 / 1   4 / 1   1 / 4   1 / 4
  M   1 / 4   4 / 1   1 / 4   4 / 1
  R   k / 3   k / 3   k / 3   k / 3

This game has a pure Nash equilibrium (R, Du) for all k > 1.


"Implausible" threats
• A rational agent comes up to you holding a grenade, and says "Give me $100 or I'll blow us both up."
• Do you believe him?

The game tree: you choose Pay or Don't pay; then he chooses whether to detonate. Payoffs (you / him): after Pay, not detonating gives -100 / 100 and detonating gives -∞ / -∞; after Don't pay, not detonating gives 0 / 0 and detonating gives -∞ / -∞.

Another nice example from Andrew's slides!
What if you don't know that he's rational?

What to do when there are multiple Nash equilibria?
• NE says nothing about which equilibrium should be played.
• Various refinements of NE have been proposed, with the goal of separating "reasonable" from "unreasonable" equilibria.
• For example, Ab and Ba are both NE of the game below.
  – We would like to eliminate Ab, since P2 is playing a weakly dominated strategy.
  – Assume a "trembling hand": what if P1 will accidentally play B at equilibrium Ab, or A at equilibrium Ba, with some small probability?

        a       b
  A   2 / 2   2 / 2
  B   3 / 1   0 / 0

Ba is a "perfect equilibrium" (Selten, 1973), and Ab is not.


What to do when there are multiple Nash equilibria? (2)
• In "coordination games" such as this one, traditional refinements fail; for instance, both equilibria are "perfect."
• How to choose between equilibria?
  – Choose the "Pareto dominant" NE (Aa)?
  – Choose the "risk dominant" NE (Bb)?
• Maybe evolutionary games can help!
  – In a population playing this coordination game, where players' choices evolve over time, which strategy is more likely to dominate the population?
  – More on this in lecture 3!

        a       b
  A   4 / 4   0 / 3
  B   3 / 0   2 / 2

Pareto dominant: higher payoffs if the opponent coordinates.
Risk dominant: higher payoffs if the opponent randomizes 50/50.

Outline
• Lecture 1:
  – Basics of rational decision theory.
  – Games in normal form; pure and mixed Nash equilibria.
  – Games in extensive form; backward induction.
• Lecture 2:
  – Bayesian games.
  – Auctions and negotiation.
  – Game theory vs. game playing.
• Lecture 3:
  – Evolution and learning in games.


Bayesian games
• All the games we have examined so far are games of complete information: players have common knowledge of the structure of the game and the payoffs.
• What if the players do not know some of the parameters of the game?
• For example, consider the "entry" game:
  – P1 and P2 are businesses; P1 currently controls the market.
  – P1 must choose whether to invest money in upgrading its product.
  – P2 must choose whether to enter the market.
  – If P2 enters the market, it will obtain a decent market share only if P1 has not invested.
  – The catch: P2 doesn't know P1's upgrade cost!

Payoffs P1 / P2:
                 enter    don't
  P1 invests    ? / -1    ? / 0
  P1 waits      2 / 1     4 / 0

Solving Bayesian games
• Harsanyi's solution:
  – Each player has a type, representing all the private information the player has that is not common knowledge.
  – Players do not know other players' types, but they do know what possible types a player can have.
  – The game starts with a random move by Nature, which assigns a type to each player.
  – The probability distribution over types is assumed to be common knowledge.
  – This transforms the game from incomplete information to imperfect information, and this we can solve!


Solving Bayesian games (2)
• For simplicity, let us assume P1 has two possible types.
  – The cost of investment is either high or low, with the corresponding payoff tables given below.
  – Further assumption: Pr(high) = ¼, Pr(low) = ¾.
• P2 only has a single type.
• What should each player do in this situation?

Low cost (probability ¾):
                 enter    don't
  P1 invests    3 / -1    5 / 0
  P1 waits      2 / 1     4 / 0

High cost (probability ¼):
                 enter    don't
  P1 invests    0 / -1    2 / 0
  P1 waits      2 / 1     4 / 0

Solving Bayesian games (3)
• What should each player do?
  – If P1's cost is high, he should always wait (waiting strictly dominates investing in the high-cost table).
  – If P1's cost is low, he should always invest (investing strictly dominates waiting in the low-cost table).
  – Thus P2 knows that P1 will invest with probability ¾ and wait with probability ¼.
  – So P2's expected gain from entering is ¾(-1) + ¼(1) < 0, and P2 should not enter.
  – This game has a unique "Bayes-Nash equilibrium" in pure strategies:
      s1(high) = wait
      s1(low) = invest
      s2 = don't enter

(Payoff tables as on the previous slide.)
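The expected-value reasoning above can be sketched in a few lines (payoff tables as decoded from the slides; action and type names are illustrative):

```python
# Entry game: P1's payoff table depends on his type (investment cost).
# payoff[type][(a1, a2)] = (P1 payoff, P2 payoff)
payoff = {
    "low":  {("invest","enter"): (3,-1), ("invest","dont"): (5,0),
             ("wait","enter"):   (2, 1), ("wait","dont"):   (4,0)},
    "high": {("invest","enter"): (0,-1), ("invest","dont"): (2,0),
             ("wait","enter"):   (2, 1), ("wait","dont"):   (4,0)},
}
prior = {"low": 0.75, "high": 0.25}

# Each type's dominant action, found here by maximin (which selects the
# dominant action whenever strict dominance holds, as it does here).
s1 = {t: max(["invest", "wait"],
             key=lambda a1: min(payoff[t][(a1, a2)][0] for a2 in ["enter", "dont"]))
      for t in prior}

# P2's expected payoff from entering, given P1's type-contingent strategy:
ev_enter = sum(prior[t] * payoff[t][(s1[t], "enter")][1] for t in prior)
s2 = "enter" if ev_enter > 0 else "dont"
print(s1, ev_enter, s2)  # invest if low, wait if high; ev -0.5 -> don't enter
```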


Solving Bayesian games (4)
• Here's a harder case (payoff tables below):
  – If P1's cost is high, he should always wait.
  – Let x = P(P1 invests | low cost), and y = P(P2 enters).
  – If P1's cost is low, his expected gain from investing is 3 - (2y + 4(1-y)) = 2y - 1.
  – P2's expected gain from entering is: ¾(-1(x) + 1(1-x)) + ¼(1) = 1 - 3x/2.
• Thus we know:
  – x > ⅔ ⇒ y = 0; x < ⅔ ⇒ y = 1.
  – y > ½ ⇒ x = 1; y < ½ ⇒ x = 0.

Low cost (probability ¾):
                 enter    don't
  P1 invests    3 / -1    3 / 0
  P1 waits      2 / 1     4 / 0

High cost (probability ¼):
                 enter    don't
  P1 invests    0 / -1    0 / 0
  P1 waits      2 / 1     4 / 0

Mixed BNE: σ1(high) = wait; σ1(low) = invest w/ prob ⅔; σ2 = enter w/ prob ½.

Bayesian games: formal definition
• A Bayesian game consists of:
  – A list of players i = 1…n.
  – A finite set of types Θi for each player i.
  – A finite set of actions Ai for each player i.
  – A utility function ui for each player i, where ui : A × Θ → R. (ui maps a combination of players' actions and types to the payoff for player i.)
  – A prior distribution on types, P(θ) for θ ∈ Θ.
• The set of pure strategies Si for each player i is defined as Si : Θi → Ai.
  – A strategy is a mapping from types to actions.


Bayesian games: formal definition (2)
• A Bayes-Nash equilibrium is a combination of strategies s1 ∈ S1, …, sn ∈ Sn, such that for each player i and each possible type θi ∈ Θi:
  si(θi) = arg maxs Σθ-i ui(s1(θ1), …, si-1(θi-1), s(θi), si+1(θi+1), …, sn(θn)) P(θ-i | θi)
• At a BNE, no player type can increase his expected payoff (over the distribution of possible opponents' types) by changing strategies.
• Mixed strategies are distributions over pure strategies as before; mixed BNE are defined similarly.

What if the opponent's set of strategies is unknown?
• Consider the following variation on the rock-scissors-paper game:
  – There are three "taboo" cards, labeled "Rock," "Scissors," and "Paper" respectively.
  – Both players draw a card, then play rock-scissors-paper, with a catch: neither player is allowed to play the action on his taboo card.
  – What should each player do?


"Taboo" Rock-Scissors-Paper (1)
• First consider the simplest variant, where each player gets to see the other's card.
• There are six possible games, depending on who draws which card.
• For example, if P1 draws Rock and P2 draws Scissors, we have the game shown here (payoffs P1 / P2):

             P2: R     P2: P
  P1: S     -1 / 1     1 / -1
  P1: P      1 / -1    0 / 0

• In half of these games, P1 has an advantage (value of the game = ⅓) and in half P2 has an advantage (value of the game = -⅓).

Mixed NE: P1 plays (⅓ S, ⅔ P), and P2 plays (⅓ R, ⅔ P).
Expected payoffs: (⅓, -⅓)

"Taboo" Rock-Scissors-Paper (2)
• Next consider an unfair variant, where P2 gets to see P1's card, but not vice-versa.
• The game is symmetric w.r.t. P1's draw; assume wlog that he draws Rock, giving the Bayesian game shown here (payoffs P1 / P2).

If P2 draws Scissors (probability ½):
             P2: R     P2: P
  P1: S     -1 / 1     1 / -1
  P1: P      1 / -1    0 / 0

If P2 draws Paper (probability ½):
             P2: R     P2: S
  P1: S     -1 / 1     0 / 0
  P1: P      1 / -1   -1 / 1

• Bayesian Nash equilibrium:
  σ1 = (⅓ S, ⅔ P)
  σ2(S) = (⅔ R, ⅓ P)
  σ2(P) = (S)

Work this out for practice!
The value of the game is -⅙. P2 does have an advantage!


"Taboo" Rock-Scissors-Paper (3)
• Now let's make it more fun: neither player gets to see the opponent's card!
  – What should each player do in this situation?
  – Play a couple games yourself and try to work out the solution… answer on next slide.
  – Hint: the game is symmetric, with respect to the players and with respect to the cards drawn.

"Taboo" Rock-Scissors-Paper (4)
(Write Pi,C(X) for the probability that player i plays X given that he drew card C.)
• By symmetry, we know P1,R(S) = P2,R(S) = P1,S(P) = P2,S(P) = P1,P(R) = P2,P(R) = x.
• Then P1,R(P) = P2,R(P) = P1,S(R) = P2,S(R) = P1,P(S) = P2,P(S) = 1-x.
• wlog assume P1 draws R; P2 does not know this. Then P2 has S or P with equal probability; what should P1 do?
• Expected payoff for P1,R(S) = ½(-P2,S(R) + P2,S(P)) + ½(-P2,P(R)) = -½(1-x) + ½x - ½x = ½(x-1).
• Expected payoff for P1,R(P) = ½(P2,S(R)) + ½(P2,P(R) - P2,P(S)) = ½(1-x) + ½x - ½(1-x) = ½x.

(Payoff matrices as on the "Taboo" Rock-Scissors-Paper (2) slide.)


"Taboo" Rock-Scissors-Paper (5)
• Expected payoff for P1,R(S) = ½(x-1).
• Expected payoff for P1,R(P) = ½x.
• Thus P1 should play P if he draws R, and hence we know x = 0.
• By symmetry, each player should:
  – Play P if he draws R.
  – Play R if he draws S.
  – Play S if he draws P.
• This is the symmetric Bayes-Nash equilibrium!

Bayesian games in the real world: an introduction to auctions

• Let’s assume that the seller wants to sell an object worth nothing to him: anything the seller is paid is pure profit.

• There are N buyers, each with a value vi for the object.

• vi are drawn uniformly at random (i.i.d.) from [0,1], and this is common knowledge.

• Each buyer knows his own value vi, but not the other buyers’ values vj (j ≠ i).


First price sealed bid auction
• Each buyer i writes down his bid bi simultaneously; no buyer gets to see another's bid.
• The buyer with the highest bid pays the seller bi and gets the object.
• What should a buyer bid, in terms of his value vi? (Why not bid bi = vi?)
  – First, a couple of obvious properties (proof omitted).
  – By symmetry, each buyer must have the same bid function bi(vi) = b*(vi) at the Bayes-Nash equilibrium.
  – Also, b*(vi) must be monotonically increasing in vi.

First price sealed bid (2)

b*(vi) = arg maxb E[Profit if playing b]
       = arg maxb (profit if b wins) · P(b wins)
       = arg maxb (vi - b) · P(all b*(vj) < b)
       = arg maxb (vi - b) · P(b*(v) < b)^(N-1), where v ~ U[0,1].
Since b* is monotonically increasing, this equals:
         arg maxb (vi - b) · P(v < (b*)^(-1)(b))^(N-1)
       = arg maxb (vi - b) · ((b*)^(-1)(b))^(N-1)
       = arg maxb (vi - b) · (f(b))^(N-1), where f = (b*)^(-1).


First price sealed bid (3)

Setting the first derivative equal to zero:
  ∂/∂b [(vi - b) (f(b))^(N-1)] = 0
  (vi - b)(N-1)(f(b))^(N-2) f′(b) - (f(b))^(N-1) = 0
  (vi - b)(N-1) f′(b) - f(b) = 0
  (f(b) - b)(N-1) f′(b) - f(b) = 0      [since b = b*(vi), we have vi = f(b)]
  f′(b) = f(b) / ((N-1)(f(b) - b))

First price sealed bid (4)

Solving the differential equation:
  f(b) = (N / (N-1)) b  ⇒  vi = (N / (N-1)) b  ⇒  b = ((N-1) / N) vi
  b*(vi) = (1 - 1/N) vi.
Thus, in a first price sealed bid auction, each bidder should bid (1 - 1/N) times his value!
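The closed form can be sanity-checked numerically: fix one bidder's value, assume the other N-1 bidders follow b*(v) = (1 - 1/N)v, and grid-search this bidder's expected profit (the choices N = 4 and vi = 0.8 are arbitrary illustrations):

```python
N, vi = 4, 0.8
bstar = lambda v: (1 - 1/N) * v          # conjectured equilibrium bid function

# If opponents bid bstar(v_j) with v_j ~ U[0,1], their bids are U[0, 1-1/N],
# so P(win with bid b) = (b / (1-1/N))**(N-1) for b <= 1-1/N.
def expected_profit(b):
    return (vi - b) * (b / (1 - 1/N)) ** (N - 1)

grid = [i / 10000 for i in range(int((1 - 1/N) * 10000) + 1)]
best = max(grid, key=expected_profit)
print(best, bstar(vi))  # best response on the grid matches bstar(0.8) = 0.6
```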


Second price sealed bid auction
• Each buyer i writes down his bid bi simultaneously; no buyer gets to see another's bid.
• The buyer with the highest bid gets the object, but pays the seller only the amount b′ of the second highest bid.
• What should a buyer bid, in terms of his value vi?

Also called a Vickrey auction!

Second price sealed bid (2)
• If a bidder with value vi bids bi, and the highest bid among the other bidders is b′, his utility is vi - b′ if bi > b′, and 0 otherwise.
  – If b′ < vi, any bid bi > b′ is optimal.
  – If b′ > vi, any bid bi < b′ is optimal.
  – If b′ = vi, any bid is optimal.
• Thus the dominant strategy for a bidder with valuation vi is to bid bi = vi.
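That truthful bidding is weakly dominant can be verified by brute force over a grid of possible values and opponents' highest bids; a quick sketch:

```python
def utility(bid, value, b_other):
    """Winner pays the second-highest bid b_other; ties counted as losses."""
    return value - b_other if bid > b_other else 0.0

grid = [i / 20 for i in range(21)]       # bids and values in {0, 0.05, ..., 1}
for value in grid:
    for b_other in grid:
        truthful = utility(value, value, b_other)
        # No alternative bid ever does strictly better than bidding value:
        assert all(utility(b, value, b_other) <= truthful + 1e-12 for b in grid)
print("truthful bidding is weakly dominant on the grid")
```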


How to compute the seller's profit?
• In the first price sealed bid auction, the seller's profit is (N-1)/N times the highest vi.
• In the second price sealed bid auction, the seller's profit is the second highest vi.
• A little useful information from statistics, given N i.i.d. values drawn from U[0,1]:
  – E[kth order statistic] = k / (N+1).
  – Var[kth order statistic] = k(N-k+1) / ((N+2)(N+1)²).
• Expected profit from second price sealed bid: (N-1) / (N+1).
• Expected profit from first price sealed bid: ((N-1) / N) · (N / (N+1)) = (N-1) / (N+1).

The seller's expected profit is the same in either case! This is a special case of the "revenue equivalence theorem."
Which auction has higher variance?
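Revenue equivalence, and the variance question, can be explored by simulation; a sketch with N = 5 bidders (the trial count and seed are arbitrary):

```python
import random

random.seed(0)
N, trials = 5, 200_000
first, second = [], []
for _ in range(trials):
    vals = sorted(random.random() for _ in range(N))
    first.append((1 - 1/N) * vals[-1])   # winner bids (1 - 1/N) * highest value
    second.append(vals[-2])              # winner pays the second-highest value

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Both means should be near (N-1)/(N+1) ~ 0.667; second price is more variable.
print(round(mean(first), 3), round(mean(second), 3))
print(round(var(first), 4), round(var(second), 4))
```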

Oral auctions
• English auction: the "standard" auction with ascending price; the item is sold when all but one buyer drop out.
  – Dominant strategy: remain in the bidding until the price is equal to vi, then drop out.
  – Equivalent to 2nd price sealed bid (Why?).
• Dutch auction: descending price; the item is sold when a buyer agrees to buy for that price.
  – Equivalent to 1st price sealed bid (Why?)


Game theory vs. game playing

• Let’s say we want to apply game theory to help us win at chess (or checkers, or Go…)

• How would we do this?
• Chess is a two-player, zero-sum game with perfect information… so just write out the game tree and apply backward induction!?

• What’s wrong with this idea?

(Game tree figure: leaf payoffs +1, -1, 0.)

Game theory vs. game playing (2)

• In chess, a player has an average of b = 35 moves from a given state.

• Let’s say games are up to L = 100 moves long…

• This gives us about 35^100 ≈ 10^154 nodes!

• Writing out the game tree is not an option… so what can we do instead?

(Game tree figure: b children per node on average, L levels, b^L leaf nodes; leaf payoffs +1, -1.)


Recursive game-tree search
• Rather than calculating utilities bottom-up for the entire tree, on each move we do a top-down recursive search from the current board state.
• Assume an extensive-form game of perfect information; not necessarily zero-sum.
• For two-player zero-sum games, ui = (vi, -vi); so M = arg max_i vi for P1, and M = arg min_i vi for P2.

recursive-gts (S, d):                    (S = current state, d = search depth)
    If (S is terminal node) return u(S).
    If (d = 0) return estimate of u(S).  (heuristic evaluation; see next slide)
    Otherwise:
        Let p = player-to-move(S).
        Let S1 … Sk = children(S).
        For i = 1…k, let ui = recursive-gts(Si, d-1).  (payoff vectors)
        Let M = arg max_{i=1…k} ui(p).
        Return uM.

(Returns the vector of utilities for the current state.)

Player p’s expected utility for choosing action i is the pth component of the utility vector ui. Since he is rational, he will choose the action i which maximizes this utility!
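The pseudocode above translates almost line-for-line into Python. `Node` and `estimate` are illustrative stand-ins (not from the slides):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    player: int = 0                       # index of the player to move
    children: list = field(default_factory=list)
    payoff: tuple = None                  # utility vector, set at terminal nodes

def estimate(node):
    # Stand-in heuristic evaluation used at the depth cutoff (see next slide).
    return node.payoff if node.payoff is not None else (0, 0)

def recursive_gts(state, depth):
    """Return the utility vector backed up from `state` with lookahead `depth`."""
    if state.payoff is not None:          # terminal node
        return state.payoff
    if depth == 0:
        return estimate(state)
    p = state.player
    child_utils = [recursive_gts(child, depth - 1) for child in state.children]
    return max(child_utils, key=lambda u: u[p])   # rational player p maximizes u[p]

# Tiny zero-sum example: P1 avoids the branch where P2 can punish him.
leaf = lambda u: Node(payoff=u)
root = Node(player=0,
            children=[Node(player=1, children=[leaf((1, -1)), leaf((-1, 1))]),
                      leaf((0, 0))])
print(recursive_gts(root, 2))  # → (0, 0)
```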

Heuristic evaluation functions
• We typically will not be able to search deeply enough to reach a terminal state.
• Thus we must have some way of estimating the utility vector of a given non-terminal state.
• A heuristic evaluation function is a mapping Φ: S → û(S), where û is an estimate of the true utility vector u = u1 … uN.
• For chess, one simple evaluation function is:
– Count the total “point value” p of each player’s pieces, where more powerful pieces are worth more points.
– Φ: S → (v, -v), where v = (p1 - p2) / (p1 + p2).

Piece values: Queen = 10, Rook = 5, Knight = 3, Bishop = 3, Pawn = 1.

v → 1 for p1 >> p2, and v → -1 for p1 << p2.
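A minimal sketch of this material-count evaluation, using the point values and formula from the slide (the string encoding of piece lists is illustrative):

```python
# Point values from the slide; pieces given as strings of piece letters.
POINTS = {'Q': 10, 'R': 5, 'N': 3, 'B': 3, 'P': 1}

def material_eval(pieces1, pieces2):
    """Return (v, -v) with v = (p1 - p2) / (p1 + p2), the zero-sum estimate."""
    p1 = sum(POINTS[c] for c in pieces1)
    p2 = sum(POINTS[c] for c in pieces2)
    v = (p1 - p2) / (p1 + p2)
    return (v, -v)

# Queen + 5 pawns (15 points) vs. rook + 5 pawns (10 points):
print(material_eval('Q' + 'P' * 5, 'R' + 'P' * 5))  # → (0.2, -0.2)
```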


Comments
• Typical heuristic evaluation functions are much, much more complicated than the simple formula on the previous slide, taking many other aspects of the current state into consideration!
• How deeply do we search the game tree?
– Search as deeply as possible in the allowed time!
– Use iterative deepening to make sure we always have an answer (start with a search of depth 1, then keep increasing the search depth until we are forced to move…)
– In practice, we typically want to search some lines of play more deeply than others, rather than doing a “fixed depth” search as shown here.

A few other tricks
• Alpha-beta pruning: while doing minimax search, we prune nodes that would definitely not be chosen by a player.
– In the best case, this reduces the effective branching factor b → b^(1/2), so we can double the search depth!
– Modern chess programs use even more aggressive (but not guaranteed) pruning to further increase search depth.
• Dynamic programming for chess endgames:
– Keep track of the value of every possible game state… we can do this if the number of pieces on the board is small, so there are not that many distinct states. Ex. “king and rook vs. king” endgames.
• “Memorize” chess openings… some common lines of play go out to 20-25 moves.

See Andrew’s “Game Tree Search” slides, and Russell & Norvig, Ch. 5, for more details.
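A minimal alpha-beta sketch for a two-player zero-sum tree. The nested-list tree encoding is illustrative; leaf values are payoffs to the maximizing player:

```python
def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    """Minimax value of `node`, pruning branches a rational opponent rules out."""
    if not isinstance(node, list):        # leaf: payoff to the maximizer
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:             # prune: the minimizer won't allow this line
                break
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                 # prune: the maximizer won't allow this line
            break
    return value

# Classic textbook tree: minimax value 3, found without expanding every leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # → 3
```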


Outline
• Lecture 1:
– Basics of rational decision theory.
– Games in normal form; pure and mixed Nash equilibria.
– Games in extensive form; backward induction.
• Lecture 2:
– Bayesian games.
– Auctions and negotiation.
– Game theory vs. game playing.
• Lecture 3:
– Evolution and learning in games.

The Prisoner’s Dilemma
• The district attorney of a small town is holding two prisoners as suspects in an armed robbery case. Being experienced with the criminal mentality, he tells each prisoner the following:
– “If you both confess, you’ll get the standard sentence for armed robbery. If neither of you confesses, I’ll have to lower the sentence to illegal possession of firearms.”
– “But if you confess and your partner doesn’t, we’ll have enough evidence to give him the harshest possible sentence. In return for your assistance, we’ll let you go free.”

What should each player do: cooperate with the other prisoner (telling the district attorney nothing) or defect (admit the crime in return for a lighter sentence)?


The Prisoner’s Dilemma (2)
• “If my partner defects, I’ll get a harsh sentence if I cooperate, or a standard sentence if I defect.”
• “If my partner cooperates, I’ll get a light sentence if I cooperate, or go free if I defect.”
• “So I should defect no matter what my partner does.” And he admits his guilt to the D.A.…
• In a one-shot Prisoner’s Dilemma, defection strictly dominates cooperation, so mutual defection is clearly the only rational outcome (i.e. the unique Nash equilibrium).
• Note that if each player had done the “irrational” thing, and cooperated, both players would have been better off!

In the Prisoner’s Dilemma, the Nash equilibrium is not Pareto optimal!

Prisoner’s Dilemma definition

Payoffs to P1 / P2      Player 2 cooperates    Player 2 defects
Player 1 cooperates          R / R                  S / T
Player 1 defects             T / S                  P / P

For a Prisoner’s Dilemma, T > R > P > S, and 2R > T + S.


Prisoner’s Dilemma example

Payoffs to P1 / P2      Player 2 cooperates    Player 2 defects
Player 1 cooperates          3 / 3                  0 / 5
Player 1 defects             5 / 0                  1 / 1

The typical PD payoff table: (T, R, P, S) = (5, 3, 1, 0).
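The defining inequalities can be checked mechanically (the helper name is illustrative):

```python
def is_prisoners_dilemma(T, R, P, S):
    """The slide's conditions: T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S

T, R, P, S = 5, 3, 1, 0
assert is_prisoners_dilemma(T, R, P, S)
# Defection strictly dominates cooperation, row by row:
assert T > R   # vs. a cooperator: defect (T) beats cooperate (R)
assert P > S   # vs. a defector:   defect (P) beats cooperate (S)
```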

Repeated games
• A repeated game occurs when the same decision situation is repeated by the same players over a large (or infinite) number of “stages.”
– For example, 50 stages, where each stage is a Prisoner’s Dilemma.
– Or after each stage, 10% chance of stopping…
• Each player’s goal is to maximize his average utility per stage, over all stages of the game.
• A pure strategy for a repeated game is a mapping from a sequence of past game results to an action: Si : (m, (A1 x A2 x … x Am-1)) → Ai
• Given stage m and the actions of all players for the first m-1 stages, Si chooses player i’s action for stage m.

We could also write this out as a huge extensive-form game, but this is more convenient, especially when considering simple classes of strategies (finite memory, or finite state)!


The Iterated Prisoner’s Dilemma
• When a Prisoner’s Dilemma interaction is iterated over a large number of rounds, cooperation can become a rational option.
• Axelrod (1984) and many others have argued that cooperation can evolve by reciprocal altruism: “I’ll cooperate with you if you cooperate with me.”
– For example, the “Tit for Tat” (TFT) strategy: cooperate iff the opponent cooperated on the previous round.
– TFT (and variants) performed very well in Axelrod’s experiments, sparking a huge literature on cooperation via reciprocal altruism.
• PD and IPD model a huge number of situations: business agreements, animal behavior, the nuclear arms race, environmental conservation, the Tragedy of the Commons… see Axelrod’s book for more details.
• For infinitely repeated games, or for finitely repeated games with an unknown number of moves and sufficiently small probability of stopping, TFT is a symmetric Nash equilibrium.

But there are lots of other NE: for example, “always defect”!
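These interactions are easy to simulate. A minimal sketch (names illustrative): strategies map the opponent’s previous move to an action, with payoffs from the (T, R, P, S) = (5, 3, 1, 0) table:

```python
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opp_last):     # cooperate first, then copy the opponent
    return 'C' if opp_last is None else opp_last

def all_defect(opp_last):
    return 'D'

def play(strat1, strat2, rounds):
    """Average per-stage payoffs to each player over `rounds` stages."""
    last1 = last2 = None
    total1 = total2 = 0
    for _ in range(rounds):
        a1, a2 = strat1(last2), strat2(last1)
        u1, u2 = PAYOFF[(a1, a2)]
        total1, total2 = total1 + u1, total2 + u2
        last1, last2 = a1, a2
    return total1 / rounds, total2 / rounds

print(play(tit_for_tat, tit_for_tat, 100))  # → (3.0, 3.0): mutual cooperation
print(play(tit_for_tat, all_defect, 100))   # → (0.99, 1.04): TFT loses only round 1
```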

Individual rationality
• Many possible Nash equilibria in the Iterated Prisoner’s Dilemma: the problem of equilibrium selection.
• More generally, the question of individual rationality: what should an individual rational player do to maximize his payoff, given no prior knowledge of the opponent?
• Choose the “best” strategy according to some performance measure… but which one?
– Axelrod (1981): round-robin tournament. First choose a set of strategies, then the “best” strategy is the one with the highest average payoff vs. all strategies in that set. (Tit For Tat won!)
– Problem: performance is very dependent on the set of strategies under consideration (e.g. if many strategies are unconditional cooperators, exploiters will win).


Evolutionary game theory
• A performance measure should be justified by a model: i.e. what sort of interactions are assumed between strategies?
• Evolutionary game theory assumes that the payoff for a game is a measure of its value toward evolutionary survival.
• Evolution occurs by natural selection: strategies which earn higher average payoffs are more likely to survive and reproduce, and less fit strategies die off.
• Samuelson (2002): “Evolutionary game theory covers a wide variety of models. The common theme is a dynamic process describing how players adapt their behavior over the course of repeated plays of a game.”
– The dynamic process can be biological evolution (i.e. the players evolve better strategies) or individual processes of learning/imitation (i.e. the players choose strategies that they observe to be better!)

Replicator dynamics
• Compute how the proportion of a population playing a given strategy changes over time, assuming the rate of reproduction is proportional to the average payoff received.
• The share of agents playing a strategy grows at a rate equal to the difference between the average payoff of that strategy and the average payoff of the entire population:
– dxi / dt = xi (wi - wavg).
• More generally, the growth rate must be monotonically increasing in wi (i.e. the rate of reproduction is higher for strategies receiving higher average payoffs).
• This could describe an actual biological scenario where natural selection determines reproductive fitness. Or it could describe a model of learning/imitation in which players are able to switch from worse to better strategies.
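The replicator equation can be simulated with a simple Euler scheme. A sketch for the one-shot PD stage game from the earlier example table, where defection should take over from any mixed starting state:

```python
# Payoff matrix A[i][j]: payoff to row strategy i (0 = C, 1 = D) vs. column j,
# using (T, R, P, S) = (5, 3, 1, 0). x = (share of C, share of D).
A = [[3, 0],
     [5, 1]]

def replicator_step(x, dt=0.01):
    """One Euler step of dx_i/dt = x_i (w_i - w_avg)."""
    w = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]  # avg payoffs
    w_avg = sum(x[i] * w[i] for i in range(2))
    return [x[i] + dt * x[i] * (w[i] - w_avg) for i in range(2)]

x = [0.9, 0.1]                 # start with 90% cooperators
for _ in range(5000):
    x = replicator_step(x)
print(x)                       # x ≈ [0, 1]: defectors take over
```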


Short term vs. long term dynamics
• Replicator dynamics describe the short term evolution of a system: given a starting state (proportions of strategies in the population), the population will evolve over time by natural selection.
• Short term dynamics will eventually converge to a “stable state”: this can be a single strategy, a mixture of strategies, or a cyclic or even chaotic fluctuation in population shares.
• The long term evolution of a system is how the system moves between different short term stable states, i.e. when some change (mutation, migration, environmental fluctuation) knocks it from one state into another.
• Most important stability concept in long term dynamics: the evolutionarily stable strategy (Maynard Smith).

Evolutionary stability (1)
• Assume that a homogeneous population (everyone playing the same strategy) is invaded by a small population of “mutants” playing an alternative strategy.
• If the average payoff of the common strategy is greater than the average payoff of the mutants, natural selection will eliminate the mutants. Otherwise, the mutants will be able to invade, and possibly take over the population.
• Maynard Smith’s criteria (1982):
– A strategy X can invade a strategy Y if w(X|Y) > w(Y|Y), or if w(X|Y) = w(Y|Y) and also w(X|X) > w(Y|X).
– Also, if w(X|Y) = w(Y|Y) and w(X|X) = w(Y|X), strategy X can mix with strategy Y, and evolutionary drift occurs.

(Here w(X|Y) = average payoff to strategy X vs. an opponent playing strategy Y.)
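Maynard Smith’s criteria translate directly into code. The Hawk-Dove payoffs below (V = 2, C = 4) are a standard illustrative example, not from the slides:

```python
def invades(w, X, Y):
    """Can a rare mutant X invade a homogeneous population of Y?"""
    return w[X][Y] > w[Y][Y] or (w[X][Y] == w[Y][Y] and w[X][X] > w[Y][X])

def can_mix(w, X, Y):
    """Neutral drift: X neither out- nor under-performs in a Y population."""
    return w[X][Y] == w[Y][Y] and w[X][X] == w[Y][X]

# Hawk-Dove with V = 2, C = 4: neither pure strategy is safe from the other.
w = {'H': {'H': -1, 'D': 2},
     'D': {'H': 0,  'D': 1}}
assert invades(w, 'H', 'D')   # Hawk invades Doves (2 > 1)
assert invades(w, 'D', 'H')   # Dove invades Hawks (0 > -1)
```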


Evolutionary stability (2)
• If no other strategy A can invade or mix with strategy X, then X is an evolutionarily stable strategy (ESS): once it is established in a population it cannot be displaced by any single mutant.
– This is the case if, for all alternative strategies A≠X, w(A|X)≤w(X|X), and if w(A|X)=w(X|X) then w(X|A)>w(A|A).
• ESS is a refinement of (symmetric) Nash equilibrium.
– An ESS is at least a weak best response to itself; thus every ESS is a NE!
– All strict symmetric NE are ESS.
– A weak symmetric NE is an ESS if it passes the stability condition that it earns a higher payoff when facing any alternative best response than does the alternative itself.
• Thm: Every ESS is an asymptotically stable fixed point of the replicator dynamics (i.e. ESS is short-term stable).

The Nowak-Sigmund model
• A simple model of long-term evolution based on Maynard Smith’s invasion criteria:
– Assume an initial large homogeneous population of some (randomly selected) strategy Y.
– Each round, select a mutant strain X at random from the strategy space. Then if X invades Y according to the Maynard Smith criteria, X takes over the population; otherwise the initial population of Y will continue.
• ESS are stable points of the Nowak-Sigmund model.
– Once the population has evolved to an ESS, no other strategy can invade, so the population will play the ESS strategy forever…


Assumptions of ESS
• The population evolves (in the short term) according to some payoff-monotone selection dynamics.
• The population size is infinite.
• Only a single type of mutant strategy can attempt to invade the population at a time.
• Mutations are rare: the population evolves to a short-term stable state after each mutation, before the next mutation occurs.
• Mutations have small impact: the proportion of a mutant strategy in the combined population is negligible.

Under what conditions is an ESS really “stable”? Let’s see what happens if we break some of these assumptions!

ESS for finite populations
• X is an infinite population (Maynard Smith) ESS if:
– For all alternative strategies A≠X, w(A|X) ≤ w(X|X).
– If w(A|X) = w(X|X), then w(X|A) > w(A|A).
• Schaffer (1988) demonstrated that a strategy meeting these criteria can, in fact, be invaded… if the population is finite.
– The reason: a player cannot play himself in a contest.
– In a population of N players with M mutants, the mutants will play against a mutant (M-1)/(N-1) of the time, while players of the common strategy will play against a mutant M/(N-1) of the time.
– Thus a mutant will invade if (M-1) w(A|A) + (N-M) w(A|X) > M w(X|A) + (N-M-1) w(X|X).
– X is a finite population (Schaffer) ESS if a population of size N cannot be invaded by M mutants of any type, for given N and M.
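Schaffer’s inequality can be checked directly. The payoff table below is a constructed example (not from the slides) of a “spiteful” mutant A that invades a finite population of X even though X is an infinite-population ESS:

```python
def finite_invades(w, A, X, N, M):
    """Schaffer's condition: M mutants playing A invade N-M residents playing X
    if the mutants' total payoff exceeds the residents' (no self-play)."""
    mutant_total   = (M - 1) * w[A][A] + (N - M) * w[A][X]
    resident_total = M * w[X][A] + (N - M - 1) * w[X][X]
    return mutant_total > resident_total

# X is an infinite-population ESS: w(A|X) = w(X|X) = 2 and w(X|A) = 1 > w(A|A) = 0.
# But X scores only 1 against the mutant, while the lone mutant never meets itself,
# so a single spiteful A invades any finite population.
w = {'X': {'X': 2, 'A': 1},
     'A': {'X': 2, 'A': 0}}
assert finite_invades(w, 'A', 'X', N=10, M=1)     # 9*2 = 18 > 1 + 8*2 = 17
assert finite_invades(w, 'A', 'X', N=1000, M=1)   # ...and for any larger N
```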


ESS for finite populations (2)
• Does the finite population ESS converge to the infinite population ESS as population size N → ∞?
– Surprisingly, the answer is NO!
• Neill (2004) defines a large population ESS as a strategy which cannot be invaded by any finite number of mutants M, as long as the population size N is sufficiently large.
– The large population ESS is not equivalent to the infinite population ESS; see the examples below.
• A large population ESS X must satisfy:
– For all A≠X, w(A|X) ≤ w(X|X).
– If w(A|X) = w(X|X), then w(X|A) ≥ w(A|A) and w(X|A) > w(A|X).
(For an infinite population, this second condition was “If w(A|X) = w(X|X), then w(X|A) > w(A|A).”)

        vs. X   vs. A
  X       2       0
  A       2      -1
X is ESS for infinite populations, but not for large populations.

        vs. X   vs. A
  X       2       3
  A       2       3
X is ESS for large populations, but not for infinite populations.

Stochastic stability
• One of the main lines of research in evolutionary games in the 1990s was evolutionary models where shocks to the population are continuous rather than rare.
• Continuous small, stochastic shocks, accumulated over time, can move the population between the basins of attraction of different equilibria.
• The stochastically stable equilibria are defined as the set of equilibria which occur with finite probability as the intensity of the shocks goes to zero.
• Main result: not all strict NE are stochastically stable. Now we can select between equilibria in a coordination game!

            A        B
  A       4 / 4    0 / 0
  B       0 / 0    1 / 1

Stochastic shocks cause the population to move between the basins of attraction of A and B. In this game, transitions B → A are much more common than transitions A → B. As the shocks go to zero, the system spends almost all of the time in A, and thus B is not stochastically stable!
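A rough simulation sketch of this selection effect (the revision dynamics and parameters are illustrative, not from the slides): each period one agent either best-responds to the current mix or, with probability eps, mutates to a random action. Because A is a best response whenever at least 20% of the population plays A, A’s basin is far larger than B’s, and the population spends nearly all its time there:

```python
import random

random.seed(1)
N, eps, steps = 10, 0.1, 200_000   # population size, mutation rate, periods

def expected_payoff(action, n_A):
    """Expected payoff of `action` vs. one opponent drawn from the current mix."""
    share_A = n_A / N
    return 4 * share_A if action == 'A' else 1 * (1 - share_A)

n_A = N                            # start with everyone playing A
time_in_A = 0
for _ in range(steps):
    # One randomly chosen agent revises its action...
    old = 'A' if random.random() < n_A / N else 'B'
    if random.random() < eps:      # ...by mutating to a random action,
        new = random.choice('AB')
    else:                          # ...or by best-responding to the current mix.
        new = 'A' if expected_payoff('A', n_A) >= expected_payoff('B', n_A) else 'B'
    n_A += (new == 'A') - (old == 'A')
    time_in_A += n_A > N // 2
print(time_in_A / steps)           # fraction of time in A's basin (close to 1)
```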


Stochastic stability (2)
• One problem with stochastic stability models is that, because they assume such small shocks, it takes a huge number of time steps for them to converge to a limiting distribution.
– Imagine a population N = 100 with a .01 mutation rate. Then 20 mutations (prob 10^-40) must accumulate to go from B to A, and 80 mutations (prob 10^-160) must accumulate to go from A to B!
• Several researchers have argued that this makes the equilibrium selection results irrelevant: on typical time scales, the population will simply stay in its initial basin of attraction!
• Also, different stochastic stability models select different equilibria when risk-dominant and Pareto-dominant equilibria differ:

            A        B                     A        B
  A       4 / 4    0 / 0         A       9 / 9    0 / 7
  B       0 / 0    1 / 1         B       7 / 0    8 / 8

“Large aggregate shocks” (LASH) models
• One solution to the slow convergence of the stochastic stability models is to assume that shocks, rather than being small and rare (as in ESS) or small and continuous (as in stochastic stability), are large and rare.
• In these “large aggregate shocks” models (Neill, 2003b), the impact of a mutation is significant: a non-negligible proportion of the population adopts the entering strategy before selection takes effect.
• We have proposed a variety of mechanisms by which these large shocks may occur in a population:
– Mutations in a finite population.
– Invasion by migration, or combination of formerly isolated populations.
– Temporary shocks due to environmental changes, disasters, etc.
– Imitation models with initial uncertainty about relative benefits/harms.
– Darwinian evolution in a communicating population.


LASH models (2)
• Rather than assuming that the initial proportion of mutants is zero (as in ESS), the initial proportion of mutants is non-zero and determined by some probabilistic “spread function” f(x) = Pr(initial proportion of mutants = x):
– Some stochastic stability models are equivalent to the case of an exponentially decreasing spread function f(x) = Ce^(-kx), k large.
– Many other spread functions are possible; in particular, a uniform spread is common.
• As a result, we must consider a set of evolutionary interactions which are more complicated than Maynard Smith’s invasion criteria. In fact, since whether X invades Y and whether Y invades X are independent, there are four possibilities:
– inv(X|Y) and not inv(Y|X): X dominates Y.
– inv(Y|X) and not inv(X|Y): Y dominates X.
– inv(X|Y) and inv(Y|X): X and Y reach a stable equilibrium in which both strategies survive.
– Neither inv(X|Y) nor inv(Y|X): bistable equilibrium; either strategy can dominate, depending on the initial proportions.
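These four cases can be classified mechanically from Maynard Smith’s invasion test; the PD and coordination-game payoffs below reuse tables from earlier slides (helper names are illustrative):

```python
def inv(w, X, Y):
    """Maynard Smith: can rare X invade a homogeneous Y population?"""
    return w[X][Y] > w[Y][Y] or (w[X][Y] == w[Y][Y] and w[X][X] > w[Y][X])

def classify(w, X, Y):
    a, b = inv(w, X, Y), inv(w, Y, X)
    if a and not b:
        return 'X dominates Y'
    if b and not a:
        return 'Y dominates X'
    if a and b:
        return 'stable mixture'
    return 'bistable'

pd    = {'C': {'C': 3, 'D': 0}, 'D': {'C': 5, 'D': 1}}  # one-shot PD
coord = {'A': {'A': 4, 'B': 0}, 'B': {'A': 0, 'B': 1}}  # coordination game

print(classify(pd, 'D', 'C'))     # → X dominates Y
print(classify(coord, 'A', 'B'))  # → bistable
```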

LASH models (3)
• Our long term model is like Nowak & Sigmund (1992): at each time step, we select a mutant strategy X at random and see whether it can invade the current population Y. However, there are four cases:
– If X dominates Y, then the invasion always succeeds, and X takes over, replacing Y.
– If Y dominates X, then the invasion fails, and the population of Y remains.
– If X and Y are stable, then both strategies survive in the combined population.
– If X and Y are bistable, then we compute the proportion m of X needed to take over, and the invasion succeeds with probability Pr(f(x) > m).
• We then compute the percentage of the time, in the long run, that each strategy dominates the population: this is our performance measure!
– Note that ESS are no longer “stable” in our model, since they can be taken over (with some probability) by any strategy with which they are bistable.
• Benefits of our model:
– Large shocks result in much faster convergence to the limiting distribution (ex. 8-20 time steps instead of 10^9).
– It tends to select risk-dominant equilibria, as in most stochastic stability models.
– But we have very different equilibrium selection results in certain cases! More on this later…


Evolution in the Prisoner’s Dilemma
• Consider an infinitely repeated Prisoner’s Dilemma interaction, in the standard Nowak-Sigmund model.
– Tit for Tat can invade a population of defectors, establishing cooperation.
– However, TFT can be invaded by increasingly “generous” variants, until the population consists of unconditional cooperators!
– Then the population can be invaded by defectors again…
• The other problem with TFT is that it does poorly in games with noise (a non-zero probability of error): a single error will lead to a chain of reciprocal defections!
• In order to catalyze the evolution of a stable cooperative society, a strategy must be able to:
– Cooperate with others playing the same strategy, even under noise.
– Resist exploitation by defectors.
– Exploit overly cooperative strategies, so these will not mix with and weaken the population.

The Alternating Prisoner’s Dilemma
• Neill (2001) investigates the “Alternating Prisoner’s Dilemma,” a variant of the Iterated Prisoner’s Dilemma in which players alternate choices whether to cooperate or defect.
• The APD describes various examples of alternating reciprocal altruism in human and animal behavior: gift-giving, caring for the sick, donating food to the hungry (vampire bats), guarding the pack.
• We prove that, in the APD, none of the low-memory strategies commonly discussed in the literature can simultaneously be self-cooperating, C-exploiting, and D-unexploitable.
• However, we construct several classes of strategy that meet the criteria, and demonstrate that these strategies consistently outperform the others in evolutionary interactions.
• Neill (2003a) uses similar criteria to examine the “Turn-Taking Dilemma,” an IPD variant where players do better alternating defections than cooperating continuously. This models a variety of situations where players must coordinate cooperation under noise.


Finitely Repeated Prisoner’s Dilemma
• Now consider a Prisoner’s Dilemma interaction repeated M times, where the number of turns M is common knowledge.
• The last move is a standard Prisoner’s Dilemma (i.e. defection strictly dominates cooperation), so two rational players will both defect…
• But now, both players know that they will defect no matter what on turn M. So turn M-1 is a standard PD… and both players will defect! And similarly for M-2, and so on…
• Argument by backward induction forces us to conclude that two rational players will defect on every round of the Finitely Repeated Prisoner’s Dilemma game!

The big question: is it rational to be rational?

Finitely Repeated Prisoner’s Dilemma (2)
• In the Finitely Repeated Prisoner’s Dilemma game, a rational player is forced to make decisions which we would not consider “reasonable,” and tends to perform significantly worse in practice than an agent with imperfect rationality.
• So maybe evolutionary games will resolve this paradox?!
• But if we consider the “evolutionary FRPD” using the standard Nowak-Sigmund model, we observe an evolutionary equivalent to backward induction:
– A strategy which cooperates through turn k can be invaded by a strategy which cooperates through turn k-1, and this continues until we reach ALLD (continual defection), which is evolutionarily stable. Thus the population always evolves to ALLD!


LASH models and the FRPD
• Problem with the Nowak-Sigmund model: though ALLD is evolutionarily stable, it is inferior to nearly all other strategies (bistable equilibrium; a very small proportion of the other strategy will take over) when game length is long.
• Neill (2003b) proposes an alternate evolutionary model like N-S (random invasion), but with large aggregate shocks. We then use this model to compute the % of the time each strategy dominates the population at equilibrium.
– Assuming a uniform spread function, for short games (1-3 rounds) ALLD has the highest % in the population. For longer games, cooperation through round M-1 has the highest %.
– As game length increases, it becomes more and more certain that strategies will cooperate until near the end of the game!

This is what we expect: defection in short games (like PD), cooperation in long games (like IPD)!

LASH models and the FRPD (2)
• In fact, we prove the following:
– Let yi (0 < i < M) be the proportion of the population cooperating through round i. If the spread function f(x) satisfies the LASH property (i.e. mutation results in a non-zero proportion of mutants), then:
– For any constant k, 0 < k < 1, Σ_{i > kM} yi → 1 and Σ_{i < kM} yi → 0 as the number of stages M goes to infinity.
• Our LASH model is the first evolutionary model which explains how cooperation can evolve in the Finitely Repeated Prisoner’s Dilemma game, contrary to the backward induction argument!
• “Evolutionary backward induction” can still occur, but it is overwhelmed by “evolutionary forward progression,” where a defecting strategy is taken over by a small proportion of a much more cooperative strategy.
• Large shocks weaken the poor but evolutionarily stable strategy of continual defection, allowing cooperation to succeed!

Cooperation in the FRPD is possible, if the game is long enough!


References
• R Axelrod (1981). The Evolution of Cooperation.
• R Boyd and J Lorberbaum (1987). “No pure strategy is evolutionarily stable in the repeated Prisoner’s Dilemma game,” Nature 327.
• I Eshel et al (1998). “Long term evolution, short term evolution, and population genetic theory,” J. Theor. Biol. 191(4).
• D Fudenberg and D Levine (1998). The Theory of Learning in Games.
• J Maynard Smith (1982). Evolution and the Theory of Games.
• D Neill (2001). “Optimality under noise: higher memory strategies for the Alternating Prisoner’s Dilemma,” J. Theor. Biol. 211(2).
• D Neill (2003a). “Cooperation and coordination in the Turn-Taking Dilemma,” proceedings of Theoretical Aspects of Rationality and Knowledge.
• D Neill (2003b). “Evolutionary dynamics with large aggregate shocks,” submitted for publication.
• D Neill (2004). “Evolutionary stability for large populations,” J. Theor. Biol. 227(2).
• M Nowak and K Sigmund (1992). “Tit for Tat in heterogeneous populations,” Nature 355.
• L Samuelson (2002). “Evolution and game theory,” J. Econ. Perspectives 16(2).
• J Weibull (1995). Evolutionary Game Theory.