
Page 1:

Varoufakis versus Nash: Learning in unprofitable games

Josef Hofbauer, Vienna

Page 3:

http://yanisvaroufakis.eu/books/game-theory-a-critical-text/

Shaun P. Hargreaves Heap and Yanis Varoufakis:
Game Theory: A Critical Text (2004), Game 2.11, p. 62

        C1         C2         C3
R1    1, 1      100, 0    −100, 1
R2    0, 100      1, 1     100, 1
R3    1, −100     1, 100     1, 1

(1)

Page 4:

        C1         C2         C3
R1    1, 1      100, 0    −100, 1
R2    0, 100      1, 1     100, 1
R3    1, −100     1, 100     1, 1

They write:

"In this game the unique Nash equilibrium in pure strategies seems unreasonable to us. The unique pure strategy Nash equilibrium is (R1, C1), but note that player R can secure, with absolute certainty, exactly the same payoff (1) as that associated with (R1, C1) by playing R3. In sharp contrast, R1 comes with the risk of −100 if C plays C3. Does the fact that R1 may also yield +100 utils (if C responds to her R1 with C2) cancel out the risk of playing R1?"

Page 5:

The critique of Hargreaves Heap and Varoufakis sounds convincing. Since C3 is the maximin strategy for player 2 and is likely to be played by a risk-averse player, it is indeed quite risky to play R1. So the more reasonable play in this game is R3 and C3, respectively, at least in the one-shot situation.
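These claims are easy to check mechanically. A minimal sketch (not part of the original slides; plain Python with numpy) that scans game (1) for pure Nash equilibria and pure maximin strategies:

```python
import numpy as np

# Game (1): payoffs of the row player R and the column player C.
R = np.array([[   1, 100, -100],
              [   0,   1,  100],
              [   1,   1,    1]])
C = np.array([[   1,   0,    1],
              [ 100,   1,    1],
              [-100, 100,    1]])

# Pure Nash equilibria: each player's strategy is a best reply to the other's.
pure_ne = [(i, j) for i in range(3) for j in range(3)
           if R[i, j] == R[:, j].max() and C[i, j] == C[i, :].max()]
print([(f"R{i+1}", f"C{j+1}") for i, j in pure_ne])        # [('R1', 'C1')]

# Pure maximin strategies: maximize the worst-case payoff.
print(f"row maximin: R{np.argmax(R.min(axis=1)) + 1}")     # R3 guarantees 1
print(f"column maximin: C{np.argmax(C.min(axis=0)) + 1}")  # C3 guarantees 1
```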

Harsanyi (1964): similar examples of 2 × 2 games

Aumann & Maschler (1972), Aumann (1985)

       L       R
T    N, 0    0, 1
B    0, 1    1, 0

Page 6:

Aumann & Maschler (1972), Aumann (1985)

       L       R
T    N, 0    0, 1
B    0, 1    1, 0

NE: x = (T + B)/2, y = (L + N·R)/(1 + N); equilibrium payoffs: N/(N + 1) and 1/2.

Prudent (maximin) strategies: x = (T + N·B)/(1 + N) and y = (L + R)/2.

But many people prefer to play T with probability x >> 1/2 because of the attractive payoff N >> 1.
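These formulas follow from the usual indifference conditions; a small symbolic sketch (assuming sympy is available; for the prudent mixtures the equalization step is valid here because each player's two pure-strategy payoffs move in opposite directions as the mixing probability varies):

```python
import sympy as sp

N = sp.symbols('N', positive=True)
p, q = sp.symbols('p q')                  # p = prob. of T, q = prob. of L

# Aumann-Maschler game: rows T, B; columns L, R.
R = sp.Matrix([[N, 0], [0, 1]])           # player 1's payoffs
C = sp.Matrix([[0, 1], [1, 0]])           # player 2's payoffs

# Mixed NE: each player mixes so that the opponent is indifferent.
p_ne = sp.solve(sp.Eq(p*C[0, 0] + (1 - p)*C[1, 0], p*C[0, 1] + (1 - p)*C[1, 1]), p)[0]
q_ne = sp.solve(sp.Eq(q*R[0, 0] + (1 - q)*R[0, 1], q*R[1, 0] + (1 - q)*R[1, 1]), q)[0]
print(p_ne, q_ne)                         # 1/2 and 1/(N + 1): x = (T+B)/2, y = (L+N*R)/(1+N)
print(sp.simplify(q_ne*R[0, 0] + (1 - q_ne)*R[0, 1]))   # N/(N + 1), player 1's NE payoff

# Prudent (maximin) mixtures: equalize one's own payoff against the opponent's pure strategies.
p_pr = sp.solve(sp.Eq(p*R[0, 0] + (1 - p)*R[1, 0], p*R[0, 1] + (1 - p)*R[1, 1]), p)[0]
q_pr = sp.solve(sp.Eq(q*C[0, 0] + (1 - q)*C[0, 1], q*C[1, 0] + (1 - q)*C[1, 1]), q)[0]
print(p_pr, q_pr)                         # 1/(N + 1) and 1/2: x = (T+N*B)/(1+N), y = (L+R)/2
```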

Page 7:

Holler (1990):

       L       R
T    1, 1    1, 0
B    2, 1    0, 4

This is again a cyclic 2 × 2 game.

NE: x = (3/4, 1/4), y = (1/2, 1/2); equilibrium payoffs (1, 1).

These payoffs are guaranteed by the pure maximin strategies T and L.
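A quick numerical check of these values (illustrative sketch only, using numpy):

```python
import numpy as np

# Holler (1990): rows T, B; columns L, R.
R = np.array([[1., 1.], [2., 0.]])        # player 1's payoffs
C = np.array([[1., 0.], [1., 4.]])        # player 2's payoffs

x = np.array([3/4, 1/4])                  # claimed NE mixture of player 1
y = np.array([1/2, 1/2])                  # claimed NE mixture of player 2

print(R @ y)                              # [1. 1.] : player 1 indifferent between T and B
print(x @ C)                              # [1. 1.] : player 2 indifferent between L and R
print(x @ R @ y, x @ C @ y)               # equilibrium payoffs (1.0, 1.0)
print(R.min(axis=1), C.min(axis=0))       # worst cases: T and L each guarantee payoff 1
```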

Page 8:

Unprofitable game: for each player, no equilibrium yields more than his maximin payoff.

Harsanyi (1964) argues that in such a situation it is more rational to use a maximin strategy (even if it is not an equilibrium strategy) than it is to use an equilibrium strategy (if it is not a maximin strategy).

Harsanyi (1966) formulates Rationality Postulates for Game Situations. The first one is the Maximin Postulate:

In a game G unprofitable to you, always use a maximin strategy. (In other words, if you cannot hope to obtain more than your maximin payoff anyhow, then use a strategy that will absolutely assure you at least that much.)

Page 9:

Aumann and Maschler (1972):

To be sure, we do not know what to recommend in this situation; but the maximin strategies seem preferable to the equilibrium strategies.

. . .

Under these conditions, the use of the equilibrium strategies does not seem reasonable.

Page 10:

Can players learn Nash equilibrium?

Cyclic 2 × 2 games: learning procedures oscillate around the unique equilibrium; many (fictitious play, regret matching, ...) converge to the NE.

In HH & Varoufakis' game: R2 is the best reply to C3, then C1 is player 2's best reply against R2, and in the long run the NE (R1, C1) might emerge.

Page 11:

Can players learn Nash equilibrium?

A more general symmetric two-person game with the same strategic structure (subtract 1 from each entry of the payoff matrix):

A =
      0    a   −b
     −1    0    c
      0    0    0

with a, b, c > 0.   (2)

Game (1) corresponds to the case a = c = 99, b = 101.

Besides (R1, C1), game (2) has another, completely mixed and symmetric NE, given by the proportions (ac : b : a) for both players. For game (1) this is (99² : 101 : 99), which is rather close to the pure NE. The expected payoff at the mixed equilibrium equals 0, which is also the payoff at the pure equilibrium and the maximin payoff of game (2).

If a game has two NE, then one of them is not robust (odd number theorem, index theorem): here it is the pure NE. So it is the mixed equilibrium which seems more relevant.
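As a sanity check (a sketch only), the proportions (ac : b : a) are easily verified to give a symmetric NE of game (2) with payoff 0, here with the values a = c = 99, b = 101 of game (1):

```python
import numpy as np

a, b, c = 99.0, 101.0, 99.0                  # game (1) corresponds to these values
A = np.array([[ 0, a, -b],
              [-1, 0,  c],
              [ 0, 0,  0]])

p = np.array([a*c, b, a])                    # proportions (ac : b : a)
p /= p.sum()                                 # normalize onto the simplex

print(p)                                     # ~[0.980, 0.0101, 0.0099]: close to the pure NE
print(A @ p)                                 # all (numerically) equal 0 -> p is a symmetric NE
print(p @ A @ p)                             # expected payoff at the mixed NE: 0
```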

Page 12:

Replicator dynamics:
(EWA, reinforcement learning, proportional imitation rule, ...)

ẋ1 = x1(a x2 − b x3 − ū)
ẋ2 = x2(−x1 + c x3 − ū)
ẋ3 = x3(−ū)
                                   (3)

where ū = x·Ax is the average payoff.

Coordinate transformation x = x2/x3 and y = x1/x3 (plus a rescaling of time):

ẋ = x(c − y)
ẏ = y(−b + a x)
                                   (4)

the classical Lotka–Volterra predator–prey equation!

Closed orbits, surrounding the equilibrium x = b/a, y = c.

Hence (3) has only closed orbits around the mixed NE.

(x1 + a x2)/x3 − c log x1 − b log x2 + (b + c) log x3

is a constant of motion.
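A numerical illustration (a sketch; the values a = b = c = 1 are an arbitrary choice that keeps the orbit comfortably inside the simplex, and scipy's solve_ivp is used as the integrator): the function above stays constant along a trajectory of (3).

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 1.0, 1.0, 1.0                      # illustrative values; any a, b, c > 0 will do
A = np.array([[0, a, -b], [-1, 0, c], [0, 0, 0]])

def replicator(_, x):
    u = x @ A @ x                            # average payoff u = x.A x
    return x * (A @ x - u)                   # replicator equation (3)

def H(x):
    x1, x2, x3 = x
    return (x1 + a*x2)/x3 - c*np.log(x1) - b*np.log(x2) + (b + c)*np.log(x3)

sol = solve_ivp(replicator, (0.0, 60.0), [0.5, 0.3, 0.2],
                dense_output=True, rtol=1e-10, atol=1e-12)
H_vals = [H(sol.sol(t)) for t in np.linspace(0.0, 60.0, 600)]
print(max(H_vals) - min(H_vals))             # ~0 (up to integration error): a constant of motion
```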

Page 13:

The time averages of these periodic solutions coincide with the mixed NE, see Hofbauer & Sigmund (1988, 1998).

A population of (boundedly rational, Darwinian) players on average plays the mixed NE, and the average payoff

(1/T) ∫_0^T ū(x(t)) dt = 0.
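The same kind of simulation illustrates the time-average statement (again a sketch; a = 2, b = c = 1 and the horizon T = 400 are arbitrary illustrative choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 2.0, 1.0, 1.0                      # illustrative values
A = np.array([[0, a, -b], [-1, 0, c], [0, 0, 0]])

def replicator(_, x):
    u = x @ A @ x
    return x * (A @ x - u)

T = 400.0
ts = np.linspace(0.0, T, 40001)
sol = solve_ivp(replicator, (0.0, T), [0.5, 0.3, 0.2], t_eval=ts, rtol=1e-9, atol=1e-12)

x_avg = sol.y.mean(axis=1)                   # time average of the trajectory
u_avg = np.mean([x @ A @ x for x in sol.y.T])
mixed_ne = np.array([a*c, b, a]) / (a*c + b + a)

print(x_avg, mixed_ne)                       # the two vectors approximately agree
print(u_avg)                                 # approximately 0, the NE payoff
```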

Page 14:

A =
      0    a   −b
     −1    0    c
      0    0    0

with a, b, c > 0.

Best response dynamics

The population distribution x ∈ ∆ moves continuously towards a best reply:

ẋ ∈ BR(x) − x          (5)

The best reply regions are:

1 = BR(x) iff ax2 − bx3 > −x1 + cx3 and ax2 − bx3 > 0.

2 = BR(x) iff −x1 + cx3 > ax2 − bx3 and −x1 + cx3 > 0.

3 = BR(x) iff 0 > ax2 − bx3 and 0 > −x1 + cx3.

Page 15:

BR paths always head towards a pure strategy.

Page 16:

The function V(x) = max_i (Ax)_i = max{a x2 − b x3, −x1 + c x3, 0} satisfies dV/dt = −V along solutions (if i is the current best reply, then dV/dt = (A(e_i − x))_i = A_ii − (Ax)_i = −V, since the diagonal of A vanishes), and hence V(x(t)) → 0. The attractor {x : V(x) = 0} = {x : a x2 − b x3 ≤ 0, −x1 + c x3 ≤ 0} is the region where strategy 3 is a best reply. So we end up in the region where the maximin strategy is best!! So if a newcomer enters the population (already at the attractor), he should play the maximin strategy!

The attracting triangle {x : V(x) = 0} is a transitive region: it contains a dense orbit.

Both NE are on the border of this region, but both are highly unstable: there are solutions that start at Nash but leave it (non-uniqueness). The pure NE cannot be reached from any other state in finite time, but all solutions with V(x) > 0 converge to it! The mixed NE can be reached in finite time from all states x with V(x) = 0 (including the pure NE)!
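A crude Euler discretization of the best response dynamics (5) illustrates both claims (a sketch, not the exact piecewise-linear BR solution; a = c = 99, b = 101 as in game (1), an arbitrary interior starting point, ties broken by argmax): V(x(t)) decays to 0 while the trajectory, which starts with V(x) > 0, creeps towards the pure NE without ever reaching it.

```python
import numpy as np

a, b, c = 99.0, 101.0, 99.0
A = np.array([[0, a, -b], [-1, 0, c], [0, 0, 0]])

def best_reply(x):
    return int(np.argmax(A @ x))             # index of a best reply (ties -> lowest index)

def V(x):
    return (A @ x).max()                     # V(x) = max_i (Ax)_i >= 0

x = np.array([0.2, 0.5, 0.3])                # arbitrary starting point with V(x) > 0
dt = 0.01
for step in range(3001):
    if step % 500 == 0:
        print(f"t={step*dt:5.1f}  x={np.round(x, 4)}  V={V(x):.5f}  BR={best_reply(x) + 1}")
    e = np.eye(3)[best_reply(x)]             # pure best reply, a vertex of the simplex
    x = x + dt * (e - x)                     # Euler step of  dx/dt = BR(x) - x
# Output: the best reply switches from 2 to 1 early on, then x drifts towards the
# pure NE (1, 0, 0) while V shrinks towards 0, the level of the attracting triangle.
```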

Page 17:

Summary

While Hargreaves Heap and Varoufakis are perfectly right in their critique of Nash equilibrium as a suggestion for actual play, we show that in an evolutionary or learning context, at least the mixed NE of games (1) and (2) has some relevance.

Can one save Nash in this way for every unprofitable game?

Page 18:

Extension I

More general (symmetric) unprofitable games (game (2) is the case d = 0):

A =
      0    a   −b
     −1    d    c
      0    0    0

with b > 0, c > 0, a > d.   (6)

(top, left) is the unique pure NE. (bottom, right) is the maximin strategy.

Morgan and Sefton (2002) studied the case b = 1, d = −1, a = c: the simplest unprofitable games, with only three different payoffs 0, −1, a. They performed experiments for a = 8/3 and a = 2/3 (the matrices shown are 3A + 4, which has positive integer entries):

a = 8/3:             a = 2/3:
  4  12   1            4   6   1
  1   1  12            1   1   6
  4   4   4            4   4   4

Page 19:

Transformation to a Lotka–Volterra system: x = x2/x3 and y = x1/x3:

ẋ = x(c + d x − y)
ẏ = y(−b + a x)

Case d < 0: predator–prey system

(L) ac + bd > 0: coexistence, globally stable interior equilibrium
→ completely mixed NE, globally attracting for (REP) and (BR).

(R) ac + bd ≤ 0: the predator (= the pure NE strategy) dies out
→ NE mixing between strategies 2 and 3, evolutionarily stable, globally attracting for (REP), asymptotically stable for (BR).
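Which case the two Morgan-Sefton games fall into is a one-line computation: with b = 1, d = −1, c = a, the sign of ac + bd = a² − 1 decides (a sketch):

```python
# Morgan-Sefton parametrization: b = 1, d = -1, c = a.
for a in (8/3, 2/3):
    b, c, d = 1.0, a, -1.0
    case = "(L): interior equilibrium, mixed NE attracts" if a*c + b*d > 0 \
           else "(R): pure NE strategy dies out"
    print(f"a = {a:.3f}:  ac + bd = {a*c + b*d:+.3f}  ->  {case}")
```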

Page 21:

Evolution and learning lead to NE.

BUT: in experiments, average play is far from NE and maximin!
In the (R) game, players collect less than the maximin payoff; in the (L) game, slightly more.

Page 22:

Case d > 0:

ẋ = x(c + d x − y)
ẏ = y(−b + a x)

There exists a positive equilibrium, a global repeller.
→ There exists a completely mixed NE: a global repeller for (REP), a local repeller for (BR). Learning does not lead to NE.

Page 24:

Extension II

An arbitrary (symmetric) rock-scissors-paper (RSP) game, framed as an unprofitable game:

    0    a2   −b3             −a1        a2 + b2   −b3
  −b1     0    a3      ~      −a1 − b1   b2         a3
   a1   −b2     0               0         0          0

with ai, bi > 0.

pure maximin strategy (strategy 3)
no pure NE
unique, completely mixed NE (symmetric)
at the NE: expected payoff = 0

Gaunersdorfer & Hofbauer (GEB 1995):
If a1 a2 a3 < b1 b2 b3, the NE is a repeller for (REP) and (BR).

Learning does not lead to the NE!

Page 25:

Example (M, N large):

  −1    2   −N
  −M    1    1
   0    0    0
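Matching this example against the normal form of Extension II gives a1 = a2 = a3 = 1, b1 = M − 1, b2 = 1, b3 = N, so a1·a2·a3 = 1 < (M − 1)·N = b1·b2·b3 for large M and N, and the mixed NE is a repeller. A minimal check (with the arbitrary illustrative choice M = N = 10):

```python
import numpy as np

M, N = 10, 10                                 # "M, N large": illustrative values
a1, a2, a3 = 1, 1, 1                          # identification with the RSP normal form above
b1, b2, b3 = M - 1, 1, N

A = np.array([[-a1,      a2 + b2, -b3],
              [-a1 - b1, b2,       a3],
              [ 0,       0,        0 ]])
print(A)                                      # reproduces the example matrix for these M, N
print(a1*a2*a3 < b1*b2*b3)                    # True: the mixed NE is a repeller
```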

Page 26:

Conclusion

NE is an elegant mathematical solution concept for games. NE are the fixed points of the BR correspondence, and they are fixed points of many evolutionary and learning processes.

But is NE a reasonable suggestion for playing a game, or a good prediction of the outcomes of learning?

In some situations YES: strict NE, zero-sum games, ESS, potential games.

In general, I doubt it.
