

CRITIQUE AND COMMENT

PRISONER’S DILEMMA: METAGAMES AND OTHER SOLUTIONS

by Mike Robinson,

Kingston-on-Thames, Surrey, England

Metagame theory does not solve the paradox of rationality, i.e., both players do better if both play irrationally, generated by prisoner’s dilemma games under the assumption of individual rationality. There is a further metagame generated by each player’s need to optimize. This is that if either player convinces the other to cooperate by metagames or other means, he can do better by defecting. This higher order metagame is isomorphic with the base game, and the original irrational equilibrium is regenerated.

THE CLASSICAL prisoner’s dilemma, PD, game matrix was intended to mirror a police interrogation situation. Alf and Bert have committed a crime. They are picked up on suspicion, but the police have no definite evidence. They are interrogated separately, the implication being that neither knows whether the other has confessed or not. What each does know is the probable consequence of their joint responses: if neither confesses they both get light sentences on reduced charges; if both confess they both get medium sentences on the original charge; if one confesses, i.e., turns Queen’s evidence, he gets off but the other gets a heavy sentence.

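This situation can be represented in Matrix 1.

MATRIX 1

Columns: “Bert does not confess” and “Bert confesses”; rows: Alf’s corresponding choices. Each cell gives the sentences of the two prisoners as described above (surviving fragments of the entries: “... year in prison”, “Bert gets off”).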

This matrix can be transformed into an experimental game by transforming a scale of increasing punishment into a scale of decreasing financial reward as in Matrix 2 or Matrix 3.

Behavioral Science, Volume 20, 1975

MATRIX 2

                               Bert’s Choice
                        Cooperate             Defect

Alf’s      Cooperate    Alf gets $3           Alf gets nothing
Choice                  Bert gets $3          Bert gets $4

           Defect       Alf gets $4           Alf gets $2
                        Bert gets nothing     Bert gets $2

MATRIX 3

Player 1 chooses a row (C or D) and player 2 chooses a column (C or D); the cell entries are not legible in this copy.

The problem that both the experimenter and, in the original situation, the police have to face is that there is no guarantee that the players’ preferences for outcomes are scaled in the same way as the outcomes themselves. However, if the players’ preferences are scaled in the same way as the outcomes, then the dilemma is generated.

Before examining this situation, let us look at another, using Matrix 3, in which the rational choice is unambiguous.


Player 1 is a person, player 2 is a random automaton, or nature. In this situation player 1 should always play D. By doing this he maximizes his greatest and least gains.

There are two further situations in which the rational choice is unambiguous.

The first is the situation where the game is defined as competitive by player 1, i.e., his sole object is to get more and to avoid getting less than player 2. In this situation player 1 should always play D. By doing this he cannot do worse than player 2 and may do better.

The second situation arises when the game is defined as cooperative. Both players wish to maximize their joint gains. In this case the rational choice is C because it gives the possibility of the highest joint total; CC gives six, all other outcomes give four.
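The three unambiguous cases can be checked mechanically. The sketch below is only an illustration: it assumes the Matrix 2 rewards (3 each for CC, 2 each for DD, 4 and nothing when exactly one player defects), and the names `REWARDS`, `own_gains`, `margins`, and `joint_totals` are mine, not part of the original argument.

```python
# Rewards assumed from Matrix 2: (player 1's reward, player 2's reward).
REWARDS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (2, 2),
}
MOVES = ("C", "D")


def own_gains(move):
    """Player 1's possible rewards for `move`, over both moves of player 2."""
    return [REWARDS[(move, other)][0] for other in MOVES]


def margins(move):
    """Player 1's reward minus player 2's reward, over both moves of player 2."""
    return [REWARDS[(move, other)][0] - REWARDS[(move, other)][1] for other in MOVES]


def joint_totals(move):
    """Sum of both players' rewards, over both moves of player 2."""
    return [sum(REWARDS[(move, other)]) for other in MOVES]


# Game against nature: (least gain, greatest gain) for each move of player 1.
against_nature = {m: (min(own_gains(m)), max(own_gains(m))) for m in MOVES}
# Competitive game: the worst margin over player 2 that each move can produce.
competitive = {m: min(margins(m)) for m in MOVES}
# Cooperative game: the highest joint total that each move makes possible.
cooperative = {m: max(joint_totals(m)) for m in MOVES}

print(against_nature)
print(competitive)
print(cooperative)
```

Under these assumed rewards, D has both the higher least gain and the higher greatest gain, D never leaves player 1 behind player 2, and only C makes the joint total of six possible.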

If we now assume that the players’ preferences correspond to the reward or punishment scale, and that each player’s aim is to achieve the maximum gain for himself, rather than to compete or cooperate, we can construct the paradox of PD. In other words, we assume the preference order of Table 3, rather than Tables 1 or 2.

We now have the situation where both players are ignorant of the other’s choice and both wish to maximize their gains. The rational course is for both players to choose D. As Howard (1971) points out, this leads to the paradoxical situation where both players would do better by behaving irrationally than rationally.
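The paradox can be stated as two checks on the reward matrix: D is the better choice for each player whatever the other does, yet both players receive more at CC than at DD. The following is a minimal sketch, again assuming the Matrix 2 rewards; the helper names are hypothetical.

```python
# Rewards assumed from Matrix 2: (player 1's reward, player 2's reward).
REWARDS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (2, 2),
}
MOVES = ("C", "D")


def reward(player, own, other):
    """Reward to `player` (0 or 1) when he plays `own` and the other plays `other`."""
    key = (own, other) if player == 0 else (other, own)
    return REWARDS[key][player]


def strictly_dominates(player, a, b):
    """True if move `a` pays `player` more than move `b` against every reply."""
    return all(reward(player, a, o) > reward(player, b, o) for o in MOVES)


# Individual rationality: D is better for each player whatever the other does.
d_dominates = [strictly_dominates(p, "D", "C") for p in (0, 1)]

# The paradox: the "irrational" outcome CC pays both players more than DD.
rational_outcome = REWARDS[("D", "D")]
irrational_outcome = REWARDS[("C", "C")]
both_better_off = all(i > r for i, r in zip(irrational_outcome, rational_outcome))

print(d_dominates, both_better_off)
```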

Howard then goes on to suggest that if we use a different canon of rationality, one in which the players attribute rationality to each other, then we can construct a new model of rational behavior that is able to generate a CC as well as a DD result, and is thus a better empirical predictor as well as a more satisfying theory.

TABLE 1
COMPETITIVE PREFERENCES

                  Preferences
Player     1      2            3
1          DC     CC or DD     CD
2          CD     CC or DD     DC

TABLE 2
COOPERATIVE PREFERENCES

                  Preferences
Player     1      2
1          CC     CD or DC or DD
2          CC     CD or DC or DD

While there is much to be said for this theory, my point here is that it does not generate a CC solution to PD as classically formulated, i.e., play once only, without communication.

HOWARD’S THEORY

We wish to show that the outcome CC is an equilibrium in PD. An equilibrium is a stable outcome.

A stable outcome is one which is arrived at when both players make their choice in the light of correct predictions as to the other’s choice, i.e., neither would change his decision in the light of any further information. Each player gains his knowledge of the other’s choice by consideration of the k-metagame, knowledge that is not available in the original game. The k-metagame is constructed by supposing that player k chooses his strategy after the others and in the light of their choices.

Howard then maintains that consideration of the PD 1(2) metagame reveals two equilibria: CC and DD.

Let us consider this proposition, and the process by which one arrives at it. We start with the assumption that the preference order of the players is that shown in Table 3, and, almost tautologically, that each player will choose the path (assuming representation of the game as a tree) leading to the best outcome for himself in the light of his knowledge of the others’ choices.

TABLE 3
INDIVIDUAL PREFERENCES

                  Preferences
Player     1      2      3      4
1          DC     CC     DD     CD
2          CD     CC     DD     DC

The essential confusion of this method originates in the mode of interpretation of the metagames. The process of resolution goes like this. In the 2-metagame, player 2 makes his choice after player 1, in knowledge of player 1’s choice. Thus if player 1 chooses C, player 2 does best by choosing D; if 1 chooses D, 2 does best by choosing D. Thus player 2’s best strategy is to play D/D, or D. He assumes that 1 has thought things out the same way, and will therefore play D. He therefore plays D, expects 1 to play D, and is right. The DD solution is therefore stable.
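The resolution of the 2-metagame can be reproduced by enumeration. The sketch below is a rough illustration, not Howard’s own procedure: it assumes the Matrix 2 rewards, writes each policy of player 2 as a pair (reply to C, reply to D), and treats a pair of choices as an equilibrium when neither player can gain by changing his own choice alone. All names are hypothetical.

```python
from itertools import product

# Rewards assumed from Matrix 2: (player 1's reward, player 2's reward).
REWARDS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (2, 2),
}
MOVES = ("C", "D")

# Player 2's policies in the 2-metagame: (reply to C, reply to D).
P2_POLICIES = list(product(MOVES, repeat=2))


def outcome(move1, policy2):
    """Base-game outcome when player 1 plays `move1` and player 2 follows `policy2`."""
    return (move1, policy2[MOVES.index(move1)])


def is_equilibrium(move1, policy2):
    """Neither player can gain by unilaterally changing his metagame choice."""
    r1, r2 = REWARDS[outcome(move1, policy2)]
    p1_content = all(REWARDS[outcome(m, policy2)][0] <= r1 for m in MOVES)
    p2_content = all(REWARDS[outcome(move1, f)][1] <= r2 for f in P2_POLICIES)
    return p1_content and p2_content


equilibria = [(m, f) for m in MOVES for f in P2_POLICIES if is_equilibrium(m, f)]
equilibrium_outcomes = {outcome(m, f) for m, f in equilibria}

print(equilibria)
print(equilibrium_outcomes)
```

On these assumptions the only equilibrium pairs player 1’s D with player 2’s policy D/D, so the only stable outcome is DD.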

We note first that the 2-metagame is not, and cannot be, the game that is actually played if that game is classical PD. This can be seen in Table 4.

In other words, in the metagame, 2 has two choices (CD, DC) that are not available in the original game. They are not available in the original game precisely because 2 does not know what 1 has played.

Metagames, then, are mental aids, not the games that are played.

Mental aids to whom? Howard sometimes seems to suggest that players reach the CC position by discovering, constructing, contemplating, understanding, and acting on simultaneous mental 2(1) and 1(2) metagames. This is a proposition that I find unlikely, to put it mildly.

Conversely, metagames are clearly not for the experimenter only. If they were, they would be redundant, for the players would never know that a CC outcome was on the cards.

The plausible suggestion seems to be that the players contemplate a restricted form of metagame in the course of trying to anticipate the other player’s move: “what would happen if he were prepared to cooperate if I was?”, etc.

TABLE 4

Game           Player    Choices    Possible outcomes
PD             1 & 2     C          cc, cd
                         D          dc, dd
2-metagame     1         C          cc, cd
                         D          dc, dd
               2         CC         cc, cd
                         DD         dc, dd
                         CD         cc, dd
                         DC         dc, cd

We can now say that the metagame is not the game that is played, but a formalization of each player’s hypothetical internal conversation.

We come now to the 1(2) metagame, where it becomes clear that neither the 1(2) metagame nor a metagame of any other level provides a CC equilibrium for PD.

MATRIX 4

The 1(2) metagame. Player 2’s choices head the columns: C/C, D/D, C/D, D/C. Player 1’s choices form the rows. Each cell holds a pair of preference values such as 3,3 or 4,1; most of the entries are not legible in this copy.

If the 1(2) metagame were the game actually being played, then the equilibrium would indeed be CC. By contemplating Matrix 4, player 1 can see that his best move is DDCD, player 2 can see that this is player 1’s best move, and that his best response to this is CD. By the same process player 1 can anticipate player 2’s CD, and both players can be said to correctly anticipate the other’s move, and the outcome is therefore stable, or an equilibrium.

But the 1(2) metagame based on PD is a very different game from PD. It has a different tree, and the players have different and much larger choice sets.
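The size of the new choice sets, and the equilibria they support, can be checked by brute force. The sketch below rests on several assumptions of mine: the Matrix 2 rewards; player 2’s policies read as “reply to C / reply to D” and listed in the Matrix 4 column order; and player 1’s policies read as one move against each of those four columns, so that DDCD means “play C only against C/D”. The names are hypothetical.

```python
from itertools import product

# Rewards assumed from Matrix 2: (player 1's reward, player 2's reward).
REWARDS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (2, 2),
}
MOVES = ("C", "D")

# Player 2's policies (reply to C, reply to D), in the Matrix 4 column order.
P2_POLICIES = [("C", "C"), ("D", "D"), ("C", "D"), ("D", "C")]
# Player 1's policies: one move against each of player 2's four policies.
P1_POLICIES = list(product(MOVES, repeat=len(P2_POLICIES)))


def outcome(policy1, policy2):
    """Base-game outcome produced by a pair of metagame policies."""
    move1 = policy1[P2_POLICIES.index(policy2)]
    move2 = policy2[MOVES.index(move1)]
    return (move1, move2)


def is_equilibrium(policy1, policy2):
    """Neither player can gain by unilaterally switching to another policy."""
    r1, r2 = REWARDS[outcome(policy1, policy2)]
    p1_content = all(REWARDS[outcome(g, policy2)][0] <= r1 for g in P1_POLICIES)
    p2_content = all(REWARDS[outcome(policy1, f)][1] <= r2 for f in P2_POLICIES)
    return p1_content and p2_content


equilibria = [(g, f) for g in P1_POLICIES for f in P2_POLICIES if is_equilibrium(g, f)]
equilibrium_outcomes = {outcome(g, f) for g, f in equilibria}

print(len(P1_POLICIES), len(P2_POLICIES))
print(equilibrium_outcomes)
```

Under these assumptions player 1 has 16 policies and player 2 has 4, against two moves each in PD itself, and the pair DDCD with C/D is among the equilibria. The equilibrium outcomes of this larger game are CC and DD, which is Howard’s claim about the metagame; it says nothing yet about what is stable in PD proper.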


The real crunch comes when player 1 uses the 1(2) metagame to predict 2’s choice in PD. He sees that if he plays DDCD, then 2 will play CD. This, in practice, will lead him to play C. At this point 2, if he is not besotted by metagame theory, will realize that 1’s contemplation of the 1(2) metagame has led him to play C. Thus, 2’s best move is now D. If 1 realizes this, then he will play D and not C. The equilibrium is therefore DD.

To put it another way, metagame theory appears to provide a way out of the PD paradox because it allows each player to assume the rationality of the other. Each player can predicate his play on predictions of the other’s play. This is no longer a game against nature. Although this is undoubtedly a better theoretical representation of what goes on, it does not go far enough.

Having allowed for rationality, one must also allow for duplicity or bluff. Although it may be true that no further equilibria can be gained by expanding the 1(2) and 2(1) metagames in terms of pure rationality, it is not true that rationality stops at the 1(2) and 2(1) metagames. If either player has thought this far, and assuming Table 3 preference orderings, then the next thought must be “If I have persuaded him to cooperate by saying that I will, then I will be better off defecting.”

But in fact, we do not have to go this far. Once the notion of bluff is admitted, then the certainty that came with rationality is lost. The bluff-metagame supersedes the 1(2) and 2(1) metagames, and means “ignorance about the other’s choice”.

The highest metagame becomes a game against nature or the original game. However far one follows the infinite metagame tree, DD is its ultimate and unique fruit.
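The regress described here can be followed step by step in the base game. The sketch below is an illustration under the assumed Matrix 2 rewards, with hypothetical function names: starting from any pair of intended moves, it lets whichever player can gain by changing his move do so, and stops when neither can.

```python
# Rewards assumed from Matrix 2: (player 1's reward, player 2's reward).
REWARDS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (2, 2),
}
MOVES = ("C", "D")


def reward(player, own, other):
    """Reward to `player` (0 or 1) when he plays `own` and the other plays `other`."""
    key = (own, other) if player == 0 else (other, own)
    return REWARDS[key][player]


def best_reply(player, other_move):
    """The move that pays `player` most, given the other player's move."""
    return max(MOVES, key=lambda own: reward(player, own, other_move))


def regress(start):
    """Follow successive second thoughts from `start` until neither player changes."""
    path = [start]
    while True:
        move1, move2 = path[-1]
        if best_reply(1, move1) != move2:
            # Player 2 sees what player 1 intends and improves on his own move.
            path.append((move1, best_reply(1, move1)))
        elif best_reply(0, move2) != move1:
            # Player 1 anticipates player 2's change and improves in turn.
            path.append((best_reply(0, move2), move2))
        else:
            return path


paths = {start: regress(start) for start in REWARDS}
print(paths[("C", "C")])
```

Starting from mutual cooperation the path runs CC, CD, DD, and every other starting point also ends at DD.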

COMMENTS

Classical PD

The ground for the construction of metagame theory is not only that the classical canon of rationality leads to a paradox in which both players do better if both behave irrationally; it is also that on occasion players do reach a CC equilibrium and seem to have rational, in some valid sense, grounds for doing this.

As far as the classical PD goes, it seems from Howard’s data, and some experiments of my own, that this empirical statement is unjustified. The CC solution is reached in so few cases that statistically we may ignore it. The few cases that do occur arise because, for a number of reasons, we cannot guarantee that players’ preferences will follow the experimental reward structure (Tables 1, 2, and 3).

PD one-shot, with communication

Formally speaking, I would argue that the equilibrium of this game is the same as in classical PD, namely, DD. This is because there is no essential difference between both players constructing their predictions vis-à-vis the other separately or jointly. The situation where player 1 says, “I will cooperate if you will,” and player 2 says “OK,” is isomorphic with the situation where 1 chooses DDCD and 2 chooses CD. Thinking and playing are not the same thing. If player 1 has genuinely persuaded player 2 that he intends to play C, then 1 would be better off if he played D. On the other hand, if player 2 was not convinced, 1 would still be better off playing D, etc.
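The “persuaded or not” argument can be put in terms of player 1’s degree of belief that his talk has worked. In the sketch below, `p` is a hypothetical probability that player 2 will play C; the rewards are those assumed from Matrix 2, and the expected-reward calculation is my own illustration rather than part of the original argument.

```python
# Player 1's rewards assumed from Matrix 2, keyed by (own move, other's move).
REWARDS_1 = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 4,
    ("D", "D"): 2,
}


def expected_reward(move, p):
    """Player 1's expected reward for `move` if player 2 plays C with probability `p`."""
    return p * REWARDS_1[(move, "C")] + (1 - p) * REWARDS_1[(move, "D")]


# Degrees of persuasion, from "not convinced at all" to "fully convinced".
beliefs = (0.0, 0.25, 0.5, 0.75, 1.0)
advantage_of_defecting = [expected_reward("D", p) - expected_reward("C", p) for p in beliefs]

print(advantage_of_defecting)
```

With these rewards the advantage of D over C works out to 2 - p, which is positive for every p between 0 and 1: however successful the communication, player 1 still does better by defecting.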

However, all this assumes that the players’ preferences are structured in the same way as the rewards.

This seems even less likely to be the case than in classical PD. Communication will not only be about the outcomes, but about the values implicit in ordering the outcomes. Communication between players allows them to modify or reject the experimental instructions. This is very clear in the original interrogation situation. There are three players, not two: Alf, Bert, and the police. The game the police play is to try to impose a preference order that will benefit them. Shades of this conflict inevitably carry over into the experimental situation. We may expect a higher number of CC solutions in this condition because we are even less certain of controlling the players’ preferred orderings.


PD multiple play

This seems to me to be the form in which metagame theory comes into its own. Both players can be reasonably seen as trying to state, in actual play, a policy that will be relevant to the next play. If rewards are assumed to be cumulative, then it becomes rational to try to induce a real CC equilibrium.
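A small simulation suggests why cumulative rewards change the picture. It is a sketch under assumptions that are not in the original: the Matrix 2 rewards paid on every play, a fixed and known number of plays, and two hypothetical policies, one that always defects and one that cooperates until the other player defects and thereafter copies the other’s last move.

```python
# Rewards assumed from Matrix 2: (player 1's reward, player 2's reward).
REWARDS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (2, 2),
}


def always_defect(own_history, other_history):
    """Play D on every round, whatever has happened."""
    return "D"


def conditional_cooperator(own_history, other_history):
    """Play C first, then repeat the other player's previous move."""
    if not other_history:
        return "C"
    return other_history[-1]


def play(policy1, policy2, rounds):
    """Return the cumulative rewards of both players over `rounds` plays."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        move1 = policy1(history1, history2)
        move2 = policy2(history2, history1)
        r1, r2 = REWARDS[(move1, move2)]
        total1 += r1
        total2 += r2
        history1.append(move1)
        history2.append(move2)
    return total1, total2


mutual_cooperation = play(conditional_cooperator, conditional_cooperator, 10)
mutual_defection = play(always_defect, always_defect, 10)
exploitation = play(always_defect, conditional_cooperator, 10)

print(mutual_cooperation, mutual_defection, exploitation)
```

Over ten plays the defector gains once and then earns the DD reward for the rest of the game, ending with less than he would have had from sustained cooperation. Under these assumptions a stated policy of conditional cooperation is worth more than the one-off gain from defecting.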

REFERENCE

Howard, N. Paradoxes of rationality: The theory of metagames and political behavior. Cambridge, Mass.: MIT Press, 1971.

(Manuscript received November 13, 1974)
