fairness and learning: an experimental examinationmyweb.fsu.edu/djcooper/research/fair.pdf ·...

26
Fairness and Learning: An Experimental Examination David J. Cooper Department of Economics Weatherhead School of Management Case Western Reserve University 10900 Euclid Avenue Cleveland, OH 44106 [email protected] Carol Kraker Stockman Graduate School of Public Health University of Pittsburgh 130 DeSoto Street, A664 Pittsburgh, PA 15261 [email protected] September 11, 2002 The authors would like to thank the National Science Foundation for financial support. We would like to thank Ido Erev, John Kagel, Jim Rebitzer, Al Roth, David Schkade, Bob Slonim, John Van Huyck, Jim Warnick, an anonymous editor, two anonymous referees, and seminar participants at Alabama, Case Western Reserve, the Cleveland Federal Reserve, Oberlin, Ohio State, Pittsburgh, Purdue, Texas A&M, Tufts, the SEA meetings, and the ESA meetings for their helpful comments. The usual caveat applies. Abstract: Over the last twenty years, experimental economists have identified a wide variety of games in which subjects display “other-regarding” behavior in violation of standard game theoretic predictions. The two primary approaches to explaining this behavior can be characterized as the “fairness hypothesis” and the “learning hypothesis.” Both hypotheses provide an adequate explanation of behavior in existing experiments, but the two approaches rely on very different interpretations of the factors underlying subjects’ behavior. In this paper, we examine experiments with a step-level public goods game designed to distinguish between the fairness and learning hypotheses. We find that neither approach provides an adequate explanation of the experimental data by itself. We propose a model which synthesizes the fairness and learning hypotheses. Using econometric analysis, we demonstrate that this model provides a significantly better fit to the experimental data. JEL: C72, C91, C92, D64

Upload: vokiet

Post on 29-Mar-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Fairness and Learning:An Experimental Examination

David J. CooperDepartment of Economics

Weatherhead School of ManagementCase Western Reserve University

10900 Euclid AvenueCleveland, OH 44106

[email protected]

Carol Kraker StockmanGraduate School of Public Health

University of Pittsburgh130 DeSoto Street, A664

Pittsburgh, PA [email protected]

September 11, 2002

The authors would like to thank the National Science Foundation for financial support. We would like tothank Ido Erev, John Kagel, Jim Rebitzer, Al Roth, David Schkade, Bob Slonim, John Van Huyck, JimWarnick, an anonymous editor, two anonymous referees, and seminar participants at Alabama, Case WesternReserve, the Cleveland Federal Reserve, Oberlin, Ohio State, Pittsburgh, Purdue, Texas A&M, Tufts, theSEA meetings, and the ESA meetings for their helpful comments. The usual caveat applies.

Abstract: Over the last twenty years, experimental economists have identified a wide variety of games inwhich subjects display “other-regarding” behavior in violation of standard game theoretic predictions. Thetwo primary approaches to explaining this behavior can be characterized as the “fairness hypothesis” and the“learning hypothesis.” Both hypotheses provide an adequate explanation of behavior in existing experiments,but the two approaches rely on very different interpretations of the factors underlying subjects’ behavior. Inthis paper, we examine experiments with a step-level public goods game designed to distinguish between thefairness and learning hypotheses. We find that neither approach provides an adequate explanation of theexperimental data by itself. We propose a model which synthesizes the fairness and learning hypotheses.Using econometric analysis, we demonstrate that this model provides a significantly better fit to theexperimental data.

JEL: C72, C91, C92, D64

1Well known examples include the ultimatum game (Roth, 1995), the dictator game (Roth, 1995), VCMpublic goods games without thresholds (Ledyard, 1995), experimental labor markets (Fehr, Kirchler, andWeichbold, 1998), and trust games (Berg, Dickhaut, & McCabe, 1995).

2It is somewhat of a misnomer to refer to the “fairness” hypothesis, since the theory encompasses a widevariety of emotions such as envy, spite, and reciprocity. The “interpersonal comparisons” hypothesis wouldbe a more accurate (if unwieldy) characterization of this approach.

3See also Falk and Fischbacher (1998), Levine (1998), and Charness and Rabin (2000).

1

I. Introduction: Over the last twenty years, experimental economists have identified a wide variety

of games in which subjects display “other-regarding” behavior that violates standard game theoretic

predictions.1 Explaining these results has become a major focus of experimental economics and the related

areas of economic theory. The two primary approaches to explaining this anomalous behavior can be

characterized as the “fairness hypothesis” and the “learning hypothesis.” The fairness hypothesis maintains

the assumption of full rationality but eliminates the equivalence between monetary payoffs and utility, instead

allowing a player’s utility to depend on the payoffs of others.2 Sophisticated versions of this approach, such

as Bolton and Ockenfels (2000) and Fehr and Schmidt (1999), give unified explanations of results from a

wide variety of experiments.3 The learning hypothesis takes the opposite approach, relaxing the assumption

of full rationality but maintaining the equivalence between monetary payoffs and utility. Under the learning

hypothesis, players have bounded rationality, only gradually learning to behave optimally. As shown by Roth

and Erev (1995) and Binmore, Gale, and Samuelson (1995), the learning hypothesis also provides a unified

explanation of results from a wide variety of experiments. Although both hypotheses provide an adequate

explanation of existing experimental data, the two approaches rely on very different interpretations of the

factors underlying subjects’ behavior. As such, there exists a need for experimental work which allows us

to determine whether subjects’ behavior is best explained by the fairness hypothesis, the learning hypothesis,

or a synthesis of both approaches.

This paper examines experiments with a three player sequential step-level public goods game. We

focus on the behavior of “critical” third players – third players whose decisions determine whether or not the

public good will be provided. This case is of particular interest because critical third players face no strategic

uncertainty. The game creates a strong tension between monetary payoffs and fairness for critical third

4The MCS game was first introduced in a simultaneous form by Van de Kragt, Orbell and Dawes (1983)and in a sequential form by Erev and Rapoport (1990).

2

players. Applying the fairness and learning hypotheses to this game allows us to make sharp predictions

about the observed behavior of critical third players. In particular, existing models of fairness, static by their

very nature, predict no change in behavior over time, while existing learning models predict that

contributions by critical third players, a strictly dominant strategy, must increase over time. Our results

unequivocally indicate that neither approach can explain the experimental data by itself – we observe

consistent movements in critical third players’ play, including a strong decrease in the contribution rate for

one of the treatments. We propose a model which synthesizes the fairness and learning hypotheses. Using

maximum likelihood analysis, we show that this hybrid model provides a better explanation for our data than

either a pure fairness or learning approach.

To the best of our knowledge, our data set is unique in showing robust movement away from a

dominant strategy. The existing experimental literature on other-regarding behavior generally reports no

observable dynamics, in part due to the relatively small number of repetitions typically employed. To the

extent that any dynamics are observed, they involve movement towards a dominant strategy.

We initially viewed our MCS game experiments as a contest between the fairness and learning

hypotheses, but discovered that neither approach is wholly satisfactory. Rather than saying both approaches

are wrong, it would be more accurate to say that both are partially right. By synthesizing the best features

of the fairness and learning hypotheses, our model hopefully not only improves our ability to fit the current

experimental data, but also improves our ability to understand interactions between learning and fairness.

II. The MCS Game, the Experimental Design, and Predicted Results: The Minimal Contributing

Set (MCS) game we study is a 3-person step-level public goods game.4 Each player is given an initial

endowment of 12 tokens. Players sequentially choose whether or not to contribute to provision of a public

good. Contribution is a binary choice; players choose to either contribute or not contribute, and a

predetermined cost of contribution is deducted from a player’s initial endowment following contribution. The

public good is provided if at least 2 of the 3 players contribute. If the public good is provided, then each of

3

the players receives an additional 18 tokens regardless of whether he contributed to the public good. Costs

of contribution are deducted regardless of whether or not the public good is provided. Players are given

perfect information – they know all players’ payoff tables and the moves of preceding players.

The pre-set cost associated with contribution for player i is ci tokens, i 0 {1,2,3}, where the index

gives the player’s position in the game (Player 1 moves first, Player 2 moves second, and Player 3 moves

third). We use three different cost structures, as summarized in Table 1. Only one cost structure is used in

any given experimental session. We will refer to the treatments by the cost structure, giving us the “3/6/9

treatment,” the “1/3/9 treatment,” and the “1/3/16 treatment.” Note that costs of contribution are increasing

in player’s position for all three treatments. Although payoffs for all treatments are set so that all players earn

more money if the public good is provided (ci < 18, œ i), each player earns more if he can free-ride on the

contributions of the other players.

Restricting our attention to pure strategy equilibria, and assuming that utility is a function solely of

a player’s monetary payoff, the MCS game has four Nash equilibria. In one equilibrium none of the players

contribute and the public good is not provided. In the remaining three equilibria exactly two of the three

players contribute and the public good is provided. Only the equilibrium in which the last two players

contribute is a subgame perfect equilibrium.

The position contingent contribution costs and the sequential nature of the game create tension

between payoff maximization and other-regarding preferences. Consider the position of a “critical” third

player – a third player who knows that exactly one of the two preceding players has contributed. A critical

third player knows that the public good will be provided if and only if he contributes. If this player cares

solely about his monetary payoffs, he should always contribute. However, this player also knows that a

preceding player with a lower cost of contribution has not contributed and will free-ride on provision of the

public good. We can easily imagine that a critical third player might resent this, and therefore not contribute

to punish the earlier player who did not contribute. A critical third player must also consider that his decision

to contribute or not contribute affects the preceding individual who did contribute, an individual whom he

may wish to help. Thus, the desire to maximize one’s own monetary payoffs, the desire to punish free riders,

4

and the desire to reward other contributors all can affect the behavior of critical third players. The resulting

behavior depends on which of these forces carries the greatest weight.

The three different cost structures are designed to place differing emphasis on maximizing one’s own

payoffs versus fairness for critical third players. Compare the 3/6/9 treatment with the 1/3/9 treatment. In

both cases, the monetary benefits of contributing are identical for critical third players. However, in the 1/3/9

treatment the preceding players’ costs of contribution are lower. Holding the identity of the contributing

player fixed, we would expect a critical third player to feel greater resentment when the non-contributing

player’s cost of contribution is lowered. A critical third player should also be less concerned about harming

a player who has contributed when that player’s cost of contribution is reduced. We therefore would expect

fairness considerations to play a greater role in the 1/3/9 treatment than the 3/6/9 treatment, lowering

contribution rates by critical third players. Likewise, compare the 1/3/9 treatment with the 1/3/16 treatment.

While the costs of contribution are identical for Player 1s and Player 2s across the two treatments, the cost

of contribution has increased for Player 3s in the 1/3/16 treatment. Since the monetary incentives to

contribute have been reduced, we expect fairness considerations to play a more important role in determining

the choices of critical third players in the 1/3/16 treatment than in the 1/3/9 treatment. We therefore anticipate

lower contribution rates by critical third players in the 1/3/16 treatment than in the 1/3/9 treatment.

By their very nature, existing models of fairness predict no change in the behavior of critical third

players over time. These are all models in which individuals have fixed utility functions over their own

payoffs and the payoffs of others. Since there is no uncertainty about other players’ payoffs to be resolved

for critical third players, there is no reason for their behavior to change.

The preceding arguments have all been made on an intuitive basis, but can also be placed on a more

formal footing. For example, we can apply Bolton and Ockenfels’ (2000) ERC model, a leading model of

fairness, to the MCS game. Bolton and Ockenfels model a player’s utility from a game as a function of his

monetary payoff and his payoff share. The latter term captures a player’s concern with fairness. Making

some standard assumptions, we prove the following two propositions. The proofs are available from the

authors upon request.

5

Proposition 1: If an individual ever contributes as a critical third player in the 1/3/9 treatment, they must

always contribute as a critical third player in the 3/6/9 treatment.

Proposition 2: If an individual ever contributes as a critical third player in the 1/3/16 treatment, they must

always contribute as a critical third player in the 1/3/9 treatment.

It follows from these propositions that the ERC model predicts contribution rates by critical third

players to be non-increasing as we go from the 3/6/9 treatment to the 1/3/9 treatment and then to the 1/3/16

treatment. Because a player’s utility function only depends on his own monetary payoff and the monetary

payoffs of the other two players, information which is known to a critical third player, the ERC model

predicts no systematic change over time in contribution rates by critical third players.

Our predictions for the MCS game are not driven by the fine details of the ERC model, and wouldn’t

change if a different model of fairness was used. For example, Proposition 1 and Proposition 2 can both be

proved using Fehr and Schmidt’s (1999) model of fairness, competition, and cooperation. Copies of these

alternative proofs are also available from the authors upon request.

The predictions of a reinforcement learning model (with reinforcements equal to monetary payoffs)

are quite different than those of a fairness model. While we put off a detailed description of the reinforcement

learning model of Roth and Erev until Section V, a brief intuitive explanation of what the model predicts is

useful in reading the experimental results. Reinforcement learning models can be summarized in a single

sentence: Strategies that do well should be played more frequently over time. A learning model by its very

nature makes no predictions about the initial distribution of play – learning models predict the evolution of

play given a starting distribution of strategies. With monetary reinforcements, play should move in the

direction of increased monetary payoffs. Since it is a strictly dominant strategy to contribute as a critical third

player regardless of the treatment, we expect that the contribution rate for third players will increase (on

average) over time regardless of the treatment. However, we do expect that the rate of increase will be slower

in the 1/3/16 treatments than in the other two since the reinforcement for contributing is smaller in this

treatment.

6

To summarize, both a pure fairness approach and pure learning approach make strong predictions

about the data. Considering fairness by itself, we predict that contribution rates by critical third players

should be decreasing as we move from the 3/6/9 treatment to the 1/3/9 treatment and then the 1/3/16

treatment. A fairness only approach predicts no change over time in contribution rates. A learning model

with no fairness component makes no predictions about the starting distribution of strategies across the three

treatments, but predicts that contribution rates should be rising for critical third players in all three cases. It

also predicts that the rate of increase will be slowest in the 1/3/16 treatment.

III. Experimental Procedures: Four replications of each of the three cells of the design have been run

using a total of 234 subjects. Between fifteen and twenty-four subjects participated in each session and each

subject was allowed to take part in only one session. Subjects were recruited from the University of

Pittsburgh undergraduate population using newspaper ads, posters, and electronic bulletin board postings.

The average session lasted about 1½ hours, and the average subject earned about $15, including a $5

participation fee. These earnings were adequate to ensure a steady supply of subjects.

Before the beginning of a session, instructions were read aloud to all subjects. Subjects were also

given a written copy of the instructions and of the payoff tables. All substantive questions about the

instructions were answered out loud to ensure common knowledge. Subjects were asked to complete a short

payoff quiz to verify their ability to read the payoff tables.

Each session consisted of forty periods of play of the sequential MCS Game. Subjects were told the

number of periods to be played. We randomly determined each subject’s player position (Player 1, Player

2, or Player 3) before the first round of play. A subjects’ position remained unchanged throughout the course

of the session. Subjects knew their own position and that there were an equal number of subjects in each

position, but did not know the position of any other subject. At the beginning of each round of play, we

randomly and anonymously assigned subjects to three person groups containing one subject for each position.

The instructions emphasized that the three person groups would be randomly rematched in every round. This

was intended to preserve the one-shot strategic character of the MCS game. After each round of play,

subjects were informed of the decisions made and the payoffs earned by the other members of their group.

7

Subjects were given record sheets on which to record their outcomes. While we did not require subjects to

fill out the record sheet, we observed that most did.

To avoid any framing effects, neutral terms were used wherever possible. For example, players were

asked to choose between “X” and “Y” rather than “contribute” and “don’t contribute.” No explicit mention

was ever made of “provision” or “public goods”. Instead, the payoff table refers solely to “your choice” and

“total number of other players choosing X.”

Subjects were paid privately, in cash, based on their token earnings in two periods randomly chosen

from the forty periods of play. Each token was worth $0.30 to subjects.

IV. Experimental Data: Figure 1 shows the evolution of contribution rates for Player 1s, Player 2s

contingent on Player 1's decision, and critical Player 3s. Our data analysis concentrates on the choices of

critical third players. Unlike other players, third players face no uncertainty about the play of others. We

therefore don’t need to worry about the unobservable beliefs of subjects in analyzing their behavior. Only

for third players can we make strong predictions about how behavior should vary across treatment and across

time and draw strong conclusions about the effects of fairness and learning on subjects’ behavior.

After sorting themselves out over the first few rounds, contribution rates by critical third players are

decreasing as the asymmetry of costs increases. These differences emerge early -- aggregating over the first

ten periods, the contribution rates by critical third players are 82%, 76%, and 73% in the 3/6/9, 1/3/9, and

1/3/16 treatments respectively -- and increase over time. For the 3/6/9 and 1/3/9 treatments, the contribution

rate for critical third players rises over time, reaching 90% and 83% respectively for the final ten periods. In

the 1/3/16 treatment, the contribution rate for critical third players falls substantially over time, dipping to

50% in periods 21-30 and then rebounding somewhat to 60% in the final ten periods. This decline in

contribution rates is not due to anomalous behavior in a single session – for all four sessions of the 1/3/16

treatment the contribution rate by critical third players was lower in the final ten periods than in the initial

ten periods.

Rather than capturing a substantive feature of subjects’ behavior such as reputation building, the late

5We can also refute the reputation hypothesis by looking at play by critical third players in the 40th period. Since this is the last period of play, there is nothing to be gained by maintaining a reputation. None theless, only 8 of 15 critical third players in the 40th period of the 1/3/16 treatment contribute to the publicgood. By way of comparison, 20 out of 20 critical third players contributed in the 40th period of the 3/6/9treatment and 13 out of 14 critical third players contributed in the 40th period of the 1/3/9 treatment.

8

increase of the contribution rate for critical third players in the 1/3/16 treatment illustrates the deceptive

effects of aggregation over sessions. In later periods of the experiment, the probability of being a critical third

player is endogenous. If learning by the first two players moves them towards strategies with higher expected

payoffs, there will be positive correlation between the initial probability that critical third players contribute

in a session and the probability that third players are critical in later periods. It follows that observations of

critical third players in later periods are more likely to come from sessions in which critical third players tend

to contribute. This aggregation effect biases estimated changes in the contribution rate for critical third

players upwards.

The rise in contribution rates for 1/3/16 critical third players in the final ten periods is largely due to

the aggregation effect described above. When contribution rates are broken down by session, no clear pattern

of change emerges over the second half of the experiment, but the percentage of observations coming from

sessions with relatively high contribution rates rises steadily over time. As such, we see little support for the

hypothesis that reputation effects are driving our results.5

To control for individual effects (and for the aggregation effects), we examine the behavior of critical

third players using probit analysis with a random effects specification. This analysis is summarized in Table

2. Each observation corresponds to a play of the game for which the third player was critical. For all three

models the dependent variable is the critical third player’s decision (0 = don’t contribute, 1 = contribute).

The random effects term is significant at the 1% level in all three regressions, but is not reported in Table 2

since it is of little economic significance.

Model 1 is a baseline regression looking for general differences between the treatments. The only

regressors are dummy variables for treatment (Payoff 3/6/9=1 if 3/6/9 treatment, 0 otherwise; Payoff

1/3/16=1 if 1/3/16 treatment, 0 otherwise). Using the 1/3/9 treatment as a base, this regression finds that

6Throughout this paper, all tests of significance for individual parameters are two tailed z-tests. All tests ofjoint significance use a log likelihood ratio test.

7To test this hypothesis, we reran Model 2 with the 1/3/16 treatment as the base. The parameter estimate forperiod is -.0295 with a standard error of .0063. This parameter is significant at the 1% level.

9

contribution rates are higher in the 3/6/9 treatment and lower in the 1/3/16 treatment with both dummies

significant at the 1% level.6

Model 2 adds a regressor for the period in which the decision was made as well as two variables

which interact period with a treatment dummy (period 3/6/9 and period 1/3/16). Once again the treatment

dummies are significant at the 1% level, indicating that statistically significant differences between treatments

are present from the beginning of play. Jointly, the three new variables are significant at the 1% level (P2 =

31.756, 3 d.f., p < .01). The parameter for period is positive and significant at the 5% level, while the period

3/6/9 parameter is not significant at any standard level. Together, these parameters indicate that contribution

rates are rising in the 3/6/9 and 1/3/9 treatments with the difference between the two treatments remaining

roughly constant. The parameter for period 1/3/16 is negative and significant at the 1% level. Additionally,

the overall rate of change in 1/3/16 sessions is negative and significant at the 1% level, indicating that

contribution rates are falling significantly over time in the 1/3/16 treatment.7

Model 3 adds four new variables to the regression. These additions are chosen to test secondary

predictions of the fairness and learning hypotheses. Whocon is a dummy variable indicating which of the

preceding two players contributed (whocon = 1 if the first player contributed, 0 otherwise). Under general

conditions, the fairness hypothesis predicts that critical third players should contribute more frequently if

Player 2 has contributed than if Player 1 has contributed. (Formal proofs using either the Bolton-Ockenfels

model or the Fehr and Schmidt model are available from the authors upon request.) This prediction holds

regardless of treatment, so no interactions with the treatment dummies are included in the regression. Chance

gives the percentage of previous periods in which the third player has been critical. In a reinforcement

learning model without fairness (reinforcements equal monetary payoffs), the contribution rate by critical

third players should be positively correlated with chance. Intuitively, the more opportunities you have had

10

to learn that contributing yields higher payoffs than not contributing, the more likely you should be to

contribute. When the reinforcement includes a fairness component, the model predicts that more chances to

be critical should accelerate learning but not affect the direction. Since contribution rates by critical third

players move in different directions in the different treatments, the variable chance is interacted with the

treatment dummies to generate the variables chance 3/6/9 and chance 1/3/16.

The results for Model 3 indicates that which player contributed prior to Player 3 does not have any

significant effect on a critical third player’s decision. The three chance variables are not jointly significant

at any standard level (P2 = 4.46, 3 d.f., p > .10). Examined individually, only the parameter estimate for

chance 1/3/16 is significant even at the 10% level. The positive sign of the parameter estimate for chance

1/3/16 does not fit with any sort of reinforcement learning model, given that contribution rates for critical

third players in the 1/3/16 sessions decrease over time.

Overall, the regression results reinforce our observations from Figure 1. Both the level of

contribution rates for critical third players and the direction of change for these contribution rates depend on

the asymmetry of contribution costs. For the treatments with relatively symmetric costs (3/6/9 and 1/3/9

treatments), contribution rates for critical third players are relatively high and rise over time. For the

treatment with relatively asymmetric costs (1/3/16 treatment), critical third player contribution rates are

initially low and fall over time.

The experimental results can only be partially explained by the existing theories of fairness. The

initial differences in play by critical third players between treatments are consistent with the predictions of

fairness models like the Bolton-Ockenfels ERC model. However, these initial differences are quite small,

only becoming economically significant with the passage of time. The existing models of fairness are static

in nature, and therefore cannot track the dynamics.

A pure learning approach also fails to explain all of the major features of the data. As noted

previously, such an approach predicts a rising contribution rate for critical third players in all three treatments,

when in fact the contribution rate falls sharply in the 1/3/16 treatment.

8See Cooper and Feltovich (1996), Cheung and Friedman (1997), Erev and Roth (1998), Salmon (1998),Camerer and Ho (1999), Blume, DeJong, Neumann, and Savin (1999), and Feltovich (2000).

11

V. Learning and Fairness: While neither the pure fairness hypothesis nor the pure learning hypothesis

captures all of the major features of the experimental data, both are able to explain some of the major features.

While unable to explain the changes in subjects’ behavior, the pure fairness hypothesis does correctly predict

that contribution rates for critical third players are decreasing as the costs of contribution become more

asymmetric. Likewise, while unable to predict the decrease in contribution rates for critical third players in

the 1/3/16 treatment, the pure learning hypothesis correctly predicts that behavior will change over time with

the differences between treatments growing. These partial successes suggest that a hybrid model synthesizing

the best of the fairness and learning hypotheses might fit the data well.

This section presents a reinforcement learning model and explores whether the addition of fairness

to the model improves its ability to fit the data. Because the dynamics generated by the learning model are

probabilistic in nature and sensitive to how the learning model is specified, the model must be formally

specified and appropriate econometric tests must be applied to cleanly reject the null hypothesis of pure

learning in favor of a hybrid between learning and fairness.

The Learning Model: We use a variation of the reinforcement learning model developed by Roth and Erev

(1995). There exist a wide variety of learning models in the literature, and which of these best tracks the

behavior of experimental subjects remains an open question.8 Our concern is not to determine whether the

reinforcement learning model is the best model of learning, but rather to study whether or not the observed

data is best explained by a learning model that allows for fairness considerations. The reinforcement learning

model is particularly well suited to this purpose since it has been used previously to study similar issues and

is easily modified to incorporate fairness considerations. We strongly conjecture that our conclusions will

hold for any learning model which puts increased weight over time on strategies that do well.

The reinforcement learning model studies how behavior changes over time for a population of

individuals who are repeatedly matched in small groups to play a game. Individuals are assumed to always

12

play the same role within this game, but the groups matched with each other change from round to round.

Formally, consider an N player normal form game. Let Mi be the number of strategies available to the i-th

player. In round t, the j-th individual in the i-th role has propensity qijt(m) for choosing his m-th strategy.

The probability of the j-th individual in the i-th role choosing his m-th strategy in round t is obtained by

dividing the associated propensity for this strategy by the sum of the propensities in round t.

(eq. 1)p (m)

q (m)

qijt ij

t

ijt

=∑=

( )µµ 1

Mi

In round t, each individual is randomly assigned N - 1 opponents, then chooses his strategy based on

his round-t probabilities. Let Fijt 0 {1,2, . . . , Mi} be the strategy chosen by the j-th individual in the i-th role

in round t. Monetary payoffs are determined the strategies chosen by all N individuals in a matching. Let

rijt be the reinforcement received by the j-th individual in the i-th role in round t. For the learning model

without fairness, this reinforcement is equal to the individual’s monetary payoff. For the hybrid model with

fairness, the reinforcement is a function of the individual’s monetary payoff and the size of his payoff relative

to the payoffs of others. An individual’s propensities are updated by multiplying all propensities by a

“forgetting” parameter * and then adding his reinforcement to his propensity for choosing his realized

strategy. Equation 2 gives the formula for updating the propensity of the realized strategy – for all other

strategies, the rijt term is dropped. By assumption, 0 # * # 1.

(eq. 2)( ) ( )q s q s rijt 1

ijt

ijt

ijt

ijt+ = +δ

To help pick up the individual effects in the data, we modify the model by adding an autocorrelation

parameter psame. This “inertia” parameter allows for the possibility that the draws are not independent across

periods, but instead correlated. The addition of this parameter affects how strategies are selected in each

period. With probability psame, the same strategy is chosen in period t as was chosen in period t - 1 (Fijt = Fij

t-1).

Otherwise, the formula in (eq. 1) is used to pick a new strategy. In other words, (eq. 1) now gives the

probability for each strategy being used subject to the individual choosing a new strategy.

9For example, suppose the first two players contribute. Suppose the third player would not have contributedif he had been critical. With normal form strategies, this choice gets reinforced even though it was neverimplemented. No reinforcement takes place with behavioral strategies.

10The frequency of contribution for third players when both preceding players contribute is 3.1% in the3/6/9 treatment, 13.2% in the 1/3/9 treatment, and 3.2% in the 1/3/16 treatment. The frequency ofcontribution for third players when neither preceding player contributes is 4.2% in the 3/6/9 treatment,18.7% in the 1/3/9 treatment, and 1.5% in the 1/3/16 treatment.

13

Fitting the reinforcement learning model: Before fitting parameters for the reinforcement learning model, we

must make some decisions as to how the model will be formulated for the MCS game. The model is written

down in terms of normal form strategies, but can be equally well stated in terms of behavioral strategies. For

an extensive form game like the MCS game, this might seem like a more natural choice. However, to give

the pure learning hypothesis its best possible chance, we use normal form strategies. This decision increases

the probability of a decline in critical third players’ contribution rates as observed in the 1/3/16 treatment.

The two approaches differ on the critical issue of whether or not third players reinforce the decision they

would have made if they had been critical even when they are not critical. Using normal form strategies

allows this sort of “accidental” reinforcement while using behavioral strategies does not.9 As a neutral force,

accidental reinforcement works against the natural direction of the dynamic. Since contributing is a dominant

strategy for critical third players with no fairness considerations, the natural direction of the dynamic is

towards increasing contribution rates. Thus accidental reinforcement works to our benefit if we are trying

to generate decreasing contribution rates.

In this paper, the model will only be fit for third players since this is the primary focus. Results on

fitting the model for Players 1 and 2 are available from the authors upon request. In the full normal form

game, Player 3s have sixteen available strategies. To make the estimation less unwieldy, the strategy set is

simplified by forcing third players to choose the same action whenever they are not critical. In other words,

a third player who is not critical cannot condition his strategy on whether both or neither of the previous

players have contributed. Given that contribution rates are about the same in both cases, and given the low

proportion of contributions observed in either case, this assumption is unlikely to have any substantial effect

on our conclusions.10 The remaining eight strategies tell a Player 3 what action to take when not critical,

14

r3 1 2 3i

1 2 3( , , )π π π π α

ππ π π

= −+ +

3 0 13

* min , (Eq.3)

critical following a contribution by Player 1, or critical following a contribution by Player 2.

We fit two variations of the reinforcement learning model to the data for third players using standard

maximum likelihood techniques. In the first version of the model, the learning model without fairness,

reinforcements equal the players’ monetary payoffs. In the second version, we incorporate fairness into the

learning model by using a version of Bolton and Ockenfel’s ERC model to generate the reinforcements. In

particular, the reinforcements for the learning model with fairness are given by equation 3. Let Bi be the

monetary payoff for the i-th player. Note that if " = 0, we get the learning model without fairness.

The results of the maximum likelihood estimation are contained in Table 3. All Player 3 observations

were used in fitting the models. Separate initial propensities are fit for each of the three treatments. Given

that the inclusion of psame probably does not fully control for individual effects in the data, the standard errors

have been corrected for clustering using techniques due to Moulton (1986) and White (1994). The initial

propensities are of little economic interest, and hence are not reported.

The critical parameter in the maximum likelihood estimation is ", the weight put on fairness in the

reinforcement function. Even after correcting the standard errors for clustering, we can reject the null

hypothesis that " = 0 at the 1% level using either a z-test (z = 3.34, p < .01) or a log-likelihood ratio test (P2

= 11.91, 1 d.f., p < .01). Adding fairness to the reinforcement function significantly improves the model’s

ability to fit the data. This improvement does not depend on the precise details of how fairness is added to

the reinforcements. We have done analogous analysis using several alternative specifications of the ERC

model and several specifications of the Fehr-Schmidt model, and get identical qualitative results. The details

of these additional fitting exercises are available upon request.

The ability of the reinforcement learning model with fairness to better fit the data follows from the

basic properties of the model. Reinforcement learning models tends to move in the direction of higher

reinforcements. Because contributing always provides a higher monetary payoff for critical third players,

11For " > 14.7, a critical third player in the 1/3/16 treatment always receives greater reinforcement from notcontributing than from contributing. For the 1/3/9 treatment, a critical third player always receives greaterreinforcement from contributing than not contributing if " < 127.1. For the 3/6/9 treatment, a critical thirdplayer always receives greater reinforcement from contributing than not contributing if " < 140.4.

15

the learning model without fairness is strongly predisposed toward rising contribution rates. This makes it

difficult for the model to track the falling contribution rates in the 1/3/16 treatment. With fairness added to

the reinforcement function, it becomes possible for not contributing when critical to generate a higher

reinforcement than contributing. For the fitted value of ", this is indeed the case for the 1/3/16 treament, but

not for the other two treatments.11 The model therefore can track both the rising contribution rate for critical

third players in the 3/6/9 and 1/3/9 treatments and the falling contribution rate in the 1/3/16 treatment.

The preceding argument should make it clear why use of reinforcement learning versus some other

learning model is not an essential ingredient of the hybrid model. What make the hybrid model work is its

ability to change which strategy is dominant in the 1/3/16 treatment. As long as the learning model has the

tendency of moving towards a dominant strategy, the specifics of the dynamics aren’t crucial.

VI. Conclusions and Discussion: This paper has presented experiments with a sequential step-level

public goods game, the MCS game. It explores whether the observed data can best be explained by the pure

fairness hypothesis, the pure learning hypothesis, or some combination of both approaches. While neither

approach by itself captures the major features of the data, a hybrid model that combines the dynamics of the

learning hypothesis with the sensitivity to relative payoffs of the fairness hypothesis provides a significantly

better fit for the experimental data. In particular, this model is consistent with declining contribution rates

for critical third players in the 1/3/16 treatment.

The novelty of our work comes from two sources, one empirical and one theoretical. Empirically,

we find strong movement towards a dominated strategy, a unique result as far as we know. For the most part,

games with other-regarding behavior yield no discernable dynamics. One case for which strong dynamics

are typically observed is in VCM public goods games without a threshold (Ledyard, 1995, pp. 146-148).

However, these dynamics involve movement towards the dominant strategy of not contributing. Only

12 The observed dynamics in VCM public goods games do not allow us to reject the pure fairnesshypothesis in any substantive manner. Because VCM games are simultaneous play games, subjects facestrategic uncertainty. It is consistent with the fairness hypothesis to build a model in which players arecompletely rational and have other-regarding preferences, but have to learn about the preferences (and byextension play) of others. Such a Bayesian learning model can fully explain the observed dynamics inVCM public goods games. Because critical third players in MCS games face no strategic uncertainty, asimilar approach cannot explain the observed changes in contribution rates over time.

Roth and Slonim (1998) and Cooper, Feltovich, Roth, and Zwick (2000) report learning byproposers in ultimatum games. As with VCM public goods games, these changes can be explained as aresponse to uncertainty about responders’ preferences.

16

because of the significant shift in the 1/3/16 treatment away from the dominant strategy of contributing for

critical third players can we reject the pure learning hypothesis.12

The primary theoretical innovation of this paper is driven by the empirical results. The data make

it clear that fairness and learning do not act in isolation, but instead interact with each other. This suggest

the need to develop a theory that incorporates both fairness and learning and allows us to model the

interaction between these two factors. Our hybrid model, the reinforcement learning model with fairness,

takes a first step towards building a unified theory of fairness and learning.

An important implication of the learning model with fairness is that fairness and learning are not

distinctly separate phenomena, but instead must be considered as aspects of a unified whole. Understanding

the dual nature of subjects’ behavior allows us to reconcile the contradictory conclusions reached by earlier

papers comparing the fairness and learning hypotheses. Two preceding papers have studied variants of the

ultimatum game to compare these hypotheses. Abbink, Bolton, Sadrieh, and Tang (1997) conclude that

punishment is a better explanation of responder behavior than learning. They find clear evidence that fairness

matters for responders and no firm evidence of learning by responders. In contrast, Cooper, Feltovich, Roth,

and Zwick (2000) find support for the pure learning hypothesis. Their main treatment effect is in the direction

predicted by the pure learning hypothesis and they find strong evidence that responders’ actions respond to

their past experience as a learning model predicts. Within the framework of our hybrid model, these opposing

results can happen quite naturally. Trying to discern the relative importance of fairness and learning from

a single experiment is closely akin to the famous anecdote about blind men trying to determine the nature of

an elephant by feeling a single body part. Depending on exactly which experiment is considered, different

13In fact, Cooper et al fit a hybrid model analogous to ours to their data in the working paper version of theirtext. Even though it isn’t necessary to explain the main treatment effect, they find that the learning modelwith fairness provides a significantly better fit for the data.

14Suppose we concede the dubious point that subjects would have a 30% error rate in early periods. If playin the last half of the experiment accurately reflects subjects’ true preferences, there is approximately a 50-50 split between subjects that prefer to contribute as critical third players in the 1/3/16 treatment and thosewho prefer to not contribute. Since random errors are presumably unbiased, it is equally likely that a playerthat intended to contribute would accidentally choose to not contribute as vice versa. Thus, even with ahigh rate of random errors, the contribution rate for critical third players should be roughly 50% in the earlyperiods. This is clearly not the case.

15For reviews of the literature on constructive preferences, see Payne, Bettman, and Johnson (1992), Slovic(1995), and Bettman, Luce and Payne (1998). For attempts to build a theory of constructive preferences,see Bettman, Luce, and Payne (1998) and Payne, Bettman, and Schkade (forthcoming).

17

conclusions will be reached about the relative importance of fairness and learning.13

We speculate that the learning model with fairness captures something about the behavioral

foundations of critical third players’ choices. Unlike many settings where strong dynamics are observed, it

is somewhat puzzling that the behavior of critical third players changes so dramatically. After all, what is

there to learn for a critical third player? Usually when strong learning effects are observed, subjects face

either substantial uncertainty about parameters of the game or substantial strategic uncertainty. Neither of

these conditions applies to critical third players in the MCS game. In settings which involve fairly complex

optimization problems, we observe learning because subjects must develop a good heuristic for solving the

difficult maximization problem. This scarcely applies to critical third players who face a straightforward

binary choice. We cannot even justify the observed dynamics through a decrease in random errors. The

relatively weak increases in contribution rates for critical third players in the 3/6/9 and 1/3/9 treatments might

be attributed to a decrease in random errors, but it hard to square the strong decrease in contribution rates for

critical third players in the 1/3/16 treatment with a decrease in random errors.14 Just about the only thing left

that subjects could be learning is their preferences. Indeed, recent advances in cognitive psychology suggest

that subjects will not enter our experiment with well-formed preferences.

Economists typically assume that preferences are an innate characteristic of individuals, and hence

stable. Over the past few decades, cognitive psychologists have largely abandoned this rigid view of

preferences in favor of the more flexible notion of constructive preferences.15 The basic idea can be

18

summarized as follows. Individuals are not assumed to carry around pre-specified preferences in their heads.

They cannot simply look up the appropriate preferences from some master list when faced with a novel

problem. At best, individuals may have well defined attitudes towards the primary attributes of a novel

problem. However, they lack any coherent notion of how these attitudes combine into a single preference

measure. Instead, preferences must be constructed on the spot.

The concept of constructive preferences applies naturally to the decision facing inexperienced

subjects who are critical third players in the MCS game. The primary attributes of their choices are monetary

payoffs and fairness. Contributing to the public good generates higher monetary payoffs but lower fairness.

Presumably most subjects know that (ceteris paribus) they prefer outcomes that are more fair to outcomes that

are less fair. They also can be assumed to prefer outcomes that (ceteris paribus) give higher monetary payoffs

to those that give lower monetary payoffs. However, we speculate that subjects lack a complete preference

ordering over combinations of fairness and monetary payoffs and must construct said preferences as needed.

Put simply, rather than entering our experiment knowing how they feel about tradeoffs between fairness and

monetary payoffs, we suspect that our subjects must learn their preferences. In particular, many critical third

players in the 1/3/16 treatment realize with experience that they prefer a slightly lower monetary payoff with

equal relative payoffs over a slightly higher monetary payoff with very unequal relative payoffs.

Given this story, the reinforcement learning model with fairness is natural approach to capturing the

process of critical third players resolving their preferences. Consider the position of a subject playing as a

third player in the 1/3/16 treatment. He is in an unfamiliar environment facing a problem he’s never seen

before. He must resolve a difficult tradeoff between fairness and monetary considerations. We might

reasonably expect that he will be somewhat confused. While it is possible that our subject uses a very

sophisticated strategy for resolving his confusion, it seems at least as plausible that he muddles along before

settling on a strategy that seems to do well. This is essentially the learning strategy of a reinforcement

learner.

Our current experiments do not allow us to directly confirm this conjecture that changes in the

19

choices of critical third players reflect the construction of preferences. More generally, they do not allow us

to state with any certainty that the reinforcement learning model with fairness is the best possible method of

synthesizing fairness and learning, to determine the best possible specification for this model, or to determine

the range of settings in which the hybrid model is applicable. In future work with the MCS game and other

related games, we plan to explore these issues.

20

References

Abbink, Klaus; Bolton, Gary E.; Sadrieh, A. and Tang, F. "Adaptive Learning versus Punishment inUltimatum Bargaining." Mimeo, Penn State University, 1997.

Berg, Joyce; Dickhaut, John W. and McCabe, Kevin A. "Trust, Reciprocity and Social History." Games andEconomic Behavior, July 1995, 10(1), pp. 122-42.

Bettman, James R.; Luce, Mary Frances and Payne, John W. "Constructive Consumer Choice Processes."Journal of Consumer Research, December 1998, 25(3), pp. 187-217.

Binmore, Ken; Gale, John and Samuelson, Larry. "Learning to be Imperfect: The Ultimatum Game." Gamesand Economic Behavior, January 1995, 8(1), pp. 56-90.

Blume, A.; DeJong, D.V.; Neumann, G.R. and Savin, N.E. "Inferring Learning Rules from ExperimentalGame Data." Working paper, University of Iowa,1999.

Bolton, Gary E. and Ockenfels, A. "ERC: A Theory of Equity, Reciprocity, and Competition." AmericanEconomic Review, March 2000, 90(1), pp.166-93.

Camerer, C. and Ho, T. H. "Experience-Weighted Attraction Learning in Normal Form Games."Econometrica, July 1999, 67(4), pp. 827-874.

Charness, Gary and Matthew Rabin, "Social Preferences: Some Simple Tests and a New Model," workingpaper, University of California-Berkeley, 2000.

Cheung, Yin Won and Friedman, Daniel. "Individual Learning in Normal Form Games: Some LaboratoryResults." Games and Economic Behavior, April 1997, 19(1), pp. 46-79.

Cooper, David J. and Feltovich, Nick. "Reinforcement-Based Learning vs. Bayesian Learning: AComparison." Working paper, University of Pittsburgh, 1996.

Cooper, David J.; Feltovich, Nick; Roth, Alvin E. and Zwick, Rami. "Relative versus Absolute Speed ofAdjustment in Strategic Environments: Responder Behavior in Ultimatum Games." Working paper,Case Western Reserve University, 2000.

Erev, Ido and Rapoport, Amnon. "Provision of Step-Level Public Goods: The Sequential ContributionMechanism." Journal of Conflict Resolution, September 1990, 34(3), pp. 401-25.

Erev, Ido and Roth, A.E. "Predicting How People Play Games: Reinforcement Learning in ExperimentalGames with Unique, Mixed Strategy Equilibria." American Economic Review, September 1998,88(4), pp. 848-881.

Falk, Armin and Fischbacher, Urs. "An Eye for an Eye, a Tooth for a Tooth: A Theory of Reciprocity."Mimeo, University of Zurich, 1998.

21

Fehr, Ernst; Kirchler, Erich; Weichbold, Andreas and Gachter, Simon. "When Social Norms OverpowerCompetition: Gift Exchange in Experimental Labor Markets." Journal of Labor Economics, April1998, 16(2), pp. 324-51.

Fehr, Ernst and Schmidt, K. "A Theory of Fairness, Competition, and Cooperation." Quarterly Journal ofEconomics, August 1999, 114(3), pp. 817-68.

Feltovich, Nicholas. "Reinforcement-Based vs. Beliefs-Based Learning Models in ExperimentalAsymmetric-Information Games." Econometrica, May 2000, 68(3), pp. 605-41.

Ledyard, John O. "Public Goods: A Survey of Experimental Research," in John H. Kagel and Alvin E. Roth,eds., The Handbook of Experimental Economics. Princeton: Princeton University Press, 1995.

Levine, David K. "Modeling Altruism and Spitefulness in Experiments." Review of Economic Dynamics, July1998, 1(3), pp. 593-622.

Moulton, Brent, "Random Group Effects and the Prevision of Regression Estimates," Journal ofEconometrics,1986, 32, 385-397.

Payne, John W.; Bettman, James R. and Johnson, Eric J. "Behavioral Decision Research: A ConstructiveProcessing Approach." Annual Review of Psychology, 1992, 43, pp. 87-131.

Payne, John W.; Bettman, James R. and Schkade, David A. "Measuring Constructed Preferences: Towardsa Building Code." forthcoming Journal of Risk and Uncertainty.

Roth, Alvin E. "Bargaining Experiments," in John H. Kagel and Alvin E. Roth, eds., The Handbook ofExperimental Economics. Princeton: Princeton University Press, 1995.

Roth, Alvin E. and Erev, Ido. "Learning in Extensive-Form Games: Experimental Data and Simple DynamicModels in the Intermediate Term." Games and Economic Behavior, January 1995, 8(1), pp. 164-212.

Roth, Alvin E. and Slonim, Robert. "Learning in High Stakes Ultimatum Games: An Experiment in theSlovak Republic." Econometrica, May 1998, 66(3), pp. 569-96.

Salmon, T.C. "An Evaluation of Econometric Models of Adaptive Learning." forthcoming Econometrica.

Slovic, Paul. "The Construction of Preference." American Psychologist, May 1995, 50(5), pp. 364-371.

Van de Kragt; Alphons J.; Orbell, John M. and Dawes, Robyn M. "The Minimal Contributing Set as aSolution to Public Goods Problems." American Political Science Review, March 1983, 77(1), pp.112-22.

White, H. Estimation, Inference and Specification Analysis. New York: Cambridge University Press, 1994.

Figure 1

Contribution Rates by Player 1s

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40Periods

Con

tribu

tion

Rat

e

3/6/9 Treatment 1/3/9 Treatment 1/3/16 Treatment

Contribution Rates by Player 2sContingent on Player 1s Contributing

0

0.1

0.2

0.3

0.4

0.5

0.6

1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40Periods

Con

tribu

tion

Rat

e

3/6/9 Treatment 1/3/9 Treatment 1/3/16 Treatment

Contribution Rates by Player 2sContingent on Player 1 Not Contributing

0.5

0.6

0.7

0.8

0.9

1

1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40Periods

Con

tribu

tion

Rat

e

3/6/9 Treatment 1/3/9 Treatment 1/3/16 Treatment

Contribution Rates by Critical Third Players

0.4

0.5

0.6

0.7

0.8

0.9

1

1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40Periods

Con

tribu

tion

Rat

e

3/6/9 Treatment 1/3/9 Treatment 1/3/16 Treatment

Table 1Summary of Contribution Costs by Treatment

Player 1 Player 2 Player 3

3/6/9 Treatment 3 6 9

1/3/9 Treatment 1 3 9

1/3/16 Treatment 1 3 16

Table 2Results of Probits for Critical Third Players

1.960 ** 1.678 ** 2.145 **

(0.163) (0.219) (-0.507)0.987 ** 0.832 ** 0.240(.232) (0.357) (0.606)-0.903 ** -0.955 ** -1.643 **

(.154) (0.297) (0.561)0.014 * 0.012 *

(0.007) (0.007)0.007 0.008

(0.010) (0.010)-0.043 ** -0.041 **

(0.010) (0.009)0.095-0.095-0.787(0.634)0.670

(0.640)1.012 +

(0.671)Log-likelihood -643.24 -627.37 -625.12

** indicates significance at the 1% level* and + indicate significance at the 5 and 10% levels, respectively( ) standard errors

Model 1 Model 2 Model 3

Constant

Payoff 3/6/9

Payoff 1/3/16

Period

Chance 3/6/9

Chance 1/3/16

Period 3/6/9

Period 1/3/16

Whocon

Chance

Table 3

Maximum Likelihood Estimates of Parameters for Reinforcement Learning ModelPlayer 3 Data

Model without Fairness Model with Fairness

Initial Sumof Propensities

46.2*

(19.4)Initial Sum

of Propensities36.6**

(11.2)

Delta .251**

(.036) Delta .248**

(.035)

Probability Same .228*

(.093) Probability Same .222**

(.074)

Alpha 51.8**

(15.5)

Log Likelihood 750.33 Log Likelihood -744.38

Notes: Each regression has 3120 observations split between 78 subjects. Numbers reported inparentheses are standard errors corrected for clustering.

** Significantly different from 0 at the 1% level* Significantly different from 0 at the 5% level