
Decision From Models: Generalizing Probability Information to Novel Tasks

Hang Zhang, Jacienta T. Paily, and Laurence T. Maloney
New York University

We investigate a new type of decision under risk where—to succeed—participants must generalize their experience in one set of tasks to a novel set of tasks. We asked participants to trade distance for reward in a virtual minefield where each successive step incurred the same fixed probability of failure (referred to as hazard). With constant hazard, the probability of success (the survival function) decreases exponentially with path length. On each trial, participants chose between a shorter path with smaller reward and a longer (more dangerous) path with larger reward. They received feedback in 160 training trials: encountering a mine along their chosen path resulted in zero reward and successful completion of the path led to the reward associated with the path chosen. They then completed 600 no-feedback test trials with novel combinations of path length and rewards. To maximize expected gain, participants had to learn the correct exponential model in training and generalize it to the test conditions. We compared how participants discounted reward with increasing path length to the predictions of 9 choice models including the correct exponential model. The choices of a majority of the participants were best accounted for by a model of the correct exponential form although with marked overestimation of the hazard rate. The decision-from-models paradigm differs from experience-based decision paradigms such as decision-from-sampling in the importance assigned to generalizing experience-based information to novel tasks. The task itself is representative of everyday tasks involving repeated decisions in stochastically invariant environments.

Keywords: decision under risk, decision from experience, constant hazard rate, exponential survival function, generalization

Supplemental materials: http://dx.doi.org/10.1037/dec0000022.supp

This article was published Online First November 3, 2014.

Hang Zhang, Department of Psychology and Center for Neural Science, New York University; Jacienta T. Paily, Department of Psychology, New York University; Laurence T. Maloney, Department of Psychology and Center for Neural Science, New York University.

This research was supported by NIH EY019889 to Laurence T. Maloney.

Correspondence concerning this article should be addressed to Hang Zhang, Department of Psychology and Center for Neural Science, New York University, 6 Washington Place, 2nd Floor, New York, NY 10003. E-mail: [email protected]

Decision, 2015, Vol. 2, No. 1, 39–53. © 2014 American Psychological Association. 2325-9965/15/$12.00. http://dx.doi.org/10.1037/dec0000022

Decision-making is often modeled as a choice among lotteries. A lottery is a list of mutually exclusive possible outcomes, O_i, i = 1, . . . , n, with corresponding probabilities of occurrence p_i, i = 1, . . . , n, such that Σ_{i=1}^{n} p_i = 1. A decision maker may, for example, be constrained to choose one of the following:

Lottery A: 50% chance to win $900 and 50% chance to get nothing; versus

Lottery B: $450 for sure.

Decision tasks can be divided into classes based on the source of probability information available to the decision maker (Barron & Erev, 2003; Hertwig, Barron, Weber, & Erev, 2004). In the traditional "decision from description" task—illustrated above—(e.g., Kahneman & Tversky, 1979), participants choose between lotteries where the probability of each outcome is explicitly given.

In a new and growing research area known as "decision from experience" or "decision from sampling" (Barron & Erev, 2003; Hadar & Fox, 2009; Hertwig et al., 2004; Ungemach, Chater, & Stewart, 2009; see Rakow & Newell, 2010, for a review), participants learn probabilities by repeatedly sampling outcomes from multiple lotteries and then decide which they prefer.¹ The participant may, for example, repeatedly execute Lottery A above and receive a stream of outcome values $900 and $0, with roughly as many of the former as the latter. Sampling from Lottery B leads only to outcomes of $450.





With large enough samples, the information about probability in decision from sampling converges to that available in decision from description.² However, participants in decision from description and decision from sampling experiments seem to treat probabilities very differently. In decision from description, human choices deviate from those that maximize expected utility as if small probabilities were overweighted and large probabilities underweighted (Tversky & Kahneman, 1992). In decision from sampling, the reverse pattern—small probabilities underweighted and large probabilities overweighted—is found (Erev et al., 2010; Hertwig et al., 2004; Teodorescu & Erev, 2014).

Decision from sampling captures many real-world situations where explicit probability information is unavailable but individuals may "try out" the various choices repeatedly before committing to a decision. There are, however, situations where people must make decisions, probabilities are not given, and repeated sampling is either not possible or undesirable. They might not, for example, want to learn how to cross a wide, busy street by trial and error. However, if people can somehow generalize their experience with narrow, safe streets by modeling how their probability of survival varies with traffic density and speed, they are potentially able to estimate probabilities of survival in novel situations (e.g., high-speed traffic) without direct experience. Generalization extends the benefits of learning to a wider range of potential decision tasks.

In decision-from-sampling tasks (Barron & Erev, 2003; Hertwig et al., 2004), there is—by design—little basis for generalizing to novel conditions (lotteries). The focus of research is on experience—not generalization. If, for example, the individual samples Lottery A's outcomes from a red deck of cards and Lottery B's outcomes from a green deck, he has information concerning the frequencies of possible rewards from each of the two decks. He has no basis for generalizing to an unsampled blue deck of cards representing Lottery C.

In the present study, we introduce a new class of decision task, "decision from models," where—in order to do well—people must generalize probability information from past experience to a novel task based on an internal model of the probabilistic environment. In our study, on each trial, participants had to choose between two paths through a virtual minefield (Figure 1). A fixed number of invisible mines were randomly and uniformly distributed in the minefield. The probability of successfully traversing a linear path in the field without running into a mine is a decreasing exponential function of path length x (the survival function)³:

P(x) = e^{-λx},   x ≥ 0                                    (1)

The parameter λ > 0 is the hazard rate. On most trials, the participant had to choose between a shorter, thus safer, path toward a smaller reward and a longer, more dangerous path toward a larger reward.⁴ Surviving the path they chose would result in the specified reward; otherwise, they would receive nothing. The participant could potentially estimate the hazard rate λ from feedback in the training phase. Whether he adopts the correct, exponential model of Eq. 1, though, is a separate question, one we return to below.
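As a brief numerical illustration of Eq. 1 (not part of the original analysis), the snippet below evaluates the survival function for a few path lengths and checks the multiplicative property implied by constant hazard; the hazard rate used here is arbitrary.

```python
import numpy as np

# Worked illustration of Eq. 1: constant hazard implies exponential survival.
# The hazard rate below is arbitrary, chosen only for this example.
lam = 0.08                                  # hazard rate per cm (illustrative)
for x in (5.0, 10.0, 20.0):
    print(f"P(survive {x:4.1f} cm) = {np.exp(-lam * x):.3f}")

# Memorylessness: surviving a path of length 2L is exactly as likely as
# surviving a path of length L twice in succession.
L = 10.0
assert np.isclose(np.exp(-lam * 2 * L), np.exp(-lam * L) ** 2)
```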

The tasks in the test phase differed only in the path lengths and values of the lotteries presented; no feedback was provided. The new conditions and absence of feedback prevented the participant from using stimulus-response mappings or "model-free" reinforcement learning (Daw, Gershman, Seymour, Dayan, & Dolan, 2011) to complete his choices. He had to use the information in the training phase to work out how the probability of survival changes with path length; that is, the model of Eq. 1 that maps probability information from training to paths of novel lengths in the test phase. If he could make effective use of Eq. 1 with an accurate estimate of λ, then he could precisely predict survival probabilities for any path length.⁵

1 Later on, we will use the term "decision from sampling" rather than the more general term "decision from experience" to refer to this research area and its tasks, to avoid confusion with other decision tasks that have experience-based components.

2 One striking result in the decision from sampling literature is that participants tend to take very small samples even when they are free to sample as much as they wish (Hertwig et al., 2004).

3 If the observer takes discrete steps of a fixed size, his length of survival follows a geometric distribution. As the steps become smaller and smaller, the limiting distribution is the exponential, which we use in the following analysis.

4 On a few "catch trials" participants were asked to choose between a shorter path with a larger reward and a longer path with a smaller reward. The catch trials were used to screen out participants as described in the Method section.



We test whether the individual acquires and makes use of the exponential survival function of Eq. 1 with correct hazard rate, allowing correct translation of information from the training to the test phase. The internal model, analogous to E. C. Tolman's cognitive map (Tolman, 1948), would allow people to choose among novel lotteries that they have never had direct experience with.

We do not assume that the internal model would come exclusively from the individual's experience in training: In the training phase, we sought to give participants every opportunity to infer the correct model, including instructing them that the mines were randomly distributed and that the probability of surviving a specific path depended only on the path length. They also received feedback that included a display of the full minefield at the end of each training trial. And of course they experienced success or failure in each training trial. Any of this information could have contributed to their learning the correct model.

The task and environment we considered are of interest in themselves. They are an example of a process of constant hazard: no matter how long an organism has survived, on its next step it has the same chance of encountering a mine as it did on the first. For a process to be a process of constant hazard simply means the environment is memoryless: the number of successes in past attempts has no influence on the probability of success in the next attempt, given that one has survived until now.

Of course, repeated choices in many environments are properly modeled with changing hazard functions: repeated successes in the past may increase the probability that the next step will lead to success—or decrease it. Yet—in many environments—repeated choices can be approximated as processes of constant hazard on time or distance or trial. All that is needed is a plausible basis for assuming that each successive step in time or space incurs the same probability of failure: unprotected sex, clicking on an e-mail from an unknown sender, darting across a busy street at lunchtime, moving across a meadow exposed to predators—or crossing a minefield.

People have been found to be surprisingly sensitive to the forms of probability distributions (exponential, Gaussian, Poisson, etc.) they encounter in everyday life: They can accurately estimate the distributions of the social attitudes and behaviors of their group (Nisbett & Kunda, 1985); they can also use appropriate probability distributions for inference or prediction (Griffiths & Tenenbaum, 2006, 2011; Lewandowsky, Griffiths, & Kalish, 2009; Vul, Goodman, Griffiths, & Tenenbaum, 2009). We investigate whether they adopt the exponential survival function (constant hazard) when it is appropriate.

5 The correct generalizations are somewhat nonintuitive. If the probability of surviving a path of length L is p, then the probability of surviving a path of length 2L is p².

Figure 1. Example of the task. Top: Participants in a virtual minefield chose between a shorter path leading to a smaller reward and a longer path leading to a larger reward. The probability of failure with each successive step—the hazard rate—is constant. Each treasure chest denoted $1. Middle and bottom: Possible feedback on the chosen path (success and failure). Feedback was present in the training phase and absent in the test phase. See Method section.



Our focus on model-based transfer also distinguishes the path lottery task from sampling-based decision tasks that have a similar risk structure, such as the Balloon Analogue Risk Task (BART; Lejuez et al., 2002; Pleskac, 2008; Wallsten, Pleskac, & Lejuez, 2005). In the minefield, every tiny step along the path incurred a fixed probability of triggering a mine, and the participant needed to make use of this constraint to infer the probability of survival for novel path lengths in the test phase. In the BART, participants repeatedly pumped a breakable balloon to accumulate reward. They received an amount of reward for each "pump" but would lose all reward if the balloon broke. The decision at any moment was whether to risk one more "pump" for additional reward. Participants were never required to generalize their knowledge of the probability of failure incurred by one "pump" to that of an arbitrary number of "pumps." Furthermore, in the test phase of the minefield task there was no feedback, precluding any trial-by-trial learning strategies and making the task a rigorous test of model-based transfer.

We analyzed human choices in the path lottery task to answer two questions. First, were people's choices based on the correct exponential model of the probability of survival? If not, what model did they use? Second, did people correctly estimate the parameters of whatever model they assumed? That is, if they correctly adopted an exponential survival function, was their estimation of the hazard rate (λ in Eq. 1) accurate? Or, if they assumed a model of a different form, were the parameters of that model appropriately estimated?

Modeling

Modeling Stochastic Choice

Human choices are stochastic (Rieskamp, 2008): People do not necessarily make the same choice when confronted with the same options. To predict the choice behavior in individual trials, we constructed choice models that are stochastic.

Denote the pair of path lotteries on a specific trial as L1: (v, x) and L2: (w, y), where v, w denote rewards and x, y denote path lengths. In each choice model, we computed Pr(L2), the participant's probability of choosing L2, and modeled the participant's choice on the trial as a Bernoulli random variable with mean Pr(L2) (Erev, Roth, Slonim, & Barron, 2002). For each participant, we fitted the free parameters in each choice model to the participant's choices using maximum likelihood estimation (Edwards, 1972; see Supplemental Materials).

Except for the resampling choice models (that we present later), Pr(L2) is determined by the normalized expected utilities of L1 and L2 and a free parameter that reflects the randomness in the participant's choices (Busemeyer & Townsend, 1993; Erev et al., 2002; Sutton & Barto, 1998; see Supplemental Materials).

Utility

The utility of a specific monetary reward was modeled as a power function with parameter α > 0 (Luce, 2000):

u(v) = v^α                                    (2)

Pleskac (2008) has pointed out that the utility function and probability weighting function in the task of Wallsten et al. (2005) were not simultaneously identifiable. We also found that α and the free parameters in some choice models could not be simultaneously estimated (see Supplemental Appendix C online for proof). To overcome this issue, we treated α in all the choice models as a constant for each fit (i.e., not a free parameter in the actual fit) and estimated each choice model under a range of different α's that are typical in the decision-making literature (Gonzalez & Wu, 1999; Tversky & Kahneman, 1992). In this way, we could verify that any conclusions we drew were valid across the range of utility functions typically encountered.
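To make the preceding two subsections concrete, here is a minimal sketch of a stochastic choice rule of the general kind described above. The logistic form, the normalization, and the names lam, alpha, and tau are illustrative assumptions on our part; the exact rule used in the paper is specified in its Supplemental Materials.

```python
import numpy as np

def expected_utility(reward, length, lam, alpha):
    """EU of a path lottery (reward, length) under an exponential survival model."""
    return reward ** alpha * np.exp(-lam * length)       # u(v) = v**alpha (Eq. 2)

def prob_choose_L2(L1, L2, lam, alpha, tau):
    """Illustrative softmax-style rule: Pr(L2) from normalized expected utilities.

    alpha is treated as a fixed constant (not a fitted parameter), as in the
    paper; tau > 0 reflects the randomness of the participant's choices.
    """
    eu1 = expected_utility(*L1, lam=lam, alpha=alpha)
    eu2 = expected_utility(*L2, lam=lam, alpha=alpha)
    z = (eu2 - eu1) / (eu1 + eu2)                        # normalized EU difference
    return 1.0 / (1.0 + np.exp(-z / tau))

# The choice on a single trial is then a Bernoulli draw with this probability.
rng = np.random.default_rng(1)
p = prob_choose_L2(L1=(1.0, 4.3), L2=(2.0, 9.2), lam=0.08, alpha=1.0, tau=0.1)
chose_L2 = rng.random() < p
```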

Equivalent Length

We introduce an intuitive measure of participants' preference in the path lottery task, the equivalent length, and then show how different survival functions (increasing, constant, or decreasing) would lead to different predictions for equivalent length.

Suppose participants choose between a shorter path of length x toward a smaller reward v and a longer path of length y toward a larger reward w. For any specific (v, x) and w, we estimate the equivalent length y where the participant is indifferent between the two options, that is, (v, x) ~ (w, y), experimentally. We assume an equivalence of expected utilities:



u(v)P(x) = u(w)P(y)                                    (3)

where u(·) is the utility function and P(·) is the probability of success. Let ρ = u(w)/u(v) (ρ > 1 as w > v). Equation 3 can then be written as:

P(x) = ρP(y)                                    (4)

When the survival function is specified (e.g., Eq. 1), for any specific ρ, we can compute the equivalent length y, based on Eq. 4, as a function of x. The functional form of y against x is determined by the survival function the participant assumed. Conversely, we can use the measured relationship of y to x to rule out some possible choice models.
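As a concrete (added) illustration of how Eq. 4 defines the equivalent length for any assumed survival function, the sketch below solves P(x) = ρP(y) numerically; the function names and the use of SciPy's root finder are our choices, not part of the original analysis.

```python
import numpy as np
from scipy.optimize import brentq

def equivalent_length(P, x, rho, y_max=1e3):
    """Solve P(x) = rho * P(y) for y >= x (Eq. 4) by root finding.

    P must be a decreasing survival function, so the root is unique.
    """
    return brentq(lambda y: rho * P(y) - P(x), x, y_max)

# Example with the correct exponential survival function (Eq. 1):
lam = 0.08
P_exp = lambda x: np.exp(-lam * x)
for x in (2.0, 6.0, 10.0):
    y = equivalent_length(P_exp, x, rho=2.0)
    # y - x is constant (= ln(rho)/lam), i.e., a line of slope 1 (see Eq. 5 below)
    print(f"x = {x:4.1f} -> y = {y:5.2f}  (y - x = {y - x:.2f})")
```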

We first describe three choice models, which differ in their assumptions concerning the survival function. Their assumptions and predictions are illustrated in Figure 2. We will later consider additional choice models (notably models based on resampling) that are motivated by the observed choice patterns. These will be introduced in the Results section.

Exponential Choice Model

When the hazard rate is constant, the probability of success (survival function) is an exponential function of the path length, as indicated by Eq. 1. The larger the hazard rate λ, the riskier the minefield, and the smaller the probability of success. Substituting Eq. 1 into Eq. 4 and taking logarithms, we have (see Supplemental Appendix A for derivation):

y = x + (ln ρ)/λ                                    (5)

That is, for any specific choice of ρ, if participants (correctly) assumed the exponential model, their measured y would be a linear function of x whose slope equals 1 (Figure 2, left bottom). Intuitively, if you are indifferent between traveling 1 cm for $1 and 3 cm for $2, you should be indifferent between traveling 11 cm for $1 and 13 cm for $2.
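Since Supplemental Appendix A is not reproduced here, the following short derivation (our reconstruction, using the symbols introduced above) shows how Eq. 5 follows from Eqs. 1 and 4.

```latex
% Derivation of Eq. 5 (exponential survival, constant hazard).
\begin{align*}
e^{-\lambda x} &= \rho\, e^{-\lambda y} && \text{(Eq.~1 substituted into Eq.~4)}\\
-\lambda x &= \ln\rho - \lambda y       && \text{(take logarithms)}\\
y &= x + \frac{\ln\rho}{\lambda}        && \text{(Eq.~5: a line of slope 1)}
\end{align*}
```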

Weibull Choice Model

The Weibull survival function (Weibull, 1951) is a generalization of the exponential that includes smoothly increasing and decreasing hazard functions⁶:

P(x) = e^{-(λx)^γ}                                    (6)

6 If the participant correctly assumes an exponential survival function but has a probability distortion in the one-parameter Prelec form (Prelec, 1998), his survival function is of the form of a Weibull function. The survival function is probability versus distance and any distortion of the survival function can be treated as a form of probability distortion.

Figure 2. Illustration of three classes of survival models. Top: Survival functions (probability of success as a function of path length) assumed by the models. Black (thickest) line denotes an exponential model. Red (second thickest) line denotes a Weibull model (γ > 1). Blue (least thick) line denotes a hyperbolic model. Bottom: Predicted equivalent length as a function of the shorter length in the lottery pair. The left, central, and right panels denote the predictions of the exponential, Weibull, and hyperbolic models. Lighter and darker colors in each panel correspond to a higher and a lower value of the rate parameter. See Eqs. 1–9. The unit of length is arbitrary.


When γ = 1, it coincides with the exponential. The corresponding Weibull hazard function is

h(x) = γλ(λx)^{γ-1},                                    (7)

which is an increasing power function when γ > 1, a decreasing power function when 0 < γ < 1, and constant when γ = 1.

The Weibull equivalent length (see Supplemental Appendix A for derivation) is:

y = (x^γ + (ln ρ)/λ^γ)^{1/γ}                                    (8)

For any specific ρ, it is a curve that converges to the identity line when x approaches infinity (Figure 2, central bottom). The curve is convex when γ > 1 and concave when 0 < γ < 1.
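For completeness (again our reconstruction, in place of Supplemental Appendix A), Eq. 8 follows from Eqs. 4 and 6 as:

```latex
% Derivation of Eq. 8 (Weibull survival).
\begin{align*}
e^{-(\lambda x)^{\gamma}} &= \rho\, e^{-(\lambda y)^{\gamma}}
  && \text{(Eq.~6 substituted into Eq.~4)}\\
(\lambda y)^{\gamma} &= (\lambda x)^{\gamma} + \ln\rho\\
y &= \left(x^{\gamma} + \frac{\ln\rho}{\lambda^{\gamma}}\right)^{1/\gamma}
  && \text{(Eq.~8)}
\end{align*}
```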

Hyperbolic Choice Model

There is an evident analogy between our task of trading off distance for reward and temporal discounting (Frederick, Loewenstein, & O'Donoghue, 2002). The counterpart of constant hazard rate in temporal discounting is constant discount rate. Human decisions on delayed rewards, however, exhibit a declining discount rate (e.g., Thaler, 1981), which is well fit by a hyperbolic model (see Frederick et al., 2002, for a review; Kirby & Marakovic, 1995; Myerson & Green, 1995; Raineri & Rachlin, 1993).

Analogously, participants may assume a survival function that declines with distance following a hyperbolic function:

P(x) = 1/(1 + βx)                                    (9)

where β > 0 is a discounting-rate parameter, analogous to the hazard rate λ in the exponential model. The larger the β, the smaller the probability of success. Substituting Eq. 9 into Eq. 4, we get a linear relationship (see Supplemental Appendix A for derivation):

y = ρx + (ρ - 1)/β                                    (10)

Recall that ρ = u(w)/u(v) > 1. That is, the hyperbolic model predicts a line of a slope greater than one (Figure 2, right bottom). If you are indifferent between traveling 1 cm for $1 and traveling 3 cm for $2, you would prefer traveling 13 cm for $2 to traveling 11 cm for $1.
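The corresponding algebra for the hyperbolic model (our reconstruction of the Appendix A derivation) is:

```latex
% Derivation of Eq. 10 (hyperbolic survival).
\begin{align*}
\frac{1}{1+\beta x} &= \frac{\rho}{1+\beta y}
  && \text{(Eq.~9 substituted into Eq.~4)}\\
1+\beta y &= \rho\,(1+\beta x)\\
y &= \rho x + \frac{\rho-1}{\beta}
  && \text{(Eq.~10: a line of slope } \rho>1\text{)}
\end{align*}
```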

Method

Apparatus and Stimuli

Stimuli were presented on a 32-in. (69.8 × 39.2 cm) Elo touch screen, using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Figure 1 shows an example of stimuli. Two paths branched toward the upper left and right of the screen. Each treasure chest at the end of a path represented a reward of US$1 for completing the path. On each trial, 369 invisible circular mines were independently and uniformly distributed at random. For every 1 cm traveled along a path, there was an approximately 8% probability of hitting a mine.
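The link between uniformly scattered mines and an (approximately) constant hazard can be checked with a small Monte Carlo simulation. The sketch below is ours; the field dimensions are taken from the screen size reported above, while the mine radius is an assumption chosen to give roughly an 8%-per-cm hazard (the paper does not report it).

```python
import numpy as np

FIELD_W, FIELD_H = 69.8, 39.2     # cm, screen dimensions reported above
N_MINES = 369
MINE_R = 0.30                      # cm, assumed mine radius (not given in the paper)

def survives(path_len, rng):
    """One simulated trial: a straight vertical path from the bottom centre of the field."""
    mines = rng.uniform([0.0, 0.0], [FIELD_W, FIELD_H], size=(N_MINES, 2))
    dx = np.abs(mines[:, 0] - FIELD_W / 2)            # horizontal distance to the path
    dy = np.clip(mines[:, 1] - path_len, 0.0, None)   # vertical distance beyond the path end
    return np.all(np.hypot(dx, dy) > MINE_R)          # no mine within one radius of the path

rng = np.random.default_rng(0)
for L in (4.3, 8.4, 14.5):
    p_hat = np.mean([survives(L, rng) for _ in range(4000)])
    print(f"length {L:5.1f} cm: simulated P = {p_hat:.2f}, exp(-0.08 x) = {np.exp(-0.08 * L):.2f}")
```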

Procedure and Design

The experiment consisted of two phases: training and test. It lasted approximately 75 min.

Training. The purpose of the training phase was to provide participants an opportunity to familiarize themselves with the task and estimate the hazard implicit in moving through the minefield. On each trial, participants chose between two paths of different rewards and different lengths: (v, x) and (w, y). They were informed that mines were randomly distributed and the probability of surviving a specific path depended only on the path length.

After choosing, the participant was shown an animation of her travel along the chosen path. If she hit a mine, the animation stopped and a graphical explosion marked the location of the mine (Figure 1, bottom). Otherwise the animation continued to the end and the treasure box(es) flashed. Thereafter all the mines were briefly displayed as an aid to the participant in learning the hazard of traveling in the minefield and the correct form of the hazard function.

During training, the rewards (v vs. w) were $1 versus $2. On each trial, P(x) was randomly chosen to be between 0.3 and 0.7, and P(y) was either 3/8 of P(x) or 2/3 of P(x) with equal likelihood. Accordingly, x was between 4.3 and 14.5 cm and y was 11.8 or 4.9 cm longer than x, respectively.
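Under the exponential survival function, the length ranges just described follow directly from the target probabilities. The back-calculated hazard rate below (about 0.083 per cm) is our inference from the reported numbers, not a value stated in this section.

```python
import numpy as np

lam = 0.083                                   # per cm, back-calculated (assumption)
length_for = lambda p: -np.log(p) / lam       # invert Eq. 1: x such that P(x) = p

print(round(length_for(0.7), 1), round(length_for(0.3), 1))   # ~4.3 and ~14.5 cm (range of x)
# Because y - x = -ln(P(y)/P(x)) / lam, the extra length depends only on the ratio:
print(round(length_for(2 / 3), 1))   # ~4.9 cm when P(y) = (2/3) P(x)
print(round(length_for(3 / 8), 1))   # ~11.8 cm when P(y) = (3/8) P(x)
```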


There were 160 training trials (including 20 screening trials as described below). Participants were not told that there would be a subsequent test phase with no feedback.

Test. In the test phase, the task was the same except that feedback was withdrawn and the values of the rewards changed. There were two reward conditions (v vs. w): $1 versus $3, and $2 versus $3. The x value could be 11.1, 8.4, 6.2, 4.3, or 2.7 cm, so that P(x) equaled 0.4, 0.5, 0.6, 0.7, or 0.8. For each of these 10 conditions, y was adjusted by a 1-up/1-down adaptive staircase procedure with multiplicative steps, and we computed the equivalent length y as the geometric mean of the staircase values after the first two reversals. All staircases were interleaved and terminated after 60 trials, resulting in 10 × 60 = 600 test trials.
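A minimal sketch of such a 1-up/1-down staircase with multiplicative steps is given below. The step factor, starting value, and simulated observer are illustrative assumptions; only the general procedure (and the geometric mean after the first two reversals) follows the description above.

```python
import numpy as np

def run_staircase(choose_longer, y_start, step=1.1, n_trials=60):
    """choose_longer(y): True if the simulated participant picks the longer path."""
    y, ys, dirs = y_start, [], []
    for _ in range(n_trials):
        ys.append(y)
        if choose_longer(y):
            y *= step            # longer path still preferred: lengthen it
            dirs.append(+1)
        else:
            y /= step            # shorter path preferred: shorten the longer path
            dirs.append(-1)
    reversals = [i for i in range(1, n_trials) if dirs[i] != dirs[i - 1]]
    tail = ys[reversals[1]:] if len(reversals) >= 2 else ys
    return float(np.exp(np.mean(np.log(tail))))   # geometric mean after the 2nd reversal

# Example: a noiseless observer whose true equivalent length is 12 cm.
print(run_staircase(lambda y: y < 12.0, y_start=6.0))   # converges near 12
```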

It is known that people accurately estimate length for lines ranging from 1 cm to 1 m (Teghtsoonian, 1965). The path lengths we chose to use were well within this range, so there would be little chance that biases would arise from a misperception of path lengths.

Reward. After each phase, a few trials (one out of training, six out of test) were chosen at random and the participant was rewarded for the outcome of traveling along the chosen path. The sum of winnings on these reward trials was paid to the participant as a bonus. The participant knew, at the beginning of training, that some of the trials would be rewarded in this way. The participant received US$12 per hour in addition to the bonus.

Screening. In 20 trials of the training phase, the shorter path led to the larger reward and the longer path to the smaller reward. A participant who selected the higher-risk path to the smaller reward violated dominance. Any participant who failed to choose the dominant option in more than 4 of the 20 trials was excluded from the remainder of the experiment.

Participants

Data from 17 participants—4 men and 13 women, aged 18 to 50, median = 21—were analyzed. An additional two participants failed the dominance screening and did not complete the experiment. We determined the sample size in advance. No variables or conditions were dropped. The experiment was approved by the University Committee on Activities Involving Human Subjects at New York University.

Results

Choices in the Training

In the training phase of the experiment, participants chose between pairs of path lotteries L1: (v, x) and L2: (w, y). We use L2 to denote the lottery that had the larger reward and the longer path length. The larger reward was always twice the smaller reward ($2 vs. $1). The probability P(y) of surviving the longer path was 2/3 of the probability P(x) of surviving the shorter path for 70 trials and 3/8 of the probability P(x) of surviving the shorter path for an additional 70 trials, all trials interleaved. Figure 3A shows how the participants' mean probability of choosing L2 evolved with experience. In both conditions, the participants showed a trend toward picking the less risky option L1. This trend, however, was not significant, according to a 2 × 5 repeated-measures ANOVA, F(4, 144) = 1.57, p = .19.⁷

We did not model the choices in training. What we can infer from Figure 3A, without any further assumptions on the underlying decision process, is that there was no significant indication that the participants were becoming more and more risk-seeking during their training. We will return to this point in the Discussion.

Measured Equivalent Lengths in the Test

Before testing participants' choice behavior against the choice models we introduced earlier, we would like to give an overview of their choice patterns. In the test phase, we measured the equivalent length y, the length for which the participant was indifferent between the two options, that is, (v, x) ~ (w, y). The mean equivalent length across participants, plotted as dots in Figure 3B, visually agrees with the pattern predicted by a survival model of the correct exponential form (Eq. 4): y is a linear function of x whose slope is one.

The lines in Figure 3B show the predictions of the correct model, an exponential model whose hazard rate was estimated from the random samples each participant actually saw using the maximum likelihood method. The measured equivalent lengths were smaller than those predicted by the correct model. Participants appeared to be highly risk-averse: When choosing between a shorter path leading to $1 and a longer path leading to $3, at the indifference point, the expected gain of the shorter path was only 49% of that of the longer path. Participants sacrificed over half of their potential winnings to avoid the longer, riskier path. According to Eq. 4, a smaller equivalent length implies an overestimation of the hazard rate.

7 The degrees of freedom correspond to 2 P(y)/P(x) conditions by 5, the number of bins of 14 trials each.



Model Comparison for Participants' Choices in the Test

An examination of Figure 3B by eye, as we described previously, suggests that an average participant correctly based his choices on an exponential internal model but overestimated the hazard rate. We wanted to see whether these observations were supported by formal model comparison procedures and whether they held at the individual level.

For each participant, we fit the exponential, Weibull, and hyperbolic models to the participant's 600 choices in the test using the maximum likelihood method (see Online Supplemental Materials). We used the Bayesian information criterion (BIC; Schwarz, 1978) to choose the model that best accounted for the participant's choice behaviors, that is, the model which maximized:

BIC = ℓ - (1/2) k log N                                    (11)

where ℓ denotes the log likelihood of the model fit, k denotes the number of free parameters in the model, and N denotes the number of data points fitted.
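A small illustration of how Eq. 11 is applied (the log-likelihood values below are invented for the example and are not the paper's):

```python
import numpy as np

def bic(log_lik, k, n):
    """Eq. 11: penalized log likelihood; the model with the highest value wins."""
    return log_lik - 0.5 * k * np.log(n)

n_trials = 600
candidates = {                 # model: (maximized log likelihood, free parameters)
    "exponential": (-398.2, 2),     # invented numbers, for illustration only
    "weibull":     (-397.9, 3),
    "hyperbolic":  (-411.5, 2),
}
scores = {m: bic(ll, k, n_trials) for m, (ll, k) in candidates.items()}
print(max(scores, key=scores.get))   # the extra Weibull parameter must earn its penalty
```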

Form of the survival function. Table 1 shows the number of participants best fit by each choice model under a few representative utility functions: u(v) = v, u(v) = v^0.88 (Tversky & Kahneman, 1992), and u(v) = v^0.49 (Gonzalez & Wu, 1999). The distribution of best fits for each model barely changed with the assumed utility function. Among the three choice models, most participants were best fit by an exponential model (constant hazard rate); the runner-up was the Weibull model; few participants were best fit by the hyperbolic model.

Figure 3. Summary of participants' choices in training and test. (A) Training: How the probability of choosing the lottery of the longer path (L2) and the larger reward evolved with trials. Probabilities were computed for bins of 14 trials and averaged across participants. The rewards for the shorter and longer paths were $1 versus $2. Blue and green correspond to long-to-short probability ratios of 2/3 and 3/8. Error bars denote 1 SE. (B) Test: Measured equivalent length, y, as a function of the length of the shorter path, x. The measured equivalent lengths are plotted against those predicted by a correct exponential internal model. Dots denote data. Error bars denote 1 SE. The correct model varied with the random samples each participant actually observed in the training. Lines denote the mean of the model predictions. Shadows denote the range of the model predictions. Red for $1 versus $3, black for $2 versus $3.



The Weibull model is a generalized form of the exponential model, which allows the hazard rate to be increasing (γ > 1), constant (γ = 1), or decreasing (γ < 1) over distance. We used the estimated γ in the Weibull model to provide additional information about the hazard-changing trend assumed by the participant. Figure 4A shows γ for each participant, assuming u(v) = v: The γ of most participants was close to one (median = 1.16), implying an assumption of constant hazard. Across participants there was no trend away from constant hazard.

Estimation of hazard. In the training phase, participants observed whether and where their chosen paths hit mines. Based on the random samples that each participant actually saw, we estimated the hazard rate λ0 that most likely generated these samples using the maximum likelihood method. It ranged from 0.069 to 0.091 (median = 0.078); if the participant's choices were consistent with this hazard rate, we would judge that she correctly estimated hazard during training.
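One standard way to obtain this maximum-likelihood estimate, assuming each training trial is treated as a right-censored exponential observation, is sketched below. This is our reconstruction; the paper states only that maximum likelihood was used.

```python
import numpy as np

def hazard_mle(distances, exploded):
    """MLE of the hazard rate from training outcomes.

    A trial that hit a mine after d cm contributes lam * exp(-lam * d) to the
    likelihood; a completed path of length x contributes exp(-lam * x).
    Maximizing the log likelihood gives failures / total distance observed.
    """
    return np.sum(exploded) / np.sum(distances)

d = np.array([3.1, 9.2, 14.5, 6.0, 11.8])        # cm traveled on each trial (toy data)
hit = np.array([True, False, True, False, True])  # whether a mine was hit
print(hazard_mle(d, hit))                         # ~0.067 per cm
```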

Figure 4B shows the estimated hazard rate λ when the participant's choices were fitted to the exponential model, assuming u(v) = v, relative to her true hazard rate λ0. Most participants overestimated the hazard rate. The median ratio was 3.53, 3.13, and 2.41, respectively, for u(v) = v, u(v) = v^0.88, and u(v) = v^0.49.

Learning-Based Choice Models

What would lead participants to overestimate their hazard rate? We constructed a delta-rule learning algorithm to estimate the survival function based on experience in the training phase (Online Supplemental Materials). The algorithm could potentially result in the observed overestimation of hazard rate. The basic idea was that positive and negative outcomes could be weighted in an unbalanced way. In particular, the overestimation of hazard rate was implemented as an overweighting of negative outcomes (running into a mine) compared with positive outcomes (survival).
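A minimal sketch of one possible asymmetric delta-rule estimator is shown below. The bin width, learning rate, and weighting factor are our illustrative choices; the algorithm actually used is specified in the Supplemental Materials.

```python
import numpy as np

def learn_survival(trials, bin_cm=1.0, lr=0.05, neg_weight=2.0, max_cm=30.0):
    """Asymmetric delta-rule estimate of the survival function (illustrative).

    Each outcome nudges the survival estimate for the chosen length's bin toward
    1 (success) or 0 (failure); failures are weighted neg_weight times more,
    which drives the estimate below the true survival probability, i.e., an
    overestimated hazard.
    """
    n_bins = int(max_cm / bin_cm)
    p_hat = np.ones(n_bins)                          # piecewise-constant estimate
    for length, survived in trials:
        b = min(int(length / bin_cm), n_bins - 1)
        target, w = (1.0, lr) if survived else (0.0, lr * neg_weight)
        p_hat[b] += w * (target - p_hat[b])
    return p_hat

rng = np.random.default_rng(3)
lengths = rng.uniform(4.3, 26.3, size=160)            # illustrative training lengths
outcomes = rng.random(160) < np.exp(-0.08 * lengths)  # true hazard 0.08/cm (assumed)
p_hat = learn_survival(zip(lengths, outcomes))
```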

The survival function produced by the learning algorithm is a step function. We call the choice model based on it the learning-based nonparametric model. We constructed a second learning-based model, the learning-based exponential model, whose survival function is simply a smoothing of the survival function of the nonparametric model. Remarkably, most participants were better fit by the learning-based exponential model than by the learning-based nonparametric model (Table 2), providing further evidence that participants assumed survival models of the correct exponential form.

Survival-Based Versus Resampling Choice Models

Instead of building a one-to-one mapping between path length and probability of survival, as implemented in the survival-based choice models discussed earlier, participants might memorize all the path lotteries they attempted during training together with their outcomes and use these instances to estimate the probability of survival for each test path. However, participants could not simply recall paths of the same length as the test path because there was no exact match from the training set—the path lengths in the training set were randomly chosen from a continuum. The issue then is, how can the participant generalize the information gained in the training phase? This information can be thought of as ordered triples (x, x′, O) where x is the path length on a trial, x′ is the path length traveled until a mine was encountered on a failure trial or the total length of the path on a reward trial, and O represents the observed outcome (success or failure).

Table 1
Model Comparison: Constant Versus Changing Hazard Rate

                      Assuming u(v) = v        Assuming u(v) = v^0.88    Assuming u(v) = v^0.49
Model          df   No. of best fits  Mean BIC   No. of best fits  Mean BIC   No. of best fits  Mean BIC
Exponential     2          12         -404.7            11         -404.7            11         -405.2
Weibull         3           4         -404.9             4         -404.9             4         -405.4
Hyperbolic      2           1         -418.3             2         -417.9             2         -414.3

Note. BIC = Bayesian information criterion. df refers to the number of free parameters in the model. No. of best fits refers to the number of participants that were best fit by the model.



We explored simple rules based on resampling from the information gathered during training, stored in memory. We constructed three resampling models, all of which assume that participants randomly (re)sample from their training experience. The models differ in the memory populations to resample from and the generalization rule applied to the resampled paths.⁸

The first resampling model assumed that participants resample only from paths that are longer than the test path. Each resampled path that ran into a mine before reaching the length of the test path corresponded to a possible failure for the test path (the test path would have encountered the same mine as the resampled path if placed on the same minefield). The probability of survival for the test path was computed as the proportion of survivals across all the resampled paths. We call this model the unbiased resampling model: the estimated probability of survival for any test path would not deviate from the true probability of survival in any systematic way.

To accommodate possible biases in participants' choices, we considered two additional resampling models that involve biased (but plausible) usage of the resampled paths. Both of them assumed that resamples could be drawn from any path the participant had experienced in training. When the resampled path length was longer than the test path, the rule was the same as in the unbiased resampling model. When the resampled path was shorter than the test path, the optimistic resampling model assumed that the resampled path counted as a success if the resampled path itself ended in success; that is, success on the shorter, resampled path was treated as a guarantee of success on the longer path. Conversely, the pessimistic resampling model treated the same case as a failure. The probability of survival is typically overestimated in the optimistic resampling model and underestimated in the pessimistic model.
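The three rules can be summarized in a few lines of (our) code; the memory format and sample sizes below are illustrative, with the default n_samples = 1 echoing the fitted value reported below.

```python
import numpy as np

def resample_estimate(memory, x_test, rule, n_samples=1, rng=None):
    """Estimate P(survive x_test) from remembered (length, traveled, survived) triples."""
    rng = np.random.default_rng() if rng is None else rng
    if rule == "unbiased":
        pool = [m for m in memory if m[0] >= x_test]   # only paths longer than the test path
    else:
        pool = list(memory)                            # optimistic / pessimistic: any path
    hits = 0
    for _ in range(n_samples):
        length, traveled, survived = pool[rng.integers(len(pool))]
        if length >= x_test:
            hits += traveled >= x_test                 # cleared the test length before any mine
        elif rule == "optimistic":
            hits += survived                           # success on a shorter path counts as success
        # pessimistic: a shorter resampled path counts as a failure (adds nothing)
    return hits / n_samples

# memory entries: (chosen path length, distance traveled before stopping, survived?)
memory = [(12.0, 12.0, True), (15.0, 7.4, False), (9.0, 9.0, True), (14.0, 3.2, False)]
for rule in ("unbiased", "optimistic", "pessimistic"):
    print(rule, resample_estimate(memory, x_test=10.0, rule=rule, n_samples=4))
```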

8 In statistics, resampling is a method based on drawing random samples from data with replacement. It is the basis for bootstrapping methods (Efron & Tibshirani, 1993). We use the term "resampling," and so forth, to distinguish the reuse of the training-phase data from the actual gathering of information during the training phase, which is a form of sampling.

Figure 4. Survival models assumed in participants' choices in the test. Each bar is for one participant. Bar color codes whether the participant was best fit by the exponential, Weibull, or hyperbolic survival model: black for exponential, gray for Weibull, light gray for hyperbolic. The utility function was assumed to be u(v) = v. (A) Subjective hazard-changing index for each participant. When a participant's choices were fitted to the Weibull survival model (Eq. 6), the estimated parameter γ provided an index of whether the participant assumed an increasing (γ > 1), constant (γ = 1), or decreasing (γ < 1) hazard rate across distance. The γ of most participants was close to one (median = 1.16), echoing the results of model comparison that most participants were best accounted for by the exponential model (i.e., a Weibull model with γ = 1). (B) Subjective relative to true hazard rate for each participant. The subjective hazard rate λ was the estimated hazard-rate parameter when the participant's choices in the test were fitted to the exponential survival model (Eq. 1). The true hazard rate λ0 was what the participant experienced in training. Almost all participants overestimated the hazard rate to a large extent (median = 3.53).



Could participants' choice behaviors arise from a resampling process? We compared the fits of the survival-based choice models with the resampling models. As shown in Table 3, only 3 out of 17 participants could be better captured by a resampling model than by a survival-based model.

We noticed three facts. First, on average the resampling models fit participants' choice behaviors much worse than most of the model-based models did (mean BIC in Table 3). Second, the pessimistic resampling model fit better than the other two resampling models (number of best-fit participants). Recall that the pessimistic resampling model would yield an underestimation of the probability of survival, that is, an overestimation of hazard rate. Third, for all the participants who were best fit by a resampling model, the fitted sample size was 1, in contrast to a typical sample size of 5 for sampling-based decisions (Erev et al., 2010).

The BIC difference between the survival-based models and the resampling models was smaller for u(v) = v^0.49. At first glance it appears that the resampling models do "relatively less poorly" when the utility function is most concave. This is simply because larger rewards are associated with longer path lengths (lower probabilities), and failure to generalize correctly is not punished as much if larger rewards are scaled to be relatively less important.

Discussion

In the present article, we introduce decision from models as a class of decisions parallel to decision from description (Tversky & Kahneman, 1992) and decision from sampling (Barron & Erev, 2003). It has an experience-based component but is focused on generalization: participants need to develop a model allowing them to generalize probability information from their past experience to novel tasks. We investigated whether people could correctly learn an exponential survival model and base their choices in novel tasks on the correct model.

Table 2
Model Comparison: Learning-Based Models

                          Assuming u(v) = v        Assuming u(v) = v^0.88    Assuming u(v) = v^0.49
Model              df   No. of best fits  Mean BIC   No. of best fits  Mean BIC   No. of best fits  Mean BIC
L-B nonparametric   3           4         -411.5             5         -410.8             5         -409.1
L-B exponential     3          13         -407.9            12         -407.9            12         -408.4

Note. L-B abbreviates learning-based. df refers to the number of free parameters in the model. No. of best fits refers to the number of participants that were best fit by the model. BIC = Bayesian information criterion.

Table 3
Model Comparison: Survival-Based Versus Resampling Models

                        Assuming u(v) = v        Assuming u(v) = v^0.88    Assuming u(v) = v^0.49
Model            df   No. of best fits  Mean BIC   No. of best fits  Mean BIC   No. of best fits  Mean BIC
Survival-based                 14                          14                          14
  Exponential     2           10         -404.7             9         -404.7             9         -405.2
  Weibull         3            3         -404.9             3         -404.9             3         -405.4
  Hyperbolic      2            1         -418.3             2         -417.9             2         -414.3
Resampling                      3                           3                           3
  Unbiased        1            0         -434.7             0         -434.8             1         -427.9
  Optimistic      1            0         -479.7             0         -479.7             0         -479.5
  Pessimistic     1            3         -437.9             3         -437.4             2         -430.6

Note. BIC = Bayesian information criterion. df refers to the number of free parameters in the model. No. of best fits refers to the number of participants that were best fit by the model.



To maximize their expected reward, participants needed to take the information acquired during training and use it to generalize to novel conditions. If their memory for this information is distorted, then their choice of model and/or estimate of its hazard-rate parameter could be affected. But remembering this information accurately is not sufficient to guarantee that they pick the correct model or correctly use it to generalize.

We found that most participants made choices consistent with a survival model of the correct exponential form: The exponential model fit participants' choices better than the Weibull model and the hyperbolic model.

Prelec and Loewenstein (1991) and Green and Myerson (2004) suggested a number of parallels between decision under risk and intertemporal decision. The correct exponential functional form provided better fits to participants' data than the hyperbolic form found in the temporal discounting literature. If risk in our task were exchangeable with temporal delay, we would have expected hyperbolic discounting of distance (i.e., a hyperbolic survival function) in the minefield. However, hyperbolic internal models were favored by fewer than one eighth of the participants.

Evidence in favor of the correct assumption of the exponential form also came from a model comparison with a nonparametric model. We "piped" the participant's training experience through a specific learning algorithm and found that, if the resulting survival function was then smoothed to approximate an exponential function (no parameter added), it would provide a better fit for most participants than the survival function originally learned. That is, choice models that assumed internal models of the correct exponential form captured participants' choices better than those that assumed no specific functional form.

We further verified that participants' choices in the test could not be reduced to a resampling process that is analogous to decision from sampling but based on resampling from memory (Erev et al., 2010). The performance of most participants was better accounted for by choice models that rely on a survival function than by resampling models. Even for the few participants who may have depended on resampling, the sampling process was based on very small samples. The sample size was typically 1, even smaller than the number of samples (about 5) found in decision-from-sampling tasks (Erev et al., 2010). This result echoes Vul et al.'s (2009) claim that a sample size of one is efficient for inference from probability distributions.

But most participants did not base their choices on exactly the correct model: they overestimated the hazard rate of the exponential survival function. They did not go as far as they should have gone when offered a larger reward. According to participants' average response in the $1 versus $3 condition, the expected gain of the shorter path was only 49% of that of the longer path. We noticed a similar "risk-averse" outcome in the BART: participants pumped the balloon fewer times than would maximize expected gain (Lejuez et al., 2002). When the optimal number of pumps was 64, participants on average pumped only 37.6 times (Wallsten et al., 2005).

The BART and our path lottery task, put together, shed light on this overestimation of hazard. Wallsten et al. (2005), in their modeling of the BART, attributed it to an incorrect prior belief that would be corrected by experience. Our results cast doubt on this explanation. Since the hazard rates of both tasks were arbitrarily chosen by the experimenter, why should people have a prior hazard rate in both tasks that is higher than the true one? Moreover, if participants in the path lottery task did overestimate hazard before the task but were able to correct it through experience, we would expect an increasing trend in their probability of choosing the riskier option (L2) during training. This was not the case (Figure 3A). We conjecture that participants' overestimation of hazard reflects an improper estimation of probability information from their experience and not simply a lack of experience.

It is natural to expect people to correctly estimate both the correct functional form and the correct parameters (as in Griffiths & Tenenbaum, 2006) or neither. Mozer, Pashler, and Homaei (2008) pointed out the possibility that seemingly accurate human predictions for a specific distribution may be based on a few memorized samples of the distribution. A similar possibility can hardly be true for our decision task. If we assume that participants' memory of samples was faithful and that they combine sample information correctly, their decisions should not systematically deviate from the predictions of the true hazard rate. Conversely, if their memory of samples were distorted, there would be no reason to assume that the form of the distribution would be unaffected. And as we demonstrated earlier, resampling models did not provide good fits to participants' choices. Of course, we did not test all possible resampling models and could not exclude the possibility that a different generalization model based on resampling, such as the similarity-based sampling model of Lejarraga and Gonzalez (2011), might work better than the three sampling models we tested. However, we see no way to develop a sampling-based model that could account for the coexistence of the correct functional form and the overestimation of hazard.



Two basic approaches to learning a specific functional relationship are (1) to rely on a set of hard-wired base functions or (2) to use associative learning free of functional forms (McDaniel & Busemeyer, 2005). The dissociation of learning of the form and the parameter of the hazard function in our task, however, poses something of a challenge for associative learning, for a reason similar to the one we discussed above for the sampling approach. We conjecture instead that the exponential (or an approximation to the exponential) is one of the hard-wired functions in the cognitive "repertoire" of functions permitting extrapolation and interpolation. Selection of a particular function (exponential or otherwise) is then seen to be analogous to contour completion in human perception (Metzger, 2006). We have argued that processes of constant hazard are appropriate models of many repeated tasks in everyday life, consistent with the claim that the exponential is hard-wired.

We conjecture that participants were able to choose the correct form of model to use simply based on their understanding of the minefield and possibly a few trials of experience. There is evidence that in their use of probability information people take into account how stochastic events or processes are generated physically (C. S. Green, Benson, Kersten, & Schrater, 2010; Pleskac, 2008). However, people also stick to particular probabilistic models in spite of extensive exposure to contradictory evidence. For example, people incorrectly assume an isotropic model of their motor error distribution after 300 trials of exposure to the vertically elongated true distribution (Zhang, Daw, & Maloney, 2013). In the BART (Lejuez et al., 2002), the probability of breaking the balloon with the next pump increased with the number of pumps, but most participants were better modeled as assuming a constant hazard rate (Wallsten et al., 2005).

In sum, the previous findings allow us to claim decision from models as a new class of decisions that are distinct from decision from description and decision from sampling. People do make choices that generalize to novel tasks, using models that allow them to translate experience into accurate probability estimates associated with the novel tasks.

The major goal of the present study was to evaluate decision from models at the computational-theory level in Marr's scheme (Marr, 1982, p. 25). In addition, a process model is needed to account for the patterned choice behaviors we have observed, in particular, how people interpolate and estimate hazard rate as a function of path length and why they sometimes get it wrong.

There are many open questions in this new class of decision tasks. For example, our results do not enable us to ascribe participants' deviations from optimality (maximizing utility) to errors in learning (training phase) or to errors in choice (test phase). Further work is needed to elucidate the source of error: what is learned versus how it is used. Another problem we have not addressed is probability distortion. As we noted previously, any distortion of the survival function is equivalent to applying a probability distortion function to the correct, exponential survival function.
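
For the specific pattern we observed (correct exponential form with an overestimated hazard rate), this equivalence can be made explicit (our notation): if the decision maker acts on $\hat{S}(d)=e^{-\hat{\lambda} d}$ while the true survival function is $S(d)=e^{-\lambda d}$, then

```latex
\hat{S}(d) = e^{-\hat{\lambda} d}
           = \left(e^{-\lambda d}\right)^{\hat{\lambda}/\lambda}
           = w\!\left(S(d)\right),
\qquad w(p) = p^{\gamma}, \quad \gamma = \hat{\lambda}/\lambda,
```

so overestimating the hazard rate ($\hat{\lambda} > \lambda$, hence $\gamma > 1$) is equivalent to applying a power-function probability distortion $w$ to the correct survival probabilities.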

Last of all, turning a finite set of observations into a predictive mechanism for an infinite number of possible cases is the problem of induction. An exciting direction for future research would be to compare tasks with different correct generalization functions (such as minefields with nonuniform distributions of mines) to test human ability to generalize correctly.
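
As a sketch of what the correct generalization function would look like in such a nonuniform variant (our notation, with $h_k$ denoting the hazard at step $k$), survival over a path of length $d$ becomes

```latex
S(d) = \prod_{k=1}^{d} (1 - h_k),
```

which reduces to the exponential only when $h_k$ is constant; such tasks would let one ask whether people can learn and generalize survival functions outside the constant-hazard family.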


References

Barron, G., & Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16, 215–233. http://dx.doi.org/10.1002/bdm.443

Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436. http://dx.doi.org/10.1163/156856897X00357

Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459. http://dx.doi.org/10.1037/0033-295X.100.3.432

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69, 1204–1215. http://dx.doi.org/10.1016/j.neuron.2011.02.027

Edwards, A. W. F. (1972). Likelihood. New York, NY: Cambridge University Press.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman & Hall.

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., Hau, R., . . . Lebiere, C. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23, 15–47. http://dx.doi.org/10.1002/bdm.683

Erev, I., Roth, A., Slonim, R., & Barron, G. (2002). Combining a theoretical prediction with experimental evidence. Retrieved from http://papers.ssrn.com/abstract_id=1111712

Frederick, S., Loewenstein, G., & O'Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, 40, 351–401. http://dx.doi.org/10.1257/jel.40.2.351

Gonzalez, R., & Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38, 129–166. http://dx.doi.org/10.1006/cogp.1998.0710

Green, C. S., Benson, C., Kersten, D., & Schrater, P. (2010). Alterations in choice behavior by manipulations of world model. Proceedings of the National Academy of Sciences of the United States of America, 107, 16401–16406. http://dx.doi.org/10.1073/pnas.1001709107

Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769–792. http://dx.doi.org/10.1037/0033-2909.130.5.769

Griffiths, T. L., & Tenenbaum, J. B. (2006). Optimal predictions in everyday cognition. Psychological Science, 17, 767–773. http://dx.doi.org/10.1111/j.1467-9280.2006.01780.x

Hadar, L., & Fox, C. R. (2009). Information asymmetry in decision from description versus decision from experience. Judgment and Decision Making, 4, 317–325.

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15, 534–539. http://dx.doi.org/10.1111/j.0956-7976.2004.00715.x

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291. http://dx.doi.org/10.2307/1914185

Kirby, K. N., & Marakovic, N. N. (1995). Modeling myopic decisions: Evidence for hyperbolic delay-discounting within subjects and amounts. Organizational Behavior and Human Decision Processes, 64, 22–30. http://dx.doi.org/10.1006/obhd.1995.1086

Lejarraga, T., & Gonzalez, C. (2011). Effects of feedback and complexity on repeated decisions from description. Organizational Behavior and Human Decision Processes, 116, 286–295. http://dx.doi.org/10.1016/j.obhdp.2011.05.001

Lejuez, C. W., Read, J. P., Kahler, C. W., Richards, J. B., Ramsey, S. E., Stuart, G. L., . . . Brown, R. A. (2002). Evaluation of a behavioral measure of risk taking: The Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: Applied, 8, 75–84. http://dx.doi.org/10.1037/1076-898X.8.2.75

Lewandowsky, S., Griffiths, T. L., & Kalish, M. L. (2009). The wisdom of individuals: Exploring people's knowledge about everyday events using iterated learning. Cognitive Science, 33, 969–998. http://dx.doi.org/10.1111/j.1551-6709.2009.01045.x

Luce, R. D. (2000). Utility of gains and losses: Measurement-theoretical and experimental approaches. London, England: Erlbaum.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York, NY: Freeman.

McDaniel, M. A., & Busemeyer, J. R. (2005). The conceptual basis of function learning and extrapolation: Comparison of rule-based and associative-based models. Psychonomic Bulletin & Review, 12, 24–42. http://dx.doi.org/10.3758/BF03196347

Metzger, W. (2006). Laws of seeing. Cambridge, MA: MIT Press.

Mozer, M. C., Pashler, H., & Homaei, H. (2008). Optimal predictions in everyday cognition: The wisdom of individuals or crowds? Cognitive Science, 32, 1133–1147. http://dx.doi.org/10.1080/03640210802353016

Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. Journal of the Experimental Analysis of Behavior, 64, 263–276. http://dx.doi.org/10.1901/jeab.1995.64-263

Nisbett, R. E., & Kunda, Z. (1985). Perception of social distributions. Journal of Personality and Social Psychology, 48, 297–311. http://dx.doi.org/10.1037/0022-3514.48.2.297

Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. http://dx.doi.org/10.1163/156856897X00366

Pleskac, T. J. (2008). Decision making and learning while taking sequential risks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 167–185. http://dx.doi.org/10.1037/0278-7393.34.1.167

Prelec, D. (1998). The probability weighting function. Econometrica, 66, 497–527. http://dx.doi.org/10.2307/2998573

Prelec, D., & Loewenstein, G. (1991). Decision making over time and under uncertainty: A common approach. Management Science, 37, 770–786. http://dx.doi.org/10.1287/mnsc.37.7.770

Raineri, A., & Rachlin, H. (1993). The effect of temporal constraints on the value of money and other commodities. Journal of Behavioral Decision Making, 6, 77–94. http://dx.doi.org/10.1002/bdm.3960060202

Rakow, T., & Newell, B. R. (2010). Degrees of uncertainty: An overview and framework for future research on experience-based choice. Journal of Behavioral Decision Making, 23, 1–14. http://dx.doi.org/10.1002/bdm.681

Rieskamp, J. (2008). The probabilistic nature of preferential choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1446–1465. http://dx.doi.org/10.1037/a0013646

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464. http://dx.doi.org/10.1214/aos/1176344136

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Teghtsoonian, M. (1965). The judgment of size. The American Journal of Psychology, 78, 392–402. http://dx.doi.org/10.2307/1420573

Teodorescu, K., & Erev, I. (2014). On the decision to explore new alternatives: The coexistence of under- and over-exploration. Journal of Behavioral Decision Making, 27, 109–123. http://dx.doi.org/10.1002/bdm.1785

Thaler, R. (1981). Some empirical evidence on dynamic inconsistency. Economics Letters, 8, 201–207. http://dx.doi.org/10.1016/0165-1765(81)90067-7

Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55, 189–208. http://dx.doi.org/10.1037/h0061626

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323. http://dx.doi.org/10.1007/BF00122574

Ungemach, C., Chater, N., & Stewart, N. (2009). Are probabilities overweighted or underweighted when rare outcomes are experienced (rarely)? Psychological Science, 20, 473–479. http://dx.doi.org/10.1111/j.1467-9280.2009.02319.x

Vul, E., Goodman, N. D., Griffiths, T. L., & Tenenbaum, J. B. (2009, July). One and done? Optimal decisions from very few samples. Paper presented at the Proceedings of the 31st Annual Meeting of the Cognitive Science Society, Amsterdam, the Netherlands.

Wallsten, T. S., Pleskac, T. J., & Lejuez, C. (2005). Modeling behavior in a clinically diagnostic sequential risk-taking task. Psychological Review, 112, 862–880. http://dx.doi.org/10.1037/0033-295X.112.4.862

Weibull, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18, 293–297.

Zhang, H., Daw, N. D., & Maloney, L. T. (2013). Testing whether humans have an accurate model of their own motor uncertainty in a speeded reaching task. PLoS Computational Biology, 9, e1003080. http://dx.doi.org/10.1371/journal.pcbi.1003080

Received April 30, 2013
Revision received July 27, 2014
Accepted August 20, 2014
