8/12/2019 Introduction to Reward Processing
1/28

1 Introduction to Reward Processing
Krishna Prasad Miyapuram
Ph.D. Thesis Chapter, University of Cambridge, April 2

1.1 Functions of rewards

Reward seeking behaviour depends to a large extent on successfully extracting reward information
from a large variety of environmental stimuli and events. Learning to reliably predict the occurrence
of rewards such as food enables an organism to prepare behavioural reactions and improve the
choices that it makes in the future. Learning can be defined as a change in behaviour. Various
sensory cues from the environment such as sounds, sights and smells that are associated with a
reward guide the animal to return to the previously experienced reward (Wise, 2002). Thus, one of
the main functions of rewards is to induce learning, as subjects will come back for more when they
encounter a reward. Another function of rewards is to induce approach and consummatory
behaviour for acquiring the rewarding object. This is essential for decision making and goal-directed
behaviour, as the animal learns to decide the appropriate actions to be executed with
rewards as goals. The third function of rewards is to induce subjective feelings of pleasure and
hedonia (positive emotions). This common perception associates rewards primarily with happiness.
Thus rewards have very basic functions in the life of individuals and are necessary for survival and
reproduction (survival of genes) (Schultz, 2000, 2004, 2006).
1.1.1 Learning by conditioning
Reward-directed learning can occur by associating a stimulus with a reward (Pavlovian or classical
conditioning) or by associating an action with a reward (instrumental or operant conditioning).
These forms of learning fall under the category of associative learning. More than a century ago,
Thorndike (1898) argued that learning consists of the formation of connections between stimuli and
responses and that these connections are formed whenever a response is followed by a reward. This
kind of learning is called instrumental (or operant) conditioning as the delivery of the reward is
contingent on the response made by the animal. Pavlov (1929) delivered the reward to his subjects
independently of the animal's behaviour. Thus, learning in Pavlovian conditioning consisted of
pairing between a stimulus and a reward. In both kinds of learning an arbitrary, previously neutral
stimulus (Conditioned Stimulus, CS) acquires the function of a rewarding stimulus after being
repeatedly associated in time with a rewarding object (Unconditioned Stimulus, US).
The early definitions of conditioning have emphasised that the temporal contiguity of the CS
and the US is essential for learning. Modern views of conditioning, however, suggest that the
pairing or contiguity of two events is neither necessary nor sufficient for learning to occur (see
Rescorla, 1988 for review). Rather, conditioning depends on the information that the CS provides
Figure 1-1 Learning by classical conditioning
(a) Contiguity requirement. The US needs to follow the CS in a temporally contiguous manner. (b) If the US is delayed after the offset of the CS, the procedure is called trace conditioning. (c) Contingency requirement. For excitatory conditioning, the US should have a higher probability of occurring in the presence of the CS than in its absence. (d) If the CS predicts the omission of a US, this is called conditioned inhibition. (e) Prediction error. Unexpected delivery of reward gives a positive prediction error, while the omission of a predicted reward gives a negative prediction error. (f) Higher-order conditioning occurs when a second conditioned stimulus predicts the occurrence of the first CS.
about the US. More specifically, the US needs to occur more frequently in the presence of the CS as
compared with its absence. Further, a negative relation between a CS and US can be learned if the
occurrence of the CS predicts the omission of the US (conditioned inhibition, Rescorla, 1969). This
suggests that contingency of the US upon occurrence of the CS is crucial for Pavlovian conditioning
(Dickinson, 1980). When a US is fully predicted by a CS, then it does not contribute to any further
learning even if the contiguity and contingency requirements are fulfilled. This phenomenon is
illustrated by the blocking effect (Kamin, 1969), in which a previously formed association prevents
or blocks the formation of a new association. Kamin (1969) proposed that the surprise or error in
prediction of the US contributes to learning. Thus, three key factors govern learning by
conditioning: contiguity, contingency, and prediction error (Tobler, 2003; Schultz, 2006).
Box 1 Models of conditioning: Role of prediction error
Prediction error has been fundamental to many models of conditioning. Rescorla and Wagner
(1972) proposed that repeated pairing of a CS (stimulus A) and a US will result in a gradual
increase in the strength of association (V_A) between them. According to their model, the change in
associative strength is

ΔV_A = αβ(λ − V_T)

where the value of λ is set by the magnitude of the US and represents the maximum strength that
the CS-US association can achieve, and V_T represents the sum of the associative strengths of all stimuli
present on the trial. The term (λ − V_T) therefore represents the prediction error, which is nothing but
the discrepancy between the maximum associative strength and the current prediction. The two
learning-rate parameters α and β, with values between 0 and 1, are determined by the salience of the CS
(stimulus A) and the US respectively, and are fixed during conditioning. The Rescorla-Wagner (R-W)
model can explain the contingency requirement for conditioning by allowing the experimental
context to be associated with the US like any other CS. Hence if the probability p(US|CS) of the US
occurring in the presence of the CS is lower than the probability p(US|no CS) of the US occurring
in the absence of the CS, the associative strength for predicting the US would be greater for the
experimental context compared to that of the CS (conditioned inhibition). The blocking effect can
also be explained because the R-W model computes the prediction error from the total associative
strength V_T of all stimuli present on a given trial, so a fully predicted US does not generate any
prediction error and hence blocks any further learning by a second stimulus. Despite the limitations
of the R-W model in explaining phenomena such as latent inhibition (pre-exposure of a CS retards later
conditioning of that CS with a US), the prediction error principle remains central to a number of
contemporary models of conditioning (see Pearce and Bouton, 2001).
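The role of the summed error term in blocking can be illustrated with a short simulation. The sketch below follows the Rescorla-Wagner update ΔV_A = αβ(λ − V_T) as described in the text, but the parameter values, function names and trial counts are illustrative assumptions, not taken from the original papers.

```python
# Minimal Rescorla-Wagner sketch illustrating the blocking effect.
# lam is the US asymptote (λ); alpha and beta are the CS and US
# learning-rate parameters. All values are illustrative assumptions.

def rw_update(V, present, lam, alpha=0.3, beta=1.0):
    """One trial: ΔV = αβ(λ − V_T) for each CS present on the trial."""
    V_T = sum(V[cs] for cs in present)   # summed prediction of all CSs
    error = lam - V_T                    # prediction error (λ − V_T)
    for cs in present:
        V[cs] += alpha * beta * error
    return V

V = {"A": 0.0, "B": 0.0}

# Phase 1: condition A alone until it fully predicts the US (λ = 1).
for _ in range(50):
    rw_update(V, ["A"], lam=1.0)

# Phase 2: present the compound AB. A already predicts the US, so the
# prediction error is near zero and learning about B is blocked.
for _ in range(50):
    rw_update(V, ["A", "B"], lam=1.0)

print(round(V["A"], 2), round(V["B"], 2))  # B stays near zero
```

A fully predicted US generates no error, so stimulus B gains almost no associative strength, reproducing Kamin's blocking result in miniature.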
Attentional theories of conditioning have suggested that, in addition to the processing of the
US proposed by the Rescorla-Wagner model, the processing of the CS is integral to the process of
conditioning (Mackintosh, 1975; Pearce and Hall, 1980). According to Mackintosh (1975), stimuli
that generate the least absolute value of prediction error are good predictors of the US and attract
maximum attention. The change in associability of a stimulus A is positive if |λ − V_A| < |λ − V_X|
and is negative otherwise, where V_X is the sum of the associative strengths of all stimuli except A. The
change in associative strength is given by

ΔV_A = θα_A(λ − V_A)

where α_A is the associability of stimulus A and θ is a learning-rate parameter set by the US. Thus, the
Mackintosh model suggests a separable error term, so that the associative change undergone by a
CS is influenced by the discrepancy between its own associative strength (V_A) and the outcome (λ).
Pearce and Hall (1980) proposed that the associability α_A of a stimulus A on trial n is determined
by the absolute value of the discrepancy on the previous occasion on which stimulus A was
presented:

α_A^n = |λ − V_T|^(n−1)

The change in associative strength is then determined by

ΔV_A = S_A α_A λ

where S_A denotes the salience of the CS.
The Pearce-Hall model suggests, contrary to the Mackintosh model, that maximum attention (processing
of the CS) is generated by stimuli that have generated a prediction error of the US on the previous
trial. Nevertheless, the attentional theories of conditioning suggest that attention to CS is crucial for
learning and changes in attentional processing result from absolute prediction errors (see Pearce and
Bouton, 2001 for a review).
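As a numerical illustration of the Pearce-Hall rule described above, the sketch below tracks how associability declines as the US becomes well predicted. The salience value, starting associability and trial count are illustrative assumptions.

```python
# Pearce-Hall sketch: associability alpha tracks |λ − V_T| from the
# previous trial, so surprising outcomes keep the CS associable while
# well-predicted outcomes reduce attention. Values are illustrative.

def pearce_hall_trial(V, alpha, lam, S=0.2):
    """One trial: ΔV = S·α·λ, then α ← |λ − V| for the next trial."""
    V = V + S * alpha * lam
    alpha_next = abs(lam - V)
    return V, alpha_next

V, alpha = 0.0, 1.0   # start fully associable (maximal surprise)
alphas = []
for _ in range(20):
    V, alpha = pearce_hall_trial(V, alpha, lam=1.0)
    alphas.append(alpha)

# As the US becomes predicted, associability declines toward zero.
print(round(alphas[0], 2), round(alphas[-1], 3))
```

With a single CS, α falls geometrically across trials: the better the US is predicted, the less attention the CS commands, which is exactly the contrast with the Mackintosh model noted in the text.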
The models of conditioning can be summarised as essentially including two terms that are
combined multiplicatively: CS processing (eligibility) and US processing (reinforcement). While
the Rescorla-Wagner model proposed that learning is driven entirely by changes in US processing
in terms of prediction error, the Mackintosh and Pearce-Hall models have emphasised the role of
CS processing (attention) in terms of change in associability. Le Pelley (2004) has suggested a
hybrid model integrating these previous models of associative learning. The hybrid model
distinguishes between the attentional associability of the Mackintosh model and the salience
associability of the Pearce-Hall model, and combines them in a multiplicative way along with the
separable error term (e.g. |λ − V_A|) and the summed error term of the Rescorla-Wagner model.
A real-time extension of the Rescorla-Wagner model is the temporal difference (TD) model
developed by Sutton and Barto (1981; Sutton, 1988; see Sutton and Barto, 1990 for a review with
reference to animal learning theories). The advantage of real-time models is that the temporal
relationship between stimuli within a trial can be captured. An important illustration is the delay
conditioning procedure. In this procedure, the CS has an onset much earlier than the US and the
onset of the US is at the offset of the CS or slightly earlier. A further delay between the offset of the
CS and the onset of the US is referred to as the trace conditioning procedure. The time between the
onset of the CS and the onset of the US is called the Inter-Stimulus-Interval (ISI). The effectiveness of
conditioning is known to reduce for long ISI (see Sutton and Barto, 1990). This can be explained by
assuming that the internal representation of CS as perceived by the subject diminishes during the
ISI. This can be modelled by taking several time-bins within a trial and the CS predicts a temporally
discounted sum of all future rewards within the trial with more recent time-bins having greater
weight. Thus, a US occurring with a longer ISI is discounted more and hence is less effective in
conditioning. For example, using an exponential discounting function with γ as the discount factor, the
reward predicted V_t at time t is given by

V_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + γ³ r_{t+4} + …

The following recursive relationship allows estimation of the current prediction and avoids the
necessity to wait until all future rewards are received in that trial:

V_t = r_{t+1} + γ V_{t+1}

We can now define the temporal difference error, which must approach zero with learning, as

δ_t = r_{t+1} + γ V_{t+1} − V_t

and learning is governed by

ΔV_A = α_A (r_{t+1} + γ V_{t+1} − V_t)

where r_{t+1} + γ V_{t+1} takes the role of λ (the asymptotic value of the US) in the Rescorla-Wagner model.
Another important property of real-time models such as TD is that they can explain
higher-order conditioning, in which conditioned stimuli acquire predictive power not only when
associated with a US, but also when associated with another conditioned stimulus that has
previously been associated with a US. The prediction of reward at various time-points within a
trial, as proposed by the TD model, explains the ability of the organism to predict the US based on
the earliest available CS.
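The transfer of prediction back to the earliest available CS can be illustrated with a tabular TD(0) simulation over time-bins within a trial. The number of bins, the discount factor and the learning rate below are illustrative assumptions.

```python
# TD(0) sketch over time-bins within a trial, in the spirit of the
# Sutton-Barto account. States are time-bins; the US (reward) arrives
# at the final bin. Parameter values are illustrative assumptions.

gamma, lr = 0.9, 0.1   # discount factor and learning rate
n_bins = 5             # CS onset at bin 0, US after bin 4
V = [0.0] * n_bins     # value (reward prediction) per time-bin

for trial in range(500):
    for t in range(n_bins):
        r = 1.0 if t == n_bins - 1 else 0.0      # US at end of trial
        V_next = V[t + 1] if t + 1 < n_bins else 0.0
        delta = r + gamma * V_next - V[t]        # TD error δ_t
        V[t] += lr * delta

# After learning, the earliest time-bin carries a temporally
# discounted prediction of the US: V[0] approaches γ^(n_bins − 1).
print([round(v, 2) for v in V])
```

The value propagates backwards across time-bins over trials, so the earliest bin ends up predicting the discounted US, mirroring how phasic responses transfer to the earliest reward-predicting stimulus.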
1.1.2 Approach behaviour and decision making

Rewards act as positive reinforcers by increasing the frequency and intensity of the behaviour that
leads to the acquisition of goal objects (Schultz, 2000). Reinforcers are those objects that increase
the frequency of behaviour. Rewards also act as goals in their own right and can therefore elicit
approach and consummatory behaviour. Omission of reward leads to extinction of behaviour.
Punishment has the opposite motivational valence to reward and decreases the frequency of behaviour.
Avoidance and escape behaviours are negatively reinforced (strengthened) in order to prevent or terminate
a punishment, respectively. These findings were formalised in the law of effect (Thorndike, 1911),
which states that learning occurs only if there is reinforcement. Approach behaviour has been
central to the operational definition of rewards as those objects which subjects will work to acquire
through allocation of time, energy, or effort (McClure, 2003) or in other words, rewards make
subjects come back for more.
In Pavlovian conditioning, the conditioned stimuli elicit responses that help prepare the
animal for the consumption of reward. Konorski (1967) distinguished between preparatory and
consummatory conditioned responses. Preparatory responses (e.g. excitement, approach) depend on
the general motivational attributes of, or emotional responses to, a reinforcer and hence reflect the
general affective value of the reinforcer. Consummatory responses (e.g. pecking, salivation) depend
on the specific sensory attributes of the reinforcer (Mackintosh, 1983). In most experiments, both
preparatory and consummatory conditioning will occur. Therefore, CS will be associated with both
affective and sensory attributes of the US.
In instrumental conditioning, the actions that lead to reward are reinforced. In the real world, an
animal often has more than one action to choose from. The animal is then confronted
with a decision-making situation and will choose those actions that have maximum value.
Reinforcement learning models and their implementations, such as the actor-critic architecture,
provide an account of choice behaviour. An agent (organism) learns to achieve a goal (maximise
reward) by navigating through the space of states (making decisions - the actor) using the reinforcement
signal (updating the value function - the critic). In the temporal difference (TD) model, the TD error
guides the updating of the value function V(S_t) when transitioning from state S_t to state S_{t+1}. Q-learning
and its variants offer estimation of value functions over state-action pairs, so that in a given
state s, the organism chooses the action a that maximises the value Q(s,a). The value
function Q is updated similarly to the TD model.
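A minimal sketch of the Q-learning idea mentioned above, on an assumed one-state, two-action toy task. The task, action names and parameter values are illustrative assumptions, not from the text.

```python
# Q-learning sketch: values over state-action pairs, updated with a
# TD-style error. One-state, two-action toy task (an assumption made
# for illustration); only the 'right' action is rewarded.

import random

random.seed(0)
alpha = 0.2                                   # learning rate
Q = {("s", "left"): 0.0, ("s", "right"): 0.0}

for _ in range(200):
    a = random.choice(["left", "right"])      # explore both actions
    r = 1.0 if a == "right" else 0.0          # reward only for 'right'
    # One-step terminal task, so there is no successor-state value:
    Q[("s", a)] += alpha * (r - Q[("s", a)])

# The organism then chooses the action maximising Q(s, a):
best = max(["left", "right"], key=lambda a: Q[("s", a)])
print(best, round(Q[("s", "right")], 2))
```

After learning, the greedy choice over Q(s, a) selects the rewarded action, which is the decision rule the text describes.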
Box 2 Basic reward parameters: Microeconomic concepts
The influence of rewards on decision-making can be assessed by the basic reward parameters such
as magnitude, probability and delay. Given a choice between different magnitudes or probabilities
of reward, an organism would choose those options with higher magnitude and probability. Smaller
delays to obtaining reward are preferred to longer delays. In models of conditioning, reward value is
expressed as the associative strength that facilitates learning.
The occurrence of rewards is uncertain in the dynamic world, in which both the environment
and the behaviour of other agents render the rewards partly unpredictable. Uncertainty can be in the
expected magnitude of the reward (characterised by the variance) or the probability (p) of the
reward (maximum uncertainty at p = 50%) or the time of delivery of the reward. The uncertainty of
rewards can generate attention that determines learning according to associability learning rules
(Mackintosh, 1975; Pearce and Hall, 1980).
As far back as 1650, Pascal conjectured that human choice behaviour could be understood by
the expected value (the product of probability and magnitude of the reward). Bernoulli (1738/1954)
suggested that the actual value, or utility, that people assign to an outcome depends on the
wealth of the assigning person and grows more slowly than its magnitude. Bernoulli proposed that
increase in magnitude is always accompanied by an increase in the utility, which follows a concave
(more specifically, a logarithmic) function of magnitude. Hence, individuals behave as to maximise
the expected utility, instead of the expected value. Prospect theory (Kahneman and Tversky, 1979)
suggests that not only the perception of magnitude but also the perception of probability is
subjective to the individual.
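The distinction between expected value and expected utility can be made concrete with a small numerical example. The gamble, the sure amount, and the particular logarithmic utility function below are illustrative assumptions.

```python
# Bernoulli's argument in miniature: with a concave (logarithmic)
# utility, a certain outcome can be preferred to a gamble of equal
# expected value. Numbers and utility function are illustrative.

import math

def expected_value(outcomes):
    """Sum of p * x over (probability, magnitude) pairs."""
    return sum(p * x for p, x in outcomes)

def expected_utility(outcomes, u=lambda x: math.log(1 + x)):
    """Sum of p * u(x) with a concave utility function u."""
    return sum(p * u(x) for p, x in outcomes)

gamble  = [(0.5, 0.0), (0.5, 100.0)]   # 50% chance of 100, else nothing
certain = [(1.0, 50.0)]                # 50 for sure

# Equal expected values, but the concave utility favours the sure thing.
print(expected_value(gamble), expected_value(certain))
print(round(expected_utility(gamble), 3), round(expected_utility(certain), 3))
```

The two options tie on expected value, yet the logarithmic utility ranks the certain outcome higher, which is the risk-averse pattern Bernoulli's proposal captures.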
1.1.3 Subjective feelings of pleasure

The common perception of reward associates positive feelings of pleasure and hedonia as one of the
main functions of reward. Pleasure is a subjective feeling as it depends on the motivation of the
organism (wealth, satiety etc) and other available options (contextual effects). Rewards induce
positive emotions (affect). Recent theories (Berridge and Robinson, 2003) have suggested that the
motivational and emotional functions of rewards are dissociable as wanting and liking respectively.
Both the motivational and emotional functions can occur either consciously or unconsciously.
Indeed, wanting can occur without pleasurable liking, as accumulated wealth or satiation can make
the liking fade away.
1.2 Classical reward structures: Neurophysiology

Dopamine neurons of the ventral tegmental area (VTA) and substantia nigra have long been
identified with the processing of rewarding stimuli. Romo and Schultz (1990) have shown that
phasic dopamine responses appeared to be related to the appetitive properties of the object being
touched rather than the object itself. Phasic bursts of dopamine neurons occurred when the monkey's
hand touched a morsel of food but not when the animal's hand touched a wire or other non-food
objects. Dopamine neurons in the substantia nigra pars compacta form part of the nigrostriatal
pathway, project mainly to the caudate and putamen, and are identified strongly with motor
function. More medially, the ventral tegmental area (VTA) projects strongly to the nucleus
accumbens and also to the amygdala and hippocampus (mesolimbic pathway). The mesocortical
pathway from the medial VTA projects to a number of brain structures including the dorsal and ventral
prefrontal cortex. The mesocorticolimbic structures are known to be involved in processing the
reward information.
1.2.1 Dopamine responses related to animal learning theory

Dopamine neurons respond to the sight of primary food reward and to the conditioned stimulus
associated with reward (Ljungberg et al., 1992). However, dopamine responses were not observed to
a light that was not associated with task performance, suggesting that the responses of
dopamine neurons are specific to behaviourally significant rewards. When a stimulus predicting reward is itself preceded by
another stimulus, the phasic activation of dopamine neurons transfers back to this latter stimulus
(Schultz et al., 1993). Thus, dopamine neurons might respond to the earliest reward predicting
stimulus.
Mirenowicz and Schultz (1994) found that dopamine neurons showed a short-latency,
phasic response to unpredicted liquid rewards and during conditioning. After learning, the neuronal
responses occurred at the onset of the conditioned stimulus. When a predicted reward is omitted,
dopamine neurons are depressed time-locked to the usual occurrence of the reward. It is suggested
that the phasic dopamine response might encode the discrepancy between the predicted and the
actual occurrence of the reward (for review see Schultz et al., 1997). More recently, Bayer and
Glimcher (2005) used a regression model whose results were consistent with a temporal difference
model, demonstrating a role for dopamine neurons in signalling positive reward prediction
errors. Hollerman and Schultz (1998) showed that dopamine neurons were activated by rewards
during early trials and the activity progressively reduced as the rewards became more predictable.
Further, these neurons were activated when rewards occurred at unpredicted times and were
depressed when rewards were omitted at predicted times. Thus dopamine neurons encode errors in
prediction of both the occurrence and the time of rewards.
Waelti et al. (2001) used a blocking procedure to show that the responses of dopamine
neurons to conditioned stimuli were governed differentially by the occurrence of reward prediction
errors rather than by stimulus-reward associations alone. Tobler et al. (2003) used a conditioned
inhibition paradigm and showed that, out of 69 dopamine neurons that were strongly activated by
reward-predicting stimuli, 48 showed considerable depressions to conditioned inhibitors,
with minor activations in the remaining neurons. To successfully discriminate between reward-
and non-reward-predicting stimuli, attention must be paid to conditioned excitors as well as
Figure 1-2 Primary target regions of dopamine
Dopamine neurons, named after the neurotransmitter they release with nerve impulses in their projection territories, are located in the midbrain structures substantia nigra (pars compacta) and the medially adjoining ventral tegmental area (VTA). The axons of dopamine neurons project to the striatum (caudate nucleus, putamen and ventral striatum including nucleus accumbens), the dorsal and ventral prefrontal cortex, and a number of other structures.
inhibitors. This indicates differential neural coding of reward prediction and attention.
These findings indicate dopamine responses comply with basic tenets of animal learning
theory and indicate a role for dopamine in reward-based learning, in particular representing reward
prediction errors. Learning rules such as that proposed by Rescorla and Wagner (1972) also explain
greater associative strength for increasing magnitudes of reward. Further, as learning is captured by
the concept of prediction error, increasing the probability of reward should result in smaller
responses to the reward itself and thereby greater responses to the reward-predicting cue. These basic
parameters of reward processing, namely magnitude, probability, expected value and uncertainty
have been fundamental concepts of microeconomics.
Two reports by Schultz and colleagues (Fiorillo et al., 2003; Tobler et al., 2005) have shown
dopamine responses to magnitude and probabilities of reward. Fiorillo et al. (2003) found that the
phasic activation of dopamine neurons varied monotonically across the full range of probabilities,
supporting past claims that this response codes the discrepancy between predicted and actual
reward. In addition, a gradual increase in activity until the potential time of reward was observed
that was related to the uncertainty of obtaining a reward. Tobler et al. (2005) found that the phasic
activation of midbrain dopamine neurons showed similar sensitivity to both the magnitude and
probability of reward, and appeared to increase monotonically with expected reward value. Further,
a second form of adaptation observed was a change in the sensitivity, or gain, of neural activity that
appeared to depend on the range of likely reward magnitudes.
Figure 1-3 Dopamine responses to basic reward parameters (adapted from Tobler, 2003)
1.2.2 Reward signals in the striatum and orbitofrontal cortex

Hikosaka et al. (1989) showed reward-expectation and reward-delivery related activation in caudate
neurons. The activations were non-selective for how the monkey obtained the reward, i.e., by visual
fixation only, by a saccade, or by a hand movement. Apicella et al (1991) found ventral and dorsal
striatal responses to primary liquid rewards that could be distinguished from movement related
activations in posterior putamen. Neurons that detect rewards are more common in the ventral
striatum than in the caudate nucleus and putamen. Schultz et al. (1992) showed reward-expectation
and reward-delivery related responses in the ventral striatum. Changes in the appetitive value of the
reward liquid modified the magnitude of activations, suggesting a possible relationship to the
hedonic properties of the expected event.
Thorpe et al. (1983) showed that neurons in orbitofrontal cortex responded selectively to
particular foods or aversive stimuli that could not be explained by simple sensory features of the
stimulus. Orbitofrontal neurons tracked whether particular visual stimuli continue to be associated
with reinforcement and the responses reversed when the stimulus contingencies were interchanged.
Critchley and Rolls (1996) found that neuronal responses in orbitofrontal cortex to rewards and reward-
predicting stimuli are reduced with satiation and hence relate to the motivational value rather
than the sensory properties of reward objects. Tremblay and Schultz (2000) showed activation of
orbitofrontal neurons during the expectation of reward; these neurons also detected reward delivery at trial end.
activations also preceded expected drops of liquid delivered outside the task.
The number of possible reward values and stimuli has no absolute limits. However, the
number of neurons and their possible spike outputs are limited. If the neurons' outputs were evenly
allocated for reward values, there would be little discrimination between rewards. Neurons in the
orbitofrontal cortex of the monkey discriminate between different rewards on the basis of their
relative preferences (Tremblay and Schultz, 1999). For example, consider a neuron that is active
when a more preferred reward (such as a piece of apple) is expected rather than a less preferred
reward (such as cereal). The same neuron shows higher activity, in a different trial, when an even
more preferred reward (such as raisin) is expected rather than the previously preferred reward of
apple. Thus, rewards may influence each other, and the value of a reward can depend on other
available rewards. Cromwell and Schultz (2003) have shown that single neurons within the anterior
striatum distinguish between minute differences in reward magnitude.
Cromwell et al. (2005) suggested that the shift in reward processing due to different
preferences of the animal may reflect the adaptation of responses to the current reward distribution.
For linear, monotonic responses, this can be expressed as y = a + b(x − p), where b represents reward
sensitivity, p represents the shift in the current distribution, and a is a constant. It might be
possible that the immediate past experience sets up a prediction about the mean and range of the
future rewards. This prediction would allow the brain to use its full coding potential, thus
optimising its response, only within this distribution.
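A brief numerical sketch of the adaptive code y = a + b(x − p) may help: shifting p to track the current reward distribution lets a response of fixed range cover different sets of rewards. The reward distributions and parameter values below are illustrative assumptions.

```python
# Adaptive linear coding sketch: y = a + b(x − p), as in the text.
# Shifting p to the mean of the current reward distribution recentres
# the response range. Distributions and parameters are illustrative.

def response(x, a=0.5, b=1.0, p=0.0):
    """Linear, monotonic neural response to reward magnitude x."""
    return a + b * (x - p)

small_rewards = [0.1, 0.2, 0.3]
large_rewards = [1.1, 1.2, 1.3]

# Without adaptation, large rewards push the response out of the range
# used for small rewards; shifting p to the current mean recentres it.
fixed   = [response(x, p=0.2) for x in large_rewards]
adapted = [response(x, p=1.2) for x in large_rewards]

print([round(r, 2) for r in fixed])
print([round(r, 2) for r in adapted])  # same spread as for small rewards
```

With p adapted to the current mean, the large-reward responses occupy exactly the same output range as the small-reward responses did, which is how a limited spike output can keep discriminating rewards within whatever distribution is currently on offer.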
1.3 fMRI studies of reward processing

Although animal studies provide an unprecedented approach to studying neural mechanisms at the cellular
level, the limited communication and cognitive capabilities of animals restrict the investigation of reward
functions. Early neuroimaging studies have replicated the animal results in human
subjects and extended the view of putative reward-processing neural structures. Presentation of
monetary or liquid rewards and stimuli predicting such rewards activates reward structures
previously characterised in neurophysiological experiments, notably the striatum, orbitofrontal
cortex, amygdala and dopaminergic midbrain. As human blood oxygen level dependent (BOLD)
responses most likely reflect presynaptic inputs to neurons (Logothetis et al. 2001), some of these
activations may be due to the known dopaminergic inputs to these structures.
In a Positron Emission Tomography (PET) study, Thut et al. (1997) have found activation of
left frontal cortex, thalamus and midbrain in a go-no go task using monetary rewards. Arana et al.
(2003) in a PET study used a restaurant task in which subjects considered or chose items from a
menu tailored according to the subjects' preferences. The amygdala and medial orbitofrontal cortex
were activated when considering the appetitive incentive values of foods. Activation in the amygdala
correlated with the subjects' incentive ratings, and activation in the medial orbitofrontal cortex
correlated with the difficulty of the choice being made, suggesting its role in goal selection.
Kirsch et al. (2003) used a differential conditioning paradigm and asked participants to
perform a reaction time task. Participants were rewarded (or not rewarded) with a monetary or
verbal feedback (fast or slow). Activity related to anticipation of reward in substantia nigra and
nucleus accumbens was stronger with highly motivating stimuli (monetary reward) compared to
less motivating ones (verbal feedback).
Gottfried et al. (2003) trained subjects with picture-odour associations while performing a
visuospatial discrimination task. After training, subjects received the same contingencies in two
further sessions. Subjects were fed to satiety selectively on one of the two food-based olfactory
rewards in between the two sessions. Activity in the amygdala and OFC declined for the CS
predicting the devalued odour, while the activity in the ventral striatum, insula and cingulate cortex
not only showed decreased responses to the CS predicting the devalued odour, but also increased
responses to the CS predicting non-devalued odour. Their results suggest that amygdala and OFC
encode the current value of the reward representations accessible to the predictive cues.
Ramnani et al. (2004) trained participants with a Pavlovian conditioning paradigm in which
two conditioned stimuli predicted the occurrence of a 1 pound reward or no reward, respectively.
Participants were then scanned while a few of the trials had their cue-outcome contingencies
reversed. Unexpected rewards evoked activation in the orbitofrontal cortex, frontal pole,
parahippocampal cortex and cerebellum. Unexpected reward failure evoked activity in the frontal
pole and the temporal cortex.
Cox et al. (2005) used a simple card game (guessing whether the number on the back of a
card shown face up was higher or lower than the value shown) to mask awareness of a conditioning
task in which discriminable visual patterns were associated with monetary reward and loss. The
patterns were then presented one at a time without reward or negative feedback. Subjects indicated
their preference when two patterns were presented simultaneously. This procedure allowed the
authors to test brain activations to conditioned stimuli in the absence of explicit reward
anticipation. Activity was observed in the ventral striatum and OFC when reward was compared with
negative feedback. When passively viewing the conditioned stimuli, activation was observed in the
OFC. Thus OFC is involved in representing both rewarding and conditioned stimuli that have
acquired reward value.
O'Doherty et al. (2006) used Pavlovian conditioning and determined subjects' preferences
for five different food flavours that were associated with five fractals. Subjects performed a
keypress indicating the spatial location (left or right) of the fractal. Using a temporal difference
model of learning the value signal, they found that ventral midbrain region showed a linear response
to preferences. However, the ventral striatum showed bivalent responses, with maximal responses
to most and least preferred food, possibly consistent with the suggestions that ventral striatum might
be involved in both appetitive and aversive learning (Jensen et al., 2003; Knutson et al., 2001;
Seymour et al., 2004). Given that no aversive stimuli were used by ODoherty et al. (2006), a
further possibility is that the ventral striatum is coding a relative value of the stimuli rather than the
-
8/12/2019 Introduction to Reward Processing
14/28
Introduction to Reward processing
14
objective value independently of the context in which the stimuli are presented (Cromwell et al.
2005).
Recently, Bray and O'Doherty (2007) used a classical conditioning procedure in which subjects performed a simple spatial identification task to indicate the side (left or right) on which a fractal was presented. Participants received reinforcement on 50% of trials with attractive or unattractive faces. They found significant prediction-error-related activity in the ventral striatum for the attractive compared with unattractive faces. In contrast, the amygdala showed positive correlations with prediction error signals for both attractive and unattractive faces.
1.3.1 Motivational Valence

A number of neuroimaging studies have found distinct neural systems processing reward and punishment information. Delgado et al. (2000) asked participants to guess whether the value of a card was higher or lower than 5. Participants received monetary reward ($1.00), punishment ($0.50) or neutral feedback. They found that the bilateral caudate in the dorsal striatum showed differential activation based on the valence of the feedback. A sharp decrease of the response below baseline was observed after a punishment, while activation was sustained following a reward. Delgado et al. (2004) found that activity in the caudate nucleus was more robust in early phases of learning, and that the reward-feedback signal decreased as learning progressed for well-predicted cues. They suggested that the caudate is involved in the initial acquisition of contingencies through trial-and-error learning, and that its activity is modulated as a function of learning and predictability.
Knutson et al. (2001) showed that anticipation of reward significantly increased activation in the nucleus accumbens, whereas activation in the medial caudate increased in anticipation of both rewards and punishments. Nucleus accumbens activity was also correlated with self-reported happiness. Cues signalled a potential reward ($0.20, $1.00, or $5.00), punishment ($0.20, $1.00, or $5.00) or no monetary outcome. Subjects performed a button press in response to a target to win or avoid losing money, and the task difficulty was adjusted so that subjects would succeed on approximately 66% of target responses.
The lateral area of the orbitofrontal cortex (OFC) is activated following a punishing outcome and the medial OFC following a rewarding outcome (see Elliott et al., 2000, for a review). O'Doherty et al. (2001) used a visual reversal-learning task in which the choice of a correct stimulus led to a probabilistically determined monetary reward and the choice of an incorrect stimulus led to a monetary loss. They found a medial-lateral distinction for rewarding and punishing outcomes, respectively. O'Doherty et al. (2003) used a reversal task in which selection of a correct stimulus led to a 70% probability of receiving monetary reward and a 30% probability of monetary punishment. The incorrect stimulus had the reverse contingency. The reversal occurred on a random trial after a criterion of five selections of the correct stimulus was reached. They found that the ventromedial and orbital PFC are not only involved in representing the valence of outcomes, but also signal subsequent behavioural choice. The anterior insula / caudolateral OFC was related to behavioural choice and was active on trials that required a switch in stimulus choice on the subsequent trial.
Jensen et al. (2003) found ventral striatum activation in anticipation of aversive stimuli (unpleasant cutaneous electrical stimulation) that was not a consequence of relief after the aversive event. Further, the ventral striatum was active regardless of whether there was an opportunity to avoid the stimulus or not. Nitschke et al. (2006) used a passive viewing task with aversive and neutral pictures. They found activation in the dorsal amygdala, anterior insula, dorsal ACC, right DLPFC, and posterior OFC during both anticipation and viewing of aversive pictures. Further, the rostral ACC, the superior sector of the right DLPFC and medial sectors of the OFC were more responsive during anticipation of aversive pictures than in response to them.
The relief obtained by avoidance of an aversive stimulus can itself be a reward. Kim et al. (2006) used a monetary instrumental task in which participants chose between a pair of fractals that marked the onset of four trial types predicting reward, avoided loss, neutral feedback, or no feedback with 60% or 30% probability. They found that medial OFC activity increased after receiving a reward or avoiding a loss, and decreased after failing to obtain a reward or receiving an aversive outcome. These responses cannot be explained as a prediction error, because the activity did not decrease over the course of learning. They also found signed reward prediction error signals in the ventral striatum on reward trials but not on avoidance trials, possibly indicating that monetary loss, a secondary reinforcer, might be processed differently in the ventral striatum from primary reinforcers such as aversive flavours or pain.
1.3.2 Reward prediction errors

Reward responses comply with formalisms of learning theory such as the reward prediction error hypothesis. Berns et al. (2001) delivered fruit juice and water to subjects in a temporally predictable or an unpredictable manner. Unpredictability of rewards resulted in significant activity in the nucleus accumbens and medial orbitofrontal cortex, while predictability resulted in activation predominantly in the superior temporal gyrus. Unlike classical conditioning, the source of prediction in Berns et al. (2001) was the sequence of stimuli. Moreover, subjects' preference for juice or water was reflected in activity in the sensorimotor cortex, but not in reward regions. Pagnoni et al. (2002) demonstrated that activity in the ventral striatum is time-locked to reward prediction errors when juice expected at 4 seconds after a cue-initiated button press was delayed. The finding was not replicated when the juice was replaced by a visual stimulus, indicating that the ventral striatum selectively encodes rewarding events and not any salient stimulus in general.
McClure et al. (2003) used a classical conditioning paradigm to test for temporal prediction errors when juice expected at a 6-second delay after a light cue was delivered only after a further delay of 4 seconds. Thus a negative prediction error would occur for the absence of juice, while a positive prediction error would occur for the unexpected delivery of juice at a later time. They found that both these prediction errors correlated with activity in the left putamen.
The real-time extension of the Rescorla-Wagner learning rule, the temporal difference (TD) model, has been successfully used to explain brain activity in tasks involving prediction errors. O'Doherty et al. (2003) used appetitive conditioning with taste reward. Three fractals were associated with glucose, a neutral taste or no taste. Reward was omitted or unexpectedly delivered on some of the trials. Regression analysis with a temporal difference model revealed significant correlation of activity in the ventral striatum and OFC with the error signal, suggesting their role in reward-related learning. Seymour et al. (2004) used a second-order pain learning task in which two visual cues preceded delivery of high or low pain. While the second cue fully predicted the strength of the subsequently experienced pain, the first cue only allowed a probabilistic prediction. They demonstrated that activity in the ventral striatum and the anterior insula displayed a marked correspondence to the signals predicted by temporal difference models.
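The temporal difference computation that such model-based analyses regress against BOLD activity can be sketched as follows. This is a minimal tabular TD(0) illustration; the state coding, learning rate and discount factor are illustrative assumptions, not the parameters used in the studies above.

```python
import numpy as np

def td_prediction_errors(stimuli, rewards, alpha=0.2, gamma=1.0):
    """Tabular TD(0) learning over the time steps of each trial.

    stimuli: (n_trials, n_steps) array of state indices at each time step
    rewards: (n_trials, n_steps) array of rewards delivered at each step
    Returns learned state values V and the per-step prediction errors
    (deltas), the regressor typically correlated with BOLD responses.
    """
    n_states = int(stimuli.max()) + 1
    V = np.zeros(n_states)                     # value estimate per state
    deltas = np.zeros_like(rewards, dtype=float)
    for t in range(stimuli.shape[0]):          # trials
        for s in range(stimuli.shape[1]):      # time steps within a trial
            state = stimuli[t, s]
            next_v = V[stimuli[t, s + 1]] if s + 1 < stimuli.shape[1] else 0.0
            delta = rewards[t, s] + gamma * next_v - V[state]
            V[state] += alpha * delta          # Rescorla-Wagner-style update
            deltas[t, s] = delta
    return V, deltas
```

On a simple cue-then-reward sequence, the prediction error is large at the reward on early trials and shrinks as the cue comes to predict it, the signature reported for dopaminergic responses.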
Seymour et al. (2007) used a probabilistic Pavlovian task to compare winning or losing money in two conditions in which the alternative was, respectively, winning or losing nothing, or losing or winning money. A positive reward prediction error can be obtained by contrasting the bivalent 1.00 win with the univalent 1.00 outcome. Similarly, a positive loss prediction error is obtained by contrasting the bivalent 1.00 loss with the univalent 1.00 loss. The opposite contrasts would reveal negative prediction errors. They found that striatal activation reflected positively signed prediction errors in an anterior region for rewards and more posteriorly for losses.
1.3.3 Neuroimaging of basic reward parameters

Animal learning theory and microeconomic theory have suggested a number of basic reward parameters, such as magnitude, probability and delay, that are involved in processing reward information. In a parametric study, Elliott et al. (2003) found non-linear responses in the orbitofrontal cortex with increasing magnitudes of financial reward. They parametrically varied the monetary reward value (10, 20, 50 pence and 1 pound) while subjects performed a simple target detection task. They found that the amygdala, striatum and dopaminergic midbrain responded regardless of the reward value, while the medial and lateral OFC responded non-linearly, showing maximum responses for the lowest and highest values. Galvan et al. (2005) used a delayed-response spatial-choice task in which subjects were presented with small, medium or large amounts of coins. The exact value of each reward was not disclosed, to prevent subjects from counting the total money after each trial. They found reward-magnitude-related responses in the nucleus accumbens, thalamus and orbitofrontal cortex. Interestingly, only the nucleus accumbens showed a shift in activity from the reward to the predicting cue during later stages of learning. A frontostriatal shift in activity can be suggested (Pasupathy and Miller, 2005), as the OFC responses contrasted with the accumbens activity: responses in the OFC increased to the rewarded response rather than to the predictive cue.
Breiter et al. (2001) showed subjects prospects consisting of a set of three outcomes, one of which was awarded after a delay. Subjects could win or lose money in these prospects. Three kinds of prospects (good: $10, $2.50, $0; intermediate: $2.50, $0, -$1.50; and bad: $0, -$1.50, -$6) were used. Subjects could either win, lose or retain their initial endowment of $50. In the good prospect, subjects could win additional money or retain their earnings; in the bad prospect, subjects could retain or lose money; and in the intermediate prospect, subjects could win, retain or lose money. Haemodynamic responses in the amygdala and orbital gyrus tracked the expected values of the prospects. Sustained outcome-phase responses in the nucleus accumbens, amygdala and hypothalamus were ordered as a function of monetary payoff on the good prospect. They found a large overlap between the neural activations in the prospect and outcome phases and little evidence for anatomical segregation between the two phases. According to decision affect theory (Mellers et al., 1997), responses to a given outcome depend on counterfactual comparisons. Thus $0 on a good prospect will be experienced as a loss, while the same outcome in a bad prospect would be experienced as a win. Partial evidence for this was observed in the time courses of the nucleus accumbens and amygdala for the good and bad prospects, but not for the intermediate prospect.
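That counterfactual comparison can be made concrete with a toy sketch; the linear functional form and the weight `k` here are illustrative assumptions, not Mellers et al.'s fitted model.

```python
def felt_value(outcome, unobtained, k=0.5):
    """Decision-affect-style valuation (simplified sketch): the hedonic
    response to an outcome is its objective value plus a comparison against
    the mean of the unobtained alternatives, weighted by k. The linear form
    and the value of k are assumptions made for illustration."""
    counterfactual = sum(unobtained) / len(unobtained)
    return outcome + k * (outcome - counterfactual)

# The same $0 outcome feels different depending on the prospect it came from:
zero_on_good = felt_value(0, [10, 2.5])   # alternatives were gains -> feels like a loss
zero_on_bad = felt_value(0, [-1.5, -6])   # alternatives were losses -> feels like a win
```

The sign of the felt value flips between the good and bad prospects even though the objective outcome ($0) is identical, which is the pattern described above.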
McClure et al. (2004) examined neural correlates of time discounting while subjects made a series of choices between monetary reward options that varied in delay to delivery (from the same day to six weeks later). They found that the ventral striatum, medial OFC, medial PFC, posterior cingulate and left posterior hippocampus were related to the choice of immediate rewards. In contrast, regions of lateral prefrontal cortex and posterior parietal cortex were engaged uniformly by intertemporal choices irrespective of delay. Recently, McClure et al. (2007) used primary rewards (fruit juice or water) with time delays of minutes instead of weeks and found activation patterns similar to those in their previous study. When the delivery of all rewards was offset by 10 minutes, there was no further differential activity in limbic reward-related areas, suggesting that time discounting is not a relative concept.
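The privileged status of immediate rewards in these findings is often captured by a quasi-hyperbolic ("beta-delta") value function, in which a separate one-off penalty applies to any non-immediate reward. The sketch below uses illustrative parameter values, not the parameters estimated in the studies above.

```python
def quasi_hyperbolic_value(amount, delay_days, beta=0.7, delta=0.99):
    """Quasi-hyperbolic (beta-delta) discounting: an immediate reward keeps
    its full value; any delayed reward is first scaled by beta (a one-off
    immediacy penalty) and then exponentially by delta per day of delay.
    The beta and delta values here are illustrative assumptions."""
    if delay_days == 0:
        return float(amount)
    return beta * (delta ** delay_days) * amount

# $20 today is preferred to $25 in a week under these parameters, yet adding
# a common 10-day offset to both options reverses the preference, because
# neither option is immediate any more.
now = quasi_hyperbolic_value(20, 0)
week = quasi_hyperbolic_value(25, 7)
```

The preference reversal under a common delay is the behavioural signature that exponential discounting alone cannot produce, which is why a distinct immediacy-sensitive component was proposed.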
Dreher et al. (2006) used three slot machines with two different reward values ($10, $20) and reward probabilities (0.25, 0.5), such that one pair of slot machines was matched for expected value. To avoid counterfactual comparison, a common outcome of no reward with a probability of 1 served as a fourth slot machine. They found that the midbrain region responded transiently to higher reward probability at the time of the cue and to lower reward probability at the time of the reward outcome, and in a sustained fashion to reward uncertainty during the delay period. These results parallel those found in electrophysiological studies of primates (Fiorillo et al., 2003). The midbrain activations could not be explained by an increase in expected value alone: when comparing the two conditions with equal expected values, the midbrain was robustly activated in anticipation of an uncertain reward (50% probability) with low magnitude ($10) compared with a reward of known lower probability (25%) but higher magnitude ($20). A frontal network covaried with the reward prediction error signal both at the time of the cue and at the time of the outcome. The ventral striatum (putamen) showed sustained activation that covaried with maximum reward uncertainty during reward anticipation. These results suggest distinct functional networks encoding statistical properties of reward information.
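The expected-value matching and the uncertainty manipulation in this design can be made concrete with a small calculation, here using outcome entropy as the uncertainty measure (the helper function itself is just an illustration, not the authors' analysis code).

```python
import math

def ev_and_entropy(magnitude, p):
    """Expected value and outcome entropy (in bits) of a binary gamble that
    pays `magnitude` with probability p and nothing otherwise. Entropy is
    maximal at p = 0.5, where the outcome is least predictable."""
    ev = p * magnitude
    if p in (0.0, 1.0):
        entropy = 0.0  # fully predictable outcome carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return ev, entropy

# The matched pair: equal expected value ($5), unequal uncertainty --
# the 50%/$10 machine is maximally uncertain, the 25%/$20 machine less so.
low_mag = ev_and_entropy(10, 0.5)
high_mag = ev_and_entropy(20, 0.25)
```

Because the two machines have identical expected value but different entropy, any differential response between them can be attributed to uncertainty rather than value, which is the logic of the comparison described above.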
Recently, Liu et al. (2007) used a monetary decision-making task in which participants chose whether to bank or bet a certain number of chips. A decision to bank, or losing the bet, made them start over from one chip, while the wager was doubled if they won the bet. However, participants witnessed the outcome even after they banked. They contrasted three reward processes: reward anticipation (bet vs. bank), outcome monitoring (win vs. loss) and choice evaluation (right vs. wrong). They found that the striatum and medial/middle orbitofrontal cortex were activated by positive reward anticipation, winning outcomes and the evaluation of right choices, whereas lateral
performance and minimal when receipt of money was unrelated to the task. They found that behaviourally salient monetary rewards activate the human striatum, suggesting its role in the saliency of rewards rather than their value or hedonic feelings. Tricomi et al. (2004) reported that the caudate nucleus was robustly activated when subjects believed that whether they won or lost money was contingent on their button press (i.e., on action). Elliott et al. (2004) investigated whether neural responses to financial reward depend on instrumental action, using a 2x2 factorial design with movement and reward as the two factors. Subjects performed a simple target detection task. The trial types were indicated by coloured squares, hence rewards were fully predictable and reward expectation remained fixed. Significant enhancement of the reward-related response under the movement condition was seen in the dopaminergic midbrain, dorsal striatum and amygdala.
1.4 Rationale

The recently developed functional Magnetic Resonance Imaging (fMRI) methods provide a unique opportunity to extend reward work to humans, first by replicating, and thus referencing, the reward work done in monkeys, and then by investigating typically 'human' tasks that are difficult to approach in animals.
As mentioned earlier, rewards have schematically three functions: they induce learning, approach behaviour, and positive emotions. The first of these functions (learning) can be well investigated in animals, for example using classical (Pavlovian) and instrumental (operant) conditioning. The second function (approach behaviour) can also be investigated in animals, but this work is constrained by their limited communication and cognitive abilities. The third function (subjective feelings of pleasure) is very difficult to investigate in animals, and humans appear to be the subjects of choice.
Monetary rewards are uniquely human. The importance of money in everyday life makes it a strong reinforcer. Neurophysiological studies in animals have provided the primary basis for speculations about the brain areas that might process reward information in the human brain. Initial neuroimaging studies in humans using Positron Emission Tomography revealed that alphanumerically presented monetary reward was more reinforcing than positive reinforcement with the word 'OK' in the dorsolateral and orbital frontal cortex, midbrain and thalamus (Thut et al., 1997). The success of fMRI in studying reward processing in humans rested on obtaining measurable BOLD signal changes in the orbitofrontal cortex (OFC), amygdala and ventral striatum/nucleus accumbens (see McClure et al., 2004, for a review), regions that have previously been implicated in reward processing in non-human primates. A wide range of rewarding stimuli, including primary rewards (liquids, smells, sexual stimuli), abstract rewards (money, positive reinforcement) and social rewards (beautiful faces, pleasant touch), activate the same network of brain areas. The findings from numerous animal and human studies have led researchers to suggest the roles that different brain areas might play in processing reward information. The midbrain and ventral striatum might be involved in reward prediction error, while the orbitofrontal cortex might be involved in evaluating rewards and in the relative processing of rewards. The amygdala, though traditionally believed to process aversive and fear-inducing stimuli, is now generally believed to be involved in processing reinforcer intensity, both appetitive and aversive.
BibliographyAdcock, R., Thangavel, A., Whitfield-Gabrieli, S.,
Knutson, B., and Gabrieli, J. (2006). Reward-motivated learning: mesolimbic activation precedesmemory formation. Neuron, 50:507517.
Anderson, A., Christoff, K., Stappen, I., Panitz, D.,Ghahremani, D., Glover, G., Gabrieli, J., and Sobel,N. (2003). Dissociated neural representations ofintensity and valence in human olfaction. Nat.Neurosci., 6:196202.
Apicella, P., Ljungberg, T., Scarnati, E., and Schultz,W. (1991). Responses to reward in monkey dorsal
and ventral striatum. Exp Brain Res, 85(3):491500.Arana, F., Parkinson, J., Hinton, E., Holland, A.,
Owen, A., and Roberts, A. (2003). Dissociablecontributions of the human amygdala andorbitofrontal cortex to incentive motivation andgoal selection. J. Neurosci., 23:96329638.
Bandettini, P.A. (1994). Magnetic resonance imagingof human brain activation using endogenoussusceptibility contrast, PhD Thesis, MedicalCollege of Wisconsin.
Bandettini, P., Wong, E., Jesmanowicz, A., Hinks, R.,and Hyde, J. (1994). Spin-echo and gradient-echo
EPI of human brain activation using BOLDcontrast: a comparative study at 1.5 T. NMRBiomed, 7:1220.
Bayer, H. and Glimcher, P. (2005). Midbrain dopamineneurons encode a quantitative reward predictionerror signal. Neuron, 47:129141.
Beaver, JD, Lawrence, AD, van Ditzhuijzen, J, Davis,MH, Woods, A, Calder, AJ (2006). Individualdifferences in reward drive predict neural responsesto images of food. J. Neurosci., 26, 19:5160-6.
Bensafi, M, Sobel, N, Khan, RM (2007). Hedonic-specific activity in piriform cortex during odorimagery mimics that during odor perception. J.
Neurophysiol., 98, 6:3254-62.
Berger, TW, Alger, B, Thompson, RF (1976).Neuronal substrate of classical conditioning in thehippocampus. Science, 192, 4238:483-5.
Bernoulli, D. (1738/1954). Exposition of a new theoryon the measurement of risk. Econometrica, 22, 23-36 (translated from latin).
Berns, G. (1999). Functional neuroimaging. Life Sci.,65:2531 2540.
Berns, G., McClure, S., Pagnoni, G., and Montague, P.(2001). Predictability modulates human brainresponse to reward. J. Neurosci., 21:27932798.
Berridge, K. and Robinson, T. (1998). What is the roleof dopamine in reward: hedonic impact, rewardlearning, or incentive salience? Brain Res. BrainRes. Rev., 28:309369.
Berridge, K. and Robinson, T. (2003). Parsing reward.Trends Neurosci., 26:507513.
Boser, B.E., Guyon, I., and Vapnik, V. (1992). Atraining algorithm for optimal margin classifiers. InProceedings of the Fifth Annual Workshop onComputational Learning Theory, (ACM Press) pp.144152.
Bowman, C. and Turnbull, O. (2003). Real versusfacsimile reinforcers on the Iowa Gambling Task.
Brain Cogn, 53:207210.Boynton, G., Engel, S., Glover, G., and Heeger, D.(1996). Linear systems analysis of functionalmagnetic resonance imaging in human V1. J.Neurosci., 16:42074221.
Bray, S. and ODoherty, J. (2007). Neural coding ofreward-prediction error signals during classicalconditioning with attractive faces. J. Neurophysiol.,97:30363045.
Bray, S., Shimojo, S., and ODoherty, J. (2007). Directinstrumental conditioning of neural activity usingfunctional magnetic resonance imaging-derivedreward feedback. J. Neurosci., 27:74987507.
-
8/12/2019 Introduction to Reward Processing
22/28
Introduction to Reward processing
22
Breiter, H., Aharon, I., Kahneman, D., Dale, A., andShizgal, P. (2001). Functional imaging of neuralresponses to expectancy and experience ofmonetary gains and losses. Neuron, 30:619639.
Brett, M., Leff, A., Rorden, C., and Ashburner, J.(2001). Spatial normalization of brain images with
focal lesions using cost function masking.Neuroimage, 14:486500.Bunzeck, N. and Duzel, E. (2006). Absolute coding of
stimulus novelty in the human substantianigra/VTA. Neuron, 51:369379.
Buxton, R., Wong, E., and Frank, L. (1998). Dynamicsof blood flow and oxygenation changes duringbrain activation: the balloon model. Magn ResonMed, 39:855864.
Camerer, C.F., Hogarth, R.M. (1999). The Effects ofFinancial Incentives in Experiments: A Review andCapital-Labor-Production Framework. Journal ofRisk and Uncertainty, 19:742.
Carlson, TA, Schrater, P, He, S (2003). Patterns ofactivity in the categorical representations of objects.J Cogn Neurosci, 15, 5:704-17.
Chein, J. M. and Schneider, W. (2003). Designingeffective fMRI experiments. In Grafman, J. andRobertson, I., editors, Handbook ofNeuropsychology. Elsevier Science B.V.,Amsterdam.
Cox, DD, Savoy, RL (2003). Functional magneticresonance imaging (fMRI) "brain reading":detecting and classifying distributed patterns offMRI activity in human visual cortex. Neuroimage,19, 2 Pt 1:261-70.
Childress, A.R., Franklin, T., Listerud, J., Acton, P.D.,and OBrien, C.P. (2002). Neuroimaging of cocainecraving states: cessation, stimulant administration,and drug cue paradigms. InNeuropsychopharmacology: a fifth generation ofprogress. K.L. Davis, D. Charney, J.T. Coyle, C.Nemeroff, eds. pp. 575-1590.
Cohen, M. and Bookheimer, S. (1994). Localization ofbrain function using magnetic resonance imaging.Trends Neurosci., 17:268277.
Constable, R. (1995). Functional MR imaging usinggradientecho echo-planar imaging in the presenceof large static field inhomogeneities. J Magn Reson
Imaging, 5:746752.Cox, S., Andrade, A., and Johnsrude, I. (2005).
Learning to like: a role for human orbitofrontalcortex in conditioned reward. J. Neurosci.,25:27332740.
Critchley, H. and Rolls, E. (1996). Hunger and satietymodify the responses of olfactory and visualneurons in the primate orbitofrontal cortex. J.Neurophysiol., 75:16731686.
Cromwell, H. C., Hassani, O. K., and Schultz, W.(2005). Relative reward processing in primatestriatum. Exp Brain Res, 162(4):520525.
Cromwell, H. C. and Schultz, W. (2003). Effects ofexpectations for different reward magnitudes onneuronal activity in primate striatum. JNeurophysiol, 89(5):28232838.
Cusack, R., Russell, B., Cox, S., De Panfilis, C.,Schwarzbauer, C., and Ansorge, R. (2005). An
evaluation of the use of passive shimming toimprove frontal sensitivity in fMRI. Neuroimage,24:8291.
Dadds, M., Bovbjerg, D., Redd, W., and Cutmore, T.(1997). Imagery in human classical conditioning.Psychol Bull, 122:89103.
Dale, A. (1999). Optimal experimental design forevent-related fMRI. Hum Brain Mapp, 8:109114.
DArdenne, K., McClure, S., Nystrom, L., and Cohen,J. (2008). BOLD responses reflecting dopaminergicsignals in the human ventral tegmental area.Science, 319:12641267.
Davatzikos, C, Ruparel, K, Fan, Y, Shen, DG,
Acharyya, M, Loughead, JW, Gur, RC, Langleben,DD (2005). Classifying spatial patterns of brainactivity with machine learning methods: applicationto lie detection. Neuroimage, 28, 3:663-8.
De Houwer, J., Thomas, S., and Baeyens, F. (2001).Associative learning of likes and dislikes: A reviewof 25 years of research on human evaluativeconditioning. Psychological Bulletin, 127, 853869.
Deichmann, R., Gottfried, J., Hutton, C., and Turner,R. (2003). Optimized EPI for fMRI studies of theorbitofrontal cortex. Neuroimage, 19:430441.
Delgado, M., Miller, M., Inati, S., and Phelps, E.(2005). An fMRI study of reward-related
probability learning. Neuroimage, 24:862 873.Delgado, M., Nystrom, L., Fissell, C., Noll, D., and
Fiez, J. (2000). Tracking the hemodynamicresponses to reward and punishment in the striatum.J. Neurophysiol., 84:30723077.
Delgado, M.R., Stenger, V.A. and Fiez, J.A. (2004).Motivation-dependent responses in the humancaudate nucleus. Cereb. Cortex, 14(9):1022-30.
Dematt`e, M., Osterbauer, R., and Spence, C. (2007).Olfactory cues modulate facial attractiveness.Chem. Senses, 32:603610.
Dickinson, A. (1980). Contemporary animal learningtheory. Cambridge: Cambridge University Press.
Djordjevic, J., Zatorre, R., Petrides, M., Boyle, J., andJones-Gotman, M. (2005). Functionalneuroimaging of odor imagery. Neuroimage,24:791801.
Dreher, J., Kohn, P., and Berman, K. (2006). Neuralcoding of distinct statistical properties of rewardinformation in humans. Cereb. Cortex, 16:561573.
Elliott, R., Dolan, R., and Frith, C. (2000). Dissociablefunctions in the medial and lateral orbitofrontalcortex: evidence from human neuroimaging studies.Cereb. Cortex, 10:308317.
Elliott, R., Newman, J., Longe, O., and Deakin, J.(2003). Differential response patterns in the
-
8/12/2019 Introduction to Reward Processing
23/28
Ph.D. Thesis Chapter University of Cambridge Krishna Prasad Miyapuram
23
striatum and orbitofrontal cortex to financial rewardin humans: a parametric functional magneticresonance imaging study. J. Neurosci., 23:303307.
Elliott, R., Newman, J., Longe, O., and WilliamDeakin, J. (2004). Instrumental responding forrewards is associated with enhanced neuronal
response in subcortical reward systems.Neuroimage, 21:984990.Fawcett, T. (2006). An introduction to ROC analysis.
Pattern Recognition Letters, 27: 861-874.Fernie, G. and Tunney, R. (2006). Some decks are
better than others: the effect of reinforcer type andtask instructions on learning in the Iowa GamblingTask. Brain Cogn, 60:94102.
Fiorillo, C., Tobler, P., and Schultz, W. (2003).Discrete coding of reward probability anduncertainty by dopamine neurons. Science,299:18981902.
Friston, K. (1997). Testing for anatomically specified
regional effects. Human Brain Mapping, 5:133136.
Friston, K. J., Ashburner, J., Frith, C. D., Poline, J. B.,Heather, J. D., and Frackowiak, R. S. J. (1995a).Spatial registration and normalisation of images.Human Brain Mapping, 2:165189.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.B., Frith, C. D., and Frackowiak, R. S. J. (1995b).Statistical parametric maps in functional imaging:A general linear approach. Human Brain Mapping,2:189210.
Friston, K., Price, C., Fletcher, P., Moore, C.,Frackowiak, R., and Dolan, R. (1996). The trouble
with cognitive subtraction. Neuroimage, 4:97104.Gallistel, C. (1990). Representations in animal
cognition: an introduction. Cognition, 37:122.Galvan, A., Hare, T., Davidson, M., Spicer, J., Glover,
G., and Casey, B. (2005). The role of ventralfrontostriatal circuitry in rewardbased learning inhumans. J. Neurosci., 25:86508656.
Genovese, C., Lazar, N., and Nichols, T. (2002).Thresholding of statistical maps in functionalneuroimaging using the false discovery rate.NeuroImage, 870878.
Gneezy, U., and Rustichini, A. (2000). Pay Enough orDon't Pay at All. The Quarterly Journal of
Economics, 115:791810.Gottfried, J., Deichmann, R., Winston, J., and Dolan,
R. (2002a). Functional heterogeneity in humanolfactory cortex: an eventrelated functionalmagnetic resonance imaging study. J. Neurosci.,22:10819 10828.
Gottfried, J., ODoherty, J., and Dolan, R. (2002b).Appetitive and aversive olfactory learning inhumans studied using eventrelated functionalmagnetic resonance imaging. J. Neurosci.,22:1082910837.
Gottfried, J., ODoherty, J., and Dolan, R. (2003).Encoding predictive reward value in human
amygdala and orbitofrontal cortex. Science,301:11041107.
Gusnard, D., Raichle, M., and Raichle, M. (2001).Searching for a baseline: functional imaging andthe resting human brain. Nat. Rev. Neurosci.,2:685694.
Hampton, A., Adolphs, R., Tyszka, M., and ODoherty,J. (2007). Contributions of the amygdala to rewardexpectancy and choice signals in human prefrontalcortex. Neuron, 55:545555.
Hassabis, D., Kumaran, D., Vann, S., and Maguire, E.(2007). Patients with hippocampal amnesia cannotimagine new experiences. Proc. Natl. Acad. Sci.U.S.A., 104:17261731.
Haxby, JV, Gobbini, MI, Furey, ML, Ishai, A,Schouten, JL, Pietrini, P (2001). Distributed andoverlapping representations of faces and objects inventral temporal cortex. Science, 293, 5539:2425-30.
Haynes, JD, Rees, G (2006). Decoding mental statesfrom brain activity in humans. Nat. Rev. Neurosci.,7, 7:523-34.
Heeger, D. and Ress, D. (2002). What does fMRI tellus about neuronal activity? Nat. Rev. Neurosci.,3:142151.
Henson, R., Rugg, M., and Friston, K. (2001). Thechoice of basis functions in event-related fMRI.NeuroImage, 13(6):127. Supplement 1.
Hikosaka, O., Sakamoto, M., and Usui, S. (1989).Functional properties of monkey caudate neurons.III. Activities related to expectation of target andreward. J. Neurophysiol., 61:814832.
Holland, P. (1990). Event representation in Pavlovianconditioning: image and action. Cognition, 37:105131.
Hollerman, J. R. and Schultz, W. (1998). Dopamineneurons report an error in the temporal prediction ofreward during learning. Nat Neurosci, 1(4):304309.
Holmes, A. and Friston, K. (1998). Generalisability,random effects and population inference. InNeuroImage, volume 7, page S754.
Holt, C.A., and Laury, S.K. (2002). Risk Aversion andIncentive Effects. The American Economic Review92, 1644-1655.
Horvitz, J. (2000). Mesolimbocortical and nigrostriataldopamine responses to salient non-reward events.Neuroscience, 96:651656.
Hulvershorn, J., Bloy, L., Gualtieri, E., Leigh, J., andElliott, M. (2005). Spatial sensitivity and temporalresponse of spin echo and gradient echo boldcontrast at 3 T using peak hemodynamic activationtime. Neuroimage, 24:216223.
Ishai, A., Ungerleider, L., and Haxby, J. (2000).Distributed neural systems for the generation ofvisual images. Neuron, 28:979990.
Jensen, J., McIntosh, A., Crawley, A., Mikulis, D.,Remington, G., and Kapur, S. (2003). Direct
activation of the ventral striatum in anticipation of aversive stimuli. Neuron, 40:1251–1257.
Johnson, M. and Bickel, W. (2002). Within-subject comparison of real and hypothetical money rewards in delay discounting. J. Exp. Anal. Behav., 77:129–146.
Kahneman, D. and Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica, 47:263–291.
Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In B. A. Campbell and R. M. Church (eds.), Punishment and aversive behavior, 279–296, New York: Appleton-Century-Crofts.
Kamitani, Y. and Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nat. Neurosci., 8(5):679–685.
Kim, H., Shimojo, S., and O'Doherty, J. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol., 4:e233.
King, D. (1973). An image theory of classical conditioning. Psychol. Rep., 33:403–411.
King, D. (1974). An image theory of instrumental conditioning. Psychol. Rep., 35:1115–1122.
Kirsch, P., Schienle, A., Stark, R., Sammer, G., Blecker, C., Walter, B., Ott, U., Burkart, J., and Vaitl, D. (2003). Anticipation of reward in a nonaversive differential conditioning paradigm and the brain reward system: an event-related fMRI study. Neuroimage, 20:1086–1095.
Knutson, B., Adams, C., Fong, G., and Hommer, D. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neurosci., 21:RC159.
Knutson, B. and Cooper, J. (2005). Functional magnetic resonance imaging of reward prediction. Curr. Opin. Neurol., 18:411–417.
Knutson, B., Taylor, J., Kaufman, M., Peterson, R., and Glover, G. (2005). Distributed neural representation of expected value. J. Neurosci., 25:4806–4812.
Kobayashi, M., Takeda, M., Hattori, N., Fukunaga, M., Sasabe, T., Inoue, N., Nagai, Y., Sawada, T., Sadato, N., and Watanabe, Y. (2004). Functional imaging of gustatory perception and imagery: top-down processing of gustatory signals. Neuroimage, 23:1271–1282.
Konorski, J. (1967). Integrative action of the brain. Chicago: University of Chicago Press.
Kosslyn, S. (1988). Aspects of a cognitive neuroscience of mental imagery. Science, 240:1621–1626.
Kosslyn, S., Ganis, G., and Thompson, W. (2001). Neural foundations of imagery. Nat. Rev. Neurosci., 2:635–642.
Kosslyn, S., Shin, L., Thompson, W., McNally, R., Rauch, S., Pitman, R., and Alpert, N. (1996). Neural effects of visualizing and perceiving aversive stimuli: a PET investigation. Neuroreport, 7:1569–1576.
Kringelbach, M. (2005). The human orbitofrontal cortex: linking reward to hedonic experience. Nat. Rev. Neurosci., 6:691–702.
Kringelbach, M., O'Doherty, J., Rolls, E., and Andrews, C. (2003). Activation of the human orbitofrontal cortex to a liquid food stimulus is correlated with its subjective pleasantness. Cereb. Cortex, 13:1064–1071.
Kringelbach, M. and Rolls, E. (2004). The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog. Neurobiol., 72:341–372.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., and Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. Neuroimage, 26(2):317–329.
Lancaster, J., Woldorff, M., Parsons, L., Liotti, M., Freitas, C., Rainey, L., Kochunov, P., Nickerson, D., Mikiten, S., and Fox, P. (2000). Automated Talairach atlas labels for functional brain mapping. Hum. Brain Mapp., 10:120–131.
Lauterbur, P. C. (1973). Image formation by induced local interactions. Examples employing nuclear magnetic resonance. Nature, 242:190–191.
Le Pelley, M. (2004). The role of associative history in models of associative learning: a selective review and a hybrid model. Q. J. Exp. Psychol. B, 57:193–243.
Lin, H.-T., Lin, C.-J., and Weng, R. C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276.
Lisman, J. and Grace, A. (2005). The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron, 46:703–713.
Liu, X., Powell, D., Wang, H., Gold, B., Corbly, C., and Joseph, J. (2007). Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. J. Neurosci., 27:4587–4597.
Ljungberg, T., Apicella, P., and Schultz, W. (1992). Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol., 67(1):145–163.
Logothetis, N. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 357:1003–1037.
Logothetis, N., Pauls, J., Augath, M., Trinath, T., and Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412:150–157.
Mackintosh, N. J. (1975). A theory of attention: variations in the associability of stimuli with reinforcement. Psychological Review, 82:276–298.
Mackintosh, N. J. (1983). Conditioning and associative learning. Oxford: Oxford University Press.
Maldjian, J., Laurienti, P., Kraft, R., and Burdette, J. (2003). An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage, 19:1233–1239.
Mansfield, P. (1977). Multi-planar image formation using NMR spin echoes. J. Phys. C, 10:L55–L58.
McClure, S. M. (2003). Reward prediction errors in human brain. PhD Thesis, Baylor College of Medicine.
McClure, S., Berns, G., and Montague, P. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38:339–346.
McClure, S., Ericson, K., Laibson, D., Loewenstein, G., and Cohen, J. (2007). Time discounting for primary rewards. J. Neurosci., 27:5796–5804.
McClure, S., Laibson, D., Loewenstein, G., and Cohen, J. (2004a). Separate neural systems value immediate and delayed monetary rewards. Science, 306:503–507.
McClure, S., York, M., and Montague, P. (2004b). The neural substrates of reward processing in humans: the modern role of fMRI. Neuroscientist, 10:260–268.
Mechelli, A., Price, C., Friston, K., and Ishai, A. (2004). Where bottom-up meets top-down: neuronal interactions during perception and imagery. Cereb. Cortex, 14:1256–1265.
Mellers, B. A., Schwartz, A., Ho, K., and Ritov, I. (1997). Decision affect theory: emotional reactions to the outcomes of risky options. Psychological Science, 8(6):423–429.
Miller, G. (2007). Neurobiology. A surprising connection between memory and imagination. Science, 315:312.
Mirenowicz, J. and Schultz, W. (1994). Importance of unpredictability for reward responses in primate dopamine neurons. J. Neurophysiol., 72(2):1024–1027.
Mirenowicz, J. and Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564):449–451.
Mourão-Miranda, J., Bokde, A. L., Born, C., Hampel, H., and Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage, 28(4):980–995.
Murray, E. (2007). The amygdala, reward and emotion. Trends Cogn. Sci., 11:489–497.
Nichols, T., Brett, M., Andersson, J., Wager, T., and Poline, J. (2005). Valid conjunction inference with the minimum statistic. Neuroimage, 25:653–660.
Nitschke, J., Sarinopoulos, I., Mackiewicz, K., Schaefer, H., and Davidson, R. (2006). Functional neuroanatomy of aversion and its anticipation. Neuroimage, 29:106–116.
Norris, D., Zysset, S., Mildner, T., and Wiggins, C. (2002). An investigation of the value of spin-echo-based fMRI using a Stroop color-word matching task and EPI at 3 T. Neuroimage, 15:719–726.
O'Craven, K. and Kanwisher, N. (2000). Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J. Cogn. Neurosci., 12:1013–1023.
O'Doherty, J. (2004). Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol., 14:769–776.
O'Doherty, J. (2007). Lights, camembert, action! The role of human orbitofrontal cortex in encoding stimuli, rewards, and choices. Ann. N. Y. Acad. Sci., 1121:254–272.
O'Doherty, J., Buchanan, T., Seymour, B., and Dolan, R. (2006). Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron, 49:157–166.
O'Doherty, J., Critchley, H., Deichmann, R., and Dolan, R. (2003a). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J. Neurosci., 23:7931–7939.
O'Doherty, J., Dayan, P., Friston, K., Critchley, H., and Dolan, R. (2003b). Temporal difference models and reward-related learning in the human brain. Neuron, 38:329–337.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., and Dolan, R. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304:452–454.
O'Doherty, J., Kringelbach, M., Rolls, E., Hornak, J., and Andrews, C. (2001). Abstract reward and punishment representations in the human orbitofrontal cortex. Nat. Neurosci., 4:95–102.
Ogawa, S., Lee, T., Kay, A., and Tank, D. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. U.S.A., 87:9868–9872.
Ogawa, S., Menon, R., Tank, D., Kim, S., Merkle, H., Ellermann, J., and Ugurbil, K. (1993). Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophys. J., 64:803–812.
Ogawa, S., Tank, D., Menon, R., Ellermann, J., Kim, S., Merkle, H., and Ugurbil, K. (1992). Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging. Proc. Natl. Acad. Sci. U.S.A., 89:5951–5955.
Ojemann, J., Akbudak, E., Snyder, A., McKinstry, R., Raichle, M., and Conturo, T. (1997). Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. Neuroimage, 6:156–167.
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306):1593–1599.
Schwarzbauer, C., Raposo, A., and Tyler, L. K. (2005). Spin-echo fMRI overcomes susceptibility-induced signal losses in the inferior temporal lobes. NeuroImage, 26(S1):802.
Schwarzbauer, C., Mildner, T., Heinke, W., Zysset, S., Deichmann, R., Brett, M., and Davis, M. H. (2006). Spin-echo EPI: the method of choice for fMRI of brain regions affected by magnetic field inhomogeneities? Human Brain Mapping, Abstract No. 1049.
Seymour, B., Daw, N., Dayan, P., Singer, T., and Dolan, R. (2007a). Differential encoding of losses and gains in the human striatum. J. Neurosci., 27:4826–4831.
Seymour, B., O'Doherty, J., Dayan, P., Koltzenburg, M., Jones, A., Dolan, R., Friston, K., and Frackowiak, R. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429:664–667.
Seymour, B., Singer, T., and Dolan, R. (2007b). The neurobiology of punishment. Nat. Rev. Neurosci., 8:300–311.
Shafir, E., Diamond, P. A., and Tversky, A. (1997). On money illusion. Quarterly Journal of Economics, 112:341–374.
Simmons, W., Martin, A., and Barsalou, L. (2005). Pictures of appetizing foods activate gustatory cortices for taste and reward. Cereb. Cortex, 15:1602–1608.
Stark, C. E. and Squire, L. R. (2001). When zero is not zero: the problem of ambiguous baseline conditions in fMRI. Proc. Natl. Acad. Sci. U.S.A., 98(22):12760–12766.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.
Sutton, R. and Barto, A. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev., 88:135–170.
Sutton, R. S. and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore (eds.), Learning and computational neuroscience: foundations of adaptive networks, 497–537, Boston: MIT Press.
Talairach, J. and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human Brain. Thieme, New York.
Talmi, D., Seymour, B., Dayan, P., and Dolan, R. (2008). Human Pavlovian-instrumental transfer. J. Neurosci., 28:360–368.
Thorndike, E. L. (1911). Animal intelligence: experimental studies. New York: Macmillan.
Thorpe, S., Rolls, E., and Maddison, S. (1983). The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp. Brain Res., 49:93–115.
Thut, G., Schultz, W., Roelcke, U., Nienhusmeier, M., Missimer, J., Maguire, R., and Leenders, K. (1997). Activation of the human brain by monetary reward. Neuroreport, 8:1225–1228.
Tiggemann, M. and Kemps, E. (2005). The phenomenology of food cravings: the role of mental imagery. Appetite, 45(3):305–313.
Tobler, P. (2003). Coding of basic reward parameters by dopamine neurons. PhD Thesis, University of Cambridge.
Tobler, P., Dickinson, A., and Schultz, W. (2003). Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci., 23:10402–10410.
Tobler, P., Fiorillo, C., and Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307:1642–1645.
Tobler, P., Fletcher, P., Bullmore, E., and Schultz, W. (2007a). Learning-related human brain activations reflecting individual finances. Neuron, 54:167–175.
Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. J. Neurophysiol., 95:301–310.
Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2007b). Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol., 97:1621–1632.
Tremblay, L., Hollerman, J. R., and Schultz, W. (1998). Modifications of reward expectation-related neuronal activity during learning in primate striatum. J. Neurophysiol., 80(2):964–977.
Tremblay, L. and Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729):704–708.
Tremblay, L. and Schultz, W. (2000a). Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J. Neurophysiol., 83(4):1877–1885.
Tremblay, L. and Schultz, W. (2000b). Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J. Neurophysiol., 83(4):1864–1876.
Tricomi, E., Delgado, M., and Fiez, J. (2004). Modulation of caudate activity by action contingency. Neuron, 41:281–292.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., and Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1):273–289.
Valentin, V., Dickinson, A., and O'Doherty, J. (2007). Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci., 27:4019–4026.
Vohs, K., Mead, N., and Goode, M. (2006). The psychological consequences of money. Science, 314:1154–1156.
Waelti, P., Dickinson, A., and Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412(6842):43–48.
Winston, J., Gottfried, J., Kilner, J., and Dolan, R. (2005). Integrated neural representations of odor intensity and affective valence in human amygdala. J. Neurosci., 25:8903–8907.
Wise, R. (2002). Brain reward circuitry: insights from unsensed incentives. Neuron, 36:229–240.
Wise, R. (2004). Dopamine, learning and motivation. Nat. Rev. Neurosci., 5:483–494.
Wittmann, B., Schott, B., Guderian, S., Frey, J., Heinze, H., and Duzel, E. (2005). Reward-related fMRI activation of dopaminergic midbrain is associated with enhanced hippocampus-dependent long-term memory formation. Neuron, 45:459–467.
Worsley, K., Marrett, S., Neelin, P., Vandal, A. C., Friston, K., and Evans, A. C. (1996). A unified statistical approach for determining significant voxels in images of cerebral activation. Human Brain Mapping, 4:58–73.
Yoo, S., Freeman, D., McCarthy, J., and Jolesz, F. (2003). Neural substrates of tactile imagery: a functional MRI study. Neuroreport, 14:581–585.
Zink, C., Pagnoni, G., Chappelow, J., Martin-Skurski, M., and Berns, G. (2006). Human striatal activation reflects degree of stimulus saliency. Neuroimage, 29:977–983.
Zink, C., Pagnoni, G., Martin, M., Dhamala, M., and Berns, G. (2003). Human striatal response to salient nonrewarding stimuli. J. Neurosci., 23:8092–8097.
Zink, C., Pagnoni, G., Martin-Skurski, M., Chappelow, J., and Berns, G. (2004). Human striatal responses to monetary reward depend on saliency. Neuron, 42:509–517.