8/12/2019 Introduction to Reward Processing
1/28

1 Introduction to Reward Processing
Krishna Prasad Miyapuram
Ph.D. Thesis Chapter, University of Cambridge, April 2

1.1 Functions of rewards

Reward seeking behaviour depends to a large extent on successfully extracting reward information
from a large variety of environmental stimuli and events. Learning to reliably predict the occurrence
of rewards such as food enables an organism to prepare behavioural reactions and improve the
choices that it makes in the future. Learning can be defined as a change in behaviour. Various
sensory cues from the environment such as sounds, sights and smells that are associated with a
reward guide the animal to return to the previously experienced reward (Wise, 2002). Thus, one of
the main functions of rewards is to induce learning, as subjects will come back for more when they
encounter a reward. Another function of rewards is to induce approach and consummatory
behaviour for acquiring the rewarding object. This is essential for decision making and goal-directed
behaviour, as the animal learns to decide the appropriate actions to be executed with
rewards as goals. The third function of rewards is to induce subjective feelings of pleasure and
hedonia (positive emotions). This common perception associates rewards primarily with happiness.
Thus rewards have very basic functions in the life of individuals and are necessary for survival and
reproduction (survival of genes) (Schultz, 2000, 2004, 2006).
1.1.1 Learning by conditioning
Reward-directed learning can occur by associating a stimulus with a reward (Pavlovian or classical
conditioning) or by associating an action with a reward (instrumental or operant conditioning).
These forms of learning fall under the category of associative learning. More than a century ago,
Thorndike (1898) argued that learning consists of the formation of connections between stimuli and
responses and that these connections are formed whenever a response is followed by a reward. This
kind of learning is called instrumental (or operant) conditioning as the delivery of the reward is
contingent on the response made by the animal. Pavlov (1929) delivered the reward to his subjects
independently of the animal's behaviour. Thus, learning in Pavlovian conditioning consisted of
pairing between a stimulus and a reward. In both kinds of learning an arbitrary, previously neutral
stimulus (Conditioned Stimulus, CS) acquires the function of a rewarding stimulus after being
repeatedly associated in time with a rewarding object (Unconditioned Stimulus, US).
The early definitions of conditioning have emphasised that the temporal contiguity of the CS
and the US is essential for learning. Modern views of conditioning, however, suggest that the
pairing or contiguity of two events is neither necessary nor sufficient for learning to occur (see
Rescorla, 1988 for review). Rather, conditioning depends on the information that the CS provides
Figure 1-1 Learning by classical conditioning
(a) Contiguity requirement. The US needs to follow the CS in a temporally contiguous manner. (b) If the US is delayed after the offset of the CS, the procedure is called trace conditioning. (c) Contingency requirement. For excitatory conditioning, the US should have a higher probability of occurring in the presence of the CS than in its absence. (d) If the CS predicts the omission of a US, this is called conditioned inhibition. (e) Prediction error. Unexpected delivery of reward gives a positive prediction error, while the omission of a predicted reward gives a negative prediction error. (f) Higher-order conditioning occurs when a second conditioned stimulus predicts the occurrence of the first CS.
about the US. More specifically, the US needs to occur more frequently in the presence of the CS as
compared with its absence. Further, a negative relation between a CS and US can be learned if the
occurrence of the CS predicts the omission of the US (conditioned inhibition, Rescorla, 1969). This
suggests that contingency of the US upon occurrence of the CS is crucial for Pavlovian conditioning
(Dickinson, 1980). When a US is fully predicted by a CS, then it does not contribute to any further
learning even if the contiguity and contingency requirements are fulfilled. This phenomenon is
illustrated by the blocking effect (Kamin, 1969), in which a previously formed association prevents
or blocks the formation of a new association. Kamin (1969) proposed that the surprise or error in
prediction of the US contributes to learning. Thus, three key factors govern learning by
conditioning: contiguity, contingency, and prediction error (Tobler, 2003; Schultz, 2006).
Box 1 Models of conditioning: Role of prediction error
Prediction error has been fundamental to many models of conditioning. Rescorla and Wagner
(1972) proposed that repeated pairing of a CS (stimulus A) and a US will result in a gradual
increase in the strength of association (V_A) between them. According to their model, the change in
associative strength is

ΔV_A = αβ(λ − V_T)

where the value of λ is set by the magnitude of the US and represents the maximum strength that
the CS-US association can achieve, and V_T represents the sum of the associative strengths of all stimuli
present on the trial. The term (λ − V_T) therefore represents the prediction error, which is nothing but
the discrepancy between the maximum associative strength and the current prediction. The two
learning-rate parameters α and β, with values between 0 and 1, are determined by the salience of the CS
(stimulus A) and the US respectively, and are fixed during conditioning. The Rescorla-Wagner (R-W)
model can explain the contingency requirement for conditioning by allowing the experimental
context to be associated with the US like any other CS. Hence if the probability p(US|CS) of the US
occurring in the presence of the CS is lower than the probability p(US|no CS) of the US occurring
in the absence of the CS, the associative strength for predicting the US would be greater for the
experimental context compared to that of the CS (conditioned inhibition). The blocking effect can
also be explained because the R-W model computes the prediction error from the total associative
strength V_T of all stimuli present on a given trial, so a fully predicted US does not generate any
prediction error and hence blocks any further learning by a second stimulus. Despite the limitations
of the R-W model in explaining phenomena such as latent inhibition (pre-exposure of a CS retards later
conditioning of that CS with a US), the prediction error principle remains central to a number of
contemporary models of conditioning (see Pearce and Bouton, 2001).
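The role of the summed error term in blocking can be illustrated with a short simulation. The sketch below follows the Rescorla-Wagner update ΔV_A = αβ(λ − V_T) as described in the text, but the parameter values, function names and trial counts are illustrative assumptions, not taken from the original papers.

```python
# Minimal Rescorla-Wagner sketch illustrating the blocking effect.
# lam is the US asymptote (λ); alpha and beta are the CS and US
# learning-rate parameters. All values are illustrative assumptions.

def rw_update(V, present, lam, alpha=0.3, beta=1.0):
    """One trial: ΔV = αβ(λ − V_T) for each CS present on the trial."""
    V_T = sum(V[cs] for cs in present)   # summed prediction of all CSs
    error = lam - V_T                    # prediction error (λ − V_T)
    for cs in present:
        V[cs] += alpha * beta * error
    return V

V = {"A": 0.0, "B": 0.0}

# Phase 1: condition A alone until it fully predicts the US (λ = 1).
for _ in range(50):
    rw_update(V, ["A"], lam=1.0)

# Phase 2: present the compound AB. A already predicts the US, so the
# prediction error is near zero and learning about B is blocked.
for _ in range(50):
    rw_update(V, ["A", "B"], lam=1.0)

print(round(V["A"], 2), round(V["B"], 2))  # B stays near zero
```

A fully predicted US generates no error, so stimulus B gains almost no associative strength, reproducing Kamin's blocking result in miniature.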
Attentional theories of conditioning have suggested that, in addition to the processing of the
US proposed by the Rescorla-Wagner model, the processing of the CS is integral to the process of
conditioning (Mackintosh, 1975; Pearce and Hall, 1980). According to Mackintosh (1975), stimuli
that generate the least absolute value of prediction error are good predictors of the US and attract
maximum attention. The change in associability of a stimulus A is positive if |λ − V_A| < |λ − V_X|
and is negative otherwise, where V_X is the sum of the associative strengths of all stimuli except A. The
change in associative strength is given by

ΔV_A = θα_A(λ − V_A)

where α_A is the associability of stimulus A and θ is a learning-rate parameter set by the US. Thus, the
Mackintosh model suggests a separable error term, so that the associative change undergone by a
CS is influenced by the discrepancy between its own associative strength (V_A) and the outcome (λ).
Pearce and Hall (1980) proposed that the associability α_A of a stimulus A on trial n is determined
by the absolute value of the discrepancy on the previous occasion on which stimulus A was
presented:

α_A^n = |λ − V_T|^(n−1)

The change in associative strength is then determined by

ΔV_A = S_A α_A λ

where S_A denotes the salience of the CS.
The Pearce-Hall model suggests, contrary to the Mackintosh model, that maximum attention (processing
of the CS) is generated by stimuli that have generated a prediction error of the US on the previous
trial. Nevertheless, the attentional theories of conditioning suggest that attention to CS is crucial for
learning and changes in attentional processing result from absolute prediction errors (see Pearce and
Bouton, 2001 for a review).
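As a numerical illustration of the Pearce-Hall rule described above, the sketch below tracks how associability declines as the US becomes well predicted. The salience value, starting associability and trial count are illustrative assumptions.

```python
# Pearce-Hall sketch: associability alpha tracks |λ − V_T| from the
# previous trial, so surprising outcomes keep the CS associable while
# well-predicted outcomes reduce attention. Values are illustrative.

def pearce_hall_trial(V, alpha, lam, S=0.2):
    """One trial: ΔV = S·α·λ, then α ← |λ − V| for the next trial."""
    V = V + S * alpha * lam
    alpha_next = abs(lam - V)
    return V, alpha_next

V, alpha = 0.0, 1.0   # start fully associable (maximal surprise)
alphas = []
for _ in range(20):
    V, alpha = pearce_hall_trial(V, alpha, lam=1.0)
    alphas.append(alpha)

# As the US becomes predicted, associability declines toward zero.
print(round(alphas[0], 2), round(alphas[-1], 3))
```

With a single CS, α falls geometrically across trials: the better the US is predicted, the less attention the CS commands, which is exactly the contrast with the Mackintosh model noted in the text.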
The models of conditioning can be summarised as essentially including two terms that are
combined multiplicatively: CS processing (eligibility) and US processing (reinforcement). While
the Rescorla-Wagner model proposed that learning is driven entirely by changes in US processing
in terms of prediction error, the Mackintosh and Pearce-Hall models have emphasised the role of
CS processing (attention) in terms of change in associability. Le Pelley (2004) has suggested a
hybrid model integrating these previous models of associative learning. The hybrid model
distinguishes between the attentional associability of the Mackintosh model and the salience
associability of the Pearce-Hall model, and combines them in a multiplicative way along with the
separable error term (e.g. |λ − V_A|) and the summed error term of the Rescorla-Wagner model.
A real-time extension of the Rescorla-Wagner model is the temporal difference (TD) model
developed by Sutton and Barto (1981; Sutton, 1988; see Sutton and Barto, 1990 for a review with
reference to animal learning theories). The advantage of real-time models is that the temporal
relationship between stimuli within a trial can be captured. An important illustration is the delay
conditioning procedure. In this procedure, the CS has an onset much earlier than the US and the
onset of the US is at the offset of the CS or slightly earlier. A further delay between the offset of the
CS and the onset of the US is referred to as the trace conditioning procedure. The time between the
onset of the CS and the onset of the US is called the Inter-Stimulus-Interval (ISI). The effectiveness of
conditioning is known to reduce for long ISI (see Sutton and Barto, 1990). This can be explained by
assuming that the internal representation of CS as perceived by the subject diminishes during the
ISI. This can be modelled by taking several time-bins within a trial and the CS predicts a temporally
discounted sum of all future rewards within the trial with more recent time-bins having greater
weight. Thus, a US occurring with a longer ISI is discounted more and hence is less effective in
conditioning. For example, using an exponential discounting function with γ as the discount factor, the
reward predicted V_t at time t is given by

V_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + γ³ r_{t+4} + …

The following recursive relationship allows estimation of the current prediction and avoids the
necessity to wait until all future rewards are received in that trial:

V_t = r_{t+1} + γ V_{t+1}

We can now define the temporal difference error, which must approach zero with learning, as

δ_t = r_{t+1} + γ V_{t+1} − V_t

and learning is governed by

ΔV_A = α_A (r_{t+1} + γ V_{t+1} − V_t)

where r_{t+1} + γ V_{t+1} takes the role of λ (the asymptotic value of the US) in the Rescorla-Wagner model.
Another important property of real-time models such as TD is that they can explain
higher-order conditioning, in which conditioned stimuli acquire predictive power not only when
associated with a US, but also when associated with another conditioned stimulus that has
previously been associated with a US. The prediction of reward at various time-points within a
trial, as proposed by the TD model, explains the ability of the organism to predict the US based on
the earliest available CS.
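The transfer of prediction back to the earliest available CS can be illustrated with a tabular TD(0) simulation over time-bins within a trial. The number of bins, the discount factor and the learning rate below are illustrative assumptions.

```python
# TD(0) sketch over time-bins within a trial, in the spirit of the
# Sutton-Barto account. States are time-bins; the US (reward) arrives
# at the final bin. Parameter values are illustrative assumptions.

gamma, lr = 0.9, 0.1   # discount factor and learning rate
n_bins = 5             # CS onset at bin 0, US after bin 4
V = [0.0] * n_bins     # value (reward prediction) per time-bin

for trial in range(500):
    for t in range(n_bins):
        r = 1.0 if t == n_bins - 1 else 0.0      # US at end of trial
        V_next = V[t + 1] if t + 1 < n_bins else 0.0
        delta = r + gamma * V_next - V[t]        # TD error δ_t
        V[t] += lr * delta

# After learning, the earliest time-bin carries a temporally
# discounted prediction of the US: V[0] approaches γ^(n_bins − 1).
print([round(v, 2) for v in V])
```

The value propagates backwards across time-bins over trials, so the earliest bin ends up predicting the discounted US, mirroring how phasic responses transfer to the earliest reward-predicting stimulus.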
1.1.2 Approach behaviour and decision making

Rewards act as positive reinforcers by increasing the frequency and intensity of the behaviour that
leads to the acquisition of goal objects (Schultz, 2000). Reinforcers are those objects that increase
the frequency of behaviour. Rewards also act as goals in their own right and can therefore elicit
approach and consummatory behaviour. Omission of reward leads to extinction of behaviour.
Punishment has the opposite motivational valence to reward and decreases the frequency of behaviour.
Avoidance and escape behaviours are negatively reinforced (strengthened) in order to prevent or terminate
a punishment, respectively. These findings were formalised in the law of effect (Thorndike, 1911),
which states that learning occurs only if there is reinforcement. Approach behaviour has been
central to the operational definition of rewards as those objects which subjects will work to acquire
through allocation of time, energy, or effort (McClure, 2003) or in other words, rewards make
subjects come back for more.
In Pavlovian conditioning, the conditioned stimuli elicit responses that help prepare the
animal for the consumption of reward. Konorski (1967) distinguished between preparatory and
consummatory conditioned responses. Preparatory responses (e.g. excitement, approach) depend on
the general motivational attributes of, or emotional responses to, a reinforcer and hence reflect the
general affective value of the reinforcer. Consummatory responses (e.g. pecking, salivation) depend
on the specific sensory attributes of the reinforcer (Mackintosh, 1983). In most experiments, both
preparatory and consummatory conditioning will occur. Therefore, CS will be associated with both
affective and sensory attributes of the US.
In instrumental conditioning, the actions that lead to reward are reinforced. In the real world, an
animal often has more than one action to choose from. The animal is then confronted
with a decision-making situation and will choose those actions that have maximum value.
Reinforcement learning models and their implementations, such as the actor-critic architecture,
provide an account of choice behaviour. An agent (organism) learns to achieve a goal (maximise
reward) by navigating through the space of states (making decisions - the actor) using the reinforcement
signal (updating the value function - the critic). In the temporal difference (TD) model, the TD error
guides the updating of the value function V(S_t) when transitioning from state S_t to state S_{t+1}. Q-learning
and its variants offer estimation of value functions over state-action pairs, so that in a given
state s, the organism chooses the action a that maximises the value Q(s,a). The value
function Q is updated similarly to the TD model.
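A minimal sketch of the Q-learning idea mentioned above, on an assumed one-state, two-action toy task. The task, action names and parameter values are illustrative assumptions, not from the text.

```python
# Q-learning sketch: values over state-action pairs, updated with a
# TD-style error. One-state, two-action toy task (an assumption made
# for illustration); only the 'right' action is rewarded.

import random

random.seed(0)
alpha = 0.2                                   # learning rate
Q = {("s", "left"): 0.0, ("s", "right"): 0.0}

for _ in range(200):
    a = random.choice(["left", "right"])      # explore both actions
    r = 1.0 if a == "right" else 0.0          # reward only for 'right'
    # One-step terminal task, so there is no successor-state value:
    Q[("s", a)] += alpha * (r - Q[("s", a)])

# The organism then chooses the action maximising Q(s, a):
best = max(["left", "right"], key=lambda a: Q[("s", a)])
print(best, round(Q[("s", "right")], 2))
```

After learning, the greedy choice over Q(s, a) selects the rewarded action, which is the decision rule the text describes.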
Box 2 Basic reward parameters: Microeconomic concepts
The influence of rewards on decision-making can be assessed by the basic reward parameters such
as magnitude, probability and delay. Given a choice between different magnitudes or probabilities
of reward, an organism would choose those options with higher magnitude and probability. Smaller
delays to obtaining reward are preferred to longer delays. In models of conditioning, reward value is
expressed as the associative strength that facilitates learning.
The occurrence of rewards is uncertain in the dynamic world, in which both the environment
and the behaviour of other agents render the rewards partly unpredictable. Uncertainty can be in the
expected magnitude of the reward (characterised by the variance) or the probability (p) of the
reward (maximum uncertainty at p = 50%) or the time of delivery of the reward. The uncertainty of
rewards can generate attention that determines learning according to associability learning rules
(Mackintosh, 1975; Pearce and Hall, 1980).
As far back as 1650, Pascal conjectured that human choice behaviour could be understood by
the expected value (the product of probability and magnitude of the reward). Bernoulli (1738/1954)
suggested that the actual value, or utility, that people assign to an outcome depends on the
wealth of the assigning person and grows more slowly than its magnitude. Bernoulli proposed that
increase in magnitude is always accompanied by an increase in the utility, which follows a concave
(more specifically, a logarithmic) function of magnitude. Hence, individuals behave as to maximise
the expected utility, instead of the expected value. Prospect theory (Kahneman and Tversky, 1979)
suggests that not only the perception of magnitude but also the perception of probability is
subjective to the individual.
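The distinction between expected value and expected utility can be made concrete with a small numerical example. The gamble, the sure amount, and the particular logarithmic utility function below are illustrative assumptions.

```python
# Bernoulli's argument in miniature: with a concave (logarithmic)
# utility, a certain outcome can be preferred to a gamble of equal
# expected value. Numbers and utility function are illustrative.

import math

def expected_value(outcomes):
    """Sum of p * x over (probability, magnitude) pairs."""
    return sum(p * x for p, x in outcomes)

def expected_utility(outcomes, u=lambda x: math.log(1 + x)):
    """Sum of p * u(x) with a concave utility function u."""
    return sum(p * u(x) for p, x in outcomes)

gamble  = [(0.5, 0.0), (0.5, 100.0)]   # 50% chance of 100, else nothing
certain = [(1.0, 50.0)]                # 50 for sure

# Equal expected values, but the concave utility favours the sure thing.
print(expected_value(gamble), expected_value(certain))
print(round(expected_utility(gamble), 3), round(expected_utility(certain), 3))
```

The two options tie on expected value, yet the logarithmic utility ranks the certain outcome higher, which is the risk-averse pattern Bernoulli's proposal captures.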
1.1.3 Subjective feelings of pleasure

The common perception of reward associates positive feelings of pleasure and hedonia as one of the
main functions of reward. Pleasure is a subjective feeling as it depends on the motivation of the
organism (wealth, satiety etc) and other available options (contextual effects). Rewards induce
positive emotions (affect). Recent theories (Berridge and Robinson, 2003) have suggested that the
motivational and emotional functions of rewards are dissociable as wanting and liking respectively.
Both the motivational and emotional functions can occur either consciously or unconsciously.
Indeed, wanting can occur without pleasurable liking, as accumulated wealth or satiation can make
the liking fade away.
1.2 Classical reward structures: Neurophysiology

Dopamine neurons of the ventral tegmental area (VTA) and substantia nigra have long been
identified with the processing of rewarding stimuli. Romo and Schultz (1990) have shown that
phasic dopamine responses appeared to be related to the appetitive properties of the object being
touched rather than the object itself. Phasic bursts of dopamine neurons occurred when the monkey's
hand touched a morsel of food but not when the animal's hand touched a wire or other non-food
objects. Dopamine neurons in the substantia nigra pars compacta form part of the nigrostriatal
pathway, project mainly to the caudate and putamen, and are identified strongly with motor
function. More medially, the ventral tegmental area (VTA) projects strongly to the nucleus
accumbens and also to the amygdala and hippocampus (mesolimbic pathway). The mesocortical
pathway from the medial VTA projects to a number of brain structures including the dorsal and ventral
prefrontal cortex. The mesocorticolimbic structures are known to be involved in processing the
reward information.
1.2.1 Dopamine responses related to animal learning theory

Dopamine neurons respond to the sight of primary food reward and to the conditioned stimulus
associated with reward (Ljungberg et al., 1992). However, dopamine responses were not observed to
a light that was not associated with task performance, suggesting that the responses of
dopamine neurons are specific to behaviourally significant rewards. When a stimulus predicting reward is itself preceded by
another stimulus, the phasic activation of dopamine neurons transfers back to this latter stimulus
(Schultz et al., 1993). Thus, dopamine neurons might respond to the earliest reward predicting
stimulus.
Mirenowicz and Schultz (1994) found that dopamine neurons showed a short-latency,
phasic response to unpredicted liquid rewards and during conditioning. After learning, the neuronal
responses occurred at the onset of the conditioned stimulus. When a predicted reward is omitted,
dopamine neurons are depressed time-locked to the usual occurrence of the reward. It is suggested
that the phasic dopamine response might encode the discrepancy between the predicted and the
actual occurrence of the reward (for review see Schultz et al., 1997). More recently, Bayer and
Glimcher (2005) used a regression model whose results were consistent with a temporal difference
model, demonstrating a role for dopamine neurons in signalling positive reward prediction
errors. Hollerman and Schultz (1998) showed that dopamine neurons were activated by rewards
during early trials and the activity progressively reduced as the rewards became more predictable.
Further, these neurons were activated when rewards occurred at unpredicted times and were
depressed when rewards were omitted at predicted times. Thus dopamine neurons encode errors in
prediction of both the occurrence and the time of rewards.
Waelti et al. (2001) used a blocking procedure to show that the responses of dopamine
neurons to conditioned stimuli were governed differentially by the occurrence of reward prediction
errors rather than by stimulus-reward associations alone. Tobler et al. (2003) used a conditioned
inhibition paradigm and showed that, out of 69 dopamine neurons that were strongly activated by
reward-predicting stimuli, 48 showed considerable depressions to conditioned inhibitors,
with minor activations in the remaining neurons. To successfully discriminate between reward-
and non-reward-predicting stimuli, attention must be paid to conditioned excitors as well as
Figure 1-2 Primary target regions of dopamine
Dopamine neurons, named after the neurotransmitter they release with nerve impulses in their projection territories, are located in the midbrain structures substantia nigra (pars compacta) and the medially adjoining ventral tegmental area (VTA). The axons of dopamine neurons project to the striatum (caudate nucleus, putamen and ventral striatum including nucleus accumbens), the dorsal and ventral prefrontal cortex, and a number of other structures.
inhibitors. This indicates differential neural coding of reward prediction and attention.
These findings indicate dopamine responses comply with basic tenets of animal learning
theory and indicate a role for dopamine in reward-based learning, in particular representing reward
prediction errors. Learning rules such as that proposed by Rescorla and Wagner (1972) also explain
greater associative strength for increasing magnitudes of reward. Further, as learning is captured by
the concept of prediction error, increasing the probability of reward should result in smaller
responses to the reward itself and thereby greater responses to the reward-predicting cue. These basic
parameters of reward processing, namely magnitude, probability, expected value and uncertainty
have been fundamental concepts of microeconomics.
Two reports by Schultz and colleagues (Fiorillo et al., 2003; Tobler et al., 2005) have shown
dopamine responses to magnitude and probabilities of reward. Fiorillo et al. (2003) found that the
phasic activation of dopamine neurons varied monotonically across the full range of probabilities,
supporting past claims that this response codes the discrepancy between predicted and actual
reward. In addition, a gradual increase in activity until the potential time of reward was observed
that was related to the uncertainty of obtaining a reward. Tobler et al. (2005) found that the phasic
activation of midbrain dopamine neurons showed similar sensitivity to both the magnitude and
probability of reward, and appeared to increase monotonically with expected reward value. Further,
a second form of adaptation observed was a change in the sensitivity, or gain, of neural activity that
appeared to depend on the range of likely reward magnitudes.
Figure 1-3 Dopamine responses to basic reward parameters (adapted from Tobler, 2003)
1.2.2 Reward signals in the striatum and orbitofrontal cortex

Hikosaka et al. (1989) showed reward-expectation and reward-delivery related activation in caudate
neurons. The activations were non-selective for how the monkey obtained the reward, i.e., by visual
fixation only, by a saccade, or by a hand movement. Apicella et al (1991) found ventral and dorsal
striatal responses to primary liquid rewards that could be distinguished from movement related
activations in posterior putamen. Neurons that detect rewards are more common in the ventral
striatum than in the caudate nucleus and putamen. Schultz et al. (1992) showed reward-expectation
and reward-delivery related responses in the ventral striatum. Changes in the appetitive value of the
reward liquid modified the magnitude of activations, suggesting a possible relationship to the
hedonic properties of the expected event.
Thorpe et al. (1983) showed that neurons in orbitofrontal cortex responded selectively to
particular foods or aversive stimuli that could not be explained by simple sensory features of the
stimulus. Orbitofrontal neurons tracked whether particular visual stimuli continue to be associated
with reinforcement and the responses reversed when the stimulus contingencies were interchanged.
Critchley and Rolls (1996) found that neuronal responses in orbitofrontal cortex to rewards and reward-
predicting stimuli are reduced with satiation and hence relate to the motivational value rather
than the sensory properties of reward objects. Tremblay and Schultz (2000) showed activation of
orbitofrontal neurons during the expectation of reward; these neurons also detected reward delivery at trial end.
activations also preceded expected drops of liquid delivered outside the task.
The number of possible reward values and stimuli has no absolute limits. However, the
number of neurons and their possible spike outputs are limited. If the neurons' outputs were evenly
allocated for reward values, there would be little discrimination between rewards. Neurons in the
orbitofrontal cortex of the monkey discriminate between different rewards on the basis of their
relative preferences (Tremblay and Schultz, 1999). For example, consider a neuron that is active
when a more preferred reward (such as a piece of apple) is expected rather than a less preferred
reward (such as cereal). The same neuron shows higher activity, in a different trial, when an even
more preferred reward (such as raisin) is expected rather than the previously preferred reward of
apple. Thus, rewards may influence each other, and the value of a reward can depend on other
available rewards. Cromwell and Schultz (2003) have shown that single neurons within the anterior
striatum distinguish between minute differences in reward magnitude.
Cromwell et al. (2005) suggested that the shift in reward processing due to different
preferences of the animal may reflect the adaptation of responses to the current reward distribution.
For linear, monotonic responses, this can be expressed as y = a + b(x − p), where b represents reward
sensitivity, p represents the shift in the current distribution, and a is a constant. It might be
possible that the immediate past experience sets up a prediction about the mean and range of the
future rewards. This prediction would allow the brain to use its full coding potential, thus
optimising its response, only within this distribution.
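A brief numerical sketch of the adaptive code y = a + b(x − p) may help: shifting p to track the current reward distribution lets a response of fixed range cover different sets of rewards. The reward distributions and parameter values below are illustrative assumptions.

```python
# Adaptive linear coding sketch: y = a + b(x − p), as in the text.
# Shifting p to the mean of the current reward distribution recentres
# the response range. Distributions and parameters are illustrative.

def response(x, a=0.5, b=1.0, p=0.0):
    """Linear, monotonic neural response to reward magnitude x."""
    return a + b * (x - p)

small_rewards = [0.1, 0.2, 0.3]
large_rewards = [1.1, 1.2, 1.3]

# Without adaptation, large rewards push the response out of the range
# used for small rewards; shifting p to the current mean recentres it.
fixed   = [response(x, p=0.2) for x in large_rewards]
adapted = [response(x, p=1.2) for x in large_rewards]

print([round(r, 2) for r in fixed])
print([round(r, 2) for r in adapted])  # same spread as for small rewards
```

With p adapted to the current mean, the large-reward responses occupy exactly the same output range as the small-reward responses did, which is how a limited spike output can keep discriminating rewards within whatever distribution is currently on offer.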
1.3 fMRI studies of reward processing

Although animal studies provide an unprecedented approach to studying neural mechanisms at the cellular
level, the limited communication and cognitive capabilities of animals restrict the investigation of reward
functions. Early neuroimaging studies have replicated the animal results in human
subjects and extended the view of putative reward-processing neural structures. Presentation of
monetary or liquid rewards and stimuli predicting such rewards activates reward structures
previously characterised in neurophysiological experiments, notably the striatum, orbitofrontal
cortex, amygdala and dopaminergic midbrain. As human blood oxygen level dependent (BOLD)
responses most likely reflect presynaptic inputs to neurons (Logothetis et al. 2001), some of these
activations may be due to the known dopaminergic inputs to these structures.
In a Positron Emission Tomography (PET) study, Thut et al. (1997) have found activation of
left frontal cortex, thalamus and midbrain in a go-no go task using monetary rewards. Arana et al.
(2003) in a PET study used a restaurant task in which subjects considered or chose items from a
menu tailored according to the subjects' preferences. The amygdala and medial orbitofrontal cortex
were activated when considering the appetitive incentive values of foods. Activation in the amygdala
correlated with the subjects' incentive ratings, and activation in the medial orbitofrontal cortex
correlated with the difficulty of the choice being made, suggesting its role in goal selection.
Kirsch et al. (2003) used a differential conditioning paradigm and asked participants to
perform a reaction time task. Participants were rewarded (or not rewarded) with a monetary or
verbal feedback (fast or slow). Activity related to anticipation of reward in substantia nigra and
nucleus accumbens was stronger with highly motivating stimuli (monetary reward) compared to
less motivating ones (verbal feedback).
Gottfried et al. (2003) trained subjects with picture-odour associations while performing a
visuospatial discrimination task. After training, subjects received the same contingencies in two
further sessions. Subjects were fed to satiety selectively on one of the two food-based olfactory
rewards in between the two sessions. Activity in the amygdala and OFC declined for the CS
predicting the devalued odour, while the activity in the ventral striatum, insula and cingulate cortex
not only showed decreased responses to the CS predicting the devalued odour, but also increased
responses to the CS predicting non-devalued odour. Their results suggest that amygdala and OFC
encode the current value of the reward representations accessible to the predictive cues.
Ramnani et al. (2004) trained participants with a Pavlovian conditioning paradigm in which
two conditioned stimuli predicted the occurrence of a 1 pound reward or no reward, respectively.
Participants were then scanned while a few of the trials had their cue-outcome contingencies
reversed. Unexpected rewards evoked activation in the orbitofrontal cortex, frontal pole,
parahippocampal cortex and cerebellum. Unexpected reward failure evoked activity in the frontal
pole and the temporal cortex.
Cox et al. (2005) used a simple card game (guessing whether the number on the back of a
card shown face up was higher or lower than the value shown) to mask awareness of a conditioning
task in which discriminable visual patterns were associated with monetary reward and loss. The
patterns were then presented one at a time without reward or negative feedback. Subjects indicated
their preference when two patterns were presented simultaneously. This procedure allowed the
authors to test brain activations to conditioned stimuli in the absence of explicit reward
anticipation. Activity was observed in the ventral striatum and OFC when reward was compared with
negative feedback. When passively viewing the conditioned stimuli, activation was observed in the
OFC. Thus OFC is involved in representing both rewarding and conditioned stimuli that have
acquired reward value.
O'Doherty et al. (2006) used Pavlovian conditioning and determined subjects' preferences
for five different food flavours that were associated with five fractals. Subjects performed a
keypress indicating the spatial location (left or right) of the fractal. Using a temporal difference
model of learning the value signal, they found that ventral midbrain region showed a linear response
to preferences. However, the ventral striatum showed bivalent responses, with maximal responses
to most and least preferred food, possibly consistent with the suggestions that ventral striatum might
be involved in both appetitive and aversive learning (Jensen et al., 2003; Knutson et al., 2001;
Seymour et al., 2004). Given that no aversive stimuli were used by ODoherty et al. (2006), a
further possibility is that the ventral striatum is coding a relative value of the stimuli rather than the
-
8/12/2019 Introduction to Reward Processing
14/28
Introduction to Reward processing
14
objective value independently of the context in which the stimuli are presented (Cromwell et al.
2005).
Recently, Bray and O'Doherty (2007) used a classical conditioning procedure in which subjects performed a simple spatial identification task to indicate the side (left or right) on which a fractal was presented. Participants received reinforcement on 50% of trials with attractive or unattractive faces. They found significant prediction-error-related activity in the ventral striatum for the attractive compared with unattractive faces. In contrast, the amygdala showed positive correlations with prediction error signals for both attractive and unattractive faces.
1.3.1 Motivational Valence

A number of neuroimaging studies have found distinct neural systems processing reward and punishment information. Delgado et al. (2000) asked participants to guess whether the value of a card was higher or lower than 5. Participants received monetary reward ($1.00), punishment ($0.50) or neutral feedback. They found that the bilateral caudate in the dorsal striatum showed differential activation based on the valence of the feedback. A sharp decrease of the response below baseline was observed after a punishment, while activation was sustained following a reward. Delgado et al. (2004) found that activity in the caudate nucleus was more robust in early phases of learning, and that the reward-feedback signal decreased as learning progressed for well-predicted cues. They suggested that the caudate is involved in the initial acquisition of contingencies through trial-and-error learning, and that its activity is modulated as a function of learning and predictability.
Knutson et al. (2001) showed that anticipation of reward significantly increased activation in the nucleus accumbens, whereas activation in the medial caudate increased in anticipation of both rewards and punishments. Nucleus accumbens activity was also correlated with self-reported happiness. Cues signalled a potential reward ($0.20, $1.00, or $5.00), punishment ($0.20, $1.00, or $5.00) or no monetary outcome. Subjects performed a button press in response to a target to win or avoid losing money, and the task difficulty was adjusted so that subjects would succeed on approximately 66% of target responses.
The lateral area of the orbitofrontal cortex (OFC) is activated following a punishing outcome and the medial OFC following a rewarding outcome (see Elliott et al., 2000, for a review). O'Doherty et al. (2001) used a visual reversal-learning task in which the choice of a correct stimulus led to a probabilistically determined monetary reward and the choice of an incorrect stimulus led to a monetary loss. They found a medial-lateral distinction for rewarding and punishing outcomes, respectively. O'Doherty et al. (2003) used a reversal task in which selection of a correct stimulus led to a 70% probability of receiving monetary reward and a 30% probability of monetary punishment. The incorrect stimulus had the reverse contingency. The reversal occurred on a random trial after a criterion of five selections of the correct stimulus was reached. They found that the ventromedial and orbital PFC are not only involved in representing the valence of outcomes, but also signal subsequent behavioural choice. The anterior insula / caudolateral OFC was related to behavioural choice and was active on trials that required a switch in stimulus choice on the subsequent trial.
Jensen et al. (2003) found ventral striatum activation in anticipation of aversive stimuli (unpleasant cutaneous electrical stimulation) that was not a consequence of relief after the aversive event. Further, the ventral striatum was active regardless of whether there was an opportunity to avoid the stimulus or not. Nitschke et al. (2006) used a passive viewing task with aversive and neutral pictures. They found activation in the dorsal amygdala, anterior insula, dorsal ACC, right DLPFC, and posterior OFC during both anticipation and viewing of aversive pictures. Further, the rostral ACC, the superior sector of the right DLPFC and medial sectors of the OFC were more responsive during anticipation of aversive pictures than in response to them.
The relief obtained by avoidance of an aversive stimulus can itself be a reward. Kim et al. (2006) used a monetary instrumental task in which participants chose between a pair of fractals that marked the onset of four trial types predicting reward, avoided loss, neutral feedback, or no feedback with 60% or 30% probability. They found that medial OFC activity increased after receiving a reward or avoiding a loss, and decreased after failing to obtain a reward or receiving an aversive outcome. These responses cannot be explained as a prediction error, because the activity did not decrease over the course of learning. They also found signed reward prediction error signals in the ventral striatum on reward trials but not on avoidance trials, possibly indicating that monetary loss, a secondary reinforcer, might be processed differently in the ventral striatum from primary reinforcers such as aversive flavours or pain.
1.3.2 Reward prediction errors

Reward responses comply with formalisms of learning theory such as the reward prediction error hypothesis. Berns et al. (2001) delivered fruit juice and water to subjects in a temporally predictable or an unpredictable manner. Unpredictability of rewards resulted in significant activity in the nucleus accumbens and medial orbitofrontal cortex, while predictability resulted in activation predominantly in the superior temporal gyrus. Unlike classical conditioning, the source of prediction in Berns et al. (2001) was the sequence of stimuli. Moreover, subjects' preference for juice or water was reflected in activity in the sensorimotor cortex, but not in reward regions. Pagnoni et al. (2002) demonstrated that activity in the ventral striatum is time-locked to reward prediction errors when juice expected at 4 seconds after a cue-initiated button press was delayed. The finding was not replicated when the juice was replaced by a visual stimulus, indicating that the ventral striatum selectively encodes rewarding events and not any salient stimulus in general.
McClure et al. (2003) used a classical conditioning paradigm to test for temporal prediction errors when juice expected at a 6-second delay after a light cue was delivered only after a further delay of 4 seconds. Thus a negative prediction error would occur for the absence of juice, while a positive prediction error would occur for the unexpected delivery of juice at a later time. They found that both these prediction errors correlated with activity in the left putamen.
The real-time extension of the Rescorla-Wagner learning rule, the temporal difference (TD) model, has been successfully used to explain brain activity in tasks involving prediction errors. O'Doherty et al. (2003) used appetitive conditioning with taste reward. Three fractals were associated with glucose, a neutral taste or no taste. Reward was omitted or unexpectedly delivered on some of the trials. Regression analysis with a temporal difference model revealed significant correlation of activity in the ventral striatum and OFC with the error signal, suggesting their role in reward-related learning. Seymour et al. (2004) used a second-order pain learning task in which two visual cues preceded delivery of high or low pain. While the second cue fully predicted the strength of the subsequently experienced pain, the first cue only allowed a probabilistic prediction. They demonstrated that activity in the ventral striatum and the anterior insula displayed a marked correspondence to the signals predicted by temporal difference models.
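The temporal difference computation that such model-based analyses regress against BOLD activity can be sketched as follows. This is a minimal tabular TD(0) illustration; the state coding, learning rate and discount factor are illustrative assumptions, not the parameters used in the studies above.

```python
import numpy as np

def td_prediction_errors(stimuli, rewards, alpha=0.2, gamma=1.0):
    """Tabular TD(0) learning over the time steps of each trial.

    stimuli: (n_trials, n_steps) array of state indices at each time step
    rewards: (n_trials, n_steps) array of rewards delivered at each step
    Returns learned state values V and the per-step prediction errors
    (deltas), the regressor typically correlated with BOLD responses.
    """
    n_states = int(stimuli.max()) + 1
    V = np.zeros(n_states)                     # value estimate per state
    deltas = np.zeros_like(rewards, dtype=float)
    for t in range(stimuli.shape[0]):          # trials
        for s in range(stimuli.shape[1]):      # time steps within a trial
            state = stimuli[t, s]
            next_v = V[stimuli[t, s + 1]] if s + 1 < stimuli.shape[1] else 0.0
            delta = rewards[t, s] + gamma * next_v - V[state]
            V[state] += alpha * delta          # Rescorla-Wagner-style update
            deltas[t, s] = delta
    return V, deltas
```

On a simple cue-then-reward sequence, the prediction error is large at the reward on early trials and shrinks as the cue comes to predict it, the signature reported for dopaminergic responses.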
Seymour et al. (2007) used a probabilistic Pavlovian task to compare winning or losing money in two conditions in which the alternative was, respectively, winning or losing nothing, or losing or winning money. A positive reward prediction error can be obtained by contrasting the bivalent 1.00 win with the univalent 1.00 outcome. Similarly, a positive loss prediction error is obtained by contrasting the bivalent 1.00 loss with the univalent 1.00 loss. The opposite contrasts would reveal negative prediction errors. They found that striatal activation reflected positively signed prediction errors in an anterior region for rewards and more posteriorly for losses.
1.3.3 Neuroimaging of basic reward parameters

Animal learning theory and microeconomic theory have suggested a number of basic reward parameters, such as magnitude, probability and delay, that are involved in processing reward information. In a parametric study, Elliott et al. (2003) found non-linear responses in the orbitofrontal cortex with increasing magnitudes of financial reward. They parametrically varied the monetary reward value (10, 20, 50 pence and 1 pound) while subjects performed a simple target detection task. They found that the amygdala, striatum and dopaminergic midbrain responded regardless of the reward value, while the medial and lateral OFC responded non-linearly, showing maximum responses for the lowest and highest values. Galvan et al. (2005) used a delayed-response spatial-choice task in which subjects were presented with small, medium or large amounts of coins. The exact value of each reward was not disclosed, to prevent subjects from counting the total money after each trial. They found reward-magnitude-related responses in the nucleus accumbens, thalamus and orbitofrontal cortex. Interestingly, only the nucleus accumbens showed a shift in activity from the reward to the predicting cue during later stages of learning. A frontostriatal shift in activity can be suggested (Pasupathy and Miller, 2005), as the OFC responses contrasted with the accumbens activity: responses in the OFC increased to the rewarded response rather than to the predictive cue.
Breiter et al. (2001) showed subjects prospects consisting of a set of three outcomes, one of which was awarded after a delay. Subjects could win or lose money in these prospects. Three kinds of prospects (good: $10, $2.50, $0; intermediate: $2.50, $0, -$1.50; and bad: $0, -$1.50, -$6) were used. Subjects could either win, lose or retain their initial endowment of $50. In the good prospect, subjects could win additional money or retain their earnings; in the bad prospect, subjects could retain or lose money; and in the intermediate prospect, subjects could win, retain or lose money. Haemodynamic responses in the amygdala and orbital gyrus tracked the expected values of the prospects. Sustained outcome-phase responses in the nucleus accumbens, amygdala and hypothalamus were ordered as a function of monetary payoff on the good prospect. They found a large overlap between the neural activations in the prospect and outcome phases and little evidence for anatomical segregation between the two phases. According to decision affect theory (Mellers et al., 1997), responses to a given outcome depend on counterfactual comparisons. Thus $0 on a good prospect will be experienced as a loss, while the same outcome in a bad prospect would be experienced as a win. Partial evidence for this was observed in the time courses of the nucleus accumbens and amygdala for the good and bad prospects, but not for the intermediate prospect.
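That counterfactual comparison can be made concrete with a toy sketch; the linear functional form and the weight `k` here are illustrative assumptions, not Mellers et al.'s fitted model.

```python
def felt_value(outcome, unobtained, k=0.5):
    """Decision-affect-style valuation (simplified sketch): the hedonic
    response to an outcome is its objective value plus a comparison against
    the mean of the unobtained alternatives, weighted by k. The linear form
    and the value of k are assumptions made for illustration."""
    counterfactual = sum(unobtained) / len(unobtained)
    return outcome + k * (outcome - counterfactual)

# The same $0 outcome feels different depending on the prospect it came from:
zero_on_good = felt_value(0, [10, 2.5])   # alternatives were gains -> feels like a loss
zero_on_bad = felt_value(0, [-1.5, -6])   # alternatives were losses -> feels like a win
```

The sign of the felt value flips between the good and bad prospects even though the objective outcome ($0) is identical, which is the pattern described above.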
McClure et al. (2004) examined neural correlates of time discounting while subjects made a series of choices between monetary reward options that varied in delay to delivery (from the same day to six weeks later). They found that the ventral striatum, medial OFC, medial PFC, posterior cingulate and left posterior hippocampus were related to the choice of immediate rewards. In contrast, regions of lateral prefrontal cortex and posterior parietal cortex were engaged uniformly by intertemporal choices irrespective of delay. Recently, McClure et al. (2007) used primary rewards (fruit juice or water) with time delays of minutes instead of weeks and found activation patterns similar to those in their previous study. When the delivery of all rewards was offset by 10 minutes, there was no further differential activity in limbic reward-related areas, suggesting that time discounting is not a relative concept.
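The privileged status of immediate rewards in these findings is often captured by a quasi-hyperbolic ("beta-delta") value function, in which a separate one-off penalty applies to any non-immediate reward. The sketch below uses illustrative parameter values, not the parameters estimated in the studies above.

```python
def quasi_hyperbolic_value(amount, delay_days, beta=0.7, delta=0.99):
    """Quasi-hyperbolic (beta-delta) discounting: an immediate reward keeps
    its full value; any delayed reward is first scaled by beta (a one-off
    immediacy penalty) and then exponentially by delta per day of delay.
    The beta and delta values here are illustrative assumptions."""
    if delay_days == 0:
        return float(amount)
    return beta * (delta ** delay_days) * amount

# $20 today is preferred to $25 in a week under these parameters, yet adding
# a common 10-day offset to both options reverses the preference, because
# neither option is immediate any more.
now = quasi_hyperbolic_value(20, 0)
week = quasi_hyperbolic_value(25, 7)
```

The preference reversal under a common delay is the behavioural signature that exponential discounting alone cannot produce, which is why a distinct immediacy-sensitive component was proposed.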
Dreher et al. (2006) used three slot machines with two different reward values ($10, $20) and reward probabilities (0.25, 0.5), such that one pair of slot machines was matched for expected value. To avoid counterfactual comparison, a common outcome of no reward with a probability of 1 served as a fourth slot machine. They found that the midbrain region responded transiently to higher reward probability at the time of the cue and to lower reward probability at the time of the reward outcome, and in a sustained fashion to reward uncertainty during the delay period. These results parallel those found in electrophysiological studies of primates (Fiorillo et al., 2003). The midbrain activations could not be explained by an increase in expected value alone: when comparing the two conditions with equal expected values, the midbrain was robustly activated in anticipation of an uncertain reward (50% probability) with low magnitude ($10) compared with a reward of known lower probability (25%) but higher magnitude ($20). A frontal network covaried with the reward prediction error signal both at the time of the cue and at the time of the outcome. The ventral striatum (putamen) showed sustained activation that covaried with maximum reward uncertainty during reward anticipation. These results suggest distinct functional networks encoding statistical properties of reward information.
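The expected-value matching and the uncertainty manipulation in this design can be made concrete with a small calculation, here using outcome entropy as the uncertainty measure (the helper function itself is just an illustration, not the authors' analysis code).

```python
import math

def ev_and_entropy(magnitude, p):
    """Expected value and outcome entropy (in bits) of a binary gamble that
    pays `magnitude` with probability p and nothing otherwise. Entropy is
    maximal at p = 0.5, where the outcome is least predictable."""
    ev = p * magnitude
    if p in (0.0, 1.0):
        entropy = 0.0  # fully predictable outcome carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return ev, entropy

# The matched pair: equal expected value ($5), unequal uncertainty --
# the 50%/$10 machine is maximally uncertain, the 25%/$20 machine less so.
low_mag = ev_and_entropy(10, 0.5)
high_mag = ev_and_entropy(20, 0.25)
```

Because the two machines have identical expected value but different entropy, any differential response between them can be attributed to uncertainty rather than value, which is the logic of the comparison described above.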
Recently, Liu et al. (2007) used a monetary decision-making task in which participants chose whether to bank or bet a certain number of chips. A decision to bank, or losing the bet, made them start over from one chip, while the wager was doubled if they won the bet. However, participants witnessed the outcome even after they banked. They contrasted three reward processes: reward anticipation (bet vs. bank), outcome monitoring (win vs. loss) and choice evaluation (right vs. wrong). They found that the striatum and medial/middle orbitofrontal cortex were activated by positive reward anticipation, winning outcomes and the evaluation of right choices, whereas lateral
performance and minimal when receipt of money was unrelated to the task. They found that behaviourally salient monetary rewards activate the human striatum, suggesting its role in the saliency of rewards rather than their value or hedonic feelings. Tricomi et al. (2004) reported that the caudate nucleus was robustly activated when subjects believed that whether they won or lost money was contingent on their button press (i.e., on action). Elliott et al. (2004) investigated whether neural responses to financial reward depend on instrumental action, using a 2x2 factorial design with movement and reward as the two factors. Subjects performed a simple target detection task. The trial types were indicated by coloured squares, hence rewards were fully predictable and reward expectation remained fixed. Significant enhancement of the reward-related response under the movement condition was seen in the dopaminergic midbrain, dorsal striatum and amygdala.
1.4 Rationale

The recently developed functional Magnetic Resonance Imaging (fMRI) methods provide a unique opportunity to extend reward work to humans, first by replicating, and thus referencing, the reward work done in monkeys, and then by investigating typically 'human' tasks that are difficult to approach in animals.
As mentioned earlier, rewards have schematically three functions: they induce learning, approach behaviour, and positive emotions. The first of these functions (learning) can be well investigated in animals, for example using classical (Pavlovian) and instrumental (operant) conditioning. The second function (approach behaviour) can also be investigated in animals, but this work is constrained by their limited communication and cognitive abilities. The third function (subjective feelings of pleasure) is very difficult to investigate in animals, and humans appear to be the subjects of choice.
Monetary rewards are uniquely human. The importance of money in everyday life makes it a strong reinforcer. Neurophysiological studies in animals have provided the primary basis for speculations about the brain areas that might process reward information in the human brain. Initial neuroimaging studies in humans using Positron Emission Tomography revealed that alphanumerically presented monetary reward was more reinforcing than positive reinforcement with the word 'OK' in the dorsolateral and orbital frontal cortex, midbrain and thalamus (Thut et al., 1997). The success of fMRI in studying reward processing in humans rested on obtaining measurable BOLD signal changes in the orbitofrontal cortex (OFC), amygdala and ventral striatum/nucleus accumbens (see McClure et al., 2004, for a review), regions that have previously been implicated in reward processing in non-human primates. A wide range of rewarding stimuli, including primary rewards (liquids, smells, sexual stimuli), abstract rewards (money, positive reinforcement) and social rewards (beautiful faces, pleasant touch), activate the same network of brain areas. The findings from numerous animal and human studies have led researchers to suggest the roles that different brain areas might play in processing reward information. The midbrain and ventral striatum might be involved in reward prediction error, while the orbitofrontal cortex might be involved in evaluating rewards and in the relative processing of rewards. The amygdala, though traditionally believed to process aversive and fear-inducing stimuli, is now generally believed to be involved in processing reinforcer intensity, both appetitive and aversive.
BibliographyAdcock, R., Thangavel, A., Whitfield-Gabrieli, S.,
Knutson, B., and Gabrieli, J. (2006). Reward-motivated learning: mesolimbic activation precedesmemory formation. Neuron, 50:507517.
Anderson, A., Christoff, K., Stappen, I., Panitz, D.,Ghahremani, D., Glover, G., Gabrieli, J., and Sobel,N. (2003). Dissociated neural representations ofintensity and valence in human olfaction. Nat.Neurosci., 6:196202.
Apicella, P., Ljungberg, T., Scarnati, E., and Schultz,W. (1991). Responses to reward in monkey dorsal
and ventral striatum. Exp Brain Res, 85(3):491500.Arana, F., Parkinson, J., Hinton, E., Holland, A.,
Owen, A., and Roberts, A. (2003). Dissociablecontributions of the human amygdala andorbitofrontal cortex to incentive motivation andgoal selection. J. Neurosci., 23:96329638.
Bandettini, P.A. (1994). Magnetic resonance imagingof human brain activation using endogenoussusceptibility contrast, PhD Thesis, MedicalCollege of Wisconsin.
Bandettini, P., Wong, E., Jesmanowicz, A., Hinks, R.,and Hyde, J. (1994). Spin-echo and gradient-echo
EPI of human brain activation using BOLDcontrast: a comparative study at 1.5 T. NMRBiomed, 7:1220.
Bayer, H. and Glimcher, P. (2005). Midbrain dopamineneurons encode a quantitative reward predictionerror signal. Neuron, 47:129141.
Beaver, JD, Lawrence, AD, van Ditzhuijzen, J, Davis,MH, Woods, A, Calder, AJ (2006). Individualdifferences in reward drive predict neural responsesto images of food. J. Neurosci., 26, 19:5160-6.
Bensafi, M, Sobel, N, Khan, RM (2007). Hedonic-specific activity in piriform cortex during odorimagery mimics that during odor perception. J.
Neurophysiol., 98, 6:3254-62.
Berger, TW, Alger, B, Thompson, RF (1976).Neuronal substrate of classical conditioning in thehippocampus. Science, 192, 4238:483-5.
Bernoulli, D. (1738/1954). Exposition of a new theoryon the measurement of risk. Econometrica, 22, 23-36 (translated from latin).
Berns, G. (1999). Functional neuroimaging. Life Sci.,65:2531 2540.
Berns, G., McClure, S., Pagnoni, G., and Montague, P.(2001). Predictability modulates human brainresponse to reward. J. Neurosci., 21:27932798.
Berridge, K. and Robinson, T. (1998). What is the roleof dopamine in reward: hedonic impact, rewardlearning, or incentive salience? Brain Res. BrainRes. Rev., 28:309369.
Berridge, K. and Robinson, T. (2003). Parsing reward.Trends Neurosci., 26:507513.
Boser, B.E., Guyon, I., and Vapnik, V. (1992). Atraining algorithm for optimal margin classifiers. InProceedings of the Fifth Annual Workshop onComputational Learning Theory, (ACM Press) pp.144152.
Bowman, C. and Turnbull, O. (2003). Real versusfacsimile reinforcers on the Iowa Gambling Task.
Brain Cogn, 53:207210.Boynton, G., Engel, S., Glover, G., and Heeger, D.(1996). Linear systems analysis of functionalmagnetic resonance imaging in human V1. J.Neurosci., 16:42074221.
Bray, S. and ODoherty, J. (2007). Neural coding ofreward-prediction error signals during classicalconditioning with attractive faces. J. Neurophysiol.,97:30363045.
Bray, S., Shimojo, S., and ODoherty, J. (2007). Directinstrumental conditioning of neural activity usingfunctional magnetic resonance imaging-derivedreward feedback. J. Neurosci., 27:74987507.
-
8/12/2019 Introduction to Reward Processing
22/28
Introduction to Reward processing
22
Breiter, H., Aharon, I., Kahneman, D., Dale, A., andShizgal, P. (2001). Functional imaging of neuralresponses to expectancy and experience ofmonetary gains and losses. Neuron, 30:619639.
Brett, M., Leff, A., Rorden, C., and Ashburner, J.(2001). Spatial normalization of brain images with
focal lesions using cost function masking.Neuroimage, 14:486500.Bunzeck, N. and Duzel, E. (2006). Absolute coding of
stimulus novelty in the human substantianigra/VTA. Neuron, 51:369379.
Buxton, R., Wong, E., and Frank, L. (1998). Dynamicsof blood flow and oxygenation changes duringbrain activation: the balloon model. Magn ResonMed, 39:855864.
Camerer, C.F., Hogarth, R.M. (1999). The Effects ofFinancial Incentives in Experiments: A Review andCapital-Labor-Production Framework. Journal ofRisk and Uncertainty, 19:742.
Carlson, TA, Schrater, P, He, S (2003). Patterns ofactivity in the categorical representations of objects.J Cogn Neurosci, 15, 5:704-17.
Chein, J. M. and Schneider, W. (2003). Designingeffective fMRI experiments. In Grafman, J. andRobertson, I., editors, Handbook ofNeuropsychology. Elsevier Science B.V.,Amsterdam.
Cox, DD, Savoy, RL (2003). Functional magneticresonance imaging (fMRI) "brain reading":detecting and classifying distributed patterns offMRI activity in human visual cortex. Neuroimage,19, 2 Pt 1:261-70.
Childress, A.R., Franklin, T., Listerud, J., Acton, P.D.,and OBrien, C.P. (2002). Neuroimaging of cocainecraving states: cessation, stimulant administration,and drug cue paradigms. InNeuropsychopharmacology: a fifth generation ofprogress. K.L. Davis, D. Charney, J.T. Coyle, C.Nemeroff, eds. pp. 575-1590.
Cohen, M. and Bookheimer, S. (1994). Localization ofbrain function using magnetic resonance imaging.Trends Neurosci., 17:268277.
Constable, R. (1995). Functional MR imaging usinggradientecho echo-planar imaging in the presenceof large static field inhomogeneities. J Magn Reson
Imaging, 5:746752.Cox, S., Andrade, A., and Johnsrude, I. (2005).
Learning to like: a role for human orbitofrontalcortex in conditioned reward. J. Neurosci.,25:27332740.
Critchley, H. and Rolls, E. (1996). Hunger and satietymodify the responses of olfactory and visualneurons in the primate orbitofrontal cortex. J.Neurophysiol., 75:16731686.
Cromwell, H. C., Hassani, O. K., and Schultz, W.(2005). Relative reward processing in primatestriatum. Exp Brain Res, 162(4):520525.
Cromwell, H. C. and Schultz, W. (2003). Effects ofexpectations for different reward magnitudes onneuronal activity in primate striatum. JNeurophysiol, 89(5):28232838.
Cusack, R., Russell, B., Cox, S., De Panfilis, C.,Schwarzbauer, C., and Ansorge, R. (2005). An
evaluation of the use of passive shimming toimprove frontal sensitivity in fMRI. Neuroimage,24:8291.
Dadds, M., Bovbjerg, D., Redd, W., and Cutmore, T.(1997). Imagery in human classical conditioning.Psychol Bull, 122:89103.
Dale, A. (1999). Optimal experimental design forevent-related fMRI. Hum Brain Mapp, 8:109114.
DArdenne, K., McClure, S., Nystrom, L., and Cohen,J. (2008). BOLD responses reflecting dopaminergicsignals in the human ventral tegmental area.Science, 319:12641267.
Davatzikos, C, Ruparel, K, Fan, Y, Shen, DG,
Acharyya, M, Loughead, JW, Gur, RC, Langleben,DD (2005). Classifying spatial patterns of brainactivity with machine learning methods: applicationto lie detection. Neuroimage, 28, 3:663-8.
De Houwer, J., Thomas, S., and Baeyens, F. (2001).Associative learning of likes and dislikes: A reviewof 25 years of research on human evaluativeconditioning. Psychological Bulletin, 127, 853869.
Deichmann, R., Gottfried, J., Hutton, C., and Turner,R. (2003). Optimized EPI for fMRI studies of theorbitofrontal cortex. Neuroimage, 19:430441.
Delgado, M., Miller, M., Inati, S., and Phelps, E.(2005). An fMRI study of reward-related
probability learning. Neuroimage, 24:862 873.Delgado, M., Nystrom, L., Fissell, C., Noll, D., and
Fiez, J. (2000). Tracking the hemodynamicresponses to reward and punishment in the striatum.J. Neurophysiol., 84:30723077.
Delgado, M.R., Stenger, V.A. and Fiez, J.A. (2004).Motivation-dependent responses in the humancaudate nucleus. Cereb. Cortex, 14(9):1022-30.
Dematt`e, M., Osterbauer, R., and Spence, C. (2007).Olfactory cues modulate facial attractiveness.Chem. Senses, 32:603610.
Dickinson, A. (1980). Contemporary animal learningtheory. Cambridge: Cambridge University Press.
Djordjevic, J., Zatorre, R., Petrides, M., Boyle, J., andJones-Gotman, M. (2005). Functionalneuroimaging of odor imagery. Neuroimage,24:791801.
Dreher, J., Kohn, P., and Berman, K. (2006). Neuralcoding of distinct statistical properties of rewardinformation in humans. Cereb. Cortex, 16:561573.
Elliott, R., Dolan, R., and Frith, C. (2000). Dissociablefunctions in the medial and lateral orbitofrontalcortex: evidence from human neuroimaging studies.Cereb. Cortex, 10:308317.
Elliott, R., Newman, J., Longe, O., and Deakin, J.(2003). Differential response patterns in the
-
8/12/2019 Introduction to Reward Processing
23/28
Ph.D. Thesis Chapter University of Cambridge Krishna Prasad Miyapuram
23
striatum and orbitofrontal cortex to financial rewardin humans: a parametric functional magneticresonance imaging study. J. Neurosci., 23:303307.
Elliott, R., Newman, J., Longe, O., and WilliamDeakin, J. (2004). Instrumental responding forrewards is associated with enhanced neuronal
response in subcortical reward systems.Neuroimage, 21:984990.Fawcett, T. (2006). An introduction to ROC analysis.
Pattern Recognition Letters, 27: 861-874.Fernie, G. and Tunney, R. (2006). Some decks are
better than others: the effect of reinforcer type andtask instructions on learning in the Iowa GamblingTask. Brain Cogn, 60:94102.
Fiorillo, C., Tobler, P., and Schultz, W. (2003).Discrete coding of reward probability anduncertainty by dopamine neurons. Science,299:18981902.
Friston, K. (1997). Testing for anatomically specified
regional effects. Human Brain Mapping, 5:133136.
Friston, K. J., Ashburner, J., Frith, C. D., Poline, J. B.,Heather, J. D., and Frackowiak, R. S. J. (1995a).Spatial registration and normalisation of images.Human Brain Mapping, 2:165189.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.B., Frith, C. D., and Frackowiak, R. S. J. (1995b).Statistical parametric maps in functional imaging:A general linear approach. Human Brain Mapping,2:189210.
Friston, K., Price, C., Fletcher, P., Moore, C.,Frackowiak, R., and Dolan, R. (1996). The trouble
with cognitive subtraction. Neuroimage, 4:97104.Gallistel, C. (1990). Representations in animal
cognition: an introduction. Cognition, 37:122.Galvan, A., Hare, T., Davidson, M., Spicer, J., Glover,
G., and Casey, B. (2005). The role of ventralfrontostriatal circuitry in rewardbased learning inhumans. J. Neurosci., 25:86508656.
Genovese, C., Lazar, N., and Nichols, T. (2002).Thresholding of statistical maps in functionalneuroimaging using the false discovery rate.NeuroImage, 870878.
Gneezy, U., and Rustichini, A. (2000). Pay Enough orDon't Pay at All. The Quarterly Journal of
Economics, 115:791810.Gottfried, J., Deichmann, R., Winston, J., and Dolan,
R. (2002a). Functional heterogeneity in humanolfactory cortex: an eventrelated functionalmagnetic resonance imaging study. J. Neurosci.,22:10819 10828.
Gottfried, J., ODoherty, J., and Dolan, R. (2002b).Appetitive and aversive olfactory learning inhumans studied using eventrelated functionalmagnetic resonance imaging. J. Neurosci.,22:1082910837.
Gottfried, J., ODoherty, J., and Dolan, R. (2003).Encoding predictive reward value in human
amygdala and orbitofrontal cortex. Science,301:11041107.
Gusnard, D., Raichle, M., and Raichle, M. (2001).Searching for a baseline: functional imaging andthe resting human brain. Nat. Rev. Neurosci.,2:685694.
Hampton, A., Adolphs, R., Tyszka, M., and ODoherty,J. (2007). Contributions of the amygdala to rewardexpectancy and choice signals in human prefrontalcortex. Neuron, 55:545555.
Hassabis, D., Kumaran, D., Vann, S., and Maguire, E.(2007). Patients with hippocampal amnesia cannotimagine new experiences. Proc. Natl. Acad. Sci.U.S.A., 104:17261731.
Haxby, JV, Gobbini, MI, Furey, ML, Ishai, A,Schouten, JL, Pietrini, P (2001). Distributed andoverlapping representations of faces and objects inventral temporal cortex. Science, 293, 5539:2425-30.
Haynes, JD, Rees, G (2006). Decoding mental statesfrom brain activity in humans. Nat. Rev. Neurosci.,7, 7:523-34.
Heeger, D. and Ress, D. (2002). What does fMRI tellus about neuronal activity? Nat. Rev. Neurosci.,3:142151.
Henson, R., Rugg, M., and Friston, K. (2001). Thechoice of basis functions in event-related fMRI.NeuroImage, 13(6):127. Supplement 1.
Hikosaka, O., Sakamoto, M., and Usui, S. (1989).Functional properties of monkey caudate neurons.III. Activities related to expectation of target andreward. J. Neurophysiol., 61:814832.
Holland, P. (1990). Event representation in Pavlovianconditioning: image and action. Cognition, 37:105131.
Hollerman, J. R. and Schultz, W. (1998). Dopamineneurons report an error in the temporal prediction ofreward during learning. Nat Neurosci, 1(4):304309.
Holmes, A. and Friston, K. (1998). Generalisability,random effects and population inference. InNeuroImage, volume 7, page S754.
Holt, C.A., and Laury, S.K. (2002). Risk Aversion andIncentive Effects. The American Economic Review92, 1644-1655.
Horvitz, J. (2000). Mesolimbocortical and nigrostriataldopamine responses to salient non-reward events.Neuroscience, 96:651656.
Hulvershorn, J., Bloy, L., Gualtieri, E., Leigh, J., andElliott, M. (2005). Spatial sensitivity and temporalresponse of spin echo and gradient echo boldcontrast at 3 T using peak hemodynamic activationtime. Neuroimage, 24:216223.
Ishai, A., Ungerleider, L., and Haxby, J. (2000).Distributed neural systems for the generation ofvisual images. Neuron, 28:979990.
Jensen, J., McIntosh, A., Crawley, A., Mikulis, D.,Remington, G., and Kapur, S. (2003). Direct
activation of the ventral striatum in anticipation of aversive stimuli. Neuron, 40:1251–1257.
Johnson, M. and Bickel, W. (2002). Within-subject comparison of real and hypothetical money rewards in delay discounting. J. Exp. Anal. Behav., 77:129–146.
Kahneman, D. and Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica, 47:263–291.
Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In B. A. Campbell and R. M. Church (eds.), Punishment and aversive behavior, 279–296, New York: Appleton-Century-Crofts.
Kamitani, Y. and Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nat. Neurosci., 8(5):679–685.
Kim, H., Shimojo, S., and O'Doherty, J. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol., 4:e233.
King, D. (1973). An image theory of classical conditioning. Psychol. Rep., 33:403–411.
King, D. (1974). An image theory of instrumental conditioning. Psychol. Rep., 35:1115–1122.
Kirsch, P., Schienle, A., Stark, R., Sammer, G., Blecker, C., Walter, B., Ott, U., Burkart, J., and Vaitl, D. (2003). Anticipation of reward in a nonaversive differential conditioning paradigm and the brain reward system: an event-related fMRI study. Neuroimage, 20:1086–1095.
Knutson, B., Adams, C., Fong, G., and Hommer, D. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neurosci., 21:RC159.
Knutson, B. and Cooper, J. (2005). Functional magnetic resonance imaging of reward prediction. Curr. Opin. Neurol., 18:411–417.
Knutson, B., Taylor, J., Kaufman, M., Peterson, R., and Glover, G. (2005). Distributed neural representation of expected value. J. Neurosci., 25:4806–4812.
Kobayashi, M., Takeda, M., Hattori, N., Fukunaga, M., Sasabe, T., Inoue, N., Nagai, Y., Sawada, T., Sadato, N., and Watanabe, Y. (2004). Functional imaging of gustatory perception and imagery: top-down processing of gustatory signals. Neuroimage, 23:1271–1282.
Konorski, J. (1967). Integrative action of the brain. Chicago: University of Chicago Press.
Kosslyn, S. (1988). Aspects of a cognitive neuroscience of mental imagery. Science, 240:1621–1626.
Kosslyn, S., Ganis, G., and Thompson, W. (2001). Neural foundations of imagery. Nat. Rev. Neurosci., 2:635–642.
Kosslyn, S., Shin, L., Thompson, W., McNally, R., Rauch, S., Pitman, R., and Alpert, N. (1996). Neural effects of visualizing and perceiving aversive stimuli: a PET investigation. Neuroreport, 7:1569–1576.
Kringelbach, M. (2005). The human orbitofrontal cortex: linking reward to hedonic experience. Nat. Rev. Neurosci., 6:691–702.
Kringelbach, M., O'Doherty, J., Rolls, E., and Andrews, C. (2003). Activation of the human orbitofrontal cortex to a liquid food stimulus is correlated with its subjective pleasantness. Cereb. Cortex, 13:1064–1071.
Kringelbach, M. and Rolls, E. (2004). The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog. Neurobiol., 72:341–372.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., and Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. Neuroimage, 26(2):317–329.
Lancaster, J., Woldorff, M., Parsons, L., Liotti, M., Freitas, C., Rainey, L., Kochunov, P., Nickerson, D., Mikiten, S., and Fox, P. (2000). Automated Talairach atlas labels for functional brain mapping. Hum. Brain Mapp., 10:120–131.
Lauterbur, P. C. (1973). Image formation by induced local interactions. Examples employing nuclear magnetic resonance. Nature, 242:190–191.
Le Pelley, M. (2004). The role of associative history in models of associative learning: a selective review and a hybrid model. Q. J. Exp. Psychol. B, 57:193–243.
Lin, H.-T., Lin, C.-J., and Weng, R. C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276.
Lisman, J. and Grace, A. (2005). The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron, 46:703–713.
Liu, X., Powell, D., Wang, H., Gold, B., Corbly, C., and Joseph, J. (2007). Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. J. Neurosci., 27:4587–4597.
Ljungberg, T., Apicella, P., and Schultz, W. (1992). Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol., 67(1):145–163.
Logothetis, N. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 357:1003–1037.
Logothetis, N., Pauls, J., Augath, M., Trinath, T., and Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412:150–157.
Mackintosh, N. J. (1975). A theory of attention: variations in the associability of stimuli with reinforcement. Psychological Review, 82:276–298.
Mackintosh, N. J. (1983). Conditioning and associative learning. Oxford: Oxford University Press.
Maldjian, J., Laurienti, P., Kraft, R., and Burdette, J. (2003). An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage, 19:1233–1239.
Mansfield, P. (1977). Multi-planar image formation using NMR spin echoes. J. Phys. C, 10:L55–L58.
McClure, S. M. (2003). Reward prediction errors in human brain. PhD Thesis, Baylor College of Medicine.
McClure, S., Berns, G., and Montague, P. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38:339–346.
McClure, S., Ericson, K., Laibson, D., Loewenstein, G., and Cohen, J. (2007). Time discounting for primary rewards. J. Neurosci., 27:5796–5804.
McClure, S., Laibson, D., Loewenstein, G., and Cohen, J. (2004a). Separate neural systems value immediate and delayed monetary rewards. Science, 306:503–507.
McClure, S., York, M., and Montague, P. (2004b). The neural substrates of reward processing in humans: the modern role of fMRI. Neuroscientist, 10:260–268.
Mechelli, A., Price, C., Friston, K., and Ishai, A. (2004). Where bottom-up meets top-down: neuronal interactions during perception and imagery. Cereb. Cortex, 14:1256–1265.
Mellers, B. A., Schwartz, A., Ho, K., and Ritov, I. (1997). Decision affect theory: emotional reactions to the outcomes of risky options. Psychological Science, 8(6):423–429.
Miller, G. (2007). Neurobiology. A surprising connection between memory and imagination. Science, 315:312.
Mirenowicz, J. and Schultz, W. (1994). Importance of unpredictability for reward responses in primate dopamine neurons. J. Neurophysiol., 72(2):1024–1027.
Mirenowicz, J. and Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564):449–451.
Mourão-Miranda, J., Bokde, A. L., Born, C., Hampel, H., and Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage, 28(4):980–995.
Murray, E. (2007). The amygdala, reward and emotion. Trends Cogn. Sci., 11:489–497.
Nichols, T., Brett, M., Andersson, J., Wager, T., and Poline, J. (2005). Valid conjunction inference with the minimum statistic. Neuroimage, 25:653–660.
Nitschke, J., Sarinopoulos, I., Mackiewicz, K., Schaefer, H., and Davidson, R. (2006). Functional neuroanatomy of aversion and its anticipation. Neuroimage, 29:106–116.
Norris, D., Zysset, S., Mildner, T., and Wiggins, C. (2002). An investigation of the value of spin-echo-based fMRI using a Stroop color-word matching task and EPI at 3 T. Neuroimage, 15:719–726.
O'Craven, K. and Kanwisher, N. (2000). Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J. Cogn. Neurosci., 12:1013–1023.
O'Doherty, J. (2004). Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol., 14:769–776.
O'Doherty, J. (2007). Lights, camembert, action! The role of human orbitofrontal cortex in encoding stimuli, rewards, and choices. Ann. N. Y. Acad. Sci., 1121:254–272.
O'Doherty, J., Buchanan, T., Seymour, B., and Dolan, R. (2006). Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron, 49:157–166.
O'Doherty, J., Critchley, H., Deichmann, R., and Dolan, R. (2003a). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J. Neurosci., 23:7931–7939.
O'Doherty, J., Dayan, P., Friston, K., Critchley, H., and Dolan, R. (2003b). Temporal difference models and reward-related learning in the human brain. Neuron, 38:329–337.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., and Dolan, R. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304:452–454.
O'Doherty, J., Kringelbach, M., Rolls, E., Hornak, J., and Andrews, C. (2001). Abstract reward and punishment representations in the human orbitofrontal cortex. Nat. Neurosci., 4:95–102.
Ogawa, S., Lee, T., Kay, A., and Tank, D. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. U.S.A., 87:9868–9872.
Ogawa, S., Menon, R., Tank, D., Kim, S., Merkle, H., Ellermann, J., and Ugurbil, K. (1993). Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophys. J., 64:803–812.
Ogawa, S., Tank, D., Menon, R., Ellermann, J., Kim, S., Merkle, H., and Ugurbil, K. (1992). Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging. Proc. Natl. Acad. Sci. U.S.A., 89:5951–5955.
Ojemann, J., Akbudak, E., Snyder, A., McKinstry, R., Raichle, M., and Conturo, T. (1997). Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. Neuroimage, 6:156–167.
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306):1593–1599.
Schwarzbauer, C., Raposo, A., and Tyler, L. K. (2005). Spin-echo fMRI overcomes susceptibility-induced signal losses in the inferior temporal lobes. NeuroImage, 26(S1):802.
Schwarzbauer, C., Mildner, T., Heinke, W., Zysset, S., Deichmann, R., Brett, M., and Davis, M. H. (2006). Spin-echo EPI: the method of choice for fMRI of brain regions affected by magnetic field inhomogeneities? Human Brain Mapping, Abstract No. 1049.
Seymour, B., Daw, N., Dayan, P., Singer, T., and Dolan, R. (2007a). Differential encoding of losses and gains in the human striatum. J. Neurosci., 27:4826–4831.
Seymour, B., O'Doherty, J., Dayan, P., Koltzenburg, M., Jones, A., Dolan, R., Friston, K., and Frackowiak, R. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429:664–667.
Seymour, B., Singer, T., and Dolan, R. (2007b). The neurobiology of punishment. Nat. Rev. Neurosci., 8:300–311.
Shafir, E., Diamond, P. A., and Tversky, A. (1997). On money illusion. Quarterly Journal of Economics, 112:341–374.
Simmons, W., Martin, A., and Barsalou, L. (2005). Pictures of appetizing foods activate gustatory cortices for taste and reward. Cereb. Cortex, 15:1602–1608.
Stark, C. E. and Squire, L. R. (2001). When zero is not zero: the problem of ambiguous baseline conditions in fMRI. Proc. Natl. Acad. Sci. U.S.A., 98(22):12760–12766.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.
Sutton, R. and Barto, A. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev., 88:135–170.
Sutton, R. S. and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore (eds.), Learning and computational neuroscience: foundations of adaptive networks, 497–537, Boston: MIT Press.
Talairach, J. and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human Brain. Thieme, New York.
Talmi, D., Seymour, B., Dayan, P., and Dolan, R. (2008). Human Pavlovian-instrumental transfer. J. Neurosci., 28:360–368.
Thorndike, E. L. (1911). Animal intelligence: experimental studies. New York: Macmillan.
Thorpe, S., Rolls, E., and Maddison, S. (1983). The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp. Brain Res., 49:93–115.
Thut, G., Schultz, W., Roelcke, U., Nienhusmeier, M., Missimer, J., Maguire, R., and Leenders, K. (1997). Activation of the human brain by monetary reward. Neuroreport, 8:1225–1228.
Tiggemann, M. and Kemps, E. (2005). The phenomenology of food cravings: the role of mental imagery. Appetite, 45(3):305–313.
Tobler, P. (2003). Coding of basic reward parameters by dopamine neurons. PhD Thesis, University of Cambridge.
Tobler, P., Dickinson, A., and Schultz, W. (2003). Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci., 23:10402–10410.
Tobler, P., Fiorillo, C., and Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307:1642–1645.
Tobler, P., Fletcher, P., Bullmore, E., and Schultz, W. (2007a). Learning-related human brain activations reflecting individual finances. Neuron, 54:167–175.
Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. J. Neurophysiol., 95:301–310.
Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2007b). Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol., 97:1621–1632.
Tremblay, L., Hollerman, J. R., and Schultz, W. (1998). Modifications of reward expectation-related neuronal activity during learning in primate striatum. J. Neurophysiol., 80(2):964–977.
Tremblay, L. and Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729):704–708.
Tremblay, L. and Schultz, W. (2000a). Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J. Neurophysiol., 83(4):1877–1885.
Tremblay, L. and Schultz, W. (2000b). Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J. Neurophysiol., 83(4):1864–1876.
Tricomi, E., Delgado, M., and Fiez, J. (2004). Modulation of caudate activity by action contingency. Neuron, 41:281–292.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., and Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1):273–289.
Valentin, V., Dickinson, A., and O'Doherty, J. (2007). Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci., 27:4019–4026.
Vohs, K., Mead, N., and Goode, M. (2006). The psychological consequences of money. Science, 314:1154–1156.
Waelti, P., Dickinson, A., and Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412(6842):43–48.
Winston, J., Gottfried, J., Kilner, J., and Dolan, R. (2005). Integrated neural representations of odor intensity and affective valence in human amygdala. J. Neurosci., 25:8903–8907.
Wise, R. (2002). Brain reward circuitry: insights from unsensed incentives. Neuron, 36:229–240.
Wise, R. (2004). Dopamine, learning and motivation. Nat. Rev. Neurosci., 5:483–494.
Wittmann, B., Schott, B., Guderian, S., Frey, J., Heinze, H., and Duzel, E. (2005). Reward-related fMRI activation of dopaminergic midbrain is associated with enhanced hippocampus-dependent long-term memory formation. Neuron, 45:459–467.
Worsley, K., Marrett, S., Neelin, P., Vandal, A. C., Friston, K., and Evans, A. C. (1996). A unified statistical approach for determining significant voxels in images of cerebral activation. Human Brain Mapping, 4:58–73.
Yoo, S., Freeman, D., McCarthy, J., and Jolesz, F. (2003). Neural substrates of tactile imagery: a functional MRI study. Neuroreport, 14:581–585.
Zink, C., Pagnoni, G., Chappelow, J., Martin-Skurski, M., and Berns, G. (2006). Human striatal activation reflects degree of stimulus saliency. Neuroimage, 29:977–983.
Zink, C., Pagnoni, G., Martin, M., Dhamala, M., and Berns, G. (2003). Human striatal response to salient nonrewarding stimuli. J. Neurosci., 23:8092–8097.
Zink, C., Pagnoni, G., Martin-Skurski, M., Chappelow, J., and Berns, G. (2004). Human striatal responses to monetary reward depend on saliency. Neuron, 42:509–517.