Introduction to Reward Processing



1 Introduction to Reward Processing

Krishna Prasad Miyapuram
Ph.D. Thesis Chapter, University of Cambridge

1.1 Functions of rewards

Reward-seeking behaviour depends to a large extent on successfully extracting reward information

    from a large variety of environmental stimuli and events. Learning to reliably predict the occurrence

    of rewards such as food enables an organism to prepare behavioural reactions and improve the

    choices that it makes in the future. Learning can be defined as a change in behaviour. Various

    sensory cues from the environment such as sounds, sights and smells that are associated with a

    reward guide the animal to return to the previously experienced reward (Wise, 2002). Thus, one of

    the main functions of rewards is to induce learning, as subjects will come back for more when they

    encounter a reward. Another function of rewards is to induce approach and consummatory

behaviour for acquiring the rewarding object. This is essential for decision making and goal-directed behaviour, as the animal learns to decide the appropriate actions to be executed with

    rewards as goals. The third function of rewards is to induce subjective feelings of pleasure and

    hedonia (positive emotions). This common perception associates rewards primarily with happiness.

    Thus rewards have very basic functions in the life of individuals and are necessary for survival and

    reproduction (survival of genes) (Schultz, 2000, 2004, 2006).

1.1.1 Learning by conditioning

    Reward-directed learning can occur by associating a stimulus with a reward (Pavlovian or classical

    conditioning) or by associating an action with a reward (instrumental or operant conditioning).

    These forms of learning fall under the category of associative learning. More than a century ago,

    Thorndike (1898) argued that learning consists of the formation of connections between stimuli and

    responses and that these connections are formed whenever a response is followed by a reward. This

    kind of learning is called instrumental (or operant) conditioning as the delivery of the reward is

contingent on the response made by the animal. Pavlov (1929) delivered the reward to his subjects independently of the animal's behaviour. Thus, learning in Pavlovian conditioning consisted of


    pairing between a stimulus and a reward. In both kinds of learning an arbitrary, previously neutral

    stimulus (Conditioned Stimulus, CS) acquires the function of a rewarding stimulus after being

    repeatedly associated in time with a rewarding object (Unconditioned Stimulus, US).

    The early definitions of conditioning have emphasised that the temporal contiguity of the CS

    and the US is essential for learning. Modern views of conditioning, however, suggest that the

    pairing or contiguity of two events is neither necessary nor sufficient for learning to occur (see

    Rescorla, 1988 for review). Rather, conditioning depends on the information that the CS provides

Figure 1-1 Learning by classical conditioning. (a) Contiguity requirement: the US needs to follow the CS in a temporally contiguous manner. (b) If the US is delayed after the offset of the CS, the procedure is called trace conditioning. (c) Contingency requirement: for excitatory conditioning, the US should have a higher probability of occurring in the presence of the CS than in its absence. (d) If the CS predicts the omission of a US, it is said to be a conditioned inhibitor. (e) Prediction error: unexpected delivery of reward gives a positive prediction error, while omission of a predicted reward gives a negative prediction error. (f) Higher-order conditioning occurs when a second stimulus (CS2) predicts the occurrence of the first (CS1).


    about the US. More specifically, the US needs to occur more frequently in the presence of the CS as

    compared with its absence. Further, a negative relation between a CS and US can be learned if the

    occurrence of the CS predicts the omission of the US (conditioned inhibition, Rescorla, 1969). This

    suggests that contingency of the US upon occurrence of the CS is crucial for Pavlovian conditioning

(Dickinson, 1980). When a US is fully predicted by a CS, it does not contribute to any further

    learning even if the contiguity and contingency requirements are fulfilled. This phenomenon is

    illustrated by the blocking effect (Kamin, 1969), in which a previously formed association prevents

    or blocks the formation of a new association. Kamin (1969) proposed that the surprise or error in

prediction of the US contributes to learning. Thus, three key factors govern learning by conditioning: contiguity, contingency, and prediction error (Tobler, 2003; Schultz, 2006).

Box 1 Models of conditioning: the role of prediction error

    Prediction error has been fundamental to many models of conditioning. Rescorla and Wagner

    (1972) proposed that repeated pairing of a CS (stimulus A) and a US will result in a gradual

increase in the strength of association (V_A) between them. According to their model, the change in associative strength is

ΔV_A = αβ(λ − V_T)

where the value of λ is set by the magnitude of the US and represents the maximum strength that the CS-US association can achieve, and V_T represents the sum of associative strengths of all stimuli present on the trial. Therefore, the term (λ − V_T) represents the prediction error, which is nothing but the discrepancy between the maximum associative strength and the current prediction. The two learning-rate parameters α and β, with values between 0 and 1 determined by the salience of the CS (stimulus A) and the US respectively, are fixed during conditioning. The Rescorla-Wagner (R-

    W) model can explain the contingency requirement for conditioning by allowing the experimental

    context to be associated with the US like any other CS. Hence if the probability p(US|CS) of the US

    occurring in the presence of the CS is lower than the probability p(US|no CS) of the US occurring

    in the absence of the CS, the associative strength for predicting the US would be greater for the

    experimental context compared to that of the CS (conditioned inhibition). The blocking effect can

    also be explained as the R-W model incorporates the prediction error from the total associative

    strength VT of all stimuli present on a given trial. So a fully predicted US does not generate any

prediction error and hence blocks any further learning by a second stimulus. Despite the limitations


of the R-W model in explaining phenomena such as latent inhibition (the pre-exposure of a CS retards later conditioning of that CS with a US), the prediction error principle remains central to a number of contemporary models of conditioning (see Pearce and Bouton, 2001).
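Since the R-W rule is compact, it can be simulated directly. The following Python sketch (all parameter values are assumed purely for illustration) shows how the summed error term reproduces blocking: once stimulus A alone predicts the US at asymptote, a stimulus X added in compound gains almost no associative strength.

```python
# Minimal Rescorla-Wagner sketch; alpha, beta and lambda values are assumed.
alpha, beta, lam = 0.3, 0.5, 1.0     # CS salience, US salience, US asymptote

def rw_trial(V, stimuli_present):
    """One reinforced trial: update every CS present using the summed error."""
    V_total = sum(V[s] for s in stimuli_present)   # V_T, summed over stimuli
    delta = lam - V_total                          # prediction error (lambda - V_T)
    for s in stimuli_present:
        V[s] += alpha * beta * delta               # Delta-V = alpha*beta*(lambda - V_T)

V = {"A": 0.0, "X": 0.0}
for _ in range(50):                                # phase 1: A alone -> US
    rw_trial(V, ["A"])
for _ in range(50):                                # phase 2: compound AX -> US
    rw_trial(V, ["A", "X"])
print(V)   # V["A"] is near 1.0; V["X"] stays near 0 -- learning to X is blocked
```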

Attentional theories of conditioning have suggested that, in addition to the processing of the US proposed by the Rescorla-Wagner model, the processing of the CS is integral to the process of conditioning (Mackintosh, 1975; Pearce and Hall, 1980). According to Mackintosh (1975), stimuli that generate the least absolute value of prediction error are good predictors of the US and generate maximum attention. The change in associability of a stimulus A is positive if |λ − V_A| < |λ − V_X| and is negative otherwise. Here, V_X is the sum of associative strengths of all stimuli except A. The change in associative strength is given by

ΔV_A = α_A(λ − V_A)

Thus, the Mackintosh model suggests a separable error term, so that the associative change undergone by a CS is influenced by the discrepancy between its own associative strength (V_A) and the outcome (λ). Pearce and Hall (1980) proposed that the associability α_A of a stimulus A on a trial n is determined by the absolute value of the discrepancy on the previous occasion on which stimulus A was presented:

α_A(n) = |λ − V_T|(n−1)

The change in associative strength is determined by

ΔV_A = α_A S_A λ

where S_A denotes the salience of the CS.

The Pearce-Hall model suggests, contrary to the Mackintosh model, that maximum attention (processing of the CS) is generated by stimuli that have generated a prediction error of the US on the previous

    trial. Nevertheless, the attentional theories of conditioning suggest that attention to CS is crucial for

    learning and changes in attentional processing result from absolute prediction errors (see Pearce and

    Bouton, 2001 for a review).
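As a rough, illustrative sketch only (Python; every numerical value below is invented), the two attentional rules can be contrasted side by side: under the Pearce-Hall rule associability tracks the previous trial's absolute prediction error and so decays as the US becomes well predicted, whereas under the Mackintosh rule the stimulus that is currently the best predictor gains associability.

```python
lam = 1.0   # asymptote set by the US

# Pearce-Hall: alpha_A on trial n equals |lambda - V_T| from trial n-1,
# so attention to the CS falls away as the US becomes well predicted.
V, alpha, S = 0.0, 1.0, 0.5           # strength, associability, CS salience
for trial in range(20):
    V += S * alpha * lam              # Delta-V_A = S_A * alpha_A * lambda
    alpha = abs(lam - V)              # associability carried to the next trial
print(round(V, 3), round(alpha, 3))   # V -> 1.0, alpha -> 0.0

# Mackintosh: A gains associability while it predicts the US better than the
# other stimuli (|lambda - V_A| < |lambda - V_X|); learning uses the separable error.
theta, step = 0.2, 0.05
V_A, V_X, alpha_A = 0.0, 0.3, 0.5
for trial in range(40):
    better = abs(lam - V_A) < abs(lam - V_X)
    alpha_A = min(1.0, alpha_A + step) if better else max(0.0, alpha_A - step)
    V_A += theta * alpha_A * (lam - V_A)   # Delta-V_A = theta*alpha_A*(lambda - V_A)
print(round(V_A, 3), round(alpha_A, 3))
```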

    The models of conditioning can be summarised as essentially including two terms that are

combined multiplicatively: CS processing (eligibility) and US processing (reinforcement). While

    the Rescorla-Wagner model proposed that learning is driven entirely by changes in US processing

    in terms of prediction error, the Mackintosh and Pearce-Hall models have emphasised the role of

    CS processing (attention) in terms of change in associability. Le Pelley (2004) has suggested a


    hybrid model integrating these previous models of associative learning. The hybrid model

    distinguishes between attentional associability of the Mackintosh model and the salience

    associability of the Pearce-Hall model and combines them in a multiplicative way along with

a separable error term (e.g. |λ − V_A|) and the summed error term of the Rescorla-Wagner model.

    A real-time extension of the Rescorla-Wagner model is the temporal difference (TD) model

    developed by Sutton and Barto (1981; Sutton, 1988; see Sutton and Barto, 1990 for a review with

    reference to animal learning theories). The advantage of real-time models is that the temporal

    relationship between stimuli within a trial can be captured. An important illustration is the delay

    conditioning procedure. In this procedure, the CS has an onset much earlier than the US and the

    onset of the US is at the offset of the CS or slightly earlier. A further delay between the offset of the

CS and the onset of the US is referred to as the trace conditioning procedure. The time between the onset of the CS and the onset of the US is called the inter-stimulus interval (ISI). The effectiveness of

conditioning is known to decrease for long ISIs (see Sutton and Barto, 1990). This can be explained by

    assuming that the internal representation of CS as perceived by the subject diminishes during the

    ISI. This can be modelled by taking several time-bins within a trial and the CS predicts a temporally

    discounted sum of all future rewards within the trial with more recent time-bins having greater

weight. Thus, a US occurring with a longer ISI is discounted more and hence is less effective in conditioning. For example, using an exponential discounting function with γ as the discount factor, the predicted reward V_t at time t is given by

V_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + γ³ r_{t+4} + …

    The following recursive relationship allows estimation of the current prediction and avoids the

    necessity to wait until all future rewards are received in that trial.

V_t = r_{t+1} + γ V_{t+1}

We can now define the temporal difference error, which must approach zero with learning, as

δ_t = r_{t+1} + γ V_{t+1} − V_t

and the learning is governed by

ΔV_A = α_A (r_{t+1} + γ V_{t+1} − V_t)

where r_{t+1} + γ V_{t+1} takes the role of λ (the asymptotic value of the US) in the Rescorla-Wagner model.

Another important illustration of the use of real-time models such as the TD model is that they can explain higher-order conditioning, in which conditioned stimuli not only acquire predictive power when


associated with a US, but also when associated with another conditioned stimulus that has

    previously been associated with an US. The prediction of reward at various time-points within a

    trial, as proposed by the TD model, explains the ability of the organism to predict the US based on

    the earliest available CS.
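The real-time mechanics can be illustrated with a small simulation (Python; the trial length, discount factor and learning rate below are invented). Each time step within the trial carries its own prediction weight, and repeated application of the TD update moves the reward prediction back towards the earliest time step, mirroring the transfer of responses to the earliest CS:

```python
import numpy as np

T, gamma, lr = 10, 0.95, 0.3        # trial length, discount factor, learning rate
w = np.zeros(T + 1)                  # V_t: one prediction weight per time step
r = np.zeros(T + 1)
r[T] = 1.0                           # CS at t = 0, US (reward) delivered at t = T

for trial in range(300):
    for t in range(T):
        delta = r[t + 1] + gamma * w[t + 1] - w[t]   # delta_t = r_{t+1} + gamma*V_{t+1} - V_t
        w[t] += lr * delta

# After training, w[T-1] is near 1 and w[0] is near gamma**(T-1): the reward
# prediction has propagated back, discounted, to the time of CS onset.
print(w.round(2))
```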

1.1.2 Approach behaviour and decision making

Rewards act as positive reinforcers by increasing the frequency and intensity of the behaviour that

    leads to the acquisition of goal objects (Schultz, 2000). Reinforcers are those objects that increase

    the frequency of behaviour. Rewards also act as goals in their own right and can therefore elicit

    approach and consummatory behaviour. Omission of reward leads to extinction of behaviour.

Punishment has the opposite motivational valence to reward and decreases the frequency of behaviour.

    Avoidance/escape behaviours are negatively reinforced (strengthened) in order to prevent/terminate

a punishment, respectively. These findings have been formalised as the law of effect (Thorndike, 1911), which states that learning occurs only with reinforcement. Approach behaviour has been

    central to the operational definition of rewards as those objects which subjects will work to acquire

through allocation of time, energy, or effort (McClure, 2003); in other words, rewards make

    subjects come back for more.

    In Pavlovian conditioning, the conditioned stimuli elicit responses that help prepare the

    animal for the consumption of reward. Konorski (1967) distinguished between preparatory and

    consummatory conditioned responses. Preparatory responses (e.g. excitement, approach) depend on

    the general motivational attributes of, or emotional responses to, a reinforcer and hence reflect the

    general affective value of the reinforcer. Consummatory responses (e.g. pecking, salivation) depend

    on the specific sensory attributes of the reinforcer (Mackintosh, 1983). In most experiments, both

    preparatory and consummatory conditioning will occur. Therefore, CS will be associated with both

    affective and sensory attributes of the US.

In instrumental conditioning, the actions that lead to reward are reinforced. In the real world, an animal often has more than one action to choose from. The animal is then confronted with a decision-making situation and will choose those actions that have maximum value.

Reinforcement learning models and their implementations such as the actor-critic architecture

    provide an account of choice behaviour. An agent (organism) learns to achieve a goal (maximise

    reward) by navigating through the space of states (making decisions - actor) using the reinforcement

    signal (updating the value function - critic). In the temporal difference (TD) model, the TD error


guides the updating of the value function V(S_t) when transitioning from state S_t to state S_{t+1}. Q-learning

    and its variants have offered estimation of value functions over state-action pairs, so that in a given

    state s, the organism chooses the action a that maximises the value Q(s,a). The updating of value

function Q is done similarly to the TD model.
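A minimal sketch of this scheme in Python (the five-state corridor environment, its reward placement and all parameters are invented for illustration) updates Q(s, a) with a TD-style error and chooses the value-maximising action with occasional exploration:

```python
import random

n_states, n_actions = 5, 2                     # toy corridor; action 1 moves right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(s, a):
    """Assumed transition: action 1 advances one state; reward at the end."""
    s_next = min(s + a, n_states - 1)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

for episode in range(300):
    s = 0
    for _ in range(50):                        # cap the episode length
        if random.random() < epsilon:          # explore occasionally
            a = random.randrange(n_actions)
        else:                                  # otherwise maximise Q(s, a)
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s_next, r = step(s, a)
        # Q-learning update: delta = r + gamma * max_a' Q(s', a') - Q(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
        if s == n_states - 1:
            break
```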

Box 2 Basic reward parameters: microeconomic concepts

    The influence of rewards on decision-making can be assessed by the basic reward parameters such

    as magnitude, probability and delay. Given a choice between different magnitudes or probabilities

    of reward, an organism would choose those options with higher magnitude and probability. Smaller

delays to obtain reward are preferred to longer delays. In models of conditioning, the reward value is expressed as the associative strength that facilitates learning.

    The occurrence of rewards is uncertain in the dynamic world, in which both the environment

    and the behaviour of other agents render the rewards partly unpredictable. Uncertainty can be in the

    expected magnitude of the reward (characterised by the variance) or the probability (p) of the

    reward (maximum uncertainty at p = 50%) or the time of delivery of the reward. The uncertainty of

    rewards can generate attention that determines learning according to associability learning rules

    (Mackintosh, 1975; Pearce and Hall, 1980).

Pascal, as early as 1650, conjectured that human choice behaviour could be understood by the expected value (the product of probability and magnitude of the reward). Bernoulli (1738/1954)

suggested that the actual value or utility that people assign to an outcome depends on the

    wealth of the assigning person and grows more slowly than its magnitude. Bernoulli proposed that

an increase in magnitude is always accompanied by an increase in utility, which follows a concave

(more specifically, a logarithmic) function of magnitude. Hence, individuals behave so as to maximise

    the expected utility, instead of the expected value. Prospect theory (Kahneman and Tversky, 1979)

suggests that not only the perception of magnitude but also the perception of probability is subjective to the individual.
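A small worked example (Python; the wealth and gamble figures are invented) makes the distinction concrete: expected value weighs the raw magnitude by probability, whereas a Bernoulli-style expected utility weighs a logarithmic function of wealth, so the same gamble is worth less to a wealthier chooser.

```python
import math

def expected_value(p, magnitude):
    return p * magnitude                       # Pascal: probability x magnitude

def expected_utility_gain(p, magnitude, wealth):
    # Bernoulli: utility = log(wealth), a concave function of magnitude
    return p * (math.log(wealth + magnitude) - math.log(wealth))

p, magnitude = 0.5, 50.0                       # a 50% chance of winning 50 units
print(expected_value(p, magnitude))            # 25.0, independent of wealth
print(expected_utility_gain(p, magnitude, wealth=100.0))    # larger utility gain
print(expected_utility_gain(p, magnitude, wealth=10000.0))  # smaller when wealthy
```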

1.1.3 Subjective feelings of pleasure

The common perception of reward associates positive feelings of pleasure and hedonia as one of the

    main functions of reward. Pleasure is a subjective feeling as it depends on the motivation of the

organism (wealth, satiety, etc.) and other available options (contextual effects). Rewards induce

    positive emotions (affect). Recent theories (Berridge and Robinson, 2003) have suggested that the


    motivational and emotional functions of rewards are dissociable as wanting and liking respectively.

    Both the motivational and emotional functions can occur either consciously or unconsciously.

Indeed, wanting can occur without pleasurable liking, as accumulated wealth or satiation can cause the liking to fade away.

1.2 Classical reward structures: Neurophysiology

Dopamine neurons of the ventral tegmental area (VTA) and substantia nigra have long been

    identified with the processing of rewarding stimuli. Romo and Schultz (1990) have shown that

    phasic dopamine responses appeared to be related to the appetitive properties of the object being

touched rather than the object itself. Phasic bursts of dopamine neurons occurred when the monkey's

    hand touched a morsel of food but not when the animal's hand touched a wire or other non-food

    objects. Dopamine neurons in the substantia nigra pars compacta form part of the nigrostriatal

pathway and project mainly to the caudate and putamen; this pathway is identified strongly with motor

    function. More medially, the ventral tegmental area (VTA) projects strongly to the nucleus

    accumbens and also to the amygdala and hippocampus (mesolimbic pathway). The mesocortical

pathway from the medial VTA projects to a number of brain structures including the dorsal and ventral

    prefrontal cortex. The mesocorticolimbic structures are known to be involved in processing the

    reward information.

1.2.1 Dopamine responses related to animal learning theory

Dopamine neurons respond to the sight of primary food reward and to the conditioned stimulus

associated with reward (Ljungberg et al., 1992). However, dopamine responses were not observed to

    a light that was not associated with task performance, suggesting the behavioural significance of

    dopamine neurons specific to reward. When a stimulus predicting reward is itself preceded by

    another stimulus, the phasic activation of dopamine neurons transfers back to this latter stimulus

    (Schultz et al., 1993). Thus, dopamine neurons might respond to the earliest reward predicting

    stimulus.

Mirenowicz and Schultz (1994) found that dopamine neurons showed a short-latency, phasic response to unpredicted liquid rewards and during conditioning. After learning, the neuronal

    responses occurred at the onset of the conditioned stimulus. When a predicted reward is omitted,

    dopamine neurons are depressed time-locked to the usual occurrence of the reward. It is suggested

    that the phasic dopamine response might encode the discrepancy between the predicted and the


    actual occurrence of the reward (for review see Schultz et al., 1997). More recently, Bayer and

Glimcher (2005) used a regression model and replicated findings consistent with a temporal difference model, demonstrating a role of dopamine neurons in coding positive reward prediction errors. Hollerman and Schultz (1998) showed that dopamine neurons were activated by rewards

    during early trials and the activity progressively reduced as the rewards became more predictable.

    Further, these neurons were activated when rewards occurred at unpredicted times and were

    depressed when rewards were omitted at predicted times. Thus dopamine neurons encode errors in

    prediction of both the occurrence and the time of rewards.

    Waelti et al. (2001) used blocking procedure to show that the responses of dopamine

    neurons to conditioned stimuli were governed differentially by the occurrence of reward prediction

errors rather than stimulus-reward associations alone. Tobler et al. (2003) used a conditioned inhibition paradigm and showed that, out of 69 dopamine neurons that were strongly activated by reward-predicting stimuli, 48 showed considerable depressions to conditioned inhibitors and the remaining neurons showed minor activations. To be able to successfully discriminate between reward

and non-reward-predicting stimuli, attention must be paid to both conditioned excitors and inhibitors.

Figure 1-2 Primary target regions of dopamine. The dopamine neurons, named after the neurotransmitter they release with nerve impulses in their projection territories, are located in the midbrain structures substantia nigra (pars compacta) and the medially adjoining ventral tegmental area (VTA). The axons of dopamine neurons project to the striatum (caudate nucleus, putamen and ventral striatum including nucleus accumbens), the dorsal and ventral prefrontal cortex, and a number of other structures.


This indicates differential neural coding of reward prediction and attention.

    These findings indicate dopamine responses comply with basic tenets of animal learning

    theory and indicate a role for dopamine in reward-based learning, in particular representing reward

prediction errors. Learning rules such as that proposed by Rescorla and Wagner (1972) also explain greater associative strength for increasing magnitudes of reward. Further, as learning is captured by the concept of prediction error, increasing the probability of reward should result in smaller

    responses to the reward and thereby greater responses to the reward predicting cue. These basic

    parameters of reward processing, namely magnitude, probability, expected value and uncertainty

    have been fundamental concepts of microeconomics.

    Two reports by Schultz and colleagues (Fiorillo et al., 2003; Tobler et al., 2005) have shown

    dopamine responses to magnitude and probabilities of reward. Fiorillo et al. (2003) found that the

    phasic activation of dopamine neurons varied monotonically across the full range of probabilities,

    supporting past claims that this response codes the discrepancy between predicted and actual

    reward. In addition, a gradual increase in activity until the potential time of reward was observed

    that was related to the uncertainty of obtaining a reward. Tobler et al. (2005) found that the phasic

    activation of midbrain dopamine neurons showed similar sensitivity to both the magnitude and

    probability of reward, and appeared to increase monotonically with expected reward value. Further,

a second form of adaptation observed was a change in sensitivity, or gain, of neural activity that appeared to depend on the range of likely reward magnitudes.

    Figure 1-3 Dopamine responses to basic reward parameters (adapted from Tobler, 2003)


1.2.2 Reward signals in the striatum and orbitofrontal cortex

Hikosaka et al. (1989) showed reward-expectation and reward-delivery related activation in caudate

    neurons. The activations were non-selective for how the monkey obtained the reward, i.e., by visual

fixation only, by a saccade, or by a hand movement. Apicella et al. (1991) found ventral and dorsal

    striatal responses to primary liquid rewards that could be distinguished from movement related

    activations in posterior putamen. Neurons that detect rewards are more common in the ventral

    striatum than in the caudate nucleus and putamen. Schultz et al. (1992) showed reward-expectation

    and reward-delivery related responses in the ventral striatum. Changes in the appetitive value of the

    reward liquid modified the magnitude of activations, suggesting a possible relationship to the

    hedonic properties of the expected event.

    Thorpe et al. (1983) showed that neurons in orbitofrontal cortex responded selectively to

    particular foods or aversive stimuli that could not be explained by simple sensory features of the

    stimulus. Orbitofrontal neurons tracked whether particular visual stimuli continue to be associated

    with reinforcement and the responses reversed when the stimulus contingencies were interchanged.

Critchley and Rolls (1996) found that neuronal responses in orbitofrontal cortex to rewards and reward-predicting stimuli are reduced with satiation and hence are related to the motivational value rather

    than the sensory properties of reward objects. Tremblay and Schultz (2000) showed activation in

orbitofrontal neurons related to the expectation of reward; these neurons also detected reward delivery at trial end. The

    activations also preceded expected drops of liquid delivered outside the task.

    The number of possible reward values and stimuli has no absolute limits. However, the

number of neurons and their possible spike outputs is limited. If the neurons' outputs were evenly allocated across all possible reward values, there would be little discrimination between rewards. Neurons in the

    orbitofrontal cortex of the monkey discriminate between different rewards on the basis of their

    relative preferences (Tremblay and Schultz, 1999). For example, consider a neuron that is active

    when a more preferred reward (such as a piece of apple) is expected rather than a less preferred

    reward (such as cereal). The same neuron shows higher activity, in a different trial, when an even

    more preferred reward (such as raisin) is expected rather than the previously preferred reward of

    apple. Thus, rewards may influence each other, and the value of a reward can depend on other

available rewards. Cromwell and Schultz (2003) have shown that single neurons within the anterior striatum distinguish between minute differences in reward magnitude.


Cromwell et al. (2005) suggested that the shift in reward processing due to different

    preferences of the animal may reflect the adaptation of responses to the current reward distribution.

For linear, monotonic responses, this can be expressed as y = a + b(x − p), where b represents reward sensitivity, p represents the shift of the current distribution, and a is a constant. It might be

    possible that the immediate past experience sets up a prediction about the mean and range of the

    future rewards. This prediction would facilitate the brain to use its full coding potential, thus

    optimising its response, only within this distribution.
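As a toy illustration of this adaptive coding (Python; the baseline, gain and distribution means below are invented), shifting p re-centres the same linear response on whatever reward distribution is currently predicted:

```python
a, b = 2.0, 10.0                     # baseline firing rate and reward sensitivity

def response(x, p):
    """Linear adaptive code y = a + b(x - p); p is the predicted mean reward."""
    return a + b * (x - p)

# The same 0.3 ml reward, coded against two different predicted distributions:
print(response(0.3, p=0.1))   # 4.0: strong response when 0.3 exceeds expectation
print(response(0.3, p=0.5))   # 0.0: suppressed when the context predicts more
```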

1.3 fMRI studies of reward processing

Although animal studies provide an unprecedented approach to studying neural mechanisms at the cellular level, animals' limited communication and cognitive capabilities restrict the investigation of reward

    functions in animals. Early neuroimaging studies have replicated the animal results in human

    subjects and extended the view of putative reward-processing neural structures. Presentation of

    monetary or liquid rewards and stimuli predicting such rewards activates reward structures

    previously characterised in neurophysiological experiments, notably the striatum, orbitofrontal

    cortex, amygdala and dopaminergic midbrain. As human blood oxygen level dependent (BOLD)

    responses most likely reflect presynaptic inputs to neurons (Logothetis et al. 2001), some of these

    activations may be due to the known dopaminergic inputs to these structures.

In a Positron Emission Tomography (PET) study, Thut et al. (1997) found activation of

    left frontal cortex, thalamus and midbrain in a go-no go task using monetary rewards. Arana et al.

    (2003) in a PET study used a restaurant task in which subjects considered or chose items from a

menu tailored according to the subjects' preferences. The amygdala and medial orbitofrontal cortex

    were activated when considering appetitive incentive values of foods. Activation in amygdala

correlated with the subjects' incentive ratings and the activation in medial orbitofrontal cortex correlated with the difficulty of the choice being made, suggesting its role in goal selection.

    Kirsch et al. (2003) used a differential conditioning paradigm and asked participants to

    perform a reaction time task. Participants were rewarded (or not rewarded) with a monetary or

    verbal feedback (fast or slow). Activity related to anticipation of reward in substantia nigra and

    nucleus accumbens was stronger with highly motivating stimuli (monetary reward) compared to

    less motivating ones (verbal feedback).

    Gottfried et al. (2003) trained subjects with picture-odour associations while performing a

    visuospatial discrimination task. After training, subjects received the same contingencies in two


    further sessions. Subjects were fed to satiety selectively on one of the two food-based olfactory

    rewards in between the two sessions. Activity in the amygdala and OFC declined for the CS

    predicting the devalued odour, while the activity in the ventral striatum, insula and cingulate cortex

    not only showed decreased responses to the CS predicting the devalued odour, but also increased

responses to the CS predicting the non-devalued odour. Their results suggest that the amygdala and OFC

    encode the current value of the reward representations accessible to the predictive cues.

Ramnani et al. (2004) trained participants with a Pavlovian conditioning paradigm in which two conditioned stimuli predicted the occurrence of a 1-pound reward or no reward, respectively. Participants were then scanned, during which a few of the trials had the cue-outcome contingencies

    reversed. Unexpected rewards evoked activation in the orbitofrontal cortex, frontal pole,

parahippocampal cortex and cerebellum. Unexpected reward failure evoked activity in the frontal pole and the temporal cortex.

    Cox et al. (2005) used a simple card game (guessing whether the number on the back of a

    card shown face up was higher or lower than the value shown) to mask awareness of a conditioning

    task in which discriminable visual patterns were associated with monetary reward and loss. The

    patterns were then presented one at a time without reward or negative feedback. Subjects indicated

    their preference when two patterns were presented simultaneously. This procedure allowed the

authors to test the brain activations to conditioned stimuli in the absence of explicit reward anticipation. Activity was observed in ventral striatum and OFC when reward was compared with

    negative feedback. When passively viewing the conditioned stimuli, activation was observed in the

    OFC. Thus OFC is involved in representing both rewarding and conditioned stimuli that have

    acquired reward value.

O'Doherty et al. (2006) used Pavlovian conditioning and determined subjects' preferences

    for five different food flavours that were associated with five fractals. Subjects performed a

    keypress indicating the spatial location (left or right) of the fractal. Using a temporal difference

learning model of the value signal, they found that the ventral midbrain region showed a linear response

    to preferences. However, the ventral striatum showed bivalent responses, with maximal responses

    to most and least preferred food, possibly consistent with the suggestions that ventral striatum might

    be involved in both appetitive and aversive learning (Jensen et al., 2003; Knutson et al., 2001;

Seymour et al., 2004). Given that no aversive stimuli were used by O'Doherty et al. (2006), a

    further possibility is that the ventral striatum is coding a relative value of the stimuli rather than the


    objective value independently of the context in which the stimuli are presented (Cromwell et al.

    2005).

Recently, Bray and O'Doherty (2007) used a classical conditioning procedure in which subjects performed a simple spatial identification task to indicate the side (left or right) on which a fractal was presented. Participants received reinforcement on 50% of trials with attractive or unattractive faces. They found significant prediction-error-related activity in the ventral striatum for the attractive compared with unattractive faces. In contrast, the amygdala showed positive correlations

    with prediction error signals of both attractive and unattractive faces.

1.3.1 Motivational valence

A number of neuroimaging studies have found distinct neural systems processing reward and punishment information. Delgado et al. (2000) asked participants to guess whether the value of the

    card was higher or lower than 5. Participants received monetary reward ($1.00), punishment ($0.50)

    or neutral feedback. They found that bilateral caudate in the dorsal striatum showed differential

    activation based on the valence of the feedback. A sharp decrease of response below baseline was

observed after a punishment, while the activation was sustained following a reward. Delgado et al.

(2004) found that activity in the caudate nucleus was more robust in early phases of learning and decreased for the reward-feedback signal as learning progressed for the well-predicted cues. They suggest a role for the caudate in the initial acquisition of contingencies by trial-and-error learning, with its activity modulated as a function of learning and predictability.

    Knutson et al. (2001) showed that anticipation of reward significantly increased activation in

    the nucleus accumbens, whereas activation in medial caudate increased in anticipation of both

    rewards and punishments. Nucleus accumbens activity was also correlated with self-reported

    happiness. Cues signalled the potential reward ($0.20, $1.00, or $5.00), punishment ($0.20, $1.00,

or $5.00) or no monetary outcome. Subjects performed a button-press response to a target to win or avoid losing money, and the task difficulty was adjusted so that subjects would succeed on ~66% of target responses.

    The lateral area of orbitofrontal cortex (OFC) is activated following a punishing outcome

and the medial OFC is activated following a rewarding outcome (see Elliott et al., 2000 for a review).

O'Doherty et al. (2001) used a visual reversal-learning task in which the choice of a correct stimulus led to a probabilistically determined monetary reward and the choice of an incorrect stimulus led to a monetary loss. They found a medial-lateral distinction for rewarding and


punishing outcomes, respectively. O'Doherty et al. (2003) used a reversal task in which selection of a correct stimulus led to a 70% probability of receiving monetary reward and a 30% probability of

    monetary punishment. The incorrect stimulus had the reverse contingency. The reversal occurred on

    a random trial after a criterion of five selections of the correct stimulus was reached. They found

    that ventromedial and orbital PFC are not only involved in representing the valence of outcomes,

    but also signal subsequent behavioural choice. The anterior insula / caudolateral OFC was related to

behavioural choice and was active in trials that required a switch in stimulus choice on the subsequent trials.

    Jensen et al. (2003) found ventral striatum activation in anticipation of aversive stimuli

    (unpleasant cutaneous electrical stimulation) that was not a consequence of relief after the aversive

event. Further, the ventral striatum was active regardless of whether there was an opportunity to avoid the stimulus or not. Nitschke et al. (2006) used a passive viewing task of aversive and neutral

    pictures. They found activation in dorsal amygdala, anterior insula, dorsal ACC, right DLPFC, and

    posterior OFC during both anticipation and viewing of aversive pictures. Further, rostral ACC,

superior sector of right DLPFC and medial sectors of OFC were more responsive to the anticipation of aversive pictures than to the pictures themselves.

    The relief obtained by avoidance of an aversive stimulus can itself be a reward. Kim et al.

(2006) used a monetary instrumental task in which participants chose between a pair of fractals that marked the onset of four trial types that predicted reward, loss avoidance, neutral feedback, or no

    feedback with 60% or 30% probability. They found that medial OFC activity increased after

    receiving reward or avoiding loss, and decreased after failing to obtain a reward or receiving an

    aversive outcome. These responses cannot be explained as a prediction error, because the activity

    does not decrease over the course of learning. They also found signed reward prediction error

signals in the ventral striatum on reward trials but not on avoidance trials, possibly indicating that monetary loss, as a secondary reinforcer, might be processed differently in the ventral striatum from primary reinforcers such as aversive flavours or pain.

1.3.2 Reward prediction errors

The reward responses comply with formalisms of learning theory such as the reward prediction error hypothesis. Berns et al. (2001) delivered fruit juice and water to subjects in a temporally

    predictable or an unpredictable manner. Unpredictability of rewards resulted in significant activity

    in nucleus accumbens and medial orbitofrontal cortex, while predictability resulted in activation


    predominantly in the superior temporal gyrus. Unlike classical conditioning, the source of

prediction in Berns et al. (2001) was based on the sequence of stimuli. Moreover, the subjects' preference for juice or water was reflected in the activity in sensorimotor cortex, but not in reward

    regions. Pagnoni et al. (2002) demonstrated that activity in ventral striatum is time-locked to reward

prediction errors when a juice reward expected 4 seconds after a cue-initiated button press was delayed. The finding was not replicated when the juice was replaced by a visual stimulus, indicating that the ventral striatum selectively encodes rewarding events and not any salient stimulus in general.

McClure et al. (2003) used a classical conditioning paradigm to test for temporal prediction errors when a juice reward expected at a 6-second delay after a light cue was delivered only after a further delay of

    4 seconds. Thus a negative prediction error would occur for the absence of juice, while a positive

    prediction error would occur for unexpected delivery of juice at a later time. They found that both

    these prediction errors correlated with activity in the left putamen.

The real-time extension of the Rescorla-Wagner learning rule, i.e. the TD model, has been successfully used to explain brain activity in tasks involving prediction error. O'Doherty et al.

    (2003) used appetitive conditioning with taste reward. Three fractals were associated with glucose,

    neutral taste or no taste. Reward was omitted or unexpectedly delivered in some of the trials.

    Regression analysis with a temporal difference model revealed significant correlation of activity in

    the ventral striatum and OFC with the error signal, suggesting their role in reward-related learning.

Seymour et al. (2004) used a second-order pain learning task in which two visual cues preceded

    delivery of high or low pain. While the second cue fully predicted the strength of the subsequently

experienced pain, the first cue only allowed a probabilistic prediction. They demonstrated that activity in the ventral striatum and the anterior insula displays a marked correspondence to the

    signals predicted by the temporal difference models.

    Seymour et al. (2007) used a probabilistic Pavlovian task to compare winning / losing

    money in two conditions when the alternative was respectively winning / losing nothing, or losing /

winning money. A positive reward prediction error can be obtained by contrasting the bivalent £1.00 win with the univalent £1.00 outcome. Similarly, a positive loss prediction error is obtained by contrasting the bivalent £1.00 loss with the univalent £1.00 loss. The opposite contrasts would reveal negative prediction errors. They found that striatal activation reflected positively signed prediction errors in an anterior region for rewards and more posteriorly for losses.


1.3.3 Neuroimaging of basic reward parameters

Animal learning theory and microeconomic theory have suggested a number of basic reward parameters, such as magnitude, probability, and delay, that are involved in processing reward

    information. In a parametric study, Elliott et al. (2003) have found non-linear responses in

    orbitofrontal cortex with increasing magnitudes of financial reward. They parametrically varied the

    monetary reward value (10, 20, 50 pence and 1 pound) while subjects performed a simple target

    detection task. They found that amygdala, striatum and dopaminergic midbrain responded

    regardless of the reward value, while medial and lateral OFC responded non-linearly showing

    maximum response for the lowest and highest values. Galvan et al. (2005) used a delayed-response

spatial-choice task in which subjects were presented with small, medium or large amounts of coins. The exact value of each reward was not disclosed to the subjects to avoid them counting the total money after each trial. They found reward-magnitude-related responses in the nucleus accumbens,

    thalamus and orbitofrontal cortex. Interestingly, only nucleus accumbens had a shift in activity from

    the reward to the predicting cue during later stages of learning. A frontostriatal shift in activity can

    be suggested (Pasupathy and Miller, 2005) as the OFC responses contrasted with the accumbens

    activity and the responses in OFC increased to the rewarded response rather than to the predictive

    cue.

    Breiter et al. (2001) showed subjects prospects consisting of a set of three outcomes and one

    of these amounts was awarded after a delay. Subjects could win or lose money in these prospects.

    Three kinds of prospects (good: $10, $2.50, $0, intermediate: $2.50, $0, -$1.50 and bad: $0, -$1.50,

    -$6) were used. Subjects could either win, lose or retain their initial endowment of $50. In the good

    prospect, subjects could win additional money or retain their earnings, in the bad prospect subjects

    could retain or lose money and in the intermediate prospect, subjects could win/retain/lose money.

    Haemodynamic responses in the amygdala and orbital gyrus tracked the expected values of the

prospects. Sustained outcome-phase responses in nucleus accumbens, amygdala, and hypothalamus were ordered as a function of monetary payoff on the good prospect. They found a large overlap

    between the neural activations in the prospect and outcome phase and little evidence for anatomical

    segregation between the prospect and outcome phases. According to decision affect theory (Mellers

    et al., 1997), responses to a given outcome depend on counterfactual comparisons. Thus $0 on a

    good prospect will be experienced as a loss and the same outcome in a bad prospect would be

experienced as a win. Partial evidence for this was observed clearly in the time courses of nucleus accumbens and amygdala for the good and bad prospects, but not for the intermediate prospect.


    McClure et al. (2004) examined neural correlates of time discounting while subjects made a

    series of choices between monetary reward options that varied by delay (same day to 6 weeks later)

    to delivery. They found ventral striatum, medial OFC, medial PFC, posterior cingulate and left

posterior hippocampus were related to the choice of immediate rewards. In contrast, regions of lateral prefrontal cortex and posterior parietal cortex were engaged uniformly by inter-temporal choices

    irrespective of delay. Recently, McClure et al. (2007) used primary rewards (fruit juice or water)

    with time delay of minutes instead of weeks and found similar activation patterns as in their

    previous study. When the delivery of all rewards was offset by 10 min, there was no further

differential activity in limbic reward-related areas, suggesting that time discounting is not a relative concept.

Dreher et al. (2006) used three slot machines with two different reward values ($10, $20) and reward probabilities (0.25, 0.5), so that one pair of slot machines had the expected value

    matched. To avoid counterfactual comparison, a common outcome of no reward with a probability

    of 1 served as a fourth slot machine. They found that midbrain region responded transiently to

    higher reward probability at the time of the cue and to lower reward probability at the time of the

    reward outcome and in a sustained fashion to reward uncertainty during the delay period. These

    results parallel those found in electrophysiological studies of primates (Fiorillo et al., 2003). The

    midbrain activations could not be explained by increase in expected value alone, as when

    comparing the two conditions with equal expected values, midbrain activation was robustly

    activated in anticipation of an uncertain reward (50% probability) with low magnitude ($10)

    compared with reward with known lower probability (25%) but higher magnitude ($20). A frontal

    network covaried with the reward prediction error signal both at the time of the cue and at the time

    of the outcome. The ventral striatum (putamen) showed sustained activation that covaried with

    maximum reward uncertainty during reward anticipation. Their results suggest distinct functional

    networks encoding statistical properties of reward information.

    Recently, Liu et al. (2007) used a monetary decision making task in which participants

chose whether to bank or bet a certain number of chips. A decision to bank, or losing the bet, made

    them start over from one chip, while the wager was doubled if they won the bet. However,

participants witnessed the outcome even after they banked. They contrasted three reward processes: reward anticipation (bet vs. bank), outcome monitoring (win vs. loss) and choice evaluation (right vs. wrong). They found that the striatum and medial/middle orbitofrontal cortex were activated by

    positive reward anticipation, winning outcome and evaluation of right choices, whereas lateral

[…]

    performance and minimal when receipt of money was unrelated to the task. They found that

    behaviourally salient monetary rewards activate the human striatum, suggesting its role in saliency

    of rewards rather than value or hedonic feelings. Tricomi et al. (2004) have reported that the

    caudate nucleus was robustly activated when the subjects thought that whether they won or lost

money was contingent on the button press (i.e. action). Elliott et al. (2004) investigated whether the

    neural responses to financial reward depended on instrumental action using a 2x2 factorial design

    consisting of movement and reward as the two factors. Subjects performed a simple target detection

    task. The trial types were indicated by coloured squares and hence rewards were fully predictable

    and reward expectation remained fixed. Significant enhancement of the reward-related response

    under the movement condition was seen in the dopaminergic midbrain, dorsal striatum and the

    amygdala.

1.4 Rationale

The recently developed functional Magnetic Resonance Imaging (fMRI) methods provide a unique

    opportunity to extend reward work to humans, first by replicating, and thus referencing, the reward

    work done in monkeys, and then by investigating typical 'human' tasks that are difficult to approach

    in animals.

    As mentioned earlier, rewards have schematically three functions: they induce learning,

    approach behaviour, and positive emotions. The first of the reward functions (learning) can be well

    investigated in animals, for example using classical (Pavlovian) and instrumental (operant)

    conditioning. The second reward function (approach behaviour) can also be investigated in animals,

    but the work is limited due to their limited communication and cognitive abilities. The third reward

    function (subjective feelings of pleasure) is very difficult to investigate in animals, and humans

    appear to be the subjects of choice.

    Monetary rewards are uniquely human. The importance of money in everyday life makes it

    a strong reinforcer. Neurophysiological studies in animals have provided the primary basis for

    speculations about the brain areas that might process reward information in the human brain. The

    initial neuroimaging studies in humans using Positron Emission Tomography revealed that alpha-

    numerically presented monetary reward was more reinforcing than positive reinforcement of the

word "ok" in the dorsolateral and orbital frontal cortex, midbrain and thalamus (Thut et al., 1997).

The success of fMRI in studying reward processing in humans lay in obtaining measurable BOLD signal

    changes in the orbitofrontal cortex (OFC), amygdala, ventral striatum/nucleus accumbens (see


    McClure et al., 2004 for a review), regions that have previously been implicated in reward

    processing in non-human primates. A wide range of rewarding stimuli including primary rewards

    (liquids, smells, sexual stimuli), abstract rewards (money, positive reinforcement) and social

    rewards (beautiful faces, pleasant touch) activate the same network of brain areas. The findings

from numerous animal and human studies have led researchers to suggest the roles that the different

    brain areas might play in processing reward information. The midbrain and ventral striatum might

    be involved in reward prediction error, while the orbitofrontal cortex might be involved in

    evaluating rewards and relative processing of rewards. The amygdala, though traditionally believed

to process aversive and fear-inducing stimuli, is now generally believed to be involved in processing reinforcer intensity, both appetitive and aversive.

    BibliographyAdcock, R., Thangavel, A., Whitfield-Gabrieli, S.,

    Knutson, B., and Gabrieli, J. (2006). Reward-motivated learning: mesolimbic activation precedesmemory formation. Neuron, 50:507517.

    Anderson, A., Christoff, K., Stappen, I., Panitz, D.,Ghahremani, D., Glover, G., Gabrieli, J., and Sobel,N. (2003). Dissociated neural representations ofintensity and valence in human olfaction. Nat.Neurosci., 6:196202.

    Apicella, P., Ljungberg, T., Scarnati, E., and Schultz,W. (1991). Responses to reward in monkey dorsal

    and ventral striatum. Exp Brain Res, 85(3):491500.Arana, F., Parkinson, J., Hinton, E., Holland, A.,

    Owen, A., and Roberts, A. (2003). Dissociablecontributions of the human amygdala andorbitofrontal cortex to incentive motivation andgoal selection. J. Neurosci., 23:96329638.

    Bandettini, P.A. (1994). Magnetic resonance imagingof human brain activation using endogenoussusceptibility contrast, PhD Thesis, MedicalCollege of Wisconsin.

    Bandettini, P., Wong, E., Jesmanowicz, A., Hinks, R.,and Hyde, J. (1994). Spin-echo and gradient-echo

    EPI of human brain activation using BOLDcontrast: a comparative study at 1.5 T. NMRBiomed, 7:1220.

    Bayer, H. and Glimcher, P. (2005). Midbrain dopamineneurons encode a quantitative reward predictionerror signal. Neuron, 47:129141.

    Beaver, JD, Lawrence, AD, van Ditzhuijzen, J, Davis,MH, Woods, A, Calder, AJ (2006). Individualdifferences in reward drive predict neural responsesto images of food. J. Neurosci., 26, 19:5160-6.

    Bensafi, M, Sobel, N, Khan, RM (2007). Hedonic-specific activity in piriform cortex during odorimagery mimics that during odor perception. J.

    Neurophysiol., 98, 6:3254-62.

    Berger, TW, Alger, B, Thompson, RF (1976).Neuronal substrate of classical conditioning in thehippocampus. Science, 192, 4238:483-5.

    Bernoulli, D. (1738/1954). Exposition of a new theoryon the measurement of risk. Econometrica, 22, 23-36 (translated from latin).

    Berns, G. (1999). Functional neuroimaging. Life Sci.,65:2531 2540.

    Berns, G., McClure, S., Pagnoni, G., and Montague, P.(2001). Predictability modulates human brainresponse to reward. J. Neurosci., 21:27932798.

    Berridge, K. and Robinson, T. (1998). What is the roleof dopamine in reward: hedonic impact, rewardlearning, or incentive salience? Brain Res. BrainRes. Rev., 28:309369.

    Berridge, K. and Robinson, T. (2003). Parsing reward.Trends Neurosci., 26:507513.

    Boser, B.E., Guyon, I., and Vapnik, V. (1992). Atraining algorithm for optimal margin classifiers. InProceedings of the Fifth Annual Workshop onComputational Learning Theory, (ACM Press) pp.144152.

    Bowman, C. and Turnbull, O. (2003). Real versusfacsimile reinforcers on the Iowa Gambling Task.

    Brain Cogn, 53:207210.Boynton, G., Engel, S., Glover, G., and Heeger, D.(1996). Linear systems analysis of functionalmagnetic resonance imaging in human V1. J.Neurosci., 16:42074221.

Bray, S. and O'Doherty, J. (2007). Neural coding of reward-prediction error signals during classical conditioning with attractive faces. J. Neurophysiol., 97:3036-3045.

Bray, S., Shimojo, S., and O'Doherty, J. (2007). Direct instrumental conditioning of neural activity using functional magnetic resonance imaging-derived reward feedback. J. Neurosci., 27:7498-7507.


Breiter, H., Aharon, I., Kahneman, D., Dale, A., and Shizgal, P. (2001). Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron, 30:619-639.

Brett, M., Leff, A., Rorden, C., and Ashburner, J. (2001). Spatial normalization of brain images with focal lesions using cost function masking. Neuroimage, 14:486-500.

Bunzeck, N. and Duzel, E. (2006). Absolute coding of stimulus novelty in the human substantia nigra/VTA. Neuron, 51:369-379.

Buxton, R., Wong, E., and Frank, L. (1998). Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magn Reson Med, 39:855-864.

Camerer, C.F. and Hogarth, R.M. (1999). The Effects of Financial Incentives in Experiments: A Review and Capital-Labor-Production Framework. Journal of Risk and Uncertainty, 19:7-42.

Carlson, T.A., Schrater, P., and He, S. (2003). Patterns of activity in the categorical representations of objects. J Cogn Neurosci, 15(5):704-717.

Chein, J. M. and Schneider, W. (2003). Designing effective fMRI experiments. In Grafman, J. and Robertson, I., editors, Handbook of Neuropsychology. Elsevier Science B.V., Amsterdam.

Cox, D.D. and Savoy, R.L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage, 19(2 Pt 1):261-270.

Childress, A.R., Franklin, T., Listerud, J., Acton, P.D., and O'Brien, C.P. (2002). Neuroimaging of cocaine craving states: cessation, stimulant administration, and drug cue paradigms. In Neuropsychopharmacology: a fifth generation of progress, K.L. Davis, D. Charney, J.T. Coyle, and C. Nemeroff, eds., pp. 1575-1590.

Cohen, M. and Bookheimer, S. (1994). Localization of brain function using magnetic resonance imaging. Trends Neurosci., 17:268-277.

Constable, R. (1995). Functional MR imaging using gradient-echo echo-planar imaging in the presence of large static field inhomogeneities. J Magn Reson Imaging, 5:746-752.

Cox, S., Andrade, A., and Johnsrude, I. (2005). Learning to like: a role for human orbitofrontal cortex in conditioned reward. J. Neurosci., 25:2733-2740.

Critchley, H. and Rolls, E. (1996). Hunger and satiety modify the responses of olfactory and visual neurons in the primate orbitofrontal cortex. J. Neurophysiol., 75:1673-1686.

Cromwell, H. C., Hassani, O. K., and Schultz, W. (2005). Relative reward processing in primate striatum. Exp Brain Res, 162(4):520-525.

Cromwell, H. C. and Schultz, W. (2003). Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol, 89(5):2823-2838.

Cusack, R., Russell, B., Cox, S., De Panfilis, C., Schwarzbauer, C., and Ansorge, R. (2005). An evaluation of the use of passive shimming to improve frontal sensitivity in fMRI. Neuroimage, 24:82-91.

Dadds, M., Bovbjerg, D., Redd, W., and Cutmore, T. (1997). Imagery in human classical conditioning. Psychol Bull, 122:89-103.

Dale, A. (1999). Optimal experimental design for event-related fMRI. Hum Brain Mapp, 8:109-114.

D'Ardenne, K., McClure, S., Nystrom, L., and Cohen, J. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science, 319:1264-1267.

Davatzikos, C., Ruparel, K., Fan, Y., Shen, D.G., Acharyya, M., Loughead, J.W., Gur, R.C., and Langleben, D.D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. Neuroimage, 28(3):663-668.

De Houwer, J., Thomas, S., and Baeyens, F. (2001). Associative learning of likes and dislikes: A review of 25 years of research on human evaluative conditioning. Psychological Bulletin, 127:853-869.

Deichmann, R., Gottfried, J., Hutton, C., and Turner, R. (2003). Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage, 19:430-441.

Delgado, M., Miller, M., Inati, S., and Phelps, E. (2005). An fMRI study of reward-related probability learning. Neuroimage, 24:862-873.

Delgado, M., Nystrom, L., Fissell, C., Noll, D., and Fiez, J. (2000). Tracking the hemodynamic responses to reward and punishment in the striatum. J. Neurophysiol., 84:3072-3077.

Delgado, M.R., Stenger, V.A., and Fiez, J.A. (2004). Motivation-dependent responses in the human caudate nucleus. Cereb. Cortex, 14(9):1022-1030.

Demattè, M., Osterbauer, R., and Spence, C. (2007). Olfactory cues modulate facial attractiveness. Chem. Senses, 32:603-610.

Dickinson, A. (1980). Contemporary animal learning theory. Cambridge: Cambridge University Press.

Djordjevic, J., Zatorre, R., Petrides, M., Boyle, J., and Jones-Gotman, M. (2005). Functional neuroimaging of odor imagery. Neuroimage, 24:791-801.

Dreher, J., Kohn, P., and Berman, K. (2006). Neural coding of distinct statistical properties of reward information in humans. Cereb. Cortex, 16:561-573.

Elliott, R., Dolan, R., and Frith, C. (2000). Dissociable functions in the medial and lateral orbitofrontal cortex: evidence from human neuroimaging studies. Cereb. Cortex, 10:308-317.

Elliott, R., Newman, J., Longe, O., and Deakin, J. (2003). Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J. Neurosci., 23:303-307.

Elliott, R., Newman, J., Longe, O., and William Deakin, J. (2004). Instrumental responding for rewards is associated with enhanced neuronal response in subcortical reward systems. Neuroimage, 21:984-990.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27:861-874.

Fernie, G. and Tunney, R. (2006). Some decks are better than others: the effect of reinforcer type and task instructions on learning in the Iowa Gambling Task. Brain Cogn, 60:94-102.

Fiorillo, C., Tobler, P., and Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299:1898-1902.

Friston, K. (1997). Testing for anatomically specified regional effects. Human Brain Mapping, 5:133-136.

Friston, K. J., Ashburner, J., Frith, C. D., Poline, J. B., Heather, J. D., and Frackowiak, R. S. J. (1995a). Spatial registration and normalisation of images. Human Brain Mapping, 2:165-189.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. B., Frith, C. D., and Frackowiak, R. S. J. (1995b). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2:189-210.

Friston, K., Price, C., Fletcher, P., Moore, C., Frackowiak, R., and Dolan, R. (1996). The trouble with cognitive subtraction. Neuroimage, 4:97-104.

Gallistel, C. (1990). Representations in animal cognition: an introduction. Cognition, 37:1-22.

Galvan, A., Hare, T., Davidson, M., Spicer, J., Glover, G., and Casey, B. (2005). The role of ventral frontostriatal circuitry in reward-based learning in humans. J. Neurosci., 25:8650-8656.

Genovese, C., Lazar, N., and Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15:870-878.

Gneezy, U. and Rustichini, A. (2000). Pay Enough or Don't Pay at All. The Quarterly Journal of Economics, 115:791-810.

Gottfried, J., Deichmann, R., Winston, J., and Dolan, R. (2002a). Functional heterogeneity in human olfactory cortex: an event-related functional magnetic resonance imaging study. J. Neurosci., 22:10819-10828.

Gottfried, J., O'Doherty, J., and Dolan, R. (2002b). Appetitive and aversive olfactory learning in humans studied using event-related functional magnetic resonance imaging. J. Neurosci., 22:10829-10837.

Gottfried, J., O'Doherty, J., and Dolan, R. (2003). Encoding predictive reward value in human amygdala and orbitofrontal cortex. Science, 301:1104-1107.

Gusnard, D., Raichle, M., and Raichle, M. (2001). Searching for a baseline: functional imaging and the resting human brain. Nat. Rev. Neurosci., 2:685-694.

Hampton, A., Adolphs, R., Tyszka, M., and O'Doherty, J. (2007). Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex. Neuron, 55:545-555.

Hassabis, D., Kumaran, D., Vann, S., and Maguire, E. (2007). Patients with hippocampal amnesia cannot imagine new experiences. Proc. Natl. Acad. Sci. U.S.A., 104:1726-1731.

Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., and Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425-2430.

Haynes, J.D. and Rees, G. (2006). Decoding mental states from brain activity in humans. Nat. Rev. Neurosci., 7(7):523-534.

Heeger, D. and Ress, D. (2002). What does fMRI tell us about neuronal activity? Nat. Rev. Neurosci., 3:142-151.

Henson, R., Rugg, M., and Friston, K. (2001). The choice of basis functions in event-related fMRI. NeuroImage, 13(6), Supplement 1:127.

Hikosaka, O., Sakamoto, M., and Usui, S. (1989). Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol., 61:814-832.

Holland, P. (1990). Event representation in Pavlovian conditioning: image and action. Cognition, 37:105-131.

Hollerman, J. R. and Schultz, W. (1998). Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci, 1(4):304-309.

Holmes, A. and Friston, K. (1998). Generalisability, random effects and population inference. In NeuroImage, volume 7, page S754.

Holt, C.A. and Laury, S.K. (2002). Risk Aversion and Incentive Effects. The American Economic Review, 92:1644-1655.

Horvitz, J. (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience, 96:651-656.

Hulvershorn, J., Bloy, L., Gualtieri, E., Leigh, J., and Elliott, M. (2005). Spatial sensitivity and temporal response of spin echo and gradient echo BOLD contrast at 3 T using peak hemodynamic activation time. Neuroimage, 24:216-223.

Ishai, A., Ungerleider, L., and Haxby, J. (2000). Distributed neural systems for the generation of visual images. Neuron, 28:979-990.

Jensen, J., McIntosh, A., Crawley, A., Mikulis, D., Remington, G., and Kapur, S. (2003). Direct activation of the ventral striatum in anticipation of aversive stimuli. Neuron, 40:1251-1257.

Johnson, M. and Bickel, W. (2002). Within-subject comparison of real and hypothetical money rewards in delay discounting. J Exp Anal Behav, 77:129-146.

Kahneman, D. and Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica, 47:263-291.

Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In B. A. Campbell and R. M. Church (eds.), Punishment and aversive behavior, 279-296, New York: Appleton-Century-Crofts.

Kamitani, Y. and Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nat. Neurosci., 8(5):679-685.

Kim, H., Shimojo, S., and O'Doherty, J. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol., 4:e233.

King, D. (1973). An image theory of classical conditioning. Psychol Rep, 33:403-411.

King, D. (1974). An image theory of instrumental conditioning. Psychol Rep, 35:1115-1122.

Kirsch, P., Schienle, A., Stark, R., Sammer, G., Blecker, C., Walter, B., Ott, U., Burkart, J., and Vaitl, D. (2003). Anticipation of reward in a nonaversive differential conditioning paradigm and the brain reward system: an event-related fMRI study. Neuroimage, 20:1086-1095.

Knutson, B., Adams, C., Fong, G., and Hommer, D. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neurosci., 21:RC159.

Knutson, B. and Cooper, J. (2005). Functional magnetic resonance imaging of reward prediction. Curr. Opin. Neurol., 18:411-417.

Knutson, B., Taylor, J., Kaufman, M., Peterson, R., and Glover, G. (2005). Distributed neural representation of expected value. J. Neurosci., 25:4806-4812.

Kobayashi, M., Takeda, M., Hattori, N., Fukunaga, M., Sasabe, T., Inoue, N., Nagai, Y., Sawada, T., Sadato, N., and Watanabe, Y. (2004). Functional imaging of gustatory perception and imagery: top-down processing of gustatory signals. Neuroimage, 23:1271-1282.

Konorski, J. (1967). Integrative action of the brain. Chicago: University of Chicago Press.

Kosslyn, S. (1988). Aspects of a cognitive neuroscience of mental imagery. Science, 240:1621-1626.

Kosslyn, S., Ganis, G., and Thompson, W. (2001). Neural foundations of imagery. Nat. Rev. Neurosci., 2:635-642.

Kosslyn, S., Shin, L., Thompson, W., McNally, R., Rauch, S., Pitman, R., and Alpert, N. (1996). Neural effects of visualizing and perceiving aversive stimuli: a PET investigation. Neuroreport, 7:1569-1576.

Kringelbach, M. (2005). The human orbitofrontal cortex: linking reward to hedonic experience. Nat. Rev. Neurosci., 6:691-702.

Kringelbach, M., O'Doherty, J., Rolls, E., and Andrews, C. (2003). Activation of the human orbitofrontal cortex to a liquid food stimulus is correlated with its subjective pleasantness. Cereb. Cortex, 13:1064-1071.

Kringelbach, M. and Rolls, E. (2004). The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog. Neurobiol., 72:341-372.

LaConte, S., Strother, S., Cherkassky, V., Anderson, J., and Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. Neuroimage, 26(2):317-329.

Lancaster, J., Woldorff, M., Parsons, L., Liotti, M., Freitas, C., Rainey, L., Kochunov, P., Nickerson, D., Mikiten, S., and Fox, P. (2000). Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp, 10:120-131.

Lauterbur, P.C. (1973). Image formation by induced local interactions. Examples employing nuclear magnetic resonance. Nature, 242:190-191.

Le Pelley, M. (2004). The role of associative history in models of associative learning: a selective review and a hybrid model. Q J Exp Psychol B, 57:193-243.

Lin, H.-T., Lin, C.-J., and Weng, R. C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267-276.

Lisman, J. and Grace, A. (2005). The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron, 46:703-713.

Liu, X., Powell, D., Wang, H., Gold, B., Corbly, C., and Joseph, J. (2007). Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. J. Neurosci., 27:4587-4597.

Ljungberg, T., Apicella, P., and Schultz, W. (1992). Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol, 67(1):145-163.

Logothetis, N. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 357:1003-1037.

Logothetis, N., Pauls, J., Augath, M., Trinath, T., and Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412:150-157.

Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82:276-298.

Mackintosh, N. J. (1983). Conditioning and associative learning. Oxford: Oxford University Press.


Maldjian, J., Laurienti, P., Kraft, R., and Burdette, J. (2003). An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage, 19:1233-1239.

Mansfield, P. (1977). Multi-planar image formation using NMR spin echoes. J. Phys. C, 10:L55-L58.

McClure, S. M. (2003). Reward prediction errors in human brain. PhD Thesis, Baylor College of Medicine.

McClure, S., Berns, G., and Montague, P. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38:339-346.

McClure, S., Ericson, K., Laibson, D., Loewenstein, G., and Cohen, J. (2007). Time discounting for primary rewards. J. Neurosci., 27:5796-5804.

McClure, S., Laibson, D., Loewenstein, G., and Cohen, J. (2004a). Separate neural systems value immediate and delayed monetary rewards. Science, 306:503-507.

McClure, S., York, M., and Montague, P. (2004b). The neural substrates of reward processing in humans: the modern role of fMRI. Neuroscientist, 10:260-268.

Mechelli, A., Price, C., Friston, K., and Ishai, A. (2004). Where bottom-up meets top-down: neuronal interactions during perception and imagery. Cereb. Cortex, 14:1256-1265.

Mellers, B.A., Schwartz, A., Ho, K., and Ritov, I. (1997). Decision affect theory: Emotional reactions to the outcomes of risky options. Psychological Science, 8(6):423-429.

Miller, G. (2007). Neurobiology. A surprising connection between memory and imagination. Science, 315:312.

Mirenowicz, J. and Schultz, W. (1994). Importance of unpredictability for reward responses in primate dopamine neurons. J Neurophysiol, 72(2):1024-1027.

Mirenowicz, J. and Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564):449-451.

Mourão-Miranda, J., Bokde, A.L., Born, C., Hampel, H., and Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage, 28(4):980-995.

Murray, E. (2007). The amygdala, reward and emotion. Trends Cogn. Sci. (Regul. Ed.), 11:489-497.

Nichols, T., Brett, M., Andersson, J., Wager, T., and Poline, J. (2005). Valid conjunction inference with the minimum statistic. Neuroimage, 25:653-660.

Nitschke, J., Sarinopoulos, I., Mackiewicz, K., Schaefer, H., and Davidson, R. (2006). Functional neuroanatomy of aversion and its anticipation. Neuroimage, 29:106-116.

Norris, D., Zysset, S., Mildner, T., and Wiggins, C. (2002). An investigation of the value of spin-echo-based fMRI using a Stroop color-word matching task and EPI at 3 T. Neuroimage, 15:719-726.

O'Craven, K. and Kanwisher, N. (2000). Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J Cogn Neurosci, 12:1013-1023.

O'Doherty, J. (2004). Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol., 14:769-776.

O'Doherty, J. (2007). Lights, camembert, action! The role of human orbitofrontal cortex in encoding stimuli, rewards, and choices. Ann. N. Y. Acad. Sci., 1121:254-272.

O'Doherty, J., Buchanan, T., Seymour, B., and Dolan, R. (2006). Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron, 49:157-166.

O'Doherty, J., Critchley, H., Deichmann, R., and Dolan, R. (2003a). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J. Neurosci., 23:7931-7939.

O'Doherty, J., Dayan, P., Friston, K., Critchley, H., and Dolan, R. (2003b). Temporal difference models and reward-related learning in the human brain. Neuron, 38:329-337.

O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., and Dolan, R. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304:452-454.

O'Doherty, J., Kringelbach, M., Rolls, E., Hornak, J., and Andrews, C. (2001). Abstract reward and punishment representations in the human orbitofrontal cortex. Nat. Neurosci., 4:95-102.

Ogawa, S., Lee, T., Kay, A., and Tank, D. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. U.S.A., 87:9868-9872.

Ogawa, S., Menon, R., Tank, D., Kim, S., Merkle, H., Ellermann, J., and Ugurbil, K. (1993). Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophys. J., 64:803-812.

Ogawa, S., Tank, D., Menon, R., Ellermann, J., Kim, S., Merkle, H., and Ugurbil, K. (1992). Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging. Proc. Natl. Acad. Sci. U.S.A., 89:5951-5955.

Ojemann, J., Akbudak, E., Snyder, A., McKinstry, R., Raichle, M., and Conturo, T. (1997). Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. Neuroimage, 6:156-167.


Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306):1593-1599.

Schwarzbauer, C., Raposo, A., and Tyler, L.K. (2005). Spin-echo fMRI overcomes susceptibility-induced signal losses in the inferior temporal lobes. NeuroImage, 26(S1):802.

Schwarzbauer, C., Mildner, T., Heinke, W., Zysset, S., Deichmann, R., Brett, M., and Davis, M.H. (2006). Spin-echo EPI: the method of choice for fMRI of brain regions affected by magnetic field inhomogeneities? Human Brain Mapping, Abstract No. 1049.

Seymour, B., Daw, N., Dayan, P., Singer, T., and Dolan, R. (2007a). Differential encoding of losses and gains in the human striatum. J. Neurosci., 27:4826-4831.

Seymour, B., O'Doherty, J., Dayan, P., Koltzenburg, M., Jones, A., Dolan, R., Friston, K., and Frackowiak, R. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429:664-667.

Seymour, B., Singer, T., and Dolan, R. (2007b). The neurobiology of punishment. Nat. Rev. Neurosci., 8:300-311.

Shafir, E., Diamond, P.A., and Tversky, A. (1997). On Money Illusion. Quarterly Journal of Economics, 112:341-374.

Simmons, W., Martin, A., and Barsalou, L. (2005). Pictures of appetizing foods activate gustatory cortices for taste and reward. Cereb. Cortex, 15:1602-1608.

Stark, C.E. and Squire, L.R. (2001). When zero is not zero: the problem of ambiguous baseline conditions in fMRI. Proc. Natl. Acad. Sci. U.S.A., 98(22):12760-12766.

Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44.

Sutton, R. and Barto, A. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev, 88:135-170.

Sutton, R. S. and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore (eds.), Learning and computational neuroscience: foundations of adaptive networks, 497-537, Boston: MIT Press.

Talairach, J. and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human Brain. Thieme, New York.

Talmi, D., Seymour, B., Dayan, P., and Dolan, R. (2008). Human pavlovian-instrumental transfer. J. Neurosci., 28:360-368.

Thorpe, S., Rolls, E., and Maddison, S. (1983). The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp Brain Res, 49:93-115.

Thorndike, E. L. (1911). Animal intelligence: experimental studies. New York: Macmillan.

Thut, G., Schultz, W., Roelcke, U., Nienhusmeier, M., Missimer, J., Maguire, R., and Leenders, K. (1997). Activation of the human brain by monetary reward. Neuroreport, 8:1225-1228.

Tiggemann, M. and Kemps, E. (2005). The phenomenology of food cravings: the role of mental imagery. Appetite, 45(3):305-313.

Tobler, P. (2003). Coding of basic reward parameters by dopamine neurons. PhD Thesis, University of Cambridge.

Tobler, P., Dickinson, A., and Schultz, W. (2003). Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci., 23:10402-10410.

Tobler, P., Fiorillo, C., and Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307:1642-1645.

Tobler, P., Fletcher, P., Bullmore, E., and Schultz, W. (2007a). Learning-related human brain activations reflecting individual finances. Neuron, 54:167-175.

Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. J. Neurophysiol., 95:301-310.

Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2007b). Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol., 97:1621-1632.

Tremblay, L., Hollerman, J. R., and Schultz, W. (1998). Modifications of reward expectation-related neuronal activity during learning in primate striatum. J Neurophysiol, 80(2):964-977.

Tremblay, L. and Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729):704-708.

Tremblay, L. and Schultz, W. (2000a). Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J Neurophysiol, 83(4):1877-1885.

Tremblay, L. and Schultz, W. (2000b). Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol, 83(4):1864-1876.

Tricomi, E., Delgado, M., and Fiez, J. (2004). Modulation of caudate activity by action contingency. Neuron, 41:281-292.

Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., and Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1):273-289.

Valentin, V., Dickinson, A., and O'Doherty, J. (2007). Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci., 27:4019-4026.


Vohs, K., Mead, N., and Goode, M. (2006). The psychological consequences of money. Science, 314:1154-1156.

Waelti, P., Dickinson, A., and Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412(6842):43-48.

Winston, J., Gottfried, J., Kilner, J., and Dolan, R. (2005). Integrated neural representations of odor intensity and affective valence in human amygdala. J. Neurosci., 25:8903-8907.

Wise, R. (2002). Brain reward circuitry: insights from unsensed incentives. Neuron, 36:229-240.

Wise, R. (2004). Dopamine, learning and motivation. Nat. Rev. Neurosci., 5:483-494.

Wittmann, B., Schott, B., Guderian, S., Frey, J., Heinze, H., and Duzel, E. (2005). Reward-related fMRI activation of dopaminergic midbrain is associated with enhanced hippocampus-dependent long-term memory formation. Neuron, 45:459-467.

Worsley, K., Marrett, S., Neelin, P., Vandal, A. C., Friston, K., and Evans, A. C. (1996). A unified statistical approach for determining significant voxels in images of cerebral activation. Human Brain Mapping, 4:58-73.

Yoo, S., Freeman, D., McCarthy, J., and Jolesz, F. (2003). Neural substrates of tactile imagery: a functional MRI study. Neuroreport, 14:581-585.

Zink, C., Pagnoni, G., Chappelow, J., Martin-Skurski, M., and Berns, G. (2006). Human striatal activation reflects degree of stimulus saliency. Neuroimage, 29:977-983.

Zink, C., Pagnoni, G., Martin, M., Dhamala, M., and Berns, G. (2003). Human striatal response to salient nonrewarding stimuli. J. Neurosci., 23:8092-8097.

Zink, C., Pagnoni, G., Martin-Skurski, M., Chappelow, J., and Berns, G. (2004). Human striatal responses to monetary reward depend on saliency. Neuron, 42:509-517.