Dopamine, Uncertainty and TD Learning
Yael Niv, Michael Duff, Peter Dayan. Gatsby Computational Neuroscience Unit, UCL. CNS 2004.
![Page 1: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/1.jpg)
Dopamine, Uncertainty
and TD Learning
CNS 2004
Yael Niv
Michael Duff
Peter Dayan
Gatsby Computational Neuroscience Unit, UCL
![Page 2: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/2.jpg)
What is the function of Dopamine?
Dorsal Striatum (Caudate, Putamen)
Ventral Tegmental Area
Substantia Nigra
Amygdala
Nucleus Accumbens (Ventral Striatum)
Prefrontal Cortex
Parkinson’s Disease -> Movement control?
Intracranial self-stimulation; Drug addiction -> Reward pathway? -> Learning?
Also involved in:
- Working memory
- Novel situations
- ADHD
- Schizophrenia
…
![Page 3: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/3.jpg)
What does phasic Dopamine encode?
Unpredicted reward (neutral/no stimulus)
Predicted reward (learned task)
Omitted reward (probe trial)
(Schultz et al.)
![Page 4: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/4.jpg)
The TD Hypothesis of Dopamine
Phasic DA encodes a reward prediction error
• Precise theory for generation of DA firing patterns
• Compelling account for the role of DA in classical conditioning
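r = reward, V = value (sum of future rewards):
$$V(t) = r(t+1) + r(t+2) + r(t+3) + \dots$$
Temporal difference error, comparing the prediction $V(t)$ with $r(t+1) + V(t+1)$:
$$\delta(t+1) = r(t+1) + V(t+1) - V(t)$$
(Sutton+Barto 1987, Schultz,Dayan,Montague 1997)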
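The temporal difference error on this slide, δ(t+1) = r(t+1) + V(t+1) - V(t), can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the 6-step trial, the stimulus and reward times, the hand-written value vectors, and the function name `td_errors` are all assumptions made for the example. It reproduces the three cases from the Schultz et al. slide: unpredicted reward, predicted reward, and omitted reward.

```python
def td_errors(values, rewards):
    """Return delta(t+1) = r(t+1) + V(t+1) - V(t) for each step of one trial.

    values[t]  : predicted future reward V(t)
    rewards[t] : reward r(t) delivered at time t
    The result has len(values) - 1 entries; entry t holds delta(t+1).
    """
    return [rewards[t + 1] + values[t + 1] - values[t]
            for t in range(len(values) - 1)]


# Hypothetical 6-step trial: stimulus at t=1, reward of 1 at t=4.
rewards = [0, 0, 0, 0, 1, 0]

# Before learning: no predictions, so the error appears at the reward.
naive = td_errors([0, 0, 0, 0, 0, 0], rewards)

# After learning: V(t) = 1 from stimulus onset until just before the reward,
# so the error moves to the stimulus and vanishes at the reward.
learned = td_errors([0, 1, 1, 1, 0, 0], rewards)

# Omitted reward after learning: negative error at the expected reward time.
omitted = td_errors([0, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0])

print(naive)    # [0, 0, 0, 1, 0]
print(learned)  # [1, 0, 0, 0, 0]
print(omitted)  # [1, 0, 0, -1, 0]
```

Entry 0 of each list is the error at stimulus onset (t=1) and entry 3 is the error at the reward time (t=4).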
![Page 5: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/5.jpg)
But: Fiorillo, Tobler & Schultz 2003
• Introduce inherent uncertainty into the classical conditioning paradigm
• Five visual stimuli indicating different reward probabilities: P= 100%, 75%, 50%, 25%, 0%
Stimulus = 2 sec visual stimulus
Reward (probabilistic) = drops of juice
![Page 6: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/6.jpg)
Fiorillo, Tobler & Schultz 2003
At stimulus time - DA represents mean expected reward
Delay activity - A ramp in activity up to reward
Hypothesis: DA ramp encodes uncertainty in reward
![Page 7: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/7.jpg)
“Uncertainty Ramping” and TD error?
• The uncertainty is predictable from the stimulus
• TD predicts away predictable quantities
If it represents uncertainty, the ramping activity should disappear with learning according to TD.
Uncertainty ramping is not easily compatible with the TD hypothesis
Are the ramps really coding uncertainty?
![Page 8: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/8.jpg)
A closer look at FTS’s results
At time of reward:
• Prediction errors result from probabilistic reward delivery
• Crucially: Positive and negative errors cancel out
[Figure panels: p = 50%, p = 75%]
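The cancellation can be checked with one line of arithmetic. As an illustrative assumption (not stated on the slide), take a reward of magnitude 1 delivered with probability p, and a fully learned prediction V = p just before the reward time:

$$
\delta = \begin{cases} 1 - p & \text{with probability } p \\ -p & \text{with probability } 1 - p \end{cases}
\qquad\Rightarrow\qquad
\langle \delta \rangle = p(1-p) - (1-p)\,p = 0
$$

For p = 75%, for example, 0.75 × 0.25 - 0.25 × 0.75 = 0: rewarded trials give a small positive error often, unrewarded trials give a large negative error rarely, and the two balance exactly when averaged.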
![Page 9: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/9.jpg)
A TD Resolution:
• TD prediction error δ(t) can be positive or negative
• Neuronal firing rate is only positive (negative values can be encoded relative to base firing rate)
But: DA base firing rate is low -> asymmetric encoding of δ(t)
[Figure: δ(t) and DA, with labels 55% and 270%]
![Page 10: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/10.jpg)
Simulating TD with asymmetric errors
Negative δ(t) scaled by d=1/6 prior to PSTH summation
Learning proceeds normally (without scaling)
− Necessary to produce the right predictions
− Can be biologically plausible
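The procedure on this slide can be sketched as a small simulation. This is a hedged illustration, not the authors' simulation: only the scaling factor d = 1/6 comes from the slide, while the number of time steps, the learning rate `alpha`, the trial counts, the unit reward magnitude, the seed, and the name `simulate_ramp` are assumptions chosen for the example. Values are learned with the unscaled error; only the across-trial average (the stand-in for the PSTH) uses the scaled negative errors.

```python
import random


def simulate_ramp(p=0.5, n_steps=10, alpha=0.1, d=1/6,
                  n_trials=6000, burn_in=1000, seed=0):
    """Average TD errors over trials, with negative errors scaled by d.

    V[k] is the value k steps after stimulus onset (one tapped-delay-line
    unit per step). A reward of magnitude 1 arrives with probability p at
    the end of the last step. Learning uses the unscaled error; only the
    reported average is scaled.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_steps + 1)          # V[n_steps] stays 0: trial has ended
    raw = [0.0] * n_steps              # sum of unscaled errors per step
    scaled = [0.0] * n_steps           # sum of asymmetrically scaled errors
    for trial in range(n_trials):
        rewarded = rng.random() < p
        for k in range(n_steps):
            r = 1.0 if (rewarded and k == n_steps - 1) else 0.0
            delta = r + V[k + 1] - V[k]
            V[k] += alpha * delta      # learning proceeds without scaling
            if trial >= burn_in:       # skip the initial learning phase
                raw[k] += delta
                scaled[k] += delta if delta > 0 else d * delta
    n = n_trials - burn_in
    return [x / n for x in raw], [x / n for x in scaled]


raw_mean, scaled_mean = simulate_ramp()
for k, (a, b) in enumerate(zip(raw_mean, scaled_mean)):
    print(f"step {k}: raw {a:+.4f}  scaled {b:+.4f}")
```

In this sketch the raw averages stay near zero at every step, while the scaled averages are positive and largest at the final step (the time of reward), because errors caused by recent rewarded and unrewarded trials shrink as they propagate back toward the stimulus.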
![Page 11: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/11.jpg)
DA - Uncertainty or Temporal Difference?
[Figure panels: Experiment, Model]
With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) => Maximal at p=50%
However:
• No need to assume explicit coding of uncertainty - ramping is explained by neural constraints.
• Explanation for puzzling absence of ramp in trace conditioning results.
• Experimental test: Ramp as within or between trial phenomenon?
Challenges: TD and noise; Conditioned inhibition, additivity
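The p(1-p) dependence stated on this slide follows from a short calculation. As illustrative assumptions, take a reward of magnitude 1, a learned prediction V = p just before the reward time, and negative errors scaled by the factor d used in the simulation (d = 1/6):

$$
\langle \delta_{\text{scaled}} \rangle
= \underbrace{p\,(1-p)}_{\text{rewarded trials}}
+ \underbrace{(1-p)\,d\,(-p)}_{\text{unrewarded trials}}
= (1-d)\,p\,(1-p)
$$

$$
\frac{d}{dp}\,(1-d)\,p\,(1-p) = (1-d)(1-2p) = 0 \;\Rightarrow\; p = \tfrac{1}{2}
$$

With d = 1/6 this gives 5/24 ≈ 0.208 at p = 50%, 0.15625 at p = 75% or 25%, and 0 at p = 100% or 0%, the same ordering that an explicit uncertainty code would produce.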
![Page 12: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/12.jpg)
Trace conditioning: A puzzle and its resolution
• Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
• Resolution: lower learning rate in trace conditioning eliminates ramp
CS = short visual stimulus
Trace period
US (probabilistic) = drops of juice
![Page 13: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/13.jpg)
Other sources of uncertainty: Representational Noise (1)
• Rate coding is inherently stochastic
• Add noise to tapped delay line representation
=> TD learning is robust to this type of noise
[Figure panels: prediction error, weights; σ = 0.0577, σ = 0.0866, σ = 0.1155]
Mirenowicz and Schultz (1996)
![Page 14: Dopamine, Uncertainty and TD Learning](https://reader035.vdocument.in/reader035/viewer/2022062422/56813c24550346895da59c7b/html5/thumbnails/14.jpg)
Other sources of uncertainty: Representational Noise (2)
• Neural timing of events is necessarily inaccurate
• Add temporal noise to tapped delay line representation
=> Devastating effects of even small amounts of temporal noise on TD predictions
[Figure panels: ε = 0.05, ε = 0.10]