Reinforcement learning and human behavior, Hanan Shteingart and Yonatan Loewenstein
![Page 1: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/1.jpg)
Reinforcement learning and human behavior
Hanan Shteingart and Yonatan Loewenstein
MTAT.03.292 Seminar in Computational Neuroscience
Zurab Bzhalava
![Page 2: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/2.jpg)
Introduction
• Operant Learning
• The dominant computational approach to modeling operant learning is model-free RL
• Human behavior is far more complex than model-free RL alone can capture
• Remaining Challenges
![Page 3: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/3.jpg)
Reinforcement Learning
RL: A class of learning problems in which an agent interacts with an unfamiliar, dynamic and stochastic environment
Goal: Learn a policy to maximize some measure of long-term reward
![Page 4: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/4.jpg)
Markov Decision Process
• A (finite) set of states S
• A (finite) set of actions A
• Transition model: T(s, a, s') = P(s' | s, a)
• Reward function: R(s)
• γ is a discount factor, γ ∈ [0, 1]
• Policy π
• Optimal policy π*
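The slide leaves "optimal" implicit. One common way to make it precise, assuming the discounted-reward criterion suggested by γ, is sketched below. The symbols V^π and s_t are notation introduced here for illustration and do not appear on the slide.

```latex
V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, s_0 = s,\ \pi\right],
\qquad
\pi^{*} = \arg\max_{\pi} V^{\pi}(s) \quad \text{for every } s \in S
```

For γ < 1 and bounded rewards the sum is finite, so policies can be compared by the value they assign to each state.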
![Page 5: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/5.jpg)
Markov Decision Process
Bellman equation:
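The equation on this slide did not survive extraction. Under the definitions on the previous slide, the standard Bellman optimality equation would read V*(s) = R(s) + γ max_a Σ_s' T(s, a, s') V*(s'). The sketch below applies that equation as a repeated update (value iteration, a standard method that the slides do not name) on a made-up two-state MDP. The states, actions, rewards, probabilities, and function names are all illustrative assumptions.

```python
# Hypothetical two-state MDP, used only to illustrate the Bellman update.
STATES = ["poor", "rich"]
ACTIONS = ["stay", "move"]
REWARD = {"poor": 0.0, "rich": 1.0}  # R(s)
# TRANSITION[s][a][s2] plays the role of T(s, a, s2) = P(s2 | s, a).
TRANSITION = {
    "poor": {"stay": {"poor": 1.0, "rich": 0.0},
             "move": {"poor": 0.2, "rich": 0.8}},
    "rich": {"stay": {"poor": 0.1, "rich": 0.9},
             "move": {"poor": 0.8, "rich": 0.2}},
}
GAMMA = 0.9  # discount factor


def expected_value(values, state, action):
    """Expected value of the successor state after taking `action` in `state`."""
    return sum(p * values[s2] for s2, p in TRANSITION[state][action].items())


def value_iteration(tol=1e-8):
    """Apply the Bellman update until the values change by less than `tol`."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: REWARD[s] + GAMMA * max(expected_value(values, s, a) for a in ACTIONS)
            for s in STATES
        }
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tol:
            break
    # The greedy policy picks, in each state, the action with the best successor value.
    policy = {
        s: max(ACTIONS, key=lambda a: expected_value(values, s, a))
        for s in STATES
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(policy)
```

In this toy problem the greedy policy moves out of the unrewarded state and stays in the rewarded one, which is what the equation predicts when the rewarded state has the higher value.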
![Page 6: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/6.jpg)
Biological Algorithms
• Behavioral control
• Evaluate the world quickly
• Choose appropriate behavior based on those valuations
![Page 7: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/7.jpg)
Midbrain dopamine neurons
• Central role in guiding our behavior and thoughts
• Valuation of our world
– Value of money
– Other human beings
• Major role in decision-making
• Reward-dependent learning
• Malfunction in mental illness
• Related to Parkinson's disease
• Schizophrenia
![Page 8: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/8.jpg)
Reinforcement signals define an agent's goals
1. Organism is in state X and receives reward information;
2. Organism queries stored value of state X;
3. Organism updates stored value of state X based on current reward information;
4. Organism selects action based on stored policy;
5. Organism transitions to state Y and receives reward information.
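The five steps above can be sketched as a temporal-difference, TD(0), loop. This is one standard way to implement such an update, not necessarily the exact model behind the slide. The three-state chain, the single-action policy, the learning rate ALPHA, and the discount GAMMA are illustrative assumptions. The update of step 3 is written after the transition because TD(0) also needs the stored value of the successor state.

```python
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # learning rate (assumed)

# Hypothetical chain: cue -> wait -> juice -> end of episode.
REWARD = {"cue": 0.0, "wait": 0.0, "juice": 1.0}
NEXT_STATE = {
    ("cue", "advance"): "wait",
    ("wait", "advance"): "juice",
    ("juice", "advance"): None,  # None marks the end of the episode
}


def select_action(state):
    """Step 4: consult the stored policy (here a single fixed action)."""
    return "advance"


def run_episode(values):
    """Run one pass through the chain, updating `values` in place."""
    state = "cue"
    while state is not None:
        reward = REWARD[state]                     # step 1: reward information in X
        old_value = values[state]                  # step 2: query stored value of X
        action = select_action(state)              # step 4: select action
        next_state = NEXT_STATE[(state, action)]   # step 5: transition to Y
        next_value = values[next_state] if next_state is not None else 0.0
        td_error = reward + GAMMA * next_value - old_value
        values[state] = old_value + ALPHA * td_error  # step 3: update value of X
        state = next_state
    return values


def learn(episodes=500):
    """Start from zero values and repeat the episode many times."""
    values = {state: 0.0 for state in REWARD}
    for _ in range(episodes):
        run_episode(values)
    return values


if __name__ == "__main__":
    print(learn())
```

With these assumed numbers the stored values approach 1 for the rewarded state, 0.9 one step earlier, and 0.81 two steps earlier, so value propagates backward from the reward to the states that predict it.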
![Page 9: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/9.jpg)
The reward-prediction error hypothesis
Difference between the experienced and predicted “reward” of an event
• Neurons of the ventral tegmental area
• Phasic activity changes encode a 'prediction error about summed future reward'
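In temporal-difference terms, such a prediction error is commonly written as below. The symbols δ_t, r_t, and V are notation assumed here, with γ the discount factor from the MDP slide; none of them is taken from this slide.

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
```

A positive δ_t means the outcome was better than predicted, zero means it was fully predicted, and a negative δ_t means it was worse than predicted.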
![Page 10: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/10.jpg)
Prediction-error signal encoded in dopamine neuron firing
![Page 11: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/11.jpg)
Value binding
![Page 12: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/12.jpg)
Human reward responses
![Page 13: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/13.jpg)
Human reward responses
![Page 14: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/14.jpg)
Model-based RL vs Model-free RL
• Model-based RL is associated with goal-directed behavior, model-free RL with habitual behavior
• Possibly implemented by two anatomically distinct systems (still a subject of debate)
• Some findings suggest:
– Medial striatum is more engaged during planning
– Lateral striatum is more engaged during choices in extensively trained tasks
![Page 15: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/15.jpg)
Model-based RL vs Model-free RL
(b) Model-free RL
(c) Model-based RL
Human subjects exhibited a mixture of both effects.
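One way to see the difference between the two kinds of learner is a simplified outcome-devaluation example. This is a hypothetical illustration, not the task shown in the slide's figure: the actions, outcomes, rewards, learning rate, and function names are all assumptions. The model-free learner caches one value per action, while the model-based learner stores which outcome each action leads to and what that outcome is worth, and computes action values on demand.

```python
ALPHA = 0.5  # learning rate for the cached values (assumed)
OUTCOME = {"left": "A", "right": "B"}  # hypothetical deterministic transitions


def train(reward, trials=20):
    """Experience both actions in alternation and update both learners."""
    q_cached = {"left": 0.0, "right": 0.0}  # model-free: cached action values
    transition_model = {}                    # model-based: action -> outcome
    reward_model = {}                        # model-based: outcome -> reward
    for trial in range(trials):
        action = "left" if trial % 2 == 0 else "right"
        outcome = OUTCOME[action]
        r = reward[outcome]
        q_cached[action] += ALPHA * (r - q_cached[action])
        transition_model[action] = outcome
        reward_model[outcome] = r
    return q_cached, transition_model, reward_model


def model_based_values(transition_model, reward_model):
    """Compute action values by looking up each action's outcome and its reward."""
    return {a: reward_model[o] for a, o in transition_model.items()}


def best(values):
    """Return the action with the highest value."""
    return max(values, key=values.get)


if __name__ == "__main__":
    q_cached, t_model, r_model = train({"A": 1.0, "B": 0.5})
    r_model["A"] = 0.0  # outcome A is devalued outside the task
    print(best(q_cached))                             # cached values are unchanged
    print(best(model_based_values(t_model, r_model)))  # recomputed from the model
```

After devaluation the model-based learner switches to the other action at once, whereas the cached values still favor the old one until it is experienced again. A learner that mixes the two would fall in between.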
![Page 16: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/16.jpg)
Challenges in relating human behavior to RL algorithms
• Humans tend to alternate rather than repeat an action after receiving a positively surprising payoff
• Tremendous heterogeneity in reports on human operant learning
• Whether humans exhibit probability matching or not remains disputed
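To see why probability matching is a puzzle for reward-maximizing accounts, consider a hypothetical two-option task in which exactly one option pays off on each trial: the first with probability p, the second with probability 1 - p. The sketch compares the expected success rate of always choosing the better option with that of choosing each option as often as it pays off. The function names and p = 0.7 are assumptions for illustration.

```python
def maximizing_rate(p):
    """Expected success rate when always choosing the more likely option."""
    return max(p, 1.0 - p)


def matching_rate(p):
    """Expected success rate when choosing each option in proportion to its payoff probability."""
    return p * p + (1.0 - p) * (1.0 - p)


if __name__ == "__main__":
    p = 0.7
    print(round(maximizing_rate(p), 2))  # 0.7
    print(round(matching_rate(p), 2))    # 0.58
```

Matching never beats maximizing and is strictly worse whenever p differs from 0.5, so behavior that matches probabilities falls short of the reward-maximizing choice.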
![Page 17: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/17.jpg)
Heterogeneity in world model
![Page 18: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/18.jpg)
Learning the world model
![Page 19: Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein](https://reader035.vdocument.in/reader035/viewer/2022062721/56813652550346895d9dd6de/html5/thumbnails/19.jpg)
Reference List:
• Reinforcement learning and human behavior. Hanan Shteingart and Yonatan Loewenstein
• The ubiquity of model-based reinforcement learning. Bradley B. Doll, Dylan A. Simon and Nathaniel D. Daw
• Computational roles for dopamine in behavioral control. P. Read Montague, Steven E. Hyman and Jonathan D. Cohen