TRANSCRIPT
THE TAMING OF UNCERTAINTY
Magda Osman, QMUL, Head of Dynamic Learning and Decision-making Lab
The Taming of Uncertainty
Act 1 – The Comedy of Gains & Losses
Act 2 – The Tragedy of Unknowing
Act 3 – The Triumph of Heroes
Scene setting
Meder, Le Lec & Osman (2013). Trends in Cognitive Sciences
Decision-Making Scenarios
Meder, Le Lec & Osman (2013). Trends in Cognitive Sciences
Types of Uncertainty
How are dynamic situations studied in the lab?
• Microworlds – mini computerised situations that mimic uncertain domains, in which the participant interacts with, and then attempts to control, various outcomes
1. A brief scenario is presented
2. The participant is then shown the computer-based task
3. They have a set number of trials in which to manipulate variables in order to control an outcome to criterion
4. This is followed by knowledge tests designed to assess their understanding of the rules or causal structure connecting the input variables to the output variables (a minimal sketch of this procedure follows below)
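As a rough illustration of that procedure, here is a minimal Python skeleton of a single session. The placeholder rule linking inputs to the outcome, the trial counts, and the target value are illustrative assumptions, not the actual task code.

import random

def session(n_trials=100, target=50.0):
    """Skeleton of one microworld session (placeholder rule, not the actual task)."""
    # Steps 1-2: the scenario and task presentation are not modelled here.
    outcome = random.uniform(0, 100)
    for trial in range(n_trials):
        # Step 3: the participant sets the input variables; random choices stand in here.
        inputs = [random.uniform(1, 100) for _ in range(3)]
        # Placeholder rule linking inputs to the outcome (the real rule is hidden from participants).
        outcome = outcome + 0.1 * (inputs[0] - inputs[1]) + random.gauss(0, 5)
        print(f"trial {trial + 1}: outcome {outcome:.1f} (target {target})")
    # Step 4: knowledge tests about the input-output structure would follow (not modelled).

session(n_trials=5)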
Characters
Dynamic Decision-Making (DDM)
“…is a goal-directed process that involves selecting actions that will reliably achieve and maintain the same outcome over time”
• (Brehmer, 1992)
• Where can DDM be found? In situations where changes occur endogenously, exogenously, or both
• What are the defining characteristics of DDM? Decisions are sequential, interdependent, and made online
Osman (2010). Psychological Bulletin
Agency & Control
Choice involves a selection between alternatives; inherent in this is an action (mental/physical) that identifies the preferred choice
Control “…is the combination of cognitive processes needed to co-ordinate actions in order to achieve a goal on a reliable basis over time”
Agency “…is the overarching state or sustained experience that is concerned with ownership of, and responsibility for, observed actions”
Osman (2014). Future-minded: The psychology of agency and control
Heroes vs. Villains
• Unconscious
• Invalidity
• Unreliability

• Bounded Rationality
• Reinforcement learning
• Causality
Plot
Advances in DDM Research
Machine learning / Animal learning / Neuroscience: Huys & Dayan (2009), Dayan (2009), Daw, Niv & Dayan (2005), Dickinson (1985), Matignon, Laurent & Fort-Piat (2006)
Dynamic control / Naturalistic decision-making tasks / Contingency learning / Management science: Brehmer (1992), Busemeyer (1999), Edwards (1962), Kirlik, Miller & Jagacinski (1993), Sterman (1989)
So, what are the key influences on Dynamic Decision-Making?
Monitoring & Control Theory (Osman, 2008, 2010, 2011)
The agent
The agent is engaged in a goal-directed way (goals), in order to achieve a certain outcome (reward), and repeatedly behaves as if an action will achieve and maintain a certain goal (sense of agency/contingency learning)
The situation
The actions of the agent are informed by state changes in the environment and by the feedback/reward structure, in which the state changes are potentially knowable
Simple prediction
Only manipulations of Agency/Goals/Reward/Feedback/Contingency should impact decision-making performance
Synopsis
Learning:
- Mode of learning (prediction vs. control; observation vs. action)
- Type of goal (exploration vs. exploitation)
Sense of Agency: (low vs. high)
Contingency: (extreme, high, moderate, low)
Performance feedback: (positive, negative, both, neither)
Reward: (gains, losses)
Research Programme
Preamble
Lab-based Decision-making – Hedonic principle + Loss aversion = Losses are generally more salient than gains (a la Prospect theory)
So, losses should drive learning more effectively
Real-world Decision-making - During early stages of learning negative rewards lead to improved performance
So, losses should drive learning more effectively
(Brett & VandeWalle, 1999; Kahneman & Tversky, 1979; Latham & Locke, 1990; Tabernero & Wood, 1999)
Losses vs. Gains
• Actions: intervene on Input 1, Input 2, Input 3, or make no intervention; value setting (1-100)
• Goal of task: learn to control a dynamic outcome to a specific goal
• Reward set-up: maximize gains vs. minimize losses
• Experimental set-up: 100 learning trials; 20 test trials (reward is performance related) – familiar goal; 20 test trials (reward is performance related) – unfamiliar goal
Experimental Set up
Points convert to money
Reward is based on the discrepancy between achieved and target value
If the outcome value is closer to the target than on the previous trial: Gains +10 (maximum gain), Losses -5 (minimum loss)
If the outcome value is further from the target than on the previous trial: Gains +5 (minimum gain), Losses -10 (maximum loss)
But point assignment is probabilistic: 80% reliable
Total points in Test 1 + Test 2 × 2.5 = final winnings
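A minimal sketch of this reward scheme in Python follows. It assumes that the 2.5 multiplier applies to the summed points from both tests, and that "80% reliable" means the intended points are awarded on 80% of trials with the alternative points given otherwise; both readings are assumptions rather than details stated on the slide.

import random

def trial_points(distance_now, distance_prev, condition, reliability=0.8):
    """Points for one trial under the scheme above.

    condition: 'gains' (maximize gains) or 'losses' (minimize losses).
    Assumes `reliability` is the chance that the intended points are awarded;
    otherwise the alternative points are given (an assumption).
    """
    closer = distance_now < distance_prev        # outcome moved toward the target
    if random.random() > reliability:            # on ~20% of trials the assignment flips
        closer = not closer
    if condition == "gains":
        return 10 if closer else 5               # +10 maximum gain, +5 minimum gain
    return -5 if closer else -10                 # -5 minimum loss, -10 maximum loss

def final_winnings(test1_points, test2_points, rate=2.5):
    # Assumes the conversion applies to the summed points from both test phases.
    return (test1_points + test2_points) * rate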
Structure
3 inputs , 1 Output (Continuous variables)
y(t) = y(t-1) + b1·x1(t) + b2·x2(t) + e(t)
• Output value = y(t)
• Previous output value = y(t-1)
• Positive input: b1 = 0.65
• Negative input: b2 = -0.65
• (Null input) random noise = e(t)
The random noise was drawn from a normal distribution with mean 0 and SD 8 (intermediate noise)
Structure of Task Environment
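The update equation above can be simulated directly. The sketch below uses the stated parameter values (b1 = 0.65, b2 = -0.65, noise SD = 8); the starting outcome value and the way the inputs are set on each trial are illustrative assumptions.

import random

B1, B2 = 0.65, -0.65     # positive and negative input weights
NOISE_SD = 8             # intermediate noise: normal with mean 0, SD 8

def step(y_prev, x1, x2):
    """One update of the outcome: y(t) = y(t-1) + b1*x1(t) + b2*x2(t) + e(t).

    The third (null) input is omitted because it has no effect on the outcome;
    the random noise term plays its role.
    """
    return y_prev + B1 * x1 + B2 * x2 + random.gauss(0, NOISE_SD)

# Illustrative run: the starting value and input settings are assumptions.
y = 50.0
for t in range(10):
    x1 = random.uniform(1, 100)   # inputs set on a 1-100 scale
    x2 = random.uniform(1, 100)
    y = step(y, x1, x2)
    print(f"trial {t + 1}: outcome = {y:.1f}")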
Act 1: The Comedy of Gains & Losses
N = 40 (Gains N = 20, Losses N = 20) - No differences in learning patterns
Learning Performance - outcome feedback every trial (Trials 1-100)
Limited difference in test performance
Test Performance: outcome feedback every trial (Test 1 and Test 2)
Act 2: The Tragedy of Unknowing
N = 50 (Gains N = 25, Losses N = 25) - Advantage for the Gains group
Learning Performance: Outcome feedback every 5th trial (Trials 1-100, Gains vs. Losses)
Small but reliable advantage for the Gains group in test performance
Test Performance: Outcome feedback every 5th trial (Test 1 and Test 2, Gains vs. Losses)
Comparison of both Experiments
Learning Performance and Test Performance (Exp 1a vs. Exp 1b)
Final Act: The Triumph of Heroes
Despite impoverished information (Exp 1b), DDM is robust enough that learning is possible - but only when maximizing gains
1. High quality (but simple) outcome information presented frequently, with reward signals – [Not socially framed]
2. Incentivization schemes make a difference but only under extreme uncertainty
[avoid punishment schedules under these conditions]
Best Circumstances for DDM
Strategies
When the conditions are highly unstable, people:
1. intervene on the system a lot
2. make dramatic changes in parameter settings
3. change multiple variables at once
When the conditions are highly stable, people:
1. intervene on the system very little
2. make conservative changes in parameter settings
3. make minimal systematic changes to variables
Under both conditions, people seem to stick to their choice of strategy rather than switch over time
Monitoring and Control
• Simple prediction: only manipulations of Agency/Goals/Reward/Feedback/Contingency should impact decision-making performance
• This has been supported in several studies
• Critically, the above factors impact forecasting behaviour as well as DDM (control behaviour)
• Critically, there is NO evidence that these factors have a differential effect on DDM/forecasting because they tap into different “systems” (i.e. System 1 vs. System 2)
Thanks to
Zuzanna Hola, Brian Glass
Undergraduate Students
Patrycja Marta Bartoszek, Susanne Stollewerk
Researchers
Bjoern Meder, Agata Ryterska, Maarten Speekenbrink
Exploitation, Optimality, Variability