TRANSCRIPT
THE TAMING OF UNCERTAINTY
Magda Osman, QMUL, Head of Dynamic Learning and Decision-making Lab
The Taming of Uncertainty
Act 1 – The Comedy of Gains & Losses
Act 2 – The Tragedy of Unknowing
Act 3 – The Triumph of Heroes
Scene setting
Meder, Le Lec & Osman (2013). Trends in Cognitive Sciences
Decision-Making Scenarios
Meder, Le Lec & Osman (2013). Trends in Cognitive Sciences
Types of Uncertainty
How are dynamic situations studied in the lab?
• Microworlds – mini computerised situations that mimic uncertain domains, in which the participant interacts with, and then attempts to control, various outcomes
1. A brief scenario is presented
2. The participant is then shown the computer-based task
3. They have a set number of trials in which to manipulate variables in order to control an outcome to criterion
4. This is followed by knowledge tests designed to assess their understanding of the rules or causal structure connecting the input variables to the output variables (a minimal sketch of this procedure follows below)
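As a rough illustration of that procedure, here is a minimal Python skeleton of a single session. The placeholder rule linking inputs to the outcome, the trial counts, and the target value are illustrative assumptions, not the actual task code.

import random

def session(n_trials=100, target=50.0):
    """Skeleton of one microworld session (placeholder rule, not the actual task)."""
    # Steps 1-2: the scenario and task presentation are not modelled here.
    outcome = random.uniform(0, 100)
    for trial in range(n_trials):
        # Step 3: the participant sets the input variables; random choices stand in here.
        inputs = [random.uniform(1, 100) for _ in range(3)]
        # Placeholder rule linking inputs to the outcome (the real rule is hidden from participants).
        outcome = outcome + 0.1 * (inputs[0] - inputs[1]) + random.gauss(0, 5)
        print(f"trial {trial + 1}: outcome {outcome:.1f} (target {target})")
    # Step 4: knowledge tests about the input-output structure would follow (not modelled).

session(n_trials=5)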
Characters
Dynamic Decision-Making (DDM)
“…is a goal-directed process that involves selecting actions that will reliably achieve and maintain the same outcome over time”
• (Brehmer, 1992)
• Where can DDM be found? In situations where changes occur endogenously, exogenously, or both
• What are the defining characteristics of DDM? Decisions are sequential, interdependent, and made online
Osman (2010). Psychological Bulletin
Agency & Control
Choice involves a selection between alternatives; inherent in this is an action (mental/physical) that identifies the preferred choice
Control “…is the combination of cognitive processes needed to co-ordinate actions in order to achieve a goal on a reliable basis over time”
Agency “…is the overarching state or sustained experience that is concerned with ownership of, and responsibility for, observed actions”
Osman (2014). Future-minded: The psychology of agency and control
Heroes vs. Villains
• Unconscious
• Invalidity
• Unreliability

• Bounded Rationality
• Reinforcement learning
• Causality
Plot
Advances in DDM Research
Machine learning / Animal learning / Neuroscience: Huys & Dayan (2009), Dayan (2009), Daw, Niv & Dayan (2005), Dickinson (1985), Matignon, Laurent & Fort-Piat (2006)
Dynamic control / Naturalistic decision-making tasks / Contingency learning / Management science: Brehmer (1992), Busemeyer (1999), Edwards (1962), Kirlik, Miller & Jagacinski (1993), Sterman (1989)
So, what are the key influences on Dynamic Decision-Making?
Monitoring & Control Theory (Osman, 2008, 2010, 2011)
The agent
The agent is engaged in a goal-directed way (goals), in order to achieve a certain outcome (reward), and repeatedly behaves as if an action will achieve and maintain a certain goal (sense of agency/contingency learning)
The situation
The actions of the agent are informed by state changes in the environment and by the feedback/reward structure, in which the state changes are potentially knowable
Simple prediction
Only manipulations of Agency/Goals/Reward/Feedback/Contingency should impact decision-making performance
Synopsis
Learning:
- Mode of learning (prediction vs. control; observation vs. action)
- Type of goal (exploration vs. exploitation)
Sense of Agency: (low vs. high)
Contingency: (extreme, high, moderate, low)
Performance feedback: (positive, negative, both, neither)
Reward: (gains, losses)
Research Programme
Preamble
Lab-based Decision-making – Hedonic principle + Loss aversion = Losses are generally more salient than gains (a la Prospect theory)
So, losses should drive learning more effectively
Real-world Decision-making - During early stages of learning negative rewards lead to improved performance
So, losses should drive learning more effectively
(Brett & VandeWalle, 1999; Kahneman & Tversky, 1979; Latham & Locke, 1990; Tabernero & Wood, 1999)
Losses vs. Gains
• Actions: intervene on Input 1, Input 2, Input 3, or make no intervention; value setting (1-100)
• Goal of task: learn to control a dynamic outcome to a specific goal
• Reward set-up: maximize gains vs. minimize losses
• Experimental set-up: 100 learning trials; 20 test trials (reward is performance related) – familiar goal; 20 test trials (reward is performance related) – unfamiliar goal
Experimental Set up
Points convert to money
Reward is based on the discrepancy between achieved and target value
If the outcome value is closer to the target than on the previous trial: Gains +10 (maximum gain), Losses -5 (minimum loss)
If the outcome value is further from the target than on the previous trial: Gains +5 (minimum gain), Losses -10 (maximum loss)
But point assignment is probabilistic: 80% reliable
Total points in Test 1 + Test 2 × 2.5 = final winnings
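A minimal sketch of this reward scheme in Python follows. It assumes that the 2.5 multiplier applies to the summed points from both tests, and that "80% reliable" means the intended points are awarded on 80% of trials with the alternative points given otherwise; both readings are assumptions rather than details stated on the slide.

import random

def trial_points(distance_now, distance_prev, condition, reliability=0.8):
    """Points for one trial under the scheme above.

    condition: 'gains' (maximize gains) or 'losses' (minimize losses).
    Assumes `reliability` is the chance that the intended points are awarded;
    otherwise the alternative points are given (an assumption).
    """
    closer = distance_now < distance_prev        # outcome moved toward the target
    if random.random() > reliability:            # on ~20% of trials the assignment flips
        closer = not closer
    if condition == "gains":
        return 10 if closer else 5               # +10 maximum gain, +5 minimum gain
    return -5 if closer else -10                 # -5 minimum loss, -10 maximum loss

def final_winnings(test1_points, test2_points, rate=2.5):
    # Assumes the conversion applies to the summed points from both test phases.
    return (test1_points + test2_points) * rate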
Structure
3 inputs , 1 Output (Continuous variables)
y(t) = y(t-1) + b1·x1(t) + b2·x2(t) + e(t)
• Output value = y(t)
• Previous output value = y(t-1)
• Positive input: b1 = 0.65
• Negative input: b2 = -0.65
• (Null input) random noise = e(t)
The random noise was drawn from a normal distribution with mean 0 and SD 8 (intermediate noise)
Structure of Task Environment
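The update equation above can be simulated directly. The sketch below uses the stated parameter values (b1 = 0.65, b2 = -0.65, noise SD = 8); the starting outcome value and the way the inputs are set on each trial are illustrative assumptions.

import random

B1, B2 = 0.65, -0.65     # positive and negative input weights
NOISE_SD = 8             # intermediate noise: normal with mean 0, SD 8

def step(y_prev, x1, x2):
    """One update of the outcome: y(t) = y(t-1) + b1*x1(t) + b2*x2(t) + e(t).

    The third (null) input is omitted because it has no effect on the outcome;
    the random noise term plays its role.
    """
    return y_prev + B1 * x1 + B2 * x2 + random.gauss(0, NOISE_SD)

# Illustrative run: the starting value and input settings are assumptions.
y = 50.0
for t in range(10):
    x1 = random.uniform(1, 100)   # inputs set on a 1-100 scale
    x2 = random.uniform(1, 100)
    y = step(y, x1, x2)
    print(f"trial {t + 1}: outcome = {y:.1f}")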
Act 1: The Comedy of Gains & Losses
N = 40 (Gains N = 20, Losses N = 20) - No differences in learning patterns
Learning Performance - outcome feedback every trial (Trials 1-100)
Limited difference in test performance
Test Performance: outcome feedback every trial (Test 1 and Test 2)
Act 2: The Tragedy of Unknowing
N = 50 (Gains N = 25, Losses N = 25) - Advantage for the Gains group
Learning Performance: Outcome feedback every 5th trial (Trials 1-100, Gains vs. Losses)
Small but reliable advantage for the Gains group in test performance
Test Performance: Outcome feedback every 5th trial (Test 1 and Test 2, Gains vs. Losses)
Comparison of both Experiments
Learning Performance and Test Performance (Exp 1a vs. Exp 1b)
Final Act: The Triumph of Heroes
Despite impoverished information (Exp 1b), DDM is robust enough that learning is possible - but only when maximizing gains
1. High quality (but simple) outcome information presented frequently, with reward signals – [Not socially framed]
2. Incentivization schemes make a difference but only under extreme uncertainty
[avoid punishment schedules under these conditions]
Best Circumstances for DDM
Strategies
When the conditions are highly unstable, people:
1. intervene on the system a lot
2. make dramatic changes in parameter settings
3. change multiple variables at once
When the conditions are highly stable, people:
1. intervene on the system very little
2. make conservative changes in parameter settings
3. make minimal systematic changes to variables
Under both conditions, people seem to stick to their choice of strategy rather than switch over time
Monitoring and Control
• Simple prediction: only manipulations of Agency/Goals/Reward/Feedback/Contingency should impact decision-making performance
• This has been supported in several studies
• Critically, the above factors impact forecasting behaviour as well as DDM (control behaviour)
• Critically, there is NO evidence that these factors have a differential effect on DDM/forecasting because they tap into different “systems” (i.e. System 1 vs. System 2)
Thanks to
Zuzanna Hola, Brian Glass
Undergraduate Students
Patrycja Marta Bartoszek, Susanne Stollewerk
Researchers
Bjoern Meder, Agata Ryterska, Maarten Speekenbrink
Exploitation, Optimality, Variability