ACT-R Workshop, July 2012
Modeling transfer of learning in games of strategic interaction
Ion Juvina & Christian Lebiere
Department of Psychology
Carnegie Mellon University
Outline
Background
Experiment
Cognitive model
Work in progress
Discussion
Background | Experiment | Model | In progress | Discussion
Transfer of learning
Alfred Binet (1899): formal discipline: exercise of mental faculties -> generalization
Thorndike (1903): identical element theory: transfer of learning occurs only when identical elements of behavior are carried over from one task to another
Singley & Anderson (1989): surface vs. deep similarities; common “cognitive units”
Transfer in strategic interaction
Bipartisan cooperation in Congress
Golf -> bipartisanship?
Similarity? What is transferred?
Prisoner’s Dilemma (PD)
Chicken game (CG)
PD & CG payoff matrices (each cell: row player's payoff, column player's payoff)
PD     A          B
A      -1, -1     10, -10
B      -10, 10    1, 1
CG     A          B
A      -10, -10   10, -1
B      -1, 10     1, 1
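The two matrices can be encoded directly to check that they have the textbook structure. This Python sketch is an illustration, not material from the talk: it assumes B is the cooperative move (mutual B gives the [1,1] outcome) and uses the conventional labels T (temptation), R (reward), P (punishment) and S (sucker), which do not appear on the slides.

```python
# Payoff matrices as given on the slide:
# PAYOFFS[game][(row_move, col_move)] = (row player's payoff, column player's payoff)
PAYOFFS = {
    "PD": {("A", "A"): (-1, -1), ("A", "B"): (10, -10),
           ("B", "A"): (-10, 10), ("B", "B"): (1, 1)},
    "CG": {("A", "A"): (-10, -10), ("A", "B"): (10, -1),
           ("B", "A"): (-1, 10), ("B", "B"): (1, 1)},
}


def is_symmetric(game):
    """True if swapping the players' roles swaps their payoffs."""
    m = PAYOFFS[game]
    return all(m[(r, c)] == m[(c, r)][::-1] for (r, c) in m)


def classify(game):
    """Classify a symmetric 2x2 game by the ordering of the row player's payoffs.

    Assumes A is the competitive move and B the cooperative one.
    """
    m = PAYOFFS[game]
    t = m[("A", "B")][0]  # temptation: compete against a cooperator
    r = m[("B", "B")][0]  # reward: mutual cooperation
    p = m[("A", "A")][0]  # punishment: mutual competition
    s = m[("B", "A")][0]  # sucker: cooperate against a competitor
    if t > r > p > s:
        return "Prisoner's Dilemma"
    if t > r > s > p:
        return "Chicken"
    return "other"


for g in PAYOFFS:
    print(g, classify(g), is_symmetric(g))
# PD Prisoner's Dilemma True
# CG Chicken True
```

The only difference in ordering is the position of P and S: mutual competition is the worst outcome in Chicken but not in the Prisoner's Dilemma.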
Similarities between PD & CG
Surface (near transfer): 2x2 games; 2 symmetric and 2 asymmetric outcomes; the [1,1] outcome is identical
Deep (far transfer): mixed motive; non-zero sum; mutual cooperation is superior to competition in the long term, though unstable (risky)
Differences between PD & CG
Different equilibria: symmetric in PD: [-1,-1]; asymmetric in CG: [-1,10] and [10,-1]
Different strategies to maximize joint payoff (Pareto-efficient outcome): [1,1] in PD; alternation of [-1,10] and [10,-1] in CG
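The equilibrium and joint-payoff claims can be checked mechanically. This Python sketch (an illustration, not the authors' analysis code) finds the pure-strategy Nash equilibria by testing unilateral deviations, and compares the long-run average payoffs of steady [1,1] play with alternation between the two asymmetric outcomes. Mixed-strategy equilibria are not considered.

```python
from itertools import product

MOVES = ("A", "B")
# (row payoff, column payoff) for each (row move, column move)
PD = {("A", "A"): (-1, -1), ("A", "B"): (10, -10),
      ("B", "A"): (-10, 10), ("B", "B"): (1, 1)}
CG = {("A", "A"): (-10, -10), ("A", "B"): (10, -1),
      ("B", "A"): (-1, 10), ("B", "B"): (1, 1)}


def pure_nash_equilibria(m):
    """Outcomes from which neither player gains by deviating alone."""
    eq = []
    for r, c in product(MOVES, MOVES):
        row_ok = all(m[(r, c)][0] >= m[(alt, c)][0] for alt in MOVES)
        col_ok = all(m[(r, c)][1] >= m[(r, alt)][1] for alt in MOVES)
        if row_ok and col_ok:
            eq.append(m[(r, c)])
    return eq


def mean_payoffs(m, cycle):
    """Average per-round (row, column) payoff when a cycle of outcomes repeats."""
    n = len(cycle)
    return (sum(m[o][0] for o in cycle) / n, sum(m[o][1] for o in cycle) / n)


mutual_b = [("B", "B")]
alternation = [("A", "B"), ("B", "A")]
print(pure_nash_equilibria(PD))  # [(-1, -1)]
print(pure_nash_equilibria(CG))  # [(10, -1), (-1, 10)]
print(mean_payoffs(PD, mutual_b), mean_payoffs(PD, alternation))  # (1.0, 1.0) (0.0, 0.0)
print(mean_payoffs(CG, mutual_b), mean_payoffs(CG, alternation))  # (1.0, 1.0) (4.5, 4.5)
```

Alternation gains nothing in PD but more than quadruples each player's average payoff in CG, which is why the two games call for different long-term solutions.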
Questions / hypotheses
Similarities: identical element? Common cognitive units?
Transfer of learning: is there any transfer? Only in one direction?
Low – high entropy? (Bednar, Chen, Xiao Liu, & Page, in press)
Identical element -> both ways
Mechanism of transfer: reciprocal trust mitigates the risk associated with the long-term solution (Hardin, 2002)
Participants and design
480 participants (CMU students), forming 240 pairs
2 within-subjects games: PD & CG
4 between-subjects information conditions: No-info: 60 pairs; Min-info: 60 pairs; Mid-info: 60 pairs; Max-info: 60 pairs
2 between-subjects order conditions in each information condition: PD-CG: 30 pairs; CG-PD: 30 pairs
200 unnumbered rounds for each game
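The design arithmetic can be laid out as a short bookkeeping sketch in Python; the constant names are illustrative, and the only inputs are the counts stated on the slide.

```python
from itertools import product

INFO_CONDITIONS = ("No-info", "Min-info", "Mid-info", "Max-info")
ORDERS = ("PD-CG", "CG-PD")
GAMES = ("PD", "CG")      # every pair plays both games (within subjects)
PAIRS_PER_CELL = 30       # pairs per order condition within an information condition
ROUNDS_PER_GAME = 200

cells = list(product(INFO_CONDITIONS, ORDERS))   # between-subjects cells
pairs_per_info = len(ORDERS) * PAIRS_PER_CELL    # pairs per information condition
n_pairs = len(cells) * PAIRS_PER_CELL            # total pairs
n_participants = 2 * n_pairs                     # two players per pair
rounds_per_pair = len(GAMES) * ROUNDS_PER_GAME   # rounds played by each pair

print(len(cells), pairs_per_info, n_pairs, n_participants, rounds_per_pair)
# 8 60 240 480 400
```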
Typical outcomes
Pareto-optimal equilibria
[1,1] increases with info
Alternation increases with info
PD – CG sequence
CG – PD sequence
PD before and after
CG before and after
Transfer from PD to CG
Increased [1,1] (surface transfer)
Increased alternation (deep transfer)
Transfer from CG to PD
Increased [1,1] (surface + deep transfer)
Divergent effects (PD -> CG)
[Diagram: PD [1,1] links to CG [1,1] through surface similarity and to CG alternation of [10,-1] / [-1,10] through deep similarity]
Convergent effects (CG -> PD)
[Diagram: CG [1,1] (surface similarity) and CG alternation of [10,-1] / [-1,10] (deep similarity) both link to PD [1,1]]
Reciprocation by info
Payoff by info in PD and CG
Summary results
Mutual cooperation increases with awareness of interdependence (info)
Transfer of learning: better performance “after” than “before”
Combined effects of surface and deep similarities: CG -> PD: surface similarity facilitates transfer; PD -> CG: surface similarity interferes with transfer
Transfer occurs in both directions
Mechanism of generalization: reciprocal trust?
Cognitive model
Awareness of interdependence: opponent modeling
Generality: utility learning (reinforcement learning)
Transfer of learning: surface transfer; deep transfer
Opponent modeling
Instance-based learning: dynamic representation of the opponent
Sequence learning: prediction of the opponent’s next move
Instance (snapshot of the current situation): previous moves and the opponent’s current move
Contextualized expectations
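A minimal sketch of instance-based prediction, assuming ACT-R's base-level learning equation B = ln(sum_j t_j^(-d)), where t_j is the time since the j-th encoding and d = 0.5 is the conventional decay. The class and method names are hypothetical, activation noise and partial matching are omitted, and identical instances are merged so that repeated encodings raise activation (frequency) while older encodings fade (recency). It illustrates the idea, not the authors' model code.

```python
import math
from collections import defaultdict

DECAY = 0.5  # base-level decay d; 0.5 is the conventional ACT-R value


class OpponentModel:
    """Hypothetical instance-based predictor of the opponent's next move.

    An instance pairs a context (own previous move, opponent's previous move)
    with the move the opponent made next. Identical instances are merged and
    keep a list of encoding times.
    """

    def __init__(self):
        self.encodings = defaultdict(list)  # (context, opp_move) -> encoding times

    def store(self, context, opp_move, t):
        self.encodings[(context, opp_move)].append(t)

    def activation(self, key, now):
        # Base-level learning: ln of summed power-law-decayed traces (requires now > t).
        return math.log(sum((now - t) ** -DECAY for t in self.encodings[key]))

    def predict(self, context, now):
        """Return the opponent move of the most active matching instance, or None."""
        matches = [k for k in self.encodings if k[0] == context]
        if not matches:
            return None
        return max(matches, key=lambda k: self.activation(k, now))[1]


model = OpponentModel()
model.store(("A", "B"), "A", t=1)
model.store(("A", "B"), "A", t=2)
model.store(("A", "B"), "B", t=3)
print(model.predict(("A", "B"), now=4))   # A: two encodings outweigh one more recent one
model.store(("A", "B"), "B", t=10)
print(model.predict(("A", "B"), now=11))  # B: recent encodings now dominate
```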
Utility learning
Reinforcement learning
Strategy: what move to make given the expected move of the opponent and the context (previous moves)
Reward functions: own payoff - opponent’s payoff; opponent’s payoff; joint payoff - opponent’s previous payoff
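The three reward functions and the utility update can be sketched as below. The update is ACT-R's difference-learning rule U(n) = U(n-1) + alpha * (R(n) - U(n-1)) with alpha = 0.2 (the ACT-R default); keying utilities by (context, expected opponent move, own move) and all helper names are assumptions made for illustration.

```python
ALPHA = 0.2  # utility learning rate (ACT-R default for :alpha)
MOVES = ("A", "B")


# The three reward functions listed on the slide.
def reward_relative(own, opp, opp_prev):
    return own - opp               # own payoff - opponent's payoff


def reward_opponent(own, opp, opp_prev):
    return opp                     # opponent's payoff


def reward_joint(own, opp, opp_prev):
    return (own + opp) - opp_prev  # joint payoff - opponent's previous payoff


def update_utility(u, reward, alpha=ALPHA):
    """Difference-learning rule: U(n) = U(n-1) + alpha * (R(n) - U(n-1))."""
    return u + alpha * (reward - u)


def choose_move(utilities, context, expected_opp_move):
    """Own move with the highest learned utility (unseen strategies count as 0; ties go to A)."""
    return max(MOVES, key=lambda m: utilities.get((context, expected_opp_move, m), 0.0))


# Example: in PD the player chose A and the opponent chose B -> payoffs (10, -10);
# the opponent's payoff on the previous round was 1.
own, opp, opp_prev = 10, -10, 1
print(reward_relative(own, opp, opp_prev), reward_opponent(own, opp, opp_prev),
      reward_joint(own, opp, opp_prev))  # 20 -10 -1

utilities = {}
key = (("B", "B"), "B", "A")  # (context, expected opponent move, own move)
utilities[key] = update_utility(utilities.get(key, 0.0), reward_relative(own, opp, opp_prev))
print(choose_move(utilities, ("B", "B"), "B"))  # A
```

The same outcome is good under the relative reward and bad under the other two, so which reward function is active largely determines what the model learns.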
Surface transfer
Declarative sub-symbolic learning: retrieval of instances guided by recency and frequency
Strategy learning: a learned strategy continues to be used for a while until it is unlearned
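Why a learned strategy lingers can be seen from the same difference-learning rule: after a game switch, a utility carried over from the previous game moves toward the new rewards only by a fraction alpha per update. This sketch uses a hypothetical helper and assumes a constant new reward and a fixed rival utility; it counts how many updates pass before the carried-over strategy loses to its rival.

```python
def trials_until_unlearned(u_old, u_rival, new_reward, alpha=0.2, max_trials=1000):
    """Count updates until a carried-over utility drops below a rival's utility.

    Assumes the old strategy now earns a constant `new_reward` and the rival's
    utility stays fixed. Returns None if it never drops below within max_trials.
    """
    u = u_old
    for n in range(1, max_trials + 1):
        u += alpha * (new_reward - u)  # U(n) = U(n-1) + alpha * (R - U(n-1))
        if u < u_rival:
            return n
    return None


# A strategy worth 5 at the end of the first game now earns -10 per use.
print(trials_until_unlearned(5.0, 0.0, -10.0))              # 2
print(trials_until_unlearned(5.0, 0.0, -10.0, alpha=0.05))  # 8
print(trials_until_unlearned(5.0, 0.0, 1.0))                # None: still preferred
```

A smaller learning rate makes the old strategy persist longer, which is one way surface transfer could either help or interfere depending on whether the carried-over strategy suits the new game.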
Deep transfer
Trust learning / trust dynamics
Trust accumulator: increases when the opponent makes cooperative (risky) moves; decreases when the opponent makes competitive moves
Trust invest accumulator: increases with mutually destructive outcomes; decreases with unreciprocated cooperation (risk taking)
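The two accumulators can be sketched as simple counters. The slide gives no step sizes and does not say which outcomes count as mutually destructive, so this Python sketch assumes unit steps, treats B as the cooperative (risky) move and A as the competitive move in both games, and treats mutual A as the mutually destructive outcome; the class name is hypothetical.

```python
class TrustState:
    """Hypothetical sketch of the trust and trust-invest accumulators."""

    def __init__(self, step=1.0):
        self.trust = 0.0
        self.trust_invest = 0.0
        self.step = step  # assumed constant increment/decrement

    def update(self, own_move, opp_move):
        # Trust: up when the opponent cooperates (B), down when it competes (A).
        self.trust += self.step if opp_move == "B" else -self.step
        # Trust invest: up after mutual destruction, down after unreciprocated cooperation.
        if own_move == "A" and opp_move == "A":
            self.trust_invest += self.step
        elif own_move == "B" and opp_move == "A":
            self.trust_invest -= self.step


state = TrustState()
for own, opp in [("A", "A"), ("A", "A"), ("B", "A"), ("B", "B"), ("B", "B")]:
    state.update(own, opp)
    print(own, opp, state.trust, state.trust_invest)
```

After two rounds of mutual destruction, trust is negative but trust invest is positive, which is the situation in which the model can start taking the risk of cooperating.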
Meta strategy
Determines which reward function to use:
Trust accumulator <= 0: reward = own payoff - opponent’s payoff
Trust invest accumulator > 0: reward = opponent’s payoff
Trust accumulator > 0: reward = joint payoff - opponent’s previous payoff
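The meta strategy amounts to a dispatch on the two accumulators. The slide lists three conditions without stating how they combine when, for example, trust is at or below 0 while trust invest is positive; this Python sketch assumes positive trust is checked first, then positive trust invest, with the competitive reward as the fallback. Function names are illustrative.

```python
def own_minus_opponent(own, opp, opp_prev):
    return own - opp


def opponent_payoff(own, opp, opp_prev):
    return opp


def joint_minus_opponent_previous(own, opp, opp_prev):
    return (own + opp) - opp_prev


def select_reward(trust, trust_invest):
    """Pick the reward function from the accumulators (precedence is an assumption)."""
    if trust > 0:
        return joint_minus_opponent_previous
    if trust_invest > 0:
        return opponent_payoff      # invest in building trust
    return own_minus_opponent       # trust <= 0 and nothing to invest: compete


# CG outcome [-1, 10]: the player swerved (B) while the opponent did not (A);
# the opponent's previous payoff was 1.
for trust, invest in [(2.0, 0.0), (-1.0, 1.0), (-1.0, 0.0)]:
    reward = select_reward(trust, invest)
    print(reward.__name__, reward(-1, 10, 1))
# joint_minus_opponent_previous 8
# opponent_payoff 10
# own_minus_opponent -11
```

Under this precedence the same conceding move is rewarded when trust or trust invest is positive and punished otherwise.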
Model diagram (HSCB 2011)
[Diagram: ACT-R with declarative memory (instances Inst1-Inst4, e.g., current moves A B, previous moves A A), procedural memory (Rule1-Rule3), prediction (previous moves A B -> opponent move A), move selection (predicted move A, best response A), an ACT-R extension for trust (trust accumulator, trust invest), and the environment returning the opponent's move (A) and the reward]
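The diagram's move box pairs a predicted opponent move with a best response. One simple reading, sketched here in Python as an assumption rather than the model's actual choice rule (which the slides attribute to utility learning), is a myopic best response under the current game's payoffs.

```python
PAYOFFS = {
    "PD": {("A", "A"): (-1, -1), ("A", "B"): (10, -10),
           ("B", "A"): (-10, 10), ("B", "B"): (1, 1)},
    "CG": {("A", "A"): (-10, -10), ("A", "B"): (10, -1),
           ("B", "A"): (-1, 10), ("B", "B"): (1, 1)},
}


def best_response(game, predicted_opp_move):
    """Own move that maximizes own payoff against the predicted opponent move."""
    m = PAYOFFS[game]
    return max(("A", "B"), key=lambda own: m[(own, predicted_opp_move)][0])


for game in ("PD", "CG"):
    print(game, {opp: best_response(game, opp) for opp in ("A", "B")})
# PD {'A': 'A', 'B': 'A'}
# CG {'A': 'B', 'B': 'A'}
```

In PD the myopic best response is A whatever the prediction, so such a rule alone cannot produce [1,1]; in CG it is always the opposite of the predicted move.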
PD-CG
CG-PD
PD-CG surface transfer
PD-CG deep transfer
CG – PD surface + deep transfer
Trust simulation
Summary model results
Awareness of interdependence: opponent modeling
Generality: utility learning
Transfer of learning: surface-level transfer: cognitive units; deep-level transfer: trust
In progress
Expand the model to account for all information conditions
Develop a more ecologically valid paradigm (IPD^3)
Model “affective” processes in ACT-R
IPD^3
General discussion
Transfer of learning is possible
Deep similarities: interpersonal level
IPD^3: to be used in behavioral experiments; tool for learning strategic interaction skills
Acknowledgments
Coty Gonzalez, Jolie Martin, Hau-Yu Wong, Muniba Saleem
This research is supported by the Defense Threat Reduction Agency (DTRA), grant number HDTRA1-09-1-0053, to Cleotilde Gonzalez and Christian Lebiere.
Thank you for your attention! Questions?