modeling transfer of learning in games of strategic interaction

ACT-R Workshop, July 2012

Modeling transfer of learning in games of strategic interaction

Ion Juvina & Christian Lebiere

Department of Psychology

Carnegie Mellon University

Outline

Background

Experiment

Cognitive model

Work in progress

Discussion


Transfer of learning

Alfred Binet (1899):
  Formal discipline: exercise of mental faculties -> generalization
Thorndike (1903):
  Identical element theory: transfer of learning occurs only when identical elements of behavior are carried over from one task to another
Singley & Anderson (1989):
  Surface vs. deep similarities
  Common “cognitive units”


Transfer in strategic interaction

Bipartisan cooperation in Congress
  Golf -> bipartisanship?
  Similarity?
  What is transferred?


Prisoner’s Dilemma (PD)


Chicken game (CG)


PD & CG payoff matrices

Cells list (row player's payoff, column player's payoff).

PD        A            B
A      -1, -1      10, -10
B     -10, 10       1, 1

CG        A            B
A     -10, -10     10, -1
B      -1, 10       1, 1


Similarities between PD & CG

Surface (near transfer)
  2x2 games
  2 symmetric and 2 asymmetric outcomes
  [1,1] outcome is identical
Deep (far transfer)
  Mixed motive
  Non-zero sum
  Mutual cooperation is superior to competition in the long term
  Though unstable (risky)


Differences between PD & CG

Different equilibria:
  Symmetric in PD: [-1,-1]
  Asymmetric in CG: [-1,10] and [10,-1]
Different strategies to maximize joint payoff (Pareto-efficient outcome):
  [1,1] in PD
  Alternation of [-1,10] and [10,-1] in CG
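The equilibrium and joint-payoff claims above can be checked directly from the two payoff matrices. The sketch below is only an illustration: the helper names `pure_nash` and `mean_joint_payoff` are hypothetical, and it assumes each cell lists the row player's payoff first.

```python
from itertools import product

# Payoff matrices from the slides:
# (row move, column move) -> (row payoff, column payoff).
PD = {("A", "A"): (-1, -1), ("A", "B"): (10, -10),
      ("B", "A"): (-10, 10), ("B", "B"): (1, 1)}
CG = {("A", "A"): (-10, -10), ("A", "B"): (10, -1),
      ("B", "A"): (-1, 10), ("B", "B"): (1, 1)}
MOVES = ("A", "B")


def pure_nash(game):
    """Return the move pairs where neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(MOVES, MOVES):
        row_pay, col_pay = game[(row, col)]
        row_ok = all(game[(r, col)][0] <= row_pay for r in MOVES)
        col_ok = all(game[(row, c)][1] <= col_pay for c in MOVES)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


def mean_joint_payoff(game, cycle):
    """Average joint payoff per round over a repeating cycle of move pairs."""
    return sum(sum(game[pair]) for pair in cycle) / len(cycle)


pd_equilibria = pure_nash(PD)   # the symmetric [-1,-1] cell
cg_equilibria = pure_nash(CG)   # the two asymmetric cells
alternation = [("A", "B"), ("B", "A")]
cg_alternation = mean_joint_payoff(CG, alternation)
cg_mutual_b = mean_joint_payoff(CG, [("B", "B")])
```

In CG, alternating between the two asymmetric cells yields a joint payoff of 9 per round against 2 for staying at [1,1]. In PD the same alternation yields 0, so [1,1] remains the best joint outcome there.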


Questions / hypotheses

Similarities
  Identical element? Common cognitive units?
Transfer of learning
  Is there any transfer?
  Only in one direction?
    Low – high entropy? (Bednar, Chen, Xiao Liu, & Page, in press)
    Identical element -> both ways
Mechanism of transfer
  Reciprocal trust mitigates the risk associated with the long-term solution (Hardin, 2002)


Participants and design

480 participants (CMU students)
  240 pairs
2 within-subjects games: PD & CG
4 between-subjects information conditions
  No-info: 60 pairs
  Min-info: 60 pairs
  Mid-info: 60 pairs
  Max-info: 60 pairs
2 between-subjects order conditions in each information condition
  PD-CG: 30 pairs
  CG-PD: 30 pairs
200 unnumbered rounds for each game


Typical outcomes


Pareto-optimal equilibria


[1,1] increases with info


Alternation increases with info


PD – CG sequence


CG – PD sequence


PD before and after


CG before and after


Transfer from PD to CG

Increased [1,1] (surface transfer)
Increased alternation (deep transfer)


Transfer from CG to PD

Increased [1,1] (surface + deep transfer)


Divergent effects (PD -> CG)

Surface: [1,1] in PD -> [1,1] in CG
Deep: [1,1] in PD -> [10,-1] / [-1,10] in CG


Convergent effects (CG -> PD)

Surface: [1,1] in CG -> [1,1] in PD
Deep: [10,-1] / [-1,10] in CG -> [1,1] in PD


Reciprocation by info


Payoff by info in PD and CG


Summary results

Mutual cooperation increases with awareness of interdependence (info)
Transfer of learning
  Better performance “after” than “before”
  Combined effects of surface and deep similarities
    CG -> PD: surface similarity facilitates transfer
    PD -> CG: surface similarity interferes with transfer
  Transfer occurs in both directions
Mechanism of generalization
  Reciprocal trust?


Cognitive model

Awareness of interdependence
  Opponent modeling
Generality
  Utility learning (reinforcement learning)
Transfer of learning
  Surface transfer
  Deep transfer


Opponent modeling

Instance-based learning
  Dynamic representation of the opponent
Sequence learning
  Prediction of opponent’s next move
Instance (snapshot of the current situation)
  Previous moves and opponent’s current move
Contextualized expectations
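A minimal sketch of how such instance-based prediction could work, not the model's actual implementation. It assumes instances are keyed by the previous pair of moves, that retrieval picks the instance with the highest ACT-R-style base-level activation, and that activation noise and partial matching are left out. The class name `OpponentModel` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class OpponentModel:
    """Predict the opponent's next move from stored instances."""

    def __init__(self, decay=0.5):
        self.decay = decay              # assumed decay rate
        self.uses = defaultdict(list)   # (context, opponent move) -> use times
        self.clock = 0

    def observe(self, context, opponent_move):
        """Store one instance: the previous moves plus what the opponent did."""
        self.clock += 1
        self.uses[(context, opponent_move)].append(self.clock)

    def activation(self, context, opponent_move):
        """Base-level activation: grows with frequency, decays with time."""
        times = self.uses.get((context, opponent_move), [])
        if not times:
            return -math.inf
        now = self.clock + 1
        return math.log(sum((now - t) ** -self.decay for t in times))

    def predict(self, context, moves=("A", "B")):
        """Return the most active expectation for this context, if any."""
        best = max(moves, key=lambda m: self.activation(context, m))
        return None if self.activation(context, best) == -math.inf else best


model = OpponentModel()
context = ("B", "B")            # previous moves: (own, opponent)
for _ in range(3):
    model.observe(context, "B")
model.observe(context, "A")
predicted = model.predict(context)
```

Here the opponent followed mutual B with B three times and with A once, most recently. Frequency outweighs recency in this case, so the prediction is B. A context that was never observed returns `None`.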


Utility learning

Reinforcement learning
Strategy: what move to make given
  Expected move of opponent
  Context (previous moves)
Reward functions
  Own payoff – Opponent’s payoff
  Opponent’s payoff
  Joint payoff – Opponent’s previous payoff
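The three reward functions and a utility update can be sketched as follows. Two assumptions are made here: "joint payoff" is read as own payoff plus opponent's payoff, and the update uses ACT-R's standard difference-learning rule with an illustrative learning rate of 0.2. The function names are hypothetical.

```python
def reward_own_minus_opponent(own, opp, opp_prev):
    """Competitive reward: how far ahead of the opponent this round left me."""
    return own - opp


def reward_opponent(own, opp, opp_prev):
    """Reward equal to the opponent's payoff."""
    return opp


def reward_joint_minus_opp_prev(own, opp, opp_prev):
    """Joint payoff (assumed own + opp) minus the opponent's previous payoff."""
    return own + opp - opp_prev


def update_utility(utility, reward, alpha=0.2):
    """Move a strategy's utility a fraction alpha toward the received reward."""
    return utility + alpha * (reward - utility)


# Example: row player plays A against B in PD, payoffs (10, -10),
# after a round in which the opponent earned 1.
own, opp, opp_prev = 10, -10, 1
competitive = reward_own_minus_opponent(own, opp, opp_prev)
altruistic = reward_opponent(own, opp, opp_prev)
cooperative = reward_joint_minus_opp_prev(own, opp, opp_prev)
new_utility = update_utility(0.0, competitive)
```

The same outcome is rewarding under the first function (20) and punishing under the other two (-10 and -1), which is why the choice of reward function shapes which strategy is learned.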


Surface transfer

Declarative sub-symbolic learning
  Retrieval of instances guided by recency and frequency
Strategy learning
  A learned strategy continues to be used for a while until it is unlearned
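Both points can be stated with ACT-R's standard equations, assuming the model uses them unmodified. The last line is a short derivation added here for illustration: it shows why a strategy learned in the first game persists for a while in the second.

```latex
% Base-level activation of instance i: t_j is the time since its j-th use,
% d is the decay rate. More uses (frequency) and smaller t_j (recency)
% both raise B_i.
B_i = \ln\left( \sum_{j=1}^{n} t_j^{-d} \right)

% Utility learning for strategy i after its n-th use:
% \alpha is the learning rate, R_i(n) the reward received.
U_i(n) = U_i(n-1) + \alpha \left[ R_i(n) - U_i(n-1) \right]

% If the reward settles at a new constant value R' after the game switch,
% subtracting R' from both sides gives a geometric approach to R':
U_i(n) - R' = (1 - \alpha)^{n} \left[ U_i(0) - R' \right]
```

Because the gap to the new reward shrinks only by a factor of (1 - \alpha) per use, a strategy with high utility from the first game keeps being selected for several rounds before it is unlearned.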


Deep transfer

Trust learning / Trust dynamics
Trust accumulator
  Increases when opponent makes cooperative (risky) moves
  Decreases when opponent makes competitive moves
Trust invest accumulator
  Increases with mutually destructive outcome
  Decreases with unreciprocated cooperation (risk taking)
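The two accumulators can be sketched as simple counters. This is an assumption-laden illustration: the unit step size, the zero starting values, and the reading of "mutually destructive outcome" as both players choosing the competitive move A are all assumptions, and `TrustState` and `update_trust` are hypothetical names.

```python
from dataclasses import dataclass

# In both payoff matrices, B is the cooperative (risky) move and A the
# competitive one.
COOPERATE, COMPETE = "B", "A"


@dataclass
class TrustState:
    trust: float = 0.0          # trust accumulator
    trust_invest: float = 0.0   # trust invest accumulator


def update_trust(state, own_move, opp_move, step=1.0):
    """Return the accumulators after one round (step size is an assumption)."""
    trust, invest = state.trust, state.trust_invest
    # Trust follows the opponent's move.
    if opp_move == COOPERATE:
        trust += step
    else:
        trust -= step
    # Trust invest follows the joint outcome.
    if own_move == COMPETE and opp_move == COMPETE:
        invest += step          # mutually destructive outcome
    elif own_move == COOPERATE and opp_move == COMPETE:
        invest -= step          # unreciprocated cooperation
    return TrustState(trust, invest)


state = TrustState()
state = update_trust(state, "A", "A")   # mutual competition
state = update_trust(state, "B", "A")   # my cooperation is not reciprocated
state = update_trust(state, "B", "B")   # mutual cooperation
```

After these three rounds trust stands at -1 and trust invest at 0: the opponent's single cooperative move has only partly repaired trust, and the failed attempt at cooperation has used up the willingness to invest.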


Meta strategy

Determines which reward function to use
  Trust accumulator <= 0: Reward = own payoff – opponent’s payoff
  Trust invest accumulator > 0: Reward = opponent’s payoff
  Trust accumulator > 0: Reward = joint payoff – opponent’s previous payoff
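The three conditions overlap when trust is at or below zero but trust invest is positive, so a precedence has to be assumed. The sketch below checks positive trust first, then positive trust invest, and falls back to the competitive reward. That ordering, the reading of joint payoff as own plus opponent's payoff, and the name `select_reward` are assumptions rather than details given on the slide.

```python
def select_reward(trust, trust_invest, own, opp, opp_prev):
    """Pick a reward function from the accumulators and apply it."""
    if trust > 0:
        # Opponent is trusted: reward joint payoff minus the opponent's
        # previous payoff.
        return own + opp - opp_prev
    if trust_invest > 0:
        # No trust yet, but willing to invest in it (assumed precedence):
        # reward the opponent's payoff.
        return opp
    # Neither trust nor willingness to invest: reward the payoff difference.
    return own - opp


# The same PD outcome, unreciprocated cooperation with payoffs (-10, 10),
# evaluated under the three regimes.
trusting = select_reward(1, 0, own=-10, opp=10, opp_prev=1)
investing = select_reward(0, 1, own=-10, opp=10, opp_prev=1)
competing = select_reward(0, 0, own=-10, opp=10, opp_prev=1)
```

Being exploited is mildly punishing under trust (-1), rewarding while investing in trust (10), and strongly punishing once neither accumulator is positive (-20).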


Model diagram (HSCB 2011)

[Diagram] Labeled components:
  Environment
    Opponent move: A
    Reward
  ACT-R
    Declarative Memory: Inst1, Inst2, Inst3, Inst4
    Procedural Memory: Rule1, Rule2, Rule3
    Instance: Current moves: A B; Previous moves: A A
    Prediction: Previous moves: A B; Opponent move: A
    Move: Best response: A; Predicted move: A
  ACT-R extension
    Trust: Trust accumulator, Trust invest


PD-CG


CG-PD


PD-CG surface transfer


PD-CG deep transfer


CG – PD surface + deep transfer


Trust simulation


Summary model results

Awareness of interdependence
  Opponent modeling
Generality
  Utility learning
Transfer of learning
  Surface level transfer: cognitive units
  Deep level transfer: Trust


In progress

Expand model to account for all information conditions
Develop more ecologically valid paradigm (IPD^3)
Model “affective” processes in ACT-R



IPD^3


General discussion

Transfer of learning is possible
  Deep similarities: interpersonal level
IPD^3
  To be used in behavioral experiments
  Tool for learning strategic interaction skills


Acknowledgments

Coty Gonzalez
Jolie Martin
Hau-Yu Wong
Muniba Saleem
This research is supported by the Defense Threat Reduction Agency (DTRA) grant number: HDTRA1-09-1-0053 to Cleotilde Gonzalez and Christian Lebiere

Thank you for your attention!

Questions?
