TRANSCRIPT
Belief Learning in an Unstable Infinite Game
Paul J. Healy
CMU
Belief Learning in an Unstable Infinite Game
Issue #1
Issue #2
Issue #3
Issue #1: Infinite Games
• Typical learning model:
  – Finite set of strategies
  – Strategies get weight based on 'fitness'
  – Bells & whistles: experimentation, spillovers, …
• Many important games have infinite strategy spaces
  – Duopoly, public goods, bargaining, auctions, wars of attrition, …
• Is quality of fit sensitive to grid size?
• Models don't use the structure of the strategy space
Previous Work
• Effect of grid size on fit quality:
  – Arifovic & Ledyard: Groves-Ledyard mechanisms; convergence failure of RL with |S| = 51
• Strategy space structure:
  – Roth & Erev AER '99
• Quality-of-fit / error measures:
  – What's the right metric space?
  – Closeness in probabilities or closeness in strategies?
Issue #2: Unstable Game
• Learning models are usually judged by predicted convergence rates
  – Example: p-beauty contests
• Instability:
  – Toughest test for learning models
  – Most statistical power
Previous Work
• Chen & Tang '98
  – Walker mechanism & unstable Groves-Ledyard
  – Reinforcement > fictitious play > equilibrium
• Healy '06
  – 5 public goods mechanisms; predicting convergence or non-convergence
• Feltovich '00
  – Unstable finite Bayesian game
  – Fit varies by game and error measure
Issue #3: Belief Learning
• If subjects are forming beliefs, measure them!
• Method 1: Direct elicitation
  – Incentivized guesses about s_{-i}
• Method 2: Inference from payoff table usage
  – Tracking payoff 'lookups' may inform our models
Previous Work
• Nyarko & Schotter '02
  – Subjects best respond to their stated beliefs
  – Stated beliefs not very accurate
• Costa-Gomes, Crawford & Broseta '01
  – Mouselab to identify types
  – About how players solve games, not about learning
This Paper
• Pick an unstable infinite game
• Give subjects a calculator tool & track its usage
• Elicit beliefs in some sessions
• Fit models to the data in the standard way
• Study the formation of "beliefs":
  – "Beliefs" inferred from calculator tool usage
  – "Beliefs" taken from elicited guesses
The Game
• Walker’s PG mechanism for 3 players• Added a ‘punishment’ parameter
N = {1, 2, 3};  S_i = [-10, 10] ⊂ R^1

u_i(s_i, s_{-i}) = v_i(y(s)) - t_i(s)·y(s)

y(s) = Σ_j s_j

v_i(y) = b_i·y - a_i·y²

t_i(s) = s_{i+1} - s_{i-1}   (indices mod 3)
Parameters & Equilibrium
• vi(y) = biy – aiy2 + ci
• Pareto optimum: y° = 7.5
• Unique PSNE: s_i* = 2.5
• Punishment γ = 0.1
  – Purpose: not too wild; payoffs rarely negative
• Guessing payoff: 10 - |g_L - s_L|/4 - |g_R - s_R|/4
• Game payoffs: Pr(< 50) = 8.9%, Pr(> 100) = 71%
Player   a_i    b_i    c_i
  1      0.1    1.5    110
  2      0.2    3.0    125
  3      0.3    4.5    140
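The Pareto optimum y° = 7.5 and the PSNE s_i* = 2.5 can be checked numerically. A minimal sketch, assuming v_i(y) = b_i·y - a_i·y² + c_i and the tax t_i(s) = s_{i+1} - s_{i-1} (indices mod 3, a sign convention assumed here rather than recovered from the slide):

```python
# Check y° = 7.5 and the symmetric PSNE s* = (2.5, 2.5, 2.5) under the
# assumed utility u_i(s) = v_i(y(s)) - t_i(s)*y(s), y(s) = sum(s).
a = [0.1, 0.2, 0.3]
b = [1.5, 3.0, 4.5]
c = [110, 125, 140]

# Pareto optimum maximizes sum_i v_i(y): FOC gives y = sum(b) / (2*sum(a)).
y_opt = sum(b) / (2 * sum(a))          # = 9 / 1.2 = 7.5
assert abs(y_opt - 7.5) < 1e-9

def utility(i, s):
    y = sum(s)
    tax = s[(i + 1) % 3] - s[(i - 1) % 3]   # own strategy does not enter the tax
    return b[i] * y - a[i] * y**2 + c[i] - tax * y

# At s*, no unilateral deviation on a fine grid over [-10, 10] should help.
s_star = [2.5, 2.5, 2.5]
for i in range(3):
    u_star = utility(i, s_star)
    for dev in [x / 10 for x in range(-100, 101)]:
        s = list(s_star)
        s[i] = dev
        assert utility(i, s) <= u_star + 1e-9
```

Since t_i does not depend on s_i, each player effectively picks y to maximize v_i(y) - t_i·y, which is what makes the symmetric equilibrium easy to verify.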
Choice of Grid Size
Grid width       5      2      1     1/2    1/4    1/8
# grid points    5     11     21     41     81    161
% on grid      59.7   61.6   88.7   91.6   91.9   91.9
S = [-10,10]
Properties of the Game
• Best response (from the FOC b_i - 2a_i·y - (s_{i+1} - s_{i-1}) = 0):

  s_i^BR = b_i/(2a_i) - (1 + 1/(2a_i))·s_{i+1} + (1/(2a_i) - 1)·s_{i-1}

• BR dynamics: unstable
  – One eigenvalue is +2
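The instability is easy to illustrate with a quick eigenvalue computation. A sketch under the best-response coefficients assumed above (the sign convention is an assumption; the slide reports an eigenvalue of +2, and under this convention an eigenvalue of modulus 2 appears and the spectral radius exceeds 1 either way):

```python
# Spectral check of s(t+1) = const + M s(t) with
#   M[i, i+1] = -(1 + 1/(2 a_i)),  M[i, i-1] = 1/(2 a_i) - 1,  zero diagonal.
import numpy as np

a = np.array([0.1, 0.2, 0.3])
M = np.zeros((3, 3))
for i in range(3):
    M[i, (i + 1) % 3] = -(1 + 1 / (2 * a[i]))
    M[i, (i - 1) % 3] = 1 / (2 * a[i]) - 1

eig = np.linalg.eigvals(M)
# One eigenvalue has modulus 2; the complex pair has modulus sqrt(26) ≈ 5.1.
assert np.isclose(np.abs(eig), 2).any()
assert np.abs(eig).max() > 1    # spectral radius > 1 => BR dynamics unstable
```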
Interface
Design
• PEEL Lab, U. Pittsburgh
• All sessions:
  – 3-player groups, 50 periods
  – Same group and ID#s for all periods
  – Payoffs etc. common information
  – No explicit public good framing
  – Calculator always available
  – 5-minute 'warm-up' with the calculator
• Sessions 1-6: guess s_L and s_R
• Sessions 7-13: baseline, no guesses
Does Elicitation Affect Choice?
• Total variation, Σ_t |x_t - x_{t-1}|:
  – No significant difference (p = 0.745)
• Number of strategy switches:
  – No significant difference (p = 0.405)
• Autocorrelation (predictability):
  – Slightly more without elicitation
• Total earnings per session:
  – No significant difference (p = 1)
• Missed periods:
  – Elicited: 9/300 (3%) vs. not elicited: 3/350 (0.8%)
Does Play Converge?
[Figure: Average distance from equilibrium, periods 1-50. Left panel: average |s_i - s_i*| per period; right panel: average |y - y°| per period.]
Does Play Converge, Part 2
[Figure: strategies by period, periods 1-50; strategy axis from -10 to 10.]
Accuracy of Beliefs
• Guesses get better over time
[Figure: average || guessed s_{-i} - actual s_{-i}(t) || per period. Left panel: elicited guesses; right panel: calculator inputs.]
Model 1: Parametric EWA
• δ: weight on strategies other than the one actually played (hypothetical payoffs)
• φ: decay rate of past attractions
• ρ: decay rate of past experience
• A(0): initial attractions
• N(0): initial experience
• λ: response sensitivity to attractions
A_i^t(s_i) = [ φ·N^{t-1}·A_i^{t-1}(s_i) + (δ + (1-δ)·I(s_i, s_i^t))·u_i(s_i, s_{-i}^t) ] / N^t

N^t = ρ·N^{t-1} + 1

P_i^t(s_i) = e^{λ·A_i^t(s_i)} / Σ_{x∈S_i} e^{λ·A_i^t(x)}
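One step of the parametric EWA update and the logit choice rule can be sketched as follows (illustrative only, not the paper's estimation code; the grid and payoff function here are made up):

```python
# One EWA attraction update plus logit choice probabilities.
#   A_t(s) = [phi*N_{t-1}*A_{t-1}(s) + (delta + (1-delta)*1{s = s_t})*u(s)] / N_t
#   N_t    = rho*N_{t-1} + 1
import math

def ewa_step(A, N, played, payoff_fn, delta, phi, rho):
    """One attraction update; payoff_fn(s) gives u_i(s, s_{-i}^t)."""
    N_new = rho * N + 1
    A_new = {}
    for s in A:
        weight = delta + (1 - delta) * (1 if s == played else 0)
        A_new[s] = (phi * N * A[s] + weight * payoff_fn(s)) / N_new
    return A_new, N_new

def logit_probs(A, lam):
    m = max(A.values())                      # subtract max for numerical stability
    expA = {s: math.exp(lam * (v - m)) for s, v in A.items()}
    total = sum(expA.values())
    return {s: v / total for s, v in expA.items()}

# Toy usage on a coarse grid with a made-up payoff function peaked at s = 3:
grid = list(range(-10, 11))
A = {s: 0.0 for s in grid}
A, N = ewa_step(A, 1.0, played=2, payoff_fn=lambda s: -abs(s - 3),
                delta=0.5, phi=0.9, rho=0.9)
P = logit_probs(A, lam=0.04)
assert abs(sum(P.values()) - 1) < 1e-9
```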
Model 1’: Self-Tuning EWA
• N(0) = 1
• Replace δ and φ with deterministic functions:
δ_i^t(s_i) = 1 if u_i(s_i, s_{-i}^t) ≥ u_i(s^t); 0 otherwise

φ_i(t) = 1 - (1/2)·Σ_{x∈S_{-i}} [ (1/t)·Σ_{τ=1}^t I(x, s_{-i}^τ) - I(x, s_{-i}^t) ]²

A_i^t(s_i) = [ φ_i(t)·N^{t-1}·A_i^{t-1}(s_i) + (δ_i^t(s_i) + (1-δ_i^t(s_i))·I(s_i, s_i^t))·u_i(s_i, s_{-i}^t) ] / N^t

N^t = φ_i(t)·N^{t-1} + 1
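The surprise-index form of φ_i(t) can be sketched directly (illustrative; opponent profiles are treated as hashable keys, and unobserved profiles contribute zero to the sum):

```python
# phi_i(t) = 1 - S_i(t)/2, where the surprise index S_i(t) compares the
# cumulative frequency of each opponent profile over periods 1..t with the
# indicator of the period-t observation.
from collections import Counter

def self_tuning_phi(history):
    """history: list of observed opponent profiles, periods 1..t."""
    t = len(history)
    freq = Counter(history)                    # cumulative counts per profile
    current = history[-1]
    surprise = 0.0
    for profile, count in freq.items():
        h = count / t                          # (1/t) * sum_tau 1{x = s_-i^tau}
        r = 1.0 if profile == current else 0.0 # 1{x = s_-i^t}
        surprise += (h - r) ** 2
    return 1 - surprise / 2

# Opponents repeating the same profile forever: no surprise, phi = 1.
assert self_tuning_phi([(2.5, 2.5)] * 10) == 1.0
# A brand-new profile after 9 identical periods pushes phi down.
assert self_tuning_phi([(2.5, 2.5)] * 9 + [(0.0, 0.0)]) < 1.0
```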
STEWA: Setup
• Only remaining parameters: λ and A0
– λ will be estimated
– 5 minutes of ‘Calculator Time’ gives A0
• Average payoff from the calculator trials (T trials during warm-up):

A^0(s_i) = [ Σ_{t=1}^T I(s_i, s_i^t)·u_i(s^t) ] / [ Σ_{t=1}^T I(s_i, s_i^t) ]   if Σ_{t=1}^T I(s_i, s_i^t) ≥ 1

A^0(s_i) = (1/T)·Σ_{t=1}^T u_i(s^t)   otherwise
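The A^0 rule above — per-strategy average calculator payoff, with the overall average as a fallback for strategies never tried — can be sketched as follows (the trial data here is hypothetical):

```python
# Initial attractions from warm-up calculator trials.
def initial_attractions(grid, trials):
    """trials: list of (strategy, payoff) pairs from calculator use."""
    overall = sum(p for _, p in trials) / len(trials)
    A0 = {}
    for s in grid:
        payoffs = [p for strat, p in trials if strat == s]
        A0[s] = sum(payoffs) / len(payoffs) if payoffs else overall
    return A0

A0 = initial_attractions([0, 1, 2], [(1, 80.0), (1, 100.0), (2, 50.0)])
assert A0[1] == 90.0 and A0[2] == 50.0   # averages of each strategy's own trials
assert abs(A0[0] - 230.0 / 3) < 1e-9     # untried strategy gets the overall mean
```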
STEWA: Fit
• Likelihoods are 'zero' for all λ
  – Guess: lots of near misses in predictions
• Alternative measure: quadratic scoring rule (QSR)

  QSR_t = Σ_k ( P_i(s_i^k, t) - I(s_i^k, s_i^t) )²

  – Best fit: λ = 0.04 (previous studies: λ > 4)
  – Suggests attractions are very concentrated
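The quadratic score is straightforward to compute. A sketch, assuming the loss form above; note that a uniform forecast over K = 21 grid points scores (K-1)/K = 20/21 ≈ 0.952, matching the random-choice benchmark in the comparison tables:

```python
# Per-period quadratic score: sum over grid points of
# (predicted prob - indicator of the realized strategy)^2. Lower is better.
def quad_score(probs, actual):
    """probs: dict strategy -> predicted probability; actual: realized strategy."""
    return sum((p - (1.0 if s == actual else 0.0)) ** 2
               for s, p in probs.items())

grid = list(range(-10, 11))                 # 21 integer grid points
uniform = {s: 1 / len(grid) for s in grid}
score = quad_score(uniform, actual=3)
assert abs(score - 20 / 21) < 1e-9          # ≈ 0.952, the uniform benchmark
```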
[Figure: STEWA, λ = 4 — predicted probabilities by strategy and period; Session 3, Player 2, Periods 11-30.]
[Figure: STEWA, λ = 0.04 — predicted probabilities by strategy and period; Session 3, Player 2, Periods 11-30.]
STEWA: Adjustment Attempts
• The problem: near misses in strategy space, not in time
• Suggests: alter δ (weight on hypotheticals)
  – Original specification: QSR* = 1.193 @ λ* = 0.04
  – δ = 0.7 (p-beauty estimate): QSR* = 1.056 @ λ* = 0.03
  – δ = 1 (belief model): QSR* = 1.082 @ λ* = 0.175
  – δ(k,t) = % of BR payoff: QSR* = 1.077 @ λ* = 0.06
• Altering φ:
  – 1/8 weight on surprises: QSR* = 1.228 @ λ* = 0.04
STEWA: Other Modifications
• Equal initial attractions: worse
• Smoothing:
  – Takes advantage of the strategy space structure
  – λ spreads probability across all strategies evenly; smoothing spreads probability to nearby strategies
  – Smoothed attractions; smoothed probabilities
  – But… no improvement in QSR* or λ*!
• Tentative conclusion:
  – STEWA is either not broken or can't be fixed…
Other Standard Models
• Nash equilibrium
• Uniform mixed strategy ('random')
• Logistic Cournot BR
• Deterministic Cournot BR
• Logistic fictitious play
• Deterministic fictitious play
• k-period BR:

  s^t = BR( (1/k)·Σ_{τ=t-k}^{t-1} s^τ )
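k-period BR can be sketched with the best-response formula assumed earlier (the BR sign convention is an assumption carried over from the game slide; k = 1 reduces to Cournot):

```python
# Best respond to the average of the last k observed strategy profiles.
a = [0.1, 0.2, 0.3]
b = [1.5, 3.0, 4.5]

def br(i, s):
    """Assumed BR: s_i = b_i/(2a_i) - (1 + 1/(2a_i)) s_{i+1} + (1/(2a_i) - 1) s_{i-1}."""
    return (b[i] / (2 * a[i])
            - (1 + 1 / (2 * a[i])) * s[(i + 1) % 3]
            + (1 / (2 * a[i]) - 1) * s[(i - 1) % 3])

def k_period_br(i, history, k):
    recent = history[-k:]                       # last k profiles
    avg = [sum(p[j] for p in recent) / len(recent) for j in range(3)]
    return br(i, avg)

# At the PSNE, k-period BR reproduces the equilibrium strategy.
history = [(2.5, 2.5, 2.5)] * 5
assert abs(k_period_br(0, history, k=4) - 2.5) < 1e-9
```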
“New” Models
• Best respond to stated beliefs (Sessions 1-6 only)
• Best respond to calculator entries
  – Issue: how to aggregate calculator usage?
  – Decaying average of inputs
• Reinforcement based on calculator payoffs
  – Decaying average of payoffs
Model Comparisons

MODEL              PARAM   BIC           2-QSR            MAD             MSD
Random Choice*     N/A     In: Infinite  In:  0.952       In:  7.439      In:  82.866
                                         Out: 0.878       Out: 7.816      Out: 85.558
Logistic STEWA*    λ       In: Infinite  In:  0.807       In:  3.818      In:  34.172
                                         Out: 0.665       Out: 3.180      Out: 22.853
                                         (λ* = 0.04)      (λ* = 0.41)     (λ* = 0.35)
Logistic Cournot*  λ       In: Infinite  In:  0.952       In:  4.222      In:  38.186
                                         Out: 0.878       Out: 3.557      Out: 25.478
                                         (λ* = 0.00 !)    (λ* = 4.30)     (λ* = 4.30)
Logistic F.P.*     λ       In: Infinite  In:  0.955       In:  4.265      In:  31.062
                                         Out: 0.878       Out: 3.891      Out: 22.133
                                         (λ* = 14.98)     (λ* = 4.47)     (λ* = 4.47)

* Estimates on the grid of integers {-10, -9, …, 9, 10}
In = periods 1-35; Out = periods 36-end
Model Comparisons 2

MODEL                      PARAM      MAD              MSD
BR(Guesses)                N/A        In:  5.5924      In:  57.874
  (6 sessions only)                   Out: 3.3693      Out: 19.902
BR(Calculator Input)       δ (= 1/2)  In:  6.394       In:  79.29
                                      Out: 8.263       Out: 116.7
Calculator Reinforcement*  δ (= 1/2)  In:  7.389       In:  82.407
                                      Out: 7.815       Out: 85.495
k-Period BR                k          In:  4.2126      In:  35.185
                                      Out: 3.582       Out: 23.455
                                      (k* = 4)         (k* = 4)
Cournot                    N/A        In:  4.7974      In:  45.283
                                      Out: 3.857       Out: 29.058
Weighted F.P.              δ          In:  4.500       In:  38.290
                                      Out: 3.518       Out: 22.426
                                      (δ* = 0.56)      (δ* = 0.65)
The “Take-Homes”
• Methodological issues:
  – Infinite strategy space
  – Convergence vs. instability
  – The right notion of error
• Self-Tuning EWA fits best.
• Guesses & calculator input don’t seem to offer any more predictive power… ?!?!