Adaptive Regret Minimization in Bounded Memory Games
Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha
GameSec 2013 – Invited Paper
Motivating Example: Cheating Game
Semester 1 Semester 2 Semester 3
Motivating Example: Speeding Game
Week 1 Week 2 Week 3
Motivating Example: Speeding Game

Actions: High Inspection, Low Inspection
Outcomes: Speed, Behave

Questions: What is the appropriate game model for this interaction? What are good defender strategies?
Game Elements
- Repeated interaction
- Two players: Defender and Adversary
- Imperfect information: the Defender only observes the outcome
- Short-term adversaries
- Adversary incentives unknown to the Defender
  - Last presentation! [JNTP 13]
  - Adversary may be uninformed/irrational

Repeated game model? Stackelberg?
Additional Game Elements
- History-dependent actions
  - Adversary adapts behavior following an unknown strategy
  - How should the defender respond?
- History-dependent rewards
  - Point system
  - Reputation of the defender depends both on its history and on the current outcome

Standard regret minimization? Repeated game model?
Outline
- Motivation
- Background
  - Standard definition of regret
  - Regret minimization algorithms
  - Limitations
- Our contributions
  - Bounded memory games
  - Adaptive regret
  - Results
Speeding Game: Repeated Game Model
Defender's (D) expected utility:

|                 | Speed | Behave |
|-----------------|-------|--------|
| High Inspection | 0.19  | 0.7    |
| Low Inspection  | 0.2   | 1      |
Regret Minimization Example

Experts: Low Inspection, High Inspection

What should I do?
Regret Minimization Example

|           | Day 1 | Day 2  | Day 3  |
|-----------|-------|--------|--------|
| Adversary | Speed | Behave | Behave |
| Aristotle | High  | Low    | High   |
| Plato     | Low   | Low    | Low    |

Defender's utility (payoffs as in the speeding game):
- Aristotle: 0.19 + 1 + 0.7 = 1.89
- Plato: 0.2 + 1 + 1 = 2.2
- Defender: 1.59
Regret Minimization Example

Utilities: Aristotle 1.89, Plato 2.2, Defender 1.59

Regret = best expert's utility − defender's utility = 2.2 − 1.59 = 0.61
Regret Minimization Example

Regret Minimization Algorithm (A):

$$\lim_{T\to\infty} \frac{\mathrm{Regret}(A, \mathrm{Experts}, T)}{T} \le 0$$
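The regret computation above can be sketched directly; a minimal Python example where the payoff table is the speeding game's and the expert advice sequences (Aristotle, Plato) come from the example slides. All function names here are illustrative, not from the paper:

```python
# Defender's payoffs in the speeding game: (defender action, adversary action).
PAYOFF = {
    ("high", "speed"): 0.19, ("high", "behave"): 0.7,
    ("low", "speed"): 0.2,   ("low", "behave"): 1.0,
}

def total_utility(defender_actions, adversary_actions):
    return sum(PAYOFF[(d, a)] for d, a in zip(defender_actions, adversary_actions))

def regret(defender_actions, experts, adversary_actions):
    """Regret = best expert's total utility minus the defender's."""
    mine = total_utility(defender_actions, adversary_actions)
    best = max(total_utility(e, adversary_actions) for e in experts.values())
    return best - mine

adversary = ["speed", "behave", "behave"]       # days 1-3
experts = {
    "aristotle": ["high", "low", "high"],       # earns 0.19 + 1 + 0.7 = 1.89
    "plato":     ["low", "low", "low"],         # earns 0.2 + 1 + 1 = 2.2
}
defender = ["high", "high", "high"]             # earns 0.19 + 0.7 + 0.7 = 1.59
print(round(regret(defender, experts, adversary), 2))  # → 0.61
```

Dividing this quantity by T and letting T grow gives the average regret the definition above bounds.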
Regret Minimization: Basic Idea
Weights: Low Inspection 1.0, High Inspection 1.0

Choose an action probabilistically based on the weights.
Regret Minimization: Basic Idea
Updated weights: Low Inspection 1.5, High Inspection 0.5
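The weight-update idea can be sketched as a multiplicative-weights loop. This is a standard construction, not the paper's algorithm: the learning rate `eta` and the full-information feedback (updating every action's weight by the reward it would have earned) are simplifying assumptions, whereas the paper's setting only observes outcomes:

```python
import random

# Defender's payoffs in the speeding game.
PAYOFF = {
    ("high", "speed"): 0.19, ("high", "behave"): 0.7,
    ("low", "speed"): 0.2,   ("low", "behave"): 1.0,
}

def multiplicative_weights(adversary_plays, eta=0.5):
    """Keep a weight per action, sample proportionally, grow each weight
    multiplicatively with the reward that action would have earned."""
    actions = ["high", "low"]
    weights = {a: 1.0 for a in actions}
    total = 0.0
    for adv in adversary_plays:
        z = sum(weights.values())
        choice = random.choices(actions, weights=[weights[a] / z for a in actions])[0]
        total += PAYOFF[(choice, adv)]
        for a in actions:                          # multiplicative update
            weights[a] *= (1 + eta) ** PAYOFF[(a, adv)]
    return weights, total

weights, _ = multiplicative_weights(["behave"] * 20)
print(weights["low"] > weights["high"])   # → True
```

Against an adversary who always behaves, Low Inspection earns 1 per round versus 0.7 for High, so its weight grows faster, matching the drift shown on the slides.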
Speeding Game

Defender's strategy:
- Nash equilibrium: Low Inspection
- Regret minimization: Low Inspection (the dominant strategy)

Weights: Low Inspection 1.5, High Inspection 0.5
Speeding Game

Defender's strategy:
- Nash equilibrium: Low Inspection
- Regret minimization: Low Inspection (the dominant strategy)

Weights: Low Inspection 1.7, High Inspection 0.3
Speeding Game

Defender's strategy:
- Nash equilibrium: Low Inspection
- Regret minimization: Low Inspection (the dominant strategy)

Weights: Low Inspection 1.9, High Inspection 0.1
Philosophical Argument
"See! My advice was better!" "We need a better game model!"
Unmodeled Game Elements
- Adversary incentives unknown to the Defender
  - Last presentation! [JNTP 13]
  - Adversary may be uninformed/irrational
- History-dependent rewards
  - Point system
  - Reputation of the defender depends both on its history and on the current outcome
- History-dependent actions
  - Adversary adapts behavior following an unknown strategy
  - How should the defender respond?
Outline
- Motivation
- Background
- Our contributions
  - Bounded memory games
  - Adaptive regret
  - Results
Bounded Memory Games
State s encodes the last m outcomes, so states can capture history-dependent rewards. Outcome O_i moves the game from state (O_{i−m}, …, O_{i−2}, O_{i−1}) to state (O_{i−m+1}, …, O_{i−1}, O_i). The defender's payoff depends on the actions d, a played and the current state s.
Bounded Memory Games
State s encodes the last m outcomes. The current outcome depends only on the current actions; the defender's payoff depends on the actions d, a played and the current state s.
Bounded Memory Games - Experts
Expert advice may depend on the last m outcomes. A fixed defender strategy maps each state to an action. Example: "If no violations have been detected in the last m rounds then play High Inspection, otherwise Low Inspection."
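The bounded-memory state and a state-dependent expert can be sketched in a few lines. The encoding below (state = last m outcomes, with illustrative outcome names) is a hypothetical realization of the slide's description:

```python
from collections import deque

class BoundedMemoryState:
    """State of a bounded memory-m game: the last m outcomes."""
    def __init__(self, m):
        self.m = m
        self.outcomes = deque(["no_violation"] * m, maxlen=m)  # initial state
    def update(self, outcome):
        self.outcomes.append(outcome)   # maxlen drops the oldest outcome

def inspection_expert(state):
    """Fixed defender strategy from the slide: High Inspection iff no
    violation was detected in the last m rounds."""
    if "violation" in state.outcomes:
        return "low"
    return "high"

s = BoundedMemoryState(m=3)
print(inspection_expert(s))     # → high
s.update("violation")
print(inspection_expert(s))     # → low
```

Because the state is a fixed-size window, a fixed strategy is just a finite table from states to actions.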
Outline
- Motivation
- Background
- Our contributions
  - Bounded memory games
  - Adaptive regret
  - Results
k-Adaptive Strategy
The adversary commits to a decision tree for the next k rounds: at each node it plays an action (Speed or Behave) and branches on the outcome it observes, over Day 1, Day 2, Day 3, …
k-Adaptive Strategy
A decision tree for the next k rounds. Examples of k-adaptive adversary strategies:
- "I will never speed while I am on vacation."
- "I will speed until I get caught. If I ever get a ticket then I will stop."
- "I will keep speeding until I get two tickets. If I ever get two tickets then I will stop."

Example defender expert: "If violations have been detected in the last 7 rounds then play High Inspection, otherwise Low Inspection."
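A k-adaptive strategy is a depth-k decision tree from observed outcomes to the next action. Two of the slide's example strategies can be sketched as such trees; the outcome names (`ticket` / `no_ticket`) are illustrative:

```python
def speed_until_caught(history):
    """'I will speed until I get caught': history is the list of
    outcomes observed so far."""
    if "ticket" in history:
        return "behave"          # got a ticket once: stop speeding
    return "speed"

def two_ticket_strategy(history):
    """'I will keep speeding until I get two tickets.'"""
    if history.count("ticket") >= 2:
        return "behave"          # stops only after the second ticket
    return "speed"

print(speed_until_caught([]))                        # → speed
print(speed_until_caught(["no_ticket", "ticket"]))   # → behave
print(two_ticket_strategy(["ticket"]))               # → speed
```

Each function implicitly describes a decision tree: the branch taken at round i depends only on the outcomes of rounds 1, …, i−1.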
k-Adaptive Regret
$$\mathrm{Regret}(D, \mathrm{Expert}, T) = \sum_{i=1}^{T} (r_i' - r_i)$$

Both runs start from the same initial state (…, O_{-1}, O_0):

- Defender: actions (a_1,d_1), (a_2,d_2), …, (a_{k+1},d_{k+1}); outcomes O_1, O_2, …, O_{k+1}; rewards r_1, r_2, …, r_{k+1}
- Expert: actions (a_1,d_1'), (a_2',d_2'), …, (a_{k+1}',d_{k+1}'); outcomes O_1', O_2', …, O_{k+1}'; rewards r_1', r_2', …, r_{k+1}'

The adversary is adaptive, so after the first round it may respond differently to the expert (a_2' vs. a_2), and the primed outcomes and rewards can diverge.
k-Adaptive Regret Minimization
Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

$$\lim_{T\to\infty}\left(\max_{E\in \mathrm{EXP}} \frac{\mathrm{Regret}(D, E, T)}{T}\right) \le \gamma.$$
Outline
- Motivation
- Background
- Bounded memory games
- Adaptive regret
- Results
k-Adaptive Regret Minimization
Recall: an algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

$$\lim_{T\to\infty}\left(\max_{E\in \mathrm{EXP}} \frac{\mathrm{Regret}(D, E, T)}{T}\right) \le \gamma.$$

Theorem: For any γ > 0 there is an (inefficient) γ-approximate k-adaptive regret minimization algorithm.
Inefficient Regret Minimization Algorithm
- Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05].
- Each fixed strategy f1, f2, … of the bounded memory-m game is treated as a single action of a repeated game.
Inefficient Regret Minimization Algorithm
- Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05].
- The reward of a fixed strategy f2 in the repeated game is the expected reward in the original game given that:
  1. the defender follows fixed strategy f2 for the next m·k·t rounds of the original game, and
  2. the defender faces the corresponding sequence of k-adaptive adversaries.
Inefficient Regret Minimization Algorithm

Each stage i lasts m·k·t rounds. Because the current outcome depends only on the current actions, the real game and the simulated repeated game see the same outcomes O_1, …, O_m, … in each stage from the start state.
Inefficient Regret Minimization Algorithm

After m rounds in stage i, the two views (real game and simulated repeated game) must converge to the same state, since a state encodes only the last m outcomes.
Inefficient Regret Minimization Algorithm
- Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05].
- Standard regret minimization algorithms maintain a weight for each expert, i.e., for each fixed strategy of the bounded memory-m game.
- Inefficient: there are exponentially many fixed strategies!
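The source of the inefficiency can be made concrete by enumerating the experts of the reduction: every map from states to defender actions is one fixed strategy, and their number is exponential in the number of states. A small sketch (state and action names are illustrative):

```python
from itertools import product

def all_fixed_strategies(states, actions):
    """Enumerate every map state -> action; there are |actions|**|states|
    of them, one weight per map in the reduction."""
    for choice in product(actions, repeat=len(states)):
        yield dict(zip(states, choice))

# With m = 3 binary outcomes there are 2**3 = 8 states, hence
# 2**8 = 256 fixed strategies for a 2-action defender; in general the
# count is |A| ** (|O| ** m).
states = ["".join(bits) for bits in product("01", repeat=3)]
strategies = list(all_fixed_strategies(states, ["high", "low"]))
print(len(states), len(strategies))   # → 8 256
```

Even for modest m the weight vector is astronomically large, which is why the paper's efficient algorithms need an implicit weight representation instead.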
Summary of Technical Results
|                       | Imperfect Information             | Perfect Information |
|-----------------------|-----------------------------------|---------------------|
| Oblivious Regret      | Hard (Theorem 1), APX (Theorem 5) | APX (Theorem 4)     |
| k-Adaptive Regret     | Hard (Theorem 1)                  | Hard (Remark 2)     |
| Fully Adaptive Regret | X (Theorem 6)                     | X (Theorem 6)       |

Perfect information is the easier setting. X – no regret minimization algorithm exists. Hard – no efficient regret minimization algorithm (unless RP = NP). APX – efficient approximate regret minimization algorithm.
Summary of Technical Results
|                       | Imperfect Information             | Perfect Information         |
|-----------------------|-----------------------------------|-----------------------------|
| Oblivious Regret      | Hard (Theorem 1), APX (Theorem 5) | APX (Theorem 4)             |
| k-Adaptive Regret     | Hard (Theorem 1), APX (New!)      | Hard (Remark 2), APX (New!) |
| Fully Adaptive Regret | X (Theorem 6)                     | X (Theorem 6)               |

Perfect information is the easier setting. X – no regret minimization algorithm exists. Hard – no efficient regret minimization algorithm (unless RP = NP). APX – efficient approximate regret minimization algorithm in n, k.
Summary of Technical Results
Ideas: implicit weight representation + dynamic programming.

Warning! f(k) is a very large constant!
Implicit Weights: Outcome Tree
The outcome tree branches on the outcome of each round (Behave/Speed) and has $O(\ln n / \gamma)$ nodes; each edge (u, v) carries a weight $w_{uv}$. How often is edge (u, v) relevant?
Implicit Weights: Outcome Tree
The weight of an expert E is represented implicitly by summing along the outcome tree:

$$w_E = \sum_{(u,v)\in E} R_{uv}\, w_{uv}$$

over the $O(\ln n / \gamma)$ nodes.
Open Questions
- Perfect information: is there an efficient γ-approximate k-adaptive regret minimization algorithm when k = 0 and γ = 0?
- Is there a γ-approximate k-adaptive regret minimization algorithm with a more efficient running time?

|                       | Imperfect Information             | Perfect Information  |
|-----------------------|-----------------------------------|----------------------|
| Oblivious Regret      | Hard (Theorem 1), APX (Theorem 5) | APX (Theorem 4)      |
| k-Adaptive Regret     | Hard (Theorem 1), APX             | Hard (Remark 2), APX |
| Fully Adaptive Regret | X (Theorem 6)                     | X (Theorem 6)        |
Thanks for Listening!
THEOREM 3

Unless RP = NP, there is no efficient regret minimization algorithm for bounded memory games, even against an oblivious adversary.

Reduction from MAX 3-SAT(7/8 + ε) [Hastad01]; similar to the reduction in [EKM05] for MDPs.
THEOREM 3: SETUP
- Defender actions A: {0,1} × {0,1}
- m = O(log n)
- States: two states for each variable, S0 = {s1, …, sn} and S1 = {s'1, …, s'n}
- Intuition: a fixed strategy corresponds to a variable assignment
THEOREM 3: OVERVIEW

- The adversary picks a clause uniformly at random for the next n rounds.
- The defender can earn reward 1 by satisfying this unknown clause within the next n rounds.
- The game will "remember" whether a reward has already been given, so the defender cannot earn a reward multiple times during the n rounds.
THEOREM 3: STATE TRANSITIONS
- Adversary actions B: {0,1} × {0,1,2,3}, written b = (b1, b2); defender actions a = (a1, a2)
- g(a, b) = b1
- f(a, b) = S1 if a2 = 1 or b2 = a1 (reward already given); S0 otherwise (no reward given)
THEOREM 3: REWARDS
With b = (b1, b2):

r(a, b, s) =
- 1 if s ∈ S0 and a1 = b2
- −5 if s ∈ S1, f(a, b) = S0, and b2 ≠ 3
- 0 otherwise

In particular, there is no reward whenever the adversary plays b2 = 2, and no reward whenever s ∈ S1.
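The transition and reward rules read off the slides can be sketched in code. This is a hedged reconstruction: the slide's garbled "a = b2" is read as a1 = b2, matching the transition rule's "reward already given" condition, and the state names are illustrative:

```python
def f(a, b):
    """Which half of the state space the game moves to."""
    a1, a2 = a
    b1, b2 = b
    if a2 == 1 or b2 == a1:
        return "S1"        # reward already given
    return "S0"            # no reward given yet

def r(a, b, s):
    """Defender reward for actions a, b at state s."""
    a1, _ = a
    _, b2 = b
    if s == "S0" and a1 == b2:
        return 1           # matched the hidden clause's literal
    if s == "S1" and f(a, b) == "S0" and b2 != 3:
        return -5          # punished for leaving S1 early
    return 0

print(r((1, 0), (0, 1), "S0"))   # → 1
print(r((0, 0), (0, 1), "S1"))   # → -5
```

Note that earning the reward (a1 = b2) forces f(a, b) = S1, which is exactly why a reward cannot be collected twice within the n rounds.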
THEOREM 3: OBLIVIOUS ADVERSARY

Let (d1, …, dn) be a binary De Bruijn sequence of order n.

1. Pick a clause C uniformly at random.
2. For i = 1, …, n play b = (di, b2), where b2 = 1 if xi appears in C, 0 if ¬xi appears in C, 3 if i = n, and 2 otherwise.
3. Repeat step 1.
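The rule for b2 can be sketched as follows. The clause encoding (a set of signed literal indices) is a hypothetical choice, and the negated-literal case is a reconstruction of the garbled slide text:

```python
def b2_for_round(i, n, clause):
    """Adversary's second coordinate in round i of the n-round phase.
    clause is a set of signed literal indices, e.g. {1, -3} for (x1 OR NOT x3)."""
    if i == n:
        return 3           # final round: defender may leave S1 unpunished
    if i in clause:        # x_i appears positively in C
        return 1
    if -i in clause:       # x_i appears negated in C
        return 0
    return 2               # x_i not in C: no reward possible this round

clause = {1, -3}           # (x1 OR NOT x3), n = 4 variables
print([b2_for_round(i, 4, clause) for i in range(1, 5)])   # → [1, 2, 0, 3]
```

The defender earns the reward exactly when its assignment bit a1 for x_i agrees with the clause's literal, i.e., when the assignment satisfies the clause.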
ANALYSIS

- The defender can never be rewarded from s ∈ S1.
- Getting a reward forces a transition to s ∈ S1.
- The defender is punished for leaving S1, unless the adversary plays b2 = 3 (i.e., when i = n).

Recall: f(a, b) = S1 if a2 = 1 or b2 = a1, and S0 otherwise; r(a, b, s) = 1 if s ∈ S0 and a1 = b2, −5 if s ∈ S1, f(a, b) = S0, and b2 ≠ 3, and 0 otherwise.
THEOREM 3: ANALYSIS

- φ – an assignment satisfying a ρ fraction of clauses; the corresponding fixed strategy fφ has average score ρ/n.
- Claim: no strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n, where ρ* is the maximum fraction of simultaneously satisfiable clauses.
- Regret minimization algorithm: run until the expected average regret is < ε/n; then the expected average score is > (ρ* − ε)/n.