REGRET MINIMIZATION IN BOUNDED MEMORY GAMES
Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha
Work in progress
MOTIVATING EXAMPLE

Employee Actions: {Behave, Violate}
AUDIT PROCESS EXAMPLE

|             | Behave | Violate |
|-------------|--------|---------|
| Ignore      | 0      | -5      |
| Investigate | -1     | -1      |

The game proceeds in rounds:

| Round   | Employee | Organization | Expert      | Outcome            | Reward |
|---------|----------|--------------|-------------|--------------------|--------|
| Round 1 | Behave   | Ignore       | Investigate | No Violation       | 0      |
| Round 2 | Violate  | Ignore       | Investigate | Missed Violation   | -5     |
| Round 3 | Violate  | Investigate  | Investigate | Detected Violation | -1     |
| …       | …        | …            | …           | …                  | …      |

Reward of Organization: -6
Rounds: 3
Reward of Best Expert (in hindsight): -3
Regret: (-3) - (-6) = 3
Average Regret: 1
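The regret arithmetic above can be reproduced with a short sketch (the payoff matrix and round outcomes come from the slides; the variable names are ours):

```python
# Payoff matrix for the organization: reward[org_action][employee_action]
reward = {
    "Ignore":      {"Behave": 0,  "Violate": -5},
    "Investigate": {"Behave": -1, "Violate": -1},
}

# The three rounds from the slide: (employee action, organization action)
rounds = [("Behave", "Ignore"), ("Violate", "Ignore"), ("Violate", "Investigate")]

# Actual reward accumulated by the organization
actual = sum(reward[org][emp] for emp, org in rounds)   # 0 + (-5) + (-1) = -6

# Best fixed expert in hindsight: play the same action every round
best_expert = max(
    sum(reward[expert][emp] for emp, _ in rounds)
    for expert in ("Ignore", "Investigate")
)                                                       # "Investigate" earns -3

regret = best_expert - actual                           # (-3) - (-6) = 3
avg_regret = regret / len(rounds)                       # 1.0
print(actual, best_expert, regret, avg_regret)
```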
TALK OUTLINE
- Motivation
- Bounded Memory Game Model
- Defining Regret
- Regret Minimization in Bounded Memory Games
  - Feasibility
  - Complexity
ELEMENTS OF GAME MODEL
- Two Players: Adversary (Employee) and Defender (Organization)
- Actions:
  - Adversary Actions: {Violate, Behave}
  - Defender Actions: {Investigate, Ignore}
- Repeated interactions: each interaction has an outcome; the history of game play is a sequence of outcomes
- Imperfect Information: the organization doesn't always observe the actions of the employee
- Could be formalized as a repeated game
ADDITIONAL ELEMENTS OF GAME MODEL
Move to a richer game model:
- History-dependent Rewards:
  - Save money by ignoring
  - Reputation possibly damaged if we Ignore and the employee did Violate
  - The reputation of the organization depends both on its history and on the current outcome
- History-dependent Actions:
  - Players' behavior may depend on history
  - The Defender's behavior may depend on the complete history
ADVERSARY MODELS
The adversary's behavior depends only on the history he remembers:
- Fully Oblivious Adversary: remembers only the round number
- k-Adaptive Adversary: remembers the round number, but his history is reset every k turns
- Adaptive Adversary: remembers the complete game history
GOAL
- Define the game model
- Define notions of regret for the Defender in the game model with respect to different adversary models
- Study the complexity of the regret minimization problem in this game model
PRIOR WORK: REGRET MINIMIZATION FOR GAME MODELS
- Regret minimization is well studied in repeated games, including under imperfect information (the bandit model) [AK04, McMahanB04, K05, FKM05, DH06]
- The Defender compares his performance to the performance of the best expert in hindsight
- Traditional view: for the sake of comparison, we assume that the adversary was oblivious
TALK OUTLINE
- Motivation
- Bounded Memory Game Model
- Defining Regret
- Regret Minimization in Bounded Memory Games
TWO-PLAYER STOCHASTIC GAMES
- Two players: Defender and Adversary
- Transitions between states can be probabilistic, and may depend on the actions (a, b) taken by the Defender and the Adversary
- r(a, b, s): payoff when actions (a, b) are played at state s
STOCHASTIC GAMES
- Capture the dependence of rewards on history
- A fixed strategy is a function f mapping S (states) to actions A
- Experts: the set of fixed strategies
- Capture the dependence of the Defender's actions on history
- Recall: the additional elements of the game model in the motivating example
NOTATION
- Number of states in the game: n
- Number of actions: |A|
- Number of experts: N = |A|^n
- Total rounds of play: T
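A quick sanity check of the expert count N = |A|^n, using illustrative numbers (the memory-1 example later in the talk has four states and two defender actions):

```python
n = 4        # number of states
A = 2        # number of defender actions, e.g., {UP, DOWN}
N = A ** n   # one expert per fixed mapping from states to actions
print(N)     # 16
```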
THEOREM 1: REGRET MINIMIZATION IS IMPOSSIBLE FOR STOCHASTIC GAMES
- Oblivious strategy i: play b2 for i rounds, then play b1
  - Optimal Defender strategy: play a1 every round
- Oblivious strategy: play b2 every round
  - Optimal Defender strategy: play a2 every round
OUR GAME MODEL
- Definition of bounded memory games, a subclass of stochastic games
- Memory-m games: states record the last m outcomes in the history of the game play
BOUNDED MEMORY GAME: STATES & ACTIONS
- Four states: each records the last outcome (memory-1)
- Defender Actions: {UP, DOWN}
- Adversary Actions: {LEFT, RIGHT}
BOUNDED MEMORY GAME: OUTCOMES
Four outcomes:
1. (Up, Left)
2. (Up, Right)
3. (Down, Left)
4. (Down, Right)

- The outcome depends only on the current actions of the Defender and the Adversary
- It is independent of the current state
BOUNDED MEMORY GAME: STATES
Four states:
1. (Top, Left)
2. (Top, Right)
3. (Bottom, Left)
4. (Bottom, Right)
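In this memory-1 game the next state simply records the outcome just played. A minimal sketch of that state update (function and variable names are ours):

```python
def step(defender: str, adversary: str) -> tuple:
    """Memory-1 update: the next state records the last outcome.
    The defender's action picks the row (Up -> Top, Down -> Bottom)
    and the adversary's action picks the column (Left/Right)."""
    row = "Top" if defender == "Up" else "Bottom"
    return (row, adversary)

state = ("Top", "Left")        # some initial state
state = step("Down", "Left")   # outcome (Bottom, Left) becomes the new state
state = step("Down", "Right")  # outcome (Bottom, Right) becomes the new state
print(state)
```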
BOUNDED MEMORY GAME: EXAMPLE GAME PLAY

| Round   | State           | Defender | Adversary | Outcome         |
|---------|-----------------|----------|-----------|-----------------|
| Round 1 | (Top, Left)     | Down     | Left      | (Bottom, Left)  |
| Round 2 | (Bottom, Left)  | Down     | Right     | (Bottom, Right) |
| Round 3 | (Bottom, Right) | …        | …         | …               |
BOUNDED MEMORY GAMES: REWARDS
The Defender sees reward 1 if:
- the Adversary plays LEFT from a green state, or
- the Adversary plays RIGHT from a blue state
TRADITIONAL REGRET IN BOUNDED MEMORY GAMES
Adversary strategy:
- Plays RIGHT from a green state
- Plays LEFT from a blue state

- The Defender will never see a reward!
- In hindsight, it may look like the fixed strategy "always play UP" would have received a reward
- What view of regret makes sense?
TRADITIONAL REGRET IN BOUNDED MEMORY GAMES

| Round | State          | Defender | Adversary | Reward |
|-------|----------------|----------|-----------|--------|
| One   | (Top, Left)    | Up       | Right     | 0      |
| Two   | (Top, Right)   | Down     | Left      | 0      |
| Three | (Bottom, Left) | Down     | Left      | 0      |
| Four  | (Bottom, Left) | Up       | Left      | 0      |
| Five  | (Top, Left)    | …        | …         | …      |
TRADITIONAL REGRET IN BOUNDED MEMORY GAMES

Actual game:

| Round | State          | Defender | Adversary | Reward |
|-------|----------------|----------|-----------|--------|
| One   | (Top, Left)    | Up       | Right     | 0      |
| Two   | (Top, Right)   | Down     | Left      | 0      |
| Three | (Bottom, Left) | Down     | Left      | 0      |
| Four  | (Bottom, Left) | Up       | Left      | 0      |
| Five  | (Top, Left)    | …        | …         | …      |

Hypothetical game (fixed strategy "always play UP" against the same adversary moves):

| Round | State        | UP | Adversary | Reward |
|-------|--------------|----|-----------|--------|
| One   | (Top, Left)  | Up | Right     | 0      |
| Two   | (Top, Right) | Up | Left      | 0      |
| Three | (Top, Left)  | Up | Left      | 1      |
| Four  | (Top, Left)  | Up | Left      | 1      |

- The Defender will never see a reward!
- In hindsight, it looks like the fixed strategy "always play UP" would have received reward 2
- What are other ways to compare our performance?
COMPARING PERFORMANCE WITH THE EXPERTS
- Actual game: Defender vs. the real Adversary
- Hypothetical game (fixed strategy f): f vs. a hypothetical Adversary
- Compare the performance of the Defender in the actual game to the performance of f in the hypothetical game
REGRET MODELS

| Actual \ Hypothetical | Oblivious | k-Adaptive | Adaptive |
|-----------------------|-----------|------------|----------|
| Oblivious             |           |            |          |
| k-Adaptive            | —         |            |          |
| Adaptive              | —         | —          |          |

- Hypothetical Oblivious Adversary: hard-code the moves played by the actual adversary
- Hypothetical k-Adaptive Adversary: hard-code the state of the actual adversary after each window of k moves
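The "hard-code the moves" construction for a hypothetical oblivious adversary can be sketched as follows (all names are ours): the hypothetical adversary simply replays, round by round, the moves recorded in the actual game, regardless of what the expert does.

```python
def hypothetical_oblivious(actual_moves):
    """Hard-code the actual adversary's moves: in round t the hypothetical
    adversary replays actual_moves[t], ignoring the expert's play."""
    def adversary(round_t):
        return actual_moves[round_t]
    return adversary

recorded = ["Right", "Left", "Left", "Left", "Right"]  # moves from an actual game
adv = hypothetical_oblivious(recorded)
replayed = [adv(t) for t in range(len(recorded))]
print(replayed)
```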
REGRET IN BOUNDED MEMORY GAMES (REVISITED)

Actual game:

| Round     | One         | Two          | Three          | Four        | Five            | … |
|-----------|-------------|--------------|----------------|-------------|-----------------|---|
| State     | (Top, Left) | (Top, Right) | (Bottom, Left) | (Top, Left) | (Bottom, Right) | … |
| Defender  | Up          | Down         | Down           | Down        | Up              | … |
| Adversary | Right       | Left         | Left           | Left        | Right           | … |
| Reward    | 0           | 0            | 0              | 0           | 0               | … |

Hypothetical game (expert "always play UP"):

| Round | State        | UP | Adversary | Reward |
|-------|--------------|----|-----------|--------|
| One   | (Top, Left)  | Up | Right     | 0      |
| Two   | (Top, Right) | Up | Left      | 0      |
| Three | (Top, Left)  | Up | Right     | 0      |
| Four  | (Top, Right) | Up | Left      | 0      |
| Five  | (Top, Left)  | Up | Right     | 0      |

- In hindsight, our hypothetical adversary model is k-Adaptive (k = 5)
- What is our regret now?
- Actual performance: 0. Performance of the expert: 0. Regret: 0.
MEASURING REGRET IN HINDSIGHT
- View 1: the hypothetical adversary is oblivious
  - The adversary would have played the exact same moves played in the actual game
  - Traditional view of regret: repeated games [Blackwell56, Hannan57, LW89], etc.
  - Impossible for bounded memory games (see the example)
- View 2: the hypothetical adversary is k-Adaptive
  - The adversary would have used the exact same strategy during each window of k moves
- View 3: the hypothetical adversary is fully adaptive
  - The hypothetical adversary is the real adversary
REGRET MINIMIZATION ALGORITHMS
- Regret Minimization Algorithm: average regret → 0 as T → ∞
- Examples for repeated games:
  - Weighted Majority Algorithm [LW89]: average regret O(((log N)/T)^½)
  - Bandit setting [ACFS02]: average regret O(((N log N)/T)^½)
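A minimal multiplicative-weights sketch in the spirit of Weighted Majority, assuming full information (after each round the defender sees every expert's reward in [0, 1]); the learning rate and all names are ours, not the algorithm exactly as stated in [LW89]:

```python
import math

def multiplicative_weights(expert_rewards, horizon, eta=None):
    """Play a weighted average of experts, multiplying each expert's
    weight by exp(eta * reward) after every round.

    expert_rewards(t) returns one reward in [0, 1] per expert.
    Returns (algorithm's expected total reward, best expert's total reward).
    """
    n = len(expert_rewards(0))
    eta = eta if eta is not None else math.sqrt(math.log(n) / horizon)
    weights = [1.0] * n
    alg_total, expert_totals = 0.0, [0.0] * n
    for t in range(horizon):
        rewards = expert_rewards(t)
        total_w = sum(weights)
        # Expected reward when an expert is drawn proportionally to its weight
        alg_total += sum(w * r for w, r in zip(weights, rewards)) / total_w
        for i, r in enumerate(rewards):
            expert_totals[i] += r
            weights[i] *= math.exp(eta * r)
    return alg_total, max(expert_totals)

# Two experts: the first earns reward 1 every round, the second earns 0.
alg, best = multiplicative_weights(lambda t: [1.0, 0.0], horizon=1000)
print(best - alg)  # total regret stays O((T log N)^1/2), far below T
```

The weights concentrate on the better expert, so the algorithm's total reward trails the best expert's by only O((T log N)^½), matching the O(((log N)/T)^½) average regret on the slide.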
REGRET MINIMIZATION IN REPEATED GAMES
Easy consequence of Theorem 2

| Actual \ Hypothetical | Oblivious | k-Adaptive | Adaptive |
|-----------------------|-----------|------------|----------|
| Oblivious             |           |            |          |
| k-Adaptive            | —         |            |          |
| Adaptive              | —         | —          | X        |
REGRET MINIMIZATION IN STOCHASTIC GAMES
Theorem 1: No regret minimization algorithm exists for the general class of stochastic games.

| Actual \ Hypothetical | Oblivious | k-Adaptive | Adaptive |
|-----------------------|-----------|------------|----------|
| Oblivious             | X         | X          | X        |
| k-Adaptive            | —         | X          | X        |
| Adaptive              | —         | —          | X        |
THEOREM 1: REGRET MINIMIZATION IS IMPOSSIBLE FOR STOCHASTIC GAMES
- Oblivious strategy i: play b2 for i rounds, then play b1
  - Optimal Defender strategy: play a1 every round
- Oblivious strategy: play b2 every round
  - Optimal Defender strategy: play a2 every round
REGRET MINIMIZATION IN BOUNDED MEMORY GAMES
Theorem 2 (time permitting)

| Actual \ Hypothetical | Oblivious | k-Adaptive | Adaptive |
|-----------------------|-----------|------------|----------|
| Oblivious             |           | X          | X        |
| k-Adaptive            | —         |            |          |
| Adaptive              | —         | —          | X        |
REGRET MINIMIZATION IN BOUNDED MEMORY GAMES
Theorem 3: Unless RP = NP, there is no efficient regret minimization algorithm for the general class of bounded memory games.

| Actual \ Hypothetical | Oblivious | k-Adaptive | Adaptive |
|-----------------------|-----------|------------|----------|
| Oblivious             | Hard      | X          | X        |
| k-Adaptive            | —         | Hard       | Hard     |
| Adaptive              | —         | —          | X        |
THEOREM 3
Unless RP = NP, there is no efficient regret minimization algorithm for bounded memory games, even against an oblivious adversary.
- Reduction from MAX 3-SAT (7/8 + ε) [Hastad01]
- Similar to the reduction in [EKM05] for MDPs
THEOREM 3: SETUP
- Defender actions A: {0,1} × {0,1}
- m = O(log n)
- States: two states for each variable: S0 = {s1, …, sn}, S1 = {s'1, …, s'n}
- Intuition: a fixed strategy corresponds to a variable assignment
THEOREM 3: OVERVIEW
- The adversary picks a clause uniformly at random for the next n rounds
- The Defender can earn reward 1 by satisfying this unknown clause in the next n rounds
- The game will "remember" if a reward has already been given, so that the Defender cannot earn a reward multiple times during the n rounds
THEOREM 3: STATE TRANSITIONS
- Adversary actions B: {0,1} × {0,1,2,3}
- b = (b1, b2)
- g(a, b) = b1
- f(a, b) = S1 if a2 = 1 or b2 = a1 (reward already given); S0 otherwise (no reward given)
THEOREM 3: REWARDS
b = (b1, b2)

r(a, b, s) =
- 1 if s ∈ S0 and a = b2
- -5 if s ∈ S1 and f(a, b) = S0 and b2 ≠ 3
- 0 otherwise

- No reward whenever B plays b2 = 2
- No reward whenever s ∈ S1
THEOREM 3: OBLIVIOUS ADVERSARY
(d1, …, dn): a binary De Bruijn sequence of order n

1. Pick a clause C uniformly at random
2. For i = 1, …, n: play b = (di, b2), where
   b2 = 1 if xi ∈ C
        0 if ¬xi ∈ C
        3 if i = n
        2 otherwise
3. Repeat from Step 1
ANALYSIS
- The Defender can never be rewarded from s ∈ S1
- Getting a reward ⇒ transition to s ∈ S1
- The Defender is punished for leaving S1, unless the adversary plays b2 = 3 (i.e., when i = n)

f(a, b) = S1 if a2 = 1 or b2 = a1; S0 otherwise

r(a, b, s) =
- 1 if s ∈ S0 and a = b2
- -5 if s ∈ S1 and f(a, b) = S0 and b2 ≠ 3
- 0 otherwise
THEOREM 3: ANALYSIS
- φ: an assignment satisfying a ρ fraction of clauses
- fφ: average score ρ/n
- Claim: no strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n
- Regret Minimization Algorithm: run until the expected average regret < ε/n; then the expected average score > (ρ* - ε)/n
OPEN QUESTIONS
- How hard is regret minimization against an oblivious adversary when A = {0,1} and m = O(log n)?
- How hard is regret minimization against an oblivious adversary in the complete information model?
QUESTIONS?