1© Cognitive Radio Technologies, 2007
Game Theory and Cognitive Radio Networks
Basic Game Models
Equilibria Concepts
Optimality Concepts
Convergence and Noise
Game Models
Models of interactive decision processes
Game
1. A (well-defined) set of 2 or more players.
2. A set of actions for each player.
3. A set of preference relationships for each player for each possible action tuple.
• More elaborate games exist with more components, but these three must always be present.
• Some also introduce an outcome function, which maps action tuples to outcomes that are then valued by the preference relations.
• Games with just these three components (or a variation on the preference relationships) are said to be in Normal Form or Strategic Form.
Set of Players (decision makers)
N – set of n players consisting of players “named” {1, 2, 3, …, i, j, …, n}
Note that n does not mean that there are 14 players in every game.
Other components of the game that “belong” to a particular player are normally indicated by a subscript.
Generic players are most commonly written as i or j.
Usage: N is the SET of players, n is the number of players.
N \ i = {1, 2, …, i-1, i+1, …, n}: all players in N except for i.
Actions
$A_i$ – Set of available actions for player i
$a_i$ – A particular action chosen by i, $a_i \in A_i$
$A$ – Action Space, Cartesian product of all $A_i$: $A = A_1 \times A_2 \times \cdots \times A_n$
$a$ – Action tuple, a point in the Action Space
$A_{-i}$ – Another action space formed from $A$: $A_{-i} = A_1 \times A_2 \times \cdots \times A_{i-1} \times A_{i+1} \times \cdots \times A_n$
$a_{-i}$ – A point from the space $A_{-i}$
$A = A_i \times A_{-i}$
Example Two Player Action Space: $A_1 = A_2 = [0, \infty)$, $A = A_1 \times A_2$
With two players, $A_1 = A_{-2}$ and $A_2 = A_{-1}$, so for points a and b in A: $a_1 = a_{-2}$, $a_2 = a_{-1}$, $b_1 = b_{-2}$, $b_2 = b_{-1}$.
Preference Relations (1/2)
A preference relation expresses an individual player’s desirability of one outcome over another (a binary relationship).
$o \succsim_i o^*$: o is preferred at least as much as o* by player i
$\succsim_i$ Preference Relationship (prefers at least as much as)
$\succ_i$ Strict Preference Relationship (prefers strictly more than)
$\sim_i$ “Indifference” Relationship (prefers equally)
$o \succ_i o^*$ iff $o \succsim_i o^*$ but not $o^* \succsim_i o$
$o \sim_i o^*$ iff $o \succsim_i o^*$ and $o^* \succsim_i o$
Preference Relationship (2/2)
Games generally assume the relationship between actions and outcomes is invertible, so preferences can be expressed over action vectors.
Preferences are really an ordinal relationship
– Know that a player prefers one outcome to another, but quantifying by how much introduces difficulties
Utility Functions (1/2) (Objective Fcns, Payoff Fcns)
A mathematical description of preference relationships.
$u_i : A \to \mathbb{R}$: maps the action space to the set of real numbers.
Preference relation then defined as
$a \succsim_i a^*$ iff $u_i(a) \ge u_i(a^*)$
$a \succ_i a^*$ iff $u_i(a) > u_i(a^*)$
$a \sim_i a^*$ iff $u_i(a) = u_i(a^*)$
Utility Functions (2/2)
By quantifying preference relationships, all sorts of valuable mathematical operations can be introduced.
Also note that the quantification operation is not unique as long as relationships are preserved. Many map preference relationships to [0,1].
Example
Jack prefers Apples to Oranges
$\text{Apples} \succ_{Jack} \text{Oranges} \iff u_{Jack}(\text{Apples}) > u_{Jack}(\text{Oranges})$
a) uJack(Apples) = 1, uJack(Oranges) = 0
b) uJack(Apples) = -1, uJack(Oranges) = -7.5
Variety of game models
Normal Form Game <N,A,{ui}>
– Synchronous play
– T is a singleton
– Perfect knowledge of action space, other players’ goals (called utility functions)
Repeated Game <N,A,{ui},{di}>
– Repeated synchronous play of a normal form game
– T may be finite or infinite
– Perfect knowledge of action space, other players’ goals (called utility functions)
– Players may consider actions in future stages and current stages
  Strategies (modified di)
Asynchronous myopic repeated game <N,A,{ui},{di},T>
– Repeated play of a normal form game under various timings
– Radios react to most recent stage, decision rule is “intelligent”
Many others in the literature and in the dissertation
Cognitive radios are naturally modeled as players in a game
[Figure: cognition cycle (Observe, Orient, Plan, Learn, Decide, Act, interacting with the Outside World) annotated with game components: Action Sets, Decision Rules, Utility Function, Utility Function Arguments, Outcome Space, Goal.]
Adapted from Mitola, “Cognitive Radio for Flexible Mobile Multimedia Communications”, IEEE Mobile Multimedia Conference, 1999, pp. 3-10.
Interaction is naturally modeled as a game
[Figure: Radio 1 and Radio 2 each select actions through their decision rules; the joint Action Space maps to the Outcome Space through $f : A \to O$ (informed by communications theory), and the radios value outcomes with utilities u1 and u2.]
Some differences between game models and cognitive radio network model

               Player        Cognitive Radio
Knowledge      Knows A       Can learn O (may know or learn A)
f : A → O      Invertible    Not invertible (noise)
               Constant      May change over time (though relatively fixed for short periods)
               Known         Has to learn
Preferences    Ordinal       Cardinal (goals)

Cognitive radio networks assume numerous iterations; a normal form game has only a single stage.
– Useful for compactly capturing modeling components at a single stage
– Normal form game properties will be exploited in the analysis of other games
Repeated games are explicitly used as the basis for cognitive radio algorithm design (e.g., Srivastava, MacKenzie)
– Not, however, the focus of the dissertation
– Not the most commonly encountered implementation
Summary
The interactions in a cognitive radio network (levels 1-3) can be represented by the tuple <N, A, {ui}, {di}, T>
A dynamical system model adequately represents inner-loop procedural radios
A myopic asynchronous repeated game adequately represents ontological radios and random procedural radios
– Suitable for outer-loop processes
– Not shown here, but can also handle inner-loop
Some differences in models
– Most analysis carries over
– Some differences
[Figure labels: Game Model, Dynamical System]
Equilibrium Concepts
Nash Equilibria, Mixed Strategy Equilibria, Coalitional Games, the Core, Shapley Value, Nash Bargaining
Steady-states
Recall the model <N,A,{di},T>, which we characterize with the evolution function d.
A steady-state is a point where $a^* = d(a^*)$ for all $t \ge t^*$.
Obvious solution: solve for fixed points of d.
For non-cooperative radios, if a* is a fixed point under synchronous timing, then it is also a fixed point under the other three timings.
Works well for convex action spaces
– Not always guaranteed to exist
– Value of fixed point theorems
Not so well for finite spaces
– Generally requires exhaustive search
Nash Equilibrium
“A steady-state where each player holds a correct expectation of the other players’ behavior and acts rationally.” - Osborne
An action vector from which no player can profitably unilaterally deviate.
Definition
An action tuple a is a NE if for every $i \in N$, $u_i(a_i, a_{-i}) \ge u_i(b_i, a_{-i})$ for all $b_i \in A_i$.
Note that showing that a point is a NE says nothing about the process by which the steady state is reached, nor anything about its uniqueness. Also note that we are implicitly assuming that only pure strategies are possible in this case.
Examples
Cognitive Radios’ Dilemma
– Two radios have two signals to choose between: {n,w} and {N,W}
– n and N do not overlap
– Higher throughput from operating as a high power wideband signal when the other is narrowband
Jamming Avoidance
– Two channels
– No NE
– Players: Transmitter and Jammer

        0         1
0    (-1,1)    (1,-1)
1    (1,-1)    (-1,1)
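The claim that the jamming avoidance game has no NE can be checked by brute force. The sketch below applies the pure-strategy NE definition to every action tuple; the function name `pure_nash_equilibria` and the assumption that payoffs are listed as (transmitter, jammer) are illustrative choices, not something the slides specify.

```python
from itertools import product

def pure_nash_equilibria(payoffs, action_sets):
    """Return every action tuple from which no single player gains by deviating."""
    equilibria = []
    for a in product(*action_sets):
        profitable_deviation = False
        for i, actions_i in enumerate(action_sets):
            for b_i in actions_i:
                deviated = a[:i] + (b_i,) + a[i + 1:]
                if payoffs[deviated][i] > payoffs[a][i]:
                    profitable_deviation = True
        if not profitable_deviation:
            equilibria.append(a)
    return equilibria

# Assumption: tuples are (transmitter channel, jammer channel) and payoffs are
# listed as (transmitter, jammer).
JAMMING = {
    (0, 0): (-1, 1), (0, 1): (1, -1),
    (1, 0): (1, -1), (1, 1): (-1, 1),
}
jamming_ne = pure_nash_equilibria(JAMMING, [(0, 1), (0, 1)])
print(jamming_ne)  # [] : no pure-strategy NE
```

Exhaustive search like this is only practical for small finite action spaces, which is the limitation noted on the steady-states slide.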
How do the players find the Nash Equilibrium?
Preplay Communication
– Before the game, players discuss their options. Note that only NE are suitable candidates for coordination, as one player could profitably violate any other agreement.
Rational Introspection
– Based on what each player knows about the other players, reason what the other players would do in their own best interest. (Best Response - tomorrow) Points where everyone would be playing “correctly” are the NE.
Focal Point
– Some distinguishing characteristic of the tuple causes it to stand out. The NE stands out because it’s every player’s best response.
Trial and Error
– Starting on some tuple which is not a NE, a player “discovers” that deviating improves its payoff. This continues until no player can improve by deviating. Only guaranteed to work for Potential Games (couple weeks)
Iterative Elimination of Strongly Dominated Strategies (IESDS)
Motivation
Sometimes a player’s actions are not preferable, no matter what the other players do. These actions would thus never rationally be played and can be eliminated from consideration in any NE action vector.
Definition
An action of player i, $a_i$, is said to be dominated if there exists some other action $b_i$ such that
$u_i(a_i, a_{-i}) < u_i(b_i, a_{-i}) \quad \forall a_{-i} \in A_{-i}$
Note that this is strictly the definition for strong domination, and the technique being discussed is actually Strong IEDS (or IESDS). Normal (weak) IEDS sometimes results in incorrect elimination of NE.
IEDS Algorithm
1. Remove any readily apparent dominated actions from the analysis.
2. Check to see if this reveals new dominated actions.
3. If there are any dominated actions, repeat from 1, else continue to 4.
4. Apply NE definition on any remaining action tuples.
Example IEDS
(rows: player 1, columns: player 2)
        D          C
C    0, -10     -5, -5
D    -1, -1     -10, 0
Iteration 1. Note the following:
$u_1(C, D) > u_1(D, D)$
$u_1(C, C) > u_1(D, C)$
So D is dominated by C for player 1. So we remove D for player 1 from the game.
Iteration 2. Note the following:
$u_2(C, C) > u_2(C, D)$
So in the remaining game D is dominated by C for player 2. So we remove D for player 2 from the game.
As there is only one action tuple left (thus no deviation is possible, let alone a profitable one), it must be a NE.
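The elimination procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names (`with_action`, `strictly_dominated`, `iesds`) are hypothetical, and the payoff dictionary assumes the example matrix places (D,D) at -1,-1, (C,C) at -5,-5, (C,D) at 0,-10, and (D,C) at -10,0.

```python
from itertools import product

def with_action(others, i, a_i):
    """Insert player i's action into a tuple of the other players' actions."""
    return others[:i] + (a_i,) + others[i:]

def strictly_dominated(i, a_i, sets, payoffs):
    """True if some other remaining action of player i beats a_i against every opponent profile."""
    opponents = [s for j, s in enumerate(sets) if j != i]
    for b_i in sets[i]:
        if b_i == a_i:
            continue
        if all(payoffs[with_action(o, i, b_i)][i] > payoffs[with_action(o, i, a_i)][i]
               for o in product(*opponents)):
            return True
    return False

def iesds(payoffs, action_sets):
    """Iteratively remove strictly dominated actions; return the surviving action sets."""
    sets = [list(s) for s in action_sets]
    changed = True
    while changed:
        changed = False
        for i in range(len(sets)):
            for a_i in list(sets[i]):
                if len(sets[i]) > 1 and strictly_dominated(i, a_i, sets, payoffs):
                    sets[i].remove(a_i)
                    changed = True
    return sets

# Assumed cell placement for the example game, as (player 1 action, player 2 action).
GAME = {
    ("C", "C"): (-5, -5), ("C", "D"): (0, -10),
    ("D", "C"): (-10, 0), ("D", "D"): (-1, -1),
}
surviving = iesds(GAME, [("C", "D"), ("C", "D")])
print(surviving)  # [['C'], ['C']]
```

When more than one tuple survives, the NE definition still has to be applied to the remaining tuples, as in step 4 of the algorithm.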
Comments on IESDS
IESDS is not useful for all games as many games have no dominated strategies.
While each iterative elimination is rational, Dutta states that rationality is more difficult to assume when a large number of iterative eliminations are required. (I disagree for repetitive play)
IESDS can also augment other solution techniques.
If a game is IESDS solvable
Nash Equilibrium as a Fixed Point
Individual Best Response
$\hat{B}_i(a) = \{ b_i \in A_i : u_i(b_i, a_{-i}) \ge u_i(a_i, a_{-i}) \ \forall a_i \in A_i \}$
Synchronous Best Response
$\hat{B}(a) = \times_{i \in N} \hat{B}_i(a)$
Nash Equilibrium as a fixed point
$a^* \in \hat{B}(a^*)$
Fixed point theorems can be used to establish existence of NE (see dissertation)
NE can be solved by implied system of equations
Example solution for Fixed Point by Solving for Best Response Fixed Point
Bandwidth Allocation Game
– Five cognitive radios, with each radio i free to determine the number of simultaneous frequency hopping channels the radio implements, $c_i \in [0, \infty)$.
– Goal: $u_i(c) = P(c)\, c_i - C_i(c_i)$
– P(c) is the fraction of symbols that are not interfered with (making P(c)ci the goodput for radio i)
– Ci(ci) is radio i’s cost for supporting ci simultaneous channels.
$u_i(c) = \left( B - \sum_{k \in N} c_k \right) c_i - K c_i$
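Best Response Analysis
Goal
$u_i(c) = \left( B - \sum_{k \in N} c_k \right) c_i - K c_i$
Best Response
$\hat{c}_i = \hat{B}_i(c) = \left( B - K - \sum_{k \in N \setminus i} c_k \right) / 2$
Simultaneous System of Equations
Solution
$\hat{c}_i = (B - K)/6 \quad \forall i \in N$
Generalization
$\hat{c}_i = (B - K)/(|N| + 1) \quad \forall i \in N$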
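The fixed point can also be reached numerically. The sketch below iterates the best response one radio at a time (round-robin timing) until the allocation settles; the values B = 10 and K = 4 and the function names are illustrative assumptions. Round-robin updating is used deliberately: with five radios, updating all radios simultaneously from this best response overshoots and does not settle.

```python
def best_response(i, c, B, K):
    """Radio i's best response: (B - K - sum of the other radios' channels) / 2."""
    others = sum(c) - c[i]
    return (B - K - others) / 2.0

def round_robin_fixed_point(n, B, K, sweeps=200):
    """Update one radio at a time, in order, for a fixed number of sweeps."""
    c = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            c[i] = best_response(i, c, B, K)
    return c

B, K = 10.0, 4.0          # illustrative values, not taken from the slides
c_star = round_robin_fixed_point(5, B, K)
print(c_star)             # each entry is close to (B - K) / 6
```

The result agrees with the closed-form solution (B - K)/6 for five radios, and changing `n` gives the generalization (B - K)/(|N| + 1).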
Significance of NE for CRNs
Autonomously Rational Decision Rule
Why not “if and only if”?
– Consider a self-motivated game with a local maximum and a hill-climbing algorithm.
– For many decision rules, NE do capture all fixed points (see dissertation)
Identifies steady-states for all “intelligent” decision rules with the same goal.
Implies a mechanism for policy design while accommodating differing implementations
– Verify goals result in desired performance
– Verify radios act intelligently
Key Theorem for NE Existence
Kakutani’s Fixed Point Theorem
– Let $f : X \to X$ be an upper semi-continuous, convex-valued correspondence from a non-empty compact convex set $X \subset \mathbb{R}^n$; then there is some $x^* \in X$ such that $x^* \in f(x^*)$.
[Figure: two plots of f(x) versus x]
Nash Equilibrium Existence
Visualizable Definition of Quasi-Concavity: all upper-level sets are convex
$U(a^*) = \{ a \in A : f(a) \ge f(a^*) \}$
[Figure: two plots of f(a) versus a, each marking the points a0, a1, a2 and the upper-level set U(a1)]
My Favorite Mixed Strategy Story
Pure Strategies in an Extended Game
Consider an extensive form game where each stage is a strategic form game and the action space remains the same at each stage. Before play begins, each player chooses a probabilistic strategy that assigns a probability to each action in his action set. At each stage, the player chooses an action from his action set according to the probabilities he assigned before play began.
Example
Consider a video football game which will be simulated. Before the game begins, two players assign probabilities of calling running plays or passing plays for both offense and defense. In the simulation, for each down, the kind of play chosen by each team is based on the initial probabilities assigned to kinds of plays.
Example Mixed Strategy Game
Jamming game
               a2 (q)    b2 (1-q)
a1 (p)         1,-1      -1,1
b1 (1-p)       -1,1      1,-1
Sets of probability distributions
$\Delta(A_1) = \{ (p, 1-p) : p \in [0,1] \}$
$\Delta(A_2) = \{ (q, 1-q) : q \in [0,1] \}$
Action Tuples and Probabilities
(a1,a2): pq
(a1,b2): p(1-q)
(b1,a2): (1-p)q
(b1,b2): (1-p)(1-q)
Expected Utilities
$U_1(p,q) = pq(1) + p(1-q)(-1) + (1-p)q(-1) + (1-p)(1-q)(1)$
$U_2(p,q) = pq(-1) + p(1-q)(1) + (1-p)q(1) + (1-p)(1-q)(-1)$
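Nash Equilibrium in a Mixed Strategy Game
Definition: Mixed Strategy Nash Equilibrium
A mixed strategy profile $\alpha^*$ is a NE iff for all $i \in N$
$U_i(\alpha_i^*, \alpha_{-i}^*) \ge U_i(\alpha_i, \alpha_{-i}^*) \quad \forall \alpha_i \in \Delta(A_i)$
Best Response Correspondence
$BR_i(\alpha_{-i}) = \arg\max_{\alpha_i \in \Delta(A_i)} U_i(\alpha_i, \alpha_{-i})$
Alternate NE Definition
Consider $B(\alpha) = \times_{i \in N} BR_i(\alpha_{-i})$
A mixed strategy profile $\alpha^*$ is a NE iff $\alpha^* \in B(\alpha^*)$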
Best Response Correspondences
$U_1(p,q) = 4pq - 2p - 2q + 1$
$U_2(p,q) = -4pq + 2p + 2q - 1$
Best Response
$\partial U_1 / \partial p = 4q - 2$
$\partial U_2 / \partial q = -4p + 2$
$BR_1(q) = 0$ if $q < 1/2$; $[0,1]$ if $q = 1/2$; $1$ if $q > 1/2$
$BR_2(p) = 1$ if $p < 1/2$; $[0,1]$ if $p = 1/2$; $0$ if $p > 1/2$
[Figure: BR1(q) and BR2(p) plotted on the unit square with axes p = p(a1) and q = p(a2); the correspondences intersect at the Nash Equilibrium p = q = 0.5]
Note: NE in mixed extension which did not exist in original
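The indifference behind the mixed NE can be verified with exact arithmetic. The following sketch recomputes player 1's expected utility directly from the jamming game payoffs and checks that, when q = 1/2, every p yields the same value; the names `expected_u1` and `best_response_1` are illustrative.

```python
from fractions import Fraction

# Player 1's payoffs in the jamming game (player 2 receives the negative).
U1 = {("a1", "a2"): 1, ("a1", "b2"): -1, ("b1", "a2"): -1, ("b1", "b2"): 1}

def expected_u1(p, q):
    """Expected utility of player 1 when a1 has probability p and a2 has probability q."""
    prob = {
        ("a1", "a2"): p * q,
        ("a1", "b2"): p * (1 - q),
        ("b1", "a2"): (1 - p) * q,
        ("b1", "b2"): (1 - p) * (1 - q),
    }
    return sum(prob[t] * U1[t] for t in U1)

def best_response_1(q):
    """Best-response values of p, returned as an interval (low, high)."""
    slope = 4 * q - 2          # dU1/dp
    if slope > 0:
        return (1, 1)
    if slope < 0:
        return (0, 0)
    return (0, 1)              # indifferent: any p in [0, 1]

half = Fraction(1, 2)
values_at_half = [expected_u1(Fraction(k, 4), half) for k in range(5)]
print(values_at_half)          # all equal, so player 1 is indifferent when q = 1/2
```

By symmetry the same holds for player 2 at p = 1/2, which is why the correspondences intersect only at p = q = 1/2.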
Interesting Properties of Mixed Strategy Games
1. Every mixed extension of a finite strategic game has an NE.
2. A mixed strategy $\alpha_i$ is a best response to $\alpha_{-i}$ iff every action in the support of $\alpha_i$ is itself a best response to $\alpha_{-i}$.
3. Every action in the support of any player’s equilibrium mixed strategy yields the same payoff to that player.
Coalitional Game (with transferable payoff)
Concept: groups of players (called coalitions) conspire together to implement actions which yield a result for the coalition. The value received by the coalition is then distributed among the coalition members.
Where do radios collaborate and distribute value?
– 802.16h interference groups: allocation of bandwidth
– Distribution of frequencies/spreading codes among cells
– File sharing in P2P network
Transferable utility refers to the existence of some commodity for which a player’s utility increases by one unit for every unit of the commodity it receives
Game Components, $\langle N, v \rangle$
– N: set of players
– Characteristic function $v : 2^N \setminus \emptyset \to \mathbb{R}$
– Coalition, $S \subseteq N$
How is this value distributed?
– Payoff vector, $(x_i)_{i \in S}$, with $x(S) = \sum_{i \in S} x_i$
Payoff vector is said to be S-feasible if $x(S) = v(S)$
The Core (Transferable)
The Core
– For $\langle N, v \rangle$, the set of feasible payoff profiles $(x_i)_{i \in N}$ for which there is no coalition S and S-feasible payoff vector $(y_i)_{i \in S}$ for which $y_i > x_i$ for all $i \in S$.
General principles of the NE also apply to the Core:
– Number of solutions for a game may be anywhere from 0 to ∞
– May be stable or unstable.
Example
Suppose three radios, N = {1,2,3}, can choose to participate in a peer-to-peer network.
Characteristic Function
– v(N) = 1
– v({1,2}) = v({1,3}) = v({2,3}) = α ∈ [0,1]
– v(1) = v(2) = v(3) = 0
Loosely, α indicates the number of duplicated files
If α > 2/3, the Core is empty
Example adaptations for α = 4/5: x = (2/5, 2/5, 0); x = (0, 3/5, 1/5); x = (2/5, 0, 2/5); x = (1/3, 1/3, 1/3); x = (2/5, 2/5, 0)
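Core membership in this example is easy to test directly. The sketch below uses the standard transferable-payoff characterization that a feasible payoff profile is in the core when no coalition's value exceeds what its members already receive; the function names and the representation of α as a `Fraction` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def characteristic(alpha):
    """v for the three-radio example: 1 for N, alpha for pairs, 0 for singletons."""
    def v(S):
        if len(S) == 3:
            return Fraction(1)
        if len(S) == 2:
            return alpha
        return Fraction(0)
    return v

def in_core(x, v, players=(1, 2, 3)):
    """True if x distributes v(N) and no coalition S has v(S) > x(S)."""
    if sum(x) != v(players):
        return False
    for size in range(1, len(players)):
        for S in combinations(players, size):
            if sum(x[i - 1] for i in S) < v(S):
                return False
    return True

equal_split = (Fraction(1, 3),) * 3
print(in_core(equal_split, characteristic(Fraction(3, 5))))  # True
print(in_core(equal_split, characteristic(Fraction(4, 5))))  # False
```

Summing the three pairwise constraints gives 2 ≥ 3α, which is where the 2/3 threshold comes from.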
Comments on the Core
Possibility of empty core implies that even when radios can freely negotiate and form arbitrary coalitions, no steady-state may exist
Frequently very large (infinite) number of steady-states, e.g., α < 2/3
– Makes it impossible to predict exact behavior
Existence conditions for the Core are available, but would need to cover some linear programming concepts
Related (but not addressed today) concepts:
– Bargaining Sets, Kernel, Nucleolus
Strong NE
Concept: Assume radios are able to collaborate, but utilities aren’t necessarily transferable
An action tuple a* such that, for every coalition $S \subseteq N$ and every joint deviation $a_S \in \times_{i \in S} A_i$,
$u_i(a^*) \ge u_i(a_S, a^*_{-S})$ for some $i \in S$
No Strong NE
N W
n (9.6,9.6) (9.6, 21)
w (21, 9.6) (22, 22)
Unique Strong NE
Motivation for Shapley value
Core was generally either empty or very large. Want a “good” single solution.
Terminology
– Marginal Contribution of i: $\Delta_i(S) = v(S \cup \{i\}) - v(S)$
– Interchangeability of i, j: $\Delta_i(S) = \Delta_j(S) \quad \forall S \subseteq N \setminus \{i, j\}$
– Dummy player (no synergy): $\Delta_i(S) = v(\{i\}) \quad \forall S \subseteq N \setminus i$
Axioms for Shapley Value
Let $\varphi$ be some distribution of value for a TU coalition game
Symmetry:
– If i and j are interchangeable, then $\varphi_i(v) = \varphi_j(v)$
Dummy:
– If i is a dummy, then $\varphi_i(v) = v(\{i\})$
Additivity:
– Given $\langle N, v \rangle$ and $\langle N, w \rangle$, $\varphi_i(v + w) = \varphi_i(v) + \varphi_i(w)$ for all $i \in N$, where $(v+w)(S) = v(S) + w(S)$
Balanced Contributions
– Given $\langle N, v \rangle$, $\varphi_i(N, v) - \varphi_i(N \setminus j, v^{N \setminus j}) = \varphi_j(N, v) - \varphi_j(N \setminus i, v^{N \setminus i})$
Shapley Value
Only assignment (value) that satisfies balanced contributions; only assignment that simultaneously satisfies symmetry, dummy, and additivity axioms
$\varphi_i(v) = \sum_{S \subseteq N \setminus i} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right)$
– $v(S \cup \{i\}) - v(S)$: Marginal Value Contributed by i
– $\frac{|S|! \, (|N| - |S| - 1)!}{|N|!}$: Probability that i will be the next one invited to the grand coalition given that coalition S is already part of the coalition, assuming random ordering.
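The Shapley formula translates almost directly into code. The sketch below is a minimal, brute-force illustration that enumerates all coalitions (so it is only suitable for a handful of players); `shapley_value` and the characteristic function `p2p_value` for the three-radio peer-to-peer example with an assumed α = 4/5 are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley_value(players, v):
    """Weighted sum of each player's marginal contributions over all coalitions."""
    n = len(players)
    value = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            # Probability that exactly this coalition precedes i in a random ordering.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for S in combinations(others, size):
                S = frozenset(S)
                total += weight * (v(S | {i}) - v(S))
        value[i] = total
    return value

ALPHA = Fraction(4, 5)  # assumed value of alpha for illustration

def p2p_value(S):
    """Characteristic function of the three-radio example."""
    return {0: 0, 1: 0, 2: ALPHA, 3: 1}[len(S)]

phi = shapley_value([1, 2, 3], p2p_value)
print(phi)  # each radio receives 1/3
```

Because the three radios are interchangeable, symmetry forces an equal split regardless of α, even for values of α where the Core is empty.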
Implications of Shapley Value
One form of a fair allocation
– What you receive is based on the value you add
– Independent of order of arrival
– I liken it to setting salaries according to the Value Over Replacement Player concept in Baseball Statistics
“Better” solution concept than the core as it’s a single payoff as opposed to a potentially infinite number
Allows for analysis of relative “power” of different players in the system
Bargaining Problem
Components: $\langle F, v \rangle$
– Feasible payoffs F, a closed convex subset of $\mathbb{R}^n$
– Disagreement Point v = (v1, v2): what 1 or 2 could achieve without bargaining
Example:
– Even if system is jammed, still gets some throughput
– Member of 802.16h interference group and try its luck
F is said to be essential if there is some $y \in F$ such that y1 > v1 and y2 > v2
If contracts are “binding” then F could be the payoffs corresponding to entire original action space
Otherwise, F may need to be drawn from the set of NE or from enforceable set (see punishment in repeated games)
A particular solution is referred to by $\phi(F, v) \in \mathbb{R}^n$
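Desirable Bargaining Axioms about a Solution
Strong Efficiency
– $\phi(F, v)$ is Pareto Efficient
Individually Rational
– $\phi(F, v) \ge v$
Scale Covariance
– For any $\lambda_1, \lambda_2, \gamma_1, \gamma_2$ with $\lambda_1, \lambda_2 > 0$, if
  $G = \{ (\lambda_1 x_1 + \gamma_1, \lambda_2 x_2 + \gamma_2) \mid (x_1, x_2) \in F \}$ and $w = (\lambda_1 v_1 + \gamma_1, \lambda_2 v_2 + \gamma_2)$,
  then $\phi(G, w) = (\lambda_1 \phi_1(F, v) + \gamma_1, \lambda_2 \phi_2(F, v) + \gamma_2)$
Independence of Irrelevant Alternatives
– If $G \subseteq F$ and G is closed and convex and $\phi(F, v) \in G$, then $\phi(G, v) = \phi(F, v)$
Symmetry
– If v1 = v2 and $\{ (x_1, x_2) \mid (x_2, x_1) \in F \} = F$, then $\phi_1(F, v) = \phi_2(F, v)$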
Nash Bargaining Solution
NBS
$\phi(F, v) \in \arg\max_{x \in F,\ x \ge v} \prod_{i \in N} (x_i - v_i)$
Interestingly, this is the only bargaining solution which simultaneously satisfies the preceding 5 axioms
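As a small worked illustration of the Nash product, consider two radios splitting a shared capacity. The setup below is a hypothetical example, not taken from the slides: F is assumed to be all non-negative splits with x1 + x2 at most `capacity`, and the search is restricted to the frontier x1 + x2 = capacity because the Nash product increases in each coordinate. The function name `nash_bargaining_on_line` is likewise illustrative.

```python
from fractions import Fraction

def nash_bargaining_on_line(capacity, v, steps=1000):
    """Grid-search the split of `capacity` that maximizes (x1 - v1) * (x2 - v2)."""
    best, best_product = None, None
    for k in range(steps + 1):
        x1 = Fraction(capacity * k, steps)
        x2 = capacity - x1
        if x1 < v[0] or x2 < v[1]:
            continue  # not individually rational
        nash_product = (x1 - v[0]) * (x2 - v[1])
        if best_product is None or nash_product > best_product:
            best, best_product = (x1, x2), nash_product
    return best

# Disagreement point (1, 3): what each radio could obtain without bargaining.
solution = nash_bargaining_on_line(10, (1, 3))
print(solution)  # (Fraction(4, 1), Fraction(6, 1))
```

Each radio ends up with its disagreement payoff plus an equal share of the surplus, here 3 units each, which matches the closed form x1 = (capacity + v1 - v2)/2 for this linear frontier.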
GT framework for BW allocation [Yaiche]: System Model
N users
L links
Users compete for the total link capacity
Each user has a minimum rate MRi and peak rate PRi
Admissible rate vector is given by
$X_0 = \{ x \in \mathbb{R}^N \mid x \ge MR,\ x \le PR,\ \text{and } Ax \le C \}$
C: vector of link capacities
A (L×N): $a_{lp} = 1$ if link l belongs to path p, else 0.
Scenario given in H. Yaiche, R. Mazumdar, C. Rosenberg, “A game theoretic framework for bandwidth allocation and pricing in broadband networks”, IEEE/ACM Transactions on Networking, Volume 8, Issue 5, Oct. 2000, pp. 667-678.
Centralized Optimization Problem
$\max_x \prod_{i=1}^{N} (x_i - MR_i)$
s.t. $x_i \ge MR_i, \quad i = 1, \ldots, N$
$x_i \le PR_i, \quad i = 1, \ldots, N$
$(Ax)_l \le C_l, \quad l = 1, \ldots, L$
NBS exists
[Figure: feasible set F with disagreement point v]
Steady-State Summary
Not every game has a steady-state
NE are analogous to fixed points of self-interested decision processes
NE can be applied to procedural and ontological radios
– Don’t need to know decision rule, only goals, actions, and assumption that radios act in their own interest
A game (network) may have 0, 1, or many steady-states
All finite normal form games have an NE in their mixed extension
– Over multiple iterations, implies constant adaptation
More complex game models yield more complex steady-state concepts
Can define steady-state concepts for coalitional games
– Frequently so broad that specific solutions are used
Evaluating Equilibria
Objective Function Maximization, Pareto Efficiency, Notions of Fairness
Optimality
In general we assume the existence of some design objective function $J : A \to \mathbb{R}$
The desirableness of a network state, a, is the value of J(a).
In general maximizers of J are unrelated to fixed points of d.
Figure from Fig 2.6 in I. Akbar, “Statistical Analysis of Wireless Systems Using Markov Models,” PhD Dissertation, Virginia Tech, January 2007
Example Functions
Utilitarian
– Sum of all players’ utilities
– Product of all players’ utilities
Practical
– Total system throughput
– Average SINR
– Maximum End-to-End Latency
– Minimal sum system interference
Objective can be unrelated to utilities
[Figure labels: Utilitarian Maximizers, System Throughput Maximizers, Interference Minimization]
Price of Anarchy (Factor)
$\frac{\text{Performance of Centralized Algorithm Solution}}{\text{Performance of Distributed Algorithm Solution}} \ge 1$ (example value: 9.6/7)
Centralized solution always at least as good as distributed solution
– Like ASIC is always at least as good as DSP
Ignores costs of implementing algorithms
– Sometimes centralized is infeasible (e.g., routing the Internet)
– Distributed can sometimes (but not generally) be more costly than centralized
Implications
Best of All Possible Worlds
– Low complexity distributed algorithms with low anarchy factors
Reality implies mix of methods
– Hodgepodge of mixed solutions
  Policy: bounds the price of anarchy
  Utility adjustments: align distributed solution with centralized solution
  Market methods: sometimes distributed, sometimes centralized
  Punishment: sometimes centralized, sometimes distributed, sometimes both
  Radio environment maps: “centralized” information for distributed decision processes
– Fully distributed
  Potential game design: really, the Panglossian solution, but only applies to particular problems
Pareto efficiency (optimality)
Formal definition: An action vector a* is Pareto efficient if there exists no other action vector a, such that every radio’s valuation of the network is at least as good and at least one radio assigns a higher valuation
Informal definition: An action tuple is Pareto efficient if some radios must be hurt in order to improve the payoff of other radios.
Important note– Like design objective function, unrelated to fixed points (NE)– Generally less specific than evaluating design objective
function
Example Games
Game 1
        a2       b2
a1     1,1     -5,5
b1     5,-5    -1,-1
Game 2
        a2       b2
a1     1,1     -5,5
b1     5,-5     3,3
Legend: Pareto Efficient, NE, NE + PE
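The legend categories can be recomputed from the payoffs. The sketch below classifies every cell of the two example games as Pareto efficient and/or NE; the cell placement of the payoffs in `GAME_1` and `GAME_2` is an assumption about the original matrices, and the function names are illustrative.

```python
def pareto_efficient(payoffs):
    """Action tuples not dominated by another tuple (all >= and at least one >)."""
    efficient = []
    for a, u in payoffs.items():
        dominated = any(
            all(w_k >= u_k for w_k, u_k in zip(w, u))
            and any(w_k > u_k for w_k, u_k in zip(w, u))
            for w in payoffs.values()
        )
        if not dominated:
            efficient.append(a)
    return efficient

def pure_nash(payoffs, rows, cols):
    """Pure-strategy NE of a two-player game given as {(row, col): (u1, u2)}."""
    ne = []
    for r in rows:
        for c in cols:
            u1, u2 = payoffs[(r, c)]
            if all(payoffs[(r2, c)][0] <= u1 for r2 in rows) and \
               all(payoffs[(r, c2)][1] <= u2 for c2 in cols):
                ne.append((r, c))
    return ne

ROWS, COLS = ("a1", "b1"), ("a2", "b2")
# Assumed cell placement for the two example games.
GAME_1 = {("a1", "a2"): (1, 1), ("a1", "b2"): (-5, 5),
          ("b1", "a2"): (5, -5), ("b1", "b2"): (-1, -1)}
GAME_2 = {("a1", "a2"): (1, 1), ("a1", "b2"): (-5, 5),
          ("b1", "a2"): (5, -5), ("b1", "b2"): (3, 3)}

for name, game in (("Game 1", GAME_1), ("Game 2", GAME_2)):
    print(name, "PE:", pareto_efficient(game), "NE:", pure_nash(game, ROWS, COLS))
```

Under these assumed payoffs, the NE of Game 1 is the only tuple that is not Pareto efficient, while in Game 2 the NE is also Pareto efficient, which illustrates that the two properties are unrelated.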
Notions of Fairness
What is “Fair”?– Abstractly “fair” means different things to different analysts– In every day life, really just short hand for “I deserve more
than I got”
Nonetheless is used to evaluate how equitably radio resources are distributed
Gini Coefficient
Basic concept:
– Order players by utility.
– Form CDF for sorted utility distribution (Lorenz curve)
– Integrate (sum) the difference between perfect equality (of outcome) and CDF
– Divide result by sum of all players’ utilities
Formula (players indexed in order of increasing utility)
$G(a) = \frac{1}{n} \left( n + 1 - 2 \, \frac{\sum_{i \in N} (n + 1 - i) \, u_i(a)}{\sum_{i \in N} u_i(a)} \right)$
Used in a lot of macro-economic comparisons of income distributions
Relatively simple, independent of scale, independent of size of N, anonymity
Radically different outcomes can give the same result
G       N       W
n       0       0.37
w       0.37    0
[Figure: Lorenz curve versus the perfect equality line; axes: Player #, Aggregate Utility]
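The Gini formula is short enough to compute directly. In the sketch below, the function name `gini` is illustrative, and the utility pair (3.2, 21) is an assumed example of an unequal two-radio outcome chosen because it reproduces a value near the 0.37 entries in the table.

```python
def gini(utilities):
    """Gini coefficient using the sorted-utility formula (ascending order)."""
    u = sorted(utilities)
    n = len(u)
    total = sum(u)
    weighted = sum((n + 1 - i) * u_i for i, u_i in enumerate(u, start=1))
    return (n + 1 - 2 * weighted / total) / n

print(round(gini([9.6, 9.6]), 2))   # equal utilities give 0
print(round(gini([3.2, 21.0]), 2))  # unequal utilities (assumed values)
```

Because the formula depends only on the sorted utilities, swapping which radio receives the larger utility leaves the result unchanged, which is the anonymity property mentioned above.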
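Other Metrics of Fairness
Theil Index
$T(a) = \frac{1}{n} \sum_{i \in N} \frac{u_i(a)}{\bar{u}(a)} \ln \frac{u_i(a)}{\bar{u}(a)}$
where $\bar{u}(a) = \frac{1}{n} \sum_{i \in N} u_i(a)$
Atkinson Index, where ε is income inequality aversion
$T_\epsilon(a) = 1 - \frac{1}{\bar{u}(a)} \left( \frac{1}{n} \sum_{i \in N} u_i(a)^{1 - \epsilon} \right)^{1/(1 - \epsilon)}, \quad \epsilon \in [0, 1)$
$T_\epsilon(a) = 1 - \frac{1}{\bar{u}(a)} \left( \prod_{i \in N} u_i(a) \right)^{1/n}, \quad \epsilon = 1$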
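For comparison with the Gini coefficient, both indices can be computed in a few lines. This sketch uses the standard textbook forms of the Theil and Atkinson indices and assumes all utilities are strictly positive; the function names and the sample utilities are illustrative.

```python
from math import log, prod

def theil(utilities):
    """Theil index: mean of (u / mean) * ln(u / mean); requires positive utilities."""
    n = len(utilities)
    mean = sum(utilities) / n
    return sum((u / mean) * log(u / mean) for u in utilities) / n

def atkinson(utilities, eps):
    """Atkinson index with inequality aversion eps in [0, 1]."""
    n = len(utilities)
    mean = sum(utilities) / n
    if eps == 1:
        return 1 - prod(utilities) ** (1 / n) / mean
    return 1 - (sum(u ** (1 - eps) for u in utilities) / n) ** (1 / (1 - eps)) / mean

equal, unequal = [5.0, 5.0], [3.2, 21.0]   # assumed sample outcomes
print(theil(equal), theil(unequal))
print(atkinson(equal, 0.5), atkinson(unequal, 0.5), atkinson(unequal, 1))
```

Both indices are zero (up to rounding) for equal utilities and grow as the distribution becomes more unequal; a larger ε makes the Atkinson index more sensitive to the worst-off radio.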
Summary of Equilibria Evaluation
Lots of different ways in which a point can be evaluated
Many are contradictory
Loosely, any point could be said to be optimal given the right objective function
Insufficient to say that a point is optimal
Must describe the metric in use
Suggestion: use whatever metric makes sense to you as a network designer
The Notion of Time and Imperfections in Games and Networks
Extensive Form Games, Repeated Games, Convergence Concepts in Normal Form Games, Trembling Hand Games, Noisy Observations
Model Timing Review
When decisions are made also matters, and different radios will likely make decisions at different times
Tj – when radio j makes its adaptations
– Generally assumed to be an infinite set
– Assumed to occur at discrete times
  Consistent with DSP implementation
$T = T_1 \cup T_2 \cup \cdots \cup T_n$
$t \in T$
Decision timing classes
Synchronous
– All at once
Round-robin
– One at a time in order
– Used in a lot of analysis
Random
– One at a time in no order
Asynchronous
– Random subset at a time
– Least overhead for a network
Extensive Form Game Components
Components
1. A set of players.
2. The actions available to each player at each decision moment (state).
3. A way of deciding who is the current decision maker.
4. Outcomes on the sequence of actions.
5. Preferences over all outcomes.
A Silly Jammer Avoidance Game
Game Tree Representation
[Figure: game tree in which A chooses 1 or 2 and B then chooses 1 or 2 at one node and 1’ or 2’ at the other; each leaf pays 1,-1 or -1,1]
Strategic Form Equivalence
Strategies for A: {1, 2}
Strategies for B: {(1,1’), (1,2’), (2,1’), (2,2’)}
[Table: 2×4 strategic form with A's strategies as rows and B's strategies as columns; each cell is 1,-1 or -1,1]
Backwards Induction
Concept
– Reason backwards based on what each player would rationally play
– Predicated on Sequential Rationality
– Sequential Rationality: if starting at any decision point for a player in the game, his strategy from that point on represents a best response to the strategies of the other players
– Subgame Perfect Nash Equilibrium is a key concept (not formally discussed today).
Alternating Packet Forwarding Game
[Figure: game tree in which players 1 and 2 alternate between choosing C and S; payoffs shown: 1,0; 0,2; 3,1; 2,4; 5,3; 4,6; 7,5]
Comments on Extensive Form Games
Actions will generally not be directly observable
However, likely that cognitive radios will build up histories
Ability to apply backwards induction is predicated on knowing other radios’ objectives, actions, observations and what they know they know…
– Likely not practical
Really the best choice for modeling notion of time when actions available to radios change with history
Repeated Games
Same game is repeated
– Indefinitely
– Finitely
Players consider discounted payoffs across multiple stages
– Stage k: $\tilde{u}_i(a^k) = \delta^k u_i(a^k)$
– Expected value over all future stages: $u_i\left( (a^k) \right) = \sum_{k=0}^{\infty} \delta^k u_i(a^k)$
[Figure: sequence of stages: Stage 1, Stage 2, …, Stage k]
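A discounted payoff over a finite horizon is a one-line sum. The sketch below assumes a common discount factor `delta` between 0 and 1 and truncates the infinite sum at the length of the supplied payoff list; the function name is illustrative.

```python
from fractions import Fraction

def discounted_payoff(stage_payoffs, delta):
    """Sum of delta**k * u_k over the supplied stages (k starts at 0)."""
    return sum((delta ** k) * u for k, u in enumerate(stage_payoffs))

# Three stages with a payoff of 1 each and discount factor 1/2.
total = discounted_payoff([1, 1, 1], Fraction(1, 2))
print(total)  # 7/4
```

A smaller discount factor weights early stages more heavily, which is what lets threats of future punishment matter less to impatient players.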
Lesser Rationality: Myopic Processes
Players have no knowledge about utility functions, or expectations about future play, typically can observe or infer current actions
Best response dynamic – maximize individual performance presuming other players’ actions are fixed
Better response dynamic – improve individual performance presuming other players’ actions are fixed
Interesting convergence results can be established
Paths and Convergence
Path [Monderer_96]
– A path in Γ is a sequence γ = (a0, a1, …) such that for every k ≥ 1 there exists a unique player such that the strategy combinations (ak-1, ak) differ in exactly one coordinate.
– Equivalently, a path is a sequence of unilateral deviations. When discussing paths, we make use of the following conventions.
– Each element of γ is called a step.
– a0 is referred to as the initial or starting point of γ.
– Assuming γ is finite with m steps, am is called the terminal point or ending point of γ, and we say that γ has length m.
Cycle [Voorneveld_96]
– A finite path γ = (a0, a1, …, ak) where ak = a0
Improvement Paths
Improvement Path
– A path γ = (a0, a1, …) where for all k ≥ 1, ui(ak) > ui(ak-1), where i is the unique deviator at k
Improvement Cycle
– An improvement path that is also a cycle
– See the DFS example
[Figure: two graphs, each with nodes labeled 1 through 6]
Convergence Properties
Finite Improvement Property (FIP)
– All improvement paths in a game are finite
Weak Finite Improvement Property (weak FIP)
– From every action tuple, there exists an improvement path that terminates in an NE.
FIP implies weak FIP
FIP implies lack of improvement cycles
Weak FIP implies existence of an NE
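An improvement path can be traced mechanically. The sketch below follows one particular improvement path: at each step, the first player (in index order) with a better response makes the first improving deviation it finds. The game used is a hypothetical prisoner's-dilemma-style example in which every improvement path is finite, and the function name `improvement_path` is illustrative.

```python
def improvement_path(payoffs, action_sets, start, max_steps=100):
    """Follow unilateral improving deviations from `start` until none remain."""
    path = [start]
    current = start
    for _ in range(max_steps):
        step = None
        for i, actions in enumerate(action_sets):
            for b_i in actions:
                candidate = current[:i] + (b_i,) + current[i + 1:]
                if payoffs[candidate][i] > payoffs[current][i]:
                    step = candidate
                    break
            if step is not None:
                break
        if step is None:
            return path          # terminal point: no player can improve, so it is an NE
        path.append(step)
        current = step
    return path                  # step budget exhausted (possible improvement cycle)

# Hypothetical two-player game with actions C and D for each player.
GAME = {
    ("C", "C"): (-5, -5), ("C", "D"): (0, -10),
    ("D", "C"): (-10, 0), ("D", "D"): (-1, -1),
}
path = improvement_path(GAME, [("C", "D"), ("C", "D")], ("D", "D"))
print(path)  # [('D', 'D'), ('C', 'D'), ('C', 'C')]
```

In a game with an improvement cycle, the same loop would keep running until `max_steps` is reached, which is why the step budget is there.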
Examples
Game with FIP
[Table: 2×2 game with rows a, b and columns A, B; payoffs shown: 1,-1; -1,1; 0,2; 2,2]
Weak FIP but not FIP
[Table: 3×3 game with rows a, b, c and columns A, B, C; payoffs shown: 1,-1; -1,1; 1,-1; -1,1; 0,2; 1,2; 2,1; 2,0; 2,2]
Implications of FIP and weak FIP
Assumes radios are incapable of reasoning ahead and must react to internal states and current observations
Unless the game model of a CRN has weak FIP, no autonomously rational decision rule can be guaranteed to converge from all initial states under random and round-robin timing (Theorem 4.10 in dissertation).
If the game model of a CRN has FIP, then ALL autonomously rational decision rules are guaranteed to converge from all initial states under random and round-robin timing.
– And asynchronous timings, but not immediate from definition
More insights possible by considering more refined classes of decision rules and timings
Decision Rules
Absorbing Markov Chains and Improvement Paths
Sources of randomness
– Timing (Random, Asynchronous)
– Decision rule (random decision rule)
– Corrupted observations
An NE is an absorbing state for autonomously rational decision rules.
Weak FIP implies that the game is an absorbing Markov chain as long as the NE-terminating improvement path always has a nonzero probability of being implemented.
This then allows us to characterize
– convergence rate,
– probability of ending up in a particular NE,
– expected number of times a particular transient state will be visited
75© Cognitive Radio Technologies, 2007
Connecting Markov models, improvement paths, and decision rules
Suppose we need the path (a0, a1, …, am) for convergence by weak FIP.
Must get the right sequence of players and the right sequence of adaptations.
Friedman Random Better Response
– Random or Asynchronous timing
  Every sequence of players has a chance to occur
  Random decision rule means that all improvements have a chance to be chosen
– Synchronous timing: convergence not guaranteed
Alternate random better response (chance of choosing same action)
– Because of the chance to choose the same action, every sequence of players can result from every decision timing.
– Because of random choice, every improvement path has a chance of occurring
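The argument above can be illustrated with a small simulation. This is a minimal sketch of a random better response under random timing, assuming one randomly chosen player adapts per step and adopts a randomly drawn action only if it strictly improves its payoff. It is written in the spirit of the rule named above and is not claimed to match Friedman's exact formulation. The game (matching pennies extended with a dominant action for each player) and all names are hypothetical.

```python
import random

ACTIONS = [["a", "b", "c"], ["A", "B", "C"]]
PAYOFF = {
    ("a", "A"): (1, -1), ("a", "B"): (-1, 1), ("a", "C"): (0, 2),
    ("b", "A"): (-1, 1), ("b", "B"): (1, -1), ("b", "C"): (1, 2),
    ("c", "A"): (2, 1),  ("c", "B"): (2, 0),  ("c", "C"): (2, 2),
}

def is_ne(a, actions, payoff):
    """True if no player has a strictly improving unilateral deviation."""
    return all(payoff[a[:i] + (alt,) + a[i + 1:]][i] <= payoff[a][i]
               for i, acts in enumerate(actions) for alt in acts)

def random_better_response(a, actions, payoff, rng, max_steps=10_000):
    """Run until an NE is reached or max_steps elapse; return (tuple, steps used)."""
    for step in range(max_steps):
        if is_ne(a, actions, payoff):
            return a, step
        i = rng.randrange(len(actions))   # random timing: one player adapts
        alt = rng.choice(actions[i])      # random candidate action
        b = a[:i] + (alt,) + a[i + 1:]
        if payoff[b][i] > payoff[a][i]:   # adopt only strict improvements
            a = b
    return a, max_steps

final, steps = random_better_response(("a", "A"), ACTIONS, PAYOFF, random.Random(0))
print(final, steps)
```

Although this game contains an improvement cycle, the random choices give the NE-terminating path a nonzero probability at every step, so play is absorbed at the NE.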
76© Cognitive Radio Technologies, 2007
Convergence Results (Finite Games)
If a decision rule converges under round-robin, random, or synchronous timing, then it also converges under asynchronous timing.
Random better responses converge under more decision timings and more of the surveyed game conditions than the other decision rules considered.
– Implies that non-deterministic procedural cognitive radio implementations are a good approach if you don’t know much about the network.
77© Cognitive Radio Technologies, 2007
Impact of Noise
Noise impacts the mapping from actions to outcomes, f : A → O
Same action tuple can lead to different outcomes
Most noise encountered in wireless systems is theoretically unbounded.
Implies that every outcome has a nonzero chance of being observed for a particular action tuple.
Some outcomes are more likely to be observed than others (and some outcomes may have a very small chance of occurring)
78© Cognitive Radio Technologies, 2007
DFS Example
Consider a radio observing the spectral energy across the bands defined by the set C where each radio k is choosing its band of operation fk.
Noiseless observation of channel ck
  o_i(c_k) = \sum_{k \in N} g_{ki} \, p_k \, \rho(c_k, f_k)
Noisy observation
  \tilde{o}_i(c_k) = \sum_{k \in N} g_{ki} \, p_k \, \rho(c_k, f_k) + n_i(c_k, t)
If a radio is attempting to minimize in-band interference, then noise can lead it to believe that a band has lower or higher interference than it actually does
79© Cognitive Radio Technologies, 2007
Trembling Hand (“Noise” in Games)
Assumes players have a nonzero chance of making an error implementing their action.
– Who has not accidentally handed over the wrong amount of cash at a restaurant?
– Who has not accidentally written a “tpyo”?
Related to errors in observation as erroneous observations cause errors in implementation (from an outside observer’s perspective).
80© Cognitive Radio Technologies, 2007
Noisy decision rules
Noisy utility: \tilde{u}_i(a, t) = u_i(a) + n_i(a, t)
[Figure labels: Trembling Hand; Observation Errors]
81© Cognitive Radio Technologies, 2007
Implications of noise
For random timing, [Friedman] shows that a game with noisy random better response is an ergodic Markov chain.
Likewise, other observation-based noisy decision rules are ergodic Markov chains
– Unbounded noise implies a chance of adapting (or not adapting) to any action
– If coupled with random, synchronous, or asynchronous timings, then CRNs with corrupted observations can be modeled as ergodic Markov chains.
– Not so for round-robin (violates aperiodicity)
Somewhat disappointing
– No real steady-state (though there is a unique limiting stationary distribution)
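The distinction between an ergodic chain and a periodic one can be seen numerically. Below is a minimal sketch, assuming row-stochastic transition matrices given as nested lists. The two small matrices are made-up illustrations, not the transition matrices of any network in these slides.

```python
def step(dist, P):
    """One step of the chain: row vector dist times transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def limiting_distribution(P, start, iters=1000):
    """Iterate the chain; for an ergodic chain this approaches the stationary distribution."""
    dist = list(start)
    for _ in range(iters):
        dist = step(dist, P)
    return dist

# Hypothetical ergodic chain: every state reachable, aperiodic.
ERGODIC = [[0.9, 0.1],
           [0.5, 0.5]]

# Hypothetical periodic chain: deterministic alternation, period 2.
PERIODIC = [[0.0, 1.0],
            [1.0, 0.0]]

pi_from_0 = limiting_distribution(ERGODIC, [1.0, 0.0])
pi_from_1 = limiting_distribution(ERGODIC, [0.0, 1.0])
even = limiting_distribution(PERIODIC, [1.0, 0.0], iters=1000)
odd = limiting_distribution(PERIODIC, [1.0, 0.0], iters=1001)
print(pi_from_0, pi_from_1, even, odd)
```

The ergodic chain forgets its starting state and settles on the same distribution (5/6, 1/6) from either start, while the periodic chain keeps oscillating and never settles.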
82© Cognitive Radio Technologies, 2007
DFS Example with three access points
3 access nodes, 3 channels, attempting to operate in band with least spectral energy.
Constant power
Link gain matrix
Noiseless observations
Random timing
[Figure: three access nodes labeled 1, 2, 3]
83© Cognitive Radio Technologies, 2007
Trembling Hand
Transition Matrix, p=0.1
Limiting distribution
84© Cognitive Radio Technologies, 2007
Noisy Best Response
Transition Matrix, (0,1) Gaussian Noise
Limiting stationary distributions
85© Cognitive Radio Technologies, 2007
Comment on Noise and Observations
Cardinality of goals makes a difference for cognitive radios
– Probability of making an error is a function of the difference in utilities
– With ordinal preferences, utility functions are just useful fictions
  Might as well assume a trembling hand
Unboundedness of noise implies that no state can be absorbing for most decision rules
NE retains significant predictive power
– While CRN is an ergodic Markov chain, NE (and the adjacent states) remain the most likely states to visit
– Stronger prediction with less noise
– Also stronger when network has a Lyapunov function
– Exception: elusive equilibria ([Hicks_04])
86© Cognitive Radio Technologies, 2007
Summary
Given a set of goals, an NE is a fixed point for all radios with those goals for all autonomously rational decision processes
Traditional engineering analysis techniques can be applied in a game theoretic setting
– Markov chains to improvement paths
Network must have weak FIP for autonomously rational radios to converge
– Weak FIP implies existence of absorbing Markov chain for many decision rules/timings
In a practical system, the network has a theoretically nonzero chance of visiting every possible state (ergodicity), but does have a unique limiting stationary distribution
– Specific distribution is a function of decision rules, goals
– Will be important to show Lyapunov stability
Shortly, we’ll cover potential games and supermodular games, which can be shown to have FIP or weak FIP. Further, potential games have a Lyapunov function.
87© Cognitive Radio Technologies, 2007
Designing Cognitive Radio Networks to Yield Desired Behavior
Policy, Cost Functions, Global Altruism, Supermodular Games, Potential Games
88© Cognitive Radio Technologies, 2007
Policy
Concept: Constrain the available actions so the worst cases of distributed decision making can be avoided
Not a new concept
– Policy has been used since there’s been an FCC
What’s new is assuming decision makers are the radios instead of the people controlling the radios
89© Cognitive Radio Technologies, 2007
Policy applied to radios instead of humans
Need a language to convey policy
– Learn what it is
– Expand upon policy later
How do radios interpret policy
– Policy engine?
Need an enforcement mechanism
– Might need to tie in to humans
Need a source for policy
– Who sets it?
– Who resolves disputes?
Logical extreme can be quite complex, but logical extreme may not be necessary.
[Figure labels: Policies; frequency mask]
90© Cognitive Radio Technologies, 2007
802.22 Example Policies
Detection
– Digital TV: -116 dBm over a 6 MHz channel
– Analog TV: -94 dBm at the peak of the NTSC (National Television System Committee) picture carrier
– Wireless microphone: -107 dBm in a 200 kHz bandwidth.
Transmitted Signal
– 4 W Effective Isotropic Radiated Power (EIRP)
– Specific spectral masks
– Channel vacation times
C. Cordeiro, L. Challapali, D. Birru, S. Shankar, “IEEE 802.22: The First Worldwide Wireless Standard based on Cognitive Radios,” IEEE DySPAN2005, Nov 8-11, 2005 Baltimore, MD.
91© Cognitive Radio Technologies, 2007
Cost Adjustments
Concept: Centralized unit dynamically adjusts costs in radios’ objective functions to ensure radios operate at the desired point
  \tilde{u}_i(a) = u_i(a) + c_i(a)
Example: Add -12 to use of wideband waveform
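A tiny numerical illustration of the cost adjustment may help. This is a sketch under assumed numbers: the baseline utilities of 10 and 20 are hypothetical, and only the -12 adjustment on the wideband waveform comes from the slide.

```python
def best_action(utility, cost=None):
    """Pick the action maximizing utility plus any centrally assigned cost term."""
    cost = cost or {}
    return max(utility, key=lambda a: utility[a] + cost.get(a, 0))

# Hypothetical baseline utilities for one radio.
utility = {"narrowband": 10, "wideband": 20}

before = best_action(utility)
after = best_action(utility, cost={"wideband": -12})  # the slide's example adjustment
print(before, after)
```

With these assumed numbers the adjustment moves the radio's preferred waveform from wideband (20 versus 10) to narrowband (10 versus 8), without forbidding wideband outright.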
92© Cognitive Radio Technologies, 2007
Comments on Cost Adjustments
Permits more flexibility than policy
– If a radio really needs to deviate, then it can
Easy to turn off and on as a policy tool
– Example: protected user shows up in a channel, cost to use that channel goes up
– Example: prioritized user requests channel, other users’ cost to use prioritized user’s channel goes up (and back down when done)
93© Cognitive Radio Technologies, 2007
Global Altruism: distributed, but more costly
Concept: All radios distribute all relevant information to all other radios and then each independently computes the jointly optimal solution
– Proposed for spreading code allocation in Popescu04, Sung03
– Used in xG Program (Comments of G. Denker, SDR Forum Panel Session on “A Policy Engine Framework”)
  Overhead ranges from 5%-27%
C = cost of computation
I = cost of information transfer from node to node
n = number of nodes
Distributed
– nC + n(n-1)I/2
Centralized (election)
– C + 2(n-1)I
Price of anarchy = 1
May differ if I is asymmetric
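The two cost expressions above can be compared directly. The sketch below simply evaluates them; the function names and the sample values n = 10, C = 5, I = 1 are assumptions chosen for illustration.

```python
def distributed_cost(n, C, I):
    """Global altruism: every node computes (nC) and every pair exchanges information (n(n-1)I/2)."""
    return n * C + n * (n - 1) * I / 2

def centralized_cost(n, C, I):
    """Elected controller: one computation (C) plus collect and redistribute (2(n-1)I)."""
    return C + 2 * (n - 1) * I

d = distributed_cost(10, 5, 1)
c = centralized_cost(10, 5, 1)
print(d, c)
```

For these sample values the distributed cost is 95 against 23 for the centralized solution, and the gap widens with n because the information term grows quadratically in the distributed case.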
94© Cognitive Radio Technologies, 2007
Improving Global Altruism
Global altruism is clearly inferior to a centralized solution for a single problem.
However, suppose radios reported information to and used information from a common database
– n(n-1)I/2 => 2nI
And suppose different radios are concerned with different problems with costs C1,…,Cn
Centralized
– Resources = 2(n-1)I + sum(C1,…,Cn)
– Time = 2(n-1)I + sum(C1,…,Cn)
Distributed
– Resources = 2nI + sum(C1,…,Cn)
– Time = 2I + max (C1,…,Cn)
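Evaluating the expressions above for a small case shows where the distributed approach gains. This is a sketch with assumed inputs: three radios with problem costs 3, 5, and 2 and a unit information cost. The function names are illustrative.

```python
def centralized(costs, I):
    """Return (resources, time) for the centralized case: 2(n-1)I + sum(C1..Cn) for both."""
    n = len(costs)
    total = 2 * (n - 1) * I + sum(costs)
    return total, total

def distributed(costs, I):
    """Return (resources, time) with a common database: 2nI + sum(Ci) and 2I + max(Ci)."""
    n = len(costs)
    return 2 * n * I + sum(costs), 2 * I + max(costs)

cen = centralized([3, 5, 2], I=1)
dis = distributed([3, 5, 2], I=1)
print(cen, dis)
```

With these inputs the distributed case uses slightly more resources (16 versus 14) but finishes in half the time (7 versus 14), since the radios work on their problems in parallel.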
95© Cognitive Radio Technologies, 2007
Example Application:
Overlay network of secondary users (SU) free to adapt power, transmit time, and channel
Without Radio Environment Map (REM):
– Decisions solely based on link SINR
With REM
– Radios effectively know everything
Upshot: A little gain for the secondary users; big gain for primary users
From: Y. Zhao, J. Gaeddert, K. Bae, J. Reed, “Radio Environment Map Enabled Situation-Aware Cognitive Radio Learning Algorithms,” SDR Forum Technical Conference 2006.
96© Cognitive Radio Technologies, 2007
Comments on Radio Environment Map
Local altruism also possible
– Less information transfer
Like policy, effectively needs a common language
Nominally could be centralized or distributed database
Read more:
– Y. Zhao, B. Le, J. Reed, “Infrastructure Support – The Radio Environment MAP,” in Cognitive Radio Technology, B. Fette, ed., Elsevier 2006.
97© Cognitive Radio Technologies, 2007
Repeated Games
[Figure: timeline of Stage 1, Stage 2, …, Stage k]
Same game is repeated
– Indefinitely
– Finitely
Players consider discounted payoffs across multiple stages, with discount factor δ
– Stage k
  \tilde{u}_i(a^k) = \delta^k u_i(a^k)
– Expected value over all future stages
  \tilde{u}_i(\{a^k\}) = \sum_{k=0}^{\infty} \delta^k u_i(a^k)
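The discounted payoff can be computed directly for a finite run of stages. The following is a minimal sketch, assuming a single discount factor delta between 0 and 1 and a list of per-stage payoffs. The function names and sample numbers are illustrative.

```python
def discounted_payoff(stage_payoffs, delta):
    """Sum of delta**k times the stage-k payoff, for k = 0, 1, 2, ..."""
    return sum(delta ** k * u for k, u in enumerate(stage_payoffs))

def constant_stream_value(u, delta):
    """Closed form for the same payoff u repeated indefinitely (geometric series)."""
    return u / (1 - delta)

three_stages = discounted_payoff([1, 1, 1], 0.5)
forever = constant_stream_value(1, 0.5)
print(three_stages, forever)
```

Three stages of payoff 1 at delta = 0.5 are worth 1.75, and the infinite stream is worth 2, so later stages matter less and less as delta shrinks.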
98© Cognitive Radio Technologies, 2007
Impact of Strategies
Rather than merely reacting to the state of the network, radios can choose their actions to influence the actions of other radios
Threaten to act in a way that minimizes another radio’s performance unless it implements the desired actions
Common strategies
– Tit-for-tat
– Grim trigger
– Generous tit-for-tat
Play can be forced to any “feasible” payoff vector with proper selection of punishment strategy.
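Two of the strategies named above can be sketched as functions of the play history. This is an illustrative sketch on a generic prisoner's dilemma with assumed payoffs (3,3), (0,5), (5,0), (1,1), where "C" means cooperate and "D" means defect. Neither the payoffs nor the function names come from the slides.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    """Cooperate first, then copy the opponent's previous action."""
    return "C" if not their_hist else their_hist[-1]

def grim_trigger(my_hist, their_hist):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in their_hist else "C"

def always_defect(my_hist, their_hist):
    """Baseline deviator used to provoke the punishments."""
    return "D"

def play(s1, s2, stages):
    """Play the stage game repeatedly; return both histories and total payoffs."""
    h1, h2 = [], []
    total = [0, 0]
    for _ in range(stages):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        u1, u2 = PAYOFF[(a1, a2)]
        total[0] += u1
        total[1] += u2
        h1.append(a1)
        h2.append(a2)
    return h1, h2, tuple(total)

punished = play(tit_for_tat, always_defect, 4)
cooperative = play(grim_trigger, tit_for_tat, 3)
print(punished, cooperative)
```

Against a persistent defector, tit-for-tat is exploited only in the first stage and punishes afterwards, while two punishing strategies facing each other cooperate throughout.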
99© Cognitive Radio Technologies, 2007
Impact of Communication on Strategies
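[Figure: 3×3 payoff matrix; row player actions nada, c, n; column player actions Nada, C, N; payoff entries 0,0; -5,5; -1,1; 5,-5; -100,0; -100,-1; -1,-100; 0,-100; -100,-100]
Players agree to play in a certain manner
Threats can force play to almost any state
– Breaks down for finite number of stages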
100© Cognitive Radio Technologies, 2007
Improvement from Punishment
A. MacKenzie and S. Wicker, “Game Theory in Communications: Motivation, Explanation, and Application to Power Control,” Globecom2001, pp. 821-825.
Throughput/unit power gains by enforcing a common received power level at a base station
Punishment by jamming
Without benefit to deviating, players can operate at lower power level and achieve same throughput
101© Cognitive Radio Technologies, 2007
Instability in Punishment
Issues arise when radios aren’t directly observing actions and are punishing with their actions without announcing punishment
Eventually, a deviation will be falsely detected and punished; without signaling, this leads to a cascade of problems
V. Srivastava, L. DaSilva, “Equilibria for Node Participation in Ad Hoc Networks – An Imperfect Monitoring Approach,” ICC 06, June 2006, vol 8, pp. 3850-3855
102© Cognitive Radio Technologies, 2007
Comments on Punishment
Works best with a common controller to announce
Problems in fully distributed system
– Need to elect a controller
– Otherwise competing punishments, without knowing other players’ utilities, can spiral out of control
Problems when actions cannot be directly observed
– Leads to Byzantine problem
No single best strategy exists
– Strategy flexibility is important
– Significant problems with jammers (they nominally receive higher utility when “punished”)
Generally better to implement centralized controller
– Operating point has to be announced anyways
103© Cognitive Radio Technologies, 2007
What does game theory bring to the design of cognitive radio networks? (1/2)
A natural “language” for modeling cognitive radio networks
Permits analysis of ontological radios
– Only know goals and that radios will adapt towards their goals
Simplifies analysis of random procedural radios
Permits simultaneous analysis of multiple decision rules – only need goal
Provides condition to be assured of possibility of convergence for all autonomously myopic cognitive radios (weak FIP)
104© Cognitive Radio Technologies, 2007
What does game theory bring to the design of cognitive radio networks? (2/2)
Provides condition to be assured of convergence for all autonomously myopic cognitive radios (FIP, not synchronous timing)
Rapid analysis
– Verify goals and actions satisfy a single model, and steady-states, convergence, and stability follow
An intuition as to what conditions will be needed to field successful cognitive radio decision rules.
A natural understanding of distributed interactive behavior which simplifies the design of low complexity distributed algorithms
105© Cognitive Radio Technologies, 2007
Game Models of Cognitive Radio Networks
Almost as many models as there are algorithms
Normal form game excellent for capturing single iteration of a complex system
Most other models add features to this model
– Time, decision rules, noisy observations, Natural states
Some can be recast as a normal form game
– Extensive form game
Normal Form Game: ⟨N, A, {ui}⟩
Repeated Game: ⟨N, A, {ui}, {di}⟩
Asynchronous Game: ⟨N, A, {ui}, {di}, T⟩
Extensive Form Game: ⟨N, A, {ui}, {di}, T⟩
TU Game: ⟨N, v⟩
Bargaining Game: ⟨N, v⟩
106© Cognitive Radio Technologies, 2007
Steady-states
Different game models have different steady-state concepts
Games can have many, one, or no steady-states
Nash equilibrium (and its variants) is most commonly applied concept
– Excellent for distributed noncollaborative algorithms
Games with punishment and Coalitional games tend to have a very large number of equilibria
Game theory permits analysis of steady-states without knowing specific decision rules
Nash Equilibrium
Strong Nash Equilibrium
Core
Shapley Value
Nash Bargaining Solution
107© Cognitive Radio Technologies, 2007
Optimality
Numerous different notions of optimality
Many are contradictory
Use whatever metric makes sense
Pareto Efficiency
Objective Maximization
Gini Index
Shapley Value
Nash Bargaining Solution
108© Cognitive Radio Technologies, 2007
Convergence
Showing existence of steady-states is insufficient; need to know if radios can reach those states
FIP (potential games) gives the broadest convergence conditions
Random timing actually helps convergence
109© Cognitive Radio Technologies, 2007
Noise
Unbounded noise causes all networks to theoretically behave as ergodic Markov chains
Important to show Lyapunov stability
Noisy observations cause noisy implementation to an outside observer
– Trembling hand
110© Cognitive Radio Technologies, 2007
Game Theory and Design
Numerous techniques for improving the behavior of cognitive radio networks
Techniques can be combined
Potential games yield lowest complexity implementations
– Judicious design of goals, actions
Practical limitations limit effectiveness of punishment
– Observing actions
– Likely best when a referee exists
Policy can limit the worst effects, but doesn’t really address optimality or convergence issues
Punishment
– Can enforce any action tuple
– Can be brittle when distributed
Policy
– Limits worst case performance
Cost function
– Reshapes preferences
– Could damage underlying structure if not a self-interested cost
Centralized
– Can theoretically realize any result
– Consumes overhead
– Slower reactions
111© Cognitive Radio Technologies, 2007
Future Directions in General Game Theory Research and Cognitive Radio Design
Integrate policy and potential games
Integration of coalitional and distributed forms
Increasing dimensionality of action sets
– Cross-layer
Integration of dynamic and hierarchical policies and games
112© Cognitive Radio Technologies, 2007
Future Direction in Regulation
Can incorporate optimization into policy by specifying goals
In theory, correctly implementing goals, correctly implementing actions, and exhaustive self-interested adaptation is enough to predict behavior (at least for potential games)
– Simpler policy certification
Provable network behavior
113© Cognitive Radio Technologies, 2007
Avenues for Future Research on Game Theory and CRNs
Integration of bargaining, centralized, and distributed algorithms into a common framework
Cross-layer algorithms
Better incorporating performance of classification techniques into behavior
Asymmetric potential games
Bargaining algorithms for cognitive radio
Reducing the brittleness of punishment in distributed implementations with imperfect observations
Imperfection in observations in general
Time varying game models while inferring convergence, stability…
Combination of policy, potential games, coalition formation, and token economies
Can be modeled as a game with two types of players
– Distributed cognitive radios
– Dynamic policy provider