© cognitive radio technologies, 2007 1 game theory and cognitive radio networks basic game models...

1© Cognitive Radio Technologies, 2007

Game Theory and Cognitive Radio Networks

Basic Game Models

Equilibria Concepts

Optimality Concepts

Convergence and Noise


Game Models

Models of interactive decision processes


Game

1. A (well-defined) set of 2 or more players2. A set of actions for each player.3. A set of preference relationships for each player for

each possible action tuple.

• More elaborate games exist with more components but these three must always be there.

• Some also introduce an outcome function which maps action tuples to outcomes which are then valued by the preference relations.

• Games with just these three components (or a variation on the preference relationships) are said to be in Normal form or Strategic Form


Set of Players (decision makers)

N – set of n players consisting of players “named” {1, 2, 3,…,i, j,…,n}

Note the n does not mean that there are 14 players in every game.

Other components of the game that “belong” to a particular player are normally indicated by a subscript.

Generic players are most commonly written as i or j. Usage: N is the SET of players, n is the number of

players. N \ i = {1,2,…,i-1, i+1 ,…, n} All players in N except

for i


Actions

Ai – Set of available actions for player i

ai – A particular action chosen by i, ai Ai

A – Action Space, Cartesian product of all Ai

A=A1 A2· · · An

a – Action tuple – a point in the Action Space

A-i – Another action space A formed from

A-i =A1 A2· · · Ai-1 Ai+1 · · · An

a-i – A point from the space A-i

A = Ai A-i A1= A-2

A2 = A-1

a

a1 = a-2

a2 = a-1

Example Two Player Action SpaceA1 = A2 = [0 )

A=A1 A2

b

b1 = b-2

b2 = b-1


Preference Relation expresses an individual player’s desirability ofone outcome over another (A binary relationship)

*io o

o is preferred at least as much as o* by player i

i Preference Relationship (prefers at least as much as)

i Strict Preference Relationship (prefers strictly more than)

~i “Indifference” Relationship (prefers equally)

*io o *

io o

iff *io o

but not

*~io o *io o

iff *io o

and

Preference Relations (1/2)


Preference Relationship (2/2)

Games generally assume the relationship between actions and outcomes is invertible so preferences can be expressed over action vectors.

Preferences are really an ordinal relationship– Know that player prefers one outcome to another,

but quantifying by how much introduces difficulties


Preference Relation then defined as*

ia a

Maps action space to set of real numbers.

iff *i iu a u a

*ia a iff *

i iu a u a

*~ia a iff *i iu a u a

:iu A R

A mathematical description of preference relationships.

Utility Functions (1/2)(Objective Fcns, Payoff Fcns)


Utility Functions (2/2)

By quantifying preference relationships all sorts of valuable mathematical operations can be introduced.

Also note that the quantification operation is not unique as long as relationships are preserved. Many map preference relationships to [0,1].

Example

Jack prefers Apples to Oranges

JackApples Oranges Jack Jacku Apples u Oranges

a) uJack(Apples) = 1, uJack(Oranges) = 0

b) uJack(Apples) = -1, uJack(Oranges) = -7.5


Variety of game models

Normal Form Game <N,A,{ui}>– Synchronous play– T is a singleton– Perfect knowledge of action space, other players’ goals (called

utility functions) Repeated Game <N,A,{ui},{di}>

– Repeated synchronous play of a normal form game– T may be finite or infinite– Perfect knowledge of action space, other players’ goals (called

utility functions)– Players may consider actions in future stages and current stages

Strategies (modified di)

Asynchronous myopic repeated game <N,A,{ui},{di},T>– Repeated play of a normal form game under various timings– Radios react to most recent stage, decision rule is “intelligent”

Many others in the literature and in the dissertation


NormalUrgent

Allocate ResourcesInitiate Processes

OrientInfer from Context

Establish Priority

PlanNormal

Negotiate

Immediate

LearnNewStates

Goal

Adapted From Mitola, “Cognitive Radio for Flexible Mobile Multimedia Communications ”, IEEE Mobile Multimedia Conference, 1999, pp 3-10.

Observe

OutsideWorld

Decide

Act

Autonomous

Infer from Radio Model

States

\

Utility functionArguments

Utility Function

Outcome Space

Action Sets

DecisionRules

Cognitive radios are naturally modeled as players in a game


Interaction is naturally modeled as a game

Radio 2

Actions

Radio 1

ActionsAction Space

u2u1

Decision Rules

Decision Rules

Outcome Space

:f A OInformed by Communications Theory

1 2ˆ ˆ, 1 1̂u 2 2ˆu


Some differences between game models and cognitive radio network model

Player Cognitive Radio

Knowledge Knows A Can learn O (may know or learn A)

f : A O

Invertible

Constant

Known

Not invertible (noise)

May change over time (though relatively fixed for short periods)

Has to learn

Preferences Ordinal Cardinal (goals)

Assuming numerous iterations, normal form game only has a single stage.

– Useful for compactly capturing modeling components at a single stage

– Normal form game properties will be exploited in the analysis of other games

Repeated games are explicitly used as the basis for cognitive radio algorithm design (e.g., Srivastava, MacKenzie)

– Not however, focus of dissertation– Not the most commonly encountered implementation


Summary

The interactions in a cognitive radio network (levels 1-3) can be represented by the tuple <N, A, {ui}, {di},T>

A dynamical system model adequately represents inner-loop procedural radios

A myopic asynchronous repeated game adequately represents ontological radios and random procedural radios

– Suitable for outer-loop processes

– Not shown here, but can also handle inner-loop

Some differences in models– Most analysis carries over– Some differences

Game Model

Dynamical System


Equilibrium Concepts

Nash Equilibria, Mixed Strategy Equilbria, Coalitional Games, the Core, Shapley Value, Nash Bargaining


Steady-states

Recall model of <N,A,{di},T> which we characterize with the evolution function d

Steady-state is a point where a*= d(a*) for all t t *

Obvious solution: solve for fixed points of d. For non-cooperative radios, if a* is a fixed point under

synchronous timing, then it is under the other three timings. Works well for convex action spaces

– Not always guaranteed to exist– Value of fixed point theorems

Not so well for finite spaces– Generally requires exhaustive search


An action vector from which no player can profitably unilaterally deviate.

, ,i i i i i iu a a u b a An action tuple a is a NE if for every i N

for all bi Ai.

Definition

Note showing that a point is a NE says nothing about the process by which the steady state is reached. Nor anything about its uniqueness.Also note that we are implicitly assuming that only pure strategies are possible in this case.

“A steady-state where each player holds a correct expectation of the other players’ behavior and acts rationally.” - Osborne

Nash Equilibrium


Examples

Cognitive Radios’ Dilemma– Two radios have two

signals to choose between {n,w} and {N,W}

– n and N do not overlap– Higher throughput from

operating as a high power wideband signal when other is narrowband

Jamming Avoidance– Two channels– No NE

0 1

0 (-1,1) (1,-1)

1 (1,-1) (-1,1)

Jammer

Transmitter


Preplay Communication– Before the game, discuss their options. Note only NE are suitable

candidates for coordination as one player could profitably violate any agreement.

Rational Introspection– Based on what each player knows about the other players, reason what

the other players would do in its own best interest. (Best Response - tomorrow) Points where everyone would be playing “correctly” are the NE.

Focal Point– Some distinguishing characteristic of the tuple causes it to stand out.

The NE stands out because it’s every player’s best response. Trial and Error

– Starting on some tuple which is not a NE a player “discovers” that deviating improves its payoff. This continues until no player can improve by deviating. Only guaranteed to work for Potential Games (couple weeks)

How do the players find the Nash Equilibrium?


Iterative Elimination of Strongly Dominated Strategies (IESDS)

MotivationSometimes a player’s actions are not preferable, no matter what the other players do.These actions would thus never rationally be played and can be eliminated from consideration in any NE action vector.

DefinitionAn action of player i, ai, is said to be dominated, if there exists some other action bi such that

Note this is strictly the definition for strong domination and the technique being discussed is actually Strong IEDS (or IESDS). Normal (weak) IEDS and sometimes results in incorrect elimination of NE.

, ,i i i i i i i iu a a u b a a A


IEDS Algorithm

1. Remove any readily apparent dominated actions from the analysis.

2. Check to see if this reveals new dominated actions.

3. If there are any dominated actions, repeat from 1, else continue to 4.

4. Apply NE definition on any remaining action tuples.


D C

C

D -1, -1

-5, -5

-10, 0

0, -10So D is dominated by C for player 1. So we remove D for player 1 from the game.

Note the following

1 1, ,u C D u D D

1 1, ,u C C u D C

Iteration 1.

Iteration 2.Note the following

2 2, ,u C C u C D

So in the remaining game D is dominated by C for player 2. So we remove D for player 2 from the game.

As there is only one action tupleleft (thus no deviation is possible, nor is a profitable deviation), it must be a NE

Example IEDS


Comments on IESDS

IESDS is not useful for all games as many games have no dominated strategies.

While each iterative elimination is rational, Dutta states that rationality is more difficult to assume when a large number of iterative eliminations are required. (I disagree for repetitive play)

IESDS can also augment other solution techniques.

If a game is IESDS solvable


Nash Equilibrium as a Fixed Point

Individual Best Response

Synchronous Best Response

Nash Equilibrium as a fixed point

Fixed point theorems can be used to establish existence of NE (see dissertation)

NE can be solved by implied system of equations

ˆ : , ,i i i i i i i i i i iB a b A u b a u a a a A

ˆ ˆi

i NB a B a

* *ˆa B a


Example solution for Fixed Point by Solving for Best Response Fixed Point

Bandwidth Allocation Game– Five cognitive radios with each radio, i, free to determine

the number of simultaneous frequency hopping channels the radio implements, ci [0,).

– Goal– P(c) fraction of symbols that are not interfered with (making

P(c)ci the goodput for radio i)

– Ci(ci) is radio i’s cost for supporting ci simultaneous channels.

i k i ik N

u c B c c Kc

i i i iu c P c c C c


Best Response Analysis

ˆ / 1ic B K N i N

i k i ik N

u c B c c Kc

Goal

\

ˆ / 2i i kk N i

c B c B K c

Best Response

Simultaneous System of Equations

ˆ / 6ic B K i N Solution

Generalization


Significance of NE for CRNs

Why not “if and only if”?– Consider a self-motivated game with a local maximum and a hill-climbing algorithm.– For many decision rules, NE do capture all fixed points (see dissertation)

Identifies steady-states for all “intelligent” decision rules with the same goal. Implies a mechanism for policy design while accommodating differing

implementations– Verify goals result in desired performance– Verify radios act intelligently

Autonomously Rational Decision Rule


Key Theorem for NE Existence

Kakutani’s Fixed Point Theorem

– Let f :X X be an upper semi-continuous convex valued correspondence from a non-empty compact convex set X n, then there is some x*X such that x* f(x*)

x

f(x)

x

f(x)


Nash Equilibrium Existence

* *:U a a A f a a

a2a1

U(a1)

a

f (a)

a0 a2a1

U(a1)

a

f (a)

a0

Visualizable Definition of Quasi-ConcavityAll upper-level sets are convex


Pure Strategies in an Extended GameConsider an extensive form game where each stage is a strategic form game and the action space remains the same at each stage.Before play begins, each player chooses a probabilistic strategy that assigns a probability to each action in his action set. At each stage, the player chooses an action from his action set according to the probabilities he assigned before play began.

ExampleConsider a video football game which will be simulated. Before the game begins two players assign probabilities of calling running plays or passing plays for both offense or defense. In the simulation, for each down the kind of play chosen by each team is based on the initial probabilities assigned to kinds of plays.

My Favorite Mixed Strategy Story


Example Mixed Strategy Game

Jamming game

a1

b1

a2 b2

1,-1 -1, 1

1, -1-1, 1

p

(1-p)

q (1-q) Action Tuples Probabilities(a1,a2)(a1,b2)

(b1,a2)(b1,b2)

pqp(1-q)

(1-p)(1-q)(1-p)q

Expected Utilities

1 , 1 1 1

1 1 1 1 1

U p q pq p q

p q p q

2 , 1 1 1

1 1 1 1 1

U p q pq p q

p q p q

(A1)={p,(1-p): p[0,1]}

(A2)={q,(1-q): q[0,1]}

Sets of probability distributions


Nash Equilibrium in a Mixed Strategy Game

Definition Mixed Strategy Nash EquilibriumA mixed strategy profile * is a NE iff iN

* * *, ,i i i i i i i iU U A

Best Response Correspondence

arg max ,i i

i i i i iA

BR U

Alternate NE DefinitionConsider i N iB BR

A mixed strategy profile * is a NE iff

* *B


0 1

1

Best Response

1 4 2u

q qp

2 4 2u

p pq

1

0 1/ 2

[0,1] 1/ 2

1 1/ 2

q

BR q q

q

2 , 4 2 2 1U p q pq p q 1 , 4 2 2 1U p q pq p q

2

1 1/ 2

[0,1] 1/ 2

0 1/ 2

p

BR p p

p

0.5

0.5

BR1(q)

BR2(p)

Best Response Correspondences

Note: NE in mixed extension which did not exist in original

p(a1)

p(a2)

Nash Equilibrium

p

q

1-p

1-q


Interesting Properties of Mixed Strategy Games

1. Every Mixed Extension of a Strategic Game has an NE.

2. A mixed strategy i is a best response to -i

iff every action in the support of i is itself a best response to -i.

3. Every action in the support of any player’s equilibrium mixed strategy yields the same payoff to that player.


Coalitional Game (with transferable payoff)

Concept: groups of players (called coalitions) conspire together to implement actions which yields a result for the coalition. The value received by the coalition is then distributed among the coalition members.

Where do radios collaborate and distribute value?– 802.16h interference groups – allocation of bandwidth– Distribution of frequencies/spreading codes among cells– File sharing in P2P network

Transferable utility refers to existence of some commodity for which a player’s utility increases by one unit for every unit of the commodity it receives

Game Components, N,v– N set of players– Characteristic function– Coalition, SN

How is this value distributed?– Payoff vector, (xi)iS

Payoff vector is said to be S-feasible if x(S) v(S)

: 2 \Nv

ii S

x S x


The Core (Transferrable)

The Core– For N,v, the set of feasible payoff profiles, (xi)iS

for which there is no coalition S and S-feasible payoff vector (yi)iS for which yi > xi for all iS.

General principles of the NE also apply to the Core:– Number of solutions for a game may be anywhere

from 0 to – May be stable or unstable.


Example

Suppose three radios, N = {1,2,3}, can choose to participate in a peer-to-peer network.

Characteristic Function– v(N) = 1– v({1,2})= v({1,3})= v({2,3})=[0,1]– v(1)= v(2)= v(3) = 0

Loosely, indicates # of duplicated files If >2/3, Core is empty

x = (2/5, 2/5,0)

Example adaptations for=4/5 x = (0, 3/5, 1/5) x = (2/5, 0, 2/5)

x = (1/3, 1/3, 1/3) x = (2/5, 2/5,0)


Comments on the Core

Possibility of empty core implies that even when radios can freely negotiate and form arbitrary coalitions, no steady-state may exist

Frequently very large (infinite) number of steady-states, e.g., <2/3

– Makes it impossible to predict exact behavior Existence conditions for the Core, but would need to

cover some linear programming concepts Related (but not addressed today) concepts:

– Bargaining Sets, Kernel, Nucleolus


Strong NE

Concept: Assume radios are able to collaborate, but utilities aren’t necessarily transferrable

An action tuple a* such that

* *, ,i i S S S ii S

u a u a a S N a A

No Strong NE

N W

n (9.6,9.6) (9.6, 21)

w (21, 9.6) (22, 22)

Unique Strong NE


Motivation for Shapley value

Core was generally either empty or very large. Want a “good” single solution.

Terminology– Marginal Contribution of i

– Interchangeability of i, j

– Dummy player (no synergy)

i S v S i v S

\i S v i S N i

\{ , }i jS S S N i j


Axioms for Shapley Value

Let be some distribution of value for a TU coalition game

Symmetry:– If i and j are interchangeable, then i(v)=j(v)

Dummy:– If i is a dummy, then i(v) = v({i})

Additivity:– Given N,v and N,w, i(v + w)= i(v)+ i(w) for all iN,

where v+w = v(S) + w(S) Balanced Contributions

– Given N,v,

\ \, \ . , \ .N j N ii i j jN v N j v N v N i v


Shapley Value

Only assignment (value) that satisfies balanced contributions; only assignment that simultaneously satisfies symmetry, dummy, and additivity axioms

\

! 1 !

!iS N i

S N SS v S i v S

N

Marginal Value Contributed by i

Probability that i will be next one invited to the grand coalition given that coalition S is already part of the coalition assuming random ordering.


Implications of Shapley Value

One form of a fair allocation– What you receive is based on the value you add– Independent of order of arrival– I liken it to setting salaries according to the Value Over

Replacement Player concept in Baseball Statistics “Better” solution concept than the core as it’s a single

payoff as opposed to a potentially infinite number Allows for analysis of relative “power” of different

players in the system


Bargaining Problem

Components: F, v– Feasible payoffs F, closed convex subset of n

– Disagreement Point v = (v1, v2) What 1 or 2 could achieve without bargaining

Example:– Even if system is jammed, still gets some throughput– Member of 802.16h interference group and try its luck

F is said to be essential if there is some yF such that y1>v1 and y2>v2

If contracts are “binding” then F could be the payoffs corresponding to entire original action space

Otherwise, F may need to be drawn from the set of NE or from enforceable set (see punishment in repeated games)

A particular solution is referred to by (F, v) n


Desirable Bargaining Axioms about a Solution

Strong Efficiency (F, v) is Pareto Efficient

Individually Rational (F, v) v

Scale Covariance– For any 1, 2, 1, 2,

1, 2 >0, if

then

Independence of Irrelevant Alternatives

– If G F and G is closed and convex and (F, v)G, then (G, v)=(F, v)

Symmetry– If v1=v2 and {(x1,x2)|

(x2,x1)F}=F, then 1(F, v)= 2(F, v)

1 1 1 2 2 2 1 2, | ,G x x x x F

1 1 1 2 2 2, , , ,G w F v F v

1 1 1 2 2 2,w v v


Nash Bargaining Solution

NBS

Interestingly, this is the only bargaining solution which simultaneously satisfies the preceding 5 axioms

,

, arg max i ii Nx F x v

F v x v


GT framework for BW allocation [Yaiche]: System Model

N users L links Users compete for the total link capacity Each user has a minimum rate MRi and peak rate

PRi

Admissible rate vector is given by,

C : vector of link capacities

A L*N: alp = 1 if link belongs to path p, else 0.

0 | , , andNX x x MR x PR Ax C ¡

Scenario given in H. Yaiche, R. Mazumdar, C. Rosenberg, “A game theoretic framework for bandwidth allocation and pricing in broadband networks”, IEEE/ACM Transactions on Networking, Volume: 8 , Issue: 5 , Oct. 2000, pp. 667-678.


Centralized Optimization Problem

1

: 1

1

1

N

i ixi

i i

i i

l l

Max x MR

st x MR i N

x PR i N

Ax C l L

K

K

K

NBS exists

v

F


Steady-State Summary

Not every game has a steady-state NE are analogous to fixed points of self-interested decision

processes NE can be applied to procedural and ontological radios

– Don’t need to know decision rule, only goals, actions, and assumption that radios act in their own interest

A game (network) may have 0, 1, or many steady-states All finite normal form games have an NE in its mixed extension

– Over multiple iterations, implies constant adaptation More complex game models yield more complex steady-state

concepts Can define steady-states concepts for coalitional games

– Frequently so broad that specific solutions are used


Evaluating Equilibria

Objective Function Maximization, Pareto Efficiency, Notions of Fairness


Optimality

In general we assume the existence of some design objective function J:A

The desirableness of a network state, a, is the value of J(a).

In general maximizers of J are unrelated to fixed points of d.

Figure from Fig 2.6 in I. Akbar, “Statistical Analysis of Wireless Systems Using Markov Models,” PhD Dissertation, Virginia Tech, January 2007


Example Functions

Utilitarian– Sum of all players’ utilities– Product of all players’

utilities Practical

– Total system throughput– Average SINR– Maximum End-to-End

Latency– Minimal sum system

interference Objective can be unrelated

to utilities

Utilitarian Maximizers

System Throughput Maximizers

Interference Minimization


Price of Anarchy (Factor)

Centralized solution always at least as good as distributed solution

– Like ASIC is always at least as good as DSP Ignores costs of implementing algorithms

– Sometimes centralized is infeasible (e.g., routing the Internet)

– Distributed can sometimes (but not generally) be more costly than centralized

Performance of Centralized Algorithm Solution

Performance of Distributed Algorithm Solution

1

9.6

7


Implications

Best of All Possible Worlds– Low complexity distributed algorithms with low anarchy factors

Reality implies mix of methods– Hodgepodge of mixed solutions

Policy – bounds the price of anarchy Utility adjustments – align distributed solution with centralized solution Market methods – sometimes distributed, sometimes centralized Punishment – sometimes centralized, sometimes distributed,

sometimes both Radio environment maps –”centralized” information for distributed

decision processes– Fully distributed

Potential game design – really, the panglossian solution, but only applies to particular problems


Pareto efficiency (optimality)

Formal definition: An action vector a* is Pareto efficient if there exists no other action vector a, such that every radio’s valuation of the network is at least as good and at least one radio assigns a higher valuation

Informal definition: An action tuple is Pareto efficient if some radios must be hurt in order to improve the payoff of other radios.

Important note– Like design objective function, unrelated to fixed points (NE)– Generally less specific than evaluating design objective

function


Example Games

a1

b1

a2 b2

1,1 -5,5

-1,-15,-5

a1

b1

a2 b2

1,1 -5,5

3, 35,-5

Legend Pareto Efficient

NE NE + PE


Notions of Fairness

What is “Fair”?– Abstractly “fair” means different things to different analysts– In every day life, really just short hand for “I deserve more

than I got”

Nonetheless is used to evaluate how equitably radio resources are distributed


Gini Coefficient

Basic concept: – Order players by utility. – Form CDF for sorted utility distribution

(Lorenz curve)– Integrate (sum) the difference between

perfect equality (of outcome) and CDF– Divide result by sum of all players’ utilities

Formula

Used in a lot of macro-economic comparisons of income distributions

Relatively simple, independent of scale, independent of size of N, anonymity

Radically different outcomes can give the same result G N W

n 0 0.37

w 0.37 0

Player #

Agg

rega

te U

tility

Lorenz curvePerfe

ct Equality

11

1 2i

i N

ii N

n i u aG a n

n u a


Other Metrics of Fairness

Theill Index

Atkinson Index, is income inequality aversion

1 1

11 11 , 0,1i

i N

T a u au n

1lni i

i N

u a u aT a

n u u

1

ii N

u a u an

1/

1 11 , 1

n

ii N

T a u au n


Summary of Equilibria Evaluation

Lots of different ways which a point can be evaluated

Many are contradictory Loosely, any point could be said to be

optimal given the right objective function Insufficient to say that a point is optimal Must describe the metric in use Suggestion: use whatever metric makes

sense to you as a network designer


The Notion of Time and Imperfections in Games and Networks

Extensive Form Games, Repeated Games, Convergence Concepts in Normal Form Games, Trembling Hand Games, Noisy Observations


Model Timing Review

When decisions are made also matters and different radios will likely make decisions at different time

Tj – when radio j makes its adaptations

– Generally assumed to be an infinite set

– Assumed to occur at discrete time

Consistent with DSP implementation

T=T1T2Tn

t T

Decision timing classes Synchronous

– All at once Round-robin

– One at a time in order– Used in a lot of analysis

Random– One at a time in no order

Asynchronous– Random subset at a time– Least overhead for a network


Extensive Form Game Components

1. A set of players.2. The actions available to each player at each decision moment (state).3. A way of deciding who is the current decision maker.4. Outcomes on the sequence of actions.5. Preferences over all outcomes.

Components

A1

2

1

2

1’

2’

B

B

1,-1

-1,1

-1,1

1,-1

1

2

1,1’ 1,2’ 2,1’ 2,2’

1,-1

1,-11,-1

1,-1 -1,1 -1,1

-1,1 -1,1

Strategic Form EquivalenceStrategies for A{1,2}

Strategies for B{(1,1’),(1,2’),(2,1’),(2,2’)}

Game Tree Representation

A Silly Jammer Avoidance Game


Backwards Induction

Concept– Reason backwards based on what each player would rationally

play– Predicated on Sequential Rationality– Sequential Rationality – if starting at any decision point for a player

in the game, his strategy from that point on represents a best response to the strategies of the other players

– Subgame Perfect Nash Equilibrium is a key concept (not formally discussed today).

C

S S

1 1

1,0

C2

0,2

C

S

3,1

S

C2

2,4

S

1

5,3

C C

S

2

4,6

7,54,65,32,43,10,2

Alternating Packet Forwarding Game


Comments on Extensive Form Games

Actions will generally not be directly observable However, likely that cognitive radios will build up

histories Ability to apply backwards induction is predicated on

knowing other radio’s objectives, actions, observations and what they know they know…

– Likely not practical Really the best choice for modeling notion of time

when actions available to radios change with history


Repeated Games

Stage 1

Stage 2

Stage k

Stage 1

Stage 2

Stage k

Same game is repeated– Indefinitely– Finitely

Players consider discounted payoffs across multiple stages

– Stage k

– Expected value over all future stages

k k ki iu a u a

0

k k ki i

k

u a u a


Lesser Rationality: Myopic Processes

Players have no knowledge about utility functions, or expectations about future play, typically can observe or infer current actions

Best response dynamic – maximize individual performance presuming other players’ actions are fixed

Better response dynamic – improve individual performance presuming other players’ actions are fixed

Interesting convergence results can be established


Paths and Convergence

Path [Monderer_96]– A path in is a sequence = (a0, a1,…) such that for every k 1

there exists a unique player such that the strategy combinations (ak-1, ak) differs in exactly one coordinate.

– Equivalently, a path is a sequence of unilateral deviations. When discussing paths, we make use of the following conventions.

– Each element of is called a step.– a0 is referred to as the initial or starting point of .– Assuming is finite with m steps, am is called the terminal point or

ending point of and say that has length m. Cycle [Voorneveld_96]

– A finite path = (a0, a1,…,ak) where ak = a0


Improvement Paths

Improvement Path– A path = (a0, a1,…) where for all k1, ui(ak)>ui(ak-1)

where i is the unique deviator at k Improvement Cycle

– An improvement path that is also a cycle– See the DFS example

2

1

3

4562

1

3

456


Convergence Properties

Finite Improvement Property (FIP)– All improvement paths in a game are finite

Weak Finite Improvement Property (weak FIP)– From every action tuple, there exists an

improvement path that terminates in an NE.

FIP implies weak FIP FIP implies lack of improvement cycles Weak FIP implies existence of an NE


Examples

a

b

A B

1,-1

-1,1

0,2

2,2

Game with FIP

a

b

A B

1,-1 -1,1

1,-1-1,1

C

0,2

1,2

c 2,12,0 2,2

Weak FIP but not FIP


Implications of FIP and weak FIP

Assumes radios are incapable of reasoning ahead and must react to internal states and current observations

Unless the game model of a CRN has weak FIP, then no autonomously rational decision rule can be guaranteed to converge from all initial states under random and round-robin timing (Theorem 4.10 in dissertation).

If the game model of a CRN has FIP, then ALL autonomously rational decision rules are guaranteed to converge from all initial states under random and round-robin timing.

– And asynchronous timings, but not immediate from definition More insights possible by considering more refined classes of

decision rules and timings


Decision Rules


Absorbing Markov Chains and Improvement Paths

Sources of randomness– Timing (Random, Asynchronous)– Decision rule (random decision rule)– Corrupted observations

An NE is an absorbing state for autonomously rational decision rules.

Weak FIP implies that the game is an absorbing Markov chain as long as the NE terminating improvement path always has a nonzero probability of being implemented.

This then allows us to characterize – convergence rate, – probability of ending up in a particular NE, – expected number of times a particular transient state will be visited


Connecting Markov models, improvement paths, and decision rules

Suppose we need the path = (a0, a1,…am) for convergence by weak FIP.

Must get right sequence of players and right sequence of adaptations.

Friedman Random Better Response– Random or Asynchronous

Every sequence of players have a chance to occur Random decision rule means that all improvements have a chance to

be chosen– Synchronous not guaranteed

Alternate random better response (chance of choosing same action)

– Because of chance to choose same action, every sequence of players can result from every decision timing.

– Because of random choice, every improvement path has a chance of occurring


Convergence Results (Finite Games)

If a decision rule converges under round-robin, random, or synchronous timing, then it also converges under asynchronous timing.

Random better responses converge for the most decision timings and the most surveyed game conditions.

– Implies that non-deterministic procedural cognitive radio implementations are a good approach if you don’t know much about the network.


Impact of Noise

Noise impacts the mapping from actions to outcomes, f :AO

Same action tuple can lead to different outcomes Most noise encountered in wireless systems is

theoretically unbounded. Implies that every outcome has a nonzero chance of being

observed for a particular action tuple. Some outcomes are more likely to be observed than others

(and some outcomes may have a very small chance of occurring)


DFS Example

Consider a radio observing the spectral energy across the bands defined by the set C where each radio k is choosing its band of operation fk.

Noiseless observation of channel ck

Noisy observation If radio is attempting to minimize inband

interference, then noise can lead a radio to believe that a band has lower or higher interference than it does

,i k ki k k kk N

o c g p c f

, ,i k ki k k k i k

k N

o c g p c f n c t


Trembling Hand (“Noise” in Games)

Assumes players have a nonzero chance of making an error implementing their action.

– Who has not accidentally handed over the wrong amount of cash at a restaurant?

– Who has not accidentally written a “tpyo”?

Related to errors in observation as erroneous observations cause errors in implementation (from an outside observer’s perspective).


Noisy decision rules

Noisy utility , ,i i iu a t u a n a t

Trembling Hand

ObservationErrors


Implications of noise

For random timing, [Friedman] shows game with noisy random better response is an ergodic Markov chain.

Likewise other observation based noisy decision rules are ergodic Markov chains

– Unbounded noise implies chance of adapting (or not adapting) to any action

– If coupled with random, synchronous, or asynchronous timings, then CRNs with corrupted observation can be modeled as ergodic Makov chains.

– Not so for round-robin (violates aperiodicity) Somewhat disappointing

– No real steady-state (though unique limiting stationary distribution)


DFS Example with three access points

3 access nodes, 3 channels, attempting to operate in band with least spectral energy.

Constant power Link gain matrix

Noiseless observations

Random timing

12

3


Trembling Hand

Transition Matrix, p=0.1

Limiting distribution


Noisy Best Response

Transition Matrix, (0,1) Gaussian Noise

Limiting stationary distributions


Comment on Noise and Observations

Cardinality of goals makes a difference for cognitive radios– Probability of making an error is a function of the difference in

utilities– With ordinal preferences, utility functions are just useful fictions

Might as well assume a trembling hand Unboundedness of noise implies that no state can be

absorbing for most decision rules NE retains significant predictive power

– While CRN is an ergodic Markov chain, NE (and the adjacent states) remain most likely states to visit

– Stronger prediction with less noise– Also stronger when network has a Lyapunov function– Exception - elusive equilibria ([Hicks_04])


Summary

Given a set of goals, an NE is a fixed point for all radios with those goals for all autonomously rational decision processes

Traditional engineering analysis techniques can be applied in a game theoretic setting

– Markov chains to improvement paths Network must have weak FIP for autonomously rational radios to

converge– Weak FIP implies existence of absorbing Markov chain for many decision

rules/timings In practical system, network has a theoretically nonzero chance of visiting

every possible state (ergodicity), but does have unique limiting stationary distribution

– Specific distribution function of decision rules, goals– Will be important to show Lyapunov stability

Shortly, we’ll cover potential games and supermodular games which can be shown to have FIP or weak FIP. Further potential games have a Lyapunov function.


Designing Cognitive Radio Networks to Yield Desired Behavior

Policy, Cost Functions, Global Altruism, Supermodular Games, Potential Games


Policy

Concept: Constrain the available actions so the worst cases of distributed decision making can be avoided

Not a new concept – – Policy has been used since

there’s been an FCC What’s new is assuming

decision makers are the radios instead of the people controlling the radios


Policy applied to radios instead of humans

Need a language to convey policy– Learn what it is– Expand upon policy later

How do radios interpret policy– Policy engine?

Need an enforcement mechanism– Might need to tie in to humans

Need a source for policy– Who sets it?– Who resolves disputes?

Logical extreme can be quite complex, but logical extreme may not be necessary.

Policiesfrequency

mask


Detection– Digital TV: -116 dBm over a 6 MHz channel– Analog TV: -94 dBm at the peak of the NTSC (National

Television System Committee) picture carrier– Wireless microphone: -107 dBm in a 200 kHz bandwidth.

Transmitted Signal– 4 W Effective Isotropic Radiated Power (EIRP)– Specific spectral masks – Channel vacation times

C. Cordeiro, L. Challapali, D. Birru, S. Shankar, “IEEE 802.22: The First Worldwide Wireless Standard based on Cognitive Radios,” IEEE DySPAN2005, Nov 8-11, 2005 Baltimore, MD.

802.22 Example Policies


Cost Adjustments

Concept: Centralized unit dynamically adjusts costs in radios’ objective functions to ensure radios operate on desired point

Example: Add -12 to use of wideband waveform

i i iu a u a c a


Comments on Cost Adjustments

Permits more flexibility than policy– If a radio really needs to deviate, then it can

Easy to turn off and on as a policy tool– Example: protected user shows up in a channel,

cost to use that channel goes up– Example: prioritized user requests channel, other

users’ cost to use prioritized user’s channel goes up (down if when done)


Global Altruism: distributed, but more costly

Concept: All radios distributed all relevant information to all other radios and then each independently computes jointly optimal solution

– Proposed for spreading code allocation in Popescu04, Sung03– Used in xG Program (Comments of G. Denker, SDR Forum Panel Session on

“A Policy Engine Framework”) Overhead ranges from 5%-27% C = cost of computation I = cost of information transfer from node to node n = number of nodes Distributed

– nC + n(n-1)I/2 Centralized (election)

– C + 2(n-1)I Price of anarchy = 1 May differ if I is asymmetric


Improving Global Altruism

Global altruism is clearly inferior to a centralized solution for a single problem.

However, suppose radios reported information to and used information from a common database

– n(n-1)I/2 => 2nI And suppose different radios are concerned with different

problems with costs C1,…,Cn Centralized

– Resources = 2(n-1)I + sum(C1,…,Cn)– Time = 2(n-1)I + sum(C1,…,Cn)

Distributed– Resources = 2nI + sum(C1,…,Cn)– Time = 2I + max (C1,…,Cn)


Example Application:

Overlay network of secondary users (SU) free to adapt power, transmit time, and channel

Without REM:– Decisions solely based on link SINR

With REM– Radios effectively know everything

Upshot: A little gain for the secondary users; big gain for primary users

From: Y. Zhao, J. Gaeddert, K. Bae, J. Reed, “Radio Environment Map Enabled Situation-Aware Cognitive Radio Learning Algorithms,” SDR Forum Technical Conference 2006.


Comments on Radio Environment Map

Local altruism also possible– Less information transfer

Like policy, effectively needs a common language Nominally could be centralized or distributed

database Read more:

– Y. Zhao, B. Le, J. Reed, “Infrastructure Support – The Radio Environment MAP,” in Cognitive Radio Technology, B. Fette, ed., Elsevier 2006.


Repeated Games

Stage 1

Stage 2

Stage k

Stage 1

Stage 2

Stage k

Same game is repeated– Indefinitely– Finitely

Players consider discounted payoffs across multiple stages– Stage k

– Expected value over all future stages

k k ki iu a u a

0

k k ki i

k

u a u a


Impact of Strategies

Rather than merely reacting to the state of the network, radios can choose their actions to influence the actions of other radios

Threaten to act in a way that minimizes another radio’s performance unless it implements the desired actions

Common strategies– Tit-for-tat– Grim trigger– Generous tit-for-tat

Play can be forced to any “feasible” payoff vector with proper selection of punishment strategy.


Impact of Communication on Strategies

nada

c

Nada C

0,0 -5,5

-1,15,-5

N

-100,0

-100,-1

n -1,-1000,-100 -100,-100

Players agree to play in a certain manner Threats can force play to almost any state

– Breaks down for finite number of stages


Improvement from Punishment

A. MacKenzie and S. Wicker, “Game Theory in Communications: Motivation, Explanation, and Application to Power Control,” Globecom2001, pp. 821-825.

Throughput/unit power gains be enforcing a common received power level at a base station

Punishment by jamming

Without benefit to deviating, players can operate at lower power level and achieve same throughput


Instability in Punishment

Issues arise when radios aren’t directly observing actions and are punishing with their actions without announcing punishment

Eventually, a deviation will be falsely detected, punished and without signaling, this leads to a cascade of problems

V. Srivastava, L. DaSilva, “Equilibria for Node Participation in Ad Hoc Networks – An Imperfect Monitoring Approach,” ICC 06, June 2006, vol 8, pp. 3850-3855


Comments on Punishment

Works best with a common controller to announce Problems in fully distributed system

– Need to elect a controller– Otherwise competing punishments, without knowing other players’

utilities can spiral out of control Problems when actions cannot be directly observed

– Leads to Byzantine problem No single best strategy exists

– Strategy flexibility is important – Significant problems with jammers (they nominally receive higher

utility when “punished” Generally better to implement centralized controller

– Operating point has to be announced anyways


What does game theory bring to the design of cognitive radio networks? (1/2)

A natural “language” for modeling cognitive radio networks

Permits analysis of ontological radios– Only know goals and that radios will adapt towards its

goal Simplifies analysis of random procedural radios Permits simultaneous analysis of multiple

decision rules – only need goal Provides condition to be assured of possibility of

convergence for all autonomously myopic cognitive radios (weak FIP)


What does game theory bring to the design of cognitive radio networks? (2/2)

Provides condition to be assured of convergence for all autonomously myopic cognitive radios (FIP, not synchronous timing)

Rapid analysis– Verify goals and actions satisfy a single model, and steady-

states, convergence, and stability An intuition as to what conditions will be needed to

field successful cognitive radio decision rules. A natural understanding of distributed interactive

behavior which simplifies the design of low complexity distributed algorithms


Game Models of Cognitive Radio Networks

Almost as many models as there are algorithms

Normal form game excellent for capturing single iteration of a complex system

Most other models add features to this model

– Time, decision rules, noisy observations, Natural states

Some can be recast as a normal form game

– Extensive form game

Normal Form Game N, A, {ui}

Repeated Game N, A, {ui}, {di}

Asynchronous Game– N, A, {ui}, {di}, T

Extensive Form Game N, A, {ui}, {di}, T

TU Game N, v

Bargaining Game N, v


Steady-states

Different game models have different steady-state concepts

Games can have many, one, or no steady-states

Nash equilibrium (and its variants) is most commonly applied concept

– Excellent for distributed noncollaborative algorithms

Games with punishment and Coalitional games tend to have a very large number of equilibria

Game theory permits analysis of steady-states without knowing specific decision rules

Nash Equilibrium Strong Nash Equilibrium Core Shapley Value Nash Bargaining Solution


Optimality

Numerous different notions of optimality

Many are contradictory Use whatever metric

makes sense

Pareto Efficiency Objective Maximization Gini Index Shapley Value Nash Bargaining

Solution


Convergence

Showing existence of steady-states is insufficient; need to know if radios can reach those states

FIP (potential games) gives the broadest convergence conditions

Random timing actually helps convergence


Noise

Unbounded noise causes all networks to theoretically behave as ergodic Markov chains

Important to show Lyapunov stabiltiy

Noisy observations cause noisy implementation to an outside observer

– Trembling hand


Game Theory and Design

Numerous techniques for improving the behavior of cognitive radio networks

Techniques can be combined Potential games yield lowest

complexity implementations– Judicious design of goals,

actions, Practical limitations limit

effectiveness of punishment– Observing actions– Likely best when a referee

exists Policy can limit the worst

effects, doesn’t really address optimality or convergence issues

Punishment– Can enforce any action tuple– Can be brittle when distributed

Policy– Limits worst case performance

Cost function– Reshapes preferences– Could damage underlying

structure if not a self-interested cost

Centralized– Can theoretically realize any

result– Consumes overhead– Slower reactions


Future Directions in General Game Theory Research and Cognitive Radio Design

Integrate policy and potential games Integration of coalitional and distributed

forms Increasing dimensionality of action sets

– Cross-layer

Integration of dynamic and hierarchical policies and games


Future Direction in Regulation

Can incorporate optimization into policy by specifying goals

In theory, correctly implementing goals, correctly implementing actions, and exhaustive self-interested adaptation is enough to predict behavior (at least for potential games)– Simpler policy certification

Provable network behavior


Avenues for Future Research on Game Theory and CRNs

Integration of bargaining, centralized, and distributed algorithms into a common framework

Cross-layer algorithms Better incorporating

performance of classification techniques into behavior

Asymmetric potential games Bargaining algorithms for

cognitive radio Improving the brittleness of

punishment in distributed implementations with imperfect observations

Imperfection in observations in general

Time varying game models while inferring convergence, stability…

Combination of policy, potential games, coalition formation, and token economies

Can be modeled as a game with to types of players

– Distributed cognitive radios– Dynamic policy provider

© cognitive radio technologies, 2007 1 game theory and cognitive radio networks basic game models...

Documents