mean field games in societal networks - · pdf filemean field games in societal networks ......

Mean Field Games in Societal Networks

Vijay G. Subramanian EECS Department, University of Michigan

Joint work with J. Li, B. Xia, S. Paul, R. Bhattacharya, X. Geng, H. Ming, L. Xie and Srinivas Shakkottai, Texas A&M University

May 7th 2015 Mathematical and Algorithmic Sciences Lab

France Research Center, Huawei Technologies

Large Complex Networks In Modern Society

Telecommunications & Social Networks

Economic & social driver of change

~$1.5 trillion in 2010 (2.4% of world GDP)

Urban transportation systems

Congestion losses immense

~$67.5 billion productivity loss (0.7% of US GDP)

Driver for intelligent transportation systems

Cyber physical systems

Smart grids, demand response & EV charging/storage

DoE study: savings of $46-117 billion over 20 years

Societal networks as complex networks“Properties” of complex networks

Many agents, controllers, measurement devices, ...

Complex connectivity & interaction structure

Different agents possess different information

Distributed, decentralized or competitive sub-systems

Conflicts between agents’ preferred solutions & system’s preferred solutions

Extremely challenging to develop or analyze optimal solutions

Societal networks are complex networks

Interconnected networks that are important to the functioning of society.

Have a shared resource component, and participants have to periodically take decisions on when and how much to utilize such resources.

Desire to incentivize good behavior that would benefit society as a whole.

Examples of real-world experiments on incentivizing good behavior in societal networks

MeruguPrabhakarRama’09, An incentive mechanism for decongesting the roads: A pilot program in Bangalore

Prabhakar’13, Designing large-scale nudge engines.

Mean-field games paradigm

Mean-field game setting helps in many instances

Models typical agent behavior with many other agents

Naturally applies to distributed settings

Yields simpler policies & simpler calculations

JovanovicRosenthal’88, GrahamMeleard’94, Lasry-Lions’07, HuangCainesMalhame’10

Mean-field analysisRich history in statistical physics

Analysis of many body problems, Ising model, replica method, cavity method, etc.

Asymptotic analysis of Markov processes

Kurtz’s theorem for population density dependent Markov processes

Chemical Master Equation

Lots of recent work on analysis of multiple-access channel (802.11)

Rich history in economics

Population games, Non-atomic players

Competitive equilibrium analysis, Analysis of wage distribution in labor markets

Mean-field gamesLarge population of agents interact

A. Number of agents each agent interacts with increases without bound

B. Each agent interacts with a random pool of agents during finite life-time with extremely small chance of interacting with any particular agent twice

Agents use simpler strategies

Best-respond to population distribution of actions

Mean-field equilibrium (MFE): a self-consistent distribution of actions

Avoids complex belief structure & strategies of equilibrium analysis

For finite population setting, these strategies typically yields ϵ-Nash equilibrium with ϵ decreasing to 0 with population size

Literature OverviewBasic Literature

Model A - JovanovicRosenthal’88, Lasry-Lions’07, HuangCainesMalhame’10

Model A - existence and uniqueness under general conditions AdhlakaJohariWeintraub’12, BodohCreed’13

Model B - GrahamMeleard’94 developed approach using Sniztman’91 Propagation of Chaos paper

Applications of Model B

Dynamic auctions with learning (second price auctions) - IyerJohariSundararajan’12

Scheduling in cellular systems - ManjrekarRamaswamyShakkottai’14

Specific strategic behavior is analyzed: no optimality considerations, no incentives

Multi-armed bandit games - GummadiJohariYu’12

Some optimality considerations: nudging of equilibrium not considered

Outline of rest of talkDiscuss two problems with optimality considerations, incentives and nudging of equilibrium behavior

Real-time wireless streaming of video with device-to-device cooperation

Cooperation leads to great benefits but needs to be incentivized as users can free-ride

User participation and truth-telling important considerations

J. Li, R. Bhattacharya, S. Paul, S. Shakkottai, and V. S., “Incentivizing sharing in realtime D2D streaming networks: A mean field game perspective,” IEEE INFOCOM, Hong Kong, 2015

Enabling demand-response to shift consumer peak energy usage via lottery schemes

Load aggregator constraints indicated to users by coupling them using lottery schemes

Consumer actions can be observed so gains from demand-response passed on to the consumers via lottery schemes

J. Li, B. Xia, X. Geng, H. Ming, L. Xie, S. Shakkottai, and V. S., “Energy coupon: A mean field game perspective on demand response in smart grids,” accepted as poster to ACM SIGMETRICS, 2015.

Real-time wireless streaming of video with device-to-device cooperation

System OverviewUsers want to view common real-time video stream TV broadcast of sports or political event

Internet

Timing and Quality of ServiceTight timing constraints

Create block k at server

N pieces per block

Transmit to users over cellular link at time k-2

# of received pieces ei[k]

i.i.d. with distribution 𝜁

Users exchange pieces of block at time k-1

Get T (<N) broadcast transmission attempts

a[k] vector of # of transmissions

Users play out at time k if all pieces available

Outcome 𝜒i(a[k],e

i[k])

B2D block (k) B2D block (k+1) B2D block (k+2)

frame kframe k-1frame k-2

create block (k)

D2D block (k)D2D block (k-1) D2D block (k+1)

create block (k+1) create block (k+2)

play block (k-2) play block (k-1) play block (k)

Track deficit of unplayed packets

η - Target delivery ratio

View as arrival rate

Successful play out as service

Stable queue implies delivery ratio met

Scheme of HouBorkarKumar’09

Cost of deficit

Convex, monotone increasing function

Timing and Quality of Service

Computation of 𝜒i(a[k],e

i[k])

Using network coding can make all broadcast transmissions useful to recipients

# of useful transmissions for user i is gi(a[k])

Depends on other user transmissions

Iff ei[k]+ g

i(a[k]) = N, play out possible

Can devise a greedy policy to stabilize deficit queues (Abedini, et al. ’13)

Maximizes one-step drift of Lyapunov function of deficits

Dramatically reduces cellular usage even with 4 users

Need users to collectively report 𝜃[k]=(e[k],d[k-1])

Why would users report truthfully? Why would they transmit for others?

User i would like to free-ride, e.g., report high di[k-1] and 0 for e

i[k]

Mean-field modelUsers extremely mobile

J clusters & M users per cluster

Users randomly permuted in each frame

Users can quit at any time with new agent taking its place: regeneration w.p. 1-β

At regeneration deficit chosen independently with distribution ψ

System Problem: minimize discounted sum of user costs

Users are assumed risk neutral

Mean-field calculationsAssume truth-telling + participation in all frames

With J large, consider distributed setting where a[k] is chosen in each cluster

Information flow/overhead is minimized

State of users in cluster chosen by distribution (𝜚 x 𝜁)M

Greedy policy optimal in each cluster

Each non-regenerating user’s state is a Markov process

Users perceive everyone else via the mean-field!

Denote stationary distribution of deficits (with regeneration) as ∏𝜚 x 𝜁

Any self-consistent 𝜚= ∏𝜚 x 𝜁

defines an MFE

Mean-Field Set-upConsider J large & truth-telling in all frames

• Distributed setting: Each cluster chooses a independentlyInformation flow/overhead between clusters is minimized

• States of users in each cluster ⇥̂ chosen with mean fielddistribution (⇢⇥ ⇣)M

• State of user i evolves via a Markov process

di [k] = di [k � 1] + ⌘ � �(a⇣✓i , ⇥̂�i ), ✓i

⌘

⇥̂�i ⇠ (⇢⇥ ⇣)M�1

Users perceive everyone else via the mean-field!

• Denote stationary distribution of user deficits by ⇧⇢⇥⇣

Any ⇢ = ⇧⇢⇥⇣ defines a mean-field equilibrium

MFE exists

Uses regenerative representation of ∏𝜚 x 𝜁

Owing to discrete nature of problem need to use stronger coupling than usual Skorohod coupling

Use Schauder fixed point theorem for existence

Incentives based on unique features of problem

Devise transfers using mechanism design ideas for incentive compatibility & individual rationality

Values interdependent: different from traditional VCG setting

Groves mechanisms still work (RadnerWilliams’88)

Dynamic setting

Use dynamic mechanism design concepts (BergemannValimaki’10)

MFE & Incentives

Details on transfersSystem viewpoint and user viewpoint different

Each user’s viewpoint

Self via Markov process (during life-time) and all others in mean-field

System’s viewpoint

All users in mean-field

Need to align these view-points for correct incentives

Transfers should make each user solve system problem from its viewpoint

System solves: system optimal allocation, optimal allocation from each user’s viewpoint

Using Clarke pivot mechanism transfers positive with nice structure

System solves: system optimal allocation without each user

One part of transfer subsidizes user for participation: equals loss of gains from free-riding

Second part of transfer pays users for simpler nature of MFE: aligns viewpoints

Obtain per frame dominant strategy incentive compatibility & interim individual rationality

Test on Android phonesCustom Android kernel for simultaneous 3G & WiFi use

D2D allocations implemented via 802.11 back-off scheme

4 smartphones used

0 2 4 6 8 10 120

0.01

0.02

0.03

0.04

0.05

Deficit

Norm

aliz

ed F

requency

1 1.5 2 2.5 3

x 104

0

0.02

0.04

0.06

0.08

0.1

Transfer Value

No

rma

lized F

requency

At 800 kbps, 16 minute video costs 1$ for cellular only download

Hybrid scheme results in 40 cents of cellular usage

System needs to pays users 36 cents

User receives enough savings on cellular to use scheme

Enabling demand-response to shift consumer peak energy usage via

lottery schemes

Problem settingLoad Serving Entities (LSE) or Load Aggregator (LA) pays a variable price in the market

Peak prices in the day-ahead market are between 3pm-8pm with 5pm-6pm being maximum

Corresponds to increased demand

AC usage of about 3 kWh between 5pm-6pm in Texan homes

Nudge users to reduce this usage pattern by offering coupons to win weekly lotteries

0 4 8 12 16 20 240

20

40

60

80

100

120

140

160

Time

Ave

rag

e P

rice

s a

nd

th

e S

tan

da

rd D

evi

atio

n Day−ahead Prices Distribution

Play lottery with M-1 other users: couples users under an LSE/LA

More discomfort, more coupons & better the chances to win

User’s surplus increases by w upon win and decreases by l upon loss

Users risk neutral & measure utility by a concave + increasing function of surplus

Model detailsMean-field setting

Users quit & regenerate

Models user churn

Users play lottery in every frame with randomly chosen M-1 other users

Users have finite action choices & MFE is a distribution on actions

Lottery schemes

Viewed as choosing distribution over permutations

Different actions yield different number of coupons

More coupons should result in better chances of winning

Use ranking models such a Plackett-Luce model to run lottery

Permutation given by increasing order of exponentials with parameter # of coupons

Easily generalizes: multiple rounds, complex but monotonic in coupons reward structure, etc.

MFE existence proof ideasAn action distribution results in a surplus distribution

Surplus of each user is a Markov process driven by the lottery outcomes

A surplus distribution results in a poset of action distributions

Actions are determined by solving discounted reward maximization problem

Threshold policy results with map between surplus & actions set-valued

Can focus on extreme pair of actions at any surplus value

Different choices of extreme actions results in poset of action distributions

Upper semicontinuous structure holds & we can take convex hull

Use Kakutani fixed point theorem for existence

Logic follows Nash’s original proof of existence of (mixed) equilibria

Case study: Texan homesHome air conditioning is a major part of electricity usage in Texas.

Typical 2500 sq. ft home consumes of the order of 30 kWh per day, and about 12 kWh in the peak period.

Can we incentivize users to move some energy consumption from the peak period to off-peak?

0 4 8 12 16 20 2422

24

26

28

30

32

34

36

38Ambient Temperature

Time

Te

mp

era

ture

(C

)

0 4 8 12 16 20 2420

22

24

26

28

30ON−OFF Pattern for AC

Time

Tem

pera

ture

(C

)

the case of homogeneous homes that all have identical pa-rameters, it is straightforward to extend our results to thecase where there are a finite set of classes of homes, and theparticipating homes are drawn randomly from these classes.

7.1 Home ModelA standard continuous time model [14,15] for describing theevolution of the internal temperature ⌧(t) at time t of an airconditioned home is

⌧̇(t) =

8><

>:

� 1RC

(⌧(t)� ⌧a

)� ⌘

CPm

, if q(t) = 1,

� 1RC

(⌧(t)� ⌧a

), if q(t) = 0.(18)

Here, ⌧a

is the ambient temperature (of the external envi-ronment), R is the thermal resistance of the home, C is thethermal capacitance of the home, ⌘ is the e�ciency, and P

m

is the rated electrical power of the AC unit. The state ofthe AC is described by the binary signal q(t), where q(t) = 1means AC is in the ON state at time t and in the OFF stateif q(t) = 0. The state is determined by the crossings of userspecified temperature thresholds as follows:

lim✏!0

q(t+ ✏) =

(q(t), |⌧(t)� ⌧

r

| < �,

1� q(t), |⌧(t)� ⌧r

| < �.(19)

A number of studies investigate the thermal properties oftypical homes. We use the parameters shown in Table 1 forour simulations. These are based on the derivations pre-sented in [14] for conditioning a 250 m2 home (about 2700square feet), which is a common mid-size home in manyTexas neighborhoods.

Table 1: Parameters for a Residential AC Unit

Parameter ValueC, Thermal Capacitance 10 kWh/�CR, Thermal Resistance 2 �C/kW

Pm

, Rated Electrical Power 6.8 kW⌘, Coe�cient of Performance 2.5⌧r

, Temperature Setpoint 22.5 �C�, Temperature Deadband 0.3 �C

In order to determine the energy usage for AC in our typi-cal home, we need to have an estimate of how the ambienttemperature varies over a day in Texas during the summermonths. These values are available in the Pecan Street dataset, and we plot the average values over three months forAustin, TX in Figure 3.

Next, we calculate the ON-OFF pattern of our typical airconditioner based on the ambient temperature variation overthe course of the day. We do this by simulating the con-troller in (19) with the appropriate ambient temperaturevalues taken from Figure 3. The pattern is presented in Fig-ure 4. We see that there is higher energy usage during thehotter times of the day, as is to be expected. This also cor-responds to the peak in wholesale electricity prices shownfor the same period in Figure 1. The total energy used eachday corresponding to our 5 ton AC (= 6.8 kW; see Table 1)is 32.83 kWh.

0 4 8 12 16 20 2422

24

26

28

30

32

34

36

38Ambient Temperature

Time

Te

mp

era

ture

(C

)

Figure 3: Average ambient temperature from June–August, 2013 in Austin, TX. Measurements aretaken every 15 minutes from 12 AM to 12 PM.

For comparison, we use the Pecan Street data set to providethe measured daily average energy usage for AC during thesame period for 4 homes that have parameters in the sameballpark as our typical home. These values are shown inTable 2. The table shows the close match of our home modelwith real AC usage patterns.

Table 2: Daily AC Usage for Four Homes

ID Square feet Age AC (tons) Energy (kWh)93 2934 20 5 28.2545 2345 6 3.5 41.54767 2710 5 4 31.73967 2521 5 3.5 37.8

7.2 Actions and CostsThe customer action space in our problem consists of choos-ing when to turn ON and OFF the AC, and is uncountablyinfinitely large. We need to pick a reasonable discrete subsetof the action space for our study. We find from the homemodel that 11.27 kWh of energy is consumed during the in-terval of 3� 8 PM. The amount of energy used in each hourduring this period in kWh and the corresponding ON timein minutes for the AC are shown in the first four columns ofTable 3.

Table 3: Peak-Price Period Consumption

Index Period Energy ON Time No. of units1 3� 4 PM 0.45 4 0.82 4� 5 PM 3.4 30 63 5� 6 PM 3.85 34 6.84 6� 7 PM 0 0 05 7� 8 PM 3.57 31.5 6.3

From Figure 1 it is clear that the consumption during max-imum price period 3 (5 � 6 PM) has the maximum impact

Action choicesChange 5pm-6pm energy usage by adjusting set-points

Compute hazard for LSE as a combination of mean & standard deviation of day-ahead price, and assign coupons to periods

Pick 5 alternate actions: specific set-points

Compute cost to user as combination of change in mean temperature & change in standard deviation of temperature

Coupons for action vector based on total usage in all periods based on action, relative to base-line 0

Coupon determination steers/nudges possible MFEs

SETPOINT

1. Actions, Costs and Coupons

Now we change the control method by modifying the setting points during the corre-

sponding periods, 3�8 PM. We consider changing the setpoint of each period and we take

the setpoint 22.5C as the baseline.

Coupons are given in proportion to the usage of energy during the corrsponding period,

which is calculated in a piecewise linear way based on the baseline setpoint. For example,

in 3�4 PM, if the energy usage is less then the baseline usage, then 1 coupon will be given

for every kWh energy, and 15 coupons/kWh will be given for the part that are more than

the baseline usage. Similarly, we keep giving 1 coupon/kWh for the energy usage that are

less than the baseline usage at that period, and 15, 1, 0, 2, 20 coupons/kWh for the parts

that are greater than the baseline usage at the corresponding periods.

67 days’ data trace is availabe, including the embinent temperature and day ahead price

for each day. We run the AC with each day’s ambient temperature, and then calculate

the cost and coupons when you take a particular action on each day. With that day’s day

ahead price, we can further calculate the benefit (in $,) that LSE can get when agents take

that particular action on that day.

Each action can be identified as a vector (x1, x2, x3, x4, x5), where xi indicattes the

setpoint in period i.

Table 1. Actions, Costs and Energy Coupons

Index Action Vector Cost Coupons

0 (22.5, 22.5, 22.5, 22.5, 22.5) 0 108.9

1 (22, 22.25, 24, 22, 21.75) 1.774 834.6

2 (21.75, 22.25, 24, 22.25, 22) 1.430 815.4

3 (21.75, 22, 24, 22.25, 22) 1.185 772.7

4 (21.75, 22, 24, 22, 22) 0.838 637.1

5 (22, 22, 24, 22.25, 22.25) 0.697 509.1

2. Mean Field Equilibrium

Figure 4 illustrates the mean field action distribution. For example, the best action from

the LSE’s perspective is action 1, which is chosen with probability 0.57. We use mean field

action distribution to find that the savings for LSE over 50 homes is $70 each week. Thus,

incentivizing customers by o↵ering a prize of $40 each week is certainly feasible.

1

Lottery details & resultsUsers win $40: perceive gain of $39 and loss of $1 per lottery

Users stay in lottery for 50 weeks

Almost linear utility

Net reduction in hazard for LSE for 50 homes is $70 per week

SETPOINT 3

−200−160−120 −80 −40 0 40 80 120 160 2000

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Mean Field Distribution of Surplus

Surplus

p.d

.f

Figure 3. Mean Field Distribution of Surplus

3 6 9 12 150

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Action Distribution

Iteration Round

Action D

istr

ibution

action 0

action 1

action 2

action 3

action 4

action 5

Figure 4. Action Distribution

SETPOINT 3

−200−160−120 −80 −40 0 40 80 120 160 2000

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Mean Field Distribution of Surplus

Surplus

p.d

.f

Figure 3. Mean Field Distribution of Surplus

3 6 9 12 150

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Action Distribution

Iteration Round

Actio

n D

istr

ibu

tio

n

action 0

action 1

action 2

action 3

action 4

action 5

Figure 4. Action Distribution

4 SETPOINT

1 3 5 7 9 11 13 15 17 19 21 230

1

2

3

4

5

6

Time

kWh

Energy Distribution

Original EnergyMFE Energy

Figure 5. Energy Distribution

Conclusion & Future DirectionsSummary

Suggested Mean-Field Games as a potential tool for analysis & design of large scale societal networks

Explored incentives: either subsidies for cooperation or lottery rewards for tough actions

Can be easily generalized from homogeneous set-up to a finite number of classes of heterogeneous agents with complex interactions

Future directions

Consider systems with reputation effects & learning : crowdsourcing systems

Consider systems that allow for reselling of resources: dynamic spectrum markets, peer-to-peer systems or cloud computing systems

For a simpler setting understand how MFE could be steered/optimized

Acknowledgements

Collaborators

Funding from NSF

Thank you

mean field games in societal networks - · pdf filemean field games in societal networks ......

Documents