mean field games in societal networks - · pdf filemean field games in societal networks ......
TRANSCRIPT
Mean Field Games in Societal Networks
Vijay G. Subramanian EECS Department, University of Michigan
Joint work with J. Li, B. Xia, S. Paul, R. Bhattacharya, X. Geng, H. Ming, L. Xie and Srinivas Shakkottai, Texas A&M University
May 7th 2015 Mathematical and Algorithmic Sciences Lab
France Research Center, Huawei Technologies
Large Complex Networks In Modern Society
Telecommunications & Social Networks
Economic & social driver of change
~$1.5 trillion in 2010 (2.4% of world GDP)
Urban transportation systems
Congestion losses immense
~$67.5 billion productivity loss (0.7% of US GDP)
Driver for intelligent transportation systems
Cyber physical systems
Smart grids, demand response & EV charging/storage
DoE study: savings of $46-117 billion over 20 years
Societal networks as complex networks“Properties” of complex networks
Many agents, controllers, measurement devices, ...
Complex connectivity & interaction structure
Different agents possess different information
Distributed, decentralized or competitive sub-systems
Conflicts between agents’ preferred solutions & system’s preferred solutions
Extremely challenging to develop or analyze optimal solutions
Societal networks are complex networks
Interconnected networks that are important to the functioning of society.
Have a shared resource component, and participants have to periodically take decisions on when and how much to utilize such resources.
Desire to incentivize good behavior that would benefit society as a whole.
Examples of real-world experiments on incentivizing good behavior in societal networks
MeruguPrabhakarRama’09, An incentive mechanism for decongesting the roads: A pilot program in Bangalore
Prabhakar’13, Designing large-scale nudge engines.
Mean-field games paradigm
Mean-field game setting helps in many instances
Models typical agent behavior with many other agents
Naturally applies to distributed settings
Yields simpler policies & simpler calculations
JovanovicRosenthal’88, GrahamMeleard’94, Lasry-Lions’07, HuangCainesMalhame’10
Mean-field analysisRich history in statistical physics
Analysis of many body problems, Ising model, replica method, cavity method, etc.
Asymptotic analysis of Markov processes
Kurtz’s theorem for population density dependent Markov processes
Chemical Master Equation
Lots of recent work on analysis of multiple-access channel (802.11)
Rich history in economics
Population games, Non-atomic players
Competitive equilibrium analysis, Analysis of wage distribution in labor markets
Mean-field gamesLarge population of agents interact
A. Number of agents each agent interacts with increases without bound
B. Each agent interacts with a random pool of agents during finite life-time with extremely small chance of interacting with any particular agent twice
Agents use simpler strategies
Best-respond to population distribution of actions
Mean-field equilibrium (MFE): a self-consistent distribution of actions
Avoids complex belief structure & strategies of equilibrium analysis
For finite population setting, these strategies typically yields ϵ-Nash equilibrium with ϵ decreasing to 0 with population size
Literature OverviewBasic Literature
Model A - JovanovicRosenthal’88, Lasry-Lions’07, HuangCainesMalhame’10
Model A - existence and uniqueness under general conditions AdhlakaJohariWeintraub’12, BodohCreed’13
Model B - GrahamMeleard’94 developed approach using Sniztman’91 Propagation of Chaos paper
Applications of Model B
Dynamic auctions with learning (second price auctions) - IyerJohariSundararajan’12
Scheduling in cellular systems - ManjrekarRamaswamyShakkottai’14
Specific strategic behavior is analyzed: no optimality considerations, no incentives
Multi-armed bandit games - GummadiJohariYu’12
Some optimality considerations: nudging of equilibrium not considered
Outline of rest of talkDiscuss two problems with optimality considerations, incentives and nudging of equilibrium behavior
Real-time wireless streaming of video with device-to-device cooperation
Cooperation leads to great benefits but needs to be incentivized as users can free-ride
User participation and truth-telling important considerations
J. Li, R. Bhattacharya, S. Paul, S. Shakkottai, and V. S., “Incentivizing sharing in realtime D2D streaming networks: A mean field game perspective,” IEEE INFOCOM, Hong Kong, 2015
Enabling demand-response to shift consumer peak energy usage via lottery schemes
Load aggregator constraints indicated to users by coupling them using lottery schemes
Consumer actions can be observed so gains from demand-response passed on to the consumers via lottery schemes
J. Li, B. Xia, X. Geng, H. Ming, L. Xie, S. Shakkottai, and V. S., “Energy coupon: A mean field game perspective on demand response in smart grids,” accepted as poster to ACM SIGMETRICS, 2015.
Real-time wireless streaming of video with device-to-device cooperation
System OverviewUsers want to view common real-time video stream TV broadcast of sports or political event
Internet
Timing and Quality of ServiceTight timing constraints
Create block k at server
N pieces per block
Transmit to users over cellular link at time k-2
# of received pieces ei[k]
i.i.d. with distribution 𝜁
Users exchange pieces of block at time k-1
Get T (<N) broadcast transmission attempts
a[k] vector of # of transmissions
Users play out at time k if all pieces available
Outcome 𝜒i(a[k],e
i[k])
B2D block (k) B2D block (k+1) B2D block (k+2)
frame kframe k-1frame k-2
create block (k)
D2D block (k)D2D block (k-1) D2D block (k+1)
create block (k+1) create block (k+2)
play block (k-2) play block (k-1) play block (k)
Track deficit of unplayed packets
η - Target delivery ratio
View as arrival rate
Successful play out as service
Stable queue implies delivery ratio met
Scheme of HouBorkarKumar’09
Cost of deficit
Convex, monotone increasing function
Timing and Quality of Service
Computation of 𝜒i(a[k],e
i[k])
Using network coding can make all broadcast transmissions useful to recipients
# of useful transmissions for user i is gi(a[k])
Depends on other user transmissions
Iff ei[k]+ g
i(a[k]) = N, play out possible
Can devise a greedy policy to stabilize deficit queues (Abedini, et al. ’13)
Maximizes one-step drift of Lyapunov function of deficits
Dramatically reduces cellular usage even with 4 users
Need users to collectively report 𝜃[k]=(e[k],d[k-1])
Why would users report truthfully? Why would they transmit for others?
User i would like to free-ride, e.g., report high di[k-1] and 0 for e
i[k]
Mean-field modelUsers extremely mobile
J clusters & M users per cluster
Users randomly permuted in each frame
Users can quit at any time with new agent taking its place: regeneration w.p. 1-β
At regeneration deficit chosen independently with distribution ψ
System Problem: minimize discounted sum of user costs
Users are assumed risk neutral
Mean-field calculationsAssume truth-telling + participation in all frames
With J large, consider distributed setting where a[k] is chosen in each cluster
Information flow/overhead is minimized
State of users in cluster chosen by distribution (𝜚 x 𝜁)M
Greedy policy optimal in each cluster
Each non-regenerating user’s state is a Markov process
Users perceive everyone else via the mean-field!
Denote stationary distribution of deficits (with regeneration) as ∏𝜚 x 𝜁
Any self-consistent 𝜚= ∏𝜚 x 𝜁
defines an MFE
Mean-Field Set-upConsider J large & truth-telling in all frames
• Distributed setting: Each cluster chooses a independentlyInformation flow/overhead between clusters is minimized
• States of users in each cluster ⇥̂ chosen with mean fielddistribution (⇢⇥ ⇣)M
• State of user i evolves via a Markov process
di [k] = di [k � 1] + ⌘ � �(a⇣✓i , ⇥̂�i ), ✓i
⌘
⇥̂�i ⇠ (⇢⇥ ⇣)M�1
Users perceive everyone else via the mean-field!
• Denote stationary distribution of user deficits by ⇧⇢⇥⇣
Any ⇢ = ⇧⇢⇥⇣ defines a mean-field equilibrium
MFE exists
Uses regenerative representation of ∏𝜚 x 𝜁
Owing to discrete nature of problem need to use stronger coupling than usual Skorohod coupling
Use Schauder fixed point theorem for existence
Incentives based on unique features of problem
Devise transfers using mechanism design ideas for incentive compatibility & individual rationality
Values interdependent: different from traditional VCG setting
Groves mechanisms still work (RadnerWilliams’88)
Dynamic setting
Use dynamic mechanism design concepts (BergemannValimaki’10)
MFE & Incentives
Details on transfersSystem viewpoint and user viewpoint different
Each user’s viewpoint
Self via Markov process (during life-time) and all others in mean-field
System’s viewpoint
All users in mean-field
Need to align these view-points for correct incentives
Transfers should make each user solve system problem from its viewpoint
System solves: system optimal allocation, optimal allocation from each user’s viewpoint
Using Clarke pivot mechanism transfers positive with nice structure
System solves: system optimal allocation without each user
One part of transfer subsidizes user for participation: equals loss of gains from free-riding
Second part of transfer pays users for simpler nature of MFE: aligns viewpoints
Obtain per frame dominant strategy incentive compatibility & interim individual rationality
Test on Android phonesCustom Android kernel for simultaneous 3G & WiFi use
D2D allocations implemented via 802.11 back-off scheme
4 smartphones used
0 2 4 6 8 10 120
0.01
0.02
0.03
0.04
0.05
Deficit
Norm
aliz
ed F
requency
1 1.5 2 2.5 3
x 104
0
0.02
0.04
0.06
0.08
0.1
Transfer Value
No
rma
lized F
requency
At 800 kbps, 16 minute video costs 1$ for cellular only download
Hybrid scheme results in 40 cents of cellular usage
System needs to pays users 36 cents
User receives enough savings on cellular to use scheme
Enabling demand-response to shift consumer peak energy usage via
lottery schemes
Problem settingLoad Serving Entities (LSE) or Load Aggregator (LA) pays a variable price in the market
Peak prices in the day-ahead market are between 3pm-8pm with 5pm-6pm being maximum
Corresponds to increased demand
AC usage of about 3 kWh between 5pm-6pm in Texan homes
Nudge users to reduce this usage pattern by offering coupons to win weekly lotteries
0 4 8 12 16 20 240
20
40
60
80
100
120
140
160
Time
Ave
rag
e P
rice
s a
nd
th
e S
tan
da
rd D
evi
atio
n Day−ahead Prices Distribution
Play lottery with M-1 other users: couples users under an LSE/LA
More discomfort, more coupons & better the chances to win
User’s surplus increases by w upon win and decreases by l upon loss
Users risk neutral & measure utility by a concave + increasing function of surplus
Model detailsMean-field setting
Users quit & regenerate
Models user churn
Users play lottery in every frame with randomly chosen M-1 other users
Users have finite action choices & MFE is a distribution on actions
Lottery schemes
Viewed as choosing distribution over permutations
Different actions yield different number of coupons
More coupons should result in better chances of winning
Use ranking models such a Plackett-Luce model to run lottery
Permutation given by increasing order of exponentials with parameter # of coupons
Easily generalizes: multiple rounds, complex but monotonic in coupons reward structure, etc.
MFE existence proof ideasAn action distribution results in a surplus distribution
Surplus of each user is a Markov process driven by the lottery outcomes
A surplus distribution results in a poset of action distributions
Actions are determined by solving discounted reward maximization problem
Threshold policy results with map between surplus & actions set-valued
Can focus on extreme pair of actions at any surplus value
Different choices of extreme actions results in poset of action distributions
Upper semicontinuous structure holds & we can take convex hull
Use Kakutani fixed point theorem for existence
Logic follows Nash’s original proof of existence of (mixed) equilibria
Case study: Texan homesHome air conditioning is a major part of electricity usage in Texas.
Typical 2500 sq. ft home consumes of the order of 30 kWh per day, and about 12 kWh in the peak period.
Can we incentivize users to move some energy consumption from the peak period to off-peak?
0 4 8 12 16 20 2422
24
26
28
30
32
34
36
38Ambient Temperature
Time
Te
mp
era
ture
(C
)
0 4 8 12 16 20 2420
22
24
26
28
30ON−OFF Pattern for AC
Time
Tem
pera
ture
(C
)
the case of homogeneous homes that all have identical pa-rameters, it is straightforward to extend our results to thecase where there are a finite set of classes of homes, and theparticipating homes are drawn randomly from these classes.
7.1 Home ModelA standard continuous time model [14,15] for describing theevolution of the internal temperature ⌧(t) at time t of an airconditioned home is
⌧̇(t) =
8><
>:
� 1RC
(⌧(t)� ⌧a
)� ⌘
CPm
, if q(t) = 1,
� 1RC
(⌧(t)� ⌧a
), if q(t) = 0.(18)
Here, ⌧a
is the ambient temperature (of the external envi-ronment), R is the thermal resistance of the home, C is thethermal capacitance of the home, ⌘ is the e�ciency, and P
m
is the rated electrical power of the AC unit. The state ofthe AC is described by the binary signal q(t), where q(t) = 1means AC is in the ON state at time t and in the OFF stateif q(t) = 0. The state is determined by the crossings of userspecified temperature thresholds as follows:
lim✏!0
q(t+ ✏) =
(q(t), |⌧(t)� ⌧
r
| < �,
1� q(t), |⌧(t)� ⌧r
| < �.(19)
A number of studies investigate the thermal properties oftypical homes. We use the parameters shown in Table 1 forour simulations. These are based on the derivations pre-sented in [14] for conditioning a 250 m2 home (about 2700square feet), which is a common mid-size home in manyTexas neighborhoods.
Table 1: Parameters for a Residential AC Unit
Parameter ValueC, Thermal Capacitance 10 kWh/�CR, Thermal Resistance 2 �C/kW
Pm
, Rated Electrical Power 6.8 kW⌘, Coe�cient of Performance 2.5⌧r
, Temperature Setpoint 22.5 �C�, Temperature Deadband 0.3 �C
In order to determine the energy usage for AC in our typi-cal home, we need to have an estimate of how the ambienttemperature varies over a day in Texas during the summermonths. These values are available in the Pecan Street dataset, and we plot the average values over three months forAustin, TX in Figure 3.
Next, we calculate the ON-OFF pattern of our typical airconditioner based on the ambient temperature variation overthe course of the day. We do this by simulating the con-troller in (19) with the appropriate ambient temperaturevalues taken from Figure 3. The pattern is presented in Fig-ure 4. We see that there is higher energy usage during thehotter times of the day, as is to be expected. This also cor-responds to the peak in wholesale electricity prices shownfor the same period in Figure 1. The total energy used eachday corresponding to our 5 ton AC (= 6.8 kW; see Table 1)is 32.83 kWh.
0 4 8 12 16 20 2422
24
26
28
30
32
34
36
38Ambient Temperature
Time
Te
mp
era
ture
(C
)
Figure 3: Average ambient temperature from June–August, 2013 in Austin, TX. Measurements aretaken every 15 minutes from 12 AM to 12 PM.
For comparison, we use the Pecan Street data set to providethe measured daily average energy usage for AC during thesame period for 4 homes that have parameters in the sameballpark as our typical home. These values are shown inTable 2. The table shows the close match of our home modelwith real AC usage patterns.
Table 2: Daily AC Usage for Four Homes
ID Square feet Age AC (tons) Energy (kWh)93 2934 20 5 28.2545 2345 6 3.5 41.54767 2710 5 4 31.73967 2521 5 3.5 37.8
7.2 Actions and CostsThe customer action space in our problem consists of choos-ing when to turn ON and OFF the AC, and is uncountablyinfinitely large. We need to pick a reasonable discrete subsetof the action space for our study. We find from the homemodel that 11.27 kWh of energy is consumed during the in-terval of 3� 8 PM. The amount of energy used in each hourduring this period in kWh and the corresponding ON timein minutes for the AC are shown in the first four columns ofTable 3.
Table 3: Peak-Price Period Consumption
Index Period Energy ON Time No. of units1 3� 4 PM 0.45 4 0.82 4� 5 PM 3.4 30 63 5� 6 PM 3.85 34 6.84 6� 7 PM 0 0 05 7� 8 PM 3.57 31.5 6.3
From Figure 1 it is clear that the consumption during max-imum price period 3 (5 � 6 PM) has the maximum impact
Action choicesChange 5pm-6pm energy usage by adjusting set-points
Compute hazard for LSE as a combination of mean & standard deviation of day-ahead price, and assign coupons to periods
Pick 5 alternate actions: specific set-points
Compute cost to user as combination of change in mean temperature & change in standard deviation of temperature
Coupons for action vector based on total usage in all periods based on action, relative to base-line 0
Coupon determination steers/nudges possible MFEs
SETPOINT
1. Actions, Costs and Coupons
Now we change the control method by modifying the setting points during the corre-
sponding periods, 3�8 PM. We consider changing the setpoint of each period and we take
the setpoint 22.5C as the baseline.
Coupons are given in proportion to the usage of energy during the corrsponding period,
which is calculated in a piecewise linear way based on the baseline setpoint. For example,
in 3�4 PM, if the energy usage is less then the baseline usage, then 1 coupon will be given
for every kWh energy, and 15 coupons/kWh will be given for the part that are more than
the baseline usage. Similarly, we keep giving 1 coupon/kWh for the energy usage that are
less than the baseline usage at that period, and 15, 1, 0, 2, 20 coupons/kWh for the parts
that are greater than the baseline usage at the corresponding periods.
67 days’ data trace is availabe, including the embinent temperature and day ahead price
for each day. We run the AC with each day’s ambient temperature, and then calculate
the cost and coupons when you take a particular action on each day. With that day’s day
ahead price, we can further calculate the benefit (in $,) that LSE can get when agents take
that particular action on that day.
Each action can be identified as a vector (x1, x2, x3, x4, x5), where xi indicattes the
setpoint in period i.
Table 1. Actions, Costs and Energy Coupons
Index Action Vector Cost Coupons
0 (22.5, 22.5, 22.5, 22.5, 22.5) 0 108.9
1 (22, 22.25, 24, 22, 21.75) 1.774 834.6
2 (21.75, 22.25, 24, 22.25, 22) 1.430 815.4
3 (21.75, 22, 24, 22.25, 22) 1.185 772.7
4 (21.75, 22, 24, 22, 22) 0.838 637.1
5 (22, 22, 24, 22.25, 22.25) 0.697 509.1
2. Mean Field Equilibrium
Figure 4 illustrates the mean field action distribution. For example, the best action from
the LSE’s perspective is action 1, which is chosen with probability 0.57. We use mean field
action distribution to find that the savings for LSE over 50 homes is $70 each week. Thus,
incentivizing customers by o↵ering a prize of $40 each week is certainly feasible.
1
Lottery details & resultsUsers win $40: perceive gain of $39 and loss of $1 per lottery
Users stay in lottery for 50 weeks
Almost linear utility
Net reduction in hazard for LSE for 50 homes is $70 per week
SETPOINT 3
−200−160−120 −80 −40 0 40 80 120 160 2000
0.01
0.02
0.03
0.04
0.05
0.06
0.07
Mean Field Distribution of Surplus
Surplus
p.d
.f
Figure 3. Mean Field Distribution of Surplus
3 6 9 12 150
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Action Distribution
Iteration Round
Action D
istr
ibution
action 0
action 1
action 2
action 3
action 4
action 5
Figure 4. Action Distribution
SETPOINT 3
−200−160−120 −80 −40 0 40 80 120 160 2000
0.01
0.02
0.03
0.04
0.05
0.06
0.07
Mean Field Distribution of Surplus
Surplus
p.d
.f
Figure 3. Mean Field Distribution of Surplus
3 6 9 12 150
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Action Distribution
Iteration Round
Actio
n D
istr
ibu
tio
n
action 0
action 1
action 2
action 3
action 4
action 5
Figure 4. Action Distribution
4 SETPOINT
1 3 5 7 9 11 13 15 17 19 21 230
1
2
3
4
5
6
Time
kWh
Energy Distribution
Original EnergyMFE Energy
Figure 5. Energy Distribution
Conclusion & Future DirectionsSummary
Suggested Mean-Field Games as a potential tool for analysis & design of large scale societal networks
Explored incentives: either subsidies for cooperation or lottery rewards for tough actions
Can be easily generalized from homogeneous set-up to a finite number of classes of heterogeneous agents with complex interactions
Future directions
Consider systems with reputation effects & learning : crowdsourcing systems
Consider systems that allow for reselling of resources: dynamic spectrum markets, peer-to-peer systems or cloud computing systems
For a simpler setting understand how MFE could be steered/optimized
Acknowledgements
Collaborators
Funding from NSF
Thank you