
Distributed Learning in Network Games: a Dual Averaging Approach

Shahriar Talebi†, Siavash Alemzadeh†, Lillian J. Ratliff‡, and Mehran Mesbahi†

Abstract— In this paper, we propose a distributed no-regret learning algorithm for network games using a primal-dual method, namely dual averaging. With only locally available observations, we consider the scenario where each player optimizes a global objective, formed by local objective functions on the nodes of a given communication graph. Our learning algorithm for each player involves taking steps along their individual payoff gradients, dictated by local observations of the other player's actions, where the local nature of information exchange is encoded by the network. The output is then projected back, again locally, to the set of admissible actions for each player. We show the convergence of this distributed learning algorithm for the case of a deterministic network that is subjected to two teams with distinct objectives. Our analysis indicates the key correlation between the rate of convergence and the network structure/connectivity that also appears in distributed optimization setups via dual averaging. Finally, illustrative examples showcase the performance of our algorithm in relation to the size and connectivity of the communication network.

Keywords: Network games, distributed learning, dual averaging, Nash equilibrium

I. INTRODUCTION

Networked systems analysis has been on the cutting edge of multi-disciplinary research over the past few years, with applications spanning from robotic swarms to biological networks. Common to all of these systems, a global objective is achieved through local interactions, which in turn require local decision-making that is inherently restricted by limited information exchange and a prescribed set of admissible policies [1]. While provably effective, many established computationally efficient optimization algorithms [2]–[5] suffer from these limitations, as they become mostly intractable when the network grows large in scale. Not surprisingly, there has been an extensive literature on exploiting the structure of a communication graph to reduce the complexity of methods such as subgradient optimization [6], mirror descent and dual averaging [7], [8], and sequential decision-making [9], to name a few. However, while these works mainly focus on cooperative information sharing, many real-world systems exhibit non-cooperative behaviors for a plethora of reasons, including intrusions/attacks, greedy agents, competition for constrained resources, misaligned

† William E. Boeing Department of Aeronautics and Astronautics, University of Washington, Seattle, WA, USA {shahriar,alems,mesbahi}@uw.edu.
‡ Department of Electrical Engineering, University of Washington, Seattle, WA, USA {ratliffl}@ece.uw.edu.
The research of the authors has been supported by NSF grant SES-1541025 and the Air Force Office of Scientific Research under grant number FA9550-16-1-0022.

incentives, or the adversarial nature of environments that necessitate a game-theoretic model.

Game theory has been successfully employed in non-cooperative decision-making [10]–[12]. A variety of dynamic games, including zero-sum, non-zero-sum [13], and Stackelberg [14] games, have been studied in the literature for decades to model non-cooperative interactions among decision makers, e.g., in network security [15], wireless networks [16], [17], and multi-agent systems such as pursuit-evasion games [18], [19].

Conventional centralized game-theoretic machinery is rarely leveraged in real-world decision-making due to poor computational performance, lack of information, and policy limitations. In this direction, well-established algorithms such as primal-dual methods have been used for convergence analysis of learning and optimization on games. For instance, by introducing a variational stability condition, [20] considers a centralized multi-agent decision process where each agent employs an online mirror descent learning algorithm. Also, [21] studies the behavior of learning with "bandit" feedback (a framework for extremely low-information environments) in non-cooperative concave games. However, in spite of the improved convergence rates, scalability and limited communication remain inherent sources of intractability in these learning setups.

As the key connection, the Nash Equilibrium (NE) plays a central role in game theory; roughly, it is defined as a strategy profile from which no player has any incentive to deviate unilaterally. Under convexity and continuous differentiability, the NE can be characterized as the solution of a variational inequality (VI) problem. Distributed Nash seeking refers to the class of problems that aims at learning a global NE with only local information [22]–[25]. In this direction, each node in the network usually represents an individual player; e.g., [26] proposes an accelerated gradient descent for monotone games and [27] considers a zero-sum game between two networks, characterizing conditions for convergence to the NE. Additionally, the existence of a unique Nash equilibrium can be guaranteed under assumptions on the admissible action sets (compactness) and the game Jacobian (monotonicity and its variants) [28]–[30].

In this work, we introduce a non-cooperative game between two teams, each consisting of players that interact over a network. While the goal of each team is to learn the global NE, players have no information about the other side, nor do they enjoy a global decision-making capability. Thus the equilibrium learning is based only on local observations of the opponent's actions and distributed decision-making of nodes within their team.

This setup resembles a scenario where each node in the network is subjected to actions that contribute to distinct objectives; in this sense, there is duality in the interactions between nodes in the network. For example, each node can represent a socio-economic entity with objectives that are not completely aligned with each other (altruistic vs. profit-seeking). Financial networks consisting of nodes that are subjected to political influence from two opposing parties provide yet another scenario of interest. The network in these scenarios provides the backbone for information exchange and coordination amongst the teams. As such, it becomes imperative to characterize the role of the network in the evolution of the teams' strategies, potentially towards the NE. Of prime interest in this work is how algebraic and combinatorial properties of the network, such as its connectivity, contribute to the convergence of distributed learning in the context of games. The choice of a primal/dual approach built around dual averaging [3], [4], in conjunction with a certain monotonicity assumption, is primarily motivated by this overarching objective; this choice is also consistent with how dual averaging has been used in the context of distributed optimization to underscore the "network effect" on the convergence and optimality properties of distributed optimization, where there is no "duality" in the nodes' operation. However, in order to extend the work of [8] to the game setting, one has to pay particular attention to the information structure; for example, it would be unreasonable to assume information sharing with the opponents in the game setting.

Accordingly, our contribution is twofold: (1) We suppose players have neither a priori information about their adversary nor global coordination over the network. Instead, each has the chance to choose a local strategy at each node and receive a local cost (reward), resulting in learning the global NE in a distributed fashion. (2) We propose intuitive assumptions which lead us to a convergence proof of the objective values to those of the NE. In addition, we discuss the convergence of the running average of actions to the NE at each node under stronger regularity assumptions. It is shown that the algorithm enjoys a regret bound similar to that of dual averaging methods, which is well known to be tight in the black-box setting [2].

The rest of the paper is organized as follows: In §II we provide a quick overview of the mathematical tools used in the paper. In §III we introduce the problem setup and propose our method. In §IV we build the required foundations to ultimately prove convergence of our algorithm to the NE. We provide an illustrative example in §V, and concluding remarks and future directions are addressed in §VI.

II. MATHEMATICAL PRELIMINARIES

We denote by R the set of real numbers. A column vector with n elements is referred to as v ∈ R^n, where v_i represents the ith element of v. The matrix M ∈ R^{p×q} contains p rows and q columns, with M_{ij} denoting the element in the ith row and jth column of M. The square matrix N ∈ R^{n×n} is symmetric if N^⊤ = N, where N^⊤ denotes the transpose of the matrix N. The n × 1 vector of all ones is denoted by 1. A doubly stochastic matrix P ∈ R^{n×n} is a non-negative square matrix such that ∑_j P_{ij} = ∑_j P_{jk} = 1 for all i and k. We define [n] = {1, . . . , n}. The Euclidean norm of a vector x ∈ R^n is defined as ‖x‖_2 = (x^⊤ x)^{1/2} = (∑_{i=1}^n x_i^2)^{1/2}, and the dual norm to ‖x‖_2 is denoted by ‖x‖_* := sup_{‖u‖=1} ⟨x, u⟩. Also, the 1-norm is defined as ‖x‖_1 = ∑_{i=1}^n |x_i|. A function f is convex if f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y) for all θ ∈ (0,1) and for all x, y in its convex domain, and g is a sub-gradient of f at a point z if f(y) ≥ f(z) + g^⊤(y−z) for all y. If f is convex and differentiable, the gradient of f, ∇f(x), is also a sub-gradient of f at x. A graph is characterized by the 2-tuple G = (V, E), where V is the set of nodes and E ⊆ V × V denotes the set of edges. An edge exists from node i to j if (i, j) ∈ E, which can also be denoted by j ∈ N_i, where N_i is the set of neighbors of node i. We say a graph is complete if (i, j) ∈ E for all nodes i, j; it is random k-regular if |N_i| = k for all i with the edges drawn in some random order; and it is called a cycle if j ∈ N_i if and only if j = i ± 1 with a cyclic order, i.e., (1, n) ∈ E in case there are n nodes in the graph.

III. PROBLEM SETUP

In this section we introduce the main framework of our analysis. Herein, we briefly mention some background material on dual averaging, which is the workhorse of our methodology. We then continue by proposing the distributed setup, followed by a two-player game-theoretic framework which can be generalized to the multi-player setting with no extra effort.

A. Standard Dual Averaging

The dual averaging algorithm proposed by Nesterov [3] is a subgradient scheme for non-smooth convex problems. The primal-dual nature of this method generates two sequences of iterates {x(t), z(t)}_{t=0}^∞ contained within X × R^d, such that the updates of z(t) are responsible for averaging the support functions in the dual space, while the updates of x(t) establish a dynamically updated scale between the primal and dual spaces. More precisely, after receiving the subgradient g(t) ∈ ∂f(x(t)) at iteration t, the algorithm is updated as follows,

\[
z(t+1) = z(t) + \gamma(t)\, g(t), \qquad x(t+1) = \Pi^{\psi}_{X}\big({-z(t+1)}, \alpha(t)\big), \tag{1}
\]

where γ(t) > 0, {α(t)}_{t=0}^∞ is a positive non-increasing sequence, and

\[
\Pi^{\psi}_{X}(z, \alpha) := \operatorname*{argmin}_{x \in X} \Big\{ \langle -z, x \rangle + \frac{1}{\alpha}\, \psi(x) \Big\} \tag{2}
\]

is a generalized projection according to a strongly convex prox-function ψ(·).
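To make the update (1)-(2) concrete, the following minimal sketch (our own illustration, not part of the original development) implements dual averaging for the Euclidean prox-function ψ(x) = (1/2)‖x‖², in which case the composition of (1) and (2) reduces to a Euclidean projection of −α(t)z(t+1) onto X; the objective, constraint set, and step-size schedule are placeholder assumptions.

```python
import numpy as np

def dual_averaging(subgrad, project, d, T, gamma=1.0):
    """Sketch of the dual averaging iteration (1)-(2) with psi(x) = 0.5*||x||^2,
    so the generalized projection becomes a Euclidean projection onto X."""
    z = np.zeros(d)                      # dual variable accumulating subgradients
    x = project(np.zeros(d))             # primal iterate
    iterates = []
    for t in range(1, T + 1):
        g = subgrad(x)                   # g(t) in the subdifferential of f at x(t)
        z = z + gamma * g                # dual update in (1)
        alpha = 1.0 / np.sqrt(t)         # assumed non-increasing step-size schedule
        x = project(-alpha * z)          # primal update: (2) for the Euclidean prox
        iterates.append(x)
    return np.mean(iterates, axis=0)     # running average of the primal iterates

# Illustrative use: minimize ||x - c||_1 over the box [-1, 1]^3 (a hypothetical problem).
c = np.array([0.3, -2.0, 0.7])
x_avg = dual_averaging(subgrad=lambda x: np.sign(x - c),
                       project=lambda y: np.clip(y, -1.0, 1.0),
                       d=3, T=2000)
```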


Fig. 1: Schematic of our game setup including two teams (red A and blue B), with each team having a representative at each node.

B. Our Model

We consider the distributed learning problem for a game between two teams (players), both playing on a network consisting of n nodes connected via a communication graph G. To this end, each team has a representative on each node, hence 2n members in total (Figure 1). The teams are grouped within the sets I_ℓ = {(ℓ,1), . . . , (ℓ,n)} for ℓ ∈ {A, B}, and each team has a choice of action x_ℓ ∈ X_ℓ ⊂ R^{d_ℓ} that minimizes the following global cost:

\[
f_{\ell}(x_A, x_B) = \frac{1}{n} \sum_{i=1}^{n} f_{\ell,i}(x_A, x_B), \qquad \ell \in \{A, B\}, \tag{3}
\]

which is the average of its members' costs at each node i, denoted by f_{ℓ,i}. The goal is to learn a global NE while players have no global decision-making capability. Instead, G is assumed to be connected, and players can communicate within their own team according to the network structure.

To learn a global NE, each team updates the state of its nodes using a distributed dual averaging method. The network-based information flow of our algorithm is related to [8]; however, in a non-cooperative game-theoretic setup, convergence of such iterative methods to the NE is non-trivial due to the nature of equilibria, limited information exchange, etc. Sufficient conditions for convergence are further discussed in §IV.

In our proposed algorithm, at each node i and iteration t, player ℓ ∈ {A, B} maintains an estimate of its team's action as x_{ℓ,i}(t). A communication protocol is designed for sharing dual variables among the nodes of each team, where node i updates its dual variable z_{ℓ,i}(t) using a convex combination of those of its neighboring teammates. It then maps z_{ℓ,i}(t) back to the set of admissible actions X_ℓ, followed by taking its local action x_{ℓ,i}(t). Subsequently, players observe the actions of the opponent at each node and locally obtain an estimate of the subgradient of their distributed cost (reward). We show that under some regularity assumptions, this process provides players with enough local information to decide on the next step and eventually learn the global NE.

C. Our Contribution: Team-based Dual Averaging

We assume that the structure of G induces a doubly stochastic matrix P_ℓ available to each team, where P_{ℓ,ij} > 0

Algorithm 1 Team-Based Dual Averaging

1: Inputs: for player ℓ
2:    Subdifferential ∂f_{ℓ,i} at node i
3:    Doubly stochastic P_ℓ induced by the network structure
4: Outputs:
5:    Estimates of the NE, x_{ℓ,i}(t), at node i for player ℓ
6: Initialize:
7:    Random x_{ℓ,1}(0), . . . , x_{ℓ,n}(0)
8:    t = 0, z_{ℓ,i}(0) = 0
9: while not converged do
10:    for player ℓ ∈ {A, B} do
11:       for node i = 1, . . . , n do
12:          Update the dual variable z_{ℓ,i}(t+1) by (4)
13:          Calculate and take the action x_{ℓ,i}(t+1) by (4)
14:    for player ℓ ∈ {A, B} do
15:       for node i = 1, . . . , n do
16:          Observe the opponent's local action and get
17:          g_{ℓ,i}(t+1) ∈ ∂_ℓ f_{ℓ,i}(x_{A,i}(t+1), x_{B,i}(t+1))
18:    t = t + 1
19: Return: the running average x̄_{ℓ,i} by (6)

if and only if nodes i and j are connected. Then each player ℓ ∈ {A, B}, at iteration t and each node i ∈ V, computes the updates

\[
\begin{cases}
z_{\ell,i}(t+1) = \sum_{j \in N_{\ell,i}} P_{\ell,ij}\, z_{\ell,j}(t) + \gamma_{\ell}(t)\, g_{\ell,i}(t), \\[4pt]
x_{\ell,i}(t+1) = \Pi^{\psi}_{X_{\ell}}\big({-z_{\ell,i}(t+1)}, \alpha_{\ell}(t)\big),
\end{cases} \tag{4}
\]

where z_{ℓ,i} and x_{ℓ,i} are the dual variable and the local action of player ℓ at node i, respectively, and g_{ℓ,i} is an evaluation of the subdifferential of the local cost f_{ℓ,i} at the local actions, i.e.,

\[
g_{\ell,i}(t) \in \partial_{\ell} f_{\ell,i}\big(x_{A,i}(t), x_{B,i}(t)\big), \tag{5}
\]

where ∂_ℓ denotes the differential with respect to the action of player ℓ. Finally, α_ℓ and γ_ℓ are positive step-sizes, with α_ℓ being non-increasing. Note that x_{ℓ,i} can be viewed as the local copy of x_ℓ at node i, and its update requires access to only the ith row of the matrix P_ℓ. We refer to the updates in (4) as Team-Based Dual Averaging (TDA). The proposed methodology is summarized in Algorithm 1. We define the running local average at node i, for player ℓ ∈ {A, B}, as

\[
\bar{x}_{\ell,i}(t) = \frac{1}{t} \sum_{s=0}^{t} x_{\ell,i}(s). \tag{6}
\]

In the sequel, we first show the convergence of the function values f_ℓ, evaluated at x̄_{ℓ,i}, to those at the NE. Then, convergence of the action iterates x_{ℓ,i} is discussed under more regularity conditions.
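As an illustration of one round of the TDA update (4)-(5), the following sketch (ours, not from the paper) uses the Euclidean prox-function ψ(x) = (1/2)‖x‖² and builds the doubly stochastic matrix P_ℓ from Metropolis weights; the Metropolis construction, the projection, and the subgradient oracle are illustrative assumptions, since the paper only requires P_ℓ to be doubly stochastic and compatible with G.

```python
import numpy as np

def metropolis_weights(adj):
    """One common way to obtain a doubly stochastic P from an undirected graph
    (an assumed construction; the paper only requires P_{l,ij} > 0 iff i ~ j)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                P[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        P[i, i] = 1.0 - P[i].sum()      # self-weight makes each row (and column) sum to 1
    return P

def tda_round(P, Z, X_self, X_opp, subgrad, alpha, project, gamma=1.0):
    """One TDA iteration (4) for a single team with psi(x) = 0.5*||x||^2.
    Rows of Z, X_self, X_opp hold the per-node dual variables, own actions, and
    locally observed opponent actions; subgrad(i, x_own, x_opp) returns g_{l,i}(t) as in (5)."""
    n = Z.shape[0]
    G = np.stack([subgrad(i, X_self[i], X_opp[i]) for i in range(n)])
    Z_next = P @ Z + gamma * G                                 # mix neighbours' duals, add local subgradient
    X_next = np.stack([project(-alpha * z) for z in Z_next])   # Euclidean generalized projection
    return Z_next, X_next
```

The running average (6) is then obtained by averaging the returned actions over the iterations; these averages are the quantities whose function values are analyzed in §IV.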

IV. MAIN RESULTS

In this section, we present the main results of the paper and the corresponding analysis. We first make a few key assumptions that are used in our subsequent analysis.

Assumption 1: The undirected communication graph G is connected.


Assumption 2: The cost functions f_A and f_B satisfy the following:

1) The cost f_A(x_1, x_2) is convex in x_1 for any x_2, and concave in x_2 for any x_1. Similarly, the cost f_B(x_1, x_2) is convex in x_2 for any x_1, and concave in x_1 for any x_2.

2) The cost f_A(x_1, x_2) is L_A-Lipschitz continuous in x_1 for any x_2, i.e., |f_A(x, y) − f_A(z, y)| ≤ L_A‖x − z‖ for all x, z ∈ X_A. Similarly, the cost f_B(x_1, x_2) is L_B-Lipschitz continuous in x_2 for any x_1, i.e., |f_B(x, y) − f_B(x, z)| ≤ L_B‖y − z‖ for all y, z ∈ X_B.

Definition 1 (Cross-monotonicity): A two-player game setup with cost functions f_1(x_1, x_2) and f_2(x_1, x_2) is cross-monotone if

\[
\big\langle S(x_1, x_2) - S(x_1^*, x_2^*),\, (x_1, x_2) - (x_1^*, x_2^*) \big\rangle \ge 0, \tag{7}
\]

for all (x_1, x_2), (x_1^*, x_2^*) ∈ X_1 × X_2, where S is the 2-tuple defined as S(x_1, x_2) = (ξ(x_1, x_2), ζ(x_1, x_2)) with any ξ(x_1, x_2) ∈ ∂_1 f_2(x_1, x_2) and ζ(x_1, x_2) ∈ ∂_2 f_1(x_1, x_2). The above inner product is defined over the product space R^{d_1} × R^{d_2}.

In contrast with optimization problems, where the notion of "optimality" plays a central role, in a game setup the objective is seeking an equilibrium rather than an "optimal" solution. In order to incorporate the interactions of players in the convergence analysis of distributed algorithms, monotonicity has been known as a useful sufficient condition for problem "regularity," originally due to the seminal work by Rosen [28]. Monotonicity as defined by Rosen, and subsequently used in the game literature (e.g., [29], [30] and references therein), refers to a property of the operator generated by the pseudo-gradient of the game. However, cross-monotonicity as proposed here is different and refers to the operator S generated by the cross-pseudo-gradient as defined above. The reason for introducing this new regularity condition in the context of network games is the nature of our model, where the costs of players are distributed over the network rather than each node representing a player. The cross-monotonicity condition is used in the convergence of the function values in Section IV-C, while the conventional monotonicity is used to ensure the convergence of the action iterates in Section IV-D.
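As a simple illustration of Definition 1 (ours, not from the paper), the sketch below numerically samples inequality (7) for user-supplied oracles of ∂_1 f_2 and ∂_2 f_1; sampling can only falsify cross-monotonicity, never certify it, and the bilinear game in the usage lines is a hypothetical example chosen because it satisfies (7) with equality.

```python
import numpy as np

def cross_monotone_check(grad1_f2, grad2_f1, sample, trials=1000, tol=-1e-9):
    """Sample the cross-monotonicity inequality (7): <S(x) - S(y), x - y> >= 0,
    where S(x1, x2) = (grad1_f2(x1, x2), grad2_f1(x1, x2))."""
    for _ in range(trials):
        (x1, x2), (y1, y2) = sample(), sample()
        S_x = np.concatenate([grad1_f2(x1, x2), grad2_f1(x1, x2)])
        S_y = np.concatenate([grad1_f2(y1, y2), grad2_f1(y1, y2)])
        diff = np.concatenate([x1 - y1, x2 - y2])
        if np.dot(S_x - S_y, diff) < tol:
            return False                 # found a violating pair
    return True                          # no violation observed (not a proof)

# Hypothetical bilinear zero-sum game f1 = <x1, x2>, f2 = -<x1, x2>:
# here S(x1, x2) = (-x2, x1), so (7) holds with equality for every pair of points.
sample = lambda: (np.random.randn(2), np.random.randn(2))
print(cross_monotone_check(lambda x1, x2: -x2, lambda x1, x2: x1, sample))
```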

A. Convergence Analysis

First, we introduce two lemmas that equip us with the required tools to prove the basic convergence result. Herein, we borrow some tools from prior works; as such, we only mention the key steps that distinguish our contribution in the game setup. It is also worth mentioning that even though we prove convergence of TDA with γ_ℓ(t) = 1, a general γ_ℓ(t) can improve the convergence rate numerically, and its characteristics will be analyzed in a follow-up paper.

Lemma 1: Following Algorithm 1, suppose that player ℓ ∈ {A, B} has access to g_{ℓ,i} for (ℓ, i) ∈ I_ℓ at each node i ∈ [n], that {α_ℓ(t)}_{t=0}^∞ is a non-increasing sequence of positive step-sizes, and that γ_ℓ(t) = 1. Then under Assumption 2, for any x^* = (x_A^*, x_B^*) ∈ X_A × X_B, at node i,

\[
F_{\ell}^{(i)}(T) \le \frac{1}{T}\frac{\psi(x_{\ell}^*)}{\alpha_{\ell}(T)}
+ \frac{1}{T}\frac{L_{\ell}^2}{2} \sum_{t=1}^{T} \alpha_{\ell}(t-1)
+ \frac{L_{\ell}}{T} \sum_{t=1}^{T} \alpha_{\ell}(t)\Big( R_{\ell}^{(i)}(t) + \frac{2}{n} \sum_{j=1}^{n} R_{\ell}^{(j)}(t) \Big)
+ \frac{1}{T}\frac{1}{n} \sum_{t=1}^{T} \sum_{j=1}^{n} H_{\ell}^{(j)}(t), \tag{8}
\]

where

\[
\begin{aligned}
F_A^{(i)}(t) &= f_A\big(\bar{x}_{A,i}(t), x_B^*\big) - f_A(x^*), \\
F_B^{(i)}(t) &= f_B\big(x_A^*, \bar{x}_{B,i}(t)\big) - f_B(x^*), \\
H_A^{(i)}(t) &= \big\langle h_{A,i}^*(t) - h_{A,i}(t),\, x_{B,i}(t) - x_B^* \big\rangle, \\
H_B^{(i)}(t) &= \big\langle h_{B,i}^*(t) - h_{B,i}(t),\, x_{A,i}(t) - x_A^* \big\rangle, \\
R_{\ell}^{(i)}(t) &= \big\| \bar{z}_{\ell}(t) - z_{\ell,i}(t) \big\|_*,
\end{aligned} \tag{9}
\]

with x̄_{ℓ,i}(t) defined as in (6), z̄_ℓ(t) = (1/n) ∑_{i=1}^n z_{ℓ,i}(t) as the averaging factor in the dual space, and finally h_{ℓ,i} and h_{ℓ,i}^* as the sub-gradients of f_{ℓ,i} detailed below,

\[
\begin{aligned}
h_{A,i}(t) \in \partial_B f_{A,i}\big(x_{A,i}(t), x_{B,i}(t)\big), \qquad & h_{A,i}^*(t) \in \partial_B f_{A,i}(x^*), \\
h_{B,i}(t) \in \partial_A f_{B,i}\big(x_{A,i}(t), x_{B,i}(t)\big), \qquad & h_{B,i}^*(t) \in \partial_A f_{B,i}(x^*).
\end{aligned} \tag{10}
\]

Proof: Define y_ℓ(t) for each team as the generalized projection,

\[
y_{\ell}(t) = \Pi^{\psi}_{X_{\ell}}\big({-\bar{z}_{\ell}(t)}, \alpha\big)
= \operatorname*{argmin}_{x \in X_{\ell}} \Big\{ \sum_{s=1}^{t-1} \sum_{i=1}^{n} \langle g_{\ell,i}(s), x \rangle + \frac{1}{\alpha_{\ell}(s)}\, \psi(x) \Big\}, \tag{11}
\]

which follows from the doubly stochastic nature of P_ℓ and the iterative form of z̄_ℓ(t),

\[
\bar{z}_{\ell}(t+1) = \bar{z}_{\ell}(t) + \sum_{j=1}^{n} g_{\ell,j}(t). \tag{12}
\]

Convexity of f_A (in its first element) results in

\[
F_A^{(i)}(T) \le \frac{1}{T} \sum_{t=1}^{T} f_A\big(x_{A,i}(t), x_B^*\big) - f_A(x^*). \tag{13}
\]

Using the L_A-Lipschitz property of f_A and the α-Lipschitz continuity of the generalized projection Π^ψ_X(·, α) (see Lemma 1 in [3]), we can show that

\[
\frac{1}{T} \sum_{t=1}^{T} f_A\big(x_{A,i}(t), x_B^*\big) - f_A(x^*)
\le \frac{1}{T} \sum_{t=1}^{T} f_A\big(y_A(t), x_B^*\big) - f_A(x^*)
+ \frac{L_A}{T} \sum_{t=1}^{T} \alpha_A(t)\, R_A^{(i)}(t). \tag{14}
\]


By adding and subtracting f_{A,i}(x_{A,i}(t), x_B^*), we can bound the first term on the right-hand side of (14) as

\[
f_A\big(y_A(t), x_B^*\big) - f_A(x^*) \le \frac{L_A}{n} \sum_{i=1}^{n} \big\| y_A(t) - x_{A,i}(t) \big\|
+ \frac{1}{n} \sum_{i=1}^{n} f_{A,i}\big(x_{A,i}(t), x_B^*\big) - f_{A,i}(x^*), \tag{15}
\]

where the L_A-Lipschitz condition is again leveraged. Then from the first part of Assumption 2 we can determine a bound on the last term in (15) as

\[
\begin{aligned}
f_{A,i}\big(x_{A,i}(t), x_B^*\big) - f_{A,i}(x_A^*, x_B^*)
&= \Big( f_{A,i}\big(x_{A,i}(t), x_B^*\big) - f_{A,i}\big(x_{A,i}(t), x_{B,i}(t)\big) \Big) \\
&\quad + \Big( f_{A,i}\big(x_{A,i}(t), x_{B,i}(t)\big) - f_{A,i}\big(x_A^*, x_{B,i}(t)\big) \Big) \\
&\quad + \Big( f_{A,i}\big(x_A^*, x_{B,i}(t)\big) - f_{A,i}(x_A^*, x_B^*) \Big) \\
&\le \big\langle -h_{A,i}(t),\, x_{B,i}(t) - x_B^* \big\rangle
+ \big\langle g_{A,i}(t),\, x_{A,i}(t) - x_A^* \big\rangle
+ \big\langle h_{A,i}^*(t),\, x_{B,i}(t) - x_B^* \big\rangle \\
&= J_A^{(i)}(t) + H_A^{(i)}(t),
\end{aligned} \tag{16}
\]

where

\[
J_A^{(i)}(t) = \big\langle g_{A,i}(t),\, x_{A,i}(t) - x_A^* \big\rangle. \tag{17}
\]

The rest of the proof consists of bounding J_A(t) using the α-Lipschitz continuity of the generalized projection and the fact that ‖g_{A,i}(t)‖_* ≤ L_A, which is skipped here for brevity; the reader is referred to Lemma 3 in [8] for the details. A similar analysis results in the bound for F_B^{(i)}(T).

B. Choice of the learning rate α_ℓ(t)

In order for Theorem 1 to result in convergence, an appropriate choice of the step-size (learning rate) α_ℓ(t) is required. The next lemma shows how specific choices of α_ℓ(t) result in practical bounds on F_ℓ^{(i)} by eliminating the R_ℓ^{(k)} terms.

Lemma 2: Under Assumptions 1 and 2, and the definitions of Lemma 1, suppose that ψ(x_ℓ^*) ≤ R_ℓ². By choosing the step-size

\[
\alpha_{\ell}(t) = \frac{R_{\ell}\sqrt{1 - \sigma_2(P_{\ell})}}{4\sqrt{2}\, L_{\ell} \sqrt{t}},
\]

for ℓ ∈ {A, B} and γ_ℓ(t) = 1, at each node i we get

\[
F_{\ell}^{(i)}(T) \le 8\sqrt{2}\, \frac{\log(T\sqrt{n})}{\sqrt{T}} \frac{R_{\ell} L_{\ell}}{\sqrt{1 - \sigma_2(P_{\ell})}}
+ \frac{1}{T}\frac{1}{n} \sum_{t=1}^{T} \sum_{j=1}^{n} H_{\ell}^{(j)}(t). \tag{18}
\]

Proof: Stack the updates of the dual variables in (4) into the matrix form Z_ℓ = [z_{ℓ,1} . . . z_{ℓ,n}], and similarly for G_ℓ. Then for an undirected graph (P_ℓ^⊤ = P_ℓ), we get

\[
Z_{\ell}(t+1) = Z_{\ell}(t)\, P_{\ell} + G_{\ell}(t). \tag{19}
\]

Define

\[
\Phi_{\ell}(t, s) = P_{\ell}^{\,t+1-s},
\]

where Φ can be viewed as the transition matrix of a discrete-time linear system with state Z. This results in

\[
Z_{\ell}(t+1) = Z_{\ell}(s)\, \Phi_{\ell}(t, s) + \sum_{r=s+1}^{t} G_{\ell}(r-1)\, \Phi_{\ell}(t, r) + G_{\ell}(t)
\]

for 0 ≤ s ≤ t − 1. Noting that Φ_ℓ(t, s) 1 = 1 and using the definition z̄_ℓ(t) = Z_ℓ(t) 1/n, it is straightforward to show that

\[
\bar{z}_{\ell}(t) - z_{\ell,i}(t) = \bar{z}_{\ell}(s) - Z_{\ell}(s)\, \Phi_{\ell}(t, s)\, e_i
+ \sum_{r=s+1}^{t-1} G_{\ell}(r-1)\big[\mathbf{1}/n - \Phi_{\ell}(t-1, r)\, e_i\big]
+ G_{\ell}(t-1)\big[\mathbf{1}/n - e_i\big],
\]

where we used z_{ℓ,i}(t) = Z_ℓ(t) e_i. For simplicity we assume z_{ℓ,i}(0) = z̄_ℓ(0) (say, by choosing z_{ℓ,i}(0) = 0); then z̄_ℓ(s) − Z_ℓ(s)Φ_ℓ(t, s) e_i = 0 at s = 0. This implies

\[
\bar{z}_{\ell}(t) - z_{\ell,i}(t) = \sum_{r=1}^{t-1} G_{\ell}(r-1)\big[\mathbf{1}/n - \Phi_{\ell}(t-1, r)\, e_i\big]
+ G_{\ell}(t-1)\big[\mathbf{1}/n - e_i\big].
\]

We can then bound this error as

\[
\big\| \bar{z}_{\ell}(t) - z_{\ell,i}(t) \big\|_* \le L_{\ell} \sum_{r=1}^{t-1} \big\| \mathbf{1}/n - \Phi_{\ell}(t-1, r)\, e_i \big\|_1 + 2 L_{\ell},
\]

where we used ‖g_{ℓ,i}(t−1)‖_* ≤ L_ℓ and norm inequalities. Consider the following standard inequality ([31]),

\[
\big\| \mathbf{1}/n - \Phi_{\ell}(t-1, r)\, e_i \big\|_1 \le \sqrt{n}\, \sigma_2(P_{\ell})^{\,t+1-r}.
\]

Now we say r is small if t − r ≥ log(T√n)/log(σ_2(P_ℓ)^{-1}) − 1, and large otherwise. Then, by splitting the sum, one notes that for small enough r,

\[
\big\| \mathbf{1}/n - \Phi_{\ell}(t-1, r)\, e_i \big\|_1 \le \frac{1}{T},
\]

and otherwise,

\[
\big\| \mathbf{1}/n - \Phi_{\ell}(t-1, r)\, e_i \big\|_1 \le 2.
\]

It then follows that

\[
R_{\ell}^{(i)} = \big\| \bar{z}_{\ell}(t) - z_{\ell,i}(t) \big\|_*
\le 2 L_{\ell}\, \frac{\log(T\sqrt{n})}{\log \sigma_2(P_{\ell})^{-1}} + 3 L_{\ell}
\le 2 L_{\ell}\, \frac{\log(T\sqrt{n})}{1 - \sigma_2(P_{\ell})} + 3 L_{\ell},
\]

using log σ_2(P_ℓ)^{-1} ≥ 1 − σ_2(P_ℓ). Then from Lemma 1,

\[
F_{\ell}^{(i)}(T) \le \frac{\psi(x_{\ell}^*)}{T \alpha_{\ell}(T)} + \frac{L_{\ell}^2}{2T} \sum_{t=1}^{T} \alpha_{\ell}(t-1)
+ \frac{3 L_{\ell}^2}{T} \Big( 2\, \frac{\log(T\sqrt{n})}{1 - \sigma_2(P_{\ell})} + 3 \Big) \sum_{t=1}^{T} \alpha_{\ell}(t)
+ \frac{1}{T}\frac{1}{n} \sum_{t=1}^{T} \sum_{j=1}^{n} H_{\ell}^{(j)}(t). \tag{20}
\]

Define the sequence {α_ℓ(t)}_{t=0}^∞ as

\[
\alpha_{\ell}(t) = \frac{K_{\ell}}{\sqrt{t}}, \qquad \alpha_{\ell}(0) = 1.
\]


Since ψ(x_ℓ^*) ≤ R_ℓ² and ∑_{t=1}^T t^{−1/2} ≤ 2√T − 1, we obtain

\[
F_{\ell}^{(i)}(T) \le \frac{\sqrt{T}}{T} \Big[ \frac{R_{\ell}^2}{K_{\ell}} + 19 K_{\ell} L_{\ell}^2
+ 12 K_{\ell} L_{\ell}^2\, \frac{\log(T\sqrt{n})}{1 - \sigma_2(P_{\ell})} \Big]
+ \frac{1}{T}\frac{1}{n} \sum_{t=1}^{T} \sum_{j=1}^{n} H_{\ell}^{(j)}(t). \tag{21}
\]

Choosing K_ℓ = R_ℓ√(1 − σ_2(P_ℓ))/(4√2 L_ℓ) results in (18).
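The step-size of Lemma 2, and hence the bound (18), depends on the network only through the spectral quantity σ_2(P_ℓ). The short sketch below (ours, for illustration) computes σ_2 and the resulting step-size for a complete graph and a cycle; the values of R_ℓ and L_ℓ are placeholder constants.

```python
import numpy as np

def sigma2(P):
    """Second largest singular value of the doubly stochastic matrix P."""
    return np.sort(np.linalg.svd(P, compute_uv=False))[-2]

def lemma2_stepsize(P, R, L, t):
    """alpha_l(t) = R*sqrt(1 - sigma2(P)) / (4*sqrt(2)*L*sqrt(t)), with alpha_l(0) = 1."""
    K = R * np.sqrt(1.0 - sigma2(P)) / (4.0 * np.sqrt(2.0) * L)
    return K / np.sqrt(t) if t >= 1 else 1.0

n, R, L = 50, 1.0, 1.0                       # placeholder problem constants
P_complete = np.full((n, n), 1.0 / n)        # complete graph with uniform weights
P_cycle = np.zeros((n, n))                   # cycle graph with 1/3 weights
for i in range(n):
    P_cycle[i, i] = P_cycle[i, (i + 1) % n] = P_cycle[i, (i - 1) % n] = 1.0 / 3.0

# The cycle's sigma2 is much closer to 1, so 1 - sigma2(P) shrinks and both the
# admissible step-size and the rate in (18) degrade, matching the discussion in Section V.
print(sigma2(P_complete), lemma2_stepsize(P_complete, R, L, t=100))
print(sigma2(P_cycle), lemma2_stepsize(P_cycle, R, L, t=100))
```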

We are now well-equipped to propose the main result of the paper.

C. Convergence of Function Values

Theorem 1: Under Assumptions 1 and 2 and the notation of Lemmas 1 and 2, if x^* is a global NE and the game is cross-monotone, then for each node i,

\[
\lim_{T \to \infty} F_A^{(i)}(T) = \lim_{T \to \infty} F_B^{(i)}(T) = 0. \tag{22}
\]

Proof: From Lemma 2,

\[
F_A^{(i)}(T) + F_B^{(i)}(T) \le 8\sqrt{2}\, \frac{\log(T\sqrt{n})}{\sqrt{T}} \sum_{\ell \in \{A,B\}} \frac{R_{\ell} L_{\ell}}{\sqrt{1 - \sigma_2(P_{\ell})}}
+ \frac{1}{T}\frac{1}{n} \sum_{t=1}^{T} \sum_{j=1}^{n} \Big( H_A^{(j)}(t) + H_B^{(j)}(t) \Big).
\]

Expanding the last term, and defining the operator S_j (as in (7) with f_1 = f_{A,j} and f_2 = f_{B,j}), we get

\[
H_A^{(j)}(t) + H_B^{(j)}(t) = -\Big\langle S_j\big(x_{A,j}(t), x_{B,j}(t)\big) - S_j(x_A^*, x_B^*),\,
\big(x_{A,j}(t), x_{B,j}(t)\big) - x^* \Big\rangle \le 0,
\]

since the game is cross-monotone at each node, i.e., the operator S_j is monotone, specifically at ξ = h_{B,j} and ζ = h_{A,j}. Hence,

\[
F_A^{(i)}(T) + F_B^{(i)}(T) \le 8\sqrt{2}\, \frac{\log(T\sqrt{n})}{\sqrt{T}} \sum_{\ell \in \{A,B\}} \frac{R_{\ell} L_{\ell}}{\sqrt{1 - \sigma_2(P_{\ell})}}. \tag{23}
\]

Then, given that x^* is a global NE, F_A^{(i)}(T), F_B^{(i)}(T) ≥ 0, and the convergence is subsequently guaranteed.

D. Convergence of Action Iterates

For simplicity, suppose that f_A and f_B are differentiable; then under Assumption 2.1 a NE exists (Theorem 1 in [28]). Define the operator Q generated by the pseudo-gradient map as Q(x_1, x_2) = (ω(x_1, x_2), ν(x_1, x_2)) with

\[
\omega(x_1, x_2) \in \partial_A f_A(x_1, x_2), \qquad \nu(x_1, x_2) \in \partial_B f_B(x_1, x_2).
\]

Additionally, suppose that Q is (strictly) monotone, i.e.,

\[
\big\langle Q(x_1, x_2) - Q(x_1^*, x_2^*),\, (x_1, x_2) - (x_1^*, x_2^*) \big\rangle \ge 0, \tag{24}
\]

for all (x_1, x_2), (x_1^*, x_2^*) ∈ X_A × X_B, with equality if and only if (x_1, x_2) = (x_1^*, x_2^*) (where the inner product is defined over the product space R^{d_A} × R^{d_B}). It then follows that there exists a unique NE (see Theorem 2 in [28], where a sufficient condition for this monotonicity is also given in terms of the game Jacobian).

Now, by combining this uniqueness result with the convergence of TDA in function values, we conclude convergence of the running average of the action iterates to the unique NE.

V. EXAMPLE

In this section we illustrate the performance of our method for a two-team game on a network, where each player has an objective

\[
f_{\ell,i} = \frac{1}{2}\, \| x_1 - a_{\ell,i} \|^2 + \frac{1}{4}\, \langle x_1,\, x_2 - b_{\ell,i} \rangle, \tag{25}
\]

where a_{ℓ,i} and b_{ℓ,i} are arbitrary prescribed parameters. The information about each player's cost is only known locally (at each node) and is unknown to the opponent. Although the cost functions are only locally Lipschitz, they can be treated as Lipschitz continuous over any (large) bounded domain. We have simulated the performance of TDA over complete, random 6-regular, and cycle graphs, each consisting of 50 nodes. Figure 3(b) shows the random 6-regular network simulated here. To illustrate the advantage of dual averaging in our algorithm, we compare the results with another distributed algorithm, referred to as Team-Based Mirror Descent (TMD), which is an extension of the Distributed Mirror Descent algorithm [7] implemented on our game model with ψ(·) = (1/2)‖·‖². Also, as is standard, the step-size of this algorithm, β_ℓ(t) = K_ℓ/t^{0.8}, is chosen to be not summable but square summable, where K_ℓ is the constant of Lemma 2. The TMD algorithm is detailed in Algorithm 2.

Algorithm 2 Team-Based Mirror Descent

1: Inputs: for player ℓ ∈ {A, B}
2:    Subdifferential ∂f_{ℓ,i} at node i
3:    Doubly stochastic P_ℓ induced by the network structure
4: Outputs:
5:    Estimates of the NE, x_{ℓ,i}(t), at node i for player ℓ
6: Initialize:
7:    Random x_{ℓ,1}(0), . . . , x_{ℓ,n}(0)
8:    t = 0
9: while not converged do
10:    for player ℓ ∈ {A, B} do
11:       for node i = 1, . . . , n do
12:          Observe the opponent's action x_{−ℓ,i}(t+1)
13:          Calculate the average estimate
14:             v_{ℓ,i}(t+1) = ∑_{j∈N_{ℓ,i}} P_{ℓ,ij} x_{ℓ,j}(t)
15:          Calculate g_{ℓ,i}(t+1) ∈ ∂_ℓ f_{ℓ,i} at v_{ℓ,i}(t+1)
16:             and x_{−ℓ,i}(t+1)
17:    for player ℓ ∈ {A, B} do
18:       for node i = 1, . . . , n do
19:          Calculate and take the action
20:             x_{ℓ,i}(t+1) = Proj_{X_ℓ}[v_{ℓ,i}(t) − β_ℓ(t) g_{ℓ,i}(t)]
21:    t = t + 1
22: Return: x_{ℓ,i}(t) at node i for player ℓ ∈ {A, B}
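For concreteness, the following sketch (ours) implements one reading of a TMD round from Algorithm 2 with ψ(·) = (1/2)‖·‖², using the local cost (25) under the assumption that x_1 denotes the player's own action and x_2 the opponent's; the projection operator and the data a_{ℓ,i}, b_{ℓ,i} are placeholders.

```python
import numpy as np

def tmd_round(P, X_self, X_opp, A, B, beta, project):
    """One TMD iteration for a single team (one reading of Algorithm 2, steps 12-20).
    Rows of X_self/X_opp hold per-node own/observed-opponent actions; rows of A and B
    hold the parameters a_{l,i}, b_{l,i} of the local cost (25), read as
    f_{l,i} = 0.5*||x_own - a||^2 + 0.25*<x_own, x_opp - b> (an assumed interpretation)."""
    V = P @ X_self                               # v_{l,i}: convex combination of teammates' actions
    G = (V - A) + 0.25 * (X_opp - B)             # gradient of (25) w.r.t. the own action at (v_i, x_opp_i)
    X_next = np.stack([project(v - beta * g) for v, g in zip(V, G)])   # projected step
    return X_next
```

Compared with the TDA round sketched in Section III-C, the consensus step here acts on the primal actions rather than on the dual variables, which is the main structural difference between the two algorithms.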


Fig. 2: NAE at each iteration for both the TDA and TMD algorithms on complete, random 6-regular, and cycle graphs with 50 nodes.

We define the Normalized Average Error (NAE) as the normalized mean of the running-average error from the NE over all nodes of the network,

\[
\mathrm{NAE}(t) = \frac{\sum_{i=1}^{n} \big\| \big( \bar{x}_{A,i}(t) - x_A^*,\ \bar{x}_{B,i}(t) - x_B^* \big) \big\|}
{\sum_{i=1}^{n} \big\| \big( \bar{x}_{A,i}(0) - x_A^*,\ \bar{x}_{B,i}(0) - x_B^* \big) \big\|}.
\]
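When the NE x^* is known, as in this synthetic example, the NAE can be computed directly from the per-node running averages; a small sketch (ours, with assumed array shapes) is given below.

```python
import numpy as np

def nae(Xbar_A, Xbar_B, X0_A, X0_B, x_star_A, x_star_B):
    """Normalized Average Error: per-node running-average distance to the NE,
    summed over nodes and normalized by the same quantity at initialization.
    Xbar_* and X0_* are (n, d) arrays; x_star_* are the NE actions."""
    num = np.linalg.norm(np.hstack([Xbar_A - x_star_A, Xbar_B - x_star_B]), axis=1).sum()
    den = np.linalg.norm(np.hstack([X0_A - x_star_A, X0_B - x_star_B]), axis=1).sum()
    return num / den
```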

Figure 2 shows the NAE at each iteration for both the TDA and TMD algorithms. It can be noted that on networks with the same structure, the TDA method has a much faster convergence rate than TMD, which is due to the dual averaging nature of TDA. Clearly, as the connectivity of the network decreases from complete to cycle, the convergence rate becomes slower. This is captured in Theorem 1, where lower connectivity in the network leads to an increase of σ_2(P_ℓ) closer to 1, which results in a larger bound for each time horizon T.

Next, we examine the performance of TDA in terms of the number of iterations required to reach a given error (from the NE). To do so, we consider four complete graphs with 25, 50, 80, and 120 nodes and require the algorithm to achieve a NAE of less than 0.1. Figure 3 illustrates the number of iterations T needed for each network to achieve NAE(T) < 0.1. Since the initialization of the algorithm is random, for each network we report the mean and variance of the number of iterations over 30 realizations. It is clear that the number of iterations required to achieve the same error bound increases exponentially in this simulation. Part of this behavior is due to the fact that TDA is a first-order method and its convergence in the game setting is towards an equilibrium, unlike the distributed optimization case where the convergence is towards an attractive optimal point.

VI. CONCLUSION

In this paper, we proposed a new model for network games where each player's cost is distributed over a network. Without global information, and only with distributed decision-making for the players, we have shown that dual averaging can be applied to a scenario where two teams with distinct objectives coordinate with their respective team members over the network. In particular, we have shown that using assumptions adopted in distributed optimization, in conjunction with a new cross-monotonicity condition on the game structure, the members of each team can learn the NE in a distributed fashion.

Stochastic access to the subdifferentials of the cost functions, as well as noisy observations of the opponent's actions, is considered as the next immediate extension of this work. Additionally, analysis of the step-size for the dual variables and equipping the algorithm with feasible second-order information are other directions that can improve the convergence behavior of this method close to the equilibrium.

Fig. 3: (a) Number of iterations needed for each network so that the NAE is less than 0.1; (b) the random 6-regular network with 50 nodes simulated in our setup.

REFERENCES

[1] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks, vol. 33. Princeton University Press, 2010.
[2] A. Nemirovsky and D. Yudin, Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
[3] Y. Nesterov, "Primal-dual subgradient methods for convex problems," Mathematical Programming, vol. 120, no. 1, pp. 221–259, 2009.
[4] L. Xiao, "Dual averaging methods for regularized stochastic learning and online optimization," Journal of Machine Learning Research, vol. 11, pp. 2543–2596, 2010.
[5] M. Sedghi, G. Atia, and M. Georgiopoulos, "Robust manifold learning via conformity pursuit," IEEE Signal Processing Letters, vol. 26, pp. 425–429, March 2019.
[6] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
[7] T. T. Doan, S. Bose, D. H. Nguyen, and C. L. Beck, "Convergence of the iterates in mirror descent methods," IEEE Control Systems Letters, vol. 3, no. 1, pp. 114–119, 2019.
[8] J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012.
[9] S. Alemzadeh and M. Mesbahi, "Distributed Q-learning for dynamically decoupled systems," arXiv preprint arXiv:1809.08745, 2018.
[10] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. Academic Press, 1995.
[11] J. Engwerda, LQ Dynamic Optimization and Differential Games. John Wiley & Sons, 2005.
[12] L. Lambertini, Differential Games in Industrial Economics. Cambridge University Press, 2018.
[13] A. W. Starr and Y.-C. Ho, "Nonzero-sum differential games," Journal of Optimization Theory and Applications, vol. 3, no. 3, pp. 184–206, 1969.


[14] M. Simaan and J. B. Cruz, "On the Stackelberg strategy in nonzero-sum games," Journal of Optimization Theory and Applications, vol. 11, no. 5, pp. 533–555, 1973.
[15] M. H. Manshaei, Q. Zhu, T. Alpcan, T. Basar, and J.-P. Hubaux, "Game theory meets network security and privacy," ACM Computing Surveys (CSUR), vol. 45, no. 3, p. 25, 2013.
[16] V. Srivastava, J. Neel, A. B. MacKenzie, R. Menon, L. A. DaSilva, J. E. Hicks, J. H. Reed, and R. P. Gilles, "Using game theory to analyze wireless ad hoc networks," IEEE Communications Surveys & Tutorials, vol. 7, no. 4, pp. 46–56, 2005.
[17] Z. Han, D. Niyato, W. Saad, T. Basar, and A. Hjørungnes, Game Theory in Wireless and Communication Networks: Theory, Models, and Applications. Cambridge University Press, 2012.
[18] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[19] S. Talebi, M. A. Simaan, and Z. Qu, "Cooperative, non-cooperative and greedy pursuers strategies in multi-player pursuit-evasion games," in 2017 IEEE Conference on Control Technology and Applications (CCTA), pp. 2049–2056, Aug 2017.
[20] P. Mertikopoulos and Z. Zhou, "Learning in games with continuous action sets and unknown payoff functions," Mathematical Programming, pp. 1–43, 2018.
[21] M. Bravo, D. Leslie, and P. Mertikopoulos, "Bandit learning in concave n-person games," in Advances in Neural Information Processing Systems 31 (S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds.), pp. 5661–5671, Curran Associates, Inc., 2018.
[22] N. Li and J. R. Marden, "Designing games for distributed optimization," IEEE Journal on Selected Topics in Signal Processing, vol. 7, no. 2, pp. 230–242, 2013.
[23] F. Salehisadaghiani and L. Pavel, "Distributed Nash equilibrium seeking in networked graphical games," Automatica, vol. 87, pp. 17–24, 2018.
[24] P. Frihauf, M. Krstic, and T. Basar, "Nash equilibrium seeking in noncooperative games," IEEE Transactions on Automatic Control, vol. 57, pp. 1192–1207, May 2012.
[25] T. Alpcan and T. Basar, Distributed Algorithms for Nash Equilibria of Flow Control Games, pp. 473–498. Boston, MA: Birkhauser Boston, 2005.
[26] T. Tatarenko, W. Shi, and A. Nedic, "Accelerated gradient play algorithm for distributed Nash equilibrium seeking," in 2018 IEEE Conference on Decision and Control (CDC), pp. 3561–3566, IEEE, 2018.
[27] B. Gharesifard and J. Cortes, "Distributed convergence to Nash equilibria in two-network zero-sum games," Automatica, vol. 49, no. 6, pp. 1683–1692, 2013.
[28] J. B. Rosen, "Existence and uniqueness of equilibrium points for concave n-person games," Econometrica, vol. 33, no. 3, pp. 520–534, 1965.
[29] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, 2003.
[30] F. Parise and A. Ozdaglar, "A variational inequality framework for network games: Existence, uniqueness, convergence and sensitivity analysis," Games and Economic Behavior, pp. 1–43, 2019.
[31] P. Diaconis and D. Stroock, "Geometric bounds for eigenvalues of Markov chains," The Annals of Applied Probability, vol. 1, pp. 36–61, 1991.