
ON ATTACK-RESILIENT DISTRIBUTED FORMATION CONTROL IN OPERATOR-VEHICLE NETWORKS

MINGHUI ZHU AND SONIA MARTÍNEZ*

Abstract. This paper tackles a distributed formation control problem where a group of vehicles is remotely controlled by a network of operators. Each operator-vehicle pair is attacked by an adversary, who corrupts the commands sent from the operator to the vehicle. From the point of view of operators, each adversary follows an attacking strategy linearly parameterized by some (potentially time-varying) matrix which is unknown a priori. In particular, we consider two scenarios depending upon whether adversaries can adapt their attacking tactics online. To assure mission completion in such a hostile environment, we propose two novel attack-resilient distributed control algorithms that allow operators to adjust their policies on the fly by exploiting the latest collected information about adversaries. Both algorithms enable vehicles to asymptotically achieve the desired formation from any initial configuration and initial estimate of the adversaries' strategies. It is further shown that the sequence of the distances to the desired formation is square summable for each proposed algorithm. In numerical examples, the convergence rates of our algorithms are exponential, outperforming the theoretical results.

1. Introduction. Recent advances in communications, sensing and computation have made possible the development of highly sophisticated unmanned vehicles. Applications include, to name a few, border patrol, search and rescue, surveillance, and target identification operations. Unmanned vehicles operate without crew on-board, which lowers their deployment costs in scenarios that are hazardous to humans. More recently, the use of unmanned vehicles by (human) operators has been proposed to enhance information sharing and maintain situational awareness. However, this capability comes at the price of an increased vulnerability of information technology systems. Motivated by this, we consider a formation control problem for an operator-vehicle network where each unmanned vehicle is able to perform real-time coordination with operators (or ground stations) via sensor and communication interfaces. However, the operator-vehicle links can be attacked by adversaries, disrupting the overall network objective. Since we cannot rule out that adversaries are able to successfully mount attacks, it is of prominent importance to provide resilient solutions that assure mission completion despite the presence of security threats.

Literature review. In information technology networks, either reactive or protective mechanisms have been exploited to prevent cyber attacks. Non-cooperative game theory is advocated as a mathematical framework to model the interdependency between attackers and administrators, and predict the behavior of attackers; see an incomplete list of references [1, 15, 26, 32].

Another relevant field is networked control systems, in which the effects of imperfect communication channels on remote control are analyzed and compensated. Most of the existing papers focus on, e.g., band-limited channels [19], quantization [11], packet dropout [27], delay [10], and sampling [21].

Very recently, cyber-security of the emerging cyber-physical systems has drawn mounting attention in the control community. Denial-of-service attacks, destroying the data availability in control systems, are addressed in recent papers [2, 4, 6, 15]. Another important class of cyber attacks, namely false data injection, compromises the data integrity of state estimation and is attracting considerable effort; an incomplete

* M. Zhu is with the Department of Electrical Engineering, Pennsylvania State University, 201 Old Main, University Park, PA, 16802 ([email protected]). S. Martínez is with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA, 92093 ([email protected]). A preliminary version of this paper is published in [35].



reference list includes [20, 24, 30, 33]. In [7, 8], the authors exploit pursuit-evasion games to compute optimal evasion strategies for mobile agents in the face of jamming attacks. Other relevant papers include [3], examining the stability of a SCADA water management system under a class of switching attacks, and our recent paper [36], studying a secure control problem of linear time-invariant systems through a receding-horizon Stackelberg game model. As in [3, 36], the current paper is devoted to studying deception attacks, where attackers maliciously modify the transmitted data. In [31], an attack space defined by the adversary's system knowledge, disclosure and disruption resources is introduced. In the paper [17], a class of trust-based distributed Kalman filters is proposed for power systems to guard against data disseminated by untrusted PMUs.

Regarding malicious behavior in multi-agent systems, we distinguish [23, 28] as two representative references most relevant to this work. The paper [28] considers the problem of computing arbitrary functions of initial states in the presence of faulty or malicious agents, whereas [23] focuses on consensus problems. In both settings, the faulty or malicious agents are part of the network and subject to unknown (arbitrarily non-zero) inputs. Their main objective is to determine conditions under which the misbehaving agents can (or cannot) be detected and identified, and then devise algorithms to overcome the malicious behavior. This significantly departs from the problem formulation we consider here, where the attackers are external to the operator-vehicle network and can affect inter operator-vehicle connections. Additionally, we make use of a model of attackers as rational decision makers, who can make decisions in a real-time and feedback fashion. Here we aim to design completely distributed algorithms for the operator-vehicle network to maintain mission assurance under limited knowledge of teammates and opponents. Our objective is to determine an algorithm that is independent of the number of adversaries and robust to dynamical changes of communication graphs between operators.

Statement of contributions. The current paper studies a formation control problem for an operator-vehicle network in which each vehicle is remotely controlled by an operator. Each operator-vehicle pair is attacked by an adversary, who corrupts the control commands sent to the vehicle. The adversaries are modeled as rational decision makers, and their strategies are linearly parameterized by some (potentially time-varying) matrices which are unknown to operators in advance. We investigate two plausible scenarios depending on the learning capabilities of adversaries. The first scenario involves unilateral learning, where adversaries possess (potentially incorrect) private information of operators in advance, but do not update such information during the attacking course. The second scenario assumes bilateral learning, where adversaries are intelligent and attempt to infer some private information of operators through their observations. We propose a class of novel distributed attack-resilient formation control algorithms, each consisting of two feedback-connected blocks: a formation control block and an online learning block. The online learning mechanism serves to collect information in a real-time fashion and update the estimates of adversaries through continuous contact with them. The formation control law of each operator is adapted online to minimize a local formation error function. To do this, each operator exploits the latest estimate of her opponent and the locations of neighboring vehicles. We show how each proposed algorithm guarantees that vehicles achieve the desired formation asymptotically from any initial vehicle configuration and any initial estimates of adversaries. For each proposed algorithm, the sequence of the distances to the desired formation is shown to be square summable. Two numerical examples are provided to verify the performance of the proposed algorithms. In the



simulation, the convergence rates turn out to be exponential, which outperform the analytic results characterizing the worst-case convergence rates.

A preliminary version of this paper is published in [35], where only the scenario of unilateral learning is investigated.

2. Problem formulation. In this section, we first articulate the layout of the operator-vehicle network and its formation control mission. Then, we present the adversary model that is used in the remainder of the current paper. After this, we specify the two scenarios investigated in the paper.

2.1. Architecture and objective of the operator-vehicle network. Consider a group of vehicles in R^d, labeled by i ∈ V := {1, · · · , N}. The dynamics of each vehicle are governed by the following discrete-time and fully actuated system:

p_i(k + 1) = p_i(k) + u_i(k),   (2.1)

where p_i(k) ∈ R^d is the position of vehicle i and u_i(k) ∈ R^d is its input. Each vehicle i is remotely maneuvered by an operator i, and this assignment is assumed to be one-to-one and fixed over time. For simplicity, we assume that vehicles communicate only with the associated operator and not with other vehicles. Moreover, each vehicle is able to identify its location and send this information to its operator. On the other hand, an operator can exchange information with neighboring operators and deliver control commands to her vehicle. We assume that the communications between operators, and from vehicle to operator, are secure^1, while the communications from operator to vehicle can be attacked. Other architectures are possible, and the present

Fig. 2.1. The architecture of the operator-vehicle network

one is chosen as a first main class of operator-vehicle networked systems; see Figure 2.1.

The mission of the operator-vehicle network is to achieve some desired formation which is characterized by a formation digraph G := (V, E). Each edge (j, i) ∈ E ⊆ V × V \ diag(V), starting from vehicle j and pointing to vehicle i, is associated with a vector ν_ij ∈ R^d. Denote by N_i := {j ∈ V | (j, i) ∈ E} the set of in-neighbors of vehicle i in G and let n_i be the cardinality of N_i; i.e., n_i = |N_i|. The set of in-neighbors of agent i will be enumerated as N_i = {i_1, . . . , i_{n_i}}. Being a member of the team, each operator i is only aware of local formation constraints; i.e., ν_ij for j ∈ N_i.

^1 Alternatively, it can be assumed that operators have access to vehicles' positions by an external and safe measurement system.



The multi-vehicle formation control mission can be formulated as a team optimization problem where the global optimum corresponds to the desired formation of vehicles. In particular, we encode the problem into the following quadratic program:^2

    min_p  J(p) := Σ_{(j,i)∈E} ‖p_i − p_j − ν_ij‖²_{P_ij},

where the vector p := [p_1^T, · · · , p_N^T]^T ∈ R^{Nd} is the collection of vehicles' locations. The matrix P_ij ∈ R^{d×d} is a diagonal and positive-definite weight matrix and represents the preference of operator i on the link (j, i) with j ∈ N_i. Observe that J(p) is a convex function of p since ‖ · ‖²_{P_ij} is convex and p_i − p_j − ν_ij is affine [9]. Denote by X* ⊂ R^{Nd} the set of (global) minimizers. In this paper, we impose the following on G and X*:

Assumption 2.1. The digraph G is strongly connected. In addition, X* ≠ ∅ and J(p*) = 0 for any p* ∈ X*.

The objective function J(p) can describe any shape in R^d by adjusting the formation vectors ν_ij. We assume that operators and vehicles are synchronized. The communication digraph between operators is assumed to be fixed and identical to G. That is, each operator only receives information from in-neighbors in N_i at each time instant. We later discuss a possible extension to deal with time-varying communication digraphs; see Section 5.
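As an illustrative aside (not part of the paper), the objective J(p) is straightforward to evaluate numerically. The sketch below uses a hypothetical three-vehicle cyclic digraph in R² with identity weights P_ij:

```python
# Illustrative sketch (hypothetical data): evaluating the formation objective
# J(p) = sum_{(j,i) in E} ||p_i - p_j - nu_ij||^2_{P_ij} with P_ij = I.

E = [(0, 1), (1, 2), (2, 0)]          # edges (j, i): j points to i
nu = {(0, 1): (1.0, 0.0),             # desired offsets nu_ij ~ p_i - p_j
      (1, 2): (-0.5, 1.0),
      (2, 0): (-0.5, -1.0)}           # offsets sum to zero around the cycle

def J(p):
    """Formation error with identity weights P_ij."""
    total = 0.0
    for (j, i) in E:
        dx = p[i][0] - p[j][0] - nu[(j, i)][0]
        dy = p[i][1] - p[j][1] - nu[(j, i)][1]
        total += dx * dx + dy * dy
    return total

# A configuration realizing the target triangle attains J = 0.
p_star = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
print(J(p_star))                                     # 0.0
print(J([(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]) > 0)   # True
```

Since J depends only on pairwise differences, any translate of the target shape also attains J = 0; this is why Assumption 2.1 requires J(p*) = 0 on the minimizer set X* rather than a unique minimizer.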

Remark 2.1. Similar formation functions are used in [13, 14]. When ν_ij = 0 for all (i, j) ∈ E, the formation control problem reduces to the special case of rendezvous, which has received considerable attention [12, 16, 22]. •

2.2. Model of rational adversaries. A group of N adversaries aims to abort the mission of formation stabilization. To achieve this, an adversary is allocated to attack a specific operator-vehicle pair, and this relation does not change over time. Thus, we identify adversary i with the operator-vehicle pair i. Each adversary is able to locate her target vehicle, and eavesdrop on incoming messages of her target operator. We further assume that adversaries are able to collect some (potentially imperfect and dynamically changing) information of their opponents. Specifically, adversary i will have estimates ν^a_ij(k) ∈ R^d of ν_ij at time k and P^a_ij ∈ R^{d×d} of P_ij, for j ∈ N_i. Here, the matrix P^a_ij is positive-definite and diagonal.

As in [7, 8, 15], we assume that adversaries are rational decision makers, and they make real-time decisions based on the latest information available. In particular, at time k, adversary i identifies the position p_i(k) of her target vehicle, eavesdrops on p_j(k) sent from operator j ∈ N_i to operator i, and intercepts u_i(k) sent from operator i to vehicle i. The adversary then computes a command v_i(k) which is added to u_i(k), so that vehicle i receives and implements u_i(k) + v_i(k) instead. The command v_i(k) will be the solution to the following program:

    max_{v_i∈R^d}  Σ_{j∈N_i} ‖p_j(k) − (p_i(k) + u_i(k) + v_i) − ν^a_ij(k)‖²_{P^a_ij} − ‖v_i‖²_{R_i},   (2.2)

where R_i ∈ R^{d×d} is diagonal and positive definite. The above optimization problem captures two partly conflicting objectives of adversary i. On the one hand, adversary i

^2 In this paper, we denote by ‖x‖²_A := x^T A x the weighted norm of vector x for a matrix A with the proper dimensions.



would like to destabilize the formation associated with vehicle i, and this malicious interest is encapsulated in the first term Σ_{j∈N_i} ‖p_j(k) − (p_i(k) + u_i(k) + v_i) − ν^a_ij(k)‖²_{P^a_ij}. On the other hand, adversary i would like to avoid a high attacking cost ‖v_i‖²_{R_i}. Here we provide a justification of the attacking cost ‖v_i‖²_{R_i} in problem (2.2). At each time, adversary i has to spend some energy to successfully decode the message and deliver the wrong data to vehicle i. The energy consumption depends upon the security schemes, e.g., cryptography and/or radio frequency, employed by operator i. A larger ‖v_i(k)‖ alerts operator i that there is a greater risk to her vehicle, and consequently operator i will raise the security level s_i(k + 1) (e.g., by expanding the radio frequencies) of the link to vehicle i for time k + 1, increasing the subsequent costs paid by adversary i (e.g., to block all of the radio frequencies used by the operator) at the next time instant. That is, s_i(k + 1) = ‖v_i(k)‖_{R_i}. For simplicity, the attacking cost at time k + 1 is assumed to be identical to the security level s_i(k + 1). As a rational decision maker, adversary i is willing to reduce this attacking cost. In the remainder of the paper, we assume the following on the cost matrices of adversaries:

Assumption 2.2. For each i ∈ V, it holds that Σ_{j∈N_i} P^a_ij − R_i < 0.

In this way, the objective function of the optimization problem (2.2) is strictly concave. This can be easily verified by noticing that its Hessian, 2Σ_{j∈N_i} P^a_ij − 2R_i, is negative definite. As a consequence, the optimization problem (2.2) is well defined, and its solution is uniquely determined by:

    v_i(k) = −Σ_{j∈N_i} L_ij (p_j(k) − (p_i(k) + u_i(k)) − ν^a_ij(k)),   (2.3)

where L_ij := (R_i − Σ_{j∈N_i} P^a_ij)^{−1} P^a_ij ∈ R^{d×d} is diagonal and positive definite.
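As a quick sanity check (with hypothetical scalar data, d = 1, so all weight matrices reduce to positive numbers), one can verify numerically that the closed form (2.3) satisfies the first-order optimality condition of the strictly concave problem (2.2):

```python
# Numerical check of (2.3) for d = 1; all numbers below are hypothetical.

Pa = {1: 0.3, 2: 0.2}      # adversary's weights P^a_ij, j in N_i = {1, 2}
R = 1.0                    # attacking-cost weight R_i; sum(Pa) - R < 0 holds

p_i, u_i = 0.5, 0.1        # target vehicle position and intercepted command
p = {1: 1.0, 2: -0.4}      # eavesdropped neighbor positions p_j(k)
nu_a = {1: 0.7, 2: -0.9}   # adversary's formation estimates nu^a_ij(k)

d = {j: p[j] - (p_i + u_i) - nu_a[j] for j in Pa}    # residuals
L = {j: Pa[j] / (R - sum(Pa.values())) for j in Pa}  # L_ij from (2.3)
v = -sum(L[j] * d[j] for j in Pa)                    # closed-form attack

# Gradient of the objective in (2.2) at v should vanish.
grad = sum(-2.0 * Pa[j] * (d[j] - v) for j in Pa) - 2.0 * R * v
print(abs(grad) < 1e-9)    # True
```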

2.3. Justification of the attacker model. Problem (2.2) assumes that each adversary is a rational decision maker, and always chooses the optimal action based on the information available. Compared with [20, 23, 24, 25, 28, 30, 33], which focus on attack detection, our attacker model limits the actions of adversaries to some extent. Assumptions that restrict the behavior of attackers are usually made in the main references on system control under jamming attacks. For example, the paper [15] limits the number of denial-of-service attacks in a time period. This is based on the consideration that the jammer is energy constrained. Moreover, the paper [7] assumes that the maximum speeds of the UAVs and the aerial jammer are identical in a pursuit-evasion game. In addition, the papers [2, 4, 6] restrict the attacking strategies to follow some i.i.d. probability distributions. We argue that the investigation of resilient control policies for constrained jamming attacks is reasonable and can lead to important insights for network vulnerability and algorithm design. Clearly, if the actions of adversaries were omnipotent, no strategy could counteract them. But, even in the case that jammer actions are limited, it is not fully clear what strategy would work or fail. The analysis of these settings can reveal important system and algorithm weaknesses.

2.4. Information about opponents and online adaptation. In a hostile environment, it is not realistic to expect that decision makers have complete and perfect information of their opponents. On the other hand, information about opponents plays a vital role in defending or attacking a system. Throughout this paper, we assume that operator i knows that adversary i is rational and makes decisions



online based on the solution to the optimization problem (2.2). In particular, we will investigate the following two plausible attacking scenarios.

SCENARIO I - Unilateral learning

In the first scenario, adversary i does not update her estimates; i.e., ν^a_ij(k) = ν^a_ij for all k ≥ 0, even though ν^a_ij and P^a_ij may be different from the true values of ν_ij and P_ij. On the other hand, operator i has no access to the values of R_i, P^a_ij and ν^a_ij, which are private information of adversary i. In order to maintain system resilience, operators can aim to identify the adversarial behavior. To do this, we exploit ideas from reinforcement learning [29] and adaptive control [5], which operators can use to learn these parameters through continuous contact with adversaries.

SCENARIO II - Bilateral learning

Adversaries could be intelligent, attempting to learn some unknown information online as well. This motivates us to investigate a second scenario in which adversaries infer private information during the attacking course. For simplicity, we will assume that operator i and adversary i know the cost matrices of each other, and how each other makes real-time decisions on v_i(k) and u_i(k). However, adversary i is unaware of the formation vectors ν_ij associated with operator i, and thus attempts to identify these quantities online. In order to play against this class of intelligent adversaries, we show how operators can keep track of the dynamically changing and unmodeled estimates of adversaries, and in turn adapt their defense tactics.

2.5. Discussion. Informally speaking, we pose the formation control problem as a dynamic non-cooperative game between two teams of rational decision makers: operators and adversaries. In SCENARIO I (unilateral learning), adversaries do not adapt their strategies online, but they do in SCENARIO II (bilateral learning). In contrast to [7, 8], decision makers in our problem formulation do not aim to determine a Nash equilibrium, which is a widely used notion in non-cooperative game theory. Instead, the main focus of the current paper is to quantitatively analyze how online adaptation helps operators maintain system functions when they are facing vague and (potentially intelligent) adversaries.

The papers [20, 24, 30, 33] focus on detection of false data injection attacks against state estimation. There, attackers could intelligently take advantage of channel noises and successfully bypass the detectors if they have perfect information of the system dynamics and detectors. The papers [23, 28] aim to detect malicious behavior in a multi-agent setting. Attack detection is a key security capability, and we should mention that it is trivial in the set-up of the current paper. Since we assume communication channels are noise-free, operators can verify whether their commands are corrupted by simply examining the locations of their associated vehicles. Here, our focus is network resilience to malicious attacks, which is another key security aspect. It is of interest to investigate attack detection in the setting of operator-vehicle networks, and this is left as future work.

In our recent papers [36, 37], we consider deception attacks for a single group of operator, plant and adversary. The plant dynamics in [36, 37] are more complicated than those in the current paper and could be any stabilizable linear time-invariant system. In contrast, the challenge of the current paper is to coordinate multiple vehicles in a distributed way against deception attacks.

Notations. In the sequel, we let tr be the trace operator of matrices, and let ‖A‖_F and ‖A‖ denote the Frobenius norm and 2-norm of a real matrix A ∈ R^{m×n},



respectively. Recall that ‖A‖²_F = tr(A^T A) = Σ_{i=1}^{m} Σ_{j=1}^{n} a²_ij and that ‖A‖ ≤ ‖A‖_F. We will use the shorthand [B_ij]_{j∈N_i} := [B_{i i_1}, · · · , B_{i i_{n_i}}] ∈ R^{n×m n_i}, where the dimensions of the given B_ij ∈ R^{n×m} are identical for all j ∈ N_i. Consider the diagonal vector map diagve : R^{d×d} → R^d, defined as diagve(A) = v, with v_i = A_ii for all i. Similarly, define the diagonal matrix map diagma : R^d → R^{d×d} as diagma(v) = D, with D_ii = v_i and D_ij = 0 for all i, j with j ≠ i. Let P_{≥0} : R^d → R^d be the projection operator from R^d onto the non-negative orthant of R^d. Now define the linear operator 𝒫_i : R^{n_i(d+1)×d} → R^{n_i(d+1)×d} as follows. Given Λ ∈ R^{n_i(d+1)×d}, then 𝒫_i(Λ) = M ∈ R^{n_i(d+1)×d}, defined block-wise as follows: if Λ^T := [[L^T_ij]_{j∈N_i}, [η^T_ij]_{j∈N_i}], then M^T := [[M^T_ij]_{j∈N_i}, [µ^T_ij]_{j∈N_i}], with

    M^T_ij = diagma(P_{≥0}(diagve(L^T_ij))),   µ^T_ij = η^T_ij,   j ∈ N_i.   (2.4)

The linear operator 𝒫_i will be used in the learning rule of the algorithm proposed for SCENARIO I (unilateral learning).
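A minimal sketch (not from the paper's code) of the maps diagve, diagma, P_{≥0} and the block-wise projection of (2.4), written for d = 2; the representation of Λ as a pair of block lists is an assumption made for illustration:

```python
# Hypothetical d = 2 example of the block-wise projection used in (2.4).

def diagve(A):                       # diagonal of a d x d matrix -> vector
    return [A[i][i] for i in range(len(A))]

def diagma(v):                       # vector -> diagonal d x d matrix
    d = len(v)
    return [[v[i] if i == j else 0.0 for j in range(d)] for i in range(d)]

def proj_nonneg(v):                  # P_{>=0}: clip onto the non-negative orthant
    return [max(x, 0.0) for x in v]

def proj_block(L_blocks, eta_blocks):
    """Block-wise projection of (2.4): the diagonal of each L_ij block is
    clipped to be non-negative; the eta_ij blocks pass through unchanged."""
    M = [diagma(proj_nonneg(diagve(L))) for L in L_blocks]
    return M, eta_blocks

L_blocks = [[[0.5, 0.0], [0.0, -0.2]]]     # one neighbor, d = 2
eta_blocks = [[0.3, -0.7]]
M, mu = proj_block(L_blocks, eta_blocks)
print(M[0])    # [[0.5, 0.0], [0.0, 0.0]] -- negative diagonal entry clipped
print(mu[0])   # [0.3, -0.7]              -- unchanged
```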

3. Attack-resilient distributed formation control with unilateral learning. In this section, we investigate SCENARIO I (unilateral learning) and propose a novel formation control algorithm, namely ALGORITHM I (unilateral learning), to guarantee the formation control mission under malicious attacks. It is worth recalling that in this scenario adversary i does not update her estimates; i.e., ν^a_ij(k) = ν^a_ij for all k ≥ 0.

3.1. A linearly parametric interpretation of attacking policies. Recall that operator i is aware that the decisions of adversary i are based on the solution to the optimization problem (2.2). This implies that operator i knows that v_i(k) is of the form (2.3), but does not have access to the real values of L_ij and ν^a_ij. A more compact expression for v_i(k) is given in the following.

Lemma 3.1. The vector v_i(k) can be written in the following form:

    v_i(k) = Θ_i^T Φ_i(k) = −Σ_{j∈N_i} [L_ij (p_j(k) − (p_i(k) + u_i(k)) − ν_ij) + η_ij]
           = −Σ_{j∈N_i} L_ij ((p_j(k) − (p_i(k) + u_i(k)) − ν_ij) + (ν_ij − ν^a_ij)),

where η_ij := L_ij (ν_ij − ν^a_ij) ∈ R^d, and the matrices Θ_i ∈ R^{n_i(d+1)×d}, φ_i(k) ∈ R^{n_i d}, Φ_i(k) ∈ R^{n_i(d+1)} are given by:

    φ_i(k) := [p_{i_1}(k) − (p_i(k) + u_i(k)) − ν_{i i_1}; · · · ; p_{i_{n_i}}(k) − (p_i(k) + u_i(k)) − ν_{i i_{n_i}}],
    Θ_i^T := [[L_ij]_{j∈N_i} [η_ij]_{j∈N_i}],   Φ_i(k) := −[φ_i(k)^T 1 · · · 1]^T.   (3.1)

Proof. This fact can be readily verified.

In the light of the above lemma, we will equivalently assume that operator i is aware that v_i(k) is the product of Θ_i and Φ_i(k), where the unknown parameter Θ_i is referred to as the target parameter of operator i, and the vector Φ_i(k) is referred



to as the regression vector of operator i at time k. In other words, from the point of view of operator i, the attacking strategy of adversary i is linearly parameterized by the unknown (but fixed) matrix Θ_i.
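The identity in Lemma 3.1 can be checked numerically; the sketch below uses hypothetical scalar data (d = 1, two in-neighbors) and compares the attack (2.3), computed with the adversary's estimates ν^a_ij, against the parameterization Θ_i^T Φ_i(k):

```python
# Illustrative check of Lemma 3.1 with d = 1 and two neighbors.

Ni = [1, 2]
L    = {1: 0.6, 2: 0.4}                # L_ij (scalars since d = 1)
nu   = {1: 0.8, 2: -0.5}               # operator's true formation vectors
nu_a = {1: 0.7, 2: -0.9}               # adversary's (wrong) estimates

p_i, u_i = 0.5, 0.1
p = {1: 1.0, 2: -0.4}

# Attack as in (2.3), written with the adversary's estimates nu^a_ij.
v_direct = -sum(L[j] * (p[j] - (p_i + u_i) - nu_a[j]) for j in Ni)

# Linear parameterization of Lemma 3.1: eta_ij := L_ij (nu_ij - nu^a_ij),
# phi_i stacks p_j - (p_i + u_i) - nu_ij, and Phi_i := -[phi_i^T 1 ... 1]^T.
eta = {j: L[j] * (nu[j] - nu_a[j]) for j in Ni}
phi = [p[j] - (p_i + u_i) - nu[j] for j in Ni]
Theta = [L[1], L[2], eta[1], eta[2]]              # Theta_i^T as a row
Phi = [-x for x in phi] + [-1.0, -1.0]            # Phi_i
v_param = sum(t * f for t, f in zip(Theta, Phi))

print(abs(v_direct - v_param) < 1e-12)   # True
```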

3.2. ALGORITHM I (unilateral learning) and its convergence properties.

[Informal description] Overall, ALGORITHM I (unilateral learning) can be roughly described as follows. At each time instant, each operator first collects the current locations of neighboring operators' vehicles. Then, the operator computes a control command u_i(k) minimizing a local formation error function by assuming that her neighboring vehicles do not move. This computation is based on the certainty equivalence principle; i.e., operator i exploits her latest estimate Θ_i(k) to predict that adversary i corrupts her command by adding v^o_i(k) := Θ_i(k)^T Φ_i(k), as if Θ_i(k) were identical to Θ_i. After that, the operator sends the new command u_i(k) to her associated vehicle. Adversary i then corrupts the command by adding the signal v_i(k) linearly parameterized by Θ_i. Vehicle i receives, implements, and further sends back to operator i the new position p_i(k + 1). After that, operator i computes the new estimation error of Θ_i, and updates her estimate to minimize a local estimation error function.

We now formally state the interactions of the ith group consisting of operator, vehicle and adversary i in Algorithm 1. The rule to compute u_i(k) and the precise update law for Θ_i(k) can be found there. The notations used to describe ALGORITHM I (unilateral learning) are summarized in Table 3.1.

Table 3.1
Notations used in ALGORITHM I (unilateral learning)

p_i(k) ∈ R^d : the location of vehicle i at time k
p_i(k+1|k) ∈ R^d : the prediction of p_i(k+1) produced by operator i at time k
P_ij ∈ R^{d×d} : the weight matrix assigned by operator i to the formation vector ν_ij for j ∈ N_i
P_ii ∈ R^{d×d} : the weight matrix assigned by operator i to her own current location
u_i(k) ∈ R^d : the control command of operator i at time k
v_i(k) ∈ R^d : the command generated by adversary i at time k, given in (2.3)
v^o_i(k) ∈ R^d : the prediction of v_i(k) generated by operator i
Θ_i ∈ R^{n_i(d+1)×d} : the target parameter of operator i given in (3.1)
Θ_i(k) ∈ R^{n_i(d+1)×d} : the estimate of Θ_i produced by operator i at time k
Φ_i(k) ∈ R^{n_i(d+1)} : the regression vector of operator i at time k, given in (3.1)
m_i(k) := √(1 + ‖Φ_i(k)‖²) : the normalizing term of operator i
𝒫_i : a projection operator defined by (2.4)
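To make the interplay of the formation-control and learning blocks concrete, the following sketch (not the authors' code; all weights, gains and initial values are hypothetical) simulates the loop for d = 1 and two vehicles on a two-cycle digraph, with each operator applying the closed-form control of (3.3) and the projected normalized-gradient update of Step 5, and each adversary playing the best response (2.3) with her own fixed estimates:

```python
# Compact simulation sketch of ALGORITHM I: d = 1, N = 2, two-cycle digraph.

nu = {0: 1.0, 1: -1.0}        # formation vectors: target p_1 - p_0 = 1
Pij, Pii = 1.0, 1.0           # operator weights (scalars since d = 1)
Pa, R = 0.3, 1.0              # adversary weights; Pa - R < 0 (Assumption 2.2)
L_true = Pa / (R - Pa)        # true gain L_ij in (2.3)
nu_a = {0: 0.6, 1: -0.4}      # adversaries' fixed (wrong) estimates nu^a_ij

p = [2.0, -1.5]               # initial vehicle positions
Lh = [0.0, 0.0]               # operators' estimates of L_ij
etah = [0.0, 0.0]             # operators' estimates of eta_ij

def formation_error(q):
    return (q[1] - q[0] - nu[0])**2 + (q[0] - q[1] - nu[1])**2

for k in range(2000):
    new_p = list(p)
    for i in (0, 1):
        j = 1 - i
        d_ij = p[j] - p[i] - nu[i]
        # control command from the closed form (3.3), one in-neighbor
        u = (Pij / (Pii + Pij) * d_ij + Lh[i] * d_ij + etah[i]) / (1.0 + Lh[i])
        # operator's attack prediction v^o_i(k) and position prediction
        phi = p[j] - (p[i] + u) - nu[i]
        v_o = -Lh[i] * phi - etah[i]
        p_pred = p[i] + u + v_o
        # adversary's actual attack (2.3), using her estimate nu^a_ij
        v = -L_true * (p[j] - (p[i] + u) - nu_a[i])
        new_p[i] = p[i] + u + v
        # Step 5: projected normalized-gradient update of (L_ij, eta_ij)
        e = new_p[i] - p_pred
        m2 = 1.0 + phi**2 + 1.0            # m_i(k)^2 with Phi_i = [-phi, -1]
        Lh[i] = max(Lh[i] + (-phi) * e / m2, 0.0)
        etah[i] = etah[i] + (-1.0) * e / m2
    p = new_p

print(formation_error(p))      # near zero: the formation is achieved
```

Consistent with the analysis, the formation error becomes negligible; note that the operators need not recover Θ_i exactly, since only an accurate prediction of v_i(k) along the trajectory is required.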

Remark 3.1. We denote P_i := P_ii + Σ_{j∈N_i} P_ij and let Θ_i(k)^T be partitioned in the form Θ_i(k)^T = [[L_ij(k)]_{j∈N_i} [η_ij(k)]_{j∈N_i}], where L_ij(k) ∈ R^{d×d} and η_ij(k) ∈



Algorithm 1 ALGORITHM I (unilateral learning) for group i

Initialization: Operator i chooses any Θi ∈ Rni(d+1)×d and and lets Θi(0) = Pi[Θi]as the initial estimate of Θi.

Iteration: At each k ≥ 0, adversary, operator, and vehicle i execute the followingsteps:

1: Operator i receives pj(k) from operator j ∈ Ni, and solves the following quadraticprogram:

minui(k)∈Rd

j∈Ni

‖pj(k)− pi(k + 1|k)− νij‖2Pij+ ‖pi(k)− pi(k + 1|k)‖2Pii

,

s.t. pi(k + 1|k) = pi(k) + ui(k) + voi (k), (3.2)

to obtain the optimal solution ui(k) where voi (k) := Θi(k)TΦi(k) and Pii is a

positive-definite and diagonal matrix.2: Operator i sends ui(k) to vehicle i, and generates a prediction of pi(k+1) in such

a way that pi(k + 1|k) = pi(k) + ui(k) + voi (k).3: Adversary i identifies pi(k), eavesdrops on pj(k) sent from operator j ∈ Ni to

operator i, and corrupts ui(k) by adding vi(k) = ΘTi Φi(k).

4: Vehicle i receives and implements the corrupted command ui(k)+vi(k), and thensends back the new location pi(k + 1) = pi(k) + ui(k) + vi(k) to operator i.

5: Operator i computes the estimation error ei(k) = pi(k + 1) − pi(k + 1|k), andupdates her parameter estimate as Θi(k + 1) = Pi[Θi(k) +

1mi(k)2

Φi(k)ei(k)T ],

where mi(k) :=√

1 + ‖Φi(k)‖2.6: Repeat for k = k + 1.

Rd, for j ∈ Ni = 1, · · · , ni. Then, the solution ui(k) to the quadratic program in

Step 1 of ALGORITHM I (unilateral learning) can be explicitly computed as follows:

ui(k) =(

I +∑

j∈Ni

Lij(k))−1 ×

j∈Ni

P−1i Pij(pj(k)− pi(k)− νij)

+∑

j∈Ni

Lij(k)(pj(k)− pi(k)− νij) +∑

j∈Ni

ηij(k)

. (3.3)

Hence, the program in Step 1 of ALGORITHM I (unilateral learning) is equivalent to the computation (3.3). In Step 5 of ALGORITHM I (unilateral learning), operator i utilizes a projected parameter identifier to learn Θ_i online. This scheme extends the classic (vector) normalized gradient algorithm (e.g., in [5]) to the matrix case, and further incorporates the projection operator 𝒫_i to guarantee that u_i(k) is well defined. That is, the introduction of 𝒫_i ensures that the estimate L_ij(k) is positive definite, and that I + Σ_{j∈N_i} L_ij(k) is nonsingular. As in [5], the term (1/m_i(k)²) Φ_i(k) e_i(k)^T in the update law of Θ_i(k) serves to minimize the error cost e_i(k)^T e_i(k) / m_i(k)². Here, e_i(k) is the position estimation error, and m_i(k) is a normalizing factor. •

The following theorem guarantees that our proposed ALGORITHM I (unilateral learning) is attack-resilient and allows the multi-vehicle network to achieve the desired formation in SCENARIO I (unilateral learning).
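The matrix normalized gradient identifier of Step 5 can be exercised in isolation. The sketch below is a minimal, hypothetical instance with d = 1, so Θ_i reduces to a vector and the projection 𝒫_i is omitted (it can only shrink the estimation error); the true parameter and the regressor cycle are illustrative choices, not taken from the paper. The error e(k) = (Θ_i − Θ_i(k))^T Φ(k) is formed exactly as in the analysis of Section 7, and the distance to the target parameter is non-increasing.

```python
# Normalized gradient identifier (Step 5 of ALGORITHM I), vector case d = 1.
# Hypothetical data: theta_star and the regressor cycle are illustrative.
import math

theta_star = [1.0, -2.0, 0.5]        # unknown target parameter Theta_i
theta = [0.0, 0.0, 0.0]              # operator's estimate Theta_i(k)
regressors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # persistently exciting cycle

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

dists = []
for k in range(300):
    phi = regressors[k % len(regressors)]
    m2 = 1.0 + dot(phi, phi)                   # m_i(k)^2 = 1 + ||Phi_i(k)||^2
    e = dot([t - th for t, th in zip(theta_star, theta)], phi)  # e_i(k)
    theta = [th + p * e / m2 for th, p in zip(theta, phi)]      # Step 5, no projection
    dists.append(math.sqrt(sum((t - th) ** 2 for t, th in zip(theta_star, theta))))

# The distance to Theta_i never increases, and the estimate converges.
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(dists, dists[1:]))
assert dists[-1] < 1e-6
print(dists[-1])
```

With these cycling regressors every coordinate of the error is halved once per sweep, so convergence is geometric, matching the exponential rates seen in the numerical example of Section 3.3.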

Theorem 3.2. (Convergence properties of ALGORITHM I (unilateral learning)): Consider SCENARIO I (unilateral learning) with any initial configuration p(0) ∈ R^{Nd} of vehicles. If Assumptions 2.1 and 2.2 hold, then ALGORITHM I (unilateral learning) for every group i ensures that the vehicles asymptotically achieve the desired formation; i.e., lim_{k→+∞} dist(p(k), X*) = 0. Furthermore, the convergence rate of ALGORITHM I (unilateral learning) ensures the following:

    Σ_{k=0}^{+∞} Σ_{(i,j)∈E} ‖p_j(k) − p_i(k) − ν_ij‖² < +∞.

Proof. The proof is provided in the appendix.

3.3. A numerical example for ALGORITHM I (unilateral learning). Here we evaluate the performance of ALGORITHM I (unilateral learning) through a numerical example. Consider a group of 15 vehicles initially deployed at random over a square of 50 × 50 length units, as shown in Figure 3.1(a). Figure 3.1(c) delineates the trajectory of each vehicle during the first 60 iterations of the algorithm. The configuration of the vehicles at the 60th iteration of ALGORITHM I (unilateral learning) is given in Figure 3.1(b), and it coincides with the desired formation. This fact can be verified from Figure 3.1(d), which shows the evolution of the formation errors of ALGORITHM I (unilateral learning). Figure 3.1(d) also demonstrates that the convergence rate of ALGORITHM I (unilateral learning) in the simulation is exponential, which is faster than the analytical guarantee in Theorem 3.2.
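A miniature of this experiment can be reproduced in a few lines. The sketch below is a toy special case, not the paper's setup: d = 1, three vehicles on a directed cycle, weights P_ii = P_ij = 1, and a constant-bias attack (a special case of the linear parameterization in which v_i(k) ≡ θ_i*, i.e., Φ_i(k) ≡ 1). Each operator compensates with the certainty-equivalent prediction and runs the normalized gradient update; the formation error decays to zero, as Theorem 3.2 predicts.

```python
# Toy closed loop for ALGORITHM I: scalar positions, directed cycle 0->1->2->0,
# constant-bias attacks. All numeric values are illustrative.
nbr = {0: 1, 1: 2, 2: 0}            # N_i = {nbr[i]}
nu = {0: 1.0, 1: 1.0, 2: -2.0}      # formation vectors, summing to 0 on the cycle
theta_star = {0: 0.7, 1: -1.2, 2: 0.4}   # adversaries' constant biases
theta_hat = {0: 0.0, 1: 0.0, 2: 0.0}     # operators' estimates
p = {0: 3.0, 1: -5.0, 2: 8.0}            # initial positions

for k in range(300):
    u = {}
    for i in nbr:
        j = nbr[i]
        # weights P_ii = P_ij = 1, so P_i^{-1} P_ij = 1/2; compensate the bias
        u[i] = 0.5 * (p[j] - p[i] - nu[i]) - theta_hat[i]
    for i in nbr:
        p_new = p[i] + u[i] + theta_star[i]            # corrupted dynamics
        pred = p[i] + u[i] + theta_hat[i]              # operator's prediction
        e = p_new - pred                               # e_i(k) = theta* - theta_hat
        theta_hat[i] += e / 2.0                        # m_i(k)^2 = 1 + 1 = 2
        p[i] = p_new

err = sum((p[nbr[i]] - p[i] - nu[i]) ** 2 for i in nbr)
assert err < 1e-6
print(err)
```

The bias estimates converge geometrically, after which the loop reduces to the disturbance-free consensus dynamics analyzed in the appendix.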

Fig. 3.1. (a) Initial configuration of vehicles for ALGORITHM I (unilateral learning). (b) The configuration of vehicles at the 60th iteration under ALGORITHM I (unilateral learning).

Fig. 3.1 (continued). (c) Trajectories of the vehicles during the first 60 iterations of ALGORITHM I (unilateral learning); the green squares stand for initial locations and the red circles represent final locations. (d) The evolution of formation errors during the first 60 iterations of ALGORITHM I (unilateral learning).

4. Attack-resilient distributed formation control with bilateral learning. In this section, we investigate the more challenging SCENARIO II (bilateral learning) and propose ALGORITHM II (bilateral learning) to defeat the intelligent adversaries.

In SCENARIO II (bilateral learning), adversary i is aware of P_ij (i.e., P_ij^a = P_ij) and of the policy operator i uses to compute u_i(k). However, adversary i has no access to the formation vectors ν_ij for j ∈ N_i in advance. This motivates adversary i to learn ν_ij. In what follows, the quantity ν_ij^a(k) is an estimate of ν_ij maintained by adversary i at time k. On the other hand, operator i is assumed to know R_i and the rule by which adversary i makes decisions, but without access to the instantaneous estimate ν_ij^a(k). In order to play against her opponent, operator i has to keep track of the time-varying quantity ν_ij^a(k). Operator i is completely unaware of the learning dynamics associated with the estimates ν_ij^a(k), and thus ν_ij^a(k) is totally unmodeled for operator i. The best operator i can do is to observe some quantity that depends on ν_ij^a(k) at time k, and generate a posterior estimate ν_ij^o(k+1) of ν_ij^a(k). Through the certainty equivalence principle, the actions of adversary i and operator i at time k employ the estimates ν_ij^a(k) and ν_ij^o(k), respectively.

In the remainder of this section, the subscripts a and o are used to indicate the target parameters of adversaries and operators, respectively, and the superscripts a and o are employed to indicate the estimates of target parameters or other local variables of adversaries and operators, respectively. Towards this end, let us introduce the following notation: Ω_{a,i} = [[ν_ij^T]_{j∈N_i}]^T (resp. Ψ_{o,i}(k) = Ω_i^a(k)) is the target parameter of adversary i (resp. operator i), and Ω_i^a(k) = [[ν_ij^a(k)^T]_{j∈N_i}]^T (resp. Ψ_i^o(k) = [[ν_ij^o(k)^T]_{j∈N_i}]^T) represents the estimate of Ω_{a,i} (resp. Ψ_{o,i}(k−1)) produced by adversary i (resp. operator i) at time k.

4.1. A linearly parametric interpretation of attacking policies and local formation control laws. In this part, we first find a linearly parametric interpretation of attacking policies from the point of view of operators. Then we devise a local formation control law for each operator. Before doing so, we adopt the following notation³:

    L_ij := (R_i − Σ_{j∈N_i} P_ij)^{−1} P_ij,    L_i := Σ_{j∈N_i} L_ij,
    M_ij := (I + L_i)^{−1} P_i^{−1} P_ij,    M_i := Σ_{j∈N_i} M_ij.

Throughout this section, we assume that the cost matrices of each operator are homogeneous; this assumption is formally stated as follows:

Assumption 4.1. For each i ∈ V, there is a diagonal and positive-definite matrix P_i such that P_ij = (1/n_i) P_i for all j ∈ N_i.

With this assumption, it is easy to see that:

    L_ij = (1/n_i)(R_i − P_i)^{−1} P_i,    L_ij = (1/n_i) L_i,    M_ij = (1/n_i) M_i.

³Note that similar letters do not exactly match their meaning in the previous section.


Lemma 4.1. The vector v_i(k) can be written in the following way:

    v_i(k) = −Σ_{j∈N_i} L_ij (p_j(k) − p_i(k) − u_i(k)) + (Φ_i^o)^T Ψ_{o,i}(k),    (4.1)

where the matrices Φ_i^o and Ψ_{o,i}(k) are given by:

    (Φ_i^o)^T := [[L_ij]_{j∈N_i}],    Ψ_{o,i}(k) := [[ν_ij^a(k)^T]_{j∈N_i}]^T.    (4.2)

Proof. It is straightforward to verify this result.

In SCENARIO II (bilateral learning), operator i knows that adversary i bases her decisions on the solution to the optimization problem (2.2), which is parameterized by the unknown quantity ν_ij^a(k). Lemma 4.1 indicates that, from operator i's point of view, the attacking strategy of adversary i is linearly parameterized by the unknown and time-varying matrix Ψ_{o,i}(k). The quantity Φ_i^o is referred to as the regression vector of operator i.

We are now in a position to devise a local formation control law for each operator. In particular, with p_j(k) for j ∈ N_i, operator i computes the control command u_i(k) by solving the following program to minimize the local formation error:

    min_{u_i(k) ∈ R^d}  Σ_{j∈N_i} ‖p_j(k) − p_i(k+1|k) − ν_ij‖²_{P_ij} + ‖p_i(k) − p_i(k+1|k)‖²_{P_ii},
    s.t.  p_i(k+1|k) = p_i(k) + u_i(k) + v_i^o(k),    (4.3)

where v_i^o(k) is a prediction of v_i(k) defined as follows:

    v_i^o(k) := −Σ_{j∈N_i} L_ij (p_j(k) − (p_i(k) + u_i(k))) + (Φ_i^o)^T Ψ_i^o(k).    (4.4)

The solution to (4.3) is uniquely determined by:

    u_i(k) = (I + L_i)^{−1} [ Σ_{j∈N_i} P_i^{−1} P_ij (p_j(k) − p_i(k) − ν_ij)
             + Σ_{j∈N_i} L_ij (p_j(k) − p_i(k) − ν_ij^o(k)) ].    (4.5)

4.2. A linearly parametric interpretation and estimates of formation control commands. In SCENARIO II (bilateral learning), adversary i, on the one hand, is unaware of the formation vectors ν_ij for j ∈ N_i and, on the other hand, is able to intercept the u_i(k) produced by operator i. This motivates adversary i to infer ν_ij through the observation of u_i(k). To achieve this, she generates the following estimate u_i^a(k) of the control command u_i(k) before receiving u_i(k):

    u_i^a(k) = (I + L_i)^{−1} [ Σ_{j∈N_i} P_i^{−1} P_ij (p_j(k) − p_i(k) − ν_ij^a(k))
               + Σ_{j∈N_i} L_ij (p_j(k) − p_i(k) − ν_ij^a(k)) ],    (4.6)

and computes the estimation error e_i^a(k) = u_i(k) − u_i^a(k) via the comparison of u_i(k) with u_i^a(k). In the next part, we will explain how adversary i updates her estimates of ν_ij based on e_i^a(k).
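The closed form (4.5) can be checked numerically against the program (4.3)-(4.4). The sketch below is a scalar (d = 1) instance with illustrative numbers, and it takes P_i := P_ii + Σ_{j∈N_i} P_ij as in Remark 3.1 (an assumption, since (4.5) does not restate which P_i is meant): it evaluates the cost of (4.3) with the self-referential prediction (4.4), computes u from (4.5), and confirms that perturbing u in either direction does not decrease the cost.

```python
# Scalar check that (4.5) minimizes the program (4.3) with prediction (4.4).
# All numeric values are illustrative; P_i is taken as P_ii + sum_j P_ij.
p_i, p_nbrs = 0.0, [3.0, -1.0]          # own and neighboring positions p_j(k)
nu = [1.0, -2.0]                        # formation vectors nu_ij
nu_o = [0.5, -1.0]                      # operator's estimates nu^o_ij(k)
P_ii, P_ij = 1.0, [1.0, 1.0]
L_ij = [0.4, 0.4]                       # scalar L_ij; L_i = 0.8
P_i = P_ii + sum(P_ij)
L_i = sum(L_ij)

def v_o(u):
    # prediction (4.4): v^o = -sum L_ij (p_j - (p_i + u)) + sum L_ij nu^o_ij
    return sum(-L * (pj - (p_i + u)) + L * n for L, pj, n in zip(L_ij, p_nbrs, nu_o))

def cost(u):
    # objective of (4.3) with p_i(k+1|k) = p_i + u + v^o(u)
    q = p_i + u + v_o(u)
    return sum(P * (pj - q - n) ** 2 for P, pj, n in zip(P_ij, p_nbrs, nu)) \
        + P_ii * (p_i - q) ** 2

# closed form (4.5)
u_star = (1.0 / (1.0 + L_i)) * (
    sum((Pj / P_i) * (pj - p_i - n) for Pj, pj, n in zip(P_ij, p_nbrs, nu))
    + sum(L * (pj - p_i - n) for L, pj, n in zip(L_ij, p_nbrs, nu_o)))

h = 1e-4
assert cost(u_star) <= cost(u_star + h) and cost(u_star) <= cost(u_star - h)
print(u_star)
```

The resolvent (I + L_i)^{−1} appears because the prediction (4.4) itself depends on u_i(k).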


4.3. ALGORITHM II (bilateral learning) and convergence properties.

[Informal description] We informally describe ALGORITHM II (bilateral learning) as follows. At each time instant k, operator i, adversary i and vehicle i implement the following steps.
(1) Each operator first receives the information p_j(k) from her neighboring operators j. The operator then computes a control command u_i(k) to minimize a local formation error function by assuming that her neighboring vehicles do not move and that the strategy of adversary i is linearly parameterized as in Lemma 4.1, employing her current estimate Ψ_i^o(k) of Ψ_{o,i}(k). After this computation, the operator sends the generated command u_i(k) to vehicle i.
(2) Adversary i intercepts p_j(k) for j ∈ N_i and u_i(k), and further corrupts u_i(k) by adding the signal v_i(k) given in (2.3). Adversary i maintains a scheduler T_i^a⁴ which determines the collection of time instants at which she updates her estimate Ω_i^a(k). In particular, if k ∈ T_i^a, then adversary i generates an estimate u_i^a(k) of u_i(k), identifies her estimation error, and then produces her estimate Ω_i^a(k) by minimizing a local estimation error function.⁵
(3) Vehicle i receives, implements, and further sends back to operator i the new position p_i(k+1).
(4) After that, operator i determines the estimation error of Ψ_{o,i}(k), and updates her estimate to minimize a local estimation error function.

We proceed to formally state ALGORITHM II (bilateral learning) in Algorithm 2. The notation used in Algorithm 2 is summarized in Table 4.1.

We now set out to analyze ALGORITHM II (bilateral learning). First of all, let us spell out the estimation errors e_i^a(k) and e_i^o(k) as follows:

    e_i^o(k) = p_i(k+1) − p_i(k+1|k) = (Φ_i^o)^T (Ψ_{o,i}(k) − Ψ_i^o(k)),
    e_i^a(k) = u_i(k) − u_i^a(k) = r_i^a(k) + (I + L_i)^{−1} e_i^o(k),    (4.9)

where r_i^a(k) := (Φ_i^a)^T (Ω_{a,i} − Ω_i^a(k)). In addition, we notice that operator i is attempting to identify some time-varying quantities, and the evolution of her time-varying target parameters is given by:

    Ψ_{o,i}(k+1) = Ψ_{o,i}(k) + (μ_i^a / m_i²) Φ_i^a e_i^a(k),    (4.10)

which can be readily obtained from the update rule (4.7) in Algorithm 2 by noting that Ψ_{o,i}(k) = Ω_i^a(k). The following lemma describes a linear relation between the regression vectors Φ_i^a and Φ_i^o. This fact will allow us to quantify the estimation errors of operators which are introduced by the variations of the time-varying target parameters.

Lemma 4.2. The regression vectors of the ith adversary-operator pair satisfy Φ_i^a = Φ_i^o L_i^{−1} M_i.

Proof. It follows from Assumption 4.1 and the non-singularity of the corresponding matrices.

⁴Without loss of any generality, we assume that 0 ∈ T_i^a.
⁵Note that the scheduler just marks the update of the attack policy, but the attacker corrupts the control commands at every k.
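Lemma 4.2 can be sanity-checked numerically. Under Assumption 4.1 all the matrices involved are diagonal, so each coordinate behaves as a scalar; the sketch below (with illustrative values of R_i, P_i and n_i) builds L_ij and M_ij per Section 4.1, stacks Φ_i^o and Φ_i^a blockwise, and verifies Φ_i^a = Φ_i^o L_i^{−1} M_i.

```python
# Scalar-per-coordinate check of Lemma 4.2 under Assumption 4.1.
# R_i, P_i, n_i are illustrative values.
R_i, P_i, n_i = 3.0, 1.0, 4

# Section 4.1 notation (scalars, since all matrices are diagonal):
L_ij = (1.0 / n_i) * P_i / (R_i - P_i)      # L_ij = (1/n_i)(R_i - P_i)^{-1} P_i
L_i = n_i * L_ij
M_ij = (1.0 / (1.0 + L_i)) * (1.0 / P_i) * (P_i / n_i)  # (I+L_i)^{-1} P_i^{-1} P_ij
M_i = n_i * M_ij

# Blockwise regression vectors: (Phi_o)^T = [L_i1 ... L_in], (Phi_a)^T = [M_i1 ... M_in]
phi_o = [L_ij] * n_i
phi_a = [M_ij] * n_i

# Lemma 4.2: Phi_a = Phi_o L_i^{-1} M_i, i.e. each block M_ij = L_ij * L_i^{-1} * M_i
rhs = [b * (1.0 / L_i) * M_i for b in phi_o]
assert all(abs(x - y) < 1e-12 for x, y in zip(phi_a, rhs))
print(M_ij)
```

The identity reduces, blockwise, to M_ij = (1/n_i) L_i · L_i^{−1} · M_i, which is exactly the homogeneity granted by Assumption 4.1.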


Table 4.1
Notations of ALGORITHM II (bilateral learning)

p_i(k) ∈ R^d: the location of vehicle i at time k
p_i(k+1|k) ∈ R^d: the prediction of p_i(k+1) produced by operator i at time k
u_i(k) ∈ R^d: the control command of operator i at time k
u_i^a(k) ∈ R^d: the estimate of u_i(k) maintained by adversary i at time k, given in (4.6)
v_i(k) ∈ R^d: the command generated by adversary i at time k
v_i^o(k) ∈ R^d: the prediction of v_i(k) produced by operator i at time k, given by (4.4)
Ω_{a,i} = [[ν_ij^T]_{j∈N_i}]^T: the target parameter of adversary i
Ω_i^a(k) = [[ν_ij^a(k)^T]_{j∈N_i}]^T: the estimate of Ω_{a,i} produced by adversary i at time k
Ψ_{o,i}(k) = Ω_i^a(k): the target parameter of operator i
Ψ_i^o(k) = [[ν_ij^o(k)^T]_{j∈N_i}]^T: the posterior estimate of Ψ_{o,i}(k−1) produced by operator i at time k
(Φ_i^a)^T := [[M_ij]_{j∈N_i}]: the regression vector of adversary i
(Φ_i^o)^T := [[L_ij]_{j∈N_i}]: the regression vector of operator i
m_i := sqrt(1 + ‖Φ_i^o‖² + ‖Φ_i^a‖²): the normalizing term of group i
μ_i^a ∈ (0, 1]: the step-size of adversary i
μ_i^o ∈ (0, 1]: the step-size of operator i
T_i^a: the scheduler of adversary i

For each k ≥ 1, we denote by τ_i^a(k) the largest time instant in T_i^a that satisfies τ_i^a(k) < k. The following proposition summarizes the convergence properties of the estimation errors of the learning schemes in ALGORITHM II (bilateral learning).

Proposition 4.3. Consider SCENARIO II (bilateral learning) with any initial configuration p(0) ∈ R^{Nd} of vehicles and any initial estimates Ω_i^a(0) and Ψ_i^o(0). Suppose Assumptions 2.1, 2.2 and 4.1 hold and the following inequalities are satisfied:

    −2μ_i^a + 5(μ_i^a)² + (μ_i^a/μ_i^o)² ‖L_i^{−1} M_i‖² + (μ_i^a/μ_i^o)² ‖(I + L_i)^{−1}‖² < 0,
    −2μ_i^o + 4(μ_i^o)² + 4(μ_i^a)² ‖(I + L_i)^{−1}‖² + 2μ_i^a ‖L_i^{−1} M_i‖ ‖(I + L_i)^{−1}‖ < 0.    (4.11)

Then the following statements hold for the sequences generated by ALGORITHM II (bilateral learning):
1. The sequence Ψ_i^o(k) is uniformly bounded.
2. The sequence e_i^o(k) is square summable, and the sequence r_i^a(k) is diminishing.
3. If there is an integer T_B ≥ 1 such that k − τ_i^a(k) ≤ T_B for any k ≥ 0, then the sequence e_i^a(k) is square summable.

Proof. The proof is given in the appendix.

Algorithm 2 ALGORITHM II (bilateral learning) for group i

Initialization: Vehicle i informs operator i of its initial location p_i(0) ∈ R^d. Operator i chooses an initial estimate Ψ_i^o(0), and adversary i chooses an initial estimate Ω_i^a(0).
Iteration: At each k ≥ 0, adversary, operator, and vehicle i execute the following steps:
1: Operator i receives p_j(k) from operator j ∈ N_i, solves the quadratic program (4.3), and obtains the optimal solution u_i(k). Operator i then sends u_i(k) to vehicle i, and generates the prediction p_i(k+1|k) = p_i(k) + u_i(k) + v_i^o(k).
2: Adversary i identifies the location p_i(k) of vehicle i, eavesdrops on p_j(k) sent from operator j ∈ N_i to operator i, and corrupts u_i(k) by adding v_i(k) in (2.3). Adversary i produces an estimate u_i^a(k) of u_i(k) in the way of (4.6), and computes her estimation error e_i^a(k) = u_i(k) − u_i^a(k). If k ∉ T_i^a, then Ω_i^a(k+1) = Ω_i^a(k); otherwise,

    Ω_i^a(k+1) = Ω_i^a(k) + (μ_i^a / m_i²) Φ_i^a e_i^a(k),    (4.7)

with the step-size μ_i^a ∈ (0, 1] and the normalizing term m_i := sqrt(1 + ‖Φ_i^o‖² + ‖Φ_i^a‖²).
3: Vehicle i receives and implements the corrupted command u_i(k) + v_i(k), and then sends back its new location p_i(k+1) = p_i(k) + u_i(k) + v_i(k) to operator i.
4: Operator i computes the estimation error e_i^o(k) = p_i(k+1) − p_i(k+1|k), and updates her parameter estimate in the following manner:

    Ψ_i^o(k+1) = Ψ_i^o(k) + (μ_i^o / m_i²) Φ_i^o e_i^o(k),    (4.8)

with the step-size μ_i^o ∈ (0, 1].
5: Repeat for k = k + 1.

Based on Proposition 4.3, we are able to characterize the asymptotic convergence properties of ALGORITHM II (bilateral learning) as follows.

Theorem 4.4. (Convergence properties of ALGORITHM II (bilateral learning)): Consider SCENARIO II (bilateral learning) with any initial position p(0) ∈ R^{Nd} of vehicles and any initial estimates Ω_i^a(0) and Ψ_i^o(0). Suppose Assumptions 2.1, 2.2 and 4.1 and condition (4.11) hold. Then ALGORITHM II (bilateral learning) ensures that the vehicles asymptotically achieve the desired formation; i.e., lim_{k→+∞} dist(p(k), X*) = 0. Furthermore, the convergence rate of the algorithm can be estimated in such a way that

    Σ_{k=0}^{+∞} Σ_{(i,j)∈E} ‖p_j(k) − p_i(k) − ν_ij‖² < +∞.

Proof. The proof can be found in the appendix.

Through the comparison of Theorem 4.4 and Theorem 3.2, it is not difficult to see that ALGORITHM II (bilateral learning) shares analogous convergence properties with ALGORITHM I (unilateral learning), but requires the additional condition (4.11). The following lemma provides a set of sufficient conditions that ensure (4.11).

Lemma 4.5. The following statements hold:
1. For any pair of step-sizes μ_i^a ∈ (0, 2/5) and μ_i^o ∈ (0, 1/2), there is a P_i such that condition (4.11) holds.
2. For any given triple of μ_i^o ∈ (0, 1/2), ‖(I + L_i)^{−1}‖ and ‖L_i^{−1} M_i‖, there is μ̄_i^a ∈ (0, 2/5) such that for any μ_i^a ∈ (0, μ̄_i^a], condition (4.11) holds.

Proof. Note that −2μ_i^a + 5(μ_i^a)² < 0 and −2μ_i^o + 4(μ_i^o)² < 0 in (4.11) for μ_i^a ∈ (0, 2/5) and μ_i^o ∈ (0, 1/2).

Let us investigate the first statement. If we let R_i − P_i tend to 0, then ‖(I + L_i)^{−1}‖ → 0 and ‖L_i^{−1} M_i‖ → 0. This means that operator i can always choose P_i such that ‖(I + L_i)^{−1}‖ and ‖L_i^{−1} M_i‖ are sufficiently small. As a result, operator i can always choose P_i to enforce condition (4.11). We now consider the second statement. When μ_i^a is sufficiently small, the terms −2μ_i^a and −2μ_i^o dominate in the two inequalities of condition (4.11), respectively. By continuity, there exists μ̄_i^a ∈ (0, 2/5) such that condition (4.11) holds for any μ_i^a ∈ (0, μ̄_i^a].
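The second statement of Lemma 4.5 is easy to probe numerically. The sketch below codes the two inequalities of (4.11) with illustrative scalar values a = ‖L_i^{−1} M_i‖ and b = ‖(I + L_i)^{−1}‖: a small step-size μ_i^a satisfies both, while a μ_i^a close to 2/5 violates the first.

```python
# Numeric probe of condition (4.11); a, b and the step-sizes are illustrative.
a = 0.5   # stands for ||L_i^{-1} M_i||
b = 0.5   # stands for ||(I + L_i)^{-1}||
mu_o = 0.3

def cond_4_11(mu_a, mu_o):
    ineq1 = -2 * mu_a + 5 * mu_a**2 + (mu_a / mu_o) ** 2 * a**2 \
        + (mu_a / mu_o) ** 2 * b**2
    ineq2 = -2 * mu_o + 4 * mu_o**2 + 4 * mu_a**2 * b**2 + 2 * mu_a * a * b
    return ineq1 < 0 and ineq2 < 0

assert cond_4_11(0.01, mu_o)        # a slow adversary satisfies (4.11)
assert not cond_4_11(0.39, mu_o)    # a fast adversary violates it
# largest admissible mu_a on a 0.01 grid for these illustrative values:
print([m / 100 for m in range(1, 40) if cond_4_11(m / 100, mu_o)][-1])
```

This matches the game-theoretic reading at the end of the section: the operators prevail when the adversaries' learning rate is small relative to their own.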

To conclude this section, we leverage singular perturbation theory (e.g., in [18]) to provide an informal interpretation of the conditions in Lemma 4.5. This will help us draw some insights from Proposition 4.3 and Theorem 4.4. From (4.9) and Lemma 4.2, we know the following:

    e_i^a(k) = (Φ_i^o L_i^{−1} M_i)^T (Ω_{a,i} − Ω_i^a(k)) + (I + L_i)^{−1} e_i^o(k).

The first condition in Lemma 4.5 renders ‖(I + L_i)^{−1}‖ and ‖L_i^{−1} M_i‖, and thus ‖e_i^a(k)‖, sufficiently small. The second condition in Lemma 4.5 renders μ_i^a ≈ 0, and thus ‖e_i^a(k)‖ ≈ 0 as well. Hence, under either condition in Lemma 4.5, the dynamics (4.7) approximates Ω_i^a(k+1) ≈ Ω_i^a(k); i.e., the learning dynamics of adversary i evolves on a slow manifold. On the other hand, for any fixed Ω_i^a, (4.8) becomes:

    Ψ_i^o(k+1) = Ψ_i^o(k) + (μ_i^o / m_i²) Φ_i^o (Φ_i^o)^T (Ω_i^a − Ψ_i^o(k)),    (4.12)

and the trajectories of (4.12) asymptotically reach the set {Ψ_i^o | ‖(Φ_i^o)^T (Ψ_i^o − Ω_i^a)‖ = 0}, where the estimation error of operator i vanishes.

If we informally interpret μ_i^a and μ_i^o as the learning rates of adversary i and operator i, respectively, then the second condition in Lemma 4.5 demonstrates that the operators can win the game if their learning rates are sufficiently faster than those of their opponents.

4.4. A numerical example for ALGORITHM II (bilateral learning). In order to compare with the performance of ALGORITHM I (unilateral learning), we consider the same problem where a group of 15 vehicles is initially randomly deployed over the square of 50 × 50. Figure 4.1(e) shows the initial configuration, and Figure 4.1(g) presents the trajectory of each vehicle during the first 100 iterations of the algorithm. The group configuration at the 100th iteration is provided in Figure 4.1(f). The evolution of the formation errors of ALGORITHM II (bilateral learning) in Figure 4.1(h) verifies that the desired formation is achieved at an exponential rate.

The simulations provide some insights into the algorithms. Comparing Figures 3.1(d) and 4.1(h), it can be seen that ALGORITHM I (unilateral learning) converges faster than ALGORITHM II (bilateral learning). Figure 3.1(c) shows that the vehicles stay close to the region where they start, while Figure 4.1(g) shows that the vehicles drift significantly away from the starting area. These two observations confirm that the damage induced by intelligent adversaries is greater.

Fig. 4.1. (e) Initial configuration of vehicles for ALGORITHM II (bilateral learning). (f) The configuration of vehicles at the 100th iteration under ALGORITHM II (bilateral learning). (g) Trajectories of the vehicles during the first 100 iterations of ALGORITHM II (bilateral learning); the green squares stand for initial locations and red circles represent final locations. (h) The evolution of formation errors during the first 100 iterations of ALGORITHM II (bilateral learning).

5. An extension to time-varying inter-operator communication digraphs. So far, we have only considered a fixed communication digraph of operators. ALGORITHM I (unilateral learning), together with Theorem 3.2, can be extended to a simple case of time-varying inter-operator digraphs under some additional assumptions. Let N_i^C(k) ⊆ N_i be the set of operators who can send data to operator i at time k. We define an operator communication digraph G^C(k) := (V, E^C(k)), where E^C(k) := {(j, i) | j ∈ N_i^C(k)}. It can be seen that G^C(k) is a subgraph of G. We slightly modify ALGORITHM I (unilateral learning) as follows. If N_i^C(k) ≠ N_i, then operator i does nothing at this time instant. Since operator i does not send out any information, adversary i and vehicle i have to remain idle at this time instant as well. If N_i^C(k) = N_i, then operator i, adversary i and vehicle i implement one iteration of ALGORITHM I (unilateral learning). In other words, this situation models a type of asynchronous operator interaction under the assumption that vehicles can maintain their positions. To ensure the convergence of the modified algorithm, we require that the set N_i be recovered by the operators sufficiently frequently. Formally, we need the following to hold:

Assumption 5.1. There is some integer T ≥ 1 such that the event N_i^C(k) = N_i occurs at least once within any T consecutive steps.

This assumption, in conjunction with Assumption 2.1, ensures that for all k_0 ≥ 0, the digraph (V, ∪_{k=0}^{B−1} E^C(k_0 + k)) is strongly connected with the integer B := NT. The proof of Theorem 3.2 can be carried out almost exactly by only changing T_k := k(NB − 1) in the proof of Claim 1 of the proofs for Theorem 3.2 in the appendix, as we did in [34]. This extension applies to ALGORITHM II (bilateral learning) as well. The aforementioned solution allows for tolerating unexpected changes of the communication digraphs between operators, but this robustness comes at the expense of potentially slowing down the algorithms. An interesting future research problem is to maintain the convergence rates of the algorithms under switching topologies.
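The connectivity requirement behind this extension (strong connectivity of the union digraph over a window of B = NT steps) can be checked mechanically. The sketch below is a hypothetical helper, not from the paper: it unions the edge sets E^C(k) over a window and tests strong connectivity by reachability searches on the union and on its reverse.

```python
# Hypothetical check that the union of E^C(k) over a window is strongly connected.
def reachable(edges, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def strongly_connected(nodes, edges):
    start = next(iter(nodes))
    fwd = reachable(edges, start)
    bwd = reachable({(b, a) for (a, b) in edges}, start)
    return fwd == nodes and bwd == nodes

V = {1, 2, 3}
# A periodic schedule of operator links E^C(k); each instant alone is too sparse,
# but the union over any window of length 2 is the strongly connected cycle.
schedule = [{(1, 2), (2, 3)}, {(3, 1)}]

def union_window(k0, B):
    out = set()
    for k in range(k0, k0 + B):
        out |= schedule[k % len(schedule)]
    return out

assert all(not strongly_connected(V, schedule[k]) for k in range(2))
assert all(strongly_connected(V, union_window(k0, 2)) for k0 in range(4))
print(union_window(0, 2))
```

Reachability from one node, forward and on the reversed edge set, is a sufficient test of strong connectivity for these small digraphs.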


6. Conclusions. We have studied a distributed formation control problem for an operator-vehicle network which is threatened by a team of adversaries. We have proposed a class of novel attack-resilient distributed formation control algorithms and analyzed their asymptotic convergence properties. Our results have demonstrated the capability of online learning to enhance network resilience, and they suggest a number of future research directions which we plan to investigate. For example, the current operator-vehicle architecture can be enlarged to allow for more complex interactions. Moreover, the types of malicious attacks can be broadened and the models of attackers can be further refined. In addition, it would be interesting to study the cyber-security of other cooperative control problems in the operator-vehicle setting.

7. Appendix. In this section, we provide the proofs of Theorem 3.2, Proposition 4.3 and Theorem 4.4. Before doing so, we give two instrumental facts, the second of which is a direct consequence of the first.

Lemma 7.1. The following statements hold:
1. Let a(k) be a non-negative scalar sequence. If a(k) is summable, then it converges to zero.
2. Consider non-negative scalar sequences V(k) and b(k) such that V(k+1) − V(k) ≤ −b(k). Then it holds that lim_{k→+∞} b(k) = 0.

It is worth remarking that the second fact in Lemma 7.1 is a discrete-time version of Barbalat's lemma (e.g., in [18]) and plays an important role in our subsequent analysis. We are now in a position to prove Theorem 3.2 for ALGORITHM I (unilateral learning).

Proof of Theorem 3.2:
Proof. First of all, note that, through the choice of u_i(k), p_i(k+1|k) is the minimizer in p_i of

    Σ_{j∈N_i} ‖p_j(k) − p_i − ν_ij‖²_{P_ij} + ‖p_i(k) − p_i‖²_{P_ii}.

Therefore,

    p_i(k+1|k) = p_i(k) + Σ_{j∈N_i} P_i^{−1} P_ij (p_j(k) − p_i(k) − ν_ij),

where we use the fact that P_ij is diagonal and positive definite. Recall that e_i(k) = p_i(k+1) − p_i(k+1|k). The above relation leads to:

    p_i(k+1) = p_i(k+1|k) + e_i(k) = p_i(k) + Σ_{j∈N_i} P_i^{−1} P_ij (p_j(k) − p_i(k) − ν_ij) + e_i(k).    (7.1)

Pick any p* := [p_i*]_{i∈V} ∈ X*. Then p_j* − p_i* = ν_ij for any (j, i) ∈ E. Denote y_i(k) = p_i(k) − p_i*, for i ∈ V. Subtracting p_i* on both sides of (7.1) leads to:

    y_i(k+1) = y_i(k) + Σ_{j∈N_i} P_i^{−1} P_ij ((p_j(k) − p_j*) − (p_i(k) − p_i*))
               − Σ_{j∈N_i} P_i^{−1} P_ij (−p_j* + p_i* + ν_ij) + e_i(k)
             = y_i(k) + Σ_{j∈N_i} P_i^{−1} P_ij (y_j(k) − y_i(k)) + e_i(k).    (7.2)

Since the P_ij are diagonal and positive definite, system (7.2) can be viewed as d parallel first-order dynamic consensus algorithms in the variables y_i(k), subject to the time-varying signals e_i(k). We can guarantee convergence of the vehicles to the desired formation if consensus in the y_i(k) is achieved. In other words, lim_{k→+∞} ‖y_i(k) − y_j(k)‖ = 0 for all (i, j) ∈ E is equivalent to lim_{k→+∞} ‖p_i(k) − p_j(k) − (p_i* − p_j*)‖ = 0. Since p_i* − p_j* = ν_ij, consensus on the y_i(k) is equivalent to lim_{k→+∞} ‖p_i(k) − p_j(k) − ν_ij‖ = 0. The rest of the proof is devoted to verifying this consensus property.

For each ℓ ∈ {1, ..., d}, we denote the following:

    g_ℓ(k) := max_{i,j∈V} ‖e_iℓ(k) − e_jℓ(k)‖,    D_ℓ(k) := max_{i,j∈V} ‖y_iℓ(k) − y_jℓ(k)‖.

Here, the quantity D_ℓ(k) represents the maximum disagreement of the ℓth consensus algorithm. The following claim characterizes the input-to-state stability properties of the consensus algorithms, and it is based on the analysis of the dynamic average consensus algorithms in our paper [34]:

Claim 1: There exist β > 0 and σ ∈ (0, 1) such that, for any D_ℓ(0), the following holds:

    D_ℓ(k+1) ≤ σ^{k+1} D_ℓ(0) + β Σ_{s=0}^{k} σ^{k−s} g_ℓ(s).    (7.3)

Proof. Denote T_k := k(N − 1) and, for any integer k ≥ 0, let ℓ_k be the largest integer such that ℓ_k(N − 1) ≤ k. From (16) in the proof of Theorem 4.1 in [34], we know that there exists some η ∈ (0, 1) such that

    D_ℓ(k) ≤ (1 − η)^{ℓ_k} D_ℓ(0) + (1 − η)^{ℓ_k − 1} Σ_{s=0}^{T_1 − 1} g_ℓ(s) + · · ·
             + (1 − η) Σ_{s=T_{ℓ_k − 2}}^{T_{ℓ_k − 1} − 1} g_ℓ(s) + Σ_{s=T_{ℓ_k − 1}}^{T_{ℓ_k} − 1} g_ℓ(s) + Σ_{s=T_{ℓ_k}}^{k−1} g_ℓ(s).

This relation can be rewritten as follows:

    D_ℓ(k) ≤ (1 − η)^{ℓ_k} D_ℓ(0) + Σ_{s=0}^{k−1} (1 − η)^{ℓ_k − ℓ_s} g_ℓ(s).    (7.4)

Since k ≤ ℓ_k(N − 1) and (k − s)/(N − 1) − 1 ≤ ℓ_k − ℓ_s for k ≥ s, it follows from (7.4) that

    D_ℓ(k) ≤ (1 − η)^{k/(N−1)} D_ℓ(0) + Σ_{s=0}^{k−1} (1 − η)^{(k−s)/(N−1) − 1} g_ℓ(s).    (7.5)

We get the desired result by letting σ = (1 − η)^{1/(N−1)} and β = 1/(1 − η) in (7.5).

Define now an auxiliary scalar sequence z(k):

    z(k+1) = σ^{k+1} z(0) + Σ_{s=0}^{k} σ^{k−s} f(s),    k ≥ 0,    (7.6)

where z(0) = max_{ℓ∈{1,...,d}} D_ℓ(0) and f(k) = β max_{ℓ∈{1,...,d}} g_ℓ(k). It is not difficult to verify that z(k) is an upper bound of D_ℓ(k), in the sense that 0 ≤ D_ℓ(k) ≤ z(k) for all k ≥ 0 and ℓ ∈ {1, ..., d}. In order to show the convergence of D_ℓ(k) to zero for any ℓ ∈ {1, ..., d}, it suffices to show that z(k) converges to zero. We do this in the following. Observe that z(k) satisfies the following recursion:

    z(k+1) = σ^{k+1} z(0) + Σ_{s=0}^{k} σ^{k−s} f(s)
           = σ (σ^k z(0) + Σ_{s=0}^{k−1} σ^{k−1−s} f(s)) + f(k) = σ z(k) + f(k).    (7.7)

For any λ > 0, it follows from (7.7) that

    z(k+1)² ≤ (1 + λ) σ² z(k)² + (1 + 1/λ) f(k)²,    (7.8)

by noting that 2σ z(k) f(k) ≤ λ σ² z(k)² + (1/λ) f(k)². From the definition of f(k), it is not difficult to see that f(k)² ≤ 4β² Σ_{i∈V} ‖e_i(k)‖². Therefore, we have the bound:

    z(k+1)² ≤ (1 + λ) σ² z(k)² + 4 (1 + 1/λ) β² Σ_{i∈V} ‖e_i(k)‖².

In the sequel, we choose a (sufficiently small) λ > 0 such that (1 + λ) σ² < 1. The following claim finds a bound for ‖e_i(k)‖² in terms of z(k)², for each i ∈ V.
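The equivalence between the convolution form (7.6) and the one-step recursion (7.7), and the decay of z(k) for a vanishing input, are easy to confirm numerically. The sketch below uses illustrative values σ = 0.8 and a geometrically decaying f(k) = 0.9^k.

```python
# Check (7.6) against (7.7) and observe z(k) -> 0; sigma and f are illustrative.
sigma, z0 = 0.8, 1.0
f = lambda k: 0.9 ** k        # a summable (geometrically decaying) input

def z_closed(k):
    # convolution form (7.6): z(k) = sigma^k z(0) + sum_{s<k} sigma^{k-1-s} f(s)
    return sigma ** k * z0 + sum(sigma ** (k - 1 - s) * f(s) for s in range(k))

z, zs = z0, [z0]
for k in range(60):
    z = sigma * z + f(k)      # recursion (7.7)
    zs.append(z)

assert all(abs(zs[k] - z_closed(k)) < 1e-9 for k in range(61))
assert zs[60] < 0.05
print(zs[60])
```

The recursion form is what makes the quadratic bound (7.8) a one-step Lyapunov-type inequality.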

Claim 2: For each i ∈ V , there are a positive and summable sequence γi(k),and positive constants λ1, λ2, such that the following holds:

‖ei(k)‖2 ≤ γi(k)(1 + ni + λ1z(k)2 + λ2). (7.9)

Furthermore, ‖Θi(k)‖ is uniformly bounded.Proof. Denote Θi(k) := Θi(k)+

1mi(k)2

Φi(k)ei(k)T . Subtracting Θi on both sides

leads to the following:

Θi(k)−Θi = (Θi(k)−Θi) +1

mi(k)2Φi(k)ei(k)

T . (7.10)

Recall that ‖A‖2F =

m∑

i=1

n∑

j=1

a2ij for a matrix A ∈ Rm×n. Similarly to the vector

normalized gradient algorithm in [5], one can compute 12‖Θi(k)−Θi‖2F = 1

2 tr((Θi(k)−Θi)

T (Θi(k)−Θi)), just plugging in (7.10), as follows:

1

2‖Θi(k)−Θi‖2F =

1

2‖Θi(k)−Θi‖2F − 1

2mi(k)2tr(

ei(k)(2−Φi(k)

TΦi(k)

mi(k)2)ei(k)

T)

,

(7.11)

where we use the fact that tr is a linear operator, and that ei(k) = (Θi−Θi(k))TΦi(k).

As a consequence, the difference ½‖Θ̄i(k) − Θi‖_F² − ½‖Θi(k) − Θi‖_F² can be characterized in the following way:

\[
\frac{1}{2}\|\bar{\Theta}_i(k) - \Theta_i\|_F^2 - \frac{1}{2}\|\Theta_i(k) - \Theta_i\|_F^2
\le -\frac{1}{2 m_i(k)^2}\,\mathrm{tr}\big( e_i(k) e_i(k)^T \big)
= -\frac{\|e_i(k)\|^2}{2 m_i(k)^2}, \tag{7.12}
\]


where we have used that 2 − Φi(k)ᵀΦi(k)/mi(k)² ≥ 1, since mi(k) is a normalizing term.

Since the projection operator Pi is applied block-wise, then ‖Θi(k + 1) − Θi‖_F² ≤ ‖Θ̄i(k) − Θi‖_F². Then from (7.12) we have:

\[
\|\Theta_i(k+1) - \Theta_i\|_F^2 - \|\Theta_i(k) - \Theta_i\|_F^2 \le -\frac{\|e_i(k)\|^2}{m_i(k)^2}. \tag{7.13}
\]

The above relation implies that ‖Θi(k) − Θi‖_F² is non-increasing and uniformly bounded. Further, this ensures that ‖Θi(k)‖ is uniformly bounded by noting that:

\[
\|\Theta_i(k)\|^2 = \|(\Theta_i(k) - \Theta_i) + \Theta_i\|^2 \le \|(\Theta_i(k) - \Theta_i) + \Theta_i\|_F^2 \le 2\|\Theta_i(k) - \Theta_i\|_F^2 + 2\|\Theta_i\|_F^2.
\]

Denote γi(k) := ‖Θi(k) − Θi‖_F² − ‖Θi(k + 1) − Θi‖_F². It is noted that

\[
\sum_{k=0}^{K} \gamma_i(k) = \|\Theta_i(0) - \Theta_i\|_F^2 - \|\Theta_i(K+1) - \Theta_i\|_F^2.
\]
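The telescoping identity above is elementary; a small numerical illustration (with a hypothetical non-increasing sequence standing in for ‖Θi(k) − Θi‖_F²) confirms that the partial sums of γi(k) are bounded, hence γi(k) is summable.

```python
# Hypothetical non-increasing sequence standing in for ||Theta_i(k) - Theta_i||_F^2.
V = [4.0 / (k + 1) for k in range(51)]
gamma = [V[k] - V[k + 1] for k in range(50)]  # gamma_i(k), non-negative by monotonicity
assert all(g >= 0 for g in gamma)
# Telescoping: the partial sum equals the difference of the endpoints...
assert abs(sum(gamma) - (V[0] - V[50])) < 1e-12
# ...so every partial sum is bounded by V[0], and gamma is summable.
assert sum(gamma) <= V[0]
```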

The previous discussion implies that the sequence γi(k) is non-negative, summable, and thus converges to zero by Lemma 7.1. Now, from (7.13) we obtain the following upper bound on ‖ei(k)‖² in terms of γi(k):

\[
\|e_i(k)\|^2 \le \gamma_i(k)\, m_i(k)^2 = \gamma_i(k)\big(1 + \|\Phi_i(k)\|^2\big) \le \gamma_i(k)\big(1 + n_i + \|\phi_i(k)\|^2\big). \tag{7.14}
\]

We would like to find now a relation between ‖φi(k)‖ and z(k). To do this, recover from (3.3) the expression for ui(k):

\[
u_i(k) = \Big(I + \sum_{j\in N_i} L_{ij}(k)\Big)^{-1} \Big( \sum_{j\in N_i} P_i^{-1} P_{ij}\big(y_j(k) - y_i(k)\big) + \sum_{j\in N_i} L_{ij}(k)\big(y_j(k) - y_i(k)\big) + \sum_{j\in N_i} \eta_{ij}(k) \Big). \tag{7.15}
\]

Recall that Lij(k) and Pij are positive definite and diagonal. This gives us that ‖(I + Σ_{j∈Ni} Lij(k))^{-1}‖ ≤ 1. Now, it follows from (7.15) that

\[
\|u_i(k)\| \le \sum_{j\in N_i} \|P_i^{-1} P_{ij}\|\, \sqrt{d}\, z(k) + \sum_{j\in N_i} \sqrt{d}\, \|L_{ij}(k)\|\, z(k) + \sum_{j\in N_i} \|\eta_{ij}(k)\|.
\]

Since ‖Θi(k)‖ is uniformly bounded, there exist θ1, θ2 > 0 such that ‖ui(k)‖ ≤ θ1 z(k) + θ2, for all k ≥ 0 and all i ∈ V. Notice that φi(k) can be rewritten as follows:

\[
\phi_i(k) := \begin{bmatrix} y_{i_1}(k) - y_i(k) - u_i(k) \\ \vdots \\ y_{i_{n_i}}(k) - y_i(k) - u_i(k) \end{bmatrix}.
\]

This implies that there exist λ1, λ2 > 0 such that the following holds for all k ≥ 0 and all i ∈ V:

\[
\|\phi_i(k)\|^2 \le \lambda_1 z(k)^2 + \lambda_2,
\qquad
\|e_i(k)\|^2 \le \gamma_i(k)\big(1 + n_i + \|\phi_i(k)\|^2\big) \le \gamma_i(k)\big(1 + n_i + \lambda_1 z(k)^2 + \lambda_2\big).
\]


Using the upper bound on ‖ei(k)‖², and the uniform bound on ‖Θi(k)‖, we can now obtain an inequality involving z(k)² and other diminishing terms. This is used to determine the stability properties of z(k)².

Claim 3: The sequence z(k) is square summable.
Proof. From the recursion for z(k), we find that

\[
z(k+1)^2 \le (1+\lambda)\sigma^2 z(k)^2 + 4\Big(1+\frac{1}{\lambda}\Big)\beta^2 \sum_{i\in V} \|e_i(k)\|^2, \tag{7.16}
\]

where a (sufficiently small) λ > 0 is chosen such that (1 + λ)σ² ∈ (0, 1). We now define

\[
V(k) := z(k)^2 + \sum_{i\in V} \|e_i(k)\|^2
\]

to be a Lyapunov function candidate for ALGORITHM I (unilateral learning), and have that:

\[
V(k+1) - V(k) = z(k+1)^2 + \sum_{i\in V} \|e_i(k+1)\|^2 - z(k)^2 - \sum_{i\in V} \|e_i(k)\|^2
\le z(k+1)^2 + \sum_{i\in V} \|e_i(k+1)\|^2 - z(k)^2.
\]

Using now the bound for ‖ei(k + 1)‖² in Claim 2, we obtain:

\[
V(k+1) - V(k) \le \Big(1 + \lambda_1 \sum_{i\in V} \gamma_i(k+1)\Big) z(k+1)^2 + \sum_{i\in V} \gamma_i(k+1)\big(1 + n_i + \lambda_2\big) - z(k)^2.
\]

Finally, upper-bounding z(k + 1)² as in (7.16), we get:

\[
V(k+1) - V(k) \le \Big(1 + \lambda_1 \sum_{i\in V} \gamma_i(k+1)\Big)\Big((1+\lambda)\sigma^2 z(k)^2 + 4\Big(1+\frac{1}{\lambda}\Big)\beta^2 \sum_{i\in V} \|e_i(k)\|^2\Big) - z(k)^2 + \sum_{i\in V} \gamma_i(k+1)\big(1 + n_i + \lambda_2\big). \tag{7.17}
\]

Substituting the upper bound on ‖ei(k)‖² from (7.9) of Claim 2 into (7.17), we find that there exist α ∈ (0, 1) and two sequences π1(k) and π2(k) such that

\[
V(k+1) - V(k) \le \big(\alpha - 1 + \pi_1(k)\big) z(k)^2 + \pi_2(k), \tag{7.18}
\]

where π1(k) is positive and diminishing and π2(k) is positive and summable, by using that each sequence γi(k) is summable. Since π1(k) is diminishing, there is a finite K ≥ 0 such that π1(k) ≤ (1 − α)/2, and thus 1 − α − π1(k) ≥ (1 − α)/2, for all k ≥ K. Then, for k ≥ K, we have the following:

\[
\frac{1-\alpha}{2}\, z(k)^2 \le \big(1 - \alpha - \pi_1(k)\big) z(k)^2 \le V(k) - V(k+1) + \pi_2(k).
\]

This implies that

\[
\frac{1-\alpha}{2} \sum_{k=K}^{+\infty} z(k)^2 \le V(K) + \sum_{k=K}^{+\infty} \pi_2(k). \tag{7.19}
\]

Upper-bounding ‖ei(k)‖² by (7.9) from Claim 2 in the recursion (7.16), it can be found that z(k) is finite for any finite k. As a consequence, ei(k), and thus V(k), are


finite for every finite time. In this way, V(K) is finite in (7.19) and, since π2(k) is summable, so is z(k)².
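The telescoping step from (7.18) to (7.19) can be checked on a toy instance in which the per-step inequality holds with equality; all sequences below are hypothetical and chosen only to satisfy the stated hypotheses.

```python
# Toy instance of the summation argument behind (7.18)-(7.19).
alpha = 0.5
N = 2000
pi1 = [1.0 / (k + 2) ** 2 for k in range(N)]  # positive and diminishing
pi2 = [1.0 / (k + 2) ** 2 for k in range(N)]  # positive and summable
z2 = [1.0 / (k + 1) for k in range(N)]        # stand-in values for z(k)^2
V = [50.0]
for k in range(N - 1):
    # Worst case: the per-step inequality (7.18) holds with equality.
    V.append(V[k] + (alpha - 1 + pi1[k]) * z2[k] + pi2[k])
assert all(v >= 0 for v in V)  # V remains a valid (non-negative) Lyapunov value
# Summing the per-step inequality telescopes into the bound of (7.19):
lhs = sum((1 - alpha - pi1[k]) * z2[k] for k in range(N - 1))
rhs = V[0] - V[N - 1] + sum(pi2[:N - 1])
assert lhs <= rhs + 1e-6
```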

Claim 3 guarantees that z(k), and thus Dℓ(k) for all ℓ ∈ {1, ..., d}, converge to zero by Lemma 7.1. Therefore, p(k) asymptotically converges to the set X*. In order to estimate the convergence rate, note that

\[
\sum_{k=0}^{+\infty} \sum_{(i,j)\in E} \|p_j(k) - p_i(k) - \nu_{ij}\|^2
= \sum_{k=0}^{+\infty} \sum_{(i,j)\in E} \|y_j(k) - y_i(k)\|^2
\le d\,|E| \sum_{k=0}^{+\infty} z(k)^2 < +\infty,
\]

where |E| is the cardinality of E, and in the last inequality we use the summability of z(k)² from Claim 3. This completes the proof of Theorem 3.2.

We now turn our attention to show Proposition 4.3 for ALGORITHM II (bilateral learning).

The proof of Proposition 4.3:

Proof. We will divide the proof into several claims.

Claim 4: For adversary i, the following relation holds when k ∈ T_i^a:

\[
\|\Omega_i^a(k+1) - \Omega^{a,i}\|_F^2 - \|\Omega_i^a(k) - \Omega^{a,i}\|_F^2
\le -2\big(\mu_i^a - (\mu_i^a)^2\big)\frac{\|r_i^a(k)\|^2}{m_i^2}
+ 2\mu_i^a \|(I + L_i)^{-1}\|\, \frac{\|r_i^a(k)\|\,\|e_i^o(k)\|}{m_i^2}
+ 2(\mu_i^a)^2 \|(I + L_i)^{-1}\|^2\, \frac{\|e_i^o(k)\|^2}{m_i^2}. \tag{7.20}
\]

If k ∉ T_i^a, then the following holds:

\[
\|\Omega_i^a(k+1) - \Omega^{a,i}\|_F^2 = \|\Omega_i^a(k) - \Omega^{a,i}\|_F^2. \tag{7.21}
\]

Proof. First of all, we notice that, analogous to (7.11), the following holds for adversary i when k ∈ T_i^a:

\[
\|\Omega_i^a(k+1) - \Omega^{a,i}\|_F^2 = \|\Omega_i^a(k) - \Omega^{a,i}\|_F^2
+ \Big(\frac{\mu_i^a}{m_i^2}\Big)^2 \mathrm{tr}\big( e_i^a(k)^T (\Phi_i^a)^T \Phi_i^a e_i^a(k) \big)
+ 2\,\mathrm{tr}\Big( (\Omega_i^a(k) - \Omega^{a,i})^T\, \frac{\mu_i^a}{m_i^2}\, \Phi_i^a e_i^a(k) \Big). \tag{7.22}
\]

For the last term on the right-hand side of the relation (7.22), we have

\[
(\Omega_i^a(k) - \Omega^{a,i})^T\, \frac{\mu_i^a}{m_i^2}\, \Phi_i^a e_i^a(k)
= (\Omega_i^a(k) - \Omega^{a,i})^T\, \frac{\mu_i^a}{m_i^2}\, \Phi_i^a \big( r_i^a(k) + (I + L_i)^{-1} e_i^o(k) \big)
= -\frac{\mu_i^a}{m_i^2}\, r_i^a(k)^T r_i^a(k) - \frac{\mu_i^a}{m_i^2}\, r_i^a(k)^T (I + L_i)^{-1} e_i^o(k). \tag{7.23}
\]

The trace of the second term on the right-hand side of (7.23) can be upper bounded in the following way:

\[
\frac{\mu_i^a}{m_i^2}\, \big\| \mathrm{tr}\big( r_i^a(k)^T (I + L_i)^{-1} e_i^o(k) \big) \big\|
\le \|(I + L_i)^{-1}\|\, \frac{\mu_i^a}{m_i^2}\, \|r_i^a(k)\|\, \|e_i^o(k)\|. \tag{7.24}
\]

Let us consider the second term on the right-hand side of the relation (7.22). Note that (Φ_i^a)ᵀΦ_i^a = diag(M_{ij}²) is a diagonal matrix, from the fact that M_{ij} is a diagonal matrix.


Using the definition of m_i as a normalizing term and e_i^a(k) = r_i^a(k) + (I + L_i)^{-1} e_i^o(k), we have

\[
\Big(\frac{\mu_i^a}{m_i^2}\Big)^2 \mathrm{tr}\big( e_i^a(k)^T (\Phi_i^a)^T \Phi_i^a e_i^a(k) \big)
\le \frac{(\mu_i^a)^2}{m_i^2}\, \|e_i^a(k)\|^2
\le \frac{2(\mu_i^a)^2}{m_i^2} \big( \|r_i^a(k)\|^2 + \|(I + L_i)^{-1}\|^2 \|e_i^o(k)\|^2 \big), \tag{7.25}
\]

where in the last inequality we use the relations ‖a + b‖² ≤ 2(‖a‖² + ‖b‖²) and ‖cd‖ ≤ ‖c‖‖d‖. Substituting the bounds of (7.24) and (7.25) into (7.22), we obtain the desired relation (7.20) for k ∈ T_i^a by using the fact that tr is a linear operator. The relation for k ∉ T_i^a is trivial to verify.

Claim 5: For operator i, the following relation holds when k ∈ T_i^a:

\[
\|\Psi_i^o(k+1) - \Psi^{o,i}(k+1)\|_F^2 - \|\Psi_i^o(k) - \Psi^{o,i}(k)\|_F^2
\le \Big( -2\mu_i^o + (\mu_i^o)^2 + 2(\mu_i^a)^2 \|(I + L_i)^{-1}\|^2 + 2\mu_i^a \|L_i^{-1} M_i\|\, \|(I + L_i)^{-1}\| \Big) \frac{\|e_i^o(k)\|^2}{m_i^2}
+ 2(\mu_i^a)^2\, \frac{\|r_i^a(k)\|^2}{m_i^2}
+ 2\big( \mu_i^a \|L_i^{-1} M_i\| + \mu_i^o \mu_i^a \big) \frac{\|r_i^a(k)\|\, \|e_i^o(k)\|}{m_i^2}. \tag{7.26}
\]

If k ∉ T_i^a, then the following holds:

\[
\|\Psi_i^o(k+1) - \Psi^{o,i}(k+1)\|_F^2 - \|\Psi_i^o(k) - \Psi^{o,i}(k)\|_F^2 \le \big( -2\mu_i^o + (\mu_i^o)^2 \big) \frac{\|e_i^o(k)\|^2}{m_i^2}. \tag{7.27}
\]

Proof. We first discuss the case when both adversary i and operator i update their estimates at time k. Note that the following holds for operator i:

\[
\Psi_i^o(k+1) - \Psi^{o,i}(k+1) = \Psi_i^o(k) + \frac{\mu_i^o}{m_i^2}\, \Phi_i^o e_i^o(k) - \Psi^{o,i}(k) - \frac{\mu_i^a}{m_i^2}\, \Phi_i^a e_i^a(k). \tag{7.28}
\]

Analogous to (7.11), it follows from (7.28) that

\[
\|\Psi_i^o(k+1) - \Psi^{o,i}(k+1)\|_F^2 = \|\Psi_i^o(k) - \Psi^{o,i}(k)\|_F^2
+ \Big\| \frac{\mu_i^o}{m_i^2}\, \Phi_i^o e_i^o(k) - \frac{\mu_i^a}{m_i^2}\, \Phi_i^a e_i^a(k) \Big\|_F^2
+ \frac{2\mu_i^o}{m_i^2}\, \mathrm{tr}\big( (\Psi_i^o(k) - \Psi^{o,i}(k))^T \Phi_i^o e_i^o(k) \big)
- \frac{2\mu_i^a}{m_i^2}\, \mathrm{tr}\big( (\Psi_i^o(k) - \Psi^{o,i}(k))^T \Phi_i^a e_i^a(k) \big). \tag{7.29}
\]

One can verify that

\[
(\Psi_i^o(k) - \Psi^{o,i}(k))^T\, \frac{\mu_i^o}{m_i^2}\, \Phi_i^o e_i^o(k) = -\frac{\mu_i^o}{m_i^2}\, e_i^o(k)^T e_i^o(k), \tag{7.30}
\]

which produces the following upper bound for the term of (7.29) involving Φ_i^o e_i^o(k):

\[
\big\| \mathrm{tr}\big( (\Psi_i^o(k) - \Psi^{o,i}(k))^T\, \frac{\mu_i^o}{m_i^2}\, \Phi_i^o e_i^o(k) \big) \big\|
\le \frac{\mu_i^o}{m_i^2}\, \|e_i^o(k)\|^2. \tag{7.31}
\]


From (7.30), we can derive the following upper bound for the term of (7.29) involving Φ_i^a e_i^a(k):

\[
\Big\| (\Psi_i^o(k) - \Psi^{o,i}(k))^T\, \frac{\mu_i^a}{m_i^2}\, \Phi_i^a e_i^a(k) \Big\|
= \frac{\mu_i^a}{m_i^2}\, \big\| (\Psi_i^o(k) - \Psi^{o,i}(k))^T \Phi_i^o L_i^{-1} M_i e_i^a(k) \big\|
= \frac{\mu_i^a}{m_i^2}\, \big\| e_i^o(k)^T L_i^{-1} M_i e_i^a(k) \big\|
\le \mu_i^a \|L_i^{-1} M_i\|\, \frac{\|e_i^o(k)\|}{m_i}\, \frac{\|e_i^a(k)\|}{m_i}
\le \mu_i^a \|L_i^{-1} M_i\|\, \frac{\|e_i^o(k)\|}{m_i} \Big( \frac{\|r_i^a(k)\|}{m_i} + \|(I + L_i)^{-1}\|\, \frac{\|e_i^o(k)\|}{m_i} \Big), \tag{7.32}
\]

where in the first equality we use Lemma 4.2, in the second equality we use the definition of e_i^o(k), and the last inequality follows from the definition of e_i^a(k). The combination of (7.29), (7.31) and (7.32) gives (7.26). When k ∉ T_i^a, then Ψ_i^o(k + 1) = Ψ_i^o(k) and thus (7.26) reduces to (7.27).

Denote the following to characterize the estimation errors of the ith group:

\[
U_i(k) := \|\Omega_i^a(k) - \Omega^{a,i}\|_F^2 + \|\Psi_i^o(k) - \Psi^{o,i}(k)\|_F^2.
\]

With the two claims just proved, one can characterize U_i(k + 1) − U_i(k) as follows:

\[
U_i(k+1) - U_i(k) \le \frac{1}{m_i^2}
\begin{bmatrix} \|r_i^a(k)\| & \|e_i^o(k)\| \end{bmatrix}
\Pi_i(k)
\begin{bmatrix} \|r_i^a(k)\| \\ \|e_i^o(k)\| \end{bmatrix}, \tag{7.33}
\]

where the time-varying matrix Π_i(k) is given as follows: if k ∈ T_i^a, then

\[
\Pi_i(k) = \Pi_i^{(1)} = \begin{bmatrix} \xi_1 & 0 \\ 0 & \xi_2 \end{bmatrix},
\]

with

\[
\xi_1 := -2\mu_i^a + 5(\mu_i^a)^2 + \Big(\frac{\mu_i^a \|L_i^{-1} M_i\|}{\mu_i^o}\Big)^2 + \Big(\frac{\|(I + L_i)^{-1}\|\, \mu_i^a}{\mu_i^o}\Big)^2
\]

and

\[
\xi_2 := -2\mu_i^o + 4(\mu_i^o)^2 + 4(\mu_i^a)^2 \|(I + L_i)^{-1}\|^2 + 2\mu_i^a \|L_i^{-1} M_i\|\, \|(I + L_i)^{-1}\|;
\]

otherwise,

\[
\Pi_i(k) = \Pi_i^{(2)} = \begin{bmatrix} 0 & 0 \\ 0 & -2\mu_i^o + (\mu_i^o)^2 \end{bmatrix}.
\]

Claim 6: The matrix Π_i^{(2)} is negative semi-definite and Π_i^{(1)} is negative definite.
Proof. Since μ_i^o ∈ (0, 1), it is easy to see that Π_i^{(2)} is negative semi-definite. From (4.11), it can be seen that Π_i^{(1)} is negative definite.

Claim 7: The sequence Ψ_i^o(k) is uniformly bounded. Furthermore, the sequence r_i^a(k) is diminishing, and the sequence e_i^o(k) is square summable.
Proof. It follows from Claim 6 and (7.33) that the sequence U_i(k) is non-increasing and uniformly bounded. Since ‖Ω_i^a(k) − Ω^{a,i}‖_F² and ‖Ψ_i^o(k) − Ψ^{o,i}(k)‖_F² are non-negative, they are uniformly bounded. Since Ω^{a,i} is constant, Ω_i^a(k), and thus Ψ^{o,i}(k), are uniformly bounded. This further implies that Ψ_i^o(k) is uniformly bounded. Summing (7.33) over [0, K], we have the following relation:

\[
-\frac{\lambda_{\max}(\Pi_i^{(1)})}{m_i^2} \sum_{0 \le k \le K,\, k \in T_i^a} \big( \|e_i^o(k)\|^2 + \|r_i^a(k)\|^2 \big)
+ \frac{2\mu_i^o - (\mu_i^o)^2}{m_i^2} \sum_{0 \le k \le K,\, k \notin T_i^a} \|e_i^o(k)\|^2
\le U_i(0) - U_i(K+1) < +\infty.
\]


This implies that the sequence ‖e_i^o(k)‖² and the subsequence {‖r_i^a(k)‖²}_{k∈T_i^a} are summable. Then the subsequence {‖r_i^a(k)‖²}_{k∈T_i^a} is diminishing by Lemma 7.1.

Notice that r_i^a(s) = r_i^a(τ_i^a(k) + 1) for all τ_i^a(k) + 1 ≤ s ≤ k. We are now in the position to show the convergence of the whole sequence {r_i^a(k)}_{k≥0}. Pick any ε > 0; there is K(ε) ≥ 0 such that the following holds for any k′, k″ ≥ K(ε) with k′, k″ ∈ T_i^a:

\[
\|r_i^a(k') - r_i^a(k'')\| \le \epsilon. \tag{7.34}
\]

Pick any k1, k2 ≥ K(ε); then ‖r_i^a(k1) − r_i^a(k2)‖ can be characterized as follows:

\[
\|r_i^a(k_1) - r_i^a(k_2)\| = \|r_i^a(\tau_i^a(k_1)) - r_i^a(\tau_i^a(k_2))\| \le \epsilon, \tag{7.35}
\]

where the last inequality is a result of (7.34). As a result, the sequence {r_i^a(k)}_{k≥0} is a Cauchy sequence and thus converges. Since {r_i^a(k)}_{k∈T_i^a} is a subsequence of {r_i^a(k)}_{k≥0}, it implies that {r_i^a(k)}_{k≥0} has the same limit as {r_i^a(k)}_{k∈T_i^a}, and thus {r_i^a(k)}_{k≥0} goes to zero. Since e_i^a(k) = r_i^a(k) + (I + L_i)^{-1} e_i^o(k), this gives that e_i^a(k) is diminishing.

Claim 8: If k − τ_i^a(k) ≤ T_B, then the sequence e_i^a(k) is square summable.
Proof. Since k − τ_i^a(k) ≤ T_B, we have

\[
\sum_{k=0}^{+\infty} \|r_i^a(k)\|^2 \le T_B \sum_{k \in T_i^a} \|r_i^a(k)\|^2 < +\infty. \tag{7.36}
\]
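The counting argument behind (7.36) — r_i^a is held constant between updates, and each held value is repeated at most T_B times — can be illustrated on a randomly generated update schedule (all quantities below are hypothetical).

```python
import random

random.seed(3)
TB = 5    # bound on the inter-update interval, standing in for T_B
N = 200
# Hypothetical update times with gaps of at most TB steps.
updates = [0]
while updates[-1] < N:
    updates.append(updates[-1] + random.randint(1, TB))
update_set = set(updates)
r = []
val = 0.0
for k in range(N):
    if k in update_set:
        val = random.random()  # fresh residual value at an update time
    r.append(val)              # r is held constant between updates
total = sum(x ** 2 for x in r)
at_updates = sum(r[k] ** 2 for k in update_set if k < N)
# Each held value occurs at most TB times, which gives (7.36):
assert total <= TB * at_updates + 1e-9
```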

Recall that e_i^a(k) = r_i^a(k) + (I + L_i)^{-1} e_i^o(k). It follows from the square summability of e_i^o(k) and r_i^a(k) that e_i^a(k) is square summable. This completes the proof of Proposition 4.3.

We are now ready to show Theorem 4.4 for ALGORITHM II (bilateral learning).

The proof of Theorem 4.4:
Proof. The proof is analogous to that of Theorem 3.2, and we only provide its sketch here. From Proposition 4.3, we know that e_i^o(k) is square summable and Ψ_i^o(k) is uniformly bounded. This result is the counterpart of Claim 2 in the proof of Theorem 3.2. The remainder of the proof can be finished by following analogous lines to Claim 1 and Claim 3 in the proof of Theorem 3.2. The details are omitted here.

REFERENCES

[1] T. Alpcan and T. Basar. Network Security: A Decision and Game Theoretic Approach. Cambridge University Press, 2011.
[2] S. Amin, A. Cardenas, and S.S. Sastry. Safe and secure networked control systems under denial-of-service attacks. In Hybrid Systems: Computation and Control, pages 31–45, 2009.
[3] S. Amin, X. Litrico, S.S. Sastry, and A.M. Bayen. Stealthy deception attacks on water SCADA systems. In Hybrid Systems: Computation and Control, pages 161–170, Stockholm, Sweden, 2010.
[4] S. Amin, G.A. Schwartz, and S.S. Sastry. Security of interdependent and identical networked control systems. Automatica, 49(1):186–192, 2013.
[5] K.J. Astrom and B. Wittenmark. Adaptive Control. Dover Publications, 2008.
[6] G.K. Befekadu, V. Gupta, and P.J. Antsaklis. Risk-sensitive control under a class of denial-of-service attack models. In American Control Conference, pages 643–648, San Francisco, USA, June 2011.
[7] S. Bhattacharya and T. Basar. Game-theoretic analysis of an aerial jamming attack on a UAV communication network. In American Control Conference, pages 818–823, Baltimore, USA, June 2010.
[8] S. Bhattacharya and T. Basar. Graph-theoretic approach for connectivity maintenance in mobile networks in the presence of a jammer. In IEEE Int. Conf. on Decision and Control, pages 3560–3565, Atlanta, USA, December 2010.
[9] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[10] M.S. Branicky, S.M. Phillips, and W. Zhang. Stability of networked control systems: explicit analysis of delay. In American Control Conference, pages 2352–2357, Chicago, USA, 2000.
[11] R.W. Brockett and D. Liberzon. Quantized feedback stabilization of linear systems. IEEE Transactions on Automatic Control, 45(7):1279–1289, 2000.
[12] M. Cao, A.S. Morse, and B.D.O. Anderson. Reaching a consensus in a dynamically changing environment - convergence rates, measurement delays and asynchronous events. SIAM Journal on Control and Optimization, 47(2):601–623, 2008.
[13] J. Cortes. Global and robust formation-shape stabilization of relative sensing networks. Automatica, 45(12):2754–2762, 2009.
[14] W.B. Dunbar and R.M. Murray. Distributed receding horizon control for multi-vehicle formation stabilization. Automatica, 42(4):549–558, 2006.
[15] A. Gupta, C. Langbort, and T. Basar. Optimal control in the presence of an intelligent jammer with limited actions. In IEEE Int. Conf. on Decision and Control, pages 1096–1101, Atlanta, USA, December 2010.
[16] A. Jadbabaie, J. Lin, and A.S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control, 48(6):988–1001, 2003.
[17] T. Jiang, I. Matei, and J.S. Baras. A trust based distributed Kalman filtering approach for mode estimation in power systems. In Proceedings of The First Workshop on Secure Control Systems, pages 1–6, Stockholm, Sweden, April 2010.
[18] H.K. Khalil. Nonlinear Systems. Prentice Hall, 3rd edition, 2002.
[19] D. Liberzon and J.P. Hespanha. Stabilization of nonlinear systems with limited information feedback. IEEE Transactions on Automatic Control, 50(6):910–915, 2005.
[20] Y. Mo, E. Garone, A. Casavola, and B. Sinopoli. False data injection attacks against state estimation in wireless sensor networks. In IEEE Int. Conf. on Decision and Control, pages 5967–5972, Atlanta, USA, December 2010.
[21] D. Nesic and A. Teel. Input-output stability properties of networked control systems. IEEE Transactions on Automatic Control, 49(10):1650–1667, 2004.
[22] R. Olfati-Saber and R.M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004.
[23] F. Pasqualetti, A. Bicchi, and F. Bullo. Consensus computation in unreliable networks: A system theoretic approach. IEEE Transactions on Automatic Control, 57(1):90–104, February 2012.
[24] F. Pasqualetti, R. Carli, and F. Bullo. A distributed method for state estimation and false data detection in power networks. In IEEE Int. Conf. on Smart Grid Communications, pages 469–474, October 2011.
[25] F. Pasqualetti, F. Dorfler, and F. Bullo. Attack detection and identification in cyber-physical systems. IEEE Transactions on Automatic Control, 58(11):2715–2729, 2013.
[26] S. Roy, C. Ellis, S. Shiva, D. Dasgupta, V. Shandilya, and Q. Wu. A survey of game theory as applied to network security. In Int. Conf. on Systems Sciences, pages 1–10, Hawaii, USA, 2010.
[27] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M.I. Jordan, and S.S. Sastry. Kalman filtering with intermittent observations. IEEE Transactions on Automatic Control, 49(9):1453–1464, 2004.
[28] S. Sundaram and C.N. Hadjicostis. Distributed function calculation via linear iterative strategies in the presence of malicious agents. IEEE Transactions on Automatic Control, 56(7):1731–1742, 2011.
[29] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[30] A. Teixeira, S. Amin, H. Sandberg, K.H. Johansson, and S.S. Sastry. Cyber security analysis of state estimators in electric power systems. In IEEE Int. Conf. on Decision and Control, pages 5991–5998, Atlanta, USA, December 2010.
[31] A. Teixeira, D. Perez, H. Sandberg, and K.H. Johansson. Attack models and scenarios for networked control systems. In Int. Conf. on High Confidence Networked Systems, pages 55–64, 2012.
[32] G. Theodorakopoulos and J.S. Baras. Game theoretic modeling of malicious users in collaborative networks. IEEE Journal of Selected Areas in Communications, 26(7):1317–1327, 2008.
[33] L. Xie, Y. Mo, and B. Sinopoli. False data injection attacks in electricity markets. In IEEE Int. Conf. on Smart Grid Communications, pages 226–231, Gaithersburg, USA, October 2010.
[34] M. Zhu and S. Martínez. Discrete-time dynamic average consensus. Automatica, 46(2):322–329, 2010.
[35] M. Zhu and S. Martínez. Attack-resilient distributed formation control via online adaptation. In IEEE Int. Conf. on Decision and Control, pages 6624–6629, Orlando, FL, USA, December 2011.
[36] M. Zhu and S. Martínez. Stackelberg game analysis of correlated attacks in cyber-physical system. In American Control Conference, pages 4063–4068, June 2011.
[37] M. Zhu and S. Martínez. On the performance of resilient networked control systems under replay attacks. IEEE Transactions on Automatic Control, 59(3):804–808, 2014.