IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015

An Optimal Control Approach to the Multi-Agent Persistent Monitoring Problem in Two-Dimensional Spaces

Xuchao Lin and Christos G. Cassandras

Abstract—We address the persistent monitoring problem in two-dimensional mission spaces where the objective is to control the trajectories of multiple cooperating agents to minimize an uncertainty metric. In a one-dimensional mission space, we have shown that the optimal solution is for each agent to move at maximal speed and switch direction at specific points, possibly waiting some time at each such point before switching. In a two-dimensional mission space, such simple solutions can no longer be derived. An alternative is to optimally assign each agent a linear trajectory, motivated by the one-dimensional analysis. We prove, however, that elliptical trajectories outperform linear ones. With this motivation, we formulate a parametric optimization problem in which we seek to determine such trajectories. We show that the problem can be solved using Infinitesimal Perturbation Analysis (IPA) to obtain performance gradients on line and obtain a complete and scalable solution. Since the solutions obtained are generally locally optimal, we incorporate a stochastic comparison algorithm for deriving globally optimal elliptical trajectories. Numerical examples are included to illustrate the main result, allow for uncertainties modeled as stochastic processes, and compare our proposed scalable approach to trajectories obtained through off-line computationally intensive solutions.

Index Terms—Hybrid systems, Infinitesimal Perturbation Analysis (IPA), multi-agent systems, optimal control.

I. INTRODUCTION

Autonomous cooperating agents may be used to perform tasks such as coverage control [1], [2], surveillance [3] and environmental sampling [4]–[6]. Persistent monitoring (also called "persistent surveillance" or "persistent search") arises in a large dynamically changing environment which cannot be fully covered by a stationary team of available agents. Thus, persistent monitoring differs from traditional coverage tasks due to the perpetual need to cover a changing environment, i.e., all areas of the mission space must be sensed infinitely often. The main challenge in designing control strategies in this case is in balancing the presence of agents in the changing environment so that it is covered over time optimally (in some well-defined sense) while still satisfying sensing and motion constraints.

Control and motion planning for agents performing persistent monitoring tasks have been studied in the literature, e.g., see [7]–[13]. In [14], we addressed the persistent monitoring problem by proposing an optimal control framework to drive multiple cooperating agents so as to minimize a metric of uncertainty over the environment. This metric is a function of both space and time such that uncertainty at a point grows if it is not covered by any agent sensors. It was shown in [14] that the optimal control problem can be reduced to a parametric optimization problem. In particular, each agent's optimal trajectory is fully described by a set of switching points $\{\theta_1,\ldots,\theta_K\}$ and associated waiting times at these points, $\{w_1,\ldots,w_K\}$. This allows us to make use of Infinitesimal Perturbation Analysis (IPA) [15] to determine gradients of the objective function with respect to these parameters and subsequently obtain optimal switching locations and waiting times that fully characterize an optimal solution. It also allows us to exploit robustness properties of IPA to readily extend this solution approach to a stochastic uncertainty model.

In this paper, we address the same persistent monitoring problem in a two-dimensional (2D) mission space. Using an analysis similar to the one-dimensional (1D) case, we find that we can no longer identify a parametric representation of optimal agent trajectories. Motivated by the simple structure of the 1D problem, it has been suggested to assign each agent a linear trajectory for which the explicit 1D solution can be used. However, in a 2D space it is not obvious that a linear trajectory is a desirable choice. Indeed, a key contribution of this paper is to formally prove that an elliptical agent trajectory outperforms a linear one in terms of the uncertainty metric we are using. Motivated by this result, we formulate a 2D persistent monitoring problem as one of determining optimal elliptical trajectories for a given number of agents, noting that this includes the possibility that one or more agents share the same trajectory. We show that this problem can be explicitly solved using similar IPA techniques as in our 1D analysis. In particular, we use IPA to determine on line the gradient of the objective function with respect to the parameters that fully define each elliptical trajectory (center, orientation, and length of the minor and major axes). This approach is scalable in the number of observed events, not states, of the underlying hybrid system characterizing the persistent monitoring process, so that it is suitable for on-line implementation. However, the standard gradient-based optimization process we use is generally limited to local, rather than global, optimal solutions. Thus, we adopt a stochastic comparison algorithm from the literature [16] to overcome this problem.

Section II formulates the optimal control problem for 2D mission spaces and Section III presents the solution approach. In Section IV we establish our key result that elliptical agent trajectories outperform linear ones in terms of minimizing an uncertainty metric per unit area. In Section V we formulate and solve the problem of determining optimal elliptical agent trajectories using an algorithm driven by gradients evaluated through IPA. In Section VI we incorporate a stochastic comparison algorithm for obtaining globally optimal solutions and in Section VII we provide numerical results to illustrate our approach and compare it to computationally intensive solutions based on a Two-Point-Boundary-Value-Problem (TPBVP) solver. Section VIII concludes the paper.

Manuscript received October 10, 2013; revised April 25, 2014, April 28, 2014, July 14, 2014, and August 13, 2014; accepted September 18, 2014. Date of publication September 24, 2014; date of current version May 21, 2015. The authors' work is supported in part by NSF under Grant CNS-1239021, by AFOSR under Grant FA9550-12-1-0113, by ONR under Grant N00014-09-1-1051, and by ARO under Grant W911NF-11-1-0227. Recommended by Associate Editor A. Nedich. The authors are with the Division of Systems Engineering and Center for Information and Systems Engineering, Boston University, Boston, MA 02215 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TAC.2014.2359712

0018-9286 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.



II. PERSISTENT MONITORING PROBLEM FORMULATION

We consider $N$ mobile agents in a 2D rectangular mission space $\Omega = [0,L_1]\times[0,L_2]\subset\mathbb{R}^2$. Let the position of the agents at time $t$ be $s_n(t) = [s_n^x(t), s_n^y(t)]$ with $s_n^x(t)\in[0,L_1]$ and $s_n^y(t)\in[0,L_2]$, $n=1,\ldots,N$, following the dynamics

$$\dot s_n^x(t) = u_n(t)\cos\theta_n(t),\qquad \dot s_n^y(t) = u_n(t)\sin\theta_n(t) \tag{1}$$

where $0\le u_n(t)\le 1$ is the normalized scalar speed of the $n$th agent and $\theta_n(t)$ is its heading angle relative to the positive $x$ direction, satisfying $0\le\theta_n(t)<2\pi$. Each agent is represented as a particle in the 2D space; thus we ignore the case of two or more agents colliding with each other. We associate with every point $[x,y]\in\Omega$ a function $p_n(x,y,s_n)$ that measures the probability that an event at location $[x,y]$ is detected by agent $n$. We also assume that $p_n(x,y,s_n)=1$ if $[x,y]=s_n$, and that $p_n(x,y,s_n)$ is monotonically nonincreasing in the Euclidean distance $D(x,y,s_n)\equiv\|[x,y]-s_n\|$ between $[x,y]$ and $s_n$, thus capturing the reduced effectiveness of a sensor over its range, which we consider to be finite and denoted by $r_n$. Therefore, we set $p_n(x,y,s_n)=0$ when $D(x,y,s_n)>r_n$. Our analysis is not affected by the precise sensing model $p_n(x,y,s_n)$ (in [14], for example, a linear decay model was used). Next, consider a set of points $\{[\alpha_i,\beta_i],\ i=1,\ldots,M\}$, $[\alpha_i,\beta_i]\in\Omega$, and associate a time-varying measure of uncertainty with each point $[\alpha_i,\beta_i]$, which we denote by $R_i(t)$. The set of points $\{[\alpha_1,\beta_1],\ldots,[\alpha_M,\beta_M]\}$ may be selected to contain specific points of interest in the environment, or simply to sample points in the mission space. Alternatively, we may consider a partition of $\Omega$ into $M$ rectangles denoted by $\Omega_i$ whose center points are $[\alpha_i,\beta_i]$. The joint probability of detecting an event at location $[\alpha_i,\beta_i]$ by all the $N$ agents (assuming detection independence) is

$$P_i(\mathbf{s}(t)) = 1 - \prod_{n=1}^{N}\left[1 - p_n(\alpha_i,\beta_i,s_n(t))\right]. \tag{2}$$
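As a concrete illustration, here is a minimal sketch of a sensing model and the joint detection probability (2). The linear-decay form is the example model mentioned above from [14]; all numeric values are hypothetical.

```python
import math

def p_n(x, y, sx, sy, r):
    """Sample sensing model: linear decay with distance, zero beyond range r.
    The analysis is model-independent; [14] used this linear-decay example."""
    D = math.hypot(x - sx, y - sy)      # Euclidean distance D(x, y, s_n)
    return max(0.0, 1.0 - D / r)        # p_n = 1 at the agent, 0 for D > r

def P_i(alpha, beta, agents, r):
    """Joint detection probability (2) at point [alpha_i, beta_i],
    assuming detection independence across the N agents."""
    miss = 1.0
    for (sx, sy) in agents:
        miss *= 1.0 - p_n(alpha, beta, sx, sy, r)
    return 1.0 - miss
```

For instance, two agents each at distance $r/2$ from a point each detect with probability 0.5, so jointly $P_i = 1 - 0.5^2 = 0.75$.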

In order to avoid the uninteresting case where there is a large number of agents that can adequately cover the mission space, we assume that for any $\mathbf{s}(t)=[s_1(t),\ldots,s_N(t)]^{\mathrm T}$ there exists some point $[x,y]$ in the discretized mission space with $P(x,y,\mathbf{s}(t))=0$. This means that for any assignment of the $N$ agents at time $t$, there is always at least one point in the mission space that cannot be sensed by any agent. Similar to the 1D analysis in [14], we define uncertainty functions $R_i(t)$ associated with the rectangles $\Omega_i$, $i=1,\ldots,M$, so that they have the following properties: (i) $R_i(t)$ increases with rate $A_i$ if $P_i(\mathbf{s}(t))=0$, (ii) $R_i(t)$ decreases with a fixed rate $B - A_i$ if $P_i(\mathbf{s}(t))=1$, and (iii) $R_i(t)\ge 0$ for all $t$. It is then natural to model uncertainty so that its decrease is proportional to the probability of detection. In particular, we model the dynamics of $R_i(t)$, $i=1,\ldots,M$, as follows:

$$\dot R_i(t) = \begin{cases} 0 & \text{if } R_i(t)=0 \text{ and } A_i \le B\,P_i(\mathbf{s}(t))\\ A_i - B\,P_i(\mathbf{s}(t)) & \text{otherwise}\end{cases} \tag{3}$$

where we assume that initial conditions $R_i(0)$, $i=1,\ldots,M$, are given and that $B > A_i > 0$ for all $i=1,\ldots,M$; thus, the uncertainty strictly decreases when there is perfect sensing, $P_i(\mathbf{s}(t))=1$.

The goal of the optimal persistent monitoring problem we consider is to control, through $u_n(t)$, $\theta_n(t)$ in (1), the movement of the $N$ agents so that the cumulative uncertainty over all sensing points $\{[\alpha_1,\beta_1],\ldots,[\alpha_M,\beta_M]\}$ is minimized over a fixed time horizon $T$. Thus, setting $\mathbf{u}(t)=[u_1(t),\ldots,u_N(t)]$ and $\boldsymbol{\theta}(t)=[\theta_1(t),\ldots,\theta_N(t)]$, we aim to solve the following optimal control problem P1:

$$\textbf{P1:}\quad \min_{\mathbf{u}(t),\,\boldsymbol{\theta}(t)}\ J = \int_0^T \sum_{i=1}^M R_i(t)\,dt \tag{4}$$

subject to the agent dynamics (1), the uncertainty dynamics (3), the control constraints $0\le u_n(t)\le 1$, $0\le\theta_n(t)<2\pi$, $t\in[0,T]$, and the state constraints $s_n(t)\in\Omega$ for all $t\in[0,T]$, $n=1,\ldots,N$.
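The dynamics (3) and the cost (4) can be simulated directly. The sketch below is a simple forward-Euler discretization for a given (externally supplied) agent motion, using the linear-decay sensing model of [14]; all parameter values are illustrative, not from the paper.

```python
import math

def cost_J(trajectory, pts, A, B, R0, dt, r=0.5):
    """Euler integration of the uncertainty dynamics (3) and the cost (4).
    trajectory[k]: list of agent positions [(sx, sy), ...] at time k*dt;
    pts: sample points [alpha_i, beta_i]; A[i], B, R0[i] as in Section II."""
    R = list(R0)
    J = 0.0
    for agents in trajectory:
        for i, (al, be) in enumerate(pts):
            miss = 1.0
            for (sx, sy) in agents:
                D = math.hypot(sx - al, sy - be)
                miss *= 1.0 - max(0.0, 1.0 - D / r)   # linear-decay p_n, as in [14]
            P = 1.0 - miss                            # joint detection (2)
            J += R[i] * dt                            # accumulate cost (4)
            Rdot = A[i] - B * P                       # dynamics (3)
            if R[i] == 0.0 and Rdot <= 0.0:
                Rdot = 0.0
            R[i] = max(0.0, R[i] + Rdot * dt)
    return J, R
```

For a single agent parked on a single point with $A_1 = 0.5$, $B = 2$, $R_1(0)=1$, we have $\dot R_1 = -1.5$ until $R_1$ hits zero at $t = 2/3$, so $J \approx 1/3$ over a unit horizon; in P1 the positions would instead follow the dynamics (1).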

III. OPTIMAL CONTROL SOLUTION

We first characterize the optimal control solution of problem P1. We define the state vector $x(t) = [s_1^x(t), s_1^y(t),\ldots,s_N^x(t), s_N^y(t), R_1(t),\ldots,R_M(t)]^{\mathrm T}$ and the associated costate vector $\lambda(t) = [\lambda_1^x(t), \lambda_1^y(t),\ldots,\lambda_N^x(t), \lambda_N^y(t), \lambda_1(t),\ldots,\lambda_M(t)]^{\mathrm T}$. In view of the discontinuity in the dynamics of $R_i(t)$ in (3), the optimal state trajectory may contain a boundary arc when $R_i(t)=0$ for any $i$; otherwise, the state evolves in an interior arc [17]. This follows from the fact, proved in [14] and [18], that it is never optimal for agents to reach the mission space boundary. We analyze the system operating in such an interior arc and omit the state constraint $s_n(t)\in\Omega$, $n=1,\ldots,N$, $t\in[0,T]$. Using (1) and (3), the Hamiltonian is

$$H = \sum_i R_i(t) + \sum_i \lambda_i(t)\dot R_i(t) + \sum_n \lambda_n^x(t)u_n(t)\cos\theta_n(t) + \sum_n \lambda_n^y(t)u_n(t)\sin\theta_n(t). \tag{5}$$

Combining the trigonometric terms, after some algebra we obtain

$$H = \sum_i R_i(t) + \sum_i \lambda_i(t)\dot R_i(t) + \sum_n \operatorname{sgn}\!\left(\lambda_n^y(t)\right)u_n(t)\sqrt{\left(\lambda_n^x(t)\right)^2 + \left(\lambda_n^y(t)\right)^2}\,\sin\!\left(\theta_n(t)+\varphi_n(t)\right) \tag{6}$$

where $\operatorname{sgn}(\cdot)$ is the sign function and $\varphi_n(t)$ is defined so that $\tan\varphi_n(t) = \lambda_n^x(t)/\lambda_n^y(t)$ for $\lambda_n^y(t)\ne 0$ and $\varphi_n(t) = (\pi/2)\operatorname{sgn}(\lambda_n^x(t))$ for $\lambda_n^y(t)=0$. In what follows, we exclude the case where $\lambda_n^x(t)=0$ and $\lambda_n^y(t)=0$ simultaneously for any given $n$ over any finite singular interval. Applying the Pontryagin minimum principle to (6), with $u_n^*(t)$, $\theta_n^*(t)$, $t\in[0,T)$, denoting the optimal controls, we have

$$H^*(x^*,\lambda^*,u^*,\theta^*) = \min_{\mathbf{u}\in[0,1]^N,\ \boldsymbol{\theta}\in[0,2\pi)^N} H(x,\lambda,\mathbf{u},\boldsymbol{\theta})$$

and it is immediately obvious that an optimal control must satisfy

$$u_n^*(t) = 1 \tag{7}$$

and

$$\begin{cases}\sin\left(\theta_n^*(t)+\varphi_n(t)\right) = 1 & \text{if } \lambda_n^y(t) < 0\\ \sin\left(\theta_n^*(t)+\varphi_n(t)\right) = -1 & \text{if } \lambda_n^y(t) > 0.\end{cases} \tag{8}$$
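The step from (5) to (6) rests on the combination $\lambda^x\cos\theta + \lambda^y\sin\theta = \operatorname{sgn}(\lambda^y)\sqrt{(\lambda^x)^2+(\lambda^y)^2}\,\sin(\theta+\varphi)$ with $\tan\varphi = \lambda^x/\lambda^y$, which can be checked numerically for $\lambda^y \ne 0$:

```python
import math

def combined(lx, ly, th):
    """Right-hand side of the trigonometric combination used in (6); assumes ly != 0."""
    phi = math.atan(lx / ly)                       # tan(phi) = lambda_x / lambda_y
    return math.copysign(1.0, ly) * math.hypot(lx, ly) * math.sin(th + phi)

# From (8): with u_n = 1, H is minimized over theta by driving
# sgn(ly) * sin(theta + phi) to its most negative value.
```

Agreement with $\lambda^x\cos\theta + \lambda^y\sin\theta$ holds to machine precision for any $\theta$ and any $\lambda^y \ne 0$.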

We are only left with the task of determining $\theta_n^*(t)$, $n=1,\ldots,N$. This can be accomplished by solving a standard TPBVP involving forward and backward integrations of the state and costate equations, evaluating $\partial H/\partial\theta_n$ after each such iteration, and using a gradient descent approach until the objective function converges to a (local) minimum.


Clearly, this is a computationally intensive process which scales poorly with the number of agents and the size of the mission space. In addition, it requires discretizing the mission time $T$ and calculating every control at each time step, which adds to the computational complexity.

IV. LINEAR VERSUS ELLIPTICAL AGENT TRAJECTORIES

Given the complexity of the TPBVP required to obtain an optimal solution of problem P1, we seek alternative approaches which may be suboptimal but are tractable and scalable. The first such effort is motivated by the results obtained in our 1D analysis, where we found that on a mission space defined by a line segment $[0,L]$ the optimal trajectory for each agent is to move at full speed until it reaches some switching point, dwell on the switching point for some time (possibly zero), and then switch directions. Thus, each agent's optimal trajectory is fully described by a set of switching points $\{\theta_1,\ldots,\theta_K\}$ and associated waiting times at these points, $\{w_1,\ldots,w_K\}$. The values of these parameters can then be efficiently determined using a gradient-based algorithm; in particular, we used Infinitesimal Perturbation Analysis (IPA) to evaluate the objective function gradient, as shown in [14].

Thus, a reasonable approach that has been suggested is to assign each agent a linear trajectory. However, there is no reason to believe that a linear trajectory is a good choice in a 2D setting. A broader choice is provided by the set of elliptical trajectories, which in fact encompass linear ones when the minor axis of the ellipse becomes zero. The main result of this section is to formally show that an elliptical trajectory outperforms a linear one using the average uncertainty metric in (4) as the basis for such comparison.

To simplify notation, let $\omega = [x,y]\in\mathbb{R}^2$ and, for a single agent $s(t)$, define

$$\Xi = \left\{\omega\in\mathbb{R}^2 \;\middle|\; \exists\, t\in[0,T] \text{ such that } B\,p(\omega,s(t)) > A(\omega)\right\}. \tag{9}$$

In view of the uncertainty dynamics in (3), it should be clear that $\Xi$ defines the effective coverage region for the agent, i.e., the region where $R(\omega,t)$ is strictly reduced, given $B$ and a specific sensing model $p(\omega,s)$. Clearly, $\Xi$ depends on the values of $s(t)$, which are determined by the single-agent trajectory. Let us define an elliptical trajectory so that the agent position $s_n(t)=[s_n^x(t),s_n^y(t)]$ follows the general parametric form of an ellipse:

$$\begin{cases} s_n^x(t) = X_n + a_n\cos\rho_n(t)\cos\phi_n - b_n\sin\rho_n(t)\sin\phi_n\\ s_n^y(t) = Y_n + a_n\cos\rho_n(t)\sin\phi_n + b_n\sin\rho_n(t)\cos\phi_n\end{cases} \tag{10}$$

where $[X_n,Y_n]$ is the center of the ellipse, $a_n$, $b_n$ are its major and minor axes respectively, $\phi_n\in[0,\pi)$ is the ellipse orientation (the angle between the $x$ axis and the major ellipse axis), and $\rho_n(t)\in[0,2\pi)$ is the eccentric anomaly of the ellipse. Assuming the agent moves with constant maximal speed 1 on this trajectory (based on (7)), we have $(\dot s_n^x)^2 + (\dot s_n^y)^2 = 1$, which gives

$$\dot\rho_n(t) = \left[\left(a_n\sin\rho_n(t)\cos\phi_n + b_n\cos\rho_n(t)\sin\phi_n\right)^2 + \left(a_n\sin\rho_n(t)\sin\phi_n - b_n\cos\rho_n(t)\cos\phi_n\right)^2\right]^{-1/2}. \tag{11}$$

In order to make a fair comparison between a linear and an elliptical trajectory, we normalize the objective function in (4) with respect to the coverage area in (9) and consider all points in $\Xi$ (rather than discretizing it or limiting ourselves to a finite set of sampling points). Thus, we define

$$J(b) = \frac{1}{|\Xi|\,T}\int_0^T\!\!\int_\Xi R(\omega,t)\,d\omega\,dt \tag{12}$$

where $|\Xi| = \int_\Xi d\omega$ is the area of the effective coverage region. We drop the subscript $n$ here to indicate that we are now considering the single-agent case. Note that we view this normalized metric as a function of $b\ge 0$, so that when $b=0$ we obtain the uncertainty corresponding to a linear trajectory. For simplicity, the trajectory is selected so that $[X,Y]$ coincides with the origin and $\phi=0$; the major axis $a$ is assumed fixed. Regarding the range of $b$, we will only be interested in values limited to a neighborhood of zero, which we denote by $\mathcal{B}$. Given $a$, this set dictates the values that $s(t)$ is allowed to take. Finally, we make the following assumptions.

Assumption 1: $p(\omega,s)\equiv p(D(\omega,s))$ is a continuous function of $D(\omega,s)\equiv\|\omega - s\|$.

Assumption 2: Let $\omega$, $\omega'$ be symmetric points in $\Xi$ with respect to the center point of the ellipse. Then $A(\omega) = A(\omega')$.

The following result establishes the fact that an elliptical trajectory with some $b>0$ can achieve a lower cost than a linear trajectory (i.e., $b=0$) in terms of a long-term average uncertainty per unit area. The proof may be found in [19].

Proposition IV.1: Under Assumptions 1-2 and $b\in\mathcal{B}$,

$$\lim_{T\to\infty,\ b\to 0}\frac{\partial J(b)}{\partial b} < 0$$

i.e., switching from a linear to an elliptical trajectory reduces the cost in (12).

In other words, Prop. IV.1 shows that elliptical trajectories are more suitable for a 2D mission space in terms of achieving near-optimal results in solving problem P1. We should point out that if Assumption 2 is violated, i.e., the mission space is inhomogeneous in terms of how uncertainties arise, then this result does not hold in general (there are counterexamples).
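The elliptical parametrization (10) and the eccentric-anomaly rate (11) can be sanity-checked numerically: integrating $\rho$ with the rate (11) should yield unit speed along the ellipse. A minimal sketch, with illustrative parameter values:

```python
import math

def ellipse_pos(theta, rho):
    """Agent position on the ellipse (10); theta = [X, Y, a, b, phi]."""
    X, Y, a, b, phi = theta
    return (X + a*math.cos(rho)*math.cos(phi) - b*math.sin(rho)*math.sin(phi),
            Y + a*math.cos(rho)*math.sin(phi) + b*math.sin(rho)*math.cos(phi))

def rho_dot(theta, rho):
    """Eccentric-anomaly rate (11) that enforces unit agent speed."""
    _, _, a, b, phi = theta
    return 1.0 / math.hypot(a*math.sin(rho)*math.cos(phi) + b*math.cos(rho)*math.sin(phi),
                            a*math.sin(rho)*math.sin(phi) - b*math.cos(rho)*math.cos(phi))
```

A forward difference over a small time step confirms $\|s(t+\Delta t) - s(t)\|/\Delta t \approx 1$, consistent with (7).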

V. OPTIMAL ELLIPTICAL TRAJECTORIES

Based on our analysis thus far, the approach is to associate with each agent an elliptical trajectory, parameterize each such trajectory by its center, orientation, and major and minor axes, and then solve P1 as a parametric optimization problem. Note that this includes the possibility that two agents share the same trajectory if the solution to this problem results in identical parameters for the associated ellipses. Choosing elliptical trajectories offers several practical advantages in addition to reduced computational complexity. Elliptical trajectories induce a periodic structure on the agent movements, which provides predictability. As a result, it is also easier to handle issues related to collision avoidance. Therefore, we replace problem P1 by the determination of optimal parameter vectors $\Theta_n \equiv [X_n, Y_n, a_n, b_n, \phi_n]^{\mathrm T}$, $n=1,\ldots,N$, and formulate the following problem P2:

$$\textbf{P2:}\quad \min_{\Theta_n,\ n=1,\ldots,N}\ J = \int_0^T\sum_{i=1}^M R_i(\Theta_1,\ldots,\Theta_N,t)\,dt. \tag{13}$$

Observe that the behavior of each agent under the elliptical trajectory control policy is that of a hybrid system whose dynamics undergo switches when $R_i(t)$ reaches or leaves the boundary value $R_i=0$ (the events causing the switches). As a result, we are faced with a parametric optimization problem for a system with hybrid dynamics. We solve this hybrid system problem using a gradient-based approach in which we apply IPA to determine the gradients $\nabla R_i(\Theta_1,\ldots,\Theta_N,t)$ on line (hence, $\nabla J$), i.e., directly using information from the agent trajectories, and iterate upon them.


A. Infinitesimal Perturbation Analysis (IPA)

We begin with a brief review of the IPA framework for general stochastic hybrid systems as presented in [15]. The purpose of IPA is to study the behavior of a hybrid system state as a function of a parameter vector $\theta$ for a given compact, convex set $\Theta\subset\mathbb{R}^l$. Let $\{\tau_k(\theta)\}$, $k=1,\ldots,K$, denote the occurrence times of all events in the state trajectory. For convenience, we set $\tau_0 = 0$ and $\tau_{K+1} = T$. Over an interval $[\tau_k(\theta),\tau_{k+1}(\theta))$, the system is at some mode during which the time-driven state satisfies $\dot x = f_k(x,\theta,t)$. An event at $\tau_k$ is classified as (i) exogenous if it causes a discrete state transition independent of $\theta$ and satisfies $d\tau_k/d\theta = 0$; (ii) endogenous, if there exists a continuously differentiable function $g_k:\mathbb{R}^n\times\Theta\to\mathbb{R}$ such that $\tau_k = \min\{t>\tau_{k-1}: g_k(x(\theta,t),\theta)=0\}$; and (iii) induced if it is triggered by the occurrence of another event at time $\tau_m\le\tau_k$. IPA specifies how changes in $\theta$ influence the state $x(\theta,t)$ and the event times $\tau_k(\theta)$ and, ultimately, how they influence performance metrics which are generally expressed in terms of these variables.

We define

$$x'(t) \equiv \frac{\partial x(\theta,t)}{\partial\theta},\qquad \tau_k' \equiv \frac{\partial\tau_k(\theta)}{\partial\theta},\qquad k=1,\ldots,K$$

for all state and event-time derivatives. It is shown in [15] that $x'(t)$ satisfies

$$\frac{d}{dt}x'(t) = \frac{\partial f_k(t)}{\partial x}\,x'(t) + \frac{\partial f_k(t)}{\partial\theta} \tag{14}$$

for $t\in[\tau_k,\tau_{k+1})$ with boundary condition

$$x'(\tau_k^+) = x'(\tau_k^-) + \left[f_{k-1}(\tau_k^-) - f_k(\tau_k^+)\right]\tau_k' \tag{15}$$

for $k=0,\ldots,K$, where $\tau_k^-$ is the left limit of $\tau_k$. In addition, in (15), the gradient vector for each $\tau_k$ is $\tau_k' = 0$ if the event at $\tau_k$ is exogenous and

$$\tau_k' = -\left[\frac{\partial g_k}{\partial x}f_k(\tau_k^-)\right]^{-1}\left(\frac{\partial g_k}{\partial\theta} + \frac{\partial g_k}{\partial x}x'(\tau_k^-)\right) \tag{16}$$

if the event at $\tau_k$ is endogenous (i.e., $g_k(x(\theta,\tau_k),\theta)=0$), defined as long as $(\partial g_k/\partial x)f_k(\tau_k^-)\ne 0$.
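A hypothetical scalar example makes (14)-(16) concrete: take $\dot x = \theta$ from $x(0)=0$ until the endogenous event $g(x)=x-1=0$ at $\tau = 1/\theta$, after which $\dot x = -1$. Then (14) gives $x'(t)=t$ before the event, (16) gives $\tau' = -1/\theta^2$, and the reset (15) gives $x'(\tau^+) = -1/\theta^2$, constant thereafter since the second mode is $\theta$-independent. This matches the closed-form solution, as a finite-difference check confirms:

```python
def ipa_xprime(theta, t):
    """x'(t) = dx/dtheta from (14)-(16) for the toy hybrid system above."""
    tau = 1.0 / theta                      # endogenous event time (g = x - 1 = 0)
    if t < tau:
        return t                           # mode 1: d/dt x' = 1, from (14)
    tau_p = -(1.0/theta) * (1.0/theta)     # (16): -(dg/dx * f1)^(-1) * dg/dx * x'(tau-)
    return tau + (theta - (-1.0)) * tau_p  # (15): x'(tau-) + [f1 - f2] * tau'

def x_exact(theta, t):
    """Closed-form state trajectory, for checking the IPA values."""
    tau = 1.0 / theta
    return theta*t if t < tau else 1.0 - (t - tau)
```

Central differences on `x_exact` reproduce `ipa_xprime` on both sides of the event.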

In our case, the parameter vectors are $\Theta_n \equiv [X_n,Y_n,a_n,b_n,\phi_n]^{\mathrm T}$ as defined earlier, and we seek to determine optimal vectors $\Theta_n^*$, $n=1,\ldots,N$. We will use IPA to evaluate $\nabla J(\Theta_1,\ldots,\Theta_N) = [\partial J/\partial\Theta_1,\ldots,\partial J/\partial\Theta_N]^{\mathrm T}$. From (13), this gradient clearly depends on $\nabla R_i(t) = [\partial R_i(t)/\partial\Theta_1,\ldots,\partial R_i(t)/\partial\Theta_N]^{\mathrm T}$. In turn, this gradient depends on whether the dynamics of $R_i(t)$ in (3) are given by $\dot R_i(t)=0$ or $\dot R_i(t) = A_i - BP_i(\mathbf{s}(t))$. The dynamics switch at event times $\tau_k$, $k=1,\ldots,K$, when $R_i(t)$ reaches or escapes from 0; these events are observed on a trajectory over $[0,T]$ based on given $\Theta_n$, $n=1,\ldots,N$.

IPA Equations: We begin by recalling the dynamics of $R_i(t)$ in (3), which depend on the relative positions of all agents with respect to $[\alpha_i,\beta_i]$ and change at time instants $\tau_k$ such that either $R_i(\tau_k)=0$ with $R_i(\tau_k^-)>0$, or $A_i > BP_i(\mathbf{s}(\tau_k))$ with $R_i(\tau_k^-)=0$. Moreover, the agent positions $s_n(t) = [s_n^x(t),s_n^y(t)]$, $n=1,\ldots,N$, on an elliptical trajectory are expressed using (10). Viewed as a hybrid system, we can now concentrate on all events causing transitions in the dynamics of $R_i(t)$, $i=1,\ldots,M$, since any other event has no effect on the values of $\nabla R_i(\Theta_1,\ldots,\Theta_N,t)$ at $t=\tau_k$.

For notational simplicity, we define $\omega_i = [\alpha_i,\beta_i]\in\Omega$. First, if $R_i(t)=0$ and $A_i - BP(\omega_i,\mathbf{s}(t))\le 0$, applying (14) to $R_i(t)$ and using (3) gives

$$\frac{d}{dt}\frac{\partial R_i(t)}{\partial\Theta_n} = 0. \tag{17}$$

When $R_i(t)>0$, we have

$$\frac{d}{dt}\frac{\partial R_i(t)}{\partial\Theta_n} = -B\,\frac{\partial p_n(\omega_i,s_n(t))}{\partial\Theta_n}\prod_{d=1,\ d\ne n}^{N}\left[1 - p_d(\omega_i,s_d(t))\right]. \tag{18}$$

Noting that $p_n(\omega_i,s_n(t)) \equiv p_n(D(\omega_i,s_n(t)))$, we have

$$\frac{\partial p_n(\omega_i,s_n(t))}{\partial\Theta_n} = \frac{\partial p_n\left(D(\omega_i,s_n(t))\right)}{\partial D(\omega_i,s_n(t))}\cdot\frac{\partial D(\omega_i,s_n(t))}{\partial\Theta_n} \tag{19}$$

where $D(\omega_i,s_n(t)) = \left[(s_n^x(t)-\alpha_i)^2 + (s_n^y(t)-\beta_i)^2\right]^{1/2}$. For simplicity, we write $D = D(\omega_i,s_n(t))$ and we get

$$\frac{\partial D}{\partial\Theta_n} = \frac{1}{2D}\left(\frac{\partial D^2}{\partial s_n^x}\frac{\partial s_n^x}{\partial\Theta_n} + \frac{\partial D^2}{\partial s_n^y}\frac{\partial s_n^y}{\partial\Theta_n}\right) \tag{20}$$

where $\partial D^2/\partial s_n^x = 2(s_n^x - \alpha_i)$ and $\partial D^2/\partial s_n^y = 2(s_n^y - \beta_i)$. Note that $\partial s_n^x/\partial\Theta_n = [\partial s_n^x/\partial X_n,\ \partial s_n^x/\partial Y_n,\ \partial s_n^x/\partial a_n,\ \partial s_n^x/\partial b_n,\ \partial s_n^x/\partial\phi_n]^{\mathrm T}$ and $\partial s_n^y/\partial\Theta_n = [\partial s_n^y/\partial X_n,\ \partial s_n^y/\partial Y_n,\ \partial s_n^y/\partial a_n,\ \partial s_n^y/\partial b_n,\ \partial s_n^y/\partial\phi_n]^{\mathrm T}$. From (10), for $\partial s_n^x/\partial\Theta_n$ we obtain

$$\frac{\partial s_n^x}{\partial X_n} = 1,\quad \frac{\partial s_n^x}{\partial Y_n} = 0,\quad \frac{\partial s_n^x}{\partial a_n} = \cos\rho_n(t)\cos\phi_n,\quad \frac{\partial s_n^x}{\partial b_n} = -\sin\rho_n(t)\sin\phi_n,$$
$$\frac{\partial s_n^x}{\partial\phi_n} = -a_n\cos\rho_n(t)\sin\phi_n - b_n\sin\rho_n(t)\cos\phi_n.$$

Similarly, for $\partial s_n^y/\partial\Theta_n$ we get

$$\frac{\partial s_n^y}{\partial X_n} = 0,\quad \frac{\partial s_n^y}{\partial Y_n} = 1,\quad \frac{\partial s_n^y}{\partial a_n} = \cos\rho_n(t)\sin\phi_n,\quad \frac{\partial s_n^y}{\partial b_n} = \sin\rho_n(t)\cos\phi_n,$$
$$\frac{\partial s_n^y}{\partial\phi_n} = a_n\cos\rho_n(t)\cos\phi_n - b_n\sin\rho_n(t)\sin\phi_n.$$

Using $\partial s_n^x/\partial\Theta_n$ and $\partial s_n^y/\partial\Theta_n$ in (20), then (19), and substituting back into (18), we can finally obtain $\partial R_i(t)/\partial\Theta_n$ for $t\in[\tau_k,\tau_{k+1})$ as

$$\frac{\partial R_i(t)}{\partial\Theta_n} = \frac{\partial R_i(\tau_k^+)}{\partial\Theta_n} + \begin{cases} 0 & \text{if } R_i(t)=0,\ A_i \le BP_i(\mathbf{s}(t))\\[4pt] \displaystyle\int_{\tau_k}^{t}\frac{d}{d\tau}\frac{\partial R_i(\tau)}{\partial\Theta_n}\,d\tau & \text{otherwise}\end{cases} \tag{21}$$

where the integral above is obtained from (17)-(19). Thus, it remains to determine the components $\nabla R_i(\tau_k^+)$ in (21) using (15). This involves the event time gradient vectors $\tau_k' = \partial\tau_k/\partial\Theta_n$ for $k=1,\ldots,K$, which will be determined through (16). There are two possible cases regarding the events that cause switches in the dynamics of $R_i(t)$.

Case 1) At $\tau_k$, $\dot R_i(t)$ switches from $\dot R_i(t) = 0$ to $\dot R_i(t) = A_i - BP_i(\mathbf{s}(t))$. In this case, it is easy to see that the dynamics of $R_i(t)$ are continuous, so that $f_{k-1}(\tau_k^-) = f_k(\tau_k^+)$ in (15) applied to $R_i(t)$, and we get

$$\nabla R_i(\tau_k^+) = \nabla R_i(\tau_k^-),\qquad i=1,\ldots,M. \tag{22}$$

    Case 2) At k, Ri(t) switches from Ri(t) = Ai BPi(s(t))to Ri(t) = 0, i.e., Ri(k) becomes zero. In this case,we need to first evaluate k from (16) in order todetermine Ri(+k ) through (15). Observing that thisevent is endogenous, (16) applies with gk = Ri = 0 andwe get k = Ri(k )/(A(i)BP (i, s(k ))). Itfollows from (15) that

    Ri(+k)= Ri

    (k)

    [A(i)BP

    (i, s

    (k))]

    Ri(k)

    A(i)BP(i, s

    (k)) = 0. (23)

    Thus, Ri(+k ) is always reset to 0 regardless ofRi(k ).
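To illustrate how (17), (18), and the event resets (22), (23) combine in (21), the following Python sketch simulates a toy single-agent, single-sampling-point instance of the hybrid dynamics (3), with a hypothetical smooth coverage probability $p(t, \theta)$ standing in for $P_i(s(t))$ and a scalar $\theta$ standing in for one component of $\Theta_n$; the accumulated IPA derivative is then compared against a finite-difference estimate of $\partial R_i(T)/\partial\theta$:

```python
import math

# Toy instance of the hybrid dynamics in (3):
#   dR/dt = 0                  if R = 0 and A <= B*p(t, theta)
#   dR/dt = A - B*p(t, theta)  otherwise,
# with a hypothetical smooth coverage probability p (not the paper's model).
A, B = 0.3, 1.0

def p(t, theta):
    return 0.45 * (1.0 + math.cos(t - theta))   # always in (0, 0.9]

def dp_dtheta(t, theta):
    return 0.45 * math.sin(t - theta)

def simulate(theta, T=10.0, dt=1e-3, R0=0.5):
    """Euler simulation of R(t) and its IPA derivative dR/dtheta."""
    R, dR, t = R0, 0.0, 0.0
    while t < T:
        f = A - B * p(t, theta)
        if R <= 0.0 and f <= 0.0:
            R, dR = 0.0, 0.0                       # inactive mode: (17), dR frozen
        else:
            R += f * dt
            dR += -B * dp_dtheta(t, theta) * dt    # active mode: integrate (18)
            if R < 0.0:
                R, dR = 0.0, 0.0                   # Case 2 event: reset per (23)
        t += dt
    return R, dR

def ipa_vs_finite_difference(theta=0.3, h=1e-4):
    _, dR = simulate(theta)
    Rp, _ = simulate(theta + h)
    Rm, _ = simulate(theta - h)
    return dR, (Rp - Rm) / (2 * h)
```

The two estimates agree despite the discontinuous mode switches precisely because the Case 2 reset (23) zeroes the derivative at each event, while the Case 1 switch leaves it continuous per (22).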


Objective Function Gradient Evaluation: Based on our analysis, we first rewrite $J$ in (13) as

$$J(\Theta_1, \ldots, \Theta_N) = \sum_{i=1}^{M}\sum_{k=0}^{K} \int_{\tau_k(\Theta_1, \ldots, \Theta_N)}^{\tau_{k+1}(\Theta_1, \ldots, \Theta_N)} R_i(\Theta_1, \ldots, \Theta_N, t)\, dt$$

and (omitting some function arguments), we get

$$\nabla J = \sum_{i=1}^{M}\sum_{k=0}^{K}\left[\int_{\tau_k}^{\tau_{k+1}} \nabla R_i(t)\, dt + R_i(\tau_{k+1})\nabla\tau_{k+1} - R_i(\tau_k)\nabla\tau_k\right].$$

Observing the cancellation of all terms of the form $R_i(\tau_k)\nabla\tau_k$ for all $k$ (with $\tau_0 = 0$, $\tau_{K+1} = T$ fixed), we finally get

$$\nabla J(\Theta_1, \ldots, \Theta_N) = \sum_{i=1}^{M}\sum_{k=0}^{K}\int_{\tau_k}^{\tau_{k+1}} \nabla R_i(t)\, dt. \qquad (24)$$

This depends entirely on $\nabla R_i(t)$, which is obtained from (21) and the event times $\tau_k$, $k = 1, \ldots, K$, given initial conditions $s_n(0)$ for $n = 1, \ldots, N$, and $R_i(0)$ for $i = 1, \ldots, M$. In (21), $\partial R_i(\tau_k^+)/\partial\Theta_n$ is obtained through (22), (23), whereas $(d/dt)(\partial R_i(t)/\partial\Theta_n)$ is obtained through (17)–(20).

Remark 1: Observe that the evaluation of $\nabla R_i(t)$, hence $\nabla J$, is independent of $A_i$, $i = 1, \ldots, M$, i.e., the values in our uncertainty model. Thus, the IPA approach possesses an inherent robustness property: there is no need to explicitly model how uncertainty affects $R_i(t)$ in (3). Consequently, we may treat $A_i$ as unknown without affecting the solution approach (the values of $R_i(t)$ are obviously affected). We may also allow this uncertainty to be modeled through random processes $\{A_i(t)\}$, $i = 1, \ldots, M$; in this case, however, the result of Proposition IV.1 no longer applies without some conditions on the statistical characteristics of $\{A_i(t)\}$, and the resulting $\nabla J$ is an estimate of a stochastic gradient.

Remark 2: Note that the complexity of evaluating $\nabla J(\Theta_1, \ldots, \Theta_N)$ in (24) grows linearly in the number of agents $N$ as well as in $T$. In other words, solving the problem using IPA is scalable with respect to the number of agents and the operation time.

B. Objective Function Optimization

We now seek to obtain $[\Theta_1^*, \ldots, \Theta_N^*]$ minimizing $J(\Theta_1, \ldots, \Theta_N)$ through a standard gradient-based optimization algorithm of the form

$$\left[\Theta_1^{l+1}, \ldots, \Theta_N^{l+1}\right] = \left[\Theta_1^{l}, \ldots, \Theta_N^{l}\right] - \left[\eta_1^{l}, \ldots, \eta_N^{l}\right]\tilde\nabla J\left(\Theta_1^{l}, \ldots, \Theta_N^{l}\right) \qquad (25)$$

where $\{\eta_n^l\}$, $l = 1, 2, \ldots$, are appropriate step size sequences and $\tilde\nabla J(\Theta_1^l, \ldots, \Theta_N^l)$ is the projection of the gradient $\nabla J(\Theta_1, \ldots, \Theta_N)$ onto the feasible set, i.e., the set ensuring $s_n(t)$ remains in the mission space for all $t \in [0, T]$, $n = 1, \ldots, N$. The optimization algorithm terminates when $|\tilde\nabla J(\Theta_1^l, \ldots, \Theta_N^l)| < \epsilon$ (for a fixed threshold $\epsilon$) for some $[\Theta_1^l, \ldots, \Theta_N^l]$. When $\epsilon > 0$ is small, $[\Theta_1^l, \ldots, \Theta_N^l]$ is believed to be in the neighborhood of a local optimum, and we set $[\Theta_1^*, \ldots, \Theta_N^*] = [\Theta_1^l, \ldots, \Theta_N^l]$. However, in our problem the function $J(\Theta_1, \ldots, \Theta_N)$ is non-convex and there are actually many local optima depending on the initial controllable parameter vector $[\Theta_1^0, \ldots, \Theta_N^0]$. In the next section, we propose a stochastic comparison algorithm which addresses this issue by randomizing over the initial points $[\Theta_1^0, \ldots, \Theta_N^0]$. This algorithm defines a process which converges to a global optimum under certain well-defined conditions.
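A minimal sketch of the projected gradient iteration (25) on a hypothetical nonconvex objective illustrates both the termination test $|\tilde\nabla J| < \epsilon$ and the dependence of the limit point on the initial vector; all function and parameter names here are illustrative, not from the paper, and feasibility is enforced by clamping the iterate to a box as a simple stand-in for the projection in (25):

```python
import math

def gradient_descent(grad, theta0, bounds, eta=0.05, eps=1e-6, max_iter=10000):
    """Projected gradient iteration: theta^{l+1} = theta^l - eta * grad J(theta^l),
    clamped to the feasible box, stopping when the gradient norm drops below eps."""
    theta = list(theta0)
    for _ in range(max_iter):
        g = grad(theta)
        if math.sqrt(sum(gi * gi for gi in g)) < eps:
            break
        theta = [min(max(t - eta * gi, lo), hi)
                 for t, gi, (lo, hi) in zip(theta, g, bounds)]
    return theta

# Hypothetical nonconvex 1-D objective with two symmetric local minima,
# illustrating the dependence on the initial point theta^0.
def J(th):
    return math.cos(3 * th[0]) + 0.1 * th[0] ** 2

def gradJ(th):
    return [-3 * math.sin(3 * th[0]) + 0.2 * th[0]]
```

Starting the iteration from symmetric initial points yields two distinct local minima, which is exactly the situation the stochastic comparison algorithm of Section VI is designed to handle.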

Fig. 1. Optimal trajectories using TPBVP solver for two agents. (a) Red and green trajectories obtained from TPBVP solution. (b) Cost as a function of algorithm iterations. $J_{\mathrm{TPBVP}} = 7.15 \times 10^4$.

VI. STOCHASTIC COMPARISON ALGORITHM FOR GLOBAL OPTIMALITY

Gradient-based optimization algorithms are generally efficient and effective in finding the global optimum when one is uniquely specified by the point where the gradient is zero. When this is not the case, to seek a global optimum one must resort to several alternatives, which include a variety of random search algorithms. In this section, we use the Stochastic Comparison algorithm in [16] to find the global optimum. As shown in [16], for a stochastic system, if (i) the cost function $J(\Theta)$ is continuous in $\Theta$ and (ii) for each estimate $\tilde J(\Theta)$ of $J(\Theta)$ the error $W(\Theta) = \tilde J(\Theta) - J(\Theta)$ has a symmetric pdf, then the Markov process $\{\Theta^k\}$ generated by the Stochastic Comparison algorithm will converge to an $\epsilon$-optimal interval of the global optimum for arbitrarily small $\epsilon > 0$. In short, $\lim_{k\to\infty} P[\Theta^k \in \Omega_\epsilon] = 1$ for any $\epsilon > 0$, where $\Omega_\epsilon$ is defined as $\Omega_\epsilon = \{\Theta \mid J(\Theta) \le J(\Theta^*) + \epsilon\}$. Using the Continuous Stochastic Comparison (CSC) Algorithm developed in [16] for a general continuous optimization problem, consider $\Theta \in \Omega$ to be a controllable vector, where $\Omega$ is the bounded feasible controllable parameter space. The detailed CSC algorithm is included in [19]. Note that in the deterministic case, the CSC algorithm reduces to a comparison algorithm with multi-starts over the 6-dimensional controllable vector $\Theta_n \equiv [X_n, Y_n, a_n, b_n, \phi_n, \rho_n]^{\mathrm T}$, for each ellipse associated with agent $n = 1, \ldots, N$.
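In the deterministic case just described, the CSC algorithm reduces to comparison with multi-starts. A minimal Python sketch of that reduction is given below; the full CSC algorithm, including handling of noisy cost estimates, is in [16] and [19], and the helper names here (`sample`, `local_opt`) are illustrative:

```python
import random

def multi_start_comparison(J, local_opt, sample, Q, seed=0):
    """Deterministic multi-start reduction of the CSC idea: draw Q random
    starting vectors from the feasible set, run the local (gradient-based)
    optimizer from each, and keep the incumbent with the lowest cost.
    This is a sketch of the multi-start comparison, not the CSC algorithm
    of [16] itself."""
    rng = random.Random(seed)
    best_theta, best_cost = None, float("inf")
    for _ in range(Q):
        theta = local_opt(sample(rng))    # random start, then local descent
        cost = J(theta)
        if cost < best_cost:              # comparison step: keep the better point
            best_theta, best_cost = theta, cost
    return best_theta, best_cost
```

With `local_opt` set to a gradient-based routine implementing (25), this is the procedure used in the deterministic experiments of Section VII.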

    VII. NUMERICAL RESULTS

We begin with a two-agent example in which we solve P2 by assigning elliptical trajectories using the gradient-based approach. The environment setting parameters used are: $r = 4$ for the sensing range of the agents; $L_1 = 20$, $L_2 = 10$ for the mission space dimensions; and $T = 200$. All sampling points $\alpha_i = [\alpha_{xi}, \alpha_{yi}]$ are uniformly spaced within $L_1 \times L_2$, $i = 1, \ldots, M$, where $M = (L_1 + 1)(L_2 + 1) = 231$. Initial values for the uncertainty functions are $R_i(0) = 2$, and $B = 6$, $A_i = 0.2$ for all $i = 1, \ldots, M$ in (3). We applied the TPBVP algorithm to P1. The results are shown in Fig. 1. This approach is computationally expensive and time-consuming (about 800,000 steps to converge). Interestingly, the solution corresponds to a cost $J_{\mathrm{TPBVP}} = 7.15 \times 10^4$, which is higher than that of Fig. 2, the elliptical trajectory case discussed next. This is an indication of the presence of locally optimal trajectories.
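As a quick check of the setup, the sampling grid implied by $M = (L_1+1)(L_2+1) = 231$ corresponds to unit spacing over the mission space; the spacing itself is an inference from the count, not stated explicitly in the text:

```python
# Sketch of the sampling-point grid used in the example: unit-spaced points
# covering the L1 x L2 = 20 x 10 mission space (unit spacing is an assumption
# consistent with M = (L1 + 1)(L2 + 1) = 231).
L1, L2 = 20, 10
alpha = [(x, y) for x in range(L1 + 1) for y in range(L2 + 1)]
M = len(alpha)   # number of sampling points
```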

We then solve the same two-agent example with the same environment setting using the CSC algorithm. For simplicity, we select the ellipse center location $[X_n, Y_n]$ as the only two (out of six) multi-start components: for a given number of comparisons $Q$, we sample the ellipse center $[X_n, Y_n] \in L_1 \times L_2$, $n = 1, \ldots, N$, using a uniform distribution, while $a_n = 5$, $b_n = 2$, $\phi_n = \pi/4$, $\rho_n = 0$ for $n = 1, 2$ are randomly assigned but initially fixed parameters during the number of comparisons $Q$ (thus, it is still possible that there are local minima with respect to the remaining four components $[a_n, b_n, \phi_n, \rho_n]$, but, clearly, all six components in $\Theta_n$ can be used at the expense of some additional computational cost). In Fig. 2, the red elliptical trajectories on the left show the initial ellipses and the blue trajectories represent the corresponding resulting ellipses the CSC algorithm converges to. Fig. 2(b) shows the cost versus the number of iterations of the CSC algorithm. The resulting cost for $Q = 300$ is $J_{\mathrm{DetCSC}} = 6.57 \times 10^4$, where "Det" stands for a deterministic environment. It is clear from Fig. 2(b) that the cost of the worst local minimum is much higher than that of the best local minimum. Note also that the CSC algorithm does improve on the original pure gradient-based algorithm performance $J_e = 6.93 \times 10^4$ in [19]. Additional numerical results can be found in [19], including cases where the uncertainty term $A_i$ in (3) is stochastic.

Fig. 2. Two-agent example for the deterministic environment setting using the CSC algorithm. (a) Red ellipses: initial trajectories. Blue ellipses: optimal elliptical trajectories. (b) Cost as a function of algorithm iterations. $J_{\mathrm{DetCSC}} = 6.57 \times 10^4$.

    VIII. CONCLUSION

We have shown that an optimal control solution to the 1D persistent monitoring problem does not easily extend to the 2D case. In particular, we have proved that elliptical trajectories outperform linear ones in a 2D mission space. Therefore, we have sought to solve a parametric optimization problem to determine optimal elliptical trajectories. Numerical examples indicate that this scalable approach (which can be used on line) provides solutions that approximate those obtained through a computationally intensive TPBVP solver. Moreover, since the solutions obtained are generally locally optimal, we have incorporated a stochastic comparison algorithm for deriving globally optimal elliptical trajectories. Ongoing work aims at alternative approaches for near-optimal solutions and at distributed implementations.

REFERENCES

[1] M. Zhong and C. G. Cassandras, "Distributed coverage control and data collection with mobile sensor networks," IEEE Trans. Autom. Control, vol. 56, no. 10, pp. 2445–2455, Oct. 2011.
[2] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, "Coverage control for mobile sensing networks," IEEE Trans. Robot. Autom., vol. 20, no. 2, pp. 243–255, 2004.
[3] B. Grocholsky, J. Keller, V. Kumar, and G. Pappas, "Cooperative air and ground surveillance," IEEE Robot. Autom. Mag., vol. 13, no. 3, pp. 16–25, 2006.
[4] R. Smith, M. Schwager, D. Rus, and G. Sukhatme, "Persistent ocean monitoring with underwater gliders: Towards accurate reconstruction of dynamic ocean processes," in Proc. IEEE Conf. Robot. Autom., 2011, pp. 1517–1524.
[5] D. Paley, F. Zhang, and N. Leonard, "Cooperative control for ocean sampling: The glider coordinated control system," IEEE Trans. Control Syst. Technol., vol. 16, no. 4, pp. 735–744, 2008.
[6] P. Dames, M. Schwager, V. Kumar, and D. Rus, "A decentralized control policy for adaptive information gathering in hazardous environments," in Proc. IEEE Conf. Decision Control, 2012, pp. 2807–2813.
[7] S. L. Smith, M. Schwager, and D. Rus, "Persistent monitoring of changing environments using robots with limited range sensing," IEEE Trans. Robotics, 2011.
[8] D. E. Soltero, S. Smith, and D. Rus, "Collision avoidance for persistent monitoring in multi-robot systems with intersecting trajectories," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), 2011, pp. 3645–3652.
[9] N. Nigam and I. Kroo, "Persistent surveillance using multiple unmanned air vehicles," in Proc. IEEE Aerospace Conf., 2008, pp. 1–14.
[10] P. Hokayem, D. Stipanovic, and M. Spong, "On persistent coverage control," in Proc. 46th IEEE Conf. Decision Control, 2008, pp. 6130–6135.
[11] B. Julian, M. Angermann, and D. Rus, "Non-parametric inference and coordination for distributed robotics," in Proc. 51st IEEE Conf. Decision Control, 2012, pp. 2787–2794.
[12] Y. Chen, K. Deng, and C. Belta, "Multi-agent persistent monitoring in stochastic environments with temporal logic constraints," in Proc. 51st IEEE Conf. Decision Control, 2012, pp. 2801–2806.
[13] X. Lan and M. Schwager, "Planning periodic persistent monitoring trajectories for sensing robots in Gaussian random fields," in Proc. 2013 IEEE Int. Conf. Robot. Autom., 2013, pp. 2415–2420.
[14] C. G. Cassandras, X. Lin, and X. C. Ding, "An optimal control approach to the multi-agent persistent monitoring problem," IEEE Trans. Autom. Control, vol. 58, no. 4, pp. 947–961, 2013.
[15] C. G. Cassandras, Y. Wardi, C. G. Panayiotou, and C. Yao, "Perturbation analysis and optimization of stochastic hybrid systems," Eur. J. Control, vol. 16, no. 6, pp. 642–664, 2010.
[16] G. Bao and C. G. Cassandras, "Stochastic comparison algorithm for continuous optimization with estimation," J. Optimiz. Theory Applic., vol. 91, no. 3, pp. 585–615, Dec. 1996.
[17] A. Bryson and Y. Ho, Applied Optimal Control. New York: Wiley, 1975.
[18] X. Lin and C. G. Cassandras, "An optimal control approach to the multi-agent persistent monitoring problem in two-dimensional spaces," in Proc. 52nd IEEE Conf. Decision Control, 2013, pp. 6886–6891.
[19] X. Lin and C. G. Cassandras, "An optimal control approach to the multi-agent persistent monitoring problem in two-dimensional spaces," Tech. Rep. [Online]. Available: http://arxiv.org/abs/1308.0345
