a graph patrol problem with random attack times

Operations Research

A Graph Patrol Problem with Random Attack Times

Kyle Y. Lin, Michael P. Atkinson, Timothy H. Chung, Kevin D. Glazebrook

To cite this article: Kyle Y. Lin, Michael P. Atkinson, Timothy H. Chung, Kevin D. Glazebrook (2013) A Graph Patrol Problem with Random Attack Times. Operations Research 61(3):694-710. http://dx.doi.org/10.1287/opre.1120.1149

OPERATIONS RESEARCH
Vol. 61, No. 3, May–June 2013, pp. 694–710
ISSN 0030-364X (print), ISSN 1526-5463 (online)
http://dx.doi.org/10.1287/opre.1120.1149

© 2013 INFORMS

A Graph Patrol Problem with Random Attack Times

Kyle Y. Lin, Michael P. Atkinson
Operations Research Department, Naval Postgraduate School, Monterey, California 93943

{[email protected], [email protected]}

Timothy H. Chung
Systems Engineering Department, Naval Postgraduate School, Monterey, California 93943, [email protected]

Kevin D. Glazebrook
Department of Management Science, Lancaster University Management School, Lancaster LA1 4YX, United Kingdom, [email protected]

This paper presents a patrol problem, where a patroller traverses a graph through edges to detect potential attacks at nodes. To design a patrol policy, the patroller needs to take into account not only the graph structure, but also the different attack time distributions, as well as different costs incurred due to successful attacks, at different nodes. We consider both random attackers and strategic attackers. A random attacker chooses which node to attack according to a probability distribution known to the patroller. A strategic attacker plays a two-person zero-sum game with the patroller. For each case, we give an exact linear program to compute the optimal solution. Because the linear programs quickly become computationally intractable as the problem size grows, we develop index-based heuristics. In the random-attacker case, our heuristic is optimal when there are two nodes, and in a suitably chosen asymptotic regime. In the strategic-attacker case, our heuristic is optimal when there are two nodes if the attack times are deterministic and take integer values. In our numerical experiments, our heuristic typically achieves within 1% of optimality with computation time orders of magnitude less than what is required to compute the optimal policy.

Subject classifications: military: search/surveillance; dynamic programming/optimal control; game/group decisions.
Area of review: Optimization.
History: Received July 2011; revisions received April 2012, September 2012, October 2012; accepted December 2012. Published online in Articles in Advance April 12, 2013.

1. Introduction

Patrol problems arise in many real-world situations. Police officers patrol highways and cities; security guards patrol museums and shopping malls; soldiers patrol military bases and borders. In essence, a patrol problem examines how to route the patroller through many locations in order to find illicit activities. With modern technological advancements, the patrol problem can be applied to many more contexts, such as routing unmanned aerial vehicles and speed boats. Whereas in most cases the patroller needs to move physically, it is not always the case. For instance, a high-resolution video camera installed on a surveillance tower or on a blimp can turn to monitor different locations almost instantaneously. A security officer monitoring many locations through real-time video feeds also faces a patrol problem if he can watch only one video feed at a time.

Patrol problems have been studied since the 1970s. Earlier works focused on allocating police patrol resources among different areas to maximize the overall performance (Chaiken and Dormont 1978, Chelst 1978, Larson 1972, Olson and Wright 1975). Besides police patrol in urban areas, there are specialized patrol models for rural areas (Birge and Pollock 1989) and on highways (Lee et al. 1979, Taylor et al. 1985). These earlier works assumed that the frequencies of crimes at different locations remain constant and are known to the patrol force. Game theory has been used to analyze some other problems related to patrol problems, such as search games and infiltration games. In search games, a searcher seeks to find a hider who does not want to be found (Alpern and Gal 2002, 2003; Thomas and Washburn 1991; Zoroa et al. 2009). In infiltration games, an intruder wants to penetrate an area without being caught by the guard (Auger 1991; Baston and Kikuta 2004, 2009; Ruckle 1983; Washburn and Wood 1995).

In this paper, we consider a patrol problem on a graph with $n$ nodes. A patroller needs to traverse the graph through edges to detect potential attacks at nodes. In each time unit, the patroller can move to a node adjacent to his current node and detect any ongoing attacks at the chosen node at the end of that time unit. The probability distribution of the time it takes to complete an attack, as well as the damage an undetected attack causes, depends on the node. There are two common ways to model an attacker's behavior. A random attacker chooses which node to attack according to a probability distribution known to the patroller, whereas a strategic attacker plays a two-person zero-sum game with the patroller. From a practical standpoint, we are more interested in the strategic attacker case. In this paper, however, we address both cases. The random-attacker case is technically interesting in its own right, and its solution provides valuable insights and paves the way for solving the strategic-attacker case.

There are a few recent studies on patrolling a graph in a game-theoretic setting. Shieh et al. (2012) divided the port of Boston into nine areas (nodes), and formulated the patrol problem as a two-person game. On each day, the defender randomly selects the start time of the patrol, and randomly selects a patrol schedule that needs to leave from and return to the base node. The work closest to our current work is that of Alpern et al. (2011). They studied the case of strategic attackers, and assumed that the time to complete an attack is deterministic and is the same for all nodes. They considered finite-time and infinite-time formulations; in the latter case the patrol must repeat every $T$ time periods for some predetermined $T$. The optimal solution can only be derived in very special cases. In our paper, we allow each node to have its own attack time distribution and study both random attackers and strategic attackers. In each case, we formulate an exact linear program to compute the optimal solution. Because the linear programs quickly become computationally intractable as the problem size grows, we propose index-based heuristics that are easy to compute (Gittins et al. 2011).

The index-based heuristics have been successful in earlier works (Archibald et al. 2009; Glazebrook et al. 2007, 2009), where a resource (patroller) is dynamically moved among projects (nodes) to optimize system performance. These earlier works focused on the case when the decision maker knows the probability rule that governs the underlying stochastic process, which is analogous to random attackers. In addition, the earlier works assumed that the resource (patroller) could be moved from one project (node) to any other project instantaneously, which is analogous to complete graphs. To the best of our knowledge, our work is the first to use index-based heuristics as a vehicle to produce effective policies in a game-theoretic setting, and the first to use an aggregate index to overcome the constraint on project availability.

The rest of this paper proceeds as follows. Section 2 presents a patrol model. Section 3 discusses the case of random attackers, and §4 discusses the case of strategic attackers. In both cases, we give an exact linear program to compute the optimal solution and propose near-optimal heuristics that are easy to compute. Finally, §5 concludes the paper and points out future research directions.

2. A Patrol Model

In anticipation of an attack, a defender (henceforth the patroller) patrols an area hoping to detect the attack before it completes. An attack is broadly construed as an illicit activity undertaken by an adversary, such as breaching a perimeter, surveilling the surroundings, or planting a bomb.

There are $n$ locations in the area subject to attack. To model a patroller's strategy, we embed the $n$ locations in a graph, where each node of the graph represents a location subject to attack. We study the case in which the patroller uses a discrete-time schedule. Two nodes are connected by an edge if the patroller can move from one node to the other in one time period during the patrol. Denote the $n \times n$ adjacency matrix by $a = \{a_{i,j}\}$, where $a_{i,j} = 1$ if nodes $i$ and $j$ are connected, or $a_{i,j} = 0$ otherwise. By definition, $a_{i,i} = 1$ for all $i$. In this paper, we only consider connected graphs. A patrol policy is an indefinite sequence of nodes that observes the edge constraint.

When an attacker arrives at location $i$, it takes a random amount of time to complete the attack, called the attack time at node $i$, denoted by $X_i$, $i = 1, \ldots, n$. The probability distributions of these attack times are arbitrary, but known to both the patroller and the attacker. Whereas an attack can initiate at any real-valued time, the patroller takes one time unit to move from one node to an adjacent node. To facilitate discussions, we assume that the patroller detects an ongoing attack if an attacker and the patroller occupy the same node at the end of a time period. An application illustrating this form of detection model can be found in antisubmarine warfare contexts, where a helicopter is equipped with a dipping sonar. The helicopter (patroller) makes discrete observations by lowering the sonar unit into the water, but must retract it prior to transiting to another search location, during which no detections can occur. We assume there are no false negatives. In other words, if the patroller visits node $i$, then the patroller detects any ongoing attacks at node $i$ at the end of the period. An undetected attack at node $i$ costs $c_i$ to the patroller, $i = 1, \ldots, n$, whereas a detected attack costs 0 regardless of how long the attacker was at the node prior to detection.

Our patrol model has many applications. For instance, the Coast Guard can patrol a port by dividing the port and surrounding waterways into several areas. A security guard can patrol a museum or art gallery. An unmanned aerial vehicle can patrol a combat zone in search of threats. With such applications in mind, the scale of problems appropriate for a single patroller should have a moderate number of nodes, so that there is a reasonable chance of detecting an attack in time. In their work, Shieh et al. (2012) divided the port of Boston into 9 nodes. We suggest that our model is most applicable when there are no more than 20 nodes in the patrol graph. Having a single patroller responsible for a much larger graph will result in a small chance of detecting an attack, thus making the patrol problem both impractical and uninteresting. The one place in the paper where we consider larger graphs (in Theorem 3 and Corollary 1) is also the one place where we consider multiple patrollers.

Loosely speaking, the patroller's objective is to find a patrol policy to minimize the expected cost incurred due to a successful attack, when and if an attack occurs. We consider two versions of the problem (random attackers and strategic attackers) and discuss them separately in the next two sections.

3. Patrol Against Random Attackers

This section studies the problem in which an attacker will choose node $i$ to attack with probability $p_i$, $i = 1, \ldots, n$. Whereas we assume the patroller knows $p_i$, we assume that attacks occur infrequently and the patroller has no knowledge about when an attack will occur. Loosely speaking, the patroller seeks to minimize the expected cost by assuming that an attack will eventually occur after a very long time. The problem starts over after an attack takes place, whether the attack is detected (other security measures ensue) or not (disaster happens). To formulate this objective function, we assume that the attackers arrive according to a Poisson process with rate $\Lambda$. Because the Poisson process has stationary and independent increments, this assumption implies that an attack is equally likely to occur at any moment in time and that the patroller cannot learn about future attack times from the attack history.

From a practical standpoint, the attack rate $\Lambda$ is usually extremely small. From the formulation standpoint, however, the value of $\Lambda$ is inconsequential if we let the problem continue indefinitely by ignoring interruptions from attacks. That is, many attackers can operate simultaneously at the same node, with each acting independently on its own and inflicting damage separately. By minimizing the long-run cost rate in this model, we also minimize the average cost due to each attack, because $\Lambda$ is just a scaling constant. Consequently, the optimal policy does not depend on the value of $\Lambda$.

3.1. MDP Formulation and Optimal Policy

Because each attacker independently chooses to attack node $i$ with probability $p_i$, attackers arriving at node $i$ constitute independent Poisson processes with respective rates $\lambda_i = p_i \Lambda$, $i = 1, \ldots, n$. If the patroller visits node $i$ in the current period, then according to our assumption, the patroller detects all ongoing attacks at node $i$ at the end of the period. Because there are no attackers at node $i$ immediately following a patrol at node $i$, and the attackers arrive at node $i$ according to a Poisson process, the state of the system can be delineated by $s = (s_1, s_2, \ldots, s_n)$, where $s_i$ denotes the time periods elapsed since node $i$ was last visited by the patroller, $i = 1, \ldots, n$. The state of each node increments by 1 for each time period without a visit, and returns to 1 immediately after the patroller's visit. We write the state space as

$$\Omega = \{(s_1, \ldots, s_n) : s_i = 1, 2, \ldots, \text{ for } i = 1, \ldots, n\}.$$

Because the patroller visits one node in each time period, all $s_i$, $i = 1, \ldots, n$, have distinct values. In addition, only one $s_i$ has value 1, namely, the node the patroller just visited. Therefore, the current node of the patroller can be represented by $l(s) = \arg\min_i s_i$.

Because for any given state the future of the process is independent of its past, we can formulate the problem as a Markov decision process (MDP). At the end of a time period, the patroller needs to decide whether to stay at the same node for another time period or move to one of the adjacent nodes. Thus, the action space is $A = \{j : j = 1, \ldots, n\}$. A deterministic, stationary patrol policy can be delineated by a map $\pi$ from the state space to the action space, $\pi : \Omega \to A$. Because the patroller can only move to a node adjacent to the current node, a specific mapping $s \to j$ is feasible if and only if $a_{l(s),j} = 1$. We use $A(s) = \{j : a_{l(s),j} = 1\}$ to denote the set of feasible actions (equivalently, the set of nodes the patroller can move to) when the process is in state $s$.

The transition probability of this MDP is deterministic. If the patroller next visits node $i \in A(s)$ when in state $s$, the system will transition to state $\tilde{s} = (\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n)$, where $\tilde{s}_i = 1$, and $\tilde{s}_j = s_j + 1$ for $j \neq i$. For notational simplicity, we define the transition function $\phi(s, i)$ to specify the resulting state if the patroller visits node $i$ in state $s$. Namely, $\phi(s, i) = \tilde{s}$.

To write the cost function for this MDP, suppose the current state is $s$, and the patroller visits node $i$ in the next time period. Because the patroller detects attackers at the end of the next time period, the cost incurred in this next time period at node $j$ is equal to the expected number of attackers who complete their attack at node $j$ in that time period, multiplied by $c_j$. As seen in Figure 1, suppose that at the time marked by a circle, the patroller decides to visit node $i$ next. The attacker arriving to node $j$ at the time marked by a square will complete its attack in the next time period if its attack time $X_j$ falls in $(t-1, t]$. Using the Poisson sampling theorem (see, for example, Proposition 5.3 in Ross 2010), the expected cost incurred at node $j$ is

$$C_j(s, i) = c_j \lambda_j \int_0^{s_j} P(t - 1 < X_j \le t)\,dt = c_j \lambda_j \int_{s_j - 1}^{s_j} P(X_j \le t)\,dt. \tag{1}$$

The preceding is true for all $j$, and does not depend on $i$, because our model assumes the patroller detects the attackers at the end of a time period. Consequently, the cost function for this MDP is $C(s, i) = \sum_{j=1}^{n} C_j(s, i)$. Although $C(s, i)$ does not directly depend on $i$, the choice of $i$ affects the state in the next time period, and therefore the cost incurred in the future.

Figure 1. This diagram explains the derivation of $C_j(s, i)$ in (1). [Timeline diagram not reproduced; the axis is time, with $s_j$, nodes $j$ and $i$, and $t$ marked.]

Notes. At the time marked by a circle, the last visit to node $j$ was $s_j$ time units ago and the patroller decides to visit node $i$ next. An attacker arriving to node $j$ at the time marked by the square will complete its attack during the patroller's visit to node $i$, if its attack time $X_j \in (t-1, t]$. The argument holds for $t \in [0, s_j]$.
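The cost in Equation (1) can be evaluated numerically. The following is a minimal sketch, assuming each attack time has a discrete distribution supplied as a dictionary of `{value: probability}` pairs; that representation and the helper names `cdf_integral`, `node_cost`, and `stage_cost` are illustrative assumptions, not part of the paper.

```python
def cdf_integral(dist, a, b):
    """Integral of P(X <= t) over [a, b] for a discrete X given as {value: probability}."""
    # Each atom x contributes p times the length of the part of [a, b] at or beyond x.
    return sum(p * max(0.0, b - max(a, x)) for x, p in dist.items())


def node_cost(c_j, lam_j, dist_j, s_j):
    """Equation (1): expected cost at node j in the next period, for state s_j >= 1."""
    return c_j * lam_j * cdf_integral(dist_j, s_j - 1, s_j)


def stage_cost(state, costs, rates, dists):
    """C(s, i) = sum_j C_j(s, i); the value does not depend on the node i visited next."""
    return sum(node_cost(c, lam, d, s)
               for c, lam, d, s in zip(costs, rates, dists, state))


# Assumed illustrative data: attack time 1.5 at node 0 and 2 at node 1.
example = stage_cost(state=(2, 3), costs=(3.0, 1.0), rates=(0.5, 0.5),
                     dists=({1.5: 1.0}, {2: 1.0}))
```

With a deterministic attack time, the cost at a node stays at zero until the time since the last visit exceeds the attack time, which matches the intuition that a sufficiently frequent visit intercepts every attack.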

In the case when the attack time $X_j$ is bounded, let

$$B_j \equiv \min\{k : k \in \mathbb{Z}^+,\ P(X_j \le k) = 1\}. \tag{2}$$

In other words, $B_j$ is the smallest integer that is an upper bound for $X_j$, $j = 1, \ldots, n$. The cost function in (1) is the same for any $s_j \ge B_j + 1$, $j = 1, \ldots, n$. Therefore, for bounded attack times, we can restrict our state space so that $s_j \le B_j + 1$, which allows us to modify our transition function $\tilde{s} = \phi(s, i)$ such that $\tilde{s}_i = 1$ and $\tilde{s}_j = \min(s_j + 1, B_j + 1)$ for $j \neq i$. For the remainder of the paper, we will assume that the attack times are bounded, so that the state space is finite.
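The current node, the feasible action set, and the truncated transition function described above can be sketched as follows. The function names and the three-node line graph are assumptions made for illustration, and nodes are numbered from 0 rather than 1.

```python
def current_node(state):
    """l(s) = argmin_i s_i: the node visited most recently (the one with s_i = 1)."""
    return min(range(len(state)), key=lambda i: state[i])


def feasible_actions(state, adjacency):
    """A(s): nodes j with a[l(s)][j] = 1, including the current node itself."""
    here = current_node(state)
    return [j for j in range(len(state)) if adjacency[here][j] == 1]


def transition(state, visit, bounds):
    """Truncated transition: the visited node resets to 1, others age up to B_j + 1."""
    return tuple(1 if j == visit else min(s_j + 1, bounds[j] + 1)
                 for j, s_j in enumerate(state))


# Assumed example: line graph 0 - 1 - 2 with self-loops, and B_j = 2 at every node.
adjacency = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1]]
bounds = (2, 2, 2)
state = (1, 2, 3)                            # the patroller has just visited node 0
moves = feasible_actions(state, adjacency)   # node 2 is not adjacent to node 0
next_state = transition(state, 1, bounds)    # node 2 is already at its cap of 3
```

The sketch assumes the state contains a node with value 1; it does not handle an initial state in which every node sits at its cap.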

The objective of this MDP is to minimize the total long-run cost rate among the $n$ nodes. Our state space is finite because the attack time distributions are bounded, and the action space is finite because the number of nodes is finite. Therefore, by Theorem 9.1.8 in Puterman (1994), we only need to consider deterministic, stationary policies.

Because the state transition is deterministic, we can define $\phi_\pi(s) \equiv \phi(s, \pi(s))$ as the resulting state if the patroller applies policy $\pi$ to state $s$. For an initial state $s_0$, policy $\pi$ will induce an indefinite, deterministic sequence of states, written as $\{\phi_\pi^k(s_0),\ k = 0, 1, 2, \ldots\}$, where $\phi_\pi^k = \phi_\pi \circ \phi_\pi^{k-1}$, for $k \ge 1$. Because the state space is finite, eventually some state will be visited for a second time, and thereafter the process regenerates itself because the state transition is deterministic under the same policy $\pi$. Consequently, after a number of initial transient moves, the sequence $\{\phi_\pi^k(s_0),\ k = 0, 1, 2, \ldots\}$ will repeat some cycle indefinitely. Therefore, if we apply policy $\pi$ to an initial state $s_0$, we can write the long-run cost rate at node $i$ as

$$V_i(\pi, s_0) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} C_i\big(\phi_\pi^k(s_0),\ \pi(\phi_\pi^k(s_0))\big),$$

which is also equal to the total expected cost incurred in a cycle divided by the cycle length. Furthermore, we call the sequence of nodes corresponding to the cycle a patrol pattern.
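The cycle argument above suggests a direct way to evaluate a given deterministic policy: follow the state sequence until a state repeats, then average the per-period cost over the cycle. The sketch below assumes discrete attack time distributions given as `{value: probability}` dictionaries, sums the cost over all nodes, and does not check the adjacency constraint; all function names are hypothetical.

```python
def cdf_integral(dist, a, b):
    """Integral of P(X <= t) over [a, b] for a discrete X given as {value: probability}."""
    return sum(p * max(0.0, b - max(a, x)) for x, p in dist.items())


def stage_cost(state, costs, rates, dists):
    """Total expected cost over the next period, summing Equation (1) over nodes."""
    return sum(c * lam * cdf_integral(d, s - 1, s)
               for c, lam, d, s in zip(costs, rates, dists, state))


def transition(state, visit, bounds):
    """Visited node resets to 1; every other node ages by one period, capped at B_j + 1."""
    return tuple(1 if j == visit else min(s + 1, bounds[j] + 1)
                 for j, s in enumerate(state))


def long_run_cost_rate(policy, s0, costs, rates, dists, bounds):
    """Average cost per period over the cycle that the policy eventually repeats."""
    first_seen = {}      # state -> index of the step at which it first occurred
    step_costs = []
    state = s0
    while state not in first_seen:
        first_seen[state] = len(step_costs)
        step_costs.append(stage_cost(state, costs, rates, dists))
        state = transition(state, policy(state), bounds)
    cycle = step_costs[first_seen[state]:]   # drop the initial transient moves
    return sum(cycle) / len(cycle), len(cycle)


def alternate(state):
    """Assumed two-node policy: always move to the node not visited last."""
    return 1 if state[0] == 1 else 0


# Assumed data: two nodes, attack time exactly 1, unit costs, arrival rate 0.5 each.
rate, cycle_length = long_run_cost_rate(
    alternate, (1, 2), (1.0, 1.0), (0.5, 0.5), ({1: 1.0}, {1: 1.0}), (1, 1))
```

In this example each node is visited every two periods, so an attack of duration 1 succeeds only if it starts in the first half of the interval between visits.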

We seek to determine the optimal long-run cost rate over all nodes, namely,

$$C^{\mathrm{OPT}}(s_0) \equiv \min_{\pi \in \Pi} \sum_{i=1}^{n} V_i(\pi, s_0), \tag{3}$$

where $\Pi$ denotes the class of deterministic, stationary patrol policies. We use the minimum instead of the infimum because $\Pi$ is finite, which in turn holds because the state space is finite. Dividing (3) by $\Lambda$ gives us the minimized long-run average cost incurred for each attack. When $c_i = 1$ for all $i$, the ratio can be interpreted as the probability of not detecting an attack. Whereas $V_i(\pi, s_0)$ does depend upon $s_0$, the optimal cost rate $C^{\mathrm{OPT}}(s_0)$ does not if the graph is connected, because $V_i(\pi, s_0)$ depends entirely on the patrol pattern generated by $s_0$ and $\pi$. Determining the optimal policy is therefore equivalent to finding the optimal patrol pattern. If the graph is connected, from any starting state $s_0$ one can construct a policy $\pi$ to produce any feasible patrol pattern. Thus, $C^{\mathrm{OPT}}$ is the same for all initial states, and we drop its notational dependence on $s_0$ for the remainder of the paper.

Now that we have defined all the components of the MDP, we can use standard techniques such as linear programming to compute the optimal long-run cost rate. We defer the details to §EC.1.1, which can be found in the electronic companion to this paper. An electronic companion to this paper is available as part of the online version at http://dx.doi.org/10.1287/opre.1120.1149. As discussed in §EC.1.1, this method quickly becomes computationally intractable for problems of moderate size, which motivates the need for efficient heuristics.

3.2. Heuristic Policies on Complete Graphs

To motivate our heuristic policies, we first consider complete graphs. A complete graph is suitable in the scenario where a security manager sits in a surveillance room watching real-time video feeds from various cameras. Although the security manager can watch only one video feed at a time, he can switch from any feed to any other feed anytime he wants, which is analogous to a patroller moving from his current node to any node directly. We use indices of the kind developed by Whittle (1988) to develop heuristic policies for the objective function in (3). We refer the reader to Gittins et al. (2011) for a recent account. Whittle index policies for restless bandits have seen near-optimal performance in many other applications (Archibald et al. 2009; Glazebrook et al. 2007, 2009). Below, we outline how to compute a heuristic policy.

To begin, recall that $C^{\mathrm{OPT}}$, defined in Equation (3), denotes the optimal long-run cost rate. First, we relax the problem by extending the class of policies so that the patroller is allowed to visit multiple nodes in a time period, as long as the overall long-run visit rate is no greater than 1. To do so, denote by $\Pi^{\mathrm{MN}}$ the set of stationary, deterministic patrol policies

$$\pi : \Omega \to \{\boldsymbol{\alpha} : \alpha_i \in \{0, 1\} \text{ for } i = 1, \ldots, n\},$$

where $\alpha_i = 1$ if the patroller will visit node $i$ in the next period. Similar to §3.1, the combination of $\pi \in \Pi^{\mathrm{MN}}$ and initial state $s_0$ induces a patrol pattern. This pattern is more complex than those generated by $\pi \in \Pi$, because now the patroller can visit multiple nodes in one period. Because the same pattern will repeat indefinitely, we can denote by $\mu_i(\pi, s_0)$ the rate at which the patroller visits node $i$ with policy $\pi \in \Pi^{\mathrm{MN}}$ with initial state $s_0$, which is just the number of visits to node $i$ in the patrol pattern divided by its length.

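We next restrict $\Pi^{\mathrm{MN}}$ to only include policies $\pi$ that meet the total-rate constraint

$$\sum_{i=1}^{n} \mu_i(\pi, s_0) \le 1, \quad \forall\, s_0 \in \Omega. \tag{4}$$

We denote the set of policies that satisfy this constraint as $\Pi^{\mathrm{TR}}$:

$$\Pi^{\mathrm{TR}} = \Big\{\pi \in \Pi^{\mathrm{MN}} : \sum_{i=1}^{n} \mu_i(\pi, s_0) \le 1,\ \forall\, s_0 \in \Omega\Big\}.$$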

Although both $V_i(\pi, s_0)$ and $\mu_i(\pi, s_0)$ depend on the initial state, the optimal long-run cost rate does not, as explained in §3.1. For the remainder of the paper we will write instead $V_i(\pi)$ and $\mu_i(\pi)$ to simplify notation when we can safely ignore their connections to $s_0$ without ambiguity. The relaxed problem can be formulated by

$$C^{\mathrm{TR}} \equiv \min_{\pi \in \Pi^{\mathrm{TR}}} \sum_{i=1}^{n} V_i(\pi). \tag{5}$$

Comparing Equations (3) and (5), it follows immediately that $C^{\mathrm{OPT}} \ge C^{\mathrm{TR}}$ because $\Pi$ is a subset of $\Pi^{\mathrm{TR}}$.

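Second, we relax the problem again by incorporating the total-rate constraint in (4) into the objective function with a Lagrange multiplier $w \ge 0$:

$$C(w) \equiv \min_{\pi \in \Pi^{\mathrm{MN}}} \Big\{ \sum_{i=1}^{n} V_i(\pi) + w \Big( \sum_{i=1}^{n} \mu_i(\pi) - 1 \Big) \Big\} = \min_{\pi \in \Pi^{\mathrm{MN}}} \sum_{i=1}^{n} \big( V_i(\pi) + w \mu_i(\pi) \big) - w. \tag{6}$$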

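By incorporating a Lagrange multiplier, we can drop the total-rate constraint in (4), so that in (6) the patroller can visit up to $n$ nodes in every time period if he chooses to do so. For any $w \ge 0$, we have that

$$C^{\mathrm{TR}} = \min_{\pi \in \Pi^{\mathrm{TR}}} \sum_{i=1}^{n} V_i(\pi) \ge \min_{\pi \in \Pi^{\mathrm{TR}}} \Big\{ \sum_{i=1}^{n} V_i(\pi) + w \Big( \sum_{i=1}^{n} \mu_i(\pi) - 1 \Big) \Big\} \ge \min_{\pi \in \Pi^{\mathrm{MN}}} \Big\{ \sum_{i=1}^{n} V_i(\pi) + w \Big( \sum_{i=1}^{n} \mu_i(\pi) - 1 \Big) \Big\} = C(w).$$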

The first inequality follows because $w \ge 0$, and $\sum_{i=1}^{n} \mu_i(\pi) - 1 \le 0$ for any policy $\pi \in \Pi^{\mathrm{TR}}$; the second inequality follows because the total-rate constraint $\sum_{i=1}^{n} \mu_i(\pi) \le 1$ is dropped. Consequently, we have a string of inequalities:

$$C^{\mathrm{OPT}} \ge C^{\mathrm{TR}} \ge C(w). \tag{7}$$

The optimization problem in (6) breaks up the original problem into $n$ separate problems, each concerning a single node. For instance, node $i$ wants to minimize $V_i(\pi) + w \mu_i(\pi)$, where $w$ can be interpreted as the service charge when the patroller spends one time period at node $i$. By solving this problem, it becomes possible to compute an index for each node in each state. An index heuristic policy is for the patroller to visit the node that has the highest index.

3.2.1. Single-Node Problem. This section focuses on the problem facing a single node, when each visit from the patroller costs $w > 0$. Namely, consider the objective function in (6) concerning only node $i$, and strip off the subscript for simplicity:

$$\min_{\pi \in \Pi^{\mathrm{MN}}} V(\pi) + w \mu(\pi). \tag{8}$$

We consider a similar MDP to the one described in §3.1, with the state being the time since the last patrol visit to this node. For the single-node problem a policy $\pi \in \Pi^{\mathrm{MN}}$ simplifies to a binary decision: visit the node or wait. The objective function is to minimize the long-run cost rate, which includes the cost due to not detecting an attack, and the service cost due to a patroller's visit. Because the state space and action space are both finite, we only need to consider deterministic, stationary policies (see Puterman 1994, Theorem 9.1.8). That is, the optimal action (whether the patroller should visit the node) depends only on the number of periods since the last patrol visit. Because the state increases by 1 each time period without a patrol visit, and returns to 1 after a visit, it is sufficient to consider a policy of this type: Do not visit in states $1, 2, \ldots, k-1$, and visit in state $k$, where $k$ is a positive integer. In other words, we only need to consider those policies that visit the node once every $k$ time periods, for $k = 1, 2, \ldots$.

We next write out the objective function in (8) when the patroller visits the node once every $k$ time periods. We say a renewal occurs each time the patroller visits the node, so the cycle time (time between renewals) is $k$. An attacker arriving at time $t$ following a renewal, $0 \le t < k$, will complete its attack if its attack time is no greater than $k - t$. Using a Poisson sampling result (for example, Proposition 5.3 in Ross 2010), the number of successful attacks in a cycle follows a Poisson distribution with expected value equal to

$$\lambda \int_0^k P(X \le k - t)\,dt = \lambda \int_0^k P(X \le t)\,dt.$$

Because each successful attack costs $c$, and a patrol visit costs $w$, the long-run cost rate is

$$f(k) \equiv \frac{c \lambda \int_0^k P(X \le t)\,dt + w}{k} \tag{9}$$

for $k = 1, 2, \ldots$. Thus, solving (8) is equivalent to finding $k$ to minimize $f(k)$ in (9).

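To minimize $f(k)$, we first compute

$$f(k+1) - f(k) = \frac{1}{k(k+1)} \Big( c \lambda k \int_0^{k+1} P(X \le t)\,dt - c \lambda (k+1) \int_0^k P(X \le t)\,dt - w \Big) = \frac{1}{k(k+1)} \Big( c \lambda k \int_k^{k+1} P(X \le t)\,dt - c \lambda \int_0^k P(X \le t)\,dt - w \Big).$$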

By setting $f(k+1) = f(k)$, we can find the per-visit cost $w$ that makes the patroller indifferent between visiting the node once every $k$ time periods, or once every $k+1$ time periods. The solution will help us characterize the optimal policy minimizing $f(k)$, and is defined by

$$W(k) \equiv c \lambda \Big( k \int_k^{k+1} P(X \le t)\,dt - \int_0^k P(X \le t)\,dt \Big) \tag{10}$$

for $k = 1, 2, \ldots$. Because $X$ is bounded by a constant $B$, for $k \ge B$, we have that

$$W(k) = c \lambda \Big( k - \int_0^k P(X \le t)\,dt \Big) = c \lambda \int_0^k P(X > t)\,dt = c \lambda \int_0^B P(X > t)\,dt = c \lambda E[X]. \tag{11}$$

In addition, for $k = 0$, Equation (10) implies $W(0) = 0$. The next theorem uses the functions $W(k)$, $k \ge 0$, to characterize the optimal policy minimizing the objective in (8).

Theorem 1. The function $W(k)$ defined in (10) is nondecreasing in $k$. In addition, for the single-node problem defined in (8), if $w \in [W(k-1), W(k)]$, then it is optimal to visit the node once every $k$ time periods, for $k = 1, 2, \ldots$. Moreover, if $w \ge c \lambda E[X]$, it is optimal not to visit the node at all.

The proof of this theorem is deferred to §EC.2.1. From the theorem, we can interpret $W(k)$ as the maximum per-visit cost for the policy that visits the node in state $k$ (once every $k$ time periods) to be optimal.
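Equations (9) and (10) and the statement of Theorem 1 can be checked on a small case. The sketch below assumes a discrete attack time given as a `{value: probability}` dictionary; the function names and the example distribution are illustrative assumptions. When $w$ equals some $W(k)$ exactly, two intervals tie, and `optimal_interval` returns the longer one.

```python
import math


def cdf_integral(dist, a, b):
    """Integral of P(X <= t) over [a, b] for a discrete X given as {value: probability}."""
    return sum(p * max(0.0, b - max(a, x)) for x, p in dist.items())


def f(k, w, c, lam, dist):
    """Equation (9): long-run cost rate when the node is visited once every k periods."""
    return (c * lam * cdf_integral(dist, 0, k) + w) / k


def index_w(k, c, lam, dist):
    """Equation (10): per-visit cost making intervals k and k + 1 equally good."""
    return c * lam * (k * cdf_integral(dist, k, k + 1) - cdf_integral(dist, 0, k))


def optimal_interval(w, c, lam, dist):
    """Interval implied by Theorem 1; math.inf means never visiting is optimal."""
    bound = math.ceil(max(dist))             # B: smallest integer upper bound on X
    if w >= index_w(bound, c, lam, dist):    # w >= c * lam * E[X]
        return math.inf
    return next(k for k in range(1, bound + 1) if index_w(k, c, lam, dist) > w)


# Assumed example: X is 1 or 3 with equal probability, c = 1, lambda = 1.
dist = {1: 0.5, 3: 0.5}
indices = [index_w(k, 1.0, 1.0, dist) for k in range(4)]
best_k = optimal_interval(1.0, 1.0, 1.0, dist)
brute_force_k = min(range(1, 8), key=lambda k: f(k, 1.0, 1.0, 1.0, dist))
```

Here the indices are nondecreasing, the last one equals $c \lambda E[X] = 2$, and the interval from the index rule agrees with a brute-force minimization of $f(k)$.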

3.2.2. Index Heuristic (IH). To develop a heuristic based on indices, affix a subscript $i$ in Equation (10) to define

$$W_i(k) \equiv c_i \lambda_i \Big( k \int_k^{k+1} P(X_i \le t)\,dt - \int_0^k P(X_i \le t)\,dt \Big) \tag{12}$$

as the index of node $i$ if the last patrol visit to node $i$ took place $k$ time periods ago, or equivalently, if node $i$ is in state $k$.

To implement a heuristic based on these indices, we choose the initial state by supposing that the patrol area has been neglected for a long time. Therefore, initially we set $s_i = B_i + 1$. In the first time period, the patroller simply begins his patrol at a node that has the highest index value. For each subsequent time period, the patroller compares the indices of all nodes (including the current node) and visits the one that has the highest index value. We call the preceding patrol policy the index heuristic (IH). Mathematically, whenever in state $s = (s_1, s_2, \ldots, s_n)$, the patroller next visits node $j$ if $W_j(s_j) = \max_{i=1,2,\ldots,n} W_i(s_i)$. In case there is a tie, break the tie arbitrarily.

Recall from Theorem 1 that $W_i(k)$ is nondecreasing in $k$ for all $i$. Therefore, a node's index value increases with each time period without a patrol visit and returns to its smallest possible value immediately after a patrol visit. Because $\lambda_i = p_i \Lambda$ for all $i$ and the indices are used for comparison across nodes, an equivalent index is to replace $\lambda_i$ with $p_i$ in (12).
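A minimal sketch of the index heuristic on a complete graph follows, using $p_i$ in place of $\lambda_i$ as noted above. It assumes discrete attack times given as `{value: probability}` dictionaries and breaks ties in favor of the lowest-numbered node, which is one arbitrary choice among many; the function names are hypothetical.

```python
import math


def cdf_integral(dist, a, b):
    """Integral of P(X <= t) over [a, b] for a discrete X given as {value: probability}."""
    return sum(p * max(0.0, b - max(a, x)) for x, p in dist.items())


def index_w(k, c, p, dist):
    """Equation (12) with p_i substituted for lambda_i."""
    return c * p * (k * cdf_integral(dist, k, k + 1) - cdf_integral(dist, 0, k))


def index_heuristic(costs, probs, dists, steps):
    """Return the first `steps` nodes visited by the index heuristic (IH)."""
    n = len(costs)
    bounds = [math.ceil(max(d)) for d in dists]   # B_i for each node
    state = [b + 1 for b in bounds]               # area neglected for a long time
    path = []
    for _ in range(steps):
        values = [index_w(state[i], costs[i], probs[i], dists[i]) for i in range(n)]
        visit = max(range(n), key=lambda i: values[i])   # first maximum wins ties
        path.append(visit)
        state = [1 if i == visit else min(state[i] + 1, bounds[i] + 1)
                 for i in range(n)]
    return path


# Assumed example: two identical nodes with attack time exactly 2.
path = index_heuristic((1.0, 1.0), (0.5, 0.5), ({2: 1.0}, {2: 1.0}), steps=4)
```

For two identical nodes the heuristic alternates between them, since the node just visited drops to its smallest index value.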

3.2.3. Lower Bound. Recall from (7) that $C^{\mathrm{OPT}} \ge C^{\mathrm{TR}} \ge C(w)$, where $C(w)$ represents the optimal long-run cost rate when each node operates independently with a per-visit cost $w$. The value $C(w)$ in (6) is a lower bound for the optimal cost rate $C^{\mathrm{OPT}}$ for any $w \ge 0$. In this section, we compute the tightest such lower bound, namely, $C^{\mathrm{TR}} = \max_{w \ge 0} C(w)$.

Recall that $W_i(k)$ represents the per-visit cost that makes node $i$ indifferent between receiving a patrol visit once every $k$ time periods, or once every $k+1$ time periods. For a given per-visit cost $w$, we can define

$$K_i(w) \equiv \begin{cases} \infty, & \text{if } w \ge W_i(B_i); \\ \min\{k : W_i(k) > w\}, & \text{otherwise.} \end{cases}$$

From Theorem 1, $K_i(w)$ represents the optimal interval between visits at node $i$ when each patrol visit costs $w$. According to this definition, when there are multiple optimal intervals, we break ties by choosing the longest such interval. Consequently, we can rewrite $C(w) = \sum_{i=1}^n C_i(w) - w$, where

$$C_i(w) \equiv f_i(K_i(w)) = \begin{cases} c_i \lambda_i, & \text{if } w \ge W_i(B_i); \\[4pt] \dfrac{c_i \lambda_i \int_0^{K_i(w)} P(X_i \le t)\,dt + w}{K_i(w)}, & \text{otherwise.} \end{cases}$$

The second part of the preceding is derived by affixing a subscript $i$ in (9).

The function $C_i(w)$ represents the optimal long-run cost rate for node $i$ if the patroller charges $w$ for each patrol visit. First, $C_i(w)$ must be nondecreasing in $w$, because the node can always do better with a smaller service charge by using the same service interval. Second, $C_i(w)$ is piecewise linear, with turning points occurring only at $w = W_i(k)$, for $k = 1, 2, \ldots, B_i$. Third, $C_i(w)$ is concave, because for $w \ne W_i(k)$, the function $K_i(w)$ remains a constant and $C_i'(w) = 1/K_i(w)$, which is nonincreasing in $w$, because $K_i(w)$ is nondecreasing in $w$. Consequently, $C(w) = \sum_{i=1}^n C_i(w) - w$ is also piecewise linear and concave. Therefore, it is straightforward to compute $\max_w C(w)$.

The optimal solution that maximizes $C(w)$ can either be a point or a line segment. When $w \ne W_i(k)$ for all $i$, $k$, we have that $C'(w) = \sum_{i=1}^n 1/K_i(w) - 1$. That is, $C'(w)$ is a step function that changes value at $W_i(k)$ for some $i$, $k$, and is nonincreasing. In the case where the optimal solution is unique, denoted by $w^*$, we need to find $w^*$ such that

$$w < w^* \iff \sum_{i=1}^n \frac{1}{K_i(w)} > 1,$$

$$w > w^* \iff \sum_{i=1}^n \frac{1}{K_i(w)} < 1.$$

In the case where the optimal solutions consist of a line segment, we need to find $w^*$ such that $\sum_{i=1}^n 1/K_i(w^*) = 1$.
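Because $C(w)$ is concave and piecewise linear with turning points only at the values $W_i(k)$, its maximum over $w \ge 0$ can be found by evaluating it at $w = 0$ and at every turning point. The sketch below does this under the assumption of deterministic attack times; the function names are hypothetical and the instance is the two-node Example of §3.4.

```python
import math


def integrated_cdf(t, attack_time):
    """Integral of P(X <= u) over [0, t] for a deterministic attack time."""
    return max(0.0, t - attack_time)


def index(k, c, lam, attack_time):
    """W_i(k) from (12)."""
    g_k = integrated_cdf(k, attack_time)
    return c * lam * (k * (integrated_cdf(k + 1, attack_time) - g_k) - g_k)


def best_interval(w, c, lam, attack_time, bound):
    """K_i(w): infinity if w >= W_i(B_i), else the smallest k with W_i(k) > w."""
    if w >= index(bound, c, lam, attack_time):
        return math.inf
    return next(k for k in range(1, bound + 1) if index(k, c, lam, attack_time) > w)


def node_cost(w, c, lam, attack_time, bound):
    """C_i(w) = f_i(K_i(w))."""
    k = best_interval(w, c, lam, attack_time, bound)
    if math.isinf(k):
        return c * lam                        # node is never visited
    return (c * lam * integrated_cdf(k, attack_time) + w) / k


def relaxed_cost(w, nodes):
    """C(w) = sum_i C_i(w) - w."""
    return sum(node_cost(w, *node) for node in nodes) - w


def lagrangian_bound(nodes):
    """Return (max C(w), a maximizing w) over w = 0 and all turning points W_i(k)."""
    candidates = {0.0}
    for (c, lam, x, bound) in nodes:
        candidates.update(index(k, c, lam, x) for k in range(1, bound + 1))
    return max((relaxed_cost(w, nodes), w) for w in candidates)


# (c_i, lambda_i, deterministic attack time, B_i) for the two-node Example.
bound_value, w_star = lagrangian_bound([(1.0, 0.1, 2.0, 3), (1.0, 0.9, 2.5, 3)])
print(bound_value, w_star)
```

For this instance the bound is zero, which matches the optimal cost rate of the alternating pattern and is consistent with Theorem 2.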

3.2.4. Optimality of the Index Heuristic in Special Cases. The index machinery has been used to develop heuristics in the context of restless bandits (Gittins et al. 2011, Whittle 1988). In some special cases, it is possible to show that index heuristics of this kind produce the optimal solution. To intuitively understand how this might occur, observe that there must exist $w^*$ such that $C(w^*) = \max_{w \ge 0} C(w)$, and $w^* = W_i(k)$ for some $i$, $k$ (see §3.2.3). For ease of explanation, suppose that $w^*$ and the node are both unique and that the latter is labeled 1. The Lagrangian relaxation with $w = w^*$ will be solved by any (possibly inadmissible) patrol pattern $\xi_1$ that visits each node $i$ every $K_i(w^*)$ time units, and by a second pattern $\xi_2$ that modifies $\xi_1$ by visiting node 1 every $K_1(w^*) - 1$ time units. Some randomization between these two patterns, denoted $\alpha \otimes \xi_1 + (1 - \alpha) \otimes \xi_2$, will achieve $\max_{w \ge 0} C(w)$ and also have the property that the overall rate of patrol visits is 1. If one can show that from some finite time on, the IH produces the same pattern of costs to each node as does $\alpha \otimes \xi_1 + (1 - \alpha) \otimes \xi_2$, then the IH must be optimal. It turns out that this is always the case when $n = 2$, which we state in Theorem 2. For larger problems, this matching of cost rates between the IH and $\alpha \otimes \xi_1 + (1 - \alpha) \otimes \xi_2$ can be achieved, when there are $Q$ patrollers with $Q$ a suitably chosen integer, by creating $Q$ shifted copies of patrol patterns that are inadmissible for a single patroller. Theorem 3 formalizes this idea.

Theorem 2. If $n = 2$, then the IH is optimal and $C^{\mathrm{IH}} = C^{\mathrm{OPT}} = \max_{w \ge 0} C(w)$.

The proof of this theorem is deferred to §EC.2.2. From Theorem 2, we see that the IH is optimal for very small graphs ($n = 2$). Unfortunately, the theorem does not hold for $n \ge 3$, as we have seen in counterexamples. In order to explore its performance for very large graphs ($n \to \infty$), it is natural to consider the asymptotic regime discussed by Weber and Weiss (1990, 1991). In this regime, the number of nodes of the graph and the number of patrollers go to infinity in fixed proportion. We are able to establish Theorem 3 and Corollary 1, which make considerably stronger claims than are generally available for restless bandits, and which are not subject to the associated sufficient conditions that are difficult to verify. Please note that this is the only place we consider multiple patrollers in this paper.

Recall that in our formulation, each node $i$ is characterized by the triple $(c_i, \lambda_i, F_i)$, where $F_i$ is the distribution function of the attack time $X_i$, $i = 1, \ldots, n$. We now consider a $Q$-fold amplification of our base 1-patroller-$n$-node problem. The $Q$-fold amplification consists of $Q$ autonomous patrollers operating among $nQ$ nodes, the latter consisting of $Q$ nodes with the characteristics $(c_i, \lambda_i, F_i)$ for each $i$, $i = 1, \ldots, n$. The new graph with $nQ$ nodes is still complete, namely, every node is accessible from every other node in a single time step. It will assist to label the nodes $(i, j)$, for $i = 1, \ldots, n$ and $j = 1, \ldots, Q$, where nodes $(i, j)$, $j = 1, \ldots, Q$, share the characteristics $(c_i, \lambda_i, F_i)$. We denote objects related to the $Q$-fold amplification via a superscript $Q$, and hence write $C^{\mathrm{OPT},Q}$, $C^Q(w)$, etc. The Lagrangian relaxation appropriate for the $Q$-fold amplification replaces (4) by the constraint $\sum_{i=1}^n \sum_{j=1}^Q \mu_{ij}(\pi) \le Q$. It is evident that $C^Q(w) = Q \cdot C(w)$, where $C(w)$ is, as usual, defined with respect to the base 1-patroller-$n$-node problem. Hence, there is a common maximizer $w^*$ for both $C^Q(\cdot)$ and $C(\cdot)$. We shall use the notations $W_i(k)$, $K_i(w)$ unambiguously in what follows without the need for the second node identifier. We are now able to state that for any base problem, certain $Q$-fold amplifications are such that the corresponding Lagrangian relaxation is tight and that there is an optimal policy, which from some time on always visits $Q$ nodes of highest index.

Theorem 3. For complete graphs, there exists $Q \in \mathbb{Z}^+$ such that

$$C^{\mathrm{OPT},NQ} = \max_{w \ge 0} C^{NQ}(w), \quad \forall N \in \mathbb{Z}^+.$$

Moreover, for any such $NQ$-amplification there exists an optimal policy, which from time $Q$ onward always visits $NQ$ nodes of highest index.

The proof of this theorem is deferred to §EC.2.3. The following result concerns asymptotic tightness of the Lagrangian relaxation and asymptotic optimality of index policies for problems with complete graphs. It is a simple consequence of Theorem 3 and its proof.

Corollary 1. For complete graphs the Lagrangian relaxation in (6) is asymptotically tight in the sense that for any base problem, $\lim_{m \to \infty} C^{\mathrm{OPT},m}/m = \max_{w \ge 0} C(w)$. Moreover, there exists $Q \in \mathbb{Z}^+$ and a sequence of policies $\{\pi_m, m \in \mathbb{Z}^+\}$ such that $\pi_m$ is a policy for the $m$-amplification choosing $m$ nodes of highest index at all times from time $Q$ onward and satisfying $\lim_{m \to \infty} C^{\pi_m,m}/m = \max_{w \ge 0} C(w)$.

In this subsection, we showed the optimality of the IH, in certain special cases, by proving that it achieves the lower bound based on the Lagrangian relaxation. There are other cases where we can construct an optimal policy from the Lagrangian lower bound. For instance, if the Lagrangian relaxation results in $K_i(w^*) = n$ (that is, a visit rate of $1/n$ at each node), $i = 1, \ldots, n$, for a complete graph or a circle graph with $n$ nodes, then the optimal policy is simply any Hamiltonian cycle. That said, it is extremely unlikely that tractable approaches to the development of patrol patterns achieving $C(w^*)$ can be developed in general. For example, even in the special case when $K_i(w^*) = n$ for $i = 1, \ldots, n$, determining whether there exists a Hamiltonian cycle in an arbitrary graph is NP-complete (Karp 1972). Intuitively speaking, the IH uses a greedy method to find a feasible patrol pattern that comes close to the optimal policy for the relaxed Lagrangian problem. The next section extends the ideas of the IH to develop heuristic policies on arbitrary graphs.

3.3. Heuristic Policies on Arbitrary Graphs

On complete graphs, the patroller may visit any node at any time, whereas on an arbitrary graph, the patroller can only visit a node that is adjacent to his current node. If the patroller just visits the node that has the highest index value among all adjacent nodes, the patroller may easily become stuck in a subgraph, especially on a leaf node. To overcome this downside, we allow the patroller to look ahead a few time periods to compute an aggregate index. Such a computation is possible because the state of each node depends entirely on the patrol path, without involving any randomness. There are two natural ways to formulate the aggregate index. For comparison purposes in our numerical study, we also include a myopic heuristic based only on the expected cost that can be avoided for the next time period.

3.3.1. Index Reward Heuristic (IRH). First, we can interpret the index of the selected node as a reward. With this interpretation, an $l$-step look-ahead aggregate index of a path is the sum of the indices accumulated over that path in the next $l$ time periods. The patroller can list all possible paths of length $l$ and choose the next node to visit based on the largest aggregate index among all those paths. Even though this heuristic computes the aggregate index for an $l$-step path, the aggregate index is used to determine only the next node. Once at the next node, a new $l$-step look-ahead aggregate index is computed. We call this heuristic the index reward heuristic (IRH).

Regardless of the choice of $l$, the IRH is a function that maps from a state to a node. Because the state transition is deterministic, whenever the process enters the same state, the IRH will generate the same patrol sequence. Therefore, the patrol schedule generated by the IRH produces an indefinite repetition of some finite patrol pattern. For a given patrol pattern, we can evaluate its long-run cost rate in a straightforward manner.

One interesting and important question is, does the performance of the heuristic always improve when $l$ increases? The answer is no. When comparing $l = 1$ and $l = 2$, the IRH may return the same patrol pattern, or may return two distinct patterns. When the two patterns are distinct, the pattern generated with $l = 2$ may perform better than the pattern generated with $l = 1$, or it may perform worse. The time it takes to compute a patrol pattern in a complete graph is proportional to $n^l$, because we need to compare all $n^l$ paths of length $l$ to determine the next node. For these reasons, it does not make sense to compute a patrol pattern by setting $l = 2$ without first examining $l = 1$.

We call it the index reward heuristic with depth $d$, or IRH($d$), if we compare the $d$ patrol patterns generated by look-ahead windows $l = 1, 2, \ldots, d$, and choose the best one. Consequently, by definition, the IRH($d$) improves (weakly) as $d$ increases. For complete graphs, IH and IRH(1) are the same.

3.3.2. Index Penalty Heuristic (IPH). Second, we can interpret the indices of the unselected nodes as penalties. With this interpretation, an $l$-step look-ahead aggregate index of a path is the sum of all indices of unselected nodes accumulated over that path in the next $l$ time periods. The patroller can list all possible paths of length $l$ and choose the next node to visit based on the smallest aggregate index among all those paths. We call it the index penalty heuristic with depth $d$, or IPH($d$), if we compare the $d$ patrol patterns generated by look-ahead windows $l = 1, 2, \ldots, d$, and choose the best one. By definition, the IPH($d$) improves (weakly) as $d$ increases. Note that IPH(1) and IRH(1) are the same.
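The two look-ahead rules differ only in how the aggregate index of a path is accumulated, which the following sketch makes explicit. It is an illustration under stated assumptions, not the authors' code: attack times are deterministic, states are capped at $B_i + 1$, ties go to the first path enumerated, the patroller starts at a hypothetical `start` node, and `adj[v]` lists the nodes reachable from `v` in one period (including `v` itself if staying is allowed).

```python
def integrated_cdf(t, attack_time):
    return max(0.0, t - attack_time)


def index(k, c, lam, attack_time):
    """W_i(k) from (12) for a deterministic attack time."""
    g_k = integrated_cdf(k, attack_time)
    return c * lam * (k * (integrated_cdf(k + 1, attack_time) - g_k) - g_k)


def advance(state, j, bounds):
    """Visited node resets to state 1; every other node ages by one period (capped)."""
    return tuple(1 if i == j else min(s + 1, b + 1)
                 for i, (s, b) in enumerate(zip(state, bounds)))


def walks(current, adj, length):
    """All feasible paths of the given length starting with a move out of `current`."""
    if length == 0:
        yield ()
        return
    for nxt in adj[current]:
        for rest in walks(nxt, adj, length - 1):
            yield (nxt,) + rest


def aggregate(state, path, nodes, bounds, mode):
    """Sum of selected-node indices (reward) or of unselected-node indices (penalty)."""
    total = 0.0
    for j in path:
        scores = [index(s, *node) for s, node in zip(state, nodes)]
        if mode == "reward":
            total += scores[j]
        else:
            total += sum(w for i, w in enumerate(scores) if i != j)
        state = advance(state, j, bounds)
    return total


def patrol_pattern(adj, nodes, bounds, l, mode, start=0, max_steps=2000):
    """Apply the l-step look-ahead rule until (state, location) repeats."""
    pick = max if mode == "reward" else min
    state, current = tuple(b + 1 for b in bounds), start
    seen, visits = {}, []
    for t in range(max_steps):
        if (state, current) in seen:
            return visits[seen[(state, current)]:]
        seen[(state, current)] = t
        best = pick(walks(current, adj, l),
                    key=lambda p: aggregate(state, p, nodes, bounds, mode))
        current = best[0]                     # only the first move is executed
        visits.append(current + 1)            # 1-based node labels
        state = advance(state, current, bounds)
    return visits


# Two-node complete graph from the Example in Section 3.4.
adj = {0: [0, 1], 1: [0, 1]}
nodes = [(1.0, 0.1, 2.0), (1.0, 0.9, 2.5)]
irh1 = patrol_pattern(adj, nodes, [3, 3], l=1, mode="reward")
irh2 = patrol_pattern(adj, nodes, [3, 3], l=2, mode="reward")
iph2 = patrol_pattern(adj, nodes, [3, 3], l=2, mode="penalty")
print(irh1, irh2, iph2)
```

On this instance the reward rule with $l = 2$ yields a three-period pattern that visits node 1 twice, whereas the penalty rule with $l = 2$ keeps the alternating pattern, in line with the discussion of the Example in §3.4.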

3.3.3. Myopic Heuristic (MH). We also consider a myopic heuristic in our numerical study. In state $s$, if the patroller visits node $i$ next, then the expected number of attacks he can detect is $\lambda_i \int_0^{s_i} P(X_i > t)\,dt$, which follows from the Poisson sampling theorem (see, for example, Proposition 5.3 in Ross 2010). The expected cost that can be avoided (or reward gained) if the patroller visits node $i$ in state $s$ is therefore

$$R(s, i) = c_i \lambda_i \int_0^{s_i} P(X_i > t)\,dt. \tag{13}$$

A myopic heuristic policy that looks ahead $l$ time periods compares all feasible paths of length $l$ and chooses the next node to visit according to the path that gives the highest total reward gained over that path. We call it the myopic heuristic with depth $d$, or MH($d$), if we compare the $d$ patrol patterns generated by look-ahead windows $l = 1, 2, \ldots, d$, and choose the best one. By definition, the MH($d$) improves (weakly) as $d$ increases.
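For a concrete feel of the reward in (13), the sketch below evaluates it in closed form for a uniform attack time on $[a, b]$. The function names and the numerical values are illustrative assumptions.

```python
def survival_integral_uniform(k, a, b):
    """Integral of P(X > t) over [0, k] for X ~ Uniform[a, b] with a < b."""
    if k <= a:
        return float(k)                       # no attack can finish before time a
    if k >= b:
        return (a + b) / 2.0                  # the full integral equals E[X]
    return a + ((b - a) ** 2 - (b - k) ** 2) / (2.0 * (b - a))


def myopic_reward(k, c, lam, a, b):
    """R(s, i) in (13) when node i is in state s_i = k."""
    return c * lam * survival_integral_uniform(k, a, b)


rewards = [myopic_reward(k, c=1.0, lam=0.5, a=1.0, b=3.0) for k in range(1, 6)]
print(rewards)
```

The reward grows with the time since the last visit and levels off at $c_i \lambda_i E[X_i]$ once that time reaches the upper bound of the attack time.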

3.3.4. Lower Bound. To derive a lower bound for the optimal value on an arbitrary graph, note that the patroller can do no worse if all the nodes were connected. Therefore, the lower bound derived in §3.2.3 is also a lower bound for arbitrary graphs. However, because that lower bound does not take into account graph structure, it can be quite loose in general, especially for sparse graphs such as trees.

We next present a linear program that computes a tighter lower bound by taking into account graph structure. To begin, consider a single patrol pattern, and denote by $y_{ik}$ the rate at which the patroller enters node $i$ exactly $k$ time units after his previous visit to node $i$. For instance, if the patrol pattern is 1-1-2-1-3-1-3-2, then $y_{1,2} = 2/8$, whereas $y_{1,1} = y_{1,3} = y_{2,3} = y_{2,5} = y_{3,2} = y_{3,6} = 1/8$. From the definition, it follows clearly that if node $i$ is present in the patrol pattern, then

$$\sum_{k=1}^{\infty} k\, y_{ik} = 1. \tag{14}$$

If node $i$ is not in the patrol pattern, then $y_{ik} = 0$ for all $k$, so $\sum_{k=1}^{\infty} k\, y_{ik} = 0$.

In addition, let $x_{ij}$ denote the rate at which the patroller moves from node $i$ to node $j$. With the same patrol pattern 1-1-2-1-3-1-3-2, we have that $x_{11} = x_{12} = x_{31} = x_{32} = 1/8$, and $x_{13} = x_{21} = 2/8$. By definition,

$$\sum_{j=1}^n x_{ij} = \sum_{j=1}^n x_{ji}, \tag{15}$$

which is also the rate at which the patroller visits node $i$, $i = 1, 2, \ldots, n$. Condition (15) ensures flow balance at each node. The variables $x_{ij}$ and $y_{ik}$ are connected, with two obvious equations being

$$y_{i,1} = x_{ii}, \quad i = 1, 2, \ldots, n, \tag{16}$$

$$\sum_{k=1}^{\infty} y_{ik} = \sum_{j=1}^n x_{ji}, \quad i = 1, 2, \ldots, n. \tag{17}$$

Constraint (16) follows because either side represents the rate at which the patroller visits node $i$ in two consecutive time periods. Constraint (17) follows because either side represents the long-run rate at which the patroller enters node $i$.
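The rates $y_{ik}$ and $x_{ij}$ of a single patrol pattern can be read off mechanically, as the following sketch shows for the pattern 1-1-2-1-3-1-3-2 used above. The helper name `pattern_rates` is hypothetical, and exact fractions are used so that identities (14) through (17) can be checked without rounding.

```python
from collections import Counter
from fractions import Fraction


def pattern_rates(pattern):
    """Return (y, x) for one cyclic patrol pattern.

    y[(i, k)] : rate of entering node i exactly k periods after its previous visit
    x[(i, j)] : rate of moving from node i to node j
    """
    T = len(pattern)
    y, x = Counter(), Counter()
    for t, node in enumerate(pattern):
        x[(node, pattern[(t + 1) % T])] += Fraction(1, T)
        # Number of periods until the pattern next returns to this node.
        gap = next(g for g in range(1, T + 1) if pattern[(t + g) % T] == node)
        y[(node, gap)] += Fraction(1, T)
    return y, x


y, x = pattern_rates([1, 1, 2, 1, 3, 1, 3, 2])
for i in (1, 2, 3):
    weighted = sum(k * rate for (node, k), rate in y.items() if node == i)
    inflow = sum(rate for (_, j), rate in x.items() if j == i)
    print(i, weighted, inflow)
```

Each node's weighted sum $\sum_k k\, y_{ik}$ equals 1 here because every node appears in the pattern, and the inflow into each node equals $\sum_k y_{ik}$.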

Now, suppose we allow a randomized policy, such that the patroller uses a set of patrol patterns and selects each pattern with a predetermined probability. For a randomized policy, we define $y_{ik}$ and $x_{ij}$ as the weighted average over their counterparts in individual patrol patterns. We can interpret $y_{ik}$ as the expected rate at which the patroller enters node $i$ exactly $k$ time units after his previous visit to node $i$, and $x_{ij}$ as the expected rate at which the patroller moves from node $i$ to node $j$. Whereas constraints (15), (16), and (17) still hold for randomized policies, (14) holds if and only if node $i$ is present in all patrol patterns in the random mix. Hence, for an arbitrary randomized policy, the constraint in (14) becomes an inequality:

$$\sum_{k=1}^{\infty} k\, y_{ik} \le 1, \quad i = 1, 2, \ldots, n. \tag{18}$$

To set up a linear program, however, we cannot deal with infinitely many terms $y_{ik}$, $k = 1, 2, \ldots$ Recall from (2) that $B_i$ denotes the smallest integer such that $P(X_i \le B_i) = 1$, $i = 1, \ldots, n$. Hereafter we redefine $y_{i,B_i}$ to denote the expected rate at which the patroller enters node $i$ with the previous visit at least $B_i$ time units ago. Constraints in (17) and (18) can be rewritten as

$$\sum_{k=1}^{B_i} y_{ik} = \sum_{j=1}^n x_{ji}, \quad i = 1, 2, \ldots, n, \tag{19}$$

$$\sum_{k=1}^{B_i} k\, y_{ik} \le 1, \quad i = 1, 2, \ldots, n. \tag{20}$$

To formulate the objective function, recall from (13) that if the patroller visits node $i$ exactly $k$ time units after his previous visit to node $i$, then the expected cost that can be avoided is $R_i(k) \equiv c_i \lambda_i \int_0^k P(X_i > t)\,dt$. Because $R_i(k) = c_i \lambda_i E[X_i]$ if $k \ge B_i$, the long-run cost rate at node $i$ is

$$c_i \lambda_i - \sum_{k=1}^{B_i} y_{ik} R_i(k). \tag{21}$$

Finally, the linear program is

$$\min_{x_{ij},\, y_{ik}} \ \sum_{i=1}^n \left( c_i \lambda_i - \sum_{k=1}^{B_i} y_{ik} R_i(k) \right)$$

subject to

$$x_{ij} = 0, \quad \text{if } a_{ij} = 0, \tag{22}$$

$$\sum_{i=1}^n \sum_{j=1}^n x_{ij} = 1, \tag{23}$$

$$x_{ij},\ y_{ik} \ge 0, \tag{24}$$

and constraints in (15), (16), (19), and (20).

Constraint (22) observes the edge constraint, and constraint (23) ensures the total rate to be 1. Because each feasible patrol pattern (or a randomization over a set of patrol patterns) yields a feasible solution to this linear program, but not vice versa, the optimal solution to this linear program provides a lower bound for the optimal long-run cost rate.
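Solving the linear program requires an LP solver, which is not shown here. The sketch below only illustrates the claim that a patrol pattern maps to a feasible point whose objective value is its cost rate: it evaluates the objective and checks constraint (20) for hand-entered $y_{ik}$ values. It assumes deterministic attack times and uses the two-node instance from the Example in §3.4; the function names are hypothetical.

```python
from fractions import Fraction


def avoided_cost(k, c, lam, attack_time):
    """R_i(k) = c * lam * int_0^k P(X > t) dt for a deterministic attack time."""
    return c * lam * min(k, attack_time)


def lp_objective(y, nodes):
    """Sum over nodes of c_i * lam_i minus sum_k y_ik * R_i(k)."""
    total = sum(c * lam for (c, lam, _) in nodes.values())
    for (i, k), rate in y.items():
        total -= float(rate) * avoided_cost(k, *nodes[i])
    return total


def satisfies_constraint_20(y, nodes):
    """Check sum_k k * y_ik <= 1 for every node."""
    return all(sum(k * r for (i, k), r in y.items() if i == node) <= 1
               for node in nodes)


nodes = {1: (1.0, 0.1, 2.0), 2: (1.0, 0.9, 2.5)}   # (c_i, lambda_i, attack time)
third, half = Fraction(1, 3), Fraction(1, 2)
y_112 = {(1, 1): third, (1, 2): third, (2, 3): third}   # pattern 1-1-2
y_12 = {(1, 2): half, (2, 2): half}                      # pattern 1-2
print(lp_objective(y_112, nodes), lp_objective(y_12, nodes))
```

The two objective values should agree with the cost rates of the patterns 1-1-2 and 1-2 discussed in the Example of §3.4.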

It is possible to tighten the lower bound further by adding constraints on $y_{ik}$ or constraints that take advantage of specific graph structures. Some of these ideas are presented in §EC.3. In the next section, we use the lower bounds produced by this linear program to prepare Tables 2 and 3.

3.4. Numerical Experiments

We consider five graph types in our numerical experiments.

1. Complete graph: All nodes are connected. A complete graph is suitable in the scenario where a security manager sits in a surveillance room watching real-time video feeds from various cameras. The security manager can watch only one video feed at a time, but can switch from any feed to any other feed directly, which is analogous to a patroller moving to any node directly.

2. Line graph: A line graph is applicable to an airborne patrol unit responsible for a border or a vessel responsible for a river or a coast line.

3. Circle graph: A circle graph is applicable to a ground unit patrolling the boundary of an area.

4. Random tree: A random tree is generated recursively by connecting a new node randomly to an existing node. It is applicable to a patrol car that is responsible for road segments, or a vessel patrolling a river with branches.

5. Hexagon grid: A hexagon grid is popular in war games (Dunnigan 1992) and is applicable to a patrol unit that covers an open area. Each hexagon corresponds to a node, and the patroller can move between adjacent hexagons. We label the center node as node 1, the nodes in the first layer as nodes 2–7, the nodes in the second layer as nodes 8–19, and so on. A hexagon grid with $n$ nodes consists of nodes 1 to $n$ with this labeling method.

Although our heuristic works for any bounded attack time distribution, in order to assess the heuristics we choose attack time distributions so that the problem does not become trivial. We do not want the attack time to be too short, in which case the patroller will never detect anything. Because our research goal is to study the effectiveness of patrol policies, we set the attack time to be at least 1 so that each attack, regardless of its arrival time, can be detected by some feasible patrol schedule. We also do not want the attack time to be too large, in which case the patroller will detect almost everything. For a graph with $n$ nodes, we let the attack time be bounded by $B = n$, which also makes the state space manageable for problems of moderate size.

We allow the attack time at each node to follow one of three distributions: deterministic, uniform, and triangular. For each node, the attack time is equally likely to follow these three distributions. In the case of a deterministic attack time, we generate a uniform random variable $U[1, n]$ to be its attack time, where $n$ is the number of nodes. In the case of a uniform attack time distribution, we generate two such uniform random variables to be its minimum and maximum. In the case of a triangular attack time distribution, we generate three such uniform random variables to be its minimum, mode, and maximum.
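A scenario generator along these lines might look as follows. This is a sketch under assumptions: the function names are hypothetical, the draws are sorted to serve as minimum, mode, and maximum (the text does not say how the draws are ordered), and the attack probabilities $p_i$ are normalized uniform draws as described in the paragraph that follows.

```python
import random

PARAMETER_COUNT = {"deterministic": 1, "uniform": 2, "triangular": 3}


def random_attack_distribution(n, rng):
    """Pick one of the three families with equal probability and draw its parameters."""
    kind = rng.choice(sorted(PARAMETER_COUNT))
    # Assumption: sorting the U[1, n] draws gives (value), (min, max) or (min, mode, max).
    params = sorted(rng.uniform(1, n) for _ in range(PARAMETER_COUNT[kind]))
    return kind, params


def random_scenario(n, seed=None):
    """Return the attack time distributions and attack probabilities for n nodes."""
    rng = random.Random(seed)
    distributions = [random_attack_distribution(n, rng) for _ in range(n)]
    u = [rng.random() for _ in range(n)]
    total = sum(u)
    p = [value / total for value in u]        # p_i = u_i / sum_j u_j
    return distributions, p


distributions, p = random_scenario(6, seed=1)
for (kind, params), prob in zip(distributions, p):
    print(kind, [round(v, 2) for v in params], round(prob, 3))
```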

In order to better interpret the results, we set $c_i = 1$ for all $i$, so that $\sum_{i=1}^n V_i(\pi)/\Lambda$ is the long-run proportion of attackers that evade detection, or equivalently, the limiting probability an attacker will evade detection. To generate $p_i$ (probability of attacking node $i$), we first generate uniform variables $u_i \sim U[0, 1]$, and then normalize them so that $p_i = u_i / \sum_{j=1}^n u_j$. Recall that the value of $\Lambda$ is inconsequential.

To obtain a patrol pattern, recall from §3.2.2 that we set the initial state by assuming that the entire graph has not been patrolled for a long time. We then use a heuristic to generate a sequence of patrol nodes and determine the corresponding patrol pattern when a state repeats itself. If no patrol pattern emerges after 2,000 time periods, we then use the entire patrol schedule of length 2,000 as a proxy for the actual patrol pattern, which happens occasionally for some of the larger graphs analyzed in this numerical study.

Table 1. Performance of the three heuristics on complete graphs and line graphs with six nodes, reported as the percentage excess over the optimal evasion probability.

| Heuristic | Depth (d) | Complete: Mean | Complete: 50th | Complete: 75th | Complete: 90th | Line: Mean | Line: 50th | Line: 75th | Line: 90th |
|---|---|---|---|---|---|---|---|---|---|
| IRH | 1 | 2.72 | 0.00 | 2.34 | 7.64 | 12.16 | 3.96 | 14.94 | 33.84 |
| IRH | 2 | 2.31 | 0.00 | 1.85 | 6.33 | 4.95 | 0.28 | 6.58 | 13.66 |
| IRH | 3 | 2.28 | 0.00 | 1.79 | 6.29 | 4.16 | 0.00 | 4.92 | 12.15 |
| IPH | 1 | 2.72 | 0.00 | 2.34 | 7.64 | 12.16 | 3.96 | 14.94 | 33.84 |
| IPH | 2 | 0.76 | 0.00 | 0.41 | 2.16 | 1.39 | 0.00 | 0.41 | 4.04 |
| IPH | 3 | 0.57 | 0.00 | 0.32 | 1.85 | 0.50 | 0.00 | 0.00 | 1.04 |
| MH | 1 | 17.28 | 11.78 | 22.74 | 39.81 | 15.28 | 8.25 | 21.88 | 40.44 |
| MH | 2 | 4.56 | 0.77 | 5.54 | 13.02 | 6.30 | 2.31 | 8.90 | 17.71 |
| MH | 3 | 1.54 | 0.00 | 1.55 | 5.03 | 2.83 | 0.00 | 3.16 | 8.76 |

Note. We randomly generate 1,000 scenarios and report the mean, the 50th percentile, the 75th percentile, and the 90th percentile.

Our first experiment is to compare the three heuristics in §3.3, namely the IRH, the IPH, and the MH. The IH for complete graphs is the same as IRH(1) and IPH(1), although it does not show up explicitly in our numerical study. We examine graphs of size $n = 6$, for which we can compute the optimal solution. To make the point and save space, we present results for only two graph types (complete graph and line graph) because these are the two extremes in terms of graph connectivity. For the same number of nodes, the complete graph represents the easiest graph structure for the patroller, whereas the line graph represents the most difficult one.

For each graph type we generate 1,000 scenarios. A scenario consists of the adjacency matrix for the graph, the $p_i$ values, and the attack time distributions. The $p_i$ values and attack time distributions are the same for corresponding scenarios across different graph types. For each scenario, we compute the optimal solution (evasion probability) and report the heuristics in terms of their percentage excess over the optimal evasion probability. Over the 1,000 scenarios, we report the mean, the 50th, 75th, and 90th percentile in Table 1. Recall that IRH(1) and IPH(1) are identical, but as $d$ increases, IRH($d$) improves marginally, whereas IPH($d$) improves quite significantly. Also seen in Table 1, the IPH outperforms the MH by a sizable margin.

One would not expect the MH to perform particularly well, as is the case in most dynamic resource allocation problems. However, it may come as somewhat of a surprise that the IPH outperforms the IRH when $d > 1$. To intuitively understand the downside of the IRH, recall that $W_i(k)$ increases in $k$ for node $i$. When we look ahead for $l$ time periods and treat the index value as a reward, then at times the IRH will wait on node $i$ in order for its index to increase, so as to collect a higher reward later on. This strategy may backfire if the patroller ends up wasting time going to other nodes that benefit little from a patrol visit. We provide an example with two nodes to illustrate this point.

Example. Consider an example with $n = 2$ and $\Lambda = 1$. Let $p_1 = 0.1$, $p_2 = 0.9$, and $c_1 = c_2 = 1$. In addition, the attack time distributions are deterministic at both nodes, with $P(X_1 = 2) = P(X_2 = 2.5) = 1$. The optimal policy is clearly for the patroller to alternate between the two nodes (patrol pattern 1-2), because it detects all attacks, therefore yielding a cost rate of 0.

To see how the IRH performs, first write the indices in matrix form as follows:

$$\begin{pmatrix} W_1(1) & W_1(2) & W_1(3) \\ W_2(1) & W_2(2) & W_2(3) \end{pmatrix} = \begin{pmatrix} 0 & 0.2 & 0.2 \\ 0 & 0.9 & 2.25 \end{pmatrix}.$$

Because both attack times are bounded by 3, $W_i(k) = W_i(3)$, for $i = 1, 2$ and $k > 3$. When the look-ahead window is 1, the IRH will pick node 1 in state $(2, 1)$, and pick node 2 in state $(1, 2)$, so the resulting patrol pattern is 1-2, the optimal one. When the look-ahead window is 2, however, in state $(1, 2)$ the IRH will pick node 1, because by looking ahead for 2 time periods, the IRH can collect a much higher reward of 2.25 if it waits for another time period before visiting node 2. The resulting patrol pattern is 1-1-2, yielding a cost rate of 0.15. $\square$
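The cost rate of 0.15 can be checked with a short calculation under the model's assumptions. Under pattern 1-1-2, node 1 is visited with gaps of 1 and 2 time periods, so no attack of duration 2 at node 1 finishes between visits. Node 2 is visited once every 3 time periods, so an attack of duration 2.5 evades detection only if it starts within the first 0.5 time units after a visit:

$$\text{cost rate} = c_2 \lambda_2 \cdot \frac{1}{3} \int_0^3 P(X_2 \le t)\,dt = 0.9 \cdot \frac{0.5}{3} = 0.15.$$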

From this point on, we will focus on the IPH. As discussed earlier, the performance of IPH($d$) improves as $d$ increases, but the computation takes more time. As shown in Table 1, the improvement is more significant in the line graph than in the complete graph. We next test IPH($d$) on five graph types with $d = 1, \ldots, 5$ to study how the performance improves at the cost of computation time, and plot the results in Figure 2. Each line corresponds to a graph type, with five points corresponding to $d = 1$ on the left to $d = 5$ on the right. As $d$ increases, the performance gets better (moving lower) but computation takes longer (moving to the right).

As seen in Figure 2, when $d$ increases, the improvement on the performance is more pronounced on graphs with fewer edges (line, circle, and random trees), and less so on graphs that are well connected (complete and hexagon grid). Intuitively, with a small look-ahead window, the patroller is more likely to get stuck in a subgraph of a sparse graph than in a subgraph of a well-connected graph. Therefore, it is reasonable to set $d$ based on the graph structure, with a small value when the graph is well connected (such as a complete graph) and a large value when it is not (such as a line graph). We propose a Modified Index Penalty Heuristic (MIPH), where we set the depth to

$$d = 1 + \lceil \text{average distance between all pairs of nodes} \rceil.$$

In other words, the depth is set to 2 for a complete graph, and increases as the graph becomes less connected.
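The MIPH depth can be computed with a breadth-first search from every node. The sketch below assumes a connected, unweighted graph given as adjacency lists and reads the rounding as a ceiling, which is the reading consistent with the average depths reported in Table 2; the helper names are hypothetical.

```python
from collections import deque


def average_distance_ceiling(adj):
    """Ceiling of the mean shortest-path length over all pairs of distinct nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                          # breadth-first search from source
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())           # ordered pairs; the mean is unchanged
        pairs += len(adj) - 1
    return -(-total // pairs)                 # integer ceiling


def miph_depth(adj):
    return 1 + average_distance_ceiling(adj)


def line_graph(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def circle_graph(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


print(miph_depth(complete_graph(6)), miph_depth(line_graph(6)), miph_depth(circle_graph(6)))
```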

Figure 2. This figure displays, for the random-attacker case, the IPH performance against computation time for different $d$, on five graph types with $n = 6$. [Plot: horizontal axis, computation time (seconds in log scale); vertical axis, percentage over optimum (log scale); one curve each for the complete, line, circle, random tree, and hexagon grid graphs.]

Notes. The performance is the 90th percentile over 1,000 random scenarios, reported as percentage over optimum. Each line corresponds to one graph type, with $d = 1, 2, 3, 4, 5$ from left to right.

Table 2 reports the performance of the MIPH for 5 graph types by comparing them to the optimal solution. Because the state space in the MDP depends on the graph types, for complete graphs we can compute the optimal solution for up to 7 nodes, while for line graphs we can do so for up to 14 nodes. In all cases, we randomly generate 1,000 scenarios and report the mean, the 50th, 75th, and the 90th percentiles. As seen in Table 2, the MIPH performs uniformly well across all graph types. In particular, there is little evidence of any degradation in performance as the graph size $n$ grows within our range of interest. Table 2 also gives the average depth and the average computation time. Overall, we find the MIPH offers a good balance between performance and computation time. The last column in Table 2 reports how the lower bound discussed in §3.3.4 compares with the optimal solution on average. For each scenario, we compute $(\mathrm{LB} - \mathrm{Opt})/\mathrm{Opt}$ in percentage, and report the average over 1,000 scenarios. For all graph types, the lower bounds are about 1% below the optimal solution for $n = 6, 7$, but they gradually degrade as $n$ increases.

To assess how much our heuristic improves over a naïve patrol strategy, we consider two graph types: line graph and circle graph. For a line graph, a naïve patrol moves back and forth between two end nodes, spending just one time period at each end node. For a circle graph, a naïve patrol circles around all nodes. For $n = 6$, among the same 1,000 scenarios reported in Table 2, on average the naïve patrol produces an evasion probability 20.57% over optimum for line graphs, and 19.57% over optimum for circle graphs. In either case, on average our heuristic produces an evasion probability less than 0.5% over optimum.

Table 2. Performance of the MIPH in the random-attacker case, reported as percentage excess over the optimal evasion probability; the last column reports the mean of $(\mathrm{LB} - \mathrm{Opt})/\mathrm{Opt}$ in percentage, where the lower bound is computed using the linear program in §3.3.4.

| Graph | # nodes | Mean | 50th | 75th | 90th | Avg depth | Avg time ($10^{-2}$ sec) | Lower bound |
|---|---|---|---|---|---|---|---|---|
| Complete | 6 | 0.76 | 0.00 | 0.41 | 2.16 | 2.0 | 1.3 | −1.12 |
| Complete | 7 | 1.11 | 0.00 | 1.11 | 3.12 | 2.0 | 2.3 | −1.26 |
| Line | 6 | 0.39 | 0.00 | 0.00 | 0.71 | 4.0 | 1.9 | −0.67 |
| Line | 7 | 0.33 | 0.00 | 0.00 | 0.60 | 4.0 | 2.5 | −1.30 |
| Line | 8 | 0.51 | 0.00 | 0.00 | 1.21 | 4.0 | 3.3 | −1.80 |
| Line | 9 | 0.42 | 0.00 | 0.00 | 1.27 | 5.0 | 11.0 | −2.89 |
| Line | 10 | 0.48 | 0.00 | 0.14 | 1.38 | 5.0 | 13.6 | −3.50 |
| Line | 11 | 0.44 | 0.00 | 0.26 | 1.47 | 5.0 | 16.7 | −4.33 |
| Line | 12 | 0.34 | 0.00 | 0.03 | 1.08 | 6.0 | 56.2 | −5.02 |
| Line | 13 | 0.41 | 0.00 | 0.25 | 1.34 | 6.0 | 86.9 | −6.03 |
| Line | 14 | 0.53 | 0.00 | 0.53 | 1.59 | 6.0 | 100.7 | −6.39 |
| Circle | 6 | 0.32 | 0.00 | 0.00 | 0.90 | 3.0 | 1.1 | −0.44 |
| Circle | 7 | 0.43 | 0.00 | 0.00 | 1.50 | 3.0 | 1.4 | −1.03 |
| Circle | 8 | 0.36 | 0.00 | 0.00 | 1.26 | 4.0 | 4.7 | −1.39 |
| Circle | 9 | 0.41 | 0.00 | 0.14 | 1.42 | 4.0 | 5.9 | −2.12 |
| Random tree | 6 | 0.26 | 0.00 | 0.00 | 0.27 | 3.7 | 3.5 | −0.59 |
| Random tree | 7 | 0.25 | 0.00 | 0.00 | 0.32 | 3.9 | 5.5 | −0.98 |
| Random tree | 8 | 0.25 | 0.00 | 0.00 | 0.84 | 4.0 | 8.3 | −1.53 |
| Random tree | 9 | 0.27 | 0.00 | 0.00 | 0.71 | 4.0 | 11.3 | −2.07 |
| Hexagon grid | 6 | 0.46 | 0.00 | 0.00 | 1.68 | 3.0 | 2.3 | −0.68 |
| Hexagon grid | 7 | 0.52 | 0.00 | 0.22 | 1.68 | 3.0 | 3.7 | −1.14 |
| Hexagon grid | 8 | 0.61 | 0.00 | 0.60 | 2.03 | 3.0 | 5.1 | −1.76 |

To conclude our numerical study in this section, we next look at larger problems by comparing the MIPH with the lower bound discussed in §3.3.4. We only examine complete graphs, because the quality of the available lower bound for the other graph types degrades as $n$ increases (see Table 2), so such a comparison does not provide useful information. As shown in Table 3, on average the MIPH exceeds the lower bound by about 2% (both mean and 50th percentile), and the 90th percentile is about 5% over the lower bound. These numbers are very encouraging, because they suggest that the MIPH produces excellent results for complete graphs up to 18 nodes.

Table 3. Performance of the MIPH in the random-attacker case on complete graphs, reported as the percentage excess over the lower bound, which is computed using the linear program in §3.3.4.

| # nodes | Mean | 50th | 75th | 90th |
|---|---|---|---|---|
| 6 | 1.96 | 0.75 | 2.50 | 5.09 |
| 9 | 2.51 | 1.81 | 3.27 | 5.23 |
| 12 | 2.28 | 1.67 | 3.05 | 4.78 |
| 15 | 2.13 | 1.75 | 2.76 | 3.98 |
| 18 | 2.07 | 1.76 | 2.58 | 3.80 |

4. Patrol Against Strategic Attackers

This section concerns the situation when the attacker actively chooses which node to attack, in order to maximize the cost incurred due to its attack. In other words, the attacker and the patroller play a simultaneous-move two-person zero-sum game, with the patroller trying to minimize the cost, and the attacker trying to maximize it. The patroller decides how to patrol the graph, whereas the attacker chooses which node to attack.

One way to formulate this problem is to use the model framework in §3. For a given patrol policy, the ratio $V_i(\pi)/\lambda_i$ represents the long-run average cost incurred due to an attack at node $i$. Consequently, the patroller's objective function in this two-person zero-sum game is

$$\min_{\pi \in \Pi^R} \ \max_{i=1,\ldots,n} \ \frac{V_i(\pi)}{\lambda_i}, \tag{25}$$

where $\Pi^R$ represents the set of randomized policies, that is, all policies that map from the state space $\Omega$ to the action space $\mathcal{A}$ according to a probability distribution. Because $V_i(\pi)$ scales proportionally with $\lambda_i$, the ratio $V_i(\pi)/\lambda_i$ does not depend on $\lambda_i$.

4.1. Optimal Policy

By modifying the linear program that computes the optimal solution for the random-attacker case, it is possible to compute the optimal value in (25). We defer the details to §EC.1.2. This method produces the optimal solution used in our numerical studies, but becomes computationally intractable for moderate-size problems.

4.2. Heuristic Policies

In a two-person zero-sum game, it is often the case that the optimal strategy for either player is a mixed strategy. A mixed strategy for the attacker is a probability distribution over the nodes to attack. A mixed strategy for the patroller is a distribution that defines, for each state of the system, the probability that the patroller will move to each adjacent node. This interpretation allows the linear program formulation described in §EC.1.2 to solve the problem optimally. However, this approach quickly becomes computationally intractable as the size of the problem grows.

Another way to randomize a patroller's strategy is to begin with a set of feasible patrol patterns, and let the patroller choose each pattern in that set with a certain probability and repeat that pattern indefinitely. If the set includes all feasible patterns (there are infinitely many), then the resulting mixed strategy over patterns would achieve optimality.

Because we cannot examine an infinite number of patrol patterns, we propose a heuristic to compute a mixed strategy from a finite set of selected patrol patterns. Given a finite set of patrol patterns, denoted by $S = \{\pi_1, \pi_2, \ldots, \pi_m\}$, with $|S| = m$, we can formulate a different two-person zero-sum game between the attacker and the patroller in the standard matrix form. In this game matrix, row $i$ corresponds to the attacker choosing node $i$ to attack, and column $j$ corresponds to the patroller choosing patrol pattern $\pi_j$, with $i = 1, \ldots, n$ and $j = 1, \ldots, m$. It is then straightforward to set up a linear program to solve this two-person zero-sum game (see §3.10 in Washburn 2003).

Of course, the solution to this $n \times m$ matrix game will not necessarily be the globally optimal mixed strategy, because we only consider a finite set $S$. The key to the success of this approach is to generate the set $S$ so that the optimal mixed strategy using only the patrol patterns in $S$ is close to the global optimum. The major advantage is that the linear program that solves the $n \times m$ matrix game is much smaller than the linear program discussed in §EC.1.2.
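For concreteness, the linear program for the n × m matrix game can be written in the standard form below. This is a sketch under assumed notation that does not appear in the text: $a_{ij}$ denotes the expected cost when the attacker attacks node $i$ and the patroller repeats the $j$th pattern in $S$, $q_j$ is the probability that the patroller selects that pattern, and $v$ is the value of the game.

```latex
\begin{aligned}
\min_{q,\,v}\quad & v \\
\text{subject to}\quad & \sum_{j=1}^{m} a_{ij}\, q_j \le v, \qquad i = 1, \ldots, n, \\
& \sum_{j=1}^{m} q_j = 1, \qquad q_j \ge 0, \quad j = 1, \ldots, m.
\end{aligned}
```

The program has only m + 1 variables and n + 1 constraints besides nonnegativity, which is why it stays small even when the state space of the exact formulation is very large.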

To obtain patrol patterns that constitute $S$, recall that we can use the IPH(d) in §3.2.2 to find a patrol pattern against a given attack probability distribution over $n$ nodes, $p = (p_1, \ldots, p_n)$. Below we discuss how to use the IPH(d) to generate patrol patterns that compose $S$. We propose to generate $S$ in three groups.

The first group includes patrol patterns generated from an iterative algorithm that is motivated by the method of fictitious play analyzed by Robinson (1951), who proved that an iterative method will generate mixed strategies that converge to the optimum in a two-person zero-sum matrix game. In that iterative method, each player chooses a pure strategy arbitrarily in the first round. In each subsequent round, each player chooses a pure strategy that produces the best expected value against the mixture of strategies used by the other player in the previous rounds. Because in our model the patroller has infinitely many strategies (patrol patterns), we first compute $(p_1, \ldots, p_n)$ based on the mixture of strategies used by the attacker in the previous rounds, and then use IPH(d) to generate a patrol pattern for the patroller. The algorithm proceeds as follows:

1. In round 1, each player picks a strategy arbitrarily.
   (a) Denote by $\pi(k)$ the patrol pattern used by the patroller in round $k$. Choose $\pi(1)$ arbitrarily.
   (b) Let the attacker pick node 1 to attack. Use $r_i$, $i = 1, \ldots, n$, to keep track of the number of times node $i$ is picked by the attacker. Initialize $r_1 = 1$, and $r_i = 0$ for $i = 2, 3, \ldots, n$.

2. Repeat the following for a predetermined number of rounds. In round $k \ge 2$,
   (a) Set $p_i = r_i / \sum_{j=1}^{n} r_j$, which represents the attacker's mixed strategy based on his attack history from rounds 1 to $k - 1$. Use the IPH(d) to generate a patrol pattern $\pi(k)$.
   (b) Find the best node to attack by assuming the patroller uses patrol pattern $\pi(j)$, $j = 1, \ldots, k-1$, each with probability $1/(k-1)$. If attacking node $i$ yields the highest expected cost, then set $r_i \leftarrow r_i + 1$.
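The two steps above can be sketched in code as follows. This is only an illustrative skeleton, not the authors' implementation: the IPH(d) is replaced by a caller-supplied function `pattern_against`, the per-node cost of a pattern by a caller-supplied `attack_cost`, and the toy stand-ins at the bottom are hypothetical (all of these names are assumptions). Nodes are indexed from 0, so "node 1" in the text is index 0.

```python
from typing import Callable, List, Tuple

Pattern = Tuple[int, ...]


def generate_first_group(
    n: int,
    rounds: int,
    first_pattern: Pattern,
    pattern_against: Callable[[List[float]], Pattern],
    attack_cost: Callable[[int, Pattern], float],
) -> Tuple[List[Pattern], List[int]]:
    """Fictitious-play loop; returns the patterns of all rounds and the counts r."""
    patterns = [first_pattern]      # step 1(a): pattern used in round 1
    r = [1] + [0] * (n - 1)         # step 1(b): attacker starts at node index 0
    for _ in range(2, rounds + 1):
        total = sum(r)
        p = [count / total for count in r]   # step 2(a): empirical attack mix
        new_pattern = pattern_against(p)     # stand-in for IPH(d)
        # step 2(b): patterns from the earlier rounds are equally likely
        expected = [
            sum(attack_cost(i, pat) for pat in patterns) / len(patterns)
            for i in range(n)
        ]
        best_node = max(range(n), key=expected.__getitem__)
        r[best_node] += 1
        patterns.append(new_pattern)
    return patterns, r


# Toy stand-ins (hypothetical): a pattern guards a single node; attacking a
# guarded node costs 0 and attacking any other node costs 1.
def toy_pattern_against(p: List[float]) -> Pattern:
    return (max(range(len(p)), key=p.__getitem__),)


def toy_attack_cost(node: int, pattern: Pattern) -> float:
    return 0.0 if node in pattern else 1.0


patterns, counts = generate_first_group(
    2, 4, (0,), toy_pattern_against, toy_attack_cost
)
print(patterns, counts)
```

In the toy run the attacker keeps choosing the unguarded node, and the patroller's generated pattern switches to that node once it dominates the attack history.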

In round 1, we are free to set $\pi(1)$ to any patrol pattern, but we propose the patrol pattern generated by the IPH(d), when the attack probability at node $i$ is

$$p_i = \frac{1/(c_i E[X_i])}{\sum_{j=1}^{n} 1/(c_j E[X_j])}. \qquad (26)$$

With this choice, from Equation (11) we see that $\lim_{k \to \infty} W_i(k) = c_i p_i \Lambda E[X_i]$ is the same for all $i$. In other words, with this attack probability distribution, the index of each node will approach the same limit if the node has not been visited for a long time, which results in a patrol pattern that is likely to cover all nodes.
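As a quick numerical check of Equation (26), the sketch below computes the distribution for made-up values of $c_i$ and $E[X_i]$ and confirms that the product $c_i p_i E[X_i]$ is the same at every node. The function name and the sample numbers are assumptions for illustration only.

```python
def balanced_attack_distribution(c, mean_attack_time):
    """Attack probabilities proportional to 1 / (c_i * E[X_i]), as in (26)."""
    weights = [1.0 / (ci * ex) for ci, ex in zip(c, mean_attack_time)]
    total = sum(weights)
    return [w / total for w in weights]


c = [1.0, 2.0, 3.0]        # hypothetical costs c_i
ex = [2.0, 1.0, 4.0]       # hypothetical mean attack times E[X_i]
p = balanced_attack_distribution(c, ex)
# c_i * p_i * E[X_i] is the limiting index up to the common factor Lambda
limits = [ci * pi * exi for ci, pi, exi in zip(c, p, ex)]
print(p, limits)
```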

In the $k$th round of this algorithm, the patrol pattern $\pi(k)$ is the best one against the attacker, among $d$ patrol patterns generated with look-ahead windows $1, 2, \ldots, d$. These other patrol patterns, in many cases identical to $\pi(k)$, could add value to $S$, so we include them as well. Because increasing the number of rounds does take time, adding these patrol patterns into $S$ improves the overall performance with almost no additional cost.

Whereas this first group should give us a good mixture of patrol patterns, theoretically it is possible that some node is not covered in any of the patrol patterns generated in this first group. If node $i$ is not covered in any of the patrol patterns in $S$, then restricting the patroller to use only those patrol patterns in $S$ will open the door for the attacker to attack node $i$ for a guaranteed success. To fix this problem, we include a second group of patrol patterns in $S$. The second group consists of all singleton patterns (a pattern consisting of just one node), so that every node is covered in at least one patrol pattern in $S$. However, it is rarely a good idea for the patroller to spend all his time on a single node while ignoring all other nodes. This motivates a third (and final) group of patrol patterns to include in $S$.

The third group that makes up $S$ consists of $n$ additional patrol patterns, with each pattern designed to cover one particular node but not necessarily confined to that node. We need Proposition 1 below.

Proposition 1. Consider the random-attacker case discussed in §3.1, where $p = (p_1, \ldots, p_n)$ with $p_i$ denoting the attack probability at node $i$, $i = 1, \ldots, n$. If $X_i \ge 1$ for all $i$, and if $c_j p_j > \sum_{i \ne j} c_i p_i$, then a patrol policy that never visits node $j$ cannot be optimal.


The proof of this proposition is deferred to §EC.2.4.

To find an attack probability distribution $p$ such that the patroller has to visit node $j$ with the optimal policy and still has incentive to visit some other nodes, we let $p_i = K/(c_i E[X_i])$ for $i \ne j$, where $K$ is a constant. The reason to let $p_i$, $i \ne j$, be inversely proportional to $c_i E[X_i]$ is the same as that for Equation (26). From Proposition 1, we want to choose $K$ so that

$$c_j \left( 1 - \sum_{i \ne j} \frac{K}{c_i E[X_i]} \right) > \sum_{i \ne j} \frac{c_i K}{c_i E[X_i]},$$

or equivalently,

$$K < 1 \bigg/ \left( \frac{1}{c_j} \sum_{i \ne j} \frac{1}{E[X_i]} + \sum_{i \ne j} \frac{1}{c_i E[X_i]} \right). \qquad (27)$$

In our numerical experiments, we set $c_i = 1$ for all $i$, so we can interpret the output as the evasion probability. Therefore, we need $p_j = 1 - \sum_{i \ne j} K/E[X_i] > 1 - 0.5 = 0.5$, where the inequality follows from (27) by setting $c_i = 1$ for all $i$. In our numerical experiments, we set $p_j = 0.5$, and use IPH(d) to generate a patrol pattern, with the choice of $d$ to be discussed later. We add all patrol patterns generated with different look-ahead windows $1, 2, \ldots, d$ into $S$, for the same reason discussed when generating the first group. The third group adds at most $n \cdot d$ additional patrol patterns into $S$.
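The construction of the third-group attack distribution can be illustrated with a short sketch. It evaluates the upper limit on K from (27) and builds the resulting distribution for a target node j. The function name, the `share` parameter (the fraction of the upper limit actually used), and the sample numbers are assumptions; `share = 1.0` with unit costs reproduces the boundary choice in which the target node receives probability 0.5.

```python
def third_group_distribution(c, mean_attack_time, j, share=1.0):
    """Return (K, p) with p_i = K / (c_i E[X_i]) for i != j and p_j the remainder.

    K is `share` times the upper limit in (27); share < 1 makes the
    inequality of Proposition 1 strict.
    """
    others = [i for i in range(len(c)) if i != j]
    k_limit = 1.0 / (
        sum(1.0 / mean_attack_time[i] for i in others) / c[j]
        + sum(1.0 / (c[i] * mean_attack_time[i]) for i in others)
    )
    K = share * k_limit
    p = [K / (c[i] * mean_attack_time[i]) for i in range(len(c))]
    p[j] = 1.0 - sum(p[i] for i in others)
    return K, p


c = [1.0, 1.0, 1.0]        # unit costs, as in the numerical experiments
ex = [2.0, 4.0, 4.0]       # hypothetical mean attack times E[X_i]
K_boundary, p_boundary = third_group_distribution(c, ex, j=0)
K_strict, p_strict = third_group_distribution(c, ex, j=0, share=0.9)
print(p_boundary, p_strict)
```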

To summarize, we generate $S$ in three groups, and our heuristic has two parameters $r$ and $d$, both predetermined positive integers. For a graph with $n$ nodes, we run the algorithm for $r \times n$ rounds to generate the first group. If we run IPH(d) when generating patrol patterns in groups 1 and 3, then we can end up with at most $r \times n \times d$ patrol patterns in group 1, $n$ in group 2, and at most $n \times d$ in group 3. The actual number, however, is usually much smaller than $rnd + n + nd$, because many of the generated patrol patterns produce identical performance. For instance, patrol patterns 1-2-1-3-2 and 2-3-1-2-1, although different, produce identical results at each node. In any case, it is straightforward to solve a two-person zero-sum matrix game using linear programming, when the attacker has $n$ pure strategies, and the patroller can use any patrol patterns in $S$. Our heuristic is optimal in a special case discussed in Theorem 4, whose proof is deferred to §EC.2.5.
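One way to see why two different patterns can be equivalent is to compare, node by node, the gaps between consecutive visits when the pattern is repeated indefinitely. The sketch below does this for the two patterns mentioned above; it assumes (as a simplification for illustration) that the long-run cost at a node depends only on the collection of these gaps. The function name is hypothetical.

```python
from collections import defaultdict


def revisit_gaps(pattern):
    """Sorted gaps between consecutive visits to each node in a repeated pattern."""
    length = len(pattern)
    visits = defaultdict(list)
    for t, node in enumerate(pattern):
        visits[node].append(t)
    gaps = {}
    for node, times in visits.items():
        # wrap around the end of the pattern; a single visit has gap = length
        gaps[node] = sorted(
            (times[(k + 1) % len(times)] - times[k]) % length or length
            for k in range(len(times))
        )
    return gaps


gaps_a = revisit_gaps([1, 2, 1, 3, 2])
gaps_b = revisit_gaps([2, 3, 1, 2, 1])
print(gaps_a, gaps_b)
```

Both patterns visit nodes 1 and 2 with gaps of 2 and 3 periods and node 3 every 5 periods, so only one of them needs to be kept in $S$.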

Theorem 4. If $n = 2$, and $P(X_i = d_i) = 1$ for some positive integers $d_i$, $i = 1, 2$, then the heuristic policy for the strategic-attacker case is optimal, by using the patrol patterns in the second and the third groups.

4.3. Lower Bound

We can modify the linear program in §3.3.4 to compute a lower bound for the optimal value in the strategic-attacker case. Recall from (21) that $\sum_{k=1}^{B_i} y_{ik} R_i(k)$ represents the long-run cost rate that can be avoided at node $i$. Hence, the expected cost if node $i$ is attacked is

$$c_i - \frac{1}{\lambda_i} \sum_{k=1}^{B_i} y_{ik} R_i(k) = c_i \left( 1 - \sum_{k=1}^{B_i} y_{ik} \int_0^k P(X_i > t)\,dt \right).$$

To minimize the maximum such cost among all nodes, we can set up a linear program as follows:

$$\begin{aligned}
\min\quad & z \\
\text{subject to}\quad & z \ge c_i \left( 1 - \sum_{k=1}^{B_i} y_{ik} \int_0^k P(X_i > t)\,dt \right), \quad i = 1, \ldots, n, \\
& \text{and constraints in (15), (16), (19), (20), (22), (23), (24).}
\end{aligned}$$

The additional constraints presented in §EC.3 also apply to the strategic-attacker case. In the next section, we use this lower bound to prepare Tables 4 and 5.
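To make the per-node cost expression concrete, the sketch below evaluates $c_i (1 - \sum_k y_{ik} \int_0^k P(X_i > t)\,dt)$ for a deterministic attack time. It rests on an assumption about the variables defined in §3.3.4, which is not reproduced here: $y_{ik}$ is taken to be the long-run rate of visits to node $i$ that occur exactly $k$ periods after the previous visit, so a node visited every $k$ periods has $y_{ik} = 1/k$. The function names and numbers are illustrative only.

```python
def attack_cost(c_i, y_i, tail_integral):
    """c_i * (1 - sum_k y_i[k] * integral_0^k P(X_i > t) dt)."""
    return c_i * (1.0 - sum(rate * tail_integral(k) for k, rate in y_i.items()))


def deterministic_tail_integral(d):
    """integral_0^k P(X > t) dt when the attack takes exactly d periods."""
    return lambda k: float(min(k, d))


tail = deterministic_tail_integral(1)
# Node visited every 4 periods: y = {4: 1/4}
cost_periodic = attack_cost(1.0, {4: 0.25}, tail)
# Node visited with gaps alternating between 2 and 6 periods
# (one gap of each kind per 8 periods)
cost_mixed = attack_cost(1.0, {2: 0.125, 6: 0.125}, tail)
print(cost_periodic, cost_mixed)
```

With unit cost the result is the evasion probability: an attack lasting one period against a node visited every four periods evades detection with probability 0.75 under these assumptions.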

4.4. Numerical Experiments

This section presents numerical experiments for the strategic-attacker case. Increasing either of the two parameters of our heuristic, namely $r$ and $d$, will improve the performance and take more time, so which investment brings better marginal benefit? To answer this question, we conduct experiments on 6-node complete and line graphs, and present the results in Figures 3 and 4, respectively.

As shown in Figure 3, the curve corresponding to $d = 1$ lies to the left and below that corresponding to $d = 2$ (less time and better performance). Setting $d = 3$ only makes matters worse. In Figure 4, however, if we set $d = 1$, then the 90th percentile is still 3.18% over optimum even with $r = 20$. Setting $d = 2$ and $r = 5$, the heuristic (90th percentile) improves to 1.30% over optimum and cuts the running time from 0.51 seconds to 0.42 seconds. For 6-node line graphs, $d = 2$ appears to be the best choice, as the corresponding curve is to the left and below those corresponding to $d = 3, 4, 5$. After studying the other graph types (results not shown), we conclude that in the strategic-attacker case the depth of IPH should be set smaller than that in the random-attacker case in exchange for larger $r$. In the rest of this section, we set $r = 10$ and

$$d = 1 + \left\lceil \frac{(\text{average distance between all pairs of nodes}) - 1}{2} \right\rceil.$$

One can check that $d = 1$ for complete graphs and $d \ge 2$ for other graph types.
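The depth rule can be checked with a few lines of code. The sketch below computes the average shortest-path distance by breadth-first search and applies the rule, reading the brackets in the formula as rounding up; that reading reproduces the depths reported in Table 4 (for example 1 for complete graphs, 2 for a 6-node line and 3 for a 9-node line). The helper names are assumptions, and the graph is assumed to be connected.

```python
from collections import deque
from fractions import Fraction
from math import ceil


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    n = len(adj)
    return Fraction(total, n * (n - 1))    # exact arithmetic avoids rounding issues


def lookahead_depth(adj):
    return 1 + ceil((average_distance(adj) - 1) / 2)


def complete_graph(n):
    return {u: [v for v in range(n) if v != u] for u in range(n)}


def line_graph(n):
    return {u: [v for v in (u - 1, u + 1) if 0 <= v < n] for u in range(n)}


print(lookahead_depth(complete_graph(6)), lookahead_depth(line_graph(6)))
```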

To evaluate our heuristic, we examine the same scenarios as those in Table 2 and report the results in Table 4. Our heuristic performs very well across all five graph types, with both the mean and the median over 1,000 scenarios well below 1% above the optimal solutions. Even when the heuristic performs relatively poorly (i.e., 90th percentile),


Table 4. Performance of the heuristic in the strategic-attacker case, reported as percentage excess over the optimal evasion probability; the last column reports the mean of (LB − Opt)/Opt in percentage, where the lower bound is computed using the linear program in §4.3.

Graph         No. nodes   Mean   50th   75th   90th   Avg depth   Avg time (sec)   Lower bound
Complete       6          0.32   0.18   0.41   0.87   1.0         0.46              0.00
Complete       7          0.32   0.21   0.42   0.75   1.0         0.60              0.00
Line           6          0.29   0.07   0.29   0.80   2.0         0.65             −0.23
Line           7          0.33   0.13   0.39   0.83   2.0         0.88             −0.25
Line           8          0.38   0.20   0.51   0.98   2.0         1.11             −0.27
Line           9          0.27   0.15   0.36   0.66   3.0         3.14             −0.31
Line          10          0.27   0.18   0.39   0.66   3.0         4.04             −0.33
Line          11          0.33   0.22   0.43   0.70   3.0         5.05             −0.32
Line          12          0.33   0.22   0.42   0.70   3.0         6.19             −0.39
Line          13          0.34   0.24   0.45   0.75   3.0         7.56             −0.40
Line          14          0.35   0.26   0.46   0.72   3.0         8.97             −0.39
Circle         6          0.25   0.14   0.35   0.69   2.0         0.72             −0.05
Circle         7          0.42   0.28   0.61   1.00   2.0         0.93             −0.06
Circle         8          0.40   0.24   0.57   1.00   2.0         1.20             −0.08
Circle         9          0.45   0.34   0.65   0.97   2.0         1.48             −0.09
Random tree    6          0.22   0.03   0.23   0.68   2.0         0.64             −0.22
Random tree    7          0.34   0.11   0.41   0.92   2.0         0.83             −0.26
Random tree    8          0.63   0.24   0.65   1.49   2.0         1.08             −0.32
Random tree    9          0.82   0.36   0.93   1.99   2.0         1.41             −0.36
Hexagon grid   6          0.47   0.26   0.63   1.21   2.0         0.92             −0.04
Hexagon grid   7          0.58   0.34   0.80   1.38   2.0         1.39             −0.02
Hexagon grid   8          0.74   0.54   1.05   1.68   2.0         1.90             −0.05

it is just about 1%–2% above optimum. The last column shows the mean of the lower bound derived in §4.3, in terms of the percentage below optimum. The lower bound is, on average, within 0.5% of the optimum. In addition, the quality of the lower bound does not appear to degrade as n increases as in the case of random attackers. The main reason is that the linear programs that produce the lower bound in both cases allow mixed strategies, which resembles the optimal strategy against strategic attackers, whereas the optimal strategy against random attackers consists of just one patrol pattern.

To assess how much our heuristic improves over a naïve patrol strategy, we again consider two graph types, as in the case of random attackers. A naïve patrol strategy is moving back and forth for a line graph, and circling around all

Table 5. Performance of the heuristic in the strategic-attacker case, reported as the percentage excess over the lower bound, which is computed using the linear program in §4.3.

                 n = 12                        n = 18
Graph            Mean   50th   75th   90th     Mean   50th   75th   90th
Complete         0.78   0.59   1.09   1.73     1.29   1.24   1.66   2.12
Line             0.72   0.50   0.90   1.57     0.70   0.56   0.95   1.36
Circle           0.48   0.39   0.64   0.95     0.57   0.48   0.74   1.04
Random tree      1.43   1.02   1.87   3.04     1.57   1.30   1.90   2.76
Hexagon grid     1.25   1.10   1.68   2.30     2.46   1.85   2.74   5.18

nodes for a circle graph. For n = 6, among the same 1,000 scenarios reported in Table 4, on average the naïve patrol produces an evasion probability 20.65% over optimum for line graphs and 19.59% over optimum for circle graphs. In either case, on average our heuristic produces an evasion probability less than 0.5% over optimum.

To conclude this section, we consider larger graphs and compare our heuristic with the lower bound derived in §4.3. As shown in Table 5, for 12-node graphs, on average the heuristic performs within 2% above the lower bound, and for 18-node graphs it is within 3%. The results show not only the excellent performance of the heuristic, but also the tightness of the lower bound.

5. Conclusions

In this paper, we study how to patrol a graph against random attackers and against strategic attackers. In both cases, we give an exact linear program to compute the optimal solution. Because the linear program quickly becomes computationally intractable as the problem size grows, we propose easy-to-compute index-based heuristics, which produce near-optimal performance in our numerical experiments.

Besides producing an effective patrol policy, our work can also be used to provide recommendations on how to design a patrol graph. For instance, a museum can compare patrol results to decide where its most valuable artwork should be exhibited (swapping $c_i$ and $c_j$) and to decide how to connect its exhibit rooms (adding or removing edges).


Figure 3. This figure displays, for the strategic-attacker case, the heuristic performance against computation time for different r and d, on 6-node complete graphs.

[Figure omitted. Horizontal axis: computation time (seconds). Vertical axis: percentage over optimum. One curve each for d = 1, 2, 3, with points marked r = 5, 10, 15, 20.]

Notes. The performance is the 90th percentile over 1,000 random scenarios, reported as percentage excess over optimum. Each line corresponds to one value of d, with r = 5, 10, 15, 20 from left to right.

One assumption in our model is that it takes the same amount of time for the patroller to move from one node to its adjacent nodes. If a node is far away from all the other nodes, we can create dummy nodes in between, and let the cost (namely $c_i$) be zero for those dummy nodes. Another assumption we make is that the detection occurs at the end of a time period, as opposed to, say, at the beginning of a time period. Although the mathematical expressions for different variations will invariably be different, intuitively we believe the effectiveness of the proposed heuristics would be comparable for other modeling choices.

Figure 4. This figure displays, for the strategic-attacker case, the heuristic performance against computation time for different r and d, on 6-node line graphs.

[Figure omitted. Horizontal axis: computation time (seconds). Vertical axis: percentage over optimum. One curve each for d = 1, 2, 3, 4, 5, with points marked r = 5, 10, 15, 20.]

Notes. The performance is the 90th percentile over 1,000 random scenarios, reported as percentage excess over optimum. Each line corresponds to one value of d, with r = 5, 10, 15, 20 from left to right.

There are a few possible future research directions. First, we assume the patroller is perfect in detecting an attack. In reality, a patroller may not detect the attacker even if the two occupy the same node at the same time. Second, we study the case of one patroller. Often in practice, however, a large area is patrolled by many patrollers. Another interesting twist to our current model is to allow the attacker to see the patroller when the two occupy the same node. In that case, an attacker can initiate an attack immediately after the patroller leaves the targeted node. Another interesting extension is to develop a dynamic game, where the attacker can explore potential targets before deciding where to attack. These extensions may require substantially different formulations.

Supplemental Material

Supplemental material to this paper is available at http://dx.doi.org/10.1287/opre.1120.1149.

Acknowledgments

The authors are grateful to the three referees and the associate editor for their valuable comments. This material is based upon work supported by the Office of Naval Research.

References

Alpern S, Gal S (2002) Searching for an agent who may or may not want to be found. Oper. Res. 50(2):311–323.
Alpern S, Gal S (2003) The Theory of Search Games and Rendezvous (Kluwer Academic Publishers, Norwell, MA).
Alpern S, Morton A, Papadaki K (2011) Patrolling games. Oper. Res. 59(5):1246–1257.
Archibald TW, Black DP, Glazebrook KD (2009) Indexability and index heuristics for a simple class of inventory routing problems. Oper. Res. 57(2):314–326.
Auger JM (1991) An infiltration game on k arcs. Naval Res. Logist. 38(4):511–529.
Baston V, Kikuta K (2004) An ambush game with an unknown number of infiltrators. Oper. Res. 52(4):597–605.
Baston V, Kikuta K (2009) An ambush game with a fat infiltrator. Oper. Res. 57(2):514–519.
Birge J, Pollock S (1989) Modelling rural police patrol. J. Oper. Res. Soc. 40(1):41–54.
Chaiken J, Dormont P (1978) A patrol car allocation model: Capabilities and algorithms. Management Sci. 24(12):1291–1300.
Chelst K (1978) An algorithm for deploying a crime directed (tactical) patrol force. Management Sci. 24(12):1314–1327.
Dunnigan JF (1992) The Complete Wargames Handbook: How to Play, Design and Find Them (Quill, Minneapolis).
Gittins JC, Glazebrook KD, Weber R (2011) Multi-armed Bandit Allocation Indices, 2nd ed. (John Wiley & Sons, Hoboken, NJ).
Glazebrook KD, Kirkbride C, Ouenniche J (2009) Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Oper. Res. 57(4):975–989.
Glazebrook KD, Kirkbride C, Mitchell HM, Gaver DP, Jacobs PA (2007) Index policies for shooting problems. Oper. Res. 55(4):769–781.


Karp RM (1972) Reducibility among combinatorial problems. Miller RE, Thatcher JW, eds. Complexity of Computer Computations (Plenum, New York), 85–103.
Larson RC (1972) Urban Police Patrol Analysis (MIT Press, Cambridge, MA).
Lee S, Franz L, Wynne A (1979) Optimizing state patrol manpower allocation. J. Oper. Res. Soc. 30(10):885–896.
Olson D, Wright G (1975) Models for allocating police preventive patrol effort. Oper. Res. Quart. 26(4, Part 1):703–715.
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).
Robinson J (1951) An iterative method of solving a game. Ann. Math. 54(2):296–301.
Ross SM (2010) Introduction to Probability Models, 10th ed. (Academic Press, San Diego).
Ruckle W (1983) Geometric Games and Their Applications (Pitman Advanced Publishing Program, Boston).
Shieh E, An B, Yang R, Tambe M, Baldwin C, DiRenzo J, Maule B, Meyer G (2012) Protect: A deployed game theoretic system to protect the ports of the United States. Internat. Conf. Autonomous Agents and Multiagent Systems (AAMAS) (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC).
Taylor B, Moore L, Clayton E, Davis K, Rakes T (1985) An integer nonlinear goal programming model for the deployment of state highway patrol units. Management Sci. 31(11):1335–1347.
Thomas L, Washburn A (1991) Dynamic search games. Oper. Res. 39(3):415–422.
Washburn A, Wood K (1995) Two-person zero-sum games for network interdiction. Oper. Res. 43(2):243–251.
Washburn AR (2003) Two-Person Zero-Sum Games, 3rd ed. (INFORMS, Hanover, MD).
Weber R, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.
Weber R, Weiss G (1991) Addendum to "On an index policy for restless bandits." Adv. Appl. Probab. 23(2):429–430.
Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25:287–298.
Zoroa N, Zoroa P, Fernandez-Saez MJ (2009) Weighted search games. Eur. J. Oper. Res. 195(2):394–411.

Kyle Y. Lin is an associate professor in the Operations Research Department at the Naval Postgraduate School. His research interests include stochastic modeling, queueing theory, game theory, and their applications.

Michael P. Atkinson is an assistant professor in the Operations Research Department at the Naval Postgraduate School. His research focuses on applying operations research techniques to military, homeland security, and healthcare applications.

Timothy H. Chung is an assistant professor of systems engineering at the Naval Postgraduate School. His research interests include modeling and analysis of operational settings involving unmanned systems, notably information gathering and sensor fusion for search and detection missions using probabilistic and optimization models.

Kevin D. Glazebrook is Distinguished Professor in Operational Research in the Department of Management Science at Lancaster University in the United Kingdom (UK). His research interest centers on the development and evaluation of heuristics for complex stochastic decision problems that are based on state-based calibrations of the options available. He directs the US$20 million LANCS Initiative whose goal is to build research capability in foundational operations research (OR) within the UK. He also chairs the STOR-i Centre for Doctoral Training in statistics and OR at Lancaster University. Both of these major national initiatives profit from significant U.S. and other international engagement.


ec2 e-companion to Lin et al.: A Graph Patrol Problem with Random Attack Times

Electronic Companion to
A Patrol Problem with Random Attack Times
Kyle Y. Lin, Michael P. Atkinson, Timothy H. Chung, Kevin D. Glazebrook

EC.1. Linear Programs to Compute the Optimal Policies

This section discusses linear programs to compute the optimal policies. The random-attacker case is discussed in §EC.1.1, and the strategic-attacker case is discussed in §EC.1.2.

EC.1.1. Optimal Policy Against Random Attackers

The MDP model presented in §3.1 belongs to the class of multichain models described in Chapter 9 of Puterman (1994), because for a given stationary, deterministic policy it is possible for the resulting Markov chain to have multiple recurrent classes. To determine $C^{\mathrm{OPT}}$, we need to solve the following system of equations for $g(s)$ and $h(s)$ (referred to as the multichain optimality equations in equations (9.1.1) and (9.1.2) in Puterman (1994)):

$$g(s) = \min_{i \in A(s)} \{ g(\phi(s, i)) \}, \quad \forall s \in \Omega,$$

$$g(s) + h(s) = \min_{i \in B(s)} \{ C(s, i) + h(\phi(s, i)) \}, \quad \forall s \in \Omega,$$

where $B(s) = \{ i \in A(s) : g(s) = g(\phi(s, i)) \}$. That is, $B(s)$ is the subset of $A(s)$, including all actions that attain the minimum in the first equation. The quantity $g(s)$ represents the long-run cost rate if the system starts in state $s$ and $h(s)$ is a bias term that can be interpreted as a transient cost. For our system, the optimality equations will have $C^{\mathrm{OPT}} = g(s)$ for all $s \in \Omega$, because the long-run cost rate is independent of the initial state. Consequently, in our model we have $B(s) = A(s)$. As the MDP has a finite state space, we can formulate the following linear program to compute the optimal cost rate $C^{\mathrm{OPT}}$ (see §9.1.1 in Puterman (1994) for more details):

$$\max_{g, h} \; g \qquad (\text{EC.1})$$

$$\text{subject to} \quad g + h(s) \le C(s, i) + h(\phi(s, i)), \quad \forall s \in \Omega \text{ and } i \in A(s). \qquad (\text{EC.2})$$

The size of the constraint matrix is on the order of $|\Omega| n \times |\Omega|$ (the exact number of rows depends upon the adjacency structure of the graph). The number of states grows exponentially as the number of nodes grows. If the attack times are bounded by the same integer $B > n$, then the number of states is given by

$$|\Omega| = \sum_{i=0}^{n-1} \binom{n}{1} \binom{n-1}{i} \binom{B-1}{n-1-i} (n-1-i)!,$$

because there is exactly one node in state 1 (first term), $i$ of the other $n - 1$ nodes in state $B + 1$ (second term), and each of the remaining $n - 1 - i$ nodes needs to be in a distinct state between 2 and $B$ (last two terms). For example, if $n = 8$ and $B = 10$, then there are over 8 million states; if $n = 10$ and $B = 15$, there are over 25 billion states.
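The state-count formula is easy to evaluate directly. The following sketch (the function name is an assumption) reproduces the two examples quoted above.

```python
from math import comb, factorial


def count_states(n: int, B: int) -> int:
    """Number of states |Omega| for n nodes and attack times bounded by B > n."""
    total = 0
    for i in range(n):              # i nodes (besides the current one) in state B + 1
        free = n - 1 - i            # nodes in distinct states between 2 and B
        total += comb(n, 1) * comb(n - 1, i) * comb(B - 1, free) * factorial(free)
    return total


print(count_states(8, 10))   # 8379008
print(count_states(10, 15))  # 26540253190
```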


EC.1.2. Optimal Policy Against Strategic Attackers

First, write the dual of the linear program defined by (EC.1)–(EC.2).

$$\min_{x} \; \sum_{s \in \Omega} \sum_{i \in A(s)} C(s, i)\, x(s, i) \qquad (\text{EC.3})$$

$$\text{subject to} \quad \sum_{i \in A(s)} x(s, i) - \sum_{t \in \Omega} \sum_{i \in A(t)} I(\phi(t, i) = s)\, x(t, i) = 0, \quad \forall s \in \Omega, \qquad (\text{EC.4})$$

$$x(s, i) \ge 0, \quad \forall s \in \Omega \text{ and } i \in A(s). \qquad (\text{EC.5})$$

Recall from §3.1 that $\phi(t, i)$ represents the resulting state if the patroller chooses node $i$ to visit in state $t$. The indicator function $I(\phi(t, i) = s)$ returns 1 if taking an action $i$ in state $t$ moves the system to state $s$, and returns 0 otherwise. The decision variable $x(s, i)$ can be interpreted as the fraction of time the system is in state $s$ with the patroller next visiting node $i$. Constraint (EC.4) states that the long-run transition rates entering and leaving each state must be the same.

The objective function in (EC.3) calculates the total long-run cost rate over all nodes. For a strategic attacker, however, we want to minimize the largest expected cost per attack among all nodes. The quantity $C(s, i)$ aggregates the cost over all nodes after the patroller moves to node $i$ and thus does not allow us to isolate the cost at each node. However, $C_j(s, i)$, defined in equation (1), does specify the cost at the individual node $j$. The long-run cost rate at node $j$ is therefore $\sum_{s \in \Omega} \sum_{i \in A(s)} C_j(s, i)\, x(s, i)$. Dividing the preceding by $\lambda_j$, we can redefine the zero-sum game in (25) as

$$\min_{x} \; \max_{j = 1, \ldots, n} \; \sum_{s \in \Omega} \sum_{i \in A(s)} \frac{C_j(s, i)}{\lambda_j}\, x(s, i).$$

Therefore, we can modify the linear program in (EC.3)–(EC.5) to minimize the largest long-run average cost per attack among all nodes:

$$\begin{aligned}
\min_{x}\quad & \theta \\
\text{subject to}\quad & \sum_{s \in \Omega} \sum_{i \in A(s)} \frac{C_j(s, i)}{\lambda_j}\, x(s, i) \le \theta, \quad j = 1, \ldots, n, \\
& \text{and equations (EC.4) to (EC.5).}
\end{aligned}$$

The minimized $\theta$ produced by this linear program yields what we want in equation (25).

EC.2. Proofs of Statements

EC.2.1. Proof of Theorem 1

To show that $W(k)$ is nondecreasing, compute

$$W(k+1) - W(k) = c\lambda (k+1) \left( \int_{k+1}^{k+2} P(X \le t)\,dt - \int_{k}^{k+1} P(X \le t)\,dt \right) \ge 0,$$

because $P(X \le t)$ is nondecreasing in $t$. To derive the optimal policy for $w \in [W(k-1), W(k)]$, we will show that the objective function $f(m)$ is nonincreasing for $m \le k$, and nondecreasing for $m \ge k$. For $m \le k$,

$$f(m) - f(m-1) = \frac{1}{(m-1)m} \left( W(m-1) - w \right) \le \frac{1}{(m-1)m} \left( W(m-1) - W(k-1) \right) \le 0.$$

The first inequality follows because $w \ge W(k-1)$, while the second inequality follows because $m \le k$ and $W(k)$ is nondecreasing. Similarly, for $m \ge k$,

$$f(m+1) - f(m) = \frac{1}{m(m+1)} \left( W(m) - w \right) \ge \frac{1}{m(m+1)} \left( W(m) - W(k) \right) \ge 0.$$

To prove the final part of the theorem, note that $f(k+1) \le f(k)$ if and only if $w \ge W(k)$. It is optimal not to visit at all if $f(k+1) \le f(k)$ for all $k$, or equivalently,

$$w \ge \sup_{k = 1, 2, \ldots} W(k) = \lim_{k \to \infty} W(k) = c \lambda E[X].$$

The proof is completed. □


We restore the node su�x i for the proof. Suppose without loss of generality that W1(1)�W2(1).

Although it is not necessary, we shall suppose for simplicity of presentation that the indices for

each node are strictly increasing over the range 1 kB

i

, i= 1,2. It will be convenient to use the

notational shorthand

�

i

(k) = c

i

�

i

Zk

0

P (Xi

t)dt.

Consider the value of W2(2) in three cases:

1. W2(2)>W1(1)�W2(1). In this case we have that

w

⇤ = argmaxw�0

C(w) =W1(1),

and

K1(w⇤) =K2(w

⇤) = 2.

The Lagrangian relaxation with w = w

⇤ (equivalently, the relaxed problem with value C

TR) is

solved by a policy whose visits to both nodes 1 and 2 have period 2. The associated value is given

by

C(w⇤) =�1(2)+ �2(2)

2.

One can check that the IH applied from any starting state makes alternating visits to nodes 1 and

2, from which it follows that

C

IH =�1(2)+ �2 (2)

2.

Hence, the result holds.


2. W2(2) =W1(1)�W2(1). The values for w⇤ and C(w⇤) in case 1 remain valid. However, because

W2(2) =W1(1) we need to consider two stationary expressions of the IH. One version alternates

between 1 and 2 as in case 1, while the second repeats the patrol pattern 2-1-1 from some finite

point on. However,

W2(2) =W1(1))�1(1)+ �1(2)+ �2 (3)

3=

�1(2)+ �2(2)

2,

so the two expressions of IH have the same associated cost rate. The result then holds for case 2.

3. $W_1(1) > W_2(2) > W_2(1)$. Write

$$r = \max\{k : W_2(k) < W_1(1)\}.$$

By assumption $r \ge 2$. For this case we have $K_1(w) = 1$ for $w < W_1(1)$; $K_1(w) \ge 2$ and $K_2(w) \ge 3$ for $w > W_1(1)$. It then follows that $w^* = W_1(1)$ and that there are now two policies for the Lagrangian relaxation which achieve $C(w^*)$. These are policy $\pi_1$, which visits node 1 with period 2 and node 2 with period $r+1$, and policy $\pi_2$, which visits node 1 with period 1 and node 2 with period $r+1$. The randomization

$$\frac{2}{r+1} \otimes \pi_1 + \frac{r-1}{r+1} \otimes \pi_2$$

can be shown to satisfy the total-rate constraint (4) with equality, and we infer that

$$C^{TR} = C(w^*) = \frac{2}{r+1}\left[\frac{\delta_1(2)}{2} + \frac{\delta_2(r+1)}{r+1}\right] + \frac{r-1}{r+1}\left[\delta_1(1) + \frac{\delta_2(r+1)}{r+1}\right] = \frac{(r-1)\,\delta_1(1) + \delta_1(2) + \delta_2(r+1)}{r+1}.$$

Now if $W_2(r+1) > W_1(1)$ then, for any initial state, the IH will, from some finite time onwards, visit the nodes in a repeating pattern with periodicity $r+1$ of $r$ visits to node 1 followed by a single visit to node 2. It follows that

$$C^{IH} = \frac{(r-1)\,\delta_1(1) + \delta_1(2) + \delta_2(r+1)}{r+1}$$

and the result holds. Should it be the case that $W_2(r+1) = W_1(1)$ then, as in case 2, there is a second stationary expression of IH with an associated repeating pattern of length $r+2$ consisting of $r+1$ visits to node 1 followed by a single visit to node 2. As in case 2, the associated cost rates for the two expressions of the index heuristic are equal. Therefore, the result holds for case 3.

Because the result holds for all three cases, the proof is completed. □
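The algebra behind cases 2 and 3 can be verified with exact arithmetic. The sketch below is illustrative only: the function names and the numerical values passed in are hypothetical, the arguments named `s...` stand for values of the notational shorthand defined at the start of the proof, and the index is assumed to take the form $W_i(k) = k\,s_i(k+1) - (k+1)\,s_i(k)$, which is what the threshold condition gives when the per-visit cost rate is the shorthand plus $w$, divided by $k$.

```python
from fractions import Fraction as F

def index(s_k, s_k1, k):
    """Assumed index form: W(k) = k * s(k+1) - (k+1) * s(k)."""
    return k * F(s_k1) - (k + 1) * F(s_k)

def mixture_rate(r):
    """Total visit rate of the randomization 2/(r+1) of pi_1 and (r-1)/(r+1) of pi_2."""
    p1, p2 = F(2, r + 1), F(r - 1, r + 1)
    rate_pi1 = F(1, 2) + F(1, r + 1)  # node 1 every 2 periods, node 2 every r+1
    rate_pi2 = 1 + F(1, r + 1)        # node 1 every period, node 2 every r+1
    return p1 * rate_pi1 + p2 * rate_pi2

def mixture_cost(r, s1_1, s1_2, s2_r1):
    """Cost rate of the randomization in case 3."""
    p1, p2 = F(2, r + 1), F(r - 1, r + 1)
    cost_pi1 = F(s1_2) / 2 + F(s2_r1) / (r + 1)
    cost_pi2 = F(s1_1) + F(s2_r1) / (r + 1)
    return p1 * cost_pi1 + p2 * cost_pi2

def closed_form(r, s1_1, s1_2, s2_r1):
    """Closed-form cost rate stated for case 3."""
    return ((r - 1) * F(s1_1) + F(s1_2) + F(s2_r1)) / (r + 1)

# Case 2 with hypothetical values chosen so that W_2(2) = W_1(1) = 2.
s1 = {1: 1, 2: 4}
s2 = {2: 2, 3: 4}
pattern_211 = F(s1[1] + s1[2] + s2[3], 3)  # cost rate of pattern 2-1-1
alternating = F(s1[2] + s2[2], 2)          # cost rate of alternating visits
```

With these values both patrol patterns in case 2 have cost rate 3, and for any $r \ge 2$ the randomization in case 3 has total visit rate 1 and matches the closed form.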



We initially consider cases where our $n$-node base problem is such that $K_i(w^*) = k_i$ and $\sum_{i=1}^n k_i^{-1} = 1$. Number the nodes such that $k_1 \le k_2 \le \cdots \le k_n$. It is plain that $k_i \ge i$. Let $Q$ be the least common multiple of $\{k_i, 1 \le i \le n\}$. We start by constructing a (possibly infeasible) patrol schedule of length $Q$ by a single patroller as follows:

Visit node $i$ on occasions $i + m k_i$, for $m = 0, 1, \ldots, Q/k_i - 1$.

It is trivial that a (possibly infeasible) policy built on repetition of this (possibly infeasible) $Q$-schedule achieves the bound $C(w^*) = \max_{w \ge 0} C(w)$.

For $t = 1, \ldots, Q$, let

$$A_t \equiv \{i : k_i \mid (t - i)\}$$

denote the (possibly empty) set of nodes visited on occasion $t$ under this $Q$-schedule. We now proceed to the $Q$-amplification. First recall that we label the nodes $(i, j)$ for a $Q$-amplification graph, for $i = 1, \ldots, n$ and $j = 1, \ldots, Q$. We next create sets $A_t^k$, for $t = 1, \ldots, Q$ and $k = 1, \ldots, Q$, by successive shifts of the above base $Q$-schedule to the right, operating mod $Q$ in taking the last node visited and placing it first in each shift. We do so by defining, for all $t = 1, \ldots, Q$, $A_t^1 \equiv \{(i, 1);\ i \in A_t\}$, and

$$A_t^k \equiv \{(i, k);\ i \in A_{t-k+1 \,(\mathrm{mod}\ Q)}\}$$

for $k = 2, \ldots, Q$. It is easy to check that

$$\left| \bigcup_{k=1}^{Q} A_t^k \right| = Q, \quad t = 1, \ldots, Q.$$

The proposed solution to the $Q$-amplification of the base problem is centered on a $Q$-schedule for the $Q$ patrollers in which the $Q$ nodes in $\bigcup_{k=1}^{Q} A_t^k$ are visited on occasion $t$. Successive repetitions of this $Q$-schedule give the proposed solution to the problem. It is straightforward to check that the bound $C^Q(w^*)$ is achieved by this policy and hence that $C^{OPT,Q} = C^Q(w^*)$. Moreover, under this policy from time $Q$ onward (i.e., after the completion of the first run of the proposed $Q$-schedule) the nodes visited at any time have indices in the set $\{W_i(k_i), 1 \le i \le n\}$ while those not being visited have indices in the set $\{W_i(k), 1 \le k \le k_i - 1, 1 \le i \le n\}$. It follows that the nodes visited after time $Q$ are always $Q$ of highest index. The claims in the theorem related to $NQ$-amplifications rest on constructions which are simply $N$-fold replicates of the above.
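The construction above can be sketched in a few lines. The code builds the base schedule sets (node $i$ is visited on occasion $t$ when $k_i$ divides $t - i$) and the shifted copies for the $Q$-amplification, so that the count of $Q$ visited nodes per occasion can be checked directly. The function names and the example periods $(2, 4, 4)$ are hypothetical; the periods are assumed to have inverses summing to 1, as in the case treated above.

```python
from math import lcm  # multi-argument lcm needs Python 3.9+

def base_schedule(periods):
    """Return Q and the sets A_t, t = 1..Q, of nodes i (1-based) with k_i | (t - i)."""
    Q = lcm(*periods)
    A = {t: {i for i, k in enumerate(periods, start=1) if (t - i) % k == 0}
         for t in range(1, Q + 1)}
    return Q, A

def amplified_schedule(periods):
    """Return Q and, for each occasion t, the union over k of the shifted sets A_t^k.

    Copy k of the graph follows the base schedule shifted right by k - 1 occasions.
    """
    Q, A = base_schedule(periods)
    visited = {}
    for t in range(1, Q + 1):
        nodes = set()
        for k in range(1, Q + 1):
            s = (t - k) % Q + 1  # t - k + 1 reduced mod Q into the range 1..Q
            nodes |= {(i, k) for i in A[s]}
        visited[t] = nodes
    return Q, visited

Q, A = base_schedule((2, 4, 4))
_, visited = amplified_schedule((2, 4, 4))
```

For the periods $(2, 4, 4)$ the base schedule visits two nodes on occasion 3 and none on occasion 4, so it is infeasible for a single patroller, while every occasion of the amplified schedule visits exactly $Q = 4$ nodes.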

A slightly more elaborate construction is needed in cases for which $\sum_{i=1}^n k_i^{-1} \ne 1$. Hence for the $n$-node base problem, we now suppose without loss of generality that $K_1(w^*-) = k_1$, $K_1(w^*+) = k_1 + 1$, and $K_i(w^*) = k_i$ for $i = 2, \ldots, n$, where $k_2 \le k_3 \le \cdots \le k_n$. In this case the bound $C(w^*) = \max_{w \ge 0} C(w)$ is achieved for the base problem by a policy which visits node $i$ one time in every $k_i$, $i = 2, \ldots, n$, and which for node 1 randomizes between a schedule which visits one time in every $k_1 + 1$ and one which visits one time in every $k_1$, where the probability attached to the former is given by

$$\alpha \equiv k_1 (k_1 + 1) \left( \sum_{i=1}^{n} k_i^{-1} - 1 \right).$$


We now choose any integer $Q$ for which the quantities $\alpha Q/(k_1 + 1)$, $(1 - \alpha)Q/k_1$, and $Q/k_i$, $i = 2, \ldots, n$, are all positive integers. The choice $Q = (k_1 + 1) \prod_{i=1}^{n} k_i$ would certainly suffice. We again construct a (possibly infeasible) schedule of length $Q$ for searches of the $n$ nodes by a single patroller as follows:

1. Visit node 1 on occasions $1 + m k_1$, for $m = 0, \ldots, (1 - \alpha)Q/k_1$, and on occasions $1 + (1 - \alpha)Q + m(k_1 + 1)$, for $m = 1, \ldots, \alpha Q/(k_1 + 1) - 1$.

2. Visit node $i$ on occasions $i + m k_i$, $m = 0, \ldots, Q/k_i - 1$, for $i = 2, \ldots, n$.

This schedule, repeated indefinitely, will achieve the bound $C(w^*)$. We now proceed as in the $\sum_{i=1}^n k_i^{-1} = 1$ case to use the above schedule and shifts of it to construct an admissible $Q$-schedule for $Q$ patrollers. As before, a policy based on repetitions of this will achieve the lower bound $C^Q(w^*)$ and hence be optimal. It will now be true that from time $Q$ onward the nodes visited by this policy will have indices in the set $\{W_i(k_i), 1 \le i \le n;\ W_1(k_1 + 1)\}$, while those not visited will have indices in the set $\{W_1(k), 1 \le k \le k_1;\ W_i(k), 1 \le k \le k_i - 1, 2 \le i \le n\}$. Since by construction $W_1(k_1) \le W_i(k_i)$, $i = 2, \ldots, n$, it follows that the nodes visited after time $Q$ are always $Q$ of highest index. The remainder of the discussion is as above, which completes the proof. □
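As a small worked instance of the randomized construction, the sketch below computes $\alpha$, the suggested $Q = (k_1 + 1)\prod_i k_i$, and the single-patroller visit occasions for each node. The function names and the example periods $(1, 3)$ are hypothetical, and the code assumes the periods are supplied with node 1 first and with $0 < \alpha < 1$.

```python
from fractions import Fraction as F
from math import prod

def alpha(periods):
    """Probability attached to the schedule that visits node 1 once every k_1 + 1."""
    k1 = periods[0]
    return k1 * (k1 + 1) * (sum(F(1, k) for k in periods) - 1)

def single_patroller_schedule(periods):
    """Visit occasions per node (1-based) for the possibly infeasible schedule of length Q."""
    k1 = periods[0]
    a = alpha(periods)
    Q = (k1 + 1) * prod(periods)
    n_short = (1 - a) * Q / k1    # (1 - alpha) Q / k_1
    n_long = a * Q / (k1 + 1)     # alpha Q / (k_1 + 1)
    assert n_short.denominator == 1 and n_long.denominator == 1
    # Node 1: first with period k_1, then with period k_1 + 1.
    visits = {1: [1 + m * k1 for m in range(int(n_short) + 1)]}
    start = 1 + (1 - a) * Q
    visits[1] += [int(start + m * (k1 + 1)) for m in range(1, int(n_long))]
    # Nodes 2..n: fixed period k_i starting at occasion i.
    for i, k in enumerate(periods[1:], start=2):
        visits[i] = [i + m * k for m in range(Q // k)]
    return a, Q, visits

a, Q, visits = single_patroller_schedule((1, 3))
```

For the periods $(1, 3)$ this gives $\alpha = 2/3$ and $Q = 6$, with node 1 visited on occasions 1, 2, 3, 5 and node 2 on occasions 2, 5, for a total of $Q$ visits per cycle.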

EC.2.4. Proof of Proposition 1

If the patroller never visits node $j$, then the long-run cost rate is at least $c_j p_j \Lambda$, where $\Lambda$ is the total arrival rate of attackers, because all attackers at node $j$ will evade detection. According to the assumption, $c_j p_j \Lambda > \sum_{i \ne j} c_i p_i \Lambda$, where the right-hand side is the long-run cost rate if the patroller stays at node $j$ for each time period, since the patroller can detect all attackers at node $j$ (recall the assumption $X_j \ge 1$). Consequently, a patrol policy that never visits node $j$ cannot be optimal, because the patroller can do better by simply staying at node $j$ at all times. □


There are three cases to consider.

1. The case $d_1 = d_2 = 1$. Consider an arbitrary patrol sequence, and let $r$ denote its long-run fraction of time visiting node 1. Because $d_1 = 1$, the expected cost if node 1 is attacked is $(1 - r)c_1$. Similarly, the expected cost if node 2 is attacked is $r c_2$. The optimal cost is therefore

$$\min_{r \in [0,1]} \max\{(1 - r)c_1,\ r c_2\} = \frac{c_1 c_2}{c_1 + c_2},$$

when $r = c_1/(c_1 + c_2)$.

With our heuristic policy, the second group of patrol patterns consists of 2 singletons, namely $\xi_1 \equiv 1$ (repeat node 1 indefinitely) and $\xi_2 \equiv 2$. The matrix game, with the attacker choosing the rows (nodes to attack) and the patroller choosing the columns (patrol patterns to use), is

$$\begin{array}{c|ccc} & \xi_1 & \xi_2 & \cdots \\ \hline 1 & 0 & c_1 & \cdots \\ 2 & c_2 & 0 & \cdots \end{array}$$

with possibly more columns from the first and the third groups. On the one hand, the value of this matrix game cannot be less than the optimal cost $c_1 c_2/(c_1 + c_2)$. On the other hand, the patroller can achieve this optimal cost by choosing $\xi_1$ with probability $c_1/(c_1 + c_2)$ and $\xi_2$ with probability $c_2/(c_1 + c_2)$. Hence, the heuristic is optimal.

2. The case $d_1 \ge 2$ and $d_2 \ge 2$. The optimal policy is clearly to alternate between the two nodes, detecting all attackers at either node, and achieves the optimal cost 0.

Now consider our heuristic policy. First, because the attack time is deterministic, for $i = 1, 2$, the index is given by

$$W_i(k) = \begin{cases} 0, & \text{if } k < d_i, \\ W_i(d_i) > 0, & \text{if } k \ge d_i. \end{cases}$$

It is straightforward to check that either of the two patrol patterns in the third group detects all attackers at either node. Hence, the heuristic is optimal.

3. The case $d_1 = 1$ and $d_2 \ge 2$ (or $d_1 \ge 2$ and $d_2 = 1$). Without loss of generality, assume $d_1 = 1$ and $d_2 \ge 2$. First, we show that the optimal cost is $c_1 c_2/(c_1 + c_2 d_2)$. Consider an arbitrary patrol sequence, and let $r$ denote its long-run fraction of time visiting node 1. Because $d_1 = 1$, the expected cost if node 1 is attacked is $(1 - r)c_1$. To determine the optimal patrol sequence, it is sufficient to consider sequences where any two consecutive visits to node 2 are at least $d_2$ time units apart, because otherwise we can insert visits to node 1 in between to have another patrol sequence that is at least as good. If any two consecutive visits to node 2 are at least $d_2$ time units apart, then with a long-run fraction of time $1 - r$ allocated to node 2, the expected cost if node 2 is attacked is $c_2 - d_2(1 - r)c_2$. The optimal cost is therefore

$$\min_{r \in [0,1]} \max\{(1 - r)c_1,\ c_2 - d_2(1 - r)c_2\} = \frac{c_1 c_2}{c_1 + c_2 d_2},$$

when $r = (c_1 + c_2(d_2 - 1))/(c_1 + c_2 d_2)$. One can check that

$$1 - r = \frac{c_2}{c_1 + c_2 d_2} < \frac{1}{d_2},$$

so that indeed in the optimal policy any two consecutive visits to node 2 are at least $d_2$ time units apart.

Next we show our heuristic policy achieves the optimal cost. With the heuristic policy, the second group contains a patrol pattern $\xi_1 \equiv 1$. In the third group, when we generate the patrol pattern by letting $c_2 p_2 > c_1 p_1$, the index for node 1 is

$$W_1(k) = c_1 p_1 \Lambda, \quad k = 1, 2, \ldots,$$

while the index for node 2 is

$$W_2(k) = \begin{cases} 0, & \text{if } k < d_2, \\ c_2 p_2 \Lambda d_2, & \text{if } k \ge d_2. \end{cases}$$

Because $c_2 p_2 \Lambda d_2 > c_1 p_1 \Lambda$, the resulting patrol pattern is $\xi_2 \equiv (1, 1, 1, \ldots, 1, 2)$, with $d_2 - 1$ consecutive visits to node 1 followed by a visit to node 2. The matrix game consists of at least two columns $\xi_1$ and $\xi_2$:

$$\begin{array}{c|ccc} & \xi_1 & \xi_2 & \cdots \\ \hline 1 & 0 & c_1/d_2 & \cdots \\ 2 & c_2 & 0 & \cdots \end{array}$$

On the one hand, the value of this matrix game cannot be less than the optimal cost $c_1 c_2/(c_1 + c_2 d_2)$. On the other hand, the patroller can achieve this optimal cost by choosing $\xi_1$ with probability $c_1/(c_1 + c_2 d_2)$ and $\xi_2$ with probability $c_2 d_2/(c_1 + c_2 d_2)$. Hence, the heuristic is optimal.

The heuristic policy is optimal in all three cases, so the proof is completed. □
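The equalizing computations in cases 1 and 3 can be checked with exact arithmetic. The sketch below evaluates the attacker's expected payoff in each row of the two-column matrix game under the stated mixing probabilities, and the patrol fraction $r$ that equalizes the two costs. The function names and the sample values of $c_1$, $c_2$, $d_2$ are hypothetical; setting $d_2 = 1$ reproduces case 1.

```python
from fractions import Fraction as F

def game_value(c1, c2, d2):
    """Claimed optimal cost c1*c2 / (c1 + c2*d2)."""
    return F(c1 * c2, c1 + c2 * d2)

def equalizing_r(c1, c2, d2):
    """Fraction of time at node 1 that equalizes (1-r)c1 and c2 - d2(1-r)c2."""
    return F(c1 + c2 * (d2 - 1), c1 + c2 * d2)

def row_payoffs(c1, c2, d2):
    """Expected cost for each attacker row when the patroller mixes xi_1 and xi_2."""
    q1 = F(c1, c1 + c2 * d2)        # probability of xi_1 (stay at node 1)
    q2 = F(c2 * d2, c1 + c2 * d2)   # probability of xi_2 (d2 - 1 visits to 1, then 2)
    attack_node_1 = q1 * 0 + q2 * F(c1, d2)
    attack_node_2 = q1 * c2 + q2 * 0
    return attack_node_1, attack_node_2

examples = [(3, 5, 1), (3, 5, 2), (7, 2, 4), (1, 1, 3)]
results = [(row_payoffs(*e), equalizing_r(*e), game_value(*e)) for e in examples]
```

In each example both rows yield the same expected cost, equal to $c_1 c_2/(c_1 + c_2 d_2)$, so the attacker is indifferent between the two nodes.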

EC.3. Additional Constraints for the Linear Program to Compute Lower Bound

We introduce four sets of constraints that help tighten the lower bound produced by the linear

program in §3.3.4. These constraints also apply to the lower bound for the strategic attacker

scenario presented in §4.3.

1. Note that $\sum_{k=3}^{B_i} y_{ik}$ represents the rate at which the patroller visits node $i$ with the previous visit to node $i$ at least 3 time units ago. If the patroller is currently at node $i$, and if he wants to return to node $i$ after at least 3 time units, then he needs to first go to node $j \ne i$, and then to node $k \ne i$, in the next two time units. The preceding can be expressed by

$$\sum_{k=3}^{B_i} y_{ik} \le \sum_{j \ne i} \min\left( x_{ij},\ \sum_{k \ne i} x_{jk} \right).$$

To make the preceding into linear inequalities, we can create variables $z_{ij}$ for $i, j = 1, 2, \ldots, n$, $i \ne j$, and let

$$\sum_{k=3}^{B_i} y_{ik} \le \sum_{j \ne i} z_{ij},$$

$$z_{ij} \le x_{ij}, \quad i = 1, \ldots, n;\ j \ne i,$$

$$z_{ij} \le \sum_{k \ne i} x_{jk}, \quad i = 1, \ldots, n;\ j \ne i.$$

Adding these three constraints into the linear program tightens the lower bound substantially.

2. To take the idea one step further, define $v_{ijk}$ as the expected rate at which the patroller goes from node $i$ to $j$, $j$ to $k$, and then $k$ to some node other than $i$, for $j, k \ne i$. Then

$$\sum_{k=4}^{B_i} y_{ik} \le \sum_{j \ne i} \sum_{k \ne i} v_{ijk}.$$


If $i, j, k$ are all different, use

$$v_{ijk} \le x_{ij}, \quad i = 1, \ldots, n;\ j \ne i,$$

$$v_{ijk} \le x_{jk}, \quad i = 1, \ldots, n;\ j, k \ne i,$$

$$v_{ijk} \le \sum_{l \ne i} x_{kl}, \quad i = 1, \ldots, n;\ j \ne i.$$

For $v_{ijj}$, use instead

$$v_{ijj} \le x_{ij}, \quad i = 1, \ldots, n;\ j \ne i,$$

$$v_{ijj} \le a_{ij} + b_{ij}, \quad i = 1, \ldots, n;\ j \ne i,$$

$$2a_{ij} + b_{ij} \le x_{jj}, \quad i = 1, \ldots, n;\ j \ne i,$$

$$b_{ij} \le \sum_{l \ne i,j} x_{jl}, \quad i = 1, \ldots, n;\ j \ne i,$$

where $a_{ij}$ is the rate of $i, j, j, j, i$ and $b_{ij}$ the rate of $i, j, j, k, i$.

Technically speaking, it is possible to extend the idea further, but that will involve many more

variables with insignificant gains.

3. Suppose node $i$ is a leaf, and it is next to node $j$. The sum $\sum_{k=4}^{B_i} y_{ik}$ is the rate at which the patroller leaves node $i$ and spends at least 4 time units before returning. There are two ways to return after 4 time units, either $i, j, j, j, i$, or $i, j, k, j, i$ for some $k \ne i, j$. This observation implies two inequalities:

$$\sum_{k=4}^{B_i} y_{ik} \le x_{ij},$$

$$\sum_{k=4}^{B_i} y_{ik} \le \frac{1}{2} x_{jj} + \sum_{k \ne i,j} x_{jk}.$$

The first inequality does not add value, because it is implied by another constraint involving $z_{ij}$. The second inequality can be useful to tighten the bounds on line graphs and random trees.

4. In a line graph, each of the middle nodes can be viewed as a leaf of a line graph to its left, and also a leaf of a line graph to its right. For node 3, say, we can add

$$\sum_{k=4}^{B_3} y_{3,k} \le \frac{1}{2} x_{2,2} + x_{2,1} + \frac{1}{2} x_{4,4} + x_{4,5}.$$

This constraint is useful for line and circle graphs.

References

Puterman, M.L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY.
