
Operations Research Society of South Africa

Submitted for publication in

ORiON


The Maritime Law Enforcement Response Selection Problem

Authors’ identities suppressed: Blind refereeing copy

Abstract

In the maritime law enforcement (MLE) environment, coast guard operators typically have to make a variety of counter-threat decisions following the detection and evaluation of potentially threatening events at sea, such as poaching, piracy, illegal immigration and pollution. These decisions reside within the so-called response selection course of action, during which MLE resources, such as military vessels, dedicated coast guard vessels or armed helicopters, have to be dispatched to intercept sea-faring vessels that are deemed to be participating in potentially threatening behaviour. The detection of such threats is, however, stochastically distributed in both time and space, rendering the coordinated management of these MLE resources, which operate in a harsh and unpredictable environment, a very challenging problem. In addition, the ability to adapt dynamically to continual changes in input data is critical for the success of MLE response selection efforts. A generic mathematical model which may be implemented as a means to finding high-quality trade-off solutions in MLE response selection situations is formulated in this paper.

1 Introduction

The pursuit of effective Maritime Law Enforcement (MLE) requires that coastal nations establish suitable monitoring procedures aimed at sea-faring vessels within their maritime jurisdiction areas. These procedures are carried out during a so-called response selection process where, following the detection and evaluation of potentially threatening events involving vessels of interest (VOIs) at sea, MLE resources, such as military vessels, dedicated coast guard vessels or armed helicopters, are dispatched by various response selection organisations (such as the navy, the department of sea fisheries or the department of border controls) in a coordinated fashion to intercept and neutralise these threats [8, 9]. In order to achieve MLE efficiency, it is required that these decisions be coordinated in such a way as to facilitate the rapid and semi-autonomous re-deployment of MLE resources at various points in time.

Mathematical models of the decision processes involved in these response selection situations may be based on the formation of so-called visitation routes for the interception of VOIs by MLE resources [8]. Initial MLE resource deployment is, however, typically carried out without full information in respect of the current and future maritime situation. Ancillary information usually has to be gathered from external sources on a continual basis as the sea picture unfolds, and such information normally contributes significantly to the temporal evolution of the MLE response selection input data. Additionally, certain input data are typically not known a priori, but are rather described by random variables, governed by known probability distributions derived from various historical data sources.

In this paper, the fundamental structure of the MLE response selection problem is expressed as a combinatorial optimisation model based on an analogy with the celebrated vehicle routing problem (VRP) and several of its variants. Key components of this modelling process include the incorporation of fixed and dynamic model parameters, the definition of suitable routing variables, the identification of multiple model objectives, the formulation of model constraints, the specification of stochastic input data, the design of a paradigm for information processing and an ability to accommodate general system dynamism. The aim in this paper is to integrate these characteristics of the MLE response selection problem into a conceptual multiobjective optimisation model, as well as to take first steps towards the validation of the functionality of this model by solving it (approximately) in the context of an illustrative, hypothetical MLE response selection problem instance.

The paper is structured as follows. A survey of key vehicle routing problem features found in the literature that are relevant to MLE response selection is first conducted in §2. Various preliminary concepts and assumptions underlying the MLE response selection problem are then elucidated in §3, and this is followed by a generic mathematical formulation of the MLE response selection problem. Various challenges associated with the search for non-dominated solutions to high-complexity instances of the MLE response selection problem are then discussed in some detail in §4, and this is followed by a description of the multiobjective simulated annealing algorithm we employ in search of high-quality trade-off solutions to instances of the MLE response selection problem. The working of the mathematical model of §3 and the solution search methodology proposed in §4 are then demonstrated in the context of a hypothetical MLE response selection scenario of relatively low complexity in §5, after which the paper closes with a discussion on possible future follow-up work in §6.

2 VRPs in the literature

The VRP is considered one of the most important and complex combinatorial optimisation problems in the operations research literature. It has often been described as easy to formulate but very difficult to solve, and is itself a combination of two other celebrated combinatorial optimisation problems, namely the bin packing problem (in which, given the capacity of a bin and a finite set of items of specific sizes, the minimum number of bins required to contain all the items is sought) and the travelling salesman problem (in which, given a weighted complete graph, a shortest closed tour including every node in the graph is sought). Perhaps the most appealing feature of the VRP is that its original formulation by Clarke and Wright [6], known as the capacitated VRP, may be extended in numerous ways to solve a variety of real-world routing problems.

Modifications to the standard capacitated VRP models are routinely introduced to the literature in order to incorporate new, fundamental aspects of some practical problem, such as VRPs with customer time windows, split deliveries, multiple depots, heterogeneous fleets of vehicles, customer profits, asymmetric travel arcs, demand ranges, etc. The interested reader is referred to the authoritative book by Toth and Vigo [36] for generic formulations of these VRP variants. Many papers also focus on in-depth theoretical analyses of these problems, often putting forward effective search techniques for solving them.

2.1 The nature of specific problem information

According to Psaraftis [28], real-world applications of VRP formulations may be classified according to four attributes of the nature of information required to specify problem instances, namely the evolution, quality, availability and processing of information. The nature of information in a VRP plays a critical role in the characterisation, modelling and solution search methodology that should be adopted. These attributes, together with their respective values, are shown in Table 1.

Evolution    Quality          Availability    Processing
Static       Deterministic    Local           Centralised
Dynamic      Forecasts        Global          Decentralised
             Stochastic
             Unknown

Table 1: Classification of VRPs with respect to the nature of problem information [28].

The notion of evolution of information refers to the information available to the decision maker during the execution of vehicle routes, which may either be classified as static (where the input data are known for the entire duration of the routing process and are not subject to any updates), or dynamic (where the input data are not known for the entire duration of the routing process, but are generally revealed or updated as time passes). The quality of information refers to possible uncertainty in respect of the available input data, which may either be classified as deterministic (where the input data are known with certainty once made available), forecasts (where the input data are subject to revisions as the process evolves), stochastic (where certain input data are prescribed as probability distributions or evolve according to certain stochastic processes), or unknown. The availability of information may either be classified as local (where only certain role players in the routing process are immediately made aware of changes in input data), or global (where changes in input data are updated continuously to most or all role players). Finally, the processing of information refers to two main paradigms in respect of the processes followed once new information is made available, which may be classified as either centralised (where all information is collected and processed by a central unit), or decentralised (where some of the information is processed separately from the central unit).
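The four attributes described above may be encoded directly as enumerated types. The following is a minimal sketch only; the class names, the `needs_reoptimisation` helper and the example profile are illustrative assumptions and do not appear in [28].

```python
from dataclasses import dataclass
from enum import Enum


class Evolution(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Quality(Enum):
    DETERMINISTIC = "deterministic"
    FORECASTS = "forecasts"
    STOCHASTIC = "stochastic"
    UNKNOWN = "unknown"


class Availability(Enum):
    LOCAL = "local"
    GLOBAL = "global"


class Processing(Enum):
    CENTRALISED = "centralised"
    DECENTRALISED = "decentralised"


@dataclass(frozen=True)
class InformationProfile:
    """One value per attribute column of Table 1 (names are hypothetical)."""
    evolution: Evolution
    quality: Quality
    availability: Availability
    processing: Processing

    def needs_reoptimisation(self) -> bool:
        # Assumption: dynamic input data imply that routes may have to be
        # revised while they are being executed.
        return self.evolution is Evolution.DYNAMIC


# Hypothetical example: a textbook capacitated VRP with fully known inputs.
classical_cvrp = InformationProfile(
    Evolution.STATIC,
    Quality.DETERMINISTIC,
    Availability.GLOBAL,
    Processing.CENTRALISED,
)
```

A profile of this kind could serve as a checklist when deciding which VRP variant and search methodology suit a given application.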


2.2 VRP solution techniques

Due to their simplicity and fast computations, VRP heuristics are very popular for obtaining satisfactory, though only approximately Pareto-optimal, solutions to small-scale multiobjective problem instances. Solving large-scale VRP instances with complex multiple objectives is, however, often very difficult using only heuristic techniques if the goal is to attain a good approximation of the Pareto front [36]. VRPs of this nature generally call for the use of more robust, more flexible solution search techniques. In most real-life VRP applications, the problem to be solved is very specific. It is therefore critical to achieve a good understanding of the specifics of the problem at hand, and also of the VRP variants and associated optimisation techniques available in the literature, in order to describe, model and solve the problem in an appropriate and efficient manner. The evolution of VRP search methodologies has, over the past ten years, almost exclusively taken place within the realm of metaheuristics [36]. In addition, this time period gave rise to numerous hybridisation techniques, which best encapsulate the rise of metaheuristics in this field.

The multi-depot VRP is a variant of the capacitated VRP in which routes are simultaneously sought for several vehicles originating from multiple depots, servicing a fixed set of customers and then returning to their original depots. In addition, the multi-depot location routing problem is similar to a VRP, but with the difference that the set of depots is not known a priori [22]. Instead, a subset of candidate depot sites with specific operating costs must be selected from a finite set of depots, while simultaneously seeking an optimal set of delivery routes. Compared to simpler VRPs and their variants, for which abundant literature exists, only a relatively small volume of research has been done on multi-depot VRPs. A small number of good optimisation techniques have nevertheless appeared in the literature for this problem. Most of these techniques tend to break down an instance of the problem into a series of single-depot sub-instances and/or only solve it for a single objective and/or for a homogeneous fleet of vehicles [17, 18, 26, 35, 38]. Because vehicles always start and complete their routes at the same depots, it is easy to merge the set of customer vertices with that of the depot vertices in such simplified model formulations.

In [18] and [35], for example, the solution of a single-objective, multi-depot VRP with a homogeneous vehicle fleet is decomposed into a three-phase heuristic approach. The first phase involves clustering or grouping the set of customers to be served by each depot (a clustering sub-problem), while the second phase entails assigning customers served from the same depot to several routes so that the capacity constraint associated with the vehicles is not violated (a routing sub-problem). The third phase consists of determining the visitation sequence of customers located on the same route (a scheduling sub-problem). The chief difference between considering multiple depots simultaneously and solving multiple single-depot sub-problems separately is that the local search operators in metaheuristics tailored to these approaches differ significantly in nature.
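The three phases above may be illustrated by the following minimal sketch. It is not the method of [18] or [35]: nearest-depot clustering, first-fit route construction and nearest-neighbour sequencing are deliberately naive stand-ins chosen for brevity, all function names are hypothetical, and each customer demand is assumed not to exceed the vehicle capacity.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cluster(customers, depots):
    """Phase 1 (clustering): assign every customer to its nearest depot."""
    groups = {d: [] for d in depots}
    for c, pos in customers.items():
        nearest = min(depots, key=lambda d: dist(depots[d], pos))
        groups[nearest].append(c)
    return groups


def build_routes(group, demand, capacity):
    """Phase 2 (routing): first-fit grouping of customers into routes
    such that no route exceeds the vehicle capacity."""
    routes, loads = [], []
    for c in group:
        for r, load in enumerate(loads):
            if load + demand[c] <= capacity:
                routes[r].append(c)
                loads[r] += demand[c]
                break
        else:  # no existing route can take this customer
            routes.append([c])
            loads.append(demand[c])
    return routes


def sequence(route, start, customers):
    """Phase 3 (scheduling): nearest-neighbour visitation order."""
    remaining, ordered, here = list(route), [], start
    while remaining:
        nxt = min(remaining, key=lambda c: dist(here, customers[c]))
        remaining.remove(nxt)
        ordered.append(nxt)
        here = customers[nxt]
    return ordered


def three_phase(customers, depots, demand, capacity):
    """Run the three phases depot by depot (homogeneous fleet assumed)."""
    return {
        d: [sequence(r, depots[d], customers)
            for r in build_routes(group, demand, capacity)]
        for d, group in cluster(customers, depots).items()
    }


# Hypothetical instance: two depots, four unit-demand customers.
depots = {"A": (0, 0), "B": (10, 0)}
customers = {"c1": (1, 0), "c2": (2, 0), "c3": (9, 0), "c4": (3, 0)}
demand = {c: 1 for c in customers}
plan = three_phase(customers, depots, demand, capacity=2)
```

Because each depot is treated in isolation, a sketch of this kind never reconsiders the depot to which a customer has been assigned.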

According to Salhi et al. [30], however, the decomposition of a multi-depot VRP into several single-depot sub-problems by (naively) assigning the customers to the depots closest to them (distance-wise), and solving these sub-problems individually, using a suitable approach, is easy to carry out, but usually leads to poor, significantly sub-optimal solutions. It is usually more beneficial to utilise optimisation techniques whose search procedures consider the entire multi-depot VRP as a whole (i.e. a global search should be conducted instead of launching multiple local searches for simplified sub-problem decompositions of this kind¹). As a result, the mathematical modelling approach proposed by Salhi et al. [30] is perhaps best suited to solving instances of the MLE response selection problem in a centralised or intermediate decision making paradigm (where the problem is solved globally). In [30], a complete mixed-integer, linear formulation is provided for a generic multi-depot VRP with a heterogeneous vehicle fleet. Moreover, formulation variants are also offered for alternative multi-depot VRP scenarios that are included in the MLE response selection problem considered in this paper. For instance, multi-depot VRP variants are considered in which the number of vehicles of a given type is known (i.e. a heterogeneous fleet with defined classes of vehicles); in which certain types of vehicles cannot be accommodated at certain depots; in which a vehicle is not required to return to the same depot from which it originated; or in which a maximum route length constraint associated with each class of vehicle is imposed (i.e. a distance-constrained VRP). Salhi et al. [30] then solve the problem (approximately) using a variable neighbourhood search applied to the notion of borderline customers, using six different local search operators. Unfortunately, the model formulation and search technique presented in [30] are applicable to single-objective optimisation only. Consequently, it is unlikely that this variable neighbourhood search heuristic would be as effective for solving similar problems in multiobjective space.

Many studies favour the use of genetic algorithms for solving multiobjective VRP variants, often suggesting specific or unique genetic operators [17, 18, 31, 35, 38]. Moreover, due to the high complexity level associated with problems of this kind, many studies advocate the use of hybrids over more traditional evolutionary computation approaches in an attempt to improve algorithmic performance. The innovative hybrid genetic search with adaptive control proposed by Vidal et al. [38], for example, offers an optimisation methodology for solving (among other problems) the multi-depot, periodic VRP with capacitated vehicles and time constraints. This hybrid includes a number of advanced features in terms of chromosome fitness evaluation, offspring generation and improvement, as well as effective population management. Two sub-populations are maintained in [38] (one containing feasible individuals and the other infeasible ones), and the mating pool is populated using parents from both sub-populations. Furthermore, the selection operator mechanism takes into account both the fitness of individuals and the level of contribution they provide to the diversity of the population. The fitness of offspring solutions is enhanced by employing education and repair local search procedures. Ho et al. [18] propose another hybrid genetic algorithm for solving difficult multi-depot VRPs. Here, three external heuristics are combined with a traditional genetic algorithm in order to design a hybrid between the Clarke-Wright algorithm [6], the nearest neighbour heuristic [27] and the iterated swap procedure [23]. While the first two heuristics are solely used for generating high-quality initial populations, the iterated swap procedure is applied to the offspring solutions which, if found to have better fitness values than one (or both) of their parents, replace that (or those) parent solution(s) in the next generation of candidate solutions.

¹ The same principle is applicable between the centralised and the decentralised MLE response selection decision making paradigms, in which multiple, independent routing sub-processes are conducted in the latter, while a global search process is performed in the former.

Although not as popular for solving complex multiobjective VRPs, certain authors have also adopted traditional, sequential simulated annealing techniques as a means to finding solutions to less complex VRPs [21, 24, 37, 39]. In these approaches, the very popular inter-route and intra-route solution transformation techniques [37] are often employed to provide the necessary exploration and exploitation features to this metaheuristic for the purpose of generating a non-dominated front of high quality when tuned appropriately. Harnessing the processing power of parallel computing has also been proposed for solving complex multiobjective VRPs [2].
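As an illustration of the two families of solution transformations mentioned above, the following minimal sketch implements one common representative of each: a swap of two customers within a route (intra-route) and the relocation of a customer from one route to another (inter-route). The function names are hypothetical and the specific operators used in [37] may differ.

```python
import random


def intra_route_swap(routes, rng):
    """Return a copy of `routes` in which two customers of one randomly
    chosen route (with at least two customers) have exchanged positions."""
    new = [list(r) for r in routes]
    candidates = [r for r in new if len(r) >= 2]
    if not candidates:
        return new
    route = rng.choice(candidates)
    i, j = rng.sample(range(len(route)), 2)
    route[i], route[j] = route[j], route[i]
    return new


def inter_route_relocate(routes, rng):
    """Return a copy of `routes` in which one customer has been removed
    from a non-empty route and inserted at a random position of another."""
    new = [list(r) for r in routes]
    donors = [k for k, r in enumerate(new) if r]
    if len(new) < 2 or not donors:
        return new
    src = rng.choice(donors)
    dst = rng.choice([k for k in range(len(new)) if k != src])
    customer = new[src].pop(rng.randrange(len(new[src])))
    new[dst].insert(rng.randrange(len(new[dst]) + 1), customer)
    return new


# Hypothetical usage: two routes over five customers.
rng = random.Random(0)
routes = [[1, 2, 3], [4, 5]]
neighbour_a = intra_route_swap(routes, rng)
neighbour_b = inter_route_relocate(routes, rng)
```

Within simulated annealing, a neighbouring solution generated in this way would be accepted or rejected according to the current temperature; that acceptance rule is omitted here.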

3 Mathematical modelling

In this paper, we demonstrate how instances of the MLE response selection problem may be modelled as a special type of VRP, where the fleet of vehicles represents the fleet of MLE resources at the collective disposal of the various MLE decision entities, the customers represent the VOIs tracked at sea within the territorial waters of the coastal nation and the depots represent the bases and patrol circuits from which MLE resources are dispatched on interception routes. Moreover, the notion of a vehicle visiting and servicing a customer refers to the successful interception of (and perhaps neutralisation of a threat uncovered at) a VOI by an MLE resource.

3.1 Routing characteristics

In an MLE environment, a base is defined as a facility providing the necessary re-supplying, stationing and maintenance services to MLE resources. Every base is typically unique with respect to certain characteristics that are relevant to resource assignment operations, such as size, accessibility or geographical location. These characteristics, combined with external factors, may have a significant impact on the way MLE response selection operations are conducted.

The MLE response selection routing problem differs from standard capacitated VRP formulations in the literature as a result of several key characteristics, which are discussed in [8]. MLE resources are generally either allocated for the purpose of intercepting VOIs at sea (such resources are said to be in an active state), or are strategically allocated to certain patrol circuits or bases until needed for future law enforcement purposes (such resources are said to be in an idle state) [9]. Additionally, MLE resources may temporarily be unavailable for law enforcement operations over certain periods of time due to routine maintenance, infrastructure damage, unavailability of crew or depleted autonomy resources. MLE resources which are both idle and assigned to a patrol circuit are said to be on stand-by. In this paper, the MLE response selection operations considered focus almost exclusively on the management of active MLE resources.

The notion of patrolling may be described as the strategic allocation of inactive MLE resources to specific geographical regions (i.e. the assignment of idle MLE resources to patrol circuits). The incorporation of patrol circuits in MLE response selection operations serves a similar purpose to the so-called repositioning-and-waiting strategies implemented in certain VRP studies as a means of anticipating future events [36]. Here, idle vehicles situated in low-demand areas are relocated to cover high-demand areas (preferably not too far from a base, depending on the autonomy levels of these vehicles), also known as “areas of interest” or “fruitful regions”, based on known a priori information in the form of historical data related to previous events. It is therefore acknowledged that strong coordination between the management of active MLE resources and that of idle MLE resources is critical to achieving successful MLE response selection operations overall.

As mentioned before, it is typically the case that the entire MLE response selection process of a coastal nation is not conducted by a centralised operator or decision maker, but is rather orchestrated by multiple role players, called decision entities in this paper. Examples of such entities include the navy, the coast guard, the department of sea fisheries or the maritime rescue authority of a coastal nation. These decision entities may perceive the quality of MLE response selection operations in their own, different ways, as they each tend to pursue their own individual goals as a result of subjective interpretations of what is deemed important while carrying out MLE operations. In particular, these decision entities may perceive the threatening intensity of VOIs differently, function largely independently from one another, follow their own guidelines, and utilise their own subsets of MLE resources.

3.2 The presence of trade-off alternatives

Coastal nations (and their various decision entities) typically have their own values, preferences and perceptions of the desirability of trade-offs between objectives when dealing with threats at sea. It is therefore expected that the MLE response selections of the various role players will vary from one to another. These responses should, however, be coherent and carried out according to a pre-determined protocol, based on an adequate set of objectives. We advocate that the MLE response selection problem ought to be formulated as a multiobjective problem. In this respect, we have identified four plausible and realistic fundamental objectives (three of which are fully motivated in [8]) that should be pursued simultaneously in search of suitable trade-off alternatives to MLE response selection problem instances in general:

I Maximise the combined visitation score of VOIs scheduled to be intercepted and investigated, weighted by (a) the probabilities of these VOIs being various types of threats from a pre-specified list of relevant threats and (b) the priorities assigned by the coastal nation with respect to neutralising various threat types from this list.

II Minimise the combined time delay score of VOIs scheduled to be intercepted, weighted in the same manner as in Objective I above.

III Minimise the total operating costs incurred by the dispatch of active MLE resources, including (a) their set-up costs and (b) their travelling costs while in an active state.

IV Maximise the combined consensus score obtained by aggregating the level of agreement of each decision entity in respect of the set of VOIs assigned to them for MLE response selection operations, weighted by (a) an individual VOI preference ordered set associated with each decision entity, (b) the deviation from the ideal quantity of VOIs assigned to each decision entity and (c) the relative importance of each decision entity to the coastal nation.
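Since Objectives I and IV are maximised while Objectives II and III are minimised, comparing candidate solutions requires a Pareto dominance test that respects these directions. The following is a minimal sketch of such a test; the class, field and function names are hypothetical, and the numerical scores are invented for illustration only.

```python
from typing import NamedTuple


class Objectives(NamedTuple):
    visitation_score: float   # Objective I   (maximise)
    delay_score: float        # Objective II  (minimise)
    operating_cost: float     # Objective III (minimise)
    consensus_score: float    # Objective IV  (maximise)


def as_minimisation(o):
    """Negate the maximised objectives so that smaller is always better."""
    return (-o.visitation_score, o.delay_score,
            o.operating_cost, -o.consensus_score)


def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one."""
    fa, fb = as_minimisation(a), as_minimisation(b)
    return (all(x <= y for x, y in zip(fa, fb))
            and any(x < y for x, y in zip(fa, fb)))


def non_dominated(solutions):
    """Keep the solutions that are not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]


# Hypothetical scores for three candidate solutions.
a = Objectives(10.0, 5.0, 100.0, 0.8)
b = Objectives(8.0, 6.0, 120.0, 0.7)   # worse than `a` in every objective
c = Objectives(12.0, 7.0, 90.0, 0.6)   # trades delay and consensus for cost
front = non_dominated([a, b, c])
```

In this example `a` and `c` are mutually non-dominated and therefore represent genuine trade-off alternatives, whereas `b` is discarded.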


3.3 On the evolution, quality and processing of information

In an MLE environment, the degree of effectiveness achieved by threat detection and threat evaluation systems plays a critical role in providing accurate and complete input data so as to facilitate the efficient functioning of routing operations. MLE threat detection systems typically aim to detect and track maritime objects in specified areas of the ocean so as to gather kinematic information associated with these objects, such as their sizes, locations and velocities. This information may then be used by an attribute management system to create so-called system tracks and derived attributes associated with each of the VOIs within the jurisdiction area of the coastal nation. Here, a system track is a record of the historical displacement of a VOI in space from the time of its detection to the present time, while a derived attribute of a VOI is a parameter value that is computed from the measured attributes of the VOI, such as its acceleration. Based on the computation of these derived attributes, the system must be able to provide future VOI trajectory estimates that serve as input data for calculating interception trajectories during the routing process of MLE response selection operations. It is acknowledged, however, that even the best threat detection radar systems are often not able to compute the derived attributes of the VOIs frequently enough to predict future VOI trajectories with high accuracy. The purpose of an MLE threat evaluation system is to (attempt to) classify and quantify the potentially threatening nature of VOIs by analysing their behaviour based on the information collected during the threat detection process and making automated inferences in respect of the nature and level of threat posed by VOIs.
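The relationship between a system track, its derived attributes and a future trajectory estimate may be illustrated by the following minimal sketch. It assumes planar coordinates, a track stored as time-ordered `(t, x, y)` observations with distinct time stamps, finite-difference estimates of velocity and acceleration, and a constant-acceleration extrapolation; none of these modelling choices is prescribed by the text, and the function names are hypothetical.

```python
def derived_attributes(track):
    """Estimate the latest velocity and acceleration of a VOI from the
    last three observations (t, x, y) of its system track."""
    (t0, x0, y0), (t1, x1, y1), (t2, x2, y2) = track[-3:]
    # Average velocities over the last two observation intervals.
    v_prev = ((x1 - x0) / (t1 - t0), (y1 - y0) / (t1 - t0))
    v_last = ((x2 - x1) / (t2 - t1), (y2 - y1) / (t2 - t1))
    # The two averages are centred on the interval midpoints, which lie
    # (t2 - t0) / 2 apart in time.
    dt = (t2 - t0) / 2
    accel = ((v_last[0] - v_prev[0]) / dt, (v_last[1] - v_prev[1]) / dt)
    return v_last, accel


def predict_position(track, horizon):
    """Extrapolate the VOI position `horizon` time units ahead, assuming
    (hypothetically) constant acceleration."""
    _, x, y = track[-1]
    (vx, vy), (ax, ay) = derived_attributes(track)
    return (x + vx * horizon + 0.5 * ax * horizon ** 2,
            y + vy * horizon + 0.5 * ay * horizon ** 2)


# Hypothetical track: a VOI moving east at one distance unit per time unit.
track = [(0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 0.0)]
estimate = predict_position(track, horizon=3)
```

The less frequently a track is updated, the coarser these finite differences become, which is one way of viewing the accuracy limitation noted above.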

As a solution to an MLE scenario is implemented and response selection operations are carried out according to this solution, it is merely a matter of time before the situation at sea requires reconsideration. Input data to problem instances are made known to the decision makers in a continual fashion and are updated concurrently with the determination and/or implementation of a solution [8]. An operator must therefore solve part of the problem on the basis of the information currently available, and then re-solve part of the problem as certain new input data are revealed and as fragments of the solution might no longer be feasible or preferred. The MLE response selection problem therefore resides within the class of dynamic VRPs.

In this respect, we define a disturbance as a threshold phenomenon occurring stochastically in time which may cause the currently implemented solution to suffer significantly in terms of quality, or to become infeasible, hence affecting the execution of the current routing plan [8]. A disturbance triggers the start of a new problem instance during which the situation at sea has to be re-evaluated (i.e. the instance has to be re-solved under the original information combined with the data update which brought about the disturbance, disregarding those VOIs of the previous solution that have already been intercepted). Henceforth, define the notion of a time stage as a finite temporal interval over which an MLE response selection problem instance is considered to be valid. A disturbance therefore causes the current time stage to end and the next one to begin. Examples of disturbances may include the detection of a new VOI, a significant change in the velocity of a VOI, a significant update in respect of the expected threat nature of a VOI or the sudden availability of a previously unavailable MLE resource.
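The interplay between disturbances and time stages may be sketched as follows. This is an illustrative assumption of how the bookkeeping could be organised, not the procedure of [8]: the `Situation` class, the `on_disturbance` function and the placeholder `solve` routine (which merely sorts the outstanding VOIs) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Situation:
    """Bookkeeping for the maritime picture across time stages."""
    vois: set = field(default_factory=set)          # all VOIs detected so far
    intercepted: set = field(default_factory=set)   # VOIs already intercepted
    stage: int = 1                                  # current time stage


def solve(open_vois):
    """Placeholder for the actual routing search (an assumption): here the
    outstanding VOIs are simply listed in sorted order."""
    return sorted(open_vois)


def on_disturbance(situation, new_vois=(), newly_intercepted=()):
    """End the current time stage, merge the data update that caused the
    disturbance, and re-solve while disregarding intercepted VOIs."""
    situation.vois |= set(new_vois)
    situation.intercepted |= set(newly_intercepted)
    situation.stage += 1
    return solve(situation.vois - situation.intercepted)


# Hypothetical usage: a new VOI is detected after v1 has been intercepted.
picture = Situation(vois={"v1", "v2"})
plan = on_disturbance(picture, new_vois=["v3"], newly_intercepted=["v1"])
```

Other disturbance types listed above, such as a change in VOI velocity or in resource availability, would update further input data before the same re-solve step.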

Furthermore, certain input data are not known with certainty, but are rather described by random variables with known or estimated probability distributions. It follows that the MLE response selection problem also falls in the class of stochastic VRPs [36]. Three main sources of stochastic information form part of the input data to the MLE response selection problem, namely information pertaining to customer location (i.e. uncertainty in respect of the location of a VOI at any given point in time in the future), customer demand (i.e. uncertainty in respect of the threat nature and threat intensity of potential VOIs), and customer presence (i.e. uncertainty in respect of the possibility that a VOI does not embody any threat at all). The customer service time (i.e. the time elapsed between the moment an MLE resource intercepts a VOI at sea and the moment it completes servicing that VOI) also carries significant uncertainty, as it cannot be known with certainty a priori what kind of threat may be embodied by a VOI; characteristics associated with VOIs (even those embodying the same threatening activities) may vary significantly from one VOI to another. In addition, certain MLE resources are better suited to neutralise VOIs embodying certain types of threats [8], which is an important factor to take into account when assessing service time. The service times of VOIs may therefore be estimated based on the distributions of approximate threat probabilities associated with each VOI (as described above) and historical data for calculating the expected times that certain MLE resources take to service VOIs embodying certain types of threats.
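One simple way of combining these two data sources is a probability-weighted average of the historical service times. The following minimal sketch assumes this expected-value estimator; the function name, the threat labels and all numerical values are hypothetical.

```python
def expected_service_time(threat_probs, service_times):
    """Probability-weighted service time of one VOI for one MLE resource.

    threat_probs  -- maps threat class to the estimated probability that
                     the VOI embodies that class (must sum to one)
    service_times -- maps threat class to the historical mean time the
                     resource takes to service a VOI of that class
    """
    if abs(sum(threat_probs.values()) - 1.0) > 1e-9:
        raise ValueError("threat probabilities must sum to one")
    return sum(p * service_times[h] for h, p in threat_probs.items())


# Hypothetical data (times in hours) for one VOI and one patrol vessel.
probs = {"piracy": 0.5, "poaching": 0.25, "false alarm": 0.25}
times = {"piracy": 4.0, "poaching": 2.0, "false alarm": 0.5}
estimate = expected_service_time(probs, times)
```

Because the historical service times are indexed by MLE resource as well as by threat class, the same VOI may yield different estimates for different resources.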

In the model formulation proposed in this paper, it is assumed that newly detected VOIs are individually allocated to the various decision entities by a central operator. This process can, in essence, be formulated as a multi-person decision making problem [3, 16], where the proposed alternatives consist of preferential input received from the decision entities and contain information on the desirability levels associated with each VOI in respect of assigning the VOI to a particular decision entity. In other words, decision entities communicate input data related to their preferences of being responsible for intercepting certain VOIs to the central operator. These preferences are expected to be conflicting in nature and hence a decision that limits the overall level of dissatisfaction amongst the decision entities should ideally be sought (i.e. a certain consensus level must be reached). In this context, the notion of reaching a consensus involves the cooperative process of obtaining the maximum degree of agreement between all decision entities, resulting in the collective support of a single solution from a given set of alternatives for the greater good of the coastal nation, as well as achieving certain satisfaction levels of individual decision entities to some (lesser) extent. It is, however, also assumed in this model that the central operator remains in charge of the overall routing process; that is, the decision entities are not responsible for allocating their MLE resources to visitation routes aimed at intercepting the subset of VOIs assigned to them. A centralised approach with respect to the processing of information is therefore employed in this VRP formulation.

3.4 Set notation and parametric configuration

Let $V^e_\tau = \{v^e_1, \ldots, v^e_{n_\tau}\}$ be the set of VOIs at the beginning of time stage $\tau$, let $V^r = \{v^r_1, \ldots, v^r_m\}$ be the set of available MLE resources, let

$$V^b = \{v^b_1, \ldots, v^b_{|V^b|}\}$$

denote the set of bases and let

$$V^p = \{v^p_1, \ldots, v^p_{|V^p|}\}$$

represent a set of pre-determined patrol circuits. Additionally, let $V_\tau = V^e_\tau \cup V^r \cup V^b \cup V^p$, and let $V^e_{k\tau} = \{v^e_{k1}, \ldots, v^e_{kn_\tau}\} \subseteq V^e_\tau$ be the ordered set of VOIs scheduled to be investigated by MLE resource $k$ during time stage $\tau$. In our model formulation, active MLE resources in $V^r$ are assigned to investigate subsets of VOIs in $V^e_\tau$ during time stage $\tau$, after which they are assigned either to travel back to a base in $V^b$ or to a designated patrol circuit in $V^p$.

During MLE response selection operations, we assume that the detection system of a coastal nation tracks, during time stage $\tau$, $n_\tau$ VOIs, which are individually matched with estimated probabilities to each of $|H| - 2$ known threat classes, an unknown threat class and a false alarm class [8]. If $p_{ih\tau}$ is the probability during time stage $\tau$ that VOI $i$ resides in threat class $h$, then, of course,

$$\sum_{h=1}^{|H|} p_{ih\tau} = 1 \quad \text{for all } i \in \{1, \ldots, n_\tau\}.$$

It is assumed that the position and velocity of each VOI are known at all times (these values may change over time in a stochastic manner), and that estimated itineraries are available for all VOIs.

Let $d_{ijk\tau}$ be the estimated distance travelled by MLE resource $k$ from point $i$ to point $j$ during time stage $\tau$ and set $d_{iik\tau} = +\infty$ for all points $i$. If it is assumed that MLE resource $k$ maintains a fixed average speed of $\eta_k$ between any two points $i$ and $j$, then the time that MLE resource $k$ spends travelling from $i$ to $j$ may be expressed as the quotient of $d_{ijk\tau}$ and $\eta_k$. Moreover, the monetary cost associated with MLE resource $k$ travelling from $i$ to $j$ during time stage $\tau$ is estimated as $\Gamma_k d_{ijk\tau}$, where $\Gamma_k$ is the cost per unit distance travelled associated with MLE resource $k$. Finally, let $C^s_k$ denote the set-up cost incurred when preparing MLE resource $k$ for departure on a mission.
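The travel time and cost expressions above translate directly into code. In the following minimal sketch the function and argument names are hypothetical, the numerical values are invented, and it is assumed that the set-up cost is incurred once per mission.

```python
def leg_time(distance, speed):
    """Travel time of one leg: the quotient of d_ijk and eta_k."""
    return distance / speed


def route_cost(leg_distances, cost_per_distance, setup_cost):
    """Operating cost of one mission: the set-up cost C^s_k (assumed to be
    charged once) plus Gamma_k times the total distance travelled."""
    return setup_cost + cost_per_distance * sum(leg_distances)


# Hypothetical mission of two legs for a resource with a fixed average
# speed of 20 distance units per hour.
legs = [30.0, 50.0]
times = [leg_time(d, speed=20.0) for d in legs]
cost = route_cost(legs, cost_per_distance=2.0, setup_cost=100.0)
```

Here the two legs take 1.5 and 2.5 hours respectively, and the mission costs 100 + 2 × 80 = 260 monetary units.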

It is suggested that the MLE response selection problem be formulated, inter alia, as a distance and time constrained VRP [8]. Hence, define the autonomy level $a^d_{k\tau}$ of MLE resource $k$ during time stage $\tau$ in respect of distance, and call this value its distance autonomy level, which represents the maximum distance that it may travel at sea before having to return to a designated base during time stage $\tau$. Similarly, define the autonomy level $a^t_{k\tau}$ of MLE resource $k$ during time stage $\tau$ in respect of time, and call this value its time autonomy level, which represents the maximum time that it may spend at sea before having to return to a designated base during time stage $\tau$. It is assumed that, while the distance autonomy level of an MLE resource only diminishes while it is in motion at sea (either in an active or in an idle state), its time autonomy level diminishes continually over time from the moment the MLE resource leaves a base. It should therefore be noted that, if an MLE resource starts out along a route which does not originate at a base, its initial autonomy levels will be lower than they would have been had the same MLE resource departed from a base. Once at a base, MLE resources are replenished and their autonomy levels are reset to their maximum values. A route is classified as distance-constrained feasible and time-constrained feasible if there exists at least one approved base that is at most as far away (in terms of both distance and time) as the autonomy level threshold associated with the MLE resource after having intercepted the last VOI on its route.

Furthermore, due to possible distance-constrained feasibility concerns, so-called distance and time patrol autonomy thresholds, denoted by $A^d_{k\rho}$ and $A^t_{k\rho}$, are also incorporated into the model formulation for all MLE resources $k \in V^r$ in order to ensure that an MLE resource is only allowed to join a patrol circuit after having completed its mission, provided that the travel distance and time to the circuit are within the specified autonomy levels. If a certain MLE resource may never be allocated to a certain patrol circuit during any time stage $\tau \in \mathbb{N}$, then the corresponding patrol autonomy threshold value may be set to $+\infty$. Finally, as discussed in §3.1, MLE bases typically differ from one another in respect of numerous physical characteristics that are relevant to the MLE response selection problem. These characteristics may impose certain end-of-route restrictions on the allocation of MLE resources to bases, preventing certain MLE resources from being scheduled at any time to end their routes at certain bases. To prevent infeasible assignments of this kind, the parameter

\[ \beta_{bk} = \begin{cases} 1, & \text{if MLE resource } k \text{ is allowed to be scheduled to end its route at base } b, \\ 0, & \text{otherwise} \end{cases} \]

is introduced, where $k \in V^r$ and $b \in V^b$.

We denote the expected service time that MLE resource $k$ will take to investigate and/or neutralise VOIs embodying a threat of type $h$ by $S^t_{kh}$. Then

\[ S_{ik\tau} = \sum_{h \in H} S^t_{kh} \, p_{ih\tau} \]

is the expected service time of VOI $i$, provided that it is scheduled to be visited by MLE resource $k$ during time stage $\tau$. Because disturbances occur stochastically in time, it may be the case that a new time stage is triggered while certain active MLE resources are in the process of servicing VOIs. It is therefore assumed that VOIs which are in the process of being serviced at the end of a time stage, along with their estimated remaining service times, are updated at the start of the next time stage. Thus, given that VOI $i$ is in the process of being serviced when a new time stage is triggered, let $S'_{i\tau}$ be the expected remaining time required to complete this service and, henceforth, set $S_{ik\tau} = S'_{i\tau}$.
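As a small numerical illustration of the expected service time defined above, the following sketch evaluates the probability-weighted sum for one MLE resource and one VOI. The function name, the four threat classes and all numerical values are hypothetical and serve only to illustrate the calculation.

```python
def expected_service_time(service_times, threat_probs):
    """Return sum_h S^t_{kh} * p_{ih tau} for one MLE resource k and one VOI i."""
    if abs(sum(threat_probs) - 1.0) > 1e-9:
        raise ValueError("threat class probabilities must sum to 1")
    return sum(s * p for s, p in zip(service_times, threat_probs))


# Hypothetical data: service times (in hours) of resource k per threat class,
# and the threat class membership probabilities of VOI i.
service_times_k = [2.0, 4.0, 1.0, 0.5]
probs_i = [0.5, 0.25, 0.125, 0.125]
S_ik = expected_service_time(service_times_k, probs_i)
```

A VOI that is most likely to reside in a class which is quick to service therefore contributes a short expected service time, even if a slower class remains possible.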

The response time of a VOI may be approximated by adding together (1) the time elapsed since it was first detected prior to the current time stage, (2) the estimated travel time along the visitation route of the VOI during the current time stage prior to its interception, (3) the estimated service times of preceding VOIs visited along the same visitation route as that including the VOI during the current time stage and (4) the estimated set-up time of the MLE resource scheduled to intercept the VOI provided that it was idle at a base at the end of the previous time stage. Let $T^d_{i\tau}$ be the time elapsed between the detection of VOI $i$ and the start of time stage $\tau$, let $T^w_k$ be the expected set-up time incurred when preparing MLE resource $k$ for a mission, and define the parameter

\[ \gamma_{k\tau} = \begin{cases} 1, & \text{if MLE resource } k \text{ is not on stand-by at the end of time stage } \tau - 1, \\ 0, & \text{otherwise.} \end{cases} \]


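Then

\[ t_{i\tau} = T^d_{i\tau} + T^t_{ik\tau} + T^s_{ik\tau} + \gamma_{k\tau} T^w_k \]

is the expected response time of VOI $i$ during time stage $\tau$, where

\[ T^t_{ik\tau} = \begin{cases} \dfrac{d_{0_{k\tau} i k \tau}}{\eta_k}, & \text{if } i = V^e_{k\tau}(1), \\[2ex] \dfrac{d_{0_{k\tau} V^e_{k\tau}(1) k \tau} + \sum_{j=2}^{|V^e_{k\tau}(i)|} d_{V^e_{k\tau}(j-1) V^e_{k\tau}(j) k \tau}}{\eta_k}, & \text{otherwise,} \end{cases} \]

and

\[ T^s_{ik\tau} = \begin{cases} 0, & \text{if } i = V^e_{k\tau}(1), \\ \sum_{\ell=2}^{|V^e_{k\tau}(i)|} S_{V^e_{k\tau}(\ell-1) k \tau}, & \text{otherwise.} \end{cases} \]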

Here, the notation $V^e_{k\tau}(i)$ represents the $i$th VOI on the route of MLE resource $k$ during time stage $\tau$, while $|V^e_{k\tau}(i)|$ is a cardinal number representing the ordered position of that VOI on the route. For example, if $V^e_{k\tau} = \{5, 16, 9\}$, then $V^e_{k\tau}(3) = 9$ and $|V^e_{k\tau}(9)| = 3$. Furthermore, $0_{k\tau}$ represents the initial position of MLE resource $k$ at the beginning of the time stage. Finally, set $t_{i\tau} = 0$ if VOI $i$ was being serviced by an MLE resource when time stage $\tau - 1$ ended. Of course, $t_{i\tau}$ is also set to 0 if VOI $i$ is not scheduled as part of any visitation route.
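The accumulation of travel and service times along a visitation route may be sketched as follows. This is a minimal illustration under the definitions above and not the authors' implementation; the function name, the dictionary-based data layout and all numerical values are assumptions made for the example, which reuses the route $\{5, 16, 9\}$ mentioned in the text.

```python
def response_times(route, start, dist, speed, service, elapsed, setup_time, gamma):
    """Expected response time t_i of every VOI i on one resource's ordered route.

    route      -- ordered list of VOI labels (the set V^e_{k tau})
    start      -- label of the resource's initial position (0_{k tau})
    dist       -- dict mapping (i, j) to the estimated distance d_{ijk tau}
    speed      -- fixed average speed eta_k
    service    -- dict mapping a VOI to its expected service time S_{ik tau}
    elapsed    -- dict mapping a VOI to the time since detection T^d_{i tau}
    setup_time -- expected set-up time T^w_k
    gamma      -- value of gamma_{k tau} (1 if the set-up delay applies, else 0)
    """
    times = {}
    travelled = 0.0  # cumulative distance from the start up to the current VOI
    serviced = 0.0   # cumulative service time of the preceding VOIs
    previous = start
    for voi in route:
        travelled += dist[(previous, voi)]
        times[voi] = elapsed[voi] + travelled / speed + serviced + gamma * setup_time
        serviced += service[voi]
        previous = voi
    return times


# Hypothetical data (distances in nautical miles, speed in knots, times in hours).
dist = {("0", 5): 20.0, (5, 16): 10.0, (16, 9): 30.0}
t = response_times(
    route=[5, 16, 9], start="0", dist=dist, speed=10.0,
    service={5: 1.0, 16: 2.0, 9: 0.5},
    elapsed={5: 0.5, 16: 1.0, 9: 0.0},
    setup_time=0.25, gamma=1,
)
```

VOIs placed later on a route inherit both the travel time and the service times of all preceding VOIs, which is why the ordering of a route affects the delay objective.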

It is anticipated that certain types of MLE resources excel at countering VOIs embodying certain types of threats as a result of their unique infrastructure, size, crew expertise and speed [8]. This is particularly the case if the decision entity in charge of the MLE resource scheduled to intercept a VOI specialises in countering the type of threat embodied by that VOI. Ensuring that these suitably equipped MLE resources are dispatched to intercept the relevant VOIs is beneficial for many reasons. For instance, the ability to counter certain threats effectively keeps the danger of retaliation, material damage and possible human injury to a minimum. Furthermore, effectively investigating and/or neutralising VOIs reduces the service time. In this respect, let $W_{kh} \in [0, 1]$ be the score associated with the efficiency of MLE resource $k$ in terms of neutralising a class $h$ threat. In addition, the nature and frequency of maritime threats detected vary in different regions of the world, and different coastal nations typically face different types of threats at different levels of harm or intensity [8]. Thus, let the score $Q_h \in [0, 1]$ associated with class $h$ threats represent the priority level assigned by the coastal nation to neutralising threats in this class. Moreover, it may often be the case that a certain type of MLE resource not only has a low efficiency value in respect of countering certain types of threats, but also lacks the capacity to neutralise VOIs embodying certain types of threats successfully. In the literature on VRPs with stochastic demand, occurrences where a vehicle reaches a customer and does not have sufficient capacity to service its (now revealed) demand are called service failures [36]. In the MLE response selection problem, an MLE resource is defined to be strictly incapable of neutralising a certain VOI if it is forbidden to attempt to intercept that VOI or incapable of neutralising a VOI embodying a (known) specific type of threat. Situations in which MLE resources are assigned to VOIs whose threatening behaviours they are strictly incapable of neutralising are called infeasible encounters. Such situations would never occur if the threat associated with any VOI was known in advance (assuming that the MLE response selection routing operator is competent). However, as discussed in §3.3, expectations with respect to the nature of VOIs are subject to error and, although the nature of certain VOIs is easily detected and evaluated with respect to the threat that they embody, others may carry a high level of uncertainty, in turn increasing the risk of infeasible encounters. The degradation in MLE response selection operations caused by infeasible encounters may be avoided by employing penalty weights or minimising some objective function.

The final aspect to consider in the formulation of the MLE response selection problem deals with the multiple-entity decision making process typically involved in MLE response selection-related decisions, as discussed in §3.1. In [16], reference is made to a multi-person decision making approach in which individual preference ordering of alternatives is used, and where each decision maker is required to provide their preferences in respect of a set of alternative solutions (ranked from best to worst) for a given scenario. After the VOIs have been distributed to the decision entities, however, each decision entity $s \in Z$ is only concerned with a subset of VOIs, rather than the response selection alternative as a whole (i.e. decision entities are assumed to be indifferent towards VOIs assigned to other decision entities). It is therefore suggested that each decision entity $s \in Z$ make known to a central operator its individual VOI distribution preferences in the form of an ordered set of VOIs $O_{s\tau} = \{o_{s1}, \ldots, o_{sn_\tau}\}$, ranked from most preferred to least preferred, as well as an ideal number of VOIs $N_{s\tau} \in \{0, \ldots, n_\tau\}$ to which it would like to be assigned. Moreover, let $f_c$ be a discrete function with domain $\{-n_\tau, \ldots, n_\tau\}$ measuring the penalty incurred from a spread around the ideal number $N_{s\tau}$. Finally, let $Z_s \in [0, 1]$ be a weight representing the relative importance of decision entity $s$ to the coastal nation from an MLE strategic point of view, so that preferences expressed by more important decision entities may be prioritised.

3.5 Model formulation

In our model formulation, the decision variables

\[ x_{ijk\tau} = \begin{cases} 1, & \text{if MLE resource } k \text{ is scheduled to travel from vertex } i \text{ to vertex } j \text{ during } \tau, \\ 0, & \text{otherwise,} \end{cases} \]

\[ y_{ik\tau} = \begin{cases} 1, & \text{if MLE resource } k \text{ is scheduled to visit VOI } i \text{ during } \tau, \\ 0, & \text{otherwise} \end{cases} \]

and

\[ z_{is\tau} = \begin{cases} 1, & \text{if VOI } i \text{ is scheduled on a route assigned to entity } s \text{ during } \tau, \\ 0, & \text{otherwise} \end{cases} \]

are adopted. Following the discussions in the preceding four sections, and assuming that a new problem instance has just been triggered at the start of time stage $\tau \in \mathbb{N}$, the aim in our tetra-objective MLE response selection model is therefore to

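\[ \text{maximise} \quad \sum_{i \in V^e_\tau} \sum_{k \in V^r} y_{ik\tau} \sum_{h \in H} Q_h W_{kh} \, p_{ih\tau}, \]

\[ \text{minimise} \quad \sum_{i \in V^e_\tau} t_{i\tau} \sum_{k \in V^r} y_{ik\tau} \sum_{h \in H} p_{ih\tau} Q_h, \]

\[ \text{minimise} \quad \sum_{k \in V^r} \Bigg( \gamma_{k\tau} C^s_k \sum_{j \in V^e_\tau} x_{0_{k\tau} j k \tau} + \Gamma_k \sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} \; \sum_{\substack{j \in V_\tau \\ j \neq 0_{k\tau}}} x_{ijk\tau} d_{ijk\tau} \Bigg), \quad \text{and} \]

\[ \text{maximise} \quad \sum_{s \in Z} Z_s \Bigg( \sum_{i \in V^e_\tau} z_{is\tau} \big( n_\tau - O_{s\tau}(i) \big) - f_c \Big( N_{s\tau} - \sum_{i \in V^e_\tau} z_{is\tau} \Big) \Bigg) \]

subject to the constraints

\begin{align}
\sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} \; \sum_{\substack{j \in V_\tau \\ j \neq 0_{k\tau}}} x_{ijk\tau} &= \sum_{\ell \in V^e_\tau} y_{\ell k \tau}, & & k \in V^r, \tag{1} \\
\sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} x_{ijk\tau} - \sum_{\substack{\ell \in V_\tau \\ \ell \neq 0_{k\tau}}} x_{j \ell k \tau} &= 0, & & j \in V^e_\tau, \; k \in V^r, \tag{2} \\
\sum_{k \in V^r} y_{ik\tau} &\leq 1, & & i \in V^e_\tau, \tag{3} \\
\sum_{s \in Z} z_{is\tau} &\leq 1, & & i \in V^e_\tau, \tag{4} \\
\sum_{j \in V^e_\tau} \sum_{k \in V^r} x_{0_{k\tau} j k \tau} &= \sum_{i \in V^e_\tau} \sum_{\ell \in V^b \cup V^p} \sum_{k \in V^r} x_{i \ell k \tau}, \tag{5} \\
\sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} \; \sum_{\substack{j \in V_\tau \\ j \neq 0_{k\tau}}} d_{ijk\tau} x_{ijk\tau} &\leq a^d_{k\tau}, & & k \in V^r, \tag{6} \\
\sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} \; \sum_{\substack{j \in V_\tau \\ j \neq 0_{k\tau}}} \frac{d_{ijk\tau}}{\eta_k} x_{ijk\tau} &\leq a^t_{k\tau}, & & k \in V^r, \tag{7} \\
\sum_{i \in V^e_\tau} x_{ibk\tau} &\leq \beta_{bk}, & & b \in V^b, \; k \in V^r, \tag{8} \\
-(a^d_{k\tau} - A^d - A^d_{k\rho}) &\leq A^d_{k\rho} (1 - w^d_{k\rho\tau}), & & k \in V^r, \; \rho \in V^p, \tag{9} \\
x_{\ell \rho k \tau} &\leq w^d_{k\rho\tau}, & & \ell \in V^e_\tau, \; k \in V^r, \; \rho \in V^p, \tag{10} \\
-(a^t_{k\tau} - A^t - A^t_{k\rho}) &\leq A^t_{k\rho} (1 - w^t_{k\rho\tau}), & & k \in V^r, \; \rho \in V^p, \tag{11} \\
x_{\ell \rho k \tau} &\leq w^t_{k\rho\tau}, & & \ell \in V^e_\tau, \; k \in V^r, \; \rho \in V^p, \tag{12} \\
w^d_{k\rho\tau}, \, w^t_{k\rho\tau} &\in \{0, 1\}, & & k \in V^r, \; \rho \in V^p, \tag{13} \\
x_{ijk\tau} &\in \{0, 1\}, & & i \in \{0_{k\tau}\} \cup V^e_\tau, \; j \in V^e_\tau \setminus \{0_{k\tau}\}, \; k \in V^r, \tag{14} \\
y_{ik\tau} &\in \{0, 1\}, & & i \in V^e_\tau, \; k \in V^r, \tag{15} \\
z_{is\tau} &\in \{0, 1\}, & & i \in V^e_\tau, \; s \in Z. \tag{16}
\end{align}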
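To make the first objective concrete, the sketch below evaluates the weighted visitation score for a given assignment of VOIs to MLE resources, that is, for the VOI-resource pairs whose visitation variable equals 1. The function name, the data layout, the two threat classes and all scores are hypothetical values chosen purely for illustration.

```python
def visitation_score(assignment, Q, W, p):
    """Sum over visited VOIs i (assigned to resource k) of sum_h Q_h * W_kh * p_ih.

    assignment -- dict mapping each visited VOI i to its MLE resource k
    Q          -- list of threat class priority levels Q_h
    W          -- dict mapping resource k to its list of efficiency scores W_kh
    p          -- dict mapping VOI i to its list of class probabilities p_ih
    """
    return sum(
        sum(q * w * prob for q, w, prob in zip(Q, W[k], p[i]))
        for i, k in assignment.items()
    )


# Hypothetical instance with two threat classes, two resources and two VOIs.
Q = [1.0, 0.5]
W = {"k1": [1.0, 0.5], "k2": [0.5, 1.0]}
p = {1: [0.5, 0.5], 2: [0.25, 0.75]}
score = visitation_score({1: "k1", 2: "k2"}, Q, W, p)
```

Unvisited VOIs simply do not appear in the assignment and contribute nothing, which reflects the fact that not all VOIs have to be included in visitation routes.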

In the model formulation above, the four objectives are those described in §3.2. Furthermore, constraint set (1) couples the routing variables $x_{ijk\tau}$ and $y_{ik\tau}$ to ensure that MLE resource $k$ is scheduled to a visitation route containing VOI $j$ if and only if there exists a VOI $i$ from which MLE resource $k$ travels to VOI $j$. In addition, constraint set (2) ensures, if VOI $j$ is intercepted by MLE resource $k$ during time stage $\tau$, that MLE resource $k$ must depart from VOI $j$ after the interception.

Because the VRP formulation of the MLE response selection problem proposed here includes customer profits, not all VOIs have to be included in visitation routes. Moreover, we assume that no more than one MLE resource may be assigned to investigate the same VOI. Consequently, constraint set (3) ensures that no more than one MLE resource may arrive at or leave VOI $i$ along a visitation route. This restriction requires that constraint set (4) also be included in order to ensure that VOI $i$ may not be assigned to more than one decision entity.

This model framework assumes that all active MLE resources are scheduled to end their visitation routes at either a base or a patrol circuit. In order to implement this condition, constraint (5) ensures that the number of MLE resources assigned to visitation routes during time stage $\tau$ (i.e. the number of active MLE resources scheduled during the time stage) must coincide with the number of MLE resources scheduled to travel either to a base or to a patrol circuit. Meanwhile, constraint sets (6) and (7) represent the distance autonomy and time autonomy restrictions of active MLE resource $k$ by imposing upper bounds on the maximum distance and time travelled (respectively) by MLE resource $k$ from the beginning of time stage $\tau$.
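The autonomy restrictions in constraint sets (6) and (7) amount to a simple check on the arcs of a route, which may be sketched as follows. The function name, the arc-list representation and the numerical values are assumptions for illustration; as in constraint set (7), only travel time is counted against the time autonomy level in this sketch.

```python
def within_autonomy(route_arcs, dist, speed, dist_autonomy, time_autonomy):
    """Check the distance and time autonomy bounds for one MLE resource.

    route_arcs    -- list of arcs (i, j) travelled by the resource
    dist          -- dict mapping (i, j) to the estimated distance d_{ijk tau}
    speed         -- fixed average speed eta_k
    dist_autonomy -- distance autonomy level a^d_{k tau}
    time_autonomy -- time autonomy level a^t_{k tau}
    """
    total_distance = sum(dist[arc] for arc in route_arcs)
    total_time = total_distance / speed
    return total_distance <= dist_autonomy and total_time <= time_autonomy


# Hypothetical route: start -> VOI 5 -> VOI 16 -> base B1.
dist = {("0", 5): 20.0, (5, 16): 10.0, (16, "B1"): 30.0}
arcs = [("0", 5), (5, 16), (16, "B1")]
feasible = within_autonomy(arcs, dist, speed=10.0, dist_autonomy=100.0, time_autonomy=8.0)
too_slow = within_autonomy(arcs, dist, speed=10.0, dist_autonomy=100.0, time_autonomy=5.0)
```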

Furthermore, with regard to post-mission assignments, constraint set (8) is incorporated into the model formulation as a means of forbidding the search from assigning MLE resource $k$ to base $b$ at the end of its route if MLE resource $k$ is never allowed to relocate to base $b$. Additionally, the distance and time patrol autonomy thresholds are incorporated in the formulation in the form of the sets of linking constraints (9)–(12), where we define the parameters

\[ A^d = \sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} \; \sum_{\substack{j \in V^e_\tau \\ j \neq 0_{k\tau}}} (d_{ijk\tau} x_{ijk\tau}) - d_{V^e_{k\tau}(|V^e_{k\tau}|) \rho k \tau} \, x_{V^e_{k\tau}(|V^e_{k\tau}|) \rho k \tau} \]

and

\[ A^t = \sum_{i \in \{0_{k\tau}\} \cup V^e_\tau} \; \sum_{\substack{j \in V^e_\tau \\ j \neq 0_{k\tau}}} \left( \frac{d_{ijk\tau}}{\eta_k} x_{ijk\tau} \right) - \frac{d_{V^e_{k\tau}(|V^e_{k\tau}|) \rho k \tau}}{\eta_k} \, x_{V^e_{k\tau}(|V^e_{k\tau}|) \rho k \tau} \]

in constraint sets (9) and (11). Here, MLE resource $k$ is forbidden to be scheduled to patrol circuit $\rho$ at the end of its route if its distance or time autonomy level (respectively) at the end of the route, together with the distance or time to be travelled from the last VOI on that route to patrol circuit $\rho$, is below the pre-specified distance or time patrol autonomy threshold, where $w^d_{k\rho\tau}$ and $w^t_{k\rho\tau}$ are linking constraint variables. Finally, constraint sets (13)–(16) enforce the binary nature of the model variables.

4 Solution approach

The MLE response selection problem exhibits a high level of complexity, as it involves, inter alia, a heterogeneous fleet of vehicles, multiple depots, customer visitation profits, multiple objectives, general system dynamism and stochastic information as its main characteristics. It is therefore critical, in respect of the overall MLE response selection system effectiveness, to adopt a solution search technique that is capable of presenting an MLE response selection operator with a high-quality set of non-dominated trade-off solutions within a limited budget of time. In this dynamic vehicle routing problem, the amount of time required to find a new, preferred solution increases the MLE resource response times (consequently having a negative indirect impact on Objective II of §3.2), while increasing the chances of avoidable detours (consequently having a negative indirect impact on Objective III), or of reaching some VOIs too late (consequently having a negative indirect impact on Objective I). A solution search engine that can quickly generate high-quality non-dominated approximations of the Pareto front is therefore preferred over one that slowly generates solutions that are actually Pareto optimal, as it is anticipated that the former technique will in all likelihood still outperform decisions made by human operators by a large margin, given the same amount of solution search time.

4.1 The notions of dominance, Pareto optimality and the Pareto front

Consider a multiobjective optimisation model consisting of $n$ decision variables $x_1, \ldots, x_n$, $m$ constraints and $k$ objective functions $f_1, \ldots, f_k$ mapping an $n$-dimensional vector $\mathbf{x} = (x_1, \ldots, x_n)$ in decision space (or solution space) $S$ to a $k$-dimensional vector $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_k(\mathbf{x}))$ in objective space. Without loss of generality, all objective functions are here assumed to be available in closed form and have to be minimised.

A decision vector $\mathbf{x} \in S$ is said to dominate a decision vector $\mathbf{y} \in S$, denoted by $\mathbf{x} \prec \mathbf{y}$, if $f_i(\mathbf{x}) \leq f_i(\mathbf{y})$ for all $i \in \{1, \ldots, k\}$ and if there exists at least one $i^* \in \{1, \ldots, k\}$ such that $f_{i^*}(\mathbf{x}) < f_{i^*}(\mathbf{y})$. It follows that any two candidate solutions to a multiobjective minimisation problem are related to each other in two possible ways only: (1) either one dominates the other, or (2) neither one is dominated by the other. Moreover, $\mathbf{x}$ is said to be non-dominated with respect to some set $S' \subseteq S$ if there exist no vectors $\mathbf{y} \in S'$ such that $\mathbf{y} \prec \mathbf{x}$. Finally, a candidate solution $\mathbf{x}$ is said to be Pareto optimal if it is non-dominated with respect to the entire decision space $S$.

Pareto optimal solution vectors therefore represent trade-off solutions which, when evaluated, produce objective function vectors whose performance in one dimension cannot be improved without detrimentally affecting at least some subset of the other $k - 1$ dimensions. The set of candidate solutions containing all Pareto optimal solutions in $S$ is called the Pareto optimal set, denoted by $P$. That is,

\[ P = \{ \mathbf{x} \in S \mid \nexists \, \mathbf{y} \in S \text{ such that } \mathbf{y} \prec \mathbf{x} \}. \]

Pareto optimal solutions may have no obvious apparent relationship besides their membership of the Pareto optimal set; such solutions are purely classified as such on the basis of their values in objective space. These values produce a set of objective function vectors, known as the Pareto front, denoted by $F$, whose corresponding decision vectors are elements of $P$. That is,

\[ F = \{ \mathbf{f}(\mathbf{x}) \mid \mathbf{x} \in P \}. \]
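The notions of dominance and non-dominance translate directly into code. The sketch below, which assumes that all objectives are minimised and that solutions are represented by their objective vectors only, tests dominance between two vectors and filters a finite set down to its non-dominated members. The function names and the five sample points are hypothetical.

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (all objectives minimised)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def non_dominated(vectors):
    """Return the members of a finite set that no other member dominates."""
    return [v for v in vectors if not any(dominates(w, v) for w in vectors)]


# Hypothetical bi-objective values (f1, f2).
points = [(1, 5), (2, 2), (3, 3), (5, 1), (2, 6)]
front = non_dominated(points)
```

Here $(3, 3)$ is dominated by $(2, 2)$ and $(2, 6)$ is dominated by $(1, 5)$, while the remaining three points are mutually non-dominated.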

4.2 On approximating the true Pareto front

In a multiobjective combinatorial optimisation problem there exists a countable number of solutions on the Pareto front. The aim of any solution search technique is to approximate these Pareto optimal solutions as closely as possible. In general, however, a close approximation of the entire Pareto front during a single run of such a solution search technique is almost impossible for complex multiobjective problems, especially when faced with a strict computational budget (i.e. if the computational resources available for solving the problem, such as the search time, processing power or storage capacity, are tightly constrained). Finding a good approximation of even a portion of the Pareto front is difficult when the structure of the underlying objective functions is not known or is complex [12].

According to Zitzler et al. [40], a multiobjective search technique should always aim to achieve three conflicting goals with respect to the nature of the resulting non-dominated front. First, the non-dominated front should ideally be as close as possible to the true Pareto front. The non-dominated set of solutions should, in fact, preferably be a subset of the true Pareto optimal set if possible. Secondly, solutions in the non-dominated set should ideally be uniformly distributed and diverse with respect to their objective function values along the Pareto front so as to provide the decision maker with the full picture of trade-off decisions. Lastly, the non-dominated front should ideally capture the entire range of values along the true Pareto front.

Here, the first goal is best served by focusing the search on a particular region of the true Pareto front, a process known in the literature as exploitation. On the other hand, the second goal requires the search to investigate solutions that are uniformly distributed along the true Pareto front, a process known in the literature as diversification. Finally, the third goal aims to extend the non-dominated front at both ends of the true Pareto front in order to explore new, extreme solutions.

We employed four solution search methodologies to approximate the true Pareto fronts of MLE response selection problem instances, namely the archived multiobjective simulated annealing algorithm of Smith et al. [32], the non-dominated sorting genetic algorithm (NSGA-II) of Deb et al. [11], an adapted hybrid combining these two algorithms and a multi-start simulated annealing algorithm adapted from the (single-start) algorithm in [32]. The archived multiobjective simulated annealing algorithm and the NSGA-II mentioned above have been documented in the literature as being capable of achieving the above three goals for a wide variety of problems, provided that their parameter values are chosen carefully. The application of all four of the above-mentioned search methods is not described in this paper due to space limitations. Instead, only certain important aspects of the multiobjective simulated annealing solution approach employed are presented in the next section.


4.3 Multiobjective simulated annealing search methodology

The innovative and flexible multiobjective simulated annealing algorithm proposed by Smith et al. [32] is a natural method for approximating the true Pareto front of an MLE response selection problem instance. This metaheuristic employs the notion of archiving as a means of (a) assessing the energy levels associated with candidate solutions and (b) keeping a record of the non-dominated front uncovered at any point during the solution search process.

4.3.1 The notion of archiving

Because simulated annealing only generates a single solution during each iteration of the search process, an external set, known as the archive, is employed to record all non-dominated solutions uncovered during the course of the search process. All solutions generated during the course of the search are candidates for archiving, and are each tested for dominance in respect of every solution in the archive. The archiving process is illustrated in Figure 1 for a bi-objective minimisation problem with objective functions $f_1$ and $f_2$.

[Figure 1 placeholder: scatter plot in objective space with axes $f_1$ and $f_2$. Legend: solutions not archived; existing solutions in the archive; new solution added to the archive; archived solution removed.]

Figure 1: The archiving process for a bi-objective minimisation problem.
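The archiving process may be sketched as follows: a candidate is rejected if some archived solution dominates it, and is otherwise added while every archived solution that it dominates is removed. This is a minimal sketch operating on objective vectors only; the function names are hypothetical, and the decision to reject a candidate whose objective vector duplicates an archived one is an assumption not stated in the text.

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (all objectives minimised)."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))


def update_archive(archive, candidate):
    """Return the archive obtained after offering it a candidate objective vector."""
    # Reject the candidate if it is dominated by (or, by assumption, equal to)
    # an archived solution.
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return list(archive)
    # Otherwise drop every archived solution that the candidate dominates.
    return [a for a in archive if not dominates(candidate, a)] + [candidate]


# Hypothetical sequence of candidates (f1, f2) offered to an empty archive.
archive = []
for candidate in [(4, 4), (2, 5), (5, 5), (3, 3)]:
    archive = update_archive(archive, candidate)
```

In this sequence, $(5, 5)$ is never archived because $(4, 4)$ dominates it, and $(4, 4)$ is later removed when $(3, 3)$ is added.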

If the true Pareto front $P$ were available a priori, it would be possible to express the energy (i.e. the performance of a solution in objective space) of a solution $\mathbf{x}$ as some measure of the portion of the front that dominates $\mathbf{x}$. Hence, let $P_{\mathbf{x}}$ be the portion of $P$ that dominates $\mathbf{x}$ (i.e. $P_{\mathbf{x}} = \{\mathbf{y} \in P \mid \mathbf{y} \prec \mathbf{x}\}$), and define the energy of $\mathbf{x}$ as $E(\mathbf{x}) = \mathcal{M}(P_{\mathbf{x}})$, where $\mathcal{M}$ is a measurable function defined on $P$. If $P$ is continuous, $\mathcal{M}$ may be configured as the Lebesgue measure; otherwise $\mathcal{M}$ may simply be configured as the cardinality of $P_{\mathbf{x}}$ (i.e. the number of solutions in $P$ that dominate $\mathbf{x}$). Of course, $E(\mathbf{x}) = 0$ for all $\mathbf{x} \in P$.

As the true Pareto front is typically unavailable during the course of the search, however, the energy of a solution is measured based on the current estimate of the true Pareto front, which corresponds to the set of non-dominated solutions uncovered thus far during the search (that is, the solutions in the archive). The energy difference between two solutions is measured as the quotient of the difference in their respective energies and the size of the archive. According to [32], a simulated annealer that uses this energy measure encourages both convergence to and coverage of the Pareto front in logarithmic time.

4.3.2 Algorithm outline

Suppose that the set of non-dominated solutions uncovered up to any point during the solution search process is captured in an archive $A$. The algorithm is initialised with a random feasible initial solution, which is initially placed in $A$. During any iteration, let $\tilde{A}$ be the set $A \cup \{\mathbf{x}\} \cup \{\mathbf{x}'\}$, where $\mathbf{x}'$ is a neighbouring solution generated by performing a perturbation with respect to the current solution $\mathbf{x}$, and define $\tilde{A}_{\mathbf{x}} = \{\mathbf{y} \in \tilde{A} \mid \mathbf{y} \prec \mathbf{x}\}$.

The estimated energy difference between the solutions $\mathbf{x}$ and $\mathbf{x}'$ is then calculated as

\[ \Delta E(\mathbf{x}', \mathbf{x}) = \frac{|\tilde{A}_{\mathbf{x}'}| - |\tilde{A}_{\mathbf{x}}|}{|\tilde{A}|}, \]

where division by $|\tilde{A}|$ provides robustness against fluctuations in the number of solutions in the archive. Both the current and neighbouring solutions are included in $\tilde{A}$ so that $\Delta E(\mathbf{x}', \mathbf{x}) < 0$ whenever $\mathbf{x}' \prec \mathbf{x}$. Besides its simplicity and efficiency in promoting the storage of non-dominated solutions uncovered during the search process, another benefit of this energy measure is that it encourages exploration along the non-dominated front, regardless of the portion of the true Pareto front dominating a solution. This principle is demonstrated in Figure 2 for a bi-objective minimisation problem with objective functions $f_1$ and $f_2$. Although it appears as if $\mathcal{M}(P_{\mathbf{x}'}) > \mathcal{M}(P_{\mathbf{x}})$ in the figure, it can be seen that $|\tilde{A}_{\mathbf{x}'}| = 1 < 3 = |\tilde{A}_{\mathbf{x}}|$, hence moving the search closer to a large unexplored region of the non-dominated front when transitioning from $\mathbf{x}$ to $\mathbf{x}'$.

The acceptance probability function of a neighbouring solution $\mathbf{x}'$ of the current solution (similar to the Metropolis acceptance rule in standard simulated annealing algorithms) is taken as

\[ P(\mathbf{x}') = \min \left\{ 1, \exp \left( \frac{-\Delta E(\mathbf{x}', \mathbf{x})}{T} \right) \right\}, \]

where $T$ is the current temperature. A neighbouring solution that is dominated by fewer solutions in $\tilde{A}$ therefore has a lower energy and is consequently automatically accepted as the next current solution as it is considered, by definition, to represent an improving move. On the other hand, if there is a large positive difference between the energies of the neighbouring and current solutions, and the temperature is low, then the move has a low probability of being accepted. This acceptance probability function therefore does not depend upon pre-determined objective function weights and is not affected by rescaling of the objectives, which results in relatively low computational complexity. If the move is accepted, then the neighbouring solution is taken as the new current solution and the archive is updated accordingly.
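The energy difference and the acceptance rule may be sketched as follows. The sketch works on objective vectors only and forms the augmented set consisting of the archive together with the current and neighbouring solutions; the function names, the sample archive and the temperature value are hypothetical. Returning 1 directly for non-positive energy differences is an implementation choice made here to avoid numerical overflow at very low temperatures.

```python
import math


def dominates(fx, fy):
    """True if objective vector fx dominates fy (all objectives minimised)."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))


def energy_difference(archive, current, neighbour):
    """Estimated energy difference between the neighbouring and current solutions."""
    # Augmented set: the archive together with the current and neighbouring solutions.
    augmented = list(archive)
    for v in (current, neighbour):
        if v not in augmented:
            augmented.append(v)
    dominating_current = sum(dominates(a, current) for a in augmented)
    dominating_neighbour = sum(dominates(a, neighbour) for a in augmented)
    return (dominating_neighbour - dominating_current) / len(augmented)


def acceptance_probability(delta_e, temperature):
    """Metropolis-like acceptance probability min{1, exp(-delta_e / T)}."""
    if delta_e <= 0:
        return 1.0  # improving or neutral moves are always accepted
    return math.exp(-delta_e / temperature)


# Hypothetical archive and candidate moves in a bi-objective problem.
archive = [(1, 5), (2, 2), (5, 1)]
improving = energy_difference(archive, current=(6, 6), neighbour=(3, 3))
worsening = energy_difference(archive, current=(3, 3), neighbour=(6, 6))
p_improving = acceptance_probability(improving, temperature=0.3)
p_worsening = acceptance_probability(worsening, temperature=0.3)
```

In this example the move from $(6, 6)$ to $(3, 3)$ reduces the number of dominating solutions from four to one and is accepted with certainty, whereas the reverse move is accepted only with probability $e^{-2}$ at the chosen temperature.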


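[Figure 2 placeholder: scatter plot in objective space with axes $f_1$ and $f_2$, showing the current solution $\mathbf{x}$ and the neighbouring solution $\mathbf{x}'$. Legend: solutions in $P$; solutions in $A$.]

Figure 2: Energy measure of current and neighbouring solutions for a bi-objective minimisation problem with objective functions $f_1$ and $f_2$.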

4.3.3 Solution transformation procedures

Solutions to an MLE response selection problem should be encoded in very specific data formats which allow for the effective application of global and local search operations, easy evaluation of objective function values, end-of-route assignments and tests for solution feasibility. The various complex, dynamic features associated with the MLE response selection problem, however, make it difficult to standardise these data formats to be used as part of an optimisation search process. Candidate solutions must be encoded in a suitably versatile format so as to be employed by different model components in a generic manner. In particular, the solution format should accommodate certain model management features in which subjective requirements may be dictated by the MLE response selection operators on a temporal basis. In the literature, solutions to VRP instances are typically encoded as strings, which comprise a number of substrings each representing a route consisting of a subset of customers scheduled for visitation by a particular vehicle. The order in which customers are entered in such a substring is also the order in which the assigned vehicle visits them along its route. The customer subsets of such substrings are referred to here as partial strings. The reader is referred to [9] for further insight into the configuration process of solution strings in the MLE response selection context.

Two general types of solution transformation methods are typically employed in the simulated annealing literature for generating neighbouring solutions to a VRP, namely intra-route transformations and inter-route transformations [37]. An intra-route transformation technique changes a route within the same solution string by altering one or more partial strings within that route in an attempt to improve it. An inter-route transformation, on the other hand, exchanges or relocates one or more partial strings between any routes of a VRP solution string. These transformations are used to encourage exploitation and diversification of the search, respectively.

In addition, the VRP customer visitation profits characteristic associated with the MLE response selection problem requires transformations that alter the number of VOIs scheduled to be intercepted by adding or removing partial strings to or from the current solution. Such transformations are called inter-state transformations here, where the state of a solution is defined as the number of VOIs visited in a solution. More specifically, we define a string expansion transformation as one that increases the number of VOIs in the current solution string by inserting into it one or more partial strings containing currently unvisited VOIs stored in an external set. A string diminution transformation is similarly defined as one that decreases the number of VOIs in the current solution string by removing one or more partial strings from it.

Examples of the above solution transformation procedures for VRP solution strings are illustrated in Figure 3. Figures 3 (a) and (b) represent intra-route transformations, while Figures 3 (c) and (d) represent inter-route transformations. Finally, Figures 3 (e) and (f) represent inter-state transformations.²

For inter-state transformations, the probability of moving into any other state, given that the solution is currently in a certain state, may be modelled by an ergodic Markov chain with associated transition probability matrix, and resulting steady-state probabilities, based on the binomial probability distribution.

Recall that a random variable $U \sim \mathcal{B}(n, p)$ is governed by a binomial probability distribution based on $n$ trials with success probability $p$ if

\[ P(U = u) = \frac{n!}{u! \, (n - u)!} \, p^u (1 - p)^{n - u}, \quad u \in \{0, \ldots, n\}. \]

Here, the distribution parameter $n$ represents the maximum number of VOIs that may be visited in a solution string, while the user-defined parameter $p$ is responsible for steering the expected proportion of the search spent in every state. The search process is thus expected to spend more time investigating higher-state solutions whenever a higher success probability parameter is employed. Furthermore, the probability of performing a transformation that moves the search to a solution in state $u \in \{0, 1, \ldots, n\}$, provided that it currently is in state $u' \in \{0, 1, \ldots, n\}$ with $u \neq u'$, is given by

\[ P_{u'u} = P_{IS} \, \frac{P(U = u)}{1 - P(U = u')}, \]

where $P_{IS}$ is the pre-configured probability of performing an inter-state transformation during any given algorithmic iteration, and

\[ \sum_{\substack{u \in \{0, \ldots, n\} \\ u \neq u'}} \frac{P_{u'u}}{P_{IS}} = 1, \quad u' \in \{0, \ldots, n\}. \]
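The inter-state transition probabilities may be computed directly from the binomial probability mass function, as in the sketch below. The function names are hypothetical, the parameter values $n = 11$, $p = 0.8$ and $P_{IS} = 0.35$ are those reported for the worked example in §5.1, and the current state of 9 is an arbitrary choice for illustration.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the list [P(U = 0), ..., P(U = n)] of a binomial distribution."""
    return [comb(n, u) * p**u * (1 - p) ** (n - u) for u in range(n + 1)]


def inter_state_probabilities(n, p, p_is, current_state):
    """Probability of moving from current_state to every other state u."""
    pmf = binomial_pmf(n, p)
    scale = p_is / (1.0 - pmf[current_state])
    return {u: scale * pmf[u] for u in range(n + 1) if u != current_state}


probs = inter_state_probabilities(n=11, p=0.8, p_is=0.35, current_state=9)
total = sum(probs.values())
```

The probabilities of moving to the other states sum to $P_{IS}$, so that dividing each by $P_{IS}$ recovers the normalisation condition stated above.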

² The interested reader is referred to [9] for an overview of the solution encoding procedures which allow for the effective application of these global and local search operators.


(a) One-to-one exchange
    top:    1 5 8 0 3 6 9 2 0 7 4
    bottom: 1 5 8 0 9 6 3 2 0 7 4

(b) Partial string reversal
    top:    1 5 8 0 3 6 9 2 0 7 4
    bottom: 1 5 8 0 6 3 9 2 0 7 4

(c) Delete-and-insert
    top:    1 5 8 0 3 6 9 2 0 7 4
    bottom: 1 8 0 3 6 9 5 2 0 7 4

(d) Partial string exchange reversal
    top:    1 5 8 0 3 6 9 2 0 7 4
    bottom: 1 4 7 8 0 3 6 9 2 0 5

(e) String diminution
    top:    1 5 8 0 3 6 9 2 0 7 4
    bottom: 1 5 8 0 9 2 0 7

(f) String expansion
    top:    1 5 0 3 0 4 7
    bottom: 2 1 5 0 3 0 4 9 7

Figure 3: Illustration of solution transformation procedures. In the above strings, the dummy cells, containing zeros, are used as route separators, while the remaining cells represent individual VOIs scheduled to be intercepted on their respective visitation routes. For example, in the top string in (a), MLE resource 1 is scheduled to visit VOIs 1, 5 and 8 (in that order), MLE resource 2 is scheduled to visit VOIs 3, 6, 9 and 2, and MLE resource 3 is scheduled to visit VOI 7 and then VOI 4.
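The string encoding of Figure 3 may be decoded into routes, and the one-to-one exchange of panel (a) reproduced, with a few lines of code. This is a minimal sketch and not the encoding procedure of [9]; the function names are hypothetical, and the requirement that the two exchanged VOIs share a route reflects the intra-route nature of the transformation in panel (a).

```python
def decode(string):
    """Split a solution string into routes, using 0 as the route separator."""
    routes, current = [], []
    for cell in string:
        if cell == 0:
            routes.append(current)
            current = []
        else:
            current.append(cell)
    routes.append(current)
    return routes


def one_to_one_exchange(string, a, b):
    """Swap VOIs a and b, which must be scheduled on the same route."""
    if not any(a in route and b in route for route in decode(string)):
        raise ValueError("both VOIs must lie on the same route")
    i, j = string.index(a), string.index(b)
    result = list(string)
    result[i], result[j] = result[j], result[i]
    return result


top = [1, 5, 8, 0, 3, 6, 9, 2, 0, 7, 4]
routes = decode(top)
bottom = one_to_one_exchange(top, 3, 9)
```

Decoding the top string yields the three routes described in the caption of Figure 3, and exchanging VOIs 3 and 9 yields the bottom string of panel (a).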

Because the size of the decision subspace associated with solutions of a certain state grows exponentially (i.e. significantly more solutions may be uncovered in decision subspaces of higher states), it may intuitively seem more logical to set the success probability parameter relatively high. Certain problem instances (particularly instances containing a large number of VOIs and a limited number of available MLE resources) may, however, exhibit relatively large proportions of infeasible solutions in higher-state subspaces. In such cases, it is detrimental to waste valuable computational time exploring these large infeasible domains.

The parameter p need not be redefined by hand at the start of every run of the search process. It may, for example, be pre-set as a function of the number of VOIs and MLE resources available at the beginning of a problem instance. Moreover, the sum of all ideal quantities of VOI interceptions ($\sum_{s \in Z} N_{s\tau}$), as provided by the various decision entities, may also serve as a guideline as to the choice of an appropriate value for this parameter.


Figure 4: Graphical representation of the physical elements in the hypothetical scenario. [Map showing the land, the EEZ boundary, bases B1 to B4, patrol circuits P1 to P7, VOIs 1 to 11 and MLE resources a to h; the legend distinguishes active MLE resources, idle MLE resources, VOIs, bases and patrol circuits.]

5 A worked example

In order to demonstrate the working of the mathematical model presented in §3 and the solution search approach described in §4.3, a hypothetical MLE response selection scenario is considered in this section. The scenario mimics an MLE situation of low complexity in which eleven VOIs have been detected and evaluated in respect of potentially threatening behaviour. Here, the MLE response selection infrastructure consists of eight MLE resources, four coastal bases, seven patrol circuits, three decision entities and a jurisdiction area which extends up to 200 nautical miles from the coastline. Furthermore, it is assumed that threat detection and threat evaluation input data (with respect to the current MLE situation) are available for this problem instance, and that all relevant parameter values and other input data have been established. The physical elements of this hypothetical scenario at the beginning of the problem instance are illustrated in Figure 4, while all input data considered in this scenario may be accessed online [10].

5.1 Optimisation methodology

In order to solve the mathematical model of §3.5 using the proposed multiobjective simulated annealing algorithm of §4.3, we employed four different solution transformation techniques, triggered at every iteration of the algorithm according to a certain probability distribution. These transformations are a within-route swap transformation with associated probability 0.35, a between-route swap transformation with associated probability 0.15, a between-route delete-and-insert transformation with associated probability 0.15, and an inter-state transformation with associated probability 0.35 and with a steady-state binomial probability distribution governed by the random variable U ∼ B(n, p), where (n, p) = (11, 0.8). A graphical representation of this distribution is depicted in Figure 5(a).
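The selection of a transformation type according to the probabilities listed above may be sketched as follows. This is a hedged illustration only: the dictionary keys, the function name and the random seed are hypothetical, and the sketch assumes that exactly one transformation type is drawn per iteration.

```python
import random

# Probabilities as stated in the text; the key names are illustrative labels.
MOVE_PROBABILITIES = {
    "within_route_swap": 0.35,
    "between_route_swap": 0.15,
    "between_route_delete_and_insert": 0.15,
    "inter_state": 0.35,
}


def pick_move(rng: random.Random) -> str:
    """Draw one transformation type according to MOVE_PROBABILITIES."""
    names = list(MOVE_PROBABILITIES)
    weights = list(MOVE_PROBABILITIES.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(2015)  # arbitrary seed, used for reproducibility only
counts = {name: 0 for name in MOVE_PROBABILITIES}
for _ in range(10_000):
    counts[pick_move(rng)] += 1
```

Over many iterations, the empirical frequencies in `counts` should approach the prescribed probabilities.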

Due to the nature of the binomial distribution, states that are located far away from the mean are unlikely to be visited during the search process³. Consequently, the decision maker may prefer to adopt a state clustering approach as a remedial alternative, in which groups of states are regarded as the outcomes of a certain binomial distribution. Then, upon selection of a certain outcome, any state within the cluster represented by this outcome is entered with a certain probability, based on the number of states within the cluster as well as the cardinality of these states. This alternative approach is illustrated in Figure 5(b), where 2-state clusters are formed and where the distribution of the clusters is governed by the random variable U ∼ B(n, p), where (n, p) = (5, 0.7), while the probability of entering any state (upon selection of its representative cluster) is 0.5. Certain states (e.g. states 4 and 5) are therefore allowed to be visited more frequently during the search process, while states which are found close to the mean value are not favoured as intensively.

³ The expected value of a random variable governed by a binomial probability distribution is np, and its variance is np(1 − p).
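As a worked illustration of why states far from the mean are rarely visited, consider the unclustered distribution with (n, p) = (11, 0.8) employed in this section. The figures below follow directly from the standard binomial formulas:

```latex
\begin{align*}
E[U] &= np = 11 \times 0.8 = 8.8, \\
\operatorname{Var}(U) &= np(1-p) = 11 \times 0.8 \times 0.2 = 1.76,
  \qquad \sqrt{1.76} \approx 1.33, \\
P(U = 0) &= (1-p)^{n} = 0.2^{11} \approx 2.05 \times 10^{-8}.
\end{align*}
```

State 0 therefore lies more than six standard deviations below the mean, and is practically never selected as the target of an inter-state transformation under this distribution.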

Figure 5: Illustration of probability distributions used in inter-state transformations. (a) (Binomial) state probability distribution: steady-state probability plotted against state. (b) (Binomial) cluster probability distribution: steady-state probability plotted against states cluster, for the clusters {2,3}, {4,5}, {6,7}, {8,9} and {10,11}.

We adopted the well-known and widely used geometric temperature cooling schedule [10], with cooling parameter 0.85, initial temperature 30 and 700 epochs. A stopping criterion of 5 minutes was imposed as a computational time budget. All of the above parameter values and probability distributions were chosen after careful consideration, following various preliminary empirical experiments.
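A geometric cooling schedule with the parameter values stated above may be sketched as follows. The sketch assumes that the temperature is multiplied by the cooling parameter once per epoch; the function name is hypothetical, and the 5-minute time budget (which may terminate the search before all epochs are completed) is not modelled.

```python
def geometric_schedule(t0: float, alpha: float, epochs: int):
    """Yield the temperature of each epoch: t0, t0*alpha, t0*alpha**2, ..."""
    temperature = t0
    for _ in range(epochs):
        yield temperature
        temperature *= alpha


# Parameter values taken from the text above.
temps = list(geometric_schedule(t0=30.0, alpha=0.85, epochs=700))
```

Under this assumption the temperature decreases monotonically from 30 towards zero, so that the acceptance of deteriorating moves becomes progressively less likely as the search proceeds.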

A visual representation of a non-dominated front generated by the solution search process described above is depicted in Figure 6. This front contains the set of (non-dominated) alternatives, mapped in tetra-objective space, in which the visitation score is measured along the vertical axis, the total delay and the operating costs are measured along the horizontal axes, and the consensus level is measured by varying grey shading intensity.

Figure 6: Non-dominated front in 4D objective function space generated during the archived simulated annealing solution search process for the MLE response selection hypothetical scenario in Figure 4. [Axes: operating costs ($10 000s), total delay (hours) and visitation score; the grey shading indicates the consensus level.]

5.2 Post-optimisation filtering of alternatives

It is anticipated that the large number of solutions populating non-dominated sets of MLE response selection problem instances (as a result of the relatively large number of objectives) may make it difficult for the decision maker to select a suitable trade-off decision when presented with such an overwhelming front⁴. This is a typical case of "more is not always better": the excessive number of choices presented to an operator may require time-consuming analyses within a limited time frame which may, in turn, lead to the selection of poor alternatives (from a subjectively preferential point of view). More importantly, it is necessary for the decision maker to be able to select an alternative as rapidly as possible in order to prevent the deterioration of MLE response selection operations caused by late changes in routing operations.

⁴ The non-dominated front depicted in Figure 6 contains 1 249 trade-off alternatives.

We therefore advocate that one or more filtering mechanisms be employed to reduce the number of solutions in the non-dominated front after the solution search process has terminated. A basic filtering technique is the use of objective function bounds, where certain thresholds are specified in order to dismiss those points in objective space (and, of course, the corresponding points in solution space) which violate these thresholds. A more sophisticated procedure is the ε-filter technique [40], which aims to partition areas of high solution density by slicing the objective function ranges at regular intervals in order to form M-orthotopes in the M-dimensional objective space. A single non-dominated point is then retained in each orthotope containing at least one point, while the remaining points are removed from the front and from the solution space. Clustering techniques may also be employed to group non-dominated points into clusters in such a way that any two alternatives in a specific cluster are sufficiently "similar" to one another while being sufficiently "different" to the alternatives included in other clusters [20]. One representative in each cluster (e.g. its centroid) may then be retained, representing the characteristics of the cluster as a whole, while the other points in that cluster are discarded.
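The ε-filter idea described above may be sketched as follows. This is a simplified illustration rather than the procedure of [40]: the function name is hypothetical, the orthotopes are taken to be axis-aligned boxes anchored at a user-supplied lower corner, and the first point encountered in each box is retained (the choice of representative is an assumption).

```python
from math import floor


def epsilon_filter(points, lower, eps):
    """Keep one point per occupied orthotope with side lengths eps, anchored at lower."""
    kept = {}
    for point in points:
        # Index of the orthotope containing this point along every objective.
        box = tuple(floor((value - lo) / e) for value, lo, e in zip(point, lower, eps))
        kept.setdefault(box, point)  # retain only the first point seen in each box
    return list(kept.values())


# Small bi-objective example with hypothetical data.
points = [(0.1, 0.2), (0.3, 0.4), (1.2, 0.1), (1.9, 1.9), (1.1, 0.9)]
reduced = epsilon_filter(points, lower=(0.0, 0.0), eps=(1.0, 1.0))
```

In this example five points occupy three boxes, so that three representatives remain. Larger ε-values yield fewer boxes and therefore a sparser front.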

To illustrate the functionality of the above-mentioned filtering techniques, reconsider the non-dominated front of Figure 6. Using post-optimisation objective function bounds, suppose that the operator does not wish to select alternatives that are expected to cost more than $20 000 or to incur a total delay of more than 100 hours, and that any solution eventually implemented should embody a consensus level of at least 40 points amongst the decision making entities. Additionally, suppose that the operator wishes to implement the ε-filter technique by slicing each objective function range exactly three times, creating $4^4 = 256$ four-orthotopes across the objective space, with ε-values (ε₁, ε₂, ε₃, ε₄) ≈ (1, 25, 500, 12.5). The resulting (reduced) non-dominated front generated, together with orthogonal projections onto the three 2-dimensional planes, is shown in Figure 7.
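The objective function bounds of this example may be applied as in the following sketch. The class, field and function names are hypothetical, and the four alternatives are invented purely for illustration; only the three thresholds and the number of slices are taken from the text.

```python
from typing import NamedTuple


class Alternative(NamedTuple):
    visitation_score: float
    total_delay_hours: float
    operating_cost_dollars: float
    consensus_level: float


def within_bounds(alt: Alternative) -> bool:
    """Apply the three operator thresholds stated in the text."""
    return (
        alt.operating_cost_dollars <= 20_000.0
        and alt.total_delay_hours <= 100.0
        and alt.consensus_level >= 40.0
    )


# Hypothetical alternatives (not taken from the actual front of Figure 6).
front = [
    Alternative(3.5, 42.0, 15_000.0, 62.0),   # satisfies all three bounds
    Alternative(4.0, 110.0, 12_000.0, 70.0),  # total delay too large
    Alternative(2.8, 60.0, 23_000.0, 55.0),   # too expensive
    Alternative(3.9, 80.0, 18_000.0, 35.0),   # consensus level too low
]
kept = [alt for alt in front if within_bounds(alt)]

# Three slices per objective range produce four intervals per objective.
slices, objectives = 3, 4
boxes = (slices + 1) ** objectives
```

Only the first hypothetical alternative survives the bounds, and the orthotope count evaluates to 256, in agreement with the figure quoted above.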

Figure 7: Reduced non-dominated front (comprising 49 solutions) together with orthogonal projections of this front onto three 2-dimensional planes. [Axes: operating costs ($10 000s), total delay (hours) and visitation score; the grey shading indicates the consensus level.]

Suppose that the alternative induced by the vector f(x) = (4.28, 39.88, 1.49, 83.3) in this objective space is deemed to be the most preferable by the operator. The associated solution is shown (in string form) in Figure 8, while the associated configuration of its interception routes scheduled at the beginning of the time stage is shown in Figure 9.

0a 9 P3 0b 7 3 P4 0c 10 5 B2 0e 1 P5 0f 2 11 B2 0g 4 B1 0h 6 B4

Figure 8: String representation of a candidate solution x to the MLE response selection problem instance corresponding to the hypothetical MLE response selection scenario which results in the objective function vector f(x) = (4.28, 39.88, 1.49, 83.3).
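The string of Figure 8 may be decoded into individual routes as in the sketch below. The interpretation is an assumption, made by analogy with the dummy cells of Figure 3: each cell of the form 0a, 0b, ... is taken to open the route of the MLE resource with that label, the numeric cells are taken to be VOIs in visitation order, and the final cell of each route (a base B or a patrol circuit P) is taken to be the end-of-route destination. The function name is hypothetical.

```python
FIGURE_8_STRING = "0a 9 P3 0b 7 3 P4 0c 10 5 B2 0e 1 P5 0f 2 11 B2 0g 4 B1 0h 6 B4"


def decode_solution(string: str) -> dict[str, dict]:
    """Map every MLE resource label to its VOI visitation order and end-of-route cell."""
    raw = {}
    current = None
    for token in string.split():
        if token.startswith("0") and token[1:].isalpha():
            current = token[1:]          # assumed route-opening dummy cell, e.g. "0a"
            raw[current] = []
        else:
            raw[current].append(token)   # VOI number or end-of-route cell
    return {
        resource: {"vois": [int(v) for v in cells[:-1]], "end": cells[-1]}
        for resource, cells in raw.items()
    }


routes = decode_solution(FIGURE_8_STRING)
visited = sorted(v for r in routes.values() for v in r["vois"])
```

Under this reading, seven of the MLE resources are assigned routes, and `visited` lists the VOIs scheduled for interception.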

Figure 9: Graphical representation of the solution in Figure 8. [Map showing the land, the EEZ boundary, bases B1 to B4, patrol circuits P1 to P7, VOIs 1 to 11 and MLE resources a to h; the legend distinguishes active MLE resources, idle MLE resources, VOIs, bases and patrol circuits.]

6 Conclusion and future work

In this paper, we proposed a combinatorial optimisation model formulation of a new kind of multiobjective dynamic VRP, called the MLE response selection routing problem. In this problem, the depots represent the bases from which MLE resources are dispatched, the fleet of vehicles represents the fleet of MLE resources at the collective disposal of the MLE decision entities of a coastal nation and the customers represent the VOIs tracked at sea within the territorial waters of the coastal nation. It was argued that the formulation of the MLE response selection problem incorporates a unique combination of features, namely a heterogeneous fleet of vehicles, multiple depots, customer visitation profits, asymmetric travel arc lengths and dynamic customer locations [8]. In addition, it was acknowledged that the nature of information specification in this problem is classified as both dynamic and stochastic.

The functionality of the model was demonstrated by solving a single problem instance of the MLE response selection problem for a hypothetical MLE response selection scenario using an archived multiobjective simulated annealing solution approach. Moreover, three techniques were proposed for filtering densely populated non-dominated solution sets so as to reduce the number of alternatives presented to an operator, as MLE response selection-related decisions need to be made in a timely manner.

For the sake of clarity, only the deterministic form of the MLE response selection problem was introduced in this paper. It is, however, acknowledged that the highly stochastic nature of this problem remains to be considered meticulously. Such future work includes, inter alia, dealing with information pertaining to the uncertainty related to a priori estimation of interception points and routing arc trajectories, formulating chance constraints on vehicle autonomy (i.e. requiring that the a priori probability of a route duration/distance exceeding the assigned vehicle's time/distance autonomy does not exceed a predetermined confidence threshold) and introducing an alternative type of domination criterion to assess the quality of solutions mapped non-deterministically in objective space.
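The autonomy chance constraint mentioned above might, for instance, take the following generic form. The notation is hypothetical and not drawn from the model of §3: $D_k(x)$ denotes the (random) duration or distance of the route assigned to MLE resource $k$ in solution $x$, $A_k$ denotes the time or distance autonomy of that resource, and $\alpha_k$ denotes the predetermined threshold.

```latex
P\bigl( D_k(x) > A_k \bigr) \le \alpha_k
\qquad \text{for every MLE resource } k .
```

A small value of $\alpha_k$ corresponds to a conservative routing policy in which autonomy violations are tolerated only rarely.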

The proposed solution methodology had to be adapted in several different ways in order to accommodate the unique nature of the MLE response selection problem, which may have compromised certain performance aspects of the algorithm. It is believed (based on observations in the literature) that the development of a hybrid metaheuristic or hyperheuristic may eliminate, to some extent, the limitation of this search technique by complementing it with more appropriate inner-loop sub-processes where necessary, in an attempt to better guide the search process. Another way of improving solution search performance is to make use of parallel computing, where multiple metaheuristics run in parallel, solving the same problem instance, after which their non-dominated fronts are combined. Using a multi-start simulated annealing algorithm is an example of such a parallel solution approach. Finally, hypothetical scenarios of varying complexities will provide a more robust performance testing ground for assessing, and possibly improving, the performance of solution techniques for the MLE response selection problem.
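The final step of such a parallel approach, in which the non-dominated fronts of several independent runs are combined, may be sketched as follows. The function names and data are hypothetical, and the sketch assumes that all objectives are to be minimised (objectives that are maximised may be negated beforehand).

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def merge_fronts(*fronts):
    """Pool the fronts of several runs and keep only the points that remain non-dominated."""
    pool = list(dict.fromkeys(pt for front in fronts for pt in front))  # drop duplicates
    return [p for p in pool if not any(dominates(q, p) for q in pool)]


# Hypothetical bi-objective fronts returned by two independent runs.
front_a = [(1, 5), (3, 3)]
front_b = [(2, 4), (3, 3), (4, 4)]
combined = merge_fronts(front_a, front_b)
```

Here the point (4, 4) is discarded because it is dominated by (3, 3), while the duplicate (3, 3) is retained only once. The pairwise comparison is quadratic in the pool size, which is acceptable for fronts of the size reported in this paper.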

References

[1] Bandyopadhyay S, Saha S, Maulik U & Deb K, 2008, A simulated annealing-based multiobjective optimization algorithm: AMOSA, IEEE Transactions on Evolutionary Computation, 12(3), pp. 269–283.

[2] Banis R, Ortega J, Gil C, Fernandez A & De Toro F, 2013, A simulated annealing-based parallel multi-objective approach to vehicle routing problems with time windows, Journal of Expert Systems with Applications, 40(5), pp. 1696–1707.

[3] Bressen T, 2007, Consensus decision making: What, why, how, Berrett-Koehler Publishers, San Francisco (CA).

[4] Busetti F, 2003, Simulated annealing overview, [Online], [Cited June 2nd, 2013], Available from http://www.geocities.com/francorbusetti/saweb.

[5] Chen P & Xu X, 2008, A hybrid algorithm for multi-depot vehicle routing problem, Proceedings of the 2008 IEEE International Conference on Service Operations and Logistics, pp. 2031–2034.

[6] Clarke G & Wright JW, 1964, Scheduling of vehicles from a central depot to a number of delivery points, Operations Research, 12(4), pp. 568–581.

[7] Coello CA, Van Veldhuizen DA & Lamont GB, 2002, Evolutionary algorithms for solving multi-objective problems, Kluwer Academic, New York (NY).

[8] Colmant A & van Vuuren JH, 2013, Prerequisites for the design of a maritime law enforcement resource assignment decision support system, Proceedings of the 42nd Annual Conference of the Operations Research Society of South Africa, pp. 90–101.

[9] Colmant A & van Vuuren JH, 2014, Solution representation for a maritime law enforcement response selection problem, Proceedings of the 43rd Annual Conference of the Operations Research Society of South Africa, pp. 79–87.

[10] Colmant A & van Vuuren JH, 2015, MLE response selection hypothetical scenario, [Online], [Cited April 16th, 2015], Available from www.vuuren.co.za/miscellaneous.html.

[11] Deb K, Pratap A, Agarwal S & Meyarivan TAMT, 2002, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation, 6(2), pp. 182–197.

[12] Deshpande S, Watson LT & Canfield RA, 2013, Pareto front approximation using a hybrid approach, Journal of Procedia Computer Science, 18, pp. 521–530.

[13] Diaz-Gomez PA & Hougen DF, 2007, Initial population for genetic algorithms: A metric approach, Proceedings of the International Conference on Genetic and Evolutionary Methods, pp. 43–49.


[14] Du Toit J & van Vuuren JH, 2012, Toward coastal threat evaluation decision support, Proceedings of the 41st Annual Conference of the Operations Research Society of South Africa, pp. 31–39.

[15] Du Toit J, 2013, Stellenbosch University, [Personal Communication], Contactable at [email protected].

[16] Herrera-Viedma E, Herrera F & Chiclana F, 2002, A consensus model for multiperson decision making with different preference structures, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 32(3), pp. 394–402.

[17] Ho W & Ji P, 2003, Component scheduling for chip shooter machines: A hybrid genetic algorithm approach, Journal of Computers & Operations Research, 30(14), pp. 2175–2189.

[18] Ho W, Ho G, Ji P & Lau H, 2007, A hybrid genetic algorithm for the multi-depot vehicle routing problem, Journal of Engineering Applications of Artificial Intelligence, 21(4), pp. 548–557.

[19] Horn J, Nafploitis N & Goldberg DE, 1994, A niched Pareto genetic algorithm for multi-objective optimization, Proceedings of the 1999 IEEE International Conference on Evolutionary Computation, pp. 82–87.

[20] Jain AK, Murty MN & Flynn PJ, 1999, Data clustering: A review, Journal of ACM Computing Surveys, 31(3), pp. 264–323.

[21] Kokubugata H & Kawashima H, 2008, Application of simulated annealing to routing problems in city logistics, pp. 420–425, Cher Ming Tan (Ed.), I-Tech Education and Publishing, Vienna.

[22] Laporte G, Louveaux F & Mercure H, 1989, Models and exact solutions for a class of stochastic location-routing problems, European Journal of Operational Research, 39(1), pp. 71–78.

[23] Lee D & Schachter B, 1980, Two algorithms for constructing a Delaunay triangulation, International Journal of Computer & Information Sciences, 9(3), pp. 219–242.

[24] Lin S, Yu V & Lu C, 2011, A simulated annealing heuristic for the truck and trailer routing problem with time windows, Journal of Expert Systems with Applications, 38(12), pp. 15244–15252.

[25] Murata T & Ishibuchi H, 1995, MOGA: Multi-objective genetic algorithms, Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, pp. 289–298.

[26] Ochi LS, Vianna DS, Drummond LM & Victor AO, 1998, A parallel evolutionary algorithm for the vehicle routing problem with heterogeneous fleet, Journal of Parallel and Distributed Processing, 14, pp. 216–224.

[27] Tao Y, Papadias D & Shen Q, 2002, Continuous nearest neighbor search, Proceedings of the 28th International Conference on Very Large Data Bases, pp. 287–298.

[28] Psaraftis HN, 1995, Dynamic vehicle routing: Status and prospects, Annals of Operations Research, 61, pp. 143–164.

[29] Puljić K & Manger R, 2013, Comparison of eight evolutionary crossover operators for the vehicle routing problem, Journal of Mathematical Communications, 18(2), pp. 359–375.

[30] Salhi S, Imran A & Wassan NA, 2013, The multi-depot vehicle routing problem with heterogeneous vehicle fleet: Formulation and a variable neighborhood search implementation, [Online Accessed], [Cited: 6 March 2014], Available from http://www.kent.ac.uk/kbs/documents/res/working-papers/2013/mdvfmpaper(May2013)Web.pdf.

[31] Shuguang L, Huang W & Ma H, 2008, An effective genetic algorithm for the fleet size and mix vehicle routing problems, Journal of Transportation Research Part E: Logistics and Transportation Review, 45(3), pp. 434–445.


[32] Smith K, Everson RM, Fieldsend JE, Murphy C & Misra R, 2008, Dominance-based multiobjective simulated annealing, IEEE Transactions on Evolutionary Computation, 12(3), pp. 323–342.

[33] Srinivas N & Deb K, 1994, Multi-objective optimization using non-dominated sorting in genetic algorithms, IEEE Transactions on Evolutionary Computation, 2(3), pp. 221–248.

[34] Suppapapitnarm A, Seffen KA, Parks GT & Clarkson PJ, 2000, A simulated annealing algorithm for multiobjective optimization, Journal of Engineering Optimization, 33(1), pp. 59–85.

[35] Surekha P & Sumathi S, 2011, Solution to multi-depot vehicle routing problem using genetic algorithms, Journal of World Applied Programming, 1(3), pp. 118–131.

[36] Toth P & Vigo D, 2014, Vehicle routing problems, methods, and applications, MOS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics, Philadelphia (PA).

[37] Van Breedam A, 1995, Improvement heuristics for the vehicle routing problem based on simulated annealing, European Journal of Operational Research, 86, pp. 480–490.

[38] Vidal T, Crainic TG, Gendreau M, Lahrichi N & Rei W, 2012, A hybrid genetic algorithm for multidepot and periodic vehicle routing problems, Operations Research, 60(3), pp. 611–624.

[39] Wu T, Low C & Bai J, 2007, Heuristic solutions to multi-depot location-routing problems, Journal of Computers & Operations Research, 29(10), pp. 1393–1415.

[40] Zitzler E, Deb K & Thiele L, 2000, Comparison of multiobjective evolutionary algorithms: Empirical results, IEEE Transactions on Evolutionary Computation, 8(2), pp. 173–195.
