


Multi-Robot Path Planning for Selective Coverage
Sandeep Manjanna, Nikhil Kakodkar, and Gregory Dudek

School of Computer Science, McGill University
email: {msandeep, dudek}@cim.mcgill.ca

Abstract—In this paper we propose a reward-driven finite-horizon model akin to a Markov Decision Process to extract the maximum amount of valuable data in the least amount of time. We present a path planning algorithm that generates off-line trajectories for multiple robots to cover a region of interest by visiting the hot-spots in decreasing order of their significance. An underlying distribution is assumed to assist the algorithm in recognizing the hot-spots. The trajectories generated are both time and energy efficient. We validate our technique through several simulated experiments. Although this technique can be used in any environmental domain (air, water, or land), in this paper we demonstrate its success using a real robot surveying coral reefs under real-world conditions.

I. INTRODUCTION

Environment monitoring is an activity of enormous and growing importance. In the marine environment, this is motivated by a desire to observe factors such as the impact of sea-level change or increases in temperature, acidification, and changes in fauna (such as depletion of coral). Modeling these and related phenomena calls for enormous amounts of data, sometimes sampled over long durations. Collecting such data involves risk to humans and various resource constraints, and leads to challenges in terms of consistency and repeatability. Collecting wide-field data, especially in challenging environments, is often achieved using sessile sensor nodes [1], [2]. These have an advantage in terms of consistency, but need to be replaced, and often provide limited measurement density. Using an autonomous vehicle for collecting such data is efficient and relatively risk-free, although the limitations on available power make efficient coverage particularly important (notably, using an electric vehicle instead of a gas-powered one facilitates access to protected environments).

This behavior of selecting the salient regions to examine first is the key feature of our algorithm. There has recently been growing interest in non-uniform coverage. Sadat et al. propose a coverage strategy based on space-filling curves that explores the region non-uniformly [3]. Another interesting approach to coverage, based on a model of curiosity, is proposed by Girdhar et al. in [4]. Their information-theoretic path planning technique produces paths passing through regions with a higher surprise factor, helping to distinguish various terrains. However, the paths produced by such a method are highly biased by temporary variations, which is good for anomaly detection but might not be suitable for persistent monitoring. The selective coverage algorithm presented in this paper efficiently plans a path based on partial knowledge about the environment.

Our approach uses value iteration to cover the entire region of interest, but in a prioritized fashion. We assume an underlying distribution for the phenomenon that needs to be modeled and build an off-line trajectory that covers the high-probability regions while also reducing travel time and energy consumption. Some potential applications are collecting water samples from the ocean based on surface-temperature data from static sensor nodes or temperature maps from satellites, sampling visual data of coral reefs to monitor their health and growth, and sampling atmospheric gases based on satellite maps. Off-line planning is essential when the mission is time and precision critical. As an application context and experimental validation, we consider visual mapping of shallow-water coral reefs. Despite their immense importance, coral reefs are dying worldwide, and a first step in estimating the impact of any remediation effort is to evaluate the reef itself. We used an autonomous boat (Fig. 1) to collect visual and bathymetric data from the sea surface.

Figure 1: Autonomous Surface Vehicle (ASV) used in field experiments.

II. SELECTIVE COVERAGE

The region of interest is discretized into grid cells, and each grid cell is assigned a utility value equal to the integral of the underlying probability distribution over that cell. In the coral coverage task, we use the depth map of the region as the underlying probability distribution. We assume a finite set of states S represented by these grid cells and a finite set of actions A the agent can take at each state (grid cell). Four actions are allowed in every state: West, North, East, and South. GPS is used to localize the agent, and a guaranteed state transition is achieved with the help of a precisely tuned controller. Hence, we assume certainty in state transitions when an action a ∈ A is taken, i.e., the probability of state transition P(s′|s, a) = 1. As mentioned earlier, the agent collects a reward R(s, a) for visiting a state s ∈ S and taking an action a in state s.
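As a concrete illustration of this setup, the following minimal Python sketch (ours, not from the paper; the names make_reward_grid and step are invented for illustration) discretizes a sampled probability map into per-cell utility values and implements the deterministic four-action transition model.

```python
import numpy as np

# Deterministic four-action transition model on the grid:
# each action moves the agent one cell (West, North, East, South).
ACTIONS = {"West": (0, -1), "North": (-1, 0), "East": (0, 1), "South": (1, 0)}

def make_reward_grid(prob_map, cell_size):
    """Integrate a sampled probability map over coarse grid cells.

    prob_map  -- 2D array of probability-density samples
    cell_size -- number of samples per grid cell along each axis
    Returns a (rows, cols) array of per-cell utility values.
    """
    rows, cols = prob_map.shape[0] // cell_size, prob_map.shape[1] // cell_size
    rewards = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = prob_map[i * cell_size:(i + 1) * cell_size,
                             j * cell_size:(j + 1) * cell_size]
            rewards[i, j] = block.sum()  # ~ integral of the density over the cell
    return rewards

def step(state, action, shape):
    """Deterministic transition, P(s'|s, a) = 1; stay in place at the boundary."""
    di, dj = ACTIONS[action]
    ni, nj = state[0] + di, state[1] + dj
    if 0 <= ni < shape[0] and 0 <= nj < shape[1]:
        return (ni, nj)
    return state
```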

We use the value iteration algorithm to compute the best action to take in a given state. Value iteration is a method for computing an optimal MDP policy. It computes the optimal value of a state, V∗(s), i.e., the expected discounted sum of rewards that the agent will achieve if it starts at that state and executes the optimal policy π∗(s). ∀s ∈ S, the optimal value function V∗(s) is given by the Bellman equation,

V^*(s) = \max_{a} \left( R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \, V^*(s') \right), \qquad (1)

where γ is a discount factor. Thus, according to Eq. 1, the value of a state s is the sum of the instantaneous reward and the expected discounted value of the next state when the best available action is used. The optimal policy defines, for every state, an action that achieves the optimal value V∗(s).
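Continuing the sketch above, a minimal value-iteration loop for this grid MDP might look as follows (our illustration, not the paper's code); because P(s′|s, a) = 1, the sum in Eq. 1 collapses to the value of the single successor cell, and for simplicity the sketch assumes R(s, a) depends only on s.

```python
def value_iteration(rewards, gamma=0.95, eps=1e-4):
    """Compute V*(s) for the deterministic grid MDP (Eq. 1).

    With P(s'|s, a) = 1, the expectation over s' reduces to the value
    of the unique successor of each action.
    """
    shape = rewards.shape
    V = np.zeros(shape)
    while True:
        delta = 0.0
        for i in range(shape[0]):
            for j in range(shape[1]):
                best = max(rewards[i, j] + gamma * V[step((i, j), a, shape)]
                           for a in ACTIONS)
                delta = max(delta, abs(best - V[i, j]))
                V[i, j] = best
        if delta < eps:  # value function converged within tolerance eps
            return V

def greedy_action(V, state, rewards, gamma, shape):
    """Pick an action attaining the max in Eq. 1 at `state`."""
    return max(ACTIONS,
               key=lambda a: rewards[state] + gamma * V[step(state, a, shape)])
```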

The reward for a state is defined as the sum of the probabilities under that state. We clear the underlying rewards as and when an agent visits a state; thus, the reward function changes over time as the agent clears the rewards. However, keeping track of visited states violates the Markov assumption. Hence, in our formulation we use a one-step MDP approach, where we model every state transition of the agent as an MDP in a new world and compute the value function over the updated rewards of that world. Thus, the convergence of the value iteration technique still holds for every state transition. Trajectories generated for a single robot with three different underlying reward distributions are depicted in Fig. 2.
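Under the same assumptions, this one-step replanning scheme can be sketched as a loop that clears the reward at the visited cell and recomputes the value function before each greedy step; plan_trajectory is our illustrative name, not the paper's.

```python
def plan_trajectory(rewards, start, gamma=0.95, max_steps=500):
    """One-step MDP replanning: re-solve the MDP after every transition.

    The reward under each visited cell is cleared, so every value-iteration
    pass runs over the updated (remaining) rewards of the world.
    """
    rewards = rewards.copy()
    shape = rewards.shape
    state, path = start, [start]
    for _ in range(max_steps):
        rewards[state] = 0.0          # clear the reward at the visited cell
        if not rewards.any():         # the whole region has been covered
            break
        V = value_iteration(rewards, gamma)   # fresh MDP over the new world
        state = step(state, greedy_action(V, state, rewards, gamma, shape), shape)
        path.append(state)
    return path
```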

Figure 2: Trajectories generated for a single robot over complex underlying reward distributions: (a) C-shaped, (b) Disjoint, (c) Multiple-island. The color bars indicate the rewards.

A. Multi-Robot

The selective coverage algorithm can be applied to plan paths for multiple robots such that the agents cover different hot-spots in parallel and efficiently. This is achieved by incorporating the distance between the agents into the reward function. Thus, states with high rewards that are close to other agents become less interesting than states with good rewards that are farther away from all agents. In the multi-robot case, the decision on an action is based on both the reward of a state and the distance of the state (grid cell) from the other agents.

Each robot has its own reward map, which is updated based on the current position of the robot itself and the positions of the other robots operating in the region. Hence, we assume continuous communication between the agents is feasible, so that each robot has the current location of all the other robots. As seen from the results (Fig. 3), the trajectories generated for both agents (illustrated by white and black lines) cover different hot-spots in parallel.
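The paper does not specify the exact distance weighting, so the sketch below is only one plausible instantiation: each robot scales every cell of its reward map by the normalized Manhattan distance to the other agents before its next replanning pass.

```python
def multi_robot_rewards(base_rewards, other_states):
    """Down-weight rewards near the other agents (illustrative weighting).

    Cells close to another robot become less attractive, so the agents
    spread out and cover different hot-spots in parallel.
    """
    rows, cols = base_rewards.shape
    weighted = base_rewards.copy()
    for i in range(rows):
        for j in range(cols):
            for (oi, oj) in other_states:
                dist = abs(i - oi) + abs(j - oj)        # Manhattan distance
                weighted[i, j] *= dist / (rows + cols)  # farther => larger reward
    return weighted
```

Each robot would apply this to its own reward map using the latest positions of the other agents, received over the communication link, before running value iteration again.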

Figure 3: Trajectories generated for two robots (black and white) over different reward distributions: (a) Disjoint distribution, (b) Multiple-island distribution. White and black triangles indicate the starting locations of the robots. The color bars indicate the rewards.

B. Trajectory adaptation

The proposed algorithm has the capability to adapt its output trajectory in accordance with predictions about changes in the operating environment. In particular, for coral mapping,

we use predictions of wind speed and wind direction to generate efficient coverage trajectories. The proposed approach is very useful for generating trajectories to sample and re-sample a given region of interest over a period of time. The updated rewards, given in Eq. 2 and Eq. 3, account for the effect of wind on the robot's navigation. For all a ∈ {North, South} and all s ∈ S,

R(s, a) = R(s, a) - \varepsilon \, w_{\text{speed}} \sin(w_\theta), \qquad (2)

and for all a ∈ {East, West} and all s ∈ S,

R(s, a) = R(s, a) - \varepsilon \, w_{\text{speed}} \cos(w_\theta), \qquad (3)

where ε is the convergence factor of the value iteration, w_speed is the wind speed in m/s, and w_θ is the wind direction in radians.
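Eqs. 2 and 3 translate directly into an update over action-indexed reward maps; the sketch below assumes R is stored as one reward array per action, which is our reading of R(s, a) and not a detail given in the paper.

```python
import math

def apply_wind(R, w_speed, w_theta, eps):
    """Apply the wind penalty of Eqs. 2 and 3 to per-action reward maps.

    R        -- dict mapping each action name to a (rows, cols) reward array
    w_speed  -- wind speed in m/s
    w_theta  -- wind direction in radians
    eps      -- convergence factor of the value iteration
    """
    for a in ("North", "South"):
        R[a] = R[a] - eps * w_speed * math.sin(w_theta)   # Eq. 2
    for a in ("East", "West"):
        R[a] = R[a] - eps * w_speed * math.cos(w_theta)   # Eq. 3
    return R
```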

III. FIELD EXPERIMENTS AND RESULTS

Our motivation in this paper is to provide efficient trajectories for sampling and re-sampling regions of interest based on an underlying distribution. We examined the performance of the method in field experiments on the North Bellairs reef in the Caribbean Sea, off the shore of Barbados. Fig. 4 illustrates the results from the field experiments. In the future, we plan to deploy two or three robots in the field for a complete demonstration of our approach.

Figure 4: Results from field experiments. (a) Selective coverage trajectory (in black) overlaid on the depth map; the color bar represents the depth. (b) Collage of selected images from the coverage; the dotted line on the image shows the coverage trajectory of the robot.

REFERENCES

[1] R. N. Smith, Y. Chao, P. P. Li, D. A. Caron, B. H. Jones, and G. S. Sukhatme, "Planning and implementing trajectories for autonomous underwater vehicles to track evolving ocean processes based on predictions from a regional ocean model," The International Journal of Robotics Research, 2010.

[2] S. L. Nooner and W. W. Chadwick, "Volcanic inflation measured in the caldera of Axial Seamount: Implications for magma supply and future eruptions," Geochemistry, Geophysics, Geosystems, 2009.

[3] S. A. Sadat, J. Wawerla, and R. Vaughan, "Fractal trajectories for online non-uniform aerial coverage," in IEEE International Conference on Robotics and Automation (ICRA), 2015.

[4] Y. Girdhar, D. Whitney, and G. Dudek, "Curiosity based exploration for learning terrain models," in IEEE International Conference on Robotics and Automation (ICRA), 2014.