


Multi-Robot Path Planning for Selective Coverage
Sandeep Manjanna, Nikhil Kakodkar, and Gregory Dudek

School of Computer Science, McGill University
email: {msandeep, dudek}@cim.mcgill.ca

Abstract—In this paper we propose a reward-driven finite-horizon model akin to a Markov Decision Process to extract the maximum amount of valuable data in the least amount of time. We present a path planning algorithm that generates off-line trajectories for multiple robots to cover a region of interest by visiting the hot-spots in decreasing order of their significance. An underlying distribution is assumed to assist the algorithm in recognizing the hot-spots. The trajectories generated are both time and energy efficient. We validate our technique through several simulated experiments. Although this technique can be used in any environmental domain (air, water, or land), in this paper we demonstrate its success using a real robot surveying coral reefs under real-world conditions.

I. INTRODUCTION

Environment monitoring is an activity of enormous and growing importance. In the marine environment, this is motivated by a desire to observe factors such as the impact of sea-level change or increases in temperature, acidification, and changes in fauna (such as depletion of coral). Modeling these and related phenomena calls for enormous amounts of data, sometimes sampled over long durations. Collecting such data involves risk to humans and various resource constraints, and leads to challenges in terms of consistency and repeatability. Collecting wide-field data, especially in challenging environments, is often achieved using sessile sensor nodes [1], [2]. These have an advantage in terms of consistency, but need to be replaced, and often provide limited measurement density. Using an autonomous vehicle for collecting such data is efficient and relatively risk-free, although the limitations on available power make efficient coverage particularly important (notably, using an electric vehicle instead of a gas-powered one facilitates access to protected environments).

This behavior of selecting the salient regions to examine first is the key feature of our algorithm. There has recently been growing interest in non-uniform coverage. Sadat et al. propose a coverage strategy based on space-filling curves that explores the region non-uniformly [3]. Another interesting approach to coverage, based on a model of curiosity, is proposed by Girdhar et al. in [4]. Their information-theoretic path planning technique produces paths passing through regions with a higher surprise factor, helping to distinguish various terrains. However, the paths produced by such a method are highly biased by temporary variations, which is good for anomaly detection but might not be suitable for persistent monitoring. The selective coverage algorithm presented in this paper efficiently plans a path based on partial knowledge about the environment.

Our approach uses value iteration to cover the entire region of interest, but in a prioritized fashion. We assume an underlying distribution for the phenomenon that needs to be modeled and build an off-line trajectory that covers the high-probability regions while also reducing travel time and energy consumption. Some potential applications are collecting water samples from the ocean based on surface-temperature data from static sensor nodes or temperature maps from satellites, sampling visual data of coral reefs to monitor their health and growth, and sampling atmospheric gases based on satellite maps. Off-line planning is essential when the mission is time and precision critical. As an application context and experimental validation, we consider visual mapping of shallow-water coral reefs. Despite their immense importance, coral reefs are dying worldwide, and a first step in estimating the impact of any remediation effort is to evaluate the reef itself. We used an autonomous boat (Fig. 1) to collect visual and bathymetric data from the sea surface.

Figure 1: Autonomous Surface Vehicle (ASV) used in field experiments.

II. SELECTIVE COVERAGE

The region of interest is discretized into grid cells, and each grid cell is assigned a utility value equal to the integral of the underlying probability distribution over that cell. In the coral coverage task, we use the depth map of the region as the underlying probability distribution. We assume a finite set of states S represented by these grid cells and a finite set of actions A the agent can take at each state (grid cell). Four actions are allowed in every state: West, North, East, and South. GPS is used to localize the agent, and a guaranteed state transition is achieved with the help of a precisely tuned controller. Hence, we assume certainty in state transitions when an action a ∈ A is taken, i.e., the probability of state transition P(s′|s, a) = 1. As mentioned earlier, the agent collects a reward R(s, a) for visiting a state s ∈ S and taking an action a in state s.
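As a concrete illustration of this setup, the following minimal Python sketch (ours, not from the paper; the names make_reward_grid and step are invented for illustration) discretizes a sampled probability map into per-cell utility values and implements the deterministic four-action transition model.

```python
import numpy as np

# Deterministic four-action transition model on the grid:
# each action moves the agent one cell (West, North, East, South).
ACTIONS = {"West": (0, -1), "North": (-1, 0), "East": (0, 1), "South": (1, 0)}

def make_reward_grid(prob_map, cell_size):
    """Integrate a sampled probability map over coarse grid cells.

    prob_map  -- 2D array of probability-density samples
    cell_size -- number of samples per grid cell along each axis
    Returns a (rows, cols) array of per-cell utility values.
    """
    rows, cols = prob_map.shape[0] // cell_size, prob_map.shape[1] // cell_size
    rewards = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = prob_map[i * cell_size:(i + 1) * cell_size,
                             j * cell_size:(j + 1) * cell_size]
            rewards[i, j] = block.sum()  # ~ integral of the density over the cell
    return rewards

def step(state, action, shape):
    """Deterministic transition, P(s'|s, a) = 1; stay in place at the boundary."""
    di, dj = ACTIONS[action]
    ni, nj = state[0] + di, state[1] + dj
    if 0 <= ni < shape[0] and 0 <= nj < shape[1]:
        return (ni, nj)
    return state
```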

We use the value iteration algorithm to compute the best action to take in a given state. Value iteration is a method for computing an optimal MDP policy. It computes the optimal value of a state, V∗(s), i.e., the expected discounted sum of rewards that the agent will achieve if it starts at that state and executes the optimal policy π∗(s). ∀s ∈ S, the optimal value function V∗(s) is given by the Bellman equation,

V^*(s) = \max_{a} \left( R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \, V^*(s') \right), \qquad (1)

where γ is a discount factor. Thus, according to Eq. 1, the value of a state s is the sum of the instantaneous reward and the expected discounted value of the next state when the best available action is used. The optimal policy defines, for every state, an action that achieves the optimal value V∗(s).
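Continuing the sketch above, a minimal value-iteration loop for this grid MDP might look as follows (our illustration, not the paper's code); because P(s′|s, a) = 1, the sum in Eq. 1 collapses to the value of the single successor cell, and for simplicity the sketch assumes R(s, a) depends only on s.

```python
def value_iteration(rewards, gamma=0.95, eps=1e-4):
    """Compute V*(s) for the deterministic grid MDP (Eq. 1).

    With P(s'|s, a) = 1, the expectation over s' reduces to the value
    of the unique successor of each action.
    """
    shape = rewards.shape
    V = np.zeros(shape)
    while True:
        delta = 0.0
        for i in range(shape[0]):
            for j in range(shape[1]):
                best = max(rewards[i, j] + gamma * V[step((i, j), a, shape)]
                           for a in ACTIONS)
                delta = max(delta, abs(best - V[i, j]))
                V[i, j] = best
        if delta < eps:  # value function converged within tolerance eps
            return V

def greedy_action(V, state, rewards, gamma, shape):
    """Pick an action attaining the max in Eq. 1 at `state`."""
    return max(ACTIONS,
               key=lambda a: rewards[state] + gamma * V[step(state, a, shape)])
```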

The reward for a state is defined as the sum of the probabilities under that state. We clear the underlying rewards as and when an agent visits a state; thus, the reward function changes over time as the agent clears the rewards. However, keeping track of visited states violates the Markov assumption. Hence, in our formulation we use a one-step MDP approach, where we model every state transition of the agent as an MDP in a new world and compute the value function over the updated rewards of that world. Thus, the convergence of the value iteration technique still holds for every state transition. Trajectories generated for a single robot with three different underlying reward distributions are depicted in Fig. 2.
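Under the same assumptions, this one-step replanning scheme can be sketched as a loop that clears the reward at the visited cell and recomputes the value function before each greedy step; plan_trajectory is our illustrative name, not the paper's.

```python
def plan_trajectory(rewards, start, gamma=0.95, max_steps=500):
    """One-step MDP replanning: re-solve the MDP after every transition.

    The reward under each visited cell is cleared, so every value-iteration
    pass runs over the updated (remaining) rewards of the world.
    """
    rewards = rewards.copy()
    shape = rewards.shape
    state, path = start, [start]
    for _ in range(max_steps):
        rewards[state] = 0.0          # clear the reward at the visited cell
        if not rewards.any():         # the whole region has been covered
            break
        V = value_iteration(rewards, gamma)   # fresh MDP over the new world
        state = step(state, greedy_action(V, state, rewards, gamma, shape), shape)
        path.append(state)
    return path
```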

Figure 2: Trajectories generated for a single robot over complex underlying reward distributions: (a) C-shaped, (b) Disjoint, (c) Multiple-island. The color bars indicate the rewards.

A. Multi-Robot

The selective coverage algorithm can be applied to plan paths for multiple robots such that the agents cover different hot-spots in parallel and efficiently. This is achieved by incorporating the distance between the agents into the reward function. Thus, states with high rewards that are close to other agents become less interesting than states with good rewards that are farther away from all agents. In the multi-robot case, the decision on an action is based on both the reward of a state and the distance of the state (grid cell) from the other agents.

Each robot has its own reward map, which is updated based on the current position of the robot itself and the positions of the other robots operating in the region. Hence, we assume continuous communication between the agents is feasible, so that each robot has the current location of all the other robots. As seen from the results (Fig. 3), the trajectories generated for both agents (illustrated by white and black lines) cover different hot-spots in parallel.
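The paper does not specify the exact distance weighting, so the sketch below is only one plausible instantiation: each robot scales every cell of its reward map by the normalized Manhattan distance to the other agents before its next replanning pass.

```python
def multi_robot_rewards(base_rewards, other_states):
    """Down-weight rewards near the other agents (illustrative weighting).

    Cells close to another robot become less attractive, so the agents
    spread out and cover different hot-spots in parallel.
    """
    rows, cols = base_rewards.shape
    weighted = base_rewards.copy()
    for i in range(rows):
        for j in range(cols):
            for (oi, oj) in other_states:
                dist = abs(i - oi) + abs(j - oj)        # Manhattan distance
                weighted[i, j] *= dist / (rows + cols)  # farther => larger reward
    return weighted
```

Each robot would apply this to its own reward map using the latest positions of the other agents, received over the communication link, before running value iteration again.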

Figure 3: Trajectories generated for two robots (black and white) over different reward distributions: (a) Disjoint distribution, (b) Multiple-island distribution. White and black triangles indicate the starting locations of the robots. The color bars indicate the rewards.

B. Trajectory adaptation

The proposed algorithm has the capability to adapt its output trajectory in accordance with predictions about changes in the operating environment. In particular, for coral mapping,

we use predictions of wind speed and wind direction to generate efficient coverage trajectories. The proposed approach is very useful for generating trajectories to sample and re-sample a given region of interest over a period of time. The updated rewards, given in Eq. 2 and Eq. 3, account for the effect of wind on the robot's navigation. For all a ∈ {North, South} and all s ∈ S,

R(s, a) = R(s, a) - \varepsilon \, w_{\text{speed}} \sin(w_\theta), \qquad (2)

and for all a ∈ {East, West} and all s ∈ S,

R(s, a) = R(s, a) - \varepsilon \, w_{\text{speed}} \cos(w_\theta), \qquad (3)

where ε is the convergence factor of the value iteration, w_speed is the wind speed in m/s, and w_θ is the wind direction in radians.
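Eqs. 2 and 3 translate directly into an update over action-indexed reward maps; the sketch below assumes R is stored as one reward array per action, which is our reading of R(s, a) and not a detail given in the paper.

```python
import math

def apply_wind(R, w_speed, w_theta, eps):
    """Apply the wind penalty of Eqs. 2 and 3 to per-action reward maps.

    R        -- dict mapping each action name to a (rows, cols) reward array
    w_speed  -- wind speed in m/s
    w_theta  -- wind direction in radians
    eps      -- convergence factor of the value iteration
    """
    for a in ("North", "South"):
        R[a] = R[a] - eps * w_speed * math.sin(w_theta)   # Eq. 2
    for a in ("East", "West"):
        R[a] = R[a] - eps * w_speed * math.cos(w_theta)   # Eq. 3
    return R
```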

III. FIELD EXPERIMENTS AND RESULTS

Our motivation in this paper is to provide efficient trajectories for sampling and re-sampling regions of interest based on an underlying distribution. We examined the performance of the method in field experiments on the North Bellairs reef in the Caribbean Sea, off the shore of Barbados. Fig. 4 illustrates the results from the field experiments. In the future, we plan to deploy two or three robots in the field for a complete demonstration of our approach.

Figure 4: Results from field experiments. (a) Selective coverage trajectory (in black) overlaid on the depth map; the color bar represents the depth. (b) Collage of selected images from the coverage; the dotted line on the image shows the coverage trajectory of the robot.

REFERENCES

[1] R. N. Smith, Y. Chao, P. P. Li, D. A. Caron, B. H. Jones, and G. S. Sukhatme, "Planning and implementing trajectories for autonomous underwater vehicles to track evolving ocean processes based on predictions from a regional ocean model," The International Journal of Robotics Research, 2010.

[2] S. L. Nooner and W. W. Chadwick, "Volcanic inflation measured in the caldera of Axial Seamount: Implications for magma supply and future eruptions," Geochemistry, Geophysics, Geosystems, 2009.

[3] S. A. Sadat, J. Wawerla, and R. Vaughan, "Fractal trajectories for online non-uniform aerial coverage," in IEEE International Conference on Robotics and Automation (ICRA), 2015.

[4] Y. Girdhar, D. Whitney, and G. Dudek, "Curiosity based exploration for learning terrain models," in IEEE International Conference on Robotics and Automation (ICRA), 2014.