andrew dobson kiril solovey rahul shome dan …kb572/pubs/scalable_asympt_opt_multi...scalable...

8
Scalable Asymptotically-Optimal Multi-Robot Motion Planning Andrew Dobson Kiril Solovey Rahul Shome Dan Halperin Kostas E. Bekris Abstract— Discovering high-quality paths for multi- robot problems can be achieved, in principle, by exploring the composite space of all robots. For instance, sampling- based algorithms that build either roadmaps or tree data structures achieve asymptotic optimality. The hardness of motion planning, however, which implies an exponential dependence on problem dimensionality, renders the ex- plicit construction of such structures in the composite space of all robots impractical. This work proposes a scalable, sampling-based planner for coupled multi-robot problems that provides desirable path-quality guarantees. The proposed dRRT * is an informed, asymptotically-optimal extension of a prior method dRRT, which introduced the idea of building roadmaps for each robot and implicitly searching the tensor product of these structures in the composite space. The paper describes the conditions for convergence to optimal paths in multi-robot problems, which is not feasible for the prior method. Moreover, simulations indicate dRRT * converges to high-quality paths and scales to higher numbers of robots where various alternatives fail. It can also be used on high-dimensional challenges, such as planning for robot manipulators. I. I NTRODUCTION AND PRIOR WORK Many multi-robot planning applications [24], [5], [4] require high-dimensional platforms to simultaneously move in a shared workspace, where high-quality paths must be computed quickly, as demonstrated in Fig. 1. Preprocessing given knowledge of the static scene can help the online computation of high-quality paths. For example, sampling-based roadmaps help with such high- dimensional instances and provide primitives for prepro- cessing a static scene [9], [10]. These methods converge to optimal solutions given sufficient density [8], i.e., at least O(n log(n)) edges are needed for a roadmap with n vertices, while near-optimality is achieved after finite computation [3], [7] or asymptotically [13], [1]. Na¨ ıvely constructing a sampling-based roadmap or tree in the composite configuration space of multi- A. Dobson, R. Shome and K. Bekris are with the Computer Science Dept. of Rutgers Univ., NJ, USA, {chuples.dobson, rahul.shome, kostas.bekris}@cs.rutgers.edu. They were supported by NSF IIS 1451737, IIS-1617744 and CCF 1330789. K. Solovey and D. Halperin are with the Com- puter Science Dept. of Tel Aviv University, Israel, {kirilsol,danha}@post.tau.ac.il. Their work has been supported in part by the Israel Science Foundation (grant no. 825/15) and by the Blavatnik Computer Science Research Fund. K. S. has also been supported by the Clore Israel Foundation. Fig. 1. Simultaneous planning for multiple high-dimensional systems is a difficult, motivating challenge for this work. ple robots provides asymptotic optimality but is not a scalable solution. In particular, memory requirements depend exponentially on the problem’s dimensionality [17]. The alternative is decoupled planning, where paths for robots are computed independently and then coor- dinated [11]. These methods, however, typically lack completeness and optimality guarantees. While some hybrid approaches can automatically take advantage of the inherent decoupling between robots and provide guarantees [23], they are often limited to discrete do- mains. The problem is more complex when the robots exhibit non-trivial dynamics [14]. Collision avoidance or control methods can scale to many robots, but typically lack global path quality guarantees [22], [21]. The previously proposed dRRT approach [18] is a scalable sampling-based approach, which is probabilisti- cally complete. It searches an implicit tensor product of roadmaps, which are constructed independently for each robot [20]. The current work proposes dRRT * and shows that it is an efficient asymptotically optimal extension of the previous method. The dRRT * framework is an anytime algorithm, which quickly finds initial solutions and then refines them, while ensuring asymptotic con- vergence to optimal solutions. Simulations show that the method practically generates high-quality paths while scaling to complex, high-dimensional problems, where alternatives fail. II. PROBLEM SETUP AND NOTATION Consider a shared workspace with R 2 robots, each operating in a configuration space C i for 1 i R. Let C f i C i be each robot’s free space, where it does not collide with the static scene, and C o i = C i \ C f i is the obstacle space for robot i. The

Upload: doanduong

Post on 07-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Scalable Asymptotically-Optimal Multi-Robot Motion Planning

Andrew Dobson Kiril Solovey Rahul Shome Dan Halperin Kostas E. Bekris

Abstract— Discovering high-quality paths for multi-robot problems can be achieved, in principle, by exploringthe composite space of all robots. For instance, sampling-based algorithms that build either roadmaps or tree datastructures achieve asymptotic optimality. The hardness ofmotion planning, however, which implies an exponentialdependence on problem dimensionality, renders the ex-plicit construction of such structures in the compositespace of all robots impractical. This work proposes ascalable, sampling-based planner for coupled multi-robotproblems that provides desirable path-quality guarantees.The proposed dRRT∗ is an informed, asymptotically-optimalextension of a prior method dRRT, which introduced theidea of building roadmaps for each robot and implicitlysearching the tensor product of these structures in thecomposite space. The paper describes the conditions forconvergence to optimal paths in multi-robot problems,which is not feasible for the prior method. Moreover,simulations indicate dRRT∗ converges to high-quality pathsand scales to higher numbers of robots where variousalternatives fail. It can also be used on high-dimensionalchallenges, such as planning for robot manipulators.

I. INTRODUCTION AND PRIOR WORK

Many multi-robot planning applications [24], [5], [4]require high-dimensional platforms to simultaneouslymove in a shared workspace, where high-quality pathsmust be computed quickly, as demonstrated in Fig. 1.Preprocessing given knowledge of the static scene canhelp the online computation of high-quality paths. Forexample, sampling-based roadmaps help with such high-dimensional instances and provide primitives for prepro-cessing a static scene [9], [10]. These methods convergeto optimal solutions given sufficient density [8], i.e., atleast O(n log(n)) edges are needed for a roadmap withn vertices, while near-optimality is achieved after finitecomputation [3], [7] or asymptotically [13], [1].

Naıvely constructing a sampling-based roadmap ortree in the composite configuration space of multi-

A. Dobson, R. Shome and K. Bekris are with the ComputerScience Dept. of Rutgers Univ., NJ, USA, {chuples.dobson,rahul.shome, kostas.bekris}@cs.rutgers.edu. Theywere supported by NSF IIS 1451737, IIS-1617744 and CCF 1330789.

K. Solovey and D. Halperin are with the Com-puter Science Dept. of Tel Aviv University, Israel,{kirilsol,danha}@post.tau.ac.il. Their work hasbeen supported in part by the Israel Science Foundation (grantno. 825/15) and by the Blavatnik Computer Science Research Fund.K. S. has also been supported by the Clore Israel Foundation.

Fig. 1. Simultaneous planning for multiple high-dimensional systemsis a difficult, motivating challenge for this work.

ple robots provides asymptotic optimality but is not ascalable solution. In particular, memory requirementsdepend exponentially on the problem’s dimensionality[17]. The alternative is decoupled planning, where pathsfor robots are computed independently and then coor-dinated [11]. These methods, however, typically lackcompleteness and optimality guarantees. While somehybrid approaches can automatically take advantage ofthe inherent decoupling between robots and provideguarantees [23], they are often limited to discrete do-mains. The problem is more complex when the robotsexhibit non-trivial dynamics [14]. Collision avoidance orcontrol methods can scale to many robots, but typicallylack global path quality guarantees [22], [21].

The previously proposed dRRT approach [18] is ascalable sampling-based approach, which is probabilisti-cally complete. It searches an implicit tensor product ofroadmaps, which are constructed independently for eachrobot [20]. The current work proposes dRRT∗ and showsthat it is an efficient asymptotically optimal extensionof the previous method. The dRRT∗ framework is ananytime algorithm, which quickly finds initial solutionsand then refines them, while ensuring asymptotic con-vergence to optimal solutions. Simulations show that themethod practically generates high-quality paths whilescaling to complex, high-dimensional problems, wherealternatives fail.

II. PROBLEM SETUP AND NOTATION

Consider a shared workspace with R ≥ 2 robots,each operating in a configuration space Ci for 1 ≤i ≤ R. Let Cf

i ⊂ Ci be each robot’s free space,where it does not collide with the static scene, andCoi = Ci \ Cf

i is the obstacle space for robot i. The

composite configuration space C =∏Ri=1 Ci is the

Cartesian product of each robot’s configuration space.A composite configuration Q = (q1, . . . , qR) ∈ C is anR-tuple of robot configurations. For two distinct robotsi, j, denote by Iji (qj) ⊂ Ci the set of configurationsof robot i, which lead into collision with robot j at itsconfiguration qj . Then, the composite free space Cf ⊂ Cconsists of configurations Q = (q1, . . . , qR) subject to:• qi ∈ Cf

i for every 1 ≤ i ≤ R;• qi 6∈ Iji (qj), qj 6∈ Iij(qi) for every 1 ≤ i < j ≤ R.

Each Q ∈ Cf requires robots to not collide withobstacles or pairwise. The composite obstacle space isdefined as Co = C \ Cf.

Given S, T ∈ Cf, where S = (s1, . . . , sR), T =(t1, . . . , tR), a trajectory Σ : [0, 1]→ Cf is a continuouscurve in Cf, such that Σ(0) = S,Σ(1) = T , wherethe R robots move simultaneously. Σ is an R-tuple(σ1, . . . , σR) of robot paths, such that σi : [0, 1]→ Cf

i.The objective is to find a trajectory, which minimizes

a cost function c(·). The analysis assumes the cost isthe sum of robot path lengths, i.e., c(Σ) =

∑Ri=1 |σi|,

where |σi| denotes the standard arc length of σi. Thearguments also work for maxi=1:R |σi|.1 This workprovides sufficient conditions for dRRT∗ to converge tooptimal trajectories over the cost function c.

III. METHODS FOR COMPOSITE SPACE PLANNING

First construct for every robot i ∈ [1, R] aroadmap Gi = (Vi,Ei) over Cf

i according to theasymptotically optimal PRM∗ [8], i.e., two roadmapnodes/configurations are connected with an edge if theyare within radius r(|Vi|) and the local path connectingthem is in Cf

i. Then: G = (V, E) = G1 × . . . × GRis the tensor product roadmap in C (see Figure 2).Formally, V = {(v1, v2, . . . , vR) : ∀ i, vi ∈ Vi} isthe Cartesian product of the nodes from each roadmapGi. For two vertices V = (v1, . . . , vm) ∈ V, V ′ =(v′1, . . . , v

′m) ∈ V, the edge set E contains edge (V, V ′)

if ∀i ∈ [1, R] : vi = v′i or (vi, v′i) ∈ Ei.2

Similar to the previous dRRT technique, the proposeddRRT∗ expands a tree of paths over the tensor productroadmap G. In contrast to dRRT, however:• dRRT∗ performs a rewiring step to refine paths in

the tree, reducing costs to reach particular nodes.• dRRT∗ is anytime, employing branch and bound

pruning after an initial solution is reached.• dRRT∗ uses a heuristic to guide its expansion.

1The types of distances the arguments hold are more general, butproofs for alternative metrics are left as future work.

2Notice this slight difference from dRRT [18] so as to allow edgeswhere some robots remain motionless while others move.

Fig. 2. An illustration of a two-robot tensor product roadmapGi,j between roadmaps Gi and Gj . Two nodes in the tensor-productroadmap share an edge if all the individual robot configurations sharean edge in the individual robot roadmaps.

Algorithm 1 outlines dRRT∗, which grows a treeT = (VT,ET) over G, rooted at start configuration Sand initializes path πbest (Line 1). The method storesthe node added at each iteration V (Line 2), so as toguide the expansion of T towards the goal. The methoditeratively expands T given a time budget (Line 3),according to function Expand dRRT∗ (Line 4) as detailedin Algorithm 2, and stores the newly added node V .Then, the method tries to trace the path π that connectsthe start S with the target T (Line 5). If such a path π isfound, it is stored in πbest, as long as it improves uponthe cost of the previous solution (Lines 6, 7). The bestpath πbest is returned (Line 8).

Algorithm 1: dRRT∗(G, S, T )

1 πbest ← ∅, T.init(S);2 V ← S;3 while time.elapsed() < time limit do4 V ← Expand dRRT∗(G,T, V, T );5 π ← Trace Path(T, S, T );6 if π 6= ∅ and cost(π) < cost(πbest) then7 πbest ← π;8 return πbest

Algorithm 2 outlines the expansion step. The defaultbehavior is summarized in Lines 1-4, i.e., when noV last is passed as argument (Line 1). This operationcorresponds to an exploration step similar to RRT, i.e., arandom sample Qrand is generated in C (Line 2) and itsnearest neighbor V near in T is found (Line 3); then, theoracle function Od(·, ·) returns the implicit graph nodeV new that is a neighbor of V near on the implicit graphin the direction of Qrand (Line 4). If V last is provided(Line 5), which happens when the last iteration generatesa node closer to the goal relative to its parent, then V new

is greedily generated so as to be a neighbor of V last inthe direction of the goal T (Line 6).

In either case, the method next finds neighbors N ,which are adjacent to V new in G and have also been

2

Algorithm 2: Expand dRRT∗(G,T, V last, T )

1 if V last == NULL then2 Qrand ← Random Sample();3 V near ← Nearest Neighbor(T, Qrand);4 V new ← Od(V

near, Qrand);5 else6 V new ← Od(V

last, T );7 N ← Adjacent(V new, G) ∩ VT;8 V best←argminv∈N,L(v,V new)⊂Cf c(v)+c(L(v, V new));9 if V best == NULL then

10 return NULL;11 if V new /∈ T then12 T.Add Vertex(V new);13 T.Add Edge(V best, V new);14 else15 T.Rewire(V best, V new);16 for v ∈ N do17 if c(V new)+c(L(V new, v)) < c(v) and

L(V new, v) ⊂ Cf then18 T.Rewire(V new, v);19 if h(V new) < h(V best) then20 return V new;21 else22 return NULL;

added to T (Line 7). Among N , the best node V best

is chosen, for which the local path L(V best, V new) iscollision-free and that the total path cost to V new isminimized (Line 8). If no such parent can be found(Line 9), the expansion fails and no node is returned(Line 10). Then, if V new is not in T, it is added(Lines 11-13). Otherwise, if it exists, the tree is rewiredso as to contain edge (V best, V new), and the cost of theV new’s sub-tree (if any) is updated (Lines 14, 15). Then,for all nodes in N (Line 16), the method tests whether Tshould be rewired through V new to reach this neighbor.Given that L(V new, v) is collision-free and is of lowercost than the existing path to v (Line 17), the tree isrewired to make V new be the parent of v (line 18).

Finally, if the heuristic for V new is lower than itsparent V best (line 19), the method returns V new (Line 20),prompting greedy expansion of V new during the nextiteration. Otherwise, NULL is returned, promptingexploration. The implementation employs branch-and-bound, which is not reflected in the algorithmics.V new is determined via the dRRT oracle function,

which in conjunction with a simple rewiring scheme issufficient for showing asymptotic optimality for dRRT∗

(see Section IV). The oracle function Od for a two-robot case is illustrated in Figure 3. First, let ρ(Q,Q′)be the ray from configuration Q terminating at Q′. Then,

denote ∠Q(Q′, Q′′) as the minimum angle betweenρ(Q,Q′) and ρ(Q,Q′′). When Qrand is drawn in C, itsnearest neighbor V near in T is found. Then, project thepoints Qrand and V near into each robot space Ci, i.e.,ignore the configurations of other robots.

Fig. 3. (A) The method reasons over all neighbors q′ of q so as tominimize the angle ∠q(q′, q′′). (B) Od(·, ·) finds graph vertex V new

by minimizing angle ∠V near (V new, Qrand). (C,D) V nearand Qrandareprojected into each robot’s C-space so as to find nodes vnew

i and vnewj ,

respectively, which minimize angle ∠vneari/j

(vnewi/j , qrand

i/j ).

The method separately searches the single-robot roadmaps to discover V new. DenoteV near = (v1, . . . , vR) and Qrand = (q1, . . . , qR).For every robot i, let Ni ⊂ Vi be the neighborhood ofvi ∈ Vi, and identify v′i = argminv∈Ni

∠vi(qi, v). Theoracle function returns node V new = (v′1, . . . , v

′R).

Fig. 4. (A) The Voronoi region Vor(V ) of vertex V is shown where ifQrand is drawn, vertex V is selected for expansion. (B) When Qrand liesin the directional Voronoi region Vor′(V ), the expand step expandsto V new. (C) Thus, when Qrand is drawn within volume Vol(V ) =Vor(V ) ∩ Vor′(V ), the method will generate V new via V .

As in RRT and in dRRT, dRRT∗ has a Voronoi-biasproperty [12]. It is, however, slightly more involvedto observe as shown in Figure 4. To generate an edge(V, V new), random sample Qrand must be drawn withinthe Voronoi cell of V , denoted as Vor(V ) (Figure 4(A))and in the general direction of V new, denoted as Vor′(V )(Figure 4(B)). The intersection of these two volumes:Vol(V ) = Vor(V ) ∩ Vor′(V ), is the volume to besampled so as to generate V new via V near (Figure 4(C)).

IV. ANALYSIS

This section examines the properties of dRRT∗ startingwith the asymptotic convergence of the implicit roadmapG to containing a path in Cf with optimum cost. Then,it is shown that dRRT∗ eventually discovers the shortestpath in G. The combination of these two facts provesthe asymptotic optimality of dRRT∗.

3

For simplicity, the analysis considers robots operat-ing in Euclidean space, i.e., Ci is a d-dim. Euclideanhypercube [0, 1]d for fixed d ≥ 2. Robots are alsoassumed to have the same number of degrees of freedomd. Furthermore, the analysis assumes the cost functionof total distance, i.e., |Σ| :=

∑Ri=1 |σi|. Discussion on

these restrictions is provided in Section VI.

A. Optimal Convergence of GFor each robot, an asymptotically optimal PRM∗

roadmap Gi is constructed having n = |Vi| samples andusing a connection radius r(n) necessary for asymptoticconvergence to the optimum [8]. This work, similarto previous work [19], focuses on the notion of arobust optimum path and shows that the tensor productroadmap G converges to such a path as n increases.

Definition 1: A trajectory Σ : [0, 1] → Cf is robustif there exists a fixed δ > 0 such that for every τ ∈[0, 1], X ∈ Co it holds that ‖Σ(τ) − X‖2 ≥ δ, where‖ · ‖2 denotes the standard Euclidean distance.

Definition 2: A value c > 0, which denotes a pathcost, is robust if for every fixed ε > 0, there exists arobust path Σ, such that |Σ| ≤ (1 + ε)c. The robustoptimum c∗, is the infimum over all such values.

For any fixed n ∈ N+, and a specific instance of Gconstructed from R roadmaps, having n samples each,denote by Σ(n) the shortest path from S to T over G.

Definition 3: G is asymp. optimal (AO) if for everyfixed ε > 0 it holds that |Σ(n)| ≤ (1 + ε)c∗ asymptot-ically almost surely 3, where the probability is over allthe instantiations of G with n samples for each PRM.

Using this definition, the following theorem is proven.Recall that d denotes the dimension of a single-robotconfiguration space.

Theorem 1: G is AO when

r(n) ≥ r∗(n) = (1 + η)2

(1

d

) 1d(

log n

n

) 1d

,

where η is any constant larger than 0.Remark. Note that r∗(n) was developed in [7, Theo-

rem 4.1], and guarantees AO of PRM∗ for a single robot.The proof technique described in that work will be oneof the ingredients used to prove Theorem 1.4

This proof shows that for a clearance-robust optimalpath for any given ε > 0 having cost c∗, there exists arobust trajectory Σ : [0, 1]→ Cf, and a fixed δ > 0, suchthat the cost of Σ is at most (1 + ε/2)c∗ and for everyX ∈ Co, τ ∈ [0, 1] it holds that ‖Σ(τ)−X‖ ≥ δ. This

3Let A1, A2, . . . be random variables in some probability space andlet B be an event depending on An. B occurs asymptotically almostsurely (a.a.s.) if limn→∞ Pr[B(An)] = 1.

4Note that r∗(n) can be refined to incorporate the proportion ofCfi, which would reduce this expression.

is predicated on drawing a sufficient number of samplesn ≥ n0. For the proposed proof, it is shown that Gcontains a trajectory Σ(n), such that:

|Σ(n)| ≤ (1 + o(1)) · |Σ|, (1)asymptotically almost surely (a.a.s.), which immediatelyimplies that |Σ(n)| ≤ (1 + ε)c∗, proving Theorem 1.

Thus, it remains to show that there exists a trajectoryon G, which satisfies Eq. 1 a.a.s.. As a first step, it willbe shown that the robustness of Σ = (σ1, . . . , σR) in thecomposite space implies robustness in the single-robotsetting, i.e., robustness along σi.

For τ ∈ [0, 1] define the forbidden space parameter-ized by τ as

Coi (τ) = Co

i ∪R⋃

j=1,j 6=i

Iji (σj(τ)).

Claim 1: For every robot i, τ ∈ [0, 1], and qi ∈Coi (τ), ‖σi(τ)− qi‖2 ≥ δ.

Proof: Fix a robot i, and fix some τ ∈ [0, 1] anda configuration qi ∈ Co

i (τ). Next, define the followingcomposite configuration

Q = (σ1(τ), . . . , qi, . . . , σR(τ)).

Note that it differs from Σ(τ) only in the i-th robot’sconfiguration. By the robustness of Σ it follows thatδ ≤ ‖Σ(τ)−Q‖2

=

(‖σi(τ)− qi‖22 +

R∑j=1,j 6=i

‖σj(τ)− σj(τ)‖22) 1

2

≤ ‖σi(τ)− qi‖2.

The result of Claim 1 is that the paths σ1, . . . , σR arerobust in the sense that there is sufficient clearance forthe individual robots to not collide with each other givena fixed location of a single robot. A Lemma is derivedusing proof techniques from the literature [7], and itimplies every Gi contains a single-robot path σ(n)

i thatconverges to σi.

Lemma 1: For every robot i, Gi is constructed withn samples and a connection radius r(n) ≥ r∗(n)

contains a path σ(n)i with the following attributes a.a.s.:

(i) σ(n)i (0) = si, σ

(n)i (1) = ti;

(ii) |σ(n)i | ≤ (1 + o(1))|σi|;

(iii) ∀q ∈ Im(σ(n)i ), ∃τ ∈ [0, 1] s.t. ‖q − σi(τ)‖2 ≤

r∗(n), where Im(·) is some function image.Proof: The first property (i) follows from the fact

that si, ti are directly added to Gi. The rest follows fromthe proof of Theorem 4.1 in [7], which is applicable heresince r(n) ≥ r∗(n).

Lemma 1 also implies that G contains a path in C,that represents robot-to-obstacle collision-free motions,and minimizes the multi-robot metric cost. In particular,define Σ(n) = (σ

(n)1 , . . . , σ

(n)R ), where σ(n)

i are obtained

4

from Lemma 1. Then

|Σ(n)| =R∑i=1

|σ(n)i | ≤ (1+o(1))

R∑i=1

|σi| ≤ (1+o(1))|Σ|.

Nevertheless, it is not clear whether this ensures theexistence of a path where robot-robot collisions areavoided. That is, although Im(σ

(n)i ) ⊂ Cf

i, it might bethe case that Im(Σ(n)) ∩ Co 6= ∅. Next, it is shownthat σ(n)

1 , . . . , σ(n)R can be reparametrized to induce a

composite-space path whose image is fully contained inCf, with length equivalent to Σ(n).

For each robot i, denote by Vi = (v1i , . . . , v`ii ) the

chain of Gi vertices traversed by σ(n)i . For every vji ∈ Vi

assign a timestamp τ ji of the closest configuration alongσi, i.e.,

τ ji = argminτ∈[0,1]

‖vji − σi(τ)‖2.

Also, define Ti = (τ1i , . . . , τ`ii ) and denote by T the

ordered list of⋃Ri=1 Ti, according to the timestamp

values. Now, for every i, define a global timestampfunction TSi : T → Vi, which assigns to each globaltimestamp in T a single-robot configuration from Vi. Itthus specifies in which vertex robot i resides at timeτ ∈ T . For τ ∈ T , let j be the largest index, such thatτ ji ≤ τ . Then simply assign TSi(τ) = τ ji . From property(iii) in Lemma 1 and Claim 1 it follows that no robot-robot collisions are induced by the reparametrization,concluding the proof of Theorem 1.

B. Asymptotic Optimality of dRRT∗

Finally, dRRT∗ is shown to be AO. Denote by m thetime budget in Algorithm 1, i.e., the number of iterationsof the loop. Denote by Σ(n,m) the solution returned bydRRT∗ for n and m.

Theorem 2: If r(n) > r∗(n) then for every fixedε > 0 it holds that

limn,m→∞

Pr[|Σ(n,m)| ≤ (1 + ε)c∗

]= 1.

Since G is AO (Theorem 1), it suffices to show thatfor any fixed n, and a fixed instance of G, definedover R PRMs with n samples each, dRRT∗ eventually(as m tends to infinity), finds the optimal trajectoryover G. This can be shown using the properties of aMarkov chain with absorbing states [6, Theorem 11.3].While a full proof is omitted here, the idea is similarto what is presented in previous work [18, Theorem 3],and examined in Appendix A. By restricting the statesof the Markov chain to being the graph vertices alongthe optimal path, setting the target vertex to be anabsorbing vertex, and showing that the probability oftransitioning along any edge in this path is nonzero (i.e.,the probability is proportional to µ(Vol(Vk))

µ(Cf)> 0), then

the probability that this process does not reach the target

state along the optimal path converges to 0 as the numberof dRRT∗ iterations tends to infinity. The final step is toshow that the above statements hold when both m andn tend to∞. A proof for this phenomenon can be foundin [18, Theorem 6].

V. EXPERIMENTAL VALIDATION

This section provides an experimental evaluation ofdRRT∗ by demonstrating practical convergence, scalabil-ity for disk robots, and applicability to dual-arm manip-ulation. The approach and alternatives are executed on acluster with Intel(R) Xeon(R) CPU E5-4650 @ 2.70GHzprocessors, and 128GB of RAM. 5

Fig. 5. The 2D environment wherethe 2 disk robots operate.

2 Disk Robots among2D Polygons: Thisbase-case test involves2 disks (Ci := R2)of radius 0.2 withbounded velocity, in a10 × 10 region, in-flated by the radius,as in Figure 5. Thedisks have to swap po-sitions between (0, 0)and (9, 9). This is asetup where it is pos-sible to compute the explicit roadmap, which is notpractical in more involved scenarios. In particular, dRRT∗

is tested against: a) running A* on the implicit tensorroadmap G (referred to as “Implicit A*”), where G isdefined over the same individual roadmaps with N nodesas those used by dRRT∗; and b) an explicitly constructedPRM∗ roadmap with N2 nodes in C.

Fig. 6. For every n, 10 randomly generated pairs of roadmaps aregenerated. dRRT∗ runs 5 random experiments for every roadmap pair,and the implicit A* searches these 10 tensor combinations. PRM∗ is run10 times for every n. Average solution cost is reported over iterations.Data averaged over 10 roadmap pairs. dRRT∗ (solid line) convergesto the optimal path through G (dashed line).

Results are shown in Figure 6. dRRT∗ converges tothe optimal path over G, similar to the one discoveredby Implicit A*, while quickly finding an initial solutionof high quality. Furthermore, the implicit tensor product

5Additional data are provided in Appendix B.

5

roadmap G is of comparable quality to the explicitlyconstructed roadmap.

Table I presents running times. dRRT∗ and implicitA* construct 2 N -sized roadmaps (row 3), which arefaster to construct than the PRM∗ roadmap in C (row 1).PRM∗ becomes very costly as N increases. For N =500, the explicit roadmap contains 250, 000 vertices,taking 1.7GB of RAM to store, which was the upperlimit for the machine used. When the roadmap can beconstructed, it is quicker to query (row 2). dRRT∗ quicklyreturns an initial solution (row 5), and converges within5% of the optimum length (row 6) well before ImplicitA* returns a solution as N increases (row 4). The nextbenchmark further emphasizes this point.

TABLE ICONSTRUCTION AND QUERY TIMES (SECS) FOR 2 DISK ROBOTS.

Number of nodes: N = 50 100 200N2-PRM* construction 3.427 13.293 69.551N2-PRM* query 0.002 0.004 0.0232 N -size PRM* construction 0.1351 0.274 0.558Implicit A* search over G 0.684 2.497 10.184dRRT∗ over G (initial) 0.343 0.257 0.358dRRT∗ over G (converged) 3.497 4.418 5.429

Fig. 7. Data averaged over 10 runs for 3 to 10 robots. (Top):Relative solution cost and success ratio of dRRT∗, dRRT and RRT* forincreasing R. dRRT∗: average iteration and variance for initial solution(top of box), and solution cost and variance after 100, 000 iterations(bottom). Similar results for RRT*. Single data point for dRRT (noquality improvement after first solution). (Bottom): Solution costs overtime.

Many Disk Robots among 2D Polygons: In thesame environment as above, the number of robots R isincreased to evaluate scalability. Each robot starts on theperimeter of the environment and is tasked with reachingthe opposite side. An N = 50 roadmap is constructed for

every robot. It quickly becomes intractable to constructa PRM∗ roadmap in the composite space of many robots.

Figure 7 shows the inability of alternatives to competewith dRRT∗ in scalability. Solution costs are normalizedby an optimistic estimate of the path cost for each case,which is the sum of the optimal solutions for each robot,disregarding robot-robot interactions. Implicit A* fails toreturn solutions even for 3 robots. Directly executingRRT* in the composite space fails to do so for R ≥ 6.The original dRRT method (without the informed searchcomponent) starts suffering in success ratio for R ≥ 5and returns worse solutions than dRRT∗. The averagesolution times for dRRT may decrease as R increasesbut this is due to the decreasing success ratio, i.e., dRRTbegins to only succeed at easy problems.

Fig. 8. (Top): dRRT∗ is run for a dual-arm manipulator to go from itshome position (above) to a reaching configuration (below). 5 randomexperiments are run for 4 random roadmap pairs for every n. dRRT∗achieves perfect success ratio as n increases. (Bottom): dRRT∗ solutionquality over time. Here, larger roadmaps provide benefits in terms ofrunning time and solution quality.

Dual-arm manipulator: This test shows the ben-efits of dRRT∗ when planning for two 7-dimensionalarms. Figure 8 shows that RRT* fails to return solutionswithin 100K iterations. Using small roadmaps is alsoinsufficient for this problem. Both dRRT∗ and ImplicitA* require larger roadmaps to begin succeeding. Butwith N ≥ 500, Implicit A* always fails, while dRRT∗

maintains a 100% success ratio. As expected, roadmapsof increasing size result in higher quality path. Theinformed nature of dRRT∗ also allows to find initialsolutions fast, which together with the branch-and-boundprimitive allows for good convergence.

VI. DISCUSSION

This work proves asymptotic optimality of sampling-based multi-robot planning over implicit structures usingthe dRRT∗ approach, which performs efficient, informedsearch. It constructs asymptotically optimal roadmaps

6

for each robot, but searches over an implicitly definedtensor product, and avoids the need to construct large,dense roadmaps in the composite space of many robots.

Future work includes the consideration of dynamicsgiven recent results [16], [15]. Furthermore, the analysisshould be valid for a wide array of cost functions, butthis must be shown. This approach can be leveragedtoward efficiently solving simultaneous task and motionplanning for many robot manipulators [2].

REFERENCES

[1] A. Dobson and K. E. Bekris. Sparse Roadmap Spanners forAsymptotically Near-Optimal Motion Planning. Intern. Journalof Robotics Research (IJRR), 33(1):18–47, 2014.

[2] A. Dobson and K. E. Bekris. Planning Representations and Al-gorithms for Prehensile Multi-Arm Manipulation. In IEEE/RSJIntern. Conf. on Intelligent Robots and Systems (IROS), 2015.

[3] A. Dobson, G. Moustakides, and K. E. Bekris. Geometric Prob-ability Results For Bounding Path Quality In Sampling-BasedRoadmaps After Finite Computation. In IEEE InternationalConference on Robotics and Automation (ICRA), 2015.

[4] M. Gharbi, J. Cortes, and T. Simeon. Roadmap Composition forMulti-Arm Systems Path Planning. In IEEE/RSJ InternationalConference on Intelligent Robots and Systems (IROS), 2009.

[5] F. Gravot and R. Alami. A Method for Handling MultipleRoadmaps and Its Use for Complex Manipulation Planning. InIEEE Intern. Conf. on Robotics and Automation (ICRA), 2003.

[6] C. Grinstead and J. Snell. Introduction to Probability. AmericanMathmatical Society, Providence, RI, 2012.

[7] L. Janson, A. Schmerling, A. Clark, and M. Pavone. FastMarching Tree: a Fast marching Sampling-Based Method forOptimal Motion Planning in Many Dimensions. InternationalJournal of Robotics Research (IJRR), 34(7):883–921, 2015.

[8] S. Karaman and E. Frazzoli. Sampling-based Algorithms forOptimal Motion Planning. International Journal of RoboticsResearch (IJRR), 30(7):846–894, June 2011.

[9] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. Overmars.Probabilistic Roadmaps for Path Planning in High-DimensionalConfiguration Spaces. IEEE Transactions on Robotics andAutomation, 12(4):566–580, 1996.

[10] S. M. LaValle and J. J. Kuffner. Randomized KinodynamicPlanning. International Journal of Robotics Research (IJRR),20:378–400, May 2001.

[11] S. Leroy, J. Laumond, and Simeon. Multiple Path Coordinationfor Mobile Robots: A Geometric Algorithm. In Intern. JointConf. on Artificial Intelligence (IJCAI), 1999.

[12] S. R. Lindemann and S. M. LaValle. Incrementally reducingdispersion by increasing voronoi bias in rrts. In IEEE Intern.Conf. on Robotics and Automation (ICRA), 2004.

[13] J. Marble and K. E. Bekris. Asymptotically Near-Optimal Plan-ning with Probabilistic Roadmap Spanners. IEEE Transactionson Robotics, 29(2):432–444, 2013.

[14] J. Peng and S. Akella. Coordinating Multiple Robots with Kin-odynamic Constraints Along Specified Paths. The InternationalJournal of Robotics Research, 24(4):295–310, 2005.

[15] E. Schmerling, L. Janson, and M. Pavone. Optimal sampling-based motion planning under differential constraints: The driftcase with linear affine dynamics. In IEEE Conference onDecision and Control (CDC), 2015.

[16] E. Schmerling, L. Janson, and M. Pavone. Optimal sampling-based motion planning under differential constraints: The drift-less case. In IEEE International Conference on Robotics andAutomation (ICRA), 2015.

[17] J. T. Schwartz and M. Sharir. On the Piano Movers’ Problem:III. Coordinating the Motion of Several Independent Bodies: TheSpecial Case of Circular Bodies Moving Amidst Polygonal Barri-ers. The International Journal of Robotics Research, 2(3):46–75,1983.

[18] K. Solovey, O. Salzman, and D. Halperin. Finding a needle in anexponential haystack: Discrete RRT for exploration of implicitroadmaps in multi-robot motion planning. The InternationalJournal of Robotics Research, 35(5):501–513, 2016.

[19] K. Solovey, O. Salzman, and D. Halperin. New Perspectiveon Sampling-based Motion Planning via Random GeometricGraphs. In Robotics Science and Systems (RSS), 2016.

[20] P. Svestka and M. Overmars. Coordinated Path Planning forMultiple Robots. Robotics and Autonomous Systems, 23:125–152, 1998.

[21] S. Tang and V. Kumar. A Complete Algorithm for GeneratingSafe Trajectories for Multi-Robot Teams. In InternationalSymposium on Robotics Research (ISRR), Sestri Levante, Italy,September 2015.

[22] J. Van Den Berg, S. Guy, M. Lin, and D. Manocha. Reciprocaln-body collision avoidance, volume 70 of Springer Tracts inAdvanced Robotics, pages 3–19. Star edition, 2011.

[23] J. van den Berg, J. Snoeyink, M. Lin, and D. Manocha. Cen-tralized Path Planning for Multiple Robots: Optimal Decouplinginto Sequential Plans. In Robotics: Science and Systems (RSS),2009.

[24] G. Wagner and H. Choset. Subdimensional Expansion for Mul-tirobot Path Planning. Artificial Intelligence Journal, 219:1024,2015.

APPENDIX

A. Proof of Convergence

This appendix examines the result of Theorem 2,and formally proves the convergence of the dRRT∗ treetoward containing all optimal paths.

Lemma 2 (Optimal Tree Convergence of dRRT∗):Consider an arbitrary optimal path π∗ originating fromv0 and ending at vt, then let O(m)

k be the event suchthat after m iterations of dRRT∗, the search tree Tcontains the optimal path up to segment k. Then,

lim infm→∞

P(O

(m)t

)= 1.

Proof. This is shown using Markov chain results [6,Theorem 11.3]. Specifically, absorbing Markov chainscan be leveraged to show that dRRT∗ will eventuallycontain the optimal path over G. An absorbing Markovchain has some subset of its states in which the transitionmatrix only allows self-transitions.

The proof follows by showing that the dRRT∗ methodcan be described as an absorbing Markov chain, wherethe target state of a query is represented as an absorbingstate in a Markov chain. For completeness, the theoremis re-stated here.

Theorem 3 (Thm 11.3 in Grinstead & Snell):In an absorbing Markov chain, the probability thatthe process will be absorbed is 1 (i.e., Q(m) → 0 asn → ∞), where Q(m) is the transition submatrix forall non-absorbing states.

7

The first part is that the dRRT∗ search is cast as anabsorbing Markov chain, and second, that the transitionprobability from each state to the next is nonzero, i.e.,each state eventually connects to the target.

For query (S, T ), let the sequence V ={v1, v2, . . . , vt} of length t represent the verticesof G corresponding to the optimal path throughthe graph which connects these points, where vtcorresponds to the target vertex, and furthermore, letvt be an absorbing state. Theorem 3 operates underthe assumption that each vertex vi is connected to anabsorbing state (vt in this case).

Then, let the transition probability for each state havetwo values, one for each state transitioning to itself,which corresponds to the dRRT∗ search expanding alongsome other arbitrary path. The other value is a transitionprobability from vi to vi+1. This corresponds to themethod sampling within the volume Vol(vi).

Then, as the second step, it must be shown that thisvolume has a positive probability of being sampledin each iteration. It is sufficient then to argue thatµ(Vol(si))µ(Cf)

> 0. Fortunately, for any finite n, previouswork has already shown that this is the case givengeneral position assumptions [18, Lemma 2].

Given these results, the dRRT∗ is cast as an absorbingMarkov chain, which satisfies the assumptions of 3, andtherefore, the matrix Q(m) → 0. This implies that theoptimal path to the goal has been expanded in the tree,and therefore lim infm→∞ P

(O

(m)t

)= 1. �

B. More Experimental Data

This appendix presents additional experimental data.

Fig. 9. 2-Robot convergence data over time.

1) 2-Robot Benchmark: For the 2 disk robots case,Figure 9 provides the solution cost found by dRRT∗overcomputation time instead of over iterations.

Fig. 10. R-robot solution times for varying R.

2) R Disk Robots: For the R disk robots, Figure 10shows query resolution times for the various methods.

Fig. 11. (Top): Convergence rate and success ratio over the minimal9-node roadmap (Bottom): Solution cost over time when using theminimal roadmap.

To emphasize the lack of scalability for alternatemethods, additional experiments were run using asmaller, manually crafted 9-node roadmap, shown inFigure 12. Each robot roadmap samples its nodes withinthe shaded regions indicated in the figure.

Fig. 12. Minimal graphfor the R-robot case.

Figure 11 indicates that evenusing a very well-crafted andsmall roadmap for a problemdoes not help the alternate meth-ods to scale. While they per-form better than with the randomroadmaps, Implicit A* times outfor R ≥ 5, and RRT* times outfor R ≥ 6.

3) Manipulator Benchmark: For the dual-arm manip-ulator benchmark, Figure 13 presents solution qualityover iterations instead of over time.

Fig. 13. Motoman benchmark solution quality over iterations.

8