Information Sharing for Distributed Planning
Prasanna Velagapudi ([email protected])


[Rescue domain map legend: Rescue Agent, Cleaner Agent, Narrow Corridor, Victim, Unsafe Cell, Clearable Debris]

Distributed Path Planning (DPP)
Consider the problem of a team of agents given start and goal locations and asked to find a set of collision-free paths through time and space. One approach that has been shown effective for reasonably large teams is prioritized planning [1]. We present a distributed extension to this algorithm that may improve scalability in certain cases.
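As a concrete illustration of the collision-free constraint, the minimal Python sketch below represents each agent's plan as a sequence of grid cells indexed by time step and checks a pair of paths for vertex and swap conflicts. The grid representation and the function name are assumptions made for this sketch, not part of the referenced algorithms.

```python
from typing import List, Tuple

Cell = Tuple[int, int]   # (row, col) grid cell; representation assumed for this sketch
Path = List[Cell]        # path[t] = cell occupied at time step t


def has_collision(path_a: Path, path_b: Path) -> bool:
    """Return True if two space-time paths conflict.

    Checks two conflict types: occupying the same cell at the same
    time step, and swapping cells between consecutive time steps.
    Agents are assumed to wait at their final cell after finishing.
    """
    horizon = max(len(path_a), len(path_b))
    for t in range(horizon):
        a_now = path_a[min(t, len(path_a) - 1)]
        b_now = path_b[min(t, len(path_b) - 1)]
        if a_now == b_now:  # vertex conflict
            return True
        if t > 0:
            a_prev = path_a[min(t - 1, len(path_a) - 1)]
            b_prev = path_b[min(t - 1, len(path_b) - 1)]
            if a_now == b_prev and b_now == a_prev:  # swap conflict
                return True
    return False
```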

L-TREMOR
We present Large-scale Teams REshaping of MOdels for Rapid-execution (L-TREMOR), a distributed version of the TREMOR [2] algorithm for solving distributed POMDPs with coordination locales (DPCLs). In DPCL problems, agents are typically able to act independently, except in certain sets of states known as coordination locales.
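As a rough sketch only, a coordination locale can be thought of as the set of joint states in which otherwise-independent agent models become coupled. The class and field names below are invented for illustration and are not taken from the TREMOR or DPCL papers.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple

# A joint state is represented here as a set of (agent_id, local_state) pairs;
# this representation is an assumption made for the sketch.
JointState = FrozenSet[Tuple[int, str]]


@dataclass
class CoordinationLocale:
    """A set of joint states where agents' transitions and rewards interact.

    Outside its locales, each agent plans against its own independent
    model; inside a locale, outcomes depend on the joint action, so
    agents must coordinate (e.g., via reward shaping).
    """
    agents: FrozenSet[int]  # agents that can interact in this locale
    states: FrozenSet[JointState] = field(default_factory=frozenset)

    def involves(self, agent_id: int) -> bool:
        return agent_id in self.agents


# Example: a narrow corridor that agents 3 and 7 cannot occupy at the same time.
corridor = CoordinationLocale(
    agents=frozenset({3, 7}),
    states=frozenset({frozenset({(3, "corridor_cell"), (7, "corridor_cell")})}),
)
```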

Acknowledgements
This research has been funded in part by the AFOSR MURI grant FA9550-08-1-0356. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.

Tractable Planning in Large Teams
Emerging team applications require the cooperation of thousands of members (humans, robots, agents). Team members must complete complex, collaborative tasks in dynamic and uncertain environments. How can we effectively and tractably plan in these domains?


Prioritized Planning [1]

Distributed Prioritized Planning (DPP)

Example Map: Planning for 240 agents

Example Map: Rescue Domain

Scaling up from TREMOR [2] to L-TREMOR

Search and Rescue · Disaster Response · UAV Surveillance

[Figure: agents A, B, C, and D planning from start to goal locations]

Preliminary Results

Conclusions and Future Work
In this work, we investigate two related approaches that scale distributed planning to hundreds of agents using information exchange and reward shaping. Preliminary work suggests that these techniques may provide competitive performance while improving scalability and reducing computational cost. We are working to further improve performance through better modeling of system dynamics and more intelligent dissemination of information over the network.

References
[1] J. van den Berg and M. Overmars, "Prioritized Motion Planning for Multiple Robots," Proc. of IEEE/RSJ IROS, 2005.
[2] P. Varakantham, J. Kwak, M. Taylor, J. Marecki, P. Scerri, and M. Tambe, "Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping," Proc. of ICAPS, 2009.
[3] P. Varakantham, R. T. Maheswaran, T. Gupta, and M. Tambe, "Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs," Proc. of IJCAI, 2007.

          | Role Allocation       | Policy Solution             | Interaction Detection      | Coordination
----------+-----------------------+-----------------------------+----------------------------+-------------------------------------
TREMOR    | Branch & Bound MDP    | Independent EVA [3] solvers | Joint policy evaluation    | Reward shaping of independent models
L-TREMOR  | Decentralized Auction | Independent EVA [3] solvers | Sampling & message passing | Reward shaping of independent models

Iterative Distributed Planning
One strategy that can be applied to this problem is iterative, independent planning coupled with social model shaping. While the specifics vary by domain, the general process can be broken into a few basic steps (a code sketch of the loop follows step ④):

① Factor the problem and enumerate the set of interactions in the problem state

Create a set of functions that will enable agents to plan independently, except when they are involved in an interaction.

② Compute independent plans and find potential interactions between agents

Each agent computes an independent plan using its local knowledge of the problem. Using this plan, it can search over all possible interactions to find a set of interactions that might involve it.

③ Exchange messages about interactions

Once an agent has some idea of what interactions it could be involved in, it can communicate information about those interactions and how it expects to be affected to its teammates.

④ Use exchanged information to improve local model when replanning

Now that agents have exchanged information, they have a better idea of which interactions could occur, and how likely they are to occur. They can use this information to improve their local model and return to step 2 to plan again.
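The minimal Python sketch below shows how steps ② through ④ might fit together from a single agent's perspective. The agent methods plan_independently, find_potential_interactions, broadcast, receive_messages, and shape_local_model are hypothetical placeholders, not functions from TREMOR or L-TREMOR.

```python
def iterative_distributed_planning(agent, interactions, num_rounds=5):
    """Sketch of the plan / detect / exchange / reshape loop (steps 2-4).

    `agent` is assumed to expose a local model plus the placeholder
    methods used below; `interactions` is the enumerated set of possible
    interactions from step 1 (e.g., coordination locales).
    """
    plan = None
    for _ in range(num_rounds):
        # Step 2: plan against the (possibly reshaped) local model only.
        plan = agent.plan_independently()

        # Step 2 (cont.): search the enumerated interactions for those
        # this plan might actually trigger.
        likely = agent.find_potential_interactions(plan, interactions)

        # Step 3: tell teammates which interactions may involve us and
        # how we expect to be affected by them.
        agent.broadcast(likely)
        incoming = agent.receive_messages()

        # Step 4: fold teammates' estimates back into the local model
        # (e.g., by reward shaping), then replan on the next iteration.
        agent.shape_local_model(incoming)

    return plan
```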

Command & Control

Goal: Get rescue agents to as many victims as possible within a fixed time horizon.

Agents can interact through narrow corridors (only one agent can fit at a time) and clearable debris (blocks rescue agents, but can be cleared by cleaner agents).

Goal: Get every agent on the team from a start location to a goal location, with no collisions with other agents or with map obstacles.

Agents can interact at any state, at any time step, and the interaction (a collision) is highly undesirable for the team.

Empirically computed joint reward is shown for L-TREMOR and an independent solution on three different maps. The results show that improvement occurs, but it is sporadic and unstable.

[Plots: empirical joint reward on three maps: N = 6, N = 10, and N = 100 (structurally similar to N = 10)]

When cumulative planning time (over all agents) is normalized by team size, it is evident that L-TREMOR scales close to linearly with team size.

While both planners attain almost identical solutions, the centralized prioritized planner is faster, even though the distributed planner uses far fewer iterations.

This is because the distributed planner must sometimes replan difficult paths, and does not use an incremental planner.

[Plot legend: DPP vs. Centralized]

In DPP, agents plan simultaneously, then exchange paths. If an agent receives a conflicting path of higher priority, it adds it as a dynamic obstacle.
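A minimal sketch of one such round, from a single agent's point of view, is shown below. The plan_path and paths_conflict callables are placeholders assumed for this sketch, and the priority convention (a larger number means higher priority) is also an assumption rather than part of DPP.

```python
def dpp_round(agent, plan_path, paths_conflict):
    """One round of Distributed Prioritized Planning for a single agent.

    `plan_path(start, goal, dynamic_obstacles)` and
    `paths_conflict(path_a, path_b)` are placeholder callables assumed
    for this sketch; a larger `priority` value means higher priority.
    Returns True if a higher-priority conflict forces a replan.
    """
    # All agents plan simultaneously against their current obstacle sets.
    agent.path = plan_path(agent.start, agent.goal, agent.dynamic_obstacles)

    # Exchange paths with teammates (the communication layer is not shown).
    agent.broadcast(agent.path)
    received = agent.receive_paths()   # list of (priority, path) pairs

    # Conflicting paths of higher priority become dynamic obstacles that
    # this agent must plan around on the next round.
    replan_needed = False
    for other_priority, other_path in received:
        if other_priority > agent.priority and paths_conflict(agent.path, other_path):
            agent.dynamic_obstacles.append(other_path)
            replan_needed = True
    return replan_needed
```

In a full DPP run, such rounds would presumably repeat until no agent needs to replan.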

In prioritized planning, agents are assigned some order, often using a heuristic such as distance to goal. Then, starting with the highest-priority (furthest) agent, plans are generated in sequence. Each agent's path is used as a dynamic obstacle for all subsequent agents.
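A minimal sketch of this centralized loop is given below, assuming a grid world, a Manhattan distance-to-goal heuristic for ordering, and a hypothetical single-agent space-time planner plan_path; none of these specifics come from [1].

```python
def prioritized_planning(agents, plan_path):
    """Sketch of centralized prioritized planning.

    Each agent has `start` and `goal` grid coordinates; `plan_path` is a
    hypothetical single-agent space-time planner that treats previously
    planned paths as moving obstacles (e.g., space-time A*).
    """
    # Heuristic ordering: agents farthest from their goals plan first
    # (Manhattan distance is an assumption of this sketch).
    def distance_to_goal(a):
        return abs(a.start[0] - a.goal[0]) + abs(a.start[1] - a.goal[1])

    planned_paths = []
    for agent in sorted(agents, key=distance_to_goal, reverse=True):
        # Each agent plans around the paths of all higher-priority agents.
        path = plan_path(agent.start, agent.goal, dynamic_obstacles=planned_paths)
        planned_paths.append(path)
    return planned_paths
```

Because each subsequent agent must avoid a growing set of dynamic obstacles, low-priority agents may be forced into long detours, which is why the ordering heuristic matters.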

Comparing centralized and distributed performance

Comparing the reward expected by agents to the actual joint reward reveals a negative linear relationship between improvement over an independent solution and the error in estimating reward.