Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping
Pradeep Varakantham, Singapore Management University
Joint work with J.Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe


TRANSCRIPT

Page 1:

Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping

Pradeep Varakantham Singapore Management University

Joint work with J.Y.Kwak, M.Taylor, J. Marecki, P. Scerri, M.Tambe

Page 2: Motivating Domains

Disaster rescue, sensor networks

Characteristics of these domains: uncertainty, coordination of multiple agents, sequential decision making

Page 3: Meeting the Challenges

Problem: Multiple agents coordinating to perform multiple tasks in the presence of uncertainty

Solution: Represent as a Distributed POMDP and solve. Computing an optimal solution is NEXP-complete, so we use an approximate algorithm that dynamically exploits structure in interactions.

Result: Vast improvement in performance over existing algorithms

Page 4:

Outline

Illustrative Domain

Model

Approach: Exploit dynamic structure in interactions

Results

Page 5: Illustrative Domain

Multiple types of robots

Uncertainty in movements

Reward: saving victims, collisions, clearing debris

Maximize expected joint reward

Page 6: Model

DisPOMDPs with Coordination Locales (DPCL)

Joint model: <S, A, Ω, P, R, O, Ag>

Global state represents completion of tasks

Agents are independent except in coordination locales (CLs)

Two types of CLs:

Same-time CL (e.g., agents colliding with each other)

Future-time CL (e.g., a cleaner robot clearing debris assists a rescue robot in reaching its goal)

Individual observability

Page 7: Solving DPCLs with TREMOR

TREMOR: Teams REshaping of MOdels for Rapid execution

Two steps:

1. Branch and bound search, with MDP-based heuristics

2. Task assignment evaluation, by computing policies for every agent; joint policy computation is performed only at CLs

Page 8:

1. Branch and Bound search
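The branch-and-bound step can be illustrated with a best-first sketch over task-to-agent assignments. This is a minimal sketch under my own assumptions, not the authors' implementation: `upper_bound` stands in for the MDP-based heuristic (assumed to overestimate the true value) and `evaluate` for the full POMDP-based assignment evaluation; both names are hypothetical.

```python
import heapq

def branch_and_bound(tasks, agents, upper_bound, evaluate):
    """Best-first branch and bound over task-to-agent assignments.
    upper_bound: admissible (over-)estimate of a partial assignment's value,
    here assumed to come from an MDP relaxation; evaluate: exact value of a
    complete assignment (in TREMOR, via POMDP policy computation)."""
    best_val, best_assign = float("-inf"), None
    frontier = [(-upper_bound({}), ())]       # entries: (-bound, partial tuple)
    while frontier:
        neg_bound, partial = heapq.heappop(frontier)
        if -neg_bound <= best_val:
            continue                          # bound cannot beat incumbent: prune
        if len(partial) == len(tasks):
            val = evaluate(dict(partial))     # full evaluation of a leaf
            if val > best_val:
                best_val, best_assign = val, dict(partial)
            continue
        task = tasks[len(partial)]            # branch on the next unassigned task
        for ag in agents:
            child = partial + ((task, ag),)
            bound = upper_bound(dict(child))
            if bound > best_val:
                heapq.heappush(frontier, (-bound, child))
    return best_assign, best_val
```

Pruning is sound only if the heuristic never underestimates the achievable joint reward, which is why an MDP relaxation (ignoring partial observability) is a natural choice.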

Page 9: 2. Task Assignment Evaluation

Until convergence of policies or maximum iterations:

1) Solve individual POMDPs

2) Identify potential coordination locales

3) Based on the type and value of each coordination locale, shape P and R of the relevant individual agents, to capture interactions and encourage/discourage them

4) Go to step 1
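This iterative loop can be sketched as follows; the three helper functions are injected and are purely hypothetical stand-ins for the paper's components (individual POMDP solving, CL detection, and P/R shaping):

```python
def tremor_task_evaluation(agents, solve_pomdp, find_cls, shape_model, max_iters=20):
    """Sketch of the task-assignment evaluation loop: repeatedly solve each
    agent's individual POMDP, detect coordination locales under the current
    policies, and reshape the involved agents' transition/reward models."""
    policies = {}
    for _ in range(max_iters):
        new_policies = {a: solve_pomdp(a) for a in agents}      # step 1
        if new_policies == policies:
            return new_policies                # converged: no policy changed
        policies = new_policies
        for cl, involved in find_cls(agents, policies):         # step 2
            for a in involved:                 # step 3: shape P and R to
                shape_model(a, cl)             # encourage/discourage the CL
    return policies
```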

Page 10: Identifying Potential CLs

CL = <State, Action>

Probability of a CL occurring at a time step T, given the starting belief: standard belief update given the policy

Policy over belief states

Probability of observing ω in belief state b

Updating b
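The belief-propagation machinery named on this slide can be written out as a minimal sketch (the array layout and function names are my assumptions, not the paper's code): a standard Bayesian belief update, and a CL-probability computation that rolls the belief forward under the policy.

```python
import numpy as np

def belief_update(b, a, w, P, O):
    """Standard update: b'(s') ∝ O(w | s', a) * Σ_s P(s' | s, a) b(s).
    P[a] is an |S|x|S| transition matrix, O[a] an |S|x|Ω| observation matrix."""
    unnorm = O[a][:, w] * (b @ P[a])
    pw = unnorm.sum()                         # probability of observing w in b
    return (unnorm / pw, pw) if pw > 0 else (None, 0.0)

def cl_probability(b0, policy, cl_state, cl_action, P, O, T):
    """Probability that the agent is at cl_state taking cl_action at step T,
    branching over observation sequences (exponential in T; small horizons)."""
    frontier = [(b0, 1.0)]                    # (belief, probability of branch)
    for t in range(T):
        nxt = []
        for b, pr in frontier:
            a = policy(b, t)
            for w in range(O[a].shape[1]):
                b2, pw = belief_update(b, a, w, P, O)
                if pw > 0:
                    nxt.append((b2, pr * pw))
        frontier = nxt
    return sum(pr * b[cl_state] for b, pr in frontier if policy(b, T) == cl_action)
```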

Page 11: Type of CL

STCL, if there exist s and a for which the transition or reward function is not decomposable:

P(s,a,s') ≠ Π_{1≤i≤N} P((s_g,s_i),a_i,(s_g',s_i'))  OR  R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i),a_i,(s_g',s_i'))

FTCL, if completion of a task (global state) by an agent at t' affects the transitions/rewards of other agents at t

Page 12: Shaping the Model (STCL)

Shaping the transition function: from the joint transition probability when the CL occurs, derive a new transition probability for agent i

Shaping the reward function
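Concretely, STCL shaping might look like the following sketch (a guess at one reasonable realization; the array layout, `p_cl`, and the marginal `P_joint_i` are my assumptions): agent i's transition row at the locale is mixed with the outcome it experiences when the interaction actually occurs, and the expected interaction cost is folded into its reward.

```python
import numpy as np

def shape_transition(P_i, s, a, p_cl, P_joint_i):
    """Mix agent i's individual transition row at (s, a) with the marginal
    transition it experiences when the CL occurs; p_cl is the probability
    (derived from the other agents' policies) that the CL arises."""
    P_new = P_i.copy()                        # P_i: array of shape (|A|, |S|, |S|)
    P_new[a][s] = (1 - p_cl) * P_i[a][s] + p_cl * P_joint_i
    return P_new

def shape_reward(R_i, s, a, p_cl, r_cl):
    """Fold the expected interaction reward/penalty (e.g., a collision cost)
    into agent i's reward at (s, a), weighted by the CL's probability."""
    R_new = R_i.copy()                        # R_i: array of shape (|A|, |S|)
    R_new[a][s] += p_cl * r_cl
    return R_new
```

Because each agent then re-solves its individual POMDP against the shaped model, the social effect of the interaction is felt without ever building the full joint model.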

Page 13: Results

Benchmark algorithms: Independent POMDPs; Memory Bounded Dynamic Programming (MBDP)

Criteria: decision quality, run-time

Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon

Page 14: State space

Page 15: Agents

Page 16: Coordination Locales

Page 17: Time Horizon

Page 18: Related Work

DEC-MDPs: assume individual or collective full observability; task allocation and dependencies given as input

DEC-POMDPs: JESP, MBDP; exploiting independence in transition/reward/observation

Model shaping: Guestrin and Gordon, 2002

Page 19: Conclusion

DPCL, a specialization of Distributed POMDPs

TREMOR exploits the presence of few CLs in domains

TREMOR depends on single-agent POMDP solvers

Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems

Page 20:

Questions?

Page 21: Same-Time CL (STCL)

There is an STCL if:

Transition function is not decomposable: P(s,a,s') ≠ Π_{1≤i≤N} P((s_g,s_i),a_i,(s_g',s_i')), OR

Observation function is not decomposable: O(s',a,o) ≠ Π_{1≤i≤N} O(o_i,a_i,(s_g',s_i')), OR

Reward function is not decomposable: R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i),a_i,(s_g',s_i'))

Ex: Two robots colliding in a narrow corridor
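For small enumerated models, the decomposability condition can be checked numerically. A sketch (the dictionary encoding of joint states and actions as tuples is my assumption):

```python
def is_decomposable(P_joint, P_individuals, tol=1e-9):
    """Return True if the joint transition function factors into a product of
    per-agent transitions, i.e. no same-time coordination locale exists.
    P_joint maps (s, a, s') over tuples of per-agent components to a probability;
    P_individuals[i] maps (s_i, a_i, s_i') to agent i's individual probability."""
    for (s, a, s2), p in P_joint.items():
        prod = 1.0
        for i, P_i in enumerate(P_individuals):
            prod *= P_i.get((s[i], a[i], s2[i]), 0.0)
        if abs(p - prod) > tol:
            return False          # this <s, a> witnesses an STCL
    return True
```

The same pattern applies to the observation function (with a product) and to the reward function (with a sum over agents).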

Page 22: Future-Time CL (FTCL)

Actions of one agent at t' can affect the transitions, observations, or rewards of other agents at t:

P((s^t_g,s^t_i), a^t_i, (s'^t_g,s'^t_i) | a^{t'}_j) ≠ P((s^t_g,s^t_i), a^t_i, (s'^t_g,s'^t_i)), for some t' < t

R((s^t_g,s^t_i), a^t_i, (s'^t_g,s'^t_i) | a^{t'}_j) ≠ R((s^t_g,s^t_i), a^t_i, (s'^t_g,s'^t_i)), for some t' < t

O(ω^t_i, a^t_i, (s'^t_g,s'^t_i) | a^{t'}_j) ≠ O(ω^t_i, a^t_i, (s'^t_g,s'^t_i)), for some t' < t

Ex: Clearing of debris assists rescue robots in getting to victims faster
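As a toy illustration of a future-time CL (all states, flags, and probabilities here are invented for illustration, not taken from the paper): the rescue robot's transition depends on a global task flag that only a different agent's earlier action can set.

```python
def rescue_transition(s_g, s_i, a_i):
    """Toy FTCL: the rescue robot's chance of getting through the corridor
    depends on the global flag 'debris_cleared', which the cleaner robot's
    earlier action sets. Numbers are illustrative only."""
    if s_i == "corridor" and a_i == "move":
        p = 0.9 if s_g["debris_cleared"] else 0.3
        return {"goal": p, "corridor": 1 - p}
    return {s_i: 1.0}                 # other (state, action) pairs: stay put
```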