1University of Southern California
Towards A Formalization Of Teamwork With Resource Constraints
Praveen Paruchuri, Milind Tambe, Fernando Ordonez
University of Southern California
Sarit Kraus
Bar-Ilan University, Israel
University of Maryland, College Park
December, 2003
Motivation: Teamwork with Resource Constraints
Agent teams: Agents maximize team rewards and also ensure limited resource consumption
E.g., limited communication bandwidth, limited battery power, etc.
Example domains:
Sensor Net Agents - Limited replenishable energy
Mars Rovers - Limited energy for each daily activity
Framework & Context
Framework for agent teams with resource constraints in complex and dynamic environments
Resource constraints are soft, not “hard”:
Okay for a sensor to exceed its energy threshold when needed.
Okay for a Mars rover to exceed its allocated energy once in a while for a regular activity.
Context:
                               Single Agent    Multi Agent
Without resource constraints   MDP, POMDP      MTDP
With resource constraints      CMDP            ???
Our Contributions
Extended MTDP (EMTDP): a distributed MDP framework.
EMTDP ≠ CMDP with many agents: policy randomization in a CMDP causes miscoordination in teams.
Algorithm for transforming the conjoined EMTDP (initial formulation dealing with joint actions) into the actual EMTDP (reasoning about individual actions).
Proof of equivalence between different transformations.
Solution algorithm for the actual EMTDP.
Maximize Expected Team Reward
Bound Expected Resource Consumption
E-MTDP: Formally Defined
An E-MTDP (for the 2-agent case) is a tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where:
S, A, P, R: as defined in MTDP.
C1 = [$c^1_{i\hat{a}k}$]: vector of costs of resource k for joint action $\hat{a}$ in state i (for agent 1).
T1 = [$t^1_k$]: thresholds on the expected consumption of resource k.
N = [$n_{i\hat{a}}$]: vector of joint communication costs for joint action $\hat{a}$ in state i.
Q: threshold on communication costs.
C2 and T2 are the corresponding quantities for agent 2.
Simplifying assumptions:
Individual observability (no POMDPs).
Two agents.
Conjoined EMTDP – Simple example
Two agent case
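[Figure: state-transition diagram over states S1 to S7. Edges are labeled with a joint action and its transition probability (e.g., a1b1 = 1, a1b2 = .9, a2b1 = .7, a2b2 = 1). Annotation at S1: R(S1, a2b2) = 9, C1(S1, a2b2) = 7, C2(S1, a2b2) = 7.]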
Linear Program : Solving Conjoined EMTDP
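LP for solving the MDP (maximizing reward):

$$\max \sum_{i} \sum_{\hat{a}} x_{i\hat{a}} \, r_{i\hat{a}}$$

subject to

$$\sum_{\hat{a}} x_{j\hat{a}} - \sum_{i} \sum_{\hat{a}} x_{i\hat{a}} \, p_{i\hat{a}j} = \alpha_j \quad \forall j$$

$$x_{i\hat{a}} \ge 0 \quad \forall i, \hat{a}$$

Handling constraints:

$$\sum_{i} \sum_{\hat{a}} x_{i\hat{a}} \, c^1_{i\hat{a}k} \le t^1_k$$

$$\sum_{i} \sum_{\hat{a}} x_{i\hat{a}} \, c^2_{i\hat{a}k} \le t^2_k$$

$$\sum_{i} \sum_{\hat{a}} x_{i\hat{a}} \, n_{i\hat{a}} \le Q$$

Here $x_{i\hat{a}}$ is the expected number of times joint action $\hat{a}$ is executed in state i, and $\alpha_j$ is the probability of starting in state j.
The first resource constraint says that the expected cost of resource k over all states and actions is at most $t^1_k$.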
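The constraints of this LP can be checked numerically for any candidate solution. The following is a minimal sketch, not the authors' implementation: the function names, the dictionary layout, the initial distribution `alpha`, the single resource per agent, and all toy numbers are assumptions made for illustration.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (state i, joint action a)


def expected_total(x: Dict[Key, float], coeff: Dict[Key, float]) -> float:
    """Sum of x[i, a] * coeff[i, a]; missing coefficients count as 0."""
    return sum(x_ia * coeff.get(key, 0.0) for key, x_ia in x.items())


def flow_residuals(x, p, alpha):
    """For each state j: sum_a x[j, a] - sum_{i, a} x[i, a] * p[i, a][j] - alpha[j]."""
    residuals = {}
    for j in alpha:
        outflow = sum(v for (i, _a), v in x.items() if i == j)
        inflow = sum(v * p.get(key, {}).get(j, 0.0) for key, v in x.items())
        residuals[j] = outflow - inflow - alpha[j]
    return residuals


def is_feasible(x, p, alpha, bounded, tol=1e-9):
    """Check non-negativity, flow conservation, and each (coefficients, threshold) bound."""
    if any(v < -tol for v in x.values()):
        return False
    if any(abs(r) > tol for r in flow_residuals(x, p, alpha).values()):
        return False
    return all(expected_total(x, coeff) <= limit + tol for coeff, limit in bounded)


# Hypothetical two-state toy problem (not taken from the slides).
alpha = {"S1": 1.0, "S2": 0.0}                      # always start in S1
p = {("S1", "a1b2"): {"S2": 1.0},                   # both joint actions lead to S2
     ("S1", "a2b1"): {"S2": 1.0},
     ("S2", "stop"): {}}                            # "stop" ends the episode
r = {("S1", "a1b2"): 10.0, ("S1", "a2b1"): 5.0}     # rewards
c1 = {("S1", "a1b2"): 7.0, ("S1", "a2b1"): 2.0}     # agent 1 cost of one resource
x = {("S1", "a1b2"): 0.36, ("S1", "a2b1"): 0.64, ("S2", "stop"): 1.0}

reward = expected_total(x, r)        # objective value of this candidate
usage = expected_total(x, c1)        # expected consumption for agent 1
ok_loose = is_feasible(x, p, alpha, [(c1, 4.0)])
ok_tight = is_feasible(x, p, alpha, [(c1, 3.0)])
```

With a threshold of 4.0 the candidate satisfies the resource bound, while a threshold of 3.0 rejects it, which is the kind of trade-off the LP resolves by randomizing over joint actions.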
Sample LP solution
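VISITED(X11)  0.000000    a1b1: executed 0% of the time
VISITED(X12)  0.3653846   a1b2: 36% ≈ 9/25
VISITED(X13)  0.6346154   a2b1: 64% ≈ 16/25
VISITED(X14)  0.000000    a2b2: 0%

If each agent randomizes independently according to its own share of this joint policy, the joint actions are actually executed with these probabilities:

              b1 (16/25)        b2 (9/25)
a1 (9/25)     144/625 = .23     81/625 = .13
a2 (16/25)    256/625 = .41     144/625 = .23

The a1b1 and a2b2 entries should have been 0 (miscoordination).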
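The miscoordination arithmetic can be reproduced in a few lines. This is a minimal sketch that uses the rounded fractions 9/25 and 16/25 from the slide; the function names are hypothetical, and exact fractions are used so that the results match the table without rounding noise.

```python
from fractions import Fraction
from itertools import product

# Joint policy from the LP solution, rounded as on the slide.
joint = {
    ("a1", "b1"): Fraction(0),
    ("a1", "b2"): Fraction(9, 25),
    ("a2", "b1"): Fraction(16, 25),
    ("a2", "b2"): Fraction(0),
}


def marginals(joint_policy):
    """Per-agent action probabilities implied by the joint policy."""
    pa, pb = {}, {}
    for (a, b), prob in joint_policy.items():
        pa[a] = pa.get(a, 0) + prob
        pb[b] = pb.get(b, 0) + prob
    return pa, pb


def independent_execution(joint_policy):
    """Joint distribution when each agent samples its own marginal independently."""
    pa, pb = marginals(joint_policy)
    return {(a, b): pa[a] * pb[b] for a, b in product(pa, pb)}


executed = independent_execution(joint)
# Probability mass on joint actions that the LP solution never intended.
miscoordination = sum(prob for pair, prob in executed.items() if joint[pair] == 0)
```

Under these numbers the team executes an unintended joint action with probability 288/625, about 46% of the time.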
Conjoined to Actual EMTDP: Transformation
[Figure: transformation of state S1 with joint actions a1b1 ... ambn. Labels in the figure: communication and non-communication actions C(a1), NC(a1), ..., C(am), NC(am); new states A1c, A1o, ..., Amc, Amo; transition probability Pf; LP variables Xc11 ... Xcmn and Xo11 ... Xomn.]
For each state, for each joint action:
Introduce a communication and a non-communication action for each different individual action, and add the corresponding new states.
Introduce transitions between the original state and the new states.
Introduce transitions between the new states and the original target states.
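One possible reading of these transformation steps is sketched below. It is an illustration under stated assumptions rather than the algorithm from the paper: the function name `to_actual`, the state-naming scheme, and the dictionary layout are hypothetical, the first agent's action always reaches its new state with probability 1 (the Pf label in the figure is not modeled), and costs are left out.

```python
def to_actual(conjoined):
    """Split every joint action (a, b) into two individual steps.

    conjoined maps (state, (a, b)) -> {target_state: probability}.
    The result maps (state, individual_action) -> {target_state: probability}.
    """
    actual = {}
    for (s, (a, b)), targets in conjoined.items():
        # "c": reached by a communication action, "o": reached without communication.
        for mode, tag in (("C", "c"), ("NC", "o")):
            new_state = f"{s}/{a}{tag}"
            # Transition between the original state and the new state.
            actual[(s, f"{mode}({a})")] = {new_state: 1.0}
            # Transition between the new state and the original target states.
            actual[(new_state, b)] = dict(targets)
    return actual


# Hypothetical conjoined fragment, not the example from the slides.
conjoined = {
    ("S1", ("a1", "b1")): {"S2": 1.0},
    ("S1", ("a1", "b2")): {"S3": 0.9, "S4": 0.1},
    ("S1", ("a2", "b1")): {"S5": 1.0},
}
actual = to_actual(conjoined)
```

In this sketch the second agent's choice is made in the new state, so the model reasons about individual actions while every original target state stays reachable with its original probability.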
Non-linear Constraints
Need to introduce non-linear constraints
For each original state, for each new state introduced by a no-communication action:
Set the conditional probabilities of corresponding actions equal.
Ex: P(b1 | A1o) = P(b1 | A2o) = ... = P(b1 | Amo) && ... && P(bn | A1o) = P(bn | A2o) = ... = P(bn | Amo).
A1c, A2c, ..., Amc: observable, reached by a communication action.
A1o, A2o, ..., Amo: unobservable, reached by a no-communication action.
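To see why these equalities are non-linear, write each conditional probability as an LP variable divided by the total for its state: P(b | s) = x[s, b] / sum over b' of x[s, b']. Clearing the denominators turns P(b | s1) = P(b | s2) into a product of LP variables. A minimal sketch of that check follows; the function names and the numbers are hypothetical.

```python
from fractions import Fraction


def state_total(x, state):
    """Total visitation of a state: sum of x[state, b] over all actions b."""
    return sum(v for (s, _b), v in x.items() if s == state)


def equality_residuals(x, states, actions):
    """Cross-multiplied form of P(b | states[0]) = P(b | other) for every other state.

    Each residual is x[ref, b] * total(other) - x[other, b] * total(ref),
    a product of LP variables. It is zero exactly when the two conditional
    probabilities agree (given that both states are visited).
    """
    ref = states[0]
    residuals = {}
    for other in states[1:]:
        for b in actions:
            residuals[(other, b)] = (
                x.get((ref, b), 0) * state_total(x, other)
                - x.get((other, b), 0) * state_total(x, ref)
            )
    return residuals


# Agent B plays b1 and b2 with probability 1/2 in both unobservable states.
consistent = {
    ("A1o", "b1"): Fraction(1, 5), ("A1o", "b2"): Fraction(1, 5),
    ("A2o", "b1"): Fraction(3, 10), ("A2o", "b2"): Fraction(3, 10),
}
# Agent B's choice depends on a state it cannot observe.
inconsistent = {
    ("A1o", "b1"): Fraction(1, 5), ("A1o", "b2"): Fraction(1, 5),
    ("A2o", "b1"): Fraction(1, 2), ("A2o", "b2"): Fraction(1, 10),
}

ok = equality_residuals(consistent, ["A1o", "A2o"], ["b1", "b2"])
bad = equality_residuals(inconsistent, ["A1o", "A2o"], ["b1", "b2"])
```

Because each residual multiplies two decision variables, the constraint cannot be added to a linear program as is, which is why the actual EMTDP needs a non-linear solver.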
Reason for non-linear constraints
Agent B has no hint of the state after NC (no-communication) actions.
Its actions must therefore be independent of the source state.
The probability of action b1 from state A1o should equal the probability of the same action (i.e., b1) from A2o.
Miscoordination is avoided because the actions are independent of the state.
Transformation example: [figure not recoverable]
Experiments: Example domain 2
Domain 1: Comparing expected rewards

Comm Threshold   Conjoined   Deterministic   Miscoordination   EMTDP
0                10.55       0               No reward         6.99
3                10.55       0               No                8.91
6                10.55       0               No                10.55

(Miscoordination resulted in violating resource constraints.)

Domain 2:
A team of two rovers and several scientists using them.
Each scientist has a daily routine of observations.
A rover can use a limited amount of energy in serving a scientist.
Experiment conducted: observe Martian rocks.
Rovers maximize observation output within the energy budget provided.
Soft constraint: exceeding the energy budget on a given day is not catastrophic.
Overutilizing frequently affects other scientists' work.
Uncertainty: only a .75 chance of succeeding in an observation.
The EMTDP had about 180 states, 1500 variables and 40 non-linear constraints.
Problems of this size were solved in under 20 seconds.
Summary and Future Work
Novel formalization of teamwork with resource constraints.
Maximize expected team reward but bound expected resource consumption.
Provided an EMTDP formulation in which agents avoid miscoordination despite using randomized policies.
Proved equivalence of different EMTDP transformation strategies (see paper for details).
Introduction of non-linear constraints.
Future work:
Pin down the complexity.
Experiment on the n-agent case.
Extend the work to partially observable domains.