
Multi-agent Planning: An introduction to planning and coordination

Mathijs de Weerdt, Adriaan ter Mors, and Cees Witteveen
Dept. of Software Technology, Delft University of Technology

E-mail: {m.m.deweerdt, a.w.termors, witt}@ewi.tudelft.nl

1 Introduction

Many day-to-day situations involve decision making: for example, a taxi company has some transportation tasks to be carried out, a large firm has to distribute a lot of complicated tasks among its subdivisions or subcontractors, and an air-traffic controller has to assign time slots to planes that are landing or taking off. Intelligent agents can aid in this decision-making process. Agents are often classified into two categories according to the techniques they employ in their decision making: reactive agents (cf. (Ferber and Drogoul, 1992)) base their next decision solely on their current sensory input; planning agents, on the other hand, take into account anticipated future developments — for instance as a result of their own actions — to decide on the most favourable course of action.

When an agent should plan and when it should be reactive depends on the particular situation it finds itself in. Consider the example where an agent has to plan a route from one place to another. A reactive agent might use a compass to plot its course, whereas a planning agent would consult a map. Clearly, the planning agent will come up with the shortest route in most cases, as it won't be confounded by uncrossable rivers, one-way streets, and labyrinthine city layouts. On the other hand, there are also situations where a reactive agent can be at least equally effective, for instance when there are no maps to consult, as in the domain of (Mars) exploration rovers. Nevertheless, the ability to plan ahead is invaluable in many domains, so in this paper we will focus on planning agents.

The general structure of a planning problem is easy to explain: (the relevant part of) the world is in a certain state, but managers or directors would like it to be in another state. The (abstract) problem of how one should get from the current state of the world through a sequence of actions to the desired goal state is a planning problem. Ideally, to solve such planning problems, we would like to have a general planning-problem solver. However, an algorithm solving all planning problems can be proven not to exist. [1] We therefore concentrate on a simplification of the general planning problem called 'the classical planning problem'. Although not all realistic problems can be modeled as classical planning problems, classical planning techniques can help to solve more complex problems.

[1] That is, the general planning problem is undecidable.


In this paper we first give an overview of planning techniques for this classical planning problem and techniques for extensions of this problem. Please skip this section if you are already familiar with AI planning techniques. After this introduction to AI planning, we briefly discuss coordination in a multi-agent context, and we introduce a way to organize current work on multi-agent planning by defining several phases in the multi-agent planning process (in Section 3). Finally, we describe some multi-agent planning techniques in more detail, and we conclude with an outlook on open issues in this field.

2 AI Planning

In this section we give an overview of AI planning techniques. It will not come as a surprise that AI planning techniques are techniques to search for a plan: forward planning builds a plan starting from the initial state, backward planning starts from the goal states, and least-commitment planning constructs plans by adding actions in a non-sequential order. Hereafter we discuss some advanced heuristics to guide this search, which is quite useful since in general the (classical) planning problem is very hard (see Theorem 8). Finally, we discuss some extensions of the classical planning problem, such as dealing with uncertainty, time, and limited resources (fuel, capacity, money, etc.).

2.1 The classical planning problem

The classical planning problem can be defined as follows (Weld, 1999):

Given

• a description of the known part of the initial state of the world (in a formal language, usually propositional logic), denoted by I,

• a description of the goal (i.e., a set of goal states), denoted by G, and

• a description of the possible (atomic) actions that can be performed, modeled as state transformation functions,

determine a plan, i.e., a sequence of actions that transforms each of the states fitting the initial configuration of the world into one of the goal states.

The formal language that was used for STRIPS (Fikes and Nilsson, 1971) is common to most classical planning frameworks. This language is also used in Definition 6 to formally define the classical planning problem.

Example 1. Suppose that initially (i.e., in all states of the world that match the description I), there is a taxi at a location A, represented by a binary state variable taxi(A), and a passenger at a location B, represented by passgr(B). In each of the states described by G the passenger should be at a location C, denoted by passgr(C). Furthermore, suppose that there are three actions that can transform (some part of) the state of the world.


[Figure 1: A sequence of actions (plan) leading from each of the initial states in I to one of the goal states in G: move(A,B), load(passgr), move(B,C), unload(passgr).]

1. The taxi can move from one location to another: move(x, y) with x, y ∈ {A, B, C}. This action requires that a priori taxi(x) holds, and ensures that in the resulting state ¬taxi(x) and taxi(y) hold.

2. The passenger can get in the taxi: load(p). This action requires a priori taxi(x) and passgr(y) and x = y, and in the resulting state both ¬passgr(y) and passgr(taxi) should hold.

3. The passenger can get out of the taxi: unload(). This action requires taxi(x) and passgr(taxi), and results in ¬passgr(taxi) and passgr(x).

A plan that represents a solution to this problem is shown in Figure 1.
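To make the example concrete, the three action schemata can be grounded over the locations {A, B, C} as propositional STRIPS-style operators. The following Python sketch is our own illustration (the names and the encoding are assumptions, not taken from the original paper); each effect set mixes add effects (positive literals) and delete effects (literals prefixed with '~'):

# Hypothetical grounding of the taxi example. Each operator has a
# precondition 'pre' (a set of atoms) and an effect 'post' (a set of
# literals, where '~' marks a negative literal, i.e., a delete effect).
LOCATIONS = ["A", "B", "C"]

def ground_operators():
    ops = {}
    for x in LOCATIONS:
        for y in LOCATIONS:
            if x != y:
                ops[f"move({x},{y})"] = {
                    "pre": {f"taxi({x})"},
                    "post": {f"taxi({y})", f"~taxi({x})"},
                }
        ops[f"load({x})"] = {
            "pre": {f"taxi({x})", f"passgr({x})"},
            "post": {"passgr(taxi)", f"~passgr({x})"},
        }
        ops[f"unload({x})"] = {
            "pre": {f"taxi({x})", "passgr(taxi)"},
            "post": {f"passgr({x})", "~passgr(taxi)"},
        }
    return ops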

Remark 2. Note that it is neither possible nor desirable to completely describe the state of the world and everything that changes. We use the assumption that "all that is not explicitly changed by an action remains unchanged" (Janlert, 1987) to deal with this frame problem (Raphael, 1971). Consequently, we describe (frame) a state of the world only by the relevant literals. Such a set of literals is called a state specification. Under this assumption, the difference between 'state' and 'state specification' is irrelevant, so often 'state' is used while 'state specification' is meant.

In STRIPS, the STanford Research Institute Problem Solver (Fikes and Nilsson, 1971), states are described using binary state variables, called propositions. Actions or operators are specified by (i) conditions on propositions, called preconditions, (ii) propositions that are changed to 'true' in the new state, called add effects, and (iii) propositions that are changed to 'false', called delete effects. Furthermore, goals can be described by conditions on propositions.

The propositional STRIPS planning language has been formalized, and the complexity of the problems that can be specified in this language has been analyzed by (Lifschitz, 1987). The following formal treatment is based on a further analysis of STRIPS planning instances performed by (Nebel, 2000).

To formally specify a propositional planning formalism, we need the following concepts and notations. Given the set of all propositional atoms Σ, a literal over Σ is an element s of Σ or its negation ¬s. The set of all literals over Σ, including ⊥ (bottom, to denote 'false') and ⊤ (top, to denote 'true'), is denoted by Σ̄. For a set of literals L ⊆ Σ̄ we define ¬L to be the set of literals {¬l | l ∈ L}, where ¬l ≡ l′ if l = ¬l′ and l′ ∈ Σ. The set of all formulas is denoted by PROP, and is defined as follows: if p ∈ Σ then p ∈ PROP, and if A, B ∈ PROP then ¬A, (A ∨ B), (A ∧ B), (A → B), and (A ↔ B) ∈ PROP.

Definition 3. (Nebel, 2000) An operator o ∈ 2^Σ × 2^Σ̄ is defined by a precondition pre and its effect post, denoted by ⟨pre, post⟩, where pre ⊆ Σ ⊆ Σ̄ is a set of propositional atoms, and post ⊆ Σ̄ is a set of literals.

We use the notation pre(o) and post(o) to denote the precondition and the effect of an operator o, respectively. The propositional atoms in the precondition of an operator must all be true in the state s ∈ 2^Σ̄ to which the operator is applied. Let O be the set of all operators in a domain.

Definition 4. (Nebel, 2000) The application App : 2^Σ̄ × O → 2^Σ̄ of an operator or action o to a state (specification) s is defined as [2]

    App(s, o) = (s − ¬post(o)) ∪ post(o)   if s |= pre(o) and s ⊭ ⊥ and post(o) ⊭ ⊥
    App(s, o) = {⊥}                        otherwise

[2] We use the standard propositional entailment (|=) here.

That is, the negations of the literals in the effect clause will be removed, and the literals themselves will be added to the state s.

Using Definition 4, we can define the result Res(s, ∆) of applying a sequence of operators ∆ to a state s.

Definition 5. The result Res : 2^Σ̄ × O* → 2^Σ̄ of applying a sequence of operators ∆ = ⟨o1, . . . , on⟩ to a state (specification, see Remark 2) s is recursively defined by

    Res(s, ⟨⟩) = s
    Res(s, ⟨o1, o2, . . . , on⟩) = Res(App(s, o1), ⟨o2, . . . , on⟩)
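Definitions 4 and 5 translate almost literally into code. The sketch below is our own illustration (the '⊥' sentinel and the '~' negation marker are assumptions carried over from the taxi encoding above):

BOTTOM = {"⊥"}  # sentinel state specification representing falsity

def neg(lit):
    """Negation of a literal: p <-> ~p."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def apply_op(state, op):
    """App(s, o) of Definition 4: remove negated effects, add the effects."""
    post = op["post"]
    post_consistent = not any(neg(l) in post for l in post)
    if op["pre"] <= state and state != BOTTOM and post_consistent:
        return (state - {neg(l) for l in post}) | post
    return BOTTOM

def res(state, delta):
    """Res(s, Delta) of Definition 5: fold apply_op over the sequence."""
    for op in delta:
        state = apply_op(state, op)
    return state

Executing the plan of Figure 1 on the taxi example, res({"taxi(A)", "passgr(B)"}, [ops[n] for n in ("move(A,B)", "load(B)", "move(B,C)", "unload(C)")]) with ops = ground_operators() indeed yields a state containing passgr(C).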

Finally, Nebel defines a planning problem in propositional STRIPS as follows.

Definition 6. (Nebel, 2000) A planning problem in the propositional STRIPS formalism is a four-tuple Π = (Σ, O, I, G) where



• Σ is a countably infinite set of propositional atoms, called facts or fluents,

• O ⊆ 2^Σ × 2^Σ̄ is the set of all possible operators to describe state changes in this domain,

• I ⊆ Σ is the initial state, and

• G ⊆ Σ is the goal specification, i.e., the set of propositions that is to be satisfied.

Definition 7. A sequence of operators ∆ = ⟨o1, . . . , on⟩ ∈ O* is called a solution or a plan for a planning instance Π = (Σ, O, I, G) iff Res(I, ∆) |= G and Res(I, ∆) ⊭ ⊥. [3]

[3] We use the word 'iff' as shorthand for 'if and only if'.

The propositional STRIPS formalism (denoted by S) requires complete state specifications, unconditional effects, and propositional atoms as the formulas in the precondition list. This formalism can be extended in various ways: (i) state specifications may be incomplete (S_I), (ii) effects can be conditional (S_C), (iii) formulas in preconditions (and effect conditions) can be literals ⊆ Σ̄ (S_L), and (iv) the formulas in preconditions (and effect conditions) can be arbitrary boolean formulas ⊆ PROP (S_B).

Theorem 8. (Nebel, 2000) Complexity of planning. The problem of deciding whether a plan exists for a given instance (i.e., the plan existence problem) is PSPACE-complete for specifications in the propositional STRIPS formalism and for all combinations of the four extensions of this language (S_I, S_C, S_L, and S_B) described above.

This result is from (Nebel, 2000), who in turn relies on a proof by (Bylander, 1994). The idea behind this proof is as follows. Plan existence is in PSPACE because (i) a state is described by n propositions, and thus there are at most 2^n states. Since any plan can be conceived as a sequence of states, (ii) the maximal plan length needed to transform an initial state into a goal state is at most 2^n. Therefore, (iii) the existence of a plan of length p transforming a state s1 into a state s2 can be decided using an algorithm that uses only polynomial space: for some intermediate state s3, decide plan existence both from s1 to s3 and from s3 to s2, recursively, each time for plans of half the length. Since each state requires O(n) space and the depth of the recursion of this algorithm is at most O(log(p)) = O(log(2^n)) = O(n), the algorithm clearly runs in polynomial space.

The hardness of plan existence in Bylander's proof is based on the fact that each PSPACE problem can be translated into a planning problem.
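The space argument is essentially a Savitch-style divide-and-conquer reachability check. The following sketch (our own, with invented helper names; exponential in time but polynomial in space) makes the recursion explicit:

def plan_exists(s1, s2, p, all_states, successors):
    """Is there a plan of at most p steps from state s1 to state s2?

    `successors(s)` returns the set of states reachable from s in one
    action. Only the recursion stack (depth O(log p)) plus one midpoint
    state of O(n) bits per level is stored, so the space used is
    polynomial even though the running time is exponential.
    """
    if p == 0:
        return s1 == s2
    if p == 1:
        return s1 == s2 or s2 in successors(s1)
    half = p // 2
    return any(plan_exists(s1, s3, half, all_states, successors)
               and plan_exists(s3, s2, p - half, all_states, successors)
               for s3 in all_states)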

Nebel also analyzes which instances of the different extensions of the planning problem can be translated into each other, and which cannot. Figure 2 shows a graph of these specialization relations between combinations of the problem extensions, indicated by combinations of the subscripts I, C, L, and B. An arrow from an extension S_L (allow literals) to another extension S_B (allow arbitrary boolean formulas) means that any planning problem specified in S_L can be translated into a problem in S_B in polynomial time, while the plan size increases at most linearly in the size of the input.

[Figure 2: The specialization relationships of planning formalisms based on syntactic restrictions (Nebel, 2000).]

Apart from a specialization relationship of all extensions of the STRIPS formalism, Nebel's most interesting results are (i) that incomplete state specifications and literals in preconditions can be compiled to the basic STRIPS formalism while preserving plan size, (ii) that incomplete state specifications and literals in preconditions and effect conditions can be compiled to the basic STRIPS formalism with conditional effects, preserving plan size exactly, and (iii) that there are no other compilation schemes preserving plan size linearly except those implied by the specialization relationship and results (i) and (ii).

There exist many different approaches to solving the classical planning problem that make use of the STRIPS formalism and its extensions. Later on, we will discuss some of these approaches in a multi-agent setting. To prepare for this discussion, in the next section we briefly discuss the most relevant approaches, describing them as variants of a general method based on the refinement of the set of all possible plans. This generalization is due to (Kambhampati, 1997).

2.2 Refinement planning

The search for a sequence of actions (a plan) to get from an initial state to a goal state can be seen as a refinement of the set of all possible sequences (Kambhampati, 1997). We can describe most existing (classical) planning algorithms using this unifying view on planning. The unifying concept, representing a set of candidate sequences, is a so-called partial plan. This partial plan is used to describe a set of partial solutions. Planning algorithms are given by defining how a partial plan is modified such that in the end all (partial) solutions represented by the partial plan are complete, feasible solutions.


Definition 9. (Kambhampati, 1997) The syntax of a partial plan, usually denoted by P_i, for a planning problem Π = (Σ, O, I, G) is a directed acyclic graph (V, E, IPC, PTC), where V is a set of nodes representing the application of actions from O, E ⊆ V × V is the set of directed edges representing the required precedences of these actions, and IPC and PTC are two kinds of auxiliary constraints:

• IPC ⊆ V × V × PROP is a set of interval preservation constraints specifying that a formula should hold during a certain interval (between two actions). Currently, the only used instantiations of such constraints are so-called causal links. [4] A causal link o_i →p o_j specifies that the precondition p of action o_j should be an effect of o_i that may not be undone between o_i and o_j.

• PTC ⊆ V × PROP is a set of point truth constraints specifying that a formula should hold at a certain point before an action. In current planners these constraints are only used to specify open preconditions of actions.

[4] Causal links are used by least-commitment planners (Weld, 1994). These planners are briefly discussed at the end of this section.

The semantics of a partial plan is the set of action sequences that (i) contain at least the actions represented by the nodes V, and (ii) fulfill the precedence constraints implied by the directed edges E. Furthermore, these action sequences should satisfy (iii) all interval preservation constraints IPC, and (iv) the point truth constraints PTC. These action sequences are called candidate plans or sequences of a partial plan P_i, denoted by candidates(P_i).

The IPC and PTC are especially useful if plans are not constructed bottom-up or top-down. When actions are added in no particular order, it must somehow be recorded why a particular action is added at all (e.g., to meet the precondition of a subsequent action), and, more importantly, it has to be prevented that another action undoes its required result. This kind of information can be stored using the interval preservation constraints. Point truth constraints specify which preconditions still have to be satisfied. They can also be used to indicate at (or before) which point in a plan they need to be fulfilled.

For some algorithms, intermediate results cannot be represented by one partial plan. Therefore, we think in terms of sets of partial plans. We define the candidate plans of a set of partial plans PS as candidates(PS) = ⋃_{P_i ∈ PS} candidates(P_i). Each subset of candidates is represented by a partial plan P_i ∈ PS, called a component. Minimal candidates are those candidates that contain only actions that are included in a partial plan (i.e., that are represented in V).

Given a partial plan, we need to define when a plan (sequence) is one of the candidates represented by this partial plan. Such a sequence is called a safe linearization.

Definition 10. (Kambhampati, 1997) A sequence ∆ = ⟨o1, . . . , on⟩ ∈ O* is called a safe linearization of a partial plan P_i = (V, E, IPC, PTC), if

• there exists a bijective function f : V → ∆, such that any v ∈ V represents the application of operator f(v) ∈ ∆ (v ≡ f(v)),

• for any v, w ∈ V, v ≠ w, if there is a path from v to w in (V, E) then f(v) < f(w) in ∆,

• for any (v, w, ϕ) ∈ IPC, if o_i = f(v) < f(w) = o_j and if Res(I, ⟨o1, . . . , o_i⟩) |= ϕ then Res(I, ⟨o1, . . . , o_{j−1}⟩) |= ϕ, and

• for any (v, ϕ) ∈ PTC, Res(I, ⟨o1, . . . , f(v)⟩) |= ϕ.

[Figure 3: A partial plan describing all candidates that contain at least the four given actions move(A,B), load(passgr1), load(passgr2), and unload(passgr1), in the order given by the precedence relation and matching the interval preservation constraint (at(taxi, A)) and the point truth constraint (in(passgr1, taxi)).]

Example 11. A taxi has to take two passengers passgr1 and passgr2 from location A to location B. Initially, both the taxi and the passengers are at location A. The partial plan that describes the set of candidate plans after some additional refinements (i.e., the addition of two load actions, a move action, and an unload action) is depicted in Figure 3. After some more refinements, each of the candidates should be a correct sequence of actions from the initial state to one of the desired goal states (i.e., where at(passgr1, B) and at(passgr2, B) hold).
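Part of Definition 10 can be checked mechanically. The sketch below is our own simplification (interval preservation constraints are omitted, and the bijection f is given implicitly by operator names); it verifies the precedence and point-truth conditions for a concrete sequence, reusing res() and the operator encoding from the earlier sketches:

def is_safe_linearization(seq, edges, ptcs, initial, ops):
    """Check precedence (E) and point truth constraints (PTC) for `seq`.

    `seq` lists operator names in execution order, `edges` is a set of
    (before, after) name pairs, and `ptcs` maps an operator name to the
    set of literals that must hold just before it executes. IPCs are
    not checked in this simplified version.
    """
    pos = {name: i for i, name in enumerate(seq)}
    if any(pos[a] >= pos[b] for a, b in edges):
        return False  # a required precedence is violated
    for name, required in ptcs.items():
        state = res(initial, [ops[n] for n in seq[:pos[name]]])
        if not required <= state:
            return False  # an open precondition is not satisfied
    return True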

A planner can reduce the set of candidate plans represented by a set of partial plans by (i) adding actions, (ii) adding precedences on these actions, and (iii) adding auxiliary constraints to one or more of the partial plans. A technique to find suitable refinements is called a refinement strategy, denoted by R.

Proposition 12. (Kambhampati, 1997) Refinement. A refinement strategy R refines (i.e., reduces) the set of candidate plans represented by a set of partial plans PS: so, if PS′ = R(PS) then candidates(PS′) ⊆ candidates(PS).

To evaluate the effectiveness of such strategies for refining the set of candidate plans, we look at specific properties. For example, it is important to know whether a strategy preserves all possible solutions (i.e., is complete).


Definition 13. A refinement strategy R to find a set of solutions is called

1. progressive iff candidates(R(PS)) ⊂ candidates(PS) for any set of partial plans PS,

2. complete iff solutions ∩ candidates(R(PS)) = solutions ∩ candidates(PS), and

3. systematic iff the set {candidates(P_i) | P_i ∈ R(PS)} is a partition of candidates(R(PS)).

Although planning algorithms are implemented in many different ways, they can be rewritten in an alternative, uniform way, based on the theory of refining a set of potential solutions (Kambhampati, 1997). The structure they then have in common can be found in Algorithm 14. This algorithm describes how a solution result can be obtained from a set of partial plans PS. Each time the function REFINE is executed and none of the minimal candidates is a solution, a refinement strategy is selected and applied to the partial plan P. Then repeatedly an element of the resulting set PS is selected, and REFINE is called recursively on this element. Once a solution has been found, this process stops, and the result is returned.

Algorithm 14. (REFINE(P, Π))

Input: A partial plan P and a problem Π.

Output: A minimal candidate of P that is a solution to Π, or 'fail'.

begin
1. if a minimal candidate c of P is a solution to Π then
   1.1. return c
2. else
   2.1. result := fail
   2.2. select a refinement strategy R
   2.3. PS := R(P)
   2.4. while PS ≠ ∅ and result = fail do
      2.4.1. non-deterministically select an element P_i of PS
      2.4.2. PS := PS \ {P_i}
      2.4.3. result := REFINE(P_i, Π)
   2.5. return result
end

Example 15. The planning algorithm Fast Forward (FF) (Hoffmann and Nebel, 2001) starts with the initial state and an empty sequence of actions (plan). Repeatedly, the sequence is extended with actions, always adding to the end of the sequence. For each of the possible extensions of the sequence (first with one action, then with two actions, etc.), a heuristic value is calculated. The first of the possible extensions that leads to a state with a lower heuristic value than the current state is chosen.

The heuristic uses the relaxation that no subsequent action has negative effects. Under this assumption, a so-called relaxed plan can be constructed where all actions with satisfied preconditions are added in parallel. The relaxed plan is constructed until the goal state is reached. The heuristic value is the cost of all actions in the relaxed plan that are needed to reach the goal state.

Although FF does not use the presented refinement framework, it is in fact a form of refinement planning with the following refinement strategy R_FF (which can be used in step 2.3 of Algorithm 14). Given one partial plan that represents a set of possible solutions, a new partial plan is constructed by extending this partial plan at the end. The extension is selected using the heuristic described above. See also Algorithm 16. Note that this particular refinement strategy uses only a singleton set of partial plans to represent the set of candidates.

Algorithm 16. (R_FF(PS))

Input: A singleton set of partial plans PS.

Output: A singleton set of partial plans that lead to a state with a lower heuristic cost.

begin
1. h := the heuristic cost of the current end state
2. ∆ := the action sequence of a minimal candidate of PS
3. h′ := ∞
4. while h′ ≥ h do
   4.1. breadth-first select a sequence of actions ∆′ to extend ∆
   4.2. h′ := the heuristic value of the end state of ∆ + ∆′
5. return the partial plan that represents ∆ + ∆′
end
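The heuristic value used in steps 1 and 4.2 comes from the delete relaxation. The following sketch is our own simplification: it counts relaxed reachability layers instead of extracting an explicit relaxed plan, which is coarser than FF's actual heuristic but conveys the core idea. `operators` is assumed to be a collection of ground operators in the encoding used above (e.g., ground_operators().values()):

def relaxed_layers(state, goal, operators):
    """Count the parallel layers of delete-free action application needed
    before all goal atoms become reachable.

    Delete effects are ignored (the relaxation), so the set of reachable
    atoms grows monotonically. Returns None if the goal is unreachable
    even in the relaxation.
    """
    atoms = set(state)
    layers = 0
    while not goal <= atoms:
        new_atoms = set()
        for op in operators:
            if op["pre"] <= atoms:
                # keep only positive (add) effects: the relaxation
                new_atoms |= {l for l in op["post"] if not l.startswith("~")}
        if new_atoms <= atoms:
            return None  # fixpoint reached without the goal
        atoms |= new_atoms
        layers += 1
    return layers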

Refinement strategies such as R_FF can be roughly divided into three categories.

1. Progression or forward planning methods construct (partial) plans bottom-up: an action to be executed is selected and added to the end of the partial plan. The action-selection mechanism varies, but usually a heuristic is used to determine a next action to execute. Some methods do not use backtracking: once an action is selected, it will not be removed from the partial plan. Such planners are not complete, but can often find correct plans much faster. Examples of such planners are FF (Hoffmann and Nebel, 2001), HSP (Bonet and Geffner, 2002), and Prodigy (Veloso et al., 1995).

2. Regression refinement methods, also called backward planning methods, construct (partial) plans top-down: starting from the description of the set of end states, they determine an action to reach such a state from a state s that is presumably 'closer' to the initial state (i.e., a shorter sequence of actions is needed to reach this state s). The STRIPS planner (Fikes and Nilsson, 1971) is one of the first planners that used this technique. Many others use some of the ideas from this technique, for example as a heuristic for progression refinement, as in GRT (Refanidis and Vlahavas, 2001).

An advantage of both progression and regression refinement is that these methods can describe a partial plan by a sequence of actions and a description of the state reached last, and that they can search in the state space to find what action to add next; they do not need a (more complex) plan space representation. Given such a state-space representation one can easily see whether a plan is valid by comparing the final state of the plan to the requirements of the goal state, or by comparing the first state of a plan to the initial state, respectively.

3. The third category, least-commitment planning (Weld, 1994), really needs the complicated plan space representation of the partial plans as used in the refinement framework, because this type of planning refines plans not only by extending the prefix or the postfix of the plan, but also adds constraints on the possible solutions in many ways. This category is sometimes also called partial order planning, because during the construction of a plan the order of the actions in the plan is partial, while the order of the actions in a partial plan based on forward planning is usually complete. The planners Noah (Sacerdoti, 1975) and POP (Weld, 1994) fit into this category.

Some approaches use a slightly different variant of a partial plan, called a disjunctive partial plan. The nodes in a disjunctive partial plan may consist of several actions. The semantics of such a node is that exactly at that point, one of those actions is to be executed. For example, Graphplan (Blum and Furst, 1997) uses a form of disjunctive planning.

Example 17. The Graphplan algorithm consists of two phases. First a planning graph is constructed that represents all possible solutions that reach the goal state in a minimum number of planning steps. Then a solution is extracted from this planning graph.

The planning graph is a directed, layered graph. An example of such a graph is shown in Figure 4. In this graph two types of layers are interleaved: proposition layers and action layers. A proposition layer consists of nodes that each represent an atom or the negation of an atom. The first layer is a proposition layer representing the initial state. An action layer contains a node for each possible action. The nodes are connected by three types of arcs. Precondition arcs connect an action to the atoms of its precondition in the previous layer. Add arcs connect an action to the nodes of the next layer representing atoms that are a direct consequence of the postcondition of the action, and delete arcs connect actions to the nodes representing atoms that are disabled by the action. Propositions are always reproduced in the next layer by so-called "no-op" actions. When another action is chosen that has a delete effect for such a proposition, this leads to a conflict.

[Figure 4: Graphplan uses a plan graph consisting of proposition layers and action layers.]

Such mutual exclusions (mutexes) are represented explicitly in Graphplan. These mutexes indicate for each action layer which actions cannot be executed at the same time (in the same layer). To find a correct plan, Graphplan builds the graph while searching for a proposition layer that implies the goal state. Backwards from this goal state the mutual exclusion relations are verified. If it is impossible to satisfy all mutexes, the plan graph is extended with another layer and the process is repeated.
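One expansion step of the planning graph can be sketched as follows (our own illustration, reusing the operator encoding from the earlier examples; mutex bookkeeping and negated propositions are omitted for brevity):

def expand_layer(prop_layer, operators):
    """One Graphplan expansion step: from a proposition layer, compute
    the applicable actions (plus no-ops) and the next proposition layer.
    """
    # no-ops carry every proposition forward to the next layer
    action_layer = [{"name": f"no-op({p})", "pre": {p}, "post": {p}}
                    for p in prop_layer]
    for name, op in operators.items():
        if op["pre"] <= prop_layer:
            action_layer.append({"name": name, **op})
    next_props = set(prop_layer)
    for a in action_layer:
        next_props |= {l for l in a["post"] if not l.startswith("~")}
    return action_layer, next_props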

2.3 Extended planning problems

Over the last few years many extensions of the classical planning problem have been studied: dealing with time (Do and Kambhampati, 2001; Penberthy and Weld, 1994; Smith and Weld, 1999), costs (or utility maximization) (Haddawy and Hanks, 1998), limited resources (Koehler, 1998; Wolfman and Weld, 2001), and planning under uncertainty (Boutilier et al., 1999). Of these extensions, planning under uncertainty is perhaps the most relevant when multiple agents are acting in the same environment. Such domains introduce several types of uncertainty.

Firstly, actions can have probabilistic effects: for example, upon moving to another location by train, we know that we have an 85 percent chance of actually reaching our destination in time, a 10 percent chance of getting stuck somewhere, and a 5 percent chance of getting involved in an accident. When the outcomes of actions can be (partially) observed, a plan can be constructed including sensing actions and conditional branches. This problem is called contingent planning. This leads to a second type of uncertainty.

The sensing actions may fail as well, or may not be able to observe the world completely. The unobservable, partially observable, and fully observable domain versions of this problem are EXPTIME-complete, EXPSPACE-complete, and NEXPTIME-complete, respectively (Bernstein et al., 2000; Haslum and Jonsson, 1999). The additional complexity over propositional STRIPS plan existence (Theorem 8) comes from the uncertain results of actions. Not only can the length of a plan be exponential; each plan can now also have an exponential number of resulting end states.

Thirdly, planning in a domain where we lack even a probability distribution over the possible outcomes of a non-deterministic operator is called non-deterministic planning or conformant planning. So, conformant planning is the problem of finding, in a nondeterministic domain, a sequence of actions which will achieve the goal for all possible contingencies. The complexity of a conformant planning problem is EXPSPACE-complete, i.e., strictly higher than that of classical planning (Haslum and Jonsson, 1999).

Theorem 18. (Haslum and Jonsson, 1999) Complexity of conformant planning. Deciding the existence of a conformant plan (sequence) for a problem Π with an unobservable propositional domain and actions is EXPSPACE-complete. [5]

[5] The class EXPSPACE is the class of decision problems that can be solved using an amount of space bounded by 2^{p(n)}, where p is a polynomial and n is the input length. It is known that the class EXPTIME ⊆ EXPSPACE contains problems that are intractable (Garey and Johnson, 1979, Theorem 7.16), even if P = NP.

In a multi-agent system, agents often perform actions unexpectedly and independently of each other. When each agent is planning on its own without communicating and coordinating with the other agents, each agent has to solve some sort of non-deterministic planning problem. Theorem 18 and the complexity analysis of contingent planning show that non-determinism in planning is EXPTIME-hard. So we can conclude that such an individual approach to multi-agent planning is EXPTIME-hard.

Corollary 19. An individual approach to planning in multi-agent systems using either conformant or contingent planning is EXPTIME-hard.

However, by communicating parts of the agents' plans, we can reduce or, in some domains, even remove the uncertain effects of actions. Such approaches to planning in multi-agent systems are discussed in the remainder of this paper.

3 Coordinated planning

One may wonder why we need to study multi-agent planning problems and techniques as a separate topic. Is not the multi-agent case covered by the general discussions of planning? The answer is no, because in real-life problems we deal with multiple agents having their own goals, and it is often impractical or undesirable to create the plan for all agents centrally. These agents may be people or companies simply demanding to plan their actions themselves, or refusing to make all information necessary for planning available to someone else. Consequently, such agents want to be able to make their own plans independently of what the other agents are planning to do. This in itself is not a compelling reason to differentiate between planning and multi-agent planning, but in many cases dependencies between the tasks of the agents make independent planning impossible. That is, if the agents do not take into account the dependencies between their plans, then they might come into conflict when they try to execute their plans. To resolve their dependencies, agents must coordinate their efforts. In the (multi-agent) literature, several definitions for coordination are given. A concise and clear definition is from (Malone and Crowston, 1991):



Coordination is the act of managing interdependencies between activities.

Another, more planning-agent specific definition is due to (Jennings, 1996):

Participation in any social situation should be both simultaneously constraining, in that agents must make a contribution to it, and yet enriching, in that participation provides resources and opportunities which would otherwise be unavailable. Coordination, the process by which an agent reasons about its local actions and the (anticipated) actions of other agents to try to ensure the community acts in a coherent manner, is the key to achieving this objective.

Clearly then, the multi-agent planning problem has both a planning and a coordination component. We therefore define the multi-agent planning problem as follows:

Definition 20. The multi-agent planning problem is the following problem: Given a description of the initial state, a set of global goals, a set of (at least two) agents, and for each agent a set of its capabilities and its private goals, find a plan for each agent that achieves its private goals, such that these plans together are coordinated and the global goals are met as well.

Summarizing, the following statement perfectly captures our concept of multi-agent planning:

Multi-agent planning = planning + coordination

In the remainder of this section we will first discuss the nature of the coordination problem, and then discuss the various approaches that combine the planning and the coordination part.

3.1 Coordination in multi-agent systems

To characterize multi-agent coordination problems, we identify several key characteristics. The first is the nature of the dependencies that necessitate the coordination. One may wonder whether there are a number of common problems that underlie any coordination situation. Having identified the nature of the coordination problem, we then have to choose the mechanism to solve the coordination problem. Again, we might ask whether there exist general coordination mechanisms that can be applied to a variety of situations. We are also interested in determining which coordination mechanism is the best for a given situation, assuming we can choose from a number of mechanisms. The third aspect is the applicability and usability of a given coordination mechanism. Any multi-agent system designer must make certain assumptions about the environment in which the agents will operate — for instance, about the number of agents in the system and the rate of change in the environment. Hence, we need to know the factors determining the applicability of the coordination mechanism.

(Jennings, 1996) and (Nwana et al., 1996) state five reasons that might necessitate coordination in multi-agent systems. The first two are very general in nature and do not directly refer to any characteristic of an interaction situation; the other three do refer to characteristics of the interaction situation.

14

1. Prevent anarchy or chaos: As an example they state that in British Telecom, no one in the company is aware of the activities of all 130,000 employees. Consequently, agents have only local views, which is bound to lead to conflicts without coordination.

2. Efficiency: Even if agents can work independently, sometimes working together means being able to solve problems faster.

3. Meeting global constraints: If there are global budget limits, then agents must make agreements so that they do not inadvertently exceed the budget.

4. Distributed information, expertise or resources: Often, a task cannot be performed (or performed efficiently) by a single agent alone. If the required capabilities are distributed among the agents, coordination is necessary.

5. Dependencies between the agents' actions: For instance, different tasks may need the same resources or there may exist a precedence relation between tasks.

Of course, the above categories are not orthogonal; the first four are all, in fact, derivable from the fifth.

In the discussion of their well-known multi-agent planning framework, GPGP, (Decker and Lesser, 1994) summarize the need for coordination from a different point of view. Whereas Jennings and Nwana state reasons for coordination due to the characteristics of the interaction situation, Decker and Lesser state reasons for coordination from the point of view of an individual agent. Decker and Lesser use the definition of coordination given by Malone and Crowston, that is, that coordination is the act of managing interdependencies between activities. Using this definition as a context, Decker and Lesser say that there is a coordination problem if one of the following conditions is met:

• The agent has a choice of actions and that choice affects performance.

• The order in which activities are carried out affects performance.

• The time at which actions are executed affects performance.

(Malone and Crowston, 1991, 1993) summarize the need for coordination in their coordination definition — to manage interdependencies between activities. The focus of their research is finding general coordination mechanisms that can be applied in coordination situations of different research disciplines. Their research is characterized by the following questions they raise:

Are there fundamental coordination processes that occur in all coordi-nated systems?. . . How far can we get by analyzing very general coordi-nation processes and when will we find that most of the important factorsare specific to coordinating a particular kind of task? For example, arethere general heuristics for coordination that are analogous to the gen-eral problem-solving heuristics studied in cognitive science and artificialintelligence?

15

They identify the following coordination processes that occur in many domains:

• coordinated goal selection and decomposition,

• coordinated resource allocation, a special case of which is task assignment,

• coordinated sequencing (one activity after the other) and synchronizing (activities at the same time).

Of course, the above list of processes represents classes of coordination mechanisms; there is not one task assignment protocol or one goal decomposition algorithm.

(Durfee, 2001) identifies factors influencing the usability of specific coordination mechanisms. That is, he identifies the factors that determine how far the applicability of specific coordination mechanisms reaches — and he concludes that the applicability of every coordination mechanism has its limits:

. . . various coordination strategies for computational agents have emerged over the years. It does not seem possible, however, to devise a coordination strategy that works well under all circumstances; if such a strategy existed, human societies would substitute it for the myriad constructs employed today such as corporations, governments, markets, teams, committees, professional societies, and mailing groups. Whatever strategy we adopt, certain situations can stress it to the breaking point.

Durfee identifies three varying properties (dimensions) of an interaction situation: the agent population, the task environment, and the solution properties. For each of these dimensions, he identifies the three most obvious properties that impact the usability of a coordination strategy.

Agent population:

• Quantity: The number of agents.

• Heterogeneity: For example, agents can have different capabilities, internal architectures, and communication languages.

• Complexity: Complexity refers to how hard it is to predict what a versatile agent will do, i.e., how predictable the agents are.

Task environment:

• Degree of interaction: Some issues concern large groups of agents, such as a resource that regulates exclusive access to a blackboard data store; other issues concern small groups of agents, such as collision avoidance.

• Dynamics: The rate at which the environment changes, typically due to events outside the influence of the agents.

• Distributivity: For instance, tasks can originate centrally or distributively.

Solution properties:

• Quality: We can measure the quality of a solution by judging how well it coordinates agent interactions (e.g., near optimal or merely acceptable) or how efficient it is in utilizing agent resources.

• Robustness: To what extent do changes in the environment invalidate the plans or goals of agents?

• Overhead limitations: Communication bandwidth may be limited.

Scaling up along combinations of these dimensions poses even greater challenges. For instance, the delays associated with propagating information in a highly distributed setting compound the difficulties that arise in a dynamic environment.

3.2 Approaches to multi-agent planning

Multi-agent planning techniques cover quite a range of solutions to different parts of the problem. In this section we structure existing work using the phases in the process of solving a multi-agent planning problem. In general, the following phases can be distinguished (generalizing the main steps in task sharing by (Durfee, 1999)).

1. Refine the global goals or tasks until subtasks remain that can be assigned to individual agents (global task refinement).

2. Allocate this set of subtasks to the agents (task allocation).

3. Define rules or constraints for the individual agents to prevent them from producing conflicting plans (coordination before planning).

4. For each agent: make a plan to reach its goals (individual planning).

5. Coordinate the individual plans of the agents (coordination after planning).

6. Execute the plans and synthesize the results of the subtasks (plan execution).

Not all phases of this general multi-agent planning process always need to be included. For example, if there are no common or global goals, there is no need for phases 1 and 2, and possible conflicts can be dealt with beforehand (in phase 3) or afterwards (in phase 5). Also, some approaches combine different phases. For example, agents can already coordinate their plans while constructing their plans (combining phases 4 and 5), or postpone coordination until the execution phase (combining phases 5 and 6), as, e.g., robots may do when they unexpectedly encounter each other while following their planned routes.

For each of the phases that can be distinguished in a multi-agent planning process, we describe some of the currently most well-known approaches that can be used to deal with the issues arising in such a phase.


Global task refinement

In the first phase, the global tasks or goals are refined such that each remaining task can be done by a single agent. Apart from single-agent planning techniques such as HTN (Erol et al., 1994) or non-linear planning (Penberthy and Weld, 1992; Sacerdoti, 1975), special purpose techniques have been developed to create a global multi-agent plan. Such so-called centralized multi-agent planning approaches in fact use the classical planning framework to construct and execute multi-agent plans (Katz and Rosenschein, 1989; Pednault, 1987).

Task allocation

The centralized multi-agent planning methods mentioned before usually also take care of the assignment of tasks to agents (phase 2). There are, however, many other methods to establish such a task assignment in a more distributed way, giving the agents a higher degree of autonomy and privacy, e.g., via complex task allocation protocols (Shehory and Kraus, 1998) or auctions and market simulations. An auction is a way to make sure that a task is assigned to the agent that attaches the highest value (called private value) to it (Walsh et al., 2000; Wellman et al., 2001). A Vickrey (1961) auction is an example of an auction protocol that is quite often used. In a Vickrey auction each agent can make one (closed) bid, and the task is assigned to the highest bidder for the price of the second-highest bid. This auction protocol has the nice property that bidding agents are stimulated to bid their true private value (i.e., exactly what they think it's worth to them). Market simulations and economics can also be used to distribute large quantities of resources among agents (Walsh and Wellman, 1999; Wellman, 1993; Wellman et al., 1998). For example, in (Clearwater, 1996) it is shown how costs and money are turned into a coordination device. These methods are not only used for task assignment (phase 2), but can also be used for coordinating agents after plan construction (phase 5). In the context of value-oriented environments, game-theoretical approaches (where agents reason about the cost of their decision making or communication) become more important. See, for example, work by Sandholm, supported by results from a multiple dispatch center vehicle routing problem (Sandholm and Lesser, 1997). An overview of value-oriented methods to coordinate agents is given in (Fischer et al., 1998). Markov decision processes in particular offer an interesting opportunity to deal with a partially observable world as well (Pynadath and Tambe, 2002).
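As a minimal illustration of the Vickrey allocation rule (our own sketch with invented names; real protocols add tie-breaking, reserve prices, and communication machinery), a sealed-bid auction for a single task can be written as:

def vickrey_auction(bids):
    """Assign a task to the highest bidder at the second-highest bid.

    `bids` maps agent name -> sealed bid. Returns (winner, price).
    Bidding one's true private value is a dominant strategy because the
    price paid does not depend on the winner's own bid.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price

# Example: three agents bid for a transportation task.
print(vickrey_auction({"taxi1": 8, "taxi2": 11, "taxi3": 5}))  # ('taxi2', 8)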

Coordination before planning

In phase 3 the agents are coordinated before they even start creating their plans. This can be done, for example, by introducing so-called social laws. A social law is a generally accepted convention that each agent has to follow. Such laws restrict the agents in their behavior. They can be used to reduce communication costs and planning and coordination time. In fact, the work of (Yang et al., 1992) and (Foulser et al., 1992) about finding restrictions that make the plan merging process easier, discussed below with the coordination-after-planning approaches, is a special case of this type of coordination. Typical examples of social laws in the real world are traffic rules: because everyone drives on the right side of the road (well, almost everyone), virtually no coordination with oncoming cars is required. Generally, solutions found using social laws are not optimal, but they may be found relatively fast. How social laws can be created in the design phase of a multi-agent system is studied by (Shoham and Tennenholtz, 1995). (Briggs, 1996) proposed more flexible laws, where agents first try to plan using the strictest laws, but when a solution cannot be found agents are allowed to relax these laws somewhat.

Another way to coordinate agents is to figure out the exact interdependencies between their tasks beforehand. Prerequisite constraints can be dealt with centrally using existing planning technology (such as partial order planning (Weld, 1994), among others) by viewing these tasks as single-agent tasks. More recently, an approach has been proposed to deal with interferences (such as shared resources) between the goals of one agent (Thangarajah et al., 2003).

Coordination before planning can also be used to coordinate competitive agents that insist on their planning autonomy. Here, the problem is that we have a set of interrelated (sub)goals that have to be reached by a set of planning agents that do not want to be interfered with during their planning activity. That is, each of the agents requires full planning autonomy, but at the same time we have to be sure that whatever (sub)plan they construct to solve their part of the problem, these subplans can be coordinated seamlessly without requiring replanning. Planning problems like these often occur in multi-modal transportation problems: several parties have to ensure that packages are transported from their source locations to their destinations. The planning agents are prepared to carry out their part of the job if it can be guaranteed that they will not be interfered with by the activities of other agents.

It is clear that most of those planning problems cannot be decomposed into independent subproblems without changing the original planning problem. In (ter Mors et al., 2004) a preplanning coordination method is described that adds a minimal set of additional constraints to the subgoals to be performed in order to ensure a coordinated solution by independent planning.

Individual planning

The fourth phase consists of individual planning for each of the agents. In principle, any planning technique can be used here, and different agents may even use different techniques. There are a couple of approaches that integrate planning (phase 4) and the coordination of plans (phases 3 and 5). In the Partial Global Planning (PGP) framework (Durfee and Lesser, 1987), and its extension, Generalized PGP (Decker and Lesser, 1992, 1994), each agent has a partial conception of the plans of other agents using a specialized plan representation. In this method, coordination is achieved as follows. If an agent A informs another agent B of a part of its own plan, B merges this information into its own partial global plan. Agent B can then try to improve the global plan by, for example, eliminating redundancy it observes. Such an improved plan is shown to other agents, who might accept, reject, or modify it. This process is assumed to run concurrently with the execution of the (first part of the) local plan. PGP was first applied to the distributed vehicle monitoring test bed, but, later on, an improved version has also been shown to work on a hospital patient scheduling problem. Here (Decker and Li, 2000) used a framework for Task Analysis, Environment Modeling, and Simulation (TAEMS) to model such a multi-agent environment. An overview of the PGP-related approaches is given by (Lesser et al., 1998). (Clement and Barrett, 2003) improved upon this PGP framework by separating the planning algorithm from the coordination of actions, using a more modular approach called shared activities (SHAC).

Coordination after planning

A large body of research has focused on what to do after plans have been constructed separately (phase 5). These plan merging methods aim at the construction of a joint plan for a set of agents given the individual (sub)plans of each of the participating agents. (Georgeff, 1983, 1988) was one of the first to actually propose a plan-synchronization process starting with individual plans. He defined a so-called process model to formalize the actions open to an agent. Parts of such a process model are the correctness conditions, which are defined on the state of the world and must be valid before execution of the plan may succeed. Two agents can help each other by changing the state of the world in such a way that the correctness conditions of the other agent become satisfied. Of course, changing the state of the world may help one agent, but it may also interfere with another agent's correctness conditions (Georgeff, 1984).

(Stuart, 1985) uses a propositional temporal logic to specify constraints on plans, such that it is guaranteed that only feasible states of the environment can be reached. These constraints are given to a theorem prover to generate sequences of communication actions (in fact, these implement semaphores) that guarantee that no event will fail. To both improve efficiency and resolve conflicts, one can introduce restrictions on individual plans (in phase 3) to ensure efficient merging. This line of action is proposed by (Yang et al., 1992) and (Foulser et al., 1992), and can also be used to merge alternative plans to reach the same goal.

Another approach to merging a set of plans into a global plan deals with problems arising from both conflicts and redundant actions by using the search method A* and a smart cost-based heuristic (Ephrati and Rosenschein, 1993). They showed that, by dividing the work of constructing subplans over several agents, one can reduce the overall complexity of the merging algorithm (Ephrati and Rosenschein, 1994).

In other works on plan merging, (Ephrati et al., 1995a; Rosenschein, 1995) propose a distributed polynomial-time algorithm to improve social welfare (i.e., the sum of the benefits of all agents). Through a process of group constraint aggregation, agents incrementally construct an improved global plan by voting about joint actions. They even propose algorithms to deal with insincere agents and to interleave planning, coordination, and execution (Ephrati and Rosenschein, 1995).

An approach that considers both conflicts and positive relations is proposed by (von Martial, 1989, 1990, 1992). He represents plans hierarchically, and the top level needs to be exchanged among the agents to determine such relations. If possible, relations are resolved or exploited at this top level. If not, a refinement of the plans is made, and the process is repeated. For each specific type of plan relationship, a different solution is presented. Relations between the plans of autonomous agents are categorized. The main aspects are positive/negative relations, (non-)consumable resources, requests, and favor relationships.


Recently, (Tsamardinos et al., 2000) succeeded in developing a plan merging algorithm that deals with both durative actions and time. They construct a conditional simple temporal network to specify (temporal) conflicts between plans. Based on this specification, a set of constraints is derived that can be solved by a constraint solver. The solution specifies the required temporal relations between actions in the merged plan. One of the problems with the plan merging approaches described above is that one agent may become dependent on another, while initially this was not the case at all.
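For intuition, consistency of an ordinary (non-conditional) simple temporal network can be checked with shortest paths. The following is a standard textbook sketch of our own, not Tsamardinos et al.'s algorithm, which handles the more general conditional case:

def stn_consistent(n, constraints):
    """Check a simple temporal network over time points 0..n-1.

    `constraints` is a list of (i, j, ub) meaning t_j - t_i <= ub.
    The network is consistent iff its distance graph has no negative
    cycle, which Floyd-Warshall detects on the diagonal.
    """
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, ub in constraints:
        d[i][j] = min(d[i][j], ub)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] >= 0 for i in range(n))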

Finally, (Cox and Durfee, 2003) describe how to maintain autonomy while still being able to use results from other agents to improve efficiency. Basically, their idea is to add these dependencies conditionally to the plan: if the other agent succeeds, the more efficient branch of the plan can be executed; otherwise the normal course of action can still be followed.

Plan execution

We consider the sixth phase to be of a slightly different order and somewhat off-topic. (It in fact covers a vast body of work, for instance on reactive agents and behavior models.)

4 Multi-agent planning systems

In this section, we discuss three approaches to planning and coordination from the multi-agent literature in more detail: one approach in which coordination occurs prior to planning (phase 3), one in which planning and coordination are interleaved (phase 4), and one in which agents first make their own plans and then perform plan merging (phase 5).

4.1 Coordination through filtering

(Ephrati et al., 1995b) distinguish two approaches to multi-agent coordination. Explicit coordination involves agents reasoning about their interactions and negotiations. A problem with explicit coordination is that it can be extremely time-consuming, which can be impractical in dynamic domains. With the second approach, implicit coordination, agents follow ‘local rules of behaviour’ that ensure that agents can operate without having to worry about interference from other agents. Social laws are an example of implicit coordination.

Ephrati’s paper deals with implicit coordination in the form of multi-agent filtering. Multi-agent filtering is an extension of single-agent filtering, a strategy designed for agents in dynamic environments. An agent using a single-agent filtering strategy has a set of goals. Due to changes in the environment, opportunities arise to take alternative or additional actions. A filtering strategy filters out those options that are incompatible with the agent’s current goal. A multi-agent filtering strategy also bypasses options that are incompatible with the goals of other agents. Ephrati et al. ask whether rational agents should employ a filtering strategy, since bypassing options for the sake of other agents effectively reduces the number of possible actions for the filtering agent.

Filtering strategies can be augmented with an override mechanism. If an option interferes with the agent’s own or other agents’ goals but looks particularly promising, then the option can be taken into deliberation just like non-conflicting options. The override mechanism is based on a threshold value: a bold agent will not consider many new options (it will have a high threshold), while a cautious agent will use a low threshold value.
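
Such a filter with an override might be sketched as follows; `interferes` and `promise` are hypothetical stand-ins for whatever conflict test and utility estimate an agent actually uses.

def filter_options(options, interferes, promise, threshold):
    """Multi-agent filtering with an override mechanism (a sketch).
    Options that interfere with the agent's own or other agents' goals
    are bypassed, unless they look promising enough to override."""
    return [opt for opt in options
            if not interferes(opt) or promise(opt) > threshold]

# Tileworld-style toy: (hole, payoff, claimed-by-another-agent?).
holes = [("hole3", 2.0, False), ("hole7", 9.0, True)]
bold = filter_options(holes, interferes=lambda h: h[2],
                      promise=lambda h: h[1], threshold=10.0)
print(bold)   # [('hole3', 2.0, False)]: the bold agent's high threshold
              # keeps the filter strict; a cautious agent (say, 5.0)
              # would take hole7 into deliberation as well.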

The notion of interference in this paper differs from the ‘standard’ notion of interference that e.g. (Ephrati and Rosenschein, 1993) use, where for instance the pre-conditions of one action must not be undone by the post-conditions of another action. Instead, the authors identify a single type of conflict situation in their example domain, the multi-agent tileworld domain. In the tileworld domain, there is a map (grid) that is littered with tiles. Objects (holes, tiles, obstacles, etc.) appear and disappear dynamically, and agents receive payment for filling holes with tiles. Conflict is defined as two agents trying to fill the same hole.

For this domain, several filtering strategies are presented. One filtering strategy, static geographic filtering, is based on the locations of the agents and the locations of the holes: agents are assigned non-overlapping portions of the map, and they filter out any options of filling holes that are not in their region. Using the filtering strategy intention posting, agents must post on a blackboard their intention to fill a hole; agents bypass options of filling holes that another agent intends to fill.
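
Intention posting is easy to sketch: agents claim holes on a shared blackboard and bypass holes claimed by others. The class and names below are our own illustration, not code from the paper.

class Blackboard:
    """Shared intention board for the tileworld (illustrative sketch)."""
    def __init__(self):
        self.claims = {}                       # hole -> claiming agent

    def post(self, agent, hole):
        self.claims.setdefault(hole, agent)    # first claim wins

    def claimed_by_other(self, agent, hole):
        return self.claims.get(hole, agent) != agent

def candidate_holes(agent, holes, board):
    """Intention posting as a filtering strategy: bypass options of
    filling holes that another agent already intends to fill."""
    return [h for h in holes if not board.claimed_by_other(agent, h)]

board = Blackboard()
board.post("a1", "hole7")
print(candidate_holes("a2", ["hole3", "hole7"], board))   # ['hole3']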

To answer the question of whether employing a filtering strategy is rational, experiments were run in which a number of agents, from one to all, employed a (static geographic) filtering strategy and all others did not. Filtering proved to be the dominant strategy, because whatever the number of agents using filtering, the filtering agents had, on average, the highest utility. Ephrati et al. do not, however, give any evidence that this conclusion can be generalized to other domains. Indeed, even their claim that filtering is the dominant strategy in the tileworld domain is insufficiently supported, since they do not precisely specify the strategy used by agents not using a filtering strategy.

4.2 Generalized Partial Global Planning

The Partial Global Planning (PGP) framework (Durfee, 1991) is perhaps one of the most influential approaches in distributed artificial intelligence. In the PGP framework, agents cooperate because no agent has complete information. PGP assumes a cooperative distributed problem solving (CDPS) environment (these days referred to simply as DPS, in contrast with multi-agent systems (MAS), where agents are self-interested), in which agents are willing to help each other without any compensation. For CDPS environments, Durfee and Lesser identify four categories of coordination techniques, all of which are encompassed in PGP.

1. Contracting: The distributed problem solving process is viewed as one big process with many potential solvers, as in parallel computing. The goal of coordination is to utilize the problem solvers to the utmost. If agents can execute their tasks independently, then contracting (which redistributes tasks) always has positive utility.


2. Result-sharing: Result-sharing concentrates on domains where tasks are inherently distributed, but the problems arising for one agent may be related to problems arising for other agents.

3. Organizing: Organizational knowledge about, e.g., agents’ roles and responsibilities can help agents decide what information to communicate to which agents.

4. Planning: Traditional planning in distributed artificial intelligence has focused on avoiding resource conflicts. If agents can act independently, the focus is on cooperation.

PGP is geared towards a particular kind of multi-agent domain, that of distributed sensor networks. In their paper, a distributed network of acoustic sensors monitoring vehicle movement is used as the running example. The goal of this network is to provide a consistent view of vehicle movements. In order to do so, agents must interpret their sensor data. Because there is too much data (in particular, too much noise), exhaustively analyzing the sensor inputs is impractical. Fortunately, interrelationships between the data of different agents mean that by sharing information, it is possible to interpret the data accurately and in a timely manner. More specifically, coordination techniques for distributed sensor networks must allow agents to:

• instruct other agents (or propose to them) to collect specific data (for instance, to monitor a certain road),

• determine which information to send to whom and when,

• cooperate in other ways; for instance, an agent that has no data of its own to interpret can be put to work analyzing data of other agents.

Distributed sensor networks are highly dynamic environments, and different PGP coordination techniques are used in different circumstances. If, however, we pretend for a moment that the environment is static, then the following four coordination techniques are applied in sequence:

1. Local planning: An agent makes very rough characterizations of all the possible interpretations of its dataset, and makes tentative plans for all these interpretations. A plan represents future actions at two levels of abstraction: at the high level, it outlines the major steps; at the low level, it details the actions for the next major step.

2. Communication with other agents: To know what information to send to whom and when, PGP employs two types of organization. The task-level organization defines the roles and responsibilities of agents with regard to performing tasks. The meta-level organization defines authority roles, that is, which agents may give orders to other agents.

3. Initializing a Partial Global Plan: To integrate the plans of other agents with its own plan, an agent will try to relate the goals of one plan to the goals of another agent. Goals can be related in various ways. An example of a goal relation in the vehicle monitoring domain is when different agents each monitor a different stretch of track, such that all stretches belong to the same road.


4. Modifying PGPs: Once an agent has received or constructed a partial global plan, it can try to improve it using techniques such as task redistribution and task reordering. By sending such a (modified) partial global plan to other agents, the agent effectively proposes this plan to them. Other agents may accept the roles the first agent has outlined for them in the PGP, they may refuse, or they may send a counterproposal in the form of a modified partial global plan.

In Generalized Partial Global Planning (GPGP) (Decker and Lesser, 1994), PGP is extended by defining a task-oriented framework (described in (Decker and Lesser, 1993)) into which coordination mechanisms can be ‘inserted’. In this way, the set of coordination techniques used in the PGP framework for distributed vehicle monitoring is merely one configuration of a framework that defines a ‘family’ of coordination algorithms not tied to a single domain.

In the TAEMS (Task Analysis, Environment Modeling, and Simulation) framework, tasks can be composed of subtasks, forming a hierarchy with one root node called the task group. Multiple task groups can exist at the same time. The other relationships between tasks are enables (a task can only be started once the enabling task has completed), facilitates (a positive relation), and hinders (a negative relation). For evaluating the performance of the execution of a (set of) task(s), two characteristics are relevant: the elapsed time and the quality of task execution, a performance measure that can be instantiated for specific domains. An agent has beliefs, meaning that the agent believes the part of the global task structure that it can see. Agents can commit themselves to performing tasks for other agents.
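
A minimal rendering of such a task structure might look as follows; this is our own sketch covering only the relations named above, and the real TAEMS framework is considerably richer.

from dataclasses import dataclass, field

@dataclass
class Task:
    """Node in a TAEMS-style task structure (illustrative sketch).
    Tasks form a hierarchy under a root task group; `enables` is a hard
    precedence relation, while `facilitates` and `hinders` are soft
    positive/negative relations on execution quality."""
    name: str
    subtasks: list = field(default_factory=list)
    enables: list = field(default_factory=list)
    facilitates: list = field(default_factory=list)
    hinders: list = field(default_factory=list)

def ready(task, completed):
    """A task may be scheduled once all enabling tasks have completed."""
    return all(e.name in completed for e in task.enables)

track = Task("interpret-track-A")
fuse = Task("fuse-vehicle-map", enables=[track])
root = Task("monitor-vehicles", subtasks=[track, fuse])   # the task group
print(ready(fuse, completed={"interpret-track-A"}))       # True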

Each agent has a local scheduler that schedules the computational resources of the agent, i.e., the scheduler determines which tasks an agent will execute and when. The purpose of the coordination mechanisms is to ensure that the local scheduler is provided with the best possible input for constructing a high-utility schedule. In other words, coordination mechanisms are used to enable an agent to make a good plan.

4.3 Plan merging using a resource formalism

The starting point in (de Weerdt et al., 2003) is two agents with plans that can be executed without taking the plan of the other agent into account. In other words, the agents can perform their plans without need for (further) coordination. However, by cooperating the agents may find more efficient plans.

A resource logic is used to represent plans and actions. The goal situation is to have a set of resources of particular resource types. The initial situation is also a set of resources. Agents can perform actions, called skills, that consume resources and produce resources. A plan can be represented as a directed acyclic graph where the vertices are resources and skills, and the arcs connect resources with skills to indicate either a consumption or a production relationship (see for example Figure 5, where a skill s1 consumes one resource of type a and produces two resources, of types b and c).

A plan may produce more resources than required if (i) the skills produce resources that are not used by subsequent skills in the plan and those resources are not required for the goal, or if (ii) such resources are present in the initial situation.


Figure 5: A plan: skill s1 consumes a resource of type ‘a’ and produces resources of types ‘b’ and ‘c’.

These unused resources can be considered side effects of a plan. Other agents may be able to use these unused resources. Although every agent already has all the resources he needs for his plan, by buying resources from other agents, he does not need to produce them himself. More precisely, if an agent can buy all the useful (i.e., not unused) resources that a particular skill produces, then that skill can be removed from the plan. The removal of a skill in turn frees up the resources that were previously consumed by the deleted skill, which opens up possibilities for further trading of resources.
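
The notion of side effects can be made concrete with a small sketch in which skills consume and produce multisets of resource types; the encoding below is our own simplification of the resource formalism, not the logic of de Weerdt et al. itself.

from collections import Counter

# Skills consume and produce multisets of resource types (cf. Figure 5).
skills = [{"name": "s1", "consumes": ["a"], "produces": ["b", "c"]}]
initial = ["a", "a"]      # resources present in the initial situation
goal = ["b"]              # resource types required in the goal situation

produced = Counter(initial) + sum((Counter(s["produces"]) for s in skills),
                                  Counter())
consumed = Counter(goal) + sum((Counter(s["consumes"]) for s in skills),
                               Counter())

# Side effects: produced (or initially present) but neither consumed by a
# skill nor required by the goal; these are the resources worth trading.
print(dict(produced - consumed))   # {'a': 1, 'c': 1}
# If another agent sells us a 'b', skill s1 becomes removable, which in
# turn frees its input 'a' for further trading.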

This work does not go into the details of the negotiation between agents, because if agents cooperate (i.e., successfully trade resources), utility can only go up: to the selling agent the resources are worthless if he does not sell them, and, assuming rational agents, the buying agent will only buy resources if this leads to a reduction in cost.

5 Future work

Many of the accomplishments of multi-agent research, though fine within their intended settings, fail to impress when applied to ‘real-world’ problems. That is, the simplifying assumptions that underlie these approaches often prove too restrictive for practical use. These simplifying assumptions include (DesJardins et al., 2000):

1. The environment is fairly stable. In the early days of AI planning research, the world was assumed to be static, meaning that changes to the state of the world are always the result of actions performed by the agent. Of course, researchers have long since recognized the importance of allowing unexpected changes to occur in the environment, yet much research still relies on the assumption of a (semi-)static environment.

2. The world is deterministic. We assume that we know the result of each action. Unfortunately, especially in a multi-agent environment, this is not the case, because, for example, another agent may have changed the world after the precondition of an action has been established. However, under the assumption that all agents’ actions are coordinated, this assumption of determinism is quite acceptable.

3. There is a fair degree of coherence. Either the agents are designed to work together, or they are rational and have an incentive to do so. In other words, agents will try to maximize their expected utility (Zlotkin and Rosenschein, 1996).

4. Knowledge about the world is correct and consistent among all agents. In other words, the (relevant part of the) world is completely observable.

5. A feasible goal state exists in which all global goals are achieved, and all private goals are also met (at least to some degree, as in “make a lot of money”).

6. Learning is not required. In other words, (past) events do not affect the agents other than by changing the current state.

7. Communication is reliable and (almost) free. All messages come across safely, the agents share a common ontology and utility units, and there is no significant cost associated with communication actions.

Most of these assumptions are commonly used, and, fortunately, they are acceptable in many application domains.

To relax the first assumption, concerning changes in the environment, DesJardins et al. propose a new multi-agent planning paradigm: Distributed Continual Planning (DCP). A traditional approach to handling uncertainty is to plan for all the contingencies that might arise. If we plan both for the case that a contingency arises and for the case that it does not, we obtain a conditional plan that branches for each of the possible contingencies. It is not hard to see that if the number of possible contingencies becomes very large, the conditional plan becomes unmanageably large. Another approach is therefore to have a single plan and monitor its execution. As soon as a deviation from the plan is detected, some plan repair must be done to ensure that the goals will still be reached.
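
A minimal monitor-and-repair loop might be sketched as follows; `holds`, `apply_action`, and `repair` are hypothetical domain functions, and we assume `repair` eventually returns an executable plan.

def execute_with_monitoring(plan, state, holds, apply_action, repair):
    """Minimal monitor-and-repair loop (a sketch). Planning ahead for k
    independent binary contingencies already yields 2**k conditional
    branches; monitoring instead checks each action's preconditions just
    before execution and repairs only when a deviation is observed."""
    while plan:
        action = plan[0]
        if not holds(action["pre"], state):
            plan = repair(plan, state)     # deviation detected: replan
            continue
        state = apply_action(action, state)
        plan = plan[1:]
    return state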

Distributed Continual Planning aims to go beyond replanning on failure or contingency planning by viewing planning as an ongoing, dynamic process in which planning and execution are interleaved. Continual planning should not only react to changes that threaten the execution of the plan, but also look for opportunities to improve the plan. This might mean, for instance, that if the goal of a plan has become obsolete while the plan is still feasible, the agent should abandon the current plan and search for a better use of its resources. DesJardins et al. state that an agent should engage in continual planning when (i) aspects of the world change dynamically beyond the control of the agent, when (ii) aspects of the world are revealed incrementally, when (iii) time pressure requires an agent to start the execution of its plan before it is complete, or when (iv) the agent’s objectives can evolve over time.

References

Allen, J. F., Hendler, J., and Tate, A., editors (1990). Readings in Planning. Morgan Kaufmann Publishers, San Mateo, CA.


Bernstein, D. S., Givan, R., Immerman, N., and Zilberstein, S. (2000). The complexity of decentralized control of Markov decision processes. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-00), pages 32–37, San Mateo, CA. Morgan Kaufmann Publishers.

Blum, A. L. and Furst, M. L. (1997). Fast planning through planning graph analysis. Artificial Intelligence, 90:281–300.

Bond, A. H. and Gasser, L., editors (1988). Readings in Distributed Artificial Intelligence. Morgan Kaufmann Publishers, San Mateo, CA.

Bonet, B. and Geffner, H. (2002). Planning as heuristic search. Artificial Intelligence, 129:5–33. Special issue on Heuristic Search.

Boutilier, C., Dean, T. L., and Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of AI Research, 11:1–94.

Briggs, W. (1996). Modularity and Communication in Multi-Agent Planning. PhD thesis, University of Texas at Arlington.

Bylander, T. (1994). The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1–2):165–204.

Clearwater, S. (1996). Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific Publishing Co.

Clement, B. J. and Barrett, A. C. (2003). Continual coordination through shared activities. In Proceedings of the Second International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-03).

Cox, J. S. and Durfee, E. H. (2003). Exploiting synergy while managing agent autonomy. In AAMAS ’03 Workshop on Autonomy, Delegation and Control.

de Weerdt, M. M., Bos, A., Tonino, J., and Witteveen, C. (2003). A resource logic for multi-agent plan merging. Annals of Mathematics and Artificial Intelligence, special issue on Computational Logic in Multi-Agent Systems, 37(1–2):93–130.

Decker, K. and Lesser, V. R. (1993). Quantitative modeling of complex computational task environments. In Proceedings of the Twelfth International Workshop on Distributed Artificial Intelligence (DAI-93), pages 67–82, Hidden Valley, Pennsylvania.

Decker, K. S. and Lesser, V. R. (1992). Generalizing the partial global planning algorithm. International Journal of Intelligent and Cooperative Information Systems, 1(2):319–346.

Decker, K. S. and Lesser, V. R. (1994). Designing a family of coordination algorithms. In Proceedings of the Thirteenth International Workshop on Distributed Artificial Intelligence (DAI-94), pages 65–84.

Decker, K. S. and Li, J. (2000). Coordinating mutually exclusive resources using GPGP. Autonomous Agents and Multi-Agent Systems, 3(2):113–157.


DesJardins, M. E., Durfee, E. H., Ortiz, C. L., and Wolverton, M. J. (2000). A survey of research in distributed, continual planning. AI Magazine, 20(4):13–22.

Do, M. and Kambhampati, S. (2001). Sapa: A domain-independent heuristic metric temporal planner. In Proceedings of the Sixth European Conference on Planning (ECP-01), pages 109–120.

Durfee, E. H. (1991). Organizations, plans, and schedules: An interdisciplinary perspective on coordinating AI agents. Journal of Intelligent Systems. Special Issue on the Social Context of Intelligent Systems.

Durfee, E. H. (1999). Distributed problem solving and planning. In Weiß, G., editor, A Modern Approach to Distributed Artificial Intelligence, chapter 3. The MIT Press, San Francisco, CA.

Durfee, E. H. (2001). Scaling up agent coordination strategies. Computer.

Durfee, E. H. and Lesser, V. R. (1987). Planning coordinated actions in dynamic domains. In Proceedings of the DARPA Knowledge-Based Planning Workshop, pages 18.1–18.10.

Ephrati, E., Pollack, M., and Rosenschein, J. S. (1995a). A tractable heuristic that maximizes global utility through local plan combination. In Lesser, V., editor, Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pages 94–101, San Francisco, CA. AAAI Press, distributed by The MIT Press.

Ephrati, E., Pollack, M. E., and Ur, S. (1995b). Deriving multi-agent coordination through filtering strategies. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), pages 679–687, San Mateo, CA. Morgan Kaufmann Publishers.

Ephrati, E. and Rosenschein, J. S. (1993). Multi-agent planning as the process of merging distributed sub-plans. In Proceedings of the Twelfth International Workshop on Distributed Artificial Intelligence (DAI-93), pages 115–129.

Ephrati, E. and Rosenschein, J. S. (1994). Divide and conquer in multi-agent planning. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pages 375–380, Menlo Park, CA. AAAI Press.

Ephrati, E. and Rosenschein, J. S. (1995). A framework for the interleaving of execution and planning for dynamic tasks by multiple agents. In Castelfranchi, C. and Muller, J., editors, From Reaction to Cognition — Fifth European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW-93), LNAI Volume 957, pages 139–156, Berlin. Springer Verlag.

Erol, K., Hendler, J., and Nau, D. S. (1994). HTN planning: Complexity and expressivity. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), volume 2, pages 1123–1128, Seattle, Washington, USA. AAAI Press/MIT Press.


Ferber, J. and Drogoul, A. (1992). Using reactive multi-agent systems in simulation and problem solving. In Avouris, N. M. and Gasser, L., editors, Distributed Artificial Intelligence: Theory and Praxis, volume 5 of Euro Courses: Computer and Information Science, pages 53–80. Kluwer Academic, The Netherlands.

Fikes, R. E. and Nilsson, N. (1971). STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3–4):189–208.

Fischer, K., Ruß, C., and Vierke, G. (1998). Decision theory and coordination in multi-agent systems. Technical Report RR-98-02, DFKI GmbH: German Research Center for Artificial Intelligence.

Foulser, D., Li, M., and Yang, Q. (1992). Theory and algorithms for plan merging. Artificial Intelligence Journal, 57(2–3):143–182.

Garey, M. and Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, New York, NY.

Georgeff, M. P. (1983). Communication and interaction in multi-agent planning. In Proceedings of the Third National Conference on Artificial Intelligence (AAAI-83), pages 125–129, Menlo Park, CA. AAAI Press. Also published in Bond and Gasser (1988), pages 200–204.

Georgeff, M. P. (1984). A theory of action for multiagent planning. In Proceedings of the Fourth National Conference on Artificial Intelligence (AAAI-84), pages 121–125, Menlo Park, CA. AAAI Press. Also published in Bond and Gasser (1988), pages 205–209.

Georgeff, M. P. (1988). Communication and interaction in multi-agent planning. In Bond, A. and Gasser, L., editors, Readings in Distributed Artificial Intelligence, pages 200–204. Morgan Kaufmann Publishers, San Mateo, CA. Also published as Georgeff (1983).

Haddawy, P. and Hanks, S. (1998). Utility models for goal-directed, decision-theoretic planners. Computational Intelligence, 14(3):392–429.

Haslum, P. and Jonsson, P. (1999). Some results on the complexity of planning with incomplete information. In Proceedings of the Fifth European Conference on Planning (ECP-99), volume 1809 of Lecture Notes on Artificial Intelligence, pages 308–318, Berlin. Springer Verlag.

Hoffmann, J. and Nebel, B. (2001). The FF planning system: Fast plan generation through heuristic search. Journal of AI Research, 14:253–302.

Janlert, L.-E. (1987). Modeling change — the frame problem. In Pylyshyn, Z., editor, The Robot’s Dilemma: The Frame Problem in Artificial Intelligence, pages 1–40. Ablex Publishing Co., Norwood, NJ.

Jennings, N. R. (1996). Coordination techniques for artificial intelligence. In O’Hare, G. and Jennings, N., editors, Foundations of Distributed Artificial Intelligence, pages 187–210. John Wiley & Sons, New York, NY.


Kambhampati, S. (1997). Refinement planning as a unifying framework for plan synthesis. AI Magazine, 18(2):67–97.

Katz, M. J. and Rosenschein, J. S. (1989). Plans for multiple agents. In Gasser, L. and Huhns, M. N., editors, Distributed Artificial Intelligence, volume 2 of Research Notes in Artificial Intelligence, pages 197–228. Pitman Publishing and Morgan Kaufmann Publishers, London, UK.

Koehler, J. (1998). Planning under resource constraints. In Proceedings of the Thirteenth European Conference on Artificial Intelligence (ECAI-98), pages 489–493. John Wiley & Sons.

Lesser, V., Decker, K., Carver, N., Garvey, A., Neiman, D., Prasad, M., and Wagner, T. (1998). Evolution of the GPGP domain-independent coordination framework. Technical Report UMASS CS TR 1998-005, University of Massachusetts.

Lifschitz, V. (1987). On the semantics of STRIPS. In Reasoning About Actions and Plans — Proceedings of the 1986 Workshop, pages 1–9, San Mateo, CA. Morgan Kaufmann Publishers.

Malone, T. W. and Crowston, K. (1991). Toward an interdisciplinary study of coordination. Center for Coordination Science, MIT.

Malone, T. W. and Crowston, K. (1993). The interdisciplinary study of coordination. ACM Computing Surveys.

Nebel, B. (2000). On the compilability and expressive power of propositional planning formalisms. Journal of AI Research, 12:271–315.

Nwana, H. S., Lee, L., and Jennings, N. R. (1996). Coordination in software agent systems. BT Technology Journal, 14(4):79–88.

Pednault, E. P. (1987). Formulating multi-agent dynamic-world problems in the classical planning framework. In Georgeff, M. P. and Lansky, A. L., editors, Reasoning About Actions and Plans — Proceedings of the 1986 Workshop, pages 47–82, San Mateo, CA. Morgan Kaufmann Publishers. Also published in Allen et al. (1990).

Penberthy, J. and Weld, D. S. (1992). UCPOP: A sound, complete, partial order planner for ADL. In Proceedings of the Third International Conference on Knowledge Representation and Reasoning (KR&R-92), pages 103–114. Morgan Kaufmann Publishers.

Penberthy, J. and Weld, D. S. (1994). Temporal planning with continuous change. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pages 1010–1015, Menlo Park, CA. AAAI Press.

Pynadath, D. and Tambe, M. (2002). The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of AI Research, 16:389–423.


Raphael, B. (1971). The frame problem in problem-solving systems. In Findler, N. and Meltzer, B., editors, Proceedings of the Advanced Study Institute on Artificial Intelligence and Heuristic Programming, pages 159–169, Edinburgh, UK. Edinburgh University Press.

Refanidis, I. and Vlahavas, I. (2001). The GRT planner: Backward heuristic construction in forward state-space planning. Journal of AI Research, 15:115–161.

Rosenschein, J. S. (1995). Multiagent planning as a social process: Voting, privacy, and manipulation. In Lesser, V. R., editor, Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), page 431, San Francisco, CA. AAAI Press, distributed by The MIT Press.

Sacerdoti, E. D. (1975). The nonlinear nature of plans. In Proceedings of the Fourth International Joint Conference on Artificial Intelligence (IJCAI-75), pages 206–214, San Mateo, CA. Morgan Kaufmann Publishers.

Sandholm, T. W. and Lesser, V. R. (1997). Coalitions among computationally bounded agents. Artificial Intelligence, 94(1):99–137.

Shehory, O. and Kraus, S. (1998). Methods for task allocation via agent coalition formation. Artificial Intelligence, 101(1–2):165–200.

Shoham, Y. and Tennenholtz, M. (1995). On social laws for artificial agent societies: Off-line design. Artificial Intelligence, 73(1–2):231–252.

Smith, D. E. and Weld, D. S. (1999). Temporal planning with mutual exclusion reasoning. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), San Mateo, CA. Morgan Kaufmann Publishers.

Stuart, C. J. (1985). An implementation of a multi-agent plan synchronizer. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85), pages 1031–1033, San Mateo, CA. Morgan Kaufmann Publishers. Also published in Bond and Gasser (1988), pages 216–219.

ter Mors, A., Valk, J., and Witteveen, C. (2004). Coordinating autonomous planners. In Proceedings of the International Conference on Artificial Intelligence, pages 795–801. CSREA Press.

Thangarajah, J., Padgham, L., and Winikoff, M. (2003). Detecting and avoiding interference between goals in intelligent agents. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03).

Tsamardinos, I., Pollack, M. E., and Horty, J. F. (2000). Merging plans with quantitative temporal constraints, temporally extended actions, and conditional branches. In Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems (AIPS-00), pages 264–272, Menlo Park, CA. AAAI Press.

Veloso, M., Carbonell, J., Perez, A., Borrajo, D., Fink, E., and Blythe, J. (1995). Integrating planning and learning: The PRODIGY architecture. Journal of Experimental and Theoretical Artificial Intelligence, 7(1):81–120.


Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance, 16:8–37.

von Martial, F. (1989). Multiagent plan relationships. In Proceedings of the Ninth International Workshop on Distributed Artificial Intelligence (DAI-89), pages 59–72.

von Martial, F. (1990). Coordination of plans in multiagent worlds by taking advantage of the favor relation. In Huhns, M., editor, Proceedings of the Tenth International Workshop on Distributed Artificial Intelligence (DAI-90), number ACT-AI-355-90 in MCC Technical Report, Austin, TX.

von Martial, F. (1992). Coordinating Plans of Autonomous Agents, volume 610 of Lecture Notes on Artificial Intelligence. Springer Verlag, Berlin.

Walsh, W. E. and Wellman, M. P. (1999). A market protocol for decentralized task allocation and scheduling with hierarchical dependencies. In Proceedings of the Third International Conference on Multi-Agent Systems (ICMAS-98), pages 325–332. An extended version of this paper is also available.

Walsh, W. E., Wellman, M. P., and Ygge, F. (2000). Combinatorial auctions for supply chain formation. In Second ACM Conference on Electronic Commerce, pages 260–269. ACM Press.

Weld, D. S. (1994). An introduction to least-commitment planning. AI Magazine, 15(4):27–61.

Weld, D. S. (1999). Recent advances in AI planning. AI Magazine, 20(2):93–123.

Wellman, M. P. (1993). A market-oriented programming environment and its application to distributed multicommodity flow problems. Journal of Artificial Intelligence Research, 1:1–23.

Wellman, M. P., Walsh, W. E., Wurman, P., and MacKie-Mason, J. (1998). Auction protocols for decentralized scheduling. In Proceedings of the Eighteenth International Conference on Distributed Computing Systems.

Wellman, M. P., Walsh, W. E., Wurman, P. R., and MacKie-Mason, J. K. (2001). Auction protocols for decentralized scheduling. Games and Economic Behavior, 35(1–2):271–303.

Wolfman, S. A. and Weld, D. S. (2001). Combining linear programming and satisfiability solving for resource planning. Knowledge Engineering Review, 16(1):85–99.

Yang, Q., Nau, D. S., and Hendler, J. (1992). Merging separately generated plans with restricted interactions. Computational Intelligence, 8(4):648–676.

Zlotkin, G. and Rosenschein, J. S. (1996). Mechanisms for automated negotiation in state oriented domains. Journal of Artificial Intelligence Research, 5:163–238.
