a hybrid linear programming and relaxed plan heuristic for partial satisfaction planning problems j....

A Hybrid Linear Programming and Relaxed Plan Heuristic for Partial

Satisfaction Planning ProblemsJ. Benton

Menkes van den Briel

Subbarao Kambhampati

Arizona State University

PSPUD

Partial Satisfaction Planning with Utility Dependency

cost: 150cost: 1

(at plane loc2)

(in person plane)

(fly plane loc2) (debark person loc2)

(at plane loc1)

(in person plane)

(at plane loc2)

(at person loc2)

utility((at plane loc1) & (at person loc3)) = 10

cost: 150

(fly plane loc3)

(at plane loc3)

(at person loc2)

utility((at plane loc3)) = 1000 utility((at person loc2)) = 1000

util(S0): 0

S0

util(S1): 0

S1

util(S2): 1000

S2

util(S3): 1000+1000+10=2010

S3

sum cost: 150sum cost: 151 sum cost: 251

net benefit(S0): 0-0=0net benefit(S1): 0-150=-150net benefit(S2): 1000-151=849net benefit(S3): 2010-251=1759

Actions have cost Goal sets have utilityloc2loc1

loc3

100200

150

101

Maximize Net Benefit (utility - cost)

(Do, et al., IJCAI 2007)(Smith, ICAPS 2004; van den Briel, et al., AAAI 2004)

Action Cost/Goal Achievement

InteractionPlan Quality

Heuristic search forSOFT GOALS

Relaxed Planning Graph Heuristics

Integer programming (IP)

LP-relaxation Heuristics

(Do & Kambhampati, KCBS 2004;

Do, et al., IJCAI 2007)

Cannot take all complex interactions

into account

Current encodings don’t scale well, can only be optimal to

some plan stepBBOP-LP

Use its LP relaxation for a

heuristic value

Build a network flow-based

IP encoding

Approach

Perform branch and boundsearch

No time indicesUses multi-valued variables

Gives a second relaxation onthe heuristic

Uses the LP solution to find a relaxed plan(similar to YAHSP, Vidal 2004)

Building a HeuristicA network flow model on variable transitions

Capture relevant transitions with multi-valued fluentsprevail constraints

initial statesgoal states

cost on actions utility on goals

(no time indices)

loc2loc1

loc3

100200

150

101plane person

cost: 1

cost: 1

cost: 1

cost: 1

cost: 1

cost: 1

cost: 101

cost: 150

cost: 200

cost: 100

util: 1000

util: 10

util: 1000

Building a HeuristicConstraints of this model

2. If a fact is deleted, then it must be added to re-achieve a value.3. If a prevail condition is required, then it must be achieved.

1. If an action executes, then all of its effects and prevail conditions must also.

4. A goal utility dependency is achieved iff its goals are achieved.

plane person

cost: 1

cost: 1

cost: 1

cost: 1

cost: 1

cost: 1

cost: 101

cost: 150

cost: 200

cost: 100

util: 1000

util: 10

util: 1000

Building a HeuristicConstraints of this model

1. If an action executes, then all of its effects and prevail conditions must also.

2. If a fact is deleted, then it must be added to re-achieve a value.

3. If a prevail condition is required, then it must be achieved.

4. A goal utility dependency is achieved iff its goals are achieved.

action(a) = Σeffects of a in v effect(a,v,e) + Σprevails of a in v prevail(a,v,f)

1{if f ∈ s0[v]} + Σeffects that add f effect(a,v,e) = Σeffects that delete f effect(a,v,e) + endvalue(v,f)

1{if f ∈ s0[v]} + Σeffects that add f effect(a,v,e) ≥ prevail(a,v,f) / M

goaldep(k) ≥ Σf in dependency k endvalue(v,f) – |Gk| – 1

goaldep(k) ≤ endvalue(v,f) ∀ f in dependency k

Variablesaction(a) ∈ Z+ The number of times a ∈ A is executed

effect(a,v,e) ∈ Z+ The number of times a transition e in state variable v is caused by action a

prevail(a,v,f) ∈ Z+ The number of times a prevail condition f in state variable v is required by action a

endvalue(v,f) ∈ {0,1}

Equal to 1 if value f is the end value in a state variable v

goaldep(k) Equal to 1 if a goal dependency is achievedParameterscost(a) the cost of executing action a ∈ A

utility(v,f) the utility of achieving value f in state variable v

utility(k) the utility of achieving achieving goal dependency Gk

Variablesaction(a) ∈ Z+ The number of times a ∈ A is executed

effect(a,v,e) ∈ Z+ The number of times a transition e in state variable v is caused by action a

prevail(a,v,f) ∈ Z+ The number of times a prevail condition f in state variable v is required by action a

endvalue(v,f) ∈ {0,1}

Equal to 1 if value f is the end value in a state variable v

goaldep(k) Equal to 1 if a goal dependency is achievedParameterscost(a) the cost of executing action a ∈ A

utility(v,f) the utility of achieving value f in state variable v

utility(k) the utility of achieving achieving goal dependency Gk

Objective FunctionMAX Σv∈V,f∈Dv utility(v,f) endvalue(v,f) + Σk∈K utility(k) goaldep(k) – Σa∈A cost(a)

action(a)Maximize Net Benefit

2. If a fact is deleted, then it must be added to re-achieve a value.1{if f ∈ s0[v]} + Σeffects that add f effect(a,v,e) = Σeffects that delete f effect(a,v,e) +

endvalue(v,f)3. If a prevail condition is required, then it must be achieved.1{if f ∈ s0[v]} + Σeffects that add f effect(a,v,e) ≥

prevail(a,v,f) / M

Updatedat each search node

SearchBranch and Bound

Branch and bound with time limit

Greedy lookahead strategy

All soft goals; all states are goal states

LP-solution guided relaxed plan extraction

Similar to YAHSP (Vidal, 2004)

Returns the best plan (i.e., best bound)

To quickly find good bounds

To add informedness

(at plane loc1) (at plane loc1)

(at plane loc3)


(at plane loc1)

(at plane loc3)

(drop person loc2)

(fly loc2 loc3)

(fly loc1 loc2)

(fly loc3 loc2)

(fly loc1 loc2)

(fly loc1 loc3) (fly loc1 loc3)

(in person plane) (in person plane) (in person plane)

(at person loc2)

Getting a Relaxed Plan



(at plane loc3)


(at plane loc1)

(at plane loc3)

(drop person loc2)

(fly loc2 loc3)

(fly loc1 loc2)

(fly loc3 loc2)

(fly loc1 loc2)



(at person loc2)



(at plane loc3)


(at plane loc1)

(drop person loc2)

(fly loc2 loc3)

(fly loc1 loc2)

(fly loc3 loc2)

(fly loc1 loc2)



(at plane loc3)

(at person loc2)

Experimental Setup

Three modified IPC 3 domains: zenotravel, satellite, rovers (maximize net benefit)

Compared with

, an admissible cost propagation-based heuristicRan with 600 second time limit

SPUDS, uses a relaxed plan-based heuristic

- action costs- goal utilities- goal utility dependencies

BBOP-LP : with and without RP lookahead

Results

rovers satellite

zenotravel

optimalsolutions Found optimal

solution in 15 of 60 problems

(higher net benefit is better)

Results

Novel LP-based heuristic for partial satisfaction planning

Planner that is sensitive to plan quality: BBOP-LP

Future WorkImprove encoding

Explore other lookahead methods

Summary

Branch and bound with RP lookahead

a hybrid linear programming and relaxed plan heuristic for partial satisfaction planning problems j....

Documents

planeperson cost

utility dependency cost

f effecta

person plane fly plane

s3s3 sum cost

net benefit utility

goaldepk f

person loc3