
Dynamic Control in Path-Planning with Real-Time Heuristic Search

Heuristic search for planning: ASP, HSP, FF, SHERPA, LDFS

Real-time heuristic search: constant time per move

Real-time path-planning: 1-3 ms for all units in games

Learning Real-time A* (LRTA*): constant lookahead depth; heuristic w.r.t. a fixed goal

Dynamic selection at each step: lookahead depth and sub-goal (see the sketch below)
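The control loop behind this idea can be summarized in a few lines. Below is a minimal sketch, assuming hypothetical helpers select_depth and select_goal (standing in for the pattern-database and decision-tree controllers described later on this poster), a dict-based heuristic table, and a simple breadth-first lookahead with unit edge costs; it is illustrative, not the authors' implementation.

```python
from collections import deque

def dynamic_lrta_step(state, global_goal, h, neighbors, select_depth, select_goal):
    """One planning/execution step of LRTA* with dynamic control (sketch).

    h is a dict of heuristic estimates keyed by (state, goal); neighbors(s)
    yields adjacent states; select_depth and select_goal are hypothetical
    controllers. Unit edge costs are assumed.
    """
    depth = select_depth(state, global_goal)
    goal = select_goal(state, global_goal)

    # Breadth-first lookahead to the chosen depth.
    queue, dist, first_move = deque([state]), {state: 0}, {}
    best_f, best_state = float("inf"), state
    while queue:
        s = queue.popleft()
        f = dist[s] + h.get((s, goal), 0)
        if dist[s] > 0 and f < best_f:
            best_f, best_state = f, s
        if dist[s] == depth:
            continue
        for n in neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                first_move[n] = first_move.get(s, n)  # remember the first action on the path
                queue.append(n)

    # LRTA*-style learning: raise h of the current state toward the best lookahead value.
    h[(state, goal)] = max(h.get((state, goal), 0), best_f)
    # Execute a single action toward the best frontier state.
    return first_move.get(best_state, state)
```

A full agent would call this step in a loop, re-selecting the depth and sub-goal after every executed move, until the global goal is reached.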

Pattern-database approach: pre-computing for each state is intractable; pre-compute for each abstract state instead

Optimal action? No. Depth and next goal? Yes.

Decision-tree approach: use local search-space attributes to predict the best lookahead depth

heuristic estimates, n-step progress, error estimates

Pros of each approach. Pattern databases: optimal depths and sub-goals stored. Decision trees: usable on unseen maps.

Empirical Evaluation: 3 RTS maps × 100 problems each; Dynamic LRTA* vs. LRTA*

Dynamic LRTA* vs. state-of-the-art

Larger maps: even more gains!

Conclusions

State-of-the-art in real-time path-planning (both with and without abstractions).

Future work: integrate with PR-LRTS; applicability to general planning

Poster panels: Problem Formulation, Dynamic LRTA*, Results

State Abstraction for Real-time Moving Target Pursuit

We are interested in domains that are:

• a priori unknown

• adversarial
• real-time (thinking time matters)

Examples:

• video games
• robotics

In this research we study moving target pursuit in a game-like environment:

• 2D grid
• initially unknown maps
• target moves in real time

Standard methods: incremental A*, D*, and D* Lite compute the entire path before the first move can be made. They can be too slow when the target moves away in real time.

Technique: Automated map abstraction has proven effective for speeding up search and learning.

Objectives:

• minimize convergence and interception travel

Problem Formulation

Unifying architecture for pursuit and target agents:

Pursuit agents have known target locations. They use the following algorithms for action planning:

• eMTS: a version of MTS [Ishida, Korf 1991] extended with a lookahead of 10, a heuristic weight of 0.1, and the max-of-min update rule. About 30x faster.

• PR MTS: eMTS at level 3, A* (level 0)

• A*

• PRA*: Partial-Refinement A*.

Target agents must choose the best location to run to and then potentially do action planning to get there. We use the following policies:

• randomer: a random move at each time step.

• random: a random direction every 5 steps.

• minimax: use 5-ply minimax search to select a location for movement.

• DAM: Dynamic Abstract Minimax uses varying levels of abstract graphs to do minimax planning.

Agent Design

1,000 problems on 10 maps (400 - 1,349 states):

Figure 6: The 10 synthetic maps used in the first and second experiments.

Figure 7: The two maps from commercial games used in the third experiment.

References

Botea, A.; Muller, M.; and Schaeffer, J. 2004. Near optimal hierarchical path-finding. Journal of Game Development 1(1):7–28.

Bulitko, V., and Lee, G. 2006. Learning in real time search: A unifying framework. Journal of Artificial Intelligence Research 25:119–157.

Bulitko, V.; Sturtevant, N.; and Kazakevich, M. 2005. Speeding up learning in real-time search via automatic state abstraction. In Proceedings of the National Conference on Artificial Intelligence (AAAI), 1349–1354.

Bulitko, V. 2004. Learning for adaptive real-time search. Technical Report http://arxiv.org/abs/cs.AI/0407016, Computer Science Research Repository (CoRR).

Chimura, F., and Tokoro, M. 1994. The Trailblazer search: A new method for searching and capturing moving targets. In Proceedings of the National Conference on Artificial Intelligence, 1347–1352.

Culberson, J. C., and Schaeffer, J. 1996. Searching with pattern databases. In Canadian Conference on AI, 402–416.

Edelkamp, S. 1998. Updating shortest paths. In Proceedings of the European Conference on Artificial Intelligence, 655–659.

Hernandez, C., and Meseguer, P. 2005. LRTA*(k). In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI).

Holte, R.; Mkadmi, T.; Zimmer, R. M.; and MacDonald, A. J. 1996. Speeding up problem solving by abstraction: A graph oriented approach. Artificial Intelligence 85(1-2):321–361.

Holte, R. C.; Grajkowski, J.; and Tanner, B. 2005. Hierarchical heuristic search revisited. LNAI 3607, Springer, 121–133.

Ishida, T., and Korf, R. 1991. Moving target search. In Proceedings of the International Joint Conference on Artificial Intelligence, 204–210.

Ishida, T., and Korf, R. 1995. A realtime search for changing goals. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(6):609–619.

Ishida, T. 1992. Moving target search with intelligence. In Proceedings of the National Conference on Artificial Intelligence, 525–532.

Learning agents

target \ pursuer    eMTS    PR MTS    Speed-up
randomer             701       110        527%
random               380       268         42%
minimax              241       214         13%
DAM                    -     1,148           -

Search agents

target \ pursuer     A*     PRA*    Speed-up
randomer            107      102          5%
random              217      193         12%
minimax             329      355         -7%
DAM                 474      649        -27%

Effects of State Abstraction

Vadim Bulitko, Nathan Sturtevant (http://ircl.cs.ualberta.ca/games)

[Agent architecture diagram: Target Selection (known target, minimax, random) → Action Planning (PRA*/A*, LRTS, random) → Action Execution (full execution, partial execution).]


Figure 3: The process of abstracting a graph.

of the map as if they were single entities. Thus, the primary contribution of this paper is the extension of learning real-time heuristic search with a state abstraction mechanism.

Building a State Abstraction

State abstraction has been studied extensively in reinforcement learning (Barto & Mahadevan 2003). While our approach is fully automatic, many algorithms, such as MAXQ (Dietterich 1998), rely on manually engineered hierarchical representation of the space.

Automatic state abstraction has precedents in heuristic search and path-finding. For instance, Hierarchical A* (Holte et al. 1995) and AltO (Holte et al. 1996) used abstraction to speed up classical search algorithms. Our approach to automatically building abstractions from the underlying state representation is similar to Hierarchical A*.

We demonstrate the abstraction procedure on a hand-traceable micro-example in Figure 3. Shown on the left is the original graph of 11 states. In general, we can use a variety of techniques to abstract the map, and we can also process the states in any order. Some methods and orderings may, however, work better in specific domains. In this paper, we look for cliques in the graph.

For this example, we begin with the state labeled A, adding it and its neighbors, B and C, to abstract group 1, because they are fully connected. Their group becomes a single state in the abstract graph. Next we consider state D, adding its neighbor, E, to group 2. We do not add H because it is not connected to D. We continue to state F, adding its neighbor, G, to group 3. States H, I, and J are fully connected, so they become group 4. Because state K can only be reached via state H, we add it to group 4 with H. If all neighbors of a state have already been abstracted, that state will become a single state in the abstract graph. As states are abstracted, we add edges between existing groups. Since

Figure 4: Abstraction levels 0, 1, and 2 of a toy map. The number of states is 206, 57, and 23 correspondingly.

there is an edge between B and E, and they are in different groups, we add an edge between groups 1 and 2 in the abstract graph. We proceed similarly for the remaining inter-group edges. The resulting abstracted graph of 4 states is shown in the right portion of the figure.

We repeat the process iteratively, building an abstraction hierarchy until there are no edges left in the graph. If the original graph is connected, we will end up with a single state at the highest abstraction level; otherwise we will have multiple disconnected states. Assuming a sparse graph of V vertices, the size of all abstractions is at most O(V), because we are reducing the size of each abstraction level by at least a factor of two. The cost of building the abstractions is O(V). Figure 4 shows a micro example.

Because the graph is sparse, we represent it with a list of states and edges as opposed to an adjacency matrix. When abstracting an entire map, we first build its connectivity graph and then abstract this graph in two passes. Our abstractions are most uniform if we remove 4-cliques in a first pass, and then abstract the remaining states in a second pass.
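The following is a minimal sketch of a single clique-abstraction pass over an undirected adjacency-set graph. It greedily groups each unassigned state with neighbors that are mutually connected and then lifts cross-group edges to the abstract graph; the refinements described above (folding a dead-end state such as K into its neighbor's group, and the 4-clique-first two-pass ordering) are omitted, so treat it as illustrative rather than the authors' implementation.

```python
def abstract_once(graph):
    """One level of clique abstraction (sketch).

    graph: dict mapping each state to the set of its neighboring states
    (assumed symmetric). Returns (groups, abstract_graph), where groups maps
    every ground state to an abstract-state id and abstract_graph is the
    adjacency dict of the abstracted graph.
    """
    groups, next_id = {}, 0
    for s in graph:                              # states processed in arbitrary order
        if s in groups:
            continue
        clique = {s}
        for n in graph[s]:                       # greedily grow a clique around s
            if n not in groups and all(n in graph[c] for c in clique):
                clique.add(n)
        for member in clique:                    # the clique becomes one abstract state
            groups[member] = next_id
        next_id += 1

    abstract_graph = {g: set() for g in range(next_id)}
    for s, nbrs in graph.items():                # ground edges crossing groups become abstract edges
        for n in nbrs:
            if groups[s] != groups[n]:
                abstract_graph[groups[s]].add(groups[n])
                abstract_graph[groups[n]].add(groups[s])
    return groups, abstract_graph
```

Repeating abstract_once on its own output builds the abstraction hierarchy until no edges remain.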

Repairing Abstraction During Exploration

A new map is initially unknown to the agent. Under the free space assumption, the unknown areas are assumed empty and connected. As the map is explored, obstacles are found and the initial abstraction hierarchy needs to be repaired to reflect these changes. This is done with four operations: remove-state, remove-edge, add-state, and add-edge. We describe the first two in detail here.

In the abstraction, each edge either abstracts into another edge in the parent graph, or becomes internal to a state in the parent graph. Thus, each abstract edge must maintain a count of how many edges it is abstracting from the lower level. When remove-edge removes an edge, it decrements the count of edges abstracted by the parent edge, and recursively removes the parent if the count falls to zero. If an edge is abstracted into a state in the parent graph, we add that state to a repair queue to be handled later. The remove-state operation is similar. It decrements the number of states abstracted by the parent, removing the parent recursively if needed, and then adds the parent state to a repair queue. This operation also removes any edges incident to the state.
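A compact sketch of these two operations, assuming a hierarchy in which every edge and state records its parent and a count of the lower-level objects it abstracts. The class and field names are illustrative, not the authors' data structures; bookkeeping such as detaching a removed edge from its endpoints is omitted, and a plain set stands in for the level-sorted repair queue described below.

```python
class AbstractEdge:
    def __init__(self, parent_edge=None, parent_state=None):
        self.parent_edge = parent_edge    # abstract edge one level up, if any
        self.parent_state = parent_state  # set if this edge is internal to a parent state
        self.count = 0                    # number of lower-level edges abstracted

class AbstractState:
    def __init__(self, parent=None):
        self.parent = parent              # abstract state one level up, if any
        self.count = 0                    # number of lower-level states abstracted
        self.edges = set()                # incident AbstractEdge objects

def remove_edge(edge, repair_queue):
    if edge.parent_edge is not None:
        edge.parent_edge.count -= 1
        if edge.parent_edge.count == 0:          # parent edge abstracts nothing any more
            remove_edge(edge.parent_edge, repair_queue)
    elif edge.parent_state is not None:
        repair_queue.add(edge.parent_state)      # internal edge gone: repair that region later

def remove_state(state, repair_queue):
    for e in list(state.edges):                  # drop all incident edges first
        remove_edge(e, repair_queue)
    if state.parent is not None:
        state.parent.count -= 1
        if state.parent.count == 0:
            remove_state(state.parent, repair_queue)
        repair_queue.add(state.parent)           # parent region needs repair either way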

When updating larger areas of the map in one pass, using a repair queue allows us to share the cost of the additional steps required to perform further repairs in the graph. Namely, there is no need to completely repair the abstraction if we know we are going to make other changes. The repair queue is sorted by abstraction level in the graph to ensure that repairs do not conflict.

In a graph with n states, the remove-state and remove-edge operations can, in the worst case, take O(log n) time. However, their time is directly linked to how many states are affected by the operation. If there is one edge that cuts the entire graph, then removing it will take O(log n) time. However, in practice, most removal operations have a local influence and take O(1) time. Handling states in the repair queue is an O(log n) time operation in the worst case, but again, we only pay this cost when we are making changes that affect the connectivity of the entire map. In practice,

Lookahead Pathology in Real-Time Path-Finding

Vadim Bulitko ([email protected]), University of Alberta, Department of Computing Science, Edmonton, Alberta, Canada

Mitja Luštrek ([email protected]), Jožef Stefan Institute, Department of Intelligent Systems, Ljubljana, Slovenia

Introduction
• Real-time path-finding ⇒ incomplete search methods ⇒ suboptimal actions
• Deeper lookahead believed to produce better actions
• Sometimes the opposite is true: pathology

Setting
• Path-finding in grid world
• Algorithm: LRTS [Bulitko & Lee 06]
• Two types of experiments:
  • on-policy: start state ⇒ goal state, heuristic updated
  • off-policy: randomly selected states, one move, heuristic not updated
• Degree of pathology: number of lookahead depths where error is larger than at the previous depth
• 1,000 problems (map, start state, goal state)
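To make the degree-of-pathology measure concrete, here is a tiny illustrative helper (not from the poster) that counts, over a list of per-depth errors, how many depths have a larger error than the preceding depth:

```python
def degree_of_pathology(error_by_depth):
    """error_by_depth[i] is the error measured at lookahead depth i+1.
    Returns the number of depths whose error exceeds the previous depth's."""
    return sum(1 for prev, cur in zip(error_by_depth, error_by_depth[1:]) if cur > prev)

# Example: errors over depths 1..5 rise twice (depth 3 and depth 5).
assert degree_of_pathology([0.30, 0.25, 0.28, 0.20, 0.22]) == 2
```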

Pathology Observed

Degree of pathology       0     1     2     3     4    ≥5
On-policy (problems %)  38.5  15.1  20.3  17.0   7.6   1.5

Second Explanation
• Smaller lookahead depths benefit more from the updates to the heuristic ⇒ depths closer ⇒ larger more likely worse than smaller
• First test: on-policy, ignore updates when measuring error ⇒ less pathology
• Second test: observe volume of updates to the heuristic ⇒ smaller volume at smaller depths

Degree of pathology        0     1     2     3     4    ≥5
No updates (problems %)  79.8  14.2   4.5   1.2   0.3   0.0

[Plot: update volume per generated state (0-4.5) vs. lookahead depth (1-10); curves: on-policy, off-policy.]

Third Explanation
• Fewer searches performed at larger lookahead depths ⇒ depths closer ⇒ larger more likely worse than smaller
• First test: on-policy experiment, search every move ⇒ less pathology
• Second test: observe number of states generated per move when searching every move ⇒ steeper increase than normally

Degree of pathology         0     1     2     3     4    ≥5
Every move (problems %)   86.9   9.0   3.3   0.6   0.2   0.0

[Plot: states generated per move (0-1800) vs. lookahead depth (1-10); curves: on-policy, on-policy every move, off-policy.]

Towards a Remedy
• Adaptive lookahead on an example map:
  • optimal depth per start state: 48% less travel than best fixed depth
  • optimal depth per move: additional reduction
    • 14% in travel
    • 43% in computation per move
• Need to know optimal depths!
• Expensive to pre-compute for every state pair (7.6 × 10^6 pairs)
• State abstraction [Bulitko et al. 05]:
  • 0.004% of state pairs pre-computed
  • 33% less travel than best fixed depth

[Diagram of LRTS lookahead: current state, goal state, lookahead area, lookahead depth, next state, updated state, current lookahead area; variant searching every move.]

First Explanation
• Many pathological states
• First explanation apparently not correct
• Why the large difference between on-policy and off-policy?

Degree of pathology        0     1     2     3     4    ≥5
Off-policy (problems %)  95.7   3.7   0.6   0.0   0.0   0.0

agent, i.e., (3) looks at the states the agent was in during the last n steps, and counts the number of reoccurring states.

The classifier predicts the optimal search depth for the current state. To avoid pre-computing true optimal actions, we assume that deeper search usually yields a better action. In the training phase, the agent first conducts a d_max-ply search to derive the “optimal” action. Then a series of progressively shallower searches are performed to determine the shallowest search depth, d_s, that still returns the “optimal” action. During this process, if at any given depth an action is returned that differs from the optimal action, the progression is stopped. This enforces all depths from d_s to d_max to agree on the best action. This is important for improving the overall robustness of the classification, as the classifier must generalize over a large set of states. The depth d_s is set as the class label for the vector of features describing the current state.
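As a concrete sketch of that labeling procedure (illustrative only; best_action_at_depth is a hypothetical stand-in for a d-ply lookahead that returns the preferred action):

```python
def label_training_state(state, goal, best_action_at_depth, d_max):
    """Find the shallowest depth d_s such that every depth from d_s to d_max
    returns the same action as the d_max-ply ("optimal") search."""
    optimal = best_action_at_depth(state, goal, d_max)  # deepest search defines the target action
    d_s = d_max
    for d in range(d_max - 1, 0, -1):                   # progressively shallower searches
        if best_action_at_depth(state, goal, d) != optimal:
            break                                       # first disagreement stops the progression
        d_s = d
    return d_s                                          # class label for this state's feature vector
```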

Once a decision-tree classifier is built, the overhead of using it is negligible. The real-time response is ensured by fixing the maximum length that a decision-tree branch can grow to, as well as the length of the recent history from which we collect input features. This maximum is independent of the problem (map) size. Indeed, the four input features for the classifier are all efficiently computed in small constant time, and the classifier itself is only a handful of shallowly nested conditional statements. Thus, the execution time is dominated by LRTA*'s lookahead search. The process of gathering training data and building the classifier is carried out off-line. Although the classifier appears simplistic, with minimal knowledge about the domain, as shown later it performs surprisingly well.

Pattern Database Approach

We start with a naïve approach as follows. For each (s_start, s_goal) state pair, the true optimal action a*(s_start, s_goal) is to take an edge that lies on an optimal path from s_start to s_goal. Once a*(s_start, s_goal) is known, we can run a series of progressively deeper searches from state s_start. The shallowest search depth that yields a*(s_start, s_goal) is the optimal search depth d*(s_start, s_goal).

There are two problems with the naïve approach. First, d*(s_start, s_goal) is not a priori upper-bounded, thereby forfeiting LRTA*'s real-time property. Second, pre-computing d*(s_start, s_goal) or a*(s_start, s_goal) for all pairs of (s_start, s_goal) states on even a 512 × 512 cell video-game map has prohibitive time and space complexity. We solve the first problem by capping d*(s_start, s_goal) at a fixed constant c ≥ 1 (henceforth called cap). We solve the second problem by using an automatically built abstraction of the original search space. The entire map is partitioned into regions (or abstract states) and a single search depth value is pre-computed for each pair of abstract current and goal states. The resulting data are a pattern database (Culberson & Schaeffer 1998).

This approach speeds up pre-computation and reduces memory overhead (both important considerations for commercial video games). To illustrate, in typical grid world video-game maps, a single application of clique abstraction (Sturtevant & Buro 2005) reduces the number of states by a factor of 2 to 3. At the abstraction level of five, each region contains about one hundred ground-level states. Thus, a single search depth value is shared among about ten thousand state pairs. As a result, five-level clique abstraction yields a four orders of magnitude reduction in memory and about two orders of magnitude reduction in pre-computation time (as analyzed later in the section).

Figure 2: Goal is shown as G, agent as A. Diamonds denote representative states for each tile. Left: Optimal actions are shown for each representative of an abstract tile; applying the action of the agent's tile in the agent's current location leads into a wall. Right: The dashed line denotes the optimal path.

Two alternatives to storing the optimal search depth are to store an optimal action or the optimal heuristic value. The use of abstraction excludes both of them. Indeed, sharing an optimal action computed for a single ground-level representative of an abstract region among all states in the region may cause the agent to run into a wall (Figure 2, left). Likewise, sharing a single heuristic value among all states in a region leaves the agent without a sense of direction, as all states in its vicinity would look equally close to the goal. Such agents are not guaranteed to reach a goal, let alone minimize travel. In contrast, sharing the search depth among any number of ground-level states is safe, as LRTA* is complete for any search depth. Additionally, the optimal search depth is more robust with respect to map changes (e.g., a bridge being destroyed in a video-game).

We compute a single pattern database per map off-line (Figure 3). In line 1 the state space is abstracted ℓ times. In this paper we use clique abstraction that merges fully connected states. This respects the topology of a map but requires storing the abstraction links explicitly. An alternative is to use the regular rectangular tiles of Botea, Muller, & Schaeffer (2004). Lines 2 through 7 iterate through all pairs of abstract states. For each pair (s'_start, s'_goal), representative ground-level states s_start and s_goal (e.g., centroids of the regions) are picked and the optimal search depth value d is calculated for them. To do this, Dijkstra's algorithm is run over the ground-level search space (V, E) to compute true minimal distances from any state to s_goal. Once the distances are known for all successors of s_start, an optimal action a*(s_start, s_goal) can be computed trivially. Then the optimal search depth d*(s_start, s_goal) is computed as described above and capped at c (line 5). The resulting value d is stored for the pair of abstract states (s'_start, s'_goal) in line 6.
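A sketch of that off-line computation follows. The paper's Figure 3 pseudocode is not reproduced in this text, so names such as representative and optimal_depth_for are hypothetical stand-ins; here one Dijkstra pass per abstract goal is reused for all start regions.

```python
import heapq

def build_depth_pdb(abstract_states, representative, neighbors, cost,
                    optimal_depth_for, cap):
    """Pattern database of capped optimal search depths (illustrative sketch).
    representative(s_abs) picks a ground-level state (e.g., a region centroid);
    optimal_depth_for(s_start, s_goal, dist_to_goal) stands in for the
    progressively-deeper-search step that finds d*(s_start, s_goal)."""
    pdb = {}
    for goal_abs in abstract_states:
        s_goal = representative(goal_abs)
        dist = dijkstra_from(s_goal, neighbors, cost)      # true distances to s_goal
        for start_abs in abstract_states:
            s_start = representative(start_abs)
            d = optimal_depth_for(s_start, s_goal, dist)   # shallowest depth yielding a*
            pdb[(start_abs, goal_abs)] = min(d, cap)       # cap keeps the real-time bound
    return pdb

def dijkstra_from(source, neighbors, cost):
    dist, frontier = {source: 0.0}, [(0.0, source)]
    while frontier:
        d, s = heapq.heappop(frontier)
        if d > dist[s]:
            continue
        for n in neighbors(s):
            nd = d + cost(s, n)
            if nd < dist.get(n, float("inf")):
                dist[n] = nd
                heapq.heappush(frontier, (nd, n))
    return dist
```

At run time the agent would simply look up the entry for its abstract start and goal regions to set its lookahead depth, as described next.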

During run-time, an LRTA* agent located in state s_start and with the goal state s_goal sets the search depth as the pattern-database value for the pair (s'_start, s'_goal), where s'_start and s'_goal are images of s_start and s_goal under an ℓ-level abstraction. The run-time complexity is minimal as s'_start, s'_goal, d(s'_start, s'_goal)

Vadim Bulitko Yngvi Björnsson Mitja Luštrek Jonathan Schaeffer Sverrir Sigmundarson

[Figure 5 plots (two panels): x-axis Mean number of nodes expanded per action (0-450); y-axis Suboptimality (times) (1-3.5); curves: Fixed depth (G) at depths 5-14, Pattern db. (G), Decision tree (G), and Oracle (G) with caps 5, 7, 10, 15, 20.]

Figure 5: Margins for improvement over fixed-depth LRTA*.

other words, in addition to selecting the search depth per state, the goal is also selected dynamically, per action.

Finally, one can imagine always using intermediate goals (until the agent enters the abstract state containing s_goal). This gives us three approaches to selecting goal states in LRTA*: always the global goal (G); an intermediate goal if the global goal requires excessive search depth (G+I); always an intermediate goal unless the agent is in the goal region (I).
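A sketch of how the three policies could be wired to the pattern database. Illustrative assumptions: the "excessive depth" test is taken to be hitting the cap, and next_region_toward is a hypothetical helper returning a neighbouring abstract region on the way to the goal; neither detail is specified by the text above.

```python
def select_goal_and_depth(start_abs, goal_abs, pdb, cap, mode, next_region_toward):
    """Pick the goal region and lookahead depth under policy G, G+I, or I (sketch).
    pdb maps (abstract start, abstract goal) to a pre-computed search depth."""
    if start_abs == goal_abs or mode == "G":
        return goal_abs, pdb[(start_abs, goal_abs)]        # always the global goal
    if mode == "I":                                        # always an intermediate goal
        inter = next_region_toward(start_abs, goal_abs)
        return inter, pdb[(start_abs, inter)]
    # mode == "G+I": fall back to an intermediate goal only when the global
    # goal would require an excessive (capped) search depth.
    depth = pdb[(start_abs, goal_abs)]
    if depth < cap:
        return goal_abs, depth
    inter = next_region_toward(start_abs, goal_abs)
    return inter, pdb[(start_abs, inter)]
```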

Empirical Evaluation

This section presents results of an empirical evaluation of LRTA* agents with dynamic control of search depth and goal selection. All algorithms except Koenig's LRTA* use breadth-first search for planning and avoid re-expanding states via a transposition table. We report sub-optimality in the solution found and the average amount of computation per action, expressed in the number of states expanded and actual timings on a 3 GHz Pentium IV computer.

We use grid world maps from a popular real-time strategy game as our testbed. Such maps provide a realistic and challenging environment for real-time search (Sigmundarson & Bjornsson 2006). The agents were first tested on three different maps (sized 161×161 to 193×193 cells), performing 100 searches on each map. The heuristic function used is octile distance, a natural extension of the Manhattan distance for maps with diagonal actions. The start and goal locations were chosen randomly, although constrained such that optimal solution paths cost between 90 and 100. Each data point in the plots below is an average of 300 problems (3 maps × 100 runs each). All algorithms were implemented in the HOG framework (Sturtevant 2005), and use a shared code-base for algorithm-independent features. This has the benefit of minimizing performance differences caused by different implementation details (e.g., all algorithms use the same tie-breaking mechanism for node selection).
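For reference, octile distance is commonly computed as below; this sketch assumes straight moves cost 1 and diagonal moves cost sqrt(2), which the text above does not spell out explicitly.

```python
import math

def octile_distance(x1, y1, x2, y2):
    """Octile distance on an 8-connected grid with unit straight moves
    and sqrt(2)-cost diagonal moves (assumed costs)."""
    dx, dy = abs(x1 - x2), abs(y1 - y2)
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)
```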

For building the classifiers, we used the J48 decision tree algorithm in the WEKA library (Witten & Frank 2005). The training features were collected from a history trace of n = 20 steps. The game maps were used for training, and 10-fold cross-validation was used to avoid over-fitting the data.

[Figure 6 plot: x-axis Mean number of nodes expanded per action (0-450); y-axis Suboptimality (times) (1-3.5); curves: Pattern db. (G), Pattern db. (G+I), Pattern db. (I), Fixed depth (G).]

Figure 6: Effects of dynamic goal selection.

The pruning factor and the minimum number of data items per leaf of the decision-tree algorithm were set to 0.05 and 200, respectively. For reference, at depth d_max = 20 there were approximately 140 thousand training samples recorded on the problems. Collecting the data and building the classifier took about an hour. Of these two tasks, collecting the training data is by far the more time consuming; building the decision trees using the WEKA library takes only a matter of minutes. On the other hand, the decision-tree building process requires more memory. For example, during the large-map experiments described in a later subsection, we collected several hundred thousand training samples (taking several hours to collect), and this required on the order of 1 GB of memory for WEKA to process. As this calculation is done offline, we are not overly concerned with efficiency. As described earlier, once the decision-tree classifiers have been built, they have a negligible time and memory footprint.

Margins for Improvement. The result of running the different LRTA* agents is shown in Figure 5. Search depth for all versions of LRTA* was capped at 20. The x-axis represents the amount of work done in terms of the mean number of states (nodes) expanded per action, whereas the y-axis shows the quality of the solutions found (as a multiple of the length of the optimal path). The standard LRTA* provides a baseline trade-off curve. We say that a generalized version of LRTA* outperforms the baseline for some values of control parameters if it is better along both axes (i.e., appears below a segment of the baseline curve on the graph).

The topmost line in the figure shows the performance of LRTA* using fixed search depths in the range [4, 14]. The classifier approach, shown as the triangle-labeled line, outperforms the baseline for all but the largest depth (20). As seen in the figure, the standard LRTA* must expand as many as twice the number of states to achieve a comparable solution quality. The solid triangles show the scope for improvement if we had a perfect classifier (oracle) returning the “optimal” search depth for each state (as defined for the classifier approach). For the shallower searches, the classifier performs close to the corresponding oracle, but as the search depth increases, the gap between them widens. This is consistent with classification accuracy: the ratio of correctly classified instances dropped as the depth increased (from 72% at depth 5 to 44% at depth 20). More descrip-

[Figure 8 plots: x-axis Mean number of nodes expanded per move (0-300); y-axis Suboptimality (times) (2-20); curves: Fixed depth, Pattern db. (I), Decision tree, PR LRTS, Koenig's LRTA*, Prioritized LRTA*; inset zooms to 0-20 nodes per move and suboptimality 1-3.5.]

Figure 8: Dynamic-control LRTA* versus state-of-the-art algorithms. The zoomed-in part highlights the abstraction-based methods, PR LRTS and intermediate-goal PDB; both methods use abstraction level 4; PR LRTS uses γ=1.0 and depths 1, 3, and 5 (from top to bottom).

the worst-case number of nodes expanded per move does not exceed 1000 nodes – approximately the number of nodes that an optimized implementation of these algorithms can expand in the amount of time available for planning each action in video games. All parameter settings that violate this constraint were excluded from the figure.

Of the surviving algorithms, two clearly dominate the others: our new pattern-database approach with intermediate goals and PR LRTS (both use state abstraction). At their best parameter settings, PR LRTS with the abstraction level of 4, depth 5, and γ=1.0, and the pattern-database approach with the abstraction level of 4 yield comparable performance, and neither algorithm dominates the other. PR LRTS searches slightly fewer nodes but the pattern-database approach returns slightly better solutions, as shown in the zoomed-in part of the figure. Thus, the pattern-database approach is a new competitive addition to the family of state-of-the-art real-time search algorithms. Furthermore, as PR LRTS runs LRTA* in an abstract state space, it can be equipped with our dynamic control scheme and is likely to achieve even better performance; this is the subject of ongoing work.

Previous Research

Most algorithms in single-agent real-time heuristic search use a fixed search depth, with a few notable exceptions. Russell & Wefald (1991) proposed to estimate the value of search off-line/on-line. They estimated how likely an additional search is to change an action's estimated value. Inaccuracies in such estimates and the overhead of meta-level control led to small benefits in combinatorial puzzles.

Ishida (1992) observed that LRTA*-style algorithms tend to get trapped in local minima of their heuristic function, termed "heuristic depressions". The proposed remedy was to switch to a limited A* search when a heuristic depression is detected and then use the results of the A* search to correct the depression at once. A generalized definition of heuristic depressions was used by Bulitko (2004), who argued for extending the search horizon incrementally until the search finds a way out of the depression. After that, all actions leading to the found frontier state are executed. A cap on the search horizon depth is set by the user. In our approach, we execute only a single action toward the frontier and do not use backtracking; these are two features that have been shown to improve robustness of real-time search (Sigmundarson & Bjornsson 2006; Lustrek & Bulitko 2006). Additionally, in the pattern-database approach we switch to intermediate goals when the agent discovers itself in a deep heuristic depression. The idea of pre-computing a pattern database of heuristic values for real-time path-planning was recently suggested by Lustrek & Bulitko (2006). This paper extends their work in several directions: (i) we introduce the idea of intermediate goals, (ii) we propose an alternative approach that does not require map-specific pre-computation, and (iii) we demonstrate superiority over fixed-ply search on large-scale computer game maps.

There is a long tradition of search control in two-player search. The problem of semi-dynamically allotting time to each action decision in two-player search is somewhat analogous to the depth selection investigated here.

Applicability to General Planning

So far we have evaluated our approach empirically only on path-planning. However, it may also offer benefits to a wider range of planning problems. The core heuristic search algorithm extended in this paper (LRTA*) was previously applied to general planning (Bonet, Loerincs, & Geffner 1997). The extensions we introduced may have a beneficial effect in a similar way to how the B-LRTA* extensions improved the performance of the ASP planner. Subgoal selection has long been studied in planning and is a central part of our intermediate-goal pattern-database approach. Decision trees for search depth selection are induced from sample trajectories through the space and appear scalable to general planning problems. The only part of our approach that requires solving numerous ground-level problems is the pre-computation of optimal search depth in the pattern-database approach. We conjecture that the approach will still be effective if, instead of computing the optimal search depth based on an optimal action a*, one were to solve a relaxed planning problem and use the resulting action in place of a*. Deriving heuristic guidance from solving relaxed problems is common to both the planning and heuristic search communities.
