applications of informed search optimization problems Algorithm A, admissibility, A* zero-sum game playing minimax principle, alpha-beta pruning Dave Reed

Upload: hugh-garrett

Post on 16-Dec-2015

applications of informed search

  optimization problems
    Algorithm A, admissibility, A*
  zero-sum game playing
    minimax principle, alpha-beta pruning

Dave Reed

Optimization problems

consider a related search problem:
  instead of finding the shortest path (i.e., fewest moves) to a solution, suppose we want to minimize some cost

EXAMPLE: airline travel problem
  • could associate costs with each flight, try to find the cheapest route
  • could associate distances with each flight, try to find the shortest route

we could use a strategy similar to breadth first search
  repeatedly extend the minimal cost path
  search is guided by the cost of the path so far

but such a strategy ignores heuristic information
  would like to utilize a best first approach, but it is not directly applicable
  search is guided by the remaining cost of the path

IDEAL: combine the intelligence of both strategies
  cost-so-far component of breadth first search (to optimize actual cost)
  cost-remaining component of best first search (to make use of heuristics)

Algorithm A

associate 2 costs with a path
  g           actual cost of the path so far
  h           heuristic estimate of the remaining cost to the goal*
  f = g + h   combined heuristic cost estimate

*note: the heuristic value is inverted relative to best first

Algorithm A: best first search using f as the heuristic

example search tree (edge labels are move costs):

  S1 (g = 0, h = 24, f = 24)
    --10--> S2 (g = 10, h = 19, f = 29)
    --20--> S3 (g = 20, h = 4, f = 24)
      --2--> S4 (g = 22, h = 5, f = 27)
        --6--> G (g = 28, h = 0, f = 28)
      --1--> S5 (g = 21, h = 5, f = 26)
        --10--> S6 (g = 31, h = 2, f = 33)

Travel problem revisited

g: cost is actual distance per flight
h: cost estimate is crow-flies distance

%%% h(Loc, Goal, Value) : Value is crow-flies
%%%   distance from Loc to Goal
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

h(loc(omaha), loc(los_angeles), 1700).
h(loc(chicago), loc(los_angeles), 2200).
h(loc(denver), loc(los_angeles), 1400).
h(loc(los_angeles), loc(los_angeles), 0).

flight distances:
  omaha - chicago          500
  omaha - denver           600
  chicago - denver        1000
  denver - los_angeles    1400
  chicago - los_angeles   2200

search tree:
  loc(omaha)  f = 1700
    --600--> loc(denver)  f = 2000
      --1400--> loc(los_angeles)  f = 2000
    --500--> loc(chicago)  f = 2700

Algorithm A implementation

%%% Algorithm A (for trees)
%%%   atree(Current, Goal, G:Path): Path is a list of states
%%%     that lead from Current to Goal with no duplicate states;
%%%     G is the actual cost of that path.
%%%
%%%   atree_help(ListOfPaths, Goal, Path): Path is a list of
%%%     states (with associated F and G values) that lead from one
%%%     of the paths in ListOfPaths (a list of lists) to Goal with
%%%     no duplicate states.
%%%
%%%   extend(G:Path, Goal, ListOfPaths): ListOfPaths is the list
%%%     of all possible paths (with associated F and G values) obtainable
%%%     by extending Path (at the head) with no duplicate states.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

atree(State, Goal, G:Path) :-
    atree_help([0:0:[State]], Goal, G:RevPath),
    reverse(RevPath, Path).

atree_help([_:G:[Goal|Path]|_], Goal, G:[Goal|Path]).
atree_help([_:G:Path|RestPaths], Goal, SolnPath) :-
    extend(G:Path, Goal, NewPaths),
    append(RestPaths, NewPaths, TotalPaths),
    sort(TotalPaths, SortedPaths),
    atree_help(SortedPaths, Goal, SolnPath).

extend(G:[State|Path], Goal, NewPaths) :-
    bagof(NewF:NewG:[NextState,State|Path],
          Cost^H^(move(State, NextState, Cost),
                  not(member(NextState, [State|Path])),
                  h(NextState,Goal,H),
                  NewG is G+Cost,
                  NewF is NewG+H),
          NewPaths), !.
extend(_, _, []).

differences from best first search:

  associate two values with each path, stored as F:G:Path
    (F is total estimated cost, G is actual cost so far)

  since extend needs to know the cost of the current path, must pass G

  new feature of bagof: if a variable appears only in the 2nd arg, must mark it with ^ (as in Cost^H^...);
    otherwise bagof backtracks over each distinct binding of that variable instead of collecting all solutions in one list
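The effect of ^ in bagof can be seen on a small example. The sketch below assumes SWI-Prolog; the edge/3 facts and the two predicate names are hypothetical and are not part of the lecture code.

```prolog
% Hypothetical facts, for illustration only.
edge(a, b, 3).
edge(a, c, 5).
edge(a, d, 3).

% Cost is free in the goal: bagof groups solutions by Cost
% and returns one list per distinct Cost on backtracking.
neighbors_grouped(From, Cost, Ns) :-
    bagof(N, edge(From, N, Cost), Ns).

% Cost^ quantifies the variable away: bagof returns a single
% list that covers every Cost.
neighbors_all(From, Ns) :-
    bagof(N, Cost^edge(From, N, Cost), Ns).
```

With these facts, ?- neighbors_all(a, Ns). yields Ns = [b, c, d], while ?- neighbors_grouped(a, C, Ns). yields C = 3, Ns = [b, d] and then, on backtracking, C = 5, Ns = [c].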

Travel example

?- atree(loc(omaha), loc(los_angeles), Path).

Path = 2000:[loc(omaha), loc(denver), loc(los_angeles)] ;

Path = 2700:[loc(omaha), loc(chicago), loc(los_angeles)] ;

Path = 2900:[loc(omaha), loc(chicago), loc(denver), loc(los_angeles)] ;

No

note: Algorithm A finds the path with least cost (here, distance), not necessarily the path with fewest steps

suppose the flight from Chicago to L.A. was 2500 miles (instead of 2200)
  Omaha → Chicago → Denver → LA (2900 miles) would be shorter than
  Omaha → Chicago → LA (3000 miles)

%%% travelcost.pro    Dave Reed    3/15/02
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

move(loc(omaha), loc(chicago), 500).
move(loc(omaha), loc(denver), 600).
move(loc(chicago), loc(denver), 1000).
move(loc(chicago), loc(los_angeles), 2200).
move(loc(chicago), loc(omaha), 500).
move(loc(denver), loc(los_angeles), 1400).
move(loc(denver), loc(omaha), 600).
move(loc(los_angeles), loc(chicago), 2200).
move(loc(los_angeles), loc(denver), 1400).

%%% h(Loc, Goal, Value) : Value is crow-flies
%%%   distance from Loc to Goal
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

h(loc(omaha), loc(los_angeles), 1700).
h(loc(chicago), loc(los_angeles), 2200).
h(loc(denver), loc(los_angeles), 1400).
h(loc(los_angeles), loc(los_angeles), 0).

8-puzzle example

g: actual cost of each move is 1
h: remaining cost estimate is # of tiles out of place

?- atree(tiles([1,2,3],[8,6,space],[7,5,4]),
         tiles([1,2,3],[8,space,4],[7,6,5]), Path).

Path = 3:[tiles([1, 2, 3], [8, 6, space], [7, 5, 4]),
          tiles([1, 2, 3], [8, 6, 4], [7, 5, space]),
          tiles([1, 2, 3], [8, 6, 4], [7, space, 5]),
          tiles([1, 2, 3], [8, space, 4], [7, 6, 5])]

Yes

?- atree(tiles([2,3,4],[1,8,space],[7,6,5]),
         tiles([1,2,3],[8,space,4],[7,6,5]), Path).

Path = 5:[tiles([2, 3, 4], [1, 8, space], [7, 6, 5]),
          tiles([2, 3, space], [1, 8, 4], [7, 6, 5]),
          tiles([2, space, 3], [1, 8, 4], [7, 6, 5]),
          tiles([space, 2, 3], [1, 8, 4], [7, 6, 5]),
          tiles([1, 2|...], [space, 8|...], [7, 6|...]),
          tiles([1|...], [8|...], [7|...])]

Yes

here, Algorithm A finds the same paths as best first search
  not surprising since g is trivial
  still, not guaranteed to be the case

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% h(Board, Goal, Value) : Value is the number of tiles out of place

h(tiles(Row1,Row2,Row3), tiles(Goal1,Goal2,Goal3), Value) :-
    diff_count(Row1, Goal1, V1),
    diff_count(Row2, Goal2, V2),
    diff_count(Row3, Goal3, V3),
    Value is V1 + V2 + V3.

diff_count([], [], 0).
diff_count([H1|T1], [H2|T2], Count) :-
    diff_count(T1, T2, TCount),
    (H1 = H2, Count is TCount ; H1 \= H2, Count is TCount+1).

Algorithm A vs. hill-climbing

if the cost estimate function h is perfect, then f is a perfect heuristic
  Algorithm A is deterministic

example search tree (edge labels are move costs; X marks branches that are never followed):

  S1 (g = 0, h = 28, f = 28)
    --10--> S2 (g = 10, h = 20, f = 30)   X
    --20--> S3 (g = 20, h = 8, f = 28)
      --2--> S4 (g = 22, h = 6, f = 28)
        --6--> G (g = 28, h = 0, f = 28)
      --1--> S5 (g = 21, h = 12, f = 33)   X

every state on the optimal path has f = 28; every other state has f > 28 and is never expanded

if the actual costs are known for each state, Alg. A reduces to hill-climbing

Admissibility

in general, actual costs are unknown at start – must rely on heuristics

if the heuristic is imperfect, Alg. A is NOT guaranteed to find an optimal solution

example search tree (edge labels are move costs; X marks the branch that is never followed):

  S1 (g = 0, h = 2, f = 2)
    --1--> S2 (g = 1, h = 4, f = 5)   X
      --1--> G
    --3--> S3 (g = 3, h = 1, f = 4)
      --1--> G (g = 4, h = 0, f = 4)

here h overestimates the remaining cost at S2 (4 instead of 1), so the goal is reached through S3 at cost 4 although the path through S2 costs only 2

if a control strategy is guaranteed to find an optimal solution (when a solution exists), we say it is admissible

if the cost estimate h never overestimates the actual cost, then Alg. A is admissible

(when admissible, Alg. A is commonly referred to as Alg. A*)
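The counterexample can be replayed in a few lines. The sketch below assumes SWI-Prolog and assumes the edge costs are laid out as the figure on this slide appears to show them (S1 to S2 costs 1, S1 to S3 costs 3, and each of S2 and S3 reaches G at cost 1); the predicate names arc/3, est/2, first_solution/3 and search/3 are hypothetical and separate from the lecture's atree code.

```prolog
:- use_module(library(lists)).

% Edge costs as read from the slide's figure (an assumption about the drawing).
arc(s1, s2, 1).
arc(s1, s3, 3).
arc(s2, g, 1).
arc(s3, g, 1).

% Heuristic values from the slide; est(s2) = 4 overestimates the
% true remaining cost from s2, which is 1.
est(s1, 2).
est(s2, 4).
est(s3, 1).
est(g, 0).

% first_solution(+Start, +Goal, -Cost): cost of the first goal state
% taken from a frontier kept sorted by F = G + H.
first_solution(Start, Goal, Cost) :-
    est(Start, H0),
    search([H0-0-Start], Goal, Cost).

% Frontier entries have the form F-G-State; msort/2 orders them by F first.
search([_-G-Goal|_], Goal, Cost) :- !,
    Cost = G.
search([_-G-State|Rest], Goal, Cost) :-
    findall(F1-G1-Next,
            ( arc(State, Next, C),
              G1 is G + C,
              est(Next, H),
              F1 is G1 + H ),
            Children),
    append(Rest, Children, Frontier),
    msort(Frontier, Sorted),
    search(Sorted, Goal, Cost).
```

?- first_solution(s1, g, Cost). gives Cost = 4, even though the path through s2 costs only 2: s2 sits in the frontier with f = 5 and is never expanded before the goal (f = 4) is selected.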

Admissible examples

is our heuristic for the travel problem admissible?

h (State, Goal) = crow-flies distance from Goal

is our heuristic for the 8-puzzle admissible?

h (State, Goal) = number of tiles out of place, including the space

is our heuristic for the Missionaries & Cannibals admissible?

Cost of the search

the closer h is to the actual cost function, the fewer states considered

however, the cost of computing h tends to go up as it improves

also, admissibility is not always needed or desired

Graceful Decay of Admissibility: If h rarely overestimates the actual cost by more than D, then Alg. A will rarely find a solution whose cost exceeds optimal by more than D.

[figure: computation cost plotted against closeness of h to actual cost; curves: cost of computing h, cost of search using h, cost of solving problem]

the best algorithm is one that minimizes the total cost of the solution

Algorithm A (for graphs)

representing the search as a tree is conceptually simpler
  but must store entire paths, which include duplicates of states

more efficient to store the search space as a graph
  store states w/o duplicates, need only remember parent for each state
  (so that the solution path can be reconstructed at the end)

IDEA: keep 2 lists of states (along with f & g values, parent pointer)
  OPEN: states reached by the search, but not yet expanded
  CLOSED: states reached and already expanded

entries have the form State:F:G:Parent

  start:
    S1 (h = 40)
    OPEN = [S1:40:0:none]
    CLOSED = [ ]

  after expanding S1:
    S1 (h = 40)
      --10--> S2 (h = 25)
      --5--> S3 (h = 35)
    OPEN = [S2:35:10:S1, S3:40:5:S1]
    CLOSED = [S1:40:0:none]

while more efficient, the graph implementation is trickier
  when a state moves to the CLOSED list, it may not be finished
  may have to revise values of states if new (better) paths are found
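A minimal sketch of the OPEN/CLOSED bookkeeping follows, assuming SWI-Prolog. It makes a simplifying assumption that the slide does not: the heuristic is taken to be consistent, so a state on CLOSED never has to be revised and duplicate OPEN entries can simply be skipped. The predicate names astar/4, loop/5 and trace_back/4 are hypothetical; the move/3 and h/3 facts are the ones from travelcost.pro.

```prolog
:- use_module(library(lists)).

move(loc(omaha), loc(chicago), 500).
move(loc(omaha), loc(denver), 600).
move(loc(chicago), loc(denver), 1000).
move(loc(chicago), loc(los_angeles), 2200).
move(loc(chicago), loc(omaha), 500).
move(loc(denver), loc(los_angeles), 1400).
move(loc(denver), loc(omaha), 600).
move(loc(los_angeles), loc(chicago), 2200).
move(loc(los_angeles), loc(denver), 1400).

h(loc(omaha), loc(los_angeles), 1700).
h(loc(chicago), loc(los_angeles), 2200).
h(loc(denver), loc(los_angeles), 1400).
h(loc(los_angeles), loc(los_angeles), 0).

% astar(+Start, +Goal, -Cost, -Path)
astar(Start, Goal, Cost, Path) :-
    h(Start, Goal, H),
    loop([H-0-Start-none], [], Goal, Cost, RevPath),
    reverse(RevPath, Path).

% OPEN holds F-G-State-Parent entries sorted by F;
% CLOSED holds State-Parent pairs for expanded states.
loop([_-G-Goal-Parent|_], Closed, Goal, Cost, RevPath) :- !,
    Cost = G,
    trace_back(Goal, Parent, Closed, RevPath).
loop([_-_-State-_|Open], Closed, Goal, Cost, RevPath) :-
    memberchk(State-_, Closed), !,          % already expanded: skip duplicate
    loop(Open, Closed, Goal, Cost, RevPath).
loop([_-G-State-Parent|Open], Closed, Goal, Cost, RevPath) :-
    findall(F1-G1-Next-State,
            ( move(State, Next, C),
              \+ memberchk(Next-_, Closed),
              G1 is G + C,
              h(Next, Goal, H),
              F1 is G1 + H ),
            Children),
    append(Open, Children, Open1),
    msort(Open1, Sorted),
    loop(Sorted, [State-Parent|Closed], Goal, Cost, RevPath).

% Follow parent pointers through CLOSED back to the start state.
trace_back(State, none, _, [State]) :- !.
trace_back(State, Parent, Closed, [State|Rest]) :-
    memberchk(Parent-GrandParent, Closed),
    trace_back(Parent, GrandParent, Closed, Rest).
```

?- astar(loc(omaha), loc(los_angeles), Cost, Path). gives Cost = 2000 and Path = [loc(omaha), loc(denver), loc(los_angeles)]. Only one parent pointer per state is stored, rather than a whole path per OPEN entry as in the tree version.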

Flashlight example

consider the flashlight puzzle discussed in class:

Four people are on one side of a bridge. They wish to cross to the other side, but the bridge can only take the weight of two people at a time. It is dark and they only have one flashlight, so they must share it in order to cross the bridge. Assuming each person moves at a different speed (able to cross in 1, 2, 5 and 10 minutes, respectively), find a series of crossings that gets all four across in the minimal amount of time.

state representation?

cost of a move?

heuristic?

Flashlight implementation

state representation must identify the locations of each person and the flashlight

  bridge(SetOfPeopleOnLeft, SetOfPeopleOnRight, FlashlightLocation)

note: can use a list to represent a set, but must be careful of permutations
  e.g., as sets [1,2,5,10] = [1,5,2,10], so must make sure there is only one list repr. per set
  solution: maintain the lists in sorted order, so only one permutation is possible

only 3 possible moves:
  1. if the flashlight is on left and only 1 person on left, then
     move that person to the right (cost is time it takes for that person)
  2. if the flashlight is on left and at least 2 people on left, then
     select 2 people from left and move them to right (cost is max time of the two)
  3. if the flashlight is on right, then
     select a person from right and move them to left (cost is time for that person)

heuristic:
  h(State, Goal) = number of people in wrong place

Flashlight code

%%% flashlight.pro    Dave Reed    3/15/02
%%%
%%% This file contains the state space definition
%%% for the flashlight puzzle.
%%%
%%% bridge(Left, Right, Loc): Left is a (sorted) list of people
%%%   on the left shore, Right is a (sorted) list of people on
%%%   the right shore, and Loc is the location of the flashlight.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

move(bridge([P], Right, left), bridge([], NewRight, right), P) :-
    merge([P], Right, NewRight).

move(bridge(Left, Right, left), bridge(NewLeft, NewRight, right), Cost) :-
    length(Left, L), L >= 2,
    select(P1, Left, Remain),
    select(P2, Remain, NewLeft),
    merge([P1], Right, TempRight),
    merge([P2], TempRight, NewRight),
    Cost is max(P1, P2).

move(bridge(Left, Right, right), bridge(NewLeft, NewRight, left), P) :-
    select(P, Right, NewRight),
    merge([P], Left, NewLeft).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% h(State, Goal, Value) : Value is # of people out of place

h(bridge(Left, Right, _), bridge(GoalLeft, GoalRight, _), Value) :-
    diff_count(Left, GoalLeft, V1),
    diff_count(Right, GoalRight, V2),
    Value is V1 + V2.

diff_count([], _, 0).
diff_count([H|T], L, Count) :-
    diff_count(T, L, TCount),
    (member(H,L), !, Count is TCount ; Count is TCount+1).

merge(L1,L2,L3): L3 is the result of merging sorted lists L1 & L2

select(X,L,R): R is the remains after removing X from list L

Flashlight answers

?- atree(bridge([1,2,5,10],[],left), bridge([],[1,2,5,10],right), Path).

Path = 17:[bridge([1, 2, 5, 10], [], left),
           bridge([5, 10], [1, 2], right),
           bridge([1, 5, 10], [2], left),
           bridge([1], [2, 5, 10], right),
           bridge([1, 2], [5, 10], left),
           bridge([], [1|...], right)] ;

Path = 17:[bridge([1, 2, 5, 10], [], left),
           bridge([5, 10], [1, 2], right),
           bridge([2, 5, 10], [1], left),
           bridge([2], [1, 5, 10], right),
           bridge([1, 2], [5, 10], left),
           bridge([], [1|...], right)] ;

Path = 19:[bridge([1, 2, 5, 10], [], left),
           bridge([2, 5], [1, 10], right),
           bridge([1, 2, 5], [10], left),
           bridge([2], [1, 5, 10], right),
           bridge([1, 2], [5, 10], left),
           bridge([], [1|...], right)]

Yes

Algorithm A finds optimal solutions to the puzzle
  note: more than one solution is optimal
  can use ';' to enumerate solutions, from best to worst

Search in game playing

consider games involving:
  2 players
  perfect information
  zero-sum (player's gain is opponent's loss)

examples: tic-tac-toe, checkers, chess, othello, …

non-examples: poker, backgammon, prisoner's dilemma, …

von Neumann (the father of game theory) showed that for such games, there is always a "rational" strategy
  that is, can always determine a best move, assuming the opponent is equally rational

[figure: tic-tac-toe position with two X's and two O's] what is X's rational move?

Game trees

idea: model the game as a search tree
  associate a value with each game state (possible since zero-sum)
  player 1 wants to maximize the state value (call him/her MAX)
  player 2 wants to minimize the state value (call him/her MIN)

players alternate turns, so differentiate MAX and MIN levels in the tree

[figure: game tree with alternating levels, top to bottom: player 1's move (MAX), player 2's move (MIN), player 1's move (MAX)]

the leaves of the tree will be end-of-game states

Minimax search

minimax search:
  at a MAX level, take the maximum of all possible moves
  at a MIN level, take the minimum of all possible moves

can visualize the search bottom-up (start at leaves, work up to root)

likewise, can search top-down using recursion

[figure: game tree with levels player 1's move (MAX), player 2's move (MIN), player 1's move (MAX); each leaf is labeled WIN FOR MAX or WIN FOR MIN]
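The top-down recursive formulation can be sketched as follows, assuming SWI-Prolog (max_list/2 and min_list/2 come from library(lists)). The tree encoding with leaf/1 and node/1 and the predicate names minimax/3 and values/3 are hypothetical; the example tree in the note below is made up, with 1 standing for a win for MAX and -1 for a win for MIN.

```prolog
:- use_module(library(lists)).

% A game tree is either leaf(Value) or node(ListOfSubtrees).
% minimax(+Tree, +Player, -Value): Player (max or min) is to move at the root.
minimax(leaf(V), _, V).
minimax(node(Children), max, Value) :-
    values(Children, min, Vs),      % the children are MIN-level positions
    max_list(Vs, Value).
minimax(node(Children), min, Value) :-
    values(Children, max, Vs),      % the children are MAX-level positions
    min_list(Vs, Value).

% values(+Trees, +Player, -Values): minimax value of each subtree.
values([], _, []).
values([T|Ts], Player, [V|Vs]) :-
    minimax(T, Player, V),
    values(Ts, Player, Vs).
```

For example, ?- minimax(node([node([leaf(1), leaf(-1)]), node([leaf(1), leaf(1)])]), max, V). gives V = 1: the first MIN node is worth -1, the second is worth 1, and MAX picks the second.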

Minimax example

[figure: minimax tree of tic-tac-toe positions with levels X's move (MAX), O's move (MIN), X's move (MAX); the leaf outcomes shown are DRAW and O (MIN) WINS]

In-class exercise

[figure: tic-tac-toe position, X's move (MAX)]

Minimax in practice

while the Minimax Principle holds for all 2-party, perfect info, zero-sum games, an exhaustive search to find the best move may be infeasible

EXAMPLE: in an average chess game, ~100 moves with ~35 options/move
  ~35^100 (about 10^154) states in the search tree!

practical alternative: limit the search depth and use heuristics
  expand the search tree a limited number of levels (limited look-ahead)
  evaluate the "pseudo-leaves" using a heuristic
    high value good for MAX
    low value good for MIN
  back up the heuristic estimates to determine the best-looking move
    at MAX level, take maximum
    at MIN level, take minimum

[figure: two small trees with MAX and MIN levels and heuristic values at the pseudo-leaves; values as extracted: 4 -2 0 and 5 -2 14-3]

Tic-tac-toe example

heuristic(State) =
   1000   if win for MAX (X)
  -1000   if win for MIN (O)
  (#rows/cols/diags open for MAX – #rows/cols/diags open for MIN)   otherwise

suppose look-ahead of 2 moves

[figure: 2-move look-ahead tree from the empty board, with the heuristic value of each position after O's reply; grouped here by X's first move]

  X in a corner:     4-5 = -1   5-5 = 0    5-5 = 0   6-5 = 1   6-5 = 1
  X on an edge:      4-6 = -2   5-6 = -1   6-6 = 0   6-6 = 0   5-6 = -1
  X in the center:   6-4 = 2    5-4 = 1
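The evaluation function for this example can be sketched as below, assuming SWI-Prolog. The board encoding (a list of 9 cells in row order, each x, o or e for empty) and the predicate names line/1, wins/2, open_lines/3 and heuristic/2 are hypothetical. A row, column or diagonal counts as open for a player when it contains none of the opponent's marks, which is the reading that reproduces the values on the slide.

```prolog
:- use_module(library(lists)).

% The 8 winning lines, as cell indices 1..9 in row order.
line([1,2,3]). line([4,5,6]). line([7,8,9]).
line([1,4,7]). line([2,5,8]). line([3,6,9]).
line([1,5,9]). line([3,5,7]).

% wins(+Board, +Player): Player occupies all three cells of some line.
wins(Board, P) :-
    line([A,B,C]),
    nth1(A, Board, P), nth1(B, Board, P), nth1(C, Board, P).

% open_lines(+Board, +Opponent, -N): N lines contain no Opponent mark.
open_lines(Board, Opponent, N) :-
    findall(L,
            ( line(L),
              \+ ( member(I, L), nth1(I, Board, Opponent) ) ),
            Ls),
    length(Ls, N).

% heuristic(+Board, -Value): MAX plays x, MIN plays o.
heuristic(Board, V) :- wins(Board, x), !, V = 1000.
heuristic(Board, V) :- wins(Board, o), !, V = -1000.
heuristic(Board, V) :-
    open_lines(Board, o, ForMax),
    open_lines(Board, x, ForMin),
    V is ForMax - ForMin.
```

For X in a corner and O in the center, ?- heuristic([x,e,e, e,o,e, e,e,e], V). gives V = -1 (4-5); for X in the center and O on an edge, ?- heuristic([e,o,e, e,x,e, e,e,e], V). gives V = 2 (6-4).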

α-β bounds

sometimes, it isn't necessary to search the entire tree

[figure: two partially searched trees; leaf values as extracted: 5 10 2 ??? and -10 3 5 ???]

α-β technique: associate bounds with each state in the search
  associate lower bound α with MAX: can increase
  associate upper bound β with MIN: can decrease

[figure: a MAX node with a child worth 5 is labeled >= 5 (α); a MIN node with a child worth 3 is labeled <= 3 (β)]

α-β pruning

discontinue search below a MIN node if its β value <= α value of a MAX ancestor

[figure: MAX ancestor labeled >= 5 (α) with one subtree worth 5 already searched; below it a MIN node labeled <= 3 (β) whose remaining children are marked "no need to search"]

discontinue search below a MAX node if its α value >= β value of a MIN ancestor

[figure: MIN ancestor labeled <= 3 (β) with one subtree worth 3 already searched; below it a MAX node labeled >= 5 (α) whose remaining children are marked "no need to search"]

[leaf values shown under the two trees, as extracted: 5 10 2 ??? and -10 3 6 ???]

larger example

[figure: larger game tree with leaf values 5 3 7 1 3 4 6 8]

tic-tac-toe example

[figure: 2-move look-ahead tree for tic-tac-toe from the empty board, with heuristic values after O's reply; grouped here by X's first move]

  X in a corner:     4-5 = -1   5-5 = 0    5-5 = 0   6-5 = 1   6-5 = 1
  X on an edge:      4-6 = -2   5-6 = -1   6-6 = 0   6-6 = 0   5-6 = -1
  X in the center:   6-4 = 2    5-4 = 1

α-β vs. minimax:
  worst case: α-β examines as many states as minimax
  best case: assuming branching factor B and depth D, α-β examines ~2B^(D/2) states
    (i.e., as many as minimax on a tree with half the depth)
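The two pruning rules can be sketched as below, assuming SWI-Prolog. The tree encoding with leaf/1 and node/1, the predicate names alphabeta/5, max_children/4 and min_children/4, and the example tree in the note are all hypothetical. This is the variant in which returned values are clamped to the current (α, β) window, so the root should be called with a window wider than any leaf value.

```prolog
% alphabeta(+Tree, +Player, +Alpha, +Beta, -Value)
% Alpha: lower bound inherited from MAX ancestors.
% Beta:  upper bound inherited from MIN ancestors.
alphabeta(leaf(V), _, _, _, V).
alphabeta(node(Children), max, Alpha, Beta, Value) :-
    max_children(Children, Alpha, Beta, Value).
alphabeta(node(Children), min, Alpha, Beta, Value) :-
    min_children(Children, Alpha, Beta, Value).

% At a MAX node the lower bound Alpha can only increase.
max_children([], Alpha, _, Alpha).
max_children([C|Cs], Alpha, Beta, Value) :-
    alphabeta(C, min, Alpha, Beta, V),
    Alpha1 is max(Alpha, V),
    (   Alpha1 >= Beta
    ->  Value = Alpha1                  % cutoff: skip the remaining children
    ;   max_children(Cs, Alpha1, Beta, Value)
    ).

% At a MIN node the upper bound Beta can only decrease.
min_children([], _, Beta, Beta).
min_children([C|Cs], Alpha, Beta, Value) :-
    alphabeta(C, max, Alpha, Beta, V),
    Beta1 is min(Beta, V),
    (   Beta1 =< Alpha
    ->  Value = Beta1                   % cutoff: skip the remaining children
    ;   min_children(Cs, Alpha, Beta1, Value)
    ).
```

For example, ?- alphabeta(node([node([leaf(5), leaf(10)]), node([leaf(3), leaf(99)])]), max, -1000000, 1000000, V). gives V = 5. After the first MIN node returns 5, the second MIN node sees 3, its bound drops to <= 3, and leaf(99) is never examined.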