9/10 name plates for everyone!. blog qn. on dijkstra algorithm.. what is the difference between...
Post on 21-Dec-2015
TRANSCRIPT
9/10
Name plates for everyone!
Blog qn. on Dijkstra Algorithm..
• What is the difference between Uniform Cost Search and Dijkstra algorithm?
• Given the difference, which algorithm is better (and when)?
• Any ideas on the other question?
“Informing” Uniform search…
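[Figure: “Bait & Switch” graph with nodes A, B, C, D, G. Edge costs: A-B 0.1, B-C 0.1, C-D 0.1, D-G 25, A-G 9]
Uniform cost search tree (g-values in parentheses):
N0: A (0)
N1: B (.1)   N2: G (9)
N3: C (.2)
N4: D (.3)
N5: G (25.3)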
Would be nice if we could tell that N2 is better than N1
-- Need to take into account not just the distance until now, but also the distance to goal
-- Computing the true distance to goal is as hard as the full search
-- So, try “bounds” h(n): prioritize nodes in terms of f(n) = g(n) + h(n)
-- Two bounds: h1(n) <= h*(n) <= h2(n). Which guarantees optimality?
-- h1(n) <= h2(n) <= h*(n). Which is the better function?
Admissibility
Informedness
A* (if there are multiple goal nodes, we consider the distance to the nearest goal node)
Several proofs:
1. Based on branch and bound -- g(N) is better than f(N’’), and f(N’’) <= cost of best path through N’’
2. Based on contours -- f() contours are more goal directed than g() contours
3. Based on contradiction
[Figure: search tree with root N0 and nodes N and N’’]
f(n) is the estimate of the length of the shortest path to goal passing through n
[Figure: graph with nodes A, B, C, D, G. Edge costs: A-B .1, B-C .1, C-D .1, D-G 25, A-G 9]
A* Search
N0: A (0)
N1: B (.1+8.8)   N2: G (9+0)
N3: C (max(.2+0, 8.9))
N4: D (.3+25)
[Figure: heuristic values annotated on three copies of the graph: (7, 20, 0, 28, 25), (7, 8.8, 0, 0, 25), (9, 25.2, 0, 25.1, 25)]
N0: A (0)
N1: B (.1+25.2)   N2: G (9+0)
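The search trees above can be reproduced with a minimal A* sketch on the Bait & Switch graph. The function name `a_star` and the dictionary layout are illustrative choices, not from the slides; the value h(A) = 7 is an assumption (it does not affect the expansion order), and `H_STAR` is computed by hand from the edge costs.

```python
import heapq

# Edge costs of the "Bait & Switch" graph from the slides.
GRAPH = {
    "A": {"B": 0.1, "G": 9.0},
    "B": {"C": 0.1},
    "C": {"D": 0.1},
    "D": {"G": 25.0},
    "G": {},
}

def a_star(graph, start, goal, h):
    """Best-first search on f = g + h. Returns (cost, path, expansion order)."""
    frontier = [(h[start], 0.0, start, [start])]
    best_g = {}    # cheapest g at which each node has been expanded so far
    expanded = []  # nodes in the order they were picked from the queue
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in best_g and best_g[node] <= g:
            continue  # already expanded via a path at least as cheap
        best_g[node] = g
        expanded.append(node)
        if node == goal:
            return g, path, expanded
        for succ, cost in graph[node].items():
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h[succ], g2, succ, path + [succ]))
    return None

H_ZERO = {n: 0.0 for n in GRAPH}  # h = 0 gives uniform cost search
H_SLIDE = {"A": 7.0, "B": 8.8, "C": 0.0, "D": 25.0, "G": 0.0}  # h(A) = 7 is assumed
H_STAR = {"A": 9.0, "B": 25.2, "C": 25.1, "D": 25.0, "G": 0.0}  # true remaining costs

for name, h in (("h = 0", H_ZERO), ("slide h", H_SLIDE), ("h*", H_STAR)):
    cost, path, expanded = a_star(GRAPH, "A", "G", h)
    print(name, cost, path, expanded)
# h = 0:   picks A, B, C, D, G   (cost 9.0 via A-G)
# slide h: picks A, B, C, G      (cost 9.0 via A-G)
# h*:      picks A, G            (cost 9.0 via A-G)
```

All three runs return the same optimal path; only the number of nodes picked up before the goal changes with the heuristic.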
f(B) = .1+8.8 = 8.9
f(C) = .2+0 = 0.2
This doesn’t make sense since we are reducing the estimate of the actual cost of the path A-B-C-D-G
To make f(.) monotonic along a path, we say f(n) = max(f(parent), g(n)+h(n))
PathMax Adjustment
This is just enforcing the triangle law of inequality: the sum of two sides must be greater than the third
[Figure: triangle with vertices B, C, G and sides labeled f(B), f(C) and C(B,C)]
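The PathMax adjustment can be checked numerically along the path A-B-C-D-G. This is a small sketch, not code from the lecture: the helper name `f_values_along_path` is hypothetical, and h(A) = 7 is an assumed value.

```python
# Edge costs along the path A-B-C-D-G and the heuristic from the A* slide.
EDGE_COSTS = {("A", "B"): 0.1, ("B", "C"): 0.1, ("C", "D"): 0.1, ("D", "G"): 25.0}
H = {"A": 7.0, "B": 8.8, "C": 0.0, "D": 25.0, "G": 0.0}  # h(A) = 7 is assumed

def f_values_along_path(path, edge_costs, h, use_pathmax):
    """Return [(node, f)] along path; with PathMax, f never drops below the parent's f."""
    g = 0.0
    f_parent = float("-inf")
    result = []
    for i, node in enumerate(path):
        if i > 0:
            g += edge_costs[(path[i - 1], node)]
        f = g + h[node]
        if use_pathmax:
            f = max(f_parent, f)  # f(n) = max(f(parent), g(n) + h(n))
        result.append((node, round(f, 6)))  # rounded to hide float noise
        f_parent = f
    return result

PATH = ["A", "B", "C", "D", "G"]
print(f_values_along_path(PATH, EDGE_COSTS, H, use_pathmax=False))
# [('A', 7.0), ('B', 8.9), ('C', 0.2), ('D', 25.3), ('G', 25.3)]
print(f_values_along_path(PATH, EDGE_COSTS, H, use_pathmax=True))
# [('A', 7.0), ('B', 8.9), ('C', 8.9), ('D', 25.3), ('G', 25.3)]
```

Without the adjustment f drops from 8.9 at B to 0.2 at C; with it, f(C) is held at 8.9 and f is non-decreasing along the path.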
It will not expand nodes with f > f* (f* is the f-value of the optimal goal, which is the same as g* since the h-value is zero for goals)
Uniform cost search
A*
Visualizing A* Search
IDA* -- do iterative depth-first search, but set the threshold in terms of f (not depth)
(h*-h)/h*
IDA* to handle the A* memory problem
• Basically IDDFS, except that instead of defining the iterations in terms of depth, we define them in terms of f-value
– Start with the f cutoff equal to the f-value of the root node
– Loop
• Generate and search all nodes whose f-values are less than or equal to the current cutoff.
– Use depth-first search to search the trees in the individual iterations
– Keep track of the node N’ which has the smallest f-value that is still larger than the current cutoff. Let this f-value be next-largest-f-value
-- If the search finds a goal node, terminate. If not, set cutoff = next-largest-f-value and go back to Loop
Properties: Linear memory. #Iterations in the worst case? = b^d !! (Happens when all nodes have distinct f-values. There is such a thing as too much discrimination…)
Very similar to IDDUC discussed last class
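The IDA* loop described above can be sketched as follows. This is one possible reading of the slide, run on the Bait & Switch graph; the function name `ida_star` and the returned list of cutoffs are illustrative, and h(A) = 7 is an assumed value.

```python
import math

GRAPH = {
    "A": {"B": 0.1, "G": 9.0},
    "B": {"C": 0.1},
    "C": {"D": 0.1},
    "D": {"G": 25.0},
    "G": {},
}
H = {"A": 7.0, "B": 8.8, "C": 0.0, "D": 25.0, "G": 0.0}  # h(A) = 7 is assumed

def ida_star(graph, start, goal, h):
    """Return (cost, path, cutoffs tried), or None if no goal is reachable."""
    path = [start]  # only the current path is stored: linear memory

    def dfs(node, g, cutoff):
        # Returns (goal cost or None, smallest f-value seen above the cutoff).
        f = g + h[node]
        if f > cutoff:
            return None, f
        if node == goal:
            return g, math.inf
        next_f = math.inf
        for succ, cost in graph[node].items():
            if succ in path:  # avoid cycles along the current path
                continue
            path.append(succ)
            found, over = dfs(succ, g + cost, cutoff)
            if found is not None:
                return found, over
            path.pop()
            next_f = min(next_f, over)
        return None, next_f

    cutoff = h[start]  # start with the f-value of the root
    cutoffs = []
    while True:
        cutoffs.append(cutoff)
        found, next_f = dfs(start, 0.0, cutoff)
        if found is not None:
            return found, list(path), cutoffs
        if next_f == math.inf:
            return None
        cutoff = next_f  # next-largest-f-value

cost, path, cutoffs = ida_star(GRAPH, "A", "G", H)
print(cost, path, [round(c, 6) for c in cutoffs])
# 9.0 ['A', 'G'] [7.0, 8.9, 9.0]
```

Here three iterations are needed because the f-values 7, 8.9 and 9 are all distinct, a small instance of the "too much discrimination" problem.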
Using memory more effectively: SMA*
• A* can take exponential space in the worst case
• IDA* takes linear space (in solution depth) always
• If A* is consuming too much space, one can argue that IDA* is consuming too little
• Better idea is to use all the memory that is available, and start cleaning up as memory starts filling up
– Idea: When the memory is about to fill up, remove the leaf node with the worst f-value from the search tree
• But remember its f-value at its parent (which is still in the search tree)
– Since the parent is now the leaf node, it too can get removed to make space
• If ever the rest of the tree starts looking less promising than the parent of the removed node, the parent will be picked up and expanded again.
– Works quite well, but can thrash when memory is too low
• Not unlike your computer with too little RAM..
Different levels of abstraction for shortest path problems on the plane
[Figure: four panels showing start I and goal G among obstacles: the actual problem (h*), “circular abstraction” (hC), “polygonal abstraction” (hP), and “disappearing-act abstraction” (hD)]
The obstacles in the shortest path problem can be abstracted in a variety of ways.
-- The more the abstraction, the cheaper it is to solve the problem in abstract space
-- The less the abstraction, the more “informed” the heuristic cost (i.e., the closer the abstract path length to actual path length)
[Figure: three curves plotted over the heuristics h0, hD, hC, hP and h* (actual): cost of computing the heuristic, cost of searching with the heuristic, and total cost incurred in search]
Not always clear where the total minimum occurs
• Old wisdom was that the global min was closer to cheaper heuristics
• Current insights are that it may well be far from the cheaper heuristics for many problems
• E.g. Pattern databases for 8-puzzle
• Polygonal abstractions for SP
• Plan graph heuristics for planning
How informed should the heuristic be?
9/12
Admissibility/Informedness
[Figure: heuristics h1, h2, h3, h4, h5 and Max(h2,h3) plotted relative to h*]
On “predicting” the effectiveness of Heuristics
• Unfortunately, it is not the case that a heuristic h1 that is more informed than h2 will always do fewer node expansions than h2.
– We can only guarantee that h1 will expand fewer nodes with f-value less than f* than h2 will
• Consider the plot on the right… do you think h1 or h2 is likely to do better in actual search?
– The “differentiation” ability of the heuristic (i.e., the ability to tell good nodes from bad ones) is also important. But it is harder to measure.
• Some new work that does a histogram characterization of the distribution of heuristic values [Korf, 2000]
• Nevertheless, informedness of heuristics is a reasonable qualitative measure
[Figure: plot of heuristic value (y-axis) against nodes (x-axis) for h1, h2 and h*]
Let us divide the number of nodes expanded, nE, into two parts: nI, the number of nodes expanded whose f-values were strictly less than f* (i.e. the cost of the optimal goal), and nG, the number of expanded nodes with f-value greater than or equal to f*. So, nE = nI + nG.
A more informed heuristic is only guaranteed to have a smaller nI; all bets are off as far as the nG value is concerned. In many cases nG may be relatively large compared to nI, making nE wind up being higher for the more informed heuristic!
Is h1 better or h2?
Proof of Optimality of A* search
Proof of optimality: Let N be the goal node we output.
Suppose there is another goal node N’. We want to prove that g(N’) >= g(N).
Suppose this is not true, i.e. g(N’) < g(N) -- Assumption A1
When N was picked up for expansion, either N’ itself, or some ancestor of N’, say N’’, must have been on the search queue.
If we picked N instead of N’’ for expansion, it was because
f(N) <= f(N’’) -- Fact f1
i.e. g(N) + h(N) <= g(N’’) + h(N’’)
Since N is a goal node, h(N) = 0. So, g(N) <= g(N’’) + h(N’’)
But g(N’) = g(N’’) + dist(N’’,N’)
Given h(N’’) <= h*(N’’) = dist(N’’,N’) (lower bound)
So g(N’) = g(N’’) + dist(N’’,N’) >= g(N’’) + h(N’’) -- Fact f2
So from f1 and f2 we have g(N) <= g(N’)
But this contradicts our assumption A1
[Figure: search tree rooted at N0 with goal nodes N and N’, and N’’ on the path to N’]
Holds only because h(N’’) is a lower bound on dist(N’’,N’)
The lower-bound (optimistic) estimate on the length of the path to N’ through N’’ is already longer than the path to N.
f(n) is the estimate of the length of the shortest path to goal passing through n
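The two facts in the proof can be chained into a single line. This is only a restatement of the argument above in the same symbols, with dist(N’’,N’) written as \mathrm{dist}(N'', N'):

```latex
\[
g(N) \;=\; f(N) \;\le\; f(N'') \;=\; g(N'') + h(N'')
\;\le\; g(N'') + \mathrm{dist}(N'', N') \;=\; g(N')
\]
```

The first equality uses h(N) = 0 at a goal, the first inequality is Fact f1, and the second inequality is where admissibility (h(N’’) is a lower bound) is used. The result g(N) <= g(N’) contradicts Assumption A1.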
Where do heuristics (bounds) come from?
From relaxed problems (the more relaxed, the easier to compute heuristic, but the less accurate it is)
For path planning on the plane (with obstacles)?
-- Assume away obstacles. The distance will then be the straight-line distance (see next slide for other abstractions)
For 8-puzzle problem?
-- Assume ability to move a tile directly to its place: distance = # misplaced tiles
-- Assume ability to move only one position at a time: distance = sum of Manhattan distances
For Traveling sales person?
-- Relax the “circuit” requirement: minimum spanning tree
Important: “blank” is not counted as a tile..
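The two 8-puzzle relaxations can be written down directly. A minimal sketch, assuming states are stored as 9-tuples in row-major order with 0 for the blank; the goal layout `GOAL` and the function names are illustrative choices, not from the slides.

```python
# Row-major 3x3 board, 0 marks the blank. This goal layout is an assumption.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def misplaced_tiles(state, goal=GOAL):
    """Number of tiles (blank excluded) that are not in their goal position."""
    return sum(1 for s, t in zip(state, goal) if s != 0 and s != t)

def manhattan(state, goal=GOAL, width=3):
    """Sum over tiles (blank excluded) of row distance plus column distance to goal."""
    goal_pos = {tile: divmod(i, width) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue  # the blank is not counted as a tile
        row, col = divmod(i, width)
        goal_row, goal_col = goal_pos[tile]
        total += abs(row - goal_row) + abs(col - goal_col)
    return total

state = (8, 1, 3, 4, 0, 2, 7, 6, 5)
print(misplaced_tiles(state), manhattan(state))
# 5 10
```

Each misplaced tile contributes at least 1 to the Manhattan sum, so the Manhattan heuristic is never smaller than the misplaced-tiles count; it is the more informed of the two.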
Performance on 15 Puzzle
• Random 15 puzzle instances were first solved optimally using IDA* with Manhattan distance heuristic (Korf, 1985).
• Optimal solution lengths average 53 moves.
• 400 million nodes generated on average.
• Average solution time is about 50 seconds on current machines.
Limitation of Manhattan Distance
• To solve a 24-Puzzle instance, IDA* with Manhattan distance would take about 65,000 years on average.
• Assumes that each tile moves independently
• In fact, tiles interfere with each other.
• Accounting for these interactions is the key to more accurate heuristic functions.
Getting Fringe Pattern in Shape..
[Figure: three 15-puzzle positions showing only the fringe tiles (3, 7, 11, 12, 13, 14, 15), each alongside the goal fringe pattern]
M.d. is 19 moves, but 31 moves are needed.
M.d. is 20 moves, but 28 moves are needed
M.d. is 17 moves, but 27 moves are needed
Heuristics from Pattern Databases
1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
5 10 14 7
8 3 6 1
15 12 9
2 11 4 13
31 moves is a lower bound on the total number of moves needed to solve this particular state.
Pattern Database Heuristics
• Culberson and Schaeffer, 1996
• A pattern database is a complete set of such positions, with associated number of moves.
• The bigger the fringe pattern, the more informed the heuristic; but the costlier it is to compute and store..
– e.g. a 7-tile pattern database for the Fifteen Puzzle contains 519 million entries.
Precomputing Pattern Databases
• Entire database is computed with one backward breadth-first search from goal.
• All non-pattern tiles are indistinguishable, but all tile moves are counted.
• The first time each state is encountered, the total number of moves made so far is stored.
• Once computed, the same table is used for all problems with the same goal state.
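The precomputation steps above can be sketched on the 8-puzzle, where the table stays tiny. This is an illustrative sketch, not the Culberson and Schaeffer implementation: the goal layout, the pattern {1, 2, 3}, and all function names are assumptions. Non-pattern tiles are replaced by -1 so that they are indistinguishable, and every blank move counts as one move.

```python
from collections import deque

WIDTH = 3
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank; this layout is an assumption

def abstract(state, pattern):
    """Keep the blank and the pattern tiles; make every other tile -1."""
    return tuple(t if t == 0 or t in pattern else -1 for t in state)

def neighbors(state):
    """All states reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, WIDTH)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        n_row, n_col = row + d_row, col + d_col
        if 0 <= n_row < WIDTH and 0 <= n_col < WIDTH:
            j = n_row * WIDTH + n_col
            nxt = list(state)
            nxt[blank], nxt[j] = nxt[j], nxt[blank]
            yield tuple(nxt)

def build_pattern_db(goal, pattern):
    """One breadth-first search backward from the abstracted goal."""
    start = abstract(goal, pattern)
    db = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in neighbors(state):
            if nxt not in db:  # first encounter: store moves made so far
                db[nxt] = db[state] + 1
                queue.append(nxt)
    return db

PATTERN = {1, 2, 3}
DB = build_pattern_db(GOAL, PATTERN)

def pdb_heuristic(state):
    """Lower bound on solution length: look up the abstracted state."""
    return DB[abstract(state, PATTERN)]

print(len(DB), pdb_heuristic(GOAL), pdb_heuristic((1, 2, 3, 4, 5, 6, 7, 0, 8)))
# 3024 0 1
```

The table has 9 x 8 x 7 x 6 = 3024 entries (positions of three pattern tiles plus the blank), and because moves are reversible, the search from the goal yields distances to the goal. The same `DB` serves every problem instance with this goal.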
[Figure: three curves plotted over the heuristics h0, h#misp, hmanhatt, hpat1, hpat2 and h*: cost of computing the heuristic, cost of searching with the heuristic, and total cost incurred in search]
Not always clear where the total minimum occurs
• Old wisdom was that the global min was closer to cheaper heuristics
• Current insights are that it may well be far from the cheaper heuristics for many problems
• E.g. Pattern databases for 8-puzzle
• Polygonal abstractions for SP
• Plan graph heuristics for planning
How informed should the heuristic be?