8/10/2019 Final All-pairs Shortest Paths
-
All-Pairs Shortest Paths
Csc8530, Dr. Prasad
Jon A Preston
March 17, 2004
-
Outline
Review of graph theory
Problem definition
Sequential algorithms
Properties of interest
Parallel algorithm
Analysis
Recent research
References
-
Graph Terminology
G = (V, E)
W = weight matrix; w_ij = weight/length of edge (vi, vj)
w_ij = ∞ if vi and vj are not connected by an edge; w_ii = 0
Assume W has positive, 0, and negative values
For this problem, we cannot have a negative-sum cycle in G
-
Weighted Graph and Weight Matrix
[Figure: an undirected weighted graph on vertices v0-v4 with edge weights 1, 2, 5, 6, 7, and 9, together with its 5 x 5 symmetric weight matrix W, rows/columns indexed 0-4]
-
Directed Weighted Graph and Weight Matrix
[Figure: a directed weighted graph on vertices v0-v5 with edge weights including -1, -2, 1, 2, 3, 4, 5, 6, 7, and 9, together with its 6 x 6 weight matrix W, rows/columns indexed 0-5]
-
All-Pairs Shortest Paths Problem Defined
For every pair of vertices vi and vj in V, it is required to find the length of the shortest path from vi to vj along edges in E.
Specifically, a matrix D is to be constructed such that d_ij is the length of the shortest path from vi to vj in G, for all i and j.
The length of a path (or cycle) is the sum of the lengths (weights) of the edges forming it.
-
Sample Shortest Path
[Figure: the directed example graph on vertices v0-v5]
The shortest path from v0 to v4 is along edges (v0, v1), (v1, v2), (v2, v4) and has length 6
-
Disallowing Negative-length Cycles
APSP does not allow the input to contain negative-length cycles
This is necessary because:
If such a cycle were to exist within a path from vi to vj, then one could traverse this cycle indefinitely, producing paths of ever shorter lengths from vi to vj.
If a negative-length cycle exists, then all paths which contain this cycle would have a length of -∞.
-
Recent Work on Sequential Algorithms
Floyd-Warshall algorithm is Θ(V³)
  Appropriate for dense graphs: |E| = O(|V|²)
Johnson's algorithm
  Appropriate for sparse graphs: |E| = O(|V|)
  O(V² log V + V·E) if using a Fibonacci heap
  O(V·E log V) if using a binary min-heap
Shoshan and Zwick (1999)
  Integer edge weights in {1, 2, ..., W}
  O(W·V^ω·p(V·W)), where ω ≈ 2.376 and p is a polylog function
Pettie (2002)
  Allows real-weighted edges
  O(V² log log V + V·E)
Strassen's Algorithm (matrix multiplication)
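For concreteness, the Θ(V³) Floyd-Warshall algorithm referenced above can be sketched in a few lines of Python (a minimal sketch, with float('inf') standing in for the ∞ entries of W):

```python
INF = float('inf')

def floyd_warshall(W):
    """All-pairs shortest paths by Floyd-Warshall, Theta(V^3).
    W is an n x n weight matrix with W[i][i] = 0 and INF for missing
    edges; the graph must contain no negative-length cycle."""
    n = len(W)
    D = [row[:] for row in W]      # work on a copy of W
    for k in range(n):             # allow vk as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D
```

Negative edge weights are fine here as long as the graph has no negative-length cycle, matching the assumption stated earlier.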
-
Properties of Interest
Let d_ij^k denote the length of the shortest path from vi to vj that goes through at most k - 1 intermediate vertices (k hops)
d_ij^1 = w_ij (edge length from vi to vj)
If i ≠ j and there is no edge from vi to vj, then d_ij^1 = ∞
Also, d_ii^1 = w_ii = 0
Given that there are no negative weighted cycles in G, there is no advantage in visiting any vertex more than once in the shortest path from vi to vj.
Since there are only n vertices in G, d_ij = d_ij^(n-1)
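The hop-bounded lengths d_ij^k can be computed directly by dynamic programming; the sketch below (function name is mine) lets one check the claim that d_ij = d_ij^(n-1), i.e. that hops beyond n - 1 never help when there are no negative cycles:

```python
INF = float('inf')

def hop_limited(W, k):
    """d^k: lengths of shortest paths from i to j using at most k edges.
    W[i][i] = 0 and W[i][j] = INF where there is no edge; d^1 = W."""
    n = len(W)
    d = [row[:] for row in W]
    for _ in range(k - 1):
        # d^(t+1)[i][j] = min over l of d^t[i][l] + w[l][j]
        # (w[j][j] = 0, so the min already includes keeping d^t[i][j])
        d = [[min(d[i][l] + W[l][j] for l in range(n))
              for j in range(n)] for i in range(n)]
    return d
```

For any graph without negative cycles, hop_limited(W, n - 1) equals hop_limited(W, n): no entry improves once every simple path has been considered.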
-
Guaranteeing Shortest Paths
If the shortest path from vi to vj contains vr and vs (where vr precedes vs), the path from vr to vs must be minimal (or it wouldn't exist in the shortest path)
Thus, to obtain the shortest path from vi to vj, we can compute all combinations of optimal sub-paths (whose concatenation is a path from vi to vj), and then select the shortest one
[Figure: the path vi → vr → vs → vj, with each segment marked MIN]
-
Iteratively Building Shortest Paths
[Figure: candidate paths from vi to vj whose last edge is (v1, vj), (v2, vj), ..., (vn, vj), with hop-bounded prefixes d_i1^(k-1), d_i2^(k-1), ..., d_in^(k-1)]
d_ij^k = min( d_ij^(k-1), min_{1≤l≤n} ( d_il^(k-1) + w_lj ) )
       = min_{1≤l≤n} ( d_il^(k-1) + w_lj )
-
Recurrence Definition
For k > 1, d_ij^k = min_{1≤l≤n} ( d_il^(k/2) + d_lj^(k/2) )
Guarantees O(log k) steps to calculate d_ij^k
[Figure: a k-vertex path from vi to vj split at vl into two halves of k/2 vertices each, both marked MIN]
-
Similarity
d_ij^k = min_{1≤l≤n} ( d_il^(k-1) + w_lj )
C_ij = Σ_{l=1}^{n} A_il · B_lj
The recurrence has the same structure as matrix multiplication, with (+, ·) replaced by (min, +)
-
Computing D
Let D^k = matrix with entries d_ij^k for 0 ≤ i, j ≤ n - 1
Given D^1, compute D^2, D^4, ..., D^m, where m = 2^⌈log(n-1)⌉
D = D^m
To calculate D^k from D^(k/2), use the special (min, +) form of matrix multiplication
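Putting the last few slides together: replacing (×, +) by (+, min) gives the "special form" of matrix multiplication, and repeatedly squaring D^1 in that semiring yields D after ⌈log(n-1)⌉ products. A sequential Python sketch (function names are mine):

```python
INF = float('inf')

def min_plus(A, B):
    """(min, +) matrix product: C[i][j] = min over l of A[i][l] + B[l][j]."""
    n = len(A)
    return [[min(A[i][l] + B[l][j] for l in range(n))
             for j in range(n)] for i in range(n)]

def apsp_by_squaring(W):
    """D = D^(n-1), computed from D^1 = W by repeated min-plus squaring."""
    n = len(W)
    D, k = [row[:] for row in W], 1
    while k < n - 1:          # ceil(log2(n-1)) iterations
        D = min_plus(D, D)    # D^k -> D^(2k)
        k *= 2
    return D
```

Each product costs Θ(n³) sequentially, so this is Θ(n³ log n) on one processor; the point of the hypercube algorithm below is to do each product in O(log n) parallel time.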
-
Modified Matrix Multiplication
Step 2: for r = 0 to N - 1 do in parallel
    C_r = A_r + B_r
end for
Step 3: for m = 2q to 3q - 1 do
    for all r in N (r_m = 0) do in parallel
        C_r = min(C_r, C_r(m))
    end for
end for
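The structure of steps 2 and 3 can be simulated sequentially: every (i, l, j) position forms one sum a_il + b_lj, and the mins are then folded pairwise along the l dimension, mirroring the hypercube's log n reduction rounds. A sketch (assuming n is a power of 2; processor indexing is schematic):

```python
def hypercube_min_plus(A, B):
    """Sequential simulation of the modified (min, +) multiplication:
    step 2 forms all sums A[i][l] + B[l][j]; step 3 folds them with
    pairwise mins along the l dimension in log2(n) rounds."""
    n = len(A)
    # Step 2: "processor" (i, l, j) computes C_r = A_r + B_r
    C = [[[A[i][l] + B[l][j] for j in range(n)] for l in range(n)]
         for i in range(n)]
    # Step 3: binary-tree min reduction over l
    stride = 1
    while stride < n:
        for i in range(n):
            for l in range(0, n, 2 * stride):
                for j in range(n):
                    C[i][l][j] = min(C[i][l][j], C[i][l + stride][j])
        stride *= 2
    return [[C[i][0][j] for j in range(n)] for i in range(n)]
```

With n³ processors doing the adds at once and each reduction round in constant time, one product takes O(log n) parallel steps.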
-
Modified Example
[Figure: from Akl's Fig. 9.2, after step (1.3): 2 x 2 matrices A and B distributed over the eight hypercube processors P000-P111, each processor holding one (a_il, b_lj) pair; with A = B = |1 2; 3 4| the standard product would be C = |7 10; 15 22|]
-
Modified Example (step 2)
[Figure: from Akl's Fig. 9.2, after modified step 2: each processor P_r now holds C_r = A_r + B_r]
-
Modified Example (step 3)
[Figure: from Akl's Fig. 9.2, after modified step 3: pairwise MINs along one hypercube dimension combine the C_r values into the (min, +) product C]
-
Hypercube Setup
Begin with a hypercube of n³ processors; each has registers A, B, and C
Arrange them in an n × n × n array (cube)
Set A(0, j, k) = w_jk for 0 ≤ j, k ≤ n - 1, i.e., the processors in positions (0, j, k) contain D^1 = W
When done, C(0, j, k) contains APSP = D^m
-
APSP Parallel Algorithm
Algorithm HYPERCUBE SHORTEST PATH (A, C)
Step 1: for j = 0 to n - 1 do in parallel
    for k = 0 to n - 1 do in parallel
        B(0, j, k) = A(0, j, k)
    end for
end for
Step 2: for i = 1 to ⌈log(n - 1)⌉ do
    (2.1) HYPERCUBE MATRIX MULTIPLICATION (A, B, C)
    (2.2) for j = 0 to n - 1 do in parallel
        for k = 0 to n - 1 do in parallel
            (i) A(0, j, k) = C(0, j, k)
            (ii) B(0, j, k) = C(0, j, k)
        end for
    end for
end for
-
An Example
[Figure: D^1 (= W) for the directed example graph, followed by the successive min-plus squares D^2, D^4, and D^8 = D, each a 6 × 6 matrix with rows/columns indexed 0-5]
-
Analysis
Steps 1 and (2.2) require constant time
There are ⌈log(n - 1)⌉ iterations of Step (2.1); each requires O(log n) time
The overall running time is t(n) = O(log² n)
p(n) = n³
Cost is c(n) = p(n) · t(n) = O(n³ log² n)
Efficiency is E = T₁ / c(n) = O(n³) / O(n³ log² n) = 1 / O(log² n)
-
Recent Research
Jenq and Sahni (1987) compared various parallel algorithms for solving APSP empirically
Kumar and Singh (1991) used the isoefficiency metric (developed by Kumar and Rao) to analyze the scalability of parallel APSP algorithms
Hardware vs. scalability
Memory vs. scalability
-
Isoefficiency
For scalable algorithms (efficiency increases monotonically as p remains constant and problem size increases), efficiency can be maintained for increasing processors provided that the problem size also increases
Relates the problem size to the number of processors necessary for an increase in speedup in proportion to the number of processors used
-
Isoefficiency (cont)
Given an architecture, the isoefficiency function defines the degree of scalability
Tells us the required growth in problem size to be able to efficiently utilize an increasing number of processors
Ex: Given an isoefficiency of kp³
If p₀ and w₀ give speedup 0.8p₀ (efficiency = 0.8), then with p₁ = 2p₀, to maintain an efficiency of 0.8, w₁ = 2³w₀ = 8w₀
Indicates the superiority of one algorithm over another only when problem sizes are increased in the range between the two isoefficiency functions
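The arithmetic in the example is just the isoefficiency function evaluated at the new machine size: with isoefficiency k·p³, scaling p by a factor c forces the problem size up by c³. A tiny illustrative helper (name is mine):

```python
def required_problem_size(w0, p0, p1, exponent=3):
    """Problem size needed at p1 processors to hold efficiency constant,
    given an isoefficiency function of the form k * p**exponent."""
    return w0 * (p1 / p0) ** exponent
```

With exponent 3, doubling the processor count multiplies the required problem size by 8, exactly as in the slide's example.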
-
Memory Overhead Factor (MOF)
Ratio: (total memory required for all processors) / (memory required for the same problem size on a single processor)
We'd like this to be low!
-
Architectures Discussed
Shared Memory (CREW)
Hypercube (Cube)
Mesh
Mesh with Cut-Through Routing
Mesh with Cut-Through and Multicast Routing
Also examined fast and slow communication technologies
-
Parallel APSP Algorithms
Floyd Checkerboard
Floyd Pipelined Checkerboard
Floyd Striped
Dijkstra Source-Partition
Dijkstra Source-Parallel
-
General Parallel Algorithm (Floyd)
Repeat steps 1 through 4 for k := 1 to n
Step 1: If this processor has a segment of P^(k-1)[*, k], then transmit it to all processors that need it
Step 2: If this processor has a segment of P^(k-1)[k, *], then transmit it to all processors that need it
Step 3: Wait until the needed segments of P^(k-1)[*, k] and P^(k-1)[k, *] have been received
Step 4: For all i, j in this processor's partition, compute P^k[i, j] := min { P^(k-1)[i, j], P^(k-1)[i, k] + P^(k-1)[k, j] }
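Step 4 is the standard Floyd-Warshall update restricted to the cells a processor owns. Ignoring the communication of steps 1-3, one iteration might look like this in Python (a sketch; `partition` is a hypothetical list of the (i, j) cells owned by this processor, and row_k/col_k are the received segments, here full row and column lists):

```python
def floyd_iteration(P, partition, row_k, col_k):
    """One iteration (fixed k) of the general parallel Floyd algorithm:
    P^k[i][j] = min(P^(k-1)[i][j], P^(k-1)[i][k] + P^(k-1)[k][j]).
    row_k is P^(k-1)[k, *] and col_k is P^(k-1)[*, k], as received
    from the owning processors in steps 1-3."""
    for i, j in partition:
        P[i][j] = min(P[i][j], col_k[i] + row_k[j])
    return P
```

Running it for k = 0 ... n-1 with a partition covering all cells reproduces sequential Floyd-Warshall; snapshotting row_k and col_k before updating is safe because row k and column k do not change during iteration k.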
-
Floyd Checkerboard
Each (n/√p) × (n/√p) cell of the checkerboard is assigned to a different processor, and this processor is responsible for updating those cost matrix values at each iteration of the Floyd algorithm.
Steps 1 and 2 of the GPF involve each of the processors sending their data to the neighbor columns and rows.
[Figure: the n × n matrix partitioned into a √p × √p checkerboard of (n/√p) × (n/√p) blocks]
-
Floyd Pipelined Checkerboard
Similar to the preceding.
Steps 1 and 2 of the GPF involve each of the processors sending their data to the neighbor columns and rows.
The difference is that the processors are not synchronized and compute and send data ASAP (or send as soon as they receive).
[Figure: the same √p × √p checkerboard partitioning]
-
Floyd Striped
Each column is assigned a different processor, and this processor is responsible for updating the cost matrix values at each iteration of the Floyd algorithm.
Step 1 of the GPF involves each of the processors sending their data to the neighbor columns. Step 2 is not needed (since the column is contained within the processor).
[Figure: the n × n matrix partitioned into p vertical stripes of n/p columns]
-
Dijkstra Source-Partition
Assumes Dijkstra's Single-source Shortest Path is equally distributed over p processors and executed in parallel
Each processor finds shortest paths from each vertex in its set to all other vertices in the graph
Fortunately, this approach involves no inter-processor communication
Unfortunately, only n processors can be kept busy
Also, memory overhead is high since each processor has a copy of the weight matrix
-
Dijkstra's Source-Parallel
Motivated by keeping more processors busy
Run n copies of Dijkstra's SSP
Each copy runs on p/n processors (p > n)
[Figure: the p processors divided into n groups of p/n processors, one group per SSP copy]
-
Calculating Isoefficiency
Example: Floyd Checkerboard
At most n² processors can be kept busy
n must grow as Θ(√p) due to problem structure
By Floyd (sequential), T_e = Θ(n³)
Thus isoefficiency is Θ((√p)³) = Θ(p^1.5)
But what about communication?
-
Calculating Isoefficiency (cont)
ts = message startup time; tw = per-word communication time
tc = time to compute the next iteration value for one cell in the matrix
m = number of words sent; d = number of hops between nodes
Hypercube:
(ts + tw·m) log d = time to deliver m words
2(ts + tw·m) log p = barrier synchronization time (up and down the tree)
d = p
Step 1 = (ts + tw·n/√p) log p
Step 2 = (ts + tw·n/√p) log p
Step 3 (barrier synch) = 2(ts + tw) log p
Step 4 = tc·n²/p
T_p = n[ 2(ts + tw·n/√p) log p + 2(ts + tw) log p + tc·n²/p ]
Isoefficiency = Θ(p^1.5 (log p)³)
-
Mathematical Details
T_o = p·T_p - T_e
T_o = p·n[ 2(ts + tw·n/√p) log p + 2(ts + tw) log p + tc·n²/p ] - tc·n³
    = 2np(ts + tw·n/√p) log p + 2np(ts + tw) log p
How are n and p related?
-
Mathematical Details
T_o = p·T_p - T_e = 2np(ts + tw·n/√p) log p + 2np(ts + tw) log p
For efficiency E, set T_e = K·T_o with K = E/(1 - E):
tc·n³ = K[ 2np(ts + tw·n/√p) log p + 2np(ts + tw) log p ]
The dominant term is balanced when n = Θ(√p log p), i.e. n³ = Θ(p^1.5 (log p)³)
Isoefficiency = Θ(p^1.5 (log p)³)
-
Calculating Isoefficiency (cont)
ts = message startup time; tw = per-word communication time
tc = time to compute the next iteration value for one cell in the matrix
m = number of words sent; d = number of hops between nodes
Mesh:
Step 1 = (ts + tw·n/√p)·√p
Step 2 = (ts + tw·n/√p)·√p
Step 3 (barrier synch) = Θ(√p)
Step 4 = T_e/p
T_p = n(comm + sync) + tc·n³/p
Isoefficiency = Θ(p³ + p^2.25) = Θ(p³)
-
Isoefficiency and MOF for
Algorithm & Architecture Combinations
Base Algorithm | Parallel Variant       | Architecture                        | Isoefficiency   | MOF
Dijkstra       | Source-Partitioned     | SM, Cube, Mesh, Mesh-CT, Mesh-CT-MC | p³              | p
Dijkstra       | Source-Parallel        | SM, Cube                            | (p log p)^1.5   | n
Dijkstra       | Source-Parallel        | Mesh, Mesh-CT, Mesh-CT-MC           | p^1.8           | n
Floyd          | Stripe                 | SM                                  | p³              | 1
Floyd          | Stripe                 | Cube                                | (p log p)³      | 1
Floyd          | Stripe                 | Mesh                                | p^4.5           | 1
Floyd          | Stripe                 | Mesh-CT                             | (p log p)³      | 1
Floyd          | Stripe                 | Mesh-CT-MC                          | p³              | 1
Floyd          | Checkerboard           | SM                                  | p^1.5           | 1
Floyd          | Checkerboard           | Cube                                | p^1.5 (log p)³  | 1
Floyd          | Checkerboard           | Mesh                                | p³              | 1
Floyd          | Checkerboard           | Mesh-CT                             | p^2.25          | 1
Floyd          | Checkerboard           | Mesh-CT-MC                          | p^2.25          | 1
Floyd          | Pipelined Checkerboard | SM, Cube, Mesh, Mesh-CT, Mesh-CT-MC | p^1.5           | 1
-
8/10/2019 Final All-pairs Shortest Paths
44/45
Comparing Metrics
We've used cost previously this semester (cost = p·T_p)
But notice that the cost of all of the architecture-algorithm combinations discussed here is Θ(n³)
Clearly some are more scalable than others
Thus isoefficiency is a useful metric when analyzing algorithms and architectures
-
References
Akl S. G. Parallel Computation: Models and Methods. Prentice Hall, Upper Saddle River NJ, pp. 381-384, 1997.
Cormen T. H., Leiserson C. E., Rivest R. L., and Stein C. Introduction to Algorithms (2nd Edition). The MIT Press, Cambridge MA, pp. 620-642, 2001.
Jenq J. and Sahni S. All Pairs Shortest Path on a Hypercube Multiprocessor. In International Conference on Parallel Processing, pp. 713-716, 1987.
Kumar V. and Singh V. Scalability of Parallel Algorithms for the All Pairs Shortest Path Problem. Journal of Parallel and Distributed Computing, vol. 13, no. 2, Academic Press, San Diego CA, pp. 124-138, 1991.
Pettie S. A Faster All-pairs Shortest Path Algorithm for Real-weighted Sparse Graphs. In Proc. 29th Int'l Colloq. on Automata, Languages, and Programming (ICALP'02), LNCS vol. 2380, pp. 85-97, 2002.