8/10/2019 Final All-pairs Shortest Paths
-
All-Pairs Shortest Paths
Csc8530, Dr. Prasad
Jon A Preston
March 17, 2004
-
Outline
Review of graph theory
Problem definition
Sequential algorithms
Properties of interest
Parallel algorithm
Analysis
Recent research
References
-
Graph Terminology
G = (V, E)
W = weight matrix; w_ij = weight/length of edge (vi, vj)
w_ij = ∞ if vi and vj are not connected by an edge; w_ii = 0
Assume W has positive, 0, and negative values
For this problem, we cannot have a negative-sum cycle in G
-
Weighted Graph and Weight Matrix
[Figure: an undirected weighted graph on vertices v0-v4 with edge weights 1, 2, 5, 6, 7, and 9, together with its 5 x 5 symmetric weight matrix W, rows/columns indexed 0-4]
-
Directed Weighted Graph and Weight Matrix
[Figure: a directed weighted graph on vertices v0-v5 with edge weights including -1, -2, 1, 2, 3, 4, 5, 6, 7, and 9, together with its 6 x 6 weight matrix W, rows/columns indexed 0-5]
-
All-Pairs Shortest Paths Problem Defined
For every pair of vertices vi and vj in V, it is required to find the length of the shortest path from vi to vj along edges in E.
Specifically, a matrix D is to be constructed such that d_ij is the length of the shortest path from vi to vj in G, for all i and j.
The length of a path (or cycle) is the sum of the lengths (weights) of the edges forming it.
-
Sample Shortest Path
[Figure: the directed example graph on vertices v0-v5]
The shortest path from v0 to v4 is along edges (v0, v1), (v1, v2), (v2, v4) and has length 6
-
Disallowing Negative-length Cycles
APSP does not allow the input to contain negative-length cycles
This is necessary because:
If such a cycle were to exist within a path from vi to vj, then one could traverse this cycle indefinitely, producing paths of ever shorter lengths from vi to vj.
If a negative-length cycle exists, then all paths which contain this cycle would have a length of -∞.
-
Recent Work on Sequential Algorithms
Floyd-Warshall algorithm is Θ(V³)
  Appropriate for dense graphs: |E| = O(|V|²)
Johnson's algorithm
  Appropriate for sparse graphs: |E| = O(|V|)
  O(V² log V + V·E) if using a Fibonacci heap
  O(V·E log V) if using a binary min-heap
Shoshan and Zwick (1999)
  Integer edge weights in {1, 2, ..., W}
  O(W·V^ω·p(V·W)), where ω ≈ 2.376 and p is a polylog function
Pettie (2002)
  Allows real-weighted edges
  O(V² log log V + V·E)
Strassen's Algorithm (matrix multiplication)
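For concreteness, the Θ(V³) Floyd-Warshall algorithm referenced above can be sketched in a few lines of Python (a minimal sketch, with float('inf') standing in for the ∞ entries of W):

```python
INF = float('inf')

def floyd_warshall(W):
    """All-pairs shortest paths by Floyd-Warshall, Theta(V^3).
    W is an n x n weight matrix with W[i][i] = 0 and INF for missing
    edges; the graph must contain no negative-length cycle."""
    n = len(W)
    D = [row[:] for row in W]      # work on a copy of W
    for k in range(n):             # allow vk as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D
```

Negative edge weights are fine here as long as the graph has no negative-length cycle, matching the assumption stated earlier.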
-
Properties of Interest
Let d_ij^k denote the length of the shortest path from vi to vj that goes through at most k - 1 intermediate vertices (k hops)
d_ij^1 = w_ij (edge length from vi to vj)
If i ≠ j and there is no edge from vi to vj, then d_ij^1 = ∞
Also, d_ii^1 = w_ii = 0
Given that there are no negative weighted cycles in G, there is no advantage in visiting any vertex more than once in the shortest path from vi to vj.
Since there are only n vertices in G, d_ij = d_ij^(n-1)
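The hop-bounded lengths d_ij^k can be computed directly by dynamic programming; the sketch below (function name is mine) lets one check the claim that d_ij = d_ij^(n-1), i.e. that hops beyond n - 1 never help when there are no negative cycles:

```python
INF = float('inf')

def hop_limited(W, k):
    """d^k: lengths of shortest paths from i to j using at most k edges.
    W[i][i] = 0 and W[i][j] = INF where there is no edge; d^1 = W."""
    n = len(W)
    d = [row[:] for row in W]
    for _ in range(k - 1):
        # d^(t+1)[i][j] = min over l of d^t[i][l] + w[l][j]
        # (w[j][j] = 0, so the min already includes keeping d^t[i][j])
        d = [[min(d[i][l] + W[l][j] for l in range(n))
              for j in range(n)] for i in range(n)]
    return d
```

For any graph without negative cycles, hop_limited(W, n - 1) equals hop_limited(W, n): no entry improves once every simple path has been considered.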
-
Guaranteeing Shortest Paths
If the shortest path from vi to vj contains vr and vs (where vr precedes vs), the path from vr to vs must be minimal (or it wouldn't exist in the shortest path)
Thus, to obtain the shortest path from vi to vj, we can compute all combinations of optimal sub-paths (whose concatenation is a path from vi to vj), and then select the shortest one
[Figure: the path vi → vr → vs → vj, with each segment marked MIN]
-
Iteratively Building Shortest Paths
[Figure: candidate paths from vi to vj whose last edge is (v1, vj), (v2, vj), ..., (vn, vj), with hop-bounded prefixes d_i1^(k-1), d_i2^(k-1), ..., d_in^(k-1)]
d_ij^k = min( d_ij^(k-1), min_{1≤l≤n} ( d_il^(k-1) + w_lj ) )
       = min_{1≤l≤n} ( d_il^(k-1) + w_lj )
-
Recurrence Definition
For k > 1, d_ij^k = min_{1≤l≤n} ( d_il^(k/2) + d_lj^(k/2) )
Guarantees O(log k) steps to calculate d_ij^k
[Figure: a k-vertex path from vi to vj split at vl into two halves of k/2 vertices each, both marked MIN]
-
Similarity
d_ij^k = min_{1≤l≤n} ( d_il^(k-1) + w_lj )
C_ij = Σ_{l=1}^{n} A_il · B_lj
The recurrence has the same structure as matrix multiplication, with (+, ·) replaced by (min, +)
-
Computing D
Let D^k = matrix with entries d_ij^k for 0 ≤ i, j ≤ n - 1
Given D^1, compute D^2, D^4, ..., D^m, where m = 2^⌈log(n-1)⌉
D = D^m
To calculate D^k from D^(k/2), use the special (min, +) form of matrix multiplication
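Putting the last few slides together: replacing (×, +) by (+, min) gives the "special form" of matrix multiplication, and repeatedly squaring D^1 in that semiring yields D after ⌈log(n-1)⌉ products. A sequential Python sketch (function names are mine):

```python
INF = float('inf')

def min_plus(A, B):
    """(min, +) matrix product: C[i][j] = min over l of A[i][l] + B[l][j]."""
    n = len(A)
    return [[min(A[i][l] + B[l][j] for l in range(n))
             for j in range(n)] for i in range(n)]

def apsp_by_squaring(W):
    """D = D^(n-1), computed from D^1 = W by repeated min-plus squaring."""
    n = len(W)
    D, k = [row[:] for row in W], 1
    while k < n - 1:          # ceil(log2(n-1)) iterations
        D = min_plus(D, D)    # D^k -> D^(2k)
        k *= 2
    return D
```

Each product costs Θ(n³) sequentially, so this is Θ(n³ log n) on one processor; the point of the hypercube algorithm below is to do each product in O(log n) parallel time.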
-
Modified Matrix Multiplication
Step 2: for r = 0 to N - 1 do in parallel
    C_r = A_r + B_r
end for
Step 3: for m = 2q to 3q - 1 do
    for all r in N (r_m = 0) do in parallel
        C_r = min(C_r, C_r(m))
    end for
end for
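The structure of steps 2 and 3 can be simulated sequentially: every (i, l, j) position forms one sum a_il + b_lj, and the mins are then folded pairwise along the l dimension, mirroring the hypercube's log n reduction rounds. A sketch (assuming n is a power of 2; processor indexing is schematic):

```python
def hypercube_min_plus(A, B):
    """Sequential simulation of the modified (min, +) multiplication:
    step 2 forms all sums A[i][l] + B[l][j]; step 3 folds them with
    pairwise mins along the l dimension in log2(n) rounds."""
    n = len(A)
    # Step 2: "processor" (i, l, j) computes C_r = A_r + B_r
    C = [[[A[i][l] + B[l][j] for j in range(n)] for l in range(n)]
         for i in range(n)]
    # Step 3: binary-tree min reduction over l
    stride = 1
    while stride < n:
        for i in range(n):
            for l in range(0, n, 2 * stride):
                for j in range(n):
                    C[i][l][j] = min(C[i][l][j], C[i][l + stride][j])
        stride *= 2
    return [[C[i][0][j] for j in range(n)] for i in range(n)]
```

With n³ processors doing the adds at once and each reduction round in constant time, one product takes O(log n) parallel steps.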
-
Modified Example
[Figure: from Akl's Fig. 9.2, after step (1.3): 2 x 2 matrices A and B distributed over the eight hypercube processors P000-P111, each processor holding one (a_il, b_lj) pair; with A = B = |1 2; 3 4| the standard product would be C = |7 10; 15 22|]
-
Modified Example (step 2)
[Figure: from Akl's Fig. 9.2, after modified step 2: each processor P_r now holds C_r = A_r + B_r]
-
Modified Example (step 3)
[Figure: from Akl's Fig. 9.2, after modified step 3: pairwise MINs along one hypercube dimension combine the C_r values into the (min, +) product C]
-
Hypercube Setup
Begin with a hypercube of n³ processors; each has registers A, B, and C
Arrange them in an n × n × n array (cube)
Set A(0, j, k) = w_jk for 0 ≤ j, k ≤ n - 1, i.e., the processors in positions (0, j, k) contain D^1 = W
When done, C(0, j, k) contains APSP = D^m
-
APSP Parallel Algorithm
Algorithm HYPERCUBE SHORTEST PATH (A, C)
Step 1: for j = 0 to n - 1 do in parallel
    for k = 0 to n - 1 do in parallel
        B(0, j, k) = A(0, j, k)
    end for
end for
Step 2: for i = 1 to ⌈log(n - 1)⌉ do
    (2.1) HYPERCUBE MATRIX MULTIPLICATION (A, B, C)
    (2.2) for j = 0 to n - 1 do in parallel
        for k = 0 to n - 1 do in parallel
            (i) A(0, j, k) = C(0, j, k)
            (ii) B(0, j, k) = C(0, j, k)
        end for
    end for
end for
-
An Example
[Figure: D^1 (= W) for the directed example graph, followed by the successive min-plus squares D^2, D^4, and D^8 = D, each a 6 × 6 matrix with rows/columns indexed 0-5]
-
Analysis
Steps 1 and (2.2) require constant time
There are ⌈log(n - 1)⌉ iterations of Step (2.1); each requires O(log n) time
The overall running time is t(n) = O(log² n)
p(n) = n³
Cost is c(n) = p(n) · t(n) = O(n³ log² n)
Efficiency is E = T₁ / c(n) = O(n³) / O(n³ log² n) = 1 / O(log² n)
-
Recent Research
Jenq and Sahni (1987) compared various parallel algorithms for solving APSP empirically
Kumar and Singh (1991) used the isoefficiency metric (developed by Kumar and Rao) to analyze the scalability of parallel APSP algorithms
Hardware vs. scalability
Memory vs. scalability
-
Isoefficiency
For scalable algorithms (efficiency increases monotonically as p remains constant and problem size increases), efficiency can be maintained for increasing processors provided that the problem size also increases
Relates the problem size to the number of processors necessary for an increase in speedup in proportion to the number of processors used
-
Isoefficiency (cont)
Given an architecture, the isoefficiency function defines the degree of scalability
Tells us the required growth in problem size to be able to efficiently utilize an increasing number of processors
Ex: Given an isoefficiency of kp³
If p₀ and w₀ give speedup 0.8p₀ (efficiency = 0.8), then with p₁ = 2p₀, to maintain an efficiency of 0.8, w₁ = 2³w₀ = 8w₀
Indicates the superiority of one algorithm over another only when problem sizes are increased in the range between the two isoefficiency functions
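The arithmetic in the example is just the isoefficiency function evaluated at the new machine size: with isoefficiency k·p³, scaling p by a factor c forces the problem size up by c³. A tiny illustrative helper (name is mine):

```python
def required_problem_size(w0, p0, p1, exponent=3):
    """Problem size needed at p1 processors to hold efficiency constant,
    given an isoefficiency function of the form k * p**exponent."""
    return w0 * (p1 / p0) ** exponent
```

With exponent 3, doubling the processor count multiplies the required problem size by 8, exactly as in the slide's example.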
-
Memory Overhead Factor (MOF)
Ratio: (total memory required for all processors) / (memory required for the same problem size on a single processor)
We'd like this to be low!
-
Architectures Discussed
Shared Memory (CREW)
Hypercube (Cube)
Mesh
Mesh with Cut-Through Routing
Mesh with Cut-Through and Multicast Routing
Also examined fast and slow communication technologies
-
Parallel APSP Algorithms
Floyd Checkerboard
Floyd Pipelined Checkerboard
Floyd Striped
Dijkstra Source-Partition
Dijkstra Source-Parallel
-
General Parallel Algorithm (Floyd)
Repeat steps 1 through 4 for k := 1 to n
Step 1: If this processor has a segment of P^(k-1)[*, k], then transmit it to all processors that need it
Step 2: If this processor has a segment of P^(k-1)[k, *], then transmit it to all processors that need it
Step 3: Wait until the needed segments of P^(k-1)[*, k] and P^(k-1)[k, *] have been received
Step 4: For all i, j in this processor's partition, compute P^k[i, j] := min { P^(k-1)[i, j], P^(k-1)[i, k] + P^(k-1)[k, j] }
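Step 4 is the standard Floyd-Warshall update restricted to the cells a processor owns. Ignoring the communication of steps 1-3, one iteration might look like this in Python (a sketch; `partition` is a hypothetical list of the (i, j) cells owned by this processor, and row_k/col_k are the received segments, here full row and column lists):

```python
def floyd_iteration(P, partition, row_k, col_k):
    """One iteration (fixed k) of the general parallel Floyd algorithm:
    P^k[i][j] = min(P^(k-1)[i][j], P^(k-1)[i][k] + P^(k-1)[k][j]).
    row_k is P^(k-1)[k, *] and col_k is P^(k-1)[*, k], as received
    from the owning processors in steps 1-3."""
    for i, j in partition:
        P[i][j] = min(P[i][j], col_k[i] + row_k[j])
    return P
```

Running it for k = 0 ... n-1 with a partition covering all cells reproduces sequential Floyd-Warshall; snapshotting row_k and col_k before updating is safe because row k and column k do not change during iteration k.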
-
Floyd Checkerboard
Each (n/√p) × (n/√p) cell of the checkerboard is assigned to a different processor, and this processor is responsible for updating those cost matrix values at each iteration of the Floyd algorithm.
Steps 1 and 2 of the GPF involve each of the processors sending their data to the neighbor columns and rows.
[Figure: the n × n matrix partitioned into a √p × √p checkerboard of (n/√p) × (n/√p) blocks]
-
Floyd Pipelined Checkerboard
Similar to the preceding.
Steps 1 and 2 of the GPF involve each of the processors sending their data to the neighbor columns and rows.
The difference is that the processors are not synchronized and compute and send data ASAP (or send as soon as they receive).
[Figure: the same √p × √p checkerboard partitioning]
-
Floyd Striped
Each column is assigned a different processor, and this processor is responsible for updating the cost matrix values at each iteration of the Floyd algorithm.
Step 1 of the GPF involves each of the processors sending their data to the neighbor columns. Step 2 is not needed (since the column is contained within the processor).
[Figure: the n × n matrix partitioned into p vertical stripes of n/p columns]
-
Dijkstra Source-Partition
Assumes Dijkstra's Single-source Shortest Path is equally distributed over p processors and executed in parallel
Each processor finds shortest paths from each vertex in its set to all other vertices in the graph
Fortunately, this approach involves no inter-processor communication
Unfortunately, only n processors can be kept busy
Also, memory overhead is high since each processor has a copy of the weight matrix
-
Dijkstra's Source-Parallel
Motivated by keeping more processors busy
Run n copies of Dijkstra's SSP
Each copy runs on p/n processors (p > n)
[Figure: the p processors divided into n groups of p/n processors, one group per SSP copy]
-
Calculating Isoefficiency
Example: Floyd Checkerboard
At most n² processors can be kept busy
n must grow as Θ(√p) due to problem structure
By Floyd (sequential), T_e = Θ(n³)
Thus isoefficiency is Θ((√p)³) = Θ(p^1.5)
But what about communication?
-
Calculating Isoefficiency (cont)
ts = message startup time; tw = per-word communication time
tc = time to compute the next iteration value for one cell in the matrix
m = number of words sent; d = number of hops between nodes
Hypercube:
(ts + tw·m) log d = time to deliver m words
2(ts + tw·m) log p = barrier synchronization time (up and down the tree)
d = p
Step 1 = (ts + tw·n/√p) log p
Step 2 = (ts + tw·n/√p) log p
Step 3 (barrier synch) = 2(ts + tw) log p
Step 4 = tc·n²/p
T_p = n[ 2(ts + tw·n/√p) log p + 2(ts + tw) log p + tc·n²/p ]
Isoefficiency = Θ(p^1.5 (log p)³)
-
Mathematical Details
T_o = p·T_p - T_e
T_o = p·n[ 2(ts + tw·n/√p) log p + 2(ts + tw) log p + tc·n²/p ] - tc·n³
    = 2np(ts + tw·n/√p) log p + 2np(ts + tw) log p
How are n and p related?
-
Mathematical Details
T_o = p·T_p - T_e = 2np(ts + tw·n/√p) log p + 2np(ts + tw) log p
For efficiency E, set T_e = K·T_o with K = E/(1 - E):
tc·n³ = K[ 2np(ts + tw·n/√p) log p + 2np(ts + tw) log p ]
The dominant term is balanced when n = Θ(√p log p), i.e. n³ = Θ(p^1.5 (log p)³)
Isoefficiency = Θ(p^1.5 (log p)³)
-
Calculating Isoefficiency (cont)
ts = message startup time; tw = per-word communication time
tc = time to compute the next iteration value for one cell in the matrix
m = number of words sent; d = number of hops between nodes
Mesh:
Step 1 = (ts + tw·n/√p)·√p
Step 2 = (ts + tw·n/√p)·√p
Step 3 (barrier synch) = Θ(√p)
Step 4 = T_e/p
T_p = n(comm + sync) + tc·n³/p
Isoefficiency = Θ(p³ + p^2.25) = Θ(p³)
-
Isoefficiency and MOF for
Algorithm & Architecture Combinations
Base Algorithm | Parallel Variant       | Architecture                        | Isoefficiency   | MOF
Dijkstra       | Source-Partitioned     | SM, Cube, Mesh, Mesh-CT, Mesh-CT-MC | p³              | p
Dijkstra       | Source-Parallel        | SM, Cube                            | (p log p)^1.5   | n
Dijkstra       | Source-Parallel        | Mesh, Mesh-CT, Mesh-CT-MC           | p^1.8           | n
Floyd          | Stripe                 | SM                                  | p³              | 1
Floyd          | Stripe                 | Cube                                | (p log p)³      | 1
Floyd          | Stripe                 | Mesh                                | p^4.5           | 1
Floyd          | Stripe                 | Mesh-CT                             | (p log p)³      | 1
Floyd          | Stripe                 | Mesh-CT-MC                          | p³              | 1
Floyd          | Checkerboard           | SM                                  | p^1.5           | 1
Floyd          | Checkerboard           | Cube                                | p^1.5 (log p)³  | 1
Floyd          | Checkerboard           | Mesh                                | p³              | 1
Floyd          | Checkerboard           | Mesh-CT                             | p^2.25          | 1
Floyd          | Checkerboard           | Mesh-CT-MC                          | p^2.25          | 1
Floyd          | Pipelined Checkerboard | SM, Cube, Mesh, Mesh-CT, Mesh-CT-MC | p^1.5           | 1
-
8/10/2019 Final All-pairs Shortest Paths
44/45
Comparing Metrics
We've used cost previously this semester (cost = p·T_p)
But notice that the cost of all of the architecture-algorithm combinations discussed here is Θ(n³)
Clearly some are more scalable than others
Thus isoefficiency is a useful metric when analyzing algorithms and architectures
-
References
Akl S. G. Parallel Computation: Models and Methods. Prentice Hall, Upper Saddle River NJ, pp. 381-384, 1997.
Cormen T. H., Leiserson C. E., Rivest R. L., and Stein C. Introduction to Algorithms (2nd Edition). The MIT Press, Cambridge MA, pp. 620-642, 2001.
Jenq J. and Sahni S. All Pairs Shortest Path on a Hypercube Multiprocessor. In International Conference on Parallel Processing, pp. 713-716, 1987.
Kumar V. and Singh V. Scalability of Parallel Algorithms for the All Pairs Shortest Path Problem. Journal of Parallel and Distributed Computing, vol. 13, no. 2, Academic Press, San Diego CA, pp. 124-138, 1991.
Pettie S. A Faster All-pairs Shortest Path Algorithm for Real-weighted Sparse Graphs. In Proc. 29th Int'l Colloq. on Automata, Languages, and Programming (ICALP'02), LNCS vol. 2380, pp. 85-97, 2002.