Computer Science 320
Broadcasting
Floyd’s Algorithm on SMP
    for i = 0 to n – 1
        parallel for r = 0 to n – 1
            for c = 0 to n – 1
                d_rc = min(d_rc, d_ri + d_ic)
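The pseudocode above can be sketched in plain Java. This is only an illustration, not the course's implementation: the class name `FloydSmpSketch` is hypothetical, a parallel stream stands in for the `parallel for`, and the matrix is assumed to have a zero diagonal and no negative cycles.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of Floyd's algorithm on a shared-memory machine.
class FloydSmpSketch {
    /** Updates the n x n distance matrix d in place. */
    static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; i++) {
            final int ii = i;
            final double[] rowI = d[i];
            // Within one pass the rows are independent, so each row r
            // can be updated by a different thread.
            IntStream.range(0, n).parallel().forEach(r -> {
                // Assumption: d[i][i] == 0, so pass i never changes row i.
                // Skipping it keeps row i read-only while other threads read it.
                if (r == ii) return;
                double[] rowR = d[r];
                for (int c = 0; c < n; c++) {
                    rowR[c] = Math.min(rowR[c], rowR[ii] + rowI[c]);
                }
            });
        }
    }
}
```

All threads read row i directly from the shared matrix, which is exactly what a cluster program cannot do.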
Floyd’s Algorithm on Cluster
• Root node reads distance matrix from input file and scatters row slices to other nodes
• Other nodes compute distances and update their slices
• The slices are gathered back to the root node for output
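One way to decide which rows go to which node is to split the n rows into contiguous slices of nearly equal size. The helper below is a hypothetical sketch (the names `RowSlices`, `slices`, and `owner` are not from the slides); it only computes the slice bounds and leaves the actual scatter and gather to the message-passing library.

```java
// Hypothetical helper: contiguous row slices for k processes.
class RowSlices {
    /** Returns {lb, ub} (both inclusive) for each of k processes over rows 0..n-1. */
    static int[][] slices(int n, int k) {
        int[][] bounds = new int[k][2];
        int base = n / k;      // every slice gets at least this many rows
        int extra = n % k;     // the first 'extra' slices get one more row
        int lb = 0;
        for (int p = 0; p < k; p++) {
            int len = base + (p < extra ? 1 : 0);
            bounds[p][0] = lb;
            bounds[p][1] = lb + len - 1;   // ub < lb means an empty slice
            lb += len;
        }
        return bounds;
    }

    /** Returns the rank of the process whose slice contains the given row, or -1. */
    static int owner(int[][] bounds, int row) {
        for (int p = 0; p < bounds.length; p++) {
            if (row >= bounds[p][0] && row <= bounds[p][1]) return p;
        }
        return -1;
    }
}
```

For example, 10 rows over 4 processes gives the slices 0..2, 3..5, 6..7, and 8..9.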
Parallel I/O File Pattern
• Eliminate the gather of data by having each node write its slice to a separate file
• Eliminate the scatter of data by having each node read its slice from the input file
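A minimal sketch of the pattern, under a loudly stated assumption: the input file is taken to hold the n x n matrix as raw big-endian doubles in row-major order with no header. The slides do not specify the file format, and the class and method names here are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: each process reads and writes only its own rows.
class SliceFileIo {
    /** Reads rows lb..ub (inclusive) of an n x n matrix of raw doubles. */
    static double[][] readSlice(File in, int n, int lb, int ub) throws IOException {
        double[][] slice = new double[ub - lb + 1][n];
        try (RandomAccessFile raf = new RandomAccessFile(in, "r")) {
            // Assumed layout: row r starts at byte offset r * n * 8.
            raf.seek((long) lb * n * Double.BYTES);
            for (double[] row : slice) {
                for (int c = 0; c < n; c++) row[c] = raf.readDouble();
            }
        }
        return slice;
    }

    /** Writes one process's slice to its own output file. */
    static void writeSlice(File out, double[][] slice) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(out)))) {
            for (double[] row : slice) {
                for (double v : row) dos.writeDouble(v);
            }
        }
    }
}
```

Each process would call `readSlice` with its own bounds and `writeSlice` with a file name that includes its rank, so no scatter or gather message is needed.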
Execution Timeline
Sharing Data in Computation
• On each pass through the outer loop, the ith row must be available to all of the processes (they all execute the same line of code in the inner loop)
• They can do this in SMP because they share the entire matrix
• They can’t do this in a cluster setup, because they don’t share memory; each process holds only its own slice of the matrix
    for i = 0 to n – 1
        parallel for r = 0 to n – 1
            for c = 0 to n – 1
                d_rc = min(d_rc, d_ri + d_ic)
Share Row via a Broadcast Message
• The process that owns a row broadcasts it before the parallel loop is run, on each pass through the outer loop
• The process that owns the row acts as the root for the broadcast and sets up the source buffer
• The other processes set up a destination buffer
• The broadcast also enforces synchronization: all processes wait for it before they continue
    for i = 0 to n – 1
        broadcast row i of d
        parallel for r = 0 to n – 1
            for c = 0 to n – 1
                d_rc = min(d_rc, d_ri + d_ic)
    // Allocate storage for row broadcast from another process.
    row_i = new double [n];
    row_i_buf = DoubleBuf.buffer (row_i);
    int i_root = 0;
    for (int i = 0; i < n; ++ i) {
        double[] d_i = d[i];
        // Determine which process owns row i.
        if (! ranges[i_root].contains(i)) ++ i_root;
        // Broadcast row i from owner process to all processes.
        if (rank == i_root)
            world.broadcast(i_root, DoubleBuf.buffer (d_i));
        else {
            world.broadcast(i_root, row_i_buf);
            d_i = row_i;
        }
        // Inner loops over rows in my slice and over all columns.
        for (int r = mylb; r <= myub; ++ r) {
            double[] d_r = d[r];
            for (int c = 0; c < n; ++ c)
                d_r[c] = Math.min (d_r[c], d_r[i] + d_i[c]);
        }
    }
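The same loop structure can be tried out without a cluster. The sketch below is a single-JVM simulation, not the message-passing library used above: the class `BroadcastFloydSim` is hypothetical, each simulated process keeps a private copy of its rows, and copying row i stands in for the broadcast. It assumes a zero diagonal.

```java
// Hypothetical single-JVM simulation of the broadcast version.
class BroadcastFloydSim {
    /** Returns the shortest-distance matrix computed by k simulated processes. */
    static double[][] run(double[][] d, int k) {
        int n = d.length;
        // Rows lb[p] .. lb[p+1]-1 belong to simulated process p.
        int[] lb = new int[k + 1];
        for (int p = 0; p <= k; p++) lb[p] = (int) ((long) p * n / k);
        // "Scatter": each process gets private copies of its own rows only.
        double[][][] local = new double[k][][];
        for (int p = 0; p < k; p++) {
            local[p] = new double[lb[p + 1] - lb[p]][];
            for (int r = lb[p]; r < lb[p + 1]; r++) local[p][r - lb[p]] = d[r].clone();
        }
        int root = 0;
        for (int i = 0; i < n; i++) {
            // Advance to the process that owns row i.
            while (i >= lb[root + 1]) root++;
            // "Broadcast": a snapshot of row i is made available to everyone.
            double[] rowI = local[root][i - lb[root]].clone();
            // Each process updates only the rows in its own slice.
            for (int p = 0; p < k; p++) {
                for (double[] rowR : local[p]) {
                    for (int c = 0; c < n; c++) {
                        rowR[c] = Math.min(rowR[c], rowR[i] + rowI[c]);
                    }
                }
            }
        }
        // "Gather": reassemble the slices in row order.
        double[][] out = new double[n][];
        for (int p = 0; p < k; p++) {
            for (int r = lb[p]; r < lb[p + 1]; r++) out[r] = local[p][r - lb[p]];
        }
        return out;
    }
}
```

The simulated processes run one after another here; on a real cluster they run concurrently and the broadcast is the only point where they exchange data.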
Problem: Too Many Messages
• Every pass through the outer loop sends one broadcast, so the program sends n messages in all; the time spent in communication is too high compared with the time spent in computation
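A rough way to see the imbalance is to count the work done between broadcasts. The helper below is a hypothetical back-of-the-envelope sketch: it counts messages, bytes, and inner-loop updates only, and makes no claim about actual network or CPU timings.

```java
// Hypothetical counts for the broadcast version of Floyd's algorithm.
class BroadcastCost {
    /** One broadcast per pass through the outer loop. */
    static long broadcasts(int n) {
        return n;
    }

    /** Each broadcast carries one row of n doubles. */
    static long bytesPerBroadcast(int n) {
        return (long) n * Double.BYTES;
    }

    /** Inner-loop updates one process performs between two broadcasts. */
    static long updatesPerBroadcast(int n, int k) {
        long rowsPerProcess = (n + k - 1) / k;   // ceiling of n / k
        return rowsPerProcess * n;
    }
}
```

With n = 1000 and k = 10, each process performs 100,000 updates per broadcast. Adding processes shrinks that figure while the number of broadcasts stays at n, so communication takes a growing share of the run time.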