Parallel Programming with MPI and OpenMP
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming with MPI and OpenMP
Michael J. Quinn
Chapter 6
Floyd’s Algorithm
Chapter Objectives
Creating 2-D arrays
Thinking about “grain size”
Introducing point-to-point communications
Reading and printing 2-D matrices
Analyzing performance when computations and communications overlap
Outline
All-pairs shortest path problem
Dynamic 2-D arrays
Parallel algorithm design
Point-to-point communication
Block row matrix I/O
Analysis and benchmarking
All-pairs Shortest Path Problem
[Figure: a directed, weighted graph on vertices A–E.]

Resulting adjacency matrix containing distances:

      A   B   C   D   E
A     0   6   3   6   4
B     4   0   7  10   8
C    12   6   0   3   1
D     7   3  10   0  11
E     9   5  12   2   0
Floyd’s Algorithm
for k ← 0 to n-1
    for i ← 0 to n-1
        for j ← 0 to n-1
            a[i,j] ← min(a[i,j], a[i,k] + a[k,j])
        endfor
    endfor
endfor
Why It Works
[Figure: vertices i, k, and j connected by three path arrows.]

Shortest path from i to k through 0, 1, …, k-1 (computed in previous iterations)
Shortest path from k to j through 0, 1, …, k-1 (computed in previous iterations)
Shortest path from i to j through 0, 1, …, k-1
Dynamic 1-D Array Creation
[Figure: pointer A on the run-time stack referencing a block of array elements allocated on the heap.]
Dynamic 2-D Array Creation
[Figure: pointers B and Bstorage on the run-time stack; Bstorage references one contiguous heap block holding all the matrix elements, and B references a heap array of row pointers into that block.]
Designing Parallel Algorithm
Partitioning
Communication
Agglomeration and Mapping
Partitioning
Domain or functional decomposition? Look at the pseudocode
Same assignment statement executed n³ times
No functional parallelism
Domain decomposition: divide matrix A into its n² elements
Communication
Primitive tasks: updating a[3,4] when k = 1
Iteration k: every task in row k broadcasts its value within its task column
Iteration k: every task in column k broadcasts its value within its task row
Agglomeration and Mapping
Number of tasks: static
Communication among tasks: structured
Computation time per task: constant
Strategy:
Agglomerate tasks to minimize communication
Create one task per MPI process
Two Data Decompositions
Rowwise block striped
Columnwise block striped
Comparing Decompositions
Columnwise block striped: broadcast within columns eliminated
Rowwise block striped: broadcast within rows eliminated; reading the matrix from a file is simpler
Choose rowwise block striped decomposition
File Input
[Figure: one process reads the matrix from the file a block of rows at a time, forwarding each block to the process responsible for it.]
Pop Quiz
Why don’t we input the entire file at once and then scatter its contents among the processes, allowing concurrent message passing?
Point-to-point Communication
Involves a pair of processes
One process sends a message
Other process receives the message
Send/Receive Not Collective
Function MPI_Send
int MPI_Send (
void *message,
int count,
MPI_Datatype datatype,
int dest,
int tag,
MPI_Comm comm
)
Function MPI_Recv

int MPI_Recv (
void *message,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm comm,
MPI_Status *status
)
Coding Send/Receive
…
if (ID == j) {
    …
    Receive from i
    …
}
…
if (ID == i) {
    …
    Send to j
    …
}
…

Receive is before Send. Why does this work?
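The pattern above can be sketched concretely. This minimal exchange assumes an MPI installation and at least two processes (e.g. `mpirun -np 2 ./a.out`); the rank numbers and tag are illustrative:

```c
#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends one int to rank 1. Rank 1's MPI_Recv appears "first" in
   the source, but this is safe: MPI_Recv simply blocks until the
   matching message arrives, and MPI_Send may return as soon as the
   message has been copied to a system buffer. */
int main(int argc, char *argv[]) {
    int id, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);

    if (id == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received %d\n", value);
    } else if (id == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```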
Inside MPI_Send and MPI_Recv
[Figure: during MPI_Send, the message is copied from the sending process’s program memory into a system buffer, transmitted to a system buffer on the receiving side, and copied into the receiving process’s program memory by MPI_Recv.]
Return from MPI_Send
Function blocks until message buffer is free
Message buffer is free when:
Message copied to system buffer, or
Message transmitted
Typical scenario:
Message copied to system buffer
Transmission overlaps computation
Return from MPI_Recv
Function blocks until message is in buffer
If message never arrives, function never returns
Deadlock
Deadlock: process waiting for a condition that will never become true
Easy to write send/receive code that deadlocks:
Two processes: both receive before send
Send tag doesn’t match receive tag
Process sends message to wrong destination process
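A minimal sketch of the first failure mode (both processes receive before sending); assumes an MPI environment with exactly two ranks, so it is illustrative rather than runnable here:

```c
#include <mpi.h>

/* DEADLOCK: both ranks block inside MPI_Recv waiting for a message,
   so neither ever reaches its MPI_Send and the expected messages are
   never produced. */
int main(int argc, char *argv[]) {
    int id, mine, yours;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    mine = id;

    int other = 1 - id;   /* partner rank, assuming exactly 2 processes */
    MPI_Recv(&yours, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status); /* blocks forever */
    MPI_Send(&mine, 1, MPI_INT, other, 0, MPI_COMM_WORLD);           /* never reached */

    MPI_Finalize();
    return 0;
}
```

Reversing the send/receive order on one of the two ranks breaks the cycle; MPI_Sendrecv also performs such an exchange safely in a single call.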
Computational Complexity
Innermost loop has complexity Θ(n)
Middle loop executed at most ⌈n/p⌉ times
Outer loop executed n times
Overall complexity Θ(n³/p)
Communication Complexity
No communication in inner loop
No communication in middle loop
Broadcast in outer loop: complexity is Θ(n log p)
Overall complexity Θ(n² log p)
Execution Time Expression (1)
n ⌈n/p⌉ n χ  +  n ⌈log p⌉ (λ + 4n/β)

First term (computation):
n — iterations of outer loop
⌈n/p⌉ — iterations of middle loop
n — iterations of inner loop
χ — cell update time

Second term (communication):
n — iterations of outer loop
⌈log p⌉ — messages per broadcast
λ + 4n/β — message-passing time (latency λ plus transmission of 4n bytes at bandwidth β)
Computation/communication Overlap
Execution Time Expression (2)
n ⌈n/p⌉ n χ  +  n ⌈log p⌉ λ  +  ⌈log p⌉ (4n/β)

First term (computation):
n — iterations of outer loop
⌈n/p⌉ — iterations of middle loop
n — iterations of inner loop
χ — cell update time

Second term (communication):
n — iterations of outer loop
⌈log p⌉ — messages per broadcast
λ — message-passing latency (transmission now overlaps computation)

Third term: ⌈log p⌉ (4n/β) — message transmission that cannot be overlapped
Predicted vs. Actual Performance

Processes   Predicted (sec)   Actual (sec)
    1           25.54            25.54
    2           13.02            13.89
    3            9.01             9.60
    4            6.89             7.29
    5            5.86             5.99
    6            5.01             5.16
    7            4.40             4.50
    8            3.94             3.98
Summary
Two matrix decompositions:
Rowwise block striped
Columnwise block striped
Blocking send/receive functions:
MPI_Send
MPI_Recv
Overlapping communications with computations