

CS 770G - Parallel Algorithms in Scientific Computing
May 14, 2001, Lecture 3

Message-Passing II: MPI Programming


References

• Gropp, Lusk & Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press.
• Pacheco, Parallel Programming with MPI, Morgan Kaufmann.


Message Passing Programming

• Separate processors.
• Separate address spaces.
• Procs execute independently and concurrently.
• Procs transfer data cooperatively.
• Single Program Multiple Data (SPMD)
  – All procs are executing the same program, but operating on different data.
• Multiple Program Multiple Data (MPMD)
  – Different procs may be executing different programs.
• Common software tools: PVM, MPI.


What is MPI? Why?

• A message-passing library specification:
  – Message-passing model.
  – Not a compiler specification.
  – Not a specific product.
• For parallel computers, clusters & heterogeneous networks.
• Designed to permit the development of parallel software libraries.
• Designed to provide access to advanced parallel hardware for:
  – End users
  – Library writers
  – Tool developers


Who Designed MPI?

• Broad participation.
• Vendors:
  – IBM, Intel, TMC, Meiko, Cray, Convex, nCube.
• Library writers:
  – PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda.
• Application specialists and consultants:
  – Companies: ARCO, KAI, NAG, Parasoft, Shell, …
  – Labs: ANL, LANL, LLNL, ORNL, SNL, …
  – Universities: almost 20.


Why Use MPI?

• Standardization
  – The only message-passing library that can be considered a standard.
• Portability
  – No need to modify source code when porting it to another platform that supports MPI.
• Performance
  – Vendor implementations should be able to exploit native hardware to optimize performance.


Why Use MPI? (cont.)

• Availability
  – A variety of implementations are available, both vendor and public domain, e.g. MPICH from ANL.
• Functionality
  – It provides more than 100 subroutine calls.


Features of MPI

• General:
  – Communicators combine context and group for message security.
  – Thread safety.
• Point-to-point communication:
  – Structured buffers and derived datatypes, heterogeneity.
  – Modes: standard, synchronous, ready (to allow access to fast protocols), buffered.
• Collective communication:
  – Both built-in & user-defined collective operators.
  – Large number of data movement routines.
  – Subgroups defined directly or by topology.


Is MPI Large or Small?

• MPI is large -- over 100 functions.
  – Extensive functionality requires many functions.
• MPI is small -- 6 functions:
  – MPI_Init: initialize MPI.
  – MPI_Comm_size: find out how many procs there are.
  – MPI_Comm_rank: find out which proc I am.
  – MPI_Send: send a message.
  – MPI_Recv: receive a message.
  – MPI_Finalize: terminate MPI.
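As a minimal sketch (not from the slides), a complete program using only these six calls, in which proc 1 sends one integer to proc 0:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char **argv)
    {
        int rank, size, value = 42, tag = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1)
            MPI_Send(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        else if (rank == 0 && size > 1) {
            MPI_Recv(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
            printf("Proc 0 received %d\n", value);   /* prints 42 */
        }

        MPI_Finalize();
        return 0;
    }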


Is MPI Large or Small? (cont.)

• MPI is just right.
  – One can access flexibility when it is required.
  – One need not master all parts of MPI to use it.


Send & Receive

• Cooperative data transfer:
  – To (from) whom is data sent (received)?
  – What is sent?
  – How does the receiver identify it?

    Proc 0 (Send)  -->  Data  -->  Proc 1 (Receive)


Message Passing: Send

Syntax of MPI_Send:

    MPI_Send(address, count, datatype, dest, tag, comm)

• (address, count) = a contiguous area in memory containing the message to be sent.
• datatype = type of data, e.g. integer, double precision.
  – Message length = count * sizeof(datatype).
• dest = integer identifier of the proc to receive the message.
• tag = nonnegative integer that the destination can use to selectively screen messages.
• comm = communicator = group of procs.


Message Passing: Receive

Syntax of MPI_Recv:

    MPI_Recv(address, count, datatype, source, tag, comm, status)

• address, count, datatype, tag, comm: the same as in MPI_Send.
• source = integer identifier of the proc sending the message.
• status = information about the message that was received.


SPMD

• Proc 0 & proc 1 are actually performing different operations.
• However, it is not necessary to write separate programs for each proc.
• Typically, use a conditional statement and the proc id to identify the job of each proc. Example:

    int a[10];
    if (my_id == 0)
        MPI_Send(a, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (my_id == 1)
        MPI_Recv(a, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);


Deadlock

• Example: exchange data between 2 procs:

    Proc 0               Proc 1
    MPI_Send (Data 1)    MPI_Send (Data 2)
    MPI_Recv (Data 2)    MPI_Recv (Data 1)

• Without system buffering, MPI_Send behaves synchronously: it keeps waiting until a matching receive is posted.


Deadlock (cont.)

• Both procs are waiting for each other → deadlock.
• However, the same exchange works if system buffering exists → an unsafe program (its correctness depends on buffering):

    Proc 0               Proc 1
    MPI_Send (Data 1)    MPI_Send (Data 2)
    MPI_Recv (Data 2)    MPI_Recv (Data 1)


Deadlock (cont.)

• Note: MPI_Recv is blocking and nonbuffered.
• A real deadlock:

    Proc 0       Proc 1
    MPI_Recv     MPI_Recv
    MPI_Send     MPI_Send

• Can be fixed by reordering the comm.:

    Proc 0       Proc 1
    MPI_Send     MPI_Recv
    MPI_Recv     MPI_Send
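Another fix, as a sketch (not from the slides): MPI_Sendrecv issues the send and the receive together and lets the library order them safely; sendbuf, recvbuf, count and partner are hypothetical names:

    /* Send my block to partner and receive partner's block;
       no deadlock regardless of buffering. */
    MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, partner, 0,
                 recvbuf, count, MPI_DOUBLE, partner, 0,
                 MPI_COMM_WORLD, &status);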


Buffered / Nonbuffered Comm.

• No buffering (phone call):
  – Proc 0 initiates the send request and rings Proc 1. It waits until Proc 1 is ready to receive; then the transmission starts.
  – Synchronous comm.: completes only when the message has been received by the receiving proc.
• Buffering (beeper):
  – The message to be sent (by Proc 0) is copied to a system-controlled block of memory (buffer).
  – Proc 0 can continue executing the rest of its program.
  – When Proc 1 is ready to receive the message, the system copies the buffered message to Proc 1.
  – Asynchronous comm.: may complete even though the receiving proc has not yet received the message.


Buffered Comm.

• Buffering requires system resources, e.g. memory, and can be slower if the receiving proc is already ready at the time the send is requested.
• Application buffer: address space that holds the data.
• System buffer: system space for storing messages. In buffered comm., data in the application buffer is copied to/from the system buffer.
• MPI allows comm. in buffered mode: MPI_Bsend, MPI_Ibsend.
• The user allocates the buffer with: MPI_Buffer_attach(buffer, buffer_size)
• Free the buffer with MPI_Buffer_detach. A sketch follows.
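A minimal sketch of buffered mode (not from the slides); data and dest are hypothetical, and the attached buffer must include MPI_BSEND_OVERHEAD bytes of bookkeeping space:

    #include <stdlib.h>

    int bufsize = 10 * sizeof(double) + MPI_BSEND_OVERHEAD;
    char *buf = malloc(bufsize);

    MPI_Buffer_attach(buf, bufsize);           /* hand the buffer to MPI  */
    MPI_Bsend(data, 10, MPI_DOUBLE, dest, 0,   /* completes locally       */
              MPI_COMM_WORLD);
    MPI_Buffer_detach(&buf, &bufsize);         /* blocks until buffered   */
    free(buf);                                 /* sends have drained      */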


Blocking / Nonblocking Comm.

• Blocking comm. (McDonald's):
  – The receiving proc has to wait if the message is not ready.
  – Different from synchronous comm.: Proc 0 may already have buffered the message to the system and Proc 1 may be ready, but the interconnection network is busy.
• Nonblocking comm. (In & Out):
  – Proc 1 checks with the system whether the message has arrived yet. If not, it continues doing other work; otherwise, it gets the message from the system.
  – Useful when computation and comm. can proceed at the same time.


Blocking / Nonblocking Comm. (cont.)

• MPI provides both nonblocking send & receive: MPI_Isend, MPI_Irecv.
• A nonblocking send identifies an area in memory to serve as a send buffer. Processing continues immediately, without waiting for the message to be copied out of the application buffer.
• The program should not modify the application buffer until the nonblocking send has completed.
• Nonblocking comm. can be combined with no buffering (MPI_Issend) or with buffering (MPI_Ibsend).
• Use MPI_Wait or MPI_Test to determine whether the nonblocking send or receive has completed. A sketch follows.
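A sketch of MPI_Test-based overlap (not from the slides; buf, count, src and do_other_work are hypothetical):

    MPI_Request request;
    MPI_Status status;
    int done = 0;

    MPI_Irecv(buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &request);
    while (!done) {
        do_other_work();                  /* overlap computation with comm. */
        MPI_Test(&request, &done, &status);
    }
    /* buf is now safe to read */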


Example: Blocking vs Nonblocking

• Data exchange in a ring topology: P0 → P1 → P2 → P3 → P0; each proc sends a block to its successor and receives one from its predecessor.
• Blocking version:

    for (i = 0; i < p - 1; i++) {
        send_offset = ((my_id - i + p) % p) * blksize;
        recv_offset = ((my_id - i - 1 + p) % p) * blksize;
        MPI_Send(y + send_offset, blksize, MPI_FLOAT,
                 (my_id + 1) % p, 0, ring_comm);
        MPI_Recv(y + recv_offset, blksize, MPI_FLOAT,
                 (my_id - 1 + p) % p, 0, ring_comm, &status);
    }


Example: Blocking vs Nonblocking (cont.)

• Nonblocking version:

    send_offset = my_id * blksize;
    recv_offset = ((my_id - 1 + p) % p) * blksize;
    for (i = 0; i < p - 1; i++) {
        MPI_Isend(y + send_offset, blksize, MPI_FLOAT,
                  (my_id + 1) % p, 0, ring_comm, &send_request);
        MPI_Irecv(y + recv_offset, blksize, MPI_FLOAT,
                  (my_id - 1 + p) % p, 0, ring_comm, &recv_request);
        send_offset = ((my_id - i - 1 + p) % p) * blksize;
        recv_offset = ((my_id - i - 2 + p) % p) * blksize;
        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }


Summary: Comm. Modes

• 4 comm. modes in MPI: standard, buffered, synchronous, ready. Each can be either blocking or nonblocking.
• In standard mode (MPI_Send, MPI_Recv), it is up to the system to decide whether messages should be buffered.
• In synchronous mode, a send won't complete until a matching receive has been posted and has begun receiving the data.
  – MPI_Ssend, MPI_Issend.
  – No system buffering.


Summary: Comm. Modes (cont.)

• In buffered mode, the completion of a send does not depend on the existence of a matching receive.
  – MPI_Bsend, MPI_Ibsend.
  – System buffering via MPI_Buffer_attach & MPI_Buffer_detach.
• Ready mode is not discussed here.


Collective Communications

Comm. patterns involving all the procs of a group; usually more than 2.

• MPI_Barrier: synchronize all procs.
• Broadcast (MPI_Bcast).
• Reduction (MPI_Reduce):
  – All procs contribute data that is combined using a binary op.
  – E.g. max, min, sum, etc.
  – One proc obtains the final answer.
• Allreduce (MPI_Allreduce):
  – Same as MPI_Reduce, but every proc obtains the final answer.
  – Conceptually, MPI_Allreduce = MPI_Reduce + MPI_Bcast, but more efficient. See the sketch below.
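A sketch (not from the slides) computing a global maximum both ways; local_max is a hypothetical per-proc value:

    double local_max, global_max;
    /* ... each proc computes its local_max ... */

    /* Only proc 0 obtains the result: */
    MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX,
               0, MPI_COMM_WORLD);

    /* Every proc obtains the result: */
    MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX,
                  MPI_COMM_WORLD);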


An Implementation

• Tree-structured comm. (find the max among procs):

  [Figure: P0, ..., P7 each hold one value; values are combined pairwise in a binary tree, and the max (7) is obtained after log2 8 = 3 stages.]

• Only log p stages of comm. are needed.
• Not necessarily optimal on a particular architecture.


Example 1: Hello, world!

• #include "mpi.h": basic MPI definitions and datatypes.
• MPI_Init starts MPI.
• MPI_Finalize exits MPI.
• Note: all non-MPI routines are local; thus printf runs on each proc.

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        printf("Hello, world!\n");
        MPI_Finalize();
        return 0;
    }


Example 2: "Advanced" Hello, world!

• MPI_Comm_rank returns the proc id.
• MPI_Comm_size returns the # of procs.
• Note: on some parallel systems, only a few designated procs can do I/O.
• What does the output look like? The lines from different procs may appear in any order.

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello, world! I am %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }


Example: Calculate π

• Well-known formula:

    \int_0^1 \frac{4}{1+x^2}\,dx = \pi .

• Numerical integration (trapezoidal rule) on a grid a = x_0 < x_1 < \cdots < x_n = b:

    \int_a^b f(x)\,dx \approx h\left[ \tfrac{1}{2}f(x_0) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n) \right],

  where x_i = a + ih, h = (b-a)/n, and n = # of subintervals.


Example: Calculate π (cont.)

• A sequential function Trap(a,b,n) approximates the integral of f(x) from a to b using the trap rule with n subintervals (here f is the integrand, e.g. f(x) = 4/(1+x*x) for the π example):

    double Trap(double a, double b, int n)
    {
        double h, x, integral;
        int i;

        h = (b - a) / n;
        integral = h * (f(a) + f(b)) / 2;
        for (i = 1; i < n; i++) {
            x = a + i * h;
            integral = integral + h * f(x);
        }
        return integral;
    }


Parallelizing Trap

• Divide the interval [a,b] into p equal pieces.
• Each proc calculates its local approx. integral using the trap rule, all simultaneously.
• Finally, combine the local values to obtain the total integral.

  [Figure: graph of f(x) on [a,b] with grid points x_0 = a, x_{n/p}, x_{2n/p}, ..., x_n = b; proc k handles the piece from x_{kn/p} to x_{(k+1)n/p}, and proc p-1 the piece ending at x_n = b.]


Parallel Trap Program

    /* Startup MPI */
    MPI_Init(&argc, &argv);

    /* Find out how many procs */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Determine my proc id */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    /* Apply Trap rule locally */
    n = 128; a = 0; b = 1;
    h = (b - a) / n;
    n_k = n / p;                   /* subintervals per proc */
    a_k = a + my_id * n_k * h;     /* local left endpoint   */
    b_k = a_k + n_k * h;           /* local right endpoint  */
    integral = Trap(a_k, b_k, n_k);


Parallel Trap Program (cont.)

    /* Sum up integrals */
    if (my_id == 0) {
        total = integral;
        for (k = 1; k < p; k++) {
            MPI_Recv(&integral, 1, MPI_DOUBLE, k, tag,
                     MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {
        MPI_Send(&integral, 1, MPI_DOUBLE, 0, tag,
                 MPI_COMM_WORLD);
    }

    /* Close MPI */
    MPI_Finalize();


Parallelizing Trap Program (cont.)

• Can (and should) replace the MPI_Send & MPI_Recv loop by MPI_Reduce, as sketched below.
• Embarrassingly parallel -- no comm. is needed during the computation of the local approx. integrals.
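The MPI_Reduce version replaces the entire send/receive loop above with one collective call:

    /* Every proc contributes its local integral; proc 0 gets the sum. */
    MPI_Reduce(&integral, &total, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);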


Wildcards

• MPI_ANY_TAG, MPI_ANY_SOURCE
  – MPI_Recv can use them for the tag and source input arguments.
• Use the status output argument to determine the actual source and tag.
• In C, the last parameter of MPI_Recv points to an MPI_Status struct with at least 2 members:
  – status.MPI_TAG
  – status.MPI_SOURCE
• They give the tag number (MPI_TAG) and the rank of the proc that sent the message (MPI_SOURCE). See the sketch below.
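A sketch (not from the slides) of receiving from any source and then identifying the sender:

    MPI_Status status;
    double x;

    MPI_Recv(&x, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    printf("got %f from proc %d with tag %d\n",
           x, status.MPI_SOURCE, status.MPI_TAG);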


Timing

• MPI_Wtime() returns the wall-clock time.

    double start, finish, time;

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();
    /* ... code to be timed ... */
    MPI_Barrier(MPI_COMM_WORLD);
    finish = MPI_Wtime();
    time = finish - start;


MPI Data Structures

• Suppose that in the previous program, proc 0 reads the values of a, b, & n from standard input and then broadcasts them to the other procs.
• Consequently, we would need to perform MPI_Bcast 3 times.
• Sending a message is expensive in a parallel environment → minimize latency.
• We can reduce this overhead by sending the 3 values in a single message.
• 3 approaches: count + datatype, derived datatypes, MPI_Pack/Unpack.


(I) count + datatype

• In MPI_Send (MPI_Recv, MPI_Bcast, …), we specify the length of the data by count.
• Thus, we may group data items having the same datatype: store the data in contiguous memory locations (e.g. an array).
• Unfortunately, in our case, a & b are doubles but n is an integer.


(II) Derived Data Type

• Define an MPI datatype consisting of 2 doubles and 1 integer.
• Use MPI_Type_struct.
• A general MPI datatype is a sequence of pairs:

    {(t_0, d_0), (t_1, d_1), ..., (t_n, d_n)}

  – t_i = an MPI datatype
  – d_i = displacement in bytes relative to the starting address of the message
• E.g., if a, b, n are stored at addresses 10, 25, 30, the displacements relative to a are 0, 15, 20.
• Then the derived datatype is:

    {(MPI_DOUBLE, 0), (MPI_DOUBLE, 15), (MPI_INT, 20)}


(II) Derived Data Type (cont.)

    double a, b;
    int n, blklen[3];
    MPI_Datatype newtype, type[3];
    MPI_Aint disp[3], base, address;

    n = 128; a = 0; b = 1;
    blklen[0] = blklen[1] = blklen[2] = 1;
    type[0] = type[1] = MPI_DOUBLE; type[2] = MPI_INT;

    disp[0] = 0;
    MPI_Address(&a, &base);
    MPI_Address(&b, &address);
    disp[1] = address - base;
    MPI_Address(&n, &address);
    disp[2] = address - base;

    MPI_Type_struct(3, blklen, disp, type, &newtype);
    MPI_Type_commit(&newtype);     /* a derived type must be committed before use */
    MPI_Bcast(&a, 1, newtype, 0, MPI_COMM_WORLD);


(III) MPI_Pack / Unpack

• MPI_Pack stores noncontiguous data in contiguous memory locations.
• MPI_Unpack copies data from a contiguous buffer into noncontiguous memory locations (the reverse of MPI_Pack).

    char buffer[buffer_size];
    int position;

    position = 0;
    MPI_Pack(&a, 1, MPI_DOUBLE, buffer, buffer_size, &position,
             MPI_COMM_WORLD);
    MPI_Pack(&b, 1, MPI_DOUBLE, buffer, buffer_size, &position,
             MPI_COMM_WORLD);
    MPI_Pack(&n, 1, MPI_INT, buffer, buffer_size, &position,
             MPI_COMM_WORLD);
    MPI_Bcast(buffer, buffer_size, MPI_PACKED, 0, MPI_COMM_WORLD);


MPI Topology

• Cartesian meshes are very common in parallel programs solving PDEs.
• In such programs, comm. patterns closely resemble the computational mesh.
• The mapping of the comm. topology to the hardware topology can be made in many ways; some are better than others.
• Thus, MPI allows the vendor to help optimize this mapping.
• Two types of virtual topologies in MPI: Cartesian & graph topologies.


Cartesian Topology

• Create a 4×3 2D-mesh topology:

    (0,0) (1,0) (2,0) (3,0)
    (0,1) (1,1) (2,1) (3,1)
    (0,2) (1,2) (2,2) (3,2)

• MPI_Cart_create(MPI_COMM_WORLD, ndim, dims, periods, reorder, &new_comm);
  – ndim = # of dimensions = 2
  – dims = # of procs in each direction: dims[0] = 4, dims[1] = 3


Cartesian Topology (cont.)

• MPI_Cart_create(MPI_COMM_WORLD, ndim, dims, periods, reorder, &new_comm);
  – periods indicates whether the procs at the ends are connected (wrap-around); useful for periodic domains. Here: periods[0] = periods[1] = 0;
  – reorder indicates whether the system is allowed to optimize the mapping of grid procs to the underlying physical procs by reordering. A sketch follows.
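Putting the arguments together, a sketch of creating the 4×3 mesh:

    MPI_Comm new_comm;
    int ndim = 2;
    int dims[2] = {4, 3};      /* 4 procs in dim 0, 3 in dim 1 */
    int periods[2] = {0, 0};   /* no wrap-around               */
    int reorder = 1;           /* let MPI reorder ranks        */

    MPI_Cart_create(MPI_COMM_WORLD, ndim, dims, periods,
                    reorder, &new_comm);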


Other Functions

• MPI_Comm_rank(new_comm, &my_new_rank)
  – The ranking may have been changed by reordering.
• MPI_Cart_coords(new_comm, my_new_rank, ndim, coordinates)
  – Given the rank, it returns the array coordinates containing the coordinates of the proc.
• MPI_Cart_rank returns the rank when given the coordinates.
• MPI_Cart_get returns dims, periods, coords.
• MPI_Cart_shift returns the ranks of the source & dest procs in a shift operation. See the sketch below.
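A sketch (not from the slides) of finding my coordinates and my neighbors in dimension 0 of the mesh created above:

    int my_new_rank, coords[2], left, right;

    MPI_Comm_rank(new_comm, &my_new_rank);
    MPI_Cart_coords(new_comm, my_new_rank, 2, coords);

    /* Ranks one step away in dimension 0; at a non-periodic
       boundary MPI_PROC_NULL is returned. */
    MPI_Cart_shift(new_comm, 0, 1, &left, &right);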


Communicators

• Example: MPI_COMM_WORLD.
• Communicator = group + context.
• A communicator consists of:
  – A group of procs.
  – A set of comm. channels between these procs.
  – Each communicator has its own set of channels.
  – Messages sent with one communicator cannot be received by another.
• Enables the development of safe software libraries.
  – A library uses a private comm. domain.
• Sometimes, restricting comm. to a subgroup is useful, e.g. broadcasting messages across a row or down a column of grid procs (see the sketch below).
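A sketch (not from the slides) of one communicator per mesh row using MPI_Comm_split; new_comm and coords are the Cartesian communicator and coordinates from above, and x is a hypothetical value:

    MPI_Comm row_comm;
    double x;

    /* Procs with the same color (same row coordinate) end up in the
       same new communicator; key orders the ranks within it. */
    MPI_Comm_split(new_comm, coords[1], coords[0], &row_comm);

    /* Broadcast x from the first proc of each row: */
    MPI_Bcast(&x, 1, MPI_DOUBLE, 0, row_comm);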


Why Communicators?

• Conflicts among MPI calls by users and libraries.
• E.g. Sub1 & Sub2 are from 2 different libraries.
• Correct execution of library calls:

  [Figure: message timelines on P0, P1, P2. Each proc runs Sub1 (which posts sends and wildcard recv(any) calls) to completion before starting Sub2, so Sub1's wildcard receives match only Sub1's sends.]


Why Communicators? (cont.)

• Incorrect execution of library calls:

  [Figure: the same timelines, but one proc starts Sub2 while another is still inside Sub1; a send issued by Sub2 is intercepted by a pending recv(any) in Sub1, so messages cross library boundaries. Separate communicators prevent this.]


MPI Group

• A group = a set of procs.
• Create a group by:
  – MPI_Group_incl: include specific members.
  – MPI_Group_excl: exclude specific members.
  – MPI_Group_union: form the union of two groups.
  – MPI_Group_intersection: form the intersection of two groups.
• MPI_Comm_group: get the group of an existing communicator.
• MPI_Group_free: free a group. See the sketch below.
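A sketch (not from the slides): build a communicator containing only the even-ranked procs (the fixed-size array assumes at most 64 procs):

    MPI_Group world_group, even_group;
    MPI_Comm even_comm;
    int i, p, ranks[64];

    MPI_Comm_size(MPI_COMM_WORLD, &p);
    for (i = 0; i < (p + 1) / 2; i++)
        ranks[i] = 2 * i;                  /* 0, 2, 4, ... */

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, (p + 1) / 2, ranks, &even_group);
    MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);

    MPI_Group_free(&world_group);
    MPI_Group_free(&even_group);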


MPI-2

• Extensions to MPI 1.1 and MPI 1.2.
• Major topics being discussed:
  – Dynamic process management.
  – Client/server.
  – Real-time extensions.
  – "One-sided" communications.
  – Portable access to MPI state (for debuggers).
  – Language bindings for C++ and Fortran 90.