
Lecture 6: Message Passing Interface (MPI)

Parallel Programming Models

Message Passing Model

• Used on distributed-memory MIMD architectures

• Multiple processes execute in parallel, asynchronously

• Process creation may be static or dynamic

• Processes communicate using send and receive primitives

Parallel Programming Models

Example: Pi calculation

$$\int_0^1 f(x)\,dx = \int_0^1 \frac{4}{1+x^2}\,dx = \pi \approx w \sum_{i=1}^{n} f(x_i)$$

where $f(x) = 4/(1+x^2)$, $n = 10$, $w = 1/n$, and $x_i = w(i - 0.5)$.

That is, the integral (whose exact value is π) is approximated by the midpoint rule: n rectangles of width w, each evaluated at the midpoint xᵢ of its subinterval.

[Figure: plot of f(x) = 4/(1+x²) on [0, 1] with the approximating rectangles; x-axis ticks at 0, 0.1, 0.2, …, xᵢ, …, 1.]

Parallel Programming Models

Sequential Code

#include <stdio.h>

#define f(x) (4.0/(1.0+(x)*(x)))

int main(void)
{
    int n, i;
    float w, x, sum, pi;

    printf("n?\n");
    scanf("%d", &n);
    w = 1.0/n;
    sum = 0.0;
    for (i = 1; i <= n; i++) {
        x = w*(i-0.5);       /* midpoint of the i-th subinterval */
        sum += f(x);
    }
    pi = w*sum;
    printf("%f\n", pi);
    return 0;
}


Message-Passing Interface (MPI)

http://www.mpi-forum.org

SPMD Parallel MPI Code

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define f(x) (4.0/(1.0+(x)*(x)))

int main(int argc, char *argv[])
{
    int myid, nproc, root, err;
    int n, i, start, end;
    float w, x, sum, pi;
    FILE *f1;

    err = MPI_Init(&argc, &argv);
    if (err != MPI_SUCCESS) {
        fprintf(stderr, "initialization error\n");
        exit(1);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    root = 0;
    if (myid == root) {                /* only the root reads the input */
        f1 = fopen("indata", "r");
        fscanf(f1, "%d", &n);
        fclose(f1);
    }
    MPI_Bcast(&n, 1, MPI_INT, root, MPI_COMM_WORLD);
    w = 1.0/n;
    sum = 0.0;
    /* each process sums its own block of n/nproc terms
       (assumes n is divisible by nproc) */
    start = myid*(n/nproc);
    end = (myid+1)*(n/nproc);
    for (i = start+1; i <= end; i++) {
        x = w*(i-0.5);
        sum += f(x);
    }
    MPI_Reduce(&sum, &pi, 1, MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD);
    if (myid == root) {                /* only the root writes the result */
        f1 = fopen("outdata", "w");
        fprintf(f1, "pi=%f", pi);
        fclose(f1);
    }
    MPI_Finalize();
    return 0;
}
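As a usage note (assuming a typical MPI installation; the file name pi_mpi.c is illustrative): the program is compiled with the MPI compiler wrapper and launched with a fixed number of processes, e.g. mpicc pi_mpi.c -o pi_mpi followed by mpirun -np 4 ./pi_mpi. All 4 processes run the same executable — the SPMD model — and differ only in the rank returned by MPI_Comm_rank.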

Message-Passing Interface (MPI)

MPI_INIT (int *argc, char ***argv): Initiate an MPI computation.

MPI_FINALIZE (): Terminate a computation.

MPI_COMM_SIZE (comm, size): Determine number of processes.

MPI_COMM_RANK (comm, pid): Determine my process identifier.

MPI_SEND (buf, count, datatype, dest, tag, comm): Send a message.

MPI_RECV (buf, count, datatype, source, tag, comm, status): Receive a message.

• tag: message tag, or MPI_ANY_TAG

• source: process id of the source process, or MPI_ANY_SOURCE
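To make the send/receive pairing concrete, here is a minimal two-process sketch (not from the lecture; the value 42 and tag 7 are arbitrary):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* send one int to process 1 with tag 7 */
        MPI_Send(&value, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive one int from process 0; the tag must match (or use MPI_ANY_TAG) */
        MPI_Recv(&value, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, &status);
        printf("process 1 received %d from process %d\n", value, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}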

Message-Passing Interface (MPI)

Deadlock: MPI_SEND and MPI_RECV are blocking.

Consider a program in which two processes exchange data:

      ...
      if (rank .eq. 0) then
         call mpi_send( abuf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr )
         call mpi_recv( buf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, status, ierr )
      else if (rank .eq. 1) then
         call mpi_send( abuf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr )
         call mpi_recv( buf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr )
      endif

Both processes send first; if neither send can complete before the matching receive is posted, the program deadlocks.
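Two standard remedies, sketched in C (assuming rank, abuf, buf, n, and status are declared as in the example above): break the symmetry so that one side receives first, or use MPI_Sendrecv, which performs the exchange safely.

/* Remedy 1: break the symmetry -- rank 0 sends first, rank 1 receives first. */
if (rank == 0) {
    MPI_Send(abuf, n, MPI_INT, 1, 0, MPI_COMM_WORLD);
    MPI_Recv(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
    MPI_Recv(buf, n, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    MPI_Send(abuf, n, MPI_INT, 0, 0, MPI_COMM_WORLD);
}

/* Remedy 2: MPI_Sendrecv performs the paired exchange without deadlock. */
int other = 1 - rank;
MPI_Sendrecv(abuf, n, MPI_INT, other, 0,
             buf,  n, MPI_INT, other, 0,
             MPI_COMM_WORLD, &status);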

Message-Passing Interface (MPI)

Communicators

If two processes use different contexts for communication, there can be no danger of their communication being confused.

Each MPI communicator contains a separate communication context; this defines a separate virtual communication space.

Communicator Handle: identifies the process group and context with respect to which the operation is to be performed

MPI_COMM_WORLD: contains all the processes in a parallel computation

Message-Passing Interface (MPI)

Collective Operations

These operations are all executed in a collective fashion, meaning that each process in a process group calls the communication routine

Barrier: Synchronize all processes.

Broadcast: Send data from one process to all processes.

Gather: Gather data from all processes to one process.

Scatter: Scatter data from one process to all processes.

Reduction operations: addition, multiplication, etc. of distributed data.

Message-Passing Interface (MPI)

Collective Operations

MPI_BARRIER (comm): Synchronize all processes; no process returns from the call until every process in comm has entered it.
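A hypothetical usage sketch (compute() stands in for arbitrary local work): barriers ensure that a timing measurement starts and ends at the same point on every process.

/* all processes finish setup before anyone starts the timed region */
MPI_Barrier(MPI_COMM_WORLD);
double t0 = MPI_Wtime();   /* MPI's wall-clock timer */
compute();                 /* hypothetical local work */
MPI_Barrier(MPI_COMM_WORLD);
double t1 = MPI_Wtime();   /* t1 - t0 now covers the slowest process */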

Message-Passing Interface (MPI)

Collective Operations

MPI_BCAST (inbuf, incnt, intype, root, comm): 1-to-all

Ex: MPI_BCAST(A, 5, MPI_INT, 0, MPI_COMM_WORLD);

[Figure: before the call, only P0 holds A0–A4; after MPI_BCAST, each of P0–P3 holds a copy of A0–A4.]

Message-Passing Interface (MPI)

Collective Operations

MPI_SCATTER (inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm): 1-to-all

Ex: int A[100], B[25];
MPI_SCATTER (A, 25, MPI_INT, B, 25, MPI_INT, 0, MPI_COMM_WORLD);

[Figure: before the call, P0 holds A = (A0, A1, A2, A3), one 25-element block per process; after MPI_SCATTER, each Pi holds block Ai in B.]

Message-Passing Interface (MPI)

Collective Operations

MPI_GATHER (inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm): all-to-1

Ex: int A[100], B[25];
MPI_GATHER (B, 25, MPI_INT, A, 25, MPI_INT, 0, MPI_COMM_WORLD);

[Figure: before the call, each Pi holds a 25-element block Bi; after MPI_GATHER, P0 holds A = (B0, B1, B2, B3) in rank order.]
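Scatter and gather are commonly paired: the root deals out one block per process, each process works on its block locally, and the root collects the results. A minimal sketch reusing the 4-process, 25-elements-per-process assumptions of the examples above (the doubling is an arbitrary stand-in for local work):

int A[100], B[25], i;
/* root deals out 25 elements to each of the 4 processes (including itself) */
MPI_Scatter(A, 25, MPI_INT, B, 25, MPI_INT, 0, MPI_COMM_WORLD);
for (i = 0; i < 25; i++)
    B[i] = 2 * B[i];            /* purely local work on the received block */
/* root collects the transformed blocks back into A, in rank order */
MPI_Gather(B, 25, MPI_INT, A, 25, MPI_INT, 0, MPI_COMM_WORLD);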

Message-Passing Interface (MPI)

Collective Operations

Reduction operations: Combine the values in the input buffer of each process using an operator

Operations:

• MPI_MAX, MPI_MIN

• MPI_SUM, MPI_PROD

• MPI_LAND, MPI_LOR, MPI_LXOR (logical)

• MPI_BAND, MPI_BOR, MPI_BXOR (bitwise)

Message-Passing Interface (MPI)

Collective Operations

MPI_REDUCE (inbuf, outbuf, count, type, op, root, comm)

Returns the combined value to the output buffer of a single root process

Ex: int A[2], B[2];
MPI_REDUCE (A, B, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

[Figure: elementwise MPI_MIN reduction. Inputs A: P0 = (5, 7), P1 = (2, 4), P2 = (0, 3), P3 = (6, 2). Result B on the root P0: (0, 2).]

Message-Passing Interface (MPI)

Collective Operations

MPI_ALLREDUCE (inbuf, outbuf, count, type, op, comm)

Returns the combined value to the output buffers of all processes

Ex: int A[2], B[2];
MPI_ALLREDUCE (A, B, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

(Note: unlike MPI_REDUCE, there is no root argument.)

[Figure: elementwise MPI_MIN reduction. Inputs A: P0 = (5, 7), P1 = (2, 4), P2 = (0, 3), P3 = (6, 2). Result B = (0, 2) on every process P0–P3.]
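A typical use of MPI_ALLREDUCE is a global convergence test, where every process needs the combined value to decide whether to keep iterating. A sketch (do_iteration() is a hypothetical local relaxation step):

float local_err, global_err;
do {
    local_err = do_iteration();   /* hypothetical: one local relaxation step */
    /* every process gets the worst error over all processes */
    MPI_Allreduce(&local_err, &global_err, 1, MPI_FLOAT, MPI_MAX,
                  MPI_COMM_WORLD);
} while (global_err > 1e-6);      /* all processes exit the loop together */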

Message-Passing Interface (MPI)

Asynchronous Communication

Data is distributed among processes, which must then poll periodically for pending read and write requests

Local computation may interleave with the processing of incoming messages

Non-blocking send/receive

MPI_ISEND (buf, count, datatype, dest, tag, comm, request): Start a non-blocking send.

MPI_IRECV (buf, count, datatype, source, tag, comm, request): Start a non-blocking receive.

MPI_WAIT (request, status): Complete a non-blocking operation.
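A sketch of the overlap these calls enable (sendbuf, recvbuf, n, dest, source, and do_local_work() are illustrative): start the transfers, compute on unrelated data, then wait before reusing the buffers.

MPI_Request sreq, rreq;
MPI_Status  status;

MPI_Isend(sendbuf, n, MPI_FLOAT, dest,   0, MPI_COMM_WORLD, &sreq);
MPI_Irecv(recvbuf, n, MPI_FLOAT, source, 0, MPI_COMM_WORLD, &rreq);

do_local_work();                 /* hypothetical computation that touches
                                    neither sendbuf nor recvbuf */

MPI_Wait(&sreq, &status);        /* sendbuf may be reused after this */
MPI_Wait(&rreq, &status);        /* recvbuf is now valid */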

Message-Passing Interface (MPI)

Asynchronous Communication

MPI_IPROBE (source, tag, comm, flag, status): Polls for a pending message without receiving it, and sets a flag. The message can then be received by using MPI_RECV.

MPI_PROBE (source, tag, comm, status): Blocks until the message is available.

MPI_GET_COUNT (status, datatype, count): Determines the number of elements in the message.

status (must be set by a previous probe):

• status.MPI_SOURCE

• status.MPI_TAG

Message-Passing Interface (MPI)

Asynchronous Communication

Ex:
int count, *buf, source;
MPI_Status status;

/* block until some message with tag 0 is pending */
MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
source = status.MPI_SOURCE;
/* size the buffer from the probed message, then receive it */
MPI_Get_count(&status, MPI_INT, &count);
buf = malloc(count*sizeof(int));
MPI_Recv(buf, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);

Message-Passing Interface (MPI)

Communicators

Communicator Handle: identifies the process group and context with respect to which the operation is to be performed

MPI_COMM_WORLD: contains all the processes in a parallel computation (default)

New communicators are formed by either including or excluding processes from an existing communicator.

MPI_COMM_SIZE (): Determine number of processes.

MPI_COMM_RANK (): Determine my process identifier.

Message-Passing Interface (MPI)

Communicators

MPI_COMM_DUP (comm, newcomm): creates a new handle for the same process group

MPI_COMM_SPLIT (comm, color, key, newcomm): creates a new handle for a subset of a given process group

MPI_INTERCOMM_CREATE (comm, leader, peer, rleader, tag, inter): links processes in two groups

MPI_COMM_FREE (comm): destroys a handle

Message-Passing Interface (MPI)

Communicators

Ex: Two processes communicating with a new handle

MPI_Comm newcomm;
MPI_Status status;

MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);   /* same group, new context */
if (myid == 0)
    MPI_Send(A, 100, MPI_INT, 1, 0, newcomm);
else
    MPI_Recv(A, 100, MPI_INT, 0, 0, newcomm, &status);
MPI_Comm_free(&newcomm);

Message-Passing Interface (MPI)

Communicators

Ex: Creating a new group with 4 members

MPI_Comm comm, newcomm;
int myid, color;
...
MPI_Comm_rank(comm, &myid);
if (myid < 4) color = 1;
else color = MPI_UNDEFINED;        /* these processes get MPI_COMM_NULL */
MPI_Comm_split(comm, color, myid, &newcomm);
if (newcomm != MPI_COMM_NULL)      /* only members of the new group take part */
    MPI_Scatter(A, 10, MPI_INT, B, 10, MPI_INT, 0, newcomm);

Processes:        P0  P1  P2  P3  P4  P5  P6  P7
Rank in comm:      0   1   2   3   4   5   6   7
Color:             1   1   1   1   ?   ?   ?   ?   (? = MPI_UNDEFINED)
Rank in newcomm:   0   1   2   3   -   -   -   -

Message-Passing Interface (MPI)

Communicators

Ex: Splitting processes into 3 independent groups

MPI_Comm comm, newcomm;
int myid, color;
...
MPI_Comm_rank(comm, &myid);
color = myid % 3;                  /* three groups: colors 0, 1, 2 */
MPI_Comm_split(comm, color, myid, &newcomm);

Processes:        P0  P1  P2  P3  P4  P5  P6  P7
Rank in comm:      0   1   2   3   4   5   6   7
Color:             0   1   2   0   1   2   0   1
Rank in newcomm:   0   0   0   1   1   1   2   2

(The three groups are {P0, P3, P6}, {P1, P4, P7}, and {P2, P5}; within each group, ranks are assigned in order of the key, here myid.)

Message-Passing Interface (MPI)

Communicators

MPI_INTERCOMM_CREATE (comm, local_leader, peer_comm, remote_leader, tag, intercomm): links processes in two groups

comm: intracommunicator (within the group)

local_leader: leader within the group

peer_comm: parent communicator

remote_leader: the other group's leader within the parent communicator

Message-Passing Interface (MPI)

Communicators

Ex: Communication of processes in two different groups

MPI_Comm newcomm, intercomm;
int myid, newid, count, color;
MPI_Status status;
...
MPI_Comm_size(MPI_COMM_WORLD, &count);
if (count % 2 == 0) {                 /* need two groups of equal size */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    color = myid % 2;                 /* group 0: even ranks, group 1: odd ranks */
    MPI_Comm_split(MPI_COMM_WORLD, color, myid, &newcomm);
    MPI_Comm_rank(newcomm, &newid);
    if (color == 0) {                 /* group 0: local leader P0, remote leader P1 */
        MPI_Intercomm_create(newcomm, 0, MPI_COMM_WORLD, 1, 99, &intercomm);
        MPI_Send(msg, 1, type, newid, 0, intercomm);
    } else {                          /* group 1: local leader P1, remote leader P0 */
        MPI_Intercomm_create(newcomm, 0, MPI_COMM_WORLD, 0, 99, &intercomm);
        MPI_Recv(msg, 1, type, newid, 0, intercomm, &status);
    }
    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&newcomm);
}

[Figure: P0–P7 split into the two groups; each group's rank-0 process is its local_leader, the other group's leader is the remote_leader, and each sender Pi targets the process with the same newcomm rank in the remote group.]

Message-Passing Interface (MPI)

Communicators

Ex: Communication of processes in two different groups

Processes:                P0  P1  P2  P3  P4  P5  P6  P7
Rank in MPI_COMM_WORLD:    0   1   2   3   4   5   6   7

After the split (color = myid % 2), regrouped:

Group 0:                  P0  P2  P4  P6
Rank in MPI_COMM_WORLD:    0   2   4   6
Rank in newcomm:           0   1   2   3

Group 1:                  P1  P3  P5  P7
Rank in MPI_COMM_WORLD:    1   3   5   7
Rank in newcomm:           0   1   2   3

P0 (rank 0 in group 0's newcomm) is group 0's local_leader and group 1's remote_leader; P1 is group 1's local_leader and group 0's remote_leader.

Message-Passing Interface (MPI)

Derived Types

Allow noncontiguous data elements to be grouped together in a message.

Constructor functions:

MPI_TYPE_CONTIGUOUS (): constructs a data type from contiguous elements

MPI_TYPE_VECTOR (): constructs a data type from blocks separated by a stride

MPI_TYPE_INDEXED (): constructs a data type with variable indices and sizes

MPI_TYPE_COMMIT (): commits a data type so that it can be used in communication

MPI_TYPE_FREE (): reclaims the storage of a data type

Message-Passing Interface (MPI)

Derived Types

MPI_TYPE_CONTIGUOUS (count, oldtype, newtype): constructs data type from contiguous elements

Ex: MPI_TYPE_CONTIGUOUS (10, MPI_REAL, &newtype);

MPI_TYPE_VECTOR (count, blocklength, stride, oldtype, newtype): constructs data type from blocks separated by stride

Ex: MPI_TYPE_VECTOR (5, 1, 4, MPI_FLOAT, &floattype);

[Figure: memory layout of array A — MPI_TYPE_VECTOR (5, 1, 4, ...) selects 5 single elements, each 4 elements apart.]
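A classic application of MPI_TYPE_VECTOR (a sketch, not from the lecture) is sending one column of a row-major C matrix: consecutive column elements sit one full row apart, so the row length becomes the stride. Names such as dest are illustrative.

float A[5][4];
MPI_Datatype coltype;

/* 5 blocks of 1 float, each 4 floats apart = one column of the 5x4 matrix */
MPI_Type_vector(5, 1, 4, MPI_FLOAT, &coltype);
MPI_Type_commit(&coltype);

/* send column 2: start at A[0][2], one element of the derived type */
MPI_Send(&A[0][2], 1, coltype, dest, 0, MPI_COMM_WORLD);

MPI_Type_free(&coltype);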

Message-Passing Interface (MPI)

Derived Types

MPI_TYPE_INDEXED (count, blocklengths, indices, oldtype, newtype): constructs data type with variable indices and sizes

Ex: MPI_TYPE_INDEXED (3, Blengths, Indices, MPI_INT, &newtype);

Blengths = {2, 3, 1}, Indices = {1, 5, 10}

[Figure: over data elements 0–10, elements 1–2 form Block 0, elements 5–7 form Block 1, and element 10 forms Block 2.]

Message-Passing Interface (MPI)

Derived Types

MPI_TYPE_COMMIT (type): commit data type so that it can be used in communication

MPI_TYPE_FREE (type): used to reclaim storage

Message-Passing Interface (MPI)

Derived Types

Ex:
MPI_Type_indexed(3, Blengths, Indices, MPI_INT, &newtype);
MPI_Type_commit(&newtype);
MPI_Send(A, 1, newtype, dest, 0, MPI_COMM_WORLD);
MPI_Type_free(&newtype);

Blengths = {2, 3, 1}, Indices = {1, 5, 10}

[Figure: from A[0..10], the send picks A[1]–A[2] (Block 0), A[5]–A[7] (Block 1), and A[10] (Block 2).]
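Pulling the fragments together, a minimal sender/receiver sketch (assuming myid is the process rank; the two-process setup is illustrative): since MPI type matching is on the basic elements, the 2 + 3 + 1 = 6 packed integers can be received as plain MPI_INTs.

int A[11];                            /* data laid out as in the figure */
int Blengths[3] = {2, 3, 1};
int Indices[3]  = {1, 5, 10};
int recvbuf[6];                       /* 2 + 3 + 1 elements arrive packed */
MPI_Datatype newtype;
MPI_Status status;

MPI_Type_indexed(3, Blengths, Indices, MPI_INT, &newtype);
MPI_Type_commit(&newtype);

if (myid == 0)
    /* sends A[1..2], A[5..7], and A[10] in one message */
    MPI_Send(A, 1, newtype, 1, 0, MPI_COMM_WORLD);
else if (myid == 1)
    MPI_Recv(recvbuf, 6, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);

MPI_Type_free(&newtype);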