
CS 838: Pervasive Parallelism

Introduction to MPI

Copyright 2005 Mark D. Hill, University of Wisconsin-Madison

Slides are derived from an online tutorial from Lawrence Livermore National Laboratories

Thanks!


Outline

• Introduction to MPI

• MPI programming 101

• Point-to-Point Communication

• Collective Communication

• MPI Environment

• References


Introduction to MPI

• Message Passing
  – A collection of co-operating processes

  – Running on different machines / executing different code

– Communicate through a standard interface.

• Message Passing Interface (MPI)
  A library standard established to facilitate portable, efficient programs using message passing.

• Vendor independent and supported across a large number of platforms.


Introduction to MPI

• Fairly large set of primitives (129 functions)

• Small set of regularly used routines.

• MPI routines
  – Environment Setup

– Point-to-Point Communication

– Collective Communication

– Virtual Topologies

– Data Type definitions

– Group-Communicator management


Outline

• Introduction to MPI

• MPI programming 101

• Point-to-Point Communication

• Collective Communication

• MPI Environment

• References


MPI programming 101

• HelloWorld.c

  #include <stdio.h>
  #include "mpi.h"

  int main( int argc, char **argv ) {
      int myid, num_procs;

      MPI_Init( &argc, &argv );
      MPI_Comm_size( MPI_COMM_WORLD, &num_procs );
      MPI_Comm_rank( MPI_COMM_WORLD, &myid );

      printf( "Hello world from process %d of %d\n", myid, num_procs );

      MPI_Finalize();
      return 0;
  }

• mpicc -o hello HelloWorld.c

• mpirun -np 16 hello


MPI Programming 101

• Generic MPI program structure:
  – MPI include file
  – Init MPI environment
  – MPI message passing calls
  – Terminate MPI environment


MPI Programming 101

• MPI include file
  #include "mpi.h"

• Initializing the MPI environment
  MPI_Init(&argc, &argv);                      /* initialize the MPI execution environment */
  MPI_Comm_size(MPI_COMM_WORLD, &num_procs);   /* determine the number of processes in the group */
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);        /* get my rank among the processes */

• Terminating MPI
  MPI_Finalize();

• Compile script: mpicc (compile and link MPI C programs)

• Execute script: mpirun (run MPI programs)


Outline

• Introduction to MPI

• MPI programming 101

• Point-to-Point Communication

• Collective Communication

• MPI Environment

• References


Point-to-Point Communication

• Point-to-Point Communication
  Message passing between two processes

• Types of communication
  – Synchronous send

– Blocking send / blocking receive

– Non-blocking send / non-blocking receive

– Buffered send

– Combined send/receive

– "Ready" send

• Any type of send can be paired with any type of receive


Point-to-Point Communication

#include "mpi.h" #include <stdio.h> int main(int argc, char **argv) { int numtasks, rank, dest, source, rc, count, tag=1; char inmsg, outmsg='x'; MPI_Status Stat; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); MPI_Comm_rank(MPI_COMM_WORLD, &rank); if (rank == 0) {

dest = 1; source = 1; rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD); rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD,

&Stat); } else if (rank == 1) {

dest = 0; source = 0; rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD,

&Stat); rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

} printf("Task %d: Received %c from task %d with tag %d \n", rank, inmsg, Stat.MPI_SOURCE, Stat.MPI_TAG);

MPI_Finalize(); }


Point-to-Point Communication

• MPI_Send
  – Basic blocking send operation. The routine returns only after the application buffer in the sending task is free for reuse.

  int MPI_Send(void *send_buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

• MPI_Recv
  – Receives a message and blocks until the requested data is available in the application buffer in the receiving task.

  int MPI_Recv(void *recv_buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)


Point-to-Point Communication

  [Diagram: Task 0 sends from its send_buf to Task 1's recv_buf; the message carries {src, dest, tag, data}]

• Push-based communication.

• Wild cards (MPI_ANY_SOURCE, MPI_ANY_TAG) are allowed on the receiver side for src and tag.

• The MPI_Status object can be queried for information on a received message (see the sketch below).
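A minimal sketch of a wildcard receive, assuming the same two-task setup as the example two slides back (MPI_ANY_SOURCE, MPI_ANY_TAG, and MPI_Get_count are standard MPI facilities; the variable names are illustrative):

  MPI_Status stat;
  char buf;
  int  count;

  /* Accept one char from any sender, with any tag */
  MPI_Recv(&buf, 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);

  /* The status object records who actually sent it and with what tag */
  printf("got %c from task %d, tag %d\n", buf, stat.MPI_SOURCE, stat.MPI_TAG);

  /* Number of elements actually received */
  MPI_Get_count(&stat, MPI_CHAR, &count);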


Point-to-Point Communication

• Blocking vs. non-blocking
  – Blocking: a send routine "returns" only after it is safe to modify the send buffer for reuse. A receive "returns" only after the data has arrived and is ready for use by the program.
  – Non-blocking: send and receive routines return almost immediately. They do not wait for any communication events to complete, such as message copying from user memory to system buffer space or the actual arrival of the message (see the sketch after this list).

• Buffering
  – System buffer space is managed by the library.
  – Can impact performance.
  – User-managed buffering is also possible.

• Order and fairness
  – Order: MPI guarantees in-order delivery of matching messages between a given pair of tasks.
  – Fairness: MPI does not guarantee fairness.
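A minimal non-blocking sketch, assuming two tasks and that rank has been obtained with MPI_Comm_rank as in the earlier example (variable names are illustrative):

  MPI_Request reqs[2];
  MPI_Status  stats[2];
  char in, out = 'x';
  int  peer = (rank == 0) ? 1 : 0;

  /* Post both operations; neither call waits for the transfer to complete */
  MPI_Irecv(&in,  1, MPI_CHAR, peer, 1, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(&out, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD, &reqs[1]);

  /* ... useful computation can overlap the communication here ... */

  /* Only after the wait completes is it safe to reuse 'out' or read 'in' */
  MPI_Waitall(2, reqs, stats);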


Point-to-Point Communication

• MPI_Ssend
  – Synchronous blocking send: sends a message and blocks until the application buffer in the sending task is free for reuse and the destination process has started to receive the message.

• MPI_Bsend
  – Buffered send: permits the programmer to allocate the required amount of buffer space into which data can be copied until it is delivered (see the sketch below).

• MPI_Isend
  – Non-blocking send: returns to the user without requiring a matching receive at the destination. Does NOT mean the send buffer can be reused immediately.
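A minimal MPI_Bsend sketch showing user-managed buffering; MPI_Buffer_attach, MPI_Buffer_detach, and MPI_BSEND_OVERHEAD are standard MPI facilities, while msg, dest, and tag are assumed to be set up as in the earlier examples:

  char msg[100];
  int  bufsize = sizeof(msg) + MPI_BSEND_OVERHEAD;
  char *buf = (char *) malloc(bufsize);     /* needs <stdlib.h> */

  /* Hand the buffer to MPI; MPI_Bsend copies the message into it and returns */
  MPI_Buffer_attach(buf, bufsize);
  MPI_Bsend(msg, 100, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

  /* Detach blocks until all buffered messages have been delivered */
  MPI_Buffer_detach(&buf, &bufsize);
  free(buf);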


MPI Datatypes

• Predefined elementary datatypes
  E.g., MPI_CHAR, MPI_INT, MPI_LONG, MPI_FLOAT

• Derived datatypes are also possible (see the sketch below)
  – Contiguous

– Vector

– Indexed

– Struct

• Enables grouping of data for communication
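A minimal contiguous derived-datatype sketch, assuming the usual MPI_Init/MPI_Comm_rank setup from the earlier examples (MPI_Type_contiguous, MPI_Type_commit, and MPI_Type_free are the standard routines):

  double row[4];
  MPI_Datatype rowtype;
  MPI_Status   stat;

  /* Describe four contiguous doubles as a single new type */
  MPI_Type_contiguous(4, MPI_DOUBLE, &rowtype);
  MPI_Type_commit(&rowtype);

  /* Send/receive the whole row as one element of the new type */
  if (rank == 0)
      MPI_Send(row, 1, rowtype, 1, 0, MPI_COMM_WORLD);
  else if (rank == 1)
      MPI_Recv(row, 1, rowtype, 0, 0, MPI_COMM_WORLD, &stat);

  MPI_Type_free(&rowtype);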


Outline

• Introduction to MPI

• MPI programming 101

• Point-to-Point Communication

• Collective Communication

• MPI Environment

• References


Collective Communication

• Types of collective communication
  – Synchronization: processes wait until all members of the group have reached the synchronization point.
  – Data movement: broadcast, scatter/gather, all-to-all.
  – Collective computation (reductions): one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data.

• All collective communication is blocking.

• It is the user's responsibility to make sure all processes in a group participate.

• Works only with MPI predefined datatypes.


Collective Communication

• MPI_Barrier
  Creates a barrier synchronization in a group.

  int MPI_Barrier(MPI_Comm comm)

• MPI_Bcast
  Broadcasts a message from the process with rank "root" to all other processes of the group.
  Caveat: receiving processes must also call this function to receive the broadcast.

  int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
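A minimal broadcast sketch, assuming rank has been obtained as in the HelloWorld example; note that every rank, not just the root, makes the same MPI_Bcast call:

  int params[3];

  if (rank == 0) {                 /* only the root fills the buffer */
      params[0] = 10; params[1] = 20; params[2] = 30;
  }

  /* After the call returns, every rank holds the root's data */
  MPI_Bcast(params, 3, MPI_INT, 0, MPI_COMM_WORLD);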


Collective Communication

• MPI_Scatter
  Distributes distinct messages from a single source task to each task in the group.

  int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
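A minimal scatter sketch, assuming rank and num_procs from the earlier setup; sendbuf is significant only at the root, and the array size here is illustrative:

  float sendbuf[64];               /* meaningful only on the root; assumes num_procs <= 64 */
  float recvval;
  int   i;

  if (rank == 0)
      for (i = 0; i < num_procs; i++)
          sendbuf[i] = (float) i;

  /* Every rank (including the root) receives one element */
  MPI_Scatter(sendbuf, 1, MPI_FLOAT, &recvval, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
  printf("Task %d got %f\n", rank, recvval);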


Collective Communication

• MPI_Gather
  Gathers distinct messages from each task in the group to a single destination task.

  int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
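The mirror-image gather sketch, under the same assumptions as the scatter sketch above; recvbuf only needs to be meaningful on the root:

  float myval = (float) rank;
  float recvbuf[64];               /* significant only on the root */

  /* The root ends up with one element per task, ordered by rank */
  MPI_Gather(&myval, 1, MPI_FLOAT, recvbuf, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);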


Collective Communication

• MPI_Allgather
  Concatenates data from all tasks in a group and distributes the result to every task. Each task in the group, in effect, performs a one-to-all broadcast within the group.

  int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)


Collective Communication

• MPI_Reduce
  Applies a reduction operation across all tasks in the group and places the result in one task.

  int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

  – Ops: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, etc.
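A minimal reduction sketch computing a global sum on rank 0, assuming rank and num_procs from the earlier setup:

  int local = rank + 1;            /* some per-task value */
  int global_sum = 0;

  /* Only the root's global_sum is defined after the call */
  MPI_Reduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0)
      printf("sum over %d tasks = %d\n", num_procs, global_sum);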


Outline

• Introduction to MPI

• MPI programming 101

• Point-to-Point Communication

• Collective Communication

• MPI Environment

• References


MPI Environment

• Communicators and Groups
  – Used to determine which processes may communicate with each other.
  – A group is an ordered set of processes. Each process in a group is associated with a unique integer rank. Rank values start at zero and go to N-1, where N is the number of processes in the group.
  – A communicator encompasses a group of processes that may communicate with each other. MPI_COMM_WORLD is the default communicator that includes all processes.
  – Groups can be created manually using MPI group-manipulation routines or by using MPI topology-definition routines (a communicator-splitting sketch follows this list).
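A minimal sketch of splitting MPI_COMM_WORLD into smaller communicators with MPI_Comm_split (a standard routine); the color choice of rank / 4 is only illustrative:

  MPI_Comm sub_comm;
  int color = rank / 4;            /* tasks with the same color share a communicator */
  int sub_rank, sub_size;

  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub_comm);
  MPI_Comm_rank(sub_comm, &sub_rank);
  MPI_Comm_size(sub_comm, &sub_size);

  /* Collectives can now be scoped to the sub-group */
  MPI_Barrier(sub_comm);
  MPI_Comm_free(&sub_comm);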


MPI Environment

• MPI_Init
  Initializes the MPI execution environment. It must be called in every MPI program, before any other MPI function.

• MPI_Comm_size
  Determines the number of processes in the group associated with a communicator.

• MPI_Comm_rank
  Determines the rank of the calling process within the communicator.

• MPI_Wtime
  Returns elapsed wall-clock time in seconds on the calling processor (see the timing sketch below).

• MPI_Finalize
  Terminates the MPI execution environment. It should be the last MPI routine called in every MPI program.
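A minimal timing sketch using MPI_Wtime around a communication or computation phase (what is being timed is illustrative); it assumes rank from the earlier setup:

  double t0, t1;

  MPI_Barrier(MPI_COMM_WORLD);     /* line up the tasks before timing */
  t0 = MPI_Wtime();

  /* ... code being measured ... */

  t1 = MPI_Wtime();
  if (rank == 0)
      printf("elapsed: %f seconds\n", t1 - t0);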



Final Comments

• Debugging MPI programs
  – Standard debuggers such as gdb can be attached to an MPI process.

• Profiling MPI programs
  – Building wrappers

– MPI timers

– Generating log files

– Viewing log files

Refer to online tutorials for more information.


References

• MPI web pages at Argonne National Laboratory: http://www-unix.mcs.anl.gov/mpi

• MPI online reference: http://www-unix.mcs.anl.gov/mpi/www/

• MPI tutorial at Lawrence Livermore National Laboratory: http://www.llnl.gov/computing/tutorials/mpi/