The Message Passing Interface (MPI) in Layman's Terms


DESCRIPTION

Introduction to the basic concepts of what the Message Passing Interface (MPI) is, and a brief overview of the Open MPI open source software implementation of the MPI specification.

TRANSCRIPT

KYOSS presentation, January 2011 1

Open MPI

KYOSS presentation, 12 Jan 2011

Jeff Squyres


What is the Message Passing Interface (MPI)?

The Book of MPI

A standards document

www.mpi-forum.org


Using MPI

Hardware and software implement the interface in the MPI standard (book)


MPI implementations

There are many implementations of the MPI standard

Some are closed source

Others are open source


Open MPI

Open MPI is a free, open source implementation of the MPI standard

www.open-mpi.org


So what is MPI for?

Message Passing Interface

Let’s break it down…


1. Message passing

[Diagram, three frames: Process A has a message; Process A passes it; the message has been passed to Process B]

1. Message passing

…as opposed to data that is shared between Thread A and Thread B within a single process

2. Interface

C programming function calls

Fortran too!

MPI_Send(buf, count, type, dest, tag, comm)

MPI_Recv(buf, count, type, src, tag, comm, status)

MPI_Init(argc, argv)

MPI_Finalize(void)

MPI_Type_size(dtype, size)

MPI_Wait(req, status)

MPI_Test(req, flag, status)

MPI_Comm_dup(in, out)
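Two of the calls listed above, MPI_Wait and MPI_Test, pair with MPI's non-blocking communication calls (MPI_Isend and MPI_Irecv, which start a transfer and return immediately). A minimal sketch of how they fit together; this example is not from the slides, and assumes a job of at least two processes, compiled with mpicc and launched with mpirun:

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Start the send, but do not wait for it to complete yet... */
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ...other useful work could overlap with the transfer here... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* now block until it is done */
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

MPI_Test is the non-blocking cousin of MPI_Wait: it sets a flag saying whether the request has completed, instead of blocking.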

Fortran? Really?

What most modern developers associate with “Fortran”


Yes, really

Some of today's most advanced simulation codes are written in Fortran


Yes, really

Yes, that Intel

Optimized for Nehalem, Westmere, and beyond!


Fortran is great for what it is

A simple language for mathematical expressions and computations

Targeted at scientists and engineers

…not computer scientists or web developers or database developers or …


Back to defining “MPI”…


Putting it back together

Message Passing Interface

“An interface for passing messages”

“C functions for passing messages”

Fortran too!


C/Fortran functions for message passing

Process A Process B

MPI_Send(…)


C/Fortran functions for message passing

Process A Process B

MPI_Recv(…)


Really? Is that all MPI is?

“Can’t I just do that with sockets?”

Yes! (…and no)


Comparison

(TCP) Sockets

• Connections based on IP addresses and ports

• Point-to-point communication

• Stream-oriented

• Raw data (bytes / octets)

• Network-independent

• “Slow”

MPI

• Based on peer integer “rank” (e.g., 8)

• Point-to-point and collective and one-sided and …

• Message oriented

• Typed messages

• Network independent

• Blazing fast


Comparison

MPI

• Based on peer integer “rank” (e.g., 8)

• Point-to-point and collective and one-sided and …

• Message oriented

• Typed messages

• Network independent

• Blazing fast

Whoa! What are these?


Peer integer “rank”

[Diagram: a 12-process MPI job drawn as a grid; each process has a unique integer rank, 0 through 11]


“Collective”: broadcast

[Diagram: one rank sends the same message to all of the other ranks]


“Collective”: scatter

[Diagram: one rank splits up its data and sends a different piece to each rank]


“Collective”: gather

[Diagram: every rank sends its piece of data to a single rank, which collects them all]


“Collective”: reduce

[Diagram: each rank contributes a value, and the values are combined (e.g., summed) on the way to a single rank]

[Diagram: the combined result (42) arrives at a single rank]

“Collective”: …and others


Messages, not bytes

The entire message is sent and received, not a stream of individual bytes

Message contents: 17 integers, 23 doubles, 98 structs …or whatever. Not a bunch of bytes!

Network independent

[Diagram: MPI_Send(…) and MPI_Recv(…) sit on top of the underlying network: Ethernet, Myrinet, InfiniBand, shared memory, TCP, iWARP, RoCE]

Regardless of the underlying network or transport protocol, the application code stays the same

Blazing fast

One microsecond (!)

…more on performance later

What is MPI?

MPI is probably somewhere around here


What is MPI?

MPI hides all the layers underneath


What is MPI?

A high-level network programming abstraction

[Hidden underneath: IP addresses, byte streams, raw bytes]


Nothing to see here; please move along

So what?

What’s all this message passing stuff got to do with supercomputers?


So what?

Let’s define “supercomputers”


Supercomputers


“Nebulae”: National Supercomputing Centre, Shenzhen, China

“Mare Nostrum” (Our Sea): Barcelona Supercomputing Center, Spain. It used to be a church.

Notice anything?

They’re just racks of servers!

Generally speaking…

Supercomputer = lots of processors + lots of RAM + lots of disk


Generally speaking…

Supercomputer = (many) racks of (commodity) high-end servers

(this is one definition; there are others)


So if that’s a supercomputer…

Rack of 36 1U servers


How is it different from my web farm?

Rack of 36 1U servers

Just a bunch of servers?

The difference between supercomputers and web farms and database farms (and …):

All the servers act together to solve a single computational problem


Acting together

Computational problem

Input Output

Take your computational problem…


…and split it up!

Distribute the input data across a bunch of servers

Use the network between servers to communicate / coordinate


MPI is used for this communication

Why go to so much trouble?

Computational problem = a pile of one-processor-hour chunks

1 processor = …a long time…

Why go to so much trouble?

Computational problem = 21 one-processor-hour chunks

21 processors = ~1 hour (!)

Disclaimer: scaling is rarely perfect

High Performance Computing

HPC = using supercomputers to solve real-world problems that are TOO BIG for laptops, desktops, or individual servers


Why does HPC ♥ MPI?

Network abstraction: are these cores… or servers?

Why does HPC ♥ MPI?

Message semantics: an array of 10,000 integers is sent as a single message


Why does HPC ♥ MPI?

Ultra-low network latency

1 microsecond

(depending on your network type!)


1 microsecond = 0.000001 second

From here

To here


Holy smokes!

That’s fast


Let’s get into some details…


MPI Basics

• “6 function MPI”:
  MPI_Init(): startup
  MPI_Comm_size(): how many peers?
  MPI_Comm_rank(): my unique (ordered) ID
  MPI_Send(): send a message
  MPI_Recv(): receive a message
  MPI_Finalize(): shutdown

• Can implement a huge number of parallel applications with just these 6 functions


Let’s see “Hello, World” in MPI


MPI Hello, World

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* Who am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* Num. peers? */
    printf("Hello, world! I am %d of %d\n", rank, size);
    MPI_Finalize();                         /* Shut down MPI */
    return 0;
}

Compile it with Open MPI

shell$ mpicc hello.c -o hello
shell$

Hey – what’s that? Where’s gcc?

Open MPI comes standard in many Linux and BSD distributions

(and OS X)


“Wrapper” compiler

shell$ mpicc hello.c -o hello --showme
gcc hello.c -o hello -I/opt/openmpi/include -pthread -L/opt/openmpi/lib -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
shell$

mpicc simply fills in a bunch of compiler command line options for you


Now let’s run it

shell$ mpirun -np 4 hello

Hey – what’s that? Why don’t I just run “./hello”?


mpirun launcher

shell$ mpirun -np 4 hello

mpirun launches N copies of your program and “wires them up”

“-np” = “number of processes”

This command launches a 4 process parallel job


mpirun launcher

shell$ mpirun -np 4 hello

Four copies of “hello” are launched, then they are “wired up” on the network

Now let’s run it

shell$ mpirun -np 4 hello
Hello, world! I am 0 of 4
Hello, world! I am 1 of 4
Hello, world! I am 2 of 4
Hello, world! I am 3 of 4
shell$

By default, all copies run on the local host


Run on multiple servers!

shell$ cat my_hostfile
host1.example.com
host2.example.com
host3.example.com
host4.example.com
shell$


Run on multiple servers!

shell$ cat my_hostfile
host1.example.com
host2.example.com
host3.example.com
host4.example.com
shell$ mpirun -hostfile my_hostfile -np 4 hello
Hello, world! I am 0 of 4    (ran on host1)
Hello, world! I am 1 of 4    (ran on host2)
Hello, world! I am 2 of 4    (ran on host3)
Hello, world! I am 3 of 4    (ran on host4)
shell$


Run it again

shell$ mpirun -hostfile my_hostfile -np 4 hello
Hello, world! I am 2 of 4
Hello, world! I am 3 of 4
Hello, world! I am 0 of 4
Hello, world! I am 1 of 4
shell$

Hey – why are the numbers out of order?


Standard output re-routing

shell$ mpirun -hostfile my_hostfile -np 4 hello

Each “hello” program’s standard output is intercepted and sent across the network to mpirun

But the exact ordering of the received printf’s is non-deterministic

Printf debugging = Bad

If you can’t rely on output ordering, printf debugging is pretty lousy (!)


Parallel debuggers

Fortunately, there are parallel debuggers and other tools

A parallel debugger attaches to all processes in the MPI job

Now let’s send a simple MPI message


Send a simple message

int rank;
double buffer[SIZE];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (0 == rank) {
    /* …initialize buffer[]… */
    MPI_Send(buffer, SIZE, MPI_DOUBLE, 1, 123, MPI_COMM_WORLD);
} else if (1 == rank) {
    MPI_Recv(buffer, SIZE, MPI_DOUBLE, 0, 123, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}

If I’m number 0, send the buffer[] array to number 1; if I’m number 1, receive the buffer[] array from number 0
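Putting the six basic functions together with the send/receive pattern above: a sketch (not from the slides) that passes an integer token around all the ranks in a ring. It assumes at least two processes; compile with mpicc and launch with mpirun:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, token;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;        /* my right-hand neighbor */
    int prev = (rank + size - 1) % size; /* my left-hand neighbor  */

    if (rank == 0) {
        /* Rank 0 starts the token, then waits for it to come back. */
        token = 1;
        MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Token made it all the way around the ring\n");
    } else {
        /* Everyone else receives from the left, sends to the right. */
        MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Run it with, e.g., "mpirun -np 4 ring"; the same code works for any number of processes greater than one because each rank computes its neighbors from its own rank and the job size.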

That’s enough MPI for now…


Open MPI

PACX-MPI

LAM/MPI

LA-MPI

FT-MPI

Sun CT 6

Project founded in 2003 after intense discussions between multiple open source MPI implementations


Open_MPI_Init()

shell$ svn log -r 1 https://svn.open-mpi.org/svn/ompi
------------------------------------------------------------------------
r1 | jsquyres | 2003-11-22 11:36:58 -0500 (Sat, 22 Nov 2003) | 2 lines

First commit
------------------------------------------------------------------------
shell$


Open_MPI_Current_status()

shell$ svn log -r HEAD https://svn.open-mpi.org/svn/ompi
------------------------------------------------------------------------
r24226 | rhc | 2011-01-11 20:57:47 -0500 (Tue, 11 Jan 2011) | 25 lines

Fixes #2683: Move ORTE DPM compiler warning squash to v1.4
------------------------------------------------------------------------
shell$


Open MPI 2011 Membership

15 members, 11 contributors, 2 partners


Fun stats

• ohloh.net says:
  517,400 lines of code
  30 developers (over time)
  “Well-commented source code”

• I rank in top-25 ohloh stats for:
  C
  Automake
  Shell script
  Fortran (ouch!)


Open MPI has grown

It’s amazing (to me) that the Open MPI project works so well

New features, new releases, new members

Long live Open MPI!


Recap

• Defined Message Passing Interface (MPI)

• Defined “supercomputers”

• Defined High Performance Computing (HPC)

• Showed what MPI is

• Showed some trivial MPI codes

• Discussed Open MPI


Additional Resources

• MPI Forum web site: the only site for the official MPI standards
  http://www.mpi-forum.org/

• NCSA MPI basic and intermediate tutorials (requires a free account)
  http://ci-tutor.ncsa.uiuc.edu/login.php

• “MPI Mechanic” magazine columns
  http://cw.squyres.com/


Additional Resources

• Research, Computing, and Engineering (RCE) podcast
  http://www.rce-cast.com/

• My blog: MPI_BCAST
  http://blogs.cisco.com/category/performance/


Questions?
