The Message Passing Interface (MPI) in Layman's Terms

DESCRIPTION
An introduction to the basic concepts of the Message Passing Interface (MPI), and a brief overview of Open MPI, an open source software implementation of the MPI specification.

TRANSCRIPT
Open MPI
KYOSS presentation, 12 Jan 2011
Jeff Squyres
What is the Message Passing Interface (MPI)?
The Book of MPI
A standards document
www.mpi-forum.org
Using MPI
Hardware and software implement the interface in the MPI standard (the book)
MPI implementations
There are many implementations of the MPI standard
Some are closed source
Others are open source
Open MPI
Open MPI is a free, open source implementation of the MPI standard
www.open-mpi.org
So what is MPI for?
Message Passing Interface
Let’s break it down…
1. Message passing
Process A passes a message to Process B; the message has been passed.

…as opposed to data that is shared between threads (Thread A and Thread B) within a single process
2. Interface
C programming function calls
Fortran too!
MPI_Send(buf, count, type, dest, tag, comm)
MPI_Recv(buf, count, type, src, tag, comm, status)
MPI_Init(argc, argv)
MPI_Finalize(void)
MPI_Type_size(dtype, size)
MPI_Wait(req, status)
MPI_Test(req, flag, status)
MPI_Comm_dup(in, out)
Fortran? Really?
What most modern developers associate with “Fortran”
Yes, really
Some of today’s most advanced simulation codes are written in Fortran
Yes, really
Yes, that Intel
Optimized for Nehalem, Westmere, and beyond!
Fortran is great for what it is
A simple language for mathematical expressions and computations
Targeted at scientists and engineers
…not computer scientists or web developers or database developers or …
Back to defining “MPI”…
Putting it back together
Message Passing Interface
“An interface for passing messages”
“C functions for passing messages”
Fortran too!
C/Fortran functions for message passing
Process A Process B
MPI_Send(…)
C/Fortran functions for message passing
Process A Process B
MPI_Recv(…)
Really? Is that all MPI is?
“Can’t I just do that with sockets?”
Yes! (…and no)
Comparison

(TCP) Sockets
• Connections based on IP addresses and ports
• Point-to-point communication
• Stream-oriented
• Raw data (bytes / octets)
• Network-independent
• “Slow”

MPI
• Based on peer integer “rank” (e.g., 8)
• Point-to-point and collective and one-sided and …
• Message-oriented
• Typed messages
• Network-independent
• Blazing fast
Whoa! What are these?

Peer integer “rank”
Every process in an MPI job gets an integer rank: 0, 1, 2, …, 11 for a 12-process job. Messages are addressed to ranks, not to IP addresses and ports.

“Collective”: broadcast
One rank sends the same data to every other rank.

“Collective”: scatter
One rank splits its data into pieces and sends a different piece to each rank.

“Collective”: gather
The inverse of scatter: every rank sends its piece to one rank, which collects them all.

“Collective”: reduce
Every rank contributes a value, and the values are combined (e.g., summed) into a single result: 42.

“Collective”: …and others
Messages, not bytes
The entire message is sent and received, not a stream of individual bytes.
Contents: 17 integers, 23 doubles, 98 structs, …or whatever. Not a bunch of bytes!
Network independent
MPI_Send(…) and MPI_Recv(…) run on top of whatever the underlying network is: Ethernet, Myrinet, InfiniBand, shared memory, TCP, iWARP, RoCE.
Regardless of underlying network or transport protocol, the application code stays the same.
Blazing fast
One microsecond (!)
…more on performance later
What is MPI?
MPI is probably somewhere around here, near the top of the stack: it hides all the layers underneath.

What is MPI?
A high-level network programming abstraction
IP addresses, byte streams, raw bytes: nothing to see here, please move along
So what?
What’s all this message passing stuff got to do with supercomputers?
So what?
Let’s define “supercomputers”
Supercomputers

“Nebulae”
National Supercomputing Centre, Shenzhen, China

“Mare Nostrum” (Our Sea)
Barcelona Supercomputing Center, Spain
Used to be a church

Notice anything?
They’re just racks of servers!
Generally speaking…
Supercomputer = lots of processors + lots of RAM + lots of disk
Supercomputer = (many) racks of (commodity) high-end servers
(this is one definition; there are others)
So if that’s a supercomputer…
(a rack of 36 1U servers)
…how is it different from my web farm?
(also a rack of 36 1U servers)
Just a bunch of servers?
The difference between supercomputers and web farms and database farms (and …):
All the servers act together to solve a single computational problem
Acting together
Take your computational problem (input, output)…
…and split it up!
Distribute the input data across a bunch of servers.
Use the network between servers to communicate / coordinate.
MPI is used for this communication.
Why go to so much trouble?
1 processor = …a long time…
21 processors = ~1 hour (!)
Disclaimer: scaling is rarely perfect
High Performance Computing
HPC = using supercomputers to solve real-world problems that are TOO BIG for laptops, desktops, or individual servers
Why does HPC use MPI?
Network abstraction: are these ranks cores? …or servers? The application code doesn’t care.

Why does HPC use MPI?
Message semantics: send an array of 10,000 integers as a single message.

Why does HPC use MPI?
Ultra-low network latency: 1 microsecond (depending on your network type!)
1 microsecond = 0.000001 second
From here… to here.
Holy smokes! That’s fast
Let’s get into some details…
MPI Basics
• “6 function MPI”:
  MPI_Init(): startup
  MPI_Comm_size(): how many peers?
  MPI_Comm_rank(): my unique (ordered) ID
  MPI_Send(): send a message
  MPI_Recv(): receive a message
  MPI_Finalize(): shutdown
• Can implement a huge number of parallel applications with just these 6 functions
Let’s see “Hello, World” in MPI
MPI Hello, World
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);               /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* Who am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* Num. peers? */
    printf("Hello, world! I am %d of %d\n", rank, size);
    MPI_Finalize();                       /* Shut down MPI */
    return 0;
}
Compile it with Open MPI
shell$ mpicc hello.c -o hello
shell$

Hey – what’s that? Where’s gcc?

Open MPI comes standard in many Linux and BSD distributions (and OS X)
“Wrapper” compiler
shell$ mpicc hello.c -o hello --showme
gcc hello.c -o hello -I/opt/openmpi/include -pthread -L/opt/openmpi/lib -lmpi -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
shell$

mpicc simply fills in a bunch of compiler command line options for you
Now let’s run it
shell$ mpirun -np 4 hello

Hey – what’s that? Why don’t I just run “./hello”?
mpirun launcher
shell$ mpirun -np 4 hello

mpirun launches N copies of your program and “wires them up”
“-np” = “number of processes”
This command launches a 4-process parallel job
mpirun launcher
shell$ mpirun -np 4 hello

Four copies of “hello” are launched, then they are “wired up” on the network
Now let’s run it
shell$ mpirun -np 4 hello
Hello, world! I am 0 of 4
Hello, world! I am 1 of 4
Hello, world! I am 2 of 4
Hello, world! I am 3 of 4
shell$

By default, all copies run on the local host
Run on multiple servers!
shell$ cat my_hostfile
host1.example.com
host2.example.com
host3.example.com
host4.example.com
shell$
Run on multiple servers!
shell$ cat my_hostfile
host1.example.com
host2.example.com
host3.example.com
host4.example.com
shell$ mpirun -hostfile my_hostfile -np 4 hello
Hello, world! I am 0 of 4    (ran on host1)
Hello, world! I am 1 of 4    (ran on host2)
Hello, world! I am 2 of 4    (ran on host3)
Hello, world! I am 3 of 4    (ran on host4)
shell$
Run it again
shell$ mpirun -hostfile my_hostfile -np 4 hello
Hello, world! I am 2 of 4
Hello, world! I am 3 of 4
Hello, world! I am 0 of 4
Hello, world! I am 1 of 4
shell$

Hey – why are the numbers out of order?
Standard output re-routing
shell$ mpirun -hostfile my_hostfile -np 4 hello
Each “hello” program’s standard output is intercepted and sent across the network to mpirun.
But the exact ordering of received printf’s is non-deterministic.
Printf debugging = Bad
If you can’t rely on output ordering, printf debugging is pretty lousy (!)
Parallel debuggers
Fortunately, there are parallel debuggers and other tools
A parallel debugger attaches to all processes in the MPI job
Now let’s send a simple MPI message
Send a simple message
int rank;
double buffer[SIZE];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (0 == rank) {
    /* ...initialize buffer[]... */
    /* If I'm number 0, send the buffer[] array to number 1 */
    MPI_Send(buffer, SIZE, MPI_DOUBLE, 1, 123, MPI_COMM_WORLD);
} else if (1 == rank) {
    /* If I'm number 1, receive the buffer[] array from number 0 */
    MPI_Recv(buffer, SIZE, MPI_DOUBLE, 0, 123, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}
That’s enough MPI for now…
Open MPI
PACX-MPI
LAM/MPI
LA-MPI
FT-MPI
Sun CT 6
Project founded in 2003 after intense discussions between multiple open source MPI implementations
Open_MPI_Init()
shell$ svn log -r 1 https://svn.open-mpi.org/svn/ompi
------------------------------------------------------------------------
r1 | jsquyres | 2003-11-22 11:36:58 -0500 (Sat, 22 Nov 2003) | 2 lines

First commit
------------------------------------------------------------------------
shell$
Open_MPI_Current_status()
shell$ svn log -r HEAD https://svn.open-mpi.org/svn/ompi
------------------------------------------------------------------------
r24226 | rhc | 2011-01-11 20:57:47 -0500 (Tue, 11 Jan 2011) | 25 lines

Fixes #2683: Move ORTE DPM compiler warning squash to v1.4
------------------------------------------------------------------------
shell$
Open MPI 2011 Membership
15 members, 11 contributors, 2 partners
Fun stats
• ohloh.net says:
  517,400 lines of code
  30 developers (over time)
  “Well-commented source code”
• I rank in top-25 ohloh stats for: C, Automake, shell script, Fortran (ouch!)
Open MPI has grown
It’s amazing (to me) that the Open MPI project works so well
New features, new releases, new members
Long live Open MPI!
Recap
• Defined Message Passing Interface (MPI)
• Defined “supercomputers”
• Defined High Performance Computing (HPC)
• Showed what MPI is
• Showed some trivial MPI codes
• Discussed Open MPI
Additional Resources
• MPI Forum web site
  The only site for the official MPI standards
  http://www.mpi-forum.org/
• NCSA MPI basic and intermediate tutorials
  Requires a free account
  http://ci-tutor.ncsa.uiuc.edu/login.php
• “MPI Mechanic” magazine columns
  http://cw.squyres.com/
Additional Resources
• Research, Computing, and Engineering (RCE) podcast
  http://www.rce-cast.com/
• My blog: MPI_BCAST
  http://blogs.cisco.com/category/performance/
Questions?