MARMOT - an MPI analysis and checking tool
Bettina Krammer, Katrin Bidmon, Matthias Müller, Pavel Neytchev, Michael Resch
HLRS, High Performance Computing Center Stuttgart
Allmandring 30, D-70550 Stuttgart, http://www.hlrs.de
Message Checker Workshop, February 20, 2004
Overview
• General Introduction
  – Motivation
  – History
  – Marmot inside CrossGrid
• Technical details
  – Design and Implementation
  – Performance
Motivation
hww configuration (site diagram): systems at the Vaihingen and Untertürkheim sites – Cray SV1/20, SGI Origin 2000, NEC SX-4/32, NEC SX-5/32 M2e, Cray T3E-900/512, hp N-Class, Hitachi SR8000, NEC Azusa, hpcLine and fileservers – attached to research and debis networks, with a 2 GBit/s link over a distance of about 20 km between the two sites.
Problems of MPI Programming
• All problems of serial programming
• Additional problems:
  – Increased difficulty to verify the correctness of the program
  – Increased difficulty to debug N parallel processes
  – New parallel problems (deadlock, race conditions)
  – Portability between different MPI implementations
  – And many more ...
Classical Solutions I: Parallel Debugger
• Examples: TotalView, DDT, p2d2
• Advantages:
  – Same approach and tool as in the serial case
• Disadvantages:
  – Can only fix problems after, and if, they occur
  – Scalability: how can you debug programs that crash after 3 hours on 512 nodes?
  – Reproducibility: how do you debug a program that crashes only every fifth time?
  – Does not help to improve portability
Classical Solutions II: Debug version of MPI Library
• Examples:
  – catches some incorrect usage, e.g. the node count in MPI_CART_CREATE (MPICH)
  – deadlock detection (NEC MPI)
• Advantages:
  – good scalability
  – better debugging in combination with TotalView
• Disadvantages:
  – Portability: only helps while using this particular MPI implementation
  – Trade-off between performance and safety
  – Reproducibility: does not help to debug irreproducible programs
Motivation
We were looking for something that:
• Improves the portability of a program
• Identifies some of the difficult problems automatically
• Performance is not a concern as long as the tool is usable
Marmot Overview
New Approach: MARMOT
• Design goals of MARMOT
  – Portable to all platforms with MPI support
  – Performance is not a major concern, but it should be fast enough to allow regular execution during development, and also execution on many nodes with larger workloads for the tough problems
  – For a computing center, the portability problems between different MPI implementations are the most difficult ones
• Properties of MARMOT
  – Run-time checks
  – Automatic checks wherever possible
  – Batch runs must be supported
  – Verify as much as possible that your program is a correct MPI program
  – Detect possible race conditions
  – Collaboration with a classical debugger and other tools:
    • allow source-level debugging of your program at the same time
    • start the debugger as soon as a deadlock is detected
Roadmap of Marmot
Timeline 2000–2004 and beyond: HLRS research project → demonstrator → versions 1.0.x → 1.1.x → 2.x → Tescico?
• First CrossGrid milestone: MPI 1.2, Fortran & C/C++; Linux IA32; tested with the test suite
• Second CrossGrid milestone: tested with several applications; portability release
• What's next? More checks, visualizer, hybrid programming, MPI 2.0 support, integration with other tools, heuristics
Marmot inside CrossGrid
Integration in CrossGrid project
• MARMOT is integrated in the CrossGrid build and deployment process
• An RPM package is generated for Linux clusters
• Deployed at test sites in Poland, Slovakia, Spain, Portugal, Germany, Cyprus, Ireland and Greece
External Quality Assurance in CrossGrid Project
• 25 subprojects have been analyzed
• MARMOT ranks second in code documentation and sixth in overall code quality
MARMOT test suite
• For internal tests we have about 100 small test programs with different MPI bugs that are detected by MARMOT
• The suite is currently being extended to check
  – a) that MARMOT works correctly with correct MPI programs
  – b) that MARMOT finds all bugs in incorrect MPI programs
CrossGrid Applications
# lines | Language  | Application            | Name
6500    | C         | Blood flow simulation  | Simulate
11500   | C         | High Energy Physics    | ANN
15500   | Fortran77 | Air pollution modeling | STEM
?       | Fortran90 | Flood simulation       | Aladin, DaveF
(Diagram: HEP data acquisition levels – level 1: special hardware, 40 MHz / 40 TB/sec; level 2: embedded processors, 75 KHz / 75 GB/sec; level 3: PCs, 5 KHz / 5 GB/sec; data recording & offline analysis at 100 Hz / 100 MB/sec.)
Deployment in Germany
• Currently two of the three national high performance computing centers have MARMOT installed:
  – HLRS at Stuttgart (IA32, IA64, NEC SX)
  – NIC at Jülich (IBM Regatta):
    • just replaced their Cray T3E with an IBM Regatta
    • spent a lot of time finding a problem that would have been detected automatically by MARMOT
  – LRZ at Munich is interested (Hitachi SR8000)
Design and Implementation
Basics of Marmot
• Seamless integration with your program
  – Simply link your application with the MARMOT C++ library
  – No source code modification required
  – An additional process works as debug server
• Implementation of the C and Fortran language bindings of MPI-1.2
• Configuration
  – Debug level and verbosity
  – Error reporting
• Includes sample code
How to use an Application with Marmot
Workflow:
1. Start with the MPI application to verify.
2. Rebuild the application with MARMOT.
3. Run it with one additional process to generate the logfile.
4. Analyze the logfile.
5. Errors or problems? If yes: fix the bugs, modify the application and go back to step 2. If no: build the normal MPI application for production.
Design of MARMOT (diagram): the application or test program is intercepted through the MPI profiling interface by the MARMOT core tool, which talks both to the native MPI library and to the debug server (an additional process).
Design of Verification Tool
• A library written in C++ that is linked to the application
• This library consists of the debug clients and one debug server
• No source code modification is required, except that an additional process works as debug server, i.e. the application has to be run with mpirun for n+1 instead of n processes
• Main interface = the MPI profiling interface according to the MPI 1.2 standard (a minimal sketch of this interception mechanism follows below)
• Implementation of the C language binding of MPI
• Implementation of the Fortran language binding as a wrapper to the C interface
• Environment variables control tool behavior and output (report of errors, warnings and/or remarks, trace-back, etc.)
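To make the profiling-interface point concrete, here is a minimal sketch of how such a wrapper works in general. It is not MARMOT's code: the tag check and the warning text are assumptions added for illustration; only the forwarding from MPI_Recv to PMPI_Recv is what the MPI standard itself provides.

#include <mpi.h>
#include <cstdio>

// Minimal sketch of the MPI profiling interface (not MARMOT's code): the
// application's call to MPI_Recv resolves to this wrapper, which can run
// checks and then forwards to the MPI library through the PMPI_ entry point.
extern "C" int MPI_Recv(void* buf, int count, MPI_Datatype datatype,
                        int source, int tag, MPI_Comm comm, MPI_Status* status)
{
    // Example local check (assumed): warn about a tag outside the valid range.
    int* tag_ub = nullptr;
    int present = 0;
    PMPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &present);
    if (tag != MPI_ANY_TAG && (tag < 0 || (present && tag > *tag_ub)))
        std::fprintf(stderr, "MPI_Recv: tag %d is outside the valid range\n", tag);

    // A checking tool could also report this call to the debug-server process here.
    return PMPI_Recv(buf, count, datatype, source, tag, comm, status);
}

Because the interposition happens at link time, the application source does not have to change; it only has to be relinked against the checking library.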
Client Checks: verification on the local nodes
• Verification of MPI_Request usage (an example is sketched below)
  – invalid recycling of an active request
  – invalid use of an unregistered request
  – warning if the number of requests is zero
  – warning if all requests are MPI_REQUEST_NULL
• Verification of the tag range
• Verification that a requested cartesian communicator has the correct size
• Verification of the communicator in cartesian calls
• Verification of groups in group calls
• Verification of sizes in calls that create groups or communicators
• Verification that ranges are valid (e.g. in group constructor calls)
• Verification that ranges are distinct (e.g. MPI_Group_incl, -excl)
• Check for pending messages and active requests in MPI_Finalize
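As an illustration of the first group of checks, a made-up fragment (not taken from one of the tested applications) that recycles a request handle while the operation it belongs to is still active:

#include <mpi.h>

// Hypothetical buggy fragment: the request from the first MPI_Irecv is still
// active when it is overwritten by the second call, so the first receive can
// no longer be completed or cancelled through its handle. A run-time checker
// can flag this as invalid recycling of an active request.
void buggy_receives(int src, MPI_Comm comm)
{
    int a = 0, b = 0;
    MPI_Request req;
    MPI_Irecv(&a, 1, MPI_INT, src, 0, comm, &req);
    MPI_Irecv(&b, 1, MPI_INT, src, 1, comm, &req);  // overwrites the active request
    MPI_Wait(&req, MPI_STATUS_IGNORE);              // completes only the second receive
}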
Server Checks
• Everything that requires a global view
• Control the execution flow
• Signal conditions, e.g. deadlocks (a rough sketch of one possible detection mechanism follows below)
• Check matching send/receive pairs for consistency
• Output log (report errors etc.)
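The slides do not say how the deadlock condition is recognized. One common approach for a central server is a timeout heuristic: if every client has reported entering a blocking call and none has reported progress for some time, a deadlock is suspected. The following is a rough sketch of that idea only, an assumption rather than MARMOT's actual algorithm:

#include <chrono>
#include <vector>

// Rough sketch of a timeout-based deadlock heuristic on the debug server
// (assumed mechanism, not MARMOT's implementation).
struct ClientState {
    bool pending = false;                                   // inside a blocking MPI call
    std::chrono::steady_clock::time_point since{};          // when the call was entered
};

bool suspect_deadlock(const std::vector<ClientState>& clients,
                      std::chrono::seconds timeout)
{
    const auto now = std::chrono::steady_clock::now();
    for (const ClientState& c : clients) {
        if (!c.pending || now - c.since < timeout)
            return false;                                    // someone is still making progress
    }
    return !clients.empty();                                 // all clients stuck long enough
}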
Example for tests: resource management
• To support thread-safe MPI implementations, a lot of state is saved inside opaque objects in user space: requests.
• It is the user's responsibility to handle these requests correctly.
• MARMOT adds its own bookkeeping and checks that requests are handled correctly (see the sketch below).
• Similar tests are made for communicators, groups and datatypes.
Class hierarchy (example)
(Diagram: class names include MPO_Pt2Pt, MPO_Blocking, MPO_Requestcall, MPO_Initiator, MPO_Completion, MPO_Tests, MPO_Send, MPO_Recv, MPO_Bsend, MPO_Isend, MPO_Irecv, MPO_Wait, MPO_Testall.)
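The class names above come from the diagram; their exact relationships are not visible in this transcript, so the skeleton below is only a guess at how such a hierarchy could be arranged, not MARMOT's source.

// Guessed skeleton, not MARMOT's source: each MPI call is represented by a
// class, grouped by behaviour (point-to-point, blocking, request-related,
// completion and test calls, ...).
class MPO_Pt2Pt {};                                  // base for point-to-point calls
class MPO_Blocking    : public MPO_Pt2Pt {};         // e.g. MPI_Send, MPI_Recv, MPI_Bsend
class MPO_Requestcall : public MPO_Pt2Pt {};         // calls that create or use requests
class MPO_Initiator   : public MPO_Requestcall {};   // e.g. MPI_Isend, MPI_Irecv
class MPO_Completion  : public MPO_Requestcall {};   // e.g. MPI_Wait
class MPO_Tests       : public MPO_Requestcall {};   // e.g. MPI_Testall

class MPO_Send    : public MPO_Blocking {};
class MPO_Recv    : public MPO_Blocking {};
class MPO_Bsend   : public MPO_Blocking {};
class MPO_Isend   : public MPO_Initiator {};
class MPO_Irecv   : public MPO_Initiator {};
class MPO_Wait    : public MPO_Completion {};
class MPO_Testall : public MPO_Tests {};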
Current status
• Full MPI 1.2 implemented
• C and Fortran bindings are supported
• Used for several Fortran and C applications (CrossGrid project and others)
Supported Platforms
• Tests on different platforms, using different compilers and MPI implementations, e.g.
  – Clusters:
    • IA32/IA64
    • Intel and GNU compilers
    • MPICH and LAM
  – IBM Regatta
  – NEC SX5
  – Hitachi SR8000
Success Stories
Feedback of Crossgrid Applications
• Task 1.1 (biomedical)
  – C application
  – Identified issues:
    • possible race conditions due to the use of MPI_ANY_SOURCE (illustrated below)
• Task 1.2 (flood)
  – Fortran application
  – Identified issues:
    • tags outside the valid range
    • possible race conditions due to the use of MPI_ANY_SOURCE
• Task 1.3 (HEP)
  – ANN (C application)
  – no issues found by MARMOT
• Task 1.4 (meteo)
  – STEMII (Fortran)
  – MARMOT detected holes in self-defined datatypes used in MPI_Scatterv and MPI_Gatherv. These holes were removed, which helped to improve the performance of the communication.
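For illustration, a made-up fragment (not taken from the CrossGrid codes) showing the two most frequent kinds of findings: order-dependent receives with MPI_ANY_SOURCE, and a computed tag that may exceed MPI_TAG_UB.

#include <mpi.h>

// Made-up fragment illustrating the findings above (not CrossGrid code).
void collect(int nprocs, MPI_Comm comm)
{
    // (1) Possible race: with MPI_ANY_SOURCE the order in which messages are
    // matched depends on timing, so results stored in arrival order may differ
    // from run to run.
    for (int i = 1; i < nprocs; ++i) {
        int value;
        MPI_Status st;
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &st);
        // value is processed in arrival order, not in rank order
    }

    // (2) Tag outside the valid range: the standard only guarantees tags up to
    // MPI_TAG_UB (at least 32767), so a large computed tag may be invalid on
    // some implementations even if it happens to work on others.
    int big_tag = 1 << 20;
    int payload = 42;
    MPI_Send(&payload, 1, MPI_INT, 0, big_tag, comm);
}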
Performance
Bandwidth on an IA64 cluster with Myrinet
(Chart: bandwidth in MB/s, 0 to 250, versus message size from 1 byte to 1048576 bytes, for native MPI and MARMOT.)
Latency on an IA64 cluster with Myrinet
(Chart: latency in ms, 0.01 to 10, versus message size from 1 byte to about 2 MB, for native MPI and MARMOT.)
cg.B on an IA32 cluster with Myrinet
(Chart: total Mop/s, 0 to 2500, versus number of processors from 1 to 16, for native MPI and MARMOT.)
CrossGrid Application: WP 1.4: Air pollution modeling
• Air pollution modeling with the STEM-II model
• Transport equation solved with a Petrov-Crank-Nicolson-Galerkin method
• Chemistry and mass transfer are integrated using semi-implicit Euler and pseudo-analytical methods
• 15500 lines of Fortran code
• 12 different MPI calls:
  – MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Type_extent, MPI_Type_struct, MPI_Type_commit, MPI_Type_hvector, MPI_Bcast, MPI_Scatterv, MPI_Barrier, MPI_Gatherv, MPI_Finalize
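A "hole" appears when a derived datatype covers bytes that carry no data, for example the padding of a C struct. The fragment below illustrates the effect with the modern equivalents of the datatype calls listed above (MPI_Type_create_struct instead of MPI_Type_struct); it is an illustration only, not the STEM-II code, which is Fortran.

#include <mpi.h>
#include <cstddef>

// Illustration only (not STEM-II code): a struct datatype whose extent covers
// the compiler's trailing padding. Only 12 of the 16 bytes carry data, so
// every element moved with this type drags 4 unused bytes along -- the kind
// of "hole" a checking tool can point out.
struct Cell {
    double x;     // 8 bytes
    int    flag;  // 4 bytes, typically followed by 4 bytes of padding
};

MPI_Datatype make_cell_type()
{
    int          blocklens[2] = { 1, 1 };
    MPI_Aint     displs[2]    = { (MPI_Aint) offsetof(Cell, x),
                                  (MPI_Aint) offsetof(Cell, flag) };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };

    MPI_Datatype tmp, cell_type;
    MPI_Type_create_struct(2, blocklens, displs, types, &tmp);
    // Resizing to sizeof(Cell) matches the C array layout but makes the
    // padding part of the type; packing the data differently removes the hole.
    MPI_Type_create_resized(tmp, 0, (MPI_Aint) sizeof(Cell), &cell_type);
    MPI_Type_free(&tmp);
    MPI_Type_commit(&cell_type);
    return cell_type;
}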
STEM application on an IA32 cluster with Myrinet
(Chart: time in seconds, 0 to 80, versus number of processors from 1 to 16, for native MPI and MARMOT.)
CrossGrid Application: WP 1.1: Medical Application
• Calculation of blood flow with a Lattice-Boltzmann method
• Stripped-down application with 6500 lines of C code
• 14 different MPI calls:
  – MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Pack, MPI_Bcast, MPI_Unpack, MPI_Cart_create, MPI_Cart_shift, MPI_Send, MPI_Recv, MPI_Barrier, MPI_Reduce, MPI_Sendrecv, MPI_Finalize
Medical application on an IA32 cluster with Myrinet
(Chart: time per iteration in seconds, 0 to 0.6, versus number of processors from 1 to 16, for native MPI and MARMOT.)
Message statistics with native MPI
Barrier with native MPI
Barrier with MARMOT
(The trace additionally shows the MARMOT debug server process.)
Performance measurements and improvements
• MARMOT's overhead has been measured using
  – microbenchmarks
  – the NAS Parallel Benchmarks
  – CrossGrid applications
• MARMOT's performance is constantly improved according to the applications' needs
• Rule of thumb: the higher the ratio of communication to computation, the higher the overhead of MARMOT
Future Directions
Future developments
• Extended functionality:
  – more checks
  – MPI-2
  – hybrid programming
  – checks based on heuristics?
• Usability improvements:
  – performance
  – filters
  – explanations describing what exactly the user did wrong
  – dump core at first error
  – GUI
GUI
(GUI concept with four views:)
• Location: how is the problem distributed across the machine?
• Class of behavior: which kind of behavior caused the problem?
• Call graph: where in the source code is the problem, and in which context?
• Source code: where in my source did the problem occur?
Marmot and other tools
Detailed status analysis with Debugger
$ mpirun -np 3 deadlock1
1 rank 0 performs MPI_Init
2 rank 1 performs MPI_Init
3 rank 0 performs MPI_Comm_rank
4 rank 1 performs MPI_Comm_rank
5 rank 0 performs MPI_Comm_size
6 rank 1 performs MPI_Comm_size
7 rank 0 performs MPI_Recv
8 rank 1 performs MPI_Recv
8 Rank 0 is pending!
8 Rank 1 is pending!
WARNING: deadlock detected, all clients are pending

Step 1: detect the deadlock
Step 2: launch the debugger and analyze the situation
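For reference, a minimal program of the kind traced above. This is an assumed reconstruction, not the original deadlock1 source: both ranks block in MPI_Recv before either one sends, so no message can ever arrive.

#include <mpi.h>
#include <stdio.h>

/* Assumed reconstruction of a "deadlock1"-style test case, not the original
 * source. Linked with a checker that uses an extra debug-server process,
 * "mpirun -np 3" gives two application ranks plus that server. */
int main(int argc, char** argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int partner = (rank == 0) ? 1 : 0;
    MPI_Recv(&value, 1, MPI_INT, partner, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);                               /* blocks forever */
    MPI_Send(&rank, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);   /* never reached */

    printf("rank %d received %d\n", rank, value);
    MPI_Finalize();
    return 0;
}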
Detailed history analysis with message tracing tools
Intel thread checker to analyze hybrid applications
• Some OpenMP errors will only occur with a specific workload on specific nodes
• It would be nice to check for MPI and OpenMP errors in a single run
• Something like "mpicc -safetychecks" would be a nice thing to have
Summary
Conclusion
• MARMOT supports MPI 1.2 for the C and Fortran bindings
• Tested with several applications and platforms
• The C++ implementation with a class hierarchy representing the MPI calls turned out to be useful
• Performance is sufficient for most applications
• Future work:
  – scalability and general performance improvements
  – distribute tests from the server to the clients
  – better user interface to present problems and warnings
  – more tests to verify collective calls
  – integration with other tools
Publications
• Bettina Krammer, Matthias S. Müller, Michael M. Resch. "MPI Application Development Using the Analysis Tool MARMOT". Accepted for publication in the Technical Session on Tools for Program Development and Analysis in Computational Science at ICCS 2004, Krakow, Poland, June 7-9, 2004.
• Rainer Keller, Edgar Gabriel, Bettina Krammer, Matthias S. Müller, Michael M. Resch. "Towards Efficient Execution of MPI Applications on the Grid: Porting and Optimization Issues". Accepted for publication in the Journal of Grid Computing (2004).
• Bettina Krammer, Katrin Bidmon, Matthias S. Müller, Michael M. Resch. "MARMOT: An MPI Analysis and Checking Tool". Parallel Computing 2003, Dresden, Germany, September 2-5, 2003.
• Rainer Keller, Bettina Krammer, Matthias S. Müller, Michael M. Resch, Edgar Gabriel. "MPI Development Tools and Applications for the Grid". Workshop on Grid Applications and Programming Tools, Seattle, U.S.A., June 25, 2003.
Any questions?