
Page 1: MARMOT - an MPI analysis and checking tool

26.02.2004, Höchstleistungsrechenzentrum Stuttgart, Matthias Müller

MARMOT - an MPI analysis and checking tool

Bettina Krammer, Katrin Bidmon, Matthias Müller, Pavel Neytchev, Michael Resch

HLRSHigh Performance Computing Center Stuttgart

Allmandring 30D-70550 Stuttgarthttp://www.hlrs.de

Message Checker WorkshopFebruary 20 2004

Page 2: MARMOT - an MPI analysis and checking tool

Overview

• General Introduction
  – Motivation
  – History
  – MARMOT inside CrossGrid

• Technical Details
  – Design and Implementation
  – Performance

Page 3: MARMOT - an MPI analysis and checking tool

Motivation

Page 4: MARMOT - an MPI analysis and checking tool

hww configuration

[Diagram: hww machines at the Vaihingen and Untertürkheim sites, connected by 2 GBit/s over a distance of about 20 km. Vaihingen II: Cray SV1/20, SGI Origin 2000, NEC SX-4/32, NEC SX-5/32 M2e, Cray T3E-900/512, research and debis networks, fileserver. Vaihingen: hp N-Class, Hitachi SR8000, NEC Azusa, hpcLine, fileserver.]

Page 5: MARMOT - an MPI analysis and checking tool

Problems of MPI Programming

• All problems of serial programming
• Additional problems:
  – increased difficulty to verify the correctness of the program
  – increased difficulty to debug N parallel processes
  – new parallel problems (deadlock, race conditions)
  – portability between different MPI implementations
  – and many more …


Page 6: MARMOT - an MPI analysis and checking tool

Classical Solutions I: Parallel Debugger

• Examples: TotalView, DDT, p2d2
• Advantages:
  – same approach and tool as in the serial case
• Disadvantages:
  – can only fix problems after, and if, they occur
  – scalability: how can you debug programs that crash after 3 hours on 512 nodes?
  – reproducibility: how do you debug a program that crashes only every fifth time?
  – does not help to improve portability

Page 7: MARMOT - an MPI analysis and checking tool

Classical Solutions II: Debug version of MPI Library

• Examples:
  – catches some incorrect usage, e.g. the node count in MPI_CART_CREATE (mpich)
  – deadlock detection (NEC MPI)
• Advantages:
  – good scalability
  – better debugging in combination with TotalView
• Disadvantages:
  – portability: only helps when using this particular MPI implementation
  – trade-off between performance and safety
  – reproducibility: does not help to debug irreproducible programs

Page 8: MARMOT - an MPI analysis and checking tool

Motivation

We were looking for a tool that:
• improves the portability of a program
• identifies some of the difficult problems automatically
• need not be fast: performance is not a concern as long as the tool is usable

Page 9: MARMOT - an MPI analysis and checking tool

Marmot Overview

Page 10: MARMOT - an MPI analysis and checking tool

New Approach: MARMOT

• Design Goals of MARMOT
  – portable to all platforms with MPI support
  – performance is not a major concern, but it should be fast enough to allow regular execution during development, and also execution on many nodes with larger workloads for the tough problems
  – for a computing center, the portability problems between different MPI implementations are the most difficult
• Properties of MARMOT
  – run-time checks
  – automatic checks wherever possible
  – batch runs must be supported
  – verify as much as possible that your program is a correct MPI program
  – detect possible race conditions
  – collaboration with a classical debugger and other tools:
    • allow source-level debugging of your program at the same time
    • start the debugger as soon as a deadlock is detected

Page 11: MARMOT - an MPI analysis and checking tool

Roadmap of Marmot

[Timeline 2000 to 2004: HLRS research project, demonstrator, 1.0.x, 1.1.x, 2.x, Tescico?]

• First CrossGrid milestone:
  – MPI 1.2, Fortran & C/C++
  – Linux IA32
  – tested with test suite
• Second CrossGrid milestone:
  – tested with several applications
  – portability release
• What’s next? More checks, visualizer, hybrid programming, MPI 2.0 support, integration with other tools, heuristics

Page 12: MARMOT - an MPI analysis and checking tool

Marmot inside CrossGrid

Page 13: MARMOT - an MPI analysis and checking tool

Integration in CrossGrid project

• MARMOT is integrated in the CrossGrid build and deployment process:
  – an RPM is generated for Linux clusters
  – deployed at test sites in Poland, Slovakia, Spain, Portugal, Germany, Cyprus, Ireland and Greece

Page 14: MARMOT - an MPI analysis and checking tool

External Quality Assurance in CrossGrid Project

• 25 subprojects have been analyzed
• MARMOT ranks second in code documentation and sixth in overall code quality

Page 15: MARMOT - an MPI analysis and checking tool

MARMOT testsuite

• For internal tests we have about 100 small test programs with different MPI bugs that are detected by MARMOT
• The suite is currently being extended to check
  – a) that MARMOT works correctly with correct MPI programs
  – b) that MARMOT finds all bugs in incorrect MPI programs

Page 16: MARMOT - an MPI analysis and checking tool

CrossGrid Applications

Name           Application             Language   # lines
Simulate       Blood flow simulation   C           6500
ANN            High Energy Physics     C          11500
STEM           Air pollution modeling  Fortran77  15500
Aladin, DaveF  Flood simulation        Fortran90      ?

[Figure: High Energy Physics trigger and data-acquisition pipeline: 40 MHz (40 TB/sec) input, level 1 (special hardware), 75 KHz (75 GB/sec), level 2 (embedded processors), 5 KHz (5 GB/sec), level 3 (PCs), 100 Hz (100 MB/sec), data recording & offline analysis]

Page 17: MARMOT - an MPI analysis and checking tool

Deployment in Germany

• Currently two of the three National High Performance Computing Centers have MARMOT installed:
  – HLRS at Stuttgart (IA32, IA64, NEC SX)
  – NIC at Jülich (IBM Regatta):
    • just replaced their Cray T3E with an IBM Regatta
    • spent a lot of time finding a problem that would have been detected automatically by MARMOT
  – LRZ at Munich is interested (Hitachi SR8000)

Page 18: MARMOT - an MPI analysis and checking tool

Design and Implementation

Page 19: MARMOT - an MPI analysis and checking tool

Basics of Marmot

• Seamless integration with your program
  – simply link your application with the MARMOT C++ library
  – no source code modification required
  – additional process working as debug server
• Implementation of the C and Fortran language bindings of MPI-1.2
• Configuration
  – debug level and verbosity
  – error reporting
• Includes sample code

Page 20: MARMOT - an MPI analysis and checking tool

How to use an Application with Marmot

1. MPI application to verify
2. Rebuild the application with MARMOT
3. Run with one additional process to generate a logfile
4. Analyze the logfile
5. Errors/problems?
   – yes: fix bugs, modify the application, return to step 2
   – no: build the normal MPI application for production

Page 21: MARMOT - an MPI analysis and checking tool

Design of MARMOT

[Diagram: the application or test program is linked against the MARMOT core tool, which intercepts MPI calls through the profiling interface, forwards them to the MPI library, and communicates with the Debug Server (an additional process)]

Page 22: MARMOT - an MPI analysis and checking tool

Design of Verification Tool

• Library written in C++ that is linked to the application
• This library consists of the debug clients and one debug server
• No source code modification is required, except for adding an additional process working as debug server, i.e. the application has to be run with mpirun for n+1 instead of n processes
• Main interface = MPI profiling interface according to the MPI 1.2 standard
• Implementation of the C language binding of MPI
• Implementation of the Fortran language binding as a wrapper to the C interface
• Environment variables control tool behavior and output (report of errors, warnings and/or remarks, trace-back, etc.)
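The interposition idea above (wrap each MPI call, run checks, inform the debug server, then forward to the real implementation) can be sketched in a few lines. The following Python snippet is a hypothetical illustration only, not MARMOT's C++ code; `pmpi_send`, `server_log`, `checked` and `check_send` are invented names:

```python
# Hypothetical sketch of profiling-interface interposition (MARMOT itself is
# a C++ library; all names here are invented for illustration).

def pmpi_send(buf, dest, tag):
    """Stands in for the real PMPI_Send of the underlying MPI library."""
    return f"sent {buf!r} to {dest} (tag {tag})"

server_log = []  # stands in for notifications sent to the debug-server process

def checked(check, real_call):
    """Wrap real_call: run local checks, notify the server, then forward."""
    def wrapper(*args):
        check(*args)                                    # client-side check
        server_log.append((real_call.__name__, args))   # global bookkeeping
        return real_call(*args)                         # real implementation
    return wrapper

def check_send(buf, dest, tag):
    if tag < 0:
        raise ValueError(f"tag {tag} outside valid range")

MPI_Send = checked(check_send, pmpi_send)

print(MPI_Send("payload", 1, 99))
```

With the real profiling interface, the same effect is achieved at link time: the tool defines MPI_Send and internally calls PMPI_Send, which is why no source change is needed.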

Page 23: MARMOT - an MPI analysis and checking tool

Client Checks: verification on the local nodes

• Verification of MPI_Request usage
  – invalid recycling of active requests
  – invalid use of unregistered requests
  – warning if the number of requests is zero
  – warning if all requests are MPI_REQUEST_NULL
• Verification of the tag range
• Verification that a requested cartesian communicator has the correct size
• Verification of the communicator in cartesian calls
• Verification of groups in group calls
• Verification of sizes in calls that create groups or communicators
• Verification that ranges are valid (e.g. in group constructor calls)
• Verification that ranges are distinct (e.g. MPI_Group_incl, -excl)
• Check for pending messages and active requests in MPI_Finalize
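The request checks listed above amount to simple local bookkeeping. The following Python sketch uses invented names and only illustrates the idea, not MARMOT's implementation:

```python
# Illustrative local request bookkeeping (invented names, not MARMOT's code):
# nonblocking calls register a request; reusing a still-active request,
# completing an unknown one, or leaving requests active at finalize is flagged.

active_requests = set()
errors = []

def isend(request_id):
    if request_id in active_requests:
        errors.append(f"invalid recycling of active request {request_id}")
    active_requests.add(request_id)

def wait(request_id):
    if request_id not in active_requests:
        errors.append(f"invalid use of unregistered request {request_id}")
    active_requests.discard(request_id)

def finalize():
    if active_requests:
        errors.append(f"active requests at MPI_Finalize: {sorted(active_requests)}")

isend("req1")
isend("req1")      # reused while still active: flagged
wait("req2")       # never registered: flagged
finalize()         # req1 never completed: flagged
print(errors)
```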

Page 24: MARMOT - an MPI analysis and checking tool

Server Checks

• Everything that requires a global view
• Control of the execution flow
• Signaling of conditions, e.g. deadlocks
• Checking that matching send/receive pairs are consistent
• Output of the log (reporting errors etc.)
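As an illustration of the global view, the crudest deadlock check a server can perform is "all clients are pending". A minimal Python sketch under that assumption (not MARMOT's actual algorithm, which also has to account for messages in flight):

```python
# Minimal illustrative global deadlock check: report when every client rank
# is blocked in a pending call. An assumption-laden sketch, not MARMOT's code.

def deadlock_suspected(client_states):
    """client_states maps rank -> 'running' or 'pending'."""
    return bool(client_states) and all(
        state == "pending" for state in client_states.values()
    )

states = {0: "pending", 1: "pending", 2: "pending"}
if deadlock_suspected(states):
    print("WARNING: deadlock detected, all clients are pending")
```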

Page 25: MARMOT - an MPI analysis and checking tool

Example for tests: resource management

• To support a thread-safe implementation of MPI, much state is saved inside opaque objects in user space: requests.
• It is the user’s responsibility to handle these requests correctly.
• MARMOT adds its own bookkeeping and checks the correctness of the handling of requests.
• Similar tests are made for communicators, groups and datatypes.

Page 26: MARMOT - an MPI analysis and checking tool

Class-Hierarchy (example)

[Class diagram, example: MPO_Pt2Pt splits into MPO_Blocking (MPO_Send, MPO_Bsend, MPO_Recv) and MPO_Requestcall, which splits into MPO_Initiator (MPO_Isend, MPO_Irecv) and MPO_Completion (MPO_Wait, MPO_Tests, MPO_Testall)]

Page 27: MARMOT - an MPI analysis and checking tool

Current status

• Full MPI 1.2 implemented
• C and Fortran bindings are supported
• Used for several Fortran and C applications (CrossGrid project and others)

Page 28: MARMOT - an MPI analysis and checking tool

Supported Platforms

• Tests on different platforms, using different compilers and MPI implementations, e.g.
  – Clusters:
    • IA32/IA64
    • Intel and GNU compilers
    • MPICH and LAM
  – IBM Regatta
  – NEC SX5
  – Hitachi SR8000

Page 29: MARMOT - an MPI analysis and checking tool

Success Stories

Page 30: MARMOT - an MPI analysis and checking tool

Feedback of Crossgrid Applications

• Task 1.1 (biomedical):
  – C application
  – identified issues:
    • possible race conditions due to the use of MPI_ANY_SOURCE
• Task 1.2 (flood):
  – Fortran application
  – identified issues:
    • tags outside the valid range
    • possible race conditions due to the use of MPI_ANY_SOURCE
• Task 1.3 (hep):
  – ANN (C application)
  – no issues found by MARMOT
• Task 1.4 (meteo):
  – STEMII (Fortran)
  – MARMOT detected holes in self-defined datatypes used in MPI_Scatterv and MPI_Gatherv. Removing these holes helped to improve the performance of the communication.
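The "holes" mentioned for Task 1.4 are padding bytes inside a derived datatype. A small Python illustration (an assumed layout, not the actual STEMII datatype) shows how alignment padding inflates the bytes transferred:

```python
# Illustration of a "hole" in a datatype: a char followed by a double is
# padded under native alignment, while the useful payload is only 9 bytes.
# Assumed example layout, not the actual STEMII derived datatype.
import struct

padded = struct.calcsize("cd")    # native alignment, typically 16 bytes
packed = struct.calcsize("=cd")   # no padding: 1 + 8 = 9 bytes
print(f"{padded - packed} padding bytes transferred per element")
```

A derived datatype that mirrors the padded layout transmits the padding too; packing the fields, as was done for STEMII, avoids that overhead.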

Page 31: MARMOT - an MPI analysis and checking tool

Performance

Page 32: MARMOT - an MPI analysis and checking tool

Bandwidth on an IA64 cluster with Myrinet

[Plot: bandwidth in MB/s (0 to 250) over message sizes from 1 to 1048576 Bytes, native MPI vs. MARMOT]

Page 33: MARMOT - an MPI analysis and checking tool

Latency on an IA64 cluster with Myrinet

[Plot: latency in ms (0.01 to 10, logarithmic) over message sizes from 1 Byte to about 2 MB, native MPI vs. MARMOT]

Page 34: MARMOT - an MPI analysis and checking tool

cg.B on an IA32 cluster with Myrinet

[Plot: total Mops/s (0 to 2500) over 1, 2, 4, 8 and 16 processors, native MPI vs. MARMOT]

Page 35: MARMOT - an MPI analysis and checking tool

CrossGrid Application: WP 1.4: Air pollution modeling

• Air pollution modeling with the STEM-II model
• Transport equation solved with the Petrov-Crank-Nicolson-Galerkin method
• Chemistry and mass transfer are integrated using semi-implicit Euler and pseudo-analytical methods
• 15500 lines of Fortran code
• 12 different MPI calls:
  – MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Type_extent, MPI_Type_struct, MPI_Type_commit, MPI_Type_hvector, MPI_Bcast, MPI_Scatterv, MPI_Barrier, MPI_Gatherv, MPI_Finalize

Page 36: MARMOT - an MPI analysis and checking tool

STEM application on an IA32 cluster with Myrinet

[Plot: time in s (0 to 80) over 1 to 16 processors, native MPI vs. MARMOT]

Page 37: MARMOT - an MPI analysis and checking tool

CrossGrid Application: WP 1.1: Medical Application

• Calculation of blood flow with the Lattice-Boltzmann method
• Stripped-down application with 6500 lines of C code
• 14 different MPI calls:
  – MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Pack, MPI_Bcast, MPI_Unpack, MPI_Cart_create, MPI_Cart_shift, MPI_Send, MPI_Recv, MPI_Barrier, MPI_Reduce, MPI_Sendrecv, MPI_Finalize

Page 38: MARMOT - an MPI analysis and checking tool

Medical application on an IA32 cluster with Myrinet

[Plot: time per iteration in s (0 to 0.6) over 1 to 16 processors, native MPI vs. MARMOT]

Page 39: MARMOT - an MPI analysis and checking tool

Message statistics with native MPI

Page 40: MARMOT - an MPI analysis and checking tool

Message statistics with MARMOT

Page 41: MARMOT - an MPI analysis and checking tool

Barrier with native MPI

Page 42: MARMOT - an MPI analysis and checking tool

Barrier with MARMOT

MARMOT Debug Server

Page 43: MARMOT - an MPI analysis and checking tool

Performance measurements and improvements

• MARMOT’s overhead has been measured using
  – microbenchmarks
  – the NAS Parallel Benchmarks
  – CrossGrid applications
• MARMOT’s performance is constantly improved according to the applications’ needs
• Rule of thumb: the higher the ratio of communication to computation, the higher the overhead of MARMOT

Page 44: MARMOT - an MPI analysis and checking tool

Future Directions

Page 45: MARMOT - an MPI analysis and checking tool

Future developments

• Extended functionality:
  – more checks
  – MPI-2
  – hybrid programming
  – checks based on heuristics?
• Usability improvements:
  – performance
  – filters
  – explanations describing what exactly the user did wrong
  – dump core at first error
  – GUI

Page 46: MARMOT - an MPI analysis and checking tool

GUI

• Location: how is the problem distributed across the machine?
• Class of Behavior: which kind of behavior caused the problem?
• Call Graph: where in the source code is the problem? In which context?
• Source Code: where in my source did the problem occur?

Page 47: MARMOT - an MPI analysis and checking tool

Marmot and other tools

Page 48: MARMOT - an MPI analysis and checking tool

Detailed status analysis with Debugger

$ mpirun -np 3 deadlock1

1 rank 0 performs MPI_Init
2 rank 1 performs MPI_Init
3 rank 0 performs MPI_Comm_rank
4 rank 1 performs MPI_Comm_rank
5 rank 0 performs MPI_Comm_size
6 rank 1 performs MPI_Comm_size
7 rank 0 performs MPI_Recv
8 rank 1 performs MPI_Recv
8 Rank 0 is pending!
8 Rank 1 is pending!

WARNING: deadlock detected, all clients are pending

Step 1: detect the deadlock
Step 2: launch the debugger and analyze the situation

Page 49: MARMOT - an MPI analysis and checking tool

Detailed history analysis with message tracing tools

Page 50: MARMOT - an MPI analysis and checking tool

Intel thread checker to analyze hybrid applications

• Some OpenMP errors will only occur with a specific workload on specific nodes
• It would be nice to make just one run to check for both MPI and OpenMP errors
• mpicc –safetychecks would be a nice thing


Page 51: MARMOT - an MPI analysis and checking tool

Summary

Page 52: MARMOT - an MPI analysis and checking tool

Conclusion

• MARMOT supports MPI 1.2 for the C and Fortran bindings
• Tested with several applications and platforms
• The C++ implementation, with a class hierarchy representing the MPI calls, turned out to be useful
• Performance is sufficient for most applications
• Future work:
  – scalability and general performance improvements
  – distribute tests from the server to the clients
  – better user interface to present problems and warnings
  – more tests to verify collective calls
  – integration with other tools

Page 53: MARMOT - an MPI analysis and checking tool

Publications

• Bettina Krammer, Matthias S. Müller, Michael M. Resch. "MPI Application Development Using the Analysis Tool MARMOT". Accepted for publication in the Technical Session on Tools for Program Development and Analysis in Computational Science at ICCS 2004, Krakow, Poland, June 7-9, 2004.

• Rainer Keller, Edgar Gabriel, Bettina Krammer, Matthias S. Müller, Michael M. Resch. "Towards Efficient Execution of MPI Applications on the Grid: Porting and Optimization Issues". Accepted for publication in the Journal of Grid Computing (2004).

• Bettina Krammer, Katrin Bidmon, Matthias S. Müller, Michael M. Resch. "MARMOT: An MPI Analysis and Checking Tool". Parallel Computing 2003, Dresden, Germany, September 2-5, 2003.

• Rainer Keller, Bettina Krammer, Matthias S. Müller, Michael M. Resch, Edgar Gabriel. "MPI Development Tools and Applications for the Grid". Workshop on Grid Applications and Programming Tools, Seattle, U.S.A., June 25, 2003.

Page 54: MARMOT - an MPI analysis and checking tool

Any questions?