![Page 1: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/1.jpg)
VAMPIR & VAMPIRTRACE
INTRODUCTION AND OVERVIEW
Performance Analysis of Computer Systems
December 8th, 2011
Holger Brunst, Andreas Knüpfer, Jens Doleschal
![Page 2: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/2.jpg)
Overview
• Introduction
• Event trace visualization
• Vampir & VampirServer
• The Vampir displays
– Timeline
– Process Timeline with performance counters
– Summary Display
– Message Statistics
• VampirTrace
– Instrumentation & run-time measurement
• Conclusions
2
![Page 3: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/3.jpg)
Introduction
Why bother with performance analysis?
• Well, why are you here after all?
• Efficient usage of expensive and limited resources
• Scalability to achieve next bigger simulation
Profiling and Tracing
• Have an optimization phase
– Just like testing and debugging phase
• Use tools!
• Avoid do-it-yourself-with-printf solutions, really!
3
![Page 4: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/4.jpg)
Event trace visualization
Trace visualization
• Alternative and supplement to automatic analysis
• Show dynamic run-time behavior graphically
• Provide statistics and performance metrics
– Processes and threads
– Performance counters
– Function invocations
– Communication
– I/O
• Interactive browsing, zooming, selecting
– Adapt statistics to zoom level (time interval)
– Also for very large and highly parallel traces
4
![Page 5: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/5.jpg)
Vampir toolset architecture
5
[Figure: architecture diagram with the components Multi-Core Program, Many-Core Program, VampirTrace, Trace File (OTF), Trace Bundle, VampirServer, and Vampir 7]
![Page 6: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/6.jpg)
Usage order of the Vampir performance
analysis toolset
1. Instrument your application with VampirTrace
2. Run your application with an appropriate test set
3. Analyze your trace file with Vampir
• Small trace files can be analyzed on your local workstation
1. Start your local Vampir
2. Load trace file from your local disk
• Large trace files should be stored on the cluster file system
1. Start VampirServer on your analysis cluster
2. Start your local Vampir
3. Connect local Vampir with the VampirServer on the analysis cluster
4. Load trace file from the cluster file system
6
![Page 7: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/7.jpg)
Vampir displays
The main displays of Vampir:
• Master Timeline (Global Timeline)
• Process and Counter Timeline
• Function Summary
• Message Summary
• Process Summary
• Communication Matrix
• Call Tree
7
![Page 8: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/8.jpg)
Vampir 7: Displays for a WRF Trace with 64
Processes
8
![Page 9: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/9.jpg)
Master Timeline (Global Timeline)
9
Master Timeline
![Page 10: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/10.jpg)
Process and Counter Timeline
Process Timeline
Counter Timeline
![Page 11: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/11.jpg)
Function Summary
![Page 12: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/12.jpg)
Message Summary
![Page 13: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/13.jpg)
Process Summary
13
![Page 14: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/14.jpg)
Communication Matrix
![Page 15: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/15.jpg)
Call Tree
![Page 16: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/16.jpg)
Introduction: Profiling & tracing
Program instrumentation
• Detect run-time events (points of interest)
• Pass information to run-time measurement library
Profile recording
• Collect aggregated information (Time, Counts, … )
• About program and system entities
– Functions, loops, basic blocks
– Application, processes, threads, …
Trace recording
• Save individual event records together with precise
timestamp and process or thread ID
• Plus event-specific information
16
![Page 17: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/17.jpg)
Instrumentation & measurement
• What do you need to do for it?
– Use VampirTrace
• Instrumentation (automatic with compiler wrappers)
• Re-compile & re-link
• Trace run (run with appropriate test data set)
• More details later
17
Original compiler settings:
CC = icc
CXX = icpc
F90 = ifc
MPICC = mpicc
With the VampirTrace compiler wrappers:
CC = vtcc
CXX = vtcxx
F90 = vtf90
MPICC = vtcc -vt:cc mpicc
![Page 18: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/18.jpg)
Instrumentation & measurement
What does VampirTrace do in the background?
• Instrumentation:
– Via compiler wrappers
– By underlying compiler with specific options
– MPI instrumentation with replacement lib
– OpenMP instrumentation with Opari
– Also binary instrumentation with Dyninst
– Partial manual instrumentation
18
![Page 19: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/19.jpg)
Instrumentation & measurement
What does VampirTrace do in the background?
• Trace run:
– Event data collection
– Precise time measurement
– Parallel timer synchronization
– Collecting parallel process/thread traces
– Collecting performance counters (from PAPI, memory usage,
POSIX I/O calls and fork/system/exec calls, and more … )
– Filtering and grouping of function calls
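The "parallel timer synchronization" item can be pictured with a small sketch. It assumes that the offset between a local clock and a reference clock is measured at two points in time and interpolated linearly in between. The slides do not say which scheme VampirTrace uses, so the struct `clock_sync` and the function `to_global` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: offset of a local clock relative to a reference
   clock, measured at two synchronization points. */
typedef struct {
    double t0, off0;  /* local time and measured offset at the first sync point  */
    double t1, off1;  /* local time and measured offset at the second sync point */
} clock_sync;

/* Map a local time stamp to the reference time base by linearly
   interpolating the offset between the two sync points. */
double to_global(const clock_sync *s, double local)
{
    assert(s != NULL && s->t1 > s->t0);
    double slope = (s->off1 - s->off0) / (s->t1 - s->t0);  /* clock drift */
    return local + s->off0 + slope * (local - s->t0);
}
```

With offsets of 4 ticks at local time 0 and 8 ticks at local time 8, a local time stamp of 4 maps to 4 + 4 + 0.5 · 4 = 10 on the reference time base.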
19
![Page 20: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/20.jpg)
Summary
• Vampir & VampirServer
– Interactive trace visualization and analysis
– Intuitive browsing and zooming
– Scalable to large trace data sizes (100GByte)
– Scalable to high parallelism (2000 processes)
• Vampir for Linux, Windows and Mac OS X
• VampirTrace
– Convenient instrumentation and measurement
– Hides away complicated details
– Provides many options and switches for experts
• VampirTrace is part of Open MPI since version 1.3
20
![Page 21: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/21.jpg)
VAMPIR & VAMPIRTRACE
DETAILS AND HANDS-ON
Performance Analysis of Computer Systems
December 8th, 2011
Holger Brunst, Andreas Knüpfer, Jens Doleschal
![Page 22: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/22.jpg)
22
Overview
• Event tracing in general
• Hands-on: NPB 3.3 BT-MPI
• Finding performance bottlenecks
• FAQ
Vampir & VampirTrace
![Page 23: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/23.jpg)
23
• Event tracing in general
Vampir & VampirTrace
![Page 24: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/24.jpg)
24
Common event types
• Enter/leave of function/routine/region
– Time stamp, process/thread, function ID
• Send/receive of P2P message (MPI)
– Time stamp, sender, receiver, length, tag, communicator
• Collective communication (MPI)
– Time stamp, process, root, communicator, # bytes
• Hardware performance counter values
– Time stamp, process, counter ID, value
• etc.
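As a rough illustration of the event types listed above, the following sketch models a trace record as a tagged union in C. The layout, the type names, and `format_event` are assumptions made for this example; they are not the OTF record format. Collective operations are left out for brevity.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record layout for illustration; OTF defines its own. */
typedef enum { EV_ENTER, EV_LEAVE, EV_SEND, EV_RECV, EV_COUNTER } event_kind;

typedef struct {
    event_kind kind;
    uint64_t   time;     /* time stamp in timer ticks */
    uint32_t   process;  /* process or thread that produced the event */
    union {
        struct { uint32_t function; } region;                       /* enter/leave  */
        struct { uint32_t peer, tag, comm; uint64_t length; } msg;  /* send/receive */
        struct { uint32_t counter; uint64_t value; } ctr;           /* counter      */
    } u;
} event;

/* Format one event as text; returns snprintf's result, or -1 for an
   unknown event kind. */
int format_event(const event *e, char *buf, size_t n)
{
    assert(e != NULL && buf != NULL);
    switch (e->kind) {
    case EV_ENTER:
    case EV_LEAVE:
        return snprintf(buf, n, "%" PRIu64 " p%" PRIu32 " %s f%" PRIu32,
                        e->time, e->process,
                        e->kind == EV_ENTER ? "ENTER" : "LEAVE",
                        e->u.region.function);
    case EV_SEND:
    case EV_RECV:
        return snprintf(buf, n, "%" PRIu64 " p%" PRIu32 " %s peer=%" PRIu32
                        " len=%" PRIu64 " tag=%" PRIu32 " comm=%" PRIu32,
                        e->time, e->process,
                        e->kind == EV_SEND ? "SEND" : "RECV",
                        e->u.msg.peer, e->u.msg.length,
                        e->u.msg.tag, e->u.msg.comm);
    case EV_COUNTER:
        return snprintf(buf, n, "%" PRIu64 " p%" PRIu32 " COUNTER c%" PRIu32
                        "=%" PRIu64,
                        e->time, e->process,
                        e->u.ctr.counter, e->u.ctr.value);
    }
    return -1;
}
```

Every record carries the time stamp and the originating process, and only the payload differs per event type. This shared header is what lets a viewer sort all records onto one timeline.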
![Page 25: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/25.jpg)
25
Profiling and tracing
• Tracing advantages
– Preserve temporal and spatial relationships
– Allow reconstruction of dynamic behavior on any required abstraction level
– Profiles can be calculated from traces
• Tracing disadvantages
– Traces can become very large
– May cause perturbation
– Instrumentation and tracing is complicated
• Event buffering, clock synchronization, …
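The claim that profiles can be calculated from traces can be made concrete with a short sketch. It replays the enter/leave events of a single process with a call stack and accumulates call counts and exclusive time per function. The record layout, the limits `MAX_FUNCS` and `MAX_DEPTH`, and the function `profile_from_trace` are assumptions for this example, not part of VampirTrace or OTF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_ENTER, EV_LEAVE } event_kind;

typedef struct {
    event_kind kind;
    uint64_t   time;      /* time stamp in ticks */
    uint32_t   function;  /* function ID */
} event;

#define MAX_FUNCS 64      /* illustrative limits */
#define MAX_DEPTH 128

typedef struct {
    uint64_t calls[MAX_FUNCS];      /* invocation count per function */
    uint64_t exclusive[MAX_FUNCS];  /* ticks spent in the function itself */
} profile;

/* Replay one process's enter/leave events and accumulate a flat profile.
   Returns 0 on success, -1 if the trace is malformed. */
int profile_from_trace(const event *ev, size_t n, profile *p)
{
    uint32_t stack[MAX_DEPTH];
    size_t   depth = 0;
    uint64_t last = 0;              /* time stamp of the previous event */

    assert(ev != NULL && p != NULL);
    for (size_t i = 0; i < MAX_FUNCS; i++) {
        p->calls[i] = 0;
        p->exclusive[i] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (ev[i].function >= MAX_FUNCS || ev[i].time < last)
            return -1;
        /* Time since the previous event belongs to the function that is
           currently on top of the call stack. */
        if (depth > 0)
            p->exclusive[stack[depth - 1]] += ev[i].time - last;
        last = ev[i].time;
        if (ev[i].kind == EV_ENTER) {
            if (depth == MAX_DEPTH)
                return -1;
            stack[depth++] = ev[i].function;
            p->calls[ev[i].function]++;
        } else {
            if (depth == 0 || stack[depth - 1] != ev[i].function)
                return -1;          /* leave without matching enter */
            depth--;
        }
    }
    return depth == 0 ? 0 : -1;
}
```

The reverse direction is not possible: once only the sums are stored, the order and timing of the individual calls are lost, which is the temporal information a trace preserves.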
![Page 26: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/26.jpg)
26
Instrumentation
• Instrumentation: Process of modifying programs to
detect and report events
• There are various ways of instrumentation:
– Manually
• Large effort, error prone
• Difficult to manage
– Automatically
• Via source to source translation
• Via compiler instrumentation
• Program Database Toolkit (PDT)
• OpenMP Pragma And Region Instrumenter (Opari)
![Page 27: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/27.jpg)
27
Open Trace Format (OTF)
• Open source trace file format
• Available at http://www.tu-dresden.de/zih/otf
• Includes powerful libotf for reading/parsing/writing in
custom applications
• Multi-level API:
– High level interface for analysis tools
– Low level interface for trace libraries
• Actively developed by TU Dresden in cooperation with
the University of Oregon and the Lawrence Livermore
National Laboratory
![Page 28: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/28.jpg)
28
Practical instrumentation
• Instrumentation with VampirTrace
– Hide instrumentation in compiler wrapper
– Use underlying compiler, add appropriate options
• Test run
– Use representative test input
– Set parameters, environment variables, etc.
– Perform trace run
• Get trace
CC = mpicc
CC = vtcc -vt:cc mpicc
![Page 29: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/29.jpg)
29
Source code instrumentation
manually or automatically
/* Original */
int foo(void* arg) {
    if (cond) {
        return 1;
    }
    return 0;
}
/* Instrumented: enter/leave calls added on every path through the function */
int foo(void* arg) {
    enter(7);
    if (cond) {
        leave(7);
        return 1;
    }
    leave(7);
    return 0;
}
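The instrumented variant of `foo` can be turned into a self-contained sketch by supplying stand-ins for `enter` and `leave`. Everything below is hypothetical: the in-memory buffer, the `record` type, and the use of `clock()` as a time source are assumptions for illustration and not VampirTrace's measurement library. The slide's `cond` is passed as a parameter here so that the example compiles on its own.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-ins for a measurement library; not VampirTrace's API. */
#define MAX_EVENTS 1024

typedef struct {
    char    kind;    /* 'E' = enter, 'L' = leave */
    int     region;  /* region ID, e.g. 7 for foo */
    clock_t stamp;   /* processor time when the event was logged */
} record;

static record trace_buf[MAX_EVENTS];
static int    trace_len = 0;

static void log_event(char kind, int region)
{
    assert(trace_len < MAX_EVENTS);  /* a real library would flush the buffer */
    trace_buf[trace_len].kind   = kind;
    trace_buf[trace_len].region = region;
    trace_buf[trace_len].stamp  = clock();
    trace_len++;
}

static void enter(int region) { log_event('E', region); }
static void leave(int region) { log_event('L', region); }

/* Instrumented function: each return path is preceded by its own leave(). */
int foo(int cond)
{
    enter(7);
    if (cond) {
        leave(7);
        return 1;
    }
    leave(7);
    return 0;
}
```

Calling `foo(1)` and then `foo(0)` leaves four records in `trace_buf`: an enter/leave pair for each call. Forgetting the `leave` on the early return would leave the call stack unbalanced in the trace, which is one reason manual instrumentation is error prone.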
![Page 30: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/30.jpg)
30
• NAS Parallel Benchmarks 3.3, BT class B
• Block tridiagonal solver for nonlinear PDEs
Vampir & VampirTrace Hands-on
![Page 31: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/31.jpg)
Overview: Use of VampirTrace
Instrument your application with VampirTrace
1. Edit your Makefile and change the underlying compiler
2. Tell VampirTrace the parallelization type of your application
Original compiler settings:
CC = cc
CXX = CC
F77 = ftn
F90 = ftn
MPICC = cc
MPIF90 = ftn
With the VampirTrace compiler wrappers:
CC = vtcc
CXX = vtcxx
F77 = vtf77
F90 = vtf90
MPICC = vtcc
MPIF90 = vtf90
-vt:<seq|mpi|mt|hyb>
# seq = sequential
# mpi = parallel (uses MPI)
# mt = parallel (uses OpenMP/POSIX threads)
# hyb = hybrid parallel (MPI+Threads)
31
![Page 32: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/32.jpg)
Overview: Use of VampirTrace
Instrument your application with VampirTrace
3. Optional: Choose instrumentation type for your application
-vt:inst <gnu|pgi|sun|xl|ftrace|openuh|manual|dyninst>
# DEFAULT: automatic instrumentation by compiler
# manual: manual by using VT’s API (see manual)
# dyninst: binary instrumentation using Dyninst
32
![Page 33: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/33.jpg)
33
Hands-on: NPB 3.3 BT-MPI
• Load required modules
• Move into tutorial directory
% module load vampirtrace
% cd <path to NPB3.3-MPI>
![Page 34: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/34.jpg)
34
Hands-on: NPB 3.3 BT-MPI
• Select the VampirTrace compiler wrappers
• Build benchmark
% gedit config/make.def
-> comment out line 32, resulting in:
32: #MPIF77 = mpif77
-> modify line 38 as follows:
38: MPIF77 = vtf77 -vt:f77 ifort -lmpi
% make clean
% make bt CLASS=B NPROCS=16
![Page 35: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/35.jpg)
35
Hands-on: NPB 3.3 BT-MPI
• Submit job and launch MPI application
• Visualization with Vampir 7
% cd bin.vampir
% mpirun -np 16 ./bt_B.16
% module load vampir
% vampir &
![Page 36: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/36.jpg)
36
Hands-on: NPB 3.3 BT-MPI
Change summary to function-based statistics
![Page 37: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/37.jpg)
37
Hands-on: NPB 3.3 BT-MPI
Change metric to number of invocations
![Page 38: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/38.jpg)
38
Hands-on: NPB 3.3 BT-MPI
Add counter timeline
![Page 39: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/39.jpg)
39
Hands-on: NPB 3.3 BT-MPI
Switch to memory allocation counter
![Page 40: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/40.jpg)
40
Hands-on: NPB 3.3 BT-MPI
Use performance radar view to get an overview
![Page 41: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/41.jpg)
41
Hands-on: NPB 3.3 BT-MPI
Switch to memory allocation counter
![Page 42: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/42.jpg)
42
Hands-on: NPB 3.3 BT-MPI
Zoom in to see execution phases
![Page 43: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/43.jpg)
43
Hands-on: NPB 3.3 BT-MPI
Switch to floating point operation counter
![Page 44: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/44.jpg)
44
Hands-on: NPB 3.3 BT-MPI
Show occurrences of a function
![Page 45: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/45.jpg)
45
Hands-on: NPB 3.3 BT-MPI
![Page 46: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/46.jpg)
46
• Finding performance bottlenecks
Vampir & VampirTrace
![Page 47: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/47.jpg)
47
Finding bottlenecks
• Trace visualization
– Vampir provides a number of display types
– Each allows many different options
• Advice
– Identify essential parts of an application (initialization,
main iteration, I/O, finalization)
– Identify important components of the code (serial computation,
MPI P2P, collective MPI, OpenMP)
– Make a hypothesis about performance problems
– Consider application’s internal workings if known
– Select the appropriate displays
– Use statistic displays in conjunction with timelines
![Page 48: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/48.jpg)
48
Finding bottlenecks
• Communication
• Computation
• Memory, I/O, etc.
• Tracing itself
![Page 49: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/49.jpg)
49
Bottlenecks in communication
• Communications as such (dominating over computation)
• Late sender, late receiver
• Point-to-point messages instead of collective
communication
• Unmatched messages
• Overloading of MPI’s buffers
• Bursts of large messages (bandwidth)
• Frequent short messages (latency)
• Unnecessary synchronization (barrier)
All of the above usually result in high MPI time share.
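Two of the quantities mentioned above can be computed directly from trace time stamps. The sketch below uses the common definition of a late sender, where the receiver's waiting time is the gap between entering the blocking receive and the sender entering the matching send. Both function names are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks a receiver spends blocked because the matching send started after
   the receive was posted; 0 if the sender was not late. */
uint64_t late_sender_wait(uint64_t recv_enter, uint64_t send_enter)
{
    return send_enter > recv_enter ? send_enter - recv_enter : 0;
}

/* Fraction of the run time that was spent inside MPI calls. */
double mpi_time_share(uint64_t mpi_ticks, uint64_t total_ticks)
{
    assert(total_ticks > 0 && mpi_ticks <= total_ticks);
    return (double)mpi_ticks / (double)total_ticks;
}
```

A receive entered at tick 100 whose matching send starts at tick 250 accounts for 150 ticks of waiting. Summed over all messages, such waits show up as part of the MPI time share without any data being transferred.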
![Page 50: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/50.jpg)
51
Bottlenecks in communication
prevalent communication: MPI_Allreduce
![Page 51: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/51.jpg)
52
Bottlenecks in communication
prevalent communication: timeline view
![Page 52: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/52.jpg)
54
Bottlenecks in communication
unnecessary MPI_Barriers
![Page 53: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/53.jpg)
55
Bottlenecks in communication
patterns of successive MPI_Allreduce calls
![Page 54: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/54.jpg)
56
Further bottlenecks
• Unbalanced computation
– Single late comer
• Strictly serial parts of program
– Idle processes/threads
• Very frequent tiny function calls
• Sparse loops
![Page 55: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/55.jpg)
57
Further bottlenecks
example: idle OpenMP threads
![Page 56: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/56.jpg)
58
Bottlenecks in computation
• Memory bound computation
– Inefficient L1/L2/L3 cache usage
– TLB misses
– Detectable via HW performance counters
• I/O bound computation
– Slow input/output
– Sequential I/O on single process
– I/O load imbalance
• Exception handling
![Page 57: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/57.jpg)
59
Bottlenecks in computation
low FP rate due to heavy cache misses
![Page 58: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/58.jpg)
60
Bottlenecks in computation
low FP rate due to heavy FP exceptions
![Page 59: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/59.jpg)
61
Bottlenecks in computation
irregular slow I/O operations
![Page 60: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/60.jpg)
62
Effects due to Tracing
• Measurement overhead
– Especially grave for tiny function calls
– Solve with selective instrumentation
• Long/frequent/asynchronous trace buffer flushes
• Too many concurrent counters
• Heisenbugs
![Page 61: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/61.jpg)
63
Effects due to Tracing
Trace buffer flushes are explicitly marked in the trace.
A flush is rather harmless at the end of a trace, as shown here.
![Page 62: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/62.jpg)
64
• FAQ
Vampir & VampirTrace
![Page 63: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/63.jpg)
VampirTrace FAQ - Tracing switched off
Issue:
Tracing was switched off because the
internal trace buffer was too small to hold all events
Result:
1. Asynchronous behavior of the application due to
buffer flush of the measurement system
2. No tracing information available after flush operation
3. Huge overhead due to flush operation
[0]VampirTrace: Maximum number of buffer flushes reached \
(VT_MAX_FLUSHES=1)
[0]VampirTrace: Tracing switched off permanently
65
![Page 64: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/64.jpg)
VampirTrace FAQ - Solutions
• Increase trace buffer size
• Increase number of allowed buffer flushes (not
recommended)
• Use filter mechanisms to reduce the number of recorded events
% export VT_BUFFER_SIZE=150M
% export VT_MAX_FLUSHES=2
% export VT_FILTER_SPEC=/home/user/filter.spec
66
![Page 65: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/65.jpg)
VampirTrace FAQ – Issue of increasing
buffer size
Issue:
Every function entry/exit and every MPI event was recorded
Result:
Trace files become large even for short application runs
Solutions:
1. Use filter mechanisms to reduce the number of
recorded events (see slide Function Filtering for more
details)
2. Use selective instrumentation of your application
(see slide Selective Instrumentation for more details)
![Page 66: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/66.jpg)
Function filtering
• Filtering is one of the ways to reduce trace size
• Environment variable VT_FILTER_SPEC
• Filter definition file contains a list of filters
• See also the vtfilter tool
– Can generate a customized filter file
– Can reduce the size of existing trace files
% export VT_FILTER_SPEC=/home/user/filter.spec
my_*;test_* -- 1000
debug_* -- 0
calculate -- -1
* -- 1000000
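The way such a filter file is applied can be sketched in C. This is only an illustration, not VampirTrace code: it assumes that rules are evaluated from top to bottom with the first match winning, that a limit of 0 means "never record" and that -1 means "no limit". The names `filter_rule`, `example_rules` and `filter_limit` are hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* One filter rule: a semicolon-separated pattern list and a call limit.
 * Assumed limit semantics: 0 = never record, -1 = no limit. */
struct filter_rule {
    const char *patterns;
    long limit;
};

/* The example rules from the slide, in file order. */
static const struct filter_rule example_rules[] = {
    { "my_*;test_*", 1000 },
    { "debug_*",     0 },
    { "calculate",   -1 },
    { "*",           1000000 },
};

/* Return the call limit of the first rule that matches `name`,
 * or -1 (no limit) if no rule matches at all. */
static long filter_limit(const struct filter_rule *rules, size_t n,
                         const char *name)
{
    for (size_t i = 0; i < n; i++) {
        char buf[256];
        /* strtok modifies its input, so work on a private copy. */
        strncpy(buf, rules[i].patterns, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        for (char *p = strtok(buf, ";"); p != NULL; p = strtok(NULL, ";")) {
            if (fnmatch(p, name, 0) == 0)
                return rules[i].limit;
        }
    }
    return -1;
}
```

Under these assumptions, `filter_limit(example_rules, 4, "debug_print")` yields 0 (filtered out), while a function that matches none of the specific patterns falls through to the catch-all `*` rule.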
![Page 67: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/67.jpg)
Selective instrumentation
• Selective instrumentation helps you reduce the
size of your trace file because only the parts of interest are
recorded
• One option is to use manual instrumentation instead of automatic
instrumentation
• Another option is to modify your Makefile so that automatic
instrumentation (the default) is applied only to source files
that contain functions of interest
% vtcc -vt:inst manual … source_code.c
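A minimal sketch of manual instrumentation in C follows. It assumes the `VT_USER_START` / `VT_USER_END` markers from VampirTrace's `vt_user.h` and that they are activated by compiling with `-DVTRACE`; check both against your installation. The function `sum_of_squares` is a hypothetical region of interest. The fallback macros only exist so the file also builds without VampirTrace.

```c
/* Assumed build line with VampirTrace:
 *   vtcc -vt:inst manual -DVTRACE ... */
#ifdef VTRACE
#include "vt_user.h"
#else
/* Without VampirTrace the markers compile to nothing. */
#define VT_USER_START(name)
#define VT_USER_END(name)
#endif

/* Hypothetical hot spot: only the marked region is recorded. */
double sum_of_squares(const double *v, int n)
{
    double s = 0.0;
    VT_USER_START("sum_of_squares");
    for (int i = 0; i < n; i++)
        s += v[i] * v[i];
    VT_USER_END("sum_of_squares");
    return s;
}
```

Everything outside the marked region stays untraced, which keeps the trace small.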
![Page 68: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/68.jpg)
VampirTrace FAQ – How to get more insights?
Issue:
I’m interested in more events and hardware counters. What do I have to do?
Solutions:
1. Use the environment option VT_METRICS to record additional hardware counters via PAPI, CPC or NEC, where available.
2. Use the environment option VT_RUSAGE to record the Unix resource usage counters.
3. Use the environment option VT_MEMTRACE, if available on your system, to intercept the libc allocation functions and to record memory allocation information.
For additional events and hardware information that can be recorded, see chapter 4 in the VampirTrace manual.
![Page 69: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/69.jpg)
PAPI
• PAPI counters can be included in traces
– If VampirTrace was built with PAPI support
– If PAPI is available on the platform
• VT_METRICS specifies a list of PAPI counters
• See also the PAPI commands papi_avail and
papi_command_line
% export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
![Page 70: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/70.jpg)
Memory allocation and I/O counters
• Memory allocation counters can be recorded:
– If VampirTrace was built with memory allocation tracing support
– If GNU glibc is used on the platform
• Intercept glibc functions like “malloc” and “free”
• Environment variable VT_MEMTRACE
• I/O counters can be included in traces
– If VampirTrace was built with I/O tracing support
• Standard I/O calls like “open” and “read” are recorded
• Environment variable VT_IOTRACE
% export VT_MEMTRACE=yes
% export VT_IOTRACE=yes
![Page 71: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/71.jpg)
VampirTrace FAQ – Grouping of functions
Issue:
My functions appear in the default group “application”.
What can I do to better differentiate between different types
of functions?
Result:
Statistics of the default groups are not able to show the
different behavior of different function classes.
Solution:
Use the grouping mechanism to define your own groups (see
slide Function Grouping for more details)
![Page 72: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/72.jpg)
Function grouping
• Groups can be defined for related functions
– Groups can be assigned different colors, highlighting
different activities
• Environment variable VT_GROUPS_SPEC
• Group file contains a list of associated entries
% export VT_GROUPS_SPEC=/home/user/groups.spec
CALC=calculate
MISC=my*;test
UNKNOWN=*
![Page 73: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/73.jpg)
VampirTrace run-time options
• Control options by environment variables:
– VT_PFORM_GDIR Directory for final trace files
– VT_PFORM_LDIR Directory for intermediate files
– VT_FILE_PREFIX Trace file name
– VT_BUFFER_SIZE Internal trace buffer size
– VT_MAX_FLUSHES Max number of buffer flushes
– VT_MEMTRACE Enable memory allocation tracing
– VT_MPICHECK Enable MPI checking
– VT_IOTRACE Enable I/O tracing
– VT_MPITRACE Enable MPI tracing
– VT_FILTER_SPEC Name of filter definition file
– VT_GROUPS_SPEC Name of grouping definition file
– VT_METRICS PAPI counter selection
![Page 74: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/74.jpg)
Conclusions and outlook
• Performance analysis is very important in HPC
• Use performance analysis tools for profiling and tracing
• Do not spend effort on do-it-yourself solutions,
e.g. printf debugging
• Use tracing tools with some precautions
– Overhead
– Data volume
• Let us know about problems and about feature wishes
![Page 75: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/75.jpg)
Vampir and VampirTrace are
available at http://www.vampir.eu and
http://www.tu-dresden.de/zih/vampirtrace/ ,
get support via [email protected]
![Page 76: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/76.jpg)
Staff at ZIH - TU Dresden:
Ronny Brendel, Holger Brunst, Jens Doleschal, Ronald Geisler, Daniel Hackenberg, Michael Heyde,
Tobias Hilbrich, Rene Jäkel, Matthias Jurenz, Michael Kluge, Andreas Knüpfer, Matthias Lieber,
Holger Mickler, Hartmut Mix, Matthias Müller, Wolfgang E. Nagel, Reinhard Neumann, Michael Peter,
Heide Rohling, Johannes Spazier, Michael Wagner, Matthias Weber, Bert Wesarg
![Page 77: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/77.jpg)
Wrapper functions
• Provide wrapper functions
– Call instrumentation function for notification
– Call original target for functionality
• Via preprocessor directives:
#define MPI_Init WRAPPER_MPI_Init
#define MPI_Send WRAPPER_MPI_Send
• Via library preload:
– Preload instrumented dynamic library
• Suitable for standard libraries (e.g. MPI, glibc)
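The preprocessor-directive approach can be sketched with a toy function in place of an MPI routine, so the example builds without an MPI installation. All names here (`lib_send`, `WRAPPER_lib_send`, `wrapper_calls`, `user_code`) are hypothetical. A real tool would record a trace event where this sketch only increments a counter.

```c
/* Original target; stands in for a library routine (hypothetical). */
int lib_send(int value)
{
    return value * 2;
}

/* Counts notifications; a real tool would record a trace event. */
static int wrapper_calls = 0;

/* Wrapper: notify the measurement layer, then call the original. */
int WRAPPER_lib_send(int value)
{
    wrapper_calls++;           /* instrumentation notification */
    return lib_send(value);    /* original functionality */
}

/* From here on, every call to lib_send is redirected to the wrapper. */
#define lib_send WRAPPER_lib_send

/* User code: unchanged, but compiled after the #define. */
int user_code(void)
{
    return lib_send(21);
}
```

The wrapper must be defined before the `#define` takes effect, otherwise its own call to the original would be redirected to itself.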
![Page 78: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/78.jpg)
The MPI Profiling Interface
• Each MPI function has two names:
– MPI_xxx and PMPI_xxx
• Replacement of MPI routines at link time
[Figure: user program, wrapper library and MPI library. The user program's MPI_Send call resolves to the wrapper library, which forwards it to PMPI_Send in the MPI library.]
![Page 79: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/79.jpg)
Compiler instrumentation
gcc -finstrument-functions -c foo.c
• many compilers support this: GCC, Intel, IBM, PGI, NEC,
Hitachi, Sun Fortran, …
• no source code modification necessary
void __cyg_profile_func_enter( <args> );
void __cyg_profile_func_exit( <args> );
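As a rough sketch of what a measurement layer does with these hooks, the following C fragment uses GCC's hook signatures and simply logs function addresses to stderr. The counters `depth`, `enters` and `exits` and the function `square` are hypothetical. A real tool would map addresses to names and write trace records instead.

```c
#include <stdio.h>

static int depth = 0;               /* current call depth seen by the hooks */
static int enters = 0, exits = 0;   /* number of hook invocations */

/* The hooks themselves must not be instrumented, or they would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    enters++;
    fprintf(stderr, "%*senter %p\n", 2 * depth, "", this_fn);
    depth++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    exits++;
    depth--;
    fprintf(stderr, "%*sexit  %p\n", 2 * depth, "", this_fn);
}

/* Any ordinary function; instrumented only when the file is
 * compiled with -finstrument-functions. */
int square(int x)
{
    return x * x;
}
```

Without `-finstrument-functions` the hooks are never called and the program behaves as if they were absent.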
![Page 80: VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW · VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW Performance Analysis of Computer Systems December 8th, 2011 Holger Brunst, Andreas](https://reader033.vdocument.in/reader033/viewer/2022052611/5f06f7f97e708231d41aa36a/html5/thumbnails/80.jpg)
Dynamic instrumentation
• Modify executable in file or binary in memory
• Insert instrumentation calls
• Very platform/machine dependent, expensive
• DynInst project (http://www.dyninst.org)
– Common interface
– Supported platforms: Alpha/Tru64, MIPS/IRIX,
PowerPC/AIX, Sparc/Solaris, x86/Linux, x86/Windows, ia64/Linux