
VAMPIR & VAMPIRTRACE

INTRODUCTION AND OVERVIEW

Performance Analysis of Computer Systems

December 8th, 2011

Holger Brunst, Andreas Knüpfer, Jens Doleschal



Overview

• Introduction

• Event trace visualization

• Vampir & VampirServer

• The Vampir displays
– Timeline
– Process Timeline with performance counters
– Summary Display
– Message Statistics

• VampirTrace
– Instrumentation & run-time measurement

• Conclusions


Introduction

Why bother with performance analysis?

• Well, why are you here after all?

• Efficient usage of expensive and limited resources

• Scalability to achieve next bigger simulation

Profiling and Tracing

• Have an optimization phase

– Just like testing and debugging phase

• Use tools!

• Avoid do-it-yourself-with-printf solutions, really!


Event trace visualization

Trace visualization

• Alternative and supplement to automatic analysis

• Show dynamic run-time behavior graphically

• Provide statistics and performance metrics

– Processes and threads

– Performance counters

– Function invocations

– Communication

– I/O

• Interactive browsing, zooming, selecting

– Adapt statistics to zoom level (time interval)

– Also for very large and highly parallel traces


Vampir toolset architecture

[Figure: Vampir toolset architecture. Labels: Multi-Core Program, Many-Core Program, VampirTrace, Trace File (OTF), Trace Bundle, Vampir 7, VampirServer]

Usage order of the Vampir performance analysis toolset

1. Instrument your application with VampirTrace
2. Run your application with an appropriate test set
3. Analyze your trace file with Vampir
   • Small trace files can be analyzed on your local workstation
     1. Start your local Vampir
     2. Load trace file from your local disk
   • Large trace files should be stored on the cluster file system
     1. Start VampirServer on your analysis cluster
     2. Start your local Vampir
     3. Connect local Vampir with the VampirServer on the analysis cluster
     4. Load trace file from the cluster file system


Vampir displays

The main displays of Vampir:

• Master Timeline (Global Timeline)

• Process and Counter Timeline

• Function Summary

• Message Summary

• Process Summary

• Communication Matrix

• Call Tree


Vampir 7: Displays for a WRF Trace with 64 Processes

Master Timeline (Global Timeline)


Process and Counter Timeline

[Screenshot panels: Process Timeline, Counter Timeline]

Function Summary


Message Summary


Process Summary


Communication Matrix


Call Tree


Introduction: Profiling & tracing

Program instrumentation

• Detect run-time events (points of interest)

• Pass information to run-time measurement library

Profile recording

• Collect aggregated information (Time, Counts, … )

• About program and system entities

– Functions, loops, basic blocks

– Application, processes, threads, …

Trace recording

• Save individual event records together with precise timestamp and process or thread ID

• Plus event-specific information

Instrumentation & measurement

• What do you need to do for it?

– Use VampirTrace

• Instrumentation (automatic with compiler wrappers)

• Re-compile & re-link

• Trace run (run with appropriate test data set)

• More details later

Original compiler settings:

CC = icc
CXX = icpc
F90 = ifc
MPICC = mpicc

With VampirTrace compiler wrappers:

CC = vtcc
CXX = vtcxx
F90 = vtf90
MPICC = vtcc -vt:cc mpicc

Instrumentation & measurement

What does VampirTrace do in the background?

• Instrumentation:

– Via compiler wrappers

– By underlying compiler with specific options

– MPI instrumentation with replacement lib

– OpenMP instrumentation with Opari

– Also binary instrumentation with Dyninst

– Partial manual instrumentation


Instrumentation & measurement

What does VampirTrace do in the background?

• Trace run:

– Event data collection

– Precise time measurement

– Parallel timer synchronization

– Collecting parallel process/thread traces

– Collecting performance counters (from PAPI, memory usage, POSIX I/O calls, fork/system/exec calls, and more …)

– Filtering and grouping of function calls

Summary

• Vampir & VampirServer

– Interactive trace visualization and analysis

– Intuitive browsing and zooming

– Scalable to large trace data sizes (100GByte)

– Scalable to high parallelism (2000 processes)

• Vampir for Linux, Windows and Mac OS X

• VampirTrace

– Convenient instrumentation and measurement

– Hides away complicated details

– Provides many options and switches for experts

• VampirTrace is part of Open MPI since version 1.3


VAMPIR & VAMPIRTRACE

DETAILS AND HANDS-ON

Performance Analysis of Computer Systems December 8th, 2011

Holger Brunst, Andreas Knüpfer, Jens Doleschal


Overview

• Event tracing in general
• Hands-on: NPB 3.3 BT-MPI
• Finding performance bottlenecks
• FAQ

Event tracing in general

Common event types

• Enter/leave of function/routine/region

– Time stamp, process/thread, function ID

• Send/receive of P2P message (MPI)

– Time stamp, sender, receiver, length, tag, communicator

• Collective communication (MPI)

– Time stamp, process, root, communicator, # bytes

• Hardware performance counter values

– Time stamp, process, counter ID, value

• etc.


Profiling and tracing

• Tracing advantages

– Preserve temporal and spatial relationships

– Allow reconstruction of dynamic behavior on any required abstraction level

– Profiles can be calculated from traces

• Tracing disadvantages

– Traces can become very large

– May cause perturbation

– Instrumentation and tracing are complicated

• Event buffering, clock synchronization, …
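The statement that profiles can be calculated from traces can be made concrete with a small C sketch. It assumes a simplified, hypothetical event record (`trace_event`) and fixed table sizes; this is not the OTF record layout and not VampirTrace code. The function `profile_from_trace` walks the enter/leave events of one process in time order and aggregates call counts and inclusive time per function ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event record (not the OTF layout). */
enum event_kind { EV_ENTER, EV_LEAVE };

struct trace_event {
    uint64_t timestamp;    /* timer ticks */
    uint32_t process;      /* process or thread ID */
    uint32_t function;     /* function ID, must be < MAX_FUNCTIONS */
    enum event_kind kind;
};

#define MAX_FUNCTIONS 64   /* illustrative limits, chosen for this sketch */
#define MAX_DEPTH     128

struct profile_entry {
    uint64_t calls;            /* completed invocations */
    uint64_t inclusive_time;   /* ticks between enter and matching leave */
};

/*
 * Aggregate the time-ordered events of ONE process into a profile.
 * Returns 0 on success, -1 if the event stream is malformed
 * (unknown function ID, unbalanced or too deeply nested enter/leave).
 */
int profile_from_trace(const struct trace_event *events, size_t count,
                       struct profile_entry profile[MAX_FUNCTIONS])
{
    uint64_t enter_time[MAX_DEPTH];   /* call stack: time of each open enter */
    uint32_t enter_func[MAX_DEPTH];   /* call stack: function of each open enter */
    size_t depth = 0;

    for (size_t f = 0; f < MAX_FUNCTIONS; ++f) {
        profile[f].calls = 0;
        profile[f].inclusive_time = 0;
    }

    for (size_t i = 0; i < count; ++i) {
        const struct trace_event *ev = &events[i];
        if (ev->function >= MAX_FUNCTIONS)
            return -1;
        if (ev->kind == EV_ENTER) {
            if (depth == MAX_DEPTH)
                return -1;
            enter_time[depth] = ev->timestamp;
            enter_func[depth] = ev->function;
            ++depth;
        } else {
            /* a leave must match the innermost open enter */
            if (depth == 0 || enter_func[depth - 1] != ev->function)
                return -1;
            --depth;
            profile[ev->function].calls += 1;
            profile[ev->function].inclusive_time +=
                ev->timestamp - enter_time[depth];
        }
    }
    return depth == 0 ? 0 : -1;
}
```

The reverse direction is not possible: once only the aggregated counts and times are stored, the order and timing of individual events cannot be recovered. Note that this sketch counts nested recursive calls of the same function more than once in the inclusive time.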


Instrumentation

• Instrumentation: Process of modifying programs to detect and report events

• There are various ways of instrumentation:

– Manually

• Large effort, error prone

• Difficult to manage

– Automatically

• Via source to source translation

• Via compiler instrumentation

• Program Database Toolkit (PDT)

• OpenMP Pragma And Region Instrumenter (Opari)


Open Trace Format (OTF)

• Open source trace file format

• Available at http://www.tu-dresden.de/zih/otf

• Includes powerful libotf for reading/parsing/writing in custom applications

• Multi-level API:
– High level interface for analysis tools
– Low level interface for trace libraries

• Actively developed by TU Dresden in cooperation with the University of Oregon and the Lawrence Livermore National Laboratory

Practical instrumentation

• Instrumentation with VampirTrace

– Hide instrumentation in compiler wrapper

– Use underlying compiler, add appropriate options

• Test run

– Use representative test input

– Set parameters, environment variables, etc.

– Perform trace run

• Get trace

# before
CC = mpicc

# after: VampirTrace wrapper around the same underlying compiler
CC = vtcc -vt:cc mpicc

Source code instrumentation

manually or automatically

/* original */
int foo(void* arg) {
    if (cond) {
        return 1;
    }
    return 0;
}

/* instrumented: an enter event at the start, a leave event before every return */
int foo(void* arg) {
    enter(7);
    if (cond) {
        leave(7);
        return 1;
    }
    leave(7);
    return 0;
}

Vampir & VampirTrace Hands-on

• NAS Parallel Benchmarks 3.3, BT class B
• Block tridiagonal solver for nonlinear PDEs

Overview: Use of VampirTrace

Instrument your application with VampirTrace

1. Edit your Makefile and change the underlying compiler

# original
CC = cc
CXX = CC
F77 = ftn
F90 = ftn
MPICC = cc
MPIF90 = ftn

# with VampirTrace compiler wrappers
CC = vtcc
CXX = vtcxx
F77 = vtf77
F90 = vtf90
MPICC = vtcc
MPIF90 = vtf90

2. Tell VampirTrace the parallelization type of your application

-vt:<seq|mpi|mt|hyb>
# seq = sequential
# mpi = parallel (uses MPI)
# mt = parallel (uses OpenMP/POSIX threads)
# hyb = hybrid parallel (MPI+Threads)

3. Optional: Choose instrumentation type for your application

-vt:inst <gnu|pgi|sun|xl|ftrace|openuh|manual|dyninst>
# DEFAULT: automatic instrumentation by compiler
# manual: manual by using VT’s API (see manual)
# dyninst: binary instrumentation using Dyninst

Hands-on: NPB 3.3 BT-MPI

• Load required modules
% module load vampirtrace

• Move into tutorial directory
% cd <path to NPB3.3-MPI>

Hands-on: NPB 3.3 BT-MPI

• Select the VampirTrace compiler wrappers
% gedit config/make.def
-> comment out line 32, resulting in:
32: #MPIF77 = mpif77
-> modify line 38 as follows:
38: MPIF77 = vtf77 -vt:f77 ifort -lmpi

• Build benchmark
% make clean
% make bt CLASS=B NPROCS=16

Hands-on: NPB 3.3 BT-MPI

• Submit job and launch MPI application
% cd bin.vampir
% mpirun -np 16 ./bt_B.16

• Visualization with Vampir 7
% module load vampir
% vampir &

Hands-on: NPB 3.3 BT-MPI

• Change summary to function based statistic
• Change metric to number of invocations
• Add counter timeline
• Switch to memory allocation counter
• Use performance radar view to get an overview
• Switch to memory allocation counter
• Zoom in to see execution phases
• Switch to floating point operation counter
• Show occurrences of a function

Finding performance bottlenecks

Finding bottlenecks

• Trace visualization

– Vampir provides a number of display types

– Each allows many different options

• Advice

– Identify essential parts of an application (initialization, main iteration, I/O, finalization)

– Identify important components of the code (serial computation, MPI P2P, collective MPI, OpenMP)

– Make a hypothesis about performance problems

– Consider application’s internal workings if known

– Select the appropriate displays

– Use statistic displays in conjunction with timelines

Finding bottlenecks

• Communication

• Computation

• Memory, I/O, etc.

• Tracing itself


Bottlenecks in communication

• Communications as such (dominating over computation)

• Late sender, late receiver

• Point-to-point messages instead of collective communication

• Unmatched messages

• Overloading of MPI’s buffers

• Bursts of large messages (bandwidth)

• Frequent short messages (latency)

• Unnecessary synchronization (barrier)

All of the above usually result in a high share of run time spent in MPI.

Bottlenecks in communication: examples

• Prevalent communication: MPI_Allreduce
• Prevalent communication: timeline view
• Unnecessary MPI_Barriers
• Patterns of successive MPI_Allreduce calls

Further bottlenecks

• Unbalanced computation

– Single late comer

• Strictly serial parts of program

– Idle processes/threads

• Very frequent tiny function calls

• Sparse loops

Example: idle OpenMP threads

Bottlenecks in computation

• Memory bound computation

– Inefficient L1/L2/L3 cache usage

– TLB misses

– Detectable via HW performance counters

• I/O bound computation

– Slow input/output

– Sequential I/O on single process

– I/O load imbalance

• Exception handling

Bottlenecks in computation: examples

• Low FP rate due to heavy cache misses
• Low FP rate due to heavy FP exceptions
• Irregular slow I/O operations

Effects due to Tracing

• Measurement overhead

– Especially grave for tiny function calls

– Solve with selective instrumentation

• Long/frequent/asynchronous trace buffer flushes

• Too many concurrent counters

• Heisenbugs

Trace buffer flushes are explicitly marked in the trace. A flush at the end of a trace is rather harmless.

FAQ

VampirTrace FAQ - Tracing switched off

Issue:

Tracing was switched off because the internal trace buffer was too small to hold all events

Result:

1. Asynchronous behavior of the application due to buffer flush of the measurement system
2. No tracing information available after flush operation
3. Huge overhead due to flush operation

[0]VampirTrace: Maximum number of buffer flushes reached (VT_MAX_FLUSHES=1)
[0]VampirTrace: Tracing switched off permanently
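The behaviour behind these messages can be modelled in a few lines of C. This is a toy model under stated assumptions, not VampirTrace's implementation: the struct `trace_buffer` and the functions `buffer_init` and `buffer_record` are hypothetical names, the buffer is counted in events rather than bytes, and a flush limit of zero or less is treated as "no limit" in this sketch.

```c
#include <stddef.h>

/* Toy model of an event buffer with a flush limit (hypothetical names). */
struct trace_buffer {
    size_t capacity;     /* number of events that fit into the buffer */
    size_t used;         /* events currently held in the buffer */
    int    max_flushes;  /* allowed flushes; <= 0 means no limit (assumption) */
    int    flushes;      /* flushes performed so far */
    int    tracing_on;   /* 1 while events are still being recorded */
    size_t written;      /* events written out by flushes */
    size_t dropped;      /* events lost after tracing was switched off */
};

void buffer_init(struct trace_buffer *b, size_t capacity, int max_flushes)
{
    b->capacity = capacity;
    b->used = 0;
    b->max_flushes = max_flushes;
    b->flushes = 0;
    b->tracing_on = 1;
    b->written = 0;
    b->dropped = 0;
}

/* Record one event. Returns 1 if the event was stored, 0 if it was dropped. */
int buffer_record(struct trace_buffer *b)
{
    if (!b->tracing_on) {
        b->dropped += 1;
        return 0;
    }
    if (b->used == b->capacity) {
        /* buffer full: flush it; the application waits during this step */
        b->written += b->used;
        b->used = 0;
        b->flushes += 1;
        if (b->max_flushes > 0 && b->flushes >= b->max_flushes) {
            /* flush limit reached: tracing is switched off permanently */
            b->tracing_on = 0;
            b->dropped += 1;
            return 0;
        }
    }
    b->used += 1;
    return 1;
}
```

With a capacity of 4 events and a limit of one flush, recording 10 events keeps the first 4 and drops the remaining 6, which corresponds to the situation described in the FAQ entry: nothing is traced after the flush.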


VampirTrace FAQ - Solutions

• Increase trace buffer size
% export VT_BUFFER_SIZE=150M

• Increase number of allowed buffer flushes (not recommended)
% export VT_MAX_FLUSHES=2

• Use filter mechanisms to reduce the number of recorded events
% export VT_FILTER_SPEC=/home/user/filter.spec

VampirTrace FAQ – Issue of increasing buffer size

Issue:

Every function entry/exit and every MPI event was recorded

Result:

Trace files become large even for short application runs

Solutions:

1. Use filter mechanisms to reduce the number of recorded events (see slide Function Filtering for more details)
2. Use selective instrumentation of your application (see slide Selective Instrumentation for more details)

Function filtering

• Filtering is one of the ways to reduce trace size

• Environment variable VT_FILTER_SPEC

• Filter definition file contains a list of filters

• See also the vtfilter tool

– Can generate a customized filter file

– Can reduce the size of existing trace files

% export VT_FILTER_SPEC=/home/user/filter.spec

Contents of filter.spec:

my_*;test_* -- 1000
debug_* -- 0
calculate -- -1
* -- 1000000
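How such a rule list might be evaluated can be sketched in C with the POSIX function `fnmatch`. The sketch rests on two assumptions that should be checked against the VampirTrace manual: the number after `--` is a call limit (0 = never record, -1 = no limit), and the first matching rule wins. The names `filter_rule`, `filter_call_limit` and `filter_should_record` are hypothetical and are not part of VampirTrace.

```c
#include <fnmatch.h>
#include <stddef.h>

/* One filter rule: a shell-style wildcard pattern and a call limit. */
struct filter_rule {
    const char *pattern;
    long        limit;   /* assumed: 0 = never record, -1 = no limit */
};

/* The example filter file, with "my_*;test_*" split into two rules. */
static const struct filter_rule example_rules[] = {
    { "my_*",      1000 },
    { "test_*",    1000 },
    { "debug_*",   0 },
    { "calculate", -1 },
    { "*",         1000000 },
};

/* Return the call limit of the first rule matching function_name. */
long filter_call_limit(const struct filter_rule *rules, size_t count,
                       const char *function_name)
{
    for (size_t i = 0; i < count; ++i) {
        if (fnmatch(rules[i].pattern, function_name, 0) == 0)
            return rules[i].limit;
    }
    return -1;   /* no rule matched: record without limit (assumption) */
}

/* Decide whether the next call of a function should still be recorded. */
int filter_should_record(long limit, long calls_so_far)
{
    return limit < 0 || calls_so_far < limit;
}
```

With these rules, `debug_print` would never be recorded, `calculate` would always be recorded, and any function not named explicitly would be recorded for its first 1000000 calls. The order matters in this sketch: placing `*` first would hide all later rules.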


Selective instrumentation

• Selective instrumentation helps you reduce the size of your trace file by recording only the parts of interest

• One option is to use manual instrumentation instead of automatic instrumentation

% vtcc -vt:inst manual … source_code.c

• Another option is to modify your Makefile so that automatic instrumentation (the default) is applied only to source files that contain functions of interest

VampirTrace FAQ – How to get more insights?

Issue:

I’m interested in more events and hardware counters. What do I have to do?

Solutions:

1. Use the environment option VT_METRICS to enable recording of additional hardware counters like PAPI, CPC or NEC if available.

2. Use the environment option VT_RUSAGE to record the Unix resource usage counters.

3. Use the environment option VT_MEMTRACE, if available on your system, to intercept the libc allocation functions and to record memory allocation information.

For additional events and for recording hardware information, see chapter 4 in the VampirTrace manual.

PAPI

• PAPI counters can be included in traces

– If VampirTrace was built with PAPI support
– If PAPI is available on the platform

• VT_METRICS specifies a list of PAPI counters

• See also the PAPI commands papi_avail and papi_command_line

% export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM

Memory allocation and I/O counters

• Memory allocation counters can be recorded:

– If VampirTrace was built with memory allocation tracing support
– If GNU glibc is used on the platform

• Intercept glibc functions like “malloc” and “free”

• Environment variable VT_MEMTRACE

% export VT_MEMTRACE=yes

• I/O counters can be included in traces
– If VampirTrace was built with I/O tracing support

• Standard I/O calls like “open” and “read” are recorded

• Environment variable VT_IOTRACE

% export VT_IOTRACE=yes

VampirTrace FAQ – Grouping of functions

Issue:

My functions appear in the default group “application”. What can I do to better differentiate between different types of functions?

Result:

Statistics of the default groups are not able to show the different behavior of different function classes.

Solution:

Use the grouping mechanism to define your own groups (see slide Function Grouping for more details)

Function grouping

• Groups can be defined for related functions

– Groups can be assigned different colors, highlighting different activities

• Environment variable VT_GROUPS_SPEC

% export VT_GROUPS_SPEC=/home/user/groups.spec

• Group file contains a list of associated entries

CALC=calculate
MISC=my*;test
UNKNOWN=*

VampirTrace run-time options

• Control options by environment variables:

– VT_PFORM_GDIR Directory for final trace files

– VT_PFORM_LDIR Directory for intermediate files

– VT_FILE_PREFIX Trace file name

– VT_BUFFER_SIZE Internal trace buffer size

– VT_MAX_FLUSHES Max number of buffer flushes

– VT_MEMTRACE Enable memory allocation tracing

– VT_MPICHECK Enable MPI checking

– VT_IOTRACE Enable I/O tracing

– VT_MPITRACE Enable MPI tracing

– VT_FILTER_SPEC Name of filter definition file

– VT_GROUPS_SPEC Name of grouping definition file

– VT_METRICS PAPI counter selection


Conclusions and outlook

• Performance analysis is very important in HPC

• Use performance analysis tools for profiling and tracing

• Do not spend effort on do-it-yourself solutions, e.g. printf debugging

• Use tracing tools with some precautions

– Overhead

– Data volume

• Let us know about problems and about feature wishes

[email protected]

Vampir and VampirTrace are

available at http://www.vampir.eu and

http://www.tu-dresden.de/zih/vampirtrace/ ,

get support via [email protected]


Staff at ZIH - TU Dresden:

Ronny Brendel, Holger Brunst, Jens Doleschal, Ronald Geisler, Daniel Hackenberg, Michael Heyde,

Tobias Hilbrich, Rene Jäkel, Matthias Jurenz, Michael Kluge, Andreas Knüpfer, Matthias Lieber,

Holger Mickler, Hartmut Mix, Matthias Müller, Wolfgang E. Nagel, Reinhard Neumann, Michael Peter,

Heide Rohling, Johannes Spazier, Michael Wagner, Matthias Weber, Bert Wesarg


Wrapper functions

• Provide wrapper functions
– Call instrumentation function for notification
– Call original target for functionality

• Via preprocessor directives:

#define MPI_Init WRAPPER_MPI_Init
#define MPI_Send WRAPPER_MPI_Send

• Via library preload:
– Preload instrumented dynamic library

• Suitable for standard libraries (e.g. MPI, glibc)
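The preprocessor variant can be tried out in plain C without an MPI installation. In the following sketch `lib_send` is a hypothetical stand-in for a library routine such as MPI_Send, and the two counters stand in for calls into a measurement library; none of these names come from VampirTrace. The point is only the mechanism: after the `#define`, every textual use of the original name is compiled as a call to the wrapper.

```c
#include <stddef.h>

/* Stand-ins for notifications to a measurement library (hypothetical). */
static int enter_events = 0;
static int leave_events = 0;

/* Stand-in for the original library routine, e.g. MPI_Send (hypothetical). */
int lib_send(const void *buf, size_t len)
{
    (void)buf;
    return (int)len;   /* pretend that len bytes were sent */
}

/* Wrapper: notify the measurement layer, then call the original target. */
int wrapper_lib_send(const void *buf, size_t len)
{
    int result;

    ++enter_events;
    /* the macro below is not defined yet, so this calls the original */
    result = lib_send(buf, len);
    ++leave_events;
    return result;
}

/* From here on, user code that names lib_send calls the wrapper instead. */
#define lib_send wrapper_lib_send

int user_code(void)
{
    char message[16] = "hello";
    return lib_send(message, sizeof message);
}
```

In a real setup the `#define` lines would typically live in a header that is included into the user's source files, so the user code itself stays unchanged. The approach only affects code that is recompiled with that header; calls from inside precompiled libraries are not redirected, which is one reason library preloading exists as an alternative.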


The MPI Profiling Interface

• Each MPI function has two names:
– MPI_xxx and PMPI_xxx

• Replacement of MPI routines at link time

[Figure: user program, wrapper library and MPI library. The MPI_Send call of the user program is resolved by the wrapper library, which calls PMPI_Send in the MPI library.]

Compiler instrumentation

gcc -finstrument-functions -c foo.c

• Many compilers support this: GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …

• No source code modification necessary

void __cyg_profile_func_enter( <args> );
void __cyg_profile_func_exit( <args> );
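A minimal sketch of what such hooks can look like with GCC is given below. The two-pointer signature (address of the instrumented function, address of the call site) is the one GCC documents for `-finstrument-functions`; the counters and the `NO_INSTRUMENT` macro name are assumptions made for this illustration, and a real tracing library would write event records with timestamps instead of counting.

```c
#include <stddef.h>

/* The hooks themselves must not be instrumented, otherwise they would
 * call themselves recursively. NO_INSTRUMENT is a name chosen here. */
#define NO_INSTRUMENT __attribute__((no_instrument_function))

static unsigned long enter_count = 0;   /* number of function entries seen */
static unsigned long exit_count = 0;    /* number of function exits seen */
static void *last_function = NULL;      /* address of the most recent function */

/* Called by compiler-inserted code at the start of every instrumented function. */
NO_INSTRUMENT
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    last_function = this_fn;
    ++enter_count;
}

/* Called by compiler-inserted code before every instrumented function returns. */
NO_INSTRUMENT
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    last_function = this_fn;
    ++exit_count;
}
```

When a program is compiled with `gcc -finstrument-functions` and linked with these definitions, the compiler inserts the calls automatically. The hooks receive addresses rather than names, so a measurement library has to map addresses back to function names, for example with the help of the symbol table.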


Dynamic instrumentation

• Modify executable in file or binary in memory

• Insert instrumentation calls

• Very platform/machine dependent, expensive

• DynInst project (http://www.dyninst.org)

– Common interface

– Supported platforms: Alpha/Tru64, MIPS/IRIX, PowerPC/AIX, Sparc/Solaris, x86/Linux, x86/Windows, ia64/Linux