
GPTL: A simple and free general purpose tool for performance analysis and profiling

April 8, 2014

Jim Rosinski, NOAA/ESRL


NCAR SEA 2

Outline

• Motivation and Basic Usage
• Auto-instrumentation
• Auto-profiling MPI routines
• Summary across threads and tasks
• Induced overhead
• Choice of underlying timing routine
• PAPI interface
• Utility functions
• Future work


Motivation

• Needed something to simplify the following, for an arbitrary number of regions to be timed:

struct timeval tp1, tp2;
double delta, time = 0.;
for (i = 0; i < 10; i++) {
  gettimeofday (&tp1, 0);
  compute ();
  gettimeofday (&tp2, 0);
  delta = (tp2.tv_sec - tp1.tv_sec) + 1.e-6*(tp2.tv_usec - tp1.tv_usec);
  time += delta;
}
printf ("compute took %g seconds\n", time);


Solution

#include <gptl.h>
...
ret = GPTLinitialize ();
ret = GPTLstart ("total");
for (i = 0; i < 10; i++) {
  ret = GPTLstart ("compute");
  compute ();
  ret = GPTLstop ("compute");
  ...
}
ret = GPTLstop ("total");
ret = GPTLpr (0);


Results

• Output file timing.0 contains:

         Called  Wallclock
total         1      3.983
compute      10      3.877
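The bookkeeping behind these two columns can be sketched as a table of named timers: each start records a timestamp, each stop adds the elapsed time and increments the call count. This is a minimal illustration in C, not GPTL's implementation; the names timer_start, timer_stop, timer_get and timer_print are hypothetical, lookup is a linear search rather than a hash table, and there is no thread support.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_TIMERS 64  /* hypothetical fixed capacity for this sketch */

typedef struct {
  char   name[32];
  long   ncalls;   /* completed start/stop pairs */
  double total;    /* accumulated wallclock seconds */
  double t0;       /* time of the most recent start */
  int    running;
} Timer;

static Timer timers[MAX_TIMERS];
static int ntimers = 0;

static double wallclock (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + 1.e-9 * (double) ts.tv_nsec;
}

/* Find a timer by name; if create is nonzero, add it on first use. */
static Timer *lookup (const char *name, int create)
{
  for (int i = 0; i < ntimers; i++)
    if (strcmp (timers[i].name, name) == 0)
      return &timers[i];
  if (!create || ntimers == MAX_TIMERS)
    return NULL;
  Timer *t = &timers[ntimers++];
  snprintf (t->name, sizeof t->name, "%s", name);
  return t;
}

int timer_start (const char *name)
{
  Timer *t = lookup (name, 1);
  if (t == NULL || t->running)
    return -1;  /* table full, or already running */
  t->running = 1;
  t->t0 = wallclock ();
  return 0;
}

int timer_stop (const char *name)
{
  Timer *t = lookup (name, 0);
  if (t == NULL || !t->running)
    return -1;  /* unknown, or not running */
  t->total += wallclock () - t->t0;
  t->ncalls++;
  t->running = 0;
  return 0;
}

/* Report the call count and accumulated time for one timer. */
int timer_get (const char *name, long *ncalls, double *total)
{
  Timer *t = lookup (name, 0);
  if (t == NULL)
    return -1;
  *ncalls = t->ncalls;
  *total = t->total;
  return 0;
}

/* Print a Called/Wallclock table, one row per timer in creation order. */
void timer_print (FILE *fp)
{
  fprintf (fp, "%-10s %8s %10s\n", "", "Called", "Wallclock");
  for (int i = 0; i < ntimers; i++)
    fprintf (fp, "%-10s %8ld %10.3f\n",
             timers[i].name, timers[i].ncalls, timers[i].total);
}
```

Driving it the way the Solution code does (timer_start ("total"), ten timer_start/timer_stop ("compute") pairs, timer_stop ("total"), then timer_print (stdout)) prints a two-row table of the same shape as timing.0.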


Most of the API

#include <gptl.h>
...
ret = GPTLsetoption (PAPI_FP_OPS, 1);    // Enable a PAPI counter
ret = GPTLsetutr (GPTLnanotime);         // Better wallclock timer
...
ret = GPTLinitialize ();                 // Once per process
ret = GPTLstart ("total");               // Start a timer
ret = GPTLstart ("compute");             // Start another timer
compute ();                              // Do work
ret = GPTLstop ("compute");              // Stop a timer
...
ret = GPTLstop ("total");                // Stop a timer
ret = GPTLpr (iam);                      // Print results
ret = GPTLpr_summary (MPI_COMM_WORLD);   // Print results summary
                                         // across threads and tasks


Set options via Fortran namelist

• Avoid recoding/recompiling by using Fortran namelist option:

call gptlprocess_namelist ('my_namelist', unitno, ret)

• Example contents of 'my_namelist':

&gptlnl
 utr       = 'nanotime'
 eventlist = 'GPTL_CI','PAPI_FP_OPS'
/


Auto-instrumentation

• Works with the Intel, GNU, PathScale, PGI, and AIX compilers

# icc -g -finstrument-functions *.c -lgptl
# gfortran -g -finstrument-functions *.f90 -lgptl
# pgcc -g -Minstrument:functions *.c -lgptl

• The compiler then automatically inserts a call at every function entry:
__cyg_profile_func_enter (void *this_fn, void *call_site);

• And at every function exit:
__cyg_profile_func_exit (void *this_fn, void *call_site);


Auto-instrumentation (cont’d)

• GPTL handles these entry points with:

void __cyg_profile_func_enter (void *this_fn, void *call_site)
{
  (void) GPTLstart_instr (this_fn);
}

void __cyg_profile_func_exit (void *this_fn, void *call_site)
{
  (void) GPTLstop_instr (this_fn);
}
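To show the mechanism without GPTL, here is a minimal sketch of hooks that merely count entries and exits per function address, standing in for GPTLstart_instr/GPTLstop_instr. The hooks and their helpers must be excluded from instrumentation (the GCC/Clang no_instrument_function attribute), or every hook call would trigger the hooks again. The address table and the query helper calls_for are hypothetical. Compiled without -finstrument-functions nothing calls the hooks automatically, so they can also be invoked directly.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang: keep the hooks (and their helpers) out of instrumentation,
   otherwise each hook call would itself trigger the hooks. */
#define NOINSTR __attribute__ ((no_instrument_function))

#define NSLOTS 1024  /* hypothetical table size; must be a power of two */

/* One slot per function address seen: entries and exits observed. */
static struct { void *fn; long enters, exits; } slots[NSLOTS];

/* Open addressing with linear probing; returns -1 if the table is full. */
NOINSTR static long slot_for (void *fn)
{
  size_t i = ((uintptr_t) fn >> 4) & (NSLOTS - 1);
  for (int probe = 0; probe < NSLOTS; probe++) {
    if (slots[i].fn == fn || slots[i].fn == NULL) {
      slots[i].fn = fn;
      return (long) i;
    }
    i = (i + 1) & (NSLOTS - 1);
  }
  return -1;
}

/* Called by compiler-generated code at every instrumented function entry. */
NOINSTR void __cyg_profile_func_enter (void *this_fn, void *call_site)
{
  (void) call_site;
  long i = slot_for (this_fn);
  if (i >= 0)
    slots[i].enters++;   /* a real profiler would start a timer here */
}

/* Called at every instrumented function exit. */
NOINSTR void __cyg_profile_func_exit (void *this_fn, void *call_site)
{
  (void) call_site;
  long i = slot_for (this_fn);
  if (i >= 0)
    slots[i].exits++;    /* ...and stop it here */
}

/* Hypothetical query helper: number of entries recorded for fn. */
NOINSTR long calls_for (void *fn)
{
  for (int i = 0; i < NSLOTS; i++)
    if (slots[i].fn == fn)
      return slots[i].enters;
  return 0;
}
```

Linking this file into an application built with gcc -g -finstrument-functions makes the compiler call the hooks for every instrumented function; the recorded addresses then still have to be mapped back to names, which is the job hex2name.pl does for GPTL's output.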


Auto-instrumentation (cont’d)

• After running the app, convert addresses to names with:

hex2name.pl [-demangle] <executable> <timing_file>


Dynamic call tree from auto-instrumentation

Stats for thread 0:
                       Called  Wallclock      max      min    FP_OPS
  total                     1     64.021   64.021   64.021  3.50e+08
  HPCC_Init                11      0.157    0.157    0.000     95799
* HPL_pdinfo              120      0.019    0.018    0.000     96996
* HPL_all_reduce            7      0.043    0.036    0.000       448
* HPL_broadcast            21      0.041    0.036    0.000       126
  HPL_pdlamch               2      0.004    0.004    0.000     94248
* HPL_fprintf             240      0.001    0.000    0.000      1200
  HPCC_InputFileInit       41      0.001    0.001    0.000       194
  ReadInts                  2      0.000    0.000    0.000        12
  PTRANS                   21     22.667   22.667    0.000  4.19e+07
  MaxMem                    5      0.000    0.000    0.000       796
* iceil_                  132      0.000    0.000    0.000       792
* ilcm_                    14      0.000    0.000    0.000        84
  param_dump               18      0.000    0.000    0.000        84
  Cblacs_get                5      0.000    0.000    0.000        30
  Cblacs_gridmap           35      0.005    0.001    0.000       225
* Cblacs_pinfo              7      0.000    0.000    0.000        40
* Cblacs_gridinfo          60      0.000    0.000    0.000       260
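One way to record a dynamic call tree like this is to keep a stack of the regions currently running: whenever a region starts, the region on top of the stack is its parent for that call. The sketch below records each distinct parent per region, so that regions reached from more than one parent can be flagged. It is an illustration under stated assumptions (single thread, fixed-size arrays, properly nested start/stop calls); tree_start, tree_stop and tree_nparents are hypothetical names, not GPTL functions.

```c
#include <string.h>

#define MAXREG   32   /* hypothetical limits for this sketch */
#define MAXDEPTH 32
#define MAXPAR    8

typedef struct {
  const char *name;       /* assumed to outlive the table (e.g. a literal) */
  int parents[MAXPAR];    /* distinct parent indices; -1 means "root" */
  int nparents;
} Region;

static Region regions[MAXREG];
static int nregions = 0;
static int stack[MAXDEPTH];   /* indices of regions currently running */
static int depth = 0;

static int region_index (const char *name)
{
  for (int i = 0; i < nregions; i++)
    if (strcmp (regions[i].name, name) == 0)
      return i;
  if (nregions == MAXREG)
    return -1;
  regions[nregions].name = name;
  return nregions++;
}

/* Push a region; whatever is on top of the stack is its parent. */
int tree_start (const char *name)
{
  int r = region_index (name);
  if (r < 0 || depth == MAXDEPTH)
    return -1;
  int parent = depth > 0 ? stack[depth - 1] : -1;
  Region *reg = &regions[r];
  int known = 0;
  for (int p = 0; p < reg->nparents; p++)
    if (reg->parents[p] == parent)
      known = 1;
  if (!known && reg->nparents < MAXPAR)
    reg->parents[reg->nparents++] = parent;
  stack[depth++] = r;
  return 0;
}

/* Pop a region; it must be the one most recently started. */
int tree_stop (const char *name)
{
  if (depth == 0 || strcmp (regions[stack[depth - 1]].name, name) != 0)
    return -1;
  depth--;
  return 0;
}

/* Number of distinct parents seen for a region (0 if never started). */
int tree_nparents (const char *name)
{
  for (int i = 0; i < nregions; i++)
    if (strcmp (regions[i].name, name) == 0)
      return regions[i].nparents;
  return 0;
}
```

A region started under two different parents ends up with tree_nparents () == 2, which is the condition a printer could use to mark it; the stack also gives the nesting depth needed to indent the tree.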


MPI Auto-instrumentation

• To enable MPI auto-instrumentation, set this in macros.make:
  – ENABLE_PMPI=yes


MPI Auto-instrumentation (cont’d)

Stats for thread 0:
                         Called Wallclock       max       min AVG_MPI_BYTES
MPI_Init_thru_Finalize        1  8.70e-04  8.70e-04  8.70e-04             -
MPI_Send                      1  5.10e-05  5.10e-05  5.10e-05     4.096e+03
MPI_Recv                      3  2.63e-04  2.32e-04  1.50e-05     4.096e+03
MPI_Ssend                     1  2.40e-05  2.40e-05  2.40e-05     4.096e+03
MPI_Issend                    1  1.00e-05  1.00e-05  1.00e-05     4.096e+03
MPI_Sendrecv                  1  1.80e-05  1.80e-05  1.80e-05     8.192e+03
MPI_Irecv                     2  1.00e-05  9.00e-06  1.00e-06     4.096e+03
MPI_Isend                     2  6.00e-06  4.00e-06  2.00e-06     4.096e+03
MPI_Wait                      2  1.80e-05  1.70e-05  1.00e-06             -
MPI_Waitall                   2  1.10e-05  1.10e-05  0.00e+00             -
MPI_Barrier                   1  2.20e-05  2.20e-05  2.20e-05             -
MPI_Bcast                     1  9.00e-06  9.00e-06  9.00e-06     4.096e+03


Induced Overhead

• GPTL estimates its own overhead:

Overhead of 1 GPTLstart or GPTLstop call = 1.28e-07 seconds
Components are as follows:
Fortran layer:              1.0e-09 =  1.5% of total
Get thread number:          1.7e-08 = 13.3% of total
Generate hash index:        1.9e-08 = 14.8% of total
Find hashtable entry:       1.5e-08 = 11.7% of total
Underlying timing routine:  7.0e-08 = 53.2% of total
Misc start/stop functions:  7.0e-09 =  5.5% of total
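A common way to estimate a fixed per-call cost, such as the "Underlying timing routine" line, is to call the routine many times in a tight loop and divide the elapsed time by the call count. The sketch below does this for clock_gettime (CLOCK_MONOTONIC) as a stand-in timing routine; it is not necessarily how GPTL computes its own estimate, and per_call_overhead is a hypothetical name. Results vary by machine and clock source.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <time.h>

static double seconds (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + 1.e-9 * (double) ts.tv_nsec;
}

/* Average cost in seconds of one call to seconds(), over ncalls calls. */
double per_call_overhead (long ncalls)
{
  volatile double sink = 0.;   /* keeps the calls from being optimized away */
  double t0 = seconds ();
  for (long i = 0; i < ncalls; i++)
    sink += seconds ();
  double t1 = seconds ();
  (void) sink;
  return (t1 - t0) / (double) ncalls;
}
```

Multiplying the per-call figure by the number of timer calls in a run gives a rough estimate of how much time the instrumentation itself adds.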


Induced Overhead (cont’d)

Stats for thread 0:
              Called Wallclock       max       min  self_OH parent_OH
total              1     0.910     0.910     0.910    0.000     0.000
1x1e7              1     0.022     0.022     0.022    0.000     0.000
10x1e6            10     0.015  1.55e-03  1.36e-03    0.000     0.000
100x1e5          100     0.014  1.80e-04  1.11e-04    0.000     0.000
1000x1e4        1000     0.015  2.01e-05  1.11e-05    0.000     0.000
1e4x1000       10000     0.015  1.04e-05  1.12e-06    0.000     0.001
1e5x100       100000     0.015  9.05e-06  1.22e-07    0.001     0.006
1e6x10       1.0e+06     0.026  8.74e-06  1.67e-08    0.011     0.062
1e7x1        1.0e+07     0.180  8.74e-06  1.11e-08    0.108     0.618


Underlying timing routine

• Default is gettimeofday()
• On Intel architectures, switch to a register read, which has better granularity and much lower overhead:
  – C or Fortran: GPTLsetutr (GPTLnanotime);
  – Fortran: utr = 'nanotime' in namelist &gptlnl
  – May cause problems on machines with a variable clock rate (e.g. "turbo mode")
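Granularity here means the smallest time step a timer can report. A quick way to observe it is to spin until two consecutive readings differ and keep the smallest step over a few trials. The sketch compares gettimeofday, whose interface cannot report less than one microsecond, with POSIX clock_gettime (CLOCK_MONOTONIC), which reports nanoseconds; clock_gettime is a portable stand-in here, not the register read behind GPTLnanotime, and the observed_tick_* names are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* exposes gettimeofday and clock_gettime */
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Smallest step seen between consecutive gettimeofday readings, over
   `trials` attempts (each attempt spins until the reading changes). */
double observed_tick_gettimeofday (int trials)
{
  double best = 1.e30;
  for (int k = 0; k < trials; k++) {
    struct timeval a, b;
    gettimeofday (&a, NULL);
    do {
      gettimeofday (&b, NULL);
    } while (b.tv_sec == a.tv_sec && b.tv_usec == a.tv_usec);
    double d = (double) (b.tv_sec - a.tv_sec)
             + 1.e-6 * (double) (b.tv_usec - a.tv_usec);
    if (d < best)
      best = d;
  }
  return best;
}

/* Same measurement for clock_gettime (CLOCK_MONOTONIC). */
double observed_tick_clock_gettime (int trials)
{
  double best = 1.e30;
  for (int k = 0; k < trials; k++) {
    struct timespec a, b;
    clock_gettime (CLOCK_MONOTONIC, &a);
    do {
      clock_gettime (CLOCK_MONOTONIC, &b);
    } while (b.tv_sec == a.tv_sec && b.tv_nsec == a.tv_nsec);
    double d = (double) (b.tv_sec - a.tv_sec)
             + 1.e-9 * (double) (b.tv_nsec - a.tv_nsec);
    if (d < best)
      best = d;
  }
  return best;
}
```

On typical Linux hardware the clock_gettime step comes out well under a microsecond, while the gettimeofday step can never be smaller than 1e-6 s; the exact numbers depend on the system's clock source.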


PAPI details handled by GPTL

• This call:

GPTLsetoption (PAPI_FP_OPS, 1);

• Implies:

PAPI_library_init (PAPI_VER_CURRENT);
PAPI_thread_init ((unsigned long (*)(void)) pthread_self);
PAPI_create_eventset (&EventSet[t]);   // EventSet[t]: event set for thread t
PAPI_assign_eventset_component (EventSet[t], 0);
PAPI_multiplex_init ();
PAPI_set_multiplex (EventSet[t]);
PAPI_add_event (EventSet[t], PAPI_FP_OPS);
PAPI_start (EventSet[t]);

• PAPI multiplexing is handled automatically and enabled only if needed


timing.summary file generated by GPTLpr_summary(comm)

name            ncalls  nranks  mean_time  std_dev  wallmax (rank)  wallmin (rank)
Diag              1002       2      4.371    3.453       6.812 (0)       1.929 (1)
MainLoop             2       2     53.364    0.007      53.369 (0)      53.359 (1)
ZeroTendencies     200       2      0.086    0.030       0.107 (0)       0.065 (1)
SaveFlux           200       2      0.149    0.048       0.183 (0)       0.115 (1)
RHStendencies      800       2      0.421    0.148       0.526 (0)       0.317 (1)
Vdtotal           1600       2     25.702    1.361      26.665 (0)      24.740 (1)
Vdm                800       2     23.851    1.118      24.642 (0)      23.060 (1)
vdmfinish          800       2      2.794    1.010       3.508 (0)       2.080 (1)
Vdn                800       2      1.848    0.246       2.022 (0)       1.674 (1)
Flux               800       2      4.818    1.135       5.620 (1)       4.015 (0)
Force              800       2      1.901    0.110       1.979 (1)       1.823 (0)
RKdiff             800       2      1.247    0.415       1.540 (0)       0.953 (1)
TimeDiff           800       2      0.736    0.182       0.865 (0)       0.608 (1)
Sponge             800       2      0.364    0.092       0.429 (0)       0.299 (1)
pre_trisol         200       2      0.112    0.027       0.131 (0)       0.093 (1)
Trisol             200       2      0.667    0.078       0.722 (1)       0.612 (0)
post_trisol        200       2      0.082    0.012       0.090 (0)       0.073 (1)
Vdmints            200       2      3.603    0.135       3.699 (0)       3.508 (1)
Pstadv             200       2      0.849    0.044       0.880 (1)       0.817 (0)
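The mean_time and std_dev columns can be checked by hand. For Diag the two ranks report 6.812 s and 1.929 s, so the mean is 4.3705 s and the sample standard deviation (dividing by n - 1) is |6.812 - 1.929| / sqrt(2) ≈ 3.453 s, matching the table; MainLoop and ZeroTendencies check out the same way. That suggests the sample (n - 1) form, though this is inferred from the numbers rather than from the GPTL source. The sketch below computes the same per-region statistics from an array of per-rank times; summarize and RegionSummary are hypothetical names and no MPI is involved.

```c
#include <assert.h>

/* Per-region statistics across ranks, like one row of timing.summary. */
typedef struct {
  double mean, std_dev;    /* std_dev uses the sample (n - 1) form */
  double wallmax, wallmin;
  int rankmax, rankmin;    /* ranks where the max and min occurred */
} RegionSummary;

/* Square root by Newton's method, so this sketch needs no -lm. */
static double newton_sqrt (double x)
{
  if (x <= 0.)
    return 0.;
  double y = x > 1. ? x : 1.;   /* start at or above sqrt(x) */
  for (int i = 0; i < 100; i++)
    y = 0.5 * (y + x / y);
  return y;
}

/* times[r] is one region's wallclock time on rank r, r = 0 .. nranks-1. */
RegionSummary summarize (const double *times, int nranks)
{
  assert (nranks > 0);
  RegionSummary s = { 0., 0., times[0], times[0], 0, 0 };
  double sum = 0.;
  for (int r = 0; r < nranks; r++) {
    sum += times[r];
    if (times[r] > s.wallmax) { s.wallmax = times[r]; s.rankmax = r; }
    if (times[r] < s.wallmin) { s.wallmin = times[r]; s.rankmin = r; }
  }
  s.mean = sum / nranks;
  if (nranks > 1) {
    double ss = 0.;
    for (int r = 0; r < nranks; r++)
      ss += (times[r] - s.mean) * (times[r] - s.mean);
    s.std_dev = newton_sqrt (ss / (nranks - 1));
  }
  return s;
}
```

With times {6.812, 1.929} this reproduces the Diag row (mean 4.371, std_dev 3.453, max on rank 0, min on rank 1); with a single rank the standard deviation is left at zero.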


Utility functions

• To print current memory usage at any point in your code:
  – ret = GPTLprint_memusage ("user string");

• Produces, e.g.:
  – GPTLprint_memusage: user string size=19.5 MB rss=2.1 MB datastack=1.5 MB

• To auto-profile current memory usage (at both function entry and exit points):
  – ret = GPTLsetoption (GPTLdopr_memusage, 1);

• To retrieve wallclock, usr, and sys timestamps in user code:
  – ret = GPTLstamp (&wallclock, &usr, &sys);
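On Linux, figures like size, rss, and datastack can be read from /proc/self/statm, whose first six fields are total program size, resident set, shared pages, text, library (unused), and data plus stack, all counted in pages. Whether GPTLprint_memusage reads exactly this file is an assumption here; the sketch is Linux-only, and read_memusage and print_memusage are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L   /* for sysconf */
#include <stdio.h>
#include <unistd.h>

/* Fill size, rss and data+stack in MB from /proc/self/statm (Linux only).
   Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_memusage (double *size_mb, double *rss_mb, double *datastack_mb)
{
  unsigned long size, resident, shared, text, lib, data;
  FILE *fp = fopen ("/proc/self/statm", "r");
  if (fp == NULL)
    return -1;
  int n = fscanf (fp, "%lu %lu %lu %lu %lu %lu",
                  &size, &resident, &shared, &text, &lib, &data);
  fclose (fp);
  if (n != 6)
    return -1;
  double mb_per_page = (double) sysconf (_SC_PAGESIZE) / (1024. * 1024.);
  *size_mb = (double) size * mb_per_page;
  *rss_mb = (double) resident * mb_per_page;
  *datastack_mb = (double) data * mb_per_page;
  return 0;
}

/* Print one line in the style of the example output above. */
int print_memusage (const char *str)
{
  double size, rss, datastack;
  if (read_memusage (&size, &rss, &datastack) != 0)
    return -1;
  printf ("%s size=%.1f MB rss=%.1f MB datastack=%.1f MB\n",
          str, size, rss, datastack);
  return 0;
}
```

Calling print_memusage at the start and end of a region, as the GPTLdopr_memusage option does automatically at function entry and exit, shows how much that region grows the resident set.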


Future Work

• XML output
• Port to GPU
• Dynamic thread allocation for the PTHREADS option
• Autoconf?


Source and Documentation

• Source: https://github.com/jmrosinski/GPTL
  – git clone git@github.com:jmrosinski/GPTL.git

• Web-based documentation:
  – jmrosinski.github.io/GPTL

• Feel free to email me: [email protected]