Post on 19-Dec-2015

Scalability Study of S3D using TAU

Sameer Shende

tau-team@cs.uoregon.edu

TAU Performance System / S3D Scalability Study

Acknowledgements

Alan Morris [UO]
Kevin Huck [UO]
Allen D. Malony [UO]
Kenneth Roche [ORNL]
Bronis R. de Supinski [LLNL]

The performance data presented here is available at:

http://www.cs.uoregon.edu/research/tau/s3d

TAU Parallel Performance System

http://www.cs.uoregon.edu/research/tau/
Multi-level performance instrumentation
  Multi-language automatic source instrumentation
Flexible and configurable performance measurement
Widely-ported parallel performance profiling system
  Computer system architectures and operating systems
  Different programming languages and compilers
Support for multiple parallel programming paradigms
  Multi-threading, message passing, mixed-mode, hybrid

Scalability Study

Harness testcase
Platform: Jaguar Cray XT3 at ORNL
  1p, 8p, 64p, 512p
Goal: to evaluate scaling properties of code regions
  Scalability of MPI operations

Introduction to ParaProf: Main Window

click left mouse button

click right mouse button

% paraprof *.ppk
Loads all 1p, 8p, 64p, and 512p profile datasets together.

ParaProf: MFLOPs sorted by Exclusive Time

Source Code View

Comparison Window: Inclusive Time

Comparing Level 1 Data Cache Misses

CPU Resource Stalls

ParaProf: 3D view for 512 cpus - Jagged Edges!

MPI_Wait - Jagged Edges Seen in 3D Window

pattern repeats every 8 cpus!

512 cpus
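
A repeat length like the one noted on this slide can also be checked numerically from per-rank MPI_Wait times. The following is a minimal sketch, not part of TAU or ParaProf: the function names, the tolerance, and the synthetic sawtooth data are assumptions made for illustration and are not measurements from the S3D runs.

```python
def lag_mismatch(values, lag):
    """Mean absolute difference between each value and the one `lag` ranks later."""
    pairs = list(zip(values, values[lag:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def smallest_period(values, max_lag, tol=0.05):
    """Return the smallest lag at which the series nearly repeats, or None.

    "Nearly repeats" means the mean mismatch at that lag is at most
    `tol` times the overall spread (max - min). `tol` is an arbitrary
    illustrative choice, not a TAU parameter.
    """
    spread = max(values) - min(values)
    for lag in range(1, min(max_lag, len(values) - 1) + 1):
        if lag_mismatch(values, lag) <= tol * spread:
            return lag
    return None


# Synthetic (hypothetical) per-rank wait times with a sawtooth every 8 ranks.
wait_times = [10.0 + (rank % 8) for rank in range(512)]
print(smallest_period(wait_times, max_lag=32))  # prints 8
```

With real data the per-rank values would come from the exported profile rather than a formula, and the tolerance would likely need tuning for noise.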

MPI_Wait - Histogram (Bins) View

Comparing MPI_Wait

MPI_Wait time increases steadily with processors!

PerfDMF: Performance Data Mgmt. Framework

PerfExplorer - Comparative Analysis

Relative speedup, efficiency
  total runtime, by event, one event, by phase
Breakdown of total runtime
Group fraction of total runtime
Correlating events to total runtime
Timesteps per second
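
Relative speedup and efficiency can be sketched with the usual textbook definitions, taking the smallest run as the baseline. This is an illustration only: PerfExplorer's exact formulas may differ, the function names are hypothetical, and the runtimes below are made-up numbers, not results from this study.

```python
def relative_speedup(times, base):
    """times maps processor count -> runtime (seconds); speedup vs. the base run."""
    return {p: times[base] / t for p, t in times.items()}


def strong_scaling_efficiency(times, base):
    """Fixed total problem size: speedup divided by the increase in processors."""
    return {p: (times[base] / t) / (p / base) for p, t in times.items()}


def weak_scaling_efficiency(times, base):
    """Fixed work per processor: ideal runtime stays flat, so efficiency = T_base / T_p."""
    return {p: times[base] / t for p, t in times.items()}


# Hypothetical weak-scaling runtimes for the 1p, 8p, 64p, 512p configurations.
runtimes = {1: 100.0, 8: 110.0, 64: 125.0, 512: 160.0}
for procs, eff in weak_scaling_efficiency(runtimes, base=1).items():
    print(f"{procs:4d}p  relative efficiency = {eff:.3f}")
```

The same functions apply per event by passing that event's time at each processor count instead of the total runtime.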

PerfExplorer

TAU’s PerfDMF database

S3D

PerfExplorer: Select Experiment & Analysis

Relative Efficiency By Event

Relative Efficiency For S3D - Weak Scaling

Relative Speedup

Relative Efficiency & Speedup for One Event

Data Mining: Event Correlation to Total Time

r = 1 implies direct correlation
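
The r value here is read as a Pearson correlation coefficient between an event's time and the total runtime across the runs. A minimal sketch of that calculation follows; the function name and the sample values are hypothetical, and PerfExplorer's own implementation is not shown in these slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-run values (1p, 8p, 64p, 512p): one event's time vs. total time.
event_time = [2.0, 9.0, 21.0, 55.0]
total_time = [100.0, 110.0, 125.0, 160.0]
print(f"r = {pearson_r(event_time, total_time):.3f}")
```

An r near 1 means the event's time rises in step with total runtime, r near -1 means it falls as total runtime rises, and r near 0 means no linear relationship.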

MPI Scaling

Total Runtime Breakdown by Events

S3D - Building with TAU

Change name of compiler in build/make.XT3
  ftn => tau_f90.sh
  cc => tau_cc.sh
Set compile time environment variables
  setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
  setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    Selective instrumentation file eliminates instrumentation in lightweight routines
    Pre-process Fortran source code using cpp before compiling
Set runtime environment variables for instrumentation control and PAPI counter selection in job submission script:
  export TAU_THROTTLE=1
  export COUNTER1=GET_TIME_OF_DAY
  export COUNTER2=PAPI_FP_INS
  export COUNTER3=PAPI_L1_DCM
  export COUNTER4=PAPI_RES_STL
  export COUNTER5=PAPI_L2_DCM

Selective Instrumentation in TAU

% cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST

BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
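
An exclude list like this one is typically built by finding routines that are called very often but do little work per call. The sketch below shows one way to derive such a list from profile summaries. It is an assumption-laden illustration, not TAU's own tooling: the function names, the thresholds, and the routine names and numbers in the example are all hypothetical.

```python
def lightweight_routines(stats, min_calls=100_000, max_usec_per_call=10.0):
    """Pick routines that are called often and are cheap per call.

    stats maps routine name -> (number of calls, total inclusive time in usec).
    Both thresholds are illustrative assumptions.
    """
    return sorted(
        name
        for name, (calls, usec) in stats.items()
        if calls > min_calls and usec / calls < max_usec_per_call
    )


def format_exclude_list(names):
    """Render routine names as a BEGIN_EXCLUDE_LIST / END_EXCLUDE_LIST section."""
    lines = ["BEGIN_EXCLUDE_LIST", *names, "END_EXCLUDE_LIST"]
    return "\n".join(lines) + "\n"


# Hypothetical profile summary (names and numbers are made up).
profile = {
    "TINY_KERNEL": (2_000_000, 4_000_000.0),  # 2 usec per call
    "BIG_SOLVER": (10, 5_000_000.0),          # few calls, heavy
    "MID_ROUTINE": (200_000, 50_000_000.0),   # 250 usec per call
}
print(format_exclude_list(lightweight_routines(profile)), end="")
```

The generated text can be pasted into the exclude section of a selective instrumentation file such as select.tau.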

Getting Access to TAU on Jaguar

set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  Makefile.tau-mpi-pdt-pgi (flat profile)
  Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
Binaries of S3D can be found in: ~sameer/scratch/S3D-BINARIES
  withtau (papi, multiplecounters, mpi, pdt, pgi options)
  without_tau

Concluding Discussion

Performance tools must be used effectively
More intelligent performance systems for productive use
  Evolve to application-specific performance technology
  Deal with scale by “full range” performance exploration
  Autonomic and integrated tools
  Knowledge-based and knowledge-driven process
Performance observation methods do not necessarily need to change in a fundamental sense
  More automatically controlled and efficiently used
Develop next-generation tools and deliver to community
Open source with support by ParaTools, Inc.
http://www.cs.uoregon.edu/research/tau

Support Acknowledgements

Department of Energy (DOE)

  Office of Science
  LLNL, LANL, ORNL, ASC
  PERI