Post on 19-Dec-2015
Scalability Study of S3D using TAU
Sameer Shende
TAU Performance SystemS3D Scalability Study 2
Acknowledgements
Alan Morris [UO]
Kevin Huck [UO]
Allen D. Malony [UO]
Kenneth Roche [ORNL]
Bronis R. de Supinski [LLNL]
The performance data presented here is available at:
http://www.cs.uoregon.edu/research/tau/s3d
TAU Parallel Performance System
http://www.cs.uoregon.edu/research/tau/
Multi-level performance instrumentation
  Multi-language automatic source instrumentation
Flexible and configurable performance measurement
Widely-ported parallel performance profiling system
  Computer system architectures and operating systems
  Different programming languages and compilers
Support for multiple parallel programming paradigms
  Multi-threading, message passing, mixed-mode, hybrid
Scalability Study
Harness testcase
Platform: Jaguar Cray XT3 at ORNL
1p, 8p, 64p, 512p
Goal: to evaluate scaling properties of code regions
Scalability of MPI operations
Introduction to ParaProf: Main Window
click left mouse button
click right mouse button
% paraprof *.ppk
load all 1p, 8p, 64p, 512p profile datasets together
ParaProf: MFLOPs sorted by Exclusive Time
Source Code View
Comparison Window: Inclusive Time
Comparing Level 1 Data Cache Misses
CPU Resource Stalls
ParaProf: 3D view for 512 cpus - Jagged Edges!
MPI_Wait - Jagged Edges Seen in 3D Window
pattern repeats every 8 cpus!
512 cpus
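One way to check a pattern like this outside the 3D view is to fold the per-rank MPI_Wait times by rank modulo the suspected period and compare the group means. The sketch below assumes the per-rank MPI_Wait time has already been exported to a Python list; the function name and the synthetic values are illustrative and are not taken from the S3D data.

```python
from collections import defaultdict


def fold_by_period(per_rank_values, period):
    """Average per-rank values grouped by (rank % period)."""
    if period < 1 or period > len(per_rank_values):
        raise ValueError("period must be between 1 and the number of ranks")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, value in enumerate(per_rank_values):
        slot = rank % period
        sums[slot] += value
        counts[slot] += 1
    return [sums[s] / counts[s] for s in range(period)]


# Synthetic stand-in for 512 per-rank MPI_Wait times (seconds): every
# eighth rank waits longer. These numbers are made up for illustration.
wait_times = [12.0 if rank % 8 == 0 else 5.0 for rank in range(512)]

folded = fold_by_period(wait_times, period=8)
spread = max(folded) - min(folded)
print(folded)
print("spread across slots:", spread)
```

A flat folded profile would indicate no structure at that period; a clear gap between slots is consistent with a pattern tied to the rank's position within each group of 8.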
MPI_Wait - Histogram (Bins) View
Comparing MPI_Wait
MPI_Wait time increases steadily with processors!
PerfDMF: Performance Data Mgmt. Framework
PerfExplorer - Comparative Analysis
Relative speedup, efficiency
  total runtime, by event, one event, by phase
Breakdown of total runtime
Group fraction of total runtime
Correlating events to total runtime
Timesteps per second
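As a rough illustration of the first item, relative speedup and efficiency can be computed from total runtimes at each processor count, using the smallest run as the baseline. The definitions below are one common strong-scaling convention and are an assumption here, not necessarily the exact formulas PerfExplorer applies; the runtimes are placeholder values, not measurements from this study.

```python
def relative_speedup(runtimes):
    """Map processor count -> T(base) / T(p), base = smallest count."""
    base = min(runtimes)
    return {p: runtimes[base] / t for p, t in sorted(runtimes.items())}


def relative_efficiency(runtimes):
    """Map processor count -> speedup(p) divided by the ideal p / base."""
    base = min(runtimes)
    speedup = relative_speedup(runtimes)
    return {p: s / (p / base) for p, s in speedup.items()}


# Placeholder total runtimes in seconds, keyed by processor count.
runtimes = {1: 800.0, 8: 110.0, 64: 16.0, 512: 3.2}
speedups = relative_speedup(runtimes)
for p, eff in relative_efficiency(runtimes).items():
    print(f"{p:4d} procs: speedup {speedups[p]:7.2f}, efficiency {eff:.3f}")
```

The same two functions can be applied to the time of a single event instead of the total runtime to get the "by event" and "one event" variants.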
PerfExplorer
TAU’s PerfDMF database
S3D
PerfExplorer: Select Experiment & Analysis
Relative Efficiency By Event
Relative Efficiency For S3D - Weak Scaling
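For a weak-scaling run, where the work per processor is held fixed, ideal behavior is a constant runtime as processors are added, so relative efficiency is commonly taken as the baseline time divided by the time at p processors. The sketch below uses that convention as an assumption; the timings are placeholders rather than S3D results.

```python
def weak_scaling_efficiency(runtimes):
    """Map processor count -> T(base) / T(p) for fixed work per processor."""
    base = min(runtimes)
    return {p: runtimes[base] / t for p, t in sorted(runtimes.items())}


# Placeholder runtimes in seconds; work per processor is assumed constant.
runtimes = {1: 100.0, 8: 104.0, 64: 112.0, 512: 125.0}
efficiency = weak_scaling_efficiency(runtimes)
for p, e in efficiency.items():
    print(f"{p:4d} procs: {e:.3f}")
```

Under this convention a value of 1.0 means the run took no longer than the baseline, and values below 1.0 measure how much time was lost to overheads such as communication.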
Relative Speedup
Relative Efficiency & Speedup for One Event
Data Mining: Event Correlation to Total Time
r = 1 implies direct correlation
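Assuming the r on this slide is the Pearson correlation coefficient (the slide does not say which statistic is used), the correlation between one event's time and the total runtime across the runs can be sketched as follows. The event and total values are made-up placeholders.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-run values (seconds) for the 1p, 8p, 64p, 512p runs.
event_time = [2.0, 6.0, 11.0, 20.0]
total_time = [100.0, 104.0, 109.0, 118.0]  # here total = event + 98

print(round(pearson_r(event_time, total_time), 6))
```

Because the placeholder total is the event time plus a constant, the two series move together exactly and r comes out at 1 up to rounding; an event whose time is unrelated to the total would give a value near 0.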
MPI Scaling
Total Runtime Breakdown by Events
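A breakdown like this amounts to dividing each event's exclusive time by the sum over all events. A minimal sketch, assuming exclusive times per event are available as a dictionary; the event names other than MPI_Wait and all values are hypothetical:

```python
def runtime_fractions(exclusive_times):
    """Return each event's share of the summed exclusive time."""
    total = sum(exclusive_times.values())
    if total <= 0:
        raise ValueError("total exclusive time must be positive")
    return {name: t / total for name, t in exclusive_times.items()}


# Hypothetical exclusive times in seconds for a single run.
exclusive = {"MPI_Wait": 25.0, "loop_a": 50.0, "loop_b": 25.0}
fractions = runtime_fractions(exclusive)
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {100 * frac:5.1f}%")
```

Repeating this for each processor count and comparing the shares shows which events take up a growing part of the runtime as the run scales.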
S3D - Building with TAU
Change name of compiler in build/make.XT3
  ftn => tau_f90.sh
  cc => tau_cc.sh
Set compile time environment variables
  setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
  setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    Selective instrumentation file eliminates instrumentation in lightweight routines
    Pre-process Fortran source code using cpp before compiling
Set runtime environment variables for instrumentation control and PAPI counter selection in job submission script:
  export TAU_THROTTLE=1
  export COUNTER1=GET_TIME_OF_DAY
  export COUNTER2=PAPI_FP_INS
  export COUNTER3=PAPI_L1_DCM
  export COUNTER4=PAPI_RES_STL
  export COUNTER5=PAPI_L2_DCM
Selective Instrumentation in TAU
% cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
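When the exclude list is derived from profile data, for example routines with many calls and little time per call, a file in the format shown above can be generated rather than edited by hand. The sketch below only reproduces the directives that appear on this slide; the helper name and the idea of generating the file from a Python list are illustrative assumptions.

```python
def render_select_file(excluded_routines, instrument_loops=True):
    """Build the text of a selective instrumentation file."""
    lines = ["BEGIN_EXCLUDE_LIST"]
    lines.extend(excluded_routines)
    lines.append("END_EXCLUDE_LIST")
    if instrument_loops:
        # Same directive as on the slide, copied verbatim.
        lines.append("BEGIN_INSTRUMENT_SECTION")
        lines.append('loops routine="#"')
        lines.append("END_INSTRUMENT_SECTION")
    return "\n".join(lines) + "\n"


text = render_select_file(["MCADIF", "GETRATES", "THERMCHEM_M::MIXCP"])
print(text, end="")
```

The returned string can be written to select.tau, the file name that the -optTauSelectFile option in TAU_OPTIONS points to.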
Getting Access to TAU on Jaguar
set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  Makefile.tau-mpi-pdt-pgi (flat profile)
  Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
Binaries of S3D can be found in: ~sameer/scratch/S3D-BINARIES
  withtau
    papi, multiplecounters, mpi, pdt, pgi options
  without_tau
Concluding Discussion
Performance tools must be used effectively
More intelligent performance systems for productive use
  Evolve to application-specific performance technology
  Deal with scale by “full range” performance exploration
  Autonomic and integrated tools
  Knowledge-based and knowledge-driven process
Performance observation methods do not necessarily need to change in a fundamental sense
  More automatically controlled and efficiently used
Develop next-generation tools and deliver to community
Open source with support by ParaTools, Inc.
http://www.cs.uoregon.edu/research/tau
Support Acknowledgements
Department of Energy (DOE)
  Office of Science
  LLNL, LANL, ORNL, ASC PERI