Advanced Computing Technology Center
© 2005 IBM Corporation
The IBM High Performance Computing Toolkit
Guojing Cong
IBM High Performance Computing Toolkit (HPCT)
One consolidated package
Components:
– Hardware Performance Monitor (HPM)
– Simulation Guided Memory Analyzer (SiGMA)
– MPI Profiler (MP_profiler)
– OpenMP Profiler (PompProf)
– Modular I/O Performance Tool (MIO)
– Xprofiler
– GUI integration tool w/ source code traceback (PeekPerf)
– Watson Sparse Matrix Package (WSMP) included
Our Vision
A toolkit that spans various aspects of high performance computing
– CPU profiling, memory behavior analysis, communication profiling, I/O analysis and optimization
Integrated performance monitoring and profiling environment
– one single consistent interface for all components
– enhanced functionality
• Binary instrumentation (without source code modification)
• Dynamic instrumentation
Available on IBM Platforms
– AIX, Linux on POWER (LoP), and Blue Gene
Support Matrix
Component                    | AIX Power               | Linux Power              | Linux JS20               | Linux BG/L
HPMCount & HPMlib            | today (AIX 5L 5.1, 5.3) | Aug/05 (Linux 2.4 & 2.6) | Aug/05 (Linux 2.4 & 2.6) | Aug/05
MP-profiler & MP-tracer      | today (AIX 4.3.3+)      | May/05 (Linux 2.6)       | May/05 (Linux 2.6)       | today
Xprofiler                    | today (AIX 5L 5.1)      | Aug-Sep/05 (Linux 2.6)   | Aug-Sep/05 (Linux 2.6)   | Aug/05
SHMEM & SHMEM-profiler       | today (AIX 5L 5.1)      | N/A                      | N/A                      | N/A
MIO                          | today (AIX 5L 5.1)      | TBT (Linux 2.6)          | TBT (Linux 2.6)          | TBT
PompProfiler                 | today (AIX 5L 5.1)      | N/A                      | N/A                      | N/A
SiGMA                        | today (AIX 4.3.3+)      | Aug-Sep/05 (Linux 2.6)   | Aug-Sep/05 (Linux 2.6)   | N/A
PeekPerf                     | today (AIX 4.3.3+)      | TBT                      | TBT                      | today
Watson Sparse Matrix Package | today (AIX 5L 5.1)      | TBT (Linux 2.6)          | TBT (Linux 2.6)          | N/A
Outline
Xprofiler
HPM
MP Profiler
OpenMP Profiler
MIO
Xprofiler
CPU profiling tool similar to gprof
Can be used to profile both serial and parallel applications
Uses procedure-profiling information to construct a graphical display of the functions within an application
Provides quick access to the profiled data and helps users identify the most CPU-intensive functions
Based on sampling (with support from both the compiler and the kernel)
Charges execution time to source lines and shows disassembly code
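To make the idea of charging sampled time to source lines concrete, here is a minimal sketch in C. It is not Xprofiler's implementation: the names `charge_ticks`, `hottest_line`, and `MAX_LINE` are hypothetical, and the 0.01-second tick is taken from the figure the source code window reports.

```c
#include <assert.h>
#include <stddef.h>

#define TICK_SECONDS 0.01  /* one tick = 0.01 s, per the source code window */
#define MAX_LINE 64        /* hypothetical upper bound on line numbers */

/* Charge one tick to the source line each sample landed on. */
static void charge_ticks(const int *sampled_lines, size_t n_samples,
                         unsigned ticks[MAX_LINE + 1])
{
    for (size_t i = 0; i < n_samples; i++) {
        int line = sampled_lines[i];
        assert(line >= 0 && line <= MAX_LINE);
        ticks[line]++;
    }
}

/* Convert a tick count into seconds of CPU time. */
static double ticks_to_seconds(unsigned t)
{
    return t * TICK_SECONDS;
}

/* Return the line with the most ticks (the lowest line wins ties). */
static int hottest_line(const unsigned ticks[MAX_LINE + 1])
{
    int best = 0;
    for (int line = 1; line <= MAX_LINE; line++)
        if (ticks[line] > ticks[best])
            best = line;
    return best;
}
```

Feeding the sampled line numbers of a run into `charge_ticks` and then calling `hottest_line` gives the kind of per-line view the tool presents graphically.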
Xprofiler: Main Display
Width of a bar: time including called routines
Height of a bar: time excluding called routines
Call arrows labeled with number of calls
Overview window for easy navigation (View Overview)
Xprofiler: Source Code Window
Source code window displays source code with time profile (in ticks = 0.01 sec)
Access
– Select function in main display
– context menu
– Select function in flat profile
– Code Display
– Show Source Code
Xprofiler - Disassembler Code
HPM provides comprehensive reports of hardware events that are critical to performance
– Accurate and low overhead
– Comprehensive
• E.g., number of floating-point instructions executed, cache misses, TLB misses
Derived metrics
– correlate the behavior of the application to one or more of the hardware components
Thread-level support
Components include
– hpmcount, libhpm, hpmstat
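As an illustration of what a derived metric is, the sketch below combines raw event counts into rates and ratios. The struct layout, field names, and the two metrics chosen are assumptions made for this example. They are not HPM's output format or its actual metric definitions.

```c
#include <assert.h>

/* Hypothetical raw counter values for one instrumented code section. */
struct raw_counts {
    unsigned long long fp_instructions;  /* floating-point instructions */
    unsigned long long loads;            /* load instructions */
    unsigned long long l1_misses;        /* L1 data cache misses */
    double wall_seconds;                 /* elapsed time of the section */
};

/* Millions of floating-point instructions per second. */
static double mfp_instr_per_sec(const struct raw_counts *c)
{
    assert(c->wall_seconds > 0.0);
    return (double)c->fp_instructions / c->wall_seconds / 1.0e6;
}

/* Fraction of loads that missed the L1 cache. */
static double l1_miss_ratio(const struct raw_counts *c)
{
    assert(c->loads > 0);
    return (double)c->l1_misses / (double)c->loads;
}
```

A low instruction rate together with a high miss ratio would point at the memory hierarchy rather than the floating-point units, which is the kind of correlation derived metrics are meant to expose.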
HPM Visualization Using PeekPerf
MP_profiler
A set of libraries that collect profiling data for MPI and TurboSHMEM applications
– Implements wrappers using PMPI interface
Report performance metrics, e.g.,
– time used by MPI function calls
– message sizes
Visualization tools help users identify performance bottlenecks
– peekperf maps performance metrics back to the source code
– peekview gives a visual representation of the overall computation and communication pattern of the system
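The wrapper technique mentioned above can be sketched without an MPI installation. In a real PMPI-based profiler the wrapper is named `MPI_Send` and forwards to `PMPI_Send`. In the mock below, the hypothetical `real_send` plays the role of the PMPI entry point and `profiled_send` plays the role of the wrapper. Neither name comes from MP_profiler.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for the library's real entry point (the role PMPI_Send plays
 * in MPI). Hypothetical: it transfers nothing and reports success. */
static int real_send(const void *buf, size_t bytes)
{
    (void)buf;
    (void)bytes;
    return 0;
}

/* Metrics accumulated per wrapped function. */
struct call_profile {
    unsigned long calls;
    unsigned long long bytes;
    double seconds;
};

static struct call_profile send_profile;

/* Wrapper the application calls instead of real_send: it records the
 * call count, message size, and time spent, then forwards the call.
 * clock() measures processor time; a real tool would more likely use a
 * wall-clock timer so that time spent blocked is counted as well. */
static int profiled_send(const void *buf, size_t bytes)
{
    assert(buf != NULL || bytes == 0);
    clock_t start = clock();
    int rc = real_send(buf, bytes);
    send_profile.seconds += (double)(clock() - start) / CLOCKS_PER_SEC;
    send_profile.calls++;
    send_profile.bytes += bytes;
    return rc;
}
```

Because the wrapper has the same signature as the function it replaces, the application needs no source changes beyond relinking, which is what makes the approach attractive for profiling.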
MP_Profiler Visualization Using PeekPerf
MP_Tracer Visualization Using PeekPerf
POMP Profiler (PompProf)
Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application:
– Parallel regions
– OpenMP loops inside a parallel region
– User defined functions
Profile data is presented in the form of an XML file that can be visualized with PeekPerf
DPOMP
Dynamically instruments OpenMP applications
Can add performance instrumentation to binaries without requiring access to source code or recompilation
Based on dynamic probes using DPCL
PompProf Visualization Using PeekPerf
Modular I/O Performance Tool (MIO)
I/O Analysis
– Trace module
– Summary of File I/O Activity + Binary Events File
– Low CPU overhead
I/O Performance Enhancement Library
– Prefetch module (optimizes asynchronous prefetch and write-behind)
– System Buffer Bypass capability
– User controlled pages (size and number)
Recoverable Error Handling
– Recover module (monitors return values and errno and reissues failed requests)
Remote Data Server
– Remote module (simple socket protocol for moving data)
Shared object library for AIX
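The recover behaviour described above (check the return value and errno, then reissue the request) can be sketched as follows. This is not MIO's code. The function names are hypothetical, and treating only `EINTR` and `EAGAIN` as recoverable is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Signature shared by read(2)-like operations. */
typedef ssize_t (*io_op)(int fd, void *buf, size_t count);

/* Issue op up to max_attempts times, retrying only on errors this
 * sketch assumes are transient. Returns the byte count on success or
 * -1 with errno set by the last attempt. */
static ssize_t reissue_on_error(io_op op, int fd, void *buf, size_t count,
                                int max_attempts)
{
    assert(op != NULL && max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        errno = 0;
        ssize_t n = op(fd, buf, count);
        if (n >= 0)
            return n;  /* request succeeded */
        if (errno != EINTR && errno != EAGAIN)
            return -1; /* not treated as recoverable here */
    }
    return -1;         /* attempts exhausted */
}

/* Test double: fails with EINTR a set number of times, then reports
 * that count bytes were transferred. */
static int failures_left;

static ssize_t flaky_read(int fd, void *buf, size_t count)
{
    (void)fd;
    (void)buf;
    if (failures_left > 0) {
        failures_left--;
        errno = EINTR;
        return -1;
    }
    return (ssize_t)count;
}
```

Passing the operation as a function pointer keeps the retry policy separate from the I/O call itself, so the same loop could wrap reads, writes, or a fault-injecting stand-in such as `flaky_read`.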
MIO User Code Interface
#define open64(a,b,c) MIO_open64(a,b,c,0)
#define read MIO_read
#define write MIO_write
#define close MIO_close
#define lseek64 MIO_lseek64
#define fcntl MIO_fcntl
#define ftruncate64 MIO_ftruncate64
#define fstat64 MIO_fstat64
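The macros above redirect ordinary I/O calls at compile time. The sketch below shows the same mechanism with a hypothetical `TRACED_read` wrapper standing in for `MIO_read`. It does not use the MIO library, and the `io_stats` bookkeeping is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for an interposed I/O layer (not MIO). */
struct io_stats {
    unsigned long read_calls;
    unsigned long long read_bytes;
};

static struct io_stats g_stats;

/* Wrapper defined BEFORE the macro, so the read() call inside it still
 * refers to the real read(2). */
static ssize_t TRACED_read(int fd, void *buf, size_t count)
{
    ssize_t n = read(fd, buf, count);
    g_stats.read_calls++;
    if (n > 0)
        g_stats.read_bytes += (unsigned long long)n;
    return n;
}

/* From here on, every textual use of read is redirected to the wrapper. */
#define read TRACED_read

/* Application code: writes msg into a pipe and reads it back. */
static ssize_t read_back(const char *msg, size_t len, char *out)
{
    int fds[2];
    assert(msg != NULL && out != NULL);
    if (pipe(fds) != 0)
        return -1;
    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], out, len);  /* expands to TRACED_read(...) */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The application source keeps calling `read` as before, and only a recompile with the macros in effect is needed. Note that the `open64` macro in the slide also appends a fourth argument, so a redirect can change a call's signature as well as its target.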
MIO Trace Module (sample partial text output)
Trace close : program <-> pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/2162.61)=130.37 mbytes/s
current size=0 max_size=16277
mode =0777 sector size=4096 oflags =0x302=RDWR CREAT TRUNC
open 1 0.01
write 478193 462.10 59774 59774 131072 131072
read 1777376 1700.48 222172 222172 131072 131072
seek 911572 2.83
fcntl 3 0.00
trunc 16 0.40
close 1 0.03
size 127787
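The numbers in this sample are mutually consistent under one reading of the columns, which is an inference from the arithmetic rather than something the slide states: for the write and read rows, the first figure is the call count, the second is seconds, the third is megabytes transferred, and the last is the request size in bytes.

```latex
\begin{align*}
\text{write: } 478193 \times 131072 \text{ bytes} \div 2^{20} &\approx 59774 \text{ MB} \\
\text{read: } 1777376 \times 131072 \text{ bytes} \div 2^{20} &= 222172 \text{ MB} \\
59774 + 222172 &= 281946 \text{ MB} \\
281946 \text{ MB} \div 2162.61 \text{ s} &\approx 130.37 \text{ MB/s}
\end{align*}
```

Under that reading, the header's `(281946/2162.61)=130.37 mbytes/s` is total data moved divided by time spent in I/O calls.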
MSC.Nastran V2001
Benchmark: SOL 111, 1.7M DOF, 1578 modes, 146 frequencies, residual flexibility and acoustics. 120 GB of disk space.
Machine: 4-way, 1.3 GHz p655, 32 GB with 16 GB large pages, JFS striped on 16 SCSI disks.
MSC.Nastran: V2001.0.9 with large pages, dmp=2 parallel=2 mem=700mb
The run with MIO used mio=1000mb
6.8 TB of I/O in 26666 seconds is an average of about 250 MB/sec
[Bar chart: elapsed time and CPU time in seconds (axis 0 to 60,000), no MIO vs. with MIO]
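The quoted average can be checked directly, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes):

```latex
\[
\frac{6.8 \times 10^{12}\ \text{bytes}}{26666\ \text{s}}
\approx 2.55 \times 10^{8}\ \text{bytes/s}
\approx 255\ \text{MB/s}
\]
```

This agrees with the slide's rounded figure of about 250 MB/sec.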
Problems that we are considering
Performance profiling and monitoring for scientific applications on large systems
– Selective generation and reporting of profiling data
– Management and analysis of large amounts of performance data
Composite profiling and presentation
– CPU profiling
– Hardware Performance Counter profiling
– Communication profiling