Integrated MPI/OpenMP Performance Analysis

KAI Software Lab, Intel Corporation & Pallas, GmbH
Bob Kuhn, [email protected] (intel.com)
Hans-Christian Hoppe, [email protected] (pallas.com)

Outline
Why integrated MPI/OpenMP programming?
A performance tool for MPI/OpenMP programming (Phase 1)
Integrated performance analysis capability for ASCI Apps (Phase 2)

Why Integrate MPI and OpenMP?
Hardware trends
Simple example – How it is done now?
An FEA Example
ASCI Examples

Parallel Hardware Keeps Coming
Recent example: LLNL ASCI clusters
Parallel Capacity Resource (PCR) cluster
– Three clusters totaling 472 Pentium 4s; the largest with 252
– Theoretical peak 857 gigaFLOP/s
– Linux
– NetworX via SGI Federal
– HPCWire 8/31/01
Parallel global file system cluster
– Total 48 Pentium 4 processors
– 1,024 clients/servers
– Deliver I/O rates of over 32 GB/s
– Fail-over and global lock manager
– Linux open source
– NetworX via SGI Federal
– HPCWire 8/31/01

Parallelism Performance Analysis
[Figure: code performance versus effort. Labels: OpenMP; MPI; OpenMP performance tools; MPI/OpenMP performance tools; debuggers, IDEs]

Cost Effective Parallelism Long Term
Wealth of parallelism experience: single person codes to large team

                TETON                      CRETIN                 LASNEX
Purpose         radiation transport        non-LTE physics        ICF simulations
Age (years)     ~5-10                      ~10                    ~25
Size (lines)    20 K                       100 K                  large
Developers      1-2                        1                      many
Complexity      low                        moderate               high
Parallel model  1 level SMP, 1 level DMP   varied level SMP/DMP   single level SMP
Complications   memory management, build process

ASCI Ultrascale Tools Project
Pathforward project
– RTS – Parallel System Performance
Ten Goals in three areas –
– Scalability – Work with 10,000+ Processors
– Integration – How about Hardware Monitors, Object Oriented, and Runtime Environment?
– Ease of Use – Dynamic Instrumentation and Be Prescriptive, not just Data Management

Architecture for Ultrascale Performance
1) Guide – Source Instrumentation
2) Vampirtrace – MPI/OpenMP Instrumentation
3) Vampir – MPI Analysis
4) GuideView – OpenMP Analysis
[Figure: tool flow. Application Source, Guide, Object Files, Guidetrace Library, Vampirtrace Library, Executable, TraceFile, Vampir, GuideView]

Phase One Goal – Integrated MPI/OpenMP
Phase One Goals –
– Integrated MPI OpenMP Tracing
  – Mode most compatible with ASCI Systems
– Whole Program Profiling
  – Integrate program profile with parallelism
– Increased Scalability of Performance Analysis
  – 1000 processors

Vampir – Integrated MPI/OpenMP
SWEEP3D run on 4 MPI tasks with 4 OpenMP threads each
Timeline shows OpenMP regions with glyph
Threaded activity during OpenMP region

GuideView – Integrated MPI/OpenMP & Profile
SWEEP3D run on 4 MPI tasks each with 4 OpenMP threads
All OpenMP regions for process summarized to one bar
Highlight (Red arrow) shows speedup curve for that set of threads
Thread view shows balance between MPI tasks and threads

GuideView – Integrated MPI/OpenMP & Profile
Sorting and filtering bring large amounts of information to manageable level
Profile allows comparison of MPI, OpenMP and Application activity inclusive and exclusive

Guide – Compiler Workhorse
Compilation of OpenMP
Automatic subroutine entry and exit instrumentation –
– Fortran
– C/C++
New compiler options –
– WGtrace -- link with the Vampirtrace library
– WGprof -- subroutine entry/exit profiling
– WGprof_leafprune -- minimum size of procedures to retain in profile

Vampirtrace – Profiling
Support for pruning of short routines
[Figure: event stream ROUTINE X ENTRY, ROUTINE Y ENTRY, ROUTINE Y EXIT, ..., ROUTINE Z ENTRY, with intervals marked > Δt and < Δt]
This tree will be pruned. ROUTINE X will be marked as having calltree info summarized.
All events that have not been pruned could now be written to the tracefile.
ROUTINE Z may still be < Δt so cannot yet be written.
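
The pruning rule on this slide can be illustrated with a small C sketch. It is not the Vampirtrace implementation: the `Call` record, its fields, and `prune_short_calls` are hypothetical names, and the sketch assumes that a call shorter than Δt is dropped and that its nearest surviving caller is flagged as summarized. It works on completed calls only, which mirrors the remark that ROUTINE Z cannot be written until its duration is known.

```c
#include <assert.h>
#include <stddef.h>

/* One completed routine call (hypothetical record layout). */
typedef struct {
    const char *name;
    double entry, exit;  /* timestamps in seconds */
    int parent;          /* index of the calling record, -1 for a root */
    int pruned;          /* set when the call is dropped from the trace */
    int summarized;      /* set when a callee of this call was dropped */
} Call;

/* Mark every call shorter than dt as pruned and flag its nearest
   surviving caller as summarized. Returns the number of pruned calls. */
int prune_short_calls(Call *calls, size_t n, double dt)
{
    int dropped = 0;
    assert(calls != NULL || n == 0);

    /* First pass: decide which calls are too short to keep. */
    for (size_t i = 0; i < n; i++) {
        calls[i].pruned = (calls[i].exit - calls[i].entry) < dt;
        calls[i].summarized = 0;
    }

    /* Second pass: tell the surviving caller that detail was removed. */
    for (size_t i = 0; i < n; i++) {
        if (!calls[i].pruned)
            continue;
        dropped++;
        int p = calls[i].parent;
        while (p >= 0 && calls[p].pruned)
            p = calls[p].parent;
        if (p >= 0)
            calls[p].summarized = 1;
    }
    return dropped;
}
```

With records for X (long), Y (short, called from X) and Z (long), a threshold of 0.01 s would drop Y and flag X as summarized, leaving Z untouched.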

Scalability on Phase One
Timeline scaling to 256 Tasks/Nodes
Gathering of tasks in node into group
– Filtering by nodes
– Expand each node
– Message statistics by nodes

Phase Two – Integrating Capabilities for ASCI Apps
Phase Two Goals –
– Deployment to other platforms
  – Compaq, CPlant, SGI
– Thread-Safety
– Scalability
  – Grouping
  – Statistical Analysis
  – Integrated GuideView
– Hardware performance monitors
– Dynamic control of instrumentation
– Environmental awareness

Thread Safety
Collect data from each thread –
– Thread-safe Vampirtrace library
– Per thread profiling data
– Previous release, only master thread logged data
Improves accuracy of data
Value to users –
– Enhances integration between MPI and OpenMP
– Enhances visibility into functional balance between threads

Scalability: Grouping
Up to end of FY00
– Fixed hierarchy levels (system, nodes, CPUs)
– Fixed grouping of processes
– E.g., impossible to reflect communicators
Need more levels
– Threads are a fourth group
– Systems with deeper hierarchies (30T)
– Reduce number of on-screen entities for scalability
[Figure: hierarchy. Whole system; Node 1 ... Node n; Quadboard; CPU 1 ... CPU c; T_1 ... T_p; t_1 ... t_c]

Default Grouping
Default Grouping
– By Nodes
– By Processes
– By Master Threads
– All Threads
Can be changed in configuration file
[Figure: grouping tree. All Cluster, All Processes, All Masters, All Threads; Node 1 ... Node n; Process 0 ... Process n; threads T_0, T_1, ..., T_p per process]

Scalability: Grouping
Filter processes dialog
– Select groups combo-box
Display of groups
– By aggregation
– By representative
Grouping applies to
– "Timeline bars"
– Counter streams
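
The two display modes for groups can be sketched in C as follows. This is only an illustration under stated assumptions: "aggregation" is taken to mean summing a per-thread value over the group, and "representative" is taken to mean showing the first member of the group (for example a master thread). The function names and the array layout are hypothetical and are not part of Vampir.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregation: the group value is the sum over all member threads.
   value[i] is a per-thread measurement, group[i] is its group id. */
double group_by_aggregation(const double *value, const int *group,
                            size_t n, int g)
{
    double sum = 0.0;
    assert(value != NULL && group != NULL);
    for (size_t i = 0; i < n; i++)
        if (group[i] == g)
            sum += value[i];
    return sum;
}

/* Representative: the group is shown through its first member.
   Returns -1.0 when the group has no members. */
double group_by_representative(const double *value, const int *group,
                               size_t n, int g)
{
    assert(value != NULL && group != NULL);
    for (size_t i = 0; i < n; i++)
        if (group[i] == g)
            return value[i];
    return -1.0;
}
```

Either function reduces many threads to one on-screen entity per group, which is the point of grouping for scalability; they differ in whether the entity summarizes all members or stands in for them.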

Scalability by Grouping
Parallelism display showing all threads
Parallelism display showing only master threads alternating between MPI and OpenMP parallelism

Statistical Information Gathering
Collects basic statistics at runtime
Saves statistics in an ASCII file
View statistics
– your favorite spreadsheet ...
Reduced overhead compared to tracing
[Figure: Parallel Executable produces Tracefile (big) and Statsfile (small); Perl filter; Excel, ...]

Statistical Information Gathering
Can work independent of tracing
Significantly lower overhead (memory, runtime)
Restriction: for the whole application run ...

What             Organization          Data
Subroutines      Per process           Min/max/total time, # of calls
Messages         Per sender/receiver   Min/max/total bytes, # of messages
Parallel region  Per process

Statistical Information Gathering
<act>:<sym>:<proc>:<calls>:<minexcl>:<maxexcl>:<totalexcl>:<minincl>:<maxincl>:<totalincl>
INFO ACTSTATS Application:PK_2112_YBDRYS:0:16:3.539324e-04:5.249977e-04:7.470846e-03:3.539324e-04:5.249977e-04:7.470846e-03
INFO ACTSTATS Application:PK_2112_YBDRYS:1:16:3.600121e-04:5.509853e-04:7.577062e-03:3.600121e-04:5.509853e-04:7.577062e-03
INFO ACTSTATS Application:PK_2112_YBDRYS:2:16:3.390312e-04:5.350113e-04:7.542133e-03:3.390312e-04:5.350113e-04:7.542133e-03
INFO ACTSTATS Application:PK_2112_YBDRYS:3:16:3.429651e-04:5.450249e-04:7.494092e-03:3.429651e-04:5.450249e-04:7.494092e-03
[Figure: bar chart per process (0 to 7) for PK_814_CALCHYDZ, RDPARAM, and PK_562_CALCHYDY; value axis from 0,00E+00 to 3,50E+05]
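
The record layout shown on this slide is simple enough to parse with `sscanf`. The sketch below is a hedged illustration in C (the slides mention a Perl filter for this step): the `ActStats` struct and `parse_actstats` are hypothetical names, and the 63-character limit on the activity and symbol fields is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* One ACTSTATS record, following the field order given on the slide. */
typedef struct {
    char act[64];   /* activity, e.g. "Application" */
    char sym[64];   /* symbol, e.g. a subroutine name */
    int proc;       /* process number */
    long calls;     /* number of calls */
    double minexcl, maxexcl, totalexcl;  /* exclusive times */
    double minincl, maxincl, totalincl;  /* inclusive times */
} ActStats;

/* Parse one "INFO ACTSTATS ..." line.
   Returns 1 when all ten fields were read, 0 otherwise. */
int parse_actstats(const char *line, ActStats *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line,
                   "INFO ACTSTATS %63[^:]:%63[^:]:%d:%ld:"
                   "%lf:%lf:%lf:%lf:%lf:%lf",
                   out->act, out->sym, &out->proc, &out->calls,
                   &out->minexcl, &out->maxexcl, &out->totalexcl,
                   &out->minincl, &out->maxincl, &out->totalincl);
    return n == 10;
}
```

Once parsed, the per-process totals can be written out in whatever column format a spreadsheet expects.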

GuideView Integrated Inside Vampir
Creating an extension API in Vampir
– insert menu items
– include new displays
– have access to trace data & statistics
[Figure: Vampir menus, Vampir GUI engine, trace data (in memory), New GuideView, Motif graphics library; arrows labeled invoke, access, control, display]

New GuideView Whole Program View
Goals –
– Improve MPI/OpenMP integration
– Improve scalability
– Integrate look and feel
Works like old GuideView!
Load time – Fast!

New GuideView Region View
Looks like old Region view turned on the side!
Scalability test
– 16 MPI tasks
– 16 OpenMP threads
– 300 Parallel regions

Hardware Performance Monitors
1) User can call HPM API in the source code
2) User can define events in Config file for Guide instrumentation
3) HPM counter events are also logged from Guidetrace and Vampirtrace library
4) Underlying HPM library is PAPI
[Figure: tool flow. Application Source, Config File, Guide, Object Files, Guidetrace, Vampirtrace, PAPI, Executable, TraceFile, Vampir, GuideView]

PAPI – Hardware Performance Monitors
Standardizes names across platforms
Users define counter sets
User could instrument by-hand --
But better, counters are instrumented at OpenMP regions and subroutines
Can’t support unsupported counters

    int main(int argc, char **argv) {
        int set_id;
        int inner, outer, other;

        /* Create a new event set to measure L1 & L2 data cache misses. */
        set_id = VT_create_event_set("MySet");
        VT_add_event(set_id, PAPI_L1_DCM);
        VT_add_event(set_id, PAPI_L2_DCM);
        VT_symdef(outer, "OUTER", "USERSTATES");
        VT_symdef(inner, "INNER", "USERSTATES");
        VT_symdef(other, "OTHER", "USERSTATES");

        /* Activate the event set. */
        VT_change_hpm(set_id);

        /* Collect the events over two user-defined intervals. */
        VT_begin(outer);
        foo();
        VT_begin(inner);
        bar();
        VT_end(inner);
        foo();
        VT_end(outer);
        return 0;
    }

Hardware Performance Example
MPI tasks on timeline
Floating pt instructions correlated but in different window
Or, per MPI task activity correlated in same window

Hardware Performance Can Be Rich
4 x 4 SWEEP3D run showing L1 Data Cache Miss
Cycles Stalled Waiting for Memory Accesses

Hardware Performance in GuideView
You can see the HPM data on all GuideView windows
L1 data cache misses and cycles stalled on memory accesses in the per MPI task profile view

Derived Hardware Counters
Vampir and GuideView displays present derived counters
In this menu you can arithmetically combine measured counters into derived counters
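
As a rough illustration of what "arithmetically combine" can mean, the C sketch below derives a ratio and a rate from raw counter values. The function names are hypothetical, and the choice of formulas (for example cache misses per floating point instruction, or instructions per second) is an assumption for the example rather than something the Vampir menu prescribes.

```c
#include <assert.h>

/* Ratio of two measured counters, e.g. L1 data cache misses per
   floating point instruction. Returns 0.0 for a zero denominator. */
double derived_ratio(long long numerator, long long denominator)
{
    assert(numerator >= 0 && denominator >= 0);
    if (denominator == 0)
        return 0.0;
    return (double)numerator / (double)denominator;
}

/* Events per second from a counter value over an interval,
   e.g. floating point instructions per second. */
double derived_rate(long long count, double seconds)
{
    assert(count >= 0 && seconds >= 0.0);
    if (seconds <= 0.0)
        return 0.0;
    return (double)count / seconds;
}
```

Guarding against a zero denominator matters in practice because a counter may not advance at all within a short interval.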

Environmental Counters
Select rusage information like HPMs
Data appears in Vampir and GuideView like HPM data
Time-varying OS counters –
– Config variable sets sampling frequency
– Difficult to attribute to source code precisely

Parameter  Meaning
utime      user time used
stime      system time used
maxrss     max resident set size
ixrss      shared memory size
idrss      unshared data size
minflt     page reclaims
majflt     page faults
nswap      swaps
inblock    block input operations
oublock    block output operations

Environmental Awareness
Type 1: Collects IBM MPI information
– Treated as static (one time) event in tracefile
– Over 50 parameters

Parameter            Meaning
MP_EUIDEVICE         adapter set to be used for message passing
MP_EUILIB            communication subsystem library implementation
MP_INFOLEVEL         level of message reporting
MP_BUFFER_MEM        size of unexpected message buffers
MP_CSS_INTERRUPT     generate interrupts for arriving packets
MP_EAGER_LIMIT       threshold for switching to rendezvous protocol
MP_USE_FLOW_CONTROL  enforce flow control for outgoing messages
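
Since these settings are environment variables, a one-time snapshot can be taken with the standard `getenv()` call. The C sketch below is an assumption about how such a capture might look, not the tool's actual mechanism; `snapshot_environment` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Look up each name in the process environment. values[i] receives the
   setting for names[i], or NULL when the variable is not set.
   Returns the number of variables that were set. */
size_t snapshot_environment(const char *const *names, size_t n,
                            const char **values)
{
    size_t found = 0;
    assert(names != NULL && values != NULL);
    for (size_t i = 0; i < n; i++) {
        values[i] = getenv(names[i]);
        if (values[i] != NULL)
            found++;
    }
    return found;
}
```

A caller would pass a list such as MP_EUIDEVICE, MP_EUILIB and MP_EAGER_LIMIT and record the results once at startup, matching the "static (one time) event" treatment described on the slide.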

Dynamic Control of Instrumentation
1) In source, user puts VT_confsync() calls
2) At runtime, TotalView is attached and breakpoint is inserted
3) From process #0, user adjusts several instrumentation settings
4) VTconfigchanged flag is set, breakpoint is exited
Tracefile reflects change after next VT_confsync()
[Figure: tool flow. Application Source, Guide, Object Files, Vampirtrace Library, Executable, TotalView, TraceFile, Vampir, GuideView]

Dynamic Control of Instrumentation

Keyword         Description                           Default Value
LOGFILE-NAME    Tracefile name                        <argv[0]>.bvt
LOGFILE-PREFIX  Tracefile path prefix                 Null string
ACTIVITY        Trace activities (User defined)       * ON
SYMBOL          Trace symbols (Often subroutines)     * ON
COUNTER         Trace counters                        * ON
OPENMP          Trace OpenMP regions                  * ON
PCTRACE         Record return address                 OFF
SUM-MPITESTS    Collapse MPI probe and test routines  ON
CLUSTER         Trace cluster nodes                   All enabled
PROCESS         Trace processes                       All enabled
ENVIRONMENT     Record environment information        ON
MEM-MAXBLOCKS   Maximum number of memory blocks       Unlimited
MEM-OVERWRITE   Overwrite in-core buffers             OFF
PRUNE-LIMIT     Execution time threshold              No pruning

Structured Trace Files
Frames Manage Scalability
A Section of the Timeline
A Set of Processors
Instances of a subroutine
OpenMP Regions
Messages or Collectives

Structured Trace Files Consist of Frames
Frames are defined in the source code –
– int VT_framedef( char *name, unsigned int type_mask, int *frame_handle )
– int VT_framestart( int frame_handle )
– int VT_framestop( int frame_handle )
Type_mask defines the types of data collected –
– VT_FUNCTION
– VT_REGION
– VT_PAR_REGION
– VT_OPENMP
– VT_COUNTER
– VT_MESSAGE
– VT_COLL_OP
– VT_COMMUNICATION
– VT_ALL
Analysis-time frames will be available

Structured Trace Files
Rapid Access By Frames
1) Structured Tracefile
2) Vampir Thumbnail Displays Represent Frames
3) Selecting Thumbnails Displays Frames in Vampir
[Figure: Index File pointing to Frames in the structured tracefile]

Object Oriented Performance Analysis
How to avoid SOOX (Scalability Object Oriented eXplosion) – Instrument with API
– C++ templates, classes make it much easier
– Can be used with or without source
Use TAU model
[Figure: Events (MPI_Send, MPI_Recv, MPI_Finalize, Func A, Func Init, Func X, Func Y, Func Z); Informers (I_A, I_B, I_C, I_D); VT Activity / InformerMappings (ImX, ImY, ImZ, ImQ)]

Example of OO Informers

    class Matrix {
    public:
        InformerMapping im;
        Matrix(int rows, int columns) {
            if (rows * columns > 500)
                im.Rename("LargeMatrix");
            else
                im.Rename("Matrix");
        }
        void invert() {
            Informer(im, "invert", 12, 15, "Example.C");
            #pragma omp parallel
            { .... }
            MPI_Send(...);
        }
        void compl() {
            Informer(im, "typeid(…)");
            ....
        }
    };
    int main(int argc, char **argv) {
        Matrix A(10,10), B(512,512), C(1000,1000);  // line 1
        B.im.Rename("MediumMatrix");                // line 2
        A.invert();                                 // line 3
        B.compl();                                  // line 4
        C.invert();                                 // line 5
    }

line 1: Create three Matrix instances: A (mapped to "Matrix" bin), B (mapped to "LargeMatrix" bin), and C (mapped to "LargeMatrix" bin)
line 2: Remap B to "MediumMatrix" bin
line 3: A.invert() is traced. Entry and exit events are collected and associated with ("Matrix:invert") in Matrix bin
line 4: B.compl() is traced. Entry and exit events are collected and associated with ("Matrix:void compl(void)") in MediumMatrix bin
line 5: C.invert() is traced. Entry and exit events are collected and associated with ("Matrix:invert") in LargeMatrix bin

Vampir OO Timeline Shows Informer Bins
InformerMappings: display each bin as a Vampir activity. MPI is put into a separate activity with same prefix
Rename as ‘Mangled name’ InformerMapping:Informer:NormalEventName

Vampir OO Profile Shows Informer Bins
Time in Classes: Queens
MPI Time in Class: Queens

OO GuideView Shows Regions in Bins
Time and counter data per thread by Bin

Parallel Performance Engineering
ASCI Ultrascale Performance Tools
– Scalability
– Integration
– Ease of Use
Read about what was presented
– ftp://ftp.kai.com/private/Lab_notes_2001.doc.gz
– Contact: [email protected]
Thank you for your attention!