
Allen D. Malony, Sameer S. Shende, Robert Bell, Kai Li, Li Li, Kevin Huck

{malony,sameer,bertie,likai,lili,khuck}@cs.uoregon.edu

Department of Computer and Information Science

Performance Research Laboratory

University of Oregon

TAU Parallel Performance System


The TAU Performance System Cray Briefing, SC2002, Nov. 18, 2002

Outline

Motivation
TAU architecture and toolkit
  Instrumentation
  Measurement
  Analysis
Example applications
Users of TAU
Conclusion


Problem Domain

ASCI defines leading edge parallel systems and software
  Large-scale systems and heterogeneous platforms
  Multi-model simulation
  Complex software integration
  Multi-language programming
  Mixed-model parallelism
Complexity challenges performance analysis tools
  System diversity requires portable tools
  Need for cross-language support
  Coverage of parallel computation models
  Operate at scale


Research Motivation

Tools for performance problem solving
  Empirical-based performance optimization process
  Performance technology concerns

[Figure: performance optimization process with the stages Performance Tuning, Performance Diagnosis, Performance Experimentation, and Performance Observation, linked by hypotheses, properties, and characterization. Performance technology covers instrumentation, measurement, analysis, and visualization, plus experiment management and a performance database.]


TAU Performance System

Tuning and Analysis Utilities (11+ year project effort)
Performance system framework for scalable parallel and distributed high-performance computing
Targets a general complex system computation model
  nodes / contexts / threads
  Multi-level: system / software / parallelism
  Measurement and analysis abstraction
Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  Portable performance profiling and tracing facility
  Open software approach with technology integration
University of Oregon, Forschungszentrum Jülich, LANL


TAU Performance Systems Goals

Multi-level performance instrumentation
  Multi-language automatic source instrumentation
Flexible and configurable performance measurement
Widely-ported parallel performance profiling system
  Computer system architectures and operating systems
  Different programming languages and compilers
Support for multiple parallel programming paradigms
  Multi-threading, message passing, mixed-mode, hybrid
Support for performance mapping
Support for object-oriented and generic programming
Integration in complex software systems and applications


General Complex System Computation Model

Node: physically distinct shared memory machine
  Message passing node interconnection network
Context: distinct virtual memory space within node
Thread: execution threads (user/system) in context

[Figure: physical view (SMP nodes with node memory, joined by an interconnection network carrying inter-node message communication) next to the model view (node, context as a VM space, threads).]
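The node / context / thread model above can be pictured as a three-part key under which performance data is stored. The following is a minimal sketch under that assumption; `Location`, `record`, and `total_per_node` are hypothetical names chosen for illustration and are not part of TAU.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    node: int     # physically distinct shared-memory machine
    context: int  # distinct virtual memory space within the node
    thread: int   # execution thread within the context


# Hypothetical layout: profile[location][event] -> accumulated seconds.
profile = defaultdict(lambda: defaultdict(float))


def record(loc, event, seconds):
    """Add a measurement for one event at one node/context/thread."""
    profile[loc][event] += seconds


def total_per_node(event):
    """Fold the context and thread levels away to get a per-node total."""
    totals = defaultdict(float)
    for loc, events in profile.items():
        totals[loc.node] += events.get(event, 0.0)
    return dict(totals)


record(Location(0, 0, 0), "solve", 1.5)
record(Location(0, 0, 1), "solve", 2.0)
record(Location(1, 0, 0), "solve", 0.5)
```

Keeping the full three-part key until analysis time lets the same data be viewed per thread, per context, or per node.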


TAU Performance System Architecture

[Figure: TAU performance system architecture diagram; recoverable labels: EPILOG, Paraver.]


TAU Instrumentation Approach

Support for standard program events
  Routines
  Classes and templates
  Statement-level blocks
Support for user-defined events
  Begin/End events (“user-defined timers”)
  Atomic events
  Selection of event statistics
Support definition of “semantic” entities for mapping
Support for event groups
Instrumentation optimization
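To make the two kinds of user-defined events concrete, here is a minimal sketch of a begin/end timer and an atomic event that keeps summary statistics. The class names and the injectable `clock` parameter are assumptions made for illustration; they do not reproduce the TAU API.

```python
import time


class IntervalTimer:
    """Begin/end event: accumulates elapsed time and a call count."""

    def __init__(self, name, clock=time.perf_counter):
        self.name, self.clock = name, clock
        self.calls, self.total = 0, 0.0
        self._start = None

    def start(self):
        self._start = self.clock()

    def stop(self):
        self.total += self.clock() - self._start
        self.calls += 1
        self._start = None


class AtomicEvent:
    """Atomic event: a value is recorded at a single point in the code."""

    def __init__(self, name):
        self.name = name
        self.count, self.total = 0, 0.0
        self.minimum, self.maximum = float("inf"), float("-inf")

    def trigger(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# A scripted clock keeps the example deterministic.
ticks = iter([10.0, 12.5])
timer = IntervalTimer("solve", clock=lambda: next(ticks))
timer.start()
timer.stop()

msg = AtomicEvent("message size")
for size in (64, 256, 128):
    msg.trigger(size)
```

The timer measures a span between two points, while the atomic event only needs one trigger point, which is why it tracks count, min, max, and mean rather than duration.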


TAU Instrumentation

Flexible instrumentation mechanisms at multiple levels
  Source code
    manual
    automatic
      C, C++, F77/90/95 (Program Database Toolkit (PDT))
      OpenMP (directive rewriting (Opari), POMP spec)
  Object code
    pre-instrumented libraries (e.g., MPI using PMPI)
    statically-linked and dynamically-linked
  Executable code
    dynamic instrumentation (pre-execution) (DynInstAPI)
    virtual machine instrumentation (e.g., Java using JVMPI)
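The pre-instrumented library idea (MPI via PMPI) relies on interposition: the application keeps calling the usual entry point, which is replaced by a wrapper that measures and then forwards to the real routine. The sketch below is a Python analogy of that pattern only; `send`, `real_send`, and `interpose` are hypothetical stand-ins, not MPI or TAU functions.

```python
import time
from functools import wraps

timings = {}  # routine name -> [calls, total seconds]


def interpose(real, clock=time.perf_counter):
    """Return a wrapper that times `real` and forwards to it unchanged."""

    @wraps(real)
    def wrapper(*args, **kwargs):
        start = clock()
        try:
            return real(*args, **kwargs)
        finally:
            entry = timings.setdefault(real.__name__, [0, 0.0])
            entry[0] += 1
            entry[1] += clock() - start

    return wrapper


def send(buf, dest):
    """Stand-in for a library's real communication routine."""
    return len(buf)


real_send = send             # plays the role of the name-shifted entry point
send = interpose(real_send)  # the application still calls `send`

sent = send(b"abcd", dest=1)
```

The application source is untouched; only the binding of the public name changes, which is what makes library-level instrumentation possible without recompiling user code.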


TAU Source Instrumentation

Automatic source instrumentation (TAUinstr)
  Routine entry/exit and class method entry/exit
  Block entry/exit and statement level (to be added)
  Uses an instrumentation specification file
    Include/exclude list for events and files
  Uses command line options for group selection
Instrumentation event selection (TAUselect)
  Automatic generation of instrumentation specification file
  Instrumentation language to describe event constraints
    Event identity and location
    Event performance properties (e.g., overhead analysis)
  Create TAUselect scripts for performance experiments
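The include/exclude lists mentioned above amount to a filter applied to each candidate routine before instrumentation. Here is a minimal sketch of such a filter. The wildcard syntax and the rule that exclusions take precedence are assumptions for illustration and do not describe the actual specification file format.

```python
from fnmatch import fnmatchcase


def is_instrumented(routine, include=(), exclude=()):
    """Decide whether a routine should be instrumented.

    Assumed rules: any matching exclude pattern wins, and an empty
    include list means "everything not excluded".
    """
    if any(fnmatchcase(routine, pattern) for pattern in exclude):
        return False
    if include:
        return any(fnmatchcase(routine, pattern) for pattern in include)
    return True


routines = ["main", "solver::step", "solver::norm", "util::swap"]
selected = [
    r for r in routines
    if is_instrumented(r, include=["main", "solver::*"], exclude=["*::norm"])
]
```

Excluding small, frequently called routines is the usual way to keep measurement overhead down, which is where the overhead analysis mentioned on the slide would feed in.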


Multi-Level Instrumentation

Targets common measurement interface
  TAU API
Multiple instrumentation interfaces
  Simultaneously active
Information sharing between interfaces
  Utilizes instrumentation knowledge between levels
Selective instrumentation
  Available at each level
  Cross-level selection
Targets a common performance model
Presents a unified view of execution
  Consistent performance events


Program Database Toolkit (PDT)

Program code analysis framework
  develop source-based tools
High-level interface to source code information
Integrated toolkit for source code parsing, database creation, and database query
  Commercial grade front-end parsers
  Portable IL analyzer, database format, and access API
  Open software approach for tool development
Multiple source languages
Implement automatic performance instrumentation tools
  tau_instrumentor


Program Database Toolkit (PDT)

[Figure: PDT components. An application or library goes through the C / C++ parser or the Fortran parser (F77/90/95), then the C / C++ or Fortran IL analyzer, into program database files. DUCTAPE provides access for PDBhtml (program documentation), SILOON (application component glue), CHASM (C++ / F90/95 interoperability), and TAU_instr (automatic source instrumentation).]


PDT 3.0 Functionality

C++ statement-level information implementation
  for, while loops, declarations, initialization, assignment…
  PDB records defined for most constructs
DUCTAPE
  Processes PDB 1.x, 2.x, 3.x uniformly
PDT applications
  XMLgen
    PDB to XML converter (Sottile)
    Used for CHASM and CCA tools
  PDBstmt
    Statement callgraph display tool


PDT 3.0 Functionality (continued)

Cleanscape Flint parser fully integrated for F90/95
  Flint parser is very robust
  Produces PDB records for TAU instrumentation (stage 1)
    Linux x86, HP Tru64, IBM AIX
    Tested on SAGE, POP, ESMF, PET benchmarking codes
  Full PDB 2.0 specification (stage 2) [Q1 ‘04]
  Statement level support (stage 3) [Q3 ‘04]
Open64 parser integrated in PDT for F90/95
  Barbara Chapman, University of Houston
  Generate full PDB 2.0 specification (stage 2) [Q2 ‘04]
  Statement level support (stage 3) [Q3 ‘04]
PDT 3.0 release at SC2003


TAU Performance Measurement

TAU supports profiling and tracing measurement
Robust timing and hardware performance support
Support for online performance monitoring
  Profile and trace performance data export to file system
  Selective exporting
Extension of TAU measurement for multiple counters
  Creation of user-defined TAU counters
  Access to system-level metrics
Support for callpath measurement
Integration with system-level performance data
  Linux MAGNET/MUSE (Wu Feng, LANL)


TAU Measurement with Multiple Counters

Extend event measurement to capture multiple metrics
  Begin/end (interval) events
  User-defined (atomic) events
  Multiple performance data sources can be queried
Associate counter function list to event
  Defined statically or dynamically
Different counter sources
  Timers and hardware counters
  User-defined counters (application specified)
  System-level counters
Monotonically increasing counters required for begin/end events
Extend user-defined counters to system-level counters
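The requirement that counters increase monotonically follows from how an interval event uses them: each metric is read at begin and at end, and the difference is accumulated. The sketch below illustrates this with a list of counter functions attached to one event. `MultiCounterEvent` and the metric names are hypothetical, and the scripted counters stand in for real timers or hardware counters.

```python
class MultiCounterEvent:
    """Interval event that samples several counter functions at begin/end."""

    def __init__(self, name, counters):
        self.name = name
        self.counters = dict(counters)  # metric name -> zero-argument function
        self.totals = {metric: 0 for metric in self.counters}
        self._begin = None

    def begin(self):
        self._begin = {metric: read() for metric, read in self.counters.items()}

    def end(self):
        for metric, read in self.counters.items():
            delta = read() - self._begin[metric]
            if delta < 0:
                # A counter that went backwards cannot yield a valid interval.
                raise ValueError(f"counter {metric!r} is not monotonically increasing")
            self.totals[metric] += delta
        self._begin = None


# Scripted counter readings keep the example deterministic.
clock = iter([100, 160])
fp_ops = iter([5000, 9000])
event = MultiCounterEvent(
    "kernel",
    {"time_us": lambda: next(clock), "fp_ops": lambda: next(fp_ops)},
)
event.begin()
event.end()
```

Atomic events do not need this property because they record a single value rather than a difference.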


TAU Measurement

Performance information
  Performance events
  High-resolution timer library (real-time / virtual clocks)
  General software counter library (user-defined events)
  Hardware performance counters
    PCL (Performance Counter Library) (ZAM, Germany)
    PAPI (Performance API) (UTK, Ptools Consortium)
    consistent, portable API
Organization
  Node, context, thread levels
  Profile groups for collective events (runtime selective)
  Performance data mapping between software levels


TAU Measurement Options

Parallel profiling
  Function-level, block-level, statement-level
  Supports user-defined events
  TAU parallel profile data stored during execution
  Hardware counter values
  Support for multiple counters
  Support for callgraph and callpath profiling
Tracing
  All profile-level events
  Inter-process communication events
  Trace merging and format conversion
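Callpath profiling attributes time to the chain of callers rather than to the routine alone. The following is a minimal sketch of that bookkeeping using a call stack, with inclusive and exclusive time per path. The `" => "` separator and the class itself are assumptions for illustration, not TAU's implementation.

```python
from collections import defaultdict


class CallpathProfiler:
    def __init__(self, clock):
        self.clock = clock
        self.stack = []                      # frames: [name, start, child_time]
        self.inclusive = defaultdict(float)  # callpath -> time including callees
        self.exclusive = defaultdict(float)  # callpath -> time excluding callees

    def enter(self, name):
        self.stack.append([name, self.clock(), 0.0])

    def exit(self):
        path = " => ".join(frame[0] for frame in self.stack)
        _, start, child_time = self.stack.pop()
        elapsed = self.clock() - start
        self.inclusive[path] += elapsed
        self.exclusive[path] += elapsed - child_time
        if self.stack:
            # Charge this call's time to the parent's child total.
            self.stack[-1][2] += elapsed


# Scripted clock: main enters at 0, solve at 1, solve exits at 4, main at 5.
ticks = iter([0.0, 1.0, 4.0, 5.0])
prof = CallpathProfiler(clock=lambda: next(ticks))
prof.enter("main")
prof.enter("solve")
prof.exit()
prof.exit()
```

A flat profile would key on the routine name only; keying on the joined stack is what distinguishes the same routine reached through different callers.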


Grouping Performance Data in TAU

Profile Groups
  A group of related routines forms a profile group
  Statically defined
    TAU_DEFAULT, TAU_USER[1-5], TAU_MESSAGE, TAU_IO, …
  Dynamically defined
    group name based on string, such as “adlib” or “particles”
    runtime lookup in a map to get unique group identifier
    uses tau_instrumentor to instrument
Ability to change group names at runtime
Group-based instrumentation and measurement control
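The dynamically defined groups above depend on a runtime lookup that maps a group name to a unique identifier. Here is a minimal sketch of such a registry with group-based enable/disable control. `GroupRegistry` and its methods are hypothetical, and assigning identifiers in order of first use is an assumption.

```python
class GroupRegistry:
    """Maps group names to unique identifiers and tracks which are enabled."""

    def __init__(self):
        self._ids = {}
        self._enabled = set()

    def group_id(self, name):
        """Look the name up in a map; assign the next free id on first use."""
        if name not in self._ids:
            self._ids[name] = len(self._ids)
            self._enabled.add(self._ids[name])  # new groups start enabled
        return self._ids[name]

    def disable(self, name):
        self._enabled.discard(self.group_id(name))

    def enable(self, name):
        self._enabled.add(self.group_id(name))

    def is_enabled(self, name):
        return self.group_id(name) in self._enabled


groups = GroupRegistry()
adlib = groups.group_id("adlib")
particles = groups.group_id("particles")
groups.disable("particles")
```

With a registry like this, a measurement call can check `is_enabled` for its group and skip the timer entirely, which is the basis of group-based measurement control.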


TAU Analysis

Parallel profile analysis
  Pprof
    parallel profiler with text-based display
  ParaProf
    Graphical, scalable, parallel profile analysis and display
Trace analysis and visualization
  Trace merging and clock adjustment (if necessary)
  Trace format conversion (ALOG, SDDF, VTF, Paraver)
  Trace visualization using Vampir (Pallas)
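Trace merging combines per-process event streams into one time-ordered stream, and clock adjustment corrects for processes whose local clocks disagree. The sketch below assumes each trace is already sorted and that a single constant offset per process is enough, which is a simplification. `merge_traces` and the record layout are hypothetical.

```python
import heapq


def merge_traces(traces, offsets):
    """Merge per-rank traces into one stream ordered by adjusted time.

    traces:  {rank: [(local_time, event), ...]}, each list sorted by time
    offsets: {rank: seconds added to map local time onto a common clock}
    """

    def adjusted(rank):
        for local_time, event in traces[rank]:
            yield (local_time + offsets.get(rank, 0.0), rank, event)

    # heapq.merge performs a lazy k-way merge of already-sorted inputs.
    return list(heapq.merge(*(adjusted(rank) for rank in sorted(traces))))


traces = {
    0: [(1.0, "send"), (4.0, "exit")],
    1: [(0.5, "recv"), (2.0, "exit")],
}
offsets = {1: 1.0}  # rank 1's clock is assumed to run one second behind
merged = merge_traces(traces, offsets)
```

Without the offset, the receive on rank 1 would appear to happen before the matching send on rank 0, which is the kind of inconsistency clock adjustment is meant to remove.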


Pprof Output (NAS Parallel Benchmark – LU)

Intel QuadPIII Xeon

F90 + MPICH

Profile - Node - Context - Thread

Events - code - MPI


ParaProf (NAS Parallel Benchmark – LU)

Node / context / thread
Global profiles
Routine profile across all nodes

Event legend

Individual profile


TAU + PAPI (NAS Parallel Benchmark – LU )

Floating point operations

Re-link to alternate library

Can use multiple counter support


TAU + Vampir (NAS Parallel Benchmark – LU)

Timeline display Callgraph display

Parallelism display

Communications display


TAU Performance System Status

Computing platforms (selected)
  IBM SP / pSeries, SGI Origin 2K/3K, Cray T3E / SV-1 / X1, HP (Compaq) SC (Tru64), Sun, Hitachi SR8000, NEC SX-5/6, Linux clusters (IA-32/64, Alpha, PPC, PA-RISC, Power, Opteron), Apple (G4/5, OS X), Windows
Programming languages
  C, C++, Fortran 77/90/95, HPF, Java, OpenMP, Python
Thread libraries
  pthreads, SGI sproc, Java, Windows, OpenMP
Compilers (selected)
  Intel KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray, IBM (xlc, xlf), Compaq, NEC, Intel


Selected Applications of TAU

Center for Simulation of Accidental Fires and Explosion
  University of Utah, ASCI ASAP Center, C-SAFE
  Uintah Computational Framework (UCF) (C++)
Center for Simulation of Dynamic Response of Materials
  California Institute of Technology, ASCI ASAP Center
  Virtual Testshock Facility (VTF) (Python, Fortran 90)
Los Alamos National Lab
  Monte Carlo transport (MCNP) (Susan Post)
    Full code automatic instrumentation and profiling
    ASCI Q validation and scaling
  SAIC’s Adaptive Grid Eulerian (SAGE) (Jack Horner)
    Fortran 90 automatic instrumentation and profiling


Selected Applications of TAU (continued)

Lawrence Livermore National Lab
  Radiation diffusion (KULL)
    C++ automatic instrumentation, callpath profiling
Sandia National Lab
  DOE CCTTSS SciDAC project
  Common component architecture (CCA) integration
  Combustion code (C++, Fortran 90)
Flash Center
  University of Chicago / Argonne, ASCI ASAP Center
  FLASH code (C, Fortran 90)


Performance Analysis and Visualization

Analysis of parallel profile and trace measurement
Parallel profile analysis
  ParaProf
  ParaVis
  Profile generation from trace data
Performance database framework (PerfDBF)
Parallel trace analysis
  Translation to VTF 3.0 and EPILOG
  Integration with VNG (Technical University of Dresden)
Online parallel analysis and visualization


ParaProf Framework Architecture

Portable, extensible, and scalable tool for profile analysis
Try to offer “best of breed” capabilities to analysts
Build as profile analysis framework for extensibility


Profile Manager Window

Structured AMR toolkit (SAMRAI++), LLNL


Full Profile Window (Exclusive Time)

512 processes


Node / Context / Thread Profile Window


Derived Metrics


Full Profile Window (Metric-specific)

512 processes


ParaProf Enhancements

Readers completely separated from the GUI
Access to performance profile database
Profile translators
  mpiP, papiprof, dynaprof
Callgraph display
  prof/gprof style with hyperlinks
Integration of 3D performance plotting library
Scalable profile analysis
  Statistical histograms, cluster analysis, …
Generalized programmable analysis engine
Cross-experiment analysis


ParaVis

[Figure components: Performance Visualizer, Performance Analyzer, Performance Data Reader.]

Scalable parallel profile analysis
Scalable performance displays
  3D graphics
Analysis across profile samples
  Allow for runtime use
Animated / interactive visualization
Initially develop with SCIRun
  Computational environment
Performance graphics toolkit
  Portable plotting library
  OpenGL


Performance Visualization in SCIRun

SCIRun program

EVH1, IBM

EVH1, Linux IA-32


“Terrain” Visualization (Full profile)

Uintah


“Scatterplot” Visualization

Each point coordinate determined by three values:
  MPI_Reduce
  MPI_Recv
  MPI_Waitsome
Min/Max value range
Effective for cluster analysis

Uintah


“Bargraph” Visualization (MPI routines)

Uintah, 512 processes, ASCI Blue Pacific


Empirical-Based Performance Optimization

[Figure: the performance optimization process (Performance Tuning, Performance Diagnosis, Performance Experimentation, Performance Observation; hypotheses, properties, characterization) extended with observability requirements and experiment management: process, experiment schemas, experiment trials.]


TAU Performance Database Framework

profile data only
XML representation
project / experiment / trial

[Figure components: raw performance data, PerfDML translators, performance data description, PerfDB (ORDB, PostgreSQL), performance analysis and query toolkit, performance analysis programs, other tools.]


PerfDBF Components

Performance Data Meta Language (PerfDML)
  Common performance data representation
  Performance meta-data description
  PerfDML translators to common data representation
Performance DataBase (PerfDB)
  Standard database technology (SQL)
  Free, robust database software (PostgreSQL, MySQL)
  Commonly available APIs
Performance DataBase Toolkit (PerfDBT)
  Commonly used modules for query and analysis
  PerfDB API to facilitate analysis tool development
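The project / experiment / trial organization maps naturally onto relational tables that standard SQL can query across trials. The sketch below uses Python's built-in `sqlite3` in place of PostgreSQL or MySQL so that it runs without a server. The schema, table names, and all values are hypothetical placeholders, not the PerfDB schema or measured results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    name TEXT);
CREATE TABLE trial (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER REFERENCES experiment(id),
    processes INTEGER);
CREATE TABLE interval_data (
    trial_id INTEGER REFERENCES trial(id),
    event TEXT,
    exclusive_s REAL);
""")

# Placeholder rows, invented purely to exercise the query.
conn.execute("INSERT INTO project VALUES (1, 'demo')")
conn.execute("INSERT INTO experiment VALUES (1, 1, 'scaling')")
conn.executemany("INSERT INTO trial VALUES (?, 1, ?)", [(1, 64), (2, 128)])
conn.executemany(
    "INSERT INTO interval_data VALUES (?, ?, ?)",
    [(1, "compute", 8.0), (1, "exchange", 2.0),
     (2, "compute", 4.0), (2, "exchange", 3.0)],
)

# Cross-trial query: how does one event change with process count?
rows = conn.execute("""
    SELECT t.processes, d.event, d.exclusive_s
    FROM interval_data AS d
    JOIN trial AS t ON t.id = d.trial_id
    WHERE d.event = 'exchange'
    ORDER BY t.processes
""").fetchall()
```

Once profiles from many runs share one schema, scaling questions become ordinary joins instead of ad hoc file parsing.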


PerfDBF Browser


PerfDBF Cross-Trial Analysis


TAU Application (Selected)

SAMRAI (LLNL)
Overture (LLNL)
C-SAFE (ASCI ASAP, University of Utah)
VTF (ASCI ASAP, Caltech)
SAGE (ASCI LANL)
POOMA, POOMA-II (LANL, Code Sourcery)
PETSc (ANL)
CCA (DOE SciDAC)
GrACE (Rutgers University)
DOE ACTS toolkit
Aurora / SCALEA (University of Vienna)


Work in Progress

Trace visualization
  Event traces with counters (Vampir 3.0 will visualize)
  EPILOG trace conversion
Runtime performance monitoring and analysis
  Online performance data access
  Performance analysis and visualization in SCIRun
Performance Database Framework
  XML parallel profile representation of TAU profiles
  PostgreSQL performance database
Next-generation PDT
Performance analysis for component software (CCA)


Concluding Remarks

Complex software and parallel computing systems pose challenging performance analysis problems that require robust methodologies and tools
To build more sophisticated performance tools, existing proven performance technology must be utilized
Performance tools must be integrated with software and systems models and technology
  Performance engineered software
  Function consistently and coherently in software and system environments
TAU performance system offers robust performance technology that can be broadly integrated … so USE IT!


Acknowledgements

Department of Energy (DOE)
  MICS office
    DOE 2000 ACTS contract
    “Performance Technology for Tera-class Parallel Computer Systems: Evolution of the TAU Performance System”
    PERC SciDAC project affiliate
  University of Utah DOE ASCI Level 1 sub-contract
  DOE ASCI Level 3 (LANL, LLNL)
NSF National Young Investigator (NYI) award
Research Centre Juelich
  John von Neumann Institute for Computing
  Dr. Bernd Mohr
Los Alamos National Laboratory


Case Study: SAMRAI (LLNL)

Structured Adaptive Mesh Refinement Application Infrastructure (SAMRAI)
Programming
  C++ and MPI
  SPMD
Instrumentation
  PDT for automatic instrumentation of routines
  MPI interposition wrappers
  SAMRAI timers for interesting code segments
    timers classified in groups (apps, mesh, …)
    timer groups are managed by TAU groups


SAMRAI (Profile)

Euler (2D)

return type routine name


SAMRAI Euler (Profile)


SAMRAI Euler (Trace)


Case Study: EVH1

Enhanced Virginia Hydrodynamics #1 (EVH1)
  "TeraScale Simulations of Neutrino-Driven Supernovae and Their Nucleosynthesis" SciDAC project
  Configured to run a simulation of the Sedov-Taylor blast wave solution in 2D spherical geometry
Performance study found EVH1 communication bound for more than 64 processors
  Predominant routine (>50% of execution time) at this scale is MPI_ALLTOALL
  Used in matrix transpose-like operations


EVH1 Execution Profile


EVH1 Execution Trace

MPI_Alltoall is an execution bottleneck