Integrating Performance Analysis in the Uintah Software Development Cycle


Page 1:

Allen D. Malony,

Sameer Shende {malony,sameer}@cs.uoregon.edu

Department of Computer and Information Science

Computational Science Institute

University of Oregon

Integrating Performance Analysis in the Uintah Software Development Cycle

J. Davison de St. Germain, Allan Morris, Steven G. Parker {dav,amorris,sparker}@cs.utah.edu

Department of Computer Science

School of Computing

University of Utah

Page 2:

Outline

Scientific software engineering

C-SAFE and Uintah Computational Framework (UCF)
  - Goals and design
  - Challenges for performance technology integration

TAU performance system

Role of performance mapping

Performance analysis integration in UCF
  - TAU performance mapping
  - XPARE

Concluding remarks

Page 3:

Scientific Software (Performance) Engineering

Modern scientific simulation software is complex:
  - Large development teams of diverse expertise
  - Simultaneous development on different system parts
  - Iterative, multi-stage, long-term software development

Support is needed for managing the complex software process:
  - Software engineering tools for revision control, automated testing, and bug tracking are commonplace
  - In contrast, tools for performance engineering are not:
      - evaluation (measurement, analysis, benchmarking)
      - optimization (diagnosis, tracking, prediction, tuning)

Goal: incorporate a performance engineering methodology, supported by flexible and robust performance tools

Page 4:

Utah ASCI/ASAP Level 1 Center (C-SAFE)

C-SAFE was established to build a problem-solving environment (PSE) for the numerical simulation of accidental fires and explosions:
  - Combine fundamental chemistry and engineering physics
  - Integrate non-linear solvers, optimization, computational steering, visualization, and experimental data verification
  - Support very large-scale coupled simulations

Computer science problems:
  - Coupling multiple scientific simulation codes with different numerical and software properties
  - Software engineering across diverse expert teams
  - Achieving high performance on large-scale systems

Page 5:

Example C-SAFE Simulation Problems

Heptane fire simulation

Material stress simulation

Typical C-SAFE simulation with a billion degrees of freedom and non-linear time dynamics

Page 6:

Uintah Problem Solving Environment (PSE)

Enhanced SCIRun PSE:
  - Pure dataflow → component-based
  - Shared memory → scalable multi-/mixed-mode parallelism
  - Interactive only → interactive plus standalone

Design and implement the Uintah component architecture:
  - Application programmers provide a description of the computation (tasks and variables) and the code to perform a task on a single "patch" (a sub-region of space); a sketch of this division of labor follows below
  - Components for scheduling, partitioning, load balance, ...
  - Follows the Common Component Architecture (CCA) model

Design and implement the Uintah Computational Framework (UCF) on top of the component architecture
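To make that division of labor concrete, here is a minimal, hypothetical sketch: the framework hands a task one patch plus data stores, and the application programmer supplies only the per-patch kernel. All type and function names here are illustrative, not Uintah's actual API.

#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Patch { int id; int size; };   /* a sub-region of the structured grid */

/* Stand-in for a data warehouse: maps variable names to values. */
struct DataStore {
  std::map<std::string, std::vector<double> > vars;
  const std::vector<double>& get(const std::string& name) { return vars[name]; }
  void put(const std::string& name, std::vector<double> v) { vars[name] = std::move(v); }
};

/* Programmer-supplied kernel: computes on a single patch only. */
void interpolateParticlesToGrid(const Patch& patch, DataStore& oldDW, DataStore& newDW) {
  const std::vector<double>& pmass = oldDW.get("p.mass");   /* required input */
  std::vector<double> gmass(patch.size, 0.0);
  for (std::size_t i = 0; i < pmass.size(); i++)
    gmass[i % patch.size] += pmass[i];                      /* placeholder computation */
  newDW.put("g.mass", std::move(gmass));                    /* computed output */
}

int main() {
  Patch patch = {0, 8};
  DataStore oldDW, newDW;
  oldDW.put("p.mass", std::vector<double>{1.0, 2.0, 3.0});
  interpolateParticlesToGrid(patch, oldDW, newDW);   /* framework invokes this per patch */
  return 0;
}

In the real framework, the task would also declare which variables it requires and computes, so the scheduler can build the task graph described on the following pages.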

Page 7:

Uintah High-Level Component View

Page 8:

High Level Architecture

[Figure: Uintah Parallel Component Architecture. The diagram distinguishes the C-SAFE and UCF layers, PSE and non-PSE components, data vs. control / light-data connections, and components implicitly connected to all others. Components shown include: Checkpointing, Mixing Model, Fluid Model, Subgrid Model, Chemistry Database Controller, Chemistry Databases, High Energy Simulations, Numerical Solvers, Performance Analysis, Simulation Controller, Problem Specification, MPM, Material Properties Database, Blazer, Database, Visualization, Data Manager, Post Processing and Analysis, Parallel Services, Resource Management, and Scheduler.]

Page 9:

Uintah Computational Framework (UCF)

Execution model based on software (macro) dataflow:
  - Exposes parallelism and hides data transport latency
  - Computations are expressed as directed acyclic graphs of tasks; each task consumes inputs and produces outputs (inputs to future tasks)
  - Inputs/outputs are specified for each patch in a structured grid

Abstraction of global single-assignment memory: the DataWarehouse
  - Directory mapping names to values (array structured)
  - A value is written once, then communicated to awaiting tasks (see the sketch below)

The task graph is mapped onto processing resources; the communication schedule approximates the global optimum
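A minimal sketch of the single-assignment idea (hypothetical code, not the UCF implementation): each (variable, patch) entry may be written exactly once, which is what lets the scheduler know when a value is final and can be communicated to the tasks awaiting it.

#include <cassert>
#include <map>
#include <string>
#include <utility>

class DataWarehouse {
  std::map<std::pair<std::string, int>, double> values;   /* (name, patch) -> value */
public:
  void put(const std::string& name, int patch, double v) {
    auto key = std::make_pair(name, patch);
    /* single assignment: a second write to the same entry is an error */
    assert(values.find(key) == values.end() && "write-once violated");
    values[key] = v;
  }
  double get(const std::string& name, int patch) const {
    auto it = values.find(std::make_pair(name, patch));
    /* the real system would wait for, or be scheduled after, the producer */
    assert(it != values.end() && "value not yet produced");
    return it->second;
  }
};

int main() {
  DataWarehouse dw;
  dw.put("g.mass", 0, 3.0);                  /* producer task writes exactly once */
  return dw.get("g.mass", 0) > 0.0 ? 0 : 1;  /* consumer task reads after the write */
}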

Page 10:

Uintah Task Graph (Material Point Method)

Diagram of named tasks (ovals) and data (edges):
  - Imminent computation
  - Dataflow-constrained

MPM: Newtonian material point motion time step
  - Solid: values defined at a material point (particle)
  - Dashed: values defined at a vertex (grid)
  - Prime ('): values updated during the time step

Page 11:

Example Taskgraphs (MPM and Coupled)

Page 12:

Uintah PSE

UCF automatically sets up:
  - Domain decomposition
  - Inter-processor communication with aggregation/reduction
  - Parallel I/O
  - Checkpoint and restart
  - Performance measurement and analysis (stay tuned)

Software engineering:
  - Coding standards
  - CVS (commits: Y3 - 26.6 files/day, Y4 - 29.9 files/day)
  - Correctness regression testing with Bugzilla bug tracking
  - Nightly build (parallel compiles)
  - 170,000 lines of code (Fortran and C++ tasks supported)

Page 13:

Performance Technology Integration

Uintah presents challenges to performance integration:
  - Software diversity and structure: UCF middleware, simulation code modules, component-based hierarchy
  - Portability objectives: cross-language and cross-platform; multi-parallelism (thread, message passing, mixed)
  - Scalability objectives
  - High-level programming and execution abstractions

This requires flexible and robust performance technology, and support for performance mapping

Page 14:

TAU Performance System Framework

Tuning and Analysis Utilities (TAU)

Performance system framework for scalable parallel and distributed high-performance computing

Targets a general complex system computation model:
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction

Integrated toolkit for performance instrumentation, measurement, analysis, and visualization:
  - Portable performance profiling/tracing facility
  - Open software approach
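For flavor, a minimal manual instrumentation sketch using TAU's C++ macro API. The macro names follow TAU's documented interface as we recall it; treat the exact signatures as an assumption and verify against the TAU distribution.

#include <TAU.h>   /* TAU instrumentation macros */

void compute() {
  /* create and start a timer scoped to this routine */
  TAU_PROFILE("compute()", "", TAU_USER);
  /* ... application work to be measured ... */
}

int main(int argc, char** argv) {
  TAU_PROFILE_INIT(argc, argv);   /* initialize the measurement system */
  TAU_PROFILE_SET_NODE(0);        /* single-node example */
  TAU_PROFILE("main()", "", TAU_DEFAULT);
  compute();
  return 0;
}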

Page 15:

TAU Performance System Architecture

[Figure: TAU performance system architecture; external tools and formats shown include EPILOG and Paraver.]

Page 16:

Performance Analysis Objectives for Uintah

Micro tuning:
  - Optimization of simulation code (task) kernels for maximum serial performance

Scalability tuning:
  - Identification of parallel execution bottlenecks: overheads (scheduler, data warehouse, communication) and load imbalance
  - Adjustment of task graph decomposition and scheduling

Performance tracking:
  - Understand the performance impacts of code modifications
  - Throughout the course of software development
  - C-SAFE application and UCF software

Page 17:

Task Execution in Uintah Parallel Scheduler

Profile methods and functions in the scheduler and in the MPI library:
  - Task execution time dominates (what task?)
  - MPI communication overheads (where?)
  - Task execution time distribution per process

Need to map performance data!

Page 18:

Semantics-Based Performance Mapping

Associate performance measurements with high-level semantic abstractions

Need mapping support in the performance measurement system to assign data correctly

Page 19:

Hypothetical Mapping Example

Particles are distributed on the surfaces of a cube:

Particle* P[MAX];  /* array of particles */

int GenerateParticles() {
  int last = 0;
  /* distribute particles over all faces of the cube */
  for (int face = 0; face < 6; face++) {
    /* particles on this face */
    int particles_on_this_face = num(face);
    for (int i = last; i < last + particles_on_this_face; i++) {
      /* particle properties are a function of face */
      P[i] = ... f(face);
      ...
    }
    last += particles_on_this_face;
  }
  return last;  /* total number of particles generated */
}

Page 20:

Hypothetical Mapping Example (continued)

How much time (flops) is spent processing face i particles?
What is the distribution of performance among faces?
How is this determined if execution is parallel?

void ProcessParticle(Particle* p) {
  /* perform some computation on p */
}

int main() {
  /* create the list of particles */
  int N = GenerateParticles();

  /* iterate over the list */
  for (int i = 0; i < N; i++)
    ProcessParticle(P[i]);
}
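One way to realize the mapping (an illustrative sketch in plain C++, not TAU's actual mapping API): tag each particle at creation with a per-face timer, so ProcessParticle can charge its time to the face the particle came from, regardless of where or when it executes. The Particle layout and the now() helper are hypothetical.

#include <chrono>
#include <cstdio>

/* wall-clock seconds (helper for this sketch) */
static double now() {
  return std::chrono::duration<double>(
      std::chrono::steady_clock::now().time_since_epoch()).count();
}

struct FaceTimer { double total = 0.0; };
static FaceTimer faceTimers[6];         /* one semantic entity per cube face */

struct Particle { FaceTimer* timer; };  /* embedded association: particle -> face */

Particle MakeParticle(int face) {
  Particle p;
  p.timer = &faceTimers[face];          /* recorded once, at creation */
  return p;
}

void ProcessParticle(Particle* p) {
  double t0 = now();
  /* ... computation on p ... */
  p->timer->total += now() - t0;        /* attribute cost to the particle's face */
}

int main() {
  Particle p = MakeParticle(2);
  ProcessParticle(&p);
  std::printf("face 2 time: %g s\n", faceTimers[2].total);
  return 0;
}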

Page 21:

No Performance Mapping versus Mapping

Typical performance tools report performance with respect to routines; they do not provide support for mapping.

TAU's performance mapping can observe performance with respect to the scientist's programming and problem abstractions.

[Figure: side-by-side profiles, TAU without mapping vs. TAU with mapping.]

Page 22:

Uintah Task Performance Mapping

  - Uintah partitions individual particles across processing elements (processes or threads)
  - Simulation tasks in the task graph work on particles
  - Tasks have a domain-specific character in the computation, e.g., "interpolate particles to grid" in the Material Point Method
  - Task instances are generated for each partitioned particle set
  - Execution is scheduled with respect to task dependencies

How to attribute execution time among the different tasks?
  - Assign a semantic name (task type) to each task instance, e.g., SerialMPM::interpolateParticleToGrid
  - Map a TAU timer object to the (abstract) task (semantic entity)
  - Look up the timer object using the task type (semantic attribute); see the sketch below
  - Further partition along different domain-specific axes
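A sketch of that lookup step (hypothetical plain C++ standing in for TAU timer objects): the scheduler keeps a table from semantic task type to timer, creating one the first time a type is seen, so every instance of a task type accumulates into a single profile entry.

#include <chrono>
#include <functional>
#include <map>
#include <string>

struct Timer { double total = 0.0; };   /* stand-in for a TAU timer object */

static double secondsNow() {
  return std::chrono::duration<double>(
      std::chrono::steady_clock::now().time_since_epoch()).count();
}

/* look up (or create) the timer for a semantic task type */
Timer& timerFor(const std::string& taskType) {
  static std::map<std::string, Timer> table;
  return table[taskType];
}

/* scheduler inner loop (sketch): run a task instance, charging its type */
void runAndAttribute(const std::string& taskType, const std::function<void()>& run) {
  Timer& t = timerFor(taskType);        /* e.g. "SerialMPM::interpolateParticleToGrid" */
  double t0 = secondsNow();
  run();                                /* execute the task instance on its patch */
  t.total += secondsNow() - t0;
}

int main() {
  runAndAttribute("SerialMPM::interpolateParticleToGrid", []{ /* ... */ });
  return 0;
}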

Page 23:

Task Performance Mapping (Profile)

Performance mapping for different tasks

Mapped task performance across processes

Page 24:

Task Performance Mapping (Trace)

Work packet computation events colored by task type

Distinct phases of computation can be identified based on task

Page 25:

Task Performance Mapping (Trace - Zoom)

Startup communication imbalance

Page 26:

Task Performance Mapping (Trace - Parallelism)

Communication / load imbalance

Page 27:

Comparing Uintah Traces for Scalability Analysis

[Figure: execution traces compared at 8 processes and at 32 processes.]

Page 28:

Performance Tracking and Reporting

Integrated performance measurement allows performance analysis throughout the development lifetime

Apply performance engineering in the software design and development (software engineering) process:
  - Create a "performance portfolio" from regular performance experimentation (coupled with software testing)
  - Use performance knowledge in making key software design decisions, prior to major development stages
  - Use performance benchmarking and regression testing to identify irregularities
  - Support automatic reporting of "performance bugs"
  - Enable cross-platform (cross-generation) evaluation

Page 29:

XPARE - eXPeriment Alerting and REporting

Experiment launcher automates measurement and analysis:
  - Configuration and compilation of performance tools
  - Instrumentation control for the Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated in the Uintah software testing harness

Reporting system conducts performance regression tests (see the sketch below):
  - Apply performance difference thresholds (alert ruleset)
  - Alert users via email if thresholds have been exceeded
  - Web alerting setup and full performance data reporting
  - Historical performance data analysis
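An illustrative sketch of such a regression test (hypothetical code; the talk does not show XPARE's actual ruleset format): compare each metric against a stored baseline and flag relative slowdowns beyond a threshold.

#include <cstdio>
#include <map>
#include <string>

/* returns true if no metric regressed beyond the threshold */
bool checkRegressions(const std::map<std::string, double>& baseline,
                      const std::map<std::string, double>& current,
                      double threshold) {   /* e.g. 0.10 = alert on >10% slowdown */
  bool ok = true;
  for (const auto& kv : baseline) {
    auto it = current.find(kv.first);
    if (it == current.end()) continue;      /* metric not measured this run */
    double rel = (it->second - kv.second) / kv.second;   /* relative change */
    if (rel > threshold) {
      std::printf("ALERT: %s slowed by %.1f%%\n", kv.first.c_str(), 100.0 * rel);
      ok = false;                           /* the reporter would e-mail the team */
    }
  }
  return ok;
}

int main() {
  std::map<std::string, double> base = { {"MPI_Recv", 10.0} };
  std::map<std::string, double> cur  = { {"MPI_Recv", 12.5} };
  return checkRegressions(base, cur, 0.10) ? 0 : 1;   /* 25% slowdown -> alert */
}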

Page 30:

XPARE System Architecture

[Figure: XPARE system architecture, with components Experiment Launch, Mail Server, Performance Database, Performance Reporter, Comparison Tool, Regression Analyzer, and Alerting Setup.]

Page 31:

Scaling Performance Optimizations (Past)

Last year: initial "correct" scheduler
  - Reduced communication by 10x
  - Reduced task graph overhead by 20x

[Figure: scaling results on ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory.]

Page 32:

Scalability to 2000 Processors (Current)

[Figure: ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory.]

Page 33:

Concluding Remarks

Modern scientific simulation environments involve a complex (scientific) software engineering process: iterative, with diverse expertise, multiple teams, and concurrent development

Complex parallel software and systems pose challenging performance analysis problems that require flexible and robust performance technology and methods:
  - Cross-platform, cross-language, large-scale
  - Fully-integrated performance analysis system
  - Performance mapping

Need to support a performance engineering methodology within scientific software design and development:
  - Performance comparison and tracking

Page 34:

Acknowledgements

Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)

Center for the Simulation of Accidental Fires and Explosions (C-SAFE), ASCI/ASAP Level 1 center, University of Utah, http://www.csafe.utah.edu

Computational Science Institute, ASCI/ASAP Level 3 projects with LLNL / LANL, University of Oregon, http://www.csi.uoregon.edu

ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt