
Integrating Performance Analysis in the Uintah Software Development Cycle

Allen D. Malony, Sameer Shende
{malony,sameer}@cs.uoregon.edu
Department of Computer and Information Science
Computational Science Institute
University of Oregon

J. Davison de St. Germain, Allan Morris, Steven G. Parker
{dav,amorris,sparker}@cs.utah.edu
Department of Computer Science
School of Computing
University of Utah

Outline
- Scientific software engineering
- C-SAFE and Uintah Computational Framework (UCF)
  - Goals and design
- Challenges for performance technology integration
- TAU performance system
- Role of performance mapping
- Performance analysis integration in UCF
  - TAU performance mapping
  - XPARE
- Concluding remarks

Scientific Software (Performance) Engineering
- Modern scientific simulation software is complex
  - Large development teams of diverse expertise
  - Simultaneous development on different system parts
  - Iterative, multi-stage, long-term software development
- Need support for managing a complex software process
  - Software engineering tools for revision control, automated testing, and bug tracking are commonplace
  - In contrast, tools for performance engineering are not
    - evaluation (measurement, analysis, benchmarking)
    - optimization (diagnosis, tracking, prediction, tuning)
- Incorporate a performance engineering methodology supported by flexible and robust performance tools

Utah ASCI/ASAP Level 1 Center (C-SAFE)
- C-SAFE was established to build a problem-solving environment (PSE) for the numerical simulation of accidental fires and explosions
  - Combine fundamental chemistry and engineering physics
  - Integrate non-linear solvers, optimization, computational steering, visualization, and experimental data verification
  - Support very large-scale coupled simulations
- Computer science problems:
  - Coupling multiple scientific simulation codes with different numerical and software properties
  - Software engineering across diverse expert teams
  - Achieving high performance on large-scale systems

Example C-SAFE Simulation Problems
- Heptane fire simulation
- Material stress simulation
- Typical C-SAFE simulation with a billion degrees of freedom and non-linear time dynamics

Uintah Problem Solving Environment (PSE)

- Enhanced SCIRun PSE
  - Pure dataflow → component-based
  - Shared memory → scalable multi-/mixed-mode parallelism
  - Interactive only → interactive plus standalone
- Design and implement Uintah component architecture
  - Application programmers provide (see the task sketch below):
    - description of computation (tasks and variables)
    - code to perform a task on a single "patch" (sub-region of space)
  - Components for scheduling, partitioning, load balance, …
  - Follows the Common Component Architecture (CCA) model
- Design and implement the Uintah Computational Framework (UCF) on top of the component architecture
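
In this model, the application programmer's contribution is essentially declarative. A minimal sketch of what such a task description could look like, in hedged form (TaskSpec, the variable names, and the kernel signature are all hypothetical, not the actual Uintah interface):

#include <string>
#include <vector>

struct Patch { int id; };   /* sub-region of the structured grid */

/* Hypothetical task declaration: the programmer states what the
   task consumes and produces, plus the kernel run on one patch. */
struct TaskSpec {
  std::string name;
  std::vector<std::string> inputs;    /* variables consumed  */
  std::vector<std::string> outputs;   /* variables produced  */
  void (*kernel)(const Patch&);
};

void interpolateKernel(const Patch& patch) {
  /* ... operate on the particles owned by this patch ... */
}

TaskSpec interpolate = {
  "interpolate particles to grid",
  { "p.mass", "p.velocity" },   /* hypothetical variable names */
  { "g.velocity" },
  interpolateKernel
};

Given declarations of this form, the framework, not the programmer, decides where each task instance runs and what communication its inputs require.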

Uintah High-Level Component View

High Level Architecture

[Diagram (Uintah Parallel Component Architecture): C-SAFE components (Mixing Model, Fluid Model, Subgrid Model, Chemistry Database Controller, Chemistry Databases, High Energy Simulations, Numerical Solvers, MPM, Material Properties Database) connect through the UCF. PSE components include the Scheduler, Simulation Controller, Problem Specification, Data Manager, Visualization, Parallel Services, and Resource Management; non-PSE components include Performance Analysis, Blazer, Database, and Post Processing and Analysis. A Checkpointing component is implicitly connected to all components; edges distinguish data connections from control / light-data connections.]

Uintah Computational Framework (UCF)
- Execution model based on software (macro) dataflow
  - Exposes parallelism and hides data transport latency
  - Computations expressed as directed acyclic graphs of tasks
    - each task consumes inputs and produces outputs (inputs to future tasks)
    - inputs/outputs specified for each patch in a structured grid
- Abstraction of global single-assignment memory: DataWarehouse (sketched below)
  - Directory mapping names to values (array structured)
  - Write a value once, then communicate it to awaiting tasks
- Task graph gets mapped to processing resources
  - Communication schedule approximates the global optimum
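
The single-assignment rule is what makes the dataflow safe: a variable for a given patch is produced exactly once, so readers never observe a partial update. A minimal sketch of the idea, assuming a hypothetical put/get interface keyed by variable name and patch (the real DataWarehouse is far richer):

#include <map>
#include <stdexcept>
#include <string>
#include <utility>

class DataWarehouse {
public:
  /* Write a named value for one patch; a second put() to the
     same (name, patch) key violates single assignment. */
  void put(const std::string& name, int patch, double value) {
    if (!store_.insert({{name, patch}, value}).second)
      throw std::runtime_error("single-assignment violation: " + name);
  }

  /* Read a value previously produced by some task. */
  double get(const std::string& name, int patch) const {
    auto it = store_.find({name, patch});
    if (it == store_.end())
      throw std::runtime_error("value not yet computed: " + name);
    return it->second;
  }

private:
  std::map<std::pair<std::string, int>, double> store_;
};

Because a value can appear only once, the framework can forward it to every awaiting task (including remote ones) as soon as the producing task completes.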

Uintah Task Graph (Material Point Method)
- Diagram of named tasks (ovals) and data (edges)
- Imminent computation is dataflow-constrained
- MPM: Newtonian material point motion time step
  - Solid: values defined at material point (particle)
  - Dashed: values defined at vertex (grid)
  - Prime ('): values updated during time step

Example Task Graphs (MPM and Coupled)

Uintah PSE
- UCF automatically sets up:
  - Domain decomposition
  - Inter-processor communication with aggregation/reduction
  - Parallel I/O
  - Checkpoint and restart
  - Performance measurement and analysis (stay tuned)
- Software engineering
  - Coding standards
  - CVS (commits: Y3 - 26.6 files/day, Y4 - 29.9 files/day)
  - Correctness regression testing with bugzilla bug tracking
  - Nightly build (parallel compiles)
  - 170,000 lines of code (Fortran and C++ tasks supported)

Performance Technology Integration
- Uintah presents challenges to performance integration
  - Software diversity and structure
    - UCF middleware, simulation code modules
    - component-based hierarchy
  - Portability objectives
    - cross-language and cross-platform
    - multi-parallelism: thread, message passing, mixed
  - Scalability objectives
  - High-level programming and execution abstractions
- Requires flexible and robust performance technology
- Requires support for performance mapping

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization (a minimal example follows)
  - Portable performance profiling/tracing facility
  - Open software approach
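
To ground this, a minimal instrumentation example using TAU's documented profiling macros (the function being timed is hypothetical; only the TAU_PROFILE* calls are TAU's):

#include <TAU.h>

/* A hypothetical kernel instrumented with a TAU timer. */
void compute(int n) {
  TAU_PROFILE("compute", "void (int)", TAU_USER);
  /* ... work to be measured ... */
}

int main(int argc, char** argv) {
  TAU_PROFILE_INIT(argc, argv);
  TAU_PROFILE_SET_NODE(0);   /* single-node example */
  TAU_PROFILE("main", "int (int, char**)", TAU_DEFAULT);
  compute(1000);
  return 0;
}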

TAU Performance System Architecture
[Architecture diagram: TAU instrumentation, measurement, and analysis layers, with trace output to formats/tools including EPILOG and Paraver]

Performance Analysis Objectives for Uintah
- Micro tuning
  - Optimization of simulation code (task) kernels for maximum serial performance
- Scalability tuning
  - Identification of parallel execution bottlenecks
    - overheads: scheduler, data warehouse, communication
    - load imbalance
  - Adjustment of task graph decomposition and scheduling
- Performance tracking
  - Understand performance impacts of code modifications
  - Throughout the course of software development
  - C-SAFE application and UCF software

Task Execution in Uintah Parallel Scheduler

- Profile methods and functions in the scheduler and in the MPI library
- Task execution time dominates (what task?)
- MPI communication overheads (where?)
- Task execution time distribution per process
- Need to map performance data!

Semantics-Based Performance Mapping
- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign data correctly

Hypothetical Mapping Example
Particles distributed on surfaces of a cube

Particle* P[MAX]; /* Array of particles */

void GenerateParticles() {
  /* distribute particles over all faces of the cube */
  for (int face = 0, last = 0; face < 6; face++) {
    /* number of particles on this face */
    int particles_on_this_face = num(face);
    for (int i = last; i < last + particles_on_this_face; i++) {
      /* particle properties are a function of face */
      P[i] = ... f(face);
      ...
    }
    last += particles_on_this_face;
  }
}

Hypothetical Mapping Example (continued)

- How much time (flops) is spent processing face i particles?
- What is the distribution of performance among faces?
- How is this determined if execution is parallel?
(a mapping sketch follows the code)

int ProcessParticle(Particle *p) {
  /* perform some computation on p */
}

int main() {
  GenerateParticles(); /* create a list of particles */
  for (int i = 0; i < N; i++) /* iterates over the list */
    ProcessParticle(P[i]);
}
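
One way to answer these questions is to keep an external association from the semantic attribute (the cube face) to a timer, and look it up while processing each particle. A sketch with a hypothetical Timer class standing in for TAU's mapped timer objects (the face field on Particle is also assumed):

#include <chrono>
#include <map>

/* Hypothetical accumulating timer; in TAU, a mapped
   timer object plays this role. */
struct Timer {
  double total = 0.0;
  std::chrono::steady_clock::time_point t0;
  void start() { t0 = std::chrono::steady_clock::now(); }
  void stop()  {
    total += std::chrono::duration<double>(
                 std::chrono::steady_clock::now() - t0).count();
  }
};

struct Particle { int face; };     /* face = semantic attribute */

std::map<int, Timer> faceTimer;    /* the mapping: face -> timer */

void ProcessParticle(Particle* p) {
  Timer& t = faceTimer[p->face];   /* look up timer by attribute */
  t.start();
  /* ... perform some computation on p ... */
  t.stop();
}

Reporting faceTimer[f].total for f = 0..5 then yields exactly the per-face distribution that a routine-level profile of ProcessParticle cannot show.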

No Performance Mapping versus Mapping
- Typical performance tools report performance with respect to routines
  - Does not provide support for mapping
- TAU's performance mapping can observe performance with respect to the scientist's programming and problem abstractions
[Screenshots: TAU (no mapping) vs. TAU (w/ mapping)]

Uintah Task Performance Mapping

- Uintah partitions individual particles across processing elements (processes or threads)
- Simulation tasks in the task graph work on particles
  - Tasks have domain-specific character in the computation
    - "interpolate particles to grid" in the Material Point Method
- Task instances are generated for each partitioned particle set
- Execution is scheduled with respect to task dependencies
- How to attribute execution time among different tasks?
  - Assign a semantic name (task type) to a task instance
    - SerialMPM::interpolateParticleToGrid
  - Map a TAU timer object to the (abstract) task (semantic entity)
  - Look up the timer object using the task type (semantic attribute), as sketched below
  - Further partition along different domain-specific axes
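
In the scheduler, this reduces to one lookup per task instance, keyed by task type, so every instance of a task type, on any patch, accumulates into the same semantic timer. A sketch with hypothetical names (the actual UCF scheduler and TAU mapping calls differ in detail):

#include <chrono>
#include <map>
#include <string>

/* Hypothetical accumulating timer, as in the previous sketch. */
struct Timer {
  double total = 0.0;
  std::chrono::steady_clock::time_point t0;
  void start() { t0 = std::chrono::steady_clock::now(); }
  void stop()  {
    total += std::chrono::duration<double>(
                 std::chrono::steady_clock::now() - t0).count();
  }
};

struct Task {
  std::string type;        /* e.g. "SerialMPM::interpolateParticleToGrid" */
  int patch;               /* sub-region this instance works on */
  void (*run)(int patch);  /* task body supplied by the application */
};

std::map<std::string, Timer> taskTimer;  /* task type -> semantic timer */

void execute(const Task& task) {
  Timer& t = taskTimer[task.type];  /* same timer for every instance */
  t.start();
  task.run(task.patch);
  t.stop();
}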

Task Performance Mapping (Profile)
Performance mapping for different tasks
Mapped task performance across processes

Task Performance Mapping (Trace)
Work packet computation events colored by task type
Distinct phases of computation can be identified based on task

Task Performance Mapping (Trace - Zoom)
Startup communication imbalance

Task Performance Mapping (Trace - Parallelism)
Communication / load imbalance

Comparing Uintah Traces for Scalability Analysis
[Trace screenshots: 8 processes vs. 32 processes]

Performance Tracking and Reporting
- Integrated performance measurement allows performance analysis throughout the development lifetime
- Apply performance engineering in the software design and development (software engineering) process
  - Create a "performance portfolio" from regular performance experimentation (coupled with software testing)
  - Use performance knowledge in making key software design decisions, prior to major development stages
  - Use performance benchmarking and regression testing to identify irregularities
  - Support automatic reporting of "performance bugs"
  - Enable cross-platform (cross-generation) evaluation

XPARE - eXPeriment Alerting and REporting
- Experiment launcher automates measurement / analysis
  - Configuration and compilation of performance tools
  - Instrumentation control for Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated in the Uintah software testing harness
- Reporting system conducts performance regression tests
  - Apply performance difference thresholds (alert ruleset), as in the sketch below
  - Alert users via email if thresholds have been exceeded
- Web alerting setup and full performance data reporting
- Historical performance data analysis
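
The core of such a regression test is a single comparison per metric: alert when the new measurement exceeds the stored baseline by more than a configured tolerance. A hedged sketch of that check (the metric, numbers, and threshold are illustrative, not XPARE's actual ruleset format):

#include <cstdio>

/* True if current exceeds baseline by more than the relative
   threshold, e.g. 0.10 for a 10% allowance. */
bool regressed(double baseline, double current, double threshold) {
  return (current - baseline) / baseline > threshold;
}

int main() {
  double baseline = 41.8, current = 47.3;   /* illustrative seconds */
  if (regressed(baseline, current, 0.10))
    std::printf("ALERT: %.1f%% over baseline\n",
                100.0 * (current - baseline) / baseline);
  return 0;
}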

XPARE System Architecture
[Diagram: XPARE components: Experiment Launch, Performance Database, Regression Analyzer, Comparison Tool, Performance Reporter, Alerting Setup, Mail Server]

Scaling Performance Optimizations (Past)
- Last year: initial "correct" scheduler
- Reduced communication by 10x
- Reduced task graph overhead by 20x
[Speedup graph: ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory]

Scalability to 2000 Processors (Current)
[Speedup graph: ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory]

Concluding Remarks
- Modern scientific simulation environments involve a complex (scientific) software engineering process
  - Iterative, diverse expertise, multiple teams, concurrent development
- Complex parallel software and systems pose challenging performance analysis problems that require flexible and robust performance technology and methods
  - Cross-platform, cross-language, large-scale
  - Fully-integrated performance analysis system
  - Performance mapping
- Need to support a performance engineering methodology within scientific software design and development
  - Performance comparison and tracking

Acknowledgements
Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)

Center for the Simulation of Accidental Fires and Explosions (C-SAFE), ASCI/ASAP Level 1 center, University of Utah
http://www.csafe.utah.edu

Computational Science Institute, ASCI/ASAP Level 3 projects with LLNL / LANL, University of Oregon
http://www.csi.uoregon.edu
ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt