
Nick Trebon, Alan Morris, Jaideep Ray, Sameer Shende, Allen Malony

{ntrebon, amorris, malony, sameer}@cs.uoregon.edu
jairay@ca.sandia.gov

Department of Computer and Information Science, Performance Research Laboratory, University of Oregon
Sandia National Laboratories

“Performance Modeling of Component Assemblies with TAU”

Component Performance Modeling with TAU, Compframe, Jun. 23, 2005

Outline

• Motivation
• Introduction and Background
• Performance Measurement in HPC Component Environment
• Performance Measuring and Modeling Infrastructure
  • Proxies
  • TAU component
  • Mastermind
• Component Assembly Optimization
• Conclusions


Motivations

Given a set of components, where each component has multiple implementations, what is the optimal subset of implementations that solves a given problem?
• How to model a single component?
• How to create a global model from a set of component models?
• How to select the optimal subset of implementations?

• From a performance perspective, a component by itself has no meaning. A component needs a context.
• Context is affected by:
  • The problem being solved
  • Parameters (e.g., size of an array)
  • Mismatched data structures


Performance in HPC Component Environments

• Traditional role of performance measurement and modeling
  • Analysis-and-optimization phase, e.g., porting a stable code base to a new architecture
  • Performance model => predict scalability
• In a component environment
  • Applications are dynamically composed at runtime
  • Application developers typically do not implement all of their own components
  • Performance measurements need to be non-intrusive
  • Users interested in a coarse-grained performance


What does performance mean?

• Given a problem (characterized by a tuple P), what time Te does a component C need to solve it? That is, Te = f(P); what is f?
• To create a performance model f(P), we need:
  • Te = execution time for a method call
  • Tm = execution time of message-passing calls within a method
  • Tc = compute time for a given method (Tc = Te - Tm)
  • Input parameters that affect performance (e.g., size of an array)
• For our purposes, start with simplifying assumptions:
  • Blocking communication and no overlap of communication and computation
  • Ignore disk I/O
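A minimal sketch of the bookkeeping these definitions imply, under the slide's simplifying assumptions (blocking communication, no overlap, no disk I/O). The function name `compute_time` and the sample timings are made up for illustration; they are not measurements from the talk.

```python
def compute_time(t_exec, t_msg):
    """Return Tc = Te - Tm for one method invocation (times in seconds)."""
    if t_msg > t_exec:
        # With blocking communication and no overlap, Tm is part of Te.
        raise ValueError("message-passing time cannot exceed execution time")
    return t_exec - t_msg


# Hypothetical (Te, Tm) pairs for three invocations of the same method.
samples = [(2.0, 0.5), (4.0, 1.0), (8.0, 2.0)]
compute_times = [compute_time(te, tm) for te, tm in samples]
```

Because communication and computation are assumed not to overlap, the subtraction is enough; with overlap, Tc could no longer be recovered from Te and Tm alone.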


How to measure performance?

• Need to “instrument” the code
  • But it has to be non-intrusive
• What kind of performance infrastructure can achieve this?
  • Previous research suggests proxies
  • Proxies serve to intercept and forward method calls


CCA Performance Infrastructure

The proxy measurement system infrastructure:
• Proxy
  • Lightweight: simply a switch that turns measurement on and off
  • 1 proxy per component
• Tuning and Analysis Utilities (TAU) component
  • Utilizes the TAU measurement library
  • Provides a measurement port
  • Responsible for making the measurements
• Mastermind component
  • Responsible for gathering, storing, and reporting measurement data (timing data from TAU as well as input parameters from proxies)
  • Queries the TAU component for method-level measurements


Proxy

• A proxy uses and provides the same ports that the actual component provides
• Also uses a MonitorPort
• Identifies performance-dependent parameters

[Figure: component wiring. Before: components C1 and C2. After: C1, proxy P2, C2, and MM.]
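The intercept-and-forward idea can be sketched as follows. This is an illustration only, not the CCA or TAU API: `Monitor`, `IntegratorPort`, and `IntegratorProxy` are hypothetical names, and the monitor here just records wall-clock time and the parameters the proxy chooses to report.

```python
import time


class Monitor:
    """Hypothetical stand-in for the measurement side reached via a MonitorPort."""

    def __init__(self):
        self.records = []
        self._stack = []  # supports nested start/stop pairs

    def start(self, method, params):
        self._stack.append((method, params, time.perf_counter()))

    def stop(self):
        method, params, t0 = self._stack.pop()
        elapsed = time.perf_counter() - t0
        self.records.append({"method": method, "params": params, "time": elapsed})


class IntegratorPort:
    """Hypothetical port provided by the real component (C2)."""

    def integrate(self, values):
        return sum(values)


class IntegratorProxy:
    """Provides the same method as IntegratorPort and forwards every call."""

    def __init__(self, target, monitor):
        self._target = target
        self._monitor = monitor

    def integrate(self, values):
        # Report the performance-dependent parameter (here, the array size).
        self._monitor.start("integrate", {"n": len(values)})
        try:
            return self._target.integrate(values)
        finally:
            self._monitor.stop()


monitor = Monitor()
proxy = IntegratorProxy(IntegratorPort(), monitor)
result = proxy.integrate([1, 2, 3])
```

The caller (C1) sees the same interface as before, so neither C1 nor C2 has to be modified; only the wiring changes.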


Automatic Proxy Generation

• A tool based upon the Program Database Toolkit (University of Oregon)
• 1 proxy created per port


MasterMind

• A record is created for each instrumented routine and stores, for each invocation:
  • Measurement data (e.g., execution time, communication time, cache hits)
  • Input parameters
• Currently, the MasterMind outputs all records at application completion
• In the future, perhaps the MasterMind could output a performance model for a given component (based upon a linear regression)?
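A rough sketch of per-routine records and of the regression idea floated above. The class layout, the `fit_line` helper, and the sample data are assumptions made for illustration; the slides do not specify the record format, and the regression is described there only as a possible future feature.

```python
from collections import defaultdict


class Mastermind:
    """Hypothetical record store: one list of invocations per routine."""

    def __init__(self):
        self.records = defaultdict(list)

    def record(self, routine, params, measurements):
        self.records[routine].append(
            {"params": params, "measurements": measurements}
        )

    def report(self):
        # Mirrors "output all records at application completion".
        return [
            f"{routine}[{i}] params={inv['params']} measurements={inv['measurements']}"
            for routine, invocations in self.records.items()
            for i, inv in enumerate(invocations)
        ]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: execution time grows linearly with array size n.
mm = Mastermind()
for n, seconds in [(100, 1.0), (200, 2.0), (300, 3.0), (400, 4.0)]:
    mm.record("solve", {"n": n}, {"time": seconds})

sizes = [inv["params"]["n"] for inv in mm.records["solve"]]
times = [inv["measurements"]["time"] for inv in mm.records["solve"]]
intercept, slope = fit_line(sizes, times)
```

Here the fitted model plays the role of f(P) for a single parameter; a real component would likely need several parameters and a check that a linear fit is adequate.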


TAU Component

• The TAU component is a wrapper to the TAU library
  • Provides access to timers to measure execution time and communication time
  • Also provides access to hardware metrics (e.g., cache hits) via external libraries such as PAPI or PCL
• See http://www.cs.uoregon.edu/research/paracomp/tau


TAU Performance System Architecture


Using performance timings to select optimal components

• To find the optimal solution, need to reduce the solution space
  • Eliminate “insignificant” components
  • 2-step heuristic:
    • Are children, as a group, insignificant to a parent?
    • Is an individual node insignificant relative to its siblings?
• Optimize the reduced core for an approximately optimal solution
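One possible reading of the 2-step heuristic, sketched on a small call tree. The slides do not give the exact tests, so the comparisons below are assumptions: children are dropped as a group when their summed time falls below a fraction of the parent's time, and a single child is dropped when its time falls below the same fraction of its sibling group's total. The tree, the timings, and the function name `core_nodes` are invented for the example.

```python
def core_nodes(tree, times, node, threshold=0.10):
    """Return the set of nodes kept in the reduced core, starting at `node`.

    tree  -- dict mapping a node to the list of its children
    times -- dict mapping a node to its inclusive time
    """
    kept = {node}
    children = tree.get(node, [])
    group = sum(times[c] for c in children)

    # Step 1: are the children, as a group, insignificant to the parent?
    if not children or group < threshold * times[node]:
        return kept

    # Step 2: is an individual child insignificant relative to its siblings?
    for child in children:
        if times[child] >= threshold * group:
            kept |= core_nodes(tree, times, child, threshold)
    return kept


# Hypothetical call tree with inclusive times in seconds.
tree = {
    "main": ["solve", "io"],
    "solve": ["flux", "log"],
    "flux": ["helper"],
}
times = {"main": 100.0, "solve": 90.0, "io": 2.0,
         "flux": 80.0, "log": 0.5, "helper": 1.0}

core = core_nodes(tree, times, "main")
```

With these numbers, `io` and `log` fail the sibling test and `helper` fails the group test, leaving a three-node core to optimize.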


Case Study Example

• Core identification was run on a hydro shock simulation developed at Sandia National Labs
  • 10% thresholds
• The original call graph, consisting of 18 nodes, was reduced to 8 nodes


Conclusions

• The proxy-based measurement system allows for non-intrusive measurement of components
• A single component may have multiple performance models based on different contexts
• Eliminating “insignificant” components can ease the identification of an approximately optimal solution


Future Work

• Synthesize a composite performance model from individual component models
• Generalize performance models (e.g., parameterizing models by a processor speed and cache model to make them architecture independent)
• Model representation: XML?
• Quality-of-Service Dynamic Implementation Selection