CCA Common Component Architecture Performance Technology for Component Software - TAU Allen D. Malony (U. Oregon) Sameer Shende (U. Oregon) Craig Rasmussen (LANL) Jaideep Ray (SNL, CA) Matt Sottile (LANL)

Post on 20-Dec-2015


CCA Common Component Architecture

Performance Technology for Component Software - TAU

Allen D. Malony (U. Oregon)

Sameer Shende (U. Oregon)

Craig Rasmussen (LANL)

Jaideep Ray (SNL, CA)

Matt Sottile (LANL)

Performance Technology for Component Software - TAU | CCA Common Component Architecture

Overview

• Complexity and performance technology
• TAU performance system
• Developing performance interfaces for CCA
• Performance modeling and prediction issues
• Conclusions


Focus on Component Technology

• Emerging component technology for HPC and Grid
• Component: software object embedding functionality
• Component architecture (CA): how components connect
• Component framework: implements a CA
• Common Component Architecture (CCA)
  – Standard foundation for scientific component architecture
  – Component descriptions
    • Scientific Interface Description Language (SIDL)
  – CCA ports for component interactions
  – CCA framework services (CCAFEINE)


Problem Statement

How do we create robust and ubiquitous performance technology for the analysis and tuning of component software in the presence of (evolving) complexity challenges?

How do we apply performance technology effectively for the variety and diversity of performance problems that arise in the context of CCA components?


• Tuning and Analysis Utilities
• Performance system framework for scalable parallel and distributed high-performance computing
• Targets a general complex system computation model
  – Nodes / contexts / threads
  – Multi-level: system / software / parallelism
  – Measurement and analysis abstraction
• Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  – Portable, configurable performance profiling/tracing facility
  – Open software approach
• University of Oregon, LANL, FZJ Germany
• http://www.cs.uoregon.edu/research/paracomp/tau


TAU Performance System Architecture

[Figure: TAU performance system architecture diagram. Recoverable labels: EPILOG, Paraver.]


TAU Instrumentation

• Flexible instrumentation mechanisms at multiple levels
  – Source code
    • Manual (TAU API, CCA Measurement Port API)
    • Automatic using Program Database Toolkit (PDT), OPARI (for OpenMP programs), Babel SIDL compiler (proposed)
  – Object code
    • Pre-instrumented libraries (e.g., MPI using PMPI)
    • Statically linked
    • Dynamically linked (e.g., virtual machine instrumentation)
    • Fast breakpoints (compiler generated)
  – Executable code
    • Dynamic instrumentation (pre-execution) using DynInstAPI


Program Database Toolkit

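[Figure: PDT data flow. Application / library source is read by the C/C++ parser and the Fortran 77/90 parser, each producing IL. The C/C++ IL analyzer and the Fortran 77/90 IL analyzer turn the IL into program database files, which are accessed through DUCTAPE by four tools: PDBhtml (program documentation), SILOON (application component glue), CHASM (C++ / F90 interoperability), and TAU_instr (automatic source instrumentation).]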


Program Database Toolkit (PDT)

• Program code analysis framework for developing source-based tools for C99, C++ and F90 [U. Oregon, LANL, FZJ Germany]
• High-level interface to source code information
• Widely portable:
  – IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray T3E...
• Integrated toolkit for source code parsing, database creation, and database query
  – Commercial-grade front-end parsers (EDG for C99/C++, Mutek for F90)
  – Intel/KAI C++ headers for std. C++ library distributed with PDT
  – Portable IL analyzer, database format, and access API
  – Open software approach for tool development
• Targets and integrates multiple source languages
• Used in CCA for automated generation of SIDL [CHASM]
• Used in TAU to build automated performance instrumentation tools (tau_instrumentor)
• Can be used to generate code for performance ports in CCA


Extended Component Design

• PKC: Performance Knowledge Component
• POC: Performance Observability Component

[Figure: Extended Component Design. Recoverable label: generic component.]


Performance Observation

• Ability to observe execution performance is important
  – Empirically-derived performance knowledge
    • Does not require measurement integration in component
  – Monitor during execution to make dynamic decisions
    • Measurement integration is key
• Performance observation integration
  – Component integration: core and variant
  – Runtime measurement and data collection
  – On-line and off-line performance analysis


Performance Observation Component (POC)

• Performance observation in a performance-engineered component model
• Functional extension of original component design
  – Include new component methods and ports for other components to access measured performance data
  – Allow original component to access performance data
    • Encapsulate as tightly-coupled and co-resident performance observation object
    • POC “provides” ports allow use of optimized interfaces to access “internal” performance observations


Performance Observation Component
Performance Component

• One performance component per context
• Performance component provides a Measurement Port
  – Measurement Port allows a user to create and access:
    • Timer (start/stop, set name/type/group)
    • Event (trigger)
    • Control (enable/disable groups)
    • Query (get functions, metrics, counters, dump to disk)

[Figure: Measurement Port with its four interfaces: Timer, Event, Control, Query.]


Measurement Port in CCAFEINE

namespace performance {
namespace ccaports {
  class Measurement : public virtual classic::gov::cca::Port {
  public:
    virtual ~Measurement() {}

    /* Create a Timer */
    virtual performance::Timer* createTimer(void) = 0;
    virtual performance::Timer* createTimer(string name) = 0;
    virtual performance::Timer* createTimer(string name, string type) = 0;
    virtual performance::Timer* createTimer(string name, string type,
                                            string group) = 0;

    /* Create a Query interface */
    virtual performance::Query* createQuery(void) = 0;

    /* Create a User Defined Event interface */
    virtual performance::Event* createEvent(void) = 0;
    virtual performance::Event* createEvent(string name) = 0;

    /**
     * Create a Control interface for selectively enabling and disabling
     * the instrumentation based on groups
     */
    virtual performance::Control* createControl(void) = 0;
  };
}  // namespace ccaports
}  // namespace performance

Performance Component API


namespace performance {
  class Timer {
  public:
    virtual ~Timer() {}

    /* Start the Timer. Implement these methods in
     * a derived class to provide required functionality. */
    virtual void start(void) = 0;

    /* Stop the Timer. */
    virtual void stop(void) = 0;

    virtual void setName(string name) = 0;
    virtual string getName(void) = 0;

    virtual void setType(string name) = 0;
    virtual string getType(void) = 0;

    /** Set the group name associated with the Timer
     * (e.g., all MPI calls can be grouped into an "MPI" group) */
    virtual void setGroupName(string name) = 0;
    virtual string getGroupName(void) = 0;

    virtual void setGroupId(unsigned long group) = 0;
    virtual unsigned long getGroupId(void) = 0;
  };
}

CCA Timer Interface
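
A derived class supplies the actual measurement. The following is a minimal sketch of one possible implementation, assuming wall-clock timing with `std::chrono`; the class name `ChronoTimer` and its `seconds()` and `calls()` accessors are hypothetical and are not part of TAU or CCAFEINE. The interface is reduced to the methods the sketch needs.

```cpp
#include <chrono>
#include <string>

// Reduced copy of the slide's performance::Timer interface
// (type and group methods omitted to keep the sketch short).
namespace performance {
class Timer {
public:
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
  virtual void setName(std::string name) = 0;
  virtual std::string getName() = 0;
};
}  // namespace performance

// Hypothetical wall-clock implementation; an assumption for illustration only.
class ChronoTimer : public performance::Timer {
  using Clock = std::chrono::steady_clock;

public:
  explicit ChronoTimer(std::string name = "") : name_(name) {}

  void start() override {
    if (running_) return;  // this sketch ignores nested start() calls
    begin_ = Clock::now();
    running_ = true;
  }

  void stop() override {
    if (!running_) return;  // stop() without start() is a no-op
    total_ += Clock::now() - begin_;
    running_ = false;
    ++calls_;
  }

  void setName(std::string name) override { name_ = name; }
  std::string getName() override { return name_; }

  // Accumulated time over all completed start/stop pairs, in seconds.
  double seconds() const {
    return std::chrono::duration<double>(total_).count();
  }
  unsigned long calls() const { return calls_; }

private:
  std::string name_;
  Clock::time_point begin_{};
  Clock::duration total_{};
  bool running_ = false;
  unsigned long calls_ = 0;
};
```

Instrumented code only ever sees `performance::Timer*`, so a class like this could be replaced by a TAU-backed one without touching the caller.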


Control Class Interface

namespace performance {
  class Control {
  public:
    virtual ~Control() {}

    /* Control instrumentation. Enable group Id. */
    virtual void enableGroupId(unsigned long id) = 0;
    /* Control instrumentation. Disable group Id. */
    virtual void disableGroupId(unsigned long id) = 0;
    /* Control instrumentation. Enable group name. */
    virtual void enableGroupName(string name) = 0;
    /* Control instrumentation. Disable group name. */
    virtual void disableGroupName(string name) = 0;
    /* Control instrumentation. Enable all groups. */
    virtual void enableAllGroups(void) = 0;
    /* Control instrumentation. Disable all groups. */
    virtual void disableAllGroups(void) = 0;
  };
}

CCA Instrumentation Control Interface
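
The group semantics can be sketched as a default state plus a set of exceptions. This is an illustrative model only, assuming name-based groups; the class `GroupControl` and its `isEnabled` query are hypothetical, and TAU's own bookkeeping may differ.

```cpp
#include <set>
#include <string>

// Hypothetical model of name-based group control (not TAU's implementation).
// A group's state is the default state unless the group is listed as an exception.
class GroupControl {
public:
  void enableGroupName(const std::string& name) { setState(name, true); }
  void disableGroupName(const std::string& name) { setState(name, false); }

  void enableAllGroups() {
    defaultEnabled_ = true;
    exceptions_.clear();
  }
  void disableAllGroups() {
    defaultEnabled_ = false;
    exceptions_.clear();
  }

  // A timer in group `name` would be measured only if this returns true.
  bool isEnabled(const std::string& name) const {
    bool isException = exceptions_.count(name) > 0;
    return defaultEnabled_ != isException;
  }

private:
  void setState(const std::string& name, bool enabled) {
    if (enabled == defaultEnabled_) {
      exceptions_.erase(name);   // matches the default, no exception needed
    } else {
      exceptions_.insert(name);  // differs from the default
    }
  }

  bool defaultEnabled_ = true;  // assumption: everything is measured initially
  std::set<std::string> exceptions_;
};
```

With this model, `disableAllGroups()` followed by `enableGroupName("MPI")` leaves only the "MPI" group instrumented, which is the selective measurement the Control interface is meant to allow.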


Query Class Interface

namespace performance {
  class Query {
  public:
    virtual ~Query() {}

    /* Get the list of Timer names */
    virtual void getTimerNames(const char **& functionList,
                               int& numFuncs) = 0;

    /* Get the list of Counter names */
    virtual void getCounterNames(const char **& counterList,
                                 int& numCounters) = 0;

    /* getTimerData. Returns lists of metrics. */
    virtual void getTimerData(const char **& inTimerList,
                              int numTimers,
                              double **& counterExclusive,
                              double **& counterInclusive,
                              int*& numCalls,
                              int*& numChildCalls,
                              const char **& counterNames,
                              int& numCounters) = 0;

    virtual void dumpProfileData(void) = 0;
    virtual void dumpProfileDataIncremental(void) = 0;  // timestamped dump
    virtual void dumpTimerNames(void) = 0;
    virtual void dumpTimerData(const char **& inTimerList,
                               int numTimers) = 0;
    virtual void dumpTimerDataIncremental(const char **& inTimerList,
                                          int numTimers) = 0;
  };
}

CCA Performance Query Interface
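
`getTimerData` reports both exclusive and inclusive values per timer. As a reading aid, the usual relation between the two can be sketched as follows, assuming the conventional profiling definitions (inclusive covers a timer and everything it calls, exclusive subtracts the directly called child timers). The `TimerNode` structure and the numbers in the usage note are hypothetical and are not what the Query interface returns.

```cpp
#include <string>
#include <vector>

// Hypothetical call-tree node; the Query interface itself returns flat arrays.
struct TimerNode {
  std::string name;
  double inclusive;                 // time in this timer and all its callees
  std::vector<TimerNode> children;  // timers started while this one was running
};

// Exclusive time: inclusive time minus the inclusive time of direct children.
double exclusiveTime(const TimerNode& node) {
  double childTime = 0.0;
  for (const TimerNode& child : node.children) {
    childTime += child.inclusive;
  }
  return node.inclusive - childTime;
}
```

For example, an `integrate` timer with 10 time units inclusive whose children `evaluate` and `getRandomNumber` account for 6 and 3 units has 1 unit of exclusive time.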


Event Class Interface

namespace performance {
  class Event {
  public:
    /**
     * Destructor
     */
    virtual ~Event() {}

    /**
     * Trigger the event with the value to record,
     * e.g., size of a message, error in an iteration, memory allocated
     */
    virtual void trigger(double data) = 0;
  };
}

CCA User Defined Event Interface
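
A user-defined event typically summarizes the values passed to `trigger`. The sketch below shows one way a derived class might do that, assuming running count, minimum, maximum, and mean are the statistics of interest; `StatsEvent` and its accessors are hypothetical names, not TAU's API.

```cpp
#include <algorithm>
#include <limits>

// Copy of the slide's performance::Event interface.
namespace performance {
class Event {
public:
  virtual ~Event() {}
  virtual void trigger(double data) = 0;
};
}  // namespace performance

// Hypothetical implementation that keeps running statistics per event.
class StatsEvent : public performance::Event {
public:
  void trigger(double data) override {
    ++count_;
    sum_ += data;
    minimum_ = std::min(minimum_, data);
    maximum_ = std::max(maximum_, data);
  }

  unsigned long count() const { return count_; }
  double minimum() const { return minimum_; }
  double maximum() const { return maximum_; }
  // Mean of all triggered values; 0 when the event never fired.
  double mean() const { return count_ ? sum_ / count_ : 0.0; }

private:
  unsigned long count_ = 0;
  double sum_ = 0.0;
  double minimum_ = std::numeric_limits<double>::infinity();
  double maximum_ = -std::numeric_limits<double>::infinity();
};
```

A component could call `trigger(messageSize)` on each send and read the summary at the end of the run, without storing every sample.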


Measurement Port Implementation

• TAU component implements the MeasurementPort
  – Implements Timer, Event, Query and Control classes
  – Registers the port with the CCAFEINE framework
• Components target the generic MeasurementPort interface
  – Runtime selection of TAU component during execution
  – Instrumentation code independent of underlying tool
  – Instrumentation code independent of measurement choice
  – TauMeasurement_CCA port implementation uses a specific TAU measurement library
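
The independence of instrumentation code from the underlying tool can be sketched with two interchangeable implementations behind one interface. Everything here is a simplified stand-in written for illustration: `Measurement`, `NullMeasurement`, `CountingMeasurement`, and `sumSquares` are hypothetical, and the real port returns raw pointers and is obtained from the framework rather than passed as an argument.

```cpp
#include <memory>
#include <string>

// Simplified stand-ins for the Timer and Measurement interfaces.
struct Timer {
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
};

struct Measurement {
  virtual ~Measurement() {}
  virtual std::unique_ptr<Timer> createTimer(const std::string& name) = 0;
};

// Implementation 1: measurement switched off, timers do nothing.
struct NullTimer : Timer {
  void start() override {}
  void stop() override {}
};
struct NullMeasurement : Measurement {
  std::unique_ptr<Timer> createTimer(const std::string&) override {
    return std::unique_ptr<Timer>(new NullTimer);
  }
};

// Implementation 2: counts start/stop calls (stands in for a real tool).
class CountingTimer : public Timer {
public:
  CountingTimer(int& starts, int& stops) : starts_(starts), stops_(stops) {}
  void start() override { ++starts_; }
  void stop() override { ++stops_; }

private:
  int& starts_;
  int& stops_;
};
struct CountingMeasurement : Measurement {
  int starts = 0;
  int stops = 0;
  std::unique_ptr<Timer> createTimer(const std::string&) override {
    return std::unique_ptr<Timer>(new CountingTimer(starts, stops));
  }
};

// Instrumented code: written once against the generic interface.
double sumSquares(Measurement& measurement, int n) {
  std::unique_ptr<Timer> t = measurement.createTimer("sumSquares");
  t->start();
  double sum = 0.0;
  for (int i = 1; i <= n; ++i) sum += static_cast<double>(i) * i;
  t->stop();
  return sum;
}
```

`sumSquares` produces the same result with either implementation; only the side effects of measurement differ, which is the property that lets the measurement component be chosen at run time.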


Using MeasurementPort

#include "ports/Measurement_CCA.h"
…
double MonteCarloIntegrator::integrate(double lowBound, double upBound,
                                       int count) {
  classic::gov::cca::Port* port;
  double sum = 0.0;

  // Get Measurement port
  port = frameworkServices->getPort("MeasurementPort");
  if (port)
    measurement_m = dynamic_cast<performance::ccaports::Measurement*>(port);
  if (measurement_m == 0) {
    cerr << "Connected to something other than a Measurement port";
    return -1;
  }

  static performance::Timer* t = measurement_m->createTimer(
      string("IntegrateTimer"));
  t->start();

  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  t->stop();
  …

Using the Timer Interface: An Example
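
Manual start/stop pairs are easy to unbalance when a method has several return paths. One common C++ remedy, offered here as a suggestion rather than as part of the Measurement Port API, is a small RAII guard. `ScopedTimer`, `CallCounter`, and `averageOfSquares` are hypothetical names used only for this sketch.

```cpp
// Reduced copy of the slide's performance::Timer interface.
namespace performance {
class Timer {
public:
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
};
}  // namespace performance

// Hypothetical guard: starts the timer on construction, stops it on scope exit.
// A null pointer (no Measurement port connected) makes the guard a no-op.
class ScopedTimer {
public:
  explicit ScopedTimer(performance::Timer* timer) : timer_(timer) {
    if (timer_) timer_->start();
  }
  ~ScopedTimer() {
    if (timer_) timer_->stop();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
  performance::Timer* timer_;
};

// Hypothetical timer that only counts calls, used to exercise the guard.
class CallCounter : public performance::Timer {
public:
  void start() override { ++starts; }
  void stop() override { ++stops; }
  int starts = 0;
  int stops = 0;
};

// Returns -1 early for a non-positive count; the guard still stops the timer.
double averageOfSquares(performance::Timer* timer, int count) {
  ScopedTimer guard(timer);
  if (count <= 0) return -1;
  double sum = 0.0;
  for (int i = 1; i <= count; ++i) sum += static_cast<double>(i) * i;
  return sum / count;
}
```

The early-return path and the normal path both leave the timer stopped, so start and stop counts stay equal regardless of how the function exits.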


TAU Component in CCAFEINE

repository get TauMeasurement
repository get Driver
repository get MidpointIntegrator
repository get MonteCarloIntegrator
repository get RandomGenerator
repository get LinearFunction
repository get NonlinearFunction
repository get PiFunction

create LinearFunction lin_func
create NonlinearFunction nonlin_func
create PiFunction pi_func
create MonteCarloIntegrator mc_integrator
create RandomGenerator rand

create TauMeasurement tau
connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
connect mc_integrator FunctionPort nonlin_func FunctionPort
connect mc_integrator MeasurementPort tau MeasurementPort
create Driver driver
connect driver IntegratorPort mc_integrator IntegratorPort
go driver Go
quit


SIDL interface for Timers

//
// File: performance.sidl
//

version performance 1.0;

package performance {
  class Timer {
    void start();
    void stop();
    void setName(in string name);
    string getName();
    void setType(in string name);
    string getType();
    void setGroupName(in string name);
    string getGroupName();
    void setGroupId(in long group);
    long getGroupId();
  }
}


Using SIDL Interface for Timers

// SIDL:
#include "performance_Timer.hh"

int main(int argc, char* argv[]) {
  performance::Timer t = performance::Timer::_create();
  ...
  t.setName("Integrate timer");
  t.start();

  // Computation
  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  ...
  t.stop();

  return 0;
}


Performance Knowledge Component

• Describe and store a component’s “known” performance
  – Benchmark characterizations in performance database
  – Empirical or analytical performance models
• Saved information about component performance
  – Use for performance-guided selection and deployment
  – Use for runtime adaptation
• Representation must be in common forms with standard means for accessing the performance information


Performance Knowledge Repository

• Component performance repository
  – Implement in component architecture framework
  – Similar to CCA component repository [Alexandria]
  – Access by component infrastructure
• View performance knowledge as component (PKC)
  – PKC ports give access to performance knowledge, both to other components and back to the original component
  – Store performance model for performance prediction
  – Component composition performance knowledge


Component Performance Model

• User specified
• Inferred automatically by performance tool
  – Prior performance data
  – Expression
  – Parametric model
• Estimate performance of a single component by
  – Querying runtime performance data
  – Passing this to performance model for evaluation
• Integration of performance observation and knowledge components key to runtime selection of components
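
Model-driven selection can be sketched as evaluating each candidate's model on a runtime quantity and picking the cheapest prediction. This is a toy illustration under stated assumptions: the model form is linear in the number of samples, and `LinearModel`, `Candidate`, `selectFastest`, and every coefficient in the usage note are hypothetical, not measured values.

```cpp
#include <limits>
#include <string>
#include <vector>

// Hypothetical parametric model: predicted time = fixedCost + costPerSample * n.
struct LinearModel {
  double fixedCost;
  double costPerSample;
  double predict(double samples) const {
    return fixedCost + costPerSample * samples;
  }
};

// A component paired with its stored performance model.
struct Candidate {
  std::string name;
  LinearModel model;
};

// Returns the name of the candidate with the lowest predicted time,
// or an empty string when there are no candidates.
std::string selectFastest(const std::vector<Candidate>& candidates,
                          double samples) {
  std::string best;
  double bestTime = std::numeric_limits<double>::infinity();
  for (const Candidate& c : candidates) {
    double predicted = c.model.predict(samples);
    if (predicted < bestTime) {
      bestTime = predicted;
      best = c.name;
    }
  }
  return best;
}
```

With made-up coefficients such as `{"MidpointIntegrator", {0.5, 0.002}}` and `{"MonteCarloIntegrator", {0.1, 0.01}}`, the choice flips as the sample count grows: the component with the lower fixed cost wins for small inputs and the one with the lower per-sample cost wins for large ones.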


Applications: Uintah (U. Utah)

Scalability analysis


Applications: VTF (ASCI ASAP Caltech)

• C++, C, F90, Python
• PDT, MPI


Applications: SAMRAI (LLNL)

• C++
• PDT, MPI
• SAMRAI timers (groups)


TAU Status

• Instrumentation supported:
  – Source, preprocessor, compiler, MPI, runtime, virtual machine
• Languages supported:
  – C++, C, F90, Java, Python
  – HPF, ZPL, HPC++, pC++...
• Packages supported:
  – PAPI [UTK], PCL [FZJ] (hardware performance counter access)
  – Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation)
  – EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization)
• Platforms supported:
  – IBM SP, SGI Origin, Sun, HP Superdome, HP/Compaq Tru64 ES
  – Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows
  – Hitachi SR8000, NEC SX, Cray T3E ...
• Compiler suites supported:
  – GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, …
• Thread libraries supported:
  – Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS


Concluding Remarks

• Complex component systems pose challenging performance analysis problems that require robust methodologies and tools

• New performance problems will arise
  – Instrumentation and measurement
  – Data analysis and presentation
  – Diagnosis and tuning
• Performance engineered components
  – Performance knowledge, observation, query and control

• Integration of performance technology


Support Acknowledgement

• TAU and PDT support:
  – Department of Energy (DOE)
    • DOE 2000 ACTS contract
    • DOE MICS contract
    • DOE ASCI Level 3 (LANL, LLNL)
    • U. of Utah DOE ASCI Level 1 subcontract
  – DARPA
  – NSF National Young Investigator (NYI) award