Sameer Shende, Allen D. Malony

{sameer,malony}@cs.uoregon.edu

Computer & Information Science Department

Computational Science Institute

University of Oregon

Integration and Application of the TAU Performance System in Parallel Java Environments

May 24, 2002 SMPAG Java Interest Group

Java HPC and Performance Technology

- Interest in performance tools for Java HPC
  - Shared- and distributed-memory parallelism
  - Multi-level (semantic) performance views
- Java environment challenges performance technology
  - Language and packages
    - object-oriented, interfaces, RMI, reflection, …
  - Java Virtual Machine (JVM) execution model
    - thread mapping, scheduling, SMP execution, event access
  - Just-In-Time (JIT) compilation and dynamic loading
  - Java Native Interface (JNI)
    - inter-language execution, non-Java events / execution
  - Portability of performance tools and methods

Research Problems

General: How can we create robust and ubiquitous performance technology for the analysis and tuning of parallel high-performance software and systems in the presence of (evolving) complexity challenges?

Specific: Can performance technology developed for HPC environments be successfully applied to parallel Java environments, and how can the new performance instrumentation, measurement, and analysis problems be addressed?

Talk Outline

- Java HPC and Performance Technology
- TAU Performance System
  - Computation model for performance technology
  - TAU performance system toolkit
- Target HPC Java Environment
  - SMP clusters and distributed computing
  - Multi-threading + MPI message passing
- Integration (Adaptation) of TAU Performance System
  - User-level, JVM-level, JNI-level, inter-language
- Example “Mixed-Mode” Application
- Conclusions

TAU Performance System

- Tuning and Analysis Utilities
- Performance system framework
  - scalable parallel and distributed HPC
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated performance toolkit
  - instrumentation, measurement, analysis, visualization
- Portable facility based on open software approach
- Robust and widely applied

General Complex System Computation Model

- Node: physically distinct shared memory machine
  - Message passing node interconnection network
- Context: distinct virtual memory space within node
- Thread: execution threads (user/system) in context

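[Figure: physical view and model view of the computation model. Physical view: SMP nodes, each with its own node memory, connected by an interconnection network that carries inter-node message communication. Model view: each node holds one or more contexts (VM spaces), each containing threads.]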
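As a minimal sketch of how the node / context / thread model can index measurements, the Java code below tags each per-thread value with a (node, context, thread) triple and rolls the values up to the node level. The class names `ExecutionId` and `ModelRollup` are hypothetical illustrations, not part of TAU.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical (node, context, thread) identifier; not a TAU class.
final class ExecutionId {
    final int node;    // physically distinct shared-memory machine
    final int context; // distinct virtual memory space (e.g., one JVM) within the node
    final int thread;  // execution thread within the context

    ExecutionId(int node, int context, int thread) {
        this.node = node;
        this.context = context;
        this.thread = thread;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ExecutionId)) return false;
        ExecutionId other = (ExecutionId) o;
        return node == other.node && context == other.context && thread == other.thread;
    }

    @Override
    public int hashCode() {
        return Objects.hash(node, context, thread);
    }

    @Override
    public String toString() {
        return "n" + node + ".c" + context + ".t" + thread;
    }
}

final class ModelRollup {
    // Sum per-thread measurements (e.g., seconds spent in a routine) per node.
    static Map<Integer, Double> totalsByNode(Map<ExecutionId, Double> perThread) {
        Map<Integer, Double> totals = new TreeMap<>();
        for (Map.Entry<ExecutionId, Double> e : perThread.entrySet()) {
            totals.merge(e.getKey().node, e.getValue(), Double::sum);
        }
        return totals;
    }
}
```

The same map could be rolled up per context instead by keying on (node, context); the triple keeps every level of the model recoverable from thread-level data.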

TAU Performance System Framework

[Figure: TAU performance system framework diagram; the surviving labels include EPILOG and Paraver]

Target HPC Java Environment

- Hybrid, multi-language scientific applications
  - Java + {C, C++, Fortran} libraries
  - Numerical, system, communications support
  - Performance optimization
- Mixed-mode parallelism
  - Multi-threaded shared memory parallelism
  - Distributed memory parallelism using communications
- Cluster of SMP nodes
  - Scalable parallelism
  - Distributed

Performance Technology Issues

- Object-oriented programming
  - Object-based performance analysis
  - High-level classes and performance mapping
- Multi-level performance events
  - User / source / byte code / VM / OS / libraries / external
  - Multiple performance instrumentation strategies
  - Integration of performance measurements
- Mixed-mode parallel computation
  - Multi-threading performance measurement
  - Cross-mode performance correspondence
- Hybrid, multi-language performance measurement

Java Source-Level Instrumentation

- TAU Java package
- User-defined events
- TAU.Profile class for new “timers”
  - Start/Stop
- Performance data output at the end of execution
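The Start/Stop timer pattern can be sketched as follows. `SimpleProfile` is a hypothetical stand-in written only to illustrate the idea (named timers accumulating inclusive time and call counts in a shared table that is reported once at the end); it is not the `TAU.Profile` API, whose constructor and method signatures are not shown in this talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical named timer illustrating the Start/Stop pattern; NOT TAU.Profile.
final class SimpleProfile {
    // Shared profile table: name -> {accumulated nanoseconds, call count}.
    private static final Map<String, long[]> DB = new LinkedHashMap<>();

    private final String name;
    private long startedAt;
    private boolean running; // assumes one thread uses an instance at a time

    SimpleProfile(String name) {
        this.name = name;
    }

    void start() {
        startedAt = System.nanoTime();
        running = true;
    }

    void stop() {
        if (!running) {
            throw new IllegalStateException("stop() without start(): " + name);
        }
        long elapsed = System.nanoTime() - startedAt;
        running = false;
        synchronized (DB) {
            long[] entry = DB.computeIfAbsent(name, k -> new long[2]);
            entry[0] += elapsed; // inclusive time
            entry[1] += 1;       // number of calls
        }
    }

    static long calls(String name) {
        synchronized (DB) {
            long[] entry = DB.get(name);
            return entry == null ? 0 : entry[1];
        }
    }

    // Intended to be called once at program end ("output at end").
    static String report() {
        StringBuilder sb = new StringBuilder();
        synchronized (DB) {
            for (Map.Entry<String, long[]> e : DB.entrySet()) {
                sb.append(e.getKey()).append(": calls=").append(e.getValue()[1])
                  .append(" ns=").append(e.getValue()[0]).append('\n');
            }
        }
        return sb.toString();
    }
}
```

In application code a timer would typically be started before a code section and stopped in a `finally` block, so the measurement is closed even if the section throws.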

TAU Java Source Instrumentation Architecture

- Any code section can be measured
- Portability
- Measurement options
  - Profiling, tracing
- Limitations
  - Source access only
  - Lack of thread information
  - Lack of node information

[Figure: components Java program, TAU package (TAU.Profile class: init, data, output), JNI C bindings, and TAU as a dynamic shared object; the profile database (Profile DB) is stored in the JVM heap]

Multi-Threading Performance Measurement

- General issues
  - Thread identity and per-thread data storage
  - Performance measurement support and synchronization
  - Fine-grained parallelism
    - different forms and levels of threading
    - greater need for efficient instrumentation
- TAU general threading and measurement model
  - Common thread layer and measurement support
  - Interface to system-specific libraries (reg, id, sync)
  - Target different thread systems with core functionality
    - Pthreads, Windows, Java, OpenMP
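A minimal sketch of the per-thread storage idea, assuming Java's `ThreadLocal` as the thread-identity mechanism: each thread registers lazily, receives a small integer id, and updates a private table without locking; synchronization is needed only when tables are aggregated. `ThreadLayer` is a hypothetical name, and this is not how TAU's thread layer is implemented internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-thread measurement layer (not TAU's implementation).
final class ThreadLayer {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    // Registry of every thread's table so the data can be merged at the end.
    private static final List<Map<String, Long>> ALL = new ArrayList<>();

    // Thread identity: a dense id assigned on the thread's first request.
    private static final ThreadLocal<Integer> ID =
            ThreadLocal.withInitial(NEXT_ID::getAndIncrement);

    // Per-thread storage: created and registered on the thread's first event.
    private static final ThreadLocal<Map<String, Long>> DATA = ThreadLocal.withInitial(() -> {
        Map<String, Long> table = new HashMap<>();
        synchronized (ALL) {
            ALL.add(table);
        }
        return table;
    });

    static int threadId() {
        return ID.get();
    }

    // Hot path: only the owning thread touches its own table, so no lock.
    static void count(String event) {
        DATA.get().merge(event, 1L, Long::sum);
    }

    // Aggregation: call after the worker threads have been joined, so their
    // writes are visible and no table is still being modified.
    static long total(String event) {
        long sum = 0;
        synchronized (ALL) {
            for (Map<String, Long> table : ALL) {
                sum += table.getOrDefault(event, 0L);
            }
        }
        return sum;
    }

    static int registeredThreads() {
        synchronized (ALL) {
            return ALL.size();
        }
    }
}
```

Keeping the hot path lock-free reflects the slide's point that fine-grained parallelism raises the need for efficient instrumentation.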

Virtual Machine Performance Instrumentation

- Integrate performance system with VM
  - Capture robust performance data (e.g., thread events)
  - Maintain features of environment
    - portability, concurrency, extensibility, interoperation
  - Allow use in optimization methods
- JVM Profiling Interface (JVMPI)
  - Generation of JVM events and hooks into JVM
  - Profiler agent (TAU) loaded as shared object
    - registers events of interest and address of callback routine
  - Access to information on dynamically loaded classes
  - No need to modify Java source, bytecode, or JVM

JVMPI Events

- Method transition events
- Memory events
- Heap arena events
- Garbage collection events
- Class events
- Global reference events
- Monitor events
- Monitor wait events
- Thread events
- Dump events
- Virtual machine events
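JVMPI itself is a C interface, so the Java code below is only a model of the registration idea described for the profiler agent: the agent enables the event kinds it cares about and supplies one callback, and the VM side delivers only enabled kinds. Enabling only the needed kinds keeps the number of callbacks, and thus the overhead, down. `VmEventKind`, `VmEvent`, and `ProfilingInterface` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.function.Consumer;

// Event categories from the list above, modeled as a hypothetical enum.
enum VmEventKind {
    METHOD_TRANSITION, MEMORY, HEAP_ARENA, GARBAGE_COLLECTION, CLASS,
    GLOBAL_REFERENCE, MONITOR, MONITOR_WAIT, THREAD, DUMP, VIRTUAL_MACHINE
}

final class VmEvent {
    final VmEventKind kind;
    final String detail;

    VmEvent(VmEventKind kind, String detail) {
        this.kind = kind;
        this.detail = detail;
    }
}

// Hypothetical model of a profiling interface: interest registration plus a
// single callback, with filtering done before the agent is ever called.
final class ProfilingInterface {
    private final EnumSet<VmEventKind> enabled = EnumSet.noneOf(VmEventKind.class);
    private Consumer<VmEvent> callback = e -> { };

    void enable(VmEventKind kind) {
        enabled.add(kind);
    }

    void setCallback(Consumer<VmEvent> cb) {
        callback = cb;
    }

    // Called by the (simulated) VM at each event point.
    void post(VmEvent e) {
        if (enabled.contains(e.kind)) {
            callback.accept(e);
        }
    }
}
```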

TAU Java JVM Instrumentation Architecture

[Figure: architecture diagram with components Java program, JVMPI (event notification, thread API), JNI, TAU, and Profile DB]

- Robust set of events
- Portability
- Access to thread info
- Measurement options
- Limitations
  - Overhead
  - Many events
  - Event control
  - No user-defined events

Java Multi-Threading Performance (Test Case)

- Profile and trace Java (JDK 1.2+) applications
  - Observe user-level and system-level threads
  - Observe events for different Java packages
    - /lang, /io, /awt, …
- Test application
  - SciVis, NPAC, Syracuse University

% ./configure -jdk=<dir_where_jdk_is_installed>

% setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\:<taudir>/<arch>/lib

% java -XrunTAU svserver

TAU Profiling of Java Application (SciVis)

- Profile for each Java thread
- Captures events for different Java packages
- 24 threads of execution!

TAU Tracing of Java Application (SciVis)

- Performance groups
- Timeline display
- Parallelism view

Vampir Dynamic Call Tree View (SciVis)

- Per-thread call tree
- Annotated performance
- Expanded call tree

Message Communications Performance

- Explicit message communications libraries for Java
- MPI performance measurement
  - MPI profiling interface: link-time interposition library
  - TAU wrappers in native profiling interface library
  - Send/Receive events and communication statistics
- mpiJava (Syracuse, JavaGrande, 1999)
  - Java wrapper package
  - JNI C bindings to MPI communication library
  - Dynamic shared object (libmpijava.so) loaded in JVM
  - prunjava calls mpirun to distribute program to nodes
  - Contrast to Java RMI-based schemes (MPJ, CCJ)
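With the MPI profiling interface, every MPI routine is also callable under a PMPI_ name, so a tool can supply its own MPI_Send that records data and then calls PMPI_Send; the linker resolves the application's calls to the wrapper. The Java code below is only an analog of that interposition idea using delegation; `Comm`, `LoopbackComm`, and `ProfilingComm` are hypothetical names, and only send is shown (receive would be wrapped the same way).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message-passing interface standing in for the native library.
interface Comm {
    void send(byte[] buf, int dest, int tag);
}

// Stand-in "real" implementation: records what would have been delivered.
final class LoopbackComm implements Comm {
    final List<Integer> delivered = new ArrayList<>();

    public void send(byte[] buf, int dest, int tag) {
        delivered.add(buf.length);
    }
}

// Interposition wrapper: same interface, records statistics, then forwards
// to the real implementation (the role PMPI_Send plays for an MPI_Send wrapper).
final class ProfilingComm implements Comm {
    private final Comm inner;
    long sends;
    long bytesSent;
    long nanosInSend;

    ProfilingComm(Comm inner) {
        this.inner = inner;
    }

    public void send(byte[] buf, int dest, int tag) {
        long t0 = System.nanoTime();
        inner.send(buf, dest, tag);
        nanosInSend += System.nanoTime() - t0;
        sends++;
        bytesSent += buf.length;
    }
}
```

The design difference is that link-time interposition needs no change to the caller, whereas this Java analog requires the wrapper to be chosen when the communicator object is constructed.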

TAU Java Instrumentation Architecture

[Figure: architecture diagram with components Java program, mpiJava package, JNI, TAU package, MPI profiling interface, TAU wrapper, native MPI library, TAU, and Profile DB]

- No source instrumentation
- Portability
- Measurement options
- Limitations
  - MPI events only
  - No mpiJava events
  - Node info only
  - No thread info

Mixed-mode Parallel Programs (Java + MPI)

- Java threads and MPI communications
  - Shared-memory multi-threading events
  - Message communications events
- Unified performance measurement and views
  - Integration of performance mechanisms
  - Integrated association of performance events
    - thread events and communication events
    - user-defined (source-level) performance events
    - JVM events
- Support for performance measurement scaling
- Support for performance data access

Instrumentation and Measurement Cooperation

- Problem
  - JVMPI doesn’t see MPI events (e.g., rank (node))
  - MPI profiling interface doesn’t see threads
  - Source instrumentation doesn’t see either!
- Need cooperation between interfaces
  - MPI exposes rank, gets thread information
  - JVMPI exposes thread information, gets rank
  - Source instrumentation gets both
  - Post-mortem matching of sends and receives
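Post-mortem matching of sends and receives can be sketched as below, under stated assumptions: each merged trace record carries the MPI rank (from the MPI layer) and the thread id (from the JVM layer), receives name an explicit source and tag (no wildcards), and messages on one (source, destination, tag) channel are not overtaken, as MPI guarantees for messages between one pair of processes on the same communicator. Because all sends on a channel are recorded on one node and all receives on another, the matcher only relies on timestamp order within a node, so clock skew between nodes does not affect pairing. `MsgEvent` and `MessageMatcher` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical merged trace record.
final class MsgEvent {
    final boolean isSend;
    final int src;    // sending rank (from the MPI layer)
    final int dst;    // receiving rank
    final int tag;
    final int thread; // thread id on the recording node (from the JVM layer)
    final long time;  // node-local timestamp

    MsgEvent(boolean isSend, int src, int dst, int tag, int thread, long time) {
        this.isSend = isSend;
        this.src = src;
        this.dst = dst;
        this.tag = tag;
        this.thread = thread;
        this.time = time;
    }
}

final class MessageMatcher {
    private static String channel(MsgEvent e) {
        return e.src + "->" + e.dst + "#" + e.tag;
    }

    // Returns {send, receive} pairs in receive order.
    static List<MsgEvent[]> match(List<MsgEvent> events) {
        List<MsgEvent> ordered = new ArrayList<>(events);
        // Sorting by local time keeps each channel's sends (one node) and
        // receives (another node) in their own program order.
        ordered.sort((a, b) -> Long.compare(a.time, b.time));

        Map<String, Deque<MsgEvent>> pendingSends = new HashMap<>();
        List<MsgEvent> receives = new ArrayList<>();
        for (MsgEvent e : ordered) {
            if (e.isSend) {
                pendingSends.computeIfAbsent(channel(e), k -> new ArrayDeque<>()).addLast(e);
            } else {
                receives.add(e);
            }
        }

        List<MsgEvent[]> pairs = new ArrayList<>();
        for (MsgEvent r : receives) {
            Deque<MsgEvent> queue = pendingSends.get(channel(r));
            if (queue != null && !queue.isEmpty()) {
                pairs.add(new MsgEvent[] {queue.pollFirst(), r}); // oldest unmatched send
            }
        }
        return pairs;
    }
}
```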

- Selective instrumentation
  - java -XrunTAU:exclude=java/io,sun
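One plausible reading of the exclude list is a comma-separated set of class-name prefixes in the JVM's slash-separated internal form, applied when deciding whether to record a method's events. The sketch below assumes that reading and matches whole package segments, so `sun` excludes `sun/misc/...` but not a hypothetical `sunflower/...`; TAU's actual matching rules may differ, and `ExcludeFilter` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical filter mimicking an exclude list such as "java/io,sun".
final class ExcludeFilter {
    private final List<String> prefixes;

    ExcludeFilter(String spec) {
        this.prefixes = Arrays.asList(spec.split(","));
    }

    // Internal class names use '/' separators, e.g. "java/io/FileInputStream".
    boolean isExcluded(String internalClassName) {
        for (String p : prefixes) {
            // Match the prefix itself or anything below it as a package path.
            if (internalClassName.equals(p) || internalClassName.startsWith(p + "/")) {
                return true;
            }
        }
        return false;
    }
}
```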

TAU Java Instrumentation Architecture

[Figure: combined architecture diagram with components Java program, TAU package, mpiJava package, JVMPI (event notification, thread API), JNI, MPI profiling interface, TAU wrapper, native MPI library, TAU, and Profile DB]

Parallel Java Game of Life (Profile)

- mpiJava test case
- 4 nodes, 28 threads
- Thread 4 executes all MPI routines
- Merged Java and MPI event profiles

[Figure: profile displays for Node 0, Node 1, and Node 2]

Parallel Java Game of Life (Trace)

- Integrated event tracing
- Merged trace visualization
- Node process grouping
- Thread message pairing
- Vampir display
- Multi-level event grouping

Node / Thread Event Timeline

- Temporal event behavior
- Event relationships

Integrated Performance View (Callgraph)

- Source level
- MPI level
- Java packages level

Conclusion

- Integrate a robust and portable performance system (TAU) in the Java HPC environment
- Apply the performance system to observe multiple levels of Java HPC operation
- Leverage a performance system framework based on a common performance measurement API
  - Key: define multi-level events and define associations
- Opportunities for improvement and application
  - JVM instrumentation and JIT (dynamic compilation)
  - Runtime access to performance data
  - Java scientific packages, communication libraries (CCJ, MPJ, RMI), parallel compilers (JOMP), applications, …

More Information and Acknowledgments

- URLs
  - TAU: www.cs.uoregon.edu/research/paracomp/tau
- Grant support (TAU)
  - DOE 2000 ACTS
    - http://www-unix.mcs.anl.gov/DOE2000
    - http://www.nersc.gov/ACTS
  - DOE ASCI Level 3 (LANL, LLNL)
  - DARPA

TAU Distributed Monitoring Framework

- Extend usability of TAU performance analysis
- Access TAU performance data during execution
- Framework model
  - each application context is a performance data server
  - monitor agent thread is created within each context
  - client processes attach to agents and request data
  - server thread synchronization for data consistency
  - pull mode of interaction
- Distributed TAU performance data space
- “A Runtime Monitoring Framework for the TAU Profiling System” (ISCOPE ’99)
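The pull-mode framework model can be sketched in-process as follows, assuming a single context: application threads update the profile database under a lock, a monitor agent thread serves client requests one at a time, and each reply is a snapshot copied under the same lock so the client sees a consistent view. `MonitorAgent` is a hypothetical name; the framework described above places clients in separate processes, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process model of a pull-mode monitor agent (not TAU's code).
final class MonitorAgent {
    private final Map<String, Long> profileDb = new HashMap<>(); // application's profile data
    private final BlockingQueue<CompletableFuture<Map<String, Long>>> requests =
            new LinkedBlockingQueue<>();

    MonitorAgent() {
        Thread agent = new Thread(this::serve, "monitor-agent");
        agent.setDaemon(true); // do not keep the JVM alive just for monitoring
        agent.start();
    }

    // Application side: update the profile under the lock the agent also uses.
    void record(String event, long nanos) {
        synchronized (profileDb) {
            profileDb.merge(event, nanos, Long::sum);
        }
    }

    // Client side of the pull model: queue a request and wait for the reply.
    Map<String, Long> pull() throws InterruptedException, ExecutionException {
        CompletableFuture<Map<String, Long>> reply = new CompletableFuture<>();
        requests.put(reply);
        return reply.get();
    }

    // Agent thread: take a request, copy the profile under the lock, reply.
    private void serve() {
        try {
            while (true) {
                CompletableFuture<Map<String, Long>> request = requests.take();
                Map<String, Long> snapshot;
                synchronized (profileDb) {
                    snapshot = new HashMap<>(profileDb);
                }
                request.complete(snapshot);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // agent shuts down
        }
    }
}
```

Copying under the lock keeps the critical section short for the application threads; the client then works on its private snapshot.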

TAU Distributed Monitor Architecture

- Each context has a monitor agent
- Client in separate thread directs agent
- Pull model of interaction
- TAU profile database

Java Implementation of TAU Monitor

- Motivations
  - More portable monitor middleware system (RMI)
  - More flexible and programmable server interface (JNI)
  - More robust client development (EJB, JDBC, Swing)