Post on 22-Dec-2015


Kai Li, Allen D. Malony, Robert Bell, Sameer Shende {likai,malony,bertie,sameer}@cs.uoregon.edu

Department of Computer and Information Science

Computational Science Institute, NeuroInformatics Center

University of Oregon

A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications

PPAM 2003: Framework for Online Performance Analysis and Visualization

Outline

• Problem description
• Scaling and performance observation
• Interest in online performance analysis
• General online performance system architecture
  • Access models
  • Profiling issues and control issues
• Framework for online performance analysis
  • TAU performance system
  • SCIRun computational and visualization environment
• Experiments
• Conclusions and future work


Problem Description

• Need for parallel performance observation
  • Instrumentation, measurement, analysis, visualization
• In general, there is the concern for intrusion
  • Seen as a tradeoff with accuracy of performance diagnosis
• Scaling complicates observation and analysis
  • Issues of data size, processing time, and presentation
• Online approaches add capabilities as well as problems
  • Performance interaction, but at what cost?
• Tools for large-scale performance observation online
  • Supporting performance system architecture
  • Tool integration, effective usage, and portability


Scaling and Performance Observation

• Consider “traditional” measurement methods
  • Profiling: summary statistics calculated during execution
  • Tracing: time-stamped sequence of execution events
• More parallelism means more performance data overall
  • Performance specific to each thread of execution
  • Possible increase in the number of interactions between threads
  • Harder to manage the data (memory, transfer, storage, …)
• More parallelism and more performance data make analysis harder
  • More time consuming to analyze
  • More difficult to visualize (meaningful displays)
• Need techniques to address scaling at all levels
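
As a rough illustration of the two measurement styles named on this slide (not TAU's implementation), the sketch below feeds the same timed intervals into a profiler, which keeps only summary statistics, and a tracer, which keeps every time-stamped event. The names `Profiler`, `Tracer`, and `replay` are assumptions made for this example.

```python
from collections import defaultdict


class Profiler:
    """Profiling: keep only per-event summary statistics."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.inclusive = defaultdict(float)

    def record(self, event, start, stop):
        self.calls[event] += 1
        self.inclusive[event] += stop - start


class Tracer:
    """Tracing: keep every time-stamped enter/exit event."""

    def __init__(self):
        self.events = []

    def record(self, event, start, stop):
        self.events.append((start, "enter", event))
        self.events.append((stop, "exit", event))


def replay(intervals):
    """Feed (event, start, stop) intervals to both measurement styles."""
    profiler, tracer = Profiler(), Tracer()
    for event, start, stop in intervals:
        profiler.record(event, start, stop)
        tracer.record(event, start, stop)
    return profiler, tracer
```

The profile grows only with the number of distinct events, while the trace grows with every event occurrence, which is one reason data volume becomes a scaling concern.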


Why Complicate Matters with Online Methods?

• Adds interactivity to performance analysis process
• Opportunity for dynamic performance observation
  • Instrumentation change
  • Measurement change
• Allows for control of performance data volume
• Post-mortem analysis may be “too late”
  • View on status of long-running jobs
  • Allow for early termination
  • Computation steering to achieve “better” results
  • Performance steering to achieve “better” performance
• Online performance observation may be intrusive


General Online Performance Observation System

[Figure: components of a general online performance observation system: Performance Instrumentation, Performance Measurement, Performance Data, Performance Control, Performance Analysis, Performance Visualization]


Models of Performance Data Access (Monitoring)

• Push Model
  • Producer/consumer style of access and transfer
  • Application decides when/what/how much data to send
  • External analysis tools only consume performance data
  • Availability of new data is signaled passively or actively
• Pull Model
  • Client/server style of performance data access and transfer
  • Application is a performance data server
  • Access decisions are made externally by analysis tools
  • Two-way communication is required
• Push/Pull Models
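
The two access models can be contrasted with a minimal single-process sketch. Everything here is hypothetical (the class names, the `dump_every` parameter, and the use of an in-memory queue as the channel); it only illustrates who decides when data moves.

```python
from queue import Queue


class PushApplication:
    """Push model: the application decides when and what to send."""

    def __init__(self, channel, dump_every):
        self.channel = channel        # consumed by an external analysis tool
        self.dump_every = dump_every  # application-chosen dump interval
        self.profile = {}
        self.steps = 0

    def step(self, event, seconds):
        self.profile[event] = self.profile.get(event, 0.0) + seconds
        self.steps += 1
        if self.steps % self.dump_every == 0:
            # Producer side: send a snapshot without being asked.
            self.channel.put(dict(self.profile))


class PullApplication:
    """Pull model: the application acts as a performance data server."""

    def __init__(self):
        self.profile = {}

    def step(self, event, seconds):
        self.profile[event] = self.profile.get(event, 0.0) + seconds

    def handle_request(self, events):
        # Server side: the external tool chooses which events it wants.
        return {name: self.profile.get(name, 0.0) for name in events}
```

In the push case the consumer only reads from the channel; in the pull case the request itself is the second direction of communication that the slide says is required.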


TAU Performance System Architecture

[Figure: TAU performance system architecture; labels recovered from the diagram: EPILOG, Paraver, ParaProf]


Online Profile Measurement and Analysis in TAU

• Standard TAU profiling
  • Per node/context/thread
• Profile “dump” routine
  • Context-level
  • One profile file for each thread in the context
  • Appends to profile file
  • Selective event dumping
• Analysis tools access files through shared file system
• Application-level profile “access” routine
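
The dump-and-read flow on this slide might look roughly like the sketch below. It is not TAU's API or file format: the function names, the `profile.<node>.<context>.<thread>` file naming, and the one-JSON-object-per-line layout are all assumptions chosen to keep the example short.

```python
import json
from pathlib import Path


def dump_profile(directory, node, context, thread, events, selected=None):
    """Append one profile sample for a thread to that thread's file.

    `selected`, if given, restricts the dump to those event names
    (a stand-in for selective event dumping).
    """
    if selected is not None:
        events = {name: value for name, value in events.items() if name in selected}
    path = Path(directory) / f"profile.{node}.{context}.{thread}"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(events, sort_keys=True) + "\n")
    return path


def read_samples(path):
    """Analysis-tool side: read every sample appended so far."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each dump appends, a reader on a shared file system can re-read the file and pick up the samples written since its last visit.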


Online Performance Analysis and Visualization

[Figure: online performance analysis and visualization architecture. Labels recovered from the diagram: Application; TAU Performance System; // performance data output; file system (sample sequencing, reader synchronization); // performance data streams; Performance Data Reader; Performance Data Integrator; accumulated samples; Performance Analyzer; Performance Visualizer; Performance Steering; SCIRun (Univ. of Utah)]
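
One way to picture sample sequencing and the accumulation of samples is an integrator that releases sample k only after every thread has delivered it. This is a sketch under that assumption, not a description of the SCIRun modules; `SampleIntegrator` and its methods are hypothetical names.

```python
class SampleIntegrator:
    """Collect per-thread samples and release them in sequence order."""

    def __init__(self, thread_ids):
        self.thread_ids = set(thread_ids)
        if not self.thread_ids:
            raise ValueError("at least one thread id is required")
        self.pending = {}      # sample index -> {thread id: profile}
        self.accumulated = []  # complete samples, in sequence order
        self.next_index = 0

    def add(self, thread_id, index, profile):
        """Store one thread's sample; return indices completed by this call."""
        if thread_id not in self.thread_ids:
            raise KeyError(thread_id)
        self.pending.setdefault(index, {})[thread_id] = profile
        released = []
        # Release samples strictly in order, and only when all threads reported.
        while set(self.pending.get(self.next_index, {})) == self.thread_ids:
            self.accumulated.append(self.pending.pop(self.next_index))
            released.append(self.next_index)
            self.next_index += 1
        return released
```

Samples that arrive early wait in `pending`, so downstream analysis only ever sees complete, ordered samples.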


Profile Sample Data Structure in SCIRun

[Figure: profile sample data structure, organized by node, context, and thread]
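
The node/context/thread hierarchy could be represented as nested mappings, as in the sketch below. The actual SCIRun data structure is not given in these slides; `ProfileSample` and its methods are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileSample:
    """One profile sample, indexed as node -> context -> thread -> event."""

    index: int
    data: dict = field(default_factory=dict)

    def set(self, node, context, thread, event, value):
        threads = self.data.setdefault(node, {}).setdefault(context, {})
        threads.setdefault(thread, {})[event] = value

    def threads(self):
        """Yield ((node, context, thread), events) in sorted order."""
        for node, contexts in sorted(self.data.items()):
            for context, threads in sorted(contexts.items()):
                for thread, events in sorted(threads.items()):
                    yield (node, context, thread), events
```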


Performance Analysis/Visualization in SCIRun

[Figure: SCIRun program]

ParCo 2003 Mini-Symposium: Online Performance Monitoring, Analysis, and Visualization

Uintah Computational Framework (UCF)

• University of Utah
• UCF analysis
  • Scheduling
  • MPI library
  • Components
• 500 processes
• Use for online and offline visualization
• Apply SCIRun steering


“Terrain” Performance Visualization

[Figure: “terrain” performance visualization]


Scatterplot Displays

• Each point coordinate determined by three values:
  • MPI_Reduce
  • MPI_Recv
  • MPI_Waitsome
• Min/Max value range
• Effective for cluster analysis
  • Relation between MPI_Recv and MPI_Waitsome
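
A plausible reading of the min/max value range is that each axis is scaled between the smallest and largest value observed across threads. The sketch below assumes that interpretation; the function name `scatter_points` and the per-thread dictionary layout are hypothetical.

```python
def scatter_points(profiles, axes=("MPI_Reduce", "MPI_Recv", "MPI_Waitsome")):
    """Map each thread's three metric values to a point in the unit cube.

    `profiles` is a list of dicts, one per thread, holding a time for
    each event named in `axes`.
    """
    if not profiles:
        return []
    lows = [min(p[axis] for p in profiles) for axis in axes]
    highs = [max(p[axis] for p in profiles) for axis in axes]
    points = []
    for p in profiles:
        points.append(tuple(
            # A constant axis has no range, so place it at 0.0.
            0.0 if high == low else (p[axis] - low) / (high - low)
            for axis, low, high in zip(axes, lows, highs)
        ))
    return points
```

Threads with similar communication behavior land close together, which is what makes such a display useful for spotting clusters.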


Online Uintah Performance Profiling

• Demonstration of online profiling capability
• Colliding elastic disks
  • Test material point method (MPM) code
  • Executed on 512 processors of ASCI Blue Pacific at LLNL
• Example 1 (Terrain visualization)
  • Exclusive execution time across event groups
  • Multiple time steps
• Example 2 (Bargraph visualization)
  • MPI execution time and performance mapping
• Example 3 (Domain visualization)
  • Task time allocation to “patches”


Example 1 (Event Groups)


Example 2 (MPI Performance)


Example 3 (Domain-Specific Visualization)


Possible Improvements

• Profile merging at context level to reduce number of files
  • Merging at node level may require explicit processing
• Concurrent trace merging could also reduce files
  • Hierarchical merge tree
  • Will require explicit processing
• Could consider IPC transfer
  • MPI (e.g., used in mpiP for profile merging)
    • Create own communicators
  • Sockets or PACX between compute server and analyzer
• Leverage large-scale systems infrastructure
• Parallel profile analysis
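
The hierarchical merge tree idea can be sketched as pairwise merging in rounds, so that n per-thread profiles collapse into one container in about log2(n) rounds. This is an in-memory illustration only; `merge_profiles` and `tree_merge` are hypothetical names, and in a real system each round would run concurrently across processes.

```python
def merge_profiles(left, right):
    """Combine two containers keyed by (node, context, thread)."""
    merged = dict(left)
    merged.update(right)
    return merged


def tree_merge(leaves):
    """Merge per-thread containers pairwise; return (result, rounds)."""
    level = list(leaves)
    rounds = 0
    while len(level) > 1:
        merged_level = [
            merge_profiles(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            # An unpaired container moves up to the next round unchanged.
            merged_level.append(level[-1])
        level = merged_level
        rounds += 1
    return (level[0] if level else {}), rounds
```

Per-thread data is preserved in the merged container; only the number of separate files or messages shrinks.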


Concluding Remarks

• Interest in online performance monitoring, analysis, and visualization for large-scale parallel systems
• Need to intelligently use
  • Benefit from other scalability considerations of the system software and system architecture
  • See as an extension to the parallel system architecture
  • Avoid solutions that have portability difficulties
• In part, this is an engineering problem
  • Need to work with the system configuration you have
  • Need to understand if approach is applicable to problem
  • Not clear if there is a single solution


Future Work

• Build online support in TAU performance system
  • Extend to support PULL model capabilities
  • Develop hierarchical data access solutions
• Performance studies of full system
  • Latency analysis
  • Bandwidth analysis
• Integration with other performance tools
  • System performance monitors
  • ParaProf parallel profile analyzer
• Development of 3D visualization library
  • Portability focus