

A Case Study of HPC Metrics Collection and Analysis
Philip Johnson and Michael Paulding, University of Hawaii, Honolulu, Hawaii.

Goals of the case study

• Provide a complete implementation of one Purpose Based Benchmark problem definition, called Optimal Truss Design

• Implement the Optimal Truss Design system in C++ using MPI on a 240-node Linux cluster at the University of Hawaii

• Develop and evaluate automated support for HPC process and product measurement using Hackystat

• Assess the utility of the metrics for understanding HPC development

Metrics Collected

• Size (Number of files, Total SLOC, “Parallel” SLOC containing an MPI directive, “Serial” SLOC not containing an MPI directive, Test code)

• Active Time (amount of time spent editing Optimal Truss Design files)

• Performance (wall clock time on 1, 2, 4, 8, 16, and 32 processors)

• Milestone Tests (indicates functional completeness)

• Command Line Invocations

Results: Basic Process and Product Measures

Total Source Lines of Code 3320 LOC

Total Test Lines of Code 901 LOC

Total MPI Lines of Code 1032 LOC

Total Days (Calendar time) 1 year

Total Days (with Development Activity) 88 days

Total Active Time 152 Hours

Total Distinct MPI Directives 60 Directives

Total Files 56 Files

Total Sequential Files (no MPI) 51 Files

Total Parallel Files (containing MPI) 5 Files

Execution Time:
  126 sec. (1 processor)
   66 sec. (2 processors)
   33 sec. (4 processors)
   27 sec. (8 processors)
   39 sec. (16 processors)
   43 sec. (32 processors)

Results: Derived Process and Product Measures

Derived Metric Definition Value

Productivity Proxy (LOC / Active Time) 22 LOC/hour

Average Daily Active Time (Total Active Time / Total Days) 1.73 hours/day

Test Code Density Percentage (Total Test LOC / Total LOC) 27%

MPI Code Density Percentage (Total MPI LOC / Total LOC) 31%

MPI File Density Percentage (Total MPI Files / Total Files) 9%

MPI Directive Frequency Ratio (Total MPI Directives : Total MPI LOC) 1 Directive : 17 LOC

Speedup (Execution Time (1 Proc.) / Execution Time (n Proc.)):
  1.0 (1 processor)
  1.9 (2 processors)
  3.7 (4 processors)
  4.5 (8 processors)
  3.2 (16 processors)
  2.9 (32 processors)

Results: Process and Product Telemetry Charts

For More Information

• Understanding HPCS development through automated process and product measurement with Hackystat, Philip M. Johnson and Michael G. Paulding, Proceedings of the Second Workshop on Productivity and Performance in High-End Computing.

Results: Daily Diary with CLI and Most Active File

Insights and Lessons Learned

• Productivity (22 LOC/hour) and test code density (27%) seem in line with traditional software engineering metrics.

• Speedup data shows almost linear speedup up to 4 processors, then falls off sharply, indicating that the current solution is not scalable.

• Parallel and serial LOC were equal at the start of the project; most effort was then devoted to serial code, with some final enhancements to parallel code at the end of the project.

• Performance data was not comparable over the course of the project (only final numbers were available; no telemetry).

• Hackystat provides effective infrastructure for collection of process and product metrics.

• This case study provides useful baseline data to compare with future studies.

• Future research:
  • Compare to an OpenMP or JavaParty implementation.
  • Gather metrics while improving the scalability of the system.
  • Compare metrics against other application types.
  • Analyze CLI data for patterns and bottlenecks.

Thanks to our sponsors