TRANSCRIPT
11 July 2005
Tool Evaluation Scoring Criteria
Professor Alan D. George, Principal Investigator
Mr. Hung-Hsun Su, Sr. Research Assistant
Mr. Adam Leko, Sr. Research Assistant
Mr. Bryan Golden, Research Assistant
Mr. Hans Sherburne, Research Assistant
HCS Research Laboratory, University of Florida
PAT
Usability/Portability Characteristics
Available Metrics
Description: Depth of metrics provided by tool
Examples: communication statistics or events; hardware counters
Importance rating: Critical; users must be able to obtain representative performance data to debug performance problems
Rating strategy: Scored using relative ratings (subjective characteristic); compare tool’s available metrics with metrics provided by other tools
Documentation Quality
Description: Quality of documentation provided, including user’s manuals, READMEs, and “quick start” guides
Importance rating: Important; can have a large effect on overall usability
Rating strategy: Scored using relative ratings (subjective characteristic); correlated to how long it takes to decipher documentation enough to use tool. Tools with quick start guides or clear, concise high-level documentation receive higher scores
Installation
Description: Measure of time needed for installation; also incorporates level of expertise necessary to perform installation
Importance rating: Minor; installation only needs to be done once and may not even be done by end user
Rating strategy: Scored using relative ratings based on mean installation time for all tools. All tools installed by a single person with significant system administration experience
Learning Curve
Description: Difficulty level associated with learning to use tool effectively
Importance rating: Critical; tools that are perceived as being too difficult to operate by users will be avoided
Rating strategy: Scored using relative ratings (subjective characteristic); based on time necessary to get acquainted with all features needed for day-to-day operation of tool
Manual Overhead
Description: Amount of user effort needed to instrument their code
Importance rating: Important; tool must not cause more work for user in the end (instead it should reduce time!)
Rating strategy: Use hypothetical test case: MPI program, ~2.5 kLOC in 20 .c files with 50 user functions. Score one point for each of the following actions that can be completed on a fresh copy of source code in 10 minutes (estimated):
Instrument all MPI calls
Instrument all functions
Instrument five arbitrary functions
Instrument all loops, or a subset of loops
Instrument all function call sites, or a subset of call sites (about 35)
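As a sketch of how the checklist above tallies into a score, one point is awarded per action completed within the 10-minute window. The action labels and function name below are illustrative, not taken from the evaluation:

```python
# Hypothetical tally of the manual-overhead checklist: one point per
# instrumentation action completed on a fresh source copy in ~10 minutes.
ACTIONS = [
    "instrument all MPI calls",
    "instrument all functions",
    "instrument five arbitrary functions",
    "instrument all loops (or a subset)",
    "instrument all function call sites (or a subset, ~35)",
]

def manual_overhead_score(completed):
    """Return one point for each checklist action the tool completed."""
    return sum(1 for action in ACTIONS if action in completed)

print(manual_overhead_score({"instrument all MPI calls",
                             "instrument all functions"}))  # prints 2
```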
Measurement Accuracy
Description: How much runtime instrumentation overhead tool imposes
Importance rating: Important; inaccurate data may lead to incorrect diagnosis, which creates more work for user with no benefit
Rating strategy: Use standard application: CAMEL MPI program. Score based on runtime overhead of instrumented executable (wallclock time):
0-4%: five points
5-9%: four points
10-14%: three points
15-19%: two points
20% or greater: one point
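The overhead brackets map mechanically to points; a minimal sketch (the function name is ours, not from the slides):

```python
def measurement_accuracy_score(overhead_pct):
    """Map wallclock overhead (percent) of the instrumented CAMEL run
    to 1-5 points using the brackets above."""
    if overhead_pct < 5:
        return 5   # 0-4%
    if overhead_pct < 10:
        return 4   # 5-9%
    if overhead_pct < 15:
        return 3   # 10-14%
    if overhead_pct < 20:
        return 2   # 15-19%
    return 1       # 20% or greater

print(measurement_accuracy_score(3.2))  # prints 5
```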
Multiple Analyses/Views
Description: Different ways tool presents data to user; different analyses available from within tool
Importance rating: Critical; tools must provide enough ways of looking at data so that users may track down performance problems
Rating strategy: Score based on relative number of views and analyses provided by each tool; approximately one point for each different view and analysis provided by tool
Profiling/Tracing Support
Description: Low-overhead profile mode offered by tool; comprehensive event trace offered by tool
Importance rating: Critical; profile mode useful for quick analysis, and trace mode necessary for examining what really happens during execution
Rating strategy:
Two points if a profiling mode is available
Two points if a tracing mode is available
One extra point if trace file size is within a few percent of best trace file size across all tools
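The point breakdown above can be expressed directly. The 5% tolerance below is our reading of "within a few percent", not a figure from the slides, and the function name is illustrative:

```python
def profiling_tracing_score(has_profiling, has_tracing,
                            trace_size=None, best_trace_size=None,
                            tolerance=0.05):
    """Two points per supported mode, plus one bonus point if the tool's
    trace file size is within `tolerance` of the best across all tools."""
    score = 2 * int(has_profiling) + 2 * int(has_tracing)
    if (has_tracing and trace_size is not None and best_trace_size
            and trace_size <= best_trace_size * (1 + tolerance)):
        score += 1
    return score

print(profiling_tracing_score(True, True, 102.0, 100.0))  # prints 5
```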
Response Time
Description: How much time is needed to get data from tool
Importance rating: Average; user should not have to wait an extremely long time for data, but high-quality information should always be first goal of tools
Rating strategy: Score is based on relative time taken to get performance data from tool. Tools that perform complicated post-mortem analyses or bottleneck detection receive lower scores; tools that provide data while program is running receive five points
Source Code Correlation
Description: How well tool relates performance data back to original source code
Importance rating: Critical; necessary to see which statements and regions of code are causing performance problems
Rating strategy:
Four to five points if tool supports source correlation to function or line level
One to three points if tool supports indirect method of attributing data to functions or source lines
Zero points if tool does not provide enough data to map performance metrics back to source code
Stability
Description: How likely tool is to crash while under use
Importance rating: Important; unstable tools will frustrate users and decrease productivity
Rating strategy: Scored using relative ratings (subjective characteristic). Score takes into account:
Number of crashes experienced during evaluation
Severity of crashes
Number of bugs encountered
Technical Support
Description: How quickly responses are received from tool developers or support departments; quality of information and helpfulness of responses
Importance rating: Average; important for users during installation and initial use of tool, but becomes less important as time goes on
Rating strategy: Relative rating based on personal communication with our contacts for each tool (subjective characteristic). Timely, informative responses result in four or more points
Portability Characteristics
Extensibility
Description: How easily tool may be extended to support UPC and SHMEM
Importance rating: Critical; tools that cannot be extended for UPC and SHMEM are almost useless for us
Rating strategy:
Commercial tools receive zero points, regardless of whether export or import functionality is available (interoperability is covered by another characteristic)
Subjective score based on functionality provided by tool; also incorporates quality of code (after quick review)
Hardware Support
Description: Number and depth of hardware platforms supported
Importance rating: Critical; essential for portability
Rating strategy: Based on our estimate of important architectures for UPC and SHMEM. Award one point for support of each of the following architectures:
IBM SP (AIX)
IBM BlueGene/L
AlphaServer (Tru64)
Cray X1/X1E (UnicOS)
Cray XD1 (Linux w/Cray proprietary interconnect)
SGI Altix (Linux w/NUMALink)
Generic 64-bit Opteron/Itanium Linux cluster
Heterogeneity
Description: Tool support for running programs across different architectures within a single run
Importance rating: Minor; not very useful on shared-memory machines
Rating strategy: Five points if heterogeneity is supported; zero points if heterogeneity is not supported
Software Support
Description: Number of languages, libraries, and compilers supported
Importance rating: Important; should support many compilers and not hinder library support, but hardware support and extensibility are more important
Rating strategy: Score based on relative number of languages, libraries, and compilers supported compared with other tools. Tools that instrument or record data for existing closed-source libraries receive an extra point (up to max of five points)
Scalability Characteristics
Filtering and Aggregation
Description: How well tool is able to provide users with facilities to simplify and summarize data being displayed
Importance rating: Critical; necessary for users to effectively work with large data sets generated by performance tools
Rating strategy: Scored using relative ratings (slightly subjective characteristic). Tools that provide many different ways of filtering and aggregating data receive higher scores
Multiple Executions
Description: Support for relating and comparing performance information from different runs
Examples: automated display of speedup charts; differences between time taken for methods using different algorithms or variants of a single algorithm
Importance rating: Critical; important for doing scalability analysis
Rating strategy: Five points if tool supports relating data from different runs; zero points if not
Performance Bottleneck Detection
Description: How well tool identifies each known (and unknown) bottleneck in our test suite
Importance rating: Critical; bottleneck detection is the most important function of a performance tool
Rating strategy: Score proportional to the number of PASS ratings given for test suite programs. Slightly subjective characteristic; we have to judge whether the user would be able to determine the bottleneck based on data provided by tool
Searching
Description: Ability of the tool to search for particular information or events
Importance rating: Minor; can be useful, but difficult to provide users with a powerful search that is user-friendly
Rating strategy: Five points if searching is supported; points deducted if only simple search available; zero points if no search functionality
Miscellaneous Characteristics
Cost
Description: How much (per seat) the tool costs to use
Importance rating: Important; tools that are prohibitively expensive reduce overall availability of tool
Rating strategy: Scale based on per-seat cost:
Free: five points
$1.00 to $499.99: four points
$500.00 to $999.99: three points
$1,000.00 to $1,999.99: two points
$2,000.00 or more: one point
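The per-seat scale is again a simple bracket lookup; a sketch with an illustrative function name:

```python
def cost_score(per_seat_cost):
    """Map per-seat cost (US dollars) to 1-5 points using the scale above."""
    if per_seat_cost <= 0:
        return 5   # free
    if per_seat_cost < 500:
        return 4   # $1.00 to $499.99
    if per_seat_cost < 1000:
        return 3   # $500.00 to $999.99
    if per_seat_cost < 2000:
        return 2   # $1,000.00 to $1,999.99
    return 1       # $2,000.00 or more

print(cost_score(0))    # prints 5
print(cost_score(750))  # prints 3
```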
Interoperability
Description: How well the tool works and integrates with other performance tools
Importance rating: Important; tools lacking in areas like trace visualization can make up for it by exporting data that other tools can understand (also helpful for getting data from 3rd-party sources)
Rating strategy: Zero points if data cannot be imported or exported from tool; one point for export of data in a simple ASCII format; additional points (up to five) for each format the tool can export to and import from