![Page 1: 1 SciDAC High-End Computer System Performance: Science and Engineering Jack Dongarra Innovative Computing Laboratory University of Tennesseedongarra](https://reader035.vdocument.in/reader035/viewer/2022081603/5697bf891a28abf838c89c50/html5/thumbnails/1.jpg)
1
SciDAC
High-End Computer System Performance:
Science and Engineering
Jack Dongarra
Innovative Computing Laboratory
University of Tennessee
http://www.cs.utk.edu/~dongarra/
2
Four Components for the University of Tennessee’s
Performance Capturing Tools
• PAPI
Self-adapting numerical software
• Automatic performance enhancement
• SANS/AEOS/ATLAS
Performance repository for apps, kernels, machines, etc.
• NETLIB, Repository in a Box (RIB)
Modeling, predictability
3
Tools for Performance Evaluation
Timing and performance evaluation has been an art
• Resolution of the clock
• Issues about cache effects
• Different systems
• Can be cumbersome and inefficient with traditional tools
Situation about to change
• Today’s processors have internal counters
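The clock-resolution and cache-effect issues listed above can be made concrete with a small sketch. It uses only Python's standard `time` module, not hardware counters; the helper names (`reported_resolution`, `observed_tick`, `time_best_of`) are hypothetical and chosen for illustration.

```python
import time

def reported_resolution(clock="perf_counter"):
    """Resolution in seconds that the Python runtime reports for a clock."""
    return time.get_clock_info(clock).resolution

def observed_tick(samples=1000):
    """Smallest nonzero step seen between consecutive perf_counter readings."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:              # spin until the clock visibly advances
            t1 = time.perf_counter()
        smallest = min(smallest, t1 - t0)
    return smallest

def time_best_of(func, repeats=5):
    """Time func several times and keep the fastest run.

    Taking the minimum damps cold-cache and scheduling noise, which is one
    reason a single wall-clock measurement is hard to trust.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    print("reported resolution:", reported_resolution())
    print("observed tick:      ", observed_tick())
    print("best of 5:          ", time_best_of(lambda: sum(range(100_000))))
```

Any code region that runs for less than a few clock ticks cannot be timed reliably this way, which is part of the motivation for counting events in hardware instead.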
4
Performance Counters
Almost all high performance processors include hardware performance counters.
• Some are easy to access, others not available to users.
On most platforms the APIs, if they exist, are neither appropriate for the end user nor well documented.
Existing performance counter APIs
• Compaq Alpha EV 6 & 6/7
• SGI MIPS R10000
• IBM Power Series
• CRAY T3E
• Sun Solaris
• Pentium Linux and Windows
• IA-64
• HP-PA RISC
• Hitachi
• Fujitsu
• NEC
5
Overview of PAPI
Performance Application Programming Interface
The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors
6
Performance Data from PAPI
Execution Rate (MIPS, Flop/s)
Bandwidth Utilization
• Main Memory
• L2 cache
• L1 cache
Cache Miss Statistics: Icache, Dcache, and L2 cache
TLB misses
Mispredicted Branches
Instruction Mix (FP, branch, LD/ST, other)
Load/store instruction issue rate
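Several of the metrics above are ratios of raw event counts and elapsed time. The sketch below shows that arithmetic in Python. The dictionary keys follow PAPI preset event names (`PAPI_TOT_INS`, `PAPI_TOT_CYC`, `PAPI_FP_OPS`, `PAPI_L1_DCM`, `PAPI_BR_MSP`, `PAPI_BR_INS`); the function name `derived_metrics` and the sample counts are made up for illustration and are not measurements.

```python
def derived_metrics(counts, seconds):
    """Turn raw event counts and elapsed time into rate and ratio metrics."""
    ins = counts["PAPI_TOT_INS"]
    return {
        # Execution rates
        "mips": ins / seconds / 1e6,
        "mflops": counts["PAPI_FP_OPS"] / seconds / 1e6,
        # Instructions completed per cycle
        "ipc": ins / counts["PAPI_TOT_CYC"],
        # L1 data cache misses per 1000 instructions
        "l1d_misses_per_kilo_ins": 1000 * counts["PAPI_L1_DCM"] / ins,
        # Fraction of branch instructions that were mispredicted
        "branch_mispredict_ratio": counts["PAPI_BR_MSP"] / counts["PAPI_BR_INS"],
    }

# Illustrative counts only (not measured on any machine).
sample_counts = {
    "PAPI_TOT_INS": 2_000_000_000,
    "PAPI_TOT_CYC": 1_000_000_000,
    "PAPI_FP_OPS": 500_000_000,
    "PAPI_L1_DCM": 10_000_000,
    "PAPI_BR_MSP": 2_000_000,
    "PAPI_BR_INS": 200_000_000,
}

if __name__ == "__main__":
    for name, value in derived_metrics(sample_counts, seconds=2.0).items():
        print(f"{name}: {value:.3f}")
```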
7
Implementation
Counters exist as a small set of registers that count events.
PAPI provides three interfaces to the underlying counter hardware:
1. The low level interface manages hardware events in user defined groups called EventSets.
2. The high level interface simply provides the ability to start, stop and read the counters for a specified list of events.
3. Graphical tools to visualize information.
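The difference between the low level and high level interfaces can be illustrated with a toy model. This is a sketch in Python, not the PAPI C API: `SimulatedCounters`, the `EventSet` methods, `count_events`, and the event names are all hypothetical stand-ins, and the "hardware" is just a dictionary that the workload increments.

```python
class SimulatedCounters:
    """Stand-in for hardware counter registers: event name -> running count."""

    def __init__(self):
        self.registers = {}

    def tick(self, event, n=1):
        self.registers[event] = self.registers.get(event, 0) + n


class EventSet:
    """Low-level style: a user-defined group of events managed together."""

    def __init__(self, hw):
        self.hw = hw
        self.events = []
        self.baseline = None        # None means the set is not counting

    def add_event(self, name):
        if self.baseline is not None:
            raise RuntimeError("cannot add events while counting")
        self.events.append(name)

    def start(self):
        # Remember the register values at start so reads report deltas.
        self.baseline = {e: self.hw.registers.get(e, 0) for e in self.events}

    def read(self):
        if self.baseline is None:
            raise RuntimeError("event set is not started")
        return [self.hw.registers.get(e, 0) - self.baseline[e]
                for e in self.events]

    def stop(self):
        values = self.read()
        self.baseline = None
        return values


def count_events(hw, events, work):
    """High-level style: start, run the work, stop, for a plain event list."""
    event_set = EventSet(hw)
    for name in events:
        event_set.add_event(name)
    event_set.start()
    work()
    return dict(zip(events, event_set.stop()))


hw = SimulatedCounters()

def workload():
    for _ in range(100):
        hw.tick("instructions", 3)
        hw.tick("fp_ops")

result = count_events(hw, ["instructions", "fp_ops"], workload)
```

In this model the high-level call is a thin convenience wrapper over an event set, which mirrors the layering described in the list above.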
8
PAPI - Supported Processors
Intel Pentium, Pro, II, III, 4
• Linux 2.4, 2.2, 2.0 and perf kernel patch
IBM Power 3, 604, 604e
• For AIX 4.3 and pmtoolkit (in 4.3.4 available) ([email protected])
Sun UltraSparc I, II, & III
• Solaris 2.8
MIPS R10K, R12K
AMD Athlon
• Linux 2.4 and perf kernel patch
Cray T3E, SV1, SV2
Soon: Windows 2K, Compaq Alpha EV6 & 67 and Intel IA-64
9
Go To Demo
10
PAPI’s Parallel Interface
11
PAPI Development
Extensions to PAPI to support collection and analysis of hardware performance counter data in the context of shared and distributed memory parallel programs
• Allowing for straightforward instrumentation of multithreaded and multiprocessor applications.
Tools will include graphical tools extended with dynamic instrumentation capabilities.
• Framework for using Dyninst with parallel programs, the Free Probe Class Server (FPCS) and IBM’s Dynamic Probe Class Library (DPCL)
Port PAPI to Compaq Alpha and HP machines
Summary information on problem spots within applications
Integration with other tools, SvPablo, Dyninst, etc.
Help with setting up PAPI at various sites.
12
Repository Development
Repository of Tools and Data on Performance Evaluation
• A network-based catalog that will serve as a “road map” to important Performance Evaluation enabling technologies
• A methodology for evaluation and measurement of the success of the tools.
SciDAC outreach: Start a community effort for the collection and dissemination of performance data
13
Self-Adapting Numerical Software (SANS)
Today’s processors can achieve high performance, but this requires extensive machine-specific hand tuning.
• Simple operations like Matrix-Vector ops require many man-hours / platform
  • Software lags far behind hardware introduction
  • Only done if financial incentive is there
• Compilers not up to optimization challenge
Hardware, compilers, and software have a large design space w/many parameters
• Blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules.
• Complicated interactions with the increasingly sophisticated micro-architectures of new microprocessors.
Need for quick/dynamic deployment of optimized routines.
ATLAS - Automatically Tuned Linear Algebra Software
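The idea of searching a parameter space empirically instead of hand tuning can be sketched with a single parameter, the blocking size. The code below is a minimal illustration in pure Python, not ATLAS's actual search: the function names, the candidate block sizes, and the matrix size are assumptions chosen to keep the example small.

```python
import time

def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square blocking."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c

def autotune_block_size(n=48, candidates=(4, 8, 16, 48), repeats=2):
    """Time each candidate block size on this machine and return the fastest.

    Returns (best_block, timings), where timings maps block size to the
    best observed time in seconds.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    best_block, timings = autotune_block_size()
    print("fastest block size on this machine:", best_block)
```

The winning block size depends on the machine and can change between runs, which is the point: the choice is measured at install time instead of being fixed by hand. A real tuner would search the other parameters listed above as well, such as unrolling depths and loop orders.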
14
SANS Extensions
BLAS
Sparse matrix operations
Message passing
Algorithm selection at a higher level
15
Repository In a Box (RIB)
Metadata objects are stored in repositories.
A repository automatically generates a web site for displaying customizable views of its metadata - search, browse, join, etc.
Metadata objects are also made available to network applications via the RIB API.
16
Repository Interoperation
[Diagram labels: My Repository (metadata objects), Your Repository (metadata objects), Our Virtual Repository, HTML Catalog]
17
Tools Integration
PAPI, Dyninst, SVPablo
Intelligent Adaptation
• Rose and SANS (ATLAS)
Repository-in-a-Box effort
• provides a toolkit for building and maintaining meta-data repositories
18
Interaction with Other Efforts
SciDAC - TOPS
• David Keyes, ICASE/ODU/LLNL
SciDAC - Astrophysics
• Tony Mezzacappa, ORNL
DOE - Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
• Bart Miller, U Wisc
• Jeff H., U of Maryland
19
High-End Computer System Performance: Science and Engineering
Activities for UTennessee
Performance Capturing Tools
• PAPI
Automatic performance enhancement
• SANS/AEOS/ATLAS
Performance repository for apps, kernels, machines, etc.
• NETLIB, RIB
Modeling, predictability