Lecture 2c: Benchmarks
Benchmarking
A benchmark is a program that is run on a computer to measure its performance and compare it with that of other machines.
The best benchmark is the users’ workload: the mixture of programs and operating system commands that users run on a machine.
Running the real workload is usually not practical, so standard benchmarks are used instead.
Types of Benchmarks
Synthetic benchmarks
Toy benchmarks
Microbenchmarks
Program Kernels
Real Applications
Synthetic benchmarks
Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs
• Whetstone benchmark
• Dhrystone benchmark
• Rhealstone benchmark
Synthetic benchmarks: Whetstone benchmark
• First written in ALGOL 60 in 1972; Fortran, C/C++, and Java versions are available today
• Represents the workload of numerical applications
• Measures floating point arithmetic performance
• Unit is Millions of Whetstone instructions per second (MWIPS)
• Shortcomings:
  • Does not represent constructs in modern languages, such as pointers
  • Does not consider cache effects
Synthetic benchmarks: Dhrystone benchmark
• First written in Ada in 1984; a C version is available today
• Represents a workload based on statistics collected on system software, such as operating systems, compilers, editors, and a few numerical programs
• Measures integer and string performance, no floating-point operations
• Unit is the number of program iteration completions per second
• Shortcomings:
  • Does not represent real-life programs
  • Compiler optimizations can overstate system performance
  • Small code that may fit entirely in the instruction cache
Synthetic benchmarks: Rhealstone benchmark
• For multitasking real-time systems
• Factors are:
  • Task switching time
  • Pre-emption time
  • Interrupt latency time
  • Semaphore shuffling time
  • Deadlock breaking time
  • Datagram throughput time
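• Metric is Rhealstones per second, a weighted sum of the reciprocals of the six times $t_i$ with weights $w_i$:
$$\sum_{i=1}^{6} w_i \cdot \frac{1}{t_i}$$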
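The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name, the default of equal weights, and the sample times are assumptions, not part of the Rhealstone definition given on the slide.

```python
def rhealstones_per_second(times_s, weights=None):
    """Weighted sum of reciprocal times: sum over i of w_i * (1 / t_i)."""
    if weights is None:
        # Assumption for illustration: every factor gets the same weight.
        weights = [1.0] * len(times_s)
    if len(weights) != len(times_s):
        raise ValueError("need exactly one weight per measured time")
    return sum(w / t for w, t in zip(weights, times_s))


# Hypothetical measurements in seconds, in the order the factors are listed:
# task switch, pre-emption, interrupt latency, semaphore shuffle,
# deadlock break, datagram throughput.
sample_times = [50e-6, 80e-6, 20e-6, 120e-6, 200e-6, 150e-6]
figure_of_merit = rhealstones_per_second(sample_times)
```

Because each term is a reciprocal, shorter times raise the figure of merit, and the weights let an application emphasize the factors it depends on most.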
Toy benchmarks
Programs of 10-100 lines of code whose result is known before they are run
• Quicksort
• Sieve of Eratosthenes: finds prime numbers
http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif
func sieve( var N )
    var PrimeArray as array of size N
    initialize PrimeArray to all true
    for i from 2 to N
        for each j from i + 1 to N, where i divides j
            set PrimeArray( j ) = false
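A runnable version of the sieve might look like the following Python sketch. It departs from the pseudocode in two ways that are assumptions of this sketch, not of the slide: it skips values of i already marked composite, and it starts crossing out at i * i, since smaller multiples have already been handled.

```python
def sieve(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)      # index k answers "is k prime?"
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Cross out every multiple of i, starting at i * i.
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
        i += 1
    return [k for k, flag in enumerate(is_prime) if flag]
```

As a toy benchmark the result is known in advance: there are 25 primes up to 100, so a run can be checked for correctness before its time is recorded.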
Microbenchmarks
Small, specially designed programs used to test a specific function of a system (e.g., floating-point execution, the I/O subsystem, or the processor-memory interface)
• Provide values for important parameters of a system
• Characterize the maximum performance if the overall performance is limited by that single component
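As a rough illustration of a microbenchmark, the sketch below times one isolated operation, copying a buffer in memory, and reports a bandwidth figure. The function name, buffer size, and repeat count are arbitrary choices made here, and interpreter overhead means the number is only indicative.

```python
import time


def copy_bandwidth_mb_per_s(size_bytes=8 * 1024 * 1024, repeats=5):
    """Estimate memory-copy bandwidth in MB/s from the fastest of several runs."""
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)                 # one full copy of the buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)        # keep the least-disturbed run
        del dst
    return size_bytes / best / 1e6
```

Taking the minimum over several runs is a common way to reduce noise from other activity on the machine; the result characterizes only this one component, not overall system performance.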
Kernels
Key pieces of code from real applications.
• LINPACK and BLAS
• Livermore Loops
• NAS
Kernels: LINPACK and BLAS libraries
• LINPACK – linear algebra package
  • Measures floating-point computing power
  • Solves a system of linear equations Ax = b with Gaussian elimination
  • Metric is MFLOP/s
  • DAXPY is the most time-consuming routine
  • Used as the measure for the TOP500 list
• BLAS – Basic Linear Algebra Subprograms
  • LINPACK makes use of the BLAS library
• SAXPY – Scalar Alpha X Plus Y
  • Y = αX + Y, where X and Y are vectors and α is a scalar
  • SAXPY for single precision and DAXPY for double precision
  • Generic implementation:
for (int i = m; i < n; i++) {
    y[i] = a * x[i] + y[i];  /* a is the scalar alpha */
}
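To connect the AXPY loop with the MFLOP/s metric, here is a minimal Python sketch. The function names and vector length are assumptions for illustration; Python floats are double precision, so this corresponds to DAXPY, and the rate it reports reflects interpreter overhead rather than hardware peak.

```python
import time


def axpy(a, x, y):
    """Update y in place so that y[i] = a * x[i] + y[i]; return y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


def axpy_mflops(n=200_000):
    """Time one AXPY over n elements and convert to MFLOP/s."""
    x = [1.0] * n
    y = [2.0] * n
    start = time.perf_counter()
    axpy(3.0, x, y)
    elapsed = time.perf_counter() - start
    # Each element costs one multiply and one add: 2 * n operations.
    return 2 * n / elapsed / 1e6
```

The operation count is fixed by the algorithm (2n floating-point operations), so only the elapsed time has to be measured.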
Kernels: Livermore Loops
• Developed at LLNL (Lawrence Livermore National Laboratory)
• Originally in Fortran, now also in C
• 24 numerical application kernels, such as:
  • hydrodynamics fragment
  • incomplete Cholesky conjugate gradient
  • inner product
  • banded linear systems solution, tridiagonal linear systems solution
  • general linear recurrence equations
  • first sum, first difference
  • 2-D particle in a cell, 1-D particle in a cell
  • Monte Carlo search
  • location of a first array minimum, etc.
• Metrics are the arithmetic, geometric, and harmonic means of the CPU rate
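The three means can differ noticeably for the same set of rates. The sketch below uses Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later); the function name and the sample rates are hypothetical.

```python
from statistics import geometric_mean, harmonic_mean, mean


def summarize_rates(rates_mflops):
    """Return the arithmetic, geometric, and harmonic means of per-kernel rates."""
    return {
        "arithmetic": mean(rates_mflops),
        "geometric": geometric_mean(rates_mflops),
        "harmonic": harmonic_mean(rates_mflops),
    }


# Hypothetical per-kernel rates in MFLOP/s.
summary = summarize_rates([2.0, 8.0])
```

For positive rates the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so reporting all three shows how much a few fast kernels inflate the average.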
Kernels: NAS Parallel Benchmarks
• Developed at the NASA Advanced Supercomputing division
• Paper-and-pencil benchmarks (specified algorithmically rather than as fixed source code)
• 11 benchmarks, such as:
  • Discrete Poisson equation
  • Conjugate gradient
  • Fast Fourier Transform
  • Bucket sort
  • Embarrassingly parallel
  • Nonlinear PDE solution
  • Data traffic, etc.
Real Applications
Programs that are run by many users
• C compiler
• Text processing software
• Frequently used user applications
• Modified scripts used to measure particular aspects of system performance, such as interactive behavior, multiuser behavior
Benchmark Suites
Desktop Benchmarks
• SPEC benchmark suite
Server Benchmarks
• SPEC benchmark suite
• TPC
Embedded Benchmarks
• EEMBC
SPEC Benchmark Suite
Desktop Benchmarks
• CPU-intensive
  • SPEC CPU2000
    • 12 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks
    • Real application programs:
      • C compiler
      • Finite element modeling
      • Fluid dynamics, etc.
• Graphics-intensive
  • SPECviewperf
    • Measures rendering performance using OpenGL
  • SPECapc
    • Pro/Engineer – 3D rendering with solid models
    • Solid/Works – 3D CAD/CAM design tool, CPU-intensive and I/O-intensive tests
    • Unigraphics – solid modeling for an aircraft design
Server Benchmarks
• SPECWeb – for web servers
• SPECSFS – for NFS performance, throughput-oriented
TPC Benchmark Suite
Server benchmark
Transaction processing (TP) benchmarks
Real applications
• TPC-C: simulates a complex query environment
• TPC-H: ad hoc decision support
• TPC-R: business decision support system where users run a standard set of queries
• TPC-W: business-oriented transactional web server
Measures performance in transactions per second. Throughput performance is measured only when the response time limit is met.
Allows cost-performance comparisons
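The idea of counting throughput only when a response-time limit is met can be sketched as follows. This is not the TPC specification: the 90th-percentile rule, the function names, and the parameters are assumptions chosen for illustration.

```python
def qualified_throughput(response_times_s, interval_s, limit_s, percentile=90):
    """Return transactions per second, or None if the response-time limit is missed.

    Assumption: the run qualifies when the given percentile of response
    times (nearest-rank method) is at or below limit_s.
    """
    if not response_times_s:
        return None
    ordered = sorted(response_times_s)
    # Nearest-rank percentile using integer arithmetic (ceiling division).
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    if ordered[rank - 1] > limit_s:
        return None                      # too slow: throughput does not count
    return len(ordered) / interval_s


def cost_per_tps(system_cost, tps):
    """Cost-performance figure: system cost divided by qualified throughput."""
    return system_cost / tps
```

A system that completes many transactions but answers too slowly reports no throughput at all under this rule, which prevents trading responsiveness for raw transaction counts.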
EEMBC Benchmarks
for embedded computing systems
34 benchmarks from 5 different application classes:
• Automotive/industrial
• Consumer
• Networking
• Office automation
• Telecommunications
Benchmarking Strategies
Fixed-computation benchmarks
Fixed-time benchmarks
Variable-computation and variable-time benchmarks
Fixed-Computation benchmarks
W: fixed workload (number of instructions, number of floating-point operations, etc.)
T: measured execution time
R: speed
Compare:
$$R = \frac{W}{T}$$
$$\text{Speedup} = \frac{R_2}{R_1} = \frac{W / T_2}{W / T_1} = \frac{T_1}{T_2}$$
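With the workload W held fixed, the ratio of speeds reduces to a ratio of execution times, Speedup = R2 / R1 = T1 / T2. A small Python sketch, with hypothetical function names and numbers:

```python
def speed(workload, time_s):
    """R = W / T: work completed per second."""
    return workload / time_s


def speedup(t1_s, t2_s):
    """Speedup of system 2 over system 1 for the same fixed workload: T1 / T2."""
    return t1_s / t2_s


# Hypothetical example: the same 1e9-operation workload takes 10 s on
# system 1 and 4 s on system 2.
w = 1e9
ratio_of_speeds = speed(w, 4.0) / speed(w, 10.0)
ratio_of_times = speedup(10.0, 4.0)
```

Both ratios give the same value because W cancels, which is why fixed-computation benchmarks only need to report execution times.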
Fixed-Computation benchmarks
Amdahl’s Law
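The slide's own formula did not survive in this transcript. In its usual textbook form, Amdahl's Law says that if a fraction f of the execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal Python sketch of that standard form, with hypothetical parameter names:

```python
def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    """Overall speedup when only part of the execution time is improved.

    enhanced_fraction: share of the original time that benefits (0..1).
    enhancement_speedup: factor by which that share gets faster (> 0).
    """
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be between 0 and 1")
    if enhancement_speedup <= 0:
        raise ValueError("enhancement_speedup must be positive")
    unchanged = 1.0 - enhanced_fraction
    return 1.0 / (unchanged + enhanced_fraction / enhancement_speedup)
```

Even with an arbitrarily large enhancement, the overall speedup is bounded by 1 / (1 - f): improving 90% of the time can never yield more than a tenfold gain.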
Fixed-Time benchmarks
On a faster system, a larger workload can be processed in the same amount of time
T: fixed execution time
W: workload
R: speed
Compare:
$$R = \frac{W}{T}$$
$$\text{Sizeup} = \frac{R_2}{R_1} = \frac{W_2 / T}{W_1 / T} = \frac{W_2}{W_1}$$
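With the time T held fixed, the ratio of speeds reduces to a ratio of workloads, Sizeup = W2 / W1. The sketch below counts how much work completes within a fixed time budget; the function names, the budget, and the unit of work are assumptions for illustration.

```python
import time


def workload_in_fixed_time(step, budget_s=0.05):
    """Count how many calls to step() complete within budget_s seconds."""
    done = 0
    deadline = time.perf_counter() + budget_s
    while time.perf_counter() < deadline:
        step()
        done += 1
    return done


def sizeup(w1, w2):
    """Sizeup of system 2 over system 1 for the same fixed time: W2 / W1."""
    return w2 / w1
```

Running `workload_in_fixed_time` with the same step and budget on two machines gives W1 and W2, and `sizeup(w1, w2)` then compares them.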
Fixed-Time benchmarks
Scaled Speedup
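The slide's content is not preserved here. One common formulation of scaled speedup is Gustafson's law, which is assumed in this sketch: if a fraction s of the time on an N-processor system is serial, the scaled speedup is s + (1 - s) * N. The function and parameter names are hypothetical.

```python
def scaled_speedup(serial_fraction, processors):
    """Gustafson-style scaled speedup: s + (1 - s) * N.

    serial_fraction: share of the run time on the parallel system that is
    serial (0..1). processors: number of processors N (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return serial_fraction + (1.0 - serial_fraction) * processors
```

Unlike the fixed-computation view, the workload here grows with the machine, so the speedup keeps increasing with N instead of saturating.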
Variable-Computation and Variable-Time benchmarks
In this type of benchmark, quality of the solution is improved.
Q: quality of the solution
T: execution time
Quality improvements per second: $Q / T$