4. Assessing and Understanding Performance
Computer Architecture 4-2
4. Performance
4.1 Introduction
4.2 CPU Performance and Its Factors
4.3 Evaluating Performance
4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors
4.5 Fallacies and Pitfalls
4.6 Concluding Remarks
4.7 Historical Perspective and Further Reading
4.8 Exercises
4.1 Introduction
How to measure, report, and summarize performance
Defining Performance: an analogy

| Airplane | Passenger capacity | Cruising range (miles) | Cruising speed (m.p.h.) | Passenger throughput (passengers x m.p.h.) |
| --- | --- | --- | --- | --- |
| Boeing 777 | 375 | 4630 | 610 | 228,750 |
| Boeing 747 | 470 | 4150 | 610 | 286,700 |
| BAC/Sud Concorde | 132 | 4000 | 1350 | 178,200 |
| Douglas DC-8-50 | 146 | 8720 | 544 | 79,424 |

Figure 4.1
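The passenger-throughput column of Figure 4.1 is the product of passenger capacity and cruising speed, and which airplane counts as "best" depends on the metric chosen. The following is a minimal Python sketch of that point; the dictionary layout and variable names are assumptions made for illustration, while the numbers come from the figure.

```python
# Figure 4.1 data: (passenger capacity, cruising range, cruising speed).
# The data layout and names below are illustrative, not part of the slides.
planes = {
    "Boeing 777": (375, 4630, 610),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}

# Passenger throughput = passenger capacity x cruising speed.
throughput = {
    name: capacity * speed for name, (capacity, _range, speed) in planes.items()
}

# Each metric can pick a different "best" airplane.
fastest = max(planes, key=lambda name: planes[name][2])
longest_range = max(planes, key=lambda name: planes[name][1])
highest_throughput = max(throughput, key=throughput.get)

print("Highest cruising speed:", fastest)
print("Longest cruising range:", longest_range)
print("Highest passenger throughput:", highest_throughput)
```

The three metrics select three different airplanes, which is why a performance claim needs to state the metric it is based on.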
Performance of a Computer
Response time (= execution time): the time between the start and completion of a task
Throughput: the total amount of work done in a given time
Performance and execution time:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
"X is n times faster than Y" means:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
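As a small illustration of the "n times faster" relation, the sketch below computes n from two execution times. The function name and the timings (10 s and 15 s) are hypothetical values chosen for the example, not figures from the slides.

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    n = Performance_X / Performance_Y = Execution time_Y / Execution time_X
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Hypothetical timings: X runs the program in 10 s, Y in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n} times faster than Y")
```

Because performance is the reciprocal of execution time, the ratio is inverted: the slower machine's time goes in the numerator.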
Measuring Performance
Definitions of time
Wall-clock time = response time = elapsed time: the total time to complete a task, including disk accesses, memory accesses, I/O activities, OS overhead, and so on
CPU execution time = CPU time: the time the CPU spends computing for this task
CPU time = user CPU time + system CPU time
UNIX time command output: 90.7u 12.9s 2:39 65%
Definitions of performance
System performance: based on elapsed time
CPU performance: based on user CPU time
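The fields of the sample `time` output can be checked by hand: 90.7 s of user CPU time, 12.9 s of system CPU time, 2 min 39 s of elapsed time, and the share of elapsed time that was CPU time. A minimal sketch of that arithmetic follows; the variable names are illustrative assumptions.

```python
# Fields of the sample output "90.7u 12.9s 2:39 65%".
user_cpu = 90.7          # seconds of user CPU time ("90.7u")
system_cpu = 12.9        # seconds of system CPU time ("12.9s")
elapsed = 2 * 60 + 39    # elapsed time "2:39" converted to 159 seconds

# CPU time = user CPU time + system CPU time.
cpu_time = user_cpu + system_cpu

# Fraction of the elapsed time during which the CPU worked on this task.
cpu_fraction = cpu_time / elapsed

print(f"CPU time = {cpu_time:.1f} s, {cpu_fraction:.0%} of elapsed time")
```

The remaining share of the elapsed time was spent waiting on I/O or running other programs.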
4.2 CPU Performance and Its Factors
CPU execution time
= CPU clock cycles x clock cycle time
= CPU clock cycles / clock rate
Example: Improving Performance
Same instruction set
Computer A: 4 GHz, 10 seconds
Computer B: ? GHz, 6 seconds
B requires 1.2 times as many clock cycles as A.
[Answer]
CPU time_A = CPU clock cycles_A / clock rate_A
10 seconds = CPU clock cycles_A / (4 x 10^9 cycles/sec)
CPU clock cycles_A = 10 sec x 4 x 10^9 cycles/sec
= 40 x 10^9 cycles
CPU time_B = CPU clock cycles_B / clock rate_B
= 1.2 x CPU clock cycles_A / clock rate_B
6 seconds = 1.2 x 40 x 10^9 cycles / clock rate_B
clock rate_B = 1.2 x 40 x 10^9 cycles / 6 seconds = 8 x 10^9 cycles/sec = 8 GHz
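The same derivation can be replayed numerically. This is only a sketch of the arithmetic in the answer; the variable names are assumptions, and the inputs are the values given in the example.

```python
# Inputs from the example.
clock_rate_a = 4e9    # Computer A clock rate in Hz (4 GHz)
cpu_time_a = 10.0     # Computer A CPU time in seconds
cpu_time_b = 6.0      # target CPU time for Computer B in seconds
cycle_ratio = 1.2     # B needs 1.2 times as many clock cycles as A

# CPU clock cycles = CPU time x clock rate.
cycles_a = cpu_time_a * clock_rate_a
cycles_b = cycle_ratio * cycles_a

# clock rate = CPU clock cycles / CPU time.
clock_rate_b = cycles_b / cpu_time_b

print(f"Clock rate of B: {clock_rate_b / 1e9:.1f} GHz")
```

Computer B must run at twice the clock rate of A to cut the run time from 10 to 6 seconds, because it also executes 20% more cycles.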
Hardware Software Interface
CPU clock cycles = IC x CPI
IC (Instruction Count): dependent on compilers and architectures
CPI (Cycles Per Instruction): dependent on implementations
Performance equation
Execution time = IC x CPI x clock cycle time
= (IC x CPI) / clock rate
Example: Using the Performance Equation
Same instruction set architecture, same program
Clock cycle time_A = 250 ps, CPI_A = 2.0
Clock cycle time_B = 500 ps, CPI_B = 1.2
Which is faster, and by how much?
[Answer]
Let I = instruction count for the program.
CPU time_A = IC_A x CPI_A x clock cycle time_A = I x 2.0 x 250 ps = 500 x I ps
CPU time_B = I x 1.2 x 500 ps = 600 x I ps
Then
$$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$
Thus, A is 1.2 times faster than B for this program.
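The performance equation from this example can be written as a one-line function. The sketch below is an illustration only: the function name is made up, and the instruction count is a hypothetical value, since it cancels in the ratio.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time: float) -> float:
    """Execution time = IC x CPI x clock cycle time (in seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical instruction count; any positive value gives the same ratio.
instruction_count = 1_000_000

time_a = cpu_time(instruction_count, cpi=2.0, clock_cycle_time=250e-12)  # 250 ps
time_b = cpu_time(instruction_count, cpi=1.2, clock_cycle_time=500e-12)  # 500 ps

# Performance_A / Performance_B = Execution time_B / Execution time_A.
speedup_a_over_b = time_b / time_a

print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

A has the higher CPI but still wins, because its clock cycle is half as long; neither CPI nor clock rate alone decides the comparison.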
The Big Picture
$$\text{Time} = \text{Instructions} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

| Components of performance | Units of measure |
| --- | --- |
| CPU execution time for a program | Seconds for the program |
| Instruction count (IC) | Instructions executed for the program |
| Clock cycles per instruction (CPI) | Average clock cycles / instruction |
| Clock cycle time | Seconds / clock cycle |
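Example: Comparing Code Segments
Which will be faster? What is the CPI for each sequence?

| Instruction class | CPI for the class |
| --- | --- |
| A | 1 |
| B | 2 |
| C | 3 |

| Code sequence | Instruction count, class A | Instruction count, class B | Instruction count, class C |
| --- | --- | --- | --- |
| 1 | 2 | 1 | 2 |
| 2 | 4 | 1 | 1 |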
[Answer]
Instruction count_1 = 2 + 1 + 2 = 5 and instruction count_2 = 4 + 1 + 1 = 6
Thus (1) executes fewer instructions.
CPU clock cycles_1 = 2x1 + 1x2 + 2x3 = 10 and CPU clock cycles_2 = 4x1 + 1x2 + 1x3 = 9
Thus (2) is faster.
CPI_1 = CPU clock cycles_1 / instruction count_1 = 10 / 5 = 2
CPI_2 = 9 / 6 = 1.5
(2) has lower CPI.
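The comparison of the two code sequences can be reproduced with a short script. This is a sketch under the example's own numbers; the helper name `summarize` and the dictionary layout are assumptions made for illustration.

```python
# CPI of each instruction class, from the example.
cpi_per_class = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence, from the example.
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(n * cpi_per_class[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


results = {seq: summarize(counts) for seq, counts in sequences.items()}

for seq, (instructions, cycles, cpi) in results.items():
    print(f"Sequence {seq}: {instructions} instructions, {cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions yet needs fewer clock cycles, so instruction count alone is not a reliable predictor of execution time.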
4.3 Evaluating Performance
Benchmarking: the process of comparing the performance of two or more systems by measurement
Benchmark: programs specifically chosen to measure performance; a workload that the user hopes will predict the performance of the actual workload
Compiler tricks: optimizations in either the architecture or the compiler
Compiler Tricks by IBM
Comparing and Summarizing Performance
Difficulties with summarizing performance
A is 10 times faster than B for program 1.
B is 10 times faster than A for program 2.
Total execution time: a consistent summary measure
AM (arithmetic mean):
$$\text{AM} = \frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$
Weighted arithmetic mean:
$$\sum_{i=1}^{n} (\text{Time}_i \times w_i), \quad \text{where } \sum_{i=1}^{n} w_i = 1.0$$

| | Computer A | Computer B |
| --- | --- | --- |
| Program 1 (seconds) | 1 | 10 |
| Program 2 (seconds) | 1000 | 100 |
| Total time (seconds) | 1001 | 110 |

Figure 4.4
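The summary measures can be tried on the Figure 4.4 data. The sketch below computes total time, arithmetic mean, and a weighted arithmetic mean; the function names are illustrative, and the weights (0.9 for program 1, 0.1 for program 2) are hypothetical values chosen only to show how weighting changes the summary.

```python
# Execution times in seconds from Figure 4.4: [program 1, program 2].
times = {"A": [1, 1000], "B": [10, 100]}


def arithmetic_mean(ts):
    """AM = (1/n) * sum of Time_i."""
    return sum(ts) / len(ts)


def weighted_arithmetic_mean(ts, ws):
    """Sum of Time_i * w_i, where the weights must sum to 1.0."""
    if len(ts) != len(ws):
        raise ValueError("need one weight per program")
    if abs(sum(ws) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(t * w for t, w in zip(ts, ws))


total = {machine: sum(ts) for machine, ts in times.items()}
mean = {machine: arithmetic_mean(ts) for machine, ts in times.items()}

# Hypothetical weights: program 1 makes up 90% of the workload.
weights = [0.9, 0.1]
weighted = {m: weighted_arithmetic_mean(ts, weights) for m, ts in times.items()}

print("Total time:", total)
print("Arithmetic mean:", mean)
print("Weighted arithmetic mean:", weighted)
```

Total time and the arithmetic mean rank the machines identically, since the mean is the total divided by the number of programs; the weighted mean lets the summary reflect how often each program actually runs.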
4.6 Concluding Remarks
Three design criteria
1. High-performance design: supercomputers and high-end servers
2. Low-cost design: embedded systems
3. Cost/performance design: desktop computers
Execution time of real programs as the metric
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{clock cycle}}$$