datorteknik performanceanalyse bild 1 performance –what is it: measures of performance the cpu...
Post on 22-Dec-2015
Datorteknik PerformanceAnalyse bild 1
Performance
– What is it: measures of performance
The CPU Performance Equation:
– Execution time as the measure
– What affects execution time
– Examples
Choosing good benchmarks?
– Choosing bad benchmarks?
Amdahl's Law
Datorteknik PerformanceAnalyse bild 2
Performance is Time
Time to do the task (Execution Time)
– execution time, response time, latency
Tasks per unit time (sec, minute, ...)
– throughput, bandwidth
Datorteknik PerformanceAnalyse bild 3
Performance as Response Time
Performance is most often measured as response time or execution time for some task.
“X is n times faster than Y” means
$$\frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution Time}(Y)}{\text{Execution Time}(X)} = n$$
Example: execution time of program P
X is 5 sec; Y is 10 sec.
X is 2 times faster than Y.
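The ratio on this slide can be checked with a few lines of Python. This is only a sketch; the function name `times_faster` is made up for illustration, and the inputs are the 5 sec and 10 sec from the example.

```python
def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that X is n times faster than Y."""
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    # Performance is the reciprocal of execution time, so the
    # performance ratio X/Y equals the execution-time ratio Y/X.
    return exec_time_y / exec_time_x


# Values from the slide: program P takes 5 sec on X and 10 sec on Y.
n = times_faster(exec_time_x=5.0, exec_time_y=10.0)
print(f"X is {n:g} times faster than Y")
```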
Datorteknik PerformanceAnalyse bild 4
What time to measure?
Elapsed time, wall-clock time:
– actual time from start to completion
– depends on CPU, system, I/O, etc.
– often used in real benchmarks
– only suitable choice when I/O is included
CPU Time:
– measure/analyze CPU performance only
– may be suitable when machine is timeshared
– possibly both user and system component
– User CPU time is our focus for first part of course
Elapsed time = CPU time + Idle time
– usually, and assuming time is accurately accounted for
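One way to see the difference between elapsed time and CPU time is to measure both for the same task. The sketch below uses Python's standard `time` module; `measure` and `sample_task` are hypothetical helper names, and the `sleep` call merely stands in for time spent waiting on I/O.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def sample_task():
    sum(i * i for i in range(200_000))  # CPU-bound part
    time.sleep(0.05)                    # idle part (stand-in for I/O wait)


elapsed, cpu = measure(sample_task)
print(f"elapsed {elapsed:.3f} s, CPU {cpu:.3f} s, idle about {elapsed - cpu:.3f} s")
```

`time.process_time()` reports user and system time together; separating the two would need a different call, such as `os.times()`.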
Datorteknik PerformanceAnalyse bild 5
Metrics of performance
Different performance metrics are appropriate at different levels.
[Figure: levels of a computer system (Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors) shown beside the metrics below]
– Answers per month
– Operations per second
– (millions) of Instructions per second – MIPS
– (millions) of (F.P.) operations per second – MFLOP/s
– Cycles per second (clock rate)
– Cycles per Instruction
Datorteknik PerformanceAnalyse bild 6
Relating Processor Metrics
CPU execution time per program
= CPU clock cycles/program × Clock cycle time
= CPU clock cycles/program ÷ Clock rate (frequency)
CPU clock cycles/program = Instructions/program × Clock cycles Per Instruction
Clock cycles Per Instruction (CPI) is an average measurement; it depends on:
– ISA, the implementation, and the program measured
– CPI = CPU clock cycles/program ÷ Instructions/program
– Also, Instructions per clock cycle or IPC = 1 / CPI
CPU execution time = Instructions × CPI × Clock cycle time
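The relations above translate directly into code. The following is a minimal sketch; the function names and the program figures (2 million instructions, 3 million cycles, 100 MHz clock) are invented for illustration and do not come from the slides.

```python
def cpu_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU execution time in seconds: instructions x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz
    return instructions * cpi * clock_cycle_time


def cpi_from_counts(clock_cycles: float, instructions: float) -> float:
    """Average clock cycles per instruction for one measured program."""
    return clock_cycles / instructions


# Hypothetical measurements, chosen only to exercise the formulas.
cpi = cpi_from_counts(clock_cycles=3e6, instructions=2e6)
ipc = 1.0 / cpi
t = cpu_time(instructions=2e6, cpi=cpi, clock_rate_hz=100e6)
print(f"CPI = {cpi}, IPC = {ipc:.3f}, CPU time = {t * 1e3:.1f} ms")
```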
Datorteknik PerformanceAnalyse bild 7
Let’s look at the single-cycle model analytically
Datorteknik PerformanceAnalyse bild 8
Static timing analysis
Component delays: memories 10 ns, register 5 ns, adders 10 ns, ALU 10 ns
Use topological sort!
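The topological-sort idea can be sketched as follows: visit the components in dependency order and record, for each one, the latest time at which its output becomes valid. The graph below is an assumption, not the datapath from the slides; it is a simplified load path (instruction memory, register read, ALU, data memory) whose delays follow the component delays listed above. `critical_path_delay` is a hypothetical helper built on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def critical_path_delay(delays, preds):
    """Return (worst arrival time, arrival time per node) for a DAG.

    delays: node -> propagation delay in ns
    preds:  node -> set of nodes whose outputs it depends on
    """
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds.get(node, ())), default=0)
        arrival[node] = start + delays[node]
    return max(arrival.values()), arrival


# Assumed, simplified load path; not the full datapath from the figure.
delays = {"instr_mem": 10, "reg_read": 5, "alu": 10, "data_mem": 10, "pc_adder": 10}
preds = {
    "instr_mem": set(),
    "pc_adder": set(),
    "reg_read": {"instr_mem"},
    "alu": {"reg_read"},
    "data_mem": {"alu"},
}

worst, arrival = critical_path_delay(delays, preds)
print(f"critical path delay: {worst} ns")
```

Because each node is visited once and each edge is examined once, the analysis is linear in the size of the circuit graph.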
Datorteknik PerformanceAnalyse bild 9
[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU, adders) annotated with component delays. Path for lw $2 const($3): 35 ns delay]
Datorteknik PerformanceAnalyse bild 10
But that path goes through the data memory!
What if this is not a load/store?
How about an instruction that does nothing?
“NOP”
Datorteknik PerformanceAnalyse bild 11
[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU, adders) annotated with component delays. Path for Nop: 10 ns delay]
Datorteknik PerformanceAnalyse bild 12
[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU, adders) annotated with component delays. Path for Add $ra $rb $rc: 25 ns delay]
Datorteknik PerformanceAnalyse bild 13
[Figure: single-cycle datapath (branch logic, sign/zero extend, zero extend, ALU, adders) annotated with component delays. Path for B label: 20 ns delay]
Datorteknik PerformanceAnalyse bild 14
35 ns for load/store
but
10 ns for NOP !?
Datorteknik PerformanceAnalyse bild 15
Amdahl’s rule:
“Make the common case fast”
Datorteknik PerformanceAnalyse bild 16
Amdahl's Law
Handy for evaluating the impact of a change not tied to the CPU performance equation.
Insight: no improvement of a feature enhances performance by more than the use of the feature.
Suppose that enhancement E accelerates fraction F of a program by a factor S (the remainder of the task is unaffected):
$$\text{ExecTime}_{E} = \left((1 - F) + \frac{F}{S}\right) \times \text{ExecTime}_{\text{without } E}$$
[Figure: execution-time bars. Without E: F and 1-F. With E: F/S and 1-F]
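A short Python sketch makes the formula easy to experiment with. The function names are made up for illustration, and the sample values (F = 0.5, S = 2, 100 time units) are hypothetical.

```python
def exec_time_with_enhancement(exec_time_without: float, f: float, s: float) -> float:
    """Execution time after enhancement E speeds up fraction f by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return ((1.0 - f) + f / s) * exec_time_without


def overall_speedup(f: float, s: float) -> float:
    """Ratio of execution time without E to execution time with E."""
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical case: half of the program runs twice as fast.
print(exec_time_with_enhancement(100.0, f=0.5, s=2.0))
print(overall_speedup(f=0.5, s=2.0))
# Even an enormous S cannot push the overall speedup past 1 / (1 - F).
print(overall_speedup(f=0.5, s=1e9))
```

The last call illustrates the insight on the slide: with F = 0.5 the overall speedup stays below 2 however large S becomes.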
Datorteknik PerformanceAnalyse bild 17
What if we don’t need the ALU?
A branch instruction?
Datorteknik PerformanceAnalyse bild 18
BUT!
The single cycle model has to accommodate the slowest instruction
Even if it rarely occurs!
Datorteknik PerformanceAnalyse bild 19
How much work can our structure perform?
For a program Q:
Time = Number of executed instructions * Number of cycles per instruction * Time per cycle
T = Nq * CPI * Tc
Datorteknik PerformanceAnalyse bild 20
For the single cycle model....
CPI = 1 for all instructions
Tc determined by the slowest instruction
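For the single-cycle model this can be sketched in a few lines of Python. The per-instruction path delays are the ones read off the earlier timing slides (35 ns for lw, 10 ns for NOP, 25 ns for add, 20 ns for the branch); the function name and the instruction count of 1000 are hypothetical.

```python
# Path delays in ns, as reported on the static timing slides.
path_delay_ns = {"lw": 35, "nop": 10, "add": 25, "b": 20}


def single_cycle_time_ns(instruction_count: int, path_delay_ns: dict) -> int:
    """T = Nq * CPI * Tc with CPI = 1 and Tc set by the slowest instruction."""
    tc = max(path_delay_ns.values())  # the clock must fit the worst case
    cpi = 1                           # every instruction takes exactly one cycle
    return instruction_count * cpi * tc


# Hypothetical program of 1000 instructions; the mix does not matter here.
print(single_cycle_time_ns(1000, path_delay_ns), "ns")
```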
Datorteknik PerformanceAnalyse bild 21
How to reduce T?
T = Nq * CPI * Tc
Reduce Nq.
More powerful instructions!
More hardware, longer paths, cycle time goes up (slower machine)
Datorteknik PerformanceAnalyse bild 22
“No free lunch”
Why designers are so well paid: to optimize designs.
Datorteknik PerformanceAnalyse bild 23
How to reduce T?
T = Nq * CPI * Tc
Faster hardware
Technological limits
Cost increase not linearly related
Sales volume drops
Datorteknik PerformanceAnalyse bild 24
How to reduce T?
T = Nq * CPI * Tc
Make CPI a function of the instruction
For example: NOP = 1 cycle
LW = 4 cycles
Chapter 5.4, the classical method
Datorteknik PerformanceAnalyse bild 25
How to reduce T?
T = Nq * CPI * Tc
Make CPI a function of the instruction
CPI goes up, but we can use an average, not the worst case
Tc goes down: it is the time to do the longest step, not the entire instruction
Datorteknik PerformanceAnalyse bild 26
Example
Branch: Step 1: fetch
Step 2: New PC
Add: Step 1: fetch
Step 2: decode/ register fetch
Step 3: Compute and write back
Datorteknik PerformanceAnalyse bild 27
Example
LW = 4 steps
Cycle time = 1/4 old time
T = 4 * 1/4 old time (4 is the LW CPI)
just as slow for the lw instruction, our worst case!
Datorteknik PerformanceAnalyse bild 28
But that’s not important if LW is not common!
T = Nq * CPI * 1/4 old time
CPI is averaged over this many (Nq) instructions
1.3? 1.7? Never = 4.0!
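The averaging can be made concrete with a small Python sketch. The cycle counts per instruction follow the slides (NOP 1, branch 2, add 3, LW 4), and the cycle time is taken as 1/4 of the 35 ns single-cycle time as assumed above. The instruction mix, however, is entirely hypothetical, so the resulting average is illustrative only.

```python
# Cycles per instruction class, following the step counts on the slides.
cycles = {"nop": 1, "b": 2, "add": 3, "lw": 4}

# Hypothetical dynamic instruction counts for some program Q.
mix = {"nop": 10, "b": 20, "add": 50, "lw": 20}


def average_cpi(cycles: dict, mix: dict) -> float:
    """Total cycles divided by total executed instructions."""
    total_cycles = sum(cycles[op] * count for op, count in mix.items())
    return total_cycles / sum(mix.values())


old_tc_ns = 35.0           # single-cycle clock period from the timing slides
new_tc_ns = old_tc_ns / 4  # assumption from the slides: 1/4 of the old time

nq = sum(mix.values())
cpi = average_cpi(cycles, mix)
t_single = nq * 1 * old_tc_ns
t_multi = nq * cpi * new_tc_ns
print(f"average CPI = {cpi}, single-cycle {t_single} ns, multi-cycle {t_multi} ns")
```

With this made-up mix the average CPI is below 4, so the multi-cycle design finishes sooner even though each lw takes as long as before.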
Datorteknik PerformanceAnalyse bild 29
We win because of quantitative statistical properties of our programs!
Datorteknik PerformanceAnalyse bild 30
What value of CPI do we use?
1.3? 1.5? 1.7?
Easy: Use average program!
?
Datorteknik PerformanceAnalyse bild 31
There is no such thing!
Datorteknik PerformanceAnalyse bild 32
Artificial “average programs” called “benchmarks”
Are they something to trust?
What about “peak performance values”: MIPS? MFLOPS?
We have a peak at CPI = 1....
...a program of only NO-OPS!
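Why a peak rating says so little can be seen from the standard definition MIPS = clock rate / (CPI × 10^6). The sketch below assumes a hypothetical 100 MHz clock and compares the peak (CPI = 1) with a rating at a made-up average CPI of 2.8.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_rate_hz / (cpi * 1e6)


clock_rate_hz = 100e6  # hypothetical 100 MHz machine

peak = mips(clock_rate_hz, cpi=1.0)       # only reachable by a stream of NOPs
realistic = mips(clock_rate_hz, cpi=2.8)  # illustrative average CPI
print(f"peak {peak:.0f} MIPS, at CPI 2.8 about {realistic:.1f} MIPS")
```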
Datorteknik PerformanceAnalyse bild 33
Why Do Benchmarks?
How we evaluate performance differences
– Across and within a single system (design & variations)
What should benchmarks do?
– Represent a large class of important programs
– Behave like typical programs: improved benchmark performance => improved performance broadly
For better or worse, benchmarks shape a field
Good ones accelerate progress
Bad benchmarks hurt progress
– help real programs vs. sell machines/papers?
– Enhancements that help benchmarks may not help most programs and vice versa
Datorteknik PerformanceAnalyse bild 34
Classes of Benchmarks
(Toy) Benchmarks
– 10-100 lines
– e.g., sieve, puzzle, quicksort
– good first programming assignments
Synthetic Benchmarks
– attempt to match average frequencies of real workloads
– e.g., Whetstone, Dhrystone
– mostly good for nothing: too artificial
Kernels
– Time-critical excerpts of real programs
– e.g., Livermore loops, Linpack
– good for micro-performance studies
Real programs
– e.g., gcc, spice, Verilog, Database, stock trading
Datorteknik PerformanceAnalyse bild 35
Successful Benchmark: SPEC Collection
1987 RISC industry (workstations) mired in “bench marketing”:
– (“That is an 8 MIPS machine, but they claim 10 MIPS!”)
EE Times + 5 companies band together to form the Systems Performance Evaluation Committee (SPEC) in 1988:
– Sun, MIPS, HP, Apollo, DEC
Create standard list of programs, inputs, reporting rules:
– several real programs, including OS calls
– some I/O
– rules for running and reporting
Datorteknik PerformanceAnalyse bild 36
Multiple clock cycle designs:
State machines
Micro programming
chapter 5.4
“VLSI” design
Datorteknik PerformanceAnalyse bild 37
How to reduce T?
T = Nq * CPI * Tc
Reduce quotient cycles / instruction
– reduce “cycles”: multiple clock-cycle design
– increase “instruction”: execute more than one instr. per cycle!
Datorteknik PerformanceAnalyse bild 38
More than one instruction per cycle?
Parallelism
– Div/mult + floating point + integer
Superscalarity
– Multiple issue etc.
Pipelining
– Of general importance