Download - Lecture4 Performance Evaluation
-
7/25/2019 Lecture4 Performance Evaluation
1/34
ELEC2300 Computer Organization
Lecture 4: PerformanceEvaluation
!
Professor George Yuan
!
Office: Rm. 2527!
Email:[email protected]
Note: some of the slides are adapted from Computer Organization and Design.
Copyright 1998 Morgan Kaufmann Publishers and Notes of Prof. Pattersons CS152Class, Copyright 1997 UCB.
-
7/25/2019 Lecture4 Performance Evaluation
2/34
ELEC2300 Computer Organization Fall 2013 Page 2
OUTLINE
!
What is the computer performance?
!How to evaluate the performance?
-
7/25/2019 Lecture4 Performance Evaluation
3/34
ELEC2300 Computer Organization Fall 2013 Page 3
Which of these airplanes has the best performance?
Airplane Passengers Range (mi) Speed (mph)
Boeing 737-100 101 630 598
Boeing 747 470 4150 610BAC/Sud Concorde 132 4000 1350
Douglas DC-8-50 146 8720 544
Time to perform the task (Execution Time)
execution time, response time,latency
Tasks per day, hour, week, sec, ns. ..
throughput, bandwidth
Latency and throughput often are in opposition
4 types of airplanes fly between Hong Kong & Shanghai(distance: D mi.)
S
DL =
CSD
CD
ST !!=!
!
=
11
-
7/25/2019 Lecture4 Performance Evaluation
4/34
ELEC2300 Computer Organization Fall 2013 Page 4
Example
! Execution time of Concorde vs. 747:
"
Concorde is 1350 mph / 610 mph = 2.2 times faster
! Throughput of Concorde vs. 747:
"Boeing is 286700 pmph / 178200 pmph = 1.6 times
faster (470*610=286700, 132*1350=178200)
! Conclusions:
"
Concorde is 2.2 times faster in terms of flying time.
"747 is 1.6 times faster in terms of throughput.
-
7/25/2019 Lecture4 Performance Evaluation
5/34
ELEC2300 Computer Organization Fall 2013 Page 5
Execution Time vs. Throughput
! Execution time
"
How long does it take for my job to run?"
How long does it take to execute a job?
" How long must I wait for the database query?
! Throughput:
"
How many tasks can the machine run at once?" What is the average execution rate?
" How much work is getting done?
!
Computer upgrade:
1.
P3 -> P42.1 P3 -> 2 P3
!We will focus primarily on execution time for asingle job.
-
7/25/2019 Lecture4 Performance Evaluation
6/34
ELEC2300 Computer Organization Fall 2013 Page 6
Definitions
!For computer study,
" X is n times faster than Y" means
Problem:"
machine A runs a program in 20 seconds (1 program/20sec)
"machine B runs the same program in 25 seconds (1program/25 sec)
XX timeexecution
eperformanc1
=
Y
X Ytimeexecutioneperformanceperformancn ==
Xtimeexecution
-
7/25/2019 Lecture4 Performance Evaluation
7/34ELEC2300 Computer Organization Fall 2013 Page 7
!
Elapsed time or response time"count everything (disk and memory accesses, I/O , etc.)
"a useful number, but often not good for comparison purposes
!CPU time
"
Does not count I/O or time spent running other programs"can be broken up into system time, and user time
!Our focus: user CPU time"
time spent executing the lines of code that are "in" our program
"
System CPU time: time the CPU spends executing system
(kernal) code in order to run your program, such as, readingfiles, moving information into and out of virtual memory, etc.
Execution Time
XX timeuser CPU
eperformanc1
=
-
7/25/2019 Lecture4 Performance Evaluation
8/34ELEC2300 Computer Organization Fall 2013 Page 8
CPU Time Measurement: Clock Cycles
!
Instead of reporting execution time in seconds, we often
use cycles
!Processor runs machine instructions based on clockclock cycle time
!clock rate (frequency) = cycles per second (1 Hz. = 1cycle/sec)
A 200 Mhz. clock cycle time is
cycle
seconds
program
cycles
program
seconds!=
time
-
7/25/2019 Lecture4 Performance Evaluation
9/34ELEC2300 Computer Organization Fall 2013 Page 9
Relating the Metrics
!CPU time for a programCPU time = CPU clock cycles * clock cycle time
= CPU clock cycles/clock rate
!
Common ways to improve performance(i.e. shorten CPU execution time):"
Reduce number of required CPU clock cycles for
a program"Shorten clock cycle time (i.e. increase clock rate)
-
7/25/2019 Lecture4 Performance Evaluation
10/34
ELEC2300 Computer Organization Fall 2013 Page 10
Example-Problem
! Description:
"
A program takes 10 seconds to run on a 400 MHz
machine (computer A). We want to design a fastermachine (computer B) that can run the same program
in 6 seconds.
"
The increase in clock rate affects the rest of the CPU
design, causing machine B to require 1.2 times asmany clock cycles as machine A for the program.
! Problem to solve:
"
What clock rate should machine B have?
-
7/25/2019 Lecture4 Performance Evaluation
11/34
ELEC2300 Computer Organization Fall 2013 Page 11
Example - Answer
-
7/25/2019 Lecture4 Performance Evaluation
12/34
ELEC2300 Computer Organization Fall 2013 Page 12
Cycle Number Calculation
!CPU time for a programCPU time = CPU clock cycles * clock cycle time
= CPU clock cycles/clock rate
program
assembly program
machine instructions
ISA
compiler
assembler
compiler Instruction #
clock cycles/instruction (CPI)
Cycle # = Instruction # !CPI
processor
-
7/25/2019 Lecture4 Performance Evaluation
13/34
ELEC2300 Computer Organization Fall 2013 Page 13
Cycles Per Instruction
!Wrong assumption:
"
# of CPU clock cycles in a program = # of instructions in the
program,
!Actual situation
"For some processors, some instructions may take more cycles
than the others:
E.g. multiplication takes more cycles than addition
Floating point operations takes more cycles than integer
operations
Memory access takes more cycles than accessing registers
"
Conclusion: not all instructions require the same # of cycles toexecute.
!Cycle per instructions (CPI) an average number of
clock cycles that each instruction in a program takes to
execute.
-
7/25/2019 Lecture4 Performance Evaluation
14/34
ELEC2300 Computer Organization Fall 2013 Page 14
Cycles Per Instruction (CPI)
!Definition (for a given program):
CPI = (CPU clock cycles)/(instruction count)
!A program has the same instruction count on two
different implementations of the same instruction setarchitecture, but it may have different CPIs (because aninstruction may require different numbers of clock cycleson different implementations). If the number of clockcycles for a program is known, knowing either the
instruction count or the CPI can determine the other.!
CPI provides a measure for comparing implementations.
!
Instruction count can be measured using software toolsor simulators.
-
7/25/2019 Lecture4 Performance Evaluation
15/34
ELEC2300 Computer Organization Fall 2013 Page 15
Cycles Per Instruction
!Let there be n different instruction classes
(with different CPIs). For a given program,
suppose we know:
"
CPIi= CPI for instruction class i
"Ci= # of instruction of class I
!CPU clock cycles = CPI * instruction count. It
can be generalized to
! !
!
= =
=
"=
"=
n
i
n
i
iii
n
iii
CCCPICPIand
CCPIcyclesclockCPU
1 1
1
/)(
)(__
-
7/25/2019 Lecture4 Performance Evaluation
16/34
ELEC2300 Computer Organization Fall 2013 Page 16
!
Suppose we have two implementations of the
same instruction set architecture (ISA)
!For some program, machine A has a clock cycle
time of 1 ns (1 GHz) and a CPI of 2.0. Machine
B has a clock cycle time of 2 ns (500MHz) and aCPI of 1.2. Which machine is faster for this
program, and by how much?
!If two machines have the same ISA which of our
quantities (e.g., clock rate, CPI, execution time, # ofinstructions, MIPS) will always be identical?
CPI Example
-
7/25/2019 Lecture4 Performance Evaluation
17/34
ELEC2300 Computer Organization Fall 2013 Page 17
Example - Solution
-
7/25/2019 Lecture4 Performance Evaluation
18/34
ELEC2300 Computer Organization Fall 2013 Page 18
Relating the metrics
!For a given program X running on a machine A
!
The only complete and reliable measure is CPU executiontime
!Other measures are unreliable. E.g. changing theinstruction set to lower the instruction count may lead to alarger CPI or an organization with a slower clock rate.Either case can offset the improvement in instruction count.
=# of instructions
a program
second
clock
# of clocks
# of instructions* *
= instruction count * CPI * clock cycle time
seconds
program
= instruction count * CPI / clock rate
Time =
-
7/25/2019 Lecture4 Performance Evaluation
19/34
ELEC2300 Computer Organization Fall 2013 Page 19
ExampleComparing Code Segments
! Description
"A particular machine has the following hardware facts:
"For a given C++ statement, a compiler designer considers two
code sequences with the following instruction counts:
!
Problem to solve
"Which code sequence executes the most instructions? Which is
faster? What is the CPI for each sequence?
Instruction class CPI for this instruction class
A 1
B 2
C 3
Code sequenceInstruction counts for instruction classes
A B C
1 2 1 2
2 4 1 1
-
7/25/2019 Lecture4 Performance Evaluation
20/34
ELEC2300 Computer Organization Fall 2013 Page 20
Example - Answer
-
7/25/2019 Lecture4 Performance Evaluation
21/34
ELEC2300 Computer Organization Fall 2013 Page 21
A misleading measure - MIPS
!There are some performance measures that are
famous among computer manufacturers andsellers but are misleading!
!
MIPS (million instructions per second)
(meaningless indication of processor speed)
"MIPS = (instruction count)/(execution time * 106)
"
MIPS depends on
Instruction set (instructions have different capabilities)
Program
"
MIPS can vary inversely with performance
"
Peak
performance
-
7/25/2019 Lecture4 Performance Evaluation
22/34
ELEC2300 Computer Organization Fall 2013 Page 22
Some Processors in MIPSProcessor IPS Year
Motorola 68000 1MIPS @ 8MHz 1979
Intel 386DX 8.5MIPS @ 25MHz 1988Intel 486DX 54MIPS @ 66MHz 1992
PowerPC G2 35MIPS @ 33MHz 1994
Intel Pentium Pro 541MIPS @ 200MHz 1996
ARM 7500FE 35.9MIPS @ 40MHz 1996
PowerPC G3 525MIPS @ 233MHz 1997
Zilog eZ80 80MIPS @ 50MHz 1999
Intel Pentium III 1354MIPS @ 500MHz 1999
AMD Athlon 3561MIPS @ 1.2GHz 2000
Pentium 4 9726MIPS @ 3.2GHz 2003
ARM Cortex A8 2000MIPS @ 1.0GHz 2005
Xbox360 IBM Xenon Triple Core 6400MIPS @ 3.2GHz 2005
AMD Athlon 64 3800+ X2(Dual Core) 14564MIPS @ 2.0GHz 2005
Intel Core2 Extreme QX6700 57063MIPS @ 3.33GHz 2006
-
7/25/2019 Lecture4 Performance Evaluation
23/34
-
7/25/2019 Lecture4 Performance Evaluation
24/34
ELEC2300 Computer Organization Fall 2013 Page 24
! Two different compilers are being tested for a 100 MHz. machine
with three different classes of instructions: Class A, Class B, andClass C, which require one, two, and three cycles (respectively).Both compilers are used to produce code for a large piece ofsoftware.
"The first compiler's code uses 5 million Class A instructions, 1
million Class B instructions, and 1 million Class C instructions."The second compiler's code uses 10 million Class A instructions,1 million Class B instructions, and 1 million Class C instructions.
! What are the execution times for each sequence?
!
What is the MIPS index for this processor based on the two testingsequence?
MIPS example
-
7/25/2019 Lecture4 Performance Evaluation
25/34
ELEC2300 Computer Organization Fall 2013 Page 25
!
Some related terminology:
"
clock, clock cycle, cycle
"
clock cycle time, cycle time (seconds, us, ns)
"clock rate, cycle rate (Hz, MHz)
"CPI (cycles per instruction)
"
MIPS (millions of instructions per second)
!Performance is determined by the execution time
!Execution time calculation:
Summary
= instruction count * CPI * clock cycle time
= instruction count * CPI / clock rate
Execution Time
-
7/25/2019 Lecture4 Performance Evaluation
26/34
ELEC2300 Computer Organization Fall 2013 Page 26
OUTLINE
!
What is the computer performance?
!How to evaluate the performance?
-
7/25/2019 Lecture4 Performance Evaluation
27/34
ELEC2300 Computer Organization Fall 2013 Page 27
! Execution time calculation:
! Benchmark: a set of specially designed programs to test theperformance of a computer
!
Performance best determined by running a real application"Benchmarks are application specific
CPU performance, graphics, high-performance computing, object-oriented computing, Java applications, client-server models, mailsystems, file systems, Web servers.
!
SPEC (System Performance Evaluation Cooperative)
"companies have agreed on a set of real program and inputs
"valuable indicator of computer performanceProcessor (ISA implementation) + compiler
Benchmarks
= instruction count * CPI * clock cycle time= instruction count * CPI / clock rate
Execution Time
-
7/25/2019 Lecture4 Performance Evaluation
28/34
ELEC2300 Computer Organization Fall 2013 Page 28
SPEC 89! Compiler enhancementsand performance
0
100
200
300
400
500
600
700
800
tomcatvfppppmatrix300eqntottlinasa7doducspiceespressogcc
BenchmarkCompiler
Enhanced compiler
SPEC
performa
nceratio
-
7/25/2019 Lecture4 Performance Evaluation
29/34
ELEC2300 Computer Organization Fall 2013 Page 29
!SPEC ratio
"
Reference: Sun Ultra 5_10 with a 300MHzprocessor
!
CINT2000, CFP2000
"Geometric mean of SPEC ratios
SPEC CPU2000
-
7/25/2019 Lecture4 Performance Evaluation
30/34
ELEC2300 Computer Organization Fall 2013 Page 30
SPEC CPU2000 Benchmarks
-
7/25/2019 Lecture4 Performance Evaluation
31/34
ELEC2300 Computer Organization Fall 2013 Page 31
SPEC CPU2000 ratings
-
7/25/2019 Lecture4 Performance Evaluation
32/34
ELEC2300 Computer Organization Fall 2013 Page 32
Execution Time After Improvement =
Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )
Example:
"Suppose a program runs in 100 seconds on a machine, with
multiplication responsible for 80 seconds of this time. How much do wehave to improve the speed of multiplication if we want the program to run
4 times faster?"
How about making the program 5 times faster?
Principle: Make the common case fast
Amdahl's Law
-
7/25/2019 Lecture4 Performance Evaluation
33/34
ELEC2300 Computer Organization Fall 2013 Page 33
! Suppose we enhance a machine making all floating-point instructions
five times faster. If the execution time of some benchmark before thefloating-point enhancement is 10 seconds, what will the speedup be if
half of the 10 seconds is spent executing floating-point instructions?
! We are looking for a benchmark to show off the new floating-point
unit described above, and want the overall benchmark to show aspeedup of 3. One benchmark we are considering runs for 100
seconds with the old floating-point hardware. How much of the
execution time would floating-point instructions have to account for
in this program in order to yield our desired speedup on this
benchmark?
Example
-
7/25/2019 Lecture4 Performance Evaluation
34/34
ELEC2300 C t O i ti F ll 2013 P 34
!Performance is specific to a particular program
"Total execution time is a consistent summary of performance
!For a given architecture performance increases come
from:
" increases in clock rate (without adverse CPI affects)
"
improvements in processor organization that lower CPI
"compiler enhancements that lower CPI and/or instruction count
!
Pitfall: expecting improvement in one aspect of amachines performance to affect the total performance
!You should not always believe everything you read!
Read carefully!
Remember