lecture4 performance evaluation

Upload: long6973

Post on 26-Feb-2018


  • 7/25/2019 Lecture4 Performance Evaluation

    1/34

    ELEC2300 Computer Organization

    Lecture 4: Performance Evaluation

    Professor George Yuan

    Office: Rm. 2527

    Email: [email protected]

    Note: some of the slides are adapted from Computer Organization and Design (Copyright 1998 Morgan Kaufmann Publishers) and from the notes of Prof. Patterson's CS152 class (Copyright 1997 UCB).

  • OUTLINE

    ELEC2300 Computer Organization Fall 2013 Page 2

    - What is computer performance?
    - How do we evaluate performance?

  • Which of these airplanes has the best performance?

    Airplane            Passengers   Range (mi)   Speed (mph)
    Boeing 737-100      101          630          598
    Boeing 747          470          4150         610
    BAC/Sud Concorde    132          4000         1350
    Douglas DC-8-50     146          8720         544

    - Time to perform the task (execution time)
      - execution time, response time, latency
    - Tasks per day, hour, week, sec, ns, ...
      - throughput, bandwidth
    - Latency and throughput are often in opposition.

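    Four types of airplanes fly between Hong Kong and Shanghai (distance: D mi). For an airplane with speed S (mph) and passenger capacity C, the latency L and the throughput T are

      $$L = \frac{D}{S}, \qquad T = \frac{C}{L} = \frac{C \cdot S}{D}$$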

  • Example

    - Execution time of Concorde vs. 747:
      - Concorde is 1350 mph / 610 mph = 2.2 times faster
    - Throughput of Concorde vs. 747 (pmph = passenger-miles per hour):
      - The 747 is 286,700 pmph / 178,200 pmph = 1.6 times faster (470 * 610 = 286,700; 132 * 1350 = 178,200)
    - Conclusions:
      - Concorde is 2.2 times faster in terms of flying time.
      - The 747 is 1.6 times faster in terms of throughput.
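    Both ratios can be reproduced with a short Python sketch. It uses only the passenger and speed figures from the airplane table; the helper name `throughput_pmph` is illustrative and not part of the slides.

```python
# Figures taken from the airplane table (passengers, cruising speed in mph).
planes = {
    "Boeing 747": {"passengers": 470, "speed_mph": 610},
    "BAC/Sud Concorde": {"passengers": 132, "speed_mph": 1350},
}

def throughput_pmph(plane):
    """Throughput in passenger-miles per hour: passengers * speed."""
    return plane["passengers"] * plane["speed_mph"]

concorde = planes["BAC/Sud Concorde"]
b747 = planes["Boeing 747"]

# Flight time is D / speed, so the distance D cancels and the
# flying-time ratio is just the speed ratio.
time_ratio = concorde["speed_mph"] / b747["speed_mph"]
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde is {time_ratio:.1f} times faster in flying time")
print(f"747 is {throughput_ratio:.1f} times faster in throughput")
```

    The same two planes rank in opposite orders depending on the metric, which is the point of the slide.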

  • Execution Time vs. Throughput

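    - Execution time
      - How long does it take for my job to run?
      - How long does it take to execute a job?
      - How long must I wait for the database query?
    - Throughput
      - How many tasks can the machine run at once?
      - What is the average execution rate?
      - How much work is getting done?
    - Computer upgrades:
      1. P3 -> P4
      2. 1 P3 -> 2 P3s
      A faster processor (upgrade 1) shortens execution time and also raises throughput; a second processor (upgrade 2) raises throughput but does not by itself shorten the execution time of a single job.
    - We will focus primarily on execution time for a single job.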

  • Definitions

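    - For computer study, performance is the reciprocal of execution time:

      $$\text{performance}_X = \frac{1}{\text{execution time}_X}$$

    - "X is n times faster than Y" means

      $$n = \frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution time}_Y}{\text{execution time}_X}$$

    - Problem:
      - Machine A runs a program in 20 seconds (1 program/20 sec).
      - Machine B runs the same program in 25 seconds (1 program/25 sec).
      - How many times faster is A than B?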
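    A minimal Python sketch of these two definitions, applied to machines A and B. The function names `performance` and `times_faster` are hypothetical helpers, not from the slides.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """n in "X is n times faster than Y": perf_X / perf_Y = time_Y / time_X."""
    return performance(time_x_s) / performance(time_y_s)

# Machine A (20 s) compared with machine B (25 s) on the same program.
n = times_faster(time_x_s=20.0, time_y_s=25.0)
print(f"A is {n:.2f} times faster than B")
```

    Dividing performances rather than times keeps "bigger is better" consistent with the wording "n times faster".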

  • Execution Time

    - Elapsed time (response time)
      - counts everything (disk and memory accesses, I/O, etc.)
      - a useful number, but often not good for comparison purposes
    - CPU time
      - does not count I/O or time spent running other programs
      - can be broken up into system time and user time
    - Our focus: user CPU time
      - user CPU time: time spent executing the lines of code that are "in" our program
      - system CPU time: time the CPU spends executing system (kernel) code in order to run your program, such as reading files, moving information into and out of virtual memory, etc.

      $$\text{performance}_X = \frac{1}{\text{user CPU time}_X}$$

  • CPU Time Measurement: Clock Cycles

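    - Instead of reporting execution time in seconds, we often use cycles:

      $$\text{time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

    - The processor runs machine instructions in step with a clock; the clock cycle time is the length of one clock period.
    - Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
    - A 200 MHz clock has a clock cycle time of 1 / (200 * 10^6 Hz) = 5 ns.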
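    The cycles-to-seconds conversion can be sketched in a few lines of Python. The helper names are illustrative, and the 10^9-cycle program in the second call is a hypothetical input, not from the slides.

```python
def cycle_time_s(clock_rate_hz):
    """Clock cycle time (seconds per cycle) is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz

def cpu_time_s(cycles, clock_rate_hz):
    """seconds/program = cycles/program * seconds/cycle."""
    return cycles * cycle_time_s(clock_rate_hz)

period = cycle_time_s(200e6)              # the 200 MHz clock from the slide
print(f"{period * 1e9:.1f} ns")           # prints 5.0 ns

# Hypothetical program that needs 10^9 cycles on the same 200 MHz clock.
print(f"{cpu_time_s(1e9, 200e6):.1f} s")  # prints 5.0 s
```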

  • Relating the Metrics

    - CPU time for a program:
      CPU time = CPU clock cycles * clock cycle time
               = CPU clock cycles / clock rate
    - Common ways to improve performance (i.e., shorten CPU execution time):
      - Reduce the number of CPU clock cycles required by the program
      - Shorten the clock cycle time (i.e., increase the clock rate)

  • Example - Problem

    - Description:
      - A program takes 10 seconds to run on a 400 MHz machine (computer A). We want to design a faster machine (computer B) that can run the same program in 6 seconds.
      - The increase in clock rate affects the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the program.
    - Problem to solve:
      - What clock rate should machine B have?

  • Example - Answer
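    One way to work the problem, sketched in Python with only the numbers from the problem statement. The function name and parameter names are illustrative. It applies CPU time = CPU clock cycles / clock rate twice: once to recover machine A's cycle count, and once to solve for machine B's clock rate.

```python
def required_clock_rate_hz(time_a_s, clock_a_hz, cycle_factor, target_time_s):
    cycles_a = time_a_s * clock_a_hz      # CPU clock cycles = CPU time * clock rate
    cycles_b = cycle_factor * cycles_a    # machine B needs 1.2x as many cycles
    return cycles_b / target_time_s       # clock rate = cycles / CPU time

rate_b = required_clock_rate_hz(time_a_s=10.0, clock_a_hz=400e6,
                                cycle_factor=1.2, target_time_s=6.0)
print(f"{rate_b / 1e6:.0f} MHz")          # prints 800 MHz
```

    Machine A needs 10 s * 400 MHz = 4 * 10^9 cycles, so machine B needs 4.8 * 10^9 cycles in 6 s. Note that doubling the clock rate buys less than a 2x speedup here because of the extra cycles.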

  • Cycle Number Calculation

    - CPU time for a program:
      CPU time = CPU clock cycles * clock cycle time
               = CPU clock cycles / clock rate

    [Figure: program -> (compiler) -> assembly program -> (assembler) -> machine instructions (ISA) -> processor. The compiler determines the instruction count (Instruction #); the processor determines the clock cycles per instruction (CPI).]

    Cycle # = Instruction # * CPI

  • Cycles Per Instruction

    - Wrong assumption:
      - # of CPU clock cycles in a program = # of instructions in the program
    - Actual situation:
      - On some processors, some instructions take more cycles than others, e.g.:
        - multiplication takes more cycles than addition
        - floating-point operations take more cycles than integer operations
        - memory accesses take more cycles than register accesses
      - Conclusion: not all instructions require the same # of cycles to execute.
    - Cycles per instruction (CPI): the average number of clock cycles that each instruction in a program takes to execute.

  • Cycles Per Instruction (CPI)

    - Definition (for a given program):
      CPI = (CPU clock cycles) / (instruction count)
    - A program has the same instruction count on two different implementations of the same instruction set architecture, but it may have different CPIs, because an instruction may require different numbers of clock cycles on different implementations. If the number of clock cycles for a program is known, knowing either the instruction count or the CPI determines the other.
    - CPI provides a measure for comparing implementations.
    - Instruction count can be measured using software tools or simulators.

  • Cycles Per Instruction

    - Let there be n different instruction classes (with different CPIs). For a given program, suppose we know:
      - CPI_i = CPI for instruction class i
      - C_i = # of instructions of class i
    - CPU clock cycles = CPI * instruction count. This can be generalized to

      $$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i) \quad \text{and} \quad \text{CPI} = \frac{\sum_{i=1}^{n} (\text{CPI}_i \times C_i)}{\sum_{i=1}^{n} C_i}$$
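    The two sums translate directly into Python. This is a sketch only. The instruction classes and counts in the usage lines are made-up illustrative values, not data from the slides.

```python
def cpu_clock_cycles(cpi_by_class, count_by_class):
    """CPU clock cycles = sum over classes i of CPI_i * C_i."""
    return sum(cpi_by_class[cls] * count for cls, count in count_by_class.items())

def average_cpi(cpi_by_class, count_by_class):
    """CPI = sum(CPI_i * C_i) / sum(C_i): a CPI weighted by instruction count."""
    return cpu_clock_cycles(cpi_by_class, count_by_class) / sum(count_by_class.values())

# Hypothetical instruction mix: class names and numbers are for illustration only.
cpi = {"alu": 1, "load_store": 2, "multiply": 4}
counts = {"alu": 50, "load_store": 30, "multiply": 20}

print(cpu_clock_cycles(cpi, counts), average_cpi(cpi, counts))
```

    The overall CPI is a weighted average, so a rare but slow class (here "multiply") still pulls the average well above the CPI of the common class.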

  • CPI Example

    - Suppose we have two implementations of the same instruction set architecture (ISA).
    - For some program, machine A has a clock cycle time of 1 ns (1 GHz) and a CPI of 2.0. Machine B has a clock cycle time of 2 ns (500 MHz) and a CPI of 1.2. Which machine is faster for this program, and by how much?
    - If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

  • Example - Solution
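    A sketch of the comparison in Python, using Time = instruction count * CPI * clock cycle time. The instruction count `IC` is a hypothetical placeholder. Because both machines implement the same ISA and run the same program, it is the same on both and cancels out of the ratio.

```python
def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """Time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical instruction count; any value gives the same ratio.
IC = 1_000_000

time_a = cpu_time_s(IC, cpi=2.0, cycle_time_s=1e-9)  # machine A: 1 ns cycle, CPI 2.0
time_b = cpu_time_s(IC, cpi=1.2, cycle_time_s=2e-9)  # machine B: 2 ns cycle, CPI 1.2

faster = "A" if time_a < time_b else "B"
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"Machine {faster} is {ratio:.1f} times faster")
```

    Per instruction, A spends 2.0 * 1 ns = 2.0 ns and B spends 1.2 * 2 ns = 2.4 ns. This is why A wins despite its higher CPI.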

  • Relating the metrics

    - For a given program X running on a machine A:

      $$\text{Time} = \frac{\text{seconds}}{\text{program}} = \frac{\#\text{ of instructions}}{\text{program}} \times \frac{\#\text{ of clocks}}{\#\text{ of instructions}} \times \frac{\text{seconds}}{\text{clock}}$$

      Time = instruction count * CPI * clock cycle time
           = instruction count * CPI / clock rate

    - The only complete and reliable measure is CPU execution time.
    - Other measures are unreliable. E.g., changing the instruction set to lower the instruction count may lead to a larger CPI or to an organization with a slower clock rate. Either case can offset the improvement in instruction count.

  • Example: Comparing Code Segments

    - Description:
      - A particular machine has the following hardware facts:

          Instruction class   CPI for this instruction class
          A                   1
          B                   2
          C                   3

      - For a given C++ statement, a compiler designer considers two code sequences with the following instruction counts:

          Code sequence   Instruction counts for instruction classes
                          A     B     C
          1               2     1     2
          2               4     1     1

    - Problem to solve:
      - Which code sequence executes the most instructions? Which is faster? What is the CPI for each sequence?

  • Example - Answer
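    A Python sketch that evaluates both code sequences from the two tables. The CPI values and instruction counts are copied from the slide; the function name `analyze` is illustrative.

```python
CPI = {"A": 1, "B": 2, "C": 3}   # CPI per instruction class (first table)
sequences = {                    # instruction counts per class (second table)
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}

def analyze(counts):
    """Return (instruction count, clock cycles, CPI) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions

for seq, counts in sequences.items():
    instructions, cycles, seq_cpi = analyze(counts)
    print(f"sequence {seq}: {instructions} instructions, {cycles} cycles, CPI = {seq_cpi:.1f}")
```

    Sequence 2 executes more instructions (6 vs. 5) but needs fewer clock cycles (9 vs. 10), so it is faster on the same machine. Its CPI is 1.5, compared with 2.0 for sequence 1.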

  • A misleading measure - MIPS

    - There are some performance measures that are famous among computer manufacturers and sellers but are misleading!
    - MIPS (million instructions per second), a "meaningless indication of processor speed"
      - MIPS = (instruction count) / (execution time * 10^6)
      - MIPS depends on:
        - the instruction set (instructions have different capabilities)
        - the program
      - MIPS can vary inversely with performance
      - Peak performance

  • Some Processors in MIPS

    Processor                            MIPS @ clock rate      Year
    Motorola 68000                       1 MIPS @ 8 MHz         1979
    Intel 386DX                          8.5 MIPS @ 25 MHz      1988
    Intel 486DX                          54 MIPS @ 66 MHz       1992
    PowerPC G2                           35 MIPS @ 33 MHz       1994
    Intel Pentium Pro                    541 MIPS @ 200 MHz     1996
    ARM 7500FE                           35.9 MIPS @ 40 MHz     1996
    PowerPC G3                           525 MIPS @ 233 MHz     1997
    Zilog eZ80                           80 MIPS @ 50 MHz       1999
    Intel Pentium III                    1354 MIPS @ 500 MHz    1999
    AMD Athlon                           3561 MIPS @ 1.2 GHz    2000
    Pentium 4                            9726 MIPS @ 3.2 GHz    2003
    ARM Cortex A8                        2000 MIPS @ 1.0 GHz    2005
    Xbox360 IBM Xenon Triple Core        6400 MIPS @ 3.2 GHz    2005
    AMD Athlon 64 3800+ X2 (Dual Core)   14564 MIPS @ 2.0 GHz   2005
    Intel Core2 Extreme QX6700           57063 MIPS @ 3.33 GHz  2006


  • MIPS example

    - Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.
      - The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
      - The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
    - What are the execution times for each sequence?
    - What is the MIPS rating for this processor based on the two test sequences?
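    A Python sketch that computes both quantities for each compiler's code. It uses the slide's 100 MHz clock, the per-class cycle counts, and the MIPS formula MIPS = instruction count / (execution time * 10^6). The helper name `evaluate` is illustrative.

```python
CLOCK_RATE_HZ = 100e6                 # 100 MHz machine
CPI = {"A": 1, "B": 2, "C": 3}        # cycles per instruction class

compilers = {
    "first compiler": {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000},
    "second compiler": {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000},
}

def evaluate(counts):
    """Return (execution time in seconds, MIPS) for one compiled program."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / (time_s * 1e6)
    return time_s, mips

results = {name: evaluate(counts) for name, counts in compilers.items()}
for name, (time_s, mips) in results.items():
    print(f"{name}: {time_s:.2f} s, {mips:.0f} MIPS")
```

    The first compiler's code takes 0.10 s at 70 MIPS, and the second compiler's takes 0.15 s at 80 MIPS. The slower program earns the higher MIPS rating, which is exactly how "MIPS can vary inversely with performance".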

  • Summary

    - Some related terminology:
      - clock, clock cycle, cycle
      - clock cycle time, cycle time (seconds, µs, ns)
      - clock rate, cycle rate (Hz, MHz)
      - CPI (cycles per instruction)
      - MIPS (millions of instructions per second)
    - Performance is determined by the execution time.
    - Execution time calculation:
      Execution Time = instruction count * CPI * clock cycle time
                     = instruction count * CPI / clock rate

  • OUTLINE

    - What is computer performance?
    - How do we evaluate performance?

  • Benchmarks

    - Execution time calculation:
      Execution Time = instruction count * CPI * clock cycle time
                     = instruction count * CPI / clock rate
    - Benchmark: a set of specially designed programs to test the performance of a computer
    - Performance is best determined by running a real application
      - Benchmarks are application specific: CPU performance, graphics, high-performance computing, object-oriented computing, Java applications, client-server models, mail systems, file systems, Web servers.
    - SPEC (System Performance Evaluation Cooperative)
      - companies have agreed on a set of real programs and inputs
      - a valuable indicator of computer performance: processor (ISA implementation) + compiler

  • SPEC89: Compiler enhancements and performance

    [Figure: bar chart of SPEC performance ratio (axis 0 to 800) for the benchmarks tomcatv, fpppp, matrix300, eqntott, li, nasa7, doduc, spice, espresso, and gcc, comparing the baseline compiler with an enhanced compiler.]

  • SPEC CPU2000

    - SPEC ratio
      - Reference machine: Sun Ultra 5_10 with a 300 MHz processor
    - CINT2000, CFP2000
      - Geometric mean of SPEC ratios
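    Why use a geometric mean? The Python sketch below illustrates this under stated assumptions. It takes a SPEC ratio to be (reference time) / (measured time), and all run times and both reference machines are hypothetical. When two machines are compared through the geometric mean of their SPEC ratios, the outcome does not depend on which reference machine was chosen. An arithmetic mean of the same ratios lacks this property.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    # Assumed definition: reference time / measured time (higher is better).
    return reference_time_s / measured_time_s

def geometric_mean(values):
    # n-th root of the product, computed via logs to avoid overflow.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    return sum(values) / len(values)

def summarize(reference_times, measured_times, mean):
    ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
    return mean(ratios)

# Hypothetical run times (seconds) of two benchmarks on machines X and Y.
machine_x = [10.0, 40.0]
machine_y = [20.0, 10.0]

# Two hypothetical reference machines.
references = [[100.0, 100.0], [200.0, 50.0]]

geo_ratios = []
arith_ratios = []
for ref in references:
    geo_ratios.append(summarize(ref, machine_x, geometric_mean)
                      / summarize(ref, machine_y, geometric_mean))
    arith_ratios.append(summarize(ref, machine_x, arithmetic_mean)
                        / summarize(ref, machine_y, arithmetic_mean))

print("geometric-mean X/Y per reference:", [round(r, 3) for r in geo_ratios])
print("arithmetic-mean X/Y per reference:", [round(r, 3) for r in arith_ratios])
```

    With these made-up numbers, the geometric-mean comparison gives the same X/Y value (about 0.707) for both references. The arithmetic-mean comparison moves from about 0.83 to about 1.42, and it even flips which machine looks faster.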

  • SPEC CPU2000 Benchmarks

  • SPEC CPU2000 ratings

  • Amdahl's Law

    Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)

    - Example:
      - "Suppose a program runs in 100 seconds on a machine, with multiplication responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
      - How about making the program 5 times faster?
    - Principle: Make the common case fast
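    The formula above can be solved for the amount of improvement. This is a minimal Python sketch; the function names are illustrative, and the inputs are the 100 s program and 80 s of multiplication from the example.

```python
def time_after_improvement(unaffected_s, affected_s, improvement):
    """Amdahl's Law: unaffected time + affected time / amount of improvement."""
    return unaffected_s + affected_s / improvement

def required_improvement(total_s, affected_s, target_speedup):
    """Solve Amdahl's Law for the improvement factor; None if the target is unreachable."""
    target_time_s = total_s / target_speedup
    unaffected_s = total_s - affected_s
    remaining_s = target_time_s - unaffected_s  # time budget left for the affected part
    if remaining_s <= 0:
        return None  # even an infinitely fast affected part cannot reach the target
    return affected_s / remaining_s

for speedup in (4, 5):
    n = required_improvement(total_s=100.0, affected_s=80.0, target_speedup=speedup)
    print(f"{speedup}x faster overall:", "impossible" if n is None else f"multiply {n:g}x faster")
```

    For 4x overall, the target is 25 s. The 20 s of non-multiply time stays, so the multiplication time must shrink from 80 s to 5 s, a 16x improvement. For 5x overall, the target of 20 s is already used up by the unaffected part, so no amount of multiplication speedup is enough.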

  • Example

    - Suppose we enhance a machine making all floating-point instructions five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?
    - We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
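    Both questions can be answered with Amdahl's Law. Below is a Python sketch, where F stands for the floating-point time (a name introduced here) and the function names are illustrative. The second function rearranges (total - F) + F / k = total / target speedup to solve for F.

```python
def overall_speedup(total_s, fp_s, fp_speedup):
    """Speedup when only the floating-point portion (fp_s) runs fp_speedup times faster."""
    new_time_s = (total_s - fp_s) + fp_s / fp_speedup
    return total_s / new_time_s

def fp_time_needed(total_s, fp_speedup, target_speedup):
    """Solve (total - F) + F / k = total / target for the floating-point time F."""
    return (total_s - total_s / target_speedup) / (1 - 1 / fp_speedup)

s = overall_speedup(total_s=10.0, fp_s=5.0, fp_speedup=5.0)
f = fp_time_needed(total_s=100.0, fp_speedup=5.0, target_speedup=3.0)
print(f"speedup with half the time in FP: {s:.2f}")
print(f"FP time needed for a speedup of 3: {f:.1f} s of 100 s")
```

    In the first case, the new time is 5 s + 5 s / 5 = 6 s, so the speedup is 10 / 6, or about 1.67. In the second case, floating-point instructions must account for about 83.3 s of the original 100 s.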

  • Remember

    - Performance is specific to a particular program
      - Total execution time is a consistent summary of performance
    - For a given architecture, performance increases come from:
      - increases in clock rate (without adverse CPI effects)
      - improvements in processor organization that lower CPI
      - compiler enhancements that lower CPI and/or instruction count
    - Pitfall: expecting an improvement in one aspect of a machine's performance to improve the total performance by a proportional amount
    - You should not always believe everything you read! Read carefully!