Computer Architecture Chapter 2: The Role of Performance

Upload: gabby

Post on 13-Jan-2016

Page 1: Computer Architecture Chapter 2 The Role of Performance

Lec2.1

Computer Architecture

Chapter 2

The Role of Performance

Page 2: Computer Architecture Chapter 2 The Role of Performance

Lec2.2

° This chapter discusses how to measure, report, and summarize performance and describes the major factors that determine the performance of a computer

° A primary reason for examining performance is that hardware performance is often key to the effectiveness of an entire system of hardware and software

° Assessing the performance of such a system can be quite challenging

° Key to understanding underlying organizational motivation

Why is some hardware better than others for different programs?

What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)

How does the machine's instruction set affect performance?

° For example, to improve the performance of a software system, we may need to understand what factors in the hardware contribute to the overall performance and the relative importance of these factors

Introduction

Page 3: Computer Architecture Chapter 2 The Role of Performance

Lec2.3

Which of these airplanes has the best performance?

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100      101          630          598
Boeing 747          470          4150         610
BAC/Sud Concorde    132          4000         1350
Douglas DC-8-50     146          8720         544

° How much faster is the Concorde compared to the 747?

° How much bigger is the 747 than the Douglas DC-8?

Page 4: Computer Architecture Chapter 2 The Role of Performance

Lec2.4

° Response Time (latency): the time between the start and completion of a task

— How long does it take for my job to run?

— How long does it take to execute a job?

— How long must I wait for the database query?

° Throughput: the total amount of work done in a given time

— How many jobs can the machine run at once?

— What is the average execution rate?

— How much work is getting done?

° If we upgrade a machine with a new processor what do we increase?

If we add a new machine to the lab what do we increase?

Computer Performance

Page 5: Computer Architecture Chapter 2 The Role of Performance

Lec2.5

Two notions of “performance”

Plane               DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747          6.5 hours     610 mph    470          286,700
BAC/Sud Concorde    3 hours       1350 mph   132          178,200

Which has higher performance?

Page 6: Computer Architecture Chapter 2 The Role of Performance

Lec2.6

Example

• Time of Concorde vs. Boeing 747?

• Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)

• Throughput of Concorde vs. Boeing 747?

• Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”

• Boeing is 286,700 pmph / 178,200 pmph = 1.60 “times faster”

• Boeing is 1.6 times (“60%”) faster in terms of throughput

• Concorde is 2.2 times (“120%”) faster in terms of flying time
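The ratios on this slide can be reproduced with a short sketch. It assumes that "pmph" means passenger-miles per hour (passengers times speed), which matches the table's figures; the dictionary layout and function names are illustrative only.

```python
# Figures taken from the "Two notions of performance" table.
planes = {
    "Boeing 747": {"speed_mph": 610, "passengers": 470, "trip_hours": 6.5},
    "Concorde": {"speed_mph": 1350, "passengers": 132, "trip_hours": 3.0},
}


def throughput_pmph(plane):
    """Passenger-miles per hour (assumed meaning of pmph): passengers * speed."""
    return plane["passengers"] * plane["speed_mph"]


b747 = planes["Boeing 747"]
concorde = planes["Concorde"]

# Latency view: how many times faster the Concorde completes one trip.
speed_ratio = concorde["speed_mph"] / b747["speed_mph"]
time_ratio = b747["trip_hours"] / concorde["trip_hours"]

# Throughput view: how many times more passenger-miles per hour the 747 delivers.
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde vs. 747, speed:       {speed_ratio:.2f}x")
print(f"Concorde vs. 747, flying time: {time_ratio:.2f}x")
print(f"747 vs. Concorde, throughput:  {throughput_ratio:.2f}x")
```

Which plane "wins" depends on whether the metric is time per trip or work per hour, which is the same latency versus throughput distinction applied to computers.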

Page 7: Computer Architecture Chapter 2 The Role of Performance

Lec2.7

° Elapsed Time

• counts everything (disk and memory accesses, I/O, etc.)

• a useful number, but often not good for comparison purposes

° CPU time

• doesn't count I/O or time spent running other programs

• can be broken up into system time, and user time

° Our focus: user CPU time

• time spent executing the lines of code that are "in" our program

Execution Time
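The gap between elapsed time and CPU time can be observed directly. This is a rough sketch using Python's standard `time` module; `busy_work` is a made-up stand-in for a program, and `time.sleep` stands in for waiting on I/O. Note that `time.process_time()` reports user plus system CPU time together, so it only approximates the user CPU time the slide focuses on (`os.times()` reports the two separately).

```python
import time


def busy_work(n):
    """CPU-bound loop standing in for the code that is "in" our program."""
    total = 0
    for i in range(n):
        total += i * i
    return total


start_wall = time.perf_counter()  # elapsed (wall-clock) time
start_cpu = time.process_time()   # user + system CPU time of this process

busy_work(200_000)
time.sleep(0.1)  # waiting adds elapsed time but almost no CPU time

elapsed = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

print(f"elapsed time: {elapsed:.3f} s")
print(f"CPU time:     {cpu:.3f} s")
```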

Page 8: Computer Architecture Chapter 2 The Role of Performance

Lec2.8

° For some program running on machine X,

$\text{Performance}_X = 1 / \text{Execution time}_X$

° "X is n times faster than Y"

$\text{Performance}_X / \text{Performance}_Y = n$

° Problem:

• machine A runs a program in 20 seconds

• machine B runs the same program in 25 seconds

Book's Definition of Performance
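Applying the definition to the problem above gives a quick check. The function names are illustrative; only the two execution times come from the slide.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """n such that X is n times faster than Y: Performance_X / Performance_Y."""
    return performance(time_x_s) / performance(time_y_s)


# Machine A: 20 seconds, machine B: 25 seconds.
n = times_faster(20.0, 25.0)
print(f"A is {n:.2f} times faster than B")
```

Because performance is the reciprocal of time, the ratio reduces to 25 / 20 = 1.25.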

Page 9: Computer Architecture Chapter 2 The Role of Performance

Lec2.9

Clock Cycles

° Almost all computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware

° These discrete time intervals are called clock cycles (or clock ticks, clock periods, clocks, cycles)

° Instead of reporting execution time in seconds, we often use cycles

° cycle time = time between ticks = seconds per cycle

° clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec)

A 200 MHz clock has a cycle time of

$\frac{1}{200 \times 10^{6}} \times 10^{9} = 5$ nanoseconds
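The conversion between clock rate and cycle time is a reciprocal plus a unit change. A minimal sketch, with hypothetical helper names:

```python
def cycle_time_ns(clock_rate_hz):
    """Seconds per cycle, converted to nanoseconds."""
    return 1.0 / clock_rate_hz * 1e9


def clock_rate_mhz(cycle_ns):
    """Cycles per second, converted to megahertz."""
    return 1.0 / (cycle_ns * 1e-9) / 1e6


print(f"200 MHz -> {cycle_time_ns(200e6):.1f} ns per cycle")
print(f"5 ns    -> {clock_rate_mhz(5.0):.0f} MHz")
```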

Page 10: Computer Architecture Chapter 2 The Role of Performance

Lec2.10

° A given program will require

• some number of instructions (machine instructions)

• some number of cycles

• some number of seconds

° We have a vocabulary that relates these quantities:

• cycle time (seconds per cycle)

• clock rate (cycles per second)

• CPI (cycles per instruction)

a floating point intensive application might have a higher CPI

• MIPS (millions of instructions per second)

this would be higher for a program using simple instructions

Now that we understand cycles

Page 11: Computer Architecture Chapter 2 The Role of Performance

Lec2.11

CPI

CPI = (CPU Time * Clock Rate) / Instruction Count = Clock Cycles / Instruction Count

“Average cycles per instruction”

$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$

Page 12: Computer Architecture Chapter 2 The Role of Performance

Lec2.12

° Suppose we have two implementations of the same instruction set architecture (ISA).

For some program,

Machine A has a clock cycle time of 10 ns and a CPI of 2.0

Machine B has a clock cycle time of 20 ns and a CPI of 1.2

What machine is faster for this program, and by how much?

° If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

CPI Example
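One way to work the comparison: CPU time = instruction count × CPI × cycle time. Both machines implement the same ISA and run the same program, so the instruction count is the same on both and cancels out of the ratio. The count used below is an arbitrary assumption, not a value from the slide.

```python
def cpu_time_ns(instruction_count, cpi, cycle_ns):
    """CPU time = instructions * cycles per instruction * time per cycle."""
    return instruction_count * cpi * cycle_ns


INSTRUCTIONS = 1_000_000  # assumed value; any count gives the same ratio

time_a = cpu_time_ns(INSTRUCTIONS, cpi=2.0, cycle_ns=10.0)
time_b = cpu_time_ns(INSTRUCTIONS, cpi=1.2, cycle_ns=20.0)

speedup = time_b / time_a
print(f"A: {time_a / INSTRUCTIONS:.0f} ns per instruction")
print(f"B: {time_b / INSTRUCTIONS:.0f} ns per instruction")
print(f"A is {speedup:.1f} times faster than B")
```

A spends 20 ns per instruction and B spends 24 ns, so A is faster despite its higher CPI.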

Page 13: Computer Architecture Chapter 2 The Role of Performance

Lec2.13

° A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).

The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.

The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

Which sequence will be faster? How much?

What is the CPI for each sequence?

# of Instructions Example
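The example can be checked by summing cycles per class, following the CPU clock cycles formula from the CPI slide. The dictionary layout and function names are illustrative.

```python
# Cycles required by each instruction class, as given in the problem.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """CPU clock cycles = sum over classes of (CPI_i * C_i)."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def cpi(counts):
    """Average cycles per instruction for the whole sequence."""
    return total_cycles(counts) / sum(counts.values())


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("sequence 1", seq1), ("sequence 2", seq2)):
    print(f"{name}: {total_cycles(seq)} cycles, CPI = {cpi(seq):.2f}")

print(f"sequence 2 is {total_cycles(seq1) / total_cycles(seq2):.2f} times faster")
```

The second sequence executes more instructions but fewer cycles (9 versus 10), so instruction count alone does not predict execution time.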

Page 14: Computer Architecture Chapter 2 The Role of Performance

Lec2.14

° $\text{MIPS} = \text{Instruction count} / (\text{Execution time} \times 10^{6})$

° Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.

The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.

The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.

° Which sequence will be faster according to MIPS?

° Which sequence will be faster according to execution time?

MIPS example
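A sketch of the two calculations, using only the numbers stated in the problem. The variable and function names are illustrative.

```python
CLOCK_RATE_HZ = 100e6  # 100 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def execution_time_s(counts):
    """Total cycles divided by the clock rate."""
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (execution time * 10^6)."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


compiler1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

for name, code in (("compiler 1", compiler1), ("compiler 2", compiler2)):
    print(f"{name}: {execution_time_s(code):.2f} s, {mips(code):.0f} MIPS")
```

The second compiler's code earns the higher MIPS rating (80 versus 70) yet takes longer to run (0.15 s versus 0.10 s), which is why MIPS can mislead when instruction counts differ.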

Page 15: Computer Architecture Chapter 2 The Role of Performance

Lec2.15

Metrics of performance

System levels:

Application

Programming Language

Compiler

ISA

Datapath, Control

Function Units

Transistors, Wires, Pins

Metrics:

Answers per month

Useful Operations per second

(millions) of Instructions per second – MIPS

(millions) of (F.P.) operations per second – MFLOP/s

Megabytes per second

Cycles per second (clock rate)

Page 16: Computer Architecture Chapter 2 The Role of Performance

Lec2.16

° Performance best determined by running a real application

• Use programs typical of expected workload

• Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.

° Small benchmarks

• nice for architects and designers

• easy to standardize

• can be abused

° SPEC (System Performance Evaluation Cooperative)

• companies have agreed on a set of real programs and inputs

• can still be abused (Intel’s “other” bug)

• valuable indicator of performance (and compiler technology)

Benchmarks

Page 17: Computer Architecture Chapter 2 The Role of Performance

Lec2.17

SPEC ‘89

° Compiler “enhancements” and performance

[Figure: bar chart of SPEC performance ratio (axis 0 to 800) by benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing Compiler and Enhanced compiler]

Page 18: Computer Architecture Chapter 2 The Role of Performance

Lec2.18

SPEC95

° Eighteen application benchmarks (with inputs) reflecting a technical computing workload

° Eight integer

• go, m88ksim, gcc, compress, li, ijpeg, perl, vortex

° Ten floating-point intensive

• tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5

Page 19: Computer Architecture Chapter 2 The Role of Performance

Lec2.19

SPEC ‘95

Benchmark   Description

go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The Gnu C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program

tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation

Page 20: Computer Architecture Chapter 2 The Role of Performance

Lec2.20

SPEC ‘95

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

[Figure: two plots, SPECint and SPECfp (axis 0 to 10), against clock rate (50 to 250 MHz) for the Pentium and Pentium Pro]

Page 21: Computer Architecture Chapter 2 The Role of Performance

Lec2.21

Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)

° Example:

"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"

How about making it 5 times faster?

Amdahl's Law
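The example can be worked by solving the formula above for the amount of improvement. This is a sketch with hypothetical function names; returning `None` for an unreachable target is a design choice made here, not something the slide specifies.

```python
def time_after_improvement(unaffected_s, affected_s, improvement):
    """Execution time after improvement, per the formula on this slide."""
    return unaffected_s + affected_s / improvement


def required_improvement(total_s, affected_s, target_speedup):
    """Improvement factor needed on the affected part, or None if unreachable."""
    unaffected_s = total_s - affected_s
    target_s = total_s / target_speedup
    remaining_s = target_s - unaffected_s  # time budget left for the affected part
    if remaining_s <= 0:
        return None  # the unaffected part alone already uses up the target time
    return affected_s / remaining_s


# 100-second program, multiply accounts for 80 seconds.
for speedup in (4, 5):
    factor = required_improvement(100.0, 80.0, speedup)
    if factor is None:
        print(f"{speedup}x overall: not achievable by speeding up multiply alone")
    else:
        print(f"{speedup}x overall: multiply must be {factor:.0f} times faster")
```

Running 4 times faster requires multiplication to be 16 times faster. Running 5 times faster would need the 80 seconds of multiply time to shrink to zero, because the other 20 seconds are untouched, so no finite improvement reaches it.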