cpe 442 introduction to computer architecture the role of performance

25
CpE442 Lec2.1 Introduction to Computer Architectures CpE 442 Introduction to Computer Architecture The Role of Performance Instructor: H. H. Ammar

Upload: gabriel-bernard

Post on 31-Dec-2015

39 views

Category:

Documents


0 download

DESCRIPTION

CpE 442 Introduction to Computer Architecture The Role of Performance. Instructor: H. H. Ammar. Overview of Today’s Lecture: The Role of Performance. Review from Last Lecture Definition and Measures of Performance Summarizing Performance and Performance Pitfalls. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.1 Introduction to Computer Architectures

CpE 442Introduction to Computer

Architecture

The Role of Performance

Instructor: H. H. Ammar

Page 2: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.2 Introduction to Computer Architectures

Overview of Today’s Lecture: The Role of Performance

° Review from Last Lecture

° Definition and Measures of Performance

° Summarizing Performance and Performance Pitfalls

Page 3: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.3 Introduction to Computer Architectures

Review: What is "Computer Architecture"

° Co-ordination of levels of abstraction

I/O systemInstr. Set Proc.

CompilerOperating

System

Application

Digital Design

Circuit Design

° Under a set of rapidly changing Forces

Instruction Set Architecture

Page 4: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.4 Introduction to Computer Architectures

Review: Levels of Representation

High Level Language Program

Assembly Language Program

Machine Language Program

Control Signal Specification

Compiler

Assembler

Machine Interpretation

temp = v[k];

v[k] = v[k+1];

v[k+1] = temp;

lw $15, 0($2)lw $16, 4($2)sw $16, 0($2)sw $15, 4($2)

0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111

Page 5: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.5 Introduction to Computer Architectures

Review: Levels of Organization

SPARCstation 20

SPARC Processor

Computer

Control

Datapath

Memory Devices

Input

Output

Page 6: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.6 Introduction to Computer Architectures

Review: Summary from Last Lecture

° All computers consist of five components

• Processor: (1) datapath and (2) control

• (3) Memory

• (4) Input devices and (5) Output devices

° Not all “memory” are created equally

• Cache: fast (expensive) memory are placed closer to the processor

• Main memory: less expensive memory--we can have more

° Input and output (I/O) devices has the messiest organization

• Wide range of speed: graphics vs. keyboard

• Wide range of requirements: speed, standard, cost ... etc.

• Least amount of research (so far)

Page 7: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.7 Introduction to Computer Architectures

Processor Performance

Year

P

e

r

f

o

r

m

a

n

c

e

0

20

40

60

80

100

120

1987 1988 1989 1990 1991 1992 1993

DEC AXP

3000

HP 9000/750

IBM

RS6000/540MIPS M2000MIPS M/120Sun-4/260

IBM Power 2/590

1.54X/yr

1.35X/yr

Page 8: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.8 Introduction to Computer Architectures

Metrics of performance

Compiler

Programming Language

Application

DatapathControl

Transistors Wires Pins

ISA

Function Units

(millions) of Instructions per second – MIPS(millions) of (F.P.) operations per second – MFLOP/s

Cycles per second (clock rate)

Megabytes per second

Answers per monthOperations per second

Page 9: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.9 Introduction to Computer Architectures

Relating Processor Metrics

° CPU execution time = CPU clock cycles/pgm X clock cycle time

° or CPU execution time = CPU clock cycles/pgm ÷ clock rate

° CPU clock cycles/pgm = Instructions/pgm X CPI the avg. clock cycles per instruction

° or CPI = CPU clock cycles/pgm ÷ Instructions/pgm

° CPI tells us something about the Instruction Set Architecture, the Implementation of that architecture, and the program measured

Page 10: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.10 Introduction to Computer Architectures

Aspects of CPU Performance

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

instr. count CPI clock rate

Program

Compiler

Instr. Set Arch.

Organization

Technology

Page 11: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.11 Introduction to Computer Architectures

Aspects of CPU Performance

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

instr count CPI clock rate

Program X (x)

Compiler X (x)

Instr. Set. X X

Organization X X

Technology X

Page 12: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.12 Introduction to Computer Architectures

Organizational Trade-offs

Compiler

Programming Language

Application

DatapathControl

Transistors Wires Pins

ISA

Function Units

Instruction Mix

Cycle Time

CPI

Page 13: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.13 Introduction to Computer Architectures

CPI

CPU time = ClockCycleTime *CPI * Ii = 1

n

i i

CPI = CPI * F where F = I i = 1

n

i i i i

Instruction Count

"instruction frequency"

Invest Resources where time is Spent!

CPI = (CPU Time * Clock Rate) / Instruction Count = Clock Cycles / Instruction Count

“Average cycles per instruction”

Page 14: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.14 Introduction to Computer Architectures

Example

Typical Mix

Base Machine (Reg / Reg)

Op Freq(Fi) CPI(i) % Time

ALU 50% 1 .5 33%

Load 20% 2 .4 27%

Store 10% 2 .2 13%

Branch 20% 2 .4 27%

1.5

The CPI = 1.5 cycles per instructionAssignment 1: Turn in the solution of the following problems from the text book By Thursday September 4,Chapter 2, Exercises Section, problems number 2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15

Page 15: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.15 Introduction to Computer Architectures

Enhanced Machine (Reg / Reg)

Op Freq CPI(i) % Time

ALU 50% 1 .5 25%

Load 20% 3 .6 30%

Store 10% 3 .3 15%

Branch20% 3 .6 30%

2.0

Assume a program of 1 million instructions, Compare the performance of

Base Machine (B) with the above CPI, 1 GHZ clock, and

Enhanced Machine (E) with 1.333 GHZ and a one cycle increase for L/S

And branch instructions

Page 16: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.16 Introduction to Computer Architectures

Perf. of machine X = 1 / exec. Time of prog on machine X

Perf. of E / Perf. of B = exec. Time of B / exec. Time of E = 1.5 * 1 / 2 * 0.75

= 1

Performance of B is similar to that of E,

No gain in performance

Page 17: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.17 Introduction to Computer Architectures

Marketing Metrics

MIPS = Instruction Count / (Time * 10^6)

= Clock Rate / (CPI * 10^6)

•machines with different instruction sets ?

•programs with different instruction mixes ?

• dynamic frequency of instructions

• uncorrelated with performance

MFLOP/S= FP Operations / (Time * 10^6)

•machine dependent

•often not where time is spent

Page 18: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.18 Introduction to Computer Architectures

Example showing why MIPS can failCompare performance with Compilers 1 and 2 for a given program on a given machine

Instruction Count in Billion forinstruction classes A B C

Compiler 1 5 1 1Compiler 2 10 1 1clock cycles 1 2 3

Clock cycles using compiler1 = 10 BillionClock cycles using compiler2 = 15 Billion

assuming 1GHZ clockCPU Time 1 = 10 secsCPU Time 2 = 15 secs

yet the MIPS rating isMIPS 1 = (instr. Count/cpu time in sec x 10^6) = 700MIPS 2 = 800

Page 19: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.19 Introduction to Computer Architectures

Why Do Benchmarks?° How we evaluate differences

• Different systems

• Changes to a single system

° Provide a target

• Benchmarks should represent large class of important programs

• Improving benchmark performance should help many programs

° For better or worse, benchmarks shape a field

° Good ones accelerate progress

• good target for development

° Bad benchmarks hurt progress

• help real programs v. sell machines/papers?

• Inventions that help real programs don’t help benchmark

Page 20: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.20 Introduction to Computer Architectures

Programs to Evaluate Processor Performance

° (Toy) Benchmarks

• 10-100 line

• e.g.,: sieve, puzzle, quicksort

° Synthetic Benchmarks

• attempt to match average frequencies of real workloads

• e.g., Whetstone, dhrystone

° Kernels

• Time critical excerpts Real programs

• e.g., gcc, spice

Page 21: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.21 Introduction to Computer Architectures

Successful Benchmark: SPEC

° EE Times + 5 companies band together to perform Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC

° Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O

Page 22: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.22 Introduction to Computer Architectures

SPEC first round

° First round 1989; 10 programs, single number to summarize performance

° One program: 99% of time in single line of code

° New front-end compiler could improve dramatically

Benchmark

SP

EC

Pe

rf

0

100

200

300

400

500

600

700

800

gcc

epresso

spice

doduc

nasa7

li

eqntott

matrix300

fpppp

tomcatv

Page 23: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.23 Introduction to Computer Architectures

SPEC second round, SPEC95

• 8 integer benchmarks in C and 10 floating pt benchmarks in Fortran

Page 24: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.24 Introduction to Computer Architectures

Amdahl's Law

Speedup due to enhancement E:

ExTime w/o E Performance w/ E

Speedup(E) = -------------------- = ---------------------

ExTime w/ E Performance w/o E

Suppose that enhancement E accelerates a fraction F of the task

by a factor S and the remainder of the task is unaffected then,

ExTime(with E) = ((1-F) + F/S) X ExTime(without E)

Speedup(with E) = ExTime(without E) ÷ ((1-F) + F/S) X ExTime(without E)

<= 1/(1-F) speed up is bounded by this factor

Page 25: CpE 442 Introduction to Computer Architecture The Role of Performance

CpE442 Lec2.25 Introduction to Computer Architectures

Performance Evaluation Summary

° Time is the measure of computer performance!

° Good products created when have:

• Good benchmarks

• Good ways to summarize performance

° If not good benchmarks and summary, then choice between improving product for real programs vs. improving product to get more sales=> sales almost always wins

° Remember Amdahl’s Law: Speedup is limited by unimproved part of program

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle