pipelining - computer science · b649: parallel architectures and programming, spring 2009 why...

86
PIPELINING B649 Parallel Architectures and Programming

Upload: vocong

Post on 31-Jul-2019

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

PIPELINING

B649Parallel Architectures and Programming

Page 2: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Announcements

• Blogs• Presentation topics and teams

★ already taken:✴ ISA (App. J)✴ Hardware and software for VLIW and EPIC (App. G)✴ Large Scale Multiprocessors and Scientific Applications (App. H)

• Heads-up★ Blog question★ Assignment 1

2

Page 3: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Why Pipelining?

• Instruction-Level Parallelism (ILP)• Reducing Cycles Per Instruction (CPI)

★ if instructions may take multiple cycles

• Decreasing the clock cycle time★ if each instruction takes one (long) cycle

• Invisible to the programmer

3

u = Time per instruction on unpipelined machinen = Number of pipelined stagestime per pipelined instruction = u / n

Page 4: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Basics of a RISC Instruction Set

• All operations on data registers• Only load and store access memory• All instructions are of one (fixed) size• MIPS64 (64-bit) instructions for example

4

Page 5: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Instruction Set Overview

• ALU instructions★ R1 ← R2 op R3

✴ R1, R2, R3: registers★ R1 ← R2 op I

✴ I: signed extended 16-bit immediate

• Load and store instructions★ LD R1:O, R2

✴ R1, R2: registers, O: 16-bit signed extended 16-bit immediate

• Branch and jump instructions★ comparison between two registers, or register and zero★ no unconditional jump

5

Page 6: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Digression: Recall Multiplexer

6

Page 7: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

• Positive Number: extend with zeroes

• Negative number: extend with ones

Digression: Sign Extension

7

x (16 bits) =

x (32 bits) =

0 x

0 000000000000000 x

-x (16 bits) =

-x (32 bits) =

1 x

1 111111111111111 x

Page 8: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

• Positive Number: extend with zeroes

• Negative number: extend with ones

Digression: Sign Extension

7

x (16 bits) =

x (32 bits) =

0 x

0 000000000000000 x

-x (16 bits) =

-x (32 bits) =

1 x

1 111111111111111 x

-x (16 bits) = (2¹⁶−x−1)-x (extended) = (2¹⁶−x−1) + 2¹⁶(2¹⁶−1) = (2³²−x−1) = -x (32 bits)

Page 9: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

8

Page 10: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle★ fetch current instruction from PC, add 4 to PC

• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

9

Page 11: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle

★ decode instruction, read registers (fixed field decoding)★ do equality test on registers for possible branch★ sign extend offset field, in case it is needed★ add offset to possible branch target address

• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

10

Page 12: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle

★ memory reference: base address+offset to compute effective address

★ register-register ALU instruction: perform the operation★ register-immediate ALU instruction: perform the operation

• MEM: Memory access cycle• WB: Write-back cycle

11

Page 13: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle

★ read or write based on effective address computed in last cycle

• WB: Write-back cycle

12

Page 14: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

★ register-register ALU instruction or Load: write result (computed or loaded from memory) into register file

13

Page 15: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Implementation

• IF: Instruction fetch cycle• ID: Instruction decode / register fetch cycle• EX: Execute / effective address cycle• MEM: Memory access cycle• WB: Write-back cycle

14

branch = 2 cyclesstore = 4 cyclesall else = 5 cyclesCPI = 4.54, assuming 12% branches, 10% stores

Page 16: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple Pipelined Implementation

15

Page 17: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Some Considerations

16

• Resource evaluation★ avoid resource conflicts across stages

• Separate instruction and data memories★ typically, with separate I and D caches

• Register access★ write in first half, read in second half

• PC not shown★ also need an adder to compute branch target★ branch does not change PC until ID (second) stage

✴ ignore for now!

Page 18: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Prevent Interference

17

Page 19: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Observations

18

• Each instruction takes the same number of cycles• Instruction throughput increases

★ hence programs run faster

• Imbalance among pipeline stages reduces performance

• Overheads★ pipeline delays (register setup time)★ clock skew (clock cycle ≥ clock skew + latch overhead)

• Hazards ahead!

Page 20: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Pipeline Hazards

• Structural hazards★ not all instruction combinations possible in parallel

• Data hazards★ data dependence

• Control hazards★ control dependence

19

Page 21: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Pipeline Hazards

• Structural hazards★ not all instruction combinations possible in parallel

• Data hazards★ data dependence

• Control hazards★ control dependence

19

Hazards make it necessary to stall the pipeline

Page 22: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Quantifying the Stall Cost

20

Average instruction time unpipelinedSpeedup = Average instruction time pipelined

CPI unpipelined × Clock cycle unpipelined = CPI pipelined × Clock cycle pipelined

CPI pipelined = Ideal CPI + Pipeline stall cycles per instruction = 1 + Pipeline stall cycles per instruction

Ignoring pipeline overheads, assuming balanced stages,

Clock cycle unpipelined = Clock cycle pipelined

CPI Unpipelined (≈ Pipeline depth)Speedup = 1 + Pipeline stall cycles per instruction

Page 23: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

STRUCTURAL HAZARDS

Page 24: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Structural Hazard Example: Mem. Port Conflict

22

Page 25: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

DATA HAZARDS

Page 26: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Data Hazard Types

• RAW: Read After Write★ true dependence

• WAR: Write After Read★ anti-dependence

• WAR: Write After Write★ output dependence

• RAR: Read After Read★ input dependence

24

Page 27: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Data Hazard Types

• RAW: Read After Write★ true dependence

• WAR: Write After Read★ anti-dependence

• WAR: Write After Write★ output dependence

• RAR: Read After Read★ input dependence

24

Page 28: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Data Hazard Example

25

Page 29: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Ameliorating Data Hazards

26

• Idea:★ ALU results from EX/MEM and MEM/WB registers fed

back to ALU inputs★ if previous ALU operation wrote the register needed by the

current operation, select the forwarded result

Page 30: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Forwarding

27

Page 31: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Ameliorating Data Hazards

28

• Idea:★ ALU results from EX/MEM and MEM/WB registers fed

back to ALU inputs★ if previous ALU operation wrote the register needed by the

current operation, select the forwarded result

• Observations:★ forwarding needed across multiple cycles (how many?)★ forwarding may be implemented across functional units

✴ e.g., output of one unit may be forwarded to input of another, rather than the input of just the same unit

Page 32: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Forwarding Across Multiple Units

29

Page 33: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Not All Stalls Avoided

30

Page 34: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

CONTROL HAZARDS

Page 35: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Handling Branch Hazard

32

Page 36: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Reducing Branch Penalty

33

• “Freeze” or “flush” the pipeline• Treat every branch as not-taken (“predicted-untaken”)

★ need to handle taken branches by roll-back

• Treat every branch as taken (“predicted-taken”)• Delayed branch

Page 37: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Freezing Pipeline

34

Page 38: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Delayed Branch

35

branch instructionsequential successor_1branch target if taken

Page 39: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Behavior of Delayed Branch

36

Page 40: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Schedule of Branch Delay Slot

37

Page 41: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Schedule of Branch Delay Slot

37

Page 42: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

HOW IS PIPELINING IMPLEMENTED?

Page 43: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation

• Instruction fetch cycle (IF)• Instruction decode/register fetch cycle (ID)• Execution / effective address cycle (EX)• Memory access / branch completion cycle (MEM)• Write-back cycle (WB)

39

Page 44: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: IF(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Fetch

40

IR ← Mem[PC];NPC ← PC + 4;

Page 45: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Decode

41

A ← Regs[rs];B ← Regs[rt];Imm ← sign-extended immediate field of IR;

Page 46: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Execution★ Memory reference

★ Register-Register ALU instruction

★ Register-Immediate ALU instruction

★ Branch

42

ALUOutput ← A + Imm;

ALUOutput ← A func B;

ALUOutput ← A op Imm;

ALUOutput ← NPC + (Imm << 2);Cond ← (A == 0);

Page 47: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Memory access / branch completion★ Memory reference

★ Branch

43

LMD ← Mem[ALUOutput] orMem[ALUOutput] ← B;

if (cond) PC ← ALUOutput;

Page 48: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Simple MIPS Implementation: ID(IF ➙ ID ➙ EX ➙ MEM ➙ WB)

• Write-back★ Register-Register ALU instruction

★ Register-Immediate ALU instruction

★ Load instruction

44

Regs[rd] ← ALUOutput;

Regs[rt] ← ALUOutput;

Regs[rt] ← LMD;

Page 49: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

MIPS Data Path

45

Page 50: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

MIPS Data Path: Pipelined

46

Page 51: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Situations for Data Hazard

47

Page 52: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Logic to Detect Data Hazards

48

Page 53: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Forwarding

49

Page 54: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Forwarding

49

Page 55: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Forwarding Implemented

50

Page 56: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Reducing the Branch Delay

51

Page 57: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

EXCEPTIONS

Page 58: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Types of Exceptions

• I/O device request• Invoking an OS service• Tracing• Breakpoint• Integer arithmetic overflow• FP arithmetic anomaly• Page fault• Misaligned memory access• Memory protection violation• Undefined or unimplemented instruction• Hardware malfunction• Power failure

53

Page 59: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Exception Categories

• Synchronous vs Asynchronous• User requested vs coerced• User maskable vs nonmaskable• Within vs between instructions• Resume vs terminate

54

Page 60: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Exception Categorization

55

Page 61: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Handling Exceptions

56

• Force a “trap” into the pipeline on the next IF• Turn of all writes until the trap is taken off

★ place zeros into pipeline latches of instructions following the one causing exception

• Exception handler saves PC and returns to it• Delayed branches cause a problem

★ need to save delay slots plus one number of PCs

Page 62: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

(Precise) Exceptions in MIPS

57

Page 63: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

(Precise) Exceptions in MIPS

57

Page 64: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

(Precise) Exceptions in MIPS

57

Use exception status vector for each instruction

Page 65: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Complications due to Complex Instructions

58

• Instructions that change processor states before they are committed★ autoincrement★ string copy (change memory state)

• State bits★ implicitly condition codes (flags)

✴ instructions setting condition codes not allowed in delay slots

• Multicycle operations★ use of microinstructions

Page 66: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

ADDING FLOATING POINT SUPPORT

Page 67: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Functional Units

• Main integer unit★ handles loads, stores, integer ALU operations, branches

• FP and integer multiplier• FP adder

★ handles FP add, subtract, and conversion

• FP and integer divider

60

Page 68: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline

61

Page 69: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline: Expanded View

62

Page 70: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline: Expanded View

62

Page 71: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Extended Pipeline: Expanded View

62

Page 72: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Additional Complications

63

• Structural hazard because divide unit is not pipelined

• Number of register writes in a cycle may more than one

• WAW hazards possible★ WAR hazards not possible

• Maintaining precise exceptions★ out-of-order completion

• More number of stalls due to RAW hazards, due to longer pipeline

Page 73: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Example: RAW Hazard

64

Page 74: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Example 2

65

Page 75: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Handling the Problems

66

• Register file conflict detection★ detect at ID stage★ detect at MEM or WB stage

Page 76: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Handling the Problems

67

• Register file conflict detection★ detect at ID stage, or★ detect at MEM or WB stage

• WAW hazard detection★ delay the issue of second instruction, or★ stamp out the result of the first instruction (convert it into

noop)

Page 77: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Summary of Checks

• Check for structural hazards• Check for RAW data hazard• Check for WAW data hazard

68

Page 78: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Maintaing Precise Exceptions

• Ignore the problem★ settle for imprecise exceptions

• Buffer the results★ simple buffer could be very big★ history file★ future file

• Let trap handling routines clean up• Hybrid scheme: allow issue when earlier instructions can

no longer raise exception

69

Page 79: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

DYNAMIC SCHEDULING

Page 80: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Idea Behind Dynamic Scheduling

• Split ID stage:★ Issue -- decode, check for structural hazards★ Read operands -- wait until no data hazards, then read

• Other stages remain as before

71

Page 81: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Idea Behind Dynamic Scheduling

• Split ID stage:★ Issue -- decode, check for structural hazards★ Read operands -- wait until no data hazards, then read

• Other stages remain as before

71

DIV.D F0, F2, F4ADD.D F10,F0,F8SUB.D F8,F8,F14

Must take care of WAR and WAW hazards.

Page 82: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding (CDC 6600)

72

Page 83: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding (CDC 6600)

73

• Issue★ check for free functional units★ check if another instruction has the same destination

register (WAW hazard detection)

• Read operands★ monitor availability of source operands (RAW hazard

detection)

• Execution★ functional unit notifies scoreboard of completion

• Write result★ check for WAR hazards, stall write if necessary

Page 84: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding: Effectiveness

• Relative easy to implement the logic★ only as much as a functional unit★ but four times as many buses as without scoreboard

• Reduces Clocks Per Instruction (CPI)★ tries to make use of the available Instruction Level

Parallelism (ILP)★ 1.7x speedup for Fortran, 2.5x for hand-coded assembly

74

Page 85: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Scoreboarding: Limitations

• Overlapping instructions must be picked from a single basic block

• Window: number of scoreboard entries• Number and types of functional units• Presence of anti- and output-dependences

75

Page 86: PIPELINING - Computer Science · B649: Parallel Architectures and Programming, Spring 2009 Why Pipelining? •Instruction-Level Parallelism (ILP) •Reducing Cycles Per Instruction

B649: Parallel Architectures and Programming, Spring 2009

Pitfalls

• Unexpected execution sequences may cause unexpected hazards

• Extensive pipelining can impact other aspects of a design

• Evaluating dynamic or static scheduling on the basis of unoptimized code

76

BNEZ R1, foo DIV.D F0,F2,F4 ... ...foo: L.D F0,qrs