cpe 442 hazards.1 introduction to computer architecture cpe 442 designing a pipeline processor...

41
CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

Upload: ariel-hancock

Post on 18-Jan-2018

224 views

Category:

Documents


0 download

DESCRIPTION

CPE 442 hazards.3 Introduction to Computer Architecture Review: Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IfetchRegExecMemWr Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 LoadIfetchRegExecMemWr IfetchRegExecMem LoadStore Pipeline Implementation: IfetchRegExecMemWrStore Clk Single Cycle Implementation: LoadStoreWaste Ifetch R-type IfetchRegExecMemWrR-type Cycle 1Cycle 2

TRANSCRIPT

Page 1: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.1 Introduction to Computer Architecture

CpE 442

Designing a Pipeline Processor (lect. II)

Page 2: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.2 Introduction to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to Hazards (15 minutes)

° Forwarding (25 minutes)

° 1 cycle Load Delay (5 minutes)

° 1 cycle Branch Delay (15 minutes)

° What makes pipelining hard

° Summary (5 minutes)

Page 3: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.3 Introduction to Computer Architecture

Review: Single Cycle, Multiple Cycle, vs. Pipeline

Clk

Cycle 1

Multiple Cycle Implementation:

Ifetch Reg Exec Mem Wr

Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Load Ifetch Reg Exec Mem Wr

Ifetch Reg Exec MemLoad Store

Pipeline Implementation:

Ifetch Reg Exec Mem WrStore

Clk

Single Cycle Implementation:

Load Store Waste

IfetchR-type

Ifetch Reg Exec Mem WrR-type

Cycle 1 Cycle 2

Page 4: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.4 Introduction to Computer Architecture

Review: A Pipelined Datapath

IF/ID R

egister

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

PC

DataMem

WADi

RA Do

IUnit

A

I

RFileDi

Ra

Rb

Rw

MemWr

RegWr ExtOp

ExecUnit

busAbusB

Imm16

ALUOp

ALUSrc

Mux

1

0

MemtoReg

1

0

RegDst

Rt

Rd

Imm16

PC+4 PC+4

Rs

Rt

PC+4

Zero

Branch

10

Clk

Ifetch Reg/Dec Exec Mem Wr

Page 5: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.5 Introduction to Computer Architecture

Review: Pipeline Control “Data Stationary Control”

° The Main Control generates the control signals during Reg/Dec• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later• Control signals for Mem (MemWr Branch) are used 2 cycles later• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

IF/ID R

egister

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

Reg/Dec Exec Mem

ExtOp

ALUOpRegDst

ALUSrc

BranchMemWr

MemtoRegRegWr

MainControl

ExtOp

ALUOpRegDst

ALUSrc

MemtoRegRegWr

MemtoRegRegWr

MemtoRegRegWr

BranchMemWr

BranchMemWr

Wr

Page 6: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.6 Introduction to Computer Architecture

Review: Pipeline Summary

° Pipeline Processor:• Natural enhancement of the multiple clock cycle processor• Each functional unit can only be used once per instruction• If a instruction is going to use a functional unit:

- it must use it at the same stage as all other instructions• Pipeline Control:

- Each stage’s control signal depends ONLY on the instruction that is currently in that stage

Page 7: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.7 Introduction to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to Hazards

° Forwarding (25 minutes)

° 1 cycle Load Delay (5 minutes)

° 1 cycle Branch Delay (15 minutes)

° What makes pipelining hard

° Summary (5 minutes)

Page 8: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.8 Introduction to Computer Architecture

Introduction to Hazards° Limits to pipelining: Hazards prevent next

instruction from executing during its designated clock cycle

• structural hazards: HW cannot support this combination of instructions

• data hazards: instruction depends on result of prior instruction still in the pipeline

• control hazards: pipelining of branches & other instructionsCommon solution is to stall the pipeline until the hazardbubbles” in the pipeline

Page 9: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.9 Introduction to Computer Architecture

Mem

A Single Memory is a Structural Hazard

Instr.

Order

Time (clock cycles)

Load

Instr 1

Instr 2

Instr 3

Instr 4A

LUMem Reg Mem Reg

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

AL

UReg Mem Reg

AL

UMem Reg Mem Reg

Page 10: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.10 Introduction to Computer Architecture

Option 1: Stall to resolve Memory Structural Hazard

Instr.

Order

Time (clock cycles)

Load

Instr 1

Instr 2

Instr 3(stall)

Instr 4A

LUMem Reg Mem Reg

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

bubble

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

Page 11: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.11 Introduction to Computer Architecture

Option 2: Duplicate to Resolve Structural Hazard

Instr.

Order

Time (clock cycles)

Load

Instr 1

Instr 2

Instr 3

Instr 4A

LUIm Reg Dm Reg

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

• Separate Instruction Cache (Im) & Data Cache (Dm)

Page 12: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.12 Introduction to Computer Architecture

Data Hazard on r1

add r1 ,r2,r3

sub r4, r1 ,r3

and r6, r1 ,r7

or r8, r1 ,r9

xor r10, r1 ,r11

Page 13: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.13 Introduction to Computer Architecture

Data Hazard on r1: (Figure 6.30, page 397, P&H)

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

• Dependencies backwards in time are hazards

Page 14: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.14 Introduction to Computer Architecture

sub r4, r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

Option1: HW Stalls to Resolve Data Hazard

Instr.

Order

Time (clock cycles)

add r1,r2,r3IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

• Dependencies backwards in time are hazardsA

LUIm Reg Dm

Im bubble bubble bubble

AL

UReg Dm Reg

AL

UIm Reg

Im Reg

Page 15: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.15 Introduction to Computer Architecture

But recall use of “Data Stationary Control”

° The Main Control generates the control signals during Reg/Dec• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later• Control signals for Mem (MemWr Branch) are used 2 cycles later• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

IF/ID R

egister

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

Reg/Dec Exec Mem

ExtOp

ALUOpRegDst

ALUSrc

BranchMemWr

MemtoRegRegWr

MainControl

ExtOp

ALUOpRegDst

ALUSrc

MemtoRegRegWr

MemtoRegRegWr

MemtoRegRegWr

BranchMemWr

BranchMemWr

Wr

Page 16: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.16 Introduction to Computer Architecture

Option 1: How HW really stalls pipeline

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

• HW doesn’t change PC => keeps fetching same instruction & sets control signals to to benign values (0)

stall

stall

stall

AL

UIm Reg Dm

bubble bubble bubble bubbleIm

bubble bubble bubble bubbleIm

bubble bubble bubble bubbleIm

Page 17: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.17 Introduction to Computer Architecture

Option 2: SW inserts indepdendent instructions

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

• Worst case inserts NOP instructions

nop

nop

nop

AL

UIm Reg Dm

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

Page 18: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.18 Introduction to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to Hazards (15 minutes)

° Forwarding

° 1 cycle Load Delay (5 minutes)

° 1 cycle Branch Delay (15 minutes)

° What makes pipelining hard

° Summary (5 minutes)

Page 19: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.19 Introduction to Computer Architecture

Option 3 Insight: Data is available! (Figure 6.35, page 415, P&H)

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

• Pipeline registers already contain needed data

Page 20: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.20 Introduction to Computer Architecture

HW Change for “Forwarding” (Bypassing):

• Increase multiplexors to add paths from pipeline registers• Assumes register read during write gets new value (otherwise more results to be forwarded)

Page 21: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.21 Introduction to Computer Architecture

Complete data Path with Hazard detection and ForwardingFigure 6.41 in the text

Page 22: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.22 Introduction to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to Hazards (15 minutes)

° Forwarding (25 minutes)

° 1 cycle Load Delay

° 1 cycle Branch Delay (15 minutes)

° What makes pipelining hard

° Summary (5 minutes)

Page 23: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.23 Introduction to Computer Architecture

From Last Lecture: The Delay Load Phenomenon

° Although Load is fetched during Cycle 1:• The data is NOT written into the Reg File until the

end of Cycle 5• We cannot read this value from the Reg File until

Cycle 6• 3-instruction delay before the load take effect

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Ifetch Reg/Dec Exec Mem WrI0: Load

Ifetch Reg/Dec Exec Mem WrPlus 1

Ifetch Reg/Dec Exec Mem WrPlus 2

Ifetch Reg/Dec Exec Mem WrPlus 3

Ifetch Reg/Dec Exec Mem WrPlus 4

Page 24: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.24 Introduction to Computer Architecture

Forwarding reduces Data Hazard to 1 cycle: (Figure 6.47, page 420 P&H)

Instr.

Order

Time (clock cycles)

lw r1, 0(r2)

sub r4,r1,r6

and r6,r1,r7

or r8,r1,r9

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

Page 25: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.25 Introduction to Computer Architecture

Option1: HW Stalls to Resolve Data Hazard

Instr.

Order

Time (clock cycles)

lw r1, 0(r2)

sub r4,r1,r3

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

• “Interlock”: checks for hazard & stalls

stall bubble bubble bubble bubbleIm

and r6,r1,r7

or r8,r1,r9

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

Im

AL

UReg Dm Reg

Page 26: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.26 Introduction to Computer Architecture

Option 2: SW inserts independent instructions

Instr.

Order

Time (clock cycles)

lw r1, 0(r2)

sub r4,r1,r3

IF

ID/RF

EX MEM WB

• Worst case inserts NOP instructions• MIPS I solution: No HW checking

nop

and r6,r1,r7

or r8,r1,r9

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

Page 27: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.27 Introduction to Computer Architecture

Try producing fast code fora = b + c;d = e – f;

assuming a, b, c, d ,e, and f in memory. Slow code:

LW Rb,bLW Rc,cADD Ra,Rb,RcSW a,Ra LW Re,e LW Rf,fSUB Rd,Re,RfSW d,Rd

Software Scheduling to Avoid Load Hazards

Page 28: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.29 Introduction to Computer Architecture

Slow code:LW Rb,bLW Rc,cADD Ra,Rb,RcSW a,Ra LW Re,e LW Rf,fSUB Rd,Re,RfSW d,Rd

Fast code:LW Rb,bLW Rc,cLW Re,e ADD Ra,Rb,RcLW Rf,fSW a,Ra SUB Rd,Re,RfSW d,Rd

Page 29: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.30 Introduction to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to Hazards (15 minutes)

° Forwarding (25 minutes)

° 1 cycle Load Delay (5 minutes)

° 1 cycle Branch Delay

° What makes pipelining hard

° Summary (5 minutes)

Page 30: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.31 Introduction to Computer Architecture

From Last Lecture: The Delay Branch Phenomenon

° Although Beq is fetched during Cycle 4:• Target address is NOT written into the PC until the end of Cycle 7• Branch’s target is NOT fetched until Cycle 8• 3-instruction delay before the branch take effect

Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Cycle 11

Ifetch Reg/Dec Exec Mem Wr

Ifetch Reg/Dec Exec Mem Wr16: R-type

Ifetch Reg/Dec Exec Mem Wr

Ifetch Reg/Dec Exec Mem Wr24: R-type

12: Beq(target is 1000)

20: R-type

Clk

Ifetch Reg/Dec Exec Mem Wr1000: Target of Br

Page 31: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.32 Introduction to Computer Architecture

Control Hazard on Branches: 3 stage stall

Page 32: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.33 Introduction to Computer Architecture

Branch Stall Impact

° If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9!

° 2 part solution:• Determine branch taken or not sooner, AND• Compute taken branch address earlier

° Solution Option 1:• Move Zero test to ID/RF stage• Adder to calculate new PC in ID/RF stage• 1 clock cycle penalty for branch vs. 3

Page 33: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.34 Introduction to Computer Architecture

Option 1: move HW forward to reduce branch delay

Data Path before change

MemoryAccess Write

BackInstruction

FetchInstr. Decode

Reg. FetchExecute

Addr. Calc.

Page 34: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.35 Introduction to Computer Architecture

Branch Delay now 1 clock cycle

Data Path after changeMemoryAccess

WriteBack

InstructionFetch

Instr. DecodeReg. Fetch

ExecuteAddr. Calc.

Page 35: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.36 Introduction to Computer Architecture

Option 2: No Stalls, Define Branch as Delayed, insertinstruction after the branch and allow it to execute always,

° Worst case, SW inserts NOP into branch delay if no instruction can be found

° Where to get instructions to fill branch delay slot?• Before branch instruction,

example sw r1,0(r2); beqd r0,r2,T change to, beqd r0,r2,T; sw r1,0(r2)

• From the target address: only valuable when branch• From fall through: only valuable when don’t branch

° Compiler effectiveness for single branch delay slot:• Fills about 60% of branch delay slots• About 80% of instructions executed in branch delay slots useful

in computation• about 50% (60% x 80%) of slots usefully filled

Page 36: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.37 Introduction to Computer Architecture

Complete data Path with Hazard detection and ForwardingFigure 6.41 in the text

Page 37: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.38 Introduction to Computer Architecture

ExampleTextFigure 6.52

Page 38: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.39 Introduction to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to Hazards (15 minutes)

° Forwarding (25 minutes)

° 1 cycle Load Delay (5 minutes)

° 1 cycle Branch Delay (15 minutes)

° What makes pipelining hard

° Summary (5 minutes)

Page 39: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.40 Introduction to Computer Architecture

When is pipelining hard?° Interrupts: 5 instructions executing in 5 stage pipeline• How to stop the pipeline?• Restrart?• Who caused the interrupt?

Stage Problem interrupts occurringIF Page fault on instruction fetch; misaligned

memory access; memory-protection violationID Undefined or illegal opcodeEX Arithmetic interruptMEM Page fault on data fetch; misaligned memory

access; memory-protection violation° Load with data page fault, Add with instruction page fault?° Solution 1: interrupt vector/instruction 2: interrupt ASAP,

restart everything incomplete

Page 40: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.41 Introduction to Computer Architecture

Data path with Exception Handling, Text Figure 6.55, add a Cause register, an Exception PC, and constant addr. of Exception Handeling routine

Page 41: CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

CPE 442 hazards.42 Introduction to Computer Architecture

Review: Summary of Pipelining Basics

° Speed Up Š Pipeline Depth (number of stages); if ideal CPI is 1, then:

° Hazards limit performance on computers:• structural: need more HW resources• data: need forwarding, compiler scheduling• control: early evaluation & PC, delayed branch, prediction

° Increasing length of pipe increases impact of hazards since pipelining helps instruction bandwidth, not latency

° Compilers key to reducing cost of data and control hazards• load delay slots• branch delay slots

° Exceptions, Instruction Set, FP makes pipelining harder

° Longer pipelines => Branch prediction, more instruction parallelism?

Speedup= Pipeline depth1Pipeline stall cycles per instructionClock cycle unpipelined

Clock cycle pipelined