lec1 computer architecture by hsien-hsin sean lee georgia tech -- pipelining

ECE 4100/6100 Advanced Computer Architecture Lecture 1 Pipelining (3055 Review) Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute of Technology


Page 1: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

ECE 4100/6100 Advanced Computer Architecture

Lecture 1 Pipelining (3055 Review)

Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology

Page 2: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

2

Pipeline Stage

[Figure: a pipeline stage drawn as combinational logic between two flip-flops (F/F); a separate inset is labeled 1 FO4]

• Optimal FO4 per pipe
  – 6 to 8 [UT/Compaq, ISCA-29]
  – 18 (15 + 3 latch) [IBM, MICRO-35]
• P4 pipe stage ~ 16 FO4

Page 3: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

3

Five-stage Pipelined Datapath

[Figure: five-stage pipelined datapath. Components: PC, instruction memory, adders (PC + 4 and branch target), registers, sign extend (16 to 32 bits), shift left 2, ALU (with Zero output), data memory, and muxes, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Stages: Inst. Fetch, Inst. Decode, Exec, Mem, WB]

Page 4: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

4

Example for lw instruction: Instruction Fetch (IF)

[Figure: five-stage pipelined datapath; active stage: Instruction fetch]

Page 5: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

5

Example for lw instruction: Instruction Decode (ID)

[Figure: five-stage pipelined datapath; active stage: Instruction decode]

Page 6: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

6

Example for lw instruction: Execution (EX)

[Figure: five-stage pipelined datapath; active stage: Execution]

Page 7: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

7

Example for lw instruction: Memory (MEM)

[Figure: five-stage pipelined datapath; active stage: Memory]

Page 8: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

8

Example for lw instruction: Writeback (WB)

[Figure: five-stage pipelined datapath; active stage: Writeback]

Page 9: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

9

Example for sw instruction: Memory (MEM)

[Figure: five-stage pipelined datapath; active stage: Memory]

Page 10: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

10

Example for sw instruction: Writeback (WB): do nothing

[Figure: five-stage pipelined datapath; active stage: Writeback]

Page 11: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

11

Corrected Datapath (for lw)

[Figure: five-stage pipelined datapath, corrected for lw. Same components as the earlier datapath figures: PC, instruction memory, registers, sign extend, shift left 2, ALU, data memory, and muxes, with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Page 12: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

13

Pipeline Control

[Figure: pipelined datapath with the control lines labeled: PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg. Instruction fields shown: Instruction[15–0] (into sign extend, 16 to 32 bits; low 6 bits into ALU control), Instruction[20–16], and Instruction[15–11]. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB]

Page 13: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

14

Pipeline control

• We have 5 stages. What needs to be controlled in each stage?
  – Instruction Fetch and PC Increment
  – Instruction Decode / Register Fetch
  – Execution (4 lines)
    • RegDst
    • ALUOp[1:0]
    • ALUSrc
  – Memory Stage (3 lines)
    • Branch
    • MemRead
    • MemWrite
  – Write Back (2 lines)
    • MemtoReg
    • RegWrite (note that this signal is used in the ID stage, where the register file sits)

Page 14: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

15

Pipeline Control

• Extend pipeline registers to include control information (created in ID)
• Pass control signals along just like the data

              | Execution/Address Calculation  | Memory access stage        | Write-back stage
              | stage control lines            | control lines              | control lines
Instruction   | RegDst  ALUOp1  ALUOp0  ALUSrc | Branch  MemRead  MemWrite  | RegWrite  MemtoReg
R-format      | 1       1       0       0      | 0       0        0         | 1         0
lw            | 0       0       0       1      | 0       1        0         | 1         1
sw            | X       0       0       1      | 0       0        1         | 0         X
beq           | X       0       1       0      | 1       0        0         | 0         X

[Figure: Control decodes the instruction from IF/ID and writes the EX, M, and WB control fields into ID/EX; EX/MEM carries M and WB; MEM/WB carries WB]
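As a rough illustration of passing control along with the data, the sketch below stores the control table as bit strings and shows which groups each pipeline register still holds. The names `CONTROL` and `propagate` are made up for this example; the bit values are copied from the table, in the table's column order.

```python
# Control settings per instruction class, copied from the slide's table.
# Each string follows the table's column order; "X" marks a don't-care.
CONTROL = {
    "R-format": {"EX": "1100", "M": "000", "WB": "10"},
    "lw":       {"EX": "0001", "M": "010", "WB": "11"},
    "sw":       {"EX": "X001", "M": "001", "WB": "0X"},
    "beq":      {"EX": "X010", "M": "100", "WB": "0X"},
}


def propagate(instr_class):
    """Return the control groups held in each pipeline register.

    All three groups are created in ID; each stage consumes its own group
    and forwards only the remaining ones.
    """
    ctrl = CONTROL[instr_class]
    return {
        "ID/EX":  {g: ctrl[g] for g in ("EX", "M", "WB")},
        "EX/MEM": {g: ctrl[g] for g in ("M", "WB")},
        "MEM/WB": {g: ctrl[g] for g in ("WB",)},
    }


if __name__ == "__main__":
    for reg, groups in propagate("lw").items():
        print(reg, groups)
```

Each pipeline register is narrower than the one before it, because a stage's control bits are dropped once that stage has used them.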

Page 15: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

16

Datapath with Control

[Figure: pipelined datapath with control. The Control unit in ID produces the EX, M, and WB control fields, which travel through ID/EX, EX/MEM, and MEM/WB. Labeled signals: PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg]

Page 16: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

17

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: lw $10, 8($1)

Page 17: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

18

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: sub $11, $2, $3
ID: lw $10, 8($1)

Control values decoded for “lw” in ID: WB = 11, M = 010, EX = 0001

Page 18: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

19

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 8($1)

Control values decoded for “sub” in ID: WB = 10, M = 000, EX = 1100

Page 19: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

20

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 8($1)

Control values decoded for “and” in ID: WB = 10, M = 000, EX = 1100

Page 20: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

21

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 8($1)

Control values decoded for “or” in ID: WB = 10, M = 000, EX = 1100

Page 21: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

22

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: xxxx
ID: add $14, $8, $9
EX: or $13, $6, $7
MEM: and $12…
WB: sub $11, ..

Control values decoded for “add” in ID: WB = 10, M = 000, EX = 1100

Page 22: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

23

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: xxxx
ID: xxxx
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…

Page 23: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

24

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: xxxx
ID: xxxx
EX: xxxx
MEM: add $14, ..
WB: or $13…

Page 24: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

25

Datapath with Control

[Figure: pipelined datapath with control. Stage contents:]

IF: xxxx
ID: xxxx
EX: xxxx
MEM: xxxx
WB: add $14..
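The walkthrough on the "Datapath with Control" slides can be reproduced with a few lines of code. This is a minimal sketch that assumes an ideal pipeline with no stalls, where each instruction enters IF one cycle after its predecessor; `occupancy` and the cycle numbering are illustrative and not taken from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# The five-instruction sequence used on the slides.
PROGRAM = [
    "lw $10, 8($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Map each stage to the instruction occupying it in `cycle` (1-based).

    Instruction i (0-based) is fetched in cycle i + 1 and advances one
    stage per cycle. Empty stages are reported as "xxxx", as on the slides.
    """
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        i = cycle - 1 - depth
        snapshot[stage] = program[i] if 0 <= i < len(program) else "xxxx"
    return snapshot


if __name__ == "__main__":
    total_cycles = len(PROGRAM) + len(STAGES) - 1
    for cycle in range(1, total_cycles + 1):
        print(cycle, occupancy(PROGRAM, cycle))
```

Five instructions drain from a five-stage pipeline in 5 + 5 - 1 = 9 cycles, which matches the nine snapshots shown on the slides.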

Page 25: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

26

Pipelining is not quite that straightforward!

• Limits to pipelining: Hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: HW cannot support this combination of instructions
  – Data hazards: Instruction depends on result of prior instruction still in the pipeline
  – Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Page 26: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

27

“Single” Memory Port

Time (clock cycles) runs left to right; instruction order runs top to bottom.

            Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Load        Ifetch   Reg      ALU      DMem     Reg
Instr 1              Ifetch   Reg      ALU      DMem     Reg
Instr 2                       Ifetch   Reg      ALU      DMem     Reg
Instr 3                                Ifetch   Reg      ALU      DMem     Reg
Instr 4                                         Ifetch   Reg      ALU      DMem     Reg

Page 27: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

28

“Single” Memory Port / Structural Hazard

Time (clock cycles) runs left to right; instruction order runs top to bottom.

            Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Load        Ifetch   Reg      ALU      DMem     Reg
add                  Ifetch   Reg      ALU      DMem     Reg
Instr 2                       Ifetch   Reg      ALU      DMem     Reg
Instr 3                                Ifetch   Reg      ALU      DMem     Reg
Instr 4                                         Ifetch   Reg      ALU      DMem     Reg

In Cycle 4, Load’s DMem access and Instr 3’s Ifetch both need the single memory port.
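To make the single-port conflict concrete, here is a small sketch that finds the cycles in which an instruction fetch and a data access both need memory. It assumes one instruction is fetched per cycle starting at cycle 1 and that only the instructions listed in `accesses_data_memory` touch data memory; the function name and arguments are invented for this example.

```python
# Offset of each stage from the cycle in which an instruction is fetched.
STAGE_OFFSET = {"Ifetch": 0, "Reg": 1, "ALU": 2, "DMem": 3, "WB": 4}


def memory_port_conflicts(instrs, accesses_data_memory):
    """Return (cycle, fetching instruction, data-accessing instruction) tuples.

    With a single memory port, a conflict occurs whenever one instruction's
    DMem cycle coincides with another instruction's Ifetch cycle.
    """
    fetch_cycle = {name: i + 1 for i, name in enumerate(instrs)}
    conflicts = []
    for mem_instr in instrs:
        if mem_instr not in accesses_data_memory:
            continue
        dmem_cycle = fetch_cycle[mem_instr] + STAGE_OFFSET["DMem"]
        for other in instrs:
            if fetch_cycle[other] == dmem_cycle:
                conflicts.append((dmem_cycle, other, mem_instr))
    return conflicts


if __name__ == "__main__":
    order = ["Load", "add", "Instr 2", "Instr 3", "Instr 4"]
    print(memory_port_conflicts(order, {"Load"}))
```

With the instruction order from the slide, the only reported conflict is in cycle 4, between the fetch of Instr 3 and the data access of Load.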

Page 28: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

29

Data Hazard

Time (clock cycles) runs left to right; instruction order runs top to bottom.

add r1,r2,r3     Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r3             Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r1,r11                                   Ifetch  Reg     ALU     DMem    Reg
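The dependences in this sequence can be found mechanically. The sketch below lists every read-after-write pair and then filters the ones that would read a stale value without forwarding. The `safe_distance=3` default is an assumption: it supposes the register file is written in the first half of a cycle and read in the second half, so a consumer three instructions behind the producer sees the new value. All names are illustrative.

```python
# (opcode, destination register, source registers), from the slide.
PROGRAM = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]


def raw_dependences(program):
    """Return (producer index, consumer index, register) for each RAW pair.

    A consumer depends on the most recent earlier writer of the register.
    """
    deps = []
    last_writer = {}
    for i, (_, dest, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        last_writer[dest] = i
    return deps


def unforwarded_hazards(program, safe_distance=3):
    """Keep only the RAW pairs too close to be satisfied by the register file."""
    return [(p, c, r) for (p, c, r) in raw_dependences(program)
            if c - p < safe_distance]


if __name__ == "__main__":
    print(raw_dependences(PROGRAM))
    print(unforwarded_hazards(PROGRAM))
```

Under that assumption only `sub` and `and` need help from forwarding or stalls; `or` and `xor` read r1 late enough to get the written value.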

Page 29: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

30

Forwarding to Avoid Data Hazard

Time (clock cycles) runs left to right; instruction order runs top to bottom.

add r1,r2,r3     Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r3             Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r1,r11                                   Ifetch  Reg     ALU     DMem    Reg

Page 30: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

31

Forwarding (simplified)

[Figure: simplified datapath for forwarding: Register File, ID/EX, MUX, ALU, EX/MEM, Data Memory, MEM/WB]

Page 31: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

32

Forwarding (from EX/MEM)

[Figure: forwarding from EX/MEM. Components: Register File, ID/EX, MUXes at the ALU inputs, ALU, EX/MEM, Data Memory, MEM/WB]

Page 32: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

33

Forwarding (from MEM/WB)

[Figure: forwarding from MEM/WB. Components: Register File, ID/EX, MUXes at the ALU inputs, ALU, EX/MEM, Data Memory, MEM/WB]

Page 33: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

34

Forwarding (operand selection)

[Figure: operand selection. A Forwarding Unit is added to the datapath (Register File, ID/EX, MUXes at the ALU inputs, ALU, EX/MEM, Data Memory, MEM/WB)]

Page 34: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

35

Forwarding (operand propagation)

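[Figure: operand propagation. Forwarding Unit inputs: Rs and Rt (from ID/EX), EX/MEM Rd, and MEM/WB Rd. A further MUX selects between Rd and Rt. Other components: Register File, ID/EX, MUXes at the ALU inputs, ALU, EX/MEM, Data Memory, MEM/WB]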
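The forwarding unit's decision for one ALU operand can be sketched as a comparison of register numbers. The slides only show the inputs (Rs, Rt, EX/MEM Rd, MEM/WB Rd); the `reg_write` qualifier and the register-0 check below are assumptions borrowed from the usual MIPS textbook formulation, and `forward_select` is a made-up name.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where one ALU operand should come from.

    src_reg: register number read by the instruction in EX (Rs or Rt).
    ex_mem, mem_wb: dicts with "reg_write" (bool) and "rd" (int) describing
        the instructions currently held in those pipeline registers.
    Returns "EX/MEM", "MEM/WB", or "ID/EX" (the value read from the
    register file).
    """
    # Assumption: register 0 is hardwired to zero and is never forwarded.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"  # newest result wins
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "ID/EX"


if __name__ == "__main__":
    # sub r4,r1,r3 is in EX while add r1,r2,r3 sits in EX/MEM.
    add_in_ex_mem = {"reg_write": True, "rd": 1}
    idle = {"reg_write": False, "rd": 0}
    print(forward_select(1, add_in_ex_mem, idle))  # operand r1
    print(forward_select(3, add_in_ex_mem, idle))  # operand r3
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register: the younger result in EX/MEM is the one the program order requires.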

Page 35: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

36

Data Hazard Even with Forwarding

Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw  r1, 0(r2)    Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r6             Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg

Forward backward in time… no way!! (or way?)

Page 36: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

37

Data Hazard Even with Forwarding

Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw  r1, 0(r2)    Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r6             Ifetch  Reg     Bubble  ALU     DMem    Reg
and r6,r1,r7                     Ifetch  Bubble  Reg     ALU     DMem    Reg
or  r8,r1,r9                             Bubble  Ifetch  Reg     ALU     DMem

[Figure annotation on the stalled instructions: NO ISSUE]

Need “pipeline interlock” (or stall) to stop instructions from issuing. How is this detected?

Page 37: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

38

Hazard Detection Unit

• Stall by letting an instruction that won’t write anything go forward
• Stall the pipeline if the instruction in ID/EX is a load (ID/EX.MemRead) and ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt

[Figure: pipelined datapath with a hazard detection unit and a forwarding unit. Hazard detection unit signals: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a mux with a 0 input on the control lines. Forwarding unit signals: Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
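The stall condition on this slide translates directly into a predicate. In this sketch the returned dictionary models the PCWrite and IF/IDWrite enables named in the figure; treating them as active-high booleans is an assumption, as is the function name `hazard_detection`.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check: stall when the load in ID/EX writes a register that
    the instruction in IF/ID is about to read.

    Returns the stall decision plus the two write enables; during a stall
    the PC and IF/ID are held so the same instruction is decoded again.
    """
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {"stall": stall, "PCWrite": not stall, "IF/IDWrite": not stall}


if __name__ == "__main__":
    # lw r1, 0(r2) is in EX (rt = 1); sub r4,r1,r6 is in ID (rs = 1, rt = 6).
    print(hazard_detection(True, 1, 1, 6))
    # The same registers, but the instruction in EX is not a load.
    print(hazard_detection(False, 1, 1, 6))
```

During the stall cycle the control fields sent into ID/EX are zeroed, so the bubble that moves forward writes nothing, which is the first bullet's point.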

Page 38: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

39

Code Rescheduling to Avoid Load Hazards

Try producing fast code for

    a = b + c;
    d = e - f;

assuming a, b, c, d, e, and f in memory.

Slow code:
    LW   Rb,b
    LW   Rc,c
    ADD  Ra,Rb,Rc
    SW   a,Ra
    LW   Re,e
    LW   Rf,f
    SUB  Rd,Re,Rf
    SW   d,Rd

Fast code:
    LW   Rb,b
    LW   Rc,c
    LW   Re,e
    ADD  Ra,Rb,Rc
    LW   Rf,f
    SW   a,Ra
    SUB  Rd,Re,Rf
    SW   d,Rd

Compiler optimizes for performance. Hardware checks for safety.
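One way to compare the two schedules is to count load-use stalls. The sketch below assumes a five-stage pipeline with full forwarding, where the only remaining stall is one cycle when an instruction reads the destination of the load immediately before it. The tuple encoding and the function name are invented for this example.

```python
# (opcode, destination register or None, source registers)
SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]


def count_load_use_stalls(program):
    """Count instructions that read the result of the load just before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


if __name__ == "__main__":
    print("slow:", count_load_use_stalls(SLOW))
    print("fast:", count_load_use_stalls(FAST))
```

Under these assumptions the slow schedule stalls twice (ADD after LW Rc, SUB after LW Rf) and the fast schedule not at all, with the same eight instructions in both.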

Page 39: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

40

Control Hazard due to Branches (3 stall cycles)

10: beq r1,r3,36     Ifetch  Reg     ALU     DMem    Reg
14: and r2,r3,r5             Ifetch  Reg     ALU     DMem    Reg
18: or  r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
22: add r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
36: xor r10,r1,r11                                   Ifetch  Reg     ALU     DMem

What do you do with the 3 instructions in between?

How do you do it?

Where is the “commit”?

Page 40: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

41

Branch Hazard Resolutions

#1: Stall until branch direction is clear
#2: Static Branch Prediction
  • Predict Not Taken (fall through, as shown in previous slide)
    – Execute successor instructions in sequence
    – “Squash” instructions in pipeline if branch actually taken
    – PC+4 already calculated, so use it to get next instruction
  • Predict Branch Taken
    – But haven’t calculated branch target address
      • Might incur 1 cycle branch penalty
      • Other machines: branch target known before outcome
#3: Dynamic Branch Prediction
  – Will dedicate a lecture to such techniques
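The cost of each resolution can be compared with a simple CPI estimate. The 3-cycle penalty comes from the earlier control hazard slide; the branch frequency (20%) and taken rate (60%) below are hypothetical numbers picked only to make the arithmetic visible, and the function name is made up.

```python
def cpi_with_branches(base_cpi, branch_fraction, taken_fraction,
                      taken_penalty, not_taken_penalty=0):
    """Effective CPI = base CPI + branch frequency * average branch penalty."""
    average_penalty = (taken_fraction * taken_penalty
                       + (1 - taken_fraction) * not_taken_penalty)
    return base_cpi + branch_fraction * average_penalty


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 60% of them taken.
    always_stall = cpi_with_branches(1.0, 0.2, 0.6, 3, not_taken_penalty=3)
    predict_not_taken = cpi_with_branches(1.0, 0.2, 0.6, 3)
    print(f"stall on every branch: {always_stall:.2f}")
    print(f"predict not taken:     {predict_not_taken:.2f}")
```

With these made-up inputs, stalling on every branch gives a CPI of about 1.60, while predict-not-taken pays the penalty only on taken branches and lands at about 1.36.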

Page 41: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

42

Alternative Branch Hazard Resolutions

#4: Delayed Branch
  – Define branch to take place AFTER a following instruction

        branch instruction
        sequential successor 1
        sequential successor 2
        ........
        sequential successor n      (branch delay of length n)
        branch target if taken

  – 1 slot delay allows proper decision and branch target address in 5 stage pipeline (next page)

Page 42: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

44

Filling Branch Delay Slot

Make sure R7 will not be used in taken path before redefined

Page 43: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

45

Other Pipelining Issues

• To have all instructions finish within one cycle
  – Slow down frequency to cope w/ the critical operation, or
  – Allow non-uniform latency operation

Page 44: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

46

Support Multiple FP Operations

• Complicate bypass
• Potential structural hazard
• Multiple (FP) instructions can complete at the same time
  – RF might need to be multi-ported
  – Ordering issue, who gets to update the register?
• Out-of-order completion/retirement: Precise exception issue

[Figure: IF and ID feed four parallel execution units that rejoin at MEM and WB:
  Integer Unit: EX
  FP multiplier: M1 M2 M3 M4 M5 M6 M7
  FP add: A1 A2 A3 A4
  FP divider (non-pipelined)]

Page 45: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

47

Full Bypass/Forwarding Needed

SS = stall cycle.

Clock cycle        1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18
L.D   F4,0(R2)     IF  ID  EX  M   WB
MUL.D F0,F4,F6         IF  ID  SS  M1  M2  M3  M4  M5  M6  M7  M   WB
ADD.D F2,F0,F8             IF  SS  ID  SS  SS  SS  SS  SS  SS  A1  A2  A3  A4  M   WB
S.D   F2,0(R2)                 IF  SS  SS  SS  SS  SS  SS  ID  EX  SS  SS  SS  M   WB

Page 46: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

48

Structural Hazards

• Write to register file at the same cycle (cc11)
• Write to the same register (WAW)
• MEM in cc10

Clock cycle        1   2   3   4   5   6   7   8   9   10  11
MUL.D F0,F4,F6     IF  ID  M1  M2  M3  M4  M5  M6  M7  M   WB
. . . .                IF  ID  EX  M   WB
. . . .                    IF  ID  EX  M   WB
ADD.D F2,F4,F6                 IF  ID  A1  A2  A3  A4  M   WB
. . . .                            IF  ID  EX  M   WB
. . . .                                IF  ID  EX  M   WB
L.D   F2,0(R2)                             IF  ID  EX  M   WB
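The collision in cc10 and cc11 follows from the different execute latencies. The sketch below computes each instruction's MEM and WB cycle and groups instructions that write back together. The latencies follow the stage names on the slide (M1 to M7, A1 to A4, one EX cycle); the issue cycles 1, 4, and 7 are an assumption chosen to be consistent with the slide's stated cc10 and cc11, and all names are illustrative.

```python
from collections import defaultdict

# Number of execute cycles per operation, from the stage names on the slide.
EX_LATENCY = {"MUL.D": 7, "ADD.D": 4, "L.D": 1}


def mem_and_wb_cycle(op, issue_cycle):
    """IF in issue_cycle, ID in the next cycle, then the execute cycles,
    then MEM, then WB. Assumes no stalls."""
    mem = issue_cycle + 2 + EX_LATENCY[op]
    return mem, mem + 1


def write_back_collisions(schedule):
    """Group instructions by WB cycle and keep cycles with more than one."""
    by_cycle = defaultdict(list)
    for op, issue_cycle in schedule:
        _, wb = mem_and_wb_cycle(op, issue_cycle)
        by_cycle[wb].append(op)
    return {cycle: ops for cycle, ops in by_cycle.items() if len(ops) > 1}


if __name__ == "__main__":
    # Assumed issue cycles for the three named instructions.
    schedule = [("MUL.D", 1), ("ADD.D", 4), ("L.D", 7)]
    for op, issue in schedule:
        print(op, mem_and_wb_cycle(op, issue))
    print(write_back_collisions(schedule))
```

All three instructions reach MEM in cycle 10 and WB in cycle 11, so a register file with a single write port cannot serve them in the same cycle.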

Page 47: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

49

Precise Exception Issue

• Precise exception: If the pipeline can (or must) be stopped
  – All the instructions before the faulty (or intended) instruction must be completed
  – All the instructions after it must not be completed
  – Restart the execution from the faulty (or intended) instruction
• State must be consistent with the original program order
• Not straightforward with out-of-order completion
• Simple solution: Stall until every prior long-latency instruction is guaranteed not to raise an exception
• Other modern solution: ROB (will dedicate a lecture to it)

    DIV.D F0,F2,F4     (exception!)
    ADD.D F3,F10,F8    (completed)
    SUB.D F12,F12,F14  (completed)

Page 48: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

50

MIPS R4000 Pipeline

• Deeper Pipeline (superpipelining)
• 2 cycle delays for load
• Predicted-Not-Taken strategy
  – Not-taken (fall-through) branch: 1 delay slot
  – Taken branch: 1 delay slot + 2 idle cycles

Stages: IF IS RF EX DF DS TC WB

[Figure: the eight stages drawn over the hardware they use: Instruction Memory, Reg, ALU, Data Memory, Reg. Annotation: Branch target and condition eval.]

Page 49: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

51

Load delay (2 cycles)

Stages: IF IS RF EX DF DS TC WB

[Figure: R4000 pipeline diagram over CC1 to CC11 for LD R1; ADD R2, R1; Inst 1; Inst 2. Each instruction passes through Instruction Memory, Reg, ALU, Data Memory, Reg. Two Bubbles are shown]

If no delay slot instructions scheduled, R4000 will perform HW interlock

Page 50: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining

52

Branches (Predicted-not-taken)

[Figure: two R4000 pipeline diagrams over CC1 to CC11, one per actual branch direction. Each instruction passes through IF IS RF EX DF DS TC WB; SS = stall.
  Actual direction NOT TAKEN: rows for Branch, Delay slot, Branch inst+2, Branch inst+3.
  Actual direction TAKEN: rows for Branch, Delay slot, two Stall rows (all SS), and Branch Target.]