cpe 442 pipeline.1 intro to computer architecture cpe 242 computer architecture and engineering...

41
CPE 442 pipeline.1 Intro to Computer Architectu CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

Upload: kiley-carrow

Post on 15-Dec-2015

223 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.1 Intro to Computer Architecture

CpE 242Computer Architecture and

EngineeringDesigning a Pipeline Processor

Page 2: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.2 Intro to Computer Architecture

A Single Cycle Processor

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

55 5

Rw Ra Rb

32 32-bitRegisters

Rs

Rt

Rt

RdRegDst

Exten

der

Mu

x

Mux

3216imm16

ALUSrc

ExtOp

Mu

x

MemtoReg

Clk

Data InWrEn

32

Adr

DataMemory

32

MemWrA

LU

Zero

0

1

0

1

01

InstructionFetch Unit

Clk

Instruction<31:0>Jump

Branch

<21:25>

<16:20>

<11:15>

<0:15>

Imm16

Rd

MainControl

op

ALUControlfunc

ALUop

3

RegDst

ALUSrc

:

<5:0>

<31:26>

Instr<15:0>

Zero

3

Page 3: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.3 Intro to Computer Architecture

Drawbacks of this Single Cycle Processor

° Long cycle time:

• Cycle time must be long enough for the load instruction:

- PC’s Clock -to-Q +

- Instruction Memory Access Time +

- Register File Access Time +

- ALU Delay (address calculation) +

- Data Memory Access Time +

- Register File Setup Time +

- Clock Skew

° Cycle time is much longer than needed for all other instructions. Examples:

• R-type instructions do not require data memory access

• Jump does not require ALU operation nor data memory access

Page 4: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.4 Intro to Computer Architecture

Overview of a Multiple Cycle Implementation

° The root of the single cycle processor’s problems:

• The cycle time has to be long enough for the slowest instruction

° Solution:

• Break the instruction into smaller steps

• Execute each step (instead of the entire instruction) in one cycle

- Cycle time: time it takes to execute the longest step

- Keep all the steps to have similar length

• This is the essence of the multiple cycle processor

° The advantages of the multiple cycle processor:

• Cycle time is much shorter

• Different instructions take different number of cycles to complete

- Load takes five cycles

- Jump only takes three cycles

• Allows a functional unit to be used more than once per instruction

Page 5: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.5 Intro to Computer Architecture

Multiple Cycle Processor

° MCP: A functional unit to be used more than once per instruction

IdealMemoryWrAdrDin

RAdr

32

32

32Dout

MemWr

32

AL

U

3232

ALUOp

ALUControl

Instru

ction R

eg

32

IRWr

32

Reg File

Ra

Rw

busW

Rb5

5

32busA

32busB

RegWr

Rs

Rt

Mu

x

0

1

Rt

Rd

PCWr

ALUSelA

Mux 01

RegDst

Mu

x

0

1

32

PC

MemtoReg

Extend

ExtOp

Mu

x

0

132

0

1

23

4

16Imm 32

<< 2

ALUSelB

Mu

x1

0

Target32

Zero

ZeroPCWrCond PCSrc BrWr

32

IorD

Page 6: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.6 Intro to Computer Architecture

Outline of Today’s Lecture

° Recap and Introduction (5 minutes)

° Introduction to the Concept of Pipelined Processor (15 minutes)

° Pipelined Datapath and Pipelined Control (25 minutes)

° How to Avoid Race Condition in a Pipeline Design? (5 minutes)

° Pipeline Example: Instructions Interaction (15 minutes)

° Summary (5 minutes)

Page 7: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.7 Intro to Computer Architecture

Timing Diagram of a Load Instruction

Clk

PC

Rs, Rt, Rd,Op, Func

Clk-to-Q

ALUctr

Instruction Memory Access Time

Old Value New Value

RegWr Old Value New Value

Delay through Control Logic

busA

Register File Access Time

Old Value New Value

busB

ALU Delay

Old Value New Value

Old Value New Value

New ValueOld Value

ExtOp Old Value New Value

ALUSrc Old Value New Value

Address Old Value New Value

busW Old Value New

Delay through Extender & Mux

Data Memory Access Time

Instruction Fetch Instr Decode /

Reg. Fetch

Address Reg WrData Memory

Register F

ile Write T

ime

Page 8: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.8 Intro to Computer Architecture

The Five Stages of Load

° Ifetch: Instruction Fetch

• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Read the data from the Data Memory

° Wr: Write the data back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Ifetch Reg/Dec Exec Mem WrLoad

Page 9: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.9 Intro to Computer Architecture

Key Ideas Behind Pipelining

° Grading the Final exam for a class of 100 students:

• 5 problems, five people grading the exam

• Each person ONLY grade one problem

• Pass the exam to the next person as soon as one finishes his part

• Assume each problem takes 12 min to grade

- Each individual exam still takes 1 hour to grade

- But with 5 people, all exams can be graded five times quicker

° The load instruction has 5 stages:

• Five independent functional units to work on each stage

- Each functional unit is used only once

• The 2nd load can start as soon as the 1st finishes its Ifetch stage

• Each load still takes five cycles to complete

• The throughput, however, is much higher

Page 10: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.10 Intro to Computer Architecture

Key Ideas Behind Pipelining

° Let n be number of tasks or exams (or instructions)

° Let k be number of stages for each task

° Let T be the time per stage

° Time per task = T . k

° Total Time per n tasks for non-pipelined solution = T . k . n

° Total Time per n tasks for pipelined solution = T . k + T . (n-1)

° Speedup = pipelined perform/ non-pipelined performance

= Total Time non-pipelined/ Total Time for pipelined

= k . n / k + n-1 = k approx. when n >> k

InputTasks

K – stage pipeline

buffer

Stage 1 Stage 2 Stage k

Page 11: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.11 Intro to Computer Architecture

Pipelining the Load Instruction

° The five independent functional units in the pipeline datapath are:

• Instruction Memory for the Ifetch stage

• Register File’s Read ports (bus A and busB) for the Reg/Dec stage

• ALU for the Exec stage

• Data Memory for the Mem stage

• Register File’s Write port (bus W) for the Wr stage

° One instruction enters the pipeline every cycle

• One instruction comes out of the pipeline (complete) every cycle

• The “Effective” Cycles per Instruction (CPI) is 1

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Ifetch Reg/Dec Exec Mem Wr1st lw

Ifetch Reg/Dec Exec Mem Wr2nd lw

Ifetch Reg/Dec Exec Mem Wr3rd lw

Page 12: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.12 Intro to Computer Architecture

The Four Stages of R-type

° Ifetch: Instruction Fetch

• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: ALU operates on the two register operands

° Wr: Write the ALU output back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec WrR-type

Page 13: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.13 Intro to Computer Architecture

Pipelining the R-type and Load Instruction

° We have a problem:

• Two instructions try to write to the register file at the same time!

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ops! We have a problem!

Page 14: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.14 Intro to Computer Architecture

Important Observation

° Each functional unit can only be used once per instruction

° Each functional unit must be used at the same stage for all instructions:

• Load uses Register File’s Write Port during its 5th stage

• R-type uses Register File’s Write Port during its 4th stage

Ifetch Reg/Dec Exec Mem WrLoad

1 2 3 4 5

Ifetch Reg/Dec Exec WrR-type

1 2 3 4

Page 15: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.15 Intro to Computer Architecture

Solution 1: Insert “Bubble” into the Pipeline

° Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle

• The control logic can be complex

° No instruction is completed during Cycle 5:

• The “Effective” CPI for load is 2

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type Pipeline

Bubble

Ifetch Reg/Dec Exec Wr

Page 16: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.16 Intro to Computer Architecture

Solution 2: Delay R-type’s Write by One Cycle

° Delay R-type’s register write by one cycle:

• Now R-type instructions also use Reg File’s write port at Stage 5

• Mem stage is a NOOP stage: nothing is being done

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec WrR-type Mem

Exec

Exec

Exec

Exec

1 2 3 4 5

Page 17: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.17 Intro to Computer Architecture

The Four Stages of Store

° Ifetch: Instruction Fetch

• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Write the data into the Data Memory

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemStore Wr

Page 18: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.18 Intro to Computer Architecture

The Four Stages of Beq

° Ifetch: Instruction Fetch

• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: ALU compares the two register operands

• Adder calculates the branch target address

° Mem: If the registers we compared in the Exec stage are the same,

• Write the branch target address into the PC

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemBeq Wr

Page 19: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.19 Intro to Computer Architecture

A Pipelined Datapath

IF/ID

Register

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

PC

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr

RegWr ExtOp

ExecUnit

busA

busB

Imm16

ALUOp

ALUSrc

Mu

x

1

0

MemtoReg

1

0

RegDst

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch

10

Clk

Ifetch Reg/Dec Exec Mem Wr

Page 20: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.20 Intro to Computer Architecture

The Instruction Fetch Stage

IF/ID

: lw $1, 100 ($2)

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

PC

= 14 Data

Mem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr

RegWr ExtOp

ExecUnit

busA

busB

Imm16

ALUOp

ALUSrc

Mu

x

1

0

MemtoReg

1

0

RegDst

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch

10

Clk

Ifetch Reg/Dec Exec Mem

You are here!

° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Page 21: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.21 Intro to Computer Architecture

A Detail View of the Instruction Unit

° Location 10: lw $1, 0x100($2)

IF/ID

: lw $1, 100 ($2)

PC

= 14

10

10

Ad

der

InstructionMemory

“4”

Instruction

Address

Clk

Ifetch

You are here!

Reg/Dec

Page 22: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.22 Intro to Computer Architecture

The Decode / Register Fetch Stage

IF/ID

:

ID/E

x: Reg. 2 &

0x100

Ex/M

em R

egister

Mem

/Wr R

egister

PC

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr

RegWr ExtOp

ExecUnit

busA

busB

Imm16

ALUOp

ALUSrc

Mu

x

1

0

MemtoReg

1

0

RegDst

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch

10

Clk

Ifetch Reg/Dec Exec Mem

You are here!

° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Page 23: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.23 Intro to Computer Architecture

Load’s Address Calculation Stage

IF/ID

: ID/E

x Register

Ex/M

em: L

oad’s A

dd

ress

Mem

/Wr R

egister

PC

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr

RegWr ExtOp=1

ExecUnit

busA

busB

Imm16

ALUOp=Add

ALUSrc=1

Mu

x

1

0

MemtoReg

1

0

RegDst=0

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch

10

Clk

Ifetch Reg/Dec Exec Mem

You are here!

° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Page 24: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.24 Intro to Computer Architecture

A Detail View of the Execution Unit

ID/E

x Register

Ex/M

em: L

oad’s M

emory A

dd

ressALUControl

ALUctr

32busA

32

busB

Exten

der

Mu

x

16

imm16

ALUSrc=1ExtOp=1

3

AL

U

Zero

0

1

32

ALUout

32

Ad

der

3 ALUOp=Add

<< 2

32PC+4

Target

32

Clk

Exec

You are here!

Mem

Page 25: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.25 Intro to Computer Architecture

Load’s Memory Access Stage

IF/ID

: ID/E

x Register

Ex/M

em R

egister

Mem

/Wr: L

oad’s D

ata

PC

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr=0

RegWr ExtOp

ExecUnit

busA

busB

Imm16

ALUOp

ALUSrc

Mu

x

1

0

MemtoReg

1

0

RegDst

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch=0

10

Clk

Ifetch Reg/Dec Exec Mem

You are here!

° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Page 26: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.26 Intro to Computer Architecture

Load’s Write Back Stage

IF/ID

: ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

PC

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr

RegWr=1 ExtOp

ExecUnit

busA

busB

Imm16

ALUOp

ALUSrc

Mu

x

1

0

MemtoReg=1

1

0

RegDst

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch

10

Clk

Ifetch Reg/Dec Exec Mem

You are somewhere out there!

° Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Wr

Page 27: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.27 Intro to Computer Architecture

How About Control Signals?

IF/ID

: ID/E

x Register

Ex/M

em: L

oad’s A

dd

ress

Mem

/Wr R

egister

PC

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

MemWr

RegWr ExtOp=1

ExecUnit

busA

busB

Imm16

ALUOp=Add

ALUSrc=1

Mu

x

1

0

MemtoReg

1

0

RegDst=0

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch

10

Ifetch Reg/Dec Exec Mem

° Key Observation: Control Signals at Stage N = Func (Instr. at Stage N)

• N = Exec, Mem, or Wr

° Example: Controls Signals at Exec Stage = Func(Load’s Exec)

Wr

Page 28: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.28 Intro to Computer Architecture

Pipeline Control

° The Main Control generates the control signals during Reg/Dec

• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later

• Control signals for Mem (MemWr Branch) are used 2 cycles later

• Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

IF/ID

Register

ID/E

x Register

Ex/M

em R

egister

Mem

/Wr R

egister

Reg/Dec Exec Mem

ExtOp

ALUOp

RegDst

ALUSrc

Branch

MemWr

MemtoReg

RegWr

MainControl

ExtOp

ALUOp

RegDst

ALUSrc

MemtoReg

RegWr

MemtoReg

RegWr

MemtoReg

RegWr

Branch

MemWr

Branch

MemWr

Wr

Page 29: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.29 Intro to Computer Architecture

Beginning of the Wr’s Stage: A Real World Problem

° At the beginning of the Wr stage, we have a problem if:

• RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q

° Similarly, at the beginning of the Mem stage, we have a problem if:

• WrAdr’s Clk-to-Q > MemWr’s Clk-to-Q

° We have a race condition between Address and Write Enable!

Ex/M

em

Mem

/Wr RegAdr

RegWr MemWr

Data

WrAdr

DataRegFile

DataMemory

Clk

RegAdr

RegWrRegWr’s Clk-to-Q

RegAdr’s Clk-to-Q

Clk

WrAdr

MemWrMemWr’s Clk-to-Q

WrAdr’s Clk-to-Q

Page 30: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.30 Intro to Computer Architecture

The Pipeline Problem

° Multiple Cycle design prevents race condition between Addr and WrEn:

• Make sure Address is stable by the end of Cycle N

• Asserts WrEn during Cycle N + 1

° This approach can NOT be used in the pipeline design because:

• Must be able to write the register file every cycle

• Must be able write the data memory every cycle

Clock

Ifetch Reg/Dec Exec Mem WrStore

Ifetch Reg/Dec Exec Mem WrStore

Ifetch Reg/Dec Exec Mem WrR-type

Ifetch Reg/Dec Exec Mem WrR-type

Page 31: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.31 Intro to Computer Architecture

Synchronize Register File & Synchronize Memory

° Solution: And the Write Enable signal with the Clock

• This is the ONLY place where gating the clock is used

• MUST consult circuit expert to ensure no timing violation:

- Example: Clock High Time > Write Access Delay

WrEn

I_Addr

I_Data

Reg Fileor

Memory

Clk

I_Addr

I_WrEn

Address

Data

I_WrEn

C_WrEn

C_WrEn

Clk

Address

Data

WrEn

Reg Fileor

Memory

Synchronize Memory and Register File

Address, Data, and WrEn must be stableat least 1 set-up time before the Clk edge

Write occurs at the cycle followingthe clock edge that captures the signals

Page 32: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.32 Intro to Computer Architecture

A More Extensive Pipelining Example

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Ifetch Reg/Dec Exec Mem Wr0: Load

Ifetch Reg/Dec Exec Mem Wr4: R-type

Ifetch Reg/Dec Exec Mem Wr8: Store

Ifetch Reg/Dec Exec Mem Wr12: Beq (target is 1000)

End ofCycle 4

End ofCycle 5

End ofCycle 6

End ofCycle 7

° End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch

° End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg

° End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec

° End of Cycle 7: Store’s Wr, Beq’s Mem

Page 33: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.33 Intro to Computer Architecture

Pipelining Example: End of Cycle 4

° 0: Load’s Mem 4: R-type’s Exec 8: Store’s Reg 12: Beq’s Ifetch

IF/ID

: Beq

Instru

ction

ID/E

x: Store’s b

usA

& B

Ex/M

em: R

-type’s R

esult

Mem

/Wr: L

oad’s D

out

PC

= 16 Data

Mem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

RegWr=0 ExtOp=x

ExecUnit

busA

busB

Imm16

ALUOp=R-type

ALUSrc=0

Mu

x

1

0

MemtoReg=x

1

0

RegDst=1

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch=0

10

12: Beq’s Ifet

8: Store’s Reg 4: R-type’s Exec 0: Load’s Mem

Clk

MemWr=0Clk

Page 34: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.34 Intro to Computer Architecture

Pipelining Example: End of Cycle 5

° 0: Lw’s Wr 4: R’s Mem 8: Store’s Exec 12: Beq’s Reg 16: R’s Ifetch

IF/ID

: Instru

ction @

16

ID/E

x: Beq

’s bu

sA &

B

Ex/M

em: S

tore’s Ad

dress

Mem

/Wr: R

-type’s R

esult

PC

= 20 Data

Mem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

RegWr=1 ExtOp=1

ExecUnit

busA

busB

Imm16

ALUOp=Add

ALUSrc=1

Mu

x

1

0

MemtoReg=1

1

0

RegDst=x

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch=0

10

16: R’s Ifet

12: Beq’s Reg 8: Store’s Exec 4: R-type’s Mem

0: Load’s Wr

Clk

MemWr=0Clk

Page 35: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.35 Intro to Computer Architecture

Pipelining Example: End of Cycle 6

° 4: R’s Wr 8: Store’s Mem 12: Beq’s Exec 16: R’s Reg 20: R’s Ifet

IF/ID

: Instru

ction @

20

ID/E

x:R-typ

e’s bu

sA &

B

Ex/M

em: B

eq’s R

esults

Mem

/Wr: N

othin

g for St

PC

= 24 Data

Mem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

RegWr=1 ExtOp=1

ExecUnit

busA

busB

Imm16

ALUOp=Sub

ALUSrc=0

Mu

x

1

0

MemtoReg=0

1

0

RegDst=x

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch=0

10

20:R-type’s Ifet

16: R-type’s Reg 12: Beq’s Exec 8: Store’s Mem

4: R-type’s Wr

Clk

MemWr=1Clk

Page 36: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.36 Intro to Computer Architecture

Pipelining Example: End of Cycle 7

° 8: Store’s Wr 12: Beq’s Mem 16: R’s Exec 20: R’s Reg 24: R’s Ifet

IF/ID

: Instru

ction @

24

ID/E

x:R-typ

e’s bu

sA &

B

Ex/M

em: R

type’s R

esults

Mem

/Wr:N

othin

g for Beq

PC

= 1000

DataMem

WADi

RA Do

IUn

it

A

I

RFile

Di

Ra

Rb

Rw

RegWr=0 ExtOp=x

ExecUnit

busA

busB

Imm16

ALUOp=R-type

ALUSrc=0

Mu

x

1

0

MemtoReg=x

1

0

RegDst=1

Rt

Rd

Imm16

PC+4PC+4

Rs

Rt

PC

+4

Zero

Branch=1

10

24:R-type’s Ifet

20: R-type’s Reg 16: R-type’s Exec 12: Beq’s Mem

8: Store’s Wr

Clk

MemWr=0Clk

Page 37: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.37 Intro to Computer Architecture

The Delay Branch Phenomenon

° Although Beq is fetched during Cycle 4:

• Target address is NOT written into the PC until the end of Cycle 7

• Branch’s target is NOT fetched until Cycle 8

• 3-instruction delay before the branch take effect

° This is referred to as Branch Hazard:

• Clever design techniques can reduce the delay to ONE instruction

Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Cycle 11

Ifetch Reg/Dec Exec Mem Wr

Ifetch Reg/Dec Exec Mem Wr16: R-type

Ifetch Reg/Dec Exec Mem Wr

Ifetch Reg/Dec Exec Mem Wr24: R-type

12: Beq(target is 1000)

20: R-type

Clk

Ifetch Reg/Dec Exec Mem Wr1000: Target of Br

Page 38: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.38 Intro to Computer Architecture

The Delay Load Phenomenon

° Although Load is fetched during Cycle 1:

• The data is NOT written into the Reg File until the end of Cycle 5

• We cannot read this value from the Reg File until Cycle 6

• 3-instruction delay before the load take effect

° This is referred to as Data Hazard:

• Clever design techniques can reduce the delay to ONE instruction

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Ifetch Reg/Dec Exec Mem WrI0: Load

Ifetch Reg/Dec Exec Mem WrPlus 1

Ifetch Reg/Dec Exec Mem WrPlus 2

Ifetch Reg/Dec Exec Mem WrPlus 3

Ifetch Reg/Dec Exec Mem WrPlus 4

Page 39: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.39 Intro to Computer Architecture

Summary

° Disadvantages of the Single Cycle Processor

• Long cycle time

• Cycle time is too long for all instructions except the Load

° Multiple Clock Cycle Processor:

• Divide the instructions into smaller steps

• Execute each step (instead of the entire instruction) in one cycle

° Pipeline Processor:

• Natural enhancement of the multiple clock cycle processor

• Each functional unit can only be used once per instruction

• If a instruction is going to use a functional unit:

- it must use it at the same stage as all other instructions

• Pipeline Control:

- Each stage’s control signal depends ONLY on the instruction that is currently in that stage

Page 40: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.40 Intro to Computer Architecture

Single Cycle, Multiple Cycle, vs. Pipeline

Clk

Cycle 1

Multiple Cycle Implementation:

Ifetch Reg Exec Mem Wr

Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Load Ifetch Reg Exec Mem Wr

Ifetch Reg Exec Mem

Load Store

Pipeline Implementation:

Ifetch Reg Exec Mem WrStore

Clk

Single Cycle Implementation:

Load Store Waste

Ifetch

R-type

Ifetch Reg Exec Mem WrR-type

Cycle 1 Cycle 2

Page 41: CPE 442 pipeline.1 Intro to Computer Architecture CpE 242 Computer Architecture and Engineering Designing a Pipeline Processor

CPE 442 pipeline.41 Intro to Computer Architecture

Where to get more information?

° Everything You Need to know about Pipeline Computer:

• Peter Kogge, “The Architecture of Pipeline Computers,” McGraw Hill Book Company, 1981

° Some Classic References on RISC Pipelines:

• Manolis Katevenis, “Reduced Instruction Set Computer Architectures for VLSI,” PhD Thesis, UC Berkeley, 1984.

° Other references:

• David. A Patterson, “Reduced Instruction Set Computers,” Communications of the ACM, January 1985.

• Shing Kong, “Performance, Resources, and Complexity,” PhD Thesis, UC Berkeley, 1989.