
Lecture 3: Review CPU Design

Alvin R. Lebeck

CPS 220

Fall 2001


Administrivia

• Read Chapter 3

• Homework #1

Processor Design

• Control and Datapath

• Pipelining

• If you need more information, see Chapters 5 and 6 of Patterson & Hennessy, “Computer Organization & Design”


Basic ISA Classes

Accumulator:
    1 address      add A           acc <- acc + mem[A]
    1+x address    addx A          acc <- acc + mem[A + x]

Stack:
    0 address      add             tos <- tos + next   (Java VM)

General Purpose Register:
    2 address      add A B         A <- A + B
    3 address      add A B C       A <- B + C

Load/Store:
    3 address      add Ra Rb Rc    Ra <- Rb + Rc
                   load Ra Rb      Ra <- mem[Rb]
                   store Ra Rb     mem[Rb] <- Ra


VAX-11

• Variable format, 2 and 3 address instructions

• 32-bit word size, 16 GPRs (four reserved)

• Rich set of addressing modes (apply to any operand)

• Rich set of operations

– bit field, stack, call, case, loop, string, poly, system

• Rich set of data types (B, W, L, Q, O, F, D, G, H)

• Condition codes

[Figure: instruction format: OpCode | A/M | A/M | A/M, at bytes 0, 1, n, m]


Kinds of Addressing Modes

• Register direct Ri

• Immediate (literal) v

• Direct (absolute) M[v]

• Register indirect M[Ri]

• Base+Displacement M[Ri + v]

• Base+Index M[Ri + Rj]

• Scaled Index M[Ri + Rj*d + v]

• Autoincrement M[Ri++]

• Autodecrement M[Ri--]

• Memory Indirect M[M[Ri]]

[Figure: instruction fields Ri, Rj, v; register file; memory]
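The addressing modes listed above can be sketched as a small operand lookup. This is an illustrative model only: the function name `operand`, the mode strings, the dict-based memory, and the step size of 1 for autoincrement and autodecrement are all assumptions, not part of the lecture. The sketch follows the slide's `M[Ri++]` and `M[Ri--]` notation literally (access first, then update); real machines such as the VAX decrement before the access.

```python
def operand(mode, regs, mem, i=0, j=0, v=0, d=1):
    """Return the operand selected by one addressing mode.

    regs: list of register values (Ri is regs[i], Rj is regs[j]).
    mem:  dict mapping address -> value (a toy memory).
    v:    literal or displacement; d: scale factor.
    """
    if mode == "register_direct":
        return regs[i]                      # Ri
    if mode == "immediate":
        return v                            # v
    if mode == "direct":
        return mem[v]                       # M[v]
    if mode == "register_indirect":
        return mem[regs[i]]                 # M[Ri]
    if mode == "base_displacement":
        return mem[regs[i] + v]             # M[Ri + v]
    if mode == "base_index":
        return mem[regs[i] + regs[j]]       # M[Ri + Rj]
    if mode == "scaled_index":
        return mem[regs[i] + regs[j] * d + v]   # M[Ri + Rj*d + v]
    if mode == "autoincrement":
        value = mem[regs[i]]                # M[Ri++]: access, then step (assumed step of 1)
        regs[i] += 1
        return value
    if mode == "autodecrement":
        value = mem[regs[i]]                # M[Ri--]: slide notation taken literally
        regs[i] -= 1
        return value
    if mode == "memory_indirect":
        return mem[mem[regs[i]]]            # M[M[Ri]]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = [0, 100, 4]
mem = {100: 7, 104: 9, 108: 11, 7: 42, 50: 5}
base_disp = operand("base_displacement", regs, mem, i=1, v=4)
scaled = operand("scaled_index", regs, mem, i=1, j=2, d=2)
indirect = operand("memory_indirect", regs, mem, i=1)
```

Only the base + displacement case survives in the "typical" RISC of the next slide; the other modes have to be built from separate instructions.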


A "Typical" RISC

• 32-bit fixed format instruction (3 formats)

• 32 64-bit GPR (R0 contains zero)

• 3-address, reg-reg arithmetic instruction

• Single address mode for load/store: base + displacement

– no indirection

• Simple branch conditions

• Delayed branch (sometimes)

see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PARisc, POWERPC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3


The Big Picture

• The Five Classic Components of a Computer

• Today’s Topic: Datapath and Control Design

[Figure: Processor (Control and Datapath), Memory, Input, Output]


The Big Picture: The Performance Perspective

• Performance of a machine is determined by:

– Instruction count

– Clock cycle time

– Clock cycles per instruction

• Processor design (datapath and control) will determine:

– Clock cycle time

– Clock cycles per instruction

• In this lecture:

– Single cycle processor:

» Advantage: One clock cycle per instruction

» Disadvantage: long cycle time

– Multi cycle processor
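The three factors above multiply to give execution time. A minimal sketch of that product follows; the function name `cpu_time_ns` and every number in the example are hypothetical, chosen only to show how a longer cycle with CPI = 1 trades off against a shorter cycle with a higher CPI.

```python
def cpu_time_ns(instruction_count, cycles_per_instruction, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cycles_per_instruction * cycle_time_ns


# Hypothetical machines running the same 1000-instruction program.
single_cycle = cpu_time_ns(1000, 1, 10)   # CPI of 1, long cycle
multi_cycle = cpu_time_ns(1000, 4, 2)     # higher CPI, short cycle
```

With these made-up numbers the multi cycle design wins, but the outcome depends entirely on the actual cycle times and instruction mix.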


The MIPS Instruction Formats

• All MIPS instructions are 32 bits long. The three instruction formats:

R-type:  op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)

I-type:  op (bits 31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)

J-type:  op (bits 31:26, 6 bits) | target address (25:0, 26 bits)

• Fields:

– op: operation of the instruction

– rs, rt, rd: the source and destination register specifiers

– shamt: shift amount

– funct: selects the variant of the operation in the “op” field

– address / immediate: address offset or immediate value

– target address: target address of the jump instruction
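The field boundaries above translate directly into shifts and masks. The sketch below extracts every field from a 32-bit instruction word; the function name `decode` and the dictionary layout are illustrative choices, not something the lecture defines. All fields are extracted regardless of format, so the caller picks the ones that apply.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (word >> 26) & 0x3F,       # bits 31:26
        "rs": (word >> 21) & 0x1F,       # bits 25:21
        "rt": (word >> 16) & 0x1F,       # bits 20:16
        "rd": (word >> 11) & 0x1F,       # bits 15:11 (R-type)
        "shamt": (word >> 6) & 0x1F,     # bits 10:6  (R-type)
        "funct": word & 0x3F,            # bits 5:0   (R-type)
        "immediate": word & 0xFFFF,      # bits 15:0  (I-type)
        "target": word & 0x03FFFFFF,     # bits 25:0  (J-type)
    }


add_fields = decode(0x00221820)   # add $3, $1, $2
lw_fields = decode(0x8C410100)    # lw $1, 0x100($2)
```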


An Abstract View of the Implementation

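[Figure: PC and ideal instruction memory (Instruction Address in, Instruction out); Rs, Rt, Rd (5 bits each) and Imm (16 bits) taken from the instruction; 32 32-bit registers with Rw, Ra, Rb; ALU with 32-bit inputs A and B; ideal data memory with Data Address, Data In, Data Out; Clk on the PC, registers, and data memory]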

The Steps of Designing a Processor

• Instruction Set Architecture => Register Transfer Language

• Register Transfer Language =>

– Datapath components

– Datapath interconnect

• Datapath components => Control signals

• Control signals => Control logic


RTL: The ADD Instruction

• add rd, rs, rt

– mem[PC] Fetch the instruction from memory

– R[rd] <- R[rs] + R[rt] The actual operation

– PC <- PC + 4 Calculate the next instruction’s address

R-type format: op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
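The three RTL steps for add can be traced in a few lines. This is a sketch under stated assumptions: the name `execute_add`, the dict used as instruction memory, and the 32-bit wraparound on overflow are illustrative (a real MIPS `add` raises an overflow exception instead of wrapping).

```python
def execute_add(pc, regs, imem):
    """Apply the RTL of 'add rd, rs, rt' and return the next PC."""
    word = imem[pc]                                  # mem[PC]: fetch the instruction
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF    # R[rd] <- R[rs] + R[rt]
    return pc + 4                                    # PC <- PC + 4


regs = [0] * 32
regs[1], regs[2] = 5, 7
imem = {0: 0x00221820}            # add $3, $1, $2
next_pc = execute_add(0, regs, imem)
```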


Combinational Logic Elements (Building Blocks)

• Adder: 32-bit inputs A and B, CarryIn; outputs 32-bit Sum and Carry

• MUX: 32-bit inputs A and B, Select; output 32-bit Y

• ALU: 32-bit inputs A and B, OP; outputs 32-bit Result and Zero


Storage Element: Register (Building Block)

• Register

– Similar to the D Flip Flop except

» N-bit input and output

» Write Enable input

– Write Enable:

» negated (0): Data Out will not change

» asserted (1): Data Out will become Data In

[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]


Storage Element: Register File

• Register File consists of 32 registers:

– Two 32-bit output busses: busA and busB

– One 32-bit input bus: busW

• Register is selected by:

– RA selects the register to put on busA

– RB selects the register to put on busB

– RW selects the register to be written via busW when Write Enable is 1

• Clock input (CLK)

– The CLK input is a factor ONLY during write operation

– During read operation, behaves as a combinational logic block:

» RA or RB valid => busA or busB valid after “access time.”

[Figure: 32 32-bit registers with 5-bit RW, RA, RB selects, 32-bit busW in, 32-bit busA and busB out, Write Enable, and Clk]
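The read/write behaviour described above can be modelled with a small class. This is a behavioural sketch, not a hardware description: the class name `RegisterFile` and the method names are assumptions, reads are modelled as immediate (no access time), and register 0 is not hardwired to zero because this slide does not say so.

```python
class RegisterFile:
    """32 registers of 32 bits: combinational reads, clocked write."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for selects RA and RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On a clock edge, write busW into register RW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(5, 42, write_enable=0)   # Write Enable negated: no change
before = rf.read(5, 0)
rf.clock_edge(5, 42, write_enable=1)   # Write Enable asserted: register 5 updated
after = rf.read(5, 5)
```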


Storage Element: Idealized Memory

• Memory (idealized)

– One input bus: Data In

– One output bus: Data Out

• Memory word is selected by:

– Address selects the word to put on Data Out

– Write Enable = 1: address selects the memory word to be written via the Data In bus

• Clock input (CLK)

– The CLK input is a factor ONLY during write operation

– During read operation, behaves as a combinational logic block:

» Address valid => Data Out valid after “access time.”

• Looks similar to register file. Why have registers?

[Figure: memory with Address, 32-bit Data In, 32-bit Data Out, Write Enable, and Clk]


Overview of the Instruction Fetch Unit

• The common RTL operations

– Fetch the Instruction: mem[PC]

– Update the program counter:

» Sequential Code: PC <- PC + 4

» Branch and Jump: PC <- “something else”

[Figure: PC (clocked by Clk) supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the new PC]


Datapath for Register-Register Operations

• R[rd] <- R[rs] op R[rt] Example: add rd, rs, rt

– Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields

– ALUctr and RegWr: control logic after decoding the instruction

[Figure: register file (Ra = Rs, Rb = Rt, Rw = Rd; RegWr, Clk) drives 32-bit busA and busB into the ALU (ALUctr); the 32-bit Result returns on busW]

R-type format: op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)


A Single Cycle Datapath

• We have everything except control signals (underline)

[Figure: single cycle datapath: Instruction Fetch Unit (inputs Clk, Zero, Branch, Jump; output Instruction<31:0>), register file, extender, ALU, and data memory, with muxes controlled by RegDst, ALUSrc, and MemtoReg. Control signals: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, Branch, Jump. Instruction fields: Rs <25:21>, Rt <20:16>, Rd <15:11>, Imm16 <15:0>]


Instruction Fetch Unit at the Beginning of Add / Subtract

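• Fetch the instruction from Instruction memory: Instruction <- mem[PC]

– This is the same for all instructions

– Branch, Zero, and Jump still hold their previous values

[Figure: instruction fetch unit: 30-bit PC drives Addr<31:2> of the Instruction Memory with Addr<1:0> = “00”; one adder adds “1” to the PC; a second adder adds the sign-extended imm16 (Instruction<15:0>); a mux controlled by Branch and Zero picks between them; the jump Target (30 bits) is formed from PC<31:28> (4 bits) and Instruction<25:0> (26 bits) and selected by a mux controlled by Jump]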
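The next-address logic in this fetch unit can be sketched as a function on the 30-bit word address. Names such as `next_pc` are illustrative, and the sketch assumes the branch offset is added to the already-incremented PC and that the jump target reuses the top four bits of the current PC, which is how the figure labels read; the original drawing is not recoverable in full.

```python
MASK30 = (1 << 30) - 1


def next_pc(pc, instruction, branch, zero, jump):
    """Return the next byte address given the current PC and control signals."""
    pc30 = pc >> 2                                   # Addr<31:2>; Addr<1:0> is always "00"
    imm16 = instruction & 0xFFFF                     # Instruction<15:0>
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16   # sign extension
    sequential = (pc30 + 1) & MASK30                 # adder with constant "1"
    if jump:
        # PC<31:28> concatenated with Instruction<25:0>
        target = (pc30 & 0x3C000000) | (instruction & 0x03FFFFFF)
    elif branch and zero:
        target = (sequential + offset) & MASK30
    else:
        target = sequential
    return target << 2                               # append "00"


sequential_pc = next_pc(8, 0x00221820, branch=0, zero=0, jump=0)
taken_branch_pc = next_pc(8, 0x10000003, branch=1, zero=1, jump=0)
jump_pc = next_pc(0x10000008, 0x08000040, branch=0, zero=0, jump=1)
```

Working in word addresses is why the datapath needs only a “1” adder rather than a “4” adder.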


The Single Cycle Datapath during Add and Subtract

• R[rd] <- R[rs] + / - R[rt]

– Control signals: RegDst = 1, RegWr = 1, ALUSrc = 0, ALUctr = Add or Subtract, ExtOp = x, MemWr = 0, MemtoReg = 0, Branch = 0, Jump = 0

[Figure: the single cycle datapath annotated with these control values]

R-type format: op | rs | rt | rd | shamt | funct


Instruction Fetch Unit at the End of Add and Subtract

• PC <- PC + 4

– This is the same for all instructions except: Branch and Jump

– Control signals: Branch = 0, Zero = x, Jump = 0

[Figure: the instruction fetch unit annotated with these control values]


The “Truth Table” for RegWrite

            R-type    ori       lw        sw        beq       jump
op          00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegWrite    1         1         1         0         0         0

• RegWrite = R-type + ori + lw

  = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type)

  + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori)

  + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)

[Figure: op<5..0> decoded by AND gates into R-type, ori, lw, sw, beq, jump; an OR gate over R-type, ori, and lw produces RegWrite]
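The sum-of-products equation for RegWrite can be checked against the truth table with a few lines of code. The function name `reg_write` and the list-based bit handling are illustrative; only the six opcodes in the table are considered.

```python
def reg_write(op):
    """Evaluate RegWrite = R-type + ori + lw from the 6-bit opcode."""
    b = [(op >> k) & 1 for k in range(6)]   # b[k] is op<k>
    n = [1 - bit for bit in b]              # n[k] is !op<k>
    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]   # 00 0000
    ori = n[5] & n[4] & b[3] & b[2] & n[1] & b[0]      # 00 1101
    lw = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]       # 10 0011
    return r_type | ori | lw


OPCODES = {
    "R-type": 0b000000,
    "ori": 0b001101,
    "lw": 0b100011,
    "sw": 0b101011,
    "beq": 0b000100,
    "jump": 0b000010,
}
table = {name: reg_write(op) for name, op in OPCODES.items()}
```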


PLA Implementation of the Main Control

[Figure: PLA: op<5..0> is decoded into the lines R-type, ori, lw, sw, beq, jump, which are combined to produce the outputs RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]


Putting it All Together: A Single Cycle Processor

[Figure: the single cycle datapath together with its control: Main Control takes op = Instr<31:26> (6 bits) and produces RegDst, ALUSrc, the other datapath control signals, and a 3-bit ALUop; ALU Control takes func = Instr<5:0> (6 bits) and ALUop and produces the 3-bit ALUctr; Instr<15:0> supplies Imm16]


Drawback of this Single Cycle Processor

• Long cycle time:

– Cycle time must be long enough for the load instruction:

» PC’s Clock-to-Q +

» Instruction Memory Access Time +

» Register File Access Time +

» ALU Delay (address calculation) +

» Data Memory Access Time +

» Register File Setup Time +

» Clock Skew

• Cycle time is much longer than needed for all other instructions
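The load's critical path is just the sum of the delays listed above. The sketch below adds them up for a load and for an R-type instruction. Every delay value is hypothetical (the lecture gives none), and the assumption that an R-type path simply skips the data memory is a simplification made for illustration.

```python
# Hypothetical delays in nanoseconds; none of these numbers come from the lecture.
DELAYS_NS = {
    "pc_clock_to_q": 1,
    "instruction_memory_access": 10,
    "register_file_access": 5,
    "alu_delay": 8,
    "data_memory_access": 10,
    "register_file_setup": 1,
    "clock_skew": 1,
}

LOAD_PATH = list(DELAYS_NS)   # a load passes through every component
# Assumption: an R-type instruction skips only the data memory access.
R_TYPE_PATH = [name for name in DELAYS_NS if name != "data_memory_access"]


def path_delay(path, delays=DELAYS_NS):
    """Sum the delays of the components an instruction passes through."""
    return sum(delays[name] for name in path)


load_ns = path_delay(LOAD_PATH)
r_type_ns = path_delay(R_TYPE_PATH)
cycle_time_ns = max(load_ns, r_type_ns)   # the clock must fit the slowest instruction
```

Under these made-up numbers every R-type instruction wastes the difference between the two paths on each cycle.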


Overview of a Multiple Cycle Implementation

• The root of the single cycle processor’s problems:

– The cycle time has to be long enough for the slowest instruction

• Solution:

– Break the instruction into smaller steps

– Execute each step (instead of the entire instruction) in 1 clock cycle

» Cycle time: time it takes to execute the longest step

» Try to make all the steps have similar length

– This is the essence of the multiple cycle processor

• The advantages of the multiple cycle processor:

– Cycle time is much shorter

– Different instructions take different number of cycles to complete

» Load takes five cycles

» Jump only takes three cycles

– Allows a functional unit to be used more than once per instruction


The Five Steps of a Load Instruction

[Figure: timing diagram for one load, divided into the steps Instruction Fetch, Instr Decode / Reg Fetch, Address, Data Memory, Reg Wr. Signals shown changing from old value to new value: Clk, PC, Rs/Rt/Rd/Op/Func, ALUctr, RegWr, ExtOp, ALUSrc, busA, busB, Address, busW. Delays shown: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, Register File Write Time]


Multiple Cycle Datapath

[Figure: multiple cycle datapath: one Ideal Memory (WrAdr, Din, RAdr, Dout) shared by instructions and data, Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), Extend and << 2 on the 16-bit Imm, one ALU with ALU Control, a Target register, and the PC. Control signals: IorD, MemWr, IRWr, RegDst, RegWr, ExtOp, ALUSelA, ALUSelB, ALUOp, MemtoReg, PCWr, PCWrCond, PCSrc, BrWr. Control takes Op and Func (6 bits each) plus Zero]


Where to get more information?

• Chapter 5 of CPS 104 text book:

– David Patterson and John Hennessy, “Computer Organization & Design: The Hardware / Software Interface,” Morgan Kaufmann Publishers, San Mateo, California, 1994.

• For a reference on the MIPS architecture:

– Gerry Kane, “MIPS RISC Architecture,” Prentice Hall.

• Now: Pipelining


Introduction to Pipelining


Overview

• A Pipelined Processor:

– Introduction to the concept of pipelined processor.

– Pipelined Datapath

– Pipeline example: Load Instruction

Reading:

• Chapter 3

• Or Chapters 5, 6 in the CPS104 text


Pipelining: It’s Natural!

• Laundry Example

• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold

• Washer takes 30 minutes

• Dryer takes 40 minutes

• “Folder” takes 20 minutes

• How long to do laundry?



Sequential Laundry

• Sequential laundry takes 6 hours for 4 loads

• If they learned pipelining, how long would laundry take?

[Figure: task order versus time, 6 PM to midnight: loads A, B, C, D run one after another, each taking 30 + 40 + 20 minutes]


Pipelined Laundry: Start work ASAP

• Pipelined laundry takes 3.5 hours for 4 loads

[Figure: task order versus time, starting at 6 PM: loads A, B, C, D overlap; the timeline is 30, 40, 40, 40, 40, 20 minutes, ending at 9:30 PM]
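The 6-hour and 3.5-hour figures can be reproduced with a short calculation. The function names below are illustrative, and the pipelined model assumes a finished load can wait between machines (for example in a basket) until the next machine is free.

```python
STAGE_MINUTES = [30, 40, 20]   # washer, dryer, "folder"
LOADS = 4                      # Ann, Brian, Cathy, Dave


def sequential_minutes(stages, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(stages, loads):
    """Each load starts a stage as soon as it and the stage are both free."""
    stage_free_at = [0] * len(stages)   # time each machine next becomes free
    for _ in range(loads):
        ready = 0                       # time this load finished its previous stage
        for s, minutes in enumerate(stages):
            start = max(ready, stage_free_at[s])
            stage_free_at[s] = start + minutes
            ready = stage_free_at[s]
    return stage_free_at[-1]


sequential = sequential_minutes(STAGE_MINUTES, LOADS)   # 360 minutes = 6 hours
pipelined = pipelined_minutes(STAGE_MINUTES, LOADS)     # 210 minutes = 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20: once the pipeline is full, the 40-minute dryer sets the pace.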


Pipelining Lessons

• Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload

• Pipeline rate is limited by the slowest pipeline stage

• Multiple tasks operate simultaneously

• Potential speedup = Number of pipe stages

• Unbalanced lengths of pipe stages reduce speedup

• Time to “fill” pipeline and time to “drain” it reduce speedup

[Figure: the pipelined laundry timeline, 6 PM to 9:30 PM, with segments of 30, 40, 40, 40, 40, 20 minutes]


Review: a Multiple-Cycle Implementation

• The main single cycle processor’s problem:

– The cycle time has to be long enough for the slowest instruction to complete execution.

• Solution:

– Break instruction execution into smaller steps

– Execute each step in one cycle (instead of the entire instruction).

» Short cycle time: time it takes to execute the longest step

» Make all the steps have similar length

– This is the essence of the multiple cycle processor

• Multiple-cycle processor advantages:

– Cycle time is much shorter

– Different instructions take different number of cycles to complete

» Load takes five cycles

» Jump only takes three cycles

– Allows a functional unit to be used more than once per instruction


Multiple Cycle Processor

• MCP: If a functional unit is used more than once per instruction -> cannot pipeline -> lower performance

[Figure: the multiple cycle datapath: one Ideal Memory, Instruction Reg, Reg File, Extend, one ALU with ALU Control, Target register, and PC, with control signals IorD, MemWr, IRWr, RegDst, RegWr, ExtOp, ALUSelA, ALUSelB, ALUOp, MemtoReg, PCWr, PCWrCond, PCSrc, BrWr]


The Five Stages of a Load

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load      Ifetch    Reg/Dec   Exec      Mem       WrB

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Registers Fetch and Instruction Decode

• Exec: Calculate the memory address

• Mem: Read the data from the Data Memory

• WrB: Write the data back to the register file


Key Ideas Behind Instruction Execution Pipelining

• The load instruction has 5 stages:

– Five independent functional units to work on each stage

» Each functional unit is used only once!

– A 2nd load can start doing Ifetch as soon as the 1st load finishes its Ifetch stage.

– Each load still takes five cycles to complete.

» latency is still 5 cycles

– The throughput is much higher:

» CPI is 1 with ~1/5th the cycle time.

– Instructions start executing before previous instructions complete execution.

Load:  Ifetch | Reg/Dec | Exec | Mem | WrB


Pipelining the Load Instruction

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7
1st lw    Ifetch    Reg/Dec   Exec      Mem       WrB
2nd lw              Ifetch    Reg/Dec   Exec      Mem       WrB
3rd lw                        Ifetch    Reg/Dec   Exec      Mem       WrB

• The five independent pipeline stages are:

– Read Next Instruction: The Ifetch stage.

– Decode Instruction and fetch register values: The Reg/Dec stage

– Execute the operation: The Exec stage.

– Access Data-Memory: The Mem stage.

– Write Data to Destination Register: The WrB stage

• One instruction enters the pipeline every cycle

– One instruction comes out of the pipeline (completed) every cycle

– The “Effective” Cycles per Instruction (CPI) is 1; ~1/5 cycle time
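The staggered diagram for back-to-back loads can be generated mechanically: instruction i occupies stage s during cycle i + s + 1. The function name `schedule` and the list-of-rows representation are illustrative, and the sketch assumes an ideal pipeline with no stalls.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]


def schedule(n_instructions, stages=STAGES):
    """Return rows where rows[i][c] is the stage of instruction i in cycle c + 1."""
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [None] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name          # one instruction enters per cycle
        rows.append(row)
    return rows


rows = schedule(3)
total_cycles = len(rows[0])            # 3 loads finish in 7 cycles
```

For n instructions the total is n + 4 cycles, so the per-instruction cost approaches one cycle as n grows, which is the "effective CPI of 1" claim.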


The Four Stages of R-Type

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
R-type    Ifetch    Reg/Dec   Exec      WrB

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Registers Fetch and Instruction Decode

• Exec: ALU operates on the two register operands

• WrB: Write the ALU output back to the register file


Pipelining the R-type and Load Instruction

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9
R-type    Ifetch    Reg/Dec   Exec      Wr
R-type              Ifetch    Reg/Dec   Exec      Wr
Load                          Ifetch    Reg/Dec   Exec      Mem       Wr
R-type                                  Ifetch    Reg/Dec   Exec      Wr
R-type                                            Ifetch    Reg/Dec   Exec      Wr

OOPS! We have a problem! (In Cycle 7, the Load and the R-type that follows it both reach Wr.)

• We have a problem called a structural hazard or pipeline conflict:

– Two instructions try to write to the register file at the same time!


Important Observations

• Each functional unit can only be used once per instruction.

• Each functional unit must be used at the same stage for all instructions:

– Load uses Register File’s Write Port during its 5th stage.

– R-type uses Register File’s Write Port during its 4th stage.

          1         2         3         4         5
Load      Ifetch    Reg/Dec   Exec      Mem       WrB
R-type    Ifetch    Reg/Dec   Exec      WrB

• How do we solve this pipeline hazard?


Solution: Delay R-type’s Write by One Cycle

• Delay R-type’s register write by one cycle:

– Now R-type instructions also use Reg File’s write port at Stage 5

– Mem stage is a NO-OP stage: nothing is being done. Effective CPI?

          1         2         3         4         5
R-type    Ifetch    Reg/Dec   Exec      Mem       Wr

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9
R-type    Ifetch    Reg/Dec   Exec      Mem       WrB
R-type              Ifetch    Reg/Dec   Exec      Mem       WrB
Load                          Ifetch    Reg/Dec   Exec      Mem       WrB
R-type                                  Ifetch    Reg/Dec   Exec      Mem       WrB
R-type                                            Ifetch    Reg/Dec   Exec      Mem       WrB
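The write-port conflict and its fix can both be checked by computing the cycle in which each instruction reaches its write stage. The names below (`write_port_conflicts`, the stage lists) are illustrative, and the model assumes one instruction is issued per cycle with no stalls.

```python
R_TYPE_4 = ["Ifetch", "Reg/Dec", "Exec", "Wr"]          # original 4-stage R-type
R_TYPE_5 = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]   # R-type with a NO-OP Mem stage
LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def write_port_conflicts(program):
    """Return {cycle: [instruction indexes]} for cycles with more than one writer.

    program is a list of stage lists; instruction k is issued in cycle k + 1.
    """
    writers = {}
    for issue, stages in enumerate(program):
        write_cycle = issue + 1 + stages.index("Wr")
        writers.setdefault(write_cycle, []).append(issue)
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


hazard = write_port_conflicts([R_TYPE_4, R_TYPE_4, LOAD, R_TYPE_4, R_TYPE_4])
fixed = write_port_conflicts([R_TYPE_5, R_TYPE_5, LOAD, R_TYPE_5, R_TYPE_5])
```

With the 4-stage R-type, the load (index 2) and the next R-type (index 3) both write in Cycle 7. Once every instruction writes in its 5th stage, consecutive instructions always write in consecutive cycles, so no conflict remains.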


The Four Stages of a Store

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
Store     Ifetch    Reg/Dec   Exec      Mem       WrB

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Registers Fetch and Instruction Decode

• Exec: Calculate the memory address

• Mem: Write the data into the Data Memory


The Four Stages of BEQ

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
Beq       Ifetch    Reg/Dec   Exec      Mem       WrB

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Registers Fetch and Instruction Decode

• Exec: ALU compares the two register operands

– Adder calculates the branch target address

• Mem: If the registers we compared in the Exec stage are the same,

– Write the branch target address into the PC


A Pipelined Datapath

[Figure: pipelined datapath with stages Ifetch, Reg/Dec, Exec, Mem, WrB separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Units: PC and IF Unit, RFile (Ra, Rb, Rw, Di; busA, busB), Exec Unit, Data Mem (WA, Di, RA, Do). Values carried between stages: PC+4, Rs, Rt, Rd, Imm16, Zero. Control signals: RegWr, ExtOp, ALUOp, ALUSrc, RegDst, MemWr, Branch, MemtoReg; Clk]


The Instruction Fetch Stage

• Location 8: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

[Figure: the pipelined datapath with “You are here!” at the Ifetch stage; the IF/ID register holds lw $1, 100 ($2) and PC = 12]


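Detailed View of the Instruction Fetch Unit

• Location 8: lw $1, 0x100($2)

[Figure: the PC (“8”) drives the Address of the Instruction Memory; an adder adds “4” to form PC+4; the Instruction (lw $1, 100 ($2)) and the 32-bit PC+4 go to the register between Ifetch and Reg/Dec; PC = 12; “You are here!” marks the Ifetch stage]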


The Decode / Register Fetch Stage

• Location 8: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

[Figure: the pipelined datapath with “You are here!” at the Reg/Dec stage; the ID/Ex register receives Reg. 2 & 0x100]

Detailed View of the Fetch/Decode Stage

[Figure: the fields Op, rs, rt, rd, func, Imm16, and PC+4 feed Control and the Register File (Ra = rs, Rb = rt, Rw, Din); Bus-A, Bus-B, rt, rd, Imm16, and PC+4 pass to the next pipeline register; both registers are clocked by Clk]


Load’s Address Calculation Stage

• Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

– Control signals: ExtOp = 1, ALUSrc = 1, ALUOp = Add, RegDst = 0

[Figure: the pipelined datapath with “You are here!” at the Exec stage; the Ex/Mem register receives the load’s address]


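Detailed View of the Execution Unit

[Figure: between the ID/Ex and Ex/Mem registers: the Extender (ExtOp = 1) widens imm16; a mux (ALUSrc = 1) selects between busB and the extended immediate; the ALU (busA as the other input, 3-bit ALUctr from ALU Control with ALUOp = Add) produces the 32-bit ALUout and Zero; a separate adder adds PC+4 and the immediate shifted left by 2 to form the 32-bit Target; Ex/Mem receives the load’s memory address; “You are here!” marks the Exec stage]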


Load’s Memory Access Stage

• Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

– Control signals: MemWr = 0, Branch = 0

[Figure: the pipelined datapath with “You are here!” at the Mem stage; the Mem/Wr register receives the load’s data]


Load’s Write Back Stage

• Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

– Control signals: RegWr = 1, MemtoReg = 1

[Figure: the pipelined datapath with the load in the Wr stage (“You are somewhere out there!”)]


Next Time

• Pipeline Complications