Lecture 3: Review CPU Design
Alvin R. Lebeck
CPS 220
Fall 2001
CPS 220 2© Alvin R. Lebeck 2001
Administrivia
• Read Chapter 3
• Homework #1
Processor Design
• Control and Datapath
• Pipelining
• If you need more information, see Chapters 5 and 6 of Patterson & Hennessy, “Computer Organization & Design”
Basic ISA Classes
Class                      Addresses     Instruction     Meaning
Accumulator                1 address     add A           acc <- acc + mem[A]
                           1+x address   addx A          acc <- acc + mem[A + x]
Stack                      0 address     add             tos <- tos + next (Java VM)
General Purpose Register   2 address     add A B         A <- A + B
                           3 address     add A B C       A <- B + C
Load/Store                 3 address     add Ra Rb Rc    Ra <- Rb + Rc
                                         load Ra Rb      Ra <- mem[Rb]
                                         store Ra Rb     mem[Rb] <- Ra
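As a rough illustration of the ISA classes above, the following Python sketch computes mem[C] = mem[A] + mem[B] on three toy machines. The mini-interpreters, their mnemonics (`load`, `add`, `store`, `push`, `pop`), and the memory layout are hypothetical and chosen only for illustration. For brevity the load/store machine here takes a direct memory address rather than an address held in a register.

```python
# Hypothetical mini-machines; mnemonics and memory layout are illustrative only.

def run_accumulator(mem, program):
    """1-address machine: every operation implicitly uses the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = acc + mem[addr]              # acc <- acc + mem[A]
        elif op == "store":
            mem[addr] = acc
    return mem

def run_stack(mem, program):
    """0-address machine: add takes both operands from the stack."""
    stack = []
    for op, addr in program:
        if op == "push":
            stack.append(mem[addr])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())   # tos <- tos + next
        elif op == "pop":
            mem[addr] = stack.pop()
    return mem

def run_load_store(mem, program):
    """3-address register machine: only load and store touch memory."""
    reg = {}
    for op, *args in program:
        if op == "load":
            ra, addr = args
            reg[ra] = mem[addr]                # Ra <- mem[addr]
        elif op == "add":
            ra, rb, rc = args
            reg[ra] = reg[rb] + reg[rc]        # Ra <- Rb + Rc
        elif op == "store":
            ra, addr = args
            mem[addr] = reg[ra]                # mem[addr] <- Ra
    return mem

A, B, C = 0, 1, 2
acc_mem = run_accumulator([3, 4, 0], [("load", A), ("add", B), ("store", C)])
stk_mem = run_stack([3, 4, 0], [("push", A), ("push", B), ("add", None), ("pop", C)])
ls_mem = run_load_store([3, 4, 0], [("load", "r1", A), ("load", "r2", B),
                                    ("add", "r3", "r1", "r2"), ("store", "r3", C)])
```

All three programs leave the same sum in mem[C]; what differs is how many operands each instruction names explicitly.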
VAX-11
Variable format, 2 and 3 address instruction
• 32-bit word size, 16 GPR (four reserved)
• Rich set of addressing modes (apply to any operand)
• Rich set of operations
– bit field, stack, call, case, loop, string, poly, system
• Rich set of data types (B, W, L, Q, O, F, D, G, H)
• Condition codes
[Figure: instruction layout with OpCode at byte 0, followed by address-mode (A/M) operand specifiers at bytes 1, n, and m]
Kinds of Addressing Modes
• Register direct Ri
• Immediate (literal) v
• Direct (absolute) M[v]
• Register indirect M[Ri]
• Base+Displacement M[Ri + v]
• Base+Index M[Ri + Rj]
• Scaled Index M[Ri + Rj*d + v]
• Autoincrement M[Ri++]
• Autodecrement M[Ri--]
• Memory Indirect M[M[Ri]]
[Figure: register file (Ri, Rj), literal v, and memory]
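The addressing modes listed above can be sketched as a single operand-fetch function. This is a minimal Python sketch under stated assumptions: `operand`, its mode names, and the word-indexed `reg` and `mem` lists are hypothetical. Autoincrement and autodecrement follow the slide's notation literally (use Ri, then update it by 1); real machines typically step by the operand size, and the VAX decrements before the access.

```python
# Hypothetical model: reg and mem are plain lists indexed by word.
def operand(mode, reg, mem, i=0, j=0, v=0, d=1):
    """Return the operand value selected by one addressing mode."""
    if mode == "register":            # Ri
        return reg[i]
    if mode == "immediate":           # v
        return v
    if mode == "direct":              # M[v]
        return mem[v]
    if mode == "reg_indirect":        # M[Ri]
        return mem[reg[i]]
    if mode == "base_disp":           # M[Ri + v]
        return mem[reg[i] + v]
    if mode == "base_index":          # M[Ri + Rj]
        return mem[reg[i] + reg[j]]
    if mode == "scaled_index":        # M[Ri + Rj*d + v]
        return mem[reg[i] + reg[j] * d + v]
    if mode == "autoincrement":       # M[Ri++]: use Ri, then add 1
        value = mem[reg[i]]
        reg[i] += 1
        return value
    if mode == "autodecrement":       # M[Ri--]: use Ri, then subtract 1
        value = mem[reg[i]]
        reg[i] -= 1
        return value
    if mode == "memory_indirect":     # M[M[Ri]]
        return mem[mem[reg[i]]]
    raise ValueError(f"unknown addressing mode: {mode}")

reg = [0, 2, 1]
mem = [10, 4, 12, 13, 3, 15]
value = operand("base_disp", reg, mem, i=1, v=3)   # reads mem[2 + 3]
```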
A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• 32 64-bit GPR (R0 contains zero)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch (sometimes)
see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PA-RISC, PowerPC, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
The Big Picture
• The Five Classic Components of a Computer
• Today’s Topic: Datapath and Control Design
[Figure: the five components: Processor (Control and Datapath), Memory, Input, Output]
The Big Picture: The Performance Perspective
• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• In this lecture:
– Single cycle processor:
» Advantage: One clock cycle per instruction
» Disadvantage: long cycle time
– Multi cycle processor
The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats:
R-type:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
I-type:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
J-type:  op (bits 31-26, 6 bits) | target address (25-0, 26 bits)
• Fields:
– op: operation of the instruction
– rs, rt, rd: the source and destination register specifiers
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction
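The field layout above maps directly onto shifts and masks. The sketch below is illustrative only: `decode` is a hypothetical helper name, and it extracts every field from a word regardless of format, leaving the caller to pick the fields that apply.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":     (instr >> 26) & 0x3F,      # bits 31-26
        "rs":     (instr >> 21) & 0x1F,      # bits 25-21
        "rt":     (instr >> 16) & 0x1F,      # bits 20-16
        "rd":     (instr >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,       # bits 10-6  (R-type)
        "funct":  instr & 0x3F,              # bits 5-0   (R-type)
        "imm16":  instr & 0xFFFF,            # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,         # bits 25-0  (J-type)
    }

# add $3, $1, $2 is R-type: op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x20
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(word)
```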
An Abstract View of the Implementation
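[Figure: PC addresses an ideal Instruction Memory; the instruction supplies Rs, Rt, Rd (5 bits each) and Imm (16 bits) to a file of 32 32-bit registers (ports Rw, Ra, Rb); register outputs A and B feed the ALU, which connects to an ideal Data Memory (Data Address, Data In, Data Out); PC, registers, and data memory are clocked by Clk]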
The Steps of Designing a Processor
• Instruction Set Architecture => Register Transfer Language
• Register Transfer Language =>
– Datapath components
– Datapath interconnect
• Datapath components => Control signals
• Control signals => Control logic
RTL: The ADD Instruction
• add rd, rs, rt
– mem[PC] Fetch the instruction from memory
– R[rd] <- R[rs] + R[rt] The actual operation
– PC <- PC + 4 Calculate the next instruction’s address
R-type format: op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
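The three RTL steps for add can be traced in software. The following is a minimal sketch, not a full simulator: `execute_add`, the list `regs`, and the dictionary `imem` (keyed by byte address) are hypothetical names, and the sketch ignores overflow traps and the hardwired zero in R0.

```python
def execute_add(pc, regs, imem):
    """Carry out the RTL for add rd, rs, rt and return the next PC."""
    instr = imem[pc]                                  # mem[PC]: fetch the instruction
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF     # R[rd] <- R[rs] + R[rt]
    return pc + 4                                     # PC <- PC + 4

regs = [0] * 32
regs[1], regs[2] = 5, 7
imem = {8: (1 << 21) | (2 << 16) | (3 << 11) | 0x20}  # add $3, $1, $2 at address 8
next_pc_value = execute_add(8, regs, imem)
```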
Combinational Logic Elements (Building Blocks)
[Figure: three 32-bit building blocks. Adder: inputs A, B, CarryIn; outputs Sum, Carry. MUX: inputs A, B, Select; output Y. ALU: inputs A, B, OP; outputs Result, Zero.]
Storage Element: Register (Building Block)
• Register
– Similar to the D Flip Flop except
» N-bit input and output
» Write Enable input
– Write Enable:
» negated (0): Data Out will not change
» asserted (1): Data Out will become Data In
[Figure: N-bit register with Data In, Data Out, Write Enable, and Clk]
Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA selects the register to put on busA
– RB selects the register to put on busB
– RW selects the register to be written via busW when Write Enable is 1
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
» RA or RB valid => busA or busB valid after “access time.”
[Figure: 32 32-bit registers with 5-bit selects RW, RA, RB; 32-bit busW in, busA and busB out; Write Enable and Clk]
Storage Element: Idealized Memory
• Memory (idealized)
– One input bus: Data In
– One output bus: Data Out
• Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
» Address valid => Data Out valid after “access time.”
• Looks similar to register file. Why have registers?
[Figure: memory block with Address, 32-bit Data In and Data Out, Write Enable, and Clk]
Overview of the Instruction Fetch Unit
• The common RTL operations
– Fetch the Instruction: mem[PC]
– Update the program counter:
» Sequential Code: PC <- PC + 4
» Branch and Jump: PC <- “something else”
[Figure: PC (clocked by Clk) supplies Address to Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic updates PC]
Datapath for Register-Register Operations
• R[rd] <- R[rs] op R[rt] Example: add rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction
[Figure: register file (32 32-bit registers; Rw, Ra, Rb driven by Rd, Rs, Rt; RegWr; Clk) puts busA and busB into the ALU (ALUctr); the 32-bit Result returns on busW]
R-type format: op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
A Single Cycle Datapath
• We have everything except control signals (underline)
[Figure: single cycle datapath. Instruction<31:0> from the Instruction Fetch Unit supplies Rs <25:21>, Rt <20:16>, Rd <15:11>, and Imm16 <15:0>; muxes are steered by RegDst (Rt or Rd), ALUSrc (busB or extended imm16), and MemtoReg (ALU result or Data Memory output); remaining control signals: RegWr, ALUctr, ExtOp, MemWr, Branch, Jump; Zero returns from the ALU to the Instruction Fetch Unit]
Instruction Fetch Unit at the Beginning of Add / Subtract
• Fetch the instruction from Instruction memory: Instruction <- mem[PC]
– This is the same for all instructions
Control signals: Branch = previous, Zero = previous, Jump = previous
[Figure: instruction fetch unit. The 30-bit PC drives Addr<31:2> of Instruction Memory, with Addr<1:0> = “00”; one adder adds “1”, a second adder adds the sign-extended imm16 (Instruction<15:0>), and two muxes select among the sequential address, the branch address, and the jump address formed from PC<31:28> and Instruction<25:0>]
The Single Cycle Datapath during Add and Subtract
• R[rd] <- R[rs] + / - R[rt]
Control signals: RegDst = 1, RegWr = 1, ALUSrc = 0, ALUctr = Add or Subtract, ExtOp = x, MemWr = 0, MemtoReg = 0, Branch = 0, Jump = 0
[Figure: single cycle datapath with the control values above]
R-type format: op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
Instruction Fetch Unit at the End of Add and Subtract
• PC <- PC + 4
– This is the same for all instructions except: Branch and Jump
Control signals: Branch = 0, Zero = x, Jump = 0
[Figure: instruction fetch unit with the control values above]
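The next-address selection in the instruction fetch unit can be sketched in Python. This follows the figure's labels (a 30-bit PC, an adder that adds “1”, a second adder for the sign-extended imm16, and a jump address built from PC<31:28> and Instruction<25:0>). The function names are hypothetical, and the sketch assumes a branch is taken when both Branch and Zero are 1, as for beq.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, instr, branch, zero, jump):
    """Compute the next byte address from the current PC and control signals."""
    pc30 = pc >> 2                                   # PC<31:2>
    if jump:
        target26 = instr & 0x3FFFFFF                 # Instruction<25:0>
        next30 = ((pc30 >> 26) << 26) | target26     # PC<31:28> joined with target
    else:
        next30 = pc30 + 1                            # sequential: PC + 4 in bytes
        if branch and zero:
            next30 += sign_extend16(instr & 0xFFFF)  # add sign-extended imm16
        next30 &= 0x3FFFFFFF                         # keep 30 bits
    return next30 << 2                               # append "00" as Addr<1:0>

sequential = next_pc(8, 0, branch=0, zero=0, jump=0)
taken = next_pc(8, 3, branch=1, zero=1, jump=0)
```

Working in 30-bit word addresses is why the adder constant in the figure is “1” rather than 4.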
The “Truth Table” for RegWrite
            R-type    ori       lw        sw        beq       jump
op          00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegWrite    1         1         1         0         0         0
• RegWrite = R-type + ori + lw
  = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type)
  + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori)
  + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)
[Figure: one AND gate per instruction class decodes op<5>..op<0> into R-type, ori, lw, sw, beq, jump; an OR gate combines R-type, ori, and lw into RegWrite]
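The sum-of-products expression for RegWrite can be checked against the truth table with a few lines of Python. The names `OPCODES` and `reg_write` are hypothetical; the opcode values and the three product terms are the ones given on the slide.

```python
OPCODES = {"R-type": 0b000000, "ori": 0b001101, "lw": 0b100011,
           "sw": 0b101011, "beq": 0b000100, "jump": 0b000010}

def reg_write(op):
    """Evaluate RegWrite = R-type + ori + lw from the six opcode bits."""
    b = [(op >> i) & 1 for i in range(6)]     # b[i] is op<i>
    n = [1 - bit for bit in b]                # n[i] is !op<i>
    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]
    ori = n[5] & n[4] & b[3] & b[2] & n[1] & b[0]
    lw = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]
    return r_type | ori | lw

table = {name: reg_write(op) for name, op in OPCODES.items()}
```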
PLA Implementation of the Main Control
[Figure: PLA. An AND plane decodes op<5>..op<0> into R-type, ori, lw, sw, beq, jump; an OR plane produces RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]
Putting it All Together: A Single Cycle Processor
[Figure: the single cycle datapath with its control. Main Control takes op = Instr<31:26> (6 bits) and produces RegDst, ALUSrc, the other datapath control signals, and a 3-bit ALUop; ALU Control takes func = Instr<5:0> (6 bits) and ALUop and produces the 3-bit ALUctr; Instr<15:0> supplies Imm16]
Drawback of this Single Cycle Processor
• Long cycle time:
– Cycle time must be long enough for the load instruction:
» PC’s Clock-to-Q +
» Instruction Memory Access Time +
» Register File Access Time +
» ALU Delay (address calculation) +
» Data Memory Access Time +
» Register File Setup Time +
» Clock Skew
• Cycle time is much longer than needed for all other instructions
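The cycle-time sum above can be made concrete with a short calculation. Every delay value below is hypothetical (the slides give no numbers), and the R-type path is assumed to be the load path without the data memory access.

```python
# Hypothetical delays in nanoseconds; the slides give no actual values.
DELAY_NS = {
    "pc_clk_to_q": 1,
    "instr_mem_access": 10,
    "reg_file_access": 5,
    "alu_delay": 8,
    "data_mem_access": 10,
    "reg_file_setup": 1,
    "clock_skew": 1,
}

LOAD_PATH = ["pc_clk_to_q", "instr_mem_access", "reg_file_access", "alu_delay",
             "data_mem_access", "reg_file_setup", "clock_skew"]
# Assumed R-type path: the load path minus the data memory access.
RTYPE_PATH = [step for step in LOAD_PATH if step != "data_mem_access"]

def path_delay(path, delays=DELAY_NS):
    """Sum the component delays along one instruction's path."""
    return sum(delays[step] for step in path)

# The single cycle must cover the slowest instruction.
cycle_time_ns = max(path_delay(LOAD_PATH), path_delay(RTYPE_PATH))
slack_on_rtype_ns = cycle_time_ns - path_delay(RTYPE_PATH)
```

With these made-up numbers every R-type instruction idles for the length of one data memory access, which is the waste the slide describes.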
Overview of a Multiple Cycle Implementation
• The root of the single cycle processor’s problems:
– The cycle time has to be long enough for the slowest instruction
• Solution:
– Break the instruction into smaller steps
– Execute each step (instead of the entire instruction) in 1 clock cycle
» Cycle time: time it takes to execute the longest step
» Try to make all the steps have similar length
– This is the essence of the multiple cycle processor
• The advantages of the multiple cycle processor:
– Cycle time is much shorter
– Different instructions take different number of cycles to complete
» Load takes five cycles
» Jump only takes three cycles
– Allows a functional unit to be used more than once per instruction
The Five Steps of a Load Instruction
[Figure: timing diagram for a load across the steps Instruction Fetch, Instr Decode / Reg Fetch, Address, Data Memory, and Reg Wr. Signals PC, Rs/Rt/Rd/Op/Func, ALUctr, ExtOp, ALUSrc, RegWr, busA, busB, Address, and busW change from old to new values after Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, and Register File Write Time]
Multiple Cycle Datapath
[Figure: multiple cycle datapath with one Ideal Memory (WrAdr, RAdr, Din, Dout), Instruction Reg, Reg File, Extend, ALU, ALU Control, PC, and Target register; Control takes Op and Func (6 bits each); control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg]
Where to get more information?
• Chapter 5 of CPS 104 text book:
– David Patterson and John Hennessy, “Computer Organization & Design: The Hardware / Software Interface,” Morgan Kaufmann Publishers, San Mateo, California, 1994.
• For a reference on the MIPS architecture:
– Gerry Kane, “MIPS RISC Architecture,” Prentice Hall.
• Now: Pipelining
Introduction to Pipelining
Overview
• A Pipelined Processor:
– Introduction to the concept of pipelined processor.
– Pipelined Datapath
– Pipeline example: Load Instruction
Reading:
• Chapter 3
• Or Chapters 5, 6 in the CPS104 text
Pipelining: It’s Natural!
• Laundry Example
• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
• How long to do laundry?
Sequential Laundry
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight, task order A, B, C, D; the loads run one after another, each taking 30 + 40 + 20 minutes]
Pipelined Laundry: Start work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline starting at 6 PM, task order A, B, C, D; the loads overlap, giving intervals of 30, 40, 40, 40, 40, and 20 minutes]
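The laundry arithmetic can be checked with a short calculation. This is a sketch under the assumption that each stage (washer, dryer, folder) handles one load at a time and that a finished load can wait between stages; the function names are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)

def pipelined_minutes(stage_minutes, loads):
    """Finish time of the last load when stages overlap across loads."""
    free_at = [0] * len(stage_minutes)     # time at which each stage becomes free
    for _ in range(loads):
        t = 0                              # time this load is ready for the next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(t, free_at[s])     # wait for the load and for the stage
            t = start + minutes
            free_at[s] = t
    return free_at[-1]

stages = [30, 40, 20]                      # washer, dryer, folder
seq = sequential_minutes(stages, 4)        # 360 minutes = 6 hours
pipe = pipelined_minutes(stages, 4)        # 210 minutes = 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 minutes: the 40-minute dryer is the slowest stage and sets the rate at which loads complete.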
Pipelining Lessons
• Pipelining doesn’t help latency of a single task, it helps throughput of the entire workload
• Pipeline rate limited by slowest pipeline stage
• Multiple tasks operating simultaneously
• Potential speedup = Number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” pipeline and time to “drain” it reduce speedup
[Figure: pipelined laundry timeline from 6 PM, task order A, B, C, D; intervals of 30, 40, 40, 40, 40, and 20 minutes]
Review: a Multiple-Cycle Implementation
• The main single cycle processor’s problem:
– The cycle time has to be long enough for the slowest instruction to complete execution.
• Solution:
– Break instruction execution into smaller steps
– Execute each step in one cycle (instead of the entire instruction).
» Short cycle time: time it takes to execute the longest step
» Make all the steps have similar length
– This is the essence of the multiple cycle processor
• Multiple-cycle processor advantages:
– Cycle time is much shorter
– Different instructions take different number of cycles to complete
» Load takes five cycles
» Jump only takes three cycles
– Allows a functional unit to be used more than once per instruction
Multiple Cycle Processor
• MCP: If a functional unit is used more than once per instruction -> cannot pipeline -> lower performance
[Figure: the multiple cycle datapath, with one Ideal Memory, Instruction Reg, Reg File, Extend, ALU, ALU Control, PC, and Target register; control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg]
The Five Stages of a Load
• Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• WrB: Write the data back to the register file
Cycle:    1        2        3        4        5
Load:     Ifetch   Reg/Dec  Exec     Mem      WrB
Key Ideas Behind Instruction Execution Pipelining
• The load instruction has 5 stages:
– Five independent functional units to work on each stage
» Each functional unit is used only once!
– A 2nd load can start doing Ifetch as soon as the 1st load finishes its Ifetch stage.
– Each load still takes five cycles to complete.
» latency is still 5 cycles
– The throughput is much higher:
» CPI is 1 with ~1/5th the cycle time.
– Instructions start executing before previous instructions complete execution.
Load:     Ifetch   Reg/Dec  Exec     Mem      WrB
Pipelining the Load Instruction
• The five independent pipeline stages are:
– Read Next Instruction: The Ifetch stage.
– Decode Instruction and fetch register values: The Reg/Dec stage
– Execute the operation: The Exec stage.
– Access Data-Memory: The Mem stage.
– Write Data to Destination Register: The WrB stage
• One instruction enters the pipeline every cycle
– One instruction comes out of the pipeline (completed) every cycle
– The “Effective” Cycles per Instruction (CPI) is 1; ~1/5 cycle time
Cycle:    1        2        3        4        5        6        7
1st lw:   Ifetch   Reg/Dec  Exec     Mem      WrB
2nd lw:            Ifetch   Reg/Dec  Exec     Mem      WrB
3rd lw:                     Ifetch   Reg/Dec  Exec     Mem      WrB
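The overlapped schedule of back-to-back loads can be generated programmatically. This is a minimal sketch that assumes one instruction issues per cycle with no stalls; `pipeline_schedule` and `total_cycles` are hypothetical helper names.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """For each instruction, map cycle number -> stage, issuing one per cycle."""
    return [{issue + k + 1: stage for k, stage in enumerate(stages)}
            for issue in range(n_instructions)]

def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish all instructions: fill the pipe once, then one per cycle."""
    return n_stages + n_instructions - 1

schedule = pipeline_schedule(3)
cycles = total_cycles(3)        # three loads complete in 7 cycles
```

Each load still occupies five cycles, but after the first one completes, another completes every cycle.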
The Four Stages of R-Type
• Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: ALU operates on the two register operands
• WrB: Write the ALU output back to the register file
Cycle:    1        2        3        4
R-type:   Ifetch   Reg/Dec  Exec     WrB
Pipelining the R-type and Load Instruction
• We have a problem called a structural hazard or pipeline conflict:
– Two instructions try to write to the register file at the same time!
Cycle:    1        2        3        4        5        6        7        8        9
R-type:   Ifetch   Reg/Dec  Exec     Wr
R-type:            Ifetch   Reg/Dec  Exec     Wr
Load:                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type:                              Ifetch   Reg/Dec  Exec     Wr
R-type:                                       Ifetch   Reg/Dec  Exec     Wr
OOPS! We have a problem! (The Load and the R-type that follows it both write in cycle 7.)
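The write-port conflict can be found mechanically by computing the cycle in which each instruction reaches its write stage. This is a sketch assuming one instruction issues per cycle; `write_port_conflicts` and the stage lists are hypothetical names that mirror the diagram.

```python
R_TYPE = ["Ifetch", "Reg/Dec", "Exec", "Wr"]
LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

def write_port_conflicts(program, write_stage="Wr"):
    """Return {cycle: [instruction indexes]} for cycles with more than one writer."""
    writers = {}
    for issue, stages in enumerate(program):
        cycle = issue + stages.index(write_stage) + 1   # cycles are numbered from 1
        writers.setdefault(cycle, []).append(issue)
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}

program = [R_TYPE, R_TYPE, LOAD, R_TYPE, R_TYPE]
conflicts = write_port_conflicts(program)
```

For this sequence the only conflict is in cycle 7, between the load (index 2) and the R-type issued right after it (index 3).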
Important Observations
• Each functional unit can only be used once per instruction.
• Each functional unit must be used at the same stage for all instructions:
– Load uses Register File’s Write Port during its 5th stage.
– R-type uses Register File’s Write Port during its 4th stage.
Stage:    1        2        3        4        5
Load:     Ifetch   Reg/Dec  Exec     Mem      WrB
R-type:   Ifetch   Reg/Dec  Exec     WrB
• How do we solve this pipeline hazard?
Solution: Delay R-type’s Write by One Cycle
• Delay R-type’s register write by one cycle:
– Now R-type instructions also use Reg File’s write port at Stage 5
– Mem stage is a NO-OP stage: nothing is being done. Effective CPI?
Stage:    1        2        3        4        5
R-type:   Ifetch   Reg/Dec  Exec     Mem      WrB
Cycle:    1        2        3        4        5        6        7        8        9
R-type:   Ifetch   Reg/Dec  Exec     Mem      WrB
R-type:            Ifetch   Reg/Dec  Exec     Mem      WrB
Load:                       Ifetch   Reg/Dec  Exec     Mem      WrB
R-type:                              Ifetch   Reg/Dec  Exec     Mem      WrB
R-type:                                       Ifetch   Reg/Dec  Exec     Mem      WrB
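With every instruction padded to five stages, each one reaches the write port in a different cycle. The sketch below checks that and works out the effective CPI the slide asks about, assuming one instruction issues per cycle with no stalls; the helper names are hypothetical.

```python
FIVE_STAGE = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]   # Mem is a no-op for R-type

def write_cycles(program, write_stage="WrB"):
    """Cycle (numbered from 1) in which each instruction uses the write port."""
    return [issue + stages.index(write_stage) + 1
            for issue, stages in enumerate(program)]

def effective_cpi(n_instructions, depth=len(FIVE_STAGE)):
    """Total cycles divided by instructions, including the time to fill the pipe."""
    return (depth + n_instructions - 1) / n_instructions

program = [FIVE_STAGE] * 5            # R-type, R-type, Load, R-type, R-type
cycles = write_cycles(program)        # one writer per cycle: 5, 6, 7, 8, 9
cpi_short = effective_cpi(5)          # 9 cycles / 5 instructions
cpi_long = effective_cpi(1000)        # approaches 1 for long instruction streams
```

The no-op Mem stage lengthens each R-type instruction's latency by a cycle, but in steady state one instruction still completes per cycle, so the effective CPI stays close to 1.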
The Four Stages of a Store
• Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Write the data into the Data Memory
Cycle:    1        2        3        4
Store:    Ifetch   Reg/Dec  Exec     Mem      (WrB slot unused)
The Four Stages of BEQ
• Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: ALU compares the two register operands
– Adder calculates the branch target address
• Mem: If the registers we compared in the Exec stage are the same,
– Write the branch target address into the PC
Cycle:    1        2        3        4
Beq:      Ifetch   Reg/Dec  Exec     Mem      (WrB slot unused)
A Pipelined Datapath
[Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, and WrB are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers; units: PC, IF Unit, RFile (Ra, Rb, Rw, Di), Exec Unit, Data Mem (WA, RA, Di, Do); values carried forward: PC+4, Rs, Rt, Rd, Imm16, busA, busB, Zero; control signals: RegWr, ExtOp, ALUOp, ALUSrc, RegDst, MemWr, Branch, MemtoReg]
The Instruction Fetch Stage
• Location 8: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
[Figure: pipelined datapath marked “You are here!”; PC = 12 and the IF/ID register holds lw $1, 100 ($2)]
Detailed View of the Instruction Fetch Unit
• Location 8: lw $1, 0x100($2)
[Figure: instruction fetch unit between Ifetch and Reg/Dec, marked “You are here!”. Address “8” goes to Instruction Memory while an adder adds “4”; PC = 12, and the IF/ID register holds the instruction lw $1, 100 ($2) and PC+4]
The Decode / Register Fetch Stage
• Location 8: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
[Figure: pipelined datapath marked “You are here!”; the ID/Ex register holds Reg. 2 and 0x100]
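Detailed View of the Fetch/Decode Stage
[Figure: the IF/ID register supplies OP, rs, rt, rd, func, and PC+4; Control receives the instruction fields; the Register File (ports Ra, Rb, Rw, Din) is addressed by rs and rt and drives Bus-A and Bus-B; rt, rd, Imm16, and PC+4 continue to the next pipeline register; both pipeline registers are clocked by Clk]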
Load’s Address Calculation Stage
• Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Control signals: ExtOp = 1, ALUSrc = 1, ALUOp = Add, RegDst = 0
[Figure: pipelined datapath marked “You are here!”; the Ex/Mem register holds the load’s address]
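Detailed View of the Execution Unit
Control signals: ExtOp = 1, ALUSrc = 1, ALUOp = Add
[Figure: execution unit between the ID/Ex and Ex/Mem registers, marked “You are here!” between Exec and Mem. The Extender widens imm16 to 32 bits, a mux selects between busB and the extended immediate, ALU Control turns the 3-bit ALUOp into the 3-bit ALUctr, and the ALU produces ALUout and Zero; a separate adder combines PC+4 with the immediate shifted left by 2 to form Target. The Ex/Mem register holds the load’s memory address]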
Load’s Memory Access Stage
• Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Control signals: MemWr = 0, Branch = 0
[Figure: pipelined datapath marked “You are here!”; the Mem/Wr register holds the load’s data]
Load’s Write Back Stage
• Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Control signals: RegWr = 1, MemtoReg = 1
[Figure: pipelined datapath in the Wr stage, marked “You are somewhere out there!”]
Next Time
• Pipeline Complications