Digital Design and Computer Architecture, 2nd Edition
Chapter 7
David Money Harris and Sarah L. Harris

Chapter 7 :: Topics
• Introduction
• Performance Analysis
• Single-Cycle Processor
• Pipelined Processor
• Exceptions
• Advanced Microarchitecture

Introduction
• Microarchitecture: the implementation of an architecture in hardware
• Processor:
  – Datapath: functional blocks
  – Control: control signals

[Figure: levels of abstraction, from highest to lowest, each with typical building blocks]
• Application Software: programs
• Operating Systems: device drivers
• Architecture: instructions, registers
• Microarchitecture: datapaths, controllers
• Logic: adders, memories
• Digital Circuits: AND gates, NOT gates
• Analog Circuits: amplifiers, filters
• Devices: transistors, diodes
• Physics: electrons

Microarchitecture
• Multiple implementations for a single architecture:
  – Single-cycle: each instruction executes in a single cycle
  – Multicycle: each instruction is broken into a series of shorter steps and executes in n cycles, where n varies with the instruction
  – Pipelined: each instruction is broken into a series of steps, and multiple instructions execute at once
• Note: AMD and Intel pipelines are different, even though both implement the same IA-32 architecture (a.k.a. the x86 ISA)

Processor Performance
• Program execution time:
  Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
• Definitions:
  – IC: Instruction Count (= #instructions)
  – CPI: Cycles/Instruction
  – Clock period: seconds/cycle
  – IPC: Instructions/Cycle (= 1/CPI)
• The challenge is to satisfy constraints of:
  – Cost
  – Power
  – Performance
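
The execution-time equation can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers (1 billion instructions, CPI of 1.5, 1 ns clock period) are assumptions made for the example, not values from the slides.

```python
def execution_time(instruction_count, cpi, clock_period_s):
    """Execution time in seconds = IC x CPI x clock period."""
    return instruction_count * cpi * clock_period_s


def ipc(cpi):
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


if __name__ == "__main__":
    # Hypothetical program: 1 billion instructions, CPI of 1.5, 1 ns clock.
    t = execution_time(1_000_000_000, 1.5, 1e-9)
    print(f"execution time = {t:.3f} s, IPC = {ipc(1.5):.3f}")
```

Because the three factors multiply, halving any one of them (fewer instructions, fewer cycles per instruction, or a shorter clock period) halves the execution time.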

MIPS Processor
• Consider a subset of MIPS instructions:
  – R-type instructions: and, or, add, sub, slt
  – Memory instructions: lw, sw
  – Branch instructions: beq

Architectural State
• Determines everything about a processor:
  – PC and special registers
  – Register File
  – Memory

MIPS State Elements
[Figure: the state elements. Program counter (32-bit PC, next value PC', clocked by CLK); instruction memory (32-bit address A, 32-bit read data RD); register file (5-bit addresses A1, A2, A3; 32-bit read data RD1, RD2; 32-bit write data WD3; write enable WE3; CLK); data memory (32-bit address A, read data RD, write data WD, write enable WE; CLK).]
Plus the HI and LO registers

Single-Cycle MIPS Processor
• Datapath: design it 1st, to make the instruction actions possible
• Control: design it 2nd, to make them happen

Single-Cycle Datapath: lw fetch
STEP 1: Fetch instruction, IM[PC]
[Figure: the PC drives the instruction memory address A; the read data RD is the instruction Instr.]

Single-Cycle Datapath: lw Register Read
STEP 2: Read source operands from the register file: RF[rs], i.e. RF[Instr[25:21]]
[Figure: Instr[25:21] drives register file address A1.]

Single-Cycle Datapath: lw Immediate
STEP 3: Sign-extend the immediate: SignExt(immed)
[Figure: Instr[15:0] drives the Sign Extend block, which produces SignImm.]

Single-Cycle Datapath: lw address
STEP 4: Compute the memory address: addr = RF[rs] + SignExt(immed)
[Figure: the ALU takes SrcA from RD1 and SrcB from SignImm and produces ALUResult and Zero; ALUControl2:0 = 010 (add).]

Single-Cycle Datapath: lw Memory Read
• STEP 5: Read data from memory and write it back to the register file: RF[rt] ← DM[addr]
[Figure: ALUResult drives the data memory address A; ReadData returns to register file port WD3; Instr[20:16] drives A3. Control values: RegWrite = 1, ALUControl2:0 = 010.]

Single-Cycle Datapath: lw PC Increment
STEP 6: Determine the address of the next instruction: PC ← PC + 4
[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'. Control values: RegWrite = 1, ALUControl2:0 = 010.]

Full RTL Expression for lw
IM[PC]
RF[rt] ← DM[RF[rs] + SignExt(immed)]
PC ← PC + 4
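
The RTL for lw can be traced with a small behavioural model. The sketch below is an illustration only: the `state` dictionary, the dict-based data memory keyed by byte address, and the helper names are assumptions made for this example and do not come from the slides.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(state, instr):
    """Apply RF[rt] <- DM[RF[rs] + SignExt(immed)], then PC <- PC + 4."""
    rs = (instr >> 21) & 0x1F      # Instr[25:21]
    rt = (instr >> 16) & 0x1F      # Instr[20:16]
    imm = instr & 0xFFFF           # Instr[15:0]
    addr = (state["rf"][rs] + sign_extend16(imm)) & 0xFFFFFFFF
    if rt != 0:                    # register 0 always reads as 0 in MIPS
        state["rf"][rt] = state["dm"].get(addr, 0)
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF
    return state


if __name__ == "__main__":
    # lw $t0, -4($t1): opcode 100011, rs = 9, rt = 8, immed = 0xFFFC
    instr = (0b100011 << 26) | (9 << 21) | (8 << 16) | 0xFFFC
    state = {"pc": 0, "rf": [0] * 32, "dm": {0x1000: 0xDEADBEEF}}
    state["rf"][9] = 0x1004
    execute_lw(state, instr)
    print(hex(state["rf"][8]), state["pc"])
```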

Single-Cycle Datapath: sw
Write data in rt to memory: DM[addr] ← RF[rt]
[Figure: the lw datapath extended for sw. Instr[20:16] also drives register file address A2, and RD2 is sent as WriteData to the data memory WD port. Control values: RegWrite = 0, MemWrite = 1, ALUControl2:0 = 010.]

Single-Cycle Datapath: R-Type
• Read from rs and rt
• Write ALUResult to the register file
• Write to rd (instead of rt): RF[rd] ← RF[rs] op RF[rt]
[Figure: the datapath with three added multiplexers. ALUSrc selects SrcB (RD2 or SignImm), RegDst selects WriteReg4:0 (Instr[20:16] or Instr[15:11]), and MemtoReg selects Result (ALUResult or ReadData). Control values: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 varies, MemWrite = 0, MemtoReg = 0.]

Single-Cycle Datapath: beq
• Determine whether the values in rs and rt are equal
• Calculate the branch target address: BTA = (PC + 4) + (SignExt(immed) << 2), where << 2 multiplies by 4
[Figure: SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; PCSrc selects PC' (PCPlus4 or PCBranch). Control values: RegWrite = 0, RegDst = X, ALUSrc = 0, Branch = 1, MemWrite = 0, MemtoReg = X, ALUControl2:0 = 110.]

RTL Expression for beq
IM[PC]
if (RF[rs] - RF[rt] == 0) PC ← BTA
else PC ← PC + 4
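
The next-PC selection for beq can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and register values are passed in directly rather than read from a register file.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_value, rt_value, imm16):
    """Return BTA if RF[rs] - RF[rt] == 0, otherwise PC + 4."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    # BTA = (PC + 4) + (SignExt(immed) << 2); the shift multiplies by 4.
    bta = (pc_plus4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return bta if zero else pc_plus4


if __name__ == "__main__":
    print(hex(next_pc_beq(0x100, 5, 5, 3)))   # branch taken
    print(hex(next_pc_beq(0x100, 5, 6, 3)))   # branch not taken
```

The immediate counts instructions rather than bytes, which is why it is shifted left by 2 before being added to PC + 4.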

Single-Cycle Processor
[Figure: the complete single-cycle processor. The Control Unit takes Op = Instr[31:26] and Funct = Instr[5:0] and drives MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite; PCSrc is derived from Branch and Zero. The datapath contains the PC, instruction memory, register file, sign extend, ALU, data memory, the PCPlus4 and PCBranch adders, and the SrcB, WriteReg, Result, and PC' multiplexers.]

Single-Cycle Control
[Figure: the Control Unit consists of a Main Decoder and an ALU Decoder. The Main Decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0. The ALU Decoder takes Funct5:0 and ALUOp1:0 and produces ALUControl2:0.]

Review: ALU
[Figure: ALU symbol with N-bit inputs A and B, 3-bit select F, and N-bit output Y.]

| F2:0 | Function |
|---|---|
| 000 | A & B |
| 001 | A \| B |
| 010 | A + B |
| 011 | not used |
| 100 | A & ~B |
| 101 | A \| ~B |
| 110 | A - B |
| 111 | SLT |
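
The function table can be checked with a small behavioural model. The sketch below assumes 32-bit operands and models SLT as the sign bit of A - B (overflow is ignored, which is an assumption of this example); the names `alu` and `MASK32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def alu(a, b, f):
    """Return (Y, Zero) for 32-bit operands A, B and the 3-bit select F."""
    a &= MASK32
    b &= MASK32
    not_b = ~b & MASK32
    if f == 0b000:
        y = a & b
    elif f == 0b001:
        y = a | b
    elif f == 0b010:
        y = a + b
    elif f == 0b100:
        y = a & not_b
    elif f == 0b101:
        y = a | not_b
    elif f == 0b110:
        y = a - b
    elif f == 0b111:
        # SLT modelled as the sign bit of A - B (assumption: overflow ignored).
        y = ((a - b) & MASK32) >> 31
    else:
        raise ValueError("F = 011 is not used")
    y &= MASK32
    return y, int(y == 0)


if __name__ == "__main__":
    print(alu(5, 3, 0b110))   # subtract
    print(alu(2, 5, 0b111))   # set less than
```

The Zero output is what a branch comparison relies on: subtracting two equal values gives Y = 0 and therefore Zero = 1.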

Review: ALU
[Figure: ALU implementation. F2 selects between B and ~B with a 2:1 multiplexer; an adder produces the sum S and Cout; a 4:1 multiplexer controlled by F1:0 selects the N-bit output Y, with the most significant sum bit S[N-1] zero-extended for SLT.]

Control Unit: ALU Decoder

| ALUOp1:0 | Meaning |
|---|---|
| 00 | Add (for lw, sw) |
| 01 | Subtract (for beq) |
| 10 | Look at funct (R-type) |
| 11 | Not used |

| ALUOp1:0 | funct | ALUControl2:0 |
|---|---|---|
| 00 | X | 010 (Add) |
| X1 | X | 110 (Subtract) |
| 1X | 100000 (add) | 010 (Add) |
| 1X | 100010 (sub) | 110 (Subtract) |
| 1X | 100100 (and) | 000 (And) |
| 1X | 100101 (or) | 001 (Or) |
| 1X | 101010 (slt) | 111 (SLT) |
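
The ALU decoder truth table maps directly onto a lookup. The Python below is a minimal sketch under stated assumptions: rows are matched in the order they appear in the table, and an unlisted funct code returns `None` to stand in for a don't-care output. The names are hypothetical.

```python
# funct field -> ALUControl for R-type instructions (rows with ALUOp = 1X)
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(aluop, funct):
    """Return ALUControl2:0 for the given ALUOp1:0 and funct5:0."""
    if aluop == 0b00:
        return 0b010                      # add, for lw and sw
    if aluop & 0b01:
        return 0b110                      # X1: subtract, for beq
    return FUNCT_TO_ALUCONTROL.get(funct)  # 1X: look at funct


if __name__ == "__main__":
    print(format(alu_decoder(0b10, 0b100101), "03b"))  # or
```

ALUOp = 11 is never produced, so it does not matter which row it falls under in this sketch.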

Control Unit: Main Decoder

| Instruction | Op5:0 | RegWrite | RegDst | AluSrc | Branch | MemWrite | MemtoReg | ALUOp1:0 |
|---|---|---|---|---|---|---|---|---|
| R-type | 000000 | 1 | 1 | 0 | 0 | 0 | 0 | 10 |
| lw | 100011 | 1 | 0 | 1 | 0 | 0 | 1 | 00 |
| sw | 101011 | 0 | X | 1 | 0 | 1 | X | 00 |
| beq | 000100 | 0 | X | 0 | 1 | 0 | X | 01 |
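
The main decoder is a pure lookup from opcode to control signals, so it can be sketched as a table. In the Python below, the `Controls` tuple, the `X = None` marker for don't-care entries, and the function name are assumptions made for this illustration.

```python
from collections import namedtuple

Controls = namedtuple(
    "Controls", "regwrite regdst alusrc branch memwrite memtoreg aluop"
)

X = None  # don't care

MAIN_DECODER = {
    0b000000: Controls(1, 1, 0, 0, 0, 0, 0b10),  # R-type
    0b100011: Controls(1, 0, 1, 0, 0, 1, 0b00),  # lw
    0b101011: Controls(0, X, 1, 0, 1, X, 0b00),  # sw
    0b000100: Controls(0, X, 0, 1, 0, X, 0b01),  # beq
}


def main_decoder(op):
    """Return the control signals for a 6-bit opcode."""
    if op not in MAIN_DECODER:
        raise ValueError(f"unsupported opcode {op:06b}")
    return MAIN_DECODER[op]


if __name__ == "__main__":
    print(main_decoder(0b100011))  # lw
```

Supporting another instruction in this model means adding a row to the table, provided the datapath already has the paths the instruction needs.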

Single-Cycle Datapath: or
[Figure: the complete single-cycle processor annotated for or. Control values: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUOp1:0 = 10, ALUControl2:0 = 001.]

Extended Functionality: addi
No change to datapath
[Figure: the complete single-cycle processor, unchanged.]

Main Decoder table: addi

| Instruction | Op5:0 | RegWrite | RegDst | AluSrc | Branch | MemWrite | MemtoReg | ALUOp1:0 |
|---|---|---|---|---|---|---|---|---|
| R-type | 000000 | 1 | 1 | 0 | 0 | 0 | 0 | 10 |
| lw | 100011 | 1 | 0 | 1 | 0 | 0 | 1 | 00 |
| sw | 101011 | 0 | X | 1 | 0 | 1 | X | 00 |
| beq | 000100 | 0 | X | 0 | 1 | 0 | X | 01 |
| addi | 001000 | 1 | 0 | 1 | 0 | 0 | 0 | 00 |

Extended Functionality: j
[Figure: the datapath extended with a second PC multiplexer controlled by Jump. PCJump is formed from PCPlus4[31:28] and Instr[25:0] shifted left by 2 (bits 27:0); Jump selects PC' between the PCSrc multiplexer output and PCJump.]
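
The jump target calculation can be sketched with a few bit operations. This is an illustration only; the function name `jump_target` is hypothetical.

```python
def jump_target(pc, instr):
    """Return PCJump = {PCPlus4[31:28], Instr[25:0], 00}."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    addr26 = instr & 0x03FFFFFF          # Instr[25:0]
    return (pc_plus4 & 0xF0000000) | (addr26 << 2)


if __name__ == "__main__":
    # j with a 26-bit address field of 0x0100004, fetched at PC = 0x00400000
    instr = (0b000010 << 26) | 0x0100004
    print(hex(jump_target(0x00400000, instr)))
```

Only the upper four bits come from the PC, so a jump stays within the 256 MB region that contains the instruction after the jump.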

Main Decoder table: j

| Instruction | Op5:0 | RegWrite | RegDst | AluSrc | Branch | MemWrite | MemtoReg | ALUOp1:0 | Jump |
|---|---|---|---|---|---|---|---|---|---|
| R-type | 000000 | 1 | 1 | 0 | 0 | 0 | 0 | 10 | 0 |
| lw | 100011 | 1 | 0 | 1 | 0 | 0 | 1 | 00 | 0 |
| sw | 101011 | 0 | X | 1 | 0 | 1 | X | 00 | 0 |
| beq | 000100 | 0 | X | 0 | 1 | 0 | X | 01 | 0 |
| j | 000010 | 0 | X | X | X | 0 | X | XX | 1 |

Review: Processor Performance
Program Execution Time = (#instructions)(cycles/instruction)(seconds/cycle) = IC × CPI × TC

Single-Cycle Performance
TC is limited by the critical path (lw)
[Figure: the complete single-cycle processor annotated for lw. Control values: RegWrite = 1, RegDst = 0, ALUSrc = 1, Branch = 0, MemWrite = 0, MemtoReg = 1, ALUControl2:0 = 010.]

Single-Cycle Performance
• Single-cycle critical path:
  Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
• Typically, the limiting paths are memory, ALU, and register file:
  Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup

Single-Cycle Performance Example

| Element | Parameter | Delay (ps) |
|---|---|---|
| Register clock-to-Q | tpcq_PC | 30 |
| Register setup | tsetup | 20 |
| Multiplexer | tmux | 25 |
| ALU | tALU | 200 |
| Memory read | tmem | 250 |
| Register file read | tRFread | 150 |
| Register file setup | tRFsetup | 20 |

Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 25 + 200 + 20] ps
   = 925 ps
fclk = 1/Tc = 1/(0.925 ns) ≈ 1.08 GHz
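
The cycle-time arithmetic can be reproduced with a short script. The delay values are the ones in the table; the dictionary layout and function names are assumptions made for this sketch.

```python
# Element delays in picoseconds, taken from the table.
DELAYS_PS = {
    "tpcq_PC": 30,
    "tsetup": 20,
    "tmux": 25,
    "tALU": 200,
    "tmem": 250,
    "tRFread": 150,
    "tRFsetup": 20,
}


def single_cycle_period_ps(d):
    """Tc = tpcq_PC + 2*tmem + tRFread + tmux + tALU + tRFsetup."""
    return (d["tpcq_PC"] + 2 * d["tmem"] + d["tRFread"]
            + d["tmux"] + d["tALU"] + d["tRFsetup"])


def clock_frequency_ghz(period_ps):
    """A period of 1000 ps corresponds to 1 GHz."""
    return 1000.0 / period_ps


if __name__ == "__main__":
    tc = single_cycle_period_ps(DELAYS_PS)
    print(f"Tc = {tc} ps, fclk = {clock_frequency_ghz(tc):.2f} GHz")
```

The two memory accesses contribute 500 ps of the total, so memory read delay dominates the cycle time in this example.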

Single-Cycle Performance Example
Program with IC = 100 billion instructions:
Execution Time = IC × CPI × TC
               = (100 × 10^9)(1)(925 × 10^-12 s)
               = 92.5 seconds

Evaluation of Single-Cycle Processor
Pros and cons of the single-cycle implementation:
+ Simple design
+ 1 cycle per instruction
- Slow cycle time, limited by the longest instruction (lw)
- Hardware: 2 adders + ALU; 2 memories

Review: Single-Cycle Processor
[Figure: the complete single-cycle processor including the jump logic: Control Unit (Op, Funct), PC, instruction memory, register file, sign extend, ALU, data memory, the PCPlus4 and PCBranch adders, PCJump, and the multiplexers controlled by PCSrc, Jump, ALUSrc, RegDst, and MemtoReg.]

Verilog Model

    //------------------------------------------------
    // [email protected] 9 November 2005
    // Top level system including MIPS and memories
    //------------------------------------------------
    module top (input         clk, reset,
                output [31:0] writedata, dataadr,
                output        memwrite);

      wire [31:0] pc, instr, readdata;

      // instantiate processor and memories
      mips mips (clk, reset, pc, instr, memwrite, dataadr, writedata, readdata);
      imem imem (pc[7:2], instr);
      dmem dmem (clk, memwrite, dataadr, writedata, readdata);

    endmodule

Verilog Model of Data Memory

    //------------------------------------------------
    // [email protected] 23 October 2005
    // External data memory used by MIPS single-cycle processor
    //------------------------------------------------
    module dmem (input         clk, we,
                 input  [31:0] a, wd,
                 output [31:0] rd);

      reg [31:0] RAM[63:0];

      assign rd = RAM[a[31:2]]; // word-aligned read

      always @(posedge clk)
        if (we)
          RAM[a[31:2]] <= wd;   // word-aligned write

    endmodule

Verilog Model of Instr. Memory

    module imem (input      [5:0]  addr,
                 output reg [31:0] instr);

      // imem is modeled as a lookup table, a stored-program byte-addressable ROM
      always @(addr)
        case ({addr, 2'b00})
          // address   instruction
          // -------   -----------
          8'h00: instr = 32'h20020005;
          8'h04: instr = 32'h2003000c;
          8'h08: instr = 32'h2067fff7;
          8'h0c: instr = 32'h00e22025;
          8'h10: instr = 32'h00642824;
          8'h14: instr = 32'h00a42820;
          8'h18: instr = 32'h10a7000a;
          8'h1c: instr = 32'h0064202a;
          8'h20: instr = 32'h10800001;
          default: instr = {32{1'bx}}; // unknown instruction
        endcase

    endmodule
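
The hex words stored in the instruction memory can be split into the fields that the control unit and datapath use. The Python below is a small sketch for inspecting them; the function name and the dictionary keys are assumptions made for this example.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "op": (instr >> 26) & 0x3F,     # Instr[31:26]
        "rs": (instr >> 21) & 0x1F,     # Instr[25:21]
        "rt": (instr >> 16) & 0x1F,     # Instr[20:16]
        "rd": (instr >> 11) & 0x1F,     # Instr[15:11]
        "funct": instr & 0x3F,          # Instr[5:0]
        "imm": instr & 0xFFFF,          # Instr[15:0]
    }


if __name__ == "__main__":
    # First and fourth words of the lookup table.
    for word in (0x20020005, 0x00E22025):
        print(hex(word), decode_fields(word))
```

For example, 0x20020005 has op = 001000 (addi), rs = 0, rt = 2, and an immediate of 5, while 0x00e22025 is an R-type instruction with rs = 7, rt = 2, rd = 4, and funct = 100101 (or).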

Alternate Model of Instr. Memory

    module imem (input  [5:0]  addr,
                 output [31:0] instr);

      reg [31:0] RAM[63:0];

      // imem is RAM, loaded from memfile.dat file with hex values at startup
      initial
        begin
          $readmemh("memfile.dat", RAM);
        end

      assign instr = RAM[addr]; // instr at RAM[addr] is read out

    endmodule

    // imem can be created with CoreGen for Xilinx synthesis

Verilog Model of MIPS Processor

    // single-cycle MIPS processor
    module mips (input         clk, reset,
                 output [31:0] pc,
                 input  [31:0] instr,
                 output        memwrite,
                 output [31:0] aluout, writedata,
                 input  [31:0] readdata);

      wire       memtoreg, pcsrc, zero, alusrc, regdst, regwrite, jump;
      wire [2:0] alucontrol;

      controller c (instr[31:26], instr[5:0], zero,
                    memtoreg, memwrite, pcsrc,
                    alusrc, regdst, regwrite, jump,
                    alucontrol);

      datapath dp (clk, reset, memtoreg, pcsrc,
                   alusrc, regdst, regwrite, jump,
                   alucontrol, zero, pc, instr,
                   aluout, writedata, readdata);

    endmodule

Verilog Model of Controller

    module controller (input  [5:0] op, funct,
                       input        zero,
                       output       memtoreg, memwrite,
                       output       pcsrc, alusrc,
                       output       regdst, regwrite,
                       output       jump,
                       output [2:0] alucontrol);

      wire [1:0] aluop;
      wire       branch;

      maindec md (op, regwrite, regdst, alusrc, branch,
                  memwrite, memtoreg, aluop, jump);

      aludec ad (funct, aluop, alucontrol);

      // take the branch only when beq is decoded and the ALU reports equality
      assign pcsrc = branch & zero;

    endmodule

Verilog Model of Main Decoder

    module maindec (input  [5:0] op,
                    output       regwrite, regdst, alusrc, branch,
                    output       memwrite, memtoreg,
                    output [1:0] aluop,
                    output       jump);

      reg [8:0] controls;

      assign {regwrite, regdst, alusrc, branch,
              memwrite, memtoreg, aluop, jump} = controls;

      always @(*)
        case (op)
          6'b000000: controls = 9'b110000100; // Rtype
          6'b100011: controls = 9'b101001000; // LW
          6'b101011: controls = 9'b001010000; // SW
          6'b000100: controls = 9'b000100010; // BEQ
          6'b001000: controls = 9'b101000000; // ADDI
          6'b000010: controls = 9'b000000001; // J
          default:   controls = 9'bxxxxxxxxx; // ???
        endcase

    endmodule

Verilog Model of ALU Decoder

    module aludec (input      [5:0] funct,
                   input      [1:0] aluop,
                   output reg [2:0] alucontrol);

      always @(*)
        case (aluop)
          2'b00: alucontrol = 3'b010; // add
          2'b01: alucontrol = 3'b110; // sub
          default:
            case (funct)              // RTYPE
              6'b100000: alucontrol = 3'b010; // ADD
              6'b100010: alucontrol = 3'b110; // SUB
              6'b100100: alucontrol = 3'b000; // AND
              6'b100101: alucontrol = 3'b001; // OR
              6'b101010: alucontrol = 3'b111; // SLT
              default:   alucontrol = 3'bxxx; // ???
            endcase
        endcase

    endmodule

Verilog Model of Datapath

    module datapath (input         clk, reset,
                     input         memtoreg, pcsrc,
                     input         alusrc, regdst,
                     input         regwrite, jump,
                     input  [2:0]  alucontrol,
                     output        zero,
                     output [31:0] pc,
                     input  [31:0] instr,
                     output [31:0] aluout, writedata,
                     input  [31:0] readdata);

      wire [4:0]  writereg;
      wire [31:0] pcnext, pcnextbr, pcplus4, pcbranch;
      wire [31:0] signimm, signimmsh, srca, srcb, result;

      // next PC logic
      flopr #(32) pcreg (clk, reset, pcnext, pc);
      adder       pcadd1 (pc, 32'b100, pcplus4);
      sl2         immsh (signimm, signimmsh);
      adder       pcadd2 (pcplus4, signimmsh, pcbranch);
      mux2 #(32)  pcbrmux (pcplus4, pcbranch, pcsrc, pcnextbr);
      mux2 #(32)  pcmux (pcnextbr, {pcplus4[31:28], instr[25:0], 2'b00},
                         jump, pcnext);

Verilog Model of Datapath (cont'd)

      // register file logic
      regfile    rf (clk, regwrite, instr[25:21], instr[20:16],
                     writereg, result, srca, writedata);
      mux2 #(5)  wrmux (instr[20:16], instr[15:11], regdst, writereg);
      mux2 #(32) resmux (aluout, readdata, memtoreg, result);
      signext    se (instr[15:0], signimm);

      // ALU logic
      mux2 #(32) srcbmux (writedata, signimm, alusrc, srcb);
      alu        alu (srca, srcb, alucontrol, aluout, zero);

    endmodule

Verilog Model of Register File

    module regfile (input         clk,
                    input         we3,
                    input  [4:0]  ra1, ra2, wa3,
                    input  [31:0] wd3,
                    output [31:0] rd1, rd2);

      reg [31:0] rf [31:0];

      // three ported register file: read two ports combinationally
      // write third port on rising edge of clock. Register 0 hardwired to 0

      always @(posedge clk)
        if (we3) rf[wa3] <= wd3;

      assign rd1 = (ra1 != 0) ? rf[ra1] : 0;
      assign rd2 = (ra2 != 0) ? rf[ra2] : 0;

    endmodule
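
The behaviour of the three-ported register file can be explored with a software model. The Python class below is a rough sketch, not a translation of the Verilog: the class name, the `clock` method standing in for a rising clock edge, and the 32-bit masking are assumptions made for this example.

```python
class RegFile:
    """Two combinational read ports, one clocked write port, $0 reads as 0."""

    def __init__(self):
        self.rf = [0] * 32

    def read(self, ra1, ra2):
        """Return (rd1, rd2); address 0 always reads as 0."""
        rd1 = self.rf[ra1] if ra1 != 0 else 0
        rd2 = self.rf[ra2] if ra2 != 0 else 0
        return rd1, rd2

    def clock(self, we3, wa3, wd3):
        """Model one rising clock edge: write wd3 to wa3 when we3 is set."""
        if we3:
            self.rf[wa3] = wd3 & 0xFFFFFFFF


if __name__ == "__main__":
    rf = RegFile()
    rf.clock(1, 5, 7)      # write 7 to register 5
    rf.clock(1, 0, 99)     # a write to register 0 is never visible on reads
    print(rf.read(5, 0))
```

As in the Verilog, register 0 is forced to zero on the read side, so a write to address 0 has no visible effect.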

Verilog Models of Other Parts

    module adder (input  [31:0] a, b,
                  output [31:0] y);
      assign y = a + b;
    endmodule

    module sl2 (input  [31:0] a,
                output [31:0] y);
      // shift left by 2
      assign y = {a[29:0], 2'b00};
    endmodule

    module signext (input  [15:0] a,
                    output [31:0] y);
      // replicate the sign bit a[15] into the upper 16 bits
      assign y = {{16{a[15]}}, a};
    endmodule

Verilog for Parameterized Parts

    // resettable flip-flop
    module flopr #(parameter WIDTH = 8)
                  (input                  clk, reset,
                   input      [WIDTH-1:0] d,
                   output reg [WIDTH-1:0] q);
      always @(posedge clk, posedge reset)
        if (reset) q <= 0;
        else       q <= d;
    endmodule

    // resettable flip-flop with enable
    module flopenr #(parameter WIDTH = 8)
                    (input                  clk, reset, en,
                     input      [WIDTH-1:0] d,
                     output reg [WIDTH-1:0] q);
      always @(posedge clk, posedge reset)
        if      (reset) q <= 0;
        else if (en)    q <= d;
    endmodule

    // 2:1 multiplexer
    module mux2 #(parameter WIDTH = 8)
                 (input  [WIDTH-1:0] d0, d1,
                  input              s,
                  output [WIDTH-1:0] y);
      assign y = s ? d1 : d0;
    endmodule
![Page 56: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/56.jpg)
• Unscheduled function call to exception handler
• Caused by:
  – Hardware, also called an interrupt, e.g., keyboard
  – Software, also called a trap, e.g., undefined instruction
• When an exception occurs, the processor:
  – Records the cause of the exception (Cause register)
  – Jumps to the exception handler (0x80000180)
  – Returns to the program (EPC register)
Review: Exceptions
![Page 57: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/57.jpg)
Example Exception
![Page 58: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/58.jpg)
• Not part of register file; in Coprocessor 0
  – Cause
    • Records cause of exception
    • Coprocessor 0 register 13
  – EPC (Exception PC)
    • Records PC where exception occurred
    • Coprocessor 0 register 14
• Move from Coprocessor 0
  – mfc0 $t0, Cause (= mfc0 $t0, $13)
  – Moves contents of Cause into $t0

mfc0 instruction format:
Bits     31:26    25:21   20:16     15:11        10:0
Field    010000   00000   $t0 (8)   Cause (13)   00000000000
Review: Exception Registers
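The field layout above can be checked by assembling the instruction word by hand. The following is a minimal Python sketch, not part of the original slides; the function name `encode_mfc0` and the constants `T0` and `CAUSE` are hypothetical names chosen for illustration.

```python
def encode_mfc0(rt: int, rd: int) -> int:
    """Assemble 'mfc0 rt, rd' using the field layout shown on the slide."""
    if not (0 <= rt < 32 and 0 <= rd < 32):
        raise ValueError("register numbers must fit in 5 bits")
    opcode = 0b010000  # bits 31:26
    mf = 0b00000       # bits 25:21
    # bits 20:16 hold rt, bits 15:11 hold rd, bits 10:0 stay zero
    return (opcode << 26) | (mf << 21) | (rt << 16) | (rd << 11)


T0 = 8      # $t0 is general-purpose register 8
CAUSE = 13  # Cause is Coprocessor 0 register 13

word = encode_mfc0(T0, CAUSE)
print(f"0x{word:08X}")  # prints 0x40086800
```

Reading the fields back out of the word (for example `(word >> 11) & 0x1F`) gives 13, the Cause register number.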
![Page 59: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/59.jpg)
Exception                   Cause
Hardware Interrupt          0x00000000
System Call                 0x00000020
Breakpoint / Divide by 0    0x00000024
Undefined Instruction       0x00000028
Arithmetic Overflow         0x00000030

Extend single-cycle MIPS processor to handle last two types of exceptions
Review: Exception Causes
![Page 60: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/60.jpg)
Exception RTLs

Undefined Instruction
  IM[PC]
  . . .                  # problem in decoding (bad op or funct)
  Cause ← 40             # = 0x28
  EPC ← PC
  PC ← 0x80000180        # exception handler address

Arithmetic Overflow
  IM[PC]
  . . .                  # ALU operation overflows
  Cause ← 48             # = 0x30
  EPC ← PC
  PC ← 0x80000180        # exception handler address

mfc0 instruction (e.g., mfc0 $t1, $13)
  IM[PC]
  RF[rt] ← RFc0[rd]
  PC ← PC + 4
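
The register transfers for the two exceptions and for mfc0 can be sketched as a small behavioral model. This is an illustrative Python sketch under stated assumptions, not the slides' hardware: the class `Cpu` and its method names are hypothetical, and only the state touched by these RTLs is modeled.

```python
HANDLER = 0x80000180      # exception handler address
CAUSE_UNDEFINED = 0x28    # 40, undefined instruction
CAUSE_OVERFLOW = 0x30     # 48, arithmetic overflow


class Cpu:
    """Hypothetical model holding only PC, the register file, and Coprocessor 0."""

    def __init__(self, pc: int = 0):
        self.pc = pc
        self.rf = [0] * 32   # general-purpose registers
        self.c0 = [0] * 32   # Coprocessor 0 registers (13 = Cause, 14 = EPC)

    def raise_exception(self, cause: int) -> None:
        self.c0[13] = cause      # Cause <- cause code
        self.c0[14] = self.pc    # EPC <- PC
        self.pc = HANDLER        # PC <- 0x80000180

    def mfc0(self, rt: int, rd: int) -> None:
        if rt != 0:              # register 0 is hardwired to 0
            self.rf[rt] = self.c0[rd]   # RF[rt] <- RFc0[rd]
        self.pc += 4             # PC <- PC + 4


cpu = Cpu(pc=0x00400020)
cpu.raise_exception(CAUSE_OVERFLOW)
cpu.mfc0(9, 13)                  # mfc0 $t1, $13
print(hex(cpu.rf[9]), hex(cpu.c0[14]), hex(cpu.pc))
```

After the overflow, the handler can read the cause code 0x30 from $t1, and EPC still holds the address of the offending instruction.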
![Page 61: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/61.jpg)
[Figure: multicycle MIPS datapath extended with exception hardware. An EPC register (enable EPCWrite) and a Cause register (enable CauseWrite) are added; IntCause selects between the cause codes 0x30 and 0x28, the ALU provides an Overflow output, and the PCSrc1:0 mux gains the input 0x8000 0180.]
Exception Hardware: EPC & Cause
Never mind the multicycle datapath; focus on the exception hardware.
![Page 62: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/62.jpg)
[Figure: the same datapath extended for mfc0. MemtoReg widens to two bits (MemtoReg1:0) and its mux gains input 10, labeled C0. C0 comes from a mux addressed by Instr 15:11, where 01101 selects Cause and 01110 selects EPC.]
Exception Hardware: mfc0
Never mind the multicycle datapath; focus on the exception hardware.
![Page 63: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/63.jpg)
• Temporal parallelism
• Divide single-cycle processor into 5 stages:
  – Fetch
  – Decode
  – Execute
  – Memory
  – Writeback
• Add pipeline registers between stages
Pipelined MIPS Processor
![Page 64: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/64.jpg)
[Figure: timing diagram with time (ps) from 0 to 1900 on the horizontal axis. Single-cycle: instructions 1 and 2 each pass through Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, and Write Reg, one instruction after the other. Pipelined: instructions 1, 2, and 3 pass through the same five steps, overlapped in time.]
Single-Cycle vs. Pipelined
![Page 65: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/65.jpg)
lw  $s2, 40($0)
add $s3, $t1, $t2
sub $s4, $s1, $s5
and $s5, $t5, $t6
sw  $s6, 20($s1)
or  $s7, $t3, $t4

[Figure: pipeline diagram with time (cycles) 1 to 10 on the horizontal axis. Each instruction passes through IM, RF (read), ALU, DM, and RF (write), starting one cycle after the previous instruction.]
Pipelined Processor Abstraction
![Page 66: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/66.jpg)
[Figure: single-cycle datapath and pipelined datapath shown together, with the stages Fetch, Decode, Execute, Memory, and Writeback marked. In the pipelined version, signal names carry a stage suffix (PCF, InstrD, SrcAE, ALUOutM, ResultW).]
Single-Cycle & Pipelined Datapath
![Page 67: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/67.jpg)
[Figure: corrected pipelined datapath with stages Fetch, Decode, Execute, Memory, and Writeback. WriteRegE is carried through the pipeline as WriteRegM and WriteRegW.]
WriteReg must arrive at the same time as Result
Corrected Pipelined Datapath
![Page 68: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/68.jpg)
[Figure: pipelined datapath with control. The Control Unit decodes Op (31:26) and Funct (5:0) in the Decode stage into RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, and RegDstD. The signals travel through the pipeline registers with the instruction (e.g., RegWriteE, RegWriteM, RegWriteW), and PCSrcM is formed from BranchM and ZeroM.]

• Same control unit as single-cycle processor
• Control delayed to proper pipeline stage
Pipelined Processor Control
![Page 69: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/69.jpg)
• Occur when an instruction depends on the result of an instruction that hasn’t completed
• Types:
  – Data hazard: register value not yet written back to register file
  – Control hazard: next instruction not decided yet (caused by branches) or target address not calculated yet (jumps and branches)
Pipeline Hazards
![Page 70: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/70.jpg)
add $s0, $s2, $s3
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline diagram with time (cycles) 1 to 8 for the four instructions above. add writes $s0; and, or, and sub each read $s0.]
Data Hazard
![Page 71: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/71.jpg)
2 SW fixes:
• Insert nops in code at compile time
• Rearrange code at compile time

2 HW fixes:
• Stall the processor at run time
• Forward data at run time
Handling Data Hazards
![Page 72: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/72.jpg)
add $s0, $s2, $s3
nop
nop
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline diagram with time (cycles) 1 to 10. Two nops separate add from the instructions that read $s0.]

• Insert enough nops for result to be ready
• Or move independent useful instructions forward
Compile-Time Hazard Elimination
![Page 73: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/73.jpg)
add $s0, $s2, $s3
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline diagram with time (cycles) 1 to 8 for the same sequence, with the result of add forwarded to the instructions that read $s0.]
Data Forwarding
![Page 74: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/74.jpg)
[Figure: pipelined datapath with forwarding. A Hazard Unit takes RsE, RtE, WriteRegM, WriteRegW, RegWriteM, and RegWriteW as inputs and drives ForwardAE and ForwardBE, which control three-input muxes (00, 01, 10) on the SrcAE and SrcBE paths into the ALU.]
Data Forwarding
![Page 75: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/75.jpg)
• Forward to Execute stage from either:
  – Memory stage, or
  – Writeback stage
• Forwarding logic for ForwardAE:

  if      ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then ForwardAE = 10
  else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then ForwardAE = 01
  else                                                           ForwardAE = 00

• Forwarding logic for ForwardBE is the same, but with rsE replaced by rtE
Data Forwarding
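
The ForwardAE and ForwardBE conditions can be written as one function that takes the Execute-stage source register as a parameter. This is a minimal Python sketch of the logic on the slide; the function name `forward_select` and the register-number constants are illustrative assumptions.

```python
def forward_select(src_e: int, write_reg_m: int, reg_write_m: bool,
                   write_reg_w: int, reg_write_w: bool) -> int:
    """Return the 2-bit mux select for one ALU operand (rsE or rtE)."""
    if src_e != 0 and src_e == write_reg_m and reg_write_m:
        return 0b10   # forward from the Memory stage (takes priority)
    if src_e != 0 and src_e == write_reg_w and reg_write_w:
        return 0b01   # forward from the Writeback stage
    return 0b00       # no hazard: use the register file value


S0, S1 = 16, 17   # MIPS register numbers for $s0 and $s1

# add $s0, $s2, $s3 is in Memory while and $t0, $s0, $s1 is in Execute.
forward_ae = forward_select(S0, write_reg_m=S0, reg_write_m=True,
                            write_reg_w=0, reg_write_w=False)
forward_be = forward_select(S1, write_reg_m=S0, reg_write_m=True,
                            write_reg_w=0, reg_write_w=False)
print(f"ForwardAE={forward_ae:02b} ForwardBE={forward_be:02b}")  # 10 and 00
```

Checking the Memory stage first matters when both stages hold a write to the same register: the Memory-stage value is the more recent one.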
![Page 76: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/76.jpg)
lw  $s0, 40($0)
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline diagram with time (cycles) 1 to 8. The dependence of and on the $s0 value loaded by lw is marked "Trouble!".]

Stalling
Forwarding on a load-use hazard isn’t possible! The lw data is not available until the end of the Memory stage, which is too late for the Execute stage of the next instruction.
![Page 77: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/77.jpg)
lw  $s0, 40($0)
and $t0, $s0, $s1
or  $t1, $s4, $s0
sub $t2, $s0, $s5

[Figure: pipeline diagram with time (cycles) 1 to 9. A one-cycle stall (marked "Stall") delays and and or, so the sequence finishes one cycle later.]

Stalling
The HW solution is to stall the pipeline
![Page 78: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/78.jpg)
[Figure: pipelined datapath with stalling hardware. The Hazard Unit takes RsD, RtD, RtE, and MemtoRegE as additional inputs and adds the outputs StallF, StallD, and FlushE, which connect to the enable (EN) and clear (CLR) inputs of the pipeline registers.]
Stalling Hardware
![Page 79: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/79.jpg)
lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE

StallF = StallD = FlushE = lwstall

• By flushing the Execute stage and stalling the Fetch and Decode stages, the dependent instruction is held in Decode and repeated in the next clock cycle, this time with correct (forwarded) data!
Stalling Logic
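
The stall condition is a single Boolean expression, so it is easy to try on the lw/and pair from the stalling example. A minimal Python sketch follows; the function name `lw_stall` and the register-number constants are hypothetical.

```python
def lw_stall(rs_d: int, rt_d: int, rt_e: int, mem_to_reg_e: bool) -> bool:
    """lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE."""
    return bool(((rs_d == rt_e) or (rt_d == rt_e)) and mem_to_reg_e)


S0, S1, S4 = 16, 17, 20   # MIPS register numbers

# lw $s0, 40($0) is in Execute (rtE = $s0, MemtoRegE = 1);
# and $t0, $s0, $s1 is in Decode (rsD = $s0, rtD = $s1).
stall = lw_stall(rs_d=S0, rt_d=S1, rt_e=S0, mem_to_reg_e=True)
stall_f = stall_d = flush_e = stall   # StallF = StallD = FlushE = lwstall
print(stall_f, stall_d, flush_e)      # True True True
```

With `mem_to_reg_e=False` (for example an add in Execute) the function returns False, and the ordinary forwarding paths handle the dependence without a stall.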
![Page 80: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/80.jpg)
• beq:
  – Branch is not determined until the 4th stage of the pipeline
  – Instructions after the branch are fetched before the branch occurs
  – These instructions must be flushed if the branch is taken
• Branch misprediction penalty
  – The number of instructions flushed when the branch is taken
  – May be reduced by determining the branch earlier
Control Hazards
![Page 81: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/81.jpg)
[Figure: pipelined datapath with Hazard Unit (StallF, StallD, FlushE, ForwardAE, ForwardBE) before early branch resolution. The branch decision PCSrcM is formed from BranchM and ZeroM, and the branch target is PCBranchM.]
Control Hazards: Original Pipeline
![Page 82: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/82.jpg)
20  beq $t1, $t2, 40
24  and $t0, $s0, $s1
28  or  $t1, $s4, $s0
2C  sub $t2, $s0, $s5
30  ...
    ...
64  slt $t3, $s2, $s3

[Figure: pipeline diagram with time (cycles) 1 to 9. The and, or, and sub instructions fetched after beq are marked "Flush these instructions"; slt at address 64 is fetched afterwards.]
Control Hazards
![Page 83: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/83.jpg)
[Figure: pipelined datapath with early branch resolution. An equality comparator (=) in the Decode stage produces EqualD; the branch target PCBranchD and the decision PCSrcD now come from the Decode stage, and a second CLR input appears on the pipeline registers.]
But this introduces another data hazard in the Decode stage!
Early Branch Resolution
![Page 84: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/84.jpg)
20  beq $t1, $t2, 40
24  and $t0, $s0, $s1
28  or  $t1, $s4, $s0
2C  sub $t2, $s0, $s5
30  ...
    ...
64  slt $t3, $s2, $s3

[Figure: pipeline diagram with time (cycles) 1 to 9. Only the and instruction fetched after beq is marked "Flush this instruction"; slt at address 64 is fetched next.]
Early Branch Resolution
![Page 85: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/85.jpg)
[Figure: early-branch datapath with forwarding into the Decode stage. ForwardAD and ForwardBD control two-input muxes (0, 1) in front of the equality comparator, and the Hazard Unit additionally takes BranchD and RegWriteE as inputs.]
Forwarding to Early-branch HW
![Page 86: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/86.jpg)
• Forwarding logic:

  ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
  ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM

• Stalling logic:

  branchstall = BranchD AND RegWriteE AND (WriteRegE == rsD OR WriteRegE == rtD)
                OR
                BranchD AND MemtoRegM AND (WriteRegM == rsD OR WriteRegM == rtD)

  StallF = StallD = FlushE = (lwstall OR branchstall)
Control Forwarding & Stalling Logic
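
The Decode-stage forwarding and branch-stall equations can be exercised with a few register numbers. This is a minimal Python sketch of the equations on the slide; the function names `forward_d` and `branch_stall` and the example values are illustrative assumptions.

```python
def forward_d(src_d: int, write_reg_m: int, reg_write_m: bool) -> bool:
    """ForwardAD / ForwardBD: forward a Memory-stage result to the comparator."""
    return bool(src_d != 0 and src_d == write_reg_m and reg_write_m)


def branch_stall(branch_d: bool, rs_d: int, rt_d: int,
                 reg_write_e: bool, write_reg_e: int,
                 mem_to_reg_m: bool, write_reg_m: int) -> bool:
    """Stall a branch in Decode whose source is not yet available."""
    # Producer is still in Execute: its result does not exist yet.
    result_in_execute = branch_d and reg_write_e and write_reg_e in (rs_d, rt_d)
    # Producer is a load in Memory: its data is still being read.
    load_in_memory = branch_d and mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    return bool(result_in_execute or load_in_memory)


T1, T2 = 9, 10   # MIPS register numbers for $t1 and $t2

# An instruction writing $t1 is in Execute while beq $t1, $t2, ... is in Decode.
stall = branch_stall(True, T1, T2, reg_write_e=True, write_reg_e=T1,
                     mem_to_reg_m=False, write_reg_m=0)
# One cycle later the producer is in Memory, so its result can be forwarded.
fwd_ad = forward_d(T1, write_reg_m=T1, reg_write_m=True)
print(stall, fwd_ad)  # True True
```

The final stall signals would then be `lwstall or branchstall`, as on the slide.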
![Page 87: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/87.jpg)
• Guess whether branch will be taken
  – Backward branches are usually taken (in bottom-tested loops)
  – Consider history to improve guess
• Good prediction significantly reduces the fraction of branches requiring a flush
• Requires HW for history table, etc.
Branch Prediction
![Page 88: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/88.jpg)
• SPECINT2000 benchmark:
  – 25% loads
  – 10% stores
  – 11% branches
  – 2% jumps
  – 52% R-type
• Suppose:
  – 40% of loads used by next instruction
  – 25% of branches mispredicted
  – All jumps flush next instruction (JTA not ready)
• What is the average CPI?
Pipelined Performance Example
![Page 89: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/89.jpg)
• Average CPI is the weighted average of $\mathrm{CPI}_{lw}$, $\mathrm{CPI}_{sw}$, $\mathrm{CPI}_{beq}$, $\mathrm{CPI}_{j}$, and $\mathrm{CPI}_{R\text{-}type}$
• For pipelined processors, CPI = 1 + number of stall cycles
• Load CPI = 1 when no stall, 2 when a load-use hazard occurs (1 stall)
  – $\mathrm{CPI}_{lw} = 1(0.6) + 2(0.4) = 1.4$
  – $\mathrm{CPI}_{sw} = 1$
• Branch CPI = 1 when no stall, 2 when it mispredicts and stalls
  – $\mathrm{CPI}_{beq} = 1(0.75) + 2(0.25) = 1.25$
• Jump CPI = 2, since it always requires 1 stall
  – $\mathrm{CPI}_{j} = 2$
  – $\mathrm{CPI}_{R\text{-}type} = 1$

$$\text{Average CPI} = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15$$
Calculation of Average CPI
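
The weighted average is easy to reproduce numerically. A minimal Python sketch using the instruction mix and stall assumptions from the example; the dictionary names `mix` and `cpi` are illustrative.

```python
# Fraction of each instruction class in the SPECINT2000 mix.
mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}

# Per-class CPI: 1 cycle, plus 1 stall cycle in the cases that stall.
cpi = {
    "lw": 1 * 0.60 + 2 * 0.40,    # 40% of loads are used by the next instruction
    "sw": 1,
    "beq": 1 * 0.75 + 2 * 0.25,   # 25% of branches are mispredicted
    "j": 2,                       # every jump flushes one instruction
    "rtype": 1,
}

average_cpi = sum(mix[kind] * cpi[kind] for kind in mix)
print(f"{average_cpi:.2f}")  # prints 1.15
```

The unrounded value is 1.1475; the slides round it to 1.15 before using it in the execution-time calculation.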
![Page 90: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/90.jpg)
• Pipelined processor critical path:

$$T_c = \max \left\{ \begin{array}{l}
t_{pcq} + t_{mem} + t_{setup} \\
2\,(t_{RFread} + t_{mux} + t_{eq} + t_{AND} + t_{mux} + t_{setup}) \\
t_{pcq} + t_{mux} + t_{mux} + t_{ALU} + t_{setup} \\
t_{pcq} + t_{memwrite} + t_{setup} \\
2\,(t_{pcq} + t_{mux} + t_{RFwrite})
\end{array} \right\}$$
Pipelined Performance
![Page 91: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/91.jpg)
Element                 Parameter            Delay (ps)
Register clock-to-Q     $t_{pcq\_PC}$        30
Register setup          $t_{setup}$          20
Multiplexer             $t_{mux}$            25
ALU                     $t_{ALU}$            200
Memory read             $t_{mem}$            250
Register file read      $t_{RFread}$         150
Register file setup     $t_{RFsetup}$        20
Equality comparator     $t_{eq}$             40
AND gate                $t_{AND}$            15
Memory write            $t_{memwrite}$       220
Register file write     $t_{RFwrite}$        100

$$T_c = 2\,(t_{RFread} + t_{mux} + t_{eq} + t_{AND} + t_{mux} + t_{setup}) = 2\,[150 + 25 + 40 + 15 + 25 + 20]\ \text{ps} = 550\ \text{ps}$$
Pipelined Performance Example
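
To see why the second term of the critical-path expression sets the clock period, all five terms can be evaluated with the delays from the table. This is a minimal Python sketch; the stage labels on the five terms are an assumption (the five terms are taken to correspond, in order, to Fetch, Decode, Execute, Memory, and Writeback), and the dictionary keys are illustrative names.

```python
# Element delays in picoseconds, copied from the table on the slide.
t = {
    "pcq": 30, "setup": 20, "mux": 25, "alu": 200, "mem": 250,
    "rf_read": 150, "eq": 40, "and": 15, "mem_write": 220, "rf_write": 100,
}

# One entry per term of the max{...} expression (stage names are assumed).
paths = {
    "Fetch": t["pcq"] + t["mem"] + t["setup"],
    "Decode": 2 * (t["rf_read"] + t["mux"] + t["eq"] + t["and"]
                   + t["mux"] + t["setup"]),
    "Execute": t["pcq"] + t["mux"] + t["mux"] + t["alu"] + t["setup"],
    "Memory": t["pcq"] + t["mem_write"] + t["setup"],
    "Writeback": 2 * (t["pcq"] + t["mux"] + t["rf_write"]),
}

critical = max(paths, key=paths.get)
tc = paths[critical]
for name, delay in paths.items():
    print(f"{name:<10}{delay:>4} ps")
print("Tc =", tc, "ps, limited by", critical)
```

The five terms evaluate to 300, 550, 300, 270, and 310 ps, so the cycle time is 550 ps.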
![Page 92: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/92.jpg)
Program with IC = 100 billion instructions

$$\text{Execution Time} = IC \times \mathrm{CPI} \times T_c = (100 \times 10^{9})(1.15)(550 \times 10^{-12}) = 63\ \text{seconds}$$
Pipelined Performance Example
![Page 93: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/93.jpg)
Processor       Execution Time (seconds)    Speedup (single-cycle as baseline)
Single-cycle    92.5                        1
Multicycle      133                         0.70
Pipelined       63                          1.47
Processor Performance Comparison
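
The pipelined execution time and the speedup column can be recomputed from the numbers on the preceding slides. A minimal Python sketch; the variable names are illustrative, and the single-cycle and multicycle times are taken from the table as given.

```python
ic = 100e9     # instruction count
cpi = 1.15     # average CPI from the pipelined example
tc = 550e-12   # cycle time in seconds (550 ps)

pipelined_time = ic * cpi * tc
print(f"pipelined: {pipelined_time:.2f} s")  # 63.25 s, shown as 63 in the table

# Execution times in seconds as listed in the comparison table.
times = {"Single-cycle": 92.5, "Multicycle": 133.0, "Pipelined": 63.0}
speedup = {name: times["Single-cycle"] / value for name, value in times.items()}
for name, value in speedup.items():
    print(f"{name:<13}{value:.2f}")
```

The speedups come out as 1.00, 0.70, and 1.47, matching the table.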
![Page 94: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/94.jpg)
• Deep Pipelining
• Branch Prediction
• Superscalar Processors
• Out of Order Processors
• Register Renaming
• SIMD
• Multithreading
• Multiprocessors
Advanced Microarchitecture
![Page 95: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/95.jpg)
• 10-20 stages typical
• Number of stages limited by:
  – Pipeline hazards
  – Sequencing overhead
  – Power
  – Cost
Deep Pipelining
![Page 96: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/96.jpg)
• Ideal pipelined processor: CPI = 1
• Branch misprediction increases CPI
• Static branch prediction:
  – Check direction of branch (forward or backward)
  – If backward, predict taken
  – Else, predict not taken
• Dynamic branch prediction:
  – Keep history of last (several hundred) branches in branch target buffer, record:
    • Branch destination
    • Whether branch was taken
Branch Prediction
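
The static rule only needs the branch address and its target. A minimal Python sketch, assuming "backward" means the target address is lower than the address of the branch; the function name `predict_taken_static` and the example addresses are hypothetical.

```python
def predict_taken_static(branch_pc: int, target_pc: int) -> bool:
    """Static prediction: backward branches are predicted taken."""
    return target_pc < branch_pc


# A loop-closing branch at 0x0040 that jumps back to 0x0020 is predicted taken.
print(predict_taken_static(0x0040, 0x0020))  # True
# A forward branch that skips ahead to 0x0060 is predicted not taken.
print(predict_taken_static(0x0040, 0x0060))  # False
```

Because the rule uses no history, it needs no table, but it cannot adapt to branches that behave differently from the typical loop pattern.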
![Page 97: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/97.jpg)
      add  $s1, $0, $0        # sum = 0
      add  $s0, $0, $0        # i = 0
      addi $t0, $0, 10        # $t0 = 10
for:
      beq  $s0, $t0, done     # if i == 10, branch
      add  $s1, $s1, $s0      # sum = sum + i
      addi $s0, $s0, 1        # increment i
      j    for
done:
Branch Prediction Example
![Page 98: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/98.jpg)
• Remembers whether branch was taken the last time and does the same thing
• Mispredicts first and last branch of loop
1-Bit Branch Predictor
![Page 99: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/99.jpg)
Only mispredicts the last branch of the loop

[Figure: state transition diagram with four states: strongly taken (predict taken), weakly taken (predict taken), weakly not taken (predict not taken), and strongly not taken (predict not taken). A taken branch moves the state one step toward strongly taken; a not-taken branch moves it one step toward strongly not taken.]
2-Bit Branch Predictor
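
The difference between the two predictors shows up when the loop from the Branch Prediction Example runs more than once. The following Python sketch simulates both on the beq outcomes of that loop (not taken ten times, then taken once to exit). The function names, the initial predictor states, and the encoding of the four states as 0 to 3 are assumptions made for illustration.

```python
def run_1bit(outcomes, last_taken=False):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if taken != last_taken:
            misses += 1
        last_taken = taken
    return misses, last_taken


def run_2bit(outcomes, state=0):
    """2-bit saturating counter: 0 = strongly not taken ... 3 = strongly taken."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:      # states 2 and 3 predict taken
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses, state


# beq $s0, $t0, done: falls through for i = 0..9, taken when i == 10.
loop = [False] * 10 + [True]

_, bit = run_1bit(loop)                  # first pass warms up the predictor
second_1bit, _ = run_1bit(loop, bit)     # second pass through the same loop
_, counter = run_2bit(loop)
second_2bit, _ = run_2bit(loop, counter)
print(second_1bit, second_2bit)          # 2 1
```

On the second pass the 1-bit predictor misses both the first and the last branch, while the 2-bit predictor misses only the last one, because a single taken branch moves it only from strongly not taken to weakly not taken.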
![Page 100: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/100.jpg)
• Multiple copies of datapath execute multiple instructions at once
• Dependencies make it tricky to issue multiple instructions at once

[Figure: superscalar datapath with PC, an Instruction Memory, a Register File with six ports (A1 to A6, RD1, RD2, RD4, RD5, WD3, WD6), two ALUs, and a Data Memory with two address ports (A1, A2), two write-data ports (WD1, WD2), and two read-data ports (RD1, RD2).]
Superscalar
![Page 101: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/101.jpg)
lw  $t0, 40($s0)
add $t1, $s1, $s2
sub $t2, $s1, $s3
and $t3, $s3, $s4
or  $t4, $s1, $s5
sw  $s5, 80($s0)

Ideal IPC: 2
Actual IPC: 2

[Figure: superscalar pipeline diagram with time (cycles) 1 to 8. The six instructions issue two at a time: lw with add, sub with and, or with sw.]
Superscalar Example
![Page 102: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/102.jpg)
lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/5 = 1.2

[Figure: superscalar pipeline diagram with time (cycles) 1 to 9. Dependencies between the instructions force a stall (marked "Stall"), so the six instructions take five cycles to issue.]
Superscalar with Dependencies
![Page 103: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/103.jpg)
• Looks ahead across multiple instructions
• Issues as many instructions as possible at once
• Issues instructions out of order (as long as no dependencies)
• Dependencies:
  – RAW (read after write): one instruction writes, later instruction reads a register
  – WAR (write after read): one instruction reads, later instruction writes a register
  – WAW (write after write): one instruction writes, later instruction writes a register
Out of Order Processor
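
The three dependency types can be detected by comparing the destination and source registers of an earlier and a later instruction. A minimal Python sketch; the `(destination, sources)` tuple representation and the function name `dependencies` are assumptions for illustration, and the example instructions are taken from the program used on the superscalar slides.

```python
def dependencies(earlier, later):
    """Classify register dependencies between two instructions.

    Each instruction is a (destination, sources) pair; destination is None
    for instructions that write no register (e.g., sw).
    """
    dest1, srcs1 = earlier
    dest2, srcs2 = later
    found = set()
    if dest1 is not None and dest1 in srcs2:
        found.add("RAW")   # later reads what earlier writes
    if dest2 is not None and dest2 in srcs1:
        found.add("WAR")   # later writes what earlier reads
    if dest1 is not None and dest1 == dest2:
        found.add("WAW")   # both write the same register
    return found


lw = ("$t0", ("$s0",))            # lw  $t0, 40($s0)
add = ("$t1", ("$t0", "$s1"))     # add $t1, $t0, $s1
sub = ("$t0", ("$s2", "$s3"))     # sub $t0, $s2, $s3

print(dependencies(lw, add))   # {'RAW'}
print(dependencies(add, sub))  # {'WAR'}
print(dependencies(lw, sub))   # {'WAW'}
```

Only RAW is a true data dependence; WAR and WAW arise from reusing a register name, which is what register renaming removes.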
![Page 104: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/104.jpg)
• Instruction level parallelism (ILP): number of instructions that can be issued simultaneously (average < 3)
• Scoreboard: table that keeps track of:
  – Instructions waiting to issue
  – Available functional units
  – Dependencies
Out of Order Processor
![Page 105: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/105.jpg)
Chapter 7 <105>
lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)
Ideal IPC: 2
Actual IPC: 6/4 = 1.5
[Figure: out-of-order pipeline diagram, Time (cycles) 1 to 8, annotated with the two-cycle latency between the load and the use of $t0 and with the RAW and WAR dependencies]
Out of Order Processor Example
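The four-cycle schedule can be reproduced with a simplified out-of-order issue model. This is a sketch under assumptions, not the scoreboard of a real processor: it assumes two-way issue, a window covering the whole program, a two-cycle load-use latency, a one-cycle latency for everything else, and that a later writer may issue in the same cycle as an earlier reader or writer of the same register. The names `PROGRAM`, `latency`, `can_issue`, and `issue_out_of_order` are invented for illustration.

```python
# Hypothetical encoding: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("lw",  "$t0", ["$s0"]),
    ("add", "$t1", ["$t0", "$s1"]),
    ("sub", "$t0", ["$s2", "$s3"]),
    ("and", "$t2", ["$s4", "$t0"]),
    ("or",  "$t3", ["$s5", "$s6"]),
    ("sw",  None,  ["$s7", "$t3"]),
]


def latency(op):
    # Assumed latencies: two cycles from a load to its use, one cycle otherwise.
    return 2 if op == "lw" else 1


def can_issue(program, j, issued_at, cycle):
    """Check RAW, WAR, and WAW constraints for instruction j in this cycle."""
    _, dest, srcs = program[j]
    # RAW: the most recent earlier writer of each source must have its result ready.
    for reg in srcs:
        for i in range(j - 1, -1, -1):
            p_op, p_dest, _ = program[i]
            if p_dest == reg:
                if i not in issued_at or issued_at[i] + latency(p_op) > cycle:
                    return False
                break
    # WAR and WAW: earlier readers and writers of dest must already have issued.
    if dest is not None:
        for i in range(j):
            _, p_dest, p_srcs = program[i]
            if (dest in p_srcs or dest == p_dest) and i not in issued_at:
                return False
    return True


def issue_out_of_order(program, width=2):
    """Return one issue group per cycle, scanning the window in program order."""
    issued_at = {}  # instruction index -> issue cycle
    groups = []
    while len(issued_at) < len(program):
        cycle = len(groups) + 1
        group = []
        for j, (op, _, _) in enumerate(program):
            if j in issued_at or len(group) == width:
                continue
            if can_issue(program, j, issued_at, cycle):
                issued_at[j] = cycle
                group.append(op)
        groups.append(group)
    return groups


groups = issue_out_of_order(PROGRAM)
print(groups)
print(f"IPC = {len(PROGRAM)}/{len(groups)} = {len(PROGRAM) / len(groups):.2f}")
```

In this model or issues alongside lw in the first cycle, sw follows in the second, sub waits for add because of the WAR dependency on $t0, and and issues last, giving six instructions in four cycles.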
![Page 106: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/106.jpg)
Chapter 7 <106>
lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)
Ideal IPC: 2
Actual IPC: 6/3 = 2
With $t0 renamed to $r0 in sub and and:
sub $r0, $s2, $s3
and $t2, $s4, $r0
[Figure: pipeline diagram of the renamed program, Time (cycles) 1 to 7, annotated with the RAW dependencies, including a 2-cycle RAW]
Register Renaming
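Renaming can be sketched as a table that maps each architectural register to the physical register holding its latest value. The slide renames only the destination of sub; the sketch below is a more general variant that gives every destination a fresh register, which removes all WAR and WAW dependencies and leaves only RAW. The physical register names `$p0`, `$p1`, ... and the function `rename` are hypothetical.

```python
# Hypothetical encoding: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("lw",  "$t0", ["$s0"]),
    ("add", "$t1", ["$t0", "$s1"]),
    ("sub", "$t0", ["$s2", "$s3"]),
    ("and", "$t2", ["$s4", "$t0"]),
    ("or",  "$t3", ["$s5", "$s6"]),
    ("sw",  None,  ["$s7", "$t3"]),
]


def rename(program):
    """Give every destination a fresh physical register ($p0, $p1, ...)."""
    mapping = {}    # architectural register -> current physical register
    next_free = 0
    renamed = []
    for op, dest, srcs in program:
        # Sources read the most recent mapping; unmapped registers keep their name.
        new_srcs = [mapping.get(reg, reg) for reg in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"$p{next_free}"
            next_free += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


renamed = rename(PROGRAM)
for instr in renamed:
    print(instr)
```

After renaming, add still reads the register written by lw and and reads the register written by sub, but sub no longer writes a register that add reads, so the WAR dependency is gone.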
![Page 107: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/107.jpg)
Chapter 7 <107>
• Single Instruction Multiple Data (SIMD)
– Single instruction acts on multiple pieces of data at once
– Common application: graphics
– Performs short arithmetic operations (also called packed arithmetic)
• For example, add four 8-bit elements
padd8 $s2, $s0, $s1
Bit position:  31..24    23..16    15..8     7..0
$s0:           a3        a2        a1        a0
$s1:           b3        b2        b1        b0
$s2:           a3 + b3   a2 + b2   a1 + b1   a0 + b0
SIMD
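The lane-by-lane behavior of padd8 can be emulated in software. The sketch below assumes each 8-bit sum wraps around modulo 256 and that no carry crosses from one lane into the next; the slide does not say how overflow is handled, so the wraparound is an assumption. The Python function `padd8` is an illustration, not an existing library call.

```python
def padd8(a, b):
    """Add four packed 8-bit lanes of two 32-bit words.

    Assumption: each lane wraps modulo 256 and carries never cross lanes.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result


s0 = 0x01FF7F10  # a3 = 0x01, a2 = 0xFF, a1 = 0x7F, a0 = 0x10
s1 = 0x0101010F  # b3 = 0x01, b2 = 0x01, b1 = 0x01, b0 = 0x0F
s2 = padd8(s0, s1)
print(f"{s2:#010x}")
```

Here lane 2 computes 0xFF + 0x01 and wraps to 0x00 without disturbing lane 3, which is the difference between a packed add and an ordinary 32-bit add.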
![Page 108: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/108.jpg)
Chapter 7 <108>
• Multithreading
– Word processor: threads for typing, spell checking, printing
• Multiprocessors
– Multiple processors (cores) on a single chip
Advanced Architecture Techniques
![Page 109: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/109.jpg)
Chapter 7 <109>
• Process: program running on a computer
– Multiple processes can run at once: e.g., surfing the Web, playing music, writing a paper
• Thread: part of a program
– A process can have multiple threads: e.g., a word processor may have threads for typing, spell checking, printing
Threading: Definitions
![Page 110: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/110.jpg)
Chapter 7 <110>
• Only one thread runs at a time
• When one thread stalls (for example, waiting for memory):
– Architectural state of that thread is stored
– Architectural state of the waiting thread is loaded into the processor and it runs
– Called context switching
• Appears to the user as if all threads are running simultaneously
Threads in Conventional Processor
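The store-and-load steps of a context switch can be sketched in a few lines. The model below is a deliberate simplification: it treats the architectural state as just a program counter and 32 registers, and the classes `ThreadState` and `Processor`, the function `context_switch`, and the addresses used are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    """Saved architectural state of a thread that is not running."""
    name: str
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)


@dataclass
class Processor:
    """The single copy of architectural state in a conventional processor."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    running: str = ""


def context_switch(cpu, stalled, waiting):
    # Store the architectural state of the thread that stalled.
    stalled.pc = cpu.pc
    stalled.regs = list(cpu.regs)
    # Load the architectural state of the waiting thread so it can run.
    cpu.pc = waiting.pc
    cpu.regs = list(waiting.regs)
    cpu.running = waiting.name


typing_thread = ThreadState("typing", pc=0x400000)
spell_thread = ThreadState("spell checking", pc=0x400800)

cpu = Processor(pc=typing_thread.pc, regs=list(typing_thread.regs),
                running=typing_thread.name)
cpu.pc += 8        # the typing thread executes two instructions
cpu.regs[8] = 42   # and writes $t0 (register 8)

context_switch(cpu, typing_thread, spell_thread)  # typing stalls on memory
print(cpu.running, hex(cpu.pc), hex(typing_thread.pc), typing_thread.regs[8])
```

Because the processor holds only one copy of the state, every switch pays for the store and the load.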
![Page 111: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/111.jpg)
Chapter 7 <111>
• Multiple copies of architectural state
• Multiple threads active at once:
– When one thread stalls, another runs immediately
– If one thread can’t keep all execution units busy, another thread can use them
• Does not increase instruction-level parallelism (ILP) of a single thread, but increases throughput
Intel calls this “hyperthreading”
Multithreading
![Page 112: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris](https://reader035.vdocument.in/reader035/viewer/2022081416/5697bfa81a28abf838c99832/html5/thumbnails/112.jpg)
Chapter 7 <112>
• Multiple processors (cores) with a method of communication between them
• Types:
– Homogeneous: multiple cores with shared memory
– Heterogeneous: separate cores for different tasks (for example, DSP and CPU in a cell phone)
– Clusters: each core has its own memory system
Multiprocessors