Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
![Page 1: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/1.jpg)
ECE 4100/6100 Advanced Computer Architecture
Lecture 1 Pipelining (3055 Review)
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
![Page 2: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/2.jpg)
Pipeline Stage
[Figure: combinational logic between two flip-flops (F/F); annotations: P4 pipe stage ~16 FO4, 1 FO4]
• Optimal FO4 per pipe
– 6 to 8 [UT/Compaq, ISCA-29]
– 18 (15 + 3 latch) [IBM, MICRO-35]
![Page 3: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/3.jpg)
Five-stage Pipelined Datapath
[Figure: five-stage pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Stages: Inst. Fetch, Inst. Decode, Exec, Mem, WB]
![Page 4: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/4.jpg)
Example for lw instruction: Instruction Fetch (IF)
[Figure: five-stage pipelined datapath, instruction fetch stage]
![Page 5: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/5.jpg)
Example for lw instruction: Instruction Decode (ID)
[Figure: five-stage pipelined datapath, instruction decode stage]
![Page 6: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/6.jpg)
Example for lw instruction: Execution (EX)
[Figure: five-stage pipelined datapath, execution stage]
![Page 7: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/7.jpg)
Example for lw instruction: Memory (MEM)
[Figure: five-stage pipelined datapath, memory stage]
![Page 8: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/8.jpg)
Example for lw instruction: Writeback (WB)
[Figure: five-stage pipelined datapath, writeback stage]
![Page 9: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/9.jpg)
Example for sw instruction: Memory (MEM)
[Figure: five-stage pipelined datapath, memory stage]
![Page 10: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/10.jpg)
Example for sw instruction: Writeback (WB): do nothing
[Figure: five-stage pipelined datapath, writeback stage]
![Page 11: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/11.jpg)
Corrected Datapath (for lw)
[Figure: five-stage pipelined datapath, corrected for lw]
![Page 12: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/12.jpg)
Pipeline Control
[Figure: pipelined datapath with control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, and MemtoReg; instruction fields Instruction[20–16], Instruction[15–11], and Instruction[15–0]]
![Page 13: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/13.jpg)
Pipeline control
• We have 5 stages. What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution (4 lines)
  • RegDst
  • ALUOp[1:0]
  • ALUSrc
– Memory Stage (3 lines)
  • Branch
  • MemRead
  • MemWrite
– Write Back (2 lines)
  • MemtoReg
  • RegWrite (note that this signal is in ID stage)
![Page 14: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/14.jpg)
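Pipeline Control
• Extend pipeline registers to include control information (created in ID)
• Pass control signals along just like the data

Execution/Address Calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc. Memory access stage control lines: Branch, MemRead, MemWrite. Write-back stage control lines: RegWrite, MemtoReg.

| Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg |
|---|---|---|---|---|---|---|---|---|---|
| R-format | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| lw | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| sw | X | 0 | 0 | 1 | 0 | 0 | 1 | 0 | X |
| beq | X | 0 | 1 | 0 | 1 | 0 | 0 | 0 | X |

[Figure: Control decodes the instruction from IF/ID and produces the EX, M, and WB signal groups. ID/EX carries EX, M, and WB; EX/MEM carries M and WB; MEM/WB carries WB]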
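The control table can be written down as data, which makes it easy to see which bits each pipeline register has to carry. This is a minimal sketch, not the lecture's own code: the names `CONTROL`, `bundle`, and `carried_by` are hypothetical, and the bit strings are transcribed from the table above in column order.

```python
# Control bits per instruction class, in table column order:
# RegDst, ALUOp1, ALUOp0, ALUSrc | Branch, MemRead, MemWrite | RegWrite, MemtoReg
# "X" is a don't-care. All names here are hypothetical illustrations.
CONTROL = {
    "R-format": "110000010",
    "lw":       "000101011",
    "sw":       "X0010010X",
    "beq":      "X0101000X",
}


def bundle(instr):
    """Split the nine control bits into the EX, M, and WB groups."""
    bits = CONTROL[instr]
    return {"EX": bits[0:4], "M": bits[4:7], "WB": bits[7:9]}


def carried_by(instr):
    """Control groups held in each pipeline register after decode in ID."""
    b = bundle(instr)
    return {
        "ID/EX": {"EX": b["EX"], "M": b["M"], "WB": b["WB"]},
        "EX/MEM": {"M": b["M"], "WB": b["WB"]},
        "MEM/WB": {"WB": b["WB"]},
    }


if __name__ == "__main__":
    for name in CONTROL:
        print(name, bundle(name))
```

Each group is dropped from the pipeline registers once its stage has consumed it, which is why MEM/WB only needs the two WB bits.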
![Page 15: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/15.jpg)
Datapath with Control
[Figure: pipelined datapath with the Control unit; EX, M, and WB control groups pass through ID/EX, EX/MEM, and MEM/WB]
![Page 16: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/16.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: lw $10, 8($1)
![Page 17: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/17.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: sub $11, $2, $3
• ID: lw $10, 8($1)
• Control decodes “lw”: WB = 11, M = 010, EX = 0001
![Page 18: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/18.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: and $12, $4, $5
• ID: sub $11, $2, $3
• EX: lw $10, 8($1)
• Control decodes “sub”: WB = 10, M = 000, EX = 1100
![Page 19: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/19.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: or $13, $6, $7
• ID: and $12, $4, $5
• EX: sub $11, $2, $3
• MEM: lw $10, 8($1)
• Control decodes “and”: WB = 10, M = 000, EX = 1100
![Page 20: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/20.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: add $14, $8, $9
• ID: or $13, $6, $7
• EX: and $12, $4, $5
• MEM: sub $11, ..
• WB: lw $10, 8($1)
• Control decodes “or”: WB = 10, M = 000, EX = 1100
![Page 21: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/21.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: xxxx
• ID: add $14, $8, $9
• EX: or $13, $6, $7
• MEM: and $12…
• WB: sub $11, ..
• Control decodes “add”: WB = 10, M = 000, EX = 1100
![Page 22: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/22.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: xxxx
• ID: xxxx
• EX: add $14, $8, $9
• MEM: or $13, ..
• WB: and $12…
![Page 23: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/23.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: xxxx
• ID: xxxx
• EX: xxxx
• MEM: add $14, ..
• WB: or $13…
![Page 24: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/24.jpg)
Datapath with Control
[Figure: datapath with control, annotated with the instruction in each stage]
• IF: xxxx
• ID: xxxx
• EX: xxxx
• MEM: xxxx
• WB: add $14..
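The nine "Datapath with Control" snapshots follow a simple rule: with no stalls, the instruction at position i enters IF in cycle i+1 and moves one stage per cycle. The sketch below reproduces the stage contents for the lw/sub/and/or/add sequence under that assumption. The names `STAGES`, `PROGRAM`, and `occupancy` are hypothetical, and `xxxx` marks an empty or unknown slot as on the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# The instruction sequence walked through on the slides.
PROGRAM = [
    "lw $10, 8($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Return {stage: instruction} for a 1-based clock cycle, assuming no stalls."""
    contents = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # which instruction has reached this stage
        contents[stage] = program[index] if 0 <= index < len(program) else "xxxx"
    return contents


if __name__ == "__main__":
    total_cycles = len(PROGRAM) + len(STAGES) - 1
    for cycle in range(1, total_cycles + 1):
        print(cycle, occupancy(PROGRAM, cycle))
```

Five instructions through five stages take 5 + 5 - 1 = 9 cycles, matching the nine snapshots.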
![Page 25: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/25.jpg)
Pipelining is not quite that straightforward!
• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of instructions
– Data hazards: Instruction depends on result of prior instruction still in the pipeline
– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
![Page 26: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/26.jpg)
“Single” Memory Port
[Figure: pipeline timing diagram, time (clock cycles 1 to 7) across and instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
![Page 27: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/27.jpg)
“Single” Memory Port / Structural Hazard
[Figure: pipeline timing diagram, time (clock cycles 1 to 7) across and instruction order down: Load, add, Instr 2, Instr 3, Instr 4. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
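The structural hazard in the diagram can be located mechanically. Assuming an ideal five-stage pipeline with no stalls, instruction i (0-based) fetches in cycle i+1 and reaches DMem in cycle i+4, so a load or store collides with the fetch of the instruction three slots behind it when both share one memory port. The function name `memory_port_conflicts` is hypothetical.

```python
def memory_port_conflicts(uses_data_memory):
    """Find cycles where a data access and an instruction fetch need the same port.

    uses_data_memory[i] is True when instruction i is a load or store.
    Assumes an ideal 5-stage pipeline without stalls: instruction i is in
    Ifetch during cycle i + 1 and in DMem during cycle i + 4.
    Returns a list of (cycle, memory_instr_index, fetched_instr_index).
    """
    conflicts = []
    for i, uses in enumerate(uses_data_memory):
        fetched = i + 3  # instruction being fetched while i is in DMem
        if uses and fetched < len(uses_data_memory):
            conflicts.append((i + 4, i, fetched))
    return conflicts


if __name__ == "__main__":
    # Load followed by four instructions that do not touch data memory.
    print(memory_port_conflicts([True, False, False, False, False]))
```

For the slide's sequence this reports one conflict in cycle 4, between the Load's data access and the fetch of Instr 3.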
![Page 28: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/28.jpg)
Data Hazard
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline timing diagram, time (clock cycles) across and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
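The dependences in this sequence can be listed with a few lines of code. The sketch below finds read-after-write (RAW) pairs in three-operand register instructions written as on the slide. The helper names `parse` and `raw_dependences` are hypothetical, and the parser only handles the `op rd,rs,rt` form.

```python
def parse(instr):
    """Split 'add r1,r2,r3' into (op, dest, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for each RAW dependence."""
    deps = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for i, instr in enumerate(program):
        _, dest, sources = parse(instr)
        for src in sources:
            if src in last_writer:
                deps.append((last_writer[src], i, src))
        last_writer[dest] = i
    return deps


PROGRAM = [
    "add r1,r2,r3",
    "sub r4,r1,r3",
    "and r6,r1,r7",
    "or r8,r1,r9",
    "xor r10,r1,r11",
]

if __name__ == "__main__":
    for producer, consumer, reg in raw_dependences(PROGRAM):
        print(f"{PROGRAM[consumer]!r} reads {reg} written by {PROGRAM[producer]!r}, "
              f"distance {consumer - producer}")
```

Whether a given distance is an actual hazard depends on the pipeline. The usual textbook assumption, not stated on this slide, is that the register file is written in the first half of a cycle and read in the second half, so only distances 1 and 2 need forwarding or a stall.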
![Page 29: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/29.jpg)
Forwarding to Avoid Data Hazard
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: pipeline timing diagram, time (clock cycles) across and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
![Page 30: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/30.jpg)
Forwarding (simplified)
[Figure: Register File, ALU, and Data Memory separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers, with one MUX]
![Page 31: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/31.jpg)
Forwarding (from EX/MEM)
[Figure: Register File, ALU, and Data Memory separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers, with three MUXes]
![Page 32: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/32.jpg)
Forwarding (from MEM/WB)
[Figure: Register File, ALU, and Data Memory separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers, with three MUXes]
![Page 33: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/33.jpg)
Forwarding (operand selection)
[Figure: Register File, ALU, and Data Memory separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers, with three MUXes and a Forwarding Unit]
![Page 34: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/34.jpg)
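Forwarding (operand propagation)
[Figure: Register File, ALU, and Data Memory separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers, with MUXes and a Forwarding Unit. Forwarding Unit inputs: Rs, Rt, EX/MEM Rd, MEM/WB Rd. A further MUX selects between Rd and Rt]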
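The Forwarding Unit compares the source register numbers in ID/EX (Rs, Rt) with the destination numbers in EX/MEM and MEM/WB and picks the newest matching value. The sketch below follows the common textbook formulation. The RegWrite qualifiers and the register-0 check are assumptions that do not appear on the slide, and `forward_select` is a hypothetical name.

```python
def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose where one ALU operand comes from.

    src is the register number (Rs or Rt) held in ID/EX.
    Returns "EX/MEM", "MEM/WB", or "ID/EX" (no forwarding).
    Assumption: register 0 is hardwired to zero and is never forwarded.
    """
    # The EX/MEM result is the most recent, so it takes priority.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "ID/EX"


if __name__ == "__main__":
    # sub r4,r1,r3 in EX while add r1,r2,r3 is in MEM: Rs = 1 matches EX/MEM Rd.
    print(forward_select(1, ex_mem_rd=1, ex_mem_regwrite=True,
                         mem_wb_rd=9, mem_wb_regwrite=True))
```

The same function is evaluated once for Rs and once for Rt, giving the select value for each ALU input MUX.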
![Page 35: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/35.jpg)
Data Hazard Even with Forwarding
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: pipeline timing diagram, time (clock cycles) across and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
Forward backward in time… no way!! (or way?)
![Page 36: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/36.jpg)
Data Hazard Even with Forwarding
[Figure: pipeline timing diagram, time (clock cycles) across and instruction order down, with two slots marked NO ISSUE]
lw r1, 0(r2): Ifetch, Reg, ALU, DMem, Reg
sub r4,r1,r6: Ifetch, Reg, Bubble, ALU, DMem, Reg
and r6,r1,r7: Ifetch, Bubble, Reg, ALU, DMem, Reg
or r8,r1,r9: Bubble, Ifetch, Reg, ALU, DMem
Need “pipeline interlock” (or stall) to stop instructions from issuing. How is this detected?
![Page 37: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/37.jpg)
Hazard Detection Unit
• Stall by letting an instruction that won’t write anything go forward
• Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
[Figure: pipelined datapath with a Hazard detection unit and a Forwarding unit. Hazard detection unit signals: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a MUX with a 0 input in front of the ID/EX control fields. Forwarding unit inputs: Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
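The stall condition on this slide translates directly into a predicate. In the sketch below, `must_stall` and `stall_controls` are hypothetical names. The effect of a stall (holding PC and IF/ID and sending zeroed control bits into ID/EX) is the usual textbook behaviour and is inferred from the PCWrite, IF/IDWrite, and 0-input MUX labels in the figure.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard check: the load in ID/EX writes a register
    that the instruction currently in IF/ID wants to read."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


def stall_controls(stall):
    """Signals driven by the hazard detection unit (assumed behaviour)."""
    return {
        "PCWrite": not stall,          # hold the PC during a stall
        "IF/IDWrite": not stall,       # hold the fetched instruction
        "zero_id_ex_control": stall,   # inject a bubble that writes nothing
    }


if __name__ == "__main__":
    # lw r1, 0(r2) in ID/EX (rt = 1); sub r4,r1,r6 in IF/ID (rs = 1, rt = 6).
    stall = must_stall(True, 1, 1, 6)
    print(stall, stall_controls(stall))
```

The bubble is exactly the "instruction that won’t write anything" from the first bullet: with all control bits zero it changes neither registers nor memory.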
![Page 38: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/38.jpg)
Code Rescheduling to Avoid Load Hazards
Try producing fast code for

    a = b + c;
    d = e - f;

assuming a, b, c, d, e, and f in memory.

Slow code:

    LW Rb,b
    LW Rc,c
    ADD Ra,Rb,Rc
    SW a,Ra
    LW Re,e
    LW Rf,f
    SUB Rd,Re,Rf
    SW d,Rd

Fast code:

    LW Rb,b
    LW Rc,c
    LW Re,e
    ADD Ra,Rb,Rc
    LW Rf,f
    SW a,Ra
    SUB Rd,Re,Rf
    SW d,Rd

Compiler optimizes for performance. Hardware checks for safety.
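The benefit of the rescheduled code can be checked by counting load-use stalls. The sketch assumes the five-stage pipeline with forwarding from the earlier slides, where an instruction that reads a register loaded by the immediately preceding LW costs one stall cycle. `parse` and `load_use_stalls` are hypothetical helpers that understand only the LW/SW/ADD/SUB forms used on this slide.

```python
def parse(line):
    """Return (op, dest_register_or_None, [source registers])."""
    op, operands = line.split(None, 1)
    first, *rest = [x.strip() for x in operands.split(",")]
    if op == "LW":   # LW Rdest,address : the address is symbolic, no register source
        return op, first, []
    if op == "SW":   # SW address,Rsrc : reads Rsrc, writes no register
        return op, None, rest
    return op, first, rest  # ADD/SUB Rd,Rs,Rt


def load_use_stalls(code):
    """Count instructions that read the result of the LW directly before them."""
    stalls = 0
    previous_load_dest = None
    for line in code:
        op, dest, sources = parse(line)
        if previous_load_dest is not None and previous_load_dest in sources:
            stalls += 1
        previous_load_dest = dest if op == "LW" else None
    return stalls


SLOW = ["LW Rb,b", "LW Rc,c", "ADD Ra,Rb,Rc", "SW a,Ra",
        "LW Re,e", "LW Rf,f", "SUB Rd,Re,Rf", "SW d,Rd"]
FAST = ["LW Rb,b", "LW Rc,c", "LW Re,e", "ADD Ra,Rb,Rc",
        "LW Rf,f", "SW a,Ra", "SUB Rd,Re,Rf", "SW d,Rd"]

if __name__ == "__main__":
    print("slow:", load_use_stalls(SLOW), "fast:", load_use_stalls(FAST))
```

Under this model the slow ordering stalls twice (ADD after LW Rc, SUB after LW Rf), while the fast ordering places an independent instruction after each critical load and does not stall.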
![Page 39: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/39.jpg)
Control Hazard due to Branches (3 stall cycles)
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
[Figure: pipeline timing diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
What do you do with the 3 instructions in between?
How do you do it?
Where is the “commit”?
![Page 40: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/40.jpg)
Branch Hazard Resolutions
#1 Stall until branch direction is clear
#2 Static Branch Prediction
• Predict Not Taken (Fall through, as shown in previous slide)
– Execute successor instructions in sequence
– “Squash” instructions in pipeline if branch actually taken
– PC+4 already calculated, so use it to get next instruction
• Predict Branch Taken
– But haven’t calculated branch target address
  • Might incur 1 cycle branch penalty
  • Other machines: branch target known before outcome
#3 Dynamic Branch Prediction
– Will dedicate a lecture to such techniques
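A rough way to compare resolutions #1 and #2 is through effective CPI. The sketch below assumes a base CPI of 1 and the 3-cycle branch penalty from the previous slide. The branch frequency and taken fraction in the example are made-up illustration values, not measurements, and both function names are hypothetical.

```python
def cpi_stall_always(branch_freq, penalty=3, base_cpi=1.0):
    """#1: every branch waits until its direction is known."""
    return base_cpi + branch_freq * penalty


def cpi_predict_not_taken(branch_freq, taken_fraction, penalty=3, base_cpi=1.0):
    """#2, predict not taken: only taken branches squash the fetched successors."""
    return base_cpi + branch_freq * taken_fraction * penalty


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, half of them taken.
    freq, taken = 0.20, 0.50
    print("stall:", cpi_stall_always(freq))
    print("predict not taken:", cpi_predict_not_taken(freq, taken))
```

The gap between the two grows with the share of not-taken branches, which is the motivation for better (dynamic) predictors.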
![Page 41: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/41.jpg)
Alternative Branch Hazard Resolutions
#4 Delayed Branch
– Define branch to take place AFTER a following instruction

    branch instruction
    sequential successor 1
    sequential successor 2
    ........
    sequential successor n
    branch target if taken

  (sequential successors 1 to n: branch delay of length n)
– 1 slot delay allows proper decision and branch target address in 5 stage pipeline (next page)
![Page 42: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/42.jpg)
Filling Branch Delay Slot
Make sure R7 will not be used in taken path before redefined
![Page 43: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/43.jpg)
Other Pipelining Issues
• To have all instructions finish within one cycle
– Slow down frequency to cope w/ the critical operation, or
– Allow non-uniform latency operation
![Page 44: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/44.jpg)
Support Multiple FP Operations
[Figure: IF and ID feed four execution units that rejoin at MEM and WB: Integer Unit (EX), FP multiplier (M1 to M7), FP add (A1 to A4), and FP divider (non-pipelined)]
• Complicate bypass
• Potential structural hazard
• Multiple (FP) instructions can complete at the same time
– RF might need to be multi-ported
– Ordering issue, who gets to update the register?
• Out-of-order completion/retirement: Precise exception issue
![Page 45: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/45.jpg)
Full Bypass/Forwarding Needed
[Figure: pipeline timing over clock cycles 1 to 18; SS marks a stall cycle. Stage sequence per instruction:]
L.D F4,0(R2): IF ID EX M WB
MUL.D F0,F4,F6: IF ID SS M1 M2 M3 M4 M5 M6 M7 M WB
ADD.D F2,F0,F8: IF SS ID SS SS SS SS SS SS A1 A2 A3 A4 M WB
S.D F2,0(R2): IF SS SS SS SS SS SS ID EX SS SS SS M WB
![Page 46: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/46.jpg)
Structural Hazards
• Write to register file at the same cycle (cc11)
• Write to the same register (WAW)
• MEM in cc10
[Figure: pipeline timing over clock cycles 1 to 11. Stage sequence per instruction:]
MUL.D F0,F4,F6: IF ID M1 M2 M3 M4 M5 M6 M7 M WB
ADD.D F2,F4,F6: IF ID A1 A2 A3 A4 M WB
L.D F2,0(R2): IF ID EX M WB
Four other instructions (. . . .): IF ID EX M WB each
![Page 47: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/47.jpg)
Precise Exception Issue
• Precise exception: If the pipeline can (or must) be stopped
– All the instructions before the faulty (or intended) instruction must be completed
– All the instructions after it must not be completed
– Restart the execution from the faulty (or intended) instruction
• State must be consistent with the original program order
• Not straightforward with out-of-order completion
• Simple solution: stall until prior long-latency instructions are guaranteed not to raise an exception
• Other modern solution: ROB (will dedicate a lecture to it)

    DIV.D F0,F2,F4 (exception!)
    ADD.D F3,F10,F8 (completed)
    SUB.D F12,F12,F14 (completed)
![Page 48: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/48.jpg)
MIPS R4000 Pipeline
• Deeper Pipeline (superpipelining)
• 2 cycle delays for load
• Predicted-Not-Taken strategy
– Not-taken (fall-through) branch: 1 delay slot
– Taken branch: 1 delay slot + 2 idle cycles
[Figure: eight-stage pipeline IF, IS, RF, EX, DF, DS, TC, WB, drawn over Instruction Memory, Reg, ALU, Data Memory, Reg; the point of branch target and condition evaluation is marked]
![Page 49: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/49.jpg)
Load delay (2 cycles)
[Figure: R4000 pipeline timing over CC1 to CC11, stages IF, IS, RF, EX, DF, DS, TC, WB, for instructions labeled LD R1, ADD R2, R1, Inst 1, and Inst 2; two slots are marked Bubble]
If no instructions are scheduled into the load delay slots, the R4000 performs a HW interlock
![Page 50: Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining](https://reader036.vdocument.in/reader036/viewer/2022062412/58a14f8f1a28abbe3c8b4f15/html5/thumbnails/50.jpg)
Branches (Predicted-not-taken)
[Figure: two R4000 timing diagrams over CC1 to CC11 with stages IF, IS, RF, EX, DF, DS, TC, WB, grouped by ACTUAL DIRECTION. NOT TAKEN: Branch, Delay slot, Branch inst+2, Branch inst+3. TAKEN: Branch, Delay slot, two Stall rows (SS), then Branch Target]