![Page 1: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/1.jpg)
Lecture 09: RISC-V Pipeline Implementation
CSE 564 Computer Architecture, Summer 2017
Department of Computer Science and Engineering
Yonghong Yan
[email protected]/~yan
![Page 2: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/2.jpg)
Acknowledgement
• Slides adapted from Computer Science 152: Computer Architecture and Engineering, Spring 2016, by Dr. George Michelogiannakis from UC Berkeley
![Page 3: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/3.jpg)
Introduction
• CPU performance factors
  – Instruction count
    • Determined by ISA and compiler
  – CPI and cycle time
    • Determined by CPU hardware
• Three groups of instructions
  – Memory reference: lw, sw
  – Arithmetic/logical: add, sub, and, or, slt
  – Control transfer: jal, jalr, b*
• CPI
  – Single-cycle, CPI = 1
  – 5-stage unpipelined, CPI = 5
  – 5-stage pipelined, CPI = 1

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
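The CPU time product can be checked with a few lines of arithmetic. The sketch below is only an illustration: the instruction count and cycle times are hypothetical values, not measurements from the lecture, and `cpu_time_ns` is a helper name introduced here.

```python
def cpu_time_ns(instructions: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instructions/program x cycles/instruction x time/cycle."""
    return instructions * cpi * cycle_time_ns


# Hypothetical numbers, chosen only to illustrate the trade-off.
N = 1_000_000                                            # instructions in the program
single_cycle = cpu_time_ns(N, cpi=1, cycle_time_ns=5.0)  # one long cycle per instruction
unpipelined = cpu_time_ns(N, cpi=5, cycle_time_ns=1.0)   # five short cycles per instruction
pipelined = cpu_time_ns(N, cpi=1, cycle_time_ns=1.0)     # ideal pipeline, no hazards

print(single_cycle, unpipelined, pipelined)
```

With these assumed numbers, the single-cycle and unpipelined multi-cycle designs take the same total time; only the pipelined design combines the short cycle with CPI = 1.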
![Page 4: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/4.jpg)
An Ideal Pipeline
• All objects go through the same stages
• No sharing of resources between any two stages
• Propagation delay through all pipeline stages is equal
• The scheduling of an object entering the pipeline is not affected by the objects in other stages
[Figure: stage 1 → stage 2 → stage 3 → stage 4]
These conditions generally hold for industrial assembly lines, but instructions depend on each other!
![Page 5: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/5.jpg)
Review: Unpipelined Datapath for RISC-V
[Figure: single-cycle datapath with PC, +0x4 adder, instruction memory, GPRs (rs1, rs2, wa, wd, rd1, rd2, we), Imm Select, ALU, branch logic (Bcomp?), and data memory (addr, wdata, rdata, we). Control signals: OpCode, ImmSel, Op2Sel, ALU Control, MemWrite, WBSel, WASel, RegWriteEn, PCSel (br / rind / jabs / pc+4).]
![Page 6: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/6.jpg)
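Review: Hardwired Control Table

| Opcode | ImmSel | Op2Sel | FuncSel | MemWr | RFWen | WBSel | WASel | PCSel |
|---|---|---|---|---|---|---|---|---|
| ALU | * | Reg | Func | no | yes | ALU | rd | pc+4 |
| ALUi | IType12 | Imm | Op | no | yes | ALU | rd | pc+4 |
| LW | IType12 | Imm | + | no | yes | Mem | rd | pc+4 |
| SW | BsType12 | Imm | + | yes | no | * | * | pc+4 |
| BEQ true | BrType12 | * | * | no | no | * | * | br |
| BEQ false | BrType12 | * | * | no | no | * | * | pc+4 |
| JAL | * | * | * | no | yes | PC | rd | jabs |
| JALR | * | * | * | no | yes | PC | rd | rind |

Op2Sel = Reg / Imm; WBSel = ALU / Mem / PC; PCSel = pc+4 / br / rind / jabs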
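One way to read the hardwired control table is as a pure lookup from opcode class to control signals. The sketch below encodes the rows as they are read from the slide; the names `COLUMNS`, `CONTROL_TABLE`, and `control_signals` are hypothetical, and `"*"` stands for a don't-care entry.

```python
# Rows follow the hardwired control table; "*" marks a don't-care entry.
COLUMNS = ("ImmSel", "Op2Sel", "FuncSel", "MemWr", "RFWen", "WBSel", "WASel", "PCSel")

CONTROL_TABLE = {
    "ALU":      ("*",        "Reg", "Func", "no",  "yes", "ALU", "rd", "pc+4"),
    "ALUi":     ("IType12",  "Imm", "Op",   "no",  "yes", "ALU", "rd", "pc+4"),
    "LW":       ("IType12",  "Imm", "+",    "no",  "yes", "Mem", "rd", "pc+4"),
    "SW":       ("BsType12", "Imm", "+",    "yes", "no",  "*",   "*",  "pc+4"),
    "BEQtrue":  ("BrType12", "*",   "*",    "no",  "no",  "*",   "*",  "br"),
    "BEQfalse": ("BrType12", "*",   "*",    "no",  "no",  "*",   "*",  "pc+4"),
    "JAL":      ("*",        "*",   "*",    "no",  "yes", "PC",  "rd", "jabs"),
    "JALR":     ("*",        "*",   "*",    "no",  "yes", "PC",  "rd", "rind"),
}


def control_signals(opcode: str) -> dict:
    """Return the control signals for one opcode class as a name -> value dict."""
    return dict(zip(COLUMNS, CONTROL_TABLE[opcode]))


print(control_signals("LW"))
```

Because the control is purely combinational in the opcode (plus the branch outcome for BEQ), a table like this is all the single-cycle design needs.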
![Page 7: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/7.jpg)
Pipelined Datapath
Clock period can be reduced by dividing the execution of an instruction into multiple cycles:
tC > max{tIM, tRF, tALU, tDM, tRW} (= tDM probably)
However, CPI will increase unless instructions are pipelined.
[Figure: datapath divided into fetch phase, decode & Reg-fetch phase, execute phase, memory phase, and write-back phase.]
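The clock-period bound can be illustrated numerically. The stage latencies below are hypothetical values picked so that data memory is the slowest stage, as the slide suggests; the helper names are assumptions for this sketch.

```python
# Hypothetical stage latencies in picoseconds (illustrative values only).
stage_latency_ps = {"tIM": 200, "tRF": 100, "tALU": 150, "tDM": 250, "tRW": 100}


def pipelined_period_bound(latencies: dict) -> int:
    """Lower bound on the clock period: the slowest stage sets the cycle time."""
    return max(latencies.values())


def single_cycle_period(latencies: dict) -> int:
    """A single-cycle design must fit every phase into one clock period."""
    return sum(latencies.values())


t_c = pipelined_period_bound(stage_latency_ps)
t_single = single_cycle_period(stage_latency_ps)
print(t_c, t_single, t_single / t_c)
```

With unequal stage latencies the achievable speedup stays below the stage count, which is why the balance of stage delays matters.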
![Page 8: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/8.jpg)
Technology Assumptions
• A small amount of very fast memory (caches) backed up by a large, slower memory
• Fast ALU (at least for integers)
• Multiported register files (slower!)
Thus, the following timing assumption is reasonable:
tIM ≈ tRF ≈ tALU ≈ tDM ≈ tRW
A 5-stage pipeline will be the focus of our detailed design; some commercial designs have over 30 pipeline stages to do an integer add!
![Page 9: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/9.jpg)
5-Stage Pipelined Execution

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| instruction 1 | IF1 | ID1 | EX1 | MA1 | WB1 | | | | |
| instruction 2 | | IF2 | ID2 | EX2 | MA2 | WB2 | | | |
| instruction 3 | | | IF3 | ID3 | EX3 | MA3 | WB3 | | |
| instruction 4 | | | | IF4 | ID4 | EX4 | MA4 | WB4 | |
| instruction 5 | | | | | IF5 | ID5 | EX5 | MA5 | WB5 |

[Figure: 5-stage datapath with stages I-Fetch (IF), Decode & Reg. Fetch (ID), Execute (EX), Memory (MA), and Write-Back (WB).]
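The staggered timing of an ideal 5-stage pipeline can be generated mechanically: instruction i enters IF in cycle i and moves one stage per cycle. This is a minimal sketch that assumes no hazards; `ideal_schedule` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def ideal_schedule(num_instructions: int) -> list:
    """Row i lists, cycle by cycle, the stage instruction i+1 occupies ('' if none)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage}{i + 1}"   # instruction i+1 is in `stage` at cycle i+s
        rows.append(row)
    return rows


for row in ideal_schedule(5):
    print(" ".join(f"{cell:4}" for cell in row))
```

Five instructions finish in 5 + 4 = 9 cycles, so the CPI approaches 1 as the instruction count grows.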
![Page 10: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/10.jpg)
5-Stage Pipelined Execution: Resource Usage Diagram

| Resources | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| IF | I1 | I2 | I3 | I4 | I5 | | | | |
| ID | | I1 | I2 | I3 | I4 | I5 | | | |
| EX | | | I1 | I2 | I3 | I4 | I5 | | |
| MA | | | | I1 | I2 | I3 | I4 | I5 | |
| WB | | | | | I1 | I2 | I3 | I4 | I5 |

[Figure: 5-stage datapath with stages I-Fetch (IF), Decode & Reg. Fetch (ID), Execute (EX), Memory (MA), and Write-Back (WB).]
![Page 11: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/11.jpg)
Pipelined Execution: ALU Instructions
[Figure: pipelined datapath with pipeline registers PC, IR, A, B, Y, R, MD1, MD2 between instruction memory, GPRs, Imm Select, ALU, and data memory.]
Not quite correct! We need an Instruction Reg (IR) for each stage.
![Page 12: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/12.jpg)
Pipelined RISC-V Datapath without jumps
[Figure: pipelined datapath with stages F, D, E, M, W and an IR in each stage. Control signals: ImmSel, Op2Sel, ALU Control, MemWrite, WBSel, RegWriteEn.]
Control Points Need to Be Connected
![Page 13: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/13.jpg)
Instructions interact with each other in pipeline
• An instruction in the pipeline may need a resource being used by another instruction in the pipeline → structural hazard
• An instruction may depend on something produced by an earlier instruction
  – Dependence may be for a data value → data hazard
  – Dependence may be for the next instruction's address → control hazard (branches, exceptions)
![Page 14: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/14.jpg)
Resolving Structural Hazards
• A structural hazard occurs when two instructions need the same hardware resource at the same time
  – Can resolve in hardware by stalling the newer instruction until the older instruction is finished with the resource
• A structural hazard can always be avoided by adding more hardware to the design
  – E.g., if two instructions both need a port to memory at the same time, could avoid the hazard by adding a second port to memory
• Our 5-stage pipeline has no structural hazards by design
  – Thanks to the RISC-V ISA, which was designed for pipelining
![Page 15: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/15.jpg)
Data Hazards
...
x1 ← x0 + 10
x4 ← x1 + 17
...
[Figure: pipelined datapath with "x4 ← x1 …" in the decode stage while "x1 ← …" is still in the execute stage.]
x1 is stale. Oops!
![Page 16: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/16.jpg)
How Would You Resolve This?
• Three options
  – Wait (stall)
  – Bypass: ask them for what you need before their final deliverable
  – Speculate on values to read
![Page 17: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/17.jpg)
Resolving Data Hazards (1)
Strategy 1: Wait for the result to be available by freezing earlier pipeline stages → interlocks
![Page 18: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/18.jpg)
Interlocks to resolve Data Hazards
...
x1 ← x0 + 10
x4 ← x1 + 17
...
[Figure: pipelined datapath in which a Stall Condition freezes PC and the decode-stage IR and a mux inserts a bubble into the execute-stage IR.]
![Page 19: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/19.jpg)
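Stalled Stages and Pipeline Bubbles

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (I1) x1 ← (x0) + 10 | IF1 | ID1 | EX1 | MA1 | WB1 | | | | | | | |
| (I2) x4 ← (x1) + 17 | | IF2 | ID2 | ID2 | ID2 | ID2 | EX2 | MA2 | WB2 | | | |
| (I3) | | | IF3 | IF3 | IF3 | IF3 | ID3 | EX3 | MA3 | WB3 | | |
| (I4) | | | | | | | IF4 | ID4 | EX4 | MA4 | WB4 | |
| (I5) | | | | | | | | IF5 | ID5 | EX5 | MA5 | WB5 |

The repeated ID2 and IF3 entries are the stalled stages.

Resource Usage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IF | I1 | I2 | I3 | I3 | I3 | I3 | I4 | I5 | | | | |
| ID | | I1 | I2 | I2 | I2 | I2 | I3 | I4 | I5 | | | |
| EX | | | I1 | - | - | - | I2 | I3 | I4 | I5 | | |
| MA | | | | I1 | - | - | - | I2 | I3 | I4 | I5 | |
| WB | | | | | I1 | - | - | - | I2 | I3 | I4 | I5 |

- ⇒ pipeline bubble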
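The stalled-stage pattern can be reproduced with a small cycle-by-cycle model. The sketch below is a simplification under stated assumptions: there is no bypassing, the instruction in ID stalls whenever one of its source registers matches the destination of an instruction currently in EX, MA, or WB, and I3 to I5 are treated as having no register dependences. The function and the tuple layout of `program` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def simulate_interlock(program, num_cycles):
    """program: list of (name, dest_reg, source_regs).
    Returns one dict per cycle mapping each stage to an instruction name or '-'."""
    info = {name: (dest, srcs) for name, dest, srcs in program}
    pipe = {s: None for s in STAGES}
    next_fetch = 0
    stall = False
    history = []
    for _ in range(num_cycles):
        # Back end always advances.
        pipe["WB"] = pipe["MA"]
        pipe["MA"] = pipe["EX"]
        if stall:
            pipe["EX"] = None                      # insert a bubble; IF and ID are frozen
        else:
            pipe["EX"] = pipe["ID"]
            pipe["ID"] = pipe["IF"]
            pipe["IF"] = program[next_fetch][0] if next_fetch < len(program) else None
            next_fetch += 1
        # Interlock check for the instruction now in decode.
        stall = False
        if pipe["ID"] is not None:
            sources = info[pipe["ID"]][1]
            for s in ("EX", "MA", "WB"):
                if pipe[s] is not None and info[pipe[s]][0] in sources:
                    stall = True
        history.append({s: pipe[s] or "-" for s in STAGES})
    return history


program = [("I1", "x1", ["x0"]), ("I2", "x4", ["x1"]),
           ("I3", None, []), ("I4", None, []), ("I5", None, [])]
history = simulate_interlock(program, 8)
for stage in STAGES:
    print(stage, " ".join(f"{cycle[stage]:3}" for cycle in history))
```

The three bubbles behind I1 correspond to the three cycles in which I2 waits in decode for x1 to be written back.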
![Page 20: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/20.jpg)
Interlock Control Logic
Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions.
[Figure: pipelined datapath with a Cstall block that compares rs1 and rs2 of the decode-stage instruction against ws of later stages and drives the stall signal and the bubble mux.]
![Page 21: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/21.jpg)
Interlock Control Logic ignoring jumps & branches
Should we always stall if an rs field matches some rd?
• not every instruction writes a register => we
• not every instruction reads a register => re
[Figure: pipelined datapath in which Cdest blocks produce (wsE, weE), (wsM, weM), and (wsW, weW), a Cre block produces re1 and re2, and Cstall combines them with rs1 and rs2 to generate stall.]
we: write enable, 1-bit on/off
ws: write select, 5-bit register number
re: read enable, 1-bit on/off
rs: read select, 5-bit register number
![Page 22: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/22.jpg)
In RISC-V Sodor Implementation
![Page 23: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/23.jpg)
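Source & Destination Registers
Instruction formats:
• ALU: func7, rs2, rs1, func3, rd, opcode
• ALUI/LW/JALR: immediate12, rs1, func3, rd, opcode
• SW/Bcond: imm, rs2, rs1, func3, imm, opcode
• JAL: Jump Offset[19:0], rd, opcode

| Instruction | Operation | source(s) | destination |
|---|---|---|---|
| ALU | rd <= rs1 func10 rs2 | rs1, rs2 | rd |
| ALUI | rd <= rs1 op imm | rs1 | rd |
| LW | rd <= M[rs1 + imm] | rs1 | rd |
| SW | M[rs1 + imm] <= rs2 | rs1, rs2 | - |
| Bcond rs1, rs2 | true: PC <= PC + imm; false: PC <= PC + 4 | rs1, rs2 | - |
| JAL | rd <= PC, PC <= PC + imm | - | rd |
| JALR | rd <= PC, PC <= rs1 + imm | rs1 | rd |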
![Page 24: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/24.jpg)
Deriving the Stall Signal
Cdest
  ws = rd
  we = Case opcode
    ALU, ALUi, LW, JAL, JALR => on
    ... => off
Cre
  re1 = Case opcode
    ALU, ALUi, LW, SW, Bcond, JALR => on
    JAL => off
  re2 = Case opcode
    ALU, SW, Bcond => on
    ... => off
Cstall
  stall = ((rs1D == wsE) && weE + (rs1D == wsM) && weM + (rs1D == wsW) && weW) && re1D
        + ((rs2D == wsE) && weE + (rs2D == wsM) && weM + (rs2D == wsW) && weW) && re2D
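The stall equation translates almost directly into code, reading `+` as logical OR. In this sketch the `Dest` and `Decode` records are hypothetical containers for the (ws, we) pair of each later stage and the (rs, re) pairs of the decode stage; they are not names from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Dest:
    """Destination info of the instruction in stage E, M, or W."""
    ws: int     # write select: 5-bit register number
    we: bool    # write enable


@dataclass
class Decode:
    """Source info of the instruction in the decode stage."""
    rs1: int
    rs2: int
    re1: bool   # read enable for rs1
    re2: bool   # read enable for rs2


def stall(d: Decode, e: Dest, m: Dest, w: Dest) -> bool:
    """Stall if an enabled source register matches any pending, enabled write."""
    def pending_write(rs: int) -> bool:
        return any(s.we and s.ws == rs for s in (e, m, w))
    return (d.re1 and pending_write(d.rs1)) or (d.re2 and pending_write(d.rs2))


NO_WRITE = Dest(ws=0, we=False)
i2 = Decode(rs1=1, rs2=0, re1=True, re2=False)      # x4 <= x1 + 17
print(stall(i2, Dest(ws=1, we=True), NO_WRITE, NO_WRITE))
```

The write-enable and read-enable qualifiers are what prevent spurious stalls when a register field happens to match but is not actually written or read.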
![Page 25: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/25.jpg)
Hazards due to Loads & Stores
...
M[x1 + 7] <= x2
x4 <= M[x3 + 5]
...
[Figure: pipelined datapath with Stall Condition logic and bubble mux.]
Is there any possible data hazard in this instruction sequence?
What if x1 + 7 = x3 + 5?
![Page 26: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/26.jpg)
Load & Store Hazards
...
M[x1 + 7] <= x2
x4 <= M[x3 + 5]
...
x1 + 7 = x3 + 5 => data hazard
However, the hazard is avoided because our memory system completes writes in one cycle!
Load/Store hazards are sometimes resolved in the pipeline and sometimes in the memory system itself.
More on this later in the course.
![Page 27: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/27.jpg)
Resolving Data Hazards (2)
Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage → bypass
![Page 28: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/28.jpg)
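Bypassing
Each stall or kill introduces a bubble in the pipeline => CPI > 1

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| (I1) x1 ← x0 + 10 | IF1 | ID1 | EX1 | MA1 | WB1 | | | | |
| (I2) x4 ← x1 + 17 | | IF2 | ID2 | ID2 | ID2 | ID2 | EX2 | MA2 | WB2 |
| (I3) | | | IF3 | IF3 | IF3 | IF3 | ID3 | EX3 | MA3 |
| (I4) | | | | | | | IF4 | ID4 | EX4 |
| (I5) | | | | | | | | IF5 | ID5 |

The repeated ID2 and IF3 entries are the stalled stages.

A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input:

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| (I1) x1 ← x0 + 10 | IF1 | ID1 | EX1 | MA1 | WB1 | | | | |
| (I2) x4 ← x1 + 17 | | IF2 | ID2 | EX2 | MA2 | WB2 | | | |
| (I3) | | | IF3 | ID3 | EX3 | MA3 | WB3 | | |
| (I4) | | | | IF4 | ID4 | EX4 | MA4 | WB4 | |
| (I5) | | | | | IF5 | ID5 | EX5 | MA5 | WB5 |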
![Page 29: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/29.jpg)
Hardware Support for Forwarding
![Page 30: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/30.jpg)
Detecting RAW Hazards
• Pass register numbers along pipeline
  – ID/EX.RegisterRs = register number for Rs in ID/EX
  – ID/EX.RegisterRt = register number for Rt in ID/EX
  – ID/EX.RegisterRd = register number for Rd in ID/EX
• Current instruction being executed is in the ID/EX register
• Previous instruction is in the EX/MEM register
• Second previous is in the MEM/WB register
• RAW data hazards when
  – 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs (Fwd from EX/MEM pipeline reg)
  – 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt (Fwd from EX/MEM pipeline reg)
  – 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs (Fwd from MEM/WB pipeline reg)
  – 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt (Fwd from MEM/WB pipeline reg)
![Page 31: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/31.jpg)
Detecting the Need to Forward
• But only if the forwarding instruction will write to a register!
  – EX/MEM.RegWrite, MEM/WB.RegWrite
• And only if Rd for that instruction is not R0
  – EX/MEM.RegisterRd ≠ 0
  – MEM/WB.RegisterRd ≠ 0
![Page 32: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/32.jpg)
Forwarding Conditions
• Detecting RAW hazard with Previous Instruction
  – if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 (Forward from EX/MEM pipe stage)
  – if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 (Forward from EX/MEM pipe stage)
• Detecting RAW hazard with Second Previous
  – if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 (Forward from MEM/WB pipe stage)
  – if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 (Forward from MEM/WB pipe stage)
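The forwarding conditions can be sketched as one selector function that is evaluated once per ALU operand (Rs for ForwardA, Rt for ForwardB). The 01/10 encodings follow the conditions above. One assumption is added here that the slide does not state: when both pipeline registers match, the EX/MEM value is preferred because it is the more recent result. The function name and parameters are hypothetical.

```python
def forward_select(ex_mem_regwrite: bool, ex_mem_rd: int,
                   mem_wb_regwrite: bool, mem_wb_rd: int,
                   id_ex_rs: int) -> int:
    """Return the forwarding mux select for one ALU operand.

    0b00: use the register-file value
    0b01: forward from the EX/MEM pipeline register
    0b10: forward from the MEM/WB pipeline register
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b01   # assumed priority: the most recent producer wins
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b10
    return 0b00


# Previous instruction writes x1, the current instruction reads x1 as Rs.
forward_a = forward_select(True, 1, False, 0, id_ex_rs=1)
print(format(forward_a, "02b"))
```

The same function called with ID/EX.RegisterRt yields ForwardB.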
![Page 33: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/33.jpg)
Adding a Bypass
...
(I1) x1 <= x0 + 10
(I2) x4 <= x1 + 17
[Figure: pipelined datapath (stages D, E, M, W) with "x4 <= x1 ..." in decode and "x1 <= ..." in execute; an ASrc mux selects between rd1 and the ALU output for register A. Stall and bubble logic remain.]
When does this bypass help?
• x1 <= x0 + 10; x4 <= x1 + 17: yes
• x1 <= M[x0 + 10]; x4 <= x1 + 17: no
• JAL 500; x4 <= x1 + 17: no
![Page 34: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/34.jpg)
The Bypass Signal: Deriving it from the Stall Signal
stall = (((rs1D == wsE) && weE + (rs1D == wsM) && weM + (rs1D == wsW) && weW) && re1D
       + ((rs2D == wsE) && weE + (rs2D == wsM) && weM + (rs2D == wsW) && weW) && re2D)
ws = rd
we = Case opcode
  ALU, ALUi, LW, JAL, JALR => on
  ... => off
ASrc = (rs1D == wsE) && weE && re1D
Is this correct?
No, because only ALU and ALUi instructions can benefit from this bypass.
Split weE into two components: we-bypass, we-stall
![Page 35: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/35.jpg)
Bypass and Stall Signals
Split weE into two components: we-bypass, we-stall
we-bypassE = Case opcodeE
  ALU, ALUi => on
  ... => off
we-stallE = Case opcodeE
  LW, JAL, JALR => on
  ... => off
ASrc = (rs1D == wsE) && we-bypassE && re1D
stall = ((rs1D == wsE) && we-stallE + (rs1D == wsM) && weM + (rs1D == wsW) && weW) && re1D
      + ((rs2D == wsE) && weE + (rs2D == wsM) && weM + (rs2D == wsW) && weW) && re2D
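The split of weE can be illustrated for the rs1 operand and the execute stage only. In this sketch the opcode classes are plain strings and the set and function names are hypothetical; the M and W terms of the stall equation are left out to keep the example short.

```python
WE_BYPASS_OPCODES = {"ALU", "ALUi"}        # result is on the ALU output in E
WE_STALL_OPCODES = {"LW", "JAL", "JALR"}   # result is not available from the ALU output


def asrc(rs1_d: int, re1_d: bool, ws_e: int, opcode_e: str) -> bool:
    """Select the bypass path for operand A."""
    return re1_d and rs1_d == ws_e and opcode_e in WE_BYPASS_OPCODES


def stall_on_e_rs1(rs1_d: int, re1_d: bool, ws_e: int, opcode_e: str) -> bool:
    """E-stage, rs1-only term of the stall equation."""
    return re1_d and rs1_d == ws_e and opcode_e in WE_STALL_OPCODES


# Consumer reads x1; the producer in E writes x1.
for producer in ("ALUi", "LW", "JAL"):
    print(producer, asrc(1, True, 1, producer), stall_on_e_rs1(1, True, 1, producer))
```

For any one producer in E, at most one of the two signals is asserted: the value is either bypassed or waited for.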
![Page 36: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/36.jpg)
Fully Bypassed Datapath
[Figure: pipelined datapath (stages D, E, M, W) with ASrc and BSrc bypass muxes fed from the E, M, and W stages, plus a path for "PC for JAL, ...". Stall and bubble logic remain.]
Is there still a need for the stall signal?
stall = (rs1D == wsE) && (opcodeE == LW) && (wsE != 0) && re1D
      + (rs2D == wsE) && (opcodeE == LW) && (wsE != 0) && re2D
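With full bypassing, the only remaining data-hazard stall is the load-use case. A minimal sketch of that condition follows; the function name and the string opcode classes are assumptions made for illustration.

```python
def load_use_stall(rs1_d: int, re1_d: bool, rs2_d: int, re2_d: bool,
                   ws_e: int, opcode_e: str) -> bool:
    """Stall when the instruction in E is a load whose destination (not x0)
    is read by the instruction in decode."""
    load_in_e = opcode_e == "LW" and ws_e != 0
    uses_rs1 = re1_d and rs1_d == ws_e
    uses_rs2 = re2_d and rs2_d == ws_e
    return load_in_e and (uses_rs1 or uses_rs2)


# x1 <= M[x0 + 10] in E, x4 <= x1 + 17 in decode: one bubble is needed.
print(load_use_stall(1, True, 0, False, ws_e=1, opcode_e="LW"))
# The same consumer behind an ALU instruction needs no stall.
print(load_use_stall(1, True, 0, False, ws_e=1, opcode_e="ALU"))
```

The load data only exists after the memory stage, so no bypass from the ALU output can supply it in time.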
![Page 37: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/37.jpg)
Control Hazards
What do we need to calculate next PC?
• For Jumps
  – Opcode, PC and offset
• For Jump Register
  – Opcode, Register value, and PC
• For Conditional Branches
  – Opcode, Register (for condition), PC and offset
• For all other instructions
  – Opcode and PC (and have to know it's not one of above)
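The inputs listed for each case map onto a small next-PC function. This is a sketch with hypothetical names; the opcode classes are plain strings, and the branch outcome is passed in as `taken` rather than computed from register values. Clearing the low bit of the JALR target follows the RISC-V specification and is not shown on the slide.

```python
def next_pc(opcode: str, pc: int, imm: int = 0,
            rs1_value: int = 0, taken: bool = False) -> int:
    """Compute the address of the next instruction to fetch."""
    if opcode == "JAL":                    # needs opcode, PC, and offset
        return pc + imm
    if opcode == "JALR":                   # needs opcode and a register value
        return (rs1_value + imm) & ~1      # RISC-V clears the least-significant bit
    if opcode == "Bcond":                  # needs opcode, condition, PC, and offset
        return pc + imm if taken else pc + 4
    return pc + 4                          # everything else: still needs the opcode


print(next_pc("Bcond", 100, imm=200, taken=True))
print(next_pc("Bcond", 100, imm=200, taken=False))
print(next_pc("ADD", 96))
```

Every case needs the opcode, which is why even a plain `pc + 4` fetch is a speculation until the previous instruction has been decoded.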
![Page 38: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/38.jpg)
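PC Calculation Bubbles

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (I1) x1 ← x0 + 10 | IF1 | ID1 | EX1 | MA1 | WB1 | | | | | | |
| (I2) x3 ← x2 + 17 | | IF2 | IF2 | ID2 | EX2 | MA2 | WB2 | | | | |
| (I3) | | | | IF3 | IF3 | ID3 | EX3 | MA3 | WB3 | | |
| (I4) | | | | | | IF4 | IF4 | ID4 | EX4 | MA4 | WB4 |

Resource Usage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IF | I1 | - | I2 | - | I3 | - | I4 | | | | |
| ID | | I1 | - | I2 | - | I3 | - | I4 | | | |
| EX | | | I1 | - | I2 | - | I3 | - | I4 | | |
| MA | | | | I1 | - | I2 | - | I3 | - | I4 | |
| WB | | | | | I1 | - | I2 | - | I3 | - | I4 |

- ⇒ pipeline bubble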
![Page 39: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/39.jpg)
Speculate next address is PC+4
I1 096 ADD
I2 100 J 304
I3 104 ADD (kill)
I4 304 ADD
A jump instruction kills (not stalls) the following instruction.
How?
[Figure: fetch stage with PC = 104, I2 in decode and I1 in execute; a "Jump?" check on the decode-stage IR drives PCSrc (pc+4 / jabs / rind / br); stall and bubble logic shown.]
![Page 40: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/40.jpg)
Pipelining Jumps
I1 096 ADD
I2 100 J 304
I3 104 ADD (kill)
I4 304 ADD
To kill a fetched instruction, insert a mux before IR:
IRSrcD = Case opcodeD
  JAL ⇒ bubble
  ... ⇒ IM
Any interaction between stall and jump?
[Figure: fetch stage with an IRSrcD mux selecting between the instruction memory output and a bubble; a "Jump?" check drives PCSrc (pc+4 / jabs / rind / br). Before the jump resolves: PC = 104, I2 in decode, I1 in execute. After: PC = 304, bubble in decode, I2 in execute, I1 in memory.]
![Page 41: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/41.jpg)
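Jump Pipeline Diagrams

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| (I1) 096: ADD | IF1 | ID1 | EX1 | MA1 | WB1 | | | | |
| (I2) 100: J 304 | | IF2 | ID2 | EX2 | MA2 | WB2 | | | |
| (I3) 104: ADD | | | IF3 | - | - | - | - | | |
| (I4) 304: ADD | | | | IF4 | ID4 | EX4 | MA4 | WB4 | |

Resource Usage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| IF | I1 | I2 | I3 | I4 | I5 | | | | |
| ID | | I1 | I2 | - | I4 | I5 | | | |
| EX | | | I1 | I2 | - | I4 | I5 | | |
| MA | | | | I1 | I2 | - | I4 | I5 | |
| WB | | | | | I1 | I2 | - | I4 | I5 |

- ⇒ pipeline bubble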
![Page 42: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/42.jpg)
Pipelining Conditional Branches
I1 096 ADD
I2 100 BEQ x1, x2 +200
I3 104 ADD
I4 304 ADD
Branch condition is not known until the execute stage.
What action should be taken in the decode stage?
[Figure: fetch and decode stages with PC = 104, I2 in decode and I1 in execute; "BEQ?" check in decode, "Taken?" output from the ALU in execute; PCSrc (pc+4 / jabs / rind / br); IRSrcD mux with bubble; stall logic.]
![Page 43: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/43.jpg)
Pipelining Conditional Branches
I1 096 ADD
I2 100 BEQ x1, x2 +200
I3 104 ADD
I4 304 ADD
If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid ⇒ stall signal is not valid
[Figure: PC = 108, I3 in decode, I2 in execute, I1 in memory; "Bcond?" check and "Taken?" output from the ALU in execute; PCSrc (pc+4 / jabs / rind / br); IRSrcD mux with bubble; the execute-stage IR input is marked "?".]
![Page 44: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/44.jpg)
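Pipelining Conditional Branches
I1: 096 ADD
I2: 100 BEQ x1, x2 +200
I3: 104 ADD
I4: 304 ADD
If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid ⇒ stall signal is not valid
[Figure: PC = 108, I3 in decode, I2 in execute, I1 in memory; "Jump?" check in decode drives IRSrcD, "Bcond?" check and "Taken?" in execute drive IRSrcE; both muxes can insert a bubble; PCSrc (pc+4 / jabs / rind / br); a second adder computes the branch target from the execute-stage PC.]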
![Page 45: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/45.jpg)
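Branch Pipeline Diagrams (resolved in execute stage)

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| (I1) 096: ADD | IF1 | ID1 | EX1 | MA1 | WB1 | | | | |
| (I2) 100: BEQ +200 | | IF2 | ID2 | EX2 | MA2 | WB2 | | | |
| (I3) 104: ADD | | | IF3 | ID3 | - | - | - | | |
| (I4) 108: | | | | IF4 | - | - | - | - | |
| (I5) 304: ADD | | | | | IF5 | ID5 | EX5 | MA5 | WB5 |

Resource Usage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| IF | I1 | I2 | I3 | I4 | I5 | | | | |
| ID | | I1 | I2 | I3 | - | I5 | | | |
| EX | | | I1 | I2 | - | - | I5 | | |
| MA | | | | I1 | I2 | - | - | I5 | |
| WB | | | | | I1 | I2 | - | - | I5 |

- ⇒ pipeline bubble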
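The cost of these bubbles can be estimated with a simple CPI model: one bubble per jump (killed in decode) and two per taken branch (resolved in execute). The instruction-mix fractions below are hypothetical and the function name is an assumption; data-hazard stalls are ignored.

```python
def cpi_with_control_bubbles(jump_frac: float, taken_branch_frac: float,
                             jump_penalty: int = 1, branch_penalty: int = 2) -> float:
    """CPI = 1 + (bubbles per jump) x (jump fraction)
               + (bubbles per taken branch) x (taken-branch fraction)."""
    return 1.0 + jump_frac * jump_penalty + taken_branch_frac * branch_penalty


# Hypothetical mix: 5% jumps, 10% taken branches.
print(cpi_with_control_bubbles(0.05, 0.10))
# Same mix if branches were resolved in decode (one bubble each).
print(cpi_with_control_bubbles(0.05, 0.10, branch_penalty=1))
```

Under these assumed fractions, resolving branches one stage earlier removes a tenth of a cycle per instruction.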
![Page 46: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/46.jpg)
What If…
• We used a simple branch that compares only one register (rs1) against zero
• Can we do any better?
[Figure: pipelined datapath with pipeline registers PC, IR, A, B, Y, R, MD1, MD2.]
![Page 47: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/47.jpg)
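Use simpler branches (e.g., only compare one reg against zero) with compare in decode stage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| (I1) 096: ADD | IF1 | ID1 | EX1 | MA1 | WB1 | | | | |
| (I2) 100: BEQZ +200 | | IF2 | ID2 | EX2 | MA2 | WB2 | | | |
| (I3) 104: ADD | | | IF3 | - | - | - | - | | |
| (I4) 300: ADD | | | | IF4 | ID4 | EX4 | MA4 | WB4 | |

Resource Usage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|---|---|---|---|---|---|---|---|---|---|
| IF | I1 | I2 | I3 | I4 | I5 | | | | |
| ID | | I1 | I2 | - | I4 | I5 | | | |
| EX | | | I1 | I2 | - | I4 | I5 | | |
| MA | | | | I1 | I2 | - | I4 | I5 | |
| WB | | | | | I1 | I2 | - | I4 | I5 |

- ⇒ pipeline bubble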
![Page 48: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/48.jpg)
Branch Delay Slots (expose control hazard to software)
• Change the ISA semantics so that the instruction that follows a jump or branch is always executed
  – gives the compiler the flexibility to put in a useful instruction where normally a pipeline bubble would have resulted.
I1 096 ADD
I2 100 BEQZ r1, +200
I3 104 ADD (delay slot instruction, executed regardless of branch outcome)
I4 300 ADD
![Page 49: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/49.jpg)
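Branch Pipeline Diagrams (branch delay slot)

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|---|
| (I1) 096: ADD | IF1 | ID1 | EX1 | MA1 | WB1 | | | |
| (I2) 100: BEQZ +200 | | IF2 | ID2 | EX2 | MA2 | WB2 | | |
| (I3) 104: ADD | | | IF3 | ID3 | EX3 | MA3 | WB3 | |
| (I4) 300: ADD | | | | IF4 | ID4 | EX4 | MA4 | WB4 |

Resource Usage

| time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|---|
| IF | I1 | I2 | I3 | I4 | | | | |
| ID | | I1 | I2 | I3 | I4 | | | |
| EX | | | I1 | I2 | I3 | I4 | | |
| MA | | | | I1 | I2 | I3 | I4 | |
| WB | | | | | I1 | I2 | I3 | I4 |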
![Page 50: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/50.jpg)
Post-1990 RISC ISAs don't have delay slots
• Encodes microarchitectural detail into ISA
  – C.f. IBM 650 drum layout
• What are the problems with delay slots?
• Performance issues
  – E.g., I-cache miss or page fault on delay slot instruction causes machine to wait, even if delay slot is a NOP
• Complicates more advanced microarchitectures
  – 30-stage pipeline with four-instruction-per-cycle issue
• Complicates the compiler's job
• Better branch prediction reduced need for delay slots
![Page 51: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/51.jpg)
Why an Instruction may not be dispatched every cycle (CPI > 1)
• Full bypassing may be too expensive to implement
  – typically all frequently used paths are provided
  – some infrequently used bypass paths may increase cycle time and counteract the benefit of reducing CPI
• Loads have two-cycle latency
  – Instruction after load cannot use load result
  – MIPS-I ISA defined load delay slots, a software-visible pipeline hazard (compiler schedules independent instruction or inserts NOP to avoid hazard). Removed in MIPS-II (pipeline interlocks added in hardware)
    • MIPS: "Microprocessor without Interlocked Pipeline Stages"
• Conditional branches may cause bubbles
  – kill following instruction(s) if no delay slots
Machines with software-visible delay slots may execute a significant number of NOP instructions inserted by the compiler. NOPs increase instructions/program!
![Page 52: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/52.jpg)
RISC-V Branches and Jumps
• JAL: unconditional jump to PC+immediate
• JALR: indirect jump to rs1+immediate
• Branch: if (rs1 conds rs2), branch to PC+immediate
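The three cases can be written as one next-PC function. This is a minimal sketch, assuming RV32 (32-bit addresses) and 4-byte instructions; the function name `next_pc` and its argument names are hypothetical, and register values are passed in directly rather than read from a register file.

```python
MASK32 = 0xFFFFFFFF  # assumption: RV32, so addresses wrap at 32 bits


def next_pc(op, pc, imm=0, rs1=0, rs2=0, cond=None):
    """Return the address of the instruction that follows a given one.

    op   : "jal", "jalr", "branch", or anything else for a non-control op
    cond : for branches, a function comparing the two register values
    """
    if op == "jal":                      # unconditional, PC-relative
        return (pc + imm) & MASK32
    if op == "jalr":                     # indirect: register + immediate
        # RISC-V clears the least-significant bit of the JALR target.
        return (rs1 + imm) & MASK32 & ~1
    if op == "branch" and cond(rs1, rs2):
        return (pc + imm) & MASK32       # taken: PC-relative target
    return (pc + 4) & MASK32             # fall through to the next instruction


def equal(a, b):
    return a == b


print(hex(next_pc("jal", 0x100, imm=0x20)))
print(hex(next_pc("jalr", 0x100, imm=4, rs1=0x2001)))
print(hex(next_pc("branch", 0x100, imm=-16, rs1=3, rs2=3, cond=equal)))
print(hex(next_pc("branch", 0x100, imm=-16, rs1=3, rs2=4, cond=equal)))
```

Only JALR needs a register value to form its target, and only the conditional branch needs a comparison to decide whether to redirect.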
![Page 53: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/53.jpg)
RISC-V Branches and Jumps
Each instruction fetch depends on one or two pieces of information from the preceding instruction:
1) Is the preceding instruction a taken branch?
2) If so, what is the target address?
• JAL: unconditional jump to PC+immediate
• JALR: indirect jump to rs1+immediate
• Branch: if (rs1 conds rs2), branch to PC+immediate

| Instruction | Taken known?       | Target known?      |
|-------------|--------------------|--------------------|
| JAL         | After Inst. Decode | After Inst. Decode |
| JALR        | After Inst. Decode | After Reg. Fetch   |
| B<cond.>    | After Execute      | After Inst. Decode |
![Page 54: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/54.jpg)
Branch Penalties in Modern Pipelines
UltraSPARC-III instruction fetch pipeline stages (in-order issue, 4-way superscalar, 750MHz, 2000)
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
Remainder of execute pipeline (+ another 6 stages)
[Figure annotations: "Branch Target Address Known" (at the branch address calculation stage, B); "Branch Direction & Jump Register Target Known" (at integer execute, E)]
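A rough way to turn the stage list into a penalty is to count how many fetch cycles pass before the needed information exists. The sketch below is a simplified model, not the actual UltraSPARC-III behaviour: it assumes the branch target address becomes known in stage B and the direction and jump-register target in stage E, that one fetch group enters stage A per cycle, and that the redirect takes effect in the cycle after the resolving stage. The name `bubbles_if_resolved_in` is hypothetical.

```python
# Front-end stages of the pipeline, in order, using the letters listed above.
STAGES = ["A", "P", "F", "B", "I", "J", "R", "E"]


def bubbles_if_resolved_in(stage):
    """Fetch cycles lost when the next PC only becomes known in `stage`.

    Simplifying assumption: the control instruction is in stage A in cycle 0
    and the corrected PC is used in the cycle after it reaches `stage`, so
    every cycle in between fetches down the wrong path.
    """
    return STAGES.index(stage)


print(bubbles_if_resolved_in("B"))  # waiting for the branch target address
print(bubbles_if_resolved_in("E"))  # waiting for direction / register target
```

Under these assumptions a redirect at B costs 3 fetch cycles and a redirect at E costs 7; on a 4-way superscalar machine each lost cycle can represent up to four instruction slots.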
![Page 55: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/55.jpg)
Reducing Control Flow Penalty
• Software solutions
  – Eliminate branches - loop unrolling
    • Increases the run length
  – Reduce resolution time - instruction scheduling
    • Compute the branch condition as early as possible (of limited value because branches often in critical path through code)
• Hardware solutions
  – Find something else to do - delay slots
    • Replaces pipeline bubbles with useful work (requires software cooperation)
  – Speculate - branch prediction
    • Speculative execution of instructions beyond the branch
![Page 56: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/56.jpg)
Branch Prediction
• Motivation:
  – Branch penalties limit performance of deeply pipelined processors
  – Modern branch predictors have high accuracy (>95%) and can reduce branch penalties significantly
• Required hardware support:
  – Prediction structures:
    • Branch history tables, branch target buffers, etc.
  – Mispredict recovery mechanisms:
    • Keep result computation separate from commit
    • Kill instructions following branch in pipeline
    • Restore state to that following branch
![Page 57: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/57.jpg)
Static Branch Prediction
Overall probability a branch is taken is ~60-70% but:
• backward: 90% (What C++ statement does this look like?)
• forward: 50% (What C++ statement does this look like?)
ISA can attach preferred direction semantics to branches, e.g., Motorola MC88110: bne0 (preferred taken), beq0 (not taken)
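The backward/forward asymmetry suggests a simple static rule: predict backward branches taken and forward branches not taken. The slide does not name this rule; it is shown here only as one common heuristic that follows from those percentages. The trace below is invented to mirror the 90% and 50% figures, and the function names are hypothetical.

```python
def predict_static(pc, target):
    """Predict taken only for backward branches (target below the branch)."""
    return target < pc


def accuracy(branches):
    """branches: list of (pc, target, actually_taken) tuples."""
    hits = sum(predict_static(pc, tgt) == taken for pc, tgt, taken in branches)
    return hits / len(branches)


# Hypothetical trace: a loop-closing branch at 0x40 taken 9 times out of 10,
# and a forward branch at 0x20 (an if statement) taken half the time.
trace = [(0x40, 0x10, i != 9) for i in range(10)]
trace += [(0x20, 0x30, i % 2 == 0) for i in range(10)]
print(accuracy(trace))
```

On this trace the rule is right for 9 of the 10 backward branches but only 5 of the 10 forward ones, which is why forward branches are the harder case for static schemes.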
![Page 58: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/58.jpg)
Dynamic Branch Prediction: learning based on past behavior
• Temporal correlation (time)
  – If I tell you that a certain branch was taken last time, does this help?
  – The way a branch resolves may be a good predictor of the way it will resolve at the next execution
• Spatial correlation (space)
  – Several branches may resolve in a highly correlated manner
  – For instance, a preferred path of execution
![Page 59: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/59.jpg)
Dynamic Branch Prediction
• 1-bit prediction scheme
  – Low-order bits of the branch address index a table of one-bit flags recording whether the branch was last Taken or Not Taken
  – Simple
• 2-bit prediction
  – Miss twice to change
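The difference between the two schemes shows up on a loop branch. The sketch below counts mispredictions for a single branch, using a 2-bit saturating counter as one common way to get "miss twice to change" behaviour (from a saturated state); the slide does not prescribe this exact encoding, and the outcome pattern is made up.

```python
def mispredicts_1bit(outcomes, state=True):
    """1-bit scheme: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        misses += (state != taken)
        state = taken
    return misses


def mispredicts_2bit(outcomes, counter=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        misses += ((counter >= 2) != taken)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Hypothetical inner loop: branch taken 4 times, then falls through; the
# whole loop is executed 5 times.
loop = ([True] * 4 + [False]) * 5
print(mispredicts_1bit(loop), mispredicts_2bit(loop))
```

The 1-bit scheme misses twice per loop execution after the first (once on exit, once on re-entry), while the 2-bit counter misses only on the exit.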
![Page 60: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/60.jpg)
Branch Prediction Bits
• Assume 2 BP bits per instruction
• Change the prediction after two consecutive mistakes!
[Figure: state diagram with four states (take right, take wrong, ¬take right, ¬take wrong) and transitions labeled taken / ¬taken]
BP state: (predict take/¬take) x (last prediction right/wrong)
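The state encoding above can be written down directly as a transition function on a (predict_taken, last_right) pair. This is a sketch of the stated rule only: which state is entered after the prediction flips is not recoverable from the extracted figure, so marking the flipped state as "right" is an assumption, and the name `step` is hypothetical.

```python
def step(state, taken):
    """Advance one (predict_taken, last_right) state by one branch outcome."""
    predict_taken, last_right = state
    if predict_taken == taken:
        return (predict_taken, True)        # correct: keep prediction, mark right
    if last_right:
        return (predict_taken, False)       # first mistake: keep prediction
    # Second consecutive mistake: flip the prediction. Entering the flipped
    # state marked "right" is an assumption of this sketch.
    return (not predict_taken, True)


state = (True, True)                        # "take right"
for taken in [False, True, False, False, True]:
    state = step(state, taken)
    print(state)
```

A single not-taken outcome followed by a taken one leaves the prediction unchanged; only the two not-taken outcomes in a row flip it.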
![Page 61: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/61.jpg)
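Branch History Table
[Figure: the Fetch PC (low two bits 00) supplies a k-bit BHT index into a 2^k-entry BHT, 2 bits/entry, which outputs Taken/¬Taken?; the instruction from the I-Cache supplies the opcode (Branch?) and the offset, which is added to the PC to form the Target PC]
4K-entry BHT, 2 bits/entry, ~80-90% correct predictions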
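A BHT of this shape can be sketched as an array of 2-bit counters indexed by k bits of the fetch PC, skipping the two low bits that are always zero for 4-byte instructions. The class name `BHT`, the saturating-counter encoding, and the initial counter value are assumptions made for illustration.

```python
class BHT:
    """Branch history table of 2-bit saturating counters indexed by PC bits."""

    def __init__(self, k=12):               # 2**12 = 4K entries
        self.mask = (1 << k) - 1
        self.table = [1] * (1 << k)         # start weakly not-taken (assumption)

    def index(self, pc):
        return (pc >> 2) & self.mask        # drop the two always-zero low bits

    def predict(self, pc):
        return self.table[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.table[i]
        self.table[i] = min(c + 1, 3) if taken else max(c - 1, 0)


bht = BHT()
hits = 0
outcomes = [True] * 9 + [False]             # hypothetical loop branch at 0x400
for taken in outcomes:
    hits += bht.predict(0x400) == taken
    bht.update(0x400, taken)
print(f"{hits}/{len(outcomes)} correct")
```

The table has no tags, so two branches whose PCs share the same k index bits also share a counter.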
![Page 62: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/62.jpg)
Exploiting Spatial Correlation
Yeh and Patt, 1992
if (x[i] < 7) then
    y += 1;
if (x[i] < 5) then
    c -= 4;
If first condition false, second condition also false
History register, H, records the direction of the last N branches executed by the processor
![Page 63: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/63.jpg)
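Two-Level Branch Predictor
Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)
[Figure: k bits of the Fetch PC (low two bits 00) index the BHT; a 2-bit global branch history shift register, into which the Taken/¬Taken results of each branch are shifted, selects one of the four sets of BHT bits to produce Taken/¬Taken?]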
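The structure in the figure can be sketched as a table with four 2-bit counters per entry, where a 2-bit global history register picks which of the four to use. This is an illustrative model, not a description of the Pentium Pro implementation; the class name, table size, and initial counter values are assumptions.

```python
class TwoLevelPredictor:
    """k PC bits pick a row; a 2-bit global history picks one of 4 counters."""

    def __init__(self, k=10):
        self.mask = (1 << k) - 1
        self.history = 0                                # last two outcomes
        self.table = [[1] * 4 for _ in range(1 << k)]   # 2-bit counters

    def _row(self, pc):
        return self.table[(pc >> 2) & self.mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= 2

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        row[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the newest outcome into the 2-bit global history register.
        self.history = ((self.history << 1) | int(taken)) & 0b11


# Hypothetical branch at 0x80 that alternates taken / not taken. A single
# 2-bit counter mispredicts at least half of these outcomes; the
# history-selected counters learn the pattern after a short warm-up.
p = TwoLevelPredictor()
results = []
for i in range(40):
    taken = (i % 2 == 0)
    results.append(p.predict(0x80) == taken)
    p.update(0x80, taken)
print(sum(results[-20:]), "of the last 20 predictions correct")
```

Because the history register is global, the same mechanism also captures correlation between different branches, such as two nested comparisons on the same value.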
![Page 64: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/64.jpg)
Speculating Both Directions
• An alternative to branch prediction is to execute both directions of a branch speculatively
  – resource requirement is proportional to the number of concurrent speculative executions
  – only half the resources engage in useful work when both directions of a branch are executed speculatively
  – branch prediction takes less resources than speculative execution of both paths
• With accurate branch prediction, it is more cost effective to dedicate all resources to the predicted direction!
  – What would you choose with 80% accuracy?
![Page 65: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/65.jpg)
Are We Missing Something?
• Knowing whether a branch is taken or not is great, but what else do we need to know about it?
Branch target address
![Page 66: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/66.jpg)
Limitations of BHTs
Only predicts branch direction. Therefore, cannot redirect fetch stream until after branch target is determined.
UltraSPARC-III fetch pipeline
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
Remainder of execute pipeline (+ another 6 stages)
[Figure annotations: "Correctly predicted taken branch penalty" and "Jump Register penalty" each bracket the leading stages of this pipeline]
![Page 67: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/67.jpg)
Branch Target Buffer
[Figure: k bits of the PC index both IMEM and a Branch Target Buffer (2^k entries); each BTB entry holds the predicted target and the BP bits (BPb)]
BP bits are stored with the predicted target address.
IF stage: If (BP=taken) then nPC=target else nPC=PC+4
Later: check prediction, if wrong then kill the instruction and update BTB & BPb else update BPb
![Page 68: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/68.jpg)
Address Collisions (Mis-Prediction)
Assume a 128-entry BTB
Instruction Memory:
  132   Jump +104
  1028  Add .....
BTB entry: BPb = take, target = 236
What will be fetched after the instruction at 1028?
  BTB prediction = 236
  Correct target = 1032
  => kill PC=236 and fetch PC=1032
Is this a common occurrence?
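The collision can be reproduced with a few lines. The sketch below assumes the 128-entry BTB is indexed with the low 7 bits of the byte address, which is the choice that makes 132 and 1028 share an entry (both give index 4); the dictionary representation and function names are hypothetical.

```python
ENTRIES = 128


def btb_index(pc):
    # Assumption: index with the low 7 bits of the byte address, which is
    # what makes 132 and 1028 share an entry in the example above.
    return pc % ENTRIES


btb = {}                                   # index -> (predict_taken, target)
btb[btb_index(132)] = (True, 132 + 104)    # Jump +104 at PC 132 -> 236


def predict_next_pc(pc):
    taken, target = btb.get(btb_index(pc), (False, None))
    return target if taken else pc + 4


print(btb_index(132), btb_index(1028))     # both map to the same entry
print(predict_next_pc(1028))               # predicted next PC for the Add
print(1028 + 4)                            # correct next PC for the Add
```

Without a tag the BTB cannot tell that the entry was written by the jump at 132, so the Add at 1028 inherits its target and the fetched instruction at 236 must be killed.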
![Page 69: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/69.jpg)
BTB is only for Control Instructions
• Is even branch prediction fast enough to avoid bubbles?
• When do we index the BTB?
  – i.e., what state is the branch in, in order to avoid bubbles?
• BTB contains useful information for branch and jump instructions only => Do not update it for other instructions
• For all other instructions the next PC is PC+4!
• How to achieve this effect without decoding the instruction?
![Page 70: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/70.jpg)
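Branch Target Buffer (BTB)
[Figure: 2^k-entry direct-mapped BTB (can also be associative); k bits of the I-Cache PC index an entry holding a valid bit, the entry PC, and the predicted target PC; the entry PC is compared (=) with the fetch PC to produce match, which together with valid selects the target]
• Keep both the branch PC and target PC in the BTB
• PC+4 is fetched if match fails
• Only taken branches and jumps held in BTB
• Next PC determined before branch fetched and decoded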
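Storing the branch PC alongside the target turns each entry into a tagged one, so a lookup only redirects fetch when the entry really belongs to the current PC. The sketch below is a minimal direct-mapped version; the class name, the indexing by the low k bits of the PC, and the policy of evicting a branch that falls through are assumptions rather than details given on the slide.

```python
class TaggedBTB:
    """Direct-mapped BTB whose entries remember which PC they belong to."""

    def __init__(self, k=7):
        self.mask = (1 << k) - 1
        self.entries = [None] * (1 << k)   # None = invalid, else (entry_pc, target)

    def _index(self, pc):
        return pc & self.mask              # assumption: low k bits of the PC

    def next_pc(self, pc):
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:   # valid and PC matches
            return entry[1]
        return pc + 4                              # match fails: fetch PC+4

    def update(self, pc, taken, target):
        i = self._index(pc)
        if taken:                                  # only taken branches/jumps held
            self.entries[i] = (pc, target)
        elif self.entries[i] is not None and self.entries[i][0] == pc:
            self.entries[i] = None                 # assumed policy: evict on fall-through


btb = TaggedBTB()
btb.update(132, True, 236)
print(btb.next_pc(132), btb.next_pc(1028))
```

Here 1028 still indexes the same entry as 132, but the PC comparison fails and fetch continues at 1032. Because a non-control instruction never installs an entry, the "next PC is PC+4" behaviour falls out of the lookup without decoding the instruction.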
![Page 71: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/71.jpg)
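Are We Missing Something? (2)
• When do we update the BTB or BHT?
[Figure: pipelined datapath with PC, +4 adder (0x4, Add), instruction memory (addr, inst), IR pipeline registers, GPRs (rs1, rs2, rd1, rd2, wa, wd, we), Imm Select, ALU, pipeline registers A, B, Y, R, MD1, MD2, and data memory (addr, wdata, rdata, we)]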
![Page 72: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/72.jpg)
Combining BTB and BHT
• BTB entries are considerably more expensive than BHT, but can redirect fetches at earlier stage in pipeline and can accelerate indirect branches (JR)
• BHT can hold many more entries and is more accurate
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
[Figure annotations: BTB and BHT label the pipeline stages where each is accessed, with the BTB at an earlier stage than the BHT]
BHT in later pipeline stage corrects when BTB misses a predicted taken branch
BTB/BHT only updated after branch resolves in E stage
![Page 73: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/73.jpg)
Uses of Jump Register (JR)
How well does BTB work for each of these cases?
• Switch statements (jump to address of matching case)
  – BTB works well if same case used repeatedly
• Dynamic function call (jump to run-time function address)
  – BTB works well if same function usually called, (e.g., in C++ programming, when objects have same type in virtual function call)
• Subroutine returns (jump to return address)
  – BTB works well if usually return to the same place
  ⇒ Often one function called from many distinct call sites!
![Page 74: Lecture 09: RISC-V Pipeline Implementa8on - PASSLab · Lecture 09: RISC-V Pipeline Implementa8on ... – Thanks to RISC-V ISA, which was designed for pipelining 14 Data Hazards 15](https://reader031.vdocument.in/reader031/viewer/2022020217/5d40dd3a88c9938c3f8d19cf/html5/thumbnails/74.jpg)
Subroutine Return Stack
Small structure to accelerate JR for subroutine returns, typically much more accurate than BTBs.
fa() { fb(); }
fb() { fc(); }
fc() { fd(); }
Push call address when function call executed
Pop return address when subroutine return decoded
[Figure: stack of k entries (typically k=8-16) holding &fb(), &fc(), &fd()]
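The push/pop behaviour can be sketched as a small bounded stack. This is an illustrative model: it stores the address of the instruction after the call (assuming 4-byte instructions) as the predicted return address, and dropping the oldest entry on overflow is an assumed policy. The class and method names, and the call-site addresses, are hypothetical.

```python
class ReturnStack:
    """Fixed-size stack of predicted return addresses."""

    def __init__(self, k=8):
        self.k = k
        self.stack = []

    def on_call(self, call_pc):
        if len(self.stack) == self.k:      # full: drop the oldest entry (assumption)
            self.stack.pop(0)
        self.stack.append(call_pc + 4)     # return to the instruction after the call

    def on_return(self):
        # Empty stack: no prediction available, the front end must fall back.
        return self.stack.pop() if self.stack else None


ras = ReturnStack()
ras.on_call(0x100)   # fa calls fb
ras.on_call(0x200)   # fb calls fc
ras.on_call(0x300)   # fc calls fd
print([hex(ras.on_return()) for _ in range(3)])
```

Returns are predicted in the reverse order of the calls, which is the case a BTB handles poorly when one function is called from many distinct call sites.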