an open-source out-of-order risc-v core...it is open-source wriaen in chisel (16k loc) it is...

34
BOOM v2 an open-source out-of-order RISC-V core Christopher Celio, Pi-Feng Chiu, Borivoje Nikolic, David PaAerson, Krste Asanović 2017 RISC-V Workshop #7 UC Berkeley

Upload: others

Post on 25-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

BOOMv2anopen-sourceout-of-orderRISC-Vcore

ChristopherCelio,Pi-FengChiu,BorivojeNikolic,DavidPaAerson,KrsteAsanović2017RISC-VWorkshop#7

UC Berkeley

Page 2: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley WhatisBOOM?

2http://ucb-bar.github.io/riscv-boom

◾"BerkeleyOut-of-OrderMachine"◾out-of-order◾superscalar◾ implementsRV64G,bootsLinux◾ Itissynthesizable◾ itisopen-source◾wriAeninChisel(16kloc)◾ Itisparameterizablegenerator◾builtontopofRocket-chipSoCEcosystem

integer

load/store

fp

fetchdec, ren,

& dis

issue/rrdqueues

BOOMv2-2f4i

wb

Page 3: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley BOOMfitsintoRocket-chipSoC

3

BOOM

Rocket

Rocket

Page 4: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Lotsofneatfeatures

4

◾ advancedbranchpredicTon- BTB,RAS(1-cycle)- gshareandTAGE-basedcondiTonalpredictorimplementaTons(3-cycle)- fitsinsingle-portSRAM

◾ loadscanissueout-of-orderwrttootherloads,stores◾ hardwareperformancecounters(~40events)

FRFlsu

btbbpd I$D$

IRFIQ1IQ0

ROB

imulDFMA

SFMA

IQ2

idiv

mshrs

csr

dec

Page 5: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley ParameterizedSuperscalar

5

valexe_units=ArrayBuffer[ExecutionUnit]()exe_units+=Module(newALUExeUnit(is_branch_unit=true,has_fpu=true,has_mul=true))exe_units+=Module(newALUMemExeUnit(fp_mem_support=true,has_div=true))

Issue Select

RegfileWriteback

dual-issue (5r,3w)

bypassing

ALU

div

LSUAgen D$

bypassing

ALU

FPU

bypassnetwork

RegfileRead

imul

Page 6: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley ParameterizedSuperscalar

5

OR

valexe_units=ArrayBuffer[ExecutionUnit]()exe_units+=Module(newALUExeUnit(is_branch_unit=true,has_fpu=true,has_mul=true))exe_units+=Module(newALUMemExeUnit(fp_mem_support=true,has_div=true))

Issue Select

RegfileWriteback

dual-issue (5r,3w)

bypassing

ALU

div

LSUAgen D$

bypassing

ALU

FPU

bypassnetwork

RegfileRead

imul

exe_units+=Module(newALUExeUnit(is_branch_unit=true))exe_units+=Module(newALUExeUnit(has_fpu=true,has_mul=true))exe_units+=Module(newALUExeUnit(has_div=true))exe_units+=Module(newMemExeUnit())

Issue Select

RegfileWriteback

Quad-issue (9r,4w)

ALU

div

LSUAgen D$

ALU

imul

FPU

ALU

bypassing

bypassnetwork

RegfileRead

Page 7: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

Async. FIFOs + level shifters

BOOM Domain (0.5V-1V)

Uncore Domain

VDDVDDCORE

BOOM Core

L1-to-L2 Crossbar

MMIO RouterRead/write all counters

and control registers

1MB L2 CacheBank0

μArchCounters

μArchCounters

JTAGDigital

IO Pads

Configurable Memory Traffic Switcher

To FPGA GPIO

BTB

Load/Store Unit

int int agen

Phys. Int RF (semi-custom

bit cell)

FPU

6KBBr Pred

Issue Units

16KB ScalarInst. Cache

(foundry 6T SRAM Macros)

Arbiter

16KB ScalarData Cache

(foundry 8T SRAM Macros)

Phys.FP RF

Tile Link IO Bidirectional SERDES

UC Berkeley Goodnews

◾ BOOMhas(finally)beentapedout!- 2017Aug15

◾ TSMC28nmHPM◾ Academictest-chip◾ BOOMisinsupportofstudyingotherfeatures

6

Page 8: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

BOOMv2tapingoutanout-of-orderprocessorwitha2personteamin4months

ChristopherCelio,Pi-FengChiu,BorivojeNikolic,DavidPaAerson,KrsteAsanović2017CARRVWorkshop

UC Berkeley

Page 9: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

ProcessorSiFive U54

Rocket (RV64GC)

Berkeley BOOMv2

UltraSPARC T2

ARM Cortex-A9

Intel Xeon Ivy

Language Chisel Chisel Verilog - SystemVerilog

Core LoC 8,000 16,000 290,000 - NDA

Total LoC 34,000 50,000 1,300,000 - NDA

Foundry TSMC TSMC TI TSMC Intel

Technology 28 nm (HPC)

28 nm (HPM)

65 nm 40 nm (G)

22 nm

Core+L1 Area 0.54 mm2 0.52 mm2 ~12 mm2 ~2.5 mm2 ~12 mm2

Coremark/MHz 2.75 3.92 1.64* 3.71 5.60

Frequency 1.5 GHz 1.2 GHz** 1.17 GHz 1.4 GHz 3.3 GHz

UC Berkeley Comparison:Industry

8

core+L1+L216kB/16kB

Page 10: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

ProcessorSiFive U54

Rocket (RV64GC)

Berkeley BOOMv2

UltraSPARC T2

ARM Cortex-A9

Intel Xeon Ivy

Language Chisel Chisel Verilog - SystemVerilog

Core LoC 8,000 16,000 290,000 - NDA

Total LoC 34,000 50,000 1,300,000 - NDA

Foundry TSMC TSMC TI TSMC Intel

Technology 28 nm (HPC)

28 nm (HPM)

65 nm 40 nm (G)

22 nm

Core+L1 Area 0.54 mm2 0.52 mm2 ~12 mm2 ~2.5 mm2 ~12 mm2

Coremark/MHz 2.75 3.92 1.64* 3.71 5.60

Frequency 1.5 GHz 1.2 GHz** 1.17 GHz 1.4 GHz 3.3 GHz

UC Berkeley Comparison:Industry

9

core+L1+L2

*Fromeembc.org.32threads/8coresachieve13Cm/MHz.**EsTmated

16kB/16kB

Page 11: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley LimitsofEducaPonalLibraries

◾ BOOMv1- EducaTonal45nmlibraries- CACTImodelforSRAMs- synthesizedflip-flopsformostnon-cachememoryarrays- goodforcatchingsmallTmingbugs- notaccurateenoughtojusTfysweepingredesigns

◾ BOOMv2- TSMC28nmHPM(highperformancemobile)- LVT-basedstandardcells-memorycompiler(one-andtwo-port)

10

Page 12: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley 4monthsofagiletape-out

11

*ignore the Y-axis: -- too many parameters/variables changing between each run -- doesn't capture DRC violations

Page 13: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley BOOMv1

12

Fetch

Decode &Rename

Unified Issue

Window

PhysicalScalar Register

File (7x3)(PRF)

ROB

Commitwakeup

uop tags

inst

Fetch: 2wIssue: 3-issue

PRF: 7r3w (110)2

2

2

3

LSU

Datacache

ALUFPU iMul

ALUFDiv IDiv

BOOMv1-2f3i

int/idiv/fdiv

load/store

int/fma

fetch

issue/rrdqueues

wbdec, ren,

& dis

◾ shortpipeline- inspiredbyR10K,21264,Cortex-A9

◾ unifiedissuewindow

Page 14: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley BOOMv1:Frontend

13

I$

Fetch1

PC1

Fetch2

µBTB

Fetch0

PC2

hash

TLB

idx

next-pc

taken

I-Cachebackendredirecttarget pre-decode

& target compute

Fetch Buffer

BPD  redirect

Page 15: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley BOOMv1:Frontend

13

I$

Fetch1

PC1

Fetch2

µBTB

Fetch0

PC2

hash

TLB

idx

next-pc

taken

I-Cachebackendredirecttarget pre-decode

& target compute

Fetch Buffer

BPD  redirect

Page 16: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

idx

 redirect

I$Fetch Buffer

Fetch1

PC1

Fetch2

+8backendredirecttarget

Fetch0

PC2

BPD

redirect

BTB

Fetch3

hash

TLB

pre-decode & target compute

checker{hit,

target}next-pc

taken

I-Cache

decode info

RAS

I$

Fetch1

PC1

Fetch2

µBTB

Fetch0

PC2

hash

TLB

idx

next-pc

taken

I-Cachebackendredirecttarget pre-decode

& target compute

Fetch Buffer

BPD  redirect

UC Berkeley Frontend

14

BOOMv1

BOOMv2

◾ BTBinSRAM- set-associaTve- parTallytagged- Checkertoverifyintegrity

◾ BPD(CondiTonalPredictor)- hashgetsenTrestage- redirectbasedonBTB- redirectpushedback(I$)

Page 17: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Regfile:(thefirstP&R)

◾ Regfileblowsup◾ criTcalpathsinissue-select,registerread

15

regfilerob

iq

rename

lsu

mshr

bpdalu

fpu

alu

Page 18: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley BOOMv2

◾ splitunifiedissuewindow(1day)◾ splitphysicalregisterfile(1week)◾ issue/registerreadseparatestages(~1day)◾ 2stagerename(~1week)◾ 3stagefetch(3weeks)

16

Fetch

Decode &Rename

Unified Issue

Window

PhysicalScalar Register

File (7x3)(PRF)

ROB

Commitwakeup

uop tags

inst

Fetch: 2wIssue: 3-issue

PRF: 7r3w (110)2

2

2

3

LSU

Datacache

Before

ALUFPU iMul

ALUFDiv IDiv

Fetch

Decode &Rename1

FP Issue Window

PhysicalFP RegisterFile (3x2)(PFRF)

ROB

Commit

wakeup

uop tags

inst

Fetch: 2wIssue: 4-issue

PIRF: 6r3w (96)PFRF: 3r1w (48)

2

2

2

LSU

Datacache

FPU FDvALU

iMul iDiv

Physical Int Register

File (6x3)(PIRF)

uop tags uop tags

ALU

ALU Issue Window

Mem Issue Window

toInt(LL

wport)

toPFRF(LL wport)

2

I2F

toPFRF(LL wport)

F2I

Rename2 &Dispatch

Afterinteger

load/store

fp

fetchdec, ren,

& dis

issue/rrdqueues

BOOMv2-2f4i

wb

Page 19: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley RegfileChallenges

◾ foundaTonalIPprovidememorieswithsingleanddual-portmemories

◾ companiesbuildtheirownhand-crahedregisterfiles◾ notenoughlaborinacademiatosupportacustomizedregisterfile

17

... ...

WBL0 WBLB0

WWL0

WBLn WBLBn

WWLnWWLn

RWL[m-1,0]

RBL[m-1,0]x n write ports

x m read ports

Page 20: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Regfile

18

•  Routing 4480 wires causes congestion issues

Q

4480

70x64 flip-flops

........

.............

read_data{5-0}[63-0]

7-de

ep lo

gic

tree

.......

......

...

.......

70 e

ntrie

s

64 bits

read_data{5-0}[63] read_data{5-0}[0]

RF unit cell

•  Reduce read_data routing wires to 384

WD0

WD1

WD2

WS[1:0]

D

EN

WE

Q

OE[5-0]

Tri-state buf [5-0]

RD

[5-0

]

RF unit cell

Congestion!

Tri-state buffer mimics the PG in a custom RF cell so that read data wire can be shared

synthesized array

Page 21: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

WS

WD

validaddrdata

decode

WE

Write Port (x3)

addrRead Port (x6)

Regfile Bit Block

2

6RE

Flip flop(dff)D

Q

decode

RD

read data wires

UC Berkeley Regfile:OurquicksoluPon

◾ hand-crahourownbitblockoutofstandardcells- tri-statetodrivereadwires- hierarchicalbitlines

◾ pre-placearraysofbitblocks◾ letrouterhandlethe18wiresper1flip-flop◾ blackboxinChisel- simulatecontrolusingChiselmodel- verifyatgate-level 19

Page 22: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

BOOMv1-2f3i

int/idiv/fdiv

load/store

int/fma

fetch

issue/rrdqueues

wb

integer

load/store

fp

fetchdec, ren,

& dis

issue/rrdqueues

BOOMv2-2f4i

wb

UC Berkeley BeforeandAXer

20

Page 23: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Results

21

Page 24: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Results

◾ Hardtocompute--notapplestooranges- alotoftheworkwasaboutdesignrulechecks,notperformance

21

Page 25: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Results

◾ Hardtocompute--notapplestooranges- alotoftheworkwasaboutdesignrulechecks,notperformance

◾ Roughly25%decreaseinclockperiod◾ Roughly20%decreaseinCoremark/MHz- knownissuefromload-usedelay

21

Page 26: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Results

◾ Hardtocompute--notapplestooranges- alotoftheworkwasaboutdesignrulechecks,notperformance

◾ Roughly25%decreaseinclockperiod◾ Roughly20%decreaseinCoremark/MHz- knownissuefromload-usedelay

◾ Barelyawin?◾ ChangesfixedDRCerrors,geometryerrors(shorts),so...

21

Page 27: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley Results

◾ Hardtocompute--notapplestooranges- alotoftheworkwasaboutdesignrulechecks,notperformance

◾ Roughly25%decreaseinclockperiod◾ Roughly20%decreaseinCoremark/MHz- knownissuefromload-usedelay

◾ Barelyawin?◾ ChangesfixedDRCerrors,geometryerrors(shorts),so...- infinitelyFaster!BeAer!Stronger!

21

Page 28: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley AgileHardwareDevelopment◾ RTLhackingcanbeveryagile

- ~6minutestocompile,build,andrun"riscv-tests"regressionsuite(10KHzforVerilatorsimulator)

- Chiselallowsforquick,far-reachingchanges- generatorapproachallowsforlate-bindingdesigndecisions- smallchanges,improvements(thatdon'taffectfloorplan)areagile

22

Page 29: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley AgileHardwareDevelopment◾ RTLhackingcanbeveryagile

- ~6minutestocompile,build,andrun"riscv-tests"regressionsuite(10KHzforVerilatorsimulator)

- Chiselallowsforquick,far-reachingchanges- generatorapproachallowsforlate-bindingdesigndecisions- smallchanges,improvements(thatdon'taffectfloorplan)areagile

◾ PhysicaldesignisaboAleneck- 2-3hoursforsynthesisresults- 8-24hoursforp&rresults- toomuchvalueprovidedbymanualintervenTon- reportsaredifficulttoreasonabout- sadlycan'tthrowthisatacomputerandget7gooddesignsaweeklater

22

Page 30: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley AgileHardwareDevelopment◾ RTLhackingcanbeveryagile

- ~6minutestocompile,build,andrun"riscv-tests"regressionsuite(10KHzforVerilatorsimulator)

- Chiselallowsforquick,far-reachingchanges- generatorapproachallowsforlate-bindingdesigndecisions- smallchanges,improvements(thatdon'taffectfloorplan)areagile

◾ PhysicaldesignisaboAleneck- 2-3hoursforsynthesisresults- 8-24hoursforp&rresults- toomuchvalueprovidedbymanualintervenTon- reportsaredifficulttoreasonabout- sadlycan'tthrowthisatacomputerandget7gooddesignsaweeklater

◾VerificaTonisanotherboAleneck- IcanwritebugsfasterthanIcanfindthem

22

Page 31: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley FutureDirecPons(forBOOM)

◾ AlotofimprovementstomaketoBOOM- CleardirecTonoffurtherIPC/QoRimprovements

23

Page 32: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley FutureDirecPons(forme)

◾ SeeDaveDitzel'stalkfromyesterdaytolearnmore◾ RISC-VFoundingGoldMember- onlotsofworkinggroups!

◾WearecommiAedtomaintainingtheBOOMopen-sourcerepository

◾We'rehiring...

24

"Esperanto Technologies designs high-performance energy-efficient computing solutions." - riscv.org

Page 33: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley A2-persontapeouttakesavillage!

◾ RISC-VISA- veryout-of-orderfriendly!

◾ ChiselhardwareconstrucTonlanguage- object-oriented,funcTonalprogramming

◾ FIRRTL- exposedRTLintermediaterepresentaTon(IR)

◾ Rocket-chip- AfullworkingSoCplarormbuiltaroundtheRocketin-ordercore

◾ Thanksto:- KrsteAsanović,RimasAvizienis,JonathanBachrach,ScoABeamer,DavidBiancolin,ChristopherCelio,HenryCook,PalmerDabbelt,JohnHauser,AdamIzraelevitz,SagarKarandikar,BenKeller,DonggyuKim,JackKoenig,JimLawson,YunsupLee,RichardLin,EricLove,MarTnMaas,ChickMarkley,AlbertMagyar,HowardMao,MiquelMoreto,QuanNguyen,AlbertOu,DavidA.PaAerson,BrianRichards,ColinSchmidt,WenyuTang,StephenTwigg,HuyVo,AndrewWaterman,AngieWang,andmore...

25

Page 34: an open-source out-of-order RISC-V core...it is open-source wriAen in Chisel (16k loc) It is parameterizable generator built on top of Rocket-chip SoC Ecosystem integer load/store

UC Berkeley QuesTons?

◾ Researchpar*allyfundedbyDARPAAwardNumberHR0011-12-2-0016,theCenterforFutureArchitectureResearch,amemberofSTARnet,aSemiconductorResearchCorpora*onprogramsponsoredbyMARCOandDARPA,andASPIRELabindustrialsponsorsandaffiliatesIntel,Google,Huawei,Nokia,NVIDIA,Oracle,andSamsung.

◾ Approvedforpublicrelease;distribu*onisunlimited.Thecontentofthispresenta*ondoesnotnecessarilyreflecttheposi*onorthepolicyoftheUSgovernmentandnoofficialendorsementshouldbeinferred.

◾ Anyopinions,findings,conclusions,orrecommenda*onsinthispaperaresolelythoseoftheauthorsanddoesnotnecessarilyreflecttheposi*onorthepolicyofthesponsors.

26

FundingAcknowledgements