![Page 1: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/1.jpg)
OpenPiton with RISC-V Cores
A Hands-On Tutorial with the Open Source Manycore Processor
Princeton University and ETH Zürich
http://openpiton.org
http://pulp-platform.org
![Page 2: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/2.jpg)
Princeton Parallel Research Group
• Computer Architecture after Moore’s Law
  – @MICRO 2019: ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs (Monday)
• Redesigning the Data Center of the Future
  – @MICRO 2019: Architectural Implications of Function-as-a-Service Computing (Wednesday)
• Biodegradable Computing (Materials)
• 10 PhD Students
• 1 Postdoc
• 3 Undergraduates
[Photo: Grand Canyon Trip 2019]
![Page 3: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/3.jpg)
Support
This work was partially supported by the NSF under Grants No. CNS-1823222, CCF-1823032, CCF-1217553, CCF-1453112, and CCF-1438980, Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreements No. FA8650-18-2-7846, FA8650-18-2-7852, and FA8650-18-2-7862, AFOSR under Grant No. FA9550-14-1-0148, and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA), the NSF, AFOSR, or the U.S. Government.
![Page 4: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/4.jpg)
The world’s first open source, general purpose, multithreaded manycore processor
• Open source manycore
• Written in Verilog RTL
• Scales to ½ billion cores
• Configurable core, uncore
• Includes synthesis and back-end flow
• Simulate in VCS, ModelSim, NCSim, Verilator, Icarus
• ASIC & FPGA verified
• ASIC power and energy fully characterized [HPCA 2018]
• Runs full stack multi-user Debian Linux
• Used for Architecture, Programming Language, Compilers, Operating Systems, Security, EDA research
[Figure labels: Tile, Chip, Chipset]
![Page 5: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/5.jpg)
OpenPiton+Ariane
• Collaboration between Princeton University and the PULP team from ETH Zürich
• Goal is to develop a permissively licensed, Linux-capable manycore research platform based on RISC-V
• Ariane
  – RV64GC core
  – Linux capable
• OpenPiton
  – Research manycore system
  – OpenSPARC T1 based
  – Coherent NoC, distributed cache
![Page 6: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/6.jpg)
Parallel Ultra Low Power (PULP)
• Project started in 2013 by Luca Benini
• A collaboration between University of Bologna and ETH Zürich
  – Large team: about 60 people in total, not all of whom work on PULP
• Key goal: how to get the most BANG for the ENERGY consumed in a computing system
• We were able to start with a clean slate, with no need to remain compatible with legacy systems
![Page 7: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/7.jpg)
ARIANE: Linux capable 64-bit core
• Application class processor
• Linux capable
  – Tightly integrated D$ and I$
  – M, S and U privilege modes
  – TLB, SV39
  – Hardware PTW
• Optimized for performance
  – Frequency: 1.5 GHz (22 FDX)
  – Area: ~ 175 kGE
  – Critical path: ~ 25 logic levels
• 6-stage pipeline
  – In-order issue
  – Out-of-order write-back
  – In-order commit
• Scoreboarding
• Designed for extensibility
• Branch prediction
  – Return Address Stack (RAS)
  – Branch Target Buffer (BTB)
  – Branch History Table (BHT)
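As a rough illustration of the "TLB, SV39" and "Hardware PTW" bullets, the sketch below splits a virtual address the way the RISC-V Sv39 scheme defines it: three 9-bit virtual page numbers plus a 12-bit page offset. The function name `split_sv39` and the dictionary layout are hypothetical, chosen only for this example; they do not come from the Ariane source.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB base pages
VPN_BITS = 9           # each level indexes a 512-entry page table
LEVELS = 3             # Sv39 uses a three-level page table


def split_sv39(vaddr: int) -> dict:
    """Split the low 39 bits of a virtual address into VPN[2..0] and offset.

    Hypothetical helper for illustration only.
    """
    if not 0 <= vaddr < (1 << 39):
        raise ValueError("expected the low 39 bits of a virtual address")
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [
        (vaddr >> (PAGE_OFFSET_BITS + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)
        for level in range(LEVELS)
    ]
    # A page-table walker would use vpn[2] first (root table), then vpn[1], then vpn[0].
    return {"vpn2": vpn[2], "vpn1": vpn[1], "vpn0": vpn[0], "offset": offset}
```

A walk that misses in the TLB would touch up to three page-table entries, one per VPN field, which is the work a hardware PTW takes off the software.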
![Page 8: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/8.jpg)
ARIANE: Linux capable 64-bit core
![Page 9: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/9.jpg)
OpenPiton System Overview
[Figure label: Tile]
![Page 10: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/10.jpg)
![Page 11: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/11.jpg)
[Figure label: Chip]
![Page 12: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/12.jpg)
![Page 13: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/13.jpg)
![Page 14: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/14.jpg)
![Page 15: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/15.jpg)
[Figure labels: Chip, Chipset, Chip Bridge, P-Mesh Off-Chip Routers (3), P-Mesh Chipset Crossbars (3), DRAM, Wishbone, SDHC, AXI, I/O]
![Page 16: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/16.jpg)
![Page 17: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/17.jpg)
Tile Overview
[Figure: tile block diagram. Labels: Modified OpenSPARC T1 Core, CCX Arbiter, FPU, L1.5 Cache, L2 Cache Slice + Directory Cache, MITTS (Traffic Shaper), P-Mesh Routers (3), To Other Tiles]
![Page 18: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/18.jpg)
Silicon Proven Designs: Ariane
• Ariane has been taped out in GlobalFoundries 22nm FDX in 2017 and 2018
• The system features 16 kByte of instruction cache and 32 kByte of data cache
• Poseidon:
  – Area: 0.23 mm², 175 kGE
  – 0.2 - 1.7 GHz (0.5 V – 1.15 V)
• Kosmodrom:
  – RV64GCXsmallFloat
  – Transprecision / Vector FPU
  – Ariane HP
    • 8T library, 0.8V, 1.3 GHz
    • 55 mW @ 1 GHz
  – Ariane LP
    • 7.5T ULP library, 0.5V, 250 MHz
    • 5 mW @ 200 MHz
[Figures: Poseidon layout (labels: QUENTIN, KERBIN, HYPERDRIVE, Ariane); Kosmodrom layout (labels: Ariane LP, Ariane HP, L2, NTX)]
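The two Kosmodrom operating points above can be compared on an energy-per-cycle basis by dividing power by clock frequency. The sketch below does only that arithmetic on the slide's numbers; the helper name `energy_per_cycle_pj` is made up for this example, and the result says nothing about workload or leakage, which the slide does not specify.

```python
def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Average energy per clock cycle in picojoules (power / frequency).

    1 mW / 1 MHz = 1e-3 W / 1e6 Hz = 1e-9 J = 1000 pJ.
    """
    return power_mw * 1000.0 / freq_mhz


hp_pj = energy_per_cycle_pj(55, 1000)  # Ariane HP: 55 mW @ 1 GHz
lp_pj = energy_per_cycle_pj(5, 200)    # Ariane LP: 5 mW @ 200 MHz
print(f"Ariane HP: {hp_pj:.0f} pJ/cycle, Ariane LP: {lp_pj:.0f} pJ/cycle")
```

Under this simple model the low-power variant spends a bit less than half the energy per cycle of the high-performance variant, at one fifth of the clock rate.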
![Page 19: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/19.jpg)
Silicon Proven Designs: Piton Chip
• 25-core
  – 2 Threads per core
  – 64-bit Architecture
  – Modified OpenSPARC T1 Core
• 3 NoCs (P-Mesh)
  – 64-bit, 2D Mesh
  – Extend off-chip enabling multichip systems
• Directory-Based Cache System
  – 64KB L2 Cache per core (Shared)
  – 8KB L1.5 Data Cache
  – 8KB L1 Data Cache
  – 16KB L1 Instruction Cache
• IBM 32nm SOI Process
  – 6mm x 6mm
  – 460 Million Transistors
• Target: 1GHz Clock @ 900mV
• 208 Pin CQFP Package
[Figure: chip floorplan with 25 tiles (Tile 0 to Tile 24), PLL, and Chip Bridge (CB)]
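To make the 2D mesh and distributed L2 figures concrete, here is a small sketch that maps tile IDs to mesh coordinates and counts hops between tiles. It assumes a 5 x 5 arrangement with row-major numbering (suggested by the floorplan labels) and minimal routing, so the hop count is the Manhattan distance; both are assumptions for illustration, and the function names are hypothetical.

```python
MESH_WIDTH = 5    # assumed: 25 tiles laid out 5 x 5, numbered row by row
MESH_HEIGHT = 5
L2_SLICE_KB = 64  # per-tile shared L2 slice, from the slide


def tile_xy(tile_id: int) -> tuple:
    """Return the (x, y) mesh coordinate of a tile under row-major numbering."""
    if not 0 <= tile_id < MESH_WIDTH * MESH_HEIGHT:
        raise ValueError("tile id out of range")
    return tile_id % MESH_WIDTH, tile_id // MESH_WIDTH


def hops(src: int, dst: int) -> int:
    """Router-to-router hops between two tiles, assuming minimal routing."""
    (sx, sy), (dx, dy) = tile_xy(src), tile_xy(dst)
    return abs(sx - dx) + abs(sy - dy)


total_l2_kb = MESH_WIDTH * MESH_HEIGHT * L2_SLICE_KB
print(f"Aggregate shared L2: {total_l2_kb} KB; corner-to-corner hops: {hops(0, 24)}")
```

Because each L2 slice is shared, a request from one tile may have to cross several routers to reach the slice that holds the line, which is why hop distance matters for access latency.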
![Page 20: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/20.jpg)
Piton Test Setup
[Figure: annotated photo of the test board. Labels: Piton + Heat Sink, Chipset FPGA (Kintex 7), Bridge FPGA (Spartan 6), DRAM + I/O, Bulk Decoupling, Power Supply, Misc. Configuration]
[McKeown et al, HotChips 2016] [McKeown et al, IEEE MICRO 2017] [McKeown et al, HPCA 2018]
![Page 21: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/21.jpg)
Putting it all together
[Figure: tile block diagram, repeated from the Tile Overview slide]
§ Native L1.5 interface is the ideal point to attach a new core
§ Well defined interface similar to CCX from OpenSPARC
§ Write-through cache protocol
§ Coherency mechanism: only need to support invalidation messages
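The last two points fit together: with a write-through private cache, the next level always holds current data, so the core-side cache never has dirty lines to write back and only needs to drop lines when told to. The toy model below sketches that idea. It is not the L1.5 interface itself; the class name, the dictionary standing in for the L1.5/L2 side, and the no-write-allocate choice are all assumptions made for this example.

```python
class WriteThroughL1:
    """Toy private cache: write-through, so cached lines are never dirty."""

    def __init__(self, backing: dict):
        self.backing = backing  # stands in for the L1.5/L2 side (assumption)
        self.lines = {}         # address -> cached value

    def load(self, addr: int) -> int:
        if addr not in self.lines:  # miss: fill from the next level
            self.lines[addr] = self.backing.get(addr, 0)
        return self.lines[addr]

    def store(self, addr: int, value: int) -> None:
        self.backing[addr] = value  # write-through: next level always updated
        if addr in self.lines:      # no-write-allocate (assumption): update on hit only
            self.lines[addr] = value

    def invalidate(self, addr: int) -> None:
        # The only coherence action needed: drop the line, nothing to write back.
        self.lines.pop(addr, None)


shared = {}
core0, core1 = WriteThroughL1(shared), WriteThroughL1(shared)
core0.store(0x100, 1)
first = core1.load(0x100)   # core1 caches the line
core0.store(0x100, 2)       # next level now holds the new value
stale = core1.load(0x100)   # core1 still sees its cached copy
core1.invalidate(0x100)     # in a real system the coherence protocol sends this
fresh = core1.load(0x100)   # refetched from the next level
```

In the example, `stale` is the old value until the invalidation arrives, after which a plain reload is enough to observe the new one.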
![Page 22: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/22.jpg)
![Page 23: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/23.jpg)
FPGA Prototyping Platforms
Available:
• Digilent Genesys2
  – $999 ($600 academic)
  – 1-2 cores at 66MHz
• Xilinx VC707
  – $3500
  – 1-4 cores at 60MHz
• Digilent Nexys Video
  – $500 ($250 academic)
  – 1 core at 30MHz
• BittWare XUPP3R
  – $7000-8000
  – >100MHz (12 cores)
• Amazon AWS F1
  – Rent by the hour
  – 12 cores
![Page 24: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/24.jpg)
OpenPiton Philosophy
• Focus/Value is in the Uncore
  – Not religious about ISA
  – Provide whole working system
• We are practical
  – Use Verilog (Ariane is SV)
  – Industry standard tools
  – Use the best tool for the job (including commercial CAD tools)
• Primarily for research, but welcome industry also
• Licensing
  – All our code, Hypervisor, are BSD-like
  – Linux, T1 core (GPL or LGPL)
  – Ariane (Solderpad)
• Scalability (Million Core)
![Page 25: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/25.jpg)
OpenPiton Community
• Building a community
  – Welcome community contributions
  – Thousands of Downloads
• Visit http://openpiton.org
• [email protected]
• Google Group
![Page 26: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/26.jpg)
Doing Research with OpenPiton + Ariane
• Software
  – Install on Debian, test scalability
• Operating System
  – Recompile kernel, rebuild SW, run
• Hardware/Software Co-design
  – Add new instructions, change compiler/HV/OS/SW
• Architecture
  – Change parameters, rebuild HW, run
[Figure: system stack with layers HW, ISA, HV/OS, Compiler/Runtime, Apps]
![Page 27: OpenPiton with RISC-V Cores A Hands-On Tutorial with the Open Source Manycore Processorparallel.princeton.edu/openpiton/tutorial_slides/micro19/... · 2019. 10. 28. · •10 PhD](https://reader036.vdocument.in/reader036/viewer/2022071513/613475f8dfd10f4dd73bbeae/html5/thumbnails/27.jpg)
Enabled Research
• Coherence Domain Restriction
  – Fu et al. MICRO 2015
• Execution Drafting
  – McKeown et al. MICRO 2014
• Memory Inter-arrival Time Traffic Shaper
  – Zhou et al. ISCA 2016
• Oblivious RAM
  – Fletcher et al. ASPLOS 2015
• DVFS modelling
• Numerous outside papers
• Numerous class research projects
[Figure: Execution Drafting pipeline (Fetch, Thread Select, Decode, Execute, Memory, and Writeback stages) showing Program A and Program B instructions, with lead instructions and successfully drafted instructions]
[Figure: distributions of traffic for App 1, App 2, and App 3, plotted as frequency against request inter-arrival time (t, 2t, 3t), contrasting uniform traffic with more bursty traffic]
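The traffic-distribution figure plots how often each request inter-arrival time occurs. The sketch below builds that kind of histogram for two made-up request traces with the same average rate, one uniform and one bursty. It illustrates only the plotted quantity, not how the Memory Inter-arrival Time Traffic Shaper itself is implemented; the traces and the function name are invented for this example.

```python
from collections import Counter


def interarrival_histogram(request_times, bin_width=1):
    """Count request inter-arrival times, grouped into bins of bin_width."""
    gaps = [later - earlier for earlier, later in zip(request_times, request_times[1:])]
    return Counter(gap // bin_width for gap in gaps)


# Hypothetical traces: 7 requests over 12 time units in both cases.
uniform_trace = [0, 2, 4, 6, 8, 10, 12]  # evenly spaced requests
bursty_trace = [0, 1, 2, 3, 10, 11, 12]  # two bursts separated by a long gap

uniform_hist = interarrival_histogram(uniform_trace)
bursty_hist = interarrival_histogram(bursty_trace)
print("uniform:", dict(uniform_hist))
print("bursty: ", dict(bursty_hist))
```

Both traces average one request every two time units, yet the bursty one concentrates its mass at short gaps with a single long gap, which is the difference the figure's two curves are drawn to show.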