+ ariane the first open-source smp linux-booting risc-v … · •the world's first...

26
: The First Open-Source SMP Linux-booting RISC-V System Scaling From One to Many Cores Jonathan Balkind, Michael Schaffner, Katie Lim, Florian Zaruba, Fei Gao, Jinzheng Tu, Luca Benini, David Wentzlaff Princeton University, ETH Zurich + Ariane openpiton.org pulp-platform.org

Upload: others

Post on 15-Jul-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

:The First Open-Source

SMP Linux-booting RISC-V System Scaling From One to Many CoresJonathan Balkind, Michael Schaffner, Katie Lim, Florian Zaruba, Fei Gao,

Jinzheng Tu, Luca Benini, David WentzlaffPrinceton University, ETH Zurich

+ Ariane

openpiton.org pulp-platform.org

Page 2: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Who are we?

• Jonathan Balkind• Lead architect of

OpenPiton• OpenPiton Team

• Led by Prof. David Wentzlaff• Princeton Parallel

Research Group• Open source HW since 2015• 13 PhD students• 1 Postdoc• N undergraduates

• Michael Schaffner• Responsible for OpenPiton+

Ariane integration• PULP Team

• Led by Prof. Luca Benini• ETHZ / Università di Bologna• Open source HW since 2013• Leaders in RISC-V

development• Ariane dev: Florian Zaruba,

Michael Schaffner and others

2

Page 3: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Support

This material is based on research sponsored by the NSF under Grants No. CNS-1823222, CCF-1217553, CCF1453112, CCF-1823032, and CCF-1438980, AFOSR under Grant No. FA9550-14-1-0148, Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement No. FA8650-18-2-7846, FA8650-18-2-7852, and FA8650-18-2-7862 and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those ofthe authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed orimplied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA), the NSF, AFOSR, or the U.S. Government.

3

Page 4: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Project Overview• Collaboration between Princeton University and ETH Zurich• Goal is to develop a permissively licensed, Linux capable

manycore research platform based on RISC-V• Based on mature, extensible designs• Booted SMP Linux in <6 months• The world's first open-source, SMP Linux-booting, RISC-V manycore

• Ariane• RV64GC Core (with extensions)• Linux capable

• OpenPiton• Manycore research platform• Distributed cache coherence and NoC

4

Page 5: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Ariane RV64GC Core• Application class processor• Written in SystemVerilog

• Linux Capable• Tightly integrated D$ and I$• M, S and U privilege modes• TLB, SV39• Hardware PTW

• Optimized for performance• Frequency: 1.5 GHz (22 FDX)• Area: ~ 175 kGE• Critical path: ~ 25 logic levels

• Scoreboarding• 6-stage pipeline• In-order (single) issue• Out-of-order write-back• In-order commit

• Designed for extensibility• Branch-prediction• Return Address Stack (RAS)• Branch Target Buffer (BTB)• Branch History Table (BHT)

5

Page 6: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Ariane

6

Page 7: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Silicon Proven Designs: Ariane

7

Issue

QUENTIN KERBIN

HYPERDRIVE

Poseidon layoutAriane

Kosmodrom layout

Ariane LPAriane HP

L2

NTX

• Ariane taped-out in GlobalFoundries 22nm FDX twice• 16kB instruction and 32kB data caches• Poseidon:

• Area: 0.23 mm2 - 175 kGE• 0.2 - 1.7 GHz (0.5 V - 1.15 V)

• Kosmodrom:• RV64GCXsmallFloat, Transprecision / Vector FPU• Ariane HP

• 8T library, 0.8V, 1.3 GHz• 55 mW @ 1 GHz

• Ariane LP• 7.5T ULP library, 0.5V, 250 MHz• 5 mW @ 200 MHz

Page 8: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

OpenPiton• Open source manycore• Written in Verilog RTL• P-Mesh coherence scales to ½ billion cores• Configurable core, uncore• Simulation in VCS, ModelSim, Incisive, Verilator, Icarus• Includes synthesis and back-end flow• ASIC & FPGA verified• ASIC power and energy fully characterized [HPCA 2018]• Runs full stack multi-user Debian Linux• Used for Architecture, Programming Language,

Compilers, Operating Systems, Security, EDA research

8

Tile

Chip

chipset

Page 9: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

OpenPiton Tile

9

To Other Tiles

L2 Cache Slice+

Directory Cache

P-MeshRouters

(3)

L1.5 Cache

CCX Arbiter

FPU

Modified OpenSPARC T1

Core

MITTS(Traffic Shaper)

Page 10: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

10

Tile

Page 11: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

11

Page 12: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

12

Chip

Page 13: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

13

P-Mesh Off-Chip Routers (3)

Chip Bridge

P-Mesh Chipset Crossbars (3)

Chip Chipset

Page 14: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

14

P-Mesh Off-Chip Routers (3)

Chip Bridge

P-Mesh Chipset Crossbars (3)

DRAM

Chip Chipset

Page 15: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

15

P-Mesh Off-Chip Routers (3)

Chip Bridge

P-Mesh Chipset Crossbars (3)

DRAM WishboneSDHC

AXII/O

Chip Chipset

Page 16: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

System Overview

16

P-Mesh Off-Chip Routers (3)

Chip Bridge

P-Mesh Chipset Crossbars (3)

DRAM WishboneSDHC

AXII/O

Chip Chipset

Page 17: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Silicon Proven Designs: Piton• 25-core• 2 Threads per core• Modified 64 bit OpenSPARC T1 Core

• 3 P-Mesh NoCs• 64 bit, 2D Mesh• Extend off-chip enabling multichip systems

• P-Mesh Directory-Based Cache System• 64kB L2 Cache per core (Shared)• 8kB L1.5 & L1 Data Caches• 16kB L1 Instruction Cache

• IBM 32nm SOI Process• 6mm x 6mm• 460 Million Transistors - Among largest chips built in academia

• Target: 1 GHz Clock @ 900 mV• Received silicon and runs full-stack Debian in lab

17

Page 18: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

OpenPiton+Ariane

18

Page 19: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

OpenPiton+Ariane Cache Modifications

• New write-through cache subsystem with invalidations and the TRI interface• LR/SC in L1.5 cache• Fetch-and-op in L2

cache

19

Page 20: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

OpenPiton+Ariane Platform Support

• Bootrom auto-generated with device tree from configuration• RISC-V Debug• OpenOCD + GDB• Bootloading

• CLINT• PLIC• lowRISC rv_plic

20

Page 21: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

21

Component Configurability Options

Cores (per chip) Up to 65,536

Cores (per system) Up to 500 million

Core Type OpenSPARC T1 Ariane 64 bit RISC-V

Threads per Core 1/2/4 1

Floating-Point Unit FP64, FP32 FP64, FP32, FP16, FP8, BFLOAT16

TLBs 8/16/32/64 entries Number of entries (16 entries)

L1 I-Cache Number of Sets, Ways (16kB, 4-way)

L1 D-Cache Number of Sets, Ways (8kB, 4-way)

L1.5 Cache Number of Sets, Ways (8kB, 4-way)

L2 Cache Number of Sets, Ways (64kB, 4-way)

Intra-chip Topologies 2D Mesh, Crossbar

Inter-chip Topologies 2D Mesh, 3D Mesh, Crossbar, Butterfly Network

Bootloading SD/SDHC Card, UART, RISC-V JTAG Debug

Configurability Options

Page 22: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

FPGA Prototyping PlatformsAvailable:• Digilent Genesys2• $999 ($600 academic)• 1-2 cores at 66MHz• Xilinx VC707• $3500• 1-4 cores at 60MHz• Digilent Nexys Video• $500 ($250 academic)• 1 core at 30MHz

In progress:• Xilinx VCU118, BittWare XUPP3R• $7000-8000• >100MHz• Amazon AWS F1• Rent by the hour• 1-N cores• Live demo at tomorrow's tutorial!

22

Page 23: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Roadmap

• Testing:• Memory consistency testing with

litmus/herd/diy• Randomised testing with riscv-

torture, RISC-V DV

• Bootloading:• OpenSBI• U-Boot/Coreboot/...

• Debian/Fedora Linux distro

• Performance enhancements• Multi-level TLBs• Branch Prediction improvements*• Increasing TRI/L1.5 line size• Multi-issue Ariane*

• Open backend flow for Ariane• Tapeouts!

* GSoC project with FOSSi Foundation

23

Page 24: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Boot SMP Linux Today!• Clone from:• https://github.com/PrincetonUniversity/openpiton• Simulation with Modelsim, VCS, Verilator• FPGA implementation with Vivado 2018.2 or newer

• RV64GC Demo• 2 cores on Genesys2 at 66MHz• Play Tetris, browse the web!

• Tutorial tomorrow afternoon! (in this room)• Hands-on with Verilator simulation• Boot SMP Linux on FPGA• http://openpiton.org/ISCA19_tutorial.html

24

Page 25: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

25

QUESTIONS?

@OpenPitonhttp://openpiton.org

@pulp_platformhttp://pulp-platform.org

Page 26: + Ariane The First Open-Source SMP Linux-booting RISC-V … · •The world's first open-source, SMP Linux-booting, RISC-V manycore •Ariane •RV64GC Core (with extensions) •Linux

Balkind and Scha�ner, et al.

Table 3: Some of the supported FPGA build con�gurations. Both cores have the same default cache con�guration (see Table 1).The results have been generated with Vivado 2018.2, using OpenPiton r11 / Ariane v4.1 including additional developmentpatches that will be part of upcoming releases.

Board Name / Clock Con�g Core FPU LUTs Registers RAM Tiles DSPsFPGA Type [MHz] X ⇥ Y Type [y/n] [k] [k] [#] [#]Digilent NexysVideoArtix 77a200tsbg484

30 1 ⇥ 1 Ariane no 95 (71%) 72 (27%) 66 (18%) 16 (2%)30 1 ⇥ 1 Ariane yes 110 (82%) 75 (28%) 66 (18%) 27 (4%)30 1 ⇥ 1 OpenSPARCT1 yes 115 (86%) 96 (36%) 59 (16%) 13 (2%)

Digilent Genesys2Kintex 77k325t�g900-2

67 1 ⇥ 1 Ariane no 86 (42%) 72 (17%) 66 (15%) 16 (2%)67 1 ⇥ 1 Ariane yes 99 (49%) 75 (18%) 66 (15%) 27 (3%)67 1 ⇥ 1 OpenSPARCT1 yes 105 (52%) 91 (22%) 59 (13%) 16 (2%)67 2 ⇥ 1 Ariane no 141 (69%) 113 (28%) 124 (28%) 16 (4%)67 2 ⇥ 1 Ariane yes 167 (82%) 120 (30%) 124 (28%) 54 (6%)67 2 ⇥ 1 OpenSPARCT1† yes 160 (79%) 137 (33%) 112 (25%) 32 (4%)

Xilinx VC707Virtex 77vx485t�g1761-2

60 1 ⇥ 1 Ariane no 99 (33%) 73 (12%) 63 (6%) 16 (<1%)60 1 ⇥ 1 Ariane yes 114 (37%) 77 (13%) 63 (6%) 27 (1%)60 1 ⇥ 1 OpenSPARCT1 yes 119 (39%) 97 (16%) 53 (5%) 16 (<1%)60 2 ⇥ 2 Ariane no 284.1 (94%) 202 (33%) 237 (23%) 64 (2%)60 3 ⇥ 1 Ariane yes 268 (88%) 169 (28%) 179 (17%) 81 (3%)60 3 ⇥ 1 OpenSPARCT1† yes 255 (84%) 208 (34%) 158 (15%) 48 (2%)

Xilinx VCU118Virtex US+xcvu9p�ga2104-2L

100 1 ⇥ 1 Ariane no 90 (8%) 81 (3%) 88 (4%) 19 (<1%)100 1 ⇥ 1 Ariane yes 103 (9%) 84 (4%) 89 (4%) 30 (<1%)100 1 ⇥ 1 OpenSPARCT1 yes 108 (9%) 100 (4%) 79 (4%) 19 (<1%)100 4 ⇥ 4 Ariane no 923 (78%) 704 (30%) 963 (45%) 259 (4%)100 4 ⇥ 2 Ariane yes 583 (49%) 399 (17%) 495 (23%) 219 (3%)

†Without Coherence Domain Restriction [8] in caches.

• CLINT: The Core Local Interrupt Controller (CLINT) pro-vides Inter Processor Interrupts (IPI) and a common time-base. Each core has its own timer compare register whichtriggers an external timer interrupt when it matches theglobal time-base.

• PLIC: The Platform Level Interrupt Controller (PLIC) is ainterrupt controller whichmanages external peripheral inter-rupts. It provides a context for each privilege level and core.The software can con�gure di�erent priority thresholds foreach context. The PLIC is still subject to o�cial standardisa-tion. However, there is already an implementation includinga Linux driver, which is agreed upon.

2.5 Automatic Device Tree GenerationIn order to capture the di�erent platform con�gurations that Open-Piton+Ariane provides, we added an automatic device tree genera-tion script to the PyHP preprocessor from OpenPiton. This scriptparses an XML description of the system address map and platformperipherals (which is also used to generate the chipset crossbar),and together with the information about the number of cores andthe clock frequency it generates a device tree that is compiledinto a bootrom attached to the peripheral space. The "zero-stage"bootloader stored in that bootrom initialises the cores and loadsa pointer to the device tree blob into register a1 as per RISC-Vconvention. With this automatic device tree generation, the sameLinux image can be booted on di�erently parameterised instances,automatically adapting to the platform at runtime.

3 SIMULATION & EMULATION PLATFORMSAriane plugs into the sims simulation infrastructure provided inOpenPiton. This handles the building of simulation models witheach of the supported simulators (at present, Mentor QuestaSim,Synopsys VCS and Verilator), as well as running one test or anentire test suite against the compiled model. We have enhancedsims to support compilation of RISC-V assembly and C tests, andthe direct use of pre-compiled binaries. The primary bare-metal testsuite is the publicly available riscv-tests repository [20]. Beyondbare-metal testing, we also simulate Linux boot for debugging,which takes approximately 4 days to boot for a single core (DRAMreduced to 128MB to speed up the memory initialisation phase insimulation).

3.1 FPGA FlowsThe Ariane core option has been integrated into the OpenPitonprotosyn build �ow and is available for the Digilent Nexys Videoand Genesys2 boards, as well as the Xilinx VC707 and VCU118development boards. The resource consumption of a set of buildswith the standard cache con�guration and di�erent numbers ofcores is shown in Table 3. Since the Ariane FPU pipeline registershave not been optimised for FPGA mapping, enabling the FPU willresult in a somewhat lower core clock frequency. The LUT distribu-tion for single-core Genesys2 builds is shown in Figure 3. The coreamounts to around 22%-41% of the total resources, depending onthe actual con�guration (Ariane with or without FPU, OpenSPARCT1 with FPU). Further, we note that the T1 is around 23% and 93%larger than Ariane with and without FPU, respectively. This area

26