![Page 1: “Flash and Chips” The FireBox Warehouse-Scale …The FireBox Warehouse-Scale Computer Krste Asanović, David Patterson, Randy Katz, ] + FireBox group and ASPIRE Lab krste@berkeley.edu](https://reader030.vdocument.in/reader030/viewer/2022041103/5f038b597e708231d409940a/html5/thumbnails/1.jpg)
“Flash and Chips” The FireBox Warehouse-Scale Computer
Krste Asanović, David Patterson, Randy Katz + FireBox group and ASPIRE Lab
krste@berkeley.edu
Fujitsu Labs, America Technology Symposium
June 24, 2015

UC Berkeley
Three WSC Generations
1. ~2000: Commercial Off-The-Shelf (COTS) computers, switches, & racks
2. ~2010: Custom computers, switches, & racks, but built from COTS chips; drive new board-level standards (“Open Compute”)
3. ~2020: Custom computers, switches, & racks using custom WSC chips; drive new chip-scale open standards (“RISC-V”)

Moore’s Law: Old vs. New Conventional Wisdom
Old CW: Moore’s Law; with each 18-month technology generation, transistor performance/energy improves and cost/transistor decreases
New CW: generations slowing to 3 years -> 5+ years, transistor performance/energy improves only slightly, cost/transistor increases!
2015: Moore’s Law has ended for logic, SRAM, & DRAM (Maybe 3D Flash & new NVM continues?)

Why WSC Custom Chips in 2020?
Without transistor scaling, improvements in system capability have to come from above the transistor level
- More specialized hardware
Good news: when scaling stops, custom chip costs drop
- Amortize investments in capital equipment, CAD tools, libraries, training, … over a decade vs. 18 months
WSCs proliferate @ $100M/WSC
- Economically sound to divert some $ if it yields more cost-performance-energy effective chips
New HW description languages supporting parameterized generators improve productivity and reduce design cost
- E.g., Stanford Genesis2; Berkeley’s Chisel, based on Scala
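To illustrate what a parameterized generator buys, here is a minimal sketch in plain Python (not Chisel or Genesis2). The names `SoCParams` and `generate_soc` and every parameter value are hypothetical, chosen only for illustration. The point is that one generator function produces many design points from different parameter sets, rather than one hand-written design per variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoCParams:
    """Hypothetical architectural parameters for one SoC variant."""
    n_cores: int = 4
    vector_lanes: int = 8
    accelerators: tuple = ()  # names of optional per-SoC blocks


def generate_soc(params: SoCParams) -> list:
    """Return a flat list of block instance names for the given parameters.

    A real hardware generator would emit RTL; this sketch only emits names
    to show how a single description covers many design points.
    """
    if params.n_cores < 1:
        raise ValueError("need at least one core")
    blocks = [
        f"core{i}(lanes={params.vector_lanes})" for i in range(params.n_cores)
    ]
    blocks.extend(f"accel:{name}" for name in params.accelerators)
    return blocks


if __name__ == "__main__":
    # Same generator, two variants: a plain one and one with an accelerator.
    print(generate_soc(SoCParams(n_cores=2)))
    print(generate_soc(SoCParams(n_cores=2, accelerators=("crypto",))))
```

Changing a design point is then a one-line parameter change, which is the productivity argument the slide makes for generator-based hardware description.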

Which ISA: x86 or ARM
Much debate about relative merits of these 2 ISAs in 2nd/3rd generation WSC design
ISA doesn’t matter; important differences are in:
- Quality of COTS implementations
  - microarchitecture quality
  - circuit design quality
  - foundry vs. Intel fab
- Business model
  - Can build custom chips around ARM IP, not Intel
Correct answer is RISC-V, a free and open standard ISA

www.riscv.org
New completely free, open ISA => shared open cores
- Already runs GCC, Linux, glibc, LLVM, QEMU, …
- RV32/RV64/RV128 variants for 32b/64b/128b address
Designed for SoCs: allow per-SoC customization yet maintain software compatibility
- Base ISA that will never change, with only 40 integer instructions, and supports compiler, linker, OS, …
- Optional extensions provide full general-purpose ISA, including IEEE 754-2008 floating-point
- “Secret sauce” accelerator unique per SoC
11 64-bit chips at Berkeley so far (45nm, 28nm)
Multiple external groups building commercial parts to ship this year
RISC-V non-profit foundation to launch this summer

FireBox “Strawman” Processor Module
~64 identical vector/SIMD cores / SoC
- Simplify resource management, software model
- Provide redundancy for failure tolerance
- Enhanced vector unit with specialized engines
8-32 DRAM chips on interposer for high BW
- 32Gb chips give 32-128GB DRAM capacity/node
- 500GB/s+ DRAM bandwidth
Cache coherent on-chip, simplify OS for SoC
Fast messaging NIC/DMA near each core
Cache & BW partitioning for QoS support
Compress/Encrypt engine to reduce size yet keep data always encrypted outside node
“Secret sauce” for per-SoC accelerators
Via parameterized Chisel WSC-SoC generator
- Easy to add custom application accelerators, tune architectural parameters
[Figure: processor module block diagram. An SoC with multiple cores (each: CPU, Vectors ++, Private $/VLS, DMA, NIC), Shared $/VLS, Compress/Crypt, Secret Sauce, and a Switch, attached to HiBW DRAM.]
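The per-node DRAM capacity range quoted for the strawman module follows directly from the chip count and chip density. A short check in Python, using only the figures stated on the slide; the function names are illustrative, and the per-chip bandwidth helper assumes bandwidth is spread evenly across chips, which the slide does not state:

```python
GBIT_PER_CHIP = 32        # 32Gb DRAM chips (from the slide)
BITS_PER_BYTE = 8
NODE_BW_GBYTES_S = 500    # "500GB/s+" lower bound (from the slide)


def node_dram_capacity_gbytes(n_chips: int) -> float:
    """Capacity per node in GB for n_chips DRAM chips of 32Gb each."""
    return n_chips * GBIT_PER_CHIP / BITS_PER_BYTE


def per_chip_bandwidth_gbytes_s(n_chips: int) -> float:
    """Bandwidth each chip must supply, ASSUMING an even split across chips."""
    return NODE_BW_GBYTES_S / n_chips


if __name__ == "__main__":
    for n_chips in (8, 32):
        print(
            f"{n_chips} chips: {node_dram_capacity_gbytes(n_chips)} GB, "
            f"{per_chip_bandwidth_gbytes_s(n_chips)} GB/s per chip"
        )
```

With 8 chips this gives 32 GB and with 32 chips 128 GB, matching the 32-128GB range on the slide.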

FireBox Rack
[Figure: FireBox rack diagram showing three module types connected through switch chips. Processor Module: SoC with multiple cores (each: CPU, Vectors ++, Private $/VLS, DMA, NIC), Shared $/VLS, Crypt/Compress, Secret Sauce, and a Switch, attached to HiBW DRAM. DRAM Module: Switch Chip and Bulk DRAM Control with an array of DRAM chips. Flash Module: Switch Chip and Flash Control with an array of Flash chips.]
Up to ~1000 modules of any kind: Processor, DRAM, Flash
Many Pb/s photonic network
Massive Redundancy for Dependability

FireBox Cluster
Configure P/M/F/N/D mix for workload & budget
Each FireBox has up to ~1000 sockets
- Each box can contain any mix of P/M/F/N/D modules, not just one kind
- Switches located inside boxes
20 - 40 FireBox Racks / Cluster, Cluster is app target
Network box connects WSC to standard networks (e.g. 400GbE)
Disk boxes hold local disk arrays
[Figure: FireBox cluster diagram, a grid of boxes connected to Ethernet. Legend: P = Processor-Heavy Box, M = DRAM-Heavy Box, F = Flash-Heavy Box, N = Network-Heavy Box, D = Disk-Heavy Box.]
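One way to picture "configure the P/M/F/N/D mix" is as a per-box table of module counts checked against the socket and cluster limits given on the slide. The following Python sketch is only an illustration: the function names and the example module counts are hypothetical, while the limits (~1000 sockets per box, 20 to 40 boxes per cluster) come from the slide.

```python
from collections import Counter

MODULE_KINDS = {"P", "M", "F", "N", "D"}  # Processor, DRAM, Flash, Network, Disk
MAX_SOCKETS_PER_BOX = 1000                # "up to ~1000 sockets"
MIN_BOXES, MAX_BOXES = 20, 40             # "20 - 40 FireBox Racks / Cluster"


def validate_box(mix: dict) -> int:
    """Check one box's module mix and return the number of sockets used."""
    unknown = set(mix) - MODULE_KINDS
    if unknown:
        raise ValueError(f"unknown module kinds: {sorted(unknown)}")
    used = sum(mix.values())
    if used > MAX_SOCKETS_PER_BOX:
        raise ValueError(f"{used} modules exceed {MAX_SOCKETS_PER_BOX} sockets")
    return used


def cluster_totals(boxes: list) -> Counter:
    """Sum module counts over all boxes in a cluster after validating each."""
    if not MIN_BOXES <= len(boxes) <= MAX_BOXES:
        raise ValueError(f"cluster needs {MIN_BOXES}-{MAX_BOXES} boxes")
    totals = Counter()
    for mix in boxes:
        validate_box(mix)
        totals.update(mix)
    return totals


if __name__ == "__main__":
    # Hypothetical mix: 10 processor-heavy and 10 flash-heavy boxes.
    processor_heavy = {"P": 800, "M": 200}
    flash_heavy = {"F": 900, "P": 100}
    print(cluster_totals([processor_heavy] * 10 + [flash_heavy] * 10))
```

Because every box accepts any mix of module kinds, rebalancing a cluster for a different workload is a change to these counts rather than to the box design.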

Warehouse of FireBox Clusters (FBC)
≈50 kW/FireBox => ≈1 - 2 MW/Cluster => 10 - 20 Clusters/WSC
Ethernet datacenter router to connect FBCs to each other and to Internet
[Figure: a warehouse of FBCs, all connected through Ethernet, which connects to the Internet.]
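The cluster power figure can be checked from the per-box power and the 20 to 40 boxes per cluster given earlier. The Python sketch below uses only numbers from the slides; the function names are illustrative. The total WSC power is not stated in the slides, so the last two lines only show one reading that is consistent with the "10 - 20 Clusters/WSC" figure.

```python
KW_PER_FIREBOX = 50   # ≈50 kW/FireBox (from the slide)


def cluster_power_mw(n_boxes: int) -> float:
    """Power of one cluster in MW for n_boxes FireBoxes."""
    return n_boxes * KW_PER_FIREBOX / 1000


def wsc_power_mw(n_clusters: int, n_boxes_per_cluster: int) -> float:
    """Power of a whole WSC in MW; the slides do not state this total."""
    return n_clusters * cluster_power_mw(n_boxes_per_cluster)


if __name__ == "__main__":
    print(cluster_power_mw(20))   # small cluster, 20 boxes
    print(cluster_power_mw(40))   # large cluster, 40 boxes
    # One consistent reading: a fixed budget filled by many small
    # clusters or by fewer large ones.
    print(wsc_power_mw(20, 20))
    print(wsc_power_mw(10, 40))
```

Twenty boxes give 1 MW and forty give 2 MW, matching the ≈1 - 2 MW/Cluster range; 20 small clusters and 10 large clusters both come to 20 MW under this reading.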

Acknowledgements
Help from ASPIRE outsiders:
- Michaela Blott, Xilinx
- Bob Brennan, Samsung
- Eric Brewer, UCB/Google
- Zhangxi Tan, UCB/Pure
- Amin Vahdat, Google
- Kees Vissers, Xilinx
ASPIRE Sponsors:
- Founding: DARPA, SRC C-FAR, Intel
- Affiliates: Google, HP, Huawei, LGE, NVIDIA, Oracle, Samsung
Help from ASPIRE insiders:
- Elad Alon
- Scott Beamer
- Joao Carriera
- Pi-Feng Chiu
- Adam Izraelevitz
- Kurt Keutzer
- Forrest Iandola
- Yunsup Lee
- Eric Love
- Martin Maas
- Bora Nikolic
- Frank Nothaft
- David Patterson
- Nathan Pemberton
- Colin Schmidt
- Vladimir Stojanovic
- Chen Sun
- Andrew Waterman

Summary
Solid-state datacenter is already here; now need to architect chips/systems to take advantage
Custom chips are coming back
- Vertical integration makes sense at warehouse-scale
- Open-source software developed for open hardware
- 99.9% of software identical, only change the secret sauce
- Chip NRE costs will drop (or we’re all doomed)
Specialized coprocessors, not specialized cores
- Ease of software development
- Avoid latency & energy costs to move data out of caches
- WSC management prefers arrays of homogeneous cores