Algorithms and Specializers for Provably Optimal Implementations with Resiliency and Efficiency
Elad Alon, Krste Asanovic (Director), Jonathan Bachrach, Jim Demmel, Armando Fox, Kurt Keutzer, Borivoje Nikolic, David Patterson, Koushik Sen, John Wawrzynek
[email protected] | http://aspire.eecs.berkeley.edu
UC Berkeley: Future Application Drivers
Compute Energy “Iron Law”
§ When power is constrained, need better energy efficiency for more performance
§ Where performance is constrained (real-time), want better energy efficiency to lower power
Improving energy efficiency is a critical goal for all future systems and workloads
Performance (Tasks/Second) = Power (Joules/Second) × Energy Efficiency (Tasks/Joule)
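The identity is trivial but worth making concrete; a minimal sketch (all numbers illustrative):

```python
def performance(power_watts, efficiency_tasks_per_joule):
    """Iron law: Tasks/Second = (Joules/Second) * (Tasks/Joule)."""
    return power_watts * efficiency_tasks_per_joule

def power_required(perf_target, efficiency):
    """For a fixed real-time performance target, power needed falls with efficiency."""
    return perf_target / efficiency

# Power-constrained: at a fixed 2 W budget, doubling efficiency doubles performance.
assert performance(2.0, 50.0) == 100.0
assert performance(2.0, 100.0) == 200.0

# Performance-constrained: for a fixed 100 tasks/s target,
# doubling efficiency halves the power required.
assert power_required(100.0, 50.0) == 2.0
assert power_required(100.0, 100.0) == 1.0
```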
Good News: Moore’s Law Continues
“Cramming more components onto integrated circuits”, Gordon E. Moore, Electronics, 1965
Bad News: Dennard (Voltage) Scaling Is Over
Distribution A – Approved for Public Release; Distribution Unlimited
Why did we hit a power/cooling wall?
• Voltage scaling slowed drastically, asymptotically approaching the threshold voltage
• Dynamic power: P = Ng × Cload × f × V², where Ng = CMOS gates/unit area, Cload = capacitive load/CMOS gate, f = clock frequency, V = supply voltage
• The good old days of Dennard scaling: V scaled down with feature size, so power density stayed constant
• Today, now that Dennard scaling is dead: V is roughly fixed, so power density rises with transistor density; energy scaling effectively ended around 2005
[Data courtesy S. Borkar/Intel 2011; Moore, ISSCC Keynote, 2003]
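The wall follows directly from the dynamic power relation P = Ng · Cload · f · V². A first-order sketch, using the classical textbook scaling rules (kappa is the per-generation shrink factor; all values illustrative, not measured):

```python
# One "generation" shrinks linear dimensions by kappa (classically ~1.4x):
# gate density Ng rises by kappa^2, Cload falls by kappa, f rises by kappa.
# Under Dennard scaling V also falls by kappa; post-Dennard, V is stuck.
def power_density(generations, kappa=1.4, dennard=True):
    ng, cload, f, v = 1.0, 1.0, 1.0, 1.0
    for _ in range(generations):
        ng *= kappa ** 2
        cload /= kappa
        f *= kappa
        if dennard:
            v /= kappa  # voltage scaled with feature size
    return ng * cload * f * v ** 2  # P = Ng * Cload * f * V^2 (normalized)

# Under Dennard scaling, power density stays constant...
assert abs(power_density(5, dennard=True) - 1.0) < 1e-9
# ...but with V fixed it grows by kappa^2 per generation: the power/cooling wall.
assert power_density(5, dennard=False) > 28  # 1.4^10 ≈ 28.9
```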
1st Impact of the End of Scaling: End of the Sequential Processor Era
Parallelism: A One-Time Gain
Use more, slower cores for better energy efficiency. Either
§ simpler cores, or
§ run cores at lower Vdd/frequency
§ Even simpler general-purpose microarchitectures? - Limited by the smallest sensible core
§ Even lower Vdd/frequency? - Limited by Vdd/Vt scaling, errors
§ Now what?
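Why more, slower cores win (once) can be seen in a first-order CMOS model, P ∝ C·V²·f with f roughly proportional to V; the 70% operating point below is illustrative only:

```python
def throughput(cores, freq):
    """Aggregate work rate: cores times clock rate (normalized)."""
    return cores * freq

def power(cores, volt, freq, cap=1.0):
    """First-order dynamic power: cores * C * V^2 * f (normalized)."""
    return cores * cap * volt ** 2 * freq

# Baseline: 1 core at nominal voltage and frequency.
base_perf, base_power = throughput(1, 1.0), power(1, 1.0, 1.0)

# Two cores at 70% voltage/frequency: more throughput at less power...
scaled_perf = throughput(2, 0.7)      # 1.4
scaled_power = power(2, 0.7, 0.7)     # 2 * 0.49 * 0.7 = 0.686
assert scaled_perf > base_perf
assert scaled_power < base_power
# ...i.e. ~2x the energy efficiency. But Vdd can't keep dropping toward Vt,
# so the trick only works a limited number of times.
assert scaled_perf / scaled_power > 2.0   # 1.4 / 0.686 ≈ 2.04
```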
2nd Impact of the End of Scaling: “Dark Silicon”, cannot switch all transistors at full frequency! [Muller, ARM CTO, 2009]
No savior device technology is on the horizon. Future energy-efficiency innovations must come from above the transistor level.
The End of General-Purpose Processors?
§ Most computing happens in specialized, heterogeneous processors
 - Can be 100-1000x more efficient than a general-purpose processor
§ Challenges:
 - Hardware design costs
 - Software development costs
[Die photo: NVIDIA Tegra 2]
The Real Scaling Challenge: Communication
As transistors become smaller and cheaper, communication dominates performance and energy.
At all scales:
§ Across chip
§ Up and down the memory hierarchy
§ Chip-to-chip
§ Board-to-board
§ Rack-to-rack
ASPIRE: From Better to Best
§ What is the best we can do?
 - For a fixed target technology (e.g., 7nm)
§ Can we prove a bound?
§ Can we design an implementation approaching the bound?
è Provably Optimal Implementations
Specialize and optimize communication and computation across the whole stack, from applications to hardware
Communication-Avoiding Algorithms: Algorithm Cost Measures
1. Arithmetic (FLOPs)
2. Communication: moving data between
 - levels of a memory hierarchy (sequential case)
 - processors over a network (parallel case)
[Diagrams: sequential case, CPU + cache + DRAM; parallel case, CPUs with local DRAM connected over a network]
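The two cost measures are commonly combined into a linear runtime model, T = γ·F + β·W + α·S (flops, words moved, messages). A sketch with made-up machine constants:

```python
def runtime(flops, words, messages, gamma, beta, alpha):
    """Linear cost model: time per flop, per word moved, per message."""
    return gamma * flops + beta * words + alpha * messages

# Illustrative (made-up) machine: a flop costs 1 ns, moving a word 100x more,
# and message latency 10000x more.
gamma, beta, alpha = 1e-9, 1e-7, 1e-5

# Same flop count, very different runtimes depending on communication:
naive   = runtime(2e9, 1e9, 1e3, gamma, beta, alpha)   # communication-heavy
blocked = runtime(2e9, 1e7, 1e5, gamma, beta, alpha)   # moves 100x fewer words
assert naive > 100.0        # dominated by the beta * words term
assert blocked < 5.0        # flops and communication now comparable
assert naive > 20 * blocked # avoiding communication, not flops, gave the win
```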
Modeling Runtime & Energy
A Few Examples of Speedups
§ Matrix multiplication
 - Up to 12x on IBM BG/P for n=8K on 64K cores; 95% less communication
§ QR decomposition (used in least squares, data mining, ...)
 - Up to 8x on 8-core dual-socket Intel Clovertown, for 10M x 10
 - Up to 6.7x on 16-proc. Pentium III cluster, for 100K x 200
 - Up to 13x on Tesla C2050 / Fermi, for 110K x 100
 - Up to 4x on a grid of 4 cities (Dongarra, Langou et al.)
 - “Infinite speedup” for out-of-core on a PowerPC laptop: LAPACK thrashed virtual memory and didn’t finish
§ Eigenvalues of band symmetric matrices
 - Up to 17x on Intel Gainestown (8 cores) vs. MKL 10.0 (up to 1.9x sequential)
§ Iterative sparse linear equation solvers (GMRES)
 - Up to 4.3x on Intel Clovertown (8 cores)
§ N-body (direct particle interactions with cutoff distance)
 - Up to 10x on Cray XT-4 (Hopper), 24K particles on 6K procs
Modeling Energy: Dynamic
Modeling Energy: Memory Retention
Modeling Energy: Background Power
Energy Lower Bounds
Early Result: Perfect Strong Scaling in Time and Energy
§ Every time you add a processor, use its memory M too
§ Start with the minimal number of procs: PM = 3n²
§ Increase P by factor c è total memory increases by factor c
§ Notation for the timing model:
 - γt, βt, αt = secs per flop, per word moved, per message of size m
 T(cP) = n³/(cP) · [γt + βt/M^(1/2) + αt/(m·M^(1/2))] = T(P)/c
§ Notation for the energy model:
 - γe, βe, αe = Joules for the same operations
 - δe = Joules per word of memory used per sec
 - εe = Joules per sec for leakage, etc.
 E(cP) = cP · { n³/(cP) · [γe + βe/M^(1/2) + αe/(m·M^(1/2))] + δe·M·T(cP) + εe·T(cP) } = E(P)
§ Perfect scaling extends to n-body, Strassen, ...
[IPDPS, 2013]
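A quick numerical check of the two identities above, with made-up values for the model constants (fixed per-processor memory M):

```python
# Made-up problem size and machine constants; only the algebra matters.
n, M, m = 1024, 64_000, 8
gt, bt, at = 1e-9, 1e-8, 1e-6   # time per flop / per word / per message
ge, be, ae = 1e-9, 1e-8, 1e-6   # energy per flop / per word / per message
de, ee = 1e-12, 1e-3            # J per word-sec of memory; leakage J/sec

def T(P):
    """T(P) = n^3/P * [gt + bt/sqrt(M) + at/(m*sqrt(M))]"""
    return n**3 / P * (gt + bt / M**0.5 + at / (m * M**0.5))

def E(P):
    """E(P) = P * { n^3/P * [ge + be/sqrt(M) + ae/(m*sqrt(M))] + de*M*T(P) + ee*T(P) }"""
    t = T(P)
    return P * (n**3 / P * (ge + be / M**0.5 + ae / (m * M**0.5))
                + de * M * t + ee * t)

P0 = 8
for c in (2, 4, 16):
    # Perfect strong scaling in time: c-fold more processors, c-fold less time.
    assert abs(T(c * P0) - T(P0) / c) / (T(P0) / c) < 1e-9
    # Perfect strong scaling in energy: total energy stays constant.
    assert abs(E(c * P0) - E(P0)) / E(P0) < 1e-9
```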
C-A Algorithms: Not Just for HPC
§ In ASPIRE, apply to other key application areas: machine vision, databases, speech recognition, software-defined radio, ...
§ Initial results on lower bounds for database join algorithms
From C-A Algorithms to Provably Optimal Systems?
§ 1) Prove lower bounds on communication for a computation
§ 2) Develop an algorithm that achieves the lower bound on a system
§ 3) Find that communication time/energy cost is >90% of the resulting implementation
§ 4) Then we know we’re within ~10% of optimal!
§ Supporting technique: optimizing the software stack and compute engines to reduce compute costs and expose unavoidable communication costs
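Step 3 to step 4 is simple arithmetic: if the communication part of the measured cost achieves a proven lower bound, then total = comm/frac ≤ optimal/frac, so the gap from optimal is at most 1/frac − 1. A sketch:

```python
def max_gap_from_optimal(comm_fraction):
    """Upper bound on (total - optimal)/optimal when the communication
    component is itself a proven lower bound on any implementation and
    makes up comm_fraction of the measured cost."""
    return 1.0 / comm_fraction - 1.0

# Communication >= 90% of measured cost => within ~11% of optimal
# (roughly the slide's "within 10%").
assert abs(max_gap_from_optimal(0.90) - 1.0 / 9.0) < 1e-12
# The closer communication is to 100% of cost, the tighter the claim:
assert max_gap_from_optimal(0.95) < 0.06
assert max_gap_from_optimal(0.99) < 0.02
```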
ESP: An Applications Processor Architecture for ASPIRE
§ Future server and mobile SoCs will have many fixed-function accelerators and a general-purpose programmable multicore
§ It is well known how to customize hardware engines for a specific task
§ The ESP challenge is using specialized engines for general-purpose code
[Die photos: Intel Ivy Bridge (22nm); Qualcomm Snapdragon MSM8960 (28nm)]
ESP: Ensembles of Specialized Processors
§ General-purpose hardware: flexible but inefficient
§ Fixed-function hardware: efficient but inflexible
§ Par Lab insight: patterns capture common operations across many applications, each with a unique communication & computation structure
§ Build an ensemble of specialized engines, each individually optimized for a particular pattern but collectively covering application needs
§ Bet: this will give us efficiency plus flexibility
 - Any given core can have a different mix of these engines depending on workload
Par Lab: Motifs Common Across Apps
[Diagram: applications (audio recognition, object recognition, scene analysis) mapped onto Berkeley View “dwarfs” or motifs (dense, graph, sparse, ...)]
Par Lab Apps: Motif (née “Dwarf”) Popularity (Red Hot / Blue Cool)
[Heat map of motif popularity across computing domains]
Architecting Parallel Software
Identify the software structure:
• Pipe-and-Filter
• Agent-and-Repository
• Event-based
• Bulk Synchronous
• Map-Reduce
• Layered Systems
• Model-View-Controller
• Arbitrary Task Graphs
• Puppeteer
Identify the key computations:
• Graph Algorithms
• Dynamic Programming
• Dense/Sparse Linear Algebra
• Un/Structured Grids
• Graphical Models
• Finite State Machines
• Backtrack / Branch-and-Bound
• N-Body Methods
• Circuits
• Spectral Methods
• Monte Carlo
Mapping Software to ESP: Specializers
§ Capture desired functionality at a high level using patterns in a productive high-level language
§ Use pattern-specific compilers (specializers) with autotuners to produce efficient low-level code
§ ASP specializer infrastructure, open-source download
[Diagram: applications (audio recognition, object recognition, scene analysis) decomposed by motif (dense, graph, sparse, ...) into dense/sparse/graph/glue/ESP code via specializers with SEJITS implementations and autotuning, mapped onto ESP core engines (ILP, dense, sparse, graph)]
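As an illustrative sketch only (not the actual ASP/SEJITS API), a specializer can be thought of as a dispatcher that autotunes over low-level variants of one pattern and caches the winning plan:

```python
import time

# Two hypothetical low-level variants of one pattern (dot product).
def dot_naive(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_builtin(a, b):
    return sum(x * y for x, y in zip(a, b))

class Specializer:
    """Times each variant once per input size and caches the fastest,
    the way a pattern-specific compiler + autotuner would cache a plan."""
    def __init__(self, variants):
        self.variants, self.plan = variants, {}

    def __call__(self, a, b):
        key = len(a)
        if key not in self.plan:
            timings = []
            for f in self.variants:
                t0 = time.perf_counter()
                f(a, b)
                timings.append((time.perf_counter() - t0, f))
            self.plan[key] = min(timings, key=lambda p: p[0])[1]
        return self.plan[key](a, b)

dot = Specializer([dot_naive, dot_builtin])
a = [1.0, 2.0, 3.0]
assert dot(a, a) == 14.0       # 1 + 4 + 9
assert len(dot.plan) == 1      # tuned variant cached for this input size
```

A real specializer would emit and compile low-level code rather than pick among Python functions, but the capture-tune-cache structure is the same.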
Replacing Fixed Accelerators with Programmable Fabric
§ Future server and mobile SoCs will have many fixed-function accelerators and a general-purpose programmable multicore
§ The fabric challenge is achieving extreme energy efficiency while retaining programmability
[Die photos: Intel Ivy Bridge (22nm); Qualcomm Snapdragon MSM8960 (28nm)]
Strawman Fabric Architecture
[Diagram: tiled fabric, a repeating array of M/A/R blocks]
§ Will never have a C compiler
§ Only programmed using pattern-based DSLs
§ More dynamic, less static than earlier approaches
§ Dynamic dataflow-driven execution
§ Dynamic routing
§ Large memory support
“Agile Hardware” Development
§ Current hardware design is slow and arduous
§ But we now have a huge design space to explore
§ How to examine many design points efficiently?
§ Build parameterized generators, not point designs!
§ Adopt and adapt best practices from Agile Software
 - Complete LVS/DRC-clean physical design of the current version every ~two weeks (“tape-in”)
 - Incremental feature addition
 - Test & verification as the first step
Chisel: Constructing Hardware In a Scala Embedded Language
§ Embed a hardware-description language in Scala, using Scala’s extension facilities
§ A hardware module is just a data structure in Scala
§ Different output routines can generate different types of output (C, FPGA-Verilog, ASIC-Verilog) from the same hardware representation
§ Full power of Scala for writing hardware generators
 - Object-oriented: factory objects, traits, overloading, etc.
 - Functional: higher-order functions, anonymous functions, currying
 - Compiles to the JVM: good performance, Java interoperability
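Chisel generators themselves are Scala; as a language-neutral sketch of the "generator, not point design" idea, here is a hypothetical Python function that emits Verilog for a parameterized pipeline-register chain (the module name and structure are illustrative, not Chisel output):

```python
def pipeline_verilog(name, width, depth):
    """Emit Verilog for a width-bit, depth-deep pipeline register chain.
    One generator covers a whole family of designs instead of one point."""
    lines = [f"module {name}(input clk, input [{width-1}:0] din,"
             f" output [{width-1}:0] dout);"]
    for i in range(depth):
        lines.append(f"  reg [{width-1}:0] stage{i};")
    lines.append("  always @(posedge clk) begin")
    prev = "din"
    for i in range(depth):
        lines.append(f"    stage{i} <= {prev};")
        prev = f"stage{i}"
    lines.append("  end")
    lines.append(f"  assign dout = {prev};")
    lines.append("endmodule")
    return "\n".join(lines)

# Any (width, depth) point in the design space comes from the same generator:
v = pipeline_verilog("pipe8x3", width=8, depth=3)
assert "reg [7:0] stage2;" in v
assert "assign dout = stage2;" in v
```

In Chisel the same idea is expressed with higher-order functions over hardware data structures rather than text emission, which is what makes generators composable and type-checked.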
Chisel Design Flow
[Flow diagram: a Chisel program, running on Scala/JVM, generates three outputs:
• C++ code → C++ compiler → software simulator
• FPGA Verilog → FPGA tools → FPGA emulation
• ASIC Verilog → ASIC tools → GDS layout]
Chisel Is Much More Than an HDL
§ The base Chisel system allows you to use the full power of Scala to describe the RTL of a design, then generate Verilog or C++ output from the RTL
§ But Chisel can be extended above with domain-specific languages (e.g., signal processing) for the fabric
§ Importantly, Chisel can also be extended below with new backends or to add new tools or features (e.g., quantum computing circuits)
§ Only ~6,000 lines of code in the current version, including libraries!
§ BSD-licensed open source at: chisel.eecs.berkeley.edu
Many processor tapeouts in a few years with a small group (45nm, 28nm)
[Annotated die photos: clock, SRAM, and DC-DC test sites; processor site with CORE0-CORE3 (VC0-VC3), 512KB L2, fixed-voltage (VFIXED) test sites]
Resilient Circuits & Modeling
§ Future scaled technologies have high variability, but we want to run with the lowest possible margins to save energy
§ Significant increase in soft errors; need resilient systems
§ Technology modeling to determine the tradeoff between MTBF and energy per task for logic, SRAM, & interconnect
Techniques that reduce operating voltage can be worse for energy due to the rapid rise in errors
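The point that reducing operating voltage can be worse for energy can be sketched with a made-up error model (all constants illustrative): energy per attempt falls as V², but the error rate rises sharply as V nears Vt and each error forces re-execution, so expected energy per completed task rises again near threshold:

```python
import math

def expected_energy(v, vt=0.3, k=20.0, e0=1.0):
    """Expected energy per completed task under retries:
    E(V) = e0 * V^2 / (1 - p_err(V)), with a toy error rate
    p_err(V) = exp(-k * (V - Vt)) that approaches 1 as V -> Vt."""
    p_err = math.exp(-k * (v - vt))
    return e0 * v * v / (1.0 - p_err)  # geometric number of attempts

# Moderate voltage scaling (1.0 V -> 0.6 V) still saves energy per task...
assert expected_energy(0.6) < expected_energy(1.0)
# ...but pushing close to threshold makes expected energy worse again,
# because retries dominate the per-attempt savings.
assert expected_energy(0.31) > expected_energy(0.6)
```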
Algorithms and Specializers for Provably Optimal Implementations with Resiliency and Efficiency
[Summary diagram of the full stack, software down to hardware:
• Applications: audio recognition, object recognition, scene analysis
• Computational and structural patterns: dense, graph, sparse, ...; pipe-and-filter, map-reduce, ...
• Communication-avoiding algorithms: C-A GEMM, C-A BFS, C-A SpMV
• Specializers with SEJITS implementations and autotuning, producing dense/sparse/graph/glue/ESP code
• ESP (Ensembles of Specialized Processors) architecture: ILP/dense/sparse/graph engines per ESP core, local stores + DMA, hardware cache coherence
• Deep HW/SW design-space exploration; hardware generators using the Chisel HDL
• Validation/verification: C++ simulation, FPGA emulation
• Implementation technologies: ASIC SoC, FPGA computer]
ASPIRE Project
§ Initial $15.6M / 5.5-year funding from the DARPA PERFECT program
 - Started 9/28/2012
 - Located in the Par Lab space + BWRC
§ Looking for industrial affiliates (see Krste!)
§ Open House today, 5th floor Soda Hall
Research funded by DARPA Award Number HR0011-12-2-0016. Approved for public release; distribution is unlimited. The content of this presentation does not necessarily reflect the position or the policy of the US government and no official endorsement should be inferred.