
Page 1:

Algorithms and Specializers for Provably Optimal Implementations with Resiliency and Efficiency

Elad Alon, Krste Asanovic (Director), Jonathan Bachrach, Jim Demmel, Armando Fox, Kurt Keutzer, Borivoje Nikolic, David Patterson, Koushik Sen, John Wawrzynek

[email protected]
http://aspire.eecs.berkeley.edu

Page 2:

UC Berkeley Future Application Drivers

Page 3:

UC Berkeley Compute Energy "Iron Law"

§ When power is constrained, we need better energy efficiency for more performance
§ Where performance is constrained (real-time), we want better energy efficiency to lower power

Improving energy efficiency is a critical goal for all future systems and workloads

Performance (Tasks/Second) = Power (Joules/Second) × Energy Efficiency (Tasks/Joule)
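To make the law concrete, a purely illustrative reading (the numbers are mine, not from the talk): at a fixed power budget, performance moves one-for-one with efficiency.

\[
\underbrace{2\times10^{9}\ \tfrac{\text{tasks}}{\text{s}}}_{\text{Performance}}
\;=\;
\underbrace{1\ \tfrac{\text{J}}{\text{s}}}_{\text{Power}}
\times
\underbrace{2\times10^{9}\ \tfrac{\text{tasks}}{\text{J}}}_{\text{Energy Efficiency}}
\]

Doubling tasks/Joule at the same 1 W budget doubles tasks/second; conversely, holding performance fixed, it halves the power drawn.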

Page 4:

UC Berkeley Good News: Moore's Law Continues

"Cramming more components onto integrated circuits", Gordon E. Moore, Electronics, 1965

Page 5:

UC Berkeley Bad News: Dennard (Voltage) Scaling Over

Distribution A – Approved for Public Release; Distribution Unlimited

• Voltage scaling has slowed drastically
• Supply voltage is asymptotically approaching the threshold voltage

Why did we hit a power/cooling wall? Compare the good old days of Dennard scaling with today, now that Dennard scaling is dead:

[Figure: dynamic power density, Ng · Cload · V² · f, under Dennard vs. post-Dennard scaling, where Ng = CMOS gates/unit area, Cload = capacitive load/CMOS gate, f = clock frequency, V = supply voltage. Data courtesy S. Borkar/Intel 2011.]

[Figure: energy scaling ended in 2005. Moore, ISSCC Keynote, 2003.]
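The figure's two regimes can be written out with those variables; the per-generation linear scaling factor k (≈1.4) is my notation, not the slide's:

\[
P_{\text{density}} \;=\; N_g\, C_{\text{load}}\, V^2 f
\]
\[
\text{Dennard:}\quad N_g \to k^2 N_g,\quad C_{\text{load}} \to C_{\text{load}}/k,\quad V \to V/k,\quad f \to k f
\;\;\Rightarrow\;\; P_{\text{density}} \text{ unchanged}
\]
\[
\text{Post-Dennard:}\quad V,\ f \ \text{roughly fixed}
\;\;\Rightarrow\;\; P_{\text{density}} \to k\, P_{\text{density}}
\]

Power density now grows every generation: hence the power/cooling wall, and the dark silicon of the coming slides.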

Page 6:

UC Berkeley 1st Impact of End of Scaling: End of Sequential Processor Era

Page 7:

UC Berkeley Parallelism: A one-time gain

Use more, slower cores for better energy efficiency. Either
§ simpler cores, or
§ run cores at lower Vdd/frequency

§ Even simpler general-purpose microarchitectures?
- Limited by the smallest sensible core
§ Even lower Vdd/frequency?
- Limited by Vdd/Vt scaling, errors
§ Now what?

Page 8:

UC Berkeley 2nd Impact of End of Scaling: "Dark Silicon"

Cannot switch all transistors at full frequency! [Muller, ARM CTO, 2009]

No savior device technology is on the horizon. Future energy-efficiency innovations must come from above the transistor level.

Page 9:

UC Berkeley The End of General-Purpose Processors?

§ Most computing happens in specialized, heterogeneous processors
- Can be 100-1000X more efficient than a general-purpose processor
§ Challenges:
- Hardware design costs
- Software development costs

[Figure: NVIDIA Tegra2.]

Page 10:

UC Berkeley The Real Scaling Challenge: Communication

As transistors become smaller and cheaper, communication dominates performance and energy

At all scales:
§ Across chip
§ Up and down the memory hierarchy
§ Chip-to-chip
§ Board-to-board
§ Rack-to-rack

Page 11:

UC Berkeley ASPIRE: From Better to Best

§ What is the best we can do?
- For a fixed target technology (e.g., 7nm)
§ Can we prove a bound?
§ Can we design an implementation approaching the bound?

⇒ Provably Optimal Implementations

Specialize and optimize communication and computation across the whole stack, from applications to hardware.

Page 12:

UC Berkeley Communication-Avoiding Algorithms: Algorithm Cost Measures

1. Arithmetic (flops)
2. Communication: moving data between
- levels of a memory hierarchy (sequential case)
- processors over a network (parallel case)
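These two measures combine into the linear cost model whose per-flop, per-word, and per-message constants γt, βt, αt are defined on slide 19 below; writing F for flops, W for words moved, and S for messages:

\[
T \;=\; \gamma_t F \;+\; \beta_t W \;+\; \alpha_t S
\]

Communication-avoiding algorithms attack the βt·W and αt·S terms, which increasingly dominate.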

[Figure: the two cases. Sequential: CPU, cache, DRAM. Parallel: several CPU+DRAM nodes connected by a network.]

Page 13:

UC Berkeley Modeling Runtime & Energy

Page 14:

UC Berkeley A few examples of speedups

§ Matrix multiplication
- Up to 12x on IBM BG/P for n=8K on 64K cores; 95% less communication
§ QR decomposition (used in least squares, data mining, …)
- Up to 8x on 8-core dual-socket Intel Clovertown, for 10M x 10
- Up to 6.7x on 16-proc. Pentium III cluster, for 100K x 200
- Up to 13x on Tesla C2050 / Fermi, for 110k x 100
- Up to 4x on a grid of 4 cities (Dongarra, Langou et al)
- "Infinite speedup" for out-of-core on a PowerPC laptop: LAPACK thrashed virtual memory and didn't finish
§ Eigenvalues of band symmetric matrices
- Up to 17x on Intel Gainestown, 8 cores, vs MKL 10.0 (up to 1.9x sequential)
§ Iterative sparse linear equation solvers (GMRES)
- Up to 4.3x on Intel Clovertown, 8 cores
§ N-body (direct particle interactions with cutoff distance)
- Up to 10x on Cray XT-4 (Hopper), 24K particles on 6K procs.

Page 15:

UC Berkeley Modeling Energy: Dynamic
(the γe, βe, αe terms of the model on slide 19: Joules per flop, per word moved, and per message)

Page 16:

UC Berkeley Modeling Energy: Memory Retention
(the δe term of the model on slide 19: Joules per word of memory used per second)

Page 17:

UC Berkeley Modeling Energy: Background Power
(the εe term of the model on slide 19: Joules per second for leakage, etc.)

Page 18:

UC Berkeley Energy Lower Bounds

Page 19:

UC Berkeley Early Result: Perfect Strong Scaling in Time and Energy

§ Every time you add a processor, use its memory M too
§ Start with the minimal number of processors: PM = 3n²
§ Increase P by a factor c ⇒ total memory increases by a factor c
§ Notation for the timing model:
- γt, βt, αt = secs per flop, per word moved, per message of size m

\[
T(cP) \;=\; \frac{n^3}{cP}\left[\gamma_t + \frac{\beta_t}{M^{1/2}} + \frac{\alpha_t}{m\,M^{1/2}}\right] \;=\; \frac{T(P)}{c}
\]

§ Notation for the energy model:
- γe, βe, αe = Joules for the same operations
- δe = Joules per word of memory used per second
- εe = Joules per second for leakage, etc.

\[
E(cP) \;=\; cP\left\{\frac{n^3}{cP}\left[\gamma_e + \frac{\beta_e}{M^{1/2}} + \frac{\alpha_e}{m\,M^{1/2}}\right] + \delta_e M\,T(cP) + \varepsilon_e T(cP)\right\} \;=\; E(P)
\]

§ Perfect scaling extends to n-body, Strassen, … (a numeric check of the model follows below)

[IPDPS,  2013]  
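To see that the algebra above really gives perfect strong scaling, here is a minimal, self-contained sketch that plugs illustrative constants (my numbers, not measured values) into the two formulas; T should drop by c while E stays flat:

```scala
object StrongScalingModel {
  // Illustrative machine constants (assumptions, not data from the talk):
  val (gt, bt, at) = (1e-9, 1e-8, 1e-6)   // secs per flop / word / message
  val (ge, be, ae) = (1e-10, 1e-9, 1e-7)  // Joules per flop / word / message
  val de = 1e-12                          // Joules per word of memory per sec
  val ee = 1.0                            // Joules per sec (leakage etc.)
  val m  = 1024.0                         // words per message

  val n  = 8192.0                         // matrix dimension
  val p0 = 64.0                           // minimal processor count: p0 * M = 3n^2
  val M  = 3 * n * n / p0                 // memory per processor stays fixed

  def T(p: Double): Double =
    n * n * n / p * (gt + bt / math.sqrt(M) + at / (m * math.sqrt(M)))

  def E(p: Double): Double =
    p * (n * n * n / p * (ge + be / math.sqrt(M) + ae / (m * math.sqrt(M)))
      + de * M * T(p) + ee * T(p))

  def main(args: Array[String]): Unit =
    for (c <- Seq(1.0, 4.0, 16.0))
      println(f"c = $c%4.0f: T = ${T(c * p0)}%.3e s, E = ${E(c * p0)}%.3e J")
}
```

Because M per processor is held fixed while P grows, the cP·(δe·M + εe)·T(cP) retention/leakage term is exactly what keeps E constant as T shrinks.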


Page 20:

UC Berkeley C-A Algorithms Not Just for HPC

§ In ASPIRE, apply to other key application areas: machine vision, databases, speech recognition, software-defined radio, …
§ Initial results on lower bounds for database join algorithms

Page 21:

UC Berkeley From C-A Algorithms to Provably Optimal Systems?

§ 1) Prove lower bounds on communication for a computation
§ 2) Develop an algorithm that achieves the lower bound on a system
§ 3) Find that communication time/energy cost is >90% of the resulting implementation
§ 4) Then we know we're within ~10% of optimal: any implementation must pay at least the communication lower bound, and that bound already accounts for >90% of our cost, so the optimum can be at most ~10% cheaper
§ Supporting technique: optimize the software stack and compute engines to reduce compute costs and expose the unavoidable communication costs

Page 22:

UC Berkeley ESP: An Applications Processor Architecture for ASPIRE

§ Future server and mobile SoCs will have many fixed-function accelerators and a general-purpose programmable multicore
§ It is well known how to customize hardware engines for a specific task
§ The ESP challenge is using specialized engines for general-purpose code

[Die photos: Intel Ivy Bridge (22nm); Qualcomm Snapdragon MSM8960 (28nm).]

Page 23:

UC Berkeley ESP: Ensembles of Specialized Processors

§ General-purpose hardware: flexible but inefficient
§ Fixed-function hardware: efficient but inflexible
§ Par Lab insight: Patterns capture common operations across many applications, each with a unique communication & computation structure
§ Build an ensemble of specialized engines, each individually optimized for a particular pattern but collectively covering application needs
§ Bet: this will give us efficiency plus flexibility
- Any given core can have a different mix of these engines depending on workload

Page 24:

UC Berkeley Par Lab: Motifs common across apps

[Figure: applications (audio recognition, object recognition, scene analysis) decompose onto Berkeley View "dwarfs" or motifs (dense, sparse, graph, …).]

Page 25:

UC Berkeley Par Lab Apps

[Figure: motif (née "dwarf") popularity across Par Lab apps and computing domains, colored red (hot) to blue (cool).]

Page 26:

UC Berkeley Architecting Parallel Software

Identify the software structure (structural patterns):
• Pipe-and-Filter
• Agent-and-Repository
• Event-based
• Bulk Synchronous
• Map-Reduce
• Layered Systems
• Arbitrary Task Graphs
• Puppeteer
• Model-View-Controller

Identify the key computations (computational patterns):
• Graph Algorithms
• Dynamic Programming
• Dense/Sparse Linear Algebra
• Un/Structured Grids
• Graphical Models
• Finite State Machines
• Backtrack Branch-and-Bound
• N-Body Methods
• Circuits
• Spectral Methods
• Monte Carlo

Page 27:

UC Berkeley Mapping Software to ESP: Specializers

§ Capture desired functionality at a high level, using patterns in a productive high-level language
§ Use pattern-specific compilers (specializers) with autotuners to produce efficient low-level code (a toy sketch follows below)
§ ASP specializer infrastructure, open-source download
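To make the specializer idea concrete, here is a toy sketch of the shape of the flow. This is my illustration, not the ASP framework itself (the real, Python-based infrastructure is far more capable); the names DenseMatMul, emitC, and specialize are all hypothetical.

```scala
// Toy specializer: pattern-specific code generation plus a trivial autotuner.
object ToySpecializer {
  // "High-level" description of one dense-pattern instance.
  final case class DenseMatMul(n: Int)

  // Code generation: emit low-level C for one candidate blocking factor.
  def emitC(spec: DenseMatMul, block: Int): String =
    s"""// generated: ${spec.n} x ${spec.n} matmul, ${block} x ${block} blocking
       |for (int ii = 0; ii < ${spec.n}; ii += ${block})
       |  for (int jj = 0; jj < ${spec.n}; jj += ${block})
       |    for (int kk = 0; kk < ${spec.n}; kk += ${block})
       |      /* ... tuned block kernel ... */;
       |""".stripMargin

  // Autotuning: generate candidate variants and keep the one that runs
  // fastest; `benchmark` stands in for compile-and-time-on-the-target.
  def specialize(spec: DenseMatMul)(benchmark: String => Double): String =
    Seq(8, 16, 32, 64).map(emitC(spec, _)).minBy(benchmark)
}
```

The productivity-language programmer only ever writes the DenseMatMul(n) level; the specializer owns the low-level code and the search over variants.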

[Figure: the specializer flow. Applications (audio recognition, object recognition, scene analysis) are expressed with Berkeley View "dwarfs" or motifs (dense, sparse, graph, …); specializers with SEJITS implementations and autotuning emit glue, dense, sparse, graph, and ESP code for the matching engines (ILP, dense, sparse, graph) of an ESP core.]

Page 28:

UC Berkeley Replacing Fixed Accelerators with Programmable Fabric

§ Future server and mobile SoCs will have many fixed-function accelerators and a general-purpose programmable multicore
§ The fabric challenge is retaining extreme energy efficiency while retaining programmability

[Die photos: Intel Ivy Bridge (22nm); Qualcomm Snapdragon MSM8960 (28nm).]

Page 29:

UC Berkeley Strawman Fabric Architecture

[Figure: a 2D array of fabric tiles, each pairing memory (M) with arithmetic (A) and attached to a router (R).]

§ Will never have a C compiler
§ Only programmed using pattern-based DSLs
§ More dynamic, less static than earlier approaches
§ Dynamic dataflow-driven execution
§ Dynamic routing
§ Large memory support

Page 30:

UC Berkeley "Agile Hardware" Development

§ Current hardware design is slow and arduous
§ But we now have a huge design space to explore
§ How to examine many design points efficiently?

§ Build parameterized generators, not point designs! (see the Chisel generator sketch on slide 31)
§ Adopt and adapt best practices from Agile Software:
- Complete an LVS/DRC-clean physical design of the current version every ~two weeks ("tapein")
- Incremental feature addition
- Test & verification as the first step

Page 31:

UC Berkeley Chisel: Constructing Hardware In a Scala Embedded Language

§ Embed a hardware-description language in Scala, using Scala's extension facilities
§ A hardware module is just a data structure in Scala
§ Different output routines can generate different types of output (C, FPGA-Verilog, ASIC-Verilog) from the same hardware representation
§ Full power of Scala for writing hardware generators (sketch below):
- Object-oriented: factory objects, traits, overloading, etc.
- Functional: higher-order functions, anonymous functions, currying
- Compiles to the JVM: good performance, Java interoperability
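As a flavor of "modules are data structures" plus higher-order functions, here is a minimal parameterized generator. It is my sketch in present-day Chisel 3 syntax (the 2012-era Chisel of this talk differed); the module name and parameters are illustrative.

```scala
import chisel3._
import chisel3.util.log2Ceil

// A generator, not a point design: elaborate for any (taps, width).
class MovingSum(taps: Int, width: Int) extends Module {
  require(taps >= 1)
  val io = IO(new Bundle {
    val in  = Input(UInt(width.W))
    val out = Output(UInt((width + log2Ceil(taps + 1)).W))
  })
  // Higher-order construction: the current sample followed by (taps - 1)
  // registered delays of it, built by iterating RegNext...
  val window = Seq.iterate(io.in, taps)(RegNext(_, 0.U))
  // ...then summed with a width-growing reduction.
  io.out := window.reduce(_ +& _)
}
```

new MovingSum(4, 8) and new MovingSum(128, 16) come from the same source; the design space that slide 30 wants to explore becomes a parameter sweep over a generator.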

Page 32:

UC Berkeley Chisel Design Flow

[Figure: one Chisel program, running on Scala/JVM, feeds three backends. C++ code goes through a C++ compiler to a software simulator; FPGA Verilog goes through FPGA tools to FPGA emulation; ASIC Verilog goes through ASIC tools to a GDS layout.]

Page 33:

UC Berkeley Chisel is much more than an HDL

§ The base Chisel system lets you use the full power of Scala to describe the RTL of a design, then generate Verilog or C++ output from that RTL
§ But Chisel can be extended above with domain-specific languages (e.g., signal processing) for the fabric
§ Importantly, Chisel can also be extended below with new backends or to add new tools or features (e.g., quantum computing circuits)
§ Only ~6,000 lines of code in the current version, including libraries!
§ BSD-licensed open source at: chisel.eecs.berkeley.edu

Page 34:

UC Berkeley Many processor tapeouts in a few years with a small group (45nm, 28nm)

[Die photos: clock, SRAM, and DC-DC test sites; a processor site with four cores (CORE0 through CORE3), per-core voltage domains (VC0 through VC3), a 512KB L2, a fixed-voltage domain (VFIXED), and test sites.]

Page 35:

UC Berkeley Resilient Circuits & Modeling

§ Future scaled technologies have high variability, but we want to run with the lowest possible margins to save energy
§ Significant increase in soft errors; need resilient systems
§ Technology modeling to determine the tradeoff between MTBF and energy per task for logic, SRAM, & interconnect

Techniques that reduce operating voltage can be worse for energy, due to the rapid rise in errors.

Page 36:

Algorithms and Specializers for Provably Optimal Implementations with Resiliency and Efficiency

[Figure: the full ASPIRE stack, software to hardware.
- Applications: audio recognition, object recognition, scene analysis.
- Computational and structural patterns: dense, sparse, graph, …; Pipe&Filter, Map-Reduce, ….
- Communication-avoiding algorithms: C-A GEMM, C-A SpMV, C-A BFS.
- Software: specializers with SEJITS implementations and autotuning, emitting glue, dense, sparse, graph, and ESP code.
- Architecture: ESP (Ensembles of Specialized Processors) cores with ILP, dense, sparse, and graph engines, local stores + DMA, and hardware cache coherence.
- Hardware: hardware generators using the Chisel HDL; deep HW/SW design-space exploration.
- Validation/verification: C++ simulation, FPGA emulation.
- Implementation technologies: ASIC SoC, FPGA computer.]

Page 37:

UC Berkeley ASPIRE Project

§ Initial $15.6M/5.5-year funding from the DARPA PERFECT program
- Started 9/28/2012
- Located in Par Lab space + BWRC
§ Looking for industrial affiliates (see Krste!)
§ Open House today, 5th floor Soda Hall

Research funded by DARPA Award Number HR0011-12-2-0016. Approved for public release; distribution is unlimited. The content of this presentation does not necessarily reflect the position or the policy of the US government and no official endorsement should be inferred.