new harnessing)isa)diversity:)design)of)a … · 2014. 6. 19. ·...

45
Harnessing ISA Diversity: Design of a HeterogeneousISA Chip Mul;processor Ashish Venkat Dean M. Tullsen University of California, San Diego 1

Upload: others

Post on 19-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Harnessing  ISA  Diversity:  Design  of  a  Heterogeneous-­‐ISA  Chip  Mul;processor  

Ashish  Venkat        Dean  M.  Tullsen  University  of  California,  San  Diego  

1  

Page 2: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Heterogeneous  Chip  Mul;processors  

•  Commonplace  in  both  general-­‐purpose  and  embedded  worlds.  

•  Heterogeneity  is  oFen  exploited  in  two  fundamental  dimensions:-­‐  

2  

–  Core  Specializa;on:  accelerate  the  performance  of  certain  workloads  

– Micro-­‐architectural  Heterogeneity:  use  small  power-­‐efficient  and  large  high  performance  cores  

 High  Performance  

Crypto  and  

Security  

Low  Power   Vector  Processing  

L2  Cache  

L1  

L2  Cache  

L1  

L2  Cache  

L1  

L2  Cache  

L1  

Page 3: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Heterogeneous  Chip  Mul;processors  

3  

MIPS  R10000  

Thumb  Cortex-­‐A7  

Alpha  ev6  

x86-­‐64  core-­‐i7  

Heterogeneous-­‐ISA  CMP  

ARM  Cortex  A15  

Single-­‐ISA  heterogeneous  CMP  

x86-­‐64  core-­‐i7  

Homogeneous  CMP  

x86-­‐64  core-­‐i7  

x86-­‐64  core-­‐i7  

x86-­‐64  core-­‐i7  

*Rakesh  Kumar,  Keith  Farkas,  Norm  P.  Jouppi,  Partha  Ranganathan,  Dean  M.  Tullsen,  MICRO’03  

63%  speedup  OR  

69%  energy  savings  with  3%  performance  loss*  

Restric>ng  cores  to  a  single  ISA  eliminates  an  important  

dimension  of  heterogeneity  

ARM  Cortex  A5  

ARM  Cortex  A12  

ARM  Cortex  A9  

Page 4: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

•  Enables  ISA-­‐microarchitecture  co-­‐design  – There  is  significant  synergy  in  combining  heterogeneous-­‐ISAs  with  heterogeneous  hardware  

•  Exploits  ISA-­‐affinity  

4  

MIPS  R10000  

Thumb  Cortex  A-­‐15  

Alpha  ev5  

x86-­‐64  core-­‐i7  

– Applica;ons  have  a  natural  ISA  preference  

Why  is  ISA-­‐heterogeneity  advantageous?  

Do  exis>ng  ISAs  provide  sufficient  heterogeneity?  

Page 5: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

5  

•  In  our  design  space  explora;on,  we  employ  three  modern  ISAs:  – ARM’s  energy-­‐efficient  Thumb  ISA  

– The  high  performance  x86-­‐64  ISA  

– The  simple  and  tradi;onally-­‐RISC  Alpha  ISA  

•  They  encompass  several  axes  of  ISA  diversity  – Code  density,  instruc;on  complexity,  register  pressure,  predica;on  support,  floa;ng-­‐point  arithme;c  vs  emula;on  and  SIMD  processing    

Harnessing  ISA  diversity  

Page 6: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

However  .  .  .  To  fully  harness  ISA  diversity,  execu>on  migra>on  is  cri>cal  

Execu;on  Migra;on  

•  Allows  an  applica;on  to  execute  on  the  ISA  of  its  preference,  during  different  phases  of  execu;on  

•  Allows  switching  execu;on  to  a  low  power  core  when  the  power  cord  is  plugged  out  

6  

 

•  Enables  load  balancing  MIPS  R10000  

Thumb  Cortex  A-­‐15  

Alpha  ev5  

x86-­‐64  core-­‐i7  

Page 7: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Execu;on  Migra;on  in  a  Heterogeneous-­‐ISA  CMP  

Heterogeneous -­‐ ISA  Architecture

Thumb Alpha

x86-­‐64  

Symmetrical  Fat  Binary  All  data  objects  are  consistently  referenced  by  the  same  address  in  all  ISAs  

7  

L2  L1  

L2  L1  

L2  L1  

Logical  Address Space

Kernel  Space

Stack

Data Sec;ons (.data,  . bss ,  . tbss ,  . sdata ,   etc )

Code Sec;on Thumb

Code Sec;on  Alpha

Code Sec;on x86-­‐64

Reserved

Maihew  DeVuyst,  Ashish  Venkat,  Dean  M.  Tullsen,  ASPLOS’12  

Page 8: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Outline  

•  Mo;va;on  •  ISA  diversity  •  Design  Space  Explora;on  

– Naviga;on  and  Op;miza;on  –  Inference:  ISA-­‐microarchitecture  co-­‐design  –  Inference:  ISA-­‐affinity  

•  Compila;on  and  Run;me  Strategy  •  Key  Points  

8  

Page 9: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

ISA  diversity:  Code  Density  

9  

•  Alpha:  fixed-­‐length  encoding  •  x86-­‐64:  variable-­‐length  encoding  •  Thumb:  code  compression  

0   0.2   0.4   0.6   0.8   1   1.2   1.4  

x86-­‐64  

alpha  

thumb  

Code  Sec>on  Size  (in  MB)  

x86-­‐64   alpha   thumb  

Page 10: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

ISA  diversity:  Instruc;on  Complexity  

10  

•  Thumb:  reduced  encoding  space  (2-­‐operand  instruc;ons)  •  Alpha:  load-­‐store  ISA  (3-­‐operand  instruc;ons)  •  x86-­‐64:  2-­‐operand  instruc;ons  +  complex  addressing  modes  

0   0.2   0.4   0.6   0.8   1   1.2   1.4   1.6  

x86-­‐64  

alpha  

thumb  

Dynamic  Instruc>on  Count  (normalized  to  Alpha)  

x86-­‐64   alpha   thumb  

Page 11: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

ISA  diversity:  Register  Pressure  

11  

0   1   2   3   4   5  

x86-­‐64  

alpha  

thumb  

Architectural  Register  File  Size  (in  KB)  

x86-­‐64   alpha   thumb  

•  Thumb:  Eight  32-­‐bit  INT  registers  •  Alpha:  Two  banks  of  thirty-­‐two  64-­‐bit  INT  and  FP  registers  •  x86-­‐64:  Sixteen  64-­‐bit  INT  and  Sixteen  128-­‐bit  SSE  registers    

Page 12: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

ISA  diversity:  Register  Pressure  

12  

Register  File  Tradeoffs:  •  Size  and  Power  Dissipa;on:      thumb  <  x86-­‐64  <  alpha  •  Register  Pressure:                                        x86-­‐64  <  alpha  <  thumb  

0   0.2   0.4   0.6   0.8   1   1.2   1.4   1.6   1.8  

x86-­‐64  

alpha  

thumb  

Memory  References  (normalized  to  Alpha)  

x86-­‐64   alpha   thumb  

Page 13: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

ISA  diversity:  Feature  Sets  

13  

•  Floa;ng-­‐point  opera;ons  in  Thumb  –  Emulated  in  soFware  or  execu;on  is  migrated  to  a  different  core  

–  Thumb  cores  don’t  include  FP  instruc;on  windows,  register  files,  and  func;onal  units  –  30%  savings  in  area  and  20%  reduc;on  in  TDP  

•  SIMD  opera;ons  in  Alpha  –  Primi;ve:  allows  pack,  unpack,  max  and  min  

– We  forgo  SIMD  units  in  Alpha  to  save  area  and  power  

Page 14: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

•  Enables  ISA-­‐microarchitecture  co-­‐design  – Does  ISA  diversity  complement  micro-­‐architectural  heterogeneity?  

 

•  Exploits  ISA-­‐affinity  

14  

– Does  ISA  diversity  enable  ISA  affinity?  

Why  is  ISA-­‐heterogeneity  advantageous?  

MIPS  R10000  

Thumb  Cortex  A-­‐15  

Alpha  ev5  

x86-­‐64  core-­‐i7  

Page 15: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

0  

0.4  

0.8  

1.2  

1.6  

2  

0   3   6   9   12   15   18   21  

Normalized

 IPC  

Peak  Power  (W)  

Thumb   Alpha   x86-­‐64  

Does  ISA  diversity  enable  ISA  affinity?  

15  

•  Phase  1  prefers  x86-­‐64  •  Phase  2  prefers  Alpha  •  We  always  prefer  Thumb  at  low  power  budgets  

bzip2  –  execu;on  phase  1   bzip2  –  execu;on  phase  2  

0  

0.4  

0.8  

1.2  

1.6  

2  

0   3   6   9   12   15   18   21  

Normalized

 IPC  

Peak  Power  (W)  

Thumb   Alpha   x86-­‐64  

Page 16: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Outline  

•  Mo;va;on  •  ISA  diversity  •  Design  Space  Explora;on  

– Naviga;on  and  Op;miza;on  –  Inference:  ISA-­‐microarchitecture  co-­‐design  –  Inference:  ISA-­‐affinity  

•  Compila;on  and  Run;me  Strategy  •  Key  Points  

16  

Page 17: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

17  

Design  Space  Explora;on    

Alpha  Thumb   Alpha  

x86-­‐64  

Op>mal  Heterogeneous-­‐ISA  CMP  

Alpha  

Alpha  

Op>mal  single-­‐ISA  heterogeneous  CMP  

x86-­‐64  

Op>mal  homogeneous  CMP  

x86-­‐64   x86-­‐64   x86-­‐64  

Alpha   Alpha  

DSE  

Micro-­‐architectural  parameters  

+  ISA  parameters  

+  Budget  constraints    

Page 18: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

18  

•  To  keep  the  design  space  explora;on  tractable,  we  select  our  target  ISAs  a  priori  

•  Our  target  ISAs:  Thumb,  Alpha  and  x86-­‐64  

•  Full  ISA  customiza;on  only  increases  the  poten;al  performance  and  energy  gains  

Design  Space  Explora;on  Choice  of  ISAs  

Page 19: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

19  

Design  Space  Explora;on  Choice  of  micro-­‐architectural  parameters  Design  Parameter   Design  Choice  

Execu;on  Seman;cs   In-­‐order,  Out-­‐of-­‐order  

Issue  Width   1,  2,  4  

Branch  Predictor   Local,  Tournament  

Reorder  Buffer  Size   64,  128  entries  

Physical  Register  File  (integer)   96,  160  

Physical  Register  File  (FP/SIMD)   64,  96  

Integer  ALUs   1,  3,  6  

Integer  Mul;ply/Divide  Units   1,  2  

Floa;ng-­‐point  ALUs   1,  2,  4  

FP  Mul;ply/Divide  Units   1,  2  

SIMD  Units   1,  2,  4  

Load/Store  Queue   16,32  entries  

Instruc;on  Cache   32KB  4-­‐way,  64KB  4-­‐way  

Private  Data  Cache   32KB  4-­‐way,  64KB  8-­‐way  

Shared  Last  Level  (L2)  cache   4-­‐banked  4MB  4-­‐way,  4-­‐banked  8MB  8-­‐way  

750  thousand  single  core  and  a  sep>llion  (1024)  4-­‐core  configura>ons  

Page 20: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

20  

Design  Space  Explora;on  Choice  of  micro-­‐architectural  parameters  Design  Parameter   Design  Choice  

Execu;on  Seman;cs   In-­‐order,  Out-­‐of-­‐order  

Issue  Width   1,  2,  4  

Branch  Predictor   Local,  Tournament  

Reorder  Buffer  Size   64,  128  entries  

Physical  Register  File  (integer)   96,  160  

Physical  Register  File  (FP/SIMD)   64,  96  

Integer  ALUs   1,  3,  6  

Integer  Mul;ply/Divide  Units   1,  2  

Floa;ng-­‐point  ALUs   1,  2,  4  

FP  Mul;ply/Divide  Units   1,  2  

SIMD  Units   1,  2,  4  

Load/Store  Queue   16,32  entries  

Instruc;on  Cache   32KB  4-­‐way,  64KB  4-­‐way  

Private  Data  Cache   32KB  4-­‐way,  64KB  8-­‐way  

Shared  Last  Level  (L2)  cache   4-­‐banked  4MB  4-­‐way,  4-­‐banked  8MB  8-­‐way  

750  thousand  single  core  and  a  sep>llion  (1024)  4-­‐core  configura>ons  

Page 21: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

21  

Design  Space  Explora;on  Pruned  design  space  Design  Parameter   Design  Choice  

ISAs   Thumb,  Alpha,  x86-­‐64  

Execu;on  Seman;cs   In-­‐order,  Out-­‐of-­‐order  

Issue  Width-­‐Func;on  Units   1-­‐low,  1-­‐med,  2-­‐med,  4-­‐med,  4-­‐high  

Branch  Predictor   Local,  Tournament  

ROB-­‐IntReg-­‐FPReg   64-­‐96-­‐64,  128-­‐160-­‐96  

Load/Store  Queue   16,32  entries  

Cache  Hierarchy   32K/4-­‐4M/4,  32K/4-­‐8M/8,  64K/4-­‐4M/4,  64K/4-­‐8M/8  

600  single  core  and  130  billion  4-­‐core  configura;ons    

Page 22: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

22  

Design  Space  Explora;on  Op;mal  Configura;ons  

Peak  Power  Budget  (20W,  40W,  60W,  unlimited)  

Area  Budget  (48mm2,  64mm2,  80mm2,  unlimited)  

Mul;-­‐programmed  

mixed  workload  

Single-­‐threaded  workload  

28  op>mal  homogeneous,  single-­‐ISA  heterogeneous  and  heterogeneous-­‐ISA  designs  each  

Page 23: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

23  

Design  Space  Explora;on  Budget  Constraints  

x86-­‐64  

Op>mal  homogeneous  CMP  

x86-­‐64   x86-­‐64   x86-­‐64  

x86-­‐64  

Op>mal  single-­‐ISA  heterogeneous  CMP  

x86-­‐64   x86-­‐64   x86-­‐64  

thumb  

Op>mal  heterogeneous-­‐ISA  CMP  

alpha   alpha   x86-­‐64  

alpha  

Op>mal  homogeneous  CMP  

alpha   alpha   alpha  

alpha  

Op>mal  single-­‐ISA  heterogeneous  CMP  

alpha   alpha   alpha  

thumb  

Op>mal  heterogeneous-­‐ISA  CMP  

alpha   alpha   x86-­‐64  

Tight  constraints  (all  cores  are  small)  

Liberal  constraints  (all  cores  free  to  be  big)  

Page 24: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

24  

Design  Space  Explora;on  Mul;-­‐programmed  workload  throughput  

0  

0.2  

0.4  

0.6  

0.8  

1  

1.2  

1.4  

1.6  

20   40   60   Unlimited  

Speedu

p  

Peak  Power  (W)  

Homogeneous   Single-­‐ISA   Heterogeneous-­‐ISA  

We  always  gain  more  from  ISA  heterogeneity  than  hardware  heterogeneity  

Tight  constraints   Liberal  constraints  

Page 25: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

25  

Design  Space  Explora;on  Mul;-­‐programmed  workload  throughput  

0  

0.2  

0.4  

0.6  

0.8  

1  

1.2  

1.4  

1.6  

20   40   60   Unlimited  

Speedu

p  

Peak  Power  (W)  

Homogeneous   Single-­‐ISA   Heterogeneous-­‐ISA  

We  always  gain  more  from  ISA  heterogeneity  than  hardware  heterogeneity  

Moderate  constraints  

Page 26: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

26  

Design  Space  Explora;on  Mul;-­‐programmed  workload  throughput  ISA-­‐heterogeneity  benefits  come  from:  

•  ISA-­‐affinity:  different  code  regions  have  a  natural  affinity  for  one  ISA  or  another  

•  ISA-­‐microarchitecture  co-­‐design:  squeeze  in  more  powerful  cores  into  the  same  budget  Best  Single-­‐ISA  Heterogeneous  CMP  

Alpha  OOO  

Medium  End  

Best  Heterogeneous-­‐ISA  CMP  

Alpha  OOO  

Medium  End  

Alpha  OOO  Medium  End  

Alpha  InOrder    

Alpha  OOO  Medium  End  

x86-­‐64  OOO  Medium  End  

Thumb  OOO  Medium  End  

Alpha  OOO  High  

End  

Both  designs  are  constrained  at  an  area  budget  of  64mm2  

Page 27: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

27  

Design  Space  Explora;on  Op;mal  Configura;ons  

Peak  Power  Budget  (20W,  40W,  60W,  unlimited)  

Area  Budget  (48mm2,  64mm2,  80mm2,  unlimited)  

Mul;-­‐programmed  

mixed  workload  

Single-­‐threaded  workload  

28  op>mal  homogeneous,  single-­‐ISA  heterogeneous  and  heterogeneous-­‐ISA  designs  each  

Page 28: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

28  

Design  Space  Explora;on  Mul;-­‐programmed  workload  energy  efficiency  

Ø  22%  energy  savings  and  28%  reduc>on  in  EDP  at  ZERO  performance  loss  

Ø  We  gain  performance  and  decrease  energy  simultaneously  

0  

0.2  

0.4  

0.6  

0.8  

1  

1.2  

20   40   60   Unlimited  

Normalized

 EDP

 

Peak  Power  (W)  

Homogeneous   Single-­‐ISA   Heterogeneous-­‐ISA  

Tight  constraints   Liberal  constraints  

Page 29: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

29  

Design  Space  Explora;on  Op;mal  Configura;ons  

Peak  Power  Budget  (20W,  40W,  60W,  unlimited)  

Area  Budget  (48mm2,  64mm2,  80mm2,  unlimited)  

Mul;-­‐programmed  

mixed  workload  

Single-­‐threaded  workload  

28  op>mal  homogeneous,  single-­‐ISA  heterogeneous  and  heterogeneous-­‐ISA  designs  each  

Page 30: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

30  

Design  Space  Explora;on  Single  Thread  Performance  

Ø  Mul;ple  small  cores  and  one  large  core  op;mized  for  high  performance  Ø  Combining  the  dual  benefits  of  ISA-­‐affinity  and  area  efficiency  of  Thumb,    

heterogeneous-­‐ISA  CMPs  provide  as  much  as  35%  speedup,  under  the  most  >ght  area  constraints  

0  

0.5  

1  

1.5  

2  

48   64   80   Unlimited  

Speedu

p  

Area  (mm2)  

Homogeneous   Single-­‐ISA   Heterogeneous-­‐ISA  

Page 31: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

31  

Design  Space  Explora;on  Op;mal  Configura;ons  

Peak  Power  Budget  (20W,  40W,  60W,  unlimited)  

Area  Budget  (48mm2,  64mm2,  80mm2,  unlimited)  

Mul;-­‐programmed  

mixed  workload  

Single-­‐threaded  workload  

28  op>mal  homogeneous,  single-­‐ISA  heterogeneous  and  heterogeneous-­‐ISA  designs  each  

Page 32: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

32  

Design  Space  Explora;on  Inferences  –  ISA-­‐microarchitecture  co-­‐design  

0%  

20%  

40%  

60%  

80%  

100%  

In-­‐order  

Out-­‐of-­‐o

rder   1   2   4   64  

128   16  

32  

Low  

High  

Local  

Tournamen

t  

32KB

 

64KB

 

Execu;on  Seman;cs  

Issue  Width   ROB  Size   LSQ  Size   Func;onal  units  

Branch  Predictor  

L1  Cache  Size  

Freq

uency  of  Occurrence  

Micro-­‐architectural  Parameter  

Thumb   Alpha   x86-­‐64  

Page 33: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

33  

Design  Space  Explora;on  Inferences  –  ISA-­‐microarchitecture  co-­‐design  

0%  

20%  

40%  

60%  

80%  

100%  

In-­‐order  

Out-­‐of-­‐o

rder   1   2   4   64  

128   16  

32  

Low  

High  

Local  

Tournamen

t  

32KB

 

64KB

 

Execu;on  Seman;cs  

Issue  Width   ROB  Size   LSQ  Size   Func;onal  units  

Branch  Predictor  

L1  Cache  Size  

Freq

uency  of  Occurrence  

Micro-­‐architectural  Parameter  

Thumb   Alpha   x86-­‐64  

Page 34: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

0%  

20%  

40%  

60%  

80%  

100%  

Execu>

on  Tim

e  

thumb   alpha   x86-­‐64  

34  

Design  Space  Explora;on  Inferences  –  ISA-­‐affinity  

Page 35: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

0%  

20%  

40%  

60%  

80%  

100%  

Execu>

on  Tim

e  

thumb   alpha   x86-­‐64  

35  

Design  Space  Explora;on  Inferences  –  ISA-­‐affinity  

Page 36: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

0%  

20%  

40%  

60%  

80%  

100%  

Execu>

on  Tim

e  

thumb   alpha   x86-­‐64  

36  

Design  Space  Explora;on  Inferences  –  ISA-­‐affinity  

Page 37: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Outline  

•  Mo;va;on  •  ISA  diversity  •  Design  Space  Explora;on  

– Naviga;on  and  Op;miza;on  –  Inference:  ISA-­‐microarchitecture  co-­‐design  –  Inference:  ISA-­‐affinity  

•  Compila;on  and  Run;me  Strategy  •  Key  Points  

37  

Page 38: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Compila;on  and  Run;me  Strategy  

Heterogeneous -­‐ ISA  Architecture

Thumb Alpha

x86-­‐64  

Symmetrical  Fat  Binary  All  data  objects  are  consistently  referenced  by  the  same  address  in  all  ISAs  

38  

L2  L1  

L2  L1  

L2  L1  

Logical  Address Space

Kernel  Space

Stack

Data Sec;ons (.data,  . bss ,  . tbss ,  . sdata ,   etc )

Code Sec;on Thumb

Code Sec;on  Alpha

Code Sec;on x86-­‐64

Reserved

Page 39: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Compila;on  and  Run;me  Strategy  

Heterogeneous -­‐ ISA  Architecture

Thumb Alpha

x86-­‐64  

Powerful  architecture-­‐independent  Intermediate  Representa>on  

Hints  for  State  Transforma;on  at  the  ;me  of  migra;on  

39  

L2  L1  

L2  L1  

L2  L1  

Logical  Address Space

Kernel  Space

Stack

Data Sec;ons (.data,  . bss ,  . tbss ,  . sdata ,   etc )

Code Sec;on Thumb

Code Sec;on  Alpha

Code Sec;on x86-­‐64

Reserved

Page 40: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Overview  of  migra;on  

Stack  

Data  Sec;ons  

Na>ve  execu>on  on  core  -­‐  I  

PC  

PC  Code  Sec;on  

(ISA  –  I)  

Binary  transla>on  on  core  -­‐  II  

Migra>on  requested  !!  Program  state  transforma>on  on  core  -­‐II  

PC  

Na>ve  execu>on  on  core  -­‐  II   40  

Code  Sec;on  (ISA  –  II)  

migra>on-­‐safe  point  (known  at  compile-­‐>me)  

Heterogeneous -­‐ ISA  Architecture

ISA   -­‐ I ISA   -­‐ II

ISA   -­‐ III

L2  L1  

L2  L1  

L2  L1  

Already  consistent  

Switch  to  Core  -­‐  II  

Page 41: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Execu;on  Migra;on  Timeline  

41  

Na>ve  Execu>on  on  ISA  1   Binary  Transla>on  

Program  State  Transforma>on  

Na>ve  Execu>on  on  ISA  2  

Migra>on  Overhead  

Migra>on  Overhead  =  Binary  Transla>on  +  Program  State  Transforma>on  

Migra>on  requested  

t  =  0   t  =  t1   t  =  t2   t  =  t3  

Time  

Migra>on-­‐safe  point  reached  

Page 42: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Migra;on  Cost  –  arbitrary  migra;ons  

•  Average  migra;on  cost:  4  milliseconds  •  Binary  Transla;on  ;me  dominates  the  migra;on  cost  

42  

0  

10  

20  

30  

40  

50  

60  

Execu>

on  Tim

e  (m

s)  

thumb  to  alpha   x86-­‐64  to  alpha   thumb  to  x86-­‐64  alpha  to  x86-­‐64   alpha  to  thumb   x86-­‐64  to  thumb  

Page 43: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Speedup  accoun;ng  for  Migra;on  Cost  

43  

0  0.2  0.4  0.6  0.8  1  

1.2  1.4  1.6  

20   40   60   Unlimited  

Speedu

p  

Peak  Power  (W)  

Homogeneous   Single-­‐ISA  

Heterogeneous-­‐ISA  (w/o  migra;on  cost)   Heterogeneous-­‐ISA  (w/  migra;on  cost)  

•  Performance  degrada;on:  0.4-­‐0.7%  •  Overall  speedup  due  to  migra;on:  11%  

Page 44: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Key  Points  •  Heterogeneous-­‐ISA  CMP  can  outperform  the  best  Single-­‐

ISA  heterogeneous  CMP  by  as  much  as  21%  or  provide  23%  energy  savings  and  32%  reduc;on  in  EDP.  

•  ISA-­‐microarchitecture  co-­‐design  is  cri;cal.  There  is  significant  synergy  in  combining  hardware  heterogeneity  and  ISA  heterogeneity.  

•  Where  hardware  heterogeneity  alone  cannot  provide  any  benefits,  ISA-­‐affinity  s;ll  con;nues  to  provide  both  performance  and  energy  gains.  

44  

Page 45: New Harnessing)ISA)Diversity:)Design)of)a … · 2014. 6. 19. · Harnessing)ISA)Diversity:)Design)of)a Heterogeneous5ISA)Chip)Mul;processor) Ashish%Venkat))))Dean)M.)Tullsen) University)of)California,)San)Diego)

Thank  You!  

45