wisync:(an(architecture(for(( fast(synchroniza5on(through((...

45
WiSync: An Architecture for Fast Synchroniza5on through OnChip Wireless Communica5on ASPLOS ’16 – Atlanta, GA – April 26, 2016 Sergi Abadal ([email protected]) Albert CabellosAparicio, Eduard Alarcón, Josep Torrellas UPC and UIUC

Upload: others

Post on 15-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

WiSync:  An  Architecture  for    Fast  Synchroniza5on  through    

On-­‐Chip  Wireless  Communica5on  

ASPLOS  ’16  –  Atlanta,  GA  –  April  2-­‐6,  2016  

Sergi  Abadal  ([email protected])  Albert  Cabellos-­‐Aparicio,  Eduard  Alarcón,  Josep  Torrellas  

UPC  and  UIUC  

Page 2: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Mo$va$on  

4/21/16   WiSync  –  ASPLOS  ’16  –  April  4th,  2016   2  

Manycores  are  becoming  larger:  Intel  Knights  Landing  chip  has  208  contexts    Fine-­‐grain  synchroniza$on  is  costly  to  support  in  large  manycores   Global  and  broadcast  communica$on  are  prohibi$ve  

 

Page 3: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Opportunity  

4/21/16   WiSync  –  ASPLOS  ’16  –  April  4th,  2016   3  

 On-­‐chip  wireless  technology  provides  a  promising  approach  to  address  it   Low  latency,  natural  broadcast  capabili$es  

 Our  proposal:  a  miniature  antenna  and  transceiver  per  each  core  

 

Antenna  +  Transceiver  

Core  +    L1  +  L2  

Page 4: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Contribu$on:  WiSync  Architecture  

4/21/16   4  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

 Wireless  on-­‐chip  communica$on  for  fine-­‐grain  synchroniza$on    Applicable  to  large  manycores  (128+  cores)    Per-­‐core  Broadcast  Memory:  replicates  data  in  all  cores  on  a  write    

 

Page 5: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Emerging  Interconnect  Technologies  

4/21/16   WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Nanophotonics  Vantrease  et  al,  2008  

Transmission  Line  Interconnects  Oh  et  al,  2013  

Wireless  on-­‐chip  Communica$on  

Transceiver  (transmi8er  and  receiver  circuits)  

On-­‐chip  Antenna  

Core  

Page 6: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

On-­‐chip  Wireless  Communica$on  

4/21/16   6  

  PROS    Inherently  broadcast    Low  latency    Simplicity  /  Flexibility  /  Non-­‐intrusiveness  

  CONS    Less  energy  efficient  than  TL  or  photonics    Low  bandwidth  

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

S.  Abadal  et  al,  “Broadcast-­‐Enabled  Massive  Mul5core  Architectures:  A  Wireless  RF  Approach,”  IEEE  MICRO,  vol.  35,  no.  5,  pp.  52–61,  2015.  

Page 7: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

How  WiSync  Uses  Wireless  (I)  

4/21/16   7  

Main  Use:  Globally  shared  medium  for  data  transmission   One  channel  shared  by  all  cores    Everyone  receives  what  someone  transmits    Consistent  order  of  delivery   Only  one  core  should  transmit  at  the  same  $me  

  Simultaneous  transmissions  “collide”  

 

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

t  

t  

Core  1  

Core  2  

Page 8: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

How  WiSync  Uses  Wireless  (I)  

4/21/16   8  

Main  Use:  Globally  shared  medium  for  data  transmission   One  channel  shared  by  all  cores    Everyone  receives  what  someone  transmits    Consistent  order  of  delivery   Only  one  core  should  transmit  at  the  same  $me  

  Simultaneous  transmissions  “collide”  

 

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

t  

t  

Core  1  

Core  2  

USED  FOR  MOST  SYNCHRONIZATION  

Page 9: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

How  WiSync  Uses  Wireless    (II)  

4/21/16   9  

Special  Use:  collec$ve  binary  OR  (tones)  

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Transmiders  Everyone  silent  

One  transmits  tone  More  than  one  transmits  tone  

Receiver  ‘0’  ‘1’  ‘1’  

Page 10: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

How  WiSync  Uses  Wireless    (II)  

4/21/16   10  

Special  Use:  collec$ve  binary  OR  (tones)  

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Transmiders  Everyone  silent  

One  transmits  tone  More  than  one  transmits  tone  

Receiver  ‘0’  ‘1’  ‘1’  

USED  FOR  GLOBAL  BARRIERS  

Page 11: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Wireless  Performance  and  Cost  

4/21/16   11  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

X.  Yu,  J.  Baylon,  P.  Wejn,  D.  Heo,  P.  Pande,  and  S.  Mirabbasi,  “Architecture  and  Design  of  Mul5-­‐Channel  Millimeter-­‐Wave  Wireless  Network-­‐on-­‐Chip,”  IEEE  Design  &  Test,  2014  

(Simulated)  CMOS  65nm  

30  GHz  –  90  GHz  16  Gb/s  –  48  Gb/s  

Antenna+Transceiver:  0.25  mm2  –  0.79  mm2  

31  mW  –  97  mW  

(Extrapolated)  CMOS  22nm  

60  GHz  –  150  GHz  16  Gb/s  –  48  Gb/s  

Antenna+Transceiver:  0.1  mm2  –  0.3  mm2  

16  mW  –  48  mW  

Technology  Scaling  

Page 12: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Wireless  Performance  and  Cost  

4/21/16   12  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

X.  Yu,  J.  Baylon,  P.  Wejn,  D.  Heo,  P.  Pande,  and  S.  Mirabbasi,  “Architecture  and  Design  of  Mul5-­‐Channel  Millimeter-­‐Wave  Wireless  Network-­‐on-­‐Chip,”  IEEE  Design  &  Test,  2014  

(Simulated)  CMOS  65nm  

30  GHz  –  90  GHz  16  Gb/s  –  48  Gb/s  

Antenna+Transceiver:  0.25  mm2  –  0.79  mm2  

31  mW  –  97  mW  

(Extrapolated)  CMOS  22nm  

60  GHz  –  150  GHz  16  Gb/s  –  48  Gb/s  

Antenna+Transceiver:  0.1  mm2  –  0.3  mm2  

16  mW  –  48  mW  

Technology  Scaling  

Page 13: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

WiSync:  Main  Idea  

4/21/16   13  

 Wireless  network  appears  as  a  bus    Variables  declared  in  program  as  broadcast    

  Allocated  in  Broadcast  Memory  (BM)  

  Do  not  use  the  regular  cache  hierarchy  

  BM:  Small  per-­‐core  memory    Replicates  variables  across  cores  on  every  write  

 Writes  globally  serialized  

  Each  BM  line:  64  bits   C0   C1   C2  wr0  x   wr1  y   wr2  x  

BM  x    y  

BM  x    y  

BM  x    y  Long  bus  

(logical)  

Wireless  Network  

Page 14: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Wireless  Transceiver  

       

B-­‐Memory  (BM)  

The  WiSync  Architecture  

4/21/16   14  

 Two  wireless  networks:  data  &  tone    

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

·∙·∙  ·∙·∙  ·∙·∙  

·∙·∙   ·∙·∙   ·∙·∙  

Core  +  L1  +  L2  

Atomicity  Failure  Bit  (AFB)  

(Data  &  Tone)  Antennas  

Page 15: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

WiSync:  Wireless  Channels  

4/21/16   15  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Frequency  (GHz)  

Signal  Stren

gth  (dB)  

Data  Transfer    Channel  

19  GHz   1  GHz  

Tone  Channel  

60   90  

Page 16: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Data  Transfer  Channel  

4/21/16   16  

  A  message  takes  5  cycles  (5  ns)  in  wireless  network    Transferring  data  takes  4  ns    Handling  collisions  takes  an  addi$onal  1  ns  

 

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

1   C   2   3   4  

1   C  

1   C  Core  1  

Core  2   1   C   2   3   4  

Page 17: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Load  from  the  BM:  Local  

4/21/16   17  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  Core  

1.  Ld  (addr,  PID)  

Transceiv   BM  

Transceiv  

BM  

Transceiv  

3.  val  

2.  Check  PID  

Page 18: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  BM  keeps  the  new  value  in  temporary  registers  

Store  into  the  BM:  Broadcast  

4/21/16   18  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  Core  

1.  st  (addr,  val)  

Transceiv  2.  addr,  val  

BM  

Transceiv  

BM  

Transceiv  3.  OK  4.  OK  

The  value  is  now  updated  in  all  BMs  (including  the  local  

one)  

Page 19: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Store  into  the  BM:  Broadcast  

4/21/16   19  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  Core  

1.  st  (addr,  val)  

Transceiv  2.  addr,  val  

BM  

Transceiv  

BM  

Transceiv  

May  collide  and  need  to  retry  

Page 20: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Store  into  the  BM:  Broadcast  

4/21/16   20  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  Core  

1.  st  (addr,  val)  

Transceiv  2.  addr,  val  

BM  

Transceiv  

BM  

Transceiv  

Y.  OK  Z.  OK  

May  collide  and  need  to  retry  

X.  addr,  val  

Page 21: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Read-­‐Modify-­‐Write  Opera$on  to  the  BM  

4/21/16   21  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  Core  

1.  F&A  (addr)   Transceiv  

4.  addr,  new  val  

BM  

Transceiv  

BM  

Transceiv  

5.  OK  

2.  curr  val  

AFB  =  0  (Atomicity    Failure  Bit)  

3.  new  val  

6.  OK  Transceiver  checks  the  addr  of  remote  stores  while  RMW  is  being  executed  

Page 22: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Read-­‐Modify-­‐Write  Opera$on  to  the  BM  

4/21/16   22  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  Core  

1.  F&A  (addr)   Transceiv   BM  

Transceiv  

BM  

Transceiv  

4.  addr,  val2  

2.  curr  val  

AFB  =  1  (Atomicity    Failure  Bit)  

3.  new  val  

6.  KO  Transceiver  checks  the  addr  of  remote  stores  while  RMW  is  being  executed  

failed  

Page 23: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

WiSync:  Wireless  Channels  

4/21/16   23  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Frequency  (GHz)  

Signal  Stren

gth  (dB)  

Data  Transfer    Channel  

19  GHz   1  GHz  

Tone  Channel  

60   90  

Page 24: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  Tone  Barrier  

4/21/16   24  

J.  Oh,  M.  Prvulovic,  and  A.  Zajic,  “TLSync:  support  for  mul5ple  fast  barriers  using  on-­‐chip  transmission  lines,”  in  Proceedings  of  ISCA-­‐38,  2011,  pp.  105–115.  

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

STEP  1:  Start  barrier  execu$on  

First  arrival      no$fies  everyone  

           =  Core  not  issuing  Tone  (Listening)              =  Core  issuing  Tone  

Page 25: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  Tone  Barrier  

4/21/16   25  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

STEP  2:  Raise  tone  

Those  that  have  not  arrived  raise  tone  

           =  Core  not  issuing  Tone  (Listening)              =  Core  issuing  Tone  

Page 26: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  Tone  Barrier  

4/21/16   26  

New  arrivals  stop  sending  the  tone  

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

STEP  3:  Stop  tone  

Page 27: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  Tone  Barrier  

4/21/16   27  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Everyone  arrived.  The  tone  disappears.  Everyone  senses  it.  

Good  to  go.    

STEP  4:  Finish  barrier  execu$on  

Page 28: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  Tone  Barrier  

4/21/16   28  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Everyone  arrived.  The  tone  disappears.  Everyone  senses  it.  

Good  to  go.    

STEP  4:  Finish  barrier  execu$on  

VERY  LOW  OVERHEAD  BARRIER  

Page 29: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Evalua$on  Environment  

4/21/16   29  

 Architectural  Simulator:  Mul$2sim   Architecture:  manycore  with  64  cores  at  22nm    Core:  OOO,  2-­‐issue  wide,  1GHz    Per-­‐core  BM:  16KB,  5-­‐cycle  RT,  64-­‐bit  entries    Per-­‐core  Transceiver+Antennas:  0.12  mm2,  18  mW   Applica5ons:  SPLASH-­‐2  and  PARSEC  

 

 WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 30: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Simulated  Architectures  

4/21/16   30  

  Baseline:  no  BM,  spinlocks  with  CAS,  centralized  barrier    Baseline+:  no  BM,  MCS  locks,  tournament  barrier  WiSyncNoT:  BM,  wireless  locks  and  barriers  (no  tone)  WiSync:  BM,  wireless  locks,  tone  barriers  

 

 WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 31: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Applica$on  Speedup  over  Baseline  

4/21/16   31  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

  Speedup  of  WiSync  over  Baseline:  39%  (arithme$c  mean)                    over  Baseline+:  19%    

  Tone  Channel:  not  much  impact  b/c  not  much  $ght  barrier  synch  WiSync  works  well  with  many  barriers  /  many  locks  

Page 32: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Conclusions  

4/21/16   32  

WiSync:  manycore  with  wireless-­‐based  synch   Per-­‐core  Broadcast  Memory:  replicates  data  

  Speed-­‐up  apps  by  19%  average  over  fancy  synch    Future  work:    

 Rewrite  apps  to  use  fine-­‐grain  communica$on   Map  graph  applica$ons,  MPI  collec$ves   Use  broadcast  for  non-­‐synch  accesses  

  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 33: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

The  end  

4/21/16   33  

Thanks  for  your  a8enNon.  QuesNons?  

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 34: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Also  in  the  Paper…  

4/21/16   34  

  Alloca$on  of  variables  in  the  BM    Bulk  transmission  support  and  op$miza$on    Sharing  the  tone  channel    Support  for  producer-­‐consumer  paderns,  broadcasts,  reduc$ons…   Mul$programming,  context  switching  and  thread  migra$on    Barrier  Synchroniza$on  and  CAS  Throughput  evalua$on  

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 35: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Tone  Channel  with  Kernel  

4/21/16   35  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

  Tone  Barrier  is  useful  with  $ght  barrier  synch  

Page 36: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Read-­‐Modify-­‐Write  Code  

4/21/16   36  

retry:    fetch&inc  R,  BM_addr      if(AFB)  {        jmp  retry      }      else  {        /*  success  */      }  

   

 

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 37: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Tone  Barrier  Code  

4/21/16   37  

for  ()  {    /*  work  */    barrier:    local_sense  =  !local_sense  a  

   tone_write(addr)      spin  un$l  (tone_read(addr)  ==    local_sense)  

}  

WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 38: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Alloca$ng  a  Variable  in  the  BMs  

4/21/16   38  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Data  Transfer  Channel  BM  C  

1.  alloc  (addr,  PID)  

T   2.  addr,  PID  

BM  

T  

BM  

T  3.  OK  4.  OK  

Page 39: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Tone  Channel  to  Implement  Barriers  

4/21/16   39  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

‘0’   ‘1’   ‘1’  

White:  No  Tone  (Listening)  Blue:  Tone  

Page 40: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Use  of  the  Data  Transfer  Channel  

4/21/16   40  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

LESS  THAN  1%!!!  

Page 41: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Sensi$vity  Study  

4/21/16   41  WiSync  –  ASPLOS  ’16  –  April  4th,  2016  

Page 42: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

PHY  in  Wireless  Communica$on  

4/21/16   42  Wireless  Networks-­‐on-­‐Chip  for  Manycore  Processors  

SER   MOD   DEMOD   DESER        

PHY  

     

PHY  

  Simple  schemes  to  keep  area  and  power  low.  Example:  On-­‐Off  Keying  (OOK)    

 

Page 43: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

MAC  in  Wireless  Communica$on  

4/21/16   43  Wireless  Networks-­‐on-­‐Chip  for  Manycore  Processors  

SER   MOD      

PHY  

DEMOD   DESER      

PHY  MAC   MAC  

  Access  to  the  shared  medium  needs  to  be  managed:   Medium  Access  Control  (MAC)  protocol  

  Challenges:      Scalability    Low  overhead    Fairness  

 

 Ways  to  do  it:    Avoiding  sharing    Contending  for  the  channel    Request  to  send  è  Clear  to  send    Token  passing  

 

Page 44: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Tone  opera$on  

4/21/16   44  

  Barriers  will  be  allocated  in  everyone’s  allocated  barrier  table.  If  the  core  par$cipates  in  that  barrier  (PID),  armed  bit  is  set  to  1.    First  guy  to  arrive  broadcasts  its  arrival  and  causes  the  alloca$on  of  the  barrier  in  everyone’s  Ac$ve  barrier  table.  

   

 

 

WiSync  

  Those  who  par$cipate  (armed)  raise  the  tone.    Whenever  they  arrive  to  the  barrier,  stop  the  tone.    When  everyone  has  arrived,  silence  è  barrier  release.    Upon  release,  deallocate  from  the  ac$ve  barrier  table.    At  execu$on’s  end,  deallocate  from  the  allocated  barrier  table.      

 

 

Page 45: WiSync:(An(Architecture(for(( Fast(Synchroniza5on(through(( …iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_asplos16_2.pdf · Mo$vaon! 4/21/16! WiSync(–ASPLOS(’16(–April(4

Sharing  the  tone  channel  

4/21/16   45  

  What  happens  when  there  is  more  than  one  ac$ve  barrier  at  the  same  $me  (mul$applica$on  scenario)    Time  is  sloded:  each  entry  in  the  Ac$veB  table  will  have  its  own.  

   

 

 WiSync  

  Round-­‐robin  scheduling.    When  a  new  barrier  becomes  ac$ve,  reschedule.    When  a  barrier  is  realeased,  reschedule.