a(framework(for(deploying(( signal(processing(applicaons ... · dsp1 dsp2 dsp3 vcp0(viterbi...

29
Atomix: A Framework for Deploying Signal Processing Applica:ons on Wireless Infrastructure Manu Bansal, Aaron Schulman, Sachin KaA Stanford University NSDI ‘15

Upload: others

Post on 04-Aug-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Atomix:  A  Framework  for  Deploying    

Signal  Processing  Applica:ons  on  Wireless  Infrastructure  

   Manu  Bansal,  Aaron  Schulman,  Sachin  KaA  

Stanford  University    

NSDI  ‘15      

Page 2: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

SoIware-­‐defined  base-­‐sta:ons  

int  main()  {          //PHY  &  MAC          …  }  

Wireless  infrastructure  is  programmable  

TI  6670  mul:core    DSP  SoC  

DSP0   DSP1  

Shared  SRAM  

Local  SRAM  

Local  SRAM  

Page 3: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

We  could  be  deploying  apps  into  infrastructure  

     

3  

RF  localiza:on  (SecureArray)  

Video-­‐op:mized    wireless  stack  

(APEX)  

BER  feedback  (SoIRate)  

What  modifica:ons  are  needed  to  deploy  apps?  

Page 4: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Primi:ves  needed  for  deploying  apps  

4  

Base-­‐sta:on  soIware  needs  to  provide    a  modular  interface  to  tap,  tweak,  and  insert  

CSI  

EQ   VITDEC  

Tap  

OFDM  SLICER  

(BER)  

Insert  

SYNC  

Tweak  

Page 5: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Apps  require  high  throughput  and  low  latency  

5  SoIware  must  deliver  hardware-­‐like  performance  

Time  250us   16us  

Data  frame   ACK  frame  

…  20Msps  ==  640Mbps  

High  processing  throughput  

Low  processing  latency  

WiFi  stack  example  (LTE  has  easier  latency  constraints)  

Page 6: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

GeAng  hardware-­‐like  performance  

6  

DSP0   DSP1  

Shared  SRAM  

Local  SRAM  

Local  SRAM  

Pipeline  programming    Memory  management    Inter-­‐core  data  transfers  

Customized  hardware  

Hand-­‐op:mized  soIware  

Page 7: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

GeAng  hardware-­‐like  performance  

7  

DSP0   DSP1  

Shared  SRAM  

Local  SRAM  

Local  SRAM  

Pipeline  programming    Memory  management    Inter-­‐core  data  transfers  

Customized  hardware  

Hand-­‐op:mized  soIware  

Page 8: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

GeAng  hardware-­‐like  performance  

8  

DSP0   DSP1  

Shared  SRAM  

Local  SRAM  

Local  SRAM  

Pipeline  programming    Memory  management    Inter-­‐core  data  transfers  

Modular  changes  can  have  unpredictable  effects  on  :ming  

Customized  hardware  

Hand-­‐op:mized  soIware  

Page 9: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

The  Atom  Abstrac:on  

9  

Atom  A  

tA  (e.g.  200  cycles)  

Atom:  A  unit  of  execu:on    with  fixed,  known  :ming  

Atom  A   Atom  B  

tA  +  tB  

Composability:  Composi:on  of  atoms  is  also  an  atom  

Base-­‐sta:on  soIware  in  Atomix…  1)  Can  be  built  en:rely  out  of  atoms  

2)  Achieves  hardware-­‐like  performance  3)  Enables  modular  modifica:ons  

Page 10: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Base-­‐sta:on  soIware  can  be  built  with  atoms  Atomix  WiFi  chain  can  meet  throughput  and  latency  Apps  are  easily  added  to  Atomix  WiFi  

10  

Page 11: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

WiFi  signal  processing  chain  

11  

Time  

Data  frame   Data  frame  

…  

CSI  EQ  

OFDM  SLICER  SYNC  

Data  flowgraph  over  signal  processing  blocks  

(BPSK  or  QPSK)  

(BPSK)   (QPSK)  

Page 12: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Implemen:ng  blocks  as  atoms  

12  

Slicer  QPSK  

BPSK  Constella:on?  BPSK:  200cy  QPSK:  400cy  

Input  data  length?  1  OFDM  symbol:  200cy  2  OFDM  symbols:  400cy  

N?   M?  

1)  Split  out  branches,  2)  Fix  data  lengths  Make  branching  explicit  

tBPSK48  =  200cy  

QPSK48  

48  cplx   48  bits  

48  cplx   96  bits  

BPSK48  

CSI  EQ  

OFDM  SLICER  SYNC  

(BPSK  or  QPSK)  

Page 13: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Implemen:ng  flowgraphs  as  atoms  

13  Make  branching  explicit  

CSI  EQ  

OFDM  SLICER  SYNC  

CSI  EQ  

OFDM  BPSK48  SYNC  

CSI  EQ  

OFDM  QPSK48  SYNC  

(BPSK  or  QPSK)  

Page 14: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Implemen:ng  flowgraphs  as  atoms  

Explicitly  model  data  access  cost  using  FIFO  atoms  14  

BPSK  Atom  

CSI  EQ  

OFDM  BPSK48  SYNC  

CSI  

EQ  

OFDM  

BPSK  SYNC  

F   F  

F  F  

F  F  

F   F   F  F   F  

Page 15: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

BPSK  F   F  

Parallelizing  flowgraphs  with  atoms  

15  

BPSK  Atom  CSI  

EQ  

OFDM  

F   F  

F  F  

F  F  

F  SYNC   F   F  

Core  0  

Core  1  

Page 16: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

BPSK  F   F  

Parallelizing  flowgraphs  with  atoms  

16  

EQUALIZER  Atom  –  Core  0  

BPSK48  F   F  

BPSK  Atom  –  Core  1  

Transfer  F   F  

CSI  

EQ  

OFDM  

F   F  

F  F  

F  F  

F  SYNC   F   F  

Core  0  

Core  1  

Explicitly  model  data  transfer  cost  as  an  atom  

Page 17: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Implemen:ng  decisions  with  atoms  

17  Make  branch  explicit,  push  to  the  top-­‐level  

DISPATCHER  Header  atom  

     

BPSK  atom  

     

QPSK  atom  

BPSK  

QPSK  

Page 18: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Atomix  framework  in  1-­‐slide  

•  Everything  as  an  atom  – Signal  processing  components  – Hardware  management  components  

•  Atoms  for  blocks,  flowgraphs,  states  •  Simple  control  flow  makes  atoms  composable  •  Declara:ve  language  allows  easy  modifica:on  

18  Atoms  enable  modularity,  precise  :ming,  efficient  pipelines  

Page 19: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Base-­‐sta:on  soIware  can  be  built  with  atoms  Atomix  WiFi  chain  can  meet  throughput  and  latency  Apps  are  easily  added  to  Atomix  WiFi  

19  

Page 20: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Fine-­‐grained  pipeline  parallelism  

20  

CSI  

EQ  

OFDM  

QAM64  

SYNC  

DEINTER-­‐LEAVER  

DEPUNC-­‐TURER  

DECODE-­‐SCATTERER  

VITERBI-­‐  ISSUE  

DECODE-­‐GATHERER  

DESCRA-­‐MBLER   CRC32  

DSP0  

DSP1  

DSP2  

DSP3  

VCP0  

VITERBI-­‐DECODING  

VITERBI-­‐DECODING  

VITERBI-­‐DECODING  

VITERBI-­‐DECODING  

VCP1   VCP2   VCP3  

80-­‐sample  buffers  (OFDM  symbols)  

Decoded  bits  …  010110  

…  

Page 21: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

21  

Page 22: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

22  

Packet    Energy  

CRC    Toggle  

Page 23: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Fine-­‐grained  pipeline  parallelism  

23  Atomix  WiFi  decodes  10MHz  with  resources  to  spare  

Page 24: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Tight  packet  decode  latency  

24  

WiFi  highest-­‐MCS  1000-­‐byte  packets,  CDF  of  decode  latency    (deadline  =  64us  at  5MHz,  32us  at  10MHz,  16us  at  20MHz)  

Atomix  WiFi  decodes  10MHz  in  low  latency,  predictable  :ming  

Page 25: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Experience  with  WiFi  in  Atomix  

25  

Low-­‐level  code  (C)  

Signal  processing  func:ons  (C)  

Parallelized  Atoms  (Ax)  

Schedule,  Resource  Assignment  

Atomix  compiler  

Atomix  run:me  libraries  

Na:ve  app  binary  

Na:ve  compiler  

3,000  Loc  (with  Atomix)  

30,000  LoC  (w/o  Atomix)  

Page 26: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Base-­‐sta:on  soIware  can  be  built  with  atoms  Atomix  WiFi  chain  can  meet  throughput  and  latency  Apps  are  easily  added  to  Atomix  WiFi  

26  

Page 27: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Loca:on-­‐signature  app  

No  change  in  WiFi  packet  decode  latency  27  

CSI  EQ  

OFDM  

SLICER  

PH  

SYNC  

Rxx   EIG   SPT  

4  new  signal  processing  blocks  30  lines  of  code  to  add  app  

Predictable  change  in  SYNC  atom  

2  new  signal  processing  blocks  20  lines  of  code  to  add  app  

Predictable  change  in  SYNC,  DATA  

CSI  EQ  

EVM  

OFDM  

SLICER  

CIR  

SYNC  

ED  

CNR  

Page 28: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Related  work  •  Modular  frameworks  for  GPPs  and  FPGAs  –  SORA:  works  on  GPPs,  no  clear  mapping  to  DSPs  – AirBlue:  Targets  FPGAs,  different  challenges  than  DSPs  –  Ziria:  complementary  to  Atomix    

•  Embedded  real-­‐:me  opera:ng  systems  (Neutrino,  VxWorks,  TI  SYS/BIOS)  –  Typically  for  low  sample  rate  apps  (e.g.,  an:-­‐locking  brakes)  

– Misfits  for  expressing  modular  signal-­‐processing  apps  – No  abstrac:ons  for  blocks,  flowgraphs,  state-­‐machine  

28  

Page 29: A(Framework(for(Deploying(( Signal(Processing(Applicaons ... · DSP1 DSP2 DSP3 VCP0(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VITERBI DECODING(VCP1( VCP2( VCP3(80Jsample(buffers((OFDMsymbols)(Decoded(bits

Conclusion  Atomix:  a  new  programming  framework  •  Everything  is  an  atom  •  Hardware-­‐like  performance  •  Modularity  to  tap,  tweak,  insert    Future  work:  •  Automated  resource  scheduling  •  Sta:c  program  checking  •  Extending  to  L2-­‐L7  NFV  packet  processing  

29