TRANSCRIPT
Atomix: A Framework for Deploying
Signal Processing Applications on Wireless Infrastructure
Manu Bansal, Aaron Schulman, Sachin Katti
Stanford University
NSDI ‘15
Software-defined base-stations
int main() { //PHY & MAC … }
Wireless infrastructure is programmable
TI 6670 multicore DSP SoC
DSP0 DSP1
Shared SRAM
Local SRAM
Local SRAM
We could be deploying apps into infrastructure
RF localization (SecureArray)
Video-optimized wireless stack (APEX)
BER feedback (SoftRate)
What modifications are needed to deploy apps?
Primitives needed for deploying apps
Base-station software needs to provide a modular interface to tap, tweak, and insert
CSI
EQ VITDEC
Tap
OFDM SLICER
(BER)
Insert
SYNC
Tweak
Apps require high throughput and low latency
Software must deliver hardware-like performance
Time 250us 16us
Data frame ACK frame
… 20Msps == 640Mbps
High processing throughput
Low processing latency
WiFi stack example (LTE has easier latency constraints)
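The 640 Mbps figure follows if each complex baseband sample is assumed to occupy 32 bits (16-bit I plus 16-bit Q). That sample width is an assumption, not something the slide states. A quick check of the arithmetic, which also expresses the 16 us window from the timeline in samples:

```python
SAMPLE_RATE_SPS = 20_000_000  # 20 Msps, from the slide
BITS_PER_SAMPLE = 32          # assumption: 16-bit I + 16-bit Q per complex sample


def raw_input_rate_bps(sample_rate_sps, bits_per_sample):
    """Bits per second arriving from the radio front end."""
    return sample_rate_sps * bits_per_sample


def samples_in_window(sample_rate_sps, window_us):
    """Number of samples that arrive during a window given in microseconds."""
    return sample_rate_sps * window_us // 1_000_000


rate_bps = raw_input_rate_bps(SAMPLE_RATE_SPS, BITS_PER_SAMPLE)
ack_window_samples = samples_in_window(SAMPLE_RATE_SPS, 16)
print(rate_bps // 1_000_000, "Mbps")
print(ack_window_samples, "samples arrive in 16 us")
```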
Getting hardware-like performance
DSP0 DSP1
Shared SRAM
Local SRAM
Local SRAM
Pipeline programming, memory management, inter-core data transfers
Customized hardware
Hand-optimized software
Modular changes can have unpredictable effects on timing
The Atom Abstraction
Atom A
tA (e.g. 200 cycles)
Atom: a unit of execution with fixed, known timing
Atom A Atom B
tA + tB
Composability: a composition of atoms is also an atom
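The composability rule can be sketched in a few lines. This is an illustration of the idea only, not the Atomix API: the `Atom` class, the `compose` helper, and the 300-cycle cost for atom B are all hypothetical (only the 200-cycle example for atom A comes from the slide).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Atom:
    name: str
    cycles: int                      # fixed, known execution time
    run: Callable[[object], object]  # the work the atom performs


def compose(a: Atom, b: Atom) -> Atom:
    """Run a, then b. The result is again an atom with timing tA + tB."""
    return Atom(
        name=f"{a.name}+{b.name}",
        cycles=a.cycles + b.cycles,
        run=lambda x: b.run(a.run(x)),
    )


atom_a = Atom("A", 200, lambda x: x + 1)  # 200 cycles, as on the slide
atom_b = Atom("B", 300, lambda x: x * 2)  # 300 cycles is a made-up value
atom_ab = compose(atom_a, atom_b)
print(atom_ab.name, atom_ab.cycles)
```

Because `compose` returns an `Atom`, the result can be composed again, which is what lets a whole flowgraph carry a known cycle count.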
Base-station software in Atomix…
1) Can be built entirely out of atoms
2) Achieves hardware-like performance
3) Enables modular modifications
Base-station software can be built with atoms
Atomix WiFi chain can meet throughput and latency
Apps are easily added to Atomix WiFi
WiFi signal processing chain
Time
Data frame Data frame
…
CSI EQ
OFDM SLICER SYNC
Data flowgraph over signal processing blocks
(BPSK or QPSK)
(BPSK) (QPSK)
Implementing blocks as atoms
Slicer (BPSK or QPSK), input length N?, output length M?
Constellation? BPSK: 200cy, QPSK: 400cy
Input data length? 1 OFDM symbol: 200cy, 2 OFDM symbols: 400cy
1) Split out branches, 2) Fix data lengths
BPSK48: 48 cplx in, 48 bits out, tBPSK48 = 200cy
QPSK48: 48 cplx in, 96 bits out
Make branching explicit
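Splitting the generic slicer into fixed-size variants might look like the sketch below. The function names mirror the slide's BPSK48 and QPSK48 labels, but the hard-decision bit mapping (1 for a non-negative component) is an assumption, and the cycle table simply reuses the slide's 200cy and 400cy figures.

```python
N_SUBCARRIERS = 48  # complex samples per call, fixed as on the slide


def bpsk48(symbols):
    """48 complex samples in, 48 bits out. No data-dependent branching on size."""
    if len(symbols) != N_SUBCARRIERS:
        raise ValueError("bpsk48 takes exactly 48 complex samples")
    return [1 if s.real >= 0 else 0 for s in symbols]


def qpsk48(symbols):
    """48 complex samples in, 96 bits out (one bit each from I and Q)."""
    if len(symbols) != N_SUBCARRIERS:
        raise ValueError("qpsk48 takes exactly 48 complex samples")
    bits = []
    for s in symbols:
        bits.append(1 if s.real >= 0 else 0)
        bits.append(1 if s.imag >= 0 else 0)
    return bits


# With the constellation and input length fixed, each variant has one cost.
CYCLES = {"bpsk48": 200, "qpsk48": 400}

symbols = [complex(1, -1)] * N_SUBCARRIERS
print(len(bpsk48(symbols)), len(qpsk48(symbols)))
```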
CSI EQ
OFDM SLICER SYNC
(BPSK or QPSK)
Implementing flowgraphs as atoms
Make branching explicit
[Figure: the flowgraph with a generic SLICER (BPSK or QPSK) is split into two flowgraphs over the same CSI, EQ, OFDM, and SYNC blocks, one using BPSK48 and one using QPSK48]
Implementing flowgraphs as atoms
Explicitly model data access cost using FIFO atoms
[Figure: the CSI, EQ, OFDM, BPSK48, and SYNC blocks redrawn with FIFOs (F) between blocks; the "BPSK Atom" is the BPSK block together with a FIFO on its input and a FIFO on its output]
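A minimal sketch of charging data access to FIFO operations, so that an atom's cost is the block's cycles plus its FIFO reads and writes. The `Fifo` class, the `run_atom` helper, and the 20-cycle access cost are hypothetical and are not taken from the Atomix runtime.

```python
from collections import deque


class Fifo:
    """Bounded FIFO of buffers; every access has a fixed, known cycle cost."""

    ACCESS_CYCLES = 20  # made-up cost of one get or put

    def __init__(self, depth):
        self.depth = depth
        self.buffers = deque()

    def put(self, buf):
        if len(self.buffers) >= self.depth:
            raise RuntimeError("FIFO full")
        self.buffers.append(buf)
        return self.ACCESS_CYCLES

    def get(self):
        if not self.buffers:
            raise RuntimeError("FIFO empty")
        return self.buffers.popleft(), self.ACCESS_CYCLES


def run_atom(block, block_cycles, fifo_in, fifo_out):
    """FIFO read, then the block, then FIFO write. Returns total cycles spent."""
    data, t_get = fifo_in.get()
    t_put = fifo_out.put(block(data))
    return t_get + block_cycles + t_put


fifo_in, fifo_out = Fifo(depth=2), Fifo(depth=2)
fifo_in.put([1.0, -1.0])
total_cycles = run_atom(
    lambda xs: [1 if x >= 0 else 0 for x in xs], 200, fifo_in, fifo_out
)
print(total_cycles)
```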
Parallelizing flowgraphs with atoms
[Figure: the CSI, EQ, OFDM, BPSK, and SYNC atoms with their FIFOs (F) divided between Core 0 and Core 1; the BPSK atom is called out]
Parallelizing flowgraphs with atoms
[Figure: the EQUALIZER atom runs on Core 0 and the BPSK atom (BPSK48) runs on Core 1; a Transfer atom moves data between their FIFOs (F)]
Explicitly model data transfer cost as an atom
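Treating the inter-core copy as one more atom means it appears in a core's schedule with its own cycle count. In the sketch below the 50-cycle transfer cost and the 300-cycle EQ cost are made up, and charging the transfer to core 0 is an assumption; only the 200-cycle BPSK48 figure comes from the earlier slide.

```python
TRANSFER_CYCLES = 50  # hypothetical fixed cost of one inter-core copy


def transfer_atom(src_fifo, dst_fifo):
    """Move one buffer from a FIFO on one core to a FIFO on another core."""
    dst_fifo.append(src_fifo.pop(0))
    return TRANSFER_CYCLES


def busy_cycles(schedule):
    """Total cycles a core spends on one pass through its list of atoms."""
    return sum(cycles for _name, cycles in schedule)


# Per-core schedules for one OFDM symbol (assumed placement and costs).
core0 = [("EQ", 300), ("TRANSFER", TRANSFER_CYCLES)]
core1 = [("BPSK48", 200)]

core0_out, core1_in = [["equalized symbol"]], []
spent = transfer_atom(core0_out, core1_in)

# The slower core sets the rate at which the pipeline accepts symbols.
bottleneck = max(busy_cycles(core0), busy_cycles(core1))
print(spent, bottleneck)
```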
Implementing decisions with atoms
Make branch explicit, push to the top-level
DISPATCHER Header atom
BPSK atom
QPSK atom
BPSK
QPSK
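Pushing the branch to the top level can be pictured as a dispatcher that runs a header atom, reads its decision, and then hands off to a branch-free state. The frame layout, function names, and bit mapping below are illustrative assumptions rather than Atomix constructs.

```python
def header_atom(frame):
    """Decode the header and return the decision for the dispatcher."""
    return frame["modulation"]


def bpsk_state(frame):
    """Branch-free BPSK path: one bit per symbol."""
    return [1 if s.real >= 0 else 0 for s in frame["symbols"]]


def qpsk_state(frame):
    """Branch-free QPSK path: two bits per symbol."""
    bits = []
    for s in frame["symbols"]:
        bits += [1 if s.real >= 0 else 0, 1 if s.imag >= 0 else 0]
    return bits


STATES = {"BPSK": bpsk_state, "QPSK": qpsk_state}


def dispatcher(frame):
    """The only place a data-dependent branch happens."""
    decision = header_atom(frame)
    if decision not in STATES:
        raise ValueError(f"unknown modulation: {decision!r}")
    return STATES[decision](frame)


frame = {"modulation": "QPSK", "symbols": [complex(1, -1)] * 48}
bits = dispatcher(frame)
print(len(bits))
```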
Atomix framework in one slide
• Everything as an atom
– Signal processing components
– Hardware management components
• Atoms for blocks, flowgraphs, states
• Simple control flow makes atoms composable
• Declarative language allows easy modification
Atoms enable modularity, precise timing, efficient pipelines
Fine-grained pipeline parallelism
[Figure: WiFi receive pipeline mapped onto four DSP cores (DSP0 to DSP3) and four Viterbi coprocessors (VCP0 to VCP3)]
Blocks: CSI, EQ, OFDM, QAM64, SYNC, DEINTERLEAVER, DEPUNCTURER, DECODE-SCATTERER, VITERBI-ISSUE, VITERBI-DECODING (x4), DECODE-GATHERER, DESCRAMBLER, CRC32
Input: 80-sample buffers (OFDM symbols)
Output: decoded bits … 010110
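One way to read the 80-sample buffers: an 802.11a/g OFDM symbol is 64 FFT samples plus a 16-sample cyclic prefix, so each pipeline stage has one symbol period to finish its per-symbol work. The sketch below checks that condition. The per-core cycle counts and the 1000 MHz clock are placeholders, not measurements from the talk.

```python
SAMPLES_PER_SYMBOL = 80  # 64 FFT samples + 16 cyclic-prefix samples


def symbol_period_us(sample_rate_msps):
    """Duration of one OFDM symbol at the given sample rate."""
    return SAMPLES_PER_SYMBOL / sample_rate_msps


def stage_budget_cycles(sample_rate_msps, clock_mhz):
    """Cycles a pipeline stage may spend per symbol without falling behind."""
    return symbol_period_us(sample_rate_msps) * clock_mhz


def pipeline_keeps_up(stage_cycles, sample_rate_msps, clock_mhz):
    """True if every stage fits inside one symbol period."""
    budget = stage_budget_cycles(sample_rate_msps, clock_mhz)
    return all(cycles <= budget for cycles in stage_cycles.values())


# Hypothetical per-symbol cost of the atoms assigned to each core.
stage_cycles = {"DSP0": 5200, "DSP1": 6100, "DSP2": 4800, "DSP3": 7000}

print(symbol_period_us(10), "us per symbol at 10 Msps")
print(pipeline_keeps_up(stage_cycles, sample_rate_msps=10, clock_mhz=1000))
```

With these placeholder numbers the pipeline fits at 10 Msps but not at 20 Msps, where the per-symbol budget halves.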
[Figure: traces labeled Packet Energy and CRC Toggle]
Fine-grained pipeline parallelism
Atomix WiFi decodes 10MHz with resources to spare
Tight packet decode latency
WiFi highest-MCS 1000-byte packets, CDF of decode latency (deadline = 64us at 5MHz, 32us at 10MHz, 16us at 20MHz)
Atomix WiFi decodes 10MHz with low latency and predictable timing
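The three deadlines on the slide scale inversely with channel bandwidth, starting from 16 us at 20 MHz. A small sketch of that relation, plus the cycle budget it implies; the 1000 MHz core clock is a placeholder parameter, not a figure from the talk.

```python
DEADLINE_US_AT_20MHZ = 16  # from the slide


def decode_deadline_us(bandwidth_mhz):
    """Decode deadline, scaled inversely with bandwidth from the 20 MHz case."""
    return DEADLINE_US_AT_20MHZ * 20 / bandwidth_mhz


def cycle_budget(bandwidth_mhz, clock_mhz):
    """Core cycles available between end of packet and the deadline."""
    return decode_deadline_us(bandwidth_mhz) * clock_mhz


for bw in (5, 10, 20):
    print(bw, "MHz:", decode_deadline_us(bw), "us")
```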
Experience with WiFi in Atomix
Low-level code (C)
Signal processing functions (C)
Parallelized Atoms (Ax)
Schedule, Resource Assignment
Atomix compiler
Atomix runtime libraries
Native app binary
Native compiler
3,000 LoC (with Atomix)
30,000 LoC (w/o Atomix)
Location-signature app
No change in WiFi packet decode latency
CSI EQ
OFDM
SLICER
PH
SYNC
Rxx EIG SPT
4 new signal processing blocks, 30 lines of code to add app
Predictable change in SYNC atom
2 new signal processing blocks, 20 lines of code to add app
Predictable change in SYNC, DATA
CSI EQ
EVM
OFDM
SLICER
CIR
SYNC
ED
CNR
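Adding an app by tapping an existing block can be pictured as adding one edge to the flowgraph while leaving every existing edge untouched. The sketch below is not Atomix's declarative language; the dictionary representation, the wiring between blocks, and the `APP_BLOCK` name are all assumptions made for illustration.

```python
def tap(flowgraph, source, new_block):
    """Return a copy of the flowgraph in which new_block also reads source's output."""
    if source not in flowgraph:
        raise KeyError(f"no such block: {source}")
    tapped = {block: list(consumers) for block, consumers in flowgraph.items()}
    tapped[source].append(new_block)
    tapped.setdefault(new_block, [])
    return tapped


# Assumed wiring of the receive chain: block -> list of consumers.
wifi = {
    "SYNC": ["OFDM"],
    "OFDM": ["CSI", "EQ"],
    "CSI": ["EQ"],
    "EQ": ["SLICER"],
    "SLICER": [],
}

with_app = tap(wifi, "CSI", "APP_BLOCK")
print(with_app["CSI"])
```

The original chain is left unmodified, which is the property that makes the change to the decode path easy to reason about.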
Related work
• Modular frameworks for GPPs and FPGAs
– SORA: works on GPPs, no clear mapping to DSPs
– AirBlue: targets FPGAs, different challenges than DSPs
– Ziria: complementary to Atomix
• Embedded real-time operating systems (Neutrino, VxWorks, TI SYS/BIOS)
– Typically for low sample rate apps (e.g., anti-lock brakes)
– Misfits for expressing modular signal-processing apps
– No abstractions for blocks, flowgraphs, state machines
Conclusion
Atomix: a new programming framework
• Everything is an atom
• Hardware-like performance
• Modularity to tap, tweak, insert
Future work:
• Automated resource scheduling
• Static program checking
• Extending to L2-L7 NFV packet processing