RICE UNIVERSITY
Flexible wireless communication architectures
Sridhar Rajagopal
Department of Electrical and Computer Engineering
Rice University, Houston TX
Faculty Candidate Seminar – University of Rochester
March 31, 2003
This work has been supported in part by NSF, Nokia and TI
Future wireless devices demand flexibility
High data rate mobile devices with multimedia
Multiple antennas w/ complex signal processing algorithms
High performance and low power needs
Multiple algorithms and environments supported in same device
Fast design time
Bluetooth/Home Networks
Wireless Cellular
Wireless LAN
Flexibility needed in different layers
Physical Layer
MAC Layer
Network Layer
Application Layer
Support for multiple wireless environments and algorithms at high data rates
Puppeteer project at Rice: http://www.cs.rice.edu/CS/Systems/Puppeteer/
Analog RF
Research vision: Attain flexibility
Architectures:
  Flexibility: support a variety of sophisticated algorithms
  High performance: GOPs of computation (Mbps)
  Low power: < 500 mW
Algorithms:
  Need efficient algorithms for mapping to architectures
Fast design exploration for efficient algorithms & architectures
Design me
My contributions: Algorithms
Multi-user channel estimation: [Jnl. of VLSI Sig. Proc.’02, ASAP’00]
  Matrix inversions
  Numerical techniques: conjugate-gradient descent for complexity reduction
Multi-user detection: [ISCAS’01]
  Block-based computation to streaming computations
  Pipelining, lower memory requirements
Parallel, fixed-point, streaming VLSI implementations [IEEE Trans. Wireless Comm.’02]
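The conjugate-gradient idea mentioned above can be sketched as follows. This is a generic textbook solver for a small symmetric positive-definite system A x = b, shown only to illustrate how an explicit matrix inversion can be avoided; it is not the estimator from the cited papers. The name `cg_solve`, the `CG_MAX_N` limit and the dense row-major layout are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { CG_MAX_N = 16 };   /* assumed upper bound on the system size */

/* Solve A x = b for a symmetric positive-definite A (row-major, n x n)
   by conjugate gradients, starting from x = 0.
   Returns the number of iterations performed. */
static int cg_solve(size_t n, const double *A, const double *b,
                    double *x, int max_iter, double tol)
{
    double r[CG_MAX_N], p[CG_MAX_N], Ap[CG_MAX_N];
    double rr = 0.0;
    int k;

    assert(n >= 1 && n <= CG_MAX_N);
    for (size_t i = 0; i < n; ++i) {
        x[i] = 0.0;
        r[i] = b[i];              /* residual r = b - A x, with x = 0 */
        p[i] = b[i];              /* first search direction */
        rr += r[i] * r[i];
    }
    for (k = 0; k < max_iter && rr > tol * tol; ++k) {
        double pAp = 0.0, rr_new = 0.0;
        for (size_t i = 0; i < n; ++i) {      /* Ap = A * p */
            Ap[i] = 0.0;
            for (size_t j = 0; j < n; ++j)
                Ap[i] += A[i * n + j] * p[j];
            pAp += p[i] * Ap[i];
        }
        double alpha = rr / pAp;              /* step length */
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rr_new += r[i] * r[i];
        }
        double beta = rr_new / rr;            /* direction update weight */
        for (size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return k;
}
```

Each iteration needs only one matrix-vector product, which is the kind of regular computation that maps well to parallel ALUs.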
My contributions: Architectures
Heterogeneous DSP-FPGA system designs: [ICSPAT’00]
Computer arithmetic: [Symp. on Comp. Arith’01]
  Dynamic truncation in ASICs using on-line arithmetic
[Ph.D. Thesis]
Scalable Wireless Application-specific Processors (SWAPs)
Rapid architecture exploration for flexibility-performance tradeoffs
Scalable Wireless Application-specific Processors
Family of flexible programmable processors
  Clusters of ALUs
  High performance by supporting 100’s of ALUs
  Can provide customization for various algorithms
  Adapts (“swaps”) architecture dynamically for power
[Figure: clusters of ALUs (+, *); scale the number of clusters and scale the number of ALUs per cluster]
Rapid design exploration for SWAPs
Low “complexity”, parallel, fixed-point algorithms
[Figure: architecture exploration applying ASIC design and DSP design to SWAPs (clusters of adders and multipliers)]
Research vision summary
Provide a framework to rapidly explore flexible, high performance, low power architectures (SWAPs)
Efficient algorithm design for mapping to SWAPs
  Understanding of algorithms, DSPs and ASICs used
  Flexibility-performance trade-off with increasing customization in SWAPs
Inter-disciplinary research:
  Wireless communications, VLSI signal processing, computer architecture, computer arithmetic, CAD, compilers
Talk Outline
Research vision
SWAPs - Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
SWAPs borrow from DSPs
DSPs use
  Instruction Level Parallelism (ILP)
  Subword Parallelism (MMX)
Current DSPs
  Not enough functional units (ALUs) for GOPs of computation
  • Cannot extend to more ALUs
  • TI C6x DSP has 8 ALUs; need 100’s of ALUs
  Cannot support more registers (area, ports)
  Difficult to find ILP as ALUs increase
SWAPs borrow from ASICs
Exploit data parallelism (DP) also
  Available in many wireless algorithms
  This is what ASICs do!
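#define N 1024
int i, a[N], b[N], c[N];       // 32 bits
short int d[N], e[N], f[N];    // 16 bits, packed

for (i = 0; i < N; ++i)        // DP: loop iterations are independent
{
    a[i] = b[i] + c[i];        // ILP: the two additions are independent
    d[i] = e[i] + f[i];        // Subword: 16-bit operands can be packed
}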
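Subword parallelism can be emulated in portable C to show what a packed 16-bit add does. The sketch below is illustrative only; the helper names `pack16` and `add16x2` are hypothetical, and DSPs or MMX-style units provide this operation directly in hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit word (hi in the upper half). */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Add both 16-bit lanes with a single 32-bit addition. Each lane wraps
   modulo 2^16 and no carry crosses from the low lane into the high lane. */
static uint32_t add16x2(uint32_t a, uint32_t b)
{
    const uint32_t low15 = 0x7FFF7FFFu;   /* every bit except each lane's MSB */
    const uint32_t msb   = 0x80008000u;   /* each lane's MSB */
    uint32_t sum = (a & low15) + (b & low15);   /* carries stop at the MSBs */
    return sum ^ ((a ^ b) & msb);               /* add the MSBs, drop carry-out */
}
```

One 32-bit ALU operation therefore performs two of the `d[i] = e[i] + f[i]` additions from the loop above.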
SWAPs borrow from stream processors
[Figure: kernels (correlator, channel estimation, matched filter, interference cancellation, Viterbi decoding) connected by streams, from the received signal (input data) to the decoded bits (output data)]
Kernels (computation) and streams (communication)
  Operations on kernels use local data in clusters, providing GOPs support
  Streams expose data parallelism
Imagine stream processor at Stanford [Rixner’01]
Scott Rixner. Stream Processor Architecture, Kluwer Academic Publishers: Boston, MA, 2001.
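The kernel/stream split can be sketched in plain C. This is not Imagine or SWAP code: the two kernels, their names and the +/-1 chip representation are assumptions chosen for illustration. Each kernel reads an input stream and writes an output stream, and the stream program simply chains them.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel 1 (hypothetical): correlate each symbol's chips with a +/-1
   spreading code of length sf, producing one soft value per symbol. */
static void correlate_kernel(const int *chips, size_t n_sym,
                             const int *code, size_t sf, int *soft)
{
    for (size_t s = 0; s < n_sym; ++s) {   /* symbols are independent (DP) */
        int acc = 0;
        for (size_t c = 0; c < sf; ++c)
            acc += chips[s * sf + c] * code[c];
        soft[s] = acc;
    }
}

/* Kernel 2 (hypothetical): hard decision, negative soft value gives bit 1. */
static void decide_kernel(const int *soft, size_t n_sym, int *bits)
{
    for (size_t s = 0; s < n_sym; ++s)
        bits[s] = soft[s] < 0;
}

/* Stream program: the intermediate stream `soft` connects the kernels. */
static void receive(const int *chips, size_t n_sym, const int *code,
                    size_t sf, int *soft, int *bits)
{
    correlate_kernel(chips, n_sym, code, sf, soft);
    decide_kernel(soft, n_sym, bits);
}
```

Because every symbol is processed the same way, the outer loops are the data parallelism that a multi-cluster machine could spread across clusters.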
SWAPs: multi-cluster DSPs
[Figure: a DSP is one cluster of ALUs (+, *) with internal memory, exploiting ILP; SWAPs replicate the cluster to exploit DP as well as ILP, with a Stream Register File (SRF) as memory]
SWAPs adapt clusters to DP
  Identical clusters, same operations
  Power down unused FUs and clusters
Arithmetic clusters in SWAPs
ALUs (+, *, /)
Scratch-pad (Sp)
  Indexed accesses
Comm. unit (CU)
  Intercluster comm.
Distributed reg. files
  Support more ALUs
[Figure: an arithmetic cluster: ALUs (+, *, /), scratch-pad (Sp) and comm. unit (CU), each with a local register file, joined by a cross point to the intercluster network and from/to the SRF]
SWAPs: Physical layer algorithms
[Figure: receiver chain: antenna, RF front-end, baseband processing (channel estimation, detection, decoding), then higher (MAC/Network/OS) layers]
SWAP mapping example: Viterbi decoding
Multiple antenna systems (MIMO systems)
  Complexity exponential with transmit x receive antennas
Estimation: Linear MMSE, blind, conjugate gradient….
Detection: FFT, (blind) interference cancellation….
Decoding: Viterbi, Turbo, LDPC…. & joint schemes
SWAP flexibility lets you use the best algorithms for the situation
Example for concept demonstration: Viterbi decoding
Parallel Viterbi Decoding for SWAPs
Add-Compare-Select (ACS): trellis interconnect, computations
  Parallelism depends on constraint length (#states)
Traceback: searching
  Conventional
  • Sequential (no DP) with dynamic branching
  • Difficult to implement in a parallel architecture
  Use Register Exchange (RE)
  • Parallel solution
[Figure: detected bits enter the ACS unit, followed by the traceback unit, which outputs the decoded bits]
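A minimal sketch of ACS combined with register exchange, to make the contrast with sequential traceback concrete. Everything specific here is an assumption for illustration: a small K = 3, rate-1/2 code with generators 7 and 5, hard-decision Hamming metrics, messages of at most 32 bits, and the function names. Every state keeps its whole survivor bit sequence in a register, so all states update independently in each step and no traceback search is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { K = 3, S = 1 << (K - 1) };       /* constraint length, number of states */
static const unsigned G0 = 7, G1 = 5;   /* assumed generator polynomials */

static unsigned parity(unsigned x)
{
    unsigned p = 0;
    for (; x; x >>= 1) p ^= x & 1u;
    return p;
}

/* Two coded bits produced when input bit u leaves state s. */
static unsigned branch_output(unsigned s, unsigned u)
{
    unsigned r = (s << 1) | u;          /* K-bit encoder shift register */
    return (parity(r & G0) << 1) | parity(r & G1);
}

/* Rate-1/2 encoder: n message bits (MSB first) to n two-bit symbols. */
static void conv_encode(uint32_t msg, int n, unsigned char *sym)
{
    unsigned s = 0;
    for (int t = 0; t < n; ++t) {
        unsigned u = (msg >> (n - 1 - t)) & 1u;
        sym[t] = (unsigned char)branch_output(s, u);
        s = ((s << 1) | u) & (S - 1);
    }
}

/* Hard-decision Viterbi decoder using register exchange. */
static uint32_t viterbi_re(const unsigned char *sym, int n)
{
    unsigned pm[S], npm[S];             /* path metrics */
    uint32_t reg[S], nreg[S];           /* survivor bits, one register per state */
    unsigned best = 0;

    assert(n >= 1 && n <= 32);
    for (unsigned s = 0; s < S; ++s) { pm[s] = s ? 1000u : 0u; reg[s] = 0; }

    for (int t = 0; t < n; ++t) {
        for (unsigned ns = 0; ns < S; ++ns) {    /* states are independent (DP) */
            unsigned u = ns & 1u;                /* input bit leading into ns */
            npm[ns] = ~0u;
            for (unsigned j = 0; j < 2; ++j) {   /* the two predecessors of ns */
                unsigned p = (ns >> 1) | (j << (K - 2));
                unsigned d = branch_output(p, u) ^ sym[t];
                unsigned m = pm[p] + (d & 1u) + (d >> 1);   /* add */
                if (m < npm[ns]) {                          /* compare */
                    npm[ns] = m;                            /* select */
                    nreg[ns] = (reg[p] << 1) | u;           /* register exchange */
                }
            }
        }
        memcpy(pm, npm, sizeof pm);
        memcpy(reg, nreg, sizeof reg);
    }
    for (unsigned s = 1; s < S; ++s)
        if (pm[s] < pm[best]) best = s;
    return reg[best];                   /* decoded bits, first bit in the MSB */
}
```

The price of register exchange is copying one register per state per step, which is regular data movement rather than the data-dependent branching of traceback.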
Parallel Viterbi needs re-ordering for SWAPs
Exploiting Viterbi DP in SWAPs:
  Use RE instead of regular traceback
  Re-order ACS, RE
[Figure: regular ACS maps states X(0), X(1), ..., X(15) to X(0), X(1), ..., X(15) in order; ACS in SWAPs uses the re-ordered DP vector X(0), X(2), ..., X(14), X(1), X(3), ..., X(15)]
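The index pattern in the figure (even-numbered states first, then odd-numbered states) can be written as a small permutation. This is one reading of the slide, not the actual SWAP kernel; the function name `even_odd_reorder` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Re-order a state vector so that even-indexed entries come first,
   followed by odd-indexed entries:
   X(0), X(2), ..., X(n-2), X(1), X(3), ..., X(n-1). */
static void even_odd_reorder(const int *in, int *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; ++i) {
        out[i] = in[2 * i];
        out[n / 2 + i] = in[2 * i + 1];
    }
}
```

In a shift-register trellis, one pair of predecessor states feeds successor states 2j and 2j+1, so grouping evens and odds keeps each half of the result contiguous as a vector.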
Designing the SWAP architecture
More clusters are better than more ALUs per cluster
1. Decide how many clusters
   Exploit DP
2. Decide what to put within each cluster
   Maximize ILP with high functional unit efficiency
   Search design space with “explore” tool
   Time-power-area characterization
[Figure: clusters of adders (+) and multipliers (*); ILP within a cluster, DP across clusters; the ALU mix and the cluster count are the unknowns]
Design a SWAP cluster: “Explore”
Auto-exploration of adders and multipliers for “ACS”
[Figure: instruction count (axis from 40 to 160) versus #adders (1 to 5) and #multipliers (1 to 5); each of the 25 design points is labeled (adder util%, multiplier util%), with adder utilization between 39% and 85% and multiplier utilization between 11% and 62%]
“Explore” tool benefits
Instruction count vs. ALU efficiency
  What goes inside each cluster
Design customized application-specific units
  Better performance with increased ALU utilization
Explore Algorithm 1: 3 adders, 3 multipliers, 32 clusters
Explore Algorithm 2: 4 adders, 1 multiplier, 64 clusters
Chosen Architecture: 4 adders, 3 multipliers, 64 clusters
Explore multiple algorithms
  Turn off functional units not in use for a given kernel
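The chosen architecture above is consistent with taking, for each resource, the largest requirement over the explored algorithms. The sketch below assumes that rule; the type `swap_config` and the functions `cover` and `idle_units` are hypothetical names, not part of the “explore” tool.

```c
#include <assert.h>

typedef struct {
    int adders;        /* adders per cluster */
    int multipliers;   /* multipliers per cluster */
    int clusters;      /* number of clusters */
} swap_config;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest configuration that satisfies both requirements. */
static swap_config cover(swap_config a, swap_config b)
{
    swap_config c;
    c.adders = max_int(a.adders, b.adders);
    c.multipliers = max_int(a.multipliers, b.multipliers);
    c.clusters = max_int(a.clusters, b.clusters);
    return c;
}

/* Resources that a kernel with requirement `need` leaves unused on `arch`,
   and that could therefore be turned off while it runs. */
static swap_config idle_units(swap_config arch, swap_config need)
{
    swap_config idle;
    assert(arch.adders >= need.adders);
    assert(arch.multipliers >= need.multipliers);
    assert(arch.clusters >= need.clusters);
    idle.adders = arch.adders - need.adders;
    idle.multipliers = arch.multipliers - need.multipliers;
    idle.clusters = arch.clusters - need.clusters;
    return idle;
}
```

With the slide's numbers, Algorithm 1 would leave 1 adder per cluster and 32 clusters idle, and Algorithm 2 would leave 2 multipliers per cluster idle.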
SWAP flexibility provides power savings
Multiple algorithms
  Different ALU requirements
  Different cluster requirements
Turning off ALUs
  Use the right #ALUs for each kernel, from the static code schedule
Turning off clusters
  Data is spread across the SRF of all clusters
  Each cluster does not have access to the entire SRF
  Next kernel may need data from the SRF of other clusters
  Reconfiguration support needs to be provided
SWAPs provide cluster scaling
Use mux-demux buffers
Latency hidden - Minimal loss in performance
Can turn off clusters entirely
[Figure: mux-demux buffers placed between the SRF and the clusters]
Viterbi reconfiguration using SWAPs
Packet 1: constraint length 7 (16 clusters)
Packet 2: constraint length 9 (64 clusters)
Packet 3: constraint length 5 (4 clusters)
[Figure: clusters in use for DP per packet; the remaining clusters can be turned OFF]
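The three cluster counts above match a simple rule: a constraint-length-K code has 2^(K-1) trellis states, and each active cluster handles 4 of them. The ratio of 4 states per cluster is inferred from the three data points on the slide, not stated in the talk, and the function names are hypothetical.

```c
#include <assert.h>

enum { MAX_CLUSTERS = 64, STATES_PER_CLUSTER = 4 };   /* 4 is an inferred ratio */

/* Number of trellis states for constraint length k. */
static int viterbi_states(int k)
{
    assert(k >= 1 && k <= 16);
    return 1 << (k - 1);
}

/* Clusters kept on for constraint length k, clamped to the machine size. */
static int active_clusters(int k)
{
    int c = viterbi_states(k) / STATES_PER_CLUSTER;
    if (c < 1) c = 1;
    if (c > MAX_CLUSTERS) c = MAX_CLUSTERS;
    return c;
}

/* Clusters that can be turned off for constraint length k. */
static int idle_clusters(int k)
{
    return MAX_CLUSTERS - active_clusters(k);
}
```

Under this rule a K = 5 packet would leave 60 of the 64 clusters free to be powered down.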
Run-time SWAP flexibility
[Figure: execution time (cycles), split between clusters and memory, for 64-bit rate ½ packets: Packet 1 (K = 7), Packet 2 (K = 9), Packet 3 (K = 5); annotations: “Kernels (Computation)”, “No Data Memory accesses”]
SWAP exploration for Viterbi decoding
[Figure: frequency needed to attain real-time (in MHz, log scale from 1 to 1000) versus number of clusters (log scale from 1 to 100) for K = 9, K = 7 and K = 5; annotations: different SWAPs (without reconfiguration), same SWAP (with reconfiguration), Max DP, DSP]
Ideal C64x (w/o co-proc) needs ~200 MHz for real-time
SWAPs : Salient features
1-2 orders of magnitude better than a DSP
Any constraint length ⇒ 10 MHz at 128 Kbps
Same code for all constraint lengths
  No need to re-compile or load another code, as long as the parallelism/cluster ratio is constant
Power savings due to dynamic cluster scaling
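The clock rates quoted in the talk can be turned into a cycle budget per decoded bit. The helper below is a small worked calculation, assuming 1 Kbps = 1000 bit/s; the function name is hypothetical.

```c
#include <assert.h>

/* Clock cycles available per bit at a given clock rate and bit rate. */
static double cycles_per_bit(double clock_hz, double bits_per_second)
{
    assert(bits_per_second > 0.0);
    return clock_hz / bits_per_second;
}
```

At 10 MHz and 128 Kbps this gives about 78 cycles per bit, against about 1560 cycles per bit for the ~200 MHz quoted for the ideal C64x, a factor of 20.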
Expected SWAP power consumption
Power model based on [Khailany’03]
64 clusters and 1 multiplier per cluster:
  0.13 micron, 1.2 V
  Peak Active Power: ~9 mW at 1 MHz (DSP ~1 mW at 1 MHz)
  Area: ~53.7 mm²
  10 MHz, 128 Kbps with reconfiguration (DSP ~200 mW)
Exploring the VLSI Scalability of Stream Processors, Brucek Khailany et al., Proceedings of the Ninth Symposium on High Performance Computer Architecture, February 8-12, 2003
[Figure: power (in mW, axis from 0 to 90) versus active clusters (axis from 0 to 70, max 64)]
Viterbi    Clusters   Peak Power
K = 9      64         ~90 mW
K = 7      16         ~28.57 mW
K = 5      4          ~13.8 mW
overhead   0          ~8.1 mW
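As a rough cross-check of the table, power can be interpolated linearly between the two end points (8.1 mW with no active clusters, 90 mW with all 64). The linear model is an assumption made here, not the model from [Khailany’03]; it gives about 28.6 mW for 16 clusters and about 13.2 mW for 4 clusters, close to but not identical with the 13.8 mW on the slide.

```c
#include <assert.h>

/* Linear interpolation of peak Viterbi power at 10 MHz between the
   slide's end points. The linearity is an assumption of this sketch. */
static double viterbi_power_mw(int active_clusters)
{
    const double overhead_mw = 8.1;   /* slide value for 0 active clusters */
    const double full_mw = 90.0;      /* slide value for 64 active clusters */
    const int max_clusters = 64;

    assert(active_clusters >= 0 && active_clusters <= max_clusters);
    return overhead_mw
         + (full_mw - overhead_mw) * active_clusters / max_clusters;
}
```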
Multiuser Estimation-Detection+Decoding
Real-time target : 128 Kbps per user
[Figure: frequency needed to attain real-time (in MHz, log scale from 10 to 100000) versus number of clusters (log scale from 1 to 100), with FAST, MEDIUM and SLOW curves; annotations: 32-user base-station, Mobile, DSP]
Ideal C64x (w/o co-proc) needs ~15 GHz for real-time
Expected SWAP power : base-station
32-user base-station with 3 X’s per cluster and 64 clusters:
  0.13 micron, 1.2 V
  Peak Active Power: ~18.19 mW for 1 MHz (increased X)
  Area: ~93.4 mm²
Total peak base-station power consumption:
  ~18.19 W at 1 GHz for 32 users at 128 Kbps/user
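The total above follows from scaling the per-MHz figure linearly with clock frequency, which is the scaling the slide itself uses. A one-line helper makes the arithmetic explicit; the function name is hypothetical.

```c
#include <assert.h>

/* Peak active power in mW, assuming it scales linearly with clock rate. */
static double peak_power_mw(double mw_per_mhz, double clock_mhz)
{
    assert(mw_per_mhz >= 0.0 && clock_mhz >= 0.0);
    return mw_per_mhz * clock_mhz;
}
```

For the base-station, 18.19 mW per MHz at 1000 MHz gives about 18190 mW, or roughly 0.57 W per user for 32 users. The same scaling reproduces the Viterbi case: 9 mW per MHz at 10 MHz is 90 mW.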
Current research: Flexibility vs. performance
SWAPs: 128 Kbps at ~10-100 mW for Viterbi
  Borrow DP from ASICs!
Suitable for base-stations
  Flexibility more important than power
Suitable for mobile devices
  Power constraints tighter
  Can be customized for further power savings
Handset SWAPs (H-SWAPs)
  Borrow task pipelining from ASICs!
  Application-specific units and specialized comm. network
Handset SWAPs: H-SWAPs
Trade Data Parallelism for Task Pipelining
[Figure: SWAPs (max. clusters and reconfigure): many +++*** clusters on the SRF, exploiting DP. SWAPlet (limit clusters): a few +++* clusters with limited DP. H-SWAPs (collection of customized SWAPlets): SWAPlets with different ALU mixes (+++*, ++*, ++++), each with limited DP]
Sample points in architecture exploration
DSPs (1 cluster): ILP, subword
SWAPs (multiple clusters): ILP, subword, DP
H-SWAPs (optimized for handsets): ILP, subword, DP, task pipelining, custom ALUs
Programmable solutions with increased customization
Performance, Power benefits
Future research: Efficient algorithms
[Figure: multiple antenna systems: multipath channel, channel estimator, equalizer, MRC, beam-forming, coherent STC, non-coherent STC, detector, demodulator, decoder, turbo equalizer]
Future research: Architectures
Generalized framework and tools for evaluating algorithm-architecture and area-time-power-flexibility trade-offs
Potential applications
  Image processing:
    Cameras: variety of compression algorithms
  Biomedical applications:
    Hearing aids: DSP running on body heat*
  Sensor networks:
    Compression of data before transmission
*Quote: Gene Frantz, TI Fellow
SWAPs: Flexibility, Performance, Power
Need flexible architectures for future wireless devices
  Higher data rates, lower power, more complex algorithms
Rapid exploration for Scalable Wireless Application-specific Processors
  Flexibility vs. performance trade-offs
SWAPs: flexibility, high performance and low power
  Exploit data parallelism like ASICs
  1-2 orders of magnitude better performance than DSPs
  Turn off unused clusters and unused ALUs for low power