RICE UNIVERSITY
DSPs for future wireless systems
Sridhar Rajagopal
Motivation
[Figure: block diagram of a wireless mobile device: RF Unit, A/D, D/A, Baseband Programmable Communications Processor, Higher Layers, Add-on PCMCIA Network Interface Card]
• Mobile: switch between standards and between parameters
• Base-station: varying number of users with different parameters
The problem
Processor Type   Algorithms      Data rate targets      Constraints
Mobile           W-CDMA, W-LAN   128 Kbps, 100 Mbps/N   Time, Power, Area
Base-station     W-CDMA          4 Mbps                 Time
Base-station     W-LAN           100 Mbps               Time
[Figure: implementation options GPP, DSP, FPGA and VLSI compared along Performance, Power and Flexibility]
An approach for the solution
Algorithms well understood at VLSI level
Can design real-time systems.
Pushing it higher in the chain
Current DSPs not powerful enough for our application
Using the IMAGINE simulator to see what kind of architecture features would be useful in a future DSP for such applications.
History of my work
[Figure: timeline of the work (Distant Past, Recent Past, Recent and Near Future) across Algorithms, DSP, VLSI, FPGA and IMAGINE. Topics shown: multiuser channel estimation, multiuser detection; task-partitioning, parallelism, pipelining; conventional arithmetic, on-line arithmetic; instruction set extensions, co-processor support; functional unit design and usage]
Contents
Programmable architecture design using the IMAGINE simulator
Multiuser estimation and detection implementation
Performance comparisons and results
Other extensions for possible integration
Conclusions
The IMAGINE architecture and simulator
IMAGINE is a media signal processor
[Figure: Imagine Stream Processor block diagram: Host Processor, Stream Controller, Microcontroller, Network Interface, Network, Stream Register File, ALU Clusters 0 to 7, Streaming Memory System with four SDRAM blocks]
Why the IMAGINE simulator?
Great for media processing algorithms
Has a VLIW-based cluster -- DSP comparisons
A good base architecture : 1024-pt FFT
RSIM, SimpleScalar…: more general purpose architecture simulators
Processor Type    Time     Frequency   Power    Energy
Imagine           7.4 µs   500 MHz     3.8 W    28.12 µJ
TI C6711          120 µs   150 MHz     1.3 W    156 µJ
TI C6411          20 µs    300 MHz     0.25 W   5 µJ
Virtex II FPGA    1 µs     140 MHz     1 W      1 µJ
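The Energy column of the comparison table appears to be the product of the Time and Power columns. A minimal sketch of that check follows; it works on the numeric values only, in the table's own units, and the names `TABLE`, `energy` and `max_mismatch` are illustrative, not from the talk.

```python
# Rows copied from the comparison table: (time, frequency in MHz, power in W, energy).
# Time and energy stay in the table's own units; only the numbers are checked.
TABLE = {
    "Imagine": (7.4, 500, 3.8, 28.12),
    "TI C6711": (120, 150, 1.3, 156),
    "TI C6411": (20, 300, 0.25, 5),
    "Virtex II FPGA": (1, 140, 1, 1),
}


def energy(time, power):
    """Energy as the product of execution time and power."""
    return time * power


def max_mismatch(table):
    """Largest absolute gap between the listed energy and time * power."""
    return max(abs(energy(t, p) - e) for t, _freq, p, e in table.values())


if __name__ == "__main__":
    for name, (t, _freq, p, e) in TABLE.items():
        print(f"{name}: time * power = {energy(t, p):.2f}, listed = {e}")
```

Clock frequency does not enter the product directly; it only influences the execution time each processor achieves.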
What does the simulator give us?
Execution time for the different parts of the code
Functional unit utilization
Insights into the bottlenecks
Flexibility to add and remove functional units already present or design your own
Graphical view of the schedule on the functional units
Down-side
2-level C++ programming
StreamC:
• transfers streams of data between main memory and stream register file (SRF)
KernelC:
• transfers streams from the SRF to the ALU clusters
Code optimized to the number of ALU clusters and the size of the data
Compiler may fail register allocation if too many variables are used or functional units are modified
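The two-level split described above can be pictured with a toy model. This is plain Python, not StreamC or KernelC syntax, and it only mimics the data movement: `load_stream`, `run_kernel` and `NUM_CLUSTERS` are hypothetical names, and the SRF is modelled as an ordinary list.

```python
NUM_CLUSTERS = 8  # the Imagine diagram shows ALU clusters 0 to 7


def load_stream(memory, start, length):
    """Stream level: copy a block from main memory into the SRF (a list here)."""
    return list(memory[start:start + length])


def run_kernel(kernel, srf_stream):
    """Kernel level: feed the SRF stream to the clusters, one element per cluster per step."""
    if len(srf_stream) % NUM_CLUSTERS != 0:
        # The code is tied to the cluster count and the data size.
        raise ValueError("stream length must be a multiple of the cluster count")
    out = []
    for base in range(0, len(srf_stream), NUM_CLUSTERS):
        group = srf_stream[base:base + NUM_CLUSTERS]
        out.extend(kernel(x) for x in group)  # all clusters apply the same operation
    return out


memory = list(range(32))
srf = load_stream(memory, 0, 16)
result = run_kernel(lambda x: 2 * x, srf)
```

The length check is a small stand-in for the point that code is tuned to the number of clusters and the size of the data: a stream that does not divide evenly needs padding or a different partitioning.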
Typical workload representation (Base-station)
Wireless LAN: Equalization, FFT, Viterbi decoding
W-CDMA: Channel estimation, Multiuser detection, Viterbi/Turbo decoding
If you felt that life was too easy: Multiple antennas, Long spreading codes, Space-Time codes
Estimation/Detection (64,32 sizes)
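Multiuser Estimation (Kernel 1, 2, 3):
$$R_{bb} = R_{bb} - b_0 b_0^T + b_L b_L^T$$
$$R_{br} = R_{br} - b_0 r_0^H + b_L r_L^H$$
$$A = A - \mu\,(R_{bb} A - R_{br})$$
Massaging matrices for detection (Kernel 4, 5):
$$L = \mathrm{Re}[A_1^H A_0], \quad R = L^H$$
$$C = \mathrm{Re}[A_0^H A_0 + A_1^H A_1 - \mathrm{diag}(A_0^H A_0 + A_1^H A_1)]$$
Multiuser Detection (Kernel 6, 7):
$$y_i = \mathrm{Re}[A_0^H x_i + A_1^H x_{i-1}], \quad d_i = \mathrm{sign}(y_i)$$
$$y_i = y_i - L x_{i-1} - C x_i - R x_{i+1}, \quad d_i = \mathrm{sign}(y_i)$$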
Kernels
1. Update: Update Rbb, Rbr
2. Mmult: multiply Rbb * A
3. Iterate: gradient descent
4. MmultL: Calculate L
5. MmultC: Calculate C
6. Mf: Matched Filter
7. Pic: 1 Parallel Interference Cancellation Stage
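To make the kernel list concrete, here is a minimal sketch of kernels 1 to 3 and 6 to 7 under strong simplifying assumptions: real-valued data, no noise, and a synchronous system, so the neighbouring-bit term handled by kernel 4 is dropped. The 2-user, 3-chip channel, the step size `mu=0.1`, the "drop oldest, add newest" window convention, and every function name are assumptions for illustration, not the implementation from the talk.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def axpy(X, Y, scale=1.0):
    """Element-wise X + scale * Y for equally sized matrices."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mmult(X, Y):
    """Plain O(N^3) matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def update(Rbb, Rbr, b_old, r_old, b_new, r_new):
    """Kernel 1: slide the window (drop the oldest bit vector, add the newest)."""
    Rbb = axpy(axpy(Rbb, outer(b_old, b_old), -1.0), outer(b_new, b_new))
    Rbr = axpy(axpy(Rbr, outer(b_old, r_old), -1.0), outer(b_new, r_new))
    return Rbb, Rbr


def iterate(A, Rbb, Rbr, mu):
    """Kernels 2 and 3: one gradient-descent step A <- A - mu * (Rbb * A - Rbr)."""
    return axpy(A, axpy(mmult(Rbb, A), Rbr, -1.0), -mu)


def sign(v):
    return [1.0 if x >= 0 else -1.0 for x in v]


def matched_filter(A, r):
    """Kernel 6 (synchronous simplification): y = A r."""
    return [sum(a * x for a, x in zip(row, r)) for row in A]


def pic_stage(A, y, d):
    """Kernel 7: subtract the other users' estimated interference from y."""
    G = mmult(A, [list(col) for col in zip(*A)])  # A A^T
    K = len(A)
    return [y[k] - sum(G[k][j] * d[j] for j in range(K) if j != k) for k in range(K)]


def receive(A, b):
    """Noise-free received chips r = A^T b for bit vector b."""
    return [sum(b[k] * A[k][j] for k in range(len(b))) for j in range(len(A[0]))]


# Hypothetical 2-user, 3-chip channel (one row per user) and known bit vectors.
A_TRUE = [[1.0, 0.5, -0.5], [0.25, -1.0, 0.75]]
BITS = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
RX = [receive(A_TRUE, b) for b in BITS]

# Build the statistics over the first four bit vectors, then slide by one.
Rbb = [[0.0] * 2 for _ in range(2)]
Rbr = [[0.0] * 3 for _ in range(2)]
for b, r in zip(BITS[:4], RX[:4]):
    Rbb, Rbr = axpy(Rbb, outer(b, b)), axpy(Rbr, outer(b, r))
Rbb, Rbr = update(Rbb, Rbr, BITS[0], RX[0], BITS[4], RX[4])

# Gradient descent towards the solution of Rbb * A = Rbr.
A_est = [[0.0] * 3 for _ in range(2)]
for _ in range(200):
    A_est = iterate(A_est, Rbb, Rbr, mu=0.1)

# Detect the latest bit vector with a matched filter and one PIC stage.
y = matched_filter(A_est, RX[4])
d = sign(pic_stage(A_est, y, sign(y)))
```

The iteration avoids an explicit matrix inverse: each step costs one matrix product, which is why the Mmult kernel dominates the estimation time.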
Kernel 2 (mmult) for 3 +, 2 *
Divider not being utilized
Adders have limited FU utilization
O(N^3) *, O(N^3) +
Multipliers 100% in loop
Replace / with *
Kernel 2 (mmult) for 3 +, 3 *
better adder utilization
needs sufficient registers for scaling [register allocation may fail]
code may also need slight tuning of variables for optimization
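One way to see why a third multiplier helps the matrix-multiply kernel is an idealized resource bound. The sketch assumes the loop is limited only by the number of adders and multipliers, with one operation per unit per cycle; it ignores registers, scheduling and memory traffic, so it is not a model of the simulator. The matrix size 32 and the function names are assumptions.

```python
from fractions import Fraction


def mmult_op_counts(n):
    """Multiplies and adds in an n x n matrix product using multiply-accumulate."""
    return n ** 3, n ** 3


def resource_bound(n, adders, multipliers):
    """Return (cycles, adder utilization, multiplier utilization) for the ideal bound."""
    muls, adds = mmult_op_counts(n)
    cycles = max(Fraction(muls, multipliers), Fraction(adds, adders))
    return cycles, Fraction(adds, adders) / cycles, Fraction(muls, multipliers) / cycles


# 3 adders with 2 multipliers versus 3 adders with 3 multipliers.
cycles_32, add_util_32, mul_util_32 = resource_bound(32, adders=3, multipliers=2)
cycles_33, add_util_33, mul_util_33 = resource_bound(32, adders=3, multipliers=3)
speedup = cycles_32 / cycles_33
```

Under these assumptions the 3 +, 2 * configuration saturates the multipliers while the adders sit at two thirds utilization, and adding a multiplier gives an ideal speedup of 3/2, in line with the expected improvement of 1.5 quoted with the utilization results.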
FU utilization on each cluster
Functional unit utilization* and execution time (cycles) for the (3 +, 2 *) and (3 +, 3 *) configurations. Expected performance improvement: 1.5
Kernel   Utilization* (3 +, 2 *)   Cycles       Utilization* (3 +, 3 *)   Cycles       Improvement
1        70%, 100%                 1104         78.6%, 78%                960          1.15
2        53%, 91%                  144192       85%, 99%                  91136        1.5822
3        55%, 42%                  37892        IN/OUT                    37892        1
Total                                                                     129988
4        59%, 91%                  36128        78%, 84%                  26944        1.341
5        63%, 96%                  68960        68%, 71%                  62816        1.1
Total                                                                     89760
6        67%, 100%                 2063         90%, 89.6%                1552         1.33
7        67%, 96%                  4842 (3X)    89%, 84.2%                3690 (3X)    1.31
Total                                                                     5242
Time for detection at 128 Kbps for each of 32 users at 500 MHz : 4000 cycles
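The cycle figures on this slide can be cross-checked with a few lines of arithmetic. Reading the 4000-cycle figure as the clock rate divided by the per-user bit rate, rounded up, is an assumption; the cycle counts are the ones listed in the utilization table, and the names below are illustrative.

```python
CLOCK_HZ = 500e6   # 500 MHz
BIT_RATE = 128e3   # 128 Kbps per user


def cycles_per_bit(clock_hz, bit_rate):
    """Clock cycles available in one bit period."""
    return clock_hz / bit_rate


def improvement(before, after):
    """Ratio of execution times (cycles before / cycles after)."""
    return before / after


# (kernel, cycles with 3 +, 2 *, cycles with 3 +, 3 *), as read from the table.
KERNEL_CYCLES = [
    (1, 1104, 960),
    (2, 144192, 91136),
    (3, 37892, 37892),
    (4, 36128, 26944),
    (5, 68960, 62816),
    (6, 2063, 1552),
    (7, 4842, 3690),
]

budget = cycles_per_bit(CLOCK_HZ, BIT_RATE)
ratios = {k: improvement(before, after) for k, before, after in KERNEL_CYCLES}
detection_total = sum(after for k, _before, after in KERNEL_CYCLES if k in (6, 7))

if __name__ == "__main__":
    print(f"cycles per bit period: {budget}")
    print(f"detection cycles (kernels 6 and 7): {detection_total}")
    for k, ratio in ratios.items():
        print(f"kernel {k}: improvement {ratio:.4f}")
```

The listed totals are the sums of the 3 +, 3 * cycle counts for each group of kernels, and the improvement column is the ratio of the two cycle counts in each row.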
Comparisons with DSPs
[Figure: execution time (in seconds, log scale from 10^-6 to 10^-2) versus number of users (0 to 35). Curves: Single DSP implementation, 2 DSP implementation, Target data rate - 128 Kbps/user, Our architecture based on Imagine]
Current work
Evaluating performance of wireless communication algorithms such as estimation, detection and decoding on this architecture
Studying bottlenecks and the functional unit design needed to attain real-time performance
The insights gained from the design can also be applied to other processors such as DSPs.