DSPs for future wireless systems

RICE UNIVERSITY DSPs for future wireless systems Sridhar Rajagopal


Post on 30-Dec-2015


Page 1: DSPs for future wireless systems

RICE UNIVERSITY

DSPs for future wireless systems

Sridhar Rajagopal

Page 2: DSPs for future wireless systems

Motivation

[Figure: block diagram of a wireless mobile device, with blocks labeled RF Unit, A/D, D/A, Programmable Baseband Communications Processor, Higher Layers, and Add-on PCMCIA Network Interface Card]

• Mobile: switch between standards and between parameters

• Base-station: varying number of users with different parameters

Page 3: DSPs for future wireless systems

The problem

Processor Type   Algorithms      Data rate targets      Constraints
Mobile           W-CDMA, W-LAN   128 Kbps, 100 Mbps/N   Time, Power, Area
Base-station     W-CDMA          4 Mbps                 Time
Base-station     W-LAN           100 Mbps               Time

[Figure: GPP, DSP, FPGA, and VLSI arranged against Performance, Power, and Flexibility]

Page 4: DSPs for future wireless systems


An approach for the solution

Algorithms well understood at VLSI level

Can design real-time systems.

Pushing it higher in the chain

Current DSPs not powerful enough for our application

Using the IMAGINE simulator to see what kind of architecture features would be useful in a future DSP for such applications.

Page 5: DSPs for future wireless systems

History of my work

[Figure: timeline of the work, from Distant Past through Recent Past to Recent and Near Future, across Algorithms, DSP, VLSI, FPGA, and IMAGINE]

• Multiuser channel estimation, multiuser detection
• Task partitioning, parallelism, pipelining
• Conventional arithmetic, on-line arithmetic
• Instruction set extensions, co-processor support
• Functional unit design and usage

Page 6: DSPs for future wireless systems

Contents

Programmable architecture design using the IMAGINE simulator

Multiuser estimation and detection implementation

Performance comparisons and results

Other extensions for possible integration

Conclusions

Page 7: DSPs for future wireless systems

The IMAGINE architecture and simulator

IMAGINE is a media signal processor

[Figure: Imagine Stream Processor block diagram. Blocks: Host Processor, Stream Controller, Network Interface, Network, Stream Register File, Microcontroller, ALU Clusters 0 to 7, and a Streaming Memory System with four SDRAM banks]

Page 8: DSPs for future wireless systems

Why the IMAGINE simulator?

Great for media processing algorithms

Has a VLIW-based cluster, which allows DSP comparisons

RSIM, SimpleScalar…: more general-purpose architecture simulators

A good base architecture: 1024-pt FFT

Processor Type    Time     Frequency   Power    Energy
Imagine           7.4 µs   500 MHz     3.8 W    28.12 µJ
TI C6711          120 µs   150 MHz     1.3 W    156 µJ
TI C6411          20 µs    300 MHz     0.25 W   5 µJ
Virtex II FPGA    1 µs     140 MHz     1 W      1 µJ
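The energy column of the 1024-pt FFT table can be checked as time multiplied by power. The sketch below is only an illustration: it assumes the times are in microseconds (so energies are in microjoules), which is the reading under which the FFT times are plausible, and the function name `energy_uj` is made up for this example.

```python
# (name, time in microseconds, clock in MHz, power in watts), copied from the table.
# Assumption: times are microseconds, so time * power gives microjoules.
rows = [
    ("Imagine", 7.4, 500, 3.8),
    ("TI C6711", 120.0, 150, 1.3),
    ("TI C6411", 20.0, 300, 0.25),
    ("Virtex II FPGA", 1.0, 140, 1.0),
]


def energy_uj(time_us, power_w):
    """Energy in microjoules: microseconds times watts."""
    return time_us * power_w


for name, time_us, clock_mhz, power_w in rows:
    cycles = time_us * clock_mhz  # microseconds * MHz = clock cycles
    print(f"{name:15s} {energy_uj(time_us, power_w):7.2f} uJ  {cycles:7.0f} cycles")
```

The cycle count per FFT is included because the processors run at different clock rates, so time alone hides how much work each one does per cycle.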

Page 9: DSPs for future wireless systems

What does the simulator give us?

Execution time for the different parts of the code

Functional unit utilization

Insights into the bottlenecks

Flexibility to add and remove functional units already present or design your own

Graphical view of the schedule on the functional units

Page 10: DSPs for future wireless systems

Down-side

2-level C++ programming

StreamC:
• transfers streams of data between main memory and the stream register file (SRF)

KernelC:
• transfers streams from the SRF to the ALU clusters

Code is optimized to the number of ALU clusters and the size of the data

Compiler may fail register allocation if there are too many variables or the functional units are modified

Page 11: DSPs for future wireless systems

Contents

Programmable architecture design using the IMAGINE simulator

Multiuser estimation and detection implementation

Performance comparisons and results

Other extensions for possible integration

Conclusions

Page 12: DSPs for future wireless systems

Typical workload representation (Base-station)

Wireless LAN:                         Equalization, FFT, Viterbi decoding
W-CDMA:                               Channel estimation, Multiuser detection, Viterbi/Turbo decoding
If you felt that life was too easy:   Multiple antennas, Long spreading codes, Space-Time codes

Page 13: DSPs for future wireless systems

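Estimation/Detection (64,32 sizes)

Multiuser Estimation (Kernels 1, 2, 3)

$$R_{bb} = R_{bb} - b_L * b_L^T + b_0 * b_0^T$$
$$R_{br} = R_{br} - b_L * r_L^H + b_0 * r_0^H$$
$$A = A - \mu\,(R_{bb} * A - R_{br})$$

Massaging matrices for detection (Kernels 4, 5)

$$L = \mathrm{Re}[A_1^H A_0], \qquad R = L^H$$
$$C = \mathrm{Re}[A_0^H A_0 + A_1^H A_1 - \mathrm{diag}(A_0^H A_0 + A_1^H A_1)]$$

Multiuser Detection (Kernels 6, 7)

$$y_i = \mathrm{Re}[A_0^H x_i + A_1^H x_{i-1}], \qquad d_i = \mathrm{sign}(y_i)$$
$$y_i = y_i - L\,x_{i-1} - C\,x_i - R\,x_{i+1}, \qquad d_i = \mathrm{sign}(y_i)$$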
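As a rough illustration of the estimation kernels, the sketch below accumulates the correlation matrices Rbb and Rbr from known bits and received vectors, then iterates a gradient-descent update of A. Several things here are assumptions rather than the author's implementation: the update is taken to have the usual form A ← A − μ(Rbb·A − Rbr), the step size `mu` is chosen for the toy data, values are real rather than complex, and the sliding-window removal of old bits is left out. The helper names are hypothetical.

```python
def outer(u, v):
    """Outer product u v^T of two real vectors."""
    return [[ui * vj for vj in v] for ui in u]


def matmul(X, Y):
    """Plain triple-loop matrix product (the kind of work kernel 2 does)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add_scaled(X, Y, s):
    """Elementwise X + s * Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def estimate(bits, received, mu, iterations):
    """Estimate A (users x chips) from known bits and received vectors."""
    K, N = len(bits[0]), len(received[0])
    R_bb = [[0.0] * K for _ in range(K)]
    R_br = [[0.0] * N for _ in range(K)]
    for b, r in zip(bits, received):          # correlation update
        R_bb = add_scaled(R_bb, outer(b, b), 1.0)
        R_br = add_scaled(R_br, outer(b, r), 1.0)
    A = [[0.0] * N for _ in range(K)]
    for _ in range(iterations):               # gradient-descent iterations
        gradient = add_scaled(matmul(R_bb, A), R_br, -1.0)
        A = add_scaled(A, gradient, -mu)
    return A


# Toy data (hypothetical): 2 users, 3 chips, noiseless, each r = A_true^T b.
A_true = [[1.0, 0.5, -0.5], [0.25, -1.0, 0.75]]
bits = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
received = [[sum(b[k] * A_true[k][n] for k in range(2)) for n in range(3)]
            for b in bits]
A_est = estimate(bits, received, mu=0.1, iterations=60)
print(A_est)
```

With these four bit patterns Rbb is 4 times the identity, so each iteration shrinks the error by a factor of 0.6 and the estimate settles on `A_true`. The iteration avoids an explicit matrix inverse, which is why the work reduces to repeated matrix multiplies.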

Page 14: DSPs for future wireless systems

Kernels

1. Update: update Rbb, Rbr
2. Mmult: multiply Rbb * A
3. Iterate: gradient descent
4. MmultL: calculate L
5. MmultC: calculate C
6. Mf: matched filter
7. Pic: 1 parallel interference cancellation stage
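The matched filter and one parallel interference cancellation (PIC) stage can be sketched in a few lines. This is a simplified illustration, not the kernels themselves: it assumes real-valued signals and synchronous users, so only a single cross-correlation matrix C is used and the terms that couple neighbouring bits are dropped. The matrix `S` (chips by users, amplitudes folded in) and all function names are hypothetical.

```python
def sign(v):
    """Hard decision: +1 or -1."""
    return 1 if v >= 0 else -1


def matched_filter(S, r):
    """y[k] = sum over chips n of S[n][k] * r[n]."""
    return [sum(S[n][k] * r[n] for n in range(len(r))) for k in range(len(S[0]))]


def cross_correlation(S):
    """C = S^T S with the diagonal set to zero."""
    K = len(S[0])
    return [[0 if j == k else sum(row[j] * row[k] for row in S)
             for k in range(K)] for j in range(K)]


def pic_stage(y0, C, d):
    """Subtract the interference implied by the current decisions d from the
    matched filter output y0, then decide again."""
    y = [y0[k] - sum(C[k][j] * d[j] for j in range(len(d))) for k in range(len(d))]
    return y, [sign(v) for v in y]


# Toy near-far case: user 0 is weak, user 1 is three times stronger.
S = [[1, 3], [1, 3], [1, 3], [1, -3]]
b = [1, -1]                                   # transmitted bits
r = [sum(S[n][k] * b[k] for k in range(2)) for n in range(4)]

y0 = matched_filter(S, r)
d0 = [sign(v) for v in y0]                    # matched filter decisions
y1, d1 = pic_stage(y0, cross_correlation(S), d0)
print(d0, d1)
```

In this toy case the matched filter gets the weak user's bit wrong because of the strong user's interference, and one PIC stage recovers the transmitted bits. Each stage is a matrix-vector product per bit, which is far cheaper than the matrix-matrix products in the estimation kernels.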

Page 15: DSPs for future wireless systems

Kernel 2 (mmult) for 3 +, 2 *

Divider not being utilized

Adders have limited FU utilization

O(N^3) *, O(N^3) +

Multipliers 100% in loop

Replace the divider (/) with a multiplier (*)

Page 16: DSPs for future wireless systems

Kernel 2 (mmult) for 3 +, 3 *

better adder utilization

needs sufficient registers for scaling [register allocation may fail]

code may also need slight tuning of variables for optimization
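A simple resource bound shows why adding a third multiplier helps mmult. The sketch assumes an idealized, perfectly pipelined loop with one add and one multiply per inner-loop step and no register or memory limits. The function name and this one-to-one operation mix are assumptions, so the numbers are upper bounds rather than simulator output.

```python
from fractions import Fraction


def resource_bound(adds, mults, n_add, n_mul):
    """Return (cycles per step, adder utilization, multiplier utilization)
    for a loop limited only by the number of functional units."""
    add_time = Fraction(adds, n_add)
    mul_time = Fraction(mults, n_mul)
    cycles = max(add_time, mul_time)
    return cycles, add_time / cycles, mul_time / cycles


two_mul = resource_bound(adds=1, mults=1, n_add=3, n_mul=2)
three_mul = resource_bound(adds=1, mults=1, n_add=3, n_mul=3)

for label, (cycles, add_util, mul_util) in (("3 +, 2 *", two_mul), ("3 +, 3 *", three_mul)):
    print(f"{label}: {float(cycles):.3f} cycles/step, "
          f"adders {float(add_util):.0%}, multipliers {float(mul_util):.0%}")
print("ideal speedup:", float(two_mul[0] / three_mul[0]))
```

Under these assumptions the two-multiplier cluster is multiplier-bound with the adders partly idle, and the ideal gain from the third multiplier is 1.5, the same figure quoted as the expected improvement in the utilization results. Measured gains fall short of that when register pressure or stream I/O limits the schedule.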

Page 17: DSPs for future wireless systems

Contents

Programmable architecture design using the IMAGINE simulator

Multiuser estimation and detection implementation

Performance comparisons and results

Other extensions for possible integration

Conclusions

Page 18: DSPs for future wireless systems

FU utilization on each cluster

Kernel   FU utilization (+, *)   Execution time   FU utilization (+, *)   Execution time   Performance improvement
         with 3 +, 2 *           (cycles)         with 3 +, 3 *           (cycles)         (Expected: 1.5)
1        70%, 100%               1104             78.6%, 78%              960              1.15
2        53%, 91%                144192           85%, 99%                91136            1.5822
3        55%, 42%                37892            IN/OUT                  37892            1
Total                                                                     129988
4        59%, 91%                36128            78%, 84%                26944            1.341
5        63%, 96%                68960            68%, 71%                62816            1.1
Total                                                                     89760
6        67%, 100%               2063             90%, 89.6%              1552             1.33
7        67%, 96%                4842 (3X)        89%, 84.2%              3690 (3X)        1.31
Total                                                                     5242

Time for detection at 128 Kbps for each of 32 users at 500 MHz: 4000 cycles
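The improvement column and the group totals can be re-derived from the cycle counts. In the sketch below the assignment of numbers to kernels 1 to 7 is our reading of the table, and the variable and function names are made up for illustration. The cycle budget is simply the clock rate divided by the bit rate.

```python
# Cycles per kernel: (with 3 adders and 2 multipliers, with 3 adders and 3 multipliers).
cycles = {
    1: (1104, 960),
    2: (144192, 91136),
    3: (37892, 37892),
    4: (36128, 26944),
    5: (68960, 62816),
    6: (2063, 1552),
    7: (4842, 3690),
}


def speedup(kernel):
    """Ratio of cycle counts before and after adding the third multiplier."""
    before, after = cycles[kernel]
    return before / after


def total_after(kernels):
    """Total cycles for a group of kernels in the 3 +, 3 * configuration."""
    return sum(cycles[k][1] for k in kernels)


CLOCK_HZ = 500e6   # clock rate from the slide
BIT_RATE = 128e3   # target data rate per user from the slide
budget = CLOCK_HZ / BIT_RATE   # cycles available per bit period

for k in sorted(cycles):
    print(f"kernel {k}: speedup {speedup(k):.4f}")
print("estimation total:", total_after([1, 2, 3]))
print("matrix total:", total_after([4, 5]))
print("detection total:", total_after([6, 7]))
print("cycles per bit period:", budget)
```

The budget works out to roughly 3900 cycles per bit period, which the slide rounds to 4000. Only mmult (kernel 2) reaches the ideal 1.5, while kernel 3 does not improve at all, in line with its IN/OUT entry.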

Page 19: DSPs for future wireless systems

Comparisons with DSPs

[Figure: execution time (in seconds, log scale from 10^-6 to 10^-2) versus number of users (0 to 35). Curves: single DSP implementation, 2 DSP implementation, target data rate of 128 Kbps/user, and our architecture based on Imagine]

Page 20: DSPs for future wireless systems

Current work

Evaluating the performance of wireless communication algorithms such as estimation, detection, and decoding on this architecture

Studying bottlenecks and the functional unit design needed to attain real-time operation

The insights gained from the design can also be applied to other processors such as DSPs.