Post on 12-Jan-2016
WIRELESS COMMUNICATIONS From Systems to Silicon
Raghu Rao, Wireless Systems Group, Xilinx Inc.
2
R. M. Rao, 2008
Agenda
• Introduction to Wireless communications
  – Systems design and considerations
    • The wireless environment
    • Link budget
    • MIMO and OFDM Systems
  – High level view of wireless communication systems
    • Mobile WiMax, an example of a wireless comm system
    • Hardware/software partitioning
    • PHY/MAC etc.
• The Platform FPGA
  – Overview of FPGAs and FPGA tools
  – Building DSP sub-systems on FPGAs
  – Digital baseband
• FPGA tools and design methodology
Communications Roadmap
• Key markets
• Core DSP technologies
  – OFDM
  – MIMO
• IP Network is key
• Enables new approaches to
  – QoS management
  – Robustness
  – Capacity
Wireless Environment
• Multipaths caused by reflections from various objects.
Modeling the Channel
• As the mobile moves through the environment, the field strength varies due to:
  – Free space path loss
  – Long term (slow) fading
  – Short term (fast) fading
[Figure: signal level (dB) versus log(distance), showing path loss, long term fading and short term fading.]
Doppler
• Changes in the received carrier frequency due to the relative motion of the mobile to the base station
• $\Delta f = f_d = (v/\lambda)\cos(\theta)$
  – for f = 900 MHz, v = 70 MPH (112 km/h)
  – $f_{D,max} = v/\lambda = 93.3$ Hz
[Figure: mobile travelling a distance D = v·t.]
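The Doppler numbers on this slide can be checked with a few lines of Python. This is only a sketch: the function name `doppler_shift` and the rounded speed of light (3e8 m/s, which reproduces the slide's 93.3 Hz) are assumptions made for illustration.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded; an assumption that matches the slide's 93.3 Hz)

def doppler_shift(v_mps, carrier_hz, theta_rad=0.0):
    """Doppler shift f_d = (v / wavelength) * cos(theta), in Hz."""
    wavelength = C / carrier_hz
    return (v_mps / wavelength) * math.cos(theta_rad)

v = 112e3 / 3600.0                 # 112 km/h expressed in m/s
fd_max = doppler_shift(v, 900e6)   # theta = 0 gives the maximum shift
print(f"max Doppler shift: {fd_max:.1f} Hz")
```

Motion at right angles to the direction of arrival (theta = 90 degrees) gives essentially no shift, which is why the maximum is quoted for theta = 0.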
Delay Spread
• Measure of the time distribution of power in the channel impulse response
  – Typical office: 25 ns to 60 ns
  – Large lobbies and atria: 100 ns
  – Warehouse and factory floors: 100 ns to 200 ns
  – Delay spreads are up to 10 microseconds in cellular environments
    • Greater than 3 μs in urban areas
    • 0.5 μs in suburban and open areas
Exponential Power Delay Profile
• If the delay spread of the channel is larger than the symbol interval we will see multiple paths in our channel.
• Leads to inter-symbol interference (ISI).
• Leads to a frequency selective channel.
• Average energy of the channel impulse response follows an exponential power-delay profile.
Coherence Bandwidth
• Maximum frequency bandwidth for which the signals are still considered to be correlated.
• $B_c$ in Hz $= 1/(2\pi\tau_{rms})$ when considering amplitude correlation (correlation coefficient = 0.5)
• $\tau_{rms}$ is the rms delay spread of the channel
Coherence Time
• Maximum time period for which the signals are still considered to be correlated.
• It is used to characterize the time varying nature of the channel.
• Rule of thumb: $9/(16\pi f_m) < T_c < 0.423/f_m$
  – $f_m$ is the maximum Doppler frequency
  – Correlation coefficient = 0.5
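A quick numerical feel for coherence time and coherence bandwidth, as a sketch only. It assumes the rule-of-thumb bounds are 9/(16π f_m) and 0.423/f_m and that B_c = 1/(2π τ_rms); the function names and the example inputs (93.3 Hz Doppler, 0.5 μs delay spread) are chosen for illustration.

```python
import math

def coherence_time_bounds(fm_hz):
    """Lower and upper rule-of-thumb bounds on coherence time, in seconds."""
    return 9.0 / (16.0 * math.pi * fm_hz), 0.423 / fm_hz

def coherence_bandwidth(tau_rms_s):
    """Coherence bandwidth in Hz for an rms delay spread given in seconds."""
    return 1.0 / (2.0 * math.pi * tau_rms_s)

t_lo, t_hi = coherence_time_bounds(93.3)   # 900 MHz carrier at 112 km/h
b_c = coherence_bandwidth(0.5e-6)          # suburban-like delay spread
print(f"Tc between {t_lo * 1e3:.2f} ms and {t_hi * 1e3:.2f} ms")
print(f"Bc about {b_c / 1e3:.0f} kHz")
```

With these inputs the channel stays roughly constant for a few milliseconds and over a few hundred kilohertz, which is the scale that frame lengths and subcarrier spacings have to respect.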
Link Budget
• A link budget is used to compute the range, transmit power, receiver sensitivity and other requirements of the communication system.
• In free space the path loss is given by the Friis equation:
$$P_r = \frac{P_t G_t G_r \lambda^2}{(4\pi)^2 d^2}$$
• $G_t$, $G_r$ represent transmit and receive antenna gains. $P_t$, $P_r$ represent the transmit power and receive power. $\lambda$ is the wavelength, d is the distance.
Link Budget
• Expressing path loss in dB:
$$P_r(dB) = P_t(dB) + G_t(dB) + G_r(dB) + 20\log\left(\frac{\lambda}{4\pi}\right) - \alpha \cdot 10\log(d)$$
• Note: $\alpha$ is the path loss exponent depending on the environment (2 in free space).
• To compute the SNR at the baseband we need to include thermal noise in the signal bandwidth B, and noise figure of the system NF.
$$P_r(dB) = -174\ \mathrm{dBm/Hz} + NF + 10\log(B) + SNR$$
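The two link-budget relations can be sketched in Python as below. The function names, the symbol `alpha` for the path loss exponent, and the example numbers are assumptions for illustration, not values from the slides.

```python
import math

def received_power_dbm(pt_dbm, gt_db, gr_db, wavelength_m, d_m, alpha=2.0):
    """Received power in dBm for a log-distance model; alpha = 2 is free space (Friis)."""
    return (pt_dbm + gt_db + gr_db
            + 20.0 * math.log10(wavelength_m / (4.0 * math.pi))
            - alpha * 10.0 * math.log10(d_m))

def sensitivity_dbm(nf_db, bandwidth_hz, snr_db):
    """Required received power: thermal noise floor + noise figure + required SNR."""
    return -174.0 + nf_db + 10.0 * math.log10(bandwidth_hz) + snr_db

# Hypothetical example: 40 dBm transmitter, 15 dBi and -1 dBi antennas, 2.5 GHz, 1 km.
wavelength = 3.0e8 / 2.5e9
pr = received_power_dbm(40.0, 15.0, -1.0, wavelength, 1000.0, alpha=3.5)
need = sensitivity_dbm(nf_db=9.0, bandwidth_hz=10e6, snr_db=5.0)
print(f"received {pr:.1f} dBm, required {need:.1f} dBm, margin {pr - need:.1f} dB")
```

The difference between the two results is the margin left over for fading, interference and other impairments.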
Link Budget
• Margin for desired outage taking into account receiver structure and antenna diversity.
  – Standards specify outage probabilities
  – WiMax: 90% in the cell, 75% at the boundary of the cell.
• Compensation factors for other impairments
  – Interference from neighbouring cell
  – Shadow fading, etc.
• Diversity helps achieve the outage probability (or reduces the margin for outage) without increase in transmit power.
Diversity
• Diversity provides the receiver with multiple looks at the transmitted signal.
• Prob(all channels in a fade) << Prob(any 1 channel in a fade)
• Diversity improves link reliability.
[Figure: signal level (dB) versus time for Channel 1, Channel 2 and the combined channel.]
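The benefit of multiple looks can be put into numbers with a small sketch. It assumes independent Rayleigh-faded branches with unit mean power, so the power on each branch is exponentially distributed; the function name and the 10 dB fade threshold are illustrative choices.

```python
import math

def fade_probability(threshold, branches=1):
    """Probability that ALL independent Rayleigh branches fall below `threshold`
    (threshold is expressed relative to the mean branch power)."""
    p_single = 1.0 - math.exp(-threshold)
    return p_single ** branches

for n in (1, 2, 4):
    # threshold 0.1 means a fade 10 dB below the mean power
    print(n, "branch(es):", fade_probability(0.1, n))
```

Each added independent branch multiplies the fade probability by the single-branch value, which is the effect the slide summarizes.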
Diversity Techniques
• Spatial Diversity
  – Antennas “sufficiently spaced” apart (> ½ wavelength).
  – Will result in an independent channel response and provide another look at the transmitted signal.
• Frequency Diversity
  – Transmit over multiple carrier frequencies.
  – If the frequencies are “sufficiently far” (coherence bandwidth) apart the channel response will be different on the different frequencies.
• Time Diversity
  – Channel is continuously changing.
  – Transmit signals “sufficiently spaced” (coherence time) apart in time so the 2nd transmission “sees” a different channel compared to the first one.
• Polarization Diversity
  – Signals transmitted on two orthogonal polarizations exhibit uncorrelated fading statistics.
MIMO Systems
[Figure: Tx antennas 1 to M and Rx antennas 1 to N connected through the channel matrix H.]
• MIMO systems: multiple antennas at the transmitter and receiver.
• 3 types of MIMO systems:
  – STBC MIMO systems: diversity gain.
  – Spatial Multiplexing MIMO systems: capacity/throughput gain.
  – Feedback MIMO systems: higher performance through interference reduction.
• MISO (multiple input single output) systems:
  – STBC can be used with just 1 receive antenna.
  – Provides diversity gain.
  – To achieve array gain, need knowledge of channel at the transmitter (feedback).
Spatial Multiplexing
• A spatial multiplexing MIMO system transmits different data symbols from each transmitter.
• The signals from each transmitter combine over the air and are received by multiple receive antennas.
• SM systems have a rate=M (num transmit antennas). The diversity order depends on the type of encoding and receiver (uncoded SM with ML decoding has diversity order=N (num receive antennas)).
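[Figure: three modulators transmit x(t), y(t) and z(t) from the symbol streams x(n), y(n), z(n). Each receive antenna sees a mix of all three, e.g. r1(t) = a11 x(t) + a12 y(t) + a13 z(t) and r3(t) = a31 x(t) + a32 y(t) + a33 z(t). The MIMO receiver recovers x(n), y(n), z(n).]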
Spatial Multiplexing Receivers
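Zero Forcing receiver:
[Figure: 2x2 MIMO link; Tx antennas 1 and 2 reach Rx antennas 1 and 2 through the channel gains h11, h12, h21, h22.]
$$y_1 = h_{11} x_1 + h_{12} x_2 + n_1$$
$$y_2 = h_{21} x_1 + h_{22} x_2 + n_2$$
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \end{bmatrix}$$
$$\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = W \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
$$\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
For ZF receivers $W = H^{-1}$.
Significant increase in noise when the channel is in a deep fade.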
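A minimal NumPy sketch of the zero-forcing receiver for the 2x2 case above. The channel values, the QPSK symbols and the function name `zf_detect` are made up for illustration; noise is left out so that the inversion can be checked exactly.

```python
import numpy as np

def zf_detect(H, y):
    """Zero-forcing estimate: apply W = H^-1 to the received vector."""
    W = np.linalg.inv(H)
    return W @ y

# Hypothetical 2x2 channel and a pair of QPSK symbols.
H = np.array([[1.0 + 0.5j, 0.3 - 0.2j],
              [0.1 + 0.4j, 0.8 - 0.6j]])
x = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)

y = H @ x                      # noiseless received vector
x_hat = zf_detect(H, y)
print(np.round(x_hat, 6))

# The squared Frobenius norm of W scales the noise power seen after detection;
# it becomes large when H is close to singular (deep fade).
noise_gain = np.linalg.norm(np.linalg.inv(H)) ** 2
print(noise_gain)
```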
Spatial Multiplexing Receivers
• MMSE MIMO Decoders:
  – Cancels interference and minimizes noise.
  – Minimizes the overall error (mean squared error):
$$E[(x - \hat{x})^2]$$
$$W_{MMSE} = \sqrt{\frac{M}{E_s}} \left( H^H H + \frac{M}{SNR} I_M \right)^{-1} H^H$$
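A sketch of MMSE weights in NumPy. It uses a simplified normalization that is an assumption here: y = Hx + n with unit-power symbols and noise variance `noise_var`, so the weights are (H^H H + noise_var·I)^-1 H^H and the scaling constants on the slide are absorbed. The function name and channel values are illustrative.

```python
import numpy as np

def mmse_weights(H, noise_var):
    """MMSE weight matrix for y = H x + n with unit-power symbols (assumed model)."""
    m = H.shape[1]
    Hh = H.conj().T
    return np.linalg.inv(Hh @ H + noise_var * np.eye(m)) @ Hh

H = np.array([[1.0 + 0.5j, 0.3 - 0.2j],
              [0.1 + 0.4j, 0.8 - 0.6j]])

W_zf = np.linalg.inv(H)
W_mmse = mmse_weights(H, noise_var=0.1)

# The regularization term keeps the weights smaller than the plain inverse,
# which limits noise enhancement when the channel is poorly conditioned.
print(np.linalg.norm(W_zf), np.linalg.norm(W_mmse))
```

As the noise variance goes to zero the MMSE weights reduce to the zero-forcing inverse.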
Spatial Multiplexing Receivers
• Zero-Forcing
• MMSE
• Successive interference cancellation receivers
• Sphere detectors (sub-optimal Maximum Likelihood)
Transmit Diversity
• Space Time Block Code (STBC)
  – 2 Antenna STBC also known as “Alamouti Code”.
  – Improves BER/SER performance.
[Figure: information source, constellation mapper and Alamouti ST block code feed two transmit antennas over channels h1 and h2 during symbol periods 1 and 2; the STBC decoder and ML decisions produce soft decisions for c1 and c2.]
$$r_1 = h_1 c_1 + h_2 c_2$$
$$r_2 = h_1(-c_2^*) + h_2(c_1^*)$$
STBC Decoder
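In matrix form the received signal is:
$$\begin{bmatrix} r_1 \\ r_2^* \end{bmatrix} = \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2^* \end{bmatrix}$$
$$r = Hc + n$$
Decoder:
$$\hat{c} = H^H r = H^H (Hc + n)$$
$$\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix} = \left(|h_1|^2 + |h_2|^2\right) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{bmatrix}, \quad \tilde{n} = H^H n$$
Low complexity decoder. Just 2 complex mults per symbol for a 2 antenna system (and grows linearly with block length/num antennas).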
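The Alamouti encode and decode steps can be sketched in NumPy as follows. Function names, channel gains and symbols are illustrative assumptions; the channel is assumed constant over the two symbol periods and noise is omitted so the result can be checked exactly.

```python
import numpy as np

def alamouti_encode(c1, c2):
    """Rows are symbol periods, columns are transmit antennas."""
    return np.array([[c1, c2],
                     [-np.conj(c2), np.conj(c1)]])

def alamouti_decode(r1, r2, h1, h2):
    """Linear decoder: c_hat = H^H [r1, r2*]^T, scaled by |h1|^2 + |h2|^2."""
    H = np.array([[h1, h2],
                  [np.conj(h2), -np.conj(h1)]])
    r = np.array([r1, np.conj(r2)])
    gain = abs(h1) ** 2 + abs(h2) ** 2
    return (H.conj().T @ r) / gain

c1, c2 = (1 + 1j) / np.sqrt(2), (-1 + 1j) / np.sqrt(2)
h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j          # hypothetical flat-fading gains

X = alamouti_encode(c1, c2)
r1, r2 = X @ np.array([h1, h2])           # received samples in periods 1 and 2
c_hat = alamouti_decode(r1, r2, h1, h2)
print(np.round(c_hat, 6))
```

Because H^H H is diagonal, each symbol is recovered separately, which is why the decoder needs only a couple of complex multiplications per symbol.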
Other MIMO schemes
• Achieving high rate high diversity MIMO systems is an area of active research.
• There are many suboptimal STBC schemes that improve the rate but reduce the diversity order.
• There are also combinations of spatial multiplexing and STBC schemes.
• One such scheme is 2 (or more) Alamouti’s in parallel.
Stacked Alamouti
[Figure: Transmitter for Interference Cancelling STBC: the information source is split into Data Stream 1 and Data Stream 2, each with its own constellation mapper and Alamouti ST block code. Receiver for Interference Cancelling STBC: the received signals r1 and r2 pass through interference cancellation and ML decision to give C1 and C2 (Data Stream 1 and Data Stream 2).]
• Interference Cancelling STBC
• 2 Alamouti codes in parallel
• Rate 2 system
• Diversity order = N*(M-K+1)
  – K: co-channel users
  – N: transmit antennas per user
  – M: receive antennas
• Requires N*(K-1)+1 antennas at the receiver to suppress K-1 interferers.
Orthogonal Frequency Division Multiplexing (OFDM)
[Figure: magnitude versus frequency.]
OFDM divides a frequency selective channel into a number of flat fading channels.
OFDM Modulation
[Figure (a), transmitter, frequency domain to time domain. Blocks: QAM mapping, S/P, IFFT, cyclic prefix, P/S, D/A and RF.]
[Figure (b), receiver, time domain to frequency domain. Blocks: RF and A/D, strip cyclic prefix, S/P, FFT, FEQ, P/S, QAM decoding.]
• A QAM symbol is modulated onto each subcarrier
• IFFT/FFT are used for efficient modulation and demodulation
Combating Multipath
• Sampling at instant Ts all channels experience the same channel and there is no ICI
[Figure: constructing the cyclic prefix (CP). The CP precedes the OFDM symbol, the multipath components span up to $\tau_{max}$, and the sampling instant is Ts.]
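The effect of the cyclic prefix can be seen in a short NumPy sketch: with a CP at least as long as the channel memory, the multipath channel reduces to one complex gain per subcarrier, which a one-tap frequency-domain equalizer (FEQ) removes. The FFT size, CP length and channel taps are hypothetical and noise is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N, CP = 64, 16                              # FFT size and cyclic prefix length (assumed)
h = np.array([1.0, 0.5, 0.25j])             # hypothetical multipath channel, shorter than the CP

# Transmitter: QPSK symbol per subcarrier, IFFT, then prepend the cyclic prefix.
bits = rng.integers(0, 2, size=(N, 2))
X = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
x = np.fft.ifft(X)
tx = np.concatenate([x[-CP:], x])

# Channel: linear convolution with the multipath impulse response.
rx = np.convolve(tx, h)[:len(tx)]

# Receiver: strip the CP, FFT, then one complex division per subcarrier.
Y = np.fft.fft(rx[CP:CP + N])
H_f = np.fft.fft(h, N)                      # per-subcarrier (flat) channel gains
X_hat = Y / H_f

print(np.max(np.abs(X_hat - X)))            # residual error, close to machine precision
```

If the CP were shorter than the channel memory, the linear convolution would no longer look circular inside the FFT window and the per-subcarrier division would leave residual interference.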
MIMO and OFDM
• MIMO – Multiple Input Multiple Output Communication System. Employs multiple antennas at both transmitter and receiver.
• OFDM – Orthogonal Frequency Division Multiplexing. Breaks up a broadband channel into many parallel narrowband channels (subcarriers).
• MIMO-OFDM – A Combination of MIMO and OFDM. Appears like many parallel MIMO systems on orthogonal subcarriers.
MIMO-OFDM System
[Figure: OFDM transmitters 1 to N send through a rich scattering environment to OFDM demodulators 1 to N, which feed a MIMO decoder.]
Each transmitter is an independent OFDM modulator.
The source symbols could be space-time block coded or just QAM modulated for spatial multiplexing.
Each receiver is an OFDM demodulator combined with a MIMO decoder to invert the channel on each subcarrier and extract the source symbols.
802.16/802.16e
• The 802.16 WirelessMAN standard includes requirements for operation in:
  – Line Of Sight (LOS), 10-66 GHz for fixed wireless systems.
  – Non Line Of Sight (NLOS), <11 GHz for fixed wireless systems.
• 802.16e (Mobile WiMax) adds enhancements for mobility in the <11 GHz licensed and unlicensed bands.
  – Operation in mobile mode is limited to licensed bands between 2 GHz and 6 GHz.
Scalable OFDMA parameters
Parameters | Values
System bandwidth (MHz) | 1.25 | 5 | 10 | 20
FFT size (NFFT) | 128 | 512 | 1024 | 2048
Sampling frequency (Fs, MHz) | 1.4 | 5.6 | 11.2 | 22.4
Sample time (1/Fs, ns) | 714.28 | 178.57 | 89.28 | 44.64
Subcarrier spacing | 10.94 kHz (all bandwidths)
Useful symbol time | 91.4 μs (all bandwidths)
Guard interval | 11.4 μs (all bandwidths)
OFDMA symbol time | 102.9 μs (all bandwidths)
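The fixed rows of the table follow from the FFT size and sampling frequency, as this sketch shows. The guard fraction of 1/8 is an assumption inferred from the ratio 11.4 μs / 91.4 μs; the dictionary and function names are illustrative.

```python
# bandwidth (MHz) -> (FFT size, sampling frequency in Hz), taken from the table
profiles = {1.25: (128, 1.4e6), 5: (512, 5.6e6), 10: (1024, 11.2e6), 20: (2048, 22.4e6)}

def ofdma_timing(nfft, fs_hz, guard_fraction=1 / 8):
    """Return (subcarrier spacing in Hz, useful time, guard time, symbol time in seconds)."""
    spacing = fs_hz / nfft
    useful = 1.0 / spacing
    guard = guard_fraction * useful          # assumed guard fraction of 1/8
    return spacing, useful, guard, useful + guard

for bw, (nfft, fs) in profiles.items():
    spacing, useful, guard, total = ofdma_timing(nfft, fs)
    print(f"{bw} MHz: {spacing / 1e3:.2f} kHz, {useful * 1e6:.1f} us, "
          f"{guard * 1e6:.1f} us, {total * 1e6:.1f} us")
```

Scaling the FFT size with the bandwidth keeps the subcarrier spacing, and therefore the symbol timing, the same for every profile.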
Link Budget
Parameter | Downlink | Uplink
Transmit power | 10 Watts = 40 dBm (max = 20 Watts) | 200 mW = 23 dBm (max = 200 mW)
Antenna height | 32 meters | 1.5 meters
Antenna gain | 15 dBi (BS) | -1 dBi (mobile)
EIRP | 55 dBm (approx) | 22 dBm
# occupied subcarriers | 840 out of 1024 | 840 out of 1024
Power/subcarrier | 28 dBm | 3.44 dBm
Noise figure | 9 dB (at mobile) | 4 dB (at BS)
Total margin for interference, shadow fading, ... (75% coverage at cell edge, 90% overall) | 20 dB | 20 dB
BS to BS distance | 2.8 km | 2.8 km
SNR required (modulation: QPSK 1/8, repetition code = 4; BER = 10^-6 after FEC) | -3.31 dB | -2.5 dB
Rx sensitivity | -100.7 dB | -111.1 dB
Max allowable path loss | 136.4 dB | 133 dB
Time Division Duplexing
• 802.16e can be deployed in TDD and FDD environments.
• Initial certification profiles are only for TDD.
• The DL subframe and UL subframe lengths are adjustable.
• TDD assures channel reciprocity.
[Figure: frames (j-2) to (j+2); each frame holds a downlink subframe and an uplink subframe with an adaptive boundary between them.]
TTG: Transmit-Receive transition gap
RTG: Receive-Transmit transition gap
OFDMA Frame Structure
DL-MAP (Downlink MAP): downlink allocations
UL-MAP (Uplink MAP): uplink allocations
FCH (Frame control header): contains information about the DL-MAP
[Figure: OFDMA frame, subchannel logical number versus OFDMA symbol number. The downlink (DL) subframe carries the preamble, FCH, DL-MAP, UL-MAP and DL bursts (SS1 to SS4, broadcast, multicast, and a burst for SS1 from BS2). After the TTG, the uplink (UL) subframe carries UL bursts for SS1 to SS4 and the ranging subchannel. The RTG precedes the next frame's preamble, FCH and DL-MAP.]
Data rates for SIMO/MIMO configurations
Source: WiMax Forum
64 QAM with 5/6 CTC
Baseband Transmission Model
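• OFDM receiver provides estimates of
  – Channel $h_{n,i}(t)$
  – Frequency offset
  – Sample timing T'
  – OFDM symbol timing
[Figure: baseband model. The OFDM transmitter sends symbols $a_{i,k}$ as s(t) through the channel $h_{n,i}(t)$, a timing delay, a frequency offset and additive noise n(t), giving r(t). The ADC samples with period T' to produce r(n), which feeds the inner receiver and then the outer receiver. The combination is labelled as the resulting channel $h_i(t)$.]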
Generic OFDM Transmitter
• Figure shows a generic MIMO OFDM Tx
  – MIMO not an element of 802.11a, but it is in 802.11n, 3GPP-LTE and 802.16e
[Figure: MAC, source coding (e.g. LDPC), space-time encoder and beamforming, followed by one chain per antenna: insert pilots, IFFT, append CP, CFR, DUC, DPD, DAC, RF PA.]
OFDM Receiver Architecture
• Figure illustrates architecture for generic OFDM Rx
• Details will vary as a function of
  – Packet-based versus broadcast transmission
  – Existence of a preamble (or not) in the waveform
[Figure: ADC, DDC, sample clock adjustment (with DAC), coarse frequency offset correction, symbol timing, CP removal, FFT, extract pilots, fine sample clock adjustment, fine frequency offset adjustment, frequency domain equalizer, channel estimation, power estimation, extract preamble, channel decoding (e.g. LDPC), medium access controller, to/from network.]
Digital Receiver Architecture: Abstracted Architecture
• Common model of abstraction for digital receiver is inner/outer receiver
Receiver Abstraction
• Digital IF Processing
  – Up-Conversion
  – Down-Conversion
  – Channelizer
  – Fast AGC
• Inner Receiver
  – Frequency Offset Estimation/Correction
  – Sample Clock Offset Correction
  – Channel Estimation/Equalization
  – Frame detection
  – AGC
  – Successive Interference Cancellation
  – Space-Time-Coding
  – IFFT/FFT
  – Per sub-carrier processing (Beamforming, QRD-RLS)
• Outer Receiver
  – Channel Coding (LDPC, TPC, CTC, Viterbi, (De-)Interleave)
• Control, Protocol and Link Layer processing
  – Medium Access Control (MAC)
  – Link Layer Processing
  – System Initialization, Control and Monitoring
  – Application
  – Ethernet, PCI Express, SRIO
  – CPRI, OBSAI
Receiver Abstraction and Projection on to Platform FPGA
Receiver Function | Characteristics | FPGA Platform | Comments
Digital IF Processing | MAC intensive | SX | DSP48 main requirement
Inner Receiver | MAC intensive; some functions LUT intensive (CORDIC in QRD-RLS); FFT processing for OFDM; correlation processing for timing; per-carrier complexity processing (MIMO-OFDM) | SX/LX | DSP48 leveraged FFT; FPGA fabric for CORDIC
Outer Receiver | Symbol rate tasks; channel coding | LX | ACS/ACSO dominated by low bit precision add/multiplexors; good match for fabric; lots of memory required
Control/Protocol | Gigabit connectivity; Linux OS; “heavy” tasks; TCP/IP | FX | Embedded PPC used; Rocket IO for PCI Express, SRIO
[Figure: receiver abstraction mapped onto SX, SX/LX, LX and FX devices; labels include Num. Sub-carriers, TX, RX and N.]
FPGA product portfolio tailored for various processing tasks in a communications receiver.
Digital Frontend
• Digital upconversion (downconversion)
• Crest factor reduction
• Digital pre-distortion
The Platform
• Connectivity
  – Serial Gigabit
  – OBSAI/CPRI
  – Proprietary serial backplane
  – Inter-chip connectivity
• Embedded Software
  – MAC (Media Access)
  – Decision oriented tasks
  – CORBA
  – RTOS
  – NBAP
  – SCA (JTRS radios)
• Logic & IO
  – OBSAI/CPRI
  – SRIO
  – AD/DA interface
  – EMIF
• High Performance Processing
  – High MIPs tasks
  – Radio PHY
  – Supported by embedded DSP tiles, distributed memory, block memory and logic fabric
[Figure: platform block diagram with DACs and ADCs, DUC/DDC, CFR/DPD, RACH, searcher, OFDM PHY, TCC and MIMO blocks, and SRIO and EMIF interfaces.]
Virtex-4/5 FPGA Architecture: High-Level View
• FPGA family with 3 members tailored for specific classes of processing
– SX: DSP
– LX: Logic centric
– FX: Full featured
• Embedded PowerPC hard IP
• Giga-bit serial connectivity
• DSP processing tiles “DSP48”
Virtex-5 FPGA Platform
• 2 slices per CLB, 4 LUTs per slice
• LUTs can be configured as a shift register
• LUTs can be configured as distributed memory (RAM)
Virtex-4 DSP48 Slice
• Arithmatica Parallel Counter: 20% faster performance and uses less area
• Integrated cascade routing enables scalable performance
• Arithmatica A+ Adder: 20% faster than other implementations
• Pipeline registers enable 500 MHz performance
• Scalable 500 MHz performance not possible using standard cell libraries and standard cell design flow
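[Figure: DSP48 slice. 18-bit A and B inputs (B with BCIN/BCOUT cascade), an 18x18 multiplier whose 36b product is sign extended to 48b, X, Y and Z multiplexers (with ZERO, the 48-bit C input, PCIN, and a wire shift right by 17b), a 48-bit adder/subtracter with CIN and SUB, and the 48-bit P output with PCOUT going to the adjacent DSP48 tile.]
[Figure: pipelined multiplier with 3 delay latency (z^-3): 18-bit A and B inputs, output P (PCOUT) split into LS word and MS word.]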
Pipelined Complex 18x18 MPY
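[Figure: pipelined complex 18x18 multiplier built from four DSP48 slices S1 to S4 (sn = Slice n). The 18-bit inputs Ar, Ai, Br and Bi are multiplied in pairs and combined over 48-bit cascades, with one sum and one difference, to form Pi and Pr. Registers and sign extension of the 36-bit products are shown.]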
Wide Filters At Full Speed Within the Virtex-4 DSP Slice Column
• Systolic N-tap FIR
  – Scalable N-levels deep implementation
  – N-levels deep at 500 MHz performance
• Uses integrated pipeline registers to synchronize filter inputs
• Utilizes input and output cascade routing
• Build a massively parallel 512-tap FIR filter in a single device achieving 256 GMACs/s performance
• Equivalent implementation would consume 444 embedded multipliers and 77,008 LCs and would only achieve ½ the performance
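A behavioural sketch of a systolic FIR can make the role of the pipeline registers and cascade routing concrete. The model below is an assumption about the structure, not the actual DSP48 netlist: each tap sees the input delayed by two registers per stage, and partial sums move one register per stage along the cascade. The function name is illustrative.

```python
def systolic_fir(x, h):
    """Cycle-by-cycle model of an N-tap systolic FIR (assumed structure).

    Tap k multiplies h[k] by the input delayed 2*k cycles and adds the
    registered partial sum from tap k-1. Output latency is N-1 cycles.
    """
    n_taps = len(h)
    xreg = [0.0] * (2 * n_taps - 1)   # input delay line, xreg[j] holds x[n - j]
    preg = [0.0] * n_taps             # registered partial sums (cascade path)
    out = []
    for sample in x:
        xreg = [sample] + xreg[:-1]
        new_p = []
        for k in range(n_taps):
            cascade_in = preg[k - 1] if k > 0 else 0.0
            new_p.append(cascade_in + h[k] * xreg[2 * k])
        preg = new_p
        out.append(preg[-1])
    return out

x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.0]
h = [0.5, -1.0, 2.0]
print(systolic_fir(x, h))   # direct-form FIR output, delayed by len(h) - 1 cycles
```

Every stage does the same multiply-add on every clock, so throughput is one output per clock regardless of the number of taps; 512 taps at 500 MHz gives the 256 G multiply-accumulates per second quoted above.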
Xilinx FFT IP (4)
• FFT fully utilizes FPGA arithmetic hardware resources
• FFT viewed as a recursion using a butterfly kernel
[Figure: butterfly kernel built from CADD1, CADD2 and CMPY, with phase factors $e^{-j2\pi k/N}$.]
• CADD{1|2}: complex adder
• CMPY: complex multiplier
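The butterfly recursion can be written out in a few lines of Python. This is a textbook radix-2 decimation-in-time sketch for power-of-two lengths, not the Xilinx IP implementation; the comments map the three operations onto the CMPY, CADD1 and CADD2 names used above.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # phase factor e^{-j 2 pi k / N}
        t = w * odd[k]                          # CMPY: complex multiply
        out[k] = even[k] + t                    # CADD1: complex add
        out[k + n // 2] = even[k] - t           # CADD2: complex subtract
    return out

print([round(abs(v), 3) for v in fft_radix2([1, 0, 0, 0, 0, 0, 0, 0])])
```

An N-point transform needs (N/2)·log2(N) butterflies, which is the quantity a hardware design trades between parallel butterfly units and clock cycles.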
Virtex-4 DSP Slice
• DSP slice key for implementing high-performance arithmetic
• Embedded 18x18 MPY and 48b adder
  – Butterfly phase rotator
  – Cross-addition
Butterfly CMPLX MPY
• Complex MPY used in FFT butterfly
• Optimized to employ Virtex-4 DSP Slice
  – 4 and 3 MPY option
• Complex MPY available as IP module†
[Figure: DSP Slices 1 to 4 take Ar, Br, Ai and Bi and produce Pr and Pi.]
Pr + jPi = (Ar + jAi) x (Br + jBi)
† Available: 6.2i IP Update 2
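The 4-multiplier and 3-multiplier options can be compared with a short sketch. The 3-multiplier form shown is one common rearrangement and is an assumption here; the slides do not say which variant the IP module uses. Function names are illustrative.

```python
def cmpy4(ar, ai, br, bi):
    """Direct form: 4 real multiplies, 2 real adds."""
    return ar * br - ai * bi, ar * bi + ai * br

def cmpy3(ar, ai, br, bi):
    """Rearranged form: 3 real multiplies, 5 real adds."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2

print(cmpy4(3, -2, 5, 7), cmpy3(3, -2, 5, 7))
```

The 3-multiplier form trades one multiplier for extra adders and one more bit of operand width in the pre-additions, which matters when multipliers are the scarce resource.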
Performance/Parallelism/Area
• FPGA: highly parallel computing machine
• Achieve performance using functional unit parallelism
• Area/throughput tradeoff delivered via Xilinx IP library
• Butterfly array to produce high-performance FFT processor
• High computation rate using (possibly) hundreds of DSP slices
  – Allocate resources as appropriate to meet system requirements
• Large memory bandwidth using multi-port memory constructed from BRAMs
  – Mem read BW: 320 x 36 x 500e6 = 5.76 Tera-bps
FFT Architecture
• For a small number of carriers and modest data rates a single butterfly (I)FFT is probably suitable
  – Small FPGA footprint
[Figure: iteration engine with input data, a switch, Data RAM 0 and Data RAM 1, a phase factor ROM, a second switch and output data.]
Block boundary detection/Fine timing acquisition
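[Figure: correlator for the incoming samples against a known sequence, built from z^-1 delay lines, with conjugate ()*, averaging (ave), |·|² and arg blocks producing the timing estimate and the frequency estimate. Labels: 1 OFDM block of repeated data, half an OFDM block.]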
F. Tufvesson, O. Edfors, M. Faulkner, “Time and Frequency Synchronization for OFDM using PN-Sequence Preambles”, VTC-1999/Fall, vol 4, pp.2203-7, New Jersey, 1999.
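A simplified sketch of timing and frequency estimation from a repeated, known preamble, loosely in the spirit of the cited scheme. The details are assumptions: the preamble is two identical halves of a known QPSK sequence `p`, each half is correlated against `p`, and the two correlations are combined differentially. The magnitude peak gives the block boundary and the phase gives the carrier frequency offset. All names and parameter values are illustrative, and noise is omitted.

```python
import numpy as np

def detect_repeated_preamble(r, p):
    """Return (start index, CFO in cycles/sample) for a preamble made of [p, p]."""
    L = len(p)
    best_d, best_P = 0, 0j
    for d in range(len(r) - 2 * L + 1):
        c1 = np.vdot(p, r[d:d + L])            # correlation with the first half
        c2 = np.vdot(p, r[d + L:d + 2 * L])    # correlation with the second half
        P = np.conj(c1) * c2                   # differential combination
        if abs(P) > abs(best_P):
            best_d, best_P = d, P
    cfo = np.angle(best_P) / (2 * np.pi * L)   # phase advance over L samples
    return best_d, cfo

rng = np.random.default_rng(1)

def qpsk(n):
    return (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)

L, delay, eps = 32, 20, 0.002                  # half length, true start, true CFO (assumed)
p = qpsk(L)
tx = np.concatenate([np.zeros(delay), p, p, qpsk(64)])
r = tx * np.exp(2j * np.pi * eps * np.arange(len(tx)))

d_hat, eps_hat = detect_repeated_preamble(r, p)
print(d_hat, eps_hat)
```

The frequency estimate is unambiguous only while the phase advance over half a preamble stays within ±π, so this kind of estimator is usually paired with a coarse stage for larger offsets.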
Fine-timing acquisition using a clipped correlator
[Figure: System Generator model of the clipped correlator. A bank of correlators (C1 to C7) is built from 1-bit correlators; each uses a coefficient ROM, a MAC (AddSub with register) and delay blocks, and an FSM driven by BaudClk generates the data address, coefficient address and load signals.]
• 10 time multiplexed correlators
• Each 1-bit correlator: 10 slices
• Total for clipped correlator: 589 slices
• Full precision correlators: 32 embedded multipliers, 896 flipflops
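Why a 1-bit correlator is so cheap can be seen in a small sketch: once the samples are clipped to their sign bits, every product is ±1 and the correlator reduces to additions and subtractions. The version below clips both the samples and the reference, which is an assumption; the names and signal parameters are illustrative.

```python
import numpy as np

def clipped_correlate(samples, known):
    """Sliding correlation using only the sign bits of the I and Q components."""
    s = np.sign(samples.real) + 1j * np.sign(samples.imag)
    k = np.sign(known.real) + 1j * np.sign(known.imag)
    n = len(samples) - len(known) + 1
    return np.array([np.vdot(k, s[i:i + len(k)]) for i in range(n)])

rng = np.random.default_rng(0)
known = rng.choice([-1.0, 1.0], 32) + 1j * rng.choice([-1.0, 1.0], 32)
offset = 10                                     # true position of the known sequence (assumed)
samples = 0.05 * (rng.standard_normal(80) + 1j * rng.standard_normal(80))
samples[offset:offset + 32] += known

corr = clipped_correlate(samples, known)
print(int(np.argmax(np.abs(corr))))
```

The peak position is preserved even though all amplitude information is discarded, which is the property that lets the hardware replace multipliers with a handful of slices.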
QRD
• One of the popular methods of matrix inversion is based on QRD.
$$H = QR$$
• Q is unitary and R is upper triangular
• A unitary matrix has a trivial inverse: $Q^{-1} = Q^H$
• An upper triangular matrix can be inverted by back-substitution
$$H^{-1} = R^{-1} Q^H$$
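The inversion recipe can be sketched in NumPy: factor H = QR, then solve R X = Q^H by back-substitution so that X = R^-1 Q^H. The helper names are illustrative, and `np.linalg.qr` stands in for whatever QRD method the hardware uses.

```python
import numpy as np

def back_substitute(R, B):
    """Solve R X = B for X, with R upper triangular, one row at a time from the bottom."""
    n = R.shape[0]
    X = np.zeros_like(B, dtype=complex)
    for i in range(n - 1, -1, -1):
        X[i] = (B[i] - R[i, i + 1:] @ X[i + 1:]) / R[i, i]
    return X

def inverse_via_qr(H):
    """H^-1 = R^-1 Q^H, using back-substitution instead of an explicit inverse of R."""
    Q, R = np.linalg.qr(H)
    return back_substitute(R, Q.conj().T)

H = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, -1.0],
              [0.0, 1.0, 4.0]])
print(np.round(inverse_via_qr(H) @ H, 6).real)
```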
Givens Rotations
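• For a 2x1 vector of real numbers
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sqrt{a^2 + b^2} \\ 0 \end{bmatrix}, \quad c = \frac{a}{\sqrt{a^2 + b^2}}, \quad s = \frac{b}{\sqrt{a^2 + b^2}}$$
• For a NxM matrix, repeat the process 2 cells at a time.
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$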
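A sketch of QR decomposition by Givens rotations for a real matrix, zeroing one sub-diagonal entry at a time. The elimination order used here (down each column in turn) and the function names are assumptions for illustration; any order that zeroes all sub-diagonal entries gives a valid factorization.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [sqrt(a^2 + b^2), 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def qr_givens(A):
    """QR decomposition of a real matrix by successive Givens rotations."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q_h = np.eye(m)                       # accumulates the product of rotations (Q^H)
    for col in range(n):
        for row in range(col + 1, m):
            c, s = givens(R[col, col], R[row, col])
            G = np.eye(m)
            G[col, col], G[col, row] = c, s
            G[row, col], G[row, row] = -s, c
            R = G @ R                     # zeroes R[row, col]
            Q_h = G @ Q_h
    return Q_h.T, R

A = np.array([[4.0, 1.0, 2.0],
              [3.0, 5.0, 1.0],
              [0.5, 2.0, 6.0]])
Q, R = qr_givens(A)
print(np.round(R, 4))
```

Applying the same rotations to an identity matrix yields Q^H, which is how a systolic array can produce Q without extra hardware.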
Systolic Arrays
• Structured arrays with identical cells. Usually a “boundary” cell and an “internal” cell for the QRD process.
[Figure: triangular systolic array of boundary cells and internal cells.]
1. The boundary cell generates the rotations.
2. Internal cell applies the rotations to all the cells in the row.
3. The systolic array in this figure can handle any matrix up to 3x3.
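Triangularization mode
• For QRD of up to a 3x3 matrix we need 3 boundary cells and 3 internal cells.
• Boundary cells calculate rotation vectors and internal cells store them.
• Data is fed column-wise into the systolic array.
• This may have to be staggered depending on the pipelining delays through the boundary cell and internal cell.
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \rightarrow \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$
[Figure: the columns (a11, a21, a31), (a12, a22, a32) and (a13, a23, a33) are fed into the array.]
The rotation factors for zeroing out cell A(2,1) are stored in cell A(1,2), etc.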
Q-matrix computation mode
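$$Q^H A = R$$
$$Q^H I = Q^H$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & c_{32} & s_{32} \\ 0 & -s_{32} & c_{32} \end{bmatrix} \begin{bmatrix} c_{31} & 0 & s_{31} \\ 0 & 1 & 0 \\ -s_{31} & 0 & c_{31} \end{bmatrix} \begin{bmatrix} c_{21} & s_{21} & 0 \\ -s_{21} & c_{21} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$
[Figure: the identity columns (1, 0, 0), (0, 1, 0) and (0, 0, 1) are fed into the array; the outputs are labelled first, second and third column of the Q matrix.]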
* *
* . * .
* . * .
;
s x I s s I c
z x I c s I s
c c
HQ RA
FPGA Tools for DSP Systems Design
• Higher level tools are raising the level of abstraction.
• Allows non-hardware engineers (algorithm designers) to get a first look at hardware.
• System Generator
  – Simulink to Hardware
• C-to-Gates tools
  – C or “higher” level languages to gates
System Generator: System Level Modeling & Simulation Framework
Work in the language of your problem
[Figure labels: HDL, C.]
DSP Development Flow (System Generator Flow)
1. Develop Algorithm & System Model (Simulink MDL)
2. Automatic Code Generation (RTL VHDL & Cores, HDL Test Bench, Test Vectors)
3. Xilinx Implementation Flow (Bitstream)
Download to FPGA
[Figure also shows the HDL Simulation Flow and the FPGA target.]
Configurable MIMO-OFDM Transmitter
[Figure: System Generator model of the transmitter, with inputs DataIn and DataEnable and four complex outputs (RealOut1/ImagOut1 to RealOut4/ImagOut4). Blocks:]
• Packet Controller
• Packetization and configurable STBC encoding
• Pilot insertion and data loading
• Time shared FFT across antennas
• Add Cyclic Extension/Block Shaping
• Spatial Demultiplexing and Interpolation
• Clock Generator (SampleClk, BaudClk)
Resource sharing (folding factor): a ratio of system clock rate to symbol rate > 8 is needed for a 4 transmit antenna system.
MIMO Receiver Architecture
[Figure: receiver block diagram for Rx 1 to Rx 4. Per antenna: packet detection, strip CP, input FIFO, FFT, output FIFO. Shared blocks: combine PD, block boundary detection, coarse CFO estimate, CFO estimator, CFO compensator, pilot based CFO estimator, channel estimator, MIMO decoder matrix (MMSE, etc), MIMO decoder with FIFO producing soft decisions, and packet controller (preamble, payload). Part of the chain processes samples at the sample clock rate and part at the system clock rate.]
MIMO-OFDM Receiver
[Figure: System Generator model of the 4x4 receiver, with inputs RealIn1/ImagIn1 to RealIn4/ImagIn4 and Reset, and outputs SoftDecReal1/SoftDecImag1 to SoftDecReal4/SoftDecImag4, PacketDetect and ValidOut. Blocks:]
• Packet Detection
• Fine Timing Acq
• Cyclic prefix removal
• Channel Estimation (Ch_tx1rx1 to Ch_tx4rx4)
• Weight Matrix Computation (wreal_1_1/wimag_1_1 to wreal_4_4/wimag_4_4)
• MIMO Decoder
• FFT
• Carrier Frequency Offset Correction
• Output FIFO
• Input Buffer and Clock Generator (SampleClk, BaudClk)
Channel Estimation
[Figure: System Generator model of the channel estimation block. Inputs: Rxreal1..4, Rximag1..4, ValidData, Addr, ChEstPilots, ReadAddr. Outputs: Chreal1..16, Chimag1..16. Sixteen ChEst sub-blocks (Tx1-Rx1 through Tx4-Rx4) with single-port RAMs, training-symbol blocks for Tx1 through Tx4, constant multipliers (x 0.3535), counters, multiplexers and delays. Signal types shown include Fix_16_10, Fix_32_20, UFix_6_0 and Bool.]
Figure annotations: Channel Estimation Pilots for Tx1, Channel Estimation Pilots for Tx4, 4x4 Channel Estimation Memory, Control Signals, Input FIFO
Packet Detection
Schmidl and Cox algorithm for Packet Detection and coarse carrier frequency offset estimation.
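T. M. Schmidl, D. C. Cox, “Low Overhead Low Complexity Synchronization for OFDM”, ICC 1996, Vol. 3, pp. 1301-1306.
[Figure: detector signal flow. The received signal r(n) is conjugate-multiplied with a copy of itself delayed by D samples (z^-D); block C forms the correlation c(n), block P forms the power p(n), and the metric is m(n) = |c(n)|^2 / (p(n))^2.]
Training symbol: identical halves of 1 OFDM symbol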
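The detector described above can be sketched in a few lines of Python. This is a minimal illustration, not the System Generator design from the slides: the function names, the 0.9 threshold, the 32-sample half length and the synthetic test signal are all assumptions made for the example.

```python
import cmath
import random

def schmidl_cox_metric(r, half_len):
    """Timing metric m(n) = |c(n)|^2 / p(n)^2 for every window start n."""
    metric = []
    for n in range(len(r) - 2 * half_len + 1):
        first = r[n:n + half_len]
        second = r[n + half_len:n + 2 * half_len]
        # c(n): correlate the first half (conjugated) with the second half.
        c = sum(a.conjugate() * b for a, b in zip(first, second))
        # p(n): received power over the second half of the window.
        p = sum(abs(b) ** 2 for b in second)
        metric.append(abs(c) ** 2 / (p * p) if p > 0.0 else 0.0)
    return metric

def detect_packet(metric, threshold=0.9):
    """Index of the first window whose metric exceeds the threshold, or None."""
    for n, m in enumerate(metric):
        if m > threshold:
            return n
    return None

# Hypothetical test signal: 100 noise samples followed by a preamble made of
# two identical 32-sample halves (unit magnitude, random phase) plus noise.
rng = random.Random(0)

def noise():
    return complex(rng.gauss(0.0, 0.02), rng.gauss(0.0, 0.02))

half_len, lead = 32, 100
half = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(half_len)]
rx = [noise() for _ in range(lead)]
rx += [s + noise() for s in half + half]

metric = schmidl_cox_metric(rx, half_len)
start = detect_packet(metric)
```

The metric stays small over noise and approaches 1 when the window lines up with the two identical halves. A simple threshold crossing typically fires a sample or two early, which is one reason a separate fine timing stage follows packet detection in the receiver chain.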
Two-Branch CFO Estimation Using the Schmidl and Cox Algorithm
[Figure: System Generator model of the two-branch packet detector. Inputs: RealIn1, ImagIn1, RealIn2, ImagIn2, BaudClk, Rst. Outputs: CorrMetric_real, CorrMetric_imag, AvePwr. Blocks: Delays (z^-32), Complex Multiply, Complex Sliding Window Averager, Magnitude-Squared, Sliding Window Averager, AddSub (a + b), Slice and Reinterpret.]
Combine the metric from both Antennas
Carrier Frequency Offset causes a linearly increasing rotation in the time domain
[Figure annotation: the two halves of the symbol appear as $Y$ and $Y e^{j\theta}$]
Carrier Frequency Offset Estimation
• Pre-FFT
– Uses a dedicated preamble or symbol for CFO estimation
• Post-FFT using channel estimation pilots
– Uses channel estimation training symbols
• Post-FFT CFO tracking
– Needs continuous pilots during payload symbols
• CFO estimation using cyclic prefix
– Works well when you have a lengthy cyclic prefix
– Examples: WiMax, 3GPP-LTE, DVB-T/H
– Does not need preamble or pilot support
Pre-FFT Carrier Frequency Offset Estimation
[Figure: System Generator model of the pre-FFT CFO estimator. Inputs: RealIn1, ImagIn1, RealIn2, ImagIn2, Baud_clk, Rst, BBD. Output: CFO_Est. Blocks: Packet Detection (outputs CorrMetric_real, CorrMetric_imag, AvePwr), Truncate, CORDIC ATAN (inputs x, y; outputs mag, atan), constant multiplier CMult8 (x 0.003906), Convert, rising edge detector, Register and Delays.]
The angle of the correlation metric is proportional to the carrier frequency offset.
Right-size the number of bits before the CORDIC operation.
CORDIC ATAN from the Xilinx Math library calculates the angle.
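$\hat{\epsilon} = \dfrac{\theta}{2\pi \, N_s / 2}$, where $\theta$ is the angle of the correlation metric and $N_s$ is the OFDM symbol length in samples.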
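As a rough sketch of this step, the snippet below correlates the two identical halves of a training symbol and converts the angle of the result into a frequency offset. The names `estimate_cfo`, `half_len` and `true_cfo`, the chirp used as the training half, and the choice to express the offset in cycles per sample are assumptions for illustration; `cmath.phase` stands in for the CORDIC arctangent.

```python
import cmath
import math

def estimate_cfo(r, half_len):
    """Estimate CFO in cycles per sample from a symbol with two identical halves."""
    # Correlation between the first half (conjugated) and the second half.
    corr = sum(r[k].conjugate() * r[k + half_len] for k in range(half_len))
    # Angle of the correlation metric; in hardware a CORDIC block computes this.
    angle = cmath.phase(corr)
    # The halves are half_len samples apart, so angle = 2*pi*cfo*half_len.
    return angle / (2 * math.pi * half_len)

half_len = 32
true_cfo = 0.004   # hypothetical offset, cycles per sample

# Hypothetical unit-magnitude training half (a chirp), repeated twice.
half = [cmath.exp(2j * math.pi * k * k / half_len) for k in range(half_len)]
tx = half + half

# CFO rotates the phase linearly from sample to sample.
rx = [s * cmath.exp(2j * math.pi * true_cfo * n) for n, s in enumerate(tx)]

cfo_hat = estimate_cfo(rx, half_len)
```

Because the angle wraps at plus or minus pi, this estimator is only unambiguous for offsets smaller in magnitude than 1 / (2 * half_len) cycles per sample, about 0.0156 here.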
Post-FFT CFO Estimation and tracking
Location of channel estimation training symbols for Antenna 1 for a 2 antenna MIMO system
A subset of channel estimation training symbols is used for CFO estimation
Angular rotation on symbol 1
Angular rotation on symbol 2
$\theta(k)$: proportional to CFO
$\hat{\epsilon} = \dfrac{\mathrm{mean}(\theta(k))}{2\pi \left(1 + N_{CP}/N_s\right)}$
CFO causes a phase rotation that grows linearly from sample to sample in the time domain.
CFO causes a constant rotation on all subcarriers in the frequency domain.
This rotation increases from OFDM symbol to symbol and can be used to estimate CFO.
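The symbol-to-symbol rotation can be illustrated with a small simulation. This is a sketch under stated assumptions: the same training symbol is sent twice, the channel is ideal, the offset is expressed in units of the subcarrier spacing, and the names (`estimate_cfo_post_fft`, `pilot_idx`), the 16-point FFT size and the training pattern are hypothetical. A direct DFT is used only to keep the example free of external libraries.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, adequate for a small illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Direct O(N^2) inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def estimate_cfo_post_fft(Y1, Y2, pilot_idx, n_fft, n_cp):
    """CFO in units of the subcarrier spacing from two consecutive symbols."""
    # theta(k): rotation of pilot k between symbol 1 and symbol 2.
    theta = [cmath.phase(Y2[k] * Y1[k].conjugate()) for k in pilot_idx]
    mean_theta = sum(theta) / len(theta)
    # Consecutive symbols are n_fft + n_cp samples apart.
    return mean_theta / (2 * math.pi * (1 + n_cp / n_fft))

n_fft, n_cp, true_cfo = 16, 4, 0.05

# Hypothetical +/-1 training pattern on all subcarriers.
X = [1 if (k * k) % 3 == 0 else -1 for k in range(n_fft)]
x = idft(X)
symbol = x[-n_cp:] + x          # prepend the cyclic prefix
tx = symbol + symbol            # the same training symbol sent twice

# Apply the offset: phase advances by 2*pi*cfo/n_fft per sample.
rx = [s * cmath.exp(2j * math.pi * true_cfo * n / n_fft) for n, s in enumerate(tx)]

# Receiver: strip each cyclic prefix and transform to the frequency domain.
sym_len = n_fft + n_cp
Y1 = dft(rx[n_cp:sym_len])
Y2 = dft(rx[sym_len + n_cp:2 * sym_len])

cfo_hat = estimate_cfo_post_fft(Y1, Y2, pilot_idx=[1, 5, 9, 13],
                                n_fft=n_fft, n_cp=n_cp)
```

With identical symbols and no noise, every pilot sees the same rotation, so the mean recovers the offset exactly; with noise, averaging over the pilot subset reduces the variance of the estimate.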
Carrier Frequency Offset Correction
[Figure: System Generator model of the CFO correction block. Inputs: RealIn1..4, ImagIn1..4, CFO_Est, FFT_Start, CFO_Est_valid, Reset. Outputs: RealOut1..4, ImagOut1..4. Blocks: DDS (inputs freq_off, Enable, Reset; outputs cos_out, sin_out), four Complex Multiply blocks, Counter, Relational and Logical blocks, Negate, rising edge detector, constant multiplier (x 0.01563) and Delays. Signal types shown include Fix_16_10, Fix_16_15, UFix_16_0 and Bool.]
Direct digital synthesizer (DDS) from the Xilinx DSP SysGen library.
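A software analogue of this correction stage is sketched below: a phase-accumulator oscillator plays the role of the DDS, and each receive stream is multiplied by the conjugate of its output. The class `Nco`, the function `correct_cfo`, the two short test streams and the offset value are hypothetical; the sketch assumes the offset is given in cycles per sample and does not model fixed-point effects.

```python
import cmath
import math

class Nco:
    """Phase-accumulator oscillator standing in for a hardware DDS."""

    def __init__(self, freq):
        self.freq = freq      # cycles per sample
        self.phase = 0.0      # accumulated phase in cycles, kept in [0, 1)

    def step(self):
        """Return (cos, sin) for the current phase, then advance the phase."""
        angle = 2 * math.pi * self.phase
        self.phase = (self.phase + self.freq) % 1.0
        return math.cos(angle), math.sin(angle)

def correct_cfo(streams, cfo_hat):
    """Derotate every receive stream with the same oscillator output."""
    nco = Nco(cfo_hat)
    out = [[] for _ in streams]
    for samples in zip(*streams):
        cos_out, sin_out = nco.step()
        derotate = complex(cos_out, -sin_out)   # exp(-j * angle)
        for branch, x in zip(out, samples):
            branch.append(x * derotate)
    return out

cfo = 0.004   # hypothetical offset, cycles per sample

# Two hypothetical transmit streams of 16 samples each.
tx = [[complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)] * 4,
      [complex(1, 0), complex(0, 1)] * 8]

# The offset rotates both streams identically.
rx = [[s * cmath.exp(2j * math.pi * cfo * n) for n, s in enumerate(stream)]
      for stream in tx]

fixed = correct_cfo(rx, cfo)
max_err = max(abs(a - b) for f, t in zip(fixed, tx) for a, b in zip(f, t))
```

Sharing one oscillator across all streams is a natural choice when the receive branches use a common local oscillator, since they then see the same offset.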
Design methodology issues
• FPGA tools
– Where to from here?
• C-to-gates
– Higher level design languages to gates
– Raising the level of abstraction
End of Roadmap for the Von Neumann Model
• CPUs are as smart as they can be! [Plot: SPECInt92/MHz. Source: Ronen [2001]]
• Spot the CPU! [Figure: TI 6416 with L1 $, L2 $ and CPU marked. Source: Agarwala [2002]]
• Clock frequency scaling, absolute power limits: with Moore’s law you also get leakage! [Source: Borkar [1999]]
• Divide and conquer: multi-core arrays [Figure: 6x6 GALS Processor Array. Source: Zu & Baas [2006]]
• 1945-2005: Sequential programming
• 2005-????: Concurrent programming
Merging Mindsets: Software Design vs. Hardware Design
[Figure: classes A (with start()), B, C and D alongside resources resourceA, resourceB and resourceC]
Software design: events, protocols, ordering, sequential execution; encapsulation, abstraction, portability, re-use
Hardware design: implementation detail, control logic, interface glue, concurrency, communication, architecture, clocks, signals, timing
Combining the strengths of both paradigms can bring about a radical improvement in hardware/software system design productivity.
Objective for a New Methodology: reduce design cost (by a lot)
• Quality of result (QoR) is not a design goal!
– Performance, power, BOM cost budgets make QoR a design constraint
• The real objective is to meet the QoR target and minimize:
– Non-recurring engineering costs (NRE)
– Time-to-market (TTM)
• The new methodology should save on design cost by enabling
– Design of portable, retargetable, composable IP blocks
– Rapid design space exploration and system composition
[Figure: total design cost (NRE $, TTM) versus QoR (performance/$, performance/W) for the traditional HDL flow and the new methodology, annotated with abstraction profit and abstraction cost.]
‘C’ or higher level language to Gates
• There is interest from the design community in higher-level design methodologies such as C-to-gates.
• ESL (Electronic system level) tools/design methodologies are being explored.
• But, extracting all the concurrency from a sequential description is not an easy problem.
Actor/Dataflow Programming Model
[Figure: a network of actors. Each actor has encapsulated state and guarded atomic actions; actors are linked by point-to-point, buffered token-passing connections.]
• A well-known and researched model for concurrent systems
– Edward Lee et al. (UC Berkeley)
– Arvind et al. (MIT)
• Broadly applicable to heterogeneous HW/SW systems
• Actors are described in the CAL language (UC Berkeley)
– Open source simulator available from SourceForge
– Under consideration as reference model for MPEG
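To make the model concrete, here is a toy sketch in Python rather than CAL. It is not CAL syntax and not the SourceForge simulator; the classes `Source` and `BlockSum`, the `run` scheduler and the use of `deque` objects as FIFOs are all assumptions for illustration. Each actor keeps private state, each action has a guard, and a firing runs one action to completion.

```python
from collections import deque

class Source:
    """Actor that emits a fixed list of tokens, one per firing."""

    def __init__(self, values, out):
        self.pending = deque(values)   # encapsulated state
        self.out = out                 # output FIFO

    def fire(self):
        # Guard: there is still a token left to send.
        if self.pending:
            self.out.append(self.pending.popleft())
            return True
        return False

class BlockSum:
    """Actor that sums blocks of block_len input tokens."""

    def __init__(self, inp, out, block_len):
        self.inp, self.out = inp, out
        self.block_len = block_len
        self.total = 0                 # encapsulated state
        self.count = 0

    def fire(self):
        # Action 'emit'. Guard: a full block has been accumulated.
        if self.count == self.block_len:
            self.out.append(self.total)
            self.total, self.count = 0, 0
            return True
        # Action 'consume'. Guard: a token is waiting on the input FIFO.
        if self.inp:
            self.total += self.inp.popleft()
            self.count += 1
            return True
        return False

def run(actors):
    """Fire one enabled actor at a time until no guard is satisfied."""
    firings = 0
    while any(actor.fire() for actor in actors):
        firings += 1
    return firings

# Point-to-point buffered connections between the actors.
a_to_b = deque()
result = deque()
network = [Source([1, 2, 3, 4, 5, 6], a_to_b), BlockSum(a_to_b, result, 3)]
firings = run(network)
```

The actors share nothing except the FIFOs, so the same network could be scheduled sequentially in software, as here, or mapped to concurrent hardware blocks without changing the actor descriptions.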
Conclusion
• FPGAs are finding wide use in infrastructure communication systems and signal processing systems.
• FPGAs are an efficient choice for exploring VLSI architectures.
• FPGA tools are raising the level of abstraction so that algorithm designers can explore hardware architectures without learning hardware design tools and languages.
Questions?