An Implementation of IEEE802.11a
WLAN System using Subword Parallelism and its Quantization Error
Evaluation
ECE 734 FINAL PROJECT
PROF. YU HEN HU
SUBMITTED BY DAPHNE J. FRANKLIN
MUWU HOU ZAIPENG XIE
TABLE OF CONTENTS
1. Abstract
2. Introduction
2.1. Motivation
2.2. Background
2.3. Baseband Model
3. IEEE 802.11a Standard
4. Implementation of an OFDM system
5. Methodology
6. PLX
6.1. PLX implementation of FFT
7. Results
8. Conclusion
9. Future Work
10. References
11. Appendix
1. ABSTRACT
An OFDM system based on the IEEE 802.11a standard is studied and implemented. The
code for the transceiver (transmitter/receiver) is written in Matlab, and the encoder
and decoder portions of the code that invoke the IFFT and FFT functions are
translated into PLX. The primary characteristics are studied, and further optimizations, the
most important of which is subword parallelism, are applied to the code; the results are
compared with the theoretical results. This project evaluates the performance
characteristics of the OFDM system, simulated according to the parameters
established by the standard, as well as its quantization errors. Our simulation results show
that the BER and PER can be decreased significantly by
increasing the FFT/IFFT processor data width. The tradeoff between FFT/IFFT processor
data width and BER and PER makes it possible to choose the smallest
FFT/IFFT processor that meets a given performance requirement.
2. INTRODUCTION
2.1. MOTIVATION:
Orthogonal frequency division multiplexing (OFDM) is a special case of
multicarrier transmission, in which a single data stream is transmitted over a number of
lower-rate subcarriers. The primary reasons for employing multicarrier modulation
techniques like OFDM are the growing demand for communication
capacity with high bandwidth efficiency and its robustness with respect
to multipath fading and delay. The processing power of modern digital signal processors
has made the use of OFDM systems both practical and economical.
OFDM has already been accepted for the new wireless local area network standards IEEE
802.11a, High Performance LAN type 2 (HIPERLAN/2) and Mobile Multimedia Access
Communication (MMAC) Systems. It is also expected to be used for wireless broadband
multimedia communications, General Switched Telephone Network (GSTN), cellular
radio, Digital Audio Broadcasting (DAB), HDTV broadcasting and many more
application areas.
OFDM can be seen as either a modulation technique or a multiplexing technique. One of
the main reasons to use OFDM is to increase the robustness against frequency selective
fading or narrowband interference. In a single carrier system, a single fade or interferer
can cause the entire link to fail, but in a multicarrier system, only a small percentage of
the subcarriers will be affected. Error correction coding can then be used to correct for
the few erroneous subcarriers.
The FFT can be used to efficiently perform the modulation of data onto orthogonal
carriers. Recent advances in very-large-scale integration (VLSI) technology make
high-speed, large-size FFT chips commercially affordable. With this method, both transmitter
and receiver are implemented using efficient FFT techniques that reduce the number of
operations from on the order of N^2 for a direct DFT to on the order of N log N.
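To make the N^2 versus N log N comparison concrete, the following minimal sketch counts complex multiplications for a direct DFT and for a radix-2 FFT. The function names are hypothetical, and the radix-2 count of (N/2) log2 N butterflies is the usual textbook figure rather than a number taken from this report.

```python
def dft_complex_mults(n: int) -> int:
    """Direct N-point DFT: one complex multiplication per (input, output) pair."""
    return n * n


def fft_complex_mults(n: int) -> int:
    """Radix-2 FFT: N/2 butterflies in each of log2(N) stages."""
    stages = n.bit_length() - 1
    if n < 1 or (1 << stages) != n:
        raise ValueError("radix-2 FFT needs a power-of-two size")
    return (n // 2) * stages


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, dft_complex_mults(n), fft_complex_mults(n))
```

For the 64-point transform used later in this report, the counts are 4096 for the direct DFT and 192 for the radix-2 FFT.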
2.2. BACKGROUND:
In July 1998, the IEEE standardization group decided to select OFDM as the basis for
their new 5-GHz standard, targeting data rates from 6 up to 54 Mbps. This
new standard is the first one to use OFDM in packet-based communications; until
then, the use of OFDM was limited to continuous transmission systems.
The concept of using parallel data transmission and frequency division multiplexing was
published in the mid-1960s, although some early development can be
traced back to the 1950s. A U.S. patent was filed in November 1966 and issued in January 1970 [2].
2.3. BASEBAND MODEL
The fundamental idea defining an OFDM system is the division of the available
frequency spectrum into several subcarriers. To obtain a high spectral efficiency, the
frequency responses of the subcarriers are overlapping and orthogonal, hence the name
OFDM. By introducing a cyclic prefix (CP), this orthogonality can be completely maintained,
at the price of a small loss in SNR, even when the signal passes through a time-dispersive
fading channel.
The binary information is first grouped, coded, and mapped according to the modulation
in a “signal mapper.” The transmitter then converts the input stream of data from serial to
parallel sets. After the guard band is inserted, an N-point Inverse Fast Fourier Transform
(IFFT_N) block transforms the data sequence into the time domain (note that N is typically
256 or larger). The IFFT is particularly useful for OFDM systems because it generates samples of a
waveform whose frequency components satisfy the orthogonality condition. Following the IFFT block, a cyclic
extension of time length T_G, chosen to be larger than the expected delay spread, is
inserted to avoid intersymbol and intercarrier interference. The D/A converter contains
low-pass filters with bandwidth 1/T_S, where T_S is the sampling interval. The channel is
modeled as an impulse response g(t) followed by complex additive white Gaussian
noise (AWGN) n(t):
$$g(t) = \sum_m \alpha_m \, \delta(t - \tau_m T_S)$$
where the \alpha_m are complex values and 0 ≤ \tau_m T_S ≤ T_G.
At the receiver, after passing through the analog-to-digital converter (ADC) and
removing the CP, an N-point DFT (DFT_N) is used to transform the data back to the frequency domain.
Lastly, the binary information data is recovered by demodulation and channel
decoding.
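The role of the cyclic prefix in this baseband model can be illustrated with a small numerical sketch. It assumes a toy 8-point symbol, a hypothetical three-tap channel whose delay spread is shorter than the prefix, and no noise (the AWGN term n(t) is omitted); all function names are illustrative. With the prefix in place, the channel acts on each subcarrier as a single complex gain, so one division per subcarrier recovers the data.

```python
import cmath


def dft(x):
    """Direct DFT, used here for clarity rather than speed."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum):
    """Direct inverse DFT with 1/N scaling."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def add_cyclic_prefix(symbol, cp_len):
    """Copy the last cp_len samples of the symbol in front of it."""
    return symbol[-cp_len:] + symbol


def channel(signal, taps):
    """Linear convolution with the channel taps, truncated to the input length."""
    return [sum(taps[m] * signal[n - m] for m in range(len(taps)) if n - m >= 0)
            for n in range(len(signal))]


N, CP = 8, 3
data = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]  # QPSK points
taps = [1.0, 0.5, 0.25]  # hypothetical channel, delay spread of 2 samples <= CP

tx = add_cyclic_prefix(idft(data), CP)
rx = channel(tx, taps)[CP:]                    # receiver drops the prefix
Y = dft(rx)                                    # back to the frequency domain
H = dft(taps + [0.0] * (N - len(taps)))        # channel gain on each subcarrier
equalized = [Y[k] / H[k] for k in range(N)]    # one complex division per subcarrier

if __name__ == "__main__":
    print(max(abs(e - d) for e, d in zip(equalized, data)))
```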
3. IEEE 802.11a STANDARD
The IEEE 802.11 specification is a wireless LAN (WLAN) standard that defines a set of
requirements for the physical layer (PHY) and a medium access control (MAC) layer. For
high data rates, the standard provides two PHYs – IEEE 802.11b for 2.4-GHz operation
and IEEE 802.11a for 5-GHz operation. The IEEE 802.11a standard is designed to serve
applications that require data rates higher than 11 Mbps in the 5-GHz frequency band.
When developing WLAN systems, choosing the right modulation and frequency band
should be a priority in RF design, especially when designing IEEE 802.11a radios. For
the past decade, WLAN systems have been designed to operate in the unlicensed
2.4-GHz frequency band. The 2.4-GHz band provides 83 MHz of total contiguous bandwidth,
spanning from 2.4 to 2.483 GHz. Moving to the 5-GHz band offers over three times the
operating bandwidth available in the 2.4-GHz band. The 5-GHz band
is also less susceptible to interference than the 2.4-GHz unlicensed band, which shares
spectrum with other wireless appliances such as Bluetooth devices.
In the US, 300 MHz of bandwidth is allocated in the 5-GHz band to WLANs under the
rules of the Unlicensed-National Information Infrastructure (U-NII). The bandwidth is
fragmented into two blocks that are noncontiguous across the 5-GHz band. [1]
The major specifications of the OFDM PHY are shown in Table 1.
Parameter                 Value
Information Data Rate     6, 9, 12, 18, 24, 36, 48 and 54 Mbits/sec
Modulation                BPSK, QPSK, 16-QAM, 64-QAM
Error Correcting Code     K = 7 (64 states) convolutional code
Coding Rate               1/2, 2/3, 3/4
Number of Subcarriers     52
OFDM symbol duration      4 microsec
Guard Interval            0.8 microsec
Occupied Bandwidth        16.6 MHz
Table 1 – Major parameters of the OFDM PHY
By convention, the transmitted signals
are described in complex baseband signal notation. The actual transmitted signal is
related to the complex baseband signal by
$$r_{RF}(t) = \mathrm{Re}\{ r(t) \exp(j 2\pi f_c t) \}$$
where Re(.) represents the real part and f_c denotes the carrier center frequency.
The transmitted baseband signal is composed of contributions from several OFDM symbols [1]:
$$r_{PACKET}(t) = r_{PREAMBLE}(t) + r_{SIGNAL}(t - t_{SIGNAL}) + r_{DATA}(t - t_{DATA})$$
All the subframes of the signal are constructed as an inverse Fourier transform of a set of coefficients. The rate-dependent and time-dependent parameters specified by the IEEE
802.11a standard [1] are given in Table 2 and Table 3.
Data Rate   Modulation  Coding Rate  Coded bits      Coded bits per  Data bits per
(Mbits/s)               (R)          per subcarrier  OFDM symbol     OFDM symbol
6           BPSK        1/2          1               48              24
9           BPSK        3/4          1               48              36
12          QPSK        1/2          2               96              48
18          QPSK        3/4          2               96              72
24          16-QAM      1/2          4               192             96
36          16-QAM      3/4          4               192             144
48          64-QAM      2/3          6               288             192
54          64-QAM      3/4          6               288             216
Table 2 - Rate Dependent Parameters
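The rows of Table 2 follow from three quantities: the 48 data subcarriers, the number of coded bits each modulation carries per subcarrier, and the coding rate. A minimal sketch of that arithmetic is given below; the function and constant names are hypothetical, and the 4 microsecond symbol interval is taken from the PHY parameters of this report.

```python
from fractions import Fraction

N_SD = 48       # data subcarriers per OFDM symbol
T_SYM_US = 4    # OFDM symbol interval in microseconds
BITS_PER_SUBCARRIER = {"BPSK": 1, "QPSK": 2, "16-QAM": 4, "64-QAM": 6}


def rate_parameters(modulation, coding_rate):
    """Return (coded bits per symbol, data bits per symbol, data rate in Mbit/s)."""
    coded_bits = N_SD * BITS_PER_SUBCARRIER[modulation]
    data_bits = coded_bits * coding_rate   # exact arithmetic with Fraction
    rate_mbps = data_bits / T_SYM_US       # bits per microsecond equals Mbit/s
    return coded_bits, int(data_bits), int(rate_mbps)


if __name__ == "__main__":
    modes = [("BPSK", Fraction(1, 2)), ("BPSK", Fraction(3, 4)),
             ("QPSK", Fraction(1, 2)), ("QPSK", Fraction(3, 4)),
             ("16-QAM", Fraction(1, 2)), ("16-QAM", Fraction(3, 4)),
             ("64-QAM", Fraction(2, 3)), ("64-QAM", Fraction(3, 4))]
    for modulation, rate in modes:
        print(modulation, rate, rate_parameters(modulation, rate))
```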
Parameter                                          Value
Nsd: number of data subcarriers                    48
Nsp: number of pilot subcarriers                   4
Nst: total number of subcarriers                   52 (Nsd + Nsp)
Tfft: FFT/IFFT period                              3.2 microsec
Tsignal: duration of the SIGNAL BPSK-OFDM symbol   4 microsec (Tgi + Tfft)
Tgi: GI duration                                   0.8 microsec (Tfft/4)
Tgi2: training symbol GI duration                  1.6 microsec (Tfft/2)
Tsym: symbol interval                              4 microsec (Tgi + Tfft)
Table 3 - Time Dependent Parameters
4. IMPLEMENTATION OF AN OFDM SYSTEM
An OFDM system can be seen as either a multiplexing or a modulation technique. As
mentioned earlier, one of the primary advantages of OFDM is its robustness
against frequency selective fading and narrowband interference. In a single carrier system, a
single fade or interferer can cause the entire link to fail, whereas in a multicarrier
system such as OFDM the interference affects only a small set of the
subcarriers and thus does not corrupt the entire link.
Error correcting coding can then be used to correct the few subcarriers that are affected.
In a conventional parallel system, the total signal frequency band is divided into a fixed number of
non-overlapping frequency subchannels. Each subchannel is modulated with a separate
symbol, and the subchannels are then frequency multiplexed.
By using the overlapping multicarrier technique, we can save almost 50% of the
bandwidth. To realize the overlapping multicarrier technique, we need to reduce
the crosstalk between subcarriers, which requires the carriers to be mathematically orthogonal.
The figure below illustrates the difference between the conventional non-overlapping
multicarrier technique and the overlapping multicarrier modulation technique.
Figure 1 - Orthogonal multicarrier technique vs conventional multicarrier technique
An OFDM signal is a sum of subcarriers that are individually modulated by using phase
shift keying (PSK) or quadrature amplitude modulation (QAM). The symbol [3] can be
written as
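$$
s(t) =
\begin{cases}
\mathrm{Re}\left\{ \sum_{i=-N_s/2}^{N_s/2-1} d_{i+N_s/2} \exp\left( j 2\pi \left( f_c - \frac{i+0.5}{T} \right) (t - t_s) \right) \right\}, & t_s \le t \le t_s + T \\
0, & \text{otherwise}
\end{cases}
$$
Here N_s is the number of subcarriers, T is the symbol duration, t_s is the symbol start time, f_c is the carrier
frequency and d_i = d(i) = a(i) + j b(i), i = 0, 1, ..., N_s - 1, are the complex data symbols.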
The complex baseband equivalent of the OFDM signal defined above is the inverse Fourier
transform of N_s QAM input symbols. Its discrete-time counterpart is the inverse discrete
Fourier transform (IDFT). In practice, this transform can be implemented very efficiently by the
inverse fast Fourier transform (IFFT). The IFFT drastically reduces the number of
calculations by exploiting the regularity of the operations in the IDFT.
Figure 2 - OFDM modulator
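The orthogonality that lets overlapping subcarriers be separated can be checked numerically in discrete time. The sketch below is illustrative only: it builds the sampled subcarriers of an IDFT, shows that two different subcarriers have zero inner product over one symbol, and recovers a single data value by correlating the summed symbol with its own subcarrier. All function names are hypothetical.

```python
import cmath


def subcarrier(k, n_fft):
    """Samples of the k-th complex exponential over one symbol of n_fft samples."""
    return [cmath.exp(2j * cmath.pi * k * n / n_fft) for n in range(n_fft)]


def inner_product(a, b):
    """Correlation of two sample sequences over the symbol."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def ofdm_symbol(d):
    """Sum of subcarriers weighted by the data values d[k] (a direct IDFT)."""
    n_fft = len(d)
    carriers = [subcarrier(k, n_fft) for k in range(n_fft)]
    return [sum(d[k] * carriers[k][n] for k in range(n_fft)) / n_fft
            for n in range(n_fft)]


def recover(samples, k):
    """Correlate with subcarrier k; all other subcarriers contribute zero."""
    return inner_product(samples, subcarrier(k, len(samples)))


if __name__ == "__main__":
    print(abs(inner_product(subcarrier(3, 64), subcarrier(5, 64))))  # close to 0
    print(abs(inner_product(subcarrier(3, 64), subcarrier(3, 64))))  # close to 64
```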
5. METHODOLOGY
In the transmitter, binary input data is encoded by a rate ½ convolutional encoder. The
rate can be increased to 2/3 or ¾. After interleaving, the binary values are converted to
QAM values. Four pilot values are added to each group of 48 data values, resulting in a total of 52
QAM values per OFDM symbol. The symbol is modulated onto 52 subcarriers by
applying the Inverse Fast Fourier Transform (IFFT). The output is converted to serial and
a cyclic extension is added to make the system robust to multipath propagation.
Windowing is then applied to obtain a narrower output spectrum. The
signal is converted to analog and, using an IQ modulator, upconverted to the 5 GHz band, amplified, and
transmitted through the antenna.
The receiver performs the reverse operations of the transmitter, with additional training
tasks. In the first step, the receiver has to estimate frequency offset and symbol timing,
using special training symbols in the preamble. After removing the cyclic extension, the
signal can be applied to a Fast Fourier Transform to recover the 52 QAM values of all
subcarriers. The training symbols and the pilot subcarriers are used to correct for the
channel response as well as remaining phase drift. The QAM values are then demapped
into binary values, and finally a Viterbi decoder decodes the information bits.
In fact, the IFFT can be computed with an FFT by conjugating the input and the output of the FFT
and dividing the output by the FFT size. [1]
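The conjugation identity can be sketched in a few lines. The recursive radix-2 FFT below is a generic floating-point textbook version, not the PLX implementation described later in this report, and the function names are illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft_via_fft(spectrum):
    """IFFT computed as conj(FFT(conj(X))) / N."""
    n = len(spectrum)
    y = fft([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


if __name__ == "__main__":
    x = [1 + 2j, -1, 0.5j, 3, -2 - 1j, 0, 1j, 4]
    print(max(abs(a - b) for a, b in zip(ifft_via_fft(fft(x)), x)))
```

Reusing one FFT routine for both directions means only one transform kernel has to be written and optimized.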
Figure 3 - Inputs and Outputs of IDFT
If, for example, a 64-point IFFT is used, the coefficients 1 to 26 are mapped to the same
numbered IFFT inputs. The coefficients -26 to -1 are copied to IFFT inputs 38 to 63. The
rest of the inputs, 27 to 37 and the 0 input are set to zero. After performing an IFFT, the
output is cyclically extended to the desired length.
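The index mapping described above amounts to taking the subcarrier number modulo 64. The following sketch makes that explicit; the function names are hypothetical, and the guard length of 16 samples is an assumption derived from the guard interval being one quarter of the FFT period.

```python
def map_to_ifft_inputs(coeffs, n_fft=64):
    """Place coefficients indexed -26..-1 and 1..26 on the IFFT inputs.

    Negative subcarrier numbers wrap around to inputs 38..63; every input
    that is not assigned (0 and 27..37) stays zero.
    """
    inputs = [0j] * n_fft
    for k, value in coeffs.items():
        if k == 0 or not -26 <= k <= 26:
            raise ValueError(f"subcarrier {k} is not used")
        inputs[k % n_fft] = value
    return inputs


def cyclic_extend(samples, guard_len=16):
    """Prepend the last guard_len time samples (assumed guard of Tfft/4)."""
    return samples[-guard_len:] + samples


if __name__ == "__main__":
    coeffs = {k: complex(k, 0) for k in range(-26, 27) if k != 0}
    inputs = map_to_ifft_inputs(coeffs)
    print(inputs[1], inputs[26], inputs[38], inputs[63])
```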
The transmitter and receiver sections of the OFDM system were implemented in
MatLab [6] and modeled after the transmitter and receiver described by the IEEE
802.11a standard.
In the transmitter section, the binary input data is encoded with forward error
correction and then interleaved. After interleaving, the binary
values are converted to QAM values. Following this, the serial data is converted to parallel
form and the symbol is modulated onto 52 carriers by applying the IFFT. Once this is done,
the data is converted back into serial form, and a cyclic extension and
windowing are applied to the signal.
The transmitter section is shown in Figure 4.
Figure 4 - Transmitter block diagram for the OFDM PHY in IEEE 802.11a
As for the receiver, it essentially performs the reverse operations of the transmitter. In the
first step the receiver has to estimate the symbol timing and the frequency offset. After
removing the cyclic extension, the signal (in the parallel form) can now be applied to the
fast fourier transform (FFT) in order to recover the 52 QAM values of all the subcarriers.
The next step would be to convert these values to the original binary values by
demapping them. Finally a Viterbi decoder is used to decode the information bits.
Shown in Figure 5 is the block diagram of the receiver section.
Figure 5 - Receiver block diagram for the OFDM PHY in IEEE 802.11a
For each run, we use a random function to generate 100 packets of data, each packet
having a size of one byte. We use randomly generated data because it tests the
robustness of the system against a wide range of inputs. To simplify our model, we
use an additive white Gaussian noise (AWGN) channel, an idealized model for wireless channels, as our
default channel mode, because we want to minimize the effect of
interference and other channel impairments on the bit error rate (BER).
The packets go into the system, are processed by the transmitter and receiver, and
the corresponding output is generated. The modulation schemes we tested
are QPSK, 16-QAM and 64-QAM, and we simulate the FFT with different data
widths, starting from a bit width of 4 and incrementing it by 4 bits at each step until it reaches
a bit width of 64.
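The report does not spell out how the FFT data are reduced to a given width, so the following is only a hypothetical sketch of such a sweep. It assumes a signed two's-complement format with saturation and, as an arbitrary illustrative choice, two bits for sign and integer part with the rest as fraction.

```python
def quantize(value, total_bits, frac_bits):
    """Round to a signed fixed-point grid of total_bits bits, saturating at the ends."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = max(lo, min(hi, round(value * scale)))
    return code / scale


# Data widths swept in the simulation: 4, 8, ..., 64 bits.
WIDTHS = list(range(4, 65, 4))

if __name__ == "__main__":
    sample = 0.3
    for width in WIDTHS:
        frac = width - 2  # assumed split: sign bit, one integer bit, rest fraction
        print(width, abs(quantize(sample, width, frac) - sample))
```

Under these assumptions the rounding error of an in-range value is bounded by half a quantization step and shrinks as the width grows.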
6. PLX
Most modern processors employ subword parallelism in order to increase the
performance of the system. Significant improvement in the performance of multimedia
processing has been achieved by exploiting this subword parallelism. PLX
eliminates the need for complex instructions in constrained environments
and also ensures that the low cost and low power needs of such environments are met.
A PLX implementation achieves significant speedup when compared to
64-bit RISC processors and to IA-32 processors with MMX and SSE multimedia extensions.
The PLX ISA combines the most useful instructions from the first two generations of
multimedia instructions added to processors. Since our project focuses on
optimizing the code using subword parallelism, PLX is the natural choice of platform for
implementation. [5]
6.1. PLX implementation of FFT
We implemented the Cooley-Tukey fast Fourier transform algorithm on the PLX 1.2
platform, utilizing the subword parallelism provided by the PLX ISA to improve
performance.
PLX is an instruction set designed specially for integer subword parallelism. PLX 1.2,
the version we are using, does not support floating point operations, so our
implementation of the FFT/IFFT program must be in fixed point. In 802.11a and 802.11g, data
subchannels carry data symbols in parallel. Each symbol carries 1 bit (BPSK), 2 bits (QPSK),
4 bits (16-QAM) or 6 bits (64-QAM) of user data. Therefore, an FFT program that can
process 6-bit input symbols should be enough to handle all these cases.
To limit quantization error, we use 16 bits for the intermediate values
when computing the FFT or IFFT. The binary point is right before the 8th digit and the first two
bits are used as guard bits. Saturation arithmetic is also used to limit the error caused by
overflow.
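The effect of saturation arithmetic on 16-bit values can be shown with a short sketch. The helper names are hypothetical. The Q15 scaling is an assumption taken from the shift by 15 used in the kernel listing later in this section; a format with the binary point elsewhere would use a different shift.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(v):
    """Clamp an integer to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, v))


def wrap16(v):
    """Two's-complement wraparound, shown for comparison."""
    return (v + 32768) % 65536 - 32768


def sat_add16(a, b):
    """Saturating 16-bit addition."""
    return saturate16(a + b)


def to_q15(x):
    """Convert a real number in [-1, 1) to a Q15 integer (assumed format)."""
    return saturate16(round(x * 32768))


def q15_mul(a, b):
    """Fixed-point product of two Q15 values, rescaled by a right shift of 15."""
    return saturate16((a * b) >> 15)


if __name__ == "__main__":
    print(sat_add16(30000, 10000), wrap16(30000 + 10000))
    print(q15_mul(to_q15(0.5), to_q15(0.5)), to_q15(0.25))
```

On overflow the saturated sum stays at the largest representable value, whereas the wrapped sum changes sign, which is a much larger error.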
Since the PLX ISA does not have any support for sin/cos computation and it is too
time-consuming to implement them with subroutines (for example, using CORDIC arithmetic),
we precompute all the twiddle factors we will use and save them in an
array in the SRAM. When a twiddle factor is needed, the array is accessed by the twiddle
factor index k:
$$\omega_n(k) = e^{-j 2\pi k / n}$$
The twiddle factor is a complex fractional number. Both the real part and the imaginary
part occupy 16 bits, so two twiddle factors fit into one machine word in a 64-bit PLX
simulator.
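A precomputed twiddle table of this kind might be generated offline as sketched below. The Q15 scaling and the packing layout (real part in the low 16 bits, imaginary part in the high 16 bits of a 32-bit field) are assumptions made for illustration and are not confirmed by the report; the function names are hypothetical.

```python
import math


def _clamp16(v):
    """Keep a value inside the signed 16-bit range."""
    return max(-32768, min(32767, v))


def twiddle_table_q15(n):
    """Twiddle factors exp(-j*2*pi*k/n) for k = 0..n/2-1 as (real, imag) Q15 pairs."""
    table = []
    for k in range(n // 2):
        angle = -2.0 * math.pi * k / n
        re = _clamp16(round(math.cos(angle) * 32768))
        im = _clamp16(round(math.sin(angle) * 32768))
        table.append((re, im))
    return table


def pack_twiddle(re, im):
    """Pack one twiddle factor into 32 bits (assumed layout: imag high, real low)."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)


def unpack_twiddle(word):
    """Inverse of pack_twiddle, restoring the signed 16-bit halves."""
    def signed16(v):
        return v - 65536 if v & 0x8000 else v
    return signed16(word & 0xFFFF), signed16((word >> 16) & 0xFFFF)


if __name__ == "__main__":
    table = twiddle_table_q15(64)
    print(len(table), table[0], table[16])
```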
The FFT algorithm is implemented as a recursive function. This is not the most efficient
implementation, but it is a very simple and straightforward approach. To accommodate
function calls, we defined one of the 32 general purpose registers as the stack pointer
register. We also wrote macros such as push and pop to manipulate the stack. The kernel
computation of the FFT program is listed below as a C program:
for (int k = 0; k < N/2; ++k) {
    complex_t y1 = out[k];        /* upper butterfly input */
    complex_t y2 = out[k+N/2];    /* lower butterfly input */
    complex_t t = twiddle(N, k);  /* fixed-point twiddle factor */
    /* out[k] = y1 + t*y2 and out[k+N/2] = y1 - t*y2; each product is rescaled by >> 15 */
    out[k].real = y1.real + ((t.real * y2.real) >> 15) - ((t.imag * y2.imag) >> 15);
    out[k].imag = y1.imag + ((t.real * y2.imag) >> 15) + ((t.imag * y2.real) >> 15);
    out[k+N/2].real = y1.real - ((t.real * y2.real) >> 15) + ((t.imag * y2.imag) >> 15);
    out[k+N/2].imag = y1.imag - ((t.real * y2.imag) >> 15) - ((t.imag * y2.real) >> 15);
}
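The kernel above can be mirrored in Python to check one butterfly by hand. This is a sketch under the assumption that complex values are (real, imag) integer pairs and that the twiddle factor is a Q15 fraction; the function name is hypothetical. Like the C kernel, it does not saturate.

```python
def butterfly_q15(y1, y2, t):
    """One radix-2 butterfly on (real, imag) integer pairs with a Q15 twiddle t.

    Returns (y1 + t*y2, y1 - t*y2), rescaling each partial product by >> 15
    in the same order as the C kernel.
    """
    prod_re = ((t[0] * y2[0]) >> 15) - ((t[1] * y2[1]) >> 15)
    prod_im = ((t[0] * y2[1]) >> 15) + ((t[1] * y2[0]) >> 15)
    upper = (y1[0] + prod_re, y1[1] + prod_im)
    lower = (y1[0] - prod_re, y1[1] - prod_im)
    return upper, lower


if __name__ == "__main__":
    # t = -j in Q15, so t*y2 = -j * (100 + 50j) = 50 - 100j
    print(butterfly_q15((10, 20), (100, 50), (0, -32768)))
```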
We optimized this part using PLX subword parallelism and predicate registers. In our
implementation, one complex number occupies one register. The lower 32 bits
store the real part of the complex number, while the higher 32 bits hold the
imaginary part. A complex addition or subtraction therefore takes one parallel
addition/subtraction PLX instruction rather than the two additions/subtractions needed on a general
purpose processor. PLX also provides scaling for some instructions, as in the
pmulshr instruction.
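The register layout described above can be emulated to show why one parallel add performs a complete complex addition. The sketch below models a 64-bit word with two independent 32-bit lanes. The names are hypothetical, and the add is the plain modular variant; a saturating variant would clamp each lane instead of wrapping.

```python
MASK32 = 0xFFFFFFFF


def pack(real, imag):
    """Real part in the low 32 bits, imaginary part in the high 32 bits."""
    return ((imag & MASK32) << 32) | (real & MASK32)


def _signed32(v):
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def unpack(word):
    """Return (real, imag) as signed 32-bit integers."""
    return _signed32(word), _signed32(word >> 32)


def parallel_add32(a, b):
    """Add the two 32-bit lanes independently; no carry crosses the lane boundary."""
    lo = ((a & MASK32) + (b & MASK32)) & MASK32
    hi = ((a >> 32) + (b >> 32)) & MASK32
    return (hi << 32) | lo


if __name__ == "__main__":
    x = pack(3, -4)    # 3 - 4j
    y = pack(-5, 10)   # -5 + 10j
    print(unpack(parallel_add32(x, y)))
```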
Figure 6 – Computation Flow (operands t and y2, each held as imag | real halves of one register, are combined by pmulshr.15, followed by excheck and psub)
7. RESULTS AND DISCUSSION
We evaluated our WLAN system with multiple data sets. As mentioned in the
previous section, we use an AWGN channel model; for each run our program generates
100 random packets and sends them through the system. We tested the system with
different modulation modes: QPSK, 16-QAM, and 64-QAM. The results are averaged for
each configuration.
For QPSK, Figure 7 shows that increasing the FFT data width
improves the raw data bit error rate (BER). Here, raw data are the data that are received
and demodulated at the receiver but not yet Viterbi decoded.
Figure 7- raw Data BER vs FFT datawidth for QPSK
The raw data BER decreases dramatically when the FFT data
width is increased from 8 to 12 bits, and it keeps decreasing as the FFT width increases further.
Increasing the FFT processor data width from 4 bits to
64 bits decreases the raw BER by 50%.
The data BER is shown in Figure 8; again, the data BER is reduced by
increasing the FFT data width. Here, data means the output of the Viterbi decoder.
The data BER decreases by 67.5% when the FFT
and IFFT data width is increased from 4 bits to 64 bits. As before, the curve
drops quickly when the FFT data width is increased from 8 bits to 12 bits. It is also
worth mentioning that increasing the FFT data width from 28 to 32 bits gives
an 18% decrease in data BER, whereas increasing it from 32 to
36 bits decreases the data BER by just 3%. A similar observation holds when going
from a 40-bit data width to 44 and 48 bits. This means that we can choose the smallest
FFT data width that still gives a similar data error rate performance.
Figure 8 - Data BER vs FFT datawidth for QPSK
The data packet error rate (PER) is shown in Figure 9 below. A packet
is in error if any bit within the received packet is in error.
Therefore, the packet error rate is higher than
the bit error rate. However, the curve still goes down in the same way as in the previous two figures.
Here, we find that the PER decreases by 62% when we increase the data width of our
FFT and IFFT processor.
Figure 9 - Data PER vs FFT datawidth for QPSK
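The link between bit errors and packet errors can be made explicit with a simple model. The sketch below assumes independent bit errors, which is only an approximation (errors at the output of a Viterbi decoder tend to occur in bursts), and uses the one-byte packet size of this simulation. The function name is hypothetical.

```python
def packet_error_rate(ber, bits_per_packet=8):
    """PER under independent bit errors: a packet fails unless every bit is correct."""
    return 1.0 - (1.0 - ber) ** bits_per_packet


if __name__ == "__main__":
    for ber in (1e-3, 1e-2, 1e-1):
        print(ber, packet_error_rate(ber))
```

Under this model the PER is never smaller than the BER, and for small BER it is roughly the BER multiplied by the number of bits per packet.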
The simulation results for 16-QAM and 64-QAM are shown below. Increasing
the FFT data width from 4 to 64 bits decreases the raw data BER by 15% for 64-QAM and by 14.5% for 16-QAM.
When the FFT data width is increased from 4 to 28 bits, the curve goes down dramatically;
between 28 and 52 bits, however, the decrease is much less pronounced. For 16-QAM, the results are
shown in Figures 10, 11 and 12.
Figure 10 - raw Data BER vs FFT datawidth for 16 QAM
Figure 11 - Data BER vs FFT datawidth for 16 QAM
Figure 12 - Data PER vs FFT datawidth for 16 QAM
For 64 QAM code mode, the results are shown in figure 13, 14 and 15
Figure 13 - raw Data BER vs FFT datawidth for 64 QAM
Figure 14 - Data BER vs FFT datawidth for 64 QAM
Figure 15 - Data PER vs FFT datawidth for 64 QAM
8. CONCLUSION
In conclusion, we have evaluated the relationship between the FFT/IFFT processor
data width and the bit error rate and packet error rate. The simulation results show that we
can decrease the BER and PER significantly by increasing the FFT/IFFT processor data
width. However, the data width determines
the size of the complex multipliers and complex adders, so the wider the data path, the
larger the FFT/IFFT processor will be.
We should therefore look for the minimal FFT/IFFT data width that meets
the performance requirement. From the analysis above, we can see that
with an FFT/IFFT data width of 32 bits, the BER performance is similar
to that of FFT processors with wider data paths. Similarly, a 44-bit data width
is another point at which a similar observation can be made.
Therefore, in practice, when designing an OFDM system with given
requirements, we can use the tradeoff shown by these results to obtain the best
performance with a minimized FFT/IFFT processor under those constraints.
9. FUTURE WORK
In our project we have used the default AWGN channel for estimation as well as a fixed
packet size (one byte). However, in real systems packet sizes will vary based on the
implementation and requirements. Moreover, the packet error rate depends on both
the bit error rate (BER) and the packet size. For further research in this area, we can
therefore simulate packets of varying sizes and try different channel models such as
an exponential channel, a Rayleigh channel, etc.
10. REFERENCES
[1] Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)
Specification, IEEE Standard, Supplement to Standard 802 Part 11: Wireless
LAN, New York, NY, 1999.
[2] Orthogonal Frequency Division Multiplexing, U.S. Patent No. 3,488,445, filed
November 14, 1966, issued Jan. 6, 1970.
[3] Proakis, J. G., Digital Communications, Prentice Hall, 3rd edition, 1995.
[4] P. Duhamel and M. Vetterli. Fast Fourier transforms: a tutorial review and a state of
the art. Signal Processing, 19:259–299, April 1990.
[5] PLX 1.1 ISA Reference.
[6] An implementation of OFDM IEEE 802.11 WLAN system in Matlab
11. APPENDIX
11.1. FFT code - PLX
#define sp R30 //stack point
#define bp R29 //base point
//function call argument and result
#define o1 R1 //first output result
#define a1 R21 //first argument
#define a2 R22 //second argument
#define a3 R23
#define a4 R24
#define a5 R25
#define a6 R26
#define t1 R2 //local variable of a procedure
#define t2 R3
#define t3 R4
#define t4 R5
#define t5 R6
#define t6 R7
#define t7 R8
#define t8 R9
#define t9 R10
#define t10 R11
#define t11 R12
#define t12 R13
#define t13 R14
#define t14 R15
#define t15 R16

push macro Rs
    store.8.update Rs, sp, 8
endm

pop macro Rs
    load.8 Rs, sp, 0 //read data
    subi sp, sp, 8 //decrement sp by 8
endm

mov macro Rd, Rs
    ori Rd, Rs, 0
endm

call macro ADDR
    jmp.link ADDR
endm

halt macro
    trap 0xFFFF
end

#define twiddle_base 0x0000
#define stack_base 0x1000
#define in_base 0x5000
#define out_base 0x6000
#define fft_points 0x40 //decimal: 64
#define begin a1
#define inc a2
#define out a3
#define N a4
#define depth a5 //recursive depth

main proc
    loadi.z.0 sp, stack_base
    loadi.z.0 bp, stack_base
    loadi.z.0 begin, in_base
    loadi.z.0 inc, 1
    loadi.z.0 out, out_base
    loadi.z.0 N, fft_points
    loadi.z.0 depth, 0
    call FFT
    halt
endp

//recursive FFT
FFT:
    cmpi.eq N, 1, P1, P2
    P1 load.8 t1, begin, 0
    P1 pshifti.4.l t2, t1, 8
    P1 store.8 t2, out, 0
    P1 ret R31
    push begin
    push inc
    push out
    push N
    push depth
    push R31
    pshifti.8.l inc, inc, 1
    pshifti.8.r N, N, 1
    addi depth, depth, 1
    call FFT
    subi depth, depth, 1
    pop R31
    pop depth
    pop N
    pop out
    pop inc
    pop begin
    push begin
    push inc
    push out
    push N
    push depth
    push R31
    pshifti.8.l inc, inc, 1
    pshifti.8.r N, N, 1
    addi depth, depth, 1
    padd.8 begin, begin, inc
    padd.8 out, out, N
    call FFT
    pop R31
    pop depth
    pop N
    pop out
    pop inc
    pop begin
    mov t1, R0 // t1: k
    pshifti.8.r t2, N, 1 // t2: N/2
    padd.8 t3, out, t2 // t3: out+N/2
LOOP:
    cmp.lt t1, t2, P1, P2
    P2 ret R31
    loadx.8 t4, out, t1 //t4: y1
    loadx.8 t5, t3, t1 //t5: y2
    //get twiddle factor
    pshift.8.l t7, t1, depth //get index
    load.8 t6, t7, twiddle_base //t6: twiddle factor
    pmulshr.15 t7, t5, t6 //t7: tmp
    excheck.4 t8, t7, t7 //t8: tmp2
    psub.4.s t9, t7, t8 //t9: tmp3 = tmp - tmp2
    padd.4.s t10, t4, t9 //t10: tmp4: low portion of tmp4 is out[k].real
    psub.4.s t11, t4, t9 //t11: tmp5: low portion of tmp5 is out[k+N/2].real
    excheck.4 t7, t5, t5 //t7: y2'
    pmulshr.15 t7, t7, t6 //t7: tmp
    excheck.4 t8, t7, t7 //t8: tmp2
    padd.4.s t9, t7, t8 //t9: tmp3 = tmp + tmp2
    padd.4.s t12, t4, t9 //t12: tmp6: low portion of tmp6 is out[k].img
    psub.4.s t13, t4, t9 //t13: tmp7: low portion of tmp7 is out[k+N/2].img
    mix.4.r t10, t12, t10 //out[k]
    mix.4.r t11, t13, t11 //out[k+N/2]
    padd.8 t7, out, t1 //t7 = out+k
    store.8 t10, t7, 0 //out[k] <= t10
    padd.8 t7, t3, t1 //t7 = out + N/2 + k
    store.8 t11, t7, 0 //out[k+N/2] = t11
    addi t1, t1, 1
    jmp LOOP
//END of fft
11.2. Partial MatLab code – BER and PER evaluation
( Detailed program that can be simulated is found at http://homepages.cae.wisc.edu/~zaipengx/ECE734homework.shtml; ui_start is the command to start the WLAN simulation system )
function runsim(sim_options)

% set constants used in simulation
set_sim_consts;

% Set Random number generators initial state
% reset random number generators based on current clock value
rand('state',sum(100*clock));
randn('state',sum(100*clock));

% Main simulation loop

% Initialize simulation timer
start_time = clock;

% Initialize trellis tables for Viterbi decoding
rx_init_viterbi;

% counters for information bits
num_inf_bits = 0;
num_inf_bit_errors = 0;
num_inf_packet_errors = 0;
inf_ber = 0;
inf_per = 0;
num_inf_bits_fixed = 0;
num_inf_bit_errors_fixed = 0;
num_inf_packet_errors_fixed = 0;
inf_ber_fixed = 0;
inf_per_fixed = 0;

% counters for raw (uncoded) bits
num_raw_bits = 0;
num_raw_bit_errors = 0;
num_raw_packet_errors = 0;
raw_ber = 0;
raw_per = 0;
num_raw_bits_fixed = 0;
num_raw_bit_errors_fixed = 0;
num_raw_packet_errors_fixed = 0;
raw_ber_fixed = 0;
raw_per_fixed = 0;

% Simulation the number of packets specified
packet_count = 0;

%open a file to write data
fid_tx_syms_into_ifft_bit = fopen('syms_into_ifft.txt','w');
fid_tx_time_syms_bit = fopen('syms_from_ifft.txt','w');
fid_rx_data_syms_intofft = fopen('syms_into_fft.txt','w');
fid_rx_freq_data_fromfft = fopen('syms_from_fft.txt','w');
fid_errors = fopen('errors.txt','w');
fid_errors_fixed = fopen('errors_fixed.txt','w');

while packet_count < sim_options.PktsToSimulate
   packet_count = packet_count + 1;
   packet_start_time = clock;

   % Simulate one packet with the current options
   [inf_bit_cnt, inf_bit_errors, raw_bits_cnt, raw_bit_errors,tx_time_syms_bit,tx_syms_into_ifft_bit,rx_data_syms_intofft,rx_freq_data_fromfft,inf_bit_cnt_fixed, inf_bit_errors_fixed, raw_bits_cnt_fixed, raw_bit_errors_fixed] = ...
      single_packet(sim_options);

   %store Transmitter pre_ifft values, tx_syms_into_ifft_bit
   length_tx_syms_into_ifft_bit=length(tx_syms_into_ifft_bit);
   real_tx_syms_into_ifft_bit=real(tx_syms_into_ifft_bit);
   imag_tx_syms_into_ifft_bit=imag(tx_syms_into_ifft_bit);
   new_tx_syms_into_ifft_bit=reshape([real_tx_syms_into_ifft_bit imag_tx_syms_into_ifft_bit],length_tx_syms_into_ifft_bit,2);
   fprintf(fid_tx_syms_into_ifft_bit,'%5.4f %5.4f\n',new_tx_syms_into_ifft_bit.');

   %store Transmitter post_ifft values
   %tx_time_syms_bit
   length_tx_time_syms_bit=length(tx_time_syms_bit);
   real_tx_time_syms_bit=real(tx_time_syms_bit);
   imag_tx_time_syms_bit=imag(tx_time_syms_bit);
   new_tx_time_syms_bit=reshape([real_tx_time_syms_bit imag_tx_time_syms_bit],length_tx_time_syms_bit,2);
   fprintf(fid_tx_time_syms_bit,'%5.4f %5.4f\n',new_tx_time_syms_bit.');

   %store Receiver pre_fft values, rx_data_syms_into fft
   length_rx_data_syms_intofft=length(rx_data_syms_intofft);
   real_rx_data_syms_intofft=real(rx_data_syms_intofft);
   imag_rx_data_syms_intofft=imag(rx_data_syms_intofft);
   new_rx_data_syms_intofft=reshape([real_rx_data_syms_intofft imag_rx_data_syms_intofft],length_rx_data_syms_intofft*3,2);
   fprintf(fid_rx_data_syms_intofft,'%5.4f %5.4f\n',new_rx_data_syms_intofft.');

   %store Receiver post_ifft values, rx_freq_data_fromfft
   length_rx_freq_data_fromfft=length(rx_freq_data_fromfft);
   real_rx_freq_data_fromfft=real(rx_freq_data_fromfft);
   imag_rx_freq_data_fromfft=imag(rx_freq_data_fromfft);
   new_rx_freq_data_fromfft=reshape([real_rx_freq_data_fromfft imag_rx_freq_data_fromfft],length_rx_freq_data_fromfft*3,2);
   fprintf(fid_rx_freq_data_fromfft,'%5.4f %5.4f\n',new_rx_freq_data_fromfft.');

   num_inf_bits = num_inf_bits + inf_bit_cnt;
   num_inf_bit_errors = num_inf_bit_errors + inf_bit_errors;
   num_inf_packet_errors = num_inf_packet_errors + (inf_bit_errors~=0);
   inf_ber = num_inf_bit_errors/num_inf_bits;
   inf_per = num_inf_packet_errors/packet_count;
   num_raw_bits = num_raw_bits + raw_bits_cnt;
   num_raw_bit_errors = num_raw_bit_errors + raw_bit_errors;
   num_raw_packet_errors = num_raw_packet_errors + (raw_bit_errors~=0);
   raw_ber = num_raw_bit_errors/num_raw_bits;
   raw_per = num_raw_packet_errors/packet_count;

   %for our fixed point FFT and IFFT
   num_inf_bits_fixed = num_inf_bits_fixed + inf_bit_cnt_fixed;
   num_inf_bit_errors_fixed = num_inf_bit_errors_fixed + inf_bit_errors_fixed;
   num_inf_packet_errors_fixed = num_inf_packet_errors_fixed + (inf_bit_errors_fixed~=0);
   inf_ber_fixed = num_inf_bit_errors_fixed/num_inf_bits_fixed;
   inf_per_fixed = num_inf_packet_errors_fixed/packet_count;
   num_raw_bits_fixed = num_raw_bits_fixed + raw_bits_cnt_fixed;
   num_raw_bit_errors_fixed = num_raw_bit_errors_fixed + raw_bit_errors_fixed;
   num_raw_packet_errors_fixed = num_raw_packet_errors_fixed + (raw_bit_errors_fixed~=0);
   raw_ber_fixed = num_raw_bit_errors_fixed/num_raw_bits_fixed;
   raw_per_fixed = num_raw_packet_errors_fixed/packet_count;

   packet_stop_time = clock;
   packet_duration = etime(packet_stop_time, packet_start_time);

   % Display results
   fprintf('%8s %8s %9s %10s %8s %10s %10s %9s\n', ...
      ' Packet |', ' Time |', 'raw errs |', ' raw BER |', 'data errs |',' data BER |', ' raw PER |', 'data PER');
   fprintf('%7d |%7g | %8d |%10.2e |%10d |%10.2e |%10.2e |%10.2e\n',...
      packet_count, packet_duration, raw_bit_errors, raw_ber, inf_bit_errors, inf_ber, raw_per, inf_per);
   fprintf('%7d |%7g | %8d |%10.2e |%10d |%10.2e |%10.2e |%10.2e\n',...
      packet_count, packet_duration, raw_bit_errors_fixed, raw_ber_fixed, inf_bit_errors_fixed, inf_ber_fixed, raw_per_fixed, inf_per_fixed);

   %store results into a file
   fprintf(fid_errors,'%8.0f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f\n',...
      packet_count, packet_duration, raw_bit_errors, raw_ber, inf_bit_errors, inf_ber, raw_per, inf_per);
   fprintf(fid_errors_fixed,'%8.0f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f\n',...
      packet_count, packet_duration, raw_bit_errors_fixed, raw_ber_fixed, inf_bit_errors_fixed, inf_ber_fixed, raw_per_fixed, inf_per_fixed);

   % read event queue
   drawnow;
end

%close the file
status_tx_syms_into_ifft_bit = fclose(fid_tx_syms_into_ifft_bit);
status_tx_time_syms_bit = fclose(fid_tx_time_syms_bit);
status_rx_data_syms_intofft = fclose(fid_rx_data_syms_intofft);
status_rx_freq_data_fromfft = fclose(fid_rx_freq_data_fromfft);
status_errors = fclose(fid_errors);
status_errors_fixed = fclose(fid_errors_fixed);

stop_time = clock;
elapsed_time = etime(stop_time,start_time);
fprintf('Simulation duration: %g seconds\n',elapsed_time);
********************************** END ********************************