Page 1: An Implementation of IEEE802 - CAE Usershomepages.cae.wisc.edu/.../s06/xiehoufranklinReport.pdf · 2006-05-12 · carriers. Recent advances in very-large-scale integration (VLSI)

An Implementation of IEEE802.11a

WLAN System using Subword Parallelism and its Quantization Error

Evaluation

ECE 734 FINAL PROJECT

PROF. YU HEN HU

SUBMITTED BY DAPHNE J. FRANKLIN

MUWU HOU ZAIPENG XIE

TABLE OF CONTENTS

Title

1. Abstract
2. Introduction
2.1. Motivation
2.2. Background
2.3. Baseband Model
3. IEEE 802.11a Standard
4. Implementation of an OFDM system
5. Methodology
6. PLX
6.1. PLX implementation of FFT
7. Results
8. Conclusion
9. Future Work
10. References
11. Appendix

1. ABSTRACT

An OFDM system based on the IEEE 802.11a standard is studied and implemented. The code for the transceiver (transmitter/receiver) is written in Matlab, and the encoder and decoder portions of the code that invoke the IFFT and FFT functions are translated into PLX. The primary characteristics are studied, further optimizations (the most important of which is subword parallelism) are applied to the code, and the results are compared to the theoretical results. This project evaluates the performance characteristics of the OFDM system, simulated according to the parameters established by the standard, as well as its quantization errors. Our simulation results show that we can decrease the BER and PER significantly by increasing the FFT/IFFT processor data width. The tradeoff between FFT/IFFT processor data width and BER and PER lets us choose the smallest FFT/IFFT processor that meets a given performance requirement.

2. INTRODUCTION

2.1. MOTIVATION:

Orthogonal frequency division multiplexing (OFDM) is a special case of multicarrier transmission, where a single data stream is transmitted over a number of lower-rate subcarriers. The primary reasons for employing multicarrier modulation techniques like OFDM are the growing demand for communication capacity with high bandwidth efficiency and their robustness to multipath fading and delay. The processing power of modern digital signal processors has made the use of OFDM systems both practical and economical.

It has already been accepted for the new wireless local area network standards IEEE

802.11a, High Performance LAN type 2 (HIPERLAN/2) and Mobile Multimedia Access

Communication (MMAC) Systems. Also, it is expected to be used for wireless broadband

multimedia communications, General Switched Telephone Network (GSTN), Cellular

radio, Digital Audio Broadcasting (DAB), HDTV Broadcasting and many more

application areas.

OFDM can be seen as either a modulation technique or a multiplexing technique. One of

the main reasons to use OFDM is to increase the robustness against frequency selective

fading or narrowband interference. In a single carrier system, a single fade or interferer

can cause the entire link to fail, but in a multicarrier system, only a small percentage of

the subcarriers will be affected. Error correction coding can then be used to correct for

the few erroneous subcarriers.

The FFT can be used to efficiently perform the modulation of data onto orthogonal

carriers. Recent advances in very-large-scale integration (VLSI) technology make high-

speed, large-size FFT chips commercially affordable. Using this method, both transmitter

and receiver are implemented using efficient FFT techniques that reduce the number of

operations from $N^2$ in the DFT down to $N \log N$.
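
The saving can be made concrete with a short sketch. The Python below is only an illustration, not the project's code: `dft`, `fft` and `multiply_counts` are hypothetical names, and the FFT shown is a textbook recursive radix-2 variant that assumes the length is a power of two.

```python
import cmath

def dft(x):
    """Direct DFT: every output bin sums over all N inputs (N*N multiplications)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def multiply_counts(n):
    """Complex multiplications: direct DFT vs. radix-2 FFT (one per butterfly)."""
    stages = n.bit_length() - 1  # log2(n) when n is a power of two
    return n * n, (n // 2) * stages
```

For a 64-point transform, the size used by 802.11a, `multiply_counts(64)` compares 4096 multiplications for the direct DFT with 192 butterfly multiplications for the FFT.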

2.2. BACKGROUND:

In July 1998, the IEEE standardization group selected OFDM as the basis for its new 5-GHz standard, targeting data rates from 6 to 54 Mbps. This new standard is the first to use OFDM in packet-based communications; until then, the use of OFDM was limited to continuous transmission systems.

The concept of using parallel data transmission and frequency division multiplexing was published in the mid-1960s, although some early development can be traced back to the 1950s. A U.S. patent was filed in November 1966 and issued in January 1970 [2].

2.3. BASEBAND MODEL

The fundamental idea defining an OFDM system is the division of the available frequency spectrum into several subcarriers. To obtain a high spectral efficiency, the frequency responses of the subcarriers are overlapping and orthogonal, hence the name OFDM. By introducing a cyclic prefix (CP), this orthogonality can be completely maintained, at the price of a small loss in SNR, even when the signal passes through a time-dispersive fading channel.

The binary information is first grouped, coded, and mapped according to the modulation in a "signal mapper." The transmitter then converts the input stream of data from serial to parallel sets. After the guard band is inserted, an N-point inverse fast Fourier transform (IFFT_N) block transforms the data sequence into the time domain (note that N is typically 256 or larger). The IFFT is well suited to OFDM systems because it generates samples of a waveform whose frequency components satisfy the orthogonality condition. Following the IFFT block, a cyclic extension of time length $T_G$, chosen to be larger than the expected delay spread, is inserted to avoid intersymbol and intercarrier interference. The D/A converter contains low-pass filters with bandwidth $1/T_S$, where $T_S$ is the sampling interval. The channel is modeled as an impulse response $g(t)$ followed by complex additive white Gaussian noise (AWGN) $n(t)$:

$$g(t) = \sum_m \alpha_m \, \delta(t - \tau_m T_S)$$

where the $\alpha_m$ are complex values and $0 \le \tau_m T_S \le T_G$.

At the receiver, after passing through the analog-to-digital converter (ADC) and removing the CP, an N-point DFT (DFT_N) is used to transform the data back to the frequency domain. Lastly, the binary information data is recovered after demodulation and channel decoding.
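
The role of the cyclic prefix can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the project's Matlab model: it uses a noiseless channel with taps at integer sample delays, a direct DFT for brevity, and hypothetical helper names (`dft`, `add_cyclic_prefix`, `multipath`, `ofdm_roundtrip`). As long as the prefix is at least as long as the channel memory, each subcarrier is scaled by a single complex gain and can be recovered with a one-tap division.

```python
import cmath

def dft(x, inverse=False):
    """Direct DFT or inverse DFT; adequate for a short illustrative block."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def add_cyclic_prefix(block, cp_len):
    """Copy the last cp_len samples of the block to its front."""
    return block[-cp_len:] + block

def multipath(signal, taps):
    """Linear convolution with the channel taps, truncated to the input length."""
    return [sum(taps[j] * signal[i - j] for j in range(len(taps)) if i - j >= 0)
            for i in range(len(signal))]

def ofdm_roundtrip(symbols, taps, cp_len):
    """IDFT, add CP, pass through the channel, drop CP, DFT, one-tap equalize."""
    n = len(symbols)
    tx = add_cyclic_prefix(dft(symbols, inverse=True), cp_len)
    rx = multipath(tx, taps)[cp_len:]                # discard the prefix samples
    h = dft(list(taps) + [0] * (n - len(taps)))      # channel frequency response
    return [y / hk for y, hk in zip(dft(rx), h)]     # divide out the per-bin gain
```

With eight QPSK-like symbols, taps `[1, 0.5, 0.25j]` and `cp_len=2`, `ofdm_roundtrip` returns the transmitted symbols up to rounding error.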

3. IEEE 802.11a STANDARD

The IEEE 802.11 specification is a wireless LAN (WLAN) standard that defines a set of

requirements for the physical layer (PHY) and a medium access control (MAC) layer. For

high data rates, the standard provides two PHYs – IEEE 802.11b for 2.4-GHz operation

and IEEE 802.11a for 5-GHz operation. The IEEE 802.11a standard is designed to serve

applications that require data rates higher than 11 Mbps in the 5-GHz frequency band.

When developing WLAN systems, choosing the right modulation and frequency band

should be a priority in RF design, especially when designing IEEE 802.11a radios. For

the past decade, WLAN systems have been designed to operate in the unlicensed 2.4-

GHz frequency band. The 2.4-GHz band provides 83 MHz of total contiguous bandwidth,

spanning from 2.4 to 2.483 GHz. Moving to the 5-GHz band offers over three times the operating bandwidth available in the 2.4-GHz band. The 5-GHz band is also less susceptible to interference than the unlicensed 2.4-GHz band, which shares spectrum with other wireless appliances such as Bluetooth devices.

In the US, 300 MHz of bandwidth is allocated in the 5-GHz band to WLANs under the

rules of the Unlicensed-National Information Infrastructure (U-NII). The bandwidth is

fragmented into two blocks that are noncontiguous across the 5-GHz band. [1]

The major specifications of the OFDM PHY are shown in Table 1.

Parameter                 Value
Information data rate     6, 9, 12, 18, 24, 36, 48 and 54 Mbit/s
Modulation                BPSK, QPSK, 16-QAM, 64-QAM
Error correcting code     K = 7 (64 states) convolutional code
Coding rate               1/2, 2/3, 3/4
Number of subcarriers     52
OFDM symbol duration      4 µs
Guard interval            0.8 µs
Occupied bandwidth        16.6 MHz

Table 1 – Major parameters of the OFDM PHY

The mathematical conventions of the signal description specify that the transmitted signals are described in complex baseband signal notation. The actual transmitted signal is related to the complex baseband signal by

$$r_{RF}(t) = \mathrm{Re}\{ r(t) \exp(j 2\pi f_c t) \}$$

where Re(.) represents the real part and $f_c$ denotes the carrier center frequency.

The transmitted baseband signal is composed of contributions from several OFDM symbols [1]:

$$r_{PACKET}(t) = r_{PREAMBLE}(t) + r_{SIGNAL}(t - t_{SIGNAL}) + r_{DATA}(t - t_{DATA})$$

All the subframes of the signal are constructed as an inverse Fourier transform of a set of coefficients. The rate-dependent and timing-related parameters specified by the IEEE 802.11a standard [1] are given in Table 2 and Table 3.

Data rate   Modulation   Coding rate   Coded bits       Coded bits per   Data bits per
(Mbit/s)                 (R)           per subcarrier   OFDM symbol      OFDM symbol
6           BPSK         1/2           1                48               24
9           BPSK         3/4           1                48               36
12          QPSK         1/2           2                96               48
18          QPSK         3/4           2                96               72
24          16-QAM       1/2           4                192              96
36          16-QAM       3/4           4                192              144
48          64-QAM       2/3           6                288              192
54          64-QAM       3/4           6                288              216

Table 2 - Rate Dependent Parameter

Parameter                                          Value
Nsd: number of data subcarriers                    48
Nsp: number of pilot subcarriers                   4
Nst: total number of subcarriers                   52 (Nsd + Nsp)
Tfft: FFT/IFFT period                              3.2 µs
Tsignal: duration of the SIGNAL BPSK-OFDM symbol   4 µs (Tgi + Tfft)
Tgi: GI duration                                   0.8 µs (Tfft/4)
Tgi2: training symbol GI duration                  1.6 µs (Tfft/2)
Tsym: symbol interval                              4 µs (Tgi + Tfft)

Table 3 - Time Dependent Parameters
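
The rate-dependent entries follow directly from the timing parameters: 48 data subcarriers each carry a fixed number of coded bits, the coding rate gives the data bits per OFDM symbol, and one symbol is sent every 4 µs. A minimal Python sketch of that arithmetic (the function and constant names are illustrative, not taken from the standard or the project code):

```python
from fractions import Fraction

N_SD = 48        # data subcarriers per OFDM symbol
T_SYM_US = 4     # symbol interval in microseconds
BITS_PER_SUBCARRIER = {"BPSK": 1, "QPSK": 2, "16-QAM": 4, "64-QAM": 6}

def rate_parameters(modulation, coding_rate):
    """Return (coded bits per symbol, data bits per symbol, data rate in Mbit/s)."""
    coded_bits_per_symbol = N_SD * BITS_PER_SUBCARRIER[modulation]
    data_bits_per_symbol = coded_bits_per_symbol * Fraction(coding_rate)
    data_rate_mbps = data_bits_per_symbol / T_SYM_US  # bits per µs equals Mbit/s
    return coded_bits_per_symbol, int(data_bits_per_symbol), int(data_rate_mbps)
```

For example, `rate_parameters("64-QAM", "3/4")` gives 288 coded bits, 216 data bits and 54 Mbit/s, and `rate_parameters("BPSK", "1/2")` gives 48, 24 and 6 Mbit/s.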

4. IMPLEMENTATION OF AN OFDM SYSTEM

An OFDM system can be seen as either a multiplexing or a modulation technique. As mentioned earlier, one of its primary advantages is its robustness against frequency-selective fading and narrowband interference. In a single-carrier system, a single fade or interferer can cause the entire link to fail, whereas in a multicarrier system such as OFDM it affects only a small set of the subcarriers and thus does not corrupt the entire link. Error correcting coding can then be used to correct the few subcarriers that are affected.

In a classical parallel data system, the total signal frequency band is divided into a fixed number of non-overlapping frequency subchannels. Each subchannel is modulated with a separate symbol, and the subchannels are then frequency multiplexed.

By using the overlapping multicarrier technique, we can save almost 50% of the bandwidth. To realize it, however, we need to reduce the crosstalk between subcarriers, which requires the carriers to be mathematically orthogonal.

Figure 1 illustrates the difference between the conventional non-overlapping multicarrier technique and the overlapping multicarrier modulation technique.

Figure 1 - Orthogonal multicarrier technique vs Conventional multicarrier technique

An OFDM signal is a sum of subcarriers that are individually modulated by using phase shift keying (PSK) or quadrature amplitude modulation (QAM). The symbol [3] can be written as

$$
s(t) =
\begin{cases}
\mathrm{Re}\left\{ \sum_{i=-N_s/2}^{N_s/2-1} d_{i+N_s/2} \exp\left( j 2\pi \left( f_c - \frac{i+0.5}{T} \right) (t - t_s) \right) \right\}, & t_s \le t \le t_s + T \\
0, & \text{otherwise}
\end{cases}
$$

Here $N_s$ is the number of subcarriers, $T$ is the symbol duration, $f_c$ is the carrier frequency, $t_s$ is the symbol start time, and $d_i = d(i) = a(i) + j\,b(i)$, $i = 0, 1, \dots, N_s - 1$, are the complex input symbols.

The complex baseband OFDM signal defined by the equation above is the inverse Fourier transform of $N_s$ QAM input symbols. Its discrete-time counterpart is the inverse discrete Fourier transform (IDFT). In practice, this transform can be implemented very efficiently by the inverse fast Fourier transform (IFFT), which drastically reduces the number of calculations by exploiting the regularity of the operations in the IDFT.

Figure 2 - OFDM modulator

5. METHODOLOGY

In the transmitter, binary input data is encoded by a rate-1/2 convolutional encoder. The rate can be increased to 2/3 or 3/4. After interleaving, the binary values are converted to QAM values. Four pilot values are added to each group of 48 data values, resulting in a total of 52 QAM values per OFDM symbol. The symbol is modulated onto 52 subcarriers by applying the inverse fast Fourier transform (IFFT). The output is converted to serial, and a cyclic extension is added to make the system robust to multipath propagation. Windowing is then applied to obtain a narrower output spectrum. Using an IQ modulator, the signal is converted to an analog signal, which is upconverted to the 5-GHz band, amplified, and transmitted through the antenna.

The receiver performs the reverse operations of the transmitter, with additional training

tasks. In the first step, the receiver has to estimate frequency offset and symbol timing,

using special training symbols in the preamble. After removing the cyclic extension, the

signal can be applied to a Fast Fourier Transform to recover the 52 QAM values of all

subcarriers. The training symbols and the pilot subcarriers are used to correct for the

channel response as well as remaining phase drift. The QAM values are then demapped

into binary values, and finally a Viterbi decoder decodes the information bits.

In fact, the IFFT can be computed with an FFT by conjugating the input and the output of the FFT and dividing the output by the FFT size. [1]
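
This identity is easy to verify in a few lines. The Python below is a sketch for illustration only; `fft` is a generic recursive radix-2 routine (power-of-two lengths assumed) and `ifft_via_fft` is a hypothetical helper name.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft_via_fft(x):
    """Inverse FFT: conjugate the input, run the forward FFT, conjugate and scale."""
    n = len(x)
    forward = fft([complex(v).conjugate() for v in x])
    return [v.conjugate() / n for v in forward]
```

Applying `ifft_via_fft` to the output of `fft` returns the original sequence up to rounding error, so a single forward-FFT routine can serve both the transmitter and the receiver.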

Figure 3 - Inputs and Outputs of IDFT

If, for example, a 64-point IFFT is used, the coefficients 1 to 26 are mapped to the same

numbered IFFT inputs. The coefficients -26 to -1 are copied to IFFT inputs 38 to 63. The

rest of the inputs, 27 to 37 and the 0 input are set to zero. After performing an IFFT, the

output is cyclically extended to the desired length.
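
The mapping described above amounts to taking the subcarrier index modulo 64. A small Python sketch (the function name and the dictionary-based input are assumptions made for the example):

```python
def map_to_ifft_inputs(coeffs):
    """Place subcarrier coefficients -26..-1 and 1..26 on a 64-point IFFT input.

    coeffs maps a subcarrier index to its complex value. Positive indices keep
    their number, negative indices wrap to 38..63, and all other inputs
    (0 and 27..37) stay zero.
    """
    inputs = [0j] * 64
    for k, value in coeffs.items():
        if k == 0 or not -26 <= k <= 26:
            raise ValueError("subcarrier index must be in -26..-1 or 1..26")
        inputs[k % 64] = value  # Python's modulo maps -26 to 38 and -1 to 63
    return inputs
```

For instance, the coefficient for subcarrier -26 lands on IFFT input 38 and the coefficient for subcarrier 26 on input 26.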

The transmitter and receiver sections of the OFDM system were implemented in MATLAB [6], modeled after the transmitter and receiver described by the IEEE 802.11a standard.

In the transmitter section, the binary input data is encoded with forward error correction and then interleaved. After interleaving, the binary values are converted to QAM values. The serial data is then converted to parallel form, and the symbol is modulated onto 52 carriers by applying the IFFT. Finally, the data is converted back to serial form, and a cyclic extension and windowing are applied to the signal.

Figure 4 shows the transmitter section.

Figure 4 - Transmitter block diagram for the OFDM PHY in IEEE 802.11a

As for the receiver, it essentially performs the reverse operations of the transmitter. In the

first step the receiver has to estimate the symbol timing and the frequency offset. After

removing the cyclic extension, the signal (in the parallel form) can now be applied to the

fast fourier transform (FFT) in order to recover the 52 QAM values of all the subcarriers.

The next step would be to convert these values to the original binary values by

demapping them. Finally a Viterbi decoder is used to decode the information bits.

Shown in Figure 5 is the block diagram of the receiver section.

Figure 5 - Receiver block diagram for the OFDM PHY in IEEE 802.11a

In each run, we use a random function to generate 100 packets of data, each packet having a size of one byte. Randomly generated data is useful for testing the robustness of the system against a wide range of inputs. To simplify our model, we use an additive white Gaussian noise (AWGN) channel as the default channel model: it is the idealized model for wireless channels and minimizes the effect of interference and other channel impairments on the bit error rate (BER).

Each packet is processed by the transmitter and the receiver, and the corresponding output is generated. We tested the QPSK, 16-QAM and 64-QAM modulation schemes, and we simulated the FFT with data widths from 4 bits to 64 bits in steps of 4 bits.
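
How the data width affects precision can be sketched as follows. This Python fragment is an assumption-laden illustration rather than the project's Matlab code: it models a width of `bits` as a signed, uniformly spaced grid over a hypothetical full-scale range of [-1, 1) with saturation at the ends, and the names `quantize`, `max_quantization_error` and `DATA_WIDTHS` are invented for the example.

```python
def quantize(value, bits, full_scale=1.0):
    """Round value to a signed fixed-point grid of `bits` bits, saturating at the ends."""
    levels = 1 << (bits - 1)                    # number of steps on each side of zero
    code = round(value / full_scale * levels)
    code = max(-levels, min(levels - 1, code))  # saturate instead of wrapping around
    return code * full_scale / levels

def max_quantization_error(samples, bits):
    """Largest absolute rounding error over a set of samples."""
    return max(abs(s - quantize(s, bits)) for s in samples)

# The sweep of data widths described in the text: 4, 8, ..., 64 bits.
DATA_WIDTHS = list(range(4, 65, 4))
```

Each additional 4 bits shrinks the step size by a factor of 16, so the worst-case rounding error drops quickly at small widths; double-precision floats cannot resolve the grid at the widest settings, which is one reason this is only a sketch.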

6. PLX

Most modern processors employ subword parallelism to increase performance, and significant improvements in multimedia processing have been achieved by exploiting it. PLX eliminates the need for complex instructions in constrained environments while meeting the low-cost and low-power needs of such environments.

PLX implementations achieve significant speedups when compared to 64-bit RISC processors and to IA-32 processors with MMX and SSE multimedia extensions. The PLX ISA combines the most useful instructions from the first two generations of multimedia instructions added to processors. Since our project focuses on optimizing the code using subword parallelism, PLX is the natural choice of platform for the implementation. [5]

6.1. PLX implementation of FFT

We implemented the Cooley-Tukey fast Fourier transform algorithm on the PLX 1.2 platform, utilizing the subword parallelism provided by the PLX ISA to improve performance.

PLX is an instruction set designed specifically for integer subword parallelism. PLX 1.2, the version we are using, does not support floating-point operations, so our implementation of the FFT/IFFT program must use fixed-point arithmetic. In 802.11a and 802.11g, the data subchannels carry data symbols in parallel. Each symbol carries 1 bit (BPSK), 2 bits (QPSK), 4 bits (16-QAM) or 6 bits (64-QAM) of user data. Therefore, an FFT program that can process 6-bit input symbols should be enough to handle all these cases.

To reduce the possibility of quantization error, we use 16 bits for the intermediate value

when doing FFT or IFFT. The fractional point is right before the 8th digit and the first two

bits are used as guard bits. Saturation arithmetic is also used to mitigate quantization

error.
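
The effect of saturation on a 16-bit intermediate value can be illustrated with a short Python sketch. The format below (8 fractional bits in a signed 16-bit word) is an assumption chosen for the example and may not match the exact bit layout used in the PLX program; all function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
FRAC_BITS = 8  # assumed number of fractional bits, for illustration only

def saturate16(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))

def to_fixed(x):
    """Convert a real number to the assumed 16-bit fixed-point code."""
    return saturate16(round(x * (1 << FRAC_BITS)))

def to_float(q):
    """Convert a fixed-point code back to a real number."""
    return q / (1 << FRAC_BITS)

def sat_add(a, b):
    """Saturating add: an overflow sticks at the largest representable value."""
    return saturate16(a + b)

def wrap_add(a, b):
    """Two's-complement add: an overflow wraps around and flips the sign."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s
```

Adding 100.0 to 100.0 in this format overflows: `sat_add` returns the maximum code (about 128), while `wrap_add` returns a large negative value, which is the kind of gross error that saturation avoids.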

Since the PLX ISA does not have any support for sin/cos computation, and it is too time-consuming to implement them with subroutines (for example, using CORDIC arithmetic), we precompute all the twiddle factors we will use and save them in an array in the SRAM. When a twiddle factor is needed, the array is accessed by the twiddle factor index k:

$$\omega_n(k) = e^{-j 2\pi k / n}$$

The twiddle factor is a complex fractional number. Both the real part and the imaginary part occupy 16 bits. Two twiddle factors fit into one machine word in a 64-bit PLX simulator.
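
A precomputed table of this kind might be generated offline as in the Python sketch below. The Q15 scaling matches the `>> 15` shifts in the kernel, but the packing order (real part in the low 16 bits, imaginary part in the high 16 bits) and all function names are assumptions made for the example.

```python
import math

Q15_ONE = 1 << 15

def to_q15(x):
    """Scale a value in [-1, 1] to a 16-bit Q15 integer, saturating at +32767."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(x * Q15_ONE)))

def twiddle_table(n):
    """Q15 (real, imag) pairs of exp(-j*2*pi*k/n) for k = 0 .. n/2 - 1."""
    return [(to_q15(math.cos(2 * math.pi * k / n)),
             to_q15(-math.sin(2 * math.pi * k / n)))
            for k in range(n // 2)]

def pack_twiddle(real, imag):
    """Pack one twiddle factor into 32 bits (assumed layout: imag high, real low)."""
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)

def unpack_twiddle(word):
    """Recover the signed 16-bit real and imaginary parts from a packed word."""
    def signed16(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed16(word & 0xFFFF), signed16((word >> 16) & 0xFFFF)
```

For a 64-point FFT the table holds 32 entries; since each packed factor takes 32 bits, two of them fit in one 64-bit word, as stated above.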

The FFT algorithm is implemented as a recursive function. This is not the most efficient implementation, but it is a simple and straightforward approach. To accommodate function calls, we defined one of the 32 general-purpose registers as the stack pointer register. We also wrote macros such as push and pop to manipulate the stack. The kernel computation of the FFT program is listed below as a C program:

for (int k = 0; k < N/2; ++k) {
    complex_t y1 = out[k];
    complex_t y2 = out[k+N/2];
    complex_t t = twiddle(N, k);   /* Q15 twiddle factor */
    /* Butterfly: out[k] = y1 + t*y2, out[k+N/2] = y1 - t*y2.
       Each product is scaled back from Q15 with >> 15. */
    out[k].real = y1.real + ((t.real * y2.real) >> 15) - ((t.imag * y2.imag) >> 15);
    out[k].imag = y1.imag + ((t.real * y2.imag) >> 15) + ((t.imag * y2.real) >> 15);
    out[k+N/2].real = y1.real - ((t.real * y2.real) >> 15) + ((t.imag * y2.imag) >> 15);
    out[k+N/2].imag = y1.imag - ((t.real * y2.imag) >> 15) - ((t.imag * y2.real) >> 15);
}

We optimized this part using the PLX subword parallelism and predicate registers. In our implementation, one complex number occupies one register: the lower 32 bits store the real part of the complex number, while the higher 32 bits store the imaginary part. A complex addition or subtraction therefore takes one parallel addition/subtraction PLX instruction rather than the two additions/subtractions needed on a general-purpose processor. PLX also provides built-in scaling for some instructions, such as the pmulshr instruction.
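
The idea of a packed addition can be emulated in software. The Python sketch below is a hypothetical model of a 64-bit register holding two 32-bit subwords; it mimics a non-saturating packed add and is not a description of the exact semantics of any PLX instruction.

```python
MASK32 = 0xFFFFFFFF

def pack_complex(real, imag):
    """Store real in the lower 32 bits and imag in the upper 32 bits of one word."""
    return ((imag & MASK32) << 32) | (real & MASK32)

def unpack_complex(word):
    """Recover the signed 32-bit real and imaginary subwords."""
    def signed32(v):
        return v - (1 << 32) if v & (1 << 31) else v
    return signed32(word & MASK32), signed32((word >> 32) & MASK32)

def packed_add(a, b):
    """Add both 32-bit subwords independently; carries never cross the lane boundary."""
    low = ((a & MASK32) + (b & MASK32)) & MASK32
    high = ((a >> 32) + (b >> 32)) & MASK32
    return (high << 32) | low
```

One call to `packed_add` performs a full complex addition, e.g. (5 - 3j) + (-7 + 10j) = (-2 + 7j). A plain 64-bit addition of the same words would let a carry out of the real part corrupt the imaginary part, which is why the lanes must be kept separate.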

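[Figure 6: the registers t and y2, each holding an imag and a real subword, are multiplied with pmulshr.15; excheck swaps the imag and real subwords of the product, and the two are combined with psub.]

Figure 6 – Computation Flow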

7. RESULTS AND DISCUSSION

We evaluated our WLAN system with multiple data sets. As mentioned in the previous section, we use an AWGN channel model; in each run, our program generates 100 random packets and sends them through our system. We tested our system with different modulation modes: QPSK, 16-QAM and 64-QAM. The results are averaged for each configuration.

For QPSK, we can see from figure 7 that by increasing the FFT data width, we can

improve the raw data bit error rate (BER). Here, raw data are those data that are received

and demodulated at the receiver but without Viterbi decoding.

Figure 7- raw Data BER vs FFT datawidth for QPSK

The raw data BER decreases dramatically when we increase the FFT data width from 8 to 12 bits, and it keeps decreasing as the FFT data width increases. Increasing the FFT processor data width from 4 bits to 64 bits decreases the raw BER by 50%.

The data BER is shown in Figure 8; similarly, the data BER is reduced by increasing the FFT data width. Here, data means the output of the Viterbi decoder. Increasing the FFT and IFFT data width from 4 bits to 64 bits decreases the data BER by 67.5%. As before, the curve drops quickly when the FFT data width is increased from 8 bits to 12 bits. It is also worth noting that increasing the FFT data width from 28 to 32 bits decreases the data BER by 18%, whereas increasing it from 32 to 36 bits decreases the data BER by only 3%. A similar observation can be made when going from a 40-bit data width to 44 and 48 bits. This means that we can pick the smallest FFT data width at the start of such a plateau and still obtain a similar data error rate.

Figure 8 - Data BER vs FFT datawidth for QPSK

The data packet error rate (PER) is shown in Figure 9 below. A packet is counted as erroneous if any bit within the received packet is in error; therefore, the packet error rate is higher than the bit error rate. However, the curve still goes down in a similar way to the previous two figures. Here, we find that the PER decreases by 62% when we increase the data width of our FFT and IFFT processor.
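
The relationship between the two error rates can be made explicit under a simplifying assumption that is not part of the simulation itself: if bit errors were independent, a packet of n bits would be received correctly with probability (1 - BER)^n. A minimal Python sketch with a hypothetical function name:

```python
def packet_error_rate(ber, bits_per_packet):
    """PER under the assumption of independent bit errors: 1 - (1 - BER)^n."""
    return 1.0 - (1.0 - ber) ** bits_per_packet
```

For the one-byte packets used here (8 bits), a BER of 1% would correspond to a PER of about 7.7% under this assumption. Errors after Viterbi decoding tend to occur in bursts, so measured values can deviate from this estimate.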

Figure 9 - Data PER vs FFT datawidth for QPSK

Similarly, the simulation results for 16-QAM and 64-QAM are shown below. We can decrease the raw data BER by 15% for 64-QAM and by 14.5% for 16-QAM if we increase the FFT data width from 4 to 64 bits. Another interesting observation is that the curve goes down dramatically when we increase the FFT data width from 4 to 28 bits, whereas between 28 and 52 bits the decrease is much less pronounced. For the 16-QAM mode, the results are shown in Figures 10, 11 and 12.

Figure 10 - raw Data BER vs FFT datawidth for 16 QAM

Figure 11 - Data BER vs FFT datawidth for 16 QAM

Figure 12 - Data PER vs FFT datawidth for 16 QAM

For the 64-QAM mode, the results are shown in Figures 13, 14 and 15.

Figure 13 - raw Data BER vs FFT datawidth for 64 QAM

Figure 14 - Data BER vs FFT datawidth for 64 QAM

Figure 15 - Data PER vs FFT datawidth for 64 QAM

8. CONCLUSION

In conclusion, we have evaluated the relationship between the FFT/IFFT processor data width and the bit error rate and packet error rate. The simulation results show that we can decrease the BER and PER significantly by increasing the FFT/IFFT processor data width. However, the data width determines the size of the complex multipliers and complex adders, so the wider the data, the larger the FFT/IFFT processor.

As a result, we should look for the minimal FFT/IFFT data width that delivers the performance we require. From the analysis above, we can see that if we set the FFT/IFFT data width to 32 bits, the BER performance is similar to that of FFT processors with wider data. Similarly, a 44-bit data width is another point at which a similar observation holds.

Therefore, in practice, when designing an OFDM system under given requirements, we can use the tradeoffs in these results to obtain the best performance from the smallest FFT/IFFT processor that satisfies those constraints.

9. FUTURE WORK

In our project we used the default AWGN channel for estimation as well as a fixed packet size (one byte). However, in real systems packet sizes vary with the implementation and requirements, and the packet error rate depends on both the bit error rate (BER) and the packet size. Further research in this area could therefore use packets of varying sizes and try out different channel models, such as exponential and Rayleigh channels.

10. REFERENCES

[1] Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)

Specification, IEEE Standard, Supplement to Standard 802 Part 11: Wireless

LAN, New York, NY, 1999.

[2] Orthogonal Frequency Division Multiplexing, U.S. Patent No. 3,488,445, filed November 14, 1966, issued Jan. 6, 1970.

[3] Proakis, J. G., Digital Communications, Prentice Hall, 3rd edition, 1995.

[4] P. Duhamel and M. Vetterli. Fast Fourier transforms: a tutorial review and a state of

the art. Signal Processing, 19:259–299, April 1990.

[5] PLX 1.1 ISA Reference.

[6] An implementation of OFDM IEEE 802.11 WLAN system in Matlab

11. APPENDIX 11.1. FFT code - PLX #define sp R30 //stack point #define bp R29 //base point //function call argument and result #define o1 R1 //first output result #define a1 R21 //first argument #define a2 R22 //second argument #define a3 R23 #define a4 R24 #define a5 R25 #define a6 R26 #define t1 R2 //local variable of a procedure #define t2 R3 #define t3 R4 #define t4 R5 #define t5 R6 #define t6 R7 #define t7 R8 #define t8 R9 #define t9 R10 #define t10 R11 #define t11 R12 #define t12 R13 #define t13 R14 #define t14 R15 #define t15 R16 push macro Rs store.8.update Rs, sp, 8 endm pop macro Rs load.8 Rs, sp, 0 //read data subi sp, sp, 8 //decrement sp by 8 endm mov macro Rd, Rs ori Rd, Rs, 0 endm call macro ADDR jmp.link ADDR

24

Page 25: An Implementation of IEEE802 - CAE Usershomepages.cae.wisc.edu/.../s06/xiehoufranklinReport.pdf · 2006-05-12 · carriers. Recent advances in very-large-scale integration (VLSI)

endm halt macro trap 0xFFFF end #define twiddle_base 0x0000 #define stack_base 0x1000 #define in_base 0x5000 #define out_base 0x6000 #define fft_points 0x40 //decimal: 64 #define begin a1 #define inc a2 #define out a3 #define N a4 #define depth a5 //recursive depth main proc loadi.z.0 sp, stack_base loadi.z.0 bp, stack_base loadi.z.0 begin, in_base loadi.z.0 inc, 1 loadi.z.0 out, out_base loadi.z.0 N, fft_points loadi.z.0 depth, 0 call FFT halt endp //recursive FFT FFT: cmpi.eq N, 1, P1, P2 P1 load.8 t1, begin, 0 P1 pshifti.4.l t2, t1, 8 P1 store.8 t2, out, 0 P1 ret R31 push begin push inc push out push N push depth push R31

25

Page 26: An Implementation of IEEE802 - CAE Usershomepages.cae.wisc.edu/.../s06/xiehoufranklinReport.pdf · 2006-05-12 · carriers. Recent advances in very-large-scale integration (VLSI)

pshifti.8.l inc, inc, 1 pshifti.8.r N, N, 1 addi depth, depth, 1 call FFT subi depth, depth, 1 pop R31 pop depth pop N pop out pop inc pop begin push begin push inc push out push N push depth push R31 pshifti.8.l inc, inc, 1 pshifti.8.r N, N, 1 addi depth, depth, 1 padd.8 begin, begin, inc padd.8 out, out, N call FFT pop R31 pop depth pop N pop out pop inc pop begin mov t1, R0 // t1: k pshifti.8.r t2, N, 1 // t2: N/2 padd.8 t3, out, t2 // t3: out+N/2 LOOP: cmp.lt t1, t2, P1, P2 P2 ret R31 loadx.8 t4, out, t1 //t4: y1 loadx.8 t5, t3, t1 //t5: y2 //get twiddle factor pshift.8.l t7, t1, depth //get index

    load.8 t6, t7, twiddle_base //t6: twiddle factor
    pmulshr.15 t7, t5, t6 //t7: tmp
    excheck.4 t8, t7, t7 //t8: tmp2
    psub.4.s t9, t7, t8 //t9: tmp3 = tmp - tmp2
    padd.4.s t10, t4, t9 //t10: tmp4: low portion of tmp4 is out[k].real
    psub.4.s t11, t4, t9 //t11: tmp5: low portion of tmp5 is out[k+N/2].real
    excheck.4 t7, t5, t5 //t7: y2'
    pmulshr.15 t7, t7, t6 //t7: tmp
    excheck.4 t8, t7, t7 //t8: tmp2
    padd.4.s t9, t7, t8 //t9: tmp3 = tmp + tmp2
    padd.4.s t12, t4, t9 //t12: tmp6: low portion of tmp6 is out[k].img
    psub.4.s t13, t4, t9 //t13: tmp7: low portion of tmp7 is out[k+N/2].img
    mix.4.r t10, t12, t10 //out[k]
    mix.4.r t11, t13, t11 //out[k+N/2]
    padd.8 t7, out, t1 //t7 = out+k
    store.8 t10, t7, 0 //out[k] <= t10
    padd.8 t7, t3, t1 //t7 = out + N/2 + k
    store.8 t11, t7, 0 //out[k+N/2] = t11
    addi t1, t1, 1
    jmp LOOP
//END of fft

11.2. Partial MatLab code – BER and PER evaluation

(The complete program, which can be run as a simulation, is available at http://homepages.cae.wisc.edu/~zaipengx/ECE734homework.shtml; ui_start is the command that starts the WLAN simulation system.)

function runsim(sim_options)

% set constants used in simulation
set_sim_consts;

% Set Random number generators initial state
% reset random number generators based on current clock value
rand('state',sum(100*clock));
randn('state',sum(100*clock));

% Main simulation loop
% Initialize simulation timer
start_time = clock;

% Initialize trellis tables for Viterbi decoding
rx_init_viterbi;

% counters for information bits
num_inf_bits = 0;
num_inf_bit_errors = 0;
num_inf_packet_errors = 0;
inf_ber = 0;
inf_per = 0;
num_inf_bits_fixed = 0;
num_inf_bit_errors_fixed = 0;
num_inf_packet_errors_fixed = 0;
inf_ber_fixed = 0;
inf_per_fixed = 0;

% counters for raw (uncoded) bits
num_raw_bits = 0;
num_raw_bit_errors = 0;
num_raw_packet_errors = 0;
raw_ber = 0;
raw_per = 0;
num_raw_bits_fixed = 0;
num_raw_bit_errors_fixed = 0;
num_raw_packet_errors_fixed = 0;
raw_ber_fixed = 0;
raw_per_fixed = 0;

% Simulate the number of packets specified
packet_count = 0;

%open a file to write data
fid_tx_syms_into_ifft_bit = fopen('syms_into_ifft.txt','w');
fid_tx_time_syms_bit = fopen('syms_from_ifft.txt','w');
fid_rx_data_syms_intofft = fopen('syms_into_fft.txt','w');
fid_rx_freq_data_fromfft = fopen('syms_from_fft.txt','w');
fid_errors = fopen('errors.txt','w');
fid_errors_fixed = fopen('errors_fixed.txt','w');

while packet_count < sim_options.PktsToSimulate
    packet_count = packet_count + 1;
    packet_start_time = clock;

    % Simulate one packet with the current options
    [inf_bit_cnt, inf_bit_errors, raw_bits_cnt, raw_bit_errors, ...
        tx_time_syms_bit, tx_syms_into_ifft_bit, ...
        rx_data_syms_intofft, rx_freq_data_fromfft, ...
        inf_bit_cnt_fixed, inf_bit_errors_fixed, ...
        raw_bits_cnt_fixed, raw_bit_errors_fixed] = ...
        single_packet(sim_options);

    %store Transmitter pre_ifft values, tx_syms_into_ifft_bit
    length_tx_syms_into_ifft_bit=length(tx_syms_into_ifft_bit);
    real_tx_syms_into_ifft_bit=real(tx_syms_into_ifft_bit);
    imag_tx_syms_into_ifft_bit=imag(tx_syms_into_ifft_bit);

    new_tx_syms_into_ifft_bit=reshape([real_tx_syms_into_ifft_bit imag_tx_syms_into_ifft_bit],length_tx_syms_into_ifft_bit,2);
    fprintf(fid_tx_syms_into_ifft_bit,'%5.4f %5.4f\n',new_tx_syms_into_ifft_bit.');

    %store Transmitter post_ifft values, tx_time_syms_bit
    length_tx_time_syms_bit=length(tx_time_syms_bit);
    real_tx_time_syms_bit=real(tx_time_syms_bit);
    imag_tx_time_syms_bit=imag(tx_time_syms_bit);
    new_tx_time_syms_bit=reshape([real_tx_time_syms_bit imag_tx_time_syms_bit],length_tx_time_syms_bit,2);
    fprintf(fid_tx_time_syms_bit,'%5.4f %5.4f\n',new_tx_time_syms_bit.');

    %store Receiver pre_fft values, rx_data_syms_intofft
    length_rx_data_syms_intofft=length(rx_data_syms_intofft);
    real_rx_data_syms_intofft=real(rx_data_syms_intofft);
    imag_rx_data_syms_intofft=imag(rx_data_syms_intofft);
    new_rx_data_syms_intofft=reshape([real_rx_data_syms_intofft imag_rx_data_syms_intofft],length_rx_data_syms_intofft*3,2);
    fprintf(fid_rx_data_syms_intofft,'%5.4f %5.4f\n',new_rx_data_syms_intofft.');

    %store Receiver post_fft values, rx_freq_data_fromfft
    length_rx_freq_data_fromfft=length(rx_freq_data_fromfft);
    real_rx_freq_data_fromfft=real(rx_freq_data_fromfft);
    imag_rx_freq_data_fromfft=imag(rx_freq_data_fromfft);
    new_rx_freq_data_fromfft=reshape([real_rx_freq_data_fromfft imag_rx_freq_data_fromfft],length_rx_freq_data_fromfft*3,2);
    fprintf(fid_rx_freq_data_fromfft,'%5.4f %5.4f\n',new_rx_freq_data_fromfft.');

    num_inf_bits = num_inf_bits + inf_bit_cnt;
    num_inf_bit_errors = num_inf_bit_errors + inf_bit_errors;
    num_inf_packet_errors = num_inf_packet_errors + (inf_bit_errors~=0);
    inf_ber = num_inf_bit_errors/num_inf_bits;
    inf_per = num_inf_packet_errors/packet_count;

    num_raw_bits = num_raw_bits + raw_bits_cnt;
    num_raw_bit_errors = num_raw_bit_errors + raw_bit_errors;
    num_raw_packet_errors = num_raw_packet_errors + (raw_bit_errors~=0);
    raw_ber = num_raw_bit_errors/num_raw_bits;
    raw_per = num_raw_packet_errors/packet_count;

    %for our fixed point FFT and IFFT
    num_inf_bits_fixed = num_inf_bits_fixed + inf_bit_cnt_fixed;
    num_inf_bit_errors_fixed = num_inf_bit_errors_fixed + inf_bit_errors_fixed;
    num_inf_packet_errors_fixed = num_inf_packet_errors_fixed + (inf_bit_errors_fixed~=0);
    inf_ber_fixed = num_inf_bit_errors_fixed/num_inf_bits_fixed;

    inf_per_fixed = num_inf_packet_errors_fixed/packet_count;

    num_raw_bits_fixed = num_raw_bits_fixed + raw_bits_cnt_fixed;
    num_raw_bit_errors_fixed = num_raw_bit_errors_fixed + raw_bit_errors_fixed;
    num_raw_packet_errors_fixed = num_raw_packet_errors_fixed + (raw_bit_errors_fixed~=0);
    raw_ber_fixed = num_raw_bit_errors_fixed/num_raw_bits_fixed;
    raw_per_fixed = num_raw_packet_errors_fixed/packet_count;

    packet_stop_time = clock;
    packet_duration = etime(packet_stop_time, packet_start_time);

    % Display results
    fprintf('%8s %8s %9s %10s %8s %10s %10s %9s\n', ...
        ' Packet |', ' Time |', 'raw errs |', ' raw BER |', 'data errs |',' data BER |', ' raw PER |', 'data PER');
    fprintf('%7d |%7g | %8d |%10.2e |%10d |%10.2e |%10.2e |%10.2e\n',...
        packet_count, packet_duration, raw_bit_errors, raw_ber, inf_bit_errors, inf_ber, raw_per, inf_per);
    fprintf('%7d |%7g | %8d |%10.2e |%10d |%10.2e |%10.2e |%10.2e\n',...
        packet_count, packet_duration, raw_bit_errors_fixed, raw_ber_fixed, inf_bit_errors_fixed, inf_ber_fixed, raw_per_fixed, inf_per_fixed);

    %store results into a file
    fprintf(fid_errors,'%8.0f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f\n',...
        packet_count, packet_duration, raw_bit_errors, raw_ber, inf_bit_errors, inf_ber, raw_per, inf_per);
    fprintf(fid_errors_fixed,'%8.0f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f %8.5f\n',...
        packet_count, packet_duration, raw_bit_errors_fixed, raw_ber_fixed, inf_bit_errors_fixed, inf_ber_fixed, raw_per_fixed, inf_per_fixed);

    % read event queue
    drawnow;
end

%close the file
status_tx_syms_into_ifft_bit = fclose(fid_tx_syms_into_ifft_bit);
status_tx_time_syms_bit = fclose(fid_tx_time_syms_bit);
status_rx_data_syms_intofft = fclose(fid_rx_data_syms_intofft);
status_rx_freq_data_fromfft = fclose(fid_rx_freq_data_fromfft);
status_errors = fclose(fid_errors);
status_errors_fixed = fclose(fid_errors_fixed);

stop_time = clock;
elapsed_time = etime(stop_time,start_time);
fprintf('Simulation duration: %g seconds\n',elapsed_time);

********************************** END ********************************
