DEGREE PROGRAMME IN WIRELESS COMMUNICATIONS ENGINEERING

SDR IMPLEMENTATION OF IEEE 802.15.4 PHY

ON TRANSPORT TRIGGERED ARCHITECTURE

PROCESSOR

Author Amanullah Ghazi

Supervisor Jari Hannuksela

Second supervisor Janne Janhunen

January 2013

Ghazi A. (2013) SDR Implementation of IEEE 802.15.4 PHY on Transport

Triggered Architecture Processor. University of Oulu, Department of

Communication Engineering. Master’s Thesis, 41 p.

ABSTRACT

Ever evolving wireless communication standards, reduced time-to-market and a need

for flexibility and interoperability of multiple wireless communication technologies

on a single device are the driving factors behind implementation of wireless

standards on Software Defined Radio (SDR) platforms. The concept behind SDR is

to implement as much functionality in software as possible. SDRs provide greater

interoperability and programmability compared with traditional hardwired

implementations at the cost of higher power consumption and market cost. SDR is the

driving technology for the next generation of co-operative and cognitive radios.

For implementing an SDR, the existing wireless communication algorithms need

to be modified and an appropriate hardware platform needs to be selected. The IEEE

802.15.4 LR-WPAN standard requires low cost and low-power consuming devices.

The data rate requirements are also low (such as 250 kbps). Traditionally, the devices

compliant with the standard are hardwired system-on-chip implementations, which

provide benefits in terms of power and cost. Recently, there has been significant

effort on modeling the IEEE 802.15.4 SDR systems which provide greater

interoperability and programmability of the devices. In this study, a Transport

Triggered Architecture (TTA) based Application Specific Processor is selected for

SDR implementation of the IEEE 802.15.4 2.4 GHz physical layer for studying the

performance of such a system in terms of Bit-Error-Rate, CPU cycle count, and

processor chip area.

As part of this work, different SDR frameworks like GNU Radio, Matlab-Simulink

etc. were evaluated for their feasibility of providing an agile platform for the

development. These existing frameworks need an operating system for their

execution and are not suitable for stand-alone systems such as a TTA based

processor.

The work also includes the study of different receiver algorithms and design

choices for the transceiver implementation. Based on existing literature and Matlab

modeling, an Asynchronous Zero-Crossing Detector (AZCD) based non-coherent

receiver algorithm is selected for the implementation. The algorithm provides the

required BER performance with very low computational complexity and is suited for low

power and low chip area implementations. The transmitter and receiver are

implemented on single-core TTA processors which provide the required performance

in terms of BER and data throughput. The processors designed need a very low

silicon area and clock frequency for their realization.

Key words: IEEE 802.15.4, LR-WPAN, Software Defined Radio, TTA

CONTENTS

ABSTRACT
CONTENTS
PREFACE
GLOSSARY
1. INTRODUCTION
    1.1. Related Works
    1.2. Scope of the Thesis
    1.3. Structure of Thesis
2. IEEE 802.15.4 PHYSICAL LAYER
    2.1. Frequency Band
    2.2. Frame Format
    2.3. Coding
    2.4. Modulation
    2.5. Pulse Shaping
    2.6. O-QPSK/MSK equivalence
3. TRANSCEIVER DESIGN
    3.1. Transmitter Design
        3.1.1. O-QPSK based Transmitter Design
        3.1.2. MSK based transmitter Design
        3.1.3. Design Decisions
    3.2. Receiver Design
        3.2.1. Coherent Receiver Design
        3.2.2. Non-coherent receiver
        3.2.3. Design Decisions
4. TRANSCEIVER IMPLEMENTATION
    4.1. Transmitter Implementation
        4.1.1. Frame Formatter
        4.1.2. Chip Mapper
        4.1.3. O-QPSK Modulator
        4.1.4. Half-Sine Pulse Shaper
        4.1.5. TTA processor for the transmitter
    4.2. Receiver Implementation
        4.2.1. Asynchronous Zero-Crossing Detector
        4.2.2. Down Sampling
        4.2.3. Bits-to-Word Packing
        4.2.4. Correlator Bank
        4.2.5. Preamble Detection
        4.2.6. Receiver Algorithm
        4.2.7. TTA Processor for Receiver
5. PERFORMANCE EVALUATIONS
    5.1. Noise Performance for Receiver
    5.2. Performance of TTA processor for transmitter
    5.3. Performance of TTA processor for receiver
    5.4. Discussion
6. SUMMARY
7. REFERENCES

PREFACE

This Master’s thesis has been published in the Department of Communication

Engineering of the University of Oulu, Finland. The research work published in this

thesis was mentored and funded by the Department of Computer Science and

Engineering, University of Oulu. Their support is gratefully acknowledged.

I would like to thank Professor Olli Silvén from the Department of Computer

Science for providing an excellent and flexible opportunity for this research work

and mentoring the work enthusiastically. I would also like to thank Jani Boutellier

(Dr. Tech.) from the Department of Computer Engineering for his continuous

support and technical mentorship.

I am grateful to Professor Jari Hannuksela from Department of Computer Science

for his encouraging supervision of the thesis work. The academic freedom provided

for the work allowed me to experiment with multiple design options. I am also

thankful to Janne Janhunen (Dr. Tech.) from Center for Wireless Communication

(CWC) for being my second supervisor and providing excellent feedback to the work

being done.

I would like to thank my family for their well wishes and prayers. I would also like

to thank my classmates and friends Shahriar Shahabuddin, Hassan Malik, Ijaz

Ahmad, Nouman Bashir and Masuma Khatun for their support and encouragement

during the thesis work.

Oulu, January 2013

Amanullah Ghazi

GLOSSARY

ALU Arithmetic and Logic Unit

ASIC Application Specific Integrated Circuit

ASK Amplitude Shift Keying

ASP Application Specific Processor

AZCD Asynchronous Zero-Crossing Detector

BPSK Binary Phase Shift Keying

CDM Cross-Differentiate Multiply

CMOS Complementary Metal–Oxide–Semiconductor

CORDIC COordinate Rotation DIgital Computer

CPFSK Continuous Phase Frequency Shift Keying

CPU Central Processing Unit

CSMA-CA Carrier Sense Multiple Access with Collision Avoidance

CU Control Unit

DAC Digital to Analog Converter

DSP Digital Signal Processor

DSSS Direct Sequence Spread Spectrum

FFD Full Function Device

FFT Fast Fourier Transform

FIFO First In First Out

FIR Finite Impulse Response

FPGA Field Programmable Gate Array

FSK Frequency Shift Keying

FU Functional Unit

GPP General Purpose Processor

IEEE Institute of Electrical and Electronics Engineers

IF Intermediate Frequency

ISF Integrate and Saturate Filter

LAN Local Area Network

LDI Limiter Discriminator Integrator

LLC Logical Link Control

LR-WPAN Low Rate Wireless Personal Area Network

LSB Least Significant Byte

LSU Load Store Unit

MAC Medium Access Control

MSK Minimum Shift Keying

NF Noise Factor

O-QPSK Offset Quadrature Phase Shift Keying

PAN Personal Area Network

PER Packet Error Rate

PHR Physical Layer Header

PHY Physical Layer

PN Pseudo-Random Noise

PPDU PHY Protocol Data Unit

PSD Power Spectral Density

PSDU PHY Service Data Unit

PSSS Parallel Sequence Spread Spectrum

QPSK Quadrature Phase Shift Keying

RF Radio Frequency

RFD Reduced Function Device

RRC Root Raised Cosine

SDR Software Defined Radio

SER Symbol Error Rate

SFD Start of Frame Delimiter

SHR Synchronization Header

SIMD Single Instruction Multiple Data

SNR Signal to Noise Ratio

TCE TTA-based Codesign Environment

TTA Transport Triggered Architecture

USRP Universal Software Radio Peripheral

WLAN Wireless Local Area Network

WMAN Wireless Metropolitan Area Network

WPAN Wireless Personal Area Network

XOR Bitwise exclusive OR operation

ZCD Zero Crossing Detector

ZIFZCD Zero-IF Zero Crossing Detector

A Amplitude of signal

ak O-QPSK symbols

Ci Cumulative decision of the zero-crossing detectors

Dk Decision value of zero-crossing detector

dk MSK symbols

Eb Energy per bit

fc Carrier frequency

I In-phase carrier

ik In-phase sample corresponding to kth phase axis

In Chip sequence

k Kilo

M Number of phase axes in AZCD

N Number of symbols per packet

N0 Noise power spectral density

Q Quadrature phase carrier

qk Quadrature-phase sample corresponding to kth phase axis

S Shape of demodulated signal by AZCD

T Number of filter taps

Tc Chip time

Ts Symbol Time

u Number of 1s in XOR-and-count 1 based correlator

v Correlation value based on Multiply-and-accumulate based correlator

θk kth phase axis angle

1. INTRODUCTION

Recent years have seen an exponential increase in use of wireless communication

technologies for voice, data and video communications. A multitude of wireless

networking technologies has been developed for catering to the needs of these

applications. The existing wireless networks can be categorized broadly as Wireless

Metropolitan Area Network (WMAN), Cellular Networks, Wireless Local Area

Network (WLAN) and Wireless Personal Area Network (WPAN). WMAN and

cellular networks cater to high data rate and voice applications with broader coverage

area replacing the traditional copper-wire based telephony network. WLANs also

cater to high data rate applications over small distances (within a building) replacing

the traditional wired LAN. WPANs are person-centered, very short-range networks of

devices requiring low data rates. A number of standards exist for specifying the

compliant devices. WMAN is specified by the IEEE 802.16 series of standards [1, 2]. The 3rd

Generation Partnership Project (3GPP) [3] specifies a number of standards for

cellular networks. WLAN technologies are governed by IEEE 802.11 series of

specifications [4]. IEEE 802.15 series specifies different standards for WPANs like

Bluetooth, Infrared Data Associations, Body Area Network, etc. [5, 7, 8]. IEEE

802.15.4 standard focuses on low data rate wireless application with very low energy

consumption and very low complexity [8]. The potential applications for IEEE

802.15.4 standard are sensors, interactive toys, smart badges, remote controls, and

home automation [8].

The large number of continuously evolving wireless communication standards

creates a need for reconfigurable and programmable radios, also known as Software

Defined Radios. A Software Defined Radio (SDR) is based on software defined

wireless communication protocols instead of hardwired implementation providing

the adaptability and interoperability of multiple wireless communication standards on

a single device. The advantage of using an SDR based wireless communication

system is the ability to update, enhance or replace the functionality of the device.

This provides lower time-to-market and an enhanced lifetime of SDR based devices.

SDR systems provide flexibility at the cost of performance and power consumption

compared to hard-wired ASIC implementations. Designing an SDR system for IEEE

802.15.4 is challenging due to the low power and complexity requirements of the

standard. Nevertheless, since the performance requirement is modest, a low

complexity and low power SDR system can be realized by intelligently choosing the

signal processing algorithms and processor architectures.

Use of a general purpose processor can maximize the flexibility and

programmability while a hard-wired ASIC implementation can minimize the power

consumption and cost. A better choice is to use an Application Specific

Processor (ASP), which provides a good trade-off between power consumption, cost and

flexibility. In this thesis, a Transport Triggered Architecture (TTA) [9] based ASP is

used for the realization of the SDR system for IEEE 802.15.4 physical layer.

The Transport Triggered Architecture (TTA) is a processor design paradigm where

computations happen as a side effect of data transport. The program directly controls

the internal data transport. A typical TTA processor is shown in Figure 1. A TTA

processor consists of Functional Units (FUs) which perform the specific computation

or function, Register Files (RF) for storing intermediate results and control units for

program control flow. Every functional unit has a number of input and output ports

depending on the number of operands for the computation. One of the input ports is

defined as the triggering port. Writing data on a triggering port starts the computation

and the result is ready at the output port once computation finishes. The

interconnection network consists of buses and sockets which connect the different

FUs and register files.[9]

Figure 1: Components of TTA Processor. (Diagram labels: Functional Units, Register Files, Control Units, Sockets, Buses, Ports, Connections.)
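The trigger-port behaviour described above can be illustrated with a small software model. This is only a sketch and not TCE code: the class `AddUnit`, the port names `add.operand`, `add.trigger` and `add.result`, and the `run` helper are hypothetical names chosen for illustration. The program is written purely as a list of data transports; the addition happens as a side effect of the move into the trigger port.

```python
class AddUnit:
    """Toy model of a TTA functional unit (hypothetical, for illustration only)."""

    def __init__(self):
        self.operand = 0     # plain input port: a write only latches the value
        self.result = None   # output port: valid once the unit has been triggered

    def write_operand(self, value):
        self.operand = value

    def write_trigger(self, value):
        # Writing to the trigger port is what starts the computation.
        self.result = self.operand + value


def run(moves, unit, registers):
    """Execute a program expressed only as (source, destination) data transports."""
    for src, dst in moves:
        if isinstance(src, int):
            value = src                  # immediate value
        elif src == "add.result":
            value = unit.result          # read the unit's output port
        else:
            value = registers[src]       # read a register
        if dst == "add.operand":
            unit.write_operand(value)
        elif dst == "add.trigger":
            unit.write_trigger(value)
        else:
            registers[dst] = value       # write a register
    return registers


program = [
    (5, "add.operand"),      # latch the first operand, nothing is computed yet
    (7, "add.trigger"),      # this move triggers the addition
    ("add.result", "r0"),    # transport the result to a register
]
registers = run(program, AddUnit(), {})
print(registers["r0"])  # 12
```

Note that the program contains no "add" instruction at all; only the destination of the second move determines that an addition takes place.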

The TTA-based Codesign Environment (TCE) provides an excellent toolset for

designing and simulating TTA processors. The tool chain consists of a processor

design tool, a C/C++ compiler, an extensive list of libraries for functional units and

a simulation tool for simulating the processor behavior [10].

1.1. Related Works

Previously, a lot of work has been done in implementing or modeling IEEE 802.15.4

physical layer using an SDR system. Sabater et al. [11] propose the design of an

SDR transceiver for IEEE 802.15.4 (868 MHz band). The transceiver uses bandpass

delta-sigma modulator and an under-sampling receiver implemented over a Xilinx

Virtex-5 FPGA. One popular method for modeling an SDR is to use GNU

Radio [12] with the commercially available family of SDR hardware known as the

Universal Software Radio Peripheral (USRP) [13, 14, 15, 16]. An open source

implementation of IEEE 802.15.4 physical layer (2.4 GHz band) using GNU Radio is

discussed in [17]. The open source physical layer code has been ported on embedded

device USRP E100 [14] as thesis work described in [18]. Since the USRP E100

device cannot execute the receiver algorithm at the required data rate, the design

presented in [18] proposes an FPGA based hardware receiver for receiving the

incoming packets. A similar modeling of IEEE 802.15.4 physical layer has been

done using GNU Radio and a USRP device in [19], where the processing is done on

host computer and USRP device is used as RF front end. Another thesis work on the

implementation of IEEE 802.15.4 using GNU Radio, USRP hardware and

CC2420/CC2431 [20] sensor nodes is described in [21].

GNU Radio based SDR realization on a USRP device is good for modeling

purposes, but is not suitable for the actual implementation of sensor nodes. GNU

Radio requires the Linux operating system to execute, and hence needs a host computer.

Real-time performance cannot be guaranteed for such a system. Use of embedded

devices such as USRP E100 running embedded Linux can do away with the need for

a host computer. However, it is not capable of receiver processing in real time using

software [18]. It is provided with an FPGA which can be used for implementing the

complex signal processing task in hardware. During the initial feasibility study of

this thesis work, GNU Radio framework was considered as one of the options for

implementing the IEEE 802.15.4 physical layer on a TTA architecture based

application specific processor. It has been realized that GNU Radio framework is

heavily dependent on the Linux operating system and hence requires porting the

whole Linux (or Embedded Linux) on TTA processor before using GNU Radio.

Some earlier work has been done in implementing IEEE 802.15.4 without using

GNU Radio or other SDR development framework. For example, thesis work

described in [22] evaluates different signal processing algorithms for the SDR

implementation of IEEE 802.15.4 physical layer. The algorithms selected are for

coherent receiver design, which are more complex for implementation. The work is

done in collaboration with another thesis work [23] which designed efficient Single-

Instruction-Multiple-Data (SIMD) signal processor. Coherent receivers give better

performance in terms of bit error but require an expensive carrier recovery algorithm.

It is better to consider a non-coherent receiver for demodulating the IEEE 802.15.4

signal as described in [24].

1.2. Scope of the Thesis

The objective of the work is to implement the IEEE 802.15.4 2.4 GHz Physical layer

on a TTA Processor and evaluate the TTA architecture performance for wireless

applications. The work includes modifying the existing signal processing algorithms

to suit software implementation, implementing the algorithm in C and designing the

TTA processor which can cater to the performance requirement as specified by IEEE

802.15.4 physical layer standard.

1.3. Structure of Thesis

The rest of this thesis is structured as follows. Chapter 2 gives a brief overview of

IEEE 802.15.4 and the physical layer specification for 2.4 GHz band. Chapter 3

gives an overview of transceiver design and explores different design issues. Chapter

4 provides details of the implemented transmitter and receiver along with the TTA

processors for them. Chapter 5 gives a performance overview of the designed

transceiver in terms of bit-error-rate, gate count and the CPU cycle requirements.

Chapter 6 summarizes the work done as part of the thesis.

2. IEEE 802.15.4 PHYSICAL LAYER

IEEE 802.15.4 standard describes the MAC and Physical layer protocols for Wireless

Personal Area Networks [8]. WPANs are used for low data rate communication

between devices over small distance. The main objective of the specification is to

provide a simple and flexible protocol for reliable data transfer over short range

distances at extremely low cost and low power requirements.

The protocol architecture of IEEE 802.15.4 is presented in Figure 2 [8]. The IEEE

802.15.4 describes the specifications for MAC and PHY layers. Higher layers are

described by other protocol stacks like ZigBee [25] while the Logical Link Control

(LLC) layer can be either the IEEE 802.2 [26] or any other LLC protocol

specifications.

Figure 2: IEEE 802.15.4 protocol architecture. (Layers, bottom to top: IEEE 802.15.4 PHY; IEEE 802.15.4 MAC; IEEE 802.2 LLC or other LLC; Upper Layers.)

An IEEE 802.15.4 LR-WPAN compliant network may operate in either of two

topologies: the star topology or the Peer-to-Peer topology, as shown in Figure 3. In

the star topology, the communication is established between devices and a single master

node, called the PAN coordinator. The Peer-to-Peer topology also has a PAN

coordinator which works as the master node, but the devices can communicate with

each other directly. The devices are either Full Function Devices (FFDs) or Reduced

Function Devices (RFDs); an RFD cannot act as PAN coordinator. Star topology is

ideally suited for applications like home automation, computer peripheral

interconnection, toys and games and personal health care. The Peer-to-Peer

Topology is suited for more complex applications such as industrial control and

monitoring, wireless sensor networks, inventory tracking, etc. [8]

The MAC layer of IEEE 802.15.4 LR-WPAN is based on Carrier Sense Multiple

Access with Collision Avoidance (CSMA-CA). The MAC layer is responsible for

access to the radio channel, generating beacons if the device is the PAN coordinator,

synchronization of devices and security of the data. It communicates with the

physical layer and higher layers through a set of messages defined

by the IEEE 802.15.4 standard. [8]

Figure 3: Star and Peer-to-Peer Network Topology. (Each topology has one PAN Coordinator; legend: Full Function Device (FFD), Reduced Function Device (RFD).)

The IEEE 802.15.4 physical layer defines the frequency band, modulation and

coding for the radio signals. It also specifies radio parameters like transmit power

spectral density (PSD), receiver sensitivity, jamming resistance, etc. [8]. The following

subsections describe the IEEE 802.15.4 physical layer specification in detail.

2.1. Frequency Band

The IEEE 802.15.4 specifies four physical layer standards utilizing different

frequency bands and modulation schemes. A compliant device can operate on one or

several frequency bands as specified by the standard. The parameters of these

physical layer specifications are briefly summarized in Table 1 [8].

Table 1: IEEE 802.15.4 Frequency Bands

Band                 Frequency (MHz)   Chip Rate (kchip/s)   Modulation   Pulse Shape   Bit Rate (kb/s)   Symbol Rate (ksymb/s)   Symbol
868/915              868 – 868.6       300                   BPSK         RRC           20                20                      Binary
868/915              902 – 928         600                   BPSK         RRC           40                40                      Binary
868/915 (Optional)   868 – 868.6       400                   ASK          RRC           250               12.5                    20-bit PSSS
868/915 (Optional)   902 – 928         1600                  ASK          RRC           250               50                      5-bit PSSS
868/915 (Optional)   868 – 868.6       400                   O-QPSK       Half-Sine     100               25                      16-ary Orth.
868/915 (Optional)   902 – 928         1000                  O-QPSK       Half-Sine     250               62.5                    16-ary Orth.
2450                 2400 – 2483.5     2000                  O-QPSK       Half-Sine     250               62.5                    16-ary Orth.

The 2450 MHz frequency band is the most widely used of these bands and is available

globally for industrial, scientific and medical purposes. The scope of this thesis is

limited to the design and implementation of the PHY specification for the 2450 MHz

band. Any further reference to IEEE 802.15.4 implies the PHY specification for the

2450 MHz band.
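As a quick consistency check on the 2450 MHz row of Table 1, the symbol and bit rates follow from the spreading described in Section 2.3, where each 4-bit symbol is sent as 32 chips. The chip rate is taken here to be 2000 kchip/s, which is an assumption about the unit of the chip-rate column.

```latex
\[
R_{\text{symbol}} = \frac{2000\ \text{kchip/s}}{32\ \text{chips/symbol}} = 62.5\ \text{ksymb/s},
\qquad
R_{\text{bit}} = 62.5\ \text{ksymb/s} \times 4\ \text{bits/symbol} = 250\ \text{kb/s}.
\]
```

Under the same assumption, the chip time is the reciprocal of the chip rate, 0.5 microseconds.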

2.2. Frame Format

The frame format specified by IEEE 802.15.4 is presented in Figure 4. The frame is

known as the PHY Protocol Data Unit (PPDU). The data is transmitted Least

Significant Byte (LSB) first. Also, the least significant bit of each byte is

transmitted first [8].

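Figure 4: PPDU Format for 2.4 GHz PHY. (Fields in order: Preamble and SFD, forming the Synchronization Header; Frame Length (7 bits) and Reserved (1 bit), forming the 1-octet PHY Header; PSDU of variable length, forming the PHY Payload.)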

The PPDU consists of a synchronization header (SHR) containing Preamble and

Start of Frame Delimiter (SFD) fields. SHR field is used by a transceiver for frame

and timing synchronization. The information contained in SHR is fixed and known a priori

to both the transmitter and the receiver. For 2.4 GHz PHY, the preamble is

four bytes long. All the bits in the preamble field are set to binary 0. The SFD field is

a fixed 8-bit sequence that marks the end of preamble and start of the frame [8].

The PHY header contains the length of the payload. The frame length field is 7 bits

long. The actual payload is of variable length with a maximum length of 127

bytes [8].
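The frame layout described above can be sketched in a few lines. This is an illustrative sketch, not the thesis implementation: the function names `build_ppdu` and `bits_lsb_first` are hypothetical, the SFD value 0xA7 is taken from the IEEE 802.15.4 standard rather than from the text above, and placing the reserved bit in the most significant bit of the PHY header octet is likewise an assumption based on the standard.

```python
PREAMBLE = bytes(4)    # four bytes with all bits set to 0
SFD = 0xA7             # assumption: SFD value from the IEEE 802.15.4 standard
MAX_PSDU_LEN = 127     # maximum payload length in bytes


def build_ppdu(psdu: bytes) -> bytes:
    """Prefix the preamble, SFD and PHY header (frame length) to the payload."""
    if len(psdu) > MAX_PSDU_LEN:
        raise ValueError("PSDU longer than 127 bytes")
    phr = len(psdu) & 0x7F   # 7-bit frame length; the reserved bit is left at 0
    return PREAMBLE + bytes([SFD, phr]) + psdu


def bits_lsb_first(data: bytes):
    """Yield the bits of each byte in transmission order (least significant bit first)."""
    for byte in data:
        for i in range(8):
            yield (byte >> i) & 1


ppdu = build_ppdu(b"\x01\x02")
print(ppdu.hex())  # 00000000a7020102
```

A two-byte payload therefore becomes an eight-byte PPDU: four preamble bytes, the SFD, the length byte and the payload itself.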

2.3. Coding

The IEEE 802.15.4 standard employs Direct Sequence Spread Spectrum (DSSS) for

spreading the data bits using sixteen 32-bit quasi-orthogonal pseudo-noise (PN)

sequences. The PN sequences or chips are listed in Table 2. Four consecutive bits of

the PPDU are mapped to a data symbol. Every data symbol is then encoded to the

32-bit PN chips [8].

Table 2: IEEE 802.15.4 Spreading Codes

Symbol   Chips
0000     01110100010010101100001110011011
0001     01000100101011000011100110110111
0010     01001010110000111001101101110100
0011     10101100001110011011011101000100
0100     11000011100110110111010001001010
0101     00111001101101110100010010101100
0110     10011011011101000100101011000011
0111     10110111010001001010110000111001
1000     11011110111000000110100100110001
1001     11101110000001101001001100011101
1010     11100000011010010011000111011110
1011     00000110100100110001110111101110
1100     01101001001100011101111011100000
1101     10010011000111011110111000000110
1110     00110001110111101110000001101001
1111     00011101111011100000011010010011
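The spreading step amounts to a table lookup, and the quasi-orthogonality of the codes is what lets a receiver recover a symbol even when some chips are wrong. The sketch below copies Table 2 verbatim, keyed by the 4-bit labels exactly as printed (no assumption is made here about bit or chip transmission order). The helper names `spread`, `hamming` and `despread` are hypothetical, and the minimum-distance despreader is a generic illustration, not necessarily the correlator used later in this thesis.

```python
CHIPS = {
    "0000": "01110100010010101100001110011011",
    "0001": "01000100101011000011100110110111",
    "0010": "01001010110000111001101101110100",
    "0011": "10101100001110011011011101000100",
    "0100": "11000011100110110111010001001010",
    "0101": "00111001101101110100010010101100",
    "0110": "10011011011101000100101011000011",
    "0111": "10110111010001001010110000111001",
    "1000": "11011110111000000110100100110001",
    "1001": "11101110000001101001001100011101",
    "1010": "11100000011010010011000111011110",
    "1011": "00000110100100110001110111101110",
    "1100": "01101001001100011101111011100000",
    "1101": "10010011000111011110111000000110",
    "1110": "00110001110111101110000001101001",
    "1111": "00011101111011100000011010010011",
}


def spread(symbol: str) -> str:
    """Map a 4-bit data symbol to its 32-chip PN sequence (lookup in Table 2)."""
    return CHIPS[symbol]


def hamming(a: str, b: str) -> int:
    """Number of positions in which two chip strings differ."""
    return sum(x != y for x, y in zip(a, b))


def despread(chips: str) -> str:
    """Return the symbol whose chip sequence is closest to the received chips."""
    return min(CHIPS, key=lambda s: hamming(CHIPS[s], chips))


# Flip three chips of a spread symbol and check that it is still recovered.
received = list(spread("0110"))
for k in (0, 7, 19):
    received[k] = "1" if received[k] == "0" else "0"
print(despread("".join(received)))  # 0110
```

Because the sequences differ from each other in many chip positions, a few chip errors still leave the received word closest to the transmitted sequence.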

2.4. Modulation

The IEEE 802.15.4 standard uses Offset-QPSK (O-QPSK) modulation for

transmitting the chips. Even indexed chips are modulated to in-phase carrier (I) and

odd indexed chips are modulated to quadrature-phase (Q) carrier. The Q-phase

carrier is delayed by an offset of half symbol time with respect to the I-phase carrier

to form the offset (Figure 5) [8].

Figure 5: O-QPSK modulated Chip Sequences. (I-Phase: C0 C2 C4 C6 ... C28 C30; Q-Phase: C1 C3 C5 C7 ... C29 C31; each chip lasts 2Tc and the Q-phase is offset by Tc.)
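The chip-to-branch assignment and the offset can be sketched as follows. The function names `split_iq` and `offset_streams` are hypothetical, and the 0/1 to -1/+1 level mapping is the one described for the O-QPSK modulator in Section 3.1.1. The streams are laid out on a time grid of one chip time Tc, so each chip (duration 2Tc) occupies two slots and the Q stream starts one slot later.

```python
def split_iq(chips):
    """Even-indexed chips go to the I branch, odd-indexed chips to the Q branch.

    Chips given as a 0/1 string are mapped to levels -1/+1.
    """
    levels = [2 * int(c) - 1 for c in chips]
    return levels[0::2], levels[1::2]


def offset_streams(i_chips, q_chips):
    """Lay both branches out on a grid of Tc, with Q delayed by one slot (Tc).

    Each chip lasts 2*Tc and therefore fills two slots; idle slots hold 0.
    """
    i_stream = [v for c in i_chips for v in (c, c)] + [0]
    q_stream = [0] + [v for c in q_chips for v in (c, c)]
    return i_stream, q_stream


i_chips, q_chips = split_iq("0110")
print(i_chips, q_chips)                   # [-1, 1] [1, -1]
print(offset_streams(i_chips, q_chips))   # ([-1, -1, 1, 1, 0], [0, 1, 1, -1, -1])
```

In the printed streams the I level can only change at even slot boundaries and the Q level only at odd ones, so the two branches never switch at the same instant.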

The offset of half symbol time in O-QPSK modulation prevents the zero-crossing

in the constellation diagram (Figure 6). O-QPSK modulation allows a single bit

change (either I or Q) at half of the symbol time. This results in a maximum phase

change of π/2 compared with QPSK, which has a maximum phase change of π. Lack

of zero-crossing in O-QPSK modulation makes it suitable for non-linear power

amplifiers, which have better power efficiency compared with linear power amplifiers.

Figure 6: QPSK and O-QPSK constellation diagrams. (Two panels, QPSK Constellation and O-QPSK Constellation, each with I and Q axes and the four points 00, 01, 10 and 11.)
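The two maximum phase changes quoted above can be checked by enumerating the transitions between the four constellation points. This is a small illustrative sketch with hypothetical helper names: for QPSK both the I and Q values may change at once, while for O-QPSK only one of them may change at a time.

```python
import cmath
import math
from itertools import product

# The four constellation points (I, Q) with levels -1/+1.
POINTS = [complex(i, q) for i, q in product((-1, 1), repeat=2)]


def phase_step(a, b):
    """Absolute phase difference between two constellation points, in [0, pi]."""
    return abs(cmath.phase(b / a))


def max_phase_step(allow_both_change):
    """Largest phase jump over all allowed transitions between points."""
    steps = []
    for a, b in product(POINTS, repeat=2):
        changed = (a.real != b.real) + (a.imag != b.imag)
        if changed == 0 or (changed == 2 and not allow_both_change):
            continue
        steps.append(phase_step(a, b))
    return max(steps)


print(max_phase_step(True) / math.pi)    # QPSK: largest jump as a multiple of pi
print(max_phase_step(False) / math.pi)   # O-QPSK: largest jump as a multiple of pi
```

The QPSK case includes the diagonal transitions that pass through the origin, which are exactly the ones the half-symbol offset rules out.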

2.5. Pulse Shaping

IEEE 802.15.4 uses half-sine pulse shaping for transmitting the O-QPSK modulated

signal. The pulse shape used to represent each baseband chip is given by Equation

(1) [8].

\[
p(t) =
\begin{cases}
\sin\left(\dfrac{\pi t}{2T_c}\right), & 0 \le t \le 2T_c \\
0, & \text{otherwise}
\end{cases}
\tag{1}
\]

where Tc is the chip time. The O-QPSK modulated half-sine waveform is presented in

Figure 7. The resulting baseband signal has a constant envelope.
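The constant-envelope property can be checked numerically. The sketch below is illustrative only: the function names `half_sine` and `shape` and the choice of 8 samples per chip time are assumptions, and the pulse is the half-sine p(t) = sin(πt / (2Tc)) for 0 ≤ t ≤ 2Tc from Equation (1). Once both branches are active, the I and Q pulses are a quarter period apart, so the sum of their squares stays at 1.

```python
import math


def half_sine(samples_per_chip):
    """Samples of p(t) = sin(pi * t / (2 * Tc)) taken over 0 <= t < 2 * Tc."""
    n = 2 * samples_per_chip
    return [math.sin(math.pi * k / n) for k in range(n)]


def shape(chips, samples_per_chip):
    """Half-sine shaped I and Q sample streams for a 0/1 chip string.

    Even-indexed chips drive I, odd-indexed chips drive Q, and Q is delayed by Tc.
    """
    pulse = half_sine(samples_per_chip)
    levels = [2 * int(c) - 1 for c in chips]
    i_wave = [lv * p for lv in levels[0::2] for p in pulse]
    q_wave = [lv * p for lv in levels[1::2] for p in pulse]
    pad = [0.0] * samples_per_chip
    return i_wave + pad, pad + q_wave


S = 8  # samples per chip time Tc (arbitrary choice for this sketch)
chips = "01110100010010101100001110011011"
i_wave, q_wave = shape(chips, S)

# Skip the first and last Tc, where only one branch is active.
envelope = [i_wave[k] ** 2 + q_wave[k] ** 2 for k in range(S, len(chips) * S)]
print(max(abs(e - 1.0) for e in envelope) < 1e-9)  # True
```

The first and last chip times are excluded because only one of the two branches carries a pulse there.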

Figure 7: O-QPSK I and Q symbols with the half-sine pulse shape.

2.6. O-QPSK/MSK equivalence

An O-QPSK modulated signal can be represented as

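\[
s(t) = A \left\{ \left[ \sum_{n} I_{2n}\, g(t - 2nT_c) \right] \cos(2\pi f_c t)
+ \left[ \sum_{n} I_{2n+1}\, g(t - 2nT_c - T_c) \right] \sin(2\pi f_c t) \right\}
\tag{2}
\]

where $I_n \in \{-1, +1\}$ are the chip sequences, $A$ is the amplitude of the modulated signal, $f_c$ is

the carrier frequency and $g(t)$ is a sinusoidal pulse defined as

\[
g(t) =
\begin{cases}
\sin\left(\dfrac{\pi t}{2T_c}\right), & 0 \le t \le 2T_c \\
0, & \text{otherwise}
\end{cases}
\tag{3}
\]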

The O-QPSK modulation with half-sine pulse shape is equivalent to Continuous

Phase Frequency Shift Keying (CPFSK) with modulation index of 0.5 (also known

as MSK Modulation). To make the O-QPSK strictly equivalent to MSK, data must

be encoded as per MSK/O-QPSK coder Equation (4) [27].

{

(4)

where dk are the MSK symbols and ak are the O-QPSK symbols. The

corresponding MSK/O-QPSK decoder equation is given by Equation (5).

{

(5)

3. TRANSCEIVER DESIGN

3.1. Transmitter Design

The IEEE 802.15.4 transmitter can be designed using either an O-QPSK

modulator with half-sine pulse shaping or an MSK modulator with an MSK/O-QPSK

encoder. The following subsections provide details of both designs.

3.1.1. O-QPSK based Transmitter Design

The block diagram of the baseband transmitter is shown in Figure 8. The transmitter

design is simple and is based on O-QPSK modulation with half-sine pulse shaping.

Figure 8: Block Diagram of the O-QPSK modulator based transmitter. (Blocks in order: Frame Formatter, Chip Mapper, O-QPSK Modulator, Half-Sine Pulse Shape, with I and Q signal paths.)

The transmitter consists of a frame formatter which formats the incoming payload

as per the IEEE 802.15.4 specification to form PPDU (see Section 2.2). Preamble,

SFD and frame length fields are prefixed to the payload bits. The formatted PPDU is

then sent to the chip mapper. The chip mapper maps each group of four consecutive bits of the

PPDU to a 32-bit quasi-orthogonal PN sequence as specified in Table 2. These chips are

then modulated to I and Q carriers in O-QPSK modulator. I and Q phases are then

passed through a half-sine pulse shaping filter which outputs the samples of half-sine

pulses.

The design of a transmitter of this kind is simple. A frame formatter can easily be

implemented by copying the incoming payload to a buffer containing the preamble, SFD

and frame length field. The chip mapper can be implemented as a lookup table

containing the 16 quasi-orthogonal chip sequences. The O-QPSK modulator converts

these chips from 0/1 to -1/+1. The half-sine pulse shape filter outputs samples of a

positive or negative half-sine pulse depending on whether the I/Q symbol is +1 or -1,

respectively.

3.1.2. MSK based transmitter Design

Since O-QPSK with half sine pulse shaping is equivalent to MSK modulation, an

MSK modulator based transmitter can also be designed. The block diagram of such a

transmitter is shown in Figure 9.

Figure 9: Block Diagram of the MSK modulator based transmitter. (Blocks in order: Frame Formatter, Chip Mapper, MSK/O-QPSK Coder, MSK Modulator, with I and Q outputs.)

The transmitter design is similar to the O-QPSK based transmitter. The O-QPSK

chips are first encoded with MSK / O-QPSK encoding Equation (4). The encoded

chips are then sent to an MSK modulator.

3.1.3. Design Decisions

Both transmitter designs are simple and can be implemented on a TTA based

processor. The O-QPSK modulator with half-sine pulse shape directly maps to the

IEEE standard and hence is easier to test and verify. For this reason, O-QPSK

modulator based transmitter is selected for implementation.

3.2. Receiver Design

The following subsections provide a brief overview of some of the design choices

available for receiver design. Coherent and non-coherent demodulation techniques

are evaluated as candidates for an efficient receiver which can

match the data throughput and SER requirement of the IEEE 802.15.4 standard. The

evaluation is theoretical and based on existing literature reviews.

3.2.1. Coherent Receiver Design

IEEE 802.15.4 receivers could use the coherent detection technique for demodulating

the O-QPSK modulated signal. In the coherent detection method, the detector tries to

estimate the actual transmitted symbol by computing the absolute phase of the carrier

signal. For this reason coherent receivers require strict timing and phase

synchronization. A coherent detection based SDR receiver is extensively discussed in

[18]. A simplified block diagram of such a receiver is shown in Figure 10.

[Figure 10 block labels: Frequency Offset Estimator & Compensator, Matched Filter, Down Converter, Correlator Bank, Preamble Detection]

Figure 10: Coherent Receiver Design.

Frequency Offset Estimator and Compensator

Wireless transceiver systems have imperfections in up/down converters and local

oscillators. These imperfections result in frequency and phase offset between

transmitted and received signals. The offset causes a constant rotation to the signal

constellation and must be compensated before the coherent detection of the symbol.

The frequency and phase offset estimation is done in two stages.

The first phase is a coarse frequency estimator which operates in a low signal to

noise ratio (SNR) and provides the initial estimate of carrier frequency. There are

different methods for performing the coarse frequency estimation. An FFT followed

by a search for spectral maxima provides an estimate for the frequency. Such a

frequency estimator is not suited for IEEE 802.15.4 receiver since spikes in FFT

periodogram of the signal are closely spaced [28]. The other method for coarse

frequency estimation is to use data based or correlation based estimators. Some of these

estimators, such as the Kay estimator [29] or the Meyr estimator [30], could be used for coarse

frequency estimation.

In the second phase, fine frequency and phase offset estimation is done. The IEEE

802.15.4 uses a preamble at the beginning of each frame which can be used for

frequency and phase estimation using data-aided estimation algorithms. A

complex-valued correlator is used to perform the synchronization and estimation of

the phase. The incoming complex signal is correlated with the known preamble

sequence. The net rotation per sample is computed by dividing the correlation value

by the number of samples [18].

Once the frequency and phase offsets are estimated, the samples need to be corrected

by compensating for these offsets. This can be done by rotating the samples by the

offset in the constellation using CORDIC algorithms [31].

Matched Filter

A matched filter is used for detecting the known signal in an Additive White

Gaussian Noise channel. The idea is to correlate the received signal with known

pulse shape used during the transmission of the signal (half-sine pulse in IEEE

802.15.4 case). An adaptive matched filter whose coefficients change with the

changing channel provides better noise performance at the cost of higher complexity.

For IEEE 802.15.4, a non-adaptive matched filter whose coefficients are matched to

the half-sine pulse can fulfill the noise performance requirement. The filter

coefficients of the matched filter are given by

$$h(nT_s) = \begin{cases} \sin\left(\dfrac{\pi n}{T}\right), & n = 0, 1, 2, \ldots, T-1 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where T is the number of filter taps and Ts is the symbol time.
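As an illustration, the half-sine taps can be generated and applied as a plain FIR filter. The Python sketch below is not the thesis implementation: it assumes the coefficients have the form sin(πn/T) for n = 0..T-1, uses floating point instead of fixed point, and the names half_sine_taps and fir_filter are hypothetical.

```python
import math

def half_sine_taps(num_taps):
    """Half-sine coefficients sin(pi*n/T), n = 0..T-1 (assumed form of Equation (6))."""
    return [math.sin(math.pi * n / num_taps) for n in range(num_taps)]

def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with zeros before the first sample."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out

taps = half_sine_taps(8)  # eight taps is an illustrative choice
```

Because the taps are fixed, a non-adaptive filter of this kind needs only a small constant table and one multiply-accumulate per tap.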

Quantize / Down-sample

The output of the matched filter can be further quantized (to a single bit) to reduce

the complexity of the correlator. Also, the output of matched filter needs to be

down-sampled to match the symbol time. For the correlator to work correctly, the

symbol time synchronization needs to be ensured.

The down-sampling operates in two modes. During the preamble detection and

synchronization phase, down-sampling is performed at multiple possible timing

offsets. During the actual symbol detection phase, the down-sampling is done at the

best timing offset selected during the preamble detection phase.

Preamble Detector

The IEEE 802.15.4 frame (PPDU) consists of five bytes of the synchronization header

(Preamble and SFD) which is used by the receiver for frequency, phase and frame

synchronization. The synchronization header consists of known chip sequences and

hence needs a correlator matched to the preamble and SFD chip sequences. The input

sequence is down-sampled at multiple offsets and correlated with the preamble chip

sequence. The timing offset giving the best correlation is used for the downsampling

of payload chips. Detection of the Start of Frame Delimiter (SFD) denotes the end of

preamble and start of actual payload chips (frame synchronization).

Correlator Bank

The IEEE 802.15.4 uses direct sequence spread spectrum for converting four-bit data

symbols to 32-bit PN chip sequences. For the detection of the data symbol, a bank of

correlators matched to the PN sequences can be used. The block diagram of the

correlator bank is shown in Figure 11. There are 16 correlators matched to the 16 PN

chip sequences (Table 2). The symbol corresponding to the correlator having

the maximum correlation value is selected as the symbol decision.

[Figure 11 block labels: Correlator 0, Correlator 1, Correlator 2, ..., Correlator 15, all feeding a Find Max block]

Figure 11: Correlator Bank.

The design of a coherent receiver poses hard challenges for implementation on

a low cost and low power device. The coherent detection algorithms are

computationally complex and are not suited for the sensor node implementation [24].

For example, the frequency and phase estimation algorithm uses complex CORDIC

algorithms for phase estimation and compensation which are difficult to realize on a

low cost device.

3.2.2. Non-coherent receiver

As discussed in Section 2.6, the O-QPSK modulation with half-sine pulse is

equivalent to CPFSK modulation with the modulation index of 0.5. Traditionally,

FSK demodulation is based on non-coherent detection methods. These non-coherent

detectors have advantage of robustness and operate without a need for carrier

recovery devices. The block diagram of the non-coherent demodulator based receiver

is shown in Figure 12.

[Figure 12 block labels: MSK Demodulator, Down Sample, Preamble Detector, Correlator; signals I, Q, Chip Time]

Figure 12: Block Diagram of Non Coherent Receiver.

MSK Demodulator

A non-coherent MSK demodulator could be used for the demodulation of O-QPSK

signal. MSK demodulators are based on non-coherent detection methods such as

limiter-discriminator integrator (LDI) and dump detector [32]. The LDI technique is

the basis for designing a zero-crossing demodulator (ZCD), reducing the receiver

complexity further as in the case of Cross-Differentiate Multiply (CDM)

demodulator [33] or the digital tangent method [34]. A simple demodulator based on

zero-crossing demodulation known as ZIFZCD (Zero IF Zero-crossing detector) is

presented in [35]. The complexity of the receiver is substantially reduced in ZIFZCD

demodulation using the Zero-IF demodulation and use of a hard-limiter device. The

ZIFZCD algorithm uses an LDI technique and requires a symbol timing recovery

device to operate. The ZIFZCD algorithm is further modified to replace the LDI

device with an integrate and saturate device in the Asynchronous Zero-Crossing

Demodulator (AZCD) [36]. The AZCD algorithm’s computational complexity is

very low and it does not require a complex symbol timing recovery device.

The block diagram of a Zero-crossing based detector is shown in Figure 13. It

consists of a phase axis generator, which generates multiple phase axes (M) in order

to operate with lower modulation index values. These phase axis values are hard-

limited and sent to a bank of zero-crossing detectors. Each zero-crossing detector

takes a pair of phase axes (I & Q) and outputs a decision value D which gives the

direction of phase axis crossings (clockwise, anticlockwise or no crossing). The

decisions from all the ZCD are combined in a decision device which gives the

decision about the symbol.

The MSK demodulator does not give the actual O-QPSK encoded chips. To get the

transmitted chip, the demodulated sequence needs to be decoded using the MSK/O-

QPSK decoder Equation (5). The drawback of using Equation (5) for decoding the

MSK demodulated sequence to O-QPSK sequence is the recursive nature of the

equation. The output of MSK/O-QPSK decoder (ak) is dependent on previous output

(ak-1). This results in the propagation of a single bit error to all the following bits. A

solution to this problem is to use the correlator bank matched to the MSK/O-QPSK

encoded chip sequence (using Equation (4)) instead of converting the received

sequence from MSK to O-QPSK. Design of such a correlator is discussed in Section

3.2.1.

[Figure 13 block labels: Phase Axis Generator, Hard-Limiter, ZCD0, ZCD1, ..., ZCD(M/2-1), Decision Device; signals I, Q, i0, q0, i1, q1, ..., i(M/2-1), q(M/2-1), D0, D1, ..., D(M/2-1), S]

Figure 13: AZCD Block Diagram.

Down-sample

The output of the zero crossing detector is a binary signal and does not require any

further quantization (unlike the output of the coherent demodulator). Down-sampling is

still required to match with transmitted symbol time. The design of the down-

sampling block is the same as explained in Section 3.2.1.

Preamble Detector

The design of the preamble detector is similar to the case of coherent receiver

(Section 3.2.1). The coefficients of the correlator are matched to the MSK/O-QPSK

encoded preamble and SFD chip sequence using Equation (4).

Correlator Bank

The design of the correlator bank is similar to that of the coherent receiver. The

correlators are matched to the MSK/O-QPSK encoded chip sequence instead of the

original chip sequence (Table 2). During this transformation, the auto-correlation and

cross-correlation properties of PN sequences remain nearly unchanged. Figure 14

shows the auto-correlation and cross-correlation between chip sequences for symbol

0 and symbol 8. The only difference between coded and uncoded PN sequence is that

negative spikes are generated in cross-correlation between MSK/O-QPSK encoded

PN sequences which has no effect on the design of the correlator bank.

A minor issue with the MSK/O-QPSK encoded PN sequence is that the encoding

can be performed only till the 31st bit of the PN sequence. The encoding Equation (4)

needs the knowledge of the (k+1)th bit for encoding the kth bit. So, the 32nd bit of the PN

sequence cannot be encoded. Hence, the coding gain achieved using MSK/O-QPSK

encoded PN sequence is slightly lower than the original PN sequence.

Figure 14: Auto-correlation and cross-correlation of PN sequences.

3.2.3. Design Decisions

A coherent receiver has better noise performance compared with a non-coherent

receiver at the cost of higher complexity. The computational complexity of the

coherent receiver requires powerful processors for implementation. They are not

suited for designing low-power and low cost receivers. The Asynchronous Zero-

crossing demodulator (AZCD) based non-coherent receiver design has a lower noise

performance but can easily meet the noise performance requirement of IEEE

802.15.4 [37]. AZCD algorithm complexity is very low and can easily be

implemented on a low cost application specific processor. Hence, AZCD based non-

coherent receiver design is selected for implementation on a TTA based processor.

For the despreading of chip sequence, a correlator bank matched to MSK/O-QPSK

coded PN sequence is used. The following subsections briefly describe the theory behind

AZCD detector and the correlator bank.

4. TRANSCEIVER IMPLEMENTATION

4.1. Transmitter Implementation

An O-QPSK based transmitter is implemented on a TTA processor. The block diagram

of the transmitter is shown in Figure 8. The implementation details of the blocks are

given in the following subsections.

4.1.1. Frame Formatter

The frame formatter is used to format the payload as per the IEEE specification

(Section 2.2). The frame formatter reads the input data (payload) using a special

functional unit which reads serial data from a FIFO queue. The payload is stored in a

buffer which already contains the preamble (of length 4 bytes, all 0s), SFD and frame

length values. Once all the payload bytes have been received, the frame formatter

sends the formatted frame (PPDU) to the chip mapper for spreading. Considering the

maximum payload size to be 127 bytes, 5 bytes of the frame header and 1 byte of the

length field, the buffer size for storing the PPDU is 133 bytes.
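The buffer layout described above can be sketched as follows. This is only an illustration in Python, not the thesis code: the SFD value 0xA7 is taken from the IEEE 802.15.4 standard rather than from this text, and format_ppdu is a hypothetical name.

```python
PREAMBLE = bytes(4)   # four preamble bytes, all zeros, as stated in the text
SFD = 0xA7            # assumption: SFD value as defined by IEEE 802.15.4
MAX_PAYLOAD = 127     # maximum payload size in bytes

def format_ppdu(payload: bytes) -> bytes:
    """Prefix preamble, SFD and frame length to the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 127 bytes")
    return PREAMBLE + bytes([SFD, len(payload)]) + payload

ppdu = format_ppdu(bytes(127))  # worst case: 5 + 1 + 127 = 133 bytes
```

With a maximum-length payload the result is exactly the 133-byte buffer quoted in the text.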

4.1.2. Chip Mapper

The role of the chip mapper is to convert the IEEE 802.15.4 frame (PPDU) to PN

chips (Section 2.3). The chip mapper is implemented in the form of a lookup table

containing the 16 PN chip sequences of 32 chips each. Each byte of the PPDU is mapped

to two symbols (4-bits each). The symbols are then mapped to the corresponding PN

chip sequences using the lookup table. The spreading factor for the chip mapper is 8

(32 chips per 4-bit symbol). So, the memory required for storing the chips is 1064

bytes (133 x 8).
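A lookup-table chip mapper can be sketched as below. The sketch relies on an observation about the chip sequences as printed in Table 4: symbols 1 to 7 are 4-chip left rotations of symbol 0 (0x744AC39B), and symbols 8 to 15 are symbols 0 to 7 XORed with 0xAAAAAAAA. The order in which the two nibbles of a byte are spread (low nibble first) is an assumption, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
BASE = 0x744AC39B  # chip sequence of symbol 0 as printed in Table 4

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32

# Symbols 0..7: rotations of BASE; symbols 8..15: every second chip inverted.
CHIP_TABLE = [rotl32(BASE, 4 * k) for k in range(8)]
CHIP_TABLE += [c ^ 0xAAAAAAAA for c in CHIP_TABLE]

def spread(ppdu: bytes):
    """Map each byte to two 32-chip words (assumption: low nibble first)."""
    words = []
    for byte in ppdu:
        words.append(CHIP_TABLE[byte & 0x0F])
        words.append(CHIP_TABLE[byte >> 4])
    return words
```

For a 133-byte PPDU this yields 266 words of 4 bytes each, which matches the 1064 bytes of chip memory stated above.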

4.1.3. O-QPSK Modulator

The PN chips are modulated to in-phase (I) and quadrature-phase (Q) carriers as

specified by the IEEE specification (Section 2.4). Even indexed chips are modulated

to I carrier and odd indexed chips are modulated to Q carrier. The I and Q carrier

values are represented as 8-bit fixed point numbers representing +1 or -1. Since each

bit of PN chips is represented by an 8-bit fixed point number, the memory required to

store the IQ symbols is 8512 bytes (1064 x 8).

O-QPSK modulation requires an offset of half symbol time between I and Q

symbols. For implementing the offset, IQ symbols need to be oversampled resulting

in bigger buffer requirements. The offset can easily be implemented in the pulse

shaping filter without the need of an extra buffer. Hence, the implementation of

offset is left for the pulse shaping block. The O-QPSK modulator in this case is a

simple QPSK modulator.
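The unpacking of one chip word into I and Q values can be sketched as follows. The assumption that chip index n corresponds to bit n of the 32-bit word is made here for illustration only, and oqpsk_map is a hypothetical name.

```python
def oqpsk_map(chip_word):
    """Split a 32-bit chip word into 16 I and 16 Q values in {-1, +1}."""
    i_vals, q_vals = [], []
    for n in range(32):
        chip = (chip_word >> n) & 1   # assumption: chip n is bit n of the word
        level = 2 * chip - 1          # map 0/1 to -1/+1
        if n % 2 == 0:
            i_vals.append(level)      # even indexed chips go to the I carrier
        else:
            q_vals.append(level)      # odd indexed chips go to the Q carrier
    return i_vals, q_vals

i_vals, q_vals = oqpsk_map(0b01)
```

Only shifts, masks and one subtraction are needed per chip, which agrees with the simple bit operations the text attributes to this block.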

4.1.4. Half-Sine Pulse Shaper

The job of the half-sine pulse shaper is to output the samples of a half-sine pulse for

each I and Q carrier (Section 2.5). Eight samples of a positive half-sine pulse are

multiplied with each I and Q symbol and sent out to the Digital-to-Analog Converter (DAC)

using a special functional unit in the TTA processor. The samples are stored as 8-bit

fixed point numbers and the result of multiplication of an IQ symbol with the pulse-shape

samples is also stored as 8-bit fixed point numbers. The samples are directly sent out

to the DAC after multiplication, avoiding the need of a buffer to store the outgoing

baseband signal. Also, the samples for the quadrature carrier are delayed by half

symbol (4 samples) to implement the offset between I and Q carrier.
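The pulse shaping with the half-symbol offset can be sketched as below. The sketch uses floating point for readability (the text uses 8-bit fixed point), collects the output in lists instead of streaming to a DAC, and assumes the eight pulse samples are sin(πn/8); shape is a hypothetical name.

```python
import math

SAMPLES_PER_SYMBOL = 8
Q_DELAY = 4  # half a symbol, as stated in the text
PULSE = [math.sin(math.pi * n / SAMPLES_PER_SYMBOL) for n in range(SAMPLES_PER_SYMBOL)]

def shape(i_syms, q_syms):
    """Return I and Q sample streams; the Q stream is delayed by half a symbol."""
    n_out = len(i_syms) * SAMPLES_PER_SYMBOL + Q_DELAY
    i_out = [0.0] * n_out
    q_out = [0.0] * n_out
    for k, (i_sym, q_sym) in enumerate(zip(i_syms, q_syms)):
        for n, p in enumerate(PULSE):
            i_out[SAMPLES_PER_SYMBOL * k + n] = i_sym * p
            q_out[SAMPLES_PER_SYMBOL * k + n + Q_DELAY] = q_sym * p
    return i_out, q_out

i_out, q_out = shape([1, -1], [-1, 1])
```

Applying the delay while writing the Q samples is what removes the need for an oversampled intermediate buffer.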

4.1.5. TTA processor for the transmitter

The transmitter does not require many computational resources. The frame

formatter and the chip mapper involve little computation and mainly

require memory read/write. The O-QPSK modulator unpacks the 32-bit chips and maps

0/1 to -1/+1. The computations required for modulating the chips are simple bit-shift

and masking operations.

The bottleneck of the transmitter chain is the pulse shaping filter. The pulse shaping

filter requires eight fixed-point multiplications for each IQ symbol (eight times

oversampling). The fixed-point multiplication requires an integer multiply and an

arithmetic shift operation. The fixed-point multiplications are data independent and

can be performed in parallel. Keeping the processor simple, two ALUs and one FU

for multiplication are found to be enough to meet the data rate requirement at very

low clock frequency.
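The fixed-point multiplication mentioned above (an integer multiply followed by an arithmetic shift) can be illustrated with a short sketch. The number format is not given in the text; six fractional bits are assumed here so that both +1 and -1 fit in a signed 8-bit value, and the helper names are hypothetical.

```python
FRAC_BITS = 6  # assumption: 6 fractional bits, so +1.0 -> 64 and -1.0 -> -64

def to_fixed(x):
    """Convert a real value to the assumed fixed-point format."""
    return int(round(x * (1 << FRAC_BITS)))

def fixed_mul(a, b):
    """Integer multiply followed by an arithmetic right shift."""
    return (a * b) >> FRAC_BITS  # Python's >> on negative ints is arithmetic

half = to_fixed(0.5)
result = fixed_mul(to_fixed(-1.0), half)
```

Since each of the eight products per symbol is independent of the others, such multiplications can be scheduled in parallel when more than one multiplier is available.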

A table of resources used in the TTA processor is provided below:

Table 3: Resources on TTA processor for transmitter

Resource Name   Count   Function
ALU             2       Arithmetic and Logical Computations
MUL             1       Integer Multiplication
LSU             1       Load Store Unit for memory access
CU              1       Control unit for program flow control
RF              4       Register file for storing intermediate results
STREAM_IN       1       Serial input of data
STREAM_OUT      1       Serial output of data
Buses           6       Interconnection Network

The estimated gate count for the TTA processor is around 16k gates. The processor

requires a clock frequency of 35 MHz to meet the processing requirement of 250

kbps.

4.2. Receiver Implementation

The block diagram of the AZCD based non-coherent receiver is shown in Figure 15.

The following subsections will give a brief overview of the implementation of these

blocks on a TTA processor.

[Figure 15 block labels: AZCD, Down Sample, Preamble Detector, Correlator Bank, Bits-to-Word Packing; signals I, Q, Chip Time]

Figure 15: Receiver block diagram.

4.2.1. Asynchronous Zero-Crossing Detector

The implementation of the zero-crossing detector is the most computation intensive

task in the whole receiver chain. The block diagram of AZCD is shown in Figure 13.

The Asynchronous Zero-Crossing Detector consists of a phase axis generator, Hard-

Limiter, Zero-Crossing Detectors and a decision device.

Phase Axis Generator

One of the design decisions for implementing a phase axis generator is the selection

of the number of phase axes (M). Since the modulation index of the MSK

modulation is 0.5, the minimum number of phase axes required to correctly

demodulate the MSK signal using a zero-crossing detector is four [35]. The use of

Integrate and Saturate Filter (ISF) in place of Limiter-Discriminator Integrator (LDI)

further increases the number of phase axes required to detect a switch in the direction

of phase change. For this reason, the number of phase axes (M) is chosen to be 8.

The equation for phase axis generation is given as.

{

(7)

The θk values are selected to generate the phase-axes at equal phase angle. For

eight phase axes, the θk values are 0, π/8, π/4, 3π/8. The sine and cosine values for

these angles are computed and used as constant 8-bit fixed point numbers.

For each pair of IQ samples received, four pairs of phase axes (namely i0, q0, i1, q1,

i2, q2, i3, q3) are generated. This requires twelve fixed point multiplications and six

addition/subtraction operations (for θk = 0, no computations required). The

operations can be executed in parallel for all the pairs of phase axes.
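A possible form of the phase axis generation is sketched below. It assumes that each axis pair is obtained by rotating the received (I, Q) sample by θk; the sign convention of the rotation is an assumption, chosen only so that the operation count matches the twelve multiplications and six additions/subtractions stated above. Floating point is used instead of 8-bit fixed point, and phase_axes is a hypothetical name.

```python
import math

THETAS = [k * math.pi / 8 for k in range(4)]  # 0, pi/8, pi/4, 3*pi/8

def phase_axes(i, q):
    """Return four (i_k, q_k) pairs for one received I-Q sample."""
    axes = [(i, q)]  # theta = 0: no computation required
    for theta in THETAS[1:]:
        c, s = math.cos(theta), math.sin(theta)
        # 4 multiplications and 2 additions/subtractions per axis pair
        axes.append((i * c + q * s, q * c - i * s))
    return axes

axes = phase_axes(1.0, 0.0)
```

In a fixed-point implementation the sine and cosine values would be stored as constants, as the text describes.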

Hard Limiter

Once the phase axes are generated, the exact position of samples with respect to the

corresponding phase axis is no longer important. The zero-crossing detector detects an

axis crossing between the previous and the next sample and does not require the exact

value of the sample. For this reason, the generated phase axes are hard-limited to

-1/+1. The implementation of the hard limiter is similar to the signum function which

gives the sign of a number as

$$\mathrm{sgn}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases} \tag{8}$$

The hard limiting function can be implemented using the if-else statement, sign-bit

extraction by a logical shift or using a special FU in the processor. The conditional

if-else based implementation is not suited for signal processing applications. Sign-bit

extraction and conversion to +1/-1 requires logical shift, mask and arithmetic shift

operation and is computationally complex. For this reason, the hard limiter is

implemented using a special FU which computes the signum operation. A special

instruction (_TCE_SIGNUM) is defined which uses the FU and performs signum in

one clock cycle. A total of eight such FUs can perform the hard-limiting of phase axes in

parallel within a single clock cycle.
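The software alternatives to a dedicated signum unit can be compared in a short sketch. The code below emulates both variants in Python for values in the signed 32-bit range; mapping zero to +1 is an assumption, and the function names are hypothetical (this is not the _TCE_SIGNUM instruction itself).

```python
def signum_branch(x):
    """Conditional version: simple, but needs a branch or predication."""
    return 1 if x >= 0 else -1

def signum_shift(x):
    """Branch-free version for 32-bit signed values.

    x >> 31 is 0 for non-negative x and -1 for negative x (arithmetic shift);
    OR-ing with 1 turns these into +1 and -1.
    """
    return (x >> 31) | 1

samples = [-128, -5, -1, 0, 1, 7, 127]
limited = [signum_shift(x) for x in samples]
```

Even the branch-free form costs several operations per value, which is the overhead a single-cycle functional unit avoids.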

Zero-Crossing Detector

Hard-limited phase axes pairs (ik, qk) are sent to a bank of zero crossing detectors

(ZCD). Each ZCD takes a pair of axis values (ik, qk) and gives a decision variable Dk.

Dk denotes the direction of phase axis crossing. Dk = 1 denotes axis crossing in the

anti-clockwise direction, Dk = -1 denotes axis crossing in the clockwise direction and

Dk = 0 denotes no axis crossing detected. Dk is given by

[ ( ) ( )] (9)

Each zero crossing detector requires two multiplications, three subtractions and one

‘divide by 2’ operation. Integer-division by 2 can be replaced by an arithmetic right-

shift operation. The number of zero-crossing detectors is four which can do the

computation in parallel.
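One expression that behaves as described (+1 for an anti-clockwise crossing, -1 for a clockwise crossing, 0 otherwise) and uses exactly two multiplications, three subtractions and one shift is sketched below. It is an assumed form written for illustration and is not claimed to be identical to Equation (9); zcd is a hypothetical name.

```python
def zcd(i_prev, q_prev, i_cur, q_cur):
    """Direction of axis crossing for hard-limited inputs in {-1, +1}."""
    # The bracket is always even for +/-1 inputs, so the shift is exact.
    return (i_prev * (q_cur - q_prev) - q_prev * (i_cur - i_prev)) >> 1

anticlockwise = zcd(1, 1, -1, 1)   # first quadrant -> second quadrant
clockwise = zcd(1, 1, 1, -1)       # first quadrant -> fourth quadrant
no_crossing = zcd(1, 1, 1, 1)
```

A jump to the diagonally opposite quadrant gives 0 with this expression, since its direction cannot be determined from a single pair of samples.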

Decision Device

The decision device combines all the zero-crossing detection values (Dk) and gives

the symbol decision. The directions of phase rotation from all the ZCDs (Dk) are

combined together using Equation (10).

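$$C_i = \mathrm{sat}\left(\sum_{k=0}^{M/2-1} D_k\right) \tag{10}$$

where sat(x) is defined as

$$\mathrm{sat}(x) = \begin{cases} +1, & x \ge 1 \\ 0, & x = 0 \\ -1, & x \le -1 \end{cases} \tag{11}$$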

Ci gives the cumulative phase change direction from all the ZCD and can have

values (+1, 0, -1). The output of the decision device is given by S, defined as

$$S_i = \begin{cases} C_i, & C_i \ne 0 \\ S_{i-1}, & C_i = 0 \end{cases} \tag{12}$$

S gives the shape of the modulated signal. If no axis crossing is detected with the

current sample (Ci = 0), the previous decision (Si-1) is output as the decision.

The implementation of the decision device is straightforward. The only issue is

implementing the sat(x) function. To avoid the conditional execution, a special

functional unit is implemented in the TTA processor for performing the saturation

function.
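The behaviour of the decision device described above can be sketched in a few lines. The sketch assumes integer ZCD outputs in {-1, 0, +1}, saturation to the range -1..+1, and that a non-zero cumulative direction is output directly; sat and decide are hypothetical names.

```python
def sat(x):
    """Saturate an integer to the range -1..+1."""
    return max(-1, min(1, x))

def decide(d_values, s_prev):
    """Combine the ZCD outputs; hold the previous decision if nothing crossed."""
    c = sat(sum(d_values))
    return c if c != 0 else s_prev

s = -1
outputs = []
for d_values in ([1, 1, 0, 0], [0, 0, 0, 0], [-1, 0, -1, 0]):
    s = decide(d_values, s)
    outputs.append(s)
```

In hardware the min/max pair corresponds to the saturation unit, so no conditional execution is needed for sat(x).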

4.2.2. Down Sampling

For AZCD to work correctly, the sampling rate of the signal should be above the

symbol rate. For implementation, four times oversampling is used. AZCD gives one

symbol decision per I-Q sample pair. So, the output of AZCD must be down sampled

by a factor of 2 to match the symbol rate. The down sampling block copies either

even or odd indexed AZCD output values according to the provided offset to a

memory buffer.

4.2.3. Bits-to-Word Packing

The down-sampled AZCD output values are the demodulated MSK/O-QPSK

encoded chips. These values are converted from -1/1 to hard-bit format (0/1) and

packed in 32-bit word. The goal of the packing is to optimize the correlator

implementation as discussed in Section 4.2.4.

The bit packing is a highly parallelizable operation. It requires converting 32

bipolar values (-1/+1) to hard bits (0/1) and shifting the bit value to its bit position

using logical shift operation. These 32 operations can be performed in parallel in two

clock cycles given the resources. A logical OR operation is performed between all

these 32 values to pack them as a single 32-bit word. The OR operation can also be

performed in parallel by dividing the 32 values into two groups of 16 and

performing OR in the 1st cycle, dividing the resulting 16 values into two groups of 8 and

performing OR in the 2nd cycle, and so on. The total number of cycles needed to perform the OR

operation for 32-bit packing is five. So, the total number of cycles required for

packing 32-bit value is seven provided there are enough registers and ALUs to

perform the parallelization.
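The packing steps above can be sketched as follows. The sketch is sequential Python, so it only mirrors the structure of the parallel schedule: one pass for conversion and shifting, then five OR levels (32 to 16 to 8 to 4 to 2 to 1). Placing value n at bit n is an assumption, and pack_chips is a hypothetical name.

```python
def pack_chips(values):
    """Pack 32 bipolar values (-1/+1) into one 32-bit word."""
    assert len(values) == 32
    # Convert -1/+1 to 0/1 and shift each bit to its position
    # (assumption: value n goes to bit n).
    words = [((v + 1) >> 1) << n for n, v in enumerate(values)]
    levels = 0
    while len(words) > 1:
        half = len(words) // 2
        # One OR level: combine the two halves element by element.
        words = [words[k] | words[k + half] for k in range(half)]
        levels += 1
    return words[0], levels

word, levels = pack_chips([1] + [-1] * 31)
```

The five levels of the loop correspond to the five OR cycles counted in the text.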

4.2.4. Correlator Bank

The correlator bank is used to de-spread the PN chip sequences to the data symbol.

Traditionally, a correlator is implemented as an FIR filter using multiply-and-

accumulate (MAC) operation. Since the outputs of AZCD are hard-bit values, hard-bit

correlation (namely xor-and-count 1s) can be used very efficiently. For this reason,

the outputs of AZCD are packed into 32-bit words (representing the 32-bit PN

sequence). To detect the transmitted symbol, the 32-bit packed chip is correlated

with all 16 known PN chip sequences and the symbol corresponding to the highest

correlating chip sequence is selected. Since the AZCD is an MSK demodulator, the

PN chips used for correlation are encoded with MSK/O-QPSK encoder using

Equation (4), as described in Section 3.2.1. The table of MSK/O-QPSK encoded chip

sequences is given in Table 4. Figure 11 gives the block diagram of the correlator

bank.

Table 4: MSK/O-QPSK encoded chip sequence

Symbol Chip Sequence MSK/O-QPSK Encoded Chip Sequence

0000 01110100010010101100001110011011 10011011001110101111011100000011

0001 01000100101011000011100110110111 10110011101011110111000000111001

0010 01001010110000111001101101110100 10111010111101110000001110011011

0011 10101100001110011011011101000100 00101111011100000011100110110011

0100 11000011100110110111010001001010 01110111000000111001101100111010

0101 00111001101101110100010010101100 11110000001110011011001110101111

0110 10011011011101000100101011000011 00000011100110110011101011110111

0111 10110111010001001010110000111001 00111001101100111010111101110000

1000 11011110111000000110100100110001 01100100110001010000100011111100

1001 11101110000001101001001100011101 01001100010100001000111111000110

1010 11100000011010010011000111011110 01000101000010001111110001100100

1011 00000110100100110001110111101110 11010000100011111100011001001100

1100 01101001001100011101111011100000 10001000111111000110010011000101

1101 10010011000111011110111000000110 00001111110001100100110001010000

1110 00110001110111101110000001101001 11111100011001001100010100001000

1111 00011101111011100000011010010011 11000110010011000101000010001111

16 correlators are matched to 16 MSK/O-QPSK PN sequences. The correlation is

performed as a bitwise XOR operation between the detected chip and the known chip.

The result of XOR is passed to a special FU which counts the number of 1s in the

value. The correlator giving the minimum number of 1s in the XOR result gives the

maximum correlation and the corresponding symbol is selected as the detected data

symbol.
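The XOR-and-count correlator bank can be sketched as below, using the MSK/O-QPSK encoded chip sequences exactly as printed in Table 4. Python's bit counting stands in for the COUNT_ONES functional unit; despread is a hypothetical name, and ties are resolved in favour of the lowest symbol index, which is an arbitrary choice.

```python
ENCODED = [int(s, 2) for s in (
    "10011011001110101111011100000011", "10110011101011110111000000111001",
    "10111010111101110000001110011011", "00101111011100000011100110110011",
    "01110111000000111001101100111010", "11110000001110011011001110101111",
    "00000011100110110011101011110111", "00111001101100111010111101110000",
    "01100100110001010000100011111100", "01001100010100001000111111000110",
    "01000101000010001111110001100100", "11010000100011111100011001001100",
    "10001000111111000110010011000101", "00001111110001100100110001010000",
    "11111100011001001100010100001000", "11000110010011000101000010001111",
)]

def despread(word):
    """Return the symbol whose encoded sequence differs from word in the fewest bits."""
    distances = [bin(word ^ ref).count("1") for ref in ENCODED]
    return distances.index(min(distances))

clean = despread(ENCODED[5])
noisy = despread(ENCODED[5] ^ 0x00000100)  # one chip in error
```

Each correlator reduces to one XOR and one bit count, compared with 32 multiply-and-accumulate operations in the FIR formulation.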

4.2.5. Preamble Detection

The IEEE 802.15.4 frame is prefixed with the synchronization header consisting of

eight preamble symbols and two SFD symbols. The synchronization header is used

for frame and symbol timing synchronization. Preamble detection is implemented as

a correlator matched to the chip sequence corresponding to the preamble symbol (symbol 0)

and counting peaks. The preamble detection is considered successful if eight peaks

(above a defined threshold) are detected after the correlation of an incoming chip

sequence with preamble chips. The design of the correlator is the same as explained in

Section 4.2.4 (based on the XOR and count 1s). The count of 1s in XOR result can

easily be converted to traditional multiply-and-accumulate based correlation as

$$v = 32 - 2u \tag{13}$$

Here, v is the traditional multiply-and-accumulate based correlation value and u is

the count of 1s in the XOR result.
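The relation between the two correlation measures can be checked numerically. For 32-chip sequences, every agreeing chip contributes +1 and every disagreeing chip -1 to the multiply-and-accumulate result, so v = (32 - u) - u = 32 - 2u. The sketch below verifies this for a few words; the function names are hypothetical.

```python
def mac_correlation(a, b):
    """Multiply-and-accumulate correlation of two 32-bit words mapped to -1/+1."""
    total = 0
    for n in range(32):
        x = 2 * ((a >> n) & 1) - 1
        y = 2 * ((b >> n) & 1) - 1
        total += x * y
    return total

def xor_count(a, b):
    """Number of 1s in the XOR of the two words (count of disagreeing chips)."""
    return bin(a ^ b).count("1")

pairs = [(0x744AC39B, 0x44AC39B7), (0x0, 0xFFFFFFFF), (0x9B3AF703, 0x9B3AF703)]
checks = [mac_correlation(a, b) == 32 - 2 * xor_count(a, b) for a, b in pairs]
```

A peak threshold on v can therefore be applied directly to u by converting it once, without computing any multiplications at run time.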

4.2.6. Receiver Algorithm

A brief flowchart of the receiver algorithm is shown in Figure 16. The receiver

works in two modes. The first mode is preamble detection. In this mode, I-Q samples

are buffered till the length of preamble samples. The buffer is then demodulated

using AZCD and the output of AZCD is down sampled at two different offsets. Both

of these downsampled values are then passed to the preamble detector block. If

preamble detection is successful, the best downsampling offset (giving maximum

preamble correlation) is selected as the timing offset for further data downsampling.

If preamble detection is not successful, further samples are buffered and the

preamble detection process is repeated.

Once the preamble detection is successful, the receiver switches to the payload

detection mode. Samples corresponding to the SFD field are demodulated using the

AZCD, downsampled at the correct timing offset (selected during the preamble

detection phase) and de-spread using the correlator bank. If the detected SFD is the

same as the expected value, frame synchronization is achieved. Next, samples

corresponding to the frame length field are buffered and demodulated. Based on the

frame length value, the length of payload samples is computed. The payload samples

are then buffered, demodulated using AZCD, downsampled and de-spread using the

correlator bank giving the required payload.

4.2.7. TTA Processor for Receiver

The bottleneck in the receiver processing chain is AZCD block and the processor is

designed to optimize the AZCD algorithm execution. Since the number of phase

axes is chosen to be eight, eight ALUs are used in the TTA processor to allow the

parallel execution of the ZCD algorithm. The resources used for implementing the TTA

processor are given in Table 5. The estimated gate count for the processor is 95 k

gates. The processor needs 200 MHz of clock frequency to achieve a data throughput

of 250 kbps.

[Figure 16 flow chart steps]

START
1. Buffer samples of length = preamble chip length * oversampling ratio
2. AZCD decode (M phase axes)
3. Down-sample by oversampling ratio at multiple offsets
4. Correlate with preamble chips
5. Preamble detected? No: return to step 1. Yes: continue.
6. Get the best down-sampling offset (max preamble correlation)
7. Buffer data samples
8. AZCD decode (M phase axes)
9. Down-sample by oversampling ratio at the correct offset
10. Correlate with chip sequences to detect data symbol
END

Figure 16: Flow chart of receiver algorithm

Table 5: Resources used in TTA processor for receiver

Resource Name   Count   Function
ALU             8       Arithmetic and Logical Computations
MUL             5       Integer Multiplication
LSU             2       Load Store Unit for memory access
CU              1       Control unit for program flow control
RF              8       Register file for storing intermediate results
STREAM_IN       1       Serial input of data
STREAM_OUT      1       Serial output of data
SIGNUM          8       For computing the signum(x) function
COUNT_ONES      1       For counting the number of 1s in a 32-bit register
SATURATE        1       For saturating the values to +/-1
Buses           16      Interconnection Network

5. PERFORMANCE EVALUATIONS

The following subsections describe the performance evaluation of the IEEE 802.15.4

transceiver realization on a TTA processor. Performance is measured in terms of

Symbol Error Rate (SER) for the receiver design and, for the designed

TTA processors, in terms of gate count and clock frequency requirement.

5.1. Noise Performance for Receiver

The plot of Eb/N0(dB) versus Symbol Error Rate (SER) is shown in Figure 17. The

IEEE 802.15.4 requires receiver sensitivity to be -85 dBm. The receiver sensitivity is

defined as the threshold input power that yields a Packet Error Rate of 1 per cent

considering the PSDU length to be 20 octets without the presence of any interference

[8].

Figure 17: Eb/N0 vs SER Curve.

The value of desired SER can be computed from PER as

$$SER = 1 - (1 - PER)^{1/N} \tag{14}$$

Where, N is the number of symbols per packet. Considering 20 octets, the number

of data symbols N becomes 40 (4-bit symbols). Using N = 40 and PER = 1%, the SER

threshold is computed from Equation (14).
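The conversion from PER to SER can be evaluated numerically. The sketch below assumes independent symbol errors, so that PER = 1 - (1 - SER)^N, and inverts this relation. It evaluates N = 40 (20 octets of two 4-bit symbols each) and, for comparison only, N = 20; ser_threshold is a hypothetical name and the printed values are not taken from the thesis.

```python
def ser_threshold(per, num_symbols):
    """Invert PER = 1 - (1 - SER)**N for SER, assuming independent symbol errors."""
    return 1.0 - (1.0 - per) ** (1.0 / num_symbols)

ser_40 = ser_threshold(0.01, 40)  # 20 octets = 40 four-bit symbols
ser_20 = ser_threshold(0.01, 20)  # shown for comparison only

print(f"N = 40: SER threshold = {ser_40:.3e}")
print(f"N = 20: SER threshold = {ser_20:.3e}")
```

Both cases give a threshold of a few times 10^-4, and doubling N roughly halves the tolerable symbol error rate.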

From Figure 17, the Eb/N0 value at the input of the receiver should be at least 19

dB to get a SER below this threshold. The Noise Figure (NF) requirement for the RF

receiver can be computed using these values and receiver sensitivity requirement of

-85 dBm as

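$$NF = P_{\mathrm{sens}} - \left(-174 + 10\log_{10}(B)\right) - \left(E_b/N_0\right) \tag{15}$$

Here P_sens is the receiver sensitivity in dBm, -174 dBm/Hz is the thermal noise density at room temperature, B is the bandwidth in Hz, and NF and Eb/N0 are in dB.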

Considering chip rate B = 2 Mchips/sec, the NF requirement for the receiver is

computed to be 7 dB. Such low noise RF receivers can be easily fabricated using a

CMOS process. One such RF receiver for the 2.4 GHz band is demonstrated in [39].
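The 7 dB figure can be reproduced with a short calculation. The sketch assumes a thermal noise density of -174 dBm/Hz and subtracts the noise floor in the bandwidth B and the required Eb/N0 from the sensitivity; noise_figure_db is a hypothetical name.

```python
import math

def noise_figure_db(sensitivity_dbm, bandwidth_hz, ebn0_db, thermal_dbm_per_hz=-174.0):
    """NF = sensitivity - (thermal density + 10*log10(B)) - required Eb/N0, all in dB."""
    noise_floor_dbm = thermal_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)
    return sensitivity_dbm - noise_floor_dbm - ebn0_db

nf = noise_figure_db(-85.0, 2e6, 19.0)
print(f"Required noise figure: {nf:.2f} dB")
```

With -85 dBm sensitivity, B = 2 MHz and 19 dB of Eb/N0, the noise floor is about -111 dBm and the budget left for the RF front end is about 7 dB.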

Comparing the noise performance with the coherent receiver design [18], we see

that the coherent receiver's noise performance is much better than that of the non-coherent

receiver. The Eb/N0 requirement at the input of the coherent receiver is around 8.7 dB

[18] which is much better than 19 dB requirement of non-coherent receiver. The

better performance in the coherent receiver is achieved at the cost of higher

complexity and implementation cost. Nonetheless, the non-coherent receiver can

easily meet the noise performance requirement of IEEE 802.15.4 with much less

complexity than the coherent receiver.

5.2. Performance of the TTA processor for transmitter

The TTA processor for transmitter is simple enough to be implemented on low-cost

devices. The gate count estimate for the processor is 16 k gates. The CPU cycle

requirement for processing a PPDU for transmission is given as

( ) (16)

The clock frequency the processor needs to achieve a data throughput of 250 kbps
follows from Equation (16). The minimum clock frequency required for the processor
is 35 MHz.

5.3. Performance of the TTA processor for receiver

The real challenge in the IEEE 802.15.4 transceiver design is designing a low-power,
low-cost processor for the receiver. Choosing a receiver algorithm such as AZCD,
which requires only very basic processing, makes the task somewhat easier. The
challenge in implementing the AZCD algorithm lies in its conditional processing
requirements, such as saturation and the signum function, which are handled using
special functional units. The designed processor takes 5930 CPU cycles to process
each byte of the payload. There is an overhead of 32 k CPU cycles for the preamble
detection and synchronization process. To achieve the data throughput of 250 kbps,
the clock frequency required by the processor is 200 MHz. The estimated gate count
of the processor is 95 k, which is small enough to be implemented on low-cost
devices.
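The cycle counts above can be related to the clock frequency with a short calculation. The sketch below is illustrative, with hypothetical names. It reads "32 k" as 32 000 cycles, and it assumes that the synchronization overhead must complete within the 160 µs synchronization header (4 preamble octets plus 1 SFD octet); the text does not state this constraint, so the second check is an assumption.

```python
DATA_RATE_BPS = 250_000          # data throughput required by the PHY
CYCLES_PER_PAYLOAD_BYTE = 5930   # from the text
SYNC_OVERHEAD_CYCLES = 32_000    # "32 k" in the text, read here as 32 000
SHR_OCTETS = 5                   # assumption: 4 preamble octets + 1 SFD octet


def clock_for_payload_hz(cycles_per_byte: int, data_rate_bps: int) -> float:
    """Clock needed to finish one byte before the next byte arrives."""
    bytes_per_second = data_rate_bps / 8
    return cycles_per_byte * bytes_per_second


def clock_for_sync_hz(overhead_cycles: int, shr_octets: int,
                      data_rate_bps: int) -> float:
    """Clock needed if the sync overhead must fit in the SHR (assumption)."""
    return overhead_cycles * data_rate_bps / (shr_octets * 8)


payload_clock_hz = clock_for_payload_hz(CYCLES_PER_PAYLOAD_BYTE, DATA_RATE_BPS)
sync_clock_hz = clock_for_sync_hz(SYNC_OVERHEAD_CYCLES, SHR_OCTETS, DATA_RATE_BPS)
required_clock_hz = max(payload_clock_hz, sync_clock_hz)

print(f"payload-limited clock: {payload_clock_hz / 1e6:.1f} MHz")
print(f"sync-limited clock:    {sync_clock_hz / 1e6:.1f} MHz")
print(f"required clock:        {required_clock_hz / 1e6:.1f} MHz")
```

Under these assumptions the payload processing alone needs roughly 185 MHz, and the larger of the two constraints is consistent with the 200 MHz figure quoted above.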

The clock frequency and gate count can be further reduced by replacing the
generic ALUs with FUs dedicated to the required arithmetic or logical operations.
Such a design would give much better performance at the cost of the generality of
the processor.

For comparison, the SIMD processor designed in [23] also operates at 200 MHz,
with a SIMD factor of 16 and some special hardware blocks for coherent
demodulation (such as CORDIC and MAC). The resource usage of the designed TTA
processor is comparable to that of the SIMD processor described in [23].


5.4. Discussion

Designing an SDR system for any wireless communication technology is a matter of
software-hardware trade-off. An ideal SDR system would be entirely software driven,
with an antenna connected to the serial port of the processor. Such a system is hard
to realize for any practical purpose. The hardware-software trade-off needs to be
optimized to get the best flexibility, adaptability and time-to-market under the
constraints of the required performance metrics: throughput, latency, energy
consumption and cost. The same trade-off must be addressed when designing an SDR
system using an ASIP. The signal processing needs to be divided into a
hardware-implemented part and a software-implemented part. The hardware part is
implemented as special functional units of the ASP. The trade-off in designing a
special functional unit is to choose instructions that are generic in nature, so that
the designed processor can be used without much change for other similar
technologies (programmability).

A qualitative comparison of different methodologies for the implementation of
IEEE 802.15.4 is presented in Table 6. The comparison covers system-on-chip (SOC)
implementations (for example TI CC2531 [40] and Marvell 88MZ100 [41]), GPP based
implementations (using GNU Radio, Matlab, etc.) and the proposed TTA
implementation. The SOC implementations are optimal in terms of cost and energy
consumption but lack the programmability of an SDR system. The time-to-market for
an SOC implementation is long and there is no maintainability once the product is
delivered. GNU Radio or Matlab-Simulink based implementations running with SDR
kits (such as USRP devices) are highly programmable. However, they cannot
guarantee real-time throughput and the solution cannot be integrated on low-cost
devices. An ASP based SDR implementation balances the trade-off, providing
programmability and integration on low-cost devices at the expense of higher cost
and energy consumption than an SOC.

Table 6: Comparison of different implementations of IEEE 802.15.4

Parameters                      System-on-chip   SDR on GPP   SDR on ASP (TTA)
Programmability                 No               Yes          Yes
Integration with Sensor Nodes   Yes              No           Yes
Real-time Processing            Yes              No           Yes
Low-Cost                        Yes              NA           Yes


6. SUMMARY

The goal of this work was to analyze the feasibility of realizing an SDR for a
low-cost, low-power wireless communication transceiver on a TTA based ASP. The
wireless communication standard selected for realization was IEEE 802.15.4. First,
the feasibility of using an existing SDR development framework (GNU Radio) on a
TTA based processor was studied. It was concluded that the GNU Radio framework
cannot be used as a standalone system and needs Linux running on a host PC or a
device running Embedded Linux for its execution.

Because the existing SDR development frameworks could not be used on the TTA
processor, the signal processing algorithms had to be implemented from scratch.
For this, an algorithm-level study was carried out to select signal processing
algorithms suitable for the requirements specified by IEEE 802.15.4. Both coherent
and non-coherent receiver algorithms were reviewed as candidates for implementation
on TTA. Finally, the AZCD based receiver algorithm was selected for implementation.

The designed transceiver was first modeled in Matlab to check whether the selected
algorithms meet the noise performance required by the IEEE standard. Different
implementation parameters (such as the sampling rate, the preamble detection
threshold and the number of phase axes for AZCD) were selected at this stage using
the Matlab model. The Matlab model was also used for testing and verifying the
actual C implementation.

Once the Matlab model of the transceiver complied with the IEEE requirements,
the transceiver was implemented in C, together with the TTA processors for the
transmitter and the receiver. Special functional units were implemented in the
TTA processor for the receiver to speed up the algorithm. The data throughput
requirement of 250 kbps was achieved with the transmitter clocked at 35 MHz and
the receiver clocked at 200 MHz. The gate counts of the processors are small enough
to be implemented on low-cost and low-power devices.

The implemented transceiver falls short in noise performance compared with
commercially available SOC implementations such as the TI CC2531 [40] and Marvell
88MZ100 [41], and compared with the coherent receiver implementations [22, 23].
Nonetheless, its noise performance meets the requirement of the IEEE 802.15.4
specification. The commercially available SOCs also perform better in terms of
energy consumption and production cost. However, compared with the SDR
implementation, the SOCs lack programmability and have a longer time-to-market
with no post-production maintainability. The coherent receiver based SDR
implementation [22] has higher complexity and needs a fully functional DSP [23] for
its operation. The realization proposed in this thesis balances processing
complexity, cost and power consumption while providing the benefits of an SDR
system and meeting the performance requirements of the IEEE 802.15.4 standard.

While the current AZCD based implementation meets the noise performance of the
IEEE 802.15.4 standard, it can be improved by using the more complex synchronous
ZIFZCD algorithm at the cost of designing a more complex TTA processor. The
current design implements most of the functionality in software, keeping the
processor generic. The design can be further optimized in terms of gate count, CPU
clock frequency and power consumption by using a software-hardware hybrid
design. A generic ZIFZCD or AZCD block can be implemented in hardware as a
special functional unit of the TTA processor. Since the ZIFZCD and AZCD
algorithms can be used for demodulating any CPFSK modulated signal, the TTA
processor can be used in any system employing CPFSK modulation.

The work presented in this thesis establishes the feasibility of implementing
low-rate baseband wireless transceiver systems on a TTA based application specific
processor. Using the example of the IEEE 802.15.4 LR-WPAN PHY standard, it has
been shown that a low data-rate SDR transceiver can be implemented on a TTA
processor. The TTA based ASP design can be one of the ideal platforms for moving
from current SOC based implementations to programmable SDR implementations. The
results of this thesis work have been published as a conference paper in the
‘Wireless Innovation Forum Conference on Communications Technologies and Software
Defined Radio’ (SDR-WInnComm 2013), Washington DC, USA [42].


7. REFERENCES

[1] IEEE Standard 802.16 (2004), IEEE standard for local and metropolitan

area networks part 16: Air interface for fixed broadband wireless access

systems.

[2] IEEE Standard 802.16-2005 (2005), IEEE standard for local and

metropolitan area networks part 16: Air interface for fixed and mobile

broadband wireless access systems.

[3] 3rd Generation Partnership Project (3GPP), http://www.3gpp.org
(17.12.2012)

[4] IEEE Standard 802.11-2012 (Revision 2012), IEEE Standard for

Information technology - Telecommunications and information exchange

between systems Local and metropolitan area networks - Specific

requirements Part 11: Wireless LAN medium access control (MAC) and

physical layer (PHY) specifications.

[5] IEEE Standard 802.15.1-2005 (2005), IEEE Standard for Information

technology-- Local and metropolitan area networks-Specific requirements-

Part 15.1a: Wireless Medium Access Control (MAC) and Physical Layer

(PHY) specifications for Wireless Personal Area Networks (WPAN).

[6] IEEE Standard 802.15.2-2003 (2003), IEEE Recommended Practice for

Information technology-- Local and metropolitan area networks - Specific

requirements- Part 15.2: Coexistence of Wireless Personal Area Networks

with Other Wireless Devices Operating in Unlicensed Frequency Bands.

[7] IEEE Standard 802.15.3-2003 (2003), IEEE Standard for Information

technology - Telecommunications and information exchange between

systems - Local and metropolitan area networks - Specific requirements

Part 15.3: Wireless Medium Access Control (MAC) and Physical Layer

(PHY) Specifications for High Rate Wireless Personal Area Networks

(WPAN).

[8] IEEE Standard 802.15.4-2011, (2011), IEEE Standard for Local and

metropolitan area networks--Part 15.4: Low-Rate Wireless Personal Area

Networks (LR-WPANs).

[9] Corporaal, H. (1998), Microprocessor architectures: from VLIW to TTA, J.

Wiley.

[10] Jääskeläinen, P., Guzma, V., Cilio, A., Pitkänen, P., Takala, J. (2007),

Codesign toolset for application-specific instruction-set processors, Proc.

SPIE Multimedia on Mobile Devices.


[11] Sabater, J., Gómez, J.M., López, M. (2010), Towards an IEEE 802.15.4
SDR transceiver. In: 17th IEEE International Conference on Electronics,
Circuits, and Systems (ICECS), pp. 323-326.

[12] GNU Radio, http://gnuradio.org (17.12.2012).

[13] Data Sheet for USRP N200/N210 Network Series,
https://www.ettus.com/content/files/07495_Ettus_N200-210_DS_Flyer_HR.pdf
(17.12.2012).

[14] Data Sheet for USRP E100/E110 Embedded Series,
https://www.ettus.com/content/files/07495_Ettus_E100-110_DS_Flyer_HR.pdf
(17.12.2012)

[15] Data Sheet for USRP B100 Bus Series,
https://www.ettus.com/content/files/07495_Ettus_B100_DS_Flyer_HR_4.pdf
(17.12.2012)

[16] Data Sheet for USRP1 Bus Series,
https://www.ettus.com/content/files/07495_Ettus_USRP1_DS_Flyer_HR.pdf
(17.12.2012)

[17] Schmid, T. (2006), GNU Radio 802.15.4 En- and Decoding,

http://nesl.ee.ucla.edu/fw/thomas/thomas_project_report.pdf (17.12.2012)

[18] Thandee R. (2012), IEEE 802.15.4 Implementation on an Embedded

Device. Master’s thesis. Virginia Polytechnic Institute and State

University, USA.

[19] Knauth S., Implementation of an IEEE 802.15.4 Transceiver with a
Software-defined Radio setup,
http://www.ihomelab.ch/fileadmin/Dateien/PDF/emw2008_paper_knauth.pdf
(17.12.2012)

[20] Chipcon products from Texas Instruments, CC2420: 2.4 GHz IEEE

802.15.4/ Zigbee-ready RF transceiver.

http://focus.ti.com/lit/ds/symlink/cc2420.pdf (17.12.2012).

[21] Mahmood S. (2011), Software defined radio for wireless sensor &

cognitive networks. Master’s thesis. Faculty of Information Technology,

University of Applied Sciences, Vasa, Finland.

[22] Koteng R. M. (2006), Evaluation of SDR-implementation of IEEE

802.15.4 Physical Layer. Master’s thesis. Norwegian University of Science

and Technology, Department of Electronics and Telecommunications,

Norway.

[23] Naess H. (2006), A programmable DSP for low-power, low-complexity

baseband processing. Master’s thesis. Norwegian University of Science

and Technology, Department of Electronics and Telecommunications,

Norway.


[24] Dehaese N., Bourdel S., Barthelemy H., Bas G. (2006), Simple

demodulator for 802.15.4 low-cost receivers. In: IEEE Radio and Wireless

Symposium, 17-19 Jan. 2006.

[25] The ZigBee Alliance, http://www.zigbee.org/ (17.12.2012)

[26] IEEE Standard 802.2-1998 (1998), IEEE Standard for Information

technology - Telecommunications and information exchange between

systems - Local and metropolitan area networks - Specific requirements -

Part 2: Logical Link Control.

[27] Proakis, J.G. (1989), Digital Communications, New York: McGraw-Hill.

[28] Volker B., Handel P. (2001), Frequency estimation from proper sets of

correlations. IEEE Transactions on Signal Processing, vol.50, no.4,

pp.791-802.

[29] Kay S., (1989), A fast and accurate single frequency estimator. IEEE

Transactions on Acoustics, Speech and Signal Processing, vol.37, no.12,

Dec 1989.

[30] Classen F., Meyr H. (1993), Two frequency estimation schemes operating

independently of timing information. In: Global Telecommunications

Conference, vol. 3.

[31] Volder J.E. (1959), The CORDIC Trigonometric Computing

Technique. IRE Transactions on Electronic Computers, vol. EC-8, no.3,

pp. 330–334.

[32] Tjhung T., Wittke P. (1970), Carrier Transmission of Binary Data in a

Restricted Band. IEEE Transactions on Communication Technology,

vol.18, no.4, pp.295-304.

[33] Park J. (1970), An FM Detector for Low S/N. IEEE Transactions

on Communication Technology, vol.18, no.2, pp.110-118.

[34] Meyr H., Subramanian R. (1995), Advanced digital receiver principles and

technologies for PCS. IEEE Communications Magazine, vol.33, no.1,

pp.68-78.

[35] Kwon H.M., Kwang B. L. (1996), A novel digital FM receiver for mobile

and personal communications. IEEE Transactions on Communications,

vol.44, no.11, pp.1466-1476.

[36] Dehaese N., Bourdel S., Barthelemy H., Bachelet Y., Bas G. (2005), FSK

zero-crossing demodulator for 802.15.4 low-cost receivers, In: 12th IEEE

International Conference on Electronics, Circuits and Systems (ICECS)

pp.1-4, 11-14 Dec. 2005.


[37] Dehaese N., Bourdel S., Barthelemy H., Bas G. (2006), Simple

demodulator for 802.15.4 low-cost receivers, IEEE Radio and Wireless

Symposium, pp. 315- 318, 17-19 Jan. 2006.

[38] Luzzatto A., Shirazi G. (2007), Wireless Transceiver Design, J. Wiley, pp.

36.

[39] Zolfaghari A., Razavi B. (2003), A Low Power 2.4 GHz

Transmitter/Receiver CMOS IC, IEEE Journal of Solid-state Circuits, vol.

38, no. 2.

[40] A USB-Enabled System-On-Chip Solution for 2.4-GHz IEEE 802.15.4 and

ZigBee Applications, http://www.ti.com/lit/ds/symlink/cc2531.pdf

(14.01.2013).

[41] Marvell 88MZ100: Zigbee SoC Solution for Home Automation and More,
http://www.marvell.com/wireless/88MZ100/assets/Marvell-88MZ100-ZigBee-Product-Brief.pdf
(14.01.2013)

[42] Ghazi A., Boutellier J., Hannuksela J., Silvén O., Janhunen J. (2013), Low-

complexity SDR implementation of IEEE 802.15.4 baseband transceiver

on application specific processor. In: The Wireless Innovation Forum

Conference on Communications Technologies and Software Defined

Radio (SDR-WInnComm), 8-10 Jan. 2013.
