DEGREE PROGRAMME IN WIRELESS COMMUNICATIONS ENGINEERING
SDR IMPLEMENTATION OF IEEE 802.15.4 PHY
ON TRANSPORT TRIGGERED ARCHITECTURE
PROCESSOR
Author Amanullah Ghazi
Supervisor Jari Hannuksela
Second supervisor Janne Janhunen
January 2013
Ghazi A. (2013) SDR Implementation of IEEE 802.15.4 PHY on Transport
Triggered Architecture Processor. University of Oulu, Department of
Communication Engineering. Master’s Thesis, 41 p.
ABSTRACT
Ever-evolving wireless communication standards, reduced time-to-market and a need
for flexibility and interoperability of multiple wireless communication technologies
on a single device are the driving factors behind the implementation of wireless
standards on Software Defined Radio (SDR) platforms. The concept behind SDR is
to implement as much functionality in software as possible. SDR provides greater
interoperability and programmability than traditional hardwired
implementations at the cost of higher power consumption and market cost. SDR is the
driving technology for the next generation of co-operative and cognitive radios.
To implement an SDR, the existing wireless communication algorithms need
to be modified and an appropriate hardware platform needs to be selected. The IEEE
802.15.4 LR-WPAN standard requires low-cost and low-power devices.
The data rate requirements are also low (such as 250 kbps). Traditionally, the devices
compliant with the standard are hardwired system-on-chip implementations, which
provide benefits in terms of power and cost. Recently, there has been significant
effort on modeling IEEE 802.15.4 SDR systems, which provide greater
interoperability and programmability of the devices. In this study, a Transport
Triggered Architecture (TTA) based Application Specific Processor is selected for
the SDR implementation of the IEEE 802.15.4 2.4 GHz physical layer in order to study the
performance of such a system in terms of Bit-Error-Rate, CPU cycle count, and
processor chip area.
As part of this work, different SDR frameworks such as GNU Radio and Matlab-Simulink
were evaluated for their suitability as an agile platform for the
development. These existing frameworks need an operating system for their
execution and are not suitable for stand-alone systems such as a TTA based
processor.
The work also includes the study of different receiver algorithms and design
choices for the transceiver implementation. Based on existing literature and Matlab
modeling, an Asynchronous Zero-Crossing Detector (AZCD) based non-coherent
receiver algorithm is selected for the implementation. The algorithm provides the
required BER performance with very low computational complexity and is suited for low
power and low chip area implementations. The transmitter and receiver are
implemented on single-core TTA processors which provide the required performance
in terms of BER and data throughput. The designed processors need a very small
silicon area and a low clock frequency for their realization.
Key words: IEEE 802.15.4, LR-WPAN, Software Defined Radio, TTA
CONTENTS
ABSTRACT
CONTENTS
PREFACE
GLOSSARY
1. INTRODUCTION
  1.1. Related Works
  1.2. Scope of the Thesis
  1.3. Structure of Thesis
2. IEEE 802.15.4 PHYSICAL LAYER
  2.1. Frequency Band
  2.2. Frame Format
  2.3. Coding
  2.4. Modulation
  2.5. Pulse Shaping
  2.6. O-QPSK/MSK equivalence
3. TRANSCEIVER DESIGN
  3.1. Transmitter Design
    3.1.1. O-QPSK based Transmitter Design
    3.1.2. MSK based transmitter Design
    3.1.3. Design Decisions
  3.2. Receiver Design
    3.2.1. Coherent Receiver Design
    3.2.2. Non-coherent receiver
    3.2.3. Design Decisions
4. TRANSCEIVER IMPLEMENTATION
  4.1. Transmitter Implementation
    4.1.1. Frame Formatter
    4.1.2. Chip Mapper
    4.1.3. O-QPSK Modulator
    4.1.4. Half-Sine Pulse Shaper
    4.1.5. TTA processor for the transmitter
  4.2. Receiver Implementation
    4.2.1. Asynchronous Zero-Crossing Detector
    4.2.2. Down Sampling
    4.2.3. Bits-to-Word Packing
    4.2.4. Correlator Bank
    4.2.5. Preamble Detection
    4.2.6. Receiver Algorithm
    4.2.7. TTA Processor for Receiver
5. PERFORMANCE EVALUATIONS
  5.1. Noise Performance for Receiver
  5.2. Performance of TTA processor for transmitter
  5.3. Performance of TTA processor for receiver
  5.4. Discussion
6. SUMMARY
7. REFERENCES
PREFACE
This Master’s thesis has been published in the Department of Communication
Engineering of the University of Oulu, Finland. The research work published in this
thesis was mentored and funded by Department of Computer Science and
Engineering, University of Oulu. Their support is greatly acknowledged.
I would like to thank Professor Olli Silvén from the Department of Computer
Science for providing an excellent and flexible opportunity for this research work
and mentoring the work enthusiastically. I would also like to thank Jani Boutellier
(Dr. Tech.) from the Department of Computer Engineering for his continuous
support and technical mentorship.
I am grateful to Professor Jari Hannuksela from Department of Computer Science
for his encouraging supervision of the thesis work. The academic freedom provided
for the work allowed me to experiment with multiple design options. I am also
thankful to Janne Janhunen (Dr. Tech.) from Center for Wireless Communication
(CWC) for being my second supervisor and providing excellent feedback to the work
being done.
I would like to thank my family for their well wishes and prayers. I would also like
to thank my classmates and friends Shahriar Shahabuddin, Hassan Malik, Ijaz
Ahmad, Nouman Bashir and Masuma Khatun for their support and encouragement
during the thesis work.
Oulu, January 2013
Amanullah Ghazi
GLOSSARY
ALU Arithmetic and Logic Unit
ASIC Application Specific Integrated Circuit
ASK Amplitude Shift Keying
ASP Application Specific Processor
AZCD Asynchronous Zero-Crossing Detector
BPSK Binary Phase Shift Keying
CDM Cross-Differentiate Multiply
CMOS Complementary Metal–Oxide–Semiconductor
CORDIC COordinate Rotation DIgital Computer
CPFSK Continuous Phase Frequency Shift Keying
CPU Central Processing Unit
CSMA-CA Carrier Sense Multiple Access with Collision Avoidance
CU Control Unit
DAC Digital to Analog Converter
DSP Digital Signal Processor
DSSS Direct Sequence Spread Spectrum
FFD Full Function Device
FFT Fast Fourier Transform
FIFO First In First Out
FIR Finite Impulse Response
FPGA Field Programmable Gate Array
FSK Frequency Shift Keying
FU Functional Unit
GPP General Purpose Processor
IEEE Institute of Electrical and Electronics Engineers
IF Intermediate Frequency
ISF Integrate and Saturate Filter
LAN Local Area Network
LDI Limiter Discriminator Integrator
LLC Logical Link Control
LR-WPAN Low Rate Wireless Personal Area Network
LSB Least Significant Byte
LSU Load Store Unit
MAC Medium Access Control
MSK Minimum Shift Keying
NF Noise Factor
O-QPSK Offset Quadrature Phase Shift Keying
PAN Personal Area Network
PER Packet Error Rate
PHR Physical Layer Header
PHY Physical Layer
PN Pseudo-Random Noise
PPDU PHY Protocol Data Unit
PSD Power Spectral Density
PSDU PHY Service Data Unit
PSSS Parallel Sequence Spread Spectrum
QPSK Quadrature Phase Shift Keying
RF Radio Frequency
RFD Reduced Function Device
RRC Root Raised Cosine
SDR Software Defined Radio
SER Symbol Error Rate
SFD Start of Frame Delimiter
SHR Synchronization Header
SIMD Single Instruction Multiple Data
SNR Signal to Noise Ratio
TCE TTA-based Codesign Environment
TTA Transport Triggered Architecture
USRP Universal Software Radio Peripheral
WLAN Wireless Local Area Network
WMAN Wireless Metropolitan Area Network
WPAN Wireless Personal Area Network
XOR Bitwise exclusive OR operation
ZCD Zero Crossing Detector
ZIFZCD Zero-IF Zero Crossing Detector
A Amplitude of signal
ak O-QPSK symbols
Ci Cumulative decision of the zero-crossing detectors
Dk Decision value of zero-crossing detector
dk MSK symbols
Eb Energy per bit
fc Carrier frequency
I In-phase carrier
ik In-phase sample corresponding to kth phase axis
In Chip sequence
k Kilo
M Number of phase axes in AZCD
N Number of symbols per packet
N0 Noise power spectral density
Q Quadrature phase carrier
qk Quadrature-phase sample corresponding to kth phase axis
S Shape of demodulated signal by AZCD
T Number of filter taps
Tc Chip time
Ts Symbol Time
u Number of 1s in XOR-and-count 1 based correlator
v Correlation value based on Multiply-and-accumulate based correlator
θk kth phase axis angle
1. INTRODUCTION
Recent years have seen an exponential increase in use of wireless communication
technologies for voice, data and video communications. A multitude of wireless
networking technologies has been developed for catering to the needs of these
applications. The existing wireless networks can be categorized broadly as Wireless
Metropolitan Area Network (WMAN), Cellular Networks, Wireless Local Area
Network (WLAN) and Wireless Personal Area Network (WPAN). WMAN and
cellular networks cater to high data rate and voice applications with broader coverage
area replacing the traditional copper-wire based telephony network. WLANs also
cater to high data rate applications over small distances (within a building) replacing
the traditional wired LAN. WPANs are person-centered, very short-range networks of
devices requiring low data rates. A number of standards exist for specifying the
compliant devices. WMAN is specified by IEEE 802.16 series of standards [1, 2]. 3rd
Generation Partnership Project (3GPP) [3] specifies a number of standards for
cellular networks. WLAN technologies are governed by IEEE 802.11 series of
specifications [4]. IEEE 802.15 series specifies different standards for WPANs like
Bluetooth, Infrared Data Associations, Body Area Network, etc. [5, 7, 8]. IEEE
802.15.4 standard focuses on low data rate wireless application with very low energy
consumption and very low complexity [8]. The potential applications for IEEE
802.15.4 standard are sensors, interactive toys, smart badges, remote controls, and
home automation [8].
The large number of continuously evolving wireless communication standards
creates a need for reconfigurable and programmable radios, also known as Software
Defined Radios. A Software Defined Radio (SDR) is based on software defined
wireless communication protocols instead of a hardwired implementation, providing
adaptability and interoperability of multiple wireless communication standards on
a single device. The advantage of using an SDR based wireless communication
system is the ability to update, enhance or replace the functionality of the device.
This provides lower time-to-market and an enhanced lifetime of SDR based devices.
SDR systems provide flexibility at the cost of performance and power goals
compared to hard-wired ASIC implementations. Designing an SDR system for IEEE
802.15.4 is challenging due to the low power and low complexity requirements of the
standard. Nevertheless, since the performance requirement is modest, a low
complexity and low power SDR system can be realized by carefully choosing the
signal processing algorithms and processor architectures.
Use of a general purpose processor can maximize the flexibility and
programmability, while a hard-wired ASIC implementation can minimize the power
consumption and cost. A better choice is to use an Application Specific
Processor (ASP), which offers a balance between power consumption, cost and
flexibility. In this thesis, a Transport Triggered Architecture (TTA) [9] based ASP is
used for the realization of the SDR system for the IEEE 802.15.4 physical layer.
The Transport Triggered Architecture (TTA) is a processor design paradigm where
computations happen as a side effect of data transport. The program directly controls
the internal data transport. A typical TTA processor is shown in Figure 1. A TTA
processor consists of Functional Units (FUs) which perform specific computations
or functions, Register Files (RF) for storing intermediate results and control units for
program control flow. Every functional unit has a number of input and output ports
depending on the number of operands of the computation. One of the input ports is
defined as the triggering port. Writing data to the triggering port starts the computation,
and the result is available at the output port once the computation finishes. The
interconnection network consists of buses and sockets which connect the different
FUs and register files [9].
[Figure labels: Functional Units, Register Files, Control Units, Ports, Sockets, Buses, Connections]
Figure 1: Components of TTA Processor.
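The transport-triggered principle can be illustrated with a small software model. The sketch below is only a toy under stated assumptions: the class AddUnit, the port names operand, trigger and result, and the move format are hypothetical and are not taken from TCE or from the processors designed in this thesis. It shows a program made only of data moves, where the addition happens as a side effect of writing the trigger port.

```python
class AddUnit:
    """Toy functional unit (hypothetical): one operand port, one trigger port, one result port."""

    def __init__(self):
        self.operand = 0    # plain input port: a write only latches the value
        self.result = None  # output port: holds the result after the operation has run

    def write_operand(self, value):
        self.operand = value

    def write_trigger(self, value):
        # Writing the trigger port is what starts the computation.
        self.result = self.operand + value


def run(moves, registers, units):
    """Execute a list of (source, destination) data transports."""
    for src, dst in moves:
        # Resolve the source: an immediate, a register, or a unit result port.
        if isinstance(src, int):
            value = src
        elif src in registers:
            value = registers[src]
        else:
            unit, _port = src.split(".")
            value = units[unit].result
        # Perform the transport to a register or to a unit input port.
        if dst in registers:
            registers[dst] = value
        else:
            unit, port = dst.split(".")
            getattr(units[unit], "write_" + port)(value)
    return registers


registers = {"r0": 0}
units = {"add": AddUnit()}
program = [
    (5, "add.operand"),    # move the immediate 5 to the operand port
    (7, "add.trigger"),    # move 7 to the trigger port: the addition starts
    ("add.result", "r0"),  # move the result to register r0
]
run(program, registers, units)
print(registers["r0"])
```

The program contains no add instruction at all; the only instruction type is the move, which is the property that distinguishes a TTA from an operation-triggered processor.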
The TTA-based Codesign Environment (TCE) provides an excellent toolset for
designing and simulating TTA processors. The tool chain consists of a processor
design tool, a C/C++ compiler, an extensive library of functional units and
a simulation tool for simulating the processor behavior [10].
1.1. Related Works
Previously, a lot of work has been done on implementing or modeling the IEEE 802.15.4
physical layer using an SDR system. Sabater et al. [11] propose the design of an
SDR transceiver for IEEE 802.15.4 (868 MHz band). The transceiver uses a bandpass
delta-sigma modulator and an under-sampling receiver implemented on a Xilinx
Virtex-5 FPGA. One popular method for modeling an SDR is to use GNU
Radio [12] with the commercially available family of SDR hardware known as the
Universal Software Radio Peripheral (USRP) [13, 14, 15, 16]. An open source
implementation of the IEEE 802.15.4 physical layer (2.4 GHz band) using GNU Radio is
discussed in [17]. The open source physical layer code has been ported to the embedded
device USRP E100 [14] as thesis work described in [18]. Since the USRP E100
device cannot execute the receiver algorithm at the required data rate, the design
presented in [18] proposes an FPGA based hardware receiver for receiving the
incoming packets. A similar model of the IEEE 802.15.4 physical layer has been
built using GNU Radio and a USRP device in [19], where the processing is done on
the host computer and the USRP device is used as the RF front end. Another thesis work on the
implementation of IEEE 802.15.4 using GNU Radio, USRP hardware and
CC2420/CC2431 [20] sensor nodes is described in [21].
GNU Radio based SDR realization on a USRP device is good for modeling
purposes, but is not suitable for the actual implementation of sensor nodes. GNU
Radio requires the Linux operating system to execute, and hence needs a host computer.
Real-time performance cannot be guaranteed on such a system. Embedded
devices such as the USRP E100 running embedded Linux can do away with the need for
a host computer. However, the USRP E100 is not capable of real-time receiver processing in
software [18]. It is provided with an FPGA which can be used for implementing the
complex signal processing tasks in hardware. During the initial feasibility study of
this thesis work, the GNU Radio framework was considered as one of the options for
implementing the IEEE 802.15.4 physical layer on a TTA based
application specific processor. However, the GNU Radio framework is
heavily dependent on the Linux operating system, so using it would require porting
Linux (or Embedded Linux) to the TTA processor first.
Some earlier work has implemented IEEE 802.15.4 without using
GNU Radio or another SDR development framework. For example, the thesis work
described in [22] evaluates different signal processing algorithms for the SDR
implementation of the IEEE 802.15.4 physical layer. The algorithms selected are for
a coherent receiver design, which is more complex to implement. The work was
done in collaboration with another thesis work [23], which designed an efficient Single-
Instruction-Multiple-Data (SIMD) signal processor. Coherent receivers give better
performance in terms of bit error but require an expensive carrier recovery algorithm.
It is therefore preferable to consider a non-coherent receiver for demodulating the IEEE 802.15.4
signal, as described in [24].
1.2. Scope of the Thesis
The objective of the work is to implement the IEEE 802.15.4 2.4 GHz Physical layer
on a TTA Processor and evaluate the TTA architecture performance for wireless
applications. The work includes modifying the existing signal processing algorithms
to suit software implementation, implementing the algorithm in C and designing the
TTA processor which can cater to the performance requirement as specified by IEEE
802.15.4 physical layer standard.
1.3. Structure of Thesis
The rest of this thesis is structured as follows. Chapter 2 gives a brief overview of
IEEE 802.15.4 and the physical layer specification for 2.4 GHz band. Chapter 3
gives an overview of transceiver design and explores different design issues. Chapter
4 provides details of the implemented transmitter and receiver along with the TTA
processors for them. Chapter 5 gives a performance overview of the designed
transceiver in terms of bit-error-rate, gate count and the CPU cycle requirements.
Chapter 6 summarizes the work done as part of the thesis.
2. IEEE 802.15.4 PHYSICAL LAYER
The IEEE 802.15.4 standard describes the MAC and Physical layer protocols for Wireless
Personal Area Networks [8]. WPANs are used for low data rate communication
between devices over short distances. The main objective of the specification is to
provide a simple and flexible protocol for reliable short-range data transfer
at extremely low cost and low power requirements.
The protocol architecture of IEEE 802.15.4 is presented in Figure 2 [8]. IEEE
802.15.4 describes the specifications for the MAC and PHY layers. Higher layers are
described by other protocol stacks like ZigBee [25], while the Logical Link Control
(LLC) layer can be either IEEE 802.2 [26] or any other LLC protocol
specification.
[Layer stack, top to bottom: Upper Layers; IEEE 802.2 LLC or Other LLC; IEEE 802.15.4 MAC; IEEE 802.15.4 PHY]
Figure 2: IEEE 802.15.4 protocol architecture
IEEE 802.15.4 specifies two network topologies.
A compliant LR-WPAN network may operate in either
the star topology or the Peer-to-Peer topology, as shown in Figure 3. In the
star topology, communication is established between devices and a single master
node, called the PAN coordinator. The Peer-to-Peer topology also has a PAN
coordinator which works as the master node, but the devices can communicate with
each other directly. The devices are either Full Function Devices (FFD), which can
act as PAN coordinator, or Reduced Function Devices (RFD), which cannot. The star topology is
ideally suited for applications like home automation, computer peripheral
interconnection, toys and games and personal health care. The Peer-to-Peer
topology is suited for more complex applications such as industrial control and
monitoring, wireless sensor networks, inventory tracking, etc. [8]
The MAC layer of IEEE 802.15.4 LR-WPAN is based on Carrier Sense Multiple
Access with Collision Avoidance (CSMA-CA). The MAC layer is responsible for
access to the radio channel, generating beacons if the device is the PAN coordinator,
synchronization of devices and security of the data. It communicates with the
physical layer and higher layers through a set of messages defined
by the IEEE 802.15.4 standard [8].
[Two network diagrams, Star Topology and Peer-to-Peer Topology, each with a PAN Coordinator. Legend: Full Function Device (FFD), Reduced Function Device (RFD)]
Figure 3: Star and Peer-to-Peer Network Topology
The IEEE 802.15.4 physical layer defines the frequency band, modulation and
coding for the radio signals. It also specifies radio parameters like transmit power
spectral density (PSD), receiver sensitivity, jamming resistance, etc. [8]. The following
subsections describe the IEEE 802.15.4 physical layer specification in detail.
2.1. Frequency Band
The IEEE 802.15.4 specifies four physical layer standards utilizing different
frequency bands and modulation schemes. A compliant device can operate on one or
several frequency bands as specified by the standard. The parameters of these
physical layer specifications are briefly summarized in Table 1 [8].
Table 1: IEEE 802.15.4 Frequency Bands
Band                Frequency (MHz)  Chip Rate (kchip/s)  Modulation  Pulse Shape  Bit Rate (kb/s)  Symbol Rate (ksymb/s)  Symbol
868/915             868 – 868.6      300                  BPSK        RRC          20               20                     Binary
868/915             902 – 928        600                  BPSK        RRC          40               40                     Binary
868/915 (Optional)  868 – 868.6      400                  ASK         RRC          250              12.5                   20-bit PSSS
868/915 (Optional)  902 – 928        1600                 ASK         RRC          250              50                     5-bit PSSS
868/915 (Optional)  868 – 868.6      400                  O-QPSK      Half-Sine    100              25                     16-ary Orth.
868/915 (Optional)  902 – 928        1000                 O-QPSK      Half-Sine    250              62.5                   16-ary Orth.
2450                2400 – 2483.5    2000                 O-QPSK      Half-Sine    250              62.5                   16-ary Orth.
The 2450 MHz frequency band is the most widely used band and is available
globally for industrial, scientific and medical purposes. The scope of this thesis is
limited to the design and implementation of the physical layer specification for the
2450 MHz band. Any further reference to IEEE 802.15.4
implies the PHY specification for the 2450 MHz band.
2.2. Frame Format
The frame format specified by IEEE 802.15.4 is presented in Figure 4. The frame is
known as the PHY Protocol Data Unit (PPDU). The data is transmitted Least
Significant Byte (LSB) first. Within each byte, the least significant bit is
transmitted first [8].
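[Frame fields: Synchronization Header (Preamble, SFD); PHY Header (Frame Length, 7 bits; Reserved, 1 bit; 1 octet); PHY Payload (PSDU, variable number of octets)]
Figure 4: PPDU Format for 2.4 GHz PHY.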
The PPDU consists of a synchronization header (SHR) containing the Preamble and
Start of Frame Delimiter (SFD) fields. The SHR field is used by a transceiver for frame
and timing synchronization. The information contained in the SHR is fixed and known
a priori to both the transmitter and the receiver. For the 2.4 GHz PHY, the preamble is
four bytes long. All the bits in the preamble field are set to binary 0. The SFD field is
a fixed 8-bit sequence that marks the end of the preamble and the start of the frame [8].
The PHY header contains the length of the payload. The frame length field is 7 bits
long. The actual payload is of variable length with a maximum length of 127
bytes [8].
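The frame layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the frame formatter implemented in this thesis: the function names are hypothetical, the SFD value 0xA7 is taken from the IEEE 802.15.4 standard rather than from the text above, and the reserved bit is assumed to be the most significant bit of the PHY header.

```python
PREAMBLE = bytes(4)   # four bytes with all bits set to 0 (from the text)
SFD = 0xA7            # assumption: value from the IEEE 802.15.4 standard, not stated in the text
MAX_PSDU_LEN = 127    # maximum payload length in bytes


def build_ppdu(psdu: bytes) -> bytes:
    """Prefix preamble, SFD and frame length to the payload (hypothetical helper)."""
    if len(psdu) > MAX_PSDU_LEN:
        raise ValueError("PSDU longer than 127 bytes")
    phr = len(psdu) & 0x7F  # 7-bit frame length; reserved bit (assumed MSB) left at 0
    return PREAMBLE + bytes([SFD, phr]) + bytes(psdu)


def to_bits_lsb_first(data: bytes) -> list:
    """Serialize bytes in transmission order: least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


ppdu = build_ppdu(b"\x01\x02")
print(ppdu.hex())
print(to_bits_lsb_first(bytes([SFD])))
```

Because the maximum payload length is 127 bytes, the length always fits in the 7-bit field, so the mask in build_ppdu never discards information for a valid payload.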
2.3. Coding
The IEEE 802.15.4 standard employs Direct Sequence Spread Spectrum (DSSS) for
spreading the data bits using sixteen 32-chip quasi-orthogonal pseudo-noise (PN)
sequences. The PN sequences, or chip sequences, are listed in Table 2. Four consecutive bits of
the PPDU are mapped to a data symbol. Every data symbol is then encoded as one of the
32-chip PN sequences [8].
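As a worked check, the rates listed for the 2450 MHz band in Table 1 follow from this mapping of four bits to one symbol and one symbol to 32 chips. The symbols R_b, R_sym and R_chip are introduced here only for this illustration:

$$ R_{sym} = \frac{R_b}{4} = \frac{250\ \text{kb/s}}{4} = 62.5\ \text{ksymb/s}, \qquad R_{chip} = 32 \, R_{sym} = 2000\ \text{kchip/s}, \qquad T_c = \frac{1}{R_{chip}} = 0.5\ \mu\text{s} $$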
Table 2: IEEE 802.15.4 Spreading Codes
Symbol  Chips
0000    01110100010010101100001110011011
0001    01000100101011000011100110110111
0010    01001010110000111001101101110100
0011    10101100001110011011011101000100
0100    11000011100110110111010001001010
0101    00111001101101110100010010101100
0110    10011011011101000100101011000011
0111    10110111010001001010110000111001
1000    11011110111000000110100100110001
1001    11101110000001101001001100011101
1010    11100000011010010011000111011110
1011    00000110100100110001110111101110
1100    01101001001100011101111011100000
1101    10010011000111011110111000000110
1110    00110001110111101110000001101001
1111    00011101111011100000011010010011
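The rows of Table 2 have a regular structure that a software chip mapper can exploit: as printed, rows 0001 to 0111 are successive 4-chip left rotations of row 0000, and rows 1000 to 1111 are rows 0000 to 0111 with every other chip inverted. The sketch below rebuilds the lookup table from row 0000 and spreads bytes into chips. It is an illustration only; the function names are hypothetical, the symbol label is read as an ordinary binary number, and the low nibble of each byte is assumed to form the first symbol, following the least-significant-bit-first rule of Section 2.2.

```python
BASE = "01110100010010101100001110011011"  # row 0000 of Table 2, as printed


def rotate_left(chips: str, n: int) -> str:
    return chips[n:] + chips[:n]


def flip_alternate(chips: str) -> str:
    # Invert the chips at even string positions (0, 2, 4, ...), keep the others.
    return "".join("10"[int(c)] if i % 2 == 0 else c for i, c in enumerate(chips))


# Rows 0000..0111: rotations of the base row; rows 1000..1111: alternate chips inverted.
CHIP_TABLE = [rotate_left(BASE, 4 * s) for s in range(8)]
CHIP_TABLE += [flip_alternate(c) for c in CHIP_TABLE[:8]]


def bytes_to_symbols(data: bytes) -> list:
    """Split each byte into two 4-bit symbols, low nibble first (assumed order)."""
    symbols = []
    for byte in data:
        symbols.append(byte & 0x0F)
        symbols.append(byte >> 4)
    return symbols


def spread(data: bytes) -> str:
    """Map every 4-bit symbol to its 32-chip sequence via table lookup."""
    return "".join(CHIP_TABLE[s] for s in bytes_to_symbols(data))


print(CHIP_TABLE[1])
print(len(spread(b"\x00")))
```

Generating the table at start-up instead of storing all sixteen rows is a design choice that trades a little computation for memory; a plain 16-entry lookup table, as used in Chapter 3, is equally valid.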
2.4. Modulation
The IEEE 802.15.4 standard uses Offset-QPSK (O-QPSK) modulation for
transmitting the chips. Even-indexed chips are modulated onto the in-phase (I) carrier and
odd-indexed chips are modulated onto the quadrature-phase (Q) carrier. The Q-phase
chips are delayed by one chip time Tc, which is half of the O-QPSK symbol time 2Tc, with respect
to the I-phase chips to form the offset (Figure 5) [8].
[Timing diagram: the I-Phase carries chips C0 C2 C4 C6 ... C28 C30 and the Q-Phase carries chips C1 C3 C5 C7 ... C29 C31; each chip lasts 2Tc and the Q-Phase is offset by Tc]
Figure 5: O-QPSK modulated Chip Sequences.
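The chip-to-carrier assignment shown in Figure 5 can be sketched as follows. This is a minimal illustration with hypothetical function names: chips are given as a string of 0 and 1 characters, mapped to the levels -1 and +1, and the start time of every chip is expressed in units of the chip time Tc.

```python
def oqpsk_split(chips: str):
    """Map chips 0/1 to levels -1/+1 and split them into I (even index) and Q (odd index)."""
    levels = [1 if c == "1" else -1 for c in chips]
    return levels[0::2], levels[1::2]


def chip_start_times(n_chips: int):
    """Start time of every chip in units of Tc; each chip then lasts 2*Tc on its carrier."""
    i_starts = list(range(0, n_chips, 2))  # C0, C2, ... start at 0, 2Tc, 4Tc, ...
    q_starts = list(range(1, n_chips, 2))  # C1, C3, ... start at Tc, 3Tc, ... (offset by Tc)
    return i_starts, q_starts


i_levels, q_levels = oqpsk_split("0110")
print(i_levels, q_levels)
print(chip_start_times(4))
```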
The half-symbol offset in O-QPSK modulation prevents transitions through the origin
of the constellation diagram (Figure 6). O-QPSK modulation allows only a single bit
change (either I or Q) at each half of the symbol time. This results in a maximum phase
change of π/2, compared with QPSK, which has a maximum phase change of π. The absence
of zero-crossings makes O-QPSK modulation suitable for non-linear power
amplifiers, which have better power efficiency than linear power amplifiers.
[Two constellation diagrams on I/Q axes, each with the points 11, 10, 00 and 01: QPSK Constellation and O-QPSK Constellation]
Figure 6: QPSK and O-QPSK constellations Diagram.
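The difference in maximum phase change can be checked numerically. The sketch below, which uses hypothetical names and idealized constellation points at (±1, ±1), enumerates all transitions between points. For QPSK both I and Q may change at once, while for O-QPSK only transitions that keep either I or Q unchanged are allowed.

```python
import cmath
import itertools
import math

# The four constellation points (I + jQ) with I, Q in {-1, +1}.
points = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]


def phase_step(a: complex, b: complex) -> float:
    """Magnitude of the phase change when moving from point a to point b."""
    return abs(cmath.phase(b / a))


pairs = list(itertools.product(points, repeat=2))

# QPSK: any transition is possible, including both bits changing together.
qpsk_max = max(phase_step(a, b) for a, b in pairs)

# O-QPSK: I and Q never change at the same instant.
oqpsk_max = max(
    phase_step(a, b) for a, b in pairs if a.real == b.real or a.imag == b.imag
)

print(qpsk_max / math.pi, oqpsk_max / math.pi)
```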
2.5. Pulse Shaping
IEEE 802.15.4 uses half-sine pulse shaping for transmitting the O-QPSK modulated
signal. The pulse shape used to represent each baseband chip is given by Equation
(1) [8].
$$ p(t) = \begin{cases} \sin\left(\pi \dfrac{t}{2T_c}\right), & 0 \le t \le 2T_c \\ 0, & \text{otherwise} \end{cases} \qquad (1) $$
where Tc is the chip time. The O-QPSK modulated half-sine waveform is presented in
Figure 7. The resulting baseband signal has a constant envelope.
Figure 7: O-QPSK I and Q symbols with the half-sine pulse shape.
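The constant-envelope property can be verified with a short numerical sketch. The code below is an illustration under stated assumptions, not the pulse shaper of Chapter 4: the function names and the choice of 4 samples per chip time are hypothetical. Each I or Q level is multiplied by a sampled half-sine pulse of duration 2Tc, and the Q branch is delayed by Tc.

```python
import math


def half_sine(sps: int) -> list:
    """Samples of p(t) = sin(pi * t / (2 * Tc)) on 0 <= t < 2*Tc, with sps samples per Tc."""
    return [math.sin(math.pi * k / (2 * sps)) for k in range(2 * sps)]


def shape(i_levels, q_levels, sps=4):
    """Half-sine shape the +/-1 levels; the Q branch is delayed by one chip time (sps samples)."""
    pulse = half_sine(sps)
    length = 2 * sps * len(i_levels) + sps
    i_wave = [0.0] * length
    q_wave = [0.0] * length
    for idx, (i_lvl, q_lvl) in enumerate(zip(i_levels, q_levels)):
        start = 2 * sps * idx
        for k, p in enumerate(pulse):
            i_wave[start + k] = i_lvl * p
            q_wave[start + sps + k] = q_lvl * p
    return i_wave, q_wave


sps = 4
i_wave, q_wave = shape([1, -1, 1, 1], [-1, -1, 1, -1], sps)

# Envelope where both branches are active (after the first Tc, before the I branch ends).
envelope = [i_wave[n] ** 2 + q_wave[n] ** 2 for n in range(sps, 2 * sps * 4)]
print(min(envelope), max(envelope))
```

The envelope is constant because the delayed Q pulse is a cosine wherever the I pulse is a sine, so the squared magnitudes always add up to one regardless of the chip values.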
2.6. O-QPSK/MSK equivalence
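An O-QPSK modulated signal can be represented as

$$ s(t) = A \left\{ \left[ \sum_{n} I_{2n}\, g(t - 2nT_c) \right] \cos(2\pi f_c t) + \left[ \sum_{n} I_{2n+1}\, g(t - 2nT_c - T_c) \right] \sin(2\pi f_c t) \right\} \qquad (2) $$

where $I_n = (-1, 1)$ are the chip sequences, $A$ is the amplitude of the modulated signal, $f_c$ is
the carrier frequency and $g(t)$ is a sinusoidal pulse defined as

$$ g(t) = \begin{cases} \sin\left(\dfrac{\pi t}{2T_c}\right), & 0 \le t \le 2T_c \\ 0, & \text{otherwise} \end{cases} \qquad (3) $$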
The O-QPSK modulation with half-sine pulse shape is equivalent to Continuous
Phase Frequency Shift Keying (CPFSK) with modulation index of 0.5 (also known
as MSK Modulation). To make the O-QPSK strictly equivalent to MSK, data must
be encoded as per MSK/O-QPSK coder Equation (4) [27].
{
(4)
where, dk are the MSK symbols and ak are the O-QPSK symbols. The
corresponding MSK/O-QPSK decoder equation is given as.
{
(5)
3. TRANSCEIVER DESIGN
3.1. Transmitter Design
The IEEE 802.15.4 transmitter can be designed using either an O-QPSK
modulator with half-sine pulse shaping or an MSK modulator with an MSK/O-QPSK
encoder. The following subsections provide details of both designs.
3.1.1. O-QPSK based Transmitter Design
The block diagram of the baseband transmitter is shown in Figure 8. The transmitter
design is simple and is based on O-QPSK modulation with half-sine pulse shaping.
[Block diagram: Frame Formatter -> Chip Mapper -> O-QPSK Modulator -> (I, Q) -> Half-Sine Pulse Shape -> (I, Q)]
Figure 8: Block Diagram of the O-QPSK modulator based transmitter.
The transmitter consists of a frame formatter which formats the incoming payload
as per the IEEE 802.15.4 specification to form the PPDU (see Section 2.2). The preamble,
SFD and frame length fields are prefixed to the payload bits. The formatted PPDU is
then sent to the chip mapper. The chip mapper maps four consecutive bits of the
PPDU to a 32-chip quasi-orthogonal PN sequence as specified in Table 2. These chips are
then modulated onto the I and Q carriers in the O-QPSK modulator. The I and Q phases are then
passed through a half-sine pulse shaping filter which outputs the samples of half-sine
pulses.
The design of a transmitter of this kind is simple. A frame formatter can easily be
implemented by copying the incoming payload to a buffer containing the preamble, SFD
and frame length fields. The chip mapper can be implemented as a lookup table
containing the 16 quasi-orthogonal chip sequences. The O-QPSK modulator converts
these chips from 0/1 to -1/+1. The half-sine pulse shaping filter outputs samples of a
positive or negative half-sine pulse depending on whether the I/Q symbol is +1 or -1,
respectively.
3.1.2. MSK based transmitter Design
Since O-QPSK with half sine pulse shaping is equivalent to MSK modulation, an
MSK modulator based transmitter can also be designed. The block diagram of such a
transmitter is shown in Figure 9.
[Block diagram: Frame Formatter -> Chip Mapper -> MSK/O-QPSK Coder -> MSK Modulator -> (I, Q)]
Figure 9: Block Diagram of the MSK modulator based transmitter.
The transmitter design is similar to the O-QPSK based transmitter. The O-QPSK
chips are first encoded with MSK / O-QPSK encoding Equation (4). The encoded
chips are then sent to an MSK modulator.
3.1.3. Design Decisions
Both transmitter designs are simple and can be implemented on a TTA based
processor. The O-QPSK modulator with half-sine pulse shaping maps directly to the
IEEE standard and hence is easier to test and verify. For this reason, the O-QPSK
modulator based transmitter is selected for implementation.
3.2. Receiver Design
The following subsections provide a brief overview of some of the design choices
available for the receiver. Coherent and non-coherent demodulation techniques
are evaluated for their suitability for an efficient receiver which can
meet the data throughput and SER requirements of the IEEE 802.15.4 standard. The
evaluation is theoretical and based on a review of the existing literature.
3.2.1. Coherent Receiver Design
IEEE 802.15.4 receivers could use the coherent detection technique for demodulating
the O-QPSK modulated signal. In the coherent detection method, the detector tries to
estimate the actual transmitted symbol by computing the absolute phase of the carrier
signal. For this reason coherent receivers require strict timing and phase
synchronization. A coherent detection based SDR receiver is extensively discussed in
[18]. A simplified block diagram of such receiver is shown in Figure 10.
[Block diagram: Frequency Offset Estimator & Compensator, Matched Filter, Down Converter, Correlator Bank, Preamble Detection.]
Figure 10: Coherent Receiver Design.
Frequency Offset Estimator and Compensator
Wireless transceiver systems have imperfections in up/down converters and local
oscillators. These imperfections result in frequency and phase offsets between
the transmitted and received signals. The offset causes a constant rotation of the signal
constellation and must be compensated before the coherent detection of the symbol.
The frequency and phase offset estimation is done in two stages.
The first stage is a coarse frequency estimator which operates at a low signal to
noise ratio (SNR) and provides the initial estimate of the carrier frequency. There are
different methods for performing the coarse frequency estimation. An FFT followed
by a search for spectral maxima provides an estimate of the frequency. Such a
frequency estimator is not suited for an IEEE 802.15.4 receiver since the spikes in the FFT
periodogram of the signal are closely spaced [28]. The other approach to coarse
frequency estimation is to use data based or correlation based estimators. Some of these
estimators, such as the Kay estimator [29] or the Meyr estimator [30], could be used for coarse
frequency estimation.
In the second stage, fine frequency and phase offset estimation is done. The IEEE
802.15.4 standard uses a preamble at the beginning of each frame which can be used for
frequency and phase estimation using data-aided estimation algorithms. A
complex-valued correlator is used to perform the synchronization and estimation of
the phase. The incoming complex signal is correlated with the known preamble
sequence. The net rotation per sample is computed by dividing the correlation value
by the number of samples [18].
Once the frequency and phase offsets are estimated, the samples need to be corrected
by compensating for these offsets. This can be done by rotating the samples by the
offset in the constellation using CORDIC algorithms [31].
Matched Filter
A matched filter is used for detecting the known signal in an Additive White
Gaussian Noise channel. The idea is to correlate the received signal with known
pulse shape used during the transmission of the signal (half-sine pulse in IEEE
802.15.4 case). An adaptive matched filter whose coefficients change with the
changing channel provides better noise performance at the cost of higher complexity.
For IEEE 802.15.4, a non-adaptive matched filter whose coefficients are matched to
the half-sine pulse can fulfill the noise performance requirement. The coefficients
of the matched filter are given by
$$h(nT_s) = \begin{cases} \sin\left(\dfrac{\pi n}{T}\right), & n = 0, 1, 2, \ldots, T-1 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
Where, T is the number of filter taps and Ts is symbol time.
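As a rough illustration of the non-adaptive matched filter, the taps can be generated and slid over the received samples as sketched below. This is a minimal sketch, assuming taps of the form sin(pi*n/T), T = 8 taps and floating-point arithmetic; the names half_sine_taps and matched_filter are illustrative and do not come from the implementation.

```python
import math

def half_sine_taps(num_taps):
    """Taps h[n] = sin(pi * n / T) for n = 0..T-1 (assumed tap formula)."""
    return [math.sin(math.pi * n / num_taps) for n in range(num_taps)]

def matched_filter(samples, taps):
    """Correlate the sample stream with the taps and return one value per lag."""
    span = len(taps)
    return [
        sum(taps[n] * samples[m + n] for n in range(span))
        for m in range(len(samples) - span + 1)
    ]

taps = half_sine_taps(8)
# One noiseless positive half-sine pulse surrounded by silence.
padded = [0.0] * 4 + taps + [0.0] * 4
output = matched_filter(padded, taps)
# The correlation is largest where the pulse and the taps line up.
peak_lag = max(range(len(output)), key=lambda m: output[m])
```

The peak of the filter output marks the lag at which the received pulse is aligned with the taps, which is the property the down-sampling stage relies on.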
Quantize / Down-sample
The output of the matched filter can be further quantized (to a single bit) to reduce
the complexity of the correlator. Also, the output of the matched filter needs to be
down-sampled to match the symbol time. For the correlator to work correctly,
symbol time synchronization needs to be ensured.
The down-sampling operates in two modes. During the preamble detection and
synchronization phase, down-sampling is performed at multiple possible timing
offsets. During the actual symbol detection phase, the down-sampling is done at the
best timing offset selected during the preamble detection phase.
Preamble Detector
The IEEE 802.15.4 frame (PPDU) begins with five bytes of synchronization header
(Preamble and SFD) which are used by the receiver for frequency, phase and frame
synchronization. The synchronization header consists of known chip sequences and
hence needs a correlator matched to the preamble and SFD chip sequences. The input
sequence is down-sampled at multiple offsets and correlated with the preamble chip
sequence. The timing offset giving the best correlation is used for the down-sampling
of the payload chips. Detection of the Start of Frame Delimiter (SFD) denotes the end of
the preamble and the start of the actual payload chips (frame synchronization).
Correlator Bank
IEEE 802.15.4 uses direct sequence spread spectrum for converting four-bit data
symbols to 32-bit PN chip sequences. For the detection of the data symbol, a bank of
correlators matched to the PN sequences can be used. The block diagram of the
correlator bank is shown in Figure 11. There are 16 correlators matched to the 16 PN
chip sequences (Table 2). The symbol corresponding to the correlator with the
maximum correlation value is selected as the symbol decision.
[Block diagram: Correlator 0, Correlator 1, Correlator 2, ..., Correlator 15, feeding a Find Max block.]
Figure 11: Correlator Bank.
The design of a coherent receiver poses hard challenges for implementation on
a low cost and low power device. The coherent detection algorithms are
computationally complex and are not suited for sensor node implementation [24].
For example, the frequency and phase estimation algorithm uses complex CORDIC
algorithms for phase estimation and compensation which are difficult to realize on a
low cost device.
3.2.2. Non-coherent receiver
As discussed in Section 2.6, the O-QPSK modulation with half-sine pulse is
equivalent to CPFSK modulation with the modulation index of 0.5. Traditionally,
FSK demodulation is based on non-coherent detection methods. These non-coherent
detectors have advantage of robustness and operate without a need for carrier
recovery devices. The block diagram of the non-coherent demodulator based receiver
is shown in Figure 12.
[Block diagram: inputs I and Q, MSK Demodulator, Down Sample, Preamble Detector (providing Chip Time), Correlator.]
Figure 12: Block Diagram of Non Coherent Receiver.
MSK Demodulator
A non-coherent MSK demodulator could be used for the demodulation of O-QPSK
signal. MSK demodulators are based on non-coherent detection methods such as
limiter-discriminator integrator (LDI) and dump detector [32]. The LDI technique is
the basis for designing a zero-crossing demodulator (ZCD), reducing the receiver
complexity further as in the case of Cross-Differentiate Multiply (CDM)
demodulator [33] or the digital tangent method [34]. A simple demodulator based on
zero-crossing demodulation known as ZIFZCD (Zero IF Zero-crossing detector) is
presented in [35]. The complexity of the receiver is substantially reduced in ZIFZCD
demodulation through the use of Zero-IF demodulation and a hard-limiter device. The
ZIFZCD algorithm uses an LDI technique and requires a symbol timing recovery
device to operate. The ZIFZCD algorithm is further modified to replace the LDI
device with an integrate and saturate device in the Asynchronous Zero-Crossing
Demodulator (AZCD) [36]. The computational complexity of the AZCD algorithm is
very low and it does not require a complex symbol timing recovery device.
The block diagram of a zero-crossing based detector is shown in Figure 13. It
consists of a phase axis generator, which generates multiple phase axes (M) in order
to operate with lower modulation index values. These phase axis values are
hard-limited and sent to a bank of zero-crossing detectors. Each zero-crossing detector
takes a pair of phase axes (I & Q) and outputs a decision value D which gives the
direction of the phase axis crossing (clockwise, anticlockwise or no crossing). The
decisions from all the ZCDs are combined in a decision device which gives the
decision about the symbol.
The MSK demodulator does not give the actual O-QPSK encoded chips. To get the
transmitted chips, the demodulated sequence needs to be decoded using the
MSK/O-QPSK decoder of Equation (5). The drawback of using Equation (5) for decoding the
MSK demodulated sequence to an O-QPSK sequence is the recursive nature of the
equation. The output of the MSK/O-QPSK decoder (ak) depends on the previous output
(ak-1), so a single bit error propagates to all the following bits. A
solution to this problem is to use a correlator bank matched to the MSK/O-QPSK
encoded chip sequences (using Equation (4)) instead of converting the received
sequence from MSK to O-QPSK. The design of such a correlator is discussed in Section
3.2.1.
[Block diagram: inputs I and Q, Phase Axis Generator, Hard-Limiter producing (i0, q0), (i1, q1), ..., (iM/2-1, qM/2-1), detectors ZCD0, ZCD1, ..., ZCDM/2-1 producing D0, D1, ..., DM/2-1, Decision Device with output S.]
Figure 13: AZCD Block Diagram.
Down-sample
The output of the zero-crossing detector is a binary signal and does not require any
further quantization (as is the case with coherent demodulation). Down-sampling is
still required to match the transmitted symbol time. The design of the
down-sampling block is the same as explained in Section 3.2.1.
Preamble Detector
The design of the preamble detector is similar to that of the coherent receiver
(Section 3.2.1). The coefficients of the correlator are matched to the MSK/O-QPSK
encoded preamble and SFD chip sequences using Equation (4).
Correlator Bank
The design of the correlator bank is similar to that of the coherent receiver. The
correlators are matched to the MSK/O-QPSK encoded chip sequence instead of the
original chip sequence (Table 1). During this transformation, the auto-correlation and
cross-correlation properties of PN sequences remain nearly unchanged. Figure 14
shows the auto-correlation and cross-correlation between chip sequences for symbol
0 and symbol 8. The only difference between coded and uncoded PN sequence is that
negative spikes are generated in cross-correlation between MSK/O-QPSK encoded
PN sequences which has no effect on the design of the correlator bank.
A minor issue with the MSK/O-QPSK encoded PN sequence is that the encoding
can be performed only up to the 31st bit of the PN sequence. The encoding Equation (4)
needs knowledge of the (k+1)th bit to encode the kth bit, so the 32nd bit of the PN
sequence cannot be encoded. Hence, the coding gain achieved using the MSK/O-QPSK
encoded PN sequence is slightly lower than that of the original PN sequence.
Figure 14: Auto-correlation and cross-correlation of PN sequences.
3.2.3. Design Decisions
A coherent receiver has better noise performance compared with a non-coherent
receiver at the cost of higher complexity. The computational complexity of the
coherent receiver requires powerful processors for implementation. They are not
suited for designing low-power and low cost receivers. The Asynchronous Zero-
crossing demodulator (AZCD) based non-coherent receiver design has a lower noise
performance but can easily meet the noise performance requirement of IEEE
802.15.4 [37]. AZCD algorithm complexity is very low and can easily be
implemented on a low cost application specific processor. Hence, AZCD based non-
coherent receiver design is selected for implementation on a TTA based processor.
For the despreading of chip sequence, a correlator bank matched to MSK/O-QPSK
coded PN sequence is used. Following subsections briefly describe the theory behind
AZCD detector and the correlator bank.
4. TRANSCEIVER IMPLEMENTATION
4.1. Transmitter Implementation
An O-QPSK based transmitter is implemented on TTA processor. The block diagram
of the transmitter is shown in Figure 8. The implementation details of the blocks are
given in following subsections.
4.1.1. Frame Formatter
The frame formatter is used to format the payload as per the IEEE specification
(Section 2.2). The frame formatter reads the input data (payload) using a special
functional unit which reads serial data from a FIFO queue. The payload is stored in a
buffer which already contains the preamble (of length 4 bytes, all 0s), SFD and frame
length values. Once all the payload bytes have been received, the frame formatter
sends the formatted frame (PPDU) to the chip mapper for spreading. Considering the
maximum payload size to be 127 bytes, 5 bytes of the frame header and 1 byte of the
length field, the buffer size for storing the PPDU is 133 bytes.
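The buffer layout described above can be sketched as follows. This is a minimal sketch rather than the TTA implementation: the function name format_frame is illustrative, the FIFO input is replaced by a plain bytes argument, and the SFD value 0xA7 is taken from the IEEE 802.15.4 specification.

```python
PREAMBLE = bytes(4)     # four preamble bytes, all zeros (from the text)
SFD = 0xA7              # start-of-frame delimiter value from the IEEE 802.15.4 specification
MAX_PAYLOAD = 127       # maximum payload size in bytes (from the text)

def format_frame(payload):
    """Return preamble + SFD + frame length + payload as one PPDU buffer."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 127 bytes")
    return PREAMBLE + bytes([SFD, len(payload)]) + bytes(payload)

ppdu = format_frame(b"\x01\x02\x03")
# Worst-case buffer size: 4 preamble + 1 SFD + 1 length + 127 payload bytes.
max_ppdu_size = len(format_frame(bytes(MAX_PAYLOAD)))
```

The worst-case length reproduces the 133-byte buffer size stated above.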
4.1.2. Chip Mapper
The role of the chip mapper is to convert the IEEE 802.15.4 frame (PPDU) to PN
chips (Section 2.3). The chip mapper is implemented in the form of a lookup table
containing the 16 PN chip sequences of 32 bits each. Each byte of the PPDU is mapped
to two symbols (4 bits each). The symbols are then mapped to the corresponding PN
chip sequences using the lookup table. The spreading factor for the chip mapper is 8
(32 chips per 4-bit symbol). So, the memory required for storing the chips is 1064
bytes (133 x 8).
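The lookup-table approach can be sketched as below. The chip sequences are copied from the "Chip Sequence" column of Table 4. Mapping the low nibble of each byte to the first symbol follows the bit ordering of the IEEE 802.15.4 specification and is an assumption about this implementation; the names CHIP_STRINGS, CHIP_TABLE and map_chips are illustrative.

```python
# Chip sequences from the "Chip Sequence" column of Table 4, indexed by symbol value.
CHIP_STRINGS = (
    "01110100010010101100001110011011",
    "01000100101011000011100110110111",
    "01001010110000111001101101110100",
    "10101100001110011011011101000100",
    "11000011100110110111010001001010",
    "00111001101101110100010010101100",
    "10011011011101000100101011000011",
    "10110111010001001010110000111001",
    "11011110111000000110100100110001",
    "11101110000001101001001100011101",
    "11100000011010010011000111011110",
    "00000110100100110001110111101110",
    "01101001001100011101111011100000",
    "10010011000111011110111000000110",
    "00110001110111101110000001101001",
    "00011101111011100000011010010011",
)
CHIP_TABLE = [int(s, 2) for s in CHIP_STRINGS]

def map_chips(ppdu):
    """Map every byte to two 4-bit symbols and look up one 32-bit word per symbol."""
    words = []
    for byte in ppdu:
        low, high = byte & 0x0F, byte >> 4   # low nibble first (assumed ordering)
        words.append(CHIP_TABLE[low])
        words.append(CHIP_TABLE[high])
    return words

chip_words = map_chips(bytes([0xA7]))
# 133 PPDU bytes -> 266 symbols -> 266 words of 4 bytes each.
chip_memory_bytes = 133 * 2 * 4
```

Each PPDU byte expands to two 32-bit words, which gives the factor of 8 between the PPDU buffer and the chip buffer.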
4.1.3. O-QPSK Modulator
The PN chips are modulated to in-phase (I) and quadrature-phase (Q) carriers as
specified by the IEEE specification (Section 2.4). Even indexed chips are modulated
to I carrier and odd indexed chips are modulated to Q carrier. The I and Q carrier
values are represented as 8-bit fixed point numbers representing +1 or -1. Since each
bit of PN chips is represented by an 8-bit fixed point number, the memory required to
store the IQ symbols is 8512 bytes (1064 x 8).
O-QPSK modulation requires an offset of half a symbol time between the I and Q
symbols. To implement the offset in the modulator, the IQ symbols would need to be
oversampled, resulting in bigger buffer requirements. The offset can instead be
implemented in the pulse shaping filter without the need for an extra buffer. Hence,
the implementation of the offset is left to the pulse shaping block. The O-QPSK
modulator in this case is a simple QPSK modulator.
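The chip-to-carrier mapping can be sketched as below. This is a minimal sketch, assuming that chip 0 of a packed 32-bit word sits in the least significant bit; the function name modulate_word is illustrative.

```python
def modulate_word(chip_word):
    """Split one 32-bit chip word into 16 I and 16 Q values in {-1, +1}."""
    i_symbols, q_symbols = [], []
    for index in range(32):
        bit = (chip_word >> index) & 1   # shift and mask; chip 0 assumed in the LSB
        level = 2 * bit - 1              # 0 -> -1, 1 -> +1
        if index % 2 == 0:
            i_symbols.append(level)      # even indexed chips go to the I carrier
        else:
            q_symbols.append(level)      # odd indexed chips go to the Q carrier
    return i_symbols, q_symbols

i_symbols, q_symbols = modulate_word(0b0110)
# One byte per stored I/Q value: 1064 chip bytes * 8 chips per byte.
iq_memory_bytes = 1064 * 8
```

No half-symbol offset appears here, in line with the decision to leave the offset to the pulse shaping block.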
4.1.4. Half-Sine Pulse Shaper
The job of the half-sine pulse shaper is to output the samples of a half-sine pulse for
each I and Q carrier (Section 2.5). Eight samples of a positive half-sine pulse are
multiplied with I and Q symbol and sent out to Digital-to-Analog Converter (DAC)
using a special functional unit in TTA processor. The samples are stored as 8-bit
fixed point number and the result of multiplication of IQ symbol with pulse-shape
samples is also stored as 8-bit fixed point numbers. The samples are directly sent out
to the DAC after multiplication, avoiding the need of a buffer to store the outgoing
baseband signal. Also, the samples for the quadrature carrier are delayed by half
symbol (4 samples) to implement the offset between I and Q carrier.
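The pulse shaper can be sketched as below. The Q7 interpretation of the 8-bit fixed-point numbers (scale factor 127) and the zero padding used to express the four-sample delay are assumptions; the sketch collects the samples in lists, whereas the implementation streams them directly to the DAC.

```python
import math

SAMPLES_PER_SYMBOL = 8
Q7_ONE = 127   # largest positive signed 8-bit value, used as "+1" (assumed Q7 format)

# Eight samples of a positive half-sine pulse, quantised to Q7.
PULSE = [round(Q7_ONE * math.sin(math.pi * n / SAMPLES_PER_SYMBOL))
         for n in range(SAMPLES_PER_SYMBOL)]

def q7_multiply(a, b):
    """Fixed-point multiply: integer product followed by an arithmetic right shift."""
    return (a * b) >> 7

def shape(i_symbols, q_symbols):
    """Return (I, Q) sample streams with Q delayed by half a symbol (4 samples)."""
    i_out = [q7_multiply(s * Q7_ONE, p) for s in i_symbols for p in PULSE]
    q_out = [q7_multiply(s * Q7_ONE, p) for s in q_symbols for p in PULSE]
    delay = SAMPLES_PER_SYMBOL // 2
    # Pad so that both streams have the same length.
    return i_out + [0] * delay, [0] * delay + q_out

i_samples, q_samples = shape([+1, -1], [-1, +1])
```

The slight asymmetry between the positive and negative peaks (126 versus -127) comes from the arithmetic shift rounding towards minus infinity.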
4.1.5. TTA processor for the transmitter
The transmitter does not require many computational resources. The frame
formatter and the chip mapper require little computation and mainly
perform memory reads and writes. The O-QPSK modulator unpacks the 32-bit chips and maps
0/1 to -1/+1. The computations required for modulating the chips are simple bit-shift
and masking operations.
The bottleneck of the transmitter chain is a pulse shaping filter. The pulse shaping
filter requires eight fixed-point multiplications for each IQ symbol (eight times
oversampling). The fixed-point multiplication requires an integer multiply and an
arithmetic shift operation. The fixed-point multiplications are data independent and
can be performed in parallel. Keeping the processor simple, two ALUs and one FU
for multiplication are found to be enough to meet the data rate requirement at very
low clock frequency.
A table of resources used in the TTA processor is provided below:
Table 3: Resources on TTA processor for transmitter
Resource Name   Count   Function
ALU             2       Arithmetic and logical computations
MUL             1       Integer multiplication
LSU             1       Load store unit for memory access
CU              1       Control unit for program flow control
RF              4       Register file for storing intermediate results
STREAM_IN       1       Serial input of data
STREAM_OUT      1       Serial output of data
Buses           6       Interconnection network
The estimated gate count for the TTA processor is around 16k gates. The processor
requires a clock frequency of 35 MHz to meet the processing requirement of 250
kbps.
4.2. Receiver Implementation
The block diagram of the AZCD based non-coherent receiver is shown in Figure 15.
The following subsections will give a brief overview of the implementation of these
blocks on a TTA processor.
[Block diagram: inputs I and Q, AZCD, Down Sample, Bits-to-Word Packing, Preamble Detector (providing Chip Time), Correlator Bank.]
Figure 15: Receiver block diagram.
4.2.1. Asynchronous Zero-Crossing Detector
The implementation of the zero-crossing detector is the most computation intensive
task in the whole receiver chain. The block diagram of AZCD is shown in Figure 13.
The Asynchronous Zero-Crossing Detector consists of a phase axis generator, Hard-
Limiter, Zero-Crossing Detectors and a decision device.
Phase Axis Generator
One of the design decisions for implementing a phase axis generator is the selection
of the number of phase axes (M). Since the modulation index of the MSK
modulation is 0.5, the minimum number of phase axes required to correctly
demodulate the MSK signal using a zero-crossing detector is four [35]. The use of an
Integrate and Saturate Filter (ISF) in place of the Limiter-Discriminator Integrator (LDI)
further increases the number of phase axes required to detect a switch in the direction
of phase change. For this reason, the number of phase axes (M) is chosen to be 8.
The equation for phase axis generation is given as
{
(7)
The θk values are selected to generate the phase-axes at equal phase angle. For
eight phase axes, the θk values are 0, π/8, π/4, 3π/8. The sine and cosine values for
these angles are computed and used as constant 8-bit fixed point numbers.
For each pair of IQ samples received, four pairs of phase axes (namely i0, q0, i1, q1,
i2, q2, i3, q3) are generated. This requires twelve fixed-point multiplications and six
addition/subtraction operations (for θk = 0, no computations are required). The
operations can be executed in parallel for all the pairs of phase axes.
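The operation count above is what a plane rotation by each θk would need: four multiplications and two additions or subtractions per non-trivial pair. The sketch below illustrates this under the assumption that the phase axes are obtained by such a rotation; the sign convention and the use of floating point instead of 8-bit fixed point are assumptions, and the name phase_axes is illustrative.

```python
import math

NUM_AXES = 8   # M, as chosen in the text
# theta_k = 0, pi/8, pi/4, 3*pi/8
THETAS = [k * math.pi / NUM_AXES for k in range(NUM_AXES // 2)]

def phase_axes(i_sample, q_sample):
    """Project one I/Q sample onto four rotated axis pairs (i_k, q_k)."""
    pairs = []
    for theta in THETAS:
        c, s = math.cos(theta), math.sin(theta)
        i_k = i_sample * c + q_sample * s   # assumed sign convention
        q_k = q_sample * c - i_sample * s
        pairs.append((i_k, q_k))
    return pairs

pairs = phase_axes(1.0, 0.0)
```

For θk = 0 the pair is simply the input sample, which is why that axis pair needs no computation.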
Hard Limiter
Once the phase axes are generated, the exact position of the samples with respect to the
corresponding phase axis is no longer important. The zero-crossing detector detects an
axis crossing between the previous and the next sample and does not require the exact
value of the sample. For this reason, the generated phase axes are hard-limited to
-1/+1. The implementation of the hard limiter is similar to the signum function, which
gives the sign of a number as
( ) {
(8)
The hard limiting function can be implemented using an if-else statement, sign-bit
extraction by a logical shift, or a special FU in the processor. The conditional
if-else based implementation is not suited for signal processing applications. Sign-bit
extraction and conversion to +1/-1 requires logical shift, mask and arithmetic shift
operations and is computationally complex. For this reason, the hard limiter is
implemented using a special FU which computes the signum operation. A special
instruction (_TCE_SIGNUM) is defined which uses the FU and performs signum in
one clock cycle. A total of eight such FUs can perform the hard-limiting of the phase axes in
parallel within a single clock cycle.
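For comparison with the special FU, the branch-free sign-bit alternative mentioned above can be sketched as follows. This is a minimal sketch for signed 8-bit inputs; mapping zero to +1 is an assumption, since the text only states that the output is -1 or +1, and the name signum_sign_bit is illustrative.

```python
def signum_sign_bit(value):
    """Hard-limit a signed 8-bit value to -1/+1 using shift and mask only."""
    sign_bit = (value >> 7) & 1    # 1 for negative values in two's complement
    return 1 - (sign_bit << 1)     # sign bit 0 -> +1, sign bit 1 -> -1

limited = [signum_sign_bit(x) for x in (-128, -1, 0, 127)]
```

Even this branch-free form needs several dependent operations per value, which is the cost the single-cycle signum FU removes.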
Zero-Crossing Detector
Hard-limited phase axes pairs (ik, qk) are sent to a bank of zero crossing detectors
(ZCD). Each ZCD takes a pair of axis values (ik, qk) and gives a decision variable Dk.
Dk denotes the direction of phase axis crossing. Dk = 1 denotes axis crossing in the
anti-clockwise direction, Dk = -1 denotes axis crossing in the clockwise direction and
Dk = 0 denotes no axis crossing detected. Dk is given by
[ ( ) ( )] (9)
Each zero-crossing detector requires two multiplications, three subtractions and one
‘divide by 2’ operation. Integer division by 2 can be replaced by an arithmetic
right-shift operation. There are four zero-crossing detectors, which can perform their
computations in parallel.
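One expression that matches the stated operation count (two multiplications, three subtractions and one halving) and the stated meaning of Dk is sketched below. It is an assumption and is not claimed to be the exact form of Equation (9); the name zero_crossing is illustrative.

```python
def zero_crossing(i_prev, q_prev, i_cur, q_cur):
    """Return +1 (anticlockwise), -1 (clockwise) or 0 (no crossing detected).

    All four inputs are hard-limited values in {-1, +1}.
    """
    # Two multiplications and three subtractions (assumed form).
    cross = i_prev * (q_cur - q_prev) - q_prev * (i_cur - i_prev)
    # cross is -2, 0 or +2, so the arithmetic right shift halves it exactly.
    return cross >> 1

anticlockwise = zero_crossing(1, 1, -1, 1)    # first quadrant -> second quadrant
clockwise = zero_crossing(1, 1, 1, -1)        # first quadrant -> fourth quadrant
no_crossing = zero_crossing(-1, 1, -1, 1)     # sample stays in the same quadrant
```

With this form a jump to the diagonally opposite quadrant also yields 0, because the direction of rotation cannot be determined from the two samples.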
Decision Device
The decision device combines all the zero-crossing detections values (Dk) and gives
the symbol decision. The directions of phase rotation from all the ZCDs (Dk) are
combined together using Equation (10).
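$$C_i = \mathrm{sat}\left(\sum_{k=0}^{M/2-1} D_k\right) \tag{10}$$
Where sat(x) is defined as
$$\mathrm{sat}(x) = \begin{cases} +1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \tag{11}$$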
Ci gives the cumulative phase change direction from all the ZCD and can have
values (+1, 0, -1). The output of the decision device is given by S, defined as
$$S_i = \begin{cases} C_i, & C_i \neq 0 \\ S_{i-1}, & C_i = 0 \end{cases} \tag{12}$$
S gives the shape of the modulated signal. If no axis crossing is detected with the
current sample (Ci = 0), the previous decision (Si-1) is output as the decision.
The implementation of the decision device is straightforward. The only issue is
implementing the sat(x) function. To avoid conditional execution, a special
functional unit is implemented in the TTA processor for performing the saturation
function.
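The behaviour of the decision device described above can be sketched as below. The initial decision value and the sample D values are arbitrary assumptions, and the names saturate and decide are illustrative; in the implementation the saturation is performed by the special functional unit rather than by min and max.

```python
def saturate(x):
    """Clamp an integer to the range -1..+1."""
    return max(-1, min(1, x))

def decide(zcd_outputs, previous_decision):
    """Combine the D_k values of one sample into the decision S."""
    c = saturate(sum(zcd_outputs))
    # With no crossing (c == 0) the previous decision is held.
    return c if c != 0 else previous_decision

decisions = []
state = 1   # decision before any crossing is seen (arbitrary assumption)
for d_values in ([0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, -1, -1, 0]):
    state = decide(d_values, state)
    decisions.append(state)
```

The third sample shows the hold behaviour: no axis crossing is detected, so the previous decision is repeated.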
4.2.2. Down Sampling
For AZCD to work correctly, the sampling rate of the signal should be above the
symbol rate. For implementation, four times oversampling is used. AZCD gives one
symbol decision per I-Q sample pair. So, the output of AZCD must be down sampled
by a factor of 2 to match the symbol rate. The down sampling block copies either
even or odd indexed AZCD output values according to the provided offset to a
memory buffer.
4.2.3. Bits-to-Word Packing
The down-sampled AZCD output values are the demodulated MSK/O-QPSK
encoded chips. These values are converted from -1/1 to hard-bit format (0/1) and
packed in 32-bit word. The goal of the packing is to optimize the correlator
implementation as discussed in Section 4.2.4.
Bit packing is a highly parallelizable operation. It requires converting 32
bipolar values (-1/+1) to hard bits (0/1) and shifting each bit value to its bit position
using a logical shift operation. These 32 operations can be performed in parallel in two
clock cycles given the resources. A logical OR operation is then performed across all
32 values to pack them into a single 32-bit word. The OR operation can also be
parallelized by dividing the 32 values into two groups of 16 and
performing the OR in the 1st cycle, dividing the resulting 16 values into two groups of 8 and
performing the OR in the 2nd cycle, and so on. The total number of cycles needed to perform the OR
operation for 32-bit packing is five. So, the total number of cycles required for
packing a 32-bit value is seven, provided there are enough registers and ALUs to
perform the parallelization.
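The packing steps above can be sketched as below. Python executes the steps sequentially, so the sketch only mirrors the structure of the parallel schedule; placing the first value in bit 0 is an assumption about the bit order, and the name pack_chips is illustrative.

```python
def pack_chips(bipolar):
    """Pack 32 values in {-1, +1} into one 32-bit word with an OR reduction tree.

    Returns the packed word and the number of OR levels that were needed.
    """
    if len(bipolar) != 32:
        raise ValueError("expected exactly 32 values")
    # Convert -1/+1 to 0/1 and move each bit to its own position.
    partial = [((v + 1) >> 1) << pos for pos, v in enumerate(bipolar)]
    levels = 0
    # Halve the list with pairwise ORs until a single word is left.
    while len(partial) > 1:
        half = len(partial) // 2
        partial = [partial[n] | partial[n + half] for n in range(half)]
        levels += 1
    return partial[0], levels

word, levels = pack_chips([1] + [-1] * 30 + [1])
```

The reduction takes five levels for 32 inputs, matching the five OR cycles counted above.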
4.2.4. Correlator Bank
The correlator bank is used to de-spread the PN chip sequences to the data symbol.
Traditionally, a correlator is implemented as an FIR filter using the
multiply-and-accumulate (MAC) operation. Since the outputs of AZCD are hard-bit values, hard-bit
correlation (namely xor-and-count 1s) can be used very efficiently. For this reason,
the outputs of AZCD are packed into 32-bit words (representing the 32-bit PN
sequence). To detect the transmitted symbol, the 32-bit packed chip word is correlated
with all 16 known PN chip sequences and the symbol corresponding to the highest
correlating chip sequence is selected. Since the AZCD is an MSK demodulator, the
PN chips used for correlation are encoded with the MSK/O-QPSK encoder using
Equation (4), as described in Section 3.2.1. The table of MSK/O-QPSK encoded chip
sequences is given in Table 4. Figure 11 gives the block diagram of the correlator
bank.
Table 4: MSK/O-QPSK encoded chip sequence
Symbol Chip Sequence MSK/O-QPSK Encoded Chip Sequence
0000 01110100010010101100001110011011 10011011001110101111011100000011
0001 01000100101011000011100110110111 10110011101011110111000000111001
0010 01001010110000111001101101110100 10111010111101110000001110011011
0011 10101100001110011011011101000100 00101111011100000011100110110011
0100 11000011100110110111010001001010 01110111000000111001101100111010
0101 00111001101101110100010010101100 11110000001110011011001110101111
0110 10011011011101000100101011000011 00000011100110110011101011110111
0111 10110111010001001010110000111001 00111001101100111010111101110000
1000 11011110111000000110100100110001 01100100110001010000100011111100
1001 11101110000001101001001100011101 01001100010100001000111111000110
1010 11100000011010010011000111011110 01000101000010001111110001100100
1011 00000110100100110001110111101110 11010000100011111100011001001100
1100 01101001001100011101111011100000 10001000111111000110010011000101
1101 10010011000111011110111000000110 00001111110001100100110001010000
1110 00110001110111101110000001101001 11111100011001001100010100001000
1111 00011101111011100000011010010011 11000110010011000101000010001111
16 correlators are matched to 16 MSK/O-QPSK PN sequences. The correlation is
performed as bitwise XOR operation between the detected chip and the known chip.
The result of XOR is passed to a special FU which counts the number of 1s in the
value. The correlator giving the minimum number of 1s in the XOR result gives the
maximum correlation and corresponding symbol is selected as the detected data
symbol.
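The xor-and-count-1s correlation can be sketched as below. The reference words are copied from the "MSK/O-QPSK Encoded Chip Sequence" column of Table 4; popcount32 stands in for the COUNT_ONES functional unit, and the names ENCODED_STRINGS, ENCODED_CHIPS and despread are illustrative.

```python
# Encoded sequences from Table 4, indexed by symbol value.
ENCODED_STRINGS = (
    "10011011001110101111011100000011",
    "10110011101011110111000000111001",
    "10111010111101110000001110011011",
    "00101111011100000011100110110011",
    "01110111000000111001101100111010",
    "11110000001110011011001110101111",
    "00000011100110110011101011110111",
    "00111001101100111010111101110000",
    "01100100110001010000100011111100",
    "01001100010100001000111111000110",
    "01000101000010001111110001100100",
    "11010000100011111100011001001100",
    "10001000111111000110010011000101",
    "00001111110001100100110001010000",
    "11111100011001001100010100001000",
    "11000110010011000101000010001111",
)
ENCODED_CHIPS = [int(s, 2) for s in ENCODED_STRINGS]

def popcount32(word):
    """Count the 1 bits in a 32-bit word."""
    return bin(word & 0xFFFFFFFF).count("1")

def despread(received_word):
    """Return (symbol, mismatches) for the reference with the fewest differing chips."""
    distances = [popcount32(received_word ^ ref) for ref in ENCODED_CHIPS]
    best = min(range(16), key=lambda s: distances[s])
    return best, distances[best]

# Symbol 5 with three chip errors (illustrative error pattern).
noisy = ENCODED_CHIPS[5] ^ ((1 << 31) | (1 << 20) | (1 << 6))
symbol, mismatches = despread(noisy)
```

Selecting the minimum number of 1s in the XOR result is equivalent to selecting the maximum correlation, so no multiplications are needed.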
4.2.5. Preamble Detection
The IEEE 802.15.4 frame is prefixed with the synchronization header consisting of
eight preamble symbols and two SFD symbols. The synchronization header is used
for frame and symbol timing synchronization. Preamble detection is implemented as
a correlator matched to chip sequence corresponding to preamble symbol (symbol 0)
and counting peaks. The preamble detection is considered successful if eight peaks
(above a defined threshold) are detected after the correlation of an incoming chip
sequence with preamble chips. The design of the correlator is the same as explained in
Section 4.2.4 (based on XOR and count 1s). The count of 1s in the XOR result can
easily be converted to the traditional multiply-and-accumulate based correlation as
$$v = 32 - 2u \tag{13}$$
Here, v is the traditional multiply-and-accumulate based correlation value and u is
the count of 1s in the XOR result.
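The link between the two correlation measures can be checked numerically. With the bits mapped to -1/+1, every matching chip contributes +1 and every mismatch contributes -1, so for 32 chips the multiply-and-accumulate value equals 32 minus twice the XOR count. The sketch below is illustrative; the two words are the encoded sequences for symbols 0 and 1 from Table 4, and the function names are hypothetical.

```python
def xor_count(a, b):
    """Number of differing bits between two 32-bit words (u)."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")

def mac_correlation(a, b):
    """Multiply-and-accumulate correlation after mapping each bit 0/1 to -1/+1 (v)."""
    total = 0
    for pos in range(32):
        x = 2 * ((a >> pos) & 1) - 1
        y = 2 * ((b >> pos) & 1) - 1
        total += x * y
    return total

a, b = 0x9B3AF703, 0xB3AF7039   # encoded sequences for symbols 0 and 1 (Table 4)
u = xor_count(a, b)
v = mac_correlation(a, b)
```

A peak threshold stated in terms of v can therefore be applied directly to u without any multiplications.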
4.2.6. Receiver Algorithm
The brief flowchart of the receiver algorithm is shown in Figure 16. The receiver
works in two modes. The first mode is preamble detection. In this mode, I-Q samples
are buffered till the length of preamble samples. The buffer is then demodulated
using AZCD and the output of AZCD is down sampled at two different offsets. Both
of these downsampled values are then passed to the preamble detector block. If
preamble detection is successful, the best downsampling offset (giving maximum
preamble correlation) is selected as the timing offset for further data downsampling.
If preamble detection is not successful, further samples are buffered and the
preamble detection process is repeated again.
Once the preamble detection is successful, the receiver switches to the payload
detection mode. Samples corresponding to the SFD field are demodulated using the
AZCD, downsampled at the correct timing offset (selected during the preamble
detection phase) and de-spread using the correlator bank. If the detected SFD is the
same as the expected value, frame synchronization is achieved. Next, samples
corresponding to the frame length field are buffered and demodulated. Based on the
frame length value, the length of payload samples is computed. The payload samples
are then buffered, demodulated using AZCD, downsampled and de-spread using the
correlator bank giving the required payload.
4.2.7. TTA Processor for Receiver
The bottleneck in the receiver processing chain is the AZCD block, and the processor is
designed to optimize the execution of the AZCD algorithm. Since the number of phase
axes is chosen to be eight, eight ALUs are used in the TTA processor to allow
parallel execution of the ZCD algorithm. The resources used for implementing the TTA
processor are given in Table 5. The estimated gate count for the processor is 95 k
gates. The processor needs a clock frequency of 200 MHz to achieve a data throughput
of 250 kbps.
[Flow chart: START; buffer samples of length = preamble chip length * oversampling ratio; AZCD decode (M phase axes); down-sample by the oversampling ratio at multiple offsets; correlate with preamble chips; preamble detected? If no, buffer further samples and repeat. If yes, get the best down-sampling offset (maximum preamble correlation); buffer data samples; AZCD decode (M phase axes); down-sample by the oversampling ratio at the correct offset; correlate with the chip sequences to detect the data symbol; END.]
Figure 16: Flow chart of receiver algorithm
Table 5: Resources used in TTA processor for receiver
Resource Name   Count   Function
ALU             8       Arithmetic and logical computations
MUL             5       Integer multiplication
LSU             2       Load store unit for memory access
CU              1       Control unit for program flow control
RF              8       Register file for storing intermediate results
STREAM_IN       1       Serial input of data
STREAM_OUT      1       Serial output of data
SIGNUM          8       For computing the signum(x) function
COUNT_ONES      1       For counting the number of 1s in a 32-bit register
SATURATE        1       For saturating the values to +/-1
Buses           16      Interconnection network
5. PERFORMANCE EVALUATIONS
The following subsections describe the performance evaluation of the IEEE 802.15.4
transceiver realization on a TTA processor. Performance is measured in terms of
Symbol Error Rate (SER) for the receiver design and performance of the designed
TTA processors in terms of gate count and clock frequency requirement.
5.1. Noise Performance for Receiver
The plot of Eb/N0(dB) versus Symbol Error Rate (SER) is shown in Figure 17. The
IEEE 802.15.4 requires receiver sensitivity to be -85dBm. The receiver sensitivity is
defined as threshold input power that yields a Packet Error Rate of 1 per cent
considering the PSDU length to be 20 octets without the presence of any interference
[8].
Figure 17: Eb/N0 vs SER Curve.
The value of desired SER can be computed from PER as
$$\mathrm{SER} = 1 - (1 - \mathrm{PER})^{1/N} \tag{14}$$
Where, N is the number of symbols per packet. Considering 20 octets, the number
of data symbols N becomes 40 (4-bit symbols). Using N=20, PER = 1%, the SER
threshold is computed to be .
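The conversion from a packet error target to a symbol error target can be sketched as below. The sketch assumes independent symbol errors, so that PER = 1 - (1 - SER)^N, and uses N = 40 symbols for a 20-octet PSDU; the function name ser_threshold is illustrative and the result is a check under these assumptions only.

```python
def ser_threshold(per, num_symbols):
    """Largest SER for which a packet of num_symbols symbols meets the target PER.

    Assumes independent symbol errors: PER = 1 - (1 - SER) ** num_symbols.
    """
    return 1.0 - (1.0 - per) ** (1.0 / num_symbols)

# 20 octets correspond to 40 four-bit symbols.
ser_40 = ser_threshold(0.01, 40)
# Substituting back recovers the 1 per cent packet error rate.
per_check = 1.0 - (1.0 - ser_40) ** 40
```

Under these assumptions the threshold is on the order of 10^-4 per symbol.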
From Figure 17, the Eb/N0 value at the input of the receiver should be at least 19
dB to get a SER of less than . The Noise Figure (NF) requirement for RF
receiver can be computed using these values and receiver sensitivity requirement of
-85 dBm as
( ) ( ) (15)
Considering chip rate B = 2 Mchips/sec, the NF requirement for the receiver is
computed to be 7 dB. Such low noise RF receivers can be easily fabricated using
CMOS process. One of such RF receivers for 2.4 GHz band is demonstrated in [39].
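The 7 dB figure can be reproduced with a simple link-budget check. The sketch below assumes the usual thermal noise density of -174 dBm/Hz at room temperature and treats the required Eb/N0 as the signal-to-noise ratio in the bandwidth B; both are assumptions about how the requirement is derived, and the function name noise_figure_db is illustrative.

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0   # kT at about room temperature (assumed)

def noise_figure_db(sensitivity_dbm, ebn0_db, bandwidth_hz):
    """Noise figure budget left after the noise floor and the required Eb/N0."""
    noise_floor_dbm = THERMAL_NOISE_DBM_PER_HZ + 10.0 * math.log10(bandwidth_hz)
    return sensitivity_dbm - noise_floor_dbm - ebn0_db

# Sensitivity -85 dBm, required Eb/N0 19 dB, B = 2 Mchips/sec (values from the text).
nf = noise_figure_db(-85.0, 19.0, 2e6)
```

With the values quoted in the text the budget comes out at roughly 7 dB.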
Comparing the noise performance with the coherent receiver design [18], we see
that the noise performance of the coherent receiver is much better than that of the
non-coherent receiver. The Eb/N0 requirement at the input of the coherent receiver is around 8.7 dB
[18], which is much better than the 19 dB requirement of the non-coherent receiver. The
better performance in the coherent receiver is achieved at the cost of higher
complexity and implementation cost. Nonetheless, the non-coherent receiver can
easily meet the noise performance requirement of IEEE 802.15.4 with much less
complexity than the coherent receiver.
5.2. Performance of the TTA processor for transmitter
The TTA processor for transmitter is simple enough to be implemented on low-cost
devices. The gate count estimate for the processor is 16 k gates. The CPU cycle
requirement for processing a PPDU for transmission is given as
( ) (16)
The clock frequency required for the processor to achieve a data throughput of 250
kbps is defined as per Equation (16). The minimum clock frequency required for the
processor is 35 MHz.
5.3. Performance of the TTA processor for receiver
The real challenge in the IEEE 802.15.4 transceiver design is designing the low
power and low cost processor for a receiver. The choice of a good receiver algorithm
such as AZCD which requires very basic processing makes the task somewhat easier.
The challenge with implementing the AZCD algorithm is the conditional processing
requirements like saturation, signum function etc., which are handled using special
functional units. The designed processor takes 5930 CPU cycles for processing each
byte of the payload. There is an overhead of 32 k CPU cycles for the preamble
detection and synchronization process. To achieve the data throughput of 250 kbps,
the clock frequency required by the processor is 200 MHz. The estimated gate count
of the processor is 95 k which is small enough to be implemented on low cost
devices.
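The relation between the cycle count and the clock frequency can be checked with a short calculation. The sketch below only accounts for the per-byte payload processing; the 32 k cycle preamble overhead is left out because its amortisation over a frame is not specified here.

```python
CYCLES_PER_PAYLOAD_BYTE = 5930   # from the text
DATA_RATE_BPS = 250_000          # from the text

bytes_per_second = DATA_RATE_BPS / 8
# Clock cycles per second needed to keep up with the payload alone.
required_clock_hz = CYCLES_PER_PAYLOAD_BYTE * bytes_per_second
required_clock_mhz = required_clock_hz / 1e6
```

The payload processing alone needs about 185 MHz, which leaves some margin below the 200 MHz clock for the synchronization overhead.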
The clock frequency and gate count can be further reduced by replacing the
generic ALUs with FUs dedicated to the required arithmetic or logical operations. Such
a design would give much better performance at the cost of the generality of the
processor.
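The conditional operations mentioned for the AZCD receiver (saturation and the signum function) are typical of what a special functional unit can complete in a single operation, whereas a generic ALU needs a compare-and-branch sequence. The sketch below only illustrates the semantics of the two operations; the 16-bit word width is an assumption, and the function names do not come from the thesis or from the processor design.

```python
def signum(x: int) -> int:
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def saturate(x: int, bits: int = 16) -> int:
    """Clamp x to the signed two's-complement range of the given width.

    The default width of 16 bits is an assumed value for illustration.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))

# Example: an accumulated value that overflows 16 bits is clamped, not wrapped.
print(signum(-5), signum(0), signum(7))
print(saturate(40_000), saturate(-40_000), saturate(123))
```

In software on a generic ALU each call involves comparisons and conditional moves or branches; folding them into one functional-unit operation is what removes that per-sample overhead.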
For comparison, the SIMD processor designed in [23] also operates at 200 MHz,
with an SIMD factor of 16 and some special hardware blocks for coherent
demodulation (such as CORDIC and MAC). The resource usage of the designed TTA
processor is comparable to that of the SIMD processor described in [23].
5.4. Discussion
Designing an SDR system for any wireless communication technology is an issue of
software-hardware trade-off. An ideal SDR system should be totally software driven
with an antenna connected at the serial port of the processor. Such a system is hard to
realize for any practical purposes. The hardware-software trade-off needs to be
optimized to get the best of flexibility, adaptability and time-to-market under the
constraints of required performance metrics in terms of throughput, latency, energy
consumption and cost. While designing an SDR system using an ASP, the same
trade-off needs to be taken care of. The signal processing needs to be divided into
a hardware-implemented part and a software-implemented part. The hardware part is
implemented as a functional unit of the ASP. The trade-off in designing the special
functional unit is to choose instructions which are generic in nature so that the
designed processor can be used without much change for other similar
technologies (programmability).
A qualitative comparison of different methodologies for the implementation of
IEEE 802.15.4 is presented in Table 6. The comparison is done between system-on-
chip (SOC) implementations (for example TI CC2531 [40], Marvell 88MZ100 [41]
etc.), GPP based implementation (using GNU Radio, Matlab, etc.) and the proposed
TTA implementation. The SOC implementations are optimal in terms of cost and
energy consumption but lack the programmability of an SDR system. The time to
market for an SOC implementation is long and there is no maintainability once the
product is delivered. GNU Radio or Matlab-Simulink based implementations with the
help of SDR kits (like USRP devices) are highly programmable. However, they
cannot guarantee real-time throughput, and the solution cannot be integrated on
low-cost devices. An ASP based SDR implementation balances the trade-off,
providing programmability and integration on low-cost devices at the expense of higher
cost and energy consumption than an SOC implementation.
Table 6: Comparison of different implementation of IEEE 802.15.4

Parameters                      System-on-chip   SDR on GPP   SDR on ASP (TTA)
Programmability                 No               Yes          Yes
Integration with Sensor Nodes   Yes              No           Yes
Real-time Processing            Yes              No           Yes
Low-Cost                        Yes              NA           Yes
6. SUMMARY
The goal of this work was to analyze the feasibility of realizing an SDR for low cost
and low power wireless communication transceiver on a TTA based ASP. The
wireless communication standard selected for realization was IEEE 802.15.4. At
first, the feasibility of using an existing SDR development framework (GNU Radio)
on a TTA based processor was studied. It was concluded that the GNU Radio
framework cannot be used as a standalone system and needs Linux running on a host PC
or a device running Embedded Linux for its execution.
Because using the existing SDR development frameworks on a TTA
processor proved infeasible, the signal processing algorithms had to be implemented from scratch.
For this, an algorithm-level study was carried out to select signal processing
algorithms suitable for the requirements specified by IEEE 802.15.4. Both coherent and
non-coherent receiver algorithms were reviewed as candidates for the
implementation on TTA. Finally, the AZCD based receiver algorithm was selected
for implementation.
The designed transceiver was first modeled using Matlab to check whether the selected
algorithms meet the noise performance required by the IEEE standard. Different
parameters for implementation (such as the sampling rate, the preamble detection
threshold, the number of phase axes for AZCD etc.) were selected at this stage using
the Matlab model. The Matlab model was also used for the testing and verification of
the actual C implementation.
Once the Matlab model of the transceiver complied with the IEEE requirements,
the transceiver was implemented in C, and the TTA processors
for the transmitter and the receiver were designed. Special functional units were implemented in the
TTA processor for the receiver to speed up the algorithm. The data throughput
requirement of 250 kbps was achieved with the transmitter clocked at 35 MHz and
the receiver clocked at 200 MHz. The gate counts of the processors are small enough to
be implemented on low-cost and low-power devices.
The implemented transceiver falls short in noise performance compared with
commercially available SOC implementations such as TI CC2531 [40] and Marvell 88MZ100
[41], and with the coherent receiver algorithm implementations [22, 23].
Nonetheless, the noise performance meets the requirement of the IEEE 802.15.4
specification. The commercially available SOCs also provide better performance in
terms of energy consumption and production cost. However, compared with the
SDR implementation, the SOCs lack programmability and have a longer time-to-market
with no post-production maintainability. The coherent receiver based SDR
implementation [22] has higher complexity and needs a fully functional DSP [23] for
its operation. The low processing complexity of the realization proposed in this thesis
offers a favorable balance of cost and power consumption while providing the
benefits of an SDR system and meeting the performance requirements of the IEEE
802.15.4 standard.
While the current AZCD based implementation meets the noise performance requirement of the
IEEE 802.15.4 standard, it can be improved by using the more complex synchronous
ZIFZCD algorithm at the cost of designing a more complex TTA processor. The
current design implements most of the functionalities in software keeping the
processor generic. The design can be further optimized in terms of gate count, CPU
clock frequency and power consumption by using a software-hardware hybrid
design. A generic ZIFZCD or AZCD block can be implemented in hardware as a
special functional unit of the TTA processor. Since the ZIFZCD and AZCD
algorithms can be used for demodulating any CPFSK modulated signal, the TTA
processor can be used in any system employing CPFSK modulation.
The work presented in this thesis establishes the feasibility of implementing low
rate baseband wireless transceiver systems on a TTA based application specific
processor. Using the example of the IEEE 802.15.4 LR-WPAN PHY standard, it has
been shown that a low data-rate SDR transceiver can be implemented on a TTA
processor. The TTA based ASP design can be one of the ideal platforms for moving
from current SOC-based implementations to programmable SDR
implementations. The results of this thesis work have been published as a conference
paper in the ‘Wireless Innovation Forum Conference on Communications
Technologies and Software Defined Radio’ (SDR-WInnComm 2013), Washington
DC, USA [42].
7. REFERENCES
[1] IEEE Standard 802.16 (2004), IEEE standard for local and metropolitan
area networks part 16: Air interface for fixed broadband wireless access
systems.
[2] IEEE Standard 802.16-2005 (2005), IEEE standard for local and
metropolitan area networks part 16: Air interface for fixed and mobile
broadband wireless access systems.
[3] 3rd Generation Partnership Project (3GPP), http://www.3gpp.org
(17.12.2012)
[4] IEEE Standard 802.11-2012 (Revision 2012), IEEE Standard for
Information technology - Telecommunications and information exchange
between systems Local and metropolitan area networks - Specific
requirements Part 11: Wireless LAN medium access control (MAC) and
physical layer (PHY) specifications.
[5] IEEE Standard 802.15.1-2005 (2005), IEEE Standard for Information
technology-- Local and metropolitan area networks-Specific requirements-
Part 15.1a: Wireless Medium Access Control (MAC) and Physical Layer
(PHY) specifications for Wireless Personal Area Networks (WPAN).
[6] IEEE Standard 802.15.2-2003 (2003), IEEE Recommended Practice for
Information technology-- Local and metropolitan area networks - Specific
requirements- Part 15.2: Coexistence of Wireless Personal Area Networks
with Other Wireless Devices Operating in Unlicensed Frequency Bands.
[7] IEEE Standard 802.15.3-2003 (2003), IEEE Standard for Information
technology - Telecommunications and information exchange between
systems - Local and metropolitan area networks - Specific requirements
Part 15.3: Wireless Medium Access Control (MAC) and Physical Layer
(PHY) Specifications for High Rate Wireless Personal Area Networks
(WPAN).
[8] IEEE Standard 802.15.4-2011, (2011), IEEE Standard for Local and
metropolitan area networks--Part 15.4: Low-Rate Wireless Personal Area
Networks (LR-WPANs).
[9] Corporaal, H. (1998), Microprocessor architectures: from VLIW to TTA, J.
Wiley.
[10] Jääskeläinen, P., Guzma, V., Cilio, A., Pitkänen, P., Takala, J. (2007),
Codesign toolset for application-specific instruction-set processors, Proc.
SPIE Multimedia on Mobile Devices.
[11] Sabater, J., Gómez, J.M., López, M. (2010), Towards an IEEE 802.15.4
SDR transceiver. In: 17th IEEE International Conference on Electronics,
Circuits, and Systems (ICECS), pp.323-326.
[12] GNU Radio, http://gnuradio.org (17.12.2012).
[13] Data Sheet for USRP N200/N210 Network Series,
https://www.ettus.com/content/files/07495_Ettus_N200-210_DS_Flyer_HR.pdf
(17.12.2012).
[14] Data Sheet for USRP E100/E110 Embedded Series,
https://www.ettus.com/content/files/07495_Ettus_E100-110_DS_Flyer_HR.pdf
(17.12.2012)
[15] Data Sheet for USRP B100 Bus Series,
https://www.ettus.com/content/files/07495_Ettus_B100_DS_Flyer_HR_4.pdf
(17.12.2012)
[16] Data Sheet for USRP1 Bus Series,
https://www.ettus.com/content/files/07495_Ettus_USRP1_DS_Flyer_HR.pdf
(17.12.2012)
[17] Schmid, T. (2006), GNU Radio 802.15.4 En- and Decoding,
http://nesl.ee.ucla.edu/fw/thomas/thomas_project_report.pdf (17.12.2012)
[18] Thandee R. (2012), IEEE 802.15.4 Implementation on an Embedded
Device. Master’s thesis. Virginia Polytechnic Institute and State
University, USA.
[19] Knauth S., Implementation of an IEEE 802.15.4 Transceiver with a
Software-defined Radio setup,
http://www.ihomelab.ch/fileadmin/Dateien/PDF/emw2008_paper_knauth.pdf
(17.12.2012)
[20] Chipcon products from Texas Instruments, CC2420: 2.4 GHz IEEE
802.15.4/ Zigbee-ready RF transceiver.
http://focus.ti.com/lit/ds/symlink/cc2420.pdf (17.12.2012).
[21] Mahmood S. (2011), Software defined radio for wireless sensor &
cognitive networks. Master’s thesis. Faculty of Information Technology,
University of Applied Sciences, Vasa, Finland.
[22] Koteng R. M. (2006), Evaluation of SDR-implementation of IEEE
802.15.4 Physical Layer. Master’s thesis. Norwegian University of Science
and Technology, Department of Electronics and Telecommunications,
Norway.
[23] Naess H. (2006), A programmable DSP for low-power, low-complexity
baseband processing. Master’s thesis. Norwegian University of Science
and Technology, Department of Electronics and Telecommunications,
Norway.
[24] Dehaese N., Bourdel S., Barthelemy H., Bas G. (2006), Simple
demodulator for 802.15.4 low-cost receivers. In: IEEE Radio and Wireless
Symposium, 17-19 Jan. 2006.
[25] The ZigBee Alliance, http://www.zigbee.org/ (17.12.2012)
[26] IEEE Standard 802.2-1998 (1998), IEEE Standard for Information
technology - Telecommunications and information exchange between
systems - Local and metropolitan area networks - Specific requirements -
Part 2: Logical Link Control.
[27] Proakis, J.G. (1989), Digital Communications, New York: McGraw-Hill.
[28] Volker B., Handel P. (2001), Frequency estimation from proper sets of
correlations. IEEE Transactions on Signal Processing, vol.50, no.4,
pp.791-802.
[29] Kay S., (1989), A fast and accurate single frequency estimator. IEEE
Transactions on Acoustics, Speech and Signal Processing, vol.37, no.12,
Dec 1989.
[30] Classen F., Meyr H. (1993), Two frequency estimation schemes operating
independently of timing information. In: Global Telecommunications
Conference, vol. 3.
[31] Volder J.E. (1959), The CORDIC Trigonometric Computing
Technique. IRE Transactions on Electronic Computers, vol. EC-8, no.3,
pp. 330–334.
[32] Tjhung T., Wittke P. (1970), Carrier Transmission of Binary Data in a
Restricted Band. IEEE Transactions on Communication Technology,
vol.18, no.4, pp.295-304.
[33] Park J. (1970), An FM Detector for Low S/N. IEEE Transactions
on Communication Technology, vol.18, no.2, pp.110-118.
[34] Meyr H., Subramanian R. (1995), Advanced digital receiver principles and
technologies for PCS. IEEE Communications Magazine, vol.33, no.1,
pp.68-78.
[35] Kwon H.M., Kwang B. L. (1996), A novel digital FM receiver for mobile
and personal communications. IEEE Transactions on Communications,
vol.44, no.11, pp.1466-1476.
[36] Dehaese N., Bourdel S., Barthelemy H., Bachelet Y., Bas G. (2005), FSK
zero-crossing demodulator for 802.15.4 low-cost receivers, In: 12th IEEE
International Conference on Electronics, Circuits and Systems (ICECS)
pp.1-4, 11-14 Dec. 2005.
[37] Dehaese N., Bourdel S., Barthelemy H., Bas G. (2006), Simple
demodulator for 802.15.4 low-cost receivers, IEEE Radio and Wireless
Symposium, pp. 315- 318, 17-19 Jan. 2006.
[38] Luzzatto A., Shirazi G. (2007), Wireless Transceiver Design, J. Wiley, pp.
36.
[39] Zolfaghari A., Razavi B. (2003), A Low Power 2.4 GHz
Transmitter/Receiver CMOS IC, IEEE Journal of Solid-state Circuits, vol.
38, no. 2.
[40] A USB-Enabled System-On-Chip Solution for 2.4-GHz IEEE 802.15.4 and
ZigBee Applications, http://www.ti.com/lit/ds/symlink/cc2531.pdf
(14.01.2013).
[41] Marvell 88MZ100: Zigbee SoC Solution for Home Automation and More,
http://www.marvell.com/wireless/88MZ100/assets/Marvell-88MZ100-ZigBee-Product-Brief.pdf
(14.01.2013)
[42] Ghazi A., Boutellier J., Hannuksela J., Silvén O., Janhunen J. (2013), Low-
complexity SDR implementation of IEEE 802.15.4 baseband transceiver
on application specific processor. In: The Wireless Innovation Forum
Conference on Communications Technologies and Software Defined
Radio (SDR-WInnComm), 8-10 Jan. 2013.