UNIVERSITY OF SOUTHAMPTON
Faculty of Engineering, Science and Mathematics
School of Electronics and Computer Science
A progress report submitted for continuation towards a PhD
Supervisors: Dr. Rob Maunder, Prof. Bashir M. Al-Hashimi and Prof. Lajos Hanzo
Examiner: Dr. Soon Xin Ng
Analysis of Low Power Implementational
Issues of Turbo-like Codes in Body Area
Networks
by Liang Li
November 3, 2009
ABSTRACT
Body Area Networks (BANs) are a promising application of wireless sensor networks
(WSNs) that is attracting considerable research interest. A BAN is a WSN located in
the vicinity of a human body for the continual monitoring of certain parameters of the
body, providing a healthcare service in a more comfortable, convenient and economical
way than conventional methods. The extremely low power and high reliability
requirements of BANs make communication challenging. In this report, a state-of-the-art
survey of research on communication technologies for BANs is given. Based on this
survey, a proposal to use Turbo-like codes as the channel coding scheme of BANs is
discussed. Because of the low power requirement of BAN applications, the low power
implementation issues of Turbo decoding schemes are examined. A method to determine
the optimal data-width specification for the fixed-point implementation of a Turbo
decoder from a low power point of view is presented. Finally, a framework to compare
and evaluate different Turbo-like codes from the energy consumption point of view is
proposed.
Contents
Acknowledgements xi
1 Introduction 1
1.1 Introduction of Body Area Networks (BANs) . . . . . . . . . . . . . . . . 1
1.2 Communication in BANs . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Communication requirements . . . . . . . . . . . . . . . . . . . . . 4
1.2.1.1 Frequency conditions . . . . . . . . . . . . . . . . . . . . 4
1.2.1.2 Network scale and communication range . . . . . . . . . 5
1.2.1.3 Data rate . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1.4 Reliability, accuracy and latency . . . . . . . . . . . . . . 6
1.2.1.5 Energy consumption . . . . . . . . . . . . . . . . . . . . . 6
1.2.1.6 Network topology . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1.7 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Candidate options for Body Area Networks . . . . . . . . . . . . . . . . . 7
1.3.1 Turbo-like Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Outline of the report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Turbo-like Code Solutions in BANs 13
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Turbo codes and BCJR decoding algorithm . . . . . . . . . . . . . . . . . 18
2.2.1 UMTS encoder and decoder architecture . . . . . . . . . . . . . . . 18
2.2.2 BCJR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2.1 Log-BCJR algorithm . . . . . . . . . . . . . . . . . . . . 24
2.3 EXIT chart analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Fixed-point representation in a Turbo decoder . . . . . . . . . . . . . . . . 31
3 Optimal Data-width Settings for Fixed-point Implementation 35
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Fixed-point EXIT chart analysis of UMTS Turbo Decoder . . . . . . . . . 42
3.3 Simulation and Analysis Results . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.1 Comparison between different Logarithm methods . . . . . . . . . 43
3.3.2 Comparison and Analysis in Fixed-point simulation . . . . . . . . 43
3.3.2.1 Wrapping Technique . . . . . . . . . . . . . . . . . . . . . 46
3.3.2.2 Saturation Technique . . . . . . . . . . . . . . . . . . . . 49
3.3.2.3 Normalisation Technique . . . . . . . . . . . . . . . . . . 51
3.3.2.4 Final validation . . . . . . . . . . . . . . . . . . . . . . . 51
4 Energy Estimation Decoding Algorithm 57
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Previous works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 A framework for quantifying the energy consumption of a Turbo-like decoder 61
4.3.1 Level 1 of the framework . . . . . . . . . . . . . . . . . . . . . . . 61
4.3.2 Future work: Level 2 of the framework . . . . . . . . . . . . . . . . 64
5 Conclusions and Further Works 67
Bibliography 69
List of Figures
1.1 A typical BAN architecture. . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 The two concatenation ways of Turbo-like codes. . . . . . . . . . . . . . 9
1.3 Two decoding schemes of two types of Turbo-like codes. . . . . . . . . . . 9
2.1 Transmission scheme of serial concatenation codes. . . . . . . . . . . . . . 14
2.2 A typical BER chart for Turbo codes. . . . . . . . . . . . . . . . . . . . . 15
2.3 Performance comparison of a Turbo code and a convolutional code [?]. . . 16
2.4 A classical Turbo encoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 A classical Turbo decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 A classical SC decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.7 Scheme of UMTS Turbo encoder. . . . . . . . . . . . . . . . . . . . . . . . 19
2.8 Scheme of the convolutional encoder and the trellis diagram. . . . . . . . 21
2.9 Trellis diagram of a transition sequence. . . . . . . . . . . . . . . . . . . . 22
2.10 An example transition sequence. . . . . . . . . . . . . . . . . . . . . . . 22
2.11 Scheme of UMTS Turbo decoder. . . . . . . . . . . . . . . . . . . . . . . . 23
2.12 An example trellis of a short terminated trellis code. . . . . . . . . . . . 25
2.13 Scheme of the EXIT chart generating. . . . . . . . . . . . . . . . . . . . . 29
2.14 One EXIT curve I(ae) = F (I(aa)) of the UMTS Turbo code using BPSK to transmit over an AWGN channel having an SNR of -4 dB. . . . . . . . 29
2.15 EXIT chart of UMTS Turbo decoder. . . . . . . . . . . . . . . . . . . . . 30
2.16 The decoding trajectories in the EXIT chart. . . . . . . . . . . . . . . . . 30
3.1 Correction function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 A possible accumulation route in the trellis. . . . . . . . . . . . . . . . . . 38
3.3 Example of difference calculation in two’s complement representation. . . 39
3.4 Three different Turbo codes in previous works. . . . . . . . . . . . . . . . 40
3.5 EXIT chart of different log algorithms. . . . . . . . . . . . . . . . . . . . . 43
3.6 EXIT chart of different fraction lengths. . . . . . . . . . . . . . . . . . . . 44
3.7 Scheme of UMTS Turbo decoder. . . . . . . . . . . . . . . . . . . . . . . . 46
3.8 EXIT chart of different integer lengths with wrapping technique - 1. . . . 47
3.9 EXIT chart of different integer lengths with wrapping technique - 2. . . . 47
3.10 EXIT chart of different integer lengths with wrapping technique - 3. . . . 49
3.11 EXIT chart of different integer lengths with wrapping technique - 4. . . . 49
3.12 EXIT chart of different integer lengths with saturation technique. . . . . . 50
3.13 EXIT chart of different integer lengths with normalisation technique. . . . 52
3.14 Simulation results of 5114-bit block length in fixed-point with normalisation and floating-point. . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.15 Simulation results of 40-bit block length in fixed-point with normalisation and floating-point. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.16 Simulation results of SNR=-4.83dB/453-bit block length in fixed-point with normalisation and floating-point. . . . . . . . . . . . . . . . . . . . 54
3.17 Simulation results of 5114-bit block length in fixed-point with wrapping technique and floating-point. . . . . . . . . . . . . . . . . . . . . . . . . 54
3.18 Simulation results of 40-bit block length in fixed-point with wrapping technique and floating-point. . . . . . . . . . . . . . . . . . . . . . . . . 55
3.19 Simulation results of SNR=-4.83dB/453-bit block length in fixed-point with wrapping technique and floating-point. . . . . . . . . . . . . . . . . 55
4.1 The dependence between the different stages. . . . . . . . . . . . . . . . . 63
4.2 Flowchart of energy estimation framework. . . . . . . . . . . . . . . . . . 65
List of Tables
1.1 Data rate requirement of different applications in BANs . . . . . . . . . . 5
2.1 Different representation methods for integer numbers . . . . . . . . . . . . 32
2.2 Two’s complement representation method for fraction numbers . . . . . . 33
3.1 Different representation methods for integer numbers . . . . . . . . . . . . 40
Acknowledgements
I would like to express my gratitude to all those who helped me to complete this
report. I would like to give my special thanks to my supervisors, Dr. Rob Maunder,
Prof. Bashir M. Al-Hashimi and Prof. Lajos Hanzo. Their insightful guidance,
direction and wise advice allowed this work to become reality. I am deeply indebted
to Dr. Rob Maunder, whose stimulating suggestions and encouragement helped me
throughout the research for and writing of this report.
Many thanks also to my colleagues and the staff of the Communications Group and
the Electronic System Design Group for the useful discussions and comments throughout
my research. Special thanks to my colleagues Amit Acharyya and Dr. Jos Akhtman for
their technical support.
I would also like to express my appreciation to my parents, who taught me all the good
things that really matter in life. Finally, I would like to thank my girlfriend
Nuofei Lu, whose patient love enabled me to complete this work.
Chapter 1
Introduction
A wireless sensor network (WSN) is a network composed of a number of sensor
devices with the ability to communicate with each other or with upper level networks.
A sensor node in a WSN consists of data sensing, data processing and communication
components. The sensors are deployed either inside the phenomenon of interest or very
close to it. Typically, one or more central devices are included in a WSN for collecting
data from the sensors and communicating with upper level networks. With the
development of wireless communication technologies, WSNs are starting to play an
important role in supporting networks of different scales that connect a person anywhere,
at any time and with anybody. In this report, a promising application of wireless sensor
networks (WSNs), Body Area Networks (BANs), is introduced. The communication
requirements of BANs are investigated based on a literature review. The candidate
technologies for wireless communication in BANs are discussed, including the potential
of using Turbo-like coding schemes in BANs, which is advocated in this report. Finally,
the outline of the report is given.
1.1 Introduction of Body Area Networks (BANs)
Body Area Networks (BANs) or Wireless Body Area Networks (WBANs) are a promising
application of short range wireless sensor networks (WSNs) in the healthcare industry.
The basic scenario involves locating a number of wireless sensors on the human body
for the continual monitoring of physiological parameters such as heart rate, Electro-
CardioGram (ECG) data, ElectroEncephaloGraphy (EEG) data, blood pressure, body
temperature, the levels of certain chemicals such as sugar, oxygen and medications in
the blood, motion, etc. [1]. These parameters can be important or even life critical to
some people or patients, such as the aging population and patients with chronic,
cerebrovascular or cardiovascular diseases. Long-term monitoring and logging of the
physiological parameters of the patients could help doctors to treat the patients or to
discover risks earlier. The purpose of deploying BANs on such groups of people is to
create a more comfortable, convenient and economical way to perform the required
monitoring and logging missions, either in hospitals or in a home-based healthcare
system. The concept of (W)BANs was first introduced by T. G. Zimmerman in 1996 [2].
Over the past few years, advancements in electronic systems and wireless technologies
have enabled the development of small and intelligent medical sensors which can be
attached to or implanted into the human body. The healthcare industry is becoming
increasingly interested in using such technologies to develop practical BANs [3]. Hence,
many topics in this research area are being widely studied, including energy harvesting,
signal processing and communication. Recently the IEEE 802.15 working group
established a study group, body area network (802.15.TG6), to develop guidelines for
using wireless technologies in BAN applications for various healthcare services. This
report is focused on the wireless communication technology for BANs. Furthermore, the
concept of BANs can be divided into two categories depending on their operating
environments. One is mainly operated on the surface or in the vicinity of the human
body, namely wearable BANs. The other is operated inside the body, namely implantable
BANs. Owing to the different operating environments, the communication requirements
of these two types of application are different, which leads to different strategies for the
development of the communication technology in the networks. Most of the recent works
and the IEEE 802.15.TG6 group target wearable BANs, which are also the focus of this
report. In the rest of this report, the term BANs refers to wearable BANs.
1.2 Communication in BANs
To discuss the wireless communication issues in BANs, we start with the scenarios of
BAN applications. A typical BAN scenario is given in Figure 1.1.

Figure 1.1: A typical BAN architecture.

It consists of a number of sensor nodes attached to the human body to perform different
functions for collecting different physiological parameters. A central device such as a
PDA or a mobile phone takes the data from the sensor nodes via a wireless network
formed between them, namely a BAN. The sensors are supposed to be as small as
possible for comfort and convenience. Therefore, their functionality is limited. The
basic assumption is that the sensors only collect the data from the body, perform the
necessary signal processing and transmit the data to the central device in real time.
The central device then performs further processing of the data. It might connect to
an external network for communicating with a higher-level system, such as a hospital,
depending on the requirements of different scenarios. The number of sensors required
varies between applications. For one particular disease, usually only a few (< 3) sensor
nodes are required [3]. However, for more complicated situations, more sensors might
be needed. In particular, when motion detection is involved, for example for people who
need post-treatment help to recover mobility, sensors might be required on every movable
part of the body, and more sensors mean more accurate motion detection. According
to [1], typically no more than 20 sensors will be used for any one person.
Based on such a scenario for the applications of BANs, there are several features that
need to be considered carefully while developing communication technologies for BANs.

• Firstly, the resources on the sensors are limited due to their limited size. The
nodes need to be light, small, wireless and long lived. Since one of the purposes of
BANs is to create a convenient healthcare system for patients without support
from professionals, the wireless sensors on human bodies need to be operated from
batteries or energy harvesting. They are required to last a long time without any
need for maintenance. This leads to an extremely demanding energy efficiency
requirement.

• Secondly, in this scenario, all the sensors and the central device are always around
the patient, which leads to a very short coverage range requirement for BANs.

• Thirdly, in this scenario, all the data only needs to be transmitted from the
sensors to the central node. Since the function of a sensor is simple and limited,
there is almost no need for two-way communication in the system. One-way
communication (from the sensors to the central node) is sufficient for BANs.
Although in some scenarios there may be an overall energy saving from letting
nodes listen to each other, and including relays in the network might require
nodes to receive, the basic assumption of no transmission from the central node
to the sensors remains valid.
Based on this discussion of the basic scenario of BANs, the communication requirements
of BANs are discussed in the next section.
1.2.1 Communication requirements
To develop new technologies for a particular type of WSN, the basic requirements need
to be considered first. In this section, we discuss the communication requirements of
BANs, including frequency bands, network scale, communication range, data rate,
reliability, energy consumption, network topology and security.
1.2.1.1 Frequency conditions
Currently, no frequency range has been clearly allocated for use by BANs. According to
the latest news [4], the Federal Communications Commission (FCC) is considering
several possible frequency bands for use by BANs:

• 2300-2305 MHz and 2360-2395 MHz bands: The 802.15.TG6 group and GE Healthcare
(GEHC) propose to use these bands for BANs. However, they are currently
used by several other services, including Aeronautical Mobile Telemetry (AMT),
federal radiolocation and amateur radio users. This can be a problem regarding
interference and security. The FCC is considering the proposed potential use of
these bands by BANs on a coexistence and non-interference basis.

• 2400-2483.5 MHz band: This band is used by Industrial, Scientific and Medical
(ISM) equipment on a non-licensed basis under the FCC's rules. The FCC seeks
comment on whether BANs could operate in this band under current rules or
whether new rules would be required to regulate BANs using this band.

• Other frequency bands: The FCC seeks comment on whether other frequency
bands may be appropriate for BANs, including the 5150-5250 MHz band, which is
now allocated for federal and non-federal aeronautical navigation, non-federal
fixed-satellite use and Unlicensed National Information Infrastructure (U-NII)
devices.

An alternative solution is to use Ultra-WideBand (UWB) technology, which is authorised
to communicate between 3.1 GHz and 10.6 GHz [5]. The potential and the advantages
of applying UWB technology to BANs are discussed in [6].
1.2.1.2 Network scale and communication range
As discussed in Section 1.2, a BAN is a small scale network which could include up to 20
sensor nodes. One important feature of BANs is that all the devices in the network are
around a human body. This greatly limits the required communication range of the
network. A widely agreed communication range for BANs is 2-5 meters [1,7,8], which
is shorter than the coverage range of any existing WSN application. An obvious
advantage of such a short communication range is that it directly leads to a low emission
level from the transmitter. On the other hand, all the devices are in each other's
vicinity, which could induce an interference problem.
1.2.1.3 Data rate
Most of the previous works agree that BANs require real-time low data rate
communication, but the detailed investigations of the data rate requirement differ. For
example, the results from [3] and [7] are summarised in Table 1.1.

Healthcare applications          B. Zhen's work [3]   H. Li's work [7]
Heartbeat                        <0.1 kbps            0.05 kbps
Body temperature                 <0.1 kbps            0.05 kbps
Electrocardiogram (ECG)          2.5 kbps             72 kbps
Electroencephalography (EEG)     0.54 kbps            131.1 kbps
Electromyography (EMG)           -                    1152 kbps
Blood pressure                   <0.1 kbps            0.05 kbps
Blood sugar level                <0.1 kbps            -
Blood analysis                   8.192 kbps           -

Table 1.1: Data rate requirement of different applications in BANs
Note that for the applications with low data rate requirements, such as heartbeat, body
temperature and blood pressure, the reported figures are quite close. However, for more
complicated parameters, such as ECG and EEG, they are very different. The reason
could be differing assumptions about the pre-processing of the signals before
transmission. For example, if a sensor transmits compressed data rather than the raw
data, the data rate requirement can be reduced. Among the reviewed previous works,
the highest data rate assumption is given by [8], which claims that a data rate of up to
1 Mbps is required by BANs. This is still a lower data rate requirement than that of
any existing WSN application. For example, for another low speed short range WSN
application, Wireless Personal Area Networks (WPANs), the required data rate is up to
10 Mbps. However, some previous works include video stream transmission in their
BAN scenarios, in which case the data rate requirement could increase to 100 Mbps [1].
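As a rough sanity check on these figures, the aggregate data rate of a BAN can be estimated by summing per-application rates. The sketch below uses the higher (H. Li) figures quoted above; the particular sensor mix is a hypothetical illustration, not taken from any cited work.

```python
# Rough aggregate data-rate estimate for a hypothetical BAN sensor mix.
# Per-application rates follow the higher column of Table 1.1; the choice
# of applications is an illustrative assumption.
rates_kbps = {
    "ECG": 72.0,
    "EEG": 131.1,
    "heartbeat": 0.05,
    "body_temperature": 0.05,
    "blood_pressure": 0.05,
}

aggregate_kbps = sum(rates_kbps.values())
print(f"aggregate: {aggregate_kbps:.2f} kbps")
print("fits 1 Mbps budget:", aggregate_kbps < 1000)
```

Even this multi-sensor mix stays well below the 1 Mbps upper bound of [8], consistent with the claim that BANs are low data rate networks unless video is involved.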
1.2.1.4 Reliability, accuracy and latency
The performance requirements of BANs, including reliability, accuracy and latency, are
still under discussion. Since the monitored signals can be life critical, faulty data
transmission or a delay of a few minutes may be fatal for the patient. The performance
requirements of BANs should therefore be relatively high. Many previous works agree
that delays and communication errors should be kept within strictly defined standards
in order to avoid disastrous behaviour [3, 9, 10]. According to the officially released
report of the 802.15.TG6 group [11], a fast reaction of < 1 second with a reliability of
99.99% is expected. The latency should also be < 250 ms and the jitter < 50 ms. The
performance can be evaluated by the delay profile, the information loss rate, the bit
error rate (BER) and the frame error rate (FER), which need to be considered carefully
during the design of a communication system. In addition, when a human body moves,
the sensors change position relative to each other. When the environment changes, the
channel conditions also change and affect the network performance. A BAN system
must maintain its reliability under all possible conditions. The different states of the
human body must be considered, such as walking, running and turning. Different
environments will be encountered in practice, where multiple BANs will coexist or
multiple sources of interference will be present, such as tunnels, subways, parks, etc.
The design of BANs must be prepared for all realistic scenarios.
1.2.1.5 Energy consumption
Energy consumption is a crucial issue for the sensor nodes of BANs, due to their limited
energy resources and the long lifetime requirement. As discussed in Section 1.2, the
sensors in BANs have to be operated from batteries or energy harvesting, and should
not need to be recharged frequently. Since the sensors are expected to be as small as
possible, the energy resources that can be provided on them are extremely limited. This
requires every function on the sensors to be energy efficient, including the wireless
communication mechanism. In addition, a distinguishing feature of BANs is that the
sensors are attached to the human body. On the other hand, previous works have
suggested that, for safety, wireless devices should be separated by at least 30 cm from
the human body [3]. Hence, extremely low transmission power is required to protect
human tissue. It is widely agreed that the low power requirement is one of the most
challenging issues in developing BANs [1, 3, 12]. Because of such small distances and
the special requirement of protecting human tissue, BANs are not covered by existing
wireless standards [1].
Chapter 1 Introduction 7
1.2.1.6 Network topology
Because of the small network scale and the one-way communication assumption, a star
topology is a natural solution for BANs [13,14]. The advantages of a star topology are
its simple architecture and the concentration of system complexity at the central node,
which are suitable features for BANs. However, the devices are located on a human
body that can be in motion. BANs should therefore be robust against frequent changes
in the network topology. Moreover, human bodies strongly attenuate RF signals [10].
Both of these reasons suggest benefits in using a multi-hop network topology in BANs.
In addition, using multi-hop transmission instead of direct transmission can achieve
lower energy consumption in communication [12]. Hence, many recent works have
focused on multi-hop network solutions for BANs [15,16].
1.2.1.7 Security
Security is a major issue in medical applications [8]. Safety and privacy concern all
involved parties, from doctors, nurses and patients to administrative personnel and
medical service providers. Devices also need authentication for security. Interference
from external devices and intentional attacks must be considered as safety issues. On
the other hand, due to the limited resources on the sensors and the user-friendliness
requirement, the system must be simple. Inserting and removing a node in a BAN
must be easy for the user.
1.3 Candidate options for Body Area Networks
Based on the discussion in Section 1.2.1, BANs are distinguished from other networks
by their shorter communication range and their extremely low power and high accuracy
and reliability requirements. Since the existing WSN technologies are not suitable for
BANs, a new IEEE study group, 802.15.TG6, is working on developing a new standard
and protocol for BAN applications. To create a protocol suitable for BANs, there is a
choice between defining a new PHY/MAC and evaluating and improving currently
available or emerging technologies. Some papers suggest modifying existing standards
[1, 3]. For low power purposes, a possible approach is to scale down existing standards,
for example by turning down the transmit power or introducing a duty-cycle mechanism
[1]. IEEE 802.15.4a/b is a popular standard to evaluate or improve for BAN applications
in previous works [8, 17–19]. Other existing standards, such as the Medical Implant
Communication Service (MICS) and the Wireless Medical Telemetry Service (WMTS),
have also been suggested [20]. Based on an investigation of previous works on using
existing standards in BANs [8, 17–23], we found that most of these works focus on
evaluating the performance of the standards in BAN applications. Despite the
importance of the low power requirement of BANs, the power issue has not been
addressed much in these works. [23] pointed out that the energy consumption of the
candidate standards is hard to investigate because the platforms currently available for
evaluating the existing standards are not designed with the low power techniques
appropriate for extremely low power applications such as BANs. Therefore, it is not
fair to evaluate the energy consumption based on these platforms, since an existing
standard could be further scaled down for low power purposes, for example by turning
down the transmission power or introducing a duty-cycle mechanism, as suggested in [1].
However, scaling down a standard must lead to a degradation in performance. For
example, although some previous works claim that the 802.15.4 standard provides
sufficient performance for BAN applications, it cannot be guaranteed that this
performance can be maintained after appropriate scaling-down techniques are applied
to the standard.
Another option is to define a new standard for BANs, since the existing low power
standards cannot meet the ultra-low power requirement of BANs. How to further
reduce the energy consumption of a WSN while maintaining high performance becomes
a challenge. Two problems challenge the performance of communication in BANs. One
is that the transmission power of the sensors should ideally be as low as possible to
protect human tissue. The other is that human tissue is composed primarily of water
molecules, which tend to absorb RF energy [24]. These problems create the need for a
proper Error Correction Code (ECC) in the channel coding scheme to maintain high
reliability and accuracy under such demanding conditions. However, owing to the
reduction of the transmission power in the system, the energy consumed by channel
coding makes up a larger proportion of the whole system's consumption, which makes a
low power design of the channel coding scheme desirable. To overcome this challenge,
we investigate the potential of using Turbo-like codes in the communication of BANs.
The novel way to apply Turbo-like codes in BANs and its advantages are discussed in
Section 1.3.1.
1.3.1 Turbo-like Codes
An overall discussion of the Turbo principle and Turbo-like codes is given in the next
chapter. In this section, for the purpose of discussing the potential advantages of
applying Turbo-like codes in the channel coding scheme of BANs, we give a brief
introduction to some of their distinctive features.

Turbo-like codes refer to a type of ECC that includes two component codes in one coding
scheme. The concatenation between the component codes in the encoding process can
be parallel or serial. An interleaver is used between the component encoders. The two
types of concatenation in Turbo-like codes are illustrated in Figure 1.2. The success of
Turbo-like codes lies in the iterative decoding process they introduce to approach the
best decoding results. The two decoding schemes corresponding to the two encoding schemes
Figure 1.2: The two concatenation ways of Turbo-like codes.
are given in Figure 1.3.

Figure 1.3: The two decoding schemes of the two types of Turbo-like codes.

An iterative decoding process is performed between the two concatenated decoders by
feeding the decoded results back to each other's input. Under such a scheme, the
decoded result improves with each iteration, until the best result is achieved after a
certain number of iterations. The details of Turbo-like codes are given in the next
chapter.
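The iteration structure described above can be sketched as follows. This is only an illustration of the information flow through the interleaver; `soft_decoder` is a hypothetical placeholder standing in for a real soft-in/soft-out component decoder (e.g. Log-BCJR), not an actual decoder.

```python
import random

# Sketch of the parallel iterative decoding loop described above.
# soft_decoder is a placeholder SISO stage; only the extrinsic
# information exchange through the interleaver is represented.
random.seed(0)
K = 8                                        # toy block length
pi = list(range(K))
random.shuffle(pi)                           # interleaver pattern
llr_ch = [random.gauss(0.0, 1.0) for _ in range(K)]  # channel LLRs

def soft_decoder(llr_in, llr_apriori):
    """Placeholder SISO stage returning new extrinsic information.
    A real implementation would run the Log-BCJR recursions here."""
    return [0.5 * (c + a) for c, a in zip(llr_in, llr_apriori)]

extrinsic2 = [0.0] * K                       # no a-priori info at start
for _ in range(8):                           # fixed number of iterations
    # Decoder 1 uses the deinterleaved extrinsic info from decoder 2.
    extrinsic1 = soft_decoder(llr_ch, extrinsic2)
    # Decoder 2 operates in the interleaved domain.
    e2_pi = soft_decoder([llr_ch[i] for i in pi],
                         [extrinsic1[i] for i in pi])
    # Deinterleave before feeding back to decoder 1.
    extrinsic2 = [0.0] * K
    for k, i in enumerate(pi):
        extrinsic2[i] = e2_pi[k]

a_post = [c + e1 + e2 for c, e1, e2 in zip(llr_ch, extrinsic1, extrinsic2)]
hard_bits = [1 if l < 0 else 0 for l in a_post]  # LLR < 0 -> bit 1
print(hard_bits)
```

In a real Turbo decoder the extrinsic values would converge towards a fixed point as the iterations proceed, which is exactly the behaviour the EXIT chart analysis of Chapter 2 visualises.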
The advantages of Turbo-like codes are their high reliability and near-Shannon-capacity performance [25, 26], which could overcome the harsh transmission conditions in BANs. However, their disadvantage is the relatively high complexity of the decoding scheme. Turbo-like codes are usually considered inappropriate for low power communication, since the iterative decoding process consumes a lot of energy [27]. However, the one-way communication assumption in BANs provides an opportunity to apply Turbo-like codes in their communication system. In contrast to their complicated decoding schemes, the encoding schemes of Turbo-like codes are simple and lend themselves to low power implementation. Therefore, in a star topology network, under the one-way communication assumption, the decoding scheme does not need to be implemented on the sensors. Although it does need to be implemented on the central node, in BANs the central node usually has a sufficient energy resource, since it is typically a much larger device than the sensors, such as a PDA or a smartphone. Hence, Turbo-like codes are naturally suited to a star topology BAN. Furthermore, as we discussed, a multihop network is necessary in some BAN scenarios. In a multihop network, relays are used to reduce the transmission distance, so the transmission power of each node can be reduced, which is an especially desirable feature for BANs. On the other hand, extra receiving, transmitting and coding processes are induced on the relays, which might increase their overall energy consumption and reduce their lifetime. From the channel coding point of view, one way to avoid inducing an extra coding process on the relays is to transmit the received signals without any processing other than amplification. However, this is not a desirable solution, since the noise in the transmission is also amplified at the relay, which might increase the decoding error rate at the central node and degrade the communication performance. As an alternative, we propose a novel decoding scheme for the relay, which performs fewer decoding iterations and transmits the resulting sub-optimal decoding result. It may be possible to find a balance point at which, with a certain number of decoding iterations on the relays, the energy consumption of the extra coding process and the overall communication performance are both acceptable, while the transmission power is reduced by the multihop mechanism of the network.
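The transmit-power benefit of relaying mentioned above can be illustrated with a toy path-loss calculation; the exponent n = 3 and the unit constant are illustrative assumptions, not values from this report:

```python
def tx_energy(distance, n=3.0, k=1.0):
    """Toy model of radiated energy per bit: E = k * d^n,
    where n is the path-loss exponent (assumed here, typically 2-4)."""
    return k * distance ** n

# Splitting one 10 m hop into two 5 m hops lowers the total transmit
# energy whenever n > 1, at the cost of extra receive/decode energy
# on the relay -- exactly the trade-off discussed above.
d = 10.0
single_hop = tx_energy(d)
two_hops = 2 * tx_energy(d / 2)
assert two_hops < single_hop
```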
To further the research on the proposed novel decoding scheme, we need to investigate the energy consumption of low power implementations of Turbo-like codes, such as fixed-point ASIC implementations. In this report, we propose a method to determine the optimal data width specification in fixed-point implementations of Turbo-like codes from a low power point of view. As discussed in Section 1.3, the possible energy consumption of a communication standard or a coding scheme for low power applications such as BANs is hard to evaluate. In this report, we also propose a framework to compare and evaluate the possible energy consumption of different Turbo-like code decoding schemes.
To sum up, in this report we discuss the possibility of introducing Turbo-like codes into BAN communication systems, and we introduce the related requirements of a method to evaluate the energy consumption of a Turbo-like code decoding system. An important issue in the low power realisation of Turbo-like decoding algorithms in fixed-point implementations, the data width specification, is explored. The outline of the report is given in the next section.
1.4 Outline of the report
The later chapters in this report are organised as follows.
• Chapter 2 is a background chapter. A full introduction to the Turbo principle and Turbo-like codes is presented. The fixed-point implementation issues of hardware design for the algorithms are also introduced.
• Chapter 3 proposes a method to determine the optimal data width specification in
fixed-point implementations of Turbo-like codes from a low power point of view.
• Chapter 4 proposes a framework to evaluate and compare the different Turbo-like
codes from the energy consumption point of view. Part of the framework is the
future work of the project, which is also discussed in this chapter.
• Chapter 5 gives the conclusion of this report.
Chapter 2
Turbo-like Code Solutions in
BANs
In this chapter, we introduce the background information of this report. It includes a
brief introduction of the Turbo principle and its encoding and decoding algorithms, the
EXIT chart analysis tool and the basic theory of fixed-point representation in hardware
implementation.
2.1 Introduction
The Turbo principle is a concept in error correcting codes (ECC) involving iterative decoding processes, also referred to as turbo decoding processes, such as serial or parallel concatenated codes [25, 28] and LDPC codes [29]. A unique feature of Turbo-like codes is the inclusion of two or more concatenated component codes. Such codes are called concatenated codes, which were first proposed in [30]. The first version of concatenated codes was serially concatenated (SC) codes, which include two or more component codes concatenated in a serial structure. A famous example [33] consists of a Reed-Solomon code [31] as the outer code (applied first and removed last) followed by a convolutional code [32] as the inner code (applied last and removed first). In the early concatenated coding schemes, despite the inclusion of two or more component codes, there was no iterative decoding process in the decoder: the decoder generates hard decisions (i.e. definite bit values) directly. In a communication receiver, the demodulator usually produces soft decisions in the demodulation process. Soft decisions correspond to hard decisions but, instead of giving decoded bit values, they are reliability information expressed as the a posteriori probability of each bit. A soft decision expresses not just what the most likely value of a bit is, but also how likely it is, while a hard decision only expresses the former. Before the Turbo principle was discovered, a typical decoder utilised soft decisions in the decoding process and generated hard decisions at its output. Such a decoder can be called a Soft-in Hard-out (SIHO) decoder. Therefore, a straightforward way of decoding the SC codes involves concatenating a SIHO decoder for the inner code and a HIHO (Hard-in Hard-out) decoder for the outer code. If a convolutional code is concerned, a Viterbi Algorithm (VA) [34] decoder is used at the corresponding place to give hard decisions.
As discussed in [35], the first drawback of such a structure is that the inner decoder generates hard decisions, thus preventing the outer decoder from utilising its ability to accept soft decisions at its input. The second drawback is that if the inner decoder produces a continual error sequence, the outer decoder is unable to correct the errors. The second drawback can be overcome by inserting an interleaver between the inner and the outer encoder, and correspondingly a deinterleaver between the inner and the outer decoder. The function of an interleaver is to rearrange the order of a sequence in a pseudo-random way. The function of a deinterleaver, with knowledge of the rearranging method of the corresponding interleaver, is to restore the order of an interleaved sequence. Thus, a continual error sequence from the inner decoder becomes dispersed in the input to the outer decoder. The transmission scheme is shown in Figure 2.1. However, if errors occurred
Figure 2.1: Transmission scheme of serially concatenated codes.
at the output of the outer decoder, these would remain in the final decoding results. A Turbo-like code can be considered a refinement of the concatenated encoding schemes with an improved decoding process involving iterative algorithms. The concept of turbo decoding is for a system with two component codes to pass soft decisions from the output of one decoder to the input of the other, and to iterate this process many times to produce more reliable decisions. To obtain the benefits of an iterative decoding process, the two decoders are required to feed soft decisions to each other, because using hard decisions at a decoder's input degrades system performance compared with soft decisions [36]. Therefore, Soft-in Soft-out (SISO) decoders are required for the decoding of each component code. The introduction of Turbo codes in [25] also marked the first introduction of parallel concatenated (PC) codes. It was reported that the scheme
can achieve a bit error rate (BER) of 10−5 using a rate 1/2 code over an additive white
Gaussian noise (AWGN) channel and BPSK modulation at an Eb/N0 of 0.7 dB [25,26].
According to the discussion in [25, 26], the Shannon capacity limit for a binary modulation corresponds to an error probability of Pe = 0 (Pe = 10−5 can be taken as a reference here) at Eb/N0 = 0 dB. Hence the performance is 0.7 dB from the Shannon capacity. Most importantly, with the iterative decoding scheme, the complexity of a Turbo decoder is much less than that of a non-iterative decoder having the same performance. According to [37], the complexity required to allow the earlier codes to approach the Shannon capacity would not be feasible to implement. The discovery of Turbo codes has revolutionised the field of error correcting codes, since it achieved, for the first time, performance very close to the Shannon capacity in practice.
To evaluate the performance of a Turbo or Turbo-like code, a Bit Error Rate (BER) chart is a commonly used tool. A typical BER chart of a Turbo code looks like Figure 2.2. The y-axis is the BER of the decoding result after a certain number of decoding iterations and the x-axis is Eb/N0, where Eb is the energy per bit and N0 is the noise power spectral
density (i.e. noise power in a 1 Hz bandwidth).
Figure 2.2: A typical BER chart for Turbo codes, annotated with the threshold Eb/N0, the turbo cliff and the error floor regions.
As shown in Figure 2.2, a typical Turbo code can achieve a very low BER once Eb/N0 reaches a certain point. In the figure, the
point at which the BER curve starts to decrease is called the threshold Eb/N0, the region where the BER curve falls rapidly is called the turbo cliff region, and the region where the BER curve is flat at a very low value is called the error floor region. To understand how Turbo codes outperform the earlier coding schemes, we quote Figure 2.3 from [?]. It shows simulation results of the original rate R=1/2 turbo code presented in [25] and a maximum free distance (MFD) R=1/2, memory ν = 14 (2, 1, 14) convolutional code with Viterbi decoding. The simulation results show that the Turbo code outperforms the convolutional code by 1.7 dB at a BER of 10−5. The contrast is striking, especially since a detailed complexity analysis reveals that the complexity of the Turbo decoder is much smaller than that of the Viterbi decoder used for the convolutional code.
A classical Turbo encoder is composed of two recursive systematic convolutional (RSC)
encoders, as shown in Figure 2.4. The input information sequence is encoded twice by
the two RSC encoders. The first encoder processes the information in its original order,
while the second encoder processes the same sequence in a different order obtained by
an interleaver. In this scheme the systematic bit sequence is also transmitted to the
Figure 2.3: Performance comparison of a Turbo code and a convolutional code [?].
decoder. As shown in the figure, sequences c and d are the outputs of the two encoders. Sequence a is the systematic bit sequence and b is the interleaved systematic bit sequence. Note that only a is transmitted, since b can be obtained by an identical interleaver at the decoder.
Figure 2.4: A classical Turbo encoder.
In the decoding process, as shown in Figure 2.5, two a posteriori probability (APP) decoders are used correspondingly for the two convolutional encoders in the encoding scheme, to obtain the minimal bit error probability. In the figure, a, c and d are the
soft-decision sequences, obtained by the demodulator, corresponding to the output sequences a, c and d in Figure 2.4. The purpose of an APP decoder is to compute a posteriori probabilities of either the information bits or the encoded symbols. Its application in Turbo-like codes made it the major representative of the SISO decoders. The algorithm was originally invented by Bahl, Cocke, Jelinek and Raviv in 1972, and is hence called the BCJR algorithm [38]. Its capability of generating soft decisions is well suited to iterative decoding schemes. In Figure 2.5, the two decoders work alternately in an iterative way. To get the correct order of the input sequences, an interleaver identical to the one used in the encoding scheme, and a corresponding deinterleaver, are used between the decoders. An extra interleaver is used to provide the systematic sequence
for both of the decoders. The main advantage of this decoding process, compared with using VA decoders, is that it utilises the ability of the decoders to accept soft decisions at their inputs. However, in iterative decoding schemes, the information provided to one decoder by the other is extrinsic information, not a posteriori information. The extrinsic information represents the new information obtained by a decoder. The reason for using extrinsic information is to prevent the decoding scheme from becoming a positive feedback amplifier [35]. As shown in Figure 2.5, the a priori information from the systematic sequence is added to the input of the decoders. Since the a posteriori information already includes the a priori information obtained from the other decoder in the previous decoding step, feeding it back directly would create a positive feedback amplifier in the loop; using extrinsic information instead of a posteriori information solves this problem. Therefore, the output of the decoders in Figure 2.5 is extrinsic information. It can be obtained by a simple subtraction of the a priori information from the a posteriori information. Alternatively, it can also be generated directly by a modified BCJR algorithm. By receiving new extrinsic information from the other decoder, the reliability of the decoding increases with each iteration. The whole decoding process stops when the required reliability is reached or when no further reliability can be gleaned.
Figure 2.5: A classical Turbo decoder.
In practice, the modified
BCJR algorithm avoids the final subtraction operations, which makes it more suitable for iterative decoding schemes. Moreover, a further improved version of the BCJR algorithm, called the Log-APP or Log-BCJR algorithm, is a version of the BCJR algorithm transferred to the logarithmic domain. Its purpose is to avoid the mass of multiplication operations in the BCJR algorithm; more importantly, the Log-BCJR algorithm has variables with a much more manageable dynamic range than those of the BCJR algorithm, reducing the memory requirement and allowing fixed-point processing to be used. Since it avoids the complex circuit implementation caused by the many multipliers required by the original BCJR algorithm, and requires much less memory, the Log-BCJR algorithm is widely used in practice. Hence, in this report, since we only investigate the applications of the BCJR algorithm in iterative decoding schemes, the algorithm we discuss and simulate is the modified Log-BCJR algorithm that generates the extrinsic information directly. We will discuss the details of the algorithm in the next section.
The Turbo principle can also be applied to SC codes, which leads to another primary category of Turbo-like codes, serially concatenated convolutional codes (SCCC) [28]. Instead of using the decoding scheme of Figure 2.1, a scheme similar to the Turbo decoder, as shown in Figure 2.6, is used. The two SIHO decoders are replaced with SISO decoders, and an interleaver and a deinterleaver are required to form the iterative decoding loop. According to [35], serial Turbo codes perform better than parallel Turbo codes in the
Figure 2.6: A classical SC decoder.
error floor region. On the other hand, in the turbo cliff region, parallel Turbo codes
perform better with the same overall coding rate.
2.2 Turbo codes and BCJR decoding algorithm
Turbo-like codes generally have a simple encoding scheme and a relatively complicated decoding scheme. In this section, we use the 3rd Generation Partnership Project (3GPP) Universal Mobile Telecommunications System (UMTS) Standard [39] as an example to introduce the typical Turbo coding scheme, the convolutional code it includes, and the SISO decoding algorithm for convolutional codes, the BCJR algorithm. The UMTS Turbo code and the BCJR algorithm are also used as examples to present the work in this report, so the descriptions in this section are referred to in later chapters.
2.2.1 UMTS encoder and decoder architecture
To simplify the description, we assume BPSK modulation is used in our case, so each symbol in transmission is a bit. For other modulation methods, the transmitted bits here would be replaced by transmitted symbols. According to [39], the concatenated RSC encoder of the UMTS Turbo code is a rate R=1/2 convolutional code with constraint length K=4 and memory m=3. Two such identical encoders form the rate 1/3, 8-state PCCC illustrated in Figure 2.7. In the RSC encoder, the three memory bits form an 8-state finite-state machine (FSM). We use the notation Na to represent the block length of the encoding sequence a. Before the encoding of the bit sequence a commences, the shift registers of each concatenated convolutional code are initialised in a state that is known to the receiver. Typically, the m=3 memory elements of each shift register are
initialised with logic-zeros, placing them in what is referred to as the “all zeros” state.
Figure 2.7: Scheme of the UMTS Turbo encoder.
However, following the encoding of the Na bits in the sequence a, the shift registers will
enter states that are not inherently known to the receiver. A number of techniques have
been proposed to cope with this [35]:
• No termination: In this case, in the decoding process, the end of a block sequence is considered to have an equal probability of being in each possible state. No information about the final state needs to be provided. The decoding process is then less effective for the last encoded data and the performance may be reduced. The degradation is a function of the block length; however, for some applications the degradation may be acceptable.
• Termination: This method appends several extra bits at the end of each block sequence to force the encoder to return to the “all zero” state. The UMTS Turbo code of Figure 2.7 is one example of such a technique. The extra tail bits also need to be sent to the decoder. This method overcomes the uncertain final state issue but introduces two drawbacks. Firstly, extra redundant information is added to the transmission; nevertheless, the redundancy is negligible except for very short blocks, and it is useful for error correction. Secondly, for parallel codes, the tail bits are not identical for the constituent codes, which means that in the iterative decoding process the extrinsic information of the tail bits cannot be exchanged between the decoders. Hence, the data at the end of the block sequence will benefit less from the Turbo decoding process. The SCCC has a similar problem.
• Tail-biting: [40] introduced a technique that allows any state of the encoder to serve as the initial state. This method involves a double encoding process. Firstly, a normal encoding of the sequence, starting from the “all zero” state, is performed, but the output of the encoder is ignored; only the final state of the encoder is stored. Secondly, the encoding process is performed again in order to actually generate the output. In this step, the initial state is a function of the final state previously stored. The result of this process is that the final state of the encoder is equal to its initial state. The advantage of this method is that no extra bits have to be added and transmitted. However, the double encoding process is its main drawback. In addition, it only works for the convolutional codes to which the BCJR algorithm is especially adapted.
In the UMTS Turbo code, the termination technique is used as shown in Figure 2.7. The
initial states of the shift registers are all set to zeros when starting to encode a bit block
a. Note that after encoding the Na bits of the source sequence a, the two switches in the
figure switch down to form a closed loop in the two encoders. Following this, m = 3 bits
are encoded in order to reset the contents of the shift register to “all zero” state. The
output of the Turbo encoder is a, c, e, d and f , where a is the systematic bit sequence, c
and d are the encoded bit sequences of the two encoders, respectively, and e and f are the
termination sequence of the two encoders, respectively. The termination is performed by
taking the tail bits from the shift register feedback after all information bits are encoded.
It takes m bits to force the final state back to “all zero” state for each encoder. Therefore,
in the case where a comprises Na bits, c and d will comprise Nc = Nd = Na + m bits,
while e and f will comprise Ne = Nf = m bits. In the UMTS Standard, the possible block length of the Turbo code (i.e. the length of the bit sequence a) is Na ∈ [40, 5114]. For the interleaved sequence b, the length is Nb = Na. The termination sequences e and f have a length equal to the number of memory bits in the RSC encoders, Ne = Nf = m = 3. Consequently, for the encoded sequences c and d, Nc = Nd = Na + 3. Note that the additional termination bits (e and f) make the coding rate R of the encoder lower than 1/3, namely R = Na/(Na + Nc + Nd + Ne + Nf) = Na/(3Na + 4m).
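As a quick numerical check of this rate formula (the helper name is ours):

```python
def turbo_rate(Na, m=3):
    """Overall rate R = Na / (3*Na + 4*m) of the terminated UMTS Turbo code."""
    return Na / (3 * Na + 4 * m)

# For the shortest UMTS block, Na = 40, the rate is 40/132, slightly
# below 1/3; for long blocks the rate approaches 1/3.
r_short = turbo_rate(40)
r_long = turbo_rate(5114)
assert r_short < 1 / 3
```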
To understand the operation of the FSM, the state transitions can be shown as a trellis diagram, given in Figure 2.8. ak, ck and ek denote the input and output bits. S1, S2 and S3 are the current states of the three memory bits in the encoder, and S1+, S2+ and S3+ are their next states. The state transitions and the encoded outputs can be expressed by the following equations.
• For encoding bits:
S1+ = ak ⊕ S2 ⊕ S3 (2.1)
S2+ = S1 (2.2)
S3+ = S2 (2.3)
ck = S1+ ⊕ S1 ⊕ S3 (2.4)
Figure 2.8: Scheme of the convolutional encoder and the trellis diagrams for the encoding bits and the termination bits.
• For termination bits:
S1+ = 0 (2.5)
S2+ = S1 (2.6)
S3+ = S2 (2.7)
ek = S2 ⊕ S3 (2.8)
ck = 0 ⊕ S1 ⊕ S3 (2.9)
The eight possible states correspond to State1 to State8 as shown in the figure. The trellis diagrams give all the possible transitions of the FSM. The left trellis diagram shows the transitions for the encoding bits in a sequence; the right diagram shows the transitions for the termination bits. Note that the first state in a transition sequence is the “all zero” state, which is State1 in the figure. With the termination technique, the last state in the sequence is also forced back to State1. As a result, the possible transitions in the first three steps and the last three steps are limited. The trellis diagram of a complete transition sequence is shown in Figure 2.9. The input sequence is an, and the output sequences cn and en can be obtained by tracking the state transitions in Figure 2.9. For instance, for the 5-bit input sequence a = [0, 1, 1, 0, 1], the transitions in the trellis are shown in Figure 2.10. Note that there are 8 steps in the trellis, since the three termination bits are included. The encoded bit sequence would be c = [0, 1, 0, 0, 1, 0, 1, 1] and the actually transmitted systematic bit sequence is [a, e] = [0, 1, 1, 0, 1, 0, 0, 1]. The trellis
diagram is not only helpful for understanding the encoding operations of a convolutional code, but is also useful for explaining the BCJR decoding algorithm, as we shall discuss later.
Figure 2.9: Trellis diagram of a transition sequence.
Figure 2.10: An example transition sequence: the trellis path for a = [0, 1, 1, 0, 1], with transitions (an/cn) 0/0, 1/1, 1/0, 0/0, 1/1, 0/0, 0/1, 1/1.
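The encoding and termination rules of this section can be collected into a short sketch. This is our own minimal Python rendering; it uses the feedback tap ak ⊕ S2 ⊕ S3 and the encoded-output tap S1+ ⊕ S1 ⊕ S3, consistent with Equation (2.9), and it reproduces the trellis path of Figure 2.10:

```python
def rsc_encode(a):
    """Encode the bit sequence a with the UMTS rate-1/2 RSC code
    and terminate the trellis. Returns (c, e): the encoded bits
    (len(a) + 3 of them) and the 3 tail input bits."""
    s1 = s2 = s3 = 0              # shift register starts in the all-zero state
    c = []
    for bit in a:                 # encoding phase
        f = bit ^ s2 ^ s3         # feedback bit
        c.append(f ^ s1 ^ s3)     # encoded output
        s1, s2, s3 = f, s1, s2    # shift the register
    e = []
    for _ in range(3):            # termination phase: drive register to zero
        tail = s2 ^ s3            # tail input that makes the feedback bit zero
        e.append(tail)
        c.append(0 ^ s1 ^ s3)     # encoded output of a termination step
        s1, s2, s3 = 0, s1, s2
    assert (s1, s2, s3) == (0, 0, 0)   # back in the all-zero state
    return c, e

c, e = rsc_encode([0, 1, 1, 0, 1])
# c = [0, 1, 0, 0, 1, 0, 1, 1], e = [0, 0, 1], matching Figure 2.10
```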
The architecture of the decoder is shown in Figure 2.11. A data transmission loop is formed between decoder 1 and decoder 2 to realise the iterative decoding process. Each iteration consists of two half iterations, one for each constituent RSC code. The two decoders operate alternately, since the input of one decoder includes the output of the other decoder from the previous half iteration. The operation of the RSC decoder (i.e. the BCJR algorithm) is described in Section 2.2.2. In the figure, the input of the decoding scheme is assumed to be in soft-decision form, which implies that the channel gain and noise variance have been properly taken into account.
Figure 2.11: Scheme of the UMTS Turbo decoder.
The five inputs ac, cc, dc, ec and
fc are the input soft decisions corresponding to the coded outputs a, c, e, d and f in the encoding scheme. Each decoder receives two information sequences. One is the soft decisions of its encoded sequence, received from the transmission channel directly (cc for decoder 1 and dc for decoder 2). The other is the uncoded sequence, formed simply by adding the a priori information provided by the other decoder to the received systematic information. The a priori information is the extrinsic information generated by the other decoder, after rearranging its order by the proper interleaver (π) or deinterleaver (π−1). For decoder 1, the input LLR sequence ya is the sum of aa and ac, followed by ec, as shown in the figure. Because the two encoders have independent tails, the soft decisions of the tail bits are not passed between the decoders; thus the information of the termination bits needs to be handled carefully. The systematic information of decoder 1's termination bits, ec, needs to be appended at the end for a complete ya. Therefore, ya = [aa + ac, ec]. On the other hand, for the extrinsic information ye generated by decoder 1, the information of the termination bits needs to be cut off before it is interleaved and passed to decoder 2, as shown in the figure. The lengths of ya and ye are therefore Ny = Na + m. For decoder 2, correspondingly, the uncoded information is the sum of ba (i.e. the interleaved ae) and the interleaved systematic information bc. Therefore, za = [ba + bc, fc], and the same processing of the termination bits is applied to za and ze. The lengths of za and ze are Nz = Nb + 3 = Na + 3. In the first iteration, be is initialised with a sequence of zero-valued LLRs, which imply that the values of the corresponding bits are completely unknown; ac is the received systematic information. Note that two interleavers identical to the one in the encoding scheme, and a corresponding deinterleaver, are used between the decoders to give the correct ordering of the input sequences. As discussed before, in the BCJR algorithm we use, the extrinsic information is generated directly by the decoding algorithm inside the decoder. After all the iterations are completed, the a posteriori output of the decoding scheme is obtained by adding the final extrinsic output to the final a priori input of decoder 1, as shown in the figure, and the SISO decoding process is then complete. Based on the soft decisions, hard bit decisions can be taken to give the final decoding result.
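The tail-bit bookkeeping for decoder 1 described above can be sketched as follows (the helper names and the LLR values are ours, purely for illustration):

```python
def assemble_ya(aa, ac, ec):
    """Uncoded input of decoder 1: ya = [aa + ac, ec], i.e. the element-wise
    sum of a priori and systematic LLRs, with decoder 1's own tail-bit
    systematic LLRs ec appended at the end."""
    return [x + y for x, y in zip(aa, ac)] + ec

def strip_tail(ye, m=3):
    """Drop the m tail-bit LLRs from the extrinsic output before
    interleaving, since the two encoders have independent tails."""
    return ye[:-m]

Na, m = 5, 3
aa = [0.0] * Na                    # first iteration: all-zero a priori LLRs
ac = [1.0, -2.0, 0.5, 0.3, -1.1]   # illustrative received systematic LLRs
ec = [0.2, -0.4, 0.9]              # illustrative tail-bit LLRs
ya = assemble_ya(aa, ac, ec)
assert len(ya) == Na + m           # Ny = Na + m
assert len(strip_tail(ya, m)) == Na
```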
2.2.2 BCJR algorithm
As we discussed, the APP algorithm we investigate is the Log-BCJR algorithm. In this section, we give a brief description of the Approx-Log-BCJR algorithm. The original BCJR algorithm is introduced in detail in [37].
2.2.2.1 Log-BCJR algorithm
There are two main advantages of introducing the Log-BCJR algorithm. Firstly, the original BCJR algorithm consists of many multiplication operations, which lead to very complex circuits when implemented in hardware. The Log-BCJR algorithm avoids the multiplications by transforming the algorithm into the logarithmic domain, where multiplications become additions. Secondly, the values of soft decisions in the normal domain can have a very large, theoretically unlimited, dynamic range, which requires a large amount of memory space in practice. Transferring them to the logarithmic domain reduces the dynamic range of the soft decisions, and consequently of all the internal variables in the algorithm; hence this approach significantly reduces the memory required to implement the algorithm. We use the notation y to represent the systematic bits in the encoder, including the termination bits, which means y = [a, e], and use ya to represent the received uncoded sequence of LLRs in our BCJR decoder, according to Figure 2.11.
We have y = {yn}, n = 1, . . . , Ny. In the normal domain, the soft decision of a received bit is defined as:
yan = P(yn = 0) / P(yn = 1) (2.10)
where yan is the soft decision of the received bit yn. In the logarithmic domain, the soft decisions become log-likelihood ratios (LLRs), defined as:
yan = ln ( P(yn = 0) / P(yn = 1) ) (2.11)
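As a quick numerical illustration of Equation (2.11) (the probability values are made up for the example):

```python
import math

def llr(p0):
    """LLR of a received bit from its probability of being 0, Eq. (2.11)."""
    return math.log(p0 / (1.0 - p0))

# p0 = 0.5 gives an LLR of exactly 0 (the bit value is completely unknown);
# p0 > 0.5 gives a positive LLR, p0 < 0.5 a negative one.
assert llr(0.5) == 0.0
assert llr(0.9) > 0.0 > llr(0.1)
```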
The two basic operations in the original BCJR algorithm are addition and multiplication. For A = ln(a) and B = ln(b), a multiplication in the normal domain becomes an addition in the logarithmic domain:
ln(ab) = ln(eA eB) = A + B (2.12)
An addition in the normal domain is handled by the Jacobian logarithm in the logarithmic domain, for which we use max∗ to denote the function:
ln(a + b) = ln(eA + eB) = max(A, B) + ln(1 + e−|A−B|) = max∗(A, B) (2.13)
When there are more than two terms, the max∗ function is usually computed by successive pairwise operations. In practice, the correction function fc = ln(1 + e−|A−B|) can be implemented by a Look-Up Table (LUT), so it reduces to a select operation in the LUT. The LUT-based version of the Log-BCJR algorithm is called the Approx-Log-BCJR algorithm. In the Approx-Log-BCJR algorithm, the max function can be performed by a compare operation between A and B. Thus, all the operations required in the Approx-Log-BCJR algorithm are “add”, “compare” and “select”, the so-called ACS operations.
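The max∗ operation of Equation (2.13) can be sketched as follows; the exact form is shown next to a LUT-based approximation in the style of the Approx-Log-BCJR algorithm (the 0.5-wide, 8-entry table is an illustrative choice, not a value from any standard):

```python
import math

def max_star_exact(A, B):
    """Jacobian logarithm: ln(e^A + e^B), Eq. (2.13)."""
    return max(A, B) + math.log(1.0 + math.exp(-abs(A - B)))

# Pre-computed correction table fc(|A-B|), sampled in steps of 0.5;
# beyond 4.0 the correction term is negligible and treated as zero.
STEP = 0.5
LUT = [math.log(1.0 + math.exp(-i * STEP)) for i in range(8)]

def max_star_lut(A, B):
    """Approx-Log-BCJR: add-compare-select with a LUT correction."""
    idx = int(abs(A - B) / STEP)              # compare + select
    corr = LUT[idx] if idx < len(LUT) else 0.0
    return max(A, B) + corr                   # add

a, b = 1.2, 0.7
exact = max_star_exact(a, b)
approx = max_star_lut(a, b)
assert abs(exact - math.log(math.exp(a) + math.exp(b))) < 1e-12
assert abs(approx - exact) < 0.2              # small approximation error
```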
To present the Log-BCJR algorithm, we use the convolutional code of the UMTS Turbo code as an example. Figure 2.12 shows the example trellis diagram provided in Section 2.2. yn are the systematic bits and cn are the encoded bits. There are three tail bits used for termination, as shown in the trellis, driving the encoder back into the “all zero” state, State1. Note that we use the notations for decoder 1 in Figure 2.11 here; by simply replacing the sequences y and c with z and b, the same decoding trellis
can also apply to decoder 2 in Figure 2.11.
Figure 2.12: An example trellis of a short terminated trellis code.
In the trellis diagram, there are 16 possible transitions per step, except in the three initial steps at the start and the three termination steps at the end. For a certain input systematic sequence yn and the corresponding
encoded sequence cn. only one transition is used in each step in a encoding trellis, as
exemplified in Figure 2.12. The corresponding systematic bit and encoded bit of each
transition is also given in the figure. To represent each state in Figure 2.12, we use
the notation {S1, S2, S3, ..., S38} to denote the possible states in the trellis, following the order from top to bottom and from left to right, as shown in the figure. Similarly, we use the notation {T1, T2, T3, ..., T60} to denote each possible transition in the trellis, following the same order. In addition, we use the notation t_n to represent the transition employed in the encoder trellis for the nth bit. Similarly, s_n is the state entered by the encoder after the nth bit. Therefore, for the example sequences y and c in Figure 2.12, we have the traced transitions {t_n}_{n=1}^{N_y} = {T1, T4, T12, T28, T46, T54, T58, T60} and the traced states {s_n}_{n=0}^{N_y} = {S1, S2, S6, S14, S23, S31, S35, S37, S38}. For describing the algorithm, we define the following notations.
• fr(T ) is the starting state of the transition T . For example, in Figure 2.12,
fr(T1)=S1 and fr(T3)=S2.
• to(T ) is the ending state of the transition T . For example, in Figure 2.12, to(T1)=S2
and to(T2)=S3.
• fr(S) is the aggregate of all the transitions starting from state S. For example, in Figure 2.12, fr(S2) = {T3, T4}.
• to(S) is the aggregate of all the transitions ending at state S. For example, in Figure 2.12, to(S38) = {T59, T60}.
• y(T ) is the value for the bit in y that is implied by the transition T . For example,
in Figure 2.12, y(T1) = 0 since t1 = T1 implies that y1 = 0. Similarly, y(T4) = 1.
• c(T ) is the value for the bit in c that is implied by the transition T . For example,
in Figure 2.12, c(T1) = 0 and c(T2) = 1.
• n(T ) is the bit index associated with the transition T . For example, in Figure 2.12,
n(T1) = n(T2) = 1 and n(T3) = n(T4) = n(T5) = n(T6) = 2.
With the notations above, we shall now describe the Log-BCJR algorithm. The ultimate purpose of the algorithm is to calculate the extrinsic LLRs of the decoded sequence y^e. However, the algorithm more directly calculates the probability that the encoder traversed a specific transition in the trellis. The calculation of the extrinsic LLRs y^e requires the calculation of three further groups of internal variables: γ, α and β.
• The γ values are conditional transition probabilities. In our case, the γ values are divided into two sub-groups, the a priori transition probabilities γ^y and the channel transition probabilities γ^c. They correspond to the transitions in the trellis. For each transition in each step, there is a γ^y(T) and a γ^c(T). γ^y(T) represents the probability ln[P(t_{n(T)} = T | y^a_{n(T)})], while γ^c(T) represents the probability ln[P(t_{n(T)} = T | c^c_{n(T)})].
• The α values correspond to the states in each step of the trellis. α(S) is the conditional probability that, in step n (i.e. when the decoding process is working on the trellis step corresponding to the received y^a_n and c^c_n), the traversed transition starts from the particular state S. That is, α(S) represents the probability ln[P(s_{n(S)} = S | {y^a_n}_{n=1}^{n(S)}, {c^c_n}_{n=1}^{n(S)})].
• A β value, on the other hand, is the conditional probability that a traversed transition ends at a particular state. That is, β(S) represents the probability ln[P(s_{n(S)} = S | {y^a_n}_{n=n(S)+1}^{N_y}, {c^c_n}_{n=n(S)+1}^{N_c})].
Finally, the three groups of variables can be used to calculate the probability that the encoder traversed a specific transition T in the trellis. We use δ to represent such a probability. For calculating the extrinsic information, δ^y is considered here, where δ^y(T) represents the probability ln[P(t_{n(T)} = T | {y^a_n}_{n=1, n≠n(T)}^{N_y}, {c^c_n}_{n=1}^{N_c})]. It is the joint probability of the corresponding γ^c, α and β of the transition T.
For computing all the variables above, the Log-BCJR algorithm is composed of the following five steps.
1. γ calculation: The values of γ depend on the inputs of the convolutional decoder. There are two inputs, the encoded LLRs and the uncoded LLRs. As shown in Figure 2.7, the encoded LLR input comprises the LLRs of the encoded sequence received from the channel, c^c_n. The uncoded LLR input is y^a. For a transition T, γ^y and γ^c can be calculated as:

γ^y(T) = (1 − y(T)) · y^a_{n(T)}    (2.14)

γ^c(T) = (1 − c(T)) · c^c_{n(T)}    (2.15)
2. α calculation: The values of α depend on the γ values and the α values from the previous step in the trellis. Hence, a forward recursion through the trellis is required to obtain all the α values. For a state S in step n, the function to calculate α is:

α(S) = max*_{T∈to(S)} [γ^y(T) + γ^c(T) + α(fr(T))]    (2.16)

where α(S1) = 0.
3. β calculation: The values of β depend on the γ values and the β values from the next step in the trellis. Hence, a backward recursion through the trellis is required to obtain all the β values. For a state S in step n, the function to calculate β is:

β(S) = max*_{T∈fr(S)} [γ^y(T) + γ^c(T) + β(to(T))]    (2.17)

where β(S38) = 0.
4. δ^y calculation: The values of δ^y can be calculated according to (2.18):

δ^y(T) = γ^c(T) + α(fr(T)) + β(to(T))    (2.18)

5. Finally, the extrinsic information can be calculated based on the δ values. The extrinsic LLRs of the uncoded bits, y^e, are:

y^e_n = max*_{T | y(T)=0} [δ^y(T)] − max*_{T | y(T)=1} [δ^y(T)]    (2.19)

This completes the algorithm.
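The five steps above can be sketched end-to-end in software. The following Python fragment is an illustrative floating-point sketch only: it uses the exact Jacobian logarithm instead of a LUT, and a generic transition-list trellis description of our own devising rather than the UMTS trellis of Figure 2.12.

```python
import math

NEG_INF = float("-inf")

def max_star(a, b):
    """Exact Jacobian logarithm; hardware would replace the log term by a LUT."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log(1.0 + math.exp(-abs(a - b)))

def log_bcjr(transitions, n_states, ya, cc):
    """transitions[n] lists trellis step n as tuples (fr, to, y_bit, c_bit).
    ya and cc are the uncoded and encoded input LLRs, ln(P(0)/P(1))."""
    N = len(transitions)

    def gamma(n, y, c):
        # (2.14)/(2.15): gamma_y + gamma_c for one transition
        return (1 - y) * ya[n] + (1 - c) * cc[n]

    # forward recursion (2.16); alpha of the initial all-zero state is 0
    alpha = [[NEG_INF] * n_states for _ in range(N + 1)]
    alpha[0][0] = 0.0
    for n in range(N):
        for fr, to, y, c in transitions[n]:
            alpha[n + 1][to] = max_star(alpha[n + 1][to], alpha[n][fr] + gamma(n, y, c))

    # backward recursion (2.17); terminated trellis ends in the all-zero state
    beta = [[NEG_INF] * n_states for _ in range(N + 1)]
    beta[N][0] = 0.0
    for n in range(N - 1, -1, -1):
        for fr, to, y, c in transitions[n]:
            beta[n][fr] = max_star(beta[n][fr], beta[n + 1][to] + gamma(n, y, c))

    # (2.18)/(2.19): delta_y excludes the a priori LLR of the bit itself
    ye = []
    for n in range(N):
        num, den = NEG_INF, NEG_INF
        for fr, to, y, c in transitions[n]:
            d = (1 - c) * cc[n] + alpha[n][fr] + beta[n + 1][to]
            num, den = (max_star(num, d), den) if y == 0 else (num, max_star(den, d))
        ye.append(num - den)
    return ye
```

On a toy two-state trellis (state = previous bit, c = y XOR state) with strong channel LLRs consistent with y = (0, 1, 0), the extrinsic LLRs of the first two bits come out favouring 0 and 1 respectively, as expected.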
2.3 EXIT chart analysis
As we mentioned, the BER chart is a powerful tool to analyse the performance of a
Turbo-like code. However, it is unable to characterise the convergence behaviour of a
Turbo-like code, for example at the onset of the turbo cliff. This requires a different
analysis tool, namely the extrinsic information transfer (EXIT) chart [41]. An EXIT chart uses mutual information (MI) measurements to quantify the quality of the extrinsic information exchanged between the constituent decoders in an iterative decoding system. It comprises two curves, one for each decoder in the system. Each curve plots the mutual information of the extrinsic LLRs versus the mutual information of the a priori LLRs of one decoder in the system, which essentially measures the quality of the input and the output of that decoder. Taking the UMTS decoding
scheme as an example, for the first decoder, the EXIT curve plots I(ae, a) as a function
of I(aa, a) as shown in Figure 2.13, where I(aa, a) is the mutual information between
aa and a, while I(ae, a) is the mutual information between ae and a. For drawing the EXIT curve, we use a simulator to generate sequences of a priori LLRs aa having a range of mutual information values (0 < I(aa, a) < 1). Using simulations that include the channel
model, the modulation model and the BCJR decoder, the extrinsic output ae can be
obtained and measured. If we use I(ae) to represent I(ae, a) and I(aa) to represent
I(aa, a), the EXIT function I(ae) = F (I(aa)) of the UMTS Turbo code is shown in Fig-
ure 2.14. In the simulation, we use the exact convolutional code shown in Figure 2.13,
with BPSK modulation and an AWGN channel. The Signal-to-Noise Ratio (SNR) is -4 dB. The SNR is defined as:

SNR = E_s / N_0    (2.20)
For the other decoder, with the same function another EXIT curve can be drawn based
on the simulation. For a Turbo code, owing to the symmetry of the two concatenated
codes, the EXIT function of the lower convolutional code is identical to that of the upper
Figure 2.13: Scheme of the EXIT chart generation.
Figure 2.14: One EXIT curve I(ae) = F(I(aa)) of the UMTS Turbo code using BPSK to transmit over an AWGN channel having an SNR of -4 dB.
convolutional code. In an EXIT chart, the second curve is displayed with swapped axes, that is, the horizontal axis is the mutual information of the extrinsic output and the vertical axis is the mutual information of the a priori input. The reason for displaying the second curve with swapped axes is that, in the iterative decoding process, the output of one decoder is the input of the other decoder in the next iteration. By putting the input of one decoder and the output of the other decoder on the same axis, the interaction of the two concatenated decoders can be predicted on an EXIT chart. The
complete EXIT chart of the UMTS Turbo code generated by our simulation results is
given in Figure 2.15.

Figure 2.15: EXIT chart of the UMTS Turbo decoder.

The iterative decoding process of the Turbo code can be revealed by decoding trajectories in the EXIT chart, as shown in Figure 2.16.

Figure 2.16: The decoding trajectories in the EXIT chart.

A decoding trajectory starts at the (0,0) point, since at the start of the decoding process there is no a priori
information coming from the other decoder. The mutual information of the output of
the first decoder can be obtained by the upper curve in the EXIT chart and is provided as
the input of the second decoder. Based on the mutual information provided by the first
decoder, the mutual information of the output of the second decoder can be obtained
by the lower curve in the EXIT chart. The decoding performance of the next iteration
can be obtained in the same way. Thus, a decoding trajectory can be obtained. Note that the condition for the decoding trajectory to have a high probability of reaching the (1,1) point is that the EXIT chart has an open tunnel. By reaching the (1,1) point, the maximum likelihood decoding result has been found and the BER will be in the error floor region. However, the EXIT chart is a statistical result obtained from a large number of simulated samples; in practice, individual trajectories vary from the EXIT chart. As shown in Figure 2.16, the three trajectories are all different and depart from the EXIT chart.
The EXIT chart gives the average convergence behaviour of the investigated code. An EXIT chart allows the two concatenated codes to be considered in isolation from each other. Since EXIT charts can predict the iterative interaction of the two codes, the iterative decoding process does not need to be simulated in order to draw an EXIT chart. Thus, EXIT charts can be obtained faster than BER/FER charts.
There are a number of different methods for measuring the mutual information. The first method, the averaging method, uses the equation:

I(a, a) = 1 + (1/N_a) Σ_{n=1}^{N_a} Σ_{a'=0}^{1} [e^{(1−a')·a_n} / (1 + e^{a_n})] · log2[e^{(1−a')·a_n} / (1 + e^{a_n})]    (2.21)
This method has the advantage of not requiring any knowledge of the bit sequence a. This is achieved by assuming that the LLRs in a satisfy the consistency condition, that is, the LLRs express neither too much nor too little confidence. Since the averaging method “believes” what the LLRs say, it does not need to consider the true values of the bits in a. However, this assumption is only valid if there are no
sub-optimalities in the receiver design. This requires perfect channel estimation, perfect
carrier recovery, perfect synchronisation, perfect equalisation and optimal decoding using
the Log-BCJR algorithm. The histogram method of measuring mutual information does
not make the described assumption and is therefore better suited when a sub-optimal
receiver is employed. This method uses knowledge of the true values of the bits in a to
avoid having to “believe” what the LLRs say.
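The averaging method can be sketched in a few lines of Python. This is an illustrative fragment rather than the exact simulator used for this report: each LLR implies a bit probability under the consistency assumption, its binary entropy is averaged, and the result is subtracted from 1, as in (2.21).

```python
import math

def _plogp(p):
    """p * log2(p), continuously extended to 0 at p = 0."""
    return p * math.log2(p) if p > 0.0 else 0.0

def mi_averaging(llrs):
    """Averaging-method MI estimate: each LLR a = ln(P(0)/P(1)) implies
    P(bit = 0) = e^a / (1 + e^a); no knowledge of the true bits is needed."""
    total = 0.0
    for a in llrs:
        p0 = 1.0 / (1.0 + math.exp(-a))  # stable form of e^a / (1 + e^a)
        total += _plogp(p0) + _plogp(1.0 - p0)
    return 1.0 + total / len(llrs)
```

All-zero LLRs (no confidence) give a mutual information of 0, while very large-magnitude LLRs give a value approaching 1.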
2.4 Fixed-point representation in a Turbo decoder
In this section, we give an introduction to fixed-point representation in hardware design. Fixed-point representation, compared with floating-point representation, is easily implemented in a small memory space and is fast to execute. Therefore, it is well-suited to
real-time or low-power applications. Internally, fixed-point computations treat the values as integers, but the integer part and the fraction part are considered separately, divided by an imaginary point.
Two’s complement representation is the most widely used fixed-point representation
in practice. A two’s complement binary number is divided into three parts, a sign
bit, an integer part and a fraction part. First, let us consider the two’s complement
representation of signed integers before considering the representation of numbers having
a fraction part. The most significant bit is used as the sign bit, where 0 is used to
represent positive signs and 1 is to represent negative signs. The rest of the bits represent
the magnitude of the number. For a negative number, the two's complement representation is obtained by complementing its magnitude bit by bit and incrementing by 1. For example, the 3-bit representation of 2 is 010. The complement of this is 101. Adding 1 to this gives the two's complement representation of -2, namely 110. The complete set
of 3-bit two’s complement representations is given in Table 2.1. In addition, another
two signed integer representation methods, sign and absolute value notation and one’s
complement notation, are also given as examples in the table for comparison. As shown,
Binary number           | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111
Sign and absolute value | +0  | +1  | +2  | +3  | -0  | -1  | -2  | -3
One's complement        | +0  | +1  | +2  | +3  | -3  | -2  | -1  | -0
Two's complement        | +0  | +1  | +2  | +3  | -4  | -3  | -2  | -1
Table 2.1: Different representation methods for integer numbers
compared with the other two methods, two's complement notation avoids the double representation of zero. As a consequence, the range of negative values exceeds the range of positive values by one resolution step. The main advantage of two's complement notation is the ability to perform the addition of negative numbers without needing to take the signs of the operands into consideration. In two's complement notation, subtraction is achieved by complementing and adding. For example, in 3-bit representation, 2−3 can be computed as the sum of 2 (010) and -3 (101).
2 − 3 = 2 + (−3) = 010 + 101 = 111 = −1 (2.22)
For the subtractions in two’s complement notation, letting the result overflow is neces-
sary. Take the following calculation as an example:
3 − 3 = 3 + (−3) = 010 + 110 = (1)000 = 000 = 0 (2.23)
Since the overflowed part is lost, the calculation gives the correct result naturally. In
contrast, the subtraction in the other two notation methods is more complicate since
complement and adding does no give the correct result. For example, in one’s comple-
ment notation:
3 − 2 = 3 + (−2) = 011 + 101 = 000 = 0 (2.24)
Chapter 2 Turbo-like Code Solutions in BANs 33
Therefore, the addition involve different signed components need to be considered care-
fully, extra correction is required.
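The wrap-around behaviour used in (2.22) and (2.23) can be mimicked in software by masking off the overflow carry. This Python sketch uses a 3-bit width to match the examples; the helper names are our own.

```python
def to_twos(value, bits=3):
    """Encode a signed integer as an n-bit two's complement pattern."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return value & ((1 << bits) - 1)

def from_twos(pattern, bits=3):
    """Decode an n-bit two's complement pattern back to a signed integer."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

def add_wrap(a, b, bits=3):
    """Add two patterns and discard the overflow carry, as in (2.23)."""
    return (a + b) & ((1 << bits) - 1)
```

With these helpers, 2 − 3 reduces to add_wrap(010, 101), whose pattern 111 decodes to −1, and 3 − 3 wraps to 000 with the carry lost, as in the worked examples.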
For a fractional fixed-point number A, an imaginary point is set at a certain position. An example of this method, for a 3-bit two's complement fixed-point number with a 2-bit fraction part, is given in Table 2.2. In the table, the imaginary point is placed after the most significant bit in the binary representation. For n-bit two's complement representation,
Binary number    | 0.00  | 0.01  | 0.10  | 0.11  | 1.00 | 1.01  | 1.10 | 1.11
Two's complement | +0.00 | +0.25 | +0.50 | +0.75 | -1   | -0.75 | -0.5 | -0.25

Table 2.2: Two's complement representation method for fraction numbers
we use the notation Qp.q to represent the point setting, where p represents the number of bits in the integer component and q represents the number of bits in the fraction component. The total number of bits is n = p + q + 1. For example, consider an 8-bit two's complement number with an imaginary point after the 5th bit: 01100.010. The integer part is 12 and the fraction part is 0.25, thus the value of 01100.010 is 12.25. The maximum and minimum limits of the representation are given by (2.25), and the resolution r is given by (2.26).

−2^{p+q} / 2^q ≤ A ≤ (2^{p+q} − 1) / 2^q    (2.25)

r = 2^{−q}    (2.26)
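A Qp.q quantiser following (2.25) and (2.26) can be sketched in a few lines of Python. Saturating at the range limits is an assumption of this sketch; a design might instead wrap.

```python
def quantise(value, p, q):
    """Quantise a real number to the Qp.q two's complement grid
    (n = p + q + 1 bits), saturating at the limits of (2.25)."""
    lo = -(2 ** p)                      # = -2^(p+q) / 2^q
    hi = (2 ** (p + q) - 1) / 2 ** q    # largest representable value
    r = 2.0 ** -q                       # resolution, from (2.26)
    return round(min(max(value, lo), hi) / r) * r
```

For instance, quantising 12.3 to Q4.3 lands on 12.25, matching the 01100.010 example above, while values outside the range saturate to the Q4.3 limits of -16 and 15.875.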
Chapter 3
Optimal Data-width Settings for
Fixed-point Implementation
3.1 Introduction
In Turbo-like decoding schemes, the algorithms are usually specified in the floating-
point domain. However, in practical implementations, for energy efficiency, a fixed-
point number representation is mandatory for most architectures, such as DSP systems,
FPGA or VLSI implementations [42], since fixed-point implementation allows significant
energy consumption reductions, with only insignificant reductions in performance [43].
As discussed in Chapter 2, one of the advantages of the Log-BCJR algorithm is the
reduced dynamic range of the internal variables and the LLRs. In practice, this allows
a fixed point representation to be used. In fixed-point implementation, the hardware
complexity increases linearly with the internal bit-width representation of the data, since the bit-width of the representation determines the bit-width of all the data buses and of the computing resources in the datapath structure [35]. Moreover, the iterative decoding process of Turbo-like coding schemes requires a large amount of memory space to store the internal variables. Using fewer bits for each variable can significantly reduce the memory requirement and hence reduce the energy consumption of the decoder. Therefore, for a low power implementation, minimising the number of bits required for representing the fixed-point quantities in the algorithm is a very important issue. However, the information lost by reducing the data width degrades the performance. Therefore, there is a trade-off between communication performance and hardware complexity, which needs to be explored for a low power design.
Many papers have investigated the fixed-point implementation issues of Turbo decoders by exploring the minimum data width of the different quantities with acceptable degradation on the BER/FER chart [42–50]. However, no universal conclusion has been obtained. Even though some of the papers used the same simulation specification,
namely the UMTS Turbo decoder with BPSK modulation simulated in an AWGN channel, the conclusions are different [42, 43, 45, 47, 49, 50]. The reason is that, in fixed-point implementation, there are different issues that affect the decoding performance, and different techniques to deal with these issues.
The performance degradation caused by fixed-point implementation is due to lost information, that is, underflow and overflow. For underflow, the fraction bit-width limits the accuracy of the calculations in the algorithm. In particular, in our case, since we investigate the Log-BCJR algorithm [51] using a Look-Up Table (LUT) to realise the Jacobian logarithm, the precision of the fixed-point representation is directly related to the number of elements in the LUT. As discussed in Chapter 2, the max* operator in the Jacobian algorithm is defined as:

max*(x, y) = ln(e^x + e^y)    (3.1)
           = max(x, y) + ln(1 + e^{−|y−x|})    (3.2)
           = max(x, y) + fc(|y − x|)    (3.3)
Function fc is a quantised version of the function ln(1 + e^{−|y−x|}), which is implemented by a LUT. Therefore, the bit-width of the fraction part determines the largest number of elements in the LUT, as shown in Figure 3.1. For example, a 3-bit fraction part gives the LUT a resolution of 0.125, which makes the largest possible LUT have 7 elements. By this analogy, a 2-bit fraction part gives a 4-element LUT and a 1-bit fraction part gives only a 2-element LUT, as shown in Figure 3.1.

Figure 3.1: The correction function and its LUT implementations for 1-, 2- and 3-bit fraction parts.

Hence, the fraction bit-width affects not only the width of the data buses, the computing resources and the memory requirement as discussed, but also the complexity of the LUT used in the algorithm.
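The shrinking of the LUT with the fraction bit-width can be explored numerically. In this Python sketch the correction function is sampled and rounded at the fraction resolution; note that the exact element counts depend on the quantiser design (rounding mode, input range), so the counts it produces are illustrative rather than a reproduction of Figure 3.1.

```python
import math

def quantised_fc_levels(frac_bits, max_input=4.0):
    """Distinct non-zero values that fc(d) = ln(1 + e^-d) can take when both
    the input d and the output are rounded to the fraction-part resolution."""
    r = 2.0 ** -frac_bits
    levels = set()
    steps = int(max_input / r)
    for i in range(steps + 1):
        v = round(math.log(1.0 + math.exp(-i * r)) / r) * r
        if v > 0.0:
            levels.add(v)
    return sorted(levels)

for bits in (1, 2, 3):
    print(bits, quantised_fc_levels(bits))
```

Whatever the quantiser details, the trend is monotonic: more fraction bits can only enlarge the set of distinct correction values the LUT must store.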
The occurrence of overflow depends on the dynamic range of the variables in the algorithm and the number of bits used in the integer part of the fixed-point representation. In the event of overflow, the lost information can be fatal to the system performance. However, the dynamic range of the variables is difficult to predict and is sometimes quite large, requiring a large number of bits in the fixed-point representation to guarantee that the range is covered. In a Log-BCJR decoder, there are only three different operations, “add”, “compare” and “select”, referred to above as the ACS operations. The “compare” and “select” operations cannot induce any overflow. However, any “add” operation can overflow. Taking the Log-BCJR algorithm used in the UMTS decoder in Chapter 2 as an example, in the decoding trellis each α is the sum of two γ values, an α from the previous step and a correction function value fc from (3.3). Since it includes an α from the previous step, the calculation of α forms an accumulation of the α values along the trellis. Therefore, the α values would increase without limit as the block length is increased. The resulting overflow in a limited data width is the most significant effect that needs to be considered. The calculation of β has the same problem. The δ calculation is the sum of an α, a β and a γ, so it can also overflow. To deal with these issues, a number of different techniques have been proposed [45, 48].
The first approach is to saturate the overflowing data during processing. This method is widely used in fixed-point digital filters [52, 53]. A disadvantage of this approach is that it requires additional saturation hardware on each computing unit that could cause an overflow, such as the adders. Our simulation results showed that this technique is not suitable for the Log-BCJR algorithm alone, but can work well in collaboration with a second technique, namely normalisation [45].
Normalisation is applied in the Log-BCJR algorithm particularly for dealing with overflow of the α and β internal variables. It scales down the increasing metrics in each step, in order to prevent them from increasing without bound. This reduces the occurrence of overflow and allows the data width for representing the variables to be further reduced. As discussed, the α and β values accumulate along the decoding trellis. Taking α as an example, each α is the sum of a previous α, two γ values and a correction function value. For each α, there is an accumulation history route in the trellis; an example is shown in Figure 3.2. Based on the algorithm described in Chapter 2, α(S4), α(S5), α(S6) and α(S7) accumulate from α(S2) and α(S3), which in turn accumulate from α(S1). This accumulation continues as the forward recursion proceeds, with subsequent α values typically becoming higher and higher. In this way, overflow can occur for the α values calculated towards the end of the forward recursion. However, the extrinsic LLRs that are generated by the Log-BCJR are not sensitive to the particular value of any α value, only to the differences between the α values of states having the same bit index [48]. For example, the Log-BCJR is not sensitive to the values of α(S4), α(S5), α(S6) and α(S7),
only to the differences α(S4)−α(S5), α(S4)−α(S6), α(S4)−α(S7) and so on. The same conclusion can also be applied to the β values. As shown in (2.19), the basic operation of the extrinsic LLR calculation is to select two δ values and calculate their difference, where each δ is the sum of a particular α, β and γ in a single step.

Figure 3.2: A possible accumulation route in the trellis.

Therefore, if the α values from the previous step are all reduced by a common value before the α values of the current step are calculated, the concerned differences remain unchanged, but the growth of the α values is slowed down. The same method also works for the β values. The normalisation technique achieves this purpose. However, the normalisation process requires extra calculations and operations to realise, which increases the datapath complexity.
In addition, the normalisation technique itself has different variants. The most widely used normalisation technique is subtractive normalisation [45, 48, 54]. The path metrics are normalised by subtracting a constant from all the metrics at particular times. Even this method has different versions. In [45], the path metrics are reduced by the respective minimum metric in each step. In [48], the path metrics are reduced by the maximum metric in the step. This technique requires extra computations to find the required path metric and perform the subtractions. In [54], a modified version is mentioned in which, instead of searching for the smallest or largest metric at each step, a fixed state's metric is subtracted from all path metrics. Hence, the comparison operations for finding the required metric can be avoided. All these different versions end up with different data width requirements in the papers' conclusions.
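A maximum-subtracting normalisation step (in the spirit of [48]; as noted above, the exact variant differs between the cited papers) can be sketched as:

```python
def normalise(metrics):
    """Subtractive normalisation of one trellis step's path metrics:
    subtracting the step maximum preserves every pairwise difference,
    which is all the Log-BCJR output depends on, while bounding growth."""
    m = max(metrics)
    return [a - m for a in metrics]
```

After normalisation the largest metric in the step is always 0, so the metrics no longer drift upwards as the forward recursion proceeds, while all differences between metrics, and hence the extrinsic LLRs, are unchanged.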
The third approach exploits the nature of two's complement representation [45]. It was first introduced for Viterbi decoders [55] and later applied to SISO decoders. The Log-BCJR decoding process is only concerned with the differences between the path metrics. It can be proven that all possible differences between pairs of path metrics are upper bounded [55]. Therefore, in two's complement representation, as long as the difference between two metrics does not exceed the largest value that can be represented by the specified data width, the subtraction can be performed correctly using modulo 2^n arithmetic by simply ignoring the overflow of the operands. Three examples of difference calculation using this method are given in Figure 3.3. Note that in the calculations of (1+3)-2 and (2+2+2)-3, the results in the brackets both overflow in 3-bit two's complement representation, but the equivalent calculation in two's complement representation still gives the correct answer as long as the final difference does not overflow. However, for the third calculation, the difference in the last calculation step overflows. In this situation, two's complement representation cannot keep the result correct, as shown in the
figure.

Figure 3.3: Examples of difference calculation in two's complement representation:
(1+3)−2 = 2:   (001+011)−010 = 100−010 = 100+110 = 010 = 2 (correct)
(2+2+2)−3 = 3: (010+010+010)−011 = 110−011 = 110+101 = 011 = 3 (correct)
(2+2+2)−1 = 5: (010+010+010)−001 = 110−001 = 110+111 = 101 = −3 (overflowed, incorrect)

The modulo 2^n arithmetic is naturally implemented in VLSI architectures. Thus, no additional hardware is required in this approach. According to [45],
1–2 bits more data width may be required for this approach compared with subtractive normalisation, and it is shown that, for a high-speed MAP decoder, one additional bit results in approximately 25% higher area and power consumption.
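The modulo-2^n trick of Figure 3.3 can be mimicked in software with bit masking. This Python sketch uses n = 3 to match the figure's examples; the helper names are our own.

```python
BITS = 3
MASK = (1 << BITS) - 1

def wrap_add(a, b):
    """Addition with the overflow carry discarded (modulo 2^n)."""
    return (a + b) & MASK

def wrap_diff(a, b):
    """Difference of two wrapped metrics, read back as a signed n-bit value.
    Correct whenever the true difference lies in [-2^(n-1), 2^(n-1) - 1]."""
    d = (a - b) & MASK
    return d - (1 << BITS) if d & (1 << (BITS - 1)) else d

m = wrap_add(1, 3)                   # (1+3) wraps into the signed range
print(wrap_diff(m, 2))               # 2: difference still correct
m = wrap_add(wrap_add(2, 2), 2)      # (2+2+2)
print(wrap_diff(m, 3))               # 3: still correct
print(wrap_diff(m, 1))               # -3: true difference 5 is out of range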
In conclusion, to implement a Turbo decoding algorithm in fixed-point representation, different choices among the related techniques lead to different optimal data width requirements. In the eight similar previous works we investigated [42–47, 49, 50], the different environment, design and implementation configurations led to different conclusions. Some of them did not even provide a clear configuration of their simulations, which makes the results unrepeatable. A brief summary of the configurations of the eight papers is given in Table 3.1. Three similar Turbo codes are considered in the papers, as shown in Figure 3.4. In the figure, Type-2 corresponds to the UMTS Turbo encoder discussed in Chapter 2.
Only a few papers discussed the effects of this issue based on mathematical proofs [45, 47, 48]. However, it turns out that mathematical proofs are not sufficient to decide the optimal data width specifications in practice. Some of the mathematical proofs give upper bounds on the path metrics that are never actually reached in practice [48]. Moreover, when saturation and normalisation techniques are applied, the data width requirement can be further reduced with a tolerable decrease in communication performance (i.e. BER/FER degradation) [45]. Indeed, our simulation results show that the actual dynamic range
Figure 3.4: Three different Turbo codes (Type-1, Type-2 and Type-3) in previous works.
authors             | J. Hsu [44] | G. Montorsi [47] | H. Michel [42] | A. Worm [45]
encoder             | Type-1      | Type-2           | Type-2         | Type-2
modulation          | BPSK        | BPSK/PAM         | BPSK           | BPSK
channel             | AWGN        | AWGN             | AWGN/Rayleigh  | AWGN/Rayleigh
interleaver         | helical     | N/A              | N/A            | 3GPP compliant
block length (bit)  | 216         | 4828             | 600            | 600
iteration times     | 5           | 10               | 5/10           | 5/7/10
normalisation       | Yes         | N/A              | N/A            | Yes
wrapping/saturation | N/A         | saturation       | N/A            | saturation
Look-Up-Table       | 16 elements | 22 elements      | 7/10 elements  | N/A

authors             | M. A. Castellon [43] | M. A. Castellon [50] | T. K. Blankenship [46] | R. Hoshyar [49]
encoder             | Type-2      | Type-2     | Type-3         | Type-2
modulation          | BPSK        | BPSK       | BPSK           | BPSK
channel             | AWGN        | AWGN       | AWGN/Rayleigh  | AWGN/Rayleigh
interleaver         | block prime | N/A        | N/A            | ideal
block length (bit)  | 1024        | N/A        | 640            | 2896
iteration times     | 3/8         | 5/8        | N/A            | 7/18 half
normalisation       | N/A         | N/A        | N/A            | Yes
wrapping/saturation | saturation  | N/A        | N/A            | N/A
Look-Up-Table       | 2 elements  | 7 elements | 2/4/8 elements | N/A

Table 3.1: Configurations of the fixed-point implementations in the eight previous works
used in a fixed-point implementation can be less than the theoretical bounds predicted by mathematical analysis. As a result, the data-width decisions for a decoding algorithm cannot be made based on mathematical analysis alone. Traditional BER/FER chart simulation is time consuming. Sometimes different types of variables in a decoding scheme have different optimal data-widths, which induces a large number of combinations to be tested when using simulation to find the optimal settings. If the effects of using different techniques are also considered, the simulation required for drawing the BER/FER charts becomes unacceptable. In addition, BER/FER chart analysis does not give an insight into the iterative decoding convergence process. Hence, to fully investigate the
optimal data width in fixed-point implementation of a decoding algorithm, we propose
a method based on EXIT chart [41] analysis to determine the optimal fixed point spec-
ification of a Turbo-like decoder in practical implementations. Our method is less time consuming than that of previous works using BER/FER charts for the same analysis. Moreover, our results showed that the EXIT chart provides more useful information than the BER/FER chart when determining the optimal fixed-point specification. Instead of only giving the performance result, the EXIT chart shows the convergence behaviour
of the decoder, and the reasons for the performance degradation caused by insufficient bit-width can be analysed. Hence, the proper technique to prevent the degradation can be introduced to further optimise the system. To present our method, we investigate the 3GPP UMTS Turbo decoder [39]; the optimal data width specifications for its fixed-point implementation are concluded and compared with previous works. It is easy to apply this method to any Turbo code and potentially to any Turbo-like code involving an iterative decoding scheme that can be analysed by an EXIT chart.
As introduced in Chapter 2, EXIT chart analysis is a powerful tool for analysing and optimising the convergence behaviour of iterative systems such as Turbo-like decoders. Unlike BER/FER simulation, it is less time consuming, since neither the interleaver in the decoder nor the actual iterative decoding process needs to be simulated. Although the performance effects of a sub-optimal interleaver cannot be revealed, an interleaver only changes the order of the information sequence, and no information is lost in that process through fixed-point implementation. Since our purpose is to investigate the performance degradation caused by fixed-point implementation, omitting the interleaver does not affect the results of our method. Moreover, a BER plot can only give the performance after a particular number of decoding iterations, whereas an EXIT chart traces the convergence behaviour of the decoder, allowing an arbitrary number of iterations to be considered. Our results show that EXIT chart analysis not only quantifies the performance but also identifies the causes of the performance degradation introduced by fixed-point implementation, so that an appropriate combination of techniques can be chosen to improve the performance. One drawback is that an EXIT chart only considers a fixed Signal-to-Noise Ratio (SNR), whereas a BER/FER chart covers a wide range of SNR or Eb/N0 values. However, since EXIT simulation is far less time consuming than BER/FER simulation, it is feasible to draw EXIT charts at several SNRs if necessary. To sum up, the EXIT chart is more suitable than the BER/FER chart for finding the optimal data-width setting of a fixed-point decoding scheme. We have also tested an SNR where the tunnel is narrow and the performance is most sensitive to the limitations of the fixed-point representation.
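The a priori mutual information plotted on an EXIT chart can be estimated directly from a sequence of LLRs. The sketch below is our own illustration, not code from this report: it uses the standard time-average estimator I ≈ 1 − E[log2(1 + e^(−x·L))] together with the "consistent Gaussian" a priori model commonly used when drawing EXIT transfer curves; all function names are ours.

```python
import math
import random

def mutual_information(bits, llrs):
    """Estimate I(X;L) for BPSK-mapped bits x in {+1,-1} and LLRs L,
    using the ergodic estimator I ~ 1 - E[log2(1 + exp(-x*L))]."""
    total = 0.0
    for b, L in zip(bits, llrs):
        x = 1 - 2 * b  # map bit 0 -> +1, bit 1 -> -1
        total += math.log2(1.0 + math.exp(-x * L))
    return 1.0 - total / len(bits)

def gaussian_a_priori(bits, sigma, rng):
    """Generate 'consistent' Gaussian a priori LLRs with mean sigma^2/2,
    the standard model used when sweeping an EXIT transfer curve."""
    return [(sigma ** 2 / 2.0) * (1 - 2 * b) + rng.gauss(0.0, sigma)
            for b in bits]

rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(100000)]
# sigma near 0 carries almost no information; a large sigma approaches I = 1
print(mutual_information(bits, gaussian_a_priori(bits, 0.01, rng)))
print(mutual_information(bits, gaussian_a_priori(bits, 8.0, rng)))
```

Sweeping sigma and measuring the extrinsic mutual information at the decoder output in this way produces one transfer curve of the chart.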
The idea of using the EXIT chart to analyse the impact of finite-precision arithmetic on Turbo codes was first introduced in [56]. However, no convincing analysis procedure or conclusion was given there. In this chapter, we introduce for the first time a detailed method of using the EXIT chart to determine the optimal data-width specification for fixed-point implementations of Turbo-like decoders, presenting a full investigation of the UMTS Turbo decoder [39]. In Section 3.2, to demonstrate our method, we use it to select the optimal data-width specification for the UMTS Turbo decoder with a comprehensive consideration of fixed-point implementation techniques. The conclusions are then compared with previous works. Our conclusions are summarised in the last section.
3.2 Fixed-point EXIT chart analysis of the UMTS Turbo Decoder
To present our method, we use EXIT chart analysis to investigate the fixed-point effects on the UMTS Turbo decoder implemented with the Log-BCJR algorithm using the Jacobian logarithm. The specification and structure of the UMTS encoder and decoder are presented in Chapter 2. In our simulations, BPSK modulation over an AWGN channel is assumed. We first simulate at a noise level of SNR = -4 dB, for which the EXIT chart of the UMTS Turbo code has a moderately open tunnel, so that performance degradation is easy to observe. We also chose SNR = -4.83 dB, where the tunnel is almost closed (i.e. the onset of the Turbo cliff) and the performance is most sensitive to the limitations of the fixed-point representation, to validate our optimal data-width specification. Random bit sequences are applied to the input of the Turbo encoder. We use a 453-bit frame length (i.e. interleaver length), which is the geometric mean of the minimum and maximum block lengths of the UMTS standard. Since the performance degradation of Turbo codes is proportional to the block length in the logarithmic domain, we use this length to investigate the optimal data-width specification. The shortest (40-bit) and longest (5114-bit) frame lengths in the UMTS standard are then simulated under the optimal specification, to investigate the effect of the frame length on the performance. In addition, we gathered the conclusions of eight previous works [42–47, 49, 50] as a comparison to demonstrate the validity of our work.
Firstly, the effects of three BCJR algorithm variants on the EXIT chart are simulated in floating-point representation: the Log-BCJR using exact calculation of the Jacobian logarithm, the Log-BCJR using an 8-element look-up table (LUT) for the Jacobian logarithm, and the Max-Log-BCJR [57]. It has been shown that the performance loss of the LUT-based Jacobian logarithm relative to the exact calculation is less than 0.1 dB, which is usually considered acceptable [54]. The performance degradation of the Max-Log-BCJR UMTS Turbo decoder is also well explored. According to [58], the Eb/N0 degradation at a BER of 10^-5 between the Log-BCJR and the Max-Log-BCJR is 0.3 dB for a 640-bit block length and 0.54 dB for a 5114-bit block length in an AWGN channel, and worse in a Rayleigh fading channel, which is considered significant (not acceptable). In our analysis, we aim to obtain a fixed-point EXIT chart as close to the floating-point Log-BCJR result as possible, and we consider a degradation similar to that of the floating-point Max-Log-BCJR result unacceptable.
Secondly, we investigate the effects of a limited fraction part in the fixed-point representation. Since the fraction part length also limits the resolution of the LUT entries, the number of elements in the LUT is considered as well.
Thirdly, we investigate the effects of a limited integer part in the fixed-point representation. All three overflow control approaches discussed earlier are investigated.
Chapter 3 Optimal Data-width Settings for Fixed-point Implementation 43
Finally, based on this analysis, the optimal combination of fraction length and integer length is investigated under different block lengths. The effect of termination techniques is also investigated here. The conclusions are given and compared with previous works.
3.3 Simulation and Analysis Results
3.3.1 Comparison between different Logarithm methods
Figure 3.5 gives the EXIT charts of the UMTS Turbo decoder using the three logarithm methods mentioned above. The tunnel between the two curves is narrower for the Max-Log-BCJR due to the information it loses. Therefore, after a given number of decoding iterations, the mutual information is lower than for the Log-BCJR implementation; in other words, more decoding iterations may be required to reach a given target BER. We can therefore assert that the BER degradation due to information lost in the implementation is reflected in the EXIT chart. Our further simulation results confirm this conclusion.
Figure 3.5: EXIT chart of different log algorithms.
3.3.2 Comparison and Analysis in Fixed-point simulation
To analyse the effects of fixed-point representation, fixed-point data types are used for all the variables in our simulations. Later, we will use a long bit-width (32 bits) for the fraction part but a limited bit length for the integer part, in order to investigate the degradation caused by a limited dynamic range. First, however, we consider the opposite, in order to investigate the effect of limited precision with a sufficient integer bit-width (32 bits). Note that the effect on the LUT in the Log-BCJR is also considered here: for an n-bit fraction part, up to 2^n elements are used in the LUT, as described in Section 3.1. The EXIT chart results are shown in Figure 3.6. They show that a 1-bit fraction part with a 2-element LUT gives an observable degradation in the EXIT chart, whereas a 2-bit fraction part with a 4-element LUT gives almost no observable degradation. As shown in
Figure 3.6: EXIT chart of different fraction lengths.
the figure, a 2-bit fraction length gives results indistinguishable from the floating-point result. A 1-bit fraction length also gives an EXIT chart very close to the floating-point result, with much less degradation than the floating-point Max-Log-BCJR. Note that a 0-bit fraction part effectively removes the LUT, transforming the approx-Log-BCJR into the Max-Log-BCJR; its EXIT chart degradation is even worse than that of the Max-Log-BCJR, however, because of the low resolution used for the BCJR variables. Considering the trade-off between energy consumption and performance, both 1-bit and 2-bit fraction parts could be sufficient for most applications; the final decision should be based on the later combined simulations with both fraction and integer lengths limited. BER chart analyses for different fraction lengths are given in [43, 47, 50]. Both [43] and [47] concluded that a 2-bit fraction length approaches the performance of the floating-point decoder and could be chosen as the optimal specification, although [50] stated that a 3-bit fraction length gives better performance, with a penalty of only 0.015 dB. The simulation results of [47] showed that a 1-bit fraction length only causes a loss of 0.1 dB at medium-to-low SNR and has no consequences for the error-floor performance. Of the eight papers we selected, five determined that a 2-bit fraction length is the optimal choice and three chose a 3-bit fraction length.
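The relation between the fraction length and the LUT size discussed above can be illustrated as follows. This is a sketch under our own threshold-based construction (all names are ours), not the implementation used in this work: with an n-bit fraction part, the correction term ln(1 + e^−d) of the Jacobian logarithm can take at most about 2^n distinct quantised values, which reproduces the 2-element LUT for a 1-bit fraction part and the 4-element LUT for a 2-bit fraction part mentioned above.

```python
import math

def build_lut(n_frac):
    """Threshold table for the Jacobian correction term ln(1 + e^-d),
    quantised to a step of q = 2^-n_frac. Entry m-1 holds the largest
    difference d for which the correction still rounds to m*q. Because
    the correction never exceeds ln 2, there are at most ~2^n_frac
    distinct quantised levels (including zero)."""
    q = 2.0 ** -n_frac
    thresholds = []
    m = 1
    while (m - 0.5) * q < math.log(2.0):
        # correction rounds to m*q while d <= this threshold
        thresholds.append(-math.log(math.exp((m - 0.5) * q) - 1.0))
        m += 1
    return thresholds

def max_star(a, b, lut, n_frac):
    """max*(a,b) = max(a,b) + ln(1 + e^-|a-b|), with the correction
    read from the quantised LUT; an empty LUT degenerates to the
    Max-Log approximation, i.e. a plain max()."""
    q = 2.0 ** -n_frac
    d = abs(a - b)
    level = sum(1 for t in lut if d <= t)  # quantised correction level
    return max(a, b) + level * q

lut3 = build_lut(3)  # 3 fraction bits, step 0.125
exact = max(1.0, 1.5) + math.log(1.0 + math.exp(-0.5))
print(len(lut3), max_star(1.0, 1.5, lut3, 3), exact)  # quantised vs exact
```

Under this construction, `build_lut(1)` yields one threshold (two levels including zero) and `build_lut(2)` yields three (four levels), matching the 2-element and 4-element LUTs above.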
To determine the optimal bit-width of the integer part, many papers investigated the optimal settings for the different internal variables (input LLRs, α, β, γ and λ) separately. In practice, however, it is inconvenient to store different variables in memory blocks of different data-widths: using different settings for different variables does not decrease the memory requirement compared with a single uniform data-width setting. Although [42] claimed that further bit-width minimisation for individual variables can reduce the switching activity, and thereby the energy consumption, the contribution of dynamic power to the total power consumption becomes smaller and smaller as process technology scales down, so the benefit of considering the variables separately is reduced. Moreover, such a strategy requires additional extension and clipping mechanisms on the data buses, which increases the design complexity of the datapath. Therefore, we consider a single data-width setting in our analysis. However, it is valuable and necessary to consider the input LLRs and the internal variables of the SISO decoder separately, because the range of the input LLRs directly affects the dynamic range of the internal variables, such as α and β. According to [47], the possible differences between pairs of path metrics, ∆MAX (i.e. the possible difference between the α values or between the β values at a single step of the decoding trellis), which are significantly important in the BCJR decoder as discussed in the last section, are upper-bounded by a function of the dynamic range of the inputs:

∆MAX = min_w ( w Mu + dmin(w) Mc )     (3.4)

where dmin(w) is the minimum weight of the code sequences generated by input sequences of weight w, and ±Mu and ±Mc are the dynamic ranges of the two inputs of the SISO decoder, namely the extrinsic information and the LLRs received from the soft demodulator, respectively. Hence, dmin(w) depends on the considered code, while ±Mu and ±Mc are directly related to the integer bit-width of the input LLRs. As discussed in the previous section, preserving the differences between pairs of metrics is important for maintaining the performance, and different overflow control techniques require different data-widths to guarantee this condition. Therefore, following the discussion in [47], under a tight data-width budget the internal variables typically require a couple more bits than the input LLRs.
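Bound (3.4) can be evaluated numerically once a distance profile is available. The sketch below is illustrative only: the `dmin` values are hypothetical, not those of the UMTS constituent code, and the function name is ours.

```python
def delta_max(Mu, Mc, dmin):
    """Upper bound (3.4) on the spread between path metrics:
    Delta_MAX = min over w of (w*Mu + dmin(w)*Mc), where dmin maps an
    input weight w to the minimum output weight of the code. The dmin
    profile passed in below is illustrative, not a real code's."""
    return min(w * Mu + dmin[w] * Mc for w in dmin)

# hypothetical distance profile of a rate-1/2 recursive convolutional code
dmin = {1: 10, 2: 6, 3: 8}
# Mu, Mc grow as 2**(integer bits): one extra LLR bit doubles the bound,
# which is why the internal variables track the LLR width so closely
print(delta_max(8, 8, dmin), delta_max(16, 16, dmin))
```

Doubling the LLR dynamic range doubles the bound, so each extra integer bit in the LLRs implies roughly one extra bit in the internal metrics.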
As discussed, with different bit-width settings for different variables, the conversion between data of different lengths must be managed carefully. Converting shorter data to longer data causes no problem, since the values remain unaltered: a sign-extension mechanism that simply copies the sign bit into the extra high-order bits solves the problem, is easy to realise in hardware and requires no extra operation in our simulation. However, conversion in the other direction may cause information loss. Moreover, the highest bit in two's complement representation determines the sign of the value, so simply discarding the extra bits during the conversion may not only change the magnitude but also flip the sign of the data, which would significantly affect the correctness of the decoding process. Hence, a clipping mechanism with saturation is required during the conversion. If the original data value is beyond
Figure 3.7: Scheme of UMTS Turbo decoder.
the limit of the target data-width, the converted value must be set to that limit, so that the information loss is minimised. Such a method requires extra hardware in practice and extra operations in our simulation. Figure 3.7 shows the clipping operations used in our decoding scheme to simulate the conversion between different data-widths.
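The clipping-with-saturation step can be sketched in a few lines (function names are ours, not from the implementation); the contrast with naive truncation shows why discarding high-order bits can flip the sign of a value.

```python
def clip(value, n_bits):
    """Saturating conversion to an n_bits-wide two's-complement range:
    values beyond [-2^(n-1), 2^(n-1)-1] are pinned to the nearest limit,
    so at most the magnitude is reduced and the sign is never flipped."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, value))

def truncate(value, n_bits):
    """Naive conversion that just keeps the lowest n_bits: the dropped
    high-order bits can flip the sign, corrupting the LLR."""
    mask = (1 << n_bits) - 1
    v = value & mask
    return v - (1 << n_bits) if v >= 1 << (n_bits - 1) else v

# 200 does not fit in 8 signed bits: clipping keeps it positive at the
# limit, whereas truncation wraps it to a negative value
print(clip(200, 8), truncate(200, 8))
```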
3.3.2.1 Wrapping Technique
We investigated the optimal integer-part bit-width settings under different overflow control techniques using EXIT chart analysis. As discussed before, two's complement representation naturally limits the effect of overflow in the BCJR algorithm, since overflowed data can be considered as wrapping around a circle, so the distance between two values is preserved. The benefit of this wrapping technique is that no extra operation or hardware is required; it is therefore suitable where memory is sufficient or a simple datapath is required. Note that for the input LLRs, saturation is still required, since they have a shorter bit-width than the internal variables and the external input, as discussed before; the wrapping technique is only applicable to the internal variables. We use the notation (LLR:X,VAR:Y) to describe the integer-length setting, where X is the integer length of the input LLRs of a BCJR decoder and Y is the integer length of all the other internal variables. Figure 3.8 shows the EXIT chart results of the settings (LLR:5,VAR:7), (LLR:4,VAR:7) and (LLR:3,VAR:7). The simulation results showed that with setting
Figure 3.8: EXIT chart of different integer lengths with wrapping technique - 1.
(LLR:5,VAR:7), there is almost no degradation in the EXIT chart compared with the floating-point result, while the EXIT chart of setting (LLR:3,VAR:7) clearly fails to create a tunnel to the (1,1) point, which means that the BER of the decoding result would be significantly degraded. The settings (LLR:5,VAR:7) and (LLR:4,VAR:7) also give different results in the EXIT chart analysis, as shown in Figure 3.9, a zoomed-in version of Figure 3.8. For setting (LLR:5,VAR:7), the curve of the EXIT function Ie(Ia)
Figure 3.9: EXIT chart of different integer lengths with wrapping technique - 2.
reaches a peak at a certain Ia and then starts decreasing, closing the tunnel. Although the closing point is very near the (1,1) point, it means the best possible decoding result of (LLR:5,VAR:7) cannot match that of (LLR:4,VAR:7); correspondingly, a BER simulation would show a degradation from (LLR:4,VAR:7) to (LLR:5,VAR:7). Note that the tunnels of (LLR:5,VAR:7) and (LLR:3,VAR:7) close in different ways, revealing different causes for the closures. For (LLR:3,VAR:7), the closure is caused by the slower growth of the curve Ie(Ia). Since the only difference between (LLR:3,VAR:7) and (LLR:4,VAR:7) is one less bit for the input LLRs, it can be conjectured that the lower Ie(Ia) is due to the information lost from the LLRs by the decreased bit-width. Thus, a 4-bit integer length is the minimum sufficient bit-width for the input LLRs. For (LLR:5,VAR:7), the closure is due to the reduction of the curve Ie(Ia) after its peak. The result for (LLR:4,VAR:7) proves that such a bit-width is sufficient for maintaining the valid information in all the variables, so the performance degradation of (LLR:5,VAR:7) is due to the insufficient bit-width difference between the input LLRs and the internal variables. As the number of iterations increases, the mutual information in the a priori LLRs increases, which means the average absolute value of the LLRs increases. Due to the accumulating additions of the input LLRs in the BCJR algorithm, an insufficient bit-width difference between the input LLRs and the internal variables may cause a serious overflow problem in the calculation of the internal variables. Therefore, towards the end of the EXIT chart, the function Ie(Ia) starts decreasing: the overflow of the internal variables exceeds the tolerance of the wrapping technique, causing the EXIT chart to fail to reach the (1,1) point.
This effect is shown more clearly in Figure 3.10 and Figure 3.11. In Figure 3.10, for the settings (LLR:5,VAR:7) and (LLR:6,VAR:7), the peaks of the curves occur even earlier due to the even smaller bit-width difference between the input LLRs and the internal variables; as the difference increases, the performance reaches its best point at (LLR:4,VAR:7). On the other hand, in Figure 3.11, when the integer length of the input LLRs becomes shorter than 4 bits, the performance worsens again. However, since the difference is now sufficient, no reduction occurs in the curves of (LLR:3,VAR:7) and (LLR:2,VAR:7); only their growth is slower, due to the insufficient bit-width of the input LLRs, which causes the closure of the tunnel. If we further reduce the integer bit-width of the internal variables to 6 bits, the tunnel in the EXIT chart always closes before the (1,1) point, irrespective of the integer bit-width of the LLRs.
In conclusion, a 4-bit integer width for the input LLRs is the minimum acceptable setting for UMTS Turbo decoders. With the wrapping technique, the minimum difference between the input LLRs and the internal variables is 3 bits. Therefore, the optimal integer-length setting is (LLR:4,VAR:7).
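The wrapping behaviour underpinning this result can be demonstrated in a few lines (a sketch with our own names, not the simulator code): as long as the true difference between two metrics fits in the signed range, the difference of the wrapped values, itself computed modulo 2^n, equals the true difference even when the metrics themselves have overflowed.

```python
def wrap(value, n_bits):
    """Two's-complement wrap-around of value into n_bits."""
    mask = (1 << n_bits) - 1
    v = value & mask
    return v - (1 << n_bits) if v >= 1 << (n_bits - 1) else v

def metric_difference(a, b, n_bits):
    """Difference of two wrapped metrics, computed modulo 2^n_bits.
    The result equals the true a - b whenever |a - b| < 2^(n_bits-1),
    even if a and b individually overflowed the representable range."""
    return wrap(wrap(a, n_bits) - wrap(b, n_bits), n_bits)

# 8-bit metrics: 300 and 290 both exceed the [-128, 127] range, yet the
# wrapped difference is still the true value 10, which is all the BCJR
# recursion needs
print(wrap(300, 8), wrap(290, 8), metric_difference(300, 290, 8))
```

This is why wrapping needs no extra hardware: ordinary two's-complement subtraction already delivers the correct metric differences within the tolerance window.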
Figure 3.10: EXIT chart of different integer lengths with wrapping technique - 3.
Figure 3.11: EXIT chart of different integer lengths with wrapping technique - 4.
3.3.2.2 Saturation Technique
The wrapping technique is effectively a "do nothing" technique: no additional operation or hardware is used to deal with overflow in the internal variables. Another simple overflow control technique is the saturation technique. As mentioned before, the input LLRs are already saturated, since they have a shorter data-width than the internal variables in our specification; the same technique can also be applied to the internal variables. The problem is that the BCJR algorithm relies on the differences between the metrics, i.e. the internal variables, as described before. The saturation technique pins all overflowed values to the maximum or minimum representable value, which changes these differences: the differences between overflowed values become 0, and the differences between overflowed and non-overflowed values are also reduced. Our simulation results show that this problem makes the results with the saturation technique even worse than with the wrapping technique. However, when subtractive normalisation (the rescaling normalisation technique mentioned earlier) is applied, saturation is a necessary condition for obtaining the benefit of normalisation [54]. Our further simulation results show that the normalisation technique without saturation cannot reduce the data-width requirement below that of the wrapping technique. Figure 3.12 gives the simulation results using the saturation technique. Since the conditions on the input LLRs are unchanged, their minimum integer width remains 4 bits. However, the required bit-width difference between the input LLRs and the internal variables is significantly increased by the application of the saturation technique.
Figure 3.12: EXIT chart of different integer lengths with saturation technique.
Although it can be observed that for setting (LLR:4,VAR:12) the tunnel closes before the (1,1) point, as shown in the figure, the closing point is very close to (1,1) and the EXIT curves are almost indistinguishable from the floating-point result. Hence, (LLR:4,VAR:12) is the optimal integer bit-width setting for the saturation technique. In the result for (LLR:4,VAR:11), the tunnel closes far before the (1,1) point. Note that, unlike the results with the wrapping technique, when the integer length of the internal variables is reduced to 11 bits, the function Ie(Ia) falls to near-zero values very soon after its peak. As discussed before, the reason an EXIT chart curve (i.e. the function Ie(Ia)) starts decreasing is an insufficient integer-length difference between the input LLRs and the internal variables. Since the EXIT chart of (LLR:4,VAR:13) reaches the (1,1) point, a 4-bit integer length for the input LLRs is still sufficient under the saturation technique; the saturation technique thus increases the required integer-length difference between the input LLRs and the internal variables. As mentioned before, the decoding result only depends on the differences between path metrics (i.e. the internal variables). The saturation technique pins overflowed variables at the positive or negative limit, so it can be inferred that, while the overflowed internal variables are fixed at the limit values, the differences between path metrics become 0 and reliable soft outputs cannot be obtained. When a certain proportion of the internal variables overflow, the EXIT chart curve drops to 0 very quickly, as shown in the result for (LLR:4,VAR:11). Therefore, the saturation technique alone is not suitable for convolutional codes. However, it is a precondition for applying the normalisation technique: our simulation results with the normalisation technique show that it is important to combine saturation and normalisation to obtain the optimal bit-width specification.
3.3.2.3 Normalisation Technique
The limitation of the wrapping technique is that, if the difference between the path metrics exceeds the dynamic range, the subtraction no longer gives the correct result. The purpose of the saturation technique is to fix this problem; however, as our saturation results showed, it introduces another problem, namely that many overflowed variables are pinned at the same value. The normalisation technique is introduced to deal with this. Our simulation results show that, with the combination of saturation and normalisation, the required integer bit-width of the internal variables can be further reduced. In our simulations, in each group the growing variables α and β have the largest of them subtracted at every step. The EXIT chart results are shown in Figure 3.13. The optimal integer-length setting is (LLR:4,VAR:5), requiring two fewer bits for the internal variables.
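The rescaling normalisation used here can be sketched as follows (names ours): subtracting the per-step maximum keeps the α (and β) metrics in a bounded non-positive range while leaving the metric differences, which are all the BCJR algorithm uses, unchanged.

```python
def normalise(metrics):
    """Rescaling normalisation: subtract the largest path metric from
    every metric at each trellis step. The values then stay in a small
    non-positive range (max is always 0), so fewer integer bits are
    needed, while all pairwise differences are untouched."""
    m = max(metrics)
    return [x - m for x in metrics]

# alpha metrics drifting upwards step by step; without normalisation
# their magnitude, and hence the required integer bit-width, keeps growing
alphas = [412.0, 409.5, 407.25, 401.0]
norm = normalise(alphas)
print(norm)                                      # bounded: max is 0
print(alphas[0] - alphas[1], norm[0] - norm[1])  # differences preserved
```

Because the normalised values are bounded, any residual overflow is a genuine error and can safely be handled by saturation, which is why the two techniques are combined.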
3.3.2.4 Final validation
To finally determine and validate the optimal data-width specification for the fixed-point implementation of the UMTS Turbo code, we investigate the EXIT chart performance with both the integer and fraction lengths limited. Since the simulation results in Figure 3.6, where only the fraction length was limited, were not sufficient to determine the optimal fraction length, we consider both 1-bit and 2-bit fraction lengths in our final validation. We combined these fraction-length settings with the optimal integer-length setting for the wrapping technique, (LLR:4,VAR:7), and for the normalisation technique, (LLR:4,VAR:5). We use the notation
Figure 3.13: EXIT chart of different integer lengths with normalisation technique.
(LLR:X,VAR:Y,FRC:Z) to represent the settings in our simulation results, where Z is the length of the fraction part. Moreover, we simulate the different settings in different situations, including the longest block length (5114 bits), the shortest block length (40 bits) and the most performance-sensitive SNR (SNR = -4.83 dB), where the tunnel in the EXIT chart is only just open.
For the normalisation technique, the final validation results are shown in Figure 3.14 for the longest block length, Figure 3.15 for the shortest block length and Figure 3.16 for the most sensitive SNR of -4.83 dB. According to the results, setting (LLR:4,VAR:5,FRC:2) gives almost the same performance as the floating-point results in all situations, while setting (LLR:4,VAR:5,FRC:1) gives further degradation due to the combined effects of the limited integer and fraction lengths, although the degradation is not as bad as for the Max-Log-BCJR. Moreover, according to the simulation results, the block length does not have a significant effect on the EXIT chart, so a single optimal specification can serve any block length. Therefore, for the normalisation technique, we conclude that (LLR:4,VAR:5,FRC:2) is the optimal specification for the UMTS Turbo decoder.
For the wrapping technique, the final validation results are shown in Figure 3.17 for the longest block length, Figure 3.18 for the shortest block length and Figure 3.19 for the most sensitive SNR of -4.83 dB. The figures clearly show that a 2-bit fraction part is the best option in our case. Hence, for the wrapping technique, (LLR:4,VAR:7,FRC:2) is the optimal specification for the UMTS Turbo decoder.
Figure 3.14: Simulation results of 5114-bit block length in fixed-point with normalisation and floating-point.
Figure 3.15: Simulation results of 40-bit block length in fixed-point with normalisation and floating-point.
Comparing our results with previous works: for the input LLRs, [42, 47, 49] claimed that a 3-bit integer length is sufficient. However, they only considered the input LLRs received from the channel. In the EXIT chart simulation, the considered input LLRs are the a priori inputs of the concatenated decoders, which include both the channel input and the extrinsic information from the other decoder. Since both are inputs of the concatenated decoders, it is more reasonable to give them the same bit-width. Hence, our results show that a 4-bit integer length is the optimal setting for the input LLRs. [46] reached the same conclusion about the integer bit-width of the input
Figure 3.16: Simulation results of SNR=-4.83dB/453-bit block length in fixed-point with normalisation and floating-point.
Figure 3.17: Simulation results of 5114-bit block length in fixed-point with wrapping technique and floating-point.
LLRs. The other papers mentioned above did not consider the input LLRs separately. For the internal variables, [44, 46] considered all the different internal variables (i.e. γ, α, β and λ) separately; the longest of these required 8 bits for the integer part. [50] concluded that 7 bits is the optimal setting, while [49] concluded 6 bits. [42, 43] reached the same conclusion as our result, namely that 5 bits is the optimal setting. The reason for so many different conclusions is the different circumstances used in the simulations,
Figure 3.18: Simulation results of 40-bit block length in fixed-point with wrapping technique and floating-point.
Figure 3.19: Simulation results of SNR=-4.83dB/453-bit block length in fixed-point with wrapping technique and floating-point.
as discussed before. For example, [49] used normalisation in their simulation, but only at a few specific steps of the decoding process, whereas we used it at every step; hence they concluded that one more integer bit is required for the internal variables than in our conclusion. Using EXIT charts, we have shown that, with proper overflow control techniques, the optimal bit-width specification for the UMTS Turbo decoder is (integer: 4 bits, fraction: 2 bits) for the input LLRs and (integer: 5 bits, fraction: 2 bits) for the internal variables.
In conclusion, we have introduced a method based on EXIT chart analysis to determine the optimal data-width specification for implementing a Turbo code in a low-power fixed-point system. By applying the method to the UMTS Turbo code, we demonstrated its advantage over the conventional method based on BER chart analysis. The different techniques for reducing the data-width requirement of fixed-point Turbo code implementations were also discussed.
Chapter 4
Energy Estimation Decoding
Algorithm
In this chapter, a framework to estimate the energy consumption of an encoder/decoder at the algorithmic level is proposed.
4.1 Introduction
There are different aspects in which to evaluate a system's power/energy consumption. For instance, the average power is directly related to chip heating and temperature issues, while the worst-case instantaneous power affects the voltage-drop problem [59]. In low-power WSN applications, such as Body Area Networks, a long lifetime is the most important motivation for applying low-power techniques. Since many advanced techniques, such as clock gating and power gating, can increase the lifetime of a system without changing the average power while the system is fully operating, estimating only the power consumption of a design in the early design stages is not sufficient to investigate its potential lifetime. Hence, energy consumption estimation is more suitable for this purpose. Indeed, the latest works on long-lifetime WSN design are increasingly focused on energy-consumption-based design [60–62].
Power/energy estimation is required at all levels of abstraction in the design flow, for different purposes [63]. At later stages, such as gate level or transistor level, very accurate estimates can be given, since most of the implementation information is available. On the other hand, most of the design effort has already been invested by this stage, so little power reduction can be achieved after the estimation. The purposes of power/energy estimation at these stages are only to fine-tune the design and verify that the power constraints have been met. Therefore, to design an extremely low-power system, power/energy estimation is more important at the early stages. By being aware
of the forecast energy consumption during the early design stages, more energy reduction can be achieved. However, at very early design stages, such as algorithm level design, most of the physical parameters affecting the energy consumption are unknown, which makes energy estimation at this stage very difficult. Hence, in the traditional design flow, communication engineers estimate the computational complexity of an algorithm, rather than its energy, to evaluate an algorithm design. The complexity indicates the computing resources the algorithm needs. However, in the case of a Turbo decoder, for example, [64] demonstrated that memory access, rather than computational complexity, is the most critical part of the decoder in terms of energy consumption. The same assertion can be made for many other types of system design [65, 66]. There are also other components in the implementation of a system, such as the datapath selection logic, the internal registers and the controller. Again, their contribution to the energy consumption cannot be predicted from the computational complexity of an algorithm. Therefore, a lower-complexity algorithm cannot guarantee a lower-energy implementation. For low power design, energy estimation provides more information than complexity estimation.
As discussed in Chapter 1, in short-range, low-power WSNs such as BANs, the transmission power is low, so the energy consumed in the physical layer, especially in the channel coding scheme, can make a significant contribution to the energy consumption of the whole system. In this chapter, we propose a framework for estimating the energy consumption of a channel coding system at a very early stage, namely the algorithm level. We focus on energy consumption rather than average power because many decisions made during algorithm level design affect the energy consumption of the final implementation; moreover, after this stage the basic scheme is fixed and the potential for further power reduction is limited. Therefore, the decisions made during algorithm level design are very important for a low power design. Our framework aims to rank various coding scheme design options and thus helps in selecting the one that is potentially most effective from the energy point of view. Since the encoding algorithms are typically of low complexity and energy consumption, we are particularly interested in the energy consumption of the decoding algorithms, which is typically much higher. The framework is suitable for all turbo-like algorithms and even other types of algorithms in channel coding systems, such as equalisation, interference cancellation and MIMO (Multiple-Input Multiple-Output) detection. Knowledge of the hardware design from later design stages is not required to apply the framework.
There are two classic approaches to implementing a coding scheme, namely DSP (Digital Signal Processing) implementation and ASIC (Application-Specific Integrated Circuit) implementation. A DSP system is based on a general purpose processor with an instruction set; the algorithm is realised as an assembly language program. An ASIC system, on the other hand, is designed specifically for a particular application, so the hardware design can be highly optimised for the algorithm it implements. DSP implementation is widely used in traditional WSN applications due to its general applicability. However, compared with ASIC implementation, a DSP implementation's long execution time and low hardware usage efficiency make it unsuitable for low power systems. Moreover, the lower bound on the energy consumption of a coding scheme is an important issue which needs to be considered in physical layer design. Therefore, our framework aims to estimate the possible energy consumption of an algorithm in ASIC implementations.
4.2 Previous works
In previous works, power/energy estimation at an early design stage, often referred to as high-level power/energy estimation, can be divided into two categories. One is based on DSP or FPGA implementations [60, 67, 68]. More specifically, in order to simplify the problem, these methods assume the fixed architectural templates offered by DSPs and FPGAs. The benefit of such methods is that they are easily made applicable to a wide range of algorithms. However, DSP and FPGA implementations are not suitable for extremely low power applications, since the unique characteristics of an algorithm cannot be exploited in these architectures. Such characteristics may be exploited in a dedicated hardware implementation, which is very important for low power design, and this is why the lower bound on energy consumption is of particular interest to communication engineers.
To investigate the distinct characteristics of an algorithm, energy estimation of possible ASIC implementations is required. This is the other category of high-level power/energy estimation, mostly referred to as behavioural level power/energy estimation, which is based on executable behavioural descriptions [63]. The term "algorithm level", which usually refers to a clear mathematical description of an algorithm, is a more general concept than "behavioural level": the former is widely used in the communications area, while the latter is a hardware design concept, meaning an executable program or a clear flowchart description with detailed operation requirements. However, there is no clear distinction between them from the power/energy estimation point of view: both refer to a clear mathematical description of an algorithm together with a lack of knowledge of the architecture of the implementation.
One type of behavioural level power estimation method is the activity-based model,
which typically assumes some architectural style or template and produces physical
capacitance and switching activity estimations of the resources based on it [69]. The
dynamic power is then expressed as:
P = Σ_{r ∈ {all resources}} f_r C_r V_dd²    (4.1)
where f_r is the access frequency of resource r, produced by activity prediction, C_r is the switched capacitance of r and V_dd is the supply voltage [69]. The equation has a couple of equivalent transformations in different methods, but they are all based on (4.1). Typically, in such methods [70–72], only dynamic power is considered. However, as IC process technology enters deep submicron sizes, an exponential increase in the subthreshold leakage current arises, which makes the leakage power of CMOS circuits non-negligible. Moreover, switching activity estimation for sequential circuits is difficult and time consuming, which makes this type of method difficult to use at a practical algorithm design stage. An alternative approach is offered by the complexity-based model, which considers the power/energy consumption of a system to be the sum of the power/energy consumption of its different entities. In [73], the power consumption of cryptographic algorithms is estimated based on how many components of each type (registers, adders, etc.) were used and what type of memory was chosen. In [74], the energy consumption in a digital CMOS circuit is expressed as:
E = µ N_gates E_gate    (4.2)
where µ is the circuit activity, E_gate is the energy consumption per switching gate of a reference cell (e.g. a 2-input NAND gate) in a particular technology and N_gates is the approximate equivalent gate count of the design in terms of the reference cell. These parameters can be obtained from the specification parameters of the technology or by simulation. Hence, all three types of power component (datapath, memory and controller) in a CMOS circuit are considered automatically. The drawback of these methods is that the activity of the circuit is only roughly estimated, using a single parameter. The framework we propose overcomes this drawback by considering the different components separately.
The other challenge of high-level power/energy estimation is that, unlike a DSP or FPGA implementation, an ASIC design can be optimised for the specific algorithm. As a result, the possible implementation can be difficult to predict at the algorithm level. Some previous works obtained the specifics of the hardware implementation by using high-level synthesis tools [73, 74]. Others transform the behavioural description into a more complicated description, which may include Boolean functions, truth tables or circuit designs [70, 72]. These approaches require knowledge of hardware design and synthesis processes. Furthermore, the required programming and simulation processes are time consuming, which is not desirable at the algorithm level design stage. To overcome this challenge, our framework relies on the designer to specify the algorithm partitioning and resource constraints, but avoids an actual hardware design process. In this way the framework not only estimates the potential energy consumption of an algorithm but also provides feedback on the quality of a design strategy.
4.3 A framework for quantifying the energy consumption
of a Turbo-like decoder
In this section, a framework that allows us to compare and estimate the energy consumption of a Turbo-like decoder design at the algorithm level is proposed. Traditional low power system design methods can only shape the algorithm based on computational complexity analysis; our framework provides ample opportunity to feed the energy estimation results back into the algorithm choice or design steps. Given the purpose of our work, the comparison and evaluation of different Turbo-like code algorithms, we develop the framework at two levels. For the comparison of different algorithms, it is not necessary to estimate every possible energy contribution in the implementations of the algorithms, since this would be time consuming at the algorithm level. Therefore, in level 1 of our framework, we aim to provide a quick method which allows communication engineers to compare different algorithms from the energy consumption point of view with little extra effort. We consider only the two main parts of the energy consumed when implementing an algorithm in hardware: the energy consumed by all the operations in the algorithm and the energy consumed by the memory requirement of the algorithm. In our case, for a Turbo-like code, all the operations in the algorithm are ACS operations. We select these two parts of the system's energy consumption in the level 1 framework because only these two parts are directly related to the target algorithms. The other parts of the system's energy consumption, such as the contributions of the controller and the datapath structure, can vary with the design strategy, so only an approximate estimate can be given at the algorithm level. In the level 2 framework, however, we aim to give an energy estimate which considers all the possible energy contributions in the system. The level 1 framework is presented in Section 4.3.1. The level 2 framework is still at the planning stage and is discussed as future work in Section 4.3.2.
4.3.1 Level 1 of the framework
To account for the energy consumption of the computing operations in the target algorithm, a conventional complexity analysis of the algorithm needs to be performed. Taking one convolutional decoder in the UMTS Turbo decoding scheme as an example (the algorithm was introduced in Chapter 2), we categorise all the ACS operations into two types: additions (including subtractions) and max∗ operations. For an n-bit decoding frame, the complexity analysis shows that the decoding algorithm includes 97n − 10 additions and 30n − 20 max∗ operations.
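The max∗ operation counted here is the Jacobian logarithm introduced in Chapter 2, max∗(a, b) = max(a, b) + ln(1 + e^−|a−b|), whose correction term is typically approximated in hardware by a small LUT. A minimal sketch follows; the LUT thresholds and values are an illustrative quantisation, not the exact table of our gate-level design.

```python
import math

def max_star_exact(a, b):
    """Exact Jacobian logarithm: max(a, b) + ln(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical 4-entry LUT: (threshold on |a-b|, correction value).
_LUT = [(0.5, 0.5), (1.0, 0.4), (2.0, 0.2), (4.0, 0.1)]

def max_star_lut(a, b):
    """LUT-based max*: the correction term is read from a small table."""
    d = abs(a - b)
    for threshold, corr in _LUT:
        if d < threshold:
            return max(a, b) + corr
    return max(a, b)  # correction negligible for large |a-b|
```

The LUT form trades a small approximation error for the removal of the exponential and logarithm from the datapath, which is what makes the operation implementable as the ACS units counted above.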
To establish the energy consumption of each operation in the algorithm, we implemented the addition and the max∗ operation as gate-level designs based on the STMicroelectronics 0.12 µm technology standard cell library. We consider a max∗ operation supported by a four-element LUT here. The data width of the operation units is 8 bits, which is sufficient for the target convolutional decoding process according to the results in Chapter 3. We then analysed the energy consumption of each operation with the power analysis tool Synopsys PrimeTime [75]. This procedure strictly follows the standard ASIC design flow. Under the assumptions that the critical path of the implementation contains no more than 10 adders and that the system clock is below 10 MHz, our power analysis shows that the typical energy consumption of an addition operation in our specification is Eadd = 0.04591 pJ. Under the same specification, the typical energy consumption of a max∗ operation is Emax∗ = 175 pJ. Note that a max∗ operation comprises more than one comparison and addition operation plus a four-element LUT, and therefore consumes much more energy than a single addition; conventional complexity analysis cannot take such differences between operations into account. Therefore, based on the complexity analysis and our power analysis results, the total energy consumed by the operations in the UMTS decoding algorithm can be calculated as:

Eoperations = Eadd × (97n − 10) + Emax∗ × (30n − 20)    (4.3)

This evaluates to Eoperations = 5254.4n − 3500 pJ. Therefore, for a 40-bit frame the decoding energy consumed by the operations is 2.07 × 10^5 pJ, and for a 5114-bit frame it is 2.69 × 10^7 pJ.
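The calculation in (4.3) can be reproduced as a short script, using the Eadd and Emax∗ values measured above:

```python
# Eq. (4.3): operation energy for an n-bit UMTS Turbo decoding frame,
# using the per-operation energies from the gate-level power analysis.

E_ADD_PJ = 0.04591      # energy of one addition (pJ)
E_MAX_STAR_PJ = 175.0   # energy of one max* operation (pJ)

def e_operations_pj(n):
    """Energy (pJ) consumed by the ACS operations for an n-bit frame."""
    additions = 97 * n - 10
    max_stars = 30 * n - 20
    return E_ADD_PJ * additions + E_MAX_STAR_PJ * max_stars

print(f"40-bit frame:   {e_operations_pj(40):.3g} pJ")    # ~2.07e5 pJ
print(f"5114-bit frame: {e_operations_pj(5114):.3g} pJ")  # ~2.69e7 pJ
```

The script also makes the point of the paragraph explicit: the max∗ term dominates, since each max∗ costs roughly 3800 times the energy of an addition.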
To account for the energy consumed by the memory requirement of the algorithm, we first need to establish the total memory requirement of the algorithm, that is, how many variables need to be stored during decoding. This requires analysing the dependence between the different stages of the algorithm. According to the introduction in Chapter 2, in the UMTS Turbo decoder the decoding algorithm of the convolutional code includes five stages: the calculation of γ, α, β, δ and ye. The dependence between the stages is shown in Figure 4.1. As shown in the figure, the γ values must be stored for the calculation of α, β and δ, and the γ, α and β values must all be stored for the calculation of δ. Since the input can be used to calculate γ straight away, and δ can be used to calculate ye and the output straight away, these variables do not need to be stored. The energy consumption of the memory basically depends on the numbers of reads from and writes to the memory. The number of writes equals the number of variables that need to be stored, since each variable only needs to be written to the memory once. The number of reads equals the number of times the variables are used in the algorithm: since the variables reside only in the memory, each use of a variable requires one read from the memory. According to the algorithm, for an n-bit frame there are 32n − 40 γ values, 8n − 10 α values and 8n − 10 β values. Therefore, 48n − 60 writes are required
[Dependence graph: Input → γ → α, β → δ → ye → Output.]
Figure 4.1: The dependence between the different stages.
during the decoding. Each γ value is used once in the α calculation and once in the β calculation, and only half of the γ values are used once in the δ calculation; therefore, the γ values are read 2.5 × (32n − 40) = 80n − 100 times in total. Each α is used once in the δ calculation, giving 8n − 10 reads, and likewise each β is used once in the δ calculation, giving another 8n − 10 reads. In total, 96n − 120 reads are required for the decoding. Due to the lack of memory standard cells in our standard cell library, the power analysis of the memory unit could not be performed. Therefore, we used the datasheet of a 64 Mbit memory product, the NEC uPD4564163 [76], to calculate the energy consumption of the memory. According to the datasheet, assuming no read misses occur during the decoding process, each read or write operation requires at least 2 clock cycles and consumes 9900 pJ. Note that this product is outdated compared with the technology we used to estimate the operation energy consumption; with a suitable modern memory product, the energy consumption might be reduced. In this case, reads and writes consume the same amount of energy in the memory. The total energy consumption can then be simply calculated as:

Ememory = 9900 × (96n − 120) = 9.5 × 10^5 n − 1.19 × 10^6 pJ    (4.4)

Therefore, for a 40-bit frame, the total energy consumed by the memory is Ememory = 3.68 × 10^7 pJ; for a 5114-bit frame, Ememory = 4.86 × 10^9 pJ.
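The access counts and the evaluation of (4.4) can likewise be scripted. The sketch below follows (4.4) as stated, charging the 9900 pJ per-access figure to the reads:

```python
# Memory access counts and Eq. (4.4) for an n-bit frame.  The per-access
# energy is the figure taken from the NEC uPD4564163 datasheet.

E_ACCESS_PJ = 9900.0

def memory_accesses(n):
    """Return (writes, reads) for one n-bit decoding frame."""
    # 32n-40 gamma, 8n-10 alpha and 8n-10 beta values are stored once each.
    writes = (32 * n - 40) + (8 * n - 10) + (8 * n - 10)   # = 48n - 60
    # gamma read 2.5 times on average; alpha and beta read once each.
    reads = 2.5 * (32 * n - 40) + 2 * (8 * n - 10)          # = 96n - 120
    return writes, reads

def e_memory_pj(n):
    """Eq. (4.4): memory energy (pJ) for an n-bit frame."""
    _, reads = memory_accesses(n)
    return E_ACCESS_PJ * reads
```

Comparing this with the operation energy of (4.3) shows why the memory dominates: for a 40-bit frame the memory consumes roughly two orders of magnitude more energy than the ACS operations.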
At this level of the framework, the analysis results allow the comparison of different Turbo decoding algorithms from both the operation energy and the memory energy points of view. If the same technology library is available for both the memory and the standard cells, it is reasonable to take the sum of the two energy components as the energy consumption estimate directly related to the algorithm. As discussed, the remaining part of the system's energy consumption depends heavily on the design strategy and hence cannot be accurately estimated at the algorithm level.

In the next section, we discuss the future work on level 2 of our framework, which aims to give a total energy consumption estimate considering all the possible energy contributions in the system. As discussed, since an accurate estimate of the total system energy consumption is impossible at the algorithm level, reasonable assumptions are required for level 2 of the framework.
4.3.2 Future work: Level 2 of the framework
The level 2 framework we propose is based on complexity, memory and parallelism analysis of the mathematical description of the algorithm. By converting the description into factor graphs, the computing resources, the memory requirement and the parameters of the control unit can be obtained. The energy estimation is based on a look-up table of the energy consumption of the different entities in the design. The look-up table is built from simulations of a particular technology library, in our case the STMicroelectronics 0.12 µm process standard cell library. A digital circuit system can be divided into three components, namely the datapath architecture, the system memory and the controller. Hence, the total energy consumption of a system is divided into three parts, as expressed in (4.5):

Etotal = Edatapath + Ememory + Econtroller    (4.5)

Our framework estimates the energy using clock-cycle-accurate analysis and timing analysis. The cycle-accurate analysis considers the energy consumption of the different hardware components in their different modes of operation, typically operating mode and idle mode. The timing analysis considers the required operation counts of the different components and the total time taken to process a typical task (e.g. decoding a data frame). From these, the energy consumption of the system, such as the average energy consumption per clock cycle or the energy consumption of a particular task, can be obtained.
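Since level 2 is still at the planning stage, the intended cycle-accurate accounting of (4.5) can only be sketched. Below is a minimal Python sketch; the component names, per-cycle energies and cycle counts are illustrative assumptions, and in the real framework the figures would come from the standard-cell look-up table.

```python
from dataclasses import dataclass

# Sketch of Eq. (4.5): E_total = E_datapath + E_memory + E_controller,
# with each component charged a per-cycle energy in its operating or
# idle mode.  All figures below are illustrative assumptions.

@dataclass
class Component:
    e_active_pj: float  # energy per active clock cycle (pJ)
    e_idle_pj: float    # energy per idle clock cycle (pJ)

    def energy(self, active_cycles, total_cycles):
        idle_cycles = total_cycles - active_cycles
        return (self.e_active_pj * active_cycles
                + self.e_idle_pj * idle_cycles)

def e_total_pj(components, schedule, total_cycles):
    """`schedule` maps component name -> active cycle count for one
    task (e.g. decoding one frame)."""
    return sum(c.energy(schedule[name], total_cycles)
               for name, c in components.items())

# Hypothetical figures for decoding one frame in 1000 clock cycles.
parts = {"datapath": Component(50.0, 2.0),
         "memory": Component(9900.0, 10.0),
         "controller": Component(5.0, 1.0)}
cycles = {"datapath": 800, "memory": 300, "controller": 1000}
total = e_total_pj(parts, cycles, total_cycles=1000)
```

The idle-mode term is what distinguishes this accounting from the level 1 estimate: techniques such as clock gating change the idle per-cycle energies without changing the active ones, which is exactly the lifetime effect discussed in Section 4.1.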
A flowchart of the framework is shown in Figure 4.2. The mathematical description of an algorithm is the basic input of the framework; an executable program is not required. A complicated algorithm is usually divided into many steps for ease of implementation, so a partitioning analysis of the algorithm is needed. After this, the algorithm can be converted into a factor-graph-based description. In our framework, a factor graph is used to describe the computational complexity of the algorithm and an overall flowchart is used to describe the dependence between the steps. The required computing resources can be estimated from the factor graph. Note that the
[Flowchart: Mathematical Description → Partitioning Analysis → Overall Flowchart and Factor Graph for Complexity Analysis → Computing Resource Estimation, Resource Constraint, Memory Requirement Estimation, Controller State Estimation, Control Signal Estimation, Memory Access Estimation, Timing Analysis → Energy Estimation of Memory, of Datapath and of Controller.]
Figure 4.2: Flowchart of the energy estimation framework.
computing resource estimation includes all the entities in the datapath. By considering the resource requirement of each step and the dependence information in the overall flowchart, the overall resource constraint can be obtained by analysis. In addition, estimates of the control signals, the controller states, the memory requirement and the memory accesses are produced. The timing analysis uses the information from the factor graph and the overall flowchart; it generates the total clock cycle requirement, which yields the clock frequency constraint and feeds the later cycle-accurate estimation. Finally, the three parts of the energy consumption, for the datapath, the memory and the controller, can be obtained, as shown in Figure 4.2.
Chapter 5
Conclusions and Further Works
In this report, we have presented an investigation of the state of the art in the development of wireless communication systems for BANs. Based on this investigation of previous works, we proposed a promising solution: applying Turbo-like codes in the channel coding scheme of a BAN communication system. Based on this proposal, we identified the need to explore fixed-point low power implementations of Turbo-like codes and to evaluate the different Turbo-like codes from the energy consumption point of view.
Therefore, in Chapter 3, we proposed a method based on EXIT chart analysis to determine the optimal data width specification of a Turbo-like decoding algorithm in a fixed-point low power implementation. This issue is significantly important to the energy consumption of the implementation. We demonstrated our method by applying it to the UMTS Turbo decoder, considered the different overflow conditions in the implementation and compared our results with previous works. The advantages of our method over the conventional BER/FER chart analysis method were revealed.
In Chapter 4, we proposed a framework for evaluating the different Turbo-like codes from the energy consumption point of view. The framework has two levels. Level 1 considers the energy consumption of the required ACS operations in the algorithm and the related memory requirement. It has a relatively simple procedure which can easily be applied to the target algorithms, and it offers a better evaluation of the algorithms, from the energy consumption point of view, than a conventional complexity evaluation, with little extra effort required. Level 2 of the framework is the future work plan of the project. It aims to create a procedure that allows gate-level energy consumption estimation of the target algorithms without requiring hardware design knowledge. Some details of this level of the framework were discussed.
The work presented in this report is the preparation for exploring the novel decoding scheme on the relays in BANs proposed in Chapter 1. Therefore, the future work is to apply the two proposed methods to Turbo-like codes considered suitable for BAN applications. Taking into consideration the other parts of the energy consumption of the relays, including the receiving and transmission power and the modulation and demodulation schemes, our novel decoding scheme can then be investigated.
Bibliography
[1] S. Drude, “Requirements and applications scenarios for body area networks,” in
Mobile and Wireless Communications Summit, 2007. 16th IST, 2007.
[2] T. G. Zimmerman, “Personal area networks: Nearfield intrabody communication,”
IBM System Journal, vol. 35, pp. 609–617, 1996.
[3] B. Zhen, H. Li, and R. Kohno, “IEEE body area networks for medical applica-
tions,” in Wireless Communication Systems, 2007. ISWCS 2007. 4th International
Symposium on, 2007.
[4] M. L. R. Fox, H. Symons, S. Berson, and H. Westphal, “FCC proposes rules
for body area networks (MBAN),” Jul. 2009, accessed
from: http://mobihealthnews.com/3078/fcc-proposes-rules-for-body-area-networks-
mban/.
[5] “Revision of part 15 regarding ultra-wideband transmission systems. first report and
order, et docket, 98-153, fcc 02-48,” Federal Communications Commission (FCC),
Tech. Rep., 2002.
[6] J. Ryckaert, C. Desset, V. de Heyn, M. Badaroglu, P. Wambacq, G. V. der Plas,
and B. V. Poucke, “Ultra-wideband transmitter for wireless body area networks,”
in Proceeding on 14th IST Mobile & Wireless Communications Summit, Jun. 2005.
[7] H. Li, K. Takizawa, B. Zhen, and R. Kohno, “Body area network and its standard-
ization at IEEE 802.15.MBAN,” in Mobile and Wireless Communications Summit,
2007. 16th IST, Jul. 2007, pp. 1–5.
[8] J. A. D. Moutinho, “Wireless body area network,” 2009, RECIN2009.
[9] V. M. Jones, R. G. A. Bults, D. Konstantas, and P. A. M. Vierhout, “Healthcare
pans: Personal area networks for trauma care and home care,” in In 4th Inter-
national Symposium on Wireless Personal Multimedia Communications (WPMC),
2001, pp. 1369–1374.
[10] M. Soini, J. Nummela, P. Oksa, L. Ukkonen, and L. Sydnheimo, “Wireless body area
network for hip rehabilitation system,” Ubiquitous Computing and Communication
Journal, vol. 3, p. 7, 2008.
[11] B. Zhen, “Ban technical requirements,” IEEE 802.15.TG6, Tech. Rep., Sep. 2008.
[12] B. Latr, I. Moerman, B. Dhoedt, and P. Demeester, “Networking in wireless body
area networks,” in in 5th FTW PHD Symposium, Interactive poster session, Dec.
2004, p. 113.
[13] C. K. Singh and A. Kumar, “Performance evaluation of an IEEE 802.15.4 sensor
network with a star topology,” Wireless Networks, vol. 14, no. 4, pp. 543–568, Aug.
2008.
[14] S. Choi, S. Song, K. Sohn, H. Kim, J. Kim, J. Yoo, and H. Yoo, “A low-power star-
topology body area network controller for periodic data monitoring around and
inside the human body,” in 2006 10th IEEE International Symposium on Wearable
Computers, Oct. 2006, pp. 139–140.
[15] A. G. Ruzzelli, R. Jurdak, G. M. P. O’Hare, and P. V. D. Stok, “Energy-efficient
multi-hop medical sensor networking,” in Proceedings of the 1st ACM SIGMOBILE
international workshop on Systems and networking support for healthcare and as-
sisted living environments, 2007, pp. 37–42.
[16] B. Latre, B. Braem, I. Moerman, C. Blondia, E. Reusens, W. Joseph, and P. De-
meester, “A low-delay protocol for multihop wireless body area networks,” in Mo-
biQuitous 2007. Fourth Annual International Conference on Mobile and Ubiquitous
Systems: Networking & Services, Aug. 2007, pp. 1–8.
[17] J. Misic, “Enforcing patient privacy in healthcare WSNs using ECC implemented on
802.15.4 beacon enabled clusters,” in Pervasive Computing and Communications,
2008. PerCom 2008. Sixth Annual IEEE International Conference on, 2008.
[18] J. Rousselot, A. El-Hoiydi, and J.-D. Decotignie, “Performance evaluation of the
IEEE 802.15.4a UWB physical layer for body area networks,” in Computers and
Communications, 2007. ISCC 2007. 12th IEEE Symposium on, 2007.
[19] D. Domenicali and M.-G. D. Benedetto, “Performance analysis for a body area net-
work composed of IEEE 802.15.4a devices,” in Proceedings of 4th Workshop on Po-
sitioning, Navigation and Communication 2007(WPNC’07), Hannover, Germany,
Mar. 2007, pp. 273–276.
[20] M. R. Yuce, “Implementation of body area networks based on MICS/WMTS med-
ical bands for healthcare systems,” in IEEE Engineering in Medicine and Biology
Society Conference (IEEE EMBC08), Aug 2008, pp. 3417–3421.
[21] S. Stoa, I. Balasingham, and T. A. Ramstad, “Data throughput optimization in
the ieee 802.15.4 medical sensor networks,” in ISCAS 2007. IEEE International
Symposium on Circuits and Systems, May. 2007, pp. 1361–1364.
[22] X. Liang and I. Balasingham, “Performance analysis of the IEEE 802.15.4 based ecg
monitoring network,” in Proceeding of The Seventh IASTED International Confer-
ences on Wireless and Optical Communications (WOC’07), 2007.
[23] R. C. Shah, L. Nachman, and C. Wan, “On the performance of bluetooth and ieee
802.15.4 radios in a body area network,” in Proceedings of the ICST 3rd interna-
tional conference on Body area networks, Tempe, Arizona, 2008.
[24] D. D. Arumugam and D. W. Engels, “Impacts of rf radiation on the human body
in a passive rfid environment,” in 2008 IEEE Antennas and Propagation Society
International Symposium, Jul. 2008, pp. 1–4.
[25] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near shannon limit error correcting
coding and decoding: Turbo codes,” in IEEE Proceedings of the Int. Conf. on
Communications, 1993.
[26] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding:
Turbo-codes,” IEEE Trans. on Communications, vol. 44, no. 10, pp. 1261–1271,
Oct. 1996.
[27] I. Joe, “Energy efficiency maximization for wireless sensor networks,” International
Federation for Information Processing, vol. 211, pp. 115–122, 2006.
[28] S. Benedetto and G. Montorsi, “Serial concatenation of block and convolutional
codes,” Electronics Letters, vol. 32, no. 10, pp. 887–888, May. 1996.
[29] R. Gallager, “Low-density parity-check codes,” IRE Transaction on Information
Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[30] J. G. D. Forney, “Concatenated codes,” Massachusetts Institute of Technology Re-
search Lab of Electronics, Tech. Rep., 1966.
[31] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” SIAM
Journal of Applied Math, vol. 8, pp. 300–304, 1960.
[32] P. Elias, “Coding for noisy channels,” in IRE Convention Record Pt. 4, 1955, pp.
37–37.
[33] J. H. Yuen, M. K. Simon, W. Miller, F. Pollara, C. R. Ryan, D. Divsalar, and
J. C. Morakis, “Modulation and coding for satellite and space communications,” in
Proceedings of the IEEE, vol. 78, no. 7, Jul. 1990, pp. 1250–1265.
[34] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm,” IEEE Transactions on Information Theory, vol. IT-13, pp.
493–497, Apr. 1967.
[35] E. Boutillon, C. Douillard, and G. Montorsi, “Iterative decoding of concatenated
convolutional codes: Implementation issues,” in Proceedings of the IEEE, 2007.
[36] B. Sklar, Fundamentals of Turbo Codes, Digital Communications: Fundamentals
and Applications, Second Edition. Prentice-Hall, 2001.
[37] C. Schlegel and L. Perez, Trellis and Turbo Coding, ser. IEEE Press Series on Digital
& Mobile Communication, J. B. Anderson, Ed. John Wiley & Sons, 2004.
[38] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for
minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20,
no. 2, pp. 284–287, Mar. 1974.
[39] “3rd Generation Partnership Project; Technical Specification Group Radio Access
Network; Multiplexing and channel coding (TDD) (Release 7),” 3GPP Organizational
Partners (ARIB, ATIS, CCSA, ETSI, TTA, TTC), Tech. Rep., 2008.
[40] C. Weiss, C. Bettstetter, and S. Riedel, “Code construction and decoding of parallel
concatenated tail-biting codes,” IEEE Transactions on Information Theory, vol. 47,
no. 1, pp. 366–386, Jan. 2001.
[41] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated
codes,” IEEE Transactions on Communications, vol. 49, no. 10, pp. 1727–1737,
Oct. 2001.
[42] H. Michel and N. Wehn, “Turbo-decoder quantization for UMTS,” IEEE Communications
Letters, vol. 5, no. 2, pp. 55–57, Feb. 2001.
[43] M. A. Castellon, I. J. Fair, and D. G. Elliott, “Fixed-point Turbo decoder implementation
suitable for embedded applications,” in Proceedings of the Canadian Conference on
Electrical and Computer Engineering, May 2005, pp. 1065–1068.
[44] J. Hsu and C. Wang, “On finite-precision implementation of a decoder for Turbo
codes,” in Proceedings of the 1999 IEEE International Symposium on Circuits and
Systems (ISCAS), vol. 4, Orlando, FL, USA, Jul. 1999, pp. 423–426.
[45] A. Worm, H. Michel, F. Gilbert, G. Kreiselmaier, M. Thul, and N. Wehn, “Advanced
implementation issues of Turbo-decoders,” in Proceedings of the 2nd International
Symposium on Turbo Codes and Related Topics, 2000, pp. 351–354.
[46] T. K. Blankenship and B. Classon, “Fixed-point performance of low-complexity
Turbo decoding algorithms,” in Proceedings of the IEEE VTS 53rd Vehicular Technology
Conference, vol. 2, Rhodes, Greece, 2001, pp. 1483–1487.
[47] G. Montorsi and S. Benedetto, “Design of fixed-point iterative decoders for concate-
nated codes with interleavers,” IEEE Journal on Selected Areas in Communications,
vol. 19, pp. 871–882, 2001.
[48] Y. Wu, B. D. Woerner, and T. K. Blankenship, “Data width requirements in siso
decoding with modulo normalization,” IEEE Transactions on Communications,
vol. 49, no. 11, pp. 1861–1868, Nov. 2001.
[49] R. Hoshyar, A. R. S. Bahai, and R. Tafazolli, “Finite precision Turbo decoding,”
in Proceedings of the 3rd International Symposium on Turbo Codes and Related Topics,
Brest, France, Sep. 2003, pp. 483–486.
[50] A. Morales-Cortes, R. Parra-Michel, L. F. Gonzalez-Perez, and T. G. Cervantes,
“Finite precision analysis of the 3GPP standard Turbo decoder for fixed-point
implementation in FPGA devices,” in Proceedings of the International Conference
on Reconfigurable Computing and FPGAs, Dec. 2008, pp. 43–48.
[51] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “A soft-input soft-output
APP module for iterative decoding of concatenated codes,” IEEE Communications
Letters, vol. 1, no. 1, pp. 22–24, Jan. 1997.
[52] V. Singh, “Elimination of overflow oscillations in fixed-point state-space digital
filters using saturation arithmetic,” IEEE Transactions on Circuits and Systems,
vol. 37, no. 6, pp. 814–818, Jun. 1990.
[53] D. A. Balley and A. A. Beer, “Simulation of filter structures for fixed-point implementation,”
in Proceedings of the 28th Southeastern Symposium on System Theory,
Baton Rouge, LA, USA, 1996, pp. 270–274.
[54] G. Masera, Turbo Code Applications: A Journey from a Paper to Realization,
K. Sripimanwat, Ed. Springer Netherlands, 2005.
[55] A. Hekstra, “An alternative to metric rescaling in Viterbi decoders,” IEEE Transactions
on Communications, vol. 37, pp. 1220–1222, Nov. 1989.
[56] B. Riaz and J. Bajcsy, “Impact of finite precision arithmetics on EXIT chart analysis
of Turbo codes,” in Proceedings of the 5th IEEE Consumer Communications and
Networking Conference (CCNC 2008), 2008.
[57] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal
MAP decoding algorithms operating in the log domain,” in Proceedings of the
IEEE International Conference on Communications, 1995, pp. 1009–1013.
[58] M. C. Valenti and J. Sun, “The UMTS Turbo code and an efficient decoder implementation
suitable for software-defined radios,” International Journal of Wireless
Information Networks, vol. 8, no. 4, pp. 203–215, Oct. 2001.
[59] F. N. Najm, “A survey of power estimation techniques in VLSI circuits,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, no. 4, pp.
446–455, Dec. 1994.
[60] O. Celebican, T. S. Rosing, and V. J. Mooney III, “Energy estimation of peripheral
devices in embedded systems,” in Proceedings of the 14th ACM Great Lakes Symposium
on VLSI, Boston, MA, USA, 2004, pp. 430–435.
[61] J. Kaza and C. Chakrabarti, “Design and implementation of low-energy turbo
decoders,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 12, no. 9, pp. 968–977, Sep. 2004.
[62] S. Chouhan, R. Bose, and M. Balakrishnan, “A framework for energy-consumption-based
design-space exploration for wireless sensor nodes,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 7, pp.
1017–1024, Jul. 2009.
[63] E. Macii, “CAD algorithms, methods and tools for low-power circuits and systems,”
IEEE Technology Surveys, Tech. Rep., 2006.
[64] G. Masera, M. Mazza, G. Piccinini, F. Viglione, and M. Zamboni, “Architectural
strategies for low-power VLSI Turbo decoders,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 10, no. 3, pp. 279–285, Jun. 2002.
[65] K. Hildingsson, T. Arslan, and A. T. Erdogan, “Energy evaluation methodology for
platform based system-on-chip design,” in Proceedings of the IEEE Computer Society
Annual Symposium on VLSI, Feb. 2004, pp. 61–68.
[66] T. V. Aa, M. Jayapala, F. Barat, H. Corporaal, F. Catthoor, and G. Deconinck,
“A high-level memory energy estimator based on reuse distance,” in Proceedings of
the 3rd Workshop on Optimizations for DSP and Embedded Systems (ODES’05),
San Jose, CA, USA, Mar. 2005.
[67] J. Laurent, E. Senn, N. Julien, and E. Martin, “High-level energy estimation for
DSP systems,” in Proceedings of the International Workshop on Power and Timing
Modeling, Optimization and Simulation (PATMOS 2001), 2001, pp. 311–316.
[68] C. Menn, O. Bringmann, and W. Rosenstiel, “Controller estimation for FPGA
target architectures during high-level synthesis,” in Proceedings of the 15th International
Symposium on System Synthesis, 2002.
[69] P. Landman, “High-level power estimation,” in Proceedings of the International
Symposium on Low Power Electronics and Design (ISLPED), 1996, pp. 29–35.
[70] P. Surti and L. Chao, “Controller power estimation using information from behavioral
description,” in Proceedings of ISCAS ’96, vol. 4, May 1996, pp. 679–682.
[71] J. N. Kozhaya and F. N. Najm, “Accurate power estimation for large sequential
circuits,” in Proceedings of the 1997 IEEE/ACM International Conference on
Computer-Aided Design, San Jose, CA, USA, 1997, pp. 488–493.
[72] M. Lesser and V. Ohm, “Accurate power estimation for sequential CMOS circuits
using graph-based methods,” VLSI Design, vol. 12, pp. 187–203, 2001.
[73] M. Khaddour and O. Hammami, “High level energy consumption estimation of
cryptographic algorithms,” in Proceedings of the 3rd International Conference on
Information and Communication Technologies: From Theory to Applications
(ICTTA 2008), Apr. 2008, pp. 1–6.
[74] A. B. A. Garcia, J. Gobert, T. Dombek, H. Mehrez, and F. Petrot, “Energy estima-
tions in high level cycle-accurate descriptions of embedded systems,” in Proceedings
of 5th International Workshop on Design and Diagnostics of Electronic Circuits
and Systems (DDECS’2002), Brno, Czech Republic, Apr. 2002, pp. 228–235.
[75] “PrimeTime datasheet,” Synopsys, Tech. Rep., 2009.
[76] “NEC µPD4564163 datasheet: 64-Mbit synchronous DRAM, 4-bank, LVTTL,” NEC,
Tech. Rep., 1998.