2008:010 CIV
MASTER'S THESIS
Downlink Baseband Decoder Implementation
Ulf Andersson Magnus Isaksson
Luleå University of Technology
MSc Programmes in Engineering, Electrical Engineering
Department of Computer Science and Electrical Engineering, Division of Signal Processing
2008:010 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--08/010--SE
Ericsson Lindholmen
November, 2007
ABSTRACT
Previous generations of cellular networks were built for telephone calls and slow data
transmission. Due to the rapid changes in information technology, these networks do not meet the
requirements of today’s wireless revolution. The first specifications for the 3rd generation
system (3G) were released in 2000 by the 3GPP collaboration group. WCDMA is one of the air
interfaces in the specifications.
In 2001 the first phase of HSDPA, High Speed Downlink Packet Access, was introduced to
the specifications. Instead of sending the data on a dedicated channel to each user, HSDPA uses
the radio resources more efficiently by using shared channels in the downlink. A
control channel signals which users are to receive data at each time instant.
This master's thesis has been carried out at DBP IoV (Downlink Baseband Processing Integration
and Verification) at Ericsson Lindholmen in Gothenburg. This department is responsible
for real-time target integration and verification of the baseband processing system in WCDMA,
including test environment development and test case design. Currently, tests are in most cases
executed and recorded for offline analysis. The memory available for recording on the test
hardware limits the maximum run-time of the tests. To be able to run long tests, data has to
be decoded and analyzed in real time.
The purpose of this thesis was to design and implement a real-time decoder for a subset of
WCDMA, namely the downlink HSDPA channels. It should be investigated how much can
be done in software on a DSP and how much, if anything, needs to be done in hardware on
an FPGA. The decoder was then to be implemented on and integrated into existing test environments.
A comprehensive study of WCDMA in general and HSDPA in particular has been carried
out. The specifications define in detail how encoding is done, so the core part of the thesis was
to design a decoder based on these. During the project there was a need to verify parts of the
implementation, so an encoder was programmed in Matlab, enabling control of all parameters.
It was concluded that decoding could be done entirely on the DSP, and working decoder
software was made. It does, however, have limitations in the number of users (mobiles) in the
system and only supports one cell. If some of the processing is migrated to hardware (FPGA),
these limitations could easily be overcome.
Keywords: 3G, WCDMA, HSDPA, DSP, Real-Time Decoder
PREFACE
This master's thesis is the final part of the MSc programme in Electrical Engineering. It has
been carried out at Ericsson AB Lindholmen, Gothenburg, in 2007.
We would like to thank our supervisor Stefan Davidsson at Ericsson and our examiner Per
Lindgren at LTU. We would also like to thank everyone at Ericsson who has supported us and
helped us with this project. We would especially like to mention Henrik Haggebrandt who
provided us with FPGA code, Johan Fredriksson who introduced us to the test environment,
Ulf Pettersson who helped us with the DSP, Joakim Eriksson who helped us integrate our
software into the existing environment and Anders Andersson & Jan Lindskog for HSDPA
related questions and ideas.
Ulf Andersson and Magnus Isaksson
Gothenburg, November 2007
CONTENTS
Chapter 1: Introduction
1.1 Background
1.2 Purpose
1.3 Limitations
1.4 Abbreviations
Chapter 2: Theory
2.1 Basics
2.1.1 Cellular networks
2.1.2 History
2.2 3G
2.2.1 3GPP
2.2.2 UTRAN
2.3 WCDMA
2.3.1 General Properties
2.3.2 Spreading
2.3.3 Scrambling
2.3.4 Multipath Diversity and Rake Receivers
2.3.5 Near/Far Problem and Power Control
2.3.6 Handovers
2.3.7 Protocol architecture
2.3.8 Medium Access Control Protocol
2.3.9 Channels
2.3.10 Logical Channels
2.3.11 Transport Channels
2.3.12 Physical Channels
2.3.13 Transport blocks
2.3.14 Channel coding and multiplexing
2.3.15 Spreading and modulation
2.4 HSDPA
2.4.1 Time units
2.4.2 Overview
2.4.3 Power control
2.4.4 Rate adaptation
2.4.5 Fast packet scheduling
2.4.6 MAC-hs
2.4.7 Hybrid ARQ with soft combining
2.4.8 HARQ in HSDPA
2.4.9 HS-DSCH
2.4.10 Coding of HS-DSCH
2.4.11 HS-SCCH
2.4.12 Coding of HS-SCCH
2.4.13 HS-DPCCH
Chapter 3: Test environment
3.1 TXAD
3.1.1 General overview
3.1.2 DSP
3.2 Test execution
Chapter 4: Method
4.1 Decoding
4.1.1 Descrambling
4.1.2 HS-SCCH
4.1.3 HS-DSCH
4.2 Matlab Encoder
4.3 Development tools
4.4 Other tools
4.5 Software
4.5.1 Flowchart
4.5.2 Setup
4.5.3 User handling
4.5.4 HS-SCCH decoding
4.5.5 HS-DSCH decoding
4.5.6 User checks
4.5.7 Cell checks
4.6 Specification
4.6.1 Signals
4.6.2 TXADCLI
4.6.3 Traces
Chapter 5: Result
5.1 Decoder
5.2 Measurements
Chapter 6: Discussion
6.1 Design choices
6.2 Future work
Appendix A: Modulation and FEC coding
A.1 Modulation
A.1.1 QPSK
A.1.2 16QAM
A.2 Error coding
A.2.1 Convolutional coding
A.2.2 Turbo Coding
A.3 Rate matching
A.3.1 Repetition
A.3.2 Puncturing
A.3.3 WCDMA Rate matching algorithm
A.3.4 Viterbi decoder
Appendix B: Charts
Appendix C: Abbreviations
CHAPTER 1
Introduction
1.1 Background
The “Baseband Processing” department at Ericsson Lindholmen is responsible for the design,
development, and maintenance of the Downlink and Uplink Baseband Processing subsystems
(DBP and UBP) for Ericsson’s 3G base stations. The IoV subdepartment is responsible for
real-time target integration and verification of the baseband processing system, including test
environment development, test case design and the execution of integration and verification.
Currently tests are in most cases executed and recorded for offline analysis. The memory
available for recording on the test hardware limits the maximum run-time of the tests. Longer
tests are necessary to detect errors that might only appear after running for a relatively long
time, possibly hours or days. To be able to run long tests, data has to be decoded and analyzed
in real time.
1.2 Purpose
The purpose of this project was to investigate the real-time requirements of decoding a subset of
the channels in WCDMA, more specifically the HSDPA downlink channels. How much can be done
in software (on a DSP, Digital Signal Processor) and how much needs to be done in hardware
(on an FPGA, Field-Programmable Gate Array)? This should then be implemented and integrated
into existing test environments.
1.3 Limitations
The decoding should be limited to one cell only, using the primary scrambling code. Transmit
diversity and MIMO (Multiple Input Multiple Output) should not be considered, and are not
covered.
1.4 Abbreviations
There are a lot of abbreviations used in this report. A collection of these can be found in
appendix C.
CHAPTER 2
Theory
2.1 Basics
2.1.1 Cellular networks
A cellular network is a radio network made up of a number of radio cells (or just cells) each
served by a fixed transmitter, known as a cell site or base station. These cells are used to cover
different areas in order to provide radio coverage over a wider area than the area of one cell.
[12]
The primary requirement of a cellular system is to have a method to distinguish the transmit-
ters in the different cells from each other. There are two ways to do this: Frequency Division
Multiple Access (FDMA) and Code Division Multiple Access (CDMA). Time Division Multiple
Access (TDMA) allows the same frequency to be used by different users in different time slots
but cannot be used alone to separate cells.
The increased capacity of a cellular network compared to a network with a single transmitter
comes from the fact that the same radio frequency can be reused in different geographical areas.
The frequency reuse factor is the rate at which the same frequency can be used in the network.
With FDMA the same frequencies cannot be used in neighbouring cells because of cell
overlap and inter-cell interference, so the frequency reuse factor is lower than 1. In CDMA,
cells are distinguished by codes rather than frequencies, which means that the frequency reuse
factor can be 1.
The use of multiple cells for mobile transceivers means that there has to be some mechanism
for the transceivers to change cells as they move around. This is usually called handover.
2.1.2 History
The 1st generation mobile phone systems (1G) were analog systems launched in the 1980s. One
such example is NMT (Nordic Mobile Telephone).
The 2nd generation systems (2G) were digital systems using either CDMA or TDMA multiplexing
techniques, launched in the 1990s. The most common one is GSM, using TDMA. The
second generation of networks was built mainly for telephone calls and slow data transmission.
Due to the rapid changes in information technology, these networks do not meet the requirements
of today’s wireless revolution.
The 3rd generation system (3G) is defined in the IMT-2000 standard created by ITU in 1999.
It defines five possible radio interfaces for 3G. One of these is WCDMA, which is the one used
in the European 3G standard UMTS.
2.2 3G
2.2.1 3GPP
The 3rd Generation Partnership Project (3GPP) is a collaboration agreement that was estab-
lished in December 1998. The scope of 3GPP is to make technical specifications for a globally
applicable third generation (3G) mobile phone system following the IMT-2000 standard. The
current partners are ARIB (Japan), CCSA (China), ETSI (Europe), ATIS (North America),
TTA (Korea) and TTC (Japan).
3GPP standards are referred to as releases, each one introducing new features. In 2000 the
first standard, Release ’99, was released, defining the UMTS network. Following this Release 4
came in 2001 and Release 5 in 2002, introducing HSDPA. The latest release is Release 6 from
2004, and Release 7 and 8 are in progress.
2.2.2 UTRAN
The radio access network in UMTS is called UTRAN (UMTS Terrestrial Radio Access Network),
see figure 2.1 for an overview of the architecture.
Figure 2.1: UMTS architecture
The UTRAN consists of one or more Radio Network Subsystems (RNS). Each RNS consists
of one Radio Network Controller (RNC) and one or more Node Bs, also called base stations
or RBS (Radio Base Station) in Ericsson terms. Each RNC is responsible for the control of
all radio resources of the Node Bs connected to it. The function of the Node B is air interface
processing and some radio resource management.
There are a lot of open interfaces connecting every part of the network to allow parts from
different manufacturers to be used. The interface between RNCs is called Iur and between
RNCs and Node Bs Iub. The radio access network is connected to a core network with the Iu
interface. There are two types of Iu interfaces; IuCS to accommodate circuit switched (CS) data
and IuPS to accommodate packet switched (PS) data. The Core Network (CN) is responsible
for switching and routing calls and data connections to external networks.
Mobile phones are called UE (User Equipment) and connect to the Node Bs over the Uu
interface. The Uu interface is the WCDMA radio interface, which will be described in the next
section.
2.3 WCDMA
2.3.1 General Properties
WCDMA, also referred to as UTRA (UMTS terrestrial radio access), is the air interface used
for UMTS (Universal Mobile Telecommunications System). WCDMA is a Wideband Direct Se-
quence Code Division Multiple Access (DS-CDMA) spread spectrum system. User information
bits are spread over a wide bandwidth by a spreading code and multiplied with a pseudo-
random scrambling code. The spread bits are called chips. WCDMA has a flexible multirate
transmission scheme to support transmission of different types of services with different data
rates and QoS (Quality of Service) parameters.
In a spread spectrum system the processing gain is the ratio of the spread bandwidth to the
unspread bandwidth. A higher processing gain allows a lower required signal-to-interference
ratio, or C/I (carrier-to-interference), at the cost of a lower bit rate. As an example, a ratio
of 256 gives a processing gain of 24 dB. The required power density over interference density is
typically 5 dB for speech service, which gives a required C/I = 5 − 24 = −19 dB [7]. This means
that the signal can be 19 dB below the interference or thermal noise power and still be detected.
This is the reason why spread spectrum systems have been used in military applications for
several decades.
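The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch: the function and variable names are chosen here for illustration, and the 5 dB figure is the typical speech value quoted in the text.

```python
import math

def processing_gain_db(bandwidth_ratio: float) -> float:
    """Processing gain in dB for a given spread/unspread bandwidth ratio."""
    return 10.0 * math.log10(bandwidth_ratio)

# Ratio of 256 between spread and unspread bandwidth, as in the text.
gain_db = processing_gain_db(256)

# Required power density over interference density for speech (from the text).
required_density_ratio_db = 5.0

# Required carrier-to-interference ratio before despreading.
required_c_i_db = required_density_ratio_db - gain_db

print(f"processing gain: {gain_db:.1f} dB")
print(f"required C/I:    {required_c_i_db:.1f} dB")
```

The exact gain is slightly above 24 dB, which the text rounds to 24 dB.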
There are two possible duplex modes, FDD (frequency division duplex) and TDD (time
division duplex). In FDD separate carrier frequencies are used for uplink and downlink,
while in TDD one carrier is time shared between downlink and uplink. FDD is the primary
mode used in UMTS and the one described from here on.
The chip rate in WCDMA is 3.84 Mcps (Megachips per second) and the carrier bandwidth
approximately 5 MHz. The carrier spacing has a raster of 200 kHz and can vary from 4.2 to 5.4
MHz depending on interference scenarios. The frame length is 10 ms and each frame is divided
into 15 slots.
From here on the uplink will only be described briefly and the downlink in more detail.
2.3.2 Spreading
The spreading process is also known as channelization. Spreading is basically done by assigning
the data bits 0 and 1 the values 1 and -1, repeating them by the spreading factor (SF) and
multiplying with the channelization code.
The channelization codes are orthogonal variable-length Walsh codes, also known as OVSF
(Orthogonal Variable Spreading Factor) codes. The creation of the codes can be recursively
defined as in equation 2.1.
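$$
C_{ch,1,0} = 1, \qquad
\begin{bmatrix} C_{ch,2,0} \\ C_{ch,2,1} \end{bmatrix}
=
\begin{bmatrix} C_{ch,1,0} & C_{ch,1,0} \\ C_{ch,1,0} & -C_{ch,1,0} \end{bmatrix},
$$
$$
\begin{bmatrix}
C_{ch,2^{n+1},0} \\
C_{ch,2^{n+1},1} \\
C_{ch,2^{n+1},2} \\
C_{ch,2^{n+1},3} \\
\vdots \\
C_{ch,2^{n+1},2^{n+1}-2} \\
C_{ch,2^{n+1},2^{n+1}-1}
\end{bmatrix}
=
\begin{bmatrix}
C_{ch,2^{n},0} & C_{ch,2^{n},0} \\
C_{ch,2^{n},0} & -C_{ch,2^{n},0} \\
C_{ch,2^{n},1} & C_{ch,2^{n},1} \\
C_{ch,2^{n},1} & -C_{ch,2^{n},1} \\
\vdots & \vdots \\
C_{ch,2^{n},2^{n}-1} & C_{ch,2^{n},2^{n}-1} \\
C_{ch,2^{n},2^{n}-1} & -C_{ch,2^{n},2^{n}-1}
\end{bmatrix}
\tag{2.1}
$$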
See figure 2.2 for the beginning of the code tree.
Figure 2.2: Walsh Tree
A given code can only be used if there are no other codes used on the path from that given
code to the root of the tree, or any code belonging to the sub-tree generated from that specific
code. Otherwise the code would not be orthogonal with every other code.
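As an illustration of the recursion in equation 2.1 and of the code tree restriction, the following Python sketch generates the codes for a given spreading factor. The function names are invented for this example and are not taken from the specifications.

```python
def ovsf_codes(sf: int) -> list[list[int]]:
    """Return the codes C_ch,SF,0 .. C_ch,SF,SF-1 as lists of +1/-1 chips."""
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of two")
    codes = [[1]]                                   # C_ch,1,0
    while len(codes) < sf:
        next_level = []
        for c in codes:
            next_level.append(c + c)                # even child: [c,  c]
            next_level.append(c + [-x for x in c])  # odd child:  [c, -c]
        codes = next_level
    return codes

def dot(a: list[int], b: list[int]) -> int:
    """Correlation (inner product) of two chip sequences."""
    return sum(x * y for x, y in zip(a, b))

def is_on_path_to_root(anc_sf: int, anc_k: int, sf: int, k: int) -> bool:
    """True if code (anc_sf, anc_k) lies on the path from code (sf, k) to the root."""
    return anc_sf <= sf and k // (sf // anc_sf) == anc_k

codes4 = ovsf_codes(4)

# Codes on the same level are mutually orthogonal.
orthogonal = all(dot(a, b) == 0
                 for i, a in enumerate(codes4)
                 for j, b in enumerate(codes4) if i != j)

# Why a parent blocks its sub-tree: two symbols (+1, -1) spread with
# C_ch,2,0 give exactly the chips of its child C_ch,4,1.
parent = ovsf_codes(2)[0]
two_parent_symbols = parent + [-c for c in parent]
```

Here `two_parent_symbols` is identical to `codes4[1]`, so a receiver despreading with $C_{ch,4,1}$ could not tell the two transmissions apart.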
The channelization codes are used in the uplink to separate the channels sent by one UE, and in
the downlink to separate the channels sent by one cell. The codes are denoted $C_{ch,SF,k}$, where
$k$ is the code number and SF the spreading factor. In WCDMA the SF may vary from 4 to 256 chips
on uplink channels and from 4 to 512 chips on downlink channels [4].
An example with a spreading factor of 4 can be seen in figure 2.3. In a correlation receiver the
signal is despread and integrated (summed) over one bit. On row 5 it can be seen that the sum
will be 4 or -4 respectively for data 1 or -1. On the last row the signal has been coded with
another spreading code; when despreading, the result is integration values close to zero.
To get the original data, the integrated values are divided by the code length. In the example
this results in dividing 4 and -4 by 4, yielding 1 and -1.
Figure 2.3: Example of spreading and despreading, SF = 4
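The spreading and despreading steps can be sketched in Python as follows. The two codes and the data bits are picked for illustration only and are not necessarily the ones drawn in figure 2.3. Perfect chip timing is assumed.

```python
def spread(bits: list[int], code: list[int]) -> list[int]:
    """Map bits 0/1 to +1/-1, repeat each by the SF and multiply with the code."""
    symbols = [1 if b == 0 else -1 for b in bits]
    return [s * c for s in symbols for c in code]

def despread(chips: list[int], code: list[int]) -> list[float]:
    """Correlate each block of SF chips with the code and divide by the code length."""
    sf = len(code)
    return [sum(x * c for x, c in zip(chips[i:i + sf], code)) / sf
            for i in range(0, len(chips), sf)]

code_a = [1, 1, -1, -1]   # C_ch,4,1
code_b = [1, -1, 1, -1]   # C_ch,4,2

chips = spread([0, 1, 1, 0], code_a)

own = despread(chips, code_a)     # receiver using the correct code
other = despread(chips, code_b)   # receiver using another code
```

With the correct code the receiver recovers the symbols 1, -1, -1, 1. With the other code every integration value is zero, since the codes are orthogonal and the timing is perfect.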
With perfect timing different codes are completely uncorrelated. Unfortunately the nature of
radio transmission, with multipath propagation, small timing errors and motion-related effects,
makes perfect timing impossible. Furthermore the code space is limited, so in order to separate
users and cells and solve the orthogonality problem scrambling codes are used.
2.3.3 Scrambling
Scrambling is used to separate UEs in the uplink and cells in the downlink from each other.
The scrambling codes are pseudo-random, or pseudo-noise (PN), codes. If two transmitters
use different codes there is a low, noise-like correlation at any time offset, where the average
correlation level is proportional to 1/(code length). The autocorrelation is also low if the
offset is larger than one chip.
There are two types of codes, long and short codes. The long codes are Gold codes truncated
to the 10 ms frame length, thus resulting in 38400 chips with 3.84 Mcps. The short code
length is 256 chips and the codes are chosen from the extended S(2) code family. The uplink
scrambling may use both long and short codes whilst downlink uses only long codes.
2.3.3.1 Downlink scrambling code generation
Figure 2.4: Scrambling code generator
A total of $2^{18} - 1$ scrambling codes (numbered 0 to 262142) can be generated [4]. Primarily
8192 of these are used (another 2 · 8192 are used for compressed mode), divided into 512 sets,
each consisting of one primary scrambling code and 15 secondary scrambling codes. The primary
scrambling codes are p = 16 · i where i = 0..511. The i:th set of secondary scrambling codes
consists of code numbers s = 16 · i + k where k = 1..15.
The set of primary scrambling codes is further divided into 64 scrambling code groups, each
consisting of 8 primary scrambling codes. The j:th scrambling code group consists of primary
scrambling codes 16 · 8 · j + 16 · k where j = 0..63 and k = 0..7. Each cell is associated with one
primary scrambling code.
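The numbering scheme above is easy to express in code. The following Python sketch uses helper names invented for this example and only restates the formulas given in the text.

```python
def primary_code(i: int) -> int:
    """Code number of the i:th primary scrambling code, i = 0..511."""
    if not 0 <= i <= 511:
        raise ValueError("i must be in 0..511")
    return 16 * i

def secondary_codes(i: int) -> list[int]:
    """Code numbers of the 15 secondary codes that belong to primary code i."""
    return [primary_code(i) + k for k in range(1, 16)]

def code_group(j: int) -> list[int]:
    """Code numbers of the 8 primary codes in scrambling code group j, j = 0..63."""
    if not 0 <= j <= 63:
        raise ValueError("j must be in 0..63")
    return [16 * 8 * j + 16 * k for k in range(8)]

# Every primary code belongs to exactly one of the 64 groups.
all_primaries = [primary_code(i) for i in range(512)]
grouped = [code for j in range(64) for code in code_group(j)]

# Primary and secondary codes together use code numbers 0..8191.
used = sorted(p for i in range(512)
              for p in [primary_code(i)] + secondary_codes(i))
```

For example, `code_group(1)` starts at code number 128, which is the primary code with i = 8.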
The reason for the code groups is to facilitate the cell search procedure. The code group is
signaled on one of the synchronization channels. The UE receives this and correlates all codes
in the group with the pilot channel, scrambled with the primary scrambling code of the cell.
When a peak is found the UE has found the scrambling code of the cell.
The scrambling code sequences are constructed by combining two real sequences into a complex
one. The basis for the real sequences is so-called m-sequences, or maximum length
sequences (MLS). An m-sequence cycles through all possible $2^m - 1$ states within the shift
register. A further description is outside the scope of this report.
The real sequences are constructed by position-wise modulo-2 sum of 38400 chip segments
from two m-sequences x and y with generator polynomials of degree 18. The polynomials are
$$
\begin{aligned}
G_x &= 1 + X^{7} + X^{18} \\
G_y &= 1 + X^{5} + X^{7} + X^{10} + X^{18}
\end{aligned}
\tag{2.2}
$$
In hardware this is implemented using maximal-length linear feedback shift registers (MLFS).
Figure 2.4 is taken from the 3GPP specifications [4].
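To make the shift register idea concrete, the Python sketch below steps a software model of an 18-bit linear feedback shift register for each of the polynomials in equation 2.2 and counts the steps until the start state returns. The mapping from polynomial to feedback taps (for example $x(i+18) = x(i+7) + x(i)$ modulo 2 for $G_x$) and the start state are assumptions made for this illustration. The initial register contents and the combination into a complex scrambling code are defined in the 3GPP specifications [4] and are not reproduced here.

```python
def lfsr_period(taps: tuple[int, ...], degree: int, seed: int = 1) -> int:
    """Count the steps until an LFSR returns to its start state.

    Bit k of `state` holds sequence element x(i + k). The new element
    x(i + degree) is the modulo-2 sum of the elements at the tap positions.
    """
    state = seed
    steps = 0
    while True:
        new_bit = 0
        for t in taps:
            new_bit ^= (state >> t) & 1
        # Shift out x(i) and insert the new element at the top.
        state = (state >> 1) | (new_bit << (degree - 1))
        steps += 1
        if state == seed:
            return steps

X_TAPS = (0, 7)           # from G_x = 1 + X^7 + X^18
Y_TAPS = (0, 5, 7, 10)    # from G_y = 1 + X^5 + X^7 + X^10 + X^18

period_x = lfsr_period(X_TAPS, 18)
period_y = lfsr_period(Y_TAPS, 18)
```

Both registers visit every non-zero 18-bit state before repeating, which is the maximum length property of an m-sequence.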
2.3.4 Multipath Diversity and Rake Receivers
When transmitting radio signals over land one will experience multiple reflections, diffraction
and attenuation of the signal energy. This is called multipath propagation and is caused by
buildings, mountains and so on. If these paths are nearly equal in length this will result in
signal cancellation, called fast fading.
If the difference in path length is large (> 78 m in UMTS, which is the speed of light divided
by the chip rate) the signal energy will arrive at the receiver at clearly distinguishable time
instants, giving a certain multipath delay profile. The receiver can then separate the multipath
components and combine them to obtain multipath diversity. Such a receiver is called a Rake
receiver. It has a number of so-called fingers allocated to the delay positions with significant
energy, and combines their outputs to recover the signal.
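The 78 m figure follows directly from the chip rate. The short Python sketch below does the calculation. The helper `is_resolvable` is a simplification invented for this example, treating two paths as separable as soon as their length difference exceeds one chip.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
CHIP_RATE_CPS = 3.84e6

# Duration of one chip and the distance a radio wave travels in that time.
chip_duration_s = 1.0 / CHIP_RATE_CPS
chip_length_m = SPEED_OF_LIGHT_M_S / CHIP_RATE_CPS

def is_resolvable(path_difference_m: float) -> bool:
    """True if two paths differ by more than one chip length."""
    return path_difference_m > chip_length_m

print(f"chip duration: {chip_duration_s * 1e6:.3f} microseconds")
print(f"chip length:   {chip_length_m:.1f} m")
```

Paths that differ by less than this distance fall within the same chip and add up before despreading, which is the fast fading case described above.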
2.3.5 Near/Far Problem and Power Control
Tight and fast power control is very important in WCDMA; without it a single overpowered
UE could block a whole cell. A mobile at the edge of a cell may suffer a much larger path loss
than one that is closer to the base station. The mobile close to the base station could then
easily drown out the other, giving rise to the near-far problem.
Open-loop power control is used by the UE to make a rough estimate of path loss by measuring
the downlink beacon channel CPICH as an initial power setting when entering a cell.
Fast closed-loop power control is used on the dedicated channels. The base station measures
the Signal-to-Interference Ratio (SIR) from each mobile and compares it to a reference level.
Based on this it will command the mobile to raise or lower its power. This is executed at a rate
of 1500 Hz. If no power control were used, it would be impossible for the Node B to decode
all received signals, due to the nature of CDMA. That is, the orthogonal coding requires all
signals to be of equal amplitude when despreading. The same method is used on the downlink
to provide more power to UEs at the cell edge and for power/radiation reduction purposes.
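The fast closed loop can be illustrated with a small simulation. This is a sketch under stated assumptions, not the algorithm from the specifications: the 1 dB step size, the static channel and all numeric values are hypothetical. One command is issued per slot, which matches the 1500 Hz rate (15 slots per 10 ms frame).

```python
def power_command(measured_sir_db: float, target_sir_db: float) -> int:
    """Return +1 (raise power) if the SIR is below the target, else -1 (lower)."""
    return 1 if measured_sir_db < target_sir_db else -1

def run_closed_loop(tx_power_db: float, path_loss_db: float,
                    interference_db: float, target_sir_db: float,
                    slots: int, step_db: float = 1.0) -> list[float]:
    """Simulate the transmit power over a number of slots, one command per slot."""
    history = []
    for _ in range(slots):
        # SIR seen by the receiver for the current transmit power.
        sir_db = tx_power_db - path_loss_db - interference_db
        tx_power_db += step_db * power_command(sir_db, target_sir_db)
        history.append(tx_power_db)
    return history

# Hypothetical values: the mobile starts 15 dB too low.
PATH_LOSS_DB = 100.0
INTERFERENCE_DB = -90.0
TARGET_SIR_DB = 5.0

history = run_closed_loop(0.0, PATH_LOSS_DB, INTERFERENCE_DB, TARGET_SIR_DB, slots=30)
final_sir_db = history[-1] - PATH_LOSS_DB - INTERFERENCE_DB
```

After the initial ramp the power toggles by one step around the level that gives the target SIR, since the loop only sends up or down commands.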
There is also an outer loop power control where the target SIR setpoint for each mobile can
be adjusted from Bit Error Ratio (BER) or Block Error Ratio (BLER) estimates by the Radio
Network Controller (RNC).
2.3.6 Handovers
When a user moves from one cell to another a handover has to occur. In a traditional hard
handover the current connection is broken before a connection to the new cell is established. In
a soft handover on the other hand the connection to the new cell is established before leaving the
current cell. This is the main type of handover in WCDMA, and a special case of this is softer
handover, where the links added or removed belong to the same base station. Hard handovers
are used in WCDMA to change to another frequency carrier or another system, like GSM.
To support hard handover something called compressed mode is used. This basically means
that either the spreading factor is decreased or bits are punctured to allow empty timeslots for
inter-frequency measurements. Compressed mode will not be covered further.
2.3.7 Protocol architecture
An overview of the protocol architecture can be seen in figure 2.5.
Figure 2.5: Overview of the protocol architecture
The three protocol layers are the physical layer (layer 1), data link layer (layer 2) and network
layer (layer 3). Layer 2 contains a number of sublayers, the Medium Access Control (MAC)
protocol and the Radio Link Control (RLC) protocol. There are also two service-dependent
protocols, the Packet Data Convergence Protocol (PDCP) and the Broadcast/Multicast Control
Protocol (BMC).
The PDCP is used for packet switched services, mainly IP. One of its main functions is header
compression. The BMC is used for cell broadcast services.
The RRC (Radio Resource Control) protocol encapsulates higher layer control messages (call
control, session management etc.) for transmission over the radio interface. The control
interfaces between the RRC and the lower layer protocols are used to configure parameters for
the different channels, measurements, error reporting etc.
The RLC provides segmentation and retransmission services, flow control and ciphering etc.
2.3.8 Medium Access Control Protocol
In the Medium Access Control (MAC) the logical channels are mapped to transport channels.
The MAC layer consists of three logical entities, MAC-b for the broadcast channel, MAC-c/sh
for common and shared channels and MAC-d for the dedicated channels. Other functions
performed in the MAC include but are not limited to:
• selection of appropriate Transport Format for each Transport Channel depending on in-
stantaneous source rate
• priority handling between data flows of one UE and between UEs
• multiplexing/demultiplexing of upper layer PDUs (Protocol Data Units) into/from
transport blocks
• traffic volume measurement
2.3.9 Channels
As can be seen in figure 2.5 there are three different channel types: Logical Channels, Trans-
port Channels (TrCH) and Physical Channels (PhCH). The channel mapping from logical to
transport to physical channels is shown in figure 2.6. There are also physical channels used
only for physical procedures.
2.3.10 Logical Channels
There are two different types of logical channels, Control Channels and Traffic Channels.
Control Channels
• Broadcast Control Channel (BCCH) - Downlink channel for broadcasting system control
information.
• Paging Control Channel (PCCH) - Downlink channel that transfers paging information.
• Dedicated Control Channel (DCCH) - A point-to-point bidirectional channel that trans-
mits dedicated control information between the network and a mobile station.
• Common Control Channel (CCCH) - A bidirectional channel that transmits control
information between the network and a UE.
Figure 2.6: Channel mapping
Traffic Channels
• Dedicated Traffic Channel (DTCH) - A point-to-point channel dedicated to one mobile
station for the transfer of user information. Can exist in both uplink and downlink.
• Common Traffic Channel (CTCH) - A point-to-multipoint downlink channel for the
transfer of dedicated user information to all or a group of UEs.
2.3.11 Transport Channels
Two types of transport channels exist, dedicated channels and common channels. The only
dedicated channel is DCH. The common channels are
• Broadcast Channel (BCH) - Downlink transport channel used to broadcast system- and
cell-specific information. Transmitted over the entire cell with low fixed bit rate.
• Forward Access Channel (FACH) - Downlink transport channel that carries information
to mobile stations known to be located in the given cell.
• Paging Channel (PCH) - Downlink transport channel that carries data relevant to the
paging procedure. Associated with the transmission of physical-layer Paging Indicators
to support efficient sleep-mode procedures.
• Random Access Channel (RACH) - Uplink transport channel intended to carry control
information or small amounts of packet data from the mobile station. Characterized by
a collision risk and by using only open-loop power control.
2.3.12 Physical Channels
• Physical Random Access Channel (PRACH) - Carries RACH.
• Primary Common Control Physical Channel (P-CCPCH) - Carries BCH at a fixed rate
of 30 kbps with channelization code $C_{ch,256,1}$.
• Secondary Common Control Physical Channel (S-CCPCH) - Carries FACH and PCH
at a variable rate.
• Dedicated Physical Data Channel (DPDCH) - Carries DCH.
• Dedicated Physical Control Channel (DPCCH) - Carries control information.
• Synchronisation Channel (SCH) - Used for cell search, two sub-channels: primary and
secondary SCH.
• Common Pilot Channel (CPICH) - Carries a predefined bit sequence with channelization
code $C_{ch,256,0}$.
• Acquisition Indicator Channel (AICH) - Carries acquisition indicators used in the random
access procedure.
• Paging Indication Channel (PICH) - Carries page indicators to indicate a page message
on the PCH.
2.3.13 Transport blocks
Transport block sets with one or more Transport Blocks (TB) arrive from the MAC at the
physical layer every TTI. The Transmission Time Interval (TTI) is TrCH specific and can be
10 (one radio frame), 20, 40 or 80 ms.
The transport format (TF) defines the data in a transport block set and consists of two
parts, semi-static and dynamic. The semi-static parts are common to all transport formats in
a transport channel and are:
• Transmission time interval (TTI)
• Error protection scheme
• Size of CRC
• Static rate matching parameter
The dynamic parts can differ between transport formats. They are:
• Transport block size
• Transport block set size
All transport formats associated with a channel form a transport format set (TFS), and each
format has a unique identifier called the transport format identifier (TFI). Several transport
channels can be multiplexed to one coded composite transport channel (CCTrCH), as described
in the next section. The collection of transport formats used in a CCTrCH is called the transport
format combination (TFC), with the identifiers called TFCI.
2.3.14 Channel coding and multiplexing
The general coding and multiplexing of transport channels DCH, RACH, BCH, FACH and
PCH is shown in figure 2.7. The inputs to the process are the transport block sets arriving
every TTI.
Figure 2.7: Downlink multiplexing and channel coding
Each step is described briefly in the following sections.
2.3.14.1 CRC attachment
Cyclic Redundancy Check (CRC) is used for error detection on transport blocks. Higher layers
decide whether a CRC of 24, 16, 12 or 8 bits is attached, or no CRC at all. The generator
polynomials used are:
• $g_{CRC24}(D) = D^{24} + D^{23} + D^{6} + D^{5} + D + 1$
• $g_{CRC16}(D) = D^{16} + D^{12} + D^{5} + 1$
• $g_{CRC12}(D) = D^{12} + D^{11} + D^{3} + D^{2} + D + 1$
• $g_{CRC8}(D) = D^{8} + D^{7} + D^{4} + D^{3} + D + 1$
For example, $g_{CRC8}(D)$ corresponds to the polynomial bit string 110011011 (coefficients of
$D^{8}$ down to $D^{0}$). The information bits are divided modulo 2 by the generator polynomial
and the remainder becomes the checksum.
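The modulo-2 division can be sketched in a few lines of Python. This is a minimal illustration only: the names `GENERATORS`, `poly_bits`, `crc` and `check` are made up for this example, and the sketch does not model the order in which the specification attaches the parity bits to the block.

```python
# Generator polynomials from the list above, written as the exponents of D.
GENERATORS = {
    24: (24, 23, 6, 5, 1, 0),
    16: (16, 12, 5, 0),
    12: (12, 11, 3, 2, 1, 0),
    8: (8, 7, 4, 3, 1, 0),
}


def poly_bits(exponents):
    """Return the generator as a list of bits, highest power of D first."""
    degree = max(exponents)
    return [1 if (degree - k) in exponents else 0 for k in range(degree + 1)]


def crc(bits, size):
    """Remainder of the message (shifted by `size` bits) divided modulo 2."""
    gen = poly_bits(GENERATORS[size])
    work = list(bits) + [0] * size        # make room for the checksum
    for i in range(len(bits)):
        if work[i]:                       # XOR the generator in under each leading 1
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[-size:]                   # what is left is the checksum


def check(block_with_crc, size):
    """True if the checksum at the end of the block matches its data part."""
    data, parity = block_with_crc[:-size], block_with_crc[-size:]
    return crc(data, size) == parity
```

A receiver recomputes the checksum over the decoded data and compares it with the received one; any mismatch marks the transport block as erroneous.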
2.3.14.2 Transport block concatenation and code block segmentation
After CRC attachment the transport blocks are concatenated and possibly segmented into
different code blocks.
The number of transport blocks on TrCH $i$ is denoted $M_i$, and $B_i$ is the number of bits in
each block, including CRC parity bits. The number of bits after serial concatenation of the $M_i$
blocks is $X_i = M_i B_i$. The bit sequence is segmented if $X_i > Z$, where $Z = 504$ for
convolutional coding and $Z = 5114$ for turbo coding.
The number of code blocks is
$$C_i = \left\lceil \frac{X_i}{Z} \right\rceil. \qquad (2.3)$$
For $C_i > 0$ the number of bits in each code block is
$$K_i = \left\lceil \frac{X_i}{C_i} \right\rceil. \qquad (2.4)$$
For turbo coding, if $X_i < 40$ then $K_i = 40$.
If $X_i$ is not a multiple of $C_i$, or if turbo coding is used and $X_i < 40$, filler bits are added to
the beginning of the first block. The number of filler bits, $Y_i$, is determined by
$$Y_i = C_i K_i - X_i. \qquad (2.5)$$
Concatenation is done to avoid the overhead of added tail bits, while segmentation is done
to keep down the implementation complexity.
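Equations 2.3 to 2.5 can be evaluated directly. The following sketch assumes nothing beyond the rules stated above; the function name `segment` and the dictionary `Z_MAX` are chosen for this example only.

```python
# Maximum code block size Z for each coding scheme, as given in the text.
Z_MAX = {"convolutional": 504, "turbo": 5114}


def segment(x_bits, coding):
    """Return (C, K, Y): code blocks, bits per block and filler bits."""
    z = Z_MAX[coding]
    c = -(-x_bits // z)                   # ceil(X / Z), equation 2.3
    if c == 0:
        return 0, 0, 0                    # nothing to encode
    k = -(-x_bits // c)                   # ceil(X / C), equation 2.4
    if coding == "turbo" and x_bits < 40:
        k = 40                            # minimum turbo code block size
    y = c * k - x_bits                    # filler bits, equation 2.5
    return c, k, y
```

For example, 6000 concatenated bits with turbo coding are split into two code blocks of 3000 bits without filler bits, while 30 bits are padded with 10 filler bits to reach the minimum block size of 40.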
2.3.14.3 Channel coding
The bits are encoded with one of the Forward Error Correcting algorithms, convolutional or
turbo coding, described in appendix A.
After encoding, the number of bits $Y_i$ for each code block depends on the coding scheme
according to:
• Convolutional coding
– Rate 1/2: $Y_i = 2K_i + 16$.
– Rate 1/3: $Y_i = 3K_i + 24$.
• Turbo coding
– Rate 1/3: $Y_i = 3K_i + 12$.
The encoded blocks are serially concatenated. The total number of output bits is $E_i = C_i Y_i$.
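As a small worked example of these formulas, the sketch below computes the encoded size of a code block and of a whole transport channel. The table `EXPANSION` and the function names are illustrative only.

```python
# (multiplier, extra bits) per code block for each scheme and rate listed above.
EXPANSION = {
    ("convolutional", "1/2"): (2, 16),
    ("convolutional", "1/3"): (3, 24),
    ("turbo", "1/3"): (3, 12),
}


def coded_block_bits(k, coding, rate):
    """Number of bits Y after encoding one code block of K bits."""
    multiplier, extra = EXPANSION[(coding, rate)]
    return multiplier * k + extra


def total_output_bits(c, k, coding, rate):
    """Number of bits E after concatenating the C encoded blocks."""
    return c * coded_block_bits(k, coding, rate)
```

Two turbo coded blocks of 3000 bits each thus give 2 · (3 · 3000 + 12) = 18024 output bits.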
2.3.14.4 Rate matching
WCDMA provides flexible data rates and the number of bits on a transport channel can vary
between different TTIs. The rate matching adapts this resulting bit rate to the limited possible
bit rates of a physical channel. Bits are repeated or punctured according to the rate matching
attribute, which is semistatic and can only be changed through higher layer signaling. The rate
matching algorithm is further described in section A.3.3 in appendix A.
2.3.14.5 Insertion of DTX indication, fixed positions
Either fixed or flexible positions of the transport channels in the radio frame can be used; this
is not explained in detail here. If fixed positions are used, a fixed number of bits is reserved for
each TrCH in the radio frame. If the positions are fixed and the output from the rate matching
stage does not fill up the reserved bits, DTX indications are inserted here. DTX indications mark
where the transmission should be turned off.
2.3.14.6 1st interleaving
The 1st interleaving is a block interleaver with inter-column permutations. For 10 ms TTI this
stage is transparent.
2.3.14.7 Radio frame segmentation
For TTI’s longer than 10 ms the input bit sequence is segmented and mapped onto Fi consec-
utive radio frames.
2.3.14.8 TrCH Multiplexing
Every 10 ms, one radio frame from each TrCH is delivered to the TrCH multiplexing. These
radio frames are serially multiplexed into a coded composite transport channel (CCTrCH).
2.3.14.9 Insertion of DTX, flexible positions
If the positions are flexible and there still are bits left in the radio frame these are filled up with
DTX indications.
2.3.14.10 Physical channel segmentation
When more than one PhCH is used this step divides the bits among the different PhCH’s. Note
that the actual mapping is done after interleaving.
2.3.14.11 2nd interleaving
The second interleaver performs intra-frame interleaving. It is applied separately for each
physical channel segment.
It is a block interleaver and consists of bits input to a matrix with padding, the inter-column
permutation for the matrix and bits output from the matrix with pruning.
Let U be the number of bits in one radio frame, then the output bit sequence from the block
interleaver is derived as follows:
1. Assign C2 = 30 to be the number of columns of the matrix. The columns of the matrix
are numbered 0, 1, 2, ..., C2 − 1 from left to right.
2. Determine the number of rows of the matrix, R2, by finding minimum integer R2 such
that U ≤ R2 · C2. The rows of rectangular matrix are numbered 0, 1, 2, ..., R2 - 1 from
top to bottom.
3. Write the input bit sequence into the R2×C2 matrix row by row starting in column 0 of
row 0. If R2 · C2 > U dummy bits are added to the end.
4. Perform the inter-column permutation for the matrix based on the pattern shown in table
2.1, where P2(j) is the original column position of the j-th permuted column.
5. The output of the block interleaver is the bit sequence read out column by column from
the inter-column permuted matrix. The output is pruned by deleting dummy bits that
were padded to the input of the matrix before the inter-column permutation.
Number of columns C2 | Inter-column permutation pattern P2(0), P2(1), ..., P2(C2 − 1)
30 | 0, 20, 10, 5, 15, 25, 3, 13, 23, 8, 18, 28, 1, 11, 21, 6, 16, 26, 4, 14, 24, 19, 9, 29, 12, 2, 7, 22, 27, 17
Table 2.1: Inter-column permutation pattern
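The five steps above translate almost line by line into code. The sketch below is one possible rendering, not a reference implementation; the names `interleave_2nd`, `deinterleave_2nd` and the padding marker `PAD` are introduced here for illustration. The inverse function, which a decoder needs, is obtained by interleaving the bit positions and undoing the resulting permutation.

```python
C2 = 30  # number of columns
P2 = [0, 20, 10, 5, 15, 25, 3, 13, 23, 8, 18, 28, 1, 11, 21,
      6, 16, 26, 4, 14, 24, 19, 9, 29, 12, 2, 7, 22, 27, 17]
PAD = object()  # marker for dummy bits, removed again on output


def interleave_2nd(bits):
    """Block interleaver with inter-column permutation and pruning."""
    u = len(bits)
    r2 = -(-u // C2)                                  # smallest R2 with U <= R2 * C2
    padded = list(bits) + [PAD] * (r2 * C2 - u)       # dummy bits at the end
    rows = [padded[r * C2:(r + 1) * C2] for r in range(r2)]
    out = []
    for j in range(C2):                               # read column by column
        source_column = P2[j]                         # original position of column j
        for r in range(r2):
            value = rows[r][source_column]
            if value is not PAD:                      # prune the dummy bits
                out.append(value)
    return out


def deinterleave_2nd(bits):
    """Inverse operation, as needed on the receiving side."""
    order = interleave_2nd(list(range(len(bits))))    # order[k] = input index of output k
    out = [None] * len(bits)
    for k, source_index in enumerate(order):
        out[source_index] = bits[k]
    return out
```

With exactly 30 input bits the matrix has a single row, so the output order is simply the permutation pattern of table 2.1.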
2.3.14.12 Physical channel mapping
The physical channel mapping is described in section 2.3.9.
2.3.15 Spreading and modulation
[Figure: block diagram with the blocks Serial to Parallel and I/Q Mapper, channelization code
Cch,SF,m, multiplication by j, sum I + jQ and scrambling code Sdl,n; input PhCH #n, output S]
Figure 2.8: Downlink spreading and modulation
Figure 2.8 shows the spreading that is done for every physical channel except SCH. SCH
instead carries a special predefined synchronization sequence. QPSK is used for all channels
(again except for SCH). With HSDPA (which will be described in the next section) another
modulation, 16QAM, may be used.
2.3.15.1 IQ mapping
QPSK The binary value 0 is mapped to 1, the binary value 1 to -1 and DTX to 0. In the
serial-to-parallel converter every even binary symbol is mapped to an I branch and every odd
symbol to a Q branch.
16QAM A set of four binary symbols nk, nk+1, nk+2, nk+3 are serial-to-parallel converted to
two binary symbols on the I branch, i1 = nk, i2 = nk+2, and two on the Q branch, q1 =
nk+1, q2 = nk+3. These are then mapped to 16QAM according to table A.1.
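A minimal sketch of the QPSK mapping and of the serial-to-parallel step for 16QAM is given below. DTX is represented by `None`, which is a choice made for this example; the final 16QAM constellation lookup is left out because it depends on table A.1.

```python
DTX = None  # representation of a DTX indication in this sketch


def qpsk_level(symbol):
    """Map binary 0 to +1, binary 1 to -1 and DTX to 0."""
    if symbol is DTX:
        return 0
    return 1 if symbol == 0 else -1


def qpsk_map(symbols):
    """Even symbols go to the I branch, odd symbols to the Q branch."""
    if len(symbols) % 2:
        raise ValueError("QPSK needs an even number of symbols")
    return [complex(qpsk_level(symbols[k]), qpsk_level(symbols[k + 1]))
            for k in range(0, len(symbols), 2)]


def split_16qam(symbols):
    """Group four symbols into ((i1, i2), (q1, q2)) pairs."""
    if len(symbols) % 4:
        raise ValueError("16QAM needs a multiple of four symbols")
    return [((symbols[k], symbols[k + 2]), (symbols[k + 1], symbols[k + 3]))
            for k in range(0, len(symbols), 4)]
```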
2.3.15.2 Channelization and scrambling
The I and Q branches are spread to the chip rate by the channelization code Cch,SF,m, described
in section 2.3.2. The resulting chip sequence on the Q branch will be multiplied with j and
summed with the corresponding chip on the I branch, resulting in a complex sequence.
The resulting sequence of complex valued chips will then be scrambled by a complex-valued
scrambling code Sdl,n, described in section 2.3.3. Then each channel will be weighted by a
weight factor Gi before being combined by complex addition.
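The spreading chain can be sketched as follows, assuming the channelization code is given as a list of ±1 chips and the scrambling code as a list of complex chips. How these codes are generated is covered in sections 2.3.2 and 2.3.3 and is not modelled here; the function names and the code values in the example are hypothetical.

```python
def spread(symbols, chan_code, scrambling_code, weight=1.0):
    """Spread complex I + jQ symbols to chip rate, scramble and weight them."""
    # Each symbol is repeated over one channelization code period.
    chips = [symbol * c for symbol in symbols for c in chan_code]
    # Chip-wise multiplication with the (repeating) complex scrambling code.
    return [weight * chip * scrambling_code[n % len(scrambling_code)]
            for n, chip in enumerate(chips)]


def combine(channels):
    """Combine equally long chip sequences by complex addition."""
    return [sum(chips) for chips in zip(*channels)]
```

Since the same real-valued channelization code is applied to both branches, multiplying the complex symbol I + jQ by each code chip is equivalent to spreading the I and Q branches separately and then forming I + jQ.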
2.3.15.3 RF Modulation
After the spreading process the complex chip sequence is modulated as shown in figure 2.9
before RF transmission. T in this figure is the final baseband IQ data that will be referenced
later.
[Figure: T enters a splitter; Re{T} and Im{T} each pass through pulse shaping and are
multiplied by cos(ωt) and sin(ωt), respectively]
Figure 2.9: Downlink modulation
2.4 HSDPA
The High Speed Downlink Packet Access data packet transmission was first defined in the 3GPP
Release 4 with a peak rate of 4 Mbps, further developed in Release 5, and then complemented
with the uplink equivalent HSUPA (High Speed Uplink Packet Access, also known as EUL,
Enhanced Uplink) in Release 6. The release referred to in this text is Release 6.
HSDPA is a shared downlink resource: one transport channel is shared between multiple
users in the cell. The service is packet switched (PS), which means that user data is very bursty,
so the link resource for one user does not need to be reserved throughout the connection. In
other words, the available radio resources must be dynamically shared between the users that
are currently requesting data.
Now consider multiple users connected to a Node B. All of these users must be able to get
data on request at all times during their connection. This implies that they must have their
own channelization code on a dedicated channel to separate them from each other, causing
channelization codes to start to run out rapidly when more users are connecting. One solution
is to let each user have its own downlink scrambling code, but then the orthogonality from the
single source (Node B) would be lost.
Category | Maximum number of codes | Inter-TTI | Maximum transport block size (bits) | Data rate | Modulation schemes
1 | 5 | 3 | 7298 | 1.2 Mb/s | Both
2 | 5 | 3 | 7298 | 1.2 Mb/s | Both
3 | 5 | 2 | 7298 | 1.8 Mb/s | Both
4 | 5 | 2 | 7298 | 1.8 Mb/s | Both
5 | 5 | 1 | 7298 | 3.6 Mb/s | Both
6 | 5 | 1 | 7298 | 3.6 Mb/s | Both
7 | 10 | 1 | 14411 | 7.2 Mb/s | Both
8 | 10 | 1 | 14411 | 7.2 Mb/s | Both
9 | 15 | 1 | 20251 | 10.1 Mb/s | Both
10 | 15 | 1 | 27952 | 14.0 Mb/s | Both
11 | 5 | 2 | 3630 | 0.9 Mb/s | QPSK
12 | 5 | 1 | 3630 | 1.8 Mb/s | QPSK
Table 2.2: HSDPA UE categories
Instead, HSDPA uses a set of 15 channelization codes shared in the time domain and a shared
control signaling channel which conveys control information to the UE’s, informing the
connected UE’s when data is available for a certain UE ID. This resolves the code problem,
and does not affect the QoS since PS services do not need a guaranteed time delay
on data packets.
There are 12 different HSDPA categories, with different modulation schemes and numbers
of codes. Different transmission spacings, called inter-TTI intervals, are also used to effectively
limit the bandwidth of certain categories. This is done simply by utilizing only every second or
every third transmission time interval (TTI). The theoretical data rates range from 0.9 to
14 Mbps, and are all listed in table 2.2.
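The data rates in table 2.2 are consistent with dividing the maximum transport block size by the time between two transmissions, i.e. the inter-TTI interval times the 2 ms HSDPA TTI. The sketch below reproduces the column under that assumption; the dictionary `CATEGORIES` and the function name are illustrative.

```python
TTI_MS = 2  # HSDPA transmission time interval in milliseconds

# category: (maximum number of codes, inter-TTI, maximum transport block size)
CATEGORIES = {
    1: (5, 3, 7298), 2: (5, 3, 7298), 3: (5, 2, 7298), 4: (5, 2, 7298),
    5: (5, 1, 7298), 6: (5, 1, 7298), 7: (10, 1, 14411), 8: (10, 1, 14411),
    9: (15, 1, 20251), 10: (15, 1, 27952), 11: (5, 2, 3630), 12: (5, 1, 3630),
}


def peak_rate_mbps(category):
    """Largest transport block divided by the minimum time between blocks."""
    _codes, inter_tti, max_block_bits = CATEGORIES[category]
    bits_per_ms = max_block_bits / (inter_tti * TTI_MS)
    return bits_per_ms / 1000.0           # bits per ms -> Mbit/s
```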
2.4.1 Time units
There are three important time units used in HSDPA. These are:
Radio frame: 10 ms. 38400 chips divided into 15 slots.
Slot: 667 µs. 2560 chips.
Subframe: TTI for HSDPA consists of three slots i.e. 2 ms. 7680 chips.
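These numbers follow from the chip rate of 3.84 Mcps. A short check, with constant names chosen for this example:

```python
CHIP_RATE = 3_840_000          # chips per second (3.84 Mcps)
CHIPS_PER_FRAME = 38400
SLOTS_PER_FRAME = 15
SLOTS_PER_SUBFRAME = 3

CHIPS_PER_SLOT = CHIPS_PER_FRAME // SLOTS_PER_FRAME
CHIPS_PER_SUBFRAME = CHIPS_PER_SLOT * SLOTS_PER_SUBFRAME


def duration_us(chips):
    """Duration of a number of chips in microseconds."""
    return chips / CHIP_RATE * 1e6
```

Here `duration_us(CHIPS_PER_SLOT)` gives roughly 667 µs, and one subframe of 7680 chips lasts 2 ms.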
2.4.2 Overview
There are three key concepts in HSDPA separating it from R99 services.
• Rate adaptation
• Hybrid ARQ with soft combining
• Fast packet scheduling
The rate adaptation adjusts the data rate to a specific user on a frame-by-frame basis, based
on the current channel quality reported by the UE. Each TTI a new rate is selected for
each receiving user by adjusting modulation scheme, forward error-correction coding (FEC) and
redundancy version, where the highest rate is assigned to the user with the best channel
conditions. This user also gets the highest scheduling priority from the fast packet scheduling
process. Hybrid ARQ, or HARQ, is an in-sequence delivery and redundancy versioning process
that makes use of erroneous packets by combining them with retransmissions.
Modulation schemes used are QPSK and 16QAM for HS-DSCH, and just QPSK for HS-
SCCH. FEC coding algorithms are Turbo coding for HS-DSCH and convolutional coding for
HS-SCCH. These techniques are described in appendix A.
Figure 2.10: HSDPA channel structure overview
The channel structure of HSDPA is shown in figure 2.10. The downlink information is carried
on the HS-DSCH and HS-SCCH channels, while uplink information is carried on one or more
dedicated channels as in R99. There is also an uplink signaling channel called HS-DPCCH, and
an extra dedicated downlink channel to carry power control commands to the UE (for uplink
power control).
The downlink channels are shared between every UE in the cell. Note that HS-DSCH is a
transport channel, and figure 2.10 only shows physical channels. HS-PDSCH is the physical
HS-DSCH constituent, and there can be up to 15 of these physical channels, also called HS-
DSCH codes in this text. How these 15 channels are shared among the users is decided by the
scheduler.
2.4.3 Power control
Because HSDPA uses a shared channel resource for the receivers, the power cannot be controlled
per user, since the users are not at the same distance from the Node B (nor do they have equal
channel conditions). Instead, the power is kept at a fairly constant level: HSDPA is given the
power left over between what the R99 channels use and a level near the maximum output of the
power amplifier. In this way, the cell power is always utilized well.
Figure 2.11: Illustrative example of power utilization in HSDPA
If a certain user is located near the Node B, there are two scenarios which may apply. Either
this user has sufficient power to get a clear channel and can utilize HSDPA at its peak rate, or
the dedicated channels are using too much power, so that the user must receive at a lower rate.
Now, if this user is located far from the Node B (near the cell edge), two similar scenarios
exist. Either this user has sufficient reception to receive at a quite high rate if the cell is not
highly populated, or the cell is populated with many other users using dedicated channels,
hence this user must receive at a very low rate using basic modulation if at all possible.
This rate adaptation is described in the next section. Because the spreading factor is constant
for HSDPA, the processing gain is also constant, and the rate must be lowered in order to receive
properly on bad channels. This is the main reason why the scheduling of data is first done in
the time domain, so that one user near the Node B with good channel conditions does not get
affected by a user with poor channel conditions (and get a lower rate).
2.4.3.1 Uplink power control
It is still critical to have tight power control on the uplink channels to avoid interfering with
other UE’s. This is implemented by assigning each UE a dedicated channel, carrying power control
commands in the TPC field. This may also be used for complementary circuit switched data
such as speech.
Regarding the issue with downlink channelization codes, it should be mentioned that the
dedicated channel can use a SF of up to 512 since data is not carried, so there will be enough
DPCH’s for HS UE’s assuming not all HS-DSCH codes are used. If more UE’s need to be
connected for instance when using all HS-DSCH codes, fractional DPCH (F-DPCH) may be
used, allowing several UE’s to share one DPCH.
2.4.4 Rate adaptation
HSDPA uses the following link adaptation techniques. As mentioned earlier, power control
is omitted for the downlink and replaced by changing the modulation scheme between QPSK
and 16QAM and the code rate of the FEC code to adjust the bit rate. The basic idea is to lower
the modulation order and to lower the effective code rate of the FEC, i.e. to add redundancy,
in order to allow UE’s with a weak or interfered signal to receive properly.
The higher modulation of 16QAM provides twice the bit rate compared to QPSK on clear
channels, and puncturing of parity bits gives a code rate of almost 1 (0.9715). On the other
hand, if the channel is noisy or has a weak signal due to bad coverage or interference, the
modulation can be changed to standard QPSK with a coding rate of 1/3, with repetition of
both systematic and parity bits to increase redundancy.
The UE’s with favorable channel conditions are prioritized by the scheduler, hence given
better average throughput. Refer to the next section about scheduling for further details on
this.
The link adaptation is frame-based, that is, for each TTI a new transport format will be
chosen dynamically. When there are favorable link conditions, 16QAM and (close to) 1/1
coding rate is applied.
The CQI field of the uplink control channel HS-DPCCH (see section 2.4.13) contains infor-
mation reflecting the downlink channel conditions which is used by MAC-hs in Node B for
selecting transport format.
2.4.5 Fast packet scheduling
Scheduling in HSDPA differs from other WCDMA services in the sense that it uses
instantaneous measurements of the channel conditions to schedule packets to several users. The
user with the best link conditions will get the highest priority, either in the time domain or in
the code domain, but preferably in the time domain if the UE’s are experiencing different channel
conditions. Otherwise, the one with the best channel will be forced to receive at a lower data rate
than possible. This is also called channel-dependent scheduling in some literature.
If there are, for instance, two users sharing one channel with similar link conditions, they
will share the channel resources equally, utilizing half of the code set each. If one of them
suffers from poor signal strength due to bad coverage, the one with the best link conditions
will get more air time and will also be able to use a higher code rate (less redundancy) and
higher-order modulation. Therefore, the channel is always utilized well, and the average
throughput for the better link will be high. Figure 2.12 shows an example of channel-dependent
scheduling between three users, who suffer from dips in channel quality at different times.
Figure 2.12: Example of scheduling three users with variations in channel quality
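The principle of channel-dependent scheduling in the time domain can be illustrated with a deliberately simplified sketch: in every TTI the user reporting the best channel quality is served. The function name and the quality values are made up for this example, and a real scheduler would also consider code multiplexing, queue status and fairness.

```python
def schedule(quality_per_tti):
    """Pick, for every TTI, the user with the best reported channel quality.

    quality_per_tti is a list with one dict per TTI, mapping user -> quality.
    """
    return [max(quality, key=quality.get) for quality in quality_per_tti]
```

With the reports `[{"A": 10, "B": 5, "C": 7}, {"A": 3, "B": 9, "C": 7}, {"A": 2, "B": 4, "C": 8}]`, users A, B and C are served in turn, each while the others experience a dip in channel quality.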
The scheduling functionality resides in MAC-hs in Node B and not in the RNC as for the
other R99 services. The reason for this is to keep the round trip times short, hence enabling
high data rates.
2.4.6 MAC-hs
The MAC protocol provides data transport between physical layer (layer 1) and higher sub-
layers of layer 2 through the RLC protocol, and also scheduling of MAC packet transmissions
to different users connected to the Node B (see section 2.3.8 about WCDMA). Both of these
protocols are part of the Data Link Layer (layer 2) of the OSI model, but the MAC-hs resides
in the Node B whilst the MAC-d/MAC-c/sh and the RLC are in the RNC.
The reason for placing MAC-hs in Node B is obvious if the data rates and the possible
number of Node B’s connected to each RNC are taken into account; the fast signaling requires
fast operation of the MAC-hs, thus its position is close to the physical layer in the Node B to
relieve the RNC. The drawback is that this adds complexity to the relatively simple Node B.
The MAC-hs can be seen as an extension of the MAC-d/MAC-c/sh protocols, dropped down
to the physical layer. Its primary tasks are HARQ process management/in-sequence delivery to
higher layers, user scheduling, transport format selection and extended flow control. The MAC-
hs receives a flow of MAC-d PDU’s which are priority sorted between a set of priority queues,
each tied to a channel of its own. These are then prioritized and segmented into MAC-hs PDU’s
and sent using HARQ.
The MAC-hs specific control signaling is mapped directly onto HS-SCCH in the physical
layer. The information consists of HARQ parameters, transport format selection and UE ID.
Mapping and coding of these parameters is described in section 2.4.11.
Figure 2.13: MAC-hs encapsulation of TCP/IP packets
Now consider each MAC-d PDU as a MAC-hs SDU, that is, MAC-d header and MAC-d
payload. Each MAC-hs PDU consists of a MAC-hs header (normally 21 bits when using fixed
MAC-d PDU size) and a sequence of N MAC-d PDU’s. If the header and the MAC-d PDU’s
together are smaller than the transport block size, the MAC-hs PDU is padded with dummy bits
to fit the transport block. The data encapsulation from IP packets down to MAC-hs PDU’s is
shown in figure 2.13.
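The relation between transport block size, header, payload and padding can be sketched as follows. The MAC-d PDU size of 336 bits used in the usage note is an arbitrary example value, not taken from the text, and the function name is hypothetical.

```python
MAC_HS_HEADER_BITS = 21  # header size for a single set of equally sized MAC-d PDUs


def fill_transport_block(block_bits, mac_d_pdu_bits, header_bits=MAC_HS_HEADER_BITS):
    """Return (N, padding): how many MAC-d PDUs fit and how many dummy bits remain."""
    if block_bits < header_bits:
        raise ValueError("transport block is smaller than the MAC-hs header")
    payload_bits = block_bits - header_bits
    n = payload_bits // mac_d_pdu_bits                 # whole MAC-d PDUs only
    padding = payload_bits - n * mac_d_pdu_bits        # filled with dummy bits
    return n, padding
```

A transport block of 7298 bits would, for example, hold 21 MAC-d PDU's of 336 bits each, leaving 221 bits of padding.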
The initial user data has already been encapsulated into the IP packet in layer 3 by the
remote serving application (for instance, a web server). The IP header is then compressed by
the PDCP protocol in layer 2 (RNC), which maps higher-level protocol characteristics onto the
characteristics of the underlying radio-interface protocols, providing protocol transparency for
higher-layer protocols. The PDCP header and payload form the RLC SDU. This is split into
40-byte payload segments, each with its own RLC header. These then form the MAC-d
SDU’s.
2.4.6.1 MAC-hs header
The MAC-hs header format is shown in figure 2.14. In this case, all MAC-d PDU’s are of equal
size, yielding a 21 bit header. If they were of different sizes, they would be grouped in sets
according to their size and the SID/N/F fields would be repeated in the header for each set
yielding 11 bits extra per additional PDU set. Normally, they are all of equal size though.
Figure 2.14: MAC-hs header
The VF version flag is reserved for future extensions. If this bit is a 1, the header is not valid
as of today.
QueueID identifies a reordering queue, which buffers received blocks in a receiving window in
order to deliver the transport blocks in-sequence. All MAC-d PDU’s of a MAC-hs PDU belong
to the same reordering queue.
TSN is the transmission sequence number of the transport block, used when reordering. The
in-sequence delivery is implemented on top of HARQ, since there is no interaction between the
HARQ processes.
SID is a size index identifier that specifies the size of the MAC-d PDU’s. N is the number
of MAC-d PDU’s in this set (normally there is only one set, as mentioned previously).
Finally, F indicates the end of the header.
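A parser for this header could look like the sketch below. The field widths (VF 1 bit, QueueID 3 bits, TSN 6 bits, SID 3 bits, N 7 bits, F 1 bit) are an assumption taken from the 3GPP MAC specification rather than from the text above, although they agree with the 21 bit header and the 11 extra bits per additional set. The sketch also assumes that F = 1 marks the end of the header. All function and key names are illustrative.

```python
def take(bits, pos, width):
    """Read an unsigned field of `width` bits, most significant bit first."""
    if pos + width > len(bits):
        raise ValueError("truncated MAC-hs header")
    value = 0
    for bit in bits[pos:pos + width]:
        value = (value << 1) | bit
    return value, pos + width


def parse_mac_hs_header(bits):
    """Parse VF, QueueID, TSN and one or more (SID, N, F) sets."""
    vf, pos = take(bits, 0, 1)
    if vf != 0:
        raise ValueError("version flag set: header format not valid")
    queue_id, pos = take(bits, pos, 3)
    tsn, pos = take(bits, pos, 6)
    sets = []
    while True:
        sid, pos = take(bits, pos, 3)
        n, pos = take(bits, pos, 7)
        f, pos = take(bits, pos, 1)
        sets.append((sid, n))
        if f == 1:                        # assumed: F = 1 ends the header
            break
    return {"queue_id": queue_id, "tsn": tsn, "sets": sets, "header_bits": pos}
```

The returned `header_bits` tells the decoder where the first MAC-d PDU starts in the transport block.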
2.4.7 Hybrid ARQ with soft combining
For a perfect transmission channel with no noise and maximum amplitude of the received signal
at all times, there is no need for any forward error-correction coding (FEC) or retransmission
algorithms. In practice, there are no perfect channels, hence there must be
some sort of error handling. Even a short copper or fiber cable can suffer interference from
external electro-magnetic sources, and this is often taken care of by either packet
acknowledgment/retransmission or FEC.
Retransmission schemes are the best choice if the channel conditions are good enough to cope
without FEC most of the time (as for cable links), and FEC schemes are the best choice for noisy
channels such as air interfaces.
For an air interface, however, there are no guarantees whatsoever that even a highly redundant
packet can be received at all times, due to fluctuations in the channel. A better approach
is a combination of the retransmission scheme Automatic Repeat Request (ARQ) and FEC,
called Hybrid ARQ, or HARQ. In this case, the received packets are first decoded according
to the FEC scheme used, and then checked for errors using CRC. If an erroneous packet was
received despite FEC coding, the receiver discards the packet and sends a NACK back to the
transmitter. Otherwise, an ACK is sent to indicate that the packet could be salvaged.
ARQ is also responsible for in-sequence delivery of the packets, since retransmissions
cause new transmissions inside the receiving window to arrive out of sequence.
This is handled by a reordering queue in the receiver.
2.4.8 HARQ in HSDPA
HSDPA uses an extension of HARQ, HARQ with soft combining, which keeps erroneous frames
in a soft buffer (called virtual IR buffer at the transmitter side), since they still contain some
amount of intact data, and requests a retransmission of the same frame. This retransmission may
be either the same data bits as were sent last time, including systematic bits, or a new set of parity
bits with or without systematic bits. The retransmission is then combined with the soft buffer,
until there is enough information for the frame to be decoded successfully.
There are two types of retransmission combination: Chase Combining and Incremental
Redundancy.
2.4.8.1 Chase Combining
In a Chase Combining (CC) algorithm, retransmitted frames are exact copies of the original
one, and are combined until the frame can be decoded. This is a simple algorithm, since the bits
to keep from each retransmitted frame are simply those that differ from the buffered frame.
Eventually, assuming that the interference may be regarded as random, i.e. uniformly distributed,
the combination of frames will equal the original data sent by the transmitter.
However, if the channel suffers from long-term random interference, this is not a very efficient
algorithm since there may be several retransmissions for each frame, taking channel bit-rate
down severely.
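One common way to realise the combining, and not necessarily the formulation used in this thesis, is to add the received soft values of each bit position and take the hard decision on the sum. The sketch below assumes that a positive soft value means binary 0 and a negative one binary 1, in line with the QPSK mapping in section 2.3.15.1; the function names are illustrative.

```python
def chase_combine(soft_buffer, received):
    """Add the soft values of a retransmission to the buffered ones."""
    if len(soft_buffer) != len(received):
        raise ValueError("a Chase retransmission must have the same length")
    return [old + new for old, new in zip(soft_buffer, received)]


def hard_decision(soft_values):
    """Positive soft values are read as binary 0, negative ones as binary 1."""
    return [0 if value >= 0 else 1 for value in soft_values]
```

If the bits 0, 1, 1, 0 are received as `[0.9, -0.2, 0.3, 1.1]` and then as `[0.7, -1.0, -0.8, -0.1]`, each reception on its own contains one bit error, but the sum `[1.6, -1.2, -0.5, 1.0]` decodes correctly.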
2.4.8.2 Incremental Redundancy
Another approach is Incremental Redundancy (IR). In this algorithm the frame is retransmit-
ted with increasing redundancy and perhaps even without systematic bits (or with punctured
systematic bits). Like Chase Combining, the retransmissions are combined with the original
frame until enough redundancy is accumulated. This results in better error coding/lower code
rate, since it is often more efficient to use more dense coding than to retransmit several times
if the channel quality is poor. That is, the redundancy is adapted to channel conditions.
This is much more robust to static interference than CC, since one retransmission generally
contains the redundancy omitted in the initial transmission.
As an example, consider a maximum coding, i.e. a minimum code rate, of 1/4. Let’s say that the
initial data frame was transmitted with a code rate of 3/4, and a transmission error occurred
due to interference on the channel. The first retransmission would then add redundancy so that
the effective code rate after combining becomes 3/8. Suppose this frame was also altered by
interference; the second retransmission would then bring the effective code rate down to 1/4,
hence maximum redundancy has been sent.
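The rates in this example are what results if every transmission carries the same number of coded bits and no bit is sent twice, so that the accumulated redundancy grows linearly. A short check under that assumption, with a hypothetical function name:

```python
from fractions import Fraction


def effective_code_rates(info_bits, coded_bits_per_transmission, transmissions):
    """Effective code rate after each transmission has been combined."""
    return [Fraction(info_bits, coded_bits_per_transmission * n)
            for n in range(1, transmissions + 1)]
```

For 3 information bits per 4 coded bits in each transmission, `effective_code_rates(3, 4, 3)` gives 3/4, 3/8 and 1/4.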
2.4.8.3 Physical layer details
All of the HARQ functionality resides in MAC-hs in the physical layer. The principle of the
rate matching is depicted in figure 2.15.
The specifications in [3] allow the UE to have a soft buffer that is smaller than what the
largest transport blocks require, i.e. smaller than the number of bits output from the Turbo coder
for a certain valid transport block. This limitation is tied to the UE category mentioned
previously, and is signaled at connection setup.
In order to accommodate the coded transport block bits in the buffer, these must first be rate
matched (puncturing only) to perfectly fill the buffer. This is done in the first rate matching
stage, where systematic bits are always preserved.
Figure 2.15: Principle of the HARQ two-stage rate matching
The second rate matching stage performs the actual redundancy selection for retransmissions,
as well as rate matching for physical channels. If for instance one QPSK channel is used, there
are 960 available channel bits, and the bits input must be matched to this using either repetition
or puncturing. The rate matching pattern is calculated as described in section 2.4.10.5.
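The number of available channel bits follows from the subframe length, the spreading factor of 16 used by HS-DSCH and the number of bits per modulation symbol. The sketch below is a plain calculation with names chosen for this example; it only decides between repetition and puncturing and does not compute the rate matching pattern itself.

```python
CHIPS_PER_SUBFRAME = 7680
SPREADING_FACTOR = 16
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}


def channel_bits(codes, modulation):
    """Physical channel bits available in one 2 ms TTI."""
    symbols_per_code = CHIPS_PER_SUBFRAME // SPREADING_FACTOR
    return symbols_per_code * BITS_PER_SYMBOL[modulation] * codes


def second_stage_action(bits_in, bits_out):
    """Whether the second rate matching stage must repeat or puncture."""
    if bits_in > bits_out:
        return "puncture"
    if bits_in < bits_out:
        return "repeat"
    return "none"
```

One QPSK code gives 960 bits, 15 QPSK codes give 14400 bits and 15 16QAM codes give 28800 bits per TTI.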
The redundancy version parameters s and r are defined in more detail in section 2.4.10.5
and the coding of them into HS-SCCH is defined in 2.4.12.1. For s = 1, systematic bits will
also be preserved (if possible) in the second rate matching stage, and for s = 0 the number of
systematic bits depends solely on the rate matching algorithms.
Depending on the amount of available channel bits and the redundancy version parameters,
different rate matching scenarios will apply. If systematic bits are prioritized and the amount
of channel bits is enough for the systematic bits, only parity bits will be punctured. On the
other hand, if systematic bits are not prioritized, this implies that parity bits are prioritized
instead. In this case, the systematic bits are primarily punctured. When repetition is used,
however, both systematic and parity bits are repeated in both cases.
An example of HARQ rate matching for both CC and IR is shown in figure 2.16. The
transport block size is 13904 bits (all channels, QPSK), and the soft buffer is 38400 bits.
Figure 2.16: Example of rate matching for different redundancy versions, when using 15 QPSK channels
The initial transmission is equal to a CC retransmission. The bottom row of bits represents an
IR transmission, where in this example only complementary (and some copies of) parity bits
are sent, i.e. systematic bits are not prioritized and do not fit into the physical channels.
The parity bits to send are actually not contiguous bit sequences as shown in the figure, but are
chosen from the entire parity block. Even parity bits that have already been sent (at the end of
the last transmission) may be picked out by the redundancy (rate matching) algorithm.
The transmissions/retransmissions are handled by several parallel stop-and-wait HARQ pro-
cesses, one for each receiver at the receiving side and one for each channel and receiver at the
transmitting side.
2.4.9 HS-DSCH
The High-Speed Downlink Shared Channel utilizes both time- and code multiplexing. According
to the specifications, the first choice is multiplexing in the time domain.
HS-DSCH uses a constant SF of 16. 15 of these channelization codes are available for HS-
DSCH use, because the first code is reserved to preserve higher spreading factors. This enables
up to 15 users to share the channel during each time frame (TTI).
The TTI is also different from the other services of WCDMA, since it has been reduced to
2 ms (3 time slots) to boost throughput. The main reason for this is the bottleneck
introduced by TCP/IP, where data packets need to be acknowledged very quickly, which cannot
be done using longer TTI’s. This has also been counteracted with the HARQ functionality
described in more detail in sections 2.4.7 and 2.4.10.5.
The maximum bit rate is obtained when all 15 codes are assigned to one UE. With 3.84
Mcps, a spreading factor of 16 and the 16QAM modulation scheme (4 bits per symbol), the bit
rate becomes $\frac{3.84}{16} \cdot 4 \cdot 15 = 14.4$ Mbps. This is
however not the net data transmission rate, since the overhead of the error-correction coding
is not taken into account, even though this is only a few percent when using full puncturing of
parity bits. Moreover, further overhead is added by the MAC-hs header, reducing the net data rate
even more.
Figure 2.17: Example of code and time multiplexing using five HS-DSCH codes, shared between four
users
An illustration of four UE’s sharing an HS-DSCH with 5 channelization codes is shown in
figure 2.17. The sharing is made through both code- and time multiplexing of available TTI
slots between the users. Note that the order of the code assignment to the users is of no
importance regarding the transmission itself. The scheduler selects which users to serve, how
many codes to assign to each of them, and how the time frames are shared between them. The
selection of which user gets the most bandwidth is complex and is done by the scheduler in
MAC-hs in the Node B (see section 2.4.5).
2.4.10 Coding of HS-DSCH
MAC-hs in the Node B chooses a suitable transport block size (TBS) for this particular channel
(see section 2.4.4). The TBS, together with the modulation scheme and the number of allocated
codes, decides the effective data quantity to be transferred. If the TBS is too small for the
transmission, repetition of coded bits is performed in the HARQ stage, see figure 2.18 which
illustrates the coding chain. If the TBS is too large, coded bits must be punctured (removed) in
order to fit. This may include puncturing of systematic bits, if they are not prioritized.
Error correction is obtained using 1/3 rate Turbo coding of the transport block concatenated
with CRC-24. The CRC provides error check once the data is decoded.
Bit scrambling of raw data is performed in order to iron out any long sequences of ones or
zeros. This is important to avoid repeated low-frequency electromagnetic interference, which
was a fundamental problem in 2G GSM UEs.
Interleaving of coded data is also important when using high-speed wireless communication.
The interleaver re-arranges the input bits, so that systematic bits and parity bits are evenly
spread out in the time domain, hence making the error correction more robust to dynamic radio
interference.
Figure 2.18: HS-DSCH coding chain
The coding chain illustrated in figure 2.18 is described in more detail in the following sections.
It is related to the coding chain described in 2.3.14. The same notation as in [3], section
4.5, is used here. Note that some quantities have two names, for instance A = TBS.
2.4.10.1 CRC attachment
The 24 CRC bits are calculated from the A (or TBS) transport block bits aim1, aim2, .., aimA,
according to 2.3.14.1. There is only one transport block. The checksum is appended to the end
of the transport block, producing the bit sequence bim1, bim2, .., bimB , where B = A + 24.
2.4.10.2 Bit scrambling
The bits output from the CRC calculation, Bim, are scrambled by XOR’ing them with the
scrambling sequence generated by the following algorithm. The resulting B scrambled bits are
denoted Dim.
The generator polynomial g is given in equation 2.6, and the initial conditions are given in
equation 2.7.
g = {g1, g2, .., g16} = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1} (2.6)
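$$y_\gamma = \begin{cases} 0, & -15 < \gamma < 1 \\ 1, & \gamma = 1 \end{cases} \qquad (2.7)$$
Then, the scrambling sequence yγ is calculated according to
$$y_\gamma = \left( \sum_{x=1}^{16} g_x \cdot y_{\gamma - x} \right) \bmod 2, \qquad 1 < \gamma \le B \qquad (2.8)$$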
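The recursion in equations 2.6 to 2.8 can be sketched in C as a 16-stage shift register. This is only an illustration, not the decoder code of this thesis: the function name `hs_bit_scramble` and the one-bit-per-byte layout are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Generator taps g1..g16 from equation 2.6 (index 0 is unused). */
static const uint8_t G[17] = {0, 0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,1};

/* Scramble n bits in place (one bit per byte, values 0 or 1).
 * hist[j] holds y(gamma - 1 - j), the 16 most recent sequence values. */
void hs_bit_scramble(uint8_t *bits, size_t n)
{
    uint8_t hist[16] = {0};               /* y = 0 for -15 < gamma < 1 */
    for (size_t gamma = 1; gamma <= n; gamma++) {
        uint8_t y = 1;                    /* y = 1 for gamma = 1 */
        if (gamma > 1) {
            y = 0;
            for (int x = 1; x <= 16; x++) /* sum of g_x * y(gamma - x), mod 2 */
                y ^= (uint8_t)(G[x] & hist[x - 1]);
        }
        for (int j = 15; j > 0; j--)      /* shift the history by one step */
            hist[j] = hist[j - 1];
        hist[0] = y;
        bits[gamma - 1] ^= y;             /* XOR with the scrambling sequence */
    }
}
```

Since scrambling is a plain XOR, calling the same function a second time on the output restores the original bits, which is what the descrambler in section 4.1.3.6 relies on.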
2.4.10.3 Code block segmentation
This is done according to section 2.3.14.2 for turbo coding. Note that there is only one transport
block.
2.4.10.4 Turbo coding
Oir bits are encoded using Turbo code at rate 1/3, according to section A.1. Each code block
is coded separately, yielding 12 tail bits per code block, 4 for systematic and 2 ·4 = 8 for parity.
That is, the extra overhead after Turbo coding is Ci · 12.
The resulting E bits are denoted ci1, ci2, .., ciE .
2.4.10.5 HARQ rate matching
Before rate matching stage one, input bits are separated into three sequences X1,X2,X3 with
systematic bits in the first, and parity bits in the other two according to
$$\begin{aligned} x_{1,i,k} &= c_{i,3(k-1)+1} \\ x_{2,i,k} &= c_{i,3(k-1)+2} \\ x_{3,i,k} &= c_{i,3(k-1)+3} \end{aligned} \qquad k = 1..X_i \qquad (2.9)$$
where
$$X_i = \frac{E}{3}. \qquad (2.10)$$
First rate matching stage This rate matching stage operates on the three bit sequences
defined above. The number of bits input i.e. the number of coded bits is E, and the size of the
virtual IR buffer is denoted NIR which is signaled from layer 2 for each HARQ process.
Repetition is never performed in this stage, and puncturing of systematic bits is not allowed.
The puncturing decision and pattern calculation follows.
If NIR ≥ E
Coded bits will fit into the buffer and no puncturing needs to be performed.
else
Coded bits do not fit into the buffer, and puncturing must be performed.
end
That is, when NIR < E, redundancy is punctured according to the general rate matching
algorithm described in section A.3. The number of parity bits that need to be punctured is
∆NTTIil = NIR − E (a negative sign means puncturing), where i and l indicate the transport
channel and transport format combination respectively. Both of these indices are static for
HS-DSCH and do not need to be considered.
The number of bits out of the first stage is denoted Nsys for systematic bits, Np1 and Np2 for
parity 1 and parity 2 bits respectively, and given in equation 2.11.
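$$\begin{aligned} N_{sys} &= X_i \\ N_{p1} &= X_i - \left| \frac{\Delta N^{TTI}_{il}}{2} \right| \\ N_{p2} &= X_i - \left| \frac{\Delta N^{TTI}_{il}}{2} \right| \end{aligned} \qquad (2.11)$$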
Second rate matching stage The following parameters decide the redundancy version of
the current transport block, as an effect of the number of input bits, the available channel bits,
and the RV (Redundancy Version) parameters. This stage is always operative and may, as mentioned
earlier, perform both puncturing and repetition of both systematic bits and parity bits. After
the redundancy version has been selected, the appropriate rate matching is performed in order
to fit the selected bits into the physical channels.
Available channel bits are denoted Ndata and given below as a function of modulation scheme
MS, where QPSK = 0 and 16QAM = 1, and number of physical channels P .
Ndata = (MS + 1) · 960 · P (2.12)
If Ndata ≤ Nsys + Np1 + Np2 then puncturing is performed. The number of systematic bits
after puncturing is
Nt,sys = min (Nsys, Ndata) (2.13)
for a transmission that prioritizes systematic bits (s = 1), and similarly
Nt,sys = max (Ndata − (Np1 + Np2) , 0) (2.14)
for a transmission that prioritizes non systematic bits (s = 0).
That is, for s = 1, if all systematic bits fit into the physical channels, all of them are
preserved during puncturing and only parity bits are removed. Otherwise, as many systematic bits
as fit are sent and all parity bits are punctured.
For s = 0, no systematic bits are sent if only the parity bits fit into the physical
channels; otherwise the systematic bits fill the space that remains after the parity bits are
fitted.
If Ndata > Nsys+Np1+Np2 then repetition is performed. The number of bits after repetition
is
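$$N_{t,sys} = \left\lfloor \frac{N_{sys} \cdot N_{data}}{N_{sys} + 2 N_{p1}} \right\rfloor \qquad (2.15)$$
$$N_{t,p1} = \left\lfloor \frac{N_{data} - N_{t,sys}}{2} \right\rfloor \qquad (2.16)$$
$$N_{t,p2} = \left\lceil \frac{N_{data} - N_{t,sys}}{2} \right\rceil \qquad (2.17)$$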
for systematic, parity 1 and parity 2 bits respectively. Of course, s has no effect here.
A closer look at these equations reveals that the bits are evenly repeated to fill the physical
channels. Note that Np1 = Np2, but Nt,p1 does not necessarily equal Nt,p2.
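The bit counts of equations 2.12 to 2.17 can be collected in one small C function. This is a sketch only: the names `hs_rm2_counts` and `hs_rm2_bit_counts` are made up for the example, and it assumes that the parity split of equations 2.16 and 2.17 is applied in the puncturing case as well, which the text above does not state explicitly.

```c
/* Number of systematic, parity 1 and parity 2 bits after stage two. */
typedef struct { long sys, p1, p2; } hs_rm2_counts;

/* ms: 0 = QPSK, 1 = 16QAM; p: number of physical channels;
 * s: 1 prioritizes systematic bits, 0 prioritizes parity bits. */
hs_rm2_counts hs_rm2_bit_counts(long n_sys, long n_p1, long n_p2,
                                int ms, int p, int s)
{
    long n_data = (long)(ms + 1) * 960 * p;              /* eq. 2.12 */
    hs_rm2_counts t;

    if (n_data <= n_sys + n_p1 + n_p2) {                 /* puncturing */
        if (s == 1) {
            t.sys = n_sys < n_data ? n_sys : n_data;     /* eq. 2.13 */
        } else {
            long left = n_data - (n_p1 + n_p2);          /* eq. 2.14 */
            t.sys = left > 0 ? left : 0;
        }
    } else {                                             /* repetition */
        t.sys = (n_sys * n_data) / (n_sys + 2 * n_p1);   /* eq. 2.15, floor */
    }

    long rest = n_data - t.sys;
    t.p1 = rest / 2;                                     /* eq. 2.16, floor */
    t.p2 = rest - t.p1;                                  /* eq. 2.17, ceiling */
    return t;
}
```

For example, 1000 systematic and 2 · 1000 parity bits on five QPSK codes (Ndata = 4800) end up in the repetition branch and give 1600 bits of each kind.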
The general rate matching algorithm described in section A.3 is applied to the three bit
sequences, using rate matching parameters defined in table 2.3.
                  Xi      eplus      eminus
Systematic bits   Nsys    Nsys       |Nsys − Nt,sys|
Parity 1 bits     Np1     2 · Np1    2 · |Np1 − Nt,p1|
Parity 2 bits     Np2     Np2        |Np2 − Nt,p2|
Table 2.3: Rate matching parameters for second HARQ rate matching stage
The final rate matching parameter eini as a function of RV parameter r is then given by
equations 2.18 and 2.19, for puncturing and repetition respectively.
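$$e_{ini}(r) = \left( \left( X_i - \left\lfloor \frac{r \cdot e_{plus}}{r_{max}} \right\rfloor - 1 \right) \bmod e_{plus} \right) + 1 \qquad (2.18)$$
$$e_{ini}(r) = \left( \left( X_i - \left\lfloor \frac{(s + 2 \cdot r) \cdot e_{plus}}{2 \cdot r_{max}} \right\rfloor - 1 \right) \bmod e_{plus} \right) + 1 \qquad (2.19)$$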
where rmax is the maximum number of redundancy versions for the given MS,
rmax = (2 − MS) · 2 (2.20)
that is, 4 for QPSK and 2 for 16QAM, in agreement with the range of r in table 2.6. The r
versions are listed in section 2.4.12.1.
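Equations 2.18 and 2.19 translate almost directly into C. The sketch below uses hypothetical function names and takes rmax as a parameter; the only subtlety is that the inner term can become negative (for parity 1, eplus = 2 · Xi), while the `%` operator in C may return a negative remainder, so a helper forces the result into 0..eplus − 1.

```c
/* Remainder in the range 0..m-1, also for negative a. */
static long mod_pos(long a, long m)
{
    long r = a % m;
    return r < 0 ? r + m : r;
}

/* eq. 2.18: initial error value when puncturing. */
long hs_eini_puncturing(long xi, long eplus, int r, int rmax)
{
    return mod_pos(xi - (r * eplus) / rmax - 1, eplus) + 1;
}

/* eq. 2.19: initial error value when repeating. */
long hs_eini_repetition(long xi, long eplus, int s, int r, int rmax)
{
    return mod_pos(xi - ((s + 2 * r) * eplus) / (2 * rmax) - 1, eplus) + 1;
}
```

With Xi = 1000, eplus = 2000, r = 3 and rmax = 4 the inner term is 1000 − 1500 − 1 = −501, which wraps to 1499 and gives eini = 1500.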
2.4.10.6 HARQ bit collection
HARQ bit collection stage performs the bit collection algorithm that follows. The bit sequences
from rate matching stage 2 are read into a rectangular interleaving matrix column by column.
The dimensions of the matrix are Nrow × Ncol, where
$$N_{row} = \begin{cases} 2, & MS = 0 \\ 4, & MS = 1 \end{cases} \qquad (2.21)$$
$$N_{col} = \frac{N_{data}}{N_{row}} \qquad (2.22)$$
Ncol can also be written as
Ncol = P · 480 (2.23)
since Ndata is proportional to Nrow. The parameters defined in equations 2.24
and 2.25 specify where systematic bits and parity bits are to be written into the matrix. Nr is
the number of rows where systematic bits are to be written, and Nc is the number of columns
where systematic bits are written in Nr + 1 rows. Furthermore, Nc = 0 if there are no parity
bits.
$$N_r = \left\lfloor \frac{N_{t,sys}}{N_{col}} \right\rfloor \qquad (2.24)$$
Nc = Nt,sys − Nr · Ncol (2.25)
If Nc = 0 and Nr > 0, the systematic bits are written into rows 1..Nr. Otherwise, systematic
bits are written into rows 1..Nr + 1 in the first Nc (left-most) columns, and if Nr > 0 the
remaining bits are written into the remaining Ncol − Nc columns (using Nr rows).
In other words, either all systematic bits fit on one row using Nc columns, or they
fit on the first Nr + 1 rows using Nc columns plus Nr rows in the column span [Nc + 1, Ncol].
Note that matrix indices start at 1.
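As a small illustration, the matrix dimensions and the systematic bit boundaries of equations 2.21 to 2.25 can be computed as follows. The struct and function names are invented for this sketch.

```c
/* Layout of the HARQ bit collection matrix. */
typedef struct { int n_row, n_col, n_r, n_c; } hs_collect_dims;

/* ms: 0 = QPSK, 1 = 16QAM; p: number of physical channels;
 * nt_sys: number of systematic bits after rate matching. */
hs_collect_dims hs_collect_layout(int ms, int p, int nt_sys)
{
    hs_collect_dims d;
    d.n_row = ms ? 4 : 2;                /* eq. 2.21 */
    d.n_col = 480 * p;                   /* eq. 2.23 */
    d.n_r   = nt_sys / d.n_col;          /* eq. 2.24, full systematic rows */
    d.n_c   = nt_sys - d.n_r * d.n_col;  /* eq. 2.25, columns with one more */
    return d;
}
```

With five QPSK codes and 3000 systematic bits this gives a 2 × 2400 matrix with Nr = 1 and Nc = 600: the first 600 columns hold two systematic bits each, the remaining 1800 columns hold one systematic bit and one parity bit.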
Figure 2.19: Illustrative example of the HARQ bit collection matrix
The parity bits are then written into the remaining space, column by column with alternating
order of parity 1 and parity 2 bits. See figure 2.19 for an illustrative example for the case Nc ≠ 0.
Since the matrix element count is always equal to the number of physical channel bits, there
will not be any padding bits to take care of.
Finally, the bits are read out from the matrix row-by-row. The resulting bit sequence is
denoted WR.
2.4.10.7 Physical channel segmentation
WR is divided into P segments of length U, where U = R/P. U is the number of bits in each
physical channel, either 960 for QPSK or 1920 for 16QAM. The segments are ordered in sequence
with the first segment later mapped to the first physical channel, where the bits are denoted
uP1, uP2, .., uPU .
Bits on the P th physical channel after segmentation:
uP,k = wk+(P−1)·U , k = 1..U. (2.26)
2.4.10.8 Interleaving
This stage is important to enhance the error coding properties. Consider the scenario of a 2 ms
long TTI with 960 bits in sequence. Now, if the bits were arranged as systematic bit, parity bit
1, parity bit 2, ... and so on, an interference peak of a very short duration would easily destroy
a number of bits and their parity bits. This makes error correction difficult. Instead, the bits
are spread around in the bit sequence, making it much less likely that the same interference
would destroy the same amount of bits, hence error correction is made more robust.
The interleaver used here is the basic 2nd block interleaver described in section 2.3.14.11,
using 32 rows times 30 columns (32×30). The interleaving is done in three steps as follows.
1. The input sequence is written row by row in chunks of 30 bits into the 32 × 30 matrix.
2. Then the columns are permuted according to a predefined permutation table.
3. Finally, the bits are read out column by column, each column from the top down.
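The three steps can be sketched in C as a single pass over the output sequence. This is an illustration under two assumptions: the read-out is taken to be column by column, and the permutation table is passed in by the caller as `perm` (the actual table is defined in the 3GPP specification and is not reproduced here). The function name is hypothetical.

```c
#include <stdint.h>

enum { IL_ROWS = 32, IL_COLS = 30 };

/* perm[j] is the original column that ends up at position j
 * after the inter-column permutation. */
void hs_block_interleave(const uint8_t in[IL_ROWS * IL_COLS],
                         uint8_t out[IL_ROWS * IL_COLS],
                         const uint8_t perm[IL_COLS])
{
    int n = 0;
    for (int j = 0; j < IL_COLS; j++)        /* steps 2 and 3: permuted columns */
        for (int i = 0; i < IL_ROWS; i++)    /* each column is read top-down */
            out[n++] = in[i * IL_COLS + perm[j]];  /* step 1: row-wise layout */
}
```

The matrix is never stored explicitly: element (i, c) of the row-wise filled matrix is simply `in[i * 30 + c]`, which is also why a deinterleaver can work from a table of column start offsets.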
For QPSK, one block interleaver is used and the input bits uP,1, uP,2, .., uP,U are fed through to
produce the output sequence vP,1, vP,2, .., vP,U , where P is the physical channel and U is the
number of bits in each physical channel. For 16QAM, on the other hand, two identical block
interleavers are used and the input bits are separated 60 at a time, with the first two bits of
each group of four going to the upper interleaver and the next two bits to the lower one.
2.4.10.9 Constellation re-arrangement
There are four modulation constellations for 16QAM. The bits output from the interleaver are
grouped in four bit long sequences so that vp,k, vp,k+1, vp,k+2, vp,k+3 are used, where k mod 4 =
1. For QPSK, this stage is transparent.
Figure 2.20: HS-PDSCH interleaver configuration
The following table shows the bit operations for each constellation version, as a function of
the b parameter. An overbar denotes logical inversion of the bit.
b   Output                                 Operation
0   vp,k, vp,k+1, vp,k+2, vp,k+3           None
1   vp,k+2, vp,k+3, vp,k, vp,k+1           Swapping LSBs with MSBs
2   vp,k, vp,k+1, v̄p,k+2, v̄p,k+3           Inversion of LSBs
3   vp,k+2, vp,k+3, v̄p,k, v̄p,k+1           Both swapping and inversion
Table 2.4: Constellation version table for 16QAM modulation
2.4.11 HS-SCCH
The control information required to operate HS-DSCH is carried on a new separate control
channel called the High-Speed Shared Control Channel (HS-SCCH). This is a downlink control
channel, providing the UE with information such as channelization code set, modulation scheme,
transport block size and HARQ parameters. The length of an HS-SCCH subframe is the same
as for the HS-DSCH, that is, three time slots.
The SF is constant at 128, at which the number of bits in one subframe becomes (7680/128) · 2 = 120
bits, i.e. one slot is 40 bits long.
Figure 2.21: HS-SCCH information bits
The link information provided in HS-SCCH is shown in figure 2.21, and explained below. Part
1 consists of the so called Transport-Format and Resource-related Information bits (8 bits), and
Part 2 consists of the transport block size and HARQ parameters.
The Channelization Code Set specifies which of the codes in the HS-DSCH code tree are to be
used when decoding the HS-DSCH (see section 2.4.12.1 below). The Modulation Scheme bit
selects which modulation scheme to use,
0 - QPSK
1 - 16QAM.
There are in total 254 different transport block sizes defined for HS-DSCH (see [5] Annex A
for a complete listing of TBS’s for FDD). The Transport Block Size field specifies which one,
in a set of 63 available block sizes out of the subset of possible combinations for this transport
format, this connection uses. This parameter together with modulation scheme and number of
codes defines the transport block size for this particular transmission. These three fields inform
the UE how to decode the HS-DSCH, and must therefore be descrambled and decoded before
the start of the HS-DSCH subframe. To ensure that the UE will finish decoding of the control
information, there is a delay of one time slot (667 µs) after Part 1 before sending the HS-DSCH.
The RV parameters s, r and b are described in the HS-DSCH section above.
The new data indicator NDI bit is used to indicate if this transmission is the original trans-
mission, or a retransmission.
HS-SCCH is actually a physical channel. The control information is mapped directly onto
HS-SCCH from the MAC-hs (see section 2.4.6).
2.4.11.1 Timing in relation to HS-DSCH
The modulation scheme and spreading code must be known, that is, decoded and ready for
use, when the first HS-DSCH subframe arrives. Therefore, the Part 1 information is coded and
scrambled with a UE-specific code, and sent two time slots before the beginning of the first
HS-DSCH subframe. In other words, the UE has one time slot to decode the Part 1 information
before decoding the first data frame. See figure 2.22.
Figure 2.22: HS-SCCH timing in relation to HS-DSCH
2.4.11.2 UE decoding behavior
Each UE connected to the cell (which is using HSDPA) must decode Part 1 for all HS-SCCH
channels in each TTI. Actually, there is a limitation of four channels per TTI in the Release 6
specifications. Since Part 1 is scrambled with the UE ID of the intended receiver as we shall
see further on, the UE can detect if the subsequent HS-DSCH slots as well as the remaining
two SCCH slots need to be decoded. As mentioned above, there is a decoding margin of one
slot between the first HS-SCCH slot (Part 1) and the first HS-DSCH slot, i.e. the UE has got
667µs to decode it. This implies that Part 1 and Part 2 are decoded individually.
2.4.12 Coding of HS-SCCH
2.4.12.1 Coding of channelization code set, transport block size and HARQ
Channelization code set parameters Given O and P , the offset and number of HS-DSCH
codes, channelization code set is mapped to the 7 bits xccs,1, xccs,2, ..., xccs,7 according to the
following equations.
The first three bits are the code group indicator bits, of which xccs,1 is the MSB,
xccs,1, xccs,2, xccs,3 = min (P − 1, 15 − P )
and the four last bits are the code offset indicator bits, of which xccs,4 is the MSB,
$$x_{ccs,4}, x_{ccs,5}, x_{ccs,6}, x_{ccs,7} = \left| O - 1 - \left\lfloor \frac{P}{8} \right\rfloor \cdot 15 \right|.$$
O begins at 1 because code O = 0 is reserved for higher spreading factors; thus 15
codes are available, as mentioned before. This coding frees the last bit of the Part 1 byte,
which can then be used for modulation scheme mapping (see next section). Without this coding,
the offset O = 1..15 and the number of codes P = 1..15 would each have occupied 4 bits.
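The two mapping equations can be tried out with a few lines of C. The function name `hs_ccs_encode` and the packing of the 7 bits into one integer (code group in the upper three bits, offset in the lower four) are choices made for this sketch only.

```c
#include <stdlib.h>

/* o: code offset (1..15), p: number of codes (1..15), with o + p - 1 <= 15.
 * Returns the 7 bits xccs,1..xccs,7 packed with xccs,1 as the MSB. */
unsigned hs_ccs_encode(int o, int p)
{
    int group  = (p - 1 < 15 - p) ? p - 1 : 15 - p;    /* xccs,1..3 */
    int offset = abs(o - 1 - (p / 8) * 15);            /* xccs,4..7 */
    return ((unsigned)group << 4) | (unsigned)offset;
}
```

For P = 5 codes starting at O = 4 this yields group 4 and offset 3. For P ≥ 8 the floor term folds the offset into the upper half of the 4-bit range, e.g. P = 8, O = 3 gives group 7 and offset 13, which is how both parameters fit into 7 bits.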
Transport block size parameter The transport block size of the HS-DSCH depends on
three variables; the ki transport block size index parameter, the modulation scheme and the
number of codes. In the specification, only decoding of the index parameter i.e. calculation of
transport block size is specified. Therefore, we shall first introduce the calculation of TBS and
then the coding of TBS into ki.
The calculation of TBS is specified in [5].
TBS = L(kt), (2.27)
where
kt = ki + k0,i. (2.28)
Table 2.5 specifies k0,i for the given modulation scheme and number of codes. TBS is then
given by
If kt < 40
$$L(k_t) = 125 + 12 \cdot k_t \qquad (2.29)$$
else
$$L(k_t) = \left\lfloor L_{min} \cdot p^{k_t} \right\rfloor, \qquad (2.30)$$
where p = 2085/2048 and Lmin = 296.
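A minimal C sketch of equations 2.29 and 2.30 follows. The function name is hypothetical, and the power is computed by repeated multiplication in double precision instead of calling `pow`, which is a choice made here to keep the example free of library dependencies.

```c
/* kt = ki + k0,i (eq. 2.28); returns the transport block size in bits. */
long hs_tbs_from_kt(int kt)
{
    if (kt < 40)
        return 125 + 12L * kt;           /* eq. 2.29 */

    double l = 296.0;                    /* Lmin */
    for (int n = 0; n < kt; n++)
        l *= 2085.0 / 2048.0;            /* Lmin * p^kt */
    return (long)l;                      /* truncation equals floor, l > 0 */
}
```

The two branches meet smoothly: kt = 39 gives 593 bits and kt = 40 gives 605 bits. An implementation on target would more likely store the sizes in a lookup table than evaluate the power at run time.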
Combination i   Modulation scheme   Number of codes   k0,i
0               QPSK                1                 1
1               QPSK                2                 40
2               QPSK                3                 63
3               QPSK                4                 79
4               QPSK                5                 92
5               QPSK                6                 102
6               QPSK                7                 111
7               QPSK                8                 118
8               QPSK                9                 125
9               QPSK                10                131
10              QPSK                11                136
11              QPSK                12                141
12              QPSK                13                145
13              QPSK                14                150
14              QPSK                15                153
15              16QAM               1                 40
16              16QAM               2                 79
17              16QAM               3                 102
18              16QAM               4                 118
19              16QAM               5                 131
20              16QAM               6                 141
21              16QAM               7                 150
22              16QAM               8                 157
23              16QAM               9                 164
24              16QAM               10                169
25              16QAM               11                175
26              16QAM               12                180
27              16QAM               13                184
28              16QAM               14                188
29              16QAM               15                192
Table 2.5: k0,i for different modulation schemes and number of codes
The inverse calculation when coding ki then becomes the following. Because we already know
which modulation scheme and how many codes to use, this is done by first checking whether
the first case (kt < 40), or the second one (kt ≥ 40) applies.
First, assume that kt < 40. This gives
$$k_{0,t} = \frac{L(k_t) - 125}{12}. \qquad (2.31)$$
Now, if k0,t ≥ 40, our assumption was not right and we get
$$k_t = \left\lceil \frac{\lg \frac{L(k_t)}{296}}{\lg \frac{2085}{2048}} \right\rceil. \qquad (2.32)$$
Otherwise, we get
$$k_t = \lceil k_{0,t} \rceil. \qquad (2.33)$$
Finally, equation 2.34 gives us the ki to be coded into Part 2.
$$k_i = k_t - k_{0,i}. \qquad (2.34)$$
Also note that ki is limited to the range [0..62], i.e. there are 63 different TBSs for each
combination of modulation scheme and number of codes. This gives 63 · 2 · 15 = 1890 different
(valid) combinations.
HARQ RV parameters These three parameters s, r and b may come in any of the eight
possible redundancy versions listed in table 2.6. b only affects 16QAM transmissions and is not
sent when using QPSK. Instead, there are two more possibilities for r. The coded bits Xrv in
table 2.6 are mapped to the three bits xrv,1, xrv,2, xrv,3, where xrv,1 is the MSB.
        QPSK      16QAM
Xrv     s   r     s   r   b
0       1   0     1   0   0
1       0   0     0   0   0
2       1   1     1   1   1
3       0   1     0   1   1
4       1   2     1   0   1
5       0   2     1   0   2
6       1   3     1   0   3
7       0   3     1   1   0
Table 2.6: RV coding for QPSK and 16QAM
2.4.12.2 Coding chain and information mapping for HS-SCCH
The sub-blocks of HS-SCCH are coded as described in this section. First, we denote Part 1
information bits x1,i, i = 1..8, and the Part 2 information bits x2,i, i = 1..13. Part 1 and Part 2
are coded and rate matched in order to fit into one subframe, one slot for Part 1 and two slots
for Part 2. The coding chain is illustrated as a flow diagram in figure 2.23.
Concatenation of HS-SCCH information The channelization code set is mapped to the
first 7 bits xccs,1, xccs,2, ..., xccs,7 as mentioned in the previous section 2.4.12.1. The modulation
scheme is mapped as 0 for QPSK and 1 for 16QAM, to the last bit xms,1, forming
$$\begin{aligned} x_{1,i} &= x_{ccs,i}, & i &= 1, 2, .., 7 \\ x_{1,i} &= x_{ms,1}, & i &= 8 \end{aligned} \qquad (2.35)$$
Figure 2.23: HS-SCCH coding chain
For Part 2, the bit mapping is as follows. Transport block size identifier bits are denoted
xtbs,1, xtbs,2, .., xtbs,6, HARQ information bits are denoted xhap,1, xhap,2, xhap,3, RV information
bits are denoted xrv,1, xrv,2, xrv,3 and new data indicator bit is denoted by xnd,1.
The resulting bit sequence x2,1, x2,2, .., x2,13 is
$$\begin{aligned} x_{2,i} &= x_{tbs,i}, & i &= 1, 2, .., 6 \\ x_{2,i} &= x_{hap,i-6}, & i &= 7, 8, 9 \\ x_{2,i} &= x_{rv,i-9}, & i &= 10, 11, 12 \\ x_{2,i} &= x_{nd,1}, & i &= 13 \end{aligned} \qquad (2.36)$$
CRC attachment The 16 CRC bits are calculated from the concatenated bit sequence
x1,1, .., x1,8, x2,1, .., x2,13, according to 2.3.14.1, yielding the sequence ck, k = 1..16 in equation
2.37.
ck = Pim(17−k) (2.37)
To form the UE specific CRC used at the receiver to identify the subframe, the CRC is
masked with the 16 UE ID bits xue,1..16, where xue,1 is the LSB, according to equation 2.38.
See figure 2.24.
cue,k = (ck + xue,k) mod 2, k = 1..16 (2.38)
Figure 2.24: UE specific CRC calculation
Finally, this is appended to Part 2 to form y1..29 = x2,1, .., x2,13, cue,1, .., cue,16.
Coding of Part 1 Part 1 information bits are encoded using 1/3-rate convolutional coding
as described in section A.1 which produces 3 · 8 + 24 = 48 bits, denoted z1,i, i = 1..48. These
bits are then punctured by omitting the following bits: z1,1, z1,2, z1,4, z1,8, z1,42, z1,45, z1,47,
z1,48, to form the 40 bits coded and rate matched output sequence r1,i, i = 1..40.
The 16 bit UE ID is also coded with convolutional code, but with 1/2 code rate producing
2 · 16 + 16 = 48 bits. The coded UE ID bits are denoted z2,i, i = 1..48, and are also punctured
to obtain 40 bits by omitting the same bits as in the Part 1 rate matching described above.
This yields the sequence cue,i, i = 1..40.
Finally, the coded Part 1 is scrambled with the coded UE ID, to form the 40 bit UE specific
sequence
s1,k = (r1,k + cue,k) mod 2, k = 1..40 (2.39)
Coding of Part 2 y1..29 is 1/3-rate coded using convolutional coding to obtain the 3·29+24 =
111 bits long output sequence z2,i, i = 1..111. This sequence is then punctured by omitting the
following bits: z2,1, z2,2, z2,3, z2,4, z2,5, z2,6, z2,7, z2,8, z2,12, z2,14, z2,15, z2,24, z2,42, z2,48, z2,54,
z2,57, z2,60, z2,66, z2,69, z2,96, z2,99, z2,101, z2,102, z2,104, z2,105, z2,106, z2,107, z2,108, z2,109, z2,110,
z2,111.
The result is the 80 bits r2,i, i = 1..80.
2.4.12.3 Physical channel mapping
Part 1 bits s1,k, k = 1..40 and Part 2 bits r2,i, i = 1..80 are finally concatenated. This means
that Part 1 will fit into the first slot of the HS-SCCH subframe, and Part 2 will fit into the last
two slots.
2.4.13 HS-DPCCH
As for the downlink signaling on the HS-SCCH, there is a need for an uplink signaling channel
as well. This is where the physical uplink channel HS-DPCCH serves as a feedback channel for
HARQ ACK’s and Channel Quality Indication (CQI) information indicating the quality of the
air interface to this particular user.
HS-DPCCH is code multiplexed with the uplink DPCH, and the SF is constant at 256 which
means that each time slot contains 10 bits. The 5 bit CQI information is block-coded into 20
bits using a (20, 5) coder, and the HARQ ACK bit is repetition encoded to 10 bits. The whole
HS-DPCCH subframe spans 3 time slots, i.e. one TTI, and is illustrated in figure 2.25.
Figure 2.25: HS-DPCCH information bits
CHAPTER 3
Test environment
3.1 TXAD
3.1.1 General overview
TXAD (TX Adapter board) is used to verify the downlink functionality.
As this is Ericsson proprietary it will not be described in detail; only what is needed to
understand our implementation will be discussed.
It has one FPGA that handles stimuli to the TX board which is called BULL. Another FPGA,
BILL, handles recording of baseband data.
3.1.2 DSP
TXAD is also fitted with a DSP. The DSP used is a TMS320C6416T from Texas Instruments,
running at 850 MHz. This is a very powerful Fixed-Point DSP with Viterbi and Turbo decoder
coprocessors.
The DSP is equipped with two external memory interfaces (EMIF), EMIFA and EMIFB.
EMIFA is used to access the SDRAM and EMIFB is connected to BILL.
The operating system on the DSP is a lightweight version of OSE called OSEck, OSE Compact
Kernel, developed by ENEA. The DSP also contains software to handle error tracing among
other things.
3.2 Test execution
Test cases are made by using SPECMAN, and the test environment is a base station which
communicates with a TXAD board.
A tool called ATENG is used to execute the test case.
In general a test case will send stimuli to the downlink functionality of the base station
through the TXAD board, which will record baseband data. This output will then be sent to
a reference model for comparison.
CHAPTER 4
Method
4.1 Decoding
The 3GPP specifications define how data should be encoded; they say nothing about decoding.
Some of the steps are easily reversible, others take some more effort.
4.1.1 Descrambling
The input to this stage is the baseband IQ-data.
The scrambling is done by complex multiplication, so the descrambling should be done by
complex division. The division of two complex numbers can be written:
$$\frac{A}{C} = \frac{a + bj}{c + dj} = \left( \frac{ac + bd}{c^2 + d^2} \right) + \left( \frac{bc - ad}{c^2 + d^2} \right) j \qquad (4.1)$$
Both the real and imaginary parts are scaled by (c² + d²), which is just a constant.
Let A be the received sequence and C the scrambling code. Both components of C take only the
values −1 and 1, which means that (c² + d²) always evaluates to 2. To get the original
sequence, the computed result should therefore be divided by 2, or equivalently shifted down one
bit. This is not done in the current implementation, but it should not matter, as we do not care
about the absolute amplitudes (only the relative ones) and the result fits in 16 bits if the
input is sufficiently small (smaller than 2^15). The calculations done are
$$\begin{aligned} I_{out} &= I_{in} \cdot I_{code} + Q_{in} \cdot Q_{code} \\ Q_{out} &= Q_{in} \cdot I_{code} - I_{in} \cdot Q_{code} \end{aligned} \qquad (4.2)$$
In the simulation this is done in software on the DSP. The scrambling code is then precalcu-
lated and stored in the DSP memory before the decoder starts.
In the real program this is done in hardware, on the FPGA, because this was already im-
plemented. The scrambling code is then generated by the shift registers described in section
2.3.3.1.
The output from this stage is the descrambled IQ-data, in 16 bits format.
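Equation 4.2 corresponds to the following C loop. This is a sketch of the software variant only, with invented names; it assumes the scrambling code is stored as one signed byte per chip and branch, and like the text above it leaves out the division by 2.

```c
#include <stddef.h>
#include <stdint.h>

/* code_i and code_q hold +1 or -1 for every chip. */
void descramble_iq(const int16_t *in_i, const int16_t *in_q,
                   const int8_t *code_i, const int8_t *code_q,
                   int16_t *out_i, int16_t *out_q, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        /* Multiplication by the complex conjugate of the code chip. */
        out_i[k] = (int16_t)(in_i[k] * code_i[k] + in_q[k] * code_q[k]);
        out_q[k] = (int16_t)(in_q[k] * code_i[k] - in_i[k] * code_q[k]);
    }
}
```

As a check, the sample 3 − 2j scrambled with the chip 1 − j becomes 1 − 5j, and descrambling returns 6 − 4j, which is twice the original as expected from c² + d² = 2.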
4.1.2 HS-SCCH
4.1.2.1 Dechannelization
As already shown in 2.3.2 the dechannelization is done by integrating (summing) the received
values. According to 2.4.11 the spreading factor of HS-SCCH is 128. As the output should
eventually go to the Viterbi decoder it is convenient to stay in the BPSK domain (1, −1 for the
binary values 0, 1). The following is therefore done for both I and Q:
1. Multiply 128 consecutive values by the channelization code and sum the products to get S
2. If abs(S) is less than 128, abort; dechannelization failed
3. If S > 0 output is 1, else −1
The output from this stage is an 8-bit signed sequence of −1 and 1.
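The three steps above can be sketched for one branch (I or Q) as follows. The function name and the use of 0 as a failure indicator are assumptions for the example; the channelization code is assumed to be stored as +1 and −1.

```c
#include <stdint.h>
#include <stdlib.h>

enum { SCCH_SF = 128 };

/* Despread one HS-SCCH symbol. Returns +1 or -1, or 0 if |S| < 128,
 * in which case the dechannelization is considered to have failed. */
int scch_despread(const int16_t chips[SCCH_SF], const int8_t code[SCCH_SF])
{
    long s = 0;
    for (int n = 0; n < SCCH_SF; n++)    /* step 1: correlate and sum */
        s += (long)chips[n] * code[n];
    if (labs(s) < SCCH_SF)               /* step 2: too little energy */
        return 0;
    return s > 0 ? 1 : -1;               /* step 3: hard BPSK decision */
}
```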
4.1.2.2 Unmasking and depuncturing
Part-1 data is masked with the UE ID specific mask created by 1/2-rate convolutional coding
and puncturing. The unmasking is done by simply multiplying by the mask, as multiplication in
the BPSK domain is the same as XOR.
Part-1 and part-2 data are punctured as described in section 2.4.12.2. The depuncturing
is done by inserting zeros in place of the punctured bits.
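Depuncturing can be sketched in C as below, here with the Part 1 pattern from section 2.4.12.2. The function and table names are made up for the example. A zero lies exactly between the BPSK values 1 and −1, so an inserted zero acts as an erasure that favours neither bit value in the decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* 1-based positions removed from the 48 coded Part 1 bits. */
static const uint8_t PART1_PUNCT[8] = {1, 2, 4, 8, 42, 45, 47, 48};

/* Rebuild n_out values from the received ones, writing 0 at every
 * punctured position. punct must be sorted in ascending order. */
void depuncture(const int8_t *rx, int8_t *out, size_t n_out,
                const uint8_t *punct, size_t n_punct)
{
    size_t r = 0, p = 0;
    for (size_t pos = 1; pos <= n_out; pos++) {
        if (p < n_punct && punct[p] == pos) {
            out[pos - 1] = 0;            /* erased position */
            p++;
        } else {
            out[pos - 1] = rx[r++];      /* next received value */
        }
    }
}
```

Called as `depuncture(rx, out, 48, PART1_PUNCT, 8)`, it expands the 40 received Part 1 values to the 48 values expected by the rate 1/3 decoder.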
4.1.2.3 Convolutional decoder
To decode the encoded data the Viterbi Co-Processor (VCP) on the DSP is used. See section
A.3.4 for a brief introduction to how it works.
The input to the Viterbi decoder coprocessor consists of precalculated branch metrics. According
to the Texas Instruments reference guide [8], these are calculated as follows.
For rate 1/2, 2 branch metrics per symbol period need to be calculated:
$$\begin{aligned} BM_0(t) &= r_0(t) + r_1(t) \\ BM_1(t) &= r_0(t) - r_1(t) \end{aligned} \qquad (4.3)$$
For rate 1/3, 4 branch metrics per symbol period need to be calculated:
$$\begin{aligned} BM_0(t) &= r_0(t) + r_1(t) + r_2(t) \\ BM_1(t) &= r_0(t) + r_1(t) - r_2(t) \\ BM_2(t) &= r_0(t) - r_1(t) + r_2(t) \\ BM_3(t) &= r_0(t) - r_1(t) - r_2(t) \end{aligned} \qquad (4.4)$$
where r(t) is the received codeword, and r0(t) is the upper branch from the encoder.
The VCP is configured to decode with the parameters defined in section A.2.1 for the rate 1/3
code. Equations 4.4 apply, and the branch metrics are calculated for part-1 and part-2
respectively. The VCP is set up to receive these via DMA and is then triggered to start.
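For one received rate 1/3 symbol, equations 4.4 amount to the following C function. The name is hypothetical and the sketch leaves out the packing and DMA transfer that the coprocessor requires.

```c
/* r0, r1, r2: the three received values of one symbol period
 * (+1/-1, or 0 for a depunctured position); bm receives BM0..BM3. */
void branch_metrics_r13(int r0, int r1, int r2, int bm[4])
{
    bm[0] = r0 + r1 + r2;
    bm[1] = r0 + r1 - r2;
    bm[2] = r0 - r1 + r2;
    bm[3] = r0 - r1 - r2;
}
```

A depunctured zero drops out of all four sums, so the erased position does not bias the path metrics.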
4.1.2.4 Parameter extraction
Upon VCP completion the decoded outputs of part-1 and part-2 are available. As hard decisions
are used in the decoder these are now bits. The sent data is extracted and the parameters are
calculated as defined in 2.4.12.2.
4.1.2.5 CRC calculation
The CRC calculation is done by table lookup for speed. It is a 16-bit CRC and the polynomial
is defined in section 2.3.14.1. The result is bit reflected.
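A table-driven CRC of this kind can be sketched in C as shown below. Several points are assumptions and not taken from the implementation in this thesis: the polynomial is taken to be D^16 + D^12 + D^5 + 1 (0x1021), the input is processed MSB first in whole bytes with initial value 0, and the final bit reflection mentioned above is not included. The HS-SCCH payload is not a multiple of 8 bits, so a real implementation has to handle the remaining bits separately.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc16_table[256];

/* Build the 256-entry table for polynomial 0x1021. */
void crc16_init(void)
{
    for (int b = 0; b < 256; b++) {
        uint16_t c = (uint16_t)(b << 8);
        for (int k = 0; k < 8; k++)
            c = (uint16_t)((c & 0x8000) ? (c << 1) ^ 0x1021 : c << 1);
        crc16_table[b] = c;
    }
}

/* One table lookup per input byte instead of eight shift steps. */
uint16_t crc16_bytes(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t n = 0; n < len; n++)
        crc = (uint16_t)((crc << 8) ^ crc16_table[(crc >> 8) ^ data[n]]);
    return crc;
}
```

The table costs 512 bytes of memory and is built once by calling `crc16_init()` before the first CRC is computed.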
4.1.3 HS-DSCH
To lighten the load on the DSP, turbo decoding is not performed. Instead, the systematic
bits are extracted from the coded data. This implies that puncturing of systematic bits is not
supported.
4.1.3.1 Dechannelization and demodulation
The difference between dechannelization of HS-DSCH and HS-SCCH is that in this case no
actual decoding is done, so the output will be 0 and 1 instead of BPSK values. The DSP is better
suited to working with bytes than with bits though, so the result is still saved as an 8-bit
sequence.
QPSK This is exactly the same as for HS-SCCH with the differences that the spreading factor
is 16 and the output is 0 and 1 (implemented by shifting down the sign bit).
16QAM According to section 2.3.15.1 we can expect four different amplitudes
for 16QAM, where two are the negation of the other two. The actual values are not important;
we just start integrating and search for two different amplitudes. When they are found we start
over again calculating values Si, this time comparing the calculated amplitudes with the known
ones. Looking at table A.1 one can see that for each branch, I and Q, there are two bits. The
first bit is determined by the sign of Si: a positive value gives 0, otherwise 1. The second
bit is determined by the amplitude: if abs(Si) is the lower amplitude we get 0, otherwise 1. This
allows for an efficient implementation where we handle one 32-bit word at a time:
word = (sumI<0) << 24; // sumI = I branch, sumQ = Q branch
word |= (sumQ<0) << 16;
word |= (abs(sumI) == ampH) << 8; // ampH = the higher amplitude
word |= (abs(sumQ) == ampH);
4.1.3.2 Deinterleaver
The interleaver is described in section 2.4.10.8. To reverse it, we move the columns back and
read out the symbols. This is done in software by using a lookup table for the start of every
column in the sequence (30 values for QPSK and 60 values for 16QAM), and using an offset from
this.
In the 16QAM case the output order of the two interleavers can be swapped; this is accounted
for by changing offsets. The output of the lower interleaver can also be inverted, in which case
the deinterleaver inverts it back. Whether these operations are performed is decided by the RV
parameters.
4.1.3.3 Desegmentation
All the channels are already saved in order in the same array so nothing needs to be done here.
4.1.3.4 HARQ Bit Collection
The way the systematic and parity bits are ordered is described in section 2.4.10.6. As no turbo
decoding is done, only the systematic bits need to be extracted. This is done in-place by using
the unaligned memory access capability of the DSP. The data is ordered column by column
in sequence, and each bit is represented by 8 bits. Unaligned access is used to continuously
overwrite the parity bits of the previous column with the systematic bits of the next column.
Doing it in-place also means that for columns completely filled with systematic bits no
processing needs to be done, thus saving time.
4.1.3.5 Rate matching
As no turbo decoding is done, puncturing of data is not supported. However, repetition is
performed for some transport blocks, so this is implemented. Because repetition is presumed to
be unusual, the implementation is not very efficient; it basically just uses the equation
defined in section A.3.3 and saves the positions of the bits to remove in an array.
This is also the place where the tail bits are removed from the sequence and the 8-bit sequence
is converted to a bit sequence.
4.1.3.6 Bit descrambling
The descrambling is done exactly as scrambling, by XORing the data with the code. The
code, as described in section 2.4.10.2 is precalculated for the largest possible transport block
for efficiency.
4.1.3.7 Header and CRC extraction
The header is defined in section 2.4.6. The CRC is the last 24 bits.
4.1.3.8 CRC calculation
As for HS-SCCH table lookups are used for CRC calculation. The HS-DSCH CRC is 24 bits
and the polynomial is defined in 2.3.14.1.
4.2 Matlab Encoder
To aid the development of the DSP decoder, we have also implemented a basic HSDPA encoder
in Matlab. This enabled debugging of each of the decoding stages, which would otherwise have
been very time consuming. Some of the decoding functions were first implemented in Matlab
as well, to verify correctness before implementing on target.
The coding of HS-SCCH is complete, whilst the coding of HS-DSCH is only partial. Punc-
turing of systematic bits is not supported since the decoder does not have this functionality
either, and the actual payload data is limited to the same data for each transport block.
The output is the I/Q data formatted to a suitable format for the DSP simulator.
4.3 Development tools
Texas Instruments Code Composer Studio 3.3 has been used to code, simulate and emulate the
target DSP, a TMS320C6416T. For on-chip debugging a Blackhawk 560 JTAG emulator has been
used. Matlab with the Communications Toolbox has been used for the encoder.
4.4 Other tools
Some small POSIX C tools have been written to extract data from various log files in order
to verify the decoding.
binparser Converts baseband I/Q data to a suitable format for the DSP simulator
readhslog Extracts MAC-hs PDUs from a log file so that they can be compared bit-by-bit with the decoder output
paramext Extracts the parameters necessary for the decoding (scrambling code, users etc.) from a log file
4.5 Software
This section provides an overview of how the software works. The decoding has been described
in section 4.1, so this text will focus on how the software is set up, how users are handled and
where the actual checks are made.
4.5.1 Flowchart
A general overview of the software can be seen in figure B.1 in appendix B. Data flow is
represented by the broader arrows going from left to right, starting with the descrambling
that is done in BILL. The upper branch represents HS-SCCH decoding and the lower HS-DSCH
decoding. Data input to and output from each stage is represented by the small
arrows with corresponding labels. Program flow is represented by the decision boxes and the arrows
connecting them. The small thick arrows are for illustrative purposes and should be considered
as semaphores that trigger the start of data flow.
On each subframe interrupt the users will be iterated over and decoded, as can be seen in the
upper left corner. After everything has been decoded, all checks have been made and reporting
is done, the software waits for the next subframe interrupt.
4.5.2 Setup
4.5.2.1 Cell
Some cell-specific parameters need to be provided to the program. These are the scrambling
code used and a parameter called tCell, which specifies a cell-specific offset of tCell · 256 chips.
These parameters are written to BILL.
4.5.2.2 Users
At least one user needs to be defined; the upper limit is set by the available amount of dynamic
memory. Each user has a specific UEID, a category and 1 to 4 HS-SCCH codes. When a new
user is created the following steps are performed:
1. Check for available memory; abort if none is available.
2. Check that a valid category has been entered (1-12); if not, abort.
3. Create the UE mask for HS-SCCH decoding.
4. Create the HS-SCCH codes assigned to the user. If a code has already been created for another
user, just provide a reference to it.
5. Add the user to the inactive list.
The only category-specific information currently used for any purpose is the inter-TTI arrival
time.
4.5.2.3 Run request
The program is started by sending a run request with a specified BFN. In this BFN the baseband
data flow from BILL will start, and decoding will start as soon as a buffer is filled.
4.5.3 User handling
Users are handled in three linked lists corresponding to three states. The states are inactive,
active and released, see figure 4.1.
Inactive is the default state and means that the user has not yet successfully decoded an HS-SCCH
channel with any of its assigned codes. Active means that the user found a working code
in the last TTI; the code is then bound to this user. An active user that no longer succeeds in
decoding HS-SCCH with its bound code, i.e. it is no longer scheduled any data, will be moved
to the released state. It should be noted that an active user is only evaluated when it can
be scheduled according to the inter-TTI parameter (see 2.4). In the next TTI all released users
will be moved to the inactive state.
This state machine will run until one of these conditions is no longer true:
• Fewer than 15 HS-PDSCH channels have been decoded
• There are still HS-SCCH codes not bound to any active user
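The state transitions can be sketched in Python as below. This is a simplified illustration, not the DSP code: plain lists replace the three linked lists, the inter-TTI evaluation and the two stop conditions are left out, and `decodes_ok` is a hypothetical callback standing in for the complete HS-SCCH decoding and CRC check of one code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class User:
    ueid: int
    codes: List[int]                  # 1 to 4 assigned HS-SCCH codes, ascending
    state: str = "inactive"           # "inactive", "active" or "released"
    bound_code: Optional[int] = None


def step_tti(users: List[User], decodes_ok: Callable[[User, int], bool]) -> None:
    """Advance all users by one TTI."""
    # Users released in the previous TTI fall back to the inactive state.
    for u in users:
        if u.state == "released":
            u.state = "inactive"

    used = set()  # codes already claimed by some user in this TTI

    # Active users are only tried with their bound code.
    for u in users:
        if u.state == "active":
            if decodes_ok(u, u.bound_code):
                used.add(u.bound_code)
            else:
                u.state, u.bound_code = "released", None

    # Inactive users try their codes in ascending order, skipping used ones.
    for u in users:
        if u.state != "inactive":
            continue
        for code in u.codes:
            if code not in used and decodes_ok(u, code):
                u.state, u.bound_code = "active", code
                used.add(code)
                break
```

A user that is released in one TTI is therefore not tried again until the TTI after the next one, when it has returned to the inactive list.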
Figure 4.1: User states
4.5.4 HS-SCCH decoding
For active users HS-SCCH decoding will only be done with the bound code; if this fails the user
is no longer active.
For inactive users HS-SCCH decoding will be done with the lowest code not yet used in this TTI
until a working code is found or all codes assigned to the user (maximum 4) are exhausted. If
a working code is found the user becomes active.
First the whole subframe will be dechannelized with the specified code. If this fails the code
is not used in this TTI, i.e. no user has been assigned to it. If it succeeds, part-1 and part-2 data are
prepared for convolutional decoding. In the flowchart this is illustrated by a lower and an upper
branch for the two parts.
For part-1 the following will be done:
• Unmask with the UE-specific mask
• Depuncture, insert zeros
• Calculate branch metrics for Viterbi decoding
• Set up the Viterbi Co-Processor for decoding.
And for part-2:
• Depuncture, insert zeros
• Calculate branch metrics for Viterbi decoding
• Set up the Viterbi Co-Processor for decoding.
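The unmasking and depuncturing steps listed above can be illustrated with a short Python sketch. It assumes signed soft bits where the sign carries the bit decision and the value 0 means "no information". The function names, the mask and the puncturing positions in the example are hypothetical and are not the HS-SCCH values.

```python
def unmask(soft_bits, mask):
    """Undo a binary XOR mask on soft bits: a mask bit of 1 flips the sign."""
    return [-s if m else s for s, m in zip(soft_bits, mask)]


def depuncture(soft_bits, punctured_positions, total_len):
    """Re-insert zeros at the positions the transmitter removed."""
    punctured = set(punctured_positions)
    received = iter(soft_bits)
    return [0 if i in punctured else next(received) for i in range(total_len)]


# Example with made-up values: 3 received soft bits, 5 coded bits in total,
# coded bits 1 and 3 were punctured by the transmitter.
restored = depuncture(unmask([5, -3, 2], [0, 1, 1]), [1, 3], 5)
```

Inserting zeros lets the Viterbi decoder treat the punctured positions as equally likely to be 0 or 1, so they contribute nothing to the branch metrics.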
After this the VCP is started. When the decoding is done, data is extracted from both parts
and the CRC is calculated. If the calculated CRC matches the one received in part-2,
we know that this HS-SCCH channel belongs to this user. If not, we check the next code until all
codes for the user have been exhausted.
If we find a working code we proceed to HS-DSCH decoding.
4.5.5 HS-DSCH decoding
First all transport block parameters are calculated, such as those for segmentation and interleaving.
In this step it can be determined whether the transport block is punctured; if so we abort
here and report that it is not supported. Otherwise the following steps are done in the order listed:
• Dechannelization (includes demodulation)
• Deinterleaving (includes desegmentation)
• HARQ bit collection
• Rate matching (only repetition)
• HS-PDU descrambling
• Header extraction
• CRC calculation and comparison
4.5.5.1 Error reporting
Fatal errors will be reported. These can occur
• in dechannelization, if integrated values lower than the spreading factor are found
• in header extraction, if the header is illegal
• in CRC comparison, if the received and calculated CRCs differ
4.5.6 User checks
When all decoding for the user is done checks can be performed and abnormalities can be
reported. Currently nothing is implemented here.
4.5.7 Cell checks
When all users have been decoded, cell-specific checks can be performed and abnormalities can
be reported. Currently the only check implemented is whether sufficiently many HS-PDSCHs
are scheduled in this TTI. The threshold is provided in the run request.
4.6 Specification
4.6.1 Signals
For start, stop and setup of the decoder as described in section 4.5.2 the following signals are
defined.
4.6.1.1 TXAD SETUP HSDPA DEC CELL REQ
This signal is used to setup cell specific parameters.
Field Type Description
sig no U32 Signal Number
scrCode U16 Scrambling code
tCell U8 Cell-specific offset (tCell*256 chips)
Table 4.1: TXAD SETUP HSDPA DEC CELL REQ
4.6.1.2 TXAD SETUP HSDPA DEC CELL CFM
This signal is sent in response to TXAD SETUP HSDPA DEC CELL REQ
Field Type Description
sig no U32 Signal Number
Table 4.2: TXAD SETUP HSDPA DEC CELL CFM
4.6.1.3 TXAD SETUP HSDPA DEC USER REQ
This signal is used to setup users. Should be sent multiple times for multiple users.
Field         Type  Description
sig no        U32   Signal Number
ueid          U16   UE-ID or HS-RNTI of the user
category      U8    UE category (1-12)
scchCodes[4]  S8    Array of HS-SCCH codes to listen to. Shall be entered in ascending order. If fewer than 4 codes are used, the remaining entries shall have the value -1.
Table 4.3: TXAD SETUP HSDPA DEC USER REQ
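As an illustration of the field layout in table 4.3, the following Python sketch packs a user setup request into bytes. The byte order (little-endian), the absence of padding between fields, the function name and the signal number used in the example are all assumptions made for the sketch; the table only specifies field names and types.

```python
import struct

# Assumed wire layout: U32 sig_no, U16 ueid, U8 category, S8 scchCodes[4],
# little-endian and without padding.
USER_REQ = struct.Struct("<IHB4b")


def pack_user_req(sig_no, ueid, category, scch_codes):
    """Build the payload, sorting the codes and padding unused entries with -1."""
    if not 1 <= category <= 12:
        raise ValueError("category must be in the range 1-12")
    if not 1 <= len(scch_codes) <= 4:
        raise ValueError("1 to 4 HS-SCCH codes are required")
    codes = sorted(scch_codes) + [-1] * (4 - len(scch_codes))
    return USER_REQ.pack(sig_no, ueid, category, *codes)


# Hypothetical signal number 0x1234, UE-ID 17, category 6, two HS-SCCH codes.
payload = pack_user_req(0x1234, 17, 6, [3, 1])
```

Sorting and padding in one place keeps the two rules from the table (ascending order, -1 for unused entries) out of the calling code.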
4.6.1.4 TXAD SETUP HSDPA DEC USER CFM
This signal is sent in response to TXAD SETUP HSDPA DEC USER REQ.
Field Type Description
sig no U32 Signal Number
Table 4.4: TXAD SETUP HSDPA DEC USER CFM
4.6.1.5 TXAD SETUP HSDPA DEC RUN REQ
This signal is used to trigger start of decoding.
Field     Type  Description
sig no    U32   Signal Number
pdschThr  U8    PDSCH threshold. Warn with a trace if the number of PDSCHs is below this number in any subframe.
startBFN  U16   BFN to start decoding
Table 4.5: TXAD SETUP HSDPA DEC RUN REQ
4.6.1.6 TXAD SETUP HSDPA DEC RUN CFM
This signal is sent in response to TXAD SETUP HSDPA DEC RUN REQ
Field Type Description
sig no U32 Signal Number
Table 4.6: TXAD SETUP HSDPA DEC RUN CFM
4.6.1.7 TXAD SETUP HSDPA DEC STOP REQ
This signal is used to stop the decoding.
Field Type Description
sig no U32 Signal Number
Table 4.7: TXAD SETUP HSDPA DEC STOP REQ
4.6.1.8 TXAD SETUP HSDPA DEC STOP CFM
This signal is sent in response to TXAD SETUP HSDPA DEC STOP REQ
Field Type Description
sig no U32 Signal Number
Table 4.8: TXAD SETUP HSDPA DEC STOP CFM
4.6.2 TXADCLI
These are the commands wrapping the signals
txadcli %TXAD -c hssetcell <scrambling code> <tcell>
txadcli %TXAD -c hssetusr <ueid/h-rnti> <category> <scch_codes 0 .. 4>
txadcli %TXAD -c hsrun <pdsch threshold> <start_bfn>
txadcli %TXAD -c hsstop
4.6.3 Traces
Every trace is preceded by [bfn:sfn].
4.6.3.1 Number of HS-PDSCH below threshold
HSDPA ERROR: Only %d PDSCH channels
4.6.3.2 Error in HS-DSCH decoding
HSDPA ERROR DSCH: Puncturing not supported!
HSDPA ERROR DSCH: Despreading failed!
HSDPA ERROR DSCH: Illegal MAC-hs header!
HSDPA ERROR DSCH: CRC check failed!
Followed by:
** HSDPA FAIL: UEID %d P: %d %s TBS: %d
4.6.3.3 Report every X:th subframe
HSDPA REPORT: max_cycles: %d
minTBS: %d maxTBS: %d minP: %d maxP: %d
sfn_with_data: %d sfn_with_errors: %d
4.6.3.4 Fatal error
HSDPA ERROR: Missed subframe!
CHAPTER 5
Result
5.1 Decoder
A working decoder has been implemented as a DSP program, supporting one cell and multiple
users. The baseband data is transferred to our application through an external memory interface
by an FPGA. Before transferring the data the FPGA descrambles it; all other processing is done
on the DSP. The FPGA part has been made by Ericsson.
No turbo decoding is done, which limits the decoder to unpunctured HS-DSCH data.
The program has been integrated into the existing test environment, and tests have been successfully
executed on real hardware with the output verified to be correctly decoded. The program can easily be
customized to accommodate different test cases.
5.2 Measurements
Measurements have been done to see if the real-time requirements of the decoder have been
fulfilled. The hard limit for decoding of one subframe is the TTI length, 2 ms.
For measurements the C6416 Device Cycle Accurate Simulator in Code Composer has been
used. This simulates everything on the DSP, including cache and the Viterbi coprocessor. The
same compiler flags as in the final program have been used. The cycles measured are those
used for complete decoding of a subframe, excluding descrambling as this is implemented in
hardware in the actual setup. The major difference compared to the final setup is the overhead of
the kernel and other applications running on the DSP.
The encoding has been done with our Matlab encoder implementation. The decoder has been
set up with 4 users with the same 4 channelization codes for every run.
The measurements seen in table 5.1 are for the biggest TB sizes for QPSK and 16QAM respectively,
with 1 and 4 users. As can be seen the difference between QPSK and 16QAM is not that
big, although the TB size for 16QAM is double that for QPSK. The reason for this is the way
demodulation is done. An actual decoding run on real hardware took on average 358050 cycles
for 16QAM with TB size 27952, an increase of 9% from simulation. The biggest contributor to the increase
is probably the DMA transfers of baseband data. In the simulation this was done from internal
memory; on real hardware this is done through the external memory interface from the FPGA.
Modulation  Scheduled users  TB sizes                Cycles  Time (ms) a
QPSK        1                13904                   320837  0.3775
16QAM       1                27952                   328457  0.3864
QPSK        4                3695, 3695, 3695, 2775  426587  0.5019
16QAM       4                7430, 7430, 7430, 5579  434626  0.5113
a Assuming 850 MHz clock rate
Table 5.1: Simulated decoding
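The time column of table 5.1 and the 9% figure follow directly from the cycle counts. The short Python calculation below reproduces them; it uses only numbers quoted in this section, and the variable and function names are chosen for the example.

```python
CLOCK_HZ = 850e6   # clock rate assumed in table 5.1
TTI_MS = 2.0       # hard real-time limit for one subframe


def cycles_to_ms(cycles):
    """Convert a cycle count to milliseconds at the assumed clock rate."""
    return cycles / CLOCK_HZ * 1e3


simulated = {
    ("QPSK", 1): 320837,
    ("16QAM", 1): 328457,
    ("QPSK", 4): 426587,
    ("16QAM", 4): 434626,
}

for (modulation, users), cycles in simulated.items():
    t = cycles_to_ms(cycles)
    print(f"{modulation:5s} {users} user(s): {t:.4f} ms, {t / TTI_MS:.0%} of the TTI")

# Hardware run versus simulation for 16QAM, TB size 27952.
increase = 358050 / 328457 - 1
print(f"increase on hardware: {increase:.0%}")
```

Even the slowest simulated case uses roughly a quarter of the 2 ms budget, which is the processing margin referred to in the discussion.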
Simulation measurements for all QPSK TB sizes have also been made; the results for all TB sizes
using 15 channels can be seen in figure B.2 in appendix B. The peaks for the smallest
TB sizes occur because these are repeated in rate matching, which adds extra time to the
decoding. It can also be seen that the actual worst case is somewhere below the middle (0.52
ms for size 7168) and not at the largest transport block; this is due to the way the HARQ
bit collection is implemented. The computation time for every step in the HS-DSCH decoding
chain increases with increasing transport block size, except for the HARQ bit collection, whose
time decreases as the number of parity bits decreases.
CHAPTER 6
Discussion
6.1 Design choices
The saving in computation time from extracting the systematic bits instead of decoding
was probably not of great importance, considering the processing margin achieved. We
have not looked into the performance of the built-in turbo co-processor on the DSP, but its
computation time should not differ much from that of the Viterbi co-processor, at least for small
transport blocks. And if certain time-consuming operations were to be moved to
hardware instead, even large transport blocks should not be a problem.
The choice to have descrambling in hardware was made purely because it was already implemented
and readily available.
Extracting the systematic bits instead of turbo decoding does, however, limit the decoder, since
HARQ retransmissions often use a redundancy version where systematic bits are punctured.
The reason for implementing the entire decoder on the DSP was to get an idea of how much
processing power is required for the complete decoding. It could then be decided which
parts of the decoding chain are suitable for software, and which, if any, would be better
off in hardware. The conclusion is that all of the decoding can easily be done on the DSP alone.
However, more users in the cell lead to a worst-case scenario when the HS-SCCH codes are
assigned to users at the end of the user list, i.e. all users must be decoded in order to find
the right ones. The dechannelization of HS-SCCH should therefore be implemented in hardware instead.
6.2 Future work
To be able to support more than one cell, and to better support more users, the dechannelization
of HS-SCCH and most of the HS-DSCH decoding should be implemented in hardware (FPGA).
To be able to support more test scenarios the following may be done to further simulate real
UEs:
• Implement turbo decoding using the co-processor and support punctured data to allow retransmissions
• Implement simulation of HS-DPCCH, that is, simulate CQI and ACK/NACK
For external verification of the MAC-hs PDUs, the PDUs can be transferred in real-time
over a proprietary interface to the workstation.
APPENDIX A
Modulation and FEC coding
A.1 Modulation
A.1.1 QPSK
Quadrature Phase-Shift Keying is a rather simple phase shift modulation scheme, in which the
input data bits are modulated as pairs of bits to form a single-carrier sinusoidal signal with
(normally) 90° phase shifts. This enables a symbol of two bits of data to be transmitted on
the single-frequency carrier wave by simply specifying the phase of the signal as a multiple of
90° plus a 45° offset. The modulated signal is constructed by applying ordinary binary phase-shift
keying (BPSK) separately to two source sinusoids of the same amplitude, separated by 90° in phase,
and summing the results.
The resulting bit pairs are shown in the constellation diagram in figure A.1 below. Here we
see that each quadrant of the complex plane represents a unique combination of bits: Q1: 11,
Q2: 01, Q3: 10 and Q4: 00. Each symbol has equal reliability regardless of position, since
all four constellation points have equal distance to their nearest neighbors. The decision area
for each symbol is simply the 90° sector formed by its quadrant.
Using this modulation technique, the data rate of the communication channel is effectively
doubled compared to BPSK. The disadvantage is the increased sensitivity to interference in the phase of
the signal, that is, the Bit Error Ratio (BER) is also increased. However, compared to higher
levels of modulation, QPSK is more robust to interference and does not require as much
transmission power as, for instance, 16QAM does.
Figure A.1: Normalized constellation diagram for QPSK modulation
A.1.2 16QAM
A more sophisticated modulation scheme is 16-Quadrature Amplitude Modulation. This is
similar to PSK in the sense that it also uses two sinusoids of the same frequency which are
summed together to form the desired modulated signal. These are separated by a
static phase shift of 90° to make it possible for the receiver to demodulate the signal without the
use of separate carrier wave frequencies. In contrast to PSK, QAM also uses amplitude modulation
of the two sinusoids to generate the modulated symbols. These two constituent signals are then
summed to form the modulated signal.
16QAM uses two discrete amplitude levels on each branch, each with either sign, in order to get
16 different symbols. These are defined in table A.1 for WCDMA modulation.
Depending on which quadrant is the dominant one, the resulting signal will get a unique amplitude
and phase out of 16 possible variants (4 bits per symbol) as depicted in the constellation
diagram in figure A.2.
It may also be beneficial to change the order of the bits in the constellation, depending on channel
conditions and so on, since not all bits have equal distance to the nearest neighbors (the outer
points do not have an upper limit) and hence have different reliabilities.
This modulation scheme imposes another problem when used in macro diversity networks
(such as UMTS), namely the fact that the amplitude of the signal is initially not known by the
UEs, and may change rapidly. For QPSK, in contrast, the amplitude has only one discrete
level.
As for all higher-order modulation schemes, the noise sensitivity as well as the power requirement
increases with the complexity of the modulation, hence this is not ideal for channels with
poor signal quality. That is, if two receivers with equal signal strength are to receive QPSK
versus 16QAM, the latter will have almost half the processing gain. However, when a
good channel is available, the data rate is doubled compared to QPSK whilst the bandwidth is
maintained.
The decision areas of 16QAM are somewhat complex compared to QPSK, and not as robust to
interference, since both phase and amplitude interference affect the resulting received symbols.
i1q1i2q2 I branch Q branch
0000 0.4472 0.4472
0001 0.4472 1.3416
0010 1.3416 0.4472
0011 1.3416 1.3416
0100 0.4472 -0.4472
0101 0.4472 -1.3416
0110 1.3416 -0.4472
0111 1.3416 -1.3416
1000 -0.4472 0.4472
1001 -0.4472 1.3416
1010 -1.3416 0.4472
1011 -1.3416 1.3416
1100 -0.4472 -0.4472
1101 -0.4472 -1.3416
1110 -1.3416 -0.4472
1111 -1.3416 -1.3416
Table A.1: 16QAM mapping
Figure A.2: WCDMA constellation diagram for 16QAM modulation
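Table A.1 has a regular structure: i1 and q1 select the sign of the I and Q branch, while i2 and q2 select the inner (0.4472) or outer (1.3416) amplitude level. The Python sketch below expresses the mapping this way and adds a hard-decision demapper. The function names are hypothetical, and the demapper assumes that the received amplitude has already been normalized, which section A.1.2 notes is not a given in practice.

```python
import math

A = 1 / math.sqrt(5)  # inner level 0.4472...; the outer level is 3 * A = 1.3416...


def map_16qam(i1, q1, i2, q2):
    """Map four bits to one complex symbol following table A.1."""
    i = (-1 if i1 else 1) * (3 if i2 else 1) * A
    q = (-1 if q1 else 1) * (3 if q2 else 1) * A
    return complex(i, q)


def demap_16qam(symbol):
    """Hard decision: sign gives i1/q1, magnitude against 2 * A gives i2/q2."""
    i1 = int(symbol.real < 0)
    q1 = int(symbol.imag < 0)
    i2 = int(abs(symbol.real) > 2 * A)
    q2 = int(abs(symbol.imag) > 2 * A)
    return i1, q1, i2, q2
```

The threshold 2 * A lies midway between the two amplitude levels, which is why an unknown or drifting amplitude directly affects the i2 and q2 decisions but not i1 and q1.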
A.2 Error coding
Two different types of error coding algorithms are used in WCDMA: 1/2-rate and 1/3-rate
convolutional codes and a 1/3-rate turbo code. The latter is used for higher data rates
when good error correction is required, at the expense of large overhead and complexity in
encoding/decoding. The performance benefits are achieved when large enough block sizes are
used. Turbo coding is a so-called systematic algorithm, meaning that the original, uncoded
data block is included in the code word. If the data block is n bits long, and the code sequence
is m bits long, the entire code word is m + n bits and the code rate is n/(m + n).
A.2.1 Convolutional coding
Convolutional coding is the base error-correction coding algorithm in WCDMA, and is also the
main building block of a Turbo coder as we shall see in the next section.
The structure of an encoder can be seen as a shift register of K − 1 one-bit delay elements.
Selected elements are tied to modulo-2 adders connected in series, forming a chain to
output i of n outputs in total. The adders are placed according to the ones in the generator
polynomial. Note that 3GPP polynomials are given in octal form.
K is the so-called constraint length, in the 3GPP case equal to 9. n is the number of output
bits, which is 2 for the 1/2-rate and 3 for the 1/3-rate encoder. With k = 1 input bit (for
each output sequence) the encoder is said to be an (n, k, K) = (3, 1, 9) convolutional encoder for
the 1/3-rate version.
Furthermore, there are n generator polynomials Gi, one for each output, that specify which
of the K − 1 memory bits and 1 input bit to add to the output sum. This is illustrated in figure
A.3 for both the 1/2-rate and 1/3-rate encoders used in WCDMA.
Figure A.3: 3GPP (2,1,9) and (3,1,9) convolutional encoders
The generator polynomials used in WCDMA coding are presented in binary form in table
A.2. The positions of the modulo-2 adders are simply matched against the bit pattern in the 9-bit
polynomials, hence each output bit is the modulo-2 sum of the selected memory elements and, common to
all outputs, the input bit.
Note that the sum of two 1's is 0 (overflow) in modulo-2 summing.
     (2,1,9)            (3,1,9)
G0   1,0,1,1,1,0,0,0,1  1,0,1,1,0,1,1,1,1
G1   1,1,1,1,0,1,0,1,1  1,1,0,1,1,0,0,1,1
G2   -                  1,1,1,0,0,1,0,0,1
Table A.2: Bit sequence representation of Gi
The output vector U of a 1/3-rate convolutional encoder with binary bit vector representation
of the generator polynomials as above and input vector x can be stated as
$$U = G_0 \cdot \left((1 + q^{-1} + q^{-2} + \dots + q^{-N}) \cdot x\right) + G_1 \cdot \left((1 + q^{-1} + q^{-2} + \dots + q^{-N}) \cdot x\right) + G_2 \cdot \left((1 + q^{-1} + q^{-2} + \dots + q^{-N}) \cdot x\right), \tag{A.1}$$
where q is the delay operator. To make sure that the internal state is all-zero at the beginning
of the next data input sequence, K − 1 zeros are added at the end of all inputs. This is known as
trellis termination.
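A bit-level view of the encoder may be easier to follow than the operator notation. The Python sketch below implements the shift-register structure with the polynomials from table A.2 written as octal numbers. It assumes that the leftmost coefficient of each polynomial acts on the current input bit and that the outputs are emitted in the order G0, G1, G2 for each input bit; the function and constant names are chosen for the example.

```python
K = 9                                  # constraint length
G_RATE_1_2 = (0o561, 0o753)            # table A.2, column (2,1,9)
G_RATE_1_3 = (0o557, 0o663, 0o711)     # table A.2, column (3,1,9)


def conv_encode(bits, generators, k=K):
    """Encode bits and append k - 1 zero tail bits (trellis termination)."""
    reg = 0   # k-bit window; the newest input bit is the most significant bit
    out = []
    for b in list(bits) + [0] * (k - 1):
        reg = (reg >> 1) | (b << (k - 1))
        for g in generators:
            # Modulo-2 sum of the register bits selected by the polynomial.
            out.append(bin(reg & g).count("1") % 2)
    return out


# A single 1 followed by the tail bits reads out each polynomial bit by bit.
impulse = conv_encode([1], G_RATE_1_3)
```

Encoding a single 1 is a quick sanity check: each output stream then reproduces the corresponding row of table A.2.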
A Systematic Convolutional (SC) encoder is obtained by including the input data in the
output. This greatly improves BER performance, at the cost of more data to transmit i.e. more
overhead. In the (3, 1, 9) case however, the increase will only be 33%.
A.2.2 Turbo Coding
The Shannon limit of a communication channel is the theoretical maximum information transfer
rate of the channel for a particular noise level. Turbo codes come closest to the
Shannon limit of all coding algorithms known to date (Pushing the Limit [10]). This enables
higher transmission rates through the same noisy channel than algorithms that do not approach
the Shannon limit, without having to increase transmission power.
The obvious drawback is that the computation power required when decoding increases,
making Turbo codes unusable in long-term communication such as speech channels or video
calls. The ideal usage for Turbo codes is in PS data channels, which are only used for short
periods of time, but at a very high peak data rate.
An overview of the encoding/decoding chain is shown in figure A.4.
The Turbo encoder is a so called Parallel Concatenated Convolutional Code (PCCC) encoder,
which consists of two parallel Recursive Systematic Convolutional (RSC) encoders. The first
RSC operates directly on the input sequence and also outputs the unchanged input (systematic
output). The other RSC is fed with bits supplied from a dynamic interleaver, which performs
inter-row and intra-row permutation of the input bits arranged in a matrix with dimensions
depending on the length of the input bit block.
The arrangement and the algorithms for permuting the rows (for 3GPP implementation) are
quite complex and not presented here, please refer to [3] for details.
Figure A.4: Turbo coding and decoding chain
An R × C rectangular matrix is constructed, where R and C depend on the number
of input bits K, which can be in the interval 40 to 5114. R can be 5, 10 or 20 rows. C is given
by a prime number look-up table. The R · C elements are filled, row by row, with
the input bit sequence. If R · C > K, the final elements are padded with dummy bits which are
later pruned away from the matrix after the permutation operations.
The interleaving of bits in a non-contiguous way gives better protection against burst inter-
ference in the channel. If several bits in sequence are destroyed in transmission, the likelihood
that the entire transport block will be unusable is greatly decreased compared to in-order
transmission.
The first encoder produces the first parity bit Zk whilst the other produces the second parity
bit Z'k. In addition to the systematic output Xk, this yields a 1/3-rate Turbo encoder as shown
in figure A.5. The RSC constituent encoders are 8-state. The interleaver output is denoted X'k.
Figure A.5: Structure of a 1/3-rate Turbo encoder
A.3 Rate matching
In order to make the error coded transport block fit into the physical channel(s), its bits have
to be matched against the available channel bits. This is done by either removing bits or repeating
bits.
A.3.1 Repetition
By repeating coded bits when there are enough channel bits available for this, the redundancy
is increased and the coding becomes more robust. For Turbo codes, repetition of systematic
bits only would not increase the redundancy as much as repeating parity bits, because the
correlation is not as strong for these as for parity.
A better approach is to repeat only parity bits for each systematic bit, or to repeat both
systematic and parity bits evenly. This way the total effective coding rate is lowered, and
decoding becomes easier in a noisy channel.
A.3.2 Puncturing
By omitting certain bits from the code given from the encoder output, i.e. puncturing the code,
one may change the effective coding rate from the original coding rate up to 1/1 for Turbo
codes (only systematic bits are sent). This is done by comparing to a predefined puncturing
pattern bit-wise; if 1 transmit the code bit, otherwise discard it.
The new coding rate depends on the number of bits omitted and the length of the puncturing
pattern. An example of a 1/2-rate encoder with different puncturing patterns is listed in table
A.3 below.
Coding rate  Puncturing pattern  Length n
1/2          1                   1
             1
2/3          10                  2
             11
3/4          101                 3
             110
5/6          10101               5
             11010
8/10         11000101            8
             11111010
Table A.3: Example of puncturing patterns
The puncturing pattern is n bits long. It is applied to the first n bits of each code stream: a
bit is kept where the pattern has a 1 and discarded where it has a 0. The pattern is then applied
again to the next n bits, and so on until the end of the code streams has been reached.
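The patterns in table A.3 can be checked with a few lines of Python. In this sketch each pattern is a pair of rows, and the first row is assumed to apply to the first output stream of the 1/2-rate encoder and the second row to the second stream. The kept bits are assumed to be written out stream by stream for each time instant. Function names are chosen for the example.

```python
from fractions import Fraction

PATTERNS = {
    "2/3": ("10", "11"),
    "3/4": ("101", "110"),
    "5/6": ("10101", "11010"),
    "8/10": ("11000101", "11111010"),
}


def puncture(streams, patterns):
    """Keep bit t of stream s when patterns[s][t mod n] is '1'."""
    n = len(patterns[0])
    out = []
    for t in range(len(streams[0])):
        for stream, pattern in zip(streams, patterns):
            if pattern[t % n] == "1":
                out.append(stream[t])
    return out


def punctured_rate(patterns):
    """Input bits per pattern period divided by code bits kept in that period."""
    n = len(patterns[0])
    kept = sum(p.count("1") for p in patterns)
    return Fraction(n, kept)
```

One pattern period covers n input bits, so the resulting rate is simply n divided by the number of ones in the pattern.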
For puncturing of WCDMA Turbo code, the puncturing might be performed on systematic
bits as well, or on both systematic and parity bits.
A.3.3 WCDMA Rate matching algorithm
The rate matching pattern algorithm indicates which of the coded bits to either puncture or
repeat. The parameters that control this selection are the number of input bits Xi, the initial error
e_ini, the positive error update e_plus and the negative error update e_minus. These are given in [3] section
4.2.7.2 for downlink, and in section 2.4.10.5 for HSDPA.
The following pseudo code represents the iteration over repeated or punctured bits.
e = e_ini
m = 1
while (m <= Xi)
    e = e - e_minus
    if (e <= 0) then
        if (puncturing) then
            puncture_bit(m)
        else
            repeat_bit(m)
        end
        e = e + e_plus
    end
    m = m + 1
end
$\lceil e_{ini}/e_{minus} \rceil$ is the index of the first bit affected by the rate matching; the bits
before it stay unaffected. The interval between punctured or repeated bits depends on the ratio e_plus/e_minus.
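A runnable Python version of the pattern calculation is sketched below. It returns, for every input bit, how many times the bit is transmitted (0 for a punctured bit). One deliberate difference from the pseudo code: the repetition branch is written as a loop, as in 3GPP TS 25.212 [3], so that a bit can be repeated more than once; it behaves like the pseudo code whenever each bit is repeated at most once. The function names and the parameter values in the examples are chosen for illustration.

```python
def rate_match_counts(x_i, e_ini, e_plus, e_minus, puncturing):
    """Return the number of transmitted copies for each of the x_i input bits."""
    if e_plus <= 0:
        raise ValueError("e_plus must be positive")
    counts = []
    e = e_ini
    for _ in range(x_i):
        e -= e_minus
        copies = 1
        if puncturing:
            if e <= 0:          # puncture this bit
                copies = 0
                e += e_plus
        else:
            while e <= 0:       # repeat this bit, possibly several times
                copies += 1
                e += e_plus
        counts.append(copies)
    return counts


def apply_rate_matching(bits, counts):
    """Expand or thin out a bit sequence according to the counts."""
    return [b for b, n in zip(bits, counts) for _ in range(n)]


# 6 input bits, 2 punctured.
punct = rate_match_counts(6, e_ini=3, e_plus=6, e_minus=2, puncturing=True)
# 4 input bits, 6 extra bits inserted by repetition.
rep = rate_match_counts(4, e_ini=1, e_plus=4, e_minus=6, puncturing=False)
```

A decoder that only needs to undo repetition, as in section 4.1.3.5, can run the same loop and record the positions of the extra copies to be removed.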
A.3.4 Viterbi decoder
A Viterbi decoder is used to decode data encoded with a convolutional encoder. It uses the
Viterbi algorithm, which is a maximum likelihood sequence detector. The most likely sequence
is found by traversing a so-called trellis. The trellis consists of S stages, one for each time instant
during encoding, where S corresponds to the number of bits N input to the encoder plus
(K − 1) tail bits, or trellis termination bits. At each stage there are $2^{K-1}$ states, where K is
the constraint length of the encoder.
Out from every state there are two state transitions, or branches, corresponding to the
inputs “0” and “1” to the encoder. Which state follows which depends on the
polynomials of the encoder. Every branch is associated with a symbol corresponding to the
output from the encoder. For a rate 1/R encoder there are $2^R$ possible output values.
The trellis can be illustrated with dots representing states, and arrows representing the
branches. In figure A.6 a trellis for a 1/2 rate encoder with K = 3 and polynomials 7 and
5 can be seen.
Figure A.6: Trellis for a K = 3 Convolutional Code
The principle of decoding is then
1. Calculate Branch Metrics for each possible state transition. These are the normed
distances between every possible symbol and the received symbol. Due to symmetry there
are $2^{R-1}$ different branch metrics for a rate 1/R code.
2. Calculate Cumulative Path Metric. This is the sum of M previous Branch Metrics,
where M is the memory depth of the decoder.
3. Calculate surviving path. The surviving path is the path with lowest Path Metric.
4. Extract the error-corrected data. The leftmost bit in each state in every stage of the
surviving path corresponds to the input to the encoder.
To be able to decode, the encoder is always initialized to the all-zero state. The input is
padded with (K − 1) zeros to flush the delay elements, which means that the end state is also
known to be all zeros.
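The four steps can be illustrated with a small hard-decision decoder for the K = 3 code with polynomials 7 and 5 from figure A.6. This Python sketch is for illustration only and does not reflect how the Viterbi co-processor works: it uses the Hamming distance as branch metric, keeps the full path history instead of a limited memory depth M, and traces back from the known all-zero end state. All names are chosen for the example.

```python
K = 3
POLYS = (0o7, 0o5)
N_STATES = 1 << (K - 1)
INF = float("inf")


def branch(state, bit):
    """Return (next state, encoder output) for one trellis branch."""
    reg = (bit << (K - 1)) | state            # newest bit is the leftmost bit
    out = tuple(bin(reg & g).count("1") % 2 for g in POLYS)
    return reg >> 1, out


def encode(bits):
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):      # tail bits terminate the trellis
        state, symbol = branch(state, b)
        out.extend(symbol)
    return out


def viterbi_decode(received):
    n_out = len(POLYS)
    stages = len(received) // n_out
    metric = [0] + [INF] * (N_STATES - 1)     # encoder starts in state zero
    history = []
    for t in range(stages):
        symbol = received[t * n_out:(t + 1) * n_out]
        new_metric = [INF] * N_STATES
        prev = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == INF:
                continue
            for bit in (0, 1):
                nxt, out = branch(s, bit)
                # Branch metric: Hamming distance to the received symbol.
                m = metric[s] + sum(a != b for a, b in zip(out, symbol))
                if m < new_metric[nxt]:       # keep the survivor per state
                    new_metric[nxt] = m
                    prev[nxt] = (s, bit)
        metric = new_metric
        history.append(prev)
    state, bits = 0, []                       # known all-zero end state
    for prev in reversed(history):
        state, bit = prev[state]
        bits.append(bit)
    bits.reverse()
    return bits[:stages - (K - 1)]            # drop the tail bits
```

Because the next state is formed by shifting the input bit in from the left, the leftmost bit of each state on the surviving path is the decoded input bit, as stated in step 4.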
APPENDIX B
Charts
Figure B.1: Decoder flowchart
Figure B.2: All TB sizes for 15 PDSCH channels, QPSK
APPENDIX C
Abbreviations
16QAM Quadrature Amplitude Modulation of order 16
2G 2nd generation
3G 3rd generation
3GPP Third Generation Partnership Project
AICH Acquisition Indicator Channel
ARQ Automatic Repeat ReQuest
BCCH Broadcast Control Channel
BCH Broadcast Channel
BER Bit Error Ratio
BFN Node B Frame Number
BLER Block Error Ratio
BPSK Binary Phase Shift Keying
CCCH Common Control Channel
CCH Control Channel
CCPCH Common Control Physical Channel
CCTrCH Coded Composite Transport Channel
CDMA Code Division Multiple Access
CPICH Common Pilot Channel
CRC Cyclic Redundancy Check
CS Circuit Switched
CTCH Common Traffic Channel
DBP Downlink Baseband Processing
DCCH Dedicated Control Channel
DCH Dedicated Channel
DL Downlink
DPCCH Dedicated Physical Control Channel
DPCH Dedicated Physical Channel
DPDCH Dedicated Physical Data Channel
DSP Digital Signal Processor
DTCH Dedicated Traffic Channel
DTX Discontinuous Transmission
EMIF External Memory Interface
FACH Forward Access Channel
FDD Frequency Division Duplex
FEC Forward Error Correction
FPGA Field-Programmable Gate Array
HARQ Hybrid Automatic Repeat Request
HSDPA High Speed Downlink Packet Access
HS-SCCH High-Speed Shared Control Channel
HS-DSCH High-Speed Downlink Shared Channel
HS-PDSCH High-Speed Physical Downlink Shared Channel
HS-DPCCH uplink High Speed-Dedicated Physical Control Channel
IMT-2000 International Mobile Telecommunications 2000
IP Internet Protocol
ITU International Telecommunication Union
MAC Medium Access Control
P-CCPCH Primary Common Control Physical Channel
PCCH Paging Control Channel
PCH Paging Channel
PDCP Packet Data Convergence Protocol
PDU Protocol Data Unit
PhCH Physical Channel
PHY Physical layer
PICH Paging Indicator Channel
PRACH Physical Random Access Channel
PS Packet Switched
QoS Quality Of Service
QPSK Quadrature Phase Shift Keying
R99 Release ’99
RAN Radio Access Network
RBS Radio Base Station (also called Node B)
RF Radio Frequency
RLC Radio Link Control
RNC Radio Network Controller
RRC Radio Resource Control
RV Redundancy Version
S-CCPCH Secondary Common Control Physical Channel
SCH Synchronization Channel
SDU Service Data Unit
SIR Signal-to-Interference Ratio
SF Spreading Factor
SFN Subframe Number
TB Transport Block
TDD Time Division Duplex
TDMA Time Division Multiple Access
TFCI Transport Format Combination Indicator
TrCH Transport Channel
TTI Transmission Time Interval
UE User Equipment
UMTS Universal Mobile Telecommunications System
UTRAN UMTS Terrestrial Radio Access Network
WCDMA Wideband Code Division Multiple Access
REFERENCES
[1] 3GPP. Physical channels and mapping of transport channels onto physical channels (release 6). Technical Report 25.211, 3rd Generation Partnership Project, 2005.
[2] 3GPP. Physical layer - general description (release 6). Technical Report 25.201, 3rd Generation Partnership Project, 2005.
[3] 3GPP. Multiplexing and channel coding (release 6). Technical Report 25.212, 3rd Generation Partnership Project, 2006.
[4] 3GPP. Spreading and modulation (release 6). Technical Report 25.213, 3rd Generation Partnership Project, 2006.
[5] 3GPP. Medium access control (MAC) protocol specification (release 6). Technical Report 25.321, 3rd Generation Partnership Project, 2007.
[6] E. Dahlman, S. Dahlman, S. Parkvall, J. Skold, and P. Beming. 3G Evolution: HSPA and LTE for Mobile Broadband. Academic Press, 2007.
[7] H. Holma and A. Toskala. WCDMA for UMTS. Wiley, 2001.
[8] Texas Instruments. TMS320C64x DSP Viterbi-Decoder Coprocessor (VCP) Reference Guide. Technical Report SPRU533D, Texas Instruments, 2004.
[9] ITU. About mobile technology and IMT-2000. http://www.itu.int/osg/spu/imt-2000/technology.html, 2005.
[10] E. Klarreich. Pushing the limit. Science News Online, 2005.
[11] S. Reifegerste. CRC calculation. http://www.zorc.breitbandkatze.de/crc.html, 2006.
[12] Wikipedia. Cellular network — wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Cellular_network&oldid=165647159, 2007. [Online; accessed 23-October-2007].