
2008:010 CIV

MASTER'S THESIS

Downlink Baseband Decoder Implementation

Ulf Andersson Magnus Isaksson

Luleå University of Technology

MSc Programmes in Engineering

Electrical Engineering

Department of Computer Science and Electrical Engineering

Division of Signal Processing

2008:010 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--08/010--SE


Ericsson Lindholmen

November, 2007

ABSTRACT

Previous generations of cellular networks were built for telephone calls and slow data transmission. Due to the rapid changes in information technology, these capabilities do not meet the requirements of today’s wireless revolution. The first specifications for the 3rd generation system (3G) were released in 2000 by the 3GPP collaboration group. WCDMA is one of the air interfaces in the specifications.

In 2001 the first phase of HSDPA, High Speed Downlink Packet Access, was introduced to the specifications. Instead of sending the data on a dedicated channel to each user, HSDPA uses the radio resources more efficiently by using shared channels in the downlink. A control channel signals which users are to receive data in each time interval.

This master’s thesis has been carried out at DBP IoV (Downlink Baseband Processing Integration and Verification) at Ericsson Lindholmen in Gothenburg. This department is responsible for real-time target integration and verification of the baseband processing system in WCDMA, including test environment development and test case design. Currently, tests are in most cases executed and recorded for offline analysis. The memory available for recording on the test hardware limits the maximum run-time of the tests. To be able to run long tests, data has to be decoded and analyzed in real-time.

The purpose of this thesis was to design and implement a real-time decoder for a subset of WCDMA, namely the downlink HSDPA channels. It was to be investigated how much can be done in software on a DSP and how much, if anything, needs to be done in hardware on an FPGA. The decoder was then to be implemented on and integrated into existing test environments.

A comprehensive study of WCDMA in general and HSDPA in particular has been carried

out. The specifications define in detail how encoding is done, so the core part of the thesis was

to design a decoder based on these. During the project there was a need to verify parts of the

implementation so an encoder was programmed in Matlab, enabling control of all parameters.

It was concluded that decoding could be done entirely on the DSP, and working decoder software was developed. It does, however, have limitations in the number of users (mobiles) in the system and only supports one cell. If some of the processing is moved to hardware (FPGA), these limitations could easily be overcome.

Keywords: 3G, WCDMA, HSDPA, DSP, Real-Time Decoder


PREFACE

This master’s thesis is the final part of the MSc programme in Electrical Engineering. It has been carried out at Ericsson AB Lindholmen, Gothenburg, in 2007.

We would like to thank our supervisor Stefan Davidsson at Ericsson and our examiner Per

Lindgren at LTU. We would also like to thank everyone at Ericsson who has supported us and

helped us with this project. We would especially like to mention Henrik Haggebrandt who

provided us with FPGA code, Johan Fredriksson who introduced us to the test environment,

Ulf Pettersson who helped us with the DSP, Joakim Eriksson who helped us integrate our

software into the existing environment and Anders Andersson & Jan Lindskog for HSDPA

related questions and ideas.

Ulf Andersson and Magnus Isaksson

Gothenburg, November 2007


CONTENTS

Chapter 1: Introduction
1.1 Background
1.2 Purpose
1.3 Limitations
1.4 Abbreviations

Chapter 2: Theory
2.1 Basics
2.1.1 Cellular networks
2.1.2 History
2.2 3G
2.2.1 3GPP
2.2.2 UTRAN
2.3 WCDMA
2.3.1 General Properties
2.3.2 Spreading
2.3.3 Scrambling
2.3.4 Multipath Diversity and Rake Receivers
2.3.5 Near/Far Problem and Power Control
2.3.6 Handovers
2.3.7 Protocol architecture
2.3.8 Medium Access Control Protocol
2.3.9 Channels
2.3.10 Logical Channels
2.3.11 Transport Channels
2.3.12 Physical Channels
2.3.13 Transport blocks
2.3.14 Channel coding and multiplexing
2.3.15 Spreading and modulation
2.4 HSDPA
2.4.1 Time units
2.4.2 Overview
2.4.3 Power control
2.4.4 Rate adaptation
2.4.5 Fast packet scheduling
2.4.6 MAC-hs
2.4.7 Hybrid ARQ with soft combining
2.4.8 HARQ in HSDPA
2.4.9 HS-DSCH
2.4.10 Coding of HS-DSCH
2.4.11 HS-SCCH
2.4.12 Coding of HS-SCCH
2.4.13 HS-DPCCH

Chapter 3: Test environment
3.1 TXAD
3.1.1 General overview
3.1.2 DSP
3.2 Test execution

Chapter 4: Method
4.1 Decoding
4.1.1 Descrambling
4.1.2 HS-SCCH
4.1.3 HS-DSCH
4.2 Matlab Encoder
4.3 Development tools
4.4 Other tools
4.5 Software
4.5.1 Flowchart
4.5.2 Setup
4.5.3 User handling
4.5.4 HS-SCCH decoding
4.5.5 HS-DSCH decoding
4.5.6 User checks
4.5.7 Cell checks
4.6 Specification
4.6.1 Signals
4.6.2 TXADCLI
4.6.3 Traces

Chapter 5: Result
5.1 Decoder
5.2 Measurements

Chapter 6: Discussion
6.1 Design choices
6.2 Future work

Appendix A: Modulation and FEC coding
A.1 Modulation
A.1.1 QPSK
A.1.2 16QAM
A.2 Error coding
A.2.1 Convolutional coding
A.2.2 Turbo Coding
A.3 Rate matching
A.3.1 Repetition
A.3.2 Puncturing
A.3.3 WCDMA Rate matching algorithm
A.3.4 Viterbi decoder

Appendix B: Charts

Appendix C: Abbreviations


CHAPTER 1

Introduction

1.1 Background

The “Baseband Processing” department at Ericsson Lindholmen is responsible for the design, development, and maintenance of the Downlink and Uplink Baseband Processing subsystems (DBP and UBP) for Ericsson’s 3G base stations. The IoV subdepartment is responsible for real-time target integration and verification of the baseband processing system, including test environment development, test case design, and execution of integration and verification.

Currently tests are in most cases executed and recorded for offline analysis. The memory

available for recording on the test hardware limits the maximum run-time of the tests. Longer

tests are necessary to detect errors that might only appear after running for a relatively long

time, possibly hours or days. To be able to run long tests, data has to be decoded and analyzed

in real-time.

1.2 Purpose

The purpose of this project was to investigate the real-time requirements of decoding a subset of the channels in WCDMA, more specifically the HSDPA downlink channels. How much can be done in software (on a DSP, Digital Signal Processor) and how much needs to be done in hardware (on an FPGA, Field-Programmable Gate Array)? The result was then to be implemented and integrated into existing test environments.

1.3 Limitations

The decoding should be limited to one cell only, using the primary scrambling code. Transmit

diversity and MIMO (Multiple Input Multiple Output) should not be considered, and are not

covered.


1.4 Abbreviations

Many abbreviations are used in this report; a collection of them can be found in appendix C.

CHAPTER 2

Theory

2.1 Basics

2.1.1 Cellular networks

A cellular network is a radio network made up of a number of radio cells (or just cells) each

served by a fixed transmitter, known as a cell site or base station. These cells are used to cover

different areas in order to provide radio coverage over a wider area than the area of one cell.

[12]

The primary requirement of a cellular system is to have a method to distinguish the transmit-

ters in the different cells from each other. There are two ways to do this: Frequency Division

Multiple Access (FDMA) and Code Division Multiple Access (CDMA). Time Division Multiple

Access (TDMA) allows the same frequency to be used by different users in different time slots

but cannot be used alone to separate cells.

The increased capacity of a cellular network compared to a network with a single transmitter

comes from the fact that the same radio frequency can be reused in different geographical areas.

The frequency reuse factor is the rate at which the same frequency can be used in the network.

In the case of FDMA, the same frequencies cannot be used in neighbouring cells because of cell overlap and inter-cell interference, so the frequency reuse factor is lower than 1. In CDMA, cells are distinguished by codes rather than frequencies, which means that the frequency reuse factor can be 1.

The use of multiple cells for mobile transceivers means that there has to be some mechanism

for the transceivers to change cells as they move around. This is usually called handover.

2.1.2 History

The 1st generation mobile phone systems (1G) were analog systems launched in the 1980s. One

such example is NMT (Nordic Mobile Telephone).

The 2nd generation systems (2G) were digital systems using either CDMA or TDMA multiplexing techniques, launched in the 1990s. The most common one is GSM, using TDMA. The second generation of networks was built mainly for telephone calls and slow data transmission. Due to the rapid changes in information technology, these capabilities do not meet the requirements of today’s wireless revolution.

The 3rd generation system (3G) is defined in the IMT-2000 standard created by ITU in 1999.

It defines five possible radio interfaces for 3G. One of these is WCDMA, which is the one used

in the European 3G standard UMTS.

2.2 3G

2.2.1 3GPP

The 3rd Generation Partnership Project (3GPP) is a collaboration agreement that was established in December 1998. The scope of 3GPP is to make technical specifications for a globally applicable third generation (3G) mobile phone system following the IMT-2000 standard. The current partners are ARIB (Japan), CCSA (China), ETSI (Europe), ATIS (North America), TTA (Korea) and TTC (Japan).

3GPP standards are referred to as releases, each one introducing new features. The first standard, Release ’99, was released in 2000 and defined the UMTS network. Release 4 followed in 2001 and Release 5, which introduced HSDPA, in 2002. The latest release is Release 6 from 2004, and Releases 7 and 8 are in progress.

2.2.2 UTRAN

The radio access network in UMTS is called UTRAN (UMTS Terrestrial Radio Access Network),

see figure 2.1 for an overview of the architecture.

Figure 2.1: UMTS architecture

The UTRAN consists of one or more Radio Network Subsystems (RNS). Each RNS consists of one Radio Network Controller (RNC) and one or more Node Bs, also called base stations or RBS (Radio Base Station) in Ericsson terms. Each RNC is responsible for the control of all radio resources of the Node Bs connected to it. The function of the Node B is air interface processing and some radio resource management.

There are a lot of open interfaces connecting every part of the network to allow parts from

different manufacturers to be used. The interface between RNCs is called Iur and between

RNCs and Node Bs Iub. The radio access network is connected to a core network with the Iu

interface. There are two types of Iu interfaces; IuCS to accommodate circuit switched (CS) data

and IuPS to accommodate packet switched (PS) data. The Core Network (CN) is responsible

for switching and routing calls and data connections to external networks.

Mobile phones are called UE (User Equipment) and connect to the Node Bs over the Uu interface. The Uu interface is the WCDMA radio interface, which is described in the next section.

2.3 WCDMA

2.3.1 General Properties

WCDMA, also referred to as UTRA (UMTS terrestrial radio access), is the air interface used for UMTS (Universal Mobile Telecommunications System). WCDMA is a Wideband Direct Sequence Code Division Multiple Access (DS-CDMA) spread spectrum system. User information bits are spread over a wide bandwidth by a spreading code and multiplied with a pseudo-random scrambling code. The spread bits are called chips. WCDMA has a flexible multirate transmission scheme to support transmission of different types of services with different data rates and QoS (Quality of Service) parameters.

In a spread spectrum system the processing gain is the ratio of the spread bandwidth to the unspread bandwidth. A higher processing gain lowers the required signal-to-interference ratio, or C/I (carrier-to-interference), at the cost of a lower bit rate. As an example, a ratio of 256 gives a processing gain of 10 log10(256) ≈ 24 dB. The required power density over interference density is typically 5 dB for speech service, which gives a required C/I = 5 − 24 = −19 dB [7]. This means that the signal can be 19 dB below the interference or thermal noise power and still be detected. This is the reason why spread spectrum systems have been used in military applications for several decades.
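The arithmetic in the example above can be reproduced with a few lines of Python. This is only a worked illustration; the function names are hypothetical and the 5 dB requirement is the speech figure quoted in the text.

```python
import math


def processing_gain_db(bandwidth_ratio: float) -> float:
    """Processing gain in dB for a given spread/unspread bandwidth ratio."""
    return 10.0 * math.log10(bandwidth_ratio)


def required_c_over_i_db(required_density_ratio_db: float, bandwidth_ratio: float) -> float:
    """Required wideband C/I: the per-bit requirement minus the processing gain."""
    return required_density_ratio_db - processing_gain_db(bandwidth_ratio)


gain_db = processing_gain_db(256)              # roughly 24 dB
c_over_i_db = required_c_over_i_db(5.0, 256)   # roughly -19 dB
print(f"processing gain {gain_db:.1f} dB, required C/I {c_over_i_db:.1f} dB")
```

Halving the bandwidth ratio to 128 lowers the gain by about 3 dB, which illustrates the trade-off between bit rate and required C/I.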

There are two possible duplex modes, FDD (frequency division duplex) and TDD (time division duplex). In FDD separate carrier frequencies are used for uplink and downlink, while in TDD one carrier is time shared between downlink and uplink. FDD is the primary mode used in UMTS and the one described from here on.

The chip rate in WCDMA is 3.84 Mcps (Megachips per second) and the carrier bandwidth

approximately 5 MHz. The carrier spacing has a raster of 200 kHz and can vary from 4.2 to 5.4

MHz depending on interference scenarios. The frame length is 10 ms and each frame is divided

into 15 slots.
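The frame structure follows directly from these numbers. The short sketch below derives the chips per frame, chips per slot and slot rate; the constant and function names are chosen here for illustration only.

```python
CHIP_RATE_CPS = 3_840_000   # 3.84 Mcps
FRAME_LENGTH_MS = 10
SLOTS_PER_FRAME = 15

# Integer arithmetic avoids floating point rounding issues.
chips_per_frame = CHIP_RATE_CPS * FRAME_LENGTH_MS // 1000
chips_per_slot = chips_per_frame // SLOTS_PER_FRAME
slots_per_second = SLOTS_PER_FRAME * 1000 // FRAME_LENGTH_MS


def symbols_per_slot(spreading_factor: int) -> int:
    """Number of symbols that fit in one slot for a given spreading factor."""
    return chips_per_slot // spreading_factor


print(chips_per_frame, chips_per_slot, slots_per_second)
print(symbols_per_slot(16), symbols_per_slot(256))
```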

From here on the uplink will only be described briefly and the downlink in more detail.

2.3.2 Spreading

The spreading process is also known as channelization. Spreading is basically done by assigning

the data bits 0 and 1 the values 1 and -1, repeating them by the spreading factor (SF) and

multiplying with the channelization code.

The channelization codes are orthogonal variable-length Walsh codes, also known as OVSF (Orthogonal Variable Spreading Factor) codes. The creation of the codes can be defined recursively as in equation 2.1.

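$$
C_{ch,1,0} = 1, \qquad
\begin{bmatrix} C_{ch,2,0} \\ C_{ch,2,1} \end{bmatrix}
=
\begin{bmatrix} C_{ch,1,0} & C_{ch,1,0} \\ C_{ch,1,0} & -C_{ch,1,0} \end{bmatrix},
$$

$$
\begin{bmatrix}
C_{ch,2^{n+1},0} \\
C_{ch,2^{n+1},1} \\
C_{ch,2^{n+1},2} \\
C_{ch,2^{n+1},3} \\
\vdots \\
C_{ch,2^{n+1},2^{n+1}-2} \\
C_{ch,2^{n+1},2^{n+1}-1}
\end{bmatrix}
=
\begin{bmatrix}
C_{ch,2^{n},0} & C_{ch,2^{n},0} \\
C_{ch,2^{n},0} & -C_{ch,2^{n},0} \\
C_{ch,2^{n},1} & C_{ch,2^{n},1} \\
C_{ch,2^{n},1} & -C_{ch,2^{n},1} \\
\vdots & \vdots \\
C_{ch,2^{n},2^{n}-1} & C_{ch,2^{n},2^{n}-1} \\
C_{ch,2^{n},2^{n}-1} & -C_{ch,2^{n},2^{n}-1}
\end{bmatrix}
\tag{2.1}
$$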

See figure 2.2 for the beginning of the code tree.

Figure 2.2: Walsh Tree

A given code can only be used if there are no other codes used on the path from that given

code to the root of the tree, or any code belonging to the sub-tree generated from that specific

code. Otherwise the code would not be orthogonal with every other code.
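The recursion and the tree restriction can be sketched in Python as follows. This is a minimal illustration, not code from the thesis implementation; the function names `ovsf_codes`, `correlate` and `on_same_path` are made up for this example.

```python
def ovsf_codes(sf: int) -> list[list[int]]:
    """Return all channelization codes of length sf; list index k is the code number."""
    if sf < 1 or sf & (sf - 1):
        raise ValueError("spreading factor must be a power of two")
    codes = [[1]]                                   # Cch,1,0
    while len(codes) < sf:
        next_level = []
        for code in codes:
            next_level.append(code + code)                    # child 2k:   (C,  C)
            next_level.append(code + [-c for c in code])      # child 2k+1: (C, -C)
        codes = next_level
    return codes


def correlate(a: list[int], b: list[int]) -> int:
    """Chip-wise product summed over the code length."""
    return sum(x * y for x, y in zip(a, b))


def on_same_path(sf_a: int, k_a: int, sf_b: int, k_b: int) -> bool:
    """True if one code is an ancestor of (or equal to) the other in the code tree."""
    if sf_a > sf_b:
        sf_a, k_a, sf_b, k_b = sf_b, k_b, sf_a, k_a
    # Each step towards the root halves the code number.
    return k_b // (sf_b // sf_a) == k_a


codes4 = ovsf_codes(4)
print(codes4)
print(on_same_path(2, 1, 4, 2), on_same_path(2, 1, 4, 1))
```

Codes of the same length are mutually orthogonal, while a code and its descendant are not: two repetitions of Cch,2,1 are identical to Cch,4,2, which is why both cannot be in use at the same time.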

The channelization code is used in the uplink to separate channels from each UE, and in the

downlink to separate channels from each cell. The codes are denoted Cch,SF,k where k is the

code number and SF the spreading factor. In WCDMA the SF may vary from 4 to 256 chips

on uplink channels and 4 to 512 chips on downlink channels[4].

An example with spreading factor 4 can be seen in figure 2.3. In a correlation receiver the signal is despread and integrated (summed) over one bit. On row 5 it can be seen that the sum will be 4 or -4 respectively for data 1 or -1. On the last row the signal has been coded with another spreading code; when despreading, the integration values stay around zero. To get the original data, the integrated values are divided by the code length. In the example this means dividing 4 and -4 by 4, yielding 1 and -1.

Figure 2.3: Example of spreading and despreading, SF = 4
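The same procedure can be written out as a small Python sketch. The two SF = 4 codes and the bit pattern below are assumptions chosen for illustration and are not necessarily the ones drawn in the figure; perfect chip timing and a noise-free channel are assumed as well.

```python
def spread(bits: list[int], code: list[int]) -> list[int]:
    """Map bits 0/1 to +1/-1 and multiply each symbol by every chip of the code."""
    symbols = [1 if bit == 0 else -1 for bit in bits]
    return [symbol * chip for symbol in symbols for chip in code]


def despread(chips: list[int], code: list[int]) -> list[float]:
    """Correlate each block of len(code) chips with the code and normalize."""
    sf = len(code)
    output = []
    for start in range(0, len(chips), sf):
        block = chips[start:start + sf]
        integrated = sum(x * c for x, c in zip(block, code))   # +sf, -sf or ~0
        output.append(integrated / sf)
    return output


code_a = [1, 1, -1, -1]    # assumed code of the wanted channel
code_b = [1, -1, 1, -1]    # another, orthogonal code
transmitted = spread([0, 1, 1, 0], code_a)

print(despread(transmitted, code_a))   # recovers the symbols +1, -1, -1, +1
print(despread(transmitted, code_b))   # all zeros: the wrong code sees nothing
```

With timing errors or multipath the wrong-code output is no longer exactly zero, which is why the values in a real receiver only stay close to zero.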

With perfect timing different codes are completely uncorrelated. Unfortunately, radio transmission with multipath propagation, small timing errors and motion-related effects makes perfect timing impossible. Furthermore, the code space is limited, so scrambling codes are used to separate users and cells and to mitigate the loss of orthogonality.


2.3.3 Scrambling

Scrambling is used to separate UEs in the uplink and cells in the downlink from each other.

The scrambling codes are pseudo-random, or pseudo-noise (PN), codes. If two transmitters

use different codes there is a “low” noise-like correlation at any time offset, where the average

correlation level is proportional to 1/codelength. The self-correlation is also low if the offset is

larger than one chip.

There are two types of codes, long and short codes. The long codes are Gold codes truncated

to the 10 ms frame length, thus resulting in 38400 chips with 3.84 Mcps. The short code

length is 256 chips and the codes are chosen from the extended S(2) code family. The uplink

scrambling may use both long and short codes whilst downlink uses only long codes.

2.3.3.1 Downlink scrambling code generation

Figure 2.4: Scrambling code generator

A total of $2^{18} - 1 = 262143$ scrambling codes (numbered 0 to 262142) can be generated [4]. Of these, 8192 are primarily used (another 2 · 8192 are used for compressed mode), divided into 512 sets, each consisting of one primary scrambling code and 15 secondary scrambling codes. The primary scrambling codes are p = 16 · i where i = 0..511. The i:th set of secondary scrambling codes consists of code numbers s = 16 · i + k where k = 1..15.

The set of primary scrambling codes is further divided into 64 scrambling code groups, each

consisting of 8 primary scrambling codes. The j:th scrambling code group consists of primary

scrambling codes 16 · 8 · j + 16 · k where j = 0..63 and k = 0..7. Each cell is associated with one

primary scrambling code.
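The numbering scheme is easy to get wrong by one, so a short Python sketch of the index arithmetic may help. The function names are hypothetical; only the formulas come from the text.

```python
def primary_code(i: int) -> int:
    """Code number of the i:th primary scrambling code, i = 0..511."""
    if not 0 <= i <= 511:
        raise ValueError("i must be in 0..511")
    return 16 * i


def secondary_codes(i: int) -> list[int]:
    """The 15 secondary scrambling codes that belong to primary code 16*i."""
    return [primary_code(i) + k for k in range(1, 16)]


def code_group(j: int) -> list[int]:
    """The 8 primary scrambling codes of the j:th code group, j = 0..63."""
    if not 0 <= j <= 63:
        raise ValueError("j must be in 0..63")
    return [16 * 8 * j + 16 * k for k in range(8)]


def group_of_primary(code_number: int) -> int:
    """Inverse mapping: which code group a primary scrambling code belongs to."""
    if code_number % 16:
        raise ValueError("not a primary scrambling code")
    return code_number // 128


print(code_group(63))          # last group ends at the last primary code, 8176
print(secondary_codes(511))    # last set ends at code number 8191
```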

The reason for the code groups is to facilitate the cell search procedure. The code group is

signaled on one of the synchronization channels. The UE receives this and correlates all codes

in the group with the pilot channel, scrambled with the primary scrambling code of the cell.

When a peak is found the UE has found the scrambling code of the cell.


The scrambling code sequences are constructed by combining two real sequences into a complex one. The basis for the real sequences are so-called m-sequences, or maximum length sequences (MLS). An m-sequence generated by a shift register of length $m$ cycles through all $2^m - 1$ possible non-zero states of the register. A further description is outside the scope of this report.

The real sequences are constructed by position-wise modulo-2 sum of 38400 chip segments

from two m-sequences x and y with generator polynomials of degree 18. The polynomials are

$$
\begin{aligned}
G_x &= 1 + X^7 + X^{18} \\
G_y &= 1 + X^5 + X^7 + X^{10} + X^{18}
\end{aligned}
\tag{2.2}
$$

In hardware this is implemented using maximal-length linear feedback shift registers (LFSR); figure 2.4 is taken from the 3GPP specifications [4].
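A software sketch of such a generator is given below. The recurrences follow from the two generator polynomials above; the initial register contents, the 131072 chip offset for the imaginary part and the mapping of bit 0 to +1 are stated here as assumptions based on the downlink scrambling description in 3GPP TS 25.213 and should be checked against the specification before use. The function names are illustrative.

```python
PERIOD = 2**18 - 1          # length of one m-sequence period
CHIPS_PER_FRAME = 38400     # the Gold code is truncated to one 10 ms frame
IMAG_OFFSET = 131072        # assumed chip offset between real and imaginary parts


def m_sequence(initial_bits: list[int], taps: tuple[int, ...]) -> list[int]:
    """Generate one period where bit i+18 is the modulo-2 sum of bits i+t for t in taps."""
    sequence = list(initial_bits)
    for i in range(PERIOD - len(initial_bits)):
        sequence.append(sum(sequence[i + t] for t in taps) % 2)
    return sequence


# x: 1 + X^7 + X^18, assumed initial state x(0) = 1, x(1..17) = 0
X_SEQ = m_sequence([1] + [0] * 17, (0, 7))
# y: 1 + X^5 + X^7 + X^10 + X^18, assumed initial state all ones
Y_SEQ = m_sequence([1] * 18, (0, 5, 7, 10))


def downlink_scrambling_code(n: int, length: int = CHIPS_PER_FRAME) -> list[complex]:
    """Complex scrambling sequence number n with chips in {+1, -1} + j{+1, -1}."""
    def chip(i: int) -> int:
        bit = (X_SEQ[(i + n) % PERIOD] + Y_SEQ[i]) % 2   # Gold sequence bit
        return 1 - 2 * bit                               # 0 -> +1, 1 -> -1
    return [complex(chip(i), chip((i + IMAG_OFFSET) % PERIOD)) for i in range(length)]


code = downlink_scrambling_code(0)
print(len(code), code[:4])
```

Different code numbers n only shift the x sequence relative to y, so one pair of precomputed m-sequences is enough to produce every scrambling code.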

2.3.4 Multipath Diversity and Rake Receivers

When transmitting radio signals over land one will experience multiple reflections, diffraction

and attenuation of the signal energy. This is called multipath propagation and is caused by

buildings, mountains and so on. If these paths are nearly equal in length this will result in

signal cancellation, called fast fading.

If the difference in path length is large (> 78 m in UMTS, which is the speed of light divided by the chip rate), the signal energy will arrive at the receiver at clearly distinguishable time instants, giving a certain multipath delay profile. The receiver can then separate the multipath components and combine them to obtain multipath diversity. Such a receiver is called a Rake receiver; it has a number of so-called fingers allocated to the delay positions with significant energy and combines their outputs to recover the signal.
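The 78 m figure corresponds to the distance a radio wave travels during one chip. A brief worked calculation in Python, with illustrative names only:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
CHIP_RATE_CPS = 3_840_000

chip_duration_s = 1 / CHIP_RATE_CPS                          # about 260 ns
metres_per_chip = SPEED_OF_LIGHT_M_PER_S / CHIP_RATE_CPS     # about 78 m


def path_difference_in_chips(path_difference_m: float) -> float:
    """Delay between two paths, expressed in chips."""
    return path_difference_m / metres_per_chip


print(f"{chip_duration_s * 1e9:.0f} ns per chip, {metres_per_chip:.1f} m per chip")
print(f"{path_difference_in_chips(300):.2f} chips for a 300 m longer path")
```

Paths that differ by more than one chip can be assigned to separate Rake fingers, while shorter differences fall within the same chip and add up as fading.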

2.3.5 Near/Far Problem and Power Control

Tight and fast power control is very important in WCDMA; without it a single overpowered UE could block a whole cell. A mobile at the edge of a cell suffers a much higher path loss than one that is close to the base station. Without power control, the mobile close to the base station could easily overshout the other, giving rise to the near-far problem.

Open-loop power control is used by the UE to make a rough estimate of path loss by measuring

the downlink beacon channel CPICH as an initial power setting when entering a cell.

Fast closed-loop power control is used on the dedicated channels. The base station measures the Signal-to-Interference Ratio (SIR) from each mobile and compares it to a reference level. Based on this it commands the mobile to raise or lower its power. This is executed at a rate of 1500 Hz. If no power control were used, it would be impossible for the Node B to decode all received signals, due to the nature of CDMA. That is, the coding requires all signals to arrive with approximately equal amplitude for despreading to work. The same method is used on the downlink to provide more power to UEs at the cell edge and for power/radiation reduction purposes.
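The behaviour of the inner loop can be illustrated with a toy simulation. The sketch below assumes a fixed step of 1 dB per command, one command per slot, and a static channel; the step size, the link budget numbers and the function names are assumptions made for this example and are not taken from the thesis.

```python
def power_command(measured_sir_db: float, target_sir_db: float) -> int:
    """+1 asks the mobile to raise its power, -1 asks it to lower it."""
    return 1 if measured_sir_db < target_sir_db else -1


def run_inner_loop(tx_power_dbm: float, path_loss_db: float, interference_dbm: float,
                   target_sir_db: float, slots: int, step_db: float = 1.0) -> list[float]:
    """Return the transmit power after each slot (one command per slot, 1500 per second)."""
    history = []
    for _ in range(slots):
        received_dbm = tx_power_dbm - path_loss_db
        measured_sir_db = received_dbm - interference_dbm
        tx_power_dbm += step_db * power_command(measured_sir_db, target_sir_db)
        history.append(tx_power_dbm)
    return history


# Illustrative numbers: the mobile starts 5 dB below the power it needs.
history = run_inner_loop(tx_power_dbm=0.0, path_loss_db=120.0, interference_dbm=-100.0,
                         target_sir_db=-15.0, slots=8)
print(history)
```

In this sketch the power ramps up by one step per slot and then toggles around the level that meets the target, since every command is either up or down.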

There is also an outer loop power control where the target SIR setpoint for each mobile can

be adjusted from Bit Error Ratio (BER) or Block Error Ratio (BLER) estimates by the Radio

Network Controller (RNC).


2.3.6 Handovers

When a user moves from one cell to another a handover has to occur. In a traditional hard handover the current connection is broken before a connection to the new cell is established. In a soft handover, on the other hand, the connection to the new cell is established before leaving the current cell. This is the main type of handover in WCDMA, and a special case of it is softer handover, where the links added or removed belong to the same base station. Hard handovers are used in WCDMA to change to another frequency carrier or another system, such as GSM.

To support hard handover something called compressed mode is used. This basically means

that either the spreading factor is decreased or bits are punctured to allow empty timeslots for

inter-frequency measurements. Compressed mode will not be covered further.

2.3.7 Protocol architecture

An overview of the protocol architecture can be seen in figure 2.5.

Figure 2.5: Overview of the protocol architecture

The three protocol layers are the physical layer (layer 1), data link layer (layer 2) and network layer (layer 3). Layer 2 contains a number of sublayers, the Medium Access Control (MAC) protocol and Radio Link Control (RLC). There are also two service-dependent protocols, the Packet Data Convergence Protocol (PDCP) and Broadcast/Multicast Control Protocol (BMC).

The PDCP is used for packet switched services, mainly IP. One of its main functions is header

compression. The BMC is used for cell broadcast services.

The RRC (Radio Resource Control) encapsulates higher layer control messages (call control, session management etc.) for transmission over the radio interface. The control interfaces between the RRC and the lower layer protocols are used to configure parameters for the different channels, measurements, error reporting etc.

The RLC provides segmentation and retransmission services, flow control and ciphering etc.

2.3.8 Medium Access Control Protocol

In the Medium Access Control (MAC) the logical channels are mapped to transport channels.

The MAC layer consists of three logical entities, MAC-b for the broadcast channel, MAC-c/sh

for common and shared channels and MAC-d for the dedicated channels. Other functions

performed in the MAC include but are not limited to:

• selection of appropriate Transport Format for each Transport Channel depending on instantaneous source rate

• priority handling between data flows of one UE and between UEs

• multiplexing/demultiplexing of upper layer PDUs (Protocol Data Units) into/from transport blocks

• traffic volume measurement

2.3.9 Channels

As can be seen in figure 2.5 there are three different channel types: Logical Channels, Transport Channels (TrCH) and Physical Channels (PhCH). The channel mapping from logical to transport to physical channels is shown in figure 2.6. There are also physical channels used only for physical procedures.

2.3.10 Logical Channels

There are two different types of logical channels, Control Channels and Traffic Channels.

Control Channels

• Broadcast Control Channel (BCCH) - Downlink channel for broadcasting system control

information.

• Paging Control Channel (PCCH) - Downlink channel that transfers paging information.

• Dedicated Control Channel (DCCH) - A point-to-point bidirectional channel that transmits dedicated control information between the network and a mobile station.

• Common Control Channel (CCCH) - A bidirectional channel that transmits control information between the network and a UE.


Figure 2.6: Channel mapping

Traffic Channels

• Dedicated Traffic Channel (DTCH) - A point-to-point channel dedicated to one mobile

station for the transfer of user information. Can exist in both uplink and downlink.

• Common Traffic Channel (CTCH) - A point-to-multipoint downlink channel for the

transfer of dedicated user information to all or a group of UEs.

2.3.11 Transport Channels

Two types of transport channels exist, dedicated channels and common channels. The only

dedicated channel is DCH. The common channels are

• Broadcast Channel (BCH) - Downlink transport channel used to broadcast system- and

cell-specific information. Transmitted over the entire cell with low fixed bit rate.

• Forward Access Channel (FACH) - Downlink transport channel that carries information

to mobile stations known to be located in the given cell.

• Paging Channel (PCH) - Downlink transport channel that carries data relevant to the

paging procedure. Associated with the transmission of physical-layer Paging Indicators

to support efficient sleep-mode procedures.

• Random Access Channel (RACH) - Uplink transport channel intended to carry control

information or small amounts of packet data from the mobile station. Characterized by

a collision risk and by using only open-loop power control.


2.3.12 Physical Channels

• Physical Random Access Channel (PRACH) - Carries RACH.

• Primary Common Control Physical Channel (P-CCPCH) - Carries BCH at a fixed rate of 30 kbps with channelization code Cch,256,1.

• Secondary Common Control Physical Channel (S-CCPCH) - Carries FACH and PCH

at a variable rate.

• Dedicated Physical Data Channel (DPDCH) - Carries DCH.

• Dedicated Physical Control Channel (DPCCH) - Carries control information.

• Synchronisation Channel (SCH) - Used for cell search, two sub-channels: primary and

secondary SCH.

• Common Pilot Channel (CPICH) - Carries a predefined bit sequence with channelization code Cch,256,0.

• Acquisition Indicator Channel (AICH) - Carries acquisition indicators used in random

access procedure.

• Paging Indication Channel (PICH) - Carries page indicators to indicate a page message

on the PCH.

2.3.13 Transport blocks

Transport block sets with one or more Transport Blocks (TB) arrive from the MAC at the physical layer every TTI. The Transmission Time Interval (TTI) is TrCH specific and can be 10 (one radio frame), 20, 40 or 80 ms.

The transport format (TF) defines the data in a transport block set and consists of two

parts, semi-static and dynamic. The semi-static parts are common to all transport formats in

a transport channel and are:

• Transmission time interval (TTI)

• Error protection scheme

• Size of CRC

• Static rate matching parameter

The dynamic parts can differ between transport formats. These are:

• Transport block size

• Transport block set size

All transport formats associated with a channel form a transport format set (TFS), and each format has a unique identifier called the transport format identifier (TFI). Several transport channels can be multiplexed to one coded composite transport channel (CCTrCH), as described in the next section. The collection of transport formats used in a CCTrCH is called the transport format combination (TFC), with the identifiers called TFCI.
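The relation between the semi-static part, the dynamic part, the TFS and the TFI can be sketched as a small data structure. The Python sketch below is only illustrative: the class names, the field names and the example block sizes are assumptions made here and are not taken from the specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiStaticPart:
    tti_ms: int                   # 10, 20, 40 or 80
    error_protection: str         # e.g. "conv-1/3" (label format is an assumption)
    crc_size: int                 # 24, 16, 12, 8 or 0
    rate_matching_attribute: int


@dataclass(frozen=True)
class DynamicPart:
    transport_block_size: int      # bits in one transport block
    transport_block_set_size: int  # bits in the whole transport block set

    def blocks_per_tti(self) -> int:
        if self.transport_block_size == 0:
            return 0
        return self.transport_block_set_size // self.transport_block_size


@dataclass(frozen=True)
class TransportFormatSet:
    semi_static: SemiStaticPart   # common to all formats of the channel
    formats: dict                 # TFI -> DynamicPart


# Hypothetical TFS with three transport formats (TFI 0, 1 and 2).
example_tfs = TransportFormatSet(
    semi_static=SemiStaticPart(20, "conv-1/3", 16, 1),
    formats={
        0: DynamicPart(0, 0),
        1: DynamicPart(336, 336),
        2: DynamicPart(336, 672),
    },
)
```

Looking up `example_tfs.formats[tfi]` then gives the dynamic part signalled by a TFI, while the semi-static part stays the same for the whole channel.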


2.3.14 Channel coding and multiplexing

The general coding and multiplexing of transport channels DCH, RACH, BCH, FACH and

PCH is shown in figure 2.7. The inputs to the process are the transport block sets arriving

every TTI.

Figure 2.7: Downlink multiplexing and channel coding


Each step will be described briefly in the following sections.

2.3.14.1 CRC attachment

Cyclic Redundancy Check (CRC) is used for error detection on transport blocks. Higher layers

define if there should be CRC of size 24, 16, 12, 8 or no CRC. The generator polynomials used

are:

• $g_{CRC24}(D) = D^{24} + D^{23} + D^{6} + D^{5} + D + 1$

• $g_{CRC16}(D) = D^{16} + D^{12} + D^{5} + 1$

• $g_{CRC12}(D) = D^{12} + D^{11} + D^{3} + D^{2} + D + 1$

• $g_{CRC8}(D) = D^{8} + D^{7} + D^{4} + D^{3} + D + 1$

Here, for example, $g_{CRC8}(D)$ means that the polynomial bit string is 110011011. The information is divided modulo 2 by the generator polynomial and the remainder becomes the checksum.
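The modulo 2 division can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in the thesis: the function names are chosen here, bits are held as lists of 0 and 1, and the order in which the parity bits are attached to the transport block (which the specification defines) is not modelled.

```python
# Exponents of the non-zero terms of each generator polynomial listed above.
GENERATORS = {
    24: (24, 23, 6, 5, 1, 0),
    16: (16, 12, 5, 0),
    12: (12, 11, 3, 2, 1, 0),
    8: (8, 7, 4, 3, 1, 0),
}


def generator_bits(size):
    """Generator polynomial as a list of bits, highest power first."""
    exponents = GENERATORS[size]
    return [1 if p in exponents else 0 for p in range(size, -1, -1)]


def crc_remainder(bits, size):
    """Remainder of bits * D^size divided modulo 2 by the generator."""
    gen = generator_bits(size)
    work = list(bits) + [0] * size          # make room for the remainder
    for i in range(len(bits)):
        if work[i]:                         # leading term present: subtract (XOR)
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[-size:]


def crc_ok(bits_with_crc, size):
    """Receiver side check: recompute the checksum and compare."""
    data, checksum = bits_with_crc[:-size], bits_with_crc[-size:]
    return crc_remainder(data, size) == list(checksum)
```

A receiver that finds `crc_ok(...)` false knows that the block contains at least one bit error, but not where it is.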

2.3.14.2 Transport block concatenation and code block segmentation

After CRC attachment the transport blocks are concatenated and, if necessary, segmented into code blocks.

The number of transport blocks on TrCH $i$ is denoted $M_i$. $B_i$ is the number of bits in each block, including CRC parity bits. The number of bits after serial concatenation of the $M_i$ blocks is $X_i = M_i B_i$. The bit sequence is segmented if $X_i > Z$, where $Z = 504$ for convolutional coding and $Z = 5114$ for turbo coding.

The number of code blocks is

$$C_i = \left\lceil \frac{X_i}{Z} \right\rceil. \tag{2.3}$$

For $C_i > 0$ the number of bits in each code block is

$$K_i = \left\lceil \frac{X_i}{C_i} \right\rceil. \tag{2.4}$$

For turbo coding, if $X_i < 40$ then $K_i = 40$.

If $X_i$ is not a multiple of $C_i$, or if turbo coding is used and $X_i < 40$, filler bits are added to the beginning of the first block. The number of filler bits, $Y_i$, is determined by

$$Y_i = C_i K_i - X_i. \tag{2.5}$$

Concatenation is done to avoid the overhead of added tail bits, while segmentation is done

to keep down the implementation complexity.
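Equations 2.3 to 2.5 can be checked with a short Python sketch. The function name and the string labels for the coding schemes are assumptions made for this illustration.

```python
# Maximum code block size Z for each coding scheme, as given in the text.
Z_MAX = {"conv": 504, "turbo": 5114}


def segment(x_bits, coding):
    """Return (C, K, Y): code blocks, bits per block and filler bits."""
    z = Z_MAX[coding]
    c = -(-x_bits // z)              # ceil(X / Z), equation 2.3
    if c == 0:
        return 0, 0, 0
    k = -(-x_bits // c)              # ceil(X / C), equation 2.4
    if coding == "turbo" and x_bits < 40:
        k = 40                       # minimum turbo code block size
    y = c * k - x_bits               # filler bits, equation 2.5
    return c, k, y
```

For example, 5115 bits with turbo coding give two code blocks of 2558 bits and one filler bit, while 30 bits with turbo coding give one block of 40 bits with 10 filler bits.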


2.3.14.3 Channel coding

The bits are encoded with one of the Forward Error Correcting algorithms, convolutional or

turbo coding, described in appendix A.

After encoding, the number of bits $Y_i$ for each code block (here $Y_i$ denotes encoded bits, not the filler bits of equation 2.5) depends on the coding scheme according to:

• Convolutional coding

– Rate 1/2: $Y_i = 2K_i + 16$.

– Rate 1/3: $Y_i = 3K_i + 24$.

• Turbo coding

– Rate 1/3: $Y_i = 3K_i + 12$.

The encoded blocks are serially concatenated. The total number of output bits is $E_i = C_i Y_i$.
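As a small worked example, the expressions above can be evaluated in Python. The scheme labels and function names are assumptions for this sketch; the constant terms are the tail (termination) bits added by the encoders.

```python
def encoded_bits_per_block(k, scheme):
    """Number of bits out of the encoder for one code block of k bits."""
    if scheme == "conv-1/2":
        return 2 * k + 16
    if scheme == "conv-1/3":
        return 3 * k + 24
    if scheme == "turbo-1/3":
        return 3 * k + 12
    raise ValueError("unknown coding scheme: " + scheme)


def total_encoded_bits(c, k, scheme):
    """Total output after serial concatenation of c encoded blocks."""
    return c * encoded_bits_per_block(k, scheme)
```

Two code blocks of 500 bits with rate 1/2 convolutional coding thus give 2 · (2 · 500 + 16) = 2032 bits.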

2.3.14.4 Rate matching

WCDMA provides flexible data rates, and the number of bits on a transport channel can vary between different TTIs. Rate matching adapts the resulting bit rate to the limited set of bit rates available on a physical channel. Bits are repeated or punctured according to the rate matching attribute, which is semi-static and can only be changed through higher layer signaling. The rate matching algorithm is further described in section A.3.3 in appendix A.

2.3.14.5 Insertion of DTX indication, fixed positions

Transport channels can use either fixed or flexible positions in the radio frame; the difference is not explained in detail here. If fixed positions are used, a fixed number of bits is reserved for each TrCH in the radio frame. If the output from the rate matching stage does not fill up the reserved bits, DTX indications are inserted at this stage. DTX indications mark the positions where the transmission should be turned off.

2.3.14.6 1st interleaving

The 1st interleaving is a block interleaver with inter-column permutations. For 10 ms TTI this

stage is transparent.

2.3.14.7 Radio frame segmentation

For TTIs longer than 10 ms the input bit sequence is segmented and mapped onto Fi consecutive radio frames.

2.3.14.8 TrCH Multiplexing

Every 10 ms, one radio frame from each TrCH is delivered to the TrCH multiplexing. These

radio frames are serially multiplexed into a coded composite transport channel (CCTrCH).


2.3.14.9 Insertion of DTX, flexible positions

If the positions are flexible and there are still unused bit positions in the radio frame, these are filled with DTX indications.

2.3.14.10 Physical channel segmentation

When more than one PhCH is used, this step divides the bits among the different PhCHs. Note that the actual mapping is done after interleaving.

2.3.14.11 2nd interleaving

The second interleaver performs intra-frame interleaving. It is applied separately for each

physical channel segment.

It is a block interleaver with three steps: the bits are written into a matrix with padding, the columns of the matrix are permuted, and the bits are read out of the matrix with pruning.

Let U be the number of bits in one radio frame, then the output bit sequence from the block

interleaver is derived as follows:

1. Assign C2 = 30 to be the number of columns of the matrix. The columns of the matrix

are numbered 0, 1, 2, ..., C2 − 1 from left to right.

2. Determine the number of rows of the matrix, R2, by finding the minimum integer R2 such that U ≤ R2 · C2. The rows of the rectangular matrix are numbered 0, 1, 2, ..., R2 − 1 from top to bottom.

3. Write the input bit sequence into the R2×C2 matrix row by row starting in column 0 of

row 0. If R2 · C2 > U dummy bits are added to the end.

4. Perform the inter-column permutation for the matrix based on the pattern shown in table

2.1, where P2(j) is the original column position of the j-th permuted column.

5. The output of the block interleaver is the bit sequence read out column by column from

the inter-column permuted matrix. The output is pruned by deleting dummy bits that

were padded to the input of the matrix before the inter-column permutation.

Number of columns C2    Inter-column permutation pattern P2(0), P2(1), ..., P2(C2 − 1)
30                      0, 20, 10, 5, 15, 25, 3, 13, 23, 8, 18, 28, 1, 11, 21, 6, 16, 26, 4, 14, 24, 19, 9, 29, 12, 2, 7, 22, 27, 17

Table 2.1: Inter-column permutation pattern
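The five steps and the permutation pattern of table 2.1 can be sketched in Python as follows. This is a minimal sketch with names chosen here; the inverse function is included because a decoder needs it, and it is derived simply by interleaving the index sequence.

```python
# Inter-column permutation pattern P2 for C2 = 30 columns (table 2.1).
P2 = [0, 20, 10, 5, 15, 25, 3, 13, 23, 8, 18, 28, 1, 11, 21,
      6, 16, 26, 4, 14, 24, 19, 9, 29, 12, 2, 7, 22, 27, 17]
C2 = 30
_PAD = object()   # marker for dummy bits, can never collide with real data


def interleave_2nd(bits):
    """2nd interleaving of one physical channel segment (U = len(bits))."""
    bits = list(bits)
    u = len(bits)
    r2 = -(-u // C2)                                  # minimum R2 with U <= R2*C2
    padded = bits + [_PAD] * (r2 * C2 - u)            # dummy bits at the end
    rows = [padded[r * C2:(r + 1) * C2] for r in range(r2)]
    out = []
    for j in range(C2):                               # j-th permuted column ...
        col = P2[j]                                   # ... is original column P2(j)
        for r in range(r2):                           # read column top to bottom
            value = rows[r][col]
            if value is not _PAD:                     # prune dummy bits
                out.append(value)
    return out


def deinterleave_2nd(bits):
    """Inverse operation, as needed on the receiving side."""
    bits = list(bits)
    order = interleave_2nd(range(len(bits)))          # order[k] = source index
    out = [None] * len(bits)
    for k, src in enumerate(order):
        out[src] = bits[k]
    return out
```

With exactly 30 input bits the matrix has a single row, so the output order equals the permutation pattern itself.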

2.3.14.12 Physical channel mapping

The physical channel mapping is described in section 2.3.9.


2.3.15 Spreading and modulation

Figure 2.8: Downlink spreading and modulation (diagram labels: PhCH #n, serial-to-parallel converter, I/Q mapper, Cch,SF,m, j, I + jQ, Sdl,n, S)

Figure 2.8 shows the spreading that is done for every physical channel except SCH. SCH

instead carries a special predefined synchronization sequence. QPSK is used for all channels

(again except for SCH). With HSDPA (which will be described in the next section) another

modulation, 16QAM, may be used.

2.3.15.1 IQ mapping

QPSK The binary value 0 is mapped to 1, the binary value 1 to -1 and DTX to 0. In the

serial-to-parallel converter every even binary symbol is mapped to an I branch and every odd

symbol to a Q branch.
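The QPSK mapping and the serial-to-parallel conversion can be sketched in Python. The names are chosen here, the string "DTX" is an assumed representation of a DTX indication, and "even" is taken to mean positions 0, 2, 4, ... of the input sequence.

```python
DTX = "DTX"                       # assumed marker for a DTX indication

_LEVEL = {0: 1, 1: -1, DTX: 0}    # binary 0 -> +1, binary 1 -> -1, DTX -> 0


def qpsk_serial_to_parallel(symbols):
    """Map binary symbols to real values and split them into I and Q branches."""
    mapped = [_LEVEL[s] for s in symbols]
    i_branch = mapped[0::2]       # even-numbered symbols
    q_branch = mapped[1::2]       # odd-numbered symbols
    return i_branch, q_branch
```

For instance, the input 0, 1, 1, 0 gives I = (+1, −1) and Q = (−1, +1).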

16QAM A set of four binary symbols $n_k, n_{k+1}, n_{k+2}, n_{k+3}$ is serial-to-parallel converted to two binary symbols on the I branch, $i_1 = n_k$, $i_2 = n_{k+2}$, and two on the Q branch, $q_1 = n_{k+1}$, $q_2 = n_{k+3}$. These are then mapped to 16QAM according to table A.1.

2.3.15.2 Channelization and scrambling

The I and Q branches are spread to the chip rate by the channelization code Cch,SF,m, described

in section 2.3.2. The resulting chip sequence on the Q branch will be multiplied with j and

summed with the corresponding chip on the I branch, resulting in a complex sequence.

The resulting sequence of complex valued chips will then be scrambled by a complex-valued

scrambling code Sdl,n, described in section 2.3.3. Then each channel will be weighted by a

weight factor Gi before being combined by complex addition.
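The spreading and scrambling of one physical channel can be sketched in Python with complex numbers. This is only an illustration: the function name is chosen here, and the short channelization and scrambling sequences in the usage note are made-up values, not real OVSF or Gold code segments.

```python
def spread_and_scramble(i_syms, q_syms, chan_code, scr_code, gain=1.0):
    """Spread I/Q symbols to chip rate, form I + jQ, scramble and weight."""
    chips = []
    for i_val, q_val in zip(i_syms, q_syms):
        for c in chan_code:                       # SF chips per symbol
            chips.append(complex(i_val * c, q_val * c))
    # Chip-wise complex multiplication with the scrambling code,
    # followed by the weight factor of this channel.
    return [gain * chip * scr_code[n % len(scr_code)]
            for n, chip in enumerate(chips)]


def combine(channels):
    """Complex addition of several equally long, weighted channels."""
    return [sum(chips) for chips in zip(*channels)]
```

With the hypothetical values `chan_code = [1, 1, -1, -1]` and `scr_code = [1+1j, 1-1j, -1+1j, -1-1j]`, the symbol pair I = +1, Q = −1 is spread to the four chips 2, −2j, −2j, 2.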

2.3.15.3 RF Modulation

After the spreading process the complex chip sequence is modulated as shown in figure 2.9

before RF transmission. T in this picture is the final baseband IQ data that will be referenced


later.

Figure 2.9: Downlink modulation (diagram labels: T, splitter, Re{T}, Im{T}, pulse shaping on each branch, cos(ωt), sin(ωt))


2.4 HSDPA

The High Speed Downlink Packet Access data packet transmission was first defined in the 3GPP Release 4 with a peak rate of 4 Mbps, further developed in Release 5, and then complemented with its uplink counterpart HSUPA (High Speed Uplink Packet Access, also known as EUL, Enhanced Uplink) in Release 6. Release 6 is the current release and the one referred to in this text.

HSDPA is a shared downlink resource, so one transport channel is shared between multiple users in the cell. The service is packet switched (PS), which means that user data is very bursty and the link resource for one user does not need to be reserved throughout the connection. In other words, the available radio resources must be dynamically shared between the users that are currently requesting data.

Now consider multiple users connected to a Node B. All of these users must be able to get

data on request at all times during their connection. This implies that they must have their

own channelization code on a dedicated channel to separate them from each other, causing

channelization codes to start to run out rapidly when more users are connecting. One solution

is to let each user have its own downlink scrambling code, but then the orthogonality from the

single source (Node B) would be lost.

Category  Max. number  Inter-TTI  Max. transport  Data rate  Modulation
          of codes                block size                 schemes
1         5            3          7298            1.2 Mb/s   Both
2         5            3          7298            1.2 Mb/s   Both
3         5            2          7298            1.8 Mb/s   Both
4         5            2          7298            1.8 Mb/s   Both
5         5            1          7298            3.6 Mb/s   Both
6         5            1          7298            3.6 Mb/s   Both
7         10           1          14411           7.2 Mb/s   Both
8         10           1          14411           7.2 Mb/s   Both
9         15           1          20251           10.1 Mb/s  Both
10        15           1          27952           14.0 Mb/s  Both
11        5            2          3630            0.9 Mb/s   QPSK
12        5            1          3630            1.8 Mb/s   QPSK

Table 2.2: HSDPA UE categories

Instead, HSDPA uses a set of 15 channelization codes shared in the time domain and a shared control signaling channel which conveys control information to the UEs, in order to inform the connected UEs when data is available for a certain UE ID. This resolves the code problem, and does not affect the QoS since PS services do not need a guaranteed time delay on data packets.

There are 12 different HSDPA categories, with different modulation schemes and numbers of codes. Different transmission spacings called inter-TTI intervals are also used to effectively limit the bandwidth of certain categories. This is simply done by utilizing only every second or every third transmission time interval (TTI). The theoretical data rates range from 0.9 to 14 Mbps,


and are all listed in table 2.2.
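The data rate column of table 2.2 appears to follow directly from the maximum transport block size, the 2 ms HSDPA TTI and the inter-TTI interval. The Python sketch below checks this reading of the table; the dictionary layout and the function name are chosen here.

```python
TTI_SECONDS = 0.002   # HSDPA TTI of 2 ms

# category: (maximum number of codes, inter-TTI, maximum transport block size)
UE_CATEGORIES = {
    1: (5, 3, 7298), 2: (5, 3, 7298),
    3: (5, 2, 7298), 4: (5, 2, 7298),
    5: (5, 1, 7298), 6: (5, 1, 7298),
    7: (10, 1, 14411), 8: (10, 1, 14411),
    9: (15, 1, 20251), 10: (15, 1, 27952),
    11: (5, 2, 3630), 12: (5, 1, 3630),
}


def peak_rate_mbps(category):
    """One maximum-size transport block every inter-TTI-th TTI, in Mb/s."""
    _codes, inter_tti, max_tbs = UE_CATEGORIES[category]
    return max_tbs / (TTI_SECONDS * inter_tti) / 1e6
```

For category 1 this gives 7298 bits every 6 ms, about 1.2 Mb/s, and for category 10 it gives 27952 bits every 2 ms, about 14.0 Mb/s.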

2.4.1 Time units

There are three important time units used in HSDPA. These are:

Radio frame: 10 ms. 38400 chips divided into 15 slots.

Slot: 667 µs. 2560 chips.

Subframe: TTI for HSDPA consists of three slots i.e. 2 ms. 7680 chips.
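These durations follow from the WCDMA chip rate of 3.84 Mcps, as the following short Python check shows (the constant and function names are chosen here).

```python
CHIP_RATE = 3_840_000          # chips per second (3.84 Mcps)

CHIPS_PER_SLOT = 2560
CHIPS_PER_SUBFRAME = 7680      # three slots
CHIPS_PER_RADIO_FRAME = 38400  # fifteen slots


def duration_ms(chips):
    """Duration in milliseconds of a given number of chips."""
    return chips * 1000 / CHIP_RATE
```

One slot thus lasts 2560 / 3.84 Mcps ≈ 0.667 ms, a subframe 2 ms and a radio frame 10 ms, so one radio frame contains five HSDPA subframes.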

2.4.2 Overview

There are three key concepts in HSDPA separating it from R99 services.

• Rate adaptation

• Hybrid ARQ with soft combining

• Fast packet scheduling

The rate adaptation adjusts the data rate to a specific user on a frame-by-frame basis, based on the current channel quality reported by the UE. Each TTI a new rate is selected for each receiving user by adjusting the modulation scheme, the forward error-correction coding (FEC) and the redundancy version, where the highest rate is assigned to the user with the best channel conditions. This user also gets the highest scheduling priority in the fast packet scheduling process.

Hybrid ARQ or HARQ is an in-sequence delivery and redundancy versioning process, making

use of erroneous packets by combining them with retransmissions.

Modulation schemes used are QPSK and 16QAM for HS-DSCH, and just QPSK for HS-

SCCH. FEC coding algorithms are Turbo coding for HS-DSCH and convolutional coding for

HS-SCCH. These techniques are described in appendix A.

Figure 2.10: HSDPA channel structure overview


The channel structure of HSDPA is shown in figure 2.10. The downlink information is carried on the HS-DSCH and HS-SCCH channels, while uplink information is carried on one or more dedicated channels as in R99. There is also an uplink signaling channel called HS-DPCCH, and an extra dedicated downlink channel to carry power control commands to the UE (for uplink power control).

The downlink channels are shared between every UE in the cell. Note that HS-DSCH is a

transport channel, and figure 2.10 only shows physical channels. HS-PDSCH is the physical

HS-DSCH constituent, and there can be up to 15 of these physical channels, also called HS-

DSCH codes in this text. How these 15 channels are shared among the users is decided by the

scheduler.

2.4.3 Power control

Because HSDPA uses a shared channel resource for the receivers, the power cannot be controlled per user since they are not at the same distance from the Node B (nor do they have equal channel conditions). Instead, the power is kept at a fairly constant level, namely the headroom between the power used by the R99 channels and the maximum output of the power amplifier. In this way, the cell power is always utilized well.

Figure 2.11: Illustrative example of power utilization in HSDPA

If a certain user is located near the Node B, there are two scenarios which may apply. Either this user has sufficient power to get a clear channel and can utilize HSDPA at its peak rate, or the dedicated channels are using too much power so that the user must receive at a lower rate.

Now, if this user is located far from the Node B (near the cell edge), two similar scenarios

exist. Either this user has sufficient reception to receive at a quite high rate if the cell is not

highly populated, or the cell is populated with many other users using dedicated channels,

hence this user must receive at a very low rate using basic modulation if at all possible.

This rate adaptation is described in the next section. Because the spreading factor is constant


for HSDPA, the processing gain is also constant, and the rate must be lowered in order to receive

properly on bad channels. This is the main reason why the scheduling of data is first done in

the time domain, so that one user near the Node B with good channel conditions does not get

affected by a user with poor channel conditions (and get a lower rate).

2.4.3.1 Uplink power control

It is still critical to have tight power control on the uplink channels to avoid interfering with other UEs. This is implemented by assigning each UE a dedicated channel, carrying power control commands in the TPC field. This channel may also be used for complementary circuit switched data such as speech.

Regarding the issue with downlink channelization codes, it should be mentioned that the dedicated channel can use a SF of up to 512 since data is not carried, so there will be enough DPCHs for HS UEs assuming not all HS-DSCH codes are used. If more UEs need to be connected, for instance when using all HS-DSCH codes, fractional DPCH (F-DPCH) may be used, allowing several UEs to share one DPCH.

2.4.4 Rate adaptation

HSDPA uses the following link adaptation techniques. As mentioned earlier, power control is omitted for the downlink and replaced by changing the modulation scheme between QPSK and 16QAM, and the code rate of the FEC code, to adjust the bit rate. The basic idea is to lower the modulation order and the effective code rate of the FEC (that is, to add redundancy) in order to allow UEs with a weak or interfered signal to receive properly.

The higher modulation of 16QAM provides twice the bit rate compared to QPSK on clear

channels, and puncturing of parity bits gives a code rate of almost 1 (0.9715). On the other

hand, if the channel is noisy or has a weak signal due to bad coverage or interference, the

modulation can be changed to standard QPSK with a coding rate of 1/3, with repetition of

both systematic and parity bits to increase redundancy.

The UE’s with favorable channel conditions are prioritized by the scheduler, hence given

better average throughput. Refer to the next section about scheduling for further details on

this.

The link adaptation is frame-based, that is, for each TTI a new transport format will be

chosen dynamically. When there are favorable link conditions, 16QAM and (close to) 1/1

coding rate is applied.

The CQI field of the uplink control channel HS-DPCCH (see section 2.4.13) contains infor-

mation reflecting the downlink channel conditions which is used by MAC-hs in Node B for

selecting transport format.

2.4.5 Fast packet scheduling

Scheduling in HSDPA differs from other WCDMA services in the sense that it uses instantaneous measurements of the channel conditions to schedule packets to several users. The user with the best link conditions will get the highest priority, either in the time domain or in the code domain, but preferably in the time domain if the UEs experience different channel conditions.


Otherwise, the one with best channel will be forced to receive at a lower data rate than possible.

This is also called channel-dependent scheduling in some literature.

If there are, for instance, two users sharing one channel with similar link conditions, then they will share the channel resources equally, utilizing half of the code set each. If one of them suffers from poor signal strength due to bad coverage, then the one with the best link conditions will get more air time and will also be able to use a higher code rate (less redundancy) and higher-order modulation. Therefore, the channel is always utilized well, and the average throughput for the better link will be high. Figure 2.12 shows an example of channel-dependent scheduling between three users, who suffer from dips in channel quality at different times.

Figure 2.12: Example of scheduling three users with variations in channel quality

The scheduling functionality resides in MAC-hs in Node B and not in the RNC as for the

other R99 services. The reason for this is to keep the round trip times short, hence enabling

high data rates.
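The idea of channel-dependent scheduling in the time domain can be sketched with a very simple policy that serves, in every TTI, the user reporting the best channel quality. This is a deliberately simplified assumption and not the scheduler of any real Node B; practical schedulers also weigh in fairness, buffer status and code multiplexing. The function name and the quality values are hypothetical.

```python
def schedule_best_channel(reports_per_tti):
    """Pick one user per TTI: the one with the highest reported quality.

    reports_per_tti is a list with one dict per TTI, mapping a user
    name to its most recently reported channel quality. Ties are
    resolved in alphabetical order of the user names.
    """
    return [max(sorted(report), key=report.get) for report in reports_per_tti]
```

With three users whose channel quality dips at different times, for example `[{"A": 20, "B": 12, "C": 15}, {"A": 8, "B": 18, "C": 15}, {"A": 10, "B": 9, "C": 22}]`, the users A, B and C are served in turn, each when its own channel is at its best.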

2.4.6 MAC-hs

The MAC protocol provides data transport between the physical layer (layer 1) and higher sub-layers of layer 2 through the RLC protocol, and also scheduling of MAC packet transmissions to different users connected to the Node B (see section 2.3.8 about WCDMA). Both of these protocols are part of the Data Link Layer (layer 2) of the OSI model, but the MAC-hs resides in the Node B whilst the MAC-d/MAC-c/sh and the RLC are in the RNC.

The reason for placing MAC-hs in the Node B is obvious if the data rates and the possible number of Node Bs connected to each RNC are taken into account; the fast signaling requires fast operation of the MAC-hs, so it is placed close to the physical layer in the Node B to relieve the RNC. The drawback is that this adds complexity to the relatively simple Node B.

The MAC-hs can be seen as an extension of the MAC-d/MAC-c/sh protocols, dropped down

to the physical layer. Its primary tasks are HARQ process management/in-sequence delivery to

higher layers, user scheduling, transport format selection and extended flow control. The MAC-

hs receives a flow of MAC-d PDU’s which are priority sorted between a set of priority queues,


each tied to a channel of its own. These are then prioritized and segmented into MAC-hs PDU’s

and sent using HARQ.

The MAC-hs specific control signaling is mapped directly onto HS-SCCH in the physical

layer. The information consists of HARQ parameters, transport format selection and UE ID.

Mapping and coding of these parameters is described in section 2.4.11.

Figure 2.13: MAC-hs encapsulation of TCP/IP packets

Now consider each MAC-d PDU as a MAC-hs SDU, that is, MAC-d header and MAC-d payload. Each MAC-hs PDU consists of a MAC-hs header (normally 21 bits when using a fixed MAC-d PDU size) and a sequence of N MAC-d PDUs. If the combined size of the MAC-d PDUs is less than the transport block size, the MAC-hs PDU is padded with dummy bits to fit the transport block. The data encapsulation from IP packets down to MAC-hs PDUs is shown in figure 2.13.

The initial user data has already been encapsulated into the IP packet in layer 3 by the remote serving application (for instance, a web server). The IP header is then compressed by the PDCP protocol in layer 2 (RNC), which maps higher-level protocol characteristics onto the characteristics of the underlying radio-interface protocols, providing protocol transparency for higher-layer protocols. The PDCP header and payload form the RLC SDU. This is split into payload segments of 40 bytes, each with its own RLC header. These then form the MAC-d SDUs.


2.4.6.1 MAC-hs header

The MAC-hs header format is shown in figure 2.14. In this case, all MAC-d PDUs are of equal size, yielding a 21-bit header. If they were of different sizes, they would be grouped in sets according to their size, and the SID/N/F fields would be repeated in the header for each set, adding 11 bits per additional PDU set. Normally, however, they are all of equal size.

Figure 2.14: MAC-hs header

The VF version flag is reserved for future extensions. If this bit is a 1, the header is not valid

as of today.

QueueID identifies a reordering queue, which buffers received blocks in a receiving window in

order to deliver the transport blocks in-sequence. All MAC-d PDU’s of a MAC-hs PDU belong

to the same reordering queue.

TSN is the transmission sequence number of the transport block, used when reordering. The

in-sequence delivery is implemented on top of HARQ, since there is no interaction between the

HARQ processes.

SID is a size index identifier that specifies the size of the MAC-d PDU’s. N is the number

of MAC-d PDU’s in this set (normally only one as mentioned previously).

Finally, F indicates the end of the header.
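A parser for this header can be sketched in Python. The field widths used below (VF 1 bit, QueueID 3 bits, TSN 6 bits, SID 3 bits, N 7 bits, F 1 bit) are an assumption that is consistent with the 21-bit header and the 11 bits per additional set stated above, and F = 1 is assumed to mark the last SID/N/F set. The function names and the returned dictionary are chosen here.

```python
def to_bits(value, width):
    """Helper: integer to a list of bits, most significant bit first."""
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]


def _read(bits, pos, width):
    if pos + width > len(bits):
        raise ValueError("truncated MAC-hs header")
    value = 0
    for b in bits[pos:pos + width]:
        value = (value << 1) | b
    return value, pos + width


def parse_mac_hs_header(bits):
    """Parse a MAC-hs header from a list of bits (assumed field widths)."""
    pos = 0
    vf, pos = _read(bits, pos, 1)
    if vf != 0:
        raise ValueError("version flag set: header format not valid")
    queue_id, pos = _read(bits, pos, 3)
    tsn, pos = _read(bits, pos, 6)
    sets = []
    while True:
        sid, pos = _read(bits, pos, 3)      # size index of this PDU set
        n, pos = _read(bits, pos, 7)        # number of MAC-d PDUs in the set
        f, pos = _read(bits, pos, 1)        # 1 = end of header (assumed)
        sets.append((sid, n))
        if f == 1:
            break
    return {"queue_id": queue_id, "tsn": tsn, "sets": sets, "header_bits": pos}
```

With one PDU set the parser consumes 21 bits, and with two sets 32 bits, which matches the header sizes given in the text.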

2.4.7 Hybrid ARQ with soft combining

For a perfect transmission channel with no noise and maximum amplitude of the received signal at all times, there is no need for any forward error-correction coding (FEC) or retransmission algorithms. In practice, there is no such thing as a perfect channel, hence there must be some sort of error handling. Even a short copper or fiber cable can be disturbed by external electromagnetic sources, and this is often taken care of by either packet acknowledgment/retransmission or FEC.

Retransmission schemes are the best choice if the channel conditions are good enough to cope without FEC most of the time (as for cable links), and FEC schemes are the best choice for noisy channels such as air interfaces.

For an air interface, however, there are no guarantees whatsoever that even a highly redundant packet can be received at all times, due to fluctuations in the channel. A better approach is a combination of the retransmission scheme Automatic Repeat Request (ARQ) and FEC,

called Hybrid ARQ, or HARQ. In this case, the received packets are first decoded according


to the FEC scheme used, and then checked for errors using CRC. If an erroneous packet was

received despite FEC coding, the receiver discards the packet and sends a NACK back to the

transmitter. Otherwise, an ACK is sent to indicate that the packet could be salvaged.

ARQ is also responsible for in-sequence delivery of the packets, since retransmissions will cause new transmissions inside the receiving window to arrive out of sequence. This is handled by a reordering queue in the receiver.

2.4.8 HARQ in HSDPA

HSDPA uses an extension of HARQ, HARQ with soft combining, which keeps erroneous frames in a soft buffer (called the virtual IR buffer at the transmitter side), since they still contain some amount of intact data, and requests a retransmission of the same frame. This retransmission may contain either the same bits as were sent last time, including systematic bits, or a new set of parity bits with or without systematic bits. The retransmission is then combined with the contents of the soft buffer until there is enough information for the frame to be decoded successfully.

There are two types of retransmission combining: Chase Combining and Incremental Redundancy.

2.4.8.1 Chase Combining

In a Chase Combining (CC) algorithm, retransmitted frames are exact copies of the original one, and are combined until the frame can be decoded. This is a simple algorithm, since the bits to keep from each retransmitted frame are simply those that differ from the buffered frame. Eventually, assuming that the interference may be regarded as random, i.e. uniformly distributed, the combination of frames will equal the original data sent by the transmitter.

However, if the channel suffers from long-term random interference, this is not a very efficient

algorithm since there may be several retransmissions for each frame, taking channel bit-rate

down severely.
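One common way to realise the combining of identical retransmissions is to add the received soft values position by position before making a decision. The Python sketch below illustrates this under that assumption; it is not necessarily the bit selection described above, the received values are made up, and the sign convention follows the QPSK mapping of section 2.3.15.1 (binary 0 is sent as +1, binary 1 as −1).

```python
def chase_combine(soft_buffer, received):
    """Add the soft values of a retransmission to the buffered ones."""
    return [a + b for a, b in zip(soft_buffer, received)]


def hard_decision(soft_values):
    """Positive soft value -> binary 0, negative -> binary 1."""
    return [0 if v >= 0 else 1 for v in soft_values]


# Hypothetical example: the bits 0, 1, 1, 0 are sent twice.
first = [0.9, 0.2, -1.1, 0.8]       # second position received with wrong sign
second = [1.0, -0.7, -0.9, -0.1]    # fourth position received with wrong sign
combined = chase_combine(first, second)
```

In this example each transmission on its own gives one bit error after the hard decision, while the combined soft values give the transmitted bits 0, 1, 1, 0.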

2.4.8.2 Incremental Redundancy

Another approach is Incremental Redundancy (IR). In this algorithm the frame is retransmit-

ted with increasing redundancy and perhaps even without systematic bits (or with punctured

systematic bits). Like Chase Combining, the retransmissions are combined with the original

frame until enough redundancy is accumulated. This results in better error coding/lower code

rate, since it is often more efficient to use more dense coding than to retransmit several times

if the channel quality is poor. That is, the redundancy is adapted to channel conditions.

This is much more robust to static interference than CC, since one retransmission generally

contains the redundancy omitted in the initial transmission.

As an example, consider a maximum coding, i.e. a minimum code rate, of 1/4. Let's say that the initial data frame was transmitted with a code rate of 3/4, and a transmission error occurred due to interference on the channel. The next data frame would then be coded using a code rate of 3/8. Suppose this frame was also altered by interference, then the second retransmission would be coded with a code rate of 1/4, hence maximum redundancy is sent.
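The numbers in this example can be reproduced with a short calculation. One reading, assumed here, is that every transmission carries the same number of channel bits and that each retransmission adds new coded bits, so that the effective code rate after m transmissions is the initial rate divided by m, limited by the minimum code rate. The function name is chosen for this sketch.

```python
from fractions import Fraction


def effective_code_rates(initial_rate, min_rate, transmissions):
    """Effective code rate after each of the given number of transmissions."""
    rates = []
    for m in range(1, transmissions + 1):
        # m equally sized transmissions carry m times as many coded bits,
        # but the redundancy cannot exceed that of the minimum code rate.
        rates.append(max(initial_rate / m, min_rate))
    return rates


rates = effective_code_rates(Fraction(3, 4), Fraction(1, 4), 4)
```

This gives the sequence 3/4, 3/8 and 1/4, after which further retransmissions cannot lower the rate any more and can only repeat bits that have already been sent.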


2.4.8.3 Physical layer details

All of the HARQ functionality resides in MAC-hs in the physical layer. The principle of the

rate matching is depicted in figure 2.15.

The specifications in [3] allow the UE to have a soft buffer that is smaller than the largest transport blocks require, and smaller than the number of bits output from the Turbo coder for a given valid transport block. This limitation is tied to the UE category mentioned previously, and is signaled at connection setup.

In order to accommodate the coded transport block bits in the buffer, these must first be rate

matched (puncturing only) to perfectly fill the buffer. This is done in the first rate matching

stage, where systematic bits are always preserved.

Figure 2.15: Principle of the HARQ two-stage rate matching

The second rate matching stage performs the actual redundancy selection for retransmissions,

as well as rate matching for physical channels. If for instance one QPSK channel is used, there

are 960 available channel bits, and the bits input must be matched to this using either repetition

or puncturing. The rate matching pattern is calculated as described in section 2.4.10.5.

The redundancy version parameters s and r are defined in more detail in section 2.4.10.5, and their coding into HS-SCCH is defined in section 2.4.12.1. For s = 1, systematic bits will also be preserved (if possible) in the second rate matching stage, and for s = 0 the number of systematic bits depends solely on the rate matching algorithms.

Depending on the amount of available channel bits and the redundancy version parameters, different rate matching scenarios will apply. If systematic bits are prioritized and the amount of channel bits is enough for the systematic bits, only parity bits will be punctured. On the other hand, if systematic bits are not prioritized, parity bits are prioritized instead, and the systematic bits are punctured first. When repetition is needed, however, both systematic and parity bits are repeated in both cases.

An example of HARQ rate matching for both CC and IR is shown in figure 2.16. The

transport block size is 13904 bits (all channels, QPSK), and the soft buffer is 38400 bits.


Figure 2.16: Example of rate matching for different redundancy versions, when using 15 QPSK channels

The initial transmission is equal to a CC retransmission. The bottom row of bits represents an IR transmission, where in this example only complementary (and some copies of) parity bits are sent, i.e. systematic bits are not prioritized and do not fit into the physical channels. The parity bits to send are actually not contiguous bit sequences as shown in the figure, but are chosen from the entire parity block. Even parity bits that have already been sent (at the end of the last transmission) may be picked out by the redundancy (rate matching) algorithm.

The transmissions/retransmissions are handled by several parallel stop-and-wait HARQ pro-

cesses, one for each receiver at the receiving side and one for each channel and receiver at the

transmitting side.

2.4.9 HS-DSCH

The High-Speed Downlink Shared Channel utilizes both time- and code multiplexing. According

to the specifications, the first choice is multiplexing in the time domain.

HS-DSCH uses a constant SF of 16. 15 of these channelization codes are available for HS-

DSCH use, because the first code is reserved to preserve higher spreading factors. This enables

up to 15 users to share the channel during each time frame (TTI).

The TTI is also different from the other services of WCDMA, since it has been reduced to 2 ms (3 time slots) to boost throughput. The main reason for this is the bottleneck problem


introduced by TCP/IP, where data packets need to be acknowledged very fast, which cannot be done using longer TTIs. This has also been counteracted with the HARQ functionality described in more detail in sections 2.4.7 and 2.4.10.5.

The maximum bit rate is obtained when all 15 codes are assigned to one UE. With a chip rate of 3.84 Mcps, a spreading factor of 16 and 16QAM modulation (4 bits per symbol), the bit rate becomes (3.84/16) · 4 · 15 = 14.4 Mbps. This is however not the net data transmission rate, since the overhead of the error-correction coding is not taken into account, even though this is only a few percent when using full puncturing of parity bits. Moreover, further overhead is added in the MAC-hs header, reducing the net data rate even more.
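The arithmetic above can be written as a small helper. This is only an illustrative sketch; the function name and its parameters are our own and do not come from the specification.

```c
/* Raw HS-DSCH channel bit rate in bit/s (before coding and header overhead).
 * chip_rate_cps: chip rate in chips per second (3 840 000 for WCDMA)
 * sf:            spreading factor (16 for HS-DSCH)
 * bits_per_sym:  2 for QPSK, 4 for 16QAM
 * codes:         number of channelization codes assigned (1..15)
 * The name and signature are hypothetical, chosen for illustration. */
long hs_dsch_raw_bit_rate(long chip_rate_cps, int sf, int bits_per_sym, int codes)
{
    long symbols_per_second = chip_rate_cps / sf;   /* per code */
    return symbols_per_second * bits_per_sym * codes;
}
```

With 15 codes this gives 14.4 Mbps for 16QAM and 7.2 Mbps for QPSK.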

Figure 2.17: Example of code and time multiplexing using five HS-DSCH codes, shared between four

users

An illustration of four UEs sharing an HS-DSCH with 5 channelization codes is shown in figure 2.17. The sharing is done through both code and time multiplexing of the available TTIs between the users. Note that the order of the code assignment to the users is of no importance regarding the transmission itself. The scheduler selects which users to serve and how many codes to assign to each of them, as well as the time domain sharing between the time frames. The selection of which user gets the most bandwidth is complex and is done by the scheduler in the RNC.

2.4.10 Coding of HS-DSCH

The RNC chooses a suitable transport block size (TBS) for this particular channel, whose size,

together with modulation scheme and number of allocated codes, decides the effective data

quantity to be transferred. If the TBS is too small for the transmission, repetition of coded

bits is performed in the HARQ stage, see figure 2.18 which illustrates the coding chain. If the

TBS is too large, coded bits must be punctured (removed) in order to fit. This may include

puncturing of systematic bits, if they are not prioritized.

Error correction is obtained using 1/3 rate Turbo coding of the transport block concatenated

with CRC-24. The CRC provides error check once the data is decoded.

Bit scrambling of raw data is performed in order to break up any long sequences of ones or zeros. This is important to avoid repeated low-frequency electromagnetic interference, which was a fundamental problem in 2G GSM UEs.

Interleaving of coded data is also important when using high-speed wireless communication. The interleaver re-arranges the input bits, so that systematic bits and parity bits are evenly spread out in the time domain, hence making the error correction more robust to dynamic radio interference.

Figure 2.18: HS-DSCH coding chain

The coding chain illustrated in figure 2.18 is described in more detail in the following sections. It is related to the coding chain described in 2.3.14. The same notation as in [3], section 4.5, is used here. Note that some quantities have two names, for instance A = TBS.

2.4.10.1 CRC attachment

The 24 CRC bits are calculated from the A (or TBS) transport block bits $a_{im1}, a_{im2}, \ldots, a_{imA}$, according to 2.3.14.1. There is only one transport block. The checksum is appended to the end of the transport block, producing the bit sequence $b_{im1}, b_{im2}, \ldots, b_{imB}$, where $B = A + 24$.


2.4.10.2 Bit scrambling

The bits output from the CRC calculation, $B_{im}$, are scrambled by XOR'ing them with the scrambling sequence generated by the following algorithm. The resulting B scrambled bits are denoted $D_{im}$.

The generator polynomial g is given in equation 2.6, and the initial conditions are given in equation 2.7.

$$g = \{g_1, g_2, \ldots, g_{16}\} = \{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1\} \tag{2.6}$$

$$y_\gamma = \begin{cases} 0, & -15 < \gamma < 1 \\ 1, & \gamma = 1 \end{cases} \tag{2.7}$$

Then, the scrambling sequence $y_\gamma$ is calculated according to

$$y_\gamma = \left( \sum_{x=1}^{16} g_x \cdot y'_{\gamma-x} \right) \bmod 2, \quad 1 < \gamma \le B \tag{2.8}$$
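Equations 2.6 to 2.8 can be sketched in C as follows. The sketch assumes one bit per byte and treats the sequence inside the sum of equation 2.8 as the same sequence as $y_\gamma$; the function names are our own and purely illustrative.

```c
#include <stdint.h>

/* Generate the scrambling sequence y_1..y_B of equations 2.6-2.8.
 * y[k] holds y_(k+1); sequence values with index below 1 are zero (eq. 2.7).
 * Function name and bit-per-byte layout are illustrative assumptions. */
void hs_scrambling_sequence(uint8_t *y, int B)
{
    /* Indices x of the non-zero taps g_x in equation 2.6. */
    static const int taps[4] = { 11, 13, 14, 16 };

    for (int gamma = 1; gamma <= B; gamma++) {
        if (gamma == 1) {
            y[0] = 1;                       /* initial condition, eq. 2.7 */
            continue;
        }
        uint8_t bit = 0;
        for (int t = 0; t < 4; t++) {
            int prev = gamma - taps[t];     /* index gamma - x */
            if (prev >= 1)
                bit ^= y[prev - 1];         /* addition modulo 2 */
        }
        y[gamma - 1] = bit;
    }
}

/* XOR B bits (one bit per byte) with the scrambling sequence y.
 * Applying the function twice restores the original bits. */
void hs_bit_scramble(uint8_t *bits, const uint8_t *y, int B)
{
    for (int k = 0; k < B; k++)
        bits[k] ^= y[k];
}
```

Since the operation is a plain XOR, the same routine serves for both scrambling and descrambling.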

2.4.10.3 Code block segmentation

This is done according to section 2.3.14.2 for turbo coding. Note that there is only one transport

block.

2.4.10.4 Turbo coding

The $O_{ir}$ bits are encoded using Turbo code at rate 1/3, according to section A.1. Each code block is coded separately, yielding 12 tail bits per code block, 4 for systematic and 2 · 4 = 8 for parity. That is, the extra overhead after Turbo coding is $C_i \cdot 12$.

The resulting E bits are denoted $c_{i1}, c_{i2}, \ldots, c_{iE}$.

2.4.10.5 HARQ rate matching

Before rate matching stage one, input bits are separated into three sequences $X_1, X_2, X_3$ with systematic bits in the first, and parity bits in the other two according to

$$\begin{aligned} x_{1,i,k} &= c_{i,3(k-1)+1} \\ x_{2,i,k} &= c_{i,3(k-1)+2} \\ x_{3,i,k} &= c_{i,3(k-1)+3} \end{aligned} \qquad k = 1, \ldots, X_i \tag{2.9}$$

where

$$X_i = \frac{E}{3}. \tag{2.10}$$
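As an illustration, the bit separation of equation 2.9 could look as follows in C. The function name and the one-bit-per-byte representation are assumptions made for this sketch.

```c
#include <stdint.h>

/* Split E coded bits c[0..E-1] into systematic, parity 1 and parity 2
 * sequences of X = E / 3 bits each, following equation 2.9.
 * Returns X, or -1 if E is not a multiple of 3.
 * Name and one-bit-per-byte layout are illustrative assumptions. */
int harq_bit_separation(const uint8_t *c, int E,
                        uint8_t *sys, uint8_t *p1, uint8_t *p2)
{
    if (E % 3 != 0)
        return -1;

    int X = E / 3;                 /* equation 2.10 */
    for (int k = 0; k < X; k++) {
        sys[k] = c[3 * k];         /* x_1,i,k */
        p1[k]  = c[3 * k + 1];     /* x_2,i,k */
        p2[k]  = c[3 * k + 2];     /* x_3,i,k */
    }
    return X;
}
```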


First rate matching stage This rate matching stage operates on the three bit sequences defined above. The number of input bits, i.e. the number of coded bits, is E, and the size of the virtual IR buffer is denoted $N_{IR}$, which is signaled from layer 2 for each HARQ process.

Repetition is never performed in this stage, and puncturing of systematic bits is not allowed. The puncturing decision and pattern calculation follows.

If $N_{IR} \ge E$

    Coded bits will fit into the buffer and no puncturing needs to be performed.

else

    Coded bits do not fit into the buffer, and puncturing must be performed.

end

That is, when $N_{IR} < E$ redundancy is punctured according to the general rate matching algorithm described in section A.3. The number of parity bits that need to be punctured is $\Delta N^{TTI}_{il} = N_{IR} - E$ (a negative sign means puncturing), where i and l indicate the transport channel and transport format combination respectively. Both of these indices are static for HS-DSCH and do not need to be considered.

The number of bits out of the first stage is denoted $N_{sys}$ for systematic bits, and $N_{p1}$ and $N_{p2}$ for parity 1 and parity 2 bits respectively, and is given in equation 2.11.

$$\begin{aligned} N_{sys} &= X_i \\ N_{p1} &= X_i - \left| \frac{\Delta N^{TTI}_{il}}{2} \right| \\ N_{p2} &= X_i - \left| \frac{\Delta N^{TTI}_{il}}{2} \right| \end{aligned} \tag{2.11}$$

Second rate matching stage The following parameters decide the redundancy version of the current transport block, as an effect of the number of input bits, the available channel bits, and the RV (Redundancy Version) parameters. This stage is always operative, and may, as mentioned earlier, perform both puncturing and repetition of both systematic bits and parity bits. After the redundancy version has been selected, the appropriate rate matching is performed in order to fit the selected bits into the physical channels.

Available channel bits are denoted $N_{data}$ and given below as a function of the modulation scheme MS, where QPSK = 0 and 16QAM = 1, and the number of physical channels P.

$$N_{data} = (MS + 1) \cdot 960 \cdot P \tag{2.12}$$

If $N_{data} \le N_{sys} + N_{p1} + N_{p2}$ then puncturing is performed. The number of systematic bits after puncturing is

$$N_{t,sys} = \min(N_{sys}, N_{data}) \tag{2.13}$$

for a transmission that prioritizes systematic bits (s = 1), and similarly

$$N_{t,sys} = \max(N_{data} - (N_{p1} + N_{p2}), 0) \tag{2.14}$$

for a transmission that prioritizes non-systematic bits (s = 0).

That is, for s = 1, if all systematic bits can fit into the physical channels, all systematic bits are preserved during puncturing. Otherwise, the maximum number of systematic bits is sent and all parity bits are punctured.

For s = 0, either no systematic bits are sent, if only the parity bits fit into the physical channels, or as many systematic bits are sent as there is room for once all parity bits have been fitted.

If $N_{data} > N_{sys} + N_{p1} + N_{p2}$ then repetition is performed. The number of bits after repetition is

$$N_{t,sys} = \left\lfloor \frac{N_{sys} \cdot N_{data}}{N_{sys} + 2 N_{p1}} \right\rfloor \tag{2.15}$$

$$N_{t,p1} = \left\lfloor \frac{N_{data} - N_{t,sys}}{2} \right\rfloor \tag{2.16}$$

$$N_{t,p2} = \left\lceil \frac{N_{data} - N_{t,sys}}{2} \right\rceil \tag{2.17}$$

for systematic, parity 1 and parity 2 bits respectively. Of course, s has no effect here.

A closer look at these equations reveals that the bits are evenly repeated to fill the physical channels. Note that $N_{p1} = N_{p2}$, but $N_{t,p1}$ does not necessarily have to be equal to $N_{t,p2}$.
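The bit counts of equations 2.12 to 2.17 can be computed as in the sketch below. Note one assumption: the text only states equations 2.16 and 2.17 for the repetition case, and the sketch also uses them to split the remaining channel bits between the two parity sequences when puncturing. Type and function names are illustrative only.

```c
/* Number of bits per sequence after the second rate matching stage. */
typedef struct {
    int sys, p1, p2;
} rm2_counts_t;

/* Equation 2.12: ms is 0 for QPSK and 1 for 16QAM. */
int hs_n_data(int ms, int num_codes)
{
    return (ms + 1) * 960 * num_codes;
}

/* Equations 2.13-2.17. Names are hypothetical; using eq. 2.16/2.17 for the
 * parity split in the puncturing case is an assumption of this sketch. */
rm2_counts_t rm2_bit_counts(int n_sys, int n_p1, int n_p2, int n_data, int s)
{
    rm2_counts_t t;

    if (n_data <= n_sys + n_p1 + n_p2) {            /* puncturing */
        if (s == 1) {
            t.sys = n_sys < n_data ? n_sys : n_data;        /* eq. 2.13 */
        } else {
            int room = n_data - (n_p1 + n_p2);              /* eq. 2.14 */
            t.sys = room > 0 ? room : 0;
        }
    } else {                                        /* repetition, eq. 2.15 */
        t.sys = (int)(((long long)n_sys * n_data) / (n_sys + 2LL * n_p1));
    }
    t.p1 = (n_data - t.sys) / 2;                    /* floor, eq. 2.16 */
    t.p2 = (n_data - t.sys + 1) / 2;                /* ceiling, eq. 2.17 */
    return t;
}
```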

The general rate matching algorithm described in section A.3 is applied to the three bit sequences, using the rate matching parameters defined in table 2.3.

| | $X_i$ | $e_{plus}$ | $e_{minus}$ |
|---|---|---|---|
| Systematic bits | $N_{sys}$ | $N_{sys}$ | $\lvert N_{sys} - N_{t,sys} \rvert$ |
| Parity 1 bits | $N_{p1}$ | $2 \cdot N_{p1}$ | $2 \cdot \lvert N_{p1} - N_{t,p1} \rvert$ |
| Parity 2 bits | $N_{p2}$ | $N_{p2}$ | $\lvert N_{p2} - N_{t,p2} \rvert$ |

Table 2.3: Rate matching parameters for second HARQ rate matching stage

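The final rate matching parameter $e_{ini}$ as a function of the RV parameter r is then given by equations 2.18 and 2.19, for puncturing and repetition respectively.

$$e_{ini}(r) = \left( \left( X_i - \left\lfloor \frac{r \cdot e_{plus}}{r_{max}} \right\rfloor - 1 \right) \bmod e_{plus} \right) + 1 \tag{2.18}$$

$$e_{ini}(r) = \left( \left( X_i - \left\lfloor \frac{(s + 2 \cdot r) \cdot e_{plus}}{2 \cdot r_{max}} \right\rfloor - 1 \right) \bmod e_{plus} \right) + 1 \tag{2.19}$$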

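where $r_{max}$ is the number of values that r can take for the given MS, i.e. 4 for QPSK and 2 for 16QAM (compare the r columns of the RV coding table in section 2.4.12.1),

$$r_{max} = (2 - MS) \cdot 2 \tag{2.20}$$

and the r versions are listed in section 2.4.12.1.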
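A minimal sketch of equations 2.18 and 2.19 in C is given below. The value of $r_{max}$ is passed in by the caller, and a helper is used because the `%` operator in C can return a negative remainder, whereas the equations require a non-negative modulo. All names are illustrative.

```c
/* Modulo that always returns a value in 0..m-1, also for negative a. */
static int mod_floor(int a, int m)
{
    int r = a % m;
    return r < 0 ? r + m : r;
}

/* e_ini for the second rate matching stage.
 * X:          number of bits in the sequence before rate matching (X_i)
 * e_plus:     from table 2.3
 * r, s:       RV parameters
 * r_max:      number of r values for the modulation scheme (caller supplied)
 * repetition: non-zero selects equation 2.19, zero selects equation 2.18
 * The function name and argument order are hypothetical. */
int rm2_e_ini(int X, int e_plus, int r, int s, int r_max, int repetition)
{
    int shift = repetition ? ((s + 2 * r) * e_plus) / (2 * r_max)
                           : (r * e_plus) / r_max;
    return mod_floor(X - shift - 1, e_plus) + 1;
}
```

The result always lies in the range 1..$e_{plus}$, which is what the general rate matching algorithm expects as its initial error value.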


2.4.10.6 HARQ bit collection

The HARQ bit collection stage performs the bit collection algorithm that follows. The bit sequences from rate matching stage 2 are written into a rectangular interleaving matrix column by column. The dimensions of the matrix are $N_{row} \times N_{col}$, where

$$N_{row} = \begin{cases} 2, & MS = 0 \\ 4, & MS = 1 \end{cases} \tag{2.21}$$

$$N_{col} = \frac{N_{data}}{N_{row}} \tag{2.22}$$

$N_{col}$ can also be written as

$$N_{col} = P \cdot 480 \tag{2.23}$$

since $N_{data}$ is proportional to $N_{row}$. The parameters defined in equations 2.24 and 2.25 specify where systematic bits and parity bits are to be written into the matrix. $N_r$ is the number of rows that are completely filled with systematic bits, and $N_c$ is the number of columns where systematic bits are written in $N_r + 1$ rows. Furthermore, $N_c = 0$ if there are no parity bits.

$$N_r = \left\lfloor \frac{N_{t,sys}}{N_{col}} \right\rfloor \tag{2.24}$$

$$N_c = N_{t,sys} - N_r \cdot N_{col} \tag{2.25}$$

If $N_c = 0$ and $N_r > 0$, the systematic bits are written into rows $1..N_r$. Otherwise, systematic bits are written into rows $1..N_r + 1$ in the first $N_c$ (left-most) columns, and if $N_r > 0$ the remaining bits are written into the remaining $N_{col} - N_c$ columns (using $N_r$ rows).

In other words, either all systematic bits will fit on one row using $N_c$ columns, or they will fit on the first $N_r + 1$ rows using $N_c$ columns plus $N_r$ rows in the column span $[N_c + 1, N_{col}]$. Note that matrix indices start at 1.
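The layout described by equations 2.21 to 2.25 can be captured in a few lines of C. The sketch below only computes the matrix parameters and tells whether a given matrix element holds a systematic bit; it does not move any data. Type and function names are our own.

```c
/* Geometry of the HARQ bit collection matrix (names are illustrative). */
typedef struct {
    int n_row, n_col;   /* matrix dimensions, equations 2.21 and 2.23 */
    int n_r, n_c;       /* equations 2.24 and 2.25 */
} collect_layout_t;

/* ms: 0 for QPSK, 1 for 16QAM; n_t_sys: systematic bits after stage 2. */
collect_layout_t harq_collect_layout(int ms, int num_codes, int n_t_sys)
{
    collect_layout_t m;
    m.n_row = ms ? 4 : 2;
    m.n_col = 480 * num_codes;
    m.n_r = n_t_sys / m.n_col;
    m.n_c = n_t_sys - m.n_r * m.n_col;
    return m;
}

/* Returns 1 if element (row, col), both counted from 1, holds a systematic
 * bit: the first n_c columns hold n_r + 1 systematic rows, the rest n_r. */
int harq_is_systematic(const collect_layout_t *m, int row, int col)
{
    int sys_rows = m->n_r + (col <= m->n_c ? 1 : 0);
    return row <= sys_rows;
}
```

A decoder that only needs the systematic bits can use such a predicate to decide which elements of each column to keep.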

Figure 2.19: Illustrative example of the HARQ bit collection matrix

The parity bits are then written into the remaining space, column by column with alternating order of parity 1 and parity 2 bits. See figure 2.19 for an illustrative example for the case $N_c \ne 0$. Since the matrix element count is always equal to the number of physical channel bits, there will not be any padding bits to take care of.

Finally, the bits are read out from the matrix row-by-row. The resulting bit sequence is denoted WR.

2.4.10.7 Physical channel segmentation

WR is divided into P segments of length U, where $U = R/P$. U is the number of bits in each physical channel, either 960 for QPSK or 1920 for 16QAM. The segments are ordered in sequence, with the first segment later mapped to the first physical channel, where the bits are denoted $u_{P,1}, u_{P,2}, \ldots, u_{P,U}$.

Bits on the P-th physical channel after segmentation:

$$u_{P,k} = w_{k+(P-1) \cdot U}, \quad k = 1, \ldots, U. \tag{2.26}$$

2.4.10.8 Interleaving

This stage is important to enhance the error correction properties. Consider the scenario of a 2 ms long TTI with 960 bits in sequence. If the bits were arranged as systematic bit, parity bit 1, parity bit 2, and so on, an interference peak of very short duration would easily destroy a number of bits together with their parity bits. This makes error correction difficult. Instead, the bits are spread around in the bit sequence, making it much less likely that the same interference peak destroys both a bit and its parity bits, hence error correction is made more robust.

The interleaver used here is the basic 2nd block interleaver described in section 2.3.14.11, using 32 rows times 30 columns (32 × 30). The interleaving is done in three steps as follows.

1. The input sequence is written row by row in chunks of 30 bits into the 32 × 30 matrix.

2. Then the columns are permuted according to a predefined permutation table.

3. Finally, the bits are read out column by column, each column from top to bottom.
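The three steps can be sketched in C as below, together with the inverse operation needed in a decoder. The sketch assumes that the output is read column by column (each column top-down) and takes the permutation table as a parameter, since the table itself is defined in section 2.3.14.11 and not repeated here. All names are illustrative.

```c
#include <stdint.h>

enum { IL_ROWS = 32, IL_COLS = 30, IL_SIZE = IL_ROWS * IL_COLS };

/* Block interleaver for IL_SIZE elements (one bit per byte).
 * perm[j] is the index of the original column that is read out as
 * output column j. The table is supplied by the caller. */
void block_interleave(const uint8_t *in, uint8_t *out, const int perm[IL_COLS])
{
    int n = 0;
    for (int j = 0; j < IL_COLS; j++)            /* permuted column order */
        for (int r = 0; r < IL_ROWS; r++)        /* top-down in the column */
            out[n++] = in[r * IL_COLS + perm[j]];   /* input was row by row */
}

/* Inverse of block_interleave for the same permutation table. */
void block_deinterleave(const uint8_t *in, uint8_t *out, const int perm[IL_COLS])
{
    int n = 0;
    for (int j = 0; j < IL_COLS; j++)
        for (int r = 0; r < IL_ROWS; r++)
            out[r * IL_COLS + perm[j]] = in[n++];
}
```

Because the address pattern depends only on the table, a real-time implementation can precompute it once, for instance as one start offset per column.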

For QPSK, one block interleaver is used and the input bits $u_{P,1}, \ldots, u_{P,U}$ are fed through to produce the output sequence $v_{P,1}, v_{P,2}, \ldots, v_{P,U}$, where P is the physical channel and U is the number of bits in each physical channel. For 16QAM, on the other hand, two identical block interleavers are used and the input bits $u_{P,1}, \ldots, u_{P,U}$ are separated 60 at a time, with the first two bits going to the upper interleaver and the next two bits to the lower one.

2.4.10.9 Constellation re-arrangement

There are four modulation constellations for 16QAM. The bits output from the interleaver are grouped in four-bit sequences so that $v_{p,k}, v_{p,k+1}, v_{p,k+2}, v_{p,k+3}$ are used, where $k \bmod 4 = 1$. For QPSK, this stage is transparent.

The following table shows the bit operations for each constellation version, as a function of the b parameter. An overbar denotes an inverted bit.

Figure 2.20: HS-PDSCH interleaver configuration

| b | Output | Operation |
|---|---|---|
| 0 | $v_{p,k}, v_{p,k+1}, v_{p,k+2}, v_{p,k+3}$ | None |
| 1 | $v_{p,k+2}, v_{p,k+3}, v_{p,k}, v_{p,k+1}$ | Swapping LSBs with MSBs |
| 2 | $v_{p,k}, v_{p,k+1}, \overline{v_{p,k+2}}, \overline{v_{p,k+3}}$ | Inversion of LSBs |
| 3 | $v_{p,k+2}, v_{p,k+3}, \overline{v_{p,k}}, \overline{v_{p,k+1}}$ | Both swapping and inversion |

Table 2.4: Constellation version table for 16QAM modulation
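The operations of table 2.4 can be illustrated with the following C sketch. It assumes that "inversion of LSBs" means logical negation of the last two bits of each four-bit group after any swap, and that each bit is stored in its own byte; the function name is hypothetical.

```c
#include <stdint.h>

/* Apply constellation version b (0..3) in place to one group of four bits
 * v[0..3] = v_k, v_k+1, v_k+2, v_k+3 (one bit per byte).
 * Bit 0 of b selects swapping, bit 1 selects inversion of the last two
 * output bits. Illustrative sketch, not taken from the implementation. */
void constellation_rearrange(uint8_t v[4], int b)
{
    if (b & 1) {                     /* swap MSB pair with LSB pair */
        uint8_t t0 = v[0], t1 = v[1];
        v[0] = v[2];
        v[1] = v[3];
        v[2] = t0;
        v[3] = t1;
    }
    if (b & 2) {                     /* invert the two LSBs of the output */
        v[2] ^= 1;
        v[3] ^= 1;
    }
}
```

Note that for b = 3 the operation is not its own inverse, so a receiver has to undo the inversion first and the swap second.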

2.4.11 HS-SCCH

The control information required to operate HS-DSCH is carried on a new separate control

channel called the High-Speed Shared Control Channel (HS-SCCH). This is a downlink control

channel, providing the UE with information such as channelization code set, modulation scheme,

transport block size and HARQ parameters. The length of an HS-SCCH subframe is the same

as for the HS-DSCH, that is, three time slots.

The SF is constant at 128, so the number of bits in one subframe becomes (7680/128) · 2 = 120 bits, i.e. one slot is 40 bits long.

Figure 2.21: HS-SCCH information bits

The link information provided in HS-SCCH is shown in figure 2.21, and explained below. Part

1 consists of the so called Transport-Format and Resource-related Information bits (8 bits), and

Part 2 consists of the transport block size and HARQ parameters.


The Channelization Code Set specifies which of the codes in the HS-DSCH code tree are to be used when decoding the HS-DSCH (see section 2.4.12.1 below). The Modulation Scheme bit selects which modulation scheme to use,

0 - QPSK

1 - 16QAM.

There are in total 254 different transport block sizes defined for HS-DSCH (see [5] Annex A for a complete listing of TBSs for FDD). For each combination of modulation scheme and number of codes, a subset of 63 of these block sizes is available, and the Transport Block Size field specifies which one of them this transmission uses. This parameter, together with the modulation scheme and the number of codes, thus defines the transport block size for this particular transmission. These three fields inform the UE how to decode the HS-DSCH, and must therefore be descrambled and decoded before the start of the HS-DSCH subframe. To ensure that the UE has time to finish decoding the control information, there is a delay of one time slot (667 µs) after Part 1 before the HS-DSCH is sent.

The RV parameters s, r and b are described in the HS-DSCH section above.

The new data indicator NDI bit is used to indicate if this transmission is the original trans-

mission, or a retransmission.

HS-SCCH is actually a physical channel. The control information is mapped directly onto

HS-SCCH from the MAC-hs (see section 2.4.6).

2.4.11.1 Timing in relation to HS-DSCH

The modulation scheme and spreading code must be known, that is, decoded and ready for use, when the first HS-DSCH subframe arrives. Therefore, the Part 1 information is coded and scrambled with a UE-specific code, and sent two time slots before the beginning of the first HS-DSCH subframe. In other words, the UE has one time slot to decode the Part 1 information before decoding the first data frame. See figure 2.22.

Figure 2.22: HS-SCCH timing in relation to HS-DSCH

2.4.11.2 UE decoding behavior

Each UE connected to the cell (and using HSDPA) must decode Part 1 of all HS-SCCH channels in each TTI. In the Release 6 specifications the number of channels is limited to four per TTI. Since Part 1 is scrambled with the UE ID of the intended receiver, as we shall see further on, the UE can detect whether the subsequent HS-DSCH slots as well as the remaining two HS-SCCH slots need to be decoded. As mentioned above, there is a decoding margin of one slot between the first HS-SCCH slot (Part 1) and the first HS-DSCH slot, i.e. the UE has 667 µs to decode it. This implies that Part 1 and Part 2 are decoded individually.

2.4.12 Coding of HS-SCCH

2.4.12.1 Coding of channelization code set, transport block size and HARQ

Channelization code set parameters Given O and P, the offset and number of HS-DSCH codes, the channelization code set is mapped to the 7 bits $x_{ccs,1}, x_{ccs,2}, \ldots, x_{ccs,7}$ according to the following equations.

The first three bits are the code group indicator bits, of which $x_{ccs,1}$ is the MSB,

$$x_{ccs,1}, x_{ccs,2}, x_{ccs,3} = \min(P - 1, 15 - P)$$

and the last four bits are the code offset indicator bits, of which $x_{ccs,4}$ is the MSB,

$$x_{ccs,4}, x_{ccs,5}, x_{ccs,6}, x_{ccs,7} = \left| O - 1 - \left\lfloor \frac{P}{8} \right\rfloor \cdot 15 \right|.$$

O begins at 1, because the code O = 0 is reserved for higher spreading factors; thus 15 codes are available, as mentioned before. This coding frees the last bit of the Part 1 byte, which can then be used for modulation scheme mapping (see next section). Without this coding, the offset O = 1..15 and the number of codes P = 1..15 would each have occupied 4 bits.
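The mapping can be sketched in C as follows. The encoder follows the two equations directly. The decoder is our own derivation, not taken from the specification: for a given code group value g, the offsets produced by P = g + 1 and by P = 15 − g fall in disjoint ranges, which makes the mapping invertible for all valid combinations (O + P − 1 ≤ 15). All names are illustrative.

```c
/* Code group (3 bits) and code offset (4 bits) indicators. */
typedef struct {
    int group;
    int offset;
} ccs_t;

/* Encode P codes starting at offset O (1 <= P, O and O + P - 1 <= 15). */
ccs_t ccs_encode(int P, int O)
{
    ccs_t x;
    int a = P - 1, b = 15 - P;
    int d = O - 1 - (P / 8) * 15;      /* integer division = floor for P > 0 */
    x.group = a < b ? a : b;
    x.offset = d < 0 ? -d : d;
    return x;
}

/* Inverse mapping (derived for this sketch, see the text above). */
void ccs_decode(ccs_t x, int *P, int *O)
{
    if (x.offset <= 14 - x.group) {    /* case P <= 7 */
        *P = x.group + 1;
        *O = x.offset + 1;
    } else {                           /* case P >= 8 */
        *P = 15 - x.group;
        *O = 16 - x.offset;
    }
}
```

For example, P = 5 codes starting at O = 3 give the code group 4 and the code offset 2.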

Transport block size parameter The transport block size of the HS-DSCH depends on three variables: the transport block size index parameter $k_i$, the modulation scheme and the number of codes. The specification only defines the decoding of the index parameter, i.e. the calculation of the transport block size. Therefore, we first introduce the calculation of TBS and then the coding of TBS into $k_i$.

The calculation of TBS is specified in [5].

$$TBS = L(k_t), \tag{2.27}$$

where

$$k_t = k_i + k_{0,i}. \tag{2.28}$$

Table 2.5 specifies $k_{0,i}$ for the given modulation scheme and number of codes. TBS is then given by

If $k_t < 40$

$$L(k_t) = 125 + 12 \cdot k_t \tag{2.29}$$

else

$$L(k_t) = \left\lfloor L_{min} \cdot p^{k_t} \right\rfloor, \tag{2.30}$$

where $p = 2085/2048$ and $L_{min} = 296$.
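Equations 2.29 and 2.30 translate into the small C function below. It is a sketch under the assumption that double precision is accurate enough for the power in equation 2.30; a production implementation would more likely use the tabulated values from [5]. The function name is our own.

```c
/* Transport block size L(k_t) according to equations 2.29 and 2.30.
 * The power p^k_t is accumulated in double precision by repeated
 * multiplication, which is an implementation choice of this sketch. */
long hs_tbs_from_kt(int kt)
{
    if (kt < 40)
        return 125 + 12L * kt;               /* equation 2.29 */

    double l = 296.0;                        /* L_min */
    for (int n = 0; n < kt; n++)
        l *= 2085.0 / 2048.0;                /* p */
    return (long)l;                          /* floor for positive values */
}
```

For example, $k_t = 39$ gives 593 bits, and the next index, $k_t = 40$, gives 605 bits.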


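| Combination i | Modulation scheme | Number of codes | $k_{0,i}$ |
|---|---|---|---|
| 0 | QPSK | 1 | 1 |
| 1 | QPSK | 2 | 40 |
| 2 | QPSK | 3 | 63 |
| 3 | QPSK | 4 | 79 |
| 4 | QPSK | 5 | 92 |
| 5 | QPSK | 6 | 102 |
| 6 | QPSK | 7 | 111 |
| 7 | QPSK | 8 | 118 |
| 8 | QPSK | 9 | 125 |
| 9 | QPSK | 10 | 131 |
| 10 | QPSK | 11 | 136 |
| 11 | QPSK | 12 | 141 |
| 12 | QPSK | 13 | 145 |
| 13 | QPSK | 14 | 150 |
| 14 | QPSK | 15 | 153 |
| 15 | 16QAM | 1 | 40 |
| 16 | 16QAM | 2 | 79 |
| 17 | 16QAM | 3 | 102 |
| 18 | 16QAM | 4 | 118 |
| 19 | 16QAM | 5 | 131 |
| 20 | 16QAM | 6 | 141 |
| 21 | 16QAM | 7 | 150 |
| 22 | 16QAM | 8 | 157 |
| 23 | 16QAM | 9 | 164 |
| 24 | 16QAM | 10 | 169 |
| 25 | 16QAM | 11 | 175 |
| 26 | 16QAM | 12 | 180 |
| 27 | 16QAM | 13 | 184 |
| 28 | 16QAM | 14 | 188 |
| 29 | 16QAM | 15 | 192 |

Table 2.5: $k_{0,i}$ for different modulation schemes and number of codes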

The inverse calculation when coding $k_i$ then becomes the following. Because we already know which modulation scheme and how many codes to use, this is done by first checking whether the first case ($k_t < 40$) or the second one ($k_t \ge 40$) applies.

First, assume that $k_t < 40$. This gives

$$k_{0,t} = \frac{L(k_t) - 125}{12}. \tag{2.31}$$

Now, if $k_{0,t} \ge 40$, our assumption was not right and we get

$$k_t = \left\lceil \frac{\lg \frac{L(k_t)}{296}}{\lg \frac{2085}{2048}} \right\rceil. \tag{2.32}$$

Otherwise, we get

$$k_t = \lceil k_{0,t} \rceil. \tag{2.33}$$

Finally, equation 2.34 gives us the $k_i$ to be coded into Part 2.

$$k_i = k_t - k_{0,i}. \tag{2.34}$$

Also note that $k_i$ is limited to the range [0..62], i.e. there are 63 different TBSs for each combination of modulation scheme and number of codes. This gives 63 · 2 · 15 = 1890 different (valid) combinations.

HARQ RV parameters The three parameters s, r and b may come in any of the eight possible redundancy versions listed in table 2.6. b only affects 16QAM transmissions and is not sent when using QPSK. Instead, there are two more possibilities for r. The coded value $X_{rv}$ in table 2.6 is mapped to the three bits $x_{rv,1}, x_{rv,2}, x_{rv,3}$, where $x_{rv,1}$ is the MSB.

| $X_{rv}$ | QPSK s | QPSK r | 16QAM s | 16QAM r | 16QAM b |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 3 | 0 | 1 | 0 | 1 | 1 |
| 4 | 1 | 2 | 1 | 0 | 1 |
| 5 | 0 | 2 | 1 | 0 | 2 |
| 6 | 1 | 3 | 1 | 0 | 3 |
| 7 | 0 | 3 | 1 | 1 | 0 |

Table 2.6: RV coding for QPSK and 16QAM
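In a decoder, table 2.6 is naturally implemented as a lookup. A minimal sketch, with illustrative type and function names, could be:

```c
/* Redundancy version parameters (b is unused, and set to 0, for QPSK). */
typedef struct {
    int s, r, b;
} rv_params_t;

/* Look up s, r and b from the 3-bit value x_rv according to table 2.6.
 * ms is 0 for QPSK and 1 for 16QAM. */
rv_params_t hs_rv_decode(int x_rv, int ms)
{
    static const rv_params_t qpsk[8] = {
        {1, 0, 0}, {0, 0, 0}, {1, 1, 0}, {0, 1, 0},
        {1, 2, 0}, {0, 2, 0}, {1, 3, 0}, {0, 3, 0}
    };
    static const rv_params_t qam16[8] = {
        {1, 0, 0}, {0, 0, 0}, {1, 1, 1}, {0, 1, 1},
        {1, 0, 1}, {1, 0, 2}, {1, 0, 3}, {1, 1, 0}
    };
    return ms ? qam16[x_rv & 7] : qpsk[x_rv & 7];
}
```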

2.4.12.2 Coding chain and information mapping for HS-SCCH

The sub-blocks of HS-SCCH are coded as described in this section. First, we denote Part 1

information bits x1,i, i = 1..8, and the Part 2 information bits x2,i, i = 1..13. Part 1 and Part 2

are coded and rate matched in order to fit into one subframe, one slot for Part 1 and two slots

for Part 2. The coding chain is illustrated as a flow diagram in figure 2.23.

Concatenation of HS-SCCH information The channelization code set is mapped to the first 7 bits $x_{ccs,1}, x_{ccs,2}, \ldots, x_{ccs,7}$ as mentioned in the previous section 2.4.12.1. The modulation scheme is mapped as 0 for QPSK and 1 for 16QAM, to the last bit $x_{ms,1}$, forming

$$\begin{aligned} x_{1,i} &= x_{ccs,i}, & i &= 1, 2, \ldots, 7 \\ x_{1,i} &= x_{ms,i-7}, & i &= 8 \end{aligned} \tag{2.35}$$


Figure 2.23: HS-SCCH coding chain


For Part 2, the bit mapping is as follows. Transport block size identifier bits are denoted $x_{tbs,1}, x_{tbs,2}, \ldots, x_{tbs,6}$, HARQ information bits are denoted $x_{hap,1}, x_{hap,2}, x_{hap,3}$, RV information bits are denoted $x_{rv,1}, x_{rv,2}, x_{rv,3}$ and the new data indicator bit is denoted $x_{nd,1}$.

The resulting bit sequence $x_{2,1}, x_{2,2}, \ldots, x_{2,13}$ is

$$\begin{aligned} x_{2,i} &= x_{tbs,i}, & i &= 1, 2, \ldots, 6 \\ x_{2,i} &= x_{hap,i-6}, & i &= 7, 8, 9 \\ x_{2,i} &= x_{rv,i-9}, & i &= 10, 11, 12 \\ x_{2,i} &= x_{nd,i-12}, & i &= 13 \end{aligned} \tag{2.36}$$

CRC attachment The 16 CRC bits are calculated from the concatenated bit sequence $x_{1,1}, \ldots, x_{1,8}, x_{2,1}, \ldots, x_{2,13}$, according to 2.3.14.1, yielding the sequence $c_k$, k = 1..16, in equation 2.37.

$$c_k = p_{im(17-k)} \tag{2.37}$$

To form the UE-specific CRC used at the receiver to identify the subframe, the CRC is masked with the 16 UE ID bits $x_{ue,1}, \ldots, x_{ue,16}$, where $x_{ue,1}$ is the LSB, according to equation 2.38. See figure 2.24.

$$c_{ue,k} = (c_k + x_{ue,k}) \bmod 2, \quad k = 1, \ldots, 16 \tag{2.38}$$

Figure 2.24: UE specific CRC calculation

Finally, this is appended to Part 2 to form $y_1, \ldots, y_{29} = x_{2,1}, \ldots, x_{2,13}, c_{ue,1}, \ldots, c_{ue,16}$.

Coding of Part 1 Part 1 information bits are encoded using 1/3-rate convolutional coding as described in section A.1, which produces 3 · 8 + 24 = 48 bits, denoted $z_{1,i}$, i = 1..48. These bits are then punctured by omitting the bits $z_{1,1}, z_{1,2}, z_{1,4}, z_{1,8}, z_{1,42}, z_{1,45}, z_{1,47}, z_{1,48}$, to form the 40-bit coded and rate matched output sequence $r_{1,i}$, i = 1..40.

The 16-bit UE ID is also coded with a convolutional code, but with code rate 1/2, producing 2 · 16 + 16 = 48 bits. The coded UE ID bits are denoted $z_{2,i}$, i = 1..48, and are also punctured to obtain 40 bits by omitting the same bits as in the Part 1 rate matching described above. This yields the sequence $c_{ue,i}$, i = 1..40.

Finally, the coded Part 1 is scrambled with the coded UE ID, to form the 40-bit UE-specific sequence

$$s_{1,k} = (r_{1,k} + c_{ue,k}) \bmod 2, \quad k = 1, \ldots, 40 \tag{2.39}$$

Coding of Part 2 $y_1, \ldots, y_{29}$ is 1/3-rate coded using convolutional coding to obtain the 3 · 29 + 24 = 111 bits long output sequence $z_{2,i}$, i = 1..111. This sequence is then punctured by omitting the bits $z_{2,i}$ with the following indices i: 1, 2, 3, 4, 5, 6, 7, 8, 12, 14, 15, 24, 42, 48, 54, 57, 60, 66, 69, 96, 99, 101, 102, 104, 105, 106, 107, 108, 109, 110, 111.

The result is the 80 bits $r_{2,i}$, i = 1..80.

2.4.12.3 Physical channel mapping

Part 1 bits $s_{1,k}$, k = 1..40, and Part 2 bits $r_{2,i}$, i = 1..80, are finally concatenated. This means that Part 1 will fit into the first slot of the HS-SCCH subframe, and Part 2 will fit into the last two slots.

2.4.13 HS-DPCCH

As for the downlink signaling on the HS-SCCH, there is a need for an uplink signaling channel as well. The physical uplink channel HS-DPCCH serves as a feedback channel for HARQ ACKs and Channel Quality Indication (CQI) information indicating the quality of the air interface to this particular user.

HS-DPCCH is code multiplexed with the uplink DPCH, and the SF is constant at 256, which means that each time slot contains 10 bits. The 5-bit CQI information is block-coded into 20 bits using a (20, 5) coder, and the HARQ ACK bit is repetition encoded to 10 bits. The whole HS-DPCCH subframe is 3 time slots long, i.e. one TTI, and is illustrated in figure 2.25.

Figure 2.25: HS-DPCCH information bits

CHAPTER 3

Test environment

3.1 TXAD

3.1.1 General overview

TXAD (TX Adapter board) is used to verify the downlink functionality.

As this is Ericsson proprietary, it will not be described in detail; only what is relevant for understanding our implementation is discussed.

It has one FPGA, called BULL, that handles stimuli to the TX board. Another FPGA, BILL, handles recording of baseband data.

3.1.2 DSP

TXAD is also fitted with a DSP. The DSP used is a TMS320C6416T from Texas Instruments,

running at 850 MHz. This is a very powerful Fixed-Point DSP with Viterbi and Turbo decoder

coprocessors.

The DSP is equipped with two external memory interfaces (EMIF), EMIFA and EMIFB.

EMIFA is used to access the SDRAM and EMIFB is connected to BILL.

The operating system on the DSP is a lightweight version of OSE called OSEck, OSE Compact

Kernel, developed by ENEA. The DSP also contains software to handle error tracing among

other things.

3.2 Test execution

Test cases are made by using SPECMAN, and the test environment is a base station which

communicates with a TXAD board.

A tool called ATENG is used to execute the test case.

In general a test case will send stimuli to the downlink functionality of the base station through the TXAD board, which will record baseband data. This output will then be sent to a reference model for comparison.

CHAPTER 4

Method

4.1 Decoding

The 3GPP specifications define how data should be encoded, but say nothing about decoding. Some of the steps are easily reversible, while others take some more effort.

4.1.1 Descrambling

The input to this stage is the baseband IQ-data.

The scrambling is done by complex multiplication, so the descrambling should be done by complex division. The division of two complex numbers can be written:

$$\frac{A}{C} = \frac{a + bj}{c + dj} = \left( \frac{ac + bd}{c^2 + d^2} \right) + \left( \frac{bc - ad}{c^2 + d^2} \right) j \tag{4.1}$$

Both the real and the imaginary part are scaled by $(c^2 + d^2)$, which is just a constant. Let A be the received sequence and C the scrambling code. The components of C are always −1 or 1, which means that $(c^2 + d^2)$ always evaluates to 2. To get the original sequence the computed result should therefore be divided by 2, or equivalently shifted right by one bit. This is not done in the current implementation, but this does not matter since we only care about relative amplitudes, not absolute ones, and the result fits in 16 bits if the input is sufficiently small (smaller than $2^{15}$). The calculations done are

$$\begin{aligned} I_{out} &= I_{in} \cdot I_{code} + Q_{in} \cdot Q_{code} \\ Q_{out} &= Q_{in} \cdot I_{code} - I_{in} \cdot Q_{code} \end{aligned} \tag{4.2}$$
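Equation 4.2 corresponds to the following loop. This is an illustrative sketch with hypothetical names and buffer layout, not the code that runs on the target; the scrambling code is assumed to be stored as sequences of −1 and 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Descramble n complex samples according to equation 4.2.
 * i_code/q_code hold the scrambling code as -1 or +1 per chip.
 * As in the text, the result is not divided by 2, so each output is twice
 * the original value; the caller must make sure this fits in 16 bits. */
void descramble_iq(const int16_t *i_in, const int16_t *q_in,
                   const int8_t *i_code, const int8_t *q_code,
                   int16_t *i_out, int16_t *q_out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        int32_t i = (int32_t)i_in[k] * i_code[k] + (int32_t)q_in[k] * q_code[k];
        int32_t q = (int32_t)q_in[k] * i_code[k] - (int32_t)i_in[k] * q_code[k];
        i_out[k] = (int16_t)i;
        q_out[k] = (int16_t)q;
    }
}
```

As a check, the symbol 3 + 4j scrambled with the code 1 − j becomes 7 + j, and descrambling returns 6 + 8j, i.e. the original symbol scaled by 2.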

In the simulation this is done in software on the DSP. The scrambling code is then precalculated and stored in the DSP memory before the decoder starts.

In the real program this is done in hardware, on the FPGA, because this was already implemented. The scrambling code is then generated by the shift registers described in section 2.3.3.1.

The output from this stage is the descrambled IQ-data, in 16-bit format.

4.1.2 HS-SCCH

4.1.2.1 Dechannelization

As already shown in 2.3.2 the dechannelization is done by integrating (summing) the received values. According to 2.4.11 the spreading factor of HS-SCCH is 128. As the output should eventually go to the Viterbi decoder it is convenient to stay in the BPSK domain (1, −1 for the binary values 0, 1). The following is therefore done for both I and Q:

1. Multiply 128 consecutive values by the channelization code and sum the products to get S

2. If abs(S) is less than 128, abort: the dechannelization failed

3. If S > 0 the output is 1, else −1

The output from this stage is an 8-bit signed sequence of −1 and 1.
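The three steps can be sketched for one branch (I or Q) as follows. The sketch generalizes the failure threshold of step 2 to the spreading factor, which is an assumption that matches the value 128 for HS-SCCH; names are illustrative.

```c
#include <stdint.h>

/* Dechannelize one symbol of one branch.
 * chips: sf consecutive descrambled values
 * code:  channelization code as -1 or +1 per chip
 * out:   receives +1 or -1 (BPSK domain)
 * Returns 0 on success and -1 if the magnitude of the sum is below sf,
 * i.e. the dechannelization failed (threshold = sf is an assumption). */
int dechannelize_bpsk(const int16_t *chips, const int8_t *code, int sf,
                      int8_t *out)
{
    int32_t sum = 0;
    for (int k = 0; k < sf; k++)
        sum += (int32_t)chips[k] * code[k];      /* step 1 */

    int32_t mag = sum < 0 ? -sum : sum;
    if (mag < sf)                                /* step 2 */
        return -1;

    *out = sum > 0 ? 1 : -1;                     /* step 3 */
    return 0;
}
```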

4.1.2.2 Unmasking and depuncturing

Part-1 data is masked with the UE ID specific mask created by 1/2-rate convolutional coding and puncturing. The unmasking is done by simply multiplying, as multiplication in the BPSK domain is the same as XOR.

Part-1 and part-2 data have been punctured in the encoder as described in section 2.4.12.2. The depuncturing is done by inserting zeros in place of the punctured bits. A zero lies midway between −1 and 1 and therefore does not bias the decoder towards either bit value.
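For Part 1, unmasking and depuncturing can be combined in one pass, as in the sketch below. It uses the puncturing positions listed in section 2.4.12.2 and assumes that both the received data and the mask are given in the BPSK domain (−1 or 1). The function name is hypothetical.

```c
#include <stdint.h>

/* 1-based positions of the punctured Part 1 bits (section 2.4.12.2). */
static const int part1_punct[8] = { 1, 2, 4, 8, 42, 45, 47, 48 };

/* Unmask 40 received Part 1 values and expand them to 48 values, with a
 * zero inserted at every punctured position.
 * rx and mask hold -1 or +1; the mask is the coded, punctured UE ID. */
void part1_unmask_depuncture(const int8_t rx[40], const int8_t mask[40],
                             int8_t out[48])
{
    int p = 0;      /* next entry in the puncturing list */
    int n = 0;      /* next received value */

    for (int i = 1; i <= 48; i++) {
        if (p < 8 && part1_punct[p] == i) {
            out[i - 1] = 0;                      /* erased position */
            p++;
        } else {
            out[i - 1] = (int8_t)(rx[n] * mask[n]);  /* BPSK multiply = XOR */
            n++;
        }
    }
}
```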

4.1.2.3 Convolutional decoder

To decode the encoded data the Viterbi Co-Processor (VCP) on the DSP is used. See section

A.3.4 for a brief introduction to how it works.

The input to the Viterbi decoder coprocessor consists of calculated branch metrics. According to the Texas Instruments reference guide [8] these are calculated as follows.

For rate 1/2, 2 branch metrics per symbol period need to be calculated:

$$\begin{aligned} BM_0(t) &= r_0(t) + r_1(t) \\ BM_1(t) &= r_0(t) - r_1(t) \end{aligned} \tag{4.3}$$

For rate 1/3, 4 branch metrics per symbol period need to be calculated:

$$\begin{aligned} BM_0(t) &= r_0(t) + r_1(t) + r_2(t) \\ BM_1(t) &= r_0(t) + r_1(t) - r_2(t) \\ BM_2(t) &= r_0(t) - r_1(t) + r_2(t) \\ BM_3(t) &= r_0(t) - r_1(t) - r_2(t) \end{aligned} \tag{4.4}$$

where r(t) is the received codeword, and $r_0(t)$ is the upper branch from the encoder.

The VCP is configured to decode with the parameters defined in section A.2.1 for the rate 1/3 code. Equation 4.4 applies, and the branch metrics are calculated for part-1 and part-2 respectively. The VCP is set up to receive these with DMA and triggered to start.
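Equation 4.4 can be computed as in the sketch below. The output layout (four consecutive metrics per symbol period) is only an assumption for illustration; the packing actually required by the VCP is defined in the reference guide [8].

```c
#include <stdint.h>

/* Compute the four rate 1/3 branch metrics of equation 4.4 for n_sym
 * symbol periods. r holds the depunctured soft values as r0, r1, r2 per
 * symbol period; bm receives BM0..BM3 per symbol period.
 * The memory layout of bm is a hypothetical choice for this sketch. */
void branch_metrics_rate13(const int8_t *r, int n_sym, int16_t *bm)
{
    for (int t = 0; t < n_sym; t++) {
        int16_t r0 = r[3 * t], r1 = r[3 * t + 1], r2 = r[3 * t + 2];
        bm[4 * t + 0] = (int16_t)(r0 + r1 + r2);
        bm[4 * t + 1] = (int16_t)(r0 + r1 - r2);
        bm[4 * t + 2] = (int16_t)(r0 - r1 + r2);
        bm[4 * t + 3] = (int16_t)(r0 - r1 - r2);
    }
}
```

Positions that were depunctured with zeros simply contribute nothing to the metrics.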


4.1.2.4 Parameter extraction

Upon VCP completion the decoded outputs of part-1 and part-2 are available. As hard decisions

are used in the decoder these are now bits. The sent data is extracted and the parameters are

calculated as defined in 2.4.12.2.

4.1.2.5 CRC calculation

The CRC calculation is done by table lookup for speed. It is a 16-bit CRC and the polynomial

is defined in section 2.3.14.1. The result is bit reflected.
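A table-driven CRC of this kind can be sketched as follows. The sketch assumes the 16-bit generator polynomial $D^{16} + D^{12} + D^5 + 1$ (0x1021) from the 3GPP specification, an all-zero initial value and input that is a whole number of bytes. The real HS-SCCH input is 21 bits long and the result is bit reflected, neither of which is handled here; function names are our own.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc16_table[256];

/* Build the lookup table for polynomial 0x1021 (MSB first). */
void crc16_init(void)
{
    for (int b = 0; b < 256; b++) {
        uint16_t crc = (uint16_t)(b << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        crc16_table[b] = crc;
    }
}

/* CRC over len bytes, one table lookup per byte, initial value 0.
 * crc16_init() must have been called first. */
uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ crc16_table[(crc >> 8) ^ data[i]]);
    return crc;
}
```

The table costs 512 bytes of memory but replaces eight shift-and-XOR steps per byte with a single lookup.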

4.1.3 HS-DSCH

To lighten the load on the DSP, Turbo decoding is not performed. Instead, the systematic bits are extracted directly from the coded data. This implies that puncturing of systematic bits is not supported.

4.1.3.1 Dechannelization and demodulation

The difference between dechannelization of HS-DSCH and HS-SCCH is that in this case no

actual decoding is done, so the output will be 0 and 1 instead of BPSK. The DSP is better

suited for working with bytes than with bits though, so the result is still saved in 8-bit sequence.

QPSK This is exactly the same as for HS-SCCH with the differences that the spreading factor

is 16 and the output is 0 and 1 (implemented by shifting down the sign bit).

16QAM According to section 2.3.15.1 we can expect four different amplitudes for 16QAM, where two are the negation of the other two. The actual values are not important, so we just start integrating and search for two different amplitudes. When they are found we start over again calculating the values $S_i$, this time comparing the calculated amplitudes with the known ones. Looking at table A.1 one can see that for each branch, I and Q, there are two bits. The first bit is determined by the sign of $S_i$: a positive value gives 0, otherwise 1. The second bit is determined by the amplitude: if abs($S_i$) is the lower amplitude we get 0, otherwise 1. This allows for an efficient implementation where we handle one 32-bit word at a time:

word = (sumI<0) << 24; // sumI = I branch, sumQ = Q branch

word |= (sumQ<0) << 16;

word |= (abs(sumI) == ampH) << 8; // ampH = the higher amplitude

word |= (abs(sumQ) == ampH);

4.1.3.2 Deinterleaver

The interleaver is described in section 2.4.10.8. To reverse it we move the columns back and read out the symbols. This is done in software by using a lookup table for the start of every column in the sequence (30 values for QPSK and 60 values for 16QAM), and using an offset from this.

In the 16QAM case the output order of the two interleavers can be swapped, which is accounted for by changing offsets. The output of the lower interleaver can also be inverted, in which case the deinterleaver inverts it back. Whether these operations are performed is decided by the RV parameters.

4.1.3.3 Desegmentation

All the channels are already saved in order in the same array so nothing needs to be done here.

4.1.3.4 HARQ Bit Collection

The way the systematic and parity bits are ordered is described in section 2.4.10.5. As no turbo decoding is done, only the systematic bits need to be extracted. This is done in-place by using the unaligned memory access capability of the DSP. The data is ordered column by column in sequence, and each bit is represented by 8 bits. Unaligned access is used to continuously overwrite the parity bits of the previous column with the systematic bits of the next column. Doing it in-place also means that for columns completely filled with systematic bits no processing needs to be done, thus saving time.
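A portable sketch of the in-place extraction is given below. It assumes a bit collection layout in which every column starts with its systematic bits and the first `n_sys % n_col` columns hold one systematic bit more than the rest; the names `n_row`, `n_col` and `n_sys` are illustrative, and `memmove` stands in for the unaligned word accesses used on the DSP.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compact the systematic bits to the front of buf, in place.
 * buf holds n_row * n_col values (one byte per bit), column by column.
 * Returns the number of systematic bits kept (equal to n_sys). */
size_t extract_systematic(uint8_t *buf, size_t n_row, size_t n_col,
                          size_t n_sys)
{
    size_t full_rows = n_sys / n_col;   /* rows holding only systematic bits */
    size_t extra_cols = n_sys % n_col;  /* columns with one more systematic bit */
    size_t w = 0;                       /* write position */

    for (size_t col = 0; col < n_col; col++) {
        size_t take = full_rows + (col < extra_cols ? 1u : 0u);
        /* Overwrites the parity bits of earlier columns; when every
         * earlier column was all systematic, source and destination
         * coincide and no data actually moves. */
        memmove(buf + w, buf + col * n_row, take);
        w += take;
    }
    return w;
}
```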

4.1.3.5 Rate matching

As no turbo decoding is done, puncturing of data is not supported. However, for some transport blocks repetition is performed, so this is implemented. Because repetition is presumed to be unusual, the implementation is not very efficient: it basically just uses the algorithm defined in section A.3.3 and saves the bits to take away in an array.

This is also the place where the tail bits are removed from the sequence and the 8-bit sequence

is converted to a bit sequence.

4.1.3.6 Bit descrambling

The descrambling is done exactly as scrambling, by XORing the data with the code. The

code, as described in section 2.4.10.2 is precalculated for the largest possible transport block

for efficiency.
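Because XOR is its own inverse, the same routine serves for scrambling and descrambling. A minimal sketch, assuming the precalculated code is available as an array of the same layout as the data (the function name and parameters are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* XOR the data with the precalculated scrambling code, in place.
 * Works the same for packed bits and for one bit per byte, as long as
 * data and code use the same layout. */
void descramble_bits(uint8_t *data, const uint8_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] ^= code[i];
}
```

Applying the function twice with the same code returns the original data.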

4.1.3.7 Header and CRC extraction

The header is defined in section 2.4.6. The CRC is the last 24 bits.

4.1.3.8 CRC calculation

As for HS-SCCH, table lookups are used for CRC calculation. The HS-DSCH CRC is 24 bits

and the polynomial is defined in section 2.3.14.1.
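A table-driven 24-bit CRC can be sketched as below. The sketch assumes the 3GPP polynomial gCRC24(D) = D^24 + D^23 + D^6 + D^5 + D + 1, byte-aligned input processed most significant bit first, and an all-zero initial register. The function names are illustrative, and the bit order in which the parity bits are attached to the transport block is not handled here.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC24_POLY 0x800063u   /* gCRC24 without the D^24 term (assumed) */

uint32_t crc24_table[256];

/* Precompute the CRC of every possible byte value. */
void crc24_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i << 16;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x800000u) ? ((crc << 1) ^ CRC24_POLY) : (crc << 1);
        crc24_table[i] = crc & 0xFFFFFFu;
    }
}

/* One table lookup per input byte instead of eight shift/XOR steps. */
uint32_t crc24(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = ((crc << 8) ^ crc24_table[((crc >> 16) ^ data[i]) & 0xFFu])
              & 0xFFFFFFu;
    return crc;
}
```

With this convention, running the CRC over a message followed by its own 24 parity bits gives zero, which is a convenient self-check.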

4.2 Matlab Encoder

To aid the development of the DSP decoder, we have also implemented a basic HSDPA encoder

in Matlab. This enabled debugging of each of the decoding stages, which would otherwise have

been very time consuming. Some of the decoding functions were first implemented in Matlab

as well, to verify correctness before implementing on target.

The coding of HS-SCCH is complete, whilst the coding of HS-DSCH is only partial. Punc-

turing of systematic bits is not supported since the decoder does not have this functionality

either, and the actual payload data is limited to the same data for each transport block.

The output is the I/Q data formatted to a suitable format for the DSP simulator.

4.3 Development tools

Texas Instruments Code Composer Studio 3.3 has been used to code, simulate and emulate the target DSP TMS320C6416T. For on-chip debugging a Blackhawk 560 JTAG emulator has been used. Matlab with the Communications Toolbox has been used for the encoder.

4.4 Other tools

Some small POSIX C tools have been written to extract data from various log files to be able

to verify the decoding.

binparser Convert baseband I/Q data to a suitable format for the DSP simulator

readhslog Extract MAC-hs PDUs from a log file to be able to compare bit-by-bit with decoder output

paramext Extract the parameters necessary for decoding (scrambling code, users etc.) from a log file

4.5 Software

This section provides an overview of how the software works. The decoding has been described

in section 4.1, so this text will focus on how the software is set up, how users are handled and

where the actual checks are made.

4.5.1 Flowchart

A general overview of the software can be seen in figure B.1 in appendix B. Data flow is

represented by the broader arrows coming from left to right, starting with the descrambling

that is done in BILL. The upper branch represents HS-SCCH decoding and the lower HS-

DSCH decoding. Data input and output to and from each stage is represented by the small

arrows with corresponding labels. Program flow is represented by the decision boxes and arrows

connecting them. The small thick arrows are for illustrative purposes and should be considered

as semaphores that trigger the start of data flow.

On each subframe interrupt the users will be iterated and decoded, as can be seen in the

upper left corner. After everything has been decoded, all checks have been made and reporting

is done the software sits and waits for the next subframe interrupt.


4.5.2 Setup

4.5.2.1 Cell

Some cell-specific parameters need to be provided to the program. These are the scrambling

code used and a parameter called tCell, which gives a cell-specific chip offset of tCell · 256 chips.

These parameters are written to BILL.

4.5.2.2 Users

At least one user needs to be defined; the upper limit is set by the available amount of dynamic

memory. Each user has a specific UEID, a category and 1 to 4 HS-SCCH codes. When a new

user is created the following steps will be done:

1. Check for available memory, abort if none available.

2. Check if a valid category has been entered (1-12), if not abort.

3. Create the UE mask for HS-SCCH decoding.

4. Create the HS-SCCH codes assigned to the user. If a code has already been created for another

user, just provide a reference to it.

5. Add user to the inactive list.

The only category-specific information currently used for any purpose is the inter-TTI arrival

time.

4.5.2.3 Run request

The program is started by sending a run request with a specified BFN. In this BFN the baseband

data flow from BILL will start and when a buffer is filled decoding will start.

4.5.3 User handling

Users are handled in three linked lists corresponding to three states. The states are inactive,

active and released, see figure 4.1.

Inactive is the default state and means that this user has not yet successfully decoded an HS-

SCCH channel with any of its assigned codes. Active means that the user found a working code

in the last TTI, the code is then bound to this user. An active user that no longer succeeds in

decoding HS-SCCH with its bound code, i.e. it is no longer scheduled any data, will be moved

to the released state. It should be noted that the active user is only evaluated when it should

be scheduled according to the inter-TTI parameter (see 2.4). The next TTI all released users

will be moved to the inactive state.

This state machine will run until one of these conditions is no longer true:

• Fewer than 15 HS-PDSCH channels have been decoded

• There are still HS-SCCH codes not bound to any active user
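The three user states can be summarised as a small transition function. This is a simplified sketch: the type and function names are illustrative, the linked lists are left out, and the inter-TTI check is assumed to have been done by the caller before the function is invoked.

```c
typedef enum { USER_INACTIVE, USER_ACTIVE, USER_RELEASED } user_state_t;

/* Next state of a user after one TTI.
 * scch_ok is nonzero if HS-SCCH decoding succeeded for this user
 * (with any assigned code when inactive, with the bound code when active). */
user_state_t user_next_state(user_state_t state, int scch_ok)
{
    switch (state) {
    case USER_INACTIVE:
        return scch_ok ? USER_ACTIVE : USER_INACTIVE;
    case USER_ACTIVE:
        return scch_ok ? USER_ACTIVE : USER_RELEASED;
    case USER_RELEASED:
        /* Released users always return to inactive in the next TTI. */
        return USER_INACTIVE;
    }
    return USER_INACTIVE;
}
```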


Figure 4.1: User states

4.5.4 HS-SCCH decoding

For active users HS-SCCH decoding will only be done with the bound code. If this fails the user

is no longer active.

For inactive users HS-SCCH decoding will be done with the lowest unused code this TTI

until a working code is found or all codes assigned to the user (maximum 4) are exhausted. If

a working code is found the user becomes active.

First the whole subframe will be dechannelized with the specified code. If this fails the code

is not used in this TTI, i.e. no user has been assigned to it. If it succeeds part-1 and part-2 data is

prepared for convolutional decoding. In the flowchart this is illustrated by a lower and an upper

branch for the two parts.

For part-1 the following will be done:

• Unmask with UE-specific mask

• Depuncture, insert zeros

• Calculate branch metrics for Viterbi decoding

• Set up the Viterbi Co-Processor for decoding.

And for part-2:

• Depuncture, insert zeros

• Calculate branch metrics for Viterbi decoding

• Set up the Viterbi Co-Processor for decoding.

After this the VCP is started. When the decoding is done data is extracted from both parts

and the CRC is calculated. If the calculated CRC checks out with the one received from part-2


we know this HS-SCCH channel belongs to this user. If not we check the next code until all

codes for the user have been exhausted.

If we find a working code we proceed to HS-DSCH decoding.
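The depuncturing step listed for part-1 and part-2 can be sketched as follows. The sketch assumes soft values stored as signed 8-bit numbers, where 0 carries no information, and takes the punctured positions as a sorted, caller-supplied list. The function name and parameters are illustrative; the actual puncturing positions are defined by the specification and are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Re-insert neutral (zero) soft values at the positions that were
 * punctured by the transmitter.
 * punct_pos: sorted 0-based positions in the depunctured output.
 * Returns the length of the depunctured sequence. */
size_t depuncture(const int8_t *in, size_t n_in,
                  const size_t *punct_pos, size_t n_punct, int8_t *out)
{
    size_t n_out = n_in + n_punct;
    size_t r = 0;   /* read index into in */
    size_t p = 0;   /* index of the next punctured position */

    for (size_t i = 0; i < n_out; i++) {
        if (p < n_punct && punct_pos[p] == i) {
            out[i] = 0;          /* erased bit: no information */
            p++;
        } else {
            out[i] = in[r++];
        }
    }
    return n_out;
}
```

A zero soft value contributes equally to both hypotheses in the branch metric, so the Viterbi decoder effectively ignores the punctured positions.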

4.5.5 HS-DSCH decoding

First all transport block parameters are calculated, such as those for segmentation and interleaving. In this step it can be determined if the transport block is punctured; if so we abort here and report that it is not supported. Otherwise the following is done in the order listed:

• Dechannelization (includes demodulation)

• Deinterleaving (includes desegmentation)

• HARQ bit collection

• Rate matching (only repetition)

• HS-PDU descrambling

• Header extraction

• CRC calculation and comparison

4.5.5.1 Error reporting

Fatal errors will be reported. These can be

• in Dechannelization, if integrated values lower than the spreading factor are found

• in Header Extraction, if the header is illegal

• in CRC comparison, if the received and calculated CRC differ

4.5.6 User checks

When all decoding for the user is done checks can be performed and abnormalities can be

reported. Currently nothing is implemented here.

4.5.7 Cell checks

When all users have been decoded cell-specific checks can be performed and abnormalities can be reported. Currently the only check implemented is to see if sufficiently many HS-PDSCH are scheduled this TTI. The threshold is provided in the run request.

4.6 Specification

4.6.1 Signals

For start, stop and setup of the decoder as described in section 4.5.2 the following signals are

defined.


4.6.1.1 TXAD SETUP HSDPA DEC CELL REQ

This signal is used to setup cell specific parameters.

Field Type Description

sig no U32 Signal Number

scrCode U16 Scrambling code

tCell U8 Cell-specific offset (tCell*256 chips)

Table 4.1: TXAD SETUP HSDPA DEC CELL REQ

4.6.1.2 TXAD SETUP HSDPA DEC CELL CFM

This signal is sent in response to TXAD SETUP HSDPA DEC CELL REQ



Table 4.2: TXAD SETUP HSDPA DEC CELL CFM

4.6.1.3 TXAD SETUP HSDPA DEC USER REQ

This signal is used to setup users. Should be sent multiple times for multiple users.



ueid U16 UE-ID or HS-RNTI of the user

category U8 UE category (1-12)

scchCodes[4] S8 Array of HS-SCCH codes to listen to. Shall be entered in ascending order. If less than 4 the remaining entries shall have value -1.

Table 4.3: TXAD SETUP HSDPA DEC USER REQ
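The fields in table 4.3 map naturally onto a C structure. The sketch below is illustrative only: the type and function names are made up, the `sig_no` field is assumed by analogy with table 4.1, and the check for strictly ascending codes is one interpretation of "ascending order".

```c
#include <stdint.h>

/* Hypothetical layout of the user setup request (names assumed). */
typedef struct {
    uint32_t sig_no;        /* signal number, assumed as in table 4.1 */
    uint16_t ueid;          /* UE-ID or HS-RNTI of the user */
    uint8_t  category;      /* UE category (1-12) */
    int8_t   scchCodes[4];  /* ascending, unused entries set to -1 */
} hsdpa_dec_user_req_t;

/* Returns 1 if the request follows the rules stated in table 4.3. */
int user_req_is_valid(const hsdpa_dec_user_req_t *req)
{
    int seen_end = 0;

    if (req->category < 1 || req->category > 12)
        return 0;
    if (req->scchCodes[0] < 0)          /* at least one code is required */
        return 0;
    for (int i = 1; i < 4; i++) {
        int8_t c = req->scchCodes[i];
        if (c == -1) {
            seen_end = 1;
            continue;
        }
        if (c < 0 || seen_end)          /* no codes after a -1 entry */
            return 0;
        if (c <= req->scchCodes[i - 1]) /* must be ascending */
            return 0;
    }
    return 1;
}
```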

4.6.1.4 TXAD SETUP HSDPA DEC USER CFM

This signal is sent in response to TXAD SETUP HSDPA DEC USER REQ.



Table 4.4: TXAD SETUP HSDPA DEC USER CFM


4.6.1.5 TXAD SETUP HSDPA DEC RUN REQ

This signal is used to trigger start of decoding.



pdschThr U8 PDSCH threshold. Warn with a trace if number of PDSCH is below this number in any subframe.

startBFN U16 BFN to start decoding

Table 4.5: TXAD SETUP HSDPA DEC RUN REQ

4.6.1.6 TXAD SETUP HSDPA DEC RUN CFM

This signal is sent in response to TXAD SETUP HSDPA DEC RUN REQ



Table 4.6: TXAD SETUP HSDPA DEC RUN CFM

4.6.1.7 TXAD SETUP HSDPA DEC STOP REQ

This signal is used to stop the decoding.



Table 4.7: TXAD SETUP HSDPA DEC STOP REQ

4.6.1.8 TXAD SETUP HSDPA DEC STOP CFM

This signal is sent in response to TXAD SETUP HSDPA DEC STOP REQ



Table 4.8: TXAD SETUP HSDPA DEC STOP CFM

4.6.2 TXADCLI

These are the commands wrapping the signals


txadcli %TXAD -c hssetcell <scrambling code> <tcell>

txadcli %TXAD -c hssetusr <ueid/h-rnti> <category> <scch_codes 0 .. 4>

txadcli %TXAD -c hsrun <pdsch threshold> <start_bfn>

txadcli %TXAD -c hsstop

4.6.3 Traces

Every trace is preceded by [bfn:sfn].

4.6.3.1 Number of HS-PDSCH below threshold

HSDPA ERROR: Only %d PDSCH channels

4.6.3.2 Error in HS-DSCH decoding

HSDPA ERROR DSCH: Puncturing not supported!

HSDPA ERROR DSCH: Despreading failed!

HSDPA ERROR DSCH: Illegal MAC-hs header!

HSDPA ERROR DSCH: CRC check failed!

Followed by:

** HSDPA FAIL: UEID %d P: %d %s TBS: %d

4.6.3.3 Report every X:th subframe

HSDPA REPORT: max_cycles: %d

minTBS: %d maxTBS: %d minP: %d maxP: %d

sfn_with_data: %d sfn_with_errors: %d

4.6.3.4 Fatal error

HSDPA ERROR: Missed subframe!

CHAPTER 5

Result

5.1 Decoder

A working decoder has been implemented as a DSP program, supporting one cell and multiple

users. The baseband data is transferred to our application through an external memory interface

by an FPGA. Before transferring the data the FPGA descrambles it; all other processing is done

on the DSP. The FPGA part has been made by Ericsson.

No turbo decoding is done, which limits the decoder to unpunctured HS-DSCH data.

The program has been integrated into the existing test environment. Tests have been executed successfully on real hardware and the output has been verified to be correctly decoded. The program can easily be customized to accommodate different test cases.

5.2 Measurements

Measurements have been done to see if the real-time requirements of the decoder have been

fulfilled. The hard limit for decoding of one subframe is the TTI length, 2 ms.

For measurements the C6416 Device Cycle Accurate Simulator in Code Composer has been

used. This simulates everything on the DSP, including cache and the Viterbi coprocessor. The

same compiler flags as in the final program have been used. The cycles measured are those

used for complete decoding of a subframe, excluding descrambling as this is implemented in

hardware in the actual setup. The major difference from the final setup is the overhead of

the kernel and other applications running on the DSP.

The encoding has been done with our Matlab encoder implementation. The decoder has been

setup with 4 users with the same 4 channelization codes for every run.

The measurements in table 5.1 are for the largest TB sizes for QPSK and 16QAM respectively, with 1 and 4 users. As can be seen, the difference between QPSK and 16QAM is small, although the TB size for 16QAM is double that for QPSK. The reason for this is the way demodulation is done. An actual decoding run on real hardware took on average 358050 cycles for 16QAM with TB size 27952, an increase of 9% from simulation. The biggest contributor to the increase is probably the DMA transfers of baseband data. In the simulation this was done from internal memory; on real hardware it is done through the external memory interface from the FPGA.

Modulation   Scheduled users   TB sizes                 Cycles   Time (ms) a
QPSK         1                 13904                    320837   0.3775
16QAM        1                 27952                    328457   0.3864
QPSK         4                 3695, 3695, 3695, 2775   426587   0.5019
16QAM        4                 7430, 7430, 7430, 5579   434626   0.5113

a Assuming 850 MHz clock rate

Table 5.1: Simulated decoding
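The time column of table 5.1 follows directly from the cycle count and the assumed 850 MHz clock rate. A small helper (the function name is illustrative) makes the conversion explicit:

```c
/* Convert a DSP cycle count to milliseconds for a given clock rate in Hz. */
double cycles_to_ms(unsigned long cycles, double clock_hz)
{
    return 1e3 * (double)cycles / clock_hz;
}
```

For example, the 358050 cycles measured on real hardware correspond to about 0.42 ms at 850 MHz, roughly a fifth of the 2 ms TTI.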

Simulation measurements for all QPSK TB sizes have also been made. Figure B.2 in appendix B shows the results for all TB sizes using 15 channels. The peaks for the smallest TB sizes occur because these are repeated in rate matching, which adds extra time to the decoding. It can also be seen that the actual worst case is somewhere below the middle (0.52 ms for size 7168) and not for the largest transport block. This is due to the way the HARQ bit collection is implemented. The computation time for every step in the HS-DSCH decoding chain increases with increasing transport block size, except for the HARQ bit collection, whose time decreases as the number of parity bits decreases.

CHAPTER 6

Discussion

6.1 Design choices

The saving in computation time from extracting the systematic bits instead of turbo decoding was probably not of great importance, considering the processing margin achieved. We have not looked into the performance of the built-in turbo co-processor on the DSP, but its computation time should not differ much from that of the Viterbi co-processor, at least for small transport blocks. And if certain time-consuming operations were moved to hardware instead, even large transport blocks should not be a problem.

The choice to have descrambling in hardware was made purely because it was already implemented and readily available.

Extracting only the systematic bits does however introduce a limitation on the decoder, since HARQ retransmissions often use a redundancy version where systematic bits are punctured.

The reason for implementing the entire decoder on the DSP was to get an idea of how much processing power is required for the complete decoding. It could then be decided which parts of the decoding chain are suitable for software, and which, if any, would be better off in hardware. The conclusion is that all of the decoding can easily be done on the DSP alone. However, more users in the cell impose a worst-case scenario when the HS-SCCH codes are assigned to users at the end of the user list, i.e. all users must be decoded in order to find the right ones. For this reason the dechannelization of HS-SCCH should be implemented in hardware instead.

6.2 Future work

To be able to support more than one cell, and to better support more users, the dechannelization of HS-SCCH and most of the HS-DSCH decoding should be implemented in hardware (FPGA).

To be able to support more test scenarios the following may be done to further simulate real UEs:

• Implement turbo decoding using the co-processor and support punctured data to allow retransmissions

• Implement simulation of HS-DPCCH, that is simulate CQI and ACK/NACK

For external verification of the MAC-hs PDUs, the PDUs can be transferred in real-time over a proprietary interface to the workstation.

APPENDIX A

Modulation and FEC coding

A.1 Modulation

A.1.1 QPSK

Quadrature Phase-Shift Keying is a rather simple phase shift modulation scheme, in which the input data bits are modulated as pairs of bits to form a single-carrier sinusoidal signal with (normally) 90° phase shifts. This enables a symbol of two bits of data to be transmitted in the single-frequency carrier wave by simply specifying the phase of the signal as a multiple of 90° plus a 45° offset. The modulated signal is constructed by summing two source sinusoids of the same amplitude, separated by 90° in phase (in quadrature), after applying ordinary binary phase-shift keying (BPSK) to each of them separately.

The resulting bit pairs are shown in the constellation diagram in figure A.1 below. Here we see that each quadrant of the complex plane represents a unique combination of bits: Q1: 11, Q2: 01, Q3: 10 and Q4: 00. Each symbol has equal reliability regardless of position, since all four constellation points have equal distances to their nearest neighbors. The decision area is simply the 90° sector of the complex plane containing the symbol.

Using this modulation technique, the data rate of the communication channel is effectively doubled compared to BPSK. The disadvantage is the increased sensitivity to interference in the phase of the signal, that is, the Bit Error Ratio (BER) is also increased. However, compared to higher levels of modulation, QPSK is more robust to interference and does not require as much transmission power as, for instance, 16QAM does.

A.1.2 16QAM

A more sophisticated modulation scheme is 16-Quadrature Amplitude Modulation. It is similar to PSK in the sense that it also uses two sinusoids of the same frequency which are summed together to form the desired modulated signal. The two sinusoids are separated by a static phase shift of 90°, which makes it possible for the receiver to demodulate the signal without the use of separate carrier wave frequencies. In addition, QAM uses amplitude modulation of each sinusoid to generate the modulated symbols. These two constituent signals are then summed to form the modulated signal.

Figure A.1: Normalized constellation diagram for QPSK modulation

16QAM uses two discrete amplitude levels, each with either sign, on each of the I and Q branches in order to get 16 different symbols. These are defined in table A.1 for WCDMA modulation.

Depending on which quadrant is the dominant one, the resulting signal will get a unique am-

plitude and phase out of 16 possible variants (4 bits per symbol) as depicted in the constellation

diagram in figure A.2.

It may also be beneficial to change the order of the bit constellations, depending on channel

conditions and so on, since not all bits have equal distance to the nearest neighbors (the outer

points do not have an upper limit), hence different reliabilities.

This modulation scheme imposes another problem when used in macro diversity networks

(such as UMTS), namely the fact that the amplitude of the signal is initially not known by the

UEs, and may change rapidly. For QPSK for instance, the amplitude has only one discrete

level.

As for all higher-order modulation schemes, the noise sensitivity as well as the power require-

ment increases with the complexity of the modulation, hence this is not ideal for channels with

poor signal quality. That is, if two receivers with equal signal strength are to receive QPSK

versus 16QAM, the latter one will have almost half the processing gain. However, when a

good channel is available, the data rate is doubled compared to QPSK whilst the bandwidth is

maintained.

The decision areas of 16QAM are somewhat complex compared to QPSK, and not as robust to

interference since both phase and amplitude interference affect the resulting received symbols.


i1q1i2q2 I branch Q branch

0000 0.4472 0.4472

0001 0.4472 1.3416

0010 1.3416 0.4472

0011 1.3416 1.3416

0100 0.4472 -0.4472

0101 0.4472 -1.3416

0110 1.3416 -0.4472

0111 1.3416 -1.3416

1000 -0.4472 0.4472

1001 -0.4472 1.3416

1010 -1.3416 0.4472

1011 -1.3416 1.3416

1100 -0.4472 -0.4472

1101 -0.4472 -1.3416

1110 -1.3416 -0.4472

1111 -1.3416 -1.3416

Table A.1: 16QAM mapping
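Table A.1 can be generated from two rules: i1 and q1 select the sign of the I and Q branch, and i2 and q2 select the amplitude. The sketch below encodes these rules and adds a hard-decision demapper. The names are illustrative, and the demapper uses a midpoint threshold between the two amplitudes, which is an assumption made for this example rather than the comparison against known integrated amplitudes described in section 4.1.

```c
#define AMP_LOW  0.4472   /* lower amplitude from table A.1 */
#define AMP_HIGH 1.3416   /* higher amplitude from table A.1 */

typedef struct { double i, q; } iq_t;

/* Map four bits, ordered i1 q1 i2 q2 with i1 as the most significant bit,
 * to a constellation point according to table A.1. */
iq_t map_16qam(unsigned bits)
{
    unsigned i1 = (bits >> 3) & 1u, q1 = (bits >> 2) & 1u;
    unsigned i2 = (bits >> 1) & 1u, q2 = bits & 1u;
    iq_t s;
    s.i = (i1 ? -1.0 : 1.0) * (i2 ? AMP_HIGH : AMP_LOW);
    s.q = (q1 ? -1.0 : 1.0) * (q2 ? AMP_HIGH : AMP_LOW);
    return s;
}

/* Hard decision: sign gives the first bit of each branch, the amplitude
 * relative to the midpoint threshold gives the second. */
unsigned demap_16qam(iq_t s)
{
    const double thr = (AMP_LOW + AMP_HIGH) / 2.0;
    unsigned i1 = s.i < 0.0, q1 = s.q < 0.0;
    unsigned i2 = (s.i < 0.0 ? -s.i : s.i) > thr;
    unsigned q2 = (s.q < 0.0 ? -s.q : s.q) > thr;
    return (i1 << 3) | (q1 << 2) | (i2 << 1) | q2;
}
```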

Figure A.2: WCDMA constellation diagram for 16QAM modulation

A.2 Error coding

Two different types of error coding algorithms are used in WCDMA: convolutional codes (1/2-rate and 1/3-rate) and a 1/3-rate turbo code. The latter is used for higher data rates when good error correction is required, at the expense of large overhead and complexity in encoding/decoding. The performance benefits are achieved when large enough block sizes are used. Turbo coding is a so-called systematic algorithm, meaning that the original, uncoded data block is included in the code word. If the data block is n bits long, and the code sequence is m bits long, the entire code word is m + n bits and the code rate is n/(m + n).

A.2.1 Convolutional coding

Convolutional coding is the base error-correction coding algorithm in WCDMA, and is also the

main building block of a Turbo coder as we shall see in the next section.

The structure of an encoder can be seen as a series of K − 1 bit sequential shift registers,

each tied to a modulo-2 adder connected in series with the other adders, forming a chain to

output i of n outputs in total. The adders are placed according to the ones in the generator

polynomial. Note that 3GPP polynomials are given in octal form.

K is the so called constraint length, in the 3GPP case equal to 9. n is the number of output

bits which are 2 for the 1/2 rate and 3 for the 1/3-rate encoder. With k = 1 input bits (for

each output sequence) the encoder is said to be a (n, k,K) = (3, 1, 9) convolutional encoder for

the 1/3-rate version.

Furthermore, there are n generator polynomials Gi, one for each output, that specify which

of the K−1 memory bits and 1 input bit to add to the output sum. This is illustrated in figure

A.3 for both the 1/2-rate and 1/3-rate encoders used in WCDMA.

Figure A.3: 3GPP (2,1,9) and (3,1,9) convolutional encoders

The generator polynomials used in WCDMA coding are presented in binary form in table A.2. The positions of the modulo-2 adders are simply matched against the bit pattern in the 9-bit polynomials, hence each output bit is the modulo-2 sum of the selected memory elements and, common to all outputs, the input bit. Note that the sum of two 1's is 0 (overflow) in modulo-2 summing.

      (2,1,9)             (3,1,9)
G0    1,0,1,1,1,0,0,0,1   1,0,1,1,0,1,1,1,1
G1    1,1,1,1,0,1,0,1,1   1,1,0,1,1,0,0,1,1
G2    -                   1,1,1,0,0,1,0,0,1

Table A.2: Bit sequence representation of Gi

The output vector U of a 1/3-rate convolutional encoder with binary bit vector representation of the generator polynomials as above and input vector x can be stated as

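\[
U = G_0 \cdot \left( (1 + q^{-1} + q^{-2} + \dots + q^{-N})\, x \right)
  + G_1 \cdot \left( (1 + q^{-1} + q^{-2} + \dots + q^{-N})\, x \right)
  + G_2 \cdot \left( (1 + q^{-1} + q^{-2} + \dots + q^{-N})\, x \right),
\tag{A.1}
\]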

where q is the delay operator. To make sure that the internal state is all-zero at the beginning

of next data input sequence, K − 1 zeros are added at the end of all inputs. This is known as

Trellis termination.
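As an illustration, a 1/3-rate encoder with the (3,1,9) polynomials of table A.2 and trellis termination can be sketched as below. The sketch assumes that the leftmost bit of each polynomial corresponds to the input bit and that the three outputs are interleaved in the order G0, G1, G2. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define CONSTRAINT_LEN 9
/* (3,1,9) generator polynomials of table A.2, written in octal. */
static const unsigned GEN_R3[3] = {0557, 0663, 0711};

/* Modulo-2 sum of all bits in v. */
static unsigned parity(unsigned v)
{
    unsigned p = 0;
    for (; v; v >>= 1)
        p ^= v & 1u;
    return p;
}

/* Encode n input bits (one per byte) and append K-1 zero tail bits.
 * out must have room for 3 * (n + K - 1) bits. Returns the output length. */
size_t conv_encode_r3(const uint8_t *in, size_t n, uint8_t *out)
{
    unsigned state = 0;   /* K-1 memory bits, newest in the top position */
    size_t k = 0;

    for (size_t t = 0; t < n + CONSTRAINT_LEN - 1; t++) {
        unsigned bit = (t < n) ? (in[t] & 1u) : 0u;   /* tail bits are 0 */
        unsigned reg = (bit << (CONSTRAINT_LEN - 1)) | state;
        for (int g = 0; g < 3; g++)
            out[k++] = (uint8_t)parity(reg & GEN_R3[g]);
        state = reg >> 1;
    }
    return k;
}
```

Feeding a single 1 into the encoder reproduces each polynomial bit by bit on its output, which is a quick way to check the tap positions.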

A Systematic Convolutional (SC) encoder is obtained by including the input data in the

output. This greatly improves BER performance, at the cost of more data to transmit i.e. more

overhead. In the (3, 1, 9) case however, the increase will only be 33%.

A.2.2 Turbo Coding

The Shannon limit of a communication channel is the theoretical maximum information transfer

rate of the channel, for a particular noise level. Turbo code error-correction comes closest to the

Shannon limit of all to date known coding algorithms (Pushing the Limit [10]). This enables

high transmission rates through the same noisy channel as for other non near-Shannon limit

algorithms, without having to increase transmission power.

The obvious drawback is that the computation power required when decoding increases,

making Turbo codes unusable in long-term communication such as speech channels or video

calls. The ideal usage for Turbo codes is in PS data channels, which are only used for short

periods of time, but at a very high peak data rate.

An overview of the encoding/decoding chain is shown in figure A.4.

The Turbo encoder is a so called Parallel Concatenated Convolutional Code (PCCC) encoder,

which consists of two parallel Recursive Systematic Convolutional (RSC) encoders. The first

RSC operates directly on the input sequence and also outputs the unchanged input (systematic

output). The other RSC is fed with bits supplied from a dynamic interleaver, which performs

inter-row and intra-row permutation of the input bits arranged in a matrix with dimensions

depending on the length of the input bit block.

The arrangement and the algorithms for permuting the rows (for 3GPP implementation) are

quite complex and not presented here, please refer to [3] for details.


Figure A.4: Turbo coding and decoding chain

An R × C rectangular matrix is constructed, where R and C depend on the number of input bits K (not to be confused with the constraint length K of section A.2.1), which can be in the interval 40 to 5114. R can be 5, 10 or 20 rows. C is given from a prime number look-up table. The resulting R · C elements are filled, row by row, with the input bit sequence. If R · C > K, the final elements are padded with dummy bits which are later pruned away from the matrix after the permutation operations.

The interleaving of bits in a non-contiguous way gives better protection against burst inter-

ference in the channel. If several bits in sequence are destroyed in transmission, the likelihood

that the entire transport block will be unusable is greatly decreased compared to in-order

transmission.

The first encoder produces the first parity bit Zk whilst the other produces the second parity bit Z'k. In addition to the systematic output Xk, this yields a 1/3-rate Turbo encoder as shown in figure A.5. The RSC constituent encoders are 8-state. The interleaver output is denoted X'k.

Figure A.5: Structure of a 1/3-rate Turbo encoder


A.3 Rate matching

In order to make the error coded transport block fit into the physical channel(s), these bits have to be matched against the available channel bits. This is done by either removing bits or repeating bits.

A.3.1 Repetition

By repeating coded bits when there are enough channel bits available for this, the redundancy is increased and the coding becomes more robust. For Turbo codes, repetition of systematic bits only would not increase the redundancy as much as repeating parity bits, because the correlation is not as strong for these as for parity.

A better approach is to repeat only parity bits for each systematic bit, or to repeat both systematic and parity bits evenly. This way the effective coding rate is lowered, i.e. more redundancy is sent per information bit, and decoding becomes easier in a noisy channel.

A.3.2 Puncturing

By omitting certain bits from the code given from the encoder output, i.e. puncturing the code,

one may change the effective coding rate from the original coding rate up to 1/1 for Turbo

codes (only systematic bits are sent). This is done by comparing to a predefined puncturing

pattern bit-wise; if 1 transmit the code bit, otherwise discard it.

The new coding rate depends on the number of bits omitted and the length of the puncturing

pattern. An example of a 1/2-rate encoder with different puncturing patterns is listed in table

A.3 below.

Coding rate   Puncturing pattern   Length n
1/2           1                    1
              1
2/3           10                   2
              11
3/4           101                  3
              110
5/6           10101                5
              11010
8/10          11000101             8
              11111010

Table A.3: Example of puncturing patterns

The puncturing pattern is n bits long. It is applied to the first n bits of the code words: bits at positions where the pattern is 1 are kept and the others are discarded. The pattern is then applied to the next n bits, and so on until the end of the code streams has been reached.

For puncturing of WCDMA Turbo code, the puncturing might be performed on systematic


bits as well, or on both systematic and parity bits.

A.3.3 WCDMA Rate matching algorithm

The rate matching pattern algorithm indicates which of the coded bits to either puncture or repeat. The parameters that control this selection are the number of input bits Xi, the initial error e_ini, the positive error update e_plus and the negative error update e_minus. These are given in [3] section 4.2.7.2 for downlink, and in section 2.4.10.5 for HSDPA.

The following pseudo code represents the iteration of repeated or punctured bits.

e = e_ini
m = 1
while (m <= Xi)
    e = e - e_minus
    if (e <= 0) then
        if (puncturing) then
            puncture_bit(m)
        else
            repeat_bit(m)
        end
        e = e + e_plus
    end
    m = m + 1
end

⌈e_ini/e_minus⌉ is the index of the first bit affected by the rate matching, so the bits before it stay unaffected. The interval between each punctured or repeated bit is dependent on the ratio e_plus/e_minus.
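The pattern algorithm can be written in C as sketched below. The function name and the `copies` output array are illustrative. The puncturing branch follows the pseudo code directly. For repetition the sketch uses a loop instead of a single test, so that a bit can be repeated more than once when e is still non-positive after one update; this differs from the pseudo code above only in that case, and it assumes e_plus > 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Determine the rate matching pattern.
 * copies[m] receives how many times input bit m (0-based) appears in the
 * output: 0 = punctured, 1 = unaffected, 2 or more = repeated.
 * Returns the total number of output bits. Requires e_plus > 0. */
size_t rate_match_counts(size_t x_i, long e_ini, long e_plus, long e_minus,
                         int puncturing, uint8_t *copies)
{
    long e = e_ini;
    size_t total = 0;

    for (size_t m = 0; m < x_i; m++) {
        copies[m] = 1;
        e -= e_minus;
        if (puncturing) {
            if (e <= 0) {
                copies[m] = 0;      /* puncture bit m */
                e += e_plus;
            }
        } else {
            while (e <= 0) {
                copies[m]++;        /* repeat bit m once more */
                e += e_plus;
            }
        }
        total += copies[m];
    }
    return total;
}
```

With Xi = 6, e_ini = 6, e_plus = 6 and e_minus = 2, the third and sixth bits are the ones affected, giving 4 output bits when puncturing and 8 when repeating.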

A.3.4 Viterbi decoder

A Viterbi decoder is used to decode data encoded with a convolutional encoder. It uses the Viterbi algorithm, which is a maximum likelihood sequence detector. The most likely sequence is found by traversing a so-called trellis. The trellis consists of S stages, one for each time instant during encoding, where S corresponds to the number of bits N input to the encoder plus (K − 1) tail bits, or trellis termination bits. At each stage there are 2^(K−1) states, where K is the constraint length of the encoder.

From every state there are two state transitions, or branches, corresponding to the inputs “0” and “1” to the encoder. Which state follows which depends on the polynomials of the encoder. Every branch is associated with a symbol corresponding to the output from the encoder. For a rate 1/R encoder there are 2^R possible output values.

The trellis can be illustrated with dots representing states, and arrows representing the branches. In figure A.6 a trellis for a 1/2 rate encoder with K = 3 and polynomials 7 and 5 (octal) can be seen.

Figure A.6: Trellis for a K = 3 Convolutional Code

The principle of decoding is then

1. Calculate Branch Metrics for each possible state transition. These are the normed distances between every possible symbol and the received symbol. Due to symmetry there are 2^(R−1) different branch metrics for a rate 1/R code.

2. Calculate Cumulative Path Metric. This is the sum of the M previous Branch Metrics, where M is the memory depth of the decoder.

3. Calculate surviving path. The surviving path is the path with the lowest Path Metric.

4. Extract the error-corrected data. The leftmost bit in each state in every stage of the surviving path corresponds to the input to the encoder.

To be able to decode, the encoder is always initialized to the all-zero state. The input is padded with (K − 1) zeros to flush the delay elements, which means that the end state is also known to be all zeros.
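The four steps can be illustrated with a small hard-decision decoder for the K = 3, rate 1/2 code with polynomials 7 and 5 from figure A.6. This is a sketch under simplifying assumptions: Hamming distance is used as branch metric, the whole block is traced back at once instead of using a limited memory depth M, and all names are illustrative. It is unrelated to how the Viterbi co-processor on the DSP is programmed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define K 3                 /* constraint length */
#define NSTATES 4           /* 2^(K-1) */
#define MAX_STAGES 64
#define INF 0x3FFFFFFFu

static const unsigned GEN[2] = {07, 05};   /* octal polynomials */

static unsigned parity(unsigned v)
{
    unsigned p = 0;
    for (; v; v >>= 1)
        p ^= v & 1u;
    return p;
}

/* Rate 1/2 encoder with K-1 zero tail bits; returns the output length. */
size_t conv_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    unsigned state = 0;
    size_t k = 0;
    for (size_t t = 0; t < n + K - 1; t++) {
        unsigned bit = (t < n) ? (in[t] & 1u) : 0u;
        unsigned reg = (bit << (K - 1)) | state;
        out[k++] = (uint8_t)parity(reg & GEN[0]);
        out[k++] = (uint8_t)parity(reg & GEN[1]);
        state = reg >> 1;
    }
    return k;
}

/* Decode 2 * (n + K - 1) received hard bits into n information bits.
 * Returns the path metric of the surviving path, or -1 if too long. */
long viterbi_decode(const uint8_t *rx, size_t n, uint8_t *out)
{
    size_t stages = n + K - 1;
    unsigned metric[NSTATES] = {0, INF, INF, INF};  /* start in state 0 */
    uint8_t prev[MAX_STAGES][NSTATES];

    if (stages > MAX_STAGES)
        return -1;
    memset(prev, 0, sizeof prev);

    for (size_t t = 0; t < stages; t++) {
        unsigned next[NSTATES] = {INF, INF, INF, INF};
        for (unsigned s = 0; s < NSTATES; s++) {
            if (metric[s] == INF)
                continue;                            /* state not reachable */
            for (unsigned bit = 0; bit < 2; bit++) {
                unsigned reg = (bit << (K - 1)) | s;
                unsigned ns = reg >> 1;
                /* Step 1: branch metric = Hamming distance to rx symbol */
                unsigned bm = (parity(reg & GEN[0]) != rx[2 * t])
                            + (parity(reg & GEN[1]) != rx[2 * t + 1]);
                /* Steps 2 and 3: accumulate and keep the survivor */
                if (metric[s] + bm < next[ns]) {
                    next[ns] = metric[s] + bm;
                    prev[t][ns] = (uint8_t)s;
                }
            }
        }
        memcpy(metric, next, sizeof metric);
    }

    /* Step 4: trace back from the known all-zero end state; the leftmost
     * bit of each state on the path is the encoder input at that stage. */
    unsigned state = 0;
    for (size_t t = stages; t-- > 0;) {
        if (t < n)
            out[t] = (uint8_t)(state >> (K - 2));
        state = prev[t][state];
    }
    return (long)metric[0];
}
```

Encoding a short block, flipping one coded bit and decoding should return the original bits with a final path metric of 1, since the received sequence is then at Hamming distance 1 from the transmitted code word.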

APPENDIX B

Charts


Figure B.1: Decoder flowchart


Figure B.2: All TB sizes for 15 PDSCH channels, QPSK

APPENDIX C

Abbreviations

16QAM Quadrature Amplitude Modulation of order 16

2G 2nd generation

3G 3rd generation

3GPP Third Generation Partnership Project

AICH Acquisition Indicator Channel

ARQ Automatic Repeat ReQuest

BCCH Broadcast Control Channel

BCH Broadcast Channel

BER Bit Error Ratio

BFN Node B Frame Number

BLER Block Error Ratio

BPSK Binary Phase Shift Keying

CCCH Common Control Channel

CCH Control Channel

CCPCH Common Control Physical Channel

CCTrCH Coded Composite Transport Channel

CDMA Code Division Multiple Access

CPICH Common Pilot Channel

CRC Cyclic Redundancy Check

CS Circuit Switched

CTCH Common Traffic Channel

DBP Downlink Baseband Processing

DCCH Dedicated Control Channel

DCH Dedicated Channel

DL Downlink

DPCCH Dedicated Physical Control Channel

DPCH Dedicated Physical Channel

DPDCH Dedicated Physical Data Channel

DSP Digital Signal Processor


DTCH Dedicated Traffic Channel

DTX Discontinuous Transmission

EMIF External Memory Interface

FACH Forward Access Channel

FDD Frequency Division Duplex

FEC Forward Error Correction

FPGA Field-Programmable Gate Array

HARQ Hybrid Automatic Repeat Request

HSDPA High Speed Downlink Packet Access

HS-SCCH High-Speed Shared Control Channel

HS-DSCH High-Speed Downlink Shared Channel

HS-PDSCH High-Speed Physical Downlink Shared Channel

HS-DPCCH High-Speed Dedicated Physical Control Channel (uplink)

IMT-2000 International Mobile Telecommunications 2000

IP Internet Protocol

ITU International Telecommunication Union

MAC Medium Access Control

P-CCPCH Primary Common Control Physical Channel

PCCH Paging Control Channel

PCH Paging Channel

PDCP Packet Data Convergence Protocol

PDU Protocol Data Unit

PhCH Physical Channel

PHY Physical layer

PICH Paging Indicator Channel

PRACH Physical Random Access Channel

PS Packet Switched

QoS Quality Of Service

QPSK Quadrature Phase Shift Keying

R99 Release ’99

RAN Radio Access Network

RBS Radio Base Station (also called Node B)

RF Radio Frequency

RLC Radio Link Control

RNC Radio Network Controller

RRC Radio Resource Control

RV Redundancy Version

S-CCPCH Secondary Common Control Physical Channel

SCH Synchronization Channel

SDU Service Data Unit

SIR Signal-to-Interference Ratio

SF Spreading Factor

SFN System Frame Number

TB Transport Block


TDD Time Division Duplex

TDMA Time Division Multiple Access

TFCI Transport Format Combination Indicator

TrCH Transport Channel

TTI Transmission Time Interval

UE User Equipment

UMTS Universal Mobile Telecommunications System

UTRAN UMTS Terrestrial Radio Access Network

WCDMA Wideband Code Division Multiple Access

REFERENCES

[1] 3GPP. Physical channels and mapping of transport channels onto physical channels (release

6). Technical Report 25.211, 3rd Generation Partnership Project, 2005.

[2] 3GPP. Physical layer - general description (release 6). Technical Report 25.201, 3rd

Generation Partnership Project, 2005.

[3] 3GPP. Multiplexing and channel coding (release 6). Technical Report 25.212, 3rd

Generation Partnership Project, 2006.

[4] 3GPP. Spreading and modulation (release 6). Technical Report 25.213, 3rd Generation

Partnership Project, 2006.

[5] 3GPP. Medium access control (MAC) protocol specification (release 6). Technical Report

25.321, 3rd Generation Partnership Project, 2007.

[6] E. Dahlman, S. Parkvall, J. Sköld, and P. Beming. 3G Evolution: HSPA and

LTE for Mobile Broadband. Academic Press, 2007.

[7] H. Holma and A. Toskala. WCDMA for UMTS. Wiley, 2001.

[8] Texas Instruments. TMS320C64x DSP Viterbi-Decoder Coprocessor (VCP) Reference Guide.

Technical Report SPRU533D, Texas Instruments, 2004.

[9] ITU. About mobile technology and IMT-2000.

http://www.itu.int/osg/spu/imt-2000/technology.html, 2005.

[10] E. Klarreich. Pushing the limit. Science News Online, 2005.

[11] S. Reifegerste. CRC calculation. http://www.zorc.breitbandkatze.de/crc.html, 2006.

[12] Wikipedia. Cellular network - Wikipedia, the free encyclopedia.

http://en.wikipedia.org/w/index.php?title=Cellular_network&oldid=165647159, 2007.

[Online; accessed 23-October-2007].
