Handover Parameters Self-optimization by Q-Learning in 4G Networks

Realized by: Mohamed Raafat OMRI
Supervised by: Dr. Maissa BOUJELBEN

July 12, 2016


Dedication

I dedicate my dissertation work to my family and my friends. A special feeling of gratitude to my loving parents, Salah and Sghaira Omri, whose words of encouragement and push for tenacity ring in my ears. My sisters Kaouther, Lamia, Soumaya, Leila and my brother Lotfi have never left my side and are very special.

I dedicate this dissertation to my friends who have supported me throughout the process. I will always appreciate all they have done.

I also dedicate this work and give special thanks to my lovely fiancée Safa.


ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Maissa Boujelben, for her help and guidance throughout my progress in this project.

I would like to acknowledge and thank Mr. Walid Douagi, head of the Telecom Department, Dr. Talel Zouari, and my school ESPRIT and ESPRIT TECH for allowing me to conduct my project and providing the requested assistance.

Special thanks go to the members of the jury.

I must acknowledge as well the many friends, colleagues and teachers who assisted, advised, and supported my engineering studies and writing efforts over the years.

Finally, I would like to acknowledge my family for their unlimited support and help.


Abstract

With more and more customers using mobile communications, it is important for service providers to give their customers the best Quality of Service (QoS) they can afford. Many providers have taken to improving their networks to make them more appealing to customers. One such improvement is to enhance the reliability of the network, meaning that customers' calls are less likely to be dropped.

This dissertation explores improving the reliability of a 4G network by optimizing the parameters used in handover. The handover process within mobile communication networks is very important, since it allows users to move around freely while still staying connected to the network. The most important parameters used in the handover process are the Time-to-Trigger (TTT) and the Hysteresis (hys). These parameters are used to determine whether a candidate base station is better than the serving base station by a sufficient offset to warrant a handover taking place. The challenge in optimizing the handover parameters is that a fine balance needs to be struck between calls being dropped because a handover fails and the connection switching back and forth between two base stations unnecessarily, wasting network resources. In this project, we propose to use a machine learning technique known as Q-Learning to optimize the handover parameters by generating a policy that can be followed to adjust the parameters as needed. It was found that the implemented Q-Learning algorithm was capable of improving handover performance by minimizing the chosen handover-related Key Performance Indicators (KPIs).

Keywords: LTE-Advanced, Handover, Q-Learning Algorithm, Hysteresis Margin, Time-to-Trigger, Self-Organizing Network.


Table of contents

General Introduction

Chapter I: LTE-Advanced Overview
Introduction
1.1. Requirements and Targets for LTE-Advanced
1.2. LTE Enabling Technologies
1.2.1. Downlink OFDMA (Orthogonal Frequency Division Multiple Access)
1.2.2. Uplink SC-FDMA (Single Carrier Frequency Division Multiple Access)
1.2.3. LTE-A Channel Bandwidths and Resource Elements
1.3. LTE-Advanced Network Architecture
1.3.1. The Core Network: Evolved Packet Core (EPC)
1.3.2. The Access Network E-UTRAN
1.3.3. The User Equipment (UE)
1.4. E-UTRAN Network Interfaces
1.4.1. X2 Interface
1.4.2. S1 Interface
1.5. LTE Protocol Architecture
1.5.1. User Plane
1.5.2. Control Plane
1.5.2.1. Radio Resource Control (RRC)
1.5.2.2. Radio Resource Control States
1.6. Self-Organizing Networks
Conclusion

Chapter II: Handover in LTE-Advanced
Introduction
2.1. Handover Definition and Characteristics
2.1.1. Seamless Handover
2.1.2. Lossless Handover
2.2. Types of Handover
2.2.1. Intra LTE Handover: Horizontal Handover
2.2.2. Vertical Handover
2.3. Handover Techniques
2.3.1. Soft Handover, Make-Before-Break
2.3.2. Hard Handover, Break-Before-Make
2.4. Handover Procedure
2.5. Handover Measurements
2.6. Handover Parameters
2.7. Time To Trigger & Hysteresis
Conclusion

Chapter III: Machine Learning and Handover Parameter Optimization Simulation
Introduction
3.1. Q-Learning Overview
3.1.1. Machine Learning
3.1.2. Reinforcement Learning
3.1.3. Q-Learning
3.2. Proposed Approach for HO Optimization
3.2.1. Set of States
3.2.2. Set of Actions
3.2.3. Reward
3.3. Simulation & Performance Evaluation
3.3.1. Simulation Parameters
3.3.2. Simulation Results
Conclusion

General Conclusion

References


List of Figures

Figure 1: Orthogonal Frequency Division Multiple Access
Figure 2: LTE SAE Evolved Packet Core
Figure 3: E-UTRAN Architecture
Figure 4: Functional Split between E-UTRAN and EPC
Figure 5: Protocol stack for the user plane and control plane at the X2 interface
Figure 6: Protocol stack for the user plane and control plane at the S1 interface
Figure 7: E-UTRAN Protocol Stack
Figure 8: The RRC States
Figure 9: Decision on Handover Type
Figure 10: Intra-MME/Serving Gateway Handover
Figure 11: Handover Timing
Figure 12: Downlink reference signal structure for LTE-Advanced
Figure 13: Handover measurement filtering and reporting
Figure 14: Handover triggering procedure
Figure 15: State 157 possible actions
Figure 16: Illustration of coverage within the simulation area
Figure 17: Illustration of how large TTT values changed over time when the UE travels at walking speed
Figure 18: Comparison of TTT optimization for walking speeds (starting point 5.12 s)
Figure 19: Optimized vs. non-optimized results for starting point TTT = 0 s, hys = 0 dB, UE travelling at walking speed
Figure 20: Optimized vs. non-optimized results for starting point TTT = 0.256 s, hys = 5 dB, UE travelling at walking speed


List of Tables

Table 1: LTE-Advanced development history
Table 2: Number of PRBs per channel bandwidth
Table 3: Operational benefits of SON
Table 4: The different LTE hysteresis values
Table 5: The different LTE TTT values
Table 6: The different LTE trigger types and their criteria
Table 7: Set of states
Table 8: Simulation parameters


Abbreviations

3G 3rd Generation (Cellular Systems)

3GPP Third Generation Partnership Project

4G 4th Generation (Cellular Systems)

AC Admission Control

ACK Acknowledgement (in ARQ protocols)

AI Artificial Intelligence

AM Acknowledged Mode

AGWA Access Gateway

AS Access Stratum

BS Base Station

CDF Cumulative Distribution Function

CDMA Code Division Multiple Access

CQI Channel Quality Indicator

CS Circuit-Switched

dB Decibel

DFT Discrete Fourier Transform

DL Downlink

DRB Data Radio Bearer

eNodeB Evolved Node B (3GPP base station)

EPC Evolved Packet Core

E-UTRAN Evolved Universal Terrestrial Radio Access Network

FDD Frequency Division Duplex

GPRS General Packet Radio Service

GSM Global System for Mobile communications

HO Handover

HOM HO margin

HSDPA High Speed Downlink Packet Access

HSS Home Subscriber Server

HYS Hysteresis

IMS IP Multimedia Subsystem

IP Internet Protocol


ITU International Telecommunication Union

ITU-T ITU Telecommunication Standardization Sector

LTE-Advanced Long Term Evolution Advanced

MAC Medium Access Control

MME Mobility Management Entity

NACK Negative Acknowledgement

NAS Non-Access Stratum

NGMN Next Generation Mobile Networks

OFDM Orthogonal Frequency Division Multiplexing

OFDMA Orthogonal Frequency Division Multiple Access

OTP Optimum Trigger Point

PAPR Peak-to-Average Power Ratio

PCRF Policy and Charging Rules Function

PDCP Packet-Data Convergence Protocol

PDN Packet Data Network

PDU Protocol Data Unit

PGW PDN Gateway

QoS Quality of Service

RAN Radio Access Network

RB Resource Block

RF Radio Frequency

RLC Radio Link Control

RNC Radio Network Controller

ROHC RObust Header Compression

RRC Radio Resource Control

RRM Radio Resource Management

RSRP Reference Signal Received Power

RSRQ Reference Signal Received Quality

RSS Received Signal Strength

RSSI Received Signal Strength Indicator

SAE System Architecture Evolution

S1 The interface between eNodeB and Access Gateway

S1AP S1 Application Part

SC-FDMA Single Carrier - Frequency Division Multiple Access


SGW Serving Gateway

SINR Signal-to-Interference-plus-Noise Ratio

SIR Signal-to-Interference Ratio

SN Sequence Number

SON Self-Organizing Network

SRB Signaling Radio Bearers

TE Terminal Equipment

TM Transparent Mode

TTI Transmission Time Interval

TTT Time-to-Trigger

UE User Equipment, the 3GPP name for the mobile terminal

UL Uplink

UM Unacknowledged Mode

UMTS Universal Mobile Telecommunication System

USIM Universal Subscriber Identity Module

VoIP Voice over IP

X2 Interface between eNodeBs


General Introduction

In recent years there has been enormous growth in mobile telecommunications traffic, in line with the rapid spread of smartphone devices. Cellular networks are evolving to meet future requirements for data rate, coverage and capacity. LTE-Advanced is a mobile communication standard and a major enhancement of the Long Term Evolution (LTE) standard. It was formally submitted to the ITU-R in late 2009 as a candidate 4G system meeting the requirements of the IMT-Advanced standard, and was standardized by the 3rd Generation Partnership Project (3GPP) in March 2011 as 3GPP Release 10. One of the important benefits of LTE-Advanced is the ability to take advantage of advanced-topology networks: optimized heterogeneous networks with a mix of macrocells and low-power nodes such as picocells, femtocells and new relay nodes. The next significant performance leap in wireless networks will come from making the most of topology and bringing the network closer to the user by adding many of these low-power nodes. LTE-Advanced further improves capacity and coverage, and ensures user fairness. LTE-Advanced also introduces multicarrier operation to be able to use ultra-wide bandwidth, up to 100 MHz of spectrum, supporting very high data rates. Mobility is an important aspect of the enhancement, since the system should support mobility at various speeds, up to 350 km/h or even 500 km/h. At higher moving speeds the handover procedure becomes more frequent and must complete faster; handover performance therefore becomes more crucial, especially for real-time services [11].

One of the main goals of LTE-Advanced, or indeed of any wireless system, is to provide fast and seamless handover from one cell (the source cell) to another (the target cell). Service should be maintained during the handover procedure, and data transfer should be neither delayed nor lost; otherwise performance will be dramatically degraded. This is especially applicable to LTE-Advanced systems because of the distributed nature of the LTE radio access network architecture, which consists of just one type of node, the base station, known in LTE-Advanced as the eNodeB [7].

In LTE-Advanced there are also predefined handover conditions for triggering the handover procedure, as well as goals regarding handover design and optimization, such as decreasing the total number of handovers in the whole system by predicting handovers, decreasing the number of ping-pong handovers, and achieving fast and seamless handover.


Hence, optimizing the handover procedure to obtain the required performance is considered an important issue in LTE-Advanced networks [11].

Many studies have been carried out to improve LTE-Advanced handover, with different HO algorithms covering several stages and different cases, all aiming at optimum handover mechanisms that can handle smooth handover at the cell boundaries of the LTE-Advanced network.

The main objective of this project is to develop a Q-learning algorithm to self-optimize the

parameters used in the handover process of 4G networks.
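As a preview of the approach developed in Chapter III, the sketch below shows the standard tabular Q-learning update that such a self-optimization loop builds on. The state/action encoding (discrete TTT/hysteresis adjustments), the reward, and all names here are illustrative assumptions; the project's actual sets of states, actions and reward are specified in Chapter III.

```python
# Minimal tabular Q-learning sketch for handover parameter tuning.
# States, actions and reward below are illustrative placeholders, not the
# project's actual design (see sections 3.2.1-3.2.3).
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1           # learning rate, discount, exploration
ACTIONS = ["ttt_up", "ttt_down", "hys_up", "hys_down", "keep"]
Q = defaultdict(float)                           # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In such a scheme, a state would encode the current (TTT, hys) pair, the reward would penalize the chosen handover KPIs (drops, ping-pongs), and the policy to follow is read off the learned table.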

This report comprises three chapters. The first chapter contains an overview of LTE technology; the main characteristics and functionalities of the system are described, as well as the enabling technologies, the network architecture and the protocol architecture. In the second chapter, we introduce the general concepts of handover and describe the whole HO procedure; the optimization and design principles, the variables used as inputs and the different HO parameters are also explained. Finally, the third chapter discusses our proposed approach: first we present machine learning, explaining reinforcement learning and Q-Learning; then we discuss handover parameter optimization; finally we present the simulation parameters and the obtained results.


Chapter I: LTE-Advanced Overview

Introduction:

In LTE-Advanced networks the focus is on higher capacity: the driving force behind further developing LTE towards LTE-Advanced (LTE Release 10) was to provide higher bitrates in a cost-efficient way and, at the same time, to completely fulfill the requirements set by the ITU for IMT-Advanced, also referred to as 4G.

In this chapter, we will present the LTE-Advanced enabling technologies, resource elements and network architecture, citing the different key components.

1.1. Requirements and Targets for LTE-Advanced:

3GPP defined LTE-Advanced for radio access so that the technology remains competitive in the future, identifying a set of high-level requirements, several of which have since been exceeded. The following target requirements were agreed among the operators and vendors in the project to define the evolution of 3G networks.

Table 1: LTE-Advanced development history.

                                WCDMA      HSPA             HSPA+      LTE        LTE-A
                                (UMTS)     (HSDPA/HSUPA)
Max downlink speed (bps)        384 K      14 M             28 M       100 M      1 G
Max uplink speed (bps)          128 K      5.7 M            11 M       50 M       100 M
Latency, round trip (approx.)   150 ms     100 ms           50 ms max  ~10 ms     < 5 ms
3GPP releases                   Rel 99/4   Rel 5/6          Rel 7      Rel 8/9    Rel 10
Approx. years of initial        2003/4     2005/6 (HSDPA)   2008/9     2009/10    2011
rollout                                    2007/8 (HSUPA)
Access methodology              CDMA       CDMA             CDMA       OFDMA/     OFDMA/
                                                                       SC-FDMA    SC-FDMA

Some of the key LTE-Advanced requirements related to data rate, throughput, latency, and mobility are provided below [3]:


Peak data rate:

o 1 Gbps will be achieved by 4-by-4 MIMO and a transmission bandwidth wider than approximately 70 MHz.

Peak spectrum efficiency:

o DL: 30 bps/Hz in Rel 10 (Rel. 8 LTE already satisfies the IMT-Advanced requirement).

o UL: 15 bps/Hz in Rel 10 (needs to double from Release 8 to satisfy the IMT-Advanced requirement).

Capacity and cell-edge user throughput:

o The target for LTE-Advanced was set considering a gain of 1.4 to 1.6 over Release 8 LTE performance.

Spectrum flexibility:

In addition to the bands currently defined for LTE Release 8, TR 36.913 identifies the following new bands:

o 450–470 MHz band

o 698–862 MHz band

o 790–862 MHz band

o 2.3–2.4 GHz band

o 3.4–4.2 GHz band

o 4.4–4.99 GHz band

Some of these bands are now formally included in the 3GPP Release 9 and Release 10

specifications. Note that frequency bands are considered release-independent features, which

means that it is acceptable to deploy an earlier release product in a band not defined until a

later release. LTE-Advanced is designed to operate in spectrum allocations of different sizes,

including allocations wider than the 20 MHz in Release 8, in order to achieve higher

performance and target data rates. Although it is desirable to have bandwidths greater than 20

MHz deployed in adjacent spectrum, the limited availability of spectrum means that

aggregation from different bands is necessary to meet the higher bandwidth requirements.

This option has been allowed for in the IMT-Advanced specifications.

Mobility:

o E-UTRAN should be optimized for low mobile speeds from 0 to 15 km/h.

o Higher mobile speeds between 15 and 120 km/h should be supported with high performance.


o Mobility across the cellular network shall be maintained at speeds from 120 km/h

to 350 km/h (or even up to 500 km/h depending on the frequency band).

Coverage:

o The throughput, spectrum efficiency and mobility targets above should be met for 5 km cells, with a slight degradation for 30 km cells. Cell ranges up to 100 km should not be precluded.

o Available for paired and unpaired spectrum arrangements.

1.2. LTE Enabling Technologies:

LTE has introduced a number of new technologies compared to previous cellular systems. They enable LTE-Advanced to operate more efficiently with respect to the use of spectrum, and also to provide the much higher data rates that are required.

A major difference of LTE-Advanced in comparison to its 3GPP ancestors is the radio

interface; Orthogonal Frequency Division Multiple Access (OFDMA) and Single Carrier

Frequency Division Multiple Access (SC-FDMA) are used for the downlink and uplink

respectively, as radio access schemes [6].

1.2.1. Downlink OFDMA (Orthogonal Frequency Division Multiple Access):

OFDMA is a variant of OFDM (Orthogonal Frequency Division Multiplexing) and it is the

downlink access technology. One of the most important advantages is the intrinsic

orthogonality provided by OFDMA to the users within a cell, which translates into an almost

null level of intra-cell interference. Therefore, inter-cell interference is the limiting factor

when high reuse levels are intended. In this case, cell-edge users are especially susceptible to

the effects of inter-cell interference. OFDMA divides the wide available bandwidth into many

narrow and mutually orthogonal subcarriers and transmits the data in parallel streams. The

smallest transmission unit in the downlink LTE-Advanced system is known as a Physical

Resource Block (PRB).


Figure 1: Orthogonal Frequency Division Multiple Access [5].

A resource block contains 12 subcarriers, regardless of the overall LTE-Advanced signal bandwidth, and covers one slot in the time frame; this means that different LTE-Advanced signal bandwidths will have different numbers of resource blocks.

Table 2: Number of PRBs per channel bandwidth.

Channel bandwidth (MHz)   1.4   3    5    10   15   20
Number of PRBs              6   15   25   50   75  100

The OFDM signal used in LTE-Advanced comprises a maximum of 2048 different subcarriers with a spacing of 15 kHz. Although it is mandatory for the mobiles to be able to receive all 2048 subcarriers, not all need to be transmitted by the base station (eNodeB), which only needs to support the transmission of 72 subcarriers. In this way all mobiles are able to talk to any base station.

1.2.2. Uplink SC-FDMA (Single Carrier Frequency Division Multiple Access):

For the LTE-Advanced uplink, a different concept is used for the access technique. Although

still using a form of OFDMA technology, the implementation is called Single Carrier

Frequency Division Multiple Access (SC-FDMA). The main task of this scheme is to assign

communication resources to multiple users. The major difference to other schemes is that it


performs a DFT (Discrete Fourier Transform) operation on time-domain modulated data before

going into OFDM modulation.

One of the key parameters that affect all mobiles is that of battery life. Even though battery

performance is improving all the time, it is still necessary to ensure that the mobiles use as

little battery power as possible. With the RF power amplifier that transmits the radio frequency signal via the antenna to the base station being the highest-power item within the mobile, it is necessary that it operates in as efficient a mode as possible. This can be significantly affected by the form of radio frequency modulation and signal format. Signals that have a high peak-to-average ratio and require linear amplification do not lend themselves to the use of efficient RF power amplifiers [5].

1.2.3. LTE-A Channel Bandwidths and resource elements:

One of the key parameters associated with the use of OFDM within LTE-Advanced is the

choice of bandwidth. The available bandwidth influences a variety of decisions including the

number of carriers that can be accommodated in the OFDM signal and in turn this influences

elements including the symbol length and so forth [6].

LTE supports six channel bandwidths, and a higher bandwidth obviously yields a greater channel capacity: 1.4 MHz, 3 MHz, 5 MHz, 10 MHz, 15 MHz and 20 MHz.

In addition, the subcarriers are spaced 15 kHz apart from each other. To maintain orthogonality, this gives a symbol duration of 1/15 kHz ≈ 66.7 µs. Each subcarrier is able to carry data at a maximum rate of 15 ksps (kilosymbols per second). A 20 MHz system carries 100 PRBs of 12 subcarriers each, i.e. 1200 subcarriers, which gives a raw symbol rate of 18 Msps. In turn this provides a raw data rate of 108 Mbps, as each symbol using 64QAM represents six bits.
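The figures quoted above can be checked directly. The following snippet, a numerical check only (not part of any simulator described in this report), recomputes them from the PRB counts of Table 2.

```python
# Recomputing the raw-rate figures quoted above for the 20 MHz case.
PRBS = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}      # Table 2
SUBCARRIERS_PER_PRB = 12
SYMBOL_RATE_PER_SUBCARRIER = 15e3       # 15 ksps (from the 15 kHz spacing)
BITS_PER_64QAM_SYMBOL = 6

subcarriers = PRBS[20] * SUBCARRIERS_PER_PRB                # 1200 subcarriers
raw_symbol_rate = subcarriers * SYMBOL_RATE_PER_SUBCARRIER  # 18 Msps
raw_data_rate = raw_symbol_rate * BITS_PER_64QAM_SYMBOL     # 108 Mbps
print(f"{subcarriers} subcarriers -> {raw_symbol_rate/1e6:.0f} Msps -> "
      f"{raw_data_rate/1e6:.0f} Mbps raw")
```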

1.3. LTE-Advanced Network Architecture:

LTE-A has been designed to support only packet-switched services, in contrast to the circuit-switched model of previous cellular systems. It aims to provide seamless Internet Protocol (IP) connectivity between User Equipment (UE) and the Packet Data Network (PDN), without any disruption to the end users' applications during mobility [2].

While the term "LTE" encompasses the evolution of the Universal Mobile Telecommunications System (UMTS) radio access through the Evolved UTRAN (E-UTRAN), it is accompanied by an evolution of the non-radio aspects under the term "System Architecture Evolution" (SAE).


Together LTE-Advanced and SAE comprise the Evolved Packet System (EPS). This EPS in

turn includes the EPC (Evolved Packet Core) on the core side and E-UTRAN (Evolved

UMTS Terrestrial Radio Access Network) on the access side [2].

In addition to these two components, the User Equipment (UE) and the Services Domain are also very important subsystems of the LTE architecture.

1.3.1. The Core Network: Evolved Packet Core (EPC):

The core network is responsible for the overall control of the UE and establishment of the

bearers. The Evolved Packet Core is the main element of the LTE-Advanced SAE network.

This consists of four main elements and connects to the eNodeBs as shown in the diagram

below.

Figure 2: LTE-Advanced SAE Evolved Packet Core [6].

Mobility Management Entity (MME):

The MME is the main control node for the LTE SAE access network, handling a number of

features; it can therefore be seen that the SAE MME provides a considerable level of overall

control functionality. The protocols running between the UE and the CN are known as the

Non Access Stratum (NAS) protocols. The main functions supported by the MME can be

classified as:

Functions related to bearer management โ€“ This includes the establishment,

maintenance and release of the bearers and is handled by the session management

layer in the NAS protocol.


Functions related to connection management โ€“ This includes the establishment of

the connection and security between the network and UE and is handled by the

connection or mobility management layer in the NAS protocol layer.

Serving Gateway (SGW):

The Serving Gateway, SGW, is a data plane element within the LTE SAE. Its main purpose is

to manage user-plane mobility, and it also acts as the main border between the Radio Access Network (RAN) and the core network. The SGW also maintains the data paths between the eNodeBs and the PDN Gateways. In this way the SGW forms an interface for the data

packet network at the E-UTRAN.

PDN Gateway (PGW):

The LTE SAE PDN (Packet Data Network) gateway provides connectivity for the UE to

external packet data networks, fulfilling the function of entry and exit point for UE data. The

UE may have connectivity with more than one PGW for accessing multiple PDNs.

Home Subscription Server (HSS):

The HSS is a database server which is located in the operator's premises. All the user

subscription information is stored in the HSS. The HSS also contains the records of the user

location and has the original copy of the user subscription profile. The HSS interacts with the MME, and it needs to be connected to all the MMEs in the network that control the UE.

1.3.2. The Access Network E-UTRAN:

The E-UTRAN is the Access Network of LTE and simply consists of a network of eNodeBs that are connected to each other via the X2 interface, as illustrated in Figure 3. The eNodeBs are also connected to the EPC via the S1 interface, more specifically to the MME by means of the S1-MME interface and to the S-GW by means of the S1-U interface.

Figure 3: E-UTRAN Architecture [9].


eNodeB:

The eNodeB is a radio base station of a LTE network that controls all radio-related functions

in the fixed part of the system. These radio base stations are distributed throughout the

coverage region and each of them is placed near a radio antenna. One of the biggest differences between the LTE network and legacy 3G mobile communication systems lies in the base station.

Practically, an eNodeB provides bridging between the UE and EPC. All the radio protocols

that are used in the access link are terminated in the eNodeB. The eNodeB does

ciphering/deciphering in the user plane as well as IP header compression/decompression. The

eNodeB also has some responsibilities in the control plane such as radio resource

management and performing control over the usage of radio resources.

The E-UTRAN has many responsibilities regarding all radio-related functions. The main features that it supports are the following:

Radio Resource Management:

The objective of RRM is to make mobility feasible in cellular wireless networks, so that the network, with the help of the UE, takes care of mobility without user intervention. RRM

covers all functions related to the radio bearers, such as radio bearer control, radio admission

control, radio mobility control, scheduling and dynamic allocation of resources to UEs in both

uplink and downlink.

IP Header Compression:

This helps to ensure efficient use of the radio interface by compressing the IP packet headers

which could otherwise represent a significant overhead, especially for small packets such as

VoIP.

One of the main functions of PDCP (Packet Data Convergence Protocol) is header

compression using the Robust Header Compression (ROHC) protocol defined by the IETF. In

LTE, header compression is very important because there is no support for the transport of

voice services via the Circuit-Switched (CS) domain.

Security:

Security is a very important feature of all 3GPP radio access technologies. LTE provides

security in a similar way to its predecessors UMTS and GSM. Because of the sensitivity of

signaling messages exchanged between the eNodeB and the terminal, or between the MME and the terminal, all of this information is protected against eavesdropping and alteration.


The implementation of security architecture of LTE is carried out by two functions: Ciphering

of both control plane (RRC) data and user plane data, and Integrity Protection which is used

for control plane (RRC) data only. Ciphering is used in order to protect the data streams from

being received by a third party, while Integrity Protection allows the receiver to detect packet

insertion or replacement. RRC always activates both functions together, either following

connection establishment or as part of the handover to LTE.

Connectivity to the EPC:

This function consists of the signaling towards the MME and the bearer path towards the S-GW. All of the above-mentioned functions are concentrated in the eNodeB, as in LTE all the radio controller functions are gathered in the eNodeB. This concentration helps the different protocol layers interact with each other better, resulting in decreased latency and increased efficiency.

On the network side, all of these functions reside in the eNodeBs, each of which can be

responsible for managing multiple cells. Unlike some of the previous second and third

generation technologies, LTE integrates the radio controller function into the eNodeB. This

allows tight interaction between the different protocol layers of the radio access network

(RAN), thus reducing latency and improving efficiency. Furthermore, as LTE does not

support soft handover there is no need for a centralized data-combining function in the

network. One consequence of the lack of a centralized controller node is that, as the UE

moves, the network must transfer all information related to a UE, that is, the UE context,

together with any buffered data, from one eNodeB to another. Mechanisms are therefore

needed to avoid data loss during handover.

Figure 4: Functional Split between E-UTRAN and EPC [5].


1.3.3. The User Equipment (UE):

The end user communicates using a UE. The UE can be a handheld device like a smart phone

or it can be a device which is embedded in a laptop. The UE is divided into two parts: the

Universal Subscriber Identity Module (USIM) and the rest of the UE, which is called

Terminal Equipment (TE).

The USIM is an application with the purpose of identification and authentication of the user

for obtaining security keys. This application is placed into a removable smart card called a

universal integrated circuit card (UICC).

The UE in general is the end-user platform that, by signaling with the network, sets up, maintains, and removes the necessary communication links. The UE also assists in the handover procedure and sends reports about the terminal location to the network.

1.4. E-UTRAN Network Interfaces:

There are two interfaces involved in the LTE handover procedure for UEs in active mode: the X2 and S1 interfaces. Both can be used in handover procedures, but with different purposes.

1.4.1. X2 Interface:

The X2 interface has a key role in the intra-LTE handover operation. The source eNodeB will

use the X2 interface to send the Handover Request message to the target eNodeB. If the X2

interface does not exist between the two eNodeBs in question, then procedures need to be

initiated to set one up before handover can be achieved [3].

The Handover Request message prompts the target eNodeB to reserve resources, and it will send the Handover Request Acknowledgement message if resources are found.

There are different information elements provided (some optional) in the Handover Request message, such as:

Requested SAE bearers to be handed over.

Handover restrictions list, which may restrict following handovers for the UE.


Last visited cells the UE has been connected to, if the UE historical-information collection functionality is enabled. This has been considered useful in avoiding ping-pong effects between different cells, since the target eNodeB is given information on how the serving eNodeB has been changing in the past; actions can thus be taken to limit frequent handovers and the associated X2 user-plane forwarding (see the sketch after Figure 5).

Figure 5: Protocol stack for the user plane and control plane at the X2 interface [3].
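To make the listed information elements concrete, here is a rough record-type sketch; the field names and types are assumptions for illustration, not the ASN.1 definitions from the X2AP specification (TS 36.423).

```python
# Illustrative shape of the X2 Handover Request IEs listed above.
# Field names/types are assumptions, not 3GPP X2AP definitions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HandoverRequest:
    ue_context_id: int                            # identifies the UE at the source eNodeB
    bearers_to_handover: List[int]                # requested SAE bearers
    restriction_list: Optional[List[str]] = None  # may restrict subsequent handovers
    last_visited_cells: List[str] = field(default_factory=list)  # UE history (ping-pong avoidance)
```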

1.4.2. S1 Interface:

The radio network signaling over S1 consists of the S1 Application Part (S1AP). The S1AP protocol handles all procedures between the EPC and E-UTRAN. It is also capable of carrying messages transparently between the EPC and the UE. Over the S1 interface, the S1AP protocol primarily supports general E-UTRAN procedures from the EPC, transfers transparent non-access signaling and performs the mobility function. The figure below shows the protocol stack for the user plane and control plane at the S1 interface [3].

Figure 6: Protocol stack for the user-plane and control-plane at S1 interface [3].


1.5. LTE Protocol Architecture:

The overall radio interface protocol architecture for LTE can be divided into User Plane Protocols and Control Plane Protocols. The E-UTRAN protocol stack is depicted in Figure 7.

Figure 7: E-UTRAN Protocol Stack [8].

1.5.1. User Plane:

An IP packet is tunneled between the P-GW and the eNodeB to be transmitted towards the

UE. Different tunneling protocols can be used. The tunneling protocol used by 3GPP is called

the GPRS tunneling protocol (GTP) [8].

The LTE Layer 2 user-plane protocol stack is composed of three sub layers: Packet Data

Convergence Protocol (PDCP), Radio Link Control (RLC) and Medium Access Control

(MAC). These sub layers are terminated in the eNodeB on the network side.

1.5.2. Control Plane:

The control plane and user plane share common protocols which perform the same functions, except that there is no header compression for the control-plane protocols. In the access stratum protocol stack, above the PDCP, there is the Radio Resource Control (RRC) protocol, which is considered a "Layer 3" protocol. RRC sends signaling messages between


the eNodeB and UE for establishing and configuring the radio bearers of all lower layers in

the access stratum.

1.5.2.1. Radio Resource Control (RRC):

The RRC (Radio Resource Control) layer is a key signaling protocol which supports many

functions between the terminal and the eNodeB. The RRC protocol enables the transfer of

common NAS information which is applicable to all UEs as well as dedicated NAS

information which is applicable only to a specific UE. In addition, for UEs in RRC_IDLE,

RRC supports notification of incoming calls.

The key features of RRC are the following:

Broadcast of System Information: Handles the broadcasting of system

information, which includes NAS common information. Some of the system

information is applicable only for UEs in RRC-IDLE, while other system

information is also applicable for UEs in RRC-CONNECTED.

RRC Connection Management: Covers all procedures related to the

establishment, modification and release of an RRC connection, including paging,

initial security activation, establishment of Signaling Radio Bearers (SRBs) and of radio bearers carrying user data (Data Radio Bearers, DRBs), handover within LTE

(including transfer of UE RRC context information), configuration of the lower

protocol layers, access class barring and radio link failure.

Establishment and release of radio resources: This relates to the allocation of resources

for the transport of signaling messages or user data between the terminal and eNodeB.

Paging: this is performed through the PCCH logical control channel. The prominent usage of paging is to page the UEs that are in RRC-IDLE. Paging can also be used to notify UEs in both RRC-IDLE and RRC-CONNECTED modes about system information changes or SIB10 and SIB11 transfers.

Transmission of signaling messages to and from the EPC: these messages (known as

NAS for Non Access Stratum) are transferred to and from the terminal via the RRC;

they are, however, treated by RRC as transparent messages.

Handover: the handover is triggered by the eNodeB, based on the received

measurement reports from the UE. Handover is classified into different types based on the origin and destination of the handover. The handover can start and end in the E-UTRAN, it can start in the E-UTRAN and end in another Radio Access Technology

(RAT), or it can start from another RAT and end in E-UTRAN.

The RRC also supports a set of functions related to end-user mobility for terminals in

RRC Connected state. This includes:

Measurement control: This refers to the configuration of measurements to be

performed by the terminal as well as the method to report them to the eNodeB.

Support of inter-cell mobility procedures, also known as handover.

User context transfer between eNodeBs at handover.

1.5.2.2. Radio Resource Control States:

The main function of the RRC protocol is to manage the connection between the terminal and

the E-UTRAN access network. To achieve this, RRC protocol states have been defined; they are depicted in the figure below. Each of them corresponds to a state of the connection and describes how the network and the terminal shall handle special functions like terminal mobility, paging message processing and network system information broadcasting [14].

In E-UTRAN, the RRC state machine is very simple and limited to two states only: RRC-IDLE and RRC-CONNECTED.

Figure 8: The RRC States [14]

In the RRC-IDLE state, there is no connection between the terminal and the eNodeB,

meaning that the terminal is actually not known by the E-UTRAN Access Network. The

terminal user is inactive from an application level perspective, which does not mean at all that

nothing happens at the radio interface level. Nevertheless, the terminal behavior is specified in

order to save as much battery power as possible and is actually limited to three main items:

Periodic decoding of System Information Broadcast by E-UTRAN: this process is

required in case the information is dynamically updated by the network.


Decoding of paging messages: so that the terminal can further connect to the network

in case of an incoming session.

Cell reselection: the terminal periodically evaluates the best cell it should camp on

through its own radio measurements and based on network System Information

parameters. When the reselection condition is met, the terminal autonomously selects a new serving cell.

In the RRC-CONNECTED state, there is an active connection between the terminal and the

eNodeB, which implies a communication context being stored within the eNodeB for this

terminal. Both sides can exchange user data and/or signaling messages over logical channels. Unlike in the RRC-IDLE state, the terminal location is known at the cell level. Terminal mobility is under the control of the network using the handover procedure, whose decision is based on many possible criteria, including measurements reported by the terminal or by the physical layer of the eNodeB itself.
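As a compact illustration of the two-state machine described in this section, the sketch below models RRC-IDLE and RRC-CONNECTED and the transitions between them; the method names are illustrative, not 3GPP procedure names.

```python
# Two-state RRC machine as described above; transition names are illustrative.
from enum import Enum

class RRCState(Enum):
    IDLE = "RRC-IDLE"            # UE not known to E-UTRAN; cell reselection, paging decode
    CONNECTED = "RRC-CONNECTED"  # UE context stored in the eNodeB; network-controlled handover

class UEConnection:
    def __init__(self):
        self.state = RRCState.IDLE

    def establish(self):
        """Triggered e.g. by an incoming session (paging) or uplink activity."""
        self.state = RRCState.CONNECTED

    def release(self):
        """Return to idle; the terminal location is then no longer known at cell level."""
        self.state = RRCState.IDLE
```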

1.6. Self-Organizing Networks:

A Self-Organizing Network (SON) is an automation technology designed to make the planning, configuration, management, optimization and healing of mobile radio access networks simpler and faster. SON functionality and behavior have been defined and specified in generally accepted mobile industry recommendations produced by organizations such as 3GPP and the NGMN.

SON has been codified within 3GPP Release 8 and subsequent specifications in a series of

standards including 36.902, as well as public white papers outlining use cases from the

NGMN. The first technology making use of SON features is Long Term Evolution (LTE), but the technology has also been retrofitted to older radio access technologies such as the Universal Mobile Telecommunications System (UMTS).

supports SON features like Automatic Neighbor Relation (ANR) detection, which is the 3GPP

LTE Rel. 8 flagship feature.

Newly added base stations should be self-configured in line with a "plug-and-play" paradigm,

while all operational base stations will regularly self-optimize parameters and algorithmic

behavior in response to observed network performance and radio conditions. Furthermore,

self-healing mechanisms can be triggered to temporarily compensate for a detected equipment

outage, while awaiting a more permanent solution.

Self-organizing network functionalities are commonly divided into three major sub-functional

groups, each containing a wide range of decomposed use cases:


Self-configuration functions:

Self-configuration strives towards the "plug-and-play" paradigm in the way that new base

stations shall automatically be configured and integrated into the network. This means that both connectivity establishment and the download of configuration parameters are automated in software. Self-configuration is typically supplied as part of the software delivery with each radio cell by equipment vendors. When a new base station is introduced into the network and powered on, it is immediately recognized and registered by the network. The neighboring base stations then automatically adjust their technical parameters (such as emission power, antenna tilt, etc.) in order to provide the required coverage and capacity and, at the same time, avoid interference.

Self-optimization functions:

Every base station contains hundreds of configuration parameters that control various aspects

of the cell site. Each of these can be altered to change network behavior, based on

observations of both the base station itself, and measurements at the mobile station or handset.

One of the first SON features establishes neighbor relations automatically (ANR), while

others optimize random access parameters or mobility robustness in terms of handover

oscillations. A very illustrative use case is the automatic switch-off of a percent of base

stations during the night hours. The neighboring base stations would then reconfigure their parameters in order to keep the entire area covered by the signal. In case of a sudden growth in

connectivity demand for any reason, the "sleeping" base stations "wake up" almost

instantaneously. This mechanism leads to significant energy savings for operators.

Self-healing functions:

When some nodes in the network become inoperative, self-healing mechanisms aim at

reducing the impacts from the failure, for example by adjusting parameters and algorithms in

adjacent cells so that other nodes can support the users that were supported by the failing

node. In legacy networks, failing base stations are at times hard to identify, and a significant amount of time and resources is required to fix them. This SON function permits spotting such failing base stations immediately in order to take further measures and ensure no, or only insignificant, degradation of service for the users.


Table 3: Operational benefits of SON.

Self-Configuration   Flexibility in logistics (eNodeB not site-specific).
                     Reduced site/parameter planning.
                     Simplified installation; less prone to errors.
                     No/minimum drive tests.
                     Faster rollout.

Self-Optimization    Increased network quality and performance.
                     Parameter optimization; reduced maintenance and site visits.

Self-Healing         Error self-detection and mitigation.
                     Sped-up maintenance.
                     Reduced outage time.

Conclusion:

In LTE-Advanced the focus is on higher capacity, which was the driving force behind further developing LTE towards LTE-Advanced. LTE-Advanced provides higher bitrates in a cost-efficient way and, at the same time, completely fulfills the requirements set by the ITU for IMT-Advanced, also referred to as 4G. In the next chapter, we will pay particular attention to handover.


Chapter II: Handover in LTE-Advanced

Introduction:

Mobility is an essential component of mobile cellular communication systems because it

offers clear benefits to the end users: low-delay services such as voice or real-time video connections can be maintained while moving, even on high-speed trains.

Handover is one of the key procedures for ensuring that users can move freely through the network while still being connected and offered quality services. Since its success

rate is a key indicator of user satisfaction, it is vital that this procedure happens as fast and as

seamlessly as possible. Hence, optimizing the handover procedure to get the required

performance is considered an important issue in LTE networks.

In this context, we study in this chapter the handover, its characteristics and its different types.

2.1. Handover Definition and Characteristics:

The process of handover is very important in mobile telecommunications. It involves moving

the resource allocation for a mobile phone or a piece of UE from one base station to another.

This process is used to provide better Quality-of-Service (QoS) to customers by allowing

them to continue to use provided services even after moving out of range of the original

serving base station. It is important that handovers are performed quickly, cause little-to-no

disruption to the user's experience, and are completed with a very high success rate. If a handover is unsuccessful, it is likely that an ongoing call will be dropped, either because there are not enough resources available on a base station (known as an eNodeB in LTE) or because the Received Signal Strength (RSS) at the UE drops below a certain threshold needed to maintain the call. This threshold, in LTE, is known as the noise floor and has a value of -97.5 dB. Handovers are stated to take roughly 0.25 seconds to complete after the decision has been made for a handover to take place [17].

Depending on the required QoS, a seamless handover or a lossless handover is performed as

appropriate for each radio bearer. The descriptions of each of them are presented below.

2.1.1. Seamless Handover:

The objective of seamless handover is to provide a given QoS when the UE moves from the

coverage of one cell to the coverage of another cell. In LTE seamless handover is applied to


all radio bearers carrying control plane data and to user plane radio bearers mapped on RLC-UM. These types of data are typically reasonably tolerant of losses but less tolerant of delay (e.g. voice services). Therefore seamless handover should minimize complexity and delay, although some SDUs might be lost [4].

In the seamless handover, PDCP entities, including the header compression contexts, are reset, and the COUNT values are set to zero. As a new key is in any case generated at handover, there is no security reason to keep the COUNT values. On the UE side, all the PDCP SDUs that have not yet been transmitted will be sent to the target cell after the handover. On the network side, PDCP SDUs for which transmission has not yet started can be forwarded via the X2 interface towards the target eNodeB. Unacknowledged PDCP SDUs will be lost. This minimizes the handover complexity because no context (i.e. configuration information) has to be transferred between the source and the target eNodeB.

2.1.2. Lossless Handover:

Lossless handover means that no data should be lost during handover. This is achieved by

performing retransmission of PDCP PDUs for which reception has not been acknowledged by

the UE before the UE detaches from the source cell to make a handover. In lossless handover,

in-sequence delivery during handover can be ensured by using PDCP Data PDUs sequence

numbers. Lossless handover is particularly suitable for delay-tolerant services such as file downloads, where the loss of PDCP SDUs can drastically decrease the data rate because of the TCP reaction.

Lossless handover is applied for user plane and for some control plane radio bearers that are

mapped on RLC-AM. In lossless handover, on the UE side the header compression protocol is

reset because its context is not forwarded from the source eNodeB to the target eNodeB, but

the PDCP SDUs' sequence numbers and the COUNT values are not reset [4]. To ensure

lossless handover in the uplink, the PDCP PDUs stored in the PDCP retransmission buffer are retransmitted by the RLC protocol based on the PDCP SNs, which are maintained during the handover, and are delivered to the gateway in the correct sequence.

In order to ensure lossless handover in the downlink, the source eNodeB

forwards the uncompressed PDCP SDUs for which reception has not yet been

acknowledged by the UE to the target eNodeB for retransmission in the

downlink.


2.2. Types of Handover:

The handover is triggered by the eNodeB, based on the received measurement reports from

the UE. Handover is classified into different types based on the origination and destination of

the handover. The handover can start and end in the E-UTRAN, it can start in the E-UTRAN

and end in another Radio Access Technology (RAT), or it can start from another RAT and

end in E-UTRAN [15].

Handover is classified as:

Intra-frequency intra-LTE handover.

Inter-frequency intra-LTE handover.

Inter-RAT towards LTE handover.

Inter-RAT towards UTRAN handover.

Inter-RAT towards GERAN handover.

Inter-RAT towards cdma2000 system handover.

2.2.1. Intra LTE Handover: Horizontal Handover:

In intra-LTE handover, which is the focus of this project, both the originating and destination eNodeBs are within the LTE system. In this type of handover, the RRC Connection Reconfiguration message acts as the handover command. The interface between eNodeBs is the X2 interface. Upon handover, the source eNodeB sends an X2 handover request message to the target eNodeB in order to make it ready for the coming handover.

2.2.2. Vertical Handover:

Tremendous breakthroughs have been recorded in the evolution of wireless communication networks over the last decade. The complex nature of the wireless environment makes it difficult, if not impossible, for any single network to efficiently provide users with high data rates and good Quality of Service (QoS). In trying to meet these demands, fourth generation (4G) wireless systems combine heterogeneous wireless technologies to allow users to stay connected anywhere and at all times. The heterogeneity of the wireless networks involves the integration of diverse radio access technologies (RATs) such as LTE/LTE-Advanced, UMTS, HSPA, GPRS, GSM, WiMAX and WiFi. The purpose of integrating these independent networks is to meet the demand for the high data rates and QoS needed to support multimedia streaming.


Consequently, the issues of seamless handover, high QoS support, resource allocation, mobility management and security must be appropriately addressed before these requirements can be achieved. As one of the strategies for achieving this purpose, the handover mechanism is introduced; it can be defined as the process of reassigning resources as a result of mobile User Equipment (UE) movement. An intra-technology handover, based mainly on received signal strength (RSS) levels, is known as a Horizontal Handover (HHO) and occurs when the UE switches access points (APs) or eNodeBs while remaining on the same network. On the other hand, a UE switching its connection to a network of a different technology performs what is termed a Vertical Handover (VHO). This has become possible because of the emergence of a multitude of overlapping wireless networks, which makes the handover process more complex.

2.3. Handover Techniques:

Handover can be categorized as soft handover and hard handover, also known as Make-Before-Break and Break-Before-Make respectively.

2.3.1. Soft handover, Make-Before-Break:

Soft handover is a category of handover procedures where the radio links are added and

abandoned in such manner that the UE always keeps at least one radio link to the UTRAN.

Soft and softer handover were introduced in WCDMA architecture. There is a centralized

controller called Radio Network Controller (RNC) to perform handover control for each UE

in the architecture of WCDMA. It is possible for a UE to simultaneously connect to two or

more cells (or cell sectors) during a call. If the cells the UE is connected to belong to the same physical site, this is referred to as softer handover [10]. From a handover perspective, soft handover is suitable for maintaining an active session, preventing voice calls from dropping, and resetting a packet session. However, soft handover requires much more complicated signaling, procedures and system architecture, such as in the WCDMA network.

2.3.2. Hard handover, Break-Before-Make:

Hard handover is a category of handover procedures where all the old radio links in the UE

are abandoned before the new radio links are established. The hard handover is commonly

used when dealing with handovers in the legacy wireless systems. The hard handover requires


a user to break the existing connection with the current cell (source cell) and make a new

connection to the target cell [10].

In LTE only hard handover is supported, meaning that there is a short interruption in service

when the handover is performed.

2.4. Handover Procedure:

Depending on whether any EPC entity is involved in preparing and executing a handover between a source eNodeB and a target eNodeB, an LTE handover is either an X2 handover, using the X2 interface, or an S1 handover, using the S1 interface. Figure 9 shows how a source eNodeB decides on the handover type, X2 or S1, when a handover is triggered.

Figure 9: Decision on Handover Type.

The handover procedure in LTE can be divided into three phases: handover preparation, handover execution and handover completion [4]. The procedure starts with the measurement reporting of a handover event by the User Equipment (UE) to the serving evolved Node B (eNodeB). The Evolved Packet Core (EPC) is not involved in the control plane handling of the handover procedure, i.e. preparation messages are exchanged directly between the eNodeBs [1]. That is the case when the X2 interface is deployed; otherwise the MME is used for HO signaling.

The handover procedure with the basic handover scenario is depicted in Figure 10.


Figure 10: Intra-MME/Serving Gateway handover [9].


Handover preparation:

During handover preparation, data flows between the UE and the core network as usual. This phase includes messaging such as measurement control, which defines the UE measurement parameters, and the measurement report sent accordingly once the triggering criteria are satisfied. The handover decision is then made at the serving eNodeB, which requests a handover to the target cell; the target cell performs admission control. The handover request is then acknowledged by the target eNodeB.

Handover execution:

Handover execution phase is started when the source eNodeB sends a handover command to

UE. During this phase, data is forwarded from the source to the target eNodeB, which buffers

the packets. UE then needs to synchronize to the target cell and perform a random access to

the target cell to obtain UL allocation and timing advance as well as other necessary

parameters. Finally, the UE sends a handover confirm message to the target eNodeB after

which the target eNodeB can start sending the forwarded data to the UE [1].

Handover completion:

In the final phase, the target eNodeB informs the MME that the user plane path has changed.

S-GW is then notified to update the user plane path. At this point, the data starts flowing on

the new path to the target eNodeB. Finally all radio and control plane resources are released in

the source eNodeB.

A more detailed description of the intra-MME/Serving Gateway HO procedure is given

below:

1. Based on the area restriction information, the source eNodeB configures the UE

measurement procedure.

2. MEASUREMENT REPORT is sent by the UE after it is triggered based on some

rules.

3. The decision for handover is taken by the source eNodeB based on

MEASUREMENT REPORT and RRM information.

4. HANDOVER REQUEST message is sent to the target eNodeB by the source eNodeB

containing all the necessary information to prepare the HO at the target side.

5. Admission control is performed by the target eNodeB based on the received E-RAB QoS information, in order to increase the likelihood of a successful HO; the target eNodeB decides whether the resources can be granted or not. In case the resources can be granted, the target eNodeB configures the required resources according to the received E-RAB QoS information, then reserves a Cell Radio Network Temporary Identifier (C-RNTI) and a RACH preamble for the UE.


6. The target eNodeB prepares HO and then sends the HANDOVER REQUEST

ACKNOWLEDGE to the source eNodeB. There is a transparent container in the

HANDOVER REQUEST ACKNOWLEDGE message which is aimed to be sent to

the UE as an RRC message for performing the handover. The container includes a new

C-RNTI, target eNodeB security algorithm identifiers for the selected security

algorithms, may include a dedicated RACH preamble, and possibly some other

parameters like RNL/TNL information for the forwarding tunnels. If there is a need

for data forwarding, the source eNodeB can start forwarding the data to the target

eNodeB as soon as it sends the handover command towards the UE.

Steps 7 to 16 are designed to avoid data loss during HO:

7. To perform the handover the target eNodeB generates the RRC message, i.e. RRC

Connection Reconfiguration message including the mobility Control Information. This

message is sent towards the UE by the source eNodeB.

8. The SN STATUS TRANSFER message is sent by the source eNodeB to the target

eNodeB. In that message, the information about uplink PDCP SN receiver status and

the downlink PDCP SN transmitter status of E-RABs are provided. The PDCP SN of

the first missing UL SDU is included in the uplink PDCP SN receiver status. The next

PDCP SN that the target eNodeB shall assign to the new SDUs is indicated by the

downlink PDCP SN transmitter status.

At this point, data forwarding of user plane downlink packets can use either a

โ€œseamless modeโ€ minimizing the interruption time during the move of the UE, or a

โ€œlossless modeโ€ not tolerating packet loss at all. The source eNodeB may decide to

operate one of these two modes on a per EPS bearer basis, based on the QoS received

over X2 for this bearer.

9. After reception of the RRC Connection Reconfiguration message including the

mobility Control Information by the UE, the UE tries to perform synchronization to

the target eNodeB and to access the target cell via RACH. If a dedicated RACH

preamble was assigned for the UE, it can use a contention free procedure; otherwise it

shall use a contention based procedure. In the sense of security, the target eNodeB

specific keys are derived by the UE and the selected security algorithms are

configured to be used in the target cell.

10. The target eNodeB responds with the uplink allocation and timing advance.

11. After the UE has successfully accessed the target cell, it sends the RRC Connection Reconfiguration Complete message to confirm the handover. The C-RNTI sent in


the RRC Connection Reconfiguration Complete message is verified by the target

eNodeB and afterwards the target eNodeB can now begin sending data to the UE.

12. A PATH SWITCH message is sent to MME by the target eNodeB to inform that the

UE has changed cell.

13. UPDATE USER PLANE REQUEST message is sent by the MME to the Serving

Gateway.

14. The Serving Gateway switches the downlink data path to the target eNodeB and sends one or more "end marker" packets on the old path to the source eNodeB to indicate that no more packets will be transmitted on this path. Then the U-plane/TNL resources towards the source eNodeB can be released.

15. An UPDATE USER PLANE RESPONSE message is sent to the MME by the Serving

Gateway.

16. The MME sends the PATH SWITCH ACKNOWLEDGE message to confirm the

PATH SWITCH message.

17. The target eNodeB sends UE CONTEXT RELEASE to the source eNodeB to inform

the success of handover to it. The target eNodeB sends this message to the source

eNodeB after the PATH SWITCH ACKNOWLEDGE is received by the target

eNodeB from the MME.

18. After the source eNodeB receives the UE CONTEXT RELEASE message, it can

release the radio and C-plane related resources. If there is ongoing data forwarding it

can continue.

Figure 11: Handover Timing [8]


2.5. Handover Measurements:

The handover procedure in LTE-Advanced, which is a part of the RRM, is based on the UEโ€™s

measurements. Handover decisions are usually based on the downlink channel measurements

which consist of Reference Signal Received Power (RSRP) and Reference Signal Received

Quality (RSRQ) made in the UE and sent to the eNodeB regularly [12]. The descriptions of

each of them are presented below:

Reference Signal Received Power (RSRP):

The RSRP measurement provides cell-specific signal strength metric. This measurement is

used mainly to rank different LTE-Advanced candidate cells according to their signal strength

and is used as an input for handover and cell reselection decisions. RSRP is defined for a

specific cell as the linear average received power (in Watts) of the signals that carry cell-

specific Reference Signals (RS) within the considered measurement frequency bandwidth [4].

Reference Signal Received Quality (RSRQ):

This measurement is intended to provide a cell-specific signal quality metric. Similarly to

RSRP, this metric is used mainly to rank different LTE candidate cells according to their

signal quality. This measurement is used as an input for handover and cell reselection

decisions, for example in scenarios for which RSRP measurements do not provide sufficient

information to perform reliable mobility decisions.

The RSRQ is defined as:

RSRQ = (N × RSRP) / RSSI    (1)

Where N is the number of Resource Blocks (RBs) of the LTE-Advanced carrier RSSI

measurement bandwidth. The measurements in the numerator and denominator are made over

the same set of resource blocks. While RSRP is an indicator of the wanted signal strength,

RSRQ additionally takes the interference level into account due to the inclusion of RSSI.

RSRQ therefore enables the combined effect of signal strength and interference to be reported

in an efficient way [4].
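As a quick worked example of Equation 1 in the dB domain (the numbers below are illustrative assumptions, not measurements from this work): for a 10 MHz carrier with N = 50 resource blocks, an RSRP of -90 dBm and an RSSI of -63 dBm,

RSRQ_dB = 10·log10(N) + RSRP_dBm - RSSI_dBm = 10·log10(50) + (-90) - (-63) ≈ -10 dB

which is a plausible mid-range RSRQ value.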

Besides RSRP/RSRQ, handover technology has other decision criteria, such as:

Signal Noise Ratio (SNR):

The SNR is a measurement that compares the level of a desired signal to the level of

background noise (unwanted signal). It is defined as the ratio of signal power and the noise

power. A ratio higher than 1:1 indicates more signal than noise.

๐‘†๐‘๐‘… = ๐‘ƒ๐‘ ๐‘–๐‘”๐‘›๐‘Ž๐‘™

๐‘ƒ๐‘›๐‘œ๐‘–๐‘ ๐‘’ (2)


Where P is average power. Both signal and noise power must be measured at the same or

equivalent points in a system, and within the same system bandwidth [16].

Carrier-to-Interference Ratio (CIR):

CIR expressed in decibels (dB) is a measurement of signaling effectiveness and it is defined

as the ratio of the power in the carrier to the power of the interference signal.

Signal Interference plus Noise Ratio (SINR):

This metric is used to optimize the transmit power level for a target quality of service

assisting with handover decisions. Accurate SINR estimation provides a more efficient system

and a higher user-perceived quality of service.

SINR is defined as the ratio of signal power to the combined noise and interference power:

๐‘†๐ผ๐‘๐‘… = ๐‘ƒ๐‘ ๐‘–๐‘”๐‘›๐‘Ž๐‘™

๐‘ƒ๐‘›๐‘œ๐‘–๐‘ ๐‘’+ ๐‘ƒ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘“๐‘’๐‘Ÿ๐‘’๐‘›๐‘๐‘’ (3)

Where P is the averaged power, values are commonly quoted in dB.
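For instance (illustrative numbers only): with a signal at -90 dBm and noise and interference each at -100 dBm, the two equal impairment powers sum to -97 dBm, so Equation 3 gives

SINR_dB = -90 dBm - (-97 dBm) = 7 dB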

Received Signal Strength Indicator (RSSI):

The LTE carrier RSSI is defined as the total received wideband power observed by the UE

from all sources, including co-channel serving and non-serving cells, adjacent channel

interference and thermal noise within the measurement bandwidth specified by the 3GPP.

LTE-Advanced carrier RSSI is not reported as a measurement in its own right, but is used as

an input to the LTE-Advanced RSRQ measurement [4].

As mentioned earlier, handover measurements in LTE-Advanced are done at the downlink

reference symbols in the frame structure as shown in Figure 12. However, handover decision

can also be based on the uplink measurements. This study focuses on downlink handover

measurements.


Figure 12: Downlink reference signal structure for LTE-Advanced.

The averaging of fast fading over all the reference symbols is done at Layer 1 and hence is

called L1 filtering (Figure 13). The use of scalable bandwidth in LTE allows doing the

handover measurement on different bandwidth.

Figure 13: Handover measurement filtering and reporting [10].

2.6. Handover Parameters:

The handover procedure has different parameters which are used to enhance its performance

and setting these parameters to the optimal values is a very important task. In LTE the

triggering of handover is usually based on measurement of link quality and some other

parameters in order to improve the performance. The most important ones include [13]:

Handover initiation threshold level RSRP and RSRQ:

This level is used for handover initiation. When the handover threshold decreases, the

probability of a late handover decreases and the Ping-Pong effect increases. It can be varied


according to different scenarios and propagation conditions to make these trade-offs and

obtain a better performance.

Hysteresis margin:

The Hysteresis margin also called HO margin is the main parameter that governs the HO

algorithm between two eNodeBs. The handover is initiated if the link quality of another cell

is better than current link quality by a hysteresis value. It is used to avoid ping-pong effects.

However, it can increase handover failure since it can also prevent necessary handovers.

Time-to-Trigger (TTT):

When applying Time-to-Trigger, the handover is initiated only if the triggering requirement is

fulfilled for a time interval. This parameter can decrease the number of unnecessary

handovers and effectively avoid Ping-Pong effects, but it can also delay the handover, which in turn increases the probability of handover failures.

The length and shape of averaging window:

The effect of the channel variation due to fading should be minimized in handover decision.

Averaging window can be used to filter it out. Both the length and the shape of the window

can affect the handover initiation. Long windows reduce the number of handovers but

increase the delay. The shape of the windows, e.g. rectangular or exponential shape, can also

affect the number of handovers and probability of unnecessary handovers.

The listed parameters directly affect the handover initiations and hence can be tuned

according to certain design goals. However there are other parameters like the measurement

report period which can also have an impact on the handover initiations.

Figure 14: Handover triggering procedure [11].


In summary, the starting point of the handover triggering procedure is the measurements

performed by the UE. These are done periodically as defined by the measurement period

parameter configured at the eNodeB. When a condition is reached in which the serving cell RSRP drops by the configured HO offset, usually 2-3dB, below the measured neighbor cell, a timer is started.

In case this condition lasts the amount of the Time to Trigger (TTT) value, a measurement

report is sent to the eNodeB, which initiates the handover by sending a handover command to

the UE. In case the reporting conditions change and no longer satisfy the triggering conditions

before the timer reaches the TTT value, a measurement report will not be sent and new

measurement calculations and timers are started [11].

2.7. Time To Trigger & Hysteresis:

In this project, two main LTE parameters of the handover process are studied: the Time-to-Trigger (TTT) and the Hysteresis (hys). The hys is used to define how much better the RSS of a neighboring base station must be than that of the serving base station for a handover to be considered. The values of hys are defined in Decibels (dB) and range from 0 to 10dB in 0.5dB increments, resulting in 21 different values of hys. The full range of hys values can be seen in Table 4.

Table 4: Table of the different LTE hys values.

Index | hys (dB)      Index | hys (dB)
  0   |   0.0           11  |   5.5
  1   |   0.5           12  |   6.0
  2   |   1.0           13  |   6.5
  3   |   1.5           14  |   7.0
  4   |   2.0           15  |   7.5
  5   |   2.5           16  |   8.0
  6   |   3.0           17  |   8.5
  7   |   3.5           18  |   9.0
  8   |   4.0           19  |   9.5
  9   |   4.5           20  |  10.0
 10   |   5.0

The TTT is a length of time, defined in seconds, that specifies how long a neighboring base station must be considered better than the serving base station. There are 16 different values of TTT, ranging from 0 to 5.12 seconds. Unlike the hys values, the TTT values do not increase linearly; instead they increase exponentially, with smaller increments at the lower values and bigger increments at the larger values. The full list of TTT values can be seen in Table 5 and a graph of how the TTT values increase can be seen in Figure 11.


Table 5: Table of the different LTE TTT values.

Index | TTT (s)       Index | TTT (s)
  0   |  0.0             9  |  0.48
  1   |  0.04           10  |  0.512
  2   |  0.064          11  |  0.64
  3   |  0.08           12  |  1.024
  4   |  0.1            13  |  1.280
  5   |  0.128          14  |  2.56
  6   |  0.16           15  |  5.12
  7   |  0.256
  8   |  0.32

There are 336 different combinations of TTT and hys values. With such a large range of combinations, a pair of values may require a neighboring eNodeB to be better by a large hys for only a short TTT, or by a small hys for a long TTT. This makes for an interesting question of which pairs of values work best in any given environment.

In LTE there are eight different triggers defined for initiating handovers. Table 6 shows

different trigger events and how they are defined [18].

Table 6: Table of the different LTE Trigger types and their criteria.

Event Type | Trigger Criteria
A1 | Serving becomes better than a threshold.
A2 | Serving becomes worse than a threshold.
A3 | Neighbor becomes offset better than Primary Cell (PCell).
A4 | Neighbor becomes better than a threshold.
A5 | PCell becomes worse than threshold1 and neighbor becomes better than threshold2.
A6 | Neighbor becomes offset better than Secondary Cell (SCell).
B1 | Inter-RAT neighbor becomes better than a threshold.
B2 | PCell becomes worse than threshold1 and inter-RAT neighbor becomes better than threshold2.

Out of the eight triggers, the A3 event is the most common. Its definition is that a neighboring eNodeB must give the UE a better Reference Signal Received Power (RSRP) by an amount defined by the hys, for a length of time defined by the TTT [19]. The A3 event can be represented by the following equation:

RSRP_neighboring > RSRP_serving + Hys    (4)

When a handover event is triggered a measurement report is sent from the UE to the Serving

eNodeB. The measurement report contains the information required for the Serving eNodeB


to make a decision on whether to initiate a handover or not. The full high-level procedure for an LTE handover is as follows:

1. If a Neighboring eNodeB is found to be better than the Serving eNodeB a

measurement report is sent by the UE to the Serving eNodeB.

2. The Serving eNodeB considers the information in the measurement report and decides

whether or not a handover should take place.

3. If it is decided that a handover should take place then a message is sent to the

Neighboring eNodeB to prepare resources for the UE.

4. Once the resources are ready for the UE the new Serving eNodeB sends a message to

the old eNodeB to release the resources it previously had for the UE.

5. Finally a message is sent to the MME to finalize the handover process.

Conclusion:

The handover parameters need to be optimized for good performance. Too low handover offset and TTT values in fading conditions result in back-and-forth ping-pong handovers between the cells. Too high values can instead cause call drops during handovers, as the radio conditions in the serving cell become too poor for transmission.

In the last chapter, we will explain our proposed solution to optimize the Handover

parameters.


Chapter III: Machine Learning and Handover Parameter

Optimization simulation

Introduction:

Optimizing handover is a major activity in network operations, with Hysteresis and Time-to-Trigger as the main control parameters. For each HO, depending on the Hys-TTT tuple, also called the trigger point, the outcome is either a success, a Ping-Pong, or a Radio Link Failure.

In this chapter, we describe Q-Learning, present our proposed approach for handover optimization, and finish with the simulation results.

3.1. Q-Learning overview:

3.1.1. Machine Learning:

Machine learning is a form of Artificial Intelligence (AI) that involves designing and studying

systems and algorithms with the ability to learn from data. This field of AI has many

applications within research (such as system optimization), products (such as image

recognition) and advertising (such as adverts that use a user's browsing history). There are

many different paradigms that machine learning algorithms follow. Some algorithms use training sets to learn to give appropriate outputs; others look for patterns in data; while yet others use the notion of rewards to find out whether an action can be considered correct or not [20]. Three of the most popular types of machine learning algorithms are:

Supervised learning is where an algorithm is trained using a training set of data.

This set of data includes inputs and the known outputs for those inputs. The training

set is used to fine-tune the parameters in the algorithm. The purpose of this kind of

algorithm is to learn a general mapping between inputs and outputs so that the

algorithm can give an accurate result for an input with an unknown output. This

type of algorithm is generally used in classification systems.

Unsupervised learning algorithms only know about the inputs they are given. The

goal of such an algorithm is to try and find patterns or structure within the input

data. Such an algorithm would be given inputs and any patterns that are contained

would become more and more visible the more inputs the algorithm is given.

Reinforcement learning uses an intelligent agent to perform actions within an

environment. Any such action will yield a reward to the agent and the agent's goal


is to learn about how the environment reacts to any given action. The agent then

uses this knowledge to try and maximize its reward gains.

3.1.2. Reinforcement Learning:

In reinforcement learning an intelligent agent is learning what action to do at any given time

to maximize the notion of a reward. In the beginning the agent has no knowledge of what

action it should take from any state within the learning environment. It must instead learn

through trial and error, exploring all possible actions and finding the ones that perform the

best.

The trade-off between exploration and exploitation is one of the main features of reinforcement learning and can greatly affect the performance of a chosen algorithm. A reinforcement learning algorithm must weigh this trade-off: whether to exploit an action that resulted in a large reward or to explore other actions with the possibility of receiving a greater reward.

Another main feature of reinforcement learning is that the problem in question is considered as a whole. This is different from other types of machine learning algorithms, which do not consider how the results of any sub-problems may affect the overall problem.

The basic elements required for reinforcement learning are as follows:

A Model (M) of the environment that consists of a set of States (S) and Actions

(A).

A reward function (R).

A value function (V).

A policy (P).

The model of the environment is used to mimic the behavior of the environment, such as

predicting the next state and reward from a state and taken action. Models are generally used

for planning by deciding what action to take while considering future rewards.

The reward function defines how good or bad an action is from a state. It is also used to

define the immediate reward the agent can expect to receive. Generally a mapping between a

state-action pair and a numerical value is used to define the reward that the agent would gain.

The reward values are used to define the policy where the best value of state-action pair is

used to define the action to take from a state.


While the reward function defines the immediate reward that can be gained from a state, the

value function defines how good a state will be long-term. This difference can create possible

conflicts of interest for an agent; so while its goal is to collect as much reward as possible, it

has to weigh up the options of picking a state that may provide a lot of up front reward but not

much future reward against a state with a lot of future reward but not much immediate reward.

The policy is a mapping between a state and the best action to be taken from that state at any

given time. Policies can be simple or complex; with a simple policy consisting of a lookup

table, while more complex policies can involve search processes. In general most policies begin as stochastic policies, so that the agent can start to learn which actions are more optimal [11].

3.1.3. Q-Learning:

Q-Learning is a type of reinforcement learning algorithm where an agent tries to discover an

optimal policy from its history of interactions within an environment. What makes

Q-Learning so powerful is that it will always learn the optimal policy (which action a to take

from a state s) for a problem regardless of the policy it follows, as long as there is no limit on

the number of times the agent can try an action. Due to this ability to always learn the optimal

policy, Q-Learning is known as an Off-Policy learner. The history of interactions of an agent

can be shown as a sequence of State-Action-Rewards:

< s0, a0, r1, s1, a1, r2, s2, a2, ... >

This can be read as: the agent was in State 0, did Action 0, received Reward 1 and transitioned into State 1; then did Action 1, received Reward 2 and transitioned into State 2; and so on.

The history of interactions can be treated as a sequence of experiences, with each experience being a tuple:

< s, a, r, s' >

The meaning of the tuple is that the agent was in State s, did Action a, received Reward r and transitioned into State s'. The experiences are what the agent uses to determine the optimal action to take at any given time.

The basic process of a Q-Learning algorithm can be seen in Algorithm 3.1. The general

process requires that the learning agent is given a set of states, a set of actions, a discount

factor ฮณ and step size ฮฑ. The agent also keeps a table of Q-Values, denoted by Q(s,a) where s

is a state and a is an action from that state. A Q-Value is an average of all the experiences the agent has with a specific state-action pair. This allows good and bad experiences to be averaged out, giving a reasonable estimation of the actual value of a state-action pair.


The process of averaging out experiences is done using Temporal Differences. It could be said that the best way to estimate the next value in a list is to take the average of all the previous values. Equation 5 shows this process.

A_k = (v_1 + ... + v_k) / k    (5)

Therefore

k·A_k = v_1 + ... + v_k = (k − 1)·A_(k−1) + v_k    (6)

Then dividing by k gives:

A_k = (1 − 1/k)·A_(k−1) + v_k / k    (7)

Then, letting α_k = 1/k:

A_k = (1 − α_k)·A_(k−1) + α_k·v_k = A_(k−1) + α_k·(v_k − A_(k−1))    (8)

The term v_k − A_(k−1) in Equation 8 is known as the Temporal Difference Error, or TD Error. It shows how different the old value A_(k−1) is from the new value v_k. The new value of the estimate, A_k, is then the old estimate, A_(k−1), plus the TD error times α_k. The Q-Values, therefore, are defined using temporal differences, and Equation 9 shows the formula used to calculate them, where α is a variable between 0 and 1 that defines the step size of the algorithm. If the step size were 0 then the algorithm would ignore any

rewards received and if the step size were 1 the algorithm would consider the rewards gained

just as much as the previous experiences of a state-action pair. The discount factor is also a

variable between 0 and 1 and defines how much less future rewards will be worth compared

to the current reward. If the discount factor were 0, then future rewards would not be considered at all. If the discount factor were 1, then future rewards would be worth as much as the current rewards. The possible future reward term, max_a' Q(s', a'), is the maximum of the Q-Values over all possible actions from the state reached.

๐‘„[๐‘ , ๐‘Ž] = ๐‘„[๐‘ , ๐‘Ž] + ๐›ผ (๐‘Ÿ + ๐›พ๐‘š๐‘Ž๐‘ฅ๐‘Žโ€ฒ๐‘„[๐‘ โ€ฒ, ๐‘Žโ€ฒ] โˆ’ ๐‘„[๐‘ , ๐‘Ž]) (9)

The table of Q-Values can either be initialized as empty or with some values pre-set to try and

lead the agent to a specific goal state. Once the agent has initialized these parameters it

observes the starting state. The starting state can either be chosen at random or be a pre-

determined start state for the problem. The agent will then choose an action. Actions are

chosen either stochastically or by a policy. Once an action has been chosen the agent will

carry out the action and receive a reward. This reward is used to update the table of Q-Values


using Equation 9. Finally the agent moves into the new state and repeats until termination;

which can be either when the agent discovers a goal state or after a certain number of actions

have been taken.

Require:
S is a set of states
A is a set of actions
γ is the discount factor
α is the learning rate

1: procedure Q-Learning(S, A, γ, α)
2:     real array Q[S, A]
3:     previous state s
4:     previous action a
5:     initialize Q[S, A] arbitrarily
6:     observe current state s
7:     repeat
8:         select and carry out an action a
9:         observe reward r and state s′
10:        Q[s, a] ← Q[s, a] + α (r + γ max_a′ Q[s′, a′] − Q[s, a])
11:        s ← s′
12:    until termination
13: end procedure
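As a companion to the listing above, a minimal C++ sketch of the update on line 10 for a tabular problem of the size used later in this work (the table sizes and function names are our choices for illustration):

#include <algorithm>
#include <array>

constexpr int kStates  = 336;  // one state per TTT-hys pair (Section 3.2.1)
constexpr int kActions = 8;    // the eight parameter moves (Section 3.2.2)

using QTable = std::array<std::array<double, kActions>, kStates>;

// One Q-Learning step (Equation 9): update Q[s][a] given reward r and
// the state sNext reached after taking action a in state s.
void qUpdate(QTable& q, int s, int a, double r, int sNext,
             double alpha, double gamma) {
    double maxNext = q[sNext][0];
    for (int ap = 1; ap < kActions; ++ap)
        maxNext = std::max(maxNext, q[sNext][ap]);
    q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
}

For instance, with alpha = 0.5, gamma = 0.9, Q[s][a] = 1.0, a best next Q-Value of 2.0 and a reward of 0.5, the update moves Q[s][a] to 1.0 + 0.5·(0.5 + 1.8 − 1.0) = 1.65.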

After a Q-Learning algorithm has finished exploring the model of the environment it creates a

policy. The policy is generated by searching across all actions for a state and finding the next

state with the greatest value. The policy is therefore a lookup table that maps a state with the

best possible next state. The policy created can then be used to solve the problem that the Q-

Learning agent was exploring [22].

3.2. Proposed Approach for HO optimization:

3.2.1. Set of states:

The approach taken for optimizing the handover parameters in LTE-Advanced uses a Q-Learning algorithm based on the process given in Section 3.1. In this approach the model of the environment has a state for every combination of TTT and hys, giving a total of 336 states.


Table 7: Set of states.

hys index:  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
hys (dB):   0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5  10
TTT index, TTT (s), then one state number per hys column:

0 0.0 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020

1 0.04 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041

2 0.064 042 043 044 045 046 047 048 049 050 051 052 053 054 055 056 057 058 059 060 061 062

3 0.08 063 064 065 066 067 068 069 070 071 072 073 074 075 076 077 078 079 080 081 082 083

4 0.1 084 085 086 087 088 089 090 091 092 093 094 095 096 097 098 099 100 101 102 103 104

5 0.128 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125

6 0.16 126 127 128 129 130 131 132 133 134 135(6) 136(4) 137(8) 138 139 140 141 142 143 144 145 146

7 0.256 147 148 149 150 151 152 153 154 155 156(5) 157 158(2) 159 160 161 162 163 164 165 166 167

8 0.32 168 169 170 171 172 173 174 175 176 177(7) 178(1) 179(3) 180 181 182 183 184 185 186 187 188

9 0.48 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209

10 0.512 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230

11 0.64 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251

12 1.024 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272

13 1.280 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293

14 2.56 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314

15 5.12 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335
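The state numbering in Table 7 follows a simple row-major layout, so a state number can be computed directly from the TTT and hys indices; a minimal sketch (the function name is ours):

// Maps a (TTT index, hys index) pair to the state number used in Table 7.
// tttIndex runs over 0..15 (Table 5) and hysIndex over 0..20 (Table 4).
int stateId(int tttIndex, int hysIndex) {
    return tttIndex * 21 + hysIndex;
}
// Example: stateId(7, 10) == 157, i.e. TTT = 0.256 s and hys = 5.0 dB.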


3.2.2. Set of actions:

An action within the model can move to any other state that is different by one of the

following changes to the handover parameters:

1. A single value increase of TTT. (1)

2. A single value increase of hys. (2)

3. A single value increase of both TTT and hys. (3)

4. A single value decrease of TTT. (4)

5. A single value decrease of hys. (5)

6. A single value decrease of both TTT and hys. (6)

7. A single value increase of TTT and a single value decrease of hys. (7)

8. A single value increase of hys and a single value decrease of TTT. (8)

For example, if the learning agent is in state 157, where the TTT equals 0.256s and the hys equals 5.0dB, and performs action 3 from the list above (a single value increase of both TTT and hys), then the new TTT would equal 0.32s and the new hys would equal 5.5dB: state 179. The possible next states for state 157 are thus: {135(6), 136(4), 137(8), 156(5), 158(2), 179(3), 178(1), 177(7)}.

Figure 15: State 157 possible actions.

The full list of hys values can be seen in Table 4 and the full list of TTT values can be seen in Table 5. Having the actions change the parameters by only one increase or decrease of the TTT and hys values each time not only allows for more refined optimization of the parameters but also makes sure that no large changes can happen suddenly.
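A minimal C++ sketch of how these eight actions might be applied, clamping at the edges of the index ranges (the clamping behaviour at the boundaries is an assumption of this sketch; the text does not specify it):

#include <algorithm>

struct State { int tttIndex; int hysIndex; };  // indices into Tables 5 and 4

// Per-action (TTT, hys) index deltas, in the order the actions are listed above.
const int kDelta[8][2] = {
    {+1, 0}, {0, +1}, {+1, +1}, {-1, 0}, {0, -1}, {-1, -1}, {+1, -1}, {-1, +1}
};

// Applies action a (0..7, i.e. actions 1..8) while keeping the indices
// inside their valid ranges of 0..15 (TTT) and 0..20 (hys).
State applyAction(State s, int a) {
    s.tttIndex = std::clamp(s.tttIndex + kDelta[a][0], 0, 15);
    s.hysIndex = std::clamp(s.hysIndex + kDelta[a][1], 0, 20);
    return s;
}

For example, applyAction({7, 10}, 2) (action 3: increase both) yields {8, 11}, i.e. state 179, matching the example above.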

3.2.3. Reward:

Due to the nature of this kind of problem, the reward gained by an action is dynamic and is likely to be different each time the action is taken. Rewards are based on the number of drops and ping-pongs accumulated in the simulation for the current state in the environment model. The rewards are defined by the following equation:

Reward = Handovers_successful / (10 × Drops + 2 × PingPongs)    (10)

The coefficients in Equation 10 are given the values of 10 for drops and 2 for ping-pongs. Drops are extremely bad for the QoS of a communication system, so they are given a large coefficient; ping-pongs are multiplied by 2 to cancel out the successful handover that the ping-pong produced and to give the agent an additional penalty. The reward is given to the agent, and the Q-Value for that state is updated, just before the agent selects the next action to take. The agent then selects new actions in discrete time steps, which allows the simulation to run for fixed periods of time with the TTT-hys pairs specified by a state in the environment model.
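A minimal C++ sketch of Equation 10 as it might be evaluated at the end of a state's evaluation window (the guard for a window with no drops or ping-pongs is our assumption; the text does not say how such a window is scored):

// Reward for a TTT-hys state after its evaluation window (Equation 10).
double stateReward(int successfulHandovers, int drops, int pingPongs) {
    double penalty = 10.0 * drops + 2.0 * pingPongs;
    if (penalty == 0.0)                                   // assumed guard: no
        return static_cast<double>(successfulHandovers);  // penalties occurred
    return successfulHandovers / penalty;
}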

After the agent has been given enough time to try every action at least once the Q-Learning

agent generates a policy. This policy can then be used to attempt to optimize the handover

parameters by changing the TTT and hys. values after a call is dropped or the connection

ping-pongs between base stations. The Q-Learning agent still receives rewards every time a

call is dropped or the connection ping-pongs while following the generated policy. Doing this

allows for the system to always be learning; even after the initial learning process that

generated the policy.

3.3. Simulation & Performance evaluation:

The simulation is a very important part of the project. It is required to provide the basic

functionality of an LTE network. For simplicity the simulation was broken down into two main

components; the mobile (UE) and the base station (eNodeB). Due to the project revolving

around the handover process in LTE, it made sense for the two main components of the

simulation to be the mobile and the base station; it is the mobile that triggers the measurement

report and the base station that makes the decision on whether a handover should take place or

not. Each base station would also be given its own Q-Learning agent since each base station is

unique. Since the A3 event trigger (Table 6) is the most common, it was decided that it would


be the only trigger implemented, in order to reduce the complexity of the simulation.

Tools and systems used for the simulation: VMwareยฎ Workstation 10.0.1, Ubuntu

14.04, C++ and Matlab R2012a.

Server Machine: EasyNote Packard bell (6GB DDR3 Memory, intelยฎ i3-3120M,

NVIDIAยฎ GeForce 72 GB Dedicated VRAM and 15.6โ€ 16:9 HD LED LCD).

3.3.1. Simulation parameters:

The optimization system was tested in two scenarios. One scenario has 10 UEs moving randomly around 9 base stations, with each UE being 1m in height and each base station being 60m in height, using the Random Direction mobility model described below, where the speed of each UE is 1 to 4 m/s, which is walking speed, and the duration of each direction is between 100 and 200 seconds. The other scenario has the UEs moving at 10 to 15 m/s, which is roughly 30mph. In both scenarios, each mobile begins on top of one of the base stations before it starts moving, so that handovers are not required as soon as the simulation starts. These scenarios were also run with no fading in the RSS calculations so as to make the environment easier for the agents to learn.

Each base station has its own Q-Learning agent to optimize the TTT and hys. values for that

specific base station. The agents are given 1,000,000 seconds to attempt to learn the

environment they are working within, with each state being given 180 seconds to gain its reward. This length of time was chosen because there are 336 states, each with a maximum of

8 actions, therefore the time needed to do all actions would take approximately 483,840

seconds if each action was given 180 seconds. This length of time is less than half of the total

time given so even due to the randomness of selecting next actions when learning the

environment there should be enough time to try all states and a selection of the available

actions. After the agents have learned the environment they generate a policy for their base

station to follow. The simulation is then run for 200,000 seconds to test how well the policies

perform.

Within the simulation there are many variable parameters that need to be assigned values.

Such parameters are: the height of the base stations, the height of the UEs, the dimensions of

the simulation area and the positioning of the base stations. Other parameters that also needed

to be considered are the number of base stations and the number of UEs; as well as the

transmission power of the base stations and the time limit for what would be perceived to


have been a Ping-Pong connection. The type of environment also had to be chosen: whether the

simulation would be within a rural, urban, small city, medium city or large city environment.

The type of environment was decided to be a medium sized city. It was decided that the

height of any UE would be 1m as it would normally be in the possession of a human being.

The height of any base station would be 60 m because base stations are either placed upon tall

buildings or structures so that they produce better coverage. Since it was decided that the

environment would be a medium sized city, this height made sense since there would be some

large multi-story buildings. It was also decided that the transmission power of the base stations

would be 46dBm and the time limit for ping-pongs to occur would be 5s as they were the

values that had been used in a similar project [17]. The simulation area and positioning of the

base stations both depended on the propagation model used and the area of coverage that

could be expected from it. From Figure 17 it was found that, for the chosen propagation model, the Path Loss would reach the LTE noise floor of -97.5dB at around 2km. Therefore, from this it

was decided that the area for the simulation would be 6 km by 6 km. It was then decided that

9 base stations would provide good coverage in this area. Each of these base stations will also

have its own Q-Learning agent since each base station has its own unique TTT and hys

values. The base stations were placed as shown in Figure 16, which illustrates the resulting coverage.

Figure 16: Illustration of Coverage within the Simulation Area.


Finally, it was decided that 10 UEs would be used in the simulation because this would allow

for the learning done by the Q-Learning agents to happen faster than if there were just 1 UE.

The UEs would also start on top of different base stations so that they will start in different

places allowing for the Q-Learning agent for each base station to start learning straight away

without having to wait for a UE to move into range, as at least 1 UE will start on top of it.

Mobility Model:

A mobility model defines the way in which an entity will move. For the purposes of the

simulation the mobility model used needed to be random in nature. After some research it was

decided that the mobility model to be used in the simulation would either be the Random

Direction or Random Waypoint model because they are two of the most popular random

mobility models [23]. Both models are described below.

Random Direction Model:

The Random Direction Model is defined as follows:

1. Select a direction randomly between 0 and 359 degrees.

2. Select a random speed to move at.

3. Select a random duration to move for.

4. Move in the selected direction at the selected speed for the selected duration.

5. Repeat until termination.

Random Waypoint Model:

In mobility management, the random waypoint model is a random model for the movement of

mobile users, and how their location, velocity and acceleration change over time. Mobility

models are used for simulation purposes when new network protocols are evaluated. The

random waypoint model was first proposed by Johnson and Maltz. It is one of the most

popular mobility models to evaluate mobile ad hoc network (MANET) routing protocols,

because of its simplicity and wide availability.

In random-based mobility simulation models, the mobile nodes move randomly and freely

without restrictions. To be more specific, the destination, speed and direction are all chosen

randomly and independently of other nodes. This kind of model has been used in many

simulation studies.

The random walk model and the random direction model are variants of the random waypoint model.

Description: The movement of nodes is governed in the following manner: Each node begins

by pausing for a fixed number of seconds. The node then selects a random destination in the

simulation area and a random speed between 0 and some maximum speed. The node moves to


this destination and again pauses for a fixed period before selecting another random destination and speed.

This behavior is repeated for the length of the simulation.

The Random Waypoint Model is defined as follows:

1. Randomly select the co-ordinates for a point within the environment.

2. Select a random speed to move at.

3. Select a random length of time to pause for when the destination is reached.

4. Move towards the selected co-ordinates at the selected speed.

5. Pause for the randomly selected length of time.

6. Repeat until termination.

It was decided that the Random Direction Model would be used in the simulation because the Random Waypoint Model has the problem that it is possible to select the co-ordinates of a point very close to where you begin and then pause for a long period of time. The possibility of that happening is undesirable within the simulation. The Random Direction Model does not have this problem, and it is also possible to set boundaries on its parameters to make sure that a minimum distance is travelled.
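A minimal C++ sketch of one leg of the Random Direction Model as listed above, using the speed and duration bounds of the walking-speed scenario of Section 3.3.1 (the RNG setup, the 2D position type and ignoring the area border are simplifications of this sketch):

#include <cmath>
#include <random>

struct Position { double x, y; };

// One Random Direction leg: pick a direction, speed and duration at random,
// then move in that direction for the whole duration.
Position randomDirectionLeg(Position p, std::mt19937& rng) {
    const double kTwoPi = 6.283185307179586;
    std::uniform_real_distribution<double> direction(0.0, kTwoPi);
    std::uniform_real_distribution<double> speed(1.0, 4.0);        // m/s
    std::uniform_real_distribution<double> duration(100.0, 200.0); // seconds
    double theta = direction(rng), v = speed(rng), t = duration(rng);
    p.x += v * t * std::cos(theta);
    p.y += v * t * std::sin(theta);
    // A full implementation would also clamp or reflect p at the border
    // of the 6 km x 6 km simulation area.
    return p;
}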

Propagation Model:

A propagation model defines how the received signal from a transmitter decays the further

from the transmitter you are. There are many different models available, all with different

functions and purposes. After some research three models were considered: the Okumura-Hata Model, the Egli Model and the Cost231-Hata Model.

It was decided that the Cost231-Hata Model would be the one used in the simulation since it

works with frequencies up to 2000MHz (which is the minimum operating frequency of LTE),

unlike the Okumura-Hata model which only works up to 1500MHz. The Cost231-Hata Model

was also picked over the Egli Model because if the Egli Model was used it would require the

simulation area to be very large, with at least 15km between each base station. This would

mean that the UEs would need to move over large distances before any handover attempts

could occur [24, 25].
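A minimal C++ sketch of the Cost231-Hata path loss for the kind of parameters used here (the expression below is the standard Cost231-Hata formula; exposing the city correction as a parameter is our choice):

#include <cmath>

// Cost231-Hata path loss in dB: fMHz in MHz (1500-2000), antenna heights in
// metres, dKm in km. cityCorrection is 0 dB for medium-sized cities and
// suburban areas, 3 dB for metropolitan centres.
double cost231Hata(double fMHz, double hBase, double hMobile,
                   double dKm, double cityCorrection = 0.0) {
    double aHm = (1.1 * std::log10(fMHz) - 0.7) * hMobile
               - (1.56 * std::log10(fMHz) - 0.8);  // mobile antenna correction
    return 46.3 + 33.9 * std::log10(fMHz)
         - 13.82 * std::log10(hBase) - aHm
         + (44.9 - 6.55 * std::log10(hBase)) * std::log10(dKm)
         + cityCorrection;
}
// e.g. cost231Hata(2000.0, 60.0, 1.0, 2.0) for the 2 km coverage edge.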

Table 8: Simulation parameters.

Parameter         | Value
Number of eNodeBs | 9
Number of UEs     | 10
Mobility model    | Random Direction Model
Propagation model | Cost231-Hata


3.3.2. Simulation results:

The first step in implementing the simulation was to decide on the programming language. After some deliberation, C++ was chosen: its object-oriented nature is well suited to a simulation in which multiple UEs and base stations must be modelled as interacting objects.

The next step was to create the base classes for the UEs and base stations, with basic functionality such as accessors and mutators for changing the parameters of the classes. For a base station, such parameters include whether a UE is currently connected to it and the X-Y co-ordinates representing its location; for a UE, they include the ID number of the base station it is currently connected to and the X-Y co-ordinates representing its location.
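A minimal sketch of what such base classes might look like follows; the class and member names are illustrative, not the project's actual code.

// Base station: records its position and whether a UE is attached.
class BaseStation {
public:
    double x() const { return x_; }                           // accessor
    double y() const { return y_; }                           // accessor
    bool hasConnectedUE() const { return connected_; }        // accessor
    void setPosition(double x, double y) { x_ = x; y_ = y; }  // mutator
    void setConnected(bool c) { connected_ = c; }             // mutator
private:
    double x_ = 0.0, y_ = 0.0;  // X-Y co-ordinates of the eNodeB
    bool connected_ = false;    // whether a UE is currently connected
};

// UE: records its position and the ID of its serving base station.
class UE {
public:
    int servingCell() const { return servingCellId_; }        // accessor
    void setServingCell(int id) { servingCellId_ = id; }      // mutator
    void setPosition(double x, double y) { x_ = x; y_ = y; }  // mutator
private:
    int servingCellId_ = -1;    // ID of the serving base station
    double x_ = 0.0, y_ = 0.0;  // X-Y co-ordinates of the UE
};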

It was important to integrate the Discrete Event Simulation (DES) library early in the project. Using an existing library meant that a DES framework did not have to be created from scratch, which allowed more development time to be spent on other aspects of the simulation. The library itself was simple to use once some experience had been gained with it. It has two main parts: the Scheduler and the Event Handler. The Scheduler is the class that provides the discrete time steps in the simulation and passes events to the event handlers. The Event Handler is an abstract class that is the super class of the UEs, base stations and Q-Learning agents; the inherited characteristics include the Handler method, which receives events from the Scheduler.
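The Scheduler/Event Handler pattern described above can be sketched as follows. This is a simplification for illustration only; the actual library's interface differs.

#include <queue>
#include <vector>

class EventHandler;

struct Event {
    double time;           // simulation time at which the event fires
    int type;              // event kind (e.g. move, measurement, handover)
    EventHandler* target;  // handler that should receive the event
};

// Abstract super class inherited by the UEs, base stations and
// Q-Learning agents; Handler receives events from the Scheduler.
class EventHandler {
public:
    virtual ~EventHandler() = default;
    virtual void Handler(const Event& e) = 0;
};

// Orders events so that the earliest fires first.
struct LaterEvent {
    bool operator()(const Event& a, const Event& b) const {
        return a.time > b.time;
    }
};

// Provides the discrete time steps and dispatches events in time order.
class Scheduler {
public:
    void schedule(const Event& e) { queue_.push(e); }
    void run() {
        while (!queue_.empty()) {
            Event e = queue_.top();
            queue_.pop();
            now_ = e.time;        // advance simulation time
            e.target->Handler(e); // pass the event to its handler
        }
    }
    double now() const { return now_; }
private:
    std::priority_queue<Event, std::vector<Event>, LaterEvent> queue_;
    double now_ = 0.0;
};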

Once the mobility and propagation models had been implemented, the next parts to be built were the handovers, along with detection of call drops and connection ping-pongs. These are very important parts of the simulation and their implementation can be seen in Appendix A. While all three components worked correctly, the handover triggering was not implemented as efficiently as it could have been: it ended up using decrementing variables instead of the DES library, due to bugs that remained unresolved. A sketch of this approach follows.
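As a hedged illustration of the decrementing-variable approach (all names and the tick-based window are assumptions, not the project's actual code), the handover trigger and ping-pong check might look like this:

// Called once per simulation tick for each UE. A handover is triggered
// when the best neighbour has exceeded the serving cell's RSRP by at
// least hysDb for a full Time-to-Trigger window; the window is tracked
// with a decrementing counter rather than a scheduled DES event.
void checkHandoverTrigger(double servingRsrp, double bestNeighbourRsrp,
                          double hysDb, int tttTicks,
                          int& countdown, bool& triggerHandover) {
    if (bestNeighbourRsrp > servingRsrp + hysDb) {
        if (countdown == 0)
            countdown = tttTicks;        // entering condition: start window
        else if (--countdown == 0)
            triggerHandover = true;      // condition held for the full TTT
    } else {
        countdown = 0;                   // condition broken: reset window
    }
}

// A ping-pong is counted when the UE hands straight back to its previous
// cell within a short window after the last handover.
bool isPingPong(int newCellId, int previousCellId,
                int ticksSinceLastHandover, int windowTicks) {
    return newCellId == previousCellId &&
           ticksSinceLastHandover < windowTicks;
}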

Simulation Testing:

It was very important for the simulation and Q-Learning algorithm to be tested so that there

was confidence that they would produce the correct results when executed. There are many

different types of testing that can take place to ensure that a piece of software is working

correctly. Testing methods that were considered for this project include: Unit testing, Black

Box testing and White/Clear Box testing.


Unit testing involves testing individual parts or processes of a program so that the stakeholders can be confident they are fit for use. The parts of the simulation tested this way were the mobility model, handovers, drops, ping-pongs, and the Q-Learning algorithm changing the values of TTT and hys correctly.
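For instance, a unit check on the propagation routine might compare its output against a hand-verified range; the sketch below assumes the hypothetical cost231Hata function shown earlier.

#include <cassert>
#include <cmath>

void testCost231Hata() {
    // f = 1800 MHz, hB = 50 m, hR = 1.5 m, d = 2 km, suburban correction:
    // a hand calculation gives roughly 143 dB of path loss.
    double loss = cost231Hata(1800.0, 50.0, 1.5, 2.0, false);
    assert(std::isfinite(loss));
    assert(loss > 140.0 && loss < 146.0);  // within tolerance of 143 dB
}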

Black Box testing involves making sure that a function works as required, without any knowledge of the underlying code. It is performed by giving a function an input and comparing the function's output with a previously determined expected output. White Box testing is used to make sure that the underlying code of a function works as required. It is done by inserting print statements in the code to observe how the relevant variables change while the code is running; these values are compared against previously determined expected values to confirm whether the function works as intended.

Figure 17 shows the TTT values of the eight eNodeBs over time; all eight curves converge to the interval [1.2 s, 1.6 s].

Figure 17: Illustration of how the TTT values changed over time for large starting values, with UEs travelling at walking speeds.


Figure 18 shows how the TTT values for base stations 5 and 8 were optimized over a simulation run. Both base stations reduced the value of their TTT from the starting value of 5.12 s, but they did not agree on how far it should be reduced. Base station 8 reduced its TTT value to as low as 0.16 s before settling between that value and 0.256 s. Base station 5, on the other hand, reduced its TTT value far less, only going low enough to oscillate between 1.024 s and 1.28 s. Base station 5 oscillated frequently between those two values, which could indicate that the algorithm had become stuck between two non-optimal states and was unable to optimize the value further. This means that, even though the optimization improved performance, a large window of potential improvement remained.

From these results it can be said that the system performed as expected, with the base stations shown having reduced their values of TTT. There was also a very high number of dropped calls and no ping-pongs, which was likewise expected in the simulation.

Figure 18: Comparison of TTT Optimization for Walking Speeds (Starting Point 5.12s)

Optimizing the Drop Ratio KPI:

The results of how the optimization system compared to the static values, when the TTT started at its largest possible value of 5.12 seconds, show that optimizing the values initially produced a very large increase in the number of dropped calls. However, the system then managed to improve


rapidly, ending up with a better dropped call ratio by the end of the simulation run than that of the non-optimized system.

Figure 19: Graph of Optimized vs. Non-Optimized Results for Starting Point TTT = 0 s, hys = 0 dB, with UEs travelling at walking speeds.

It turned out that ping-pongs were a very rare occurrence in the simulation. This was most likely because there was no fading in the simulation and because the Random Direction mobility model has a UE moving in one direction for a long time, meaning that a ping-pong would only be likely if a handover took place and the UE then turned around and moved in the other direction.

Figure 19 shows how the optimization system performed against the static values when the simulation started with the TTT at 0 seconds and hys at 0 dB. The optimization process and the static values performed very similarly. However, the error bars for the optimized system become much smaller than those for the static values the longer the simulation runs, meaning the optimization system would be expected to perform better the majority of the time.


Again, as seen in Figure 20, the optimization process performed much better than the static values when these were initially set to their middle values of 0.256 seconds for TTT and 5 dB for hys.

Figure 20: Graph of Optimized vs. Non-Optimized Results for Starting Point TTT = 0.256 s, hys = 5 dB, with UEs travelling at walking speeds.

Conclusion:

It can be seen that when the system does not get stuck between non-optimal states, it optimizes the TTT and hys values as quickly as needed, i.e., whenever a dropped call or ping-pong occurs. The optimization system, however, also has some drawbacks. In the first scenario it caused a very large increase in the dropped call ratio before improving it. This is a common downside of optimization processes, where things have to get worse before they can get better; it is inherent to Q-Learning, in which possible future rewards are taken into account when selecting a new state to move to.
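For reference, this behaviour follows from the standard one-step Q-Learning update, in which the discount factor weights the possible future rewards:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

where \alpha is the learning rate, \gamma the discount factor, r_{t+1} the reward (here tied to dropped calls and ping-pongs), s the state (the current TTT/hys setting) and a the action taken in that state.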


General Conclusion

LTE-Advanced implements an efficient SON. The SON concept originated from the Next-Generation Mobile Networks (NGMN) Alliance. A SON-capable network such as LTE-Advanced can automatically extend, change, configure, and optimize its topology, coverage, capacity, cell size, and channel allocation, based on changes in location, traffic pattern, interference, and the situation/environment. LTE-Advanced is an end-to-end (e2e) self-aware and self-optimizing system with a series of features and solutions.

In this context, this project set out to optimize one of the most important features in SON: the handover. We propose the Q-Learning algorithm as a solution for optimizing the handover parameters because, as a reinforcement learning method, it converges towards the optimal action-value function and is well suited to continuous, online problems of this kind. To reach that goal, we focused on the most important parameters, Time-to-Trigger and Hysteresis, and sought the optimal combination of that pair.

When discussing handover and SON, interference cannot be ignored. Future projects could investigate the interaction between two major technical challenges for LTE-Advanced cell deployment, in order to face the explosive growth of traffic: Inter-Cell Interference Management, which becomes critical in dense deployments of small cells, and Mobility Management, since the handover frequency between close-by cells increases considerably. These two features, often analyzed separately, are intertwined: handover occurs in overlapping cellular areas, where the interference level is highest.


References

[1] 3GPP TS 36.331 V9.4.0 (2010-09) Evolved Universal Terrestrial Radio Access (E-

UTRA); Radio Resource Control (RRC); Protocol specification (Release 9).

[2] Alcatel-Lucent, Strategic White Paper, "The LTE Network Architecture", 2009. http://www.alcatel-lucent.com.

[3] NGN Guru Solutions White Paper, Long Term Evolution (LTE), August 2008. http://www.ngnguru.com.

[4] Stefania Sesia, Issam Toufik, Matthew Baker, "LTE - The UMTS Long Term Evolution: From Theory to Practice". John Wiley & Sons Ltd, 2009.

[5] LTE Tutorial Artiza Networks

http://www.artizanetworks.com/lte_tut_sae_tec.html.

[6] LTE-Advanced presented by Raavi Trinath

http://fr.slideshare.net/RAAVIthrinath/lte-advanced-20732830

[7] QUALCOMM Incorporated, LTE Mobility Enhancements, February 2010.

[8] Harri Holma, Antti Toskala, "LTE for UMTS: OFDMA and SC-FDMA Based Radio Access". John Wiley & Sons Ltd, 2009.

[9] 3GPP TS 36.300 V9.5.0 (2010-09), Evolved Universal Terrestrial Radio Access (E-

UTRA) and Evolved Universal Terrestrial Radio Access Network (E-UTRAN);

overall description; Stage 2 (Release 9).

[10] Cheng-Chung Lin, Kumbesan Sandrasegaran, Huda Adibah Mohd Ramli, and Riyaj Basukala, "Optimized Performance Evaluation of LTE Hard Handover Algorithm with Average RSRP Constraint", International Journal of Wireless & Mobile Networks (IJWMN), Vol. 3, No. 2, April 2011.

[11] Konstantinos Dimou, Min Wang, Yu Yang, Muhammad Kazmi, Anna Larmo, Jonas Pettersson, Walter Muller, Ylva Timner, "Handover within 3GPP LTE: Design Principles and Performance", Ericsson Research, IEEE, 2009.

[12] M. Anas, F.D. Calabrese, P.E. Mogensen, C.Rosa and K.I. Pedersen, โ€œPerformance

Evaluation of Received Signal Strength Based Hard Handover for UTRAN LTEโ€,

IEEE 65th Vehicular Technology Conference, April 2007.

[13] Y. Yang, "Optimization of Handover Algorithms within 3GPP LTE", MSc Thesis Report, KTH, February 2009.


[14] D. Sunil Shah, "A Tutorial on LTE Evolved UTRAN (E-UTRAN) and LTE Self Organizing Networks (SON)", December 2010.

[15] Ali Neissi Shooshtari, "Optimizing handover performance in LTE networks containing relays", MSc Thesis Report, School of Electrical Engineering, Espoo, April 2011.

[16] http://www.radio-electronics.com/info/rf-technology-design/rf-noise-sensitivity/receiver-signal-to-noise-ratio.php

[17] T. Jansen, I. Balan, J. Turk, I. Moerman, and T. Kurner, "Handover parameter optimization in LTE self-organizing networks", in Vehicular Technology Conference Fall (VTC 2010-Fall), 2010 IEEE 72nd, pp. 1-5, IEEE, 2010.

[18] 3GPP TS 36.331 V10.7.0, LTE; Evolved Universal Terrestrial Radio Access (E-

UTRA); Radio Resource Control (RRC); Protocol specification (Release 10),

November 2012.

[19] N. Sinclair, Handover Optimisation using Neural Networks within LTE. PhD thesis,

University of Strathclyde, 2013.

[20] E. Alpaydin, Introduction to Machine Learning. MIT Press, 2nd ed., 2010.

[21] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.

[22] D. L. Poole and A. K. Mackworth, Artificial Intelligence: Foundations of

Computational Agents. Cambridge University Press, 2010.

[23] R. Roy, Handbook of Mobile Ad Hoc Networks for Mobility Models. Springer, 2010.

[24] J. Chebil, A. K. Lwas, M. R. Islam, and A. Zyoud, "Comparison of empirical propagation path loss models for mobile communications in the suburban area of Kuala Lumpur", in Mechatronics (ICOM), 2011 4th International Conference On, pp. 1-5, IEEE, 2011.

[25] N. Shabbir, M. T. Sadiq, H. Kashif, and R. Ullah, "Comparison of radio propagation models for Long Term Evolution (LTE) network", International Journal of Next-Generation Networks, vol. 3, no. 3, 2011.

[26] N. Sinclair, D. Harle, I. Glover, J. Irvine, and R. Atkinson, "An advanced SOM algorithm applied to handover management within LTE", 2013.
