Handover Parameters Self-optimization by Q-Learning in 4G Networks
TRANSCRIPT
Handover Parameters Self-optimization by Q-Learning in 4G Networks
Realized by: Mohamed Raafat OMRI
Supervised by: PhD. Maissa BOUJELBEN
July 12, 2016
Handover Parameters Self-optimization by Q-Learning in 4G networks ESPRIT Tech.
Dedication
I dedicate my dissertation work to my family and my friends. A special feeling of
gratitude to my loving parents, Salah and Sghaira Omri whose words of
encouragement and push for tenacity ring in my ears.
My sisters Kaouther, Lamia, Soumaya, Leila
and my brother Lotfi have never left
my side and are very special.
I dedicate this dissertation to my friends
who have supported me throughout the process.
I will always appreciate all they have done.
I also dedicate this work and give special
thanks to my lovely fiancée Safa.
ACKNOWLEDGEMENTS
I would like to thank my supervisor PhD. Maissa Boujelben for her help and guidance
throughout my progress in this project.
I would like to acknowledge and thank Mr. Walid Douagi, head of Telecom Department,
PhD. Talel Zouari, my school ESPRIT and ESPRIT TECH for allowing me to conduct my
project and providing the requested assistance.
Special thanks go to the members of the jury.
I must acknowledge as well the many friends, colleagues and teachers who assisted, advised,
and supported my engineering studies and writing efforts over the years.
Finally I would like to acknowledge my family for their unlimited support and help.
Abstract
With more and more customers using mobile communications, it is important for service
providers to give their customers the best Quality of Service (QoS) they can afford. Many
providers have therefore been improving their networks to make them more appealing to
customers. One such improvement that providers can deliver is to enhance the reliability of
the network, meaning that customers' calls are less likely to be dropped by the network.
This dissertation explores improving the reliability of a 4G network by optimizing the
parameters used in handover. The process of handover within mobile communication
networks is very important since it allows users to move around freely while still staying
connected to the network. The most important parameters used in the handover process are
the Time-to-Trigger (TTT) and Hysteresis (hys). These parameters are used to determine
whether a base station is better than the serving base station by enough offset to warrant a
handover taking place. The challenge in optimizing the handover parameters is that a fine
balance must be struck between calls being dropped because a handover fails and the
connection switching back and forth unnecessarily between two base stations, wasting
network resources. In this project, we propose to use a machine learning technique known as
Q-Learning to optimize the handover parameters by generating a policy that can be followed
to adjust the parameters as needed. It was found that the implemented Q-Learning algorithm
was capable of improving handover performance by minimizing the chosen handover-related
Key Performance Indicators (KPIs).
Key words: LTE-Advanced, Handover, Q-Learning Algorithm, Hysteresis margin,
Time-to-Trigger, Self-Optimizing Network.
Table of contents
General Introduction .................................................................................................................. 1
Chapter I: LTE-Advanced Overview ......................................................................................... 3
Introduction. ........................................................................................................................ 3
1.1. Requirements and Targets for LTE-Advanced. ........................................................... 3
1.2. LTE Enabling Technologies. ....................................................................................... 5
1.2.1. Downlink OFDMA (Orthogonal Frequency Division Multiple Access).................. 5
1.2.2. Uplink SC-FDMA (Single Carrier Frequency Division Multiple Access). .............. 6
1.2.3. LTE-A Channel Bandwidths and resource elements. ............................................... 7
1.3. LTE-Advanced Network Architecture. ........................................................................ 7
1.3.1. The Core Network: Evolved Packet Core (EPC). ..................................................... 8
1.3.2. The Access Network E-UTRAN............................................................................... 9
1.3.3. The User Equipment (UE). ..................................................................................... 12
1.4. E-UTRAN Network Interfaces. ..................................................................................... 12
1.4.1. X2 Interface. ........................................................................................................... 12
1.4.2. S1 Interface ............................................................................................................. 13
1.5. LTE Protocol Architecture ............................................................................................ 14
1.5.1. User Plane ............................................................................................................... 14
1.5.2. Control Plane .......................................................................................................... 14
1.5.2.1. Radio Resource Control (RRC). .............................................................................. 15
1.5.2.2. Radio Resource Control States. ............................................................................... 16
1.6. Self-Organizing Networks. ............................................................................................ 17
Conclusion ............................................................................................................................ 19
Chapter II: Handover in LTE-Advanced .................................................................................. 20
Introduction. ......................................................................................................................... 20
2.1. Handover Definition and Characteristics ...................................................................... 20
2.1.1. Seamless Handover ................................................................................................. 20
2.1.2. Lossless Handover .................................................................................................. 21
2.2. Types of Handover ........................................................................................................ 22
2.2.1. Intra LTE Handover: Horizontal Handover ............................................................ 22
2.2.2. Vertical Handover ................................................................................................... 22
2.3. Handover Techniques .................................................................................................... 23
2.3.1. Soft handover, Make-Before-Break ........................................................................ 23
2.3.2. Hard handover, Break-Before-Make: ..................................................................... 23
2.4. Handover Procedure ...................................................................................................... 24
2.5. Handover Measurements ............................................................................................... 29
2.6. Handover Parameters ..................................................................................................... 31
2.7. Time To Trigger & Hysteresis.. ..................................................................................... 33
Conclusion. ........................................................................................................................... 35
Chapter III: Machine Learning and Handover Parameter Optimization simulation ................ 36
Introduction. ......................................................................................................................... 36
3.1. Q-Learning overview. .................................................................................................... 36
3.1.1. Machine Learning. .................................................................................................. 36
3.1.2. Reinforcement Learning. ........................................................................................ 37
3.1.3. Q-Learning. ............................................................................................................. 38
3.2. Proposed Approach for HO optimization: ..................................................................... 40
3.2.1. Set of states ............................................................................................................. 40
3.2.2. Set of actions. .......................................................................................................... 42
3.2.3. Reward. ................................................................................................................... 43
3.3. Simulation & Performance evaluation: ......................................................................... 43
3.3.1. Simulation parameters............................................................................................. 44
3.3.2. Simulation results. ................................................................................................... 48
Conclusion. ........................................................................................................................... 52
General Conclusion .................................................................................................................. 53
References ................................................................................................................................ 54
List of Figures
Figure 1: Orthogonal Frequency Division Multiple Access ................................ 6
Figure 2: LTE SAE Evolved Packet Core .................................................. 8
Figure 3: E-UTRAN Architecture ......................................................... 9
Figure 4: Functional Split between E-UTRAN and EPC ..................................... 11
Figure 5: Protocol stack for the user-plane and control-plane at X2 interface .......... 13
Figure 6: Protocol stack for the user-plane and control-plane at S1 interface .......... 13
Figure 7: E-UTRAN Protocol Stack ....................................................... 14
Figure 8: The RRC States ............................................................... 16
Figure 9: Decision on Handover Type .................................................... 24
Figure 10: Intra-MME/Serving Gateway Handover .......................................... 25
Figure 11: Handover Timing ............................................................. 28
Figure 12: Downlink reference signal structure for LTE-Advanced ........................ 31
Figure 13: Handover measurement filtering and reporting ................................ 31
Figure 14: Handover triggering procedure ............................................... 32
Figure 15: State 157 possible actions .................................................. 42
Figure 16: Illustration of Coverage within the Simulation Area ......................... 45
Figure 17: Illustration of how the TTT values changed over time for large values when
UE travelling at walking speeds ........................................................ 49
Figure 18: Comparison of TTT Optimization for Walking Speeds (Starting Point 5.12s) .... 50
Figure 19: Graph of Optimized vs. Non-Optimized Results for Starting Point
TTT=0s hys.=0dB when UE traveling at walking speeds .................................... 51
Figure 20: Graph of Optimized vs. Non-Optimized Results for Starting Point
TTT=0.256s hys.=5dB when UE traveling at walking speeds ................................ 52
List of Tables
Table 1: LTE-Advanced development history .............................................. 3
Table 2: Number of PRBs ................................................................ 6
Table 3: Operational benefits by SON ................................................... 19
Table 4: Table of the different LTE hys. values ........................................ 33
Table 5: Table of the different LTE TTT values ......................................... 34
Table 6: Table of the different LTE Trigger types and their criteria ................... 34
Table 7: Set of states ................................................................. 41
Table 8: Simulation parameters ......................................................... 47
Abbreviations
3G 3rd Generation (Cellular Systems)
3GPP Third Generation Partnership Project
4G 4th Generation (Cellular Systems)
AC Admission Control
ACK Acknowledgement (in ARQ protocols)
AI Artificial Intelligence
AM Acknowledged mode
AGW Access Gateway
AS Access Stratum
BS Base Station
CDF Cumulative Distribution Function
CDMA Code Division Multiple Access
CQI Channel Quality Indicator
CS Circuit-Switched
dB Decibel
DFT Discrete Fourier Transform
DL Downlink
DRB Data Radio Bearer
eNodeB Evolved Node B (3GPP Base Station)
EPC Evolved Packet Core
E-UTRAN Evolved Universal Terrestrial Radio Access Network
FDD Frequency Division Duplex
GPRS General Packet Radio Service
GSM Global System for Mobile communications
HO Handover
HOM HO margin
HSDPA High Speed Downlink Packet Access
HSS Home Subscriber Server
HYS Hysteresis
IMS IP Multimedia Subsystem
IP Internet Protocol
ITU International Telecommunication Union
ITU-T ITU Telecommunication Standardization Sector
LTE-Advanced Long Term Evolution Advanced
MAC Medium Access Control
MME Mobility Management Entity
NACK Negative Acknowledgement
NAS Non-Access Stratum
NGMN Next Generation Mobile Networks
OFDM Orthogonal Frequency Division Multiplexing
OFDMA Orthogonal Frequency Division Multiple Access
OTP Optimum Trigger Point
PAPR Peak-to-Average Power Ratio
PCRF Policy and Charging Rules Function
PDCP Packet-Data Convergence Protocol
PDN Packet Data Network
PDU Protocol Data Unit
PGW PDN Gateway
QoS Quality of Service
RAN Radio Access Network
RB Resource Block
RF Radio Frequency
RLC Radio Link Control
RNC Radio Network Controller
ROHC RObust Header Compression
RRC Radio Resource Control
RRM Radio Resource Management
RSRP Reference Signal Received Power
RSRQ Reference Signal Received Quality
RSS Received Signal Strength
RSSI Received Signal Strength Indicator
SAE System Architecture Evolution
S1 The interface between eNodeB and Access Gateway
S1AP S1 Application Part
SC-FDMA Single Carrier - Frequency Division Multiple Access
SGW Serving Gateway
SINR Signal-to-Interference-plus-Noise Ratio
SIR Signal-to-Interference Ratio
SN Sequence Number
SON Self-Organizing Network
SRB Signaling Radio Bearers
TE Terminal Equipment
TM Transparent Mode
TTI Transmission Time Interval
TTT Time-to-Trigger
UE User Equipment, the 3GPP name for the mobile terminal
UL Uplink
UM Unacknowledged Mode
UMTS Universal Mobile Telecommunication System
USIM Universal Subscriber Identity Module
VoIP Voice over IP
X2 Interface between eNodeBs
General Introduction
In recent years, there has been enormous growth in mobile telecommunications traffic in line
with the rapid spread of smart phone devices. The cellular networks are evolving to meet the
future requirements of data rate, coverage and capacity. LTE Advanced is a mobile
communication standard and a major enhancement of the Long Term Evolution (LTE)
standard. It was formally submitted to the ITU as a candidate 4G system in late 2009,
meeting the requirements of the IMT-Advanced standard, and was standardized by the 3rd
Generation Partnership Project (3GPP) in March 2011 as 3GPP Release 10. One of the
important LTE Advanced benefits is the ability to take advantage of advanced topology
networks; optimized heterogeneous networks with a mix of macrocells with low power nodes
such as picocells, femtocells and new relay nodes. The next significant performance leap in
wireless networks will come from making the most of topology, bringing the network closer
to the user by adding many of these low-power nodes. LTE-Advanced further improves the
capacity and coverage, and ensures user fairness. LTE-Advanced also introduces multicarrier
to be able to use ultra wide bandwidth, up to 100 MHz of spectrum supporting very high data
rates. Mobility support is an important aspect of Long Term Evolution, since the system
should support mobility at speeds up to 350 km/h, or even up to 500 km/h. At such speeds
the handover procedure becomes more frequent and must complete faster; handover
performance therefore becomes more crucial, especially for real-time services [11].
One of the main goals of LTE-Advanced or any wireless system for that matter is to provide
fast and seamless handover from one cell (a source cell) to another (a target cell). The service
should be maintained during the handover procedure, and data transfer should be neither
delayed nor lost; otherwise performance will be dramatically degraded. This is especially
applicable for LTE-Advanced systems because of the distributed nature of the LTE radio
access network architecture which consists of just one type of node, the base station, known in
LTE-Advanced as the eNodeB [7].
In LTE-Advanced there are also some predefined handover conditions for triggering the
handover procedure as well as some goals regarding handover design and optimization such
as decreasing the total number of handovers in the whole system by predicting the handover,
decreasing the number of ping pong handovers, and having fast and seamless handover.
Hence, optimizing the handover procedure to get the required performance is considered as
one important issue in LTE-Advanced networks [11].
Many studies have been carried out to improve LTE-Advanced handover, proposing different
HO algorithms that proceed in several stages for different cases; all of them aim at
optimum handover mechanisms that can handle smooth handover at the cell boundaries of the
LTE-Advanced network.
The main objective of this project is to develop a Q-learning algorithm to self-optimize the
parameters used in the handover process of 4G networks.
This report comprises three chapters. The first chapter gives an overview of LTE
technology: the main characteristics and functionalities of the system are described, as
well as the enabling technologies, network architecture and protocols. In the second
chapter, we introduce the general concepts of handover and describe the whole HO
procedure; the optimization and design principles, the variables used as inputs and the
different HO parameters are also explained. Finally, the third chapter discusses our
proposed approach: first we present machine learning, in particular reinforcement learning
and Q-Learning; then we discuss the handover parameter optimization; finally we present
the simulation parameters and the obtained results.
Chapter I: LTE-Advanced Overview
Introduction:
In LTE-Advanced networks, the focus is on higher capacity: the driving force behind
developing LTE towards LTE-Advanced (LTE Release 10) was to provide higher bit rates in a
cost-efficient way and, at the same time, to completely fulfill the requirements set by
the ITU for IMT-Advanced, also referred to as 4G.
In this chapter, we will present the LTE-Advanced technologies, resource elements and the
network architecture by citing the different key components.
1.1. Requirements and Targets for LTE-Advanced:
3GPP completed the process of defining LTE-Advanced radio access so that the technology
remains competitive in the future, identifying a set of high-level requirements, some of
which have already been exceeded. The following target requirements were agreed among
operators and vendors in the project to define the evolution of 3G networks.
Table 1: LTE-Advanced development history.
                          WCDMA     HSPA             HSPA+      LTE        LTE-A
                          (UMTS)    (HSDPA/HSUPA)
Max downlink speed (bps)  384 K     14 M             28 M       100 M      1 G
Max uplink speed (bps)    128 K     5.7 M            11 M       50 M       100 M
Latency, round-trip
time (approx.)            150 ms    100 ms           50 ms max  ~10 ms     Less than 5 ms
3GPP releases             Rel 99/4  Rel 5/6          Rel 7      Rel 8/9    Rel 10
Approx. years of          2003/4    2005/6 (HSDPA)   2008/9     2009/10    2011
initial roll-out                    2007/8 (HSUPA)
Access methodology        CDMA      CDMA             CDMA       OFDMA/     OFDMA/
                                                                SC-FDMA    SC-FDMA
Some of key LTE-Advanced requirements related to data rate, throughput, latency, and
mobility are provided below [3]:
Peak data rate:
o A 1 Gbps data rate will be achieved by 4-by-4 MIMO and a transmission bandwidth
wider than approximately 70 MHz.
Peak spectrum efficiency:
o DL: Rel. 8 LTE satisfies the IMT-Advanced requirement: 30 bps/Hz.
o UL: needs to double from Release 8 to satisfy the IMT-Advanced
requirement: 15 bps/Hz, and 30 bps/Hz in Rel. 10.
Capacity and cell-edge user throughput:
o The target for LTE-Advanced was set considering a gain of 1.4 to 1.6 over Release 8
LTE performance.
Spectrum flexibility:
In addition to the bands currently defined for LTE Release 8, TR 36.913
identifies the following new bands:
o 450–470 MHz band
o 698–862 MHz band
o 790–862 MHz band
o 2.3–2.4 GHz band
o 3.4–4.2 GHz band
o 4.4–4.99 GHz band
Some of these bands are now formally included in the 3GPP Release 9 and Release 10
specifications. Note that frequency bands are considered release independent features, which
means that it is acceptable to deploy an earlier release product in a band not defined until a
later release. LTE-Advanced is designed to operate in spectrum allocations of different sizes,
including allocations wider than the 20 MHz in Release 8, in order to achieve higher
performance and target data rates. Although it is desirable to have bandwidths greater than 20
MHz deployed in adjacent spectrum, the limited availability of spectrum means that
aggregation from different bands is necessary to meet the higher bandwidth requirements.
This option has been allowed for in the IMT-Advanced specifications.
Mobility:
o E-UTRAN should be optimized for low mobile speed from 0 to 15 km/h.
o Higher mobile speed between 15 and 120 km/h should be supported with high
performance.
o Mobility across the cellular network shall be maintained at speeds from 120 km/h
to 350 km/h (or even up to 500 km/h depending on the frequency band).
Coverage:
o The throughput, spectrum efficiency and mobility targets above should be met for 5
km cells, with a slight degradation for 30 km cells. Cell ranges up to 100 km
should not be precluded.
o Available for paired and unpaired spectrum arrangements.
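As a rough sanity check on the peak-data-rate target listed above, the implied spectral efficiency can be computed from the quoted figures. The numbers below (1 Gbps, 4-by-4 MIMO, ~70 MHz) are taken directly from the requirements; the calculation itself is only an illustration.

```python
# Sanity check of the LTE-Advanced peak data-rate target quoted above.
peak_rate_bps = 1e9        # 1 Gbps target
bandwidth_hz = 70e6        # ~70 MHz transmission bandwidth
spatial_layers = 4         # 4-by-4 MIMO

total_se = peak_rate_bps / bandwidth_hz      # aggregate bps/Hz over all layers
per_layer_se = total_se / spatial_layers     # bps/Hz per spatial layer

print(f"aggregate spectral efficiency: {total_se:.1f} bps/Hz")
print(f"per-layer spectral efficiency: {per_layer_se:.1f} bps/Hz")
```

The aggregate figure (about 14.3 bps/Hz) shows why wide bandwidth and multiple spatial layers are both needed: neither alone gets close to 1 Gbps with realistic per-layer efficiencies.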
1.2. LTE Enabling Technologies:
LTE has introduced a number of new technologies compared to previous cellular systems.
They enable LTE-Advanced to use spectrum more efficiently and to provide the much higher
data rates that are required.
A major difference of LTE-Advanced in comparison to its 3GPP ancestors is the radio
interface; Orthogonal Frequency Division Multiple Access (OFDMA) and Single Carrier
Frequency Division Multiple Access (SC-FDMA) are used for the downlink and uplink
respectively, as radio access schemes [6].
1.2.1. Downlink OFDMA (Orthogonal Frequency Division Multiple Access):
OFDMA is a variant of OFDM (Orthogonal Frequency Division Multiplexing) and it is the
downlink access technology. One of the most important advantages is the intrinsic
orthogonality provided by OFDMA to the users within a cell, which translates into an almost
null level of intra-cell interference. Therefore, inter-cell interference is the limiting factor
when high reuse levels are intended. In this case, cell-edge users are especially susceptible to
the effects of inter-cell interference. OFDMA divides the wide available bandwidth into many
narrow and mutually orthogonal subcarriers and transmits the data in parallel streams. The
smallest transmission unit in the downlink LTE-Advanced system is known as a Physical
Resource Block (PRB).
Figure 1: Orthogonal Frequency Division Multiple Access [5].
A resource block contains 12 subcarriers, regardless of the overall LTE-Advanced signal
bandwidth. They also cover one slot in the time frame; this means that different LTE-
Advanced signal bandwidths will have different numbers of resource blocks.
Table 2: Number of PRBs.
Channel Bandwidth (MHz)   1.4   3    5    10   15   20
Number of PRBs            6     15   25   50   75   100
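The bandwidth-to-PRB mapping in Table 2 can be expressed as a simple lookup. This is an illustrative sketch, not 3GPP reference code; the function name and structure are our own.

```python
# Number of physical resource blocks (PRBs) per LTE channel bandwidth,
# as listed in Table 2. Each PRB spans 12 subcarriers.
PRB_PER_BANDWIDTH_MHZ = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}

def num_subcarriers(bandwidth_mhz: float) -> int:
    """Occupied data subcarriers for a given channel bandwidth."""
    return PRB_PER_BANDWIDTH_MHZ[bandwidth_mhz] * 12  # 12 subcarriers per PRB

print(num_subcarriers(20))   # 1200 occupied subcarriers at 20 MHz
print(num_subcarriers(1.4))  # 72 subcarriers
```

Note that the smallest bandwidth, 1.4 MHz, yields 6 x 12 = 72 subcarriers, which matches the minimum set of 72 subcarriers every base station must support, as discussed below.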
The OFDM signal used in LTE-Advanced comprises a maximum of 2048 subcarriers with a
spacing of 15 kHz. Although mobiles must be capable of receiving all 2048 subcarriers, the
base station (eNodeB) only needs to support the transmission of 72 subcarriers. In this
way every mobile is able to talk to any base station.
1.2.2. Uplink SC-FDMA (Single Carrier Frequency Division Multiple Access):
For the LTE-Advanced uplink, a different concept is used for the access technique. Although
still using a form of OFDMA technology, the implementation is called Single Carrier
Frequency Division Multiple Access (SC-FDMA). The main task of this scheme is to assign
communication resources to multiple users. The major difference to other schemes is that it
performs DFT (Discrete Fourier Transform) operation on time domain modulated data before
going into OFDM modulation.
One of the key parameters that affect all mobiles is that of battery life. Even though battery
performance is improving all the time, it is still necessary to ensure that the mobiles use as
little battery power as possible. With the RF power amplifier that transmits the radio
frequency signal via the antenna to the base station being the highest power item within the
mobile, it is necessary that it operates in as efficient mode as possible. This can be
significantly affected by the form of radio frequency modulation and signal format. Signals
that have a high peak to average ratio and require linear amplification do not lend themselves
to the use of efficient RF power amplifiers [5].
1.2.3. LTE-A Channel Bandwidths and resource elements:
One of the key parameters associated with the use of OFDM within LTE-Advanced is the
choice of bandwidth. The available bandwidth influences a variety of decisions including the
number of carriers that can be accommodated in the OFDM signal and in turn this influences
elements including the symbol length and so forth [6].
LTE supports six channel bandwidths, and a higher bandwidth yields a greater channel
capacity: 1.4 MHz, 3 MHz, 5 MHz, 10 MHz, 15 MHz and 20 MHz.
In addition, the subcarriers are spaced 15 kHz apart. To maintain orthogonality, this
gives a symbol duration of 1 / 15 kHz = 66.7 µs. Each subcarrier carries data at a maximum
rate of 15 ksps (kilosymbols per second), which gives a 20 MHz bandwidth system (100 PRBs,
i.e. 1200 subcarriers) a raw symbol rate of 18 Msps. In turn this provides a raw data rate
of 108 Mbps, as each 64QAM symbol represents six bits.
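The arithmetic in the paragraph above can be reproduced directly:

```python
# Raw symbol/data rate arithmetic for a 20 MHz LTE carrier, as in the text.
subcarrier_spacing_hz = 15_000
symbol_duration_s = 1 / subcarrier_spacing_hz           # 66.7 us per symbol
subcarriers = 100 * 12                                  # 100 PRBs x 12 subcarriers
symbol_rate_sps = subcarriers * subcarrier_spacing_hz   # 15 ksps per subcarrier
bits_per_symbol = 6                                     # 64QAM carries 6 bits/symbol
raw_rate_bps = symbol_rate_sps * bits_per_symbol

print(f"{symbol_rate_sps / 1e6:.0f} Msps -> {raw_rate_bps / 1e6:.0f} Mbps")
# 18 Msps -> 108 Mbps
```

This is the raw physical-layer rate before cyclic prefix, reference signals, control channels and coding overhead are accounted for, so the usable throughput is lower.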
1.3. LTE-Advanced Network Architecture:
LTE-A has been designed to support only packet-switched services, in contrast to the
circuit-switched model of previous cellular systems. It aims to provide seamless Internet
Protocol (IP) connectivity between User Equipment (UE) and the Packet Data Network (PDN),
without any disruption to the end users' applications during mobility [2].
While the term "LTE" encompasses the evolution of the Universal Mobile
Telecommunications System (UMTS) radio access through the Evolved UTRAN (E-UTRAN), it is
accompanied by an evolution of the non-radio aspects under the term "System Architecture
Evolution" (SAE).
Together LTE-Advanced and SAE comprise the Evolved Packet System (EPS). This EPS in
turn includes the EPC (Evolved Packet Core) on the core side and E-UTRAN (Evolved
UMTS Terrestrial Radio Access Network) on the access side [2].
In addition to these two components, User Equipment (UE) and Services Domain are also
very important subsystems of LTE architecture.
1.3.1. The Core Network: Evolved Packet Core (EPC):
The core network is responsible for the overall control of the UE and establishment of the
bearers. The Evolved Packet Core is the main element of the LTE-Advanced SAE network.
This consists of four main elements and connects to the eNodeBs as shown in the diagram
below.
Figure 2: LTE-Advanced SAE Evolved Packet Core [6].
Mobility Management Entity (MME):
The MME is the main control node for the LTE SAE access network, handling a number of
features; it therefore provides a considerable level of overall control functionality. The
protocols running between the UE and the CN are known as the Non-Access Stratum (NAS)
protocols. The main functions supported by the MME can be classified as:
Functions related to bearer management – this includes the establishment,
maintenance and release of the bearers, handled by the session management layer in
the NAS protocol.
Functions related to connection management – this includes the establishment of the
connection and security between the network and the UE, handled by the connection or
mobility management layer in the NAS protocol layer.
Serving Gateway (SGW):
The Serving Gateway (SGW) is a data-plane element within the LTE SAE. Its main purpose is
to manage user-plane mobility, and it also acts as the main border between the Radio
Access Network (RAN) and the core network. The SGW also maintains the data paths between
the eNodeBs and the PDN Gateways; in this way the SGW forms an interface to the packet
data network at the E-UTRAN.
PDN Gateway (PGW):
The LTE SAE PDN (Packet Data Network) gateway provides connectivity for the UE to
external packet data networks, fulfilling the function of entry and exit point for UE data. The
UE may have connectivity with more than one PGW for accessing multiple PDNs.
Home Subscription Server (HSS):
The HSS is a database server located on the operator's premises. It stores all user
subscription information, keeps records of the user's location, and holds the master copy
of the user subscription profile. The HSS interacts with the MME and must be connected to
all MMEs in the network that may control the UE.
1.3.2. The Access Network E-UTRAN:
The E-UTRAN is the access network of LTE and simply consists of a network of eNodeBs
connected to each other via the X2 interface, as illustrated in Figure 3. The eNodeBs are
also connected to the EPC via the S1 interface: to the MME by means of the S1-MME
interface and to the S-GW by means of the S1-U interface.
Figure 3: E-UTRAN Architecture [9].
eNodeB:
The eNodeB is a radio base station of an LTE network that controls all radio-related functions in the fixed part of the system. These radio base stations are distributed throughout the coverage region, each placed near a radio antenna. One of the biggest differences between the LTE network and the legacy 3G mobile communication system lies in the base station.
Practically, an eNodeB provides bridging between the UE and EPC. All the radio protocols
that are used in the access link are terminated in the eNodeB. The eNodeB does
ciphering/deciphering in the user plane as well as IP header compression/decompression. The
eNodeB also has some responsibilities in the control plane such as radio resource
management and performing control over the usage of radio resources.
The E-UTRAN is responsible for all radio-related functions. Its main features are the following:
Radio Resource Management:
The RRM objective is to make mobility feasible in cellular wireless networks, so that the network, with the help of the UE, takes care of mobility without user intervention. RRM
covers all functions related to the radio bearers, such as radio bearer control, radio admission
control, radio mobility control, scheduling and dynamic allocation of resources to UEs in both
uplink and downlink.
IP Header Compression:
This helps to ensure efficient use of the radio interface by compressing the IP packet headers
which could otherwise represent a significant overhead, especially for small packets such as
VoIP.
One of the main functions of PDCP (Packet Data Convergence Protocol) is header
compression using the Robust Header Compression (ROHC) protocol defined by the IETF. In
LTE, header compression is very important because there is no support for the transport of
voice services via the Circuit-Switched (CS) domain.
Security:
Security is a very important feature of all 3GPP radio access technologies. LTE provides security in a similar way to its predecessors UMTS and GSM. Because of the sensitivity of the signaling messages exchanged between the eNodeB and the terminal, or between the MME and the terminal, this information is protected against eavesdropping and alteration.
The implementation of security architecture of LTE is carried out by two functions: Ciphering
of both control plane (RRC) data and user plane data, and Integrity Protection which is used
for control plane (RRC) data only. Ciphering is used in order to protect the data streams from
being received by a third party, while Integrity Protection allows the receiver to detect packet
insertion or replacement. RRC always activates both functions together, either following
connection establishment or as part of the handover to LTE.
Connectivity to the EPC:
This function consists of the signaling towards the MME and the bearer path towards the S-GW.
On the network side, all of these functions reside in the eNodeBโs, each of which can be
responsible for managing multiple cells. Unlike some of the previous second and third
generation technologies, LTE integrates the radio controller function into the eNodeB. This
allows tight interaction between the different protocol layers of the radio access network
(RAN), thus reducing latency and improving efficiency. Furthermore, as LTE does not
support soft handover there is no need for a centralized data-combining function in the
network. One consequence of the lack of a centralized controller node is that, as the UE
moves, the network must transfer all information related to a UE, that is, the UE context,
together with any buffered data, from one eNodeB to another. Mechanisms are therefore
needed to avoid data loss during handover.
Figure 4: Functional Split between E-UTRAN and EPC [5].
1.3.3. The User Equipment (UE):
The end user communicates using a UE. The UE can be a handheld device like a smart phone
or it can be a device which is embedded in a laptop. The UE is divided into two parts: the
Universal Subscriber Identity Module (USIM) and the rest of the UE, which is called
Terminal Equipment (TE).
The USIM is an application with the purpose of identification and authentication of the user
for obtaining security keys. This application is placed into a removable smart card called a
universal integrated circuit card (UICC).
The UE in general is the end-user platform that, through signaling with the network, sets up, maintains, and removes the necessary communication links. The UE also assists in the handover procedure and reports the terminal location to the network.
1.4. E-UTRAN Network Interfaces:
Two interfaces are involved in the LTE handover procedure for UEs in active mode: the X2 and S1 interfaces. Both can be used in handover procedures, but with different purposes.
1.4.1. X2 Interface:
The X2 interface has a key role in the intra-LTE handover operation. The source eNodeB uses the X2 interface to send the Handover Request message to the target eNodeB. If the X2 interface does not exist between the two eNodeBs in question, procedures need to be initiated to set one up before the handover can be achieved [3].
The Handover Request message prompts the target eNodeB to reserve resources; assuming resources are found, it replies with a Handover Request Acknowledgement message.
Several information elements are provided (some optional) in the Handover Request message, such as:
- Requested SAE bearers to be handed over.
- Handover restrictions list, which may restrict subsequent handovers for the UE.
- Last visited cells the UE has been connected to, if the UE history information collection functionality is enabled. This is considered useful in avoiding ping-pong effects between different cells, since the target eNodeB is given information on how the serving eNodeB has changed in the past; actions can then be taken to limit frequent handovers.
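The request/acknowledge exchange described above can be sketched as a small simulation. The class and field names below are illustrative assumptions, not actual X2AP encodings, and admission control is reduced to a simple free-slot check.

```python
from dataclasses import dataclass, field

@dataclass
class HandoverRequest:
    """Illustrative subset of X2 Handover Request information elements."""
    ue_id: int
    bearers: list                                     # requested SAE bearers to hand over
    restrictions: list = field(default_factory=list)  # handover restrictions list
    last_visited_cells: list = field(default_factory=list)  # UE history information

class ENodeB:
    def __init__(self, cell_id, capacity):
        self.cell_id = cell_id
        self.capacity = capacity  # free bearer slots (toy admission control)

    def handle_handover_request(self, req):
        # Admission control: accept only if resources for all bearers are available.
        if len(req.bearers) <= self.capacity:
            self.capacity -= len(req.bearers)
            return ("HANDOVER REQUEST ACKNOWLEDGE", req.ue_id)
        return ("HANDOVER PREPARATION FAILURE", req.ue_id)

target = ENodeB(cell_id=2, capacity=4)
req = HandoverRequest(ue_id=7, bearers=["voice", "data"],
                      last_visited_cells=[1, 3, 1])  # history helps detect ping-pong
print(target.handle_handover_request(req))  # -> ('HANDOVER REQUEST ACKNOWLEDGE', 7)
```

In the real protocol the acknowledgement also carries a transparent RRC container for the UE; the sketch only models the accept/reject decision.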
Figure 5: Protocol stack for the user-plane and control-plane at X2 interface [3].
1.4.2. S1 Interface:
The radio network signaling over S1 consists of the S1 Application Part (S1AP). The S1AP protocol handles all procedures between the EPC and the E-UTRAN. It is also capable of carrying messages transparently between the EPC and the UE. Over the S1 interface, the S1AP protocol primarily supports general E-UTRAN procedures from the EPC, transfers transparent non-access signaling and performs the mobility function. The figure below shows the protocol stack for the user plane and control plane at the S1 interface [3].
Figure 6: Protocol stack for the user-plane and control-plane at S1 interface [3].
1.5. LTE Protocol Architecture:
The overall radio interface protocol architecture for LTE can be divided into user plane protocols and control plane protocols. The E-UTRAN protocol stack is depicted in Figure 7.
Figure 7: E-UTRAN Protocol Stack [8].
1.5.1. User Plane:
An IP packet is tunneled between the P-GW and the eNodeB to be transmitted towards the
UE. Different tunneling protocols can be used. The tunneling protocol used by 3GPP is called
the GPRS tunneling protocol (GTP) [8].
The LTE Layer 2 user-plane protocol stack is composed of three sub layers: Packet Data
Convergence Protocol (PDCP), Radio Link Control (RLC) and Medium Access Control
(MAC). These sub layers are terminated in the eNodeB on the network side.
1.5.2. Control Plane:
The control plane and the user plane share common protocols that perform the same functions, except that there is no header compression for the control plane protocols. In the access stratum protocol stack, above PDCP, sits the Radio Resource Control (RRC) protocol, which is considered a "Layer 3" protocol. RRC carries signaling messages between the eNodeB and the UE for establishing and configuring the radio bearers of all lower layers in the access stratum.
1.5.2.1. Radio Resource Control (RRC):
The RRC (Radio Resource Control) layer is a key signaling protocol which supports many
functions between the terminal and the eNodeB. The RRC protocol enables the transfer of
common NAS information which is applicable to all UEs as well as dedicated NAS
information which is applicable only to a specific UE. In addition, for UEs in RRC_IDLE,
RRC supports notification of incoming calls.
The key features of RRC are the following:
- Broadcast of System Information: handles the broadcasting of system information, which includes NAS common information. Some of the system information is applicable only for UEs in RRC-IDLE, while other system information is also applicable for UEs in RRC-CONNECTED.
- RRC Connection Management: covers all procedures related to the establishment, modification and release of an RRC connection, including paging, initial security activation, establishment of Signaling Radio Bearers (SRBs) and of radio bearers carrying user data (Data Radio Bearers, DRBs), handover within LTE (including transfer of UE RRC context information), configuration of the lower protocol layers, access class barring and radio link failure.
- Establishment and release of radio resources: this relates to the allocation of resources for the transport of signaling messages or user data between the terminal and the eNodeB.
- Paging: this is performed through the PCCH logical control channel. The prominent usage of paging is to page UEs that are in RRC-IDLE. Paging can also be used to notify UEs in both RRC-IDLE and RRC-CONNECTED modes about system information changes or SIB10 and SIB11 transfers.
- Transmission of signaling messages to and from the EPC: these messages (known as NAS, for Non-Access Stratum) are transferred to and from the terminal via RRC; they are, however, treated by RRC as transparent messages.
- Handover: the handover is triggered by the eNodeB, based on the measurement reports received from the UE. Handover is classified into different types based on the origin and destination of the handover. The handover can start and end in the E-UTRAN, it can start in the E-UTRAN and end in another Radio Access Technology (RAT), or it can start in another RAT and end in the E-UTRAN.
The RRC also supports a set of functions related to end-user mobility for terminals in the RRC-CONNECTED state. This includes:
- Measurement control: this refers to the configuration of measurements to be performed by the terminal as well as the method of reporting them to the eNodeB.
- Support of inter-cell mobility procedures, also known as handover.
- User context transfer between eNodeBs at handover.
1.5.2.2. Radio Resource Control States:
The main function of the RRC protocol is to manage the connection between the terminal and the E-UTRAN access network. To achieve this, RRC protocol states have been defined; they are depicted in the figure below. Each of them corresponds to a state of the connection and describes how the network and the terminal shall handle specific functions such as terminal mobility, paging message processing and network system information broadcasting [14].
In E-UTRAN, the RRC state machine is very simple and limited to only two states: RRC-IDLE and RRC-CONNECTED.
Figure 8: The RRC States [14]
In the RRC-IDLE state, there is no connection between the terminal and the eNodeB,
meaning that the terminal is actually not known by the E-UTRAN Access Network. The
terminal user is inactive from an application level perspective, which does not mean at all that
nothing happens at the radio interface level. Nevertheless, the terminal behavior is specified in
order to save as much battery power as possible and is actually limited to three main items:
- Periodic decoding of System Information broadcast by the E-UTRAN: this process is required in case the information is dynamically updated by the network.
- Decoding of paging messages: so that the terminal can connect to the network in case of an incoming session.
- Cell reselection: the terminal periodically evaluates the best cell to camp on, through its own radio measurements and based on network System Information parameters. When the reselection condition is met, the terminal autonomously selects a new serving cell.
In the RRC-CONNECTED state, there is an active connection between the terminal and the eNodeB, which implies a communication context being stored within the eNodeB for this terminal. Both sides can exchange user data and/or signaling messages over logical channels. Unlike the RRC-IDLE state, the terminal location is known at the cell level. Terminal mobility is under the control of the network through the handover procedure, whose decision is based on many possible criteria, including measurements reported by the terminal or by the physical layer of the eNodeB itself.
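The two-state machine of Figure 8 can be sketched as follows. The class and method names are illustrative, not 3GPP message names; the sketch only captures the IDLE/CONNECTED transitions described in the text.

```python
from enum import Enum

class RRCState(Enum):
    IDLE = "RRC-IDLE"
    CONNECTED = "RRC-CONNECTED"

class Terminal:
    """Minimal model of the two-state E-UTRAN RRC machine."""
    def __init__(self):
        self.state = RRCState.IDLE  # no context stored in any eNodeB yet

    def rrc_connection_establishment(self):
        # IDLE -> CONNECTED: the eNodeB now stores a context for this terminal.
        if self.state is RRCState.IDLE:
            self.state = RRCState.CONNECTED

    def rrc_connection_release(self):
        # CONNECTED -> IDLE: mobility is again handled by cell reselection.
        if self.state is RRCState.CONNECTED:
            self.state = RRCState.IDLE

ue = Terminal()
ue.rrc_connection_establishment()
print(ue.state)  # RRCState.CONNECTED
ue.rrc_connection_release()
print(ue.state)  # RRCState.IDLE
```

The guard conditions make the transitions idempotent, mirroring the fact that only these two states exist in E-UTRAN.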
1.6. Self-Organizing Networks:
A Self-Organizing Network (SON) is an automation technology designed to make the planning, configuration, management, optimization and healing of mobile radio access networks simpler and faster. SON functionality and behavior have been defined and specified in generally accepted mobile industry recommendations produced by organizations such as 3GPP and the NGMN.
SON has been codified within 3GPP Release 8 and subsequent specifications in a series of standards including 36.902, as well as in public white papers outlining use cases from the NGMN. The first technology making use of SON features is Long Term Evolution (LTE), but the technology has also been retrofitted to older radio access technologies such as the Universal Mobile Telecommunications System (UMTS). The LTE specification inherently supports SON features such as Automatic Neighbor Relation (ANR) detection, the 3GPP LTE Rel. 8 flagship feature.
Newly added base stations should be self-configured in line with a "plug-and-play" paradigm,
while all operational base stations will regularly self-optimize parameters and algorithmic
behavior in response to observed network performance and radio conditions. Furthermore,
self-healing mechanisms can be triggered to temporarily compensate for a detected equipment
outage, while awaiting a more permanent solution.
Self-organizing network functionalities are commonly divided into three major sub-functional
groups, each containing a wide range of decomposed use cases:
Self-configuration functions:
Self-configuration strives towards the "plug-and-play" paradigm, in the sense that new base stations shall automatically be configured and integrated into the network; both connectivity establishment and the download of configuration parameters are handled in software. Self-configuration is typically supplied as part of the software delivery with each radio cell by equipment vendors. When a new base station is introduced into the network and powered on, it is immediately recognized and registered by the network. The neighboring base stations then automatically adjust their technical parameters (such as emission power, antenna tilt, etc.) in order to provide the required coverage and capacity and, at the same time, avoid interference.
Self-optimization functions:
Every base station contains hundreds of configuration parameters that control various aspects
of the cell site. Each of these can be altered to change network behavior, based on
observations of both the base station itself, and measurements at the mobile station or handset.
One of the first SON features establishes neighbor relations automatically (ANR), while others optimize random access parameters or mobility robustness in terms of handover oscillations. A very illustrative use case is the automatic switch-off of a percentage of base stations during the night hours. The neighboring base stations then reconfigure their parameters in order to keep the entire area covered by signal. In case of a sudden growth in connectivity demand for any reason, the "sleeping" base stations "wake up" almost instantaneously. This mechanism leads to significant energy savings for operators.
Self-healing functions:
When some nodes in the network become inoperative, self-healing mechanisms aim at reducing the impact of the failure, for example by adjusting parameters and algorithms in adjacent cells so that other nodes can support the users that were served by the failing node. In legacy networks, failing base stations are at times hard to identify, and a significant amount of time and resources is required to fix them. This SON function makes it possible to spot such failing base stations immediately, take further measures, and ensure no or only insignificant degradation of service for the users.
Table 3: Operational benefits of SON.
Self-Configuration: flexibility in logistics (eNodeB not site specific); reduced site/parameter planning; simplified installation, less prone to errors; no/minimum drive tests; faster rollout.
Self-Optimization: increased network quality and performance; parameter optimization reduces maintenance and site visits.
Self-Healing: error self-detection and mitigation; faster maintenance; reduced outage time.
Conclusion:
The focus of LTE-Advanced is on higher capacity, which is the driving force behind further developing LTE towards LTE-Advanced. LTE-Advanced provides higher bitrates in a cost-efficient way and, at the same time, completely fulfills the requirements set by the ITU for IMT-Advanced, also referred to as 4G. In the next chapter, we will pay particular attention to handover.
Chapter II: Handover in LTE-Advanced
Introduction:
Mobility is an essential component of mobile cellular communication systems because it
offers clear benefits to the end users: low delay services such as voice or real time video
connections can be maintained while moving even in high speed trains.
Handover is one of the key procedures for ensuring that users move freely through the network while remaining connected and being offered quality services. Since its success rate is a key indicator of user satisfaction, it is vital that this procedure happens as fast and as seamlessly as possible. Hence, optimizing the handover procedure to reach the required performance is considered an important issue in LTE networks.
In this context, this chapter studies the handover through its characteristics and its different types.
2.1. Handover Definition and Characteristics:
The process of handover is very important in mobile telecommunications. It involves moving the resource allocation for a mobile phone or a piece of UE from one base station to another. This process provides better Quality-of-Service (QoS) to customers by allowing them to continue using the provided services even after moving out of range of the original serving base station. It is important that handovers are performed quickly, cause little to no disruption to the user's experience and are completed with a very high success rate. If a handover is unsuccessful, it is likely that an on-going call will be dropped, either because there are not enough resources available on the target base station (known as an eNodeB in LTE) or because the Received Signal Strength (RSS) at the UE drops below the threshold needed to maintain the call. This threshold, in LTE, is known as the noise floor and has a value of -97.5 dB. Handovers are stated to take roughly 0.25 seconds to complete after the decision has been made for a handover to take place [17].
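The threshold check described above can be written down as a one-line predicate. The constant reuses the -97.5 dB figure quoted in the text; the function name is an illustrative assumption, not a standardized procedure.

```python
# Threshold quoted in the text above, treated here as a given constant.
NOISE_FLOOR_DB = -97.5

def call_can_be_maintained(rss_db: float) -> bool:
    """Return True if the received signal strength can still sustain the call."""
    return rss_db > NOISE_FLOOR_DB

# A UE moving away from its serving eNodeB sees a falling RSS:
for rss in (-80.0, -95.0, -99.0):
    print(rss, call_can_be_maintained(rss))
```

In practice the handover must be triggered well before the RSS reaches this floor, which is exactly what the handover parameters studied later control.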
Depending on the required QoS, a seamless handover or a lossless handover is performed as
appropriate for each radio bearer. The descriptions of each of them are presented below.
2.1.1. Seamless Handover:
The objective of seamless handover is to provide a given QoS when the UE moves from the coverage of one cell to the coverage of another. In LTE, seamless handover is applied to all radio bearers carrying control plane data and to user plane radio bearers mapped on RLC-UM. These types of data are typically reasonably tolerant of losses but less tolerant of delay (e.g. voice services). Therefore, seamless handover should minimize complexity and delay, even though some SDUs might be lost [4].
In seamless handover, the PDCP entities, including the header compression contexts, are reset, and the COUNT values are set to zero. As a new key is in any case generated at handover, there is no security reason to keep the COUNT values. On the UE side, all the PDCP SDUs that have not yet been transmitted will be sent in the target cell after handover. On the network side, PDCP SDUs whose transmission has not been started can be forwarded via the X2 interface towards the target eNodeB; unacknowledged PDCP SDUs will be lost. This minimizes the handover complexity because no context (i.e. configuration information) has to be transferred between the source and the target eNodeB.
2.1.2. Lossless Handover:
Lossless handover means that no data should be lost during handover. This is achieved by retransmitting the PDCP PDUs whose reception has not been acknowledged by the UE before the UE detaches from the source cell to make a handover. In lossless handover, in-sequence delivery during handover can be ensured by using the PDCP Data PDU sequence numbers. Lossless handover is very suitable for delay-tolerant services such as file downloads, where the loss of PDCP SDUs can drastically decrease the data rate because of the TCP reaction.
Lossless handover is applied for user plane and for some control plane radio bearers that are mapped on RLC-AM. In lossless handover, on the UE side, the header compression protocol is reset because its context is not forwarded from the source eNodeB to the target eNodeB, but the PDCP SDU sequence numbers and the COUNT values are not reset [4]. To ensure lossless handover in the uplink, the PDCP PDUs stored in the PDCP retransmission buffer are retransmitted based on the PDCP SNs, which are maintained during the handover, and delivered to the gateway in the correct sequence.
In order to ensure lossless handover in the downlink, the source eNodeB forwards the uncompressed PDCP SDUs whose reception has not yet been acknowledged by the UE to the target eNodeB, for retransmission in the downlink.
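The lossless mechanism above can be sketched as a toy PDCP retransmission buffer: PDUs are kept, keyed by sequence number, until acknowledged, and at handover the unacknowledged ones are handed over in order while the SNs themselves are not reset. Class and method names are illustrative assumptions.

```python
class PdcpEntity:
    """Toy model of lossless handover: hand over unacknowledged PDUs by SN."""
    def __init__(self):
        self.next_sn = 0
        self.retx_buffer = {}   # SN -> payload, pending acknowledgement

    def send(self, payload):
        sn = self.next_sn
        self.retx_buffer[sn] = payload  # keep until acknowledged
        self.next_sn += 1
        return sn

    def ack(self, sn):
        self.retx_buffer.pop(sn, None)  # acknowledged PDUs need no retransmission

    def handover(self):
        # At handover the SNs are NOT reset; unacknowledged PDUs are passed to
        # the target cell, sorted by SN so in-sequence delivery is preserved.
        return sorted(self.retx_buffer.items())

source = PdcpEntity()
for p in ("sdu0", "sdu1", "sdu2"):
    source.send(p)
source.ack(0)                      # sdu0 was received before the handover
print(source.handover())           # [(1, 'sdu1'), (2, 'sdu2')]
```

Sorting by SN mirrors the in-sequence delivery requirement; a seamless-mode entity would instead reset the buffer and accept the loss of the pending SDUs.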
2.2. Types of Handover:
The handover is triggered by the eNodeB, based on the measurement reports received from the UE. Handover is classified into different types based on the origin and destination of the handover. The handover can start and end in the E-UTRAN, it can start in the E-UTRAN and end in another Radio Access Technology (RAT), or it can start in another RAT and end in the E-UTRAN [15].
Handover is classified as:
- Intra-frequency intra-LTE handover.
- Inter-frequency intra-LTE handover.
- Inter-RAT towards LTE handover.
- Inter-RAT towards UTRAN handover.
- Inter-RAT towards GERAN handover.
- Inter-RAT towards cdma2000 system handover.
2.2.1. Intra LTE Handover: Horizontal Handover:
In intra-LTE handover, which is the focus of this project, both the source and target eNodeBs are within the LTE system. In this type of handover, the RRC Connection Reconfiguration message acts as the handover command. The interface between eNodeBs is the X2 interface. Upon handover, the source eNodeB sends an X2 Handover Request message to the target eNodeB in order to prepare it for the coming handover.
2.2.2. Vertical Handover:
Tremendous breakthroughs have been recorded in the last decade in the evolution of wireless communication networks. The complex nature of the wireless environment makes it difficult, or almost impossible, for a single network to efficiently provide users with high data rates and the required Quality of Service (QoS). In trying to meet these demands, fourth generation (4G) wireless systems combine heterogeneous wireless technologies to allow users to stay connected anywhere and at all times. The heterogeneity of the wireless networks involves the integration of diverse radio access technologies (RATs) such as LTE/LTE-Advanced, UMTS, HSPA, GPRS, GSM, WiMAX and WiFi. The purpose of integrating these independent networks is to meet the demand for high data rates and good QoS to support multimedia streaming.
Consequently, the issues of seamless handover, high QoS support, resource allocation, mobility management and security must be appropriately addressed before achieving these requirements. As one of the strategies for achieving this purpose, the handover mechanism is introduced; it can be defined as a process of reassigning resources as a result of mobile User Equipment (UE) movement, possibly when it switches from one technology to another. An intra-technology handover process, mainly based on the Received Signal Strength (RSS) levels, is known as Horizontal Handover (HHO) and occurs when the UE switches access points (APs) or eNodeBs while remaining in the same network. On the other hand, a UE switching its connection to a different network technology performs what is termed a Vertical Handover (VHO). This has become possible because of the emergence of a multitude of overlapping wireless networks, which makes the handover process more complex.
2.3. Handover Techniques:
Handover can be categorized as: Soft handover and hard handover also known as
Make-Before-Break and Break-Before-Make respectively.
2.3.1. Soft handover, Make-Before-Break:
Soft handover is a category of handover procedures where radio links are added and abandoned in such a manner that the UE always keeps at least one radio link to the UTRAN. Soft and softer handover were introduced in the WCDMA architecture, in which a centralized controller called the Radio Network Controller (RNC) performs handover control for each UE. It is possible for a UE to simultaneously connect to two or more cells (or cell sectors) during a call; if the cells the UE is connected to belong to the same physical site, this is referred to as softer handover [10].
In terms of handover, soft handover is suitable for maintaining an active session, preventing voice call dropping, and resetting a packet session. However, soft handover requires much more complicated signaling, procedures and system architecture, as in the WCDMA network.
2.3.2. Hard handover, Break-Before-Make:
Hard handover is a category of handover procedures where all the old radio links in the UE are abandoned before the new radio links are established. Hard handover is commonly used when dealing with handovers in legacy wireless systems. It requires a user to break the existing connection with the current cell (source cell) and make a new connection to the target cell [10].
In LTE only hard handover is supported, meaning that there is a short interruption in service
when the handover is performed.
2.4. Handover Procedure:
Depending on whether any EPC entity is involved in preparing and executing a handover between a source eNodeB and a target eNodeB, an LTE handover can be either an X2 handover, using the X2 interface, or an S1 handover, using the S1 interface.
Figure 9 shows how a source eNodeB decides on the handover type, X2 or S1, when a handover is triggered.
Figure 9: Decision on Handover Type.
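The decision illustrated in Figure 9 reduces to a simple rule: use X2 when an X2 interface exists towards the target eNodeB, otherwise fall back to S1 signaling through the MME. A minimal sketch, where the function and parameter names are assumptions:

```python
def choose_handover_type(source_x2_neighbors: set, target_enb: str) -> str:
    """Pick the handover type as Figure 9 describes it.

    source_x2_neighbors: eNodeBs reachable from the source over X2.
    """
    return "X2" if target_enb in source_x2_neighbors else "S1"

x2_links = {"enb_2", "enb_3"}   # X2 interfaces established by the source eNodeB
print(choose_handover_type(x2_links, "enb_2"))  # X2
print(choose_handover_type(x2_links, "enb_9"))  # S1 (signaled via the MME)
```

If the X2 interface is merely missing but can be set up, the source may first establish it and then still perform an X2 handover, as noted in Section 1.4.1.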
The handover procedure in LTE can be divided into three phases: handover preparation, handover execution and handover completion [4]. The procedure starts with the measurement reporting of a handover event by the User Equipment (UE) to the serving evolved NodeB (eNodeB). The Evolved Packet Core (EPC) is not involved in the control plane handling of the handover procedure, i.e. preparation messages are directly exchanged between the eNodeBs [1]. This is the case when the X2 interface is deployed; otherwise, the MME is used for HO signaling.
The handover procedure for the basic handover scenario is depicted in Figure 10.
Figure 10: Intra-MME/Serving Gateway handover [9].
Handover preparation:
During handover preparation, data flows between the UE and the core network as usual. This phase includes messaging such as measurement control, which defines the UE measurement parameters, and the measurement report sent accordingly when the triggering criterion is satisfied. The handover decision is then made at the serving eNodeB, which requests a handover towards the target cell; the target eNodeB performs admission control and then acknowledges the handover request.
Handover execution:
The handover execution phase starts when the source eNodeB sends a handover command to the UE. During this phase, data is forwarded from the source to the target eNodeB, which buffers the packets. The UE then needs to synchronize to the target cell and perform a random access towards it to obtain an UL allocation and timing advance, as well as other necessary parameters. Finally, the UE sends a handover confirm message to the target eNodeB, after which the target eNodeB can start sending the forwarded data to the UE [1].
Handover completion:
In the final phase, the target eNodeB informs the MME that the user plane path has changed, and the S-GW is notified to update the user plane path. At this point, the data starts flowing on the new path to the target eNodeB. Finally, all radio and control plane resources are released in the source eNodeB.
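The three phases described above can be strung together as a toy walk-through. The message strings follow the text; the class and its structure are illustrative assumptions, with the target-side packet buffering of the execution phase modeled explicitly.

```python
from collections import deque

class HandoverSimulation:
    """Toy walk through the three handover phases."""
    def __init__(self):
        self.events = []
        self.target_buffer = deque()   # target eNodeB buffers forwarded packets

    def preparation(self):
        # UE reports measurements; source decides and prepares the target cell.
        self.events += ["MEASUREMENT REPORT", "HANDOVER REQUEST",
                        "HANDOVER REQUEST ACKNOWLEDGE"]

    def execution(self, in_flight_packets):
        # Source commands the UE and forwards data to the target, which
        # buffers it until the UE confirms the handover.
        self.events.append("HANDOVER COMMAND")
        self.target_buffer.extend(in_flight_packets)
        self.events.append("HANDOVER CONFIRM")

    def completion(self):
        # Target informs the MME, the S-GW switches the user plane path,
        # buffered data is delivered and the source releases its resources.
        self.events.append("PATH SWITCH")
        delivered = list(self.target_buffer)
        self.target_buffer.clear()
        self.events.append("RELEASE SOURCE RESOURCES")
        return delivered

sim = HandoverSimulation()
sim.preparation()
sim.execution(["pkt1", "pkt2"])
print(sim.completion())   # ['pkt1', 'pkt2']
```

Note that no packet is lost in the sketch: everything forwarded during execution is delivered at completion, which is the property the forwarding steps of the detailed procedure below exist to guarantee.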
A more detailed description of the intra-MME/Serving Gateway HO procedure is given
below:
1. Based on the area restriction information, the source eNodeB configures the UE
measurement procedure.
2. A MEASUREMENT REPORT is sent by the UE once it is triggered according to the configured rules.
3. The handover decision is taken by the source eNodeB based on the MEASUREMENT REPORT and RRM information.
4. HANDOVER REQUEST message is sent to the target eNodeB by the source eNodeB
containing all the necessary information to prepare the HO at the target side.
5. Admission control is performed by the target eNodeB, based on the received E-RAB QoS information, to increase the likelihood of a successful HO: the target eNodeB decides whether the resources can be granted or not. If the resources can be granted, the target eNodeB configures them according to the received E-RAB QoS information, then reserves a Cell Radio Network Temporary Identifier (C-RNTI) and a RACH preamble for the UE.
6. The target eNodeB prepares the HO and then sends the HANDOVER REQUEST
ACKNOWLEDGE to the source eNodeB. The HANDOVER REQUEST
ACKNOWLEDGE message carries a transparent container that is to be sent to the UE
as an RRC message for performing the handover. The container includes a new
C-RNTI and the target eNodeB security algorithm identifiers for the selected security
algorithms; it may also include a dedicated RACH preamble and possibly other
parameters such as RNL/TNL information for the forwarding tunnels. If data
forwarding is needed, the source eNodeB can start forwarding the data to the target
eNodeB as soon as it sends the handover command towards the UE.
Steps 7 to 16 are designed to avoid data loss during HO:
7. To perform the handover the target eNodeB generates the RRC message, i.e. RRC
Connection Reconfiguration message including the mobility Control Information. This
message is sent towards the UE by the source eNodeB.
8. The SN STATUS TRANSFER message is sent by the source eNodeB to the target
eNodeB. In that message, the information about uplink PDCP SN receiver status and
the downlink PDCP SN transmitter status of E-RABs are provided. The PDCP SN of
the first missing UL SDU is included in the uplink PDCP SN receiver status. The next
PDCP SN that the target eNodeB shall assign to the new SDUs is indicated by the
downlink PDCP SN transmitter status.
At this point, data forwarding of user plane downlink packets can use either a
"seamless mode", minimizing the interruption time during the move of the UE, or a
"lossless mode", not tolerating packet loss at all. The source eNodeB may decide to
operate one of these two modes on a per EPS bearer basis, based on the QoS received
over X2 for this bearer.
9. After receiving the RRC Connection Reconfiguration message including the
mobility Control Information, the UE tries to synchronize to the target eNodeB and to
access the target cell via RACH. If a dedicated RACH preamble was assigned to the
UE, it can use a contention-free procedure; otherwise it shall use a contention-based
procedure. Regarding security, the UE derives the target eNodeB specific keys and
configures the selected security algorithms to be used in the target cell.
10. The target eNodeB responds with the uplink allocation and timing advance.
11. After the UE has successfully accessed the target cell, it sends the RRC Connection
Reconfiguration Complete message for handover confirmation. The C-RNTI sent in
the RRC Connection Reconfiguration Complete message is verified by the target
eNodeB, and afterwards the target eNodeB can begin sending data to the UE.
12. A PATH SWITCH message is sent to the MME by the target eNodeB to inform it that
the UE has changed cell.
13. UPDATE USER PLANE REQUEST message is sent by the MME to the Serving
Gateway.
14. The Serving Gateway switches the downlink data path to the target eNodeB and sends
one or more "end marker" packets on the old path to the source eNodeB to indicate that
no more packets will be transmitted on this path. Then the U-plane/TNL resources
towards the source eNodeB can be released.
15. An UPDATE USER PLANE RESPONSE message is sent to the MME by the Serving
Gateway.
16. The MME sends the PATH SWITCH ACKNOWLEDGE message to confirm the
PATH SWITCH message.
17. The target eNodeB sends UE CONTEXT RELEASE to the source eNodeB to inform
it of the success of the handover. The target eNodeB sends this message to the source
eNodeB after it has received the PATH SWITCH ACKNOWLEDGE from the MME.
18. After the source eNodeB receives the UE CONTEXT RELEASE message, it can
release the radio and C-plane related resources. Any ongoing data forwarding can
continue.
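The message flow above can be condensed into a short sketch. The entity and message names follow the steps listed; the Python structure itself is only an illustration, not part of any LTE implementation.

```python
# Illustrative sketch of the intra-MME/S-GW X2 handover message sequence
# described in steps 1-18 above, as (sender, receiver, message) triples.
HANDOVER_SEQUENCE = [
    ("UE",            "source eNodeB", "MEASUREMENT REPORT"),
    ("source eNodeB", "target eNodeB", "HANDOVER REQUEST"),
    ("target eNodeB", "source eNodeB", "HANDOVER REQUEST ACKNOWLEDGE"),
    ("source eNodeB", "UE",            "RRC Connection Reconfiguration"),
    ("source eNodeB", "target eNodeB", "SN STATUS TRANSFER"),
    ("UE",            "target eNodeB", "RRC Connection Reconfiguration Complete"),
    ("target eNodeB", "MME",           "PATH SWITCH"),
    ("MME",           "Serving GW",    "UPDATE USER PLANE REQUEST"),
    ("Serving GW",    "MME",           "UPDATE USER PLANE RESPONSE"),
    ("MME",           "target eNodeB", "PATH SWITCH ACKNOWLEDGE"),
    ("target eNodeB", "source eNodeB", "UE CONTEXT RELEASE"),
]

def phase_of(message):
    """Map a message to the coarse HO phase named in the text."""
    preparation = {"MEASUREMENT REPORT", "HANDOVER REQUEST",
                   "HANDOVER REQUEST ACKNOWLEDGE"}
    execution = {"RRC Connection Reconfiguration", "SN STATUS TRANSFER",
                 "RRC Connection Reconfiguration Complete"}
    if message in preparation:
        return "preparation"
    if message in execution:
        return "execution"
    return "completion"
```

Grouping the messages this way mirrors the three phases (preparation, execution, completion) described earlier in this section.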
Figure 11: Handover Timing [8]
2.5. Handover Measurements:
The handover procedure in LTE-Advanced, which is part of the RRM, is based on the UE's
measurements. Handover decisions are usually based on downlink channel measurements,
which consist of the Reference Signal Received Power (RSRP) and the Reference Signal
Received Quality (RSRQ), made in the UE and sent to the eNodeB regularly [12]. Each of
them is described below:
Reference Signal Received Power (RSRP):
The RSRP measurement provides a cell-specific signal strength metric. This measurement is
used mainly to rank different LTE-Advanced candidate cells according to their signal strength
and is used as an input for handover and cell reselection decisions. RSRP is defined for a
specific cell as the linear average received power (in Watts) of the signals that carry cell-
specific Reference Signals (RS) within the considered measurement frequency bandwidth [4].
Reference Signal Received Quality (RSRQ):
This measurement is intended to provide a cell-specific signal quality metric. Similarly to
RSRP, this metric is used mainly to rank different LTE candidate cells according to their
signal quality. This measurement is used as an input for handover and cell reselection
decisions, for example in scenarios for which RSRP measurements do not provide sufficient
information to perform reliable mobility decisions.
The RSRQ is defined as:
RSRQ = (N × RSRP) / RSSI (1)
Where N is the number of Resource Blocks (RBs) of the LTE-Advanced carrier RSSI
measurement bandwidth. The measurements in the numerator and denominator are made over
the same set of resource blocks. While RSRP is an indicator of the wanted signal strength,
RSRQ additionally takes the interference level into account due to the inclusion of RSSI.
RSRQ therefore enables the combined effect of signal strength and interference to be reported
in an efficient way [4].
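Equation 1 can be sketched as a small helper. The function name and the conversion to dB are illustrative assumptions; the 3GPP definition itself is stated in linear power.

```python
import math

def rsrq_db(n_rb, rsrp_w, rssi_w):
    """RSRQ = N * RSRP / RSSI (Equation 1), returned in dB.

    n_rb   -- number of resource blocks N in the RSSI measurement bandwidth
    rsrp_w -- RSRP in watts (linear average reference-signal power)
    rssi_w -- carrier RSSI in watts, measured over the same resource blocks
    """
    return 10 * math.log10(n_rb * rsrp_w / rssi_w)
```

Raising the interference (and hence the RSSI) while keeping RSRP fixed lowers the RSRQ, which is exactly why RSRQ captures the combined effect of signal strength and interference.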
Besides RSRP/RSRQ, handover decisions can rely on other criteria, such as:
Signal-to-Noise Ratio (SNR):
The SNR is a measurement that compares the level of a desired signal to the level of
background noise (unwanted signal). It is defined as the ratio of the signal power to the noise
power. A ratio higher than 1:1 indicates more signal than noise.
SNR = Psignal / Pnoise (2)
Where P is the average power. Both signal and noise power must be measured at the same or
equivalent points in a system, and within the same system bandwidth [16].
Carrier-to-Interference Ratio (CIR):
CIR expressed in decibels (dB) is a measurement of signaling effectiveness and it is defined
as the ratio of the power in the carrier to the power of the interference signal.
Signal-to-Interference-plus-Noise Ratio (SINR):
This metric is used to optimize the transmit power level for a target quality of service
assisting with handover decisions. Accurate SINR estimation provides a more efficient system
and a higher user-perceived quality of service.
SINR is defined as the ratio of signal power to the combined noise and interference power:
SINR = Psignal / (Pnoise + Pinterference) (3)
Where P is the averaged power; values are commonly quoted in dB.
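Equations 2 and 3 can be sketched the same way; the function names are illustrative.

```python
import math

def snr_db(p_signal, p_noise):
    """Equation 2: ratio of signal power to noise power, in dB."""
    return 10 * math.log10(p_signal / p_noise)

def sinr_db(p_signal, p_noise, p_interference):
    """Equation 3: signal power over combined noise-plus-interference power."""
    return 10 * math.log10(p_signal / (p_noise + p_interference))
```

With zero interference the SINR reduces to the SNR; any interference strictly lowers it, which is what makes SINR the more informative metric for handover decisions in loaded networks.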
Received Signal Strength Indicator (RSSI):
The LTE carrier RSSI is defined as the total received wideband power observed by the UE
from all sources, including co-channel serving and non-serving cells, adjacent channel
interference and thermal noise within the measurement bandwidth specified by the 3GPP.
LTE-Advanced carrier RSSI is not reported as a measurement in its own right, but is used as
an input to the LTE-Advanced RSRQ measurement [4].
As mentioned earlier, handover measurements in LTE-Advanced are done at the downlink
reference symbols in the frame structure as shown in Figure 12. However, handover decision
can also be based on the uplink measurements. This study focuses on downlink handover
measurements.
Figure 12: Downlink reference signal structure for LTE-Advanced.
The averaging of fast fading over all the reference symbols is done at Layer 1 and is hence
called L1 filtering (Figure 13). The use of scalable bandwidth in LTE allows the handover
measurement to be done over different bandwidths.
Figure 13: Handover measurement filtering and reporting [10].
2.6. Handover Parameters:
The handover procedure has different parameters which are used to enhance its performance
and setting these parameters to the optimal values is a very important task. In LTE the
triggering of handover is usually based on measurement of link quality and some other
parameters in order to improve the performance. The most important ones include [13]:
Handover initiation threshold level (RSRP and RSRQ):
This level is used for handover initiation. When the handover threshold decreases, the
probability of a late handover decreases but the ping-pong effect increases. It can be varied
according to different scenarios and propagation conditions to make these trade-offs and
obtain better performance.
Hysteresis margin:
The hysteresis margin, also called HO margin, is the main parameter that governs the HO
algorithm between two eNodeBs. The handover is initiated if the link quality of another cell
is better than the current link quality by the hysteresis value. It is used to avoid ping-pong
effects. However, it can increase handover failures, since it can also prevent necessary
handovers.
Time-to-Trigger (TTT):
When applying Time-to-Trigger, the handover is initiated only if the triggering requirement is
fulfilled for a given time interval. This parameter can decrease the number of unnecessary
handovers and effectively avoid ping-pong effects, but it can also delay the handover, which
in turn increases the probability of handover failures.
The length and shape of the averaging window:
The effect of channel variation due to fading should be minimized in the handover decision,
and an averaging window can be used to filter it out. Both the length and the shape of the
window affect handover initiation. Long windows reduce the number of handovers but
increase the delay. The shape of the window, e.g. rectangular or exponential, also affects the
number of handovers and the probability of unnecessary handovers.
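The two window shapes just mentioned can be sketched as follows. The function names are illustrative; real eNodeB filtering follows the 3GPP L3 filtering configuration rather than this simplified form.

```python
def rectangular_average(samples, window):
    """Average the last `window` measurement samples (rectangular window):
    all samples in the window weigh equally."""
    recent = samples[-window:]
    return sum(recent) / len(recent)

def exponential_average(samples, alpha):
    """Exponentially weighted average: recent samples weigh more, and the
    forgetting factor alpha (0..1) plays the role of the window length."""
    avg = samples[0]
    for s in samples[1:]:
        avg = (1 - alpha) * avg + alpha * s
    return avg
```

A longer rectangular window (or a smaller alpha) smooths fading more aggressively but reacts later to a genuine drop in signal strength, which is precisely the delay trade-off described above.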
The listed parameters directly affect handover initiation and can hence be tuned according to
certain design goals. However, other parameters, such as the measurement report period, can
also have an impact on handover initiation.
Figure 14: Handover triggering procedure [11].
In summary, the starting point of the handover triggering procedure is the measurements
performed by the UE. These are done periodically, as defined by the measurement period
parameter configured at the eNodeB. When the serving cell RSRP drops by the configured
HO offset (usually 2-3 dB) below the measured neighbor cell, a timer is started.
If this condition lasts for the Time-to-Trigger (TTT) value, a measurement report is sent to the
eNodeB, which initiates the handover by sending a handover command to the UE. If the
reporting conditions change and no longer satisfy the triggering conditions before the timer
reaches the TTT value, no measurement report is sent and new measurement calculations and
timers are started [11].
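The triggering logic just summarized can be sketched as follows, assuming periodic RSRP samples in dBm. The function name and the sampling loop are illustrative simplifications of the real measurement procedure.

```python
def a3_trigger(serving_rsrp, neighbour_rsrp, hys_db, ttt_s, period_s):
    """Return True once the neighbour has been better than the serving cell
    by hys_db continuously for ttt_s seconds.

    serving_rsrp / neighbour_rsrp -- per-period RSRP measurements in dBm
    period_s -- the configured measurement period in seconds
    """
    held = 0.0
    for s, n in zip(serving_rsrp, neighbour_rsrp):
        if n > s + hys_db:           # entering condition holds
            held += period_s
            if held >= ttt_s:
                return True          # measurement report would be sent
        else:
            held = 0.0               # condition broke: restart the timer
    return False
```

Note how a single sample that falls back below the offset resets the timer, so a short fade can postpone the report; that is the ping-pong protection (and the added delay) that TTT provides.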
2.7. Time To Trigger & Hysteresis:
In this project, two main parameters of the LTE handover process are studied: the
Time-to-Trigger (TTT) and the Hysteresis (hys). The hys is used to define how much better
the RSS of a neighboring base station must be than that of the serving base station for a
handover to be considered. The values of hys are defined in decibels (dB) and range from 0
to 10 dB in 0.5 dB increments, which results in 21 different values of hys. The full range of
hys values can be seen in Table 4.
Table 4: Table of the different LTE hys values.

Index   hys (dB)        Index   hys (dB)
0       0.0             11      5.5
1       0.5             12      6.0
2       1.0             13      6.5
3       1.5             14      7.0
4       2.0             15      7.5
5       2.5             16      8.0
6       3.0             17      8.5
7       3.5             18      9.0
8       4.0             19      9.5
9       4.5             20      10.0
10      5.0
The TTT is a length of time, defined in seconds, specifying how long a neighboring base
station must be considered better than the serving base station. There are 16 different values
of TTT, ranging from 0 to 5.12 seconds. Unlike the hys values, the TTT values do not
increase linearly; instead they increase exponentially, with smaller steps at the lower values
and bigger steps at the larger values. The full list of TTT values can be seen in Table 5 and a
graph of how the TTT values increase can be seen in Figure 11.
Table 5: Table of the different LTE TTT values.

Index   TTT (s)         Index   TTT (s)
0       0.0             8       0.32
1       0.04            9       0.48
2       0.064           10      0.512
3       0.08            11      0.64
4       0.1             12      1.024
5       0.128           13      1.28
6       0.16            14      2.56
7       0.256           15      5.12
There are 336 different combinations of TTT and hys values. Such a large range of
combinations means that a neighboring eNodeB may have to be better by a large hys value
for only a small TTT, or vice-versa. This makes for an interesting dynamic as to which pairs
of values work best in any given environment.
In LTE there are eight different triggers defined for initiating handovers. Table 6 shows
different trigger events and how they are defined [18].
Table 6: Table of the different LTE Trigger types and their criteria.
Event Type Trigger Criteria
A1 Serving becomes better than a threshold.
A2 Serving becomes worse than a threshold.
A3 Neighbor becomes offset better than Primary Cell (PCell).
A4 Neighbor becomes better than threshold.
A5 PCell becomes worse than threshold1 and neighbor becomes
better than threshold2.
A6 Neighbor becomes offset better than Secondary Cell (SCell).
B1 Inter RAT neighbor becomes better than threshold.
B2 PCell becomes worse than threshold1 and inter RAT
neighbor becomes better than threshold2.
Out of the eight triggers, the A3 event is the most common. Its definition is that a
neighboring eNodeB must give the UE a better Reference Signal Received Power (RSRP) by
an amount defined by the hys, for a length of time defined by the TTT [19]. The A3 event can
be represented by the following equation:
RSRPneighbour > RSRPserving + Hys (4)
When a handover event is triggered a measurement report is sent from the UE to the Serving
eNodeB. The measurement report contains the information required for the Serving eNodeB
to make a decision on whether or not to initiate a handover. The full, high-level procedure for
an LTE handover is as follows:
1. If a Neighboring eNodeB is found to be better than the Serving eNodeB a
measurement report is sent by the UE to the Serving eNodeB.
2. The Serving eNodeB considers the information in the measurement report and decides
whether or not a handover should take place.
3. If it is decided that a handover should take place then a message is sent to the
Neighboring eNodeB to prepare resources for the UE.
4. Once the resources are ready for the UE the new Serving eNodeB sends a message to
the old eNodeB to release the resources it previously had for the UE.
5. Finally a message is sent to the MME to finalize the handover process.
Conclusion:
The handover parameters need to be optimized for good performance. Too low handover
offset and TTT values result, under fading conditions, in back-and-forth ping-pong handovers
between the cells. Too high values can cause call drops during handovers, as the radio
conditions in the serving cell become too bad for transmission.
In the next chapter, we will explain our proposed solution to optimize the handover
parameters.
Chapter III: Machine Learning and Handover Parameter
Optimization Simulation
Introduction:
Optimizing handover is a major activity in network operations, with Hysteresis and Time-to-
Trigger as the main control parameters. Each HO, depending on the Hys-TTT tuple, also
called the trigger point, results in either a success, a ping-pong, or a radio link failure.
In this chapter, we will describe Q-Learning, present our proposed approach for handover
optimization and finish with simulation results.
3.1. Q-Learning overview:
3.1.1. Machine Learning:
Machine learning is a form of Artificial Intelligence (AI) that involves designing and studying
systems and algorithms with the ability to learn from data. This field of AI has many
applications within research (such as system optimization), products (such as image
recognition) and advertising (such as adverts that use a user's browsing history). There are
many different paradigms that machine learning algorithms use. Algorithms can use training
sets to train an algorithm to give appropriate outputs; other algorithms look for patterns in
data; while others use the notion of rewards to find out if an action could be considered
correct or not [20]. Three of the most popular types of machine learning algorithms are:
Supervised learning is where an algorithm is trained using a training set of data.
This set of data includes inputs and the known outputs for those inputs. The training
set is used to fine-tune the parameters in the algorithm. The purpose of this kind of
algorithm is to learn a general mapping between inputs and outputs so that the
algorithm can give an accurate result for an input with an unknown output. This
type of algorithm is generally used in classification systems.
Unsupervised learning algorithms only know about the inputs they are given. The
goal of such an algorithm is to try and find patterns or structure within the input
data. Such an algorithm would be given inputs and any patterns that are contained
would become more and more visible the more inputs the algorithm is given.
Reinforcement learning uses an intelligent agent to perform actions within an
environment. Any such action will yield a reward to the agent and the agent's goal
is to learn about how the environment reacts to any given action. The agent then
uses this knowledge to try and maximize its reward gains.
3.1.2. Reinforcement Learning:
In reinforcement learning, an intelligent agent learns which action to take at any given time
to maximize a notion of reward. In the beginning the agent has no knowledge of which action
it should take from any state within the learning environment. It must instead learn through
trial and error, exploring all possible actions and finding the ones that perform best.
The trade-off between exploration and exploitation is one of the main features of
reinforcement learning and can greatly affect the performance of a chosen algorithm. A
reinforcement learning algorithm must contemplate this trade-off: whether to exploit an
action that resulted in a large reward, or to explore other actions with the possibility of
receiving a greater reward.
Another main feature of reinforcement learning is that the problem in question is taken into
context as a whole. This is different from other types of machine learning algorithms, as they
do not consider how the results of any sub-problems may affect the problem as a whole.
The basic elements required for reinforcement learning are as follows:
A Model (M) of the environment that consists of a set of States (S) and Actions
(A).
A reward function (R).
A value function (V).
A policy (P).
The model of the environment is used to mimic the behavior of the environment, such as
predicting the next state and reward from a state and taken action. Models are generally used
for planning by deciding what action to take while considering future rewards.
The reward function defines how good or bad an action is from a state. It is also used to
define the immediate reward the agent can expect to receive. Generally a mapping between a
state-action pair and a numerical value is used to define the reward that the agent would gain.
The reward values are used to define the policy where the best value of state-action pair is
used to define the action to take from a state.
While the reward function defines the immediate reward that can be gained from a state, the
value function defines how good a state is in the long term. This difference can create
conflicts of interest for an agent: while its goal is to collect as much reward as possible, it has
to weigh the option of picking a state that provides a lot of up-front reward but little future
reward against a state with a lot of future reward but little immediate reward.
The policy is a mapping between a state and the best action to be taken from that state at any
given time. Policies can be simple or complex: a simple policy may consist of a lookup table,
while more complex policies can involve search processes. In general, most policies begin
stochastic so that the agent can start to learn which actions are more optimal [11].
3.1.3. Q-Learning:
Q-Learning is a type of reinforcement learning algorithm where an agent tries to discover an
optimal policy from its history of interactions within an environment. What makes
Q-Learning so powerful is that it will always learn the optimal policy (which action a to take
from a state s) for a problem regardless of the policy it follows, as long as there is no limit on
the number of times the agent can try an action. Due to this ability to always learn the optimal
policy, Q-Learning is known as an Off-Policy learner. The history of interactions of an agent
can be shown as a sequence of state-action-rewards:
⟨s0, a0, r1, s1, a1, r2, s2, a2, ...⟩
This can be read as: the agent was in state s0, did action a0, received reward r1 and
transitioned into state s1; then did action a1, received reward r2 and transitioned into state s2;
and so on.
The history of interactions can be treated as a sequence of experiences, with each experience
being a tuple
⟨s, a, r, s′⟩
meaning that the agent was in state s, did action a, received reward r and transitioned into
state s′. The experiences are what the agent uses to determine the optimal action to take at a
given time.
The basic process of a Q-Learning algorithm can be seen in Algorithm 3.1. The general
process requires that the learning agent is given a set of states, a set of actions, a discount
factor γ and a step size α. The agent also keeps a table of Q-Values, denoted by Q(s, a),
where s is a state and a is an action from that state. A Q-Value is an average of all the
experiences the agent has had with a specific state-action pair. This allows good and bad
experiences to be averaged out, giving a reasonable estimate of the actual value of the
state-action pair.
The process of averaging out experiences is done using temporal differences. It could be said
that the best way to estimate the next value in a list is to take the average of all the previous
values; Equation 5 shows this process.
Ak = (v1 + ... + vk) / k (5)
Therefore
k × Ak = v1 + ... + vk = (k − 1) × Ak−1 + vk (6)
Then dividing by k gives:
Ak = (1 − 1/k) × Ak−1 + vk / k (7)
Then let αk = 1/k:
Ak = (1 − αk) × Ak−1 + αk × vk = Ak−1 + αk (vk − Ak−1) (8)
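The identity in Equation 8 can be checked numerically: running the incremental update with αk = 1/k reproduces the ordinary batch mean. The function name is illustrative.

```python
def incremental_mean(values):
    """Equation 8: A_k = A_{k-1} + alpha_k * (v_k - A_{k-1}), alpha_k = 1/k."""
    avg = 0.0
    for k, v in enumerate(values, start=1):
        avg = avg + (1.0 / k) * (v - avg)   # TD-style incremental update
    return avg
```

This is the same update shape that the Q-Value rule uses, except that Q-Learning keeps α fixed so that more recent experiences keep influencing the estimate.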
The difference vk − Ak−1 in Equation 8 is known as the Temporal Difference Error, or TD
error. It shows how different the old estimate Ak−1 is from the new value vk. The new
estimate Ak is then the old estimate Ak−1 plus the TD error times αk. The Q-Values are
therefore defined using temporal differences, and Equation 9 shows the formula used to
calculate them, where α is a variable between 0 and 1 that defines the step size of the
algorithm. If the step size were 0 the algorithm would ignore any rewards received, and if the
step size were 1 the algorithm would consider only the newest experience and discard the
previous experiences of a state-action pair. The discount factor γ is also a variable between
0 and 1 and defines how much less future rewards are worth compared to the current reward.
If the discount factor were 0, future rewards would not be considered at all; if it were 1,
future rewards would be worth as much as current rewards. The possible future reward
(maxa′ Q(s′, a′)) is the maximum of the Q-Values over all actions available from the next
state.
Q[s, a] ← Q[s, a] + α (r + γ maxa′ Q[s′, a′] − Q[s, a]) (9)
The table of Q-Values can either be initialized as empty or with some values pre-set to try to
lead the agent to a specific goal state. Once the agent has initialized these parameters, it
observes the starting state, which can either be chosen at random or be a pre-determined start
state for the problem. The agent then chooses an action, either stochastically or by a policy.
Once an action has been chosen, the agent carries it out and receives a reward. This reward is
used to update the table of Q-Values
using Equation 9. Finally, the agent moves into the new state and repeats until termination,
which occurs either when the agent discovers a goal state or after a certain number of actions
have been taken.
Require:
    S is a set of states
    A is a set of actions
    γ is the discount factor
    α is the learning rate
1:  procedure Q-Learning(S, A, γ, α)
2:      real array Q[S, A]
3:      previous state s
4:      previous action a
5:      initialize Q[S, A] arbitrarily
6:      observe current state s
7:      repeat
8:          select and carry out an action a
9:          observe reward r and state s′
10:         Q[s, a] ← Q[s, a] + α (r + γ maxa′ Q[s′, a′] − Q[s, a])
11:         s ← s′
12:     until termination
13: end procedure
After a Q-Learning algorithm has finished exploring the model of the environment, it creates
a policy. The policy is generated by searching, for each state, across all actions and finding
the action leading to the greatest value. The policy is therefore a lookup table that maps a
state to the best possible next state. The policy can then be used to solve the problem that the
Q-Learning agent was exploring [22].
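Algorithm 3.1 can be sketched in a few lines of Python. The five-state chain used to exercise it below is an invented toy environment for illustration, not the handover model of Section 3.2.

```python
import random

def q_learning(n_states, actions, step, reward, alpha=0.5, gamma=0.9,
               n_steps=2000, seed=0):
    """Tabular Q-Learning following Algorithm 3.1 and Equation 9.

    step(s, a)   -- environment model: returns the next state
    reward(s, a) -- immediate reward for taking action a in state s
    """
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for _ in range(n_states)]
    s = 0                                        # observe the starting state
    for _ in range(n_steps):
        a = rng.randrange(len(actions))          # stochastic action selection
        s2 = step(s, actions[a])
        r = reward(s, actions[a])
        # Equation 9: Q[s,a] <- Q[s,a] + alpha*(r + gamma*max_a' Q[s',a'] - Q[s,a])
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2
    # Policy extraction: the best action index for each state.
    return [max(range(len(actions)), key=lambda a: q[s][a])
            for s in range(n_states)]

# Toy 5-state chain: moving right into state 4 earns a reward of 1.
def chain_step(s, a):
    return max(0, min(4, s + a))

def chain_reward(s, a):
    return 1.0 if chain_step(s, a) == 4 else 0.0

policy = q_learning(5, [-1, +1], chain_step, chain_reward)
```

After enough random exploration the learned policy moves right from every state, which is the optimal behavior on this chain; this is the off-policy property described above at work.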
3.2. Proposed Approach for HO optimization:
3.2.1. Set of states:
The approach taken for optimizing the handover parameters in LTE-Advanced uses a
Q-Learning algorithm based on the process given in Section 3.1. In this approach, the model
of the environment has a state for every combination of TTT and hys, giving a total of 336
states.
Table 7: Set of states. The model has one state per (TTT, hys) pair, numbered 000 to 335 in
row-major order: state index = 21 × (TTT index) + (hys index), with the 16 TTT indices of
Table 5 down the rows and the 21 hys indices of Table 4 across the columns. For example,
TTT index 7 (0.256 s) with hys index 10 (5.0 dB) gives state 157; its eight grid neighbours
(states 135, 136, 137, 156, 158, 177, 178 and 179) are the states reachable by the actions
listed in Section 3.2.2.
3.2.2. Set of actions:
An action within the model can move to any other state that differs by one of the following
changes to the handover parameters:
1. A single value increase of TTT. (1)
2. A single value increase of hys. (2)
3. A single value increase of both TTT and hys. (3)
4. A single value decrease of TTT. (4)
5. A single value decrease of hys. (5)
6. A single value decrease of both TTT and hys. (6)
7. A single value increase of TTT and a single value decrease of hys. (7)
8. A single value increase of hys and a single value decrease of TTT. (8)
For example, if the learning agent is in state 157, where the TTT equals 0.256 s and the hys
equals 5.0 dB, and performs action 3 from the list above (a single value increase of both TTT
and hys), then the new TTT equals 0.32 s and the new hys equals 5.5 dB: state 179. The
possible next states for state 157 are: {135(6), 136(5), 137(8), 156(4), 158(1), 179(3), 178(2),
177(7)}
Figure 15: State 157 possible actions.
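Using the row-major numbering of Table 7 (state index = 21 × TTT index + hys index), the reachable neighbours of any state can be computed; the function names are illustrative.

```python
N_TTT, N_HYS = 16, 21          # 16 TTT values x 21 hys values = 336 states

def state_index(ttt_idx, hys_idx):
    """Row-major state numbering used in Table 7."""
    return ttt_idx * N_HYS + hys_idx

def possible_next_states(state):
    """Apply the eight single-step actions, discarding moves off the grid."""
    t, h = divmod(state, N_HYS)
    moves = [(dt, dh) for dt in (-1, 0, 1) for dh in (-1, 0, 1)
             if (dt, dh) != (0, 0)]
    return sorted(state_index(t + dt, h + dh) for dt, dh in moves
                  if 0 <= t + dt < N_TTT and 0 <= h + dh < N_HYS)
```

Interior states such as 157 have all eight neighbours; corner states such as state 0 have only three, so the action set shrinks at the edges of the parameter grid.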
The full list of hys values can be seen in Table 4 and the full list of TTT values can be seen in
Table 5. Having the actions change the parameters by only one increase or decrease of the
TTT and hys values each time not only allows for more refined optimization of the
parameters but also ensures that no large changes can happen suddenly.
3.2.3. Reward:
Due to the nature of this kind of problem, the reward gained by an action is dynamic and is
likely to be different each time the action is taken. Rewards are based on the number of drops
and ping-pongs accumulated in the simulation for the current state in the environment model.
The rewards are defined by the following equation:
Reward = HandoverSuccess / (10 × Drops + 2 × PingPongs) (10)
The coefficients in Equation 10 are 10 for drops and 2 for ping-pongs. Drops are extremely
bad for the QoS of a communication system, so they are given a large weight; ping-pongs are
multiplied by 2 to cancel out the successful handover that the ping-pong produced and to
additionally penalize the agent. The reward is given to the agent and the Q-Value for that
state is updated just before the agent selects the next action to take. The agent then selects
new actions in discrete time steps, which allows the simulation to run for fixed periods of
time with the TTT-hys pair specified by a state in the environment model.
After the agent has been given enough time to try every action at least once, the Q-Learning
agent generates a policy. This policy can then be used to optimize the handover parameters
by changing the TTT and hys values after a call is dropped or the connection ping-pongs
between base stations. The Q-Learning agent still receives rewards every time a call is
dropped or the connection ping-pongs while following the generated policy. This allows the
system to keep learning, even after the initial learning process that generated the policy.
3.3. Simulation & Performance evaluation:
The simulation is a very important part of the project. It is required to provide the basic functionality of an LTE network. For simplicity the simulation was broken down into two main components: the mobile (UE) and the base station (eNodeB). Since the project revolves
around the handover process in LTE, it made sense for the two main components of the
simulation to be the mobile and the base station; it is the mobile that triggers the measurement
report and the base station that makes the decision on whether a handover should take place or
not. Each base station would also be given its own Q-Learning agent since each base station is
unique. Since the A3 event trigger (Table 5) is the most common, it was decided that it would be the only trigger implemented, in order to reduce the complexity of the simulation.
Tools and systems used for the simulation: VMware® Workstation 10.0.1, Ubuntu 14.04, C++ and Matlab R2012a.
Server Machine: Packard Bell EasyNote (6 GB DDR3 memory, Intel® i3-3120M, NVIDIA® GeForce 72 GB dedicated VRAM and 15.6″ 16:9 HD LED LCD).
3.3.1. Simulation parameters:
The optimization system was tested in two scenarios. In the first, 10 UEs move randomly around 9 base stations, with each UE being 1 m in height and each base station 60 m in height, using the Random Direction mobility model seen in Section 4.3, where the speed of each UE is 1 to 4 m/s (walking speed) and the duration of each direction is between 100 and 200 seconds. In the second scenario, the UEs move at 10 to 15 m/s, which is roughly 30 mph. In both scenarios, each mobile begins on top of one of the base stations before it starts moving, so that handovers are not required as soon as the simulation starts. These scenarios were also run with no fading in the RSS calculations, so as to make the environment easier for the agents to learn.
Each base station has its own Q-Learning agent to optimize the TTT and hys. values for that
specific base station. The agents are given 1,000,000 seconds to attempt to learn the environment they are working within, with each state being given 180 seconds to gain its reward. This length of time was chosen because there are 336 states, each with a maximum of 8 actions, so trying every action would take approximately 483,840 seconds if each action were given 180 seconds. This is less than half of the total time given, so even with the randomness of selecting next actions while learning the environment, there should be enough time to try all states and a selection of the available actions. After the agents have learned the environment they generate a policy for their base
station to follow. The simulation is then run for 200,000 seconds to test how well the policies
perform.
Within the simulation there are many variable parameters that need to be assigned values.
Such parameters are: the height of the base stations, the height of the UEs, the dimensions of
the simulation area and the positioning of the base stations. Other parameters that also needed
to be considered are the number of base stations and the number of UEs; as well as the
transmission power of the base stations and the time limit below which a handover would be perceived to have been a ping-pong. The type of environment also had to be chosen: whether the
simulation would be within a rural, urban, small city, medium city or large city environment.
The type of environment was decided to be a medium sized city. It was decided that the
height of any UE would be 1m as it would normally be in the possession of a human being.
The height of any base station would be 60 m, because base stations are placed upon tall buildings or structures so that they provide better coverage. Since it was decided that the environment would be a medium-sized city, this height made sense, since there would be some large multi-storey buildings. It was also decided that the transmission power of the base stations would be 46 dBm and that the time limit for ping-pongs would be 5 s, as these were the values used in a similar project [17]. The simulation area and the positioning of the base stations both depended on the propagation model used and the area of coverage that could be expected from it. From Figure 17 it was found that, for the chosen propagation model, the received signal would reach the LTE noise floor of −97.5 dBm at around 2 km. Therefore, it was decided that the area for the simulation would be 6 km by 6 km. It was then decided that
9 base stations would provide good coverage in this area. Each of these base stations will also
have its own Q-Learning agent since each base station has its own unique TTT and hys
values. The locations of the base stations and the resulting coverage can be seen in Figure 16.
Figure 16: Illustration of Coverage within the Simulation Area.
Finally, it was decided that 10 UEs would be used in the simulation, because this allows the learning done by the Q-Learning agents to happen faster than if there were just one UE. The UEs also start on top of different base stations, so that each base station's Q-Learning agent can start learning straight away without having to wait for a UE to move into range, as at least one UE starts on top of it.
Mobility Model:
A mobility model defines the way in which an entity will move. For the purposes of the
simulation the mobility model used needed to be random in nature. After some research it was
decided that the mobility model to be used in the simulation would either be the Random
Direction or Random Waypoint model because they are two of the most popular random
mobility models [23]. Both models are described below.
Random Direction Model:
The Random Direction Model is defined as follows:
1. Select a direction randomly between 0 and 359 degrees.
2. Select a random speed to move at.
3. Select a random duration to move for.
4. Move in the selected direction at the selected speed for the selected duration.
5. Repeat until termination.
Random Waypoint Model:
In mobility management, the random waypoint model is a random model for the movement of
mobile users, and how their location, velocity and acceleration change over time. Mobility
models are used for simulation purposes when new network protocols are evaluated. The
random waypoint model was first proposed by Johnson and Maltz. It is one of the most
popular mobility models to evaluate mobile ad hoc network (MANET) routing protocols,
because of its simplicity and wide availability.
In random-based mobility simulation models, the mobile nodes move randomly and freely
without restrictions. To be more specific, the destination, speed and direction are all chosen
randomly and independently of other nodes. This kind of model has been used in many
simulation studies.
The random walk model and the random direction model are variants of the random waypoint model.
Description: The movement of nodes is governed in the following manner: Each node begins
by pausing for a fixed number of seconds. The node then selects a random destination in the
simulation area and a random speed between 0 and some maximum speed. The node moves to this destination and again pauses for a fixed period before selecting another random location and speed.
This behavior is repeated for the length of the simulation.
The Random Waypoint Model is defined as follows:
1. Randomly select the co-ordinates for a point within the environment.
2. Select a random speed to move at.
3. Select a random length of time to pause for when the destination is reached.
4. Move towards the selected co-ordinates at the selected speed.
5. Pause for the randomly selected length of time.
6. Repeat until termination.
It was decided that the Random Direction Model would be used in the simulation, because under the Random Waypoint Model it is possible to select the co-ordinates of a point very close to where the node begins and then pause for a long period of time. That behaviour is undesirable within the simulation. The Random Direction Model does not have this problem, and it is also possible to set bounds on its parameters to make sure that a minimum distance is travelled.
Propagation Model:
A propagation model defines how the received signal from a transmitter decays the further
from the transmitter you are. There are many different models available, all with different
functions and purposes. After some research three models were considered; the Okumura-
Hata Model, the Egli Model and the Cost231-Hata Model.
It was decided that the Cost231-Hata Model would be the one used in the simulation, since it works with frequencies up to 2000 MHz, covering the LTE operating frequency considered here, unlike the Okumura-Hata model which only works up to 1500 MHz. The Cost231-Hata Model
was also picked over the Egli Model because if the Egli Model was used it would require the
simulation area to be very large, with at least 15km between each base station. This would
mean that the UEs would need to move over large distances before any handover attempts
could occur [24, 25].
Table 8: Simulation parameters.

Parameter            Value
Number of eNodeBs    9
Number of UEs        10
Mobility model       Random Direction Model
Propagation model    Cost231-Hata
3.3.2. Simulation results:
The first step taken to implementing the simulation was to decide on the programming
language to be used. After some deliberation it was decided that C++ would be used, because its object-oriented nature suits this situation well, given the multiple mobiles and base stations that need to be implemented.
The next step was to create the base classes for the UEs and base stations, with basic
functionality such as Accessors and Mutators for changing the parameters of the classes. Such
parameters for the base stations would be if a mobile were currently connected to it and the X-
Y co-ordinates representing its location. Such parameters for the UEs would be the ID number
of the base station it is currently connected to and the X-Y co-ordinates representing its
location.
It was important to choose a Discrete Event Simulation (DES) framework early in the project. Using an existing library meant that a DES framework did not have to be created from scratch, which allowed more development time to be spent on other aspects of the simulation. The DES library itself was very simple to use once some experience had been gained with it. There were two
main parts to the library; they were the Scheduler and the Event Handler. The Scheduler is the
class that provides the discrete time steps in the simulation as well as passing events to the
event handlers. The Event Handler is an abstract class that is the super class to the UEs, base
stations and Q-Learning agents. The characteristics that are inherited include the Handler
method, which receives events from the Scheduler.
When both the mobility and propagation models had been implemented the next part to be
done was the handovers along with detection for call dropping and connection ping-pongs.
These are very important parts of the simulation and their implementation can be seen in
Appendix A. While all three of these components worked correctly, the handover triggering was not implemented as efficiently as it could have been: it ended up using decrementing variables instead of the DES library, due to bugs that remained unresolved.
Simulation Testing:
It was very important for the simulation and Q-Learning algorithm to be tested so that there
was confidence that they would produce the correct results when executed. There are many
different types of testing that can take place to ensure that a piece of software is working
correctly. Testing methods that were considered for this project include: Unit testing, Black
Box testing and White/Clear Box testing.
Unit testing involves testing individual parts or processes of a program so that the stakeholders can be confident that they are fit for use. The parts of the simulation that were tested this way were the mobility model, handovers, drops, ping-pongs, and the Q-Learning algorithm changing the values of TTT and hys. correctly.
Black Box testing involves making sure that a function works as required without any
knowledge of the underlying code. This type of testing is performed by giving a function an
input, and comparing the output from the function with a previously determined expected
output. White Box testing is used to make sure that the underlying code used in a function
works as required. This type of testing is done by inserting print statements in the code to see
how the respective variables are changing while the code is running. These are compared
against previously determined expected values to confirm whether the function is working as
intended. Figure 17 shows different values of TTT for the eight eNodeBs. As seen, the eight curves converge to the interval [1.2 s, 1.6 s].
Figure 17: Illustration of how the TTT values changed over time for large values when UE
travelling at walking speeds.
Figure 18 shows how the TTT values for base stations 5 and 8 are optimized over a simulation run. It can be seen in this figure that both base stations agreed to reduce the value of their TTT from the starting value of 5.12 s. However, they did not agree on how much the value should be reduced by. Base station 8 reduced its TTT value to as low as 0.16 s before settling between that and 0.256 s. Base station 5, on the other hand, reduced its TTT value much less, only going as low as oscillating between 1.024 s and 1.28 s. It can also be seen that base station 5 oscillated a lot between those two values, which could be an indication that the algorithm had got stuck between two non-optimal states and was unable to optimize the value any further. This means that even though the optimization improved the performance, there remained a large window of potential for further improvement.
From these results it can be said that the system performed as expected with the base stations
shown having reduced their values of TTT. There was also a very high number of dropped calls and there were no ping-pongs, which was also as expected in the simulation.
Figure 18: Comparison of TTT Optimization for Walking Speeds (Starting Point 5.12s)
Optimizing the Drop Ratio KPI:
The results of how the optimization system compared to the static values can be seen in
Figure 19 when the TTT started with its largest possible values of 5.12 seconds. The
results show that the process of optimizing the values initially generated a very large
increase in the number of dropped calls. However, the system then managed to improve rapidly, and ended up with a better dropped-call ratio by the end of the simulation run than that of the non-optimized system.
Figure 19: Graph of Optimized vs. Non-Optimized Results for Starting Point
TTT=0s hys.=0dB when UE traveling at walking speeds.
It turned out that ping-pongs were a very rare occurrence in the simulation. This was most likely because there was no fading in the simulation and because the Random Direction mobility model keeps the UE moving in one direction for a long time, meaning that the only likely occurrence of a ping-pong would be if a handover took place and the UE then turned around and moved in the other direction.
Figure 19 shows how the optimization system performed against the static values when the simulation started with the TTT being 0 seconds and the hys. being 0 dB. It can be seen that the optimization process and the static values performed very similarly. It can be seen, though, that the error bars for the optimized system become much smaller than those for the static values the longer the simulation is run. This means that the optimization system would be expected to perform better the majority of the time.
Again, as seen in Figure 20, the optimization process performed a lot better than the static
values when they were originally set to their middle values of 0.256 seconds for TTT and 5
dB for hys.
Figure 20: Graph of Optimized vs. Non-Optimized Results for Starting Point
TTT=0.256s hys.=5dB when UE traveling at walking speeds.
Conclusion:
It can be seen that, when the system does not get stuck between non-optimal states, it will optimize the TTT and hys. values as quickly as is needed, i.e., whenever a dropped call or ping-pong occurs. The optimization system, however, also appears to have some drawbacks. It was seen in the first scenario that the optimization system caused a very large increase in the dropped-call ratio before improving it. This is a usual downside of optimization processes, where "things have to get worse before they can get better", and it is inherent to Q-Learning, where possible future rewards are taken into account when selecting a new state to move to.
General Conclusion
LTE-Advanced is an efficient SON. In fact, the SON concept originated from the Next-Generation Mobile Networks (NGMN) Alliance. A SON network such as LTE-Advanced can automatically extend, change, configure, and optimize its topology, coverage, capacity, cell size, and channel allocation, based on changes in location, traffic pattern, interference, and the situation/environment. LTE-Advanced is an end-to-end (e2e) self-aware and self-optimizing system with a series of features and solutions.
In this context, throughout this project we have tried to optimize the most important feature in SON: the handover. We proposed the Q-Learning algorithm as a solution for optimizing the handover parameters, because this form of reinforcement learning converges towards the optimal value function, which makes Q-Learning well suited to this kind of continuous problem. To reach that goal, we focused on the two most important parameters, Time-to-Trigger and Hysteresis, and tried to find the optimal combination of that pair.
Speaking of handover and SON, we cannot ignore interference. Further projects can
investigate the interaction between two major technical challenges for LTE-Advanced cell deployment, in order to face the explosive growth of traffic. These challenges are Inter-Cell Interference Management, which becomes critical in dense deployments of small cells, and Mobility Management, since the handover frequency between close-by cells increases considerably. These two features, often analyzed separately, are intertwined. The main reason is that handovers occur in overlapping cellular areas, where the interference level is the highest.
References
[1] 3GPP TS 36.331 V9.4.0 (2010-09) Evolved Universal Terrestrial Radio Access (E-
UTRA); Radio Resource Control (RRC); Protocol specification (Release 9).
[2] Alcatel-Lucent, Strategic White Paper, "The LTE Network Architecture", 2009. http://www.alcatel-lucent.com.
[3] NGN Guru Solutions White Paper, Long Term Evolution (LTE), August 2008. http://www.ngnguru.com.
[4] Stefania Sesia, Issam Toufik, Matthew Baker, "LTE - The UMTS Long Term Evolution: From Theory to Practice". John Wiley & Sons Ltd, 2009.
[5] LTE Tutorial Artiza Networks
http://www.artizanetworks.com/lte_tut_sae_tec.html.
[6] LTE-Advanced presented by Raavi Trinath
http://fr.slideshare.net/RAAVIthrinath/lte-advanced-20732830
[7] QUALCOMM Incorporated, LTE Mobility Enhancements, February 2010.
[8] Harri Holma, Antti Toskala, "LTE for UMTS: OFDMA and SC-FDMA Based Radio Access". John Wiley & Sons Ltd, 2009.
[9] 3GPP TS 36.300 V9.5.0 (2010-09), Evolved Universal Terrestrial Radio Access (E-
UTRA) and Evolved Universal Terrestrial Radio Access Network (E-UTRAN);
overall description; Stage 2 (Release 9).
[10] Cheng-Chung Lin, Kumbesan Sandrasegaran, Huda Adibah Mohd Ramli, and Riyaj, "Optimized Performance Evaluation of LTE Hard Handover Algorithm with Average RSRP Constraint", International Journal of Wireless & Mobile Networks (IJWMN), Vol. 3, No. 2, April 2011.
[11] Konstantinos Dimou, Min Wang, Yu Yang, Muhammad Kazmi, Anna Larmo, Jonas Pettersson, Walter Muller, Ylva Timner, "Handover within 3GPP LTE: Design Principles and Performance", Ericsson Research, 2009 IEEE.
[12] M. Anas, F. D. Calabrese, P. E. Mogensen, C. Rosa and K. I. Pedersen, "Performance Evaluation of Received Signal Strength Based Hard Handover for UTRAN LTE", IEEE 65th Vehicular Technology Conference, April 2007.
[13] Y. Yang, "Optimization of Handover Algorithms within 3GPP LTE," MSc Thesis Report, KTH, February 2009.
[14] D. Sunil Shah, "A Tutorial on LTE Evolved UTRAN (E-UTRAN) and LTE Self Organizing Networks (SON)", December 2010.
[15] Ali Neissi Shooshtari, "Optimizing handover performance in LTE networks containing relays," MSc Thesis Report, School of Electrical Engineering, Espoo, April 2011.
[16] http://www.radio-electronics.com/info/rf-technology-design/rf-noise-
sensitivity/receiver-signal-to-noise-ratio.php
[17] T. Jansen, I. Balan, J. Turk, I. Moerman, and T. Kurner, "Handover parameter optimization in LTE self-organizing networks," in Vehicular Technology Conference Fall (VTC 2010-Fall), 2010 IEEE 72nd, pp. 1-5, IEEE, 2010.
[18] 3GPP TS 36.331 V10.7.0, LTE; Evolved Universal Terrestrial Radio Access (E-
UTRA); Radio Resource Control (RRC); Protocol specification (Release 10),
November 2012.
[19] N. Sinclair, Handover Optimisation using Neural Networks within LTE. PhD thesis,
University of Strathclyde, 2013.
[20] E. Alpaydin, Introduction to Machine Learning. MIT press, 2 ed., 2010.
[21] A. G. Barto and R. S. Sutton, Reinforcement learning: An introduction. MIT press,
1998.
[22] D. L. Poole and A. K. Mackworth, Artificial Intelligence: Foundations of
Computational Agents. Cambridge University Press, 2010.
[23] R. Roy, Handbook of Mobile Ad Hoc Networks for Mobility Models. Springer, 2010.
[24] J. Chebil, A. K. Lwas, M. R. Islam, and A. Zyoud, "Comparison of empirical propagation path loss models for mobile communications in the suburban area of Kuala Lumpur," in Mechatronics (ICOM), 2011 4th International Conference On, pp. 1-5, IEEE, 2011.
[25] N. Shabbir, M. T. Sadiq, H. Kashif, and R. Ullah, "Comparison of radio propagation models for long term evolution (LTE) networks," International Journal of Next-Generation Networks, vol. 3, no. 3, 2011.
[26] N. Sinclair, D. Harle, I. Glover, J. Irvine, and R. Atkinson, "An advanced SOM algorithm applied to handover management within LTE," 2013.