
Data Transformation Trajectories in Embedded Systems

GOKULNATH KASINATHAN

KTH ROYAL INSTITUTE OF TECHNOLOGY

INFORMATION AND COMMUNICATION TECHNOLOGY

DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY,

SECOND LEVEL

STOCKHOLM, SWEDEN 2016


2016-11-27

Master’s Thesis

Examiner Mats Brorsson

Industrial supervisor Daniel Jakobsson, Ericsson

KTH Royal Institute of Technology

School of Information and Communication Technology (ICT)

Department of Communication Systems

SE-100 44 Stockholm, Sweden

Abstract

Mobile phone tracking is the determination of the position of a mobile phone as it moves from one place to another. Location-based service solutions include the Mobile Positioning System (MPS), which can be used for a wide array of consumer services such as search, mapping, navigation, road traffic management and emergency-call positioning. The MPS supports complementary positioning methods for 2G, 3G and 4G/LTE (Long Term Evolution) networks. In LTE, a mobile phone is known as a UE (User Equipment).

This thesis proposes a prototype method for live trajectory estimation of a massive number of UEs in an LTE network. RSRP (Reference Signal Received Power) values and TA (Timing Advance) values are part of the LTE events for a UE. These LTE events can be streamed to a system from the eNodeB in real time by activating measurements on the UEs in the network. AoA (Angle of Arrival) and TA values are used to estimate the UE position, with the AoA calculated from the RSRP values. The calculated UE positions are filtered using a Particle Filter (PF) to estimate the trajectory. To obtain live trajectory estimation for massive UEs, the LTE event streamer is modelled to produce several task units containing event data for massive UEs. The task-level data structures are scheduled across an Arm Cortex A15 based MPcore using multiple threads.

Finally, for massive UE live trajectory estimation, the IMSI (International Mobile Subscriber Identity) is used to maintain the hidden Markov requirements of the particle filter while maintaining load balance across four Arm A15 cores. This is demonstrated through serial and parallel performance engineering. Future work is proposed on decentralized task-level scheduling with a hash function on the IMSI for an extended number of cores, and on a concentric circles method for AoA accuracy.

Keywords: Angle of Arrival, Hidden Markov Model, Particle Filter, Arm A15 MPcore, Parallel Programming, Real-time task-level scheduler, Serial and Parallel performance engineering.

Summary

Mobile phone positioning works well for locating mobile phones as they move from one place to another. Location services include the mobile positioning system, which can be used for a variety of customer-demand services such as position search, positioning on maps, navigation, road traffic management and emergency calls with positioning. The Mobile Positioning System (MPS) supports complementary positioning methods for 2G, 3G and 4G/LTE (Long Term Evolution) networks. In LTE, a mobile phone is known as a UE (User Equipment).

A prototype method for live trajectory estimation of massive UEs in an LTE network is proposed in this thesis work. RSRP (Reference Signal Received Power) values and TA (Timing Advance) values are part of the LTE events for a UE. These specific LTE events can be streamed to a system from the eNodeB in real time by activating measurements on the UEs in the network. AoA (Angle of Arrival) and TA values are used to calculate the UE position. The AoA calculations are performed using RSRP values. The calculated UE position is filtered using a Particle Filter (PF) to estimate the trajectory.

To obtain live trajectory estimation for massive UEs, the LTE event streamer is modelled to produce several task units with event data from massive UEs. The task-level data structures are scheduled across an Arm Cortex A15 based MPcore with multiple threads. Finally, for massive UE live trajectory estimation, the IMSI (International Mobile Subscriber Identity) is used to meet the hidden Markov requirements of the particle filter while maintaining load balance across 4 Arm A15 cores. This is carried out through serial and parallel performance engineering.

Future work is proposed on decentralized task-level scheduling with a hash function on the IMSI with an extension of cores, and on a concentric circles method for AoA accuracy.

Keywords: Angle of Arrival, Hidden Markov Model, Particle Filter, Arm A15 MPcore, Parallel Programming, Real-time task-level scheduler, Serial and Parallel performance engineering.

Acknowledgements

First, I would like to express my sincere gratitude to my supervisor, Daniel Jakobsson, and my examiners, Mats Brorsson and Ben Juurlink, for their support, guidance and motivation during the thesis work. I would also like to thank my thesis partner, Nawabul Haque, and the people at Ericsson, Linköping, for their continuous support. I am also thankful to my supervisor from Linköping and to Karl from Stockholm for helping us with equipment during our second and third live tests. I am truly grateful for Professor Mats Brorsson's guidance and support throughout the project. Finally, I would like to thank my family for their patience and continued support throughout.


Contents

List of Figures
List of Tables

1 Introduction
1.1 Introduction
1.2 Motivation and Scope
1.3 PROBLEM STATEMENT
1.4 Thesis Outline
1.5 Contributions

2 Background
2.1 Evolved Packet core or Core Network
2.2 Radio Access Network
2.2.1 RBS
2.2.2 RRC Connection Establishment and initial UE attach procedure
2.3 Cell Trace for PM events with RBS
2.3.1 PM Initiated UE measurements
2.3.2 Cell Trace Activation for Initiation of measurements
2.4 Cell trace Mapping with SGSNMME
2.4.1 UE Context release and Detach procedure
2.4.2 Cell Trace Mapping activation
2.4.3 Real time Network streaming for Cell trace mapping
2.5 Identifiers in LTE Network
2.5.1 Global eNodeB ID
2.5.2 Physical Cell ID
2.5.3 eNBUES1APID
2.5.4 MMEUES1APID
2.5.5 IMSI ID
2.6 Other LTE terms
2.6.1 UE
2.6.2 Timing Advance
2.6.3 RSRP
2.6.4 AoA
2.6.5 DoD
2.7 Summary

3 Positioning
3.1 Positioning in LTE
3.2 Antenna modelling
3.2.1 LS Method
3.2.2 MUSIC Method
3.3 Need of Filters
3.3.1 Probability
3.3.2 Bayes Theorem
3.3.3 Dynamic Estimation
3.3.4 Linear and Non-linear problems
3.3.5 Formal Bayesian Filter
3.3.6 State Space Model for Position Estimation
3.3.7 Particle Filter
3.4 Related Work
3.5 Summary

4 Multicore Technology
4.1 Trend
4.2 Moore’s Law
4.2.1 Three Walls Of VLSI
4.3 Memory Hierarchy
4.4 Cache Terminology
4.4.1 Cache line or block
4.4.2 Index
4.4.3 Way
4.4.4 Tag
4.5 Cache - Memory mapping
4.6 Classification of Cache misses
4.7 Transport Connectivity Unit (TCU)
4.8 Cache Coherency policy
4.8.1 MSI protocol
4.8.2 From MSI to MESI
4.9 Summary

5 Test Environment
5.1 Live Test Environment
5.1.1 Antenna at RBS
5.1.2 RBS Cabinet
5.2 Measurement Requirement
5.3 Test Site Visits with drive/walk route
5.4 Concurrent client server test for Real time scheduler
5.4.1 Cell trace events streamer and decoder
5.4.2 Communication between DUS-41 and TCU-03
5.4.3 Mapping events streamer and decoder
5.5 Output Interface

6 Implementation
6.1 System Visualization
6.1.1 Communication with Radio Base Station Equipments
6.1.2 Solution Architecture
6.2 Positioning
6.2.1 Antenna modelling for AoA
6.2.2 Multiple Signal Classification Algorithm
6.2.3 Least Square Algorithm
6.2.4 Verification of antenna modelling for an Ericsson RBS
6.2.5 Requirement of Particle Filter to process massive UE
6.3 Parallel processing with AXM 5512 Multi-core
6.3.1 Parallel Programming requirements for Multi-core
6.3.2 Single core/Single thread scheduler
6.3.3 Real time Multi-threaded Scheduler design for MPcore
6.3.4 OpenMP based Centralized Task level scheduler for MPcore
6.4 Summary

7 Result and Discussion
7.1 Results
7.1.1 Network based AoA angle Vs GPS based crude AoA angle
7.1.2 Profiling for single core execution with Gprof for AXM 5512 SoC
7.1.3 Race detector with Helgrind for four Arm Cortex A15 in AXM 5512 SoC
7.1.4 Heap Memory profiling
7.1.5 Cache Profiling
7.1.6 Scalability analysis with Gustafson's Law for Multi-core scheduling with four Arm Cortex A15
7.2 Discussion
7.2.1 Ground truth measurement characteristics for RSRP
7.2.2 Measurement Characteristics in drive test
7.2.3 Software Characteristics
7.2.4 Serial Performance Engineering
7.2.5 Parallel Performance Engineering
7.2.6 Heap profiling
7.2.7 Cache Profiling
7.3 Summary

8 Conclusion and Future Work
8.1 Decentralized distribution of task level scheduling with extension of cores
8.2 Concentric circles for AoA accuracy

Appendices
A LTE Event Parameters
B Data Structures

Bibliography

Acronyms

3GPP 3rd Generation Partnership Project

AoA Angle of Arrival

CN Core Network

DoD Direction of Departure

DUS41 Digital Unit multi Standard

eNodeB Base Station

EPC Evolved Packet Core

LTE Long Term Evolution

MME Mobility Management Entity

MOM Managed Object Model

NAS Non Access Stratum

PDN-GW Packet Data Network Gateway

PM Performance Management

RAN Radio Access Network

RBS Radio Base Station

RRC Radio Resource Control

RSRP Reference Signal Received Power

S-GW Serving Gateway

TA Timing Advance

UE User Equipment

List of Figures

2.1 LTE Architecture[10]
2.2 LTE Interfaces[10]
2.3 RBS Functional Blocks[10]
2.4 Cell Trace[12]
2.5 Cell trace Mapping[14]

3.1 OTDOA[25]
3.2 Enhanced Cell ID[25]
3.3 Gaussian distribution[45]
3.4 Filter with Gaussian distribution[46]
3.5 Likelihood evaluation with state x=3 in left side and state x=12 in right side[47]

4.1 Moore’s Law[48,71]
4.2 Evolution of System-On-Chip[48]
4.3 Uniform Memory access model[49]
4.4 Non-Uniform Memory Access model[49]
4.5 Arm Cortex-A15 MP Core Processor[54]
4.6 Block diagram of Cortex A15 MP Core Processor[54]
4.7 AXM 5512 Multicore System-On-Chip[80]
4.8 Features of AXM 5512 SoC[80]
4.9 Physical Address decoding for private core cache in ARM A15[53]
4.10 K-way set associative cache[53]
4.11 Memory hierarchy in AXM SoC within TCU03[54]
4.12 MSI Coherence Policy[51]

5.1 Outdoor RBS Site
5.2 Outdoor RBS Cabinet
5.3 dus41
5.4 dual beam antenna
5.5 antennas at enkoping
5.6 horizontal and vertical pattern
5.7 Dus-41[55,56]
5.8 ports of dus41
5.9 block diagram of dus-41[55,56]
5.10 DUS41 used at enkoping[55,56]
5.11 TCU03[57]
5.12 ports of TCU[58]
5.13 TCU03 used in Enkoping[57,58]
5.14 Enkoping RBS Cabinet
5.15 drivepath
5.16 Cell trace for event streamer
5.17 Event data with time stamp in millisecond
5.18 EventData list
5.19 AvailableData list
5.20 Cell trace mapping events
5.21 SGSNMME for enkoping
5.22 MappingEntry
5.23 UeDataPost structure
5.24 Socket interface between TCU03 and CMT Server

6.1 High level System Visualization
6.2 Solution Architecture
6.3 Antenna Gain Diagram
6.4 Antenna Gain Differential Diagram
6.5 DoD for UE
6.6 Antenna Gain Differential Diagram for cell Id-1 and cell Id-2
6.7 Angle of Arrival(AoA)
6.8 AoA from DoD
6.9 6-Sector Symmetric-cells[68]
6.10 Particle Filter Algorithm loop
6.11 Hidden Markov Model[47]
6.12 Functional Components
6.13 Consumer modelling showing callback functions with code regions
6.14 DataOfInterest
6.15 Producer-Consumer and bounded buffer
6.16 Producer Modelling
6.17 Thread and IMSI ID Buffer
6.18 Bounded Buffer
6.19 False Sharing[69]
6.20 Thread Level Parallelism
6.21 OpenMPScheduler
6.22 OpenMPScheduler

7.1 Network AoA vs GPS based AoA
7.2 Gprof graph for main function call
7.3 Gprof graph for Particle filter functions
7.4 Real time system constructs initialization order-3
7.5 heap profiling for massive UE before peak graph-5
7.6 heap profiling for massive UE after peak-6
7.7 Scalability for Multi-threaded scheduler graph-7
7.8 Scalability for OpenMP scheduler graph-8
7.9 Scalability Comparison graph-9
7.10 RSRP Signal strength from serving cell along the drive route
7.11 Ground truth analysis of RSRP from 1 or 2 cells
7.12 Main function callgraph
7.13 lock1-code snippet of Sequential version
7.14 Antenna model child region callgraph-10
7.15 lock for Particle Filter
7.16 Unoptimized Call graph for Particle Filter child region
7.17 Unoptimized Sequential version execution for Particle Filter child region
7.18 Arm A15 Cortex showing PMU counters-4
7.19 Performance counter monitor[75]
7.20 PAPI counters connecting with PMU for measurement of cache misses[75,53]
7.21 Listing-1:Unoptimized Sequential version for Least Square Algorithm-6
7.22 Listing-2:Optimized Sequential version for Least Square Algorithm-7
7.23 Listing-3:Unoptimized Sequential version for Least Square Algorithm-8
7.24 Listing-4:Optimized Sequential version for Least Square Algorithm-9
7.25 child region measurement
7.26 Column wise matrix addition
7.27 Row wise matrix addition
7.28 matrix addition function called in particle filter
7.29 matrix multiplication function called in particle filter
7.30 particle filter trajectory of IMSI-1 with Unoptimized algorithmic loops of Single threaded version
7.31 optimized Particle filter loops with PAPI start counters
7.32 optimized particle filter loop with PAPI stop counters
7.33 optimized Sequential version for Particle Filter-15
7.34 Leak summary of Sequential version
7.35 IMSI-1 from Optimized Single threaded version-16
7.36 particle filter trajectory of IMSI-1 with Unoptimized algorithmic loops of Single threaded version
7.37 Producer Consumer Lock
7.38 Call graph of threaded scheduler
7.39 Deterministic test execution trace for multi-threaded scheduler
7.40 multi-threaded race detector tool for trajectory software
7.41 race detector check
7.44 leak summary for Parallel multi-threaded scheduler
7.45 IMSI-1 UE trajectory from multi threaded scheduler
7.46 IMSI-10 from multi-threaded scheduler
7.47 Callgrind showing bottleneck of Locking
7.48 Load balance for 4 cores -1
7.49 Load balance for 4 cores-2
7.52 Callgraph for OpenMP consumer section
7.53 Deterministic condition test case
7.54 leak summary for Parallel version
7.55 IMSI-1
7.56 IMSI-50
7.57 callgrind execution for 1UE
7.62 heap profiler for one UE
7.63 heap profiler-1 for massive UE Parallel version
7.64 Cache profiler for 1UE in Parallel version
7.65 Cache profiler for 10UE for Parallel version

8.1 Decentralized task scheduling
8.2 Concentric circles for AoA

List of Tables

2.1 RSRP Value Mapping

4.1 MESICache[50]

6.1 Installation Parameters[68]
6.2 Data sheet[31]
6.3 MUSIC
6.4 Cell ID and RSRP Value for LS

7.1 Instruction Cache Profile
7.2 Data Cache Profile
7.3 Single Core Execution Profile
7.4 Multicore Execution Profile
7.5 OpenMP Execution Profile
7.6 Event Type and Size
7.7 Event Data Sizes
7.8 Memory requirement for particles for 1 UE
7.9 Memory requirement for particles

A.1 Event Header Parameters
A.2 RSRP Event specific parameters
A.3 TA Event specific parameters

Chapter 1

Introduction

1.1 Introduction

Mobile or User Equipment (UE) positioning is an important field of research in sensor fusion with wireless communication technologies. The two major classes of UE positioning are:

• Indoor Positioning

• Outdoor Positioning

In indoor positioning systems, WiFi or Bluetooth provides a solution for UE positioning inside a building. They provide good accuracy, close to 10 meters, and work better when the WiFi is installed with a good network connection. For outdoor positioning systems, the two major focus areas are:

• Global Positioning System(GPS)

• Positioning based on Network data

GPS uses satellites to locate the UE, which needs built-in hardware to support GPS. This hardware consumes a lot of power, and GPS requires a clear line of sight for the UE to receive signals from the satellites. A UE without the supporting hardware cannot be positioned with GPS. Alternatively, with network-based positioning, the mobile network itself can be used for UE positioning. Several applications and use scenarios open up when network-based location services are made functional for UEs. Wireless network operators spend a large part of their operational and maintenance expenditure on planning, configuring, optimizing and maintaining the wireless access network. They use radio signal propagation prediction tools, which are not fully accurate. The inaccuracies stem from imperfections in the geographical data due to non-line-of-sight conditions and radio signal interference, caused mainly by multipath signals, as well as from attenuation of the signal because of changes in traffic distribution. These problems force operators to continuously optimize their networks using measurements and statistics collected by drive or walk tests. A drive test typically uses a car driving around a city while measuring the network to obtain information about coverage, throughput, bit rate and signal quality. Drive/walk testing gives a picture of the end-user perception in the field and enables the operator to identify locations with poor performance. This information can be used to decide how to tune the network or where to build a new base station. Drive/walk tests are, however, not ideal, since only a limited part of the network can be analysed because of access restrictions. Furthermore, only a snapshot in time of the live data can be captured and analysed during a field test.

1.2 Motivation and Scope

With the rise of Long Term Evolution (LTE) network technology, positioning methods have become one of the important development components of this technology. The scope of this project is restricted to LTE Release 8. With every new release of the LTE standards, contributions and enhancements are made to network-based location services. For example, Release 9 has three positioning methods: Assisted Global Navigation Satellite System (A-GNSS), Observed Time Difference of Arrival (OTDOA) and Enhanced Cell ID (ECID). In our project we use the ECID method for positioning. The major motivating factors for initiating this project are the following:

• To provide opportunities for more location-based services in sectors such as transport, communication and business.

• To achieve better network quality and dynamic monitoring.

• To reduce the operational expenditure for maintenance of the wireless access network.

1.3 Problem Statement

There are many scenarios in which the location of several UEs in the network needs to be monitored or tracked continuously. This requires a real-time system that receives live information from all the UEs in the network and estimates their positions. Real time here means that the received information is processed into location information as soon as it is received from the network. Whenever the processing is delayed, the estimates of UE location are delayed and the delay accumulates. To process the massive data for all the UEs, the load therefore has to be distributed across several machines, so that each machine processes its allocated computation jobs for several UEs. For this to happen, the machines have to be distributed within the RBS cabinet. The Ericsson RBS has a piece of equipment known as the Transport Connectivity Unit (TCU), which is able to distribute the load for processing massive UE positioning. The main aim of this project is to create a software prototype for processing location-based services for massive UE. This leads to the following questions:

• Is it possible to do wireless antenna modelling for an Ericsson RBS to estimate the AoA for one or more UEs in the network from network-based real-time sensor data? Can the estimated AoA be used for trajectory estimation, and what kind of trajectory does such an AoA estimate yield when analysed against ground truth?

• Is it possible to optimize the algorithms at loop level for more cache-friendly execution while maintaining the trajectory functionality? How should serial performance engineering be done for child-region algorithms?

• Given the task-level modelled data structure from the eNodeB-based network streaming interface and single-threaded cache-friendly data structures for the particle filter, how can these tasks be scheduled with multiple threads on multi-core while maintaining the hidden Markov requirement for massive UE trajectory estimation with the particle filter?

• How can a centralized task-level scheduler for multi-core be designed to be more efficient, with optimized or no lock contention? How should parallel performance engineering be done?

• How can a decentralized task-level scheduler be designed that extends to more cores and fulfils the hidden Markov requirement with a hash method for massive UE trajectory estimation? How can the AoA be estimated more accurately with varying TA distance?

1.4 Thesis Outline

The thesis is organized into the following chapters.

• Chapter 2 explains the LTE background, the configuration needed for streaming massive UE data from the radio base station, and the streaming requirements from the MME for massive UE positioning.

• Chapter 3 explains state-of-the-art technologies in positioning, including antenna modelling, AoA calculation and positioning with a particle filter.

• Chapter 4 explains the state-of-the-art multicore technologies available in TCU03.

• Chapter 5, Test Environment, describes the equipment used in the live test and the live test conducted for AoA modelling. It then describes the concurrent client-server simulation test, with input interfaces from the RBS and the MME (Mobility Management Entity) and an output interface to the CMT server.

• Chapter 6 explains the implementation methodologies for positioning, antenna modelling, AoA estimation and the particle filter requirement for scheduling massive UE, followed by the single-core and multicore schedulers.

• Chapter 7 presents the results and discussion.

• Chapter 8 presents the conclusion and future work.

• Appendix 1 describes the LTE events and parameters.

• Appendix 2 describes the data structures used in the project.

1.5 Contributions

• Algorithm Design and Development

– Design of a Multiple Signal Classification (MUSIC) algorithm for antenna modelling with the Ericsson RBS, and its real-time implementation to find the angle of arrival for one or massive UEs.

– Design of a Least Squares (LS) algorithm for antenna modelling with the Ericsson RBS, and its real-time implementation to find the angle of arrival for massive UEs. Implementation of Cramér-Rao Lower Bound estimation to find the variance of the angle of arrival. Finally, the procedure for obtaining the AoA and the AoA variance, which are used together as data fusion for the sensor model in particle filter positioning.

• Multicore/multithreaded scheduler

– Design of a task-level scheduler for data structures with multicore and multithreaded parallelism that maintains the hidden Markov requirement while processing trajectory estimation for massive UE.

– Design of a centralized task-level scheduler with OpenMP for scheduling massive UE positioning on multicore while maintaining the hidden Markov requirement for the particle filter.

• Measurements:

– Ground-truth RSRP measurement and comparison of network AoA estimation versus GPS AoA, to check the correctness of the sensor model for the particle filter.

– Analysis of cache-friendly execution of the least squares and particle filter algorithms at loop level, by serial performance engineering with respect to the child region.

– Analysis of parallel performance engineering for the different schedulers, and verification of trajectory estimation for massive UE processing with MPcore.

• Tools used to make the software prototype fully functional with MPcore:

– GDB and Valgrind for debugging.

– The GNU gprof tool and the Callgrind tool of Valgrind for line-level profiling and for identifying relative communication bottlenecks.

– The Memcheck tool of Valgrind for detecting memory leaks and memory errors.

– The Helgrind tool of Valgrind for detecting data races and multithreading errors.

– The PAPI tool with the PMU counters of the ARM A15 and the Cachegrind tool of Valgrind for cache profiling.

– The taskset command for scalability analysis, and load-balance analysis with different cores.

– Matlab for trajectory plots and DoD verification.

The following were contributed by my thesis partner, Nawabul Haque:

• Real-time Cell Trace events input interface with a task-level modelling data structure, and a real-time mapping events streamer.

• Development of the single-core, single-worker-thread implementation with cache-friendly data structures and a particle filter for real-time filtering of AoA and TA for positioning.

• Real-time output interface to the CMT server.


Chapter 2

Background

This chapter presents an overview of the LTE network, followed by the background work related to this thesis on configuring massive UE measurements for real-time network streaming. Long Term Evolution (LTE) is currently a prominent technology for mobile communications. It is also considered a 4G technology and is defined by the 3rd Generation Partnership Project (3GPP). 3GPP, established in 1998, is a collaboration between a number of telecommunications standards development organizations, which contribute to the 4G specifications[1]. LTE was introduced in 3GPP Release 8[2,3] and was followed by further releases such as Release 9 and Release 10. For our project we use LTE Release 8 software features. The LTE network consists of the following major components:

• LTE mobile phone or LTE dongle, known as the UE

• Evolved Packet Core (EPC) or Core Network

• Radio Access Network or LTE RAN

These components are shown in Figure 2.1, which presents the LTE architecture.

2.1 Evolved Packet core or Core Network

The Core Network (CN) is also known as the Evolved Packet Core (EPC). Many network elements exist in the core network. These network elements are also known as logical nodes and can be classified into user plane and control plane nodes. They are listed below:

• MME - The Mobility Management Entity (MME) is the control plane node of the CN or EPC. The MME is responsible for bearer management and connection management of UEs. In other words, the MME processes the signalling between the UE and the CN. The protocols running between the UE and the CN are referred to as the Non-Access Stratum (NAS) protocols. The MME manages and stores the UE context (UE/user identities, UE mobility state, user security parameters). It generates temporary identities and allocates them to UEs. The MME also authenticates the user and handles tracking area list management, PDN-GW and S-GW selection, and handovers (intra- and inter-LTE).

• SGW - The Serving Gateway (S-GW) is the user plane node connecting the EPC to the LTE Radio Access Network (RAN). The SGW routes and forwards user data packets, while also acting as the mobility anchor for the user plane during inter-eNodeB handover. The SGW also acts as the anchor for mobility between LTE and other 3GPP technologies, terminating the S4 interface.

• PDN-GW - The Packet Data Network Gateway (PDN-GW) deals with the user plane. The PDN-GW connects the EPC to the internet. It is responsible for IP address allocation for the UE, QoS enforcement, and uplink and downlink service-level charging.

Figure 2.1: LTE Architecture[10]

Figure 2.2 presents the LTE interfaces. In E-UTRAN, the eNodeB provides the E-UTRA user plane protocols (PDCP/RLC/MAC/PHY) and the control plane protocol, which terminate towards the UE. The eNodeBs are connected to the EPC by the S1 interface[4]: to the MME by means of the S1-MME interface and to the Serving Gateway (S-GW) by means of the S1-U interface. The S1 interface supports a many-to-many relation between MMEs/S-GWs and eNodeBs. The eNodeBs are interconnected with each other by means of the X2 interface[5].


Figure 2.2: LTE Interfaces[10]

Figure 2.3: RBS Functional Blocks[10]

2.2 Radio Access Network

This section discusses the radio protocol communication that takes place in the RBS.

2.2.1 RBS

The radio base station is also known as the evolved Node B; it is referred to as eNodeB, eNB or site in this report.

In general, the radio protocol architecture for LTE can be separated into a management and control plane and a user plane, as shown in Figure 2.3, which presents the RBS functional blocks.

The control plane handles radio-specific functionality, which depends on the state of the user equipment. The user plane protocol runs between the base station (eNodeB or RBS) and the User Equipment (UE). On the user plane side, the application creates data packets that are processed by protocols such as TCP, UDP and IP, while in the control plane the Radio Resource Control (RRC) protocol writes the signalling messages that are exchanged between the base station and the mobile. In both cases, the information is processed by the Packet Data Convergence Protocol (PDCP), the Radio Link Control (RLC) protocol and the Medium Access Control (MAC) protocol before being passed to the physical layer for transmission. The eNodeB interfaces with the UE and hosts the physical (PHY), MAC, RLC and PDCP layers. It also hosts the RRC functionality corresponding to the control plane[6]. It performs many functions, including radio resource management, admission control, scheduling, enforcement of negotiated UL QoS, cell information broadcast, ciphering/deciphering of user and control plane data, and compression/decompression of downlink/uplink user plane packet headers.

2.2.2 RRC Connection Establishment and initial UE attach procedure

After the random access procedure, a UE that is not already attached to the network has to attach by initiating the attach procedure. The UE does so by sending a Packet Data Network (PDN) connectivity request using the NAS protocol for EPS. The NAS transport procedure carries the signalling between the UE and the SGSN-MME. To initiate any NAS procedure, the UE must first establish an RRC connection with the eNodeB[6]. The RRC connection establishment procedure is used to transfer the initial NAS message from the UE to the MME via the eNodeB; the eNodeB does not interpret NAS messages. The purpose of this procedure is to announce the presence of the UE in the network and to request resources from the network for its service needs. The initial Packet Data Network (PDN) connection is created through this attach procedure. The MME sends an Identity Request to the UE if it is unable to retrieve the UE profile from its database in the HSS. The UE responds with its IMSI in the Identity Response message to the MME. After receiving a valid IMSI, the network authenticates the UE to ascertain whether it is genuine. After successful authentication, the network protects the privacy of the UE subscription profile. Once the initial attach procedure succeeds, a context is established for the UE in the MME. The RRC connection request is then completed by the eNodeB and the MME accepts the UE. The UE is thus attached to the network.

2.3 Cell Trace for PM events with RBS

Performance Monitoring events (PM events) are required for positioning UEs in the network. There are two options for streaming PM event data. First, we analysed UE Trace and found the drawback that it can position only a limited number of UEs, 16. Second, we analysed the requirements and usage of Cell Trace for positioning massive UEs. Figure 2.4 presents the Cell Trace procedure. We found that Cell Trace fulfils our requirement, but it gives rise to other requirements, which are discussed in the next section.

Figure 2.4: Cell Trace[12]

2.3.1 PM Initiated UE measurements

Management Object Model (MOM) configuration is performed in the Moshell framework, which provides commands to enable the radio communication[7,8,19].

Step 1: Select an RBS through the MO shell by enabling the appropriate MO function.

Step 2: Configure the Measurement Object MO, which enables the frequency of the cell ID.

Step 3: Configure the Report Configuration MO for reporting measurements.

Step 4: To connect the Report Configuration and the Measurement Object, create another MO and set the appropriate attribute to point to the created report configuration.

Step 5: To cover all cell regions of the base station, including cell-to-cell handover between its cells, repeat the above procedure for every cell in the base station.

Step 6: Enable event feature streaming with the full UE fraction of 1000 for the subscription profile, in order to provide live measurements for all UEs in the coverage area of the entire base station.

All UEs should be configured with LTE as the access point in their wireless settings. All UEs are required to run a network-trigger Android application periodically, every second.


2.3.2 Cell Trace Activation for Initiation of measurements

When a Cell Trace subscription profile is used, the RBS initiates UE measurements on one or more UEs, depending on the UE fraction for the profile[11,12,13]. When the required measurement configuration is complete (as per the preceding subsection), the measurements are ready to be initiated. Initiation is done by including the Performance Monitoring (PM) event UEMEASINTRAFREQ1 in a Cell Trace network stream; this in turn corresponds to an attribute command in the MO shell communicating with the digital unit server (DUS41) in the RBS. The RBS puts each event received into the Cell Trace or UE Trace stream, depending on which is activated. To avoid disturbing ordinary traffic, the RBS does not initiate all UE measurements in a specific cell at once. The RBS initiates UE measurements with a periodicity of 1 second in each cell. By analysing the event-based information[20], the Cell Trace function makes it possible to monitor and evaluate any traffic scenario taking place in the LTE Radio Access Network.

2.4 Cell trace Mapping with SGSNMME

This section discusses UE context release and the requirement for mapping, followed by the streaming of mapping events. The Cell Trace mapping data from the MME is transferred over the Gom interface[18].

2.4.1 UE Context release and Detach procedure

The Detach procedure is used to detach the UE from the network[17]. It is initiated by either the UE or the network. On completion of the detach procedure, the UE is in the de-registered state and cannot send or receive data.

• UE-initiated detach procedure - If the UE does not require services from the network, it de-registers itself by initiating the detach procedure with a Detach Request message. The detach type indicates whether the detach is due to switch-off or to a transition from active to idle mode. Only if the detach type is a transition to idle mode does the MME send a Detach Accept and clear the UE context held at the eNodeB.

• Network-initiated detach procedure - The UE Context Release Request procedure enables the eNodeB to request the SGSN-MME to release the UE-associated logical S1 connection for LTE-generated reasons[17].

The UE Context Release procedure enables the SGSN-MME to order the release of the UE-associated logical connection when any of the following conditions occurs:

• Completion of a transaction between the UE and the EPC.

• Completion of a successful handover.

• Completion of a handover cancellation.

Figure 2.5: Cell trace Mapping[14]

2.4.2 Cell Trace Mapping activation

There are three ways in which the mapping between the MME UE S1AP ID and the IMSI of the UE can be obtained from the MME[14,15]:

• Logged to a file in XML format for each report output period[16].

• Streamed in real time to an external system over TCP[15].

• Both logged to file and streamed.

Whenever there is a change in the Cell Trace interface, the post-processing system must either fetch the new XML file from the SGSN-MME, or the new event data is streamed to the Cell Trace mapping. In either case, an understanding of the XML file helps in interpreting the Cell Trace mapping, since the Cell Trace interface remains the same.

2.4.3 Real time Network streaming for Cell trace mapping

Of the three methods mentioned above, we use real-time streaming for the live Cell Trace mapping streamer[15]. The events are streamed to the post-processing system over TCU03 using TCP streaming. The stream header record contains the start date and time of the streamed data, as well as information about restarts of the streaming. The event record contains the event data, such as the event identity and the result of the event. The event identity types in the SGSN-MME node are:

• RRC connection request

• Internal UE context release

Figure 2.5 presents the Cell Trace mapping and shows how the S1AP ID is sent from both the eNodeB and the MME to the post-processing system. We use the S1AP ID as the mapping key between the eNodeB and the MME. The Cell Trace mapping streamer thus provides the complete list of MME UE S1AP ID to IMSI mappings from the streamer start timestamp. The MME UE S1AP ID changes mainly because of a context release for the user equipment, or because of the initial access to attach to the network through an RRC connection request event, whenever the number of UEs present changes.

Cell Trace can be used for the base station if the corresponding Cell Trace Mapping function of the Mobility Management Entity is available. In addition, both the Cell Trace PM event streamer and the Cell Trace mapping streamer provide an administrative start timestamp. In the Cell Trace files from the RBS, each UE-individual PM event is tagged with the S1AP IDs. The MME can then provide a mapping file between the S1AP IDs and the IMSI and IMEI/IMEISV. Hence it is possible to obtain the IMSI and IMEI/IMEISV for a massive number of UEs in the Cell Trace data[10].
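The mapping described in this section can be pictured as a small lookup table that is updated by the two mapping event types and queried for every PM event. The following is only a minimal Python sketch under stated assumptions: the class name, the method names and the event type strings are hypothetical and do not reflect the actual SGSN-MME event format or the data structures of the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class S1apImsiMap:
    """Live table from the temporary MME UE S1AP ID to the permanent IMSI."""
    by_s1ap: dict = field(default_factory=dict)

    def on_mapping_event(self, event_type: str, mme_ue_s1ap_id: int, imsi: str = "") -> None:
        # The event type strings are hypothetical stand-ins for the two
        # event identities named in the text.
        if event_type == "RRC_CONNECTION_REQUEST":
            # A UE attached: remember which IMSI owns this temporary ID.
            self.by_s1ap[mme_ue_s1ap_id] = imsi
        elif event_type == "UE_CONTEXT_RELEASE":
            # The context was released: the temporary ID is no longer valid.
            self.by_s1ap.pop(mme_ue_s1ap_id, None)

    def imsi_for(self, mme_ue_s1ap_id: int):
        """Return the IMSI for a PM event tagged with this S1AP ID, or None."""
        return self.by_s1ap.get(mme_ue_s1ap_id)
```

A PM event from the RBS would then be resolved with `imsi_for`, and events whose S1AP ID is unknown at that moment could be dropped or buffered, depending on the design.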

2.5 Identifiers in LTE Network

2.5.1 Global eNodeB ID

In an LTE network, every eNodeB has a unique identity by which it can be identified globally. This is called the GlobaleNBID and is defined in 3GPP TS 36.300[21].

2.5.2 Physical Cell ID

In an LTE network, every eNodeB has either 3 or 6 physical cells. Each physical cell has a separate identifier known as the cell ID. The cell is pre-configured to send out a local ID, which is one of 504 unique IDs.

2.5.3 eNBUES1APID

The eNBUES1APID identifies a UE over the S1 interface within an eNodeB. This ID is received and used by the MME for all UE-associated S1-AP signalling.

2.5.4 MMEUES1APID

The MMEUES1APID is an identity associated with a UE, allocated uniquely over the S1 interface within an MME. The eNodeB receives this ID from the MME and stores it for as long as the UE has a logical S1 connection. The eNodeB uses this ID for all UE-associated S1-AP signalling. The MMEUES1APID is therefore a temporary identity for a UE.


2.5.5 IMSI ID

The International Mobile Subscriber Identity (IMSI) is a permanent identity.

2.6 Other LTE terms

2.6.1 UE

UE refers to User Equipment: any user device, such as a mobile phone or dongle, that uses the LTE network.

2.6.2 Timing Advance

In LTE, when a UE within the coverage of a base station tries to establish an RRC connection with the radio base station (eNodeB), it transmits a random access preamble to the base station. The eNodeB monitors the random access waveforms and estimates the time of arrival, known as the Timing Advance (TA). Based on this, the eNodeB transmits a random access response containing a timing advance command, which is a 6-bit value in the range 0-63. This value tells the UE to increase or decrease the delay time. The threshold value 32 represents zero, which tells the UE that it does not need to change the delay time. Values above the threshold tell the UE to increase the delay time, and values below it to decrease the delay time. The TA value is equal to twice the propagation delay. TA is expressed in units of 16 Ts, where[23]:

$$T_s = \frac{1}{15000 \times 2048}\,\text{s} \approx 32.55\,\text{ns}$$

The TA value provides a round-trip time (RTT) for the signal, so TA can be used to estimate the distance of the UE from the base station antenna with the following distance-speed equation:

$$TA_{distance} = \frac{TA \times 16\,T_s}{2} \times C$$

where C is the speed of light, 300,000 km/s.
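The TA distance relation above can be checked numerically. The following is a minimal Python sketch, assuming the TA value is given directly in units of 16 Ts; the function and constant names are illustrative and not taken from the prototype.

```python
TS_SECONDS = 1.0 / (15000 * 2048)   # basic LTE time unit Ts, about 32.55 ns
SPEED_OF_LIGHT_M_S = 3.0e8          # 300,000 km/s, as used in the text


def ta_distance_m(ta: int) -> float:
    """One-way distance in metres for a TA value given in units of 16*Ts."""
    round_trip_s = ta * 16 * TS_SECONDS      # TA is a round-trip time
    return round_trip_s / 2 * SPEED_OF_LIGHT_M_S
```

With these constants each TA step corresponds to roughly 78 m, which gives an idea of the range resolution available from TA alone.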

2.6.3 RSRP

Reference Signal Received Power (RSRP) is the linear average, in watts, of the received power of the reference signals within the considered measurement bandwidth; its reporting range is -140 dBm to -44 dBm. It is the most important quantity that the UE has to measure for cell selection, re-selection, and intra and inter handover. The measured RSRP values are mapped to a reported RSRP in the range 0 to 97 with 1 dB resolution. The reported value 0 corresponds to an RSRP less than -140 dBm, and 97 to an RSRP greater than or equal to -44 dBm[22]. The mapped values are reported by the UE to the network and appear in the cell trace as the RSRP value. Table 2.1 presents the mapping from measured to reported RSRP values.

Table 2.1: RSRP Value Mapping

Reported RSRP value    Measured RSRP quantity    Unit
0                      RSRP < -140               dBm
1                      -140 ≤ RSRP < -139        dBm
...                    ...                       ...
96                     -45 ≤ RSRP < -44          dBm
97                     -44 ≤ RSRP                dBm
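The mapping in Table 2.1 can be expressed in both directions. The following is a minimal Python sketch of the table as printed; the function names are illustrative, and the inverse function returns a range because a reported value only identifies a 1 dB bin.

```python
import math


def reported_rsrp(measured_dbm: float) -> int:
    """Map a measured RSRP in dBm to the reported value 0..97 of Table 2.1."""
    return max(0, min(97, math.floor(measured_dbm) + 141))


def rsrp_range_dbm(reported: int):
    """Return the (low, high) bounds in dBm of a reported value.

    low is inclusive and high is exclusive; None marks an open end.
    """
    if not 0 <= reported <= 97:
        raise ValueError("reported RSRP must be in 0..97")
    low = None if reported == 0 else reported - 141
    high = None if reported == 97 else reported - 140
    return low, high
```

A consumer of cell trace events would typically apply `rsrp_range_dbm` (or the bin centre) to convert the reported value back to dBm before using it in AoA estimation.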

2.6.4 AoA

Angle of Arrival (AoA) is the incoming angle at which a receiver registers a transmitted signal. The AoA provides an estimate of the direction in which the UE is located, with geographical north as the reference direction. 3GPP TS 36.305 describes the ECID positioning methods, which use AoA for UE positioning[24]. In this project, AoA is used to estimate the UE position.

2.6.5 DoD

Direction of Departure (DoD) is the angle at which the transmitter transmits the signal. DoD is the complementary phase shift of AoA.

2.7 Summary

This chapter presented the LTE background concepts required for the project, followed by the network streaming methods, cell trace and cell trace mapping, and their requirements for positioning massive UE.

Chapter 3

Positioning

This chapter gives a detailed overview of positioning methods based on LTE network data, and then describes the positioning methods used in this thesis. It presents the algorithms used for network-based UE positioning in real time. Finally, it discusses related work done in the past with Matlab simulation or offline, i.e. not in real time.

3.1 Positioning in LTE

The following positioning techniques are available for location-based services in LTE with 3GPP Release 9.

• A-GNSS (Assisted Global Navigation Satellite System) - To overcome the drawbacks of autonomous GNSS, namely the need for an unobstructed line of sight and low signal levels, the cellular network assists the GNSS receiver by providing assistance data for the visible satellites. In an assisted GNSS system, the contents of the GPS navigation message are supplied to the receiving mobile device by the cellular network. There are two basic methods of position calculation for A-GNSS; both involve the mobile device measuring the GNSS signals.

1. Mobile Assisted: The device measures the visible satellites and sends the GNSS measurement data to the network, which calculates the position.

2. Mobile Based: The device performs the same satellite measurements as in the mobile-assisted case, but also calculates its position before sending the calculated location back to the network.

This location calculation adds complexity and processing on the device and therefore increases power consumption.

• OTDOA (Observed Time Difference of Arrival) - OTDOA is the positioning solution of choice when GNSS signals cannot be used because there is no clear line of sight. OTDOA uses neighbour cells (eNodeBs) to derive an observed time difference of arrival relative to the serving cell. Current solutions are based on both inter-band and intra-band eNodeB measurements. The position estimate is based on measuring the Time Difference of Arrival (TDOA) of special reference signals, embedded in the overall downlink signal, received from different eNodeBs.

Each TDOA measurement describes a hyperbola, as shown in Figure 3.1, whose two focus points (F1, F2) are the two measured eNodeBs. The measurement needs to be taken for at least three pairs of base stations. The position of the device is the intersection of the three hyperbolas for the three measured base station pairs (A-B, A-C and B-C). The measurement taken between a pair of eNBs is defined as the Reference Signal Time Difference (RSTD): the relative timing difference between a subframe received from the neighbouring cell j and the corresponding subframe from the serving cell i. These measurements are taken on the Positioning Reference Signals, and the results are reported back to the location server, where the position is calculated.

Figure 3.1: OTDOA[25]

• ECID (Enhanced Cell ID) - OTDOA is the method of choice for urban and indoor areas, where (A-)GNSS performs poorly or not at all. Another method for position estimation in LTE is Enhanced Cell ID (E-CID), which is based on Cell of Origin (COO). With COO, the position of the device is estimated from the known geographical coordinates of its serving base station, in LTE terms the eNB.

The serving cell can be determined by executing a tracking area update or by paging. E-CID has been defined for LTE mainly for devices in which no GNSS receiver is integrated. In addition to using the geographical coordinates of the serving base station, the position of the device is estimated more accurately by performing measurements on radio signals. E-CID can be executed in three ways, using different measurements:

1. E-CID estimating the distance from 1 base station, shown as case 1 in Figure 3.2.

2. E-CID measuring the distance from 3 base stations, shown as case 2 in Figure 3.2.

3. E-CID measuring the Angle of Arrival (AoA) from at least 2 or 3 base stations, shown as case 3 in Figure 3.2.

In the first two cases the position accuracy would be just a circle; the measurements of TDOA and TA are taken by the device, so these cases are UE-assisted. In the third case, the measurements of RSRP and TA are taken by the base station, so it is eNB-assisted.

Figure 3.2: Enhanced Cell ID[25]
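Since this thesis combines a TA-based distance with an AoA to position the UE, the geometry of that combination can be illustrated as follows. This is only a minimal Python sketch under stated assumptions: a flat local east/north coordinate system in metres around the base station, and an AoA measured clockwise from geographical north. The function and parameter names are hypothetical.

```python
import math


def ecid_position(bs_east_m, bs_north_m, distance_m, aoa_deg):
    """UE position from one base station, a range and an AoA.

    aoa_deg is assumed to be measured clockwise from geographical north,
    so 0 degrees points north and 90 degrees points east.
    """
    aoa = math.radians(aoa_deg)
    east = bs_east_m + distance_m * math.sin(aoa)
    north = bs_north_m + distance_m * math.cos(aoa)
    return east, north
```

The distance alone places the UE on a circle around the base station; the AoA selects one point on that circle, which is why errors in the AoA translate into errors along the circle.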

3.2 Antenna modelling

Antenna gain information is typically made available by antenna manufacturers as antenna gain diagrams[26] relative to the antenna main beam direction.

The antenna gain in dB can be described in polar coordinates by G(φ, θ), where φ is the horizontal angle relative to north, with range −180° ≤ φ ≤ 180°, and θ is the negative elevation angle relative to the horizontal plane, with range −90° ≤ θ ≤ 90°. It is common to consider the antenna gain to be separable into a horizontal and a vertical component[39,47]:

$$G_A(\varphi, \theta) = G_{Ah}(\varphi) + G_{Av}(\theta)$$

The horizontal angle will also be referred to as the Direction of Departure (DoD). In this thesis
the focus is on the horizontal antenna component, so we neglect the vertical component and adopt
horizontal gain modelling. The antenna gain model[27,28] is given by the following equations.
The horizontal (azimuth) gain model is parametrized by the maximum gain G_m in dBi, the horizontal
half-power beam width HPBW_h in degrees, and a front-to-back ratio FBR_h in dB, which are combined
in the horizontal antenna gain expression[27],

$$G_{Ah}(\alpha) = -\min\left(12\left(\frac{\alpha}{HPBW_h}\right)^2, FBR_h\right) + G_m, \quad -180^\circ \le \alpha \le 180^\circ.$$

The antenna diagram can also be described by a trigonometric model[39,47], shown in the equation
below.

$$G(\alpha, \theta) = \sum_{n=0}^{N} \eta_n \cos(n(\alpha - \theta))$$

In the above equation, α is the angle variable and θ is the main beam direction. The antennas used
in different base stations may vary, and thus have different characteristics depending on the
following field installation parameters.

1. The main beam angle, i.e. the azimuth direction

2. The maximum gain, which is achieved in the azimuth direction

3. The frequency of the cell (antenna)

With dual-beam positioning, the main distinguishing characteristic between the antenna cells is
the main beam angle. This is the main serving angle of each serving cell.

Hence, when modelling an LTE base station in MATLAB, the above equation makes it possible to
verify that each antenna cell has its maximum horizontal gain in its main beam direction.
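The horizontal gain expression above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the default parameter values (18 dBi maximum gain, 65 degree beam width, 30 dB front-to-back ratio) are assumptions chosen for the example, not values taken from this thesis, and the angle passed to the gain expression is assumed to be measured relative to the antenna main beam direction.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def horizontal_gain_db(dod_deg, azimuth_deg, g_max=18.0, hpbw=65.0, fbr=30.0):
    """Horizontal antenna gain in dB for a given direction of departure.

    dod_deg     : direction of departure, degrees
    azimuth_deg : main beam direction of the antenna, degrees
    g_max, hpbw, fbr : illustrative (assumed) antenna parameters
    """
    # Angle relative to the main beam direction, wrapped to [-180, 180).
    alpha = wrap_deg(dod_deg - azimuth_deg)
    # Parabolic main lobe, limited by the front-to-back ratio.
    return -min(12.0 * (alpha / hpbw) ** 2, fbr) + g_max


boresight = horizontal_gain_db(0.0, 0.0)    # maximum gain in the main beam
half_power = horizontal_gain_db(32.5, 0.0)  # 3 dB down at half the beam width
backlobe = horizontal_gain_db(180.0, 0.0)   # limited by the front-to-back ratio
```

With these assumed parameters the gain is highest in the main beam direction, drops by 3 dB at half the beam width, and saturates at the maximum gain minus the front-to-back ratio behind the antenna.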

3.2.1 LS Method

The User Equipment can be made to measure the reference signal received power (RSRP) from both
serving and non-serving cells.

Radio propagation: The base station transmits a reference signal associated with cell c, and the
received power is modelled as a sum of four terms. The first term is the power used to transmit
the signal, P_trans. The signal is attenuated by the feeder loss, P_floss, and by the propagation
path loss, P_ploss. The fourth term is the antenna gain.

Thus the received signal strength at the UE from cell c can be modelled as

$$P_{rec}^{c} = P_{trans}^{c} - P_{floss}^{c} + G_{Ah}^{c} + P_{ploss}^{c}$$

The antenna gain is modelled by its horizontal component, as described in the antenna modelling
section, leading to the expression G(α, θ), which describes the horizontal gain of the antenna at
the DoD angle α for an antenna azimuth angle θ.

The key idea of the least squares algorithm is to compare the power of two received signals that
originate from different antennas at the same base station.

For example, let 1 and 2 be two physical cells present in the cell trace. The reference signal
received power from each cell is modelled by

$$P_{rec1} = P_{trans1} - P_{floss1} + G_1(\alpha, \theta_1) + P_{ploss1} \quad (1)$$

$$P_{rec2} = P_{trans2} - P_{floss2} + G_2(\alpha, \theta_2) + P_{ploss2} \quad (2)$$

Subtracting (2) from (1),

$$P_{rec1} - P_{rec2} = (P_{trans1} - P_{trans2}) - (P_{floss1} - P_{floss2}) + G_1(\alpha, \theta_1) - G_2(\alpha, \theta_2) + (P_{ploss1} - P_{ploss2})$$

The following assumptions are made about the above equation for positioning purposes:

1. The transmitted power at the two antennas is assumed to be the same, so that these terms
cancel out.

2. Since both signals originate from cells at the same base station, they have travelled
approximately the same distance from the site. Hence the path loss and the feeder loss are assumed
to be the same for both signals, and the P_ploss and P_floss terms cancel.

$$P_{diff} = P_{rec1} - P_{rec2}$$

$$t_{12} = P_{trans1} - P_{trans2}$$

$$P_{diff} - t_{12} = G_1(\alpha, \theta_1) - G_2(\alpha, \theta_2)$$

The difference t_12 is usually zero, since the transmitted power from the same BS is usually the
same, but it might be non-zero in some special cases.

Now the RSS difference is directly related to the difference in antenna gains. The gain of each
antenna depends on the angle, and the function relating the gain to the angle is known as the
antenna gain diagram.

$$H(\alpha, \theta_1, \theta_2) = G_1(\alpha, \theta_1) - G_2(\alpha, \theta_2)$$

Thereby, the difference between the reference signal received powers is equal to the difference in
horizontal antenna gain. This relative antenna gain difference can be determined from the antenna
gain information together with the antenna branch configuration (the antenna main beam directions).

$$\hat{\alpha} = \arg\min_{\alpha} \left| P_{diff} - H(\alpha, \theta_1, \theta_2) \right|$$

One way to estimate the angle α is to use a Least Squares (LS) estimator. The LS estimator computes
the angle α that minimizes the squared error between the measured data (the RSRP difference) and
the function (the horizontal gain difference) evaluated using values from the radiation patterns.
The main beam angles of the two cells are used to filter out the exact angle[30].
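A minimal sketch of such an estimator is given below, assuming the parabolic horizontal gain expression from the antenna modelling section and a simple grid search over candidate angles. The antenna parameters, the grid step and all function names are illustrative assumptions. The search is restricted to the sector between the two main beam directions, which is one possible reading of how the main beam angles are used to resolve the ambiguity.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def gain_db(alpha_deg, azimuth_deg, g_max=18.0, hpbw=65.0, fbr=30.0):
    """Assumed horizontal antenna gain in dB (illustrative parameters)."""
    rel = wrap_deg(alpha_deg - azimuth_deg)
    return -min(12.0 * (rel / hpbw) ** 2, fbr) + g_max


def gain_difference(alpha_deg, az1, az2):
    """H(alpha, theta1, theta2) = G1(alpha, theta1) - G2(alpha, theta2)."""
    return gain_db(alpha_deg, az1) - gain_db(alpha_deg, az2)


def ls_dod(p_diff, az1, az2, step=0.5):
    """Grid-search LS estimate of the DoD between two main beam directions.

    p_diff   : measured RSRP difference P_rec1 - P_rec2 in dB
    az1, az2 : main beam directions in degrees, with az1 < az2
    """
    n = int(round((az2 - az1) / step))
    candidates = [az1 + k * step for k in range(n + 1)]
    # Pick the candidate angle with the smallest squared residual.
    return min(candidates,
               key=lambda a: (p_diff - gain_difference(a, az1, az2)) ** 2)


# Synthetic check: generate a noise-free RSRP difference for a known angle.
true_dod = 50.0
p_diff = gain_difference(true_dod, 0.0, 120.0)
estimate = ls_dod(p_diff, 0.0, 120.0)
```

In this noise-free synthetic case the estimate coincides with the true angle. With measurement noise, the residual curve flattens where both antennas are limited by the front-to-back ratio, which is where the angle variance is expected to be largest.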

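CRLB for angle variance: The Cramér-Rao lower bound (CRLB) states the lowest possible variance for
an unbiased estimator, and it is derived for the differential antenna diagram[30].

The differential antenna function H(α, θ1, θ2) is given by the following equation.

$$H(\alpha, \theta_1, \theta_2) = \sum_{n=0}^{N} \eta_n \left[ \cos(n(\alpha - \theta_1)) - \cos(n(\alpha - \theta_2)) \right]$$

The derivative of the differential function with respect to α, which is used to describe the AoA
variance, is given by the following equation.

$$\frac{\partial H(\alpha, \theta_1, \theta_2)}{\partial \alpha} = -\sum_{n=0}^{N} n\,\eta_n \left[ \sin(n(\alpha - \theta_1)) - \sin(n(\alpha - \theta_2)) \right]$$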
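As a hedged illustration of how this derivative enters the bound, assume a single RSRP-difference measurement $P_{diff} = H(\alpha, \theta_1, \theta_2) + e$ with zero-mean Gaussian noise $e$ of variance $\sigma_P^2$. This noise model is an assumption made here for illustration; the exact model used in [30] is not reproduced. Under that assumption the standard scalar CRLB reads

$$\mathrm{Var}(\hat{\alpha}) \ge \frac{\sigma_P^2}{\left( \partial H(\alpha, \theta_1, \theta_2) / \partial \alpha \right)^2}$$

so the bound is small where the gain difference changes quickly with the angle, and grows without limit where the differential antenna diagram is flat.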

3.2.2 MUSIC Method

Another technique to obtain DoD estimates is based on the Multiple Signal Classification (MUSIC)
algorithm. The MUSIC[32] algorithm produces a "spectrum" P(θ) that exhibits peaks for angles θ close
to the true AoA of the incoming signals. This spectrum is computed with the following steps:

1. Data collection:
A sequence of RSRP values on the three antenna faces is collected by exchanging radio messages.

2. Covariance estimation:
The spatial covariance matrix R is estimated from the incoming sequence of RSRP values in the cell
trace.

3. Singular value decomposition:
The matrix R is decomposed using singular value decomposition (SVD). This separates the signal
component from the noise. Assuming a single signal source (the target node), one of the
eigenvectors is related to the target messages, while the other two are related to the noise.

4. Projection:
The steering vector G(θ) is projected onto the subspace spanned by the noise eigenvectors.

$$G(\theta) = [G_1(\theta), G_2(\theta), G_3(\theta)]^T$$

where G1, G2, G3 are the antenna gains of the cells (antennas) identified at the same base station.

$$P(\theta) = \frac{G(\theta)^H G(\theta)}{G(\theta)^H \Pi\, G(\theta)}$$

where Π is the projection onto the noise subspace, formed from the noise eigenvectors. Thus the
true signal is separated from the noise component through the singular value decomposition of the
spatial covariance matrix.

5. The MUSIC spectrum is computed and its peak is evaluated to obtain the DoD. Combining the
preceding steps, we obtain the DoD (Direction of Departure). The measurement standardized for LTE
is the estimated angle of the UE with geographical north as the reference direction, positive in
the clockwise direction.
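The five steps above can be sketched in plain Python as follows. This is a simplified illustration and not the implementation used in this work. It assumes three sector antennas at azimuths 0, 120 and 240 degrees with the parabolic gain pattern and illustrative parameters, it uses synthetic amplitude samples in place of real RSRP sequences, and it replaces the full SVD by power iteration. For a single source in three dimensions the noise subspace is the orthogonal complement of the dominant eigenvector u, so the noise projection can be written as Π = I − u uᵀ. All names are hypothetical.

```python
import math
import random

AZIMUTHS = (0.0, 120.0, 240.0)  # assumed main beam directions of the 3 cells


def wrap_deg(angle):
    return (angle + 180.0) % 360.0 - 180.0


def gain_lin(theta, azimuth, g_max=18.0, hpbw=65.0, fbr=30.0):
    """Linear amplitude gain of one sector antenna (assumed pattern)."""
    rel = wrap_deg(theta - azimuth)
    gain_db = -min(12.0 * (rel / hpbw) ** 2, fbr) + g_max
    return 10.0 ** (gain_db / 20.0)


def steering(theta):
    """Steering vector G(theta) = [G1, G2, G3]."""
    return [gain_lin(theta, az) for az in AZIMUTHS]


def covariance(samples):
    """Step 2: sample spatial covariance matrix R (3 x 3)."""
    k = len(samples)
    return [[sum(s[i] * s[j] for s in samples) / k for j in range(3)]
            for i in range(3)]


def dominant_eigenvector(r, iterations=200):
    """Step 3 (simplified): signal eigenvector of R by power iteration."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def music_spectrum(theta, u):
    """Step 4: P(theta) = G^T G / (G^T Pi G) with Pi = I - u u^T."""
    g = steering(theta)
    gg = sum(x * x for x in g)
    proj = sum(x * y for x, y in zip(g, u))
    return gg / max(gg - proj * proj, 1e-12)


def music_dod(samples, step=1.0):
    """Step 5: the DoD estimate is the angle of the spectrum peak."""
    u = dominant_eigenvector(covariance(samples))
    grid = [k * step for k in range(int(360 / step))]
    return max(grid, key=lambda th: music_spectrum(th, u))


# Step 1 (synthetic): one source at 50 degrees plus small sensor noise.
random.seed(1)
true_dod = 50.0
g_true = steering(true_dod)
samples = []
for _ in range(200):
    s = random.gauss(0.0, 1.0)
    samples.append([s * g + random.gauss(0.0, 0.05) for g in g_true])

estimate = music_dod(samples)
```

The spectrum peaks where the steering vector is almost orthogonal to the noise subspace, i.e. almost parallel to the signal eigenvector. Close to a main beam direction two of the three gains are limited by the front-to-back ratio, so the steering vector changes little with the angle and the peak becomes less sharp there.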

AoA is defined as the incoming angle at which the receiving UE registers a signal transmitted from
the base station.

DoD denotes the inverse relation: it is the angle at which the transmitting UE sends its signal.

The difference between DoD and AoA is just a 180 degree shift. To obtain the AoA from cell trace
measurements of RSRP we use the following rules:

if (one RSRP measurement)
    AoA = main beam direction of the serving cell.
    AoA variance = symmetric angle difference between the main beam directions, divided by 2.

if (two RSRP measurements)
    AoA = Least Squares estimate, together with the main beam directions.
    AoA variance = angle variance given by the CRLB model.

if (three RSRP measurements)
    AoA = given by Multiple Signal Classification (MUSIC).

3.3 Need of Filters

Non Line Of Sight (NLoS): NLoS refers to the presence of an obstacle between the sender and the UE.
NLoS situations are quite complex and can result in multipath signals as well as attenuation of
the signal.

Measurement error: Measurement error can be introduced by the equipment that measures or sends the
radio signal. For example, a serving cell antenna with only one RSRP value can introduce a high
AoA variance. Moreover, if measurement data such as AoA and TA reported by the network is used
directly to produce the output trajectory, the errors it contains will be reflected in the
trajectory behaviour. Noise and measurement errors therefore produce an irregular trajectory.

Signal interference with massive UE: When a positioning application for massive UE is run in real
time, there will be a lot of radio signal interference.

In real-world applications, the data obtained from sensors may not directly contain the information
that is wanted or needed. Instead, the unknown signal quantities of interest must be estimated from
the measurements. Filters can be used to do this. In the simplest cases, linear filters can be used
to estimate certain frequency information. However, to solve more complex problems, model-based
filtering such as a Gaussian filter is necessary to obtain good results.

3.3.1 Probability

The probability that a random variable X equals a value x is denoted p(X = x) = p(x). The values
of P(X) range from 0 to 1.

$$0 \le P(X) \le 1$$

Total Probability

The total probability theorem says that the probability of event X is the sum, over all possible
events Y, of the probability of X given Y weighted by the probability of Y. This theorem can be
expressed by the following equation:

$$P(X) = \sum_{Y} P(X|Y)\, P(Y)$$

Joint Probability

Joint probability is the probability of occurrence of one random variable x together with another random

variable y. Joint probability is denoted as p(x, y)

Conditional Probability

Conditional probability is defined as the probability of a random variable x, given the already
known value of another random variable y. When we already know that one event has occurred and
want to know the probability of a different event occurring, this is called conditional
probability. The probability that event A will occur, given that event B has already occurred, is
denoted by P(A|B).

$$P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$$

Probability density function

In probability theory, a probability density function(PDF), or density of a continuous random variable,

is a function that describes the relative likelihood for this random variable to take on a given value.

Filter with Gaussian distribution

Gaussian noise is a statistical noise having a probability density function(PDF) equal to that of the

normal distribution, which is also known as the Gaussian distribution. In other words, the values that

the noise can take on are Gaussian-distributed as shown in figure 3.3.

Following equation represents a Gaussian distribution:

Figure 3.3: Gaussian distribution[45]

$$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where µ represents the mean and σ² the variance of the distribution.

The shape of the Gaussian distribution, as shown in figure 3.3, is completely characterized by the
population mean and the population variance. From the figure we can also make the following
observations.

Firstly, the distribution is symmetric about the mean, and the relative frequency of samples is
highest at the mean. Secondly, about two thirds of all samples fall within one standard deviation
of the mean. Thirdly, about 95 percent of samples lie within two standard deviations of the mean.

Figure 3.4 presents a simple filtering mechanism with Gaussian distributions. If a system that
reports location measurements, as an angle and a distance from the origin, has Gaussian noise,
then the measurement can be represented by a Gaussian distribution. Consider an object in
one-dimensional motion along the X axis. If we define the motion model of the system assuming
constant velocity and Gaussian noise in the estimate, we can draw a Gaussian distribution with
mean Z1 and variance σ1². The blue curve represents this distribution, P(Z1|x). Similarly, the
Gaussian distribution of the measurement, P(Z2|x), can be represented by the red curve with mean
Z2 and variance σ2². Intuitively, we can see that there is a high probability that the object lies
between the two mean values given by the estimate and the measurement. By multiplying the two
Gaussian distributions we get another Gaussian distribution, L(x), shown by the black curve, with
a mean between the two means and a variance σL² which is smaller than both σ1² and σ2².

Figure 3.4: Filter with Gaussian distribution[46]
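The fusion illustrated in the figure can be checked numerically. The product of two Gaussian densities is proportional to another Gaussian, whose mean is the variance-weighted average of the two means and whose variance is smaller than either input variance. The sketch below uses this standard result; the function name and the example numbers are illustrative assumptions.

```python
def fuse_gaussians(z1, var1, z2, var2):
    """Mean and variance of the (normalized) product of two Gaussians.

    z1, var1 : mean and variance of the predicted estimate
    z2, var2 : mean and variance of the measurement
    """
    var_l = var1 * var2 / (var1 + var2)
    # Each mean is weighted by the variance of the *other* distribution,
    # so the more certain source pulls the result towards itself.
    mean_l = (var2 * z1 + var1 * z2) / (var1 + var2)
    return mean_l, var_l


# Prediction at 10 with variance 4, measurement at 14 with variance 1.
mean_l, var_l = fuse_gaussians(10.0, 4.0, 14.0, 1.0)
```

In this example the fused mean lies between the two input means, closer to the more certain measurement, and the fused variance is below both input variances.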

Posterior Probability density function

A posterior probability is the probability of assigning observations to groups given the data. The
posterior probability is the probability of the parameters θ given the evidence X, p(θ|X). The
likelihood function is the probability of the evidence given the parameters, p(X|θ).

Prior Probability density function

A prior probability is the probability that an observation will fall into a particular group
before we collect the data. Let us define the relation between the likelihood function and the
posterior probability. When we have a prior belief described by the probability distribution
function p(θ), and observations x with likelihood p(x|θ), then the posterior probability is
defined as:

$$p(\theta|x) = \frac{p(x|\theta)\, p(\theta)}{p(x)}$$

3.3.2 Bayes Theorem

Bayes' law[33,43] is a direct application of conditional probability. The posterior probability
distribution of one random variable, given the value of another, can be calculated with Bayes'
theorem by multiplying the prior probability distribution by the likelihood function and then
dividing by the normalizing constant. This rule is used in statistical filtering to construct the
posterior probability density function (PDF) of the state vector. The state vector is the vector
that contains the quantities being filtered.

3.3.3 Dynamic Estimation

The dynamic estimation problem involves two mathematical models, namely the state dynamics and the
measurement equation[34,35]. The state dynamics describe how the state vector evolves with time.
The state dynamics equation can be assumed to have the form shown below,

$$x_{t+1} = f(x_t, v_t)$$

Here $x_{t+1}$ is the state vector to be estimated at time t + 1, expressed as a function of the
current state vector $x_t$ and white noise $v_t$, which is also referred to as process or system
noise. In the above equation, f is a known function that may be linear or non-linear. In the
special case when f is linear and $v_t$ is Gaussian, the transition density $P(x_{t+1}|x_t)$ is
also Gaussian. In the measurement equation, the received measurements are modelled as a
measurement function that depends on the state vector and the measurement noise.

$$z_{t+1} = h(x_{t+1}, w_{t+1})$$

In the above equation $z_{t+1}$ is the vector of measurements received at time step t + 1, h
relates the state vector to the measurement vector, and $w_{t+1}$ is the measurement noise. The PDF
of $w_{t+1}$ is assumed to be known, and the two noises $v_t$ and $w_{t+1}$ are mutually
independent. Thus, an equivalent probabilistic model for the above equation is the conditional PDF
$p(z_{t+1}|x_{t+1})$. In the special case when h is linear and $w_{t+1}$ is Gaussian,
$P(z_{t+1}|x_{t+1})$ is also Gaussian.

To start the dynamic estimation problem, initial conditions are also required. The initial
condition, which holds before any measurement has been received, is the prior PDF of the state
vector at time step t = 0, written $P(x_0)$. The dynamic estimation problem is thus completely
defined by the following three probabilistic descriptions,

$$P(x_0), \quad P(x_{t+1}|x_t), \quad P(z_{t+1}|x_{t+1})$$

3.3.4 Linear and Non-linear problems

If the functions f and h governing the state dynamics and the measurement equation, respectively,
are linear, then the problem is linear. The state dynamics[36,37,38] are then represented by the
equation below.

$$x_{t+1} = A x_t + v_t$$

The measurement equation can be modelled by the equation below.

$$z_{t+1} = B x_{t+1} + w_{t+1}$$

Thus, for linear systems, the Kalman Filter (KF) can be applied for filtering. It is well known
that the KF performs well for linear systems with Gaussian noise. For non-linear systems, the KF is
modified by linearizing with Jacobians, which leads to the Extended Kalman Filter (EKF) and its
variants. However, as the non-linearity and the non-Gaussianity of the noise increase, the EKF
departs severely from the linear Gaussian situation. This leads to filter divergence, which shows
up as estimation errors substantially larger than indicated by the filter's internal covariance.
For such non-linear problems, a particle filter may be an option, since it can handle
non-linearity.

3.3.5 Formal Bayesian Filter

This section describes the Bayes filter and how it is derived. The recursive equations of the
Bayes filter[35,43] are the solution to any given filtering problem, linear as well as non-linear.
However, in most cases the recursion cannot be solved analytically for a non-linear problem. As
mentioned, in statistical filtering the focus is on the posterior PDF, which is interpreted as the
probability of the state x given all measurements from the first time step up to time t. This PDF
is also known as the filtering density.

In the Bayesian approach[36,37,38] one attempts to construct the posterior PDF of the state vector
$x_t$ given all the available information. This posterior PDF at time step t is written
$p(x_t|Z_t)$, where $Z_t$ denotes the set of all measurements received up to and including $z_t$.
$Z_t$ can be represented as:

$$Z_t = \{z_0, z_1, \ldots, z_t\}$$

The formal Bayesian recursive filter consists of a prediction and an update operation. The
prediction operation propagates the posterior PDF of the state vector from the current time step to
the next time step. The propagated PDF is called the prior PDF of the state vector,
$P(x_t|Z_{t-1})$, and is obtained via the dynamics model (the transition density). This prior PDF
can be calculated as follows:

$$p(x_t|Z_{t-1}) = \int p(x_t|x_{t-1})\, p(x_{t-1}|Z_{t-1})\, dx_{t-1}$$

The prior PDF at time step t is then updated to incorporate the new measurement $z_t$ received at
time step t. This update gives the posterior PDF of the state vector at time step t,
$P(x_t|Z_t)$, defined by the following equation.

$$p(x_t|Z_t) = \frac{p(z_t|x_t)\, p(x_t|Z_{t-1})}{p(z_t|Z_{t-1})}$$

The above equation is Bayes' rule: posterior PDF = (likelihood × prior PDF) / normalizing
denominator. The denominator is a normalizing factor that ensures that the PDF integrates to one.
The following equation defines the normalization.

$$p(z_t|Z_{t-1}) = \int p(z_t|x_t)\, p(x_t|Z_{t-1})\, dx_t$$
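To make the recursion concrete, the sketch below applies the prediction and update equations on a small discrete state space, where the integrals become sums. The three-cell world, the transition probabilities and the likelihood values are made-up numbers for illustration only.

```python
def predict(belief, transition):
    """Prediction: p(x_t | Z_{t-1}) = sum_j p(x_t | x_{t-1}=j) p(x_{t-1}=j | Z_{t-1}).

    transition[j][i] is the probability of moving from state j to state i.
    """
    n = len(belief)
    return [sum(transition[j][i] * belief[j] for j in range(n))
            for i in range(n)]


def update(prior, likelihood):
    """Update: posterior = likelihood * prior / normalizing constant."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # discrete version of p(z_t | Z_{t-1})
    return [u / evidence for u in unnormalized]


# Three cells on a ring; the object moves one cell forward with
# probability 0.8 and stays where it is with probability 0.2.
transition = [
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.8, 0.0, 0.2],
]
belief = [1.0, 0.0, 0.0]           # posterior at t-1: surely in cell 0
prior = predict(belief, transition)

likelihood = [0.1, 0.6, 0.3]       # assumed p(z_t | x_t) for each cell
posterior = update(prior, likelihood)
```

Here the prior after prediction is [0.2, 0.8, 0.0] and the measurement sharpens it to a posterior of [0.04, 0.96, 0.0]. The particle filter described later approximates the same two steps with samples instead of a fixed grid.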

3.3.6 State Space Model for Position Estimation

A state space model is employed to model the behaviour of the UE. With cell trace communication
over the digital unit server, it is possible to cover all the User Equipment present within the
base station, and their positions can be modelled.

A User Equipment can be positioned in the network by its state vector. The state vector used here
consists of four states, which represent the position and the velocity in both the horizontal and
the vertical direction.

$$x_t = \begin{pmatrix} p_x \\ p_y \\ v_x \\ v_y \end{pmatrix}$$

A particle represents a belief of a state. A particle filter[39,47] consists of a set of a known
number of particles. Each particle is associated with a state vector and a weight. While the
particle (or sample) represents a concrete belief of the state, in the form of a state vector, the
weight W indicates the certainty that this belief is true. In statistical terms, the weight
represents the probability of the particle.

Sampling in the particle filter: sampling is a way to draw a number of samples (particles)
according to a given probability.

To be able to estimate these states, a motion model and a sensor model are used.

• Motion Model - A motion model[35, 41, 44] describes the motion of the UE between the current
state $x_t$ and the future state $x_{t+1}$. The one used in this work is called the white-noise
acceleration model. It assumes a constant-velocity motion perturbed by a random acceleration with
standard deviation σ.

$$x_{t+1} = A x_t + B v_t$$

where T is the time between two steps and

$$A = \begin{pmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} T^2/2 & 0 \\ 0 & T^2/2 \\ T & 0 \\ 0 & T \end{pmatrix}$$

and $v_t$ is a Gaussian noise vector with mean 0 and variance σ², which can be represented as:

$$v_t = Q \cdot R$$

where

$$Q = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}, \quad R = \begin{pmatrix} r_x \\ r_y \end{pmatrix}$$

and $r_x$ and $r_y$ are random values drawn from a normal distribution.
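One step of this motion model can be sketched as below, with the matrix products of A and B written out per component. The function name and the tuple layout of the state are assumptions for illustration. The acceleration noise is drawn directly with standard deviation sigma, which is an assumed reading of the noise scaling in the model.

```python
import random


def motion_step(state, T, sigma, rng=random):
    """Propagate one particle with the white-noise acceleration model.

    state : (px, py, vx, vy)
    T     : time between two steps
    sigma : standard deviation of the random acceleration (assumed scaling)
    """
    px, py, vx, vy = state
    # Random acceleration v_t = (ax, ay), zero mean.
    ax = rng.gauss(0.0, sigma)
    ay = rng.gauss(0.0, sigma)
    # x_{t+1} = A x_t + B v_t, written out row by row.
    return (
        px + T * vx + 0.5 * T * T * ax,
        py + T * vy + 0.5 * T * T * ay,
        vx + T * ax,
        vy + T * ay,
    )


# Without noise the model reduces to constant-velocity motion.
noise_free = motion_step((0.0, 0.0, 1.0, 2.0), T=2.0, sigma=0.0)

# With noise, each particle receives its own random acceleration.
random.seed(0)
noisy = motion_step((0.0, 0.0, 1.0, 2.0), T=2.0, sigma=0.5)
```

Applying this function to every particle implements the prediction phase: the spread of the particles after the step is governed by sigma and by the time step T.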

• Sensor Model - The sensor model describes the relation between the states and the measurements.
The measurements $y_t$ at each time instance are the DoDs and the TA. In the state vector, only
the position states $x_t^{1:2}$ are related to the DoD and TA measurements.

The DoD describes the angle between the UE position and the coordinates of the serving base
station site. The DoDs that are estimated from the RSRP measurements are modelled as

$$DoD = \arctan(p_y, p_x) = \arctan(x_t^{1:2})$$

The TA measurement describes the distance between the UE and the serving base station site.

$$TA = \sqrt{p_x^2 + p_y^2} = |x_t^{1:2}|$$

3.3.7 Particle Filter

The particle filter uses particles as samples to define the probability density function (PDF). As
we have seen, a state vector is associated with every particle, and this state vector contains the
states that are continuously filtered and evolved through different phases. In this way the
particles are used to define the prior and posterior PDFs.

• Initialization - At the beginning, N particles are created, where N is the number of particles
used in the algorithm. A state vector is attached to every particle, and the initial states are
denoted $x_0^i$, where i varies from 1 to N. A random distribution is used as the initial
distribution, spreading the particles around a centre point, and all weights $w_0^i$ are assigned
the uniform value 1/N. Hence the initial distribution gives all particles the same probability.

• State Prediction or Time Update - In this phase a prediction of the state estimate is made, and a
trajectory is simulated from one time step to the next using the dynamic motion model. Each
particle from the previous time step (t−1) is moved according to the motion model, which gives an
estimate of the new state vector of every particle for the next time step, called the state
estimate. The result is also a set of samples from the prior PDF. Hence the motion model moves each
particle according to a prediction of where the target UE will be at the next time step. The
acceleration of the particles is limited by the standard deviation of the motion model, so the
particles move a realistic distance between time steps.

• Measurement Update - In the measurement update, the weights of the particles are computed using
the sensor model. This phase allows the particles to be updated based on the measurements: the
prior samples (particles) from the prediction phase are assigned new weights. The measurements are
predicted for each particle, and the error with respect to the received measurement is used to
calculate the likelihood, as expressed by the equation below.

$$w_t^i = P(z_t|x_t^i)$$

Figure 3.5: Likelihood evaluation with state x=3 in left side and state x=12 in right side[47]

For each time step, measurements are predicted for each particle and evaluated against the real
measurements. If a particle's predicted measurements are close to the real measurements, the
particle gets a high importance weight. If they are not close enough, the particle receives a low
importance weight. The weight thus varies with the distance between the predicted and the real
measurements.

Example of Gaussian likelihood evaluation[47]: Assume there is a model which describes the relation
between the measurement y and the state x,

$$y = 1 \cdot x + e$$

$$e \sim N(1, 3^2)$$

If the measurement is y = 6 and the state is x = 3, the likelihood is

$$p_e(y - x) = p_e(3) = \frac{1}{\sqrt{2\pi 3^2}} \exp\left(-\frac{(3-1)^2}{2 \cdot 3^2}\right) = 0.1065$$

If instead the state is x = 12, with the same measurement y = 6, the likelihood is

$$p_e(y - x) = p_e(-6) = \frac{1}{\sqrt{2\pi 3^2}} \exp\left(-\frac{(-6-1)^2}{2 \cdot 3^2}\right) = 0.0087$$

So figure 3.5 shows that the likelihood decreases sharply as the difference between the state x and
the measurement y grows. The measurement update algorithm, derived from the sensor model, is
presented in Algorithm 1.
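The two likelihood values of the example can be reproduced with a few lines of Python. The helper names are illustrative; the noise mean 1 and standard deviation 3 are the values used in the exponent and normalization of the example.

```python
import math


def gaussian_pdf(e, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    return (math.exp(-((e - mean) ** 2) / (2.0 * std ** 2))
            / math.sqrt(2.0 * math.pi * std ** 2))


def likelihood(y, x, noise_mean=1.0, noise_std=3.0):
    """Likelihood of measurement y for state x under y = x + e."""
    return gaussian_pdf(y - x, noise_mean, noise_std)


near = likelihood(6.0, 3.0)   # error e = 3, i.e. 2 away from the noise mean
far = likelihood(6.0, 12.0)   # error e = -6, i.e. 7 away from the noise mean

# An error of +8 lies equally far (7) from the noise mean, so it gives
# exactly the same density as an error of -6.
same_distance = gaussian_pdf(8.0, 1.0, 3.0)
```

Rounded to four decimals, `near` is 0.1065 and `far` is 0.0087: the particle whose predicted measurement is closer to the received one gets a weight that is roughly twelve times larger.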

• Normalization - Since the particles approximate a probability density function, their weights
need to sum to one. The new weight assigned to each particle in the update phase is therefore
normalized by dividing it by the sum of the weights of all particles. The normalization of the
importance weights is given by the following equation.

$$w_t^i = \frac{w_t^i}{\sum_{j=1}^{N} w_t^j}$$

• Re-sampling - In the re-sampling phase, the current set of particles represents the filtering
density, and a new set of particles is drawn with replacement from this distribution according to
the importance weights of the current particles. Re-sampling produces a new set of particles that
reflects the old particles and their weights. Hence there is a higher probability of sampling a
particle with a high weight than a particle with a low weight. The set of N new particles will
contain many copies of the particles with high importance weight from the previous set, and few or
none of those with low importance weight. The multinomial re-sampling algorithm with Ripley's
method is presented in Algorithm 2.

So, after the initialization phase, which happens only once, the four phases of state prediction
(time update), measurement update, normalization and re-sampling form a cycle that is repeated for
every measurement.

3.4 Related Work

This project depends on several functional components with real-time modelling and implementation,
namely antenna modelling, AoA estimation and filtering. A lot of work has been done in the field of
UE positioning in wireless networks. The FCC's E911 requirements in the USA[40] state that a UE
position should be estimated with a defined accuracy in case of emergency. These requirements led
to several works on UE positioning techniques. Gunnarsson, Lindsten and Carlsson[39] contributed an
offline trajectory estimation method from the UE side. They use different filters and compare the
resulting trajectory performance. Another related work, on antenna modelling methodology, is by
Gunnarsson, Johansson, Furuskar, Lundevall, Simonsson, Tidestav and Blomgren[27]. They propose an
antenna model with downtilt, which includes a vertical gain pattern along with the horizontal gain
pattern described by the 3GPP specifications[28,29]. They describe antenna gain modelling in terms
of both the horizontal and the vertical gain pattern. However, only horizontal gain pattern
modelling is used in this thesis. AoA estimation is an important component; with the MUSIC
algorithm it requires three RSRP values, and gives a unique AoA with good separation of the signal
from the noise. A lot of work has been done with different off-line algorithms[32]. However, this
thesis required on-line computation of the AoA with real-time parameters[31]. This project uses a
particle filter for filtering. A lot of research has been done in this field. Gustafsson[35] has
discussed the use of particle filters for positioning. He also discusses various bottlenecks
involved in resampling methods, and the dynamic motion model in his book on sensor fusion.

Algorithm 1: Measurement Update Algorithm

for i = 1, ..., N do
    Predict the TA and DoD measurements for the particle,
    $$y_t = h(x_t^{1:2})_{TA}$$
    $$y_t^p = h(x_t^{1:2})_{DoD}, \quad p = 1 \ldots K$$
    Calculate the likelihood $w_{TA}$,
    $$w_{TA} = \frac{1}{\sqrt{2\pi\sigma_{TA}^2}} \exp\left(-\frac{(TA_s - y_t)^2}{2\sigma_{TA}^2}\right)$$
    Calculate the likelihood $w_{DoD}^p$ for p = 1...K,
    $$w_{DoD}^p = \exp\left(-\frac{(y_t^p - s^p)^2}{2\sigma_{DoD}^2}\right) + \exp\left(-\frac{(y_t^p - 360 - s^p)^2}{2\sigma_{DoD}^2}\right) + \exp\left(-\frac{(y_t^p + 360 - s^p)^2}{2\sigma_{DoD}^2}\right)$$
    Calculate the total likelihood
    $$w_t^i = w_{TA} \times w_{DoD}^1 \times \ldots \times w_{DoD}^K$$
end
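Algorithm 1 (the measurement update) can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not taken from this thesis: the serving site is placed at the origin, all K DoD measurements refer to that single site, angles are expressed in degrees, and the function names and noise values are hypothetical. The three exponential terms handle the 360 degree wrap-around of the angle, as in the algorithm.

```python
import math


def predict_measurements(particle):
    """Sensor model: predicted TA (distance) and DoD (degrees) of a particle."""
    px, py = particle[0], particle[1]
    ta = math.hypot(px, py)
    dod = math.degrees(math.atan2(py, px))
    return ta, dod


def measurement_weight(particle, ta_meas, dod_meas, sigma_ta, sigma_dod):
    """Total likelihood w = w_TA * w_DoD^1 * ... * w_DoD^K for one particle.

    dod_meas is a list with the K measured DoD values in degrees.
    """
    ta_pred, dod_pred = predict_measurements(particle)
    # Gaussian likelihood of the TA (distance) measurement.
    w = (math.exp(-((ta_meas - ta_pred) ** 2) / (2.0 * sigma_ta ** 2))
         / math.sqrt(2.0 * math.pi * sigma_ta ** 2))
    # Wrapped Gaussian likelihood of each DoD measurement.
    for s in dod_meas:
        w *= sum(
            math.exp(-((dod_pred + shift - s) ** 2) / (2.0 * sigma_dod ** 2))
            for shift in (0.0, -360.0, 360.0)
        )
    return w


# A particle that matches the measurement and one that is 90 degrees off.
w_match = measurement_weight((100.0, 0.0, 0.0, 0.0), 100.0, [0.0], 10.0, 5.0)
w_off = measurement_weight((0.0, 100.0, 0.0, 0.0), 100.0, [0.0], 10.0, 5.0)

# Wrap-around: a particle at -179 degrees is only 2 degrees from +179.
angle = math.radians(-179.0)
wrapped = (100.0 * math.cos(angle), 100.0 * math.sin(angle), 0.0, 0.0)
w_wrap = measurement_weight(wrapped, 100.0, [179.0], 10.0, 5.0)
```

The particle whose predicted distance and angle agree with the measurement receives a far larger weight than the one pointing 90 degrees away, and the wrap-around terms keep the weight high for angles on either side of the ±180 degree boundary.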

3.5 Summary

In this chapter, several algorithms used for positioning were discussed. Related work on UE
positioning was discussed at the beginning and at the end of the chapter. However, all of these
authors contributed to off-line UE positioning in LTE[42] with MATLAB-based simulations. In this
project, real-time network data is used for positioning with various real-time positioning
algorithms.

Algorithm 2: Multinomial Re-sampling Algorithm

Calculate the cumulative product of N random numbers and the cumulative sum of the weights of the N particles
wc[0] = weight of the 1st particle
u[0] = (random number between 0 and 1)^(1/N)
for i = 1, ..., N − 1 do
    wc[i] = wc[i − 1] + weight of the (i+1)th particle
    u[i] = u[i − 1] * (random number between 0 and 1)^(1/(N − i))
end
Find the closest value to each u in the Cumulative Distribution Function (CDF) of $w_t^i$ and save
the index, which is used to select N new particles from the old particles according to their weight
k = 0
for i = 0, ..., N − 1 do
    while wc[k] < u[N − i − 1] do
        k = (k + 1) % N
    end
    select the kth particle as a new particle
end
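A Python sketch of the normalization step followed by Algorithm 2 is given below. It generates the ordered uniform numbers with the cumulative product described in the algorithm and then walks through the cumulative weights once. The function names are illustrative. One deliberate deviation is labelled in the code: instead of wrapping the index with a modulo operation, the index is clamped at the last particle, which guards against an endless loop when rounding leaves the cumulative sum marginally below one.

```python
import random


def normalize(weights):
    """Divide every weight by the sum so that the weights sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def ripley_resample(weights, rng=random):
    """Return the indices of N particles drawn with replacement.

    weights must be normalized. The returned list is in ascending order.
    """
    n = len(weights)
    # Cumulative sum of the weights (discrete CDF).
    wc = []
    acc = 0.0
    for w in weights:
        acc += w
        wc.append(acc)
    # Ordered uniform numbers in descending order: u[0] is the largest.
    u = [0.0] * n
    u[0] = rng.random() ** (1.0 / n)
    for i in range(1, n):
        u[i] = u[i - 1] * rng.random() ** (1.0 / (n - i))
    # Walk through the CDF once, starting from the smallest u.
    indices = []
    k = 0
    for i in range(n):
        target = u[n - 1 - i]
        # Clamp at the last particle instead of wrapping (assumed safeguard).
        while k < n - 1 and wc[k] < target:
            k += 1
        indices.append(k)
    return indices


random.seed(0)
# All weight on the third particle: every draw must select index 2.
dominant = ripley_resample(normalize([0.0, 0.0, 5.0, 0.0]))
# Equal weights: any index may appear, possibly more than once.
uniform = ripley_resample(normalize([1.0, 1.0, 1.0, 1.0]))
```

Because the uniform numbers are already sorted, the selection needs only one pass over the cumulative weights, which is the practical benefit of generating them with the cumulative product instead of sorting N independent random numbers.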

Chapter 4

Multicore Technology

In this chapter we look at the past trend of processors, the three walls of VLSI, an introduction
to multicore architectures, and essential knowledge about caches: terminology, memory mapping,
cache misses and coherency. We then discuss the multicore cache memory system in the TCU, known as
the transport network unit. In addition, we discuss the four standard questions of cache memory,
namely block placement, spatial locality access, block replacement and write policy, as they apply
to the private core cache memory in the TCU.

4.1 Trend

During the last decades, computer systems gained performance through the steady increase of
processor clock rates and/or through memory size, bandwidth and speed. In that period, simply
upgrading the hardware to a more powerful system increased performance without any programming
effort. Frequency scaling was the dominant reason for improvements in computer performance from the
mid-1980s until 2004[48]. The runtime of a program is equal to the number of instructions
multiplied by the average time per instruction. With everything else held constant, increasing the
clock frequency decreases the average time it takes to execute an instruction. An increase in
frequency thus decreases the runtime of all compute-bound programs.

4.2 Moore’s Law

Moore's law refers to an observation made by Intel co-founder Gordon Moore in 1965. He noticed
that the number of transistors per square inch on integrated circuits had doubled every year since
their invention. Moore's law[48,49,71] predicts that this trend will continue into the foreseeable
future, as shown in figure 4.1. Although the pace has slowed, the number of transistors per square
inch has since doubled approximately every 18 months.

Figure 4.1: Moore's Law[48,71]

4.2.1 Three Walls Of VLSI

• The power wall[48] - the trend of consuming disproportionately more power with each increase of operating frequency. This increase can be mitigated by using a larger number of smaller cores. Single-core performance is therefore power constrained: because of the physical limitations of semiconductor-based microelectronics and of power dissipation, it has become increasingly difficult to raise processor speed. The power consumption of a chip is given by the following equation:

$P = C \times V^{2} \times f$,

where P is power, C is the capacitance being switched per clock cycle (proportional to the number of transistors whose inputs change), V is the supply voltage (which generally has to rise with the processor frequency) and f is the processor frequency.

• The Instruction Level Parallelism wall[48] - the increasing difficulty of finding enough parallelism in a single instruction stream to keep a high-performance single-core processor busy.

• The memory wall[48] - the increasing gap between processor and memory speeds. This effect pushes cache sizes larger in order to hide the latency of memory, and also calls for more memory channels.
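The power equation given for the power wall can be made concrete with a small sketch. The capacitance, voltage and frequency values below are hypothetical, and the assumption that a lower clock frequency permits a lower supply voltage is a simplification used only to illustrate why several slower cores can be more power efficient than one fast core.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power P = C * V^2 * f, in watts."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical single core: 1 nF switched capacitance, 1.2 V, 2 GHz.
single_core = dynamic_power(1e-9, 1.2, 2.0e9)

# Hypothetical dual core: each core runs at 1 GHz, assumed to allow 0.9 V.
dual_core = 2 * dynamic_power(1e-9, 0.9, 1.0e9)

if __name__ == "__main__":
    print(f"single core at 2 GHz: {single_core:.2f} W")
    print(f"two cores at 1 GHz:   {dual_core:.2f} W")
```

Under these assumed values both configurations provide the same aggregate number of clock cycles per second, but the dual-core configuration draws less power because of the quadratic dependence on voltage.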

Yet the demand for faster applications is continuously increasing. The solution is to go parallel. The principal idea is that large problems can often be divided into smaller independent ones, which are then solved concurrently. This requires multiple workers, or processing elements, to execute the independent jobs. A multicore architecture provides multiple processing elements (processors) within a single machine. These processors differ from superscalar processors, which can issue multiple instructions per cycle from one instruction stream (thread); by contrast, a multicore processor can issue multiple instructions per cycle from multiple instruction streams.

Figure 4.2: Evolution of System-On-Chip[48]

VLSI technology scaling has continued within the constraints of the three walls, with transistor counts increasing according to Moore’s law. As a result, System-on-Chip manufacturing technology has progressed from digital ASICs to multiprocessor SoCs, as shown in figure 4.2.

4.3 Memory Hierarchy

In a shared-memory multiprocessor system, memory can be organized, and therefore accessed by the processors, in one of two ways: uniform memory access (UMA) or non-uniform memory access (NUMA).

Uniform Memory Access model[49]: the UMA model gets its name from the fact that each processor must use the same shared bus to access memory, resulting in a memory access time that is uniform across all processors. Figure 4.3 shows the UMA model.

The problem with the UMA model is that it is not scalable. As the number of processors increases, the interconnection bus becomes a hot spot, and with many concurrent memory requests it becomes congested.

Non-Uniform Memory Access model[49]: in the NUMA model each processor has its own local memory module, which it can access directly and with a distinct performance advantage. At the same time, it can access any memory module belonging to another processor using the shared bus. The NUMA architecture is shown in figure 4.4.

If data resides in local memory, access is fast. If data resides in remote memory, access is slower.


Figure 4.3: Uniform Memory access model[49]

Figure 4.4: Non-Uniform Memory Access model[49]

The advantage of the NUMA architecture is that it acts as a hierarchical shared memory. Because each node has its own local memory, memory accesses can take place in parallel, avoiding the throughput limitations and contention associated with the shared memory bus in UMA. The downside of the NUMA model is that retrieving data from another node takes significantly longer than accessing local memory.
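The trade-off between local and remote accesses can be sketched with a simple weighted average. This is only an illustrative model: the function name and the latency figures are hypothetical and are not measurements from any system discussed in this thesis.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Average memory access time when a fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns local, 300 ns remote.
    for fraction in (1.0, 0.9, 0.5):
        print(f"{fraction:.0%} local: {average_access_ns(fraction, 100.0, 300.0):.0f} ns")
```

The sketch shows why data placement matters on NUMA systems: the more accesses are served from the local memory module, the closer the average latency is to the local latency.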

4.4 Cache Terminology

4.4.1 Cache line or block

A cache line[49,67] refers to the smallest loadable unit of a cache, a block of contiguous words in main

memory.

4.4.2 Index

The index is the part of a memory address which determines in which line(s) of the cache the address

can be found.

4.4.3 Way

A way is a subdivision of a cache; all ways are of equal size and are indexed in the same fashion. The lines selected by a particular index value, one from each way, together form a set.

4.4.4 Tag

The tag is the part of a memory address stored within the cache which identifies the main memory

address associated with a line of data.

4.5 Cache - Memory mapping

There are three methods of block placement[49,67].

Direct mapped caches: If each block has only one place it can appear in the cache, the cache is said to be direct mapped. The least significant bits of the block address are used as the index, so all memory addresses that share the same index map to the same location in the cache. The mapping is usually (Block address) MOD (Number of blocks in cache).

Fully associative caches: If a block can be placed anywhere in the cache, the cache is said to be fully associative. Any memory address can be mapped to any entry in the cache. This requires a mechanism that searches all cache entries to determine whether the requested address is present. This mapping is only suitable for small caches.

N-way set associative caches: This is a compromise between the direct mapped and fully associative designs. If a block can be placed in a restricted set of places in the cache, the cache is said to be set associative. Cache entries are divided into sets of size N, where N is typically 2, 4, 8 or 16. The index bits of the address assign each memory address to a set, so a block is first mapped onto a set and can then be placed anywhere within that set. The set is usually chosen by bit selection, that is, (Block address) MOD (Number of sets in cache).

To determine which type of cache is best, we need to define two factors:

1. Hit time: the time needed to determine whether a memory address is present in the cache.

2. Hit ratio: the likelihood that the cache contains the memory addresses that the processor requests.
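The three placement schemes can be expressed with one function, since direct mapped and fully associative caches are the two extremes of set associativity (1 way and as many ways as there are lines). The following is a minimal sketch; the function name and the convention that the lines of a set are numbered contiguously are assumptions made for illustration.

```python
def candidate_lines(block_address, num_lines, ways):
    """Return (set index, cache line numbers where the block may be placed)."""
    num_sets = num_lines // ways
    # Bit selection: (block address) MOD (number of sets in cache).
    set_index = block_address % num_sets
    # Assumed layout: the lines of one set are numbered contiguously.
    first_line = set_index * ways
    return set_index, list(range(first_line, first_line + ways))


if __name__ == "__main__":
    block = 12
    print("direct mapped:    ", candidate_lines(block, 8, 1))
    print("2-way associative:", candidate_lines(block, 8, 2))
    print("fully associative:", candidate_lines(block, 8, 8))
```

For a cache of eight lines, block 12 has exactly one candidate line when the cache is direct mapped, two candidates when it is 2-way set associative, and all eight lines when it is fully associative.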

4.6 Classification of Cache misses

A cache miss[67,70,71] occurs when the requested data is not available in the cache. Misses can be classified into three categories.

• Compulsory miss or cold miss - When a program starts, the cache contains none of its instructions or data, so any access to a block that is fetched for the first time is necessarily a miss. These misses are called compulsory because the first reference to a datum must be fetched from main memory. Cache size and associativity make no difference to the number of compulsory misses. Prefetching can help to avoid compulsory misses.

• Capacity misses - Capacity misses occur because the cache is smaller than the data accessed by the program, so the cache cannot accommodate all the blocks needed. Capacity misses can be reduced by increasing the cache size and the cache block size.

• Conflict misses - These misses occur because many memory blocks map to the same cache block. For example, a block B that was originally in the cache is replaced by some other block A; if block B is subsequently accessed, the resulting miss is a conflict miss. This is a major problem with direct mapped caches. Conflict misses result from limited cache associativity and/or the replacement policy.
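The difference between compulsory and conflict misses can be illustrated with a small simulation. This is a sketch under stated assumptions: the cache is modelled at block granularity, each set uses least recently used replacement, and the function name and the access trace are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, num_lines, ways):
    """Count misses for a trace of block addresses in a set associative cache."""
    num_sets = num_lines // ways
    # One ordered dictionary per set; the oldest entry is the least recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict the least recently used block
            cache_set[block] = True
    return misses


if __name__ == "__main__":
    trace = [0, 4, 0, 4, 0, 4]  # blocks 0 and 4 share an index in a 4-line cache
    print("direct mapped:", count_misses(trace, num_lines=4, ways=1))
    print("2-way:        ", count_misses(trace, num_lines=4, ways=2))
```

In the direct mapped case every access misses, because the two blocks keep evicting each other (conflict misses). With two ways, only the first access to each block misses (compulsory misses), although the cache capacity is unchanged.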

Since accessing main memory is slow, modern processors provide fast local memory (cache) to speed up memory access. Caches are only effective for data that is reused. There may be multiple levels of cache, each with different characteristics. Most modern processors have three levels of cache. The third-level cache (L3 cache) is often shared among several processor cores. As caches hold local copies of global memory, multiple cores can hold a copy of the same data in their caches.

For OpenMP-based parallel programs, where multiple threads share the same address space, this could lead to problems. Before accessing main memory, a processor core checks its own cache and the cache of the other socket to ensure consistency between cache and memory. This is referred to as cache coherency. Coherency means that any change made to shared data in one processor’s private L1 cache is visible to the other processors’ private caches. Ensuring that a node is cache coherent does not mean that the problems associated with multiple copies of data are completely removed. Data held in processor registers is not covered by coherence, and this lack of coherence can lead to a race condition.

Example with two single-core sockets:

• Both processors have taken a copy of the same data from main memory.

• If one of them wants to write to this data, the local copy will be affected, but main memory does not change.

• The reason for having a fast cache is to avoid slower main memory accesses.

• On a cache coherent system, the rest of the node needs to be told about the update.

• The other processor’s cache needs to be told that its data is "bad" and that it needs a fresh copy.

• The new data can then be written into the local copy held in cache. If CPU 2 now wishes to read from or write to the data, it needs to get a fresh copy.

4.7 Transport Connectivity Unit (TCU)

The TCU uses the AXM 5512 multicore SoC with state-of-the-art technologies[58]. This SoC contains ARM Cortex A15 cores. A block diagram of the Cortex A15 is presented in figure 4.5, showing a four-core multicore design. The performance monitoring unit (one per core) is highlighted in figure 4.6. In an embedded multicore system[70], memory access latency is highest whenever there is communication between main memory and the cores. Multi-level caches are therefore used to reduce the memory access latency. Furthermore, a hierarchical multi-level cache design, graded in both size and set associativity, speeds up communication with the closer cache levels. The TCU equipment has the AXM5512 SoC[80], an SMP system with a total of 12 ARM Cortex A15 based MPCores, as shown in figure 4.7 and figure 4.8. In this SMP-based multicore system, cache coherency is maintained within every cluster between the private level-1 caches of the four cores and the level-2 cache shared by all four cores.

• Level-3 Shared Cache - The L3 cache has a size of 8 MB and is 16-way set associative. It is shared by 3 clusters of cores. Each cluster has 4 ARM Cortex A15 cores.


Figure 4.5: Arm Cortex -A15 MP Core Processor[54]

Figure 4.6: Block diagram of Cortex A15 MP Core Processor[54]


Figure 4.7: AXM 5512 Multicore System-On-Chip[80]

Figure 4.8: Features of AXM 5512 SoC[80]


• Level-2 Shared Cache - The L2 cache has a size of 2 MB and is 16-way set associative. It has a fixed line length of 64 bytes. This L2 cache is shared by 4 ARM cores, as shown in the figure below. The level-2 shared cache is physically indexed and physically tagged. It uses a random cache-replacement policy.

• Level-1 Private Cache - The L1 instruction cache has a size of 32 KB and is 2-way set associative. The L1 data cache also has a size of 32 KB and is 2-way set associative. Both the data and the instruction cache are private to a particular processor core. Both caches have a fixed line length of 64 bytes, and both are physically indexed and physically tagged. The cache replacement policy in both the level-1 instruction and data caches is the least recently used algorithm.

Before looking into the properties of the L1 private core cache, let us look at how a given physical address is decoded[52,53]. The cache line length is sixteen words (64 bytes), the cache is 2-way set associative, and the total cache size is 32 KB. The physical address bit decoding for the private core cache is also shown in figure 4.9.

Number of cache lines per way = (cache size / number of ways) / block size
= (32 KB / 2) / 64 bytes
= 256 lines in each way

Total physical address bits = 40 bits
Byte offset, to locate a byte within a word = 2 bits [1:0]
Word offset, to locate a word within a cache line of 64 bytes (16 words) = 4 bits [5:2]
Index, to locate a cache line within a way (256 lines in each way) = 8 bits [13:6]
Tag, the remaining bits (physical address bits - 14) = 26 bits [39:14]
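The address decoding above can be checked with a short sketch that derives the field widths from the cache parameters and then splits an address into its fields. The constant and function names are hypothetical; the parameters (32 KB, 2 ways, 64-byte lines, 40-bit physical address) are the ones stated in the text.

```python
CACHE_BYTES = 32 * 1024   # total L1 cache size
WAYS = 2                  # 2-way set associative
LINE_BYTES = 64           # cache line length (16 words of 4 bytes)
ADDRESS_BITS = 40         # physical address width

LINES_PER_WAY = CACHE_BYTES // WAYS // LINE_BYTES      # 256
OFFSET_BITS = LINE_BYTES.bit_length() - 1              # 6 bits [5:0]
INDEX_BITS = LINES_PER_WAY.bit_length() - 1            # 8 bits [13:6]
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS     # 26 bits [39:14]


def decode(address):
    """Split a physical address into (tag, index, word in line, byte in word)."""
    byte_in_word = address & 0x3                         # bits [1:0]
    word_in_line = (address >> 2) & 0xF                  # bits [5:2]
    index = (address >> OFFSET_BITS) & (LINES_PER_WAY - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, word_in_line, byte_in_word


if __name__ == "__main__":
    example = (0x123 << 14) | (0x5A << 6) | (0x7 << 2) | 0x1
    print(LINES_PER_WAY, OFFSET_BITS, INDEX_BITS, TAG_BITS)
    print([hex(field) for field in decode(example)])
```

The example address is constructed from known field values, so decoding it returns the same tag, index, word offset and byte offset that were used to build it.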

The general organization of a k-way set associative cache is shown in figure 4.10. We now analyse in detail the private core cache of the AXM chip in TCU03, which uses a set associative memory mapping, by discussing the following four standard questions.

• When we copy a block of data from main memory to the cache, where exactly should we put it?

The index field of the address is used to select a particular cache line or block. In a direct mapped cache, every main memory location whose address has the same index bits maps to exactly one location in the cache. There is therefore a high chance of cache thrashing, which is a potential disadvantage of direct mapping. In the AXM-5512 SoC, the private caches are 2-way set associative, which gives a mapping advantage. The index field of the address is still used to select a particular cache line or block, but it now points to one line in each way. So every main memory location whose address has the same index bits can map to the line with that index in either of the two ways, but not to both. To tell precisely which block of memory is stored in a cache line, additional information is stored as tag bits.

Figure 4.9: Physical Address decoding for private core cache in ARM A15[53].

Figure 4.10: K-way set associative cache[53]

Figure 4.11: Memory hierarchy in AXM SoC within TCU03[54]

• How can we tell if a word is already in the cache, or if it has to be fetched from main memory first?

Programs have spatial locality: once a location is retrieved, there is a high chance that nearby locations will be retrieved in the near future. A cache is therefore organized in blocks. As seen above, the cache block size is 64 bytes, or 16 words, so when any word is read from main memory, the other 15 words of the same block are fetched in the same read access. Whenever a private core needs to read data from main memory, it reads an entire cache line of 64 bytes, with a latency of 100 cycles as shown in the memory hierarchy of the AXM SoC in figure 4.11.

The level-2 cache memory system supports inclusion between the L1 data caches and the L2 cache: a line that resides in any of the level-1 data caches must also reside in the L2 cache. This approach[70] yields the highest performance for delivering data to a core.

Principle of locality for caches: a program in execution tends to reuse data and instructions; items near those that have been used recently are likely to be accessed in the near future.

– Spatial Locality: Data items with nearby addresses tend to be referenced close together in time.

– Temporal Locality: Recently referenced data items are likely to be referenced in the near future. A repeatedly referenced variable exhibits temporal locality.

Example Code Snippet:

    Sum = 0;
    for (i = 0; i < n; i++)
        Sum += a[i] + b[i];
    return Sum;

Locality example[70] for the L1 data cache of a private core

– Spatial Locality - references to the elements of arrays a[i] and b[i], which access successive elements stored in contiguous memory locations.

– Temporal Locality - the reference to the data element Sum in each iteration.

Locality example[70] for the L1 instruction cache of a private core

– Spatial Locality - instructions within the loop are referenced sequentially in their memory layout.

– Temporal Locality - the loop body is cycled through repeatedly until the loop count i no longer satisfies the condition (i<n).

• What policy is used to replace a cache block when the cache memory is full?

In general, a cache is much too small to hold all the data that might be needed, so it eventually fills up. To load a new block from main RAM, one of the existing blocks in the cache then has to be replaced. The goal is to retain the items that are most likely to be retrieved again soon, which requires a sensible algorithm for selecting what to remove from the cache. One simple solution is the least recently used (LRU) algorithm, which is the replacement policy used here. As an example, imagine a cache that can hold up to five pieces of data. Assume that we access three pieces of data, A, B and C; each is stored in the cache as it is accessed. When we next access D and E, they are added to the cache and fill all the slots. If we now access A again, A is already in the cache, so the contents do not change, but A is now counted as the most recently used item. If we then access F, something has to be thrown out to make room for it. The least recently used piece of data is B, so B is thrown out of the cache and replaced by F.

• How can write operations be handled by the memory system?

Caching write policy

– Write hit: When the processor executes a store instruction, a cache lookup is performed on the address(es) to be written. On a cache hit for a write, there are two choices.

* Write back: In this case, writes[50,53] are performed only to the cache, and not to main memory. This means that the cache line and main memory can contain different data: the cache line holds newer data, and main memory contains older data (said to be stale). To mark these lines, each line of the cache has an associated dirty bit. When a write updates the cache but not main memory, the dirty bit is set. If the cache later evicts a cache line whose dirty bit is set (a dirty line), it writes the line out to main memory. Using a write-back policy can significantly reduce traffic to slow external memory and therefore improve performance and save power.

* Write through: With this policy, writes are performed to both the cache and main memory, so the cache and main memory are kept coherent. As there are more writes to main memory, a write-through policy is slower than a write-back policy if the write buffer fills.

Write through needs two states per cache line (valid and invalid), and write back needs three states (modified, shared and invalid).

– Write miss:

* Write allocate: On a write miss, this policy brings the line into the cache.

* Write no allocate: On a write miss, the line remains invalid in the cache.

With write back the cache owns the line, while with write through the memory owns the line. To fully exploit the advantage of write back, write allocate is needed to give the line to the cache. Write allocate brings no benefit with write through, so write no allocate is used with write through.

In a uniprocessor system with write-through, write-no-allocate caches, only two states are needed (valid or invalid). In a multiprocessor system such as the target system, each cache line can be in one of n states, which are manipulated by the finite state machines implemented in the cache coherence controllers at each level of the cache hierarchy. The finite state machine, described in the next section, is the same for every block and every cache, but the actual state of a block can differ between caches.
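Returning to the block-replacement question above, the least recently used example with the items A to F can be reproduced with a minimal sketch. The class and function names are hypothetical, and the sketch models the policy only, not the hardware implementation in the Cortex A15.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used

    def access(self, key):
        """Access a key and return the evicted key, or None if nothing was evicted."""
        if key in self.items:
            self.items.move_to_end(key)  # hit: now the most recently used item
            return None
        evicted = None
        if len(self.items) == self.capacity:
            evicted, _ = self.items.popitem(last=False)  # drop least recently used
        self.items[key] = True
        return evicted


def run_trace(trace, capacity):
    """Return (list of evicted keys, final contents from least to most recent)."""
    cache = LRUCache(capacity)
    evictions = []
    for key in trace:
        evicted = cache.access(key)
        if evicted is not None:
            evictions.append(evicted)
    return evictions, list(cache.items)


if __name__ == "__main__":
    print(run_trace("ABCDEAF", capacity=5))
```

For the access sequence A, B, C, D, E, A, F and a capacity of five, the only eviction is B, which matches the worked example: the second access to A makes B the least recently used item.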

4.8 Cache Coherency policy

Coherency issues[53,51] arise because one cache line can be present in more than one cache. Any cache line can be in one of five states.

• M (Modified): a cache block in this state holds the only valid copy of the data. The core has read and write permissions over the block. The copy of the block in main memory is stale. If another core requests the block, the cache holding the block in the modified state must provide it. The cache line has been modified, is different from main memory, and is the only cached copy ('dirty').

• O (Owned): a cache block in this state must provide the data if another core requests it. The block can coexist with other copies in the shared state. The core holding the block in the owned state has only read permission over it.

• E (Exclusive): a cache block in the exclusive state holds a valid copy of the data with read and write permissions over it. A cache holding the block in this state does not need to supply it if another core requests it. The exclusive state can be seen as an intermediate state between shared and modified. The cache line is the same as main memory and is the only cached copy.

• S (Shared): a cache block in this state holds a valid copy of the data with read permission over it. The cache line is the same as main memory, but copies may exist in other caches. Other cores can also hold the block in the shared state, and one of them may hold it in the owned state. If no owner is present, main memory must provide the block if another core requests it.

• I (Invalid): the cache line does not hold a valid block.

4.8.1 MSI protocol

The MSI protocol supports the minimum set of states needed for a cache coherence protocol to work properly with invalidation-based, write-back private caches. Figure 4.12 presents the MSI state diagram, which shows the possible state transitions of a cache block. All transitions in the diagram have a label of the form R/A, where R indicates a request and A represents the action that the cache coherence controller must take following the request. As in the other protocols, each of these transitions is composed of one or more operations and, for the correctness of the protocol, these must be performed atomically. There are two types of transition, based on who makes the request:

• The bold arcs represent transitions due to read (LOAD) and write (STORE) requests issued by the processor.

• The dashed arcs represent transitions due to requests from the other caches (L-REQ and S-REQ).

Figure 4.12: MSI Coherence Policy[51]

MSI has a performance issue because it requires two transactions for the common case of first reading the data and then writing it. Transaction 1 moves the cache line from Invalid to Shared, and transaction 2 moves it from Shared to Modified. This inefficiency exists even if the application has no data sharing at all.

4.8.2 From MSI to MESI

The MESI protocol (also known as the Illinois protocol due to its development at the University of Illinois at Urbana-Champaign[51]) is a widely used cache coherence protocol. It is the most common protocol that supports write-back caches. Analysing the MSI protocol, the first source of inefficiency appears when a process needs to read and then modify a data item: two transitions are always required, even when no other node shares the cache block. The first transition brings the memory block into the cache in the shared state; the second is caused by the processor’s write request, which changes the state of the block from shared to modified. The MESI protocol adds an Exclusive state to reduce the traffic caused by writes to blocks that exist in only one cache. This new state represents an intermediate level between shared and modified. Table 4.1 presents the MESI cache hit and miss behaviour for reads and writes in the private core cache of the A15.

Table 4.1: MESI Cache[50]

Read/Write Event    Local Cache line       Remote Cache line
Read Hit            Use local copy         No action
Read Miss           I to S, or I to E      (S,E,M) to S
Write Hit           (S,E) to M             (S,E,M) to I
Write Miss          I to M                 (S,E,M) to I
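The transitions in Table 4.1 can be written as a small lookup that is applied to the per-cache states of one cache line. The following is a minimal sketch, not the Cortex A15 implementation: the function names are hypothetical, a read miss is assumed to go to S when another cache holds the line and to E otherwise, and a write hit on a line already in M is assumed to stay in M (a case the table does not list).

```python
def local_next(event, state, shared_elsewhere=False):
    """Next state of the line in the cache that issues the read or write."""
    if event == "read_hit":
        return state                            # use the local copy
    if event == "read_miss":
        return "S" if shared_elsewhere else "E"
    if event in ("write_hit", "write_miss"):
        return "M"                              # assumption: M stays M on a write hit
    raise ValueError(f"unknown event: {event}")


def remote_next(event, state):
    """Next state of the same line in any other cache."""
    if state == "I" or event == "read_hit":
        return state                            # no action
    if event == "read_miss":
        return "S"                              # (S,E,M) to S
    return "I"                                  # writes invalidate remote copies


def step(states, core, operation):
    """Apply a 'read' or 'write' by one core to the list of per-cache states."""
    hit = states[core] != "I"
    event = f"{operation}_{'hit' if hit else 'miss'}"
    shared = any(s != "I" for i, s in enumerate(states) if i != core)
    return [
        local_next(event, s, shared) if i == core else remote_next(event, s)
        for i, s in enumerate(states)
    ]


if __name__ == "__main__":
    states = ["I", "I"]
    for core, operation in [(0, "read"), (0, "write"), (1, "read"), (1, "write")]:
        states = step(states, core, operation)
        print(f"core {core} {operation}: {states}")
```

In the trace, the first read by core 0 moves the line from I to E because no other cache holds it, and the following write moves it from E to M without involving the other cache. This is the read-then-write case for which MSI needs two transactions.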

The shared level-2 cache memory system includes a Snoop Control Unit (SCU). According to the Cortex A15 reference manual[53], the SCU uses hybrid Modified Exclusive Shared Invalid (MESI) and Modified Owned Exclusive Shared Invalid (MOESI) protocols to maintain coherency between the individual L1 data caches and the L2 cache. The L2 memory system contains a snoop tag array that is a duplicate copy of each of the L1 data cache directories. The snoop tag array reduces the amount of snoop traffic between the L2 memory system and the L1 memory system. In this thesis the requirement is to use one cluster, so the performance analysis is carried out on four ARM Cortex A15 based MPCores.


4.9 Summary

This chapter discussed the state-of-the-art technology of the multicore AXM 5512 SoC in the TCU03 equipment. It covered the cache properties and explained the cache memory design by means of the four standard questions.

Chapter 5

Test Environment

This chapter describes the test environments in which the working software prototype was made fully functional. It also describes the hardware and software components used for the live test, the drive or walk test, and the simultaneous client-server socket-level communication test for scheduling massive UEs.

5.1 Live Test Environment

Ericsson has several live test RBS sites in many cities. Figure 5.1 presents the outdoor RBS site. A complete Ericsson RBS site includes products such as antenna systems, power supplies and backup, enclosure and installation materials, site transmission, shelters, solar panels, fuel cells, towers, the RBS cabinet and the RU cabinet. Ericsson has a radio base station and several live test sites in Enkoping, a city located 200 km from Linkoping. We chose this site because it gave us access to both a radio base station and a nearby test site. Figure 5.3 shows the DUS41 used for our project at the Enkoping site. Figure 5.2 presents the outdoor RBS cabinet, with the following parts:

• A: Climate System

• B: Power Units

• C: Radio Sub-rack

• D: Alarm Connection panel

• E: Backup Batteries

• F: Space for Optional transmission equipment

Figure 5.1: Outdoor RBS Site

Figure 5.2: Outdoor RBS Cabinet

Figure 5.3: DUS41

Figure 5.4: Dual beam antenna

5.1.1 Antenna at RBS

The antennas used at our test site are Kathrein 80010656[31]. Figure 5.4 presents the dual beam antenna. This antenna has two beam faces; in the horizontal pattern, the left beam points at an azimuth of +30 degrees and the right beam at -30 degrees. Both the horizontal and vertical patterns are shown in figure 5.6. The two vertical flat surfaces facing outward horizontally give the pointing directions of the two beams of the antenna. Three such dual beam antennas are used, as shown in figure 5.5, so 6 beams are present in total. Each beam has its own azimuth angle, with which it serves one of the six sectors. These antennas were installed on top of the water tower at the radio base station site in Enkoping.

5.1.2 RBS Cabinet

There are two cabinets installed near the water tower: the radio unit cabinet, which connects to the antennas, and the RBS cabinet, where the RBS equipment resides. The following RBS equipment is placed inside the RBS cabinet:

• DUS41

• TCU03


Figure 5.5: Antennas at Enkoping

Figure 5.6: Horizontal and vertical pattern

• Network router

• Uplink Transport Unit

• 48 V power supply

DUS41

The digital unit DUS41 is LTE equipment. On one side it is connected to the RU through the radio interface and provides the interface to the LTE signals; on the other side it is connected to TCU03. The connection between DUS41 and TCU03 is made through OAM (Operation and Maintenance). Figures 5.7 and 5.8 show the connectivity ports present on DUS41. It has 6 radio interface ports, 3 transport ports (TN A, TN B and TN C), one Local Maintenance Terminal (LMT) port and one GPS port. One of the transport ports of DUS41 is connected to TCU03 for communication. The LMT port is used for managing the node from a terminal. The functional block diagram of DUS41 is presented in figure 5.9. DUS41 has baseband processing as its central block. The block on the right side of the baseband processing unit provides the radio interface connection to the radio unit. The block on the left side feeds the baseband processor with many inputs for various kinds of input data. In addition, a 48 V DC power supply is required to run the equipment. The DUS41 that we installed in the RBS for our positioning project is shown in figure 5.10.

Figure 5.7: DUS41[55,56]

Figure 5.8: Ports of DUS41

Figure 5.9: Block diagram of DUS41[55,56]

Figure 5.10: DUS41 used at Enkoping[55,56]

Figure 5.11: TCU03[57]

TCU03

The Transport Connectivity Unit (TCU03) resides in the RBS cabinet along with the DUS41 equipment. In the RBS, TCU03 is connected to the digital unit DUS41 on one side and to the mobile backhaul on the other side. It receives data from the digital unit and sends it over the network. A TCU03 is presented in figure 5.11. Figure 5.12 shows its connectivity ports, including the two ports named TN L and TN K. The port TN K was used to connect to the DUS41 with an RJ45 cable. Figure 5.13 presents the TCU03 used in Enkoping.


Figure 5.12: ports of TCU[58]

Figure 5.13: TCU03 used in Enkoping[57,58]

5.2 Measurement Requirement

The two major classes of LTE events available for performance recording from the eNodeB are as follows:

• Internal Events - Internal events are generated within the RBS. They provide information about various periodic communication and the behaviour of the RBS.

• External Events - External events happen external to the RBS. These are UE signalling events, generated by radio-level signalling between the RBS and the UE or by external signalling between the RBS and the Core Network.

Apart from these types of events, LTE defines another type of event known as PM-Initiated UE measurement events. The two events of interest for UE positioning are:

• UE_MEAS_INTRAFREQ1[11,12,20] - This is a PM-Initiated UE measurement event. It contains the MMES1APID of the UE, the time stamp of the event generation, the RSRP value from the serving cell, the global cell ID of the serving cell, and the physical cell IDs of the neighbouring non-serving cells that provide RSRP, together with their RSRPs.

• INTERNAL_PER_RADIO_UE_MEASUREMENT_TA[11,12,20] - This is a periodic internal event generated in the RBS. It is generated every minute and contains 60 TA values at an interval of 1 second. Along with the TA values, this event provides the MMES1APID of the UE.

The two types of measurement data required for real-time positioning are:

• RSRP values - RSRP values give the signal strength that the UE receives from the various cells of one base station. Both the serving cell and the neighbouring non-serving cells provide RSRP. RSRP values from at least two cells of the same base station are required to calculate the AoA for a particular UE.

• TA values - A TA value expresses distance in the form of a timing advance, which in turn represents the time taken for the signal to travel from the antenna of the serving cell to the UE.
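The relation between a TA value and distance can be sketched as follows. The sketch assumes that the reported TA value is an index in the standard LTE timing advance granularity of 16 Ts, with Ts = 1/(15000 x 2048) seconds, and that the timing advance compensates the round-trip delay, so the one-way distance is half of the distance covered in that time. Whether the TA field of the event uses exactly this unit is an assumption; the constant and function names are hypothetical.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
TS_SECONDS = 1.0 / (15_000 * 2048)   # LTE basic time unit
TA_STEP_SECONDS = 16 * TS_SECONDS    # assumed granularity of one TA index


def ta_to_distance_m(ta_index):
    """Approximate one-way distance between serving antenna and UE for a TA index."""
    round_trip_seconds = ta_index * TA_STEP_SECONDS
    # The timing advance covers the round trip, so halve it for the one-way distance.
    return SPEED_OF_LIGHT_M_S * round_trip_seconds / 2.0


if __name__ == "__main__":
    for ta_index in (0, 1, 5, 10):
        print(f"TA = {ta_index:2d}: about {ta_to_distance_m(ta_index):.0f} m")
```

Under these assumptions one TA step corresponds to roughly 78 m, so a single TA value locates the UE on a ring around the serving antenna rather than at a point, which is why it is combined with the AoA.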

5.3 Test Site Visits with drive/walk route

This section describes the visits made to the test RBS site and presents the drive route used for testing the positioning software. UE positioning with live network data involves antenna modelling for the test RBS, AoA calculation and the particle filter. During a drive or walk test along the drive route, RSRP and TA values are therefore required continuously.

• First visit to RBS site (on 16th June 2015): The first visit was made to install the DUS41 and
TCU03 equipment in the RBS cabinet at the test RBS site, as shown in figure 5.14. The RUs were
connected to the DUS41, and the TCU03 was connected to the IP transport equipment, mini-link
SP[77]. The DUS41 was connected to the TCU03 over an RJ45 cable, and both units were connected to
the required power supply (48V DC), which is available inside the RBS cabinet. After the
installation, the reachability of the TCU03 and the reachability of the DUS41 through the TCU03
were tested. This was followed by the initial version of the software with measurement
configuration. Finally, we were able to stream and decode the RSRP events successfully.

• Second visit to RBS site (on 9th July 2015): We made another visit to the test site with a
software version containing all the features required for trajectory calculation for one UE
present in the network. A drive test was made along the route shown in figure 5.15. When the
software was tested with the network, we found that the TA fields in the events carrying TA values
were unavailable. We also found that we did not get enough RSRP values to run the real time
multiple signal classification algorithm to arrive at the AoA. So there was a need to model the
least square algorithm and implement it within the trajectory software for real time AoA
processing.

• Third visit to RBS site (on 3rd August 2015): The third visit to the test site had three major
purposes. Firstly, a solution had been made for the issue of the TA values being unavailable in
the TA events. Secondly, the least square algorithm had been implemented and integrated with the
trajectory software, which requires 2 RSRP values for AoA, and CRLB logic had been added for the
AoA variance. Thirdly, the equipment (DUS41 and TCU03) in the test RBS cabinet had become
inaccessible because of some fault, which prevented access to the TCU03 from the external network
and in turn to the DUS41. So the first task was to fix the fault and bring the communication with
the equipment back up. This required restoring the board software and installing our software
again. However, the TCU03 at the RBS site was still inaccessible from networks external to the
RBS, and we found out that the uplink from the RBS was blocked from Stockholm. The next day we got
it unblocked by requesting the concerned person in Stockholm. This allowed us to access the TCU03
from the external network and in turn the DUS41. Then we tested the fix for the TA values in the
TA events field. Further, we did continuous streaming during a drive test and fixed the AoA and
AoA variance calculation with the LS algorithm so that the sensor model in the PF works
continuously. Finally, we could also fix the AoA and AoA variance issue with one RSRP.

Figure 5.14: Enkoping RBS Cabinet

Figure 5.15: drivepath

• Fourth visit to RBS site (on 25th August 2015): After the third test we found that the main beam
directions of the 6 antennas installed above the water tower, as shown in the figure, did not
match the cell ID orientation information given to us. The fourth visit had two purposes. The
first was to find the correct main beam direction of each antenna and its corresponding cell ID.
The second was to collect event data with another drive test.

5.4 Concurrent client server test for Real time scheduler

This section discusses the input streamer interfaces and the output interface. A client simulation
test file was created to drive streaming data for both the cell trace events streamer and the
mapping events streamer. The test environment of both streamers, used for simulating the massive
UE scheduler, is explained below.


Figure 5.16: Cell trace for event streamer

5.4.1 Cell trace events streamer and decoder

Through MO Shell command configuration, together with cell trace activation, we configure
measurements for the massive number of user equipment present within the base station. The IP
address and port number of the DUS-41 are specified through the MO Shell command. Along with this,
a fraction profile of 1000 is set for complete coverage of the massive number of user equipment.

5.4.2 Communication between DUS-41 and TCU-03

TCU-03 communicates with DUS-41 using the IP address and port configured above through MO Shell.
TCU-03 acts as the TCP server and DUS-41 as the TCP client. Once the socket communication is
established in a separate thread, as shown in figure 5.16, the TCU can receive the two required
events. The UE level measurement with timing alignment information is sent periodically for each
available MMES1APID, as 1 array per minute consisting of 60 sample values. This is also collected
as an event.


Figure 5.17: Event data with time stamp in millisecond

Figure 5.18: EventData list

Apart from RSRP and TA, we also receive the ta-Interval and the ta-Start-Time stamp. The
ta-Interval is configured as 1 minute. The RSRP events are extracted element by element from the
event stream packet according to the event parameters described in PM Events.

Figure 5.17 presents an EventData with time stamp in milliseconds. As RSRP events arrive for a
particular UE with an MMES1APID, they are stored one by one in the "EventData" list. This
EventData list is a doubly linked list with forward and backward pointers pointing to the next
node and the previous node respectively, as shown in figure 5.18. Each successive event is
attached at the eventDataHead pointer of the list.

As soon as a TA event consisting of 60 samples arrives for the UE with a particular MMES1APID, the
TA values are updated in every "EventData" structure of the same MMES1APID. This "EventData" list,
the total number of events, and the head and tail pointers are collectively stored in a structure
unique to the MMES1APID, known as DataOfInterest.

For detailed information on DataOfInterest and EventData and their usage, refer to appendix(2).
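The relationship between the EventData list and DataOfInterest can be sketched as below. This is an illustrative sketch only: the field names (`newer`, `older`, `rsrp_by_cell`) and the rule that matches a TA sample to an event by its time offset from the TA start time are assumptions, not the definitions given in the appendix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class EventData:
    """One RSRP event of a UE (field names are illustrative)."""
    timestamp_ms: int
    rsrp_by_cell: Dict[int, float]      # physical cell ID -> RSRP
    ta: Optional[int] = None            # filled in when the TA event arrives
    newer: Optional["EventData"] = field(default=None, repr=False)
    older: Optional["EventData"] = field(default=None, repr=False)


@dataclass(eq=False)
class DataOfInterest:
    """Per-MMES1APID container holding the doubly linked EventData list."""
    mmes1apid: int
    event_count: int = 0
    head: Optional[EventData] = field(default=None, repr=False)  # newest event
    tail: Optional[EventData] = field(default=None, repr=False)  # oldest event

    def post_event(self, event: EventData) -> None:
        """Attach a new event on the head side of the list."""
        event.older = self.head
        event.newer = None
        if self.head is None:
            self.tail = event
        else:
            self.head.newer = event
        self.head = event
        self.event_count += 1

    def apply_ta(self, ta_samples: List[int], ta_start_ms: int,
                 interval_ms: int = 1000) -> None:
        """Copy TA samples into the events, walking from the oldest event.

        Assumed rule: an event receives the sample whose 1-second slot
        contains the event time stamp.
        """
        event = self.tail
        while event is not None:
            index = (event.timestamp_ms - ta_start_ms) // interval_ms
            if 0 <= index < len(ta_samples):
                event.ta = ta_samples[index]
            event = event.newer
```

Walking from the tail towards the head visits the events in increasing time order, which is the order in which the antenna model later consumes them.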

The populated DataOfInterest structure is attached to the AvailableData structure and is then
ready to be processed, as shown in figure 5.19. The AvailableData list is again a doubly linked
list, consisting of successive DataOfInterest structures and a count variable which represents the
number of task units available to be processed. This list is sent to the main thread as
unidirectional queue-1, with the "AvailableDatahead" pointer pointing to the last (most recently
added) MMES1APID and the "AvailableDataTail" pointer pointing to the first MMES1APID.
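The queue behaviour described above can be sketched as follows. This is a minimal sketch that swaps the hand-written doubly linked list for Python's `collections.deque` and guards it with a single lock; the class and method names (`AvailableData.post`, `AvailableData.take`) are hypothetical.

```python
import threading
from collections import deque


class AvailableData:
    """Unidirectional queue of task units between two threads (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._units = deque()   # left end = head (newest), right end = tail (oldest)
        self.count = 0          # number of task units waiting to be processed

    def post(self, task_unit) -> None:
        """Called by the streamer thread: attach a task unit on the head side."""
        with self._lock:
            self._units.appendleft(task_unit)
            self.count += 1

    def take(self):
        """Called by the main thread: remove the oldest task unit from the tail."""
        with self._lock:
            if self.count == 0:
                return None
            self.count -= 1
            return self._units.pop()
```

Posting on the head side and consuming from the tail side keeps the task units in arrival order, so the oldest MMES1APID is processed first.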


Figure 5.19: AvailableData list

5.4.3 Mapping events streamer and decoder

In another thread, started from the main thread, TCU03 runs a TCP server and listens on a
different port number for a connection from the MME as TCP client. The MME also has MO Shell
related command configuration settings. Soon after the configuration is made, the MME initiates a
connection with this TCP server and starts streaming the mapping events to the trajectory software
running in TCU03, as shown in figure 5.20.

MMES1APID is a temporary ID for a UE at MME level, whereas IMSI is a permanent and globally unique
ID. So, to associate a trajectory with a particular UE, the temporary ID has to be mapped to the
permanent IMSI.

There is an event named Internal Proc UE Ctxt Release Enodeb, where Proc represents a procedure
happening internally between the base station and the MME. This event is generated in the MME
whenever the UE has a context release. A context release represents a UE getting detached from the
radio network. Whenever this event occurs, the context is released, the UE loses its MMES1APID,
and the UE is assigned a new MMES1APID when it establishes another context for radio access.
Figure 5.21 presents the bit-level packed binary file given by the SGSN MME and opened through a
python script. So there is a need to match this MMES1APID number to get the unique IMSI of the UE.

This event is bit packed and its format is defined in the Ericsson documents[15]. The event
consists of a time stamp in hours, minutes, seconds and milliseconds, the base-station ID
(eNodeB), the IMSI and the MMES1APID. From these events, the mapping between MMES1APID and IMSI is
stored in the MappingEntry list.

Figure 5.20: Cell trace mapping events

Figure 5.21: SGSNMME for enkoping

Figure 5.22: MappingEntry

This MappingEntry list is a doubly linked list. It is sent to the main thread through a
unidirectional queue (queue-2), as shown in figure 5.22, with the "MappingEntryHead" pointer
pointing to the last MappingEntry node (incoming mapping events are posted on the head side) and
the "MappingEntryTail" pointer pointing to the first MappingEntry node.
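The purpose of the mapping entries, resolving a temporary MMES1APID to a permanent IMSI, can be illustrated with a small lookup table. This sketch uses a dictionary in place of the doubly linked MappingEntry list, and the class name, method names and IMSI value are all hypothetical.

```python
from typing import Dict, Optional


class MappingTable:
    """Resolves a temporary MMES1APID to the permanent IMSI (sketch)."""

    def __init__(self) -> None:
        self._imsi_by_mmes1apid: Dict[int, str] = {}

    def on_mapping_event(self, mmes1apid: int, imsi: str) -> None:
        """Record the pair carried by one decoded mapping event."""
        self._imsi_by_mmes1apid[mmes1apid] = imsi

    def imsi_for(self, mmes1apid: int) -> Optional[str]:
        """Return the IMSI for an MMES1APID, or None if it is not yet known."""
        return self._imsi_by_mmes1apid.get(mmes1apid)
```

A task unit whose MMES1APID is not yet in the table cannot be attributed to a UE, which is why the mapping stream runs alongside the cell trace stream.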

5.5 Output Interface

The Output Interface is also known as the data upload component. This upload component runs as a
separate thread, acting as a TCP client, and uploads the output buffer to Ericsson's CMT server as
shown in figure 5.24. CMT stands for Cellular Mobile Trajectories. An HTTP server runs within the
CMT server, which has a database to store trajectory data for massive UE in the form of sensor
data.

The output buffer stores the filtered trajectory data. It is implemented as a linked list of a
structure known as 'UeDataPost', as shown in figure 5.23. This structure contains the IMSI value
and a list of 'PostData' structures that correspond to this IMSI value. A PostData structure
consists of fields such as latitude, longitude, timestamp and cell ID. The Output Interface
component attaches a new item (a UeDataPost) to the list whenever a set of trajectory data for a
known IMSI is ready after filtering and conversion.

Figure 5.23: UeDataPost structure

Figure 5.24: Socket interface between TCU03 and CMT Server

This list is shared by the main thread and the output interface component running as a separate
thread. For detailed information on UeDataPost and PostData and their usage, refer to appendix(2).
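The output buffer and its two structures can be sketched as follows. The field list follows the description above, while the class `OutputBuffer`, its method names and the use of a single lock for the list shared between the two threads are assumptions made for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostData:
    """One filtered trajectory point."""
    latitude: float
    longitude: float
    timestamp_ms: int
    cell_id: int


@dataclass
class UeDataPost:
    """All trajectory points that are ready for one IMSI."""
    imsi: str
    points: List[PostData] = field(default_factory=list)


class OutputBuffer:
    """List shared by the main thread and the upload thread (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: List[UeDataPost] = []

    def attach(self, item: UeDataPost) -> None:
        """Main thread: add trajectory data that is ready for upload."""
        with self._lock:
            self._items.append(item)

    def drain(self) -> List[UeDataPost]:
        """Upload thread: take everything that is currently buffered."""
        with self._lock:
            items, self._items = self._items, []
        return items
```

Draining the whole list in one locked step keeps the lock held only briefly, so the upload itself can run without blocking the main thread.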

Chapter 6

Implementation

In this chapter, we discuss system visualization and RBS communication, followed by our solution
architecture. Next, positioning is discussed, with antenna modelling for AoA calculation and the
particle filter requirements for massive UE positioning. This is followed by the multi-core
section, which covers parallel programming issues, the single core/single thread implementation,
and then the real time multi-threaded and OpenMP schedulers for scheduling massive UEs on
multi-core.

6.1 System Visualization

In this section we discuss the high level communication that happens with the Ericsson RBS
equipment, followed by our solution architecture involving real time streaming for network based
real time UE positioning.

6.1.1 Communication with Radio Base Station Equipments

Figure 6.1 shows the high level system visualization of our positioning system.

The Ericsson RBS cabinet houses two units: the Digital Unit Server, named DUS41, and the Transport
communication unit network equipment, named TCU03. The antennas installed at the RBS are mainly
used by the serving cell to communicate with the UE. The antennas provide the RSRP signal to the
UE. DUS-41 is connected to the radio unit, and the radio unit is connected to the antennas. DUS-41
communicates with the antennas over the radio interface and with TCU-03 over TCP. TCU-03 is
connected to the transport network and starts the streaming of PM events from DUS-41 through
measurement configuration. DUS-41 provides real time streaming of LTE events, in the form of cell
trace, to the trajectory software.

Each SGSN MME has a node ID, and every SGSN MME node can serve mapping identities for up to 20
eNodeBs with 20 different eNodeB identities. The MME provides real time streaming of IMSI mapping
information to the trajectory software. The IMSI is a permanent identifier.

Figure 6.1: High level System Visualization

The Cellular mobile trajectories (CMT) server is an HTTP server that continuously receives UE
trajectory information and stores it in a database. The TCU acts as a TCP client and continuously
sends trajectory data to the TCP server, that is, the CMT server. The database in the CMT server
can store massive UE trajectory data. The trajectory information for every UE can also be fetched,
and the real time trajectory can be plotted in Google Maps.

6.1.2 Solution Architecture

Figure 6.2 shows the solution architecture of our trajectory software.

The trajectory software[61,63] runs on the TCU03 equipment, which has an AXM SoC.

• Input Interface - The input interface in the solution architecture is responsible for receiving
real time network streaming data and storing it in a buffer. Both input interfaces, the cell trace
from the eNodeB and the binary file decoder for the MME, are described in the test environment
section.

• Antenna modelling for AoA - The antenna modelling algorithms have to be developed through Matlab
modelling and implemented in software in the C language for the target SoC in TCU03. The modelling
has to be tested both in Matlab and in the real time implementation. Finally, the tested
implementation has to be tested in real time with TCU03 and DUS41 communication during a drive
test. During the drive test we should receive AoA and AoA variance consistently.

Figure 6.2: Solution Architecture

• Positioning Filter - There are two modules in the positioning filter. The first module is the
particle filter, which filters the received measurement data (AoA and TA values) with reference to
the RBS coordinates. The particle filter gives the position in the form of a state vector
containing positions and velocities in the X and Y directions of an XY coordinate system. The
second module is coordinate conversion: since the data uploaded to the CMT server must be
geographic, the XY coordinate points have to be converted to latitude and longitude.

• Output Interface - In real time network positioning, the positions of massive UE have to be
tracked and stored in a database on a server. The trajectory information needs to be updated in
the CMT server. This is done using the output interface buffers.
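The conversion from XY coordinates relative to the RBS to latitude and longitude, mentioned for the positioning filter above, can be approximated as in the sketch below. Note that this is a simple local flat-earth (equirectangular) approximation and not the MGRS based conversion used in the trajectory software; the function name, the spherical earth radius and the convention that X points east and Y points north are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean earth radius, spherical approximation


def xy_to_lat_lon(x_east_m: float, y_north_m: float,
                  rbs_lat_deg: float, rbs_lon_deg: float):
    """Convert a local offset from the RBS (meters) to latitude/longitude.

    Assumes X points east and Y points north. Reasonable only for offsets
    of a few kilometers, which matches the cell radius considered here.
    """
    dlat_deg = math.degrees(y_north_m / EARTH_RADIUS_M)
    meters_per_radian_lon = EARTH_RADIUS_M * math.cos(math.radians(rbs_lat_deg))
    dlon_deg = math.degrees(x_east_m / meters_per_radian_lon)
    return rbs_lat_deg + dlat_deg, rbs_lon_deg + dlon_deg
```

For example, `xy_to_lat_lon(0.0, 1000.0, 59.6, 17.1)` moves the point about 0.009 degrees north of the (hypothetical) RBS coordinates.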

6.2 Positioning

6.2.1 Antenna modelling for AoA

Antenna modelling for an Ericsson eNodeB is important for obtaining proper AoA values. The antenna
main beam directions and their cell IDs are given in table 6.1. Table 6.2 presents the antenna
parameters taken from the Kathrein data sheet[31]. This modelling requires real time network
streaming input values from the UE, such as RSRP and cell ID.

Table 6.1: Installation Parameters[68]

Cell ID               10      11      12      13      14      15
Main beam direction   10deg   70deg   130deg  190deg  250deg  310deg

Table 6.2: Data sheet[31]

Horizontal Pattern R(−30deg) and L(+30deg)

Frequency range 2490-2690 MHz

Max. Gain 19.6

Front Back Ratio +23

The antenna diagram shown in figure 6.3 can be described by a trigonometric model, given by the
equation below.

$$G(\alpha, \theta) = \sum_{n=0}^{N} \eta_n \cos\big(n(\alpha - \theta)\big) \qquad [39, 47]$$

(Plot: antenna gain in dBm against angle in degrees, with curves G1(α=10), G2(α=70), G3(α=130),
G4(α=190), G5(α=250) and G6(α=310).)

Figure 6.3: Antenna Gain Diagram

In the above equation, α is the main beam direction and θ is the angle variable. The modelling for
the various algorithms has to be performed first in Matlab, and several calculations with the
antenna data sheet parameters have to be performed and tested during mathematical modelling.
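The trigonometric gain model and the resulting per-cell look-up table can be sketched as follows. The coefficients in `ETA_EXAMPLE` are hypothetical: they are chosen only so that the peak gain equals the data sheet maximum of 19.6, and they are not the coefficients fitted in the thesis. The function names are likewise illustrative.

```python
import math

# Main beam direction per cell ID, taken from table 6.1 (degrees).
MAIN_BEAM_DEG = {10: 10, 11: 70, 12: 130, 13: 190, 14: 250, 15: 310}

# Hypothetical coefficients eta_n; their sum (the gain at alpha == theta) is 19.6.
ETA_EXAMPLE = (8.0, 8.0, 3.6)


def antenna_gain(alpha_deg: float, theta_deg: float, eta=ETA_EXAMPLE) -> float:
    """G(alpha, theta) = sum over n of eta_n * cos(n * (alpha - theta))."""
    delta = math.radians(alpha_deg - theta_deg)
    return sum(coeff * math.cos(n * delta) for n, coeff in enumerate(eta))


def build_gain_lut(eta=ETA_EXAMPLE):
    """Tabulate the gain of every cell at integer angles 0..359 degrees."""
    return {
        cell_id: [antenna_gain(alpha, theta, eta) for theta in range(360)]
        for cell_id, alpha in MAIN_BEAM_DEG.items()
    }
```

With such a table, a gain value is obtained at run time by indexing, for example `build_gain_lut()[10][10]` for cell 10 at its main beam direction, instead of evaluating the cosine series for every measurement.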

6.2.2 Multiple Signal Classification Algorithm

Using the horizontal antenna gain equation above, together with the different physical cell IDs
and their mounted main beam-width angles, we can model the antenna gain look-up table and hence
arrive at the antenna gain diagram. The MUSIC algorithm requires the following:

• The Received Signal Strength samples from three physical cells of the same base station along

with their cell id.

• The Steering vector of Antenna Gains formed with three antenna faces of their corresponding cell

id.

• Compute a Spatial Covariance matrix.

• Apply Singular value decomposition (SVD) method to extract signal from noise component.

• Find the MUSIC spectrum and evaluate its peak to obtain the DoD. As an example of obtaining the
DoD angle, the MUSIC algorithm is evaluated with three RSS measurements from the three cells 10,
11 and 12. Table 6.3 shows the measurement parameters used in the MUSIC algorithm.


(Plot: gain difference in dBm against angle in degrees, with curves H12, H23, H34, H45, H56 and
H61.)

Figure 6.4: Antenna Gain Differential Diagram

Table 6.3: MUSIC

Steering vector G              G1             G2        G3
Steering vector Cell for SVD   G10            G11       G12
Beam-width angle               10             70        140
SVD method                     Signal source  Noise-1   Noise-2
RSRP measurements X            -137.5         -142.5    -147.5

Finally, we obtain a degree of departure (DoD) angle of 37 degrees from the peak of the spectrum.

6.2.3 Least Square Algorithm

In the antenna model, the least square algorithm takes as input two RSRP values and their
corresponding two physical cell identities, which give a pair of neighbouring cells. The
difference between the two reference signal received powers is equal to the difference in
horizontal antenna gain. For two known neighbouring cells, this difference can be determined in
Matlab by modelling the antenna gain together with the antenna branch configuration (also known as
the antenna main beam direction of radiation in the field). Figure 6.4 shows the differential
antenna gain diagram.

For all subsequent pairs of antennas, we can derive differential Antenna gain look up table and variance

antenna gain look up table by modelling antenna gain, differential antenna gain and variance antenna

gain diagrams.


Figure 6.5: DoD for UE

Cramer Rao Lower Bound modelling for angle variance estimate

To achieve an accurate position estimate, we need to know how large the variance of the DoD
estimates is. The variance of the angle estimates differs depending on the angle. The expected
minimum variance of the angle estimates is given by the Cramer Rao Lower Bound (CRLB). The CRLB
states the lowest possible variance for an unbiased estimator and is derived for the differential
antenna diagram. When two RSRP measurements are available, a special case of the bound occurs and
the bound reduces to

$$\mathrm{var}(\alpha) \geq \frac{2\,\sigma_{RSS}^{2}}{\left(\dfrac{dH(\sigma,\theta_1,\theta_2)}{d\theta}\right)^{2}} \qquad [39, 47]$$

In the above equation, α is the main beam direction and θ is the angle variable. The variance
($\sigma_{RSS}^{2}$) is set to 3 dB as a standard value.
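The bound can be evaluated numerically from the differential gain, as in the sketch below. It assumes that H is the difference of the two cell gains, that the derivative is taken with respect to the angle in degrees (so the bound is in square degrees), and that the gain model uses hypothetical coefficients; none of these choices are confirmed by the source.

```python
import math

ETA_EXAMPLE = (8.0, 8.0, 3.6)   # hypothetical gain coefficients


def gain(alpha_deg: float, theta_deg: float, eta=ETA_EXAMPLE) -> float:
    """Trigonometric antenna gain model G(alpha, theta)."""
    delta = math.radians(alpha_deg - theta_deg)
    return sum(coeff * math.cos(n * delta) for n, coeff in enumerate(eta))


def diff_gain(theta_deg: float, az1_deg: float, az2_deg: float) -> float:
    """Differential gain H between two neighbouring cells at angle theta."""
    return gain(az1_deg, theta_deg) - gain(az2_deg, theta_deg)


def crlb_variance(theta_deg: float, az1_deg: float, az2_deg: float,
                  sigma2_rss: float = 3.0, step_deg: float = 0.5) -> float:
    """Lower bound 2 * sigma^2 / (dH/dtheta)^2, using a central difference."""
    slope = (diff_gain(theta_deg + step_deg, az1_deg, az2_deg)
             - diff_gain(theta_deg - step_deg, az1_deg, az2_deg)) / (2.0 * step_deg)
    if slope == 0.0:
        return math.inf   # flat differential gain carries no angle information
    return 2.0 * sigma2_rss / slope ** 2
```

The bound grows where the differential gain curve is flat, which matches the observation that the variance of the angle estimate depends on the angle.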

Example of the least square algorithm for AoA modelling: consider the following example with two
RSRP values and two neighbouring cell IDs, as shown in figure 6.5. We get the DoD (degree of
departure) measured clockwise from North, with the base station as the origin (zero degrees).
Table 6.4 presents the cell ID and RSRP values for the least square algorithm.

Table 6.4: Cell ID and RSRP Value for LS

Physical Cell Id     10       11
Azimuth angle        10       70
RSRP measurements    -137.5   -142.5


(Plot: gain difference in dBm against angle in degrees, showing the curve H12 and the measured
RSRPdiff level.)

Figure 6.6: Antenna Gain Differential Diagram for cell Id-1 and cell Id-2

Figure 6.6 presents the differential antenna gain diagram between cell IDs 10 and 11. We again use
the fact that the difference in RSRP equals the difference in horizontal gain, so the horizontal
angle is also referred to as the direction of departure (DoD). Solving the least square problem
gives two solutions, of which one is the true DoD. It can be selected using the main beam-width
direction of the radio signal of the identified cells.

With the values in table 6.4, the difference in RSRP is 5. Drawing a horizontal line at 5 gives
DoD candidates of 37 degrees and 331 degrees. The first pair of cells, 10 and 11, has beam-width
directions of 10 and 70 degrees. This serves as the filtering angle and gives the correct degree
of departure as 37 degrees.
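The procedure of drawing a horizontal line through the differential gain curve and then filtering the candidates can be sketched as follows. The gain coefficients are hypothetical, so the resulting angle differs from the 37 degrees of the worked example; the function names and the rule of picking the candidate closest to the serving cell azimuth are assumptions for illustration.

```python
import math

ETA_EXAMPLE = (8.0, 8.0, 3.6)   # hypothetical gain coefficients


def gain(alpha_deg, theta_deg, eta=ETA_EXAMPLE):
    delta = math.radians(alpha_deg - theta_deg)
    return sum(coeff * math.cos(n * delta) for n, coeff in enumerate(eta))


def circular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def solve_dod(rsrp1, rsrp2, az1_deg, az2_deg, serving_az_deg, step_deg=0.5):
    """Find angles where H12(theta) equals the RSRP difference, then filter.

    Returns the candidate closest to the serving cell azimuth, or None if
    the differential gain curve never reaches the measured difference.
    """
    target = rsrp1 - rsrp2
    thetas = [i * step_deg for i in range(int(360 / step_deg))]
    residual = [gain(az1_deg, t) - gain(az2_deg, t) - target for t in thetas]
    candidates = []
    for i, theta in enumerate(thetas):
        r0 = residual[i]
        r1 = residual[(i + 1) % len(thetas)]   # wrap around at 360 degrees
        if r0 == 0.0 or r0 * r1 < 0.0:
            # Linear interpolation of the crossing between two grid points.
            fraction = 0.0 if r0 == 0.0 else r0 / (r0 - r1)
            candidates.append((theta + fraction * step_deg) % 360.0)
    if not candidates:
        return None
    return min(candidates, key=lambda c: circular_distance(c, serving_az_deg))
```

Calling `solve_dod(-137.5, -142.5, 10, 70, serving_az_deg=10)` with the values of table 6.4 finds two crossings of the level 5 and keeps the one nearer to the serving cell azimuth.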

DoD is measured clockwise from North, with the base station as the origin (zero degrees). Figure
6.7 presents the AoA measurement as seen from the eNodeB. AoA is defined as the estimated angle of
a UE with respect to a reference direction, which is geographical North, positive in the
counter-clockwise direction, as seen from an eNodeB. Figure 6.8 presents the conversion from DoD
to AoA for all four quadrants. AoA is the angle of arrival for the UE in terms of Cartesian
coordinates. Hence the DoD has to be converted to geometric Cartesian coordinates with respect to
the x and y axes for use with the arc tan function. Thus we get AoA = 90 − DoD = 53 degrees.
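The conversion of the worked example can be written as a one-line function. The wrap into the range 0 to 360 degrees is an assumption added here so that the same expression covers all four quadrants; the function name is illustrative.

```python
def dod_to_aoa(dod_deg: float) -> float:
    """Convert a DoD (clockwise from North) to an AoA measured
    counter-clockwise from the x axis, wrapped into [0, 360)."""
    return (90.0 - dod_deg) % 360.0
```

For the example above, `dod_to_aoa(37.0)` gives 53.0, and the rejected candidate of 331 degrees would map to 119.0.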


Figure 6.7: Angle of Arrival(AoA)

Figure 6.8: AoA from DoD


Figure 6.9: 6-Sector Symmetric-cells[68]

6.2.4 Verification of antenna modelling for an Ericsson RBS

By modelling a look-up table (LUT), we have optimized both performance and functionality. Figure
6.9 presents the average RSRP availability measured by the various symmetric cells and the main
beam width or azimuth direction orientation of the antennas.

(1) Verification Of Antenna Gain LUT:

The main beam direction for cell ID 10 is 10 degrees. Correspondingly, this cell should have its
maximum gain of 19.6 at 10 degrees.











(2) Verification Of Differential Antenna Gain LUT:

The maximum gain difference for antenna pair H12 at 10 degree is +23 and at 70 degree is -23.







After many live tests with wireless modelling of the base station, we arrived at a field
acceptable and performance optimized algorithm for estimating the DoD angle, given that n RSRP
measurements are available.

If n=1:

DoD = serving cell azimuth angle.

DoD variance = 30 (since a symmetric difference of 60 degrees in azimuth angle is maintained
between any two consecutive cells).

Else if n=2:

DoD is solved by least square estimation and filtered by choosing the estimate closest to the
serving cell azimuth angle.

DoD variance is obtained from the derivative of the differential antenna gain, using the variance
LUT.

Finally, both DoD and DoD variance are converted to AoA and AoA variance, as in the example above.
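The selection rule can be sketched as a small function. The least square solver and the variance look-up are passed in as callables because their implementation is not reproduced here; the function names, the argument layout and the error raised for other values of n are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def circular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def estimate_dod(rsrp_by_cell: Dict[int, float],
                 serving_cell: int,
                 azimuth_by_cell: Dict[int, float],
                 ls_candidates: Callable[[Dict[int, float]], List[float]],
                 ls_variance: Callable[[float], float]) -> Tuple[float, float]:
    """Return (DoD, DoD variance) for one or two available RSRP values."""
    serving_az = float(azimuth_by_cell[serving_cell])
    n = len(rsrp_by_cell)
    if n == 1:
        # Only the serving cell is known: use its azimuth with variance 30.
        return serving_az, 30.0
    if n == 2:
        candidates = ls_candidates(rsrp_by_cell)
        dod = min(candidates, key=lambda c: circular_distance(c, serving_az))
        return dod, ls_variance(dod)
    raise ValueError("this sketch handles one or two RSRP measurements only")
```

With a stub solver that returns the two candidates of the worked example, 37 and 331 degrees, and a serving cell azimuth of 10 degrees, the function keeps 37 degrees.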

6.2.5 Requirement of Particle Filter to process massive UE

Whenever there is a new IMSI to be processed, the particle filter loop shown in figure 6.10 is
started for every measurement of AoA, AoAVariance and TA. We create a new set of N random
particles around the base station, within a radius of 1750 meters centered on the base station
coordinates. A state vector is attached to every particle. To get good trajectory functionality,
we set N to 5000 particles. Each particle has a state vector containing positions and velocities
in the X and Y directions of an XY coordinate system, along with an importance weight which
determines the importance of the particle during filter processing.

Once the set of N particles has been initialised with equal weights within the circle, the
particle filter algorithm starts its computationally intensive loop over all 5000 particles, as
shown in the figure. As the UE moves and the filter progresses, the observed measurement data are
used in the sensor model during the measurement update phase: TA gives the distance from the base
station, and AoA and AoA variance give the angle information.

Once the computationally intensive loop, consisting of state prediction, measurement update and
normalization, has been processed for all 5000 particles, the re-sampling phase starts. A particle
with a higher importance weight has a higher probability of being sampled and selected in the
re-sampling process. The particles selected by re-sampling form the current state of the UE, which
is needed for the next measurement.
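One iteration of such a particle filter can be sketched as follows. This is a generic bootstrap particle filter with a constant velocity motion model, Gaussian likelihoods and multinomial re-sampling; the noise parameters, the likelihood form and the function names are assumptions and are not taken from the trajectory software.

```python
import math
import random


def init_particles(n, radius_m=1750.0, rng=random):
    """Draw n particles [x, y, vx, vy] uniformly inside a circle around the RBS."""
    particles = []
    for _ in range(n):
        r = radius_m * math.sqrt(rng.random())   # sqrt gives uniform area density
        phi = 2.0 * math.pi * rng.random()
        particles.append([r * math.cos(phi), r * math.sin(phi), 0.0, 0.0])
    return particles, [1.0 / n] * n


def pf_step(particles, weights, dist_m, aoa_deg, aoa_var_deg2,
            dt=1.0, sigma_dist_m=78.0, sigma_acc=1.0, rng=random):
    """State prediction, measurement update, normalization and re-sampling."""
    n = len(particles)
    new_weights = []
    for p, w in zip(particles, weights):
        # State prediction: constant velocity model with random acceleration.
        p[2] += rng.gauss(0.0, sigma_acc) * dt
        p[3] += rng.gauss(0.0, sigma_acc) * dt
        p[0] += p[2] * dt
        p[1] += p[3] * dt
        # Measurement update: compare particle range and angle with TA and AoA.
        range_m = math.hypot(p[0], p[1])
        angle_deg = math.degrees(math.atan2(p[1], p[0]))
        d_angle = (angle_deg - aoa_deg + 180.0) % 360.0 - 180.0
        likelihood = (math.exp(-0.5 * ((range_m - dist_m) / sigma_dist_m) ** 2)
                      * math.exp(-0.5 * d_angle ** 2 / aoa_var_deg2))
        new_weights.append(w * likelihood)
    total = sum(new_weights)
    if total == 0.0:
        new_weights = [1.0 / n] * n            # all likelihoods underflowed
    else:
        new_weights = [w / total for w in new_weights]
    # Re-sampling: particles with higher weight are selected more often.
    chosen = rng.choices(range(n), weights=new_weights, k=n)
    return [list(particles[i]) for i in chosen], [1.0 / n] * n
```

After each step the mean of the re-sampled particles can be taken as the position estimate, and the re-sampled set is carried over as the starting point for the next measurement of the same UE.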

In a hidden Markov process, the future state depends on the past only through the present. Given
the current state of the UE, the current observation is independent of all past states and
observations. Yet successive observations are not independent; they are correlated through the
hidden state, as shown in figure 6.11.

Figure 6.10: Particle Filter Algorithm loop

Figure 6.11: Hidden Markov Model[47]

In the particle filter, the goal is to estimate the state vector $x_t$ as accurately as possible
given the previous state, the current measurement and all previous measurements $y_{1:t-1}$. In
the hidden Markov model of the above figure, all state variables are hidden information and only
the measurement variables are observable. The diagram shows the Markov chain with its state
dependencies. The Markov assumption implies that the current state $x_t$ already contains all the
information about the earlier states, and that the measurements up to $y_{1:t-1}$ contain no extra
information that is not included in the state $x_t$. So, to find the next state, the particle
filter needs only the previous state vector and the current measurement.

Consumer model showing functional components for massive UE

For processing massive UE, the Markov assumption simplifies the dependencies: only the current
state vector has to be stored after processing every measurement. This requirement has to be
fulfilled for every UE during multi-core scheduling, as shown in figure 6.12.

6.3 Parallel processing with AXM 5512 Multi-core

6.3.1 Parallel Programming requirements for Multi-core

The following issues need to be considered for parallel programming on multicores[53,61]:

Figure 6.12: Functional Components

• Function re-entrancy - A function is re-entrant if it can be invoked while it is already
executing, that is, if it can be interrupted in the middle of execution and invoked again before
the interrupted execution completes. For a function to be re-entrant, it must fulfil the following
conditions[53]:

– All data must be supplied by the caller function.

– The function must not hold static or global data over successive calls.

– The function cannot return a pointer to static data.

– The function cannot itself call functions which are not re-entrant.

• Thread Safeness - A piece of code or a function is thread safe if it can be called
simultaneously from multiple threads even when the invocations use shared data, because all
references to the shared data are serialized. For a function to be thread safe, it must protect
shared data with locks. This means that the implementation needs to be changed by adding
synchronization blocks or primitives to protect concurrent accesses to shared resources (critical
sections) from different threads.

Although a re-entrant function is more likely to be thread-safe , a re-entrant function need not be

thread safe and a thread safe function need not be re-entrant.

A re-entrant function can be invoked, interrupted and re-invoked simultaneously by multiple

threads if and only if each caller(invocation) has references that provides unique input data.

A thread-safe function can be invoked simultaneously by multiple threads, even if each invocation

references same data or input, as all access to shared data is serialized.

• Data race and Locks - Most modern applications are designed and implemented as multi-threaded
applications for multi-core targets, where all the cores execute their tasks in parallel.
Debugging such a parallel application adds further complexity: in addition to the earlier
challenges, there are new challenges in debugging a multi-threaded application while it executes
on multiple cores. The two most prominent categories are 1. race conditions and 2. deadlock
situations.

A race condition occurs when two or more threads share a global variable and there is no proper
synchronization mechanism to block one thread on one core while another thread executes an
operation on that shared variable on another core.

Race conditions are caused by incorrect or missing synchronization on shared variables.
Synchronization mechanisms such as mutexes, semaphores and atomic operations are available in most
thread libraries. They are provided as a means to set up critical regions and so avoid race
conditions.

A program that relies on threads executing in a particular sequence to work correctly may have a
race condition. A data race causes non-deterministic execution traces and hence wrong results. So
locks are needed to access shared resources, but the problem with locks is that they tend to
serialize the code region. Hence it is advisable to implement locking for data and not for code.

• Cache contention and false sharing - If multiple threads are using data which reside within the

same coherent cache lines, there can be cache line migration overhead even if the actual variables

are not shared. False sharing can happen when a processor regularly accesses data that is never

changed by another processor and this data shares a cache line with data that will be altered by

another processor.

• Dead Lock and Live lock - Deadlock is the situation where two (or more) threads are each waiting
for another thread to release a shared resource. Such threads are effectively blocked, waiting for
a lock that can never be released. Livelock occurs when multiple threads are able to execute
without blocking, but the system as a whole, considering all the cores, stagnates and slows down
because of a repeated access pattern of resource contention.
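The advice to lock data rather than code can be illustrated with a shared counter that carries its own lock. This is a generic sketch using Python's `threading` module, not code from the trajectory software; the class and function names are illustrative.

```python
import threading


class SharedCounter:
    """Shared data that owns its lock, so every access path is serialized."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        # The critical section covers only the read-modify-write of the data.
        with self._lock:
            self.value += 1


def run_workers(counter: SharedCounter, n_threads: int = 4,
                n_increments: int = 10_000) -> int:
    """Increment the shared counter from several threads and return the total."""
    def work() -> None:
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because the lock belongs to the data object, any function that updates the counter is serialized on that data only, and unrelated code regions can still run in parallel.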

6.3.2 Single core/Single thread scheduler

The performance of code regions has to be analysed in order to design software for efficient cache
re-use[70,59]. Figure 6.13 presents the performance of the main thread function with respect to
its various code regions.

A child region is a function that calls an algorithm. Exclusive measurements cover a code region
excluding its child regions. Inclusive measurements cover the child regions along with the
exclusive measurements. The child regions include the functional components of the algorithms,
such as least square antenna modelling, the particle filter and an open source MGRS (Military Grid
Reference Software) component for latitude and longitude coordinate conversion[62].

Figure 6.13: Consumer modelling showing callback functions with code regions

Figure 6.14: DataOfInterest

Cache friendly Data structures with predictable access pattern

Predictable memory access patterns, using forward and backward pointers in a linked list, are
often used to obtain cache friendly memory access by cores with private caches.

Figure 6.14 presents the task level modelled DataOfInterest data structure. As we have seen
earlier, the antenna model takes the RSRP values, the physical cell IDs and their corresponding
beam-width, and gives the angle of arrival and the angle of arrival variance. The figure shows
DataOfInterest consisting of the IMSI of a particular UE and an EventData list, which is a doubly
linked list with forward and backward pointers. Each EventData node in the list holds a set of
cell trace data such as the set of RSRPs, the serving cell ID, the other physical cell IDs, the TA
and the time stamp. The antenna model uses each EventData item by traversing from eventDataTail to
produce the "AoA" angle and "AoAVariance" in increasing time steps. Since all the members of the
structure are accessed and processed by the algorithm in one go, adjacent data are read with
spatial locality, which gives a higher cache hit rate.

Preparation of filter data:

Compared with a Structure of Arrays, an Array of Structures has better cache locality when all members of a structure are accessed together, because each object is kept together in the memory layout. In that case an Array of Structures is preferable. In addition, compared with static memory allocation, dynamic memory allocation has the benefit that the data structure can grow and shrink to fit changing data requirements. The number of "AoA" values available together with valid "TA" pairs within one minute is known from a counter variable, which the antenna module increments for every measurement during batch processing, so the filter data can be allocated dynamically with exactly that size. This is required because the counter variable can have different values for different UEs.
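As a rough illustration of the Array of Structures choice with a per-UE measurement count, the sketch below allocates one contiguous array sized by the counter. The type name `FilterMeasurement`, its fields and the helper `alloc_measurements` are hypothetical and not taken from the thesis code.

```c
#include <assert.h>
#include <stdlib.h>

/* One measurement handed to the particle filter; names are illustrative. */
typedef struct {
    double aoa;
    double aoaVariance;
    int    ta;
} FilterMeasurement;

/* Allocate exactly `count` measurements as one contiguous, zeroed array of
 * structures. `count` is the per-UE counter kept by the antenna module.
 * The caller releases the array with free(). */
static FilterMeasurement *alloc_measurements(size_t count)
{
    assert(count > 0);
    return calloc(count, sizeof(FilterMeasurement));
}
```

Because `aoa`, `aoaVariance` and `ta` of one measurement sit next to each other, a loop that uses all three fields per iteration reads memory sequentially.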

Cache friendly UeDataPost structure

The convert latitude longitude function, which is part of the open source MGRS software, takes the horizontal and vertical XY position in each PostData element, converts it to latitude and longitude values and stores them back in the PostData element.

Critical section and Locks among various interface threads

The threads communicate through shared memory, and the common data structures are shared across all the threads. The critical sections used in the program are as follows:

• Critical section-1 - The lock is taken in the event streamer thread when it posts DataOfInterest task units at the head side, known as availableDataHead. A variable in the AvailableList structure holds the number of DataOfInterest nodes in the AvailableList, that is, the number of task units. This count variable is incremented by one within the lock whenever the streamer thread attaches or posts a task unit. The same lock is used in the main thread, which reads the DataOfInterest from "AvailableDataTail", decrements the count variable and frees the AvailableList queue elements. Together, the AvailableData linked list based queue in the events streamer thread and the static buffer in the main thread form a hybrid buffer arrangement for input buffering on a single core with a single thread.

• Critical section-2 (Massive UE list in Consumer thread) - A list is needed to store the particle state vectors for every unique IMSI, including new IMSIs. This list is a critical section: when multiple consumer threads try to add a new IMSI, a race condition can occur. A lock is therefore needed on the head side of the list, and every new IMSI is added at the head side.

• Critical Section-3 (Consumer threads and Output interface thread) - The data structure UeDataPost is shared between the consumer threads and the output interface thread. This lock is similar to the first critical section and is shared by all consumer threads and the output interface thread.

Figure 6.15: Producer-Consumer and bounded buffer
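Critical section-1 can be sketched as a linked queue whose head insertion, tail removal and count update all happen under one mutex. The names `Node`, `AvailableList`, `list_post` and `list_fetch` are illustrative assumptions, and an `int` stands in for a DataOfInterest task unit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct Node {
    int          taskId; /* stands in for a DataOfInterest task unit */
    struct Node *prev;   /* towards the head (newer) */
    struct Node *next;   /* towards the tail (older) */
} Node;

typedef struct {
    Node           *head;
    Node           *tail;
    int             count; /* number of task units in the list */
    pthread_mutex_t lock;
} AvailableList;

static void list_init(AvailableList *l)
{
    l->head = l->tail = NULL;
    l->count = 0;
    pthread_mutex_init(&l->lock, NULL);
}

/* Streamer side: attach at the head and increment the count under the lock. */
static void list_post(AvailableList *l, Node *n)
{
    assert(l != NULL && n != NULL);
    pthread_mutex_lock(&l->lock);
    n->prev = NULL;
    n->next = l->head;
    if (l->head != NULL) l->head->prev = n; else l->tail = n;
    l->head = n;
    l->count++;
    pthread_mutex_unlock(&l->lock);
}

/* Main thread side: detach from the tail and decrement the count
 * under the same lock. Returns NULL when the list is empty. */
static Node *list_fetch(AvailableList *l)
{
    pthread_mutex_lock(&l->lock);
    Node *n = l->tail;
    if (n != NULL) {
        l->tail = n->prev;
        if (l->tail != NULL) l->tail->next = NULL; else l->head = NULL;
        n->prev = n->next = NULL;
        l->count--;
    }
    pthread_mutex_unlock(&l->lock);
    return n;
}
```

Keeping the count update inside the same lock as the pointer update means a reader never observes a count that disagrees with the list contents.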

6.3.3 Real time Multi-threaded Scheduler design for MPcore

In the general design of a producer, a consumer and a bounded buffer, the following issues apply. A producer component cannot insert a DataOfInterest task into the bounded buffer when the buffer is full. A consumer component cannot fetch and process a DataOfInterest task from the bounded buffer when the buffer is empty.

In a multi-core scenario with more than two Arm cores and more than two consumers, as shown in figure 6.15, there is the further complexity of maintaining this functionality while all the cores keep processing in parallel without waiting or sleeping, so that the load is balanced across all the cores.
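The two bounded buffer conditions (no insert when full, no fetch when empty) can be shown with a small fixed-size ring buffer. This is a single-threaded sketch under assumed names and an assumed capacity of four; a multi-threaded version would additionally need the locks or semaphores discussed in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFFER_CAPACITY 4 /* assumption: illustrative size */

typedef struct {
    int items[BUFFER_CAPACITY]; /* stands in for DataOfInterest task units */
    int head;  /* next write position */
    int tail;  /* next read position */
    int count; /* number of stored items */
} BoundedBuffer;

/* Producer side: returns false when the buffer is full. */
static bool buffer_put(BoundedBuffer *b, int item)
{
    assert(b != NULL);
    if (b->count == BUFFER_CAPACITY)
        return false;
    b->items[b->head] = item;
    b->head = (b->head + 1) % BUFFER_CAPACITY;
    b->count++;
    return true;
}

/* Consumer side: returns false when the buffer is empty. */
static bool buffer_get(BoundedBuffer *b, int *item)
{
    assert(b != NULL && item != NULL);
    if (b->count == 0)
        return false;
    *item = b->items[b->tail];
    b->tail = (b->tail + 1) % BUFFER_CAPACITY;
    b->count--;
    return true;
}
```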


Figure 6.16: Producer Modelling

• Producer Modelling - The producer modelling consists of three threads. The cell trace streamer interface thread and the mapping streamer interface thread were described in the previous chapter; this section describes how the two streamer threads are interfaced with the main thread. Figure 6.16 shows the producer modelling for massive user equipment.

• Cell trace streamer interface thread - A binary semaphore event flag is used in the events streamer interface thread to indicate the presence of DataOfInterest tasks for the UEs found by cell trace. As seen in the previous section, every UE present in the vicinity of the base station has an MMES1APID.

• Mapping streamer interface thread - A mapping fifo is shared between the main thread and the mapping streamer. The main thread is only a reader of the list of mapping events, and the mapping streamer thread is the writer of the mapping events list. The mapping streamer decodes the MMES1APID along with its mapping information, the IMSI identity number. Since the IMSI is a permanent and globally unique identifier of a UE, the "IMSI" must be found as mapping information before DataOfInterest is placed in the bounded buffer queue. To achieve this, a mapping fifo list is maintained and each incoming MMES1APID and IMSI pair is added at the head side of the fifo list. The main thread, being only a reader, reads from the tail side of the mapping fifo list, finds the IMSI identity, which is the permanent identity for the corresponding MMES1APID, and then adds the task to the bounded buffer queue. A binary semaphore event flag is used in the mapping streamer interface to indicate the mapping information obtained.

• Main thread - The main thread keeps monitoring the binary semaphore event flag from the mapping streamer interface and the binary semaphore event flag from the events streamer interface. Once both flags are set to one, the main thread, as part of the producer modelling component, adds the DataOfInterest task with its IMSI identity at the head side of the queue and increments the count semaphore. The count semaphore holds the number of task items present in the bounded buffer for the consumer threads/cores to process.

Figure 6.17: Thread and IMSI ID Buffer

Need of Critical Section

This section discusses the multi-threaded scheduler modelling for the bounded buffer requirements.

• Unique IMSI identity - To satisfy the hidden Markov model requirement for massive UE, the multicore scheduler must process only unique IMSIs at any time. Intuitively, no two consumer threads/cores should process the same IMSI. Formally, during parallel processing on multiple cores, all the parallel consumer threads/cores can only process or schedule different IMSIs at run time. If this requirement is not met, the consumer threads will interleave and process a future DataOfInterest for an IMSI before the present DataOfInterest for the same IMSI has been processed, which violates the hidden Markov requirement.

For this reason, a run-time mechanism is needed to compare the IMSI IDs currently being scheduled across all the parallel processing cores and to select a unique IMSI ID for each consumer worker thread.

• Shared static buffer for multiple consumer threads - Figure 6.17 shows the CpuProcessingIMSI buffer, which is shared across all consumer threads.

This is a shared static buffer with four columns and two rows. The number of columns depends on the available cores/threads. In each column, the first row holds the thread ID and the second row holds the IMSI ID currently being scheduled by that thread.


Figure 6.18: Bounded Buffer

• Producer-Consumer shared bounded buffer - The bounded buffer in multi-threaded scheduling is a dynamic queue based list shared by the producer and all consumer threads. Figure 6.18 presents the list based bounded buffer as dataOfInterestqueue, consisting of DataOfInterest task units. With multi-core, all the consumer threads are ready to process their allocated tasks. During parallel execution, only one consumer thread/core at a time can find its unique IMSI ID and decrement the count semaphore. While this is happening, the producer component must not insert a DataOfInterest at the head pointer side of the bounded buffer and increment the count semaphore.

Every consumer thread starts by accessing the dataOfInterestTail node of the bounded buffer. It can then access and detach any node, whether the dataOfInterestTail node, a middle node or the dataOfInterestHead node, by comparing against the CpuProcessingIMSI buffer of figure 6.17 to find an IMSI ID that is different. If a matching IMSI ID is found, the search proceeds to the previous node of dataOfInterestTail to find a different IMSI. The following scenarios need to be considered when determining the unique IMSI ID of a UE to schedule:

• The unique IMSI ID related DataOfInterest structure can be any node between the queue head and tail pointers, but not the node pointed to by the head or tail pointer.

• The unique IMSI ID related DataOfInterest structure could be the node pointed to by the tail pointer of the queue.

• The unique IMSI identity related DataOfInterest structure could be the node pointed to by the head pointer of the queue.

• The unique IMSI identity related DataOfInterest structure could be the node pointed to by both the head pointer and the tail pointer of the queue. This scenario occurs for the last or only node remaining in the queue to be processed.
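The four scenarios above map onto the usual unlink operation for a doubly linked list. The sketch below is illustrative: `Task`, `TaskQueue`, `queue_find_unique` and `queue_detach` are assumed names, `prev` is taken to point towards the head, and an IMSI of zero in the busy array is assumed to mean an idle thread. Locking is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Task {
    uint64_t     imsi;
    struct Task *prev; /* towards the head */
    struct Task *next; /* towards the tail */
} Task;

typedef struct {
    Task *head;
    Task *tail;
    int   count;
} TaskQueue;

/* Starting at the tail and moving towards the head, return the first task
 * whose IMSI is not in `busy` (the IMSIs currently being processed).
 * Returns NULL if every queued task clashes. */
static Task *queue_find_unique(const TaskQueue *q, const uint64_t *busy, int nbusy)
{
    for (Task *t = q->tail; t != NULL; t = t->prev) {
        int clash = 0;
        for (int i = 0; i < nbusy; i++)
            if (busy[i] == t->imsi)
                clash = 1;
        if (!clash)
            return t;
    }
    return NULL;
}

/* Unlink `t`, which must be a member of `q`, from any position:
 * middle node, tail node, head node, or the only remaining node. */
static void queue_detach(TaskQueue *q, Task *t)
{
    assert(q != NULL && t != NULL && q->count > 0);
    if (t->prev != NULL) t->prev->next = t->next; /* t is not the head */
    else                 q->head = t->next;        /* t was the head */
    if (t->next != NULL) t->next->prev = t->prev; /* t is not the tail */
    else                 q->tail = t->prev;        /* t was the tail */
    t->prev = t->next = NULL;
    q->count--;
}
```

When `t` is the only node, both `else` branches run and the queue ends with `head` and `tail` set to NULL, which covers the last scenario in the list.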


Figure 6.19: False Sharing[69]

Figure 6.20: Thread Level Parallelism

Consumer modelling

As seen in an earlier section, caches consist of lines, each holding 64 bytes of adjacent words. The multi-core chapter showed that main memory is both read and written in units of cache lines. A cache line holds very little data. If a byte to be read is not present in the cache, the full cache line is read from main memory.

On a memory write that changes any bytes of a 64 byte cache line, the core always writes the full cache line back to main memory.

As shown in figure 6.19, the CpuProcessingIMSI buffer is shared across all consumer threads/CPU cores. By applying a lock to the critical section, the hidden Markov property is fulfilled and false sharing is avoided, but the result is lock contention across all cores, which becomes a communication bottleneck.

Finally, each task is processed by a different core as shown in figure 6.20. After processing the antenna module for AoA, the particle filter and the open source MGRS module for the latitude and longitude data of the trajectory, the thread that processed the IMSI ID resets its IMSI entry in the CpuProcessingIMSI buffer to zero to indicate that the thread/core is ready to process the next task.
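A minimal sketch of the CpuProcessingIMSI table with a claim step and the reset-to-zero step might look like this. The function names `claim_imsi` and `release_imsi`, the slot numbering and the assumption that a real IMSI is never zero are illustrative choices, not the thesis code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_WORKERS 4 /* four columns, one per consumer thread/core */

typedef struct {
    int             threadId[NUM_WORKERS]; /* first row */
    uint64_t        imsi[NUM_WORKERS];     /* second row, 0 means idle */
    pthread_mutex_t lock;
} CpuProcessingIMSI;

static CpuProcessingIMSI table = {
    {0, 1, 2, 3}, {0, 0, 0, 0}, PTHREAD_MUTEX_INITIALIZER
};

/* Record `imsi` for `slot` only if no thread is already processing it.
 * Returns true when the claim succeeds. */
static bool claim_imsi(int slot, uint64_t imsi)
{
    assert(slot >= 0 && slot < NUM_WORKERS && imsi != 0);
    bool ok = true;
    pthread_mutex_lock(&table.lock);
    for (int i = 0; i < NUM_WORKERS; i++)
        if (table.imsi[i] == imsi)
            ok = false;
    if (ok)
        table.imsi[slot] = imsi;
    pthread_mutex_unlock(&table.lock);
    return ok;
}

/* Reset the entry to zero to signal that the thread is ready for a new task. */
static void release_imsi(int slot)
{
    assert(slot >= 0 && slot < NUM_WORKERS);
    pthread_mutex_lock(&table.lock);
    table.imsi[slot] = 0;
    pthread_mutex_unlock(&table.lock);
}
```

The comparison and the write happen inside one lock, so two threads cannot both claim the same IMSI; this is also where the lock contention mentioned in the text arises.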

6.3.4 OpenMP based Centralized Task level scheduler for MPcore

In the OpenMP[60] based scheduler, the locking between the producer and multiple consumers described above is optimized: with the following buffering arrangement, lock contention between the producer thread and the multiple consumer threads is avoided.

Producer modelling

As in the single core scheduler, a similar hybrid buffer is used to bring the massive UE list containing DataOfInterest task units to the main thread. The scheduler is designed so that each worker thread processes DataOfInterest task units related to a different IMSI.

Bounded buffer:

In this scheduler the bounded buffer is a centralized static buffer, as shown in figure 6.21. This static buffer is assumed to be initially empty.

The producer component consists of the events streamer thread, the mapping streamer thread and the main thread, which is the master thread. These are delayed until the static bounded buffer is empty. The static buffer is then filled with a different IMSI in each index location, by comparing each candidate with every other location.
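The filling rule (a different IMSI in each index location) can be sketched as below. To stay short, the sketch copies bare IMSI values rather than moving DataOfInterest task units, and `fill_static_buffer` and `NUM_SLOTS` are assumed names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 4 /* one slot per parallel thread/core */

/* Scan the pending IMSIs in order and copy each one into the next free slot
 * unless an equal IMSI is already in the buffer. Stops when the buffer is
 * full. Returns the number of slots filled. */
static int fill_static_buffer(const uint64_t *pending, size_t npending,
                              uint64_t slots[NUM_SLOTS])
{
    assert(pending != NULL || npending == 0);
    int filled = 0;
    for (size_t i = 0; i < npending && filled < NUM_SLOTS; i++) {
        int duplicate = 0;
        for (int j = 0; j < filled; j++)
            if (slots[j] == pending[i])
                duplicate = 1;
        if (!duplicate)
            slots[filled++] = pending[i];
    }
    return filled;
}
```

A return value below `NUM_SLOTS` corresponds to the case where fewer distinct UEs are pending than there are cores.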

OpenMP based Consumer modelling

In this modelling the code of the parallel region is identical for all threads. The parallel region is both thread safe and re-entrant, as shown in figure 6.22.

All OpenMP programs begin as a single process called the master thread. The master thread executes sequentially, fetching DataOfInterest task units from the static part of the hybrid buffer. The master thread compares all task units and finds four task units related to different IMSIs to place in the centralized static buffer. The number of task units that the centralized static buffer can hold is equal to the number of parallel threads/cores, as shown in the figure. The master thread executes all of this sequential code region until the first parallel region construct is encountered.

Figure 6.21: OpenMPScheduler

OMP Parallel for: The master thread then creates a team of parallel threads with this parallel region construct, as shown in the figure.

The statements enclosed by the parallel region construct are then executed in parallel by the team threads. As shown in the figure, antenna array modelling gives the AoA, which is used by the sensor model in the particle filter discussed in earlier sections to give a state vector with XY coordinates. The last module is the latitude and longitude conversion module, which converts the XY coordinates to geographical latitude and longitude coordinates.

At the end of the for loop, when the team threads complete the statements in the parallel region construct, they synchronize by waiting for the others to finish and then terminate, leaving only the master thread.

The master thread then proceeds to the output interface. With the OpenMP scheduler, all the consumer parallel threads are delayed until the static buffer is fully filled by the producer component, or filled according to the number of UEs. There are two commonly used OpenMP scheduling kinds, known as static and dynamic. Of these, static is typically the default and is frequently used when the parallel region execution time is the same across all parallel threads and the loop count is the same as the number of threads.
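The fork-join pattern described above can be sketched with a `parallel for` over the static buffer slots. Here `process_task` is a hypothetical placeholder for the antenna model, particle filter and latitude/longitude conversion chain, and integers stand in for task units. Without an OpenMP-enabled compiler the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

#define NUM_SLOTS 4 /* one task unit per thread/core */

/* Placeholder for antenna model -> particle filter -> lat/long conversion.
 * It touches no shared state, so it is thread safe and re-entrant. */
static int process_task(int task)
{
    return task * 2;
}

/* The master thread calls this after filling the static buffer. */
static void run_parallel_region(const int tasks[NUM_SLOTS],
                                int results[NUM_SLOTS], int ntasks)
{
    assert(ntasks >= 0 && ntasks <= NUM_SLOTS);
    /* schedule(static) divides the iterations among the team threads
     * in fixed chunks decided before the loop runs. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < ntasks; i++)
        results[i] = process_task(tasks[i]);
    /* Implicit barrier at the end of the loop: after it, only the
     * master thread continues, for example to the output interface. */
}
```

Each iteration writes only its own `results[i]`, so no lock is needed inside the parallel region.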


Figure 6.22: OpenMPScheduler


6.4 Summary

This chapter explained positioning with antenna modelling, AoA calculation and verification of the Ericsson RBS for antenna modelling, followed by the massive UE positioning requirements with the particle filter. Next, the multicore parallel programming requirements were explained, followed by the single core scheduler and the multi-threaded scheduler, which schedule massive UEs while fulfilling those requirements. Finally, the OpenMP scheduler design was explained.

Chapter 7

Result and Discussion

This chapter presents the results obtained for AoA angle modelling with the drive test, followed by the serial and parallel performance engineering results. The discussion section then covers the ground truth measurements along the drive route, the software characteristics, and the serial single core and parallel multicore performance engineering procedures.

7.1 Results

7.1.1 Network based AoA angle Vs GPS based crude AoA angle

Figure 7.1 presents the network AoA versus the GPS based AoA. GPS location data of the UE was taken as reference, and a crude GPS based AoA angle was calculated between the RBS and the UE at different measurement points.

These reference GPS based AoA values are compared with the AoA values derived from the RSRP values received from the network.

7.1.2 Profiling for single core execution with Gprof for AXM 5512 SoC

Gprof is a GNU profiling tool used for measuring the execution time of code.

Figure 7.2 presents the graph of main function calls for the single core serial region.

From the graph we can infer that, of the three major child region algorithms, the particle filter API consumes the most, at 97.29 percent. Figure 7.3 presents the execution profiling graph for the particle filter child region functions only.

Figure 7.1: Network AoA vs GPS based AoA (x-axis: measurement points, 0 to 800; y-axis: AoA in degrees, 0 to 400; series: AoA values (GPS) and AoA values (Network))


Figure 7.2: Gprof graph for main function call


Figure 7.3: Gprof graph for Particle filter functions


Figure 7.4: Real time system constructs initialization order-3

The resample function consumes around 14 percent of the execution time in the particle filter, which is the highest share of any function.

7.1.3 Race detector with Helgrind for four Arm Cortex A15 in AXM 5512 SoC

Race Detection Free Software Analysis:

The following order of initialization is necessary for race free multi-threaded software.

Step 1: Declare the event semaphore flags and mutex locks.

Step 2: Initialize the mutex locks mentioned in section () with the pthread_mutex_init API.

Step 3: Allocate the memory required for the queue based data structures used for shared memory communication between the main thread and the streamer threads.

Step 4: Start the input interface streamer threads and output interface streamer threads for the various client server based communications.

Figure 7.4 presents the order of initialisation for the real time synchronization constructs. Race detection is analysed with multiple threads by simulating 10 UEs, and it is explained in the discussion section.

7.1.4 Heap Memory profiling

Figure 7.5 presents the heap memory requirements for simulating 70 UEs before peak usage.

Figure 7.6 presents the heap memory requirements for simulating 70 UEs after peak usage.

Figure 7.5: heap profiling for massive UE before peak graph-5

Comparing the profiles before and after the peak shows that the re-sample function decreases its memory requirements by 10 percent.

7.1.5 Cache Profiling

Instruction Cache access for Level-1 Private Core Cache

Table 7.1 presents the Level-1 instruction cache hit rate analysis. The instruction cache miss rate increases by about 0.2 percentage points from 1 UE to 10 UEs.

Table 7.1: Instruction Cache Profile

Number of UEs    32k Instruction Cache miss rate    32k Instruction Cache hit rate
1                0.01%                              99.99%
10               0.19%                              99.81%

Data Cache access for Level-1 Private Core Cache

Table 7.2 presents the Level-1 data cache hit rate analysis. The read miss rate stays constant from 1 UE to 10 UEs, but the write miss rate increases by 0.3 percentage points.

Figure 7.6: heap profiling for massive UE after peak-6

Table 7.2: Data Cache Profile

Number of UEs    32k Data Cache miss rate    Read miss rate    Write miss rate    32k Data Cache hit rate
1                1.3%                        1.7%              0.3%               98.7%
10               1.4%                        1.7%              0.6%               98.6%

7.1.6 Scalability analysis with Gustafson's Law for Multi-core scheduling with four Arm Cortex A15

Single Core Scheduler

Table 7.3 presents the single core scheduler results. In 26 minutes, 60 UEs are processed.

Table 7.3: Single Core Execution Profile

Total processing time    Number of UEs that can be processed by 1 Arm A15 core
26 minutes               60

Multi-threaded Scheduler version scalability

Table 7.4 presents the multicore execution profile for scalability analysis with the multi-threaded scheduler. With 4 cores, up to 90 UEs can be processed in 26 minutes of processing time.

Table 7.4: Multicore Execution Profile

Number of UEs that can be processed by    2 Arm A15 cores    3 Arm A15 cores    4 Arm A15 cores
In 26 minutes of processing time          60-70              80                 80-90

OpenMP Scheduler version scalability

Table 7.5 presents the multicore execution profile for scalability analysis with the OpenMP scheduler. With 4 cores, 120 UEs can be processed in 26 minutes of processing time.

Table 7.5: OpenMP Execution Profile

Number of UEs that can be processed by    2 Arm A15 cores    3 Arm A15 cores    4 Arm A15 cores
In 26 minutes of processing time          80-90              90                 120

Figure 7.7 presents the scalability graph for the multi-threaded scheduler for 1 to 4 A15 cores.

Figure 7.8 presents the scalability graph for the OpenMP scheduler for 1 to 4 A15 cores.

Figure 7.7: Scalability for Multi-threaded scheduler graph-7

Figure 7.8: Scalability for OpenMP scheduler graph-8

Figure 7.9 presents the scalability graph comparing the two schedulers with the single core scheduler. The OpenMP scheduler with 4 A15 cores gives a 2X increase in the number of UEs processed compared with the single core scheduler (120 versus 60), whereas the multi-threaded version gives a 1.5X increase (90 versus 60). When the RSRP and TA events are streamed every 5 seconds instead of every second and the number of particles is reduced by a factor of 5, a maximum of 3000 UEs can be processed.
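As an illustration only, the throughput ratios from Tables 7.3 to 7.5 can be read through Gustafson's law. The assumption here, which the thesis does not state, is that the ratio of UEs processed in the same 26 minutes is treated as the scaled speedup S(N) on N = 4 cores, with s the serial fraction of the parallel run, and that the upper value of 90 UEs is used for the multi-threaded scheduler.

```latex
\begin{align*}
S(N) &= N - s\,(N - 1)
  \quad\Longrightarrow\quad
  s = \frac{N - S(N)}{N - 1} \\[4pt]
\text{OpenMP scheduler:}\quad
  S(4) &= \frac{120}{60} = 2,
  & s &= \frac{4 - 2}{3} \approx 0.67 \\[4pt]
\text{Multi-threaded scheduler:}\quad
  S(4) &= \frac{90}{60} = 1.5,
  & s &= \frac{4 - 1.5}{3} \approx 0.83
\end{align*}
```

Under these assumptions, the lower implied serial fraction of the OpenMP scheduler is consistent with the reduced lock contention described for its design.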

7.2 Discussion

7.2.1 Ground truth measurement characteristics for RSRP

This section discusses the measurement characteristics obtained from the drive test performed in Enköping with the RBS equipment described in the test environment chapter.

Signal Strength Characteristics

The drive route shown in figure 7.10 contains segments with different signal strength characteristics, depending on the coverage area of the serving cell. Some segments of the drive route had good signal reception from the serving cell, based on the main beam-width or azimuth angle, whereas other segments had poor signal strength reception.

Figure 7.9: Scalability Comparison graph-9


Figure 7.10: RSRP Signal strength from serving cell along the drive route (x-axis: longitude, 17.075 to 17.12; y-axis: latitude, 59.622 to 59.642; colour bar: RSRP value, 10 to 60)

Figure 7.10 plots the RSRP values from the serving cell at the measurement points along the drive route. The colour of each circle indicates the RSRP strength, and the colour bar on the right side maps the colours to increasing RSRP values. Segments with blue and dark blue circles have weaker signal strength, while the segments with yellow circles have the strongest signal strength.

7.2.2 Measurement Characteristics in drive test

During the live drive test in Enköping, we collected real RSRP measurement data for one UE trajectory logged for 26 minutes. This duration of 26 minutes corresponds to 31 TA data items, otherwise known as 31 DataOfInterest values. Figure 7.11 presents the ground truth measurement characteristics for RSRP from 1 or 2 cells.

The total RSRP data logged for one UE trajectory, including the source and destination of the drive path, is around 1479 measurements. About 1375 measurements are processed through to the output interface. After discarding the RSRP measurements obtained at the start and at the end of the drive test from the agatan test lab, 740 measurement points remain. Of these, 418 measurement points have RSRP values from 2 cells of the same RBS, whereas at 323 measurement points RSRP values are obtained from the serving cell only.

Figure 7.11: Ground truth analysis of RSRP from 1 or 2 cells (x-axis: longitude, 17.075 to 17.125; y-axis: latitude, 59.62 to 59.645)

7.2.3 Software Characteristics

Figure 7.12 presents the main function callgraph, which shows all the functions called from the main thread.

Start trace stream()

The main thread calls this function, which starts a thread with the receive events() function as thread handler. This handler is a network streamer from the eNodeB. It in turn calls the decode event packet required() function, which decodes the performance management (PM) events of cell trace. This function decodes two PM events: "UE MEAS INTRAFREQ1" for RSRP and its associated cell ID, and "INTERNAL PER RADIO UE MEASUREMENT TA" for timing advance values. This input interface streamer and its associated data structure are explained in chapter-3. The lock for the critical section of the data structure shared between the main thread and the events streamer thread is shown in the code snippet of figure 7.13.

Figure 7.12: Main function callgraph

Figure 7.13: lock1-code snippet of Sequential version


receive events()

As seen earlier in section (), cell trace communication is implemented with TCP streaming, unlike UE trace, which uses UDP streaming. Table 7.6 lists the cell trace event types along with their sizes in bytes.

Table 7.6: Event Type and Size

Event Type              Size of data (in bytes)
TCP Stream indicator    425
RSRP Events             80
TA event                168

Based on the desired time interval, the OAM interface can be configured with the frequency at which RSRP data is obtained per second. With a configuration of one RSRP event per second, the RSRP events received for 1 UE per minute amount to:

Total size in bytes = 60 x 80 bytes = 4800 bytes.

With a continuous per second network triggering application or Android application running in the UE, the TA event is received once every minute.

Thus the total event data for 1 UE per minute = 4800 + 168 = 4968 bytes.

Since the test drive lasted 26 minutes, the total bytes of data sent through the cell trace input interface for various numbers of UEs are shown in table 7.7.

Table 7.7: Event Data Sizes

S.No    Number of UEs    Events streamer data for 1 minute    Events streamer data for 26 minutes
1       1                4.968 KB                             154 KB
2       10               49 KB                                1.54 MB
3       100              496 KB                               15.4 MB
4       1000             4.96 MB                              154 MB

Referring to the table above, the software should be able to process 496 KB of task data in one minute in order to process 100 IMSI UEs. When the number of threads/cores is increased, the software should process 4.96 MB of task unit data in one minute in order to process 1000 IMSI UEs.
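The per-minute figures in the table follow from the event sizes in Table 7.6. The short sketch below reproduces that arithmetic, assuming one RSRP event per second and one TA event per minute as stated in the text; the names are illustrative.

```c
#include <assert.h>

enum {
    RSRP_EVENT_BYTES    = 80,  /* Table 7.6 */
    TA_EVENT_BYTES      = 168, /* Table 7.6 */
    RSRP_EVENTS_PER_MIN = 60,  /* one RSRP event per second */
    TA_EVENTS_PER_MIN   = 1    /* one TA event per minute */
};

/* Event streamer bytes produced in one minute for `ues` UEs. */
static long bytes_per_minute(long ues)
{
    assert(ues >= 0);
    return ues * (RSRP_EVENTS_PER_MIN * RSRP_EVENT_BYTES
                  + TA_EVENTS_PER_MIN * TA_EVENT_BYTES);
}
```

For example, `bytes_per_minute(100)` gives 496800 bytes, which matches the 496 KB entry for 100 UEs.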

Start mapping stream()

The main thread calls this function. The client test file drives data to the mapping logger text file. The streamer thread decodes the bit packed binary file to obtain the IMSI identifier corresponding to every Mmes1apid received. This function maintains a mapping list for UEs with the Mmes1apid and the IMSI identity number.


Figure 7.14: Antenna model child region callgraph-10

Start ue data post()

The trajectory points are stored in the Post Data queue, which is stored inside the Ue Data Post queue. Multiple consumer threads running in parallel on multiple cores can interleave, execute in parallel and store the final trajectory points. The trajectory points are stored as successive latitude and longitude points in the Post Data queue, which is pushed to the Ue Data Post queue holding the list of massive IMSIs. This function keeps posting the trajectory information for all processed massive UEs to the CMT server.

get AoA from RSRP()

Figure 7.14 presents the callgraph of the child region in the antenna module. This function takes the event data structure as input argument, and calculates and assigns AoAangle and AoAvariance through call by reference. Internally, this function calls single cell angle() when only one rsrp is available, and calls Get Serving Cell LS() and Get AoAangle LS() when two rsrp values are available. This function and all the functions it calls are both thread-safe and re-entrant.

Get Serving Cell LS ()

This function is invoked when two rsrp values and their corresponding cell identities are present in the event data structure (as shown in appendix-2) from cell trace. Depending on the neighbouring cell identity pair, this function finds the differential antenna look up table (LUT) and the variance LUT. The LUT is modelled using Matlab for all physical cells within the base station.

Get AoAangle LS ()

This function finds the difference of horizontal gain. The method of least squares estimates parameters by minimizing the squared discrepancies between the observed data and their expected values. The least squares criterion is a computationally convenient measure of fit. It corresponds to maximum likelihood estimation when the noise is normally distributed with equal variance.

The function computes maxima and minima to find two angles. Of the two angles, the correct one is selected using the beam-width (or azimuth angle) of the cell pair. The function outputs the computed AoA angle and AoA variance for every event data item.

Single cell angle()

This function is invoked when only one rsrp value is found within the event data. In this function only the cell beam-width angle is used, as shown below, and the rsrp is not used for finding the angle.

Single Cell MainBeam[6]= 10, 70, 130, 190, 250, 310

Variance[6]= 30, 30, 30, 30, 30, 30

The symmetric beam-width difference between any two adjacent neighbouring cells is 60 degrees, so each cell covers 30 degrees in either direction. Hence the variance is 30 for all cells.
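The single cell case reduces to a table lookup, which can be sketched as follows. The two arrays hold the values listed above; the function signature and the mapping from a physical cell ID to an index 0 to 5 are assumptions made for illustration.

```c
#include <assert.h>

#define NUM_CELLS 6

/* Main beam direction and variance per cell, in degrees, as listed above. */
static const double single_cell_main_beam[NUM_CELLS] = {10, 70, 130, 190, 250, 310};
static const double single_cell_variance[NUM_CELLS]  = {30, 30, 30, 30, 30, 30};

/* Return the AoA and its variance for one cell through call by reference.
 * `cellIndex` (0..5) is an assumed mapping from the physical cell ID. */
static void single_cell_angle(int cellIndex, double *aoa, double *aoaVariance)
{
    assert(cellIndex >= 0 && cellIndex < NUM_CELLS);
    assert(aoa != 0 && aoaVariance != 0);
    *aoa = single_cell_main_beam[cellIndex];
    *aoaVariance = single_cell_variance[cellIndex];
}
```

The lookup reads only constant data and writes only through the caller's pointers, so it is thread-safe and re-entrant.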

run particle filter()

This function takes UeDataPost, the measurements structure and the number of measurements to be computed as input arguments. For every measurement, this function in turn calls get ue particles, followed by a computationally intensive loop, resample particles and the free particles function. Finally, it updates UeDataPost with the computed horizontal and vertical positions for each measurement.

resample particles()

Multinomial re-sampling by Ripley's method is used for re-sampling in the particle filter. For better re-sampling, and in order to avoid filter convergence, the conditional processing part of Ripley's method is not used. In this re-sampling, particles that are closer to the measurement are selected. Particles that are not close do not get a chance to participate in the re-sampling phase, so the free particles() function is used to free the particles that are not chosen. During re-sampling, many particles with high importance weights, located close to the measurement, become identical. Memory therefore has to be allocated for new particles, and the copy particle function is used to copy the state vector of the selected particles. Whenever one complete measurement loop has been processed with one set of AoA and TA for an IMSI, the resample particles function stores the UE state vector for the corresponding IMSI in the UE Create particles list. The data structure of the UE Create particles list consists of IMSI values and the state vector xt for all N particles. This data structure is shown in appendix [2].

Figure 7.15: lock for Particle Filter
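For orientation, plain multinomial re-sampling can be sketched as below. This is the textbook cumulative-weight selection, not necessarily the Ripley variant used in the thesis. The uniform draws are passed in as an array (an assumption made so the example is deterministic), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* For each uniform draw u[k] in [0, 1), choose the first particle whose
 * cumulative weight exceeds u[k] scaled by the total weight. Particles with
 * larger importance weights are therefore chosen more often, and some
 * particles may not be chosen at all. Indices are written to chosen[0..m-1]. */
static void multinomial_resample(const double *weights, size_t n,
                                 const double *u, size_t *chosen, size_t m)
{
    assert(weights != NULL && u != NULL && chosen != NULL && n > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += weights[i];
    assert(total > 0.0);

    for (size_t k = 0; k < m; k++) {
        double target = u[k] * total;
        double cumulative = 0.0;
        size_t pick = n - 1; /* fallback guards against rounding */
        for (size_t i = 0; i < n; i++) {
            cumulative += weights[i];
            if (target < cumulative) {
                pick = i;
                break;
            }
        }
        chosen[k] = pick;
    }
}
```

Indices that appear more than once in `chosen` correspond to the identical particles mentioned in the text, whose state vectors would be copied into newly allocated particles.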

Get ue particles()

This function stores and accesses the particles in the UE Create particles list. The list is used to satisfy the hidden Markov requirement for massive user equipment. It is a critical section, and the lock around the relevant lines of code is shown in figure 7.15.

Create particles()

Whenever a new IMSI identity is found, the create particles function is called with the IMSI identity number as argument. The particles structure is shown in the appendix. The weight of a particle is its importance weight, compared against the received measurement weight. Depending on the weight, the selected field denotes whether the particle is selected for re-sampling. The particles are initially created randomly and must satisfy the initial acceptance criteria.

generate Gaussian Noise()

This function generates Gaussian noise with zero mean and selects the random noise values rx and ry for the white noise acceleration in the motion model. The weight of the particle at the previous time stamp is used as the variance of the Gaussian. This Gaussian noise models the probability distribution required by the motion model equation in state prediction.


Figure 7.16: Unoptimized Call graph for Particle Filter child region

7.2.4 Serial Performance Engineering

• Analysis of hot-spots[71,70] and use of cache friendly algorithmic loop level optimisations for all callees of the child region. Cache performance is measured with the PAPI tool, which is connected internally to the performance monitor unit of the Arm Cortex A15.

• Deterministic test execution trace analysis[71] to find the trajectory points for one IMSI and plot the trajectory. This helps in comparing the optimised and unoptimised trajectory functionality.

• Analysis with memory leak detection[64,65,66].

• Justification for trajectory functionality variation.

Analysis of hot spots and Cache friendly algorithmic loop level optimizations in child region

The following methods apply when finding and avoiding hotspots in software[70]:

• Avoid irregular data access patterns. Look for opportunities to apply the known standard loop level optimizations, such as loop fusion.

• Avoid fetching cache lines that are only partially used. Avoid evicting cache lines that will be accessed again later, by loop avoidance.

• Use pointer based rather than array based data transfer across functions.

• Avoid mathematical library functions such as power and exponential, which are expensive operations.

Figure 7.16 presents the call graph[78] of the child region in the particle filter for the unoptimized sequential version. Figure 7.17 presents the Callgrind execution characteristics of the same child region for the unoptimized sequential version.

The calculate multiplication function takes 33 percent of the execution and the calculate addition function takes 5 percent. These functions are called in computationally intensive loops. Hence finding hotspots and analysing cache friendly loop level optimizations is necessary.


Figure 7.17: Unoptimized Sequential version execution for Particle Filter child region

Figure 7.18: Arm A15 Cortex showing PMU counters-4

Cache friendly algorithmic loop level optimizations in Antenna model

Figure 7.18 presents the performance monitor counters of the Arm Cortex A15, known as the PMU registers. Figure 7.19 presents the use of the PMU registers with the PAPI tools for counting level-1 cache accesses and misses. Figure 7.20 presents PAPI events such as level-1 data cache misses and level-1 instruction cache misses.

• Pointer based versus array based data transfer - Accessing data from main memory is far slower than accessing data from cache, so we need to make the best use of what is already loaded into the cache, in other words to increase data locality. Temporal locality and spatial locality are what we mean by making the best use of the data available in the cache.

Figure 7.21 shows the hotspot in the antenna model with static array based data transfer. Depending on the cell id pair obtained during cell trace, the antenna differential LUT and variance LUT are selected. Since the LUTs are global and constant data, the identified LUT has to be transferred in the get Serving cell LS() function. The code snippet in figure 7.21 shows the static array based data transfer in this function. Figure 7.22 presents the pointer based LUT data transfer.

Figure 7.19: Performance counter monitor[75]

Figure 7.20: PAPI counters connecting with PMU for measurement of cache misses[75,53]

• Loop fusion - Figure 7.23 presents three different loops: the first loop runs 361 times, while the second and third loops run 360 times over the same working set. Loop fusion is a technique by which adjacent loops are combined into one single loop to increase the work per iteration. It gives better reuse of data and removes about half of the loop comparisons.

Figure 7.24 shows loop fusion applied to the loop level code snippet, merging two independent computation loops into one loop. The first iteration is split off by loop unrolling because of a data dependency, while the fused loop runs for 360 iterations with more locality. The code snippet in figure 7.25 shows how the optimized child region of the least square callee function, "Get Serving Cell LS()", is measured by counting level-1 cache accesses with PAPI. Before the algorithmic loop level optimization of the function Get Serving Cell, the PAPI counters give the following measurement:

Level-1 data cache misses: 126

Level-1 data cache accesses: 2700

After the algorithmic loop level optimization of the function Get Serving Cell, the PAPI counters give the following measurement:

Level-1 data cache misses: 25

Level-1 data cache accesses: 1238

Thus, after the cache friendly algorithmic loop level optimizations of the antenna model, level-1 data cache misses are reduced by a factor of 5 and level-1 data cache accesses are decreased by 52 percent on average for one event data comprising RSRPs and cellId. For processing one minute of dataOfInterest task units, the effective level-1 cache access count will therefore be reduced further.

Figure 7.21: Listing-1:Unoptimized Sequential version for Least Square Algorithm-6

Figure 7.22: Listing-2:Optimized Sequential version for Least Square Algorithm-7

Figure 7.23: Listing-3:Unoptimized Sequential version for Least Square Algorithm-8

Figure 7.24: Listing-4:Optimized Sequential version for Least Square Algorithm-9

Figure 7.25: child region measurement

Figure 7.26: Column wise matrix addition
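The loop fusion described above can be sketched as follows. The listings of figures 7.23 and 7.24 are not reproduced here, so the arithmetic inside the loops and all names (`cost_unfused`, `cost_fused`, `diff`, `slope`) are hypothetical; only the loop shape, one loop of 361 iterations and two of 360 fused into a single 360-iteration loop with the first iteration split off, follows the text.

```c
#include <assert.h>
#include <stddef.h>

#define N_ANGLES 361  /* 0..360 degrees, as in the 361-iteration first loop */

/* Unfused version: three separate passes over the same arrays. */
double cost_unfused(const double *meas, const double *lut,
                    double *diff, double *slope)
{
    double cost = 0.0;
    assert(meas != NULL && lut != NULL && diff != NULL && slope != NULL);
    for (size_t i = 0; i < N_ANGLES; ++i)      /* 361 iterations */
        diff[i] = meas[i] - lut[i];
    for (size_t i = 1; i < N_ANGLES; ++i)      /* 360 iterations */
        slope[i] = diff[i] - diff[i - 1];
    for (size_t i = 1; i < N_ANGLES; ++i)      /* 360 iterations */
        cost += slope[i] * slope[i];
    return cost;
}

/* Fused version: iteration 0 of the first loop is split off because
 * slope[i] needs diff[i - 1]; the remaining 360 iterations perform all
 * three computations while diff[i] and diff[i - 1] are still cached. */
double cost_fused(const double *meas, const double *lut,
                  double *diff, double *slope)
{
    double cost = 0.0;
    assert(meas != NULL && lut != NULL && diff != NULL && slope != NULL);
    diff[0] = meas[0] - lut[0];                /* split-off first iteration */
    for (size_t i = 1; i < N_ANGLES; ++i) {    /* 360 iterations */
        diff[i] = meas[i] - lut[i];
        slope[i] = diff[i] - diff[i - 1];
        cost += slope[i] * slope[i];
    }
    return cost;
}
```

Both functions perform the same floating point operations in the same order, so they return the same value; the fused version only changes how often each array element has to be brought into the cache.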

Cache friendly algorithmic loop level optimizations in Particle filter

Figure 7.26 presents the code snippet for column wise matrix addition. Figure 7.27 presents the code snippet for row wise matrix addition. Two dimensional arrays[70] in C are stored row-wise, so accessing the elements row by row in the inner iterations is more efficient than accessing them column wise. Thus, depending on the memory layout, loop re-ordering makes it possible to use adjacent, contiguous memory locations, the so-called stride-1 accesses. This avoids pulling data into the cache that will be only partially used.
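The two traversal orders can be sketched as below. The listings of figures 7.26 and 7.27 are not reproduced here; the 4 x 4 matrix size and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 4

/* Column wise traversal: the inner loop advances by a whole row (COLS
 * elements) in memory on every step, so accesses are not stride-1. */
void add_column_wise(double a[ROWS][COLS], double b[ROWS][COLS],
                     double out[ROWS][COLS])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t j = 0; j < COLS; ++j)
        for (size_t i = 0; i < ROWS; ++i)
            out[i][j] = a[i][j] + b[i][j];
}

/* Row wise traversal: the inner loop walks adjacent memory locations
 * (stride-1), matching the row-major layout of C arrays. */
void add_row_wise(double a[ROWS][COLS], double b[ROWS][COLS],
                  double out[ROWS][COLS])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < ROWS; ++i)
        for (size_t j = 0; j < COLS; ++j)
            out[i][j] = a[i][j] + b[i][j];
}
```

The two functions differ only in the order of the loops; the results are identical, and the benefit of the row wise order grows with the matrix size.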

Figure 7.28 presents the matrix addition function used in the particle filter loops for the motion model computations. This function is called 5000 times inside the filter loops.

Figure 7.29 presents the matrix multiplication function used in the motion model for the state estimate. This function is called 5000 times, for all particles, inside the particle filter loop.

Figure 7.30 presents the particle filter trajectory with unoptimized loops in the single threaded version.

• Loop fusion - In the particle filter, the state prediction phase, the measurement phase and the weight normalisation phase are three loops that iterate over the same data set (5000 times each), but because of the limited cache capacity, or because the loops are far from each other, the accessed data is evicted between them. The solution is to merge the loops into one loop while avoiding read after write dependencies.

Figure 7.27: Row wise matrix addition


Figure 7.28: matrix addition function called in particle filter

Figure 7.29: matrix multiplication function called in particle filter

Figure 7.30: particle filter trajectory of IMSI-1 with Unoptimized algorithmic loops of Single threaded version (axes: longitude 17.075 to 17.125, latitude 59.62 to 59.645)


Figure 7.31: optimized Particle filter loops with PAPI start counters

Figure 7.32: optimized particle filter loop with PAPI stop counters

Figure 7.31 presents the optimized particle filter loop after loop fusion and elimination of the matrix functions. In this computationally intensive loop the following happens in a cache friendly way, generating more cache hits and reducing level-1 accesses. All 5000 particles have their state vector variables stored in contiguous memory locations, so the loop has a working set size equal to the number of particles, N = 5000. The state vector of each particle occupies 40 bytes of contiguous memory and is therefore read together by pulling in a cache line of 64 bytes when the motion model and the sensor model are processed for state prediction. Hence the read of the state vector, the processing of the motion model and sensor model, and the write of the processed state vector happen with maximum cache hits.

Thus the stride-1 access locality is increased in a single computationally intensive loop[70]. As shown in the code snippet in figure 7.31, calling the generateGaussianNoise function twice in order to get the two random values rand val1 and rand val2 is very expensive. The same function therefore returns both values through input arguments passed by reference. The code snippet in figure 7.32 shows how the optimized child region of the particle filter is measured.

The following are the PAPI measurements for the particle filter child region with the unoptimized loop:

Level-1 data cache misses = 54865

Level-1 data cache access = 20607684

The following are the PAPI measurements for the particle filter child region with the optimized loop:

Level-1 data cache misses = 76252

Level-1 data cache access = 17702562

Figure 7.33: optimized Sequential version for Particle Filter-15
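The fused particle loop can be sketched as follows. This is not the code of figure 7.31: the five-double layout of the 40-byte state vector, the constant velocity motion model, the placeholder likelihood and all names are assumptions made for illustration. Prediction and measurement update share one pass over the contiguous particle array; normalisation needs the completed weight sum (a read after write dependency) and therefore remains a short second pass in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 40-byte state vector: five doubles stored contiguously. */
typedef struct {
    double x, y;     /* position */
    double vx, vy;   /* velocity */
    double weight;   /* importance weight */
} particle_t;

/* One fused pass over all particles: state prediction followed directly by
 * the measurement update, accumulating the weight sum on the fly.
 * noise_x and noise_y hold one precomputed noise sample per particle. */
void predict_and_weight(particle_t *p, size_t n, double dt,
                        const double *noise_x, const double *noise_y,
                        double meas_x, double meas_y)
{
    double sum = 0.0;
    assert(p != NULL && noise_x != NULL && noise_y != NULL && n > 0);
    for (size_t i = 0; i < n; ++i) {
        /* motion model: constant velocity with noise acceleration */
        p[i].vx += noise_x[i] * dt;
        p[i].vy += noise_y[i] * dt;
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        /* sensor model: placeholder likelihood that falls with distance */
        double dx = p[i].x - meas_x;
        double dy = p[i].y - meas_y;
        p[i].weight = 1.0 / (1.0 + dx * dx + dy * dy);
        sum += p[i].weight;
    }
    /* normalisation must wait until the full sum is known */
    for (size_t i = 0; i < n; ++i)
        p[i].weight /= sum;
}
```

With 40-byte elements and 64-byte cache lines, each particle is read and written while its line is resident, which is the locality effect described in the text.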

Deterministic execution path test case

In the single core, single threaded scheduler version, the consumer part of the software runs in the same main thread.

Figure 7.33 presents the optimized call graph of the particle filter. This optimised version has no matrix multiplication or addition function. The child region inside the particle filter thus has only one computationally intensive loop after loop fusion, loop avoidance and elimination of function calls, which decreases level-1 cache accesses by 14 percent.

The call graph in figure 7.33 shows the particle filter calling only the generateGaussianNoise() function and the resample function. The main thread in turn executes on only one Arm A15 core and processes only one UE related DataOfInterest task unit. As soon as the main thread of the trajectory software posts to the UEDataPost queue of the output interface, which is the UEDataPost thread, the post data is sent to the CMT server and then freed. Before freeing the post data, we store the latitude and longitude of a hardcoded UE and plot its trajectory in Matlab to verify the trajectory functionality. In total we receive 1375 events, and the figure shows the trajectory plot for the UE.

Analysis with memory leak detection

Figure 7.34 shows the leak summary for the sequential version[72]. It shows no direct or indirect loss, so there is no leak in the single threaded version.


Figure 7.34: Leak summary of Sequential version

Figure 7.35: IMSI-1 from Optimized Single threaded version-16 (axes: longitude 17.075 to 17.125, latitude 59.615 to 59.645)

Justification for trajectory functionality

Figure 7.35 presents the trajectory functionality of the optimized single threaded version and figure 7.36 presents the trajectory functionality of the unoptimized single threaded version. The tolerable variation between the two latitude and longitude plots is due to the following:

• As we saw in the create particles() function, random particles are initially created within a circle of radius 1750 meters centred on the eNodeB. All 5000 particles are therefore scattered over the circle and may initially lie at any distance up to 1750 meters, without any prediction.

• Very few particles will be found close to the received measurements and be selected for the re-sampling stage. During state prediction for successive measurements, the generateGaussianNoise() function produces random noise based on a probability distribution. This noise is different in every cycle and also creates some tolerable error in the trajectory functionality.

• The resample particles() function is also responsible for variation in trajectory performance through the cumulative distribution. The cumulative product of N random numbers and the cumulative sum of the weights of the N particles also contribute to this variation.

Figure 7.36: particle filter trajectory of IMSI-1 with Unoptimized algorithmic loops of Single threaded version (axes: longitude 17.075 to 17.125, latitude 59.62 to 59.645)
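The role of the cumulative sum of weights in resampling can be sketched with a generic multinomial resampler. This is not the resample particles() function of the thesis; the function name and interface are hypothetical, and the generation of the sorted random numbers (for which the text mentions a cumulative product of N random numbers) is left outside the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Multinomial resampling using the cumulative sum of normalised weights.
 * u[] must hold n uniform samples in [0, 1) sorted in ascending order;
 * out[k] receives the index of the particle copied into slot k. */
void resample_indices(const double *weights, const double *u,
                      size_t n, size_t *out)
{
    assert(weights != NULL && u != NULL && out != NULL && n > 0);
    double cumulative = weights[0];
    size_t j = 0;
    for (size_t k = 0; k < n; ++k) {
        /* advance to the first particle whose cumulative weight exceeds u[k] */
        while (u[k] >= cumulative && j + 1 < n) {
            ++j;
            cumulative += weights[j];
        }
        out[k] = j;
    }
}
```

Because the selected indices depend on the random numbers in `u`, two runs with the same weights can keep different particles, which is one source of the run-to-run variation listed above.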

7.2.5 Parallel Performance Engineering

• Use deterministic test execution trace analysis to find the trajectory points of one IMSI UE and of a random IMSI UE scheduled across multiple threads on multiple cores.

• Schedule the task level modelled data structures with multiple threads across multiple cores and analyse them with race detection and memory leak detection.

• Analyse the relative communication bottleneck across the parallel processing cores with 1 UE.

• Schedule the tasks for massive UE across multiple cores for scalability analysis with Gustafson's law.

• Perform heap profiling and cache profiling of the total software for massive UE.

Multi-threaded scheduler for parallel processing

The code snippet in figure 7.37 presents the producer lock region for posting dataOfInterest task units to dataofinterestqueue, which is a bounded buffer, in the head region.

The code snippet also shows the event semaphore flag being incremented for every task unit, known as a dataOfInterest.
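A bounded buffer protected by one lock can be sketched as below. This is not the code of figure 7.37: the type and function names, the capacity of 16 slots and the non-blocking interface are assumptions, and the `count` field only stands in for the event semaphore mentioned in the text.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16  /* assumed bound, not taken from figure 7.37 */

typedef struct {
    void *slots[QUEUE_CAPACITY];
    size_t head, tail, count;  /* count stands in for the event counter */
    pthread_mutex_t lock;      /* shared by producer and all consumers */
} task_queue_t;

void queue_init(task_queue_t *q)
{
    assert(q != NULL);
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Producer side: posts one task unit; returns false when the buffer is full. */
bool queue_try_push(task_queue_t *q, void *task)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->slots[q->tail] = task;
        q->tail = (q->tail + 1) % QUEUE_CAPACITY;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Consumer side: fetches one task unit; returns NULL when none is pending. */
void *queue_try_pop(task_queue_t *q)
{
    void *task = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        task = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return task;
}
```

A blocking variant would pair the push with a semaphore post and the pop with a semaphore wait, so that idle consumers sleep instead of polling.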

Figure 7.38 presents the call graph of the multi-threaded scheduler. The consumer worker thread routine shows a call tree with pthread_mutex_lock, pthread_mutex_unlock and run particle filter(). This lock is shared by all consumer threads and is the same lock as the one described in the code snippet shown in the figure.

Figure 7.37: Producer Consumer Lock

Figure 7.38: Call graph of threaded scheduler

In the multi-threaded scheduler version, the consumer part of the trajectory software is executed by multiple threads. Figure 7.39 presents the deterministic execution path of the software for tracing the events.

These consumer threads utilize all 4 Arm A15 cores and process different UE DataOfInterest task units. On the multicore, with task level scheduling, the consumer worker threads are allowed to interleave and run on any available A15 core. The critical section shown in figure 7.39 compares the IMSI ID of each thread with that of every other active thread/core to make sure it is unique, and then sets the thread to process the task. After processing a dataOfInterest task unit for AoA and AoA variance, each consumer worker thread frees the dataOfInterest and runs the particle filter. The filter then posts the filtered XY position coordinates to the MGRS latitude and longitude conversion module, which posts the trajectory information in the form of latitude and longitude coordinates. As soon as the consumer thread of the trajectory software posts to the UEDataPost queue of the output interface, which is the UEDataPost thread, the post data is stored in the CMT server and the UEDataPost structure is freed by the UEDataPost thread. Before freeing the post data, we store the latitude and longitude of a hardcoded UE and plot its trajectory in Matlab to verify the trajectory functionality. In total we receive 1375 events, and the figure shows the trajectory plot for the UE. The corner case where the number of UEs is smaller than the number of available threads is one of the significant deterministic test conditions that is tested.

When there are task units for only one IMSI related UE, only one Arm core out of the four available cores processes the DataOfInterest task unit during execution with the multi-threaded scheduler. This is because, upon finishing all the measurements in one DataOfInterest task, the hidden Markov property has to be fulfilled by the UECreate Particles list, which stores the state vector of all particles after the re-sampling phase for every measurement in the eventData structure. Since the threads are allowed to interleave, any thread/core may process the next task and obtains the current UE state from the UE create particles list. This deterministic test condition is also tested with the critical section.

Figure 7.39: Deterministic test execution trace for multi-threaded scheduler

Figure 7.40: multi-threaded race detector tool for trajectory software

Scheduling the task level data structures across multicore for analysis of race detection and leak detection

Helgrind[74] is a Valgrind tool for detecting synchronization errors in multi-threaded programs. Figure 7.40 presents how the tool is used on our trajectory software to analyse race conditions and deadlock scenarios. The drive test simulation file triggers a simulation for 10 UEs with the trajectory software. Figure 7.41 presents the initial start of program execution, the initial occurrence of false positives and the locks held reported as none. Figure 7.42 presents the execution of the Helgrind tool on the AXM SoC with Arm cores, using the multi-threaded scheduler on multiple cores.


Figure 7.41: race detector check


Figure 7.43 presents the processed simulation for 10 UEs with the Helgrind tool.

Memcheck[72] is the memory leak detector used for analysing multi-threaded memory leaks. When we run the trajectory software with multiple consumer threads processing the dataOfInterest task units of one UE, the total number of memory allocations for the various lists equals the total number of frees. Once a data structure is freed, its pointer is immediately set to null.

Analysis of memory leak detection during parallel processing

Memcheck is a Valgrind tool used for memory leak detection. A memory leak occurs when a block of dynamically allocated memory is never explicitly freed (de-allocated).

Memcheck is also a tool for debugging memory management bugs. The leak categories reported in its leak summary are the following:

• Definitely lost - A leak is considered definitely lost when, at process exit, there is no pointer or chain of pointers to the leaked memory block.

• Indirectly lost - When Memcheck finds a valid start pointer or interior pointer to a given block of memory, but that pointer is in another block which is itself 'directly lost', Memcheck reports the block as 'indirectly lost'. Directly lost and indirectly lost leaks are together referred to as definitely lost.

• Still reachable and possibly lost - A block of memory is reported as still reachable when Memcheck finds, after process execution ends, at least one pointer holding the start address of the block. The term 'possibly' states that the Valgrind tool does not know whether the leak is 'definitely lost' or 'still reachable'.

An example of such a chain of pointers is a linked list or queue in C built from self-referential data structures, where each element of the list contains a pointer to the next element and the list itself is fully referenced by an initial 'head' pointer (the start pointer of the first element in the list).

Among these, it is the programmer's responsibility to fix definitely lost and indirectly lost leaks. Usually, once all the definitely lost leaks are fixed, the indirectly lost leaks are eliminated as well.

Thus we find that the definitely lost and indirectly lost memory are zero, as shown in figure 7.44.

Figure 7.44: leak summary for Parallel multi-threaded scheduler
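The linked list case can be illustrated with a small sketch. The node type and function names are hypothetical and do not come from the trajectory software; the sketch only shows how walking the chain from the head pointer frees every element, so that neither the head nor the elements behind it are reported as lost.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;  /* self-referential pointer to the next element */
} node_t;

/* Prepends a value; returns the new head, or the old head if malloc fails. */
node_t *list_push(node_t *head, int value)
{
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Frees every node by walking the chain and returns the number of nodes
 * freed. If the head pointer were simply dropped instead, Memcheck would
 * report the first node as definitely lost and the nodes reachable only
 * through it as indirectly lost. */
size_t list_free(node_t **head)
{
    size_t freed = 0;
    assert(head != NULL);
    while (*head != NULL) {
        node_t *next = (*head)->next;
        free(*head);
        *head = next;
        ++freed;
    }
    return freed;  /* *head is NULL at this point */
}
```

Setting the head pointer to NULL at the end mirrors the practice of assigning null immediately after a data structure is freed.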


Figure 7.45: IMSI-1 UE trajectory from multi threaded scheduler (axis ticks: 17.075 to 17.125 and 59.62 to 59.645)

Figure 7.46: IMSI-10 from multi-threaded scheduler (axis ticks: 17.075 to 17.125 and 59.62 to 59.645)

Hence there is no memory leak in the multi-threaded software execution. The trajectory plot for 1 UE with 4 consumer threads is shown in figure 7.45.

Figure 7.46 shows the trajectory plot of the 10th IMSI based UE when a massive number of UEs is simulated with 4 consumer threads.

Relative Communication bottleneck analysis for multi-threaded scheduler with multi-core

Figure 7.47 presents the Callgrind[78] execution for 1 UE with 4 threads in a Linux virtual machine. We can observe that the mutex lock consumes 18 percent of the relative execution and the mutex unlock consumes 10 percent. With 1 UE, only one consumer thread runs a dataOfInterest task unit, so the communication bottleneck of the lock is shared between the core that runs the producer thread and every consumer worker thread/core for the bounded buffer queue. As shown in the call graph, the code region consisting of lock and unlock within the consumer worker threads creates more contention because of the comparison of unique IMSI IDs between all the worker threads/cores. Thus, when the number of threads is increased, the lock contention also increases.


Figure 7.47: Callgrind showing bottleneck of Locking

Figure 7.48: Load balance for 4 cores -1

Scalability Analysis with Gustafson's Law for multi-threaded scheduler with 1 cluster of AXM5512 SoC

In the trajectory software the number of threads is increased from 2 to 3 to 4. The taskset mask is set correspondingly to 0x30, 0x70 and 0xF0. By increasing the number of consumer threads and Arm A15 cores while keeping the time constant, we observe an increase in the number of UEs processed.
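The relation between these masks and the number of cores can be sketched as below, assuming the usual convention that bit k of the mask enables core k. Under that assumption 0x30, 0x70 and 0xF0 enable cores 4 to 5, 4 to 6 and 4 to 7 respectively. The function names are hypothetical.

```c
#include <assert.h>

/* Counts the cores enabled in a CPU affinity bit mask (bit k = core k). */
unsigned cores_in_mask(unsigned mask)
{
    unsigned count = 0;
    while (mask != 0u) {
        count += mask & 1u;
        mask >>= 1;
    }
    return count;
}

/* Builds a mask enabling n_cores consecutive cores starting at first_core. */
unsigned mask_for_cores(unsigned first_core, unsigned n_cores)
{
    assert(n_cores >= 1 && first_core + n_cores <= 16);
    return ((1u << n_cores) - 1u) << first_core;
}
```

For example, `mask_for_cores(4, 3)` yields 0x70, the mask used for the run with 3 consumer threads.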

Figure 7.48 shows the load on the 4 Arm A15 cores at 30 milliseconds. The average utilization of all 4 CPU cores is around 60 percent at the 30th millisecond.

Figure 7.49 shows the load on the 4 Arm cores at 110 milliseconds. The average utilization of all 4 CPU cores is around 90 percent at the 110th millisecond.

Figure 7.50 shows the load on the 4 Arm cores at 220 milliseconds. The average utilization of all 4 CPU cores is around 90 percent at the 220th millisecond.

Figure 7.51 shows the load on the 4 Arm cores at 250 milliseconds. The average utilization of all 4 CPU cores is around 48 percent at the 250th millisecond. So the rise of the load from 60 percent to 90 percent and its fall from 90 percent to 48 percent happen within roughly 200 milliseconds.

Figure 7.49: Load balance for 4 cores-2




There are two possible reasons for loads being scheduled in this way.

Firstly, the static part of the hybrid queue fetches only 16 dataOfInterests from the events streamer thread to the main thread.

Secondly, there is lock contention between the producer and all consumers. In addition, contention also occurs among the consumer worker threads scheduled on the available 3 or 4 cores at run time.

OpenMP based scheduler for Multi-core

Figure 7.52 shows the OpenMP scheduler call graph. The OpenMP compiler directive #pragma omp parallel for splits the code region among a number of parallel threads that is set by the omp_set_num_threads() function, which is called before the OpenMP directive. All data declared inside the code region is private to each thread by default. Apart from the various global LUTs of the antenna model, no data is shared across the parallel threads. When the number of active UEs is smaller than the number of threads, the remaining worker threads that have no task to process are inactive. This consumer parallelism can be observed through the omp_get_thread_num() API function, which returns the id of the calling thread.
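A minimal sketch of such a consumer section is given below. It is not the scheduler of figure 7.52: the slot count, the use of 0 as an empty slot marker and the function name are assumptions. The `_OPENMP` guards let the same code build and run serially when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N_SLOTS 4  /* assumed: one static buffer slot per consumer thread */

/* Processes every non-empty slot of a static task buffer in parallel and
 * returns how many tasks were handled. handled_by[i] records the id of the
 * thread that processed slot i, or -1 for an empty slot. */
int process_slots(const int *imsi_slots, int *handled_by)
{
    int processed = 0;
    assert(imsi_slots != NULL && handled_by != NULL);
#ifdef _OPENMP
    omp_set_num_threads(N_SLOTS);
#endif
#pragma omp parallel for reduction(+ : processed)
    for (int i = 0; i < N_SLOTS; ++i) {
        int tid = 0;               /* declared inside the region: private */
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        if (imsi_slots[i] == 0) {  /* 0 marks an empty slot in this sketch */
            handled_by[i] = -1;
            continue;
        }
        handled_by[i] = tid;       /* the particle filter call would go here */
        processed += 1;
    }
    return processed;
}
```

With a single non-empty slot, only the thread that receives that iteration does any work, which corresponds to the case of fewer active UEs than threads described above.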

The following is an example of a deterministic test scenario. Assume that the number of UEs available to be scheduled is 1 and the number of consumer threads/cores is 4. The static buffer shown in figure 7.53 then always holds only one task unit, i.e. only one IMSI related DataOfInterest, and the 3 remaining buffer slots stay empty without any task to process. If only one IMSI related DataOfInterest task is scheduled with 4 worker threads, only the master thread is active; all other threads have no task to process and remain inactive. After processing all the event data, we therefore get 1375, or at least more than 1360, events in the UEDataPost thread. With a hard-coded IMSI number we can transfer all processed trajectory points (latitude and longitude) to a logger text file, from which we plot the trajectory in Matlab.

Figure 7.52: Callgraph for OpenMP consumer section

Figure 7.53: Deterministic condition test case

Figure 7.54: leak summary for Parallel version

Analysis of leak detection

The Memcheck leak detection summary for the OpenMP scheduler is shown in figure 7.54. The figure also shows the completion of 1375 processed values in total for 1 IMSI UE.

The trajectory plot for the 1st UE is shown in figure 7.55.

The trajectory plot for the 50th random UE is shown in figure 7.56. The justification of the tolerable performance error in the trajectory functionality given in the serial performance engineering section also explains the slight variation between the trajectories of the 1st IMSI UE and the 50th IMSI UE.

Relative Communication bottleneck analysis OpenMP based multi-threaded scheduler

Figure 7.57 presents the Callgrind execution of the particle filter trajectory for 1 UE when scheduled with the OpenMP scheduler across multiple cores. When the compiler optimization flag -O3 is enabled during the Callgrind simulation with OpenMP, we get the Callgrind execution shown in the figure.

Scalability Analysis with Gustafson's Law for OpenMP based scheduler with 1 cluster of AXM5512 SoC

In the trajectory software the number of threads is increased from 2 to 3 to 4 with omp_set_num_threads(). The taskset mask is set correspondingly to 0x30, 0x70 and 0xF0. By increasing the number of consumer threads and Arm A15 cores while keeping the time constant, we observe an increase in the number of UEs processed.

Figure 7.55: IMSI-1 (axes: longitude 17.07 to 17.13, latitude 59.615 to 59.645)

Figure 7.56: IMSI-50 (axes: longitude 17.075 to 17.125, latitude 59.62 to 59.645)

Figure 7.57: callgrind execution for 1UE

Figure 7.58 presents the CPU load at the 4th minute during parallel processing on the 4 Arm Cortex A15 cores.

With figure 7.61, which shows the load at the 10th minute, we can conclude that the load across the 4 Arm Cortex A15 CPU cores is well balanced at 90 to 96 percent during parallel processing and consistently remains above 90 percent for all 4 cores.

7.2.6 Heap profiling

The Valgrind Massif tool[79] is used for heap memory profiling. Table 7.8 shows the theoretical memory requirement of the particles for 1 UE and table 7.9 shows the memory requirement for a number of UEs.





Table 7.8: Memory requirement for particles for 1 UE

Number of particles    Memory requirement (KB)
1                      0.040
100                    4
5000                   200

Table 7.9: Memory requirement for particles

Number of UE    Number of particles    Memory requirement (MB)
1               5 K                    0.2
70              350 K                  14
120             600 K                  24
1000            5 M                    200
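The table entries follow from 40 bytes per particle and 5000 particles per UE, with decimal units (1 KB = 1000 bytes). A short sketch of the arithmetic, with hypothetical names:

```c
#include <assert.h>

#define BYTES_PER_PARTICLE 40ULL    /* state vector size stated in the text */
#define PARTICLES_PER_UE   5000ULL

/* Theoretical particle memory in bytes for a given number of UEs. */
unsigned long long particle_memory_bytes(unsigned long long n_ue)
{
    assert(n_ue <= 1000000ULL);  /* keeps the product far from overflow */
    return n_ue * PARTICLES_PER_UE * BYTES_PER_PARTICLE;
}
```

For 70 UEs this gives 14,000,000 bytes, the 14 MB listed in table 7.9.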

The Massif tool is first used to analyse the trajectory software for 1 UE.

Figure 7.62 shows the heap profiling chart from the Massif tool for 1 UE. When the filter starts there are no particles, and all initial particles are allocated by the create particles() function, which allocates memory for 5000 particles. At this initial stage all 5000 particles are scattered over the area of a circle of radius 1750 meters centred on the eNodeB. Therefore, when the filter receives the initial measurement, only a few particles are close to it. Only those few particles are selected in the re-sampling particles function, and in order to have 5000 particles again, memory for new particles is allocated as copies of the selected few before all the old particles are freed. This is the reason for the initial peak memory usage of 606 kilobytes shown in figure 7.62. After the peak, as the filter progresses to the 4th snapshot and beyond, the memory usage remains at an average of 370 kilobytes for one UE. The reason for this constant memory usage is that, as the filter progresses from the first iteration, most of the particles are found close to the measurement, so most of them are selected during re-sampling. This reduces the need to re-allocate memory for new particles, and as the particles are found closer to the measurement, memory allocation decreases.

Figure 7.62: heap profiler for one UE

Massif for 70 UE

Figure 7.63 shows the snapshot of memory consumption for 70 UEs on the AXM SoC during massive UE simulation with parallel threads on the 4 Arm A15 cores.

The Massif tool takes a snapshot at every heap allocation/de-allocation. Snapshot 2 gives the following summary of the heap profile. resample particles() allocates the largest share of memory, 72.97 percent, during the very first measurement received for all 70 UEs. run particle filter() has the second highest heap memory consumption with 6.08 percent. Next, the create particles() function allocates memory for 5000 particles for each of the 70 UEs, consuming up to 4.35 percent of the heap. Finally, receive events() consumes 2.67 percent of the heap memory for allocating dataOfInterest task units for the 70 UEs up to snapshot 2.

At the next heap allocation/de-allocation, snapshot 4, resample particles() decreases to 70.19 percent, while receive events() doubles its memory consumption and exceeds the memory allocation of the create particles() function in comparison with snapshot 2.

At snapshot 11 we observe the peak usage of heap memory, as shown in the figure. The following memory usage characteristics remain constant even after the peak. The resample particles() function decreases its memory consumption to 63.21 percent, yet it remains the largest memory consumer for massive UE positioning with the particle filter. The receive events() function populates more dataOfInterest allocations and becomes the second largest memory consumer with 12.31 percent. This is followed by run particle filter() with 5.42 percent and finally create particles() with 3.72 percent, which is used very infrequently, when there are more time-steps between two received measurements for filter estimation.

Figure 7.63: heap profiler-1 for massive UE Parallel version

7.2.7 Cache Profiling

Cachegrind for 1 UE

Cachegrind[73] is one of the Valgrind tools and is used to measure cache performance. It measures the effective usage of each core's private level-1 instruction cache and data cache and of the shared level-2 cache as last level. To measure how cache friendly the trajectory software is, we need to provide the MPCore's instruction and data cache details, in the order total cache size, set associativity and cache line size. For the AXM 5512 SoC, the cache details of the Arm Cortex A15 are provided in the following order:

Level-1 instruction cache: I1 size = 32 KB, I1 associativity = 2, I1 cache line size = 64.

Level-1 data cache: D1 size = 32 KB, D1 associativity = 2, D1 cache line size = 64.

Level-2 shared cache: L2 size = 2 MB, L2 associativity = 16, L2 cache line size = 64.

Figure 7.64: Cache profiler for 1UE in Parallel version

Figure 7.65: Cache profiler for 10UE for Parallel version

After processing 1375 events for 1 UE, covering all 31 TA events of the drive test data, the Cachegrind tool reports the miss rates of the level-1 instruction cache and the level-1 data cache. The screenshot in figure 7.64 shows the resulting cache performance for the 1 UE execution.

Cachegrind for 10 UE

With the multi-threaded scheduler version, when simulating 10 UEs, a total of 13695 events are processed.

The screenshot in figure 7.65 shows the cache performance, in terms of miss rate, of the whole trajectory software when processing 10 UEs with the multi-threaded version.

7.3 Summary

In this final result section we have seen the results obtained with network based AoA and GPS based AoA. The network based AoA and AoA variance are used in the sensor model for particle filter positioning with ground truth analysis. Finally, we have seen serial performance engineering for a single core and parallel performance engineering for multicore.

Chapter 8

Conclusion and Future Work

8.1 Decentralized distribution of task level scheduling with extension of cores

When we increased the number of parallel consumer threads in the OpenMP scheduler design from 4 to 8, the load balance dropped from 90 to 70 percent. There are two reasons for this. First, the master thread assigns the different IMSI task units to the threads/cores on which they are scheduled. After filling the static bounded buffer, the master thread creates the threads when it reaches the first parallel scheduling construct. The master thread therefore puts a significant load on one core and makes the other cores wait for a significant time. Second, as the number of parallel consumer threads/cores increases, the implicit barrier at the end of the OpenMP "parallel for" construct makes every thread that has finished processing wait until the remaining threads have finished.
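The two effects described above can be seen in the shape of a centralized OpenMP batch loop. The sketch below is an illustration under stated assumptions, not the thesis scheduler: BUFFER_SLOTS, task_unit, process_task and run_batch are hypothetical names, and the per-task work is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SLOTS 8  /* hypothetical size of the static bounded buffer */

struct task_unit {
    uint64_t imsi;      /* UE whose events this task unit carries */
    int processed;      /* set to 1 once a consumer has handled it */
};

/* Stand-in for the per-UE filtering work done by a consumer thread. */
static void process_task(struct task_unit *t)
{
    t->processed = 1;
}

/* One batch: serial fill by the master thread, then a parallel loop.
   Returns the number of task units processed. */
int run_batch(struct task_unit buf[BUFFER_SLOTS], uint64_t first_imsi)
{
    int done = 0;
    assert(buf != NULL);

    /* Serial phase: only the master thread runs; other cores are idle. */
    for (int i = 0; i < BUFFER_SLOTS; i++) {
        buf[i].imsi = first_imsi + (uint64_t)i;
        buf[i].processed = 0;
    }

    /* Parallel phase: loop iterations are shared among the threads. */
    #pragma omp parallel for reduction(+:done)
    for (int i = 0; i < BUFFER_SLOTS; i++) {
        process_task(&buf[i]);
        done += buf[i].processed;
    }
    /* Implicit barrier here: the batch ends only when the slowest
       thread has finished its share of the iterations. */

    return done;
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs serially with the same result. With OpenMP enabled, the serial fill and the barrier at the end of the loop are the two places where cores sit idle, and both grow in relative cost as more consumer threads are added.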

Figure 8.1 presents a decentralized task level scheduler based on a hash function. When the IMSI number is divided by the number of cores, a hard-coded constant, the remainder decides the core on which the task is scheduled. The main advantage is that the parallel code region remains the same as, or similar to, the OpenMP version, while tasks with the same IMSI are queued and scheduled on the same core. The hidden Markov requirement is thus maintained with this scheduling, as the following example shows.

Let us assume that N = 8 cores are available. For a task whose IMSI ends with 55007, (IMSI modulo N) gives the core on which it is scheduled. In this case 55007 modulo 8 = 7, which means that all incoming tasks of IMSI 55007 execute only on A15 core 07. The current state vector after the last event of one DataOfInterest is thereby kept as the input for the next DataOfInterest related to IMSI 55007.
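The modulo mapping in this example can be sketched in a few lines of C. This is a minimal illustration assuming a fixed core count of 8 and a simple array-backed queue per core; core_for_imsi, core_queue, QUEUE_SLOTS and enqueue_task are hypothetical names, not part of the trajectory software.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 8u     /* hard-coded core count, as in the example */
#define QUEUE_SLOTS 16u  /* hypothetical capacity of each per-core queue */

/* Core index for a task unit: remainder of IMSI divided by the core count. */
unsigned core_for_imsi(uint64_t imsi)
{
    return (unsigned)(imsi % NUM_CORES);
}

/* Hypothetical per-core FIFO holding the IMSIs of pending task units. */
struct core_queue {
    uint64_t imsi[QUEUE_SLOTS];
    unsigned count;
};

/* Append a task to the queue of its owning core.
   Returns the core index, or -1 if that queue is full. */
int enqueue_task(struct core_queue queues[NUM_CORES], uint64_t imsi)
{
    unsigned core = core_for_imsi(imsi);
    struct core_queue *q = &queues[core];

    assert(q->count <= QUEUE_SLOTS);
    if (q->count == QUEUE_SLOTS)
        return -1;
    q->imsi[q->count++] = imsi;
    return (int)core;
}
```

Because every task unit of one IMSI lands in the same queue in arrival order, the filter state of that UE is only ever touched by one core, so no lock is needed around it. With N = 8 the result depends only on the last three decimal digits of the IMSI, since 1000 is a multiple of 8.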


Figure 8.1: Decentralized task scheduling


Figure 8.2: Concentric circles for AoA

8.2 Concentric circles for AoA accuracy

In this method, we use concentric circles centred on the geographic coordinates of the RBS, as shown in figure 8.2. With the TA-based distance as the radius of a circle, we should get complete coverage from 0 to 359 degrees, i.e. 360 degrees.

By keeping a constant TA distance as the radius from the RBS coordinates, we can reverse engineer the DoD for 0 to 359 degrees, with the difference in horizontal gain taken as the difference in RSRP. We can then easily find the two RSRP values that give a particular DoD. From the DoD we can arrive at the AoA for the complete circle, and at the AoA variance based on the CRLB. We can then increase the TA distance, i.e. the radius, to arrive at a bigger concentric circle. Again, we need to receive the same reverse-engineered RSRP values from the RBS in order to get the DoD angle from 0 to 359 degrees. The advantage is that we arrive at RSRP values covering the full 360 degrees. When three RSRP values are available, location based services will be more effective with the MUSIC algorithm. This is because the LS algorithm has the drawback of being limited by the difference in horizontal antenna gain, the maximum gain margin and the FBR.
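The ring radii and the 0 to 359 degree sweep can be indexed as a simple lookup grid. The sketch below assumes that TA is reported in steps of 16 Ts with Ts = 1/(15000 x 2048) s, as in the LTE timing advance definition of 3GPP TS 36.213 [23]; whether the streamed TA event uses the same unit is an assumption. The names ta_ring_radius_m and sweep_cell_index are hypothetical.

```c
#include <assert.h>

/* Assumed units: one TA step is 16 * Ts, Ts = 1 / (15000 * 2048) s. */
#define SPEED_OF_LIGHT_M_S 299792458.0
#define TS_SECONDS (1.0 / (15000.0 * 2048.0))
#define BEARINGS_PER_RING 360u

/* Radius of the ring for a TA index: the TA delay is a round trip,
   so the one-way distance is half of delay times c. */
double ta_ring_radius_m(unsigned ta)
{
    return (double)ta * 16.0 * TS_SECONDS * SPEED_OF_LIGHT_M_S / 2.0;
}

/* Flat index of a (ring, bearing) cell in a table that stores one entry
   per degree for rings 1, 2, 3, ... (ring 1 occupies cells 0..359). */
unsigned sweep_cell_index(unsigned ring, unsigned bearing_deg)
{
    assert(ring >= 1 && bearing_deg < BEARINGS_PER_RING);
    return (ring - 1u) * BEARINGS_PER_RING + bearing_deg;
}
```

Under these assumptions consecutive rings are about 78 m apart, which bounds the radial resolution of such a table; each cell could then hold the reverse-engineered RSRP difference for that bearing.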


Appendices


Chapter A

LTE Event Parameters

Two LTE events are streamed in order to receive the information required for UE position estimation. The parameters received in these events are presented here. An event header is received with every event of both types. The parameters in this header are common to both events and are presented in table A.1.

Table A.1: Event Header Parameters

Parameter Name                   Size (bytes)  Description
EVENT_PARAM_TIMESTAMP_HOUR       1             Hour value in timestamp
EVENT_PARAM_TIMESTAMP_MINUTE     1             Minute value in timestamp
EVENT_PARAM_TIMESTAMP_SECOND     1             Second value in timestamp
EVENT_PARAM_TIMESTAMP_MILLISEC   2             Millisecond value in timestamp
EVENT_PARAM_SCANNER_ID           3             ID of the scanner
EVENT_PARAM_RBS_MODULE_ID        1             DUL (Digital Unit reference)
EVENT_PARAM_GLOBAL_CELL_ID       4             Global ID of serving cell
EVENT_PARAM_ENBS1APID            3             eNB S1 AP ID of UE
EVENT_PARAM_MMES1APID            4             MME S1 AP ID of UE
EVENT_PARAM_GUMMEI               7             GUMMEI of the released MME
EVENT_PARAM_RAC_UE_REF           4             The UE identity used in service between RAC layer and Baseband
EVENT_PARAM_TRACE_REC_SESS_REF   3             The unique session ID for UE
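To make the header layout concrete, the sketch below sums the field sizes of table A.1 and decodes the timestamp fields into milliseconds since midnight. It rests on assumptions that the table does not state: that the fields are packed back to back in table order without padding, and that multi-byte fields are stored most significant byte first. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field widths in bytes, in the order of table A.1. */
static const uint8_t header_field_size[] = { 1, 1, 1, 2, 3, 1, 4, 3, 4, 7, 4, 3 };

/* Total header length implied by the table. */
size_t event_header_size(void)
{
    size_t total = 0;
    for (size_t i = 0; i < sizeof header_field_size; i++)
        total += header_field_size[i];
    return total;
}

/* Read an n-byte unsigned field, most significant byte first
   (assumed byte order). */
uint32_t read_be(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    assert(p != NULL && n <= 4);
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Milliseconds since midnight from the hour, minute, second and
   millisecond fields at the start of the header (assumed offsets 0..4). */
uint32_t header_timestamp_ms(const uint8_t *hdr)
{
    uint32_t hour   = read_be(hdr, 1);
    uint32_t minute = read_be(hdr + 1, 1);
    uint32_t second = read_be(hdr + 2, 1);
    uint32_t millis = read_be(hdr + 3, 2);
    return ((hour * 60u + minute) * 60u + second) * 1000u + millis;
}
```

Under these assumptions the header occupies 34 bytes, and a single 32-bit millisecond value is enough to order events within one day.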

The parameters (other than the header parameters) of the event UE_MEAS_INTRAFREQ1 (streamed to receive RSRP values) are presented in table A.2.


Table A.2: RSRP Event specific parameters

Parameter Name                   Size (bytes)  Description
EVENT_PARAM_MEASUREMENT_ID       1             Measurement ID
EVENT_PARAM_REPORT_CONFIG_TYPE   1             Report Configuration type
EVENT_PARAM_RSRPSERVING          1             RSRP measurement from serving cell
EVENT_PARAM_RSRQSERVING          1             RSRQ measurement from serving cell
EVENT_PARAM_PHYSICAL_CELLID1     2             Physical cell ID of the 1st cell
EVENT_PARAM_RSRPRESULT1          1             RSRP measurement from 1st cell
EVENT_PARAM_RSRQRESULT1          1             RSRQ measurement from 1st cell
EVENT_PARAM_PHYSICAL_CELLID2     2             Physical cell ID of the 2nd cell
EVENT_PARAM_RSRPRESULT2          1             RSRP measurement from 2nd cell
EVENT_PARAM_RSRQRESULT2          1             RSRQ measurement from 2nd cell
EVENT_PARAM_PHYSICAL_CELLID3     2             Physical cell ID of the 3rd cell
EVENT_PARAM_RSRPRESULT3          1             RSRP measurement from 3rd cell
EVENT_PARAM_RSRQRESULT3          1             RSRQ measurement from 3rd cell
EVENT_PARAM_PHYSICAL_CELLID4     2             Physical cell ID of the 4th cell
EVENT_PARAM_RSRPRESULT4          1             RSRP measurement from 4th cell
EVENT_PARAM_RSRQRESULT4          1             RSRQ measurement from 4th cell

The parameters (other than the header parameters) of the event INTERNAL_PER_RADIO_UE_MEASUREMENT_TA (streamed to receive TA values) are presented in table A.3.


Table A.3: TA Event specific parameters

Parameter Name                         Size (bytes)  Description
EVENT_PARAM_TIMESTAMP_START_HOUR       1             Hour value in timestamp
EVENT_PARAM_TIMESTAMP_START_MINUTE     1             Minute value in timestamp
EVENT_PARAM_TIMESTAMP_START_SECOND     1             Second value in timestamp
EVENT_PARAM_TIMESTAMP_START_MILLISEC   2             Millisecond value in timestamp
EVENT_PARAM_TA_INTERVAL                2             TA interval
EVENT_ARRAY_TA                         2             TA value array


Chapter B

Data Structures

The DataOfInterest structure in the Trajectory software is defined as follows:

struct DataOfInterest
{
    // Number of EventData records present in the list
    uint8_t numFilled;
    // MMES1APID of the UE
    uint32_t mmes1apid;
    // IMSI of the UE
    uint64_t imsi;
    // The timestamp of the TA event
    uint32_t taStartTimeStamp;
    // Head pointer of the list of EventData structures
    struct EventData* eventDataHead;
    // Tail pointer of the list of EventData structures
    struct EventData* eventDataTail;
    // Backward pointer of the list of DataOfInterest structures
    struct DataOfInterest* prev;
    // Forward pointer of the list of DataOfInterest structures
    struct DataOfInterest* next;
};

The structure of EventData used in the Trajectory software is presented below:

struct EventData
{
    // Timestamp of the RSRP event generation
    uint32_t timestamp;
    // Global cell id of the serving cell
    uint32_t globalCellId;
    // RSRP value from serving cell
    uint8_t rsrpServing;
    // Array of physical cell ids of neighbouring cells
    uint16_t phyCellId[8];
    // Array of RSRP values received from neighbouring cells
    uint8_t rsrp[8];
    // TA value of the UE at the timestamp mentioned above
    uint16_t ta;
    // Backward pointer of the list of EventData structures
    struct EventData* prev;
    // Forward pointer of the list of EventData structures
    struct EventData* next;
};
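The head and tail pointers together with numFilled suggest that EventData records are appended at the tail of a doubly linked list owned by one DataOfInterest task unit. The sketch below shows one way such an append could look. It uses reduced copies of the two structures, and doi_append_event is a hypothetical helper, not a function of the trajectory software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced copies of the structures above; only the list fields
   and two payload fields are kept. */
struct EventData {
    uint32_t timestamp;
    uint16_t ta;
    struct EventData *prev;
    struct EventData *next;
};

struct DataOfInterest {
    uint8_t numFilled;
    uint64_t imsi;
    struct EventData *eventDataHead;
    struct EventData *eventDataTail;
};

/* Append ev at the tail of the task unit's event list and update the
   record count. The caller owns the memory of ev. */
void doi_append_event(struct DataOfInterest *doi, struct EventData *ev)
{
    assert(doi != NULL && ev != NULL);
    ev->next = NULL;
    ev->prev = doi->eventDataTail;
    if (doi->eventDataTail != NULL)
        doi->eventDataTail->next = ev;
    else
        doi->eventDataHead = ev;    /* first record in the list */
    doi->eventDataTail = ev;
    doi->numFilled++;
}
```

Appending at the tail keeps the records in arrival order, which is the order in which the filter would consume them. Since numFilled is a uint8_t, a task unit of this shape can count at most 255 records before the counter wraps.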

The UeDataPost structure, used for creating an output buffer in the form of a list, is presented below:

struct UeDataPost
{
    // Number of records filled
    uint8_t numFilled;
    // IMSI of the UE
    uint64_t imsi;
    // Head pointer of the list of PostData structures
    struct PostData* postDataHead;
    // Tail pointer of the list of PostData structures
    struct PostData* postDataTail;
    // Backward pointer of the list of UeDataPost structures
    struct UeDataPost* prev;
    // Forward pointer of the list of UeDataPost structures
    struct UeDataPost* next;
};

The PostData structure, used for storing the trajectory points to be posted to the CMT server, is presented below:

struct PostData
{
    // Latitude of the UE position
    double lat;
    // Longitude of the UE position
    double lng;
    // Timestamp of the UE position
    long long time;
    // Speed of the UE
    double speed;
    // ID of the serving cell
    uint32_t cellid;
    // Signal strength received from serving cell
    uint8_t rss;
    // Backward pointer of the list of PostData structures
    struct PostData* prev;
    // Forward pointer of the list of PostData structures
    struct PostData* next;
};

struct particle
{
    // UE position in X-coordinate
    double x;
    // UE position in Y-coordinate
    double y;
    // UE velocity in X-coordinate
    double vx;
    // UE velocity in Y-coordinate
    double vy;
    // Flag marking whether the particle is selected
    uint8_t selected;
    // Importance weight of particle
    double weight;
};

struct UeParticles
{
    // Permanent identifier of UE
    uint64_t imsi;
    // Timestamp of current state vector
    uint32_t timestamp;
    // Current state vector of UE
    struct particle** particles;
    // Points to previous UE
    struct UeParticles* prev;
    // Points to next UE
    struct UeParticles* next;
};

struct MappingEntry
{
    // Permanent identifier of UE
    uint64_t imsi;
    // Temporary identifier of UE
    uint32_t MME_UE_S1AP_ID;
    // Points to previous UE mapping
    struct MappingEntry* prev;
    // Points to next UE mapping
    struct MappingEntry* next;
};
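The particle structure earlier in this appendix carries a position and an importance weight, so a point estimate of the UE position can be formed as the weighted mean of the particle positions. The sketch below shows that standard estimate on a reduced copy of the structure. It is an illustration rather than the thesis code, and weighted_mean_position is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced copy of the particle structure above. */
struct particle {
    double x, y;
    double weight;
};

/* Weighted mean position over n particles, reached through an array of
   pointers in the same way as the particles member of UeParticles.
   Returns 0 on success and -1 if the weights do not sum to a positive
   value, in which case the outputs are left untouched. */
int weighted_mean_position(struct particle *const *particles, size_t n,
                           double *x_out, double *y_out)
{
    double wsum = 0.0, xsum = 0.0, ysum = 0.0;

    assert(particles != NULL && x_out != NULL && y_out != NULL);
    for (size_t i = 0; i < n; i++) {
        wsum += particles[i]->weight;
        xsum += particles[i]->weight * particles[i]->x;
        ysum += particles[i]->weight * particles[i]->y;
    }
    if (wsum <= 0.0)
        return -1;
    *x_out = xsum / wsum;
    *y_out = ysum / wsum;
    return 0;
}
```

Dividing by the weight sum makes the function independent of whether the weights were normalised beforehand, and the error return covers the degenerate case in which every particle has zero weight.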

Bibliography

[1] 3GPP, The mobile broadband standard, http://www.3gpp.org/, Accessed 14th Sep. 2015.

[2] 3GPP releases, http://www.3gpp.org/specifications/67-releases, Accessed 3rd November 2015.

[3] 3GPP TS 29.281, General Packet Radio System (GPRS) Tunnelling Protocol User Plane (GTPv1-U).

[4] 3GPP TS 36.410, Evolved Universal Terrestrial Radio Access Network (E-UTRAN); S1 general

aspects and principles.

[5] 3GPP TS 36.420, Evolved Universal Terrestrial Radio Access Network (E-UTRAN); X2 general

aspects and principles.

[6] 3GPP TS 36.331, Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Resource Control

(RRC); Protocol specification.

[7] MoShell 11.0p User Guide, 1553-CXC1328930, Ericsson AB, 25th October 2015.

[8] Managed Object Model (MOM) User Guide, 1553-1/CSX 101 09 Uen H, Ericsson AB, 12th May

2011.

[9] The LTE Network Architecture, A comprehensive tutorial, Alcatel white paper, 2009.

[10] System and Node Description FDD, 1/1551-LZA 701 6004-V1 Uen AC1, Ericsson AB, 15th April

2015.

[11] Performance Management, 3/1551-HSC 105 50/1-V1 Uen V, Ericsson AB, 22nd May 2014.

[12] PM-Initiated UE Measurements, 35/1553-HSC 105 50/1-V1 Uen H1, Ericsson AB, 27th November 2014.

[13] Trace Event Streaming, 51/1553-HSC 105 50/1-V1 Uen D, Ericsson AB, 6th December 2011.

[14] Cell Trace Mapping, 53/221 02-AXB 250 05/8 Uen E, Ericsson AB, 13th August 2013.


[15] MME Cell Trace UE Id Mapping File Format, 1/198 18-CRA 250 63 Uen, Ericsson AB, 7th April

2010.

[16] MME Cell Trace UE Id Mapping XML Definition, 1/1551-CRA 250 63, Ericsson AB, 30th

November 2009.

[17] LTE Cell Trace UE ID Mapping, 140/1594-FCP 103 8147 Uen, Ericsson AB, 25th August 2009.

[18] Gom Interface Description, 9/1551-AXB 250 05/8 Uen AY1, Ericsson AB, 6th February 2013.

[19] Classes for Managed Object Model (on Ericsson CPI Store), 155 54-EN/LZN 785 0001/3-V1 Uen

B, Ericsson AB, Accessed 15th November 2015.

[20] PM Event List, LTE RAN, 6/1551-LZA 701 6004-V1 Uen, Revision U, Ericsson AB, 2015.

[21] 3GPP TS 36.300, Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Universal

Terrestrial Radio Access Network (E-UTRAN); Overall description.

[22] 3GPP TS 36.214, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer - Measurements.

[23] 3GPP TS 36.213, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer procedures.

[24] 3GPP TS 36.305, Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Stage 2 functional specification of User Equipment (UE) positioning in E-UTRAN.

[25] Mike Thorpe, Ewald Zelmer, LTE Location Based Services Technology Introduction White paper, Rohde & Schwarz, September 2013.

[26] Nico Deblauwe. GSM-based Positioning: Techniques and Applications. PhD thesis, Vrije Universiteit Brussel, 2008.

[27] F. Gunnarsson, M. Johansson, A. Furuskar, M. Lundevall, A. Simonsson, C. Tidestav and M. Blomgren, Downtilted Base Station Antennas - A Simulation Model Proposal and Impact on HSPA and LTE Performance, Vehicular Technology Conference, 2008.

[28] 3GPP TR 25.814, Physical layer aspects for evolved Universal Terrestrial Radio Access (UTRA).

[29] 3GPP TR 25.996, Spatial Channel Model for Multiple Input Multiple Output (MIMO).

[30] F. Gustafsson, Particle filter theory and practice with positioning applications, Aerospace and Electronic Systems Magazine, 2010.

[31] Kathrein 80010656, Dual-Beam Panel Antenna, Kathrein Antennen. Electronic,

https://www.kathrein.com/.


[32] G. Giorgetti, RF-Based Localization in GPS-Denied Applications, PhD thesis, Aug 2009.

[33] Y. C. Ho and R. C. K. Lee, A Bayesian approach to problems in stochastic estimation and control,

IEEE Trans. Automatic Control, vol. 9, pp. 333-339, 1964.

[34] Thomas B. Schön. Solving Nonlinear State Estimation Problems Using Particle Filters - An Engineering Perspective. PhD thesis, Linköpings universitet, 2006.

[35] Fredrik Gustafsson. Statistical Sensor Fusion. Studentlitteratur AB, 2010.

[36] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, Novel approach to nonlinear/non-Gaussian

Bayesian state estimation, IEE Proc.-F, vol. 140, no. 2, pp. 107-113, 1993.

[37] J. S. Liu and R. Chen, Sequential Monte Carlo methods for dynamical systems, Journal of the

American Statistical Association, vol. 93, pp. 1032-1044, 1998.

[38] A. Doucet, S. Godsill, and C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian

Filtering, Statistics and Computing, vol. 10, no. 3, pp. 197-208, 2000.

[39] F. Gunnarsson, F. Lindsten, and N. Carlsson, Particle Filtering for Network-Based Positioning

Terrestrial Radio Networks, IET Conference, 2014.

[40] FCC. Enhanced-911-wireless services, https://www.fcc.gov/encyclopedia/enhanced-9-1-1-wireless-services, Accessed 3rd November 2015.

[41] Fredrik Gunnarsson, Fredrik Gustafsson. Mobile positioning using wireless networks, IEEE Signal

Processing Magazine, 22(4):41-53, 2005.

[42] Torbjorn Wigren, Ari Kangas, Iana Siomina. Chapter 32: Positioning in LTE, in: Handbook of Position Location: Theory, Practice and Advances, November 2011.

[43] A. Doucet, N. de Freitas, and N. J. Gordon, Sequential Monte Carlo Methods in Practice. New

York: Springer, 2001.

[44] Vesselin P. Jilkov and X. Rong Li. Survey of maneuvering target tracking. Part 1: Dynamic models. IEEE Transactions on Aerospace and Electronic Systems, 39(4):1333-1364, 2003.

[45] Gaussian Distribution, University of Notre Dame, https://www3.nd.edu/~rwilliam/stats1/x21.pdf, Accessed 3rd May 2016.

[46] Maximum Likelihood Estimator, University of Oxford/Robotics, http://www.robots.ox.ac.uk/~az/lectures/est/lect34.pdf, Accessed 3rd November 2015.


[47] N. Carlsson, Position estimation in LTE using Particle Filters, 2014.

[48] Parallel Computing, Zhonghai Lu, IL2226:Embedded System Design, July-15, 2016.

[49] Marwa Abdul-Monem Al-Shandawely, Impacts on data structures and algorithm on multicore, Master thesis, https://www.pdc.kth.se/~erwinl/ThesisMarwa.pdf, July 15, 2016.

[50] Chen Zhang, MESI Cache Coherence, The University of Iowa/High Performance Computer Architecture, Spring 2006, http://homepage.cs.uiowa.edu/~ghosh/4-20-06.pdf, 18th August 2016.

[51] Coherence Techniques by Silvia Lametti, Master Thesis, University of Pisa, December 1 2010, accessed 12th August 2016.

[52] Technical Reference Manual of ARM Cortex-A15, revision r2p0, ARM, 28 September 2011, accessed 15 May 2016.

[53] Cortex-A Series Programmer’s Guide, Version 1.0, ARM, 25 March 2011, http://www.dsi.fceia.unr.edu.ar/downloads/EPEC/CortexASeriesProgrGuide.pdf, accessed 15 May 2016.

[54] Dr. Lennart Johnsson, Introduction to HPC, University of Houston/COSC6365 Introduction to High-Performance Computing, Spring 2014, http://www2.cs.uh.edu/~johnsson/COSC63652013/Lecture07S.pdf-52, accessed 15 August 2016.

[55] DUS-41 introduction with enodeb 4g aws for Telcel commercial offer, TEM-14:001515 Uen, Rev

PA2, 5th November 2014.

[56] KT LTE - DUL20 to DUS41 Migration, Ericsson AB, 3rd February 2014.

[57] U14B Release Presentation, 1/221 09 - FGC 101 1081Uen, Rev A, Ericsson AB, 24th September

2014.

[58] TCU03, Built-in transport for RBS 6000, 287 01-FGC 101 2511 Revision A, Ericsson AB 2014.

[59] Wind River Linux, http://www.windriver.com/products/linux/, Accessed 10th November 2015.

[60] OpenMP, http://openmp.org/wp/, Accessed 10th November 2015.

[61] Debian, https://www.debian.org/, Accessed 10th November 2015.

[62] WGS84, http://earth-info.nga.mil/GandG/wgs84/ , Accessed 10th November 2015.

[63] Linaro: open source software for ARM® SoCs, http://www.linaro.org/ , Accessed 10th November

2015.


[64] GDB: The GNU Project Debugger, https://www.gnu.org/software/gdb/, Accessed 10th November

2015.

[65] Git, https://git-scm.com/ , Accessed 10th November 2015.

[66] Valgrind, http://valgrind.org/, Accessed 10th November 2015.

[67] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design). Morgan Kaufmann Publishing Co., 2002.

[68] SixSectorInETN by O.Real, G.Widell, H.Stridell, Ericsson AB, 3rd September 2015.

[69] Intel Guide for developing Multithreaded application, Henry Gabb, Martyn Corden, Todd Rosenquist, Paul Fischer, Julia Fedorova, Clay Breshears, Thomas Zipplies, Vladimir Tsymbal, Levent Akyil, Anton Pegushin, Alexey Kukanov, Paul Petersen, Mike Voss, Aaron Tersteeg and Jay Hoeflinger, 16 Jan 2012.

[70] Gerson Robboy, Portland State University/CS 201: Computer Systems Programming, http://web.cecs.pdx.edu/~jrb/cs201/lectures/cache.friendly.code.pdf, 23rd July 2015.

[71] Andras Vajda, with contributions by Mats Brorsson and Diarmuid Corcoran, foreword by Hakan Eriksson, Programming Many-Core Chips, ISBN 9781441997395, 9781441997388, DOI 10.1007/978-1-4419-9739-5, Springer US, 2011.

[72] Memcheck: a memory error detector, http://valgrind.org/docs/manual/mc-manual.html, accessed 3rd September 2015.

[73] Cachegrind: a cache and branch-prediction profiler, http://valgrind.org/docs/manual/cg-manual.html, accessed 3rd September 2015.

[74] Helgrind: a thread error detector, http://valgrind.org/docs/manual/hg-manual.html, accessed 3rd September 2015.

[75] PAPI: Performance Application Programming Interface, http://icl.cs.utk.edu/papi/, accessed 3rd September 2015.

[76] GNU gprof: for profiling a program, https://sourceware.org/binutils/docs/gprof/, accessed 3rd September 2015.

[77] Mini-Link SP, Ericsson AB, http://www.ericsson.com/ourportfolio/products/sp, Accessed 4th

November 2015.


[78] Kcachegrind, Call Graph Viewer, https://kcachegrind.github.io/html/Home.html, Accessed 4th

November 2015.

[79] Valgrind Massif: A heap profiler, http://valgrind.org/docs/manual/ms-manual.html, Accessed 4th

November 2015.

[80] MT-DTEA-PA1, MT2015 Ericsson Internal Presentation, Accessed 10th May 2015.

TRITA-ICT-EX-2016:158

www.kth.se
