ANALYZING SOCIAL COMMUNITIES AND ITS IMPORTANCE ON DYNAMIC MOBILE NETWORKS

By

MD ABDUL ALIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2016

© 2016 Md Abdul Alim

To my family

ACKNOWLEDGMENTS

I would like to express my utmost gratitude to my supervisor Prof. My T. Thai for

her continuous support and guidance during my study and research at the University of

Florida. I have benefited a lot from her keenness in research endeavors, great skills in

writing and presentation, and her great personality and enthusiasm, which profoundly

inspired me throughout my journey. Her wisdom, support and advice have guided me

through all of my difficult moments, not only in doing research but also in my personal

life. Also, I am grateful to have excellent lab-mates who have provided extremely helpful

resources during my study.

I am thankful to Prof. Tamer Kahveci, Prof. Prabhat Mishra, Prof. Panos Pardalos

and Prof. Daisy Zhe Wang for serving on my PhD committee.

Finally, I would like to thank all my family members for their relentless support

throughout my study as well as for my career.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 SOCIAL COMMUNITIES AND ITS IMPORTANCE ON MOBILE NETWORK EFFICIENCY
  1.1 Introduction
    1.1.1 Social Community and Multi-hop D2D Communication
    1.1.2 Efficient Content Transmission Through D2D Multicast Communication
    1.1.3 Community Structure Vulnerability
    1.1.4 Contribution
    1.1.5 Paper Organization
  1.2 Literature Review
    1.2.1 Recent Advances in Multi-hop D2D Communication
    1.2.2 D2D and Multicasting in Cellular Network
    1.2.3 Community Structure Vulnerability

2 LEVERAGING SOCIAL COMMUNITIES FOR OPTIMIZING CELLULAR DEVICE-TO-DEVICE COMMUNICATIONS
  2.1 Cost-effective Relay Selection for Content Delivery in Multi-hop D2D
  2.2 System Overview and Model Representation
    2.2.1 System Overview
    2.2.2 System Model
  2.3 Problem Formulation and Solution
    2.3.1 Problem Formulation
    2.3.2 Social Community Aware Cellular Network
    2.3.3 Community Structure and Durable Community
      2.3.3.1 Durable community detection
      2.3.3.2 A greedy algorithm for DCD problem
  2.4 Cost-Effective Device Selection
    2.4.1 Relay Graph Construction
    2.4.2 We Weight Assignment in Gr
    2.4.3 Social Community Aware Device Selection for Multi-hop D2D
    2.4.4 Solving the Optimization Problem
    2.4.5 Exact Solution by Cutting Plane
  2.5 Performance Evaluation
  2.6 Summary

3 TOWARDS EFFICIENT SOCIAL-AWARE CONTENT TRANSMISSION THROUGH DEVICE-TO-DEVICE MULTICAST COMMUNICATIONS
  3.1 D2D Enhanced Content Transmission
    3.1.1 Problem Setup
    3.1.2 System Model
      3.1.2.1 Radio network
      3.1.2.2 Social network
  3.2 Problem Formulation
  3.3 Solution for MRS
  3.4 Experimental Evaluation
  3.5 Summary

4 SOCIAL-AWARE MULTICAST CONTENT TRANSMISSION: SPECIAL CASE SCENARIO
  4.1 CRS: Two-hop MRS
    4.1.1 Solution Sketch
    4.1.2 Solutions for RSP
      4.1.2.1 The optimal solution to RSP
      4.1.2.2 The greedy solution to RSP
    4.1.3 Solution to CRS
  4.2 Experimental Evaluation
  4.3 Summary

5 ROBUSTNESS OF COMMUNITY STRUCTURES: APPROXIMATION ALGORITHMS AND ANALYSIS
  5.1 Density-based Analysis
    5.1.1 Network Model and Problem Definition
    5.1.2 Complexity of DBC
    5.1.3 Solutions to DBC
  5.2 A General Framework
  5.3 Broken Community Analysis: Constraint on Edge Removal
    5.3.1 k-Density-based Broken Community
    5.3.2 A General Framework: k-Broken Community Assessment
  5.4 Experimental Evaluation
    5.4.1 Data Set
    5.4.2 Performance Evaluation of CVA
    5.4.3 Performance Evaluation for Generalized Framework
    5.4.4 Analysis of the Edge Constrained Version
      5.4.4.1 Results for k-DBC problem
      5.4.4.2 Results for k-BCA problem
  5.5 Summary

6 CONCLUSION

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Summary of important symbols
2-2 Running times in seconds for DCD
2-3 Comparison of running times in seconds
2-4 Main wireless network parameters
3-1 CQI / MCS table for LTE-A [3]
3-2 Summary of notations
3-3 Social CQI Matrix (SCM)
3-4 Wireless network parameters
3-5 Gap analysis between ORS and ERS
3-6 D2D pair counts in ORS and ERS
3-7 Comparison of delivery times in second
5-1 Experimental datasets
5-2 Network communities
5-3 Network characteristics

LIST OF FIGURES

2-1 D2D communication scenario before the transmission takes place
2-2 Flow chart for the proposed solution scheme
2-3 Content transmission success rate for different cases
2-4 Offload performance analysis for different cases
2-5 Cost-effectiveness of multi-hop D2D for three different content sizes and a range of tmax
2-6 Execution time of RPF
2-7 Impact of different parameters for constructing Gp on the performance of RPF
2-8 The cost of the BS vs user count
3-1 D2D enabled multicast (a multi-hop scenario): v1 and v3 form the relay devices in second hop, along with purple nodes (v2, v4) they are directly served by the eNB in first hop. v5, v6, v7 and v8 denote the relay devices in subsequent hops.
3-2 Analysis of ERS performance vs ORS
3-3 Delivery time for varying the eNB budget
3-4 Delivery time for varying the multicast users
3-5 Delivery time for varying the RB count
3-6 Delivery time for varying the content size
3-7 Delivery time and D2D pair count for varying the social tie distribution
3-8 Heatmap depicting hop count for varying both budget and multicast user
4-1 eNB Budget (RD count) for varying user count
4-2 Content delivery time for varying user count
4-3 Execution time for varying user count
5-1 Density based broken community analysis for k largest community
5-2 Edge removal count by greedy algorithm CCF for breaking k largest communities. γ = 0.5 in first column, γ = 0.3 in second column
5-3 Edge removal count by CCF for breaking k randomly selected communities. γ = 0.5 in first column, γ = 0.3 in second column
5-4 Edge removal count by CCF for breaking k smallest communities. γ = 0.5 in first column, γ = 0.3 in second column
5-5 A small community detected by Oslom for γ = 0.3 in Enron network. Here the internal structure shows parts are connected through small number of edges. Our greedy algorithm removes the pink cut edges.
5-6 A community detected by Oslom for γ = 0.3 in Facebook network. Here the internal structure shows parts are connected through small number of edges in pink.
5-7 Broken Community Analysis. k-DBC in 1st Column, Outcome of CEL on k-BCA in 2nd Column

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ANALYZING SOCIAL COMMUNITIES AND ITS IMPORTANCE ON DYNAMIC MOBILE NETWORKS

By

Md Abdul Alim

December 2016

Chair: My T. Thai

Major: Computer Engineering

Many complex systems, from the World Wide Web and online social networks to mobile

networks, exhibit community structure in which nodes can be grouped into densely

interconnected communities. This special structure has been exploited extensively

to design better solutions for many applications such as routing in wireless networks,

worm containment and interest prediction in social networks. In this dissertation, the

impact of social communities on emerging device-to-device (D2D) communication

has been analyzed and a social-aware scheme has been introduced taking social

encounters into context for time sensitive content transmission. Simulation results show

that the proposed social community-aware approach yields significant performance

gain, in terms of the amount of traffic offloaded from the cellular network to the D2D tier

compared to social-unaware methods.

Recently, the trend of accessing popular video-on-demand content over cellular

networks has grown dramatically due to the widespread use of social media,

thus straining the capacity of existing wireless cellular networks. The cellular network

resource utilization can be significantly improved when requests for a particular content

are generated from a group of users located in a particular area. In such cases, a

traditional multicast scheme serves all users in a cell by limiting the data rate to the user

with the worst channel condition which results in degraded satisfaction for users with

better channel quality. Unlike the conventional multicast scheme, which also assumes

the altruistic nature of users and does not consider the Base Station (BS) cost, a novel

framework leveraging both the D2D communication and the social relationship between

users has been introduced in this dissertation with the aim of achieving better quality of

service while delivering time sensitive video content to multicast users. Experimental

evaluation shows our proposed solution achieves significant enhancements of the

overall performance compared to the state-of-the-art solutions.

Since community-based approaches not only provide helpful information in

developing more social-aware strategies for mobile network problems but also

promise a wide range of applications enabled by social networking, analyzing and

properly understanding the behaviors and characteristics of communities is of great

advantage. Investigating how the community structure is reshaped under node and

edge removal and consequently, what footprint it leaves on the network performance

is of particular interest. Due to the high interaction within a community, it is assumed

that communities are hard to break; therefore, community-based solutions are very

robust. In this dissertation, we aim to address this important question: can communities be

broken easily in a network? To answer this question, at first, a density-based problem

formulation for analyzing the vulnerability of network communities is introduced in

terms of edge removal from the network. The NP-completeness of the problem is

proven and an O(log k) approximation algorithm for solving the problem, where k is

the number of communities to be broken, has been introduced. Moreover, it has also

been shown that approximating the problem within a ratio better than that of our proposed

solution is unlikely to be possible. Additionally, the vulnerability of communities in the context

of arbitrary community detection algorithms is analyzed. The empirical results show that

communities are vulnerable to edge removal and in some cases the removal of a small

fraction of edges can break the community structure.

CHAPTER 1

SOCIAL COMMUNITIES AND ITS IMPORTANCE ON MOBILE NETWORK EFFICIENCY

1.1 Introduction

Complex networks in general exhibit the property of having community structure in

which nodes can be grouped into densely interconnected communities. Understanding

the behaviors and characteristics of communities is of great advantage. It not only

provides helpful information in developing more social-aware strategies for social

network problems but also promises a wide range of applications enabled by mobile

networking, such as routing in Delay Tolerant Networks (DTNs) [35], mode selection

and resource allocation in Device-to-Device (D2D) communication [41, 71], and worm

containment in cellular networks [82]. Furthermore, communities reveal the core

network components together with their mutual interactions, thereby representing

the entire network at a compact and more descriptive level. Understanding the

community properties can thus help design efficient solutions for such applications.

In this context, we analyze the characteristics of social communities in wireless networks

and leverage their importance in designing efficient solutions for device-to-device (D2D)

communications.

1.1.1 Social Community and Multi-hop D2D Communication

The demand for wireless data services has increased exponentially in the past

decade, thus straining the capacity of existing wireless cellular networks [28] and [32].

One promising solution to meet this capacity crunch is to offload cellular traffic via the

use of direct device-to-device (D2D) communications for enabling proximity services

over the cellular licensed band [60]. To reap the benefits of D2D over cellular, there

is a need to optimize and manage the added cellular interference resulting from D2D

[54]. However, due to the high mobility of cellular devices, establishing and ensuring the

success of D2D transmission is a major challenge.

Recently, there has been increased interest in operating D2D over cellular

using multi-hop transmissions (henceforth referred to as multi-hop D2D) [16, 38, 50].

Such multi-hop D2D architectures can reduce the outage probability while potentially

increasing the capacity of D2D communication by alleviating the effect of interference

from the cellular users [33, 44, 53, 68, 74]. Unlike multi-hop ad hoc networks, which do

not use the cellular spectrum and do not require any infrastructure, multi-hop D2D is

controlled centrally by the base station (BS) for ensuring the QoS of both the cellular

and D2D users simultaneously. In cellular multi-hop D2D scenarios, one must properly

group the mobile devices in order to achieve the required quality-of-service (QoS).

Such a grouping is particularly dependent on the mobility patterns of the devices. One

major challenge in the analysis of such mobile, multi-hop D2D pertains to its strong

dependence on dynamic human behavior which must be correlated with the complex

QoS considerations of the cellular system.

Recently, it has been observed that cellular devices carried by humans exhibit

a pattern with respect to their physical encounters both in space and time [22] and

[35]. Such social encounters have been shown to exhibit a community structure

property which implies that the network can be divided into groups of nodes with dense

connections inside each group and fewer connections across groups. From a D2D

perspective, users who encounter one another frequently will be likely to form a social

community [25, 34]. Additionally, the longer a device stays close to another device, the

stronger their mutual interaction becomes compared with sporadic contacts.

Moreover, a large number of longer duration contacts over a period of time makes the

mutual connection more reliable for the continuity of a D2D session which forms the

basis of durable communities. Leveraging such durable communities for improving D2D

transmission therefore constitutes an opportunity that has hitherto not been explored.
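
To make the notion of repeated long-duration contacts concrete, the following minimal Python sketch filters an encounter log down to the device pairs that met often and for long enough. The log format, the function name `durable_edges`, and both thresholds are assumptions chosen for illustration only; this is not the durable community detection procedure developed in Chapter 2.

```python
from collections import defaultdict


def durable_edges(encounters, min_duration, min_long_contacts):
    """Return the device pairs with at least `min_long_contacts` encounters
    that each lasted `min_duration` seconds or more.

    `encounters` is a list of (device_u, device_v, duration_seconds) tuples.
    Both thresholds are hypothetical tuning parameters.
    """
    long_contacts = defaultdict(int)
    for u, v, duration in encounters:
        if duration >= min_duration:
            # frozenset makes the pair unordered: (a, b) and (b, a) are the same link
            long_contacts[frozenset((u, v))] += 1
    return {pair for pair, count in long_contacts.items()
            if count >= min_long_contacts}


# Made-up encounter log: a and b meet three times for a long time,
# a and c only meet briefly, b and c have a single long contact.
log = [
    ("a", "b", 600), ("a", "b", 900), ("b", "a", 700),
    ("a", "c", 30), ("a", "c", 20),
    ("b", "c", 800),
]
durable = durable_edges(log, min_duration=300, min_long_contacts=2)
print(durable)
```

With these made-up values only the pair (a, b) survives the filter, which matches the intuition that sporadic or short contacts are a weak basis for sustaining a D2D session.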

For establishing D2D connections, the cellular BS must provide proper incentives

to the users so that they become willing to share their resources for each other's

transmissions, which in turn incurs a cost to the BS [81]. Naturally, if most users are

unwilling to participate in D2D transmission, the resources cannot be fully utilized,

and the operation of the underlaid cellular D2D links will be jeopardized. For real-time

content transmission, which must meet stringent latency requirements, high mobility

of the devices will disrupt an ongoing D2D session. This will eventually lead the D2D

transmission to fail in delivering the content within the needed time bound. In such

cases, the BS must initiate a resource-consuming cellular connection after dropping the

interrupted session, thus reducing the overall network QoS and failing to exploit the

benefits of D2D. Consequently, to enable reliable delivery of real-time content over

multi-hop D2D at minimum BS cost, it is imperative to identify a set of reliable devices.

Also, such devices must remain within the transmission range of one another during the

D2D session to maintain the QoS. In this thesis, we show that leveraging community

structure helps find reliable devices that enable successful content transmission.

1.1.2 Efficient Content Transmission Through D2D Multicast Communication

The trend of accessing popular video-on-demand content over cellular networks has

grown dramatically due to the widespread use of social media [61], thus straining

the capacity of existing cellular networks. It has become even more challenging to

guarantee a certain quality of service (QoS) in terms of content delivery time. The cellular

network resource utilization can be significantly improved when requests for a particular

content are generated from a group of users located in a particular area [6]. In such

cases, a traditional multicast scheme serves all users in a cell by limiting the data rate to

the user with the worst channel condition. Therefore, users with better channel quality

cannot take advantage of it, which results in degraded satisfaction. One promising

solution to address this issue is to use device-to-device (D2D) communications for enabling

proximity services over the cellular licensed band [16]. In such scenarios, D2D can

achieve superior data rate even with small transmit power by utilizing the better channel

quality among devices, thus enhancing the QoS. Due to the high interest in the same piece

of video content in a particular location and the performance improvement achieved by

D2D communication, the 3rd Generation Partnership Project (3GPP) defined proximity

based multicast as a service to efficiently deliver content over the cellular network

especially during crowded events [3], [4].
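
The bottleneck effect of conventional multicast can be illustrated with a small Python sketch. It compares a single multicast transmission, limited by the slowest user, with a hypothetical two-hop alternative in which users with good channels receive the content first and then forward it over a D2D link. The two-hop timing model, the function names, and all numeric values are assumptions made for illustration; they are not the system model of Chapter 3.

```python
def conventional_multicast_time(content_bits, user_rates_bps):
    """One multicast transmission, limited by the slowest user in the group."""
    return content_bits / min(user_rates_bps)


def two_hop_time(content_bits, first_hop_rates_bps, d2d_rates_bps):
    """Hypothetical two-hop model: the BS multicasts to the first-hop users,
    then those users forward over D2D links. Each hop is limited by its own
    slowest link, and the two hops run one after the other."""
    first_hop = content_bits / min(first_hop_rates_bps)
    second_hop = content_bits / min(d2d_rates_bps)
    return first_hop + second_hop


content_bits = 80e6                 # 80 Mbit of video (made-up value)
cellular_rates = [1e6, 8e6, 10e6]   # bit/s; the 1 Mbit/s user has the worst channel

# Conventional multicast: everyone is served at 1 Mbit/s.
conventional = conventional_multicast_time(content_bits, cellular_rates)

# Two hops: serve the two good users at 8 Mbit/s, then relay to the weak
# user over an assumed 20 Mbit/s D2D link.
relayed = two_hop_time(content_bits, [8e6, 10e6], [20e6])
print(conventional, relayed)
```

Under these assumed rates the relayed delivery finishes in 14 seconds instead of 80, because the weak cellular link no longer dictates the rate of the whole group.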

Existing works have shown that D2D can dramatically improve the performance of

the underlying cellular network in terms of guaranteed superior data rates, effective quality

of service, high spectrum efficiency and enhanced system capacity [6], [16]. However,

they failed to consider the following two practical aspects: (1) Base Station (BS) cost

and (2) social relationship between users. In the real-world scenario, mobile users are

reluctant to share their resources [48] which makes it challenging to choose suitable

users as relay. Several factors, such as finite energy, limited storage, valuable CPU

resource and security and privacy considerations make them far from altruistic. BSs

must pay incentives to encourage users so that they are more willing to share their

resources, which in turn incurs a cost to the BSs [81]. Unfortunately, the majority of current

works have ignored the BS cost and assumed cooperative and selfless users while

designing D2D systems [6]. Human social relationships are a very important factor

in D2D system design since the devices are carried by humans. Social relationships

in general exhibit the property of having community structure in which users can be

grouped into densely interconnected communities. Users belonging to the same social

community in real life will be more interested in obtaining content from another user in

that community, and a user will be more willing to share its resources with other

socially connected users in the same real-world community [48]. The majority of recent

works on multicasting have also failed to consider this social aspect of human behavior

[6]. We, on the contrary, in this dissertation, incorporate both of these aspects while

considering potential relay devices for D2D-based multicast communication.

In this dissertation, we reap the benefit of D2D by identifying a set of Relay Devices

(RDs) in different hops that will collectively incur at most a given cost to the eNB (BS in

LTE-A [3]) for efficiently relaying content. RDs in the first hop receive data directly from

the eNB and forward it to the next hop users who are socially connected to that RD and

also within close physical proximity. By leveraging the knowledge of social connections

and channel conditions among devices, the eNB decides the modulation and coding scheme

(MCS) for each hop and also selects hop-wise RDs to minimize latency.
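
The forwarding rule described above, in which an RD serves only users who are both socially connected to it and physically close, can be sketched as a simple filter. The function name `next_hop_receivers`, the planar coordinates, the distance threshold `d2d_range`, and the example data are all assumptions for illustration; the actual relay and MCS selection is formulated in Chapter 3.

```python
import math


def next_hop_receivers(rd, positions, social_ties, d2d_range):
    """List the users that relay device `rd` could serve in the next hop.

    A user qualifies only if it has a social tie with `rd` AND lies within
    `d2d_range` (same unit as the coordinates) of `rd`.
    `positions` maps device -> (x, y); `social_ties` is a set of frozenset pairs.
    """
    rx, ry = positions[rd]
    receivers = []
    for user in sorted(positions):
        if user == rd:
            continue
        tied = frozenset((rd, user)) in social_ties
        ux, uy = positions[user]
        close = math.hypot(ux - rx, uy - ry) <= d2d_range
        if tied and close:
            receivers.append(user)
    return receivers


# Made-up layout: u1 is tied and close, u2 is tied but far, u3 is close but not tied.
positions = {"rd": (0, 0), "u1": (30, 40), "u2": (300, 0), "u3": (10, 0)}
social_ties = {frozenset(("rd", "u1")), frozenset(("rd", "u2"))}
receivers = next_hop_receivers("rd", positions, social_ties, d2d_range=100)
print(receivers)
```

Only u1 passes both checks in this example; u2 and u3 would have to be reached through another RD or served directly by the eNB.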

1.1.3 Community Structure Vulnerability

Understanding the community properties can help assess their impact on network

vulnerability, since changes or failures occurring in one community can have a profound

impact, which can consequently lead to the transformation of other communities. Due

to the high interaction within a community, we intuitively assume that communities are

hard to break; therefore, community-based solutions are very robust. Let us take a

community-aware routing protocol in DTNs as an example. In this approach, a group

or community in DTNs can be visualized as a group of frequently interacting wireless

devices with less connectivity to other groups. Devices in the same community have

higher chances to encounter each other to transfer carried messages. Therefore, the

knowledge of the community structure could help the routing protocols to wisely choose

better forwarding relays for any specific destination, and hence, could significantly

improve the chance of message delivery. These approaches have been shown to

be very efficient and are among the best methods in DTNs [31, 35]. However, the

success of the forwarding clearly depends to a great extent on the internal structure

of communities. The non-participation of only some important devices is significant

enough to degrade the entire network’s performance. Removal of certain edges can

lead to unstable behavior of the whole routing process [11]. This raises a question: Are

communities really as hard to break as believed, even under intentional attacks?

In this dissertation, we first study how social-aware approaches can significantly

improve the performance of multi-hop D2D, and subsequently we proceed to assess

community strength with respect to the removal of edges. The removal of edges can be

interpreted as the failure of communication links in wireless networks or DTNs due to

energy constraints or the movement of wireless devices. The removal of edges can also

occur via unfriending in online social networks (OSNs). More specifically, in this dissertation, we choose several

combinations of different types and sizes of communities and attempt to break them.

Clearly, if the number of edges removed is significantly less than the total number of

edges in communities, we can say that it is easy to break the communities. Otherwise,

we conclude that the communities are very strong.
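
The comparison between removed edges and total community edges can be illustrated with a toy example. The sketch below uses loss of connectivity as a stand-in for a community being "broken", which is an assumption made purely for illustration; the criteria actually used in this dissertation (density-based and detection-algorithm-based) are defined in Chapter 5. The graph, the helper names, and the choice of removed edge are likewise hypothetical.

```python
from collections import deque


def is_connected(nodes, edges):
    """Breadth-first search check that all `nodes` are mutually reachable via `edges`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return seen == nodes


def removal_fraction(community_edges, removed_edges):
    """Share of the community's edges that had to be removed."""
    return len(removed_edges) / len(community_edges)


# Toy community: two triangles joined by the single edge (3, 4).
community = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
removed = [(3, 4)]
remaining = [e for e in edges if e not in removed]

intact_before = is_connected(community, edges)
intact_after = is_connected(community, remaining)
fraction = removal_fraction(edges, removed)
print(intact_before, intact_after, fraction)
```

Here removing one of seven edges splits the toy community in two, the kind of outcome that would indicate a community that is easy to break.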

Unfortunately, identifying these critical edges is very challenging due to several

factors: 1) Communities behave very differently based on the location of edge removal.

They can either stay intact if the removed edge is less important, or be broken down

into smaller subcommunities, which can then be merged into other communities. 2)

There is no universally agreed definition of community, and there is a vast number of

community detection algorithms in the literature [29, 51, 58], which forces us to define a

general method to assess the broken communities for an arbitrary community detection

algorithm. 3) The networks are large-scale, so the algorithms devised for identifying

these critical edges must be scalable.

1.1.4 Contribution

The main contribution of this dissertation is to introduce a new framework that

exploits durable social communities to enable successful transfer of a content between

two devices with minimum cost using multi-hop D2D. We model the problem as

a cost-effective device selection strategy on multi-hop D2D for real-time content

delivery. We first formulate the durable community structure and introduce the

concept of sustainable and bridge edges by exploiting the historical encounters of

devices. We further propose a novel community detection method based on those

previous encounters. Subsequently, we formulate the device selection problem as an

optimization problem and we introduce an efficient method for finding the optimal set

of devices on a multi-hop path leveraging those social communities. This is in contrast

to most existing works on multi-hop D2D that solely focus on system performance

[16, 38, 44, 50, 53, 68]. Simulation results show that our method outperforms classical

social-unaware methods significantly on traces generated by the state-of-the-art mobility

models.

In this dissertation, we also introduce another novel framework that exploits the

social relationships among devices to choose a set of RDs in different hops along with

hop-wise MCSs, within the eNB budget that enables successful transfer of a content

from the eNB to all multicast users in minimum time. We model this problem as an

efficient relay selection strategy for multi-hop D2D for real-time content delivery. We

show its NP-completeness and form a mixed integer program to solve it. We also

introduce a scalable heuristic algorithm to tackle this generic version of the problem. We

further analyze a special case of this problem for delivering the content to all the users.

We propose a greedy algorithm with provable performance guarantee for this particular

case. Moreover, we experimentally show the effectiveness of our proposed methods.

In terms of identifying critical edges important for community structure, we can

summarize the contributions as follows:

• We define the framework for community structure fragility. At first we introduce the density based broken community (DBC) problem for breaking k communities with the minimum number of edge removals and analyze its complexity. We then provide an approximation algorithm with theoretical performance guarantee for the DBC problem.

• To analyze the vulnerability of the community structures in a broader sense, we extend the problem formulation to communities produced from an arbitrary community detection algorithm. We offer an efficient heuristic to break the communities and identify the set of critical edges.

• In order to analyze the edge constrained version and accordingly to identify the edges that are crucial for community structure, we furthermore examine the problem from the view point of locating a fixed number of important edges whose removal breaks as many communities as possible.

• We conduct extensive experiments with different parameters to mine interesting observations about the behavior of broken communities after edge removal. The results show that only a small percentage of edges are enough for breaking the community structure. And thus, the communities are not as strong as we think.

1.1.5 Paper Organization

The rest of the dissertation is organized as follows. Chapter 2 introduces the

social-aware community based approaches for efficient content delivery in D2D. Chapter

3 discusses the social-aware relay selection mechanism for multicast content delivery

whereas Chapter 4 introduces the cost-effective relay selection (CRS) procedure, which is

a special case of the multi-hop problem discussed in Chapter 3 and also introduces

efficient algorithms with provable performance guarantee to tackle this new problem.

Chapter 5 analyzes the vulnerability of community structure and discusses how its

structure changes under edge removal. We draw the conclusion in Chapter 6.

1.2 Literature Review

In this section we discuss the recent progress made on the D2D communications

research. First, we discuss the recent publications on multi-hop D2D and describe

how our framework brings novelty in this line of research. Secondly, we discuss the

social-aware schemes aimed at improving the multicast performance in cellular networks

in terms of D2D communication. Finally, we also provide a list of recent works that have focused on the importance of the community structure and its vulnerability in particular.

1.2.1 Recent Advances in Multi-hop D2D Communication

The research community has recently seen a deluge of works that investigate the impact of underlaid D2D on the performance of cellular networks [16, 19, 21, 39, 45, 49, 66, 72, 73]. Most of these papers explore the potential of D2D, ranging from reducing the outage probability of mobile devices [28], to offloading mobile backhaul data traffic [24], to mode selection and device discovery, to efficient spectrum management through interference coordination [54, 56, 60, 78]. Despite significant research on cellular D2D, there are very

few works which consider the cellular multi-hop D2D case. One of the earliest related

works is [53] in which the relay selection problem for cellular D2D was studied. In [74],

the authors consider D2D communication for relaying user equipment (UE) traffic while

introducing a relay selection rule based on interference constraints. The works in [44]

and [68] investigate the maximum ergodic capacity and outage probability of cooperative

relaying in relay-assisted D2D communication. The results show that multi-hop D2D

lowers the outage probability and improves cell edge throughput capacity by reducing

the effect of interference from the cellular users. However, none of these works factors

in the impact of mobility of devices on the system performance and on the successful

delivery of time sensitive contents in particular.

We formulate the device selection problem as an optimization problem and we

introduce an efficient method for finding the optimal set of devices on multi-hop path

leveraging the social communities based on device encounters. This is in contrast

to most existing works on multi-hop D2D that solely focus on system performance

[16, 38, 44, 50, 53, 68].

Note that, unlike the more classical case of delay tolerant networks (DTNs) [12, 34],

we consider only time sensitive content transfer between source and target with certain

delay constraint on the total transmission time. This makes our simultaneous D2D

transmission fundamentally different from a DTN, which is distributed in nature and where

the decision to transmit a content upon a device contact is made locally. In addition,

signal interference, resource allocation, noise and fading are intrinsic design parameters

in D2D communication underlaying cellular networks which makes the design and

operation of D2D completely different from DTNs and related ideas such as ad hoc

networks.

1.2.2 D2D and Multicasting in Cellular Network

Recently, multimedia content sharing over D2D in underlaid cellular networks has been investigated in several works; a survey can be found in [6]. The work

in [15] focuses on designing an adaptive resource allocation policy for the efficient

delivery of multicast services in Long Term Evolution (LTE) systems. The authors

exploit multi-user diversity to split the multicast group into subgroups and apply subgroup-based adaptive modulation and coding schemes. The work in [63] proposes a learning

solution based on a multi-armed bandit algorithm that dynamically selects the best

allocation of users between multicast and D2D to guarantee the timely delivery of data.

The main difference of the cited works compared to our proposal is that none of these

works factors in the impact of dynamic social behavior on the system performance and

on the successful delivery of time sensitive multicast contents in particular. Moreover,

they do not consider the BS cost and assume that users are altruistic, which makes them inapplicable in real-world scenarios. We exploit the social aspect as well as radio network characteristics while choosing cost-effective relays that incur less BS cost. Moreover, our generic multi-hop approach goes beyond some recent works that leverage D2D but limit it to two-hop communication [55]. Finally, the above-mentioned D2D-based works concentrate on pairs of directly connected content source and content requester, whereas our work encompasses the broader, generic multi-hop D2D scenario for delivering the content efficiently under various practical constraints.

1.2.3 Community Structure Vulnerability

Although a lot of work has been done on network vulnerability assessment, none of it has targeted the problem from the community structure point of view by defining a quantitative measure for broken communities. Nam et al. [57] deal with community structure vulnerability from the node point of view based on the Normalized Mutual Information (NMI) measure. Alim et al. [13] identify important nodes critical for overlapping community structure. They determine how different the network communities are once nodes are removed, but do not address the core issue of whether the communities are broken or not.

The literature on community structure and its detection can be found in an excellent

survey of Fortunato et al. [29]. Assessing the vulnerability of network community structure, however, has so far been a relatively untrodden area. In his recent work [18], Borgatti addresses the problem of discovering key players in a network. A large body of work has been devoted to identifying the node roles within a community by a link-based

technique together with a modification of node degree [64], or by the detection of key

nodes, overlapping communities and “date” and “party” hubs [40]. However, none

of these approaches discusses whether the communities are strong enough under

sustained attack or not.

On the assessment of network vulnerability, existing studies mainly focus on

assessing the average shortest path length [9], and the global clustering coefficient

[52]. Dinh et al. [26] suggested the β-disruptor problem to find a minimum set of edges

or nodes whose removal degrades the total pairwise connectivity to a desired degree.

None of these works considers the assessment of network vulnerability from the community structure point of view.

CHAPTER 2

LEVERAGING SOCIAL COMMUNITIES FOR OPTIMIZING CELLULAR DEVICE-TO-DEVICE COMMUNICATIONS

In this chapter, we investigate the impact of social-aware community based

approaches on the performance of D2D underlaying cellular networks. We first present

the motivation for applying social based strategies in enhancing content delivery rate

in multi-hop D2D communication network in Section 2.1 and introduce the system in

Section 2.2 while Section 2.3 provides the problem formulation. Section 2.4 discusses

reliable device selection procedure for multi-hop D2D. Simulation results are analyzed in

Section 2.5.

2.1 Cost-effective Relay Selection for Content Delivery in Multi-hop D2D

Multi-hop transmission [16, 38, 50] has gained interest in recent times for D2D

underlaying cellular networks. Such multi-hop D2D architectures can potentially

increase the capacity of D2D communication by alleviating the effect of interference

from the cellular users [44, 53, 68]. Unlike multi-hop ad hoc networks, which do not use

the cellular spectrum and do not require any infrastructure, multi-hop D2D is controlled

centrally by BS for ensuring the QoS of both the cellular and D2D users simultaneously.

One major challenge in the analysis of such mobile, multi-hop D2D pertains to its strong

dependence on dynamic human behavior which must be correlated with the complex

QoS considerations of the cellular system.

For establishing D2D connections, the cellular base station (BS) must provide

proper incentives to the users so that they become willing to share their resources for

each other's transmissions. Naturally, if most users are unwilling to participate in D2D transmission, the resources cannot be fully utilized, and the operation of the underlaid cellular D2D links will be jeopardized. For real-time content transmission, which must meet

stringent latency requirements, a high mobility of the devices will disrupt an ongoing

D2D session. This will eventually lead the D2D transmission to fail in delivering the

content within the needed time bound. In such cases, the BS must initiate a resource-consuming cellular connection after dropping the interrupted session, thus reducing

the overall network QoS and failing to exploit the benefits of D2D. Consequently, to

enable reliable delivery of real-time content over multi-hop D2D at minimum BS cost, it is

imperative to identify a set of reliable devices. Also, such devices must remain within the

transmission range of one another during the D2D session to maintain the QoS. Next,

we lay the foundation for identifying these cost-effective relay devices on the multi-hop

D2D underlaying cellular network.

2.2 System Overview and Model Representation

2.2.1 System Overview

Consider the downlink transmission of an OFDMA cellular network consisting

of a single base station (BS) and a set N of user equipments (UEs). The UEs are

able to communicate with one another using D2D links that are underlaid on the

cellular network. The total bandwidth B is divided into F resource blocks (RB) in the

set F . We consider a co-channel network deployment in which B is shared between

cellular and D2D transmissions while considering one RB per UE. We assume UE

i requests a content from BS which, in turn, selects UE j (i , j ∈ N ) among other

UEs having the content, as the source of the content. The BS will enable direct D2D

connections between UE i and UE j when the distance between them is within a

desired D2D communication range dmax which, in turn, corresponds to a required

signal-to-interference-plus-noise ratio (SINR) as shown in Figure 2-1A.

In practice, setting up reliable direct D2D connections while satisfying the

quality-of-service (QoS) requirements of both the traditional cellular UEs (CUEs) as

well as the D2D UEs is challenging. On the one hand, the unreliable propagation

medium and longer distance might affect the link quality between D2D devices (Figure

2-1B). On the other hand, interference from other cellular and D2D UEs sharing the

same RB will also contribute toward lowering the SINR (Figure 2-1C). In such low SINR

Figure 2-1. D2D communication scenario before the transmission takes place. A) D2D transmission with high SINR due to distance d ≤ dmax. B) D2D transmission not possible due to low SINR as d > dmax. C) Channel interference between cellular communication (UE3 and BS) and D2D pairs (UE1, UE2) and (UE4, UE5).

cases, the use of multi-hop D2D communications can be beneficial to enhance the

overall D2D QoS.

Indeed, the effectiveness of multi-hop D2D depends on suitable device selection

mechanisms. Ideally, for the D2D to successfully sustain data transmission, the devices

that are chosen along the multi-hop D2D path must not move beyond the D2D range

during a communication session so as to maintain the desired SINR target. Designing

such mechanisms is challenging due to the coupling between mobility patterns,

Figure 2-2. Flow chart for the proposed solution scheme

incentives for sharing resources, and network QoS. In our model, we focus on selecting

a least cost reliable multi-hop path for real-time content delivery from a source to a

destination. It has been observed that mobility and physical encounter patterns are

very closely related to social structures, and very often the frequency and length of physical interaction are strongly correlated with proximity [22]. Therefore, we leverage the historical encounter patterns of devices to identify social communities that indicate how

devices come closer to each other. Thus, the goal of the proposed least cost multi-hop

path approach is to select devices based on the social encounters and communities so

as to make sure they stay within close proximity of one another during the D2D session.

A flow chart that summarizes the implementation of the proposed scheme is

shown in Figure 2-2. Whenever a request for a content comes to the BS from a device

r , the BS identifies the source of the content. If no such device is found to hold the

content, then, the content is transmitted directly from the BS towards r using cellular

communication.

If a device s having the content is identified, the BS initiates the durable community

detection phase by invoking the DCD algorithm that is detailed in Subsection 2.3.3.1.

The BS then assigns proper edge weights to each of the D2D pairs present in its

coverage area using the social-based technique that is explained in Subsection 2.4.2.

Finally, the BS identifies the multi-hop D2D path to relay the content from s to r, if there exists any such feasible path that can deliver the content within a certain time threshold tmax; otherwise, the BS initiates a direct cellular connection towards r. In

the former case (when a feasible path exists), if the total incentive that the BS has to

pay to the relay devices on the multi-hop path is larger than the direct BS to r cost

which is termed as B2D cost, the BS also initiates a direct cellular connection towards

r rather than serving the content via D2D. Once the content transmission starts via

multi-hop D2D, the BS keeps track of the pairwise mobility of devices for each hop. If

the device mobility leads to a minimum allowable SINR that is below a certain threshold,

the multi-hop D2D connection can no longer be sustained. At this point, the BS has

to initiate a direct cellular connection towards r to fulfill its content request. Next, we

describe the necessary system model.
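The decision flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `choose_delivery_mode` and `must_fall_back` and all parameter names are hypothetical, and the search for the feasible path itself is assumed to have been carried out elsewhere.

```python
from typing import Optional, Sequence


def choose_delivery_mode(
    source: Optional[str],
    feasible_path: Optional[Sequence[str]],
    path_cost: float,
    b2d_cost: float,
) -> str:
    """Return "D2D" or "CELLULAR" following the flow chart logic.

    source        -- device holding the content, or None if no device has it
    feasible_path -- relay path from s to r that meets t_max, or None
    path_cost     -- total incentive the BS would pay along the path
    b2d_cost      -- cost of serving r directly from the BS (B2D cost)
    """
    if source is None:
        return "CELLULAR"  # no peer device holds the content
    if feasible_path is None:
        return "CELLULAR"  # no path can deliver within t_max
    if path_cost > b2d_cost:
        return "CELLULAR"  # D2D incentives exceed the B2D cost
    return "D2D"


def must_fall_back(hop_sinrs: Sequence[float], sinr_threshold: float) -> bool:
    """During a session, fall back to cellular once any hop drops below the threshold."""
    return min(hop_sinrs) < sinr_threshold
```

In this sketch a tie between the path cost and the B2D cost is resolved in favor of D2D, since the text only requires a cellular connection when the incentive is strictly larger.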

2.2.2 System Model

In our network, we consider real-time content sharing among mobile D2D users

with strict delay requirements. We assume device r requests a content of size b from

the BS at time t. The BS identifies s, another UE, as the peer device having the content

that would serve the request of r via D2D. Several approaches for identifying a suitable source for a requested content exist in the literature [75]; source selection is not our focus in this dissertation. Hereinafter, s is referred to as the source device and r as the destination

device. However, as discussed previously, these UEs may not be able to communicate

directly due to physical constraints and hence a multi-hop path needs to be identified for

effective content transfer.

Table 2-1. Summary of important symbols

Symbol                Description
$N$                   Set of UEs in the network
$B$                   Bandwidth of the network
$\mathcal{F}$         Set of RBs
$F$                   Total number of RBs
$d_{max}$             Maximum D2D range
$s$                   Source of the content
$r$                   Destination/requester of the content
$b$                   Content size
$R_{i,j}$             Achievable data rate between devices i and j
$l_z$                 Bandwidth of resource block $z \in \mathcal{F}$
$Z$                   Set of devices sharing the same RB z
$g_{i,j}$             Channel gain between devices i and j
$\sigma^2$            Variance of the Gaussian noise
$p_i$                 Transmit power of device i
$\gamma_{i,j}$        SINR at device j for the link i → j
$d_{i,j}$             Distance between devices i and j
$\alpha$              Path loss exponent
$m_0$                 Fading component
$p_{i,j}$             Received power at device j for the link i → j
$t_{i,j}$             Time required to transmit content from device i to j
$t^p_{i,j}$           Propagation delay from device i to j
$t^x_{i,j}$           Transmission delay from device i to j
$G_r$                 Multi-hop D2D graph at time t
$\eta$                Speed of light
$c_{i,j}$             Cost that BS pays to incentivize i to send content to j
$\psi$                Shadowing component
$\bar{t}$             Historical encounter span
$t$                   Actual time when content request is generated from r
$G_p$                 Contact graph
$t_c$                 Time that it would take to transmit content c of size b from s to r if they were within $d_{max}$
$D_{ij}$              Encounter duration between devices i and j
$\delta$              Predefined stability threshold
$L_i(t)$              Position of device i at time t
$\bar{D}_{ij}$        Average contact duration between devices i and j in $\bar{t}$
$\lambda_{i,j}$       Average number of encounters between i and j
$G^e$                 Encounter history graph
$\zeta$               Strength threshold
$\rho$                Predetermined weight factor
$\mathcal{C}$         Set of durable communities
$w^b_{uv}$            Weight of bridge edge between u and v
$w^s_{uv}$            Weight of sustainable edge between u and v
$B_{u,v}$             Percentage of actual encounter durations larger than $t_c$ between devices u and v
$h_C$                 Durability of community C
$k$                   Number of detected communities

The achievable rate Ri ,j for the transmission between a device i and device j is

Ri ,j = lz log2(1 + γi ,j), (2–1)

where $l_z$ is the bandwidth of RB $z \in \mathcal{F}$ used by $i$ for its data transmission to $j$, and $\gamma_{i,j}$ denotes the SINR at $j$ for the link from $i$. For the link between $i$ and $j$, considering signal interference from all other devices using the same RB $z$, we have

\[ \gamma_{i,j} = \frac{p_i g_{i,j}}{\sum_{i' \in Z,\, i' \neq i} p_{i'} g_{i',j} + \sigma^2}, \tag{2--2} \]

where Z is the set of devices sharing RB z , gi ,j is the channel gain between i and

j , pi is the transmit power of device i , and σ2 is the variance of the Gaussian noise.

Here, we note that the BS and the devices operate in a half-duplex mode and the

same set of resources (i.e. subcarriers) is shared for transmission of content. In our

model, devices on several D2D links can transmit simultaneously and hence can cause

interference with one another when using the same RB. However, devices on different

hops do not interfere with one another over the same RB. The proposed approach can

accommodate any algorithm for allocating RBs to the various D2D and cellular links.

Without loss of generality, hereinafter, we adopt graph coloring techniques such as in

[67] to perform this assignment. In our model, in line with existing D2D works [16] and

for tractability, we do not consider interference on the reverse, acknowledgment link. We have observed in our experimental evaluation that incorporating reverse link interference

into the formulation does not significantly affect the conclusions.
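As a minimal numerical illustration of Equations (2–1) and (2–2), the following Python sketch computes the SINR of a link and the resulting achievable rate. The function names and the representation of the co-channel interferers as a list of (transmit power, channel gain) pairs are assumptions made for this example only.

```python
import math


def sinr(p_tx, gain, interferers, noise_var):
    """Eq. (2-2): SINR at the receiver of a link.

    interferers is a list of (p_i', g_i'j) pairs, one per other device
    transmitting on the same RB.
    """
    interference = sum(p * g for p, g in interferers)
    return (p_tx * gain) / (interference + noise_var)


def achievable_rate(rb_bandwidth, gamma):
    """Eq. (2-1): R = l_z * log2(1 + gamma)."""
    return rb_bandwidth * math.log2(1.0 + gamma)
```

For example, a link with transmit power 1.0, gain 0.5, two interferers contributing 0.1 each, and noise variance 0.05 has an SINR of 0.5 / 0.25 = 2.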

Because one cannot know which D2D links will be actively relaying at every hop until the proposed relay path finder (RPF) algorithm described in Section 2.4 is executed, we assume that all the D2D links are active. This enables us to compute the

data rate of the links which is required by the RPF algorithm for choosing relays that can

deliver the content within tmax . In order to reduce the interference between the cellular

links and the D2D links, we identify the D2D links which are within close proximity to the

cellular link and we ensure that they do not reuse the same RBs. We only allow those

links which are sufficiently far apart to share the same resources. This is essentially

similar to the classical frequency reuse concept used in cellular networks, but now we apply it to D2D transmissions. For each D2D link (i , j), we identify the interference set for

this link. An interference set for (i , j) contains all the links whose transmitter or receiver

are within a certain distance from the transmitter i of the link (i , j) and could potentially

cause large interference. In the graph coloring based resource allocation scheme that

we use, these links are assigned different resource blocks. Links that are significantly

far away from each other are allowed to have the same RB. Once the RB allocation is complete, we utilize Equations (2–1) and (2–2) to compute the data rate of each link.
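The interference-set construction and the coloring-based assignment described above can be sketched as follows. This is a generic greedy coloring written for illustration, not the scheme of [67]; the names `interference_sets`, `greedy_rb_assignment`, and the distance threshold `d_int` are hypothetical, and the sketch does not cap the number of RBs at F.

```python
import math


def interference_sets(links, positions, d_int):
    """For each link (i, j), collect the other links whose transmitter or
    receiver lies within d_int of the transmitter i."""
    sets = {link: set() for link in links}
    for (i, j) in links:
        for other in links:
            if other == (i, j):
                continue
            u, v = other
            if (math.dist(positions[i], positions[u]) <= d_int
                    or math.dist(positions[i], positions[v]) <= d_int):
                sets[(i, j)].add(other)
    return sets


def greedy_rb_assignment(links, conflicts):
    """Greedy coloring: give each link the lowest RB index not already used
    by a link it conflicts with (in either direction)."""
    # Make the conflict relation symmetric before coloring.
    sym = {link: set(conflicts[link]) for link in links}
    for link in links:
        for other in conflicts[link]:
            sym[other].add(link)
    rb = {}
    for link in links:
        used = {rb[o] for o in sym[link] if o in rb}
        z = 0
        while z in used:
            z += 1
        rb[link] = z
    return rb
```

Links that are far apart end up with the same RB index, which mirrors the frequency reuse idea, while links in each other's interference set always receive different indices.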

For the wireless network, we consider distance-dependent path loss and multipath

Rayleigh fading along with log-normal shadowing. Thus, the received power of each link between devices $i$ and $j$ can be described as $p_{i,j} = p_i \cdot (d_{i,j})^{-\alpha} \cdot |m_0|^2 \cdot 10^{\psi/10}$, where $p_i$ is

the transmit power of device i , α is the path loss exponent, m0 is the fading component,

and ψ is the log-normal shadowing component.
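The received power expression can be evaluated directly, as in the short Python sketch below. The function and parameter names are illustrative assumptions; the fading component may be passed as a real or complex amplitude, and the shadowing component is assumed to be given in dB.

```python
def received_power(p_tx, distance, alpha, fading_amp, shadowing_db):
    """p_ij = p_i * d_ij^(-alpha) * |m0|^2 * 10^(psi / 10)."""
    path_loss = distance ** (-alpha)
    fading = abs(fading_amp) ** 2
    shadowing = 10 ** (shadowing_db / 10)
    return p_tx * path_loss * fading * shadowing
```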

Given this SINR model, we now formulate the time required for the transmission

between device i and j . The time ti ,j in the link (hop) from i to j is defined as

\[ t_{i,j} = t^p_{i,j} + t^x_{i,j} = \frac{d_{i,j}}{\eta} + \frac{b}{R_{i,j}}, \tag{2--3} \]

where $t^p_{i,j}$ is the propagation delay between device $i$ and device $j$, which depends on the distance $d_{i,j}$ of the single-hop link between $i$ and $j$ and on the speed of light $\eta$. The transmission delay $t^x_{i,j}$ depends on the content size $b$ and on the achievable data rate for the transmission between $i$ and $j$ as per (2–1).
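The per-hop time of Equation (2–3) is simple to compute once the data rate is known. In the sketch below, the function name and the choice of units (meters, bits, bits per second) are assumptions made for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, the constant eta in Eq. (2-3)


def hop_time(distance_m, content_bits, rate_bps):
    """Eq. (2-3): propagation delay plus transmission delay for one hop."""
    propagation = distance_m / SPEED_OF_LIGHT
    transmission = content_bits / rate_bps
    return propagation + transmission
```

At D2D distances the propagation term is on the order of nanoseconds to microseconds, so the transmission term typically dominates the hop time.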

To incentivize a certain device i for sharing its resources with another device j ,

the BS must incur a cost ci ,j . A device that experiences a good channel and that has a

higher transmit power will be able to transfer content more efficiently than others, and

hence is a better candidate for D2D from the BS’s perspective. Accordingly, we have,

\[ c_{i,j} = p_{i,j} = p_i \cdot (d_{i,j})^{-\alpha} \cdot |m_0|^2 \cdot 10^{\psi/10}. \tag{2--4} \]

This incentive/cost can be in the form of monetary remunerations, coupons, or

free data. We summarize most of the important notations used throughout this chapter

in Table 2-1. Next, we define the necessary framework for formulating the problem of

identifying reliable devices on multi-hop D2D.

2.3 Problem Formulation and Solution

2.3.1 Problem Formulation

Given this wireless network model, the next goal is to find a set of devices that

would enable feasible multi-hop D2D communications while satisfying stringent delay constraints and minimizing the BS’s cost, as per (2–4). We introduce the concept of

feasible path formally as follows:

Definition 1. (Feasible Path) Given a cellular network $G = (V, E)$, where $V$ is the set of all devices and $E$ is the set of links that connect them, a feasible path from source $s$ to destination $r$ in $G$ is an ordering $P = \langle i_1, \dots, i_k \rangle$ of devices in $V$ such that $i_1 = s$, $i_k = r$, $(i_j, i_{j+1}) \in E$ for $1 \le j < k$, and, given the interference and mobility of devices, $\sum_{j=1}^{k-1} t_{i_j, i_{j+1}} \le t_{max}$, where $t_{i,j}$ and $t_{max}$ indicate the time required to transfer a content from $i$ to $j$ and the maximum allowed content sharing time, respectively.

For successful delivery of a content using multi-hop D2D, the devices on a feasible

path must also remain within a range that corresponds to the desired SINR throughout

the D2D session. To combine these properties, we now present the cost-effective device

selection problem for multi-hop D2D (CEDS-MD):

Problem 1. (CEDS-MD) Cost-effective device selection for multi-hop D2D (CEDS-MD) seeks to identify a feasible path $P$ that results in the minimum cost of transmission from source $s$ to destination $r$ by minimizing the device cost $C(P) = \sum_{(i,j) \in P} c_{i,j}$, where $c_{i,j}$ is the cost incurred by the BS for incentivizing device $i$ to share resources with device $j$, and $i$ is the immediate predecessor of $j$ in the feasible path, such that the devices on $P$ remain within the D2D transmission range throughout $t_{max}$ as governed by the cellular base station.
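To make the delay and cost parts of CEDS-MD concrete, the following Python sketch exhaustively searches the simple paths of a small graph for the cheapest one that meets the delay bound. It is a brute-force illustration only and is not the method developed in this chapter: the function name `cheapest_feasible_path` and the edge representation are hypothetical, hop times and costs are assumed to be fixed non-negative inputs, and the requirement that devices remain within range during the session is not modeled.

```python
def cheapest_feasible_path(edges, s, r, t_max):
    """Exhaustively search simple paths from s to r.

    edges maps a directed hop (i, j) to a pair (t_ij, c_ij).
    Returns (cost, path) for the cheapest path whose total time is at
    most t_max, or None if no feasible path exists.
    """
    adjacency = {}
    for (i, j), (t, c) in edges.items():
        adjacency.setdefault(i, []).append((j, t, c))

    best = None

    def extend(node, path, time_so_far, cost_so_far):
        nonlocal best
        if time_so_far > t_max:
            return  # delay bound already violated on this branch
        if node == r:
            if best is None or cost_so_far < best[0]:
                best = (cost_so_far, list(path))
            return
        for nxt, t, c in adjacency.get(node, []):
            if nxt not in path:  # keep the path simple
                path.append(nxt)
                extend(nxt, path, time_so_far + t, cost_so_far + c)
                path.pop()

    extend(s, [s], 0.0, 0.0)
    return best
```

The running time grows exponentially with the number of devices, which is why such a search is only suitable for checking small instances by hand.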

2.3.2 Social Community Aware Cellular Network

Incorporating social based device proximity information with conventional physical

layer metrics enables better resource utilization and enhanced traffic offload in D2D

[62]. However, these measures are not able to capture the impact of user mobility on the

successful completion of D2D transmission particularly when devices are moving rapidly

during the transmission. Consequently, there is a need to adopt a more realistic view for

the social context by basing it on other social dimensions such as the actual encounters

between users. Device encounters have been shown to satisfy the community structure

property [22] and thus, the stability of D2D session must be correlated with durable

social communities.

Therefore, as a first step towards solving CEDS-MD, we must identify durable social

communities based on the previous encounter histories. When two devices i and j

are within the transmission range dmax of each other, they can communicate in D2D

mode under the control of the BS. According to the 3GPP Release 12 [5], for proximity

services, each mobile device not only updates its location with the BS on a periodic

basis, but it also reports the presence of other devices within close distance, both in time

and space, who have already subscribed for the proximity-based services. The BS then

saves the corresponding device identities as well as start and end time of the contact.

Assuming a content request is generated at a given time t during a day, the BS extracts

all the specific historical encounters that start around t in order to realistically predict the

mobility pattern of the devices. To this end, the BS constructs a physical contact graph

Gp which is a weighted undirected graph and detects the durable communities. Devices

belonging to the same community are more likely to have longer contact duration and,

hence, they will get more priority to be chosen on the multi-hop D2D path if they happen to be within each other's proximity at content request time t.

In $G_p$, each edge represents the average duration of contact between two devices over a certain span $\bar{t}$ of previous days. The span $\bar{t}$ can be any number of previous days (or hours), depending on how the encounter histories are preserved at the BS. If $t_c$ is the time required for the content to be transmitted from $s$ to $r$ when they are within the range $d_{max}$, the BS needs to consider those previous encounters in $\bar{t}$ that have an average duration of at least $t_c$. Although encounters with a duration of at least $t_c$ are good candidates for reliable connections, the longer the duration, the more durable the connection. To put the duration length into perspective, we take into account not only the encounters of sufficient length ($t_c$) but also all the previous encounters with duration $D_{ij} \ge (1 + \delta) t_c$, where the stability threshold $\delta \ge 0$ is a user-controlled parameter that reflects the importance of the duration length of encounters beyond $t_c$. At the same time, we also emphasize the impact of the encounter rate of two devices in $\bar{t}$, paired with the duration. Next, we formally define the notions related to encounters.

2.3.3 Community Structure and Durable Community

Now, we introduce the necessary terms to describe encounters in the context of

D2D and formally define the notion of a durable community structure in this subsection.

Assume that $i$ and $j$ come into communication range at time $t_e$, that is, $\|L_i(t_e^-) - L_j(t_e^-)\| > d_{max}$ and $\|L_i(t_e) - L_j(t_e)\| \le d_{max}$, where $t_e^-$ denotes the time before $t_e$, $L_i(t)$ the position of user $i$ at time $t$, $d_{max}$ the D2D transmission range as determined by the BS, and $\|\cdot\|$ the distance measure. With this, we can define the D2D contact duration:

Definition 2. The D2D contact duration between users $i$ and $j$ is defined as the time during which they are in contact before moving out of range, that is, $D_{ij} = t - t_e$ with $t = \min\{t' : \|L_i(t') - L_j(t')\| > d_{max},\ t' > t_e\}$, where $t$ and $t_e$ are on the continuous time scale.

Consider a series of $q$ contact durations $D_{ij} = (D_{ij}^1, \dots, D_{ij}^q)$ between nodes $i$ and $j$ in the time frame $\bar{t}$. Then, we can make the following definition:

Definition 3. The average contact duration, denoted by $\bar{D}_{ij} = \frac{\sum_{k=1}^{q} D_{ij}^k}{q}$, is the expected time during which two devices stay within $d_{max}$ of each other, once they have come into proximity, before they move apart again.
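Definitions 2 and 3 are stated in continuous time. A discretized version, assuming that device positions are available only at sampled timestamps, can be sketched in Python as follows. The function names and the trace format are hypothetical; in this sketch a contact that is already in progress at the first sample is counted from that sample, and a contact still open at the last sample is not counted.

```python
import math


def contact_durations(times, pos_i, pos_j, d_max):
    """Return the list of contact durations between two devices.

    times        -- increasing sample timestamps
    pos_i, pos_j -- (x, y) positions of the two devices at those timestamps
    A contact starts at the first sample where the distance is at most
    d_max and ends at the first later sample where it exceeds d_max.
    """
    durations = []
    start = None
    for t, a, b in zip(times, pos_i, pos_j):
        in_range = math.dist(a, b) <= d_max
        if in_range and start is None:
            start = t
        elif not in_range and start is not None:
            durations.append(t - start)
            start = None
    return durations


def average_contact_duration(durations):
    """Definition 3: mean of the observed contact durations (0.0 if none)."""
    return sum(durations) / len(durations) if durations else 0.0
```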

Next, let $G^e = (V, E^e, T)$ be an undirected graph representing the physical encounters of $|V|$ mobile devices, where $E^e$ is the set of undirected relationships (in this case, encounters). Each edge $E^e_i$ has an associated collection of two-dimensional vectors denoted by $T_i = (T_{i1}, T_{i2}, \dots)$. Each element of $T_i$ denotes a contact time and the corresponding duration in the time span $\bar{t}$, i.e., $T_{ij} = \langle t_{uv}, D_{uv} \rangle$ for each encounter $j$ between devices $u$ and $v$ in $\bar{t}$.

Contact Graph: The request for a content is generated at time t and tc is the time

required to transmit the content from s to r if they are within range dmax . We construct an

undirected and weighted contact graph Gp = (Vp, Ep,Wp), where |Vp| = n and |Ep| = m.

In doing so, we consider only those encounters in $G^e$ that have an average contact duration $\bar{D}_{ij}$ long enough to accommodate $t_c$ starting at $t$, i.e., $\bar{D}_{ij} \ge (1 + \delta) t_c$, where $\delta \ge 0$ is the predefined stability threshold. The weight on each edge $(u, v) \in E_p$, with $u, v \in V_p$, is denoted by $w_{uv} \in W_p$.

Weight Assignment in $G_p$: Encounters having average contact durations larger than $t_c$ are very good candidates for sustainable D2D transmission. However, considering only the average duration might result in choosing some device pairs with a large number of encounters shorter than $t_c$, which would negatively impact the reliable device selection for multi-hop D2D. To account for this in assigning $W_p$, we give more weight to those edges whose encounters have an actual duration larger than $t_c$. To this end, we define $B_{uv}$, $0 \le B_{uv} \le 1$, which denotes the percentage of times the encounter duration was actually larger than $t_c$. Accordingly, we define the weight $w_{uv} = \rho B_{uv} \cdot \lambda_{uv} + (1 - \rho) \bar{D}_{uv}$, where $\rho$, $0 \le \rho \le 1$, is a predefined weight factor that balances the emphasis between the average encounter duration and the percentage of times the encounter duration was actually larger than $t_c$, as denoted by $B_{uv}$. To account for the encounter rate, we multiply $B_{uv}$ by the factor $\lambda_{uv}$, so that the impact of frequent long-duration contacts is also captured in the edge weight. Here, $\lambda_{uv}$ denotes the average number of encounters between $u$ and $v$ over the time period $\bar{t}$.

Next, we will define a durable community structure that will group devices having

similar contact duration together. Such a structure has special properties related to

bridge and sustainable edges. In fact, an edge $(u, v)$ in $G_p$ is said to be a bridge edge if it has a small percentage of successful contact durations $B_{uv}$, which is reflected by $w_{uv} < \zeta$, where $\zeta$ is the predefined strength threshold. A sustainable edge $(u, v)$ is defined to have a large percentage of successful contact durations $B_{uv}$, which is reflected by $w_{uv} \ge \zeta$. We denote the weights of sustainable and bridge edges as $w^s_{uv}$ and $w^b_{uv}$, respectively.

We leverage these edge weights in deciding the relay devices which we describe in

Subsection 2.4.2.
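The edge weight and the bridge/sustainable distinction can be illustrated with a small Python sketch. The function names are hypothetical, the encounter rate is taken as a given input rather than derived from the history, and an edge without recorded encounters is assigned weight 0.0 as a convention of this example.

```python
def edge_weight(durations, encounter_rate, t_c, rho):
    """w_uv = rho * B_uv * lambda_uv + (1 - rho) * mean contact duration.

    durations      -- observed contact durations between u and v
    encounter_rate -- lambda_uv, average number of encounters in the span
    t_c            -- time needed to transmit the content
    rho            -- weight factor in [0, 1]
    """
    if not durations:
        return 0.0
    # B_uv: fraction of encounters that actually lasted longer than t_c.
    b_uv = sum(1 for d in durations if d > t_c) / len(durations)
    mean_duration = sum(durations) / len(durations)
    return rho * b_uv * encounter_rate + (1.0 - rho) * mean_duration


def classify_edge(weight, zeta):
    """An edge is sustainable if its weight reaches the strength threshold."""
    return "sustainable" if weight >= zeta else "bridge"
```

For durations of 10, 20, 30 and 40 with a transmission time of 15, three of the four encounters are long enough, so the percentage term equals 0.75.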

Consequently, a durable community structure, denoted by $\mathcal{C} = \{C_1, C_2, \dots, C_k\}$, is a collection of $k$ subsets of $V$ satisfying $\bigcup_{i=1}^{k} C_i = V$. We say that a collection of nodes $C_i \in \mathcal{C}$ and its induced subgraph is a durable community in $G_p$ if nodes inside $C_i$

are connected primarily through sustainable edges and nodes across communities Ci

and Cj , if connected, will have bridge edges. Next, we propose an approach to detect

durable communities in Gp.

2.3.3.1 Durable community detection

For a node $u \in V_p$, let $A_u$ be the set of neighbors adjacent to $u$. Moreover, let $w_u$ be the total weight of the links joining $u$ to the nodes in $A_u$. For any $C \subseteq V_p$, let $C^{in}$ and $C^{out}$ be, respectively, the set of links having both endpoints in $C$ and the set of links heading out from $C$. Additionally, let $w_C = \sum_{(u,v) \in C^{in}} w_{uv}$, $w_C^{out} = \sum_{(u,v) \in C^{out}} w_{uv}$, and $w_C^+ = w_C + w_C^{out}$.

Given the contact graph Gp, we seek to find a community structure C = {C1,C2, ... ,Ck}

that would strive to group sustainable edges inside a community and place bridge edges

across communities. Intuitively, any grouping that maximizes the ratio of sustainable

edges to bridge edges inside a community achieves our objective. Thus, we define

the durability of a community C as $h_C = w_C / w^+_C$, and we formulate the following Durable

Community Detection (DCD) optimization problem:

$$\text{maximize} \quad R = \sum_{C \in \mathcal{C}} h_C = \sum_{C \in \mathcal{C}} \frac{w_C}{w^+_C},$$

$$\text{s.t.} \quad C_i \cap C_j = \emptyset \quad \forall i \neq j \in \{1, 2, \ldots, k\}, \qquad \bigcup_{i=1}^{k} C_i = V_p.$$

In this formulation, the number of communities k is determined by optimizing

the objective function R and is not an input parameter. Next, we show the following

properties of network communities identified by optimizing our suggested metric R:

(i) links within a community have high durability contribution and (ii) links connecting

communities have low durability contribution.
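
To make the metric concrete, the sketch below evaluates the durability h_C and the objective R on a toy graph. It assumes an undirected contact graph stored as a dictionary of edge weights; the function names and the sample graph (two triangles joined by one weak edge) are hypothetical.

```python
def durability(community, weights):
    """h_C = w_C / w_C^+ with w_C^+ = w_C + w_C^out, for undirected weighted edges."""
    members = set(community)
    w_in = w_out = 0.0
    for (u, v), w in weights.items():
        inside = (u in members) + (v in members)
        if inside == 2:        # both endpoints in C: internal link
            w_in += w
        elif inside == 1:      # exactly one endpoint in C: outgoing link
            w_out += w
    total = w_in + w_out
    return w_in / total if total > 0 else 0.0


def objective_R(structure, weights):
    """R is the sum of the durabilities of all communities in the structure."""
    return sum(durability(c, weights) for c in structure)


# Hypothetical contact graph: two triangles joined by one weak (bridge) edge.
weights = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.9,
    ("d", "e"): 0.9, ("e", "f"): 0.8, ("d", "f"): 0.9,
    ("c", "d"): 0.1,
}
split = [{"a", "b", "c"}, {"d", "e", "f"}]
whole = [{"a", "b", "c", "d", "e", "f"}]
```

On this example each triangle has durability 2.6/2.7, so the split structure scores higher than a single community containing all six nodes (whose durability is 1), which illustrates why k need not be fixed in advance.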

Proposition 2.1. Let C = {C1,C2, ... ,Ck} be a community structure detected by

optimizing R, links within each Ci are of strong durability contribution while those

connecting communities are of weak durability contribution.

Proof. For any node u ∈ Vp and subset S ⊆ Vp, let wu,S be the total weight of all links

that u has towards S and vice versa. By this definition, we obtain wu = wu,S + wu,Vp\S .

Consider a community $C \in \mathcal{C}$, $u \in C$ and $v \notin C$. Since v is not a member of C, we

have

$$\frac{w_C}{w^+_C} > \frac{w_C + w_{v,C}}{w^+_C + w_v} = \frac{w_C + w_{v,C}}{w^+_C + w_{v,C} + w_{v,V_p \setminus C}},$$

because otherwise adding v to C will give a better value of R. This inequality results in

$$\frac{w_{v,C}}{w_v} < \frac{w_C}{w^+_C},$$

which, in turn, implies that the links joining v to C are insignificant in terms of durability

contribution with respect to the total weight of C as a whole.

Similarly, for any node $u \in C$, we have

$$\frac{w_C}{w^+_C} > \frac{w_C - w_{u,C}}{w^+_C - w_u} = \frac{w_C - w_{u,C}}{w^+_C - w_{u,C} - w_{u,V_p \setminus C}},$$

because otherwise excluding u from C will give a better value of R. This inequality

simplifies to

$$\frac{w_{u,C}}{w_u} > \frac{w_C}{w^+_C},$$

which shows that the links joining u to C are of significant weight having larger durability

contribution in comparison to the total internal weight of C .
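
The simplification used in the proof for a node v outside C is a cross-multiplication. Spelled out, under the assumption that all weights are positive so that both denominators are positive:

```latex
\begin{align*}
\frac{w_C}{w^+_C} > \frac{w_C + w_{v,C}}{w^+_C + w_v}
  &\iff w_C \,(w^+_C + w_v) > w^+_C \,(w_C + w_{v,C}) \\
  &\iff w_C \, w_v > w^+_C \, w_{v,C} \\
  &\iff \frac{w_{v,C}}{w_v} < \frac{w_C}{w^+_C}.
\end{align*}
```

The case of a node u inside C follows in the same way, with the additional assumption that $w^+_C - w_u > 0$.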

2.3.3.2 A greedy algorithm for DCD problem

The DCD problem is NP-hard, as can be shown by a reduction similar to the one used for

modularity in [20]. Consequently, a heuristic approach that can provide a good solution in a

timely manner is more desirable. In this regard, we propose a greedy algorithm for the

DCD problem consisting of three phases, shown in Alg. 1.

The first phase, referred to as the development phase, identifies raw communities

in the input network. Initially, all nodes are unassigned and do not belong to any

community. Next, a random node is selected as the first member of a new community C ,

and consequently, new members who help to maximize C ’s durability, hC , are gradually

added into C . When there is no more node that can improve this objective of the current

community, another new community is formed and the whole process is then continued

in the very same manner on this newly formed community.

Next, the augmentation phase rearranges nodes into more appropriate communities.

In the first phase, new members are added into a community C in a random order.

Therefore, C ’s objective value hC can further be improved if some of its members, that

reduce the total durability, are excluded. Such nodes then form singleton communities.

This step requires the re-evaluation of all C ’s members as a result. The removal of such

nodes creates more cohesive communities having higher internal connectedness.

In the last phase, the refinement phase, global stability of the whole network is

re-estimated. This phase looks at the merging of two adjacent communities in order to

improve the overall objective function. If two communities have a large number of mutual

connections between them, it is thus more durable to combine them into one community.
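
The three phases can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used for the experiments: it assumes an undirected graph given as a dictionary of edge weights, picks seed nodes in sorted order instead of at random so that the result is reproducible, and considers every pair of communities in the refinement phase rather than only adjacent ones. All function and variable names are hypothetical.

```python
def durability(members, weights):
    """h_C = w_C / (w_C + w_C^out) over undirected weighted edges."""
    w_in = w_out = 0.0
    for (u, v), w in weights.items():
        hits = (u in members) + (v in members)
        if hits == 2:
            w_in += w
        elif hits == 1:
            w_out += w
    total = w_in + w_out
    return w_in / total if total > 0 else 0.0


def dcd(nodes, weights):
    h = lambda c: durability(c, weights)
    # Phase I (development): grow one community at a time.
    unassigned = sorted(nodes)  # assumption: deterministic seed order, not random
    structure = []
    while unassigned:
        community = {unassigned.pop(0)}
        while unassigned:
            best = max(unassigned, key=lambda y: h(community | {y}))
            if h(community | {best}) <= h(community):
                break  # no remaining node improves the durability
            community.add(best)
            unassigned.remove(best)
        structure.append(community)
    # Phase II (augmentation): eject members whose removal raises h_C.
    for community in list(structure):
        improved = True
        while improved and len(community) > 1:
            improved = False
            for x in sorted(community):
                if h(community - {x}) > h(community):
                    community.remove(x)
                    structure.append({x})  # x becomes a singleton community
                    improved = True
                    break
    # Phase III (refinement): repeatedly merge the pair with the largest gain.
    while True:
        best_gain, best_pair = 0.0, None
        for i in range(len(structure)):
            for j in range(i + 1, len(structure)):
                a, b = structure[i], structure[j]
                gain = h(a | b) - h(a) - h(b)
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break
        i, j = best_pair
        merged = structure[i] | structure[j]
        structure = [c for k, c in enumerate(structure) if k not in (i, j)]
        structure.append(merged)
    return structure


# Hypothetical contact graph: two triangles joined by one weak edge.
weights = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.9,
    ("d", "e"): 0.9, ("e", "f"): 0.8, ("d", "f"): 0.9,
    ("c", "d"): 0.1,
}
communities = dcd("abcdef", weights)
```

On this toy input the development phase already recovers the two triangles, the augmentation phase ejects nothing, and the refinement phase declines to merge them because the merged durability (1.0) is smaller than the sum of the two individual durabilities.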

The run time complexity of each of the development and augmentation phases is O(nm).

Moreover, even though the refinement phase might take $O(n^3 m)$ time in the worst case,

we have found that the DCD algorithm computes the durable communities

in under one second even for networks containing up to 170 nodes, as reported in

Table 2-2. Since the optimal solution takes exponential time for larger instances of the

network, we use smaller values of n to obtain the optimal solution and

compare its running time with that of DCD. We formulated the DCD problem as an integer

program with quadratic constraints and objective function and solved it using CPLEX

[36] to obtain the optimal solution. The results of the run time

comparison are reported in Table 2-3. Clearly, the running time of the optimal algorithm

increases exponentially with the number of devices in the network, whereas

DCD takes only a few milliseconds in all of those cases, which makes DCD suitable

for real-time relay selection.

Algorithm 1 DCD algorithm
Data: Network Gp = (Vp, Ep, Wp)
Result: Durable community structure 𝒞
Phase I: Development Phase.
Initialize 𝒞 ← ∅
Initialize Q ← Vp
while ∃ unassigned node x ∈ Q do
    C ← {x}
    Q ← Q \ {x}
    while ∃ y ∈ Q such that h_{C ∪ {y}} > h_C do
        y ← argmax_{y ∈ Q} {h_{C ∪ {y}}}
        C ← C ∪ {y}
        Q ← Q \ {y}
    𝒞 ← 𝒞 ∪ {C}
Phase II: Augmentation Phase.
for C ∈ 𝒞 do
    while ∃ x ∈ C such that h_{C \ {x}} > h_C do
        C ← C \ {x}
        𝒞 ← 𝒞 ∪ {{x}}
Phase III: Refinement Phase.
while ∃ C1, C2 such that h_{C1 ∪ C2} > h_{C1} + h_{C2} do
    (C1, C2) ← argmax_{C1, C2 ∈ 𝒞} {h_{C1 ∪ C2} − h_{C1} − h_{C2}}
    𝒞 ← (𝒞 \ {C1, C2}) ∪ {C1 ∪ C2}
Return 𝒞

Table 2-2. Running times in seconds for DCD

Method     User count (n)
           20       50       80       110      140      170
DCD        0.006    0.022    0.05     0.018    0.27     0.84

Table 2-3. Comparison of running times in seconds

Method     User count (n)
           10       15       20       25       30
DCD        0.006    0.005    0.006    0.005    0.009
Optimal    1.68     6.31     422.73   1465     2970

2.4 Cost-Effective Device Selection

Once a content request is generated at time t, the BS initiates a centralized process

that encompasses two tasks. First, it constructs Gp and detects the durable communities

as described in the previous section. In the second step, the BS selects a set of devices

to solve the CEDS-MD problem defined in Problem 1. To ensure a high likelihood of

successful content delivery through D2D, the BS incorporates the social encounter

based community information as described subsequently.

2.4.1 Relay Graph Construction

The BS initiates the second step for device selection by constructing a multi-hop

D2D graph Gr = (Vr , Er ,Wc ,We) where Vr is the set of devices present at time t.

Wc denotes the BS cost: for any (i, j) ∈ Er, ci,j indicates how much incentive the BS

has to spend to make device i agree to share its resources with device j for

relaying, as defined in (2–4). We put an edge between two devices i and j if and

only if the distance between them is within the D2D communication range, that is, the

SINR from i to j is above a certain threshold determined by the BS. The BS

is also part of the graph, where it is represented by a vertex. The edge

connecting the BS and each device has a cost that reflects the physical channel

condition between them. Since a transmitting device in a D2D pair with a better channel

condition is preferred from the BS's point of view, the BS pays it a higher incentive and

thus incurs a higher cost, which is captured in equation (2–4). In contrast, for a

direct BS-to-device connection, a receiving device with a better channel condition to

the BS requires fewer physical resource blocks for the transmission, which results

in a smaller B2D cost. Conversely, the BS has to use a relatively large number of resource

blocks to transmit the content within tmax to a device that is far away from it and

therefore experiences a poor channel condition. Consequently, the

cost from the BS to that device, termed the B2D cost, is naturally higher than that for a device

with a better channel condition. In summary, the B2D cost is defined to be inversely

proportional to the radio channel condition from the BS to that device, as given below.

$$c_{BS,j} = \frac{K}{p_{BS,j}} = K \times \left\{ p_{BS} \cdot (d_{BS,j})^{-\alpha} \cdot |m_0|^2 \cdot 10^{\psi/10} \right\}^{-1}.$$ (2–5)

A device located closer to the BS experiences a better channel condition

to the BS and incurs a smaller B2D cost to receive the content. The inverse of the numerical

value of the received signal at device j from the BS, denoted by pBS,j, is a large number;

the constant K < 1 is thus required to normalize the cost so that the B2D cost is on the

same scale as the multi-hop D2D cost. To account for the mobility of the devices on

the multi-hop path, i.e., to increase the likelihood of successful content delivery, we rely

on the identified durable communities for the assignment of the edge weight We described

below.
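
Equation (2–5) can be evaluated directly. The sketch below is illustrative only: it interprets |m0|² as a fading gain and ψ as a shadowing term in dB (as the factor 10^(ψ/10) suggests), and the value of K and the two distances are hypothetical. The BS power of 10 W and path loss exponent of 3 are the values used in the performance evaluation.

```python
def b2d_cost(p_bs, distance, alpha, fading_gain, shadowing_db, k):
    """c_{BS,j} = K / p_{BS,j}, with p_{BS,j} = p_BS * d^(-alpha) * |m0|^2 * 10^(psi/10)."""
    received = p_bs * distance ** (-alpha) * fading_gain * 10 ** (shadowing_db / 10)
    return k / received


# Hypothetical devices at 50 m and 400 m from the BS, no fading or shadowing.
near = b2d_cost(p_bs=10.0, distance=50.0, alpha=3, fading_gain=1.0, shadowing_db=0.0, k=1e-6)
far = b2d_cost(p_bs=10.0, distance=400.0, alpha=3, fading_gain=1.0, shadowing_db=0.0, k=1e-6)
```

With all other terms equal, the cost grows with the distance raised to the power α, so a device eight times farther away costs 8³ = 512 times more under α = 3.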

2.4.2 $W_e$ Weight Assignment in $G_r$

Since the durable communities are constructed based on physical encounter

history, users belonging to the same community have strong connections internally

that not only help in reliable content transfer but also lay the basic foundation for

stable and sustainable encounter predictions. The BS follows specific rules in order

to assign proper edge weights Wij between two devices i and j who are within dmax

in Gr according to their membership in the durable communities obtained from the

contact graph Gp. (i) Devices belonging to the same community and connected via a

sustainable edge will have a small weight that is inversely proportional to the total internal

edge weight of that community. (ii) Devices belonging to the same community but either

connected by a bridge edge in Gp or without any edge in Gp will have a larger weight in

Gr compared to case (i). (iii) If devices belong to different communities Ci and Cj and

there is no edge connecting them in Gp or the edge connecting them is a bridge edge in

Gp, the edge connecting them in Gr will have a large weight that is inversely proportional

to the weight of the edge bearing minimum weight among all edges connecting Ci and

Cj in Gp. If there is no edge connecting Ci and Cj in Gp, we assign Wij the value which

is the maximum weight between any two devices in Gr . (iv) If devices belong to different

communities Ci and Cj and a sustainable edge connects them in Gp, the edge weight

Wij between i and j in Gr will be smaller than that of case (iii). According to these four

criteria, edge weights are assigned between adjacent devices (within dmax ) in Gr which

help our proposed solution RPF to choose suitable relay devices for multi-hop content

transfer as we will demonstrate in the performance evaluation section.
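
The four criteria partition all adjacent device pairs by two tests: whether the devices share a community, and whether a sustainable edge joins them in Gp. The sketch below only decides which criterion applies and deliberately assigns no numeric weight, since the text fixes the relative ordering of the weights rather than their exact values. The function name and data layout are hypothetical.

```python
def weight_rule(i, j, community_of, contact_weights, zeta):
    """Return which of the criteria (i)-(iv) governs W_ij for adjacent devices i and j."""
    # Look the pair up in either orientation; None means no edge in the contact graph.
    w = contact_weights.get((i, j), contact_weights.get((j, i)))
    sustainable = w is not None and w >= zeta
    same_community = community_of[i] == community_of[j]
    if same_community:
        return "i" if sustainable else "ii"
    return "iv" if sustainable else "iii"


# Hypothetical community membership and contact-graph weights.
community_of = {"a": 0, "b": 0, "c": 0, "d": 1}
contact_weights = {("a", "b"): 0.9, ("b", "c"): 0.3, ("c", "d"): 0.8}
rules = {pair: weight_rule(*pair, community_of, contact_weights, zeta=0.7)
         for pair in [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "c")]}
```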

2.4.3 Social Community Aware Device Selection for Multi-hop D2D

The goal is to find a least cost path from s to r in the relay graph Gr under the practical

constraint of a maximum delivery time, imposed as a latency requirement, which limits

the number of relay devices. At the same time, we emphasize the importance of

incorporating durable communities into the decision making process of device selection for

successful D2D session completion. To take this into account, we modify the cost of the

path P in Problem 1 as part of our solution to CEDS-MD. Accordingly, we include the

edge weight Wij that was computed in Section 2.4.2, to obtain the total cost wij between

i and j as follows:

$$w_{ij} = W_{ij} + c_{i,j}.$$ (2–6)

Note that both terms on the right-hand side of (2–6) are normalized and of the

same order of magnitude. For real-time content sharing with D2D communication, we

can formulate the optimal relay selection problem in a multi-hop D2D cellular network as

the following optimization problem. Let the variable $x_{ij}$ represent each edge $(i, j) \in E_r$:

$$x_{ij} = \begin{cases} 1, & \text{if } e(i, j) \text{ is selected for the least cost feasible path,} \\ 0, & \text{otherwise.} \end{cases}$$ (2–7)

We have the following Integer Program (IP):

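$$\min \sum_{(i,j) \in E} w_{ij} x_{ij}$$ (2–8)

$$\text{s.t.} \quad \sum_{(i,j) \in E} x_{ij} - \sum_{(k,i) \in E} x_{ki} = \begin{cases} 1, & i = s, \\ -1, & i = r, \\ 0, & \forall i \in V,\ i \neq s, r, \end{cases}$$ (2–9)

$$\sum_{(i,j) \in E} t_{i,j} x_{ij} \le t_{max},$$ (2–10)

$$x_{ij} \in \{0, 1\}, \quad \forall (i, j) \in E.$$ (2–11)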

(2–9) ensures that the selected cost-effective devices constitute a path. The time

for transmission between devices i and j is obtained considering cellular and the

wireless channel as in (2–3). (2–10) makes sure that the selected devices deliver the

time-sensitive content within the maximum allowable time tmax with high likelihood.

This optimization problem is NP-complete [76]: it is a shortest path problem with an

additional delay constraint. Therefore, unless P = NP, the optimal solution cannot be derived in polynomial time.

Next, we introduce the proposed approach to solve the CEDS-MD problem.
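
For very small relay graphs, the integer program (2–8)–(2–11) can be checked by exhaustive search. The sketch below is not the proposed method; it is a brute-force reference that enumerates simple s–r paths, assuming directed edges with non-negative costs. The function name and the toy graph are hypothetical.

```python
def least_cost_path(edges, s, r, t_max):
    """Exhaustive search over simple s-r paths; edges maps (i, j) -> (cost w_ij, delay t_ij)."""
    best_cost, best_path = float("inf"), None

    def extend(node, path, cost, delay):
        nonlocal best_cost, best_path
        # Prune paths that violate the delay bound or cannot beat the incumbent.
        if delay > t_max or cost >= best_cost:
            return
        if node == r:
            best_cost, best_path = cost, path
            return
        for (i, j), (w, t) in edges.items():
            if i == node and j not in path:
                extend(j, path + [j], cost + w, delay + t)

    extend(s, [s], 0.0, 0.0)
    return best_path, best_cost


# Hypothetical relay graph: the cheapest path s-a-r is too slow for t_max = 60.
edges = {
    ("s", "a"): (1.0, 40.0), ("a", "r"): (1.0, 40.0),
    ("s", "b"): (2.0, 10.0), ("b", "r"): (2.0, 10.0),
    ("s", "r"): (9.0, 5.0),
}
tight = least_cost_path(edges, "s", "r", t_max=60.0)
loose = least_cost_path(edges, "s", "r", t_max=100.0)
none = least_cost_path(edges, "s", "r", t_max=1.0)
```

The example shows the effect of constraint (2–10): with tmax = 60 the cheapest path (cost 2, delay 80) is excluded and the search returns s–b–r at cost 4, whereas relaxing tmax to 100 restores the cheaper path. Exhaustive search takes exponential time in general, which is why it is only useful as a correctness check.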

2.4.4 Solving the Optimization Problem

We solve the CEDS-MD problem in three steps: (i) relax the IP formulation into a

linear program (LP) and solve it, (ii) show that the optimal solution of the LP consists of at most

two fractional paths, which can be constructed, and (iii) formulate a new LP by adding new

constraints. Then, we keep solving the modified LP until it becomes infeasible. This

approach obtains the optimal solution in near polynomial time by using the interior point

method to solve the LP. We start by relaxing (2–11) to obtain the LP:

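$$\min \sum_{(i,j) \in E} w_{ij} x_{ij}$$ (2–12)

$$\text{s.t.} \quad \sum_{(i,j) \in E} x_{ij} - \sum_{(k,i) \in E} x_{ki} = \begin{cases} 1, & i = s, \\ -1, & i = r, \\ 0, & \forall i \in V,\ i \neq s, r, \end{cases}$$ (2–13)

$$\sum_{(i,j) \in E} t_{i,j} x_{ij} \le t_{max},$$ (2–14)

$$0 \le x_{ij} \le 1, \quad \forall (i, j) \in E.$$ (2–15)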

Property of LP Solution

We denote the LP (2–12)–(2–15) as P. The optimal solution of P is no longer

guaranteed to be integral, as it is in the classical shortest path problem [8], due to the addition of the

delay constraint (2–14). However, the following theorem holds true.

Theorem 2.1. There exists either an optimal solution for P that contains at most two

fractional s, r paths or P is infeasible.

Proof. Denote $P_{sr}$ as the collection of all s, r paths. Denote $w(p_j)$, $t(p_j)$ as the total

weight and total delay of a path $p_j \in P_{sr}$, respectively. A path $p_j$ is called a long-delay path if

t(pj) > tmax and is called a short-delay path otherwise.

We will show that if P is feasible and an optimal solution x∗ contains more than two

fractional s, r paths, then either x∗ can be transformed to an optimal solution with at

most two s, r paths or x∗ is not optimal. Assume x∗ contains k > 2 fractional paths and

is optimal. It is clear that some short-delay paths must be included, otherwise x∗ is not

even feasible. Therefore, the problem can be categorized into three cases: i) all paths

are short-delay paths, ii) at least two short-delay paths and a long-delay path exist and

iii) at least two long-delay paths and a short-delay path exist.

In the first case, if all the short-delay paths selected have the same weight, an

equivalent solution can be constructed by assigning flow of 1 to one of the selected

paths and flow of 0 to all the others. Such an optimal solution has only one path. If the

weights of the selected paths are different, shifting the flow from heavy-weight paths to

light-weight paths improves the solution and hence, x∗ is not optimal.

In the second and the third case, the weight of long-delay paths must be smaller

than short-delay paths or we can shift the flow to short-delay paths and improve the

solution. Denote the collection of all selected paths as $P_{x^*}$; we must have $\sum_{p_j \in P_{x^*}} f_j\, t(p_j) = t_{max}$,

where $f_j$ is the flow assigned to path $p_j$. If the total time is less than $t_{max}$, it is

possible to shift flows from short-delay paths to long-delay paths and improve the

solution. In the second case, denote p1, p2 as two short-delay paths. Also, let pa be a

representation of all other selected paths, where

$$f_a = \sum_{p_j \in P_{x^*},\, j \neq 1,2} f_j, \qquad t(p_a) = \frac{\sum_{p_j \in P_{x^*},\, j \neq 1,2} f_j\, t(p_j)}{f_a}, \qquad w(p_a) = \frac{\sum_{p_j \in P_{x^*},\, j \neq 1,2} f_j\, w(p_j)}{f_a}.$$

Clearly,

$$f_a\, t(p_a) + f_1\, t(p_1) + f_2\, t(p_2) = t_{max},$$

$$f_a\, w(p_a) + f_1\, w(p_1) + f_2\, w(p_2) = Y^*,$$

where $Y^*$ denotes the objective value of solution $x^*$. Also, we have $t(p_a) > t_{max}$ and $w(p_a) < w(p_1), w(p_2)$.

Without loss of generality, let t(p1) < t(p2), then w(p1) > w(p2) or p1, p2 cannot

coexist in the optimal solution. Consider two moves: (1) Remove p2 from the optimal

solution. (2) Remove p1 from the optimal solution. For both moves, the solutions are

recalculated by assigning flows to the remaining selected paths. Denote the objective

value by $Y^1$, $Y^2$ for move (1) and (2), respectively. We will show that it is impossible to

have both $Y^1 > Y^*$ and $Y^2 > Y^*$, so at least one of the two moves does not increase the objective value.

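After move (1), the following formulas hold.

$$t_{max} = (f_a + \delta_1)\, t(p_a) + (f_1 + f_2 - \delta_1)\, t(p_1),$$

$$Y^1 = (f_a + \delta_1)\, w(p_a) + (f_1 + f_2 - \delta_1)\, w(p_1),$$

$$\delta_1 = f_2\, \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_1)}.$$

Therefore,

$$\Delta_1 = Y^1 - Y^* = f_2 \big(w(p_1) - w(p_2)\big) + \delta_1 \big(w(p_a) - w(p_1)\big) = f_2 \left( \big(w(p_1) - w(p_2)\big) + \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_1)} \big(w(p_a) - w(p_1)\big) \right).$$

After move (2), the following formulas hold.

$$t_{max} = (f_a - \delta_2)\, t(p_a) + (f_1 + f_2 + \delta_2)\, t(p_2),$$

$$Y^2 = (f_a - \delta_2)\, w(p_a) + (f_1 + f_2 + \delta_2)\, w(p_2),$$

$$\delta_2 = f_1\, \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_2)}.$$

Therefore,

$$\Delta_2 = Y^2 - Y^* = f_1 \big(w(p_2) - w(p_1)\big) - \delta_2 \big(w(p_a) - w(p_2)\big) = f_1 \left( \big(w(p_2) - w(p_1)\big) + \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_2)} \big(w(p_2) - w(p_a)\big) \right).$$

Assume $\Delta_1, \Delta_2 > 0$; since $f_1, f_2 > 0$, we have

$$\frac{w(p_1) - w(p_2)}{w(p_1) - w(p_a)} > \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_1)},$$ (2–16)

$$\frac{w(p_1) - w(p_2)}{w(p_2) - w(p_a)} < \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_2)}.$$ (2–17)
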

However, inequalities (2–16) and (2–17) cannot hold simultaneously. To see this

clearly, let

$$a = w(p_1) - w(p_2), \quad b = w(p_2) - w(p_a),$$ (2–18)

$$c = t(p_a) - t(p_2), \quad d = t(p_2) - t(p_1).$$ (2–19)

Then, inequality (2–16) reduces to $\frac{a}{a+b} > \frac{d}{c+d}$, while inequality (2–17) reduces to $\frac{a}{b} < \frac{d}{c}$.

The first one implies ac > bd while the second one implies ac < bd, a contradiction.

Therefore, in the second case, in which there exist two or more short-delay paths

in the solution, we can always perform move (1) or (2) to reduce the number of short-delay

paths without increasing the objective value. The same claim holds true for the third case

with a similar reasoning.

In conclusion, we can always create an optimal solution for P while selecting at

most two s, r paths.
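
As a numerical illustration of the two-path structure, the sketch below computes the fractional flows when one short-delay and one long-delay path share a unit of flow and the delay budget is used exactly, as argued in the proof. The function name and the sample path delays and weights are hypothetical.

```python
def mix_two_paths(t_short, w_short, t_long, w_long, t_max):
    """Flows (f_short, f_long) with f_short + f_long = 1 that use exactly t_max of delay."""
    assert t_short <= t_max < t_long
    # Solve f_short * t_short + f_long * t_long = t_max with f_short = 1 - f_long.
    f_long = (t_max - t_short) / (t_long - t_short)
    f_short = 1.0 - f_long
    objective = f_short * w_short + f_long * w_long
    return f_short, f_long, objective


# Hypothetical paths: short-delay (delay 20, weight 4) and long-delay (delay 80, weight 2).
f_s, f_l, y = mix_two_paths(t_short=20.0, w_short=4.0, t_long=80.0, w_long=2.0, t_max=60.0)
```

Here one third of the flow uses the short-delay path and two thirds use the cheaper long-delay path, giving a fractional objective of 8/3. This is below the weight 4 of the short-delay path alone, which shows why the LP value is only a lower bound on the integral optimum and why further cuts are needed.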

2.4.5 Exact Solution by Cutting Plane

Based on Theorem 2.1, an optimal solution with at most two fractional paths can

always be generated by solving P. The case of only one path is trivial since it is already

the optimal integral solution and no further work is required. Therefore, we are only

interested in solutions with two fractional paths. Clearly, the two paths must be one

short-delay path, denoted as ps and one long-delay path, denoted as pl . Since any

feasible integral solution must be a short-delay path, we are particularly interested in ps .

Denote $X_{p_s} = \sum_{(i,j) \in p_s} \bar{x}_{ij}$, where $\bar{x}_{ij}$ is the value of $x_{ij}$ in the current solution. We cut the

path ps out of the feasible region of P, which forces the solution to explore other paths, by adding

the following constraint

$$\sum_{(i,j) \in p_s} x_{ij} < X_{p_s}$$ (2–20)

By re-solving P iteratively while updating the constraint (2–20), the feasible region

of P is gradually reduced. We continue the iteration until P becomes infeasible. The optimal

solution will then be the short-delay path with minimum weight. The final algorithm,

which we call relay path finder (RPF), is presented in Alg. 2.

Algorithm 2 RPF: An optimal algorithm for finding least cost relay path
Data: Network Gr = (Vr, Er, Wc, We), source s, target r and tmax
Result: A path comprising a set S of edges forming the relay
Initialize Q ← ∅
Solve the LP in (2–12)–(2–15)
P ← solution of LP
while P is feasible do
    F ← {construct feasible path(s) in P}
    Q ← Q ∪ {short delay path in F}
    if F contains only one path from s to r then
        Return the path in Q with smallest weight
    else if F contains two paths from s to r then
        Let ps and pl be the paths
        Add constraint according to (2–20) to the LP
    Solve the LP with the additional constraint
    P ← solution of the updated LP
if P is infeasible and Q = ∅ then
    No feasible path exists
    Initiate direct cellular communication between BS and r
else if Q ≠ ∅ and {∃ P′ ∈ Q | C(P′) < C(B2D)} then
    Return the path in Q with smallest weight and cost < B2D
else
    Initiate direct cellular communication between BS and r

2.5 Performance Evaluation

For our simulations, the mobility trace for nodes is generated by self-similar

least action walk model (SLAW) which is shown to be very realistic in capturing

user mobility [46]. In particular, SLAW generated traces are shown to be effective in

representing social contexts present among people sharing common interests or those

in a single community such as university campus, companies and theme parks. In

human mobility, people strive to reduce the distance of travel by visiting all the nearby

destinations before visiting farther destinations unless some high priority events such

as appointments force them to make a long distance trip even in the presence of

unvisited nearby destinations. SLAW leverages this self-similarity of fractal waypoints,

which can be viewed as destinations, to realistically predict human mobility. In this

dissertation, we use parameter settings similar to those suggested in the original paper [46]

for capturing this regularity in human mobility patterns. The

wireless propagation channel is modeled for urban macrocell scenarios with a shadowing

standard deviation of 12 dB and a path loss exponent α of

3. The cell area is set up as a 1 km × 1 km square with the BS at its center. The

noise spectral density is −174 dBm/Hz. The transmit power of each device is 100 mW,

whereas the transmit power of the BS is set to 10 W. The total bandwidth of the RBs is set

to 5 MHz in accordance with LTE RBs [2] and the maximum D2D distance is set to

dmax = 15 m. The main wireless network parameters are listed in Table 2-4. We have

set ρ, ζ and δ to 0.8, 0.7 and 4, respectively, in constructing Gp for durable community

detection. We describe how to choose these values later in this section.

Table 2-4. Main wireless network parameters

Parameter                  Value
Cell dimension             1000 m × 1000 m
BS location                Center of the area
Shadowing std. dev.        12 dB
Path loss exponent         3
Noise spectral density     −174 dBm/Hz
BS transmit power          10 W
D2D transmit power         100 mW
Maximum D2D distance       15 m
RB size                    12 sub-carriers, 0.5 ms

We have compared the performance of our solution, RPF, with Groups-NET (GNET

in short) which is a mobility-aware social-based approach that analyses the impact of

device mobility on the cellular network performance and multi-hop D2D in particular [59].

GNET identifies social groups based on previous social meetings. It then computes the

likelihood of each group meeting in future by computing the group-to-group paths by

considering the meeting regularity and shared group members. Finally, it identifies the

most probable path from the source to the destination by leveraging the group-to-group

path probability. It has been shown that GNET outperforms other state-of-the-art

methods in terms of improving the cellular network efficiency [59]. We also compare

our results with two social-oblivious methods: i) the minimizing cost (MC) scheme,

which chooses devices that minimize the cost of the BS in content transmission, and ii) the

closest to destination (CD) scheme, which selects the device that is physically closest to the

destination at each hop. These greedy methods have been used for relay selection in

multi-hop D2D as an efficient way to offload cellular traffic and to enable content transfer

through D2D when a direct connection cannot be established between the source and the

destination.

We generated the locations of a total of 400 users in the designated area using the SLAW model

for 72 hours and used the first Δt = 48 hours for detecting the social encounter based

communities. The remaining 24 hours were used for simulating the D2D content transfer. We

randomly chose 20 cellular users uniformly distributed over the area and 20 pairs of D2D

devices as source and target (separated by a distance larger than dmax) and averaged the results

over a large number of independent simulation runs.

Figure 2-3 compares the content delivery rate for the proposed algorithm and the

baseline approaches when different parameters are varied. In Figure 2-3A, we show the

content delivery rate achieved by our proposed algorithm RPF for 140 users and three

different content sizes 150 KB, 570 KB, and 1 MB as the content sharing time tmax is

varied from 10 s to 120 s. For a particular content size b, with increasing tmax , the RPF

tends to choose devices on the multi-hop path with delivery time close to tmax so as to

minimize the cost. This results in more hops on a multi-hop D2D path, thus making it

more susceptible to device mobility. Consequently, the content delivery success rate

keeps decreasing with larger values of tmax . However, RPF chooses the same multi-hop

path after certain value of tmax as the path cost can no longer be minimized within tmax .

From the Figure 2-3A, we can see that the delivery success rate remains at around

70% after tmax reaches 120 s for content size b = 1 MB. For a particular tmax value, the

delivery success rate decreases with larger content size. Larger content requires more

time to be transmitted, which makes it more prone to device mobility. Therefore, a

larger content size results in a reduced content delivery rate for a fixed tmax, which is also

evident from Figure 2-3A.

[Figure 2-3. Content transmission success rate for different cases. All panels plot the percentage of content delivered. A) Impact of tmax and different content size for RPF (x-axis: tmax in seconds; curves: b = 150 KB, b = 570 KB, b = 1 MB). B) Impact of tmax on content delivery for different methods (x-axis: tmax; curves: Proposed RPF, MC, CD, GNET). C) Impact of content size b (x-axis: content size in KB; curves: Proposed RPF, MC, CD, GNET). D) Impact of total users (x-axis: number of users; curves: Proposed RPF, MC, CD, GNET).]

In Figure 2-3B, we can see that all the methods achieve a high success rate for

content transmission when the content sharing session is constrained by small tmax

values with the RPF securing 100% when tmax < 20 s. The results are shown for

a content size of 1 MB and user count of 140 users. However, all the approaches

experience a reduced content delivery rate for D2D sessions with longer duration. In

such cases, the mobility of the devices can lead to premature tear down of the multi-hop

D2D session. Interestingly, Figure 2-3B shows that the proposed RPF is more resilient

to mobility than all other approaches. In particular, the RPF experiences a much slower

performance degradation as tmax varies. Even with a maximum D2D session of close

to two minutes, RPF achieves a successful delivery rate that is more than 69%. RPF’s

consideration of durable social communities enables it to identify devices that are

likely to maintain the required QoS during the whole session by remaining close to one

another. The content delivery rate is up to 18% higher for the RPF algorithm relative to

the social-unaware scenario for tmax = 120 s.

In Figure 2-3C, we show how varying the content size impacts the content delivery

rate when tmax is set to 100 s for 140 users. Clearly, as content size increases, the

delivery rate decreases. However, the rate of degradation for RPF is much smaller

than other methods. This is due to the fact that larger contents require more time for

transmission which, in turn, makes the longer D2D session more susceptible to device

mobility. In such cases, the mobility of the devices can lead to premature tear down

of the multi-hop D2D session. As a result, methods that do not account for social

communities in choosing reliable devices on multi-hop will experience a poor delivery

rate. Figure 2-3C shows that the content delivery rate is up to 14% higher for the

proposed RPF algorithm when compared to the social-unaware scenario for b = 1 MB.

Figure 2-3D shows how the content delivery rate varies with the network size. As

the number of users increases, one expects a better delivery rate due to more options for

multi-hop paths. However, a large number of users will also increase interference for users that

need to transmit on the same RB. In such a scenario with scarce resources, the achievable

data rate decreases, leading to longer transmission times, which make the sessions more

susceptible to device mobility. Interestingly, RPF suffers less from the increased user

concentration, which makes it the best device selection method among those compared. In Figure 2-3D, we can see

that the proposed RPF is more resilient to mobility than all other approaches. Moreover,

the content delivery rate resulting from RPF is up to 24% higher than the social-unaware

scenario for a user count of 400.

Furthermore, from Figs. 2-3A-2-3D, we can see that all the baseline methods

perform poorly compared to RPF in terms of content delivery success rate. On the

one hand, for CD, since it does not consider the signal and noise information, it suffers

from poor content delivery. On the other hand, MC always tries to minimize the BS cost

which, in some cases, results in choosing devices that require a long time to deliver, thus

making it prone to experience disconnections during mobility. GNET also suffers from

poor delivery rate as it prioritizes the most probable community-to-community path. Two

adjacent devices belonging to two different communities with large community-to-community

path probability will be chosen by GNET, even if they have never met before. As a result,

these devices without significant previous meeting records, might move far apart

from each other during a transmission session leading to poor delivery of content.

On the other hand, RPF’s consideration of durable social communities enables it to

identify devices that are likely to maintain the required QoS during the whole session by

remaining close to one another.

Figure 2-4 evaluates the offload performance of the proposed RPF. For a 100

seconds duration, we recorded the number of active B2D links in the network which

is shown in Figs. 2-4A-2-4D. The BS initiates a direct cellular connection towards the

target when: a) there is no feasible multi-hop path or b) the multi-hop device cost is

larger than the direct BS to device (B2D) cost or c) the mobility of devices on a path

leads to a premature disconnection of that path.

Figure 2-4A shows the impact of increasing tmax as contents of three different sizes

150 KB, 570 KB, and 1 MB are transmitted. Since contents can be transmitted for a

longer duration with increasing tmax , the number of active B2D links increases for each

of the content size b. This corroborates the intuition that the mobility of devices may

disrupt D2D sessions which will require the BS to use costly B2D links. As the content

size increases, more time is needed for content transmission which, in turn, makes the

multi-hop D2D path more prone to device mobility. Consequently, the premature tear

down of an ongoing session due to device mobility leads to the reliance on increasing

number of B2D links where the content is served directly by the BS to the content

requester.

[Figure 2-4. Offload performance analysis for different cases. All panels plot the number of active B2D links. A) Active B2D links vs tmax (x-axis: tmax in seconds; curves: b = 1 MB, b = 570 KB, b = 150 KB). B) Impact of tmax on the number of active B2D links for different methods (x-axis: tmax; curves: Proposed RPF, MC, CD, GNET). C) Active B2D links vs SINR (x-axis: minimum allowed SINR in dB; curves: Proposed RPF, MC, CD, GNET). D) Active B2D links vs content size (x-axis: content size in KB; curves: Proposed RPF, MC, CD, GNET).]

In Figure 2-4B, we can see that, when a longer tmax is allowed, the number of

active B2D links increases for the content size 1 MB and 140 users. This corroborates

the intuition that mobility of devices may disrupt D2D sessions which leads the BS to

use costly B2D links. However, RPF requires 40% fewer B2D links than the other

methods. The reduction in B2D links demonstrates the improved offload capabilities of

the proposed RPF. Such an offload of traffic from the BS to the D2D tier also reduces

the usage of expensive backhaul traffic.

Figure 2-5. Cost-effectiveness of multi-hop D2D for three different content sizes and a range of tmax. Axes: percentage of time D2D is chosen (%) vs. tmax (sec); curves for b = 150 KB, 570 KB, and 1 MB.

In Figure 2-4C, we can see the impact of minimum allowed SINR on network traffic

offload. For a smaller minimum SINR, devices can sustain longer D2D sessions since the

required QoS for such a communication is low. This results in more successful content

delivery over multi-hop D2D which, in turn, requires fewer B2D links. However,

when the allowable SINR is increased, the tolerance to device mobility is decreased,

which subsequently results in more active B2D links. From Figure 2-4C, we can see that

the other methods require up to 158% more B2D links than the proposed

RPF for a target SINR of 5 dB.

In Figure 2-4D, we show the comparative performance of different relay selection

methods for content sizes varying from 150 KB to 1 MB in a fixed network

of 140 users with tmax = 100 s. As the content size increases, all methods start to

rely increasingly on the B2D links. However, RPF requires 28% fewer B2D links than

the other methods. The reduction in the number of B2D links demonstrates the improved

offload capabilities of the proposed RPF. Such an offload of traffic from the BS to the

D2D tier also reduces the usage of expensive backhaul traffic.

In Figure 2-5, we show the percentage of time a multi-hop D2D path is chosen

instead of an expensive direct B2D link for a user count of 140. This figure also gives an


indication on the quality of the cost functions that we have defined in (2–4) and (2–5).

Note that this comparison considers only the cost of direct B2D and D2D relay devices

before the transmission starts. Figure 2-5 shows that, over 90% of the time, the RPF

chooses multi-hop D2D due to its cost-effectiveness. The small portion of time during

which the direct B2D links are used is primarily due to the destination devices which are

closer to the BS and can receive the content directly from the BS with low cost. As the

allowed tmax increases, more D2D links are chosen by RPF compared to B2D links. With

increasing tmax , the RPF tends to choose devices on the multi-hop path with delivery

time close to tmax as it tries to minimize the cost. This results in a D2D path having

a smaller cost, which explains why more D2D links are selected by RPF as the tmax

increases for a particular content size. Furthermore, as the content size increases, the

chance of forming a better (less expensive) D2D path decreases. For a given tmax,

RPF has to choose devices of higher cost as the content size b increases in order to

find a D2D path that is capable of delivering the content within tmax. Therefore, a larger

content size results in a decreased percentage of D2D links being chosen by RPF before

the content transmission starts, as shown in Figure 2-5.

Figure 2-6 shows the execution time needed for our approach. RPF obtains its

solution quickly even in large networks. In almost

all realizations, it takes less than a second on average to compute the cost-effective

devices on the multi-hop path. We performed all the computations on a Linux machine with an AMD Opteron(tm)

Processor 6168 CPU and 64 GB of memory.

In Figure 2-7, we show the impact of different parameters mentioned in Subsection 2.3.3

on the performance of the RPF algorithm. Figure 2-7 is the heat map representing the

impact of δ (stability threshold) and the weight factor ρ on the content delivery success

rate achieved by RPF for a user count of 140, content size b = 1 MB, and tmax = 100 s.

The success rate is depicted by the RGB colors. As the success rate gets higher, the

color becomes lighter in the heat map. From Figure 2-7, we can see that the color

is lightest, i.e., the contents are successfully delivered, in the top right corner where

ρ = 0.8 and δ = 4. The content delivery success rate increases until δ = 4 and starts

deteriorating as δ is increased further. Accordingly, we set ρ and δ to

0.8 and 4, respectively, for this setup of user count, content size, and tmax. We vary the

value of the strength threshold ζ from 0.5 to 0.9 and choose ζ = 0.7, as RPF achieves better

content delivery with this setup.

Figure 2-6. Execution time of RPF. Axes: time (ms) vs. number of users.

Figure 2-7. Impact of different parameters for constructing Gp on the performance of RPF.

Figure 2-8. The cost of the BS vs. user count. Axes: normalized BS cost vs. number of users; curves for Proposed RPF, MC, CD, and GNET.

In Figure 2-8, we show how the cost of the BS varies with total users in the network.

The BS cost is normalized by the highest cost attained for the maximum user count. It

is clear that the BS cost for RPF is smaller than that of any other method. Although

MC aims to choose relay devices that yield a minimum cost, it suffers from poor delivery

since it does not take the mobility of devices into account. Therefore, the BS has to

invoke expensive B2D links to deliver the content, resulting in an increased BS cost, as

evident from Figure 2-8. The other two methods also result in a higher BS cost, as

they also fail to consider devices’ mobility while choosing relay devices. For all of the

methods, as the number of users increases, so does the interference originating from

users that share the same resources. In such a scenario, similar to what we have seen

in Figure 2-3D, the achievable data rate decreases due to scarce resources. This, in turn,

leads to longer transmission times, which make the methods more susceptible to device mobility

and consequently result in a higher BS cost. However, as the user count increases, the

gap between RPF and the other methods also increases, validating the superiority of our

method in terms of minimizing the BS cost.

2.6 Summary

In this chapter, we have studied the impact of device mobility on the performance

of multi-hop D2D communication underlaying cellular networks. We have introduced a novel model

that considers durable communities based on the social encounters of devices for

predicting the likelihood of devices’ proximity. We have formulated the reliable device

selection problem as an IP optimization problem and we have proposed an efficient

heuristic algorithm to solve it. We have also shown that leveraging social communities

can increase the content delivery rate in multi-hop D2D. Simulation results showed that

our proposed method outperformed classical social-unaware methods significantly in

terms of traffic offload. The results also showed that the proposed method achieved

its objectives with manageable computational complexity which makes it applicable to

larger networks.


CHAPTER 3
TOWARDS EFFICIENT SOCIAL-AWARE CONTENT TRANSMISSION THROUGH DEVICE-TO-DEVICE MULTICAST COMMUNICATIONS

In Chapter 2, we have seen how a social-aware approach leads to better content

delivery through multi-hop D2D communication when the content source and content

requester are beyond the typical D2D range. In this chapter, we explore the impact of social

relationships on the performance of multicast content delivery via D2D underlaid cellular

networks.

3.1 D2D Enhanced Content Transmission

Let us consider a group of devices that are interested in the same multicast content

served by a single LTE-A cell, as shown in Figure 3-1. We consider a transmission scenario

where devices v1 to v4 receive the content directly from the eNB in the first hop, and v1 and v3

act as relay devices (RDs) in the second hop to transmit the content to the rest of the devices that could

not receive it from the eNB. In subsequent hops, v5, v6, v7 and v8 help in further relaying

the content to other devices using multi-hop D2D within minimum time, thus achieving

better quality of service. In the remainder of this section, we introduce the details of

the system model considering the social and physical aspects of devices’ relationship.

3.1.1 Problem Setup

Table 3-1. CQI / MCS table for LTE-A [3]

CQI index   MCS            Efficiency [bit/s/Hz]   CQI index   MCS      Efficiency [bit/s/Hz]
0           not in range   0.0000                  8           16-QAM   1.9141
1           QPSK           0.1523                  9           16-QAM   2.4063
2           QPSK           0.2344                  10          64-QAM   2.7305
3           QPSK           0.3770                  11          64-QAM   3.3223
4           QPSK           0.6016                  12          64-QAM   3.9023
5           QPSK           0.8770                  13          64-QAM   4.5234
6           QPSK           1.1758                  14          64-QAM   5.1152
7           16-QAM         1.4766                  15          64-QAM   5.5547

Figure 3-1. D2D enabled multicast (a multi-hop scenario): v1 and v3 form the relay devices in second hop, along with purple nodes (v2, v4) they are directly served by the eNB in first hop. v5, v6, v7 and v8 denote the relay devices in subsequent hops.

We consider LTE-A [3] systems where OFDMA and single carrier frequency

division multiple access (SC-FDMA) are used to access the downlink and the uplink,

respectively. The available radio spectrum is managed in terms of resource blocks (RBs)

and, in the frequency domain, each RB corresponds to 12 consecutive and equally

spaced sub-carriers. One RB is the smallest frequency resource that can be assigned

to a device. The overall number of available RBs depends on the system bandwidth

configuration and can vary between 6 (1.4 MHz channel bandwidth) and 100 (20 MHz).

We also assume there is a single eNB that manages the spectrum, by assigning

the adequate number of RBs to each scheduled device and by selecting the modulation

and coding schemes (MCS) for each RB. Scheduling procedures are based on the

channel quality indicator (CQI) feedback, transmitted by each device to the eNB over

dedicated control channels. The CQI is associated with the maximum supported MCS [3],

as reported in Table 3-1 for the LTE-A standard. We use the terms ‘device’ and ‘user’

interchangeably throughout this chapter.


3.1.2 System Model

3.1.2.1 Radio network

In the considered LTE-A single-cell area with total $N'$ devices, a set $\mathcal{N} \subseteq \mathcal{N}'$

of $n$ devices is seeking a particular content of size $b$ from the eNB. The total

bandwidth is divided into $B$ resource blocks (RBs) in the set $\mathcal{B}$. We assume that the

eNB constructs a multicast tree T and transmits the content to the first hop devices

using direct transmission. Some of the devices among them subsequently disseminate

it to the next hop devices via D2D communication and so on. As a result, the content is

transmitted in a multi-hop D2D fashion. According to the 3GPP Rel. 12 specifications [5], we

consider that the D2D links exploit uplink frequencies and all the RDs in a particular hop

simultaneously use the frequencies with the same MCS to deliver the multicast data over

the D2D links in a synchronized manner, as described in [4]. The receivers consider

these retransmissions as multipath components of the same signal.

In our network, we consider real-time content sharing among mobile D2D users.

The achievable rate $R_{ij}$ for the transmission between a device $i$ and a device $j$ using RB $z$

is

$$R_{ij} = l_z \log_2(1 + \gamma_{ij}), \tag{3–1}$$

where $l_z$ is the bandwidth of RB $z \in \mathcal{B}$ used by $i$ for its data transmission to $j$, and $\gamma_{ij}$ denotes

the SINR for $j$ from $i$. For the link between $i$ and $j$, considering signal interference from

all other devices using the same RB $z$, we have

$$\gamma_{ij} = \frac{p_i g_{ij}}{\sum_{i' \in \mathcal{Z},\, i' \neq i} p_{i'} g_{i'j} + \sigma^2}, \tag{3–2}$$

where $\mathcal{Z}$ is the set of devices sharing RB $z$, $g_{ij}$ is the channel gain between $i$ and $j$, $p_i$ is

the transmit power of device $i$, and $\sigma^2$ is the variance of the Gaussian noise.

For the wireless network, we consider distance-dependent path loss and multipath

Rayleigh fading. Thus, the received power of each link between devices $i$ and $j$ can be

described as $p_{ij} = p_i \cdot d_{ij}^{-\alpha} \cdot |m_0|^2$, where $p_i$ is the transmit power of device $i$, $d_{ij}$ is the distance between $i$ and $j$, $\alpha$ is the

path loss exponent, and $m_0$ is the fading component.
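As a minimal sketch of how the received power, the SINR of Equation (3–2), and the rate of Equation (3–1) fit together, the following Python snippet evaluates them for one link. The function names, the 180 kHz RB bandwidth (12 sub-carriers at an assumed 15 kHz spacing), and all numeric inputs are illustrative assumptions rather than values taken from the experiments.

```python
import math

def received_power(p_tx, distance, alpha=3.0, fading_gain=1.0):
    """Received power p_i * d^(-alpha) * |m0|^2; fading_gain stands for |m0|^2."""
    return p_tx * distance ** (-alpha) * fading_gain

def sinr(signal_power, interferer_powers, noise_power):
    """Equation (3-2): desired power over interference plus noise (linear units)."""
    return signal_power / (sum(interferer_powers) + noise_power)

def achievable_rate(rb_bandwidth_hz, gamma):
    """Equation (3-1): rate in bit/s on one resource block."""
    return rb_bandwidth_hz * math.log2(1.0 + gamma)

# Hypothetical inputs: a 100 mW D2D transmitter 10 m away, one co-channel
# interferer 50 m away, and thermal noise of -174 dBm/Hz over one RB.
rb_bw = 180e3                                  # assumed RB bandwidth in Hz
noise = 10 ** ((-174 - 30) / 10) * rb_bw       # dBm/Hz -> W/Hz, times bandwidth
signal = received_power(0.1, 10.0)
interference = [received_power(0.1, 50.0)]
gamma = sinr(signal, interference, noise)
rate = achievable_rate(rb_bw, gamma)
```

In this toy setting the link is interference-limited: the noise term is many orders of magnitude below the interferer's received power, so the SINR is essentially the ratio of the two received powers.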

Given this SINR model, the LTE-A system obtains the CQI values by using the

resulting SINR values defined in equation (3–2). The SINR values can be mapped to

the corresponding CQI values on each RB for all users on both cellular and D2D links,

choosing the CQI level that ensures a block error rate smaller than 1% [55]. The eNB collects the SINR values

and maps them to the corresponding CQI between itself and the individual devices as

well as between each pair of devices. Let $\mathcal{Q}$ be the set of available CQI levels and let

$q_i \in \{1, 2, \ldots, q\}$ be the CQI between device $i$ and the eNB, $\forall i \in \mathcal{N}$. Moreover, let

$q_{i,j}$ be the CQI value for each D2D link between devices $i, j \in \mathcal{N}$, $i \neq j$. The means

by which this is achieved are outside the scope of this paper. However, the pairwise

values for both uplink (UL) and downlink (DL) directions can be obtained by eNB from

the Channel State Information (CSI) through the use of the reference signals (RSs) in

LTE-A [28]. Each CQI level is associated with a given supported MCS. For an MCS

value c , the attainable data rate depends on the number of assigned RBs and on the

spectral efficiency for c as reported in Table 3-1. Hence, we compute the time required

for transmitting a content of size $b$ in a particular hop that uses $B$ RBs as $t_c = b/(r^d_c \cdot B)$

(for direct cellular) or $t_c = b/(r^u_c \cdot B)$ (for D2D) if the corresponding MCS for that hop

is $c$. The terms $r^d_c$ and $r^u_c$ represent the data rates in downlink and uplink

transmissions, respectively, adopting the MCS associated with the CQI $c$.
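The hop time can be illustrated with a short Python sketch based on the efficiencies of Table 3-1. It assumes, for illustration only, that the per-RB data rate is the spectral efficiency multiplied by a 180 kHz RB bandwidth and that the same rate applies in downlink and uplink; `rate_per_rb` and `hop_time` are hypothetical helper names.

```python
# Spectral efficiency [bit/s/Hz] per CQI index, copied from Table 3-1.
CQI_EFFICIENCY = {
    1: 0.1523, 2: 0.2344, 3: 0.3770, 4: 0.6016, 5: 0.8770,
    6: 1.1758, 7: 1.4766, 8: 1.9141, 9: 2.4063, 10: 2.7305,
    11: 3.3223, 12: 3.9023, 13: 4.5234, 14: 5.1152, 15: 5.5547,
}

RB_BANDWIDTH_HZ = 180e3  # assumption: 12 sub-carriers x 15 kHz spacing

def rate_per_rb(cqi):
    """Assumed per-RB data rate r_c in bit/s for the MCS tied to this CQI."""
    return CQI_EFFICIENCY[cqi] * RB_BANDWIDTH_HZ

def hop_time(content_bits, cqi, num_rbs):
    """t_c = b / (r_c * B) for one hop that uses num_rbs resource blocks."""
    return content_bits / (rate_per_rb(cqi) * num_rbs)

b = 8e6  # a 1 MB content expressed in bits (taking 1 MB = 10^6 bytes)
t_low = hop_time(b, cqi=7, num_rbs=100)    # hop limited to CQI 7
t_high = hop_time(b, cqi=15, num_rbs=100)  # hop at the highest CQI
```

Under these assumptions, a hop at CQI 7 takes roughly 0.3 s while a hop at CQI 15 takes roughly 0.08 s, which is why the choice of CQI per hop dominates the delivery time.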

3.1.2.2 Social network

For establishing D2D connections, the eNB must provide proper incentives eu to

the users u ∈ N so that they become willing to share their resources for each others’

transmissions [14, 81]. However, even then, in reality, mobile users are usually reluctant

to share their resources [48] due to several practical reasons encompassing limited

resources and privacy concerns. Interestingly, devices belonging to the same social

community will be more interested to help disseminate content to other devices in the


same community [10, 48]. We take into account the fact that social relationships in the

form of kinship, friendship, or colleague relationships between device owners also influence

the content request pattern in a social network. For instance, friends watching a game

in a stadium, students on campus accessing video content of common interest, or

neighbors watching soccer matches or the Super Bowl show some degree of social

relationship in their interaction. Such social relationships have been shown to exhibit

a community structure property which implies that the users can be divided into groups

with dense connections inside each group and fewer connections across groups [17].

Cellular operators can leverage this community structure property to identify physically

close cost-effective users who can help transmit a content. Ideally, the chosen users

should be socially connected as well as in close proximity to one another at the time of

content transmission to extract the full benefit of D2D.

Let $w_{ij}$ denote the social tie between devices $i, j \in \mathcal{N}$, with $w_{ij} = 0$ when there is no

social link between them. We use the binary variable $l_{ij}$ to express the willingness of $i$

to share its resources with $j$ as follows:

$$l_{ij} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ belong to the same community or if } w_{ij} \geq 0.5, \\ 0, & \text{otherwise.} \end{cases} \tag{3–3}$$

Users with large social interaction between them are shown to have strong social

tie which is captured by the value wij . A social tie of at least 0.5 is considered in this

dissertation for allowing user i to share its resources with user j which is a reasonable

assumption supported by many recent works on social network [65, 77]. Furthermore,

in a social network, if the tie between two users is high, it is more likely for them to

be in the same community. We deploy the well-known Blondel community detection

algorithm [17] in our experiments to extract social communities. In the experimental

evaluation section we vary the social tie between users to see how that impacts the


system performance. Now, we define a generalized problem formulation that aims to

deliver the multicast content in minimum time by choosing relay devices that are within a

certain BS cost.
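The willingness indicator of Equation (3–3) can be sketched in a few lines of Python. The dictionary-based inputs and the device labels below are hypothetical; in particular, the community labels are assumed to come from a community detection step that is not reproduced here.

```python
def willingness(i, j, community, social_tie, threshold=0.5):
    """Equation (3-3): 1 if i and j share a community or w_ij >= threshold."""
    same_community = community[i] == community[j]
    strong_tie = social_tie.get((i, j), 0.0) >= threshold  # w_ij = 0 if no link
    return 1 if same_community or strong_tie else 0

# Hypothetical toy data: community labels and social ties w_ij.
community = {"a": 0, "b": 0, "c": 1, "d": 2}
social_tie = {("a", "c"): 0.7, ("c", "d"): 0.2}

l_ab = willingness("a", "b", community, social_tie)  # same community
l_ac = willingness("a", "c", community, social_tie)  # different community, strong tie
l_cd = willingness("c", "d", community, social_tie)  # neither condition holds
```

The threshold is kept as a parameter so that the effect of a stricter or looser social-tie requirement can be explored without changing the function.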

3.2 Problem Formulation

We now formulate the multicast content transmission problem

from a multi-hop point of view. From the eNB’s standpoint, the smaller the incentive

costs $e_u$ of the devices operating as RDs, the less it has to pay as incentives in the form

of monetary remuneration, coupons, or free data, which in turn reduces the operator

cost. Thus, the objective is to identify a set $S$ of RDs and their hop-wise positions

in the multicast tree $T$ so that, within a fixed

eNB budget $I$, 1) the content delivery time is minimized and 2) the total incentive cost of

$S$ is not larger than $I$. In this dissertation, we define the cost of a set $X$ of devices as

$C(X) = \sum_{i \in X} e_i$. The notations used in this chapter are summarized in Table 3-2.

We now formally define the Multi-hop Relay Selection (MRS) problem below.

Table 3-2. Summary of notations

Notation       Description
N              Set of mobile devices seeking the same content.
K_d            Set of potential relay devices corresponding to downlink CQI level c_d.
Q, q           Set of CQI levels in LTE-A, total CQI levels.
S              Set of selected relay devices.
T              The multicast tree.
I, b, B        Maximum eNB budget, content size, number of RBs.
r^d_c, r^u_c   Data rate in downlink and uplink, respectively.
w_ij           Social tie between devices i and j.
l_ij           Device i’s willingness to share resources with device j.
m_ij           Social CQI value between devices i and j.

Definition 4. (MRS) Given a set of devices N in network G requesting the same

content, total eNB budget I as incentives to the devices, MRS seeks to find a set of relay


devices S ⊆ N and the multicast tree T such that all the devices receive the content

from the eNB in minimum time and the cost C(S) is at most I .

We prove that the MRS problem is NP-complete in Theorem 3.1 by reducing from

the set cover (SC) problem [69], which is known to be NP-complete.

Theorem 3.1. The MRS problem is NP-complete.

Proof. First we introduce the decision version of MRS, which asks whether there exists

relay devices S and multicast tree T such that by time tmax , all devices receive the

content and C(S) ≤ I . It is clear that with specific S and T , the time required to transmit

the content to all devices can be calculated in polynomial time. Therefore, MRS is in NP.

Next, we consider a special case of MRS (P-MRS), in which the eNB is required to

send the content to two sets of devices $H$ and $F$. The CQI levels between the eNB and a device

in $H$ and $F$ are $q_1$ and $q_2$, respectively, and the transmission times are $t_1$ and $t_2$. Also, each device

$h \in H$ is able to relay the content to a set $F(h) \subseteq F$ of devices using CQI level $q_3$ in time

$t_3$, with $\bigcup_{h \in H} F(h) = F$. Let $t_1 + t_3 < t_{max} < t_2$. The incentive for each relay device is 1 and

the total budget of the eNB is a positive integer $I$.

We then reduce the SC problem to P-MRS, which leads to the NP-Hardness of

P-MRS. As P-MRS is a special case of MRS, MRS is NP-hard and in turn NP-complete.

The decision version of SC is as follows.

SC: Given a set of m elements U, a collection S = {Si |Si ⊆ U, i = 1, ..., n} of n

sets and k ∈ N+, the SC problem seeks to identify whether there exists a sub-collection

S ′ ⊆ S where |S ′| ≤ k whose union equals U.

Let F = U, I = k and form a device h and corresponding F (h) for each Si ∈ S . With

t1 + t3 < tmax < t2, we create a P-MRS instance from the SC instance.

When the SC problem has a solution $S'$ with $|S'| \leq k$ that covers all elements in

$U$, the P-MRS problem has a solution obtained by choosing all devices $h$ whose corresponding

sets are in $S'$ as the relay devices. All devices in $H$ are placed in the first layer of the

multicast tree and all devices in $F$ in the second layer. As the solution to SC is feasible,

in the P-MRS instance, all devices in $F$ can receive the data from a device in $H$ and the content

delivery time is $t_1 + t_3 < t_{max}$.

When the SC problem has no solution, in the P-MRS instance, some of the devices

in $F$ cannot receive the content via a relay device. Therefore, the content delivery time

is $t_2 > t_{max}$ (since those devices receive the data directly from the eNB) and the P-MRS

problem is infeasible.

Therefore, the P-MRS problem is NP-hard as SC is NP-hard. As P-MRS is only

a special case of MRS, MRS must be NP-hard and consequently an NP-complete

problem.

Table 3-3. Social CQI Matrix (SCM) (rows: devices in S; columns: devices in N\K_d)

              d_1            d_2            ...   d_j
s_1           m_{s_1,d_1}    m_{s_1,d_2}    ...   m_{s_1,d_j}
s_2           m_{s_2,d_1}    m_{s_2,d_2}    ...   m_{s_2,d_j}
...           ...            ...            ...   ...
s_i           m_{s_i,d_1}    m_{s_i,d_2}    ...   m_{s_i,d_j}
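The reduction used in the proof of Theorem 3.1 can be checked on small instances with a brute-force Python sketch. The instance encoding (a dictionary with keys `F`, `relays`, and `budget`) and the function names are illustrative assumptions; the timing condition $t_1 + t_3 < t_{max} < t_2$ is encoded implicitly by requiring every device in $F$ to be reached through a selected relay.

```python
from itertools import combinations

def sc_to_pmrs(universe, subsets, k):
    """Map a set cover instance to the P-MRS instance used in the proof."""
    relays = {f"h{i}": set(s) for i, s in enumerate(subsets)}  # F(h) = S_i
    return {"F": set(universe), "relays": relays, "budget": k}

def set_cover_exists(universe, subsets, k):
    """Brute force: is there a sub-collection of at most k sets covering U?"""
    for size in range(k + 1):
        for choice in combinations(subsets, size):
            if set().union(*choice) >= set(universe):
                return True
    return False

def pmrs_feasible(instance):
    """Brute force: can at most `budget` unit-cost relays reach all of F?

    Because t1 + t3 < tmax < t2, a device in F meets the deadline only
    when some selected relay h has it in F(h).
    """
    names = list(instance["relays"])
    for size in range(instance["budget"] + 1):
        for chosen in combinations(names, size):
            covered = set().union(*(instance["relays"][h] for h in chosen))
            if covered >= instance["F"]:
                return True
    return False

# Hypothetical instance: U = {1, 2, 3, 4} with three candidate sets.
U = {1, 2, 3, 4}
S = [{1, 2}, {2, 3}, {3, 4}]
answers = [(set_cover_exists(U, S, k), pmrs_feasible(sc_to_pmrs(U, S, k)))
           for k in (1, 2)]
```

For every budget, the two answers in each pair agree, which is the equivalence the proof relies on. The exhaustive search is exponential and is meant only to illustrate the mapping, not to solve realistic instances.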

3.3 Solution for MRS

Our proposed solution aims to identify the optimal multicast tree T to solve the MRS

problem by deciding: (i) the appropriate CQI levels in each layer of T (ii) the set of the

RDs (S), and (iii) the hop-wise position of each of the devices in S in the tree T .

As the first step to solve the problem, the eNB computes the pairwise CQI levels

$q_{i,j}$ for all $i, j \in \mathcal{N}$, which identify the set of devices that can be served by an RD $i$ using direct D2D.

During this step, the eNB integrates the social aspect with the physical radio

network characteristics. As part of this, the eNB constructs a social CQI matrix (SCM)

as depicted in Table 3-3. The social CQI index between devices $i$ and $j$ is expressed as

$m_{i,j} = q_{i,j} \cdot l_{i,j}$ (refer to Equation (3–3)), which equals the channel quality $q_{i,j}$ when $i$ is willing to share resources

with $j$. When they have no social relationship, $m_{i,j}$ is set to

0. Although this will reduce the number of potential D2D pairs, considering the social aspect

while choosing relay devices is of practical significance. In real-life scenarios, as we

have already explained in Section 1.1.2, users are very reluctant, and in some cases

quite skeptical, about sharing resources with unknown persons even if they are situated

close to each other within D2D proximity. Socially connected users have been found

to be more willing to share resources with one another which facilitates the success

of proximity-based D2D services [48]. Note that the proposed scheme requires that

the eNB is aware of the updated SCM, which incurs some extra overhead. However,

D2D communications are usually based on the assumption of stationary or, at least,

semi-static D2D channels because of their low mobility and short communication range

in the proximity based services [55]. Further research related to tackling higher device

mobility is left for future studies.
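The construction of the social CQI matrix described above amounts to an element-wise product of the pairwise CQI levels and the willingness indicators. A minimal Python sketch follows; for simplicity it assumes a square matrix over all devices rather than the row and column sets of Table 3-3, and the input values are hypothetical.

```python
def social_cqi_matrix(cqi, willing):
    """m[i][j] = q[i][j] * l[i][j]; pairs without willingness become 0."""
    n = len(cqi)
    return [[cqi[i][j] * willing[i][j] for j in range(n)] for i in range(n)]

# Hypothetical 3-device example: pairwise D2D CQI levels and willingness flags.
q = [[0, 9, 12],
     [9, 0, 7],
     [12, 7, 0]]
willing = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
m = social_cqi_matrix(q, willing)
```

Devices 0 and 2 have the best physical link (CQI 12) in this example, yet the pair is excluded because neither is willing to share resources with the other.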

We formulate the MRS problem as a mixed integer program by constructing a

multicast tree T with the eNB placed as the root in layer 0. A device i in layer l transmits

content to its D2D proximity devices N(i) in layer l + 1, which we term as one hop

transmission. The eNB transmits a content to the devices in layer 1 using the DL data rate

$r^d_c$ (refer to Section 3.1.2) by setting the CQI to $c_d$. Since our focus is to ensure faster content

delivery without significant buffering in the relay devices, the selected RDs in that layer

must deliver the content to the next hop via D2D using the same or a larger data rate for the solution to

be feasible. Therefore, a solution is considered feasible only if the D2D

links in the second hop originating from each RD in layer 1 use an MCS level $c_u \geq c_d$ in the

uplink. Accordingly, we only consider devices $j \in \mathcal{N} \setminus S$ to be in $N(i)$ if $m_{i,j} \geq c_u$, i.e.,

devices that are able to decode the content correctly when $i$ transmits it using MCS $c_u$

via D2D. In the subsequent layers, similar to the first layer, relay devices are chosen

into $S$ given that the D2D links satisfy this MCS requirement, i.e., the MCS level must be

non-decreasing in the following layers: $c^u_{l+1} \geq c^u_l$. The total number of layers $l_{max}$ in a

particular tree (i.e., its depth) will be at most the number of allowed RDs, which is bounded

by the eNB budget. We denote the set of layers by $\mathcal{L} = \{0, 1, 2, \ldots, l_{max}\}$. The time $t_c$

to deliver the content from $i$ to those in the next layer is determined from the data rate

associated with CQI $c$ in Table 3-1 if the selected CQI in that layer is $c$, as described in

Section 3.1.2.

Our aim is to reduce the overall content delivery time for a given eNB budget. In

this context, we define a binary variable pu which defines if a device u will act as an RD.

We also introduce a variable yv ,l denoting whether a device v belongs to layer l . We

consider the eNB in our formulation as node 0 and place it at layer l = 0, accordingly we

set y0,0 = 1. Let the variable zu,v denote whether device v receives the content from u,

i.e., device u is the relay device for v . In a particular layer, each relay device transmits

using the same CQI, which is captured by the binary variable $w^c_l$:

$$w^c_l = \begin{cases} 1, & \text{if layer } l \text{ uses CQI } c, \\ 0, & \text{otherwise.} \end{cases} \tag{3–4}$$


We keep track of the number of layers in the tree using the variable xl which is set

to 1 when there is at least one device in layer l , otherwise it is 0. We also introduce a

binary variable qL which denotes if the total number of layers is at least L. We formulate

the problem as a mixed integer program P below in (3–5).


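$$
\begin{aligned}
\min \quad & t \\
\text{s.t.} \quad & \sum_{l} y_{v,l} = 1, \quad \forall v \in \mathcal{N} && \text{(3–5a)} \\
& x_l \leq \sum_{v \in \mathcal{N}} y_{v,l} \leq A \cdot x_l, \quad \forall l \in \mathcal{L} && \text{(3–5b)} \\
& \sum_{l=0}^{L-1} \sum_{c \in \mathcal{Q}} t_c \cdot w^c_l \leq t + A \cdot (1 - q_L), \quad \forall L \in \mathcal{L} && \text{(3–5c)} \\
& \sum_{c \in \mathcal{Q}} w^c_l = 1, \quad \forall l \in \mathcal{L} && \text{(3–5d)} \\
& x_l \leq q_l, \quad \forall l \in \mathcal{L} && \text{(3–5e)} \\
& \sum_{u \in \mathcal{N}} z_{u,v} = 1, \quad \forall v \in \mathcal{N} && \text{(3–5f)} \\
& y_{u,l} + z_{u,v} \leq 1 + y_{v,l+1}, \quad \forall u, v \in \mathcal{N},\ \forall l \in \mathcal{L} && \text{(3–5g)} \\
& w^c_{l+1} \cdot t_c \leq w^{c'}_l \cdot t_{c'} + A \cdot (1 - w^{c'}_l), \quad \forall c' \leq c,\ \forall l \in \mathcal{L} && \text{(3–5h)} \\
& \sum_{v \in \mathcal{N}} z_{u,v} \leq A \cdot p_u, \quad \forall u \in \mathcal{N} && \text{(3–5i)} \\
& y_{u,l} + z_{u,v} \leq 1 + \sum_{c=1}^{m_{u,v}} w^c_l, \quad \forall u, v \in \mathcal{N},\ \forall l \in \mathcal{L} && \text{(3–5j)} \\
& \sum_{u \in \mathcal{N}} e_u \cdot p_u \leq I && \text{(3–5k)} \\
& z_{u,v} \in \{0, 1\}, \quad \forall u, v \in \mathcal{N} && \text{(3–5l)} \\
& y_{v,l} \in \{0, 1\}, \quad \forall v \in \mathcal{N},\ \forall l \in \mathcal{L} && \text{(3–5m)} \\
& x_l \in \{0, 1\}, \quad \forall l \in \mathcal{L} && \text{(3–5n)} \\
& w^c_l \in \{0, 1\}, \quad \forall c \in \mathcal{Q},\ \forall l \in \mathcal{L} && \text{(3–5o)} \\
& q_L \in \{0, 1\}, \quad \forall L \in \mathcal{L} && \text{(3–5p)} \\
& p_u \in \{0, 1\}, \quad \forall u \in \mathcal{N} && \text{(3–5q)}
\end{aligned}
$$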

The objective of P is to minimize the delivery time $t$. The constant $A$ denotes a large

positive number in all of the above constraints. Constraint (3–5a) ensures that a device $v$

is in exactly one of the layers in the tree. A layer exists when there is at least one

device in it, which is captured by constraint (3–5b). Constraint (3–5c) calculates the delivery time $t$. Constraint (3–5d)

ensures that devices belonging to a particular layer transmit using the same CQI.

Constraint (3–5e) detects whether the total number of layers is at least $L$. Constraint (3–5f) ensures

that a device receives the data from exactly one device in the upper layer. Constraint (3–5g) ties a

receiver to the layer directly below its transmitter: if device $u$ is in layer $l$ and device $v$ receives the content from $u$, then $v$ must be in layer

$l + 1$. Constraint (3–5h) captures the fact that a device in layer $l + 1$

must transmit using a CQI equal to or higher than that of a device in layer $l$. Constraints

(3–5i) and (3–5k) ensure that the total cost of the selected RDs is within the eNB budget

$I$. If device $u$ in layer $l$ transmits the data to device $v$ in layer $l + 1$, they must be socially

and physically connected, i.e., the value $m_{u,v}$ (Table 3-3) must be larger than or equal to the

CQI $c$ of layer $l$, which is ensured by (3–5j).
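To make the role of the constraints concrete, the sketch below checks a given candidate tree against the prose reading of program (3–5) and, if it is feasible, returns its delivery time. It is a hand-written verifier under stated assumptions, not the solver-based formulation itself: the data layout, the `"eNB"` root label, and the toy `hop_time` function are hypothetical.

```python
def check_tree(layers, parent, layer_cqi, m, incentive, budget, hop_time):
    """Return (feasible, delivery_time) for a candidate multicast tree.

    layers[l]      : devices in layer l, with layers[0] == ["eNB"]
    parent[v]      : device (or "eNB") from which v receives the content
    layer_cqi[l]   : CQI used by the transmitters of layer l
    m[u][v]        : social CQI from u to v (row "eNB" holds downlink CQIs)
    hop_time(l, c) : time of the hop leaving layer l at CQI c
    """
    position = {v: l for l, devices in enumerate(layers) for v in devices}
    if set(parent) != set(position) - {"eNB"}:       # one parent each, cf. (3-5f)
        return False, None
    relays = {u for u in parent.values() if u != "eNB"}
    if sum(incentive[u] for u in relays) > budget:   # budget, cf. (3-5i), (3-5k)
        return False, None
    for v, u in parent.items():
        if u not in position or position[u] + 1 != position[v]:  # cf. (3-5g)
            return False, None
        if m[u][v] < layer_cqi[position[u]]:         # decodable link, cf. (3-5j)
            return False, None
    if any(a > b for a, b in zip(layer_cqi, layer_cqi[1:])):     # cf. (3-5h)
        return False, None
    return True, sum(hop_time(l, c) for l, c in enumerate(layer_cqi))  # cf. (3-5c)

def toy_hop_time(layer, cqi):
    """Hypothetical hop time that simply shrinks with the CQI."""
    return 1.0 / cqi

layers = [["eNB"], ["v1", "v2"], ["v3"]]
parent = {"v1": "eNB", "v2": "eNB", "v3": "v1"}
layer_cqi = [7, 10]
m = {"eNB": {"v1": 9, "v2": 7, "v3": 2}, "v1": {"v3": 11}}
incentive = {"v1": 1, "v2": 1, "v3": 1}
feasible, delivery_time = check_tree(layers, parent, layer_cqi, m,
                                     incentive, 1, toy_hop_time)
```

In this example, device v3 cannot decode the eNB transmission at CQI 7 (its downlink CQI is 2), so it is served by relay v1 at CQI 10, which uses the whole budget of one relay.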

Corollary 1. The size of P is $O(n^2 \cdot l_{max})$.

We have already shown in Theorem 3.1 that the MRS problem is NP-complete, so

obtaining the optimal solution may require time exponential in the number of D2D pairs.

Consequently, we introduce a heuristic approach, the time-efficient relay selector (ERS),

to tackle the MRS problem in the next subsection. We also show in the experimental

evaluation that ERS achieves an objective value within 5% of that of the optimal

solution.

ERS: Time-efficient approach for MRS

As pointed out in the previous section, with the increase in the number of potential

D2D pairs, the solution space becomes prohibitively large and intractable to solve

at some point. However, we observe that some of those D2D pairs are less likely to

be selected as part of the multicast tree and hence can be safely removed from the

solution space. To this end, we perform a pre-processing step before solving (3–5),


which removes non-contributory D2D pairs. For a D2D pair (x , y), if the eNB can deliver

the content to y faster than transmitting via x , we discard the pair. Equipped with the

pre-processing step, we express the ERS algorithm in Alg. 3.

Algorithm 3 ERS: Efficient Relay Selector
Input: SCM matrix, content size b, number of RBs B, r^d_c, r^u_c, ∀c ∈ Q
Output: The optimum multicast tree T

foreach D2D pair {(x, y) | m_{x,y} > 0} do
    Compute t_x = b / (r^d_{m_{eNB,x}} · B)
    Compute t_y = b / (r^d_{m_{eNB,y}} · B)
    Compute t_{x,y} = t_{y,x} = b / (r^u_{m_{x,y}} · B)
    if t_y ≤ t_x + t_{x,y} then
        Mark SCM entry m_{x,y} = 0
Solve P for T

Lemma 1. Pre-processing reduces D2D pairs by at least half of their initial number.

Proof. Before the pre-processing, for each pair $(x, y)$ with $m_{x,y} > 0$, the pair $(y, x)$ must

also exist as $m_{y,x} = m_{x,y} > 0$. It is clear that $t_x, t_y, t_{x,y} > 0$ in each iteration of Alg. 3.

Then, the condition in the if statement cannot be false for both pairs $(x, y)$ and $(y, x)$: if $t_y > t_x + t_{x,y}$ and $t_x > t_y + t_{x,y}$ held simultaneously, adding the two inequalities would give $0 > 2 t_{x,y}$, a contradiction. Hence, at

least one of the pairs will be removed. As Alg. 3 iterates through all initial D2D pairs, at

least half of them will be removed after the pre-processing.
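The pre-processing step of Alg. 3 can be sketched in Python as follows. To keep the example self-contained, it assumes that the transmission times have already been computed and are passed in as dictionaries; the function name and the numeric values are hypothetical.

```python
def prune_d2d_pairs(direct_time, pair_time):
    """Pre-processing of Alg. 3 on precomputed transmission times.

    direct_time[x]    : time for the eNB to reach x directly (t_x)
    pair_time[(x, y)] : D2D time from x to y (t_{x,y}); both orderings present
    Returns the ordered pairs that survive, i.e. those with t_y > t_x + t_{x,y}.
    """
    kept = set()
    for (x, y), t_xy in pair_time.items():
        if direct_time[y] <= direct_time[x] + t_xy:
            continue          # the eNB reaches y at least as fast: drop (x, y)
        kept.add((x, y))
    return kept

# Hypothetical times in seconds for three devices.
direct_time = {"a": 0.1, "b": 0.5, "c": 0.12}
pair_time = {("a", "b"): 0.1, ("b", "a"): 0.1,
             ("a", "c"): 0.1, ("c", "a"): 0.1}
kept = prune_d2d_pairs(direct_time, pair_time)
```

Only the pair (a, b) survives here, since relaying through a is the only case in which the two-hop path beats direct delivery. Consistent with Lemma 1, no more than half of the ordered pairs remain.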

We solve the MRS problem via the ERS method using the CPLEX solver [36]. We also

demonstrate the significantly reduced D2D pair count used in ERS in the experimental

evaluation section.

3.4 Experimental Evaluation

In this section, we evaluate the performance of the proposed algorithms. We show

the comparative performance analysis of the algorithm we proposed for the generic

multi-hop D2D.

In the first set of experiments, we evaluate how social-aware multi-hop D2D can

achieve better content delivery time when the number of relay devices is constrained

by the eNB budget. The analysis is performed according to guidelines of LTE-A system


model [3]. We consider distance-dependent path loss and multipath Rayleigh fading

with exponent α = 3. The main wireless parameters are listed in Table 3-4. We have

considered 20 to 100 RBs dedicated for the multicast users on a 20 MHz channel

bandwidth [3]. Pairwise CQIs between devices including with eNB are computed by

mapping the signal-to-interference-plus-noise-ratio (SINR) on each RB onto the CQI

level that ensures a block error rate smaller than 1% [55].

To model the social tie, we use real-world location-based Gowalla network topology

from Stanford repository [1]. We choose n = 25 to 100 users with a step size of 25 who

are in a particular location at a certain time seeking the same content [6]. The social

tie wij between a pair of users is assigned randomly from a uniform distribution over

(0, a], where we vary the parameter a from 0.1 to 1.0 to observe the importance of social

communities on the delivery time. We then deploy the Blondel [17] community detection

algorithm to extract the social communities and subsequently to find the willingness lij of

a device i to share its resources with device j as defined in (3–3).

Table 3-4. Wireless network parameters

Parameter                Value
Cell dimension           100 x 100 m^2
eNB location             Middle of left edge of the area
Channel model            Multipath Rayleigh fading
Path loss exponent       3
Noise spectral density   −174 dBm/Hz
eNB transmit power       10 W
D2D transmit power       100 mW
Maximum D2D distance     30 m
RB size                  12 sub-carriers, 0.5 ms

We compare the performance of optimal relay selector (ORS), ERS, restricted relay

selector (RRS), greedy relay selector (GRS) [55] and conventional multicast scheme

(CMS) [6] in terms of reducing the content delivery time. ORS is the solution obtained by

solving (3–5) using CPLEX [36]. ERS chooses relay devices from the reduced solution


space as described in Alg. 3. RRS chooses relay devices only from

layer l = 1, with the eNB placed alone in layer l = 0. This means the whole content

transmission is restricted to two hops. RRS thus obtains the optimal solution for

two-hop transmission by limiting lmax to 2 in (3–5). GRS, proposed in [55], chooses relay

devices from the first hop (l = 1) without any constraint on the eNB budget. In the second

hop, it attaches devices greedily to the RD in layer 1 with the highest CQI and completes

the overall transmission in minimum time within two hops. It has been shown that GRS

outperforms other state-of-the-art methods in terms of reducing content delivery time

[55]. CMS, which transmits the content according to the CQI of the device having the

worst channel condition [6], is used as a base line for delivery time comparison.

We assume the video-on-demand (VoD) content size b = 1 MB, number of

resource blocks B = 100, social tie distribution parameter a = 1.0, eNB budget I = 10

throughout the subsequent experiments unless otherwise mentioned. We also assume

for simplicity that the store and forward decoding in each relay device takes negligible

time compared to the transmission time to other devices. Without loss of generality,

for better understanding of the comparisons, we have assigned eu = 1 for all devices

throughout the experiments.

The first analysis demonstrates that ERS outputs a near-optimal solution with respect
to the optimal solution of (3–5). Since computing the optimal solution takes exponential
time for larger instances of the network, in Figure 3-2 we use |N| = 25 for comparison
over a range of eNB budgets. The gap is computed as follows:

$$\text{gap} = \frac{\text{deliveryTime(ERS)} - \text{deliveryTime(ORS)}}{\text{deliveryTime(ORS)}} \times 100\%$$
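As a quick illustration of the gap formula, the following minimal sketch computes the relative gap for a pair of delivery times. The function name and the numeric inputs are hypothetical and are not taken from the experiments.

```python
def gap_percent(delivery_time_ers: float, delivery_time_ors: float) -> float:
    """Relative gap (in percent) of the ERS delivery time over the ORS optimum."""
    if delivery_time_ors <= 0:
        raise ValueError("the optimal delivery time must be positive")
    return (delivery_time_ers - delivery_time_ors) / delivery_time_ors * 100.0


# Hypothetical delivery times in seconds, not measurements from the experiments.
example_gap = gap_percent(1.05, 1.00)
```

A gap of zero means that ERS matches the optimal delivery time exactly.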

As Table 3-5 shows for various budgets, ERS always delivers the content within a 5% gap
of ORS. However, the execution time (ET) differs significantly between ERS and ORS, as
indicated by the third and fourth columns of the table, respectively. Executions are
performed on a Linux machine with an AMD Opteron(tm) Processor 6168 CPU and 64 GB of
memory. The reduced count of D2D pairs, which contributes to the large run time
improvement, is evident from Table 3-6. Accordingly, we use only ERS for comparative
analysis in subsequent experiments.

[Plot: delivery time (s) versus budget; series: ORS, ERS]

Figure 3-2. Analysis of ERS performance vs ORS

Table 3-5. Gap analysis between ORS and ERS

Budget   % gap   ET (s), ERS   ET (s), ORS
1        4.45    0.1           11
2        1.96    0.2           2035
3        1.97    0.4           6189
4        3.92    0.6           12879
5        4.56    1.0           20212

Table 3-6. D2D pair counts in ORS and ERS

Method   User count (n)
         25     50     75     100
ORS      600    2332   3852   7948
ERS      135    426    551    1760

We now analyze the comparative performance of ERS, RRS, GRS and CMS with respect to
delivery time for varying eNB budgets, setting |N| = n = 50. The result is shown in
Figure 3-3. Among all the considered relay selection schemes, ERS achieves the best
delivery time. CMS has the worst delivery time, which does not change with the budget
because CMS does not leverage D2D and hence does not require any RD. It is worth
underlining that GRS suffers from poor delivery time when the budget is restricted to a
small number. Since GRS attaches each device in N\S to the relay device with the highest
CQI value, it requires a comparatively large number of relay devices to cover all
devices in N\S; it needs as many as 5 relay devices before it can produce any feasible
solution. Even if we allow GRS to operate without any budget constraint (marked as
Unrestricted GRS in Table 3-7), ERS and RRS still outperform it when the budget is
larger than 2, as can be seen from Table 3-7. RRS, in turn, achieves a lower delivery
time than every other method except ERS. As more budget is allowed, ERS is free to
choose from a larger number of D2D devices with better CQI spanning more hops, which
reduces the delivery time compared to RRS.

Table 3-7. Comparison of delivery times in seconds

Method             Budget
                   1      2      3      5      10
Unrestricted GRS   0.92   0.92   0.92   0.92   0.92
RRS                0.98   0.92   0.84   0.84   0.83
ERS                0.98   0.92   0.84   0.75   0.72

Figure 3-4 compares the impact of multicast user count on the delivery time of
different relay selection schemes. The area over which the users are distributed is
progressively extended, starting from a smaller area for n = 25 users up to the whole
cell of 100 × 100 m² for n = 100 users. As users are placed gradually farther from the
eNB with increasing multicast user count, the worst channel quality between the eNB and
a device deteriorates. As a result, the delivery time of each scheme increases.
However, the increased distance from the eNB opens up the opportunity for multi-hop
D2D, particularly for devices that are far away from the eNB. ERS takes full advantage
of the D2D pairs by delivering content to devices at the cell edge through multi-hop
D2D. As the user count n reaches 100, ERS outperforms the other schemes significantly,
with an average delivery time gain of 68.4% over CMS and 22.3% over RRS.

[Plot: delivery time (s) versus budget; series: ERS, RRS, GRS, CMS]

Figure 3-3. Delivery time for varying the eNB budget

[Plot: delivery time (s) versus n; series: ERS, RRS, GRS, CMS]

Figure 3-4. Delivery time for varying the multicast users

[Plot: delivery time (s) versus available RB; series: ERS, RRS, GRS, CMS]

Figure 3-5. Delivery time for varying the RB count

In Figure 3-5, we show how varying the number of available resource blocks affects the
relay selection and the corresponding content delivery time for a fixed content size of
1 MB, multicast user count of 50 and budget of 10. We depict the delivery time for a
range of RBs spanning 20 to 100 with a step size of 10. As expected, CMS is the worst
performing scheme for multicast content delivery, requiring 5.56 s for 20 RBs and
1.11 s for 100 RBs. When the number of available RBs is small (20), ERS exhibits a
performance gain of 12.6% and 34.4% over RRS and CMS, respectively. Not surprisingly,
all of the schemes improve the delivery time as more resource blocks are allowed for
transmission. This once again reinforces the superiority of ERS in working efficiently
under resource constraints.

Figure 3-6 demonstrates how varying the content size affects the content delivery time
when the number of RBs, user count and budget are fixed to the values mentioned at the
start of this section. We vary the VoD content size from 1 MB to 10 MB. Given a fixed
number of RBs, B = 100, more time is required to transmit the content as its size
increases. As a result of this increased per-hop time, the overall content delivery
time increases for all of the methods. However, similar to the previous trend, ERS
outperforms the other methods significantly.

[Plot: delivery time (s) versus content size (MB); series: ERS, RRS, GRS, CMS]

Figure 3-6. Delivery time for varying the content size

[Plot: delivery time (s) and D2D pair count versus social tie]

Figure 3-7. Delivery time and D2D pair count for varying the social tie distribution

So far, in all of the experiments, the social tie wij between user i and user j in the
online social network has been chosen uniformly from the range (0, a] with a = 1.0. We
now analyze the impact of the social tie by varying the distribution range from a = 0.1
to a = 1.0. The smaller the value of a, the weaker the social tie strength wij, which
makes it less likely for i and j to be in the same community. This eventually lowers
the likelihood of sharing resources between i and j, i.e., lij becomes small. Hence,
there are fewer D2D device pairs, which in turn leads to a larger delivery time. Recall
that lij also depends on the social tie between i and j through the condition
wij ≥ 0.5 (see Equation (3–3)). Therefore, once the value of a reaches at least 0.5,
the number of D2D pairs increases significantly. This is evident from Figure 3-7, where
we vary the social tie distribution range a for the RB count, content size, multicast
user count and budget specified earlier, and report the total content delivery time and
the number of D2D pairs obtained by the ERS algorithm. As a increases, more D2D pairs
(red line) are included for sharing resources with each other, which results in a
shorter delivery time (blue line). The results are averaged over a large number of
independent simulation runs.

Figure 3-8. Heatmap depicting hop count for varying both budget and multicast user

We now express the underlying hop count used by our proposed method ERS as a function
of both the eNB budget and the total number of multicast users. For this analysis, we
fix the RB count to 100, the content size to 1 MB, and the social tie distribution
range to (0, 1.0]. We vary the user count from 25 to 100 with a step size of 25 and the
eNB budget from 1 to 10. In Figure 3-8, we depict the hop count by color: the higher
the hop count, the lighter the color in the heat map. For a given budget, as the total
user count increases, devices far away from the eNB are better served by multi-hop D2D
communication, which can be seen from the lighter colors in the heat map.

3.5 Summary

In this chapter, we have studied the benefit of social-aware multi-hop D2D for video

content delivery to multicast users under practical constraints imposed by the eNB

for relay selection. We have formulated a novel problem, MRS, for minimizing content

delivery time to a group of users and shown its NP-completeness. We have introduced

a mixed integer program formulation to express MRS and proposed a heuristic scheme

to efficiently solve the problem. Our proposed social-aware solution minimizes the base

station cost by relaying video content to a set of relay devices which, in turn,

transmit the content via multi-hop D2D to other devices with poorer channel conditions

that could not receive the content from the eNB. Simulation results showed that our

proposed methods outperformed existing state-of-the-art methods significantly in terms

of minimizing content delivery time.


CHAPTER 4
SOCIAL-AWARE MULTICAST CONTENT TRANSMISSION: SPECIAL CASE SCENARIO

In the previous chapter, we have explored the performance enhancement of

multicast content delivery brought forth by the social-aware D2D communication.

Although the mixed integer program P that we devised in Chapter 3 provides an

accurate solution to MRS, it may not scale well for large MRS instances even after the

pre-processing step to remove non-contributing D2D pairs. In this chapter, we introduce

an efficient greedy based approximation algorithm with provable performance guarantee

to solve a two-hop variation of the MRS problem.

4.1 CRS: Two-hop MRS

We now discuss the cost-effective relay selection problem (CRS), a constrained

version of the MRS problem in which the multicast content is delivered in at most two

hops. In CRS, the eNB selects a particular CQI level in downlink and transmits the

content to a set of devices that are capable of decoding it. Among these devices, a set
of relay devices is chosen, which then transmit the content via D2D using the uplink to
the rest of the devices. The objective of CRS is to ensure content delivery to all
devices in minimum time by selecting appropriate CQI levels for both downlink and
uplink. Subject to this guaranteed minimum time, CRS asks for the most cost-effective
relay devices in the second hop to transmit the content to the rest of the devices that
cannot be directly served by the eNB. The formal definition of CRS is as follows.

Definition 5. (CRS) Given a network G of N devices requesting the same content, CRS

seeks to first find the CQI levels cd and cu in downlink and uplink respectively such that

the content can be delivered to all the devices in minimum time. Then it asks for a set S

of relay devices with minimum C(S) that guarantees the minimum delivery time.

In the proof of Theorem 3.1, the P-MRS problem we constructed is essentially a

special case for CRS and it is NP-Complete. Therefore, CRS is also NP-Complete.


4.1.1 Solution Sketch

Our proposed solutions aim to identify the proper configuration to solve CRS by
deciding: (i) the CQI levels cd, cu that guarantee content delivery in minimum time,
and (ii) the set S of RDs that transmit the content to all devices that cannot be
served directly by the eNB.

For the first step of solving CRS, we are required to find out the appropriate value

of CQI in each layer that results in minimum delivery time. We start by calculating

the overall transmission time T for all possible cd , cu combinations and sort the

combinations in ascending order of T . Then we use a binary search to locate the

feasible combination with minimum transmission time. The feasibility of a combination

cd , cu is discussed as follows.

For each combination cd , cu, we first identify the set Kd of devices that are able to

decode the content transmitted from the eNB under CQI level cd . Then, we construct the

set of devices that can retrieve the content from a device i ∈ Kd via D2D. For an uplink
MCS cu, each device i ∈ Kd constructs the set of devices that it can serve via D2D,
i.e., Xi = {k ∈ N\Kd | mi,k ≥ cu}, which is called the maximum reachable devices (MRD)
set of device i. Since D2D transmissions in each hop are synchronized, as they are
performed in the same transmission time interval (TTI) [5], all considered RDs serve
all UEs in the next hop in a single transmission using the MCS corresponding to the
chosen uplink CQI level cu. A combination (cd, cu) is feasible when the MRD sets
jointly cover the remaining devices, i.e., ∪i∈Kd Xi = N\Kd.
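The feasibility test for a combination can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the nested dictionary `m` (holding the pairwise CQI values mi,k), the set names and the helper names are all assumptions made for the example.

```python
from typing import Dict, Hashable, Set

Device = Hashable


def mrd_sets(devices: Set[Device], k_d: Set[Device],
             m: Dict[Device, Dict[Device, int]], c_u: int) -> Dict[Device, Set[Device]]:
    """Build X_i = {k in N minus K_d : m[i][k] >= c_u} for every i in K_d."""
    remaining = devices - k_d
    return {i: {k for k in remaining if m[i].get(k, 0) >= c_u} for i in k_d}


def is_feasible(devices, k_d, m, c_u) -> bool:
    """A pair (c_d, c_u) is feasible when the MRD sets jointly cover N minus K_d."""
    covered = set().union(*mrd_sets(devices, k_d, m, c_u).values())
    return covered == devices - k_d


# Hypothetical toy instance: devices 1 and 2 decode the eNB transmission (K_d).
devices = {1, 2, 3, 4}
k_d = {1, 2}
m = {1: {3: 7, 4: 2}, 2: {3: 1, 4: 5}}
```

In this toy instance, an uplink level of 5 is feasible because device 1 reaches device 3 and device 2 reaches device 4, whereas level 6 leaves device 4 unreachable.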

For the second step of solving CRS, we introduce two solution schemes for the

relay selection problem (RSP) that identifies the RD set S with minimum cost under

the obtained cd , cu during the first step. The two solutions are described in the next

subsection.


4.1.2 Solutions for RSP

4.1.2.1 The optimal solution to RSP

In this subsection, we solve RSP optimally by formulating it as an integer linear

program.

With given cd , cu, as discussed in the previous section, we can identify the set

Kd and all Xi , i ∈ Kd . We introduce a variable ai for every device i ∈ Kd , with the

intended meaning that ai = 1 when the device i is selected as a relay device, and ai = 0

otherwise. We can express the RSP problem as the following integer linear program

(ILP):

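$$
\begin{aligned}
\text{minimize} \quad & \sum_{i \in K_d} e_i \cdot a_i \\
\text{subject to} \quad & \sum_{i \,:\, v \in X_i} a_i \ge 1, \quad \forall v \in N \setminus K_d \\
& a_i \in \{0, 1\}
\end{aligned}
\tag{4–1}
$$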

The constraint ensures that each of the remaining devices in N\Kd can retrieve the
content from at least one selected device i ∈ Kd. In other words, any feasible solution
of this ILP must ensure that each device that could not receive the content from the
eNB belongs to the maximum reachable devices (MRD) set Xi of at least one relay device.

We obtain the optimal solution of RSP by solving (4–1) using the CPLEX tool [36].
Accordingly, we denote the overall solution to the CRS problem as Optimal Relay Device
Selector (ORDS) when this ILP is deployed in the second step to solve RSP. We discuss
the comparative performance of the ORDS algorithm in Section 4.2. As RSP resembles the
set cover problem and is thus NP-hard, we now propose a fast greedy algorithm for RSP
in order to achieve high efficiency for large problem instances.
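For very small instances, the optimum of the ILP (4–1) can be cross-checked by exhaustive enumeration instead of an ILP solver. The sketch below is such a stand-in, not the CPLEX-based procedure used in this work; the function and argument names are hypothetical.

```python
from itertools import combinations


def solve_rsp_exact(x, cost, targets):
    """Return a minimum-cost relay set covering `targets`, or None if infeasible.

    x[i] is the MRD set X_i of candidate i in K_d and cost[i] is e_i.
    Enumerates all subsets, so it is only usable for a handful of candidates.
    """
    best, best_cost = None, float("inf")
    candidates = sorted(x)
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            covered = set().union(*(x[i] for i in subset))
            total = sum(cost[i] for i in subset)
            # Keep the subset only if it covers every target at a lower cost.
            if targets <= covered and total < best_cost:
                best, best_cost = set(subset), total
    return best
```

The enumeration visits every subset of the candidates, which mirrors the exponential worst-case behavior that motivates the greedy alternative.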


4.1.2.2 The greedy solution to RSP

The proposed greedy solution to RSP, gain maximizer (GM), described in Alg. 4,
iteratively selects the device that covers the most not-yet-covered devices in N\Kd per
unit cost. Notice that a feasible solution may not exist for every combination
(cd, cu), in which case GM reports infeasibility. The gain function f(u) = |Xu| / eu is
the number of uncovered devices that u can transmit the content to at the given CQI
level, denoted by |Xu|, weighted by its incentivization cost eu. In each iteration, we
pick the device u with the highest gain (ties are broken arbitrarily) and add it to the
final set S. The gain f(v) of every other device is updated each time a device is added
to S. The process continues until either all the devices in N\Kd are covered or there
is no feasible solution for this value of cu.

Algorithm 4 GM: Gain Maximizer
Input: Xu, ∀u ∈ Kd
Output: S ⊆ Kd
S ← ∅, C ← ∅
foreach u ∈ Kd do
    Initialize the gain f(u) ← |Xu| / eu
while C ≠ N\Kd ∧ Kd ≠ S do
    v ← argmax over v′ ∈ Kd\S of {f(v′)}
    S ← S ∪ {v}
    foreach x ∈ Xv do
        C ← C ∪ {x}
        Update f(y) for all y ∈ Kd such that x ∈ Xy
if C ≠ N\Kd then
    Solution not feasible
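The greedy selection can be sketched in a few lines of Python. This is an illustrative rendering in the spirit of Alg. 4, not the code used in the experiments: it recomputes the gains in every iteration instead of updating them incrementally, stops as soon as no remaining candidate covers a new device, and uses hypothetical names throughout.

```python
def gain_maximizer(x, cost, targets):
    """Greedy relay selection in the spirit of Alg. 4 (GM).

    x[u] is the MRD set X_u of candidate relay u, cost[u] is e_u > 0, and
    targets is N minus K_d. Returns the chosen set S, or None when the
    candidates cannot cover every target.
    """
    uncovered = set(targets)
    selected = set()
    while uncovered:
        # Gain = newly covered devices per unit cost; ties go to the first maximum.
        best = max(
            (u for u in x if u not in selected),
            key=lambda u: len(x[u] & uncovered) / cost[u],
            default=None,
        )
        if best is None or not x[best] & uncovered:
            return None  # no remaining candidate covers anything new
        selected.add(best)
        uncovered -= x[best]
    return selected
```

Returning `None` corresponds to the "Solution not feasible" branch of the pseudocode.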

We are now ready to propose the solution to solve the CRS as a whole based on

the solutions of RSP in the next section.

4.1.3 Solution to CRS

The solution to CRS, namely FRDS (Fast Relay Device Selector), combines the steps
discussed in Section 4.1.1 and Section 4.1.2. After constructing the candidate (cd, cu)
pairs, FRDS sorts them in ascending order of their content delivery time. It then runs
a binary search over the sorted combinations to identify the pair with the smallest
delivery time under which all devices can receive the content. For each pair (cd, cu)
examined, it invokes Alg. 4 to check whether the potential set of relay devices
corresponding to that combination can transmit the content to all the devices that
could not retrieve it from the eNB. Note that the ILP discussed in Section 4.1.2.1 can
also be used to determine the feasibility of a pair (cd, cu), yet FRDS uses Alg. 4 to
obtain better run time efficiency as well as an approximation guarantee, which we
discuss next. We also compare the performance of FRDS with ORDS (described in
Section 4.1.2.1) in the experimental evaluation section. The algorithm FRDS is detailed
below.

Algorithm 5 FRDS: Fast Relay Device Selector
Input: SCM matrix, N
Output: ĉd, ĉu, S
1. Construct all possible combinations (cd, cu) such that cu ≥ cd
2. Compute the content delivery time tcd + tcu for each combination (cd, cu)
3. Sort the combinations in ascending order according to the delivery time
4. Identify the feasible pair (ĉd, ĉu) with smallest delivery time using a binary search
5. Calculate the relay device collection S using Alg. 4
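Steps 1 to 4 of Alg. 5 can be sketched as follows. The sketch assumes that feasibility is monotone along the sorted order (once a pair is feasible, every later pair is feasible too), which is what a binary search requires; if that does not hold for an instance, a linear scan over the sorted list would be needed instead. The callables `hop_time` and `is_feasible` are hypothetical placeholders for the per-hop transmission time at a CQI level and for the feasibility check.

```python
from itertools import product


def first_feasible(sorted_pairs, is_feasible):
    """Binary search for the first feasible entry of `sorted_pairs`.

    Assumes feasibility is monotone along the sorted order (once a pair is
    feasible, every later pair is too). Returns None if nothing is feasible.
    """
    lo, hi = 0, len(sorted_pairs)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(sorted_pairs[mid]):
            hi = mid
        else:
            lo = mid + 1
    return sorted_pairs[lo] if lo < len(sorted_pairs) else None


def frds_pair(levels, hop_time, is_feasible):
    """Steps 1-4 of Alg. 5: enumerate, time, sort, and search the (c_d, c_u) pairs."""
    pairs = [(cd, cu) for cd, cu in product(levels, repeat=2) if cu >= cd]
    pairs.sort(key=lambda p: hop_time(p[0]) + hop_time(p[1]))
    return first_feasible(pairs, is_feasible)
```

Step 5 would then run the relay selection once more on the returned pair to obtain S.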

Complexity analysis

We now discuss the solution quality and running time complexity of Alg. 5.

Theorem 4.1. Alg. 5 ensures minimum content delivery time and achieves an approximation
ratio of O(log(n)) for the cost of selecting relay devices.

Proof. Alg. 5 guarantees the minimum possible delivery time due to two facts: (i) the
feasibility check of a delivery time (which is essentially the feasibility of the pair
(cd, cu)) is accurate, as ensured by Alg. 4; (ii) a binary search over the sorted times
finds the minimum feasible time. In terms of the cost of selecting relay devices, as
RSP resembles the Set Cover problem, the approximation ratio of the greedy solution of
RSP is also O(log(n)) [69].

Lemma 2. The run time computational complexity of FRDS is O(n^2 · log(q)), where q is
the total number of CQI levels.


Proof. There are in total q^2 possible combinations of downlink and uplink CQI pairs
(cd, cu). The binary search for the appropriate downlink and uplink CQI levels takes
O(log(q^2)) = O(log(q)) steps. For each combination (cd, cu) examined, the FRDS
algorithm first constructs the set of potential relay devices Kd, which takes time
linear in n. For each device i ∈ Kd, FRDS identifies the set of devices that can be
served by i via D2D, which takes O(n^2) time in total in the worst case. There can be
O(n) candidate sets in total for the GM algorithm, which takes O(n) time per iteration
to find out whether the solution is feasible and, if so, the corresponding set of relay
devices. So the while loop of the GM algorithm takes O(n^2) time in the worst case,
which is the time spent on each combination (cd, cu). Hence, the total time complexity
of binary searching for the feasible pair that results in minimum time is
O(n^2 · log(q)).


4.2 Experimental Evaluation

In this section, we evaluate the performance of the proposed algorithms. We

analyze this two-hop special case, which has an approximation guarantee, against the
methods used for comparison in the previous chapter. Recall that CRS is a special case
of the multi-hop problem in which the multicast content is delivered in at most two
hops, with the aim of minimizing content delivery time while choosing cost-effective
relay devices. We compare the performance of the ORDS (introduced in Section 4.1.2.1)
and FRDS algorithms with that of the GRS algorithm [55] and the conventional multicast
scheme (CMS) [6]. We compare these schemes in terms of the eNB budget, content delivery
time and execution time. ORDS is the solution obtained by solving the CRS problem where
CPLEX [36] is deployed in the second step to solve (4–1). FRDS selects the relay
devices according to Alg. 5.

[Plot: budget versus n; series: ORDS, FRDS, GRS]

Figure 4-1. eNB Budget (RD count) for varying user count

[Plot: delivery time (s) versus n; series: ORDS, FRDS, GRS, CMS]

Figure 4-2. Content delivery time for varying user count

[Plot: execution time (ms) versus n; series: ORDS, FRDS, GRS]

Figure 4-3. Execution time for varying user count

Figure 4-1 shows the comparative performance of different methods in terms of eNB

budget, which is essentially the RD count since we assumed eu = 1, required to deliver

a content of size 1 MB using 100 RBs when the multicast user count is varied from 25 to
100 with a step size of 25. The objective of the CRS problem is to deliver the content
in minimum time while choosing minimum cost RDs. FRDS requires almost the same number
of RDs as ORDS, which is well within the O(log(n)) approximation ratio that we proved
in Theorem 4.1. FRDS also requires significantly less time to compute the RDs than
ORDS, as can be seen from Figure 4-3. On the other hand, as the user count increases,
GRS requires an increasingly large number of RDs to deliver the content to all the
multicast users. Note that, for user count n = 25, GRS requires fewer RDs than FRDS.
However, this comes at a cost: content delivery with GRS takes longer than with FRDS,
as Figure 4-2 shows for the user count of 25.

Figure 4-2 demonstrates the content delivery time performance of different relay
selection schemes. As the multicast user count increases, the worst channel quality
between the eNB and a device deteriorates due to the poor channel condition of the
devices on the edge of the network. As a result, the delivery time of each scheme
increases. CMS requires a longer time than any other method for delivering the content,
since it transmits the content to all users using the CQI value of the user with the
poorest channel condition. GRS takes longer to deliver the content when the user count
is small. For larger user counts, GRS takes a similar amount of time to deliver the
content as ORDS and FRDS. However, this comes at a cost: GRS requires a large number of
RDs to accomplish its objective, which makes this approach cost inefficient in
practice. FRDS requires exactly the same delivery time as ORDS; however, it requires
significantly less time to compute the RDs, as evident from Figure 4-3. Furthermore,
FRDS requires only a small number of RDs to deliver the content, which makes it the
best fit for identifying cost-effective relay devices.

In Figure 4-3, we report the running time of the algorithms. ORDS takes more time to
compute the RDs than FRDS, which requires only milliseconds. With increasing user
count, the running time of ORDS increases exponentially. It is worth noting that for
smaller values of n, GRS takes very little time to compute the RDs. However, as the
user count increases, it requires significantly longer to identify RDs. The reason lies
in how GRS identifies the RDs. For each downlink CQI cd, GRS identifies the potential
relay devices. It then enumerates all possible combinations of the potential RDs,
starting with the smallest combination size, to verify the feasibility of the solution.
This expensive step makes the algorithm take prohibitively longer than any other
method, especially when the number of potential relay devices is large for large values
of n, as can be seen for n = 100 in Figure 4-3. Note that Figure 4-1 does not have any
values for the CMS scheme, as it does not support the concept of RD. In the case of
Figure 4-3, CMS takes a very small amount of time to compute the CQI value of the user
with the worst channel condition, with which it delivers the content to all the
multicast users (not shown in the figure). This near constant computation time comes at
a cost: the content delivery time is much larger than that of any other method
considered. In summary, FRDS achieves almost 1800% gain over GRS for n = 100 in terms
of running time.

4.3 Summary

In this chapter, we have studied the benefit of social-aware D2D for video content

delivery to multicast users under practical constraints imposed by the eNB for relay

selection. We have devised the problem as a special case of the generic problem

introduced in the previous chapter and analyzed its complexity. Moreover, we provided

an approximation algorithm for this special case with a provable performance guarantee.

Experimental evaluation results showed that our proposed methods outperformed

existing state-of-the-art methods significantly in terms of minimizing content delivery

time.


CHAPTER 5
ROBUSTNESS OF COMMUNITY STRUCTURES: APPROXIMATION ALGORITHMS AND ANALYSIS

In this chapter, we define the framework for assessing community structure fragility.

At first we introduce the density based broken community (DBC) problem for breaking

k communities with the minimum number of edge removals and analyze its complexity.

We then provide an approximation algorithm with theoretical performance guarantee for

the DBC problem in Section 5.1. To analyze the vulnerability of the community structures

in a broader sense, we extend the problem formulation to communities produced from

an arbitrary community detection algorithm. We offer an efficient heuristic to break the

communities and identify the set of critical edges in Section 5.2. In order to analyze

the edge constrained version and accordingly to identify the edges that are crucial

for community structure, we furthermore examine the problem from the view point of

locating a fixed number of important edges whose removal breaks as many communities

as possible in Section 5.3. We conduct extensive experiments with different parameters

to mine interesting observations about the behavior of broken communities after edge

removal. The results are reported in Section 5.4.

5.1 Density-based Analysis

5.1.1 Network Model and Problem Definition

In this chapter a network is represented by a graph G = (V ,E) where V is the

set of n nodes and E is the set of m edges. A node u in G represents a user while an

edge (u, v) represents the interaction between the users u and v in the network. For a

community C ⊆ V , let mC and nC be the number of internal edges and the number of

nodes in C, respectively. Let C^in denote the set of edges having both endpoints in C.

We have used the terms vertex and node interchangeably throughout this chapter.

There are several quantitative measures to identify communities in a network, such as
maximizing modularity-based functions [29] and density-based functions [30]. In this
section, we first consider the density function and discuss other community detection
measures later in Section 5.2.

The density-based function used to identify a set C of nodes as a community [30] can be
defined as

$$\Psi(C) = \frac{|C^{in}|}{\binom{|C|}{2}}$$

The more C approaches a clique of its size, the higher its density value Ψ(C).

The threshold on the internal density that suffices for C to be a local community is
given by

$$\tau(C) = \frac{\sigma(C)}{\binom{|C|}{2}} \quad \text{where} \quad \sigma(C) = \binom{|C|}{2}^{1 - \frac{1}{\binom{|C|}{2}}} \tag{5–1}$$

Thus a subgraph induced by C is a local community iff Ψ(C) ≥ τ(C) or, equivalently,
|C^in| ≥ σ(C).
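The density test can be made concrete with a short sketch. The function names below are hypothetical, and the inputs are only the number of internal edges and the number of nodes of the candidate set, which is all that the density function and the threshold in (5–1) depend on.

```python
from math import comb


def density(internal_edges: int, size: int) -> float:
    """Internal density |C^in| / C(|C|, 2) of a node set with `size` >= 2 nodes."""
    return internal_edges / comb(size, 2)


def sigma(size: int) -> float:
    """Edge-count threshold sigma(C) = C(|C|,2) ** (1 - 1 / C(|C|,2))."""
    pairs = comb(size, 2)
    return pairs ** (1.0 - 1.0 / pairs)


def is_local_community(internal_edges: int, size: int) -> bool:
    """True iff |C^in| >= sigma(C), equivalently density >= tau(C)."""
    return internal_edges >= sigma(size)
```

For instance, for a set of 4 nodes the threshold σ is 6^(5/6), roughly 4.45, so 5 internal edges suffice to form a local community while 4 do not.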

As can be seen, this density function particularly has the advantage of dealing

with the candidate group only, not requiring any predefined threshold nor user defined

parameter. However, we discuss other community detection measures later in the

general framework section. Besides, σ(C) is an increasing function which approaches

C ’s full number of connections, i.e., the number of edges in a clique of size |C |. Hence,

σ(C) is a powerful tool for detecting local communities, i.e., densely connected parts of

a network.

Based on the definition of the density function, a community C is broken if, by
removing a set of edges S from E, the density of C falls below τ(C). Let ki denote the
minimum number of edges that must be removed from community Ci to bring its density
below the threshold, where Si is the set of edges removed from Ci. Then ki is defined
as

$$k_i = \min\left\{ t \;\middle|\; \frac{2(m_{C_i} - t)}{n_{C_i}(n_{C_i} - 1)} < \tau(C_i) \right\} \tag{5–2}$$
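The quantity ki in (5–2) can be computed directly from the edge and node counts of a community. The sketch below is a minimal illustration with hypothetical names; it uses the fact that removing edges leaves the node set, and therefore the threshold, unchanged.

```python
from math import comb


def edges_to_break(m_c: int, n_c: int) -> int:
    """Smallest t with 2(m_c - t) / (n_c (n_c - 1)) < tau(C), as in (5-2).

    Because the node set is unchanged, tau(C) stays fixed and the condition
    reduces to m_c - t < sigma(C).
    """
    pairs = comb(n_c, 2)
    sigma = pairs ** (1.0 - 1.0 / pairs)
    for t in range(m_c + 1):
        if m_c - t < sigma:
            return t
    raise ValueError("unreachable for n_c >= 2, since sigma(C) > 0")
```

A community with 4 nodes and 5 internal edges is broken by a single edge removal, while a 4-clique requires two removals.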

The density-based breaking of communities (DBC) problem is defined as follows:

Definition 6. (DBC) Given an undirected graph G = (V, E) and a set C of k communities,
find a subset S ⊂ E of minimum cardinality such that removing S from the graph breaks
every community in C.


5.1.2 Complexity of DBC

Theorem 5.1. The DBC problem is NP-complete.

Proof. The decision version of DBC is defined as follows. Given (G ,C , l), where

G = (V ,E) is a graph, C is a set of communities of G , and l is a positive integer,

determine whether there exists a set S ⊂ E such that in G ′ = (V ,E\S), every

community in C is broken, and |S | ≤ l .

Given a set S of edges, one can efficiently check whether |S | ≤ l and whether all

communities in C are broken. Thus DBC is in NP.

To show the NP-hardness, we reduce from the vertex cover problem, defined as

follows. Given (G , l), where G = (V ,E) is a graph, and l is a positive integer, a vertex

cover is a set A ⊂ V such that for all e = (u, v) ∈ E , u ∈ A or v ∈ A. The problem is to

determine whether a vertex cover A exists with |A| ≤ l .

First, we need to define the identification of vertices in a graph.

Definition 7 (Vertex identification). Let H = (V ,E) be a graph. Let A = {ui : i ∈ I}

be a collection of vertices. Identification of the vertices A is defined to be the following

operation.

Let H ′ = (V ′,E ′) be the induced subgraph of H after removing vertices {ui : i ∈ I}.

Let u be a new vertex. Then

$$V^* = V' \cup \{u\}$$

$$E^* = E' \cup \{(u, w) : (u_i, w) \in E,\ w \in V'\}$$

and H∗ = (V ∗,E ∗) is the result of the operation.
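The identification operation of Definition 7 can be sketched on an edge-set representation of an undirected simple graph. The representation (edges as two-element frozensets) and the function name are assumptions made for this illustration.

```python
def identify_vertices(vertices, edges, merged, new_vertex):
    """Identify the vertices in `merged` into the single vertex `new_vertex`.

    `edges` is a set of frozenset({a, b}) pairs of an undirected simple graph.
    Following Definition 7, edges with both endpoints in `merged` are dropped.
    """
    merged = set(merged)
    kept_vertices = (set(vertices) - merged) | {new_vertex}
    new_edges = set()
    for edge in edges:
        a, b = tuple(edge)
        if a in merged and b in merged:
            continue  # internal to the merged set: no counterpart in E*
        a = new_vertex if a in merged else a
        b = new_vertex if b in merged else b
        new_edges.add(frozenset((a, b)))
    return kept_vertices, new_edges
```

Because the edges are stored in a set, parallel edges created by the merge collapse into one, which keeps the result a simple graph.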

Construction. Let C be the following community with 4 vertices and 5 edges:

[Figure: community C on vertices 1, 2, 3, 4 with 5 edges, including the edges (1, 2) and (3, 4)]

Let (G, l) be an instance of vertex cover with G = (V, E). For each edge e = (u, v) ∈ E, we create a copy Ce of C. We associate edge (1, 2) in Ce with u, and edge (3, 4) with v.

Now form the graph Ḡ = ⊔e∈E Ce, the disjoint union of the Ce. Finally, for each vertex v of G, identify in Ḡ the corresponding endpoints of all the edges with which v is associated, so that these edges merge into a single edge. The resulting graph will be called G*. Together with the collection C = {Ce : e ∈ E}, (G*, C, l) forms an instance of the decision version of DBC, where we consider Ce in G* to be the set of vertices of Ce in Ḡ after identification.

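Example. To illustrate the above construction, we consider an instance of vertex cover (G, l) where G is a triangle on the vertices u, v, w, and show the disjoint union Ḡ of the gadget copies and finally G*. The graph Ḡ consists of three copies of C: vertices 1 to 4 for edge (u, v), vertices 5 to 8 for edge (v, w), and vertices 9 to 12 for edge (w, u). Edges (1, 2) and (11, 12) are associated with u, edges (3, 4) and (5, 6) with v, and edges (7, 8) and (9, 10) with w. Then, vertices {1, 11} and {2, 12} are identified, corresponding to the edges associated with u, and likewise for the other vertices in G ({3, 5} and {4, 6} for v, {7, 9} and {8, 10} for w), and we have G*.

[Figure: the triangle G on u, v, w; the disjoint union Ḡ of three copies of C on vertices 1 to 12; and the resulting graph G* on the merged vertices {1, 11}, {2, 12}, {3, 5}, {4, 6}, {7, 9}, {8, 10}.]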

Given instance (G , l) of vertex cover, it remains to be shown that a solution for

(G ∗,C , l) yields a solution for (G , l). Each community Ce meets the density requirement

to be a community by a single edge. Since none of the edges in Ce other than (1, 2) and

(3, 4) are shared with any other community in C , we can assume that only (1, 2) or (3, 4)

(after identification) is removed from any given community. Thus, each edge that is a

candidate for removal corresponds to a unique vertex in G .

Thus, given a solution B of at most l edges whose removal breaks C , we get a set

A of vertices corresponding to the edges in B. This set A is a vertex cover of G . To see


this let e ∈ E . Then Ce is broken by removing B. Thus, one of the edges corresponding

to the vertices of e must be in B; hence at least one vertex of e is in A.

By a similar argument, a feasible vertex cover for (G, l) gives rise to a feasible solution of (G*, C, l).
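The construction used in the proof can be sketched in Python as follows. Because the gadget figure is not available here, the sketch makes a loudly stated assumption: the three gadget edges other than (1, 2) and (3, 4) are taken to be (1, 3), (2, 3) and (2, 4). After identification every vertex v of the vertex cover instance owns exactly one edge of the new graph, which the sketch builds directly. The function names and the `(v, "a")`, `(v, "b")` endpoint labels are hypothetical.

```python
def build_dbc_instance(vc_vertices, vc_edges):
    """Return (edge set of G*, communities) for a vertex cover graph.

    Each vertex v of G owns the single edge {(v, "a"), (v, "b")} of G*.
    Each edge (u, v) of G yields a 4-vertex community with 5 edges.
    """
    owner_edge = {v: ((v, "a"), (v, "b")) for v in vc_vertices}
    gstar_edges = {frozenset(e) for e in owner_edge.values()}
    communities = {}
    for (u, v) in vc_edges:
        (u1, u2), (v1, v2) = owner_edge[u], owner_edge[v]
        # Assumed private gadget edges: K4 on {u1, u2, v1, v2} minus (u1, v2).
        private = {frozenset({u1, v1}), frozenset({u2, v1}), frozenset({u2, v2})}
        gstar_edges |= private
        communities[(u, v)] = {u1, u2, v1, v2}
    return gstar_edges, communities


def cover_from_removed_edges(removed, vc_vertices):
    """Map removed owner edges of G* back to the vertices of G they stand for."""
    return {v for v in vc_vertices
            if frozenset({(v, "a"), (v, "b")}) in removed}
```

For the triangle example, the sketch yields 6 vertices and 12 edges (3 owner edges plus 9 private gadget edges), and removing the owner edge of u maps back to the vertex set {u}.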

5.1.3 Solutions to DBC

In this section, we provide an approximation algorithm for DBC with a theoretical

performance guarantee. In doing so, we first reduce DBC to the set multicover problem,

in a way that preserves the approximation ratio for set multicover. We then apply

solutions of set multicover to our problem. The challenging part of this approach is to

reduce a problem to another one while preserving the ratio.

Definition 8 (Approximation ratio preserving reduction). Let Π1 and Π2 be minimization problems.

Let f be a polynomial-time algorithm such that if I1 is an instance of Π1, then I2 = f(I1) is an instance of Π2 with OPT(I2) ≤ OPT(I1); that is, the value of the optimal solution to I2 is at most the value of the optimal solution to I1.

Let g be a polynomial-time algorithm such that if t is a solution of I2 = f(I1), then s = g(I1, t) is a solution of I1 whose objective function value is not more than the objective function value of t; that is, obj_Π1(I1, s) ≤ obj_Π2(I2, t).

Then, by use of f and g, an α-approximation for Π2 yields an α-approximation for Π1.
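The last claim of Definition 8 follows from a three-step chain, written out here for convenience with Π1 and Π2 denoting the two minimization problems. Let t be an α-approximate solution of I2 = f(I1) and s = g(I1, t):

```latex
\begin{align*}
\mathrm{obj}_{\Pi_1}(I_1, s)
  &\le \mathrm{obj}_{\Pi_2}(I_2, t)    && \text{(property of } g\text{)} \\
  &\le \alpha \cdot \mathrm{OPT}(I_2)  && \text{(} t \text{ is } \alpha\text{-approximate for } I_2\text{)} \\
  &\le \alpha \cdot \mathrm{OPT}(I_1)  && \text{(property of } f\text{)}
\end{align*}
```

Hence s is within a factor α of the optimum of I1.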

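Consider the following problem.

Definition 9 (Set multicover).

\[
\begin{aligned}
\text{minimize}   \quad & \sum_{j=1}^{m} x_j \\
\text{subject to} \quad & Ax \ge b, \\
                        & 0 \le x \le u, \quad x \text{ integer},
\end{aligned}
\]

where A is an n by m matrix (a_ij) with a_ij ∈ {0, 1}, b_i ∈ N for i ∈ {1, ..., n}, and u_j ∈ N for j ∈ {1, ..., m}.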

We have defined set multicover as an integer program for convenience, but one may think of row i of A as indicating the subsets to which element i belongs, of b_i as the number of times element i must be covered, and of x_j as the number of times set j is picked, bounded above by u_j.

Next, we define an approximation ratio preserving reduction from DBC to set multicover. Let I1 be an instance of DBC, consisting of a graph G = (V, E) and a set of communities C to be broken. Suppose each Ci ∈ C requires ki edges to be removed. The set multicover instance I2 is defined in the following way. Define the set of elements to be covered to be C, with bi = ki. For each e ∈ E, define Ae := {C ∈ C : e ∈ C}. These sets form the collection of subsets of C from which we choose the multicover. Finally, define ue, the maximum number of times Ae can be chosen, to be |{f ∈ E : Af = Ae}|.

Thus, I2 is a valid instance of set multicover. Now, any feasible solution s of I1 corresponds in a natural way to a feasible solution t of I2 of equal cost. List the edges removed in s: e1, e2, ..., ek. For each edge ei, add one to the number of times A_ei is chosen. Since the edges are distinct, no subset Ae is chosen more than ue times, so this procedure results in a feasible solution t of I2 of equal cost to s. Hence, OPT(I2) ≤ OPT(I1).

Now, let t be a feasible solution of I2. It consists of a collection {(Ae, xe)} of subsets

of C together with the number of times each subset is chosen. To construct s: for each

subset Ae , pick xe edges f such that Af = Ae . This is possible since xe ≤ ue , where ue

is the number of edges satisfying this condition. The cost of s is equal to the cost of t.

Hence, we have an approximation-preserving reduction.
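The two directions of the reduction can be sketched in Python. The sketch assumes each community is given as the set of edges it contains and that edges are sortable tuples; the function names `dbc_to_set_multicover` and `multicover_to_edges` are hypothetical and only illustrate the mappings f and g described above.

```python
def dbc_to_set_multicover(communities, required):
    """Mapping f: build the set multicover instance I2 from a DBC instance I1.

    communities: dict name -> set of edges inside that community
    required:    dict name -> k_i, edges that must be removed to break it
    Returns (demands b, list of (A_e, u_e, edges f with A_f = A_e)).
    """
    membership = {}                        # edge -> A_e = communities containing e
    for name, edges in communities.items():
        for e in edges:
            membership.setdefault(e, set()).add(name)
    groups = {}                            # A_e -> all edges sharing that subset
    for e, a_e in membership.items():
        groups.setdefault(frozenset(a_e), []).append(e)
    subsets = [(a_e, len(edges), sorted(edges)) for a_e, edges in groups.items()]
    return dict(required), subsets


def multicover_to_edges(subsets, chosen):
    """Mapping g: turn multiplicities x_e into x_e distinct edges per subset."""
    picked = []
    for a_e, u_e, edges in subsets:
        x_e = chosen.get(a_e, 0)
        if x_e > u_e:
            raise ValueError("multiplicity exceeds the upper bound u_e")
        picked.extend(edges[:x_e])         # possible because x_e <= u_e
    return picked
```

The number of edges returned by `multicover_to_edges` equals the total multiplicity of the multicover, which is the cost equality used in the argument above.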

Set multicover as defined above has a log k-approximation algorithm [27], where k

is the number of elements to be covered. If we combine this algorithm with the above

reduction, we have a log k-approximation algorithm for DBC, where k is the number of

communities to be broken.


We present the approximation algorithm in Alg. 6 labeled CVA (Community

Vulnerability Assessment). The gain function f (e) indicates the number of unbroken

communities L(e) that the edge e belongs to. In each iteration we pick the edge with

highest gain until all the communities in C are broken. The DeletionVector D contains

the number of edges necessary to be removed for each community in order to break it.

This vector D is updated each time an edge is removed from a community. Once all the

necessary edges to break a community Ci have been removed, i.e. when Di becomes 0,

the community is broken and the gain function f (e) is updated.

Algorithm 6 CVA: An approximation algorithm for finding the critical edges
Data: Network G = (V, E), DeletionVector D, C, |C| = k
Result: A set S ⊆ E of edges
S ← ∅
B ← ∅    (set of communities broken so far)
for each edge e ∈ E do
    compute the gain f(e)
while |B| < k do
    e′ ← argmax_{e ∈ E\S} {f(e)}    (in case of a tie, choose randomly)
    S ← S ∪ {e′}
    for l = 1 to k do
        if Cl ∉ B then
            if e′ ∈ Cl then
                Dl ← Dl − 1
                if Dl ≤ 0 then
                    B ← B ∪ {Cl}
                    f(e) ← f(e) − 1 for all e ∈ Cl
return S
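A minimal Python sketch of the greedy loop of Alg. 6 follows. It assumes that each community is given as the set of its edges and that the deletion vector D is supplied as a dictionary; the function name `cva` is hypothetical. Unlike the pseudocode, ties are broken deterministically (smallest edge label) so that the output is reproducible, and the gain is recomputed in every round instead of being updated incrementally.

```python
def cva(communities, deletions_needed):
    """Greedy edge selection: repeatedly remove the edge with the highest gain.

    communities:      dict name -> set of edges
    deletions_needed: dict name -> D_i, edges still to be removed from C_i
    Returns the list S of removed edges in order of selection.
    """
    remaining = dict(deletions_needed)               # working copy of D
    unbroken = {name for name, d in remaining.items() if d > 0}
    removed = []                                     # S
    removed_set = set()
    while unbroken:
        gain = {}                                    # f(e) over unbroken communities
        for name in unbroken:
            for e in communities[name]:
                if e not in removed_set:
                    gain[e] = gain.get(e, 0) + 1
        if not gain:
            raise ValueError("some community cannot be broken")
        best = max(sorted(gain), key=gain.get)       # deterministic tie-break
        removed.append(best)
        removed_set.add(best)
        for name in list(unbroken):
            if best in communities[name]:
                remaining[name] -= 1
                if remaining[name] <= 0:
                    unbroken.discard(name)           # community is now broken
    return removed
```

With two communities that share one edge and each need a single deletion, the sketch removes only the shared edge.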

5.2 A General Framework

We now discuss the breaking community problem in the context of a general

community detection algorithm. There are a plethora of community detection algorithms

with different objective functions. Thus, we define what it means to break a community

for an arbitrary community detection algorithm as follows.


Definition 10. (Broken Community) Consider a community detection algorithm A ,

which produces a collection C of communities on graph G (written C = A (G)). Let G ′

be a new graph after removal of a set of edges, and let C ′ = A (G ′). Let γ ∈ (0, 1). A

community C ∈ C is said to be broken in graph G ′ if there does not exist a community

C ′ ∈ C ′ satisfying

(i) C′ ⊂ C, and

(ii) |C′|/|C| > γ.

The strictness threshold γ bounds the fraction of the nodes of C that any single community detected after edge removal may retain: C counts as broken only if no community found inside C keeps more than a γ fraction of its nodes. The larger this threshold, the less strict the requirement, and vice versa.
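The test of Definition 10 can be written as a short Python predicate. The sketch assumes communities are given as node sets and treats the containment condition as "subset or equal", so that a community that survives unchanged is not counted as broken; that reading, and the name `is_broken`, are assumptions made for illustration.

```python
def is_broken(community, new_communities, gamma):
    """Return True if no community detected after edge removal lies inside
    `community` while keeping more than a gamma fraction of its nodes."""
    community = set(community)
    for candidate in new_communities:
        candidate = set(candidate)
        if candidate <= community and len(candidate) / len(community) > gamma:
            return False                  # a large remnant of C survives
    return True
```

A 10-node community split into parts of sizes 4, 3 and 3 is broken for γ = 0.5 but not for γ = 0.3, which illustrates why the smaller threshold is the stricter one.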

Accordingly, the Broken Community Assessment (BCA) problem is formulated as follows:

Definition 11. (BCA) Given a network represented by a graph G = (V, E) and a specific set C of k communities, BCA seeks a minimum cardinality subset S ⊆ E such that removal of S from G breaks every community in C.

Solution for the General Case

Let ε > 0. Define a c-way ε-balanced partition of a graph on n vertices to be a partition with c components such that each component A satisfies |A| < (1 + ε)n/c [37].

Lemma 3. Partitioning a community C into at least c ε-balanced subparts, where γc ≥ 1 + ε, makes it broken.

Proof. After a balanced partitioning of C into c subparts, each part has fewer than (1 + ε)n/c vertices, where n = |C|. Now, let γc ≥ 1 + ε, and let A be a component. Then,

\[
|A| < \frac{(1+\varepsilon)\,n}{c}
    = \frac{(1+\varepsilon)\,\gamma n}{\gamma c}
    \le \frac{(1+\varepsilon)\,\gamma n}{1+\varepsilon}
    = \gamma n .
\]

Finally, any community C′ detected within C must lie in one of the components, A; so |C′| < γ|C|, and the community C is broken.

We devise Alg. 7 for solving the BCA problem based on Lemma 3. In order to find

a solution, c should satisfy the condition γc ≥ 1 + ε. We partition each community into

c-balanced components. The proposed Critical Community Fragility (CCF) algorithm

follows.

Algorithm 7 CCF: A heuristic algorithm for breaking communities
Data: Network G = (V, E), k communities C, strictness threshold γ
Result: A set S ⊆ E of edges
S ← ∅
c ← z : z is the least integer satisfying zγ ≥ 1 + ε
for each community Ci ∈ C do
    compute the c-way balanced partitioning [37]
    Cuti ← set of edges to cut Ci into c parts
    S ← S ∪ Cuti
return S

Alg. 7 first determines the number of parts c into which each of the k target communities must be partitioned in order to be broken under the general definition. Each target community is then divided into c parts by the balanced partitioning algorithm proposed in [37]. The edges that run between different parts are subsequently removed to ensure that the community is broken.
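The two bookkeeping steps of Alg. 7 can be sketched in Python: computing the least admissible number of parts c, and collecting the cut edges once a partition is known. The balanced partitioning of [37] itself is not reproduced; the sketch assumes a node-to-part assignment is supplied by some external partitioner. The names `least_parts` and `cut_edges` are hypothetical.

```python
import math


def least_parts(gamma, eps):
    """Smallest integer c with gamma * c >= 1 + eps."""
    c = math.ceil((1 + eps) / gamma)
    # Guard against floating-point rounding on either side of the bound.
    while gamma * c < 1 + eps:
        c += 1
    while c > 1 and gamma * (c - 1) >= 1 + eps:
        c -= 1
    return c


def cut_edges(edges, part_of):
    """Edges whose endpoints lie in different parts (the set Cut_i).

    edges:   iterable of (u, v) pairs inside one community
    part_of: dict node -> part index, assumed to come from a balanced partitioner
    """
    return {(u, v) for (u, v) in edges if part_of[u] != part_of[v]}
```

For ε = 0.03, the sketch gives c = 3 for γ = 0.5 and c = 4 for γ = 0.3.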


5.3 Broken Community Analysis: Constraint on Edge Removal

We have thus far devised formulations that choose the minimum number of edges required to break all k given communities. In real life, the choice of edges is often constrained by a fixed budget. In that case, it is preferable to identify those critical edges whose removal breaks as many communities as possible. For instance, when the budget is fixed, in order to limit the spread of misinformation [79, 80] in OSNs or to stop worm propagation in cellular networks, one might want to safeguard as many affected communities as possible.

In this section we investigate the broken community problem from a different angle, namely maximizing the number of broken communities within an allowed budget, i.e., deleting at most k edges, defined as follows:

Definition 12. Given a network represented by a graph G = (V, E) with m = |E| edges, a set C = {C1, C2, ..., Cl} of communities and a positive integer k ≤ m, the problem seeks a subset S ⊆ E of edges with |S| ≤ k such that the number of broken communities in C is maximized after the removal of S.

Based on the definition of broken community, we define two variants of the above problem, k-DBC and k-BCA, in the following subsections.

5.3.1 k-Density-based Broken Community

The k-Density-based Broken Community (k-DBC) problem is formulated according to the definition of broken community given in Section 5.1.1. A community is said to be broken if, as edges are removed one by one, its internal density becomes smaller than the threshold given by Equation 5–1. The number ki of edges inside community Ci, for all i ∈ {1, ..., l}, whose removal breaks that community is determined by Equation 5–2.

Introduce a binary variable xj for each edge ej ∈ E, j ∈ {1, ..., m}, whose value is 1 if ej is chosen for removal, and a binary variable zi, i ∈ {1, ..., l}, which denotes whether community Ci is broken. The IP formulation of this problem is given below:

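\[
\begin{aligned}
\text{maximize}   \quad & \sum_{i=1}^{l} z_i \\
\text{subject to} \quad & \sum_{e_j \in C_i} x_j \ge k_i z_i, && \forall C_i \in C \\
                        & \sum_{j=1}^{m} x_j \le k, \\
                        & x_j \in \{0, 1\}, && \forall j \in \{1, \dots, m\} \\
                        & z_i \in \{0, 1\}, && \forall i \in \{1, \dots, l\}
\end{aligned}
\]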

Solution of k-DBC. When ki = 1 for all i ∈ {1, ..., l}, the above integer program coincides with the Maximum Coverage problem, a well-known NP-hard problem [7]; thus, k-DBC is NP-hard. We propose an algorithm, k-CVA, for solving this problem in Alg. 8. k-CVA repeatedly removes the edge with the highest gain f(e), which measures the number of unbroken communities e belongs to, until k edges are removed.

Algorithm 8 k-CVA: A greedy algorithm for finding the critical edges
Data: Network G = (V, E), DeletionVector D, k ≤ m, set of communities C
Result: A set S ⊆ E of k edges, set B of broken communities
S ← ∅
B ← ∅
while |S| < k do
    for each edge e ∈ E do
        compute the gain f(e)
    e′ ← argmax_{e ∈ E\S} {f(e)}
    S ← S ∪ {e′}
    for each community Cl ∈ C do
        if e′ ∈ Cl then
            Dl ← Dl − 1
            if Dl ≤ 0 then
                f(e) ← f(e) − 1 for all e ∈ Cl
                B ← B ∪ {Cl}
return S, B
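The budgeted variant can be sketched in the same style. The sketch below assumes communities are given as edge sets and D as a dictionary; the name `k_cva` is hypothetical, ties are broken deterministically, and the loop stops early if no remaining edge has positive gain, a small deviation from the pseudocode chosen to avoid spending budget on useless edges.

```python
def k_cva(communities, deletions_needed, budget):
    """Greedy selection of at most `budget` edges; returns (S, B)."""
    remaining = dict(deletions_needed)
    unbroken = {name for name, d in remaining.items() if d > 0}
    removed, broken = [], []
    while len(removed) < budget:
        gain = {}                                   # f(e) over unbroken communities
        for name in unbroken:
            for e in communities[name]:
                if e not in removed:
                    gain[e] = gain.get(e, 0) + 1
        if not gain:
            break                                   # nothing left worth removing
        best = max(sorted(gain), key=gain.get)
        removed.append(best)
        for name in sorted(unbroken):
            if best in communities[name]:
                remaining[name] -= 1
                if remaining[name] <= 0:
                    unbroken.discard(name)
                    broken.append(name)
    return removed, broken
```

With a budget of one edge, the sketch picks the edge shared by the most unbroken communities, which mirrors the first step of the greedy rule for Maximum Coverage.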


5.3.2 A General Framework: k-Broken Community Assessment

The k-Broken Community Assessment (k-BCA) problem defined in the context of a

general community detection algorithm is formulated based on the following definition of

broken community (defined in Section 5.2):

Definition 13. A community C in graph G is said to be broken in graph G ′ = [G\E ′] if

there does not exist a community C ′ in G ′ satisfying ( i) C ′ ⊂ C , and ( ii) |C ′|/|C | > γ after

removal of edge set E ′ from G .

a) Solution for the General Case: We devise a greedy algorithm in Alg. 9, labeled CEL (Critical Edge Locator), for solving the k-BCA problem. As the number of edges is constrained, CEL removes those edges that are most likely to break the communities apart. To this end, we introduce a metric for edge importance inside a community based on the common neighborhood of the edge's endpoints. For an edge (u, v), the Common Neighbor Index (CNI) is the number of common neighbors of vertices u and v, i.e., CNI(u, v) = |N(u) ∩ N(v)|, where N(u) denotes the set of neighbors of u in G for all u ∈ V.
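The Common Neighbor Index is a one-line set intersection once adjacency sets are available. The following Python sketch assumes an undirected graph given as a list of edge pairs; the helper names are hypothetical.

```python
def build_adjacency(edges):
    """Adjacency sets N(u) of an undirected graph given as (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def common_neighbor_index(adjacency, u, v):
    """CNI(u, v) = |N(u) ∩ N(v)|."""
    return len(adjacency[u] & adjacency[v])
```

In a triangle with one pendant vertex attached, every triangle edge has CNI 1, while the pendant edge has CNI 0.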

The CNI indicates how important an edge is in keeping the community connected. A small CNI value implies that removing the edge facilitates the breaking of the community, since very few (if any) common neighbors exist to keep the different parts of that community connected. Hence, CEL prefers these edges over those with high CNI values. Moreover, since the goal is to break as many communities as possible, CEL not only ranks edges by CNI value but also takes into account the cut size (number of edges required for balanced partitioning) of the community an edge belongs to when calculating the final weight of each edge. CEL prioritizes edges that belong to communities requiring fewer cut edges. Additionally, to avoid choosing bridge edges that connect two different communities, CEL assigns them the highest possible weight (∞ in Alg. 9).


Algorithm 9 CEL: A heuristic for finding the critical edges
Data: Network G = (V, E), set of communities C, strictness threshold γ, k ≤ m
Result: A set S ⊆ E with |S| ≤ k, set B of broken communities
S ← ∅
c ← z : z is the least integer satisfying zγ ≥ 1 + ε
for each community Ci ∈ C do
    compute the c-way balanced partitioning [37]
    Cuti ← set of edges to cut Ci into c sub-parts
for each edge e(u, v) ∈ E do
    calculate CNI(u, v)
    find the community with the smallest cut that contains e: Cs ← argmin_{Cj : (u,v) ∈ Cj} {|Cutj|}
    w(u,v) ← CNI(u, v) + |Cuts|
    if u ∈ Ci and v ∈ Cj and i ≠ j then
        w(u,v) ← ∞
while |S| < k do
    e(u, v) ← argmin_{(u,v) ∈ E\S} {w(u,v)}
    S ← S ∪ {e}
find the set of communities B that are broken
return S, B

The greedy algorithm chooses k edges starting with those having smallest weights

and finally, it identifies the list of broken communities after the removal of selected edges

according to the definition of broken community.
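The edge weighting and selection steps of CEL can be sketched in Python as follows. Several points are assumptions made for illustration only: the cut sizes |Cut_i| are taken as input (the balanced partitioner of [37] is not reproduced), an edge is treated as a bridge when its endpoints share no community, edges with infinite weight are never selected, and the final step of re-running community detection to find the broken communities is omitted. The name `cel` is hypothetical.

```python
def cel(edges, community_of, cut_size, budget):
    """Select up to `budget` edges with the smallest weight CNI + |Cut_s|.

    edges:        list of (u, v) pairs
    community_of: dict node -> set of community names containing the node
    cut_size:     dict community name -> |Cut_i| from a balanced partition
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    infinity = float("inf")
    weight = {}
    for u, v in edges:
        shared = community_of[u] & community_of[v]
        if not shared:
            weight[(u, v)] = infinity           # bridge between communities
            continue
        cni = len(adjacency[u] & adjacency[v])  # common neighbor index
        weight[(u, v)] = cni + min(cut_size[c] for c in shared)
    ranked = sorted(edges, key=lambda e: (weight[e], e))
    return [e for e in ranked[:budget] if weight[e] != infinity]
```

On two triangles joined by a single bridge, the sketch first selects edges of the triangle with the smaller cut size and never selects the bridge.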


Our goal in this section is to: 1) evaluate the performance of our proposed algorithms CVA and k-CVA by comparing them to the optimal solutions, and 2) assess the strength of a community using the algorithms CCF and CEL.

5.4.1 Data Set

Set up: We use data sets from well-known social, collaboration and communication networks which exhibit inherent community structure in their organization. The Facebook data [70] consists of the social network interactions between Rice University graduate students and contains strongly connected components. The Arxiv Condensed Matter Physics collaboration network is obtained from the e-print database [23] and covers scientific collaborations between authors who have submitted papers to the Condensed Matter category. If an author i co-authored a paper with author j, the graph contains an undirected edge between i and j. If a paper is co-authored by k authors, this generates a completely connected (sub)graph on k nodes. We further consider the Enron email [47] communication network data set. A summary of the data sets is given in Table 5-1.
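The way a collaboration graph arises from author lists can be sketched in Python: every paper contributes a clique on its authors. The function name `coauthorship_edges` and the toy input are hypothetical and serve only to illustrate the construction described above.

```python
from itertools import combinations


def coauthorship_edges(papers):
    """Build the undirected edge set of a collaboration graph.

    papers: iterable of author lists; a paper with k authors adds a
            completely connected subgraph on those k nodes.
    """
    edges = set()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))              # one undirected edge per author pair
    return edges
```

A three-author paper contributes three edges, and a second paper sharing one author attaches to the same node.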

Table 5-1. Experimental datasets

Data Set         Node Count    Edge Count
Facebook [70]          4039         88234
Arxiv [23]            23133         93439
Enron [47]            36692        183831

5.4.2 Performance Evaluation of CVA

To test how different communities behave under the DBC formulation, we compare the result of CVA with the optimal Integer Programming (IP) solution. We choose the k largest communities by node count. For each k, the edges selected by Alg. 6 are removed from the network. The total number of edges required to break all these k communities according to the definition in Equation 5–1 is then plotted against that of the optimal solution. All results are averaged over 500 runs for consistency.

IP Formulation

We formulate the DBC problem as an IP so that we can compare its optimal solution with the performance of CVA. This IP is solved using the CPLEX package [36].

Let the variable zi represent each edge ei ∈ E:

\[
z_i =
\begin{cases}
1, & \text{if } e_i \text{ is selected for removal,} \\
0, & \text{otherwise.}
\end{cases}
\]

For each Cj ∈ C = {C1, ..., Ck}, let kj be the number of edges required to break Cj as defined in Equation 5–2. Then we have the following IP:

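\[
\begin{aligned}
\text{minimize}   \quad & \sum_{i=1}^{m} z_i \\
\text{subject to} \quad & \sum_{e_i \in C_j} z_i \ge k_j, && \forall C_j \in C, \\
                        & z_i \in \{0, 1\}, && \forall i \in \{1, \dots, m\}
\end{aligned}
\]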
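For very small instances, the optimum of this IP can be checked without a solver by exhaustive search. The Python sketch below is a swapped-in brute-force stand-in, not the CPLEX-based procedure used in the experiments; it assumes communities are given as edge sets, runs in exponential time, and uses the hypothetical name `optimal_dbc`.

```python
from itertools import combinations


def optimal_dbc(communities, deletions_needed):
    """Smallest edge set containing at least k_j edges of every community C_j.

    communities:      dict name -> set of edges
    deletions_needed: dict name -> k_j
    Returns a minimum-size set of edges, or None if the instance is infeasible.
    """
    all_edges = sorted(set().union(*communities.values()))
    for size in range(len(all_edges) + 1):          # try increasing cardinality
        for subset in combinations(all_edges, size):
            chosen = set(subset)
            if all(len(chosen & communities[name]) >= deletions_needed[name]
                   for name in communities):
                return chosen                       # first hit is optimal
    return None
```

Such a reference solution is only practical for a handful of edges, but it is a convenient way to sanity-check a greedy implementation on toy graphs.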

[Figure 5-1: three panels, A Facebook, B Arxiv, C Enron. x-axis: k broken communities (10 to 100); y-axis: edges removed; curves: Optimal, CVA.]

Figure 5-1. Density based broken community analysis for the k largest communities

Figure 5-1 depicts the number of edges that must be removed to break up to 100 communities. The performance of CVA is very close to optimal for all the data sets, apart from a negligible deviation on the Arxiv data set (Figure 5-1B). We therefore conclude that CVA closely tracks the optimal solution on these data sets.


5.4.3 Performance Evaluation for Generalized Framework

We provide a comparative analysis of the behavior of different networks under two community detection algorithms. For this purpose, we use Blondel [17] and Oslom [43]. The first is a modularity-based community detection scheme which has been shown to produce very good modular components in a timely manner [42]. The latter is based on statistical properties of the graph and allows overlapping communities. The characteristics of the communities detected in the different networks by these two algorithms are shown in Table 5-2.

Table 5-2. Network communities

Data Set    Community Count in Blondel    Community Count in Oslom
Facebook                            17                         118
Arxiv                              620                        1764
Enron                             1265                        1374

As a first approach to observe how communities behave under sustained edge removal, we target k large communities with CCF. To this end, we choose two values of the strictness threshold γ, 0.5 and 0.3, both of which satisfy γc ≥ (1 + ε). For all of the experiments, we use ε = 0.03 for the balanced partitioning. The threshold 0.5 is less strict than 0.3 in the sense that it allows more nodes to be retained after the community is broken. We also show the behavior of CCF for k randomly selected communities and the k smallest communities. For the Facebook network with the Blondel community detection algorithm, we try to break all 17 detected communities; in all other cases we take 30 communities. The plotted results are averaged over 100 runs to reduce inconsistencies as much as possible.

Figure 5-2 shows the percentage of edges removed by CCF for communities obtained through different community detection algorithms and for different values of the strictness threshold γ. In this figure, we consider the k largest communities, chosen by their number of nodes. The first column of Figure 5-2 shows that removing only a small fraction of the edges breaks the communities for γ = 0.5. On average, removing at most 20% of the edges is enough to break all 17 communities detected by Blondel in the Facebook network, as can be seen in Figure 5-2A. For Oslom, a larger share of edges is needed (a little more than 40% on average), which implies that the larger communities detected by Oslom are more strongly connected internally.

[Figure 5-2: six panels, A Facebook, γ = 0.5; B Facebook, γ = 0.3; C Arxiv, γ = 0.5; D Arxiv, γ = 0.3; E Enron, γ = 0.5; F Enron, γ = 0.3. x-axis: k communities; y-axis: % edges removed; curves: oslom, blondel, CVA.]

Figure 5-2. Edge removal count by greedy algorithm CCF for breaking k largest communities. γ = 0.5 in first column, γ = 0.3 in second column

For the Arxiv network, breaking the 30 largest communities requires on average only 4% of the edges for the Blondel community detection algorithm and 15% for Oslom, as shown in Figure 5-2C. For Arxiv with the same γ value and an equal number of communities, Oslom requires comparatively fewer edges to be removed than in the case of Facebook. This implies that members of communities in this particular Facebook network are more densely connected internally than in the Arxiv network, and as a result the Arxiv communities are easier to break with a small number of edges. The same observation applies to the Enron network, as shown in Figure 5-2E.

As we break more and more communities in Enron, Oslom requires a decreasing number of edges on average to break them. The reason is that the smaller a community is, the fewer edges are needed to break it. In all of these cases we plot the performance of CVA alongside, to visualize how communities detected by different community detection algorithms are broken compared with the density-based formulation when breaking k communities. In all of the cases, CVA requires very few edges to break all the communities. In general, communities detected by Oslom require more edge removals than any other approach. One reason for this behavior is that Oslom produces overlapping communities, and as a result more edges are required to break those communities.

The second column of Figure 5-2 depicts the behavior of the different networks for γ = 0.3. The stricter requirement imposed by this γ necessitates more edge removals, since fewer nodes may be retained if the community is to be broken. This is evident from Figure 5-2B, Figure 5-2D and Figure 5-2F. Note that the need for more edge removals holds equally for both community detection algorithms, Blondel and Oslom. The percentage of edges needed to break a single community (k = 1) almost doubles when we decrease γ from 0.5 to 0.3. Even under this stricter requirement, for the Blondel community detection algorithm as little as 7% of the edges for Arxiv and 24% for Enron are required on average to break k communities. Facebook communities, since they are strongly connected with more internal edges as was seen in earlier cases, require more (35%) edges to break all selected communities. Oslom in this case also needs more edges to be removed than Blondel for breaking the same number of communities. This is consistent with the behavior observed so far for Oslom.

[Figure 5-3: six panels, A Facebook, γ = 0.5; B Facebook, γ = 0.3; C Arxiv, γ = 0.5; D Arxiv, γ = 0.3; E Enron, γ = 0.5; F Enron, γ = 0.3. x-axis: k communities; y-axis: % edges removed; curves: oslom, blondel, CVA.]

Figure 5-3. Edge removal count by CCF for breaking k randomly selected communities. γ = 0.5 in first column, γ = 0.3 in second column

Next, we consider k randomly selected communities in Figure 5-3. It corroborates the earlier observation that breaking a community requires more edges for Oslom than for Blondel. Only a small percentage of edge removals breaks all k communities for both the Arxiv and Enron networks. Facebook communities, as in the other cases, require more edges to break all the selected communities. As the threshold γ becomes more stringent, from 0.5 to 0.3, more edges must be removed, as can be seen from the second column.

[Figure 5-4: six panels, A Facebook, γ = 0.5; B Facebook, γ = 0.3; C Arxiv, γ = 0.5; D Arxiv, γ = 0.3; E Enron, γ = 0.5; F Enron, γ = 0.3. x-axis: k communities; y-axis: % edges removed; curves: oslom, blondel, CVA.]

Figure 5-4. Edge removal count by CCF for breaking k smallest communities. γ = 0.5 in first column, γ = 0.3 in second column

For the k smallest communities we see behavior similar to that of the large communities, except that this time comparatively fewer edges need to be removed for Oslom, as can be seen in Figure 5-4. For both the Arxiv and Enron data sets, CVA requires a large number of edges to break the communities, since in small communities more edges must be removed before the internal density falls below the threshold, resulting in comparatively more edge removal. The same observation holds for randomly selected communities, as some small communities might be chosen in the random selection. Nevertheless, all of these figures point to the fact that in many cases few edges are enough to break communities.

Impact of the Location of Communities: To understand how the communities are interconnected and intra-connected, and how their relative structural position in the network affects their vulnerability, we consider three cases for each data set. We choose two communities at random under each of three criteria and determine how difficult it is to break them by removing critical edges. The first criterion chooses two communities that have no connecting edges between them, i.e., non-adjacent communities. The second criterion chooses two adjacent communities and tries to break them based only on the internal connections of each community, without taking the inter-community edges into consideration. The third criterion does the same as the second, but also takes the inter-community edges into consideration while breaking the communities. We call the first criterion ‘non adjacent community’, the second ‘adjacent community inter-edge not considered’ and the third ‘adjacent with inter-edge’. Table 5-3 shows the percentage of edges needed to break two communities under each of the three criteria. We use the Blondel community detection algorithm for this case with γ = 0.3 and run over 50 different combinations of random communities.

Table 5-3. Network characteristics

Data Set    non adjacent    adjacent community           adj. community
            community       inter-edge not considered    with inter-edge
Facebook    12%             13%                          12.4%
Arxiv       15%             11%                          11%
Enron       20%             18%                          17%

Intuitively, communities with connecting in-between edges are easier to break due to the attraction of the neighboring communities. However, Table 5-3 shows that this is not generally true. Moreover, for breaking two non-adjacent and adjacent communities, the results are quite similar to the ones we have seen in Figure 5-3 for random communities. This outcome is consistent with the behavior of CCF and supports the earlier observation that communities are in fact easy to break. We have also observed that in some cases as little as 1% edge removal is enough to break a community. To explore one of the reasons behind this more closely, we consider a small community detected by the Oslom community detection algorithm in the Enron data set, depicted in Figure 5-5. The internal structure appears modular, with parts connected through a small number of important edges. This supports our approach of partitioning each community into parts to break it. Interestingly, the critical edges selected by CCF are exactly the ones shown in pink in this figure. This shows that communities can be broken by removing a few crucial edges that hold the different parts of a community together.

Figure 5-5. A small community detected by Oslom for γ = 0.3 in Enron network. Here the internal structure shows parts are connected through small number of edges. Our greedy algorithm removes the pink cut edges.

Figure 5-6. A community detected by Oslom for γ = 0.3 in Facebook network. Here the internal structure shows parts are connected through small number of edges in pink.

We also observe similar edges in Figure 5-6 in another randomly selected community detected by Oslom in Facebook. These edges act as the connecting force in a community, and their removal results in a broken community.

[Figure 5-7, six panels. Panels A (Facebook), C (Arxiv) and E (Enron): % communities broken versus k edges (5 to 25), comparing Optimal and k-CVA. Panels B (Facebook, γ = 0.5), D (Arxiv, γ = 0.5) and F (Enron, γ = 0.5): % communities broken versus % k edges (1 to 10), comparing Blondel and Oslom.]

Figure 5-7. Broken Community Analysis. k-DBC in 1st Column, Outcome of CEL on k-BCA in 2nd Column

5.4.4 Analysis of the Edge Constrained Version

5.4.4.1 Results for k-DBC problem

We show the empirical results on real-world networks, including the Arxiv citation

network, the Facebook network and the Enron email dataset, and compare the result of k-CVA with

the optimal IP solution. As can be seen from the first column of Figure 5-7, the

greedy algorithm performs very close to the optimal one, within a small gap, on

all the data sets in terms of the percentage of communities that are broken as we

increase k from 1 to 25.
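
A comparison of this kind can be sketched on a toy instance. The code below is neither k-CVA nor the IP formulation: exhaustive search stands in for the optimal solution, and `greedy_broken` is a generic greedy stand-in. It also assumes, purely for illustration, that a community counts as broken once its surviving internal edges fall below γ times its original internal edges. All function names and the toy graph are hypothetical.

```python
from itertools import combinations
from math import ceil

def deficits(communities, edges, removed, gamma):
    """For each community, how many more internal edges must be removed to break it.

    Assumed criterion: broken when surviving internal edges < gamma * original ones.
    A deficit of 0 means the community is already broken.
    """
    removed = set(removed)
    result = []
    for members in communities:
        internal = [e for e in edges if e[0] in members and e[1] in members]
        surviving = sum(1 for e in internal if e not in removed)
        allowed = ceil(gamma * len(internal)) - 1  # largest surviving count that is still broken
        result.append(max(0, surviving - allowed))
    return result

def optimal_broken(communities, edges, k, gamma):
    """Exhaustive baseline over every set of k edges (feasible only on toy inputs)."""
    return max(deficits(communities, edges, subset, gamma).count(0)
               for subset in combinations(edges, k))

def greedy_broken(communities, edges, k, gamma):
    """Greedy stand-in: pick the edge that breaks the most communities, breaking
    ties in favour of the edge that leaves some community closest to breaking.
    Assumes k does not exceed the number of edges."""
    removed = []
    for _ in range(k):
        def score(edge):
            d = deficits(communities, edges, removed + [edge], gamma)
            open_deficits = [x for x in d if x > 0]
            closest = min(open_deficits) if open_deficits else 0
            return (d.count(0), -closest)
        candidates = [e for e in edges if e not in removed]
        removed.append(max(candidates, key=score))
    return deficits(communities, edges, removed, gamma).count(0)

# Hypothetical toy network: a 4-clique, a triangle, and one edge between them.
clique = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
triangle = [(0, 1), (0, 2), (1, 2)]
edges = clique + triangle + [(2, 3)]
communities = [{0, 1, 2}, {3, 4, 5, 6}]

for k in (2, 6):
    print(k, greedy_broken(communities, edges, k, 0.5),
          optimal_broken(communities, edges, k, 0.5))
```

The exhaustive baseline grows combinatorially with k, which is why it is usable only on very small inputs such as this one.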

5.4.4.2 Results for k-BCA problem

We vary the budget k, expressed as a percentage of the total number of edges in the network,

from 1% to 10% for γ = 0.5, as can be seen from the second column of Figure 5-7. In

the Facebook dataset (Figure 5-7B), more than 94% of the communities identified by Blondel

are broken after only 10% edge removal. On the other hand, with the same percentage of edge

removal in the Arxiv citation network (Figure 5-7D), around 60% of the communities detected

by Blondel are broken. The same trend is seen for the Enron network (Figure 5-7F).

In all cases, communities identified by Blondel are more resilient than the

communities found by Oslom. In short, removing only a small percentage (10%) of the edges

breaks most of the communities identified by Blondel and Oslom.

Overall, communities are vulnerable to edge removal.
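
The metric reported in this experiment, the percentage of communities broken under a given edge budget, can be sketched as follows. The sketch takes the removed edges as input and does not implement CEL or any edge-selection method. It assumes, for illustration only, that a community is broken when its surviving internal edges fall below γ times its original internal edges; since the member set is fixed, this is the same as comparing internal densities. The function names and the toy graph are hypothetical.

```python
def internal_edges(community, edges):
    """Edges with both endpoints inside the community."""
    members = set(community)
    return [(u, v) for u, v in edges if u in members and v in members]

def is_broken(community, edges, removed, gamma):
    """Assumed criterion: surviving internal edges < gamma * original internal edges."""
    original = internal_edges(community, edges)
    removed_set = {frozenset(e) for e in removed}  # ignore edge orientation
    surviving = [e for e in original if frozenset(e) not in removed_set]
    return len(surviving) < gamma * len(original)

def percent_broken(communities, edges, removed, gamma=0.5):
    """Percentage of communities that are broken after removing the given edges."""
    broken = sum(is_broken(c, edges, removed, gamma) for c in communities)
    return 100.0 * broken / len(communities)

# Hypothetical toy network: a triangle, a 4-clique, and one edge between them.
edges = [(0, 1), (0, 2), (1, 2),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),
         (2, 3)]
communities = [{0, 1, 2}, {3, 4, 5, 6}]
removed = [(0, 1), (1, 2)]

budget_percent = 100.0 * len(removed) / len(edges)
print(budget_percent, percent_broken(communities, edges, removed, gamma=0.5))  # prints: 20.0 50.0
```

Here the budget is 20% of the edges because the toy graph is tiny; the experiments above use budgets of 1% to 10% on much larger networks.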

5.5 Summary

We made a novel attempt to study the community vulnerability problem for

assessing the system fragility under edge removal. We formulated the density-based

broken community problem and showed its complexity. We also provided an efficient

approximation algorithm for solving this problem and proved its approximation ratio. In addition,

we proposed a heuristic, CCF, for solving the general version of the breaking community

problem. Experimental results on real-world data under this newly defined framework

gave us insight into the underlying community structure. We found

that communities in real-world networks are susceptible to edge failures and in many

cases the failure of only a small number of critical edges can break major communities

in the network.

CHAPTER 6

CONCLUSION

In this dissertation, we have made an attempt to analyze the importance of community

structure for the performance of real-world cellular networks. We have studied the impact

of device mobility on the performance of multi-hop D2D communication underlaying cellular networks.

We have introduced a novel model that considers durable communities based on the

social encounters of devices for predicting the likelihood of devices’ proximity. We have

formulated the reliable device selection problem as an IP optimization problem and

we have proposed an efficient heuristic algorithm to solve it. We have also shown that

leveraging social communities can increase the content delivery rate in multi-hop D2D.

Simulation results show that our proposed method outperforms classical social-unaware

methods significantly in terms of traffic offload. The results also show that the proposed

method achieves its objectives with manageable computational complexity, which makes

it applicable to larger networks.

We have also studied the benefit of social-aware multi-hop D2D for video content

delivery to multicast users under practical constraints imposed by the eNB for relay

selection. We have formulated a novel problem, MRS, for minimizing content delivery

time to a group of users and shown its NP-completeness. We have introduced a mixed

integer program formulation to express MRS and proposed a heuristic scheme to

efficiently solve the problem. Our proposed social-aware solution minimizes the Base

Station cost efficiently by relaying video content to a set of relay devices which, in

turn, transmit the content via multi-hop D2D to other devices with poorer channel

conditions that could not receive the content from the eNB. We further discussed a

special case of the proposed generic problem and analyzed its complexity. Moreover, we

provided an approximation algorithm for this special case with a provable performance

guarantee. Simulation results showed that our proposed methods outperformed existing

state-of-the-art methods significantly in terms of minimizing content delivery time.

We have also made a novel attempt to study the community vulnerability problem

for assessing the system fragility under edge removal. We have formulated the density-based

broken community problem and shown its complexity. We have also provided an

efficient approximation algorithm for solving this problem and proved its approximation ratio. In

addition, we have proposed a heuristic, CCF, for solving the general version of the breaking

community problem. Moreover, we have discussed a variant of each of these two

problems in which the number of removable edges is limited by a budget k and the goal is to maximize the

number of broken communities. Experimental results on real-world data under this newly

defined framework give us insight into the underlying community

structure. We have observed that communities in real-world networks are susceptible to

edge failures and in many cases the failure of only a small number of critical edges can

break major communities in the network.

REFERENCES

[1] Stanford Network Analysis Project. http://snap.stanford.edu/, 2016.

[2] 3GPP. LTE-Advanced (3GPP Release 10 and beyond) (2011).36.300.

URL http://www.3gpp.org

[3] 3GPP. “Evolved universal terrestrial radio access (E-UTRA) and evolved universal terrestrial radio access network (E-UTRAN), Rel. 11.” Technical Report 36.300 (2012).

[4] 3GPP. “General aspects and principles for interfaces supporting multimedia broadcast multicast service (MBMS) within E-UTRAN, Rel. 11.” Technical Report 36.440 (2012).

[5] 3GPP. “Feasibility study for proximity services (ProSe) (Release 12).” Technical Report 22.803 (2013).

[6] Afolabi, Richard O, Dadlani, Aresh, and Kim, Kiseon. “Multicast scheduling and resource allocation algorithms for OFDMA-based systems: A survey.” IEEE Communications Surveys & Tutorials 15 (2013).1: 240–254.

[7] Ageev, A. A. and Sviridenko, M. “Approximation algorithms for maximum coverage and max cut with given sizes of parts.” IPCO (1999): 17–30.

[8] Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. “Network flows.” DTIC Document (1988).

[9] Albert, R., Albert, I., and Nakarado, G. L. “Structural vulnerability of the North American power grid.” Phys. Rev. E 69 (2004).2.

[10] Alim, M. A., Pan, T., Thai, M. T., and Saad, W. “Leveraging Social Communities for Optimizing Cellular Device-to-Device Communications.” IEEE Transactions on Wireless Communications PP (2016).99: 1–1.

[11] Alim, Md Abdul, Kuhnle, Alan, and Thai, M. T. “Are Communities as Strong as We Think?” Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2014, 314–319.

[12] Alim, Md Abdul, Li, Xiang, Nguyen, Nam, Thai, My, and Helal, Abdelsalam. “Structural Vulnerability Assessment of Community-based Routing in Opportunistic Networks.” IEEE Transactions on Mobile Computing 15 (2016).12: 3156–3170.

[13] Alim, Md Abdul, Nguyen, Nam P., Dinh, Thang N., and Thai, My T. “Structural Vulnerability Analysis of Overlapping Communities in Complex Networks.” Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 01. WI-IAT ’14. Washington, DC, USA: IEEE Computer Society, 2014, 5–12.

URL http://dx.doi.org/10.1109/WI-IAT.2014.10

[14] Alim, Md Abdul, Pan, Tianyi, Thai, My Tra, and Saad, Walid. “Leveraging Social Communities for Optimizing Cellular Device-to-Device Communications.” arXiv preprint arXiv:1611.01582 (2016).

[15] Araniti, Giuseppe, Condoluci, Massimo, Militano, Leonardo, and Iera, Antonio. “Adaptive resource allocation to multicast services in LTE systems.” IEEE Transactions on Broadcasting 59 (2013).4: 658–664.

[16] Asadi, A., Wang, Q., and Mancuso, V. “A survey on device-to-device communication in cellular networks.” IEEE Communications Surveys & Tutorials 16 (2014).4: 1801–1819.

[17] Blondel, V. D., Guillaume, J., Lambiotte, R., and Lefebvre, E. “Fast unfolding of communities in large networks.” J. Stat. Mech.: Theory and Experiment (2008).

[18] Borgatti, Stephen P. and Everett, Martin G. “A Graph-theoretic perspective on centrality.” Social Networks 28 (2006).4: 466–484.

[19] Botsov, Mladen, Klugel, Markus, Kellerer, Wolfgang, and Fertl, Peter. “Location dependent resource allocation for mobile device-to-device communications.” 2014 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2014, 1679–1684.

[20] Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., and Wagner, D. “On modularity clustering.” IEEE Transactions on Knowledge and Data Engineering 20 (2008).2: 172–188.

[21] Chen, Xiaohang, Chen, Li, Zeng, Mengxian, Zhang, Xin, and Yang, Dacheng. “Downlink resource allocation for device-to-device communication underlaying cellular networks.” 2012 IEEE 23rd International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE, 2012, 232–237.

[22] Cho, E., Myers, S., and Leskovec, J. “Friendship and mobility: user movement in location-based social networks.” In Proc. of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (2011): 1082–1090.

[23] dataset, ArXiv. “http://www.cs.cornell.edu/projects/kddcup/datasets.html.” Proc. KDD Cup 2003. 2003.

[24] Diaz, Carlos G, Saad, Walid, Maham, Behrouz, Niyato, Dusit, and Madhukumar, AS. “Strategic device-to-device communications in backhaul-constrained wireless small cell networks.” 2014 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2014, 1661–1666.

[25] Dinh, Thang N, Nguyen, Nam P, Alim, Md Abdul, and Thai, My T. “A near-optimal adaptive algorithm for maximizing modularity in dynamic scale-free networks.” Journal of Combinatorial Optimization 30 (2015).3: 747–767.

[26] Dinh, Thang N., Xuan, Ying, Thai, My T., Pardalos, Panos M., and Znati, Taieb. “On new approaches of assessing network vulnerability: hardness and approximation.” IEEE/ACM Trans. Netw. 20 (2012).2: 609–619.

[27] Dobson, Gregory. “Worst-Case Analysis of Greedy Heuristics for Integer Programming with Nonnegative Data.” Mathematics of Operations Research 7 (1982).4: 515–531.

[28] Fodor, G., Dahlman, E., Mildh, G., Parkvall, S., Reider, N., Miklos, G., and Turanyi, Z. “Design aspects of network assisted device-to-device communications.” IEEE Comm. Mag 50(3) (2012): 170–177.

[29] Fortunato, S. “Community detection in graphs.” Physics Reports 486 (2010).3-5: 75–174.

[30] Fortunato, S. and Castellano, C. “Community Structure in Graphs.” eprint arXiv:0712.2716 (2007).

[31] Gao, Wei, Li, Qinghua, Zhao, Bo, and Cao, Guohong. “Multicasting in delay tolerant networks: a social network perspective.” Proc. tenth ACM international symposium on Mobile ad hoc networking and computing. MobiHoc ’09. New York, NY, USA: ACM, 2009, 299–308.

URL http://doi.acm.org/10.1145/1530748.1530790

[32] Han, B., Hui, P., Kumar, V. S. A., Marathe, M. V., Shao, J., and Srinivasan, A. “Mobile data offloading through opportunistic communications and social participation.” IEEE Trans. Mobile Computing 11 (2012).5: 821–834.

[33] Hasan, Mohammed, Hossain, Ekram, and Kim, Dong In. “Resource allocation under channel uncertainties for relay-aided device-to-device communication underlaying LTE-A cellular networks.” IEEE Transactions on Wireless Communications 13 (2014).4: 2322–2338.

[34] Hui, P. and Crowcroft, J. “Human mobility models and opportunistic communications system design.” Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 366 (2008).1872: 2005–2016.

[35] Hui, Pan, Crowcroft, Jon, and Yoneki, Eiko. “Bubble rap: Social-based forwarding in delay-tolerant networks.” IEEE Transactions on Mobile Computing 10 (2011).11: 1576–1589.

[36] IBM, IBM ILOG CPLEX Optimization Studio. 2014.

URL http://www-03.ibm.com/software/products/en/ibmilogcpleoptistud

[37] Karypis, George and Kumar, Vipin. “Multilevel k-way Partitioning Scheme for Irregular Graphs.” SIAM Review 2 (1998).41.

[38] Kaufman, B., Lilleberg, J., and Aazhang, B. “Spectrum sharing scheme between cellular users and ad-hoc device-to-device users.” IEEE Transactions on Wireless Communications 12 (2013).3: 1038–1049.

[39] Kim, Joongheon and Molisch, Andreas F. “Quality-aware millimeter-wave device-to-device multi-hop routing for 5G cellular networks.” 2014 IEEE International Conference on Communications (ICC). IEEE, 2014, 5251–5256.

[40] Kovacs, Istvan A., Palotai, Robin, Szalay, Mate S., and Csermely, Peter. “Community Landscapes: An Integrative Approach to Determine Overlapping Network Module Hierarchy, Identify Key Nodes and Predict Network Dynamics.” PLoS ONE 5 (2010).9: e12528.

[41] Kuhnle, Alan, Li, Xiang, and Thai, My T. “Online Algorithms for Optimal Resource Management in Dynamic D2D Communications.” Mobile Ad-hoc and Sensor Networks (MSN), 2014 10th International Conference on. IEEE, 2014, 130–137.

[42] Lancichinetti, A. and Fortunato, S. “Community detection algorithms: A comparative analysis.” Physical review. E. 80 (2009).

[43] Lancichinetti, Andrea, Radicchi, Filippo, Ramasco, José J., and Fortunato, Santo. “Finding Statistically Significant Communities in Networks.” PLoS ONE 6 (2011).4: e18961.

[44] Lee, D., Kim, S., Lee, J., and Heo, J. “Performance of multihop decode-and-forward relaying assisted device-to-device communication underlaying cellular networks.” In Proc. of International Symposium on Information Theory and its Applications (2012): 455–459.

[45] Lee, Dong Heon, Choi, Kae Won, Jeon, Wha Sook, and Jeong, Dong Geun. “Resource allocation scheme for device-to-device communication for maximizing spatial reuse.” 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, 112–117.

[46] Lee, K., Hong, S., Kim, S. J., Rhee, I., and Chong, S. “Slaw: A new mobility model for human walks.” In Proc. of IEEE International Conference on Computer Communications (2009): 855–863.

[47] Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters.” Internet Mathematics 6 (2009).1: 29–123.

[48] Li, Y., Hui, P., Jin, D., Su, L., and Zeng, L. “Evaluating the impact of social selfishness on the epidemic routing in delay tolerant networks.” IEEE Comm. Letters 14 (2010).11: 1026–1028.

[49] Lin, Xingqin, Andrews, Jeffrey G, Ghosh, Amitabha, and Ratasuk, Rapeepat. “An overview of 3GPP device-to-device proximity services.” IEEE Communications Magazine 52 (2014).4: 40–48.

[50] Lin, Y. and Hsu, Y. “Multihop cellular: A new architecture for wireless communications.” In Proc. of IEEE International Conference on Computer Communications 3 (2000): 1273–1282.

[51] Lu, Z., Wu, W, Chen, W, Zhong, J, Bi, Y, and Gao, Z. “The Maximum Community Partition Problem in Networks.” Discrete Math., Alg. and Appl. (2013).

[52] Luciano, Rodrigues, F.A., Travieso, G., and Boas, V. P. R. “Characterization of complex networks: A survey of measurements.” Advances in Physics 56 (2007).1: 167–242.

URL http://dx.doi.org/10.1080/00018730601170527

[53] Ma, X., Yin, R., Yu, G., and Zhang, Z. “A distributed relay selection method for relay assisted device-to-device communication system.” In Proc. of 23rd International Symposium on Personal Indoor and Mobile Radio Communications (2012): 1020–1024.

[54] Madan, R., Borran, J., Sampath, A., Bhushan, N., Khandekar, A., and Ji, T. “Cell association and interference coordination in heterogeneous LTE-A cellular networks.” IEEE Journal on Selected Areas in Communications 28 (2010).9: 1479–1489.

[55] Militano, Leonardo, Condoluci, Massimo, Araniti, Giuseppe, Molinaro, Antonella, Iera, Antonio, and Muntean, Gabriel-Miro. “Single frequency-based device-to-device-enhanced video delivery for evolved multimedia broadcast and multicast services.” IEEE Transactions on Broadcasting 61 (2015).2: 263–278.

[56] Min, Hyunkee, Lee, Jemin, Park, Sungsoo, and Hong, Daesik. “Capacity enhancement using an interference limited area for device-to-device uplink underlaying cellular networks.” IEEE Transactions on Wireless Communications 10 (2011).12: 3995–4000.

[57] Nguyen, N. P., Alim, Md Abdul, Shen, Y., and Thai, M. T. “Assessing network vulnerability in a community structure point of view.” Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2013, 231–235.

[58] Nguyen, Nam P, Alim, Md Abdul, Dinh, Thang N, and Thai, My T. “A method to detect communities with stability in social networks.” Social Network Analysis and Mining 4 (2014).1: 1–15.

[59] Nunes, Ivan O, de Melo, Pedro OS Vaz, and Loureiro, Antonio AF. “Leveraging D2D Multi-Hop Communication Through Social Group Meetings Awareness.” IEEE Wireless Communications Magazine (2016): 1–9.

[60] Pei, Y. and Liang, Y. “Resource allocation for device-to-device communications overlaying two-way cellular networks.” IEEE Transactions on Wireless Communications 12 (2013).7: 3611–3621.

[61] Pew Research Center, Washington, D.C. “Social Media Update 2014.” (2014).

URL http://www.pewinternet.org/2015/01/09/social-media-update-2014/

[62] Proebster, M., Kaschub, M., Werthmann, T., and Valentin, S. “Context-aware resource allocation for cellular wireless networks.” EURASIP Journal on Wireless Communications and Networking 2012 (2012).1: 1–19.

[63] Rebecchi, Filippo, Valerio, Lorenzo, Bruno, Raffaele, Conan, Vania, de Amorim, Marcelo Dias, and Passarella, Andrea. “A joint multicast/D2D learning-based approach to LTE traffic offloading.” Computer Communications 72 (2015): 26–37.

[64] Scripps, Jerry, Tan, Pang-Ning, and Esfahanian, Abdol-Hossein. “Node roles and community structure in networks.” Proc. 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. WebKDD/SNA-KDD ’07. 2007, 26–35.

[65] Semiari, Omid, Saad, Walid, Valentin, Stefan, Bennis, Mehdi, and Poor, H Vincent. “Context-Aware Small Cell Networks: How Social Metrics Improve Wireless Resource Allocation.” IEEE Transactions on Wireless Communications 14 (2015).11: 5927–5940.

[66] Sun, Yue, Wang, Tianyu, Song, Lingyang, and Han, Zhu. “Efficient resource allocation for mobile social networks in D2D communication underlaying cellular networks.” 2014 IEEE International Conference on Communications (ICC). IEEE, 2014, 2466–2471.

[67] Tan, Li, Feng, Zhiyong, Li, Wei, Jing, Zhong, and Gulliver, T Aaron. “Graph coloring based spectrum allocation for femtocell downlink interference mitigation.” In Proc. of Wireless Communications and Networking Conference (WCNC), 2011 IEEE (2011): 1248–1252.

[68] Vanganuru, K., Ferrante, S., and Sternberg, G. “System capacity and coverage of a cellular network with D2D mobile relays.” In Proc. of Military Communications Conference (2012).

[69] Vazirani, Vijay V. Approximation algorithms. Springer Science & Business Media, 2013.

[70] Viswanath, Bimal, Post, Ansley, Gummadi, Krishna P., and Mislove, Alan. “An analysis of social network-based Sybil defenses.” Proc. ACM SIGCOMM 2010 conference. SIGCOMM ’10. New York, NY, USA: ACM, 2010, 363–374.

[71] Wang, Fang, Li, Yong, Wang, Zhaocheng, and Yang, Zhixing. “Social-Community-Aware Resource Allocation for D2D Communications Underlaying Cellular Networks.” IEEE Transactions on Vehicular Technology 65 (2016).5: 3628–3640.

[72] Wang, Feiran, Xu, Chen, Song, Lingyang, and Han, Zhu. “Energy-efficient resource allocation for device-to-device underlay communication.” IEEE Transactions on Wireless Communications 14 (2015).4: 2082–2092.

[73] Wang, Feiran, Xu, Chen, Song, Lingyang, Zhao, Qun, Wang, Xiaoli, and Han, Zhu. “Energy-aware resource allocation for device-to-device underlay communication.” 2013 IEEE International Conference on Communications (ICC). IEEE, 2013, 6076–6080.

[74] Wang, L., Peng, T., Yang, Y., and Wang, W. “Interference Constrained Relay Selection of D2D Communication for Relay Purpose Underlaying Cellular Networks.” In Proc. of 8th International Conference on Wireless Communications, Networking and Mobile Computing (2012).

[75] Wang, Qin, Wang, Wei, Jin, Shi, Zhu, Hongbo, and Zhang, Nai Tong. “Game-theoretic source selection and power control for quality-optimized wireless multimedia device-to-device communications.” In Proc. of IEEE Global Communications Conference (GLOBECOM). IEEE, 2014, 4568–4573.

[76] Wang, Z. and Crowcroft, J. “Quality-of-service routing for supporting multimedia applications.” IEEE Journal on Selected Areas in Communications 14 (1996).7: 1228–1234.

[77] Xiang, Rongjing, Neville, Jennifer, and Rogati, Monica. “Modeling relationship strength in online social networks.” Proceedings of the 19th international conference on World wide web. ACM, 2010, 981–990.

[78] Xu, Shaoyi, Wang, Haiming, Chen, Tao, Huang, Qing, and Peng, Tao. “Effective interference cancellation scheme for device-to-device communication underlaying cellular networks.” Vehicular Technology Conference Fall (VTC 2010-Fall), 2010 IEEE 72nd. IEEE, 2010, 1–5.

[79] Zhang, Huiling, Alim, Md Abdul, Li, Xiang, Thai, My T, and Nguyen, Hien T. “Misinformation in Online Social Networks: Detect Them All with a Limited Budget.” ACM Transactions on Information Systems (TOIS) 34 (2016).3: 18.

[80] Zhang, Huiling, Alim, Md Abdul, Thai, My T, and Nguyen, Hien T. “Monitor placement to timely detect misinformation in online social networks.” 2015 IEEE International Conference on Communications (ICC). IEEE, 2015, 1152–1157.

[81] Zhang, Yanru, Song, Lingyang, Saad, Walid, Dawy, Zaher, and Han, Zhu. “Contract-Based Incentive Mechanisms for Device-to-Device Communications in Cellular Networks.” IEEE Journal on Selected Areas in Communications 33 (2015).10: 2144–2155.

[82] Zhu, Zhichao, Cao, Guohong, Zhu, Sencun, Ranjan, Supranamaya, and Nucci, Antonio. “A social network based patching scheme for worm containment in cellular networks.” Handbook of Optimization in Complex Networks. Springer, 2012. 505–533.

BIOGRAPHICAL SKETCH

Md Abdul Alim received the Bachelor of Science degree in Computer Science and

Engineering from Bangladesh University of Engineering and Technology, Bangladesh in

2007. He worked in a multinational company before joining the University of Florida in

2012 to pursue higher studies. He received the Ph.D. degree from the Department of

Computer and Information Science and Engineering at the University of Florida under

the supervision of Dr. My T. Thai in December 2016. His research interests include

social-aware device-to-device communication underlaying next-generation cellular

networks, network vulnerability, and community structure analysis in complex networks,

including large-scale online social, wireless, and biological networks. He also works on

influence propagation and viral marketing in online social networks, and on approximation

algorithms and their applications in combinatorial optimization.

During his Ph.D. study, Alim has published many papers in top-tier peer-reviewed

conferences and journals including IEEE/ACM Transactions. Alim is also the recipient

of many awards such as the University of Florida Graduate School Fellowship Award,

Gartner Group Info Tech Fund, Student Travel Grants of CISE and NSF.
