ANALYZING SOCIAL COMMUNITIES AND ITS IMPORTANCE ON DYNAMIC MOBILE NETWORKS
By
MD ABDUL ALIM
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2016
© 2016 Md Abdul Alim
To my family
ACKNOWLEDGMENTS
I would like to express my utmost gratitude to my supervisor Prof. My T. Thai for
her continuous support and guidance during my study and research at the University of
Florida. I have benefited a lot from her keenness in research endeavors, great skills in
writing and presentation, and her great personality and enthusiasm, which profoundly
inspired me throughout my journey. Her wisdom, support and advice have guided me
through all of my difficult moments, not only in doing research but also in my personal
life. Also, I am grateful to have excellent lab-mates who have provided extremely helpful
resources during my study.
I am thankful to Prof. Tamer Kahveci, Prof. Prabhat Mishra, Prof. Panos Pardalos
and Prof. Daisy Zhe Wang for being in my PhD committee.
Finally, I would like to thank all my family members for their relentless support
throughout my study as well as for my career.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
CHAPTER
1 SOCIAL COMMUNITIES AND ITS IMPORTANCE ON MOBILE NETWORK EFFICIENCY . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.1 Social Community and Multi-hop D2D Communication . . . . . . . 13
1.1.2 Efficient Content Transmission Through D2D Multicast Communication . . 15
1.1.3 Community Structure Vulnerability . . . . . . . . . . . . . . . . . . 17
1.1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.5 Paper Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.1 Recent Advances in Multi-hop D2D Communication . . . . . . . . 20
1.2.2 D2D and Multicasting in Cellular Network . . . . . . . . . . . . . . 21
1.2.3 Community Structure Vulnerability . . . . . . . . . . . . . . . . . . 22
2 LEVERAGING SOCIAL COMMUNITIES FOR OPTIMIZING CELLULAR DEVICE-TO-DEVICE COMMUNICATIONS . . . . . . . . . . . . . . . . . . . . . . 24
2.1 Cost-effective Relay Selection for Content Delivery in Multi-hop D2D . . . 24
2.2 System Overview and Model Representation . . . . . . . . . . . . . . . . 25
2.2.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Problem Formulation and Solution . . . . . . . . . . . . . . . . . . . . . . 31
2.3.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.2 Social Community Aware Cellular Network . . . . . . . . . . . . . 32
2.3.3 Community Structure and Durable Community . . . . . . . . . . . 34
2.3.3.1 Durable community detection . . . . . . . . . . . . . . . 36
2.3.3.2 A greedy algorithm for DCD problem . . . . . . . . . . . 37
2.4 Cost-Effective Device Selection . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.1 Relay Graph Construction . . . . . . . . . . . . . . . . . . . . . . 40
2.4.2 Weight Assignment in Gr . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.3 Social Community Aware Device Selection for Multi-hop D2D . . . 42
2.4.4 Solving the Optimization Problem . . . . . . . . . . . . . . . . . . 43
2.4.5 Exact Solution by Cutting Plane . . . . . . . . . . . . . . . . . . . 47
2.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3 TOWARDS EFFICIENT SOCIAL-AWARE CONTENT TRANSMISSION THROUGH DEVICE-TO-DEVICE MULTICAST COMMUNICATIONS . . . . . . . . . . . . . 59
3.1 D2D Enhanced Content Transmission . . . . . . . . . . . . . . . . . . . . 59
3.1.1 Problem Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.1.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.2.1 Radio network . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.2.2 Social network . . . . . . . . . . . . . . . . . . . . . . . 62
3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3 Solution for MRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4 SOCIAL-AWARE MULTICAST CONTENT TRANSMISSION: SPECIAL CASE SCENARIO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.1 CRS: Two-hop MRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.1.1 Solution Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1.2 Solutions for RSP . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1.2.1 The optimal solution to RSP . . . . . . . . . . . . . . . . 82
4.1.2.2 The greedy solution to RSP . . . . . . . . . . . . . . . . 83
4.1.3 Solution to CRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5 ROBUSTNESS OF COMMUNITY STRUCTURES: APPROXIMATION ALGORITHMS AND ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.1 Density-based Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.1.1 Network Model and Problem Definition . . . . . . . . . . . . . . . 89
5.1.2 Complexity of DBC . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1.3 Solutions to DBC . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 A General Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3 Broken Community Analysis: Constraint on Edge Removal . . . . . . . . 98
5.3.1 k-Density-based Broken Community . . . . . . . . . . . . . . . . . 98
5.3.2 A General Framework: k-Broken Community Assessment . . . . . 100
5.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.1 Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.2 Performance Evaluation of CVA . . . . . . . . . . . . . . . . . . . 102
5.4.3 Performance Evaluation for Generalized Framework . . . . . . . . 104
5.4.4 Analysis of the Edge Constrained Version . . . . . . . . . . . . . . 112
5.4.4.1 Results for k-DBC problem . . . . . . . . . . . . . . . . . 112
5.4.4.2 Results for k-BCA problem . . . . . . . . . . . . . . . . . 112
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
LIST OF TABLES
Table page
2-1 Summary of important symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2-2 Running times in seconds for DCD . . . . . . . . . . . . . . . . . . . . . . . . . 39
2-3 Comparison of running times in seconds . . . . . . . . . . . . . . . . . . . . . . 40
2-4 Main wireless network parameters . . . . . . . . . . . . . . . . . . . . . . . . . 49
3-1 CQI / MCS table for LTE-A [3] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3-2 Summary of notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3-3 Social CQI Matrix (SCM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3-4 Wireless network parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3-5 Gap analysis between ORS and ERS . . . . . . . . . . . . . . . . . . . . . . . 74
3-6 D2D pair counts in ORS and ERS . . . . . . . . . . . . . . . . . . . . . . . . . 74
3-7 Comparison of delivery times in seconds . . . . . . . . . . . . . . . . . . . . 75
5-1 Experimental datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5-2 Network communities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5-3 Network characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
LIST OF FIGURES
Figure page
2-1 D2D communication scenario before the transmission takes place . . . . . . . 26
2-2 Flow chart for the proposed solution scheme . . . . . . . . . . . . . . . . . . . 27
2-3 Content transmission success rate for different cases . . . . . . . . . . . . . . 51
2-4 Offload performance analysis for different cases . . . . . . . . . . . . . . . . . 54
2-5 Cost-effectiveness of multi-hop D2D for three different content sizes and a range of tmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2-6 Execution time of RPF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2-7 Impact of different parameters for constructing Gp on the performance of RPF . 57
2-8 The cost of the BS vs user count . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3-1 D2D-enabled multicast (a multi-hop scenario): v1 and v3 form the relay devices in the second hop; along with the purple nodes (v2, v4), they are directly served by the eNB in the first hop. v5, v6, v7 and v8 denote the relay devices in subsequent hops. . . . 60
3-2 Analysis of ERS performance vs ORS . . . . . . . . . . . . . . . . . . . . . . . 74
3-3 Delivery time for varying the eNB budget . . . . . . . . . . . . . . . . . . . . . . 76
3-4 Delivery time for varying the multicast users . . . . . . . . . . . . . . . . . . . . 76
3-5 Delivery time for varying the RB count . . . . . . . . . . . . . . . . . . . . . . . 76
3-6 Delivery time for varying the content size . . . . . . . . . . . . . . . . . . . . . 77
3-7 Delivery time and D2D pair count for varying the social tie distribution . . . . . 77
3-8 Heatmap depicting hop count for varying both budget and multicast user . . . . 78
4-1 eNB Budget (RD count) for varying user count . . . . . . . . . . . . . . . . . . 86
4-2 Content delivery time for varying user count . . . . . . . . . . . . . . . . . . . . 86
4-3 Execution time for varying user count . . . . . . . . . . . . . . . . . . . . . . . 86
5-1 Density-based broken community analysis for the k largest communities . . . 103
5-2 Edge removal count by greedy algorithm CCF for breaking the k largest communities. γ = 0.5 in the first column, γ = 0.3 in the second column . . . . 105
5-3 Edge removal count by CCF for breaking k randomly selected communities. γ = 0.5 in the first column, γ = 0.3 in the second column . . . . . . . . . . . . 107
5-4 Edge removal count by CCF for breaking the k smallest communities. γ = 0.5 in the first column, γ = 0.3 in the second column . . . . . . . . . . . . . . . . . 108
5-5 A small community detected by Oslom for γ = 0.3 in the Enron network. Here the internal structure shows that parts are connected through a small number of edges. Our greedy algorithm removes the pink cut edges. . . . . . . . . . . 110
5-6 A community detected by Oslom for γ = 0.3 in the Facebook network. Here the internal structure shows that parts are connected through a small number of edges, shown in pink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5-7 Broken community analysis. k-DBC in the 1st column, outcome of CEL on k-BCA in the 2nd column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
ANALYZING SOCIAL COMMUNITIES AND ITS IMPORTANCE ON DYNAMIC MOBILE NETWORKS
By
Md Abdul Alim
December 2016
Chair: My T. Thai
Major: Computer Engineering
Many complex systems, from the World Wide Web and online social networks to mobile
networks, exhibit community structure in which nodes can be grouped into densely
interconnected communities. This special structure has been exploited extensively
to design better solutions for many applications such as routing in wireless networks,
worm containment and interest prediction in social networks. In this dissertation, the
impact of social communities on emerging device-to-device (D2D) communication
has been analyzed, and a social-aware scheme has been introduced that takes social
encounters into account for time-sensitive content transmission. Simulation results show
that the proposed social community-aware approach yields significant performance
gain, in terms of the amount of traffic offloaded from the cellular network to the D2D tier
compared to social-unaware methods.
Recently, the trend of accessing popular video-on-demand content over cellular
networks has grown dramatically due to the widespread use of social media,
thus straining the capacity of existing wireless cellular networks. The cellular network
resource utilization can be significantly improved when requests for a particular content
are generated from a group of users located in a particular area. In such cases, a
traditional multicast scheme serves all users in a cell by limiting the data rate to the user
with the worst channel condition, which results in degraded satisfaction for users with
better channel quality. Unlike the conventional multicast scheme, which also assumes
the altruistic nature of users and does not consider the Base Station (BS) cost, a novel
framework leveraging both the D2D communication and the social relationship between
users has been introduced in this dissertation with the aim of achieving better quality of
service while delivering time-sensitive video content to multicast users. Experimental
evaluation shows that our proposed solution achieves significant enhancements in
overall performance compared to state-of-the-art solutions.
Since community-based approaches not only provide helpful information for
developing more social-aware strategies for mobile network problems but also
promise a wide range of applications enabled by social networking, analyzing and
properly understanding the behaviors and characteristics of communities is of great
advantage. Investigating how the community structure is reshaped under node and
edge removal and consequently, what footprint it leaves on the network performance
is of particular interest. Due to the high interaction within a community, it is assumed
that communities are hard to break; therefore, community-based solutions are very
robust. In this dissertation, we aim to address this important question: can communities be
broken easily in a network? To answer this question, at first, a density-based problem
formulation for analyzing the vulnerability of network communities is introduced in
terms of edge removal from the network. The NP-completeness of the problem is
proven, and an O(log k) approximation algorithm for solving the problem, where k is
the number of communities to be broken, has been introduced. Moreover, it has also
been shown that approximating the problem within a ratio better than our proposed
solution is unlikely. Additionally, the vulnerability of communities in the context
of arbitrary community detection algorithms is analyzed. The empirical results show that
communities are vulnerable to edge removal and in some cases the removal of a small
fraction of edges can break the community structure.
CHAPTER 1
SOCIAL COMMUNITIES AND ITS IMPORTANCE ON MOBILE NETWORK EFFICIENCY
1.1 Introduction
Complex networks in general exhibit the property of having community structure in
which nodes can be grouped into densely interconnected communities. Understanding
the behaviors and characteristics of communities is of great advantage. It not only
provides helpful information in developing more social-aware strategies for social
network problems but also promises a wide range of applications enabled by mobile
networking, such as routing in Delay Tolerant Networks (DTNs) [35], mode selection
and resource allocation in Device-to-Device (D2D) communication [41, 71], and worm
containment in cellular networks [82]. Furthermore, communities reveal the core
network components together with their mutual interactions, thereby representing
the entire network at a compact and more descriptive level. Understanding the
community properties can thus help design efficient solutions for such applications.
In this context, we analyze the characteristics of social communities in wireless networks
and leverage their importance in designing efficient solutions for device-to-device (D2D)
communications.
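As a toy illustration of how nodes group into densely interconnected communities, the sketch below runs plain label propagation over an adjacency list. The graph, round limit, and the function name `label_propagation` are illustrative assumptions, not the detection methods developed later in this dissertation:

```python
import random

def label_propagation(adj, rounds=20, seed=0):
    """Minimal label propagation: each node repeatedly adopts the most
    common label among its neighbors, so densely interconnected groups
    converge toward a shared label (a community)."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(rounds):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            best = max(counts, key=counts.get)
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:
            break  # no node changed label: a stable labeling was reached
    return labels
```

On a graph made of dense groups joined by few edges, each group tends to converge to one shared label; the chapters that follow use far richer community definitions (e.g., durable and density-based communities).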
1.1.1 Social Community and Multi-hop D2D Communication
The demand for wireless data services has increased exponentially in the past
decade, thus straining the capacity of existing wireless cellular networks [28] and [32].
One promising solution to meet this capacity crunch is to offload cellular traffic via the
use of direct device-to-device (D2D) communications for enabling proximity services
over the cellular licensed band [60]. To reap the benefits of D2D over cellular, there
is a need to optimize and manage the added cellular interference resulting from D2D
[54]. However, due to the high mobility of cellular devices, establishing and ensuring the
success of D2D transmission is a major challenge.
Recently, there has been increased interest in operating D2D over cellular
using multi-hop transmissions (henceforth referred to as multi-hop D2D) [16, 38, 50].
Such multi-hop D2D architectures can reduce the outage probability while potentially
increasing the capacity of D2D communication by alleviating the effect of interference
from the cellular users [33, 44, 53, 68, 74]. Unlike multi-hop ad hoc networks, which do
not use the cellular spectrum and do not require any infrastructure, multi-hop D2D is
controlled centrally by the base station (BS) for ensuring the QoS of both the cellular
and D2D users simultaneously. In cellular multi-hop D2D scenarios, one must properly
group the mobile devices in order to achieve the required quality-of-service (QoS).
Such a grouping is particularly dependent on the mobility patterns of the devices. One
major challenge in the analysis of such mobile, multi-hop D2D pertains to its strong
dependence on dynamic human behavior which must be correlated with the complex
QoS considerations of the cellular system.
Recently, it has been observed that cellular devices carried by humans exhibit
a pattern with respect to their physical encounters both in space and time [22] and
[35]. Such social encounters have been shown to exhibit a community structure
property which implies that the network can be divided into groups of nodes with dense
connections inside each group and fewer connections across groups. From a D2D
perspective, users who encounter one another frequently will be likely to form a social
community [25, 34]. Additionally, the longer a device stays close to another device, the
more their mutual interaction grows compared to other sporadic contacts.
Moreover, a large number of longer duration contacts over a period of time makes the
mutual connection more reliable for the continuity of a D2D session, which forms the
basis of durable communities. Leveraging such durable communities for improving D2D
transmission therefore constitutes an opportunity that has hitherto not been explored.
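The intuition above, that frequent and long contacts make a D2D link reliable, can be sketched as a simple filter over pairwise encounter logs. The record format, the thresholds, and the name `sustainable_edges` are hypothetical; the dissertation's durable community detection (Section 2.3.3) is considerably more involved:

```python
from collections import defaultdict

def sustainable_edges(encounters, min_total_duration, min_contacts):
    """Filter pairwise encounter logs down to 'sustainable' ties.

    encounters: iterable of (device_a, device_b, duration_seconds) records.
    An edge survives only if the pair met often enough (min_contacts) and
    for long enough in total (min_total_duration), a crude proxy for the
    reliability of a D2D link between the two devices.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for a, b, dur in encounters:
        key = tuple(sorted((a, b)))  # undirected pair
        total[key] += dur
        count[key] += 1
    return {key for key in total
            if total[key] >= min_total_duration and count[key] >= min_contacts}

# Hypothetical encounter log: u1 and u2 meet often and for long durations;
# the other pairs have only sporadic, short contacts.
log = [("u1", "u2", 300), ("u2", "u1", 400), ("u1", "u3", 20),
       ("u2", "u3", 50), ("u1", "u2", 250)]
edges = sustainable_edges(log, min_total_duration=500, min_contacts=2)
# edges == {("u1", "u2")}
```

The surviving edges would then induce the graph on which durable communities are sought.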
For establishing D2D connections, the cellular BS must provide proper incentives
to the users so that they become willing to share their resources for each other's
transmissions, which in turn incurs cost to the BS [81]. Naturally, if most users are
unwilling to participate in D2D transmission, the resources cannot be fully utilized,
and the operation of the underlaid cellular D2D links will be jeopardized. For real-time
content transmission, which must meet stringent latency requirements, high mobility
of the devices can disrupt an ongoing D2D session. This will eventually lead the D2D
transmission to fail in delivering the content within the needed time bound. In such
cases, the BS must initiate a resource-consuming cellular connection after dropping the
interrupted session, thus reducing the overall network QoS and failing to exploit the
benefits of D2D. Consequently, to enable reliable delivery of real-time content over
multi-hop D2D at minimum BS cost, it is imperative to identify a set of reliable devices.
Also, such devices must remain within the transmission range of one another during the
D2D session to maintain the QoS. In this dissertation, we show that leveraging community
structure helps find reliable devices that enable successful content transmission.
1.1.2 Efficient Content Transmission Through D2D Multicast Communication
The trend of accessing popular video-on-demand content over cellular networks has
grown dramatically due to the widespread use of social media [61], thus straining
the capacity of existing cellular networks. It has become even more challenging to
guarantee certain quality of service (QoS) in terms of content delivery time. The cellular
network resource utilization can be significantly improved when requests for a particular
content are generated from a group of users located in a particular area [6]. In such
cases, a traditional multicast scheme serves all users in a cell by limiting the data rate to
the user with the worst channel condition. Therefore, users with better channel quality
cannot take advantage of it, which results in degraded satisfaction. One promising
solution to address this issue is to use device-to-device (D2D) communications for enabling
proximity services over the cellular licensed band [16]. In such scenarios, D2D can
achieve superior data rate even with small transmit power by utilizing the better channel
quality among devices, thus enhancing the QoS. Due to the high interest for the same piece
of video content in a particular location and the performance improvement achieved by
D2D communication, the 3rd Generation Partnership Project (3GPP) defined proximity-based multicast as a service to efficiently deliver content over the cellular network, especially during crowded events [3], [4].
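The bottleneck described above is easy to see numerically. The sketch below contrasts conventional multicast, whose rate is pinned to the worst-channel user, with a two-phase relay scheme in which the eNB serves only well-placed relay devices that then forward over D2D links. All rates and the two-phase timing model are illustrative assumptions, not the scheduling model formulated in Chapter 3:

```python
def multicast_delivery_time(content_bits, user_rates):
    # Conventional multicast: the eNB must transmit at a rate decodable
    # by the worst-channel user, so everyone is served at min(user_rates).
    return content_bits / min(user_rates)

def two_tier_delivery_time(content_bits, relay_rates, d2d_rates):
    # Two-phase sketch: the eNB first serves only the relay devices
    # (at the rate of the worst relay); the relays then forward the
    # content to the remaining users over D2D links in a second phase.
    return content_bits / min(relay_rates) + content_bits / min(d2d_rates)

rates = [2e6, 8e6, 10e6, 12e6]  # per-user downlink rates in bit/s (illustrative)
t_conv = multicast_delivery_time(8e6, rates)             # pinned to the 2 Mbit/s user
t_d2d = two_tier_delivery_time(8e6, [8e6, 10e6], [5e6])  # relays, then D2D forwarding
```

Under these made-up numbers the relay scheme roughly halves the delivery time, illustrating why excluding the worst-channel users from the direct eNB transmission can pay off.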
Existing works have shown that D2D can dramatically improve the performance of
the underlaying cellular network in terms of guaranteed superior data rates, effective quality
of service, high spectrum efficiency and enhanced system capacity [6], [16]. However,
they failed to consider the following two practical aspects: (1) Base Station (BS) cost
and (2) social relationship between users. In the real-world scenario, mobile users are
reluctant to share their resources [48], which makes it challenging to choose suitable
users as relays. Several factors, such as finite energy, limited storage, valuable CPU
resources, and security and privacy considerations make them far from altruistic. BSs
must pay incentives to encourage users so that they are more willing to share their
resources, which in turn incurs cost to the BSs [81]. Unfortunately, the majority of current
works have ignored the BS cost and assumed cooperative and selfless users while
designing the D2D systems [6]. Human social relationship is a very important factor
in D2D system design, since the devices are carried by humans. Social relationships
in general exhibit the property of having community structure, in which users can be
grouped into densely interconnected communities. Users belonging to the same social
community in real life will be more interested in obtaining content from another user in
that community, and a user will be more willing to share its resources with other
socially connected users in the same real-world community [48]. The majority of recent
works on multicasting have also failed to consider this social aspect of human behavior
[6]. In this dissertation, on the contrary, we incorporate both of these aspects while
considering potential relay devices for D2D-based multicast communication.
In this dissertation, we reap the benefit of D2D by identifying a set of Relay Devices
(RDs) in different hops that will collectively incur at most a given cost to the eNB (BS in
LTE-A [3]) for efficiently relaying content. RDs in the first hop receive data directly from
the eNB and forward it to next-hop users who are socially connected to that RD and
also within close physical proximity. By leveraging the knowledge of social connections
and channel conditions among devices, the eNB decides the modulation and coding
scheme (MCS) for each hop and selects hop-wise RDs to minimize latency.
1.1.3 Community Structure Vulnerability
Understanding community properties can help assess their impact on network
vulnerability, since changes or failures occurring in one community can have a profound
impact that can consequently lead to the transformation of other communities. Due
to the high interaction within a community, we intuitively assume that communities are
hard to break; therefore, community-based solutions are very robust. Let us take a
community-aware routing protocol in DTNs as an example. In this approach, a group
or community in DTNs can be visualized as a group of frequently interacting wireless
devices with less connectivity to other groups. Devices in the same community have
higher chances to encounter each other to transfer carried messages. Therefore, the
knowledge of the community structure could help the routing protocols to wisely choose
better forwarding relays for any specific destination, and hence, could significantly
improve the chance of message delivery. These approaches have been shown to
be very efficient and are among the best methods in DTNs [31, 35]. However, the
success of the forwarding clearly depends to a great extent on the internal structure
of communities. The non-participation of only a few important devices is significant
enough to degrade the entire network's performance. Removal of certain edges can
lead to unstable behavior of the whole routing process [11]. This raises a question: are
communities really as hard to break as believed, even under intentional attacks?
In this dissertation, we first study how social-aware approaches can significantly
improve the performance of multi-hop D2D and, subsequently, we proceed to assess
community strength with respect to the removal of edges. The removal of edges can be
interpreted as failures of communication links in wireless networks or DTNs due to
energy constraints or the movement of wireless devices. Edge removal can also result
from unfriending in OSNs. More specifically, in this dissertation, we choose several
combinations of different types and sizes of communities and attempt to break them.
Clearly, if the number of edges removed is significantly less than the total number of
edges in communities, we can say that it is easy to break the communities. Otherwise,
we conclude that the communities are very strong.
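As a minimal sketch of this edge-removal experiment, the code below counts how many internal edges must be deleted before a community's edge density falls below a threshold γ. The density definition (|E_in| / C(n, 2)), the arbitrary removal order, and the function names are simplifying assumptions; the actual DBC formulation and algorithms appear in Chapter 5:

```python
def density(nodes, edges):
    """Internal edge density of a node set: |E_in| / C(|V|, 2)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / (n * (n - 1) / 2)

def edges_to_break(nodes, edges, gamma):
    """Greedy sketch: remove internal edges one at a time until the
    community's density drops below gamma, and report how many removals
    that took.  Removal order here is arbitrary; the dissertation's
    algorithms choose which edges to cut far more carefully."""
    remaining = list(edges)
    removed = 0
    while density(nodes, remaining) >= gamma:
        for i, (u, v) in enumerate(remaining):
            if u in nodes and v in nodes:  # an internal edge to delete
                remaining.pop(i)
                removed += 1
                break
        else:
            break  # no internal edge left to remove
    return removed
```

For a 4-node clique and γ = 0.5, four of the six internal edges must go before the density falls below the threshold; on real networks the interesting finding is that the required fraction is often far smaller.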
Unfortunately, identifying these critical edges is very challenging due to several
factors: 1) Communities behave very differently based on the location of edge removal.
They can either stay intact if the removed edge is less important, or be broken down
into smaller subcommunities, which can further be merged into other communities. 2)
There is no universally agreed definition of community, and there is a vast number of
community detection algorithms in the literature [29, 51, 58], which forces us to define a
general method to assess the broken communities for an arbitrary community detection
algorithm. And 3) the networks are large-scale; thus, the algorithms devised to identify
these critical edges must be scalable.
1.1.4 Contribution
The main contribution of this dissertation is to introduce a new framework that
exploits durable social communities to enable the successful transfer of content between
two devices with minimum cost using multi-hop D2D. We model the problem as
a cost-effective device selection strategy on multi-hop D2D for real-time content
delivery. We first formulate the durable community structure and introduce the
concept of sustainable and bridge edges by exploiting the historical encounters of
devices. We further propose a novel community detection method based on those
previous encounters. Subsequently, we formulate the device selection problem as an
optimization problem and we introduce an efficient method for finding the optimal set
of devices on multi-hop path leveraging those social communities. This is in contrast
to most existing works on multi-hop D2D that solely focus on system performance
[16, 38, 44, 50, 53, 68]. Simulation results show that our method outperforms classical
social-unaware methods significantly on traces generated by the state-of-the-art mobility
models.
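To give a flavor of cost-aware device selection on a relay graph, here is a plain Dijkstra sketch that finds the cheapest multi-hop D2D path between two devices. The graph encoding and the idea of collapsing all constraints into a single edge cost are illustrative assumptions; the dissertation's formulation (Chapter 2) is an optimization over durable communities with QoS constraints, not a simple shortest path:

```python
import heapq

def cheapest_relay_path(graph, src, dst):
    """Dijkstra over a relay graph whose edge weights encode the BS's
    incentive cost for using that D2D link.
    graph: {device: [(neighbor, cost), ...]}.
    Returns (total_cost, path) or (inf, []) if no D2D path exists."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:  # first pop of dst is the cheapest route to it
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []

# Hypothetical relay graph: s reaches t either directly through a (cost 6)
# or via the cheaper two-relay chain a -> b (total cost 3).
graph = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 1.0), ("t", 5.0)],
         "b": [("t", 1.0)], "t": []}
cost, path = cheapest_relay_path(graph, "s", "t")  # 3.0, ["s", "a", "b", "t"]
```

Restricting the edge set to links inside durable communities, as motivated above, would make such a path both cheap and reliable.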
In this dissertation, we also introduce another novel framework that exploits the
social relationships among devices to choose a set of RDs in different hops, along with
hop-wise MCSs, within the eNB budget, enabling successful transfer of content
from the eNB to all multicast users in minimum time. We model this problem as an
efficient relay selection strategy for multi-hop D2D for real-time content delivery. We
show its NP-completeness and formulate a mixed integer program to solve it. We also
introduce a scalable heuristic algorithm to tackle this generic version of the problem. We
further analyze a special case of this problem for delivering the content to all the users.
We propose a greedy algorithm with provable performance guarantee for this particular
case. Moreover, we experimentally show the effectiveness of our proposed methods.
In terms of identifying critical edges important for community structure, we can
summarize the contributions as follows:
• We define the framework for community structure fragility. At first, we introduce the density-based broken community (DBC) problem for breaking k communities with the minimum number of edge removals and analyze its complexity. We then provide an approximation algorithm with a theoretical performance guarantee for the DBC problem.
• To analyze the vulnerability of community structures in a broader sense, we extend the problem formulation to communities produced by an arbitrary community detection algorithm. We offer an efficient heuristic to break the communities and identify the set of critical edges.
• In order to analyze the edge-constrained version and accordingly identify the edges that are crucial for community structure, we furthermore examine the problem from the viewpoint of locating a fixed number of important edges whose removal breaks as many communities as possible.
• We conduct extensive experiments with different parameters to mine interesting observations about the behavior of broken communities after edge removal. The results show that only a small percentage of edges is enough to break the community structure; thus, the communities are not as strong as we think.
1.1.5 Paper Organization
The rest of the dissertation is organized as follows. Chapter 2 introduces the
social-aware community based approaches for efficient content delivery in D2D. Chapter
3 discusses the social-aware relay selection mechanism for multicast content delivery
whereas Chapter 4 introduces the cost-effective relay selection (CRS) procedure, which is
a special case of the multi-hop problem discussed in Chapter 3, and also introduces
efficient algorithms with provable performance guarantees to tackle this new problem.
Chapter 5 analyzes the vulnerability of community structure and discusses how its
structure changes under edge removal. We draw the conclusion in Chapter 6.
1.2 Literature Review
In this section we discuss the recent progress made in D2D communications research. First, we discuss the recent publications on multi-hop D2D and describe how our framework brings novelty to this line of research. Second, we discuss the social-aware schemes aimed at improving multicast performance in cellular networks through D2D communication. Finally, we also provide a list of recent works that have focused on the importance of the community structure and its vulnerability in particular.
1.2.1 Recent Advances in Multi-hop D2D Communication
The research community has seen a deluge of works in recent times that investigate the impact of underlaid D2D on cellular network performance [16, 19, 21, 39, 45, 49, 66, 72, 73]. Most of these papers explore the potential of D2D for reducing the outage probability of mobile devices [28], offloading mobile backhaul data traffic [24], mode selection and device discovery, and efficient spectrum management through interference coordination [54, 56, 60, 78]. Despite significant research on cellular D2D, there are very
few works which consider the cellular multi-hop D2D case. One of the earliest related
works is [53] in which the relay selection problem for cellular D2D was studied. In [74],
the authors consider D2D communication for relaying user equipment (UE) traffic while
introducing a relay selection rule based on interference constraints. The works in [44]
and [68] investigate the maximum ergodic capacity and outage probability of cooperative
relaying in relay-assisted D2D communication. The results show that multi-hop D2D
lowers the outage probability and improves cell edge throughput capacity by reducing
the effect of interference from the cellular users. However, none of these works factors
in the impact of mobility of devices on the system performance and on the successful
delivery of time sensitive contents in particular.
We formulate the device selection problem as an optimization problem and we
introduce an efficient method for finding the optimal set of devices on a multi-hop path
leveraging the social communities based on device encounters. This is in contrast
to most existing works on multi-hop D2D that solely focus on system performance
[16, 38, 44, 50, 53, 68].
Note that, unlike the more classical case of delay tolerant networks (DTNs) [12, 34],
we consider only time sensitive content transfer between source and target with certain
delay constraint on the total transmission time. This makes our simultaneous D2D
transmission fundamentally different than the DTN which is distributed in nature where
the decision to transmit a content upon a device contact is made locally. In addition,
signal interference, resource allocation, noise and fading are intrinsic design parameters
in D2D communication underlaying cellular networks which makes the design and
operation of D2D completely different from DTNs and related ideas such as ad hoc
networks.
1.2.2 D2D and Multicasting in Cellular Network
Recently, multimedia content sharing over D2D in underlaid cellular networks has
been investigated in several works, a survey on this can be found in [6]. The work
in [15] focuses on designing an adaptive resource allocation policy for the efficient
delivery of multicast services in Long Term Evolution (LTE) systems. The authors
exploited multi-user diversity by splitting the multicast group into subgroups and applied subgroup-based adaptive modulation and coding schemes. The work in [63] proposes a learning
solution based on a multi-armed bandit algorithm that dynamically selects the best
allocation of users between multicast and D2D to guarantee the timely delivery of data.
The main difference of the cited works compared to our proposal is that none of these
works factors in the impact of dynamic social behavior on the system performance and
on the successful delivery of time-sensitive multicast contents in particular. Moreover, they do not consider the BS cost and assume the altruistic nature of users, which makes them inapplicable in real-world scenarios. We exploit the social aspect as well as radio network characteristics while choosing cost-effective relays that incur less BS cost. Moreover, our multi-hop based generic approach goes beyond some recent works that leverage D2D but limit it to only two-hop communication [55]. Finally, the above-mentioned D2D-based works concentrate on pairs of directly connected content sources and content requesters, whereas our work encompasses a broader, generic case, namely the multi-hop D2D scenario for delivering the content efficiently under various practical constraints.
1.2.3 Community Structure Vulnerability
Although a lot of work has been performed on network vulnerability assessment, none of it really targets the problem from the community structure point of view by defining a quantification measure for broken communities. Nam et al. [57] deal with community structure vulnerability from the node point of view based on the Normalized Mutual Information (NMI) measure. Alim et al. [13] identify important nodes critical for the overlapping community structure. They find out how different the network communities are once nodes are removed, but do not address the core issue of whether the communities are broken or not.
The literature on community structure and its detection can be found in an excellent
survey of Fortunato et al. [29]. Assessing the vulnerability of network community
structure, however, has so far been a relatively untrodden area. In his recent work [18],
Borgatti addresses the problem of discovering key players in a network. A large body of
work has been devoted to identifying node roles within a community by a link-based
technique together with a modification of node degree [64], or by the detection of key
nodes, overlapping communities and “date” and “party” hubs [40]. However, none
of these approaches discusses whether the communities are strong enough under
sustained attack or not.
On the assessment of network vulnerability, existing studies mainly focus on
assessing the average shortest path length [9], and the global clustering coefficient
[52]. Dinh et al. [26] suggested the β-disruptor problem to find a minimum set of edges
or nodes whose removal degrades the total pairwise connectivity to a desired degree.
None of these works considers the assessment of network vulnerability from the community structure point of view.
CHAPTER 2
LEVERAGING SOCIAL COMMUNITIES FOR OPTIMIZING CELLULAR DEVICE-TO-DEVICE COMMUNICATIONS
In this chapter, we investigate the impact of social-aware community based
approaches on the performance of D2D underlaying cellular networks. We first present
the motivation for applying social-based strategies to enhance the content delivery rate in multi-hop D2D communication networks in Section 2.1 and introduce the system in
Section 2.2 while Section 2.3 provides the problem formulation. Section 2.4 discusses
reliable device selection procedure for multi-hop D2D. Simulation results are analyzed in
Section 2.5.
2.1 Cost-effective Relay Selection for Content Delivery in Multi-hop D2D
Multi-hop transmission [16, 38, 50] has gained interest in recent times for D2D
underlaying cellular networks. Such multi-hop D2D architectures can potentially
increase the capacity of D2D communication by alleviating the effect of interference
from the cellular users [44, 53, 68]. Unlike multi-hop ad hoc networks, which do not use
the cellular spectrum and do not require any infrastructure, multi-hop D2D is controlled
centrally by the BS to ensure the QoS of both the cellular and D2D users simultaneously.
One major challenge in the analysis of such mobile, multi-hop D2D pertains to its strong
dependence on dynamic human behavior which must be correlated with the complex
QoS considerations of the cellular system.
For establishing D2D connections, the cellular base station (BS) must provide
proper incentives to the users so that they become willing to share their resources for
each other's transmissions. Naturally, if most users are unwilling to participate in D2D
transmission, the resources cannot be fully utilized, and the operation of the underlaid
cellular D2D links will be jeopardized. For real-time content transmission that must meet stringent latency requirements, high mobility of the devices can disrupt an ongoing D2D session. This will eventually cause the D2D transmission to fail to deliver the
content within the needed time bound. In such cases, the BS must initiate a resource-consuming cellular connection after dropping the interrupted session, thus reducing
the overall network QoS and failing to exploit the benefits of D2D. Consequently, to
enable reliable delivery of real-time content over multi-hop D2D at minimum BS cost, it is
imperative to identify a set of reliable devices. Also, such devices must remain within the
transmission range of one another during the D2D session to maintain the QoS. Next,
we lay the foundation for identifying these cost-effective relay devices on the multi-hop
D2D underlaying cellular network.
2.2 System Overview and Model Representation
2.2.1 System Overview
Consider the downlink transmission of an OFDMA cellular network consisting
of a single base station (BS) and a set N of user equipments (UEs). The UEs are
able to communicate with one another using D2D links that are underlaid on the
cellular network. The total bandwidth B is divided into F resource blocks (RB) in the
set F . We consider a co-channel network deployment in which B is shared between
cellular and D2D transmissions while considering one RB per UE. We assume UE
i requests a content from the BS, which, in turn, selects UE j (i, j ∈ N) among other
UEs having the content, as the source of the content. The BS will enable direct D2D
connections between UE i and UE j when the distance between them is within a
desired D2D communication range dmax which, in turn, corresponds to a required
signal-to-interference-plus-noise ratio (SINR) as shown in Figure 2-1A.
In practice, setting up reliable direct D2D connections while satisfying the
quality-of-service (QoS) requirements of both the traditional cellular UEs (CUEs) as
well as the D2D UEs is challenging. On the one hand, the unreliable propagation
medium and longer distance might affect the link quality between D2D devices (Figure
2-1B). On the other hand, interference from other cellular and D2D UEs sharing the
same RB will also contribute toward lowering the SINR (Figure 2-1C). In such low SINR
Figure 2-1. D2D communication scenario before the transmission takes place. A) D2D transmission with high SINR due to distance d ≤ dmax. B) D2D transmission not possible due to low SINR as d > dmax. C) Channel interference between cellular communication (UE3 and BS) and D2D pairs (UE1, UE2) and (UE4, UE5).
cases, the use of multi-hop D2D communications can be beneficial to enhance the
overall D2D QoS.
Indeed, the effectiveness of multi-hop D2D depends on suitable device selection
mechanisms. Ideally, for the D2D to successfully sustain data transmission, the devices
that are chosen along the multi-hop D2D path must not move beyond the D2D range
during a communication session so as to maintain the desired SINR target. Designing
such mechanisms is challenging due to the coupling between mobility patterns,
Figure 2-2. Flow chart for the proposed solution scheme
incentives for sharing resources, and network QoS. In our model, we focus on selecting
a least cost reliable multi-hop path for real-time content delivery from a source to a
destination. It has been observed that mobility and physical encounter patterns are
very closely related to social structures, and very often the frequency and length of physical interactions are strongly correlated with proximity [22]. Therefore, we leverage the historical encounter patterns of devices to identify social communities that give an indication of how devices come close to each other. Thus, the goal of the proposed least-cost multi-hop
path approach is to select devices based on the social encounters and communities so
as to make sure they stay within close proximity of one another during the D2D session.
A flow chart that summarizes the implementation of the proposed scheme is
shown in Figure 2-2. Whenever a request for a content comes to the BS from a device
r , the BS identifies the source of the content. If no such device is found to hold the
content, then, the content is transmitted directly from the BS towards r using cellular
communication.
If a device s having the content is identified, the BS initiates the durable community
detection phase by invoking the DCD algorithm that is detailed in Subsection 2.3.3.1.
The BS then assigns proper edge weights to each of the D2D pairs present in its
coverage area using the social-based technique that is explained in Subsection 2.4.2.
Finally, the BS identifies the multi-hop D2D path to relay the content from s to r , if there
exists any such feasible path that can deliver the content within a certain time threshold
tmax; otherwise, the BS initiates a direct cellular connection towards r. In
the former case (when a feasible path exists), if the total incentive that the BS has to
pay to the relay devices on the multi-hop path is larger than the direct BS to r cost
which is termed as B2D cost, the BS also initiates a direct cellular connection towards
r rather than serving the content via D2D. Once the content transmission starts via
multi-hop D2D, the BS keeps track of the pairwise mobility of devices for each hop. If
the device mobility leads to a minimum allowable SINR that is below a certain threshold,
the multi-hop D2D connection can no longer be sustained. At this point, the BS has
to initiate a direct cellular connection towards r to fulfill its content request. Next, we
describe the necessary system model.
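The decision flow described above can be sketched as follows. This is an illustrative reconstruction of Figure 2-2 only: the inputs are hypothetical precomputed quantities (the identified source, the candidate path and its incentive cost, the B2D cost, and the monitored per-hop SINR values), not components of an actual implementation.

```python
def serve_request(source, path, path_cost, b2d_cost, session_sinrs, sinr_min):
    """Sketch of the BS decision flow in Figure 2-2.

    Returns the transport mode the BS ends up using: multi-hop D2D, or a
    direct cellular (B2D) connection as the fallback at every decision point.
    """
    if source is None:
        return "cellular"                 # no device holds the content
    if path is None:
        return "cellular"                 # no feasible path within t_max
    if path_cost > b2d_cost:
        return "cellular"                 # incentives exceed the direct B2D cost
    for sinr in session_sinrs:            # BS monitors each hop during the session
        if sinr < sinr_min:
            return "cellular"             # mobility broke a hop: drop and fall back
    return "d2d"

# Hypothetical run: a cheap feasible path whose SINR stays above threshold.
print(serve_request("s", ["s", "u", "r"], 2.0, 5.0, [12.0, 9.5, 11.0], 6.0))
```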
2.2.2 System Model
In our network, we consider real-time content sharing among mobile D2D users
with strict delay requirements. We assume device r requests a content of size b from
the BS at time t. The BS identifies s, another UE, as the peer device having the content
that would serve the request of r via D2D. There are several approaches to identify a
suitable source for a requested content in the literature [75]; this is not our focus in this dissertation. Hereinafter, s is referred to as the source device and r as the destination
device. However, as discussed previously, these UEs may not be able to communicate
directly due to physical constraints and hence a multi-hop path needs to be identified for
effective content transfer.
Table 2-1. Summary of important symbols

Symbol      Description
N           Set of UEs in the network
B           Bandwidth of the network
F           Set of RBs
F           Total number of RBs
dmax        Maximum D2D range
s           Source of the content
r           Destination/requester of the content
b           Content size
Ri,j        Achievable data rate between device i and j
lz          Bandwidth of resource block z ∈ F
Z           Set of devices sharing the same RB z
gi,j        Channel gain between device i and j
σ²          Variance of the Gaussian noise
pi          Transmit power of device i
γi,j        SINR at device j for the link i → j
di,j        Distance between device i and j
α           Path loss exponent
m0          Fading component
pi,j        Received power at device j for the link i → j
ti,j        Time required to transmit content from device i to j
t^p_i,j     Propagation delay between device i and j
t^x_i,j     Transmission delay from device i to j
Gr          Multi-hop D2D graph at time t
η           Speed of light
ci,j        Cost that BS pays to incentivize i to send content to j
ψ           Shadowing component
Δt          Historical encounter span
t           Actual time when content request is generated from r
Gp          Contact graph
tc          Time that would have taken to transmit content c of size b from s to r if they were within dmax
Dij         Encounter duration between device i and j
δ           Predefined stability threshold
Li(t)       Position of device i at time t
D̄ij         Average contact duration between device i and j in Δt
λi,j        Average number of encounters between i and j
G^e         Encounter history graph
ζ           Strength threshold
ρ           Predetermined weight factor
C           Set of durable communities
w^b_uv      Weight of bridge edge between u and v
w^s_uv      Weight of sustainable edge between u and v
Bu,v        Percentage of actual encounter durations larger than tc between device u and v
hC          Durability of community C
k           Number of detected communities
The achievable rate Ri,j for the transmission between a device i and a device j is

Ri,j = lz log2(1 + γi,j),   (2–1)

where lz is the bandwidth of RB z ∈ F used by i for its data transmission to j, and γi,j denotes the SINR for j from i. For the link between i and j, considering signal interference from all other devices using the same RB z, we have

γi,j = pi gi,j / ( Σ_{i′∈Z, i′≠i} pi′ gi′,j + σ² ),   (2–2)

where Z is the set of devices sharing RB z, gi,j is the channel gain between i and j, pi is the transmit power of device i, and σ² is the variance of the Gaussian noise.
Here, we note that the BS and the devices operate in a half-duplex mode and the
same set of resources (i.e. subcarriers) is shared for transmission of content. In our
model, devices on several D2D links can transmit simultaneously and hence can cause
interference with one another when using the same RB. However, devices on different
hops do not interfere with one another over the same RB. The proposed approach can
accommodate any algorithm for allocating RBs to the various D2D and cellular links.
Without loss of generality, hereinafter, we adopt graph coloring techniques such as in
[67] to perform this assignment. In our model, in line with existing D2D works [16] and
for tractability, we do not consider interference on the reverse (acknowledgment) link. We have observed in our experimental evaluation that incorporating reverse-link interference into the formulation does not significantly affect the conclusions.
Since one cannot know which D2D links will be actively relaying at every hop until we execute our proposed relay path finder (RPF) algorithm described in
Section 2.4, we assume that all the D2D links are active. This enables us to compute the
data rate of the links which is required by the RPF algorithm for choosing relays that can
deliver the content within tmax . In order to reduce the interference between the cellular
links and the D2D links, we identify the D2D links which are within close proximity to the
cellular link and we ensure that they do not reuse the same RBs. We only allow those
links which are sufficiently far apart to share the same resources. This is essentially
similar to the classical frequency reuse concept used in cellular networks, but now applied to D2D transmissions. For each D2D link (i, j), we identify the interference set for
this link. An interference set for (i , j) contains all the links whose transmitter or receiver
are within a certain distance from the transmitter i of the link (i , j) and could potentially
cause large interference. In the graph coloring based resource allocation scheme that
we use, these links are assigned different resource blocks. Links that are significantly
far away from each other are allowed to share the same RB. Once the RB allocation is complete, we utilize Equations (2–1) and (2–2) to compute the data rates for each link.
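As an illustration, the per-link SINR and achievable rate of Equations (2–1) and (2–2) can be computed as below. The link parameters, channel gains, and RB assignment are hypothetical placeholders, not values from our evaluation; the RB assignment stands in for the output of the graph-coloring allocation.

```python
import math

def sinr(i, j, tx_power, gain, rb_of, noise_var):
    """SINR at receiver j for link i -> j, Equation (2-2).

    Interference comes from every other transmitter sharing i's RB.
    """
    interference = sum(
        tx_power[k] * gain[(k, j)]
        for k in tx_power
        if k != i and rb_of[k] == rb_of[i]
    )
    return tx_power[i] * gain[(i, j)] / (interference + noise_var)

def rate(i, j, tx_power, gain, rb_of, rb_bw, noise_var):
    """Achievable data rate of link i -> j, Equation (2-1)."""
    g = sinr(i, j, tx_power, gain, rb_of, noise_var)
    return rb_bw[rb_of[i]] * math.log2(1.0 + g)

# Hypothetical three-device example: devices 1 and 3 share RB 0.
tx_power = {1: 0.1, 2: 0.1, 3: 0.1}        # watts
rb_of = {1: 0, 2: 1, 3: 0}                 # RB assignment (graph-coloring output)
gain = {(1, 2): 1e-6, (3, 2): 1e-8}        # channel gains toward receiver 2
rb_bw = {0: 180e3, 1: 180e3}               # Hz per RB (LTE-like, illustrative)
noise = 1e-13

print(rate(1, 2, tx_power, gain, rb_of, rb_bw, noise))  # bits/s on link 1 -> 2
```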
For the wireless network, we consider distance-dependent path loss and multipath Rayleigh fading along with log-normal shadowing. Thus, the received power of each link between devices i and j can be described as pi,j = pi · (di,j)^−α · |m0|² · 10^(ψ/10), where pi is the transmit power of device i, α is the path loss exponent, m0 is the fading component, and ψ is the log-normal shadowing component.
Given this SINR model, we now formulate the time required for the transmission between devices i and j. The time ti,j on the link (hop) from i to j is defined as

ti,j = t^p_i,j + t^x_i,j = di,j / η + b / Ri,j,   (2–3)

where t^p_i,j is the propagation delay between device i and device j, which, in turn, depends on the distance di,j of the single-hop link between i and j and the speed of light η. The transmission delay, t^x_i,j, depends on the packet size b and on the achievable data rate for the transmission between i and j as per (2–1).
To incentivize a certain device i for sharing its resources with another device j ,
the BS must incur a cost ci ,j . A device that experiences a good channel and that has a
higher transmit power will be able to transfer content more efficiently than others, and
hence is a better candidate for D2D from the BS’s perspective. Accordingly, we have,
ci,j = pi,j = pi · (di,j)^−α · |m0|² · 10^(ψ/10).   (2–4)
This incentive/cost can be in the form of monetary remunerations, coupons, or
free data. We summarize most of the important notations used throughout this chapter
in Table 2-1. Next, we define the necessary framework for formulating the problem of
identifying reliable devices on multi-hop D2D.
2.3 Problem Formulation and Solution
2.3.1 Problem Formulation
Given this wireless network model, the next goal is to find a set of devices that
would enable feasible multi-hop D2D communications while satisfying stringent delay
constraints and minimizing the BS's cost, as per (2–4). We introduce the concept of a feasible path formally as follows:
Definition 1. (Feasible Path) Given a cellular network G = (V, E), where V is the set of all devices and E is the set of links that connects them, a feasible path from source s to destination r in G is an ordering P of devices in V, P = <i1, ..., ik>, such that i1 = s, ik = r, (ij, ij+1) ∈ E and, given the interference and mobility of devices, Σ_{j=1}^{k−1} t_{ij,ij+1} ≤ tmax, where ti,j and tmax indicate the time required to transfer a content from i to j and the maximum allowed content sharing time, respectively.
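A minimal check of Definition 1 can be sketched as follows; the link set and per-hop times are hypothetical inputs (the per-hop times would come from Equation (2–3)).

```python
def is_feasible(path, links, hop_time, t_max):
    """Definition 1: every consecutive pair must be a link of G, and the
    summed per-hop times along the path must not exceed t_max."""
    hops = list(zip(path, path[1:]))
    if any(h not in links for h in hops):
        return False
    return sum(hop_time[h] for h in hops) <= t_max

# Hypothetical graph: s can reach r directly or via relay u.
links = {("s", "u"), ("u", "r"), ("s", "r")}
hop_time = {("s", "u"): 2.0, ("u", "r"): 3.0, ("s", "r"): 9.0}
print(is_feasible(["s", "u", "r"], links, hop_time, t_max=6.0))  # True: 5.0 <= 6.0
```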
For successful delivery of a content using multi-hop D2D, the devices on a feasible
path must also remain within a range that corresponds to the desired SINR throughout
the D2D session. To combine these properties, we now present the cost-effective device
selection problem for multi-hop D2D (CEDS-MD):
Problem 1. (CEDS-MD) Cost-effective device selection for multi-hop D2D (CEDS-MD) seeks to identify a feasible path P that results in the minimum cost of transmission from source s to destination r by minimizing the device cost C(P) = Σ_{(i,j)∈P} ci,j, where ci,j is the cost to the BS of incentivizing device i to share resources with device j, i is the immediate predecessor of j on the feasible path, and the devices on P remain within the D2D transmission range throughout tmax as governed by the cellular base station.
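For intuition, CEDS-MD on a small instance can be solved by exhaustive search over simple paths, pruning paths that exceed the deadline or the best cost found so far. This brute-force sketch is for illustration only; it is not the RPF algorithm of Section 2.4, and the costs and hop times are hypothetical.

```python
def min_cost_feasible_path(nodes, cost, hop_time, s, r, t_max):
    """Least-cost feasible path by exhaustive depth-first search (illustration)."""
    best = (float("inf"), None)

    def dfs(u, visited, t, c, path):
        nonlocal best
        if t > t_max or c >= best[0]:     # deadline violated or already worse
            return
        if u == r:
            best = (c, path[:])           # new cheapest feasible path
            return
        for v in nodes:
            if v not in visited and (u, v) in cost:
                visited.add(v)
                path.append(v)
                dfs(v, visited, t + hop_time[(u, v)], c + cost[(u, v)], path)
                path.pop()
                visited.remove(v)

    dfs(s, {s}, 0.0, 0.0, [s])
    return best  # (total BS cost, path), or (inf, None) if no feasible path

cost = {("s", "u"): 1.0, ("u", "r"): 1.0, ("s", "r"): 5.0}
hop_time = {("s", "u"): 2.0, ("u", "r"): 2.0, ("s", "r"): 1.0}
c, p = min_cost_feasible_path({"s", "u", "r"}, cost, hop_time, "s", "r", t_max=6.0)
# relaying via u costs 2.0 versus 5.0 for the direct link
```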
2.3.2 Social Community Aware Cellular Network
Incorporating social based device proximity information with conventional physical
layer metrics enables better resource utilization and enhanced traffic offload in D2D
[62]. However, these measures are not able to capture the impact of user mobility on the
successful completion of D2D transmission particularly when devices are moving rapidly
during the transmission. Consequently, there is a need to adopt a more realistic view for
the social context by basing it on other social dimensions such as the actual encounters
between users. Device encounters have been shown to satisfy the community structure
property [22]; thus, the stability of a D2D session must be correlated with durable social communities.
Therefore, as a first step towards solving CEDS-MD, we must identify durable social
communities based on the previous encounter histories. When two devices i and j
are within the transmission range dmax of each other, they can communicate in D2D
mode under the control of the BS. According to the 3GPP Release 12 [5], for proximity
services, each mobile device not only updates its location with the BS on a periodic
basis, but it also reports the presence of other devices within close distance, both in time
and space, who have already subscribed for the proximity-based services. The BS then
saves the corresponding device identities as well as start and end time of the contact.
Assuming a content request is generated at a given time t during a day, the BS extracts
all the specific historical encounters that start around t in order to realistically predict the
mobility pattern of the devices. To this end, the BS constructs a physical contact graph
Gp which is a weighted undirected graph and detects the durable communities. Devices
belonging to the same community are more likely to have longer contact duration and,
hence, they will get more priority to be chosen on the multi-hop D2D if they happen to be
within each other's proximity at content request time t.
In Gp, each edge represents the average duration of contact between two devices over a certain span Δt of previous days. Δt can be any number of previous days (or hours), depending on how the encounter histories are preserved in the BS. If tc is the time required for the content to be transmitted from s to r when they are within the range dmax, the BS will need to consider those previous encounters in Δt that have an average duration of at least tc. Although encounters with a duration of at least tc are good candidates for reliable connections, the longer the duration, the more durable the link. To put the duration length into perspective, we take into account not only the encounters having a duration of sufficient length (tc) but also all the previous encounters with duration Dij ≥ (1 + δ)tc, where the stability threshold δ ≥ 0 is a user-controlled parameter that reflects the importance of encounter durations beyond tc. At the same time, we also emphasize the impact of the encounter rates of two devices in Δt paired with the duration. Next, we formally define the notions related to encounters.
2.3.3 Community Structure and Durable Community
Now, we introduce the necessary terms to describe encounters in the context of
D2D and formally define the notion of a durable community structure in this subsection.
Assume that i and j come into communication range at time te, that is, ||Li(te−) − Lj(te−)|| > dmax and ||Li(te) − Lj(te)|| ≤ dmax, where te− denotes the time just before te, Li(t) the position of user i at time t, dmax the D2D transmission range as determined by the BS, and ||·|| the distance measure. With this, we can define the D2D contact duration:

Definition 2. The D2D contact duration between users i and j is defined as the time during which they are in contact before moving out of range, that is, Dij = t − te with t = min{t′ : ||Li(t′) − Lj(t′)|| > dmax, t′ > te}, where t and te are on the continuous time scale.
Consider a series of q contact durations Dij = (D¹ij, ..., D^q_ij) between nodes i and j in time frame Δt; then, we can make the following definition:

Definition 3. The average contact duration, denoted by D̄ij = (Σ_{k=1}^{q} D^k_ij) / q, is the expected time during which two devices stay within dmax before they move apart again after coming into proximity to one another.
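Definitions 2 and 3 can be computed from a pair's mobility trace as sketched below; the sampled (time, distance) events are hypothetical data, not from our traces.

```python
def contact_durations(events, d_max):
    """Extract D2D contact durations (Definition 2) from a time-ordered list of
    (time, distance) samples for one device pair."""
    durations, t_enter = [], None
    for t, dist in events:
        if dist <= d_max and t_enter is None:
            t_enter = t                      # pair comes into range at te
        elif dist > d_max and t_enter is not None:
            durations.append(t - t_enter)    # D_ij = t - te
            t_enter = None
    return durations

def average_contact_duration(durations):
    """Average contact duration of Definition 3: mean of the q observed durations."""
    return sum(durations) / len(durations)

# Hypothetical trace: distance (metres) between a pair sampled over time.
events = [(0, 120), (10, 40), (25, 80), (30, 45), (50, 90)]
d = contact_durations(events, d_max=50)
# two contacts: 25 - 10 = 15 and 50 - 30 = 20, so the average is 17.5
print(average_contact_duration(d))
```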
Next, let G^e = (V, E^e, T) be an undirected graph representing the physical encounters of |V| mobile devices. E^e is the set of undirected relationships (in this case, encounters). Each edge E^e_i has an associated collection of two-dimensional vectors denoted by Ti = (Ti1, Ti2, ...). Each element of Ti denotes a contact time and the corresponding duration within the Δt time span, i.e., Tij = <tuv, Duv> for all the j encounters between devices u and v in Δt.
Contact Graph: The request for a content is generated at time t, and tc is the time required to transmit the content from s to r if they are within range dmax. We construct an undirected and weighted contact graph Gp = (Vp, Ep, Wp), where |Vp| = n and |Ep| = m. In doing so, we consider only those encounters in G^e whose average contact duration D̄ij is sufficiently long to cater for tc starting at t, i.e., D̄ij ≥ (1 + δ)tc, where δ ≥ 0 is the predefined stability threshold. wuv ∈ Wp is the weight function on each edge (u, v) ∈ Ep, where u, v ∈ Vp.
Weight Assignment in Gp: Encounters having average contact durations larger than tc are very good candidates for sustainable D2D transmission. However, considering only the average duration might result in choosing some encounters with a large number of durations shorter than tc, which would negatively impact the reliable device selection for multi-hop D2D. To account for this in assigning Wp, we give more weight to edges whose encounters have actual durations larger than tc. To this end, we define Buv, 0 ≤ Buv ≤ 1, which denotes the percentage of times the encounter duration was actually larger than tc. Accordingly, we define the weight wuv = ρ·Buv·λuv + (1 − ρ)·D̄uv, where ρ, 0 ≤ ρ ≤ 1, is a predefined weight factor that signifies how much emphasis should be put on the average encounter duration with respect to the percentage of times the encounter duration was actually larger than tc, as denoted by Buv. To account for the encounter rate, we multiply Buv by the factor λuv so that the impact of frequent long-duration contacts is also captured in the edge weight. λuv denotes the average number of encounters between u and v over the time period Δt.
Next, we define a durable community structure that groups together devices having similar contact durations. Such a structure has special properties related to bridge and sustainable edges. In fact, an edge (u, v) in Gp is said to be a bridge edge if it has a small percentage of successful contact durations Buv, which is reflected by wuv < ζ, where ζ is the predefined strength threshold. A sustainable edge (u, v) is defined to have a large percentage of successful contact durations Buv, which is reflected by wuv ≥ ζ. We denote the weights of sustainable and bridge edges as w^s_uv and w^b_uv, respectively. We leverage these edge weights in deciding the relay devices, as described in Subsection 2.4.2.
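The edge weight and the bridge/sustainable classification translate directly into code; the pair statistics and thresholds below are hypothetical values chosen only to illustrate the formula.

```python
def edge_weight(b_uv, lam_uv, avg_dur, rho):
    """w_uv = rho * B_uv * lambda_uv + (1 - rho) * average contact duration."""
    return rho * b_uv * lam_uv + (1 - rho) * avg_dur

def classify(w_uv, zeta):
    """An edge is sustainable if w_uv >= zeta, otherwise it is a bridge edge."""
    return "sustainable" if w_uv >= zeta else "bridge"

# Hypothetical pair: 80% of encounters exceeded tc, 5 encounters on average
# over Delta-t, 30 s average contact duration, rho = 0.5, zeta = 10.
w = edge_weight(b_uv=0.8, lam_uv=5.0, avg_dur=30.0, rho=0.5)
print(w, classify(w, zeta=10.0))  # 17.0 sustainable
```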
Consequently, a durable community structure, denoted by C = {C1, C2, ..., Ck}, is a collection of k subsets of V satisfying ∪_{i=1}^{k} Ci = V. We say that a collection of nodes Ci ∈ C and its induced subgraph form a durable community in Gp if nodes inside Ci are connected primarily through sustainable edges, while nodes across communities Ci and Cj, if connected, have bridge edges. Next, we propose an approach to detect durable communities in Gp.
2.3.3.1 Durable community detection
For a node u ∈ Vp, let Au be the set of neighbors adjacent to u. Moreover, let wu be the total weight corresponding to this set. For any C ⊆ Vp, let C^in and C^out be, respectively, the set of links having both endpoints in C and the set of links heading out of C. Additionally, let wC = Σ_{(u,v)∈C^in} wuv, w^out_C = Σ_{(u,v)∈C^out} wuv, and w⁺_C = wC + w^out_C.

Given the contact graph Gp, we seek a community structure C = {C1, C2, ..., Ck} that strives to group sustainable edges inside communities and place bridge edges across communities. Intuitively, any grouping that maximizes the ratio of sustainable edges to bridge edges inside a community achieves our objective. Thus, we define the durability of a community C as hC = wC / w⁺_C, and we formulate the following Durable Community Detection (DCD) optimization problem:

maximize   R = Σ_{C∈C} hC = Σ_{C∈C} wC / w⁺_C

s.t.   Ci ∩ Cj = ∅   for all i ≠ j, i, j ∈ {1, 2, ..., k},
       ∪_{i=1}^{k} Ci = Vp.

In this formulation, the number of communities k is determined by optimizing the objective function R and is not an input parameter. Next, we show the following properties of network communities identified by optimizing our suggested metric R: (i) links within a community have a high durability contribution and (ii) links connecting communities have a low durability contribution.
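The durability hC and the DCD objective R can be evaluated as below; the weighted contact graph is a hypothetical toy instance (two dense groups joined by one weak bridge edge).

```python
def durability(community, graph):
    """h_C = w_C / w+_C, where w_C sums the weights of edges inside the
    community and w+_C adds the weights of edges leaving it."""
    nodes = set(community)
    w_in = sum(w for (u, v), w in graph.items() if u in nodes and v in nodes)
    w_out = sum(w for (u, v), w in graph.items() if (u in nodes) != (v in nodes))
    return w_in / (w_in + w_out)

def objective_R(partition, graph):
    """DCD objective: sum of durabilities over all communities."""
    return sum(durability(c, graph) for c in partition)

# Hypothetical weighted contact graph: {a,b,c} and {d,e} joined by edge (c,d).
graph = {("a", "b"): 9.0, ("b", "c"): 8.0, ("c", "d"): 1.0, ("d", "e"): 9.0}
print(objective_R([{"a", "b", "c"}, {"d", "e"}], graph))
```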
Proposition 2.1. Let C = {C1, C2, ..., Ck} be a community structure detected by
optimizing R. Links within each Ci have a strong durability contribution, while those
connecting communities have a weak durability contribution.
Proof. For any node u ∈ Vp and subset S ⊆ Vp, let w_{u,S} be the total weight of all links
between u and S. By this definition, we obtain w_u = w_{u,S} + w_{u, V_p \setminus S}.
Consider a community C ∈ C, u ∈ C and v ∉ C. Since v is not a member of C, we have

\frac{w_C}{w_C^+} > \frac{w_C + w_{v,C}}{w_C^+ + w_v} = \frac{w_C + w_{v,C}}{w_C^+ + w_{v,C} + w_{v, V_p \setminus C}},

because otherwise adding v to C would give a better value of R. This inequality results in

\frac{w_{v,C}}{w_v} < \frac{w_C}{w_C^+},

which, in turn, implies that the links joining v to C are insignificant in terms of durability
contribution with respect to the total weight of C as a whole.
Similarly, for any node u ∈ C, we have

\frac{w_C}{w_C^+} > \frac{w_C - w_{u,C}}{w_C^+ - w_u} = \frac{w_C - w_{u,C}}{w_C^+ - w_{u,C} - w_{u, V_p \setminus C}},

because otherwise excluding u from C would give a better value of R. This inequality
simplifies to

\frac{w_{u,C}}{w_u} > \frac{w_C}{w_C^+},

which shows that the links joining u to C are of significant weight, having a larger
durability contribution in comparison to the total internal weight of C.
2.3.3.2 A greedy algorithm for DCD problem
Solving the DCD problem is NP-hard, as shown by a reduction similar to that for
modularity in [20]. Consequently, a heuristic approach that can provide a good solution
in a timely manner is more desirable. In this regard, we propose a greedy algorithm for
the DCD problem consisting of three phases, shown in Alg. 1.
The first phase, referred to as the development phase, identifies raw communities
in the input network. Initially, all nodes are unassigned and do not belong to any
community. Next, a random node is selected as the first member of a new community C ,
and consequently, new members who help to maximize C ’s durability, hC , are gradually
added into C. When no remaining node can improve the objective of the current
community, another new community is formed and the whole process then continues in
the same manner on this newly formed community.
Next, the augmentation phase rearranges nodes into more appropriate communities.
In the first phase, new members are added to a community C in random order.
Therefore, C's objective value hC can be further improved if members that reduce the
total durability are excluded; such nodes then form singleton communities. This step
requires the re-evaluation of all of C's members. The removal of such nodes creates
more cohesive communities with higher internal connectedness.
In the last phase, the refinement phase, the global stability of the whole network is
re-estimated. This phase considers merging two adjacent communities in order to
improve the overall objective function. If two communities have a large number of mutual
connections between them, it is more durable to combine them into one community.
The run-time complexity of the development and augmentation phases is O(nm).
Moreover, even though the refinement phase might take O(n^3 m) time in the worst
case, we have found that the DCD algorithm computes the durable communities
within milliseconds even for networks containing hundreds of nodes, as reported in
Table 2-2. Since the optimal solution takes exponential time for larger instances of the
network, we use smaller values of n to obtain optimal results for comparison with the
running time of DCD. We formulated the DCD problem as an integer program with
quadratic constraints and objective function and solved it using CPLEX [36] to obtain
the optimal solution. The run-time comparison is reported in Table 2-3. Clearly, the
running time of the optimal algorithm increases exponentially as the number of devices
in the network increases, whereas DCD takes only a small amount of time in all of those
cases, which makes DCD suitable for real-time relay selection.

Algorithm 1 DCD algorithm
Data: Network Gp = (Vp, Ep, Wp)
Result: Durable community structure C
Phase I: Development Phase.
  Initialize C ← ∅, Q ← Vp
  while ∃ unassigned node x ∈ Q do
    C ← {x}; Q ← Q \ {x}
    while ∃ y ∈ Q such that h_{C∪{y}} > h_C do
      y ← argmax_{y∈Q} h_{C∪{y}}
      C ← C ∪ {y}; Q ← Q \ {y}
    C ← C ∪ {C}
Phase II: Augmentation Phase.
  for C ∈ C do
    while ∃ x ∈ C such that h_{C\{x}} > h_C do
      C ← C \ {x}; C ← C ∪ {{x}}
Phase III: Refinement Phase.
  while ∃ C1, C2 ∈ C such that h_{C1∪C2} > h_{C1} + h_{C2} do
    (C1, C2) ← argmax_{C1,C2∈C} {h_{C1∪C2} − h_{C1} − h_{C2}}
    C ← (C \ {C1, C2}) ∪ {C1 ∪ C2}
  Return C
Table 2-2. Running times in seconds for DCD

  User count (n)    20      50      80      110     140     170
  DCD               0.006   0.022   0.05    0.018   0.27    0.84

Table 2-3. Comparison of running times in seconds

  User count (n)    10      15      20      25      30
  DCD               0.006   0.005   0.006   0.005   0.009
  Optimal           1.68    6.31    422.73  1465    2970
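The three phases of Alg. 1 can be sketched compactly as follows. This is a simplified illustration, not the evaluated implementation; the symmetric weighted-adjacency representation and the arbitrary (rather than random) choice of the seed node are assumptions:

```python
def h(C, wadj):
    """Durability h_C = w_C / (w_C + w_C^out); wadj must be symmetric."""
    win = wout = 0.0
    for u in C:
        for v, wt in wadj[u].items():
            if v in C:
                win += wt / 2.0   # each internal edge is seen from both endpoints
            else:
                wout += wt
    return win / (win + wout) if win + wout > 0 else 0.0

def dcd(wadj):
    unassigned = set(wadj)
    comms = []
    # Phase I (development): grow communities greedily by durability gain.
    while unassigned:
        C = {unassigned.pop()}
        while True:
            cand = max(unassigned, key=lambda y: h(C | {y}, wadj), default=None)
            if cand is None or h(C | {cand}, wadj) <= h(C, wadj):
                break
            C.add(cand)
            unassigned.remove(cand)
        comms.append(C)
    # Phase II (augmentation): expel members that reduce durability; each
    # expelled node forms a singleton community.
    for C in comms:
        for x in list(C):
            if len(C) > 1 and h(C - {x}, wadj) > h(C, wadj):
                C.remove(x)
                comms.append({x})
    # Phase III (refinement): merge adjacent communities when beneficial.
    merged = True
    while merged:
        merged = False
        for i in range(len(comms)):
            for j in range(i + 1, len(comms)):
                if h(comms[i] | comms[j], wadj) > h(comms[i], wadj) + h(comms[j], wadj):
                    comms[i] = comms[i] | comms[j]
                    del comms[j]
                    merged = True
                    break
            if merged:
                break
    return comms
```

On a contact graph with two dense cliques joined by a weak bridge edge, the sketch recovers the two cliques as durable communities regardless of the starting node.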
2.4 Cost-Effective Device Selection
Once a content request is generated at time t, the BS initiates a centralized process
that encompasses two tasks. First, it constructs Gp and finds the durable communities
as described in the previous section. Second, the BS selects a set of devices to solve
the CEDS-MD problem defined in Problem 1. To ensure a high likelihood of successful
content delivery through D2D, the BS incorporates the social-encounter-based
community information as described subsequently.
2.4.1 Relay Graph Construction
The BS initiates the second step for device selection by constructing a multi-hop
D2D graph Gr = (Vr, Er, Wc, We), where Vr is the set of devices present at time t.
Wc denotes the BS cost: for any (i, j) ∈ Er, c_{i,j} indicates how much incentive the BS
has to spend in order to make device i agree to share its resources with device j for
relay purposes, as defined in (2–4). We put an edge between two devices i and j if and
only if the distance between them is within the D2D communication range, that is, the
SINR from i to j is above a certain threshold determined by the BS. The BS itself is also
part of the graph, represented as a vertex. The edge connecting the BS and each device
has a cost that reflects the physical channel condition between them. Since a
transmitting device in a D2D pair with a better channel condition is preferred from the
BS's point of view, the BS will pay a higher incentive for it, and thus it incurs more cost
to the BS, which is captured in equation (2–4). In contrast, for a direct BS-to-device
connection, a receiving device with a better channel condition to the BS requires fewer
physical resource blocks for the transmission, which results in a smaller B2D cost. The
BS has to use a relatively large number of resource blocks to transmit the content
within tmax to a device that is far away from it, i.e., a device experiencing a poor channel
condition at the BS. Consequently, the cost for the BS to reach that device, termed the
B2D cost, will naturally be higher than for a device with a better channel condition. In
summary, the B2D cost is defined to be inversely proportional to the radio channel
condition from the BS to that device:

c_{BS,j} = \frac{K}{p_{BS,j}} = K \times \{p_{BS} \cdot (d_{BS,j})^{-\alpha} \cdot |m_0|^2 \cdot 10^{\psi/10}\}^{-1}. \quad (2–5)
A device located closer to the BS essentially experiences a better channel condition
and incurs a smaller B2D cost to receive the content. The inverse of the numerical
value of the received signal at device j from the BS, denoted by p_{BS,j}, is a large number;
the constant K < 1 is thus required to normalize the cost so that the B2D cost is on the
same scale as the multi-hop D2D cost. To account for the mobility of the devices on
the multi-hop path, i.e., to increase the likelihood of successful content delivery, we rely
on the identified durable communities for the assignment of the edge weights We,
described below.
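As a quick illustration of (2–5), the following sketch computes the B2D cost. The numeric values passed in (K, fading magnitude |m0|^2, shadowing ψ) are placeholders for this example, not the parameters used in our evaluation:

```python
def b2d_cost(p_bs, d_bs_j, alpha, m0_sq, psi_db, K=1e-6):
    """c_BS,j = K / p_BS,j, with p_BS,j = p_BS * d^(-alpha) * |m0|^2 * 10^(psi/10)."""
    p_received = p_bs * (d_bs_j ** (-alpha)) * m0_sq * (10.0 ** (psi_db / 10.0))
    return K / p_received

# A device farther from the BS (worse channel) incurs a higher B2D cost.
near = b2d_cost(p_bs=10.0, d_bs_j=50.0, alpha=3, m0_sq=1.0, psi_db=0.0)
far = b2d_cost(p_bs=10.0, d_bs_j=400.0, alpha=3, m0_sq=1.0, psi_db=0.0)
print(far > near)  # → True
```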
2.4.2 We Weight Assignment in Gr
Since the durable communities are constructed based on physical encounter
history, users belonging to the same community have strong internal connections
that not only help in reliable content transfer but also lay the foundation for stable and
sustainable encounter predictions. The BS follows specific rules to assign the edge
weight Wij between two devices i and j that are within dmax of each other in Gr,
according to their membership in the durable communities obtained from the contact
graph Gp:

(i) Devices belonging to the same community and connected via a sustainable edge
receive a small weight that is inversely proportional to the total internal edge weight of
that community.
(ii) Devices belonging to the same community but connected by a bridge edge in Gp,
or not connected in Gp at all, receive a larger weight than in case (i).
(iii) If the devices belong to different communities Ci and Cj and are either
unconnected in Gp or connected by a bridge edge in Gp, the edge connecting them in
Gr receives a large weight that is inversely proportional to the minimum weight among
all edges connecting Ci and Cj in Gp. If no edge connects Ci and Cj in Gp, we assign
Wij the maximum weight between any two devices in Gr.
(iv) If the devices belong to different communities Ci and Cj and a sustainable edge
connects them in Gp, the edge weight Wij in Gr is smaller than in case (iii).

According to these four criteria, edge weights are assigned between adjacent devices
(within dmax) in Gr, which helps our proposed solution RPF to choose suitable relay
devices for multi-hop content transfer, as we demonstrate in the performance evaluation
section.
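The four rules can be expressed as a simple lookup, sketched below. The scaling constants (1.0, 2.0, 0.5) and the data layout (community ids, labeled Gp edges, precomputed minimum cross-community weights) are illustrative assumptions, not the exact values used by the BS:

```python
def assign_weight(i, j, comm, gp_edges, internal_weight, min_cross_weight, w_max):
    """
    comm: node -> community id
    gp_edges: (i, j) -> ('sustainable' | 'bridge', weight) for edges of Gp
    internal_weight: community id -> total internal edge weight in Gp
    min_cross_weight: (Ci, Cj) -> minimum weight among Gp edges joining Ci and Cj
    w_max: maximum weight between any two devices in Gr
    """
    edge = gp_edges.get((i, j)) or gp_edges.get((j, i))
    ci, cj = comm[i], comm[j]
    if ci == cj:
        if edge is not None and edge[0] == 'sustainable':
            return 1.0 / internal_weight[ci]   # rule (i): small weight
        return 2.0 / internal_weight[ci]       # rule (ii): larger than case (i)
    key = min_cross_weight.get((ci, cj), min_cross_weight.get((cj, ci)))
    if edge is not None and edge[0] == 'sustainable':
        return 0.5 / key                       # rule (iv): smaller than case (iii)
    if key is None:
        return w_max                           # rule (iii): no Gp edge joins Ci and Cj
    return 1.0 / key                           # rule (iii): inverse of minimum cross weight
```

Rule (iv) presupposes a sustainable cross-community edge, so a minimum cross weight always exists in that branch.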
2.4.3 Social Community Aware Device Selection for Multi-hop D2D
The goal is to find a least-cost path from s to r in the relay graph Gr subject to the
practical constraint of a maximum delivery time, which limits the number of relay
devices. At the same time, we emphasize the importance of incorporating durable
communities into the device selection process for successful D2D session completion.
To take this into account, we modify the cost of the path P in Problem 1 as part of our
solution to CEDS-MD. Accordingly, we include the edge weight Wij computed in
Section 2.4.2 to obtain the total cost wij between i and j as follows:

w_{ij} = W_{ij} + c_{i,j}. \quad (2–6)

Note that both terms on the right-hand side of (2–6) are normalized and of the
same order of magnitude. For real-time content sharing with D2D communication, we
can formulate the optimal relay selection problem in a multi-hop D2D cellular network as
the following optimization problem. Let the variable xij represent each edge (i, j) ∈ Er:
x_{ij} = \begin{cases} 1, & \text{if } e(i,j) \text{ is selected for the least cost feasible path,} \\ 0, & \text{otherwise.} \end{cases} \quad (2–7)

We have the following Integer Program (IP):

\min \sum_{(i,j) \in E} w_{ij} x_{ij} \quad (2–8)

\text{s.t.} \quad \sum_{(i,j) \in E} x_{ij} - \sum_{(k,i) \in E} x_{ki} = \begin{cases} 1 & i = s, \\ -1 & i = r, \\ 0 & \forall i \in V, \, i \neq s, r, \end{cases} \quad (2–9)

\sum_{(i,j) \in E} t_{i,j} x_{ij} \leq t_{max}, \quad (2–10)

x_{ij} \in \{0, 1\}, \quad \forall (i,j) \in E. \quad (2–11)
(2–9) ensures that the selected cost-effective devices constitute a path. The
transmission time between devices i and j is obtained considering the cellular and
wireless channels as in (2–3). (2–10) ensures that the selected devices deliver the
time-sensitive content within the maximum allowable time tmax with high likelihood.
This optimization problem is NP-complete, since it belongs to a hard class of
combinatorial optimization problems [76]. Therefore, we cannot derive the optimal
solution in polynomial time. Next, we introduce the proposed approach to solve the
CEDS-MD problem.
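For intuition on small instances, the IP (2–8)-(2–11) can be checked against a brute-force enumeration of simple s-r paths: keep the cheapest path whose total delay stays within tmax. The adjacency-list format below is an illustrative assumption:

```python
def cheapest_feasible_path(adj, s, r, tmax):
    """adj: node -> list of (neighbor, weight w_ij, delay t_ij).
    Returns (cost, path) of the least-cost s-r path with delay <= tmax, or None."""
    best = (float('inf'), None)

    def dfs(v, visited, cost, delay, path):
        nonlocal best
        if delay > tmax or cost >= best[0]:
            return  # prune infeasible or dominated partial paths
        if v == r:
            best = (cost, list(path))
            return
        for (u, wt, dl) in adj.get(v, []):
            if u not in visited:
                visited.add(u)
                path.append(u)
                dfs(u, visited, cost + wt, delay + dl, path)
                path.pop()
                visited.remove(u)

    dfs(s, {s}, 0.0, 0.0, [s])
    return best if best[1] else None
```

This exponential-time check is only viable for toy graphs, which is precisely why the LP-based approach of the next subsection is needed.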
2.4.4 Solving the Optimization Problem
We solve the CEDS-MD problem in three steps: (i) relax the IP formulation into a
linear program (LP) and solve it; (ii) show that the optimal solution of the LP contains at
most two fractional paths, which are then constructed; and (iii) formulate a new LP by
adding new constraints. We keep solving the modified LP until it becomes infeasible.
This approach obtains the optimal solution in near-polynomial time by using the interior
point method to solve the LP. We start by relaxing (2–11) to obtain the LP:
\min \sum_{(i,j) \in E} w_{ij} x_{ij} \quad (2–12)

\text{s.t.} \quad \sum_{(i,j) \in E} x_{ij} - \sum_{(k,i) \in E} x_{ki} = \begin{cases} 1 & i = s, \\ -1 & i = r, \\ 0 & \forall i \in V, \, i \neq s, r, \end{cases} \quad (2–13)

\sum_{(i,j) \in E} t_{i,j} x_{ij} \leq t_{max}, \quad (2–14)

0 \leq x_{ij} \leq 1, \quad \forall (i,j) \in E. \quad (2–15)
Property of LP Solution
We denote the LP relaxation (2–12)–(2–15) as P. The optimal solution of P is no longer
integral as in the classical shortest path problem [8], due to the addition of the delay
constraint (2–14). However, the following theorem holds.
Theorem 2.1. Either there exists an optimal solution for P that contains at most two
fractional s, r paths, or P is infeasible.

Proof. Denote Psr as the collection of all s, r paths, and denote w(pj) and t(pj) as the
total weight and total delay of a path pj ∈ Psr, respectively. pj is called a long-delay path
if t(pj) > tmax and a short-delay path otherwise.

We will show that if P is feasible and an optimal solution x* contains more than two
fractional s, r paths, then either x* can be transformed into an optimal solution with at
most two s, r paths or x* is not optimal. Assume x* contains k > 2 fractional paths and
is optimal. Clearly, some short-delay paths must be included; otherwise x* is not even
feasible. Therefore, the problem can be categorized into three cases: i) all paths are
short-delay paths; ii) at least two short-delay paths and a long-delay path exist; and
iii) at least two long-delay paths and a short-delay path exist.
In the first case, if all the selected short-delay paths have the same weight, an
equivalent solution can be constructed by assigning a flow of 1 to one of the selected
paths and a flow of 0 to all the others. Such an optimal solution has only one path. If the
weights of the selected paths differ, shifting flow from heavy-weight paths to light-weight
paths improves the solution and hence x* is not optimal.
In the second and third cases, the weight of the long-delay paths must be smaller
than that of the short-delay paths, or we could shift flow to the short-delay paths and
improve the solution. Denote the collection of all selected paths as P_{x^*}. We must have

\sum_{p_j \in P_{x^*}} f_j t(p_j) = t_{max},

where f_j is the flow assigned to path p_j. If the total time were less than t_{max}, it would be
possible to shift flow from short-delay paths to long-delay paths and improve the
solution. In the second case, denote p_1, p_2 as two short-delay paths, and let p_a
represent all other selected paths, where

f_a = \sum_{p_j \in P_{x^*}, j \neq 1,2} f_j, \qquad t(p_a) = \frac{\sum_{p_j \in P_{x^*}, j \neq 1,2} f_j t(p_j)}{f_a}, \qquad w(p_a) = \frac{\sum_{p_j \in P_{x^*}, j \neq 1,2} f_j w(p_j)}{f_a}.

Clearly,

f_a t(p_a) + f_1 t(p_1) + f_2 t(p_2) = t_{max}, \qquad f_a w(p_a) + f_1 w(p_1) + f_2 w(p_2) = Y^*,

where Y^* denotes the objective value of the solution x^*. Also, we have t(p_a) > t_{max}
and w(p_a) < w(p_1), w(p_2).
Without loss of generality, let t(p_1) < t(p_2); then w(p_1) > w(p_2), or p_1, p_2 cannot
coexist in the optimal solution. Consider two moves: (1) remove p_2 from the solution;
(2) remove p_1 from the solution. For both moves, the solutions are recalculated by
reassigning flows to the remaining selected paths. Denote the objective values by Y^1
and Y^2 for moves (1) and (2), respectively. We will show that it is impossible to have
both Y^1 > Y^* and Y^2 > Y^*; hence at least one of the moves does not increase the
objective value.
After move (1), the following formulas hold:

t_{max} = (f_a + \delta_1) t(p_a) + (f_1 + f_2 - \delta_1) t(p_1),
Y^1 = (f_a + \delta_1) w(p_a) + (f_1 + f_2 - \delta_1) w(p_1),
\delta_1 = f_2 \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_1)}.

Therefore,

\epsilon_1 = Y^1 - Y^* = f_2 (w(p_1) - w(p_2)) + \delta_1 (w(p_a) - w(p_1)) = f_2 \left( (w(p_1) - w(p_2)) + \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_1)} (w(p_a) - w(p_1)) \right).
After move (2), the following formulas hold:

t_{max} = (f_a - \delta_2) t(p_a) + (f_1 + f_2 + \delta_2) t(p_2),
Y^2 = (f_a - \delta_2) w(p_a) + (f_1 + f_2 + \delta_2) w(p_2),
\delta_2 = f_1 \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_2)}.

Therefore,

\epsilon_2 = Y^2 - Y^* = f_1 (w(p_2) - w(p_1)) - \delta_2 (w(p_a) - w(p_2)) = f_1 \left( (w(p_2) - w(p_1)) + \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_2)} (w(p_2) - w(p_a)) \right).
Assume \epsilon_1, \epsilon_2 > 0. Since f_1, f_2 > 0, we have

\frac{w(p_1) - w(p_2)}{w(p_1) - w(p_a)} > \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_1)}, \quad (2–16)

\frac{w(p_1) - w(p_2)}{w(p_2) - w(p_a)} < \frac{t(p_2) - t(p_1)}{t(p_a) - t(p_2)}. \quad (2–17)
However, inequalities (2–16) and (2–17) cannot both hold simultaneously. To see this
clearly, let

a = w(p_1) - w(p_2), \quad b = w(p_2) - w(p_a), \quad (2–18)
c = t(p_a) - t(p_2), \quad d = t(p_2) - t(p_1). \quad (2–19)

Then inequality (2–16) reduces to \frac{a}{a+b} > \frac{d}{c+d}, while inequality (2–17)
reduces to \frac{a}{b} < \frac{d}{c}. The first implies ac > bd, while the second implies
ac < bd.
Therefore, in the second case, in which two or more short-delay paths exist in the
solution, we can always perform move (1) or (2) to reduce the number of short-delay
paths without increasing the objective value. The same claim holds for the third case by
similar reasoning. In conclusion, we can always create an optimal solution for P that
selects at most two s, r paths.
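The final step of the proof can also be checked numerically via the substitutions (2–18)-(2–19): for positive a, b, c, d, the two cross-multiplied forms ac > bd and ac < bd are mutually exclusive, so (2–16) and (2–17) can never hold together. A quick randomized sanity check:

```python
import random

random.seed(1)
for _ in range(10000):
    # a, b > 0 since w(p1) > w(p2) > w(pa); c, d > 0 since t(pa) > t(p2) > t(p1).
    a, b, c, d = (random.uniform(0.01, 10.0) for _ in range(4))
    ineq_16 = a / (a + b) > d / (c + d)   # (2-16) after substitution
    ineq_17 = a / b < d / c               # (2-17) after substitution
    assert ineq_16 == (a * c > b * d)     # cross-multiplication, all terms positive
    assert ineq_17 == (a * c < b * d)
    assert not (ineq_16 and ineq_17)      # the inequalities are mutually exclusive
print("no counterexample found")
```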
2.4.5 Exact Solution by Cutting Plane
Based on Theorem 2.1, an optimal solution with at most two fractional paths can
always be generated by solving P. The case of only one path is trivial, since it is already
the optimal integral solution and no further work is required. Therefore, we are only
interested in solutions with two fractional paths. Clearly, the two paths must be one
short-delay path, denoted ps, and one long-delay path, denoted pl. Since any feasible
integral solution must be a short-delay path, we are particularly interested in ps. Denote
X_{p_s} = \sum_{(i,j) \in p_s} \bar{x}_{ij}, where \bar{x}_{ij} is the value of x_{ij} in the
current solution. If we cut the path ps out of the feasible region of P, the solution must
explore other paths. We therefore add the following constraint:

\sum_{(i,j) \in p_s} x_{ij} < X_{p_s}. \quad (2–20)
By re-solving P iteratively while adding constraints of the form (2–20), the feasible
region of P is gradually reduced. We continue the iterations until the LP becomes
infeasible. The optimal solution is then the short-delay path with minimum weight. The
final algorithm, which we call relay path finder (RPF), is presented in Alg. 2.

Algorithm 2 RPF: An optimal algorithm for finding the least cost relay path
Data: Network Gr = (Vr, Er, Wc, We), source s, target r, and tmax
Result: A path comprising a set S of edges forming the relay
  Initialize Q ← ∅
  Solve the LP in (2–12)-(2–15)
  P ← solution of the LP
  while P is feasible do
    F ← {feasible path(s) constructed from P}
    Q ← Q ∪ {short-delay path in F}
    if F contains only one path from s to r then
      Return the path in Q with the smallest weight
    else if F contains two paths from s to r then
      Let ps and pl be the paths
      Add a constraint according to (2–20) to the LP
    Solve the LP with the additional constraint(s)
    P ← solution of the updated LP
  if P is infeasible and Q = ∅ then
    No feasible path exists
    Initiate direct cellular communication between the BS and r
  else if Q ≠ ∅ and ∃ P′ ∈ Q such that C(P′) < C(B2D) then
    Return the path in Q with the smallest weight and cost < C(B2D)
  else
    Initiate direct cellular communication between the BS and r
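A compact sketch of the RPF loop on top of an off-the-shelf LP solver follows. The toy graph encoding (edge lists, weight/delay dicts), the greedy flow decomposition, and the cut margin eps are our own assumptions; in particular, the strict cut (2–20) is implemented as ≤ X_{p_s} − eps:

```python
from scipy.optimize import linprog

def decompose(x, edges, idx, s, r, tol=1e-7):
    """Split an s-r flow vector into paths along positive-flow edges."""
    flow = {e: x[idx[e]] for e in edges}
    paths = []
    while True:
        path, v = [], s
        while v != r:
            out = [e for e in edges if e[0] == v and flow[e] > tol]
            if not out:
                return paths
            e = max(out, key=lambda e: flow[e])
            path.append(e)
            v = e[1]
        f = min(flow[e] for e in path)
        for e in path:
            flow[e] -= f
        paths.append(path)

def rpf(nodes, edges, w, t, s, r, tmax, eps=1e-4):
    m = len(edges)
    idx = {e: k for k, e in enumerate(edges)}
    c = [w[e] for e in edges]
    # Flow conservation (2-13): net out-flow is +1 at s, -1 at r, 0 elsewhere.
    A_eq, b_eq = [], []
    for v in nodes:
        row = [0.0] * m
        for e, k in idx.items():
            if e[0] == v:
                row[k] += 1.0
            if e[1] == v:
                row[k] -= 1.0
        A_eq.append(row)
        b_eq.append(1.0 if v == s else -1.0 if v == r else 0.0)
    # Delay constraint (2-14); cutting planes (2-20) are appended below.
    A_ub = [[t[e] for e in edges]]
    b_ub = [tmax]
    Q = []  # short-delay paths found so far
    while True:
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0.0, 1.0)] * m)
        if not res.success:
            break  # LP infeasible: stop cutting
        paths = decompose(res.x, edges, idx, s, r)
        short = [p for p in paths if sum(t[e] for e in p) <= tmax]
        Q.extend(short)
        if len(paths) == 1:
            break  # integral solution: the single path is optimal
        # Two fractional paths: cut the short-delay one out per (2-20), re-solve.
        ps = short[0]
        X = sum(res.x[idx[e]] for e in ps)
        A_ub.append([1.0 if e in ps else 0.0 for e in edges])
        b_ub.append(X - eps)
    if not Q:
        return None  # no feasible relay path: fall back to direct B2D
    best = min(Q, key=lambda p: sum(w[e] for e in p))
    return best, sum(w[e] for e in best)
```

On a triangle where the cheap two-hop path violates tmax, the first LP is fractional; one cut makes the LP infeasible, and the short-delay path collected in Q is returned.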
2.5 Performance Evaluation
For our simulations, the mobility traces for nodes are generated by the self-similar
least action walk (SLAW) model, which has been shown to be very realistic in capturing
user mobility [46]. In particular, SLAW-generated traces are effective in representing
the social contexts present among people sharing common interests or those in a
single community, such as a university campus, a company, or a theme park. In
human mobility, people strive to reduce travel distance by visiting all nearby
destinations before visiting farther ones, unless some high-priority event such as an
appointment forces them to make a long-distance trip even in the presence of unvisited
nearby destinations. SLAW leverages this self-similarity of fractal waypoints, which can
be viewed as destinations, to realistically model human mobility. In this dissertation, we
have used similar parameter settings for capturing this regularity in human mobility
patterns as suggested in the original paper [46]. The wireless propagation channel is
modeled for urban macrocell scenarios, with the shadowing standard deviation set to
12 dB and the path loss exponent α set to 3. The cell area is a 1 km × 1 km square with
the BS at its center. The noise spectral density is −174 dBm/Hz. The transmit power of
each device is 100 mW, whereas the BS power is set to 10 W. The total bandwidth of
the RBs is set to 5 MHz in accordance with LTE RBs [2], and the maximum D2D
distance is set to dmax = 15 m. The main wireless network parameters are listed in
Table 2-4. We set ρ, ζ, and δ to 0.8, 0.7, and 4, respectively, in constructing Gp for
durable community detection. We describe how to choose these values later in this
section.
Table 2-4. Main wireless network parameters

  Cell dimension           1000 × 1000 m²
  BS location              Center of the area
  Shadowing std. dev.      12 dB
  Path loss exponent       3
  Noise spectral density   −174 dBm/Hz
  BS transmit power        10 W
  D2D transmit power       100 mW
  Maximum D2D distance     15 m
  RB size                  12 sub-carriers, 0.5 ms
We have compared the performance of our solution, RPF, with Groups-NET (GNET
for short), a mobility-aware, social-based approach that analyzes the impact of device
mobility on cellular network performance and multi-hop D2D in particular [59]. GNET
identifies social groups based on previous social meetings. It then computes the
likelihood of each group meeting in the future by computing group-to-group paths,
considering meeting regularity and shared group members. Finally, it identifies the
most probable path from the source to the destination by leveraging the group-to-group
path probability. GNET has been shown to outperform other state-of-the-art methods in
terms of improving cellular network efficiency [59]. We also compare our results with
two other social-oblivious methods: i) the minimizing cost (MC) scheme, which chooses
devices that minimize the BS's content transmission cost, and ii) the closest to
destination (CD) scheme, which at each hop selects the device physically closest to the
destination. These greedy methods have been used for relay selection in multi-hop D2D
as an efficient way to offload cellular traffic and to enable content transfer through D2D
when a direct connection cannot be established between the source and the
destination.
We generated locations for a total of 400 users in the designated area using the
SLAW model for 72 hours and used the first Δt = 48 hours for detecting the
social-encounter-based communities. The remaining 24 hours were used for simulating
the D2D content transfer. We randomly chose 20 cellular users uniformly distributed
over the area and 20 pairs of D2D devices as source and target (with distance larger
than dmax) and averaged the results over a large number of independent simulation
runs.
Figure 2-3 compares the content delivery rate of the proposed algorithm and the
baseline approaches as different parameters are varied. In Figure 2-3A, we show the
content delivery rate achieved by our proposed algorithm RPF for 140 users and three
different content sizes, 150 KB, 570 KB, and 1 MB, as the content sharing time tmax is
varied from 10 s to 120 s. For a particular content size b, with increasing tmax, RPF
tends to choose devices on the multi-hop path with a delivery time close to tmax so as to
minimize the cost. This results in more hops on a multi-hop D2D path, making it
more susceptible to device mobility. Consequently, the content delivery success rate
keeps decreasing with larger values of tmax. However, RPF chooses the same multi-hop
path after a certain value of tmax, as the path cost can no longer be minimized within tmax.
From Figure 2-3A, we can see that the delivery success rate remains at around
70% after tmax reaches 120 s for content size b = 1 MB. For a particular tmax value, the
delivery success rate decreases with larger content sizes. Larger content requires more
time to be transmitted, which makes them more prone to device mobility. Therefore, a
larger content size results in a reduced content delivery rate for a fixed tmax, which is
also evident from Figure 2-3A.

Figure 2-3. Content transmission success rate for different cases. A) Impact of tmax and
different content sizes for RPF. B) Impact of tmax on content delivery for different
methods. C) Impact of content size b. D) Impact of total users.
In Figure 2-3B, we can see that all the methods achieve a high success rate for
content transmission when the content sharing session is constrained by small tmax
values, with RPF securing 100% when tmax < 20 s. The results are shown for
a content size of 1 MB and a user count of 140. However, all the approaches
experience a reduced content delivery rate for D2D sessions of longer duration. In
such cases, the mobility of the devices can lead to premature tear-down of the multi-hop
D2D session. Interestingly, Figure 2-3B shows that the proposed RPF is more resilient
to mobility than all other approaches. In particular, RPF experiences a much slower
performance degradation as tmax varies. Even with a maximum D2D session of close
to two minutes, RPF achieves a successful delivery rate of more than 69%. RPF's
consideration of durable social communities enables it to identify devices that are
likely to maintain the required QoS during the whole session by remaining close to one
another. The content delivery rate is up to 18% higher for the RPF algorithm relative to
the social-unaware scenario for tmax = 120 s.
In Figure 2-3C, we show how varying the content size impacts the content delivery
rate when tmax is set to 100 s for 140 users. Clearly, as the content size increases, the
delivery rate decreases. However, the rate of degradation for RPF is much smaller
than for the other methods. This is due to the fact that larger contents require more time
for transmission, which, in turn, makes the longer D2D session more susceptible to
device mobility. In such cases, the mobility of the devices can lead to premature
tear-down of the multi-hop D2D session. As a result, methods that do not account for
social communities when choosing reliable devices on the multi-hop path experience a
poor delivery rate. Figure 2-3C shows that the content delivery rate is up to 14% higher
for the proposed RPF algorithm when compared to the social-unaware scenario for
b = 1 MB.
Figure 2-3D shows how the content delivery rate varies with the network size. As
the number of users increases, one expects a better delivery rate due to more options
for multi-hop relaying. However, a large number of users also increases interference for
users that need to transmit on the same RB. In such a scenario with scarce resources,
the achievable data rate decreases, leading to longer transmission times, which makes
sessions more susceptible to device mobility. Interestingly, RPF suffers less from the
increased user concentration, which makes it the best device selection method. In
Figure 2-3D, we can see that the proposed RPF is more resilient to mobility than all
other approaches. Moreover, the content delivery rate resulting from RPF is up to 24%
higher than the social-unaware scenario for a user count of 400.
Furthermore, from Figs. 2-3A-2-3D, we can see that all the baseline methods
perform poorly compared to RPF in terms of content delivery success rate. On the
one hand, CD suffers from poor content delivery because it does not consider the
signal and noise information. On the other hand, MC always tries to minimize the BS
cost, which, in some cases, results in choosing devices that require a long time to
deliver, thus making it prone to disconnections caused by mobility. GNET also suffers
from a poor delivery rate, as it prioritizes the most probable community-to-community
path. Two adjacent devices belonging to two different communities with a large
community-to-community path probability will be chosen by GNET even if they have
never met before. As a result, these devices, without significant previous meeting
records, might move far apart from each other during a transmission session, leading to
poor content delivery. RPF's consideration of durable social communities, in contrast,
enables it to identify devices that are likely to maintain the required QoS during the
whole session by remaining close to one another.
Figure 2-4 evaluates the offload performance of the proposed RPF. For a
100-second duration, we recorded the number of active B2D links in the network, shown
in Figs. 2-4A-2-4D. The BS initiates a direct cellular connection towards the target
when: a) there is no feasible multi-hop path, b) the multi-hop device cost is larger than
the direct BS-to-device (B2D) cost, or c) the mobility of devices on a path leads to a
premature disconnection of that path.
Figure 2-4A shows the impact of increasing tmax as contents of three different sizes,
150 KB, 570 KB, and 1 MB, are transmitted. Since contents can be transmitted for a
longer duration with increasing tmax, the number of active B2D links increases for each
content size b. This corroborates the intuition that the mobility of devices may
disrupt D2D sessions, which requires the BS to use costly B2D links. As the content
size increases, more time is needed for content transmission, which, in turn, makes the
multi-hop D2D path more prone to device mobility. Consequently, the premature tear
down of an ongoing session due to device mobility leads to reliance on an increasing
number of B2D links, where the content is served directly by the BS to the content
requester.

Figure 2-4. Offload performance analysis for different cases. A) Active B2D links vs
tmax. B) Impact of tmax on the number of active B2D links for different methods.
C) Active B2D links vs SINR. D) Active B2D links vs content size.
In Figure 2-4B, we can see that, when a longer tmax is allowed, the number of
active B2D links increases for a content size of 1 MB and 140 users. This corroborates
the intuition that the mobility of devices may disrupt D2D sessions, which leads the BS
to use costly B2D links. However, RPF requires 40% fewer B2D links than the other
methods. The reduction in B2D links demonstrates the improved offload capability of
the proposed RPF. Such an offload of traffic from the BS to the D2D tier also reduces
expensive backhaul traffic.
Figure 2-5. Cost-effectiveness of multi-hop D2D for three different content sizes and a
range of tmax.
In Figure 2-4C, we can see the impact of the minimum allowed SINR on network
traffic offload. For smaller SINR values, devices can sustain longer D2D sessions, since
the required QoS for such communication is low. This results in more successful content
delivery over multi-hop D2D, which, in turn, requires fewer B2D links. However, when
the allowable SINR is increased, the tolerance to device mobility decreases, which
subsequently results in more active B2D links. From Figure 2-4C, we can see that the
other methods require as much as 158% more B2D links than the proposed RPF for a
target SINR of 5 dB.
In Figure 2-4D, we show the comparative performance of the different relay
selection methods as the content size varies from 150 KB to 1 MB for a fixed network
size of 140 users and tmax = 100 s. As the content size increases, all methods
increasingly rely on B2D links. However, RPF requires 28% fewer B2D links than the
other methods. The reduction in the number of B2D links demonstrates the improved
offload capability of the proposed RPF. Such an offload of traffic from the BS to the D2D
tier also reduces expensive backhaul traffic.
In Figure 2-5, we show the percentage of time a multi-hop D2D path is chosen
instead of an expensive direct B2D link for a user count of 140. This figure also gives an
indication of the quality of the cost functions that we have defined in (2–4) and (2–5).
Note that this comparison considers only the cost of direct B2D and D2D relay devices
before the transmission starts. Figure 2-5 shows that, over 90% of the time, RPF
chooses multi-hop D2D due to its cost-effectiveness. The small portion of time during
which the direct B2D links are used is primarily due to the destination devices which are
closer to the BS and can receive the content directly from the BS with low cost. As the
allowed tmax increases, more D2D links are chosen by RPF compared to B2D links. With
increasing tmax , the RPF tends to choose devices on the multi-hop path with delivery
time close to tmax as it tries to minimize the cost. This results in a D2D path having
a smaller cost, which explains why more D2D links are selected by RPF as the tmax
increases for a particular content size. Furthermore, as the content size increases, the
chance of forming a better (less expensive) D2D path decreases. For a given tmax,
RPF has to choose devices of higher cost as the content size b increases in order to
find a D2D path capable of delivering the content within tmax. Therefore, a larger
content size results in a lower percentage of D2D links being chosen by RPF before
the content transmission starts, as shown in Figure 2-5.
Figure 2-6 shows the execution time of our approach. Even in large networks,
RPF computes its solution quickly: in almost all realizations, it takes less than
a second on average to compute the cost-effective devices on the multi-hop path.
We performed all the computations on a Linux machine with an AMD Opteron(tm)
Processor 6168 CPU and 64 GB of memory.
In Figure 2-7, we show the impact of different parameters mentioned in Subsection 2.3.3
on the performance of the RPF algorithm. Figure 2-7 is the heat map representing the
impact of δ (stability threshold) and the weight factor ρ on the content delivery success
rate achieved by RPF for a user count of 140, content size b = 1 MB, and tmax = 100 s.
The success rate is depicted by the RGB colors. As the success rate gets higher, the
color becomes lighter in the heat map. From Figure 2-7, we can see that the color
[Figure 2-6 plot: execution time (ms) vs. number of users (100 to 400).]
Figure 2-6. Execution time of RPF for constructing Gp
Figure 2-7. Impact of different parameters on the performance of RPF
[Figure 2-8 plot: normalized BS cost vs. number of users (50 to 350), comparing the proposed RPF, MCCD, and GNET.]
Figure 2-8. The cost of the BS vs user count
is lightest, i.e., the contents are successfully delivered, in the top right corner where
ρ = 0.8 and δ = 4. The content delivery success rate increases until δ = 4 and starts
deteriorating as δ is increased further. Accordingly, we set ρ = 0.8 and δ = 4
for this setup of user count, content size, and tmax. We vary the
value of strength threshold ζ from 0.5 to 0.9 and choose ζ = 0.7 as RPF achieves better
content delivery with this setup.
In Figure 2-8, we show how the cost of the BS varies with total users in the network.
The BS cost is normalized by the highest cost attained for the maximum user count. It
is clear that the BS cost for RPF is smaller than that of any other method. Although
MC aims to choose relay devices that yield a minimum cost, it suffers from poor delivery
since it does not take the mobility of devices into account. Therefore, the BS has to
invoke expensive B2D links to deliver the content, resulting in an increased BS cost, as
evident from Figure 2-8. The other two methods also result in a higher BS cost as
they also fail to consider devices' mobility while choosing relay devices. For all of the
methods, as the number of users increases, so does the interference originating from
users sharing the same resources. In such a scenario, similar to what we have seen
in Figure 2-3D, the achievable data rate decreases due to scarce resources. This, in
turn, leads to longer transmission times, which makes the methods more susceptible to
device mobility and consequently results in a higher BS cost. However, as the user count
increases, the gap between RPF and the other methods also widens, validating the
superiority of our method in terms of minimizing the BS cost.
2.6 Summary
In this chapter, we have studied the impact of device mobility on the performance
of multi-hop D2D underlaying a cellular network. We have introduced a novel model
that considers durable communities based on the social encounters of devices for
predicting the likelihood of devices’ proximity. We have formulated the reliable device
selection problem as an IP optimization problem and we have proposed an efficient
heuristic algorithm to solve it. We have also shown that leveraging social communities
can increase the content delivery rate in multi-hop D2D. Simulation results showed that
our proposed method outperformed classical social-unaware methods significantly in
terms of traffic offload. The results also showed that the proposed method achieved
its objectives with manageable computational complexity which makes it applicable to
larger networks.
CHAPTER 3
TOWARDS EFFICIENT SOCIAL-AWARE CONTENT TRANSMISSION THROUGH
DEVICE-TO-DEVICE MULTICAST COMMUNICATIONS
In Chapter 2 we have seen how a social-aware approach leads to better content
delivery through multi-hop D2D communication when the content source and content
requester are beyond typical D2D range. In this chapter, we explore the impact of social
relationships on the performance of multicast content delivery via D2D underlaid cellular
networks.
3.1 D2D Enhanced Content Transmission
Let us consider a group of devices interested in the same multicast content
served by a single LTE-A cell, as shown in Figure 3-1. We consider a transmission scenario
where devices v1 to v4 receive the content directly from the eNB in the first hop, and v1 and v3
act as RDs in the second hop to transmit the content to the rest of the devices that could
not receive it from the eNB. In subsequent hops, v5, v6, v7 and v8 help in further relaying
the content to other devices using multi-hop D2D within minimum time, thus achieving
better quality of service. In the remainder of this section, we introduce the details of
the system model, considering the social and physical aspects of devices' relationships.
3.1.1 Problem Setup
Table 3-1. CQI / MCS table for LTE-A [3]
CQI Index   MCS            Efficiency [bit/s/Hz]
0           not in range   0.0000
1           QPSK           0.1523
2           QPSK           0.2344
3           QPSK           0.3770
4           QPSK           0.6016
5           QPSK           0.8770
6           QPSK           1.1758
7           16-QAM         1.4766
8           16-QAM         1.9141
9           16-QAM         2.4063
10          64-QAM         2.7305
11          64-QAM         3.3223
12          64-QAM         3.9023
13          64-QAM         4.5234
14          64-QAM         5.1152
15          64-QAM         5.5547
Figure 3-1. D2D enabled multicast (a multi-hop scenario): v1 and v3 form the relay devices in the second hop; along with the purple nodes (v2, v4) they are directly served by the eNB in the first hop. v5, v6, v7 and v8 denote the relay devices in subsequent hops.
We consider LTE-A [3] systems where OFDMA and single carrier frequency
division multiple access (SC-FDMA) are used to access the downlink and the uplink,
respectively. The available radio spectrum is managed in terms of resource blocks (RBs)
and, in the frequency domain, each RB corresponds to 12 consecutive and equally
spaced sub-carriers. One RB is the smallest frequency resource that can be assigned
to a device. The overall number of available RBs depends on the system bandwidth
configuration and can vary between 6 (1.4 MHz channel bandwidth) and 100 (20 MHz).
We also assume there is a single eNB that manages the spectrum, by assigning
the adequate number of RBs to each scheduled device and by selecting the modulation
and coding schemes (MCS) for each RB. Scheduling procedures are based on the
channel quality indicator (CQI) feedback, transmitted by each device to the eNB over
dedicated control channels. The CQI is associated with the maximum supported MCS [3],
as reported in Table 3-1 for the LTE-A standard. We use the terms ‘device’ and ‘user’
interchangeably throughout this chapter.
3.1.2 System Model
3.1.2.1 Radio network
In the considered LTE-A single-cell area with N ′ devices in total, a set N ⊆ N ′
of n devices is seeking a particular content of size b from the eNB. The total
bandwidth is divided into B resource blocks (RB) in the set B. We assume that the
eNB constructs a multicast tree T and transmits the content to the first hop devices
using direct transmission. Some of the devices among them subsequently disseminate
it to the next hop devices via D2D communication and so on. As a result, the content is
transmitted in a multi-hop D2D fashion. According to Rel. 12 3GPP [5] specifications, we
consider that the D2D links exploit uplink frequencies and all the RDs in a particular hop
simultaneously use the frequencies with the same MCS to deliver the multicast data over
the D2D links in a synchronized manner, as described in [4]. The receivers consider
these retransmissions as multipath components of the same signal.
In our network, we consider real-time content sharing among mobile D2D users.
The achievable rate Rij for the transmission between a device i and device j using RB z
is
Rij = lz log2(1 + γij), (3–1)
where lz is the bandwidth of RB z ∈ B used by i for its data transmission to j , γij denotes
the SINR for j from i . For the link between i and j , considering signal interference from
all other devices using the same RB z , we have
γij = (pi gij) / ( Σ_{i′∈Z, i′≠i} pi′ gi′j + σ² ),   (3–2)
where Z is the set of devices sharing RB z , gij is the channel gain between i and j , pi is
the transmit power of device i , and σ2 is the variance of the Gaussian noise.
For the wireless network, we consider distance-dependent path loss and multipath
Rayleigh fading. Thus, the received power of each link between devices i and j can be
61
described as pij = pi · (dij)^(−α) · |m0|², where pi is the transmit power of device i , α is the
path loss exponent, and m0 is the fading component.
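To make the channel model concrete, the following illustrative Python sketch (not the simulator used in this dissertation; all numeric values are hypothetical) evaluates the received power pij = pi · (dij)^(−α) · |m0|², the SINR of equation (3–2), and the achievable rate of equation (3–1):

```python
import math
import random

def received_power(p_tx, d, alpha, fading=None):
    # p_ij = p_i * d_ij^(-alpha) * |m0|^2; |m0|^2 ~ Exp(1) under Rayleigh fading
    m0_sq = random.expovariate(1.0) if fading is None else fading
    return p_tx * d ** (-alpha) * m0_sq

def sinr(p_signal, p_interferers, noise_var):
    # Equation (3-2): desired power over interference-plus-noise on the shared RB
    return p_signal / (sum(p_interferers) + noise_var)

def rate(rb_bandwidth_hz, gamma):
    # Equation (3-1): R_ij = l_z * log2(1 + gamma_ij), in bit/s
    return rb_bandwidth_hz * math.log2(1.0 + gamma)

# Hypothetical D2D link: 100 mW transmitter 20 m away, alpha = 3, one
# interferer 60 m away on the same RB, fading pinned to 1 for repeatability.
p_sig = received_power(0.1, 20.0, 3.0, fading=1.0)
p_int = [received_power(0.1, 60.0, 3.0, fading=1.0)]
gamma = sinr(p_sig, p_int, noise_var=1e-13)
r = rate(180e3, gamma)  # assuming a 180 kHz RB bandwidth
```

With the fading term pinned to 1, the example link is interference-limited and γ is roughly 27 (about 14 dB).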
Given this SINR model, the LTE-A system obtains the CQI values by using the
resulting SINR values defined in equation (3–2). The SINR values can be mapped to
the corresponding CQI values on each RB for all users, on either cellular or D2D links,
such that a block error rate smaller than 1% is ensured [55]. The eNB collects the SINR values
and maps them to corresponding CQI between itself and other individual devices as
well as between each pair of devices. Let Q be the set of available CQI levels and let
qi ∈ {1, 2, ... , q} be the CQI between device i and the eNB, ∀i ∈ N . Moreover, let
qi ,j be the CQI value for each D2D link between devices i , j ∈ N , i 6= j . The means
by which this is achieved are outside the scope of this dissertation. However, the pairwise
the Channel State Information (CSI) through the use of the reference signals (RSs) in
LTE-A [28]. Each CQI level is associated with a given supported MCS. For an MCS
value c , the attainable data rate depends on the number of assigned RBs and on the
spectral efficiency for c as reported in Table 3-1. Hence, we compute the time required
for transmitting a content of size b in a particular hop that uses B RBs as tc = b/(r^d_c · B)
(for direct cellular) or tc = b/(r^u_c · B) (for D2D), if the corresponding MCS for that hop
is c. The terms r^d_c and r^u_c represent the data rates in downlink and uplink
transmissions, respectively, adopting the MCS associated with the CQI c.
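The mapping from CQI to per-hop transmission time can be sketched as follows, using the spectral efficiencies of Table 3-1; the 180 kHz per-RB bandwidth (12 sub-carriers × 15 kHz) is our own assumption for illustration:

```python
# Spectral efficiency per CQI index (bit/s/Hz), from Table 3-1.
CQI_EFFICIENCY = {
    0: 0.0, 1: 0.1523, 2: 0.2344, 3: 0.3770, 4: 0.6016, 5: 0.8770,
    6: 1.1758, 7: 1.4766, 8: 1.9141, 9: 2.4063, 10: 2.7305,
    11: 3.3223, 12: 3.9023, 13: 4.5234, 14: 5.1152, 15: 5.5547,
}
RB_BANDWIDTH_HZ = 180e3  # 12 sub-carriers x 15 kHz (assumed)

def per_rb_rate(cqi):
    # Data rate on one RB for the MCS associated with a given CQI (bit/s)
    return CQI_EFFICIENCY[cqi] * RB_BANDWIDTH_HZ

def hop_time(content_bits, cqi, num_rbs):
    # t_c = b / (r_c * B): time to push a content of b bits over one hop
    return content_bits / (per_rb_rate(cqi) * num_rbs)

# Hypothetical setup: 1 MB content over 100 RBs at CQI 15.
t = hop_time(8 * 1024 * 1024, cqi=15, num_rbs=100)
```

Lower CQIs yield proportionally longer hop times, which is what drives the layer-wise MCS choices later in this chapter.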
3.1.2.2 Social network
For establishing D2D connections, the eNB must provide proper incentives eu to
the users u ∈ N so that they become willing to share their resources for each others’
transmissions [14, 81]. However, even then, in reality, mobile users are usually reluctant
to share their resources [48] due to several practical reasons encompassing limited
resources and privacy concerns. Interestingly, devices belonging to the same social
community are more willing to help disseminate content to other devices in the
same community [10, 48]. We take into account the fact that social relationships in the
form of kinship, friendship, or colleague relationship between devices also influence
the content request pattern in a social network. For instance, friends watching a game
in a stadium, students on campus accessing video content of common interest, or
neighbors watching soccer matches or the Super Bowl show some degree of social
relationships in their interaction. Such social relationships have been shown to exhibit
a community structure property which implies that the users can be divided into groups
with dense connections inside each group and fewer connections across groups [17].
Cellular operators can leverage this community structure property to identify physically
close cost-effective users who can help transmit a content. Ideally, the chosen users
should be socially connected as well as in close proximity to one another at the time of
content transmission to extract the full benefit of D2D.
Let wij denote the social tie between devices i , j ∈ N , with wij = 0 when there is no
social link between them. We use the binary variable lij to express the willingness of i
to share its resources with j as follows:

    lij = 1, if i and j belong to the same community or if wij ≥ 0.5;
    lij = 0, otherwise.    (3–3)
Users with large social interaction between them have been shown to have a strong social
tie, which is captured by the value wij . A social tie of at least 0.5 is considered in this
dissertation for allowing user i to share its resources with user j , which is a reasonable
assumption supported by many recent works on social networks [65, 77]. Furthermore,
in a social network, if the tie between two users is high, it is more likely for them to
be in the same community. We deploy the well-known Blondel community detection
algorithm [17] in our experiments to extract social communities. In the experimental
evaluation section we vary the social tie between users to see how that impacts the
system performance. Now, we define a generalized problem formulation that aims to
deliver the multicast content in minimum time by choosing relay devices that are within a
certain BS cost.
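The decision rule in (3–3) is straightforward to implement once communities have been extracted (e.g., by the Blondel algorithm). A minimal sketch, with hypothetical device IDs, community assignments, and tie values:

```python
def willingness(i, j, community_of, tie):
    # Decision rule (3-3): l_ij = 1 iff i and j share a community or w_ij >= 0.5
    same_community = i in community_of and community_of.get(i) == community_of.get(j)
    strong_tie = tie.get((i, j), 0.0) >= 0.5
    return 1 if (same_community or strong_tie) else 0

# Hypothetical community assignment (e.g., output of a community detection
# run) and pairwise social ties.
community_of = {"u1": 0, "u2": 0, "u3": 1}
tie = {("u1", "u3"): 0.7, ("u2", "u3"): 0.2}

l_12 = willingness("u1", "u2", community_of, tie)  # same community -> 1
l_13 = willingness("u1", "u3", community_of, tie)  # strong tie (0.7) -> 1
l_23 = willingness("u2", "u3", community_of, tie)  # neither -> 0
```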
3.2 Problem Formulation
We now tackle the multicast content transmission problem from a multi-hop point
of view. From an eNB's standpoint, the smaller the incentive costs eu of the devices
operating as RDs, the less the eNB has to pay in the form of monetary remunerations,
coupons, or free data, which in turn reduces the operator cost. Thus, the objective is to
identify a set S of RDs and their hop-wise positions in the multicast tree T so that the
following trade-off is balanced within a fixed eNB budget I : 1) minimizing the content
delivery time while 2) keeping the total incentive cost of S no larger than I . In this
dissertation, we define the cost of a set X of devices as C(X) = Σ_{i∈X} ei . The
notations used in this chapter are summarized in Table 3-2.
We now formally define the Multi-hop Relay Selection (MRS) problem below.
Table 3-2. Summary of notations
Notation       Description
N              Set of mobile devices seeking the same content.
Kd             Set of potential relay devices corresponding to downlink CQI level cd.
Q, q           Set of CQI levels in LTE-A; total number of CQI levels.
S              Set of selected relay devices.
T              The multicast tree.
I, b, B        Maximum eNB budget, content size, number of RBs.
r^d_c, r^u_c   Data rate in downlink and uplink, respectively.
wij            Social tie between devices i and j.
lij            Device i's willingness to share resources with device j.
mij            Social CQI value between devices i and j.
Definition 4. (MRS) Given a set of devices N in network G requesting the same
content, total eNB budget I as incentives to the devices, MRS seeks to find a set of relay
devices S ⊆ N and the multicast tree T such that all the devices receive the content
from the eNB in minimum time and the cost C(S) is at most I .
We prove that the MRS problem is NP-complete in Theorem 3.1 by reducing from
the set cover (SC) problem [69], which is known to be NP-complete.
Theorem 3.1. The MRS problem is NP-complete.
Proof. First we introduce the decision version of MRS, which asks whether there exists
relay devices S and multicast tree T such that by time tmax , all devices receive the
content and C(S) ≤ I . It is clear that with specific S and T , the time required to transmit
the content to all devices can be calculated in polynomial time. Therefore, MRS is in NP.
Next, we consider a special case of MRS (P-MRS), in which the eNB is required to
send the content to two sets of devices H,F . The CQI levels between eNB and a device
in H, F are q1, q2 respectively and the transmitting times are t1, t2. Also, each device
h ∈ H is able to relay the content to a set F (h) ⊆ F of devices using CQI level q3 in time
t3, ∪h∈HF (h) = F . Let t1 + t3 < tmax < t2. The incentive for each relay device is 1 and
the total budget of the eNB is a positive integer I .
We then reduce the SC problem to P-MRS, which leads to the NP-Hardness of
P-MRS. As P-MRS is a special case of MRS, MRS is NP-hard and in turn NP-complete.
The decision version of SC is as follows.
SC: Given a set of m elements U, a collection S = {Si |Si ⊆ U, i = 1, ..., n} of n
sets and k ∈ N+, the SC problem seeks to identify whether there exists a sub-collection
S ′ ⊆ S where |S ′| ≤ k whose union equals U.
Let F = U, I = k and form a device h and corresponding F (h) for each Si ∈ S . With
t1 + t3 < tmax < t2, we create a P-MRS instance from the SC instance.
When the SC problem has a solution S ′, |S ′| ≤ k that can cover all elements in
U, the P-MRS problem has a solution by choosing all devices h whose corresponding
sets are in S ′ as the relay devices. All devices in H are placed in the first layer of the
multicast tree and all devices in F in the second layer. As the solution to SC is feasible,
Table 3-3. Social CQI Matrix (SCM); rows si ∈ S, columns dj ∈ Kd
        d1        d2        ...   dj
s1      ms1,d1    ms1,d2    ...   ms1,dj
s2      ms2,d1    ms2,d2    ...   ms2,dj
...     ...       ...       ...   ...
si      msi,d1    msi,d2    ...   msi,dj
in the P-MRS, all devices in F can receive the data from a device in H and the content
delivery time is t1 + t3 < tmax .
When the SC problem has no solution, in the P-MRS instance, some of the devices
in F cannot receive the content via a relay device. Therefore, the content delivery time
is t2 > tmax (when all devices receive the data directly from the eNB) and the P-MRS
problem is infeasible.
Therefore, the P-MRS problem is NP-Hard as SC is NP-Hard. As P-MRS is only
a special case of MRS, MRS must be NP-Hard and consequently an NP-complete
problem.
3.3 Solution for MRS
Our proposed solution aims to identify the optimal multicast tree T to solve the MRS
problem by deciding: (i) the appropriate CQI levels in each layer of T (ii) the set of the
RDs (S), and (iii) the hop-wise position of each of the devices in S in the tree T .
As the first step to solve the problem, the eNB computes the pairwise CQI levels
qi ,j where j ∈ N , i.e., set of devices that can be served by an RD i using direct D2D,
∀i ∈ N . During this step, the eNB integrates the social aspect with the physical radio
network characteristics. As part of this, the eNB constructs a social CQI matrix (SCM)
as depicted in Table 3-3. The social CQI index between devices i and j is expressed as
mi,j = qi,j · li,j (cf. Equation (3–3)), which denotes the likelihood of i sharing resources
with j when the channel quality is qi,j. When they have no social relationship, mi,j is set to
0. Although this removes some potential D2D pairs, considering the social aspect
while choosing relay devices is of practical significance. In real-life scenarios, as we
have already explained in Section 1.1.2, users are very reluctant, and in some cases
quite skeptical, about sharing resources with unknown persons even if they are situated
close to each other within D2D proximity. Socially connected users have been found
to be more willing to share resources with one another, which facilitates the success
of proximity-based D2D services [48]. Note that the proposed scheme requires that
the eNB is aware of the updated SCM, which incurs some extra overhead. However,
D2D communications are usually based on the assumption of stationary or, at least,
semi-static D2D channels because of their low mobility and short communication range
in the proximity based services [55]. Further research related to tackling higher device
mobility is left for future studies.
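Building the SCM from the pairwise CQI values and the willingness indicators amounts to an element-wise product, as the following sketch illustrates (device IDs, CQI values, and willingness entries are hypothetical):

```python
def build_scm(devices, cqi, willing):
    # Social CQI matrix: m[i][j] = q_ij * l_ij (0 when no social relationship)
    return {
        i: {j: cqi.get((i, j), 0) * willing.get((i, j), 0)
            for j in devices if j != i}
        for i in devices
    }

devices = ["u1", "u2", "u3"]
cqi = {("u1", "u2"): 12, ("u2", "u1"): 12, ("u1", "u3"): 7, ("u3", "u1"): 7}
willing = {("u1", "u2"): 1, ("u2", "u1"): 1}  # u1 and u3 are not socially connected
scm = build_scm(devices, cqi, willing)
# scm["u1"]["u3"] becomes 0: the (u1, u3) pair is dropped from the D2D
# candidate set despite its usable channel, since the pair lacks a social link.
```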
We formulate the MRS problem as a mixed integer program by constructing a
multicast tree T with the eNB placed as the root in layer 0. A device i in layer l transmits
content to its D2D proximity devices N(i) in layer l + 1, which we term as one hop
transmission. The eNB transmits a content to the devices in layer 1 using DL data rate
r^d_c (cf. Section 3.1.2) by setting the CQI to cd. Since our focus is to ensure fast content
delivery without significant buffering in the relay devices, the selected RDs in that layer
must deliver the content to the next hop via D2D using the same or larger data rate to
become a feasible solution. Therefore, a solution is considered feasible only if the D2D
links in the second hop originating from each RD in layer 1 use MCS level cu ≥ cd in
uplink. Accordingly, we only consider devices j ∈ N\S to be in N(i) if mi ,j ≥ cu, i.e.,
devices that are able to decode the content correctly when i transmits it using MCS cu
via D2D. In the subsequent layers, similar to the first layer, relay devices are chosen
into S given that the D2D links satisfy this MCS requirement, i.e., the MCS level must be
non-decreasing in the following layers: c^u_{l+1} ≥ c^u_l. The total number of layers lmax in a
particular tree (i.e. depth) will be at most the number of allowed RDs which is bounded
by the eNB budget. We denote the set of layers with L = {0, 1, 2, ... , lmax}. The time tc
to deliver the content from i to those in the next layer is determined from the data rate
associated with CQI c in Table 3-1 if the selected CQI in that layer is c as described in
Section 3.1.2.
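A small sketch of this feasibility rule and the resulting delivery time may help: the per-layer CQIs must be non-decreasing, and the total delivery time is the sum of the per-layer times tc. The per-RB bandwidth and the example CQI assignments are our own illustrative choices:

```python
# Subset of Table 3-1 spectral efficiencies (bit/s/Hz).
CQI_EFFICIENCY = {1: 0.1523, 7: 1.4766, 10: 2.7305, 15: 5.5547}

def layer_time(content_bits, cqi, num_rbs, rb_bw_hz=180e3):
    # t_c for one layer: content size over the aggregate rate of the layer's MCS
    return content_bits / (CQI_EFFICIENCY[cqi] * rb_bw_hz * num_rbs)

def delivery_time(content_bits, layer_cqis, num_rbs):
    # Total multicast delivery time if the per-layer CQIs are feasible,
    # i.e., non-decreasing from one hop to the next; None otherwise.
    if any(b < a for a, b in zip(layer_cqis, layer_cqis[1:])):
        return None  # a later layer may not use a lower MCS than an earlier one
    return sum(layer_time(content_bits, c, num_rbs) for c in layer_cqis)

b = 8 * 1024 * 1024  # 1 MB content, as in the experiments
ok = delivery_time(b, [7, 10, 15], 100)   # feasible: CQIs non-decreasing
bad = delivery_time(b, [10, 7], 100)      # infeasible: CQI drops in hop 2
```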
Our aim is to reduce the overall content delivery time for a given eNB budget. In
this context, we define a binary variable pu that indicates whether a device u will act as an RD.
We also introduce a variable yv ,l denoting whether a device v belongs to layer l . We
consider the eNB in our formulation as node 0 and place it at layer l = 0, accordingly we
set y0,0 = 1. Let the variable zu,v denote whether device v receives the content from u,
i.e., device u is the relay device for v . In a particular layer, each relay device transmits
using the same CQI, which is captured by the binary variable w^c_l :

    w^c_l = 1, if layer l uses CQI c;
    w^c_l = 0, otherwise.    (3–4)
We keep track of the number of layers in the tree using the variable xl which is set
to 1 when there is at least one device in layer l , otherwise it is 0. We also introduce a
binary variable qL which denotes if the total number of layers is at least L. We formulate
the problem as a mixed integer program P below in (3–5).
min t
s.t.
  Σ_l yv,l = 1,  ∀v ∈ N   (3–5a)
  xl ≤ Σ_{v∈N} yv,l ≤ A · xl,  ∀l ∈ L   (3–5b)
  Σ_{l=0}^{L−1} Σ_{c∈Q} tc · w^c_l ≤ t + A · (1 − qL),  ∀L ∈ L   (3–5c)
  Σ_{c∈Q} w^c_l = 1,  ∀l ∈ L   (3–5d)
  xl ≤ ql,  ∀l ∈ L   (3–5e)
  Σ_{u∈N} zu,v = 1,  ∀v ∈ N   (3–5f)
  yu,l + zu,v ≤ 1 + yv,l+1,  ∀u, v ∈ N , ∀l ∈ L   (3–5g)
  w^c_{l+1} · tc ≤ w^{c′}_l · tc′ + A · (1 − w^{c′}_l),  ∀c′ ≤ c, ∀l ∈ L   (3–5h)
  Σ_{v∈N} zu,v ≤ A · pu,  ∀u ∈ N   (3–5i)
  yu,l + zu,v ≤ 1 + Σ_{c=1}^{mu,v} w^c_l,  ∀u, v ∈ N , ∀l ∈ L   (3–5j)
  Σ_{u∈N} eu · pu ≤ I   (3–5k)
  zu,v ∈ {0, 1},  ∀u, v ∈ N   (3–5l)
  yv,l ∈ {0, 1},  ∀v ∈ N , ∀l ∈ L   (3–5m)
  xl ∈ {0, 1},  ∀l ∈ L   (3–5n)
  w^c_l ∈ {0, 1},  ∀c ∈ Q, ∀l ∈ L   (3–5o)
  qL ∈ {0, 1},  ∀L ∈ L   (3–5p)
  pu ∈ {0, 1},  ∀u ∈ N   (3–5q)
The objective of P is to minimize the delivery time. The constant A denotes a large
positive number in all of the above cases. Constraint (3–5a) ensures that a device v
must be in exactly one of the layers in the tree. A layer exists when there is at least one
device in it; constraint (3–5b) captures that. (3–5c) calculates the delivery time t. (3–5d)
ensures that devices belonging to a particular layer transmit using the same CQI.
Constraint (3–5e) detects whether the total number of layers is at least L. (3–5f) ensures
that a device receives the data from exactly one device in the upper layer. If a device v
belongs to layer l + 1 and it receives the content from device u, then u must be in layer
l; (3–5g) expresses this scenario. (3–5h) captures the fact that a device in layer l + 1
must transmit using a CQI equal to or higher than that of a device in layer l. Constraints
(3–5i) and (3–5k) ensure that the total cost of the selected RDs is within the eNB budget
I. If device u in layer l transmits the data to device v in layer l + 1, they must be socially
and physically connected, i.e., the mu,v value (Table 3-3) must be larger than or equal to
the CQI c of layer l, which is ensured by (3–5j).
Corollary 1. The size of P is O(n2 · lmax).
We have already shown in Theorem 3.1 that the MRS problem is NP-complete, and
obtaining the optimal solution requires time exponential in the number of D2D pairs.
Consequently, we introduce a heuristic approach, time efficient relay selector (ERS),
to tackle the MRS problem in the next subsection. We also show in the experimental
evaluation that ERS achieves the objective value within at most 5% of that of the optimal
solution.
ERS: Time-efficient approach for MRS
As pointed out in the previous section, with the increase in the number of potential
D2D pairs, the solution space becomes prohibitively large and intractable to solve
at some point. However, we observe that some of those D2D pairs are less likely to
be selected as part of the multicast tree and hence can be safely removed from the
solution space. To this end, we perform a pre-processing step before solving (3–5),
which removes non-contributory D2D pairs. For a D2D pair (x , y), if the eNB can deliver
the content to y faster than transmitting via x , we discard the pair. Equipped with the
pre-processing step, we express the ERS algorithm in Alg. 3.
Algorithm 3 ERS: Efficient Relay Selector
Input: SCM matrix, content size b, number of RBs B, r^d_c, r^u_c, ∀c ∈ Q
Output: The optimum multicast tree T
for each D2D pair {(x, y) | mx,y > 0} do
    Compute tx = b/(r^d_{m_{eNB,x}} · B)
    Compute ty = b/(r^d_{m_{eNB,y}} · B)
    Compute tx,y = ty,x = b/(r^u_{m_{x,y}} · B)
    if ty ≤ tx + tx,y then
        Mark SCM entry mx,y = 0
Solve P for T
Lemma 1. Pre-processing reduces D2D pairs by at least half of their initial number.
Proof. Before the pre-processing, for each pair (x , y) with mx ,y > 0, the pair (y , x) must
also exist as my ,x = mx ,y > 0. It is clear that tx , ty , tx ,y > 0 in each iteration of Alg. 3.
Then, the condition in the if statement for pairs (x , y), (y , x) cannot both be true and at
least one of the pairs will be removed. As Alg. 3 iterates through all initial D2D pairs, at
least half of them will be removed after the pre-processing.
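The pre-processing rule of Algorithm 3 can be sketched as follows (the SCM entries and timing values are hypothetical, and the hop-time callback stands in for b/(r^u_{m_{x,y}} · B)):

```python
def prune_pairs(scm, t_enb, hop_time):
    # Alg. 3 pre-processing sketch: drop D2D pair (x, y) whenever the eNB can
    # deliver to y at least as fast as relaying through x (t_y <= t_x + t_{x,y}).
    for x, row in scm.items():
        for y, m in row.items():
            if m > 0 and t_enb[y] <= t_enb[x] + hop_time(x, y):
                row[y] = 0  # mark SCM entry m_{x,y} = 0
    return scm

# Hypothetical instance: direct-delivery times from the eNB and a flat 0.3 s
# D2D hop time for every pair.
scm = {"u1": {"u2": 9}, "u2": {"u1": 9}}
t_enb = {"u1": 0.2, "u2": 0.8}
scm = prune_pairs(scm, t_enb, lambda x, y: 0.3)
# (u2, u1) is pruned (0.2 <= 0.8 + 0.3) while (u1, u2) survives (0.8 > 0.2 + 0.3),
# illustrating Lemma 1: at least one direction of every pair is removed.
```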
We solve the MRS problem via the ERS method using the CPLEX tool [36]. We also
demonstrate the significantly reduced D2D pair count used in ERS in the experimental
evaluation section.
3.4 Experimental Evaluation
In this section, we evaluate the performance of the proposed algorithms. We show
the comparative performance analysis of the algorithm we proposed for the generic
multi-hop D2D.
In the first set of experiments, we evaluate how social-aware multi-hop D2D can
achieve better content delivery time when the number of relay devices is constrained
by the eNB budget. The analysis is performed according to guidelines of LTE-A system
model [3]. We consider distance-dependent path loss and multipath Rayleigh fading
with exponent α = 3. The main wireless parameters are listed in Table 3-4. We have
considered 20 to 100 RBs dedicated for the multicast users on a 20 MHz channel
bandwidth [3]. Pairwise CQIs between devices, including with the eNB, are computed by
mapping the signal-to-interference-plus-noise-ratio (SINR) on each RB onto the CQI
level that ensures a block error rate smaller than 1% [55].
To model the social tie, we use the real-world location-based Gowalla network topology
from the Stanford repository [1]. We choose n = 25 to 100 users, with a step size of 25, who
are in a particular location at a certain time seeking the same content [6]. The social
tie wij between a pair of users is assigned randomly from a uniform distribution ranging
(0, a], where we vary the parameter a from 0.1 to 1.0 to observe the importance of social
communities on the delivery time. We then deploy the Blondel [17] community detection
algorithm to extract the social communities and subsequently to find the willingness lij of
a device i to share its resources with device j as defined in (3–3).
Table 3-4. Wireless network parameters
Parameter                 Value
Cell dimension            100 x 100 m²
eNB location              Middle of the left edge of the area
Channel model             Multipath Rayleigh fading
Path loss exponent        3
Noise spectral density    −174 dBm/Hz
eNB transmit power        10 W
D2D transmit power        100 mW
Maximum D2D distance      30 m
RB size                   12 sub-carriers, 0.5 ms
We compare the performance of optimal relay selector (ORS), ERS, restricted relay
selector (RRS), greedy relay selector (GRS) [55] and conventional multicast scheme
(CMS) [6] in terms of reducing the content delivery time. ORS is the solution obtained by
solving (3–5) using CPLEX [36]. ERS chooses relay devices from the reduced solution
space as described in Alg. 3. RRS selects the relay devices that are chosen from only
layer l = 1 while eNB being placed alone in layer l = 0. This means the whole content
transmission is restricted to two hops only. RRS thus obtains the optimal solution for
two hop transmission by limiting lmax to 2 in (3–5). GRS, proposed in [55], chooses relay
devices from first hop (l = 1) without any constraint on the eNB budget. In the second
hop it attaches devices greedily to the RD in layer 1 with the highest CQI and completes
the overall transmission in minimum time within two hops. It has been shown that GRS
outperforms other state-of-the-art methods in terms of reducing content delivery time
[55]. CMS, which transmits the content according to the CQI of the device having the
worst channel condition [6], is used as a baseline for delivery-time comparison.
We assume the video-on-demand (VoD) content size b = 1 MB, number of
resource blocks B = 100, social tie distribution parameter a = 1.0, eNB budget I = 10
throughout the subsequent experiments unless otherwise mentioned. We also assume
for simplicity that the store and forward decoding in each relay device takes negligible
time compared to the transmission time to other devices. Without loss of generality,
for better understanding of the comparisons, we have assigned eu = 1 for all devices
throughout the experiments.
The first analysis focuses on demonstrating that ERS outputs a near-optimal solution
with respect to the optimal solution of (3–5). Since the optimal solution takes exponential
time for larger instances of the network, in Figure 3-2 we use |N | = 25 for comparison
with a range of eNB budgets. The gap is computed as follows:
    gap = [deliveryTime(ERS) − deliveryTime(ORS)] / deliveryTime(ORS) × 100%
We can see ERS delivers the content always within 5% gap of that of ORS
as evident from Table 3-5 for various budgets. However, the execution time (ET)
varies significantly between ERS and ORS indicated by the third and fourth columns
respectively in the table. Executions are performed on a Linux machine with an AMD Opteron(tm) 6168 CPU and 64 GB of memory. The reduced count of D2D pairs,
which is contributing to the huge run time improvement, is also evident from Table 3-6.
Accordingly, we use only ERS for comparative analysis in subsequent experiments.
Figure 3-2. Analysis of ERS performance vs. ORS: delivery time (s) vs. eNB budget.
Table 3-5. Gap analysis between ORS and ERS
Budget   % gap   ET(s) ERS   ET(s) ORS
1        4.45    0.1         11
2        1.96    0.2         2035
3        1.97    0.4         6189
4        3.92    0.6         12879
5        4.56    1.0         20212
Table 3-6. D2D pair counts in ORS and ERS
Method   n=25   n=50   n=75   n=100
ORS      600    2332   3852   7948
ERS      135    426    551    1760
We now analyze the comparative performance of ERS, RRS, GRS and CMS with
respect to delivery time for varying eNB budgets, setting |N| = n = 50. The result is shown in Figure 3-3. Among all the considered relay selection schemes, ERS
achieves the best delivery time. CMS has the worst delivery time, and it does not change with the budget size since CMS does not leverage D2D and hence does not require any RD. It is interesting to note that GRS suffers from poor delivery time when the budget is restricted to a small number. Since GRS attaches each device in N\S to the relay device with the highest CQI value, it requires a comparatively large number of relay devices to cover all devices in N\S. It needs as many as 5 relay devices before it
can produce any feasible solution. Even if we allow GRS to operate without any budget
constraint (marked as Unrestricted GRS in Table 3-7), ERS and RRS still outperform it when the budget is larger than 2, as can be seen from Table 3-7. On the other hand, RRS achieves better delivery time than every other method except ERS. As the budget grows, ERS has the liberty to choose from more D2D devices with better CQI spanning more hops, which minimizes the delivery time compared to RRS.
Table 3-7. Comparison of delivery times in seconds
Method             Budget=1   2      3      5      10
Unrestricted GRS   0.92       0.92   0.92   0.92   0.92
RRS                0.98       0.92   0.84   0.84   0.83
ERS                0.98       0.92   0.84   0.75   0.72
Figure 3-4 compares the impact of the multicast user count on the delivery time of the different relay selection schemes. The area over which the users are distributed is progressively extended, starting from a smaller area for n = 25 users up to the whole cell of 100 × 100 m² for n = 100 users. As users are gradually moved farther from the
eNB as the multicast user count increases, the worst channel quality between eNB
and a device also deteriorates. As a result, the delivery time for each of the schemes
increases. However, increased distance from eNB unravels the opportunity of multi-hop
D2D, particularly for those devices that are far away from the eNB. ERS takes full advantage of the D2D pairs by delivering content to the devices at the cell edge
through multi-hop D2D. As user count n reaches 100, ERS outperforms other schemes
significantly with an average delivery time gain of 68.4% over CMS and 22.3% over RRS.
Figure 3-3. Delivery time (s) vs. eNB budget for ERS, RRS, GRS and CMS.
Figure 3-4. Delivery time (s) vs. multicast user count n for ERS, RRS, GRS and CMS.
Figure 3-5. Delivery time (s) vs. available RB count for ERS, RRS, GRS and CMS.
In Figure 3-5, we show how varying the number of available resource blocks can
impact the relay selection and corresponding content delivery time for a fixed content
size of 1 MB, multicast user count of 50 and fixed budget of 10. We depict the delivery
time for a range of RBs spanning 20 to 100 with a step size of 10 in the network. As
expected, CMS is the worst performing scheme for multicast content delivery requiring
5.56 s for 20 RBs and 1.11 s when there are 100 RBs. When the number of available
RBs is small (20), ERS exhibits performance gains of 12.6% and 34.4% over RRS and CMS, respectively. Not surprisingly, all of the schemes improve their delivery time as more resource blocks are allowed for transmission. This, once again, reinforces the superiority of ERS in working efficiently under resource constraints.
Figure 3-6 demonstrates how varying the content size impacts the content delivery time when the number of RBs, the user count and the budget are fixed to the values mentioned at the start of this section. We show the impact on delivery time by ranging the VoD content size from 1 MB to 10 MB. Given a fixed number of RBs, B = 100, a larger content requires more time to transmit. As a result of this increased per-hop time, the overall content delivery time increases for all of the methods. However, consistent with the previous trend, ERS significantly outperforms the other methods.
Figure 3-6. Delivery time (s) vs. content size (MB) for ERS, RRS, GRS and CMS.
Figure 3-7. Delivery time (s) and D2D pair count vs. the social tie distribution range.
So far in all of the experiments we have considered social tie wij between user i
and user j in the online social network chosen uniformly from a distribution ranging over (0, a], with a = 1.0. We now analyze the impact of social ties by varying the distribution range from a = 0.1 to a = 1.0. The smaller the value of a, the weaker the social tie strength wij, which makes it less likely for i and j to be in the same community. This in turn makes i and j less likely to share resources, i.e., lij becomes small. Hence, there will be fewer D2D device pairs, which culminates in a larger delivery time. Recall that lij also depends on the social tie between i and j, requiring wij ≥ 0.5 (refer to Equation (3–3)). Therefore, once the value of a reaches at least 0.5, the number of D2D pairs increases significantly. This is evident from Figure 3-7, where we vary the social tie distribution range a for the RB count, content size, multicast user count and budget specified earlier, and report the results obtained by the ERS algorithm in terms of total content delivery time and number of D2D pairs. As a increases, more D2D pairs (red line) are included for sharing resources with each other, which results in a quicker delivery time (blue line). The results are averaged over a large number of independent simulation runs.

Figure 3-8. Heatmap depicting hop count for varying both the budget and the multicast user count
We now express the underlying hop count used by our proposed method, ERS, as a function of both the eNB budget and the total number of multicast users. For this analysis, we fix the RB count to 100, the content size to 1 MB, and the social tie distribution range to (0, 1.0]. We vary the user count from 25 to 100 with a step size of 25 and the eNB budget from 1 to 10. In Figure 3-8, we depict the hop count with colors: the higher the hop count, the lighter the color in the heat map. For a given budget, as the total user count increases, devices far away from the eNB are better served by multi-hop D2D communication, which can be seen from the lighter colors in the heat map.
3.5 Summary
In this chapter, we have studied the benefit of social-aware multi-hop D2D for video
content delivery to multicast users under practical constraints imposed by the eNB
for relay selection. We have formulated a novel problem, MRS, for minimizing content
delivery time to a group of users and shown its NP-completeness. We have introduced
a mixed integer program formulation to express MRS and proposed a heuristic scheme
to efficiently solve the problem. Our proposed social-aware solution minimizes the Base Station cost by relaying video content to a set of relay devices which, in turn, transmit the content via multi-hop D2D to devices with poorer channel conditions that could not receive the content directly from the eNB. Simulation results showed that our
proposed methods outperformed existing state-of-the-art methods significantly in terms
of minimizing content delivery time.
CHAPTER 4
SOCIAL-AWARE MULTICAST CONTENT TRANSMISSION: SPECIAL CASE SCENARIO
In the previous chapter, we have explored the performance enhancement of
multicast content delivery brought forth by the social-aware D2D communication.
Although the mixed integer program P that we devised in Chapter 3 provides an
accurate solution to MRS, it may not scale well for large MRS instances even after the
pre-processing step to remove non-contributing D2D pairs. In this chapter, we introduce
an efficient greedy-based approximation algorithm with a provable performance guarantee
to solve a two-hop variation of the MRS problem.
4.1 CRS: Two-hop MRS
We now discuss the cost-effective relay selection problem (CRS), a constrained
version of the MRS problem in which the multicast content is delivered in at most two
hops. In CRS, the eNB selects a particular CQI level in downlink and transmits the
content to a set of devices that are capable of decoding it. Among these devices, a set
of relay devices are chosen who then transmit the content via D2D using the uplink to
the rest of the devices. The objective of CRS is to ensure content delivery to all devices
in minimum time by selecting appropriate CQI levels for both downlink and uplink. With
the guaranteed minimum time, CRS asks for the most cost-effective relay devices in the second hop to transmit the content to the rest of the devices that cannot be directly served
by the eNB. The formal definition of CRS is as follows.
Definition 5. (CRS) Given a network G of N devices requesting the same content, CRS
seeks to first find the CQI levels cd and cu in downlink and uplink respectively such that
the content can be delivered to all the devices in minimum time. Then it asks for a set S
of relay devices with minimum C(S) that guarantees the minimum delivery time.
In the proof of Theorem 3.1, the P-MRS problem we constructed is essentially a
special case for CRS and it is NP-Complete. Therefore, CRS is also NP-Complete.
4.1.1 Solution Sketch
Our proposed solutions aim to identify the proper configuration to solve CRS by deciding: (i) the CQI levels cd, cu that guarantee content delivery in minimum time, and (ii) the set S of RDs to transmit the content to all devices that cannot be served directly by the eNB.
For the first step of solving CRS, we must find the CQI value in each layer that results in the minimum delivery time. We start by calculating
the overall transmission time T for all possible cd , cu combinations and sort the
combinations in ascending order of T . Then we use a binary search to locate the
feasible combination with minimum transmission time. The feasibility of a combination
cd , cu is discussed as follows.
For each combination cd , cu, we first identify the set Kd of devices that are able to
decode the content transmitted from the eNB under CQI level cd . Then, we construct the
set of devices that can retrieve the content from a device i ∈ Kd via D2D. For an uplink
MCS cu, each device i first constructs the list of devices that can be served by i via D2D, i.e., Xi = {k ∈ N\Kd | mi,k ≥ cu}, which is called the maximum reachable devices (MRD)
set of device i . Since D2D transmissions in each hop are synchronized as they are
performed in the same transmission time interval (TTI) [5], all considered RDs serve all UEs in the next hop in a single transmission using the chosen uplink MCS cu. A combination cd, cu is feasible when ∪_{i∈Kd} Xi = N\Kd.
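The feasibility test for a (cd, cu) combination described above can be sketched as follows; this is an illustrative sketch, where the CQI map `m` and the function name are our own stand-ins for the chapter's notation:

```python
def feasible(m, N, Kd, cu):
    """Check feasibility of an uplink CQI cu given the eNB-served set Kd:
    every device outside Kd must lie in the MRD set X_i of some i in Kd."""
    outside = set(N) - set(Kd)
    covered = set()
    for i in Kd:
        # X_i = {k in N\Kd : m[i][k] >= cu}, the MRD set of device i
        covered |= {k for k in outside if m[i][k] >= cu}
    return covered == outside
```

For example, if device 1 can reach device 3 at CQI 5 and device 2 can reach device 4 at CQI 7, the pair is feasible for cu = 4 but not for cu = 6.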
For the second step of solving CRS, we introduce two solution schemes for the relay selection problem (RSP), which identifies the RD set S with minimum cost under the cd, cu obtained in the first step. The two solutions are described in the next
subsection.
4.1.2 Solutions for RSP
4.1.2.1 The optimal solution to RSP
In this subsection, we solve RSP optimally by formulating it as an integer linear
program.
With given cd , cu, as discussed in the previous section, we can identify the set
Kd and all Xi , i ∈ Kd . We introduce a variable ai for every device i ∈ Kd , with the
intended meaning that ai = 1 when the device i is selected as a relay device, and ai = 0
otherwise. We can express the RSP problem as the following integer linear program
(ILP):
minimize    Σ_{i ∈ Kd} ei · ai

subject to  Σ_{i : v ∈ Xi} ai ≥ 1,   ∀ v ∈ N\Kd

            ai ∈ {0, 1},   ∀ i ∈ Kd     (4–1)
The constraint ensures that each of the remaining devices in N\Kd can retrieve the content from at least one selected relay device in Kd. That is, every feasible solution of this ILP must ensure that each device that could not receive the content from the eNB belongs to the maximum reachable devices (MRD) set X of at least one relay device.
We obtain the optimal solution of RSP by solving (4–1) using CPLEX tool [36].
Accordingly, we denote the overall solution to solve the CRS problem as Optimal Relay
Device Selector (ORDS) when this ILP is deployed in the second step to solve RSP. We
discuss the comparative performance of this ORDS algorithm in Section 4.2. As RSP
resembles the set cover problem and is thus NP-hard, we now propose a fast greedy algorithm for RSP in order to achieve high efficiency on large problem instances.
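For intuition, ILP (4–1) is small enough on toy instances to solve by exhaustive search; the following sketch (names and instance are hypothetical, and real instances go through CPLEX as stated above) enumerates all relay sets and keeps the cheapest one whose MRD sets cover every target:

```python
from itertools import combinations

def solve_rsp_exact(X, e, targets):
    """Exhaustively solve ILP (4-1) on a tiny instance: choose relays S from
    the keys of X minimizing total cost e[i], subject to the union of the
    MRD sets X[i], i in S, covering every device in targets."""
    devices, targets = list(X), set(targets)
    best_cost, best_set = None, None
    for r in range(len(devices) + 1):
        for S in combinations(devices, r):
            covered = set().union(*(X[i] for i in S)) if S else set()
            if covered >= targets:
                cost = sum(e[i] for i in S)
                if best_cost is None or cost < best_cost:
                    best_cost, best_set = cost, set(S)
    return best_cost, best_set
```

On a 3-relay instance this immediately exhibits the cost trade-off: two cheap relays can beat one expensive relay that covers everything alone.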
4.1.2.2 The greedy solution to RSP
The proposed greedy solution to RSP, gain maximizer (GM) described in Alg. 4,
iteratively selects a device which maximizes the number of newly covered devices in
N\Kd . Notice that GM may not be able to find a feasible solution for all combinations of
cd, cu. The gain function f(u) = |Xu|/eu is the number of uncovered devices that u can transmit the content to, |Xu|, weighted by its incentivization cost eu under the given CQI level. In each iteration we pick the device u with the highest gain (a tie is
broken arbitrarily) and add it to the final set S. The gain function f (v) for all other nodes
is updated each time a device is added to the set S . The process continues until either
all the devices in N\Kd are covered or there is no feasible solution for this value of cu.
Algorithm 4 GM: Gain Maximizer
Input: Xu, ∀u ∈ Kd
Output: S ⊆ Kd
S ← ∅, C ← ∅
foreach u ∈ Kd do
    Initialize the gain f(u) ← |Xu|/eu
while C ≠ N\Kd ∧ Kd ≠ S do
    v ← argmax_{v' ∈ Kd\S} {f(v')}
    S ← S ∪ {v}
    foreach x ∈ Xv do
        C ← C ∪ {x}
        Update f(y) for all y ∈ Kd such that x ∈ Xy
if C ≠ N\Kd then
    Solution not feasible
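A minimal Python rendering of GM, with our own identifiers (the dissertation provides only pseudocode); recomputing the uncovered part of each Xu at selection time plays the role of the gain updates:

```python
def gain_maximizer(X, e, Kd, N):
    """Greedy GM (Alg. 4): repeatedly pick the device u maximizing
    f(u) = |uncovered part of X_u| / e_u until N\Kd is covered."""
    targets = set(N) - set(Kd)
    S, covered = [], set()
    candidates = set(Kd)
    while covered != targets and candidates:
        u = max(candidates, key=lambda d: len(set(X[d]) - covered) / e[d])
        if not set(X[u]) - covered:
            break  # no remaining device adds coverage: infeasible
        S.append(u)
        covered |= set(X[u])
        candidates.discard(u)
    return S, covered == targets
```

The second return value flags the infeasible case noted above, where no choice of RDs can cover N\Kd for this cu.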
We are now ready to propose the solution to solve the CRS as a whole based on
the solutions of RSP in the next section.
4.1.3 Solution to CRS
The solution to CRS, namely FRDS (Fast Relay Device Selector), combines what we
discussed above in Section 4.1.1 and Section 4.1.2. Upon construction of the potential
(cd , cu) pairs, FRDS sorts them according to their content delivery time in ascending
order. It then runs a binary search over those combinations to identify the pair that can
ensure all the devices can receive the content. For each pair (cd , cu), it then invokes
Alg. 4 to check whether the potential set of relay devices corresponding to that (cd , cu)
combination can transmit the content to all the devices that could not retrieve it from
the eNB. Note that the ILP discussed in Section 4.1.2.1 can also be used to determine
the feasibility of a pair (cd, cu); yet FRDS uses Alg. 4 to obtain better run time efficiency as well as an approximation guarantee, which we discuss next. We also compare the performance of FRDS with ORDS (described in Section 4.1.2.1) in the
experimental evaluation section. The algorithm FRDS is detailed below.
Algorithm 5 FRDS: Fast Relay Device Selector
Input: SCM matrix, N
Output: ĉd, ĉu, S
1. Construct all possible combinations (cd, cu) such that cu ≥ cd
2. Compute the content delivery time t_cd + t_cu for each combination (cd, cu)
3. Sort the combinations in ascending order of delivery time
4. Identify the feasible pair (ĉd, ĉu) with the smallest delivery time using a binary search
5. Calculate the relay device collection S using Alg. 4
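The sort-then-binary-search core of FRDS can be sketched as below, assuming, as in the text, that feasibility is monotone along the time-sorted order; the identifiers are ours:

```python
def frds_search(pairs, delivery_time, is_feasible):
    """FRDS steps 3-4: sort (cd, cu) pairs by delivery time, then
    binary-search the feasible pair with the smallest delivery time."""
    order = sorted(pairs, key=delivery_time)
    lo, hi, best = 0, len(order) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_feasible(order[mid]):
            best = order[mid]   # feasible: try to find a faster pair
            hi = mid - 1
        else:
            lo = mid + 1        # infeasible: need more delivery time
    return best
```

In the full algorithm, `is_feasible` would invoke the GM routine of Alg. 4 on the candidate pair; here it is left abstract.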
Complexity analysis
We now discuss the solution quality and running time complexity of Alg. 5.
Theorem 4.1. Alg. 5 ensures minimum content delivery time and achieves an approximation ratio of O(log(n)) for the cost of selecting relay devices.
Proof. Alg. 5 guarantees the minimum possible delivery time due to two facts: (i) the feasibility check of a delivery time (which is essentially the feasibility of the pair (cd, cu)) is accurate, ensured by Alg. 4; (ii) a binary search over the sorted times guarantees finding the minimum feasible time. In terms of the cost of selecting relay devices, as RSP resembles the Set Cover problem, the approximation ratio of the greedy solution of RSP is also O(log(n)) [69].
Lemma 2. The run time computational complexity of FRDS is O(n2 · log(q)), where q is
the total number of CQI levels.
Proof. There are q² possible combinations of downlink and uplink CQI pairs (cd, cu). The binary search over the sorted combinations therefore takes O(log(q²)) = O(log(q)) probes. For each probed combination (cd, cu), FRDS first constructs the set of potential relay devices Kd, which takes time linear in n. For each device i ∈ Kd, FRDS identifies the set of devices that can be served by i via D2D, which can take O(n²) time in the worst case. The GM algorithm considers O(n) candidate sets and takes O(n) time per iteration to determine whether the solution is feasible and, if so, the corresponding set of relay devices; hence the while loop of GM takes O(n²) in the worst case, which is the time spent per combination (cd, cu). Therefore, the total time complexity of binary searching for the feasible pair with minimum time is O(n² · log(q)).
4.2 Experimental Evaluation
In this section, we evaluate the performance of the proposed algorithms for the two-hop special case, which enjoys an approximation guarantee, against the methods used for comparison in the previous chapter. Recall that CRS is a special case of the multi-hop problem in which the multicast content is delivered in at most two hops, with the aim of minimizing content delivery time while choosing cost-effective relay devices. We compare the performance of the ORDS (introduced in Section 4.1.2.1) and FRDS algorithms with that of the GRS algorithm [55] as well as the conventional multicast scheme (CMS) [6], both of which were also used in the previous chapter. We compare these schemes in terms of different metrics including the eNB budget, content delivery time and execution time. ORDS is the solution obtained by solving the CRS problem with CPLEX [36] deployed in the second step to solve (4–1); FRDS selects the relay devices according to Alg. 5.
Figure 4-1 shows the comparative performance of different methods in terms of eNB
budget, which is essentially the RD count since we assumed eu = 1, required to deliver
Figure 4-1. eNB budget (RD count) vs. user count n for ORDS, FRDS and GRS.
Figure 4-2. Content delivery time (s) vs. user count n for ORDS, FRDS, GRS and CMS.
Figure 4-3. Execution time (ms) vs. user count n for ORDS, FRDS and GRS.
a content of size 1 MB using 100 RBs when the multicast user count is varied from 25 to
100 with a step size of 25. The objective of the CRS problem is to deliver the content in
minimum time while choosing minimum-cost RDs. FRDS requires almost the same number of RDs as ORDS, well within the O(log(n)) approximation ratio proved in Theorem 4.1. FRDS also requires significantly less time than ORDS to compute the RDs, as can be seen from Figure 4-3. On the other hand, as the user count increases, GRS requires an increasingly large number of RDs to deliver the content to all the multicast users. Note that for user count n = 25, GRS requires fewer RDs than FRDS. However, this comes at a cost: GRS takes longer to deliver the content than FRDS, as Figure 4-2 clearly shows for a user count of 25.
Figure 4-2 demonstrates the content delivery time performance of different relay
selection schemes. As the multicast user count increases, the worst channel quality
between eNB and a device also deteriorates due to the poor channel condition of
the devices on the edge of the network. As a result, the delivery time for each of the
schemes increases. CMS requires a comparatively longer time than any other method to deliver the content, since it transmits the content to all the users using the CQI value of the user with the poorest channel condition. GRS takes longer to deliver the content when the user count is small. For larger user counts, GRS takes a similar amount of time to ORDS and FRDS. However, this comes at a cost: GRS also requires a large number of RDs to accomplish its objective, which makes the approach cost-inefficient in practice. FRDS requires exactly the same delivery time as ORDS; however, it needs significantly less time to compute the RDs, as evident from Figure 4-3. Furthermore, FRDS requires very few RDs to deliver the content, which makes it the best fit for identifying cost-effective relay devices.
In Figure 4-3, we report the running times of the algorithms. ORDS takes more time to compute the RDs than FRDS, which finishes in milliseconds. With increasing user count, ORDS's running time grows exponentially. It is worth noting that for smaller values of n, GRS takes very little time to compute the RDs.
However, as the user count increases, it requires a significantly longer time to identify
RDs. The reason lies in how GRS identifies the RDs. For each downlink CQI cd , GRS
identifies the potential relay devices. It then enumerates all possible combinations of
the potential RDs, starting with the smallest size combination, to verify the feasibility
of the solution. This expensive step makes the algorithm take a prohibitively longer time than any other method, especially when the set of potential relay devices is also large for large
values of n, which can be seen for n = 100 in Figure 4-3. Note that Figure 4-1 does not
have any corresponding values for the CMS scheme, as it does not support the concept of an RD. In the case of Figure 4-3, CMS takes a very small amount of time to compute the CQI value of the user with the worst channel condition, with which it delivers the content to all the multicast users (not shown in the figure). This near-constant computational complexity comes at a cost: the content delivery time is much larger than for any other method considered. In summary, FRDS achieves an almost 1800% running time gain over GRS for n = 100.
4.3 Summary
In this chapter, we have studied the benefit of social-aware D2D for video content
delivery to multicast users under practical constraints imposed by the eNB for relay
selection. We have devised the problem as a special case of the generic problem
introduced in the previous chapter and analyzed its complexity. Moreover, we provided
an approximation algorithm for this special case with a provable performance guarantee.
Experimental evaluation results showed that our proposed methods outperformed
existing state-of-the-art methods significantly in terms of minimizing content delivery
time.
CHAPTER 5
ROBUSTNESS OF COMMUNITY STRUCTURES: APPROXIMATION ALGORITHMS AND ANALYSIS
In this chapter, we define the framework for assessing community structure fragility.
First, we introduce the density-based broken community (DBC) problem for breaking
k communities with the minimum number of edge removals and analyze its complexity.
We then provide an approximation algorithm with theoretical performance guarantee for
the DBC problem in Section 5.1. To analyze the vulnerability of the community structures
in a broader sense, we extend the problem formulation to communities produced from
an arbitrary community detection algorithm. We offer an efficient heuristic to break the
communities and identify the set of critical edges in Section 5.2. In order to analyze
the edge constrained version and accordingly to identify the edges that are crucial
for the community structure, we furthermore examine the problem from the viewpoint of
locating a fixed number of important edges whose removal breaks as many communities
as possible in Section 5.3. We conduct extensive experiments with different parameters
to mine interesting observations about the behavior of broken communities after edge
removal. The results are reported in Section 5.4.
5.1 Density-based Analysis
5.1.1 Network Model and Problem Definition
In this chapter a network is represented by a graph G = (V ,E) where V is the
set of n nodes and E is the set of m edges. A node u in G represents a user while an
edge (u, v) represents the interaction between the users u and v in the network. For a
community C ⊆ V , let mC and nC be the number of internal edges and the number of
nodes in C , respectively. Let C in denote the set of edges having both endpoints in C .
We have used the terms vertex and node interchangeably throughout this chapter.
There are several quantitative measures to identify communities in a network such
as maximizing the modularity based functions [29] and density based functions [30].
In this section, we first consider the density function and discuss other community detection measures later in the chapter.
The density-based function can be defined as ρ(C) = |C^in| / (|C| choose 2) to identify a set C of nodes as a community [30]. The closer C approaches a clique of its size, the higher its density value ρ(C).
The threshold on the internal density that suffices for C to be a local community is given by

τ(C) = σ(C) / (|C| choose 2),  where  σ(C) = (|C| choose 2)^(1 − 1/(|C| choose 2))     (5–1)

Thus a subgraph induced by C is a local community iff ρ(C) ≥ τ(C), or equivalently |C^in| ≥ σ(C).
As can be seen, this density function has the particular advantage of dealing only with the candidate group, requiring neither a predefined threshold nor a user-defined parameter. We discuss other community detection measures later, in the general framework section. Besides, σ(C) is an increasing function which approaches C's full number of connections, i.e., the number of edges in a clique of size |C|. Hence, σ(C) is a powerful tool for detecting local communities, i.e., densely connected parts of a network.
Based on the definition of the density function, a community C is broken if, by removing a set S of edges from E, the density of C falls below τ(C). Let ki denote the minimum number of edges that must be removed from community Ci to make ρ(Ci\Si) < τ(Ci\Si), where Si is the set of removed edges in community Ci. Then ki is defined as

ki = min{ t : 2(mCi − t) / (nCi(nCi − 1)) < τ(Ci) }     (5–2)
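The quantities in (5–1) and (5–2) are straightforward to compute; since τ(C) = σ(C)/(|C| choose 2) and the density after removing t edges is (mC − t)/(|C| choose 2), the condition in (5–2) simplifies to mC − t < σ(C). A sketch, where `nc` and `mc` stand for nC and mC:

```python
from math import comb

def sigma(nc):
    """sigma(C) in Eq. (5-1): M^(1 - 1/M) with M = C(nc, 2)."""
    M = comb(nc, 2)
    return M ** (1 - 1 / M)

def edges_to_break(nc, mc):
    """k_i in Eq. (5-2): smallest t with density(mc - t) < tau(C),
    which simplifies to mc - t < sigma(C)."""
    t = 0
    while mc - t >= sigma(nc):
        t += 1
    return t
```

For instance, a community on 4 nodes has σ(C) = 6^(5/6) ≈ 4.45, so a 5-edge community on 4 nodes is broken by removing a single edge, a fact the NP-completeness gadget below relies on.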
The density-based breaking of communities (DBC) problem is defined as follows:
Definition 6. (DBC) Given an undirected graph G = (V ,E), and a set C of k commu-
nities, find a subset S ⊂ E of minimum cardinality such that removing S from the graph
breaks every community in C .
5.1.2 Complexity of DBC
Theorem 5.1. The DBC problem is NP-complete.
Proof. The decision version of DBC is defined as follows. Given (G ,C , l), where
G = (V ,E) is a graph, C is a set of communities of G , and l is a positive integer,
determine whether there exists a set S ⊂ E such that in G ′ = (V ,E\S), every
community in C is broken, and |S | ≤ l .
Given a set S of edges, one can efficiently check whether |S | ≤ l and whether all
communities in C are broken. Thus DBC is in NP.
To show the NP-hardness, we reduce from the vertex cover problem, defined as
follows. Given (G , l), where G = (V ,E) is a graph, and l is a positive integer, a vertex
cover is a set A ⊂ V such that for all e = (u, v) ∈ E , u ∈ A or v ∈ A. The problem is to
determine whether a vertex cover A exists with |A| ≤ l .
First, we need to define the identification of vertices in a graph.
Definition 7 (Vertex identification). Let H = (V ,E) be a graph. Let A = {ui : i ∈ I}
be a collection of vertices. Identification of the vertices A is defined to be the following
operation.
Let H ′ = (V ′,E ′) be the induced subgraph of H after removing vertices {ui : i ∈ I}.
Let u be a new vertex. Then
V* = V′ ∪ {u},
E* = E′ ∪ {(u, w) : (ui, w) ∈ E for some i ∈ I, w ∈ V′},
and H∗ = (V ∗,E ∗) is the result of the operation.
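Vertex identification can be sketched as a simple contraction; this illustrative version stores undirected edges as frozensets and uses our own names:

```python
def identify(V, E, A, u):
    """Contract the vertex set A into a single new vertex u:
    remap every endpoint in A to u and drop resulting self-loops."""
    remap = lambda x: u if x in A else x
    V_star = {remap(v) for v in V}
    E_star = {frozenset((remap(x), remap(y)))
              for (x, y) in E if remap(x) != remap(y)}
    return V_star, E_star
```

Identifying two endpoints of an edge simply deletes that edge (it becomes a self-loop), while parallel edges collapse into one.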
Construction. Let C be the following community with 4 vertices and 5 edges:
[Diagram: a community C on vertices {1, 2, 3, 4} with 5 edges.]
Let (G, l) be an instance of vertex cover. For each edge e = (u, v) ∈ E, we create a copy Ce of C. We associate edge (1, 2) in Ce with u, and edge (3, 4) with v. Now form the disjoint union G̅ = ⊔_{e ∈ E} Ce of the copies. Finally, for each vertex v in G, identify in G̅ all vertices incident to the edges with which v is associated. The resulting graph will be called G*. Together with the collection C = {Ce : e ∈ E}, (G*, C, l) forms an instance of the decision version of DBC, where we consider Ce in G* to be the set of vertices of Ce in G̅ after identification.
Example. To illustrate the above construction, consider an instance of vertex cover (G, l) where G is a triangle on vertices u, v, w. The disjoint union G̅ consists of three copies of C: one on vertices 1–4 for edge (u, v) (edge (1, 2) associated with u, edge (3, 4) with v), one on vertices 5–8 for edge (v, w) (edge (5, 6) with v, edge (7, 8) with w), and one on vertices 9–12 for edge (w, u) (edge (9, 10) with w, edge (11, 12) with u). Then vertices {1, 11} and {2, 12} are identified, corresponding to the edges associated with u, and likewise for the other vertices of G, yielding G* with merged vertex pairs {1, 11}, {2, 12}, {3, 5}, {4, 6}, {7, 9}, {8, 10}.
Given instance (G , l) of vertex cover, it remains to be shown that a solution for
(G*, C, l) yields a solution for (G, l). Each community Ce meets the density requirement to be a community by only a single edge, so removing one internal edge breaks it. Since none of the edges in Ce other than (1, 2) and
(3, 4) are shared with any other community in C , we can assume that only (1, 2) or (3, 4)
(after identification) is removed from any given community. Thus, each edge that is a
candidate for removal corresponds to a unique vertex in G .
Thus, given a solution B of at most l edges whose removal breaks C , we get a set
A of vertices corresponding to the edges in B. This set A is a vertex cover of G . To see
this let e ∈ E . Then Ce is broken by removing B. Thus, one of the edges corresponding
to the vertices of e must be in B; hence at least one vertex of e is in A.
By similar argument, a feasible vertex cover for (G , l) gives rise to a feasible
solution of (G ∗,C , l).
5.1.3 Solutions to DBC
In this section, we provide an approximation algorithm for DBC with a theoretical
performance guarantee. In doing so, we first reduce DBC to the set multicover problem,
in a way that preserves the approximation ratio for set multicover. We then apply
solutions of set multicover to our problem. The challenging part of this approach is reducing one problem to another while preserving the approximation ratio.
Definition 8 (Approximation ratio preserving reduction). Let Π1 and Π2 be minimization problems.

Let f be a polynomial-time algorithm such that if I1 is an instance of Π1, then I2 = f(I1) is an instance of Π2 with OPT(I2) ≤ OPT(I1); that is, the value of the optimal solution to I2 is at most the value of the optimal solution to I1.

Let g be a polynomial-time algorithm such that if t is a solution of I2 = f(I1), then s = g(I1, t) is a solution of I1 whose objective function value is not more than that of t; that is, obj_Π1(I1, s) ≤ obj_Π2(I2, t).

Then, by use of f and g, an α-approximation for Π2 yields an α-approximation for Π1.
Consider the following problem.

Definition 9 (Set multicover).

    minimize    ∑_{j=1}^{m} x_j
    subject to  Ax ≥ b,
                0 ≤ x ≤ u, x integer,

where A is an n × m matrix (a_ij), a_ij ∈ {0, 1}, b_i ∈ N for i ∈ {1, ... , n}, and u_j ∈ N for j ∈ {1, ... ,m}.
We have defined set multicover as an integer program for convenience, but one
may think of row i of A as listing the sets to which element i belongs, b_i as the number
of times element i must be covered, and x_j as the number of times set j is picked,
bounded above by u_j.
Next, we define an approximation ratio preserving reduction from DBC to set
multicover. Let I1 be an instance of DBC, consisting of a graph G = (V ,E) and a set
of communities C to be broken, and suppose each Ci ∈ C requires ki edges to be
removed. The set multicover instance I2 is defined in the following way. The set of
elements to be covered is C , with bi = ki . For each e ∈ E , define Ae := {C ∈ C : e ∈ C};
these sets form the collection of subsets of C from which we choose the multicover.
Finally, define ue , the maximum number of times Ae can be chosen, to be
|{f ∈ E : Af = Ae}|.
Thus, I2 is a valid instance of set multicover. Now, any feasible solution s of I1
corresponds in a natural way to a feasible solution t of I2 of equal cost. List the edges
removed in s: e1, e2, ... , ek . For each edge ei , add one to the number of times Aei is
chosen. This procedure clearly results in a feasible solution t of I2 of equal cost to s.
Thence, OPT (I2) ≤ OPT (I1).
Now, let t be a feasible solution of I2. It consists of a collection {(Ae, xe)} of subsets
of C together with the number of times each subset is chosen. To construct s: for each
subset Ae , pick xe edges f such that Af = Ae . This is possible since xe ≤ ue , where ue
is the number of edges satisfying this condition. The cost of s is equal to the cost of t.
Hence, we have an approximation-preserving reduction.
Set multicover as defined above has a log k-approximation algorithm [27], where k
is the number of elements to be covered. If we combine this algorithm with the above
reduction, we have a log k-approximation algorithm for DBC, where k is the number of
communities to be broken.
We present the approximation algorithm in Alg. 6, labeled CVA (Community
Vulnerability Assessment). The gain function f (e) indicates the number of unbroken
communities that the edge e belongs to. In each iteration we pick the edge with the
highest gain until all the communities in C are broken. The DeletionVector D contains
the number of edges that must be removed from each community in order to break it,
and is updated each time an edge is removed from a community. Once all the
necessary edges to break a community Ci have been removed, i.e., when Di reaches 0,
the community is broken and the gain function f (e) is updated.
Algorithm 6 CVA: An approximation algorithm for finding the critical edges
Data: Network G = (V ,E), DeletionVector D, C , |C | = k
Result: A set S ⊆ E of edges
  S ← ∅; B ← ∅
  for each edge e ∈ E do
      compute the gain f (e)
  while |B| < k do
      e ′ ← argmax_{e ∈ E\S} f (e)   (in case of a tie, choose randomly)
      S ← S ∪ {e ′}
      for l = 1 to k do
          if Cl /∈ B and e ′ ∈ Cl then
              Dl ← Dl − 1
              if Dl ≤ 0 then
                  B ← B ∪ {Cl}
                  f (e) ← f (e)− 1 for all e ∈ Cl
  return S
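The greedy selection above can be sketched in Python. This is an illustrative implementation with our own names: communities are given as sets of edges, and `D` holds the required removal count per community (the gain is recomputed on the fly rather than decremented, which is equivalent for the edges picked):

```python
def cva(edges, communities, D):
    """Greedy CVA sketch.

    communities: dict community id -> set of edges it contains
    D:           dict community id -> removals needed to break it
    Returns the set S of removed edges once every community is broken.
    """
    D = dict(D)                    # work on a copy
    unbroken = set(communities)
    S = set()
    while unbroken:
        # gain f(e): number of unbroken communities containing e
        best = max((e for e in edges if e not in S),
                   key=lambda e: sum(1 for c in unbroken
                                     if e in communities[c]))
        S.add(best)
        for c in list(unbroken):
            if best in communities[c]:
                D[c] -= 1
                if D[c] <= 0:
                    unbroken.remove(c)
    return S
```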
5.2 A General Framework
We now discuss the breaking community problem in the context of a general
community detection algorithm. There are a plethora of community detection algorithms
with different objective functions. Thus, we define what it means to break a community
for an arbitrary community detection algorithm as follows.
Definition 10. (Broken Community) Consider a community detection algorithm A ,
which produces a collection C of communities on graph G (written C = A (G)). Let G ′
be the new graph after removal of a set of edges, and let C ′ = A (G ′). Let γ ∈ (0, 1). A
community C ∈ C is said to be broken in graph G ′ if there does not exist a community
C ′ ∈ C ′ satisfying (i) C ′ ⊂ C , and (ii) |C ′|/|C | > γ.
We introduce the strictness threshold γ, which controls how much similarity the two
structures may share, in terms of the number of common nodes, once the community is
broken after edge removal. The larger this threshold, the less strict the requirement is,
and vice versa.
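Definition 10 translates directly into a membership test. The sketch below uses our own names; it checks with ⊆ where the definition writes a proper subset, so a community that survives fully intact (C ′ = C) also counts as not broken, which matches the intended reading:

```python
def is_broken(C, detected, gamma):
    """C (a node set) is broken if no community C' detected after edge
    removal lies inside C with |C'|/|C| > gamma."""
    C = set(C)
    return not any(set(Cp) <= C and len(Cp) / len(C) > gamma
                   for Cp in detected)
```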
Accordingly, the Broken Community Assessment (BCA) problem is formulated as
follows:
Definition 11. (BCA) Given a network represented by a graph G = (V ,E) and a specific
set C of k communities, BCA seeks a minimum-cardinality subset S ⊆ E such that the
removal of S from G breaks every community in C .
Solution for the General Case
Let ε > 0. Define a c-way ε-balanced partition of a graph to be a partition with c
components, such that each component A satisfies |A| < (1 + ε)n/c [37].
Lemma 3. Partitioning a community C into at least c ε-balanced subparts, where
γc ≥ 1 + ε, makes it broken.
Proof. After a balanced partitioning of C into c subparts, each part has fewer than
(1 + ε)n/c vertices, where n = |C |. Now, let γc ≥ 1 + ε, and let A be a component. Then

    |A| < (1 + ε)n/c = γ(1 + ε)n/(γc) ≤ γ(1 + ε)n/(1 + ε) = γn.

Finally, any community C ′ detected within C must lie within one of the components A;
so |C ′| < γ|C |, and the community C is broken.
We devise Alg. 7, based on Lemma 3, for solving the BCA problem. To find a
solution, c should satisfy the condition γc ≥ 1 + ε; we then partition each community into
c balanced components. The proposed Critical Community Fragility (CCF) algorithm
follows.
Algorithm 7 CCF: A heuristic algorithm for breaking communities
Data: Network G = (V ,E), k communities C , strictness threshold γ
Result: A set S ⊆ E of edges
  S ← ∅
  c ← least integer z satisfying zγ ≥ 1 + ε
  for each community Ci ∈ C do
      compute the c-way balanced partitioning of Ci [37]
      Cuti ← set of edges to cut Ci into c parts
      S ← S ∪ Cuti
  return S
For each of the k target communities, Alg. 7 first determines the number of parts
into which the community must be partitioned to be broken, as per the general
definition. Each target community is then divided into c parts by the balanced
partitioning algorithm proposed in [37]. The edges that lie between different parts are
subsequently removed to ensure that the community is broken.
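The number of parts c used by CCF is the least integer satisfying cγ ≥ 1 + ε, which can be computed directly (a one-line sketch; `parts_needed` is our name, and the default ε = 0.03 matches the value used in the experiments):

```python
import math

def parts_needed(gamma, eps=0.03):
    """Least integer c with c * gamma >= 1 + eps (Lemma 3 condition)."""
    return math.ceil((1 + eps) / gamma)
```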
5.3 Broken Community Analysis: Constraint on Edge Removal
We have thus far devised formulations that break communities by choosing the
minimum number of edges required to break all of the given k communities. In real life,
the choice of edges is often constrained by a fixed budget. In that case, it is preferable
to extract those critical edges whose removal breaks as many communities as possible.
For instance, under a fixed budget, in order to limit the spread of misinformation [79, 80]
in OSNs or to stop worm propagation in cellular networks, one might want to safeguard
as many affected communities as possible.
In this section we investigate the broken community problem from a different angle:
maximizing the number of broken communities within an allowed budget, i.e., deleting
at most k edges, defined as follows:
Definition 12. Given a network represented by a graph G = (V ,E), a set C =
{C1,C2, ... ,Cl} of communities, and a positive integer k ≤ m, the problem seeks a
subset S ⊆ E of edges with |S | ≤ k such that the number of broken communities in C
is maximized after the removal of S .
Based on the definition of broken community, we define two variants of the above
problem, k-DBC and k-BCA, in the following subsections.
5.3.1 k-Density-based Broken Community
The k-Density based Broken Community (k-DBC) problem is formulated according
to the definition of broken community in Section 5.1.1: a community is broken if its
internal density falls below the threshold given by Equation 5–1 as edges are removed.
The set of ki edges inside community Ci , for all i ∈ {1, ... , l}, whose removal will break
that community is determined by Equation 5–2.

Introduce a binary variable xj for each edge ej ∈ E , j ∈ {1, ... ,m}, whose value is
set to 1 if ej is chosen for removal, and a binary variable zi denoting whether community
Ci is broken or not. The IP formulation of this problem is given below:
    maximize    ∑_{i=1}^{l} zi

    subject to  ∑_{ej ∈ Ci} xj ≥ ki zi ,  ∀Ci ∈ C

                ∑_{j=1}^{m} xj ≤ k

                xj ∈ {0, 1},  ∀j ∈ {1, ... ,m}

                zi ∈ {0, 1},  ∀i ∈ {1, ... , l}
Solution of k-DBC. When ki = 1 for all i ∈ {1, ... , l}, the above integer program
resembles the Maximum Coverage problem, a well-known NP-complete problem [7];
thus, k-DBC is NP-complete. We propose the greedy algorithm k-CVA for solving this
problem in Alg. 8. k-CVA keeps removing the edge with the highest gain f (e), which
measures the number of unbroken communities e belongs to, until k edges have been
removed.
Algorithm 8 k-CVA: A greedy algorithm for finding the critical edges
Data: Network G = (V ,E), DeletionVector D, k ≤ m, set of communities C
Result: A set S ⊆ E of k edges, set B of broken communities
  S ← ∅; B ← ∅
  while |S | < k do
      for each edge e ∈ E do
          compute the gain f (e)
      e ′ ← argmax_{e ∈ E\S} f (e)
      S ← S ∪ {e ′}
      for each community Cl ∈ C do
          if e ′ ∈ Cl then
              Dl ← Dl − 1
              if Dl ≤ 0 then
                  f (e) ← f (e)− 1 for all e ∈ Cl
                  B ← B ∪ {Cl}
  return S , B
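A sketch of k-CVA in Python, with our own names (communities as sets of edges, `D` as the per-community removal requirement):

```python
def k_cva(edges, communities, D, k):
    """Remove at most k edges greedily by gain; return the removed
    edges and the set of broken communities."""
    D = dict(D)
    S, broken = set(), set()
    while len(S) < k:
        candidates = [e for e in edges if e not in S]
        if not candidates:
            break
        # gain f(e): number of unbroken communities containing e
        best = max(candidates,
                   key=lambda e: sum(1 for c in communities
                                     if c not in broken
                                     and e in communities[c]))
        S.add(best)
        for c in communities:
            if c not in broken and best in communities[c]:
                D[c] -= 1
                if D[c] <= 0:
                    broken.add(c)
    return S, broken
```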
5.3.2 A General Framework: k-Broken Community Assessment
The k-Broken Community Assessment (k-BCA) problem defined in the context of a
general community detection algorithm is formulated based on the following definition of
broken community (defined in Section 5.2):
Definition 13. A community C in graph G is said to be broken in graph G ′ = G\E ′ if,
after removal of the edge set E ′ from G , there does not exist a community C ′ in G ′
satisfying (i) C ′ ⊂ C , and (ii) |C ′|/|C | > γ.
a) Solution for the General Case: We devise a greedy algorithm in Alg. 9, labeled
CEL (Critical Edge Locator), for solving the k-BCA problem. As the number of edges is
constrained, CEL removes those edges that will break the communities apart. To this
end, we introduce a metric for edge importance inside a community based on the
common neighborhood of the edge's endpoints. For an edge (u, v), the Common
Neighbor Index (CNI) is the number of common neighbors of vertices u and v , i.e.,
CNI (u, v) = |N(u) ∩ N(v)|, where N(u) denotes the set of neighbors of u in G for all
u ∈ V .
The CNI indicates how important an edge is in keeping the community connected.
If an edge has a small CNI value, removing it will facilitate the breaking of the
community, since very few (if any) common neighbors exist to keep the different parts
of that particular community connected. Hence, CEL chooses these edges over those
with high CNI values. Moreover, since the goal is to break as many communities as
possible, CEL not only ranks edges by CNI value but also takes into consideration the
cut size (the number of edges required for balanced partitioning) of the community an
edge belongs to when calculating the final weight of each edge. CEL prioritizes edges
that belong to communities requiring fewer cut edges. Additionally, to avoid choosing
bridge edges that connect two different communities, CEL assigns them a higher
weight value.
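The CNI and the resulting edge weight can be computed as follows. This is an illustrative sketch with our own names: `adj` maps each node to its neighbor set, `comm_of` maps nodes to their community, and `cut_size` holds the balanced-partitioning cut size |Cut_i| of each community:

```python
def cni(adj, u, v):
    """Common Neighbor Index: |N(u) ∩ N(v)|."""
    return len(adj[u] & adj[v])

def cel_weight(adj, e, comm_of, cut_size):
    """CEL edge weight: CNI plus the cut size of the edge's community;
    inter-community bridge edges get infinite weight so they are
    never selected."""
    u, v = e
    if comm_of[u] != comm_of[v]:       # bridge between two communities
        return float('inf')
    return cni(adj, u, v) + cut_size[comm_of[u]]
```

CEL then simply selects edges in increasing order of this weight.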
Algorithm 9 CEL: A heuristic for finding the critical edges
Data: Network G = (V ,E), set of communities C , strictness threshold γ, k ≤ m
Result: A set S ⊆ E of at most k edges, set B of broken communities
  S ← ∅
  c ← least integer z satisfying zγ ≥ 1 + ε
  for each community Ci ∈ C do
      compute the c-way balanced partitioning of Ci [37]
      Cuti ← set of edges to cut Ci into c sub-parts
  for each edge e = (u, v) ∈ E do
      calculate CNI (u, v)
      find the community containing e with the smallest cut, Cs ← argmin_{Cj : e ∈ Cj} |Cutj |
      w(u,v) ← CNI (u, v) + |Cuts |
      if u ∈ Ci and v ∈ Cj with i 6= j then
          w(u,v) ←∞
  while |S | < k do
      e ← argmin_{(u,v) ∈ E\S} w(u,v)
      S ← S ∪ {e}
  find the set B of communities that are broken
  return S , B
The greedy algorithm chooses k edges, starting with those having the smallest
weights, and finally identifies the list of broken communities after the removal of the
selected edges, according to the definition of broken community.
5.4 Experimental Evaluation
Our goals in this section are to: 1) evaluate the performance of our proposed
algorithms CVA and k-CVA by comparing them to the optimal solutions, and 2) assess
the strength of communities using the algorithms CCF and CEL.
5.4.1 Data Set
Set up: We use data sets from well-known social, collaboration and communication
networks which exhibit inherent community structure in their organization. The Facebook
data [70] we are using consists of the social network interactions between Rice
University graduate students and contains strongly connected components. The Arxiv
Condensed Matter Physics collaboration network is obtained from the e-print database
[23] and covers scientific collaborations between authors who have submitted papers to
the Condensed Matter category. If an author i co-authored a paper with author j , the
graph contains an undirected edge between i and j . If a paper is co-authored by k
authors, this generates a completely connected (sub)graph on k nodes. We have
further considered the Enron email communication network dataset [47]. A summary of
the data sets is given in Table 5-1.
Table 5-1. Experimental datasets

Data Set        Node Count   Edge Count
Facebook [70]   4039         88234
Arxiv [23]      23133        93439
Enron [47]      36692        183831
5.4.2 Performance Evaluation of CVA
To test how different communities behave under the DBC formulation, we compare
the result of CVA with the optimal Integer Programming (IP) solution. We choose the k
largest communities based on their number of nodes. For each k , a minimum number
of edges is chosen by Alg. 6 and removed from the network. The total set of edges
required to break all k communities, according to the definition in Equation 5–1, is then
plotted against the optimal solution. All tests are averaged over 500 runs for
consistency.
IP Formulation
We formulate the DBC problem as an IP so that its optimal solution can be
compared with the performance of CVA. The IP is solved using the CPLEX package [36].
Let the variable zi represent each edge ei ∈ E :

    zi = 1 if ei is selected for removal, and zi = 0 otherwise.    (5–3)
For each Cj ∈ C = {C1, ... ,Ck}, let kj be the number of edges required to break Cj as
defined in Equation 5–2. Then we have the following IP:
    minimize    ∑_{i=1}^{m} zi

    subject to  ∑_{ei ∈ Cj} zi ≥ kj ,  ∀Cj ∈ C ,

                zi ∈ {0, 1},  ∀i ∈ {1, ... ,m}
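For tiny instances, the behavior of this IP can be illustrated with an exhaustive search in Python (a stand-in for the CPLEX solve, usable only on very small graphs; the function and data layout are ours):

```python
from itertools import combinations

def dbc_optimal_bruteforce(edges, communities, k_req):
    """Return a smallest edge set S containing at least k_j edges of
    every community C_j (brute force; illustration only)."""
    for size in range(len(edges) + 1):
        for cand in combinations(edges, size):
            S = set(cand)
            if all(len(S & communities[c]) >= k_req[c]
                   for c in communities):
                return S
```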
[Figure 5-1 panels: A Facebook, B Arxiv, C Enron. x-axis: k broken communities; y-axis: edges removed; curves: Optimal, CVA.]
Figure 5-1. Density based broken community analysis for k largest community
Figure 5-1 depicts the number of edges that must be removed to break a total of
100 communities. As can be seen, the performance of CVA is very close to optimal for
all the data sets, with only a negligible deviation on the Arxiv data set, as Figure 5-1B
depicts. We thus conclude that CVA performs very well.
5.4.3 Performance Evaluation for Generalized Framework
We provide the comparative analysis of the behavior of different networks under two
community detection algorithms. For this purpose, we use Blondel [17] and Oslom [43].
The first one is a modularity based community detection scheme which has been shown
to produce very good modular components in timely manner [42]. On the other hand,
the latter one is based on statistical properties of the graph which allows overlapping
communities. The characteristics of different networks detected by these two community
detection algorithms is shown in Table 5-2.
Table 5-2. Network communities

Data Set   Community Count   Community Count
           in Blondel        in Oslom
Facebook   17                118
Arxiv      620               1764
Enron      1265              1374
As a first approach to observing how communities behave under sustained edge
removal, we target k large communities with CCF. To this end, we choose two values of
the strictness threshold γ, 0.5 and 0.3, both of which satisfy γc ≥ (1 + ε). For all of the
experiments, we set ε = 0.03 for the balanced partitioning. The threshold 0.5 is less
strict than 0.3 in the sense that it allows more nodes to be retained even after breaking
the community. We also show the behavior of CCF for k randomly selected
communities and for the k smallest communities. For the Facebook network with the
Blondel algorithm, we try to break all 17 detected communities; in all other cases we
take 30 communities. The plotted results are averaged over 100 runs to smooth out
inconsistencies as much as possible.
Figure 5-2 shows the performance of different types of communities obtained
through different community detection algorithms for different strictness threshold
(γ) as we remove edges using CCF. In this figure, we are considering the k largest
communities which were chosen based on their respective number of nodes. From
[Figure 5-2 panels: A Facebook γ=.5, B Facebook γ=.3, C Arxiv γ=.5, D Arxiv γ=.3, E Enron γ=.5, F Enron γ=.3. x-axis: k communities; y-axis: % edges removed; curves: oslom, blondel, CVA.]

Figure 5-2. Edge removal count by greedy algorithm CCF for breaking the k largest communities. γ = 0.5 in first column, γ = 0.3 in second column
Figure 5-2, first column, it is clearly evident that removing only a small fraction of edges
causes the communities to be broken for γ = 0.5. On average, removing at most 20%
of edges is enough to break all 17 communities in the Facebook network detected
through Blondel, as can be seen in Figure 5-2A. In the case of Oslom, it takes many
more edges (a little more than 40% on average), which implies that the larger
communities detected by Oslom are more strongly connected internally.
For the Arxiv network, the number of edges required to break 30 large communities
is only 4% on average for the Blondel algorithm and 15% for Oslom, as shown in Figure
5-2C. For Arxiv with the same γ value and an equal number of communities, Oslom
requires comparatively fewer edge removals than in the case of Facebook. This implies
that community members in this particular Facebook network are more densely
connected internally than in the Arxiv network, and as a result it was easier to break
Arxiv communities with a small number of edges. The same observation applies to the
Enron network, as portrayed in Figure 5-2E.
As we break more and more communities in Enron, Oslom requires a decreasing
number of edges on average to break them: the smaller the community, the fewer
edges are needed to break it. In all of these cases we have plotted the performance of
CVA alongside, to visualize how communities detected by different community detection
algorithms compare, in terms of breaking k communities, to those detected by the
density-based approach. In all cases, CVA requires very few edges to break all the
communities. In general, communities detected by Oslom require more edge removals
than any other approach. One reason for this behavior is that Oslom produces
overlapping communities, which consequently require more edges to break.
The second column of Figure 5-2 depicts the behavior of the different networks for
γ = 0.3. The strictness imposed by this γ necessitates more edge removals, since
fewer nodes are allowed to be retained if the community is to be broken. This is evident
from Figures 5-2B, 5-2D and 5-2F. Notably, the need for more edge removals holds
equally for both community detection algorithms, Blondel and Oslom. The percentage
of edges needed to break even a single (k = 1) community nearly doubles when we
decrease γ from 0.5 to 0.3. Even under this stricter setting, with the Blondel algorithm
as little as 7% edge removal for Arxiv and 24% for Enron on average suffices to break k
communities. Facebook communities, being strongly connected with more internal
edges as seen in the earlier cases, require more (35%)
[Figure 5-3 panels: A Facebook γ=.5, B Facebook γ=.3, C Arxiv γ=.5, D Arxiv γ=.3, E Enron γ=.5, F Enron γ=.3. x-axis: k communities; y-axis: % edges removed; curves: oslom, blondel, CVA.]

Figure 5-3. Edge removal count by CCF for breaking k randomly selected communities. γ = 0.5 in first column, γ = 0.3 in second column
edges to break all selected communities. Oslom again needs more edge removals than
Blondel to break the same number of communities, consistent with the behavior we
have observed for Oslom so far.
Next, we consider k randomly selected communities in Figure 5-3. The results
corroborate the earlier observation that breaking a community requires more edges for
Oslom than for Blondel. Only a small percentage of edge removals breaks all k
communities for both the Arxiv and Enron networks. Facebook communities, as
mentioned for the other cases, require more edges to break all the selected
communities. As the threshold γ becomes more stringent, from 0.5 to 0.3, more edges
must be removed, as can be seen from the second column.
[Figure 5-4 panels: A Facebook γ=.5, B Facebook γ=.3, C Arxiv γ=.5, D Arxiv γ=.3, E Enron γ=.5, F Enron γ=.3. x-axis: k communities; y-axis: % edges removed; curves: oslom, blondel, CVA.]

Figure 5-4. Edge removal count by CCF for breaking the k smallest communities. γ = 0.5 in first column, γ = 0.3 in second column
For the k smallest communities we observe behavior similar to that of the large
communities, except that this time Oslom requires a comparatively smaller number of
edge removals, as can be seen in Figure 5-4. For both the Arxiv and Enron datasets,
CVA requires a large number of edges to break the communities: for small
communities, more edges must be removed before the internal density drops below the
threshold, resulting in comparatively more edge removal. The same observation holds
for randomly selected communities, since some small communities may be chosen in a
random selection. Nevertheless, the interesting outcome we can distill from all of these
figures is that in many cases few edges are enough to break communities.
Impact of the Location of Communities: To understand how the communities
are inter- and intra-connected, and what impact their relative structural position in the
network has on their vulnerability, we consider three different cases for each data set.
We choose two communities at random according to three criteria and measure how
difficult they are to break by removing the optimal number of critical edges. The first
criterion chooses two communities with no connecting edges between them, i.e.,
non-adjacent communities. The second opts for two adjacent communities and tries to
break them based only on the internal connections of each community, without taking
the inter-community edges into consideration. The third does the same as the second,
but this time takes the inter-community edges into consideration while breaking the
communities. We call the first criterion 'non-adjacent community', the second 'adjacent
community, inter-edge not considered', and the third 'adjacent with inter-edge'. Table
5-3 shows the percentage of edges needed to break two communities under each
criterion. We used the Blondel community detection algorithm with γ = 0.3 and ran over
50 different combinations of random communities.
Table 5-3. Network characteristics

Data Set   Non-adjacent   Adjacent community,        Adjacent community
           community      inter-edge not considered  with inter-edge
Facebook   12%            13%                        12.4%
Arxiv      15%            11%                        11%
Enron      20%            18%                        17%
Intuitively, communities with connecting in-between edges should be easier to
break due to the attraction of the neighboring communities. However, the analysis in
Table 5-3 shows that this is not generally true. Moreover, for breaking two non-adjacent
Figure 5-5. A small community detected by Oslom for γ = 0.3 in the Enron network. The internal structure shows parts connected through a small number of edges. Our greedy algorithm removes the pink cut edges.
Figure 5-6. A community detected by Oslom for γ = 0.3 in the Facebook network. The internal structure shows parts connected through a small number of edges, shown in pink.
and adjacent communities, the results are quite similar to those seen in Figure 5-3 for
random communities. The outcome of this experiment is consistent with what CCF
does, and it reinforces the conjecture that communities are in fact easy to break. We
have also observed that in some cases removing as little as 1% of edges is enough to
break a community. To explore one of the reasons behind this more closely, we
consider a small community detected by the Oslom algorithm in the Enron data set,
depicted in Figure 5-5. The internal structure appears modular, connected through a
small number of important edges. This justifies our approach of partitioning each
community into parts to break it. Interestingly, the critical edges selected by CCF are
exactly the ones shown in pink in this figure. This shows that
communities can be broken by removing a few crucial edges that hold the different
parts of a community together.

We observe similar edges in Figure 5-6, in another randomly selected community
detected by Oslom in Facebook. These edges act as the connecting force in a
community, and their removal results in a broken community.
[Figure 5-7 panels: A Facebook, C Arxiv, E Enron (x-axis: k edges; y-axis: % communities broken; curves: Optimal, k-CVA); B Facebook, D Arxiv, F Enron, all γ = 0.5 (x-axis: % k edges; y-axis: % communities broken; curves: Blondel, Oslom).]

Figure 5-7. Broken Community Analysis. k-DBC in 1st column, outcome of CEL on k-BCA in 2nd column
5.4.4 Analysis of the Edge Constrained Version
5.4.4.1 Results for k-DBC problem
We show empirical results on real-world networks, including the Arxiv collaboration
network, Facebook, and the Enron email dataset, and compare the result of k-CVA with
the optimal IP solution. As can be seen from the first column of Figure 5-7, the greedy
algorithm performs very close to the optimal one, within a small bound, for all the data
sets in terms of the percentage of communities broken as we increase k from 1 to 25.
5.4.4.2 Results for k-BCA problem
We vary the budget k , as a percentage of the total edges in the network, from 1%
to 10% for γ = 0.5, as can be seen from the second column of Figure 5-7. In the
Facebook dataset (Figure 5-7B), more than 94% of the communities identified by
Blondel are broken after only 10% edge removal. With the same amount of edge
removal in the Arxiv network (Figure 5-7D), around 60% of the communities detected
by Blondel are broken. The same trend is seen for the Enron network (Figure 5-7F).
In all cases, communities identified by Blondel are more resilient than the communities
found by Oslom. In short, removing only a small percentage (10%) of edges breaks
almost all of the communities identified by Blondel and Oslom. Overall, communities
are vulnerable to edge removal.
5.5 Summary
We made a novel attempt to study the community vulnerability problem for
assessing system fragility under edge removal. We formulated the density-based
broken community problem and showed its complexity. We also provided an efficient
approximation algorithm for solving this problem, after proving its ratio. In addition,
we proposed a heuristic, CCF, for solving the general version of the breaking community
problem. Experimental results on real-world data under this newly defined framework
provided insight into the underlying community structure. We found
that communities in real-world networks are susceptible to edge failures and that in
many cases the failure of only a small number of critical edges can break major
communities in the network.
CHAPTER 6
CONCLUSION
In this dissertation, we have analyzed the importance of community structure for
the performance of real-world cellular networks. We have studied the impact of device
mobility on the performance of a multi-hop D2D underlaying cellular network. We have
introduced a novel model that considers durable communities, based on the social
encounters of devices, for predicting the likelihood of devices' proximity. We have
formulated the reliable device selection problem as an IP optimization problem and
proposed an efficient heuristic algorithm to solve it. We have also shown that
leveraging social communities can increase the content delivery rate in multi-hop D2D.
Simulation results show that our proposed method significantly outperforms classical
social-unaware methods in terms of traffic offload. The results also show that the
proposed method achieves its objectives with manageable computational complexity,
which makes it applicable to larger networks.
We have also studied the benefit of social-aware multi-hop D2D for video content
delivery to multicast users under practical constraints imposed by the eNB for relay
selection. We have formulated a novel problem, MRS, for minimizing content delivery
time to a group of users and shown its NP-completeness. We have introduced a mixed
integer programming formulation to express MRS and proposed a heuristic scheme to
solve the problem efficiently. Our proposed social-aware solution minimizes the Base
Station cost efficiently by relaying video content to a set of relay devices which, in
turn, transmit the content via multi-hop D2D to other devices with poorer channel
conditions that could not receive the content from the eNB. We further discussed a
special case of the proposed generic problem and analyzed its complexity. Moreover, we
provided an approximation algorithm for this special case with a provable performance
guarantee. Simulation results showed that our proposed methods significantly
outperformed existing state-of-the-art methods in terms of minimizing content delivery
time.
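The flavor of relay selection under coverage constraints can be illustrated with a classic greedy set-cover sketch. To be clear, this is a generic textbook heuristic under assumed inputs, not the dissertation's MRS algorithm: the coverage map `candidates` (relay to set of users it can reach via D2D) is a hypothetical input, and the standard greedy rule carries the well-known logarithmic approximation guarantee for set cover.

```python
def greedy_relay_selection(candidates, users):
    """Pick relays until every user is covered.

    candidates: dict mapping each candidate relay to the set of users it
    can reach (hypothetical input, e.g. derived from D2D link quality).
    Repeatedly choose the relay covering the most still-uncovered users,
    i.e. the classic greedy set-cover heuristic.
    """
    uncovered = set(users)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda r: len(candidates[r] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("some users are unreachable by any relay")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

For instance, with relays reaching {1, 2, 3}, {3, 4}, and {4, 5}, the greedy rule covers all five users with two relays.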
We have also made a novel attempt to study the community vulnerability problem
for assessing system fragility under edge removal. We formulated the density-based
broken community problem and showed its complexity. We have also provided an
efficient approximation algorithm for solving this problem and proved its approximation
ratio. In addition, we have proposed a heuristic, CCF, for solving the general version
of the breaking community problem. Moreover, we have discussed a variant of each of
these two problems in which the choice of edges is constrained and the goal is to
maximize the number of broken communities. Experimental results on real-world data
under this newly defined framework gave us insightful knowledge about the underlying
community structure. We have observed that communities in real-world networks are
susceptible to edge failures and that, in many cases, the failure of only a small number
of critical edges can break major communities in the network.
REFERENCES
[1] Stanford Network Analysis Project. http://snap.stanford.edu/, 2016.
[2] 3GPP. “LTE-Advanced (3GPP Release 10 and beyond).” Tech. Rep. 36.300 (2011). URL http://www.3gpp.org

[3] ———. “Evolved universal terrestrial radio access (E-UTRA) and evolved universal terrestrial radio access network (E-UTRAN), Rel. 11.” Tech. Rep. 36.300 (2012).

[4] ———. “General aspects and principles for interfaces supporting multimedia broadcast multicast service (MBMS) within E-UTRAN, Rel. 11.” Tech. Rep. 36.440 (2012).

[5] ———. “Feasibility study for proximity services (ProSe) (Release 12).” Tech. Rep. 22.803 (2013).
[6] Afolabi, Richard O, Dadlani, Aresh, and Kim, Kiseon. “Multicast scheduling and resource allocation algorithms for OFDMA-based systems: A survey.” IEEE Communications Surveys & Tutorials 15 (2013).1: 240–254.

[7] Ageev, A. A. and Sviridenko, M. “Approximation algorithms for maximum coverage and max cut with given sizes of parts.” IPCO (1999): 17–30.

[8] Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. “Network flows.” DTIC Document (1988).

[9] Albert, R., Albert, I., and Nakarado, G. L. “Structural vulnerability of the North American power grid.” Phys. Rev. E 69 (2004).2.

[10] Alim, M. A., Pan, T., Thai, M. T., and Saad, W. “Leveraging Social Communities for Optimizing Cellular Device-to-Device Communications.” IEEE Transactions on Wireless Communications PP (2016).99: 1–1.

[11] Alim, Md Abdul, Kuhnle, Alan, and Thai, M. T. “Are Communities as Strong as We Think?” Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2014, 314–319.

[12] Alim, Md Abdul, Li, Xiang, Nguyen, Nam, Thai, My, and Helal, Abdelsalam. “Structural Vulnerability Assessment of Community-based Routing in Opportunistic Networks.” IEEE Transactions on Mobile Computing 15 (2016).12: 3156–3170.

[13] Alim, Md Abdul, Nguyen, Nam P., Dinh, Thang N., and Thai, My T. “Structural Vulnerability Analysis of Overlapping Communities in Complex Networks.” Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 01. WI-IAT ’14. Washington, DC, USA: IEEE Computer Society, 2014, 5–12. URL http://dx.doi.org/10.1109/WI-IAT.2014.10
[14] Alim, Md Abdul, Pan, Tianyi, Thai, My Tra, and Saad, Walid. “Leveraging Social Communities for Optimizing Cellular Device-to-Device Communications.” arXiv preprint arXiv:1611.01582 (2016).

[15] Araniti, Giuseppe, Condoluci, Massimo, Militano, Leonardo, and Iera, Antonio. “Adaptive resource allocation to multicast services in LTE systems.” IEEE Transactions on Broadcasting 59 (2013).4: 658–664.

[16] Asadi, A., Wang, Q., and Mancuso, V. “A survey on device-to-device communication in cellular networks.” IEEE Communications Surveys & Tutorials 16 (2014).4: 1801–1819.

[17] Blondel, V. D., Guillaume, J., Lambiotte, R., and Lefebvre, E. “Fast unfolding of communities in large networks.” J. Stat. Mech.: Theory and Experiment (2008).

[18] Borgatti, Stephen P. and Everett, Martin G. “A Graph-theoretic perspective on centrality.” Social Networks 28 (2006).4: 466–484.

[19] Botsov, Mladen, Klugel, Markus, Kellerer, Wolfgang, and Fertl, Peter. “Location dependent resource allocation for mobile device-to-device communications.” 2014 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2014, 1679–1684.

[20] Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., and Wagner, D. “On modularity clustering.” IEEE Transactions on Knowledge and Data Engineering 20 (2008).2: 172–188.

[21] Chen, Xiaohang, Chen, Li, Zeng, Mengxian, Zhang, Xin, and Yang, Dacheng. “Downlink resource allocation for device-to-device communication underlaying cellular networks.” 2012 IEEE 23rd International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE, 2012, 232–237.

[22] Cho, E., Myers, S., and Leskovec, J. “Friendship and mobility: user movement in location-based social networks.” In Proc. of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (2011): 1082–1090.

[23] ArXiv dataset. “http://www.cs.cornell.edu/projects/kddcup/datasets.html.” Proc. KDD Cup 2003. 2003.

[24] Diaz, Carlos G, Saad, Walid, Maham, Behrouz, Niyato, Dusit, and Madhukumar, AS. “Strategic device-to-device communications in backhaul-constrained wireless small cell networks.” 2014 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2014, 1661–1666.

[25] Dinh, Thang N, Nguyen, Nam P, Alim, Md Abdul, and Thai, My T. “A near-optimal adaptive algorithm for maximizing modularity in dynamic scale-free networks.” Journal of Combinatorial Optimization 30 (2015).3: 747–767.
[26] Dinh, Thang N., Xuan, Ying, Thai, My T., Pardalos, Panos M., and Znati, Taieb. “On new approaches of assessing network vulnerability: hardness and approximation.” IEEE/ACM Trans. Netw. 20 (2012).2: 609–619.

[27] Dobson, Gregory. “Worst-Case Analysis of Greedy Heuristics for Integer Programming with Nonnegative Data.” Mathematics of Operations Research 7 (1982).4: 515–531.

[28] Fodor, G., Dahlman, E., Mildh, G., Parkvall, S., Reider, N., Miklos, G., and Turanyi, Z. “Design aspects of network assisted device-to-device communications.” IEEE Comm. Mag. 50(3) (2012): 170–177.

[29] Fortunato, S. “Community detection in graphs.” Physics Reports 486 (2010).3-5: 75–174.

[30] Fortunato, S. and Castellano, C. “Community Structure in Graphs.” arXiv:0712.2716 (2007).

[31] Gao, Wei, Li, Qinghua, Zhao, Bo, and Cao, Guohong. “Multicasting in delay tolerant networks: a social network perspective.” Proc. tenth ACM international symposium on Mobile ad hoc networking and computing. MobiHoc ’09. New York, NY, USA: ACM, 2009, 299–308. URL http://doi.acm.org/10.1145/1530748.1530790

[32] Han, B., Hui, P., Kumar, V. S. A., Marathe, M. V., Shao, J., and Srinivasan, A. “Mobile data offloading through opportunistic communications and social participation.” IEEE Trans. Mobile Computing 11 (2012).5: 821–834.

[33] Hasan, Mohammed, Hossain, Ekram, and Kim, Dong In. “Resource allocation under channel uncertainties for relay-aided device-to-device communication underlaying LTE-A cellular networks.” IEEE Transactions on Wireless Communications 13 (2014).4: 2322–2338.

[34] Hui, P. and Crowcroft, J. “Human mobility models and opportunistic communications system design.” Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 366 (2008).1872: 2005–2016.

[35] Hui, Pan, Crowcroft, Jon, and Yoneki, Eiko. “Bubble rap: Social-based forwarding in delay-tolerant networks.” IEEE Transactions on Mobile Computing 10 (2011).11: 1576–1589.

[36] IBM. IBM ILOG CPLEX Optimization Studio. 2014. URL http://www-03.ibm.com/software/products/en/ibmilogcpleoptistud

[37] Karypis, George and Kumar, Vipin. “Multilevel k-way Partitioning Scheme for Irregular Graphs.” SIAM Review 2 (1998).41.
[38] Kaufman, B., Lilleberg, J., and Aazhang, B. “Spectrum sharing scheme between cellular users and ad-hoc device-to-device users.” IEEE Transactions on Wireless Communications 12 (2013).3: 1038–1049.

[39] Kim, Joongheon and Molisch, Andreas F. “Quality-aware millimeter-wave device-to-device multi-hop routing for 5G cellular networks.” 2014 IEEE International Conference on Communications (ICC). IEEE, 2014, 5251–5256.

[40] Kovacs, Istvan A., Palotai, Robin, Szalay, Mate S., and Csermely, Peter. “Community Landscapes: An Integrative Approach to Determine Overlapping Network Module Hierarchy, Identify Key Nodes and Predict Network Dynamics.” PLoS ONE 5 (2010).9: e12528.

[41] Kuhnle, Alan, Li, Xiang, and Thai, My T. “Online Algorithms for Optimal Resource Management in Dynamic D2D Communications.” Mobile Ad-hoc and Sensor Networks (MSN), 2014 10th International Conference on. IEEE, 2014, 130–137.

[42] Lancichinetti, A. and Fortunato, S. “Community detection algorithms: A comparative analysis.” Physical Review E 80 (2009).

[43] Lancichinetti, Andrea, Radicchi, Filippo, Ramasco, Jos J., and Fortunato, Santo. “Finding Statistically Significant Communities in Networks.” PLoS ONE 6 (2011).4: e18961.

[44] Lee, D., Kim, S., Lee, J., and Heo, J. “Performance of multihop decode-and-forward relaying assisted device-to-device communication underlaying cellular networks.” In Proc. of International Symposium on Information Theory and its Applications (2012): 455–459.

[45] Lee, Dong Heon, Choi, Kae Won, Jeon, Wha Sook, and Jeong, Dong Geun. “Resource allocation scheme for device-to-device communication for maximizing spatial reuse.” 2013 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2013, 112–117.

[46] Lee, K., Hong, S., Kim, S. J., Rhee, I., and Chong, S. “Slaw: A new mobility model for human walks.” In Proc. of IEEE International Conference on Computer Communications (2009): 855–863.

[47] Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters.” Internet Mathematics 6 (2009).1: 29–123.

[48] Li, Y., Hui, P., Jin, D., Su, L., and Zeng, L. “Evaluating the impact of social selfishness on the epidemic routing in delay tolerant networks.” IEEE Comm. Letters 14 (2010).11: 1026–1028.
[49] Lin, Xingqin, Andrews, Jeffrey G, Ghosh, Amitabha, and Ratasuk, Rapeepat. “An overview of 3GPP device-to-device proximity services.” IEEE Communications Magazine 52 (2014).4: 40–48.

[50] Lin, Y. and Hsu, Y. “Multihop cellular: A new architecture for wireless communications.” In Proc. of IEEE International Conference on Computer Communications 3 (2000): 1273–1282.

[51] Lu, Z., Wu, W, Chen, W, Zhong, J, Bi, Y, and Gao, Z. “The Maximum Community Partition Problem in Networks.” Discrete Math., Alg. and Appl. (2013).

[52] Luciano, Rodrigues, F.A., Travieso, G., and Boas, V. P. R. “Characterization of complex networks: A survey of measurements.” Advances in Physics 56 (2007).1: 167–242. URL http://dx.doi.org/10.1080/00018730601170527

[53] Ma, X., Yin, R., Yu, G., and Zhang, Z. “A distributed relay selection method for relay assisted device-to-device communication system.” In Proc. of 23rd International Symposium on Personal Indoor and Mobile Radio Communications (2012): 1020–1024.

[54] Madan, R., Borran, J., Sampath, A., Bhushan, N., Khandekar, A., and Ji, T. “Cell association and interference coordination in heterogeneous LTE-A cellular networks.” IEEE Journal on Selected Areas in Communications 28 (2010).9: 1479–1489.

[55] Militano, Leonardo, Condoluci, Massimo, Araniti, Giuseppe, Molinaro, Antonella, Iera, Antonio, and Muntean, Gabriel-Miro. “Single frequency-based device-to-device-enhanced video delivery for evolved multimedia broadcast and multicast services.” IEEE Transactions on Broadcasting 61 (2015).2: 263–278.

[56] Min, Hyunkee, Lee, Jemin, Park, Sungsoo, and Hong, Daesik. “Capacity enhancement using an interference limited area for device-to-device uplink underlaying cellular networks.” IEEE Transactions on Wireless Communications 10 (2011).12: 3995–4000.

[57] Nguyen, N. P., Alim, Md Abdul, Shen, Y., and Thai, M. T. “Assessing network vulnerability in a community structure point of view.” Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2013, 231–235.

[58] Nguyen, Nam P, Alim, Md Abdul, Dinh, Thang N, and Thai, My T. “A method to detect communities with stability in social networks.” Social Network Analysis and Mining 4 (2014).1: 1–15.
[59] Nunes, Ivan O, de Melo, Pedro OS Vaz, and Loureiro, Antonio AF. “Leveraging D2D Multi-Hop Communication Through Social Group Meetings Awareness.” IEEE Wireless Communications Magazine (2016): 1–9.

[60] Pei, Y. and Liang, Y. “Resource allocation for device-to-device communications overlaying two-way cellular networks.” IEEE Transactions on Wireless Communications 12 (2013).7: 3611–3621.

[61] Pew Research Center, Washington, D.C. “Social Media Update 2014.” (2014). URL http://www.pewinternet.org/2015/01/09/social-media-update-2014/

[62] Proebster, M., Kaschub, M., Werthmann, T., and Valentin, S. “Context-aware resource allocation for cellular wireless networks.” EURASIP Journal on Wireless Communications and Networking 2012 (2012).1: 1–19.

[63] Rebecchi, Filippo, Valerio, Lorenzo, Bruno, Raffaele, Conan, Vania, de Amorim, Marcelo Dias, and Passarella, Andrea. “A joint multicast/D2D learning-based approach to LTE traffic offloading.” Computer Communications 72 (2015): 26–37.

[64] Scripps, Jerry, Tan, Pang-Ning, and Esfahanian, Abdol-Hossein. “Node roles and community structure in networks.” Proc. 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. WebKDD/SNA-KDD ’07. 2007, 26–35.

[65] Semiari, Omid, Saad, Walid, Valentin, Stefan, Bennis, Mehdi, and Poor, H Vincent. “Context-Aware Small Cell Networks: How Social Metrics Improve Wireless Resource Allocation.” IEEE Transactions on Wireless Communications 14 (2015).11: 5927–5940.

[66] Sun, Yue, Wang, Tianyu, Song, Lingyang, and Han, Zhu. “Efficient resource allocation for mobile social networks in D2D communication underlaying cellular networks.” 2014 IEEE International Conference on Communications (ICC). IEEE, 2014, 2466–2471.

[67] Tan, Li, Feng, Zhiyong, Li, Wei, Jing, Zhong, and Gulliver, T Aaron. “Graph coloring based spectrum allocation for femtocell downlink interference mitigation.” In Proc. of Wireless Communications and Networking Conference (WCNC), 2011 IEEE (2011): 1248–1252.

[68] Vanganuru, K., Ferrante, S., and Sternberg, G. “System capacity and coverage of a cellular network with D2D mobile relays.” In Proc. of Military Communications Conference (2012).

[69] Vazirani, Vijay V. Approximation algorithms. Springer Science & Business Media, 2013.
[70] Viswanath, Bimal, Post, Ansley, Gummadi, Krishna P., and Mislove, Alan. “An analysis of social network-based Sybil defenses.” Proc. ACM SIGCOMM 2010 conference. SIGCOMM ’10. New York, NY, USA: ACM, 2010, 363–374.

[71] Wang, Fang, Li, Yong, Wang, Zhaocheng, and Yang, Zhixing. “Social-Community-Aware Resource Allocation for D2D Communications Underlaying Cellular Networks.” IEEE Transactions on Vehicular Technology 65 (2016).5: 3628–3640.

[72] Wang, Feiran, Xu, Chen, Song, Lingyang, and Han, Zhu. “Energy-efficient resource allocation for device-to-device underlay communication.” IEEE Transactions on Wireless Communications 14 (2015).4: 2082–2092.

[73] Wang, Feiran, Xu, Chen, Song, Lingyang, Zhao, Qun, Wang, Xiaoli, and Han, Zhu. “Energy-aware resource allocation for device-to-device underlay communication.” 2013 IEEE International Conference on Communications (ICC). IEEE, 2013, 6076–6080.

[74] Wang, L., Peng, T., Yang, Y., and Wang, W. “Interference Constrained Relay Selection of D2D Communication for Relay Purpose Underlaying Cellular Networks.” In Proc. of 8th International Conference on Wireless Communications, Networking and Mobile Computing (2012).

[75] Wang, Qin, Wang, Wei, Jin, Shi, Zhu, Hongbo, and Zhang, Nai Tong. “Game-theoretic source selection and power control for quality-optimized wireless multimedia device-to-device communications.” In Proc. of IEEE Global Communications Conference (GLOBECOM). IEEE, 2014, 4568–4573.

[76] Wang, Z. and Crowcroft, J. “Quality-of-service routing for supporting multimedia applications.” IEEE Journal on Selected Areas in Communications 14 (1996).7: 1228–1234.

[77] Xiang, Rongjing, Neville, Jennifer, and Rogati, Monica. “Modeling relationship strength in online social networks.” Proceedings of the 19th international conference on World wide web. ACM, 2010, 981–990.

[78] Xu, Shaoyi, Wang, Haiming, Chen, Tao, Huang, Qing, and Peng, Tao. “Effective interference cancellation scheme for device-to-device communication underlaying cellular networks.” Vehicular Technology Conference Fall (VTC 2010-Fall), 2010 IEEE 72nd. IEEE, 2010, 1–5.

[79] Zhang, Huiling, Alim, Md Abdul, Li, Xiang, Thai, My T, and Nguyen, Hien T. “Misinformation in Online Social Networks: Detect Them All with a Limited Budget.” ACM Transactions on Information Systems (TOIS) 34 (2016).3: 18.

[80] Zhang, Huiling, Alim, Md Abdul, Thai, My T, and Nguyen, Hien T. “Monitor placement to timely detect misinformation in online social networks.” 2015 IEEE International Conference on Communications (ICC). IEEE, 2015, 1152–1157.
[81] Zhang, Yanru, Song, Lingyang, Saad, Walid, Dawy, Zaher, and Han, Zhu. “Contract-Based Incentive Mechanisms for Device-to-Device Communications in Cellular Networks.” IEEE Journal on Selected Areas in Communications 33 (2015).10: 2144–2155.

[82] Zhu, Zhichao, Cao, Guohong, Zhu, Sencun, Ranjan, Supranamaya, and Nucci, Antonio. “A social network based patching scheme for worm containment in cellular networks.” Handbook of Optimization in Complex Networks. Springer, 2012, 505–533.
BIOGRAPHICAL SKETCH
Md Abdul Alim received the Bachelor of Science degree in Computer Science and
Engineering from Bangladesh University of Engineering and Technology, Bangladesh,
in 2007. He worked in a multi-national company before joining the University of Florida
in 2012 to pursue graduate studies. He received the Ph.D. degree from the Department
of Computer and Information Science and Engineering at the University of Florida
under the supervision of Dr. My T. Thai in December 2016. His research interests
include social-aware device-to-device communication underlaying next-generation
cellular networks, network vulnerability, and community structure analysis in complex
networks, including large-scale online social, wireless, and biological networks. He also
works on influence propagation and viral marketing in online social networks, as well
as approximation algorithms and their applications in combinatorial optimization.

During his Ph.D. study, Alim published many papers in top-tier peer-reviewed
conferences and journals, including IEEE/ACM Transactions. Alim is also the recipient
of several awards, including the University of Florida Graduate School Fellowship
Award, the Gartner Group Info Tech Fund, and student travel grants from CISE and
NSF.