
On Quantifying the Effects of Mobility on Data Replication in Mobile Ad Hoc Networks

Abstract— In mobile ad hoc networks, nodes move freely and network partitions occur frequently. To mitigate this problem, data replication is commonly used to increase data availability and reduce data access delay. However, most previous work assumed a particular mobility model and could not fully study the effects of mobility on data replication. In this paper, we quantify the effects of mobility on different data replication algorithms from various perspectives. The study is based on several metrics that go beyond the average access delay and data availability by including the geographical distribution of these values. Through extensive experiments, we study the effects of four typical mobility models on data replication, and identify the most suitable data replication algorithms under various mobility models.

I. INTRODUCTION

In mobile ad hoc networks (MANETs) [1], since nodes move freely, network partition may occur, where nodes in one partition cannot access data held by nodes in other partitions. To mitigate this problem, data replication can be used. By replicating the data at a number of nodes, a data request can be served by the closest node that has a data replica. Then, even if there is a network partition between the requesting node and the original data source, the data request can still be served as long as it can reach a node with a data replica. Further, since the data request can be served in fewer hops, the data access delay is reduced.

Data replication can increase data availability and reduce data access delay, but at the cost of data storage. Since mobile nodes have only limited storage space, bandwidth, and power, it is impossible for one node to hold all the data. Therefore, it is important for mobile nodes to cooperate with each other to decide which node should hold which data replica. To increase data availability, a node may choose not to hold data that has already been replicated by its neighbors, so that its local storage can be used to hold additional data. However, this may increase the hop count to some data and thus the data access delay. The problem becomes more complex when mobility is considered, since mobility changes the location of the data replicas and therefore affects the data availability and data access delay.

There have been some studies on data replication in MANETs [2–4]. These studies show that node mobility significantly affects the performance of data replication. However, most of these works assume a particular mobility model and only examine the effect of that model on the performance of their proposed algorithm. In other words, they do not provide general insights into the relationship between different mobility models and data replication algorithms. In this paper, we aim to study the effect of various mobility models on data replication and then identify the most suitable data replication algorithms under various mobility models. More specifically, the contributions of this paper are as follows:

1. We quantify the effect of mobility on data replication based on metrics such as data access delay and data availability. Besides these traditional metrics, we also look into the geographical distribution of access delay and data availability.
2. Our experimental results illustrate that different replication algorithms show quite different features in node cooperation, and thus achieve different data access performance under different mobility models. Specifically, we provide a deep analysis and evaluation of the relationship between data replication and node mobility, and identify the reason behind it.
3. We identify the most suitable data replication algorithms under various mobility models. These results can be used as guidelines for researchers and system developers to design and examine data replication algorithms when considering node mobility.

The remainder of this paper is organized as follows. In Section II, we summarize related work in this area. In Section III, we present the system model, performance metrics, and data replication algorithms that will be evaluated in this paper. Section IV reports the evaluation results of how different data replication algorithms perform under various mobility models. Finally, Section V concludes the paper.

II. RELATED WORK

Data replication has been extensively studied in the Web environment [5,6] and in distributed database systems [7], where the goal is to place some replicas of the web servers or databases among a number of possible locations so that the performance in terms of query delay or data availability is optimized. However, in all these conventional works, both web servers and database systems are assumed to be static, whereas our work targets a mobile ad hoc environment.

Recently, much research has been conducted to investigate the effect of mobility on network performance, such as the efficiency of routing protocols [8] and network partitioning [9]. For routing, the main objective is to find destinations and forward data with low message overhead, high data delivery ratio, and short delivery delay. For network partitioning, the dynamic changes of the size and shape of each partition are important issues. These studies are similar to our work in that they investigate the effect of mobility. However, they mainly focus on link stability and node distribution in the network; i.e., their studies are at the link and node level. Our work, in contrast, focuses on the effects of mobility on data replication.

Some existing works studied the effects of mobility on data availability and data dissemination speed in MANETs. In [10], the authors mathematically define some metrics that represent the effects on information diffusion in MANETs. Huang and Chen [11] studied how to replicate data when nodes follow a group mobility pattern. However, all these works aim at studying the network dynamics and the characteristics of node mobility. They do not provide deep and general insight into the relationship between mobility models and data access. The existing work most relevant to ours is [12], where Hara proposes metrics to evaluate the impact of mobility on data availability. However, all the metrics are limited to data availability. No specific data replication algorithm is analyzed and the data access performance is not considered. Therefore, it is not enough to fully examine the relationship between mobility models and data access. In this work, we study the effects of mobility on data replication in terms of data access delay and data availability. We also identify the most suitable data replication algorithms under various mobility models.

III. PRELIMINARY

In this section, we propose new metrics to quantify the effects of mobility on data replication and present four data replication algorithms that will be used in the evaluation.

A. System Model

We assume there are 𝑚 nodes in the network. The nodes are denoted by 𝑁 = {𝑁1, 𝑁2, ..., 𝑁𝑚}, where 𝑁𝑘 (𝑘 = 1, ..., 𝑚) is a node identifier. The communication range of each mobile node is represented by a circle with radius 𝑅. When two nodes move out of each other's communication range, the link between them fails. The link failure probability between 𝑁𝑖 and 𝑁𝑗 is denoted as 𝑓𝑖𝑗. The link failure probability is related to the distance between the nodes and to their moving directions and velocities. For example, if two connected nodes are far apart and move in opposite directions, they are more likely to disconnect and the link failure probability between them is high. We assume every link is bidirectional, and thus 𝑓𝑖𝑗 is equal to 𝑓𝑗𝑖. The network can be partitioned due to the limited communication range and link failures.

There are 𝑛 different data items in the network. The set of data items is denoted by 𝐷 = {𝑑1, 𝑑2, ..., 𝑑𝑛}, where 𝑑𝑘 (𝑘 = 1, ..., 𝑛) is a data identifier. Each mobile node maintains some amount of data locally. For simplicity, we assume that data are not updated; techniques similar to those used in [13] and [14,15] can be applied to extend the proposed scheme to handle data update or data consistency issues. These data items may be replicated to other nodes based on some data replication algorithm. Because of limited memory (or disk) size, each mobile node can only host 𝐵 (𝐵 < 𝑛) replicas, including its original data. When a mobile node 𝑁𝑖 needs to access a data item 𝑑𝑗, 𝑁𝑖 first searches its local memory. If 𝑁𝑖 cannot find a copy of 𝑑𝑗 in the local memory, 𝑁𝑖 communicates with its reachable nodes (through one-hop or multi-hop links in its partition) to get 𝑑𝑗. If the requesting node cannot communicate with any of the nodes that have 𝑑𝑗, 𝑑𝑗 is considered not accessible to 𝑁𝑖.
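The access procedure described above can be sketched as a breadth-first search over the current connectivity graph. This is only an illustrative sketch: the function name `access_delay`, the dictionary-based `adjacency` and `replicas` structures, and the toy topology are assumptions made for the example, not part of the system model.

```python
from collections import deque

def access_delay(adjacency, replicas, requester, item):
    """Return the hop count to the nearest replica of `item`, or None if unreachable.

    adjacency: node id -> list of one-hop neighbors (assumed representation).
    replicas:  node id -> set of data items held in local memory.
    """
    if item in replicas.get(requester, set()):
        return 0  # served from local memory
    visited = {requester}
    queue = deque([(requester, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in visited:
                continue
            if item in replicas.get(neighbor, set()):
                return hops + 1  # BFS order guarantees this is the nearest holder
            visited.add(neighbor)
            queue.append((neighbor, hops + 1))
    return None  # no holder in the requester's partition

# Hypothetical topology: N1 - N2 - N3 form one partition, N4 is isolated.
adjacency = {1: [2], 2: [1, 3], 3: [2], 4: []}
replicas = {1: {"d1"}, 2: {"d2"}, 3: {"d3"}, 4: {"d4"}}
```

In this toy topology, a request from node 1 for `d3` is served in two hops, while a request for `d4` is not accessible because node 4 lies in another partition.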

B. Evaluation Metrics

Based on the system model, we define several metrics that represent the performance of different data replication algorithms.

B.1 Average Access Delay (𝒟)

This metric is defined as the average number of hops from the query node to the nearest node that has the data. Formally, if we use 𝑡𝑖𝑗 to denote the access delay of the 𝑗th request of node 𝑁𝑖, the average access delay during the whole experiment can be expressed by the following equation:

$$\mathcal{D} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} t_{ij}}{\sum_{i=1}^{m} \mathcal{R}(i)} \tag{1}$$

Here, ℛ(𝑖) is a function that returns the number of requests initiated by node 𝑁𝑖 during the experiment.
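As a small illustration of Eq. (1), the following sketch computes the average access delay from per-node lists of delays. The function name and the list-of-lists input are assumptions for the example; the sketch also assumes each list contains only requests for which a hop count was recorded.

```python
def average_access_delay(delays_per_node):
    """Eq. (1): total delay over all requests divided by the total number of requests.

    delays_per_node[i] is the list of t_ij values (in hops) for node N_i,
    so len(delays_per_node[i]) plays the role of R(i).
    """
    total_delay = sum(sum(delays) for delays in delays_per_node)
    total_requests = sum(len(delays) for delays in delays_per_node)
    if total_requests == 0:
        raise ValueError("no requests recorded")
    return total_delay / total_requests

# Three hypothetical nodes; the third one issued no requests.
example_delays = [[0, 1, 2], [1], []]
mean_delay = average_access_delay(example_delays)  # (0 + 1 + 2 + 1) / 4
```

Note that the metric pools requests across nodes rather than averaging per-node means, so nodes that issue more requests weigh more.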

B.2 Average Availability (𝒜)

Average availability is the average probability that a query can be served successfully. Similarly, using a binary variable 𝑠𝑖𝑗 to denote whether the 𝑗th request of node 𝑁𝑖 is satisfied, this metric can be formalized as

$$\mathcal{A} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} s_{ij}}{\sum_{i=1}^{m} \mathcal{R}(i)} \tag{2}$$

where

$$s_{ij} = \begin{cases} 1 & \text{the } j\text{th request of node } N_i \text{ is satisfied;} \\ 0 & \text{else.} \end{cases}$$
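Eq. (2) can be illustrated in the same way. In this sketch, the function name and the input layout (one list of 0/1 flags per node) are assumptions made for the example.

```python
def average_availability(satisfied_per_node):
    """Eq. (2): fraction of all requests that were satisfied.

    satisfied_per_node[i] holds the binary flags s_ij for node N_i
    (1 if the j-th request was satisfied, 0 otherwise).
    """
    total_satisfied = sum(sum(flags) for flags in satisfied_per_node)
    total_requests = sum(len(flags) for flags in satisfied_per_node)
    if total_requests == 0:
        raise ValueError("no requests recorded")
    return total_satisfied / total_requests

# Two hypothetical nodes: three of the four requests are satisfied.
example_flags = [[1, 0, 1], [1]]
availability = average_availability(example_flags)
```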

B.3 Distribution of the Access Delay (𝒟ℎ)

We believe that the average access delay may not always be a very significant metric, since it treats two cases equally: 1) each request has a similar access delay; and 2) some requests have long access delay while others have short delay. To study the performance of data replication algorithms, the distribution of the access delay is therefore more significant than its average value, and it is heavily affected by the adopted mobility model and replication algorithm. We thus define the distribution of access delay as a new metric by the following equation:

$$\mathcal{D}_h = \sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} bel(t_{ij}, h), \quad (h = 0, t, 2t, \ldots) \tag{3}$$

where 𝑡 is the statistic interval of the access delay, and 𝑏𝑒𝑙(𝑡𝑖𝑗, ℎ) is a function that indicates whether 𝑡𝑖𝑗 belongs to the range [ℎ, ℎ + 𝑡):

$$bel(t_{ij}, h) = \begin{cases} 1 & h \le t_{ij} < h + t; \\ 0 & \text{else.} \end{cases}$$
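The binning in Eq. (3) amounts to a histogram with bin width 𝑡. A minimal sketch follows, assuming integer hop counts and an illustrative function name; only non-empty bins are returned.

```python
from collections import Counter

def delay_distribution(delays_per_node, t=1):
    """Eq. (3): count the requests whose delay falls in [h, h + t) for h = 0, t, 2t, ...

    Returns a dict mapping the lower bin edge h to the number of requests in that bin.
    """
    counts = Counter()
    for delays in delays_per_node:
        for t_ij in delays:
            h = (t_ij // t) * t  # lower edge of the bin containing t_ij
            counts[h] += 1
    return dict(counts)

# Hypothetical delays (in hops) for two nodes.
example = [[0, 1, 1], [2, 0]]
per_hop = delay_distribution(example)        # bin width of one hop
coarse = delay_distribution(example, t=2)    # bin width of two hops
```

With a bin width of one hop, the first bin (ℎ = 0) counts the requests served locally.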

B.4 Geographical Distribution of the Access Delay (𝒟⟨ℎ𝑥,ℎ𝑦⟩)

Since different mobility models may lead to different deployment patterns of mobile nodes, we study the geographical distribution of the data access delay. We divide the entire network area into ℎ × ℎ small subareas and compare the results in different subareas. The geographical distribution of access delay at subarea ⟨ℎ𝑥, ℎ𝑦⟩ is expressed by the following equation:

$$\mathcal{D}_{\langle h_x, h_y \rangle} = \sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} \mathcal{L}(t_{ij}, \langle h_x, h_y \rangle) \tag{4}$$

where ℒ(𝑡𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) is a function that returns 𝑡𝑖𝑗 if the request takes place in the subarea ⟨ℎ𝑥, ℎ𝑦⟩, and 0 otherwise:

$$\mathcal{L}(t_{ij}, \langle h_x, h_y \rangle) = \begin{cases} t_{ij} & \text{if the } j\text{th request of node } N_i \text{ is initiated in the subarea } \langle h_x, h_y \rangle; \\ 0 & \text{else.} \end{cases}$$
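One way to read Eq. (4) is as an accumulation of delays over a grid of subareas. The sketch below assumes each request is logged as an `(x, y, t_ij)` tuple giving the position where it was initiated; that logging format, the function name, and the boundary clamping are illustrative assumptions.

```python
def geographic_delay(requests, area_side, h):
    """Eq. (4): sum the delay t_ij of every request initiated in each subarea.

    requests:  iterable of (x, y, t_ij) tuples (assumed log format).
    area_side: side length of the square network area, in meters.
    h:         number of subareas per side, giving an h x h grid.
    """
    cell = area_side / h
    grid = [[0.0] * h for _ in range(h)]
    for x, y, t_ij in requests:
        hx = min(int(x // cell), h - 1)  # clamp points on the far boundary
        hy = min(int(y // cell), h - 1)
        grid[hx][hy] += t_ij
    return grid

# Hypothetical requests in a 2500 m x 2500 m area split into 5 x 5 subareas.
log = [(100, 100, 1), (400, 499, 2), (2500, 2500, 3), (1200, 700, 1)]
grid = geographic_delay(log, area_side=2500, h=5)
```

The same routine applies to the availability flags 𝑠𝑖𝑗 if they are logged in place of the delays.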


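B.5 Geographical Distribution of Availability (𝒜⟨ℎ𝑥,ℎ𝑦⟩)

Similar to the definition of the geographical distribution of access delay, the geographical distribution of availability is represented by the following equation:

$$\mathcal{A}_{\langle h_x, h_y \rangle} = \sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} \mathcal{L}(s_{ij}, \langle h_x, h_y \rangle) \tag{5}$$

where ℒ(𝑠𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) is a function that returns 𝑠𝑖𝑗 if the request takes place in the subarea ⟨ℎ𝑥, ℎ𝑦⟩, and 0 otherwise:

$$\mathcal{L}(s_{ij}, \langle h_x, h_y \rangle) = \begin{cases} s_{ij} & \text{if the } j\text{th request of node } N_i \text{ is initiated in the subarea } \langle h_x, h_y \rangle; \\ 0 & \text{else.} \end{cases}$$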

C. Data Replication Algorithms

To study the effects of mobility on data replication, we use the following four representative data replication algorithms.

C.1 Greedy Data Replication

Greedy data replication is a naive data replication algorithm. In this algorithm, each node replicates its most frequently accessed data until its memory is full. More specifically, let 𝑎𝑖𝑗 denote the access frequency of node 𝑁𝑖 to data 𝑑𝑗. Then, each node always replicates the data with the highest 𝑎𝑖𝑗. Since each node only takes its own data access pattern into account during data replication, this algorithm is non-cooperative.
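The Greedy selection can be sketched in a few lines. The function name and the dictionary input are assumptions for illustration, and the sketch ignores the detail that a node's original data also occupies part of its memory.

```python
def greedy_replicas(access_freq, capacity):
    """Pick the `capacity` items with the highest access frequency a_ij for one node.

    access_freq: data item -> access frequency of this node (assumed input format).
    capacity:    number of replicas the node can hold (B in the system model).
    """
    ranked = sorted(access_freq, key=lambda item: access_freq[item], reverse=True)
    return ranked[:capacity]

# Hypothetical access frequencies of a single node.
freq = {"d1": 0.1, "d2": 0.5, "d3": 0.3, "d4": 0.1}
chosen = greedy_replicas(freq, capacity=2)
```

Because only the node's own frequencies enter the ranking, two neighbors with similar interests end up holding the same replicas.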

C.2 Pairing Cooperation Data Replication

Different from Greedy data replication, in the Pairing algorithm (e.g., the OTOO scheme in [16] and the DAFN scheme in [2]), each mobile node cooperates with one of its neighbors to decide which data to replicate. More specifically, each node pair 𝑁𝑖 and 𝑁𝑗 calculates a combined access frequency value to data item 𝑑𝑘 at 𝑁𝑖 and 𝑁𝑗, respectively, called 𝐶𝐴𝐹𝑖𝑗. For example, for 𝑁𝑖:

$$CAF_{ij}(k) = a_{ik} + a_{jk} \times (1 - f_{ij}) \tag{6}$$

Similarly, 𝑁𝑗 calculates its combined access frequency. Each node sorts the data according to the CAF value and picks the data items with the highest values to replicate in its memory until no more data items can be replicated. The data replication decision thus does not depend only on the access frequency of a single node; it also depends on the access frequency of the pairing node and the link stability between them.
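A minimal sketch of the ranking in Eq. (6), seen from node 𝑁𝑖. The function names, the dictionary inputs, and the alphabetical tie-breaking are assumptions made for the example; the schemes cited above may differ in detail.

```python
def pairing_caf(a_i, a_j, f_ij):
    """Eq. (6): CAF_ij(k) = a_ik + a_jk * (1 - f_ij) for every data item k."""
    items = set(a_i) | set(a_j)
    return {k: a_i.get(k, 0.0) + a_j.get(k, 0.0) * (1.0 - f_ij) for k in items}

def pairing_replicas(a_i, a_j, f_ij, capacity):
    """Items N_i replicates: the `capacity` highest CAF values (ties broken by name)."""
    caf = pairing_caf(a_i, a_j, f_ij)
    ranked = sorted(caf, key=lambda k: (-caf[k], k))
    return ranked[:capacity]

# Hypothetical access frequencies of N_i and its pairing node N_j.
a_i = {"d1": 0.5, "d2": 0.1, "d3": 0.4}
a_j = {"d1": 0.1, "d2": 0.8, "d3": 0.1}
stable = pairing_replicas(a_i, a_j, f_ij=0.5, capacity=2)    # partner counts for half
unstable = pairing_replicas(a_i, a_j, f_ij=1.0, capacity=2)  # partner is ignored
```

With a fully unreliable link (𝑓𝑖𝑗 = 1), the partner's term vanishes and the choice reduces to the Greedy selection.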

C.3 Reliable Neighboring Data Replication

The Pairing algorithm considers a neighboring node when making data replication choices. However, each node still treats its own access frequency as the most important factor and cooperates with only one neighboring node. As described in [16], the Reliable Neighboring data replication algorithm further increases the degree of cooperation and allows a node to replicate and share data with multiple reliable neighbors within its one-hop range. The replication decision is made depending on the data access frequency and the link stability. More specifically, in this algorithm, part of the node’s memory is used to hold the most interesting data for itself and the rest is used for its reliable neighbors. The combined access frequency function for node 𝑁𝑖 to data 𝑑𝑘 in the Neighboring algorithm is defined as:

$$CAF_i(k) = \sum_{N_j \in nb(i)} a_{jk} \times (1 - f_{ij}) \tag{7}$$

where 𝑛𝑏(𝑖) is the set of all reliable neighbors of 𝑁𝑖; i.e., those whose link failure rate to 𝑁𝑖 is less than a threshold.
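Eq. (7) can be sketched as a weighted sum over the reliable neighbors. The data layout, the function name, and the threshold value used in the example are assumptions for illustration only.

```python
def neighboring_caf(access_freq, link_failure, i, threshold):
    """Eq. (7): CAF_i(k) = sum over reliable neighbors N_j of a_jk * (1 - f_ij).

    access_freq:  node id -> {data item: access frequency a_jk}.
    link_failure: node id -> {neighbor id: link failure probability f_ij}.
    threshold:    neighbors with f_ij below this value count as reliable.
    """
    reliable = [j for j, f in link_failure[i].items() if f < threshold]
    caf = {}
    for j in reliable:
        for k, a_jk in access_freq[j].items():
            caf[k] = caf.get(k, 0.0) + a_jk * (1.0 - link_failure[i][j])
    return caf

# Hypothetical one-hop neighborhood of node 1; node 4 is too unreliable to count.
access_freq = {2: {"d1": 0.2, "d2": 0.8}, 3: {"d1": 0.6, "d2": 0.4}, 4: {"d1": 1.0}}
link_failure = {1: {2: 0.5, 3: 0.0, 4: 0.9}}
caf = neighboring_caf(access_freq, link_failure, i=1, threshold=0.6)
```

Here node 4 is excluded by the threshold, so its strong interest in `d1` does not influence the ranking at node 1.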

C.4 Reliable Grouping Data Replication

Reliable Grouping data replication (e.g., the DCG scheme in [2] and the DRAM scheme in [11]) is the most aggressively cooperative data replication algorithm. All nodes in a group contribute parts of their memory to share and replicate data for all members of the same group. More specifically, the access frequency and access overhead of each data item are evaluated from the group perspective. During data replication, the data item with the highest group access frequency is allocated first, at the node that minimizes the total access delay within the group. The allocation process is repeated for all data items in the order of their access frequency until the memory of all nodes in the group is filled. The Grouping algorithm can fully exploit the cooperation among a group of well-connected nodes. Consequently, its performance depends heavily on the group connectivity: the better the group connectivity, the better the performance.
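The allocation loop described above can be approximated by the following sketch. It is a simplification under stated assumptions: each item is placed once, the "total access delay" is taken as the frequency-weighted hop count from every member to the candidate holder, and the hop matrix is given. None of these details are specified in this section, and the cited DCG and DRAM schemes are not reproduced here.

```python
def grouping_allocate(access_freq, hops, capacity):
    """Place items in one group, most-requested first, at the delay-minimizing member.

    access_freq[m][k]: access frequency of member m to item k.
    hops[m][n]:        hop count between members m and n (0 on the diagonal).
    capacity:          number of replicas each member can hold.
    Returns a dict mapping each placed item to the member that holds it.
    """
    members = list(access_freq)
    items = {k for freqs in access_freq.values() for k in freqs}
    group_freq = {k: sum(access_freq[m].get(k, 0.0) for m in members) for k in items}
    free = {m: capacity for m in members}
    placement = {}
    for k in sorted(items, key=lambda k: (-group_freq[k], k)):
        candidates = [n for n in members if free[n] > 0]
        if not candidates:
            break  # all group memory is used

        def total_delay(n):
            # Frequency-weighted hops from every member to candidate holder n.
            return sum(access_freq[m].get(k, 0.0) * hops[m][n] for m in members)

        best = min(candidates, key=total_delay)
        placement[k] = best
        free[best] -= 1
    return placement

# Hypothetical group of three members on a line: A - B - C.
hops = {"A": {"A": 0, "B": 1, "C": 2},
        "B": {"A": 1, "B": 0, "C": 1},
        "C": {"A": 2, "B": 1, "C": 0}}
access_freq = {"A": {"d1": 0.7, "d2": 0.1},
               "B": {"d1": 0.2, "d2": 0.1},
               "C": {"d1": 0.2, "d2": 0.8}}
placement = grouping_allocate(access_freq, hops, capacity=1)
```

In the example, `d1` lands at the member that requests it most, and `d2` is then placed at the best remaining member.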

IV. EXPERIMENTS

In this section, we measure the performance of the four data replication algorithms under four typical mobility models: random walk [17], random waypoint [18], Manhattan mobility [10], and reference point group mobility [19].

A. Mobility Models

Random Walk (RW): In this model, at every unit of experimental time, each mobile node randomly determines a movement direction and randomly determines a movement speed from 0 to 𝑉 m/sec. From a long-term point of view, this model offers very low mobility, similar to vibrating around the same position, because mobile nodes randomly change their movement direction.

Random WayPoint (RWP): Each node remains stationary for a pause time of 𝑆 seconds. Then, it selects a random destination in the entire area and moves to the destination at a speed chosen randomly between 0 and 𝑉 m/sec. After reaching the destination, it pauses again, and then repeats this process. In this model, mobile nodes tend to gather at the center of the area.

Manhattan Mobility (MM): This model emulates node movement on streets, where nodes only travel on the pathways in the map. Manhattan grid maps of horizontal and vertical streets are used to restrict the node movement. On each street, the mobile nodes move along the lanes in both directions. At each intersection, the mobile nodes choose their directions and speed (0 to 𝑉 m/sec) randomly.

Reference Point Group Mobility (RPGM): This model is used to model group mobility. Each group has a logical “center” called a reference point and group members (nodes). Each reference point moves according to the RWP model with 𝑉′ m/sec (maximum speed) and 𝑆′ sec (pause time). In each group, nodes are uniformly distributed within a certain radius from the reference point. To achieve this, we assume that each node moves according to the RW model with 𝑉 m/sec (maximum speed) within that range. Specifically, a node’s movement vector is composed by adding the movement vector of the RW model of the node to that of the RWP model of the reference point.

TABLE I
PARAMETER CONFIGURATION

| Parameter | Symbol | Value | Range |
|---|---|---|---|
| Number of nodes | m | 300 | |
| Node movement speed | V | 5 m/s | (3, 8 m/s) |
| Group movement speed (RPGM) | V′ | 5 m/s | (3, 8 m/s) |
| Radius of group (RPGM) | R | 300 m | |
| Node pause time | S | 5 sec | (3, 7 sec) |
| Group pause time (RPGM) | S′ | 5 sec | (3, 7 sec) |
| Communication range | C | 100 m | |
| Number of data | n | 200 | |
| Memory size | B | 10 | |
| Zipf access | 𝜃 | 0.8 | |
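To make the RWP description concrete, the following sketch generates the trajectory of a single node with a fixed time step. The generator name, the time step `dt`, and the uniform initial position are assumptions for illustration; the simulator used in the experiments is not described at this level of detail.

```python
import math
import random

def random_waypoint(area, v_max, pause, duration, dt=1.0, rng=None):
    """Yield (time, x, y) samples for one node moving under Random WayPoint.

    area:  side length of the square area (m); v_max: maximum speed V (m/sec);
    pause: pause time S (sec); duration: simulated time (sec).
    """
    rng = rng or random.Random()
    x, y = rng.uniform(0, area), rng.uniform(0, area)
    t = 0.0
    pause_left = pause  # the node starts by pausing for S seconds
    dest, speed = None, 0.0
    while t <= duration:
        yield t, x, y
        if pause_left > 0:
            pause_left -= dt
        else:
            if dest is None:
                # Pick a new destination and a speed between 0 and V.
                dest = (rng.uniform(0, area), rng.uniform(0, area))
                speed = rng.uniform(0, v_max)
            dx, dy = dest[0] - x, dest[1] - y
            dist = math.hypot(dx, dy)
            step = speed * dt
            if dist <= step:
                x, y = dest                    # destination reached
                dest, pause_left = None, pause  # pause again, then repeat
            else:
                x, y = x + dx / dist * step, y + dy / dist * step
        t += dt

# Hypothetical run with area, speed, and pause values taken from Table I.
samples = list(random_waypoint(2500.0, 5.0, 5.0, 100.0, rng=random.Random(0)))
```

Because speeds are drawn from the full range starting at 0, a node can occasionally crawl towards its destination for a long time, which is a known property of this model rather than a defect of the sketch.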

B. Simulation Settings

There are 𝑚 mobile nodes (𝑁 = {𝑁1, ..., 𝑁𝑚}) in a 2500 m × 2500 m square area. All nodes move based on the mobility model. For the MM model, we use a grid road map with six vertical and six horizontal streets; i.e., 25 blocks of the same size (500 m × 500 m). For the RPGM model, we assume that there are 25 reference points 𝑟𝑝1, ..., 𝑟𝑝25, and 𝑁𝑗 (𝑗 = 1, ..., 𝑚) sets its reference point as 𝑟𝑝⌈(𝑚/25)⌉.

At the beginning of the simulations, the initial position of each mobile node is randomly determined within the space where the node can exist. For example, nodes can only exist on a road in the MM model. We set the simulation time 𝑇 to 500,000 seconds. Each node initiates a query request every 5 seconds. Therefore, each node issues almost 100,000 requests during the entire simulation period. We neglect the first 1000 seconds to remove the impact of the initial start. Table I summarizes the parameters and their values used in the experiments. Most parameters are fixed to constant values, while others can change within a range represented by the parenthetic values.

TABLE II
ACCESS DELAY WITH UNIFORM DATA ACCESS PATTERN

| | RW | RWP | MM | RPGM |
|---|---|---|---|---|
| Greedy | 0.8410 | 0.8511 | 1.4296 | 1.4346 |
| Pairing | 0.9047 | 0.8619 | 1.3238 | 1.3347 |
| Neighboring | 0.9863 | 0.8664 | 1.0423 | 1.3153 |
| Grouping | 1.0093 | 0.8689 | 1.237 | 1.3559 |

TABLE III
ACCESS DELAY WITH ZIPF DATA ACCESS PATTERN (𝜃 = 0.8)

C. Results

C.1 Average Delay and Average Data Availability

In this subsection, we study the average delay and average data availability of the four data replication algorithms under the four mobility models with a uniform data access pattern and a more skewed data access pattern, i.e., Zipf data access.
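For readers unfamiliar with skewed access patterns, the sketch below generates requests under a Zipf-like distribution. The form used here, where the 𝑘th most popular item is requested with probability proportional to 1/𝑘^𝜃, is a common convention and an assumption on our part; this section only states the parameter value 𝜃 = 0.8. The function names are illustrative.

```python
import random

def zipf_probabilities(n, theta):
    """Access probability of the k-th most popular item, proportional to 1 / k**theta."""
    weights = [1.0 / (k ** theta) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_requests(n, theta, count, rng=None):
    """Draw `count` requested item ranks (1 = most popular) from the Zipf-like pattern."""
    rng = rng or random.Random()
    probs = zipf_probabilities(n, theta)
    return rng.choices(range(1, n + 1), weights=probs, k=count)

# n = 200 data items and theta = 0.8 as in Table I; theta = 0 gives uniform access.
skewed = zipf_probabilities(200, 0.8)
uniform = zipf_probabilities(200, 0.0)
requests = sample_requests(200, 0.8, count=10, rng=random.Random(1))
```

Under this form, setting 𝜃 = 0 recovers the uniform pattern, and larger 𝜃 concentrates requests on fewer items.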

Table II shows the average query delay with the uniform data access pattern. Under the RW mobility model, both the Greedy algorithm and the Pairing algorithm achieve shorter access delay than the other two. The short access delay of the Greedy algorithm is due to its low data availability (as shown in Table IV), since missed queries are not counted. The Pairing algorithm, however, helps share data between one-hop pairing nodes. In this way, a node and its pairing node can both serve its requests. Given the relatively reliable connectivity between pairing nodes under RW mobility, the Pairing algorithm can achieve higher data availability and lower query delay compared to other algorithms. Similar results can be observed from Table III.

In RWP, there is no reliable connectivity between any node pair, and hence the Pairing algorithm may not be helpful for data sharing. Similar to the Greedy algorithm, most requests in Pairing are served locally. Therefore, the average query delay of the Pairing algorithm becomes even shorter in this case, at the cost of low data availability (see Table IV). Similar results hold for the Neighboring and Grouping algorithms. However, the average delay of the Greedy algorithm increases in RWP. This is related to the network formation under RWP, where nodes tend to gather in the central area. Thus, large partitions may be formed in the center, which increases the possibility of finding available data at nearby nodes to serve query requests when data access is uniformly distributed.

Similarly, in the MM and the RPGM mobility models, due to the road layout constraint and the restricted mobility pattern, the network has a relatively higher node density. As expected, larger partitions can be formed in MM and RPGM compared to RW. Therefore, the query delay becomes larger in the MM and the RPGM models.

TABLE IV
DATA AVAILABILITY WITH UNIFORM DATA ACCESS PATTERN

TABLE V
DATA AVAILABILITY WITH ZIPF DATA ACCESS PATTERN (𝜃 = 0.8)

Fig. 1. Distribution of the data access delay (with uniform data access pattern). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Bars are grouped by algorithm (Greedy, Pairing, Neighboring, Grouping); legend: local, 1 hop, 2 hops, 3 hops, 4 hops, 5 hops, 6 hops, 6+ hops.

Fig. 2. Distribution of the data access delay (with Zipf (𝜃 = 0.8) data access pattern). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Bars are grouped by algorithm (Greedy, Pairing, Neighboring, Grouping).

Table III shows the results of query delay with skewed data access following the Zipf (𝜃 = 0.8) distribution. By comparing Table II and Table III, we can see that as data access becomes more skewed, the average data access delay decreases dramatically. This is because as data access becomes more skewed, it becomes easier for each node to buffer and replicate the data it is interested in into its own memory or at nearby nodes, so that more query requests can be served locally or by nearby neighbors. Here we also note that there are two factors that may affect the performance in RWP. First, due to random mobility, there are fewer reliable connections in RWP. Therefore, the cooperation-based algorithms tend to work like the Greedy algorithm, resulting in low access delay. Second, the cooperative algorithms may still replicate and share data with other nodes when they occasionally find some reliable connections. This increases the access delay when two nodes move farther apart but are still reachable over multiple hops. When the data access pattern is uniform, the first factor has more weight on the performance. When data access becomes skewed, the second factor has more weight because some interesting data with high access frequency may not be replicated locally. Therefore, the Pairing and Neighboring algorithms have a larger access delay in RWP than in RW when the access pattern follows the Zipf distribution.

Table IV and Table V show the results of data availability with the uniform and Zipf data access patterns. Similar to the results of data access delay, the data availability is much higher with Zipf data access than with uniform data access. Moreover, we can see that MM and RPGM always have better data availability than RW and RWP. This advantage comes from the higher relative node density and more similar node mobility in MM and RPGM: more nodes can be accessed and more data can be used to serve query requests.

Tables II, III, IV, and V also demonstrate that cooperation helps to improve performance in MM and RPGM, brings less improvement in RW, and none in RWP. This is because close nodes in MM and RPGM have more similar mobility than those in RW and RWP. If close nodes can move together for a long time, cooperative data replication algorithms such as Pairing, Neighboring, and Grouping have more advantages.

In summary, in RWP, where nodes move randomly, the Greedy algorithm is the best solution. In the RW model, where close nodes have more reliable connections, the Pairing algorithm works best. In MM, where mobile nodes tend to have more reliable neighbors and a higher node density, the Neighboring algorithm shows more advantage. In the RPGM model, where nodes move following strict group mobility, Grouping data replication outperforms the others.

C.2 Distribution of Access Delay

Figure 1 and Figure 2 show the distribution of access delay under the uniform and Zipf distributions. In these figures, the y-axis indicates the request success ratio. Each bar represents the query delay in terms of hops; the bars cover the different access delays, from 0 hops to 6+ hops.

As shown in Figure 1(a), for the RW model, since the Pairing, Neighboring, and Grouping algorithms share data among nearby nodes, a few requests that are not satisfied locally can be served by one-hop or two-hop neighbors. Because the data access is uniformly distributed, the improvement from cooperation is limited. When the data access becomes more skewed (as shown in Figure 2(a)), more cooperation exists, and more requests are served by nearby nodes. For example, compared to the Greedy algorithm, the Neighboring algorithm sacrifices 5% of the requests served locally, but achieves 15% more requests that can be served by one-hop neighbors. Similarly, the Grouping algorithm tries to share data in a larger area. It has the fewest requests served locally, but the largest number of satisfied requests from two-hop or three-hop neighbors.

Fig. 3. Geographical distribution of the access delay (Greedy Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Access Delay.

Fig. 4. Geographical distribution of the access delay (Pairing Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Access Delay.

Fig. 5. Geographical distribution of the access delay (Neighboring Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Access Delay.

Fig. 6. Geographical distribution of the access delay (Grouping Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Access Delay.

In Figures 1(b) and 2(b), since nodes move randomly in RWP, cooperative algorithms such as Pairing, Neighboring, and Grouping do not benefit from cooperation. The Greedy algorithm, in which each node replicates the data it is most interested in, is more suitable for RWP.

In MM, due to the road layout constraint, nodes can only move on and follow the roads. Therefore, each node has more neighbors in MM than in RW and RWP, and the average network partition size can be larger than in RW and RWP. As a result, as shown in Figures 1(c) and 2(c), each replication algorithm has more requests satisfied by multi-hop neighbors.

In Figures 1(d) and 2(d), the result is similar to that in MM due to the relatively reliable connectivity and higher density in RPGM. These two figures also clearly demonstrate that in RPGM, the Grouping replication algorithm has more requests served by neighboring nodes that are multiple hops away.

C.3 Geographical Distribution of Access Delay

Figures 3 to 6 show the geographical distribution of access delay with different data replication algorithms. Due to the page limit, we only present the results with the Zipf distribution.

Fig. 7. Geographical distribution of the data availability (Greedy Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Availability.

Fig. 8. Geographical distribution of the data availability (Pairing Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Availability.

Fig. 9. Geographical distribution of the data availability (Neighboring Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Availability.

Fig. 10. Geographical distribution of the data availability (Grouping Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM. Axes: X (m), Y (m), Availability.

As shown in Figures 3(a), 4(a), 5(a), and 6(a), geographical location does not affect the access delay much under the RW mobility model. This is because nodes are initially randomly distributed and randomly determine their movement directions in RW. The node density is relatively even, and hence there is no large variation in data access delay across locations. However, we can still see that the Greedy algorithm and the Pairing algorithm have lower access delay than the other two, which is consistent with our previous results on access delay.

From Figures 3(b), 4(b), 5(b), and 6(b), we can see some interesting results under RWP. When a node is at the boundary of the simulation area, its access delay is short. As it moves towards the center area, its access delay first increases and then begins to decrease. In RWP, during each movement cycle, each node randomly chooses a destination and moves there. Therefore, nodes have a higher probability of appearing in the center area, and thus the central area has higher node density than the boundary area. As a result, nodes are more easily isolated in the boundary area, but form large partitions at the center of the simulation area. In the extreme case where a node is isolated, its access delay is the lowest since it can only access the locally replicated data. At the center area, the node density is high, which helps nodes find the data they are interested in on nearby nodes, resulting in a low access delay.
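
The concentration of RWP nodes around the center can be checked with a small simulation. The following is only a minimal sketch under stated assumptions, not the simulator used in this paper: the function name, node count, constant speed, zero pause time, and step count are all hypothetical choices made for illustration. It measures the fraction of node positions that fall inside the central square covering one quarter of the area; a uniform node distribution would give about 0.25.

```python
import random

def rwp_center_fraction(num_nodes=50, steps=2000, side=2500.0, speed=10.0, seed=0):
    """Fraction of sampled node positions inside the central quarter of the area.

    Assumptions (hypothetical, for illustration only): constant speed,
    no pause time, uniformly random initial positions and destinations.
    """
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_nodes)]
    dest = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_nodes)]
    lo, hi = side / 4, 3 * side / 4  # central square with one quarter of the area
    central = 0
    for _ in range(steps):
        for i in range(num_nodes):
            (x, y), (dx, dy) = pos[i], dest[i]
            dist = ((dx - x) ** 2 + (dy - y) ** 2) ** 0.5
            if dist <= speed:
                # Destination reached: start a new movement cycle.
                pos[i] = (dx, dy)
                dest[i] = (rng.uniform(0, side), rng.uniform(0, side))
            else:
                # Move one step of length `speed` towards the destination.
                pos[i] = (x + speed * (dx - x) / dist, y + speed * (dy - y) / dist)
            px, py = pos[i]
            if lo <= px <= hi and lo <= py <= hi:
                central += 1
    return central / (steps * num_nodes)

if __name__ == "__main__":
    print(rwp_center_fraction())  # noticeably above the uniform baseline of 0.25
```

A value clearly above 0.25 is consistent with the higher node density at the center described above.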

Under MM, shown in Figures 3(c), 4(c), 5(c), and 6(c), mobile nodes are only allowed to move in the vertical or horizontal directions following the road layout, and thus the access delay is only available at positions where there is a road. This restriction helps achieve a relatively higher node density and avoids isolating nodes, but it results in larger access delay compared to RW and RWP.

Finally, Figures 3(d), 4(d), 5(d), and 6(d) compare the access delay of different data replication algorithms under the RPGM mobility model. Similar to RWP, RPGM has lower access delay at the boundary area and the center area but larger delay in the middle. This is because the reference point of each group moves according to the RWP mobility model, and thus each mobile group as a whole follows an RWP pattern. Because of the group mobility characteristic of RPGM, the connectivity among nodes in the same group is relatively reliable. This helps nodes form larger partitions, so more nodes can be reached, including ones that are several hops away. Therefore, the access delay is larger in the RPGM mobility model than in RWP.
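
The reliability of intra-group connectivity in RPGM can be illustrated with a short sketch. It assumes, purely for illustration, that each group member is placed at a random offset of at most `max_offset` meters from the group's reference point; the function names and parameter values are hypothetical and are not taken from the simulation setup of this paper. Under that assumption, any two members are at most `2 * max_offset` apart, so a transmission range of at least that value keeps the whole group connected wherever the reference point moves.

```python
import math
import random

def rpgm_member_positions(reference, group_size, max_offset, rng):
    """Place group members at random offsets (<= max_offset) around the reference point."""
    members = []
    for _ in range(group_size):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, max_offset)
        members.append((reference[0] + radius * math.cos(angle),
                        reference[1] + radius * math.sin(angle)))
    return members

def max_pairwise_distance(points):
    """Largest Euclidean distance between any two points."""
    return max(math.hypot(ax - bx, ay - by) for ax, ay in points for bx, by in points)

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical values: reference point at the area center, 5 members, 100 m offset.
    group = rpgm_member_positions((1250.0, 1250.0), group_size=5, max_offset=100.0, rng=rng)
    # By the triangle inequality, no two members are farther apart than 2 * max_offset.
    print(max_pairwise_distance(group))
```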

C.4 Geographical Distribution of Data Availability

Similar to the geographical distribution of data access delay, from Figures 7(a), 8(a), 9(a), and 10(a), we can see that under RW, data availability is independent of the location where the query is initiated. However, different data replication algorithms achieve different data availability. In the Greedy algorithm, there is no data sharing since each node only replicates data according to its own interest. Therefore, there may be duplicated data among closely connected nodes, which reduces the overall data availability. The Neighboring algorithm and the Grouping algorithm aim to share data with nearby nodes, which can remove some data redundancy and improve the data availability. However, when a partition occurs, data saved on neighbors may not be available. The Paring algorithm, in contrast, replicates data on the most reliable neighbor, and can achieve the best balance between node cooperation and the risk of partition. Therefore, the Paring algorithm has the highest data availability.
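
The trade-off between duplication and partition risk can be made concrete with a toy calculation. This is a simplified sketch, not the availability metric as implemented in the experiments: it assumes every data item is requested equally often, and the node names, item identifiers, and storage sizes are hypothetical.

```python
def partition_availability(replicas, partition, num_items):
    """Fraction of the num_items data items held by at least one node in the partition."""
    reachable = set()
    for node in partition:
        reachable |= replicas[node]
    return len(reachable) / num_items

# Hypothetical setup: 4 data items, 2 nodes, each node can store 2 items.
greedy = {"n1": {0, 1}, "n2": {0, 1}}        # both nodes replicate the same items
cooperative = {"n1": {0, 1}, "n2": {2, 3}}   # neighbors avoid duplicating each other

if __name__ == "__main__":
    print(partition_availability(greedy, ["n1", "n2"], 4))       # 0.5
    print(partition_availability(cooperative, ["n1", "n2"], 4))  # 1.0
    print(partition_availability(cooperative, ["n1"], 4))        # 0.5
```

While the two nodes stay connected, cooperation doubles the reachable data. Once they are partitioned, the items stored only on the neighbor are lost, which is why the reliability of the chosen neighbor matters.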

As shown in Figures 7(b), 8(b), 9(b), and 10(b), under RWP, nodes are more likely to stay around the central area, and thus the data availability is higher in the center area. We can also see that the Greedy algorithm has the best data availability in RWP since it does not rely on any cooperation.

In MM, shown in Figures 7(c), 8(c), 9(c), and 10(c), there are more nodes around the intersection areas than in other areas, and hence the data availability at the intersections is higher. We also find an interesting phenomenon in the cooperative data replication algorithms. Take the Paring algorithm as an example. Figure 11 shows the data availability along the third horizontal road, i.e., as the position (x, y) changes from (0, 1500) to (2500, 1500). In this figure, we can see that both the intersection areas and the middle segments of the road have higher data availability. However, the data availability is low in the areas close to the intersections. This comes from a characteristic of the MM mobility model. Due to the road layout constraint, mobile nodes may split at an intersection when they choose different movement directions. Since cooperation-based data replication algorithms rely on data sharing among nearby nodes, some data may become unavailable when a split happens, which lowers the data availability in these areas. However, once nodes are aware of the split, they reorganize their collaborative nodes and share data with them. Therefore, after the reorganization process, i.e., at the middle segments of the road, they can achieve relatively higher data availability.

Fig. 11. Geographical distribution of the Paring algorithm in MM. [Plot of availability, ranging from 0.68 to 0.75, against position from 0 to 2500 along the road, with four intersections marked.]

Figures 7(d), 8(d), 9(d), and 10(d) present results for RPGM. Because the reference point in RPGM has a mobility pattern similar to that of the mobile nodes in RWP, the shape of the data availability figure of RPGM is similar to that of RWP, i.e., higher data availability near the center area and lower data availability at the boundary area. Since nodes in the same group have quite similar mobility patterns and more reliable connectivity, RPGM can achieve much higher data availability than RWP. By comparing different data replication algorithms, we can see that the Grouping algorithm has the highest data availability. The advantage comes from its data sharing within each mobile group, which allows nodes' memory to be utilized more efficiently.

D. Discussions

In this section, we summarize the experimental results and identify the most suitable data replication algorithms under various mobility models.

RW Model: Under RW, nodes have low mobility, as if vibrating around the same position. As a result, the connectivity between closely connected nodes is relatively reliable. Also, RW typically forms small network partitions and rarely forms large ones. Due to the low mobility, even if there is a network partition, each partition is relatively stable. Thus, when designing a data replication algorithm, it is more appropriate for nodes to cooperatively replicate data with their closely connected neighbors, and the replication should not rely on data sharing with a large number of nodes. This also explains why the Paring algorithm is the most efficient algorithm in the simulation.

RWP Model: In RWP, nodes move randomly and do not show any reliable connections with each other, and hence the node partition rate is high. Thus, it may not be good to share data with others, and the non-cooperative Greedy algorithm may be the best choice. On the other hand, since nodes tend to gather at the center of the network, they form a large partition around the central area, where the availability is high. Thus, when designing a data replication algorithm, it is better to push and replicate the most important data on the nodes around the central area. Further, mobile nodes should forward their requests to the central area to improve the query success ratio.
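
One simple way to act on this guideline is to rank nodes by their distance to the center of the area and push the most important data to the closest ones. The sketch below is only an illustration under assumptions: it presumes node positions are known (for example from a positioning service), and the function name, node identifiers, and coordinates are hypothetical.

```python
import math

def pick_central_hosts(positions, side, k):
    """Return the ids of the k nodes closest to the center of a side x side area."""
    cx, cy = side / 2, side / 2
    ranked = sorted(positions,
                    key=lambda n: math.hypot(positions[n][0] - cx, positions[n][1] - cy))
    return ranked[:k]

if __name__ == "__main__":
    # Hypothetical node positions in a 2500 m x 2500 m area.
    positions = {"a": (1250.0, 1250.0), "b": (100.0, 100.0),
                 "c": (1300.0, 1200.0), "d": (2400.0, 50.0)}
    print(pick_central_hosts(positions, side=2500.0, k=2))  # ['a', 'c']
```

Because nodes keep moving under RWP, such a selection would have to be refreshed periodically; how often is a design choice this sketch does not address.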

MM Model: The MM model has several interesting features due to its restricted mobility. First, in MM the connection between neighboring nodes lasts longer than that in RWP. The connectivity is relatively reliable because several neighboring nodes on the same street with the same direction often move together. Therefore, when designing a data replication algorithm under MM, it is effective to share data among neighbors moving in the same direction. Second, the node density is higher in the intersection areas than in other areas. Similar to the RWP model, it is more suitable to buffer some important data in these areas to better serve future requests. Finally, partitions frequently occur after the intersection areas, resulting in low data availability in these areas. To maintain high data availability and low query delay, new schemes should be designed to predict partitions at the intersections and pre-fetch the important data before a partition occurs.

RPGM Model: The RPGM model provides much higher data availability but longer query delay than the other mobility models. Due to group mobility, RPGM provides higher connectivity among nodes in the same group and the most reliable group connections. As a result, cooperation-based data replication algorithms can achieve the best performance in terms of data availability by cooperatively sharing data within each group. However, the negative effect is that, due to node cooperation, the query delay is relatively longer than in the other mobility models. Because they contribute more memory to replicate data for other group members, mobile nodes have to access some of the data they are interested in from other nodes over multiple hops. In summary, it is effective to share data among nodes in the same group in RPGM. It is important to have a good group detection technique to detect nodes moving in the same group and then effectively allocate data within the group.
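
As one hypothetical illustration of group detection, nodes can be clustered by the similarity of their velocity vectors. The sketch below is not a technique proposed or evaluated in this paper; the function name, the tolerance parameter, and the sample velocities are assumptions made for the example. It links any two nodes whose velocities differ by at most `tolerance` and returns the resulting connected clusters using a union-find structure.

```python
import math

def detect_groups(velocities, tolerance):
    """Cluster node ids whose velocity vectors differ by at most `tolerance`.

    Linking is transitive (single-link clustering via union-find).
    """
    parent = {node: node for node in velocities}

    def find(node):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    nodes = list(velocities)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            (ax, ay), (bx, by) = velocities[a], velocities[b]
            if math.hypot(ax - bx, ay - by) <= tolerance:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))

if __name__ == "__main__":
    # Hypothetical velocity samples in m/s: two groups heading in opposite directions.
    velocities = {"a": (1.0, 0.0), "b": (1.1, 0.1), "c": (-1.0, 0.5), "d": (-0.9, 0.4)}
    print(detect_groups(velocities, tolerance=0.3))
```

A single velocity sample can be misleading when two unrelated nodes happen to move in parallel, so a practical detector would likely average over several observations or also check proximity.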

V. CONCLUSION

In mobile ad hoc networks, nodes move freely and network partition occurs frequently. To mitigate this problem, data replication is commonly used to increase the data availability and reduce the data access delay. However, most previous work assumed a particular mobility model and could not fully study the effects of mobility on data replication. In this paper, we quantify the effects of mobility on different data replication algorithms from various perspectives. The study uses several metrics that go beyond the average access delay and data availability by also including the geographical distribution of these values. Through extensive experiments, we study the effects of four typical mobility models on data replication, and identify the most suitable data replication algorithms under various mobility models.

We believe that the experimental results and the knowledge obtained from them are very useful for researchers designing algorithms for data sharing and replication under these typical mobility models. To the best of our knowledge, this is the first work that explores and provides a deep explanation of the relationship between node mobility and data replication algorithms.

REFERENCES

[1] D. B. Johnson and D. A. Maltz. Dynamic source routing in ad hoc wireless networks. Mobile Computing, Kluwer, pages 153–181, 1996.

[2] T. Hara and S. K. Madria. Data replication for improving data accessibility in ad hoc networks. IEEE Transactions on Mobile Computing, 5(11):1515–1532, 2006.

[3] K. Wang and B. Li. Efficient and guaranteed service coverage in partitionable mobile ad-hoc networks. IEEE INFOCOM, 2002.

[4] J. Luo, J. Hubaux, and P. Eugster. Pan: Providing Reliable Storage in Mobile Ad Hoc Networks with Probabilistic Quorum Systems. ACM MobiHoc, 2003.

[5] H. Yu and A. Vahdat. Minimal Replication Cost for Availability. ACM Symposium on Principles of Distributed Computing (PODC), 2002.

[6] H. Yu and A. Vahdat. The costs and limits of availability for replicated services. ACM Transactions on Computer Systems, 24:70–113, 2006.

[7] L. Gao, M. Dahlin, A. Nayate, J. Zheng, and A. Iyengar. Consistency and Replication: Application Specific Data Replication for Edge Services. International conference on World Wide Web, 2003.

[8] J. Zhao and G. Cao. Vadd: Vehicle-assisted data delivery in vehicular ad hoc networks. IEEE Transactions on Vehicular Technology, 57(3):1910–1922, May 2008 (A preliminary version appeared in IEEE INFOCOM’06).

[9] J. Hahner, D. Dudkowski, P. Marron, and K. Rothermel. Quantifying network partitioning in mobile ad hoc networks. International Conference on Mobile Data Management, pages 174–181, 2007.

[10] F. Bai, N. Sadagopan, and A. Helmy. Important: A framework to systematically analyze the impact of mobility on performance of routing protocols for adhoc networks. IEEE INFOCOM, 2003.

[11] J. Huang and M. Chen. On the effect of group mobility to data replication in ad hoc networks. IEEE Transactions on Mobile Computing, 5:492–507, 2006.

[12] T. Hara. Quantifying impact of mobility on data availability in mobile ad hoc networks. IEEE Transactions on Mobile Computing, 9(2):241–258, 2010.

[13] T. Hara. Replica Allocation in Ad hoc Networks with Periodic Data Update. International Conference on Mobile Data Management, 2002.

[14] T. Hara and S. Madria. Consistency management strategies for data replication in mobile ad hoc networks. IEEE Transactions on Mobile Computing, 8(7):950–967, 2009.

[15] J. Cao, Y. Zhang, G. Cao, and L. Xie. Data consistency for cooperative caching in mobile environments. IEEE Computer, 40(4):60–66, 2007.

[16] L. Yin and G. Cao. Balancing the tradeoffs between data accessibility and query delay in ad hoc networks. IEEE International Symposium on Reliable Distributed Systems, pages 289–298, 2004.

[17] K. Pearson. The problem of the random walk. Nature, 72(1867):342, 1905.

[18] J. Broch, D. Maltz, D. Johnson, Y. Hu, and J. Jetcheva. A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols. ACM MobiCom, pages 85–97, October 1998.

[19] X. Hong, M. Gerla, G. Pei, and C. Chiang. A group mobility model for ad hoc wireless networks. ACM international workshop on Modeling, analysis and simulation of wireless and mobile systems, pages 53–60, 1999.
